Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2747
3
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Branislav Rovan
Peter Vojt´asˇ (Eds.)
Mathematical Foundations of Computer Science 2003 28th International Symposium, MFCS 2003 Bratislava, Slovakia, August 25-29, 2003 Proceedings
13
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Branislav Rovan Comenius University Department of Computer Science 84248 Bratislava, Slovakia E-mail:
[email protected] Peter Vojt´asˇ ˇ arik University P.J. Saf´ Department of Computer Science, Faculty of Science Jesenn´a 5, 04154 Koˇsice, Slovakia E-mail:
[email protected] Cataloging-in-Publication Data applied for A catalog record for this book is available from the Library of Congress Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at .
CR Subject Classification (1998): F., G.2, D.3, I.3, E.1 ISSN 0302-9743 ISBN 3-540-40671-9 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2003 Printed in Germany Typesetting: Camera-ready by author, data conversion by Olgun Computergrafik Printed on acid-free paper SPIN 10929285 06/3142 543210
Preface
This volume contains papers selected for presentation at the 28th Symposium on Mathematical Foundations of Computer Science – MFCS 2003, held in Bratislava, Slovakia, August 25–29, 2003. MFCS 2003 was organized by the Slovak Society for Computer Science and the Comenius University in Bratislava, in cooperation with other institutions in Slovakia. It was supported by the European Association for Theoretical Computer Science and the Slovak Research Consortium for Informatics and Mathematics. The series of MFCS symposia, organized alternately in the Czech Republic, Poland and Slovakia since 1972, has a well-established tradition. The MFCS symposia encourage high-quality research in all branches of theoretical computer science. Their broad scope provides an opportunity to bring together specialists who do not usually meet at specialized conferences. The previous meetings took place ˇ in Jablonna, 1972; Strbsk´ e Pleso, 1973; Jadwisin, 1974; Mari´ ansk´e L´aznˇe, 1975; Gda` nsk, 1976; Tatransk´ a Lomnica, 1977; Zakopane, 1978; Olomouc, 1979; Ryˇ dzina, 1980; Strbsk´ e Pleso, 1981; Prague, 1984; Bratislava, 1986; Carlsbad, 1988; Por¸abka-Kozubnik, 1989; Bansk´ a Bystrica, 1990; Kazimierz Dolny, 1991; Prague, 1992; Gda` nsk, 1993, Koˇsice, 1994; Prague, 1995; Krak´ow, 1996; Bratislava, 1997; Brno, 1998; Szklarska Por¸eba, 1999; Bratislava, 2000; Mari´ansk´e L´aznˇe, 2001; and Warsaw-Otwock, 2002. The MFCS 2003 Proceedings consists of 7 invited papers and 55 contributed papers. The latter were selected by the Program Committee from a total of 137 submitted papers. The following program committee members took part in the evaluation and selection of submitted papers (those denoted by ∗ took part in the selection meeting in Bratislava on May 10, 2003): Julian Bradfield∗ (Edinburgh), J´ anos Csirik (Szeged), Pierpaolo Degano (Pisa), Mariangiola Dezani-Ciancaglini∗ (Torino), Krzysztof Diks (Warsaw), Juhani Karhum¨ aki∗ (Turku), Marek Karpinski (Bonn), Mojm´ır Kˇret´ınsk´ y∗ (Brno), Werner Kuich (Vienna), Jan van Leeuwen (Utrecht), Christoph Meinel (Trier), Leszek Pacholski (Wroclaw), David Peleg∗ (Rehovot), Jos´e D.P. Rolim (Geneˇıma∗ va), Branislav Rovan∗ (Bratislava, Chair ), Jan Rutten (Amsterdam), Jiˇr´ı S´ (Prague), Paul Spirakis∗ (Patras), Ulrich Ultes-Nitsche (Fribourg), Peter Vojt´ aˇs∗ (Koˇsice, Vice-chair ), and Igor Walukiewicz (Bordeaux). We would like to thank all Program Committee members for their meritorious work in evaluating the submitted papers, as well as the following referees who assisted the Program Committee members: F. Ablayev, L. Aceto, C.J. van Alten, A. Ambainis, G. Andrejkov´ a, M. Andreou, F. Arbab, V. Auletta, S. Bala, F. Bartels, M. Bellia, Y. Benenson, L. Bertossi, B. Blanchet, N. Blum, H.L. Bodlaender, F.S. de Boer, M. Bojanczyk, M.M. Bonsangue, M. Boreale, A. Bucciarelli, H. Buhrman, C. Busch, ˇ ˇ R. Cada, O. Carton, I. Cern´ a, B. Chlebus, P. Chrzastowski-Wachtel, K. Ciebiera,
VI
Preface
A. Condon, F. Corradini, B. Courcelle, P. Crescenzi, M. Crochemore, A. Czumaj, C. Damm, J. Dassow, A. Di Pierro, V. Diekert, S. Dziembowski, P. Eli´ aˇs, L. Epstein, K. Etessami, E. Fachini, S. Fedin, M. Fellows, P. Flajolet, L. Fortnow, D. Fotakis, F. Franˇek, M.P. Frank, R. Freund, S. Fr¨ oschle, S. Fujita, Z. F¨ ul¨ op, F. Gadducci, G. Galbiati, A. Gambin, V. Geffert, R. Gennaro, M. GhasemZadeh, K. Golab, R. Govindan, G. Gramlich, S. Gruner, K. Grygiel, V. Halava, T. Harju, T. Hartman, M. Hauptmann, L.A. Hemaspaandra, M. Hermann, P. Hertling, E.A. Hirsch, M. Hirvensalo, J. Honkala, C.S. Iliopoulos, S. Irani, R. Irving, G.F. Italiano, P. Janˇcar, K. Jansen, G. Jir´ askov´a, J. Kari, D. Kavvadias, C. Kenyon, L. Kirousis, V. Klotz, G. Kortsarz, V. Koubek, O. Koval, L . Kowalik, D. Kowalski, S. Krajˇci, J. Kratochv´ıl, M. Krause, V. Kreinovich, A. Kuˇcera (Brno), A. Kuˇcera (Prague), M. Kufleitner, A. Kulikov, M. Kunc, C. Kupke, P. K˚ urka, M. Kurowski, S. Lasota, R. Lencses, M. Lenisa, S. Leonardi, M. Lewenstein, C. Lhoussaine, U. de’Liguoro, L. Liquori, M. Li´skiewicz, J. Longley, M. Loreti, Z. Lotker, A. de Luca, B. Luttik, M. Maidl, Ch. Makris, A. Malinowski, A. Marchetti-Spaccamela, B. Martin, J. Matouˇsek, G. Mauri, M. Mavronicolas, F. Mera, F. Meyer auf der Heide, M. Mlotkowski, B. Monien, F. de Montgolfier, K. Morita, Ph. Moser, F. Mr´ az, M. Mucha, R. Neruda, R. Niedermeier, S. Nikoletseas, D. Niwinski, D. Nowotka, D. Oddoux, D. von Oheimb, V. van Oostrom, B. Palano, A. Panholzer, V. Papadopoulou, L. Parida, R. Paturi, G. Paun, R. Pel´ anek, A. Pelc, J. Pelik´ an, P. Penna, E. Petre, J.ˇ Porubsk´ E. Pin, W. Plandowski, M. Ploˇsˇcica, E. Porat, S. y, O. Powell, J. Power, ˇ ak, M. Repick´ M. Przybylski, P. Pudl´ ak, F. van Raamsdonk, V. Reh´ y, P. Rychlikowski, W. Rytter, P. Sankowski, V. Sassone, P. Savick´ y, Ch. Scheideler, V. Schillings, U. Sch¨ oning, N. Schweikardt, G. Semaniˇsin, M. Serna, J. Sgall, ˇ R. Silvestri, L. Skarvada, R. Solis-Oba, D. Sonowska, P. Sos´ık, D. Spielman, ˇ edr´ S. St James, I. Stark, A. Stˇ y, C. Stirling, J. Strejˇcek, T. Suel, G. Sutre, L . Sznuk, L. Tendera, Sh.-H. Teng, P. Tesson, D. Th´erien, W. Thomas, L. Trevisan, E. Tronci, T. Truderung, U. Vaccaro, M.Y. Vardi, F.-J. de Vries, T. Wale´ n, D. Walukiewicz-Chrzaszcz, I. Wegener, P. Weil, D. West, M. Westermann, ˇ ak, P. Widmayer, Th. Wilke, Th. Worsch, InSeon Yoo, Sh. Yu, R. Yuster, S. Z´ M. Zawadowski, H. Zhang, T. Zwissig. EATCS offered a Best Student Paper Award for the best paper submitted to MFCS and authored solely by students. The Program Committee decided to give this award in 2003 to Gregor Gramlich (Institut f¨ ur Informatik, Johann Wolfgang G¨ othe-Universit¨ at, Frankfurt) for his paper “Probabilistic and Nondeterministic Unary Automata.” As the editors of these proceedings, we are much indebted to all contributors to the scientific program of the symposium, especially to the authors of papers. Special thanks go to those authors who prepared the manuscripts according to the instructions and made life easier for us. We would also like to thank those who responded promptly to our requests for minor modifications and corrections in their manuscript. The database and electronic support system for the Program Committee was designed by Miroslav Chladn´ y who, together with Miroslav Zervan, made everything run smoothly. Our special thanks go to Miroslav Chladn´ y
Preface
VII
for most of the hard technical work in preparing this volume. We are also thankful to the members of the Organizing Committee who made sure that the conference ran smoothly in a pleasant environment. Last, but not least, we want to thank Springer-Verlag for excellent co-operation in the publication of this volume.
Bratislava, June 2003
Branislav Rovan Peter Vojt´ aˇs
VIII
Preface
Organized by Slovak Society for Computer Science Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava
Supported by European Association for Theoretical Computer Science Slovak Research Consortium for Informatics and Mathematics
Program Committee
Julian Bradfield (Edinburgh), J´ anos Csirik (Szeged), Pierpaolo Degano (Pisa), Mariangiola Dezani-Ciancaglini (Torino), Krzysztof Diks (Warsaw), Juhani Karhum¨ aki (Turku), Marek Karpinski (Bonn), Mojm´ır Kˇret´ınsk´ y (Brno), Werner Kuich (Vienna), Jan van Leeuwen (Utrecht), Christoph Meinel (Trier), Leszek Pacholski (Wroclaw), David Peleg (Rehovot), Jos´e D.P. Rolim (Geneva), ˇıma Branislav Rovan (Bratislava, Chair ), Jan Rutten (Amsterdam), Jiˇr´ı S´ (Prague), Paul Spirakis (Patras), Ulrich Ultes-Nitsche (Fribourg), Peter Vojt´ aˇs (Koˇsice, Vice-chair ), and Igor Walukiewicz (Bordeaux)
Organizing Committee Miroslav Chladn´ y, Vanda Hamb´ alkov´ a, Rastislav Kr´ aloviˇc, Zuzana Kubincov´ a, Marek Nagy, Martin Neh´ez, Dana Pardubsk´ a (Chair ), Edita Riˇc´anyov´ a, Branislav Rovan, Miroslav Zervan
Table of Contents
Invited Talks Distributed Quantum Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Harry Buhrman and Hein R¨ ohrig
1
Selfish Routing in Non-cooperative Networks: A Survey . . . . . . . . . . . . . . . . 21 R. Feldmann, M. Gairing, Thomas L¨ ucking, Burkhard Monien, and Manuel Rode Process Algebraic Frameworks for the Specification and Analysis of Cryptographic Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Roberto Gorrieri and Fabio Martinelli Semantic and Syntactic Approaches to Simulation Relations . . . . . . . . . . . . . 68 Jo Hannay, Shin-ya Katsumata, and Donald Sannella On the Computational Complexity of Conservative Computing . . . . . . . . . . 92 Giancarlo Mauri and Alberto Leporati Constructing Infinite Graphs with a Decidable MSO-Theory . . . . . . . . . . . . . 113 Wolfgang Thomas Towards a Theory of Randomized Search Heuristics . . . . . . . . . . . . . . . . . . . . 125 Ingo Wegener
Contributed Papers Adversarial Models for Priority-Based Networks . . . . . . . . . . . . . . . . . . . . . . . 142 ` C. Alvarez, M. Blesa, J. D´ıaz, A. Fern´ andez, and M. Serna On Optimal Merging Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Kazuyuki Amano and Akira Maruoka Problems which Cannot Be Reduced to Any Proper Subproblems . . . . . . . . 162 Klaus Ambos-Spies ACID-Unification Is NEXPTIME-Decidable . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Siva Anantharaman, Paliath Narendran, and Michael Rusinowitch Completeness in Differential Approximation Classes . . . . . . . . . . . . . . . . . . . . 179 G. Ausiello, C. Bazgan, M. Demange, and V. Th. Paschos On the Length of the Minimum Solution of Word Equations in One Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 Kensuke Baba, Satoshi Tsuruta, Ayumi Shinohara, and Masayuki Takeda
X
Table of Contents
Smoothed Analysis of Three Combinatorial Problems . . . . . . . . . . . . . . . . . . . 198 Cyril Banderier, Ren´e Beier, and Kurt Mehlhorn Inferring Strings from Graphs and Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 Hideo Bannai, Shunsuke Inenaga, Ayumi Shinohara, and Masayuki Takeda Faster Algorithms for k-Medians in Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 Robert Benkoczi, Binay Bhattacharya, Marek Chrobak, Lawrence L. Larmore, and Wojciech Rytter Periodicity and Transitivity for Cellular Automata in Besicovitch Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 F. Blanchard, J. Cervelle, and E. Formenti Starting with Nondeterminism: The Systematic Derivation of Linear-Time Graph Layout Algorithms . . . . 239 Hans L. Bodlaender, Michael R. Fellows, and Dimitrios M. Thilikos Error-Bounded Probabilistic Computations between MA and AM . . . . . . . . 249 Elmar B¨ ohler, Christian Glaßer, and Daniel Meister A Faster FPT Algorithm for Finding Spanning Trees with Many Leaves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 Paul S. Bonsma, Tobias Brueggemann, and Gerhard J. Woeginger Symbolic Analysis of Crypto-Protocols Based on Modular Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 Michele Boreale and Maria Grazia Buscemi Denotational Testing Semantics in Coinductive Form . . . . . . . . . . . . . . . . . . . 279 Michele Boreale and Fabio Gadducci Lower Bounds for General Graph–Driven Read–Once Parity Branching Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290 Henrik Brosenne, Matthias Homeister, and Stephan Waack The Minimal Graph Model of Lambda Calculus . . . . . . . . . . . . . . . . . . . . . . . . 300 Antonio Bucciarelli and Antonino Salibra Unambiguous Automata on Bi-infinite Words . . . . . . . . . . . . . . . . . . . . . . . . . . 308 Olivier Carton Relating Hierarchy of Temporal Properties to Model Checking . . . . . . . . . . . 318 ˇ a and Radek Pel´ Ivana Cern´ anek Arithmetic Constant-Depth Circuit Complexity Classes . . . . . . . . . . . . . . . . . 328 Hubie Chen
Table of Contents
XI
Inverse NP Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 Hubie Chen A Linear-Time Algorithm for 7-Coloring 1-Planar Graphs . . . . . . . . . . . . . . . 348 Zhi-Zhong Chen and Mitsuharu Kouno Generalized Satisfiability with Limited Occurrences per Variable: A Study through Delta-Matroid Parity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358 Victor Dalmau and Daniel K. Ford Randomized Algorithms for Determining the Majority on Graphs . . . . . . . . 368 Gianluca De Marco and Andrzej Pelc Using Transitive–Closure Logic for Deciding Linear Properties of Monoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378 Christian Delhomm´e, Teodor Knapik, and D. Gnanaraj Thomas Linear-Time Computation of Local Periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388 Jean-Pierre Duval, Roman Kolpakov, Gregory Kucherov, Thierry Lecroq, and Arnaud Lefebvre Two Dimensional Packing: The Power of Rotation . . . . . . . . . . . . . . . . . . . . . 398 Leah Epstein Approximation Schemes for the Min-Max Starting Time Problem . . . . . . . . 408 Leah Epstein and Tamir Tassa Quantum Testers for Hidden Group Properties . . . . . . . . . . . . . . . . . . . . . . . . 419 Katalin Friedl, Fr´ed´eric Magniez, Miklos Santha, and Pranab Sen Local LTL with Past Constants Is Expressively Complete for Mazurkiewicz Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 Paul Gastin, Madhavan Mukund, and K. Narayan Kumar LTL with Past and Two-Way Very-Weak Alternating Automata . . . . . . . . . 439 Paul Gastin and Denis Oddoux Match-Bounded String Rewriting Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 Alfons Geser, Dieter Hofbauer, and Johannes Waldmann Probabilistic and Nondeterministic Unary Automata . . . . . . . . . . . . . . . . . . . 460 Gregor Gramlich On Matroid Properties Definable in the MSO Logic . . . . . . . . . . . . . . . . . . . . 470 Petr Hlinˇen´y Characterizations of Catalytic Membrane Computing Systems . . . . . . . . . . . 480 Oscar H. Ibarra, Zhe Dang, Omer Egecioglu, and Gaurav Saxena
XII
Table of Contents
Augmenting Local Edge-Connectivity between Vertices and Vertex Subsets in Undirected Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490 Toshimasa Ishii and Masayuki Hagiwara Scheduling and Traffic Allocation for Tasks with Bounded Splittability . . . . 500 Piotr Krysta, Peter Sanders, and Berthold V¨ ocking Computing Average Value in Ad Hoc Networks . . . . . . . . . . . . . . . . . . . . . . . . 511 Miroslaw Kutylowski and Daniel Letkiewicz A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences of Basic Parallel Processes . . . 521 Slawomir Lasota Solving the Sabotage Game Is PSPACE-Hard . . . . . . . . . . . . . . . . . . . . . . . . . 531 Christof L¨ oding and Philipp Rohde The Approximate Well-Founded Semantics for Logic Programs with Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541 Yann Loyer and Umberto Straccia Which Is the Worst-Case Nash Equilibrium? . . . . . . . . . . . . . . . . . . . . . . . . . . 551 Thomas L¨ ucking, Marios Mavronicolas, Burkhard Monien, Manuel Rode, Paul Spirakis, and Imrich Vrto A Unique Decomposition Theorem for Ordered Monoids with Applications in Process Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562 Bas Luttik Generic Algorithms for the Generation of Combinatorial Objects . . . . . . . . . 572 Conrado Mart´ınez and Xavier Molinero On the Complexity of Some Problems in Interval Arithmetic . . . . . . . . . . . . 582 K. Meer An Abduction-Based Method for Index Relaxation in Taxonomy-Based Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592 Carlo Meghini, Yannis Tzitzikas, and Nicolas Spyratos On Selection Functions that Do Not Preserve Normality . . . . . . . . . . . . . . . . 602 Wolfgang Merkle and Jan Reimann On Converting CNF to DNF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612 Peter Bro Miltersen, Jaikumar Radhakrishnan, and Ingo Wegener A Basis of Tiling Motifs for Generating Repeated Patterns and Its Complexity for Higher Quorum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622 N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot
Table of Contents
XIII
On the Complexity of Some Equivalence Problems for Propositional Calculi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632 Steffen Reith Quantified Mu-Calculus for Control Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 642 St´ephane Riedweg and Sophie Pinchinat On Probabilistic Quantified Satisfiability Games . . . . . . . . . . . . . . . . . . . . . . . 652 Marcin Rychlik A Completeness Property of Wilke’s Tree Algebras . . . . . . . . . . . . . . . . . . . . . 662 Saeed Salehi Symbolic Topological Sorting with OBDDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671 Philipp Woelfel Ershov’s Hierarchy of Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681 Xizhong Zheng, Robert Rettinger, and Romain Gengler
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
Distributed Quantum Computing Harry Buhrman and Hein R¨ ohrig Centrum voor Wiskunde en Informatica (CWI) P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
[email protected],
[email protected] Abstract. Quantum computing combines the framework of quantum mechanics with that of computer science. In this paper we give a short introduction to quantum computing and survey the results in the area of distributed quantum computing and its applications to physics.
1
Introduction
Computing is a physical process and therefore the theory of computing should incorporate the laws of physics. Quantum mechanics, developed during the last century, is to date the most accurate description of nature. Quantum computing is the area that combines the laws of quantum mechanics with computer science. In this paper we give a short introduction to quantum computing and its formalism; for a more detailed treatment of this we refer the reader to the excellent textbook of Nielsen and Chuang [46]. Quantum bits or “qubits” are the basic building blocks for quantum computers. As was shown already in the seventies by Holevo [30], qubits cannot be used to compress messages better than with bits. In general, a k-bit message needs also k qubits to be stored or sent over a channel. Qubits can, however, reduce the communication of certain distributed computational tasks, as was first demonstrated in [22] and subsequent papers, among them [17,23,21,48,7]. We survey some of these results here. The first result of a cheaper quantum than classical communication protocol [22] was inspired by nonlocality experiments constructed by physicist in order to test the strange and nonlocal behavior of entanglement. In 1935, Einstein, Podolsky, and Rosen devised a thought experiment that sought to show how quantum mechanics is incomplete because it would allow for some form of faster-than-light communication. Much later, in 1964, Bell [10] came up with an experimental way of testing the nonlocal behavior of quantum mechanics. These tests and the so-called Bell inequalities lead to experiments [9], that seem to demonstrate the nonlocality of quantum mechanics. However, these tests suffered from the drawback that implementations in the lab are error prone and sometimes do not give the right outcome or none at all. When the classical local theory is also allowed to make such errors, it can be shown that the nonlocality tests can also be explained by classical
Supported in part by the EU fifth framework project RESQ, IST-2001-37559, and NWO grant 612.055.001.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 1–20, 2003. c Springer-Verlag Berlin Heidelberg 2003
2
Harry Buhrman and Hein R¨ ohrig
physics, and do not demonstrate the nonlocal behavior of quantum mechanics at all! In this paper we survey how the results obtained in quantum communication complexity can be used to propose nonlocality experiments that are robust against errors and would for the first time demonstrate conclusively nonlocality. The paper is organized as follows. In Sect. 2 we give a short introduction to quantum mechanics and the notation used in this paper. In Sect. 3 we describe the quantum analogue of the black-box model of computation and describe one of the first quantum algorithms, due to Deutsch and Jozsa. Section 4 surveys some of the results in distributed computing.
2
Quantum Mechanics and Computing
One of the main and very counterintuitive features of quantum mechanics is the superposition principle. A physical system may be in a superposition of two or more different states at the same time. Quantum mechanics prescribes that when we observe such a system we see one of these states with a certain probability resulting in a collapse of the system into the state that we observed. 2.1
Qubits, Superposition, and Measurement
Let us concentrate now to computation. Classically a bit can be in any of two states: 0 or 1. Quantum mechanically a quantum bit or qubit may be in a superposition of both 0 and 1. It is useful to describe such systems as vectors in a finite-dimensional Hilbert space, in this case a two-dimensional one. We 1 will identify the vector with the symbol |0 to denote the classical bit 0 0 0 and vector with the symbol |1 denoting the classical bit 1. This notation 1 is called Dirac or “ket” notation, from “bra-ket.” The “bra” is | and a|b denotes the inner product between a and b. Quantum mechanics now allows for a superposition of these two classical states: α α|0 + β|1 = , (1) β where α and β, called amplitudes, are complex numbers with the property that |α|2 + |β|2 = 1 .
(2)
Next, observing or measuring a qubit α|0 + β|1 will yield outcome 0 with probability |α|2 and 1 with probability |β|2 . Moreover, after this measurement the qubit is either in the classical state |0 if we measured a 0, and in |1 if we measured a 1. Note that equation (2) guarantees that a qubit, when measured, indeed induces a probability distribution over 0 and 1. Let us try to plug in some values for α and β: 1 1 √ |0 + √ |1 2 2
(3)
Distributed Quantum Computing
3
Observing this qubit will result with probability 0.5 in seeing a 0 and with probability 0.5 in a 1. In general, our system will consist of more than just one qubit. Equations (1) and (2) generalize in the obvious way. Suppose we want to model k qubits. Classically k bits can be in any of 2k different configurations: 1 . . . 2k . This means that k qubits can be in a superposition of all (or part) of these 2k basis states: k
k
α1 |00 . . . 0 + . . . + α2k |11 . . . 1 =
αi |i
(4)
i∈{0,1}k
with the additional requirement that
|αi |2 = 1 .
(5)
i∈{0,1}k
When observing these k qubits we will see i with probability |αi |2 . If we have two qubits |x = α0 |0+α1 |1 and |y = β0 |0+β1 |1 then |x⊗|y are the two qubits in a four-dimensional Hilbert space. This construction is called the tensor or Kronecker product: |x ⊗ |y = (α0 |0 + α1 |1) ⊗ (β0 |0 + β1 |1) = α0 β0 |00 + α0 β1 |01 + α1 β0 |10 + α1 β1 |11. By convention, |0 ⊗ |0, |0|0, and |00 denote the same thing. In general not all the two-qubit states that satisfy (4) and (5) are obtained as the tensor of two qubits. We will see an important example, the EPR pair, in Subsect. 2.3. Such states are called entangled. 2.2
Unitary Operations
Next we would like to model operations on qubits. Quantum mechanics tells us that these operation have to be modeled as linear operations with the additional constraint that these operations preserve the probability interpretation, i.e., the squares of the amplitudes sum up to 1 (see (2) and (5)). Such transformations are called unitary; they are the square matrices U that satisfy UU∗ = I , where U ∗ is the complex conjugate transpose of U and I is the identity matrix. In terms of computation, the unitarity constraint implies that the computation is reversible. The following transformation on a single qubit is important and very useful. It is called the Hadamard transform. 1 1 1 H=√ (6) 2 1 −1 It is a unitary operation since:
4
Harry Buhrman and Hein R¨ ohrig
1 1 1 1 1 1 10 √ ·√ = 01 2 1 −1 2 1 −1 Now let us do a Hadamard operation on a qubit that is in the classical state |0: 1 1 1 1 1 1 √ · =√ 0 2 1 −1 2 1 This is in ket notation √12 |0 + √12 |1, which is the random qubit from (3). When we apply the Hadamard transform again on this qubit, 1 1 1 1 1 1 1 1 + √ , (7) ·√ = 21 21 = 0 1 −1 1 − 2 2 2 2 we get the |0 again. The important point is the minus sign in the Hadamard transform. Its effect is illustrated in (7) above. The minus sign caused the 12 − 12 in the lower half of the vector to cancel out, or interfere destructively, while both terms in the upper half interfered constructively. It is the superposition principle together with this interference behavior that gives quantum computing its power. The tensor product is also defined on linear operations. If we have an m × n matrix A and an m × n matrix B then A ⊗ B is a (m · m ) × (n · n ) matrix defined as: a1,1 · B a1,2 · B . . . a1,n · B a2,1 · B a2,2 · B . . . a2,n · B .. .. . . .. . . . . am,1 · B am,2 · B . . . am,n · B 2.3
Einstein-Podolsky-Rosen Paradox
In Sect. 2.1 we have seen that any set of k qubits is admissible if it satisfies (4) and (5). Bearing this in mind let us examine the following state consisting out of 2 qubits: 1 1 √ |00 + √ |11 (8) 2 2 Note that the first 0 and the first 1 form the first qubit and the second 0 and the second 1 form the second qubit. This state is called the “EPR state” after its inventors Einstein, Podolsky, and Rosen [25]. The purpose of this state was to devise a thought experiment to show the incompleteness of quantum mechanics. Imagine that we have this EPR state and that Alice has the first qubit somewhere on Mars and that Bob has the second, say, here on earth. If Alice measures her qubit she will see a 0 or a 1 with equal probability and the state will have collapsed to either |00, if she saw a 0 or |11 in case it was a 1. The same is true for Bob. This leads to the following situation. Suppose that the first qubit, on Mars, was measured first and that Alice saw a 1. This now means that when Bob measures his qubit he will also measure a 1. It appears that some information, i.e., the outcome of Alice’s measurement, has somehow traveled to
Distributed Quantum Computing
5
earth instantaneously. Since nothing can travel faster than the speed of light something must be wrong. It turns out that EPR pairs cannot be used for communication: straightforward arithmetic shows that the probabilities of Bob obtaining a certain measurement outcome are not changed no matter what Alice does. However, they can be used to reduce communication complexity as we are going to see in Sect. 4.2. Classical bits can be copied. Qubits on the other hand can not be copied. Theorem 1. [24,52] Qubits cannot be copied The reason for this is that the copy-qubit operation is not linear and, hence, not unitary. Suppose we had a linear operation Uc that would copy a qubit. This means on state (α|0 + β|1) ⊗ |0 it would do the following: Uc [(α|0 + β|1) ⊗ |0] = (α|0 + β|1) ⊗ (α|0 + β|1)
(9)
= α |00 + αβ|01 + αβ|10 + β |11 2
2
(10)
On the other hand, since Uc is linear and because (α|0 + β|1) ⊗ |0 = α|00 + β|10: Uc [α|00 + β|10] = α|00 + β|11 (11) It is clear that (10) and (11) are the same if and only if α = 1 and β = 0 or α = 0 and β = 1. This is precisely the case if we have a classical 0 or 1. Hence, there cannot be a linear operation that copies an arbitrary unknown qubit. Now imagine that Alice has an unknown qubit |x = α|0 + β|1 that she wants to send to Bob and that she furthermore can only communicate using classical bits. Is it in this case possible for Alice to communicate |x to Bob? In the light of the no-cloning Theorem 1 it certainly is impossible to do this since whenever she measures x she will destroy/collapse it to a classical bit and she cannot copy it first. But suppose that Alice and Bob in addition each share one half of an EPR pair (8). The surprising observation is that there is a scheme that allows Alice to send or “teleport” |x to Bob using only 2 classical bits [11]. In operational terms, the scheme works as follows. Let φ+ be the first part of an EPR pair and φ− the other half. That is, φ+ is the first bit of √12 [|00 + |11] and φ− the second bit. Alice has φ+ and Bob has φ− . At some point Alice gets the unknown qubit |x = α|0 + β|1. She now does a unitary operation1 on the two qubits, i.e., φ+ and x. Then she measures these two qubits, obtaining two bits: 00, 01, 10, or 11. Next she send these two bits to Bob, who depending on the two bits, does one of four unitary operations on his φ− . It turns out that this last unitary operation changes2 φ− into the unknown qubit |x. After the protocol, the EPR pair is destroyed, so in order to repeat this procedure a fresh EPR pair is needed. 1 2
The unitary operation is a controlled-not of x on φ+ , followed by a Hadamard on x. In fact after the controlled-not and the Hadamard transform of Alice, it follows that their joint state is: |00(α|0+β|1)+|01(α|1+β|0)+|10(α|0−β|1)+|11(α|1− β|0). This means that after Alice does her measurement, the third bit, i.e., φ− , is the unknown qubit x up to a possible bit flip and/or phase shift depending on the outcome of Alice’s measurement.
6
Harry Buhrman and Hein R¨ ohrig
The important point for communication complexity is that this teleportation scheme is a way to simulate a qubit channel between Alice and Bob with a classical channel, at the cost of two bits per qubit, whenever Alice and Bob share EPR pairs. Theorem 2. [11] When Alice and Bob share EPR pairs, they can simulate a qubit channel with a classical bit channel at the cost of two classical bits per qubit.
3
Quantum Black-Box Computation
Perhaps the simplest form of a computational task is the following. Suppose we have n Boolean variables X0 , . . . , Xn−1 , and we want to compute a property P (X0 , . . . , Xn−1 ). The goal is to compute P looking at as few variables as possible. For example, suppose P (X0 , . . . , Xn−1 ) = 1 iff there exists an i such that Xi = 1. That is, we want to compute the OR(X0 , . . . , Xn−1 ). How many variables do we have to query? It is not too hard to see that we have to look at all the variables. A similar kind of reasoning shows that also in the randomized setting the bound is Ω(n). It has been shown by Grover [29] that a quantum √ algorithm can solve the OR with only O( n) quantum queries. Next we will turn our attention to another problem that allows even an exponential speedup. Define the following promise on the variables. We are guaranteed that they are either constant (i.e., all the Xi are either all 0 or all 1) or they are balanced: exactly half the Xi are 0 and the other half is 1. The problem is to find out whether the variables are constant or balanced. It is easy to see that classically this problem requires n/2 + 1 queries to the variables. One of the first quantum algorithms, by Deutsch and Jozsa [33], establishes that this problem can be solved with just a single quantum query! Before we explain this algorithm we first have to explain how we model a quantum query. Quantum Query. We have to model a quantum query in such a way that it is a unitary operation. We define a quantum query to variable Xi as follows. The state |i, 0 becomes after the query |i, Xi and |i, 1 becomes |i, 1 − Xi . That is, for 1 ≤ i ≤ n and b ∈ {0, 1} : |i, b → |i, b ⊕ Xi Since this describes what a query does on basis states, because of linearity it also works on states that are in superposition: αi,b |i, b → αi,b |i, b ⊕ Xi . (12) i∈{0,1}log(n) ,b∈{0,1}
i∈{0,1}log(n) ,b∈{0,1}
It can be easily checked that this operation is unitary.
Distributed Quantum Computing
7
The Deutsch-Jozsa Algorithm. Suppose n is a power of 2 and l = log n. We start in a state with l 0s followed by a 1: |0l 1 Remember the Hadamard transform H on one qubit from (6). We do a Hadamard transform on all the qubits of the state, i.e., the following operation l+1
H ⊗ H ⊗ . . . ⊗ H = H ⊗l+1 . This will result in the following state: 1 √ n
1 |i √ (|0 − |1) 2 i∈{0,1}l
(13)
Then we perform the only quantum query. This will affect our state according to (12) as follows: 1 1 √ (−1)Xi |i √ (|0 − |1) (14) n 2 l i∈{0,1}
To see that this is correct, first observe that we perform the quantum query with the target qubit in superposition (|0 − |1) This means that |i √12 (|0 − |1) after the query becomes |i √12 (|0 ⊕ Xi − |1 ⊕ Xi ). Furthermore, if Xi is 0 then this is simply |i √12 (|0 − |1); on the other hand if Xi = 1 then it becomes |i √12 (|1 − |0), which is the same as (−1)|i √12 (|0 − |1). Hence, we get a factor of −1 iff Xi = 1. Next we apply again H ⊗l+1 to the state and obtain the following messy-looking expression: 1 √ n
i∈{0,1}l
1 √ n
(−1)Xi ⊕(i,j) |j|1 ,
(15)
j∈{0,1}l
where (i, j) is the inner-product between i and j modulo 2. Let us take a closer look at the part of this sum where j = 0l : 1 n
(−1)Xi |0l |1
(16)
i∈{0,1}l
Suppose that all the X i = 0 and we are in the case “variables constant 0.” Then (16) boils down to: n1 i∈{0,1}l |0l |1 = |0l 1. For the “constant 1” case we will end up in (−1)|0l 1. This means that when we observe the final state in (15), we will see 0l 1 with probability 1. On the other hand, if half of the Xi = 1 and the other half are 0, then half of the terms in (16) are 1 and the other half are −1 and cancel each other out. The result of this is that |0l 1 has amplitude 0 and will be seen with probability 0. So by observing state (15) we can conclude that if we observe 0l 1 we are in the constant case and if we observe anything else we are in the balanced case.
8
4 4.1
Harry Buhrman and Hein R¨ ohrig
Applications in Distributed Computing Communication
One of the main themes in quantum information processing is to extend classical communication and communication schemes with quantum ones. Here we will consider three models of quantum communication and compare them with classical communication. 1. Communication is done with qubits. 2. Both parties share EPR pairs but communication is done via a classical-bit channel. 3. Both parties share EPR pairs and communication is done with qubits. The most simple form of communication is where Alice wants to send a message m of say k bits to Bob. We know that classically in general Alice needs to send k bits to Bob. Is this still true in the setting 1, 2, and 3? It follows from a theorem of Holevo [30] that when only qubits are used for communication Alice still needs to send k qubits. Moreover Cleve et al. [23] show that the same is true when both parties share EPR-pairs and classical communication is used. For the third variant, where both EPR pairs and qubits are used, things are slightly different. Bennett and Wiesner [12] show that in this case there is a kind of a reverse of Theorem 2. This is a scheme, called super-dense coding, that allows Alice to send two classical bits with one qubit to Bob provided they share an EPR pair. It can be shown that, like Holevo’s theorem, this is optimal. 4.2
Communication Complexity
Communication Complexity was introduced by Yao and Abelson [2,53]. Alice has an n-bit string x and Bob has an n-bit string y and their goal is to compute some function f : {0, 1}n × {0, 1}n → {0, 1}, minimizing the number of bits they communicate to each other. The area of communication complexity is well studied, see for example the books by Kushilevitz and Nisan [37] and Hromkoviˇc [32]. The question we want to address here is: how does the communication complexity of certain problems vary when different models of quantum communication are used. We will denote C(f ) to denote the classical communication complexity of f . That is the number of bits the optimal protocol uses on the worst-case input. The model where only qubits can be used for communication (model 1, Sect. 4.1) was introduced by Yao [54]. We will use Q(f ) for the quantum communication complexity in the model where only qubits are used for communication. The first results in that model were lower bounds or impossibility results due to Yao and Kremer [36] and we will discuss them in Sect. 3. The model where the communication is classical but both parties share entanglement, model 2, was introduced by Cleve and Buhrman [22]. We will denote the communication complexity in this model with C ∗ (f ), the model which uses both EPR pairs and qubits will be Q∗ (f ). Cleve and Buhrman were the first to show that communication complexity can be reduced contrary to what one might
Distributed Quantum Computing
9
believe considering Holevo’s theorem. Their setting differed slightly from the models we discuss here. In this setting they exhibit an example of a three party communication problem where the three parties share an entangled state, like an EPR pair but then for three parties. It is shown that when the parties share this entangled state the communication problem can be solved with two bits of communication whereas without such a prior shared state three bits are necessary. That is, there is a function f such that C ∗ (f ) = 2 whereas C(f ) ≥ 3. Better separations in the multiparty setting were found in [15] and [21]. The latter paper exhibits a function f for k parties such that C ∗ (f ) = k and C(f ) = Ω(k log(k)). Next we will turn our attention to the qubit communication model Q(f ). However, keep in mind that protocols for this model can be translated to the model where both parties share EPR pairs and communicate classically, since via teleportation, Theorem 2 gives us: C ∗ (f ) ≤ 2Q(f ). Deutsch-Jozsa Communication Problem. The first gap for two-party qubit communication complexity was demonstrated by Buhrman, Cleve, and Wigderson [17]. They showed for a promise version of the equality problem3 , EQ , that Q(EQ ) = O(log(n)) and that also C(EQ ) = Ω(n). This exhibits an exponential gap between classical and quantum communication complexity. The quantum protocol is inspired by the Deutsch-Jozsa algorithm from Sect. 3 and the classical bound stems from a deep and surprising combinatorial theorem from Frankl and R¨ odl [27]. EQ (x, y) = 1 iff x = y but with the extra promise that it will always be the case that the Hamming distance ∆(x, y) = 0 or n/2. The Hamming distance between two strings x and y, ∆(x, y), is the total number of bits where x and y are different. We will see that EQ can be solved with just log(n) + 1 qubits of communication from Bob to Alice. Note that under the Hamming distance promise, Alice and Bob have to figure out whether x1 ⊕ y1 . . . xn ⊕ yn is constant or balanced, since in the constant 0 case x = y and in the balanced x = y. So if we set Xi = xi ⊕ yi then we have the Deutsch-Jozsa problem back. If Alice could obtain the final state from equation (15), 1 √ n
i∈{0,1}l
1 √ n
(−1)Xi ⊕(i,j) |j|1 ,
j∈{0,1}l
she would do a final measurement and know the answer. To this end Bob prepares the following state: 1 √ n 3
1 |i √ (|0 ⊕ yi − |1 ⊕ yi ) 2 i∈{0,1}l
EQ(x, y) = 1 if x = y and 0 otherwise. EQ requires n bits of communication. A promise version of a problem means that Alice and Bob are only required to compute the answer correctly on certain instances that fall within the promise and it doesn’t matter what they compute on the other instances that don’t satisfy the promise.
10
Harry Buhrman and Hein R¨ ohrig
and sends these log(n) + 1 qubits to Alice. Alice then performs the unitary transformation that changes state |i|b to |i|b ⊕ xi resulting in state: 1 √ n
1 |i √ (|0 ⊕ yi ⊕ xi − |1 ⊕ yi ⊕ xi ) 2 i∈{0,1}l
which is after we rewrite it precisely the state from (14): 1 √ n
1 (−1)Xi |i √ (|0 − |1) 2 i∈{0,1}l
Next Alice proceeds as in the Deutsch-Josza algorithm and applies H ⊗ log(n)+1 and measures the final state. The general idea is to use a quantum black-box algorithm in a distributed setting. Whenever the black-box algorithm wants to make a query, Alice and Bob exchange a round of log(n) + 1 qubits and Alice continues the black-box algorithm. This allows one in general to use any black-box algorithm as a communication protocol. In this way it can be shown that, by using √ Grover’s algorithm [29] the Disjointness problem can be solved with O( n log(n)) many qubits [17]. Bounded-Error Protocols. All the above (quantum) protocols don’t make errors and compute the outcome exactly. When studying randomized versions of communication complexity, however, it is unavoidable to introduce errors. A classical randomized protocol for f , R2 (f ), is a protocol where both Alice and Bob can use random bits. They are required to compute the correct outcome with probability at least 2/3. The distinction between private and public random bits can be made, where in the public bit/coin model Alice and Bob see the same random bits and in the private they each have a different random source. Newman [45] has shown that up to an additive logarithmic term the models are the same. Rabin and Yao show for EQ that there exists a classical randomized protocol that only needs O(log(n)) bits: R2 (EQ) = O(log(n)). This implies that the promise problem EQ also has a O(log(n)) randomized classical bit protocol that is correct with probability at least 2/3. Note, however, that the quantum protocol never makes an error. The disjointness problem DISJ is defined as follows. Alice and Bob each have a subset A and B of {0, 1}n , they have to decide whether A ∩ B = ∅. Kalyanasundaram and Schnitger [34] show that this problem also has high communication complexity in the randomized setting: R2 (DISJ) = Ω(n). Buhrman et al. in the same paper show that when we allow the quantum protocol to compute the answer with probability at least 2/3, we denote this by Q2 (f ), that √ √ Q2 (DISJ) = O( n log(n)). This bound was improved by [31] and a O( n) protocol was recently constructed by Aaronson and Ambainis [1]. Razborov has shown, using some variant of the polynomial method, that this bound is tight [49]
Distributed Quantum Computing
11
The disjointness problem demonstrates a quadratic gap between classical randomized and quantum communication complexity. Moreover this is an example of a gap known where the function f is not a promise problem. The only other known total problem that allows for a more efficient quantum protocol is that of the equality problem in the simultaneous message passing model [16]. The main ingredients to the protocol are a quantum fingerprinting scheme and a test to distinguish orthogonal states from parallel ones. In the simultaneous message passing model Alice and Bob don’t send messages to each other but send one message to a third party, called the referee. The referee only sees the messages from Alice and Bob and has to output f (x, y). The biggest gap between the randomized and the quantum two party communication complexity model was obtained by Raz [48]. He showed that √ there is a promise problem f such that Q(f ) = O(log(n)) but R2 (f ) = Ω( n). Ambainis et al. [7] also exhibit an exponential gap between quantum protocols and classical protocols for a different form of communication problem called sampling which we shall not discuss here further. Summarizing for promise problems there exist exponential gaps between classical and quantum communication complexity. For total problems the best known gap is only quadratic. In turn this sheds some light on the EPR paradox. Holevo’s theorem proves that EPR pairs cannot be used to reduce communication. Since all the protocols in this section work for the model where the parties share EPR pairs and communicate classically it follows that EPR pairs can reduce the communication complexity of certain problems. This situation seems contradictory but notice that the actual amount of information that needs to be communicated between Alice and Bob is only 1 bit, namely the outcome of f . Lower Bounds. In the previous section we showed that quantum communication protocols are sometimes superior to classical protocols. In this section we examine the converse and turn our attention to lower bounds for quantum communication complexity. Classically for deterministic communication complexity there is a general technique for proving lower bounds. For any function f : {0, 1}n × {0, 1}n → {0, 1} one can define the boolean 2n × 2n communication matrix Mf (x, y) = f (x, y). Mehlhorn and Schmidt [44] related the rank of this matrix to the communication complexity. They show that log(rank(Mf )) ≤ C(f ). This is a very useful tool. Take for example the equality problem. The communication complexity matrix for EQ is the 2n × 2n identity matrix which has only 1’s on the diagonal and is 0 on off-diagonal entries. Since this matrix has rank 2n it follows that C(f ) ≥ n. A similar statement is true in the quantum setting: Theorem 3. For any communication problem f : 1. log(rank(Mf ))/2 ≤ Q(f ) [36]. 2. log(rank(Mf )) ≤ C ∗ (f ) [18]. 3. log(rank(Mf ))/2 ≤ Q∗ (f ) [18].
12
Harry Buhrman and Hein R¨ ohrig
A natural and long standing open problem is whether the communication complexity is also a lower bound for the log-rank. That is, whether the log-rank characterizes the communication complexity. The biggest known gap between the log-rank and the communication complexity is almost quadratic [47]. The log-rank conjecture states that for every total f , log(rank(f )) and C(f ) are all polynomially related. It follows from Theorem 3 that if the log-rank conjecture is true then for total f : Q(f ), C ∗ (f ), Q∗ (f ), and C(f ) are polynomially related. The log rank lower bound method only works well for errorless protocols. For bounded error models there is another bound called discrepancy. Kremer [36] and Yao show that the discrepancy bound also works for the bounded error qubit communication model Q2 . This enables them to show a linear lower bound in this model for a problem called inner product modulo 2, IP . Here IP (x, y) = x1 · y1 + · · · + xn · yn mod 2. Ambainis et al. [7] extend this bound to also yield a Ω(n) bound even when Alice and Bob are allowed to make an error which is very close to 1/2. For the model where both parties share EPR pairs, Cleve et al. [23] were the first to show a linear lower bound for IP . They came up with a new technique that is essentially quantum mechanical in nature. It can be seen as a quantum adversary argument. This enabled them to show that any (quantum) protocol for IP can be (ab)used, when run in superposition, to communicate n bits from Alice to Bob. Let Q∗2 (f ) denote the communication complexity of f where Alice and Bob compute f correctly with probability 2/3, they share EPR pairs and the communication is with qubits. Theorem 3 yields a lower bound of Ω(n) for DISJ in the errorless models since the MDISJ has rank 2n . In the bounded error model √ recently Razborov showed, in a very nice paper, that the DISJ needs Ω( n) qubits of communication even in the presence of shared EPR pairs. Summarizing we have the following theorem: (IP ) = Ω(n) [36,23]. Theorem 4. 1. Q∗2√ 2. Q∗2 (DISJ) = Ω( n) [49] 4.3
Loopholes in Nonlocality Experiments
Tools and results from the study of quantum communication complexity have been applied fruitfully to tune parameters in physical experiments that test the “quantumness” of our world. The EPR paradox has been and still is a subject of dispute. Much progress was made when Bell [10] came up with a test that would, in case quantum mechanics was correct, show correlations that could not be explained with just classical reasoning. Such nonlocality experiments have been performed in the lab and non-classical correlations have been observed [9]. However, experimental realizations of the nonlocality tests are hampered by noise and imperfections in the physical apparatus. In particular, measurement devices for individual quantum systems (e.g., single-photon detectors) tend to fail on most runs of the experiment, allowing local classical explanations of the data by means of local classical theories that are allowed to make the same kind
Distributed Quantum Computing
13
of errors and this opens the so-called “detection loophole.” Ideas from quantum communication complexity have been used by Brassard et al. [14], Massar [40], and Buhrman et al. [19] to propose new nonlocality experiments and to bound the maximum detector efficiency, minimum noise, and hidden communication using which the results can be explained by means of a classical local model. The goal is to construct an experiment that demonstrates the nonlocal character of quantum mechanics even when the experiments are faulty and make errors. An experiment is modeled as two (or more) parties Alice and Bob that each have an input of length n. However, contrary to the communication complexity model, Alice and Bob are not allowed to communicate with each other. In the classical setting Alice and Bob share a common source of random bits, and in the quantum scenario Alice and Bob share EPR pairs or more generally an entangled state. Alice now will depending on her input and her random bits (or some operation on her part of the EPR pairs) output some string a of m bits. Bob follows some protocol to also output m bits b. This way they produce correlation distributions Pr[a, b | x, y]. The goal now is to come up with a set of correlation distributions and show that there is a quantum protocol that generates these distributions whereas every classical protocol fails to do so even if it is allowed to make small errors or sometimes not produce an output at all. Deutsch-Josza Correlations. To demonstrate these ideas, we return once more to the Deutsch-Josza problem, following Brassard et al. [14] and Massar [40]. This time, Alice and Bob cannot communicate, but they start out sharing a quantum state, receive classical bit strings x, y ∈ {0, 1}n , respectively; both Alice and Bob produce outputs, a, b ∈ {0, 1}l , respectively, and we are interested in the correlations between these outputs, namely the probability distributions Pr[a, b | x, y] of Alice outputting a and Bob outputting b given that Alice got input x and Bob input y. Recall that the “trick” in turning the Deutsch-Josza algorithm into a communication protocol was to let Bob perform the first steps of the algorithm and then send the quantum state to Alice who completed the steps with her input. Now, since Alice and Bob cannot communicate, we replace the quantum channel by EPR pairs. Alice and Bob start out with the following state comprised of l = log(n) EPR pairs and two auxiliary qubits:
1 √
2 n
|i (|0 − |1) |i (|0 − |1)
i∈{0,1}l
Here, Alice has the first l+1 qubits and Bob the remaining l+1 qubits. Now they pretend that each on her/his side are in the Deutsch-Jozsa algorithm before the oracle query, as given in (13). Accordingly, they perform the operation |i|b → |i|b ⊕ yi on their part of the state, resulting in the following global state: 1 √
2 n
(−1)xi +yi |i (|0 − |1) |i (|0 − |1)
i∈{0,1}l
Then they apply the Hadamard operation on their l +1 qubits, yielding the state
14
Harry Buhrman and Hein R¨ ohrig
1 √ n n
i∈{0,1}l
(−1)xi +yi
(−1)(i,a) |a |1
a∈{0,1}l
1 = √ n n
a,b∈{0,1}l
b∈{0,1}l
(−1)(i,b) |b |1
(−1)xi +yi +(i,a⊕b) |a|1|b|1
i∈{0,1}l
Now they both measure and output their measurement. By the laws of quantum mechanics, the probability for Alice to observe |a|1 and Bob |b|1 is 2 1 Pr[a, b | x, y] = 3 (−1)xi +yi +(i,a⊕b) n l i∈{0,1}
If x = y, then
Pr[a, b | x, y] =
1 n
0
if a = b if a = b
whereas for ∆(x, y) = n/2 and a = b we have Pr[a, b | x, y] = 0. Hence, the outputs are correlated in that whenever x = y, we always see a = b and whenever ∆(x, y) = n/2, we never see a = b. Can these correlations be realized by a classical protocol with shared randomness and no communication? No, since then Bob could send his output to Alice, solving the communication problem with O(log n) bits, which is ruled out by the lower bound of Ω(n). Then, how closely can they be realized approximately, i.e., how precise does an experiment need to be? For the “detection loophole,” it is assumed that any measurement succeeds with probability at least η and if it fails, there will be no output. Then η 2 is the probability that both Alice’s and Bob’s measurements succeed. If the world is classical, then we have an adversary who is trying to reproduce the correlations without communication using the possibility not to produce an output on a η 2 fraction of the runs of the experiment. By the Yao principle there will be for any distribution on the inputs a classical local deterministic strategy which produces a (correct) output for an η 2 fraction of the inputs. Consider the input distribution where x ∈ {0, 1}n is chosen uniformly and random and y = x; fix the best deterministic strategy. Let Za = {x : Alice and Bob output a}, then η 2 2n ≤ |Za | a∈{0,1}l
Moreover, for each a ∈ {0, 1}l , Za ⊆ {0, 1}n must not contain x, y with ∆(x, y) = n/2, therefore, by a deep theorem odl [27], |Za | ≤ 20.993n . This √ by Frankl and R¨ 2 n 0.993n −0.007n implies η 2 ≤ n2 or η ≤ n2 . Hence, with growing n, the detector efficiency at which there still exists a classical local model decreases exponentially. So if the quality of the measurement equipment does not decrease too fast with growing n, the detection loophole can be “closed” with an experiment for the Deutsch-Jozsa correlations.
Distributed Quantum Computing
15
There are several issues with this approach. In a nonlocality experiment, the input distribution should a product distribution so that it can be implemented locally in the lab. Furthermore, there are very efficient classical bounded-error protocols for equality, implying that the quantum correlations above can be very well simulated classically if the experiment is subject to noise. And finally, an asymptotic analysis is often too coarse since the region where the bounds kick in may be out of reach experimentally. Concerning the bounded-error case, a multiparty nonlocality experiment has been constructed, building again on an earlier multiparty quantum communication protocol [21]. This family of experiments has η ≤ 1/k 1/6 and tolerates error 1/2 − 1/ o(k 1/6 ), where k is the number of parties [20]. 4.4
Coin Tossing
Research into quantum cryptography is motivated by two observations about quantum mechanics: 1. Nonorthogonal quantum states cannot be distinguished perfectly and parts of certain orthogonal quantum states cannot be distinguished if the remaining parts are inaccessible; 2. Measurement disturbs the quantum state. This is the so-called “collapse of the wave function.” The second observation hints at the possibility of detecting eavesdroppers or other types of cheaters, whereas the first property appears to allow hiding data, both unhampered by unproven computational assumptions. Indeed, for the task of cooperatively establishing a random bit string between two parties in the presence of eavesdroppers, quantum key distribution [13,41,39] achieves security against the most general attack by an adversary that has unbounded computational power but has to obey the laws of quantum mechanics. Initially, it was thought that these properties would admit protocols for the cryptographic primitive “bit commitment.” In bit commitment, there are two parties Alice and Bob; in the initial phase of the protocol, Alice has a bit b and communicates with Bob to “commit” to the value of b without revealing it. At a later time, Alice “unveils” her bit, allowing Bob to perform checks against the information obtained in the initial phase. The properties sought of bit-commitment protocols are that they are “concealing” (Bob does not learn anything about b in the initial phase) and “binding” (Bob will catch Alice trying to unveil 1 − b instead of b). Unfortunately, Mayers [42] and Lo and Chau [38] proved that perfect quantum bit commitment is impossible. Their impossibility result extends to “coin tossing” [43,38], a weaker cryptographic primitive where the two parties want to agree on a random bit whose value cannot be influenced by either of them. Moreover, the impossibility extends even to the case of “weak coin tossing” [4], where outcome b = 0 is favorable for Alice and outcome b = 1 favorable for Bob, thus ruling out perfect quantum protocols for leader election. However, what
16
Harry Buhrman and Hein R¨ ohrig
turned out to be possible are coin-tossing protocols, where there are guarantees on how much a cheater can bias the outcome. Consider k parties out of which at most k < k are dishonest; which players are dishonest is fixed in advance but unknown to the honest players. The players can communicate over broadcast channels. Initially they do not share randomness, but they can privately flip coins; the probabilities below are with respect to the private random coins. A coin-flipping protocol establishes among the honest players a bit b such that – if all players are honest, Pr[b = 0] = Pr[b = 1] = 1/2 – if up to k players are dishonest, then Pr[b = 0], Pr[b = 1] ≤ 1/2 + is called the bias; a small bias implies that colluding dishonest players cannot strongly influence the outcome of the protocol. Players may abort the protocol. Classically, if a (weak) majority of the players is bad then no bias < 1/2 can be achieved and hence no meaningful protocols exist [50]. For example, if we only have two players and one of them is dishonest, then no protocols with bias < 1/2 exist. (For a minority of bad players, quite nontrivial protocols exist; see [26].) Allowing quantum bits (qubits) to be sent instead of classical bits changes the situation dramatically. Surprisingly, in the two-party case coin flipping with bias < 1/2 is possible, as was first shown in [3]. The best known bias is 1/4 and this is optimal for a special class of three-round protocols [4]; for a bias of at least Ω(log log(1/)) rounds of communication are necessary [4]. Recently, Kitaev (unpublished, see [35,6]) showed that in the two-party case no bias smaller than √ 1/ 2 − 1/2 is possible. In the weak version of the coin-flipping problem, we know in advance that outcome 0 benefits Alice and outcome 1 benefits Bob. In this case, we only need to bound the probabilities of a dishonest Alice convincing Bob that the outcome is 0 and a dishonest Bob convincing Alice that the outcome is 1. In the classical setting, a standard argument shows that even weak coin flipping with a bias < 1/2 is impossible when a majority of the players is dishonest. In the quantum setting, this scenario was first studied for two parties under the name quantum gambling [28]. Subsequently, Spekkens and Rudolph √ [51] gave a quantum protocol for two-party weak coin flipping with bias 1/ 2 − 1/2 √ (i.e., no party can achieve the desired outcome with probability greater than 1/ 2). Notice that this is a better bias than in the best strong coin flipping protocol of [4]. Kitaev’s lower bound for strong coin flipping does not apply to weak coin flipping. Thus, weak protocols with arbitrarily small > 0 may be possible. The only known lower bounds for weak coin flipping are that the protocol of [51] is optimal for a restricted class of protocols [5] and that a protocol must use at least Ω(log log(1/)) rounds of communication to achieve bias (shown in [4] for strong coin flipping but the proof also applies to weak coin flipping). Quantum coin flipping and leader election for more than two parties were investigated by Ambainis et al. [6]: Even if there is only a single honest party among k players, bias 1/2 − c/k 1.78 can still be achieved by a quantum protocol (for some c > 0) and there is a lower bound that for some c > 0, 1/2 − c /k cannot be achieved. Both bounds can be generalized to the situation where at
Distributed Quantum Computing
17
most (1 − )k of the players are bad, for > 0; in this case, bias δ < 1/2 − c 1.78 is achievable independent of the number of players and achieving constant bias δ < 1/2 − c is impossible, for constants c , c > 0.
5
Conclusion and Open Problems
We have surveyed some of the results in quantum distributed computing. Many problems however remain. What is the relationship between the various models, Q, C ∗ , Q∗ both in the errorless and in the bounded error setting? For the errorless models, a positive answer to the log-rank conjecture shows that they are all polynomially related but also this is at the moment still wide open. We have seen that exponential gaps between classical and quantum communication complexity problems are possible, however, all of these examples entailed promise problems. Can there also be exponential gaps for total problems in the bounded error setting? Techniques and protocols from quantum and classical communication complexity can help to construct nonlocality experiments. It remains an open question what the best bounds for two and more parties are for the error of the experiment and the detector efficiency. A question that is sheds some light on the relationship between Q, C ∗ , and ∗ Q is the following. Given a correlation game with two parties that each get inputs of size n and produce outputs of size m, and use an entangled state of a finite amount of qubits. Is there a protocol that uses only an O log(n + m)) qubit entangled state that can be used to approximate, say in terms of small total variation distance, the correlations from the original protocol? Such a statement is true in the classical scenario with respect to the number of shared random bits and has a very similar prove of the fact that for any communication complexity protocol only O(log(n)) shared random bits are needed [45]. Note that if it can be shown that O(log(n)) entangled qubits are enough to simulate an arbitrary communication complexity protocol on inputs of length n then C2∗ , Q∗2 , and Q2 are all related with an additional overhead of O(log(n)) qubits of communication. In the simultaneous message passing model there exists a O(log n) protocol that solves the equality problem. The equality problem is equivalent to the problem of deciding whether x ⊕ y = 0, where x ⊕ y is the bitwise XOR of x and y. No such protocol is known for the three party problem to decide whether x ⊕ y ⊕ z = 0.√The best known quantum protocol is due to Ambainis and Shi [8] who need O( n) qubits to solve this problem. However, the best √ known lower bound for this three party problem in the classical setting is Ω( n) and the best known classical upper bound is O(n2/3 ) Quantum bit commitment and perfect coin tossing have been shown to be impossible, but there are protocols for coin tossing with constant bias. The best known impossibility bound for strong coin tossing matches the bias of the best known protocol for weak coin tossing – it is not clear whether this is a coincidence. Tight bounds on the achievable bias are not known in both cases; we even do not know whether there exists a protocol with a finite number of rounds and qubits that guarantees the optimal bias, or whether there are more and more complex protocols whose biases converge.
18
Harry Buhrman and Hein R¨ ohrig
References 1. S. Aaronson and A. Ambainis. Quantum search of spatial regions. quant-ph/0303041, 2003. 2. H. Abelson. Lower bounds on information transfer in distributed computations. J. Assoc. Comput. Mach., 27(2):384–392, 1980. Earlier version in FOCS’78. 3. D. Aharonov, A. Ta-Shma, U. Vazirani, and A. Yao. Quantum bit escrow. In Proceedings of STOC’00, pages 705–714, 2000. 4. A. Ambainis. A new protocol and lower bounds for quantum coin flipping. In Proceedings of 33rd ACM STOC, pages 134–142, 2001. 5. A. Ambainis. Lower bound for a class of weak quantum coin flipping protocols. quant-ph/0204063, 2002. 6. A. Ambainis, H. Buhrman, Y. Dodis, and H. R¨ ohrig. Multiparty quantum coin flipping. Submitted, 2003. 7. A. Ambainis, L. Schulman, A. Ta-Shma, U. Vazirani, and A. Wigderson. The quantum communication complexity of sampling. In 39th IEEE Symposium on Foundations of Computer Science, pages 342–351, 1998. 8. A. Ambainis and Y. Shi. Distributed construction of quantum fingerprints. quantph/0305022, 2003. 9. A. Aspect, J. Dalibard, and G. Roger. Experimental test of Bell’s inequalities using time-varying analyzers. Phys. Rev. Lett., 49(25):1804, 1982. 10. J. S. Bell. On the Einstein-Podolsky-Rosen paradox. Physics, 1, 1964. 11. C. Bennett, G. Brassard, C. Cr´epeau, R. Jozsa, A. Peres, and W. Wootters. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physiscal Review Letters, 70:1895–1899, 1993. 12. C. Bennett and S. Wiesner. Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states. Physiscal Review Letters, 69:2881–2884, 1992. 13. C. H. Bennett and G. Brassard. Quantum cryptography: Public key distribution and coin tossing. In Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, pages 175–179, 1984. 14. G. Brassard, R. Cleve, and A. Tapp. The cost of exactly simulating quantum entanglement with classical communication. Physical Review Letters, 83(9):1874– 1877, 1999. 15. H. Buhrman, R. Cleve, and W. van Dam. Quantum entanglement and communication complexity. SIAM Journal on Computing, 30(8):1829–1841, 2001. quantph/9705033. 16. H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf. Quantum fingerprinting. Physical Review Letters, 87(16), September 26, 2001. 17. H. Buhrman, R. Cleve, and A. Wigderson. Quantum vs. classical communication and computation. In 30th Annual ACM Symposium on Theory of Computing, 1998. quant-ph/9702040. 18. H. Buhrman and R. de Wolf. Communication complexity lower bounds by polynomials. In 16th IEEE Annual Conference on Computational Complexity (CCC’01), pages 120–130, 2001. cs.CC/9910010. 19. H. Buhrman, P. Høyer, S. Massar, and H. R¨ ohrig. Combinatorics and quantum nonlocality. Accepted for publication in Physical Review Letters, 2002. 20. H. Buhrman, P. Høyer, S. Massar, and H. R¨ ohrig. Resistance of quantum nonlocality to imperfections. Manuscript, 2003. 21. Harry Buhrman, Wim van Dam, Peter Høyer, and Alain Tapp. Multiparty quantum communication complexity. Physical Review A, 60(4):2737 – 2741, October 1999.
Distributed Quantum Computing
19
22. R. Cleve and H. Buhrman. Substituting quantum entanglement for communication complexity. Physical Review A, 56(2):1201–1204, august 1997. 23. R. Cleve, W. van Dam, M. Nielsen, and A. Tapp. Quantum entanglement and the communication complexity of the inner product function. In Springer-Verlag, editor, Proceedings of the 1st NASA International Conference on Quantum Computing and Quantum Communications, 1998. 24. D. Dieks. Communication by EPR devices. Phys. Lett. A, 92(6):271–272, 1982. 25. A. Einstein, B. Podolsky, and N. Rosen. Can quantum-mechanical description of physical reality be considered complete? Phys. Rev., 47:777, 1935. 26. U. Feige. Noncryptographic selection protocols. In Proceedings of 40th IEEE FOCS, pages 142–152, 1999. 27. P. Frankl and V. R¨ odl. Forbidden intersections. Trans. Amer. Math. Soc., 300(1):259–286, 1987. 28. L. Goldenberg, L. Vaidman, and S. Wiesner. Quantum gambling. Physical Review Letters, 88:3356–3359, 1999. 29. L. Grover. A fast quantum mechenical algorithm for database search. In 28th ACM Symposium on Theory of Computing, pages 212–218, 1996. 30. A. S. Holevo. Bounds for the quantity of information transmitted by a quantum communication channel. Problemy Peredachi Informatsii, 9(3):3–11, 1973. English translation in Problems of Information Transmission, 9:177–183, 1973. 31. P. Høyer and R. de Wolf. Improved quantum communication complexity bounds for disjointness and equality. In Proceedings of 19th Annual Symposium on Theoretical Aspects of Computer Science (STACS’2002), volume 2285 of Lecture Notes in Computer Science, pages 299–310. Springer, 2002. quant-ph/0109068. 32. J. Hromkoviˇc. Communication Complexity and Parallel Computing. EATCS series: Texts in Theoretical Computer Science. Springer, 1997. 33. D. Deutsch R. Josza. Rapid solutions of problems by quantum computation. Proc. Roy. Soc. London Se. A, 439:553–558, 1992. 34. B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discrete Mathematics, 5(4):545–557, 1992. 35. A. Yu. Kitaev. Quantum coin-flipping. Talk at QIP 2003 (slides and video at MSRI), December 2002. 36. I. Kremer. Quantum communication. Master’s thesis, Computer Science Department, The Hebrew University, 1995. 37. E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997. 38. H. K. Lo and H. F. Chau. Why quantum bit commitment and ideal quantum coin tossing are impossible. Physica D, 120:177–187, 1998. 39. H-K. Lo and H. F. Chau. Unconditional security of quantum key distribution over arbitrarily long distances. quant-ph/9803006, 3 Mar 1998. 40. S. Massar. Nonlocality, closing the detection loophole, and communication complexity. Physical Review A, 65:032121, 2002. 41. D. Mayers. Unconditional security in quantum cryptography. quant-ph/9802025, 10 Feb 1998. 42. D. Mayers. Unconditionally secure quantum bit commitment is impossible. Physical Review Letters, 78:3414–3417, 1997. 43. D. Mayers, L. Salvail, and Y. Chiba-Kohno. Unconditionally secure quantum coin tossing. quant-ph/9904078, 22 Apr 1999.
20
Harry Buhrman and Hein R¨ ohrig
44. K. Mehlhorn and E. M. Schmidt. Las Vegas is better than determinism in VLSI and distributed computing (extended abstract). In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, pages 330–337, San Francisco, California, 5–7 May 1982. 45. I. Newman. Private vs. common random bits in communication complexity. Information Processing Letters, 39(2):67–71, July 1991. 46. M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000. 47. N. Nisan and A. Wigderson. On rank vs. communication complexity. Combinatorica, 15:557–566, 1995. Earlier version in FOCS’94. 48. R. Raz. Exponential separation of quantum and classical communication complexity. In Proceedings of 31th STOC, pages 358–367, 1999. 49. A. A. Razborov. Quantum communication complexity of symmetric predicates. Izv. Math., 67(1):145–159, 2003. 50. M. Saks. A robust noncryptographic protocol for collective coin flipping. SIAM J. Discrete Math., 2(2):240–244, 1989. 51. R. Spekkens and T. Rudolph. A quantum protocol for cheat-sensitive weak coin flipping. quant-ph/0202118, 2002. 52. W. K. Wootters and W. H. Zurek. A single quantum cannot be cloned. Nature, 299(5886):802–803, 1982. 53. A. C-C. Yao. Some complexity questions related to distributive computing. In Proceedings of 11th STOC, pages 209–213, 1979. 54. A. C-C. Yao. Quantum circuit complexity. In Proceedings of 34th FOCS, pages 352–360, 1993.
Selfish Routing in Non-cooperative Networks: A Survey R. Feldmann, M. Gairing, Thomas L¨ ucking, Burkhard Monien, and Manuel Rode Department of Computer Science, Electrical Engineering and Mathematics University of Paderborn, F¨ urstenallee 11, 33102 Paderborn, Germany {obelix,gairing,luck,bm,rode}@uni-paderborn.de
Abstract. We study the problem of n users selfishly routing traffics through a shared network. Users route their traffics by choosing a path from their source to their destination of the traffic with the aim of minimizing their private latency. In such an environment Nash equilibria represent stable states of the system: no user can improve its private latency by unilaterally changing its strategy. In the first model the network consists only of a single source and a single destination which are connected by m parallel links. Traffics are unsplittable. Users may route their traffics according to a probability distribution over the links. The social optimum minimizes the maximum load of a link. In the second model the network is arbitrary, but traffics are splittable among several paths leading from their source to their destination. The goal is to minimize the sum of the edge latencies. Many interesting problems arise in such environments: A first one is the problem of analyzing the loss of efficiency due to the lack of central regulation, expressed in terms of the coordination ratio. A second problem is the Nashification problem, i.e. the problem of converting any given non-equilibrium routing into a Nash equilibrium without increasing the social cost. The Fully Mixed Nash Equilibrium Conjecture (FMNE Conjecture) states that a Nash equilibrium, in which every user routes along every possible edge with probability greater than zero, is a worst Nash equilibrium with respect to social cost. A third problem is to exactly specify the sub-models in which the FMNE Conjecture is valid. The wellknown Braess’s Paradox shows that there exist networks, such that strict sub-networks perform better when users are selfish. A natural question is the following network design problem: Given a network, which edges should be removed to obtain the best possible Nash equilibrium. We present complexity results for various problems in this setting, upper and lower bounds for the coordination ratio, and algorithms solving the problem of Nashification. We survey results on the validity of the FMNE Conjecture in the model of unsplittable flows, and for the model of splittable flows we survey results for the network design problem.
Partly supported by the DFG-SFB 376 and by the IST Program of the EU under contract numbers IST-1999-14186 (ALCOM-FT), and IST-2001-33116 (FLAGS). International Graduate School of Dynamic Intelligent Systems
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 21–45, 2003. c Springer-Verlag Berlin Heidelberg 2003
22
1
R. Feldmann et al.
Introduction
Motivation-Framework. Large-scale traffic and communication networks, like e.g. the internet, telephone networks, or road traffic systems often lack a central regulation for several reasons: The size of the network may be too large, the networks may be dynamically evolving over time, or the users of the network may be free to act according to their private interest, without regard to the overall performance of the system. Besides the lack of central regulation even cooperation of the users among themselves may be impossible due to the fact that the users may not even know each other. Networks with non-cooperative users have already been studied in the early 1950s in the context of road traffic systems [41,3]. Recently, this kind of networks has become increasingly important in computer science. Modern computational artifacts, like e.g. the internet, are modeled as communication networks with non-cooperative users. We survey routing problems in communication networks where n sources of traffic, called users, are going to route their traffics through a shared network. Traffics are routed through links of the network at a certain rate depending on the link, and different users may have different objectives, e.g. speed, quality of service, etc. The users choose routing strategies in order to minimize their private costs in terms of their private objectives without cooperating with other users. Such networks are called non-cooperative networks [22]. A famous example of such a network is the internet. Motivated by non-cooperative systems like the internet, combining ideas from game theory and computer science has become increasingly important [12,21,30,31,34]. Such an environment, which lacks a central control unit due to its size or operational mode, can be modeled as a non-cooperative game [32]. Users selfishly choose their private strategies, which in our environment correspond to paths (or probability distributions over the paths) from their sources to their destinations. When routing their traffics according to the strategies chosen, the users will experience an expected latency caused by the traffics of all users sharing edges. Each user tries to minimize its private cost, expressed in terms of its expected individual latency. This often contradicts the goal of optimizing the social cost which measures the global performance of the whole network. The degradation of the global performance due to the selfish behavior of its users is often termed price of anarchy [34,38] and measured in terms of the coordination ratio. The theory of Nash equilibria [29] provides us with an important solution concept for environments of this kind: a Nash equilibrium is a state of the system such that no user can decrease its individual cost by unilaterally changing its strategy. It has been shown by Nash that a Nash equilibrium exists under fairly broad circumstances. The concept of Nash equilibria has become an important mathematical tool in analyzing the behavior of selfish users in non-cooperative systems [34]. Many algorithms have been developed to compute a Nash equilibrium in a general game (see [27] for an overview). Although the theorem of Nash [29] guarantees the existence of a Nash equilibrium the computational complexity of computing a Nash equilibrium in general games is open even if only n = 2 users are
Selfish Routing in Non-cooperative Networks: A Survey
23
involved. Papadimitriou [34] states that due to the guaranteed existence of a solution the problem is unlikely to be N P-hard. The problem becomes even more challenging when global objective functions have to be optimized over the set of all Nash equilibria. In this work we survey results for two different models of non-cooperative communication networks. In the first model, the network consists of a single source and a single destination which are connected by m parallel links of capacities c1 , . . . , cm . Users 1, . . . , n are going to selfishly route their traffics w1 , . . . , wn from the source to the destination. Traffics are unsplittable. Users may choose mixed strategies to route their traffics. This model has been introduced by Koutsoupias and Papadimitriou [23]. We denote it as the KP-model. The individual cost of a user is defined as the maximum expected latency over all links it has chosen with positive probability. Depending on how the latency of a link is defined we distinguish between three variations of the model: In the identical link model all links have equal capacity. In the model of related links the latency for a link j is defined to be the quotient of the sum of the traffics through j and the capacity cj . In the general case of unrelated links traffic i induces load wij on link j. In all these models the social cost is defined to be the maximum expected latency on a link, where the expectation is taken over all random choices of the users. The second model, which we denote by Wardrop-model, has already been studied in the 1950’s [41,3] in the context of road traffic systems. It allows to split traffics into arbitrary pieces. Wardrop [41] introduced the concept of equilibrium to describe user behavior in this kind of traffic networks. For a survey of the early work on this model see [1]. In this environment unregulated traffic is modeled as network flow. Given an arbitrary network with edge latency functions, equilibrium flows have been classified as flows with all flow paths used between a given source-destination pair having equal latency. Equilibrium flows are optimal solutions to a convex program, if the edge latencies are given by convex functions. A lot of subsequent work (see [36, Sec. 1.2] for a brief survey) on this model has been motivated by Braess’s Paradox [4]. An equilibrium in this model can be interpreted as a Nash equilibrium in a game with infinitely many users, each carrying an infinitesimal amount of traffic from a source to a destination. The individual cost of a user is defined to be the sum of the edge latencies on a path from the user’s source to its destination, the social cost is defined to be the sum over all edge latencies in the network. Inspired by the new interest in the coordination ratio, Roughgarden and Tardos [36,37,38] reinvestigated the Wardrop-model. We survey some of their results in section 6. The practical relevance of the Wardrop-model is underpinned by its use by traffic engineers, who utilized equilibria in route-guidance systems to prescribe user behavior. Recent analyses of this framework have been done by Schulz and Stier Moses [39] for capacitated networks; algorithms and experimental benchmarking on real world problems are given by Jahn et al. [20]. The two models differ in terms of their definition of social cost. Routing in the KP-model is equivalent to scheduling n jobs on m parallel machines [15]. The maximum expected latency is then equivalent to the expected makespan of
24
R. Feldmann et al.
a schedule. The Wardrop-model has its origins in the definition of road traffic systems. Here the total network load expressed in terms of the sum of edge loads is a natural measure for the network performance. In the KP-model there may be Nash equilibria with different social costs, while in the Wardrop-model the costs of Nash equilibria are equal. It is well known that in non-cooperative networks, due to the lack of coordination, the users may get to a solution, i.e. a Nash equilibrium, that is sub-optimal in terms of the social cost. Koutsoupias and Papadimitriou [23] defined the coordination ratio as the ratio of the social cost of a worst Nash equilibrium and the social cost of the global optimal solution. The coordination ratio is a measure for the price of anarchy. The well known Braess’s Paradox [4] shows that there exist networks, such that strict sub-networks perform better when users are selfish. If the goal is to construct a network in such a way that the coordination ratio of the network is small, an interesting network design problem arises: Given a network and the corresponding routing tasks, determine a set of edges which should be removed from the network to obtain a best possible routing at Nash equilibrium. Such network design problems arise e.g. in the routing of road traffic, when traffic engineers want to determine roads that should be closed, or changed to one-way roads, in order to obtain an optimal traffic flow. In the case that the users are allowed to randomize their strategies, the set of solutions of the routing problem equals the set of all mixed Nash equilibria. A special class of mixed equilibria is the class of Fully Mixed Nash equilibria. A Nash equilibrium is fully mixed, if every user plays each of its pure strategies with positive probability. Gairing et al. [15] conjecture that, if a fully mixed Nash equilibrium exists, it is the worst equilibrium with respect to the social cost in the KP-model with related links. In the case that the users are not allowed to randomize their strategies, the set of solutions of the routing problem consists of all pure Nash equilibria. In this environment the problem of Nashification becomes important. The problem of Nashification is to compute an equilibrium routing from a given non-equilibrium one without increasing the social cost. The intention to centrally nashify a nonequilibrium solution is to provide a routing from which no user has an incentive to deviate. One way to nashify an assignment is to perform a sequence of greedy selfish steps. A greedy selfish step is a user’s change of its current pure strategy to its best pure strategy with respect to the current strategies of all other users. Any sequence of greedy selfish steps leads to a pure Nash equilibrium. However, the length of such a sequence may be exponential in n. We present polynomial time algorithms for the problem of Nashification. Overview. In section 2 we review basic notations. Then, in section 3 we focus on the sub-model of identical links. In section 4 we turn to the more general model of related links. The most general sub-model of the KP-model is considered in section 5. Finally, in section 6 we review results for the Wardrop-model.
Selfish Routing in Non-cooperative Networks: A Survey
2
25
Basic Notations
Mathematical Preliminaries. For an integer i ≥ 1, denote [i] = {1, . . . , i}. Denote Γ the Gamma function; that is, for any natural ∞number i, Γ (i + 1) = i!, while for any arbitrary real number x > 0, Γ (x) = 0 tx−1 e−t dt. We use the fact that Γ (x + 1) = x · Γ (x). The Gamma function is invertible; both Γ and its inverse Γ −1 are increasing. Game Theoretic Framework. We consider a non-cooperative network game in which each network user i ∈ [n], or user for short, wishes to route a particular amount of traffic along (non-fixed) paths from source si to destination ti . Denote Pi as the set of all distinct simple paths from si to ti . Then, a pure strategy for user i ∈ [n] is a path in Pi . A mixed strategy for user i is a probability distribution over pure strategies. Each user i wants to minimize its private cost, which depends on the latency of the shared edges used by user i to route its traffic. A Nash equilibrium is then a setting where no user can decrease its private cost by unilaterally changing its strategy. The performance of the total network is measured in terms of social cost. In this paper we consider two models which are different with respect to the following aspects: 1. The structure of the network. 2. Traffics may be splittable or unsplittable. 3. The definition of the private and social cost functions. 2.1
KP-Model
We first consider a model introduced by Koutsoupias and Papadimitriou [23], called KP-model. Here, the network consists only of a single source node and a single destination node connected by m parallel links. Traffics are unsplittable, that is, a pure strategy of a user is some specific link, and a mixed strategy is a probability distribution over the set of links. The private cost of user i is defined to be the maximum expected latency on a link used by user i. The social cost is the expected maximum latency of a link. General. Due to the simplicity of the network, a pure strategy profile L is represented by an n-tuple l1 , l2 , . . . , ln ∈ [m]n ; a mixed strategy profile P is represented by an n × m probability matrix of nm probabilities pij , i ∈ [n] and j ∈ [m], where pij is the probability that user i chooses link j. A mixed strategy profile F is fully mixed if for all users i ∈ [n] and all links j ∈ [m], pij > 0. The support of the mixed strategy for user i ∈ [n], denoted support(i), is the set of those pure strategies (links) to which i assigns positive probability; so, support(i) = {j ∈ [m] | pij > 0}. For pure strategies we denote link(i) = li . System, Models and Cost Measures. Denote wi the traffic of user i ∈ [n]. Define the n × 1 traffic vector w in the natural way. Assume, without loss of n generality, that w1 ≥ w2 ≥ . . . ≥ wn , and denote W = i=1 wi the total traffic. Denote cj > 0 the capacity of link j ∈ [m], representing the rate at which the link processes traffic. In the model of identical links, all link capacities are equal.
26
R. Feldmann et al.
Link capacities may vary arbitrarily in the model of links. Without loss of related m generality assume c1 ≥ . . . ≥ cm , and denote C = j=1 cj the total capacity. So, the latency for traffic wi through link j equals wcji . In the model of unrelated links, there exists neither an ordering on the traffics nor on the capacities. We denote wij the traffic of user i ∈ [n] on link j ∈ [m]. Link capacities are not necessary as a problem input when links are unrelated. However, to obtain common expressions in the following definitions we assume that the link capacities are all equal to 1. Let P be an arbitrary mixed strategy profile. The expected latency of user i on link j is wij + k∈[n],k=i pkj wkj . λij = cj Denote IC(w, P) the maximum expected individual latency by IC(w, P) = max
max
i∈[n] j∈[m] | pij >0
λij .
The minimum expected latency of user i is λi = minj∈[m] λij . The expected load Λj on link j is the ratio between the expected traffic on link j and the capacity of link j. Thus, i∈[n] pij wij . Λj = cj The maximum expected load Λ = maxj∈[m] Λj is the maximum (over all links) of the expected load Λj on a link j. Associated with a traffic vector w and a mixed strategy profile P is the social cost [23, Section 2], denoted SC(w, P), which is the expected maximum latency on a link, where the expectation is taken over all random choices of the users. Thus, n k:lk =j wkj SC(w, P) = . pklk · max cj j∈[m] n l1 ,l2 ,...,ln ∈[m]
k=1
Note that SC(w, P) reduces to the maximum latency through a link in the case of pure strategies. Moreover, by definition of the social cost, there always exists a pure strategy profile with minimum social cost. So, the social optimum [23, Section 2] associated with a traffic vector w, denoted OPT(w), is the least possible maximum (over all links) latency through a link, that is, k:lk =j wkj OPT(w) = min max . cj l1 ,l2 ,...,ln ∈[m]n j∈[m] Nash Equilibria and Coordination Ratio. Say that a user i ∈ [n] is satisfied for the probability matrix P if λij = λi for all links j ∈ support(i), and λij ≥ λi for all j ∈ support(i). Otherwise, user i is unsatisfied. Thus, a satisfied user has no incentive to unilaterally deviate from its mixed strategy. P is a Nash equilibrium [29] iff all users i ∈ [n] are satisfied for P. Fix any traffic vector w. A best (worst) Nash equilibrium is a Nash equilibrium that minimizes (maximizes) SC(w, P). The best social cost is the social cost
Selfish Routing in Non-cooperative Networks: A Survey
27
of a best Nash equilibrium. The worst social cost is the social cost of a worst Nash equilibrium and is denoted by WC(w). Gairing et.al. [15] conjecture that in case of its existence the fully mixed Nash equilibrium, which is unique, is the worst Nash equilibrium. Fully Mixed Nash Equilibrium Conjecture [15]. Consider the model of arbitrary traffics and related links. Then, for any traffic vector w such that the fully mixed Nash equilibrium F exists, and for any Nash equilibrium P, SC(w, P) ≤ SC(w, F). The coordination ratio [23] is the maximum of WC(w)/OPT(w), over all traffic vectors w. Correspondingly, we denote the maximum of IC(w, P)/OPT(w) the individual coordination ratio. Though a mixed Nash equilibrium always exists, as implied by the fundamental theorem of Nash [29], this is not the case for pure Nash equilibria in general settings. However, for the model under consideration, there is always some pure strategy profile, which fulfills the Nash equilibrium condition. A proof can be given with the help of sequences of greedy selfish steps. In a selfish step, exactly one unsatisfied user is allowed to change its pure strategy. A selfish step is a greedy selfish step if the user chooses its best strategy. Selfish steps do not increase the social cost of the initial pure strategy profile. Starting with any pure strategy profile, every sequence of (greedy) selfish steps eventually ends in a pure Nash equilibrium. Moreover, starting with any pure strategy profile with minimum social cost, this also shows that there always exists a pure Nash equilibrium L with social cost SC(w, L) = OPT(w). Theorem 1 ([11], Theorem 1). There exists a pure Nash equilibrium. Algorithmic Problems. We list a few algorithmic problems related to Nash equilibria that we consider in this work. The definitions are given in the style of Garey and Johnson [14]. A problem instance is a tuple (n, m, w, c), where n is the number of users (traffics), m is the number of links, w = (wi,j ) is a n × m matrix of traffics and c = (cj ) is a vector of m link capacities. Π1 : BEST NASH EQUILIBRIUM SUPPORTS INSTANCE: A problem instance (n, m, w, c). OUTPUT: A best Nash equilibrium P. Π2 : WORST NASH EQUILIBRIUM SUPPORTS INSTANCE: A problem instance (n, m, w, c). OUTPUT: A worst Nash equilibrium P. The corresponding problem to compute a worst pure Nash equilibrium is denoted as WORST PURE NASH EQUILIBRIUM SUPPORTS. If, additionally, m is constant the problem is denoted as m-WORST PURE NASH EQUILIBRIUM SUPPORTS.
28
R. Feldmann et al.
Π3 : NASH EQUILIBRIUM SOCIAL COST INSTANCE: A problem instance (n, m, w, c); a Nash equilibrium P for (n, m, w, c). OUTPUT: The social cost of the Nash equilibrium P. Π4 : NASHIFY INSTANCE: A problem instance (n, m, w, c); a pure strategy profile L for the system of the users; an integer k > 0. QUESTION: Is there a sequence of at most k selfish steps that transforms L to a (pure) Nash equilibrium? If k is a constant and not part of the input the corresponding decision problem is denoted as k-NASHIFY. 2.2
Wardrop-Model
In the Wardrop-model traffics are splittable into infinitesimally small pieces. Each of these infinitely many pieces may be viewed as the traffic of a single user who chooses a path from its source to its destination. The problem of routing splittable traffic in a congested network has already been studied since the 1950’s [41,3]. Given a network, rates of traffic between pairs of nodes, and a latency function for each edge, the objective is to route traffic such that the sum of all latencies is minimized. More formally, the model is defined as follows: General. An instance (G, r, l) for the routing problem consists of a network G = (V, E), a set r = {(ri , si , ti ) ∈ IR>0 × V × V | i ∈ [k]} of routing tasks, and a set l = {le | e ∈ E} of edge latency functions. A triple (ri , si , ti ) ∈ r defines the task to route a traffic k of rate ri from si to ti . Traffics are to be routed via : P → IR≥0 . For paths in Pi . Let P = i=1 Pi . A traffic flow then is a function f a fixed flow f the flow fe along edge e ∈ E is defined as fe = P ∈P:e∈P fP . f is a feasible solution for the routing problem instance (G, r, l) iff P ∈Pi fP = ri for all i ∈ [k]. System, Models and Cost Measures. The latency of an edge e is given by a non-negative, differentiable and non-decreasing function le : IR≥0 → IR≥0 . For a fixed flow f the latency of an edge e ∈ E is defined as ce (fe ) = le (fe )fe , the product of e s latency when routing traffic fe times the traffic fe itself. The latency cP (f ) = e∈P ce (fe ) of a path P is defined to be the sum of the edge latencies on P . The social cost C(f ) of a flow f is defined as the sum of the edge latencies C(f ) = ce (fe ) = lP (f )fP , where lP (f ) =
e∈E
P ∈P
e∈P le (fe ).
Nash Equilibria and Coordination Ratio. By definition, a flow f is a Nash equilibrium, if any arbitrarily small amount of flow routed from, say, si to ti via
Selfish Routing in Non-cooperative Networks: A Survey
29
a path P1 ∈ Pi cannot improve the latency experienced on P1 by switching to a path P2 ∈ Pi . More formally, from [7] we obtain the following definition: A flow f in G is a Nash equilibrium, occasionally called a Wardrop equilibrium [7], if for all P1 , P2 ∈ P and all δ ∈ [0, fP1 ], we have lP1 (f ) ≤ lP2 (f˜), where fP − δ if P = P1 f˜P = fP + δ if P = P2 fP if P ∈ {P1 , P2 }. In contrast to the notion of mixed Nash equilibria in the KP-model, here flows may be split among different paths. Consequently the social cost C(f ) of a flow as defined above is not a measure of expected cost. Nash equilibria in this model are pure, however, there are infinitely many users, each carrying an infinitesimal amount of the overall traffic. As for the KP-model, the coordination ratio is defined by the ratio of the social cost of the worst Nash equilibrium and the minimum possible total latency.
3
Routing in the KP-Model with Identical Links
In this section we consider a simple model for our routing game, namely the KP-model where all links have the same capacity. We distinguish between pure Nash equilibria, where each user chooses a single link as its strategy, and mixed Nash equilibria. Here the strategy of a user is a probability distribution over the links. 3.1
Pure Nash Equilibria
Fotakis et al. [11, Theorem 3] show that computing a pure Nash equilibrium with minimum social cost is N P-hard even in the model of identical links. Since this problem can be formulated as an integer program, it follows that it is N Pcomplete. However, computing some pure Nash equilibrium L can be done with help of the LPT-algorithm introduced by Graham [16], using polynomial time and yielding 4 1 OPT(w). SC(w, L) ≤ − 3 3m Another way to compute a pure Nash equilibrium is, starting from any pure strategy profile, to allow the users to perform greedy selfish steps. Every sequence of greedy selfish steps eventually yields a pure Nash equilibrium. We now give bounds on the maximum length of a sequence of greedy selfish steps, that show that such a sequence may be of exponential length. Theorem 2 ([8]). There exists an instance of n users on m identical links for which the maximum length of a sequence of greedy selfish steps is at least
m−1 n m−1
2(m − 1)!
.
An upper bound on the number of greedy selfish steps is given by
30
R. Feldmann et al.
Theorem 3 ([9]). For any instance with n users on identical links, the length of any sequence of greedy selfish steps is at most 2n − 1. Instead of the maximum length one may ask about the minimum length of a sequence of greedy selfish step. In particular one may consider whether a given pure strategy profile can be transformed into a pure Nash equilibrium with at most k selfish steps. This problem is called NASHIFY and was shown to be N P-complete. Theorem 4 ([15]). NASHIFY is N P-complete on identical links even if m = 2. The proof relies on a reduction from PARTITION. This reduction implies that NASHIFY is N P-complete in the strong sense (cf. [14, Section 4.2]) if m is part of the input. Thus, there is no pseudo-polynomial-time algorithm for NASHIFY (unless P = N P). In contrast, there is a natural pseudo-polynomial-time algorithm for k-NASHIFY, which exhaustively searches all sequences of k selfish steps; since a selfish step involves a (unsatisfied) user and a link for a total of mn choices, the running time of such an algorithm is Θ((mn)k ). In order to compute a Nash equilibrium by using sequences of greedy selfish steps, we can use two types of sequences of polynomial length introduced by Even-Dar et al. [8]. Theorem 5 ([8]). For FIFO and Random strategy, the length of a sequence of greedy selfish steps is at most n(n + 1)/2 before reaching a Nash equilibrium. We continue to present algorithm NashifyIdentical that solves NASHIFY when n selfish steps are allowed. A crucial observation is Lemma 1 ([15,8]). A greedy selfish step of an unsatisfied user i with traffic wi makes no user k with traffic wk ≥ wi unsatisfied. NashifyIdentical sorts the user traffics in non-increasing order so that w1 ≥ . . . ≥ wn . Then for each user i := 1 to n, it removes user i from the link it is currently assigned to, it finds the link j with the minimum latency, and it reassigns user i to the link j. We prove: Theorem 6 ([15,8]). Let L = l1 , . . . , ln be a pure strategy profile for n users with traffics w1 , ..., wn on m identical links with social cost SC(w, L). Then algorithm NashifyIdentical computes a Nash equilibrium from L with social cost ≤ SC(w, L) using O(n log n) time. The proof relies on Lemma 1 and the usage of appropriate data structures. Running the PTAS of Hochbaum and Shmoys [18] for scheduling n jobs on m identical machines yields a pure strategy profile L such that SC(w, L) ≤ (1 + ε)OPT(w). On the other hand, applying NashifyIdentical to L yields a Nash equilibrium L such that SC(w, L ) ≤ SC(w, L). Thus, SC(w, L ) ≤ (1 + ε)OPT(w). Since also OPT(w) ≤ SC(w, L ), it follows that:
Selfish Routing in Non-cooperative Networks: A Survey
31
Theorem 7 ([15]). There exists a PTAS for BEST PURE NASH EQUILIBRIUM for the model of identical links. After studying best pure Nash equilibria we now turn our attention to worst pure Nash equilibria. We first show: Theorem 8 ([40,15]). Fix any traffic vector w and pure Nash equilibrium L. Then, 2 SC(w, L) ≤2− . OPT(w) m+1 Furthermore, this upper bound is tight. Theorem 8 shows that the social cost of any Nash equilibrium is at most 2 the factor 2 − m+1 away from the social cost of an optimal Nash equilibrium. This also implies, that every Nash equilibrium approximates the social cost of the worst Nash equilibrium within this factor. We now establish, that approximating a worst Nash equilibrium with a better guaranty is N P-hard. Theorem 9 ([15]). It is N P-hard to find a Nash equilibrium L with 2 WC(w) 0.
It is N P-hard in the strong sense if the number of links m is part of the input. The proof of Theorem 9 is based on a reduction from PARTITION. Since WORST CASE PURE NASH EQUILIBRIUM is N P-hard in the strong sense [11], there exists no pseudo-polynomial algorithm to solve WORST CASE PURE NASH EQUILIBRIUM. However, for a constant number of links m such an algorithm exists [15]. 3.2
Mixed Nash Equilibria
For a mixed Nash equilibrium it is hard even to compute its social cost. Theorem 10 ([11]). NASH EQUILIBRIUM SOCIAL COST is #P-complete even in the model of identical links. However, Fotakis et al. [11] show that there exists a fully polynomial, randomized approximation scheme for NASH EQUILIBRIUM SOCIAL COST. The coordination ratio was introduced by Koutsoupias and Papadimitriou [23]. They provided a lower bound of Ω(log m/ log log m). This result can be tightened as follows: Theorem 11 ([24,6]). For m identical links the worst-case coordination ratio is at most log m · (1 + o(1)). Γ −1 (m) + Θ(1) = log log m
32
R. Feldmann et al.
Together with the lower bound from [23], this bound is tight up to an additive constant. In the remainder of this section, we consider fully mixed Nash equilibria. Mavronicolas and Spirakis [28] showed that in the model of identical links there always exists a unique fully mixed Nash equilibrium. Lemma 2 ([28]). There is a unique fully mixed Nash equilibrium P with pij = 1 m for any user i ∈ [n] and link j ∈ [m]. Since we know that in our model the fully mixed Nash equilibrium always exists, we can compare it to a worst Nash equilibrium. The following theorem provides evidence for the Fully Mixed Nash Equilibrium Conjecture. Theorem 12 ([25]). Consider the model of identical traffics and identical links, and assume that m = 2 and n is even. Then, the FMNE Conjecture is valid.
4
Routing in the KP-Model with Related Links
In this section we are engaged in the two node routing network with related links. Often the term uniform links is used to refer to this model in literature. Subsection 4.1 deals with pure Nash equilibria only, whereas the results quoted in subsection 4.2 hold for general (i.e. mixed) Nash equilibria. Subsection 4.3 concentrates on the fully mixed Nash equilibrium. 4.1
Pure Nash Equilibria
In [11] it was shown that the LPT algorithm, which was first explored by Graham [16], can be used to compute some pure Nash equilibrium. Its coordination ratio lies between 1.52 and 1.67 [13]. Keep in mind, that the exact computation of the best Nash equilibrium is N P-complete in the strong sense (see section 3). We now describe another approach to approximate the best Nash equilibrium with arbitrary but constant precision. A Polynomial Time Algorithm for Nashification. We call the process of converting a given pure strategy profile for related links into a Nash equilibrium without increasing the social cost Nashification. A simple Nashification approach is to perform a greedy selfish step for any user which can improve by this as long as such a user exists. Unfortunately, this can lead to an exponential number of steps, even on identical links (see section 3). In Figure 1 we present the algorithm NashifyRelated which nashifies any pure routing by a polynomial number of (not necessarily selfish) moves without increasing the maximum latency. A crucial observation for proving the correctness of the algorithm is stated in Lemma 3, which is a generalization of Lemma 1. It shows that greedy-moving a user from its current link to a non-slower link can only make users of smaller size unsatisfied.
Selfish Routing in Non-cooperative Networks: A Survey
33
NashifyRelated Input: n users with traffics w1 ≥ · · · ≥ wn m links with capacities c1 ≥ · · · ≥ cm Assignment of users to links Output: Assignment of users to links with less or equal maximum latency, which is a NE { // phase 1: i := n; S := {n}; while i ≥ 1 { move user i to link with highest possible index without increasing overall maximum latency; if i was moved or i ∈ S or link(i) ≤ link(i + 1) then S := S ∪ {i}; i := i − 1; else { move user i to link with smallest possible index without increasing overall maximum latency; if i was moved then S := S ∪ {i}; i := n; else break; } } // phase 2: while ∃i ∈ S { make greedy selfish step for user i = min(S); S := S\{i}; } }
Fig. 1. NashifyRelated: Converts any assignment into a Nash equilibrium
Lemma 3 ([9]). If user i with traffic wi performs a greedy selfish step from link j to link k with cj ≤ ck , then no user s with traffic ws ≥ wi becomes unsatisfied. NashifyRelated works in two phases. At every time link(i) denotes the link user i is currently assigned to. The main idea is to fill up slow links with users with small traffics as close to the maximum latency as possible in the first phase (but without increasing the maximum latency) and to perform greedy selfish steps for unsatisfied users in the second phase. During the first phase, set S is used to collect all users that have already been considered by the algorithm. No user is deleted from S in this phase. Throughout the whole algorithm, each user in S is located on a link with non-greater index than any smaller user in S. In other words, the smaller the traffic of a user in S, the slower the link it is assigned to. We may start with S = {n}, because the above property is trivially fulfilled if S contains only one user. When no further user can be added to S, the first phase terminates. In the second phase we successively perform
34
R. Feldmann et al.
greedy selfish steps for all unsatisfied users, starting with the largest one. That is, we move each user, who can improve by changing its link, to its best link with respect to the current situation. Because of the special conditions that have been established by phase 1, and by Lemma 3, these greedy selfish steps do not cause other users with larger traffics to become unsatisfied. Note that during phase 1 the algorithm does not necessarily perform selfish steps. The social cost, however, cannot be increased due to the constraints of the move commands in phase 1. The following lemma formalizes the above mentioned conditions which are established by phase 1. Lemma 4 ([9]). After phase 1 the following holds: (1) All unsatisfied users are in S. (2) S = {n, (n − 1), . . . , (n + 1 − |S|)}, that is, S contains the |S| users with smallest traffics. (3) i, i + 1 ∈ S ⇒ link(i) ≤ link(i + 1). (4) Every user i ∈ S can only improve by moving to a link with smaller index. Each of the properties in Lemma 4 also holds after each run of the loop in phase 2 ([9]). In particular, S contains all unsatisfied users (property (1)). But, as S is empty when the algorithm terminates, this implies, that there are no unsatisfied users, i.e., the new assignment, defined by link(·), is a Nash equilibrium. Implementing the algorithm in a proper way yields a running time as stated in the following theorem. Theorem 13 ([9]). Given any pure strategy profile for the model of related links, algorithm NashifyRelated computes a Nash equilibrium with non-increased social cost, performing at most (m + 1)n moves in sequential running time O(m2 n). Combining any approximation algorithm for the computation of good routings with the algorithm NashifyRelated yields a method for approximating the best Nash equilibrium. Particularly, using the PTAS for the Scheduling Problem from Hochbaum and Shmoys [19], we get: Corollary 1. There is a PTAS for approximating a best pure Nash equilibrium. We cannot expect to find an FPTAS for this problem, since the exact computation of the best Nash equilibrium is N P-complete in the strong sense [11]. Coordination Ratio. Theorem 14 ([6]). The coordination ratio for pure Nash equilibria on m related links with capacities c1 ≥ · · · ≥ cm is bounded from above by c1 log m −1 (1 + o(1)) as well as O log . Γ (m) + 1 = log log m cm
Selfish Routing in Non-cooperative Networks: A Survey
35
The upper bound in Theorem 14 can be improved to Γ −1 (m) [9]. The following Example shows that this improved bound is asymptotically tight. Example 1 ([9]) Let k ∈ N, and consider the following instance with k different classes of users: – Class U1 : |U1 | = k users with traffics 2k−1 – Class Ui : |Ui | = 2i−1 · (k − 1) j=1,...,i−1 (k − j) users with traffics 2k−i for all 2 ≤ i ≤ k. In the same way we define k + 1 different classes of links: – Class P0 : One link with capacity 2k−1 . – Class P1 : |P1 | = |U1 | − 1 links with capacity 2k−1 . – Class Pi : |Pi | = |Ui | links with capacity 2k−i for all 2 ≤ i ≤ k. Consider the following assignment: – Class P0 : All users in U1 are assigned to this link. – Class Pi : On each link in Pi there are 2(k−i) users from Ui+1 , respectively, for all 1 ≤ i ≤ k − 1. – Class Pk : The links from Pk remain empty. The above assignment is a pure Nash equilibrium L with social cost SC(w, L) = k and OPT(w) = 1. Lemma 5 ([9]). For each k ∈ N there exists an instance with a pure Nash equilibrium L with k= 4.2
SC(w, L) ≥ Γ −1 (m) · (1 + o(1)). OPT(w)
Mixed Nash Equilibria
In this subsection we state upper bounds on the coordination ratio and on the individual coordination ratio for mixed Nash equilibria on related links. Theorem 15 ([6]). The coordination ratio for m related parallel links is log m log m .
Θ min , log m log log log m log log c1 /cm
Whereas the above theorem bounds the coordination ratio depending only on the number of links m and the relation between the fastest and slowest link, we now introduce a structural parameter p, which can yield better bounds on the individual coordination ratio (and the coordination ratio for pure Nash equilibria). We denote M1 = {j ∈ [m] | w1 ≤ cj · OPT(w)} and p = j∈M1 cj /C. In other words, p is the ratio between the sum of link capacities of links to which the largest traffic can be assigned causing latency at most OPT(w) and the sum of all link capacities. With the help of p we are able to prove an upper bound on the individual coordination ratio.
36
R. Feldmann et al.
Theorem 16 ([9]). For any mixed Nash equilibrium P the ratio between the maximum expected individual latency IC(w, P) = maxi∈[n] λi and OPT(w) for the model of related links is bounded by 3 1 3 + if 13 ≤ p ≤ 1, 2 p − 4 IC(w, P) 1 < 2 + 3 p1 − 2 if 37 ≤ p < 13 , OPT(w)
1 if p < 37 . Γ −1 p1 1 Since wc11 ≤ OPT(w), we have p ≥ cC1 ≥ m . Furthermore, IC(w, P) ≥ Λ holds for every assignment. Thus, from Theorem 16 we can derive an upper bound of Γ −1 (m)OPT(w) on both the maximum expected load and the maximum expected individual latency. This leads to an improvement of the upper bound on the coordination ratio for pure Nash equilibria [6]. From Example 1 the following lower bound on the coordination ratio subject to p can be obtained.
Lemma 6 ([9]). For each k ∈ N there exists an instance with a pure Nash equilibrium L with 1 SC(w, L) −1 ≥Γ . k= OPT(w) 3p We can also prove k ≥ Γ −1 ( p1 ) − 1. This shows that the generalized upper bound is tight up to an additive constant for all m whereas Lemma 5 shows tightness of Γ −1 (m) up to an additive constant only for large m. We conclude this section by giving another upper bound on the maximal expected individual latency of a mixed Nash equilibrium, which only depends on the number of links m. The same bound also applies to the social cost of a pure Nash equilibrium. Theorem 17 ([9]). For any mixed Nash equilibrium P on m related links, the maximum expected individual cost is bounded by √ 1 + 4m − 3 OPT(w). IC(w, P) ≤ 2 This bound is tight if and only if m ≤ 5. Only if m ≤ 3, there is a pure Nash equilibrium matching the bound. For m ≥ 2, there is no fully mixed Nash equilibrium matching the bound. √
That (1 + √(4m − 3))/2 is an upper bound on the coordination ratio for pure Nash equilibria, which is an implication of Theorem 17, can also be obtained from Cho and Sahni [5] and Schuurman and Vredeveld [40], where jump optimal schedules are considered. A schedule is said to be jump optimal if no user on a link with maximum load can improve by moving to another processor. Obviously, the set of pure Nash equilibria is a subset of the set of jump optimal schedules. Thus, the strict upper bound of (1 + √(4m − 3))/2 on the ratio between best and worst makespan of jump optimal schedules [5,40] also holds for pure Nash equilibria. This bound is not asymptotically tight, but for small numbers of links (m ≤ 19) it is better than the asymptotically tight bound Γ^{-1}(m).
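For concreteness, the two upper bounds can be compared numerically. In the sketch below (our own check), we read Γ as the gamma function, as in [23], and invert it by bisection on the interval where it is increasing; the jump-optimality bound (1 + √(4m − 3))/2 wins exactly for small m, with the crossover around m = 19.

from math import gamma, sqrt

def gamma_inverse(y):
    """Solve Gamma(x) = y for x >= 2 by bisection (Gamma is increasing there)."""
    lo, hi = 2.0, 64.0
    while gamma(hi) < y:
        hi *= 2
    for _ in range(80):
        mid = (lo + hi) / 2
        if gamma(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for m in (2, 5, 10, 19, 20, 50):
    jump = (1 + sqrt(4 * m - 3)) / 2
    print(m, round(jump, 3), round(gamma_inverse(m), 3))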
4.3 Fully Mixed Nash Equilibria
A fully mixed Nash equilibrium is a special Nash equilibrium where all probabilities pij for a user i to choose link j are strictly positive. Such a Nash equilibrium does not always exist, but if it exists, it is unique and can be computed efficiently.

Theorem 18 ([28]). For the model of related links there is a fully mixed Nash equilibrium F for a traffic vector w if and only if

(1 − W/((n − 1)·wi)) · (1 − m·cj/C) + cj/C ∈ (0, 1)   ∀i ∈ [n], j ∈ [m].

If F exists, F is unique and has associated Nash probabilities

pij = (1 − W/((n − 1)·wi)) · (1 − m·cj/C) + cj/C,
for any user i ∈ [n] and link j ∈ [m]. Gairing et al. [15] conjecture that the fully mixed Nash equilibrium (FMNE) is the worst-case Nash equilibrium (the one with highest social cost) among all Nash equilibria for the same instance. So far, this conjecture has been proved to hold in special cases (Theorem 20) and to hold up to a constant multiplicative factor in the general related links case (Theorem 19).

Theorem 19 ([11]). Consider an instance with identical traffics for the model of related links for which the fully mixed Nash equilibrium exists. Then the social cost of the worst mixed Nash equilibrium is at most 49.02 times the social cost of the fully mixed Nash equilibrium.
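Theorem 18 is easy to probe numerically. The following sketch (our own, under the reading of Theorem 18 given above) computes the candidate probabilities pij, tests the existence condition, and checks the Nash property that every user sees the same expected latency on all links (within each printed row the entries coincide).

def fmne_probabilities(w, c):
    n, m = len(w), len(c)
    W, C = sum(w), sum(c)
    p = [[(1 - W / ((n - 1) * wi)) * (1 - m * cj / C) + cj / C
          for cj in c] for wi in w]
    exists = all(0 < pij < 1 for row in p for pij in row)
    return p, exists

def expected_latencies(w, c, p):
    n, m = len(w), len(c)
    # lambda_{ij} = (w_i + sum_{k != i} p_kj * w_k) / c_j
    return [[(w[i] + sum(p[k][j] * w[k] for k in range(n) if k != i)) / c[j]
             for j in range(m)] for i in range(n)]

w, c = [2.0, 1.0, 1.0], [3.0, 2.0, 2.0]
p, exists = fmne_probabilities(w, c)
print("fully mixed Nash equilibrium exists:", exists)
for row in expected_latencies(w, c, p):
    print([round(x, 6) for x in row])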
Theorem 20 ([25]). For the model of related links and two identical traffics the FMNE Conjecture holds.

In contrast to the as yet unproved claim of the FMNE Conjecture (that the fully mixed Nash equilibrium has the worst social cost), Lemma 7 shows that each user indeed experiences its worst individual cost in the fully mixed Nash equilibrium. This implies that the social cost of a pure Nash equilibrium is bounded from above by the social cost of the fully mixed Nash equilibrium.

Lemma 7 ([15]). Fix any traffic vector w, mixed Nash equilibrium P and user i. Then λi(w, P) ≤ λi(w, F).

Theorem 21 ([15]). Fix any traffic vector w and pure Nash equilibrium L. Then SC(w, L) ≤ SC(w, F).
5 Routing in the KP-Model with Unrelated Links
Up to now only little attention has been paid to Nash equilibria in the model of unrelated links. Similarly to the model of identical links, a pure Nash equilibrium in the model of unrelated links can be computed by performing sequences of (greedy) selfish steps; a simulation of such selfish steps is sketched at the end of this section. However, up to now it is unknown whether, starting with any pure strategy profile, there always exists a sequence of polynomial length ending in a pure Nash equilibrium. A trivial upper bound on the number of selfish steps before reaching a pure Nash equilibrium is m^n. Starting with a pure strategy profile with minimal social cost, the convergence of any sequence of (greedy) selfish steps implies that there always exists a pure Nash equilibrium with optimal social cost. However, we cannot hope to approximate a best pure Nash equilibrium within a factor 3/2, since Minimum Multiprocessor Scheduling on unrelated processors is not approximable within a factor 3/2 − ε for any ε > 0 [26]. Consider the case n ≤ m, that is, there are at most as many users as links. Then the minimum expected latency of any user i in a pure Nash equilibrium L is smaller than the minimum expected latency of user i in the fully mixed Nash equilibrium F.

Proposition 1 ([25]). For the model of unrelated links, let w = (wij) be a traffic matrix such that a fully mixed Nash equilibrium F exists, and let L be a pure Nash equilibrium. Let n ≤ m. Then, for every user i, λi(L) < λi(F).

Clearly, the social cost of any pure Nash equilibrium L is equal to the maximum of the expected latencies, while the social cost of a fully mixed Nash equilibrium F is at least the expected latency of any user. Hence, Proposition 1 implies that for n ≤ m the social cost of every pure Nash equilibrium L is at most the social cost of the fully mixed Nash equilibrium F.

Theorem 22 ([25]). For the model of unrelated links, let w = (wij) be a traffic matrix such that a fully mixed Nash equilibrium F exists, and let L be a pure Nash equilibrium. Let n ≤ m. Then SC(w, L) ≤ SC(w, F).

In the case n = 2, the minimum expected latency of user i ∈ [2] in any mixed Nash equilibrium P is also bounded by its minimum expected latency in the fully mixed Nash equilibrium F.

Proposition 2 ([25]). For the model of unrelated links, let w = (wij) be a traffic matrix such that a fully mixed Nash equilibrium F exists, and let P be a Nash equilibrium. If n = 2 then λi(P) ≤ λi(F) for every user i ∈ [2].

This implies that the FMNE Conjecture holds for n = m = 2. However, for the case n = 3 and m = 2 there exist instances for which the FMNE Conjecture does not hold.

Theorem 23 ([25]). For the model of unrelated links, if n = 2 and m = 2, then the FMNE Conjecture holds. If n = 3 and m = 2, then the FMNE Conjecture does not hold.
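The greedy selfish steps mentioned at the beginning of this section are straightforward to simulate. Below is a minimal sketch (our own illustration, not an algorithm from [8] or [25]): w[i][j] is the traffic of user i on link j, a step moves one user to a link that strictly decreases its latency, and the loop stops at a pure Nash equilibrium; by the text's convergence claim the number of steps is bounded by m^n.

def selfish_steps(w, assignment):
    n, m = len(w), len(w[0])
    L = list(assignment)
    def latency(j):
        return sum(w[i][j] for i in range(n) if L[i] == j)
    improved, steps = True, 0
    while improved:
        improved = False
        for i in range(n):
            here = latency(L[i])
            # greedy selfish step: move to the link minimising i's latency
            best_j = min(range(m),
                         key=lambda j: latency(j) + (w[i][j] if L[i] != j else 0))
            cost = latency(best_j) + (w[i][best_j] if L[i] != best_j else 0)
            if cost < here:
                L[i], improved, steps = best_j, True, steps + 1
    return L, steps

w = [[2, 1, 9], [1, 3, 2], [4, 1, 1]]
ne, steps = selfish_steps(w, [0, 0, 0])
print("pure Nash equilibrium:", ne, "after", steps, "selfish steps")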
6 Routing in the Wardrop-Model
The problem of routing splittable traffics in a congested network has been studied since the 1950s [41,3]. In contrast to the notion of mixed Nash equilibria in the previous sections, here flows may be split among different paths. The model can be interpreted as a game with infinitely many users, each carrying an infinitesimally small amount of traffic. A Nash equilibrium is defined to be a state in which no such user has an incentive to unilaterally deviate from its chosen path. In the Wardrop-model the following characterization of a Nash equilibrium, occasionally called a Wardrop equilibrium, can be obtained:

Theorem 24 ([41]). A flow f is a Nash equilibrium iff for every i ∈ [k] and P1, P2 ∈ Pi with f_{P1} > 0 we have l_{P1}(f) ≤ l_{P2}(f).

Thus, in a Nash equilibrium all paths P ∈ Pi which are used by user i have equal latency, say Li(f). Then C(f) = Σ_{i=1}^{k} Li(f)·ri. The main differences to the KP-model are the following:
a) The social cost of a flow is defined to be the sum of the edge latencies, as opposed to the maximum of the edge latencies in the KP-model.
b) Traffic rates and flows may be split arbitrarily. Equivalently, the Wardrop-model can be viewed as a system of infinitely many users, each controlling an infinitesimally small amount of traffic. In the Wardrop-model the fact that flows are splittable into arbitrarily small pieces guarantees the existence of a Nash equilibrium.
c) Edge latencies are defined by using quite general, but not arbitrary, edge latency functions, giving the latency on an edge per unit of flow. If all edge latency functions le(fe) are of the form le(fe) = ae·fe, the definition of latency coincides with the definition of latency in the KP-model with related links.
d) Due to the differences in the definition of latencies (a) and due to the fact that all Nash equilibria in the Wardrop-model are pure (b), the semantics of the coordination ratio is different in the two models.

In the Wardrop-model optimal flows can be computed as the solution of the following non-linear program:

NLP:  min Σ_{e∈E} ce(fe)
s.t.  Σ_{P∈Pi} fP = ri        ∀i ∈ [k]
      fe = Σ_{P∈P: e∈P} fP    ∀e ∈ E
      fP ≥ 0                  ∀P ∈ P
From this non-linear program the following necessary and sufficient condition for a flow to be optimal can be derived [3] in the case of convex edge latencies ce(fe):

Theorem 25 ([3]). Let ce(fe) be convex for all e ∈ E. A flow f is optimal iff for every i ∈ [k] and P1, P2 ∈ Pi with f_{P1} > 0 we have c′_{P1}(f) ≤ c′_{P2}(f), where c′_P(f) = (d/dx) cP(f) denotes the derivative of cP(f).
Note that from the two characterizations of Nash equilibria and optimal flows, an optimal flow f for a problem instance (G, r, l) with convex edge latencies ce(fe) = le(fe)·fe can be regarded as a flow at Nash equilibrium with respect to the edge latencies c′e(fe) = le(fe) + l′e(fe)·fe. From this it follows that there always exists a Nash equilibrium in the Wardrop-model, and that, if f, f̃ are Nash equilibria, then C(f) = C(f̃). Roughgarden and Tardos proved the following bicriteria bound for the coordination ratio:

Theorem 26 ([36]). Let the edge latency functions le(fe) be continuous and nondecreasing for all e ∈ E. If f is a flow at Nash equilibrium for (G, r, l) and f∗ is a feasible flow for (G, (1 + δ)r, l), then C(f) ≤ (1/δ)·C(f∗). In particular (δ = 1), the cost of a flow at Nash equilibrium never exceeds the cost of a feasible flow for the problem instance obtained by doubling the traffic rates.
(Graphs omitted: Pigou's two parallel s-t edges with latencies x^p and 1; Braess's four-node network with edge latencies x and 1 and a zero-latency middle edge.)
Fig. 2. Pigou's example (left) and Braess's Paradox (right)
Pigou's example [35,3] (see Figure 2) can be used to show that the bound of Theorem 26 is tight: in the graph G with two nodes s, t and two edges from s to t with edge latency functions l_{e1}(x) = 1 and l_{e2}(x) = x^p, the optimal solution to route a total of one unit of flow from s to t is to route a fraction of (p + 1)^{-1/p} units along e2 and a flow of 1 − (p + 1)^{-1/p} along edge e1. The social cost of the optimal solution tends to zero when p tends to infinity. The unique Nash equilibrium in Pigou's example is to route all flow along edge e2, with social cost 1. For a fixed δ > 0 and any ε ∈ [0, δ[, p can be chosen large enough such that C(f∗) < δ for an optimal flow f∗ of (G, 1 + δ − ε, l) and such that C(f∗∗) is arbitrarily close to δ for an optimal flow f∗∗ of (G, 1 + δ, l). Then (1/δ)·C(f∗∗) is arbitrarily close to C(f) = 1. In the Wardrop-model the coordination ratio cannot be bounded from above by a constant when arbitrary edge latency functions are allowed. However, when the edge latency functions le are restricted to linear ones, le(fe) = ae·fe + be for some ae, be ≥ 0, the coordination ratio is bounded from above by 4/3:
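The numbers in Pigou's example are easy to reproduce. The following sketch (ours) evaluates the Nash cost and the optimal cost for one unit of flow; at p = 1 the ratio is exactly 4/3, matching Theorem 27 below, and the optimal cost vanishes as p grows.

def pigou_costs(p):
    # one unit of flow; l_e1(x) = 1, l_e2(x) = x^p
    x = (p + 1) ** (-1 / p)          # optimal amount of flow on e2
    opt = x * x ** p + (1 - x) * 1   # C(f*) = sum_e l_e(f_e) * f_e
    return 1.0, opt                  # (Nash cost, optimal cost)

for p in (1, 2, 10, 100, 1000):
    nash, opt = pigou_costs(p)
    print(f"p={p}: C(Nash)={nash}, C(OPT)={opt:.4f}, ratio={nash / opt:.2f}")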
Theorem 27 ([36]). If (G, r, l) has linear edge latency functions le(fe) = ae·fe + be, then

WC(G, r, l) ≤ (4/3) · OPT(G, r, l).

A problem instance based on Braess's Paradox [4] (see Figure 2) is used to show that the bound from the above theorem is tight. Suppose one unit of flow is to be routed from s to t. The globally optimal solution is to route 1/2 along the upper path and 1/2 along the lower path, resulting in an overall social cost of 3/2. The unique Nash equilibrium in this environment is obtained when the total traffic is routed via the edge connecting the two intermediate nodes, resulting in a social cost of 2. The coordination ratio in this example is 4/3. The interesting fact justifying the title "Paradox" is that after removing the edge with latency 0, the globally optimal solution does not change and the unique Nash equilibrium coincides with the global optimal solution. The network performs better after deletion of an edge. Motivated by Braess's Paradox, Roughgarden [37] studied a natural network design problem:

NETWORK DESIGN
INSTANCE: (G, r, l) with a network G = (V, E), rates r and latencies l.
OUTPUT: A subgraph H = (V, EH) ⊆ G such that a flow f at Nash equilibrium for (H, r, l) has minimum social cost over all possible subgraphs H.

In the right graph of Figure 2, deleting the edge with latency zero leads to a flow at Nash equilibrium which equals the globally optimal solution. For linear latency functions it follows from Theorem 27 that the trivial algorithm, which does not remove any edge, is a 4/3-approximation. Roughgarden proved the following:

Theorem 28 ([37]). If (G, r, l) has linear edge latency functions le(fe) = ae·fe + be, then for any ε > 0 there is no (4/3 − ε)-approximation algorithm for NETWORK DESIGN unless P = NP.

For a more general class of latency functions Roughgarden proved:

Theorem 29 ([37]). If (G, r, l) with G = (V, E) and |V| = n has continuous, nonnegative and nondecreasing edge latency functions, then, for any ε > 0, there is no (n/2 − ε)-approximation algorithm for NETWORK DESIGN unless P = NP.

Moreover, Roughgarden showed that the approximation quality of the trivial algorithm, which returns the entire graph G, is upper bounded by n/2 and that there are instances (generalizations of the Braess graph) for which the trivial algorithm returns an n/2-approximation. Recently, Roughgarden [38] proved that the worst coordination ratio that may occur for a network is equal to the coordination ratio of some simple 2-node
2-edge network. In the example of Pigou the flow at Nash equilibrium incurs 1 unit of social cost, whereas the optimal flow has social cost that tends to zero at a rate of Θ((log p)/p). Thus, as the latency function x^p gets "steeper", the coordination ratio for Pigou's example grows to infinity. The main result of Roughgarden [38] bounds the coordination ratio of a problem instance (G, r, l) by a measure of the steepness of the latency functions allowed. This bound is independent of the network topology of G. A class of nonnegative, differentiable and nondecreasing latency functions L is called standard if it contains a non-zero function and if for each l ∈ L the function cl(x) = x · l(x) is convex on [0, ∞[. For a non-zero latency function l such that cl(x) = x · l(x) is convex on [0, ∞[, let c′l(x) = (d/dx) cl(x) = (d/dx)(x · l(x)) and define the anarchy value α(l) of l as

α(l) = sup_{r>0: l(r)>0} 1 / (λµ + (1 − λ))
where λ ∈ ]0, 1[ satisfies c′l(λr) = l(r) and µ ∈ [0, 1] is defined as µ = l(λr)/l(r). Given the anarchy values of single latency functions, the anarchy value α(L) of a standard class of latency functions L is defined as α(L) = sup_{0≠l∈L} α(l). Let ρ(G, r, l) denote the ratio between the social cost of a Nash equilibrium and the social cost of an optimal flow for (G, r, l). Since in the Wardrop-model all Nash equilibria have equal social cost, ρ(G, r, l) is well defined and expresses the coordination ratio of the problem instance (G, r, l). That α(L) is always an upper bound on the coordination ratio is stated in the following.

Theorem 30 ([38]). Let L be a standard class of latency functions with anarchy value α(L). Let (G, r, l) be a problem instance with latency functions drawn from L. Then ρ(G, r, l) ≤ α(L).

Based on Pigou's example, Roughgarden constructed instances of the routing problem such that their coordination ratio approaches the anarchy value:

Lemma 8 ([38]). Given a standard class L of latency functions containing the constant functions, there are instances (G, r, l), where G is a network of two nodes and two links and the latency functions l are drawn from L, such that ρ(G, r, l) is arbitrarily close (from below) to α(L).

Lemma 8 and Theorem 30 can be combined to achieve

Theorem 31 ([38]). Let G2 denote the graph with one source node, one sink node, and two edges directed from source to sink. Let L be a standard class of latency functions containing the constant functions. Let I denote the set of all instances (G, r, l) with latency functions in L and I2 ⊆ I the instances with underlying network G2. Then

sup_{(G2,r,l)∈I2} ρ(G2, r, l) = α(L) = sup_{(G,r,l)∈I} ρ(G, r, l)
An application of this theory to the set of polynomials, taken as a standard class of latency functions, results in the following.

Theorem 32 ([38]). Let Lp be the set of polynomials of degree at most p (with nonnegative coefficients), and let Ip be the set of instances with latency functions in Lp. Then

sup_{(G,r,l)∈Ip} ρ(G, r, l) = 1 / (1 − p·(p + 1)^{-(p+1)/p}) = Θ(p / log p)
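Theorem 32 can be sanity-checked against the definition of the anarchy value: for l(x) = x^p, the equation c′l(λr) = l(r) gives λ = (p + 1)^{-1/p} and µ = λ^p, independently of r, and plugging these into α(l) reproduces the closed form above. A small script (our own) making the comparison:

from math import log

def alpha_xp(p):
    lam = (p + 1) ** (-1 / p)   # solves c_l'(lam * r) = l(r), for any r > 0
    mu = lam ** p               # l(lam * r) / l(r)
    return 1 / (lam * mu + (1 - lam))

for p in (1, 2, 10, 100, 1000):
    closed = 1 / (1 - p * (p + 1) ** (-(p + 1) / p))
    growth = (p / log(p)) if p > 1 else float('nan')
    print(f"p={p}: alpha={alpha_xp(p):.4f}, closed form={closed:.4f}, "
          f"p/ln p={growth:.2f}")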
Roughgarden generalized the above mentioned results on the coordination ratio for sets of latency functions that do not contain all constant functions and found similar results based on graphs with one source, one sink, and m parallel edges from the source to the sink.
References

1. M.J. Beckmann. On the theory of traffic flow in networks. Traffic Quart., 21:109–116, 1967.
2. P. Brucker, J. Hurink, and F. Werner. Improving local search heuristics for some scheduling problems. Part II. Discrete Applied Mathematics, 72:47–69, 1997.
3. M. Beckmann, C.B. McGuire, and C.B. Winsten. Studies in the Economics of Transportation. Yale University Press, 1956.
4. D. Braess. Über ein Paradoxon der Verkehrsplanung. Unternehmensforschung, 12:258–268, 1968.
5. Y. Cho and S. Sahni. Bounds for list schedules on uniform processors. SIAM Journal on Computing, 9(1):91–103, 1980.
6. A. Czumaj and B. Vöcking. Tight bounds for worst-case equilibria. In Proc. of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'02), pages 413–420, 2002.
7. S.C. Dafermos and F.T. Sparrow. The traffic assignment problem for a general network. Journal of Research of the National Bureau of Standards, Series B, 73B(2):91–118, 1969.
8. E. Even-Dar, A. Kesselmann, and Y. Mansour. Convergence time to Nash equilibria. In Proc. of the 30th International Colloquium on Automata, Languages, and Programming (ICALP'03), 2003.
9. R. Feldmann, M. Gairing, T. Lücking, B. Monien, and M. Rode. Nashification and the coordination ratio for a selfish routing game. In Proc. of the 30th International Colloquium on Automata, Languages, and Programming (ICALP'03), 2003.
10. G. Finn and E. Horowitz. A linear time approximation algorithm for multiprocessor scheduling. BIT, 19:312–320, 1979.
11. D. Fotakis, S. Kontogiannis, E. Koutsoupias, M. Mavronicolas, and P. Spirakis. The structure and complexity of Nash equilibria for a selfish routing game. In Proc. of the 29th International Colloquium on Automata, Languages, and Programming (ICALP'02), pages 123–134, 2002.
12. J. Feigenbaum, C. Papadimitriou, and S. Shenker. Sharing the cost of multicast transmissions. In Proc. of the 32nd Annual ACM Symposium on the Theory of Computing, pages 218–227, 2000.
13. D.K. Friesen. Tighter bounds for LPT scheduling on uniform processors. SIAM Journal on Computing, 16(3):554–560, 1987.
14. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.
15. M. Gairing, T. Lücking, M. Mavronicolas, B. Monien, and P. Spirakis. Extreme Nash equilibria. Technical report FLAGS-TR-03-10, University of Paderborn, 2002.
16. R.L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal of Applied Mathematics, 17(2):416–429, 1969.
17. A. Haurie and P. Marcotte. On the relationship between Nash-Cournot and Wardrop equilibria. Networks, 15:295–308, 1985.
18. D.S. Hochbaum and D. Shmoys. Using dual approximation algorithms for scheduling problems: Theoretical and practical results. Journal of the ACM, 34(1):144–162, 1987.
19. D.S. Hochbaum and D. Shmoys. A polynomial approximation scheme for scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing, 17(3):539–551, 1988.
20. O. Jahn, R.H. Möhring, A.S. Schulz, and N.E. Stier Moses. System-optimal routing of traffic flows with user constraints in networks with congestion. MIT Sloan School of Management Working Paper No. 4394-02, 2002.
21. K. Jain and V. Vazirani. Applications of approximation algorithms to cooperative games. In Proc. of the 33rd Annual ACM Symposium on Theory of Computing (STOC'01), pages 364–372, 2001.
22. Y.A. Korilis, A.A. Lazar, and A. Orda. Architecting noncooperative networks. IEEE Journal on Selected Areas in Communications, 13(7):1241–1251, 1995.
23. E. Koutsoupias and C. Papadimitriou. Worst-case equilibria. In Proc. of the 16th International Symposium on Theoretical Aspects of Computer Science (STACS'99), pages 404–413, 1999.
24. E. Koutsoupias, M. Mavronicolas, and P. Spirakis. Approximate equilibria and ball fusion. In Proc. of the 9th International Colloquium on Structural Information and Communication Complexity (SIROCCO'02), 2002 (accepted for TOCS).
25. T. Lücking, M. Mavronicolas, B. Monien, M. Rode, P. Spirakis, and I. Vrto. Which is the worst-case Nash equilibrium? In Proc. of the 28th International Symposium on Mathematical Foundations of Computer Science (MFCS'03), 2003.
26. J.K. Lenstra, D.B. Shmoys, and E. Tardos. Approximation algorithms for scheduling unrelated parallel machines. In Proc. of the 28th Annual Symposium on Foundations of Computer Science (FOCS'87), pages 217–224, 1987.
27. R.D. McKelvey and A. McLennan. Computation of equilibria in finite games. In H. Amman, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics, 1996.
28. M. Mavronicolas and P. Spirakis. The price of selfish routing. In Proc. of the 33rd Annual ACM Symposium on Theory of Computing (STOC'01), pages 510–519, 2001.
29. J. Nash. Non-cooperative games. Annals of Mathematics, 54(2):286–295, 1951.
30. N. Nisan. Algorithms for selfish agents. In Proc. of the 16th International Symposium on Theoretical Aspects of Computer Science (STACS'99), pages 1–15, 1999.
31. N. Nisan and A. Ronen. Algorithmic mechanism design. In Proc. of the 31st ACM Symposium on Theory of Computing (STOC'99), pages 129–140, 1999.
32. M.J. Osborne and A. Rubinstein. A Course in Game Theory. The MIT Press, 1994.
33. C.H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3):498–532, 1994.
34. C.H. Papadimitriou. Algorithms, games, and the internet. In Proc. of the 33rd Annual ACM Symposium on Theory of Computing (STOC'01), pages 749–753, 2001.
35. A.C. Pigou. The Economics of Welfare. Macmillan, 1920.
36. T. Roughgarden and E. Tardos. How bad is selfish routing? Journal of the ACM, 49(2):236–259, 2002.
37. T. Roughgarden. Designing networks for selfish users is hard. In Proc. of the 42nd Annual IEEE Symposium on Foundations of Computer Science (FOCS'01), pages 472–481, 2001.
38. T. Roughgarden. The price of anarchy is independent of the network topology. In Proc. of the 34th Annual ACM Symposium on Theory of Computing (STOC'02), pages 428–437, 2002.
39. A.S. Schulz and N.E. Stier Moses. On the performance of user equilibria in traffic networks. MIT Sloan School of Management Working Paper No. 4274-02, 2002.
40. P. Schuurman and T. Vredeveld. Performance guarantees of local search for multiprocessor scheduling. In Proc. of the 8th Conference on Integer Programming and Combinatorial Optimization (IPCO'01), pages 370–382, 2001.
41. J.G. Wardrop. Some theoretical aspects of road traffic research. In Proc. of the Institute of Civil Engineers, Pt. II, Vol. 1, pages 325–378, 1952.
Process Algebraic Frameworks for the Specification and Analysis of Cryptographic Protocols

Roberto Gorrieri¹ and Fabio Martinelli²

¹ Dipartimento di Scienze dell'Informazione, Università di Bologna, Italy
² Istituto di Informatica e Telematica C.N.R., Pisa, Italy
Abstract. Two process algebraic approaches for the analysis of cryptographic protocols, namely the spi calculus by Abadi and Gordon and CryptoSPA by Focardi, Gorrieri and Martinelli, are surveyed and compared. We show that the two process algebras have comparable expressive power, by providing an encoding of the former into the latter. We also discuss the relationships among some security properties, i.e., authenticity and secrecy, that have different definitions in the two approaches.
1 Introduction
Security protocols are those protocols that accomplish security goals, such as preserving the secrecy of a piece of information during a protocol run or establishing the integrity of the transmitted information. Cryptographic protocols are those security protocols running over a public network that use cryptographic primitives (e.g., encryption and digital signatures) to achieve their security goals. In the analysis of cryptographic protocols, one has to cope with the insecurity of the network. So, it is assumed that an attacker (sometimes called enemy or intruder) of the protocol has complete control over the communication medium. On the other hand, to make the analysis less intricate, perfect cryptography is also usually assumed, i.e., such an enemy is not able to perform cryptanalytic attacks: an encrypted message can be decrypted by the enemy only if he knows (or is able to learn) the relevant decryption key. Such an analysis scenario is often referred to as the Dolev-Yao approach [10]. Because of the above, cryptographic protocols are difficult to analyse and to prove correct. Indeed, a lot of them have flaws or inaccuracies. As a well-known example, we mention that Lowe [22] pointed out an inaccuracy in an authentication protocol by Needham and Schroeder [31]. Hence the need for a formal approach to the analysis of cryptographic protocols. The spi calculus [3], proposed by Abadi and Gordon, and CryptoSPA
Work partially supported by MURST Progetto “Metodi Formali per la Sicurezza” (MEFISTO); IST-FET project “Design Environments for Global ApplicationS (DEGAS)”; Microsoft Research Europe; by CNR project “Tecniche e Strumenti Software per l’analisi della sicurezza delle comunicazioni in applicazioni telematiche di interesse economico e sociale” and by a CSP grant for the project “SeTAPS II”.
[17,16], proposed by the authors in joint work with R. Focardi, are two well-known possible answers. The goal of this paper is to show similarities and differences of these two approaches from the point of view of both modeling and analysis. A small running example is used throughout the paper in order to illustrate the basic features of the two approaches. The basic idea is that, in order to analyse a protocol, one has to begin by modeling it as a program of the calculus. At first sight, there are some differences between the spi model and the CryptoSPA one. In particular, the spi calculus, being based on the π calculus, is apparently more expressive as it handles mobility (of channels) as a first class primitive of the language. On the other hand, we show that it is possible to define an encoding from the spi calculus to CryptoSPA that preserves a rather strong notion of equivalence. The core idea of the encoding is that name generation in spi can be simulated by means of a suitable process in CryptoSPA that uses the inference system hidden inside the language. Moreover, the spi calculus offers the possibility to describe secret pieces of information inside the syntax, by means of the restriction operator. On the contrary, in CryptoSPA these secrets are to be specified separately, as limitations on the knowledge of the enemy that tries to attack the protocol. A major difference between the two approaches can be summarised by the motto: Contextual equivalence vs Equivalence of contexts. In the spi calculus the properties of secrecy and message authenticity are expressed as the equivalence of systems, where the notion of equivalence used is may testing: it is based on the idea that two systems are equivalent if they cannot be distinguished by an external observer. According to the spi calculus approach, the tester plays at the same time the role of observer and attacker of the protocol; hence, elegantly, spi includes the notion of external enemy inside the definition of the semantics of the calculus by using a contextual equivalence. Indeed, the two processes must exhibit the same observable behavior w.r.t. any context (the observer). On the contrary, in CryptoSPA the properties of secrecy and authenticity (or integrity) are formulated as instances of the following general form:
∀X ∈ ECφI : (S | X) \ C ∼ α(S)
where X is any process in the set ECφI of admissible enemies, C is the set of communication channels, ∼ is a behavioural semantics (actually, trace semantics for our purpose) and α(S) is the correct specification of S when run in isolation. The equation above amounts to saying that the behaviour of system S when exposed to any enemy X is the same as the correct behaviour of S. Hence, such properties are expressed as a form of equivalence of contexts: a closed system, i.e. α(S), is compared with an open system (S | •) \ C, and the comparison takes the form of an infinity of checks between closed systems, one for each possible enemy X. (Actually, nothing prevents ∼ from being itself a contextual equivalence, though trace equivalence is the usual relation used in the CryptoSPA approach.) We will show that the latter approach is more flexible, by providing an example of an attack scenario where there are several enemies with different capabilities, which can be naturally treated in the CryptoSPA approach. On the contrary,
the spi approach is appropriate for modeling a scenario where there is one single enemy, as in the Dolev-Yao approach. The final part of the paper is devoted to showing similarities and differences between secrecy and authenticity as defined in the two approaches. We show that, in spite of the many technical differences, the notion of spi authenticity is the same as the notion of integrity in CryptoSPA. On the contrary, the two notions of secrecy are quite different. A short summary of other process algebraic approaches to the analysis of cryptographic protocols concludes the paper.
2 The Spi Calculus
The spi calculus [3] is a version of the π calculus [30] equipped with abstract cryptographic primitives, e.g. primitives for perfect encryption and decryption. Names represent encryption keys as well as communication channels. Here we give a short overview of the main features of the calculus, by presenting a simple version with asynchronous communication and shared-key cryptography. The interested reader can find more details in [3] and [20] (a tutorial on the subject that has inspired the current short survey).
2.1 Syntax and Reduction Semantics
In this section we briefly recall some basic concepts about the asynchronous spi calculus with shared-key cryptography. The choice of the asynchronous version is inessential for the results of the paper and is only made for simplicity. The restriction to shared-key cryptography is for the sake of simplicity, too. Given a countable set of names N (ranged over by a, b, . . . , n, m, . . .) and a countable set of variables V (ranged over by x, y, . . .), the set of terms is defined by the grammar:

M, N ::= m | x | (M, N) | {M}N

with the proviso that in (M, N) and {M}N the term M (and similarly N) can be either a ground term (i.e., without variable occurrences) or simply a variable. The set of spi calculus processes is defined by the BNF-like grammar:

P, Q ::= 0 | M⟨N⟩ | M(x).P | (νn)P | P | Q | [M = N]P else Q | A⟨M1, . . . , Mn⟩ | let (x, y) = M in P else Q | case M of {x}N in P else Q

The name n is bound in the term (νn)P. In M(x).P the variable x is bound in P. In let (x, y) = M in P else Q the variables x and y are bound in P. In case M of {x}N in P else Q the variable x is bound in P. The set fn(P) of free names of P is defined as usual. We give an intuitive explanation of the operators of the calculus:
– 0 is the stuck process that does nothing.
– M⟨N⟩ is the output construct. It denotes a communication on the channel M of the term N.
– M(x).P is the input construct. A message is received on the channel M and its value is substituted for the free occurrences of x in P.
– (νn)P is the process that makes a new, private name n for P, and then behaves as P.
– P | Q is the parallel composition of two processes P and Q. Each may interact with the other on channels known to both, or with the outside world, independently of the other.
– [M = N]P else Q is the match construct. The process behaves as P when M = N, otherwise it behaves as Q.
– A⟨M1, . . . , Mn⟩ is a process constant. We assume that constants are equipped with a constant definition like A⟨x1, . . . , xn⟩ ≐ P, where the free variables of P are contained in {x1, . . . , xn}.
– let (x, y) = M in P else Q is the pair splitting process. If the term M is of the form (N, L), then it behaves as P[N/x][L/y]; otherwise, it behaves as Q.
– case M of {x}N in P else Q is the decryption process. If M is of the form {L}N, then the process behaves as P[L/x]; otherwise, it behaves as Q.

We also define structural congruence as follows. Let ≡ be the least congruence relation over processes closed under the following rules:
1. P ≡ Q, if P is obtained through α-conversion from Q
2. P | 0 ≡ P
3. P | Q ≡ Q | P
4. P | (Q | R) ≡ (P | Q) | R
5. (νn)0 ≡ 0
6. (νn)(νm)P ≡ (νm)(νn)P
7. (νn)(νn)P ≡ (νn)P
8. (νn)M⟨N⟩ ≡ M⟨N⟩, if n ∉ fn(M) ∪ fn(N)
9. (νn)M(x).P ≡ M(x).(νn)P, if n ∉ sort(M)
10. (νn)(P | Q) ≡ P | (νn)Q, if n ∉ fn(P)
11. [M = M′]P else Q ≡ P, if M = M′
12. [M = M′]P else Q ≡ Q, if M ≠ M′ and M, M′ are ground
13. let (x, y) = (M, N) in P else Q ≡ P[M/x][N/y]
14. let (x, y) = M in P else Q ≡ Q, if M ≠ (N, N1) for any N, N1, and M ∉ V
15. case {M}N of {x}N in P else Q ≡ P[M/x]
16. case M of {x}N in P else Q ≡ Q, if M ≠ {N′}N for any N′, and M ∉ V
17. A⟨M1, . . . , Mn⟩ ≡ P[M1/x1, . . . , Mn/xn], when A⟨x1, . . . , xn⟩ ≐ P.
We give the reduction semantics for the asynchronous spi calculus. Processes communicate among them by exchanging messages. An internal communication (or reduction) of the process P is denoted by P −→ P′. We have the following rules for calculating the reduction relation between processes:

m⟨N⟩ | m(x).P −→ P[N/x]

if P ≡ Q, Q −→ Q′ and Q′ ≡ P′, then P −→ P′

if P −→ P′, then (νn)P −→ (νn)P′

if P −→ P′, then P | Q −→ P′ | Q
2.2 May Testing Semantics
May testing equivalence [9] is the equivalence notion that is used in the spi calculus to define the security properties. In order to define this equivalence, we first define a predicate that describes the channels on which a process can communicate. We let a barb β be an output channel m̄. For a closed process P, we define the predicate P exhibits barb β, written P ↓ β, by the following rules:

m⟨N⟩ ↓ m̄     if P ↓ β, then P | Q ↓ β     if P ↓ β and β ∉ {m, m̄}, then (νm)P ↓ β     if P ≡ Q and Q ↓ β, then P ↓ β

Intuitively, P ↓ β holds if P may output immediately along β. The convergence predicate P ⇓ β holds if P exhibits β after some reductions:

if P ↓ β, then P ⇓ β     if P −→ Q and Q ⇓ β, then P ⇓ β

A test consists of any closed process R and any barb β. A closed process P passes the test if and only if (P | R) ⇓ β. May testing equivalence is then defined on the set of closed processes as follows:

P ≈may Q  ⟺  for any test (R, β), (P | R) ⇓ β if and only if (Q | R) ⇓ β

May testing has been chosen because it corresponds to partial correctness (or safety), and security properties are often safety properties. Moreover, a test neatly formalises the idea of a generic experiment or observation that another process (such as an attacker) might perform on a process. So testing equivalence captures the concept of equivalence in an arbitrary environment; as a matter of fact, may-testing equivalence is a contextual equivalence.
3 CryptoSPA
Cryptographic Security Process Algebra (CryptoSPA for short) is a slight modification of the CCS process algebra [29], adapted to the description of cryptographic protocols. It makes use of cryptography-oriented modeling constructs and can deal with confidential values [15,17,26]. The CryptoSPA model consists of a set of sequential agents able to communicate by exchanging messages. The data handling part of the language consists of a set of inference rules used to deduce messages from other messages. We consider a set of relations among closed messages of the form ⊢r ⊆ Pfin(M) × M, where r is the name of the rule. Given a set R of inference rules, we consider the deduction relation DR ⊆ Pfin(M) × M: given a finite set of closed messages, say φ, we have (φ, M) ∈ DR if M can be derived by iteratively applying the rules in R. For the sake of simplicity, we assume that ⊢r (for each r ∈ R) and DR are decidable.
3.1 The Language Syntax
CryptoSPA syntax is based on the following elements:
– a set Ch of channels, partitioned into a set I of input channels (ranged over by c) and a set O of output channels (ranged over by c̄, the output corresponding to the input c);
– a set Var of variables, ranged over by x;
– a set M of messages, defined as above for the spi calculus, ranged over by M, N (and by m, n, with abuse of notation, to denote closed messages).

The set L of CryptoSPA terms (or processes) is defined as follows:

P, Q ::= 0 | c(x).P | c̄M.P | τ.P | P | Q | P \ L | A(M1, . . . , Mn) | [M1, . . . , Mr ⊢rule x]P; Q

where M, M1, . . . , Mr are messages or variables and L is a set of channels. Both the operators c(x).P and [M1, . . . , Mr ⊢rule x]P; Q bind the variable x in P. We assume the usual conditions about closed and guarded processes, as in [29]. We call P the set of all the CryptoSPA closed and guarded terms. The set of actions is Act = {c(M) | c ∈ I} ∪ {c̄M | c̄ ∈ O} ∪ {τ} (τ is the internal, invisible action), ranged over by a. We define sort(P) to be the set of all the channels syntactically occurring in the term P. Moreover, for the sake of readability, we always omit the termination 0 at the end of process specifications, e.g. we write a in place of a.0. We give an informal overview of the CryptoSPA operators:
– 0 is a process that does nothing.
– c(x).P represents the process that can get an input M on channel c, behaving like P[M/x].
– c̄m.P is the process that can send m on channel c, and then behaves like P.
– τ.P is the process that executes the invisible action τ and then behaves like P.
– P1 | P2 (parallel) is the parallel composition of processes that proceed in an asynchronous way but must synchronize on complementary actions to make a communication, represented by a τ.
– P \ L is the process that cannot send and receive messages on channels in L; for all the other channels, it behaves exactly like P.
– A(M1, . . . , Mn) behaves like the respective defining term P where all the variables x1, . . . , xn are replaced by the messages M1, . . . , Mn.
– [M1, . . . , Mr ⊢rule x]P; Q is the construct used to model message manipulation such as cryptographic operations. The process [M1, . . . , Mr ⊢rule x]P; Q tries to deduce a message z from the tuple M1, . . . , Mr through an application of rule ⊢rule; if it succeeds, then it behaves like P[z/x], otherwise it behaves as Q. The set of rules that can be applied is defined through an inference system (e.g., see Figure 1 for an instance).
Roberto Gorrieri and Fabio Martinelli m m (pair ) (m, m )
(m, m ) (f st ) m
m k (enc ) {m}k
{m}k m
(m, m ) (snd ) m k
(dec )
Fig. 1. An example inference system for shared key cryptography
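The deduction relation DR generated by these five rules is the classical Dolev-Yao closure, and for this rule set it can be decided by first closing a knowledge set under the destructor rules ⊢fst, ⊢snd and ⊢dec, and then checking whether the target can be synthesized with ⊢pair and ⊢enc. The sketch below is our own illustration (messages are encoded as nested Python tuples, and keys are assumed atomic, which is what makes the two-phase check complete):

def analyze(phi):
    """Close a finite set of messages under the destructor rules fst, snd, dec."""
    known = set(phi)
    changed = True
    while changed:
        changed = False
        for t in list(known):
            new = set()
            if isinstance(t, tuple) and t[0] == 'pair':
                new = {t[1], t[2]}                      # rules fst and snd
            elif isinstance(t, tuple) and t[0] == 'enc' and t[2] in known:
                new = {t[1]}                            # rule dec (key known)
            if not new <= known:
                known |= new
                changed = True
    return known

def derives(phi, m):
    """(phi, m) in D_R: analyze, then synthesize with pair and enc."""
    known = analyze(phi)
    def synth(t):
        return (t in known or
                (isinstance(t, tuple) and t[0] in ('pair', 'enc')
                 and synth(t[1]) and synth(t[2])))
    return synth(m)

# m is learnt from {m}_k only when the key k is deducible:
assert derives({('enc', 'm', 'k'), 'k'}, 'm')
assert not derives({('enc', 'm', 'k')}, 'm')
assert derives({('pair', 'k', 'a')}, ('enc', 'a', 'k'))   # build {a}_k from a pair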
3.2 The Operational Semantics of CryptoSPA
In order to model message handling and cryptography we use a set of inference rules. Note that CryptoSPA syntax, its semantics and the results obtained are completely parametric with respect to the inference system used. We present in Figure 1 an instance inference system, with rules: to combine two messages obtaining a pair (rule ⊢pair); to extract one message from a pair (rules ⊢fst and ⊢snd); to encrypt a message m with a key k obtaining {m}k; and, finally, to decrypt a message of the form {m}k only with the same key k (rules ⊢enc and ⊢dec, respectively). In a similar way, inference systems can contain rules for handling the basic arithmetic operations and boolean relations among numbers, so that the value-passing CCS if-then-else construct can be obtained via the rule operator.

Example 1. Natural numbers may be encoded by assuming a single value 0 and a function S(y), with the rule x ⊢inc S(x). Similarly, we can define summations and other operations on natural numbers.

Example 2. We do not explicitly define an equality check among messages in the syntax. However, this can be implemented through the usage of the inference construct. E.g., consider the rule x, x ⊢equal Equal(x, x). Then [m = m′]A (with the expected semantics) may be equivalently expressed as [m, m′ ⊢equal y]A, where y does not occur in A. Similarly, we can define inequalities, e.g. ≤, among natural numbers. More interestingly, this form of inference construct of CryptoSPA is also useful to model common access control mechanisms in distributed systems.

Example 3. Indeed, consider a set of credentials, i.e. (signed) messages containing information about access rights. Assume that {A, ob1, +}pr(C) means that the user C (via the signature with its private key pr(C)) asserts that A has the right to access the object ob1 and may grant this access to other users (this is denoted through the symbol +). A rule like
{A, ob1, +}pr(C),  pr(C),  {grant B, ob1}pr(A)  ⊢accC  {B, ob1, +}pr(C)
may be used by the controller C to issue other access right credentials, after receiving an indication by A, i.e. the signed message {grant B, ob1}pr(A). Thus, we may also consider the inference rules as an abstract mechanism to express security policies usually defined using other mathematical models and logics (e.g., see [21,34]).

The operational semantics of a CryptoSPA term is described by means of the labelled transition system (lts, for short) ⟨P, Act, {−a→}a∈Act⟩, where {−a→}a∈Act is the least relation between CryptoSPA processes induced by the axioms and inference rules of Figure 2:

(input): if m ∈ M, then c(x).P −c(m)→ P[m/x]
(output): c̄m.P −c̄m→ P
(internal): τ.P −τ→ P
(\L): if P −c(m)→ P′ and c ∉ L, then P\L −c(m)→ P′\L
(|1): if P1 −a→ P1′, then P1 | P2 −a→ P1′ | P2
(|2): if P1 −c(m)→ P1′ and P2 −c̄m→ P2′, then P1 | P2 −τ→ P1′ | P2′
(Def): if P[m1/x1, . . . , mn/xn] −a→ P′ and A(x1, . . . , xn) ≐ P, then A(m1, . . . , mn) −a→ P′
(D1): if P[m/x] −a→ P′ and m1, . . . , mr ⊢rule m, then [m1, . . . , mr ⊢rule x]P; Q −a→ P′
(D2): if Q −a→ Q′ and there is no m s.t. m1, . . . , mr ⊢rule m, then [m1, . . . , mr ⊢rule x]P; Q −a→ Q′

Fig. 2. Structured Operational Semantics for CryptoSPA (symmetric rules for |1, |2 and \L are omitted)

As a notation we also use P =⇒ P′ to denote that (P, P′) belongs to the reflexive and transitive closure of −τ→; P =γ⇒ P′ if γ is a finite sequence of actions a1 · · · an, with ai ≠ τ for 1 ≤ i ≤ n, such that P =⇒ −a1→ =⇒ · · · =⇒ −an→ =⇒ P′. Let Tr(P) denote the set {γ ∈ (Act \ {τ})∗ | ∃P′ : P =γ⇒ P′} of executable observable traces. We define the trace preorder ≤trace as follows: P ≤trace Q if Tr(P) ⊆ Tr(Q). We say that P and Q are trace equivalent, denoted P ∼tr Q, iff Tr(P) = Tr(Q).
4 Comparison in Expressiveness
We compare the two languages by providing an encoding from the spi calculus to CryptoSPA (actually, to a sublanguage). Basically, the encoding [·] preserves the equality of terms: assuming that ≈1 (≈2) is an equivalence relation on the spi calculus (on CryptoSPA), then

P ≈1 Q  ⟺  [P] ≈2 [Q]
The technical machinery in [35] about barbed equivalences will be of help, as ≈i, with i = 1, 2, will be weak barbed equivalences. Barbed equivalences are based on a minimal notion of observable, i.e. the barb. This makes them very suitable to provide natural equivalence notions in different languages in a uniform way. It is worthwhile noticing that these forms of equivalence are finer than may-testing (in their respective languages).

Barbed Equivalences. Given the predicate P exhibits a barb β, P ↓ β, it is possible to define in a standard way a set of useful process equivalences. We say that a symmetric relation R among processes is a barbed bisimulation if, whenever (P, Q) ∈ R:
– for all β, it holds that P ↓ β iff Q ↓ β;
– if P −→ P′ then ∃Q′ s.t. Q −→ Q′ and (P′, Q′) ∈ R.
The union of all barbed bisimulations, denoted by ∼, is a barbed bisimulation. There exists also a form of weak barbed bisimulation where, in the previous statements, the observable predicate is replaced by the weak one, i.e. P ⇓ β, and Q −→ Q′ is replaced by Q =⇒ Q′. We say that a symmetric relation R among processes is a barbed equivalence whenever, if (P, Q) ∈ R, then for each static (cf. [35]) context C[·] it holds that C[P] ∼ C[Q].

Our encoding works in two steps:
– The spi calculus is encoded, up to weak barbed equivalence, into a sublanguage, called the spires-calculus. In this sublanguage, the input operator is replaced by a new one: a process can receive only pairs on a public channel net that cannot be restricted. After receiving a pair, the process is obliged to check the first element of the pair against a given message. Only if the match is successful does the process proceed; otherwise it has to put the message back on the net and try with another pair.
– Then, the spires-calculus is encoded into CryptoSPAres, a similar variant of spires but on the CryptoSPA side. Basically, we encode the decryption, splitting and matching constructs through inference rules. Moreover, new name generation is simulated as the receiving action of a fresh message generated by a special process, called Gen.
4.1 An Encoding of Spi Calculus into spires-Calculus
The encoding [P]1 acts as a homomorphism on spi calculus processes, except for the output and input constructs M⟨N⟩ and M(x).P. In particular, we have the homomorphic [(νc)P]1 = (νc)[P]1; moreover, [M⟨N⟩]1 = net⟨(M, N)⟩, where net is a special channel name that cannot be restricted, and [M(x).P]1 = A, where the defining equation for A is

A ≐ net(x).let (z1, z2) = x in ([M = z1] [P[z2/x]]1 else (net⟨(z1, z2)⟩ | A)) else (net⟨x⟩ | A)

The encoding works as follows. Sent messages are encoded as pairs: the first element denotes the channel, and the second one the message itself. When a process wishes to receive a message on a certain channel, say M, it has to get
a pair from the network, and then it is obliged to check whether the first element of the pair is the channel; if so, the process proceeds as before (provided that the derivative is encoded), otherwise the pair that has been captured from the network is inserted again. Note that we cannot avoid that a communication happens on the channel net; however, in case the channel is not the expected one, the system returns to the original configuration. Consider the following example of a communication on a restricted channel.

Example 4. Suppose P = (νc)(c⟨n⟩ | c(x).x⟨x⟩). Then [P]1 is (νc)(net⟨(c, n)⟩ | A), where A is defined as

A ≐ net(x).let (z1, z2) = x in ([c = z1] net⟨(z2, z2)⟩ else (net⟨(z1, z2)⟩ | A)) else (net⟨x⟩ | A)

Now, P −→ (νc)(n⟨n⟩) and similarly [P]1 −→ (νc)(net⟨(n, n)⟩). Consider now the process Q = c1⟨n⟩. Then [P | Q]1 = [P]1 | [Q]1 = (νc)(net⟨(c, n)⟩ | A) | net⟨(c1, n)⟩. Note that [P | Q]1 −→ T by means of a synchronization on net, with T ≡ [P | Q]1:

[P | Q]1 = (νc)(net⟨(c, n)⟩ | A) | net⟨(c1, n)⟩
≡ (νc)(net⟨(c, n)⟩ | A | net⟨(c1, n)⟩)
−→ (νc)(net⟨(c, n)⟩ | ([c = c1] net⟨(n, n)⟩ else (net⟨(c1, n)⟩ | A)))
≡ (νc)(net⟨(c, n)⟩ | A | net⟨(c1, n)⟩)

Indeed, the encoded process may perform useless communications on the channel net; the crucial point is that such communications do not significantly change the status of the process. Thus, we may define a form of weak barbed equivalence among processes in spi and the ones obtained through [·]1, i.e. spires. The idea is that whenever P exhibits an output on a barb c, say P ↓ c, then [P]1 exhibits an output of a pair with first component c on the barb net, say [P]1 ↓ net⟨c, ∗⟩, and conversely. Moreover, if P performs a reduction then [P]1 must also perform it; on the contrary, if [P]1 performs a reduction, P may also choose to stay blocked. This encoding is clearly not satisfactory from the point of view of implementation, as it introduces divergence. However, it is useful when we are simply interested in verifying security properties, which usually depend on may testing equivalence. Indeed, weak barbed equivalence implies may testing equivalence.
We may encode the spires calculus into CryptoSPAres . For most operators, the [ ]2 function works as a homomorphism, e.g., [P | Q]2 = [P ]2 |[Q]2 . However, we must face two relevant problems: (i) the different forms of cryptography handling;(ii) the different treatment of new name generation. We deal first with the simpler one that is the treatment of cryptography. In CryptoSPA cryptographic primitives are modeled by means of the inference system. Hence, we can map the decryption construct of spi as follows: [case M of {x}N in P else Q]2 = [M N dec x][P ]2 ; [Q]2
56
Roberto Gorrieri and Fabio Martinelli
and similarly for the splitting construct and the matching one. For the second problem, we have to consider the restriction operator (new name generation). The restriction operator of the spi calculus is used to denote “secret” values known locally by the process. The treatment of such secret values is very elegant in the pi/spi calculus. Ultimately, the operator (νn)P defines a new fresh name n in P , that no one else should be ever able to create/guess. We encode these features through the usage of a specific process that creates new names. This process is the unique one allowed to generate such messages and it creates them iteratively. A CryptoSPA specification for such a process could be the following: Gen(x) = [x nonce y].geny.Gen(y) where the (omitted) rule for nonce creation could be as inc in Example 1. The encoding [ ]2 may map each νn(P ) construct to a receiving action, i.e. [νn(P )]2 = gen(x).[P ]2 This second encoding [ ]∗ will be actually the following [P ]∗ = [P ]2 | Gen(0), because we need one single instance of the name generator process. Note that restricted names are mapped to new names that are unguessable by the enemy because we assume that only process Gen can send along gen, hence ensuring that an enemy cannot eavesdrop new names sent from Gen. The encoding [ ]∗ is sound as we consider as observable the first component in a pair which is an output over net, taking care not to consider in CryptoSPAres the new (nonce) names (which correspond to restricted channels in spires ). We show a complete example of encoding from spi to CryptoSPAres . Example 5. Consider P = νc(cn). Then, [P ]∗ = gen(c).net(c, n) | Gen(0). Note that although [P ]∗ may perform a communication step, that cannot be matched by P , the observable behaviour is the same since the output of nonce values cannot be observed.
5 5.1
Comparison of the Two Approaches Protocol Analysis in the Spi Calculus
We show a very basic example. We have two principals A and B that use a public (hence insecure) channel, cAB , for communication; in order to achieve privacy, messages are encrypted with a shared key KAB . The protocol is simply that A sends along cAB a single message M to B, encrypted with KAB . A → B : {M }KAB on public cAB The spi calculus specification is as follows: A(M ) = cAB {M }KAB B = cAB (x).case x of {y}KAB in F (y) P (M ) = (νKAB )(A(M ) | B) where F (y) is the continuation of B. The fact that the channel cAB is public is witnessed by the fact that it is not restricted. On the other hand, KAB is
Process Algebraic Frameworks
57
restricted to model that it is a secret known only by A and B. When B receives {M }KAB on cAB , B attempts to decrypt it using KAB ; if this decryption succeeds, B applies F to the result. Two important properties hold for this protocol: – Authenticity (or integrity): B always applies F to the message M that A sends; an enemy cannot cause B to apply F to some other message M . – Secrecy: No information on the message M can be inferred by an observer while M is in transit from A to B: if F does not reveal M , then the whole protocol does not reveal M . Intuitively, the secrecy property should establish that if F (M ) is indistinguishable from F (M ), then the protocol with message M is indistinguishable from the protocol with message M . This intuition can be formulated in terms of equivalences as follows: if F (M ) ≈may F (M ), for any M , M , then P (M ) ≈may P (M ). Also integrity can be formalized in terms of an equivalence. This equivalence compares the protocol with another version of the protocol which is secure by construction. For this example, the required specification is: A(M ) = cAB {M }KAB Bspec (M ) = cAB (x).case x of {y}KAB in F (M ) Pspec (M ) = (νKAB )(A(M ) | Bspec (M )) The principal B is replaced with a variant Bspec (M ) that receives an input from A and then acts like B when B receives M . Bspec (M ) is a sort of “magical” version of B that knows the message M sent by A, hence ensuring integrity by construction. Therefore, we take the following equivalence as our integrity property: P (M ) ≈may Pspec (M ), for any M . 5.2
Protocol Analysis in CryptoSPA
In this section we want to show how to use CryptoSPA for the analysis of cryptographic protocols. The following subsections are devoted (i) to illustrate how security properties can be specified by decorating suitably protocol specification, then (ii) to discuss the actual definition of admissible attackers and, finally, (iii) to show how secrecy and integrity can be modeled in this framework. Noninterference for Cryptographic Protocols Analysis. Noninterference essentially says that a system P is secure if its low behaviour in isolation is the same as its low behaviour when exposed to the interaction with any high level process Π. Analogously, we may think that a protocol P is secure if its (low) behaviour is the same as its (low) behaviour when exposed to the possible attacks of any intruder X. To set up the correspondence, this analogy forces to consider the enemies as the high processes. Since the enemy has complete control over the communication medium, the CryptoSPA public channels in set C (i.e., the names used for
58
Roberto Gorrieri and Fabio Martinelli
message exchange) are the high level actions while the private channels in set (I ∪O)\C are the low level ones. As a protocol specification is usually completely given by message exchanges, it may be not obvious what are the low level actions. In our approach, they are extra observable actions that are included into the protocol specification to observe properties of the protocol. Of course, the choice of these extra actions (and the place into the specification where they are to be inserted) is property dependent. Considering the example A → B : {M }KAB on public cAB the basic integrity property that we want to model can be obtained by enriching the protocol specification with a (low) extra action received(M ) that B performs when receiving the message M . Hence, the protocol specification is: A(M ) = cAB {M }KAB B = cAB (x).[{M }KAB , KAB dec y].receivedy P (M ) = A(M ) | B where cAB {M }KAB is a shorthand for [M, KAB enc x]cAB x and where the received message is sent along the private channel received. Hence, we can state that integrity holds if the following equation holds for the protocol enriched with the event received: P (M ) satisfies integrity of M if for all (admissible) enemies X we have P (M ) \ {cAB } ∼tr (P (M ) | X) \ {cAB } This noninterference-based definition seems intuitively quite strong; it is of the form of equivalence of contexts: the closed term defining the security property, i.e. P (M ) \ {cAB } ∼tr {received(M )}, is checked for (trace) equivalence with the open system (P (M ) | •) \ {cAB }; such a comparison takes the form of an infinity of equivalence checks for all possible enemies X closing the term. Admissible Enemies. Intuitively, an enemy X can be thought of as a process which tries to attack a protocol by stealing and faking the information which is transmitted on the CryptoSPA public channels in set C. However, we are to be sure that X is not a too powerful attacker. Indeed, a peculiar feature of the enemies is that they should not be allowed to know secret information in advance: as we assume perfect cryptography, the initial knowledge of an enemy must be limited to include only publicly available pieces of information, such as names of entities and public keys, and its own private data (e.g., enemy’s private key). If we do not impose such a limitation, the attacker would be able to “guess” every secret piece of information. Considering the example above, if the enemy knows the key kAB that should be known only by A and B, the protocol would be easily attacked by the instance of the enemy def X(m, k) = cAB {m}k
Process Algebraic Frameworks
59
where m = MX and k = KAB . The problem of guessing secret values can be solved by imposing some constraints on the initial data known by the enemies. Given a process P , we call ID(P ) the set of messages that occur syntactically in P . Now, let φI ⊆ M be the finite, initial knowledge that we would like to give to the enemies, i.e., the public information such as the names of the entities and the public keys, plus some possible private data of the intruders (e.g., their private keys or nonces). For a certain intruder X, we want that all the messages in ID(X) are deducible from φI . Formally, given a finite set φI ⊆ M, called the initial knowledge, we define the set ECφI of admissible enemies as ECφI = {X ∈ P | sort(X) ⊆ C and ID(X) ⊆ D(φI )}. To see how ECφI prevents the problem presented in the running example, to indicate that KAB is secret, we can now require that KAB ∈ D(φI ). Since ID(X(MX , KAB )) = {MX , KAB }, we finally have that X(MX , KAB ) ∈ ECφI . Integrity. In order to specify integrity (or authenticity), one has simply to decorate the protocol specification P with an action of type received in correspondence of the relevant point of the protocol specification, obtaining a decorated protocol P . Once this has been done, the integrity equation reads as follows: P (M ) satisfies integrity of M if ∀X ∈ ECφI (P (M ) | X) \ C ∼tr P (M ) \ C where C is the set of public channels. The intuition is that P (M ) \ C represents the protocol P running in isolation (because of the restriction on public channels), while (P (M ) | X) \ C represents the protocol under the attack of an admissible (i.e., that does not know too much) enemy X. The equality imposes that the enemy X is not able to violate the integrity property specification that is represented by the correct (low) trace received(M ). The property above is very similar to a property known in the literature as Non Deducibility on Compositions [13,14] (NDC for short) and, as we will show in the next subsection is valid also for analysing secrecy. NDC for CryptoSPA is defined as follows (see [17]): A process S is NDC iff ∀X ∈ ECφI (S | X) \ C ∼tr S \ C. In other words S is NDC if every possible enemy X which has an initial knowledge limited by φI is not able to significantly change the behaviour of the system. This definition can be generalized to the scheme GNDC [15,17] as follows: α iff ∀X ∈ ECφI : (S | X) \ C ≈ α(S) S is GN DC≈ where α(S) denotes the secure specification of the system, which is then compared with the open term (S | •)\C. GNDC is a very general scheme under which many security properties for cryptographic protocols can be defined as suitable instances. See, e.g., [16] for some examples about authentication properties. Secrecy. Also secrecy can be defined via a variation of the NDC equation above, or better as an instance of the GNDC scheme. Consider a protocol P (M ) and assume that we want to verify if P (M ) preserves the secrecy of message M .
This can be done by proving that every enemy which does not know the message M cannot learn it by interacting with P(M). Thus, we need a mechanism that signals whenever an enemy is learning M. We implement it through a simple process called a knowledge notifier, which reads from a public channel $c_k ∈ C \setminus sort(P(M))$ not used in P(M) and executes a learnt action if the value read is exactly equal to M. For a generic message m, it can be defined as follows:

  $KN(m) \stackrel{def}{=} c_k(y).[m = y]\overline{learnt}⟨m⟩$

We assume that learnt is a special channel that is never used by protocols and is not public, i.e., $learnt ∉ sort(P) ∪ C$. We now consider $P'(M) \stackrel{def}{=} P(M) \mid KN(M)$, i.e., a modified protocol where the learning of M is notified. A very intuitive definition of secrecy can thus be given as follows: P(M) preserves the secrecy of M iff for all secrets $N ∈ \mathcal{M} \setminus \mathcal{D}(φ_I)$ and all $X ∈ EC_{φ_I}$:

  $P'(N) \setminus C ∼_{tr} (P'(N) \mid X) \setminus C$
In other words, we require that for every secret N and for every admissible enemy X, the process $(P'(N) \mid X) \setminus C$ never executes a $\overline{learnt}⟨N⟩$ action, since $P'(N) \setminus C$ is not able to do so.

Most Powerful Enemy. A serious obstacle to the widespread use of these GNDC-like properties is the universal quantification over all admissible enemies. While the proof that a protocol is not NDC can be given naturally by exhibiting an enemy that breaks the semantic equality, the proof that a protocol is indeed NDC is much harder, as it requires an infinity of equivalence checks, one for each admissible enemy. One reasonable way out is to look for an attacker that is more powerful than all the others, so that the infinity of checks can be reduced to just one, albeit large, check against this most powerful enemy. Indeed, it is easy to prove that if $⊑$ is a pre-congruence¹ and there exists a process $Top ∈ EC_{φ_I}$ such that $X ⊑ Top$ for every process $X ∈ EC_{φ_I}$, then:

  $P ∈ NDC_{⊑}$ iff $(P \mid Top) \setminus C ⊑ P \setminus C$

If the hypotheses of the proposition above hold, then it is sufficient to check that $P \setminus C$ is equivalent to $(P \mid Top) \setminus C$. Given the pre-congruence $⊑$, let $≈ \,=\, ⊑ \cap ⊑^{-1}$. If there exist two processes $Bot, Top ∈ EC_{φ_I}$ such that $Bot ⊑ X ⊑ Top$ for every process $X ∈ EC_{φ_I}$, then

  $P ∈ NDC_{≈}$ iff $(P \mid Bot) \setminus C ≈ (P \mid Top) \setminus C ≈ P \setminus C$

Given these very general results, one may wonder whether they can be instantiated with some of the semantics we have described so far. Indeed, this is the case, at least for the trace preorder $≤_{trace}$, which is a pre-congruence.

¹ A preorder $⊑$ is a pre-congruence (w.r.t. the operators $\mid$ and $\setminus C$) if for every $P, Q, R ∈ \mathcal{P}$, $Q ⊑ R$ implies $P \mid Q ⊑ P \mid R$ and $Q \setminus C ⊑ R \setminus C$.
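Before Bot and Top are identified concretely below, the shape of the argument can be seen in a toy Haskell sketch of our own (not part of CryptoSPA): a process is abstracted to a finite set of traces, the trace preorder is set inclusion, and the nil process plays the role of Bot.

```haskell
type Trace   = [String]
type Process = [Trace]   -- a process, abstracted to its finite trace set

-- The trace preorder: p ⊑ q iff traces(p) ⊆ traces(q).
leqTrace :: Process -> Process -> Bool
leqTrace p q = all (`elem` q) p

-- Trace equivalence as ⊑ ∩ ⊑⁻¹.
eqTrace :: Process -> Process -> Bool
eqTrace p q = leqTrace p q && leqTrace q p

-- Bot: the process 0, contributing only the empty trace.
bot :: Process
bot = [[]]

-- If bot ⊑ x ⊑ top for every admissible x, a monotone property need
-- only be checked at the two extreme elements.
bounded :: Process -> Process -> Bool
bounded x top = leqTrace bot x && leqTrace x top
```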
The easy part is to identify the minimal element Bot in $EC_{φ_I}$ w.r.t. $≤_{trace}$: the minimum set of traces is the empty set, which is generated, e.g., by the process 0. Let us now try to identify the top element Top in $EC_{φ_I}$ w.r.t. $≤_{trace}$. The "most powerful enemy" can be defined by using a family of processes $Top^{C,φ}_{trace}$, each representing the instance of the enemy with knowledge φ:

  $Top^{C,φ}_{trace} = \sum_{c∈C}\ \sum_{m∈Msg(c)} c(m).Top^{C,φ∪\{m\}}_{trace} \ +\ \sum_{c∈C}\ \sum_{m∈\mathcal{D}(φ)∩Msg(c)} \bar{c}⟨m⟩.Top^{C,φ}_{trace}$

The "initial element" of the family is $Top^{C,φ_I}_{trace}$, as $φ_I$ is the initial knowledge. Note that the enemy may accept any input message m, which is then added to the knowledge set, yielding $φ ∪ \{m\}$, and may output only messages that can transit on the channel c and that are deducible from the current knowledge set φ via the deduction function $\mathcal{D}$. It is easy to see that $Top^{C,φ_I}_{trace}$ is the top element of the trace preorder. As a consequence of the fact that the trace preorder is a pre-congruence and that $Top^{C,φ_I}_{trace}$ is the top element for that preorder, a single check against the top element is enough to ensure NDC. Formally:

  $P ∈ NDC^{φ_I}_{C}$ iff $(P \mid Top^{C,φ_I}_{trace}) \setminus C ∼_{tr} P \setminus C$
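The following Haskell sketch shows the shape of $Top^{C,φ}_{trace}$ as a knowledge-indexed transition function; it is our own illustrative encoding over a toy Dolev-Yao message algebra, not the CryptoSPA tool's implementation, and the datatype and names are assumptions. Any input extends the knowledge set φ, while an output is enabled exactly for the messages derivable from φ.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Toy Dolev-Yao messages: atoms, pairing, symmetric encryption.
data Msg = Atom String | Pair Msg Msg | Enc Msg Msg
  deriving (Eq, Ord, Show)

-- Analysis closure: split pairs, decrypt when the key is known.
analyse :: Set Msg -> Set Msg
analyse phi
  | phi' == phi = phi
  | otherwise   = analyse phi'
  where
    ms   = Set.toList phi
    phi' = Set.unions
      [ phi
      , Set.fromList (concat [ [a, b] | Pair a b <- ms ])
      , Set.fromList [ a | Enc a k <- ms, k `Set.member` phi ]
      ]

-- Derivability m ∈ D(phi): synthesis over the analysed knowledge
-- (the standard analyse-then-synthesise decomposition).
derivable :: Set Msg -> Msg -> Bool
derivable phi m = m `Set.member` base || compose m
  where
    base = analyse phi
    compose (Pair a b) = derivable phi a && derivable phi b
    compose (Enc a k)  = derivable phi a && derivable phi k
    compose (Atom _)   = False

-- One state of Top: an input is always accepted and learnt, while an
-- output action is enabled exactly for the derivable messages.
data Action = In String Msg | Out String Msg deriving Show

step :: Set Msg -> Action -> Maybe (Set Msg)
step phi (In _ m)  = Just (Set.insert m phi)
step phi (Out _ m)
  | derivable phi m = Just phi
  | otherwise       = Nothing
```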
The Example. The example studied for the spi calculus can easily be modelled in CryptoSPA:

  $A(M) = \bar{c}_{AB}\{M\}_{K_{AB}}$
  $B = c_{AB}(x).[⟨x, K_{AB}⟩ \vdash_{dec} y]\,0$
  $P(M) = A(M) \mid B$
In order to study whether integrity and secrecy hold, we have to define a suitably decorated version P′ of the protocol P:

  $A(M) = \bar{c}_{AB}\{M\}_{K_{AB}}$
  $B' = c_{AB}(x).[⟨x, K_{AB}⟩ \vdash_{dec} y]\,\overline{received}⟨y⟩$
  $P'(M) = A(M) \mid B' \mid KN(M)$
where we have inserted the extra event received for the purpose of integrity analysis, and the knowledge notifier for the purpose of secrecy analysis. In the single session described above, both secrecy and integrity hold; that is, $(P'(M) \mid X) \setminus C$ can never exhibit a low trace that is not also possible for $P'(M) \setminus C$.
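As a cross-check of this single-session claim, one can enumerate what an enemy can send to B after intercepting A's output and verify that every payload B would announce equals M. Below is a toy finite check of our own (the message datatype and names are assumptions); only the analysis closure is needed here, because without $K_{AB}$ the enemy cannot synthesise any new ciphertext under that key.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

data Msg = Atom String | Enc Msg Msg deriving (Eq, Ord, Show)

kAB, msgM :: Msg
kAB  = Atom "K_AB"   -- the shared key, assumed secret
msgM = Atom "M"      -- the payload

-- Enemy knowledge after intercepting A's only output on c_AB:
-- public names plus the ciphertext {M}_K_AB.
intercepted :: Set Msg
intercepted = Set.fromList [Atom "A", Atom "B", Enc msgM kAB]

-- Analysis closure: decrypt whenever the key is already known.
analyse :: Set Msg -> Set Msg
analyse phi
  | phi' == phi = phi
  | otherwise   = analyse phi'
  where phi' = phi <> Set.fromList
                 [ p | Enc p k <- Set.toList phi, k `Set.member` phi ]

-- B emits received(p) exactly for deliveries that decrypt under
-- K_AB to p; integrity holds if every such payload equals M.
integrityHolds :: Bool
integrityHolds =
  and [ p == msgM
      | Enc p k <- Set.toList (analyse intercepted), k == kAB ]
```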
5.3 Comparing the Analysis Scenarios
In the spi calculus, the analysis scenario is rather delicate: a tester is at the same time the enemy that tries to influence the behaviour of the system and the observer that should keep track of the behaviour of the system. In CryptoSPA, by contrast, the two roles are separate: on the one hand we explicitly introduce
the enemy X inside the scope of the restriction; on the other hand we use trace equivalence to compare the two behaviours. In essence, we can say that in spi we use a contextual equivalence, while in CryptoSPA we use an equivalence of contexts. We argue that the elegance of the spi approach is paid for in terms of a lack of flexibility in modelling enemies; for instance, it is not straightforward to model passive attackers (less powerful enemies) or situations in which different parties of the protocol are subject to different enemies (more holes in the context). Both cases can easily be represented in the CryptoSPA approach, by choosing suitable $EC_{φ_I}$ or by including more enemies inside the GNDC-like equations.

Example 6. Consider a system with three components, say A, B and C. The components A and B communicate on a private channel c and share a secret b, while the components B and C communicate through the private channel $c_1$ and share a secret d. Assume that both A and C are malicious; this means regarding them as enemies. Taking generic specifications $X_A$ for A and $X_C$ for C, a possible context to be analyzed is $(νc)(νb)(X_A \mid (νc_1)(νd)(B \mid X_C))$, with $b, c ∉ fn(X_C)$².

As an extension of the spi-calculus approach, we may define a suitable class of contexts w.r.t. which one performs certain observations. Thus the previous analysis problem could be instantiated as a contextual equivalence problem. However, techniques for dealing with such generic contexts are currently not well developed. Within the GNDC approach, by contrast, a simple generalization to more "holes" (unknown components, as remarked in [25,26]) makes it possible to model and analyze it in the case of trace equivalence by resorting to the use of the most general enemies³. As a matter of fact, assume that $EC^{φ_A}_{c}$ (resp. $EC^{φ_C}_{c_1}$) denotes the possible behaviours of enemies with knowledge $φ_A$ (resp. $φ_C$). Then the GNDC specification could be, for all $X_A ∈ EC^{φ_A}_{c}$ and $X_C ∈ EC^{φ_C}_{c_1}$:

  $((X_A) \mid B \mid (X_C)) \setminus \{c, c_1\} ∼_{tr} α(B)$

Note that α(B) could also take into account the description of A and C. Using the most general intruder approach, the verification reduces to just one check. As a matter of fact, we may simply check that

  $((Top^{φ_A}_{c}) \mid B \mid (Top^{φ_C}_{c_1})) \setminus \{c, c_1\} ∼_{tr} α(B)$

A main difference between the two approaches may be noted in the treatment of integrity. In the spi-calculus approach we must find the magical correct implementation to be used as a reference w.r.t. the system under investigation. (We use the term implementation because it is very close to the description of the system.) In the GNDC approach, by contrast, we simply specify the intended observable behaviour, namely that the messages are correctly delivered, through a control action. Thus, the correct specification is indeed rather more abstract
² Due to the limitations of using a spi term for expressing the system, it seems difficult to specify such a case study without imposing some side conditions on the processes $X_A$ and $X_C$.
³ In several cases, having more enemies that can communicate directly with each other is equivalent to considering just one enemy (see [26]).
than the system. Recently, Gordon and Jeffrey (e.g., see [19]) developed type systems for a spi-calculus variant that embodies a form of control (correspondence) actions. Using those type systems they were able to check authentication properties such as agreement [33]. In that framework, authenticity is specified exactly as control actions, following the Woo-Lam approach (see [36]).

Another difference is that in spi it is necessary to perform two different analyses in order to prove the two security properties of secrecy and authenticity. In CryptoSPA, by contrast, one single NDC check is enough for both, as both properties are in the NDC form. As a matter of fact, it is enough to consider the decorated specification which includes both the control action for integrity and the knowledge notifier. Then we can use this single, combined specification for a single check against the most powerful enemy. This idea of combined analysis can be generalized to many different properties and has shown its usefulness (a higher probability of finding unexplored attacks) in some concrete cases [12].

Finally, the way secrets are handled is quite different. In spi this is achieved elegantly by means of the restriction operator, while in CryptoSPA we have to manage explicitly the set of pieces of information that are given to the enemies.
5.4 Comparing the Security Properties
One may wonder whether the properties of secrecy and integrity defined in the two different process algebraic frameworks are somehow related. In spite of the technical differences, integrity is indeed the same property. The actual definition of integrity asks one to check that the system meets its magical specification for each continuation F(y), provided F(y) does not reveal information on the message y. In [16], it has been shown that it is enough to consider the simple continuation $F(x) = \overline{received}⟨x⟩$ to establish whether a protocol enjoys integrity.

The two notions of secrecy, on the contrary, are clearly different: the spi one is based on the idea of indistinguishability, while the CryptoSPA one is based on the idea of possession, i.e. the so-called Dolev-Yao approach (see also [1]). To see the main point, note that the secret parameter x in the process S(x) must be a public value, i.e. a non-restricted one. Thus, among the possible tests we may find at least one that "knows" it, i.e. has it as a free name. In the usual CryptoSPA approach, on the other hand, the secret values are never "known" by the enemies; in fact, when an enemy is able to discover one, we say that there is a secrecy attack.

Consider the process $P(x) = (νn)\bar{M}⟨n⟩$. Note that x does not occur in the term of P. Thus, P necessarily preserves the secrecy of the public messages $M, M', \ldots$; indeed, $P(M) = P(M') = P$. However, from another point of view, P reveals one of its (declared) "secrets", i.e. the private name n, since it communicates n on the public channel M. In the usual CryptoSPA approach to secrecy, we would be interested in studying the secrecy of the restricted names, rather than the public ones. The notion of secrecy developed in the spi approach is rather a form of information flow, i.e. non-interference. As a matter of fact, it can be formulated in the GNDC schema as follows:
– S(x) preserves the secrecy of x iff $c(x).S(x)$ enjoys $NDC^{S(M)}_{≈_{may}}$ (with M public) w.r.t. all the enemies whose sort is {c}, that output at least one message on the channel c, and where $c ∉ Sort(S)$.

Thus, simply by considering other assumptions on the set of possible intruders, it is possible to code spi secrecy as a GNDC property. Moreover, it is possible to show that if S(x) preserves the secrecy of x and S(M) enjoys $GNDC^{P}_{may}$ then also S(M′) enjoys $GNDC^{P}_{may}$. This holds because may testing can be defined in CryptoSPA and is a congruence w.r.t. restriction and parallel composition (under certain assumptions, e.g. see [16]).
6 Other Frameworks
Very briefly, we also mention other well-known process algebraic approaches that have been proposed in recent years.

The oldest and most widely deployed approach is the one based on CSP, which is well illustrated in the book [33]. It shares similarities with CryptoSPA, as in this approach security properties are also modelled as observable events decorating the protocol, even if the idea of explicitly applying non-interference is not used. This approach has been mechanized by using a compiler (called Casper [23]) that translates protocol specifications into CSP code, together with the FDR model checker; it has been enhanced to deal with symbolic reasoning (data independence) in [32]. Similarly, we have a compiler, called CVS [11], that translates specifications in a pre-dialect of CryptoSPA into SPA (i.e., CCS) code, and the Concurrency Workbench model checker. These analyses are approximated by considering an enemy that has limited memory and a limited capability of generating new messages. More advanced symbolic semantics have been studied for CryptoSPA in [24].

In [25,26], the usage of contexts (open systems) to describe security analysis scenarios has been advocated. The language used is similar to CryptoSPA. However, differently from the GNDC approach, the correct specification is given through logical formulas, and the treatment of the admissible enemies is done by reducing the verification problem to a validity problem in the logic. Recently, the approach has been extended with symbolic techniques (see [27]). For a symbolic semantics of a spi-like language see, for instance, [6].

As spi is an extension of the π-calculus with cryptographic primitives, sjoin [2] similarly extends the join calculus with constructs for encryption and decryption and with names that can be used as keys, nonces or other tags. The applied pi calculus (see [18]) deals with the variety of different cryptosystems by adopting a general term algebra with an equality relation. This process calculus makes it possible to describe cryptographic protocols using different cryptosystems. Thus, both applied pi and CryptoSPA recognize the necessity of managing uniformly the different kinds of cryptography that may be present in a complex protocol: the former exploits term algebras plus equality, while the latter exploits a generic inference system. A more recent approach is LySa [8], which is a very close relative of spi and the pi-calculus. LySa mainly differs in two respects: (i) absence of channels
(there is one global communication medium), and (ii) tests on values being received in communications, as well as on values being decrypted, are embedded directly inside inputs and decryptions. A static analysis technology based on Control Flow Analysis has been applied to security protocols expressed in LySa, providing a fully automatic and efficient tool. The same technology has been successfully used to analyse secrecy for pi [4] and spi [5].

Other process algebraic approaches not strictly related to the analysis of cryptographic protocols include the ambient calculus and the security pi calculus. The ambient calculus [7] is concerned with the mobility of ambients (abstract collections of processes and objects that function both as a unit of mobility and a unit of security). Communication takes place only inside an ambient, hence the hierarchy of nested ambients regulates who can communicate with whom. The security pi calculus [28], differently from spi, extends the π-calculus with a new construct $[P]_σ$ denoting that process P is running at security level σ. This calculus is not well suited to talking about cryptographic protocols but is tailored for access control policies.
Acknowledgements

We would like to thank Nadia Busi and Marinella Petrocchi for helpful comments.
References

1. M. Abadi. Security protocols and specifications. In Proc. Foundations of Software Science and Computation Structures, volume 1578 of LNCS, pages 1–13, 1999.
2. M. Abadi, C. Fournet, and G. Gonthier. Secure implementation of channel abstractions. Information and Computation, 174(1):37–83, 2002.
3. M. Abadi and A. D. Gordon. A calculus for cryptographic protocols: The spi calculus. Information and Computation, 148(1):1–70, 1999.
4. C. Bodei, P. Degano, F. Nielson, and H. R. Nielson. Static analysis for the pi-calculus with applications to security. Information and Computation, 168:68–92, 2001.
5. C. Bodei, P. Degano, F. Nielson, and H. R. Nielson. Flow logic for Dolev-Yao secrecy in cryptographic processes. Future Generation Computer Systems, 18(6):747–756, 2002.
6. M. Boreale. Symbolic trace analysis of cryptographic protocols. In Automata, Languages and Programming, LNCS, pages 667–681, 2001.
7. L. Cardelli and A. Gordon. Mobile ambients. Theoretical Computer Science, 240(1):177–213, 2000.
8. C. Bodei, M. Buchholtz, P. Degano, F. Nielson, and H. R. Nielson. Automatic validation of protocol narration. In Proceedings of the 16th Computer Security Foundations Workshop. IEEE Computer Society Press, 2003.
9. R. De Nicola and M. C. B. Hennessy. Testing equivalences for processes. Theoretical Computer Science, 34(1-2):83–133, 1984.
10. D. Dolev and A. Yao. On the security of public key protocols. IEEE Transactions on Information Theory, 29(2):198–208, 1983.
11. A. Durante, R. Focardi, and R. Gorrieri. A compiler for analysing cryptographic protocols using non-interference. ACM Transactions on Software Engineering and Methodology, 9(4):489–530, 2000.
12. A. Durante, R. Focardi, and R. Gorrieri. CVS at work: A report on new failures upon some cryptographic protocols. In Workshop on Mathematical Methods, Models and Architectures for Computer Networks Security, LNCS 2052, 2001.
13. R. Focardi and R. Gorrieri. A classification of security properties. Journal of Computer Security, 3(1):5–33, 1995.
14. R. Focardi and R. Gorrieri. Classification of security properties (part I: Information flow). In Foundations of Security Analysis and Design, volume 2171 of LNCS, pages 331–396, 2001.
15. R. Focardi, R. Gorrieri, and F. Martinelli. Non interference for the analysis of cryptographic protocols. In Proceedings of the 27th International Colloquium on Automata, Languages and Programming, volume 1853 of LNCS, pages 354–372, 2000.
16. R. Focardi, R. Gorrieri, and F. Martinelli. A comparison of three authentication properties. Theoretical Computer Science, 291(3):285–327, 2003.
17. R. Focardi and F. Martinelli. A uniform approach for the definition of security properties. In Proceedings of the World Congress on Formal Methods (FM'99), volume 1708 of LNCS, pages 794–813, 1999.
18. C. Fournet and M. Abadi. Mobile values, new names, and secure communication. In Proceedings of the 28th ACM Symposium on Principles of Programming Languages (POPL'01), pages 104–115, 2001.
19. A. Gordon and A. Jeffrey. Authenticity by typing in security protocols. In Proceedings of the 14th Computer Security Foundations Workshop. IEEE Computer Society Press, 2001.
20. A. D. Gordon. Notes on nominal calculi for security and mobility. In Foundations of Security Analysis and Design, volume 2171 of LNCS, pages 262–330, 2001.
21. J. Halpern and R. van der Meyden. A logic for SDSI's linked local name spaces. In Proceedings of the 12th Computer Security Foundations Workshop. IEEE Computer Society Press, 1999.
22. G. Lowe. Breaking and fixing the Needham-Schroeder public-key protocol using FDR. In Proceedings of Tools and Algorithms for the Construction and Analysis of Systems, volume 1055 of LNCS, pages 147–166. Springer-Verlag, 1996.
23. G. Lowe. Casper: A compiler for the analysis of security protocols. Journal of Computer Security, 6:53–84, 1998.
24. F. Martinelli. Symbolic semantics and analysis for Crypto-CCS with (almost) generic inference systems. In Proceedings of the 27th International Symposium on Mathematical Foundations of Computer Science (MFCS'02), volume 2420 of LNCS, pages 519–531, 2002.
25. F. Martinelli. Formal Methods for the Analysis of Open Systems with Applications to Security Properties. PhD thesis, University of Siena, Dec. 1998.
26. F. Martinelli. Analysis of security protocols as open systems. Theoretical Computer Science, 290(1):1057–1106, 2003.
27. F. Martinelli. Symbolic partial model checking for security analysis. In Workshop on Mathematical Methods, Models and Architectures for Computer Networks Security, LNCS, 2003. To appear.
28. J. Riely and M. Hennessy. Information flow vs. resource access in the asynchronous pi-calculus. ACM Transactions on Programming Languages and Systems (TOPLAS), 24(5):566–591, 2002.
29. R. Milner. Communication and Concurrency. International Series in Computer Science. Prentice Hall, 1989.
30. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes. Information and Computation, 100(1):1–77, 1992.
31. R. M. Needham and M. D. Schroeder. Using encryption for authentication in large networks of computers. Communications of the ACM, 21(12):993–999, 1978.
32. A. Roscoe and P. Broadfoot. Proving security protocols with model checkers by data independence techniques. Journal of Computer Security, 7(2-3):147–190, 1999.
33. P. Ryan, S. Schneider, M. Goldsmith, G. Lowe, and B. Roscoe. The Modelling and Analysis of Security Protocols: the CSP Approach. Addison-Wesley, 2001.
34. P. Samarati and S. De Capitani di Vimercati. Access control: Policies, models, and mechanisms. In R. Focardi and R. Gorrieri, editors, Foundations of Security Analysis and Design, LNCS 2171. Springer-Verlag, 2001.
35. D. Sangiorgi. Expressing Mobility in Process Algebras: First-Order and Higher-Order Paradigms. PhD thesis CST–99–93, Department of Computer Science, University of Edinburgh, 1992. Also published as ECS–LFCS–93–266.
36. T. Woo and S. Lam. A semantic model for authentication protocols. In IEEE Computer Society Symposium on Research in Security and Privacy, pages 178–194, 1993.
Semantic and Syntactic Approaches to Simulation Relations

Jo Hannay¹, Shin-ya Katsumata², and Donald Sannella²

¹ Department of Software Engineering, Simula Research Laboratory
² Laboratory for Foundations of Computer Science, University of Edinburgh
Abstract. Simulation relations are tools for establishing the correctness of data refinement steps. In the simply-typed lambda calculus, logical relations are the standard choice for simulation relations, but they suffer from certain shortcomings; these are resolved by use of the weaker notion of pre-logical relations instead. Developed from a syntactic setting, abstraction barrier-observing simulation relations serve the same purpose, and also handle polymorphic operations. Meanwhile, second-order prelogical relations directly generalise pre-logical relations to polymorphic lambda calculus (System F). We compile the main refinement-pertinent results of these various notions of simulation relation, and try to raise some issues for aiding their comparison and reconciliation.
1 Introduction
One of the central activities involved in stepwise development of programs is the transformation of “abstract programs” involving types of data that are not normally available as primitive in programming languages (graphs, sets, etc.) into “concrete programs” in which a representation of these in terms of simpler types of data (integers, arrays, etc.) is provided. Apart from the change to data representation, such data refinement should have no effect on the results computed by the program: the concrete program should be equivalent to the abstract program in the sense that all computational observations should return the same results in both cases. The usual way of establishing this property, known as observational equivalence, is by exhibiting a simulation relation that gives a correspondence between the data values involved in the two programs that is respected by the functions they implement. The details depend on the nature of the language in which the programs are written. In the simple case of a language with only first-order functions, it is usually enough to use an invariant on the domain of concrete values together with a function mapping concrete values (that satisfy the invariant) to abstract values [Hoa72], but a strictly more general method is to use a homomorphic relation [Mil71], [Sch90], [ST97]. If non-determinism is present in the language then some kind of bisimulation relation is required.
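As a concrete first-order instance of these ideas (our own example, not taken from the paper), the following Haskell sketch relates an "abstract" program over finite sets of integers to a "concrete" one over raw lists with duplicates; the simulation relation is induced by an abstraction function from lists to sets, in the style of [Hoa72].

```haskell
import Data.List (nub, sort)

-- Abstract program: finite sets, represented canonically as sorted,
-- duplicate-free lists.
newtype AbsSet = AbsSet [Int] deriving (Eq, Show)

absEmpty :: AbsSet
absEmpty = AbsSet []

absInsert :: Int -> AbsSet -> AbsSet
absInsert x (AbsSet xs) = AbsSet (sort (nub (x : xs)))

absMember :: Int -> AbsSet -> Bool
absMember x (AbsSet xs) = x `elem` xs

-- Concrete program: raw lists; duplicates and arbitrary order allowed.
type ConcSet = [Int]

concEmpty :: ConcSet
concEmpty = []

concInsert :: Int -> ConcSet -> ConcSet
concInsert = (:)

concMember :: Int -> ConcSet -> Bool
concMember = elem

-- Abstraction function inducing the homomorphic relation: a concrete
-- value c represents the abstract value alpha c. The operations
-- respect it, e.g. alpha (concInsert x c) == absInsert x (alpha c)
-- and concMember x c == absMember x (alpha c).
alpha :: ConcSet -> AbsSet
alpha = AbsSet . sort . nub

related :: ConcSet -> AbsSet -> Bool
related c a = alpha c == a
```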
This research was partly supported by the MRG project (IST-2001-33149) which is funded by the EC under the FET proactive initiative on Global Computing. SK was supported by an LFCS studentship.
When the language in question is the simply-typed lambda calculus, the standard choice of simulation relation – which originates with Reynolds in [Rey81,Rey83] but is described most clearly in [Ten94], cf. [Mit96] – is to use a logical relation, a type-indexed family of relations that respects not just function application (like homomorphisms) but also lambda abstraction. Logical relations are used extensively in the study of typed lambda calculus and have applications outside lambda calculus. A problem with the use of logical relations, in connection with data refinement and other applications, is the fact that they lack some convenient algebraic properties; in particular, the composition of two logical relations is not in general a logical relation. This calls into question their application to data refinement at least, where one might expect composition to account for the correctness of stepwise refinement. An alternative is to use instead a pre-logical relation [HS02], a weaker form of logical relation that nevertheless has many of the features that make logical relations so useful, as well as being composable. This yields a proof method for establishing observational equivalence that is not just sound, as with logical relations, but is also complete. The use of pre-logical relations in data refinement is studied in [HLST00].

The situation is more complicated when we consider polymorphically typed lambda calculi such as System F [Gir71,Rey74]. Pre-logical relations can be extended to this context, see [Lei01], but then they do not compose in general, although they remain sound and complete for observational equivalence. At the same time, the power of System F opens the possibility of taking a syntactic approach, placing the concept of simulation relation in a logical setting and using existential type quantification for data abstraction [MP88]. This line of development has been investigated in a string of papers on abstraction barrier-observing simulation relations by Hannay [Han99,Han00,Han01,Han03], based on a logic for parametric polymorphism due to Plotkin and Abadi [PA93]. A clear advantage of such an approach is that it is amenable to computer-aided reasoning, but there are certain compromises forced by the syntactic nature of the framework.

We present this background in Sects. 2–4 and then make a number of remarks aiming at some kind of reconciliation in Sect. 5. There are more questions than answers, but some possible lines of enquiry are suggested.
2 Pre-logical Relations
Our journey begins with λ→, the simply-typed lambda calculus having → as its only type constructor.

Definition 2.1. The set of types over a set B of base types (or type constants) is given by the grammar σ ::= b | σ → σ, where b ranges over B. A signature Σ consists of a set B of type constants and a collection C of typed term constants c : σ. $Types^{→}(Σ)$ denotes the set of types over B. Σ-terms are given by the grammar M ::= x | c | λx:σ.M | M M, where x ranges over variables and c over term constants. The usual typing rules associate
each well-formed term M in a Σ-context Γ = x₁:σ₁, …, xₙ:σₙ with a type σ ∈ $Types^{→}(Σ)$, written $Γ ⊢ M : σ$. If Γ is empty then we write simply $⊢ M : σ$.

Definition 2.2. A Σ-combinatory algebra A consists of:

– a carrier set $[[σ]]^A$ for each σ ∈ $Types^{→}(Σ)$;
– a function $App^{σ,τ}_A : [[σ→τ]]^A → [[σ]]^A → [[τ]]^A$ for each σ, τ ∈ $Types^{→}(Σ)$;
– an element $[[c]]^A ∈ [[σ]]^A$ for each term constant c : σ in Σ; and
– combinators $K^{σ,τ}_A ∈ [[σ→(τ→σ)]]^A$ and $S^{ρ,σ,τ}_A ∈ [[(ρ→σ→τ)→(ρ→σ)→ρ→τ]]^A$ for each ρ, σ, τ ∈ $Types^{→}(Σ)$

such that $K^{σ,τ}_A\,x\,y = x$ (i.e. $App^{τ,σ}_A(App^{σ,τ→σ}_A K^{σ,τ}_A\,x)\,y = x$) and $S^{ρ,σ,τ}_A\,x\,y\,z = (x\,z)(y\,z)$ (ditto).
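The typing judgment $Γ ⊢ M : σ$ of Definition 2.1 is easy to make concrete; here is a minimal Haskell sketch of our own (the datatypes stand in for the grammar, and constants carry their declared type):

```haskell
-- Types and terms over a set of base types, following Definition 2.1.
data Ty = Base String | Arr Ty Ty deriving (Eq, Show)
data Tm = Var String | Con String Ty | Lam String Ty Tm | App Tm Tm

-- The usual typing rules: compute sigma with Gamma |- M : sigma,
-- or Nothing if M is ill-typed in the context Gamma.
typeOf :: [(String, Ty)] -> Tm -> Maybe Ty
typeOf gamma (Var x)      = lookup x gamma
typeOf _     (Con _ ty)   = Just ty
typeOf gamma (Lam x ty m) = Arr ty <$> typeOf ((x, ty) : gamma) m
typeOf gamma (App m n)    = do
  Arr a b <- typeOf gamma m       -- the operator must have arrow type
  a'      <- typeOf gamma n
  if a == a' then Just b else Nothing
```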
A Γ-environment η on a combinatory algebra A assigns elements of A to variables, with $η(x) ∈ [[σ]]^A$ for x : σ in Γ. A Σ-term $Γ ⊢ M : σ$ is interpreted in A under a Γ-environment η in the usual way, with λ-abstraction interpreted via translation to combinators; the result, written $[[Γ ⊢ M : σ]]^A_η$, is an element of $[[σ]]^A$. If M is closed then we write simply $[[M : σ]]^A$.

A signature Σ models the interface (type names and function names) of a functional program, with Σ-combinatory algebras modelling programs that match the interface Σ. Observational equivalence of two Σ-combinatory algebras is then fundamental to the notion of data refinement.

Definition 2.3. Let A and B be Σ-combinatory algebras and let OBS, the observable types, be a subset of $Types^{→}(Σ)$. Then A is observationally equivalent to B with respect to OBS, written $A ≡_{OBS} B$, if for any two closed Σ-terms M, N : σ with σ ∈ OBS, $[[M : σ]]^A = [[N : σ]]^A$ iff $[[M : σ]]^B = [[N : σ]]^B$.

It is usual to take OBS to be the "built-in" types for which equality is decidable, for instance bool and/or nat. Then A and B are observationally equivalent iff it is not possible to distinguish between them by performing computational experiments. Note that $OBS' ⊆ OBS$ implies $≡_{OBS'} ⊇ ≡_{OBS}$.

Logical relations are structure-preserving relations on combinatory algebras.

Definition 2.4. A logical relation R over Σ-combinatory algebras A and B is a family of relations $\{R^σ ⊆ [[σ]]^A × [[σ]]^B\}_{σ∈Types^{→}(Σ)}$ such that:

– $R^{σ→τ}(f, g)$ iff $∀a ∈ [[σ]]^A.\ ∀b ∈ [[σ]]^B.\ R^σ(a, b) ⇒ R^τ(App_A\,f\,a,\ App_B\,g\,b)$.
– $R^σ([[c]]^A, [[c]]^B)$ for every term constant c : σ in Σ.

For OBS = {nat}, the connection between logical refinement and observational equivalence is given by Mitchell's representation independence theorem.

Theorem 2.5 (Representation Independence [Mit96]). Let Σ be a signature that includes a type constant nat, and let A and B be Σ-combinatory algebras¹ with $[[nat]]^A = [[nat]]^B = ℕ$. If there is a logical relation R over A and B with $R^{nat}$ the identity relation on natural numbers, then $A ≡_{\{nat\}} B$. Conversely, if $A ≡_{\{nat\}} B$, Σ provides a closed term for each element of ℕ, and Σ contains only first-order term constants, then there is a logical relation R over A and B with $R^{nat}$ the identity relation. □

¹ Actually Henkin models, which are extensional combinatory algebras; however, extensionality is not a necessary condition for this theorem.

This theorem corresponds directly to the following method for establishing the correctness of data refinement steps.

Proof Method ([Ten94]). Let A and B be Σ-combinatory algebras and let OBS ⊆ $Types^{→}(Σ)$. To show that B is a refinement of A, find a logical relation R over A and B such that $R^σ$ is the identity relation for each σ ∈ OBS. We then say that B is a logical refinement of A and write $A ⊒ B$, or $A ⊒^R B$ when we want to make R explicit.

A well-known problem with logical relations is the fact that they are not closed under composition. It follows that, given logical refinements $A ⊒^R B$ and $B ⊒^S C$, the composition S ∘ R cannot in general be used as a witness for the composed refinement $A ⊒ C$. (In fact, the problem is more serious than it appears at first: sometimes there is no witness for $A ⊒ C$ at all.) This is at odds with the stepwise nature of refinement, and with the transitivity of the underlying notion of observational equivalence. It is one source of examples demonstrating the incompleteness of the above proof method; there are other examples that do not involve composition of refinement steps, see [HLST00].

The restriction to signatures with first-order term constants in the second part of Theorem 2.5 is necessary, and this is the key both to the incompleteness of logical refinements as a proof method and to the problem with the composability of logical refinements. If $A ⊒ B ⊒ C$ then $A ≡_{OBS} B ≡_{OBS} C$, and so $A ≡_{OBS} C$ since $≡_{OBS}$ is an equivalence relation. But then it follows that $A ⊒ C$ only for signatures without higher-order term constants.

In [HS02], a weakening of the notion of logical relations called pre-logical relations was studied; see [PPST00] for a categorical formulation.

Definition 2.6 ([HS02]). An algebraic relation R over Σ-combinatory algebras A, B is a family of relations $\{R^σ ⊆ [[σ]]^A × [[σ]]^B\}_{σ∈Types^{→}(Σ)}$ such that:

– If $R^{σ→τ}(f, g)$ then $∀a ∈ [[σ]]^A.\ ∀b ∈ [[σ]]^B.\ R^σ(a, b) ⇒ R^τ(App_A\,f\,a,\ App_B\,g\,b)$.
– $R^σ([[c]]^A, [[c]]^B)$ for every term constant c : σ in Σ.

A pre-logical relation R is an algebraic relation such that:

– $R(S^{ρ,σ,τ}_A, S^{ρ,σ,τ}_B)$ and $R(K^{σ,τ}_A, K^{σ,τ}_B)$ for all ρ, σ, τ ∈ $Types^{→}(Σ)$.

The idea of this definition is to replace the reverse implication in the definition of logical relations with the requirement that the relation contain the S and K combinators. Since these suffice to express all lambda terms, this amounts to requiring the reverse implication to hold only for pairs of functions that are expressible by the same lambda term. It is easy to see that any logical relation is a pre-logical relation.
Example 2.7. A Σ-homomorphism h : A → B is a type-indexed family of functions $\{h^σ : [[σ]]^A → [[σ]]^B\}_{σ∈Types^{→}(Σ)}$ such that for any term constant c : σ in Σ, $h^σ([[c]]^A) = [[c]]^B$; $h^τ(App^{σ,τ}_A\,f\,a) = App^{σ,τ}_B\,h^{σ→τ}(f)\,h^σ(a)$; and $h^{σ→τ}([[Γ ⊢ λx{:}σ.M : σ→τ]]^A_{η_A}) = [[Γ ⊢ λx{:}σ.M : σ→τ]]^B_{h∘η_A}$. Any Σ-homomorphism is a pre-logical relation but is not in general a logical relation.

The binary case of pre-logical relations over A and B is derived from the unary case of pre-logical predicates for the product structure A × B. Similarly for n-ary relations for n > 2.

Definition 2.8 ([HS02]). A pre-logical predicate P over a Σ-combinatory algebra A is a family of predicates $\{P^σ ⊆ [[σ]]^A\}_{σ∈Types^{→}(Σ)}$ such that:

– If $P^{σ→τ}(f)$ then $∀a ∈ [[σ]]^A.\ P^σ(a) ⇒ P^τ(App_A\,f\,a)$.
– $P^σ([[c]]^A)$ for every term constant c : σ in Σ.
– $P(S^{ρ,σ,τ}_A)$ and $P(K^{σ,τ}_A)$ for all ρ, σ, τ ∈ $Types^{→}(Σ)$.

Example 2.9. For any signature Σ and combinatory algebra A, the family

  $P^σ(v) ⇔ v$ is the value of a closed Σ-term M : σ

is a pre-logical predicate over A. (In fact, P is the least such – see Prop. 2.17 below.) Now, consider the signature Σ containing the type constant nat and term constants 0 : nat and succ : nat → nat, and let A be the combinatory algebra over ℕ where 0 and succ have their usual interpretations and $[[σ→τ]]^A = [[σ]]^A → [[τ]]^A$ for every σ, τ ∈ $Types^{→}(Σ)$, with $App^{σ,τ}_A\,f\,x = f(x)$. Then P is not a logical predicate over A: any function $f ∈ [[nat→nat]]^A$, including functions that are not lambda definable, takes values in P to values in P and so would have to be in P itself.

An improved version of Theorem 2.5, without the restriction to first-order signatures, holds if pre-logical relations are used in place of logical relations.

Theorem 2.10 (Representation Independence for Pre-logical Relations [HS02]). Let A and B be Σ-combinatory algebras and let OBS ⊆ $Types^{→}(Σ)$. Then $A ≡_{OBS} B$ iff there exists a pre-logical relation over A and B which is a partial injection on OBS. □

This suggests the following. (We switch to a notation that makes the set of observable types explicit.)

Definition 2.11 ([HLST00]). Let A and B be Σ-combinatory algebras and OBS ⊆ $Types^{→}(Σ)$. Then B is a pre-logical refinement of A, written $A \overset{OBS}{⤳} B$, if there is a pre-logical relation R over A and B such that $R^σ$ is a partial injection for each σ ∈ OBS.

We phrase this as a definition, rather than as a proof method for the underlying notion of data refinement, in contrast to logical refinements. As a proof method it is sound and complete, and therefore equivalent to this underlying notion. Pre-logical relations compose – in fact, for extensional models they are the minimal weakening of logical relations with this property (see [HS02] for details).
Proposition 2.12 ([HS02]). The composition S ∘ R of pre-logical relations R over A, B and S over B, C is a pre-logical relation over A, C. □

So pre-logical refinements compose, and this explains why stepwise refinement is sound: $A \overset{OBS}{⤳} B \overset{OBS}{⤳} C ⇒ A \overset{OBS}{⤳} C$. Another explanation goes via Theorem 2.10: $A ≡_{OBS} B ≡_{OBS} C ⇒ A ≡_{OBS} C ⇒ A \overset{OBS}{⤳} C$. The set of observable types need not be the same in both steps, as the following result spells out.

Proposition 2.13. If $A \overset{OBS}{⤳} B \overset{OBS'}{⤳} C$ and OBS ⊆ OBS′ then $A \overset{OBS}{⤳} C$. □
The key to many of the applications of logical relations, including Theorem 2.5, is the Basic Lemma, which says that any logical relation over A and B relates the interpretation of each lambda term in A to its interpretation in B.

Lemma 2.14 (Basic Lemma for Logical Relations). Let R be a logical relation over A and B. Then for all Γ-environments $η_A, η_B$ such that $R^Γ(η_A, η_B)$, and every term $Γ ⊢ M : σ$, $R^σ([[Γ ⊢ M : σ]]^A_{η_A}, [[Γ ⊢ M : σ]]^B_{η_B})$. □

(Here, $R^Γ(η_A, η_B)$ refers to the obvious extension of R to environments.) For pre-logical relations, we get a two-way implication. This says that pre-logical relations are the most liberal weakening of logical relations that give the Basic Lemma. (The reverse implication fails for logical relations.)

Lemma 2.15 (Basic Lemma for Pre-logical Relations [HS02]). Let $R = \{R^σ ⊆ [[σ]]^A × [[σ]]^B\}_{σ∈Types^{→}(Σ)}$ be a family of relations over A and B. Then R is a pre-logical relation iff for all Γ-environments $η_A, η_B$ such that $R^Γ(η_A, η_B)$, and every Σ-term $Γ ⊢ M : σ$, $R^σ([[Γ ⊢ M : σ]]^A_{η_A}, [[Γ ⊢ M : σ]]^B_{η_B})$. □

Composability of pre-logical relations (Prop. 2.12) is an easy consequence of this. Pre-logical relations enjoy a number of useful algebraic properties apart from closure under composition. For instance:

Proposition 2.16 ([HS02]). Pre-logical relations are closed under intersection, product, projection, permutation and ∀. Logical relations are closed under product, permutation and ∀, but not under intersection or projection. □

A consequence of closure under intersection is that, given a property P of relations that is preserved under intersection, there is always a least pre-logical relation satisfying P. We then have the following lambda-definability result (recall Example 2.9 above):

Proposition 2.17 ([HS02]). The least pre-logical predicate over a given combinatory algebra contains exactly those elements that are the values of closed Σ-terms. □

In a signature with no term constants, a logical relation may be constructed by defining a relation R on base types and using the definition to "lift" R inductively to higher types. The situation is different for pre-logical relations: there are in general many pre-logical liftings of a given R, one being of course
its lifting to a logical relation. But since the property of lifting a given R is preserved under intersection, the least pre-logical lifting of R is also a well-defined relation. Similarly, the least pre-logical extension of a given family of relations is well-defined for any signature. Lifting R to a logical relation is not possible in general for signatures containing higher-order term constants. Extension is also problematic: the cartesian product A × A is a logical relation that trivially extends any binary relation on A, but this is uninteresting.
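Over finite carriers the logical lifting of a base relation to an arrow type can be computed directly, which makes the contrast with the many pre-logical liftings concrete. A small sketch of our own, with functions represented extensionally as association lists:

```haskell
-- Functions over finite carriers, represented extensionally.
type Fun a b = [(a, b)]

apply :: Eq a => Fun a b -> a -> b
apply f x = maybe (error "not in carrier") id (lookup x f)

-- The logical lifting to function space: f and g are related iff
-- they map rArg-related arguments to rRes-related results.
liftArrow :: (Eq a, Eq a')
          => [a] -> [a']                 -- the two argument carriers
          -> (a -> a' -> Bool)           -- relation on arguments
          -> (b -> b' -> Bool)           -- relation on results
          -> Fun a b -> Fun a' b' -> Bool
liftArrow domA domA' rArg rRes f g =
  and [ rRes (apply f x) (apply g y)
      | x <- domA, y <- domA', rArg x y ]
```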
3 Pre-logical Relations for System F
The simply-typed lambda calculus λ→ considered in the last section is a very simple language. Extending it with other type constructors, for example sum and product types, is unproblematic, see [HS02]. Much more challenging is the addition of parametric polymorphism as found in functional programming languages, which yields System F [Gir71,Rey74]. A hint of the power this adds, apart from the obvious ability to define functions that work uniformly over a family of types, is the fact that it is possible to encode inductive types, including the natural numbers, booleans, lists and products, in pure System F [BB85]. Our interest is in data refinement over System F viewed as a programming language, as a means of applying the ideas in the previous section to languages like Standard ML. The key concept underlying data refinement, as we have seen, is that of observational equivalence, and thus understanding the notion of observational equivalence between models of System F is the main theme. Towards this goal, we extend the semantic approach described in Sect. 2 to System F. This involves extending pre-logical relations to System F and characterising observational equivalence by pre-logical relations. Leiß [Lei01] has developed a formulation of pre-logical relations and studied their properties in Fω, the extension of System F by type constructors. In the context of this paper we restrict attention to plain System F, even though the further extension to Fω presents no additional difficulties.

Definition 3.1. The set of types for System F over a set B of base types is given by the grammar σ ::= b | α | σ → σ | ∀α.σ, where α ranges over type variables. A signature Σ consists of a set B of type constants and a collection C of typed term constants c : σ where σ is closed. $Types^{→∀}(Σ)$ denotes the set of types over B. The set of Σ-terms for System F is given by the grammar M ::= x | c | λx:σ.M | M M | Λα.M | M σ. For simplicity, we obey Barendregt's variable convention: bound variables are chosen to differ in name from free variables in any type or term. A Σ-type context (ranged over by ∆) is a list of distinct type variables. A Σ-context (ranged over by Γ) is a list of pairs of variables and types in $Types^{→∀}(Σ)$, where the variables are distinct from each other. We often omit Σ if it is clear from the context. For the type system and representation of data types, see e.g. [GTL90]. By $∆ ⊢ τ$, $∆ ⊢ Γ$
and $∆ \mid Γ ⊢ M : τ$ we mean to declare a type, a context and a term which are well-formed. First we introduce the underlying model theory of System F in the style of Bruce, Mitchell and Meyer [BMM90].

Definition 3.2. A Σ-BMM interpretation (abbreviated BMMI) A consists of

– a set $T_A$ and a family $[T_A^n → T_A] ⊆ T_A^n → T_A$ for each n ∈ ℕ satisfying certain conditions²,
– an element $[[b]]^A ∈ T_A$ for each b ∈ B of Σ,
– functions $⇒_A : T_A × T_A → T_A$ and $∀_A : [T_A → T_A] → T_A$,
– a $T_A$-indexed family of sets $A_t$,
– a function $App^{t⇒u}_A : A_{t⇒_A u} → (A_t → A_u)$ for each $t, u ∈ T_A$,
– a function $App^{∀f}_A : A_{∀_A f} → \prod_{t∈T_A} A_{f(t)}$ for each $f ∈ [T_A → T_A]$.

² $[T_A^n → T_A]$ includes projections and is closed under composition, $⇒_A$ and $∀_A$. See [Has91] for a detailed account.

Here we introduce two pieces of terminology. A ∆-environment is a mapping from the type variables in ∆ to $T_A$; we write $T_A^∆$ for the set of ∆-environments. For a context $∆ ⊢ Γ$, a Γ-environment is a mapping which maps each variable x in Γ to an element of $A_{[[Γ(x)]]^A_χ}$, where χ is a ∆-environment; we write $A^{[[Γ]]^A_χ}$ for the set of Γ-environments. We continue the definition:

– a meaning function for types $[[-]]^A$, which maps a type $∆ ⊢ σ$ and $χ ∈ T_A^∆$ to $[[σ]]^A_χ ∈ T_A$ (for details, see [Has91]),
– an element $[[c]]^A ∈ A_{[[σ]]^A}$ for each c : σ in Σ,
– a meaning function for terms $[[-]]^A$ (we use the same symbol), which maps a term $∆ \mid Γ ⊢ M : σ$ and environments $χ ∈ T_A^∆$, $η ∈ A^{[[Γ]]^A_χ}$ to a value $[[M]]^A_{χ;η} ∈ A_{[[σ]]^A_χ}$ (for details, see [Has91]).

Given Σ-BMMIs A and B, we can define the product Σ-BMMI A × B in the obvious componentwise fashion.

Definition 3.3. Let A be a Σ-BMMI. A predicate R over A (written R ⊆ A) consists of a subset $T_R ⊆ T_A$ and a $T_R$-indexed family of subsets $R^t ⊆ A_t$. For $t, u ∈ T_A$ and $f ∈ [T_A → T_A]$ such that $f(t) ∈ T_R$ for any $t ∈ T_R$, we define

  $R^t → R^u = \{x ∈ A_{t⇒_A u} \mid ∀y ∈ R^t.\ App^{t⇒u}_A(x)(y) ∈ R^u\}$
  $∀x{∈}T_R.\,R^f = \{x ∈ A_{∀_A f} \mid ∀t ∈ T_R.\ π_t(App^{∀f}_A(x)) ∈ R^{f(t)}\}$

Binary relations for Σ-BMMIs are just predicates over product interpretations. Now a predicate R ⊆ A is

– pre-logical for types if it satisfies the following:
  • $[[b]]^A ∈ T_R$,
  • $t, u ∈ T_R$ implies $t ⇒_A u ∈ T_R$,
  • for all types $∆, α ⊢ σ$ with $χ ∈ T_R^∆$, if $[[σ]]^A_{χ\{α↦t\}} ∈ T_R$ holds for all $t ∈ T_R$, then $[[∀α.σ]]^A_χ ∈ T_R$;
– algebraic if it is pre-logical for types and
  • $[[c]]^A ∈ R^{[[σ]]^A}$ for all c : σ in Σ,
  • for all $t, u ∈ T_R$, $R^{t⇒_A u} ⊆ R^t → R^u$,
  • for all types $∆, α ⊢ σ$ with $χ ∈ T_R^∆$, $R^{∀_A f} ⊆ ∀x{∈}T_R.\,R^f$ holds, where $f(t) = [[σ]]^A_{χ\{α↦t\}}$³;
– pre-logical if it is algebraic and
  • for all terms $∆ \mid Γ, x{:}σ ⊢ M : σ'$ with $χ ∈ T_R^∆$ and $η ∈ R^{[[Γ]]^A_χ}$, if $[[M]]^A_{χ;η\{x↦v\}} ∈ R^{[[σ']]^A_χ}$ holds for all $v ∈ R^{[[σ]]^A_χ}$, then $[[λx{:}σ.M]]^A_{χ;η} ∈ R^{[[σ]]^A_χ ⇒_A [[σ']]^A_χ}$,
  • for all terms $∆, α \mid Γ ⊢ M : σ$ with $χ ∈ T_R^∆$ and $η ∈ R^{[[Γ]]^A_χ}$, if $[[M]]^A_{χ\{α↦t\};η} ∈ R^{f t}$ holds for all $t ∈ T_R$, then $[[Λα.M]]^A_{χ;η} ∈ R^{∀_A f}$, where $f(t) = [[σ]]^A_{χ\{α↦t\}}$;
– logical if it is pre-logical for types and the conditions for algebraic relations hold with equality.

Alternatively, we can extend the definition of pre-logical relations (Definition 2.6) in terms of algebraic relations relating additional combinators for System F [BMM90]. We have the following main theorem on pre-logical predicates:

Theorem 3.4 (Basic Lemma for Pre-logical Predicates [Lei01]). Let A be a Σ-BMMI and R ⊆ A be a predicate.

1. R is pre-logical for types iff for all types $∆ ⊢ σ$ with $χ ∈ T_R^∆$, $[[σ]]^A_χ ∈ T_R$ holds.
2. R is pre-logical iff for all terms $∆ \mid Γ ⊢ M : σ$ with $χ ∈ T_R^∆$ and $η ∈ R^{[[Γ]]^A_χ}$, $[[M]]^A_{χ;η} ∈ R^{[[σ]]^A_χ}$ holds.

³ At this point we know that $f(t) ∈ T_R$ for any $t ∈ T_R$ by the first part of Theorem 3.4. Thus $∀x{∈}T_R.\,R^f$ is defined.
Corollary 3.5 ([Lei01]). Logical predicates over Σ-BMMIs are pre-logical.
Proposition 3.6 ([Lei01]). Let A be a Σ-BMMI. We define the definability predicate D by $T_D = \{[[∅ ⊢ σ]]^A\}$ and $D^{[[∅⊢σ]]^A} = \{[[∅ \mid ∅ ⊢ M : σ]]^A\}$. Then D is the least pre-logical predicate over A.
It is easy to see that pre-logical predicates for System F are closed under product, permutation and arbitrary intersection. On the other hand, they are not closed under composition (nor under projection). This was pointed out by Leiß in the setting of Fω [Lei01]. The composition R ∘ S of two relations R ⊆ A × B, S ⊆ B × C is given as follows:

  $T_{R∘S} = T_R ∘ T_S$
  $(R ∘ S)^{t,u} = \bigcup\,\{R^{t,r} ∘ S^{r,u} \mid (t, r) ∈ T_R,\ (r, u) ∈ T_S\}$
Proposition 3.7. Binary pre-logical relations between Σ-BMMIs do not compose in general.

Proof. Let $Σ_c = (\{b\}, \{c : b\})$ be a signature and A be a $Σ_c$-BMMI where $[[b]]^A$ contains at least two elements, namely ⊤ and ⊥. Any non-trivial BMMI for the empty signature can be used for this purpose. We interpret the constant by $[[c]]^A = ⊤$. We use λx.⊥ as shorthand for $[[λx : b\,.\,y]]^A_{∅;\{y↦⊥\}} ∈ A_{[[b]]^A ⇒_A [[b]]^A}$. Let $θ = [b/α]$ and $θ' = [b ⇒ b/α]$ be type substitutions. We define the relation $T_R$ by $T_R = \{([[σθ]]^A, [[σθ']]^A) \mid α ⊢ σ\}$. It is easy to show that this is pre-logical for types. Next we define $R^{[[σθ]]^A_χ,\,[[σθ']]^A_χ}$ for all $∆, α ⊢ σ$ and $χ ∈ T_R^∆$ by induction:

  $R^{[[bθ]]^A_χ,\,[[bθ']]^A_χ} = \{(⊤, ⊤)\}$
  $R^{[[αθ]]^A_χ,\,[[αθ']]^A_χ} = \{(⊥, λx.⊥)\}$
  $R^{[[βθ]]^A_χ,\,[[βθ']]^A_χ} = R^{χ(β)}$
  $R^{[[(σ⇒σ')θ]]^A_χ,\,[[(σ⇒σ')θ']]^A_χ} = R^{[[σθ]]^A_χ,\,[[σθ']]^A_χ} → R^{[[σ'θ]]^A_χ,\,[[σ'θ']]^A_χ}$
  $R^{[[(∀β.σ)θ]]^A_χ,\,[[(∀β.σ)θ']]^A_χ} = ∀x{∈}T_R.\,R^f$   where $f(t, u) = ([[σθ]]^A_{χ\{β↦t\}}, [[σθ']]^A_{χ\{β↦u\}})$

We can show that $R = (T_R, R)$ is pre-logical. However, the relation $R^{-1} ∘ R$ relates (λx.⊥, λx.⊥) but not (⊥, ⊥). This contradicts algebraicity. □
4
Expressing Simulation Relations Syntactically
Our journey now moves into the syntactic realm, placing the concepts of simulation relation and representation independence in a logical setting. The main incentive is that computer-aided reasoning requires syntactic expressibility. One reasonable choice for syntactic formalism is the polymorphic lambda calculus, together with a second-order logic, with the lambda calculus being the object programming language. The decision in this section to make use of polymorphism is motivated primarily by expressibility, since semantic notions may then be internalised in syntax. At the outset, we use this expressive power to express the simply-typed notions of Sect. 2. However, in Sect. 4.4, polymorphism in data types is also handled. Nevertheless, in Sect. 4.6 we suggest that the appropriate setting for describing polymorphic data types is actually F3 . Our choice here of formalism influences the way we regard the failure of standard simulation relations, i.e., logical relations, at higher order. It turns out that a natural trail of development gives a solution that is conceptually
78
Jo Hannay, Shin-ya Katsumata, and Donald Sannella
different from that of pre-logical relations, although it should be evident that the concepts are strongly related. The syntactic approach here is developed directly from syntactic abstraction barriers inherent in polymorphic types. This gives a notion of abstraction barrier-observing (abo) simulation relation. 4.1
Internalisation into Syntax
To start, we consider System F, cf. Sect. 3. Here we wish to be formalistic, so we use pure System F without constants. Self-iterating inductive types can be def def encoded [BB85], e.g., nat = ∀α.α → (α → α) → α, bool = ∀α.α → α → α, and def list σ = ∀α.α → (σ → α → α) → α, with programmable constructors, destrucdef tors and conditionals. Products encode as σ × τ = ∀α.(σ → τ → α) → α with constructor pair σ,τ and destructors proj 1σ,τ and proj 2σ,τ . We use the logic for parametric polymorphism due to [PA93], a second-order logic over System F augmented with relation symbols, relation definition, and the axiomatic assertion of relational parametricity. See also [Mai91,Tak98]. Formulae now include relational statements as basic predicates and quantifiables, φ ::= (M =σ N ) | M ξ N | · · · | ∀ξ ⊂ σ×τ . φ | ∃ξ ⊂ σ×τ . φ where ξ ranges over relation variables. Relation definition is given by the syntax Γ (x : σ, y : τ ) . φ ⊂ σ×τ def
where φ is a formula. For example eq σ = (x : σ, y : σ) . (x =σ y). We write U [X] to indicate possible occurrences of variable X in type, term or formula U , and write U [A] for the capture-correct substitution U [A/X]. Complex relations may be built from simpler ones. We get the arrow-type relation R → R ⊂ (σ → σ )×(τ → τ ) from R ⊂ σ×τ and R ⊂ σ ×τ by (R → R ) = (f : σ → σ , g : τ → τ ) . (∀x : σ.∀y : τ . (x R y ⇒ (f x) R (gy))) def
The universal-type relation $∀(α, β, ξ ⊂ α×β)R[ξ] ⊂ (∀α.σ[α])×(∀β.τ[β])$ is defined from $R[ξ] ⊂ σ[α]×τ[β]$, where α, β and ξ ⊂ α×β are free, by

  $∀(α, β, ξ ⊂ α×β)R[ξ] \stackrel{def}{=} (y : ∀α.σ[α], z : ∀β.τ[β])\,.\,(∀α.∀β.∀ξ\,.\,(yα)\,R[ξ]\,(zβ))$

For n-ary $\barα, \barσ, \barτ, \bar{R}$, where $R_i ⊂ σ_i×τ_i$, we get $ρ[\bar{R}] ⊂ ρ[\barσ]×ρ[\barτ]$, the action of the type $ρ[\barα]$ on $\bar{R}$, by substituting relations for type variables:

  $ρ[\barα] = α_i$:            $ρ[\bar{R}] = R_i$
  $ρ[\barα] = ρ'[\barα] → ρ''[\barα]$:   $ρ[\bar{R}] = ρ'[\bar{R}] → ρ''[\bar{R}]$
  $ρ[\barα] = ∀α'.ρ'[\barα, α']$:    $ρ[\bar{R}] = ∀(β, γ, ξ ⊂ β×γ)ρ'[\bar{R}, ξ]$

Here, $\bar{R}$ may be seen as base relations from which one uniquely defines relations according to the type construction. This is logical lifting, and it gives the mechanism for logical relations in our syntactic setting. The proof system is intuitionistic natural deduction, augmented with inference rules for relation symbols in the obvious way. There are standard axioms for equational reasoning implying extensionality for arrow and universal types.
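The inductive encodings recalled at the start of this subsection can be written down directly in a language with rank-2 polymorphism; a small Haskell sketch of our own, using GHC's RankNTypes in place of System F:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church encodings of the inductive types mentioned above.
type CNat    = forall a. a -> (a -> a) -> a
type CBool   = forall a. a -> a -> a
type CList s = forall a. a -> (s -> a -> a) -> a

czero :: CNat
czero z _ = z

csucc :: CNat -> CNat
csucc n z s = s (n z s)

ctrue, cfalse :: CBool
ctrue  t _ = t
cfalse _ f = f

cnil :: CList s
cnil n _ = n

ccons :: s -> CList s -> CList s
ccons x xs n c = c x (xs n c)

-- Observing an encoded value at a built-in type.
toInt :: CNat -> Int
toInt n = n 0 (+ 1)
```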
Parametric polymorphism requires all instances of a polymorphic functional to exhibit a uniform behaviour [Str67,BFSS90,Rey83]. We adopt relational parametricity [Rey83,MR91]: a polymorphic functional instantiated at two related domains should give related instances. This is asserted by the schema

  Param: $∀\barγ.∀f : (∀α.σ[α, \barγ])\,.\,f\ (∀α.σ[α, eq_{\barγ}])\ f$

The logic with Param is sound; we have, e.g., the parametric per-model of [BFSS90] and the syntactic models of [Has91]. In order to prove the existence of a model, one has to show that Param holds for all closed f. If one then expands the statement, one obtains a syntactic analogue of the Basic Lemma for logical relations, but here involving universal types.

Lemma 4.1 (Basic Lemma, Param [PA93]). For all closed f : ∀α.σ[α], we derive, without Param, f (∀α.σ[α]) f. □

Constructs such as products, sums, and initial and final (co-)algebras are encodable in System F. With Param, these become provably universal constructions. Relational parametricity also yields the fundamental

Lemma 4.2 (Identity Extension, Param [PA93]). With Param, we derive

  $∀\barγ.∀u, v : σ[\barγ]\,.\,(u\ σ[eq_{\barγ}]\ v ⇔ (u =_{σ[\barγ]} v))$ □
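Identity extension and Param are what license the familiar "free theorems"; for instance (a standard textbook example, not from this paper), any function of type ∀α. list α → list α commutes with map, which we can at least test pointwise:

```haskell
-- Any f :: forall a. [a] -> [a] satisfies, by parametricity,
--     f (map g xs) == map g (f xs)
-- for every g. A quick check with f = reverse:
prop_free :: (Int -> Int) -> [Int] -> Bool
prop_free g xs = reverse (map g xs) == map g (reverse xs)

main :: IO ()
main = print (prop_free (* 2) [1, 2, 3])   -- True
```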
For data types, we use the following notation: a data type over a signature T consists of a data representation A and an implementation of a set of operations a : T[A]. Encapsulation is provided in the style of [MP88] by the following encoding of existential (abstract) types and pack and unpack combinators:

  $∃α.T[α] \stackrel{def}{=} ∀β.(∀α.(T[α] → β) → β)$,   β not free in T[α]
  $pack_T(A)(a) \stackrel{def}{=} Λβ.λf : ∀α.(T[α] → β).f(A)(a)$
  $unpack_T(package)(τ)(f) \stackrel{def}{=} package(τ)(f)$

Operationally, pack packages a data representation and an implementation of operations on that data representation to give a data type of the existential type. The resulting package is a polymorphic functional that, given a client computation and its result domain, instantiates the client with the particular elements of the package. The unpack combinator is merely the application operator for pack. An abstract type for stacks of natural numbers could be

  $∃α.(α × (nat→α→α) × (α→α) × (α→nat))$

A data type of this type is, e.g., $(pack\ list_{nat}\ l)$, where $(proj_1\,l) = nil$, $(proj_2\,l) = cons$, $(proj_3\,l) = λl : list_{nat}\,.\,(cond_{list_{nat}}\ (isnil\ l)\ nil\ (cdr\ l))$, and $(proj_4\,l) = λl : list_{nat}\,.\,(cond_{nat}\ (isnil\ l)\ 0\ (car\ l))$. For convenience we use a labelled product notation,
  $∃α.T_{Stack_{nat}}[α]$

where $T_{Stack_{nat}}[α] \stackrel{def}{=} (empty : α,\ push : nat→α→α,\ pop : α→α,\ top : α→nat)$. Each $f_i : T_i[α]$ is a profile in T[α]. The analogy to Sect. 2 is that $f_i$ is a term constant in the signature T, and models are internalised as packages $(pack\ A\ a)$.

Consider now the issue of when two packages are interchangeable in a program. To each refinement stage, a set OBS of observable types is associated, assumed to contain closed inductive types, such as bool or nat, and also any parameters. Two data types are interchangeable if their observable properties are the same, i.e., packages should be observationally equivalent if it makes no difference which one is used in computations with observable result types. Thus:

Definition 4.3 (Observational Equivalence). Observational equivalence of $(pack\ A\ a)$ and $(pack\ B\ b)$ with respect to OBS is expressed by

  $\bigwedge_{ι∈OBS}\ ∀f : ∀α.(T[α] → ι)\,.\,(f\,A\,a) =_ι (f\,B\,b)$
For example, an observable computation on natural-number stacks could be $Λα.λx : T_{Stack_{nat}}[α]\,.\,x.top(x.push\ n\ x.empty)$. Observational equivalence is the conceptual description of interchangeability, and simulation relations are a means for showing observational equivalence. In the logic, one uses the action of types on relations to define logical relations. Two data types are related by a simulation relation if there exists a relation on their data representations that is preserved by their corresponding operations.

Definition 4.4 (Simulation Relation). The existence of a simulation relation between $(pack\ A\ a)$ and $(pack\ B\ b)$ is expressed by $∃ξ ⊂ A×B\,.\,a\,(T[ξ, eq_{\barγ}])\,b$.

We want the two notions to be equivalent. For data types with first-order operations, this equivalence is a fact under relational parametricity. At higher order this is not the case. Also, the composability of simulation relations fails at higher order, compromising the constructive composition of refinement steps. Consider the assumption that T[α] has only first-order function profiles:

  $FDT^{T}_{OBS}$: Every profile $T_i[α] = T_{i1}[α] → \cdots → T_{in_i}[α] → T_{ci}[α]$ of T[α] is first-order, and such that $T_{ci}[α]$ is either α or some ι ∈ OBS.

Theorem 4.5 (Composability [Han99]). Assuming $FDT^{T}_{OBS}$, with Param we get

  $∀A, B, C, ξ ⊂ A×B, ζ ⊂ B×C, a : T[A], b : T[B], c : T[C].$
  $\quad a\ T[ξ, eq_{\barγ}]\ b \ ∧\ b\ T[ζ, eq_{\barγ}]\ c \ ⇒\ a\ T[ξ∘ζ, eq_{\barγ}]\ c$ □

Theorem 4.6 (Representation Independence [Han99]). Assuming $FDT^{T}_{OBS}$, we get with Param, for A, B, a : T[A], b : T[B] and OBS,

  $∃ξ ⊂ A×B\,.\,a\ T[ξ, eq_{\barγ}]\ b \ ⇔\ \bigwedge_{ι∈OBS} ∀f : ∀α.(T[α] → ι)\,.\,(f\,A\,a) =_ι (f\,B\,b)$ □
Semantic and Syntactic Approaches to Simulation Relations
81
For Theorem 4.6, consider how to derive ⇒. In Sect. 2, we would use the Basic Lemma in this situation. Here, we apply Param; f (∀α.T [α, eq γ ] → ι) f . As mentioned before, Param for closed f is essentially the Basic Lemma for logical relations. In the semantic setting one can talk about closed terms. This is not immediately possible syntactically, and note that the definition of observational equivalence here says nothing about f being closed. To compensate for this, we use a extended ‘Basic Lemma’, namely relational parametricity. For the opposite direction, one must construct a relation ξ. Analogous to the case in the semantic setting, we use definability, but since closedness is intangible def for us, we can only exhibit ξ = (a : A, b : B) . (Dfnbl (a, b)), where def
Dfnbl = (x : A, y : B) . (∃fα : ∀α.T [α] → α . fα Aa = x ∧ fα Bb = y) This works since observational equivalence is defined ‘open’ as well. For example, if there is a profile g : α → α in T , then we show a.g (Dfnbl → Dfnbl ) b.g which def follows easily by giving f = Λα.λx : α.x.g(fα αx), where fα is postulated by the antecedentary Dfnbl . If g : α → ι, then f : ∀α.T [α] → ι, and by observational equivalence we have f Aa =ι f Bb, which gives f Aa ι f Bb by Param. This particular proof fails if T [α] has higher-order profiles. Consider def
T [α] = (p : (α → α) → nat, s : α → α) We must derive ∀x : A → A, y : B → B . x(Dfnbl → Dfnbl )y ⇒ a.px =nat b.py. However, x(Dfnbl → Dfnbl )y does not give us an fα→α : ∀α.T [α] → (α → α) such that fα→α Aa = x ∧ fα→α Bb = y, so we cannot construct our f : ∀α.T [α] → nat to complete the proof. This negative result involving Dfnbl generalises. At higher order, there might not exist any simulation relation in the presence of observational equivalence [Han01]. To exemplify with T [α] above, any candidate R ⊂ A × B has to satisfy ∀x : A → A, y : B → B . x (R → R) y ⇒ a.px =nat b.py, and this includes x and y that do not belong to, or are not expressible by, operations in the respective data types. This, one might argue, is unreasonable. In fact, it is. Consider a computation f = Λα.λx : T [α].M [α, x]. A crucial observation is now embodied in the following obvious statement. Abs-Bar : A computation Λα.λx : T [α].M [α, x] cannot have free variables of types involving the virtual data representation α. This has a direct bearing on how data type operations may be used. For example, x : A → A and y : B → B above cannot be arbitrary, but must be expressible by respective data type operations; in this case, the only possible candidate for x is a.g, and b.g for y. 4.2
abo-Simulation Relations with Special Parametricity
An obvious solution is now to define a notion of simulation relation where arrowtype relations are weakened by definability clauses for arguments [Han00,Han01]. def For example, write a.s (R → R) גb.s, for = גA, Ba, b, meaning ∀x : A, y : B . x R y ∧ Dfnbl גα (x, y) ⇒ a.sx R b.sy
82
Jo Hannay, Shin-ya Katsumata, and Donald Sannella
where Dfnbl גα (x, y) = (x : A, y : B) . (∃fα : ∀α.T [α] → α . fα Aa = x ∧ fα Bb = y). In general, definability clauses are inserted recursively in arrow types, bottoming def out at base relations, i.e., R = גR. The full definition is in [Han01], and includes the formulation at universal type as well. With a slight abuse of notation, def
Definition 4.7 (abo-Simulation Relation). For any A, B and R ⊂ A × B, T [R, eq γ ]( = גa : T [A, γ], b : T [B, γ]) . (∧1≤i≤k a.gi (Ti [R, eq γ ] ) גb.gi ) def
Observable types such as nat are universal types and appear in ג-variants inside T [ξ]. Therefore, it is important that nat גis eq nat . This holds for closed inductive types using Param. However, we do not get desired relational properties at product type, hence the formulation in Definition 4.7. We are still working under the assumption of relational parametricity. However, notice that we cannot apply Param, e.g., when using T [ξ, eq γ ] ג. One can recover the needed proof power by asserting the missing piece of relational parametricity. Write f (∀α.T [α, eq γ ]ε → σ[α, eq γ ]ε ) f , meaning ∀A, B, ξ ⊂ A×B.∀a : T [A, γ], b : T [B, γ] . a(T [ξ, eq γ ]) גb ⇒ (f A a)(σ[ξ, eq γ ]() גf B b) where = גA, B, a, b. We assume the following: T HDT OBS : Every profile Ti [α] = Ti1 [α] → · · · → Tni [α] → Tci [α] of T [α] is such that Tij [α] has no occurrences of universal types other than those in OBS , and Tci [α] is either α or some ι ∈ OBS . T Definition 4.8 (Special abo-Parametricity (SpParam)). For HDTOBS , for σ[α, γ] having no occurrences of universal types other than those in OBS , and whose only free variables are among α and γ,
SpParam: ∀f : ∀α.(T [α, γ] → σ[α, γ]) . f (∀α.T [α, eq γ ]ε → σ[α, eq γ ]ε ) f T Lemma 4.9 (Basic Lemma SpParam [Han01]). For HDTOBS , for σ[α] having no occurrences of universal types other than those in OBS , and for closed f : ∀α.(T [α] → σ[α]), we derive f (∀α.T [α]ε → σ[α]ε ) f . 2 Lemma 4.9 entails soundness for the logic with Param and SpParam with respect to the closed type and term model and the parametric minimal model due to [Has91]. T Theorem 4.10 (Composability [Han00]). Assuming HDTOBS , we get using SpParam, for = גA, Ba, b, = גB, Cb, c, and = גA, Ca, c,
∀A, B, C, ξ ⊂ A×B, ζ ⊂ B ×C, a : T [A], b : T [B], c : T [C]. a T [ξ, eq γ ] גb ∧ b T [ζ, eq γ ] גc ⇒ a T [ζ ◦ ξ, eq γ ] גc 2 Theorem 4.11 (Representation Independence [Han00]). With the T assumption HDTOBS , we get with SpParam, for A, B, a : T [A], b : T [B], OBS , and = גA, Ba, b, ∃ξ ⊂ A×B . a T [ξ, eq γ ] גb ⇔ ∀f : ∀α.(T [α] → ι) . (f A a) =ι (f B b) ι∈OBS
2
Semantic and Syntactic Approaches to Simulation Relations
4.3
83
abo-Simulation Relations with Closed Special Parametricity
If we can express closedness in the logic, then we can relate to non-syntactic models as well. Closedness is inherently intractable, but we can approximate to a certain degree. We add a basic predicate Closed to the syntax together with a pre-defined semantics. The effect of this is, for example, that the interpretation in any model of ∀f : ∀α.(T [α, γ] → ι) . Closed OBS (f ) ⇒ φ(f ) restricts attention in φ to those interpretations of all f : ∀α.(T [α] → ι) that are denotable by terms whose only free variables are of types in OBS . The semantics for the predicate Closed is not stable under term formation, so we cannot make axioms for Closed in order to derive the closedness of a term from its subterms. We can however add a second symbol ClosedS with a pre-defined semantics that does allow the derivation of closedness, but ClosedS will then not satisfy substitutivity. This is resolved by giving a separate nonsubstitutive calculus for deriving closedness, together with rules for importing the needed results into the main logic. Details are in [Han01]. We now get: Definition 4.12 (Observational Equivalence by Closed Computation). Observational equivalence by closed computation of (pack Aa) and (pack Bb) with respect to OBS is expressed as ∀f : ∀α.(T [α] → ι) . Closed OBS (f ) ⇒ (f A a) =ι (f B b) ι∈OBS def
Also we write for example, a.s (R → R)גC b.s, for = גA, Ba, b, meaning ∀x : A, y : B . x R y ∧ Dfnbl Cα( גx, y) ⇒ a.sx R b.sy def
where Dfnbl Cα( גx, y) = (x : A, y : B) . (∃fα : ∀α.T [α] → α . Closed OBS (f ) ∧ fα Aa = x ∧ fα Bb = y) Again, with a slight abuse of notation: Definition 4.13 (abo-Simulation Relation by Closed Computation). For any A, B and R ⊂ A × B, T [R, eq γ ]גC = (a : T [A, γ], b : T [B, γ]) . (∧1≤i≤k a.gi (Ti [R, eq γ ]גC ) b.gi ) def
Definition 4.14 (Special Closed abo-Parametricity (spParamC)). For T , for σ[α, γ] having no occurrences of universal types other than those in HDTOBS OBS , and whose only free variables are among α and γ, spParamC: ∀f : ∀α.(T [α, γ] → σ[α, γ]) . Closed OBS (f ) ⇒ f (∀α.T [α, eq γ ]εC → σ[α, eq γ ]εC ) f Using this, we get analogous results to the previous section. The corresponding Basic Lemma entails the soundness of the logic with Param and spParamC with respect to any relational parametric model.
84
Jo Hannay, Shin-ya Katsumata, and Donald Sannella
4.4
abo-Relational Parametricity
The previous two subsections augmented relational parametricity with special instances of what one could call abo-relational parametricity. Now we replace relational parametricity altogether with full-fledged abo-relational parametricity. This gives a much simpler treatment than the previous approaches, but at a price: we now need infinite conjunctions in the logic. These are however well-behaved in the sense that proofs only need pointwise treatment. Moreover, refinement proofs need not be concerned with infinite conjunctions. Abs-Bar says that function arguments in computations are bounded by fordef mal parameters, e.g., in the computation f = Λα.λx : α.λs : α → α.t[x, s], s will only be applied to arguments built from formal parameters x and s. This transfers to instances f σ and f τ . So even at the basic level of universal types, one could for R ⊂ σ×τ say that e.g., sσ (R → R) sτ should reflect this, in that only those x, y are considered for the antecedent x R y that are admissible in the computations. Thence, f (∀α.α → (α → α) → α)abo g is the relation given by ∀γ, δ, ξ ⊂ γ × δ . ∀a : γ, b : δ, s : γ → γ, s : δ → δ . a ξ b ⇒ s (ξ → ξ) s ⇒ f γas ξ gδbs where s (ξ → ξ) s for = γ, δa, bs, s is ∀x : γ, y : δ . x ξ y ∧ Dfnbl γ (x, y) ⇒ sx ξ s y where Dfnbl γ (x, y) = (x : γ, y : δ) . (∃fα : ∀α.T [α] → α . fα γas = x ∧ fα γbs = y). In general, Dfnbl clauses are inserted recursively in arrow types, bottoming out at base relations. The notion of abo-relation in [Han03] formalises the idea. Universal types play two rˆ oles. Consider (∀α.α → (α → α) → (∀β.(α → β) → β) → α)abo , and a term of this type, e.g., Λα.λx : α, s : α → α, p : ∀β.(α → β) → β . s(pαs). The abo-relation treats the outer universal type as the type of a computation, and sets up the Dfnbl clauses according to formal parameters x, s, p. Then, the inner universal type must be treated as a polymorphic parameter, and it is necessary to capture that instances pσ may only vary in α. This is where infinite conjunctions enters the scene, but this discernability for universal types is what enable abo-simulation relations to handle polymorphism in data types, see below. The abstraction barrier-observing formulation of relational parametricity is now given by the following axiom schema. def
Definition 4.15 (abo-Parametricity). abo-Param : ∀γ.∀f : (∀α.σ[α, γ]) . f (∀α.σ[α, eq γ ])abo f The abo-version of the identity extension lemma does not follow from aboParam, because we can no longer use extensionality. Nevertheless, in the spirit of observing abstraction barriers, we argue that in virtual computations, it suffices to consider extensionality only with respect to function arguments that will actually occur. The simplest way to capture this is in fact by asserting identity extension.
Semantic and Syntactic Approaches to Simulation Relations
85
Definition 4.16 (abo-Identity Extension for Universal Types). abo-Iel : ∀γ.∀u, v : (∀α.σ[α, γ]) . u (∀α.σ[α, eq γ ])abo v ⇔ u = v Both abo-Param and abo-Iel hold in the abo-parametric per -model [Han03]. We can also formulate a basic lemma for abo-Param, if we allow infinite derivations. Regular parametricity, Param, will not hold in this model; in fact any logic containing both Param and abo-Param is inconsistent. Note that abo-Iel implies abo-Param. Nevertheless, we choose to display both. With abo-Param and abo-Iel, we regain universal properties, for example for products: ∀σ, τ.∀z : σ×τ . pair (proj 1 z)(proj 2 z) = z ∀u, v : σ × τ . u (σ[eq γ ]×τ [eq γ ])abo v ⇔ (proj 1 u) σ[eq γ ]abo (proj 1 v) ∧ (proj 2 u) τ [eq γ ]abo (proj 2 v) Theorem 4.17 (Representation Independence [Han03]). Under the asT sumption HDTOBS , we get with abo-Param and abo-Iel, ∀A, B.∀a : T [A], b : T [B] . ∃ξ ⊂ A×B . a T [ξ, eq γ ] A,B a,b b 2 ⇔ ι∈OBS ∀f : ∀α.(T [α] → ι) . (f A a) =ι (f B b) T , we get with Theorem 4.18 (Composability [Han03]). Assuming HDTOBS abo-Param and abo-Iel,
∀A, B, C, ξ ⊂ A×B, ζ ⊂ B ×C, a : T [A], b : T [B], c : T [C]. a(T [ξ, eq γ ] A,B a,b )b ∧ b(T [ζ, eq γ ] B,C b,c )c ⇒ a(T [ζ ◦ ξ, eq γ ] A,C a,c )c 2 If we allow infinite derivations, we get representation independence and composability for data types with polymorphic operations, under one requirement: In the sense of Sects. 2 and 3, all type constants must either be observable or hidden, if they are the result type of any operation. Here, type constants correspond to closed types, and since we have polymorphism, type instantiation requires added caution. For example, if OBS = {bool }, then T [α] may not have a profile g : α → (∀β.β → β), since ∀β.β → β is not observable, nor any profile g : α → (∀β.α → β), since gx can then be instantiated by a non-observable closed type yielding a derived profile gx(∀β.β → β) : α → (∀β.β → β). Thus, the requirement takes the form T DT OBS : Every profile Ti [α] = Ti1 [α] → · · · → Tni [α] → Tci [α] of T [α] is such that if Tci [α] has a deepest rightmost universal type ∀β.V , then this subtype is not closed, nor is the deepest rightmost subtype of ∀β.V the quantified β. T Then, Theorem 4.17 and Theorem 4.18 hold under DT OBS . T In closing, we mention that for this section, HDT OBS can in any case be relaxed by dropping the restriction on Tij .
86
Jo Hannay, Shin-ya Katsumata, and Donald Sannella
4.5
pl -Relational Parametricity
It is possible to define algebraic relations in the logic. We do this from basic principles, just as we do for abo-relations. Consider again for example the universal type ∀α.α → (α → α) → α. In a sense, universal types determine signatures with function profiles. This inductive type has a profile for ‘zero’, and a profile for ‘successor’. Relative to the ‘signature’ consisting of these profiles, one can then define algebraic relations in a finite way. Here this can be done for any σ and τ , by giving a relation Rα ⊂ σ × τ , taking the rˆ ole of a base type relation, and then giving a relation Rα→α ⊂ (σ → σ) × (τ → τ ) that we insist satisfies algebraicity: Rα→α (s, s ) ⇒ s (Rα → Rα ) s . In this manner, the universal type induces a family of relations, namely Rα and Rα→α , over the ‘signature’ of the universal type. Thus, we write e.g., f (∀α.α → (α → α) → α)pl g for the relation ∀γ, δ, ξα ⊂ γ × δ, ξα→α ⊂ γ → γ × δ → δ . pl α→α (ξα→α ; ξα ) ⇒ ∀a : γ, b : δ, s : γ → γ, s : δ → δ . Rα (a, b) ∧ Rα→α (s, s ) ⇒ Rα (f γas, gδbs ) where pl α→α (ξα→α ; ξα ) asserts algebraicity of ξα→α relative to ξα . In general, one completes the finite family of relations with all so-called free subtypes, in order to ensure well-definedness of the algebraicity conditions. Also, the full definition of algebraic relations in this manner must reflect the two levels of polymorphism mentioned in the previous section. To get pre-logical relations (pl -relations), one must additionally ensure closure over abstraction. This spoils finiteness, since for a combinatorial approach, we must assert relatedness of an infinite set of combinators. Again this infinite conjunction is well-behaved, since it ranges over all types only varying over the data representations. We may soundly assert pl -relational parametricity and using this, we get similar results to those in the previous section. It might be possible to get away with a finite number of combinators. The rationale behind this is that one may proceed with only a finite family of algebraic relations. If relations of higher order than those in the family are needed, then these can be constructed by logical lifting. This is relevant for polymorphic instantiation. Based on this, it may suffice to have an upper bound on the type complexity of combinators needed. This is under investigation. 4.6
Polymorphic Data Types in F3
Polymorphism within data types is dealt with in a somewhat general manner in the two previous sections. However, it is hard to find natural examples of data types with polymorphic operations that are expressible in System F. Instead, F3 is appropriate, and then one could give e.g., polymorphic stacks as follows. ∃X : ∗ → ∗.TpolyStack [X], TpolyStack [X] = (empty : ∀γ . Xγ, push : ∀γ . γ → Xγ → Xγ, pop : ∀γ . Xγ → Xγ, top : ∀γ . Xγ → γ → γ, map : ∀γ, γ . (γ → γ ) → Xγ → Xγ )
Semantic and Syntactic Approaches to Simulation Relations
87
This provides polymorphic stack operations. The data representation is a type constructor X to be instantiated by the relevant stack element type. One can treat this kind of shallow polymorphism in a pointwise fashion in F2 [Han01], so that one essentially reduces the problem to non-polymorphic signatures. Then it is not necessary that F2 technology deals with polymorphic signatures, neither in one way or another. Alternatively, one could devise appropriate notions of relational parametricity for F3 .
5
Reconciliation
In comparison with the neat and tidy story of pre-logical relations in the simplytyped lambda calculus told in Sect. 2, both the semantic account of pre-logical relations in System F in Sect. 3 and the syntactic approach of abstraction barrier-observing simulation relations in Sect. 4 exhibit certain shortcomings. Our present feeling is that true enlightenment on this subject will require some bridge between the two. The following subsections suggest some possible lines of enquiry that seem promising to us. 5.1 Internalisation of Semantic Notions into Syntax Sects. 2 and 3 deal with semantic simulation relations between models for lambda calculi. Sect. 4, on the other hand, internalises models and simulation relations into syntax. Models (data types) then become terms of an existential type of the def form ∃α.T [α] = ∀β.(∀β.T [α] → β) → β, for some ‘signature’ T [α], and computations or programs using data types are f : ∀α.T [α] → σ. Thus, polymorphism is used to internalise semantic notions. This use of polymorphism is at a level external to data types (models); as in the outermost universal type in computations f : ∀α.T [α] → σ, in contrast to polymorphism within models arising from any polymorphic profiles in T [α]. Relationally, this two-leveled aspect gives rise to certain difficulties. For logical relations it suffices to give a uniform relational definition for universal types, but for abo-relations which use definability relative to data type signatures, it is necessary to reflect the two levels in the relational definitions. This gives a nonuniform relational treatment at universal type. Note that the semantic approach in Sect. 3 does not work on internalised structures in the syntax, and the notion of relational parametricity is there cleaner. Internalising models as, e.g., inhabitants of existential type, gives syntactic control, especially in the context of refining abstract specifications to executable programs. However, we think that the benefits of this to mechanised reasoning should be weighed against a possible scenario without internalisation, but perhaps with sound derived proofs rules for data refinement. This would be a more domain-specific calculus, but might provide a simpler formalism, perhaps more in style with semantic reasoning. 5.2 Equivalence of Models versus Equivalence of Values in a Model As a consequence of the internalisation of semantic notions discussed above, the term “observational equivalence” has been applied at two different levels to
88
Jo Hannay, Shin-ya Katsumata, and Donald Sannella
achieve similar aims. In Sects. 2 and 3, observational equivalence is a relation between two models over the same signature, representing programs. In Sect. 4, it is a relation on encapsulated data types within a single model; such a relation is sometimes referred to as indistinguishability, written ≈. In both of the latter two sections, the power of System F would allow the opposite approach to be taken. Then the question of the relationship between the resulting definitions arises. This question has been investigated in a number of simpler frameworks in [BHW95,HS96,Kat03], where the connection is given by a factorisability result of the form A ≡ B iff A/≈ ∼ = B/≈. It is likely that the same applies in the context of System F, and this might in turn help to shed light on the relationship between the semantic and syntactic worlds. 5.3
Finiteness
Pre-logical relations are defined in terms of definable elements. In logic it is hard to deal with definability in a term-specific way. In Sect. 4.3 we approximated by introducing a new predicate Closed , and in Sect. 4.2 we explicitly related to syntactic models. In both Sect. 4.4 and Sect. 4.5, we basically end up with infinitary logic, albeit in a tractable manner. It may also be feasible to combine elements, for example to use the Closed predicate together with pl -relations. From a purist point of view, all these approaches are slightly unsatisfactory, although for practical purposes they provide methods for proving refinement, since the infinitary issues are basic and of no concern when doing refinement proofs. In fact this is true a fortiori for abo-simulation relations, since these are in fact finite, unlike pre-logical relations. 5.4
Pre-logical Relations and abo-Relations
Both pre-logical relations (pl -relations in the logic) and abo-relations solve the same problems for refinement. The question is then what else they have in common. To make a comparison easier, one can do two things. First, one can transpose the syntactic idea of abo-relation into the semantic setting around e.g., combinatory algebras. It is then probably natural to interpret the Dfnbl clauses as term-definability. In that case when considering data types, it is evident that abo-simulation relations specialise to a finitary version of the minimal pre-logical relation, which is not surprising. The general relationship is however unclear. Conversely, one might transpose the idea of pre-logical relation into syntax with internalised data types. This is mentioned in Sect. 4.5. Then, the connection is not so clear, since the Dfnbl clause says nothing about term-definability, unless we use the Closed predicate of Sect. 4.3. Any comparison would probably depend on the model of choice for the logic. Furthermore, at a more fundamental level, one gets various concepts of relational parametricity. We have the ones in the syntactic setting where data types are internalised, but we also have the external semantic concept in connection to the scenario in Sect. 3. Characterising these in terms of one another is left as an interesting challenge, the start of which is described in the next section.
Semantic and Syntactic Approaches to Simulation Relations
5.5
89
Connection with Pre-logical Relations and Relational Interpretation of System F
We can regard a binary relation over a BMM interpretation as an interpretation of types by relations. The origin of this viewpoint, the relational interpretation of System F, goes back to Reynolds [Rey83] in his attempt to obtain a set-theoretic model of System F. This relational viewpoint enables him to capture the nature of polymorphism in terms of relational parametricity. A semantic account of this viewpoint is given in [MR91,Has91,RR94,BAC95]. Roughly speaking, a relational interpretation of System F consists of two components: a reflexive graph (a graph with an identity edge at each node), which gives a skeleton of binary relations; and an underlying interpretation of System F together with binary relations over it. The interpretation ties nodes and edges of the reflexive graph to the carrier sets of the underlying interpretation and binary relations. This mapping respects identity, i.e. identity edges are mapped to identity relations. We can find a correspondence between pre-logical relations and the relational interpretation of System F. Reflexive graphs are a generalisation of reflexive relations. Thus a pair consisting of a BMM interpretation A and a relation R ⊆ A×A such that TR is reflexive and Rt,t = idAt form a relational interpretation of System F. Conversely, any relational interpretation whose reflexive graph is just a reflexive relation can be regarded as a relation over its underlying interpretation of System F. Moreover we often assume that the mapping from edges to relations respects the interpretation of types. This situation is called natural in [Has91], and under the above correspondence, this means that the corresponding relation is logical. This correspondence suggests that we can bring back our notion of pre-logical relations to consider a class of relational interpretations of System F. We expect that this new class includes interpretations which satisfy Reynolds’ abstraction theorem. The question is how do we understand the notion of relational parametricity in this new class. Parametricity states that relations at universal types include identity relations, but the identity relation itself is a logical notion (in extensional models). One negative consequence of this mismatch is that the Identity Extension Lemma does not hold. On the other hand, modifying parametricity is a good idea for achieving a finer characterisation of observational equivalence. This is exactly achieved on the syntactic side in Sect. 4. We expect that this modification and relevant results developed in the syntactic approach will provide interesting feedback to the relational interpretation of System F.
References BFSS90. BAC95.
E. Bainbridge, P. Freyd, A. Scedrov, and P. Scott. Functorial polymorphism. Theoretical Computer Science 70:35–64 (1990). R. Bellucci, M. Abadi, and P.-L. Curien. A model for formal parametric polymorphism: a PER interpretation for system R. Proc. 2nd Intl. Conf. on Typed Lambda Calculi and Applications, TLCA’95, Edinburgh. Springer LNCS 902, 32–46 (1995).
90 BHW95.
Jo Hannay, Shin-ya Katsumata, and Donald Sannella
M. Bidoit, R. Hennicker and M. Wirsing. Behavioural and abstractor specifications. Science of Computer and Programming, 25:149–186 (1995). BB85. C. B¨ ohm and A. Berarducci. Automatic synthesis of typed λ-programs on term algebras. Theoretical Computer Science 39:135–154 (1985). BMM90. K. Bruce, A. Meyer, and J. Mitchell. The semantics of the second-order lambda calculus. Information and Computation 85(1):76–134 (1990). Gir71. J.-Y. Girard. Une extension de l’interpr´etation de G¨ odel ` a l’analyse, et son application ` a l’´elimination des coupures dans l’analyse et la th´eorie des types. Proc. 2nd Scandinavian Logic Symp., Oslo. Studies in Logic and the Foundations of Mathematics, Vol. 63, 63–92. North-Holland (1971). GTL90. J.-Y. Girard, P. Taylor, and Y. Lafont. Proofs and Types. Cambridge University Press (1990). Han99. J. Hannay. Specification refinement with System F. Proc. 13th Intl. Workshop on Computer Science Logic, CSL’99, Madrid. Springer LNCS 1683, 530–545 (1999). Han00. J. Hannay. A higher-order simulation relation for System F. Proc. 3rd Intl. Conf. on Foundations of Software Science and Computation Structures. ETAPS 2000, Berlin. Springer LNCS 1784, 130–145 (2000). Han01. J. Hannay. Abstraction Barriers and Refinement in the Polymorphic Lambda Calculus. PhD thesis, Laboratory for Foundations of Computer Science (LFCS), University of Edinburgh (2001). Han03. J. Hannay. Abstraction barrier-observing relational parametricity. Proc. 6th Intl. Conf. on Typed Lambda Calculi and Applications, TLCA 2003, Valencia. Springer LNCS 2701 (2003). Has91. R. Hasegawa. Parametricity of extensionally collapsed term models of polymorphism and their categorical properties. Proc. Intl. Conf. on Theoretical Aspects of Computer Software, TACS’91, Sendai. Springer LNCS 526, 495– 512 (1991). Hoa72. C.A.R. Hoare. Proof of correctness of data representations. Acta Informatica 1:271–281 (1972). HS96. M. Hofmann and D. Sannella. On behavioural abstraction and behavioural satisfaction in higher-order logic. Theoretical Computer Science 167:3–45 (1996). HLST00. F. Honsell, J. Longley, D. Sannella and A. Tarlecki. Constructive data refinement in typed lambda calculus. Proc. 3rd Intl. Conf. on Foundations of Software Science and Computation Structures. ETAPS 2000, Berlin. Springer LNCS 1784, 161–176 (2000). HS02. F. Honsell and D. Sannella. Prelogical relations. Information and Computation 178:23–43 (2002). Short version in Proc. Computer Science Logic, CSL’99, Madrid. Springer LNCS 1683, 546–561 (1999). Kat03. S. Katsumata. Behavioural equivalence and indistinguishability in higherorder typed languages. Selected papers from the 16th Intl. Workshop on Algebraic Development Techniques, Frauenchiemsee. Springer LNCS, to appear (2003). Lei01. H. Leiß. Second-order pre-logical relations and representation independence. Proc. 5th Intl. Conf. on Typed Lambda Calculi and Applications, TLCA’01, Cracow. Springer LNCS 2044, 298–314 (2001). MR91. Q. Ma and J. Reynolds. Types, abstraction and parametric polymorphism, part 2. Proc. 7th Intl. Conf. on Mathematical Foundations of Programming Semantics, MFPS, Pittsburgh. Springer LNCS 598, 1–40 (1991).
Semantic and Syntactic Approaches to Simulation Relations Mai91.
91
H. Mairson. Outline of a proof theory of parametricity. Proc. 5th ACM Conf. on Functional Programming and Computer Architecture, Cambridge, MA. Springer LNCS 523, 313–327 (1991). Mil71. R. Milner. An algebraic definition of simulation between programs. Proc. 2nd Intl. Joint Conf. on Artificial Intelligence. British Computer Society, 481–489 (1971). Mit96. J. Mitchell. Foundations for Programming Languages. MIT Press (1996). MP88. J. Mitchell and G. Plotkin. Abstract types have existential type. ACM Trans. on Programming Languages and Systems 10(3):470–502 (1988). PA93. G. Plotkin and M. Abadi. A logic for parametric polymorphism. Proc. Intl. Conf. Typed Lambda Calculi and Applications, TLCA’93, Utrecht. Springer LNCS 664, 361–375 (1993). PPST00. G. Plotkin, J. Power, D. Sannella and R. Tennent. Lax logical relations. Proc. 27th Int. Colloq. on Automata, Languages and Programming, Geneva. Springer LNCS 1853, 85–102 (2000). Rey74. J. Reynolds. Towards a theory of type structures. Programming Symposium (Colloque sur la Programmation), Paris, Springer LNCS 19, 408–425 (1974). Rey81. J. Reynolds. The Craft of Programming. Prentice Hall (1981). Rey83. J. Reynolds. Types, abstraction and parametric polymorphism. Proc. 9th IFIP World Computer Congress, Paris. North Holland, 513–523 (1983). RR94. E. Robinson and G. Rosolini. Reflexive graphs and parametric polymorphism. Proc., Ninth Annual IEEE Symposium on Logic in Computer Science, Paris, 364–371. IEEE Computer Society Press (1994). ST97. D. Sannella and A. Tarlecki. Essential concepts of algebraic specification and program development. Formal Aspects of Computing 9:229–269 (1997). Sch90. O. Schoett. Behavioural correctness of data representations. Science of Computer Programming 14:43–57 (1990). Str67. C. Strachey. Fundamental concepts in programming languages. Lecture notes from the Intl. Summer School in Programming Languages, Copenhagen (1967). Ten94. R. Tennent. Correctness of data representations in Algol-like languages. In: A Classical Mind: Essays in Honour of C.A.R. Hoare. Prentice Hall (1994). Tak98. I. Takeuti. An axiomatic system of parametricity. Fundamenta Informaticae, 33(4):397–432 (1998).
On the Computational Complexity of Conservative Computing Giancarlo Mauri and Alberto Leporati Dipartimento di Informatica, Sistemistica e Comunicazione Universit` a degli Studi di Milano – Bicocca Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
[email protected],
[email protected] Abstract. In a seminal paper published in 1982, Fredkin and Toffoli have introduced conservative logic, a mathematical model that allows one to describe computations which reflect some properties of microdynamical laws of Physics, such as reversibility and conservation of the internal energy of the physical system used to perform the computations. In particular, conservativeness is defined as a mathematical property whose goal is to model the conservation of the energy associated to the data which are manipulated during the computation of a logic gate. Extending such notion to generic gates whose input and output lines may assume a finite number d of truth values, we define conservative computations and we show that they naturally induce a new NP–complete decision problem and an associated NP–hard optimization problem. Moreover, we briefly describe the results of five computer experiments performed to study the behavior of some polynomial time heuristics which give approximate solutions to such optimization problem. Since the computational primitive underlying conservative logic is the Fredkin gate, we advocate the study of the computational power of Fredkin circuits, that is circuits composed by Fredkin gates. Accordingly, we give some first basic results about the classes of Boolean functions which can be computed through polynomial–size constant–depth Fredkin circuits.
1
Introduction
The possibility to perform computations with zero internal energy dissipation has been extensively explored in the past few decades. Considerations of thermodynamics of computing started in the early fifties of the twentieth century. As shown in [11], erasing a bit necessarily dissipates kT ln 2 Joule in a computer operating at temperature T , and generates a corresponding amount of entropy. Here k is Boltzmann’s constant and T the absolute temperature in degrees Kelvin, so that kT ≈ 3 × 10−21 Joule at room temperature. However, in [11] Landauer also demonstrated that only logically irreversible operations necessarily dissipate energy when performed by a physical computer. (An operation
This work has been supported by MIUR project 60% “Teoria degli automi”
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 92–112, 2003. c Springer-Verlag Berlin Heidelberg 2003
On the Computational Complexity of Conservative Computing
93
is logically reversible if its inputs can always be deduced from its outputs.) This result gave substance to the idea that logically reversible computations could be performed with zero internal energy dissipation. Indeed, since the appearance of [11] many authors have concentrated their attention on reversible computations. The importance of reversibility has grown further with the development of quantum computing, where the dynamical behavior of quantum systems is usually described by means of unitary operators, which are inherently logically reversible. Let us note, however, that computing in a logically reversible way says nothing about whether or not the computation dissipates energy: it merely means that the laws of physics do not require that such a dissipation occurs. Of the many papers on reversible computation which have appeared in the literature, the most famous are certainly the work of Bennett on universal reversible Turing machines [3], and the work of Fredkin and Toffoli on conservative logic [6]. In particular, conservative logic has been introduced as a mathematical model that allows one to describe computations which reflect some properties of microdynamical laws of Physics, such as reversibility and conservation of the internal energy of the physical system used to perform the computations. The model is based upon the so called Fredkin gate (see section 6), a three– input/three–output Boolean gate originally introduced by Petri in [14]. In this model conservative computations are realized through circuits which are composed of Fredkin gates. According to [6], conservativeness is usually modeled by the property that the output patterns of the involved Boolean gates are always a permutation of the corresponding input patterns. Besides being conservative, the Fredkin gate is also reversible, that is, it computes a one–to–one map on {0, 1}3 . Notice that conservativeness and reversibility are two independent notions: a gate can satisfy both properties, only one of them, or none. Since every conservative gate produces permutations of its input patterns, it must have the same number of input and output lines. On the other hand, a necessary condition for a gate to be reversible is that the number of output lines be greater than or equal to the number of input lines. Following [4], in this paper we extend the notion of conservativeness to generic gates whose input and output lines may assume a finite number d of truth values, and we derive some properties which are satisfied by conservative gates. By associating equispaced energy levels to the truth values, we show that our notion of conservativeness corresponds to the energy conservation principle applied to the data which are manipulated during the computation. Let us stress that conservativeness is here considered from a strictly mathematical point of view, as in [6]; that is, we are not requiring that the entire energy used to perform the computation is preserved, or that the computing device be a conservative physical system (an ideal but unrealistic situation). In particular, we do not consider the energy needed to actually perform the computation, that is, to apply the operators that transform input values into output values. Successively we introduce the notion of conservative computation, under the reasonable assumption that a gate may store, or accumulate, some energy in its
94
Giancarlo Mauri and Alberto Leporati
internal machinery. Following again [4] we show that conservative computations naturally induce an interesting optimization problem, that we have named Min Storage. By proving that its decision version is NP–complete we show that such problem is NP–hard. Since it is generally believed that no deterministic polynomial time algorithm exists that always gives the correct solution to an NP–hard optimization problem, we present some polynomial time heuristics that give approximate solutions to Min Storage, and we report the results of five computer experiments which have been performed to study the behavior of such heuristics on uniformly randomly chosen instances. Finally, since the computational primitive underlying conservative logic is the Fredkin gate, we advocate the study of the computational power of Fredkin circuits, that is, circuits composed by Fredkin gates. Accordingly, we give some first basic results about the classes of Boolean functions which can be computed by polynomial–size constant–depth Fredkin circuits. Precisely, we show that all Boolean functions which are computed by this kind of circuits can also be computed by depth–two polynomial–size threshold circuits. However, Fredkin circuits can also be regarded as computational devices which generate finite sets of permutations. Since the generated set depends upon the topology of the circuit, we believe that it should be interesting to investigate the relationships between the topology of Fredkin circuits, the computational complexity of the Boolean functions computed by them, and the structure of the corresponding sets of permutations. We conclude the paper by giving some directions towards the study of these relationships.
2
Conservativeness
Our notion of conservativeness is based upon many–valued logics. These are extensions of the classical Boolean logic which have known a great diffusion due to their ability to manage incomplete and/or uncertain knowledge. Different approaches to many–valued logics have been considered in the literature (for an overview, see [16,17]). However, here we are not interested into the study of syntactical or algebraic aspects of many–valued logics; we just use some gates whose input and output lines may assume “intermediate” truth values, such as the gates defined in [5]. 1 2 For every integer d ≥ 2, we consider the finite set Ld = {0, d−1 , d−1 , . . ., d−2 , 1} of truth values; 0 and 1 denote falsity and truth, respectively, whereas the d−1 other values of Ld indicate different degrees of indefiniteness. As usually found in literature, we will use Ld both as a set of truth values and as a numerical set equipped with the standard order relation on rational numbers. An n–input/m–output d–valued function (also called an (n, m, d)–function for short) is a map f : Lnd → Lm d . Analogously, an (n, m, d)–gate and an (n, m, d)– circuit are devices that compute (n, m, d)–functions. A gate is considered as a primitive operation, that is, it is assumed that a gate cannot be decomposed into simpler parts. On the other hand, a circuit is composed by layers of gates: for a precise definition see, for example, [21].
On the Computational Complexity of Conservative Computing
95
Our notion of conservativeness is based upon the conservation of some additive quantities associated with the truth values of Ld . Let us consider the set 1 , ε 2 , . . . , ε d−2 , ε1 ⊆ IR of real values; for exposition conveEd = ε0 , ε d−1 d−1 d−1 nience we can think to such quantities as energy values. To each truth value v ∈ Ld we associate the energy level εv ; moreover, let us assume that the values of Ed are all positive, equispaced, and ordered according to the corresponding 1 truth values: 0 < ε0 < ε d−1 < · · · < ε d−2 < ε1 . If we denote by δ the gap d−1 between two adjacent energy levels then the following linear relation holds: εv = ε0 + δ (d − 1) v
∀ v ∈ Ld
(1)
Notice that it is not required that ε0 = δ. Now, let x = (x1 , . . . , xn ) ∈ Lnd be an input pattern for nan (n, m, d)–gate. We define the amount of energy associated to x as En (x) = i=1 εxi , where εxi ∈ Ed is the amount of energy associated to the i–th element xi of the input pattern. Let us remark that the map En : Lnd → IR+ is indeed a family of mappings parameterized by n, the size of the input. Analogously, for an output m pattern y ∈ Lm d we define the associated amount of energy as Em (y) = i=1 εyi . We can now define a conservative gate as follows. Definition 1. An (n, m, d)–gate, described by the function G : Lnd → Lm d , is conservative if the following condition holds: ∀ x ∈ Lnd
En (x) = Em (G(x))
(2)
Notice that it is not required that the gate has the same number of input and output lines, as it happens with the reversible and conservative gates considered in [6,5]. Using relation (1), equation (2) can also be written as: n m ε0 m ε0 n xi = yj + + δ(d − 1) i=1 δ(d − 1) j=1 Hence, when n = m conservativeness reduces to the conservation of the sum of truth values given in input; in [5] this property is called “weak conservativeness”. A trivial observation is that conservativeness and weak conservativeness coincide in the Boolean case. As for the d–valued case, with d ≥ 3, it is easy to see that conservativeness implies weak conservativeness, whereas the converse is not true: for example, a weakly conservative (3, 3, 3)–gate may transform the input pattern (1, 0, 0) into the output pattern ( 12 , 21 , 0). Also notice that in the Boolean case conservativeness is equivalent to requiring that the number of 1’s given in input is preserved, as originally stated in [6]. An interesting remark is that conservativeness entails an upper and a lower bound to the ratio m n of the number of output lines versus the number of input lines of a gate (or circuit). In fact, maximum amount of energy that can be the n associated to an input pattern is i=1 ε1 = n ε1 , whereas minimum amount the m of energy that can be associated to an output pattern is i=1 ε0 = m ε0 . Clearly if n ε1 < m ε0 then the gate cannot produce any output pattern in a conservative
96
Giancarlo Mauri and Alberto Leporati
ε1 way. As a consequence it must hold m n ≤ ε0 . Analogously, if we consider the minimum amount of energy n ε0 that can be associated to an input pattern x and the maximum amount of energy m ε1 that can be associated to an output ε0 pattern y, it clearly must hold n ε0 ≤ m ε1 , that is m n ≥ ε1 . Summarizing, we have the bounds: ε0 m ε1 ≤ ≤ ε1 n ε0
that is, for a conservative gate (or circuit) the number m of output lines is constrained to grow linearly with respect to the number n of input lines. A natural question is whether we can compute all (n, m, d)–functions in a conservative way. Let us consider the Boolean case. Let f : {0, 1}n → {0, 1}m be a non necessarily conservative function, and let us define the following quantities: Of = max 0, max n {Em (f (x)) − En (x)} x∈{0,1} Zf = max 0, max n {En (x) − Em (f (x))} x∈{0,1}
Practically speaking, Of (resp., Zf ) is the maximum number of 1’s (resp., 0’s) in the output pattern that should be converted to 0 (resp., 1) in order to make the computation conservative. This means that if we use a gate Gf with n + Of + Zf input lines and m+Of +Zf output lines then we can compute f in a conservative way as follows: Gf (x, 1Of , 0Zf ) = (f (x), 1w(x) , 0z(x) ) where 1k (resp., 0k ) is the k–tuple consisting of all 1’s (resp., 0’s), and the pair (1w(x) , 0z(x) ) ∈ {0, 1}Of +Zf is such that w(x) = Of + En (x) − Em (f (x)) and z(x) = Zf − En (x) +Em (f (x)). As we can see, we use some additional input (resp., output) lines in order to provide (resp., remove) the required (resp., exceeding) energy that allows Gf to compute f in a conservative way. It is easily seen that the same trick can be applied to generic d–valued functions f : Lnd → Lm d ; instead of the number of missing or exceeding 1’s, we just compute the missing or exceeding number of energy units, and we provide an appropriate number of additional input and output lines.
3
Conservative Computations
Let us introduce now the notion of conservative computation. We have seen that conservativeness amounts to the requirement that the energy En (x) associated to the input pattern is equal to the energy Em (y) associated to the corresponding output pattern. We can weaken this requirement as follows. Let G : Lnd → Lm d be the function computed by an (n, m, d)–gate. Moreover, let Sin = x1 , x2 , . . ., xk be a sequence of elements from Lnd to be used as input patterns for the gate, and let Sout = G(x1 ), G(x2 ), . . . , G(xk ) be the corresponding sequence of output patterns from Lm d . Let us consider the quantities ei = En (xi )−Em (G(xi )) for all
On the Computational Complexity of Conservative Computing
97
i ∈ {1, 2, . . . , k}; note that, without loss of generality, by an appropriate rescaling we may assume that all ei ’s are integer values. We say that the computation of Sout , obtained starting from Sin , is conservative if the following condition holds: k i=1
ei =
k
En (xi ) −
i=1
k
Em (G(xi )) = 0
i=1
This condition formalizes the requirement that the total energy provided by all input patterns of Sin is used to build all output patterns of Sout . Of course it may happen that ei > 0 or ei < 0 for some i ∈ {1, 2, . . . , k}. In the former case the gate has an excess of energy that should be dissipated into the environment after the production of the value G(xi ), whereas in the latter case the gate does not have enough energy to produce the desired output pattern. Since we want to avoid these situations, we assume to perform computations through gates which are equipped with an internal accumulator (also storage unit) which is able to store a maximum amount C of energy units. We call C the capacity of the gate. The amount of energy contained into the internal storage unit at a given time can be used during the next computation if the output pattern energy is greater than the energy of the corresponding input pattern. If the output patterns G(x1 ), G(x2 ), . . . , G(xk ) of Sout are computed exactly in this order then, assuming that the computation starts with no energy stored into the gate, it is easily seen that st1 := e1 , st2 := e1 + e2 , . . . , stk := e1 + e2 + . . . + ek is the sequence of the amounts of energy stored into the gate during the computation of Sout . Notice that stk = 0 for conservative computations. This condition is equivalent to the requirement that the amount of energy stored into the gate at the end of the computation be equal to the amount of energy stored at the beginning (i.e., zero as assumed here; in section 5 we will also consider the more general case where the initial energy can be greater than zero). In some cases the order with which the output patterns of Sout are computed does not matter. We can thus introduce the following problem: Given an (n, m, d)–gate that computes the map G : Lnd → Lm d , an input sequence x1 , . . . , xk and the corresponding output sequence G(x1 ), . . . , G(xk ), is there a permutation π ∈ Sk (the symmetrical group of order k) such that the computation of G(xπ(1) ), G(xπ(2) ), . . . , G(xπ(k) ) can be performed by a gate having a predefined capacity C? This is a decision problem, whose formal statement follows. (Note that we do not actually need to know the values of x1 , . . . , xk and G(x1 ), . . . , G(xk ): all we need are the values ei = En (xi ) − Em (G(xi )), for i ∈ {1, 2, . . . , k}.) Let E = e1 , e2 , . . . , ek be a finite sequence of integer numbers, defined as i above. For a fixed i ∈ {1, 2, . . . , k}, the i–th prefix sum of E is the value j=1 ej . Let C be a positive integer; we say that E is C–feasible if for each i ∈ {1, 2, . . . , k} the i–th prefix sum of E is in the closed interval [0, C]. Problem 1. Name: ConsComp. – Instance: a set E = {e1 , e2 , . . . , ek } of integer numbers such that e1 + e2 + . . . + ek = 0, and an integer number C > 0.
98
Giancarlo Mauri and Alberto Leporati
– Question: is there a permutation π ∈ Sk (the symmetric group of order k)
such that the sequence eπ(1) , eπ(2) , . . . , eπ(k) is C–feasible? The fact that the resulting sequence eπ(1) , eπ(2) , . . . , eπ(k) is C–feasible can be explicitly written as: 0≤
i
eπ(j) ≤ C
∀ i ∈ {1, 2, . . . , k}
(3)
j=1
The ConsComp problem can be obviously solved by trying every possible permutation π from Sk . However, this procedure requires an exponential time with respect to k, the length of the computation. A natural question is whether it is possible to give the correct answer in polynomial time. With the following theorem we show that the ConsComp problem is NP–complete, and hence it is very unlikely that a polynomial time algorithm exists that solves it. The proof of this theorem was originally published in [4]. Theorem 1. ConsComp is NP–complete. Proof. ConsComp is clearly in NP, since a permutation π ∈ Sk has linear length and verifying if π is a solution can be done in polynomial time. Let us show a polynomial reduction from Partition, which is notoriously an NP–complete problem ([7], page 47). First of all we restate Partition in the following form: – Instance: a set R = {r1 , r2 , . . . , rk } of rational numbers from the interval [0, 1]. – Question: is there a partition (R1 , R2 ) of R such that r= r? r∈R1
r∈R2
The equivalence of this formulation with the one contained in [7] is trivial to prove. Moreover, without loss of generality we can assume that if (R1 , R2 ) is a solution of Partition then it holds r∈R1 r = r∈R2 r = 1: in fact, given a generic instance R = {r1 , r2 , . . . , rk } of Partition it suffices to compute m = k m i=1 ri and divide each element of R by 2 ; as a consequence, it holds r∈R r = 2. Now we consider a generic instance of Partition and we build a corresponding instance of ConsComp such that a solution of the latter can be transformed in polynomial time into a solution of the former. The instance of ConsComp is built as follows: let C = 1, and E = {e1 , e2 , . . . , ek , ek+1 , ek+2 } such that ei = −ri for i ∈ {1, 2, . . . , k} and ek+1 = ek+2 = 1. It is immediate to see that this transformation can be computed in polynomial time. When we solve ConsComp on the instance we have just built, since inequalities stated in (3) must hold, the first element of E which has to be selected to build the permutation π is necessarily a 1. For the same reason we have to choose the next elements from the set {e1 , e2 , . . . , ek } ⊆ E. For each choice, the sum of the elements which have been selected up to that point decreases; the remaining 1 can be chosen if and only if such sum becomes exactly zero. When
On the Computational Complexity of Conservative Computing
99
this situation occurs, the negative elements of E selected up to that point sum up to −1, and thus their opposites constitute one of the two sets of the partition of R in the instance of Partition we started from. In other words, we can place the second 1 in the solution of ConsComp (thus solving the problem) if and only if we can solve the instance of Partition we started from.
The ConsComp problem naturally leads to the formulation of the following optimization problem. Problem 2. Name: Min Storage. – Instance: a set {e1 , e2 , . . . , ek } of integer numbers such that e1 + e2 + . . . + ek = 0. i – Solution: a permutation π ∈ Sk such that j=1 eπ(j) ≥ 0 for each i ∈ {1, 2, . . . , k}. i – Measure: max j=1 eπ(j) .
1≤i≤k
Informally, the output of Min Storage is the minimum value of C for which there exists a permutation π ∈ Sk such that the sequence eπ(1) , eπ(2) , . . . , eπ(k) k is C–feasible. Observe that a trivial upper bound for the value of C is i=1 |ei |, while a trivial lower bound is max1≤i≤k |ei |. It is immediate to see that Min Storage is in the class NPO. In fact, checking whether some given integers e1 , e2 , . . . , ek sum up to zero can be trivially done in polynomial time; each feasible solution has linear length and besides it can be verified in polynomial time whether a given permutation π ∈ Sk is a feasible solution; finally, the measure function can be computed in polynomial time. Since the underlying decision problem ConsComp is NP–complete, we can immediately conclude that Min Storage is NP–hard ([1], page 30). As with the ConsComp decision problem, this means that it is very unlikely that a polynomial time algorithm exists that gives the correct solution to every instance of Min Storage. A trivial exponential time algorithm for solving Min Storage is the following: for each permutation π ∈ Sk we compute: st1 = eπ(1) ,
st2 = eπ(1) + eπ(2) ,
...,
stk = eπ(1) + eπ(2) + . . . + eπ(k)
(notice that stk is always zero by definition of the problem). If there exists an i ∈ {1, 2, . . . , k} such that sti < 0 then we discard the permutation, otherwise we compute stmax = max{st1 , st2 , . . . , stk }. The solution is the minimum value of stmax over all the permutations which have not been discarded. Some optimization is actually possible: if, for a given permutation π and i ∈ {1, 2, . . . , k} it holds sti < 0, then we can immediately discard all permutations π such that π (j) = π(j) for all j ∈ {1, 2, . . . , i}. Analogously, we can discard the same permutations if sti is greater than the best solution obtained so far. Of course, also with these tricks the execution time of this algorithm remains in general exponential with respect to k: a computer implementation has shown no relevant
100
Giancarlo Mauri and Alberto Leporati
advantage with respect to the brute force algorithm which tries all permutations π ∈ Sk , but for artificial instances built on purpose. Another computer experiment has shown that also discarding the permutation under examination whenever sti is found greater than a parameter given in input gives no particular advantage, at least under the reasonable assumption that the input parameter is greater than or equal to the optimal solution of the given instance. In the next section we present some polynomial time heuristics which give approximate solutions to this problem. Successively we report the results of five computer experiments which have been performed to study the behavior of such heuristics on uniformly randomly chosen instances.
4
Polynomial Time Heuristics for Min Storage
Let us first recall the notion of coefficient of approximation. Let cA (I) be the value which is returned as a solution by a heuristic A for the instance I of the Min Storage problem, and let opt(I) be the optimal solution, that is the value returned by the brute force algorithm described above. Then, the coefficient of approximation of algorithm A over the instance I is the value appA (I), where appA (I) =
|cA (I)| opt(I)
Note that appA(I) is always greater than or equal to 1, and that the closer it is to 1, the better the approximate solution is. We say that algorithm A has the guaranteed coefficient of approximation c if appA(I) ≤ c for every instance I. Notice that in this paper we do not show any guaranteed value for the coefficient of approximation of the proposed algorithms. Indeed, finding an approximation algorithm for Min Storage with a guaranteed coefficient of approximation is still an open problem. In order to evaluate the time complexity of the proposed algorithms we make the following assumptions. First of all we assume that all lists are implemented as arrays. By associating a Boolean flag with each element of the lists, indicating whether the element is to be considered deleted or not, we can assume that the removal of a generic element L[i] from a list L takes constant time. As for sorting operations, we assume the use of some comparison-based optimal algorithm such as QuickSort or MergeSort, which takes Θ(k log k) time steps to sort k elements.

4.1 The Greedy Algorithm
The first heuristic we propose is the greedy algorithm. This algorithm maintains a list L of elements to be considered. At the beginning of the execution L contains all the elements {e1, e2, . . . , ek} of the instance. An integer variable st, initially set to 0, indicates the amount of energy currently stored in the gate. The algorithm repeats the following operations until L becomes empty: first it finds the minimum positive value of st + ℓ, with ℓ ∈ L, then it updates the value of st with st + ℓ, and finally it removes ℓ from L. An integer variable stmax
records the maximum value reached by st; the value of stmax at the end of the execution is the result returned by the greedy algorithm. It is easily seen that this algorithm can also be implemented as follows:

Greedy(I)
  L ← Sort(I)    // L initially contains the elements of I in increasing order
  st ← 0
  stmax ← 0
  while Length(L) > 0 do
    i ← 1
    while st + L[i] < 0 do
      i ← i + 1
    endwhile
    st ← st + L[i]
    stmax ← max{st, stmax}
    remove L[i] from L
  endwhile
  return stmax

From the inspection of the pseudocode it is clear that, under the assumptions made above, the execution time of the whole algorithm is Θ(k²).
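A runnable transcription of this pseudocode (a Python sketch; function and variable names are ours, and indexing is 0-based rather than 1-based) may look as follows:

def greedy(instance):
    L = sorted(instance)            # elements of I in increasing order
    st, stmax = 0, 0
    while L:
        i = 0                       # scan for the minimum feasible element
        while st + L[i] < 0:
            i += 1
        st += L[i]
        stmax = max(st, stmax)
        del L[i]
    return stmax

Since the instance sums to zero, the remaining elements always contain one that keeps st non-negative, so the inner scan never runs past the end of L.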
4.2 Some Θ(k log k) Algorithms
Let us consider the following algorithm:

Min(I)
  Ln ← negative values of I
  Lp ← I \ Ln
  sort Lp and Ln in increasing order
  st ← 0
  stmax ← 0
  while Length(Lp) > 0 do
    st ← st + min(Lp)
    stmax ← max{st, stmax}
    remove min(Lp) from Lp
    while Length(Ln) > 0 and st + max(Ln) ≥ 0 do
      st ← st + max(Ln)
      remove max(Ln) from Ln
    endwhile
  endwhile
  return stmax

As we can see, at each iteration of the outer while loop the minimum of the remaining positive elements is chosen. For each considered positive element,
the inner while loop takes as many negative elements as possible, choosing the maximum of them (that is, the one with minimum absolute value) at each iteration. After an initial sorting, each element is considered only once during the execution of the two while loops; hence, the total execution time of the algorithm is Θ(k log k). We can also consider a dual algorithm, which we have called Max, where at each iteration of the outer loop the maximum of the remaining positive elements is chosen, whereas at each iteration of the inner loop the minimum of the remaining negative values is chosen. Another variation is the MaxMinMax algorithm, where at each iteration of the outer while loop the maximum of the remaining positive values is chosen, as in Max. This time, however, there are two subsequent inner while loops: in the former we remove as many of the minimum negative elements as possible, that is, those with the highest absolute value, while in the latter we remove as many of the maximum negative elements as possible. Also in this case there exists a dual algorithm, called MinMaxMin, where at each iteration of the outer loop we remove the minimum of the remaining positive elements, and in the two inner loops we remove first the maximum and then the minimum of the remaining negative elements. A further variation is given by the algorithms MinMaxMinMax and MaxMinMaxMin. In the outer loop of these algorithms the maximum or the minimum of the remaining positive elements is alternately chosen; in particular, in the former algorithm the first element chosen from the instance is the minimum of the positive elements, whereas in the latter algorithm the maximum element of the instance is chosen first. The two inner loops are just like those of MaxMinMax and MinMaxMin; in particular, if the minimum of the positive values has been chosen in the outer loop then we first remove the maximum negative elements and then the minimum ones, whereas we do the opposite if the maximum of the positive elements was chosen. It is immediately seen that none of the variations just described affects the asymptotic execution time, which remains Θ(k log k).
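As an illustration, the Min algorithm can be implemented with two scan indices so that, after sorting, each element is indeed inspected only once (a Python sketch; the names are of our choosing):

def min_heuristic(instance):
    Ln = sorted(e for e in instance if e < 0)    # negatives, increasing
    Lp = sorted(e for e in instance if e >= 0)   # positives, increasing
    st, stmax = 0, 0
    p, n = 0, len(Ln) - 1
    while p < len(Lp):
        st += Lp[p]; p += 1                      # minimum remaining positive
        stmax = max(st, stmax)
        while n >= 0 and st + Ln[n] >= 0:        # max(Ln): least absolute value
            st += Ln[n]; n -= 1
    return stmax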
4.3 The Best Fit Algorithm
Another approach to solving Min Storage is given by the following algorithm:

Best Fit(I)
  Ln ← negative values of I
  Lp ← I \ Ln
  sort Lp and Ln in increasing order
  est ← max{max(Lp), −min(Ln)}    // est ← max_{1≤i≤k} |e_i|
  st ← 0
  while Length(Lp) > 0 do
    if st + min(Lp) > est then
      est ← st + min(Lp)
      st ← st + min(Lp)
      remove min(Lp) from Lp
    else
      for i ← Length(Lp) downto 1 do
        if st + Lp[i] ≤ est then
          st ← st + Lp[i]
          remove Lp[i] from Lp
        endif
      endfor
    endif
    for i ← 1 to Length(Ln) do
      if st + Ln[i] ≥ 0 then
        st ← st + Ln[i]
        remove Ln[i] from Ln
      endif
    endfor
  endwhile
  return est

The Best Fit algorithm assumes as a first estimate for the capacity of the gate the value $\max_{1 \le i \le k} |e_i|$. This initial estimate follows from the observation that the capacity cannot be less than this value, since the corresponding element has to be added to or removed from the internal storage of the gate in one step. During the execution of the algorithm the estimate for the capacity is adjusted, that is, increased; of course, each time the value of the estimate is increased by the smallest possible amount. Let est be the estimate for the capacity of the gate. At each iteration of the outer while loop we first add to the internal storage some positive values from the instance, and then we add some negative values. Positive values of the instance are scanned from the maximum down to the minimum; each of them is added to the internal storage (and removed from the instance), unless the resulting value exceeds est. Analogously, negative values are scanned from the minimum to the maximum; each of them is added to the internal storage (and removed from the instance), unless the resulting value becomes negative. If at some point no positive value can be added — that is, if st + min(Lp) > est, where st is the energy currently stored in the gate — then we adjust the value of est by putting est = st + min(Lp). Now we can add min(Lp), the minimum of the remaining positive elements, to the internal storage and then try to add some negative elements. The result returned by the algorithm is the value of est at the end of the execution, that is, after all the elements of the instance have been considered. A direct inspection of the pseudocode allows us to see that the execution time of Best Fit is Θ(k²).
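A runnable version of Best Fit (a Python sketch of ours; we rebuild the lists instead of flagging removed elements, which preserves the quadratic running time):

def best_fit(instance):
    Ln = sorted(e for e in instance if e < 0)    # increasing order
    Lp = sorted(e for e in instance if e >= 0)
    est = max(abs(e) for e in instance)          # initial capacity estimate
    st = 0
    while Lp:
        if st + Lp[0] > est:
            est = st + Lp[0]                     # smallest possible increase
            st += Lp.pop(0)
        else:
            kept = []
            for e in reversed(Lp):               # largest positives first
                if st + e <= est:
                    st += e
                else:
                    kept.append(e)
            Lp = sorted(kept)
        kept = []
        for e in Ln:                             # most negative elements first
            if st + e >= 0:
                st += e
            else:
                kept.append(e)
        Ln = kept
    return est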
5 Experimental Analysis
All the algorithms presented here have been implemented in the C language and tested over uniformly randomly chosen instances of Min Storage. For each
heuristic the average coefficient of approximation and the corresponding variance have been computed. In all we have executed five computer experiments, which we now briefly describe. For a detailed report we refer the reader to [12]. In the first experiment we have generated 100 instances, each one containing 12 elements. The elements were chosen from the interval $[-10^6, 10^6]$ of integers. The small number and size of the instances have been chosen in order to allow the computation of optimal solutions through the "brute force" algorithm that examines all permutations π in Sk. Of the proposed algorithms, Best Fit has obtained both the best coefficient of approximation, about 1.0202729, and the smallest variance. In particular, this means that Best Fit frequently finds a good solution. In the second experiment we have generated 100000 instances of 100 elements. Each element was taken from the interval $[-10^6, 10^6]$ of integers. Due to the size of the instances, during this experiment (as well as during the next three) we were not able to compute the optimal solutions by means of the "brute force" algorithm; hence, in order to compute the coefficients of approximation we have used the theoretical lower bound $\max_{1 \le i \le k} |e_i|$ in place of the optimal solution, thus obtaining upper bounds on the real coefficients. Indeed, the first experiment was conceived to compare these upper bounds with the real coefficients of approximation, although computed over very small instances. Once again the winner has been Best Fit, having both the smallest average coefficient of approximation and the smallest variance. We have also considered a variant of the Min Storage problem, where we have relaxed the requirement that the amount of energy stored in the gate at the beginning of the computation is zero. This corresponds to a natural extension of the notion of conservative computation, obtained by letting the gate have a positive amount ε of energy stored at the beginning of the computation, and requiring that exactly the same amount ε of energy is stored in the gate at the end of the computation. When this situation occurs, we say that the computation is ε–conservative. Clearly the variant of Min Storage concerning ε–conservative computations (with ε ≥ 0) is also NP–hard. The third and fourth experiments have been conceived to study the behavior of the proposed heuristics on this variant of the problem. Setting up these experiments required some minor and trivial modifications to the algorithms. In the third experiment we generated 100 instances, each one composed of 100 elements taken from the interval $[-10^6, 10^6]$ of integers. For each instance we ran the proposed algorithms, varying the initial energy ε from 0 to $\max_{1 \le i \le k} |e_i|$, with steps of 100. As with the previous experiments, we computed the average coefficients of approximation and the corresponding variances. Surprisingly, Best Fit no longer gives the best results: instead, MinMaxMinMax has both the lowest average coefficient of approximation (equal to 1.1470192, versus 1.1619727 for Best Fit) and the lowest variance (0.0042202 versus 0.0219509). It is our opinion that Best Fit does not perform better than MinMaxMinMax under this setting because the former algorithm starts by considering the elements of the instance from the greatest positive to the smallest positive element, each
time taking the element if there is enough free storage in the gate; negative elements are considered only later. Of course this may not be the optimal strategy, especially when the initial energy stored in the gate is high with respect to the gate capacity. The latter algorithm alternately chooses the minimum and the maximum of the positive elements remaining in the instance, and then it immediately considers negative elements: as a consequence, it has more chances to make the right choices. Some modifications of the Best Fit algorithm, aimed at making it perform better when there is a positive initial amount of energy in the gate, are currently under consideration. The fourth experiment is very similar to the third, the only difference being that the initial energy ε is now varied from 0 to $1.1 \cdot \max_{1 \le i \le k} |e_i|$. Hence, for some instances we assume that the initial energy is greater than the maximum element of the instance, and thus the only reasonable choice is to start by choosing negative elements. The results obtained during this experiment are similar to the ones obtained during the third experiment. Finally, we conjectured that the variant of Min Storage concerning ε–conservative computations could be harder to solve when $\varepsilon = \frac{1}{2} \cdot \max_{1 \le i \le k} |e_i|$, since in this case we have the smallest number of possible choices for the first element to be chosen from the instance. The fifth experiment was performed to test this conjecture. We generated 10000 instances of 100 elements, where each element was taken from the interval $[-10^6, 10^6]$ of integers. However, the results we have obtained are similar to those obtained for the previous experiments, and hence it seems that our conjecture is false.
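The experimental setup is easy to reproduce in outline (a hypothetical Python scaffold; the paper does not specify its instance generator, so drawing k − 1 elements and adding a balancing one is only one simple way to respect the zero-sum constraint):

import random

def random_instance(k, bound=10**6):
    e = [random.randint(-bound, bound) for _ in range(k - 1)]
    e.append(-sum(e))     # balance to zero (this element may exceed the bound)
    return e

def approximation_stats(heuristic, trials, k):
    coeffs = []
    for _ in range(trials):
        inst = random_instance(k)
        lower = max(abs(x) for x in inst)   # lower bound in place of opt(I)
        coeffs.append(heuristic(inst) / lower)
    mean = sum(coeffs) / len(coeffs)
    var = sum((c - mean) ** 2 for c in coeffs) / len(coeffs)
    return mean, var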
6 Conservative Computing and Small Constant Depth Circuits
As stated above, the computational primitive underlying conservative logic is the Fredkin gate. Indeed, in [6] it is understood that conservative computations are performed by circuits composed of such gates. Hence we believe that it is important to give a characterization of the classes of Boolean functions which can be computed by different topologies of circuits composed of Fredkin gates. Here we take a first step in this direction: precisely, we show that all Boolean functions which can be computed by constant–depth polynomial–size Fredkin circuits can also be computed by depth-2 polynomial–size threshold circuits with polynomially bounded integer weights. A threshold gate T is an (n, 1, 2)–gate whose computation is uniquely determined by a real vector w = [w1, w2, . . . , wn] ∈ ℝⁿ, called the weights vector, and a real value w0 called the threshold. If we denote by x1, x2, . . . , xn the input values of T, then the gate computes the following Boolean function:

$$f_T(x_1, x_2, \ldots, x_n) = \mathrm{step}\left(\sum_{i=1}^{n} w_i x_i - w_0\right) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge w_0 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

It is widely known (see for example [13]) that a Boolean function can be expressed in the form (4) if and only if it is linearly separable, and that integer weights (and
threshold) are sufficient for a threshold gate to compute every linearly separable function. Trivial examples of linearly separable functions are and, or and not. Putting together threshold gates we can build threshold circuits, that is, acyclic and connected directed graphs made up of layers of threshold gates. (For a precise and formal definition of circuits see, for example, [21].) Evaluating a threshold circuit in topological order (i.e. layer by layer, starting from the layer directly connected to the input lines) we can define the Boolean function computed by the circuit as the composition of the functions computed by the single threshold gates. As usually done in the literature, we assume that we can feed a threshold circuit not only with the input values x1, x2, . . . , xn, but also with their negated counterparts x̄1, x̄2, . . . , x̄n; without loss of generality, we also assume to have two input wires which supply the constant values 0 and 1, respectively. As is easily demonstrated, neither of these modifications alters the computational power of threshold circuits. In evaluating the resources used by a threshold circuit to compute a Boolean function we consider the size, the depth and the weight of the circuit, respectively defined as the number of threshold gates, the number of layers, and the maximum of the absolute values of the weights over all the threshold gates of the circuit. We will also consider the fan–in of gates and circuits. The fan–in of a gate is the number of its input lines, while the fan–in of a circuit is defined as the maximum fan–in of the gates contained in the circuit. From now on we will refer to (n, m, 2)–functions with unspecified values of n and m as multi–output Boolean functions. Of course a Boolean function can be considered as a special case of a multi–output Boolean function. When we state a property for a multi–output Boolean function, it will be understood that each of the m Boolean functions that compose it has the property. We are particularly interested in computing multi–output Boolean functions using threshold circuits of constant depth, polynomial size, polynomial weight and unbounded fan–in, so we give the following definition.

Definition 2. Using the established notation given in [19], for any integer d ≥ 1 we denote by $\widehat{LT}_d$ the class of Boolean functions computed by depth-d, polynomial–size, polynomial–weight and unbounded fan–in threshold circuits.

Small–depth polynomial–size threshold circuits with polynomial weights have proven to be a very powerful model of computation: in fact, it has been shown that several arithmetic and Boolean operations of practical interest have surprisingly efficient realizations by threshold circuits of depth smaller than 4 [20,18,22,2]. A Boolean function f(x1, x2, . . . , xn) is symmetric if its value is invariant with respect to permutations of the input variables; it is not difficult to see that this property holds if and only if the value of f depends uniquely on the value of the quantity $\sum_{i=1}^{n} x_i$. A proposition in [9] states that every symmetric function is in the class $\widehat{LT}_2$. From this proposition it follows immediately the well known fact that the Parity function

$$\mathrm{Parity}(x_1, x_2, \ldots, x_n) = x_1 \oplus x_2 \oplus \ldots \oplus x_n$$
separates the classes $\widehat{LT}_1$ and $\widehat{LT}_2$, since it is symmetric but not linearly separable. Again in [9], Hajnal, Maass, Pudlák, Szegedy and Turán have given the first proof that the Inner Product mod 2 function (ip)

$$\mathrm{ip}(x_1, \ldots, x_n, y_1, \ldots, y_n) = (x_1 \wedge y_1) \oplus (x_2 \wedge y_2) \oplus \ldots \oplus (x_n \wedge y_n)$$

separates the classes $\widehat{LT}_2$ and $\widehat{LT}_3$. Hence:

$$\widehat{LT}_1 \subset \widehat{LT}_2 \subset \widehat{LT}_3$$

Unfortunately, no one knows whether the subsequent classes in the hierarchy are separated or not. Indeed, many simple questions about the capabilities of the corresponding circuits remain unanswered: for example, the best we can do for depth-3 threshold circuits is to show some strong lower bounds for some restricted types of circuits, such as bounded fan-in circuits [10] and circuits with and gates at the bottom [15]. To the best knowledge of the authors, no superpolynomial lower bound is known for general threshold circuits of depth greater than or equal to 3; indeed, in principle it is even possible (although not very plausible) that depth-3 threshold circuits can compute every Boolean function in NP. In spite of this, it is generally believed that constant–depth threshold circuits are not powerful enough to compute all functions in the class NC¹, that is, those computable by logarithmic–depth and/or/not circuits with fan–in equal to 2. We can now introduce Fredkin circuits. A Fredkin gate is a (3, 3, 2)–gate whose input/output map FG : {0, 1}³ → {0, 1}³ associates any input triple (x1, x2, x3) with its corresponding output triple (y1, y2, y3) as follows:

$$y_1 = x_1 \qquad y_2 = (\neg x_1 \wedge x_2) \vee (x_1 \wedge x_3) \qquad y_3 = (x_1 \wedge x_2) \vee (\neg x_1 \wedge x_3) \qquad (5)$$
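The map (5) is easy to check mechanically (a Python sketch of ours):

def fredkin(x1, x2, x3):
    # x1 is the control bit; it decides whether x2 and x3 are exchanged
    y2 = ((1 - x1) & x2) | (x1 & x3)
    y3 = (x1 & x2) | ((1 - x1) & x3)
    return (x1, y2, y3)

assert fredkin(1, 0, 1) == (1, 1, 0)    # control 1: inputs exchanged
assert fredkin(0, 0, 1) == (0, 0, 1)    # control 0: identity
for v in range(8):
    bits = ((v >> 2) & 1, (v >> 1) & 1, v & 1)
    assert sum(fredkin(*bits)) == sum(bits)   # the number of 1s is conserved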
The Fredkin gate is functionally complete for Boolean logic: in fact, by fixing x3 = 0 we get y3 = x1 ∧ x2, whereas by fixing x2 = 1 and x3 = 0 we get y2 = ¬x1. A useful point of view is that the Fredkin gate behaves as a conditional switch: that is, FG(1, x2, x3) = (1, x3, x2) and FG(0, x2, x3) = (0, x2, x3) for every x2, x3 ∈ {0, 1}. In other words, x1 can be considered as a control input whose value determines whether the input values x2 and x3 have to be exchanged or not. Here we just mention the fact that every permutation can be written in a unique way (up to the order of factors) as a composition of exchanges, that is, cycles of length two. This means not only that the Fredkin gate can be used to build an appropriate circuit to perform any given conservative computation (and thus it is universal also in this sense with respect to conservative computations), but also that it is the most elementary conceivable operation that can be used to describe conservative computations. A Fredkin circuit is a circuit composed of Fredkin gates. As with threshold circuits, the size and the depth of a Fredkin circuit are defined as the number of gates and the number of layers in the circuit, respectively; the notion of weight
has no meaning for Fredkin circuits. Also in this case we assume to be able to feed a gate both with variables and with their negated counterparts, and to have two input wires which supply the constant values 0 and 1.

Definition 3. For any integer d ≥ 1, let us denote by $FC_d$ the class of multi–output Boolean functions which can be computed by depth-d polynomial–size Fredkin circuits.

As a first result, we show that the function FG cannot be computed by a depth-1 threshold circuit.

Proposition 1. FG ∉ $\widehat{LT}_1$.

Proof. Let FG2 : {0, 1}³ → {0, 1} be the Boolean function computed by a Fredkin gate on its second output line, y2. We show that assuming the existence of a threshold gate T that computes FG2 leads to a contradiction. Let w1, w2, w3 be the weights associated with input lines x1, x2 and x3 of T, respectively, and let τ be the threshold value. T computes FG2 if and only if:

$$FG_2(x_1, x_2, x_3) = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 + w_3 x_3 \ge \tau \\ 0 & \text{otherwise} \end{cases}$$

By definition of FG it must hold that FG2(0, 0, 1) = 0 and FG2(0, 1, 0) = 1, that is, w3 < τ and w2 ≥ τ, from which w3 < w2. Similarly, it must hold that FG2(1, 0, 1) = 1 and FG2(1, 1, 0) = 0, that is, w1 + w3 ≥ τ and w1 + w2 < τ. From these inequalities it immediately follows that w2 < w3, which contradicts what we have previously obtained.
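The contradiction can also be observed experimentally: the following sketch (ours) searches a bounded grid of integer weights and thresholds and confirms that none of them computes FG2; of course this finite search illustrates, but does not replace, the proof.

from itertools import product

def fg2(x1, x2, x3):                        # second output line of the Fredkin gate
    return ((1 - x1) & x2) | (x1 & x3)

def computes_fg2(w1, w2, w3, tau):
    return all((w1*x1 + w2*x2 + w3*x3 >= tau) == bool(fg2(x1, x2, x3))
               for x1, x2, x3 in product((0, 1), repeat=3))

R = range(-5, 6)
assert not any(computes_fg2(*w) for w in product(R, repeat=4))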
On the other hand, by (5) it is immediately seen that FG ∈ $\widehat{LT}_2$, since every and, or and not gate can be realized by an appropriate threshold gate. By substituting every Fredkin gate in a Fredkin circuit with an equivalent depth-2 threshold circuit, and recalling that the classes $\widehat{LT}_d$ are closed under Boolean negation, we obtain $FC_d \subseteq \widehat{LT}_{2d}$ for every integer d ≥ 1. However, we are indeed able to show a stronger result: that is, $FC_d \subseteq \widehat{LT}_2$ for every integer d ≥ 1. Let us first give the following two definitions.

Definition 4. A set G of Boolean functions is said to be closed under constant–degree functions if, for every fixed integer c ≥ 1, every choice of functions g1, g2, . . . , gc from G and every c–ary Boolean function h, the Boolean function f(x) = h(g1(x), g2(x), . . . , gc(x)) is in G.

Definition 5. Let G be a set of n–ary Boolean functions. A function f : {0, 1}ⁿ → ℝ is G–approximable if for any k > 0 there exists a polynomial number of functions g1, g2, . . . , gp(n) ∈ G such that:

$$f(x) = \sum_{j=1}^{p(n)} a_j g_j(x) \pm O(n^{-k}) \qquad \forall\, x \in \{0,1\}^n$$

where a1, a2, . . . , ap(n) are real numbers such that $\max_{1 \le j \le p(n)} |a_j|$ is polynomially bounded.

For our purposes we obtain the most interesting case of G–approximability when we choose G = $\widehat{LT}_d$; in this case, instead of $\widehat{LT}_d$–approximability we will simply call it d–approximability. Since threshold circuits can only compute (multi–output) Boolean functions, let us consider the following important subclass of d–approximable functions.

Definition 6. For any integer d ≥ 1, let us denote by $APP_d$ the class of all Boolean d–approximable functions.

In [18] it is easily proved that the classes $APP_d$ and $\widehat{LT}_d$ form the following intertwined hierarchy:

$$\widehat{LT}_d \subseteq APP_d \subseteq \widehat{LT}_{d+1} \qquad \forall\, d \ge 1$$
Moreover, the following theorem is also proved.

Theorem 2 ([18], page 52). For every integer d ≥ 1, the class $APP_d$ is closed under constant–degree functions.

We are now ready to state our result. Let f ∈ $FC_d$, for some fixed integer d ≥ 1. By definition, there exists a depth-d polynomial–size Fredkin circuit F that computes f. First of all we build an and/or/not circuit that computes f, by replacing each Fredkin gate in F with an equivalent and/or/not circuit, as indicated in (5). Then, using de Morgan's laws we move all negations to the first layer of the circuit, and we eliminate them by picking input values or their negations. As a result we obtain a depth-2d and/or circuit, which we denote by C. Since every gate of the first layer of C can be replaced by an equivalent threshold gate, such a layer computes a multi–output Boolean function whose components are in $\widehat{LT}_1$, and hence in $APP_1$. The fact that $APP_1$ is closed under constant–degree functions implies that the multi–output Boolean function computed by the first two layers of C is in $APP_1$ too, and so on. This proves that the multi–output Boolean function computed by the first 2d layers of C, that is f, is in $APP_1$. Since $APP_1 \subseteq \widehat{LT}_2$, f can be computed by a depth-2 polynomial–size polynomial–weight threshold circuit. We have thus proved the following proposition.

Proposition 2. For every integer d ≥ 1, $FC_d \subseteq \widehat{LT}_2$.

Actually this proposition can also be proved in a more direct way, without involving the class $APP_1$. Let us consider the depth-2d and/or circuit C defined above. Note that such a circuit has the maximum number of gates when each output is computed by a subcircuit which is a complete binary tree. This means that the value of each output line of C depends upon at most $2^{2d}$ variables, which is a constant number. Such variables may assume $2^{2^{2d}}$ different configurations, which is also a constant number. Hence if we write f in Disjunctive Normal Form (DNF), as a disjunction of all configurations for which f = 1, we obtain a
depth-2 constant–size and/or circuit. By replacing each gate of this circuit by an equivalent threshold gate we conclude that f ∈ $\widehat{LT}_2$. Notice that if we allow the use of only the input variables x1, x2, . . . , xn instead of both the input variables and their negations, then by writing f in DNF we obtain a depth-3 and/or/not circuit whose negations are all located in the first layer. However, it is not difficult to see that also in this case we can replace each and and each or gate by an appropriate threshold gate, and embed all negations in the first layer of threshold gates, thus obtaining a depth-2 constant–size threshold circuit. A different approach to the study of the computational capability of Fredkin circuits is obtained by considering the sets of permutations they generate. If we do not allow the duplication of a signal by splitting the corresponding wire (equivalently, if we impose that each input and output value of the gates can be used just once) then it is easily seen that in a Fredkin circuit the numbers of input and output lines are the same, and that the circuit is conservative with respect to the notion given in [6]. In particular, the circuit applies a permutation to every possible input pattern. The applied permutation depends upon both the topology of the circuit and the input pattern. In the literature, this kind of permutation is usually called a data dependent permutation. Hence if we fix a Fredkin circuit we can associate a set of permutations to the set of all possible input patterns. Here we advocate the use of conditional permutations to study the set of permutations realized by Fredkin circuits. Generally, a conditional permutation can be written as:

$$f(x_1, x_2, \ldots, x_n) \cdot \pi(x_1, x_2, \ldots, x_n) \qquad (6)$$

where f : {0, 1}ⁿ → {0, 1} is a Boolean function, and π ∈ Sn is a permutation. If f = 1 then (6) behaves as π, otherwise it behaves as the identity. As an example, the Fredkin gate (5) can be expressed as x1 · (x2 x3). Clearly every Fredkin circuit can be expressed as a composition of conditional exchanges. Hence the set of permutations realized by a Fredkin circuit is obtained by considering all possible truth assignments to the Boolean formulas which express the conditions in conditional exchanges. Future work will be devoted to studying the properties of the sets of permutations generated by Fredkin circuits, as well as the relationships between such properties and the complexity of the corresponding Boolean functions computed by the same circuits.
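These notions are straightforward to experiment with (a Python sketch; the representation of circuits as lists of conditional exchanges is our choice): each gate is a triple (control, i, j) realizing the conditional exchange x_control · (x_i x_j), and the data dependent permutations are collected over all input patterns.

from itertools import product

def realized_permutations(n, gates):
    # gates: a Fredkin circuit as a list of (control, i, j) triples,
    # applied in order; returns the set of wire permutations realized,
    # each encoded as the tuple of source positions of the outputs.
    perms = set()
    for bits in product((0, 1), repeat=n):
        state, perm = list(bits), list(range(n))
        for c, i, j in gates:
            if state[c] == 1:               # the condition holds: exchange
                state[i], state[j] = state[j], state[i]
                perm[i], perm[j] = perm[j], perm[i]
        perms.add(tuple(perm))
    return perms

# a single Fredkin gate yields the identity and the exchange (x2 x3)
print(realized_permutations(3, [(0, 1, 2)]))   # {(0, 1, 2), (0, 2, 1)}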
7 Conclusions and Directions for Future Work
In this paper we have proposed the first steps towards a theory of conservative computations, where the amount of energy associated with the data manipulated during the computations is preserved. We have shown that conservative computations induce ConsComp, a new NP–complete decision problem, and Min Storage, its naturally associated optimization problem. Since it is commonly believed that no polynomial time algorithm exists that always gives the optimal solution to any NP–hard optimization
problem, we have presented some polynomial time heuristics that give approximate solutions to the instances of Min Storage. A study of their behavior, performed through some computer experiments, suggests that Min Storage seems to be easy to solve on uniformly randomly chosen instances. In particular one of the proposed heuristics, namely Best Fit, seems to perform very well when the initial energy stored in the gate is zero. Interestingly, the same heuristic is no longer the best when the initial energy is positive. Hence a first open problem is to try to modify Best Fit so that it performs better with a positive initial energy. A more important open problem is to devise an approximation algorithm for Min Storage. Of course perfect conservation of energy is possible only in theory. Hence, a further possibility for future work could be the relaxation of the conservativeness constraint (2), by allowing the amount of energy dissipated during a computation step to be no greater than a fixed value. Analogously, we can suppose that if we try to store an amount of energy that exceeds the capacity of the gate then the energy which cannot be stored is dissipated. In such a case it should be interesting to study trade-offs between the amount of energy dissipated and the hardness of the correspondingly modified ConsComp and Min Storage problems. In this paper we have also started to study the complexity of Boolean functions computed by Fredkin circuits. However, only constant–depth circuits have been considered here, and only very basic results have been presented. We believe that logarithmic–depth Fredkin circuits should be far more interesting from this point of view. Notice that the proof of Proposition 2 cannot be replicated for this kind of circuit, since otherwise we could infer NC¹ ⊆ $\widehat{LT}_2$, contrary to the fact that the Inner Product mod 2 function is in NC¹ \ $\widehat{LT}_2$. Besides the study of permutations generated by Fredkin circuits, it should be interesting to study the computational complexity of many–valued extensions of Fredkin circuits, such as circuits composed of the gates defined in [5]. Finally, it remains to study how to theoretically model circuits whose gates are equipped with an internal storage unit. It is our opinion that it seems appropriate to consider this kind of gate as a finite state automaton, by viewing the energy levels of the storage unit as its states.
References

1. G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti–Spaccamela, M. Protasi. Complexity and Approximation. Combinatorial Optimization Problems and Their Approximability Properties. Springer–Verlag, 1999.
2. R. Beigel, J. Tarui. On ACC. In Proceedings of the 32nd IEEE Symposium on Foundations of Computer Science, 1991, pp. 783–792.
3. C. H. Bennett. Logical reversibility of computation. IBM Journal of Research and Development, 17, November 1973, pp. 525–532.
4. G. Cattaneo, G. Della Vedova, A. Leporati, R. Leporini. Towards a Theory of Conservative Computing. Submitted for publication, 2002. e-print available at: http://arxiv.org/quant-ph/abs/0211085
5. G. Cattaneo, A. Leporati, R. Leporini. Fredkin Gates for Finite–valued Reversible and Conservative Logics. Journal of Physics A: Mathematical and General, 35, 2002, pp. 9755–9785.
6. E. Fredkin, T. Toffoli. Conservative Logic. International Journal of Theoretical Physics, 21, Nos. 3/4, 1982, pp. 219–253.
7. M. R. Garey, D. S. Johnson. Computers and Intractability. A Guide to the Theory of NP–Completeness. W. H. Freeman and Company, 1979.
8. M. Goldmann, J. Håstad, A. Razborov. Majority Gates vs. General Weighted Threshold Gates. Computational Complexity, Vol. 2, No. 4, 1992, pp. 277–300.
9. A. Hajnal, W. Maass, P. Pudlák, M. Szegedy, G. Turán. Threshold Circuits of Bounded Depth. Journal of Computer and System Sciences, Vol. 46, No. 2, 1993, pp. 129–154.
10. J. Håstad, M. Goldmann. On the power of small–depth threshold circuits. Computational Complexity, Vol. 1, No. 2, 1991, pp. 113–129.
11. R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5, 1961, pp. 183–191.
12. A. Leporati, G. Della Vedova, G. Mauri. An Experimental Study of some Heuristics for Min Storage. Submitted for publication, 2003.
13. S. Muroga. Threshold Logic and its Applications. Wiley–Interscience, 1971.
14. C. A. Petri. Grundsätzliches zur Beschreibung diskreter Prozesse. In Proceedings of the 3rd Colloquium über Automatentheorie (Hannover, 1965), Birkhäuser Verlag, Basel, 1967, pp. 121–140. English translation: Fundamentals of the Representation of Discrete Processes, ISF Report 82.04, 1982.
15. A. Razborov, A. Wigderson. $n^{\Omega(\log n)}$ lower bounds on the size of depth–3 threshold circuits with AND gates at the bottom. Information Processing Letters, Vol. 45, No. 6, 1993, pp. 303–307.
16. N. Rescher. Many–valued logics. McGraw–Hill, 1969.
17. J. B. Rosser, A. R. Turquette. Many–valued logics. North Holland, 1952.
18. V. P. Roychowdhury, K. Y. Siu, A. Orlitsky. Theoretical Advances in Neural Computation and Learning. Kluwer Academic, 1994.
19. K. Y. Siu, J. Bruck. On the Power of Threshold Circuits with Small Weights. SIAM Journal on Discrete Mathematics, Vol. 4, No. 3, 1991, pp. 423–435.
20. K. Y. Siu, V. P. Roychowdhury. On Optimal Depth Threshold Circuits for Multiplication and Related Problems. SIAM Journal on Discrete Mathematics, Vol. 7, No. 2, 1994, pp. 284–292.
21. H. Vollmer. Introduction to Circuit Complexity: A Uniform Approach. Springer–Verlag, 1999.
22. A. C. Yao. On ACC and Threshold Circuits. In Proceedings of the 31st IEEE Symposium on Foundations of Computer Science, 1990, pp. 619–628.
Constructing Infinite Graphs with a Decidable MSO-Theory

Wolfgang Thomas

RWTH Aachen, Lehrstuhl für Informatik VII, D-52056 Aachen, Germany
[email protected]

Abstract. This introductory paper reports on recent progress in the search for classes of infinite graphs where interesting model-checking problems are decidable. We consider properties expressible in monadic second-order logic (MSO-logic), a formalism which encompasses standard temporal logics and the modal µ-calculus. We discuss a class of infinite graphs proposed by D. Caucal (in MFCS 2002) which can be generated from the infinite binary tree by applying the two processes of MSO-interpretation and of unfolding. The main purpose of the paper is to give a feeling for the rich landscape of infinite structures in this class and to point to some questions which deserve further study.
1 Introduction
A fundamental decidability result which appears in hundreds of applications in theoretical computer science is Rabin's Tree Theorem [23]. The theorem says that the monadic second-order theory (MSO-theory) of the infinite binary tree is decidable. The system of monadic second-order logic arises from first-order logic by adjunction of variables for sets (of tree nodes) and quantifiers ranging over sets. In this language one can express many interesting properties, among them reachability conditions (existence of finite paths between elements) and recurrence conditions (existence of infinite paths with infinitely many points of a given property). Already in Rabin's paper [23] the main theorem is used to infer a great number of further decidability results. The technique for the transfer of decidability is the method of interpretation: It is based on the idea of describing a structure A, using MSO-formulas, within the structure T2 of the binary tree. The decidability of the MSO-theory of A can then be deduced from the fact that the MSO-theory of T2 is decidable. Rabin considered mainly structures of interest to mathematical logic. For example, he showed that the monadic second-order theory of the rational number ordering (Q, <) is decidable.

2 MSO-Interpretations

The interpretation technique also covers the trees Tm of branching degree m > 2. As typical example consider T3 = ({0, 1, 2}*, S₀³, S₁³, S₂³). We obtain a copy of T3 in T2 by considering only the T2-vertices in the set T = (10 + 110 + 1110)*. A word in this set has the form 1^{i1} 0 . . . 1^{im} 0 with i1, . . . , im ∈ {1, 2, 3}; and we take it as a representation of the element (i1 − 1) . . . (im − 1) of T3. The following MSO-formula ϕ(x) (written in abbreviated suggestive form) defines the set T in T2:

∀Y [Y(x) ∧ ∀y((Y(y10) ∨ Y(y110) ∨ Y(y1110)) → Y(y)) → Y(ε)]

It says that x is in the closure of ε under 10-, 110-, and 1110-successors. The relation {(w, w10) | w ∈ {0, 1}*} is defined by the following formula:

ψ0(x, y) := ∃z(S1(x, z) ∧ S0(z, y))

With the analogous formulas ψ1, ψ2 for the other successor relations, we see that the structure with universe ϕ^{T2} and the relations ψi^{T2} restricted to ϕ^{T2} is isomorphic to T3. In general, an MSO-interpretation of a structure A in a structure B is given by a "domain formula" ϕ(x) and, for each relation R^A of A, say of arity m, an MSO-formula ψ(x1, . . . , xm) such that A with the relations R^A is isomorphic to the structure with universe ϕ^B and the relations ψ^B restricted to ϕ^B. Then for an MSO-sentence χ (in the signature of A) one can construct a sentence χ′ (in the signature of B) such that A |= χ iff B |= χ′. In order to obtain χ′ from χ, one has to replace every atomic formula R(x1, . . . , xm) by the corresponding formula ψ(x1, . . . , xm) and to relativize all quantifications to ϕ(x) (for details see e.g. [13]). As a consequence, we note the following:
Proposition 1. If A is MSO-interpretable in B and the MSO-theory of B is decidable, then so is the MSO-theory of A.

In the literature a more general type of interpretation is also used, called MSO-transduction (see [8]), where the structure A is represented in a k-fold copy of B rather than in B itself. For the results treated below it suffices to use the simple case mentioned above.
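The encoding of T3 into T2 sketched above is simple enough to make executable (a Python sketch; the helper names are ours):

def encode(word):
    # map a T3-node, a word over {0,1,2}, to its representative in
    # T = (10 + 110 + 1110)*: digit d becomes 1^(d+1) followed by 0
    return "".join("1" * (int(d) + 1) + "0" for d in word)

def decode(rep):
    # inverse map, defined exactly on the words of the regular set T
    digits, run = [], 0
    for b in rep:
        if b == "1":
            run += 1
        else:
            assert 1 <= run <= 3, "not in (10 + 110 + 1110)*"
            digits.append(str(run - 1))
            run = 0
    assert run == 0
    return "".join(digits)

assert encode("201") == "111010110"
assert decode(encode("201")) == "201"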
2.2 Pushdown Graphs and Prefix Recognizable Graphs
A graph G = (V, (Ea)a∈A) is called a pushdown graph (over the label alphabet A) if it is the transition graph of the reachable global states of an ε-free pushdown automaton. Here a pushdown automaton is of the form P = (Q, A, Γ, q0, Z0, ∆), where Q is the finite set of control states, A the input alphabet, Γ the stack alphabet, q0 the initial control state, Z0 ∈ Γ the initial stack symbol, and ∆ ⊆ Q × A × Γ × Γ* × Q the transition relation. A global state (configuration) of the automaton is given by a control state and a stack content, i.e., by a word from QΓ*. The graph G = (V, (Ea)a∈A) is now specified as follows:

– V is the set of configurations from QΓ* which are reachable (via finitely many applications of transitions of ∆) from the initial global state q0Z0.
– Ea is the set of all pairs (pγw, qvw) from V² for which there is a transition (p, a, γ, v, q) in ∆.

A more general class of graphs, which includes the case of vertices of infinite degree, has been introduced by Caucal [4]. These graphs are introduced in terms of prefix-rewriting systems in which "control states" (as they occur in pushdown automata) are no longer used and where a word on the top of the stack (rather than a single letter) may be rewritten. Thus, a rewriting step can be specified by a triple (u1, a, u2), describing a transition from a word u1w via letter a to the word u2w. The feature of infinite degree is introduced by allowing generalized rewriting rules of the form U1 →a U2 with regular sets U1, U2 of words. Such a rule leads to the (in general infinite) set of rewrite triples (u1, a, u2) with u1 ∈ U1 and u2 ∈ U2. A graph G = (V, (Ea)a∈A) is called prefix-recognizable if for some finite system S of such generalized prefix rewriting rules U1 →a U2 over an alphabet Γ, we have

– V ⊆ Γ* is a regular set,
– Ea consists of the pairs (u1w, u2w) where u1 ∈ U1, u2 ∈ U2 for some rule U1 →a U2 from S, and w ∈ Γ*.

Theorem 2 (Muller-Schupp [22], Caucal [4]). The MSO-theory of a pushdown graph is decidable; so is the MSO-theory of a prefix-recognizable graph.

First we present the proof for pushdown graphs. Let G = (V, (Ea)a∈A) be generated by the pushdown automaton P = (Q, A, Γ, q0, Z0, ∆). Each configuration is a word over the alphabet Q ∪ Γ. Taking m = |Q| + |Γ| we can represent
a configuration by a node of the tree Tm. For technical convenience we write the configurations in reverse order, i.e. as words in Γ*Q. We give an MSO-interpretation of G in Tm. The formula ψa(x, y) which defines Ea in Tm has to say the following: "there is a stack content w such that x = (pγw)^R and y = (qvw)^R for a rule (p, a, γ, v, q) of ∆." This is easily formalized (even with a first-order formula), using the successor relations in Tm to capture the prolongation of w by γ, p, q and by the letters of v. Now it is easy to write down also the desired domain formula ϕ(x) which defines the configurations reachable from q0Z0. We refer to (q0Z0)^R as a definable element of the tree Tm and to the union E of the relations Ea, defined by $\bigvee_{a \in A} \psi_a(x, y)$. The formula ϕ(x) says that "each set X which contains (q0Z0)^R and is closed under taking E-successors also contains x." For prefix-recognizable graphs, a slight generalization of the previous proof is needed. Let G be a prefix-recognizable graph with a regular set V ⊆ Γ* of vertices. We describe an MSO-interpretation of G in the tree Tm where m is the size of Γ. We start with a formula ψ(x, y) which defines the edge relation induced by a single rule U1 →a U2 with regular U1, U2. The formula expresses for x, y that there is a word (= tree node) w such that x = u1w, y = u2w with u1 ∈ U1, u2 ∈ U2. If A1, A2 are finite automata recognizing U1, U2 respectively, this can be phrased as follows: "there is a node w such that A1 accepts the path segment from x to w and A2 the path segment from y to w." Acceptance of a path segment is expressed by requiring a corresponding automaton run. Its existence can be coded by a tuple of subsets over the considered path segment (for an automaton with 2^k states a k-tuple of sets suffices). The disjunction of such formulas taken for all a-rules gives the desired formula defining the edge relation Ea. The domain formula ϕ(x) is provided in the same way, now referring to the path segment from node x back to the root. Using the interpretation of Tm in T2, the decidability claims follow from Rabin's Tree Theorem. It is interesting to note that the prefix-recognizable graphs in fact coincide with the graphs which are MSO-interpretable in T2 ([2]).
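The definition of a pushdown graph translates directly into a breadth-first enumeration of reachable configurations (a Python sketch of ours; since pushdown graphs are in general infinite, the exploration is cut off after a fixed number of vertices):

from collections import deque

def pushdown_graph(q0, Z0, delta, limit=1000):
    # delta: iterable of transitions (p, a, gamma, v, q); a configuration
    # is a pair (control state, stack word) with the stack top on the left.
    start = (q0, Z0)
    vertices, edges = {start}, {}
    queue = deque([start])
    while queue and len(vertices) < limit:
        p, stack = queue.popleft()
        if not stack:
            continue
        gamma, w = stack[0], stack[1:]
        for (p1, a, g, v, q) in delta:
            if p1 == p and g == gamma:
                succ = (q, v + w)
                edges.setdefault(a, set()).add(((p, stack), succ))
                if succ not in vertices:
                    vertices.add(succ)
                    queue.append(succ)
    return vertices, edges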
3 Unfoldings
Let G = (V, (Ei)i∈I, (Pj)j∈J) be a graph and v0 a designated vertex of V. The unfolding of G from v0 is a structure of the form t(G, v0) = (V′, (E′i)i∈I, (P′j)j∈J). Its domain V′ is the set of all paths from v0; here a path from v0 is a sequence v0 i1 v1 . . . ik vk where for h ≤ k we have (v_{h−1}, v_h) ∈ E_{i_h}. A pair (p, q) of paths is in E′i iff q is an extension of p by an edge from Ei, and we have p ∈ P′j iff the last element of p is in Pj.
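A finite approximation of the unfolding is obtained by enumerating the paths from v0 up to a fixed length (a Python sketch; the unfolding itself is infinite as soon as a cycle is reachable from v0):

def unfold(edges, v0, depth):
    # edges: dict mapping each label i to a set of pairs (u, v);
    # a node of t(G, v0) is a path v0 i1 v1 ... ik vk, here a tuple
    # alternating vertices and edge labels
    nodes, new_edges = {(v0,)}, {i: set() for i in edges}
    frontier = [(v0,)]
    for _ in range(depth):
        nxt = []
        for p in frontier:
            for i, es in edges.items():
                for (u, v) in es:
                    if u == p[-1]:
                        q = p + (i, v)
                        nodes.add(q)
                        new_edges[i].add((p, q))
                        nxt.append(q)
        frontier = nxt
    return nodes, new_edges

# the singleton graph G0 of the following example, unfolded to depth 3
nodes, _ = unfold({0: {("v", "v")}, 1: {("v", "v")}}, "v", 3)
assert len(nodes) == 15     # 1 + 2 + 4 + 8 nodes of the binary tree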
As an example consider the singleton graph G0 with vertex v0 and two edge relations E0, E1, both of which contain the edge (v0, v0). The unfolding of G0 is (isomorphic to) the binary tree T2. This example illustrates the power of the unfolding operation: Starting from the trivial singleton graph (which of course has a decidable MSO-theory), we obtain the binary tree T2, where decidability of the MSO-theory is a deep result. The unfolding operation takes sequences of edges (as elements of the unfolded structure). A related construction, called tree iteration, refers to sequences of elements instead. It has the advantage that it covers arbitrary relational structures without extra conventions. To spare notation we define it only over graphs, as considered above. The tree iteration of a graph G = (V, (Ei)i∈I, (Pj)j∈J) is the structure G* = (V*, S, C, (E*i)i∈I, (P*j)j∈J) where S = {(w, wv) | w ∈ V*, v ∈ V} ("successor"), C = {(wv, wvv) | w ∈ V*, v ∈ V} ("clone relation"), E*i = {(wu, wv) | w ∈ V*, (u, v) ∈ Ei}, and P*j = {wv | w ∈ V*, v ∈ Pj}. From the singleton graph mentioned above one obtains by tree iteration a copy of the natural number ordering rather than of the binary tree. However, the structure T2 can be generated by tree iteration from the two element structure ({0, 1}, P0, P1) using the two predicates P0 = {0} and P1 = {1}. The unfolding t(G, v0) can be obtained by a monadic transduction from G*, more precisely by an MSO-interpretation in a twofold copy of G. Both operations preserve the decidability of the MSO-theory. Again we state this only for graphs:

Theorem 3 (Muchnik, Walukiewicz, Courcelle (cf. [24], [26], [11])). If a graph has a decidable MSO-theory, then its unfolding from a definable vertex and its tree iteration also have decidable MSO-theories.

Extending earlier work of Shelah and Stupp, the theorem was shown for tree iterations by A. Muchnik (see [24]). A full proof is given by Walukiewicz in [26]; for a very readable account we recommend [1]. For the unfolding operation see the papers [9,11] by Courcelle and Walukiewicz. As a small application of the theorem we show a result (of which we do not know a reference) on structures (N, Succ, P), the successor structure of the natural numbers with an extra unary predicate P. Consider the binary tree T2 expanded by the predicate P′ = {w ∈ {0, 1}* | |w| ∈ P}, the "level predicate" for P. Now the MSO-theory of (N, Succ, P) is decidable iff the MSO-theory of (N, Succ0, Succ1, P) is decidable, where Succ0 = Succ1 = Succ. The unfolding of the latter structure is the binary tree expanded by the level predicate for P. Hence we obtain:

Proposition 2. If the MSO-theory of (N, Succ, P) is decidable, then so is the MSO-theory of the binary tree expanded by the level predicate for P.
4 Caucal's Hierarchy
In [5], Caucal introduced the following hierarchy (Gn ) of graphs, together with a hierarchy (Tn ) of trees:
– T0 = the class of finite trees
– Gn = the class of graphs which are MSO-interpretable in a tree of Tn
– Tn+1 = the class of unfoldings of graphs in Gn

By the results of the preceding sections (and the fact that a finite structure has a decidable MSO-theory), each structure in the Caucal hierarchy has a decidable MSO-theory. By a hierarchy result of Damm [12] on higher-order recursion schemes, the hierarchy is strictly increasing. In Caucal's paper [5], a different formalism of interpretation (via "inverse rational substitutions") is used instead of MSO-interpretations. We work with the latter to keep the presentation more uniform; the equivalence between the two approaches has been established by Carayol and Wöhrle [10]. Let us take a look at some structures which occur in this hierarchy. It is clear that G0 is the class of finite graphs, while T1 contains the so-called regular trees (alternatively defined as the infinite trees which have only finitely many non-isomorphic subtrees). Figure 1 (upper half) shows a finite graph and its unfolding as a regular tree.
Fig. 1. A graph, its unfolding, and a pushdown graph
By an MSO-interpretation we can obtain the pushdown graph of Figure 1 in the class G1; the domain formula and the formulas defining Ea, Eb, Ec are trivial, while

ψd(x, y) = ψe(x, y) = ∃z∃z′ (Ea(z, z′) ∧ Ec(z, y) ∧ Ec(z′, x))

Let us apply the unfolding operation again, from the only vertex without incoming edges. We obtain the "algebraic tree" of Figure 2, belonging to T2 (where for the moment one should ignore the dashed line). As a next step, let us apply an MSO-interpretation to this tree which will produce a graph (V, E, P) in the class G2 (where E is the edge relation and P a unary predicate). Referring to Figure 2, V is the set of vertices which are located along the dashed line, E contains the pairs which are successive vertices along the dashed line, and P contains the special vertices drawn as non-filled circles. This structure is isomorphic to the structure (N, Succ, P2) with the successor relation Succ and the predicate P2 containing the powers of 2.
Fig. 2. Unfolding of the pushdown graph of Figure 1
To prepare a corresponding MSO-interpretation, we use formulas such as Ed*(x, y) which expresses "all sets which contain x and are closed under taking Ed-successors contain y, and y has no Ed-successor". As domain formula we use ϕ(x) = ∃z(Eb(z, x) ∨ ∃y(Ec(z, y) ∧ Ed*(y, x))). The required edge relation E is defined by ψ(x, y) = ∃z∃z′ (ψ1(x, y) ∨ ψ2(x, y) ∨ ψ3(x, y)) where

– ψ1(x, y) = Ea(z, z′) ∧ Eb(z, x) ∧ Ec(z′, y)
– ψ2(x, y) = Ea(z, z′) ∧ Ece*(z, x) ∧ Ecd*(z′, y)
– ψ3(x, y) = Ede*(z, x) ∧ Eed*(z, y)

Finally we define P by the formula χ(x) = ∃z∃z′(Ec(z, z′) ∧ Ed*(z′, x)). We infer that the MSO-theory of (N, Succ, P2) is decidable, a result first proved by Elgot and Rabin [14] with a different approach. The idea of [14], later applied to many other expansions of the successor structure by unary predicates, is to transform first a given MSO-sentence ϕ to an equivalent Büchi automaton Bϕ, so that (N, Succ, P2) |= ϕ iff Bϕ accepts the characteristic 0-1-sequence αP2 (with αP2(i) = 1 iff i ∈ P2). By contracting the 0-segments between the letters 1, one can modify αP2 to an ultimately periodic sequence β such that Bϕ accepts αP2 iff Bϕ accepts β. Whether Bϕ accepts such a "regular model" β is decidable. Note that this reduction to a regular model depends on the sentence ϕ under consideration. The generation of (N, Succ, P2) as a model in G2 provides a uniform decidability proof. In [7], the contraction method was adapted to cover all morphic predicates P (coded by morphic 0-1-words). Fratani and Sénizergues [15] have shown that the models (N, Succ, P) for morphic P also occur in the Caucal hierarchy. In the
present paper we discuss another structure treated already in [14]: the structure (N, Succ, Fac) where Fac is the set of factorial numbers. We start from a simpler pushdown graph than the one used above and consider its unfolding, which is the comb structure indicated by the thick arrows in the lower part of Figure 3.
Fig. 3. Preparing for the factorial predicate
We number the vertices of the horizontal line by 0, 1, 2, . . . and say that the vertices below them are of "level 0", "level 1", "level 2", etc. Now we use the simple MSO-interpretation which takes all tree nodes as domain and introduces, for n ≥ 0, a new edge from any vertex of level n + 1 to the first vertex of level n. This introduces the thin lines in Figure 3 as new edges (assumed to point backwards). The reader will be able to write down a defining MSO-formula. Note that the top vertex of each level plays a special role since it is the target of an edge labelled b, while the remaining ones are targets of edges labelled c. Consider the tree obtained from this graph by unfolding. It has subtrees consisting of a single branch off level 0, 2 branches off level 1, 2 · 3 branches off level 2, and generally (n + 1)! branches off level n. Referring to the c-labelled edges, these branches are arranged in a natural (and MSO-definable) order. To capture the structure (N, Succ, Fac), we apply an interpretation which (for n ≥ 1) cancels the branches starting at the b-edge target of level n (and leaves only the branches off the targets of c-edges). As a result, (n + 1)! − n! branches off level n remain for n ≥ 1, while there is one branch off level 0. Numbering these remaining branches, the n!-th branch appears as the first branch off level n. Note that we traverse this first branch off a given level by disallowing c-edges after the first c-edge. So a global picture like Figure 2 emerges, now representing the factorial predicate. Summing up, we have generated the structure (N, Succ, Fac) as a graph in G3.
Fig. 4. Graph of flip function
So far we have considered expansions of the successor structure of the natural numbers by unary predicates. We now discuss the expansion by an interesting unary function (here identified with its graph, a binary relation). It is the flip function, introduced in [21] in the study of a hierarchical time structure (involving different time granularities). The function flip maps 0 to 0, and each nonzero n to the number which arises from the binary expansion of n by modifying the least significant 1-bit to 0. An illustration of the graph Flip of this function is given in Figure 4. It is easy to see that the structure (N, Succ, Flip) can be obtained from the algebraic tree of Figure 2 by an MSO-interpretation. A Flip-edge will connect vertex u to the last leaf vertex v which is reachable by a d*-path from an ancestor of u; if such a path does not exist, an edge to the target of the b-edge (representing number 0) is taken. Other parts of arithmetic can also be captured by suitable structures of the Caucal hierarchy. For example, it can be shown that a semilinear relation (a relation definable in Presburger arithmetic) can be represented by a suitable graph. As the simplest example consider the relation x + y = z. It can be represented in a comb structure like Figure 3 where each vertical branch is infinite and for each edge a corresponding back-edge (with dual label) is introduced. In the unfolding of this infinite comb structure, a vertex on column x and row y allows a path of length precisely x + y via the back-edges to the origin. In this way, graphs can be generated which (as acceptors of languages) are equivalent to the Parikh automata of [19].
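Incidentally, the flip function defined above has a well-known constant-time bitwise realization: clearing the least significant 1-bit of n is the operation n AND (n − 1). A quick check against the definition (a Python sketch):

def flip(n):
    return n & (n - 1)      # clears the least significant 1-bit; flip(0) = 0

assert [flip(n) for n in range(9)] == [0, 0, 0, 2, 0, 4, 4, 6, 0]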
5 Outlook
The examples treated above should convince the reader that the Caucal hierarchy supplies a large reservoir of interesting models where the MSO-theory is decidable. Many problems are open in this field. We mention some of them.

1. Studying and extending the range of the Caucal hierarchy: We do not know much about the graphs on levels ≥ 3 of the Caucal hierarchy. Which structures of arithmetic (with domain N and some relations over N) occur there? How can one decide on which level a given structure occurs? Is it possible to obtain a still richer landscape of models by invoking the operation of tree iteration (possibly for structures with relations of arity > 2, as in [3])?

2. Comparison with other approaches to generate infinite graphs: There are representation results which allow one to generate, for n > 0, the graphs of level n from a single tree of level n, respectively as the transition graphs of higher-level
pushdown automata (see [5,6] and the references mentioned there). There are as yet only partial results which settle the relation between the graphs of Caucal's hierarchy and the synchronized rational (or "automatic") graphs, the rational graphs, and the graphs generated by ground term rewriting systems (cf. e.g. [25,20] and the references mentioned there).

3. Complexity of model-checking: The reduction of the MSO-model-checking problem for an unfolded graph to the corresponding problem for the original graph involves a non-elementary blow-up in complexity. When using restricted logics one can avoid this. For example, Cachat [6] has shown that µ-calculus model-checking over graphs of level n is possible in n-fold exponential time.
Acknowledgments

Many thanks are due to Didier Caucal for numerous fruitful discussions and to my collaborators and students Jan Altenbernd, Thierry Cachat, Christof Löding, and Stefan Wöhrle for their help.
References

1. D. Berwanger, A. Blumensath, The monadic theory of tree-like structures, in [16], pp. 285-302.
2. A. Blumensath, Prefix-recognizable graphs and monadic second-order logic, Rep. AIB-06-2001, RWTH Aachen, 2001.
3. A. Blumensath, Axiomatising tree-interpretable structures, in: Proc. 19th STACS, Springer LNCS 2285 (2002), 596-607.
4. D. Caucal, On infinite transition graphs having a decidable monadic theory, in: Proc. 23rd ICALP (F. Meyer auf der Heide, B. Monien, Eds.), Springer LNCS 1099 (1996), 194-205 [Full version in: Theor. Comput. Sci. 290 (2003), 79-115].
5. D. Caucal, On infinite graphs having a decidable monadic theory, in: Proc. 27th MFCS (K. Diks, W. Rytter, Eds.), Springer LNCS 2420 (2002), 165-176.
6. Th. Cachat, Higher order pushdown automata, the Caucal hierarchy of graphs and parity games, in: Proc. 30th ICALP, 2003, Springer LNCS (to appear).
7. O. Carton, W. Thomas, The monadic theory of morphic infinite words and generalizations, in: Proc. 25th MFCS (M. Nielsen, B. Rovan, Eds.), Springer LNCS 1893 (2000), 275-284.
8. B. Courcelle, Monadic second-order graph transductions: a survey, Theor. Comput. Sci. 126 (1994), 53-75.
9. B. Courcelle, The monadic second-order logic of graphs IX: machines and their behaviours, Theor. Comput. Sci. 151 (1995), 125-162.
10. A. Carayol, S. Wöhrle, personal communication.
11. B. Courcelle, I. Walukiewicz, Monadic second-order logic, graph coverings and unfoldings of transition systems, Ann. Pure Appl. Logic 92 (1998), 35-62.
12. W. Damm, The IO and OI hierarchies, Theor. Comput. Sci. 20 (1982), 95-208.
13. H.D. Ebbinghaus, J. Flum, W. Thomas, Mathematical Logic, Springer, Berlin-Heidelberg-New York 1984.
14. C.C. Elgot, M.O. Rabin, Decidability and undecidability of extensions of second (first) order theory of (generalized) successor, J. Symb. Logic 31 (1966), 169-181.
15. S. Fratani, G. Sénizergues, personal communication.
16. Automata, Logics, and Infinite Games (E. Grädel, W. Thomas, Th. Wilke, Eds.), Springer LNCS 2500, Springer-Verlag, Berlin-Heidelberg-New York 2002.
17. T. Knapik, D. Niwiński, P. Urzyczyn, Deciding monadic theories of hyperalgebraic trees, in: TLCA 2001 (S. Abramsky, Ed.), Springer LNCS 2044 (2001), 253–267.
18. T. Knapik, D. Niwiński, P. Urzyczyn, Higher-order pushdown trees are easy, in: Proc. 5th FoSSaCS (M. Nielsen, U. Engberg, Eds.), Springer LNCS 2303 (2002), 205–222.
19. F. Klaedtke, H. Rueß, Monadic second-order logic with cardinalities, in: Proc. 30th ICALP 2003, Springer LNCS (to appear).
20. C. Löding, Ground tree rewriting graphs of bounded tree width, in: Proc. 19th STACS, Springer LNCS 2285 (2002), 559–570.
21. A. Montanari, A. Peron, A. Policriti, Extending Kamp's Theorem to model time granularity, J. Logic Computat. 12 (2002), 641–678.
22. D. Muller, P. Schupp, The theory of ends, pushdown automata, and second-order logic, Theor. Comput. Sci. 37 (1985), 51–75.
23. M.O. Rabin, Decidability of second-order theories and automata on infinite trees, Trans. Amer. Math. Soc. 141 (1969), 1–35.
24. A. Semenov, Decidability of monadic theories, in: Proc. MFCS 1984 (M.P. Chytil, V. Koubek, Eds.), Springer LNCS 176 (1984), 162–175.
25. W. Thomas, A short introduction to infinite automata, in: Proc. 5th International Conference "Developments in Language Theory", Springer LNCS 2295 (2002), 130–144.
26. I. Walukiewicz, Monadic second-order logic on tree-like structures, Theor. Comput. Sci. 275 (2002), 311–346.
Towards a Theory of Randomized Search Heuristics
Ingo Wegener
FB Informatik, LS 2, Univ. Dortmund, 44221 Dortmund, Germany
[email protected]

Abstract. There is a well-developed theory about the algorithmic complexity of optimization problems. Complexity theory provides negative results which are typically based on assumptions like NP ≠ P or NP ≠ RP. Positive results are obtained by the design and analysis of clever algorithms. These algorithms are well-tuned for their specific domain. Practitioners, however, prefer simple algorithms which are easy to implement and which can be used without many changes for different types of problems. They report surprisingly good results when applying randomized search heuristics like randomized local search, tabu search, simulated annealing, and evolutionary algorithms. Here a framework for a theory of randomized search heuristics is presented. It is discussed how randomized search heuristics can be delimited from other types of algorithms. This leads to the theory of black-box optimization. Lower bounds in this scenario can be proved without any complexity-theoretical assumption. Moreover, methods for analyzing randomized search heuristics, in particular randomized local search and evolutionary algorithms, are presented.

This work was supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the Collaborative Research Center "Computational Intelligence" (SFB 531), the Collaborative Research Center "Complexity Reduction of Multivariate Data Structures" (SFB 475), and the GIF project "Robustness Aspects of Algorithms".
1 Introduction
Theoretical computer science has developed powerful methods to estimate the algorithmic complexity of optimization problems. The borderline between polynomial-time solvable and NP-equivalent problems is marked out, and this holds for problems and their various subproblems as well as for their approximation variants. We do not expect that randomized algorithms can pull down this border. The "best" algorithms for specific problems are those with the smallest asymptotic (w.r.t. the problem dimension) worst-case (w.r.t. the problem instance) run time. They are often well-tuned especially for this purpose. They can be complicated, difficult to implement, and not very efficient for reasonable problem dimensions. This has led to the area of algorithm engineering. Nevertheless, many practitioners like another class of algorithms, namely so-called randomized search heuristics. Their characteristics are that they are
– easy to implement,
– easy to design,
– often fast although there is no guaranteed upper bound on the expected run time,
– often producing good results although there is no guarantee that the solution is optimal or close to optimal.

Classical algorithm theory is concerned with guarantees on the run time (or the expected run time for randomized algorithms) and with guarantees for the quality of the results produced by the algorithm. This has led to the situation that practitioners work with algorithms which are hardly considered in the theory of algorithms. The motivation of this paper is the following. If randomized search heuristics find many applications, then there should be a theory of this class of algorithms. The aim is to understand how these algorithms work, what they can achieve and what they cannot achieve. This should lead to the design of better heuristics, to rules for which algorithm is appropriate under certain restrictions, and to an at least partial analysis of these algorithms on selected problems. Finally, these results can be used when teaching randomized search heuristics. The problem is that we are interested in "the best" algorithms for an optimization problem and we do not expect that a randomized search heuristic is such a best algorithm. It seems to be impossible to define precisely which algorithms are randomized search heuristics. Our solution to this dilemma is to describe an algorithmic scenario such that all known randomized search heuristics can work in this scenario while most problem-specific algorithms are not applicable in it. This black-box scenario is presented in Section 2. There it is shown that black-box algorithms can be interpreted as randomized decision trees. This allows the application of methods from classical complexity theory, in particular lower-bound methods like Yao's minimax principle. This new framework allows a general theory of black-box algorithms including randomized local search, tabu search, simulated annealing, and evolutionary algorithms. Theoretical results on each of these classes of randomized search heuristics were known before, e.g., Papadimitriou, Schäffer, and Yannakakis (1990) for local search, Glover and Laguna (1993) for tabu search, Kirkpatrick, Gelatt, and Vecchi (1983) and Sasaki and Hajek (1988) for simulated annealing, and Rabani, Rabinovich, and Sinclair (1998), Wegener (2001), Droste, Jansen, and Wegener (2002), and Giel and Wegener (2003) for evolutionary algorithms. Lower bounds on the complexity of black-box optimization problems are presented at the end of this paper, in Section 9. Before that, we investigate what can be achieved by randomized search heuristics, in particular by randomized local search and evolutionary algorithms. The aim is to present and to apply methods to analyze randomized search heuristics on selected problems. In Section 3, we therefore discuss some methods which have been applied recently; afterwards, we present examples of these applications. In Section 4, we investigate the optimization of degree-bounded polynomials which are monotone with respect to each variable. It is not known whether the polynomial is increasing or
decreasing with x_i. Afterwards, we investigate famous problems with well-known efficient algorithms working in the classical optimization scenario. In Section 5, we investigate sorting as the maximization of sortedness. The sortedness can be measured in different ways, which has an influence on the optimization time of evolutionary algorithms. In Section 6, the single-source shortest-paths problem is discussed. It turns out that this problem can be handled efficiently only in the model of multi-objective optimization. In Section 7, the maximum matching problem is investigated in order to show that evolutionary algorithms can find improvements which are not obtainable by single local operations. Evolutionary algorithms work with two search operators known as mutation and crossover. In Section 8, we discuss the first analytic results about the effectiveness of crossover. The results on the black-box complexity in Section 9 show that the considered heuristics are close to optimal – in some cases. We finish with some conclusions.
2 The Scenario of Black-Box Optimization
The aim is to describe an algorithmic scenario such that the well-known randomized search heuristics can work in this scenario while it is not possible to apply "other algorithms". What are the specific properties of randomized search heuristics? The main observation is that randomized search heuristics use the information about the considered problem instance in a highly specialized way. They do not work with the parameters of the instance. They only compute possible solutions and work with the values of these solutions. Consider, e.g., the 2-opt algorithm for the TSP. It starts with a random tour π. In general, it stores one tour π, cuts it randomly into two pieces and combines these pieces into a new tour π′. The new tour π′ replaces π iff its cost is not larger than the cost of π. Problem-specific algorithms work in a different way. Cutting-plane techniques based on integer linear programming create new conditions based on the values of the distance matrix. Branch-and-bound algorithms use the values of the distance matrix for the computation of upper and lower bounds and for the creation of subproblems. This observation can be generalized by considering other optimization problems. Droste, Jansen, and Wegener (2003) have introduced the following scenario called black-box optimization. A problem is described as a class of functions. This unifies the areas of mathematical optimization (e.g., maximize a pseudo-boolean polynomial of degree 2) and combinatorial optimization. E.g., TSP is the class of all functions f_D : Σ_n → R_0^+ where D = (d_ij) is a distance matrix, Σ_n is the set of permutations or tours, and f_D(π) is the cost of π with respect to D. Hence, it is no restriction to consider problems as classes F_n of functions f : S_n → R. The set S_n is called the search space for the problem dimension n. In our case, S_n is finite. A problem-specific algorithm knows F_n and the problem instance f ∈ F_n. Each randomized search heuristic belongs to the following class of black-box algorithms.

Algorithm 1 (Black-box algorithm).
1. Choose some probability distribution p on S_n and produce a random search point x_1 ∈ S_n according to p. Compute f(x_1).
2. In step t, stop if the considered stopping criterion is fulfilled. Otherwise, depending on I(t) = (x_1, f(x_1), ..., x_{t−1}, f(x_{t−1})), choose some probability distribution p_{I(t)} on S_n and produce a random search point x_t ∈ S_n according to p_{I(t)}. Compute f(x_t).

This can be interpreted as follows. The black-box algorithm only knows the problem F_n and has access to a black box which, given a query x ∈ S_n, answers with the correct value of f(x) where f is the considered problem instance. Hence, black-box optimization is an information-restricted scenario. It is obvious that most problem-specific algorithms cannot be applied in this scenario. We have to take into account that randomized search heuristics stop without knowing whether they have found an optimal search point. Therefore, we investigate black-box algorithms without stopping criterion as an infinite stochastic process and we define the run time as the random variable measuring the time until an optimal search point is presented as a query to the black box. This is justified since randomized search heuristics use most of their time for searching for an optimum and not for proving that it is optimal (this is different for exact algorithms like branch and bound). Since queries to the black box are the essential steps, we only charge the algorithm for queries to the black box, i.e., for collecting information. Large lower bounds in this model imply that black-box algorithms cannot solve the problem efficiently. For most optimization problems, the computation of f(x) is easy and, for most randomized search heuristics, the computation of the next query is easy. The disadvantage of the model is that we allow all black-box algorithms, including those which collect information to identify the problem instance. Afterwards, they can apply any problem-specific algorithm. MAX-CLIQUE is defined as follows. For a graph G and a subset V′ of the vertex set V let f_G(V′) = |V′| if V′ is a clique in G, and f_G(V′) = 0 otherwise. Asking a query for each two-element set V′ we get the information about the adjacency matrix of G and can compute a maximum clique without asking the black box again. Finally, we have to present the solution to the black box. The number of black-box queries of this algorithm equals n(n−1)/2 + 1 but the overall run time is exponential. Hence, our cost model is too generous. For upper bounds, we also have to consider the overall run time of the algorithm. Nevertheless, we may get efficient black-box algorithms which cannot be considered as randomized search heuristics, see, e.g., Giel and Wegener (2003) for the maximum matching problem. Hence, it would be nice to further restrict the scenario to rule out such algorithms.
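The black-box view is easy to make concrete. The following Python sketch (our own rendering, not from the paper; all names are ours) implements Algorithm 1 for the special case of space restriction s(n) = 1 discussed below: the only stored information is the current search point and its value, and the cost measure is the number of queries until an optimal point is queried.

```python
import random

def black_box_optimize(f, n, mutate, is_optimal, max_queries=10**6):
    """Black-box algorithm over {0,1}^n with space restriction s(n) = 1.

    f          : the black box; maps a search point to its f-value
    mutate     : operator producing the next query from the stored point
    is_optimal : used only to detect the first optimal query (the run time
                 in the text); it plays no role in choosing the next query
    Returns the number of queries until an optimal point was queried.
    """
    x = [random.randint(0, 1) for _ in range(n)]   # step 1: uniform first query
    fx = f(x)
    queries = 1
    while not is_optimal(x):
        if queries >= max_queries:
            return None                 # heuristics have no stopping proof
        y = mutate(x)                   # next query depends only on (x, f(x))
        fy = f(y)
        queries += 1
        if fy >= fx:                    # selection: keep the better point
            x, fx = y, fy
    return queries

if __name__ == "__main__":
    n = 30
    onemax = lambda x: sum(x)
    ea_mutate = lambda x: [b ^ (random.random() < 1.0 / len(x)) for b in x]
    print(black_box_optimize(onemax, n, ea_mutate, lambda x: sum(x) == n))
```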
A black-box algorithm with space restriction s(n) can store at most s(n) search points with their values. After a further search point is produced and presented to the black box, it has to be decided which search point will be forgotten. This decision can be made randomly. We can conclude that the black-box scenario with (even small) space restrictions includes the typical randomized search heuristics and rules out several algorithms which try to identify the problem instance. Up to now, there are several lower bounds on the black-box complexity which even hold in the scenario without space restrictions (see Section 9). Lower bounds which depend strongly on the space bound are not known and are an interesting research area. We have developed the black-box scenario from the viewpoint of well-known randomized search heuristics. In order to prove lower bounds it is more appropriate to describe black-box algorithms as randomized decision trees. A deterministic black-box algorithm corresponds to a deterministic search tree. The root contains the first query and has an outgoing edge for each possible answer. In general, a path to an inner node describes the history with all previous queries and answers, and the node contains the next query with an outgoing edge for each possible answer. A randomized decision tree is a probability distribution on the set of deterministic decision trees. Since S_n is finite and since it makes no sense to repeat queries if the whole history is known, the depth of the decision trees can be bounded by |S_n|. It is easy to see that both definitions of black-box algorithms are equivalent. The study of randomized decision trees has a long history (e.g., Hajnal (1991), Lovász, Naor, Newman, and Wigderson (1991), Heiman and Wigderson (1991), Heiman, Newman, and Wigderson (1993)). Usually, parts of the unknown input x can be queried and one is interested in computing f(x). Here we can query search points x and get the answer f(x). Usually, the search stops at a leaf of the decision tree and we know the answer to the problem. Here the search stops at the first node (not necessarily a leaf) where the query concerns an optimal search point. Although our scenario differs in details from the traditional investigation of randomized decision trees, we can apply lower-bound techniques known from the theory of decision trees. It is not clear how to improve such lower bounds in the case of space restrictions.
3 Methods for the Analysis of Randomized Search Heuristics
We are interested in the worst-case (w.r.t. the problem instance) expected (w.r.t. the random bits used by the algorithm) run time of randomized search heuristics. If the computation of queries (or search points) and the evaluation of f (often called fitness function) are algorithmically simple, it is sufficient to count the number of queries. First of all, randomized search heuristics are randomized algorithms and many methods used for the analysis of problem-specific randomized algorithms can be applied also for the analysis of randomized search heuristics. The main
difference is that many problem-specific randomized heuristics implement an idea of how to solve the problem and they work in a specific direction. Randomized search heuristics try to find good search directions by experiments, i.e., they try search regions which are known to be bad if one knows the problem instance. Nevertheless, when analyzing a randomized search heuristic, we can develop an intuition of how the search heuristic will approach the optimum. More precisely, we define a typical run of the heuristic with certain subgoals which should be reached within certain time periods. If a subgoal is not reached within the considered time period, this can be considered as a failure. The aim is to estimate the failure probabilities, and often it is sufficient to estimate the total failure probability by the sum of the single failure probabilities. If the heuristic works with a finite storage and the analysis is independent of the initialization of the storage, then a failure can be interpreted as the start of a new trial. This general approach is often successful. The main idea is easy but we need a good intuition of how the heuristic works. If the analysis is not independent of the contents of the storage, the heuristic can get stuck in local optima. If the success probability within polynomially many steps is not too small (at least 1/p(n) for a polynomial p), a restart or a multistart strategy can guarantee a polynomial expected optimization time. It is useful to analyze search heuristics together with their variants defined by restarts or many independent parallel runs. The question is how we can estimate the failure probabilities. The most often applied tool is Chernoff's inequality. It can be used to ensure that a (not too short) sequence of random experiments with results 1 (success) and 0 (no success) has a behavior which is very close to the expected behavior with overwhelming probability. A typical situation is that one needs n steps with special properties in order to reach the optimum. If the success probability of a step equals p, it is very likely that we need Θ(n/p) steps to have n successes. All other tail inequalities, e.g., Markoff's inequality and Tschebyscheff's inequality, are also useful. We also need a kind of inverse of Chernoff's inequality. During N Bernoulli trials with success probability 1/2 it is not unlikely (more precisely, there is a positive constant c > 0 such that the probability is at least c) to have at least N/2 + N^{1/2} successes, i.e., the binomial distribution is not too concentrated. If a heuristic tries two directions with equal probability and the goal lies in one direction, the heuristic may find it. E.g., the expected number of steps of a random walk on {0, ..., n} with p(0, 1) = p(n, n−1) = 1 and p(i, i−1) = p(i, i+1) = 1/2 otherwise, until it reaches n, is bounded by O(n^2). A directed search starting in 0 needs n steps. This shows that a directed search is considerably better but a randomized search is not too bad (for an application of these ideas see Jansen and Wegener (2001b)). If the random walk is not fair and the probability to go to the right equals p, we may be interested in the probability of reaching the good point n before the bad point 0 if we start at a. This is equivalent to the gambler's ruin problem. Let t := (1 − p)/p. Then the success probability equals (1 − t^a)/(1 − t^n).
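The gambler's ruin formula is easy to check empirically. The following small Monte-Carlo experiment (our own sketch; parameter names are ours, and p ≠ 1/2 is assumed so the closed form applies) compares the simulated probability of hitting n before 0 with (1 − t^a)/(1 − t^n).

```python
import random

def ruin_success_probability(p, n, a, trials=100_000):
    """Empirical probability that a biased walk started at a hits n before 0."""
    wins = 0
    for _ in range(trials):
        pos = a
        while 0 < pos < n:
            pos += 1 if random.random() < p else -1  # step right with prob. p
        wins += (pos == n)
    return wins / trials

p, n, a = 0.6, 20, 5
t = (1 - p) / p
print(ruin_success_probability(p, n, a))   # empirical estimate
print((1 - t**a) / (1 - t**n))             # closed form from the text
```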
There is another result with a nice description which has many applications. If a randomized search heuristic flips a random bit of a search point x ∈ {0,1}^n, we are interested in the expected time until each position has been flipped at least once. This is the scenario of the coupon collector's theorem. The expected time equals Θ(n log n) and large deviations from the expected value are extremely unlikely. This result has the following consequence. If the global optimum is unique, randomized search heuristics without problem-specific modules need Ω(n log n) steps on the average. In many cases, one needs more complicated arguments to estimate the failure probability. Ranade (1991) was the first to apply an argument now known as the delay-sequence argument. The idea is to characterize those runs which are delayed by events which must have happened. Afterwards, the probability of these events is estimated. This method has found many applications since its first presentation; for the only application to the analysis of an evolutionary algorithm so far see Dietzfelbinger, Naudts, van Hoyweghen, and Wegener (2002). Typical runs of a search heuristic are characterized by subgoals. In the case of maximization, this can be the first point in time when a query x where f(x) ≥ b is presented to the black box. Different fitness levels (all x where f(x) = a) can be combined to fitness layers (all x where a_1 ≤ f(x) ≤ a_2). Then it is necessary to estimate the time until a search point from a better layer is found if one has seen a point from a worse layer. The fitness alone does not provide the information that "controls" or "directs" the search. As in the case of classical algorithms, we can use a potential function g : S_n → R (also called a pseudo-fitness). The black box still answers the query x with the value of f(x) but our analysis of the algorithm is based on the values of g(x). Even if a randomized search heuristic with space restriction 1 does not accept search points whose fitness is worse, the g-value of the search point stored in the memory may decrease. We may hope that it is sufficient that the expected change of the g-value is positive. This is not true in a strict sense. A careful drift analysis is necessary in order to guarantee "enough" progress in a "short" time interval (see, e.g., Hajek (1982), He and Yao (2001), Droste, Jansen, and Wegener (2002)). Altogether, the powerful tools from the analysis of randomized algorithms have to be combined with some intuition about the algorithm and the problem. Results obtained in this way are reported in the following sections.
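The coupon collector scenario is equally quick to simulate. The sketch below (ours, not from the paper) measures the time until each of n positions has been chosen at least once and compares the average with the exact expectation n·H_n ≈ n ln n, matching the Θ(n log n) bound.

```python
import random

def coupon_collector_time(n):
    """Steps until each of the n positions has been chosen at least once."""
    seen, steps = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # flip a uniformly random position
        steps += 1
    return steps

n, trials = 100, 2000
avg = sum(coupon_collector_time(n) for _ in range(trials)) / trials
exact = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(avg, exact)
```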
4 The Optimization of Monotone Polynomials
Each pseudo-boolean function f : {0,1}^n → R can be written uniquely as a polynomial

    f(x) = Σ_{A ⊆ {1,...,n}} w_A · Π_{i∈A} x_i.
Its degree d(f) is the maximal |A| where w_A ≠ 0 and its size s(f) is the number of sets A where w_A ≠ 0. Already the maximization of polynomials of degree 2 is NP-hard. The polynomial is called monotone increasing if w_A ≥ 0 for all A. The maximization of monotone increasing polynomials is trivial since the input
1^n is optimal. Here we investigate the maximization of monotone polynomials of degree d, i.e., polynomials which are monotone increasing with respect to some z_1, ..., z_n where z_i = x_i or z_i = 1 − x_i. This class of functions is interesting because of its general character and because of the following properties. For each input a and each global optimum a* there is a path a_0 = a, ..., a_m = a* such that a_{i+1} is a Hamming neighbor of a_i and f(a_{i+1}) ≥ f(a_i), i.e., we can find the optimum by local steps which do not create points with a worse fitness. Nevertheless, there are non-optimal points where no search point in the Hamming ball with radius d − 1 is better. We investigate search heuristics with space restriction 1. They use a random search operator (also called a mutation operator) which produces the new query a′ from the current search point a. The new search point a′ is stored instead of a if f(a′) ≥ f(a). The first mutation operator RLS (randomized local search) chooses i uniformly at random and flips a_i, i.e., a′_i = 1 − a_i and a′_j = a_j for all j ≠ i. The second operator EA (evolutionary algorithm) flips each bit independently of the others with probability 1/n. Finally, we consider a class of operators RLS_p, 0 ≤ p ≤ 1/n, which choose some i uniformly at random. Then a_i is flipped with probability 1 and each a_j, j ≠ i, is flipped independently of the others with probability p. Obviously RLS_0 = RLS. Moreover, RLS_{1/n} is close to EA if the steps in which no bit flips are omitted. For all these heuristics, we have to investigate how they find improvements. In general, the analysis of RLS is easier. The number of bits which have a correct (optimal) value and influence the fitness value essentially is never decreased. This is different for RLS_p, p > 0, and EA. If one bit gets the correct value, several other bits can be changed from correct into incorrect. Nevertheless, it is possible that a′ replaces a. Wegener and Witt (2003) have obtained the following results. All heuristics need an expected time of Θ((n/d) · 2^d) to optimize monotone polynomials of size 1 and degree d, i.e., monomials. This is not too difficult to prove. One has to find the unique correct assignment to d variables, i.e., one has to choose among 2^d possibilities, and the probability that one of the d important bits is flipped in one step equals Θ(d/n). In general, RLS performs a kind of parallel search on all monomials. Its expected optimization time is bounded by O((n/d) · log(n/d + 1) · 2^d). It can be conjectured that the same bounds hold for RLS_p and EA. The best known result is a bound of O((n^2/d) · 2^d) for RLS_p, d(f) ≤ c log n, and p small enough, more precisely p ≤ 1/(3dn) and p ≤ α/(n^{c/2} log n) for some constant α > 0. The proof is a drift analysis on the pseudo-fitness counting the correct bits with essential influence on the fitness value. Moreover, the behavior of the underlying Markoff chain is estimated by comparing it with a simpler Markoff chain. It can be shown that the true Markoff chain is only by a constant factor slower than the simple one. Similar ideas are applied to analyze the mutation operator EA. This is essentially the case of RLS_{1/n}, i.e., there are often several flipping bits. The best bound for degree d ≤ 2 log n − 2 log log n − a for some constant a depends on the size s and equals O(s · (n/d) · 2^d).
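For concreteness, the three mutation operators might be coded as follows (a hypothetical Python sketch in our own notation; search points are 0/1 lists).

```python
import random

def rls(a):
    """RLS: flip exactly one uniformly chosen bit."""
    b = a[:]
    i = random.randrange(len(a))
    b[i] ^= 1
    return b

def ea(a):
    """EA: flip each bit independently with probability 1/n."""
    n = len(a)
    return [bit ^ (random.random() < 1.0 / n) for bit in a]

def rls_p(a, p):
    """RLS_p: flip one chosen bit for sure, every other bit independently
    with probability p (so RLS_0 coincides with RLS)."""
    n = len(a)
    i = random.randrange(n)
    return [a[j] ^ 1 if j == i else a[j] ^ (random.random() < p)
            for j in range(n)]
```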
For all mutation operators, the expected optimization time equals Θ((n/d) · log(n/d + 1) · 2^d) for the following function, called a royal road function in the community of evolutionary algorithms. This function consists of n/d monomials of degree d; their weights equal 1 and they are defined on disjoint sets of variables. These functions are the most difficult monotone polynomials for RLS and the conjecture is that this holds also for RLS_p and EA. The conjecture implies that overlapping monomials simplify the optimization of monotone polynomials. Our analysis of three simple randomized search heuristics on the simple class of degree-bounded monotone polynomials already shows the difficulties of such analyses.
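The royal road fitness is a direct transcription, under the assumption (ours, for illustration) that the n variables are split into consecutive blocks of length d; each completely filled block contributes its weight 1.

```python
def royal_road(x, d):
    """Royal road function: number of completely filled blocks of length d.

    x is a 0/1 list whose length is a multiple of d; each block of d
    consecutive variables plays the role of one monomial of weight 1
    on its own disjoint set of variables.
    """
    assert len(x) % d == 0
    return sum(1 for i in range(0, len(x), d) if all(x[i:i + d]))
```

Plugged into the black-box loop sketched in Section 2, `royal_road` with the EA operator exhibits the Θ((n/d) · log(n/d + 1) · 2^d) behavior described above.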
5 The Maximization of the Sortedness of a Sequence
Polynomials of bounded degree are a class of functions defined by structural properties. Here and in the following sections, we want to discuss typical algorithmic problems. Sorting can be understood as the maximization of the sortedness of the sequence. Measures of sortedness have been developed in the theory of adaptive sorting algorithms. Scharnow, Tinnefeld, and Wegener (2002) have investigated five scenarios defined as minimization problems with respect to fitness functions defined as distances d_{π*}(π) of the considered sequence (or permutation) π on {1, ..., n} from the optimal sequence π*. Because of symmetry it is sufficient to describe the definitions only for the case that π* = id is the identity:

– INV(π) counts the number of inversions, i.e., pairs (i, j) with i < j and π(i) > π(j),
– EXC(π) counts the minimal number of exchanges of two objects to sort the sequence,
– REM(π) counts the minimal number of removals of objects in order to obtain a sorted subsequence; this is also the minimal number of jumps (an object jumps from its current position to another position) to sort the sequence,
– HAM(π) counts the number of objects which are at incorrect positions, and
– RUN(π) counts the number of runs, i.e., the number of sorted blocks of maximal length.

The search space is the set of permutations and the function to be minimized is one of the functions d_{π*}. We want to investigate randomized search heuristics related to RLS and EA from the last section. Again we have a space restriction of 1 and consider the same selection procedure to decide which search point is stored. There are two local search operators, the exchange of two objects and the jump of one object to a new position. RLS performs one local operation chosen uniformly at random. For EA, the number of randomly chosen local operations equals X + 1 where X is Poisson distributed with parameter λ = 1. It is quite easy to prove O(n^2 log n) bounds for the expected run times of RLS and EA and the fitness functions INV, EXC, REM, and HAM. It is sufficient to consider the different fitness levels and to estimate the probability of increasing
the fitness within one step. A lower bound of Ω(n^2) holds for all five fitness functions. Scharnow, Tinnefeld, and Wegener (2002) also describe some Θ(n^2 log n) bounds which hold if we restrict the search heuristics to one of the search operators, namely jumps or exchanges. E.g., in the case of HAM, exchanges seem to be the essential operations and the expected optimization time of RLS and EA using exchanges only is Θ(n^2 log n). An exchange can increase the HAM value by at most 2. One does not expect that jumps are useful in this scenario. This is true in most situations but there are exceptions. E.g., HAM(n, 1, ..., n − 1) = n and a jump of object n to position n creates the optimum. An interesting scenario is described by RUN. The number of runs is essential for adaptive mergesort. In the black-box scenario with small space bounds, RUN seems to give too little information for an efficient optimization. Experiments indicate that RLS and EA are rather inefficient, but this has not been proven rigorously. Here we discuss why RUN establishes a difficult problem for typical randomized search heuristics. Let RUN(π) = 2 and let the shorter run have length l. An accepted exchange of two objects usually changes neither the number of runs nor their lengths. Each object has a good jump destination in the other run. This may change l by 1. However, there are only l jumps decreasing l but n − l jumps increasing l. Applying the results on the gambler's ruin problem, it is easy to see that it will take exponential time until l drops from a value of at least n/4 to a value of at most n/8. A rigorous analysis is difficult. At the beginning, there are many short runs and it is difficult to control the lengths of the runs when applying RLS or EA. Moreover, there is another event which has to be controlled. If run r_2 follows r_1 and the last object of r_1 jumps away, it can happen that r_1 and r_2 melt together since all objects of r_2 are larger than the remaining objects of r_1. It seems to be unlikely that long runs melt together. Under this assumption one can prove that RLS and EA need on average exponential time on RUN.
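Three of the distance measures and the two local operators are short to state in code. The sketch below (our own notation; permutations are 0-based lists and the target is the identity, matching π* = id) is a brute-force transcription, not the authors' implementation.

```python
def inv(pi):
    """INV: number of inversions, i.e., pairs i < j with pi[i] > pi[j]."""
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j])

def ham(pi):
    """HAM: number of objects at incorrect positions (target: identity)."""
    return sum(1 for i, v in enumerate(pi) if v != i)

def run(pi):
    """RUN: number of maximal sorted blocks."""
    return 1 + sum(1 for i in range(len(pi) - 1) if pi[i] > pi[i + 1])

def exchange(pi, i, j):
    """Local operator: exchange the objects at positions i and j."""
    q = pi[:]
    q[i], q[j] = q[j], q[i]
    return q

def jump(pi, i, j):
    """Local operator: the object at position i jumps to position j."""
    q = pi[:]
    q.insert(j, q.pop(i))
    return q
```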
6 Shortest-Paths Problems
The computation of shortest paths from a source s to all other places is one of the classical optimization problems. The problem instance is described by a distance matrix D = (d_ij) where d_ij ∈ R⁺ ∪ {∞} describes the length of the direct connection from i to j. The search space consists of all trees T rooted at s := n. Each tree can be described by the vector v_T = (v_1^T, ..., v_{n−1}^T) where v_i^T is the number of the direct predecessor of i in T. The fitness of T can be defined in different ways. Let d_T(i) be the length of the s-i-path in T. Then

– f_T(v) = d_T(1) + · · · + d_T(n − 1) leads to a minimization problem with a single objective and
– g_T(v) = (d_T(1), ..., d_T(n − 1)) leads to a minimization problem with n − 1 objectives.

In the case of multi-objective optimization we are interested in Pareto optimal solutions, i.e., search points v where g_T(v) is minimal with respect to the partial
order "≤" on (R ∪ {∞})^{n−1}. Here (a_1, ..., a_{n−1}) ≤ (b_1, ..., b_{n−1}) iff a_j ≤ b_j for all j. In the case of the shortest-paths problem there is exactly one Pareto optimal fitness vector, which corresponds to all trees containing shortest s-i-paths for all i. Hence, in both cases optimal search points correspond to solutions of the considered problem. Nevertheless, the problems are of different complexity when considered as black-box optimization problems. The single-objective problem has very hard instances. All instances where only the connections of a specific tree T have finite length lead in black-box optimization to the same situation: all but one search points have the fitness ∞ and the remaining search point is optimal. This implies an exponential black-box complexity (see Section 9). The situation is different for the multi-objective problem. A local operator is to replace v_i^T by some w ∉ {i, v_i^T}. This may lead to a graph with cycles and, therefore, to an illegal search point. We may assume that illegal search points are marked or that d_T(i) = ∞ for all i without an s-i-path. Again, we can consider the operator RLS performing a single local operation (uniformly chosen at random) and the operator EA performing X + 1 local operations (X Poisson distributed with λ = 1). Scharnow, Tinnefeld, and Wegener (2002) have analyzed these algorithms by estimating the expected time until the algorithm stores a search point whose fitness vector has more optimal components. The worst-case expected run time can be estimated by O(n^3) in general and by O(n^2 d log n) if the depth (number of edges on a path) of an optimal tree equals d. This result proves the importance of the choice of an appropriate problem modeling when applying randomized search heuristics.
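The distance vector of a predecessor encoding and the partial order used for acceptance can be sketched as follows (our own indexing: the source is a parameter s rather than the fixed node n of the text; cycles and missing edges yield ∞, modeling illegal search points).

```python
INF = float("inf")

def distance_vector(pred, dist, s=0):
    """Length d_T(i) of the s-i path for every non-source node i.

    pred : dict mapping each node i != s to its direct predecessor v_i^T
    dist : dict of edge lengths, dist[(u, v)] = d_{uv}
    Cycles (illegal search points) and missing edges yield INF.
    """
    def d(i, seen):
        if i == s:
            return 0.0
        if i in seen:                      # cycle: illegal search point
            return INF
        v = pred[i]
        return d(v, seen | {i}) + dist.get((v, i), INF)
    return tuple(d(i, set()) for i in sorted(pred))

def dominates_or_equal(a, b):
    """The partial order on (R ∪ {∞})^{n-1}: a <= b iff a_j <= b_j for all j."""
    return all(x <= y for x, y in zip(a, b))
```

A mutated tree is then accepted iff `dominates_or_equal(new_vec, old_vec)` holds, i.e., iff no component of the distance vector gets worse.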
7 Maximum Matchings
The maximum matching problem is a classical optimization problem. In order to obtain a polynomial-time algorithm one needs the non-trivial idea of augmenting paths. This raises the question of what can be achieved by randomized search heuristics that do not employ the idea of augmenting paths. Such a study can give insight into how an undirected search can find a goal. The problem instance of a maximum matching problem is described by an undirected graph G = (V, E). A candidate solution is some edge set E′ ⊆ E. The search space equals {0,1}^m for graphs with m edges and each bit position describes whether the corresponding edge is chosen. Finally, f_G(E′) = |E′| if the edges of E′ are a G-matching, and f_G(E′) = 0 otherwise. A search heuristic can start with the empty matching. We investigate three randomized search heuristics. Randomized local search RLS flips a coin in order to decide whether it flips one or two bits uniformly at random. It is obvious that an RLS flipping only one bit per step can get stuck in local optima. Flipping two bits in a step, an augmenting path can be shortened by two edges within one step. If the augmenting path has length 1, the matching can be enlarged by flipping the edge on this path. The mutation operator EA flips each bit independently with probability 1/m. This makes it possible to flip all bits of an augmenting path simultaneously. The analysis of EA is much more difficult than the analysis of RLS since more global
changes are possible. Finally, SA is some standard form of simulated annealing whose details are not described here (see Sasaki and Hajek (1988)). Practitioners do not ask for optimal solutions; they are satisfied with (1 + ε)-optimal solutions, in particular if they can choose the accuracy parameter ε > 0. It is sufficient to find a (1 + ε)-optimal solution in expected polynomial time and to obtain a probability of 3/4 of finding a (1 + ε)-optimal solution within a polynomial number of steps. Algorithms with the second property are called a PRAS (polynomial-time randomized approximation scheme). Giel and Wegener (2003) have shown that RLS and EA have the desired properties and the expected run time is bounded by O(m^{2⌈1/ε⌉}). This question has not been investigated for SA. The observation is that small matchings imply the existence of short augmenting paths. For RLS, the probability of a sequence of steps shortening a path to an augmenting edge is estimated. For EA it is sufficient to estimate the probability of flipping exactly all the edges of a short augmenting path. These results are easy to prove and show that randomized search heuristics perform quite well for all graphs. The run time grows exponentially with 1/ε. This is necessary, as the following example shows (see Fig. 1).

[Fig. 1. The graph G_{h,ℓ} and an augmenting path.]

The graph G_{h,ℓ} has h · (ℓ + 1) nodes arranged as an h × (ℓ + 1)-grid. All "horizontal" edges exist. Moreover, the columns 2i and 2i + 1 are connected by all possible edges. The graph has a unique perfect matching consisting of all edges ((i, 2j − 1), (i, 2j)). Figure 1 shows an almost perfect matching (solid edges). In such a situation there is a unique augmenting path which goes from left to right, perhaps changing the row, in our example (2, 5), (2, 6), (2, 7), (2, 8), (1, 9), (1, 10). The crucial observation is that the free node (2, 5) (similarly for (1, 10)) is connected to h + 1 free edges whose other endpoints are connected to a unique matching edge each. There are h + 1 2-bit-flips changing the augmenting path at (2, 5); h of them increase the length of the augmenting path and only one decreases the length. If h ≥ 2, this is an unfair game and we may conjecture that G_{h,ℓ}, h ≥ 2, is difficult for randomized search heuristics. This is indeed the case. Sasaki and Hajek (1988) have proved for SA that the expected optimization time grows exponentially if h = ℓ. Giel and Wegener (2003) have proved a bound of 2^{Ω(ℓ)} on the expected optimization time of RLS and EA for each h ≥ 2. The proof for RLS follows the ideas discussed above.
One has to be careful since the arguments do not hold if an endpoint of the augmenting path is in the first or last column. Moreover, we have to control which matchings are created during the search process. Many of the methods discussed in Section 3 are applied, namely typical runs analyzed with appropriate potential functions (length of the shortest augmenting path), drift analysis, the gambler's ruin problem, and Chernoff bounds. The analysis of EA is even more difficult. It is likely that there are some steps where more than two bits flip and the resulting bit string describes a matching of the same size. Such a step may change the augmenting paths significantly. Hence, a quite simple, bipartite graph where the degree of each node is bounded above by 3 is a difficult problem instance for typical search heuristics. Our arguments do not hold in the case h = 1, i.e., in the case that the graph is a path of odd length. In this case, RLS and EA find the perfect matching in an expected number of O(m^4) steps (Giel and Wegener (2003)). One may wonder why the heuristics are not more efficient. Consider the situation of one augmenting path of length Θ(ℓ) = Θ(m). Only four different 2-bit-flips are accepted (two at each endpoint of the augmenting path). Hence, on average only one out of Θ(m^2) steps changes the situation. The length of the augmenting path has to be decreased by Θ(m). The "inverse" of Chernoff's bound (see Section 3) implies that we need on average Θ(m^2) essential steps. The reason is that we cannot decrease the length of the augmenting path deterministically. We play a coin-tossing game and have to wait until we have won Θ(m) euros. We cannot lose too much since the length of the augmenting path is bounded above by m. These considerations show that randomized search heuristics "work" at free nodes v. The pairs ((v, w), (w, u)) of a free and a matching edge cannot be distinguished by black-box heuristics with small space bounds. Some of them will decrease and some of them will increase the length of augmenting paths. If this game is fair "on the average", we can hope to find a better matching in expected polynomial time. Which graphs are fair in this imprecise sense? There is a new result of Giel and Wegener showing that RLS finds an optimal matching in trees with n nodes in an expected number of O(n^6) steps. One can construct situations which look quite unfair; then nodes have a large degree. Trees with many nodes of large degree have a small diameter and/or many leaves. A leaf, however, is unfair but in favor of RLS. If a leaf is free, a good 2-bit-flip at this free node can only decrease the length of each augmenting path containing the leaf. The analysis shows that the bad unfair inner nodes and the good unfair leaves together make the game fair or even unfair in favor of the algorithm.
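The fitness function f_G and the 1-or-2-bit RLS operator for matchings are simple to write down. The sketch below is ours (names and representation are illustrative, not from the paper): edges are vertex pairs indexed by bit position.

```python
import random

def matching_fitness(edges, chosen):
    """f_G(E') = |E'| if E' is a matching in G, and 0 otherwise.

    edges  : list of vertex pairs (u, v), one per bit position
    chosen : 0/1 list selecting the edge set E'
    """
    used, size = set(), 0
    for bit, (u, v) in zip(chosen, edges):
        if bit:
            if u in used or v in used:
                return 0          # two chosen edges share an endpoint
            used.update((u, v))
            size += 1
    return size

def rls_matching_step(chosen):
    """Flip one or two uniformly chosen bit positions (fair coin decides)."""
    new = chosen[:]
    for i in random.sample(range(len(chosen)), random.choice((1, 2))):
        new[i] ^= 1
    return new
```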
8 Population-Based Search Heuristics and Search with Crossover
We have seen that randomized local search RLS is often efficient. RLS is not able to escape from local optima. This can be achieved with the same search operator if we sometimes accept worsenings. This idea leads to the Metropolis
algorithm or simulated annealing. Another idea is the mutation operator EA from evolutionary algorithms. It can perform non-local changes but it prefers local and almost local changes. In any case, these algorithms work with a space restriction of 1. One of the main ideas of evolutionary algorithms is to work with more search points in the storage, typically called population-based search. Such a population can help only if the algorithm maintains some diversity in the population, i.e., it contains search points which are not close together. It is not necessary to define these notions rigorously here. It should be obvious nevertheless that it is more difficult to analyze population-based search heuristics (with the exception of multi-start variants of simple heuristics). Moreover, the crossover operator needs a population. We recall that crossover creates a new search point z from two search points x and y. In the case of S_n = {0,1}^n, one-point crossover chooses i ∈ {1, ..., n − 1} uniformly at random and z = (x_1, ..., x_i, y_{i+1}, ..., y_n). Uniform crossover decides with independent coin tosses whether z_i = x_i or z_i = y_i. Evolutionary algorithms where crossover plays an important role are called genetic algorithms. There is only a small number of papers with a rigorous analysis of population-based evolutionary algorithms and, in particular, genetic algorithms. The difficulties can be described by the following example. Assume a population consisting of n search points all having k ones where k > n/2. The optimal search point consists of ones only, all search points with k ones are of equal fitness, and all other search points are much worse. If k is not very close to n, it is quite unlikely to create 1^n with mutation. A genetic algorithm will sometimes choose one search point for mutation and sometimes choose two search points x and y for uniform crossover, and the resulting search point z is mutated to obtain the new search point z*. Uniform crossover can create 1^n only if there is no position i where x_i = y_i = 0. Hence, the diversity in the population should be large. Mutation creates a new search point close to the given one. If both stay in the population, this can decrease the diversity. Uniform crossover creates a search point z between x and y. This implies that the search operators do not support the creation of a large diversity. Crossover is even useless if all search points of the population are identical. In the case of a very small diversity, mutation tends to increase the diversity. The evolution of the population and its diversity is a difficult stochastic process. It cannot be analyzed completely with the known methods (including rapidly mixing Markoff chains). Jansen and Wegener (2002) have analyzed this situation. In the case of k = n − Θ(log n) they could prove that a genetic algorithm reaches the goal in expected polynomial time. This genetic algorithm uses standard parameters with the only exception that the probability of performing crossover is very small, namely 1/(cn log n) for some constant c. This assumption is necessary for the proof that we obtain a population with quite different search points. Since many practitioners believe that crossover is essential, theoreticians are interested in proving this, i.e., in proving that a genetic algorithm is efficient in situations where all mutation- and population-based algorithms fail. The most
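The two crossover operators read directly as code (our sketch; bit strings are lists of equal length n).

```python
import random

def one_point_crossover(x, y):
    """Choose i in {1,...,n-1} uniformly; take x's prefix and y's suffix."""
    i = random.randrange(1, len(x))
    return x[:i] + y[i:]

def uniform_crossover(x, y):
    """Decide each position by an independent fair coin toss."""
    return [xi if random.random() < 0.5 else yi for xi, yi in zip(x, y)]
```

Note how both operators reflect the diversity problem described above: `uniform_crossover` can only produce values that occur in x or y at each position, so a position where x_i = y_i = 0 can never become 1 by crossover alone.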
modest aim is to prove such a result for at least one instance of one, perhaps even very artificial, problem. No such result was known for a long time. The royal road functions (see Section 4) were candidates for such a result. We have seen that randomized local search and simple evolutionary algorithms solve these problems in expected time O((n/d) · log(n/d + 1) · 2^d) and the black-box complexity of these problems is Ω(2^d) (see Section 9). Hence, there can be no superpolynomial trade-off for these functions. The first superpolynomial trade-off has been proved by Jansen and Wegener (2002; the conference version was published in 1999) based on the results discussed above. Later, Jansen and Wegener (2001a) have designed artificial functions and have proved exponential trade-offs for both types of crossover.
9 Results on the Black-Box Complexity of Specific Problems
We have seen that all typical randomized search heuristics work in the black-box scenario and they indeed work with a small storage. Droste, Jansen, and Wegener (2003) have proved several lower bounds on the black-box complexity of specific problems. The lower-bound proofs apply Yao's minimax principle (Yao (1977)). Yao considers the zero-sum game between Alice, choosing a problem instance, and Bob, choosing an algorithm (a decision tree). Bob has to pay for each query asked by his decision tree when confronted with the problem instance chosen by Alice. Both players can use randomized strategies. If the number of problem instances and the number of decision trees are finite, lower bounds on the black-box complexity can be obtained by proving lower bounds for deterministic algorithms on randomly chosen problem instances. We are free to choose the probability distribution on the problem instances. The following application of this technique is trivial. Let S_n be the search space and let f_a, a ∈ S_n, be the problem instance where f_a(a) = 1 and f_a(b) = 0 for b ≠ a. The aim is maximization. We choose the uniform distribution on all f_a, a ∈ S_n. A deterministic decision tree is essentially a decision list. If a query leads to the answer 1, the search is stopped successfully. Hence, the expected depth is always at least (|S_n| + 1)/2 and this bound can be achieved if we query all a ∈ S_n in random order. This example seems to be too artificial to have applications. The shortest-paths problem (see Section 6) with a single objective contains this problem when the search space consists of all trees rooted at s. Hence, we know that this problem is hard in black-box optimization. In the case of the maximization of monotone polynomials we have the subproblem of the maximization of all z_1 · · · z_d where z_i ∈ {x_i, 1 − x_i}. The bits at the positions d + 1, ..., n have no influence on the answers to queries. Hence, we get the lower bound (2^d + 1)/2 for the maximization of polynomials (or monomials) of degree d. This bound is not far from the upper bound shown in Section 4. For several problems, we need lower bounds which hold only in a space-restricted scenario since there are small upper bounds in the unrestricted scenario:
– O(n) for sorting and the distance measure INV,
– O(n log n) for sorting and the distance measures HAM and RUN,
– O(n) for shortest paths as a multi-objective optimization problem (a simulation of Dijkstra's algorithm),
– O(m^2) for the maximum matching problem.

Finally, we discuss a non-trivial lower bound in black-box optimization. A function is unimodal on {0,1}^n if each non-optimal search point has a better Hamming neighbor. It is easy to prove that RLS and EA can optimize unimodal functions with at most b different function values in an expected number of O(nb) steps. This bound is close to optimal. A lower bound of Ω(b/n^ε) has been proved by Droste, Jansen, and Wegener (2003) if (1 + δ)n ≤ b = 2^{o(n)} (an exponential lower bound for deterministic algorithms has been proven earlier by Llewellyn, Tovey, and Trick (1989)). Here the idea is to consider the following stochastic process to create a unimodal function. Set p_0 = 1^n and let p_{i+1} be a random Hamming neighbor of p_i, 1 ≤ i ≤ b − n. Then delete the cycles on p_0, p_1, ... to obtain a simple path q_0, q_1, .... Finally, let f(q_i) = n + i and f(a) = a_1 + · · · + a_n for all a outside the simple path. Then it can be shown that a randomized search heuristic cannot do essentially better than to follow the path.
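The random-path construction can be sketched as follows (our own rendering; deleting cycles online as the walk proceeds is one way to realize the cycle deletion described in the text, and yields the required simple path).

```python
import random

def random_unimodal_function(n, b):
    """Build the hard unimodal instance sketched in the text.

    Start at p_0 = 1^n, perform a random walk on Hamming neighbors,
    remove cycles to keep a simple path q_0, q_1, ..., and set
    f(q_i) = n + i; off the path, f(a) = a_1 + ... + a_n (OneMax).
    """
    p = (1,) * n
    path = [p]
    index = {p: 0}
    for _ in range(b - n):
        i = random.randrange(n)
        p = p[:i] + (1 - p[i],) + p[i + 1:]      # random Hamming neighbor
        if p in index:                            # cycle: cut the path back
            path = path[: index[p] + 1]
            index = {q: k for k, q in enumerate(path)}
        else:
            index[p] = len(path)
            path.append(p)
    values = {q: n + i for i, q in enumerate(path)}
    return lambda a: values.get(tuple(a), sum(a))
```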
10 Conclusion
Randomized search heuristics find many applications but the theory of these heuristics is not well developed. The black-box scenario allows the proof of lower bounds for all randomized search heuristics – without any complexity-theoretical assumption. The reason is that the scenario restricts the information about the problem instance. Moreover, methods to analyze typical heuristics on optimization problems have been presented. Altogether, a theory of randomized search heuristics that is as well developed as the theory of classical algorithms is still a vision, but steps to approach this vision have been described.
References
1. Dietzfelbinger, M., Naudts, B., van Hoyweghen, C., and Wegener, I. (2002). The analysis of a recombinative hill-climber on H-IFF. Submitted for publication in IEEE Trans. on Evolutionary Computation.
2. Droste, S., Jansen, T., and Wegener, I. (2003). Upper and lower bounds for randomized search heuristics in black-box optimization. Tech. Rep., Univ. Dortmund.
3. Droste, S., Jansen, T., and Wegener, I. (2002). On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science 276, 51–81.
4. Giel, O. and Wegener, I. (2003). Evolutionary algorithms and the maximum matching problem. Proc. of 20th Symp. on Theoretical Aspects of Computer Science (STACS), LNCS 2607, 415–426.
5. Glover, F. and Laguna, M. (1993). Tabu search. In C.R. Reeves (Ed.): Modern Heuristic Techniques for Combinatorial Problems, 70–150, Blackwell, Oxford.
6. Hajek, B. (1982). Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability 14, 502–525.
7. He, J. and Yao, X. (2001). Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence 127, 57–85.
8. Jansen, T. and Wegener, I. (2001a). Real royal road functions – where crossover provably is essential. Proc. of 3rd Genetic and Evolutionary Computation Conf. (GECCO), 375–382.
9. Jansen, T. and Wegener, I. (2001b). Evolutionary algorithms – how to cope with plateaus of constant fitness and when to reject strings of the same fitness. IEEE Trans. on Evolutionary Computation 5, 589–599.
10. Jansen, T. and Wegener, I. (2002). The analysis of evolutionary algorithms – a proof that crossover really can help. Algorithmica 34, 47–66.
11. Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P. (1983). Optimization by simulated annealing. Science 220, 671–680.
12. Llewellyn, D.C., Tovey, C., and Trick, M. (1989). Local optimization on graphs. Discrete Applied Mathematics 23, 157–178.
13. Lovász, L., Naor, M., Newman, I., and Wigderson, A. (1991). Search problems in the decision tree model. Proc. of 32nd IEEE Symp. on Foundations of Computer Science (FOCS), 576–585.
14. Papadimitriou, C.H., Schäffer, A.A., and Yannakakis, M. (1990). On the complexity of local search. Proc. of 22nd ACM Symp. on Theory of Computing (STOC), 438–445.
15. Rabani, Y., Rabinovich, Y., and Sinclair, A. (1998). A computational view of population genetics. Random Structures and Algorithms 12, 314–330.
16. Ranade, A.G. (1991). How to emulate shared memory. Journal of Computer and System Sciences 42, 307–326.
17. Sasaki, G. and Hajek, B. (1988). The time complexity of maximum matching by simulated annealing. Journal of the ACM 35, 387–403.
18. Scharnow, J., Tinnefeld, K., and Wegener, I. (2002). Fitness landscapes based on sorting and shortest paths problems. Proc. of 7th Conf. on Parallel Problem Solving from Nature (PPSN VII), LNCS 2439, 54–63.
19. Wegener, I. (2001). Theoretical aspects of evolutionary algorithms. Proc. of 28th Int. Colloquium on Automata, Languages and Programming (ICALP), LNCS 2076, 64–78.
20. Wegener, I. and Witt, C. (2003). On the optimization of monotone polynomials by simple randomized search heuristics. Combinatorics, Probability and Computing, to appear.
21. Yao, A.C. (1977). Probabilistic computations: Towards a unified measure of complexity. Proc. of 17th IEEE Symp. on Foundations of Computer Science (FOCS), 222–227.
Adversarial Models for Priority-Based Networks
C. Àlvarez¹, M. Blesa¹, J. Díaz¹, A. Fernández², and M. Serna¹
¹ Dept. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Jordi Girona 1–3, 08034 Barcelona
{alvarez,mjblesa,diaz,mjserna}@lsi.upc.es
² Dept. Ciencias Experimentales e Ingeniería, Universidad Rey Juan Carlos, Tulipán s/n, Campus de Móstoles, 28933 Madrid
[email protected]

Abstract. We propose several variations of the adversarial queueing model to cope with packets that can have different priorities (the priority and variable priority models) and with link failures (the failure and reliable models). We address stability issues in the proposed adversarial models. We show that the set of universally stable networks in the adversarial model remains the same in the four introduced models. From the point of view of queueing policies, we show that several queueing policies that are universally stable in the adversarial model remain so in the priority, failure and reliable models. However, we show that lis, a universally stable queueing policy in the adversarial model, is not universally stable in any of the other models, and that no greedy queueing policy is universally stable in the variable priority model. Finally, we analyze the problem of deciding stability of a given network under a fixed protocol. We provide a characterization of the networks that are stable under fifo and lis in the failure model. This characterization allows us to show that deciding network stability under fifo and lis in the proposed models can be done in polynomial time.

Work partially supported by the FET Programme of the EU under contract number IST-2001-33116 (FLAGS), and by the Spanish CICYT project TIC-2001-4917-E. The second author was also supported by the Catalan government with the predoctoral grant 2001FI-00659.
1 Introduction
The model of Adversarial Queueing Theory (aqt) proposed by Borodin et al. [10] considers the time evolution of a packet-routing network as a game between an adversary and a queueing policy. At each time step the adversary may inject a set of packets into some of the nodes. For each packet the adversary specifies the sequence of edges that it must traverse, after which the packet will be absorbed. If more than one packet tries to cross an edge e at the same time step, then the queueing policy chooses one of these packets to be sent across e. The remaining
Work partially supported by the FET Programme of the EU under contract number IST-2001-33116 (FLAGS), and by the Spanish CICYT project TIC-2001-4917-E. The second author was also supported by the Catalan government with the predoctoral grant 2001FI-00659.
packets wait in the queue. The game then advances to the next time step. The main goal of the model is to study stability issues of the network under different greedy queueing policies. Stability is the property that, at any time, the maximum number of packets present in the system is bounded by a constant that may depend on system parameters. Recall that a protocol is greedy if, whenever there is at least one packet waiting to use an edge, the protocol advances a packet through the edge. In the adversarial model the adversary is restricted by a pair (r, b), where b ≥ 0 is the burstiness and 0 < r < 1 is the injection rate. The adversary must obey the rule Ne(I) ≤ r|I| + b, where Ne(I) denotes the number of packets injected by the adversary during a time interval I that have paths containing the edge e [10,5]. (Recall that in [10] the model is defined over windows of fixed size w, with the equation Ne(I) ≤ r|I| required for every interval with |I| = w; it is known that both models are equivalent [16].) In this paper we consider a generalization of the adversarial model which takes into account the possibility that packets may have different priorities, and we explore some dynamic network settings.

Priority Models. Considering priorities is a natural approach to model today's networks. Today's networked applications, such as data mining, e-commerce and multimedia, are bandwidth hungry and time sensitive. These applications need networks that accommodate these requirements and guarantee some Quality of Service (QoS). Classifying and prioritizing network traffic flows are basic tasks. We are interested in analyzing the power of an adversary that can prioritize the packets. We will consider two settings: in the first one each packet has a fixed priority, and in the second one the adversary is allowed to modify the priority of a packet at any time step. Consequently, we define two new models for adversarial queueing theory, the priority and the variable priority models. When packets have priorities, each edge has a queue associated to every possible priority. If at a certain time more than one packet tries to cross the same edge e, the queueing policy chooses one of these packets to send across e, from the non-empty queue with highest priority. The limitations on the adversary are the same as in the adversarial model.

Models for Dynamic Networks. Inspired by the priority models and by the growing importance of wireless mobile networks, where some connections between nodes may fail or change quickly and unpredictably, we also consider some variations of the adversarial model for dynamic networks, in which edges can appear and disappear arbitrarily. Note that in the priority model we can simulate the failure of an edge e by injecting a packet of length 1 in e with a priority higher than any other packet in the queue of e. Once this packet has been injected, none of the remaining packets of the queue can be served until the next time step. It seems natural to introduce models for dynamic networks in which the adversary controls not only the packet arrivals, but also the edge failures. The constraints on the adversary are defined obeying the rule: the number of packets introduced by the adversary during interval I, which have paths containing e, cannot be
greater than a fraction of the number of times that e is alive in this interval. Furthermore, as packets must follow a pre-specified path, the adversary should not be able to make an edge fail perpetually. We assume that a packet cannot cross a failed link, and that during an edge failure the packets that arrive wait queued at the head of the edge. Let Fe(I) be the number of steps during time interval I for which the edge e is down. In order to guarantee a bound on the maximum number of failures of an edge e in any time interval, we propose the failure model, in which the adversary is controlled by a common bound on both the injections and the edge failures, according to the restriction

Ne(I) + Fe(I) ≤ r|I| + b.    (1)
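To make restriction (1) concrete, the following sketch checks whether a given log of injections and failures respects the failure-model budget. The event-log representation and function names are our own illustrative choices, not part of the model's definition.

```python
def respects_failure_model(injections, failures, r, b, horizon):
    """Check restriction (1): N_e(I) + F_e(I) <= r|I| + b for every
    interval I = [s, t) and every edge e.

    injections: dict mapping edge -> list of time steps at which a packet
                whose path contains that edge is injected.
    failures:   dict mapping edge -> set of time steps at which the edge
                is down.
    """
    edges = set(injections) | set(failures)
    for e in edges:
        inj = injections.get(e, [])
        down = failures.get(e, set())
        # Brute force over all intervals [s, t); fine for small horizons.
        for s in range(horizon):
            for t in range(s + 1, horizon + 1):
                n = sum(1 for step in inj if s <= step < t)   # N_e(I)
                f = sum(1 for step in down if s <= step < t)  # F_e(I)
                if n + f > r * (t - s) + b:
                    return False
    return True

# A toy log on a single edge: rate r = 1/2, burstiness b = 1.
ok = respects_failure_model({"e": [0, 4]}, {"e": {2}}, r=0.5, b=1, horizon=6)
print(ok)  # True: every interval stays within the budget
```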
Observe that in this case, during a given interval, the injection rate limits both the maximum number of failures and the maximum number of packet injections per edge. With the aim of allowing a larger number of edge failures, we define a new dynamic model. To do so, we introduce an additional parameter α. The adversary is characterized by (r, b, α), where r, b are defined as before and r ≤ α ≤ 1. For any edge e and any interval I, the adversary must obey the constraint

Ne(I) + αFe(I) ≤ r|I| + b.    (2)
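A short calculation, using nothing beyond constraint (2) itself, shows where the recovery bound b/(α − r) used below comes from. If r < α ≤ 1 and the adversary keeps an edge e down, injecting nothing on it, throughout an interval I, then Fe(I) = |I| and Ne(I) = 0, so (2) gives

    α|I| ≤ r|I| + b  ⟹  |I| ≤ b/(α − r),

i.e., the edge must be operational again after at most b/(α − r) steps.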
In the model defined by equation (2) we can consider two extreme cases: for α = 1 we obtain the constraint (1) defining the failure model; for α ≤ r we get a model in which an edge can be down all the time. Notice that in the case r < α ≤ 1, if the adversary produces a failure of an edge, then it is forced to recover the edge after at most b/(α − r) steps, otherwise it would violate inequality (2). We are interested in the latter property and will use the term reliable model to denote an adversary with parameters (r, b, α) where 0 < r < 1, b > 1, and r < α ≤ 1, constrained by inequality (2).

Greedy Protocols. Throughout this paper the term network will refer to digraphs which may have multiple edges but no loops. As in [10,5] we will only consider greedy protocols, which apply their policies to the queues at the edges according to some local or global criteria. The system acts synchronously. The main queueing protocols we consider are: fifo (First In First Out), lifo (Last In First Out), sis (Shortest In System), lis (Longest In System), ntg (Nearest To Go), ftg (Furthest To Go), nfs (Nearest From Source), ffs (Farthest From Source) and ntg-lis, which works as ntg but resolves ties using the lis protocol. It is known that ftg, nfs, sis and lis are universally stable in the adversarial model while fifo, lifo, ntg and ffs are not [5].

Related Work. Two adversarial models for dynamic networks have been proposed in [7] and [6]. In both models the injected packets are defined by specifying only source and destination, and thus they are not forced to follow a pre-specified path. The dynamic models proposed in this paper consider the case in which the injected packets are defined by specifying the sequence of edges that they must
traverse. Our models and the dynamic models proposed in [7] and [6] have the common characteristic that, for every interval I, the adversary cannot inject to any edge e (or to any set S of nodes, for the model in [6]) more packets than the number of packets that e (or the set of edges with only one endpoint in S) can absorb. An interpretation of system failures as slowdowns in the transmission or in the link capacity, instead of link failures, was studied in [9]; in both of their models packets are injected with a pre-specified path. In the dynamic slowdown model, a packet p suffers slowdown se(t) while crossing edge e at time t; that is, p starts to traverse the link at time t and arrives at the tail of e at time t + se(t). During this transfer time the packets that want to cross e wait in the queue of e. In the static case every link e has a fixed slowdown se. This situation has some similarities with the failure model, as the slowdown s incurred by a packet traversing a link e can be interpreted as "link e fails during s − 1 steps". However, there is a difference: in the slowdown model p is delayed after leaving e's queue, while in the failure model p waits in e's queue. This means that when e is recovered, the next packet to be served might not be p. In the capacity model every edge e in a network has capacity ce(t) at time step t. They also consider a static case in which the capacity does not depend on time. In step t a link is able to transmit simultaneously up to ce(t) packets. The main results in [9] are that every universally stable network remains universally stable in the slowdown and the capacity models, even in the dynamic case, and that sis, nfs and ftg remain universally stable in all the models. The situation is different for lis, since it is universally stable in the static slowdown model but not in the dynamic slowdown and capacity models. Even though we can interpret that a link fails at any time step with ce(t) = 0, the proof that lis is not universally stable in the dynamic capacity model uses two non-zero capacities (see Theorem 3.1 of [9]).

Our Contributions. We address stability issues in the proposed adversarial models. Let us recall that a network is stable under a protocol and an adversary if the number of packets in the system at any time step remains bounded. Our first results concern universal stability of networks. First we show that the property that a network is stable under any adversary and queueing policy remains the same in the adversarial, priority, variable priority, failure and reliable models. From the point of view of queueing policies, we show that nfs, sis and ftg, which are universally stable in the adversarial model [5], remain so in the failure, reliable, and priority models. However, we show that lis, a universally stable queueing policy in the adversarial model [5], is not universally stable in the failure, reliable and priority models. Moreover, we show that no greedy protocol is universally stable in the variable priority model. Finally, we analyze the problem of deciding stability of a given network under a fixed protocol. We provide a characterization of the networks that are stable under fifo and lis in the failure model. This characterization is the same as the one given in [3] in the adversarial model for universal stability and for stability
under ntg-lis. Thus, our results show that for fifo and lis the stability problem in the failure model can be solved in polynomial time. Let us observe that the characterization of fifo stability in the adversarial model remains an open problem [4]. Due to the lack of space we omit all the proofs; they can be found in the full version [1].
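To illustrate how the greedy protocols listed above differ, the following sketch selects the next packet to cross an edge under several of the policies. The Packet record and field names are our own illustrative choices; fifo assumes the queue is kept in arrival order.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    injected_at: int   # time step at which the packet entered the system
    traversed: int     # number of edges already crossed
    remaining: int     # number of edges still to cross

def select(queue, policy):
    """Pick the packet that a greedy protocol sends across an edge.
    `queue` is a non-empty list of Packet objects waiting at the edge,
    ordered by arrival time at this edge."""
    if policy == "fifo":
        return queue[0]
    if policy == "lifo":
        return queue[-1]
    if policy == "sis":
        return max(queue, key=lambda p: p.injected_at)  # shortest in system
    if policy == "lis":
        return min(queue, key=lambda p: p.injected_at)  # longest in system
    if policy == "ntg":
        return min(queue, key=lambda p: p.remaining)    # nearest to go
    if policy == "ftg":
        return max(queue, key=lambda p: p.remaining)    # furthest to go
    if policy == "nfs":
        return min(queue, key=lambda p: p.traversed)    # nearest from source
    if policy == "ffs":
        return max(queue, key=lambda p: p.traversed)    # farthest from source
    raise ValueError(policy)

q = [Packet(0, 2, 5), Packet(3, 0, 1), Packet(1, 4, 4)]
print(select(q, "lis").injected_at)  # 0: the oldest packet wins under lis
```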
2 Universal Stability of Networks
A communication system is formed by three main components: a network G, a scheduling protocol P, and a traffic pattern represented by an adversary A. The concept of universal stability applies either to networks or to protocols. Let M denote a model in the set {adversarial, reliable, failure, priority, variable priority} as defined in the previous section. Given a network G, a queueing policy P and a model M, we say that, for a given adversary A following the restrictions of M, the system S = (G, A, P) is stable in the model M if, at any time step, the maximum number of packets in the system is bounded by a fixed value that may depend on system parameters. The pair (G, P) is stable in the model M if, for any adversary A following the restrictions of M, the system S = (G, A, P) is stable in M. A network G is universally stable in the model M if, for any greedy queueing policy P, the pair (G, P) is stable in M. A greedy protocol P is universally stable in the model M if, for any digraph G, the pair (G, P) is stable in M.

In order to compare the power of adversaries in different adversarial models, we introduce the concept of simulation. We say that an adversary A in model M simulates an adversary A′ in model M′ when, for any network G and any protocol P, if (G, A, P) is stable in M, then (G, A′, P) is stable in M′.

Lemma 1. Any (r, b)-adversary in the adversarial model can be simulated by an (r, b)-adversary in the failure model. Any (r, b)-adversary in the failure model is an (r, b, 1)-adversary in the reliable model. Any (r, b)-adversary in the failure model can be simulated by an (r, b)-adversary in the priority model using two priorities. Any (r, b, α)-adversary in the reliable model can be simulated by an (r + 1 − α, b)-adversary in the failure model.

Observe that the failure and reliable models are equivalent, so any stability or instability result for one model applies to the other as well. In the following we will consider only the failure and priority models. Now we can state our main result in this section.

Theorem 1. Given a digraph G, the following properties are equivalent:
1. G is universally stable in the adversarial model,
2. G is universally stable in the failure model,
3. G is universally stable in the priority model, and
4. G is universally stable in the variable priority model.
3 Universal Stability of Protocols
In this section we address the universal stability property in the failure and priority models from the point of view of the queueing policy. We will consider the basic protocols presented in the introduction. Recall that ftg, nfs, sis and lis are universally stable in the adversarial model while fifo, lifo, ntg, and ffs are not [5]. Since any adversary in the adversarial model can be seen as an adversary in the other models, fifo, lifo, ntg, and ffs are not universally stable in the failure and priority models. First we show how the behavior of any (r, b)-adversary for a network G in the priority model can be simulated by an (r′, b′)-adversary for a network G′ in the adversarial model, under the same protocol P, in the case that P ∈ {ftg, ntg, nfs, ffs}. Let G be a directed graph and Aπ an adversary in the priority model, assigning at most π priorities and with injection rate (r, b), where b ≥ 0 and 0 < r < 1. Every injected packet p has a priority πp in the ordered interval [1, . . . , π], where one is the lowest priority.

Lemma 2. For any system S = (G, Aπ, P) in the priority model, for P ∈ {ftg, ntg}, there is a system S′ = (G′, A′, P) in the adversarial model such that G is a subgraph of G′, A′ has injection rate (r, b), if a packet p is injected in S at time t with route r, a packet p′ is injected in S′ at time t with a route r′ obtained by concatenating r and a path of edges not in G, and if p crosses edge e at time t′ in S, then p′ crosses e at time t′ in S′.

To get a similar result for nfs and ffs we have to relate two different networks.

Lemma 3. For any system S = (G, Aπ, P) in the priority model, for P ∈ {nfs, ffs}, there is a system S′ = (G′, A′, P) in the adversarial model such that G is a subgraph of G′, A′ has injection rate (r, b′), where b′ = r(π − 1)d + b, if a packet p is injected in S with route r, a packet p′ is injected in S′ with a route r′ obtained by concatenating a path of edges not in G and r, and if p crosses edge e at time t′ in S, then p′ crosses e at time t′ in S′.

As a consequence of the previous lemmas and the results in [5] we get:

Theorem 2. ftg and nfs are universally stable in the priority, failure and reliable models.

We can show the universal stability of sis in the priority model by following arguments similar to those of Lemma 2.2 in [5], which shows the universal stability of sis in the adversarial model, together with induction on the number of priorities.

Theorem 3. sis is universally stable in the priority, failure and reliable models.

The next result states the non-universal stability of lis in the failure model, and therefore in all the other models. We show that the graph U1 of Figure 1 is not stable under lis in the failure model. Putting this result together with Lemma 1 we get:
Fig. 1. The two subgraphs U1 and U2 characterizing universal stability in the adversarial model (see [3]), and their extensions replacing an edge by a path
Theorem 4. lis is not universally stable in the failure, reliable and priority models.

Finally, we consider universal stability of protocols in the variable priority model. We show that the graph U1 of Figure 1 is not stable under any greedy protocol in the variable priority model.

Theorem 5. There is no greedy protocol that is universally stable in the variable priority model.
4 Stability under a Protocol
In this section we analyze the complexity of the problem of deciding whether a given network G is stable under a fixed protocol. Few results are known for this problem. Before formally stating our results, we need to introduce some graph-theoretical definitions. We will consider the following subdivision operations over digraphs (sketched in code below):
– The subdivision of an arc (u, v) in a digraph G consists in the addition of a new vertex w and the replacement of (u, v) by the two arcs (u, w) and (w, v).
– The subdivision of a 2-cycle (u, v), (v, u) in a digraph G consists in the addition of a new vertex w and the replacement of (u, v), (v, u) by the arcs (u, w), (w, u), (v, w) and (w, v).
Given a digraph G, E(G) denotes the family of digraphs formed by G and all the digraphs obtained from G by successive arc or 2-cycle subdivisions. Given a family of digraphs F, S(F) denotes the family of digraphs that contain a graph in F as a subgraph. Figure 1 provides the two basic graphs needed to characterize universal stability, and the shape of the extensions of those graphs. This basic family provides the characterization of the stability properties. It is known that a digraph G is universally stable in the adversarial model if and only if G ∉ S(E(U1) ∪ E(U2)) [3]. The same property characterizes network stability under ntg-lis [3] and ffs [2]. It is also known that, for a given digraph G, checking whether G ∉ S(E(U1) ∪ E(U2)) can be done in polynomial time [3]. Further results for undirected graphs and other variations can be found in [5] and [4]. Nothing is known about the complexity of deciding stability in the adversarial model for any other queueing policy. In the following we provide a similar characterization of stability in the failure model under fifo and lis.
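A minimal sketch of the two subdivision operations, representing a digraph simply as a list of arcs; the representation and helper names are our own.

```python
def subdivide_arc(arcs, u, v, w):
    """Arc subdivision: replace (u, v) by (u, w) and (w, v), w a fresh vertex."""
    out = [a for a in arcs if a != (u, v)]
    out += [(u, w), (w, v)]
    return out

def subdivide_2cycle(arcs, u, v, w):
    """2-cycle subdivision: replace (u, v), (v, u) by (u, w), (w, u), (v, w), (w, v)."""
    out = [a for a in arcs if a not in ((u, v), (v, u))]
    out += [(u, w), (w, u), (v, w), (w, v)]
    return out

g = [("a", "b"), ("b", "a")]
print(subdivide_2cycle(g, "a", "b", "c"))
# [('a', 'c'), ('c', 'a'), ('b', 'c'), ('c', 'b')]
```

Iterating these operations starting from a digraph G generates the family E(G) used in the characterization above.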
4.1 FIFO Stability under the Failure Model
Much effort has been devoted recently to studying stability and instability properties of fifo. The fifo protocol was shown not to be universally stable [5]. A network-dependent absolute constant is provided in [12] such that fifo is stable against any adversary with a smaller injection rate. A lower bound of 0.749 on the injection rate for instability is calculated in [13]. This bound was decreased to 0.5 [15]. In [11] it is shown that fifo is stable if the injection rate is smaller than 1/(d − 1). Recently, it has been proved that fifo can become unstable at arbitrarily low injection rates [8,14]. We show that the two basic graphs given in Figure 1, as well as their extensions, are not stable under fifo in the failure model. As a more general result, for the network U2 we can show instability in the adversarial model.

Lemma 4. Any graph in S(E(U1) ∪ E(U2)) is not stable under fifo in the failure model.

As we have pointed out before, all networks G ∉ S(E(U1) ∪ E(U2)) are universally stable in the adversarial model. Taking into account that if a network has an unstable subnetwork it is also unstable, we get the following result.

Theorem 6. Let G be a digraph; the pair (G, fifo) is stable in the failure model if and only if G is universally stable in the adversarial model.

A corollary of this result is the equivalence between fifo stability in the failure model and universal stability in the adversarial model. Furthermore, as instability in the failure model implies instability in the priority and reliable models, the characterization of fifo stability remains the same in the priority and reliable models. Observe also that stability under fifo can be checked in polynomial time for the failure, priority and reliable models.
4.2 LIS Stability under the Failure Model
The lis protocol gives priority to the packet that has been in the system longest, i.e., that joined the network earliest. In [5], the lis protocol was shown to be universally stable in the adversarial model, with queue size O(b/(1 − r)^d) per edge and packet delay of the order O(b/(1 − r)^d). However, as we have shown, the protocol is not universally stable in the failure model. We proceed as in the case of fifo, by showing the instability of the basic graphs given in Figure 1 and of their extensions.

Lemma 5. Any graph in S(E(U1) ∪ E(U2)) is not stable under lis in the failure model.

Therefore, as in the case of fifo, we have:

Theorem 7. A digraph G is stable under lis in the failure model if and only if G is universally stable in the adversarial model.
4.3 The Variable Priority Model
For the variable priority model the situation is simpler. It is easy to adapt the lis instability proofs for U1, U2 and their extensions. This, together with Theorem 1, gives the following result.

Theorem 8. Let P be any greedy protocol. A digraph G is stable under P in the variable priority model if and only if G is universally stable in the adversarial model.
5 Conclusions and Open Problems
We have proposed several variations of the adversarial model to cope with packet priorities and link failures. We have studied universal stability from the point of view of both the network and the queueing policy. We have also addressed the complexity of deciding stability under a fixed protocol. We have shown that in the adversarial, failure, reliable, priority and variable priority models, the set of networks that are universally stable remains the same. The models present a different behavior with respect to the universal stability of protocols, since lis is universally stable in the adversarial model but not in the other models. In contrast, we have shown that there are no universally stable protocols for the variable priority model. We have proposed a new and natural way to model the behavior of queueing systems in dynamic networks. Our results, compared to the slowdown models introduced in [9], show that the power of an adversary in the failure model and in the dynamic slowdown model is quite similar: in both cases the lis protocol is not universally stable. However, the static slowdown model is less powerful than the failure model, as lis remains universally stable in it [9]. The argument used in the proof of Theorem 4.1 of [9] can be used to show how to construct an adversary in the variable priority model that simulates an adversary in the dynamic slowdown model. It would be of interest to find constructions, similar to those given in Lemmas 2 and 3, relating the power of the slowdown and failure models without changing the protocol. Regarding the dynamic capacity model, the authors frequently use the trick of injecting c − ce(t) dummy packets which only need to traverse link e. This can be done without violating the load condition for a network with static capacity c provided that ce(t) > 0 (see Theorems 3.3 and 3.4 of [9]). It would be of interest to analyze the case with zero capacities. It remains an open problem to show the existence of a protocol that is universally stable in the failure model but not in the priority model. All the already known characterizations of stability under a protocol are equivalent to universal stability in the adversarial model, even in the variable priority model. It is an interesting open question whether there is any protocol P, not universally stable, for which there are networks that are not universally stable but that are stable under P. Finally, let us point out that deciding stability under fifo in the adversarial model is open, and that nothing is known about characterizations of stability under lifo.
References
1. C. Àlvarez, M. Blesa, J. Díaz, A. Fernández, and M. Serna. Adversarial models for priority-based networks. Technical Report LSI-03-25-R, Software Department, Universitat Politècnica de Catalunya, 2003.
2. C. Àlvarez, M. Blesa, J. Díaz, A. Fernández, and M. Serna. The complexity of deciding stability under ffs in the adversarial model. Technical Report LSI-03-16-R, Software Department, Universitat Politècnica de Catalunya, 2003.
3. C. Àlvarez, M. Blesa, and M. Serna. A characterization of universal stability in the adversarial queueing model. Technical Report LSI-03-27-R, Software Department, Universitat Politècnica de Catalunya, 2003.
4. C. Àlvarez, M. Blesa, and M. Serna. Universal stability of undirected graphs in the adversarial queueing model. In 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA'02), pages 183–197, Winnipeg, Canada, August 2002. ACM Press, New York, USA.
5. M. Andrews, B. Awerbuch, A. Fernández, J. Kleinberg, T. Leighton, and Z. Liu. Universal stability results for greedy contention-resolution protocols. Journal of the ACM, 48(1):39–69, 2001.
6. E. Anshelevich, D. Kempe, and J. Kleinberg. Stability of load balancing algorithms in dynamic adversarial systems. In 34th ACM Symposium on Theory of Computing (STOC'02), pages 399–406, 2002.
7. B. Awerbuch, P. Berenbrink, A. Brinkmann, and C. Scheideler. Simple routing strategies for adversarial systems. In 42nd IEEE Symposium on Foundations of Computer Science (FOCS'01), pages 158–167, 2001.
8. R. Bhattacharjee and A. Goel. Instability of FIFO at arbitrarily low rates in the adversarial queueing model. Technical Report 02-776, Department of Computer Science, University of Southern California, Los Angeles, USA, 2002.
9. A. Borodin, R. Ostrovsky, and Y. Rabani. Stability preserving transformations: packet routing networks with edge capacities and speeds. In ACM-SIAM Symposium on Discrete Algorithms (SODA'01), pages 601–610, 2001.
10. A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. Williamson. Adversarial queueing theory. Journal of the ACM, 48(1):13–38, 2001.
11. A. Charny and J.-Y. Le Boudec. Delay bounds in a network with aggregate scheduling. In Proc. First International Workshop on Quality of Future Internet Services, Berlin, Germany, 2000.
12. J. Díaz, D. Koukopoulos, S. Nikoletseas, M. Serna, P. Spirakis, and D. Thilikós. Stability and non-stability of the FIFO protocol. In 13th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'01), pages 48–52, 2001.
13. D. Koukopoulos, M. Mavronicolas, S. Nikoletseas, and P. Spirakis. On the stability of compositions of universally stable, greedy contention-resolution protocols. In D. Malkhi, editor, Distributed Computing, 16th International Conference, volume 2508 of Lecture Notes in Computer Science, pages 88–102. Springer, 2002.
14. D. Koukopoulos, M. Mavronicolas, and P. Spirakis. FIFO is unstable at arbitrarily low rates. ECCC, TR-03-16, 2003.
15. Z. Lotker, B. Patt-Shamir, and A. Rosén. New stability results for adversarial queuing. In 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA'02), pages 175–182, Winnipeg, Canada, 2002.
16. A. Rosén. A note on models for non-probabilistic analysis of packet-switching networks. Information Processing Letters, 84:237–240, 2002.
On Optimal Merging Networks

Kazuyuki Amano and Akira Maruoka

Graduate School of Information Sciences, Tohoku University
Aoba 05, Aramaki, Sendai 980-8579, Japan
{ama,maruoka}@ecei.tohoku.ac.jp
Abstract. We prove that Batcher's odd-even (m, n)-merging networks are exactly optimal for (m, n) = (3, 4k + 2) and (4, 4k + 2) for k ≥ 0, in terms of the number of comparators used. For the other cases with m ≤ 4, the optimality of Batcher's (m, n)-merging networks has already been proved. So we can conclude that Batcher's odd-even merge yields optimal (m, n)-merging networks for every m ≤ 4 and for every n. The crucial part of the proof is characterizing the structure of optimal (2, n)-merging networks.
1 Introduction
A comparator network, which consists of comparators, has been widely investigated as a model of oblivious comparison-based algorithms. A comparator is a 2-input 2-output gate, where one output computes the minimum of the two inputs and the other output computes the maximum (see Fig. 1).
Fig. 1. Left: a comparator; Right: a (3, 4)-merging network
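A comparator network can be simulated directly: each comparator [i : j] puts the minimum on line i and the maximum on line j. The sketch below (function names are ours) evaluates a network given as a list of comparators, and uses a brute force over sorted 0-1 inputs, in the style of the zero-one principle, to check that a small network merges correctly. Here we check Batcher's 3-comparator (2, 2)-merger, with lines x1, x2, y1, y2 numbered 0 to 3.

```python
from itertools import product

def run(comparators, values):
    """Apply comparators [i, j] in order: min goes to line i, max to line j."""
    v = list(values)
    for i, j in comparators:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def merges(comparators, m, n):
    """Brute-force check over all sorted 0-1 inputs."""
    for x in product([0, 1], repeat=m):
        for y in product([0, 1], repeat=n):
            if list(x) != sorted(x) or list(y) != sorted(y):
                continue
            out = run(comparators, list(x) + list(y))
            if out != sorted(out):
                return False
    return True

# Batcher's (2, 2)-merger: compare (x1, y1), (x2, y2), then the middle pair.
batcher_22 = [(0, 2), (1, 3), (1, 2)]
print(merges(batcher_22, 2, 2))  # True, using C(2, 2) = 3 comparators
```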
An (m, n)-merging network is a comparator network that merges m elements x1 ≤ · · · ≤ xm with n elements y1 ≤ · · · ≤ yn to form the sorted sequence z1 ≤ · · · ≤ zm+n. A merging network is usually drawn as in Fig. 1. Batcher [2] proposed odd-even merge, which gives an (m, n)-merging network with the fewest comparators known up to the present. Let C(m, n) be the number of comparators used in the odd-even merge for m and n. The function C(m, n) is given (see [4, p. 224]) by

C(m, n) = mn, if mn ≤ 1;
C(m, n) = C(⌈m/2⌉, ⌈n/2⌉) + C(⌊m/2⌋, ⌊n/2⌋) + ⌊(m + n − 1)/2⌋, if mn > 1.

Let M(m, n) denote the minimum number of comparators in an (m, n)-merging network. At present no merging networks have been discovered that
are superior to the odd-even merge, and the problem of proving or disproving M(m, n) = C(m, n) for every m and n has remained unsolved for over 30 years (see [8] or [4, Exercise 43, p. 241]). We call an (m, n)-merging network that uses M(m, n) comparators an optimal merging network. There has been considerable effort to determine the exact values of M(m, n). It is clear that M(1, n) = C(1, n) = n. Yao and Yao [8] proved that M(2, n) = C(2, n) = ⌈3n/2⌉. Aigner and Schwarzkopf [1] showed that M(3, n) = C(3, n) = ⌈(7n + 3)/4⌉ for n ≡ 0, 1, 3 mod 4. In the same paper, they also stated that M(3, n) = C(3, n) for n ≡ 2 mod 4, but without a detailed proof. Recently, Iwata [3] proved that

M(m1 + m2, n) ≥ M(m1, ⌈n/2⌉) + M(m2, ⌊n/2⌋) + ⌊(m1 + m2 + n − 2)/2⌋    (1)
for every m1, m2 ≥ 1 and for every n ≥ 1. As a corollary of this inequality, they showed that M(m, n) = C(m, n) for m = 3, 4 and for n ≡ 0, 1, 3 mod 4. Unfortunately, for n ≡ 2 mod 4, the best lower bound obtained from Eq. (1) is M(m, n) ≥ C(m, n) − 1 for m = 3, 4. Somewhat interestingly, the case n ≡ 2 mod 4 seems to be the hardest one in determining the exact values of M(m, n) for m = 3, 4. The exact values of M(n, n) for n ≤ 9 are known [1,7], and the asymptotic behavior of M(m, n) has also been investigated in, e.g., [3,5,6,8].

The main result of the present paper is to show that Batcher's odd-even merge yields optimal (m, n)-merging networks for every m ≤ 4 and for every n. This is achieved by proving M(3, 4k + 2) = C(3, 4k + 2) (in Section 3) and M(4, 4k + 2) = C(4, 4k + 2) (in Section 4). The crucial part of the proofs of the above results is characterizing the structure of optimal (2, n)-merging networks (Theorem 1), which we describe in Section 2. In addition, our arguments can also be used to determine M(m, n) for certain (m, n)'s with m > 4, such as (m, n) = (5, 8k + 6). This will be discussed in Section 5.

In what follows, we call the horizontal line of input element xi (yj) of an (m, n)-merging network line xi (line yj) for 1 ≤ i ≤ m (1 ≤ j ≤ n, respectively). The line xi is always placed above the line xi+1 for every 1 ≤ i < m, and similarly for yj. We can place the lines x1, x2, . . . , xm of an (m, n)-merging network interspersed within the lines y1, y2, . . . , yn in arbitrary order since, for any two different interspersings of the input lines, there is a transformation from a network with one interspersing into a network with the other that preserves the number of comparators used [4, Exercise 16, p. 238]. A comparator connecting a line i to a line j is denoted by [i : j]. A comparator is said to be of the form [i : ∗] if it is a comparator [i : j] for some j. A comparator of the form [∗ : j] is defined similarly. A subnetwork of a merging network N consists of a set K of adjacent lines in N together with the set of all comparators that connect two lines in K. We sometimes identify a merging network with the set of comparators contained in the network. For a set A, |A| denotes the cardinality of A.
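The recurrence for C(m, n) given above is straightforward to evaluate. The following sketch (our own code) computes it and reproduces the closed forms quoted in this section, with the floors and ceilings as we have restored them:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def C(m, n):
    """Number of comparators in Batcher's odd-even (m, n)-merge."""
    if m * n <= 1:
        return m * n
    return (C((m + 1) // 2, (n + 1) // 2)
            + C(m // 2, n // 2)
            + (m + n - 1) // 2)

# C(1, n) = n, C(2, n) = ceil(3n/2), C(3, n) = ceil((7n + 3)/4).
for n in range(1, 9):
    assert C(1, n) == n
    assert C(2, n) == -(-3 * n // 2)          # ceiling of 3n/2
    assert C(3, n) == -(-(7 * n + 3) // 4)    # ceiling of (7n + 3)/4
print(C(3, 4), C(4, 6))  # 8 and 14
```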
2 Structure of Optimal (2, n)-Merging Networks
In this section, we characterize the structure of optimal (2, n)-merging networks for even n. This is the key to the proofs of the lower bounds on M(m, n) for m = 3, 4, and we believe that it is interesting in its own right. Put n = 2k. Let A be an optimal (2, n)-merging network consisting of input lines x1, x2, y1, y2, . . . , yn from top to bottom. The network A contains M(2, n) = 3n/2 = 3k comparators. Consider the behavior of the network A for the two input sequences a0 = ⟨0, n + 1, 1, 2, . . . , n⟩ and a1 = ⟨n + 1, n + 2, 1, 2, . . . , n⟩. Note that the network A sorts a0 and a1. We also note that if a network sorts a0 and a1, then it is a (2, n)-merging network (this can be proved by using the zero-one principle [4, p. 224]; or see [1, Lemma 1]). Now we divide the set of comparators A into three subsets A0, A1 and A01 as follows (illustrated in the sketch below):
A0: the set of comparators that change the content of a0, but not of a1,
A1: the set of comparators that change the content of a1, but not of a0,
A01: the set of comparators that change the contents of both a0 and a1.
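The classification can be computed mechanically. The sketch below (function names are ours) runs a network on a0 and a1 and buckets each comparator. Applied to the 3-comparator (2, 2)-merger from Section 1, with lines x1, x2, y1, y2 numbered 0 to 3, it yields |A0| = |A1| = |A01| = 1 = k, matching (i) of Theorem 1 below.

```python
def classify(comparators, n):
    """Split a (2, n)-merging network into A0, A1, A01 by running it on
    a0 = <0, n+1, 1..n> and a1 = <n+1, n+2, 1..n>."""
    a0 = [0, n + 1] + list(range(1, n + 1))
    a1 = [n + 1, n + 2] + list(range(1, n + 1))
    buckets = {"A0": [], "A1": [], "A01": []}
    for c in comparators:
        i, j = c
        ch0 = a0[i] > a0[j]   # does c change the content of a0?
        ch1 = a1[i] > a1[j]   # ... of a1?
        if ch0:
            a0[i], a0[j] = a0[j], a0[i]
        if ch1:
            a1[i], a1[j] = a1[j], a1[i]
        if ch0 and ch1:
            buckets["A01"].append(c)
        elif ch0:
            buckets["A0"].append(c)
        elif ch1:
            buckets["A1"].append(c)
    return buckets

# The (2, 2)-merger [x1:y1], [x2:y2], [x2:y1] on lines 0..3.
print(classify([(0, 2), (1, 3), (1, 2)], 2))
# {'A0': [(1, 2)], 'A1': [(0, 2)], 'A01': [(1, 3)]}
```

Note also the comparator lengths here: the A0 comparator has odd length 1, while the A1 and A01 comparators have even length 2, in line with statements (iv) and (v) of Theorem 1.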
1
1 0 0 1 0 1 0 1 0 0 1 01 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 11 00 1 0 0 1 1 0 11 00 1 0 01 0 0 1 00 11 0 1 0 1 00 11 0 1 0 1 0 1 00 11 0 1 0 1 0 1 1 00 11 0 1 0 1 0 1 00 11 0 1 0 1 0 1 00 11 0 1 0 1 0 1 0 1 011 0 0 1 00 11 00 11 0 1111111111111111 0000000000000000 0 1 0 00 1 11 0 1 00 11 00 11 0 1 0 1 0 1 0 1 0 1 0 00 1 0 1 0 1 01 1 1111111111111111 0000000000000000 11 0 0 1 0 1 0 1 0 1 0 1 01 1 0 1 0 0 1 1111111111111111 0000000000000000 11 00 11 00 0 1 11 00
(a)
x1 x2 y1 y2 y3 y4 y5 y6
1
1
11 00 00 11 0 1 00 11 00 11 0 1 0 0 1 01 1 0 1 0 1 0 1 0 1 0 1 0 1 11 00 0 1 11 00 0 1 0 1 0 1 0 1 0 1 0 1 00 11 0 1 00 11 0 1 00 11
011 0 0 1 0 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0 1 0 1 1 0 011 00 01 00 11 0 0 1 0 1 1 011 1 00 11 00 11 0 1 0 1 0 1 0 00 00 0 11 1 0 1 0 1 00 11 1 0 11 1 0 1 00 0 1 0 11 1 0 1 00 0 1 0 1 0 1 0 1 0 1 00 11 0 1 1111111111111111 0000000000000000 11 00 0 1 00 11 0 1 0 1 1 0 0 1 1111111111111111 0000000000000000 11 00 0 1 0 1 0 1 0 1 1 0 1111111111111111 0000000000000000 11 00 1 0 1 0
(b)
Fig. 2. (a) Batcher’s (2, 6)-merging network; (b) a (2, 6)-merging network that uses the same number of comparators as Batcher’s network. A comparator labeled “0” (“1” and “01”) is a member of A0 (“A1 ” and “A01 ”, respectively)
Note that A0, A1 and A01 are mutually disjoint, and A0 ∪ A1 ∪ A01 = A since A is optimal. In both networks shown in Fig. 2, the three sets A0, A1 and A01 have the same size (i.e., |A0| = |A1| = |A01| = 3). Moreover, we can notice that every comparator in A0 has odd "length" and every comparator in A1 ∪ A01 has even "length". Formally, the length of a comparator c = [i : j] is defined to be (j − i), where we label the input lines with 0, 1, 2, . . . from top to bottom. Note that in the network A, a comparator [yi : yj] has length (j − i) and [xi : yj] has length j + (2 − i). In the following, we prove that these conditions are satisfied by any optimal (2, 2k)-merging network. The statements (iv) and (v) of Theorem 1 are crucial to the proofs of all our lower bounds.

Theorem 1. For every k ≥ 1 and for every optimal (2, 2k)-merging network A, which consists of lines x1, x2, y1, . . . , y2k, the following are true:
(i) |A0| = |A1| = |A01| = k.
(ii) For every 1 ≤ i ≤ 2k, there is a unique comparator in A0 ∪ A01 which is of the form [∗ : yi].
(iii) For every 1 ≤ i ≤ 2k, there is a unique comparator in A1 ∪ A01 which is of the form [∗ : yi].
(iv) Every comparator in A01 ∪ A1 has even length.
(v) Every comparator in A0 has odd length.

Proof (of (i), (ii) and (iii), Theorem 1). Recall that n = 2k. Let A be an optimal (2, 2k)-merging network. Obviously, the network consisting of the set of comparators A0 ∪ A01 sorts a0. For any network sorting a0, there are precisely n comparators that change the content of a0, and there is a unique such comparator of the form [∗ : yj] for every 1 ≤ j ≤ n. (This was observed by Aigner and Schwarzkopf [1, Section 3], and a similar observation was made by Yao and Yao [8, Lemma 1].) This implies (ii) of Theorem 1, and |A0| + |A01| = n = 2k. Hence we have |A1| = 3k − 2k = k.

Let (a0)j ((a1)j) be the content of the line j after the h-th comparison in A when a0 (a1, respectively) is given to A as input. Let us say that a line j is split after the h-th comparison if (a0)j ≠ (a1)j holds. We define dh to be the number of split lines after the h-th comparison. Thus, at the input d0 = 2, and at the end d3k = n + 2, and hence d3k − d0 = n. Since every comparator raises the value of dh by at most 1 and a comparator in A01 does not change the value of dh, we have |A0| + |A1| ≥ n = 2k. This implies |A0| ≥ k, since |A1| = k. Here we show that, for every comparator c of the form [∗ : yj] in A0, there is another comparator of the form [∗ : yj] in A1 that lies to the left of c. If this is proven, then k = |A1| ≥ |A0| ≥ k, and hence k = |A1| = |A0| = |A01| (that is, (i) of Theorem 1). Moreover, this together with (i) and (ii) of Theorem 1 implies (iii) of the theorem.

Suppose that a comparator c = [i : yj] is in A0. It is easy to check that we must have (a0)yj < (a0)i ≤ (a1)i < (a1)yj at the input of c. Thus, the line yj must be split before encountering the comparator c. Since the initial content of the line yj is (j, j), the line yj must have been part of a previous comparator. Since the content of yj is greater than or equal to j at any stage, the first comparator changing the content (j, j) is of the form [∗ : yj]. Moreover, this comparator must be in the set A1, since c is in A0, by (ii) of Theorem 1. This completes the proof of (iii) of the theorem.
Proof (of (iv), Theorem 1). Let us focus on the content of a1 in the network A. We label the lines with 1, 2, . . . , n + 2 from top to bottom. For any stage in A, the differential of line j, denoted by ej, is defined to be (a1)j − j, and the differential sequence is defined to be e1, e2, . . . , en+2. Note that the differential sequence is n, n, −2, . . . , −2 at the input and 0, 0, . . . , 0 at the output. If there is a comparator of odd length in A01 ∪ A1, then there must be a line whose differential is odd. (This is proved by focusing on the outputs of the first such comparator.) So to prove (iv), it is sufficient to show that there is no line whose differential d is odd in the network A. We shall prove this by induction on t, where d = 2t + 1.

Let c = [i : j] ∈ A01 ∪ A1 be a comparator whose length is l. Suppose that (ei, ej) is equal to (α, β) at the input of c. By (iii) of Theorem 1 and the fact that the first comparator changing the content of line j is of the form [∗ : yj], we have β = −2. Since the comparator c swaps the two inputs, we have i + α > −2 + j, and this implies α + 2 > j − i = l ≥ 1. Hence α ≥ 0. So every comparator in A01 ∪ A1
decreases the number of negative entries in the differential sequence by at most 1. Because the sequence contains precisely n = |A01 ∪ A1| negative entries at the input, every comparator in A01 ∪ A1 must decrease the number of negative entries in the sequence by exactly 1. The differential (ei, ej) is changed from (α, −2) to (−2 + l, α − l) by the comparator c. Hence, for every comparator c = [i : j] ∈ A01 ∪ A1, the differential (ei, ej) = (α, −2) before c must satisfy α ≥ l ≥ 2. This proves the base case (t = 0). For the induction step, suppose that (ei, ej) = (2t + 1, −2) at the input of some comparator c = [i : j] ∈ A01 ∪ A1 whose length is l. The differentials of the outputs of c are −2 + l and 2t + 1 − l. Since 2 ≤ l ≤ 2t + 1, one of them is odd and smaller than or equal to 2(t − 1) + 1. This contradicts the induction hypothesis. (End of the proof of (iv) of Theorem 1.)
Proof (of (v), Theorem 1). To prove (v) of Theorem 1, we will show that the number of comparators having odd length is greater than or equal to the number having even length for any network of size n that sorts a0. This implies (v) of Theorem 1 because (a) A0 ∪ A01 sorts a0, (b) |A0| = |A01| = n/2 (Theorem 1, (i)) and (c) every comparator in A01 has even length (Theorem 1, (iv)). To show this, we analyze an arbitrary network with n comparisons sorting a = ⟨n + 1, 1, 2, . . . , n⟩, which consists of the lines 1, 2, . . . , n + 1. Aigner and Schwarzkopf [1] observed that, for every such network, the content a1, . . . , an+1 after the r-th comparison can be described as follows: the set of lines {1, 2, . . . , n + 1} is uniquely divided into groups F1 | F2 | · · · | Fr+1 from top to bottom such that (i) i ∈ Fk, j ∈ Fl, k < l ⇒ ai < aj, and (ii) if Fk = {fk, fk + 1, . . . , fk+1 − 1} then afk+1 < · · · < afk+1−1 < afk. In other words, within each group the top element is the largest, with the others appearing in their natural order. A comparator [i : j] changes the content of a if and only if i and j are in a common group Fk with i = fk < j ≤ fk+1 − 1. By this comparator the group Fk splits into the two groups Fk^u = {fk, fk + 1, . . . , j − 1} and Fk^l = {j, j + 1, . . . , fk+1 − 1}, satisfying (i) and (ii) when we replace Fk by Fk^u and Fk^l.

For a network M, let o(M) (e(M)) be the number of comparators having odd length (even length, respectively) in M. Let T(k) be the minimum value of o(Ak) − e(Ak), where Ak ranges over all networks with input lines 1, 2, . . . , k + 1 consisting of k comparators that sort a sequence ⟨a1, a2, . . . , ak+1⟩ with a2 < a3 < · · · < ak+1 < a1. Then T(k) can be expressed recursively as follows:

T(k) = min_{1≤i≤k} { (−1)^{i−1} + T(i − 1) + T(k − i) }.    (2)
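Recurrence (2) is easy to evaluate; the short check below (our own code) confirms the pattern proved next, namely T(k) = 0 for even k and T(k) = 1 for odd k.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(k):
    """Recurrence (2) with base case T(0) = 0."""
    if k == 0:
        return 0
    return min((-1) ** (i - 1) + T(i - 1) + T(k - i) for i in range(1, k + 1))

print([T(k) for k in range(10)])  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```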
We are going to show that T(k) = 0 for every even k and T(k) = 1 for every odd k, by induction on k. The base cases T(0) = 0 and T(1) = 1 are obvious. If k is even, then by Eq. (2) and the induction hypothesis we have T(k) = min{ −1 + min{T(k1)} + min{T(k2)}, . . .

. . . 3, polynomially bounded max wk-sat-B as well as the general minimization and maximization versions of integer-linear programming are in 0-DAPX.

Theorem 3. Under ≤D, NPO-complete = 0-DAPX-complete ⊆ 0-DAPX.

A natural question arising from the above is: what is the relation between NPO-complete and 0-DAPX? Taking into consideration the fact that 0-DAPX is the hardest differential approximability class in NPO, one might guess that NPO-complete ≡ 0-DAPX, but in order to prove it we need a stronger reducibility. We show in [10] that, by defining a special kind of Turing-reduction, one can prove that NPO-complete = 0-DAPX-complete = 0-DAPX.
3 Differential APX-Completeness
Let us now address the problem of completeness in the class DAPX. Note first that a careful reading of the proof of the standard APX-completeness of max
wsat-B given in [2] also establishes the following proposition, which will be used in what follows.

Proposition 1. Let Π ∈ APX. There exist three polynomially computable functions f, g and cρ : ]0, 1[∩Q → ]0, 1[∩Q such that, ∀x ∈ IΠ, ∀z ∈ solΠ(x), ∀ρ ∈ ]0, 1[: (1) f(x, z, ρ) = (φx,z,ρ, Wx,z,ρ, wx,z,ρ) with (φx,z,ρ, wx,z,ρ) ∈ Imax wsat; (2) ∀y ∈ solmax wsat(f(x, z, ρ)), g(x, z, ρ, y) ∈ solΠ(x); (3) if γΠ(x, z) ≥ ρ, then f(x, z, ρ) is an instance of max wsat-B and, for any solution y of f(x, z, ρ) and any ε, if γmax wsat-B(f(x, z, ρ), y) ≥ 1 − cρ(ε), then γΠ(x, g(x, z, ρ, y)) ≥ 1 − ε.

We now define a notion of reducibility preserving polynomial time differential approximation schemata, called DPTAS-reduction in what follows.

Definition 3. Let Π, Π′ ∈ NPO. Then Π ≤DPTAS Π′ if there exist two functions f, g and a function c : ]0, 1[∩Q → ]0, 1[∩Q, all computable in polynomial time, such that: (i) ∀x ∈ IΠ, ∀ε ∈ ]0, 1[∩Q, f(x, ε) ∈ IΠ′; f is possibly multivalued; (ii) ∀x ∈ IΠ, ∀ε ∈ ]0, 1[∩Q, ∀y ∈ solΠ′(f(x, ε)), g(x, y, ε) ∈ solΠ(x); (iii) ∀x ∈ IΠ, ∀ε ∈ ]0, 1[∩Q, ∀y ∈ solΠ′(f(x, ε)), δΠ′(f(x, ε), y) ≥ 1 − c(ε) ⇒ δΠ(x, g(x, y, ε)) ≥ 1 − ε; if f is multivalued, i.e., f = (f1, . . . , fi) for some i polynomial in |x|, then the former implication becomes: ∀x ∈ IΠ, ∀ε ∈ ]0, 1[∩Q, ∀y ∈ solΠ′((f1, . . . , fi)(x, ε)), ∃j ≤ i such that δΠ′(fj(x, ε), y) ≥ 1 − c(ε) ⇒ δΠ(x, g(x, y, ε)) ≥ 1 − ε.

It is easy to see that, given two NPO problems Π and Π′, if Π ≤DPTAS Π′ and Π′ ∈ DAPX, then Π ∈ DAPX.

Let Π ∈ DAPX and let T be a differential ρ-approximation algorithm for Π, with ρ ∈ ]0, 1[. There exists a polynomial p such that ∀x ∈ IΠ, |ω(x) − opt(x)| ≤ 2^{p(|x|)}. An instance x ∈ IΠ can be written in terms of an integer linear program as: x : opt v(y) subject to y ∈ Cx, where Cx is the constraint set of x. For any i ∈ {0, . . . , p(|x|)} and for any l ∈ N, we define xi,l by: xi,l : max[vi,l(y) = ⌊v(y)/2^i⌋ − l] subject to y ∈ Cx, if Π is a maximization problem, or xi,l : min[vi,l(y) = l − ⌊v(y)/2^i⌋] subject to y ∈ Cx, if Π is a minimization problem. Any xi,l can be considered as an instance of an NPO problem denoted by Πi,l. Then, the following proposition holds.
called ID-reduction; it trivially transforms a differential polynomial time approximation schema into a standard polynomial time approximation schema. In other words, we prove that Π ≤D S max wsat-B ≤PTAS max independent set-B ≤ID max independent set-B The composition of the three reductions, i.e., the one from Π to max wsat-B, the one from max wsat-B to max independent set-B and the ID-reduction, is a DPTAS reduction transforming a differential polynomial time approximation schema for max independent set-B into a differential polynomial time approximation schema for Π, i.e., max independent set-B ∈ DAPX-complete. Theorem 4. max independent set-B is DAPX-complete. Proof. We sketch here the part ∀Π ∈ DAPX, Π ≤D S max wsat-B (we assume integer valued problems; extension to the case of rational values is immediate). Remark that given a formula φ, a variable-weight system w and a constant B, one can decide in polynomial time if (φ, B, w) ∈ Imax wsat-B . Since Π is in DAPX, let T be a polynomial algorithm that guarantees differential ratio ρ ∈]0, 1[. Let < min{ρ, 1/2}. For any ζ > 0, we denote by Oζ an oracle that, for any instance x of max wsat-B, computes a feasible solution Oζ (x) ∈ solmax wsat-B guaranteeing γmax wsat-B (x, Oζ ) 1 − ζ. We construct an algorithm A (this is the component of ≤D S transforming solutions for max wsat-B into solutions for Π) using this oracle such that: A guarantees differential approximation ratio 1 − for Π and, in the case where Oζ is polynomial (in other words, Oζ can be seen as a polynomial time approximation schema), A is also polynomial. The ≤D S -reduction claimed is based upon the construction of a family F of instances xi,l : F = {xi,l : (i, l) ∈ F }, where F is of polynomial size and contains a pair (io , lo ) such that: either i0 = 0, 2i0 | opt(x) − ω(x)| 2i0 +1 and l0 = ω(x)/2i0 , or i0 = 0, | opt(x) − ω(x)| 2 and l0 = ω(x). For instance xi0 ,l0 the worst value is 0; henceforth standard and differential ratios coincide. In other words, δΠi0 ,l0 (xi0 ,l0 , z) = γΠi0 ,l0 (xi0 ,l0 , z), for all feasible z. Moreover, for i0 = 0, δΠ (x, z) = δΠ0,ω(x) (x0,ω(x) , z) = γΠ0,ω(x) (x0,ω(x) , z). We first suppose that F can be constructed in polynomial time. For each (i, l) ∈ F , we consider the three functions gi,l , fi,l and ci,l (Proposition 1) for the instance xi,l . We set = min{(ci,l )ρ (), (ci,l )(ρ−)/(1+) (/3) : (i, l) ∈ F } and define, for (i, l) ∈ F , η = ρ if i = 0; otherwise, η = (ρ−)/(1+). Let z = T(x); then, for any (i, l) ∈ F , we set zi,l = gi,l (xi,l , z, η, O (fi,l (xi,l , z, η))), if fi,l (xi,l , z, η) is an instance of max wsat-B; otherwise we set zi,l = z. Remark that zi,l is a feasible solution for xi,l and, consequently, for x. In all, A constructs zi,l for each (i, l) ∈ F and selects the best among them as solution for x. Next, we prove that A achieves differential approximation ratio 1 − . Using Propositions 1 and 2, we can show that δΠ (x, zi0 ,l0 ) 1 − . Since (i0 , l0 ) ∈ F , A has already computed the solution zi0 ,l0 . By taking into account that the solution finally returned by A is the best among the computed ones, we immediately
conclude that it is at least as good as zi0,l0. Therefore, it guarantees ratio 1 − ε. Finally, we prove that F can be constructed in polynomial time. The steps sketched just above show that ∀Π ∈ DAPX, Π ≤D_S max wsat-B.

Theorem 5. min vertex cover-B, max set packing-B and min set cover-B are DAPX-complete under DPTAS-reductions. Furthermore, max independent set, min vertex cover, max set packing, min set cover, max clique and max k-colorable induced subgraph are DAPX-hard under DPTAS-reductions.
4 Differential PTAS-Hardness
In this section we consider the class DPTAS and address the problem of completeness in this class. Consider the following reduction preserving fully polynomial time differential approximation schemata, denoted DFPTAS-reduction in what follows.

Definition 4. Assume two NPO problems Π and Π′. Then Π ≤DFPTAS Π′ if there exist three functions f, g and c such that: (i) f and g are as for the PTAS-reduction (Section 1); (ii) c : (]0, 1[∩Q) × IΠ → ]0, 1[∩Q; its time complexity and its value are polynomial in both |x| and 1/ε; (iii) ∀x ∈ IΠ, ∀ε ∈ ]0, 1[∩Q, ∀y ∈ solΠ′(f(x, ε)), δΠ′(f(x, ε), y) ≥ 1 − c(ε, x) ⇒ δΠ(x, g(x, y, ε)) ≥ 1 − ε.

Obviously, given two NPO problems Π and Π′, if Π ≤DFPTAS Π′ and Π′ ∈ DPTAS, then Π ∈ DPTAS. In the following we study completeness not for the whole class DPTAS but for a subclass DPTASp, mainly consisting of the maximization problems of DPTAS the worst value of which is computable in polynomial time (this class includes, in particular, maximization problems with worst value 0). Recall that the first problem proved PTAS-complete (under FPTAS-reduction) is max linear wsat-B ([2]).

Consider two problems Π ∈ DPTASp and Π′, instances x ∈ IΠ and x′ ∈ IΠ′ of which, respectively, are expressed in terms of integer linear programs as: x : opt v(y) subject to y ∈ Cx, and x′ : opt v(y′) − ω(x) subject to y′ ∈ Cx′, with Cx′ ≡ Cx. Obviously, δΠ(x, y) = δΠ′(x′, y′) = γΠ′(x′, y′) and, moreover, Π and Π′ both belong to DPTASp; also, Π′ ∈ PTAS and Π′ ≤FPTAS max linear wsat-B. So, for any Π ∈ DPTASp, Π ≡D Π′ ≤FPTAS max linear wsat-B; the reduction ≡D ◦ ≤FPTAS is a DFPTAS-reduction.

Consider now the closure DPTASp^AF of DPTASp under affine transformations of the objective functions of its problems. min vertex cover in planar graphs is in DPTASp^AF \ DPTASp.
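The fact that affine transformations preserve the differential ratio, which is what makes the ≤AF steps below harmless, is a one-line computation; here we assume the standard definition of the differential ratio from [7], δΠ(x, y) = (ω(x) − v(y)) / (ω(x) − opt(x)). For v′(·) = λv(·) + μ with λ ≠ 0, the worst and optimal values of the transformed problem are the images λω(x) + μ and λopt(x) + μ of the original ones, hence

    δ′(x, y) = λ(ω(x) − v(y)) / (λ(ω(x) − opt(x))) = δΠ(x, y),

so a differential approximation schema transfers unchanged across an affine transformation.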
Let any Π ∈ DPTASp^AF and Π′ be its "affine mate" in DPTASp. Then Π ≤AF Π′ ≡D Π′′ ≤FPTAS max linear wsat-B and since, obviously, the reduction ≤AF ◦ ≡D ◦ ≤FPTAS is a DFPTAS-one, the following proposition holds.

Proposition 3. max linear wsat-B is DPTASp^AF-hard under ≤DFPTAS.
5 MAX-SNP and Differential GLO
In the theory of approximability of optimization problems based upon the standard approximation ratio, interesting results have been obtained by studying the behavior of local search heuristics and the degree of approximation that such heuristics can achieve. In particular, in [8,16], the class GLO is defined as the class of NPO-PB problems whose local optima have a guaranteed quality with respect to the global optima. Of course, the differential counterpart of GLO, called DGLO in what follows, can be defined analogously. In [17] it is shown that max cut, min dominating set-B, max independent set-B, min vertex cover-B, max set packing-B, min coloring, min set cover-B, min set w(K)-cover-B, min feedback edge set, min feedback vertex set-B and min multiprocessor scheduling are included in DGLO. Furthermore, in [18] it is proved that both min and max tsp on graphs with polynomially bounded edge-distances are also included in DGLO.

Let us now consider the relationship of DGLO with respect to the differential approximability class DAPX. Let DGLO^DPTAS be the closure of DGLO under ≤DPTAS. Analogously, GLO^PTAS is defined in [16], where it is also proved that GLO^PTAS = APX. It is easy to show that the same holds for differential approximation.
Proposition 4. DAPX = DGLO^DPTAS.
Among other interesting properties of the class GLO, in [8] it is proved that max 3-sat is complete in GLO ∩ MAX-SNP with respect to the LOP-reduction. A related result in [19] shows that MAX-SNP ⊆ Non-Oblivious GLO, a variant of the class GLO defined by means of local search algorithms that are allowed to use more general kinds of objective functions, rather than the natural objective function of the given problem, for improving the quality of the solution. In what follows, we show the existence of complete problems for a large, natural subclass of DGLO. As one can see from the definition of the LOP-reduction in Section 1, the local-optimality-preserving properties do not depend on the approximation measure adopted. Hence, in an analogous way, we define here a reduction called DLOP, which is a DPTAS-reduction with the same local-optimality-preserving properties as those of a LOP-reduction (Section 1).

Definition 5. A DLOP-reduction is a DPTAS-reduction with the same surjectivity, partial monotonicity, locality and dominance properties as an LOP-reduction.

Obviously, given two NPO problems Π and Π′, if Π ≤DLOP Π′ and Π′ ∈ DGLO, then Π ∈ DGLO. Let DGLO0 be the class of MAX-SNP maximization problems that belong to DGLO and for which the worst value 0 is feasible for any instance (max independent set-B, for example, is such a problem). Note that for the problems of DGLO0, the standard and differential approximation ratios coincide. Now
let us consider the closure of DGLO0 under affine transformations. This leads to the following definition.

Definition 6. Let Π be a polynomially bounded NPO problem. Then Π ∈ DGLO′ if (i) it belongs to DGLO0, or (ii) it can be transformed into a problem in DGLO0 by means of an affine transformation; in other words, DGLO′ = DGLO0^AF.

Theorem 6. ∀Π ∈ DGLO′, Π ≤DLOP max independent set-B.

Proof. Assume Π ∈ DGLO′. We then have the following two cases: (i) Π ∈ DGLO0, or (ii) Π can be transformed into a problem in DGLO0 by means of an affine transformation. Dealing with case (i), note that for DGLO0 an LOP-reduction is also a DLOP-one, and that the L-reduction of any Π ∈ GLO (hence in DGLO0) is an LOP-reduction ([8]). We can show that both L-reductions in [3], from max 3-sat to max 3-sat-B and from max 3-sat-B to max independent set-B, are also LOP-ones. So the result follows. Dealing with case (ii), since an affine transformation is a DLOP-reduction, Π ≤DLOP Π′ and, by case (i), Π′ ≤DLOP max independent set-B.

Proposition 5. max cut, min vertex cover-B, max set packing-B and min set cover-B are DGLO′-complete under DLOP-reductions.

Note that min multiprocessor scheduling, and even min and max tsp on graphs with polynomially bounded edge-distances, belong to DGLO ([17,18]) but neither to GLO nor to DGLO′. On the other hand, min vertex cover-B belongs to DGLO′ but not to MAX-SNP.
References
1. Orponen, P., Mannila, H.: On approximation preserving reductions: complete problems and robust measures. Technical Report C-1987-28, Dept. of Computer Science, University of Helsinki, Finland (1987)
2. Crescenzi, P., Panconesi, A.: Completeness in approximation classes. Inform. and Comput. 93 (1991) 241–262
3. Papadimitriou, C.H., Yannakakis, M.: Optimization, approximation and complexity classes. J. Comput. System Sci. 43 (1991) 425–440
4. Ausiello, G., Crescenzi, P., Protasi, M.: Approximate solutions of NP optimization problems. Theoret. Comput. Sci. 150 (1995) 1–55
5. Crescenzi, P., Trevisan, L.: On approximation scheme preserving reducibility and its applications. In: Foundations of Software Technology and Theoretical Computer Science, FCT-TCS. Number 880 in Lecture Notes in Computer Science, Springer-Verlag (1994) 330–341
6. Ausiello, G., D'Atri, A., Protasi, M.: On the structure of combinatorial problems and structure preserving reductions. In: Proc. ICALP'77. Lecture Notes in Computer Science, Springer-Verlag (1977)
7. Demange, M., Paschos, V.T.: On an approximation measure founded on the links between optimization and polynomial approximation theory. Theoret. Comput. Sci. 158 (1996) 117–141
8. Ausiello, G., Protasi, M.: NP optimization problems and local optima. In Alavi, Y., Schwenk, A., eds.: Graph theory, combinatorics and applications. Proc. 7th Quadrennial International Conference on the Theory and Applications of Graphs. Volume 2. (1995) 957–975
9. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and approximation. Combinatorial optimization problems and their approximability properties. Springer, Berlin (1999)
10. Ausiello, G., Bazgan, C., Demange, M., Paschos, V.T.: Completeness in differential approximation classes. Cahier du LAMSADE 204, LAMSADE, Université Paris-Dauphine (2003). Available at http://www.lamsade.dauphine.fr/cahiers.html
11. Crescenzi, P., Kann, V., Silvestri, R., Trevisan, L.: Structure in approximation classes. SIAM J. Comput. 28 (1999) 1759–1782
12. Monnot, J.: Differential approximation results for the traveling salesman and related problems. Inform. Process. Lett. 82 (2002) 229–235
13. Hassin, R., Khuller, S.: z-approximations. J. Algorithms 41 (2001) 429–442
14. Bazgan, C., Paschos, V.T.: Differential approximation for optimal satisfiability and related problems. European J. Oper. Res. 147 (2003) 397–404
15. Toulouse, S.: Approximation polynomiale: optima locaux et rapport différentiel. PhD thesis, LAMSADE, Université Paris-Dauphine (2001)
16. Ausiello, G., Protasi, M.: Local search, reducibility and approximability of NP optimization problems. Inform. Process. Lett. 54 (1995) 73–79
17. Monnot, J., Paschos, V.T., Toulouse, S.: Optima locaux garantis pour l'approximation différentielle. Technical Report 203, LAMSADE, Université Paris-Dauphine (2002). Available at http://www.lamsade.dauphine.fr/cahdoc.html#cahiers
18. Monnot, J., Paschos, V.T., Toulouse, S.: Approximation algorithms for the traveling salesman problem. Mathematical Methods of Operations Research 57 (2003) 387–405
19. Khanna, S., Motwani, R., Sudan, M., Vazirani, U.: On syntactic versus computational views of approximability. SIAM J. Comput. 28 (1998) 164–191
On the Length of the Minimum Solution of Word Equations in One Variable

Kensuke Baba¹, Satoshi Tsuruta², Ayumi Shinohara¹,², and Masayuki Takeda¹,²

¹ PRESTO, Japan Science and Technology Corporation (JST)
² Department of Informatics, Kyushu University 33, Fukuoka 812-8581, Japan
{baba,s-tsuru,ayumi,takeda}@i.kyushu-u.ac.jp
Abstract. We show a tight upper bound on the length of the minimum solution of a word equation L = R in one variable, in terms of the differences between the positions of corresponding variable occurrences in L and R. By introducing this notion of difference, the proof is obtained from Fine and Wilf's theorem. As a corollary, the bound implies that the length of the minimum solution is less than N = |L| + |R|.
1
Introduction
Word equations can be used to describe several features of strings; for example, they generalize the pattern matching problem [3,4] with variables. Fig. 1 shows an example of a word equation. The fundamental work on word equations is Makanin's algorithm [10], which decides whether a word equation has a solution (see [9] for a survey on this topic). Plandowski [11] introduced a PSPACE algorithm, which gives the best upper bound known so far. On the other hand, the problem is known to be NP-hard [1]. An approach to the problem is to analyze word equations with a restricted number of variables. Charatonik and Pacholski [2], and Ilie and Plandowski [7], introduced polynomial time algorithms for word equations in two variables. As to word equations in one variable, there is an efficient algorithm by Obono et al. [6] which solves a word equation L = R in O(N log N) time in terms of N = |L| + |R|. Dąbrowski and Plandowski [5] presented an algorithm of O(N + m log N) time complexity, where m is the number of occurrences of the variable x. However, the upper bound of the length of the minimum solution of word equations is not exactly understood even for the one-variable version. Let χ be this upper bound, that is, a word equation has a solution if and only if there exists a solution A of length |A| ≤ χ. For any word equation in one variable, we can choose a single candidate for the solution of each length; therefore we have only to check at most χ candidates to decide whether a word equation has a solution. Indeed, no χ leads to a better result for the complexity as long as it is proportional to N, but from a practical viewpoint, χ is quite important. In [6], χ is taken to be equal to 4N without precise proof. Hence, we need to reduce χ as much as possible and prove it formally.
Fig. 1. An example of a word equation in one variable: for characters a, b and variable x, the word equation xxbaababa = ababaxabx has the solution x = ababaababa
In this paper, we show a tight upper bound on the minimum solution of one-variable word equations, by introducing a new measure in terms of the positions of variable occurrences. The bound reveals that χ is less than N. We now explain the basic idea briefly. A word equation in one variable is nontrivial only if both sides of the equation have the same number of occurrences of the variable: otherwise, the length of a possible solution is exactly determined by an integer equation on the lengths of the two sides and the numbers of variable occurrences. Let m be the number of occurrences. We focus on the fact that, for a word equation L = R, the "gap" between the k-th occurrence of the variable x in L and the k-th occurrence in R is preserved for any substitution of a string A, as the gap between the corresponding occurrences of A in L[A/x] and R[A/x]. We denote the gaps by d_k (1 ≤ k ≤ m). In the example in Fig. 1, d₁ = 5 and d₂ = 7. By utilizing this notion, the proof of the upper bound is essentially reducible to one for a word equation which has only one occurrence of x on each side. If A is a solution and is longer than d_k, then the k-th pair of occurrences of A overlap each other, that is, d_k is a period of A. Therefore, by Fine and Wilf's theorem [9], the upper bound is max_{1≤k≤m} {d_k + p − gcd(d_k, p)} − 1 for a period p of A. Since the minimum length of p is not larger than min_{1≤k≤m, d_k≠0} d_k, the tight upper bound will be given as max_{1≤k≤m} d_k + min_{1≤k≤m, d_k≠0} d_k − 2. Obviously, min_{1≤k≤m, d_k≠0} d_k ≤ max_{1≤k≤m} d_k < |L|. Thus χ is less than N = 2|L|.
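To make the notion of gaps concrete, the following short Python sketch (our illustration; the function names are ours, not the paper's) computes the gaps d_k and the resulting bound max_{1≤k≤m} d_k + min_{1≤k≤m, d_k≠0} d_k − 2 for the equation of Fig. 1.

```python
def gaps(L, R, var="x"):
    """Gaps d_k = |l_k - r_k| between the 1-based positions of the
    k-th occurrence of the variable in L and in R."""
    pos_L = [i + 1 for i, c in enumerate(L) if c == var]
    pos_R = [i + 1 for i, c in enumerate(R) if c == var]
    assert len(pos_L) == len(pos_R), "nontrivial case: same number of occurrences"
    return [abs(l - r) for l, r in zip(pos_L, pos_R)]

d = gaps("xxbaababa", "ababaxabx")
print(d)                                   # [5, 7]
nonzero = [dk for dk in d if dk != 0]
print(max(d) + min(nonzero) - 2)           # 10, matched by x = ababaababa
```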
2
Preliminaries
Let Σ be an alphabet and x ∉ Σ be a variable. The empty word is denoted by ε. The length of a word w is denoted by |w|, where |ε| = 0 and |x| = 1. The i-th element of a word w is denoted by w[i] for 1 ≤ i ≤ |w|. The word w[i]w[i+1]···w[j] is called a subword of w, and is denoted by w[i : j]. In particular, it is called a prefix if i = 1 and a suffix if j = |w|. For convenience, let w[i : j] = ε for j < i. A period of a non-empty word w is defined as an integer 0 < p ≤ |w| such that w[i] = w[i + p] for any 1 ≤ i ≤ |w| − p. Note that |w| is always a period of w. Proposition 1 (Fine and Wilf). Let p, q be periods of a word w. If |w| ≥ p + q − gcd(p, q), then gcd(p, q) is also a period of w.
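As a quick illustration of periods and of Proposition 1 (ours, under the definitions above), the following sketch computes all periods of a word; the example word is the minimum solution from Fig. 1, which has the two coprime periods 5 and 7, while its length 10 = 5 + 7 − 2 falls just below the Fine and Wilf threshold.

```python
def periods(w):
    """All periods p of w, i.e. 0 < p <= |w| with w[i] == w[i+p] for valid i."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]

w = "ababaababa"           # the minimum solution from Fig. 1
ps = periods(w)
print(ps)                  # contains 5 and 7
# Fine and Wilf: |w| = 10 < 5 + 7 - gcd(5, 7) = 11, so gcd(5, 7) = 1
# need not be a period -- and indeed 1 is not a period of w.
assert 5 in ps and 7 in ps and 1 not in ps
```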
A word equation (in one variable) is a pair of words over Σ ∪ {x} and is usually written by connecting the two words with "=". A solution of a word equation L = R is a homomorphism σ : (Σ ∪ {x})∗ → Σ∗ leaving the letters of Σ invariant and such that σ(L) = σ(R). Since the solution is uniquely determined by the image of x in Σ∗, in this paper we define a solution to be a word A ∈ Σ∗ such that A = σ(x). Therefore, we can rewrite the condition σ(L) = σ(R) as L[A/x] = R[A/x], where the result w[A/x] of the substitution of A for x in a word w is defined inductively as: if w = ε, w[A/x] = ε; if w = a ∈ Σ, w[A/x] = a; if w = x, w[A/x] = A; if w = w₁w₂, w[A/x] = w₁[A/x]w₂[A/x]. If two words L and R have the same prefix M, the solution of a word equation L = R is obtained by solving the word equation L′ = R′ where L = ML′ and R = MR′. Therefore, we can assume without loss of generality that any word equation is of the form xL₁ = BxR₁ for a non-empty word B which has no variable and words L₁, R₁. This form implies that any solution A is a prefix of the word B^k for a natural number k. By a similar argument for suffixes, we can assume that either L₁ or R₁ ends with x. In particular, if L and R have exactly one occurrence of x respectively, the word equation L = R can be reduced to the form xC = Bx for non-empty words B, C which have no variable. We denote by ♯_x(w) the number of occurrences of the variable x in a word w. If a word equation L = R has a solution A, the length of L[A/x] is the same as the length of R[A/x]. Hence we have |L| + ♯_x(L)·(|A| − 1) = |R| + ♯_x(R)·(|A| − 1), and therefore, if ♯_x(L) ≠ ♯_x(R), |A| = (|L| − |R|)/(♯_x(R) − ♯_x(L)) + 1. In this case the length of the solution is determined uniquely by the word equation, and its upper bound is |(|L| − |R|)/(♯_x(R) − ♯_x(L))| + 1 ≤ max(|L|, |R|). If ♯_x(L) = ♯_x(R), we have |L| = |R|. Proposition 2 ([6]). Let L = R be a word equation. (i) If ♯_x(L) ≠ ♯_x(R), the length of the solution is determined uniquely with respect to L = R and is at most max(|L|, |R|). (ii) If ♯_x(L) = ♯_x(R), L = R has a solution only if |L| = |R|.
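The reduction above suggests a simple decision procedure, sketched below (our code, with hypothetical helper names; the cutoff argument chi anticipates the upper bound χ of the next section): since any solution is a prefix of B^k, we substitute each candidate prefix and compare both sides.

```python
def substitute(w, A, var="x"):
    """Compute w[A/x]: replace every occurrence of the variable by A."""
    return "".join(A if c == var else c for c in w)

def minimum_solution(L, R, chi, var="x"):
    """Try every prefix of B^k (B = the constant prefix of the side not
    starting with the variable) up to length chi; return the shortest one."""
    B = (R if L.startswith(var) else L).split(var)[0]
    reps = chi // max(len(B), 1) + 1
    for length in range(chi + 1):
        A = (B * reps)[:length]
        if substitute(L, A, var) == substitute(R, A, var):
            return A
    return None  # no solution of length <= chi, hence no solution at all

print(minimum_solution("xxbaababa", "ababaxabx", chi=10))  # ababaababa
```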
3
Solutions
We show the upper bound of the length of the minimum solution of word equations in one variable. By Proposition 2, we have only to consider a word equation L = R in the situation that ♯_x(L) = ♯_x(R) and |L| = |R|. Let m = ♯_x(L) = ♯_x(R) and n = |L| = |R|. We denote by ℓ^x_1, ..., ℓ^x_m and r^x_1, ..., r^x_m the positions of the occurrences of x in L and R, respectively, in increasing order. We define ℓ^A_k and r^A_k for a word A and 1 ≤ k ≤ m as ℓ^A_k = ℓ^x_k + (k − 1)(|A| − 1) and r^A_k = r^x_k + (k − 1)(|A| − 1).
[Figure omitted.] Fig. 2. The difference ℓ^x_k − r^x_k is equal to the difference ℓ^A_k − r^A_k for any A
ℓ^A_k is, intuitively, the position in L[A/x] of the occurrence of A substituted for the k-th occurrence of x in L (which is not always the k-th occurrence of A). Therefore, ℓ^A_k − r^A_k is the difference between it and the position of the corresponding occurrence of A in R[A/x]. The difference does not depend on the length of A; see Fig. 2.

Proposition 3. For any word A, any word equation L = R, and any integer 1 ≤ k ≤ m,
(i) ℓ^A_k − r^A_k = ℓ^x_k − r^x_k,
(ii) L[A/x][ℓ^A_k : ℓ^A_k + |A| − 1] = R[A/x][r^A_k : r^A_k + |A| − 1] = A.

Proof. (i) Trivial by the definition. (ii) We prove it for L. By the definition of substitution, L[A/x] is represented as

L[A/x] = L[1 : ℓ^x_1 − 1] A L[ℓ^x_1 + 1 : ℓ^x_2 − 1] ··· L[ℓ^x_{k−1} + 1 : ℓ^x_k − 1] A L[ℓ^x_k + 1 : ℓ^x_{k+1} − 1] ··· L[ℓ^x_{m−1} + 1 : ℓ^x_m − 1] A L[ℓ^x_m + 1 : n].   (1)

The length of the prefix of L[A/x] which ends with L[ℓ^x_{k−1} + 1 : ℓ^x_k − 1] equals (ℓ^x_1 − 1) + Σ_{i=2}^{k} {(ℓ^x_i − 1) − (ℓ^x_{i−1} + 1) + 1} + (k − 1)|A| = ℓ^x_k − k + (k − 1)|A| = ℓ^A_k − 1 for any 1 ≤ k ≤ m. Thus ℓ^A_k is the position of the occurrence of A that follows L[ℓ^x_{k−1} + 1 : ℓ^x_k − 1] on the right-hand side of Eq. (1).

We denote by d_k the absolute value of the difference, that is, d_k = |ℓ^x_k − r^x_k| for 1 ≤ k ≤ m. Then we have the following lemma.

Lemma 1. Let A be a solution of a word equation L = R. For 1 ≤ k ≤ m with d_k ≠ 0, if |A| ≥ d_k then A has period d_k.
Proof. We can assume r^x_k < ℓ^x_k without loss of generality. If |A| = d_k, then d_k is a period of A by definition. If |A| > d_k, then by Proposition 3 (i), ℓ^A_k = r^A_k + ℓ^x_k − r^x_k = r^A_k + d_k < r^A_k + |A|. Since A is a solution of L = R, we consider subwords of L[A/x] and R[A/x]; then L[A/x][ℓ^A_k : r^A_k + |A| − 1] = R[A/x][ℓ^A_k : r^A_k + |A| − 1]. By Proposition 3 (ii), L[A/x][ℓ^A_k : r^A_k + |A| − 1] = A[1 : |A| − (ℓ^A_k − r^A_k)] and R[A/x][ℓ^A_k : r^A_k + |A| − 1] = A[1 + (ℓ^A_k − r^A_k) : |A|]. Thus, by Proposition 3 (i), A[1 : |A| − (ℓ^x_k − r^x_k)] = A[1 + (ℓ^x_k − r^x_k) : |A|], which implies that ℓ^x_k − r^x_k is a period of A.

Lemma 2. Let A be a solution of a word equation L = R and let p be a period of A. If

|A| ≥ max_{1≤k≤m} d_k + p − 1,

then the prefix A[1 : |A| − p] of A is also a solution of L = R.

Proof. We prove it by induction on the number m = ♯_x(L) = ♯_x(R).
(Base step) By the argument in Section 2, we can assume L = xC and R = Bx with B, C ∈ Σ⁺. By Lemma 1, d₁ = |B| is a period of A. By Proposition 1, gcd(d₁, p) is a period of A; moreover, it is also a period of AC and of BA. Since A[1 + gcd(d₁, p) : |A|] = A[1 : |A| − gcd(d₁, p)], we have A[1 : |A| − k·gcd(d₁, p)] C = (AC)[1 + k·gcd(d₁, p) : |AC|] = (BA)[1 + k·gcd(d₁, p) : |BA|] = B A[1 : |A| − k·gcd(d₁, p)] for any natural number k such that k·gcd(d₁, p) ≤ |A|; in particular, choosing k with k·gcd(d₁, p) = p (possible since gcd(d₁, p) divides p) shows that A[1 : |A| − p] is a solution.
(Induction step) We can assume L = L′xC and R = R′xBx with L′, R′ ∈ (Σ ∪ {x})⁺ and B, C ∈ Σ⁺. Then we have d_m = |C| and L′[A/x]AC = R′[A/x]ABA. If |C| ≤ |B|, the result is obtained by applying the induction hypothesis to the two equations L′ = R′xB[1 : |B| − |C|] and xC = B[|B| − |C| + 1 : |B|]x. If |C| > |B|, we have |ABA| > |AC| > |BA| by the assumption |A| ≥ max_{1≤k≤m} d_k + p − 1. Hence the occurrence of A starting at ℓ^A_m in L[A/x] and the occurrence of A starting at r^A_{m−1} in R[A/x] have a non-trivial overlap Q. (This situation is illustrated in Fig. 3.) Now we consider the two equations L′Q = R′x and xC = QBx. The assumption L′[A/x]AC = R′[A/x]ABA implies L′[A/x]Q = R′[A/x]A and AC = QBA, that is, A is a solution of both equations. Then, by the induction hypothesis, we have L′[A′/x]Q = R′[A′/x]A′ and A′C = QBA′, where A′ = A[1 : |A| − p]. Thus, we have L[A′/x] = L′[A′/x]A′C = L′[A′/x]QBA′ = R′[A′/x]A′BA′ = R[A′/x].

Theorem 1 (Tight upper bound). For any word equation L = R such that ♯_x(L) = ♯_x(R), the length of the minimum solution is at most

max_{1≤k≤m} d_k + min_{1≤k≤m, d_k≠0} d_k − 2.

The bound is tight.
[Figure omitted.] Fig. 3. If |C| > |B| and |A| ≥ d_m = |C|, then |ABA| > |AC| > |BA| and the two occurrences of A starting at ℓ^A_m and r^A_{m−1} have an overlap Q
Proof. Assume a word equation has a solution A such that |A| ≥ max_{1≤k≤m} d_k + min_{1≤k≤m, d_k≠0} d_k − 1. By Lemma 1, A has a period p ≤ min_{1≤k≤m, d_k≠0} d_k. Hence, by Lemma 2, A[1 : |A| − p] is also a solution of the word equation. Therefore A is not the minimum solution. To see that the bound is tight, let us consider the following word equation: xxbaababa = ababaxabx. We can verify that the solution x = ababaababa of length 10 is in fact the minimum solution. Since d₁ = 5 and d₂ = 7, we have max_{1≤k≤2} d_k = 7 and min_{1≤k≤2} d_k = 5. Thus max_{1≤k≤2} d_k + min_{1≤k≤2} d_k − 2 = 10, which shows that the bound is tight. In the case of a binary alphabet, a minimum solution whose length attains the upper bound is central, where a word is central if and only if it is in the set 0∗ ∪ 1∗ ∪ (P ∩ P10P), P being the set of palindrome words. This is obtained from the proof of Lemma 2 and the fact that a word w is central if and only if it has two periods p and q such that gcd(p, q) = 1 and |w| = p + q − 2 [9, pp. 69–70]. We also have the following relaxed upper bound, since min_{1≤k≤m, d_k≠0} d_k ≤ max_{1≤k≤m} d_k < |L|. Corollary 1. For any word equation L = R such that ♯_x(L) = ♯_x(R), the length of the minimum solution is at most N − 4 = |L| + |R| − 4. Consequently, we have the following upper bound by Proposition 2. Corollary 2. For any word equation L = R, the length of the minimum solution is at most N − 1.
Table 1. The numbers of solvable word equations in one variable in E, classified by the lengths of their minimum solutions (rows: length of the minimum solution; columns: length n of L (and R))

 min. sol. |  n=3 |  n=4 |  n=5 |  n=6 |  n=7 |   n=8 |    n=9 |    n=10 |    n=11
         0 |    4 |   32 |  220 | 1388 | 8364 | 49120 | 284204 | 1630124 | 9303292
         1 |    4 |   20 |  104 |  548 | 2868 | 14856 |  76236 |  388212 | 1964612
         2 |    0 |   12 |   56 |  252 | 1208 |  5844 |  28268 |  136536 |  657868
         3 |    0 |    0 |   24 |  140 |  564 |  2488 |  11304 |   53008 |  250296
         4 |    0 |    0 |    0 |   60 |  260 |  1148 |   4764 |   20784 |   95868
         5 |    0 |    0 |    0 |    0 |  116 |   580 |   2052 |    8592 |   36076
         6 |    0 |    0 |    0 |    0 |    8 |   264 |   1152 |    4368 |   16152
         7 |    0 |    0 |    0 |    0 |    0 |     8 |    504 |    2148 |    7532
         8 |    0 |    0 |    0 |    0 |    0 |     0 |     24 |    1084 |    4404
         9 |    0 |    0 |    0 |    0 |    0 |     0 |      8 |      48 |    2120
        10 |    0 |    0 |    0 |    0 |    0 |     0 |      8 |      36 |     136
        11 |    0 |    0 |    0 |    0 |    0 |     0 |      0 |       8 |      24
        12 |    0 |    0 |    0 |    0 |    0 |     0 |      0 |       0 |       0
        13 |    0 |    0 |    0 |    0 |    0 |     0 |      0 |       0 |       8
        14 |    0 |    0 |    0 |    0 |    0 |     0 |      0 |       0 |       8

4
String Statistics Problem
We are developing a system whose aim is to experimentally analyze the combinatorial properties and structures of word equations. As a first step, we are recording all solvable word equations (up to a moderate length) in one variable, together with their minimum solutions. By the fact that for any word w there exists a binary word w′ which has the same set of periods as w [9, pp. 275–279], we have only to consider a binary alphabet to find out the relation between the length of an equation and the length of its solutions. For a fixed alphabet Σ = {a, b} and a specified length n, we enumerate the set E of all word equations L = R such that (1) both a and b appear in either L or R, (2) |L| = |R| = n, (3) L and R contain the same number of variables, and (4) the pairs (L[1], R[1]) and (L[n], R[n]) must be taken from {(x, a), (x, b), (a, x), (b, x)}. Then for each word equation in E, we try to find the minimum solution by checking each prefix of B^k (where B is the constant prefix of either L or R) in increasing order of length up to 2n − 4. If a solution is found, we log it and turn to the next equation. Otherwise, we can conclude that the word equation has no solution, thanks to the upper bound we have shown (Corollary 1). For interested readers, Table 1 shows the numbers of the solvable word equations in E, classified by the lengths of their minimum solutions. In the i-th row and the column labeled n = |L| of the table T, we fill in the number of word equations in E of length |L| = |R| = n whose minimum solution is of length i. Remark that some equations may be equivalent to each other, by replacing a with b, exchanging the left side with the right side, or reversing the formulae. We did not exclude these duplications. For example, T(0, 3) = 4 corresponds to the number of equations {abx = xab, bax = xba, xab = abx, xba = bax}, where the
empty string is a solution to them. They are equivalent to each other. Moreover, T(1, 3) = 4 corresponds to {abx = xba, bax = xab, xab = bax, xba = abx}, whose minimum solutions are of length 1. They are essentially the same. Let us pick out some interesting pairs of an equation and its minimum solution:
– xxbaababa = ababaxabx with solution ababaababa, from T(10, 9) = 8, which was used to prove the tightness of the upper bound. This is essentially a unique instance in T(10, 9) = 8, since the other 7 instances are all equivalent to it.
– xxbaabababa = abababaxabx with solution abababaabababa, from T(14, 11) = 8, which also matches the upper bound. This is essentially a unique instance in T(14, 11) = 8, since the other 7 instances are all equivalent to it.
– xabxbaaaaaa = aaaaaabaxbx with solution aaaaaabaaaaaa. This is essentially a unique instance in T(13, 11) = 8.
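A compact version of this experiment can be scripted as follows (our sketch; it reuses the hypothetical minimum_solution checker given in Section 2 and the cutoff 2n − 4 from Corollary 1, and our reading of condition (1) is that a and b both appear somewhere in L or R):

```python
from itertools import product

def enumerate_E(n):
    """All equations L = R over {a, b, x} satisfying conditions (1)-(4).
    Exponential enumeration; intended only for small n."""
    ends = {("x", "a"), ("x", "b"), ("a", "x"), ("b", "x")}
    for L in map("".join, product("abx", repeat=n)):
        for R in map("".join, product("abx", repeat=n)):
            if ((L[0], R[0]) in ends and (L[-1], R[-1]) in ends
                    and L.count("x") == R.count("x")
                    and set("ab") <= set(L + R)):
                yield L, R

def table_column(n):
    """Histogram of minimum-solution lengths over the solvable equations in E."""
    hist = {}
    for L, R in enumerate_E(n):
        A = minimum_solution(L, R, chi=2 * n - 4)  # sketch from Section 2
        if A is not None:
            hist[len(A)] = hist.get(len(A), 0) + 1
    return hist

print(table_column(3))  # expected, by Table 1: {0: 4, 1: 4}
```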
5
Conclusion
We showed a tight upper bound on the length of the minimum solution of word equations in one variable. The upper bound is easily computed from a given word equation. Moreover, we showed concrete examples which match the bound. As a corollary, we also obtained a more relaxed upper bound which is easier to apply: the length of the minimum solution is less than the total length of the word equation. Khmelevskiĭ [8, p. 12] proved that if a word equation C₀xC₁ ··· xC_u = xB₁ ··· xB_v is solvable, it has a solution of length smaller than M² + 3M, where M = max_{i,j}{u, v, |C_i|, |B_j|}. When we consider the upper bound in terms of the length N of a given word equation, the order of this value comes up to N² since M ≤ N − 1. Even for the original expression, we can show that the value M² + 3M − 1 is never less than the upper bound of our result for a non-trivial word equation. Let ν = u = v and λ = max_{i,j}{|C_i|, |B_j|}. Then M = max{ν, λ}. By the definition of d_k, we have min_{k, d_k≠0} d_k ≤ |C₀| ≤ λ and max_k d_k ≤ max{Σ_{i=0}^{k−1} |C_i|, Σ_{i=k}^{ν} |C_i|} ≤ νλ. Therefore, max_k d_k + min_{k, d_k≠0} d_k − 2 ≤ νλ + λ − 2 ≤ M² + 2M − 2 ≤ M² + 3M − 1. Thanks to the bound, we could perform a comprehensive analysis of word equations in one variable up to a moderate size of the equations, by enumerating all word equations and solving them one by one. We showed some statistics of the lengths of the minimum solutions.
Acknowledgements
We thank Yoshihito Tanaka for useful discussions. The relation between the two upper bounds in Section 5 was obtained by his idea. We also thank the anonymous referees for their helpful comments to improve this work; in particular, the proof of Lemma 2 became clearer.
References
1. Angluin, D.: Finding Patterns Common to a Set of Strings. J. Comput. Sys. Sci., Vol. 21 (1980) 46–62
2. Charatonik, W. and Pacholski, L.: Word Equations in Two Variables. Proc. IWWERT'91, LNCS Vol. 677 (1991) 43–57
3. Crochemore, M. and Rytter, W.: Text Algorithms. Oxford University Press, New York (1994)
4. Crochemore, M. and Rytter, W.: Jewels of Stringology. World Scientific (2003)
5. Dąbrowski, R. and Plandowski, W.: On Word Equations in One Variable. Proc. MFCS2002, LNCS Vol. 2420 (2002) 212–220
6. Eyono Obono, S., Goralcik, P., and Maksimenko, M.: Efficient Solving of the Word Equations in One Variable. Proc. MFCS'94, LNCS Vol. 841 (1994) 336–341
7. Ilie, L. and Plandowski, W.: Two-Variable Word Equations. Proc. STACS2000, LNCS Vol. 1770 (2000) 122–132
8. Khmelevskiĭ, Yu.I.: Equations in Free Semigroups. Proc. Steklov Inst. of Mathematics 107, AMS (1976)
9. Lothaire, M.: Algebraic Combinatorics on Words. Cambridge University Press (2002)
10. Makanin, G.S.: The Problem of Solvability of Equations in a Free Semigroup. Mat. Sb. Vol. 103, No. 2, 147–236. In Russian; English translation in: Math. USSR Sbornik, Vol. 32 (1977) 129–198
11. Plandowski, W.: Satisfiability of Word Equations with Constants is in PSPACE. Proc. FOCS'99, IEEE Computer Society Press (1999) 495–500
Smoothed Analysis of Three Combinatorial Problems

Cyril Banderier¹, René Beier², and Kurt Mehlhorn²

¹ Laboratoire d'Informatique de Paris Nord, Institut Galilée, Université Paris 13, 99, avenue Jean-Baptiste Clément, 93430 Villetaneuse, France
[email protected]
² Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany
{rbeier,mehlhorn}@mpi-sb.mpg.de
Abstract. Smoothed analysis combines elements of worst-case and average-case analysis. For an instance x, the smoothed complexity is the average complexity of an instance obtained from x by a perturbation. The smoothed complexity of a problem is the worst smoothed complexity of any instance. Spielman and Teng introduced this notion for continuous problems. We apply the concept to combinatorial problems and study the smoothed complexity of three classical discrete problems: quicksort, left-to-right maxima counting, and shortest paths.
1
Introduction
Recently, Spielman and Teng [13] introduced smoothed analysis, which is intermediate between average-case analysis and worst-case analysis. The smoothed complexity of an algorithm is

max_x E_{y ∈ U_ε(x)} C(y),

where x ranges over all inputs, y is a random instance in a neighborhood of x (whose size depends on the smoothing parameter ε), E denotes expectation, and C(y) is the cost of the algorithm on input y. In other words, worst-case complexity is smoothed by considering the expected running time in a neighborhood of an instance instead of the running time at the instance itself. If U_ε(x) is the entire input space, smoothed analysis becomes average-case analysis, and if U_ε(x) = {x} for all ε, smoothed analysis becomes worst-case analysis. Smoothed analysis gives information on whether instance space contains dense regions of hard instances, see Figure 1. The smoothed complexity of an algorithm is low if worst-case instances are "isolated events" in instance space. In other words, if the smoothed complexity is low, worst-case instances are not robust under small changes; most small changes to the instance destroy the property of being worst-case; a small random perturbation destroys the property of being worst-case.
This work was partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
Fig. 1. The graphs show a conceivable dependency of running time on the input instance. We assume a one-dimensional input space; the running time on an instance is shown as the y-value of the graph. The neighborhood of an instance is simply an interval around the instance. In the situation on the left, the smoothed complexity will be equal to the worst-case complexity (for all small enough ε), and in the situation on the right, the smoothed complexity decreases sharply as a function of ε
In most sciences, one attempts to develop a theory that explains and predicts observed phenomena. Smoothed analysis is one attempt to push the analysis of algorithms in this direction. To develop a mathematically rigorous theory of the behavior of algorithms, one must make mathematically rigorous assumptions about their inputs. Smoothed analysis makes the assumption that inputs are subject to noise, circumstance, or randomness. This is an assumption that is valid in most practical problem domains. While it is unreasonable to assume that one can develop a model of inputs subject to noise that precisely models practice, one can try to come close and then reason by analogy. Spielman and Teng [13] showed that the smoothed complexity of the simplex algorithm (with the shadow-vertex pivot rule) for linear programming is polynomial. Linear programming is a continuous problem. The input is a sequence of real numbers (a cost vector, a constraint matrix, a right-hand side). The smoothing operation adds Gaussian noise with parameter σ to each number in the input. The expected running time of the simplex algorithm for such a perturbed instance is polynomial in 1/σ and the number of input variables. The other papers on smoothed analysis [3,5] also discuss continuous problems. Our paper is the first to apply the concept of smoothed analysis to discrete problems. We define natural models of perturbation for sequences and natural numbers and analyze the smoothed complexity of quicksort, left-to-right maxima and the shortest path problem.
Partial Permutations: Our first model applies to problems defined on sequences. It is parameterized by a real parameter p with 0 ≤ p ≤ 1 and is defined as follows. Given a sequence s₁, s₂, ..., s_n, each element is selected (independently) with probability p. Let m be the number of selected elements (on average, m = pn). Choose one of the m! permutations of the m selected elements (uniformly at random) and let it act on them. E.g., for p = 1/2 and n = 7, one might select m = 3 elements (namely, s₂, s₄, and s₇) out of an input sequence (s₁, s₂, s₃, s₄, s₅, s₆, s₇). Applying the permutation (312) to the selected elements yields (s₁, s₇, s₃, s₂, s₅, s₆, s₄). The probability to obtain this sequence in this way is p³(1 − p)⁴/3!.
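The partial permutation model is straightforward to implement; the following Python sketch (ours, for illustration) perturbs a sequence exactly as described above.

```python
import random

def partial_permutation(s, p):
    """Select each position independently with probability p, then apply a
    uniformly random permutation to the selected elements (in place order)."""
    s = list(s)
    selected = [i for i in range(len(s)) if random.random() < p]
    values = [s[i] for i in selected]
    random.shuffle(values)          # uniform permutation of the m selected items
    for i, v in zip(selected, values):
        s[i] = v
    return s

random.seed(0)
print(partial_permutation([1, 2, 3, 4, 5, 6, 7], p=0.5))
```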
We show that the smoothed complexity of a deterministic variant of quicksort is O((n/p) ln n) (Section 2). However, it is not true that every fixed element takes part in O((1/p) ln n) comparisons on average. The maximum recursion depth is Ω(√(n/p)) (Section 3).
Partial Bit Randomization: Our second model applies to problems involving natural numbers and is parameterized by an integer k ≥ 0. For each integer of the input, the last k bits are randomly modified. This model is a discrete analogue of the model considered by Spielman and Teng. Notice that in our model of perturbation the expectation of the resulting distribution is not necessarily equal to the unperturbed value. We show that the smoothed running time of the shortest path algorithm of Meyer [10] and Goldberg [6] on a graph with n nodes and m edges is O(m + n(K − k)) when the edge lengths are K-bit integers, the last k of which are randomized. In [2] our model is used to analyze non-clairvoyant scheduling.
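A sketch of the partial bit randomization model (again ours, for illustration): the low k bits of each K-bit weight are replaced by uniformly random bits.

```python
import random

def partial_bit_randomization(weights, k):
    """Replace the last k bits of each integer weight by random bits."""
    return [(w >> k << k) | random.getrandbits(k) if k > 0 else w
            for w in weights]

random.seed(0)
print(partial_bit_randomization([0b101100, 0b111111, 0b100000], k=3))
```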
2
Quicksort
We analyze a deterministic variant of quicksort under partial permutations. We assume that quicksort takes the first element of the list as the pivot and splits the input list with respect to the pivot into two parts: the elements smaller than the pivot and the elements larger than the pivot. We assume that the order of elements in the resulting two sublists is unchanged.

Theorem 1 (Quicksort under Limited Randomness). The expected running time (i.e., number of comparisons) of quicksort on a partial permutation of n elements is O((n/p) ln n).

Proof: We utilize a proof based on randomized incremental constructions [11]. Let C denote the number of comparisons performed by the algorithm. Assume that we have a permutation of the numbers 1 to n. Let X_ij be the indicator variable which is 1 iff i and j are compared in a run of quicksort with i being the pivot. Clearly C = Σ_{i,j} X_ij. Then X_ij = 1 if and only if i occurs before all other elements with value between i and j in the sequence. Thus, for a random permutation, prob(X_ij = 1) = 1/(|j − i| + 1) and hence

E[C] = Σ_{i≠j} 1/(|j − i| + 1) ≤ 2n Σ_{2≤k≤n} 1/k ≤ 2n ln n.

Next we estimate prob(X_ij = 1) for partial permutations. Let s₁, ..., s_n be our initial permutation and let L = (8/p) ln n. If i is among s₁, ..., s_L or |j − i| ≤ L, we estimate prob(X_ij = 1) by 1, for a total contribution of O((n/p) ln n). Next assume that there are at least L elements preceding i in the initial permutation and that |i − j| > L. We split our estimate for prob(X_ij = 1) into two parts. First assume that i is selected. Let l = |i − j|. The probability that at most lp/2 elements between i (exclusive) and j (inclusive) are selected is less than exp(−lp/8). If more than lp/2 elements are selected, X_ij = 1 implies that i is first in the permutation of the selected elements, which happens with probability at most 2/(lp). Together we obtain prob(X_ij = 1) ≤ exp(−lp/8) + 2/(lp) and hence

Σ_{l≥L} n·(exp(−lp/8) + 2/(lp)) = O((n/p) ln n).

Assume next that i is not selected and let i be the k_i-th element in the initial sequence. The probability that less than pk_i/2 elements before i are chosen, or less than pl/2 elements between i and j, or more than 2pn elements are chosen altogether, is less than exp(−pk_i/8) + exp(−pl/8) + exp(−pn/2). The contribution of these rare events to E[C] is only O(1/p), since for the first term we have

Σ_{k_i≥L} Σ_j exp(−pk_i/8) = n exp(−pL/8) Σ_{m≥0} exp(−p/8)^m = O(1/p).

The same bound can be shown for the other two terms. So assume that the required numbers of elements are chosen. If i occurs before all elements i + 1, ..., j in the partial permutation, it must be the case that none of the pl/2 selected elements between i and j is inserted before i, which happens with probability at most

((2pn − k_i p/2)/(2pn))^{lp/2} ≤ exp(−k_i lp/(8n)).

Next observe that

Σ_{k=1}^{n} Σ_{l=1}^{n} exp(−klp/(8n)) ≤ Σ_{k=1}^{n} 1/(1 − exp(−kp/(8n))) ≤ Σ_{k=1}^{n} 16n/(kp) ≤ (16n ln n)/p,

since 1 − e^{−x} ≥ x/2 for 0 ≤ x ≤ 1 and hence 1/(1 − e^{−x}) ≤ 2/x.
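The deterministic variant analyzed above is easy to state in code. The sketch below (our illustration, not code from the paper) counts comparisons for the first-element-pivot, order-preserving quicksort; combined with the partial_permutation sketch from the introduction, it can be used to estimate smoothed comparison counts empirically.

```python
def quicksort_comparisons(s):
    """Number of comparisons of quicksort with the first element as pivot,
    keeping the relative order of elements inside both sublists."""
    if len(s) <= 1:
        return 0
    pivot = s[0]
    smaller = [v for v in s[1:] if v < pivot]
    larger = [v for v in s[1:] if v > pivot]
    return (len(s) - 1) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

# Worst case without smoothing: the sorted sequence needs n(n-1)/2 comparisons.
print(quicksort_comparisons(list(range(1, 101))))  # 4950
```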
Remark: When we consider partial permutations of a sorted sequence (the worst-case instance without permutations), we are able to get closed-form formulae for the X_ij's. We distinguish 10 subcases, most of them involving 7 nested sums. From these sums (involving binomials), it is possible to get the differential equation satisfied by their generating functions, and then the Frobenius method allows us to obtain the full asymptotic scale, which gives a (2/p) n ln n complexity. We refer the reader to the full paper for details.
Pitfalls: The expected running time of quicksort on random permutations can be analyzed in many different ways. Many of them rely on the fact that the subproblems generated by recursive calls are again random permutations. This is not true for partial permutations¹, as the following example demonstrates. Consider an input 1, 2, 3, 4 and define q := 1 − p. Assume that 2 is the pivot element and hence the second subproblem consists of the numbers 3, 4. If 2 is
¹ In the first version of this paper, we fell into this pitfall.
the pivot (first element after permutation), at least the numbers 1 and 2 are selected. Conditioned on the fact that 1 and 2 are selected and 2 is made the first element, we obtain the subproblem (3, 4)
– always, when neither 3 nor 4 is selected (probability q²),
– always, when 3 is selected but 4 is not (probability pq); the relevant partial permutations of the input are (2, 1, 3, 4) and (2, 3, 1, 4),
– in one out of two cases, when 4 is selected but 3 is not (probability pq/2); the relevant partial permutations are (2, 1, 3, 4) and (2, 4, 1, 3), of which the former generates the subproblem (3, 4),
– in one out of two cases, when 3 and 4 are selected (probability p²/2).
Thus prob((3, 4)) = q² + (3/2)pq + (1/2)p². Consequently, prob((4, 3)) = (1/2)pq + (1/2)p². On the other hand, applying partial permutations to the input sequence 3, 4 gives prob((3, 4)) = q² + 2pq + (1/2)p² and prob((4, 3)) = (1/2)p². We also point out that the content of the first position, even if it is selected, is not a random element of the sequence. It is more likely to be the original element than any other element. The other elements are equally likely. This imbalance results from the fact that if only one element is selected, the permutation of the selected elements has very little freedom. The expected maximum recursion depth of quicksort on random permutations is O(ln n). For partial permutations the expected maximum recursion depth is Ω(√(n/p)). This is a consequence of the result in the next section. We will show that the number of left-to-right maxima in a partial permutation might be as large as Ω(√(n/p)). The number of left-to-right maxima is the number of times the largest element is compared to a pivot element. Thus some elements may take part in as many as Ω(√(n/p)) recursive calls, which disproves the conjecture that every element takes part in O((1/p) ln n) calls with high probability. In the full version of this paper we show that the worst-case instances (sorted sequences, increasing or decreasing) also exhibit the highest smoothed complexity. The complexity landscape for quicksort has two dominant peaks with a rather sharp transition (cf. Figure 1). We will see in the next section that this is not always the case.
3
Left-to-Right Maxima
The simplest strategy to determine the largest element in a sequence is to scan the sequence from left to right and to keep track of the largest element seen. The number of changes to the current maximum is called the number of left-to-right maxima in the sequence. The sequence 1, ..., n has n left-to-right maxima, and the expected number of left-to-right maxima in a random permutation of n elements is H_n = 1 + 1/2 + ··· + 1/n. Surprisingly, the worst-case instance for the classical analysis is not the worst case under smoothing.

Theorem 2 (Left-to-Right Maxima). Under the partial permutation model, the smoothed number of left-to-right maxima is Ω(√(n/p)) and O(√((n/p) log n)), whereas the smoothed complexity for the sequence (1, ..., n) is

2(1 − p)/p + ln(pn) + γ + (1/2 + 2(1 − p)/p²)·(1/n) + O(1/n²),
where γ ≈ .5772 is Euler's constant. Proof: Due to space limitations we provide a proof only for the two first asymptotic terms of the smoothed complexity of the sorted sequence (see the full version of the paper for a generating function proof which gives the full asymptotics). The sequence 1, ..., n has n left-to-right maxima. Smoothing decreases the number to about ln(pn) + 2/p, as we show next. Let X_i be the indicator variable of the event that the i-th position is not selected and is a maximum, and let Y_i be the indicator variable of the event that the i-th position is selected and is a maximum. Consider first a selected position. It contains a maximum if and only if it is a maximum among the selected elements. In this case its value is at least i and hence it is also a maximum considering all elements. Thus Σ_i Y_i is simply the number of maxima among the selected elements. The number of selected elements concentrates around pn and hence E[Σ_i Y_i] ≈ log(pn). Assume next that i is not selected. We start with the observation that X_i and X_{n+1−i} have the same distribution. Consider i < n/2. Position i stays a maximum if none of the preceding i − 1 elements moves to a position larger than i. Analogously, position n + 1 − i stays a maximum if none of the succeeding i − 1 elements moves to a position smaller than n + 1 − i. We therefore concentrate on i ≤ n/2. If k₁ elements among the first i − 1 and k₂ elements among the last n − i are selected, the probability that i stays a maximum is

f(k₁, k₂) = k₁!·k₂!/(k₁ + k₂)!.
The expression for f(k₁, k₂) is decreasing in both arguments. Namely,

f(k₁, k₂ + 1)/f(k₁, k₂) = (k₁!·(k₂ + 1)!·(k₁ + k₂)!)/((k₁ + k₂ + 1)!·k₁!·k₂!) = (k₂ + 1)/(k₁ + k₂ + 1) ≤ 1.

We want to compute E[Σ_{i≤n/2} X_i]. Define L = (16/p) log n. We split the sum into two parts: i ≤ L and i > L. For the second part, i > L, we expect to select about pi ≥ pL elements less than i and about p(n − i) ≥ pn/2 elements larger than i. The probability that we select less than half the stated number in either part is less than exp(−(16/8) log n) = O(n⁻²) by Chernoff bounds. If at least pL/2 = 8 log n elements smaller than i are selected and at least pn/4 elements larger than i are selected, the probability that i is a maximum is less than f(8 log n, pn/4) = O(n⁻²). Taking into account that i is not selected, we get prob(X_i = 1) = O((1 − p)n⁻²) for L ≤ i ≤ n/2. We turn to the i's with i < L. If none of the first i − 1 elements is selected, i stays a maximum. If at least one of the first i − 1 elements is chosen, the probability that i stays a maximum is at most e^{−pn/16} + 4/(pn). The first term accounts for the fact that less than pn/4 elements larger than i are selected, and the
second term accounts for the fact that at least pn/4 elements larger than i are selected and none of them is moved to a position before i. Thus for i < L,

prob(X_i = 1) ≤ (1 − p)·((1 − p)^{i−1} + e^{−pn/16} + 4/(pn)),

and hence

E[Σ_{i=1}^{L−1} X_i] ≤ (1 − p)/p + (1 − p)·L·(e^{−pn/16} + 4/(pn)) = ((1 − p)/p)·(1 + o(1)),

for constant p. Taking into account all i ≥ n/2, we conclude

E[Σ_i (X_i + Y_i)] ≤ log(pn) + 2(1 − p)/p + o(1).

In fact, constant p is not required. The argument remains valid as long as log n/(p²n) = o(1), i.e., for p ≫ √(log n/n). We now come to the first assertion of the theorem: the complexity of the worst case among all perturbations. We show that, for p < 1/2, the smoothed number of left-to-right maxima in a permutation of n elements may be Ω(√(n/p)). Consider the sequence

n − k, n − k + 1, ..., n, 1, 2, ..., n − k − 1  (where k = √(n/p)).

The first part of the sequence consists of the first k elements. Let a ≈ pk and b ≈ p(n − k) be the numbers of selected elements in the first and second part of the sequence, respectively. For large n, the probability that a > 2pk or b < pn/2 is exponentially small by Chernoff bounds. So assume a ≤ 2pk and b ≥ pn/2. The probability that all elements selected in the first part are put into the second part by the random permutation of the selected elements is at least

q := (b/(a + b))·((b − 1)/(a + b − 1)) ··· ((b − a + 1)/(b + 1)).

In this case all elements not selected in the first part are left-to-right maxima. We have

q ≥ ((b − a)/(a + b))^a = (1 − 2a/(a + b))^a = exp(a ln(1 − 2a/(a + b))) ≥ exp(−4a²/(a + b)),

since ln(1 − x) ≥ −2x for 0 ≤ x ≤ 3/4. Using a ≤ 2pk and b ≥ pn/2 we get

q ≥ exp(−4a²/(a + b)) ≥ exp(−(4(2p)²·(n/p))/(pn/2)) ≥ e⁻³².

We conclude that with constant probability the number of left-to-right maxima in the perturbed sequence is at least k − a ≥ k(1 − 2p) = Ω(√(n/p)) for p < 1/2.
We next show an almost matching upper bound. Let s₁, ..., s_n be an arbitrary permutation of the numbers 1 to n, let k = √(8(n/p) log n), and let I be the set of indices i such that i ≥ k and s_i ≤ n − k. Basically, I ignores the first k and the largest k elements of the permutation. We estimate how many s_i with i ∈ I are left-to-right maxima in the perturbed sequence. The total number of maxima is then at most 2k larger. Consider a fixed s_i with i ∈ I. If s_i is selected and is a maximum in the partial permutation, it must be a maximum among the selected elements. The expected number of left-to-right maxima among the selected elements is ln(pn). So assume that s_i is not selected. With high probability there are at least kp/2 selected elements preceding s_i, at least kp/2 selected elements larger than s_i, and at most 2np selected elements in total. Therefore, the probability that s_i is a maximum in the perturbed sequence is bounded by

((2np − kp/2)/(2np))^{kp/2} ≤ (1 − k/(4n))^{kp/2} ≤ exp(−k²p/(8n)) = 1/n,

and hence the expected number of left-to-right maxima in the perturbed sequence is O(√((n/p) log n)).
4
Single Source Shortest Path Problems
We consider the single source shortest path problem with nonnegative integer edge weights. As usual, let n and m denote the numbers of nodes and edges, respectively. We assume our edge weights to be in [0, 2^K − 1], i.e., edge weights are K-bit integers. Meyer [10] has shown that the average complexity of the problem is linear, O(n + m). He assumes edge weights to be random K-bit integers and that a certain set of primitive operations on such integers can be performed in constant time (addition, finding the first bit where two integers differ, ...). The algorithm can be used for arbitrary graphs. An alternative algorithm was later given by Goldberg [6], and his work is the starting point for this section. The worst-case complexity of his algorithm is O(m + nK). Algorithms with better worst-case behavior are known [1,4,12,8].

Theorem 3 (Shortest Paths under Limited Randomness). Let G be an arbitrary graph, let c : E → [0, ..., 2^K − 1] be an arbitrary cost function, and let k be such that 0 ≤ k ≤ K. Let c′ be obtained from c by making the last k bits of each edge cost random. Then the single source shortest path problem can be solved in expected time O(m + n(K − k)). With full randomness the expected running time is O(m + n), with no randomness the running time is O(m + nK). Limited randomness interpolates linearly between the extremes.

Proof: Goldberg has shown that the running time of his algorithm is

O(n + m + Σ_v (K − log min_in_cost(v) + 1)),
where min_in_cost(v) denotes the minimal cost of a (directed) edge with target node v. Next observe that min_in_cost(v) is the minimum of indeg(v) numbers of which the last k bits are random; here indeg(v) denotes the indegree of v. For an edge e, let r(e) be the number of leading zeroes in the random part of e. Then E[r(e)] ≤ 2 and

K − log min_in_cost(v) ≤ K − k + max{r(e) ; e ∈ inedges(v)} ≤ K − k + Σ{r(e) ; e ∈ inedges(v)}.

Thus E[K − log min_in_cost(v)] ≤ K − k + O(indeg(v)) and the time bound follows. In our model of perturbation, the last k bits of each weight are set randomly. Alternatively, one might select bits with probability p and set selected bits to random values. With this definition, the smoothed complexity becomes O(m/p). For an edge e, let r(e) now be the number of leading zeroes in the weight of e. Then E[r(e)] ≤ 2/p and

K − log min_in_cost(v) ≤ max{r(e) ; e ∈ inedges(v)} ≤ Σ{r(e) ; e ∈ inedges(v)}.

Therefore, E[K − log min_in_cost(v)] ≤ O(indeg(v)/p) and the time bound follows.
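To illustrate the quantity driving the bound (our sketch, with hypothetical helper names; it reuses the partial_bit_randomization sketch from the introduction), the following code evaluates Goldberg's term Σ_v (K − log min_in_cost(v) + 1) before and after perturbation of adversarially small edge costs.

```python
import math, random

def goldberg_term(in_weights, K):
    """Sum over nodes of K - floor(log2(min incoming cost)) + 1, the term
    driving the running time of Goldberg's algorithm."""
    return sum(K - int(math.log2(min(ws))) + 1 for ws in in_weights.values() if ws)

random.seed(0)
K, k, n = 20, 12, 1000
graph = {v: [1, 1, 1, 1, 1] for v in range(n)}   # adversarial: tiny edge costs
print(goldberg_term(graph, K))                   # n * (K + 1) = 21000 without noise
perturbed = {v: [max(w, 1) for w in partial_bit_randomization(ws, k)]
             for v, ws in graph.items()}
print(goldberg_term(perturbed, K))               # drops to roughly n * (K - k + O(1))
```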
5
Conclusion
We have analyzed the smoothed complexity of three combinatorial problems. Smoothed complexity gives additional information about the distribution of hard instances in instance space. We believe that the analysis of further discrete problems is a worthwhile task. Becchetti et al. [2] have recently analyzed non-clairvoyant scheduling under the partial bit randomization model. From a more theoretical viewpoint, it is natural to raise the question "Is there any relevant notion of smoothed complexity completeness?". Such a notion would have to extend the notion of average case completeness which was introduced by Levin [9,7]. In conclusion, we believe that smoothed complexity is a key concept for getting a better understanding of the behavior of algorithms in practice.
References
1. R. K. Ahuja, K. Mehlhorn, J. B. Orlin, and R. E. Tarjan. Faster algorithms for the shortest path problem. J. Assoc. Comput. Mach., 37(2):213–223, 1990.
2. L. Becchetti, S. Leonardi, A. Marchetti-Spaccamela, G. Schäfer, and T. Vredeveld. Smoothening helps: A probabilistic analysis of the Multi-Level Feedback algorithm. Submitted, 2003.
3. A. Blum and J. D. Dunagan. Smoothed analysis of the perceptron algorithm for linear programming. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA-02), pages 905–914. ACM Press, 2002.
4. Boris V. Cherkassky, Andrew V. Goldberg, and Craig Silverstein. Buckets, heaps, lists, and monotone priority queues. SIAM J. Comput., 28(4):1326–1346 (electronic), 1999.
5. J. D. Dunagan, D. A. Spielman, and S.-H. Teng. Smoothed analysis of Renegar's condition number for linear programming. In SIAM Conference on Optimization, 2002.
6. Andrew V. Goldberg. A simple shortest path algorithm with linear average time. In Proceedings of the 9th European Symposium on Algorithms (ESA '01), pages 230–241. Springer Lecture Notes in Computer Science LNCS 2161, 2001.
7. Yuri Gurevich. Average case completeness. Journal of Computer and System Sciences, 42(3):346–398, 1991. Twenty-Eighth IEEE Symposium on Foundations of Computer Science (Los Angeles, CA, 1987).
8. Torben Hagerup. Improved shortest paths on the word RAM. In Automata, Languages and Programming (ICALP 2000), pages 61–72. Springer, Berlin, 2000.
9. Leonid A. Levin. Average case complete problems. SIAM J. Comput., 15(1):285–286, 1986.
10. Ulrich Meyer. Shortest paths on arbitrary directed graphs in linear average-case time. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA-01), pages 797–806. ACM Press, 2001.
11. Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, Cambridge, 1995.
12. R. Raman. Recent results on the single-source shortest paths problem. In SIGACT News (ACM Special Interest Group on Automata and Computability Theory), volume 28, pages 61–72, 1997.
13. Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, pages 296–305, 2001.
Inferring Strings from Graphs and Arrays

Hideo Bannai¹, Shunsuke Inenaga², Ayumi Shinohara²,³, and Masayuki Takeda²,³

¹ Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
[email protected]
² Department of Informatics, Kyushu University 33, Fukuoka 812-8581, Japan
³ PRESTO, Japan Science and Technology Corporation (JST)
{s-ine,ayumi,takeda}@i.kyushu-u.ac.jp
Abstract. This paper introduces a new problem of inferring strings from graphs, and inferring strings from arrays. Given a graph G or an array A, we infer a string that suits the graph, or the array, under some condition. Firstly, we solve the problem of finding a string w such that the directed acyclic subsequence graph (DASG) of w is isomorphic to a given graph G. Secondly, we consider directed acyclic word graphs (DAWGs) in terms of string inference. Finally, we consider the problem of finding a string w of a minimal size alphabet, such that the suffix array (SA) of w is identical to a given permutation p = p1 , . . . , pn of integers 1, . . . , n. Each of our three algorithms solving the above problems runs in linear time with respect to the input size.
1
Introduction
To process strings efficiently, several kinds of data structures are often used. A typical form of such a structure is a graph, which is specialized for a certain purpose such as pattern matching [1]. For instance, directed acyclic subsequence graphs (DASGs) [2] are used for subsequence pattern matching, and directed acyclic word graphs (DAWGs) [3] are used for substring pattern matching. It is quite important to construct these graphs as fast as possible, processing the input strings. In fact, for any string, its DASG and DAWG can be built in time linear in the length of the string. Thus, the input in this context is a string, and the output is a graph. In this paper, we introduce a challenging problem that is a 'reversal' of the above, namely, a problem of inferring strings from graphs. That is, given a directed graph G, we infer a string that suits G under some condition. Firstly, we consider the problem of finding a string w such that the DASG of w is isomorphic to a given unlabeled graph G. We show a characterization theorem that gives if-and-only-if conditions for a directed acyclic graph to be isomorphic to a DASG. Our algorithm inferring a string w from G as a DASG is based on this theorem, and it will be shown to run in linear time in the size of G. Secondly, we consider DAWGs in terms of the string inference problem. We also give a linear-
time algorithm that finds a string w such that the DAWG of w is isomorphic to a given unlabeled graph G. Another form of a data structure for string processing is an array of integers. A problem of inferring strings from arrays was first considered by Franěk et al. [4]. They proposed a method to check if an integer array is a border array for some string w. Border arrays are better known as failure functions [5]. They showed an on-line linear-time algorithm to verify if a given integer array is a border array for some string w on an unbounded size alphabet. Duval et al. [6] gave an on-line linear-time algorithm for a bounded size alphabet to solve this problem. On the other hand, in this paper we consider suffix arrays (SAs) [7] in the context of string inference. Namely, given a permutation p = p₁, ..., p_n of the integers 1, ..., n, we infer a string w over a minimal size alphabet such that the SA of w is identical to p. We present a linear time algorithm to infer w from a given p.
1.1
Notations on Strings
Let Σ be a finite alphabet. An element of Σ∗ is called a string. Strings x, y, and z are said to be a prefix, substring, and suffix of the string w = xyz, respectively. The sets of prefixes, substrings, and suffixes of a string w are denoted by Prefix(w), Substr(w), and Suffix(w), respectively. String u is said to be a subsequence of string w if u can be obtained by removing zero or more characters from w. The set of subsequences of a string w is denoted by Subseq(w). The length of a string w is denoted by |w|. The empty string is denoted by ε, that is, |ε| = 0. Let Σ⁺ = Σ∗ − {ε}. The i-th character of a string w is denoted by w[i] for 1 ≤ i ≤ |w|, and the substring of a string w that begins at position i and ends at position j is denoted by w[i : j] for 1 ≤ i ≤ j ≤ |w|. For convenience, let w[i : j] = ε for j < i. For strings w, u ∈ Σ∗, we write w ≡ u if w is obtained from u by one-to-one character replacements. For a string w, let Σ_w denote the set of characters appearing in w.
1.2
Graphs
Let V be a finite set of nodes. An edge is defined to be an ordered pair of nodes. Let E be a finite set of edges. A directed graph G is defined to be a pair (V, E). For an edge (u, v) of a directed graph G, u is called a parent of v, and v is called a child of u. Let Children(u) = {v ∈ V | (u, v) ∈ E}, and Parents(v) = {u ∈ V | (u, v) ∈ E}. Node u (v, respectively) is called the head (tail, respectively) of edge (u, v). An edge (u, v) is said to be an out-going edge of node u and an in-coming edge of node v. A node without any in-coming edges is said to be a source node of G. A node without any out-going edges is said to be a sink node of G. In a directed graph G, the sequence of edges (v0 , v1 ), (v1 , v2 ), . . . , (vn−1 , vn ) is called a path, and denoted by path(v0 , vn ). The length of the path is defined
to be the number of edges in the path, namely, n. If v₀ = v_n, the path is called a cycle. If G has no cycles, it is called a directed acyclic graph (DAG). An edge of a labeled graph G is an ordered triple (u, a, v), where u, v ∈ V and a ∈ Σ. A path (v₀, a₁, v₁), (v₁, a₂, v₂), ..., (v_{n−1}, a_n, v_n) is said to spell out the string a₁a₂···a_n. For a labeled graph G, let s(G) be the graph obtained by removing all edge labels from G. For two labeled graphs G and H, we write G ≅ H if s(G) is isomorphic to s(H). Recall the following basic facts of graph theory, which will be used in the sequel. Lemma 1 (e.g. [8] pp. 6–10). Checking if a given directed graph is acyclic can be done in linear time. Lemma 2 (e.g. [8] pp. 6–8). Connected components of a given undirected graph can be computed in linear time. Without loss of generality, we consider in this paper DAGs G = (V, E) with exactly one source node and one sink node, denoted source and sink, respectively. We also assume that for every node v ∈ V (excluding source and sink), there exist both path(source, v) and path(v, sink). For nodes u, v ∈ V, let us define pathLengths(u, v) as the multi-set of lengths of all paths from u to v, and let depths(v) = pathLengths(source, v).
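Since depths(v) plays a central role later (Lemma 5 in Section 3 shows that for DAWGs it is an interval, so its minimum and maximum suffice), here is a small Python sketch (ours) computing the pair ⟨min depths(v), max depths(v)⟩ for every node of a DAG given as an adjacency list.

```python
def depth_intervals(children, source):
    """Min/max length of a path from source to each node, in topological order.
    children: dict mapping each node to the list of its children."""
    indeg = {v: 0 for v in children}
    for vs in children.values():
        for v in vs:
            indeg[v] += 1
    lo, hi = {source: 0}, {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in children[u]:
            lo[v] = min(lo.get(v, lo[u] + 1), lo[u] + 1)
            hi[v] = max(hi.get(v, hi[u] + 1), hi[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:       # all parents processed: intervals final
                stack.append(v)
    return {v: (lo[v], hi[v]) for v in lo}

# Example: a diamond DAG 0 -> {1, 2, 3}, 1 -> 3, 2 -> 3.
print(depth_intervals({0: [1, 2, 3], 1: [3], 2: [3], 3: []}, 0))
# {0: (0, 0), 1: (1, 1), 2: (1, 1), 3: (1, 2)}
```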
2
Inferring String from Graph as DASG
This section considers the problem of inferring a string from a given graph as an unlabeled DASG. For a subsequence x of a string w ∈ Σ∗, we consider the end-position of the leftmost occurrence of x in w and denote it by LM_w(x), where 0 ≤ |x| ≤ LM_w(x) ≤ |w|. We define an equivalence relation ∼^seq_w on Σ∗ by x ∼^seq_w y ⇔ LM_w(x) = LM_w(y). Let [x]^seq_w denote the equivalence class of a string x ∈ Σ∗ under ∼^seq_w. The directed acyclic subsequence graph (DASG) of a string w ∈ Σ∗, denoted by DASG(w), is defined as follows: Definition 1. DASG(w) is the DAG (V, E) such that V = {[x]^seq_w | x ∈ Subseq(w)} and E = {([x]^seq_w, a, [xa]^seq_w) | x, xa ∈ Subseq(w) and a ∈ Σ}. According to the above definition, each node of DASG(w) can be associated with a position of w uniquely. When we indicate the position i of a node v of DASG(w), we write v_i. Theorem 1 (Baeza-Yates [2]). For any string w ∈ Σ∗, DASG(w) is the smallest (partial) DFA that recognizes all subsequences of w.
[Figure omitted.] Fig. 1. (a) DASG(w) with w = abba; (b) DAWG(w) with w = ababcabcd
DASG(w) with w = abba is shown in Fig. 1 (a). Using DASG(w), we can examine whether or not a given pattern p ∈ Σ∗ is a subsequence of w in O(|p|) time [2]. Details of the construction and applications of DASGs can be found in the literature [2]. Theorem 2. A labeled DAG G = (V, E) is DASG(w) for some string w of length n, if and only if the following properties hold. 1. Path property: There is a unique path of length n from source to sink. 2. Node number property: |V| = n + 1. 3. Out-going edge labels property: The labels of the out-going edges of each node v are mutually distinct. 4. In-coming edge labels property: The labels of all in-coming edges of each node v are equal. Moreover, the integers assigned to the tails of these edges are consecutive. 5. Character positions property: For any node v_k ∈ V, assume Parents(v_k) ≠ ∅. Assume v_i ∈ Parents(v_k) and v_{i−1} ∉ Parents(v_k) for some 1 ≤ i < k. If the in-coming edges of v_k are labeled by some character a, then the edge (v_{i−1}, v_i) is also labeled by a. The path of Property 1 is the unique longest path of G, which spells out w. We call this path the backbone of G. The backbone of DASG(w) can be expressed by the sequence (v₀, w[1], v₁), ..., (v_{n−1}, w[n], v_n). Lemma 3. For any two strings u, w ∈ Σ∗, u ≡ w if and only if DASG(u) ≅ DASG(w). The above lemma means that, if an unlabeled DAG is isomorphic to the DASG of some string, the string is uniquely determined up to apparent one-to-one character replacements. Theorem 3. Given an unlabeled graph G = (V, E), the string inference problem for DASGs can be solved in linear time. Proof. We describe a linear time algorithm which, given an unlabeled graph G = (V, E), infers a string w such that s(DASG(w)) is isomorphic to G. First, recall that the acyclicity test for a given graph G is possible in linear time (Lemma 1). If G contains a cycle, we reject it and halt. While traversing G to test the acyclicity
of G, we can also compute the length of the longest path from source to sink of G; let n be this length. At the same time we count the number of nodes in G. If |V| ≠ n + 1, we reject G and halt. Then, we assign an integer i to each node v of G such that the length of the longest path from source to v is i. This corresponds to a topological sort of the nodes of G, and it is known to be feasible in O(|V| + |E|) time (e.g. [8] pp. 6–8).

After the above procedures, the algorithm starts from sink of G. Let w be a string of length n initialized with nil at each position. The variable unlabeled indicates the rightmost position of w where the character is not determined yet, and thus it is initially set to n = |w|. At step i, the node at position unlabeled is given a new character c_i. We then determine all the positions of the character c_i in w, by backward traversal of in-coming edges from sink towards source. To do so, we preprocess G after ordering the nodes topologically: at each node v_i of G, for each v_j ∈ Children(v_i) we insert v_i into the list maintained at v_j, corresponding to a reversed edge (v_j, v_i). Since there exist exactly n + 1 nodes in G, the integers assigned to the nodes on the backbone are sorted from 0 to n. Therefore, if we start from source, the list of reversed edges of every node is sorted in increasing order. Thus, given a node node, we can examine whether the numbers assigned to the nodes in Parents(node) are consecutive, in time linear in the number of elements in the list of reversed edges of node. If they are consecutive, the next position where c_i appears in w corresponds to the smallest value in the set (the first element in the list), and the process is repeated for this node until we reach source. If, at any point, the elements in the set are not consecutive, we reject G and halt. This part is based on Properties 4 and 5 of Theorem 2. If, in this process, we encounter a position of w at which a character is already determined, we reject G and halt, since if G is a DASG, the character of any position has to be uniquely determined. After we finish determining the positions of c_i in w, we decrement unlabeled until w[unlabeled] is undetermined, or until we reach source. If unlabeled ≠ 0 (i.e., we have not reached source), the process is repeated for a new character c_{i+1}. Otherwise, all the characters have been determined, and we output w. Since each edge is traversed (backwards) only once, and unlabeled is decremented at most n times, we conclude that the whole algorithm runs in linear time with respect to the size of G.
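A direct transcription of this proof (ours, simplified): it assumes the nodes have already been renumbered by the topological labelling, so that node i is the node at longest-path distance i from source, and it omits the checks of Properties 1–3 of Theorem 2 that a full implementation would also perform.

```python
def infer_string_from_dasg(n_nodes, edges):
    """Infers w such that s(DASG(w)) is isomorphic to the input DAG,
    following the backward-traversal algorithm of Theorem 3; returns
    None when the input is rejected. Nodes are 0 (source) .. n (sink)."""
    n = n_nodes - 1                    # Property 2: |V| = n + 1
    parents = [[] for _ in range(n_nodes)]
    for u, v in edges:                 # reversed edges; these lists come out
        parents[v].append(u)           # already sorted in the real algorithm
    w = [None] * (n + 1)               # w[1..n]; w[0] unused
    unlabeled, symbol = n, 0
    while unlabeled > 0:
        symbol += 1                    # fresh character c_i
        node = unlabeled
        while True:
            if w[node] is not None:    # a position determined twice: reject
                return None
            w[node] = symbol
            ps = sorted(parents[node])
            # Properties 4 and 5: parent labels must be consecutive
            if not ps or ps[-1] - ps[0] != len(ps) - 1:
                return None
            if ps[0] == 0:             # reached source: leftmost occurrence
                break
            node = ps[0]               # previous occurrence of c_i
        while unlabeled > 0 and w[unlabeled] is not None:
            unlabeled -= 1
    return w[1:]

assert infer_string_from_dasg(
    5, [(0, 1), (0, 2), (1, 2), (1, 4), (2, 3), (2, 4), (3, 4)]
) == [1, 2, 2, 1]                      # the shape of DASG(abba)
```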
3 Inferring String from Graph as DAWG
This section considers the problem of inferring a string from a given graph as an unlabeled DAWG. Definition 2 (Crochemore [9]). The directed acyclic word graph (DAWG) of w ∈ Σ ∗ is the smallest (partial) DFA that recognizes all suffixes of w. The DAWG of w ∈ Σ ∗ is denoted by DAWG(w). DAWG(w) with w = ababcabcd is shown in Fig. 1 (b). Using DAWG(w), we can examine whether or not a given pattern p ∈ Σ ∗ is a substring of w in O(|p|) time. Details of construction and applications of DAWGs can be found in the literature [3].
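To make DAWGs concrete, the following is the standard online suffix-automaton construction — a classical textbook algorithm, given here for illustration and not necessarily the construction referred to in [3] — together with the O(|p|) substring test.

```python
class DAWG:
    """Online construction of the suffix automaton (the DAWG of
    Definition 2). O(n log |Sigma|) over the whole string."""
    def __init__(self):
        self.trans = [{}]      # transitions of each state
        self.link = [-1]       # suffix links
        self.length = [0]      # length of the longest string in the class
        self.last = 0          # state reached by the whole current word

    def extend(self, a):
        cur = len(self.trans)
        self.trans.append({})
        self.length.append(self.length[self.last] + 1)
        self.link.append(-1)
        p = self.last
        while p != -1 and a not in self.trans[p]:
            self.trans[p][a] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.trans[p][a]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:                                # split: clone state q
                clone = len(self.trans)
                self.trans.append(dict(self.trans[q]))
                self.length.append(self.length[p] + 1)
                self.link.append(self.link[q])
                while p != -1 and self.trans[p].get(a) == q:
                    self.trans[p][a] = clone
                    p = self.link[p]
                self.link[q] = self.link[cur] = clone
        self.last = cur

    def is_substring(self, p):
        """The O(|p|) substring test mentioned above."""
        s = 0
        for a in p:
            if a not in self.trans[s]:
                return False
            s = self.trans[s][a]
        return True

dawg = DAWG()
for a in "ababcabcd":
    dawg.extend(a)
assert dawg.is_substring("bca") and not dawg.is_substring("ac")
```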
Lemma 4. For any two strings u, w ∈ Σ*, u ≡ w if and only if DAWG(u) ≅ DAWG(w).

The above lemma means that, if an unlabeled DAG is isomorphic to the DAWG of some string, the string is uniquely determined up to one-to-one character replacements. We assume that any string w terminates with a special delimiter symbol $ which appears nowhere else in w. Then the suffixes of w are all recognized at sink of DAWG(w), spelled out from source. Note that, under this assumption, DAWG(w) is the smallest DFA recognizing all substrings of w. It is not difficult to see that a DAWG has the following properties.

Theorem 4. If a labeled DAG G is DAWG(w) for some string w of length n, then the following properties hold.
1. Length property: For each length i = 1, . . . , n, there is a unique path from source to sink of length i, where n is the length of the longest path.
2. In-coming edge labels property: The labels of all in-coming edges of each node v are equal.
3. Suffix property: Let u_i = u_i[1]u_i[2] . . . u_i[i] be the labels of a path of length i from source to sink. Then u_i[i − j] = w[n − j] for each j = 0, . . . , i − 1.

The above theorem gives necessary properties for a DAG to be a DAWG. Therefore, if a DAG G does not satisfy a property of the above theorem, then we can immediately decide that G is not isomorphic to any DAWG. A naïve way to check the length property would take O(n²) time, since the total length of all the paths is Σ_{i=1}^{n} i, but here we show how to verify the length property in linear time. The length property claims that depths(sink) = {1, 2, . . . , n}, where n is the length of the longest path in G from source to sink. The next lemma is a stronger version of the length property, which holds for any node.

Lemma 5. Let w be an arbitrary string of length n. For any node v in DAWG(w), the multi-set depths(v) consists of distinct consecutive integers, that is, depths(v) = {i, i + 1, . . . , j} for some 1 ≤ i ≤ j ≤ n.

Lemma 6. The length property can be verified in time linear in the total number of edges of the graph.

Proof. If a given G forms DAWG(w) for some string w then, by Lemma 5, at each node v the multi-set depths(v) consists of distinct consecutive integers. Thus depths(v) = {i, i + 1, . . . , j} can be represented by the pair ⟨i, j⟩ of the minimum i and the maximum j. Starting from source, we traverse all nodes in a depth-first manner, where all in-coming edges of a node must have been traversed before going deeper. If a node v has only one parent node u, then depths(v) is simply
⟨i + 1, j + 1⟩, where depths(u) = ⟨i, j⟩. If a node v has k > 1 parent nodes u_1, . . . , u_k, we do as follows. Let ⟨i_1, j_1⟩ = depths(u_1), . . . , ⟨i_k, j_k⟩ = depths(u_k). By Lemma 5, depths(v) = ⟨i_1 + 1, j_1 + 1⟩ ∪ · · · ∪ ⟨i_k + 1, j_k + 1⟩ must be equal to
⟨i_min + 1, j_max + 1⟩, where i_min = min{i_1, . . . , i_k} and j_max = max{j_1, . . . , j_k}. (Remark that the union operation is taken over multi-sets.) This can be verified by sorting the pairs ⟨i_1, j_1⟩, . . . , ⟨i_k, j_k⟩ with respect to the first component in increasing order into ⟨i′_1, j′_1⟩, . . . , ⟨i′_k, j′_k⟩ (i′_1 < · · · < i′_k), and checking that j′_1 + 1 = i′_2, . . . , j′_{k−1} + 1 = i′_k. The sorting and verification can be done in O(k) time at each node with a radix sort and a skip-count trick, provided that we prepare an array of size n before the traversal and reuse it. Finally, if depths(sink) = ⟨1, n⟩, the length property holds. The running time is linear with respect to the number of edges, since each edge is processed only once as out-going and once as in-coming.

Theorem 5. Given an unlabeled graph G = (V, E), the string inference problem for DAWGs can be solved in linear time.

Proof (sketch). We describe a linear time algorithm which, when given an unlabeled graph G = (V, E), infers a string w such that s(DAWG(w)) is isomorphic to G. The algorithm is correct provided that there exists such a string for G; invalid inputs can be rejected with linear time sanity checks after the inference. Initially, we check the acyclicity of the graph in linear time (Lemma 1), and find source and sink. Using the algorithm of Lemma 6, we verify the length property in linear time. At the same time, we can mark at each node its deepest parent, that is, the parent on the longest path from source. Notice that Property 2 of Theorem 4 allows us to label the nodes instead of the edges. From Definition 2, it is easy to see that the labels of the out-going edges of source are distinct and should comprise the alphabet Σ_w, and therefore we assign distinct labels to the nodes in Children(source) (the label for sink can be set to ‘$’). The algorithm then resembles a simple breadth-first traversal from sink, going up to source. For any set N of nodes, let Parents(N) = ⋃_{u∈N} Parents(u). Starting with N_0 = {sink}, at step i we consider labeling a set N_{i+1} ⊆ Parents(N_i) of nodes whose construction is defined below. A node may have multiple paths of different lengths to the sink; it is marked visited when it is first considered in such a set. N_{i+1} is constructed by including all unvisited nodes, as well as a single deepest visited node (if any), in Parents(N_i) (source is disregarded since it cannot have a label). With this construction, we will see below that at least one node in N_{i+1} has already been labeled, and therefore, by Property 3 of Theorem 4, all other nodes in N_{i+1} can be given the same label. When there are no more unvisited nodes, we infer the resulting string w, which is spelled out by the longest path from source to sink. The linear running time of the algorithm is straightforward, since it is essentially a linear time breadth-first traversal of the DAG with at most one extra node per level (notice that redundant traversals from visited nodes can be avoided by using only the deepest parent marked at each node), and the depth of the traversal is at most the length of the longest path from source to sink. The claim that N_{i+1} contains at least one labeled node for all i is justified as follows. If N_{i+1} contains a node marked visited, we can use this node, since the label of a node is always inferred when it is marked visited. If N_{i+1} does not contain a visited node, it is not difficult to see from its construction
that this implies that N_{i+1} is the set of all nodes which have a path of length i + 1 to the sink. Then, from the length property, we can see that at least one of these nodes was labeled in the initial distinct labeling of Children(source). If G was not a valid structure for a DAWG, s(DAWG(w)) may fail to be isomorphic to G. However, G is labeled at the end of the inference algorithm, and we can check whether the labeled G and DAWG(w) coincide in linear time. This is done by first creating DAWG(w) from w in linear time [3], checking the numbers of nodes and edges, and then doing a simultaneous linear-time traversal of DAWG(w) and the labeled G: for each pair of nodes reached by the same path from source in both graphs, the labels of the out-going edges are compared. The inclusion of a single deepest visited node (if any) when constructing N_{i+1} from Parents(N_i) is the key to the linear time algorithm: including all visited nodes in Parents(N_i) would result in quadratic running time, while not including any visited node would result in failure to infer the string for some inputs.
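The interval propagation of Lemma 6 is easy to make concrete. The sketch below (our transcription) propagates the pairs ⟨min, max⟩ in topological order; for brevity it uses comparison sorting, whereas the linear-time bound requires the radix sort and skip-count trick described above.

```python
from collections import deque

def verify_length_property(n_nodes, edges, source, sink, n):
    """Lemma 6: propagate the pair <min, max> representing depths(v) in
    topological order, checking at every node that the shifted parent
    intervals tile a single interval (multi-set union condition)."""
    children = [[] for _ in range(n_nodes)]
    indeg = [0] * n_nodes
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    incoming = [[] for _ in range(n_nodes)]  # parent intervals, shifted by +1
    interval = [None] * n_nodes
    interval[source] = (0, 0)
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u != source:
            ivs = sorted(incoming[u])        # sort by left endpoint
            for (a, b), (c, d) in zip(ivs, ivs[1:]):
                if b + 1 != c:               # gap or overlap: reject
                    return False
            interval[u] = (ivs[0][0], ivs[-1][1])
        i, j = interval[u]
        for v in children[u]:
            incoming[v].append((i + 1, j + 1))
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return interval[sink] == (1, n)
```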
4 Inferring String from Suffix Array
The suffix array SA of a string w of length n is a permutation p = p_1, . . . , p_n of the integers 1, . . . , n which represents the lexicographic ordering of the suffixes w[p_i : n]. Details of the construction and applications of suffix arrays can be found in the literature [7]. As opposed to the string inference problems for DASGs and DAWGs, the inferred string cannot be determined uniquely (with respect to ≡). For example, for a given suffix array p = p_1, . . . , p_n, we can easily create a string w = w[1] . . . w[n] over an alphabet of size n, where w[p_i] is set to the character of rank i in the lexicographic order of the alphabet. Therefore, we define the string inference problem for suffix arrays as: given a permutation p = p_1 . . . p_n of the integers 1, . . . , n, construct a string w with minimal alphabet size whose suffix array SA(w) is p. The only condition that a permutation p = p_1 . . . p_n must satisfy in order to represent a suffix array of a string w is, for all i ∈ {1, . . . , n − 1}, w[p_i : n] ≤_lex w[p_{i+1} : n], where ≤_lex denotes the lexicographic relation over strings. From the suffix array, we are provided with the lexicographic ordering of the characters of the string, that is, w[p_1] ≤_lex · · · ≤_lex w[p_n]. Let I denote the set of integers where i ∈ I indicates w[p_i]

[...]

Since m_{uη} > (2/3)m (by the choice of µ), we get m_{ηη} = m_{uη} + m_{ηv} − m = m_{uη} + (1/2)m_{ηv} − m + (1/2)m_{ηv} > (2/3)m + (1/2)((2/3)m + 1) − m + (1/2)m_{ηv} > (1/2)m_{ηv}. So the children of ⟨η, v⟩ will be ⟨η, η⟩ and ⟨t, v⟩. Further, m_{tv} < (1/2)m_{ηv} ≤ (2/3)m_{ηv} + 1, so ⟨η, v⟩ is right-balanced, so (c) holds. By Lemmas 1 and 2, on any root-to-leaf path no more than three unbalanced nodes can occur consecutively. This implies the following lemma.

Lemma 3. The height of SD(T) is O(log n), and its total size, Σ_{⟨u,v⟩∈SD(T)} |T_{uv}|, is O(n log n).
3 Directed Trees
Cost Functions. We define two cost functions as follows:

P^j_{uv}(α) = the optimal cost of T_u with at most j facilities, such that (i) at least one of the facilities is in S_{uv}, and (ii) u is served by a facility that is above u at distance α.

Q^j_{uv}(α) = the optimal cost of T_{uv} with at most j facilities, such that (i) no facility is in S_{uv}, and (ii) u is served by a facility that is above u at distance α.

For the cost functions of full subtrees we simplify notation and write P^j_u(α) = P^j_{uρ(u)}(α) and Q^j_u(α) = Q^j_{uρ(u)}(α). Note the asymmetry between P^j_{uv}(·) and Q^j_{uv}(·): in the definition of P^j_{uv}(·) the facilities can be located in the whole subtree T_u, while for Q^j_{uv}(·) only in T_{uv}. However, in both cases the “top” facilities – those that do not have any other facilities above them – are located in T_{uv}. This is, in fact, the crucial idea behind the dynamic programming formulation. Our ultimate goal is to compute the cost functions at the root, that is, P^k_r(·) and Q^k_r(·). The optimal cost of T with k facilities is cost_k(T) = min{P^k_r(0), Q^k_r(0)}.

Recurrence Equations. We now set up recurrence equations for our cost functions. There are two base cases: when j = 0 and when ⟨u, v⟩ is a leaf of SD(T). Consider first the case j = 0. For any node ⟨u, v⟩ ∈ SD(T), P^0_{uv}(α) is undefined, and Q^0_{uv}(α) = αW_{uv} + Q_{uv}, where W_{uv} = Σ_{x∈T_{uv}} w_x and Q_{uv} = Q^0_{uv}(0) = Σ_{x∈T_{uv}} w_x d_{xu}. If ⟨u, u⟩ is a leaf of SD(T) (that is, u is a leaf of T), then for j = 1, . . . , k, set P^j_{uu}(α) = 0, and Q^j_{uu}(α) is undefined. For j > 0 and any non-leaf node ⟨u, v⟩ of SD(T), we want to express each cost function for j facilities at ⟨u, v⟩ in terms of cost functions for j′ ≤ j at the descendants of ⟨u, v⟩ in SD(T), or in terms of any cost functions for j′ < j.
Fig. 2. Illustration of recurrence equation (1). We show two placements of facilities that correspond to the two terms in the minimum.
Suppose that ⟨u, v⟩ ∈ SD(T), for u ≠ v, has two children ⟨u, µ⟩ and ⟨η, v⟩, where η = right(µ). Then

P^j_{uv}(α) = min{ P^j_{uµ}(α), min_{i=0,...,j−1} [ Q^i_{uµ}(α) + P^{j−i}_{ηv}(α + d_{ηu}) ] },   (1)

Q^j_{uv}(α) = min_{i=0,...,j} [ Q^i_{uµ}(α) + Q^{j−i}_{ηv}(α + d_{ηu}) ].   (2)
The second case is when we have a node ⟨u, u⟩ ∈ SD(T) with one child ⟨z, ρ(z)⟩, where z = left(u). Then

P^j_{uu}(α) = min{ P^{j−1}_u(0), Q^{j−1}_u(0) },   (3)

Q^j_{uu}(α) = αw_u + min{ P^j_z(α + d_{zu}), Q^j_z(α + d_{zu}) }.   (4)

Note that to determine P^j_{uu}(α) we use information from an ancestor ⟨u, ρ(u)⟩ of ⟨u, u⟩ in SD(T), but for a smaller number of facilities. The definition of the cost functions for the leaves and for j = 0 should be obvious. We give a brief justification for (1); the arguments for the other recurrences are similar. Suppose that F is a set of j facilities that realizes P^j_{uv}(α), that is, F ⊆ T_u, |F| = j, F ∩ R_{uv} ≠ ∅, and cost_F(T^α_u) = P^j_{uv}(α). There are two cases (see Figure 2). If F ∩ R_{uµ} ≠ ∅, then F is accounted for by the top term in the minimum, cost_F(T^α_u) = P^j_{uµ}(α). Otherwise, F ∩ R_{ηv} ≠ ∅. Let F′ = F ∩ T_{uµ}, i = |F′|, and F′′ = F ∩ T_η. We have F′ ∩ R_{uµ} = ∅, F′′ ∩ R_{ηv} ≠ ∅, and cost_F(T^α_u) = cost_{F′}(T^α_{uµ}) + cost_{F′′}(T^{α+d_{ηu}}_η), so this F is accounted for by the term Q^i_{uµ}(α) + P^{j−i}_{ηv}(α + d_{ηu}).

Theorem 1. The above recurrence equations for the cost functions P^j_{uv}(·) and Q^j_{uv}(·) are correct. In particular, cost_k(T) = min{P^k_r(0), Q^k_r(0)}.

Minimizers. Consider a set F ⊆ T_{uv} of j facilities. Then cost_F(T^α_{uv}) = Aα + B, where B = cost_F(T^0_{uv}) and A = Σ_{x∈X} w_x, for the set of nodes X ⊆ T_{uv} − F that do not have any ancestors in F. We conclude that all functions P^j_{uv}(·) and Q^j_{uv}(·) are lower envelopes of collections of lines, and thus they are piece-wise linear, continuous and concave.
By the complexity of a piece-wise linear function we mean the number of its line segments. If f(·) and g(·) are two piece-wise linear functions, then the complexity of each of the functions f(·) + g(·) and min{f(·), g(·)} is at most the sum of the complexities of f(·) and g(·). We say that a set of at most j nodes F ⊆ T_u is a P^j_{uv}-minimizer at α if F ∩ S_{uv} ≠ ∅ and cost_F(T^α_u) = P^j_{uv}(α). F is a Q^j_{uv}-minimizer at α if F ⊆ T_{uv} − S_{uv} and cost_F(T^α_{uv}) = Q^j_{uv}(α). F will be called a (j, u, v)-minimizer if it is either a P^j_{uv}-minimizer or a Q^j_{uv}-minimizer, for some α. The complexity of each cost function P^j_{uv}(·) and Q^j_{uv}(·) is equal to the number of the respective minimizers (assuming that ties are broken in an appropriate way). So we are interested in the following question: how many different minimizers may we have? Our Linearity Conjecture is that, for any fixed j ≥ 1 and each ⟨u, v⟩, the number of (j, u, v)-minimizers is O(|T_{uv}|). For j = 1 this is trivially true. The conjecture is supported by the result from [8], where it was proved that it indeed holds for j = 2. (Technically, the definition of minimizers in [8] was slightly different.) Although we were unable to prove this conjecture, we show how to use the recurrence equations for the cost functions to get a weaker bound.

Theorem 2. Assume that k is fixed. Then, for any j and each ⟨u, v⟩ ∈ SD(T), the number of (j, u, v)-minimizers is O(m log^{j−1} m), where m = |T_{uv}|.

The proof of this theorem will appear in the final version. The cost functions are piece-wise linear and non-decreasing. Each slope value is a sum of at most n weights, so there are only O(Bn) possible slopes. Thus:

Lemma 4. Assume that the node weights are positive integers from the range 1, 2, . . . , B, where B is a constant. Then the number of minimizers is O(n).

The Algorithm for Directed Trees. Our algorithm uses the recurrence equations developed earlier to compute the cost functions for all j = 0, . . . , k and all nodes ⟨u, v⟩ of the decomposition tree SD(T).

Algorithm k-MedDirTree
  Compute the spine decomposition tree SD(T)
  Initialize the cost functions for j = 0 and at the leaves of SD(T)
  For j = 1, 2, . . . , k do
    For all ⟨u, v⟩ ∈ SD(T) in bottom-up order do
      compute the functions P^j_{uv}(·) and Q^j_{uv}(·) using the recurrences (1)–(4)
  Return cost_k(T) = min{P^k_r(0), Q^k_r(0)}
Initialization. The only non-trivial question is how to compute efficiently all the functions Q^0_{uv}(α) = αW_{uv} + Q_{uv}. Recall that W_{uv} = Σ_{x∈T_{uv}} w_x and Q_{uv} = Q^0_{uv}(0) = Σ_{x∈T_{uv}} w_x d_{xu}. If ⟨u, v⟩ has two children ⟨u, µ⟩ and ⟨η, v⟩, then we have Q_{uv} = Q_{uµ} + Q^0_{ηv}(d_{ηu}) and W_{uv} = W_{uµ} + W_{ηv}. If v = u and ⟨u, u⟩ has one child ⟨z, ρ⟩, then Q_{uu} = Q^0_{zρ}(d_{zu}) and W_{uu} = W_{zρ} + w_u. Using these recurrences we can compute all the functions Q^0_{uv}(·) in linear time by traversing SD(T) bottom-up.
Running Time. We represent each cost function by the list of its line segments, sorted in order of increasing α. Then the sum and the minimum of two cost functions can be computed in time linear in the total size of these functions. This means that, for a given ⟨u, v⟩ ∈ SD(T), the computation of the functions P^j_{uv}(·) and Q^j_{uv}(·) according to the recurrences (1), (2), (3), and (4) can be done in time proportional to the size of the cost functions for this particular node ⟨u, v⟩ (recall that k is a constant). Thus, by Theorem 2, denoting this cost by time(j, u, v), we have time(j, u, v) = O(|T_{uv}| polylog(n)). For any given j, the running time of the inner loop is bounded by the sum of time(j, u, v) over all ⟨u, v⟩ ∈ SD(T). This time is O(n polylog(n)) due to the fact that the total size of all these subtrees is O(n log n), according to Lemma 3. Thus the overall running time is also O(n polylog(n)). If the weights are positive integers bounded by a constant, as in Lemma 4, or if the linearity conjecture holds, then time(j, u, v) = O(|T_{uv}|).

Theorem 3. Algorithm k-MedDirTree correctly solves the k-median problem. Further, assuming that k is a constant: (a) Algorithm k-MedDirTree runs in time O(n polylog(n)). (b) If the linearity conjecture is true, then Algorithm k-MedDirTree runs in time O(n log² n). (c) If the weights are positive integers bounded by a constant, then Algorithm k-MedDirTree runs in time O(n log² n).
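The operations on cost functions used in this running-time argument — pointwise sum and pointwise minimum of piece-wise linear functions stored as sorted segment lists — are easy to make concrete. A small illustrative sketch (ours; functions are encoded by their breakpoints, and binary search replaces the linear simultaneous sweep for brevity):

```python
from bisect import bisect_right

def _eval(fn, x):
    """fn is a list of breakpoints [(x0, y0), ..., (xm, ym)], x0 < ... < xm;
    evaluate the piece-wise linear function at x by interpolation."""
    i = min(max(bisect_right([p for p, _ in fn], x) - 1, 0), len(fn) - 2)
    (x0, y0), (x1, y1) = fn[i], fn[i + 1]
    return y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def pl_sum(f, g):
    """Pointwise sum: exact on the union of the breakpoints."""
    xs = sorted({x for x, _ in f} | {x for x, _ in g})
    return [(x, _eval(f, x) + _eval(g, x)) for x in xs]

def pl_min(f, g):
    """Pointwise minimum; crossing points are inserted, so for the
    concave functions arising here the result has at most the sum of
    the two complexities, matching the bound quoted in the text."""
    xs = sorted({x for x, _ in f} | {x for x, _ in g})
    out = [(xs[0], min(_eval(f, xs[0]), _eval(g, xs[0])))]
    for a, b in zip(xs, xs[1:]):
        fa, ga = _eval(f, a), _eval(g, a)
        fb, gb = _eval(f, b), _eval(g, b)
        if (fa - ga) * (fb - gb) < 0:          # the two graphs cross in (a, b)
            s = (ga - fa) / ((fb - fa) - (gb - ga))
            out.append((a + s * (b - a), fa + s * (fb - fa)))
        out.append((b, min(fb, gb)))
    return out
```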
4 Undirected Trees
As in the directed case, we need to partition the tree into k subtrees, in order to minimize the total 1-median cost of these subtrees. The difference is that now the facility in each subtree need not be its root. Another way to view the problem is to find a set of k nodes R that are the roots of the subtrees in the partition. We refer to those nodes as cut-nodes, and to the edges between them (except for r) and their parents as cut-edges. (For consistency, we refer to r as a cut-node as well.) We now define several cost functions:

P^j_{uv}(α) = the optimal cost of T_u with at most j facilities, such that (i) at least one of the cut-nodes is in S_{uv}, and (ii) some nodes in T_u may be served by a facility located outside T_u at distance α from u.

←Q^j_{uv}(α) = the optimal cost of T_{uv} with at most j facilities, such that (i) there are no cut-nodes in S_{uv}, and (ii) some nodes in T_{uv} may be served by a facility located outside T_{uv} at distance α from u.

→Q^j_{uv}(α) = the optimal cost of T_{uv} with at most j facilities, such that (i) there are no cut-nodes in S_{uv}, and (ii) some nodes in T_{uv} may be served by a facility located outside T_{uv} at distance α from v.

C^j_{uv}(s) = the optimal cost of T_u with at most j facilities, such that u is served by a facility at s ∈ T_{uv}.
Let also C^j_u = min_{s∈T_u} C^j_u(s), which is simply cost_j(T_u), the cost of T_u with j facilities. Our goal is to compute C^k_r. The complete set of recurrences will be given in the full paper; here we show how to compute P^j_{uv}(α) and C^j_{uv}(s). If the children of ⟨u, v⟩ are ⟨u, µ⟩ and ⟨η, v⟩, then:

P^j_{uv}(α) = min{ P^j_{uµ}(α), min_{i=0,...,j−1} [ ←Q^i_{uµ}(α) + P^{j−i}_{ηv}(α + d_{ηu}) ] }

C^j_{uv}(s) = C^j_{uµ}(s) if s ∈ T_{uµ}, and C^j_{uv}(s) = min_{i=0,...,j} [ →Q^i_{uµ}(d_{µs}) + C^{j−i}_{ηv}(s) ] if s ∈ T_{ηv}.

Otherwise, when v = u and ⟨u, u⟩ has one child ⟨z, ρ⟩, then P^j_{uu}(α) = C^j_u and

C^j_{uu}(s) = min{ P^{j−1}_z(d_{zu}), ←Q^{j−1}_z(d_{zu}) } if s = u, and C^j_{uu}(s) = C^j_z(s) + w_u d_{us} if s ∈ T_z.
Currently, we have not been able to design an algorithm that evaluates these recurrences in subquadratic time in the general case. The crucial difference between the undirected and the directed case is the formula for C^j_u, which involves minimization over the whole tree T_u. However, we show how to do it in two special cases: when k = 3 or when the tree is balanced. The proofs will appear in the full version of this paper.

Theorem 4. The 3-median problem in trees can be solved in time O(n log³ n).

Theorem 5. The k-median problem in a balanced undirected tree (that is, a tree of depth O(log n)) can be solved in time O(k² n log^{k−1} n + n log n) and space O(kn log^{k−2} n).
References

1. S. Arora, P. Raghavan, and S. Rao. Approximation schemes for Euclidean k-medians and related problems. In Proc. 30th Annual ACM Symposium on Theory of Computing (STOC'98), pages 106–113, 1998.
2. V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. In Proc. 33rd Annual ACM Symposium on Theory of Computing (STOC'01), pages 21–29, 2001.
3. V. Auletta, D. Parente, and G. Persiano. Dynamic and static algorithms for optimal placement of resources in a tree. Theoretical Computer Science, 165:441–461, 1996.
4. V. Auletta, D. Parente, and G. Persiano. Placing resources on a growing line. Journal of Algorithms, 26:87–100, 1998.
5. R.R. Benkoczi and B.K. Bhattacharya. Spine tree decomposition. Technical Report CMPT1999-09, School of Computing Science, Simon Fraser University, Canada, 1999.
6. M. Charikar and S. Guha. Improved combinatorial algorithms for facility location and k-median problems. In Proc. 40th Symposium on Foundations of Computer Science (FOCS'99), pages 378–388, 1999.
7. M. Charikar, S. Guha, É. Tardos, and D. Shmoys. A constant-factor approximation algorithm for the k-median problem. In Proc. 31st Annual ACM Symposium on Theory of Computing (STOC'99), pages 1–10, 1999.
8. M. Chrobak, L. Larmore, and W. Rytter. The k-median problem for directed trees. In Proc. 26th International Symposium on Mathematical Foundations of Computer Science (MFCS'01), number 2136 in Lecture Notes in Computer Science, pages 260–271, 2001.
9. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., 1979.
10. R. Gavish and S. Sridhar. Computing the 2-median on tree networks in O(n log n) time. Networks, 26:305–317, 1995.
11. R. Hassin and A. Tamir. Improved complexity bounds for location problems on the real line. Operations Research Letters, 10:395–402, 1991.
12. W.L. Hsu. The distance-domination numbers of trees. Operations Research Letters, 1:96–100, 1982.
13. O. Kariv and S.L. Hakimi. An algorithmic approach to network location problems II: The p-medians. SIAM Journal on Applied Mathematics, 37:539–560, 1979.
14. M.R. Korupolu, C.G. Plaxton, and R. Rajaraman. Analysis of a local search heuristic for facility location problems. Journal of Algorithms, 37:146–188, 2000.
15. B. Li, X. Deng, M. Golin, and K. Sohraby. On the optimal placement of web proxies on the internet: linear topology. In Proc. 8th IFIP Conference on High Performance Networking (HPN'98), pages 485–495, 1998.
16. B. Li, M.J. Golin, G.F. Italiano, X. Deng, and K. Sohraby. On the optimal placement of web proxies in the internet. In IEEE InfoComm'99, pages 1282–1290, 1999.
17. R. Shah and M. Farach-Colton. Undiscretized dynamic programming: faster algorithms for facility location and related problems on trees. In Proc. 13th Annual Symposium on Discrete Algorithms (SODA), pages 108–115, 2002.
18. R. Shah, S. Langerman, and S. Lodha. Algorithms for efficient filtering in content-based multicast. In Proc. 9th Annual European Symposium on Algorithms (ESA), pages 428–439, 2001.
19. A. Tamir. An O(pn²) algorithm for the p-median and related problems on tree graphs. Operations Research Letters, 19:59–64, 1996.
20. A. Vigneron, L. Gao, M. Golin, G. Italiano, and B. Li. An algorithm for finding a k-median in a directed tree. Information Processing Letters, 74:81–88, 2000.
21. G. Woeginger. Monge strikes again: optimal placement of web proxies in the internet. Operations Research Letters, 27:93–96, 2000.
Periodicity and Transitivity for Cellular Automata in Besicovitch Topologies

F. Blanchard¹, J. Cervelle², and E. Formenti³ *

¹ Institut de Mathématique de Luminy, CNRS, Campus de Luminy, Case 907, 13288 Marseille Cedex 9, France
[email protected]
² Laboratoire d'Informatique, Institut Gaspard-Monge, 5 Bd Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée Cedex 2, France
[email protected]
³ Laboratoire d'Informatique Fondamentale de Marseille (LIF), 39 rue Joliot-Curie, 13453 Marseille Cedex 13, France
[email protected]

Abstract. We study cellular automata (CA) behavior in the Besicovitch topology. We solve an open problem about the existence of transitive CA. The proof of this result has some interest in its own right, since it is obtained by using Kolmogorov complexity. To our knowledge it is the first result on discrete dynamical systems obtained using Kolmogorov complexity. We also prove that every CA (in the Besicovitch topology) either has a unique fixed point or an uncountable set of periodic points. This result underlines that CA have a great degree of stability and may be considered a further step towards the understanding of CA periodic behavior.
1 Introduction
In the last twenty years CA have received ever-growing attention as formal models for complex systems, with applications in almost every scientific domain. They consist of an infinite lattice of finite automata, all identical. Each automaton updates its state according to a local rule, on the basis of its current state and of the states of a fixed finite set of neighboring automata. The states of all automata are updated synchronously. A configuration is a snapshot of the states of all automata in the lattice. The simplicity of the definition of this model is in contrast with the wide variety of distinct dynamical behaviors, many of which are not completely understood yet. The dynamical behavior of CA is studied mainly in the context of discrete dynamical systems, by endowing the configurations with the classical Cantor topology (i.e. the product topology obtained when the set of states of the automata is equipped with the discrete topology). Deterministic chaos is one of the most appealing (and poorly understood) dynamical behaviors. Among CA,
Corresponding author
one can find many interesting examples of this kind of behavior. The problem is that the shift map is chaotic according to the most popular definitions of chaos in the literature (see [7] for example). The shift map is a very simple CA which shifts the content of configurations to the left. The chaoticity of this map is somewhat counter-intuitive (see [4, 1] for a discussion on this topic). In fact, the chaoticity of the shift is due mostly to the structure of the topology rather than to the intrinsic complexity of the automaton [8, 5]. In [4], to overcome the drawbacks of the Cantor topology (in the context of chaotic behavior), the authors proposed to replace it with the Besicovitch topology. In [10], the authors proved that this new topology better links the classical notion of sensitivity to initial conditions with the intuitive notion of chaotic behavior. As usual, the introduction of a new result discloses a series of new questions (some of them are reported in [6]). In this paper we solve the problem of finding a transitive CA in the Besicovitch topology (Theorem 2), which is qualified as a challenging open problem in [12]. This result has deep implications for CA dynamics. First, it states that CA are unable to vary arbitrarily the density of differences between two configurations during their evolution. In its turn, this fact implies that the information contained in configurations cannot spread too much during evolutions. Second, the proof technique is of some interest in its own right, since we use Kolmogorov complexity to prove a purely topological property of discrete dynamical systems. The low degree of complexity from a chaotic behavior point of view is underlined by the second main result of the paper: a CA either has a unique fixed point or an uncountable set of periodic points. These two results open the quest for new, more appropriate properties for describing the “complex” behavior of CA dynamics. Some very interesting proposals along this line of thought may be found in [13]. The authors are currently investigating this subject.
2 Cellular Automata
Formally, a CA is a quadruple (d, S, N, λ). The integer d is the dimension of the CA and controls how the cells of the lattice are indexed: indexes of cells take values in Z^d. The symbol S is the finite set of states of the cells, and λ : S^N → S is the local rule, which updates the state of a cell on the basis of a (finite) neighborhood N ⊂ Z^d. A configuration c is a function from Z^d to S and may be viewed as a snapshot of the content of each cell in the lattice. Denote by X the set S^{Z^d} of all configurations. The local rule naturally induces a global rule f_A : S^{Z^d} → S^{Z^d} on the space of configurations, as follows:

∀c ∈ S^{Z^d}, ∀i ∈ Z^d,  f_A(c)(i) = λ(c(i + n_1), . . . , c(i + n_t)),

where N = (n_1, . . . , n_t) and + is addition in Z^d.
In the sequel, when no misunderstanding is possible, we will often make no distinction between a CA and its local rule, and we will denote f_A simply by f. Some sets of configurations play a special role in the study of the dynamical behavior, such as finite and spatially periodic configurations. A configuration c ∈ X is finite if it has a finite number of non-zero cells. A configuration c ∈ X is spatially periodic if there exists p ∈ N such that ∀i ∈ Z, c_i = c_{p+i}. The least p with the above property is the (spatial) period of c. Denote by P the set of spatially periodic configurations. A point x ∈ X is ultimately periodic for f if there exist p, t ∈ N such that ∀h ∈ N, f^{ph+t}(x) = f^t(x). The least integer p with such a property is called the (temporal) period of x, and t is its pre-period. A point is fixed if it has period 1. In this paper, we mainly study one-dimensional CA (d = 1) with S = {0, 1}. For any configuration c ∈ X, c_{a:b} is the word c_a c_{a+1} . . . c_b if b ≥ a, and ε (the empty word) otherwise. The pattern of size 2n + 1 centered at index 0 of a configuration c ∈ X is denoted M_c(n) = c_{−n:n}. For any word u ∈ S*, |u| denotes its length. Finally, if {0, 1} ⊆ S, 0 is the configuration in which all cells are in state 0 and, similarly, 1 is the configuration in which all cells are in state 1. In the sequel, configurations like 0 or 1 are said to be homogeneous configurations. For any c ∈ X, the set O_f(c) = {f^i(c) | i ∈ N} is the orbit of initial condition c for f (we assume f^0(x) = x). The space-time diagram of an initial configuration c ∈ X is a graphical representation of O_f(c), obtained by superposing the configurations c, f(c), . . . , f^n(c), . . . This representation is very useful for visualizing simulations of cellular automata evolutions on a computer. Our main interest is to study CA in the context of discrete dynamical systems, i.e. structures ⟨U, F⟩, where U is a topological (possibly metric) space and F a continuous function from U to itself. When S is endowed with the discrete topology and S^Z with the induced product topology, the global rule f of a CA can be considered as a discrete dynamical system ⟨S^Z, f⟩. The product topology on S^Z is usually called the Cantor topology, since it makes S^Z a compact, totally disconnected and perfect space. One can easily verify that the following metric on configurations induces exactly the Cantor topology on S^Z:

∀x, y ∈ S^Z,  d(x, y) = 2^{−min{|i| : x_i ≠ y_i, i ∈ Z}}.
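As a minimal illustration of the global rule (our sketch; the paper of course works with bi-infinite configurations), here is one synchronous update on a finite window, instantiated with the shift map discussed in the introduction:

```python
def ca_step(local_rule, neighborhood, config):
    """One synchronous CA update on a finite window, following the
    global-rule formula above. Illustrative only: cells outside the
    window are assumed to be in the quiescent state 0."""
    return {i: local_rule(tuple(config.get(i + n, 0) for n in neighborhood))
            for i in config}

# The shift map of the introduction: sigma(c)(i) = c(i + 1), obtained
# with neighborhood N = (1,) and the identity local rule.
shift = lambda states: states[0]
cfg = {i: (1 if i == 0 else 0) for i in range(-3, 4)}
print(ca_step(shift, (1,), cfg))   # the single 1 moves from cell 0 to cell -1
```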
3 The Besicovitch Topology
The topology with which the space of a dynamical system ⟨X, f⟩ is endowed plays a fundamental role in the study of the asymptotic behavior. In particular, it can filter out special intrinsic behaviors and hide unimportant marginal phenomena. The Besicovitch topology was introduced in [4] in order to refine the study of sensitivity to initial conditions for CA. In contrast with the Cantor topology, this one greatly decreases the “importance” of errors near the cell of index zero by giving all cells the same weight. Let d_B be the following function:
d_B(x, y) = lim sup_{n→∞} Δ(x_{−n:n}, y_{−n:n}) / (2n + 1),
where Δ(x_{−n:n}, y_{−n:n}) is the number of positions at which the words x_{−n:n} and y_{−n:n} differ. The function d_B is a pseudo-distance ([4]), called the Besicovitch pseudo-distance. The Besicovitch topology is obtained by taking the quotient space w.r.t. the equivalence relation “being at zero d_B-distance”; denote this relation by ≡̇. In this way, d_B becomes a metric on classes. In the sequel, when no misunderstanding is possible, we will denote d_B simply by d. This topology is suitable for the study of CA. The following proposition allows us to lift a global function from X to itself to a global function from Ẋ to itself.

Proposition 1 ([4]). Any CA is compatible with ≡̇, i.e. x ≡̇ y ⟹ f(x) ≡̇ f(y).

For any CA f, denote by ḟ the function which maps a class c of Ẋ to the class containing all the images by f of the configurations of c. Such an ḟ is called a CA on (Besicovitch) classes.
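The pseudo-distance is a lim sup of finite-window quotients, so it can be approximated numerically. A small sketch of these quotients (ours); it also illustrates why configurations differing on finitely many cells collapse to the same class:

```python
def besicovitch_quotients(x, y, radii):
    """Finite-radius quotients whose lim sup is d_B(x, y); x and y are
    callables from Z to states (illustrative sketch)."""
    return [sum(1 for i in range(-n, n + 1) if x(i) != y(i)) / (2 * n + 1)
            for n in radii]

# Configurations differing exactly on the even cells are at
# pseudo-distance 1/2; configurations differing on finitely many cells
# are at pseudo-distance 0 and hence identified in the quotient space.
x = lambda i: 0
y = lambda i: 1 if i % 2 == 0 else 0
print(besicovitch_quotients(x, y, [10, 100, 1000]))   # tends to 1/2
```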
3.1 Fixed and Periodic Points
Some recent works pointed out that CA are not complex from a purely algorithmic complexity point of view [8, 3, 5]. This fact seems to originate from an intrinsic stability of such systems. It is well known that, in the Cantor topology, if a CA has a non-homogeneous fixed point then it has at least a countable set of fixed points. The Besicovitch topology allows us to go even further: in this section we prove that either a CA has one unique fixed point or it has uncountably many periodic points. Clearly these “new” periodic points are due to the special structure of the Besicovitch space but, analyzing in more detail how they are built, one sees that they are made of larger and larger areas in which the system is periodic even in the Cantor sense. Before introducing the main result of this section (Theorem 1) we need some technical lemmas and notation. If p is an integer and u a word of length at least 2p, then p|u|p denotes the word u_{p+1:|u|−p}, i.e. the word u with its first and last p letters deleted.

Lemma 1. For any CA of global rule f and radius r, it holds that Δ(r|ab|r, f(ab)) ≤ Δ(r|a|r, f(a)) + Δ(r|b|r, f(b)) + 2r, where a and b have length greater than 2r.

The previous result comes from the fact that the image of the concatenation of two words is the concatenation of the images of these words, up to 2r cells of perturbation. An iterated application of this lemma gives an inequality for the concatenation of h words.

Lemma 2. Let (a_i)_{i∈{1,...,h}} be a finite sequence of h words of length greater than 2r. Let x = a_1 . . . a_h be the concatenation of these words. Then, for any CA f of radius r, Δ(r|x|r, f(x)) ≤ Σ_{i=1}^{h} Δ(r|a_i|r, f(a_i)) + 2r(h − 1).
Proof. By induction on the number of concatenated words and Lemma 1.
We will also make use of the well-known Cesàro lemma on series.

Lemma 3 (Cesàro's Lemma). Let (a_n)_{n∈N} and (u_n)_{n∈N} be such that

lim_{n→∞} u_n / a_n = l,

where (a_n) is positive and Σ_n a_n is divergent. Then,

lim_{n→∞} (Σ_{i=0}^{n} u_i) / (Σ_{i=0}^{n} a_i) = l.
Proposition 2. In the Besicovitch space, any CA with two distinct periodic points of periods p_1 and p_2 has an uncountable number of periodic points whose period is lcm(p_1, p_2).

Proof. Let ḟ be a CA on classes with radius r. Let x and y be two periodic points of ḟ with periods p_1 and p_2, respectively. Let p = lcm(p_1, p_2). Let x′ be a member of the class x and y′ a member of the class y. Since x and y are distinct, d(x′, y′) = δ > 0. Hence, there exists a sequence of integers (u_n)_{n∈N} such that, for all integers n > 0:

Δ(x′_{−u_n:u_n}, y′_{−u_n:u_n}) ≥ δu_n.   (1)
Let σ be an increasing function such that u_{σ(0)} > 4r and u_{σ(n+1)} > 2u_{σ(n)}. Let v_n = u_{σ(n)}. By a simple recurrence on n, it holds that:

Σ_{i=0}^{n} v_i ≤ 2v_n.   (2)

Let us construct an injection g from {0, 1}^N into X (not Ẋ) such that the classes of the image set of g are all periodic points of ḟ. Let α be a sequence of {0, 1}^N. For all positive integers i, define the sequences k^α and k′^α as follows:

k_i^α = x′_{1:v_i} if α_i = 0, and k_i^α = y′_{1:v_i} if α_i = 1;
k′_i^α = x′_{−v_i:−1} if α_i = 0, and k′_i^α = y′_{−v_i:−1} if α_i = 1.

Define the configuration g(α) as follows:

g(α) = . . . k′_n^α k′_{n−1}^α . . . k′_2^α k′_1^α k′_0^α 0^{2rp+1} k_0^α k_1^α k_2^α . . . k_{n−1}^α k_n^α . . .
Let us prove that the class containing g(α) is a periodic point of ḟ with period p, i.e. d(g(α), f^p(g(α))) = 0. One has to prove that

lim sup_{n→∞} Δ(g(α)_{−n:n}, f^p(g(α))_{−n:n}) / (2n) = 0.
Since x and y are periodic points of periods p_1 and p_2, respectively, and both p_1 and p_2 are divisors of p, we have that d(x′, f^p(x′)) = d(y′, f^p(y′)) = 0, and then that

lim_{n→∞} Δ(x′_{−n:n}, f^p(x′)_{−n:n}) / (2n) = lim_{n→∞} Δ(y′_{−n:n}, f^p(y′)_{−n:n}) / (2n) = 0.

Hence, it holds that

lim_{n→∞} Δ(x′_{1:n}, f^p(x′)_{1:n}) / n = lim_{n→∞} Δ(y′_{1:n}, f^p(y′)_{1:n}) / n = 0.   (3)
Fix an integer n > r. Let us decompose g(α)_{−n:n} into its k_i^α and k′_i^α factors:

g(α)_{−n:n} = ←k′_h^α k′_{h−1}^α . . . k′_2^α k′_1^α k′_0^α 0^{2rp+1} k_0^α k_1^α k_2^α . . . k_{h−1}^α →k_h^α,

where h depends on n, and →k_h^α [resp. ←k′_h^α] is the prefix [resp. suffix] of k_h^α [resp. k′_h^α] contained in g(α)_{−n:n}. Since f^p is a CA of radius rp, Lemma 2 gives

Δ(g(α)_{−n:n}, f^p(g(α))_{−n:n}) ≤ Δ(rp|0^{2rp+1}|rp, f^p(0^{2rp+1})) + Σ_{i=0}^{h−1} Δ(rp|k_i^α|rp, f^p(k_i^α)) + Σ_{i=0}^{h−1} Δ(rp|k′_i^α|rp, f^p(k′_i^α)) + Δ(rp|→k_h^α|rp, f^p(→k_h^α)) + Δ(rp|←k′_h^α|rp, f^p(←k′_h^α)) + 2rp(2h).

By the definition of k_i^α, →k_h^α is a prefix of x′_{1:n} or of y′_{1:n}. Hence, Δ(rp|→k_h^α|rp, f^p(→k_h^α)) ≤ Δ(rp|x′_{1:n}|rp, f^p(x′_{1:n})) + Δ(rp|y′_{1:n}|rp, f^p(y′_{1:n})). Similarly, Δ(rp|←k′_h^α|rp, f^p(←k′_h^α)) ≤ Δ(rp|x′_{−n:−1}|rp, f^p(x′_{−n:−1})) + Δ(rp|y′_{−n:−1}|rp, f^p(y′_{−n:−1})). Then one has

Δ(g(α)_{−n:n}, f^p(g(α))_{−n:n}) ≤ 1 + Σ_{i=0}^{h−1} Δ(rp|k_i^α|rp, f^p(k_i^α)) + Σ_{i=0}^{h−1} Δ(rp|k′_i^α|rp, f^p(k′_i^α)) + Δ(rp|x′_{−n:n}|rp, f^p(x′_{−n:n})) + Δ(rp|y′_{−n:n}|rp, f^p(y′_{−n:n})) + 4rph.   (4)
By Equation (3), one finds

lim_{i→∞, α_i=1} Δ(rp|k_i^α|rp, f^p(k_i^α)) / v_i = 0,

and similarly

lim_{i→∞, α_i=0} Δ(rp|k_i^α|rp, f^p(k_i^α)) / v_i ≤ lim_{i→∞} Δ(rp|x′_{−v_i:v_i}|rp, f^p(x′_{−v_i:v_i})) / v_i ≤ lim_{n→∞} Δ(x′_{−n:n}, f^p(x′)_{−n:n}) / n = 0.

Summing the two previous equations, we obtain lim_{i→∞} Δ(rp|k_i^α|rp, f^p(k_i^α)) / v_i = 0, and by Cesàro's Lemma we have

lim_{h→∞} (Σ_{i=0}^{h} Δ(rp|k_i^α|rp, f^p(k_i^α))) / (Σ_{i=0}^{h} v_i) = 0,

since the series (v_i)_{i∈N} is positive and divergent. The same argument applies to k′^α, and therefore one has

lim_{h→∞} [ (Σ_{i=0}^{h} Δ(rp|k_i^α|rp, f^p(k_i^α))) / (Σ_{i=0}^{h} v_i) + (Σ_{i=0}^{h} Δ(rp|k′_i^α|rp, f^p(k′_i^α))) / (Σ_{i=0}^{h} v_i) ] = 0.
Since Σ_{i=0}^{h} v_i ≤ 2n, one gets

lim_{n→∞} ( Σ_{i=0}^{h} Δ(rp|k_i^α|rp, f^p(k_i^α)) + Σ_{i=0}^{h} Δ(rp|k′_i^α|rp, f^p(k′_i^α)) ) / (2n) = 0.   (5)
Using Equation (3) again, we obtain that

lim_{n→∞} Δ(rp|x′_{−n:n}|rp, f^p(x′_{−n:n})) / (2n) = lim_{n→∞} Δ(rp|y′_{−n:n}|rp, f^p(y′_{−n:n})) / (2n) = 0.   (6)

Finally,

lim_{n→∞} (4rph + 1) / (2n) = 0,   (7)
since h ≤ log₂(n). Using Equations (5), (6) and (7) inside Equation (4), one finds that lim_{n→∞} Δ(g(α)_{−n:n}, f^p(g(α))_{−n:n}) / (2n) = 0, which implies that g(α) is a periodic point of f of period p.

Let ∼ be the equivalence relation such that α ∼ β if and only if α and β differ only at a finite number of positions, and denote by ≁ its negation. Let α and β be two sequences of {0, 1}^N such that α ≁ β. Let (a_i)_{i∈N} be the increasing sequence of indices where they differ. Then,

d(g(α), g(β)) ≥ lim sup_{n∈N} Δ(g(α)_{−n:n}, g(β)_{−n:n}) / (2n) ≥ lim sup_{i∈N} Δ(k_{a_i}^α, k_{a_i}^β) / (2Σ_{j=0}^{a_i} v_j + 2r + 1) ≥ lim sup_{i∈N} Δ(x′_{1:v_{a_i}}, y′_{1:v_{a_i}}) / (2Σ_{j=0}^{a_i} v_j + 2r + 1).

Applying Equations (1) and (2), we obtain d(g(α), g(β)) ≥ lim sup_{n∈N} δv_{a_n} / (2v_{a_n} + 1) ≥ δ/2 > 0. Hence, g(α) and g(β) are in different classes. Finally, we have proved that all configurations in g({0, 1}^N) are periodic points of period p, and that α ≁ β implies that g(α) and g(β) lie in different classes. Let E be a set containing one member of each equivalence class of ∼. Since {0, 1}^N/∼ is not countable, so is E. By the previous inequality, g restricted to E is injective even at the level of classes, and hence g(E) yields an uncountable set of periodic points of f of period p.

This result has two easy corollaries. The first one is obtained simply by recalling that a fixed point is a periodic point of period 1. The second one comes from the fact that if a CA has a periodic point of period greater than or equal to 2, then it has at least two distinct periodic points.

Corollary 1. If a CA has two fixed points, it has an uncountable number of fixed points.

Corollary 2. If a CA has a periodic point of period p > 1, then its set of periodic points is not countable.

Putting together the results of the two previous corollaries we obtain the main result of this section.
Theorem 1. Any CA has either one and only one fixed point or an uncountable number of periodic points.

Proof. Let f be a CA. There are several cases. First, f may have one and only one fixed point. Second, if f has two fixed points, then it has an uncountable set of periodic points (which in this case are fixed points). Finally, if f has no fixed point, then f(0) = 1 and f(1) = 0, so 0 is a periodic point of period 2 and, by Corollary 2, there are uncountably many periodic points for f.

Proposition 3. If a surjective CA has a blocking word (or, equivalently, an equicontinuity point for the Cantor topology), then its set of periodic points is dense in the Besicovitch topology.

Remark that the same argument can be used to prove a similar result for the Cantor topology, without making use of measure theory, as is done in [2].

Proof. Let f be a CA and w a blocking word for f. One has to prove that, for every configuration x and every real number ε > 0, there exists a periodic configuration at distance less than ε from x. Let y be the following configuration:

∀n ∈ Z, ∀l ∈ {0, . . . , k − 1},  y_{nk+l} = w_l if l < |w|, and y_{nk+l} = x_{nk+l} otherwise,

where k = ⌈2|w|/ε⌉. The configuration y is everywhere equal to x, except that periodically we write w. The number of differences between x_{−n:n} and y_{−n:n} is bounded by the product of |w| and the number of times w is written within y_{−n:n}. Hence, it holds that

d(x, y) = lim sup_{n→∞} Δ(x_{−n:n}, y_{−n:n}) / (2n + 1) ≤ lim_{n→∞} ((2n/k)|w| + |w|) / (2n) ≤ ε/2 < ε.

Now we are going to prove that y is a periodic point of f. Since w is a blocking word, for all integers i and n, the pattern f^{i+1}(y)_{nk+|w|/2 : (n+1)k+|w|/2} depends only on the corresponding word of the same size in the pre-image f^i(y)_{nk+|w|/2 : (n+1)k+|w|/2}.
For all i, n ∈ N, let u_n^{(i)} = f^i(y)_{nk+|w|/2 : (n+1)k+|w|/2}. For any fixed n, the sequence (u_n^{(i)})_{i∈N} has finite range and each term depends only on its predecessor. Hence, it is ultimately periodic. Since by hypothesis the CA is surjective, this implies that y is periodic (for if the sequence were periodic of period p with pre-period k′ > 0, then u_n^{(k′)} would have two pre-images, u_n^{(k′−1)} and u_n^{(k′+p−1)}, which is impossible since a surjective CA is pre-injective – see [9] for more on pre-injectivity). Since the number of possible values for this sequence is at most 2^k, the period of each column is at most 2^k. Hence, the configuration y is periodic and its period is at most lcm{1, 2, . . . , 2^k}.
3.2 Transitive Cellular Automata
As already recalled, Besicovitch topology was introduced in order to further study CA chaotic behavior and, in particular, sensitivity to initial conditions.
In [1], the authors wondered about the existence of transitive CA in this topology. The same problem has been qualified as “challenging” in [12]. In this section we prove that the question has a negative answer. Former research tried to prove or disprove the existence of transitive CA either by looking for counter-examples or by complicated combinatorial proofs. Here we drastically diminish the complexity of the problem by making use of Kolmogorov complexity and the classical approach of the “incompressibility method” (see [11] for more on this subject). Clearly, glancing at our proof, one can now find a purely combinatorial proof by doing some “reverse-engineering”.

For any two words u, w on {0, 1}*, denote by K(u) the Kolmogorov complexity of u and by K(u|w) the Kolmogorov complexity of u conditional to w. We refer to [11] for the precise definitions of these quantities and all the related well-known inequalities. The intuition behind the proof of the next result is that CA cannot increase the algorithmic complexity of configurations. If a CA were transitive in the Besicovitch topology then, given two configurations x, y which differ on a sequence of places of relatively large complexity, it should be able to map a point arbitrarily near to x to a point arbitrarily near to y; but this implies a great change in complexity, contradicting our initial intuition.

Theorem 2. In the Besicovitch topological space there is no transitive CA.

Proof. By contradiction, suppose that there exists a transitive CA f of radius r with C states. Let x and y be two configurations such that, for all integers n, K(x_{−n:n}|y_{−n:n}) ≥ n/2. One can prove that such configurations x and y exist by a simple counting argument. Since f is transitive, there are two configurations x′ and y′ such that

Δ(x_{−n:n}, x′_{−n:n}) ≤ 4εn  and  Δ(y_{−n:n}, y′_{−n:n}) ≤ 4δn   (8)

and an integer u (which only depends on ε and δ) such that

f^u(y′) = x′,   (9)

where ε = δ = (4e^{10 log₂ C})^{−1}. In the sequel of the proof only n varies, while C, u, x, y, x′, y′, δ and ε are fixed and independent of n. By Equation (9), one may compute the word x′_{−n:n} from the following items: y′_{−n:n}, f, u, n, and the twice ur bits of y′ which surround y′_{−n:n} and which are missing to compute x′_{−n:n} with Equation (9). We obtain that

K(x′_{−n:n}|y′_{−n:n}) ≤ 2ur + K(u) + K(n) + K(f) + O(1) ≤ o(n)   (10)
(the notations O and o are defined with respect to n). Now, let us evaluate K(y_{−n:n}|y′_{−n:n}). Let a_1, a_2, a_3, . . . , a_k be the positive positions at which y_{−n:n} and y′_{−n:n} differ, sorted increasingly. Let b_1 = a_1 and b_i = a_i − a_{i−1}, for 2 ≤ i ≤ k. Using Equation (8), we know that k ≤ 4δn. Remark that Σ_{i=1}^{k} b_i = a_k ≤ n.
Similarly, let a′_1, a′_2, a′_3, . . . , a′_{k′} be the absolute values of the strictly negative positions at which y_{−n:n} and y′_{−n:n} differ, sorted increasingly. Let b′_1 = a′_1 and b′_i = a′_i − a′_{i−1}, where 2 ≤ i ≤ k′. Equation (8) states that k′ ≤ 4δn. Since the logarithm is a concave function, one has Σ_i ln b_i ≤ k ln(Σ_i b_i / k), and hence

Σ_i ln b_i ≤ k ln(n/k),   (11)

which also holds for the b′_i and k′. The knowledge of the b_i, of the b′_i, and of the k + k′ states of the cells of y_{−n:n} where y_{−n:n} differs from y′_{−n:n} is enough to compute y_{−n:n} from y′_{−n:n}. Hence, K(y_{−n:n}|y′_{−n:n}) ≤ Σ_i ln(b_i) + Σ_i ln(b′_i) + (k + k′) log₂ C + O(1). Equation (11) gives K(y_{−n:n}|y′_{−n:n}) ≤ k ln(n/k) + k′ ln(n/k′) + (k + k′) log₂ C + O(1). The function k ↦ k ln(n/k) is increasing on [0, n/e]. As k ≤ 4δn = n/e^{10 log₂ C} ≤ n/e, we have that k ln(n/k) ≤ 4δn ln(n/(4δn)) = (10 log₂ C) n / e^{10 log₂ C} ≤ 40n / e^{10 log₂ C}, and that (k + k′) log₂ C ≤ 2 log₂ C · n / e^{10 log₂ C}. Replacing a, b and k by a′, b′ and k′, the same sequence of inequalities leads to the same result. One may deduce that

K(y_{−n:n}|y′_{−n:n}) ≤ (2 log₂ C + 80) n / e^{10 log₂ C} + O(1).   (12)

Similarly, Equation (12) also holds for K(x_{−n:n}|x′_{−n:n}). The triangular inequality for Kolmogorov complexity gives

K(x_{−n:n}|y_{−n:n}) ≤ K(x_{−n:n}|x′_{−n:n}) + K(x′_{−n:n}|y′_{−n:n}) + K(y′_{−n:n}|y_{−n:n}) + O(1).

By Equations (12) and (10) one concludes that K(x_{−n:n}|y_{−n:n}) ≤ (2 log₂ C + 80) n / e^{10 log₂ C} + o(n). The hypothesis on x and y was K(x_{−n:n}|y_{−n:n}) ≥ n/2. This implies that n/2 ≤ (2 log₂ C + 80) n / e^{10 log₂ C} + o(n). The last inequality is false for big enough n.
References

1. F. Blanchard, E. Formenti, and P. Kůrka. Cellular automata in the Cantor, Besicovitch and Weyl topological spaces. Complex Systems, 11:107–123, 1999.
2. F. Blanchard and P. Tisseur. Some properties of cellular automata with equicontinuity points. Annales de l'Institut Henri Poincaré, 36(5):562–582, 2000.
3. C. Calude, P. Hertling, H. Jürgensen, and K. Weihrauch. Randomness on full shift spaces. Chaos, Solitons & Fractals, 1:1–13, 2000.
4. G. Cattaneo, E. Formenti, L. Margara, and J. Mazoyer. A shift-invariant metric on S^Z inducing a non-trivial topology. In I. Privara and P. Ruzicka, editors, MFCS'97, volume 1295 of LNCS, Bratislava, 1997. Springer-Verlag.
5. J. Cervelle, B. Durand, and E. Formenti. Algorithmic information theory and cellular automata dynamics. In MFCS'01, volume 2136 of LNCS, pages 248–259. Springer-Verlag, 2001.
6. M. Delorme, E. Formenti, and J. Mazoyer. Open problems on cellular automata. Submitted, 2002.
7. R. L. Devaney. An Introduction to Chaotic Dynamical Systems. Addison-Wesley, Reading (MA), 1989.
8. J.-C. Dubacq, B. Durand, and E. Formenti. Kolmogorov complexity and cellular automata classification. Theoretical Computer Science, 259(1–2):271–285, 2001.
9. B. Durand. Global properties of cellular automata. In E. Goles and S. Martinez, editors, Cellular Automata and Complex Systems. Kluwer, 1998.
10. E. Formenti. On the sensitivity of cellular automata in Besicovitch spaces. Theoretical Computer Science, 301(1–3):341–354, 2003.
11. M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, second edition, 1997.
12. G. Manzini. Characterization of sensitive linear cellular automata with respect to the counting distance. In MFCS'98, volume 1450 of LNCS, pages 825–833. Springer-Verlag, 1998.
13. B. Martin. Damage spreading and µ-sensitivity in cellular automata. Ergodic Theory & Dynamical Systems, 2002. To appear.
Starting with Nondeterminism: The Systematic Derivation of Linear-Time Graph Layout Algorithms

Hans L. Bodlaender¹, Michael R. Fellows², and Dimitrios M. Thilikos³

¹ Institute of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
[email protected]
² School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW 2308, Australia
³ Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Campus Nord Mòdul C6, c/ Jordi Girona Salgado 1-3, 08034 Barcelona, Spain
Abstract. This paper investigates algorithms for some related graph parameters. Each asks for a linear ordering of the vertices of the graph (or can be formulated as such), and there are constructive linear time algorithms for the fixed parameter versions of the problems. Examples are cutwidth, pathwidth, and directed or weighted variants of these. However, these algorithms have complicated technical details. This paper attempts to present these algorithms in a different, more easily accessible manner, by showing that the algorithms can be obtained by a stepwise modification of a trivial hypothetical non-deterministic algorithm. The methodology is applied to a generalisation of the cutwidth problem to weighted mixed graphs. As a consequence, we obtain new algorithmic results for various problems like modified cutwidth, and rederive known results for other related problems with simpler proofs.

Keywords: Algorithms and data structures; graph algorithms; algorithm design methodology; graph layout problems; finite state automata.
1 Introduction
The notion of pathwidth (and the related notion of treewidth) has been applied successfully for constructing algorithms for several problems. One such application area is for problems where linear orderings of the vertices of a given graph
The first and third authors were partially supported by EC contract IST-1999-14186: Project ALCOM-FT (Algorithms and Complexity – Future Technologies). The second author was supported by the New Zealand Marsden Fund and the Australian Research Council Project DP0344762. The research of the third author was supported by the Spanish CICYT project TIC2000-1970-CE and the Ministry of Education and Culture of Spain (Resolución 31/7/00 – BOE 16/8/00).
are to be found, with a specific parameter of the ordering to be optimised. In this paper, we are interested in a number of related notions which appear to allow the same algorithmic approach for solving them. The central problem in the exposition is the cutwidth problem (see Section 2 for the definition). While cutwidth is an NP-complete problem [14], we are interested in its fixed parameter variant: for fixed k, we ask for an algorithm that, given a graph G, decides whether the cutwidth of G is at most k and, if so, gives a linear ordering of G with cutwidth at most k. This fixed parameter variant of the problem is known to be linear time solvable (with the constant factor depending exponentially on k) [10,12,16]. Such a linear time algorithm can be of the following form: first a path decomposition of bounded pathwidth is found (if the pathwidth of G is more than k, we know that the cutwidth of G is more than k), and then a dynamic programming algorithm is run that uses this path decomposition. Unfortunately, the technical details of this dynamic programming algorithm are rather complex. Other problems that have a similar algorithmic solution are the pathwidth problem itself (see [7,3]), and variants on weighted or directed graphs, including directed vertex separation number [2]. See also [1]. In this paper, we attempt to present these algorithms in a different, more easily accessible manner, by showing that the algorithms can be obtained by a stepwise modification of a trivial hypothetical non-deterministic algorithm. Thus, while our resulting algorithms will not be much different from solutions given in the literature, the reader may understand the underlying principles and the correctness of the algorithms much more easily. Also, we give some new results; e.g., our solution for (directed) modified cutwidth is new. Ingredients of the techniques displayed in this paper appeared in the early 1990s independently in work of Abrahamson and Fellows [1], Lagergren and Arnborg [13], and Bodlaender and Kloks [7]. In [5], a relation between decision and construction versions of algorithms running on path decompositions, with an eye to finite state automata, was established. More background and more references can be found in [11]. Many missing details can be found in [6].
2 Definitions
A mixed graph is a triple G = (V, E, A). E is the set of undirected edges of G, and A the set of (directed) arcs. When not distinguishing between an undirected edge and an arc, we use the term 'edge'.

Definition 1. A path decomposition of a mixed graph G = (V, E, A) is a sequence of subsets of vertices (X_1, X_2, ..., X_r), such that
– ⋃_{1≤i≤r} X_i = V.
– for all (v, w) ∈ E ∪ A, there exists an i, 1 ≤ i ≤ r, with v ∈ X_i and w ∈ X_i.
– for all 1 ≤ i ≤ j ≤ k ≤ r: X_i ∩ X_k ⊆ X_j.

The width of a path decomposition (X_1, X_2, ..., X_r) is max_{1≤i≤r} |X_i| − 1. The pathwidth of G is the minimum width over all possible path decompositions of G.
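The conditions of Definition 1 are easy to check mechanically. The following Python sketch (our own function names, not from the paper) verifies the three conditions; the interpolation condition is tested via the equivalent requirement that every vertex occurs in a contiguous run of bags.

```python
def is_path_decomposition(bags, vertices, edges):
    """Check Definition 1.  bags: list of vertex sets; edges: pairs (v, w)
    (arcs can be passed as pairs too, only their endpoints matter here)."""
    # Condition 1: the bags cover all vertices.
    if set().union(*bags) != set(vertices):
        return False
    # Condition 2: every edge or arc is contained in some bag.
    for v, w in edges:
        if not any(v in X and w in X for X in bags):
            return False
    # Condition 3: each vertex occurs in a contiguous run of bags,
    # which is equivalent to X_i ∩ X_k ⊆ X_j for all i <= j <= k.
    for v in vertices:
        idx = [i for i, X in enumerate(bags) if v in X]
        if idx != list(range(idx[0], idx[-1] + 1)):
            return False
    return True

def width(bags):
    return max(len(X) for X in bags) - 1
```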
Definition 2. A linear ordering of a mixed graph G = (V, E, A) is a bijective function f : V → {1, 2, ..., |V|}, such that for all arcs (v, w) ∈ A: f(v) < f(w). An edge (v, w) ∈ E ∪ A is said to cross vertex x ∈ V in linear ordering f, if f(v) < f(x) < f(w). Edge (v, w) ∈ E ∪ A is said to cross gap i, if f(v) ≤ i < f(w).

Definition 3. Let G = (V, E, A) be a mixed graph, and let f : V → {1, 2, ..., n} be a linear ordering of G, n = |V|. Let c : E ∪ A → N be a function that assigns to every edge a non-negative integer weight.
1. The total weight of the edges crossing gap i is denoted by n_f(i).
2. The weighted cutwidth of f is the maximum over all gaps i, 1 ≤ i ≤ n, of n_f(i).
3. For 1 ≤ i ≤ n, we denote the total weight of the edges and arcs that cross vertex f^{−1}(i) by m_f(i).
4. The modified cutwidth of f is max_{1≤i≤n} m_f(i).
5. The weighted cutwidth or weighted modified cutwidth of a mixed graph G is the minimum weighted cutwidth or weighted modified cutwidth over all possible linear orderings of G.

Interesting cases are the unweighted variants (all edges and arcs have weight one), and the cases when E or A are empty. In this way, we obtain the standard cutwidth, modified cutwidth, directed cutwidth, ... problems as special cases. The pathwidth of a graph is at most its cutwidth, and at most one larger than its modified cutwidth. See [4] for an overview of related notions and results.

Let Σ be a finite alphabet. Σ* is the set of all (possibly empty) strings with symbols in Σ. The concatenation of strings s and t is denoted st. A string s ∈ Σ* is a substring of a string t ∈ Σ*, if there are t′, t′′ ∈ Σ* with t = t′ s t′′. A string s is a subsequence of a string t = t_1 t_2 ... t_r ∈ Σ*, if there are indices 1 ≤ α(1) < α(2) < ... < α(q) ≤ r with s = t_{α(1)} ... t_{α(q)}. Let Σ, Σ_0 be disjoint finite alphabets, and let s be a string in (Σ ∪ Σ_0)*. The string s|Σ is the maximal subsequence of s that belongs to Σ*, i.e., s|Σ is obtained from s by removing all symbols in Σ_0. In this paper, we also consider a linear ordering of G = (V, E) as a string in V*, i.e., a string where every element of V appears exactly once. We say that a string t can be obtained by inserting a symbol v into a string s, if we can write s = s_1 s_2, t = s_1 v s_2, with s_1, s_2 substrings of s.

We say a path decomposition (X_1, ..., X_r) is nice, if |X_1| = 1, and for all i, 1 < i ≤ r, there is a v such that X_i = X_{i−1} ∪ {v} (i is called an introduce node, inserting v), or X_i = X_{i−1} − {v} (i is called a forget node). It is not hard to see (see e.g. [4]) that a path decomposition of width k can be transformed into a nice path decomposition of width k in linear time.

A terminal graph is a triple (V, E, X), with (V, E) a graph, and X a labeled set of distinguished vertices from V, called the terminals. A terminal graph with k terminals is also called a k-terminal graph. Given two k-terminal graphs G and H, G ⊕ H is defined as the graph obtained by taking the disjoint union
of G and H, then identifying the i'th terminal of G with the i'th terminal of H for all i, 1 ≤ i ≤ k, and then dropping parallel edges.

Suppose we have a path decomposition (X_1, ..., X_r) of G = (V, E). To each i, 1 ≤ i ≤ r, we can associate the terminal graph G_i = (V_i, E_i, X_i), with V_i = ⋃_{1≤j≤i} X_j, and E_i = {{v, w} ∈ E | v, w ∈ V_i}.

For a nice path decomposition (X_1, ..., X_r) of G = (V, E), we can build a (k+1)-coloring ℓ : V → {1, ..., k+1} of G, such that for all v, w ∈ V, if there is an i with v, w ∈ X_i, then ℓ(v) ≠ ℓ(w). (Go through the path decomposition from left to right. At each introduce node i, color the inserted vertex in X_i − X_{i−1} different from the other vertices in X_i.) Call ℓ(v) the label of v.

We now introduce a notation for the possible operations that, given a graph G_{i−1}, build the next graph G_i. We denote the terminal with number r as t_r. If i is an introduce node, suppose the inserted vertex v ∈ X_i − X_{i−1} has label l′, and S ⊆ {1, ..., k+1} − {l′} is the set of the labels of those vertices in X_{i−1} = X_i − {v} that are adjacent to v. We can write an introduce operation as I(r, S), meaning an insertion of a new terminal t_r, r ∈ {1, ..., k+1}, with this terminal adjacent to the terminals with numbers in S ⊆ {1, ..., k+1}. The forget operation can be written as F(r), meaning that t_r becomes a non-terminal.

If f_1 is a linear order of G = (V_G, E_G), G is a subgraph of H = (V_H, E_H), and f is a linear order of H, we say that f extends f_1, if f_1^{−1}(1), f_1^{−1}(2), ..., f_1^{−1}(|V_G|) is a subsequence of f^{−1}(1), f^{−1}(2), ..., f^{−1}(|V_H|), i.e., f can be obtained from f_1 by inserting the vertices of V_H − V_G. To a sequence of vertices v_1, ..., v_n (or a linear order f), we associate the set of n + 1 gaps: a gap is the location between two successive vertices, or the location before the first, or after the last vertex. For a linear order f of G = (V, E) and a subset W ⊆ V, we consider the linear order f|W of G[W], where for all v, w ∈ W, f(v) < f(w) if and only if f|W(v) < f|W(w), i.e., f|W is obtained in the natural way from f by dropping all vertices not in W from the sequence. For v, w ∈ V, we write f[v, w] for the sequence f|W with W = {x | f(v) ≤ f(x) ≤ f(w)}, i.e., we take the substring that starts at v and ends at w.

We use the acronyms NFA and DFA for non-deterministic finite state automaton and deterministic finite state automaton. We assume that the reader is familiar with these classic notions, and with the fact that for every NFA A, there is a DFA B that accepts the same set of strings.
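The greedy labelling in the parenthetical remark above can be made concrete as follows; this is an illustrative sketch under our own encoding of a nice path decomposition as a list of introduce and forget operations, and all names are ours. Since every bag has at most k+1 vertices, a free label in {1, ..., k+1} always exists at an introduce node.

```python
def greedy_labels(ops, k):
    """ops: list of ('I', v) / ('F', v) operations of a nice path
    decomposition of width at most k.  Returns a (k+1)-coloring such
    that vertices that ever share a bag get different labels."""
    bag, label = set(), {}
    for op, v in ops:
        if op == 'I':
            used = {label[u] for u in bag}
            label[v] = min(l for l in range(1, k + 2) if l not in used)
            bag.add(v)
        else:                     # forget node: v leaves the bag
            bag.discard(v)
    return label
```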
3 An Algorithm for Weighted Cutwidth
In this section, we give a stepwise derivation of the following result.

Theorem 1. Let k, c be constants. One can construct a linear time algorithm that, given a mixed edge-weighted graph G together with a path decomposition of width at most k, decides if the weighted cutwidth of G is at most c, and if so, finds a linear ordering of G with weighted cutwidth at most c.

Due to space constraints, we only discuss the decision version of the result here. The corresponding ordering can be found with additional bookkeeping. We derive
the algorithm by first giving a naive non-deterministic algorithm for the problem, and then modifying it step by step. We always start the algorithm by making the path decomposition of G nice, without increasing its width.
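Before the derivation, it may help to fix the semantics of the decision problem itself. The following brute-force sketch (ours; it tries all |V|! orderings, so it only serves as a reference implementation, not as the linear time algorithm of Theorem 1) evaluates n_f(i) and the weighted cutwidth directly from Definitions 2 and 3.

```python
from itertools import permutations

def weighted_cutwidth(order, edges):
    """Weighted cutwidth of one ordering.  edges: tuples (u, v, w, directed);
    for an arc, u must precede v.  Returns None if an arc points backwards."""
    pos = {v: i + 1 for i, v in enumerate(order)}
    if any(d and pos[u] >= pos[v] for u, v, w, d in edges):
        return None
    n = len(order)
    if n <= 1:
        return 0
    # n_f(i): total weight of edges crossing gap i, i.e. f(v) <= i < f(w).
    return max(sum(w for u, v, w, d in edges
                   if min(pos[u], pos[v]) <= i < max(pos[u], pos[v]))
               for i in range(1, n))

def cutwidth_at_most(vertices, edges, c):
    return any((cw := weighted_cutwidth(list(p), edges)) is not None
               and cw <= c
               for p in permutations(vertices))
```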
3.1 A Trivial Non-deterministic Algorithm
The following trivial non-deterministic algorithm finds a linear order of G, and solves the weighted cutwidth problem. It builds the order by inserting the vertices one by one as they appear in the path decomposition; thus after step i, we have a linear order of G_i.
1. Start with an empty sequence.
2. Now, go through the path decomposition from left to right. If we deal with the ith node of the path decomposition, then
   (a) If the ith node is an introduce node of vertex v, then insert v non-deterministically at some gap in the sequence such that the resulting sequence has weighted cutwidth at most c, and no arc from or to v is directed in the wrong direction. If there is no such gap, halt and reject.
   (b) If the ith node is a forget node of vertex v, then we do nothing.
3. If all nodes of the path decomposition have been handled, then accept.
3.2 A Non-deterministic Algorithm that Counts Edges
To determine whether a vertex can be inserted at some place in the sequence without violating the cutwidth condition, the algorithm above needs to consult the graph G. Instead, we can keep the information about the total weight of edges that cross gaps in the sequence. Now, we use sequences of the form n_0^f, v_1, n_1^f, v_2, ..., n_{r−1}^f, v_r, n_r^f, i.e., we alternatingly have a number that tells the total weight of edges crossing a gap, and a vertex. I.e., the sequence gives the ordering of the vertices and the weights of edges across the gaps. When inserting a vertex v in some gap, the information needed to construct a new sequence from the old sequence is only the sequence and the list of neighbours of v. In the remainder, we often drop the superscript f.

We use the same algorithm as above, but now start with the sequence 0, and we have to detail how an insertion of a vertex v takes place now.
1. Non-deterministically, a gap with number n_j in the sequence is chosen, such that for every arc (v, x) ∈ A, the gap comes before x in the sequence, and for every arc (x, v) ∈ A, the gap comes after x in the sequence. If no such gap exists, halt and reject.
2. n_j is replaced by n_j, v, n_j.
3. For every terminal x with (v, x) ∈ E ∪ A, add the weight of (v, x) to every number in the sequence between x and v.
4. If we obtain a number that is c + 1 or larger, halt and reject.
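The insertion operation on such sequences can be sketched as follows, under our own encoding of a sequence as a Python list with gap weights at even positions and vertex names at odd positions; the direction check for arcs (step 1) is omitted in this sketch.

```python
def insert_vertex(seq, v, gap, nbr_weights, c):
    """Insert v at the gap with index `gap`; nbr_weights maps each terminal
    adjacent to v to the weight of the connecting edge.  Returns the new
    sequence, or None if some gap weight exceeds c (step 4)."""
    p = 2 * gap                                 # list position of chosen gap
    new = seq[:p + 1] + [v] + seq[p:]           # step 2: n_j -> n_j, v, n_j
    for x, w in nbr_weights.items():            # step 3: raise crossed gaps
        i, j = sorted((new.index(v), new.index(x)))
        for q in range(i + 1, j, 2):            # gaps strictly between v and x
            new[q] += w
    if any(new[q] > c for q in range(0, len(new), 2)):
        return None
    return new

# Example: insert_vertex([0, 'a', 0], 'b', 1, {'a': 2}, c=3)
# returns [0, 'a', 2, 'b', 0]: the edge ab of weight 2 crosses one gap.
```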
3.3 A Non-deterministic Decision Algorithm
Now note that we can forget the names of non-terminal vertices, because vertices that are later inserted have no edges to non-terminals. The algorithm of the previous step is modified as follows: If the ith node is a forget node of the form F(r), then we replace t_r in the sequence by a symbol −.
3.4 A Non-deterministic Finite State Automaton
This step gives the crucial observation that turns the method first into an NFA; later steps then transform the NFA into a deterministic linear time algorithm.

First, we give an example. Suppose we have a substring 3 − 5 − 7 in the sequence. Then one can note that when the non-deterministic algorithm succeeds in completing the sequence, it can also do so by not inserting any vertex on the gap of the 5: everything it inserts there can also be inserted at the 3. Thus, the 5 can be forgotten from the sequence. More generally, we have:

Lemma 1. Let G_i = (V_i, E_i, X), i = 1, 2, be terminal graphs with the same set X of terminals. Let f be a linear order of G_1 = (V_1, E_1, X) of weighted cutwidth at most c, and let f′ be a linear order of G_1 ⊕ G_2 of weighted cutwidth at most c such that f′ extends f. Suppose we have for 1 ≤ j_1 < j_2 ≤ |V_1|:
– X ∩ {f^{−1}(j_1 + 1), f^{−1}(j_1 + 2), ..., f^{−1}(j_2 − 1), f^{−1}(j_2)} = ∅.
– n_f(j_1) = min_{j_1 ≤ j ≤ j_2} n_f(j).
– n_f(j_2) = max_{j_1 ≤ j ≤ j_2} n_f(j).
Let f′′ be the linear order of G_1 ⊕ G_2 that is obtained from f′ by replacing the substring f′[f^{−1}(j_1), f^{−1}(j_2)] by the substring f^{−1}(j_1) · (f′[f^{−1}(j_1), f^{−1}(j_2)])|_{V_2 − X} · f[f^{−1}(j_1 + 1), f^{−1}(j_2)]. Then the cutwidth of f′′ is at most c.

The change in the linear ordering is graphically depicted in Figure 1. The proof (omitted here) is by case analysis, considering the different locations for gaps and distinguishing between different 'types' of edges. Case analysis shows that f′′ still preserves directions of arcs. A similar lemma can be proved for the case that n_f(j_1) = max_{j_1 ≤ j ≤ j_2} n_f(j) and n_f(j_2) = min_{j_1 ≤ j ≤ j_2} n_f(j), and all other conditions are as in the lemma.

The main reason why this lemma is interesting is the following corollary.

Corollary 1. Consider the non-deterministic decision algorithm. Suppose at some point we have a sequence s_1 · s_2 · s_3, with s_2 = n_1 − n_2 − ... − n_q, where s_2 does not contain a character of the form t_r. Suppose it holds that n_1 = min{n_1, ..., n_q} and n_q = max{n_1, ..., n_q}, or that n_1 = max{n_1, ..., n_q} and n_q = min{n_1, ..., n_q}. Then, if there is an extension of the sequence that corresponds to a linear order of G of cutwidth at most c, there is such an extension that does not insert any vertex on the gaps corresponding to the numbers n_2, ..., n_{q−1} in substring s_2.
Fig. 1. A graphical depiction of the change going from f′ to f′′ (figure not reproduced; its labels marked the positions f^{−1}(j_1), f^{−1}(j_1 + 1), f^{−1}(j_1) + 2, and f^{−1}(j_2)).
This gives an obvious modification to the non-deterministic algorithm: when choosing where to insert a vertex, forbid to insert a vertex on any of the gaps n_2, ..., n_{q−1}, as indicated in Corollary 1. But then we may note that when we do not insert at the gaps corresponding to these numbers n_2, ..., n_{q−1}, we can actually forget these numbers: if we insert v, and an edge e with endpoint v crosses one of these gaps, then e also crosses the gap with value max{n_1, n_q}; thus we can drop the numbers n_2, ..., n_{q−1} from the sequence.

The discussion leads to observing that the following non-deterministic algorithm indeed also correctly decides whether the cutwidth is at most c. The insertion of a vertex is still done as in Section 3.2; the main difference in the algorithm below is in the compression operation.
1. Start with the sequence 0.
2. Now, go through the path decomposition from left to right. If we deal with the ith node of the path decomposition, then
   (a) If the ith node is an introduce node of the form I(r, S), then we insert t_r non-deterministically at some gap in the sequence such that the resulting sequence has weighted cutwidth at most c. If there is no such gap, halt and reject.
   (b) If the ith node is a forget node of the form F(r), then we replace t_r in the sequence by a symbol −.
   (c) In both cases, check if the sequence has a substring of the form n_1 − n_2 − ... − n_q with {n_1, n_q} = {min_{1≤j≤q} n_j, max_{1≤j≤q} n_j}. If so, replace the substring n_1 − n_2 − ... − n_q by the substring n_1 − n_q. Repeat this step until such a replacement is no longer possible. (We call this a compression operation.)
3. If all nodes of the path decomposition have been handled, then output yes.

A sequence of integers that cannot be made smaller by the compression operation is called a typical sequence. There are at most (8/3) · 2^{2c} typical sequences of integers in {0, 1, ..., c} ([7, Lemma 3.5]). As we have at most k + 1 terminals, and between every pair of terminals there is a typical sequence, the number of possible sequences that can arise is bounded by a function of k and c, i.e., constant if k and c are constants. (See [15, Lemma 3.2] for an estimate.)
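The compression operation of step 2(c) admits a direct, if naive, implementation on a run of gap weights that contains no terminal (our sketch):

```python
def compress(nums):
    """Repeatedly replace a substring n_1, ..., n_q whose endpoints are
    exactly its minimum and maximum by the pair n_1, n_q."""
    changed = True
    while changed:
        changed = False
        for i in range(len(nums)):
            for j in range(i + 2, len(nums)):
                seg = nums[i:j + 1]
                if {seg[0], seg[-1]} == {min(seg), max(seg)}:
                    nums = nums[:i + 1] + nums[j:]   # keep only endpoints
                    changed = True
                    break
            if changed:
                break
    return nums

# compress([3, 5, 7]) == [3, 7], matching the example above;
# [3, 5, 2, 6] is already typical and is returned unchanged.
```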
This implies that the algorithm can be viewed as an NFA. The input to the automaton is a string that describes the nice path decomposition, with symbols from the finite alphabet {I(r, S) | 1 ≤ r ≤ k+1, S ⊆ {1, ..., k+1}} ∪ {F(r) | 1 ≤ r ≤ k+1}, i.e., the input string is the sequence of successive introduce and forget operations that give the nice path decomposition. The states of the automaton are the different strings that can be formed during the process: the number of different such strings, and hence also the number of states, is bounded by a constant, depending only on k and c. The possible next states are determined by the symbol (the type of node we deal with in the path decomposition), possibly a non-deterministic choice (where to insert the new vertex), and the old state (the sequence into which the vertex is inserted, or the sequence where the forgotten node is replaced by a −).
3.5 A Deterministic Decision Algorithm
As is known for finite state automata, NFAs recognise the same set of languages as DFAs. We can employ the (actually simple) tabulation technique here too, and arrive at our deterministic algorithm. Thus, we take the algorithm of Section 3.4, and make it deterministic, using this technique.
1. Start with a set of sequences A_0 that initially contains one sequence 0.
2. Now, go through the path decomposition from left to right. If we deal with the ith node of the path decomposition, then
   (a) Set A_i = ∅ and B_i = ∅.
   (b) If the ith node is an introduce node of the form I(r, S), then do for every sequence s ∈ A_{i−1}: For every gap in the sequence s, look at the sequence s′ obtained by inserting t_r in s at that gap. If s′ has weighted cutwidth at most c, preserves directions of arcs to and from t_r, and s′ ∉ B_i, then insert s′ in B_i.
   (c) If the ith node is a forget node of the form F(r), then for every sequence s ∈ A_{i−1}, let s′ be the sequence obtained by replacing t_r in s by a symbol −. If s′ ∉ B_i, then insert s′ in B_i.
   (d) In both cases, for every s′ ∈ B_i, let s′′ be the sequence obtained by applying the compression operation to s′. If s′′ ∉ A_i, then insert s′′ in A_i.
3. If all r nodes of the path decomposition have been handled, then output yes, if and only if A_r ≠ ∅.

I.e., we just tabulate all possible sequences the non-deterministic algorithm can attain at its steps, and thus arrive at an equivalent deterministic algorithm. We mentioned earlier that the number of possible sequences is bounded by a function of k and c; thus, if k and c are fixed, each set A_i is of constant size. Hence, the algorithm above uses linear time.

The standard technique that turns a dynamic programming algorithm for a decision problem into an algorithm that constructs solutions can be employed here too: additional bookkeeping is done, and when the decision problem is positively solved, the kept information is used to construct a solution. We omit the details.
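To make the control flow of the tabulation concrete, the following self-contained sketch specialises it to unweighted, undirected graphs and omits the compression step (so the state sets stay finite per step, but are not of constant size; with compression added they would be). The operation encoding and all names are ours.

```python
def decide_cutwidth(ops, c):
    """ops: nice path decomposition as ('I', v, nbrs) / ('F', v), where
    nbrs lists the terminals adjacent to the introduced vertex v.
    States are sequences n_0, t_1, n_1, ... with gaps at even indices."""
    states = {(0,)}
    for op in ops:
        nxt = set()
        if op[0] == 'I':
            _, v, nbrs = op
            for s in states:
                for gap in range(0, len(s), 2):
                    t = list(s[:gap + 1]) + [v] + list(s[gap:])
                    for x in nbrs:                 # raise crossed gap weights
                        i, j = sorted((t.index(v), t.index(x)))
                        for p in range(i + 1, j, 2):
                            t[p] += 1
                    if max(t[p] for p in range(0, len(t), 2)) <= c:
                        nxt.add(tuple(t))
        else:
            _, v = op
            for s in states:
                nxt.add(tuple('-' if e == v else e for e in s))
        states = nxt
    return bool(states)

# Example: a triangle a-b-c has cutwidth 2:
# ops = [('I','a',[]), ('I','b',['a']), ('I','c',['a','b'])]
# decide_cutwidth(ops, 2) is True; decide_cutwidth(ops, 1) is False.
```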
4 Other Problems
The technique can also be applied to several other problems. Some well studied problems are just a special case of the weighted cutwidth for mixed graphs problem, like the cutwidth (undirected graphs with all edges weight one) and directed cutwidth (directed graphs with all arcs weight one) problems. For some other problems, like pathwidth and modified cutwidth, we can derive a linear time algorithm using a transformation, or constructing the algorithm in the same manner as we did for weighted cutwidth.

Theorem 2. For each fixed k, l, each of the following problems can be solved in linear time, assuming in each case that G is given together with a path decomposition of width at most l:
1. Given an undirected graph G, determine if the pathwidth of G is at most k, and if so, find a path decomposition of width at most k.
2. Given a directed acyclic graph G, determine if the directed vertex separation number of G is at most k, and if so, find a corresponding topological sort.
3. Given an undirected, directed, or mixed (weighted) graph G, determine if the (weighted) modified cutwidth of G is at most k, and if so, find a corresponding linear ordering.

Similar results hold for weighted modified cutwidth with positive weights. The result for pathwidth gives an easier proof of a result from [7]. In [2] it was stated without proof that directed vertex separation number is linear time solvable.
5 Conclusions
The techniques described here only deal with problems where a linear order has to be found, and as a common characteristic we have that yes-instances have bounded pathwidth. Similar algorithms are known, however, for notions that imply bounded treewidth (like branchwidth [8], carving width [16], and treewidth itself [13,7]). In order to be able to present or extend such algorithms as we did above, additional techniques have to be added to the machinery, in particular:
– The desired output of the problem has a tree structure. Basically, one should show that certain parts of the tree (tree- or branch-decomposition) can be forgotten in the algorithm, and that the remainder of the tree can then be formed by gluing a constant number of paths together.
– The input has bounded treewidth, thus a nice tree decomposition can be formed. We now have, in addition to the introduce and forget operations, a join operation, where two partial solutions of two subgraphs have to be combined, basically by 'interleaving' these two.

The constant factors of the algorithms resulting from the methodology presented in this paper are very large. Finding improvements that lead towards implementations that are fast enough in practice for moderate values of k remains a challenge. Finally, starting with a non-deterministic algorithm and turning it stepwise into a deterministic algorithm appears to be a technique that is useful for the design of fixed parameter tractability results (see [9]), and worthy of further investigation.
References

1. K. R. Abrahamson and M. R. Fellows. Finite automata, bounded treewidth and well-quasiordering. In Proc. of the AMS Summer Workshop on Graph Minors, Graph Structure Theory, Contemporary Mathematics vol. 147, pp. 539–564. American Mathematical Society, 1993.
2. H. Bodlaender, J. Gustedt, and J. A. Telle. Linear-time register allocation for a fixed number of registers. In Proc. of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 574–583. ACM, 1998.
3. H. L. Bodlaender. Treewidth: Algorithmic techniques and results. In Proc. 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS'97, pp. 19–36. LNCS, vol. 1295, Springer-Verlag, 1997.
4. H. L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theor. Comp. Sc., 209:1–45, 1998.
5. H. L. Bodlaender, M. R. Fellows, and P. A. Evans. Finite-state computability of annotations of strings and trees. In Proc. Conference on Pattern Matching, pp. 384–391, 1996.
6. H. L. Bodlaender, M. R. Fellows, and D. M. Thilikos. Derivation of algorithms for cutwidth and related graph layout problems. Technical Report UU-CS-2002-032, Inst. of Inform. and Comp. Sc., Utrecht Univ., Utrecht, the Netherlands, 2002.
7. H. L. Bodlaender and T. Kloks. Efficient and constructive algorithms for the pathwidth and treewidth of graphs. J. Algorithms, 21:358–402, 1996.
8. H. L. Bodlaender and D. M. Thilikos. Constructive linear time algorithms for branchwidth. In Proc. 24th International Colloquium on Automata, Languages, and Programming, pp. 627–637. LNCS, vol. 1256, Springer-Verlag, 1997.
9. J. Chen, D. K. Friesen, W. Jia, and I. Kanj. Using nondeterminism to design deterministic algorithms. In Proc. 21st Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2001, pp. 120–131. LNCS, vol. 2245, Springer-Verlag, 2001.
10. M.-H. Chen and S.-L. Lee. Linear time algorithms for k-cutwidth problem. In Proc. Third International Symposium on Algorithms and Computation, ISAAC'92, pp. 21–30. LNCS, vol. 650, Springer-Verlag, 1992.
11. R. G. Downey and M. R. Fellows. Fixed-parameter tractability and completeness I: Basic results. SIAM J. Comput., 24:873–921, 1995.
12. N. G. Kinnersley and W. M. Kinnersley. Tree automata for cutwidth recognition. Congressus Numerantium, 104:129–142, 1994.
13. J. Lagergren and S. Arnborg. Finding minimal forbidden minors using a finite congruence. In Proc. of the 18th International Colloquium on Automata, Languages and Programming, pp. 532–543. LNCS vol. 510, Springer-Verlag, 1991.
14. B. Monien and I. H. Sudborough. Min cut is NP-complete for edge weighted trees. Theor. Comp. Sc., 58:209–229, 1988.
15. D. M. Thilikos, M. J. Serna, and H. L. Bodlaender. A constructive linear time algorithm for small cutwidth. Technical Report LSI-00-48-R, Departament de Llenguatges i Sistemes Informatics, Univ. Politecnica de Catalunya, Barcelona, Spain, 2000.
16. D. M. Thilikos, M. J. Serna, and H. L. Bodlaender. Constructive linear time algorithms for small cutwidth and carving-width. In Proc. 11th International Symposium on Algorithms And Computation, ISAAC '00, pp. 192–203. LNCS vol. 1969, Springer-Verlag, 2000.
Error-Bounded Probabilistic Computations between MA and AM

Elmar Böhler, Christian Glaßer, and Daniel Meister

Theoretische Informatik, Universität Würzburg, 97074 Würzburg, Germany
{boehler,glasser,meister}@informatik.uni-wuerzburg.de
Abstract. We introduce the probabilistic complexity class SBP. This class emerges from BPP by keeping the promise of a probability gap but decreasing the probability limit to exponentially small values. We locate SBP in the polynomial-time hierarchy, more precisely, between MA and AM. We provide evidence that SBP does not coincide with these and other known complexity classes. We construct an oracle relative to which SBP is not contained in Σ_2^P. We provide a new characterization of BPPpath. This characterization shows that SBP is a subset of BPPpath. Consequently, there is an oracle relative to which BPPpath is not contained in Σ_2^P.
1 Introduction
The use of randomness provides an extension of conventional deterministic Turing machines. The origins of this idea go back to the work of de Leeuw, Moore, Shannon, and Shapiro [dLMSS56]. In 1972, Gill started the investigation of probabilistic polynomial-time bounded machines [Gil72,Gil77]. Such machines can be considered to be usual polynomial-time Turing machines but choose every step randomly with a certain probability. Usually, the choice is between two steps of equal probability 1/2. An input x is accepted by a probabilistic machine if it is accepted with probability at least 1/2 (the probability limit). The complexity class containing all sets acceptable by probabilistic polynomial-time bounded machines is denoted by PP.

A well-known subclass of PP is BPP (bounded-error probabilistic polynomial-time) [Gil72,Gil77]. For each language L in this class, there exists ρ > 1/2 and a probabilistic polynomial-time decision procedure which finds the correct answer to arbitrary queries "x ∈ L?" with probability larger than ρ. Making use of an amplification technique one can increase this probability of success to values arbitrarily close to 1, which means that almost every probabilistic computation gives the correct answer. The only difference between PP and BPP is the existence of a probability gap ε > 0: a probabilistic machine deciding a set in BPP promises not to accept or reject with a probability in the interval [1/2 − ε, 1/2 + ε].

What happens if we keep the gap but lower the limit of 1/2? Nothing changes as long as the probability limit is decreased by a polynomial factor. However, if we decrease the probability limit by an exponential factor, then we seem to leave BPP and obtain a new complexity class. Languages in this class are accepted by probabilistic polynomial-time machines that may have an exponentially small probability limit and keep the promise of a probability gap. We denote this class
by SBP (small bounded-error probability). We stick to the notation SBP rather than to SBPP, since the latter denotes the restriction of BPP where a semi-random source of randomness is used [Vaz86].

So far, SBP is an extension of BPP. However, SBP appears in other contexts: The class PP can be defined via GapP functions. Since these functions have different characterizations [FFK94,Gup95], the following statements are equivalent to saying that L ∈ PP.
1. There exists a balanced nondeterministic polynomial-time machine M such that x ∈ L if and only if acc_M(x) > rej_M(x).
2. There exist f, g ∈ #P such that x ∈ L if and only if f(x) > g(x).
3. There exist f ∈ #P and g ∈ FP such that x ∈ L if and only if f(x) > g(x).

Interestingly, this equivalence completely disappears if we demand a gap. By this, we mean that there must be some ε > 0 such that either acc_M(x) > (1+ε) · rej_M(x) or acc_M(x) < (1−ε) · rej_M(x) for statement 1, and f(x) > (1+ε) · g(x) or f(x) < (1−ε) · g(x) for statements 2 and 3. The modified statement 1 defines BPP. We show that the modified statement 2 describes exactly the class BPPpath. This characterization of BPPpath is new and of interest in its own right. We show that, apart from the original definition of SBP, one can allow any polynomial-time computable probability limit. Therefore, statement 3 equipped with a gap exactly describes SBP.

So if we start from three equivalent characterizations of PP and introduce a gap, then the equivalence of the three statements disappears and we obtain the promise classes BPP, BPPpath, and SBP. In particular, this shows that SBP can be thought of as a restriction of BPPpath, and therefore BPP ⊆ SBP ⊆ BPPpath. Further relations of SBP with gap-definable counting classes as well as detailed proofs can be found in the technical report [BGM02].

Paper Outline. In section 3, we formally introduce SBP and give alternative definitions of this class. Since SBP allows amplification, we can prove that SBP is closed under union. With a technique of universal hash functions [Sip83,BM88] we show that SBP is in the polynomial-time hierarchy. More precisely,

BP·UP ∪ MA ⊆ SBP ⊆ BPPpath ∩ AM.   (1)
By the results of Babai [Bab85], it follows that SBP ⊆ Π_2^P. In section 4, we provide evidence for the strictness of the inclusions given in (1). This is done by means of collapse consequences and oracle constructions. We construct an oracle relative to which SBP is not contained in Σ_2^P. This indicates that SBP might not be closed under complementation. As a consequence, we obtain that BPPpath ⊈ Σ_2^P with respect to this oracle. This answers an open question of Han, Hemaspaandra, and Thierauf [HHT97] who ask for BPPpath's relationship to R^NP and Σ_2^P.

Our oracle separates two complexity classes by a single diagonalization argument. However, this argument is not easy to obtain. The construction extends ideas from Baker and Selman [BS79] by including redundant encoding of strings. We introduce a suitable witness language that is decidable in SBP. Then, we use a large bit string and encode it in a redundant way into the oracle. An SBP machine can process all this information. However, if a Σ_2^P machine can simulate the SBP machine, then it reveals information about the encoded data. Because of the redundant encoding, this is enough to completely
reconstruct the original bit string. This gives a description of the bit string that is too short to be possible. In this way we can diagonalize against every Σ_2^P machine. In particular, Σ_2^P ≠ Π_2^P relative to our oracle.
2 Preliminaries
We fix the alphabet Σ =df {0, 1}. For a nondeterministic polynomial-time Turing machine M, let acc_M(x) and rej_M(x) denote the number of accepting and rejecting paths of M on input x, respectively. Let total_M(x) =df acc_M(x) + rej_M(x) denote the total number of computation paths. Throughout the paper, if not stated otherwise, variables are natural numbers and polynomials have natural coefficients. The characteristic function of a set B is denoted by c_B. When we talk about probabilistic machines, we mean balanced machines, unless we explicitly announce them to be unbalanced (as needed in the definition of BPPpath). Let M be a (possibly unbalanced) probabilistic machine. For every B ⊆ Σ* × Σ*, every function f : N → N, and every x ∈ Σ*, let

count_B^{=f}(x) =df #{y : |y| = f(|x|) and (x, y) ∈ B}.
For B ∈ P and a polynomial p, obviously count_B^{=p} is in #P. With the help of so-called operators one can, starting from an existing complexity class C, define new classes. We say A ∈ ∃·C if there are B ∈ C and a polynomial p such that for all x ∈ Σ*, x ∈ A ⇐⇒ count_B^{=p}(x) ≥ 1. We say A ∈ ∀·C if there are B ∈ C and a polynomial p such that for all x ∈ Σ*, x ∈ A ⇐⇒ count_B^{=p}(x) = 2^{p(|x|)} [Sto77,Wra77]. We say A ∈ BP·C if there are B ∈ C, a polynomial p, and ε > 0 such that for all x ∈ Σ*: if x ∈ A, then count_B^{=p}(x) > (1/2 + ε) · 2^{p(|x|)}, and if x ∉ A, then count_B^{=p}(x) < (1/2 − ε) · 2^{p(|x|)} [Sch89]. We say A ∈ U·C if there are B ∈ C and a polynomial p such that for all x ∈ Σ*: if x ∈ A, then count_B^{=p}(x) = 1, and if x ∉ A, then count_B^{=p}(x) = 0. It is obvious that ∃·P = NP, ∀·P = coNP, and BP·P = BPP [Gil77,Sch89].
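For tiny parameters, the counting function underlying these operators can be evaluated by brute force; the following sketch (our names, with B given as a predicate) is meant purely as an illustration of the definition of count_B^{=f}.

```python
from itertools import product

def count_eq(B, x, m):
    """count_B^{=m}(x): number of strings y of length m with (x, y) in B."""
    return sum(1 for bits in product('01', repeat=m)
               if B(x, ''.join(bits)))

# Example: with B(x, y) = (y.count('1') >= 1) and m = 3,
# count_eq(B, x, 3) == 7 for every x, since 2**3 - 1 of the y qualify.
```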
3 The Class SBP

The class BPP is defined to be the class of all sets that can be decided by a probabilistic polynomial-time machine such that the acceptance probability must never be in a (closed) interval of non-zero length around 1/2. Even a very small forbidden interval suffices to admit a probability amplification, which means in the case of BPP that the amplified machine outputs the correct result with a probability arbitrarily close to 1. We introduce the class SBP, which is defined similarly, except that the value 1/2 is lowered to exponentially small values.

Definition 1. The set A is in SBP if there exist B ∈ P, polynomials p and q, and ε > 0 such that for all x ∈ Σ*:

x ∈ A =⇒ count_B^{=q}(x) > (1 + ε) · 2^{p(|x|)}
x ∉ A =⇒ count_B^{=q}(x) < (1 − ε) · 2^{p(|x|)}
From this definition, SBP seems to be more powerful than BPP. In contrast, definitions that arise from those of BPP and SBP by not demanding a gap both lead to PP.

3.1 Properties of SBP
The gap in the definition of SBP allows amplification. However, in contrast to BPP, it does not seem to be possible to raise the probability of the correct answer. At least, we can lower the probability of obtaining the wrong result to values arbitrarily close to 0, if the input has to be rejected.

Proposition 1 (Amplification). A ∈ SBP if and only if for every polynomial r > 0 there exist B ∈ P and polynomials q and s such that for all x ∈ Σ*:

x ∈ A =⇒ count_B^{=q}(x) > 2^{r(|x|)} · 2^{s(|x|)}
x ∉ A =⇒ count_B^{=q}(x) < (1/2^{r(|x|)}) · 2^{s(|x|)}
Apart from the original definition of SBP, we can allow any polynomial-time computable probability limit.

Proposition 2. A set A is in SBP if and only if there exist f ∈ #P, g ∈ FP, and ε > 0 such that for all x ∈ Σ*:

x ∈ A =⇒ f(x) > (1 + ε) · g(x)
x ∉ A =⇒ f(x) < (1 − ε) · g(x)

Proof. It suffices to show the implication from right to left. Without loss of generality, we may assume that g > 0. Let q be a polynomial such that f(x) ≤ 2^{q(|x|)} for all x ∈ Σ*, and choose a polynomial p such that (ε/2) · 2^{p(n)} > 2^{q(n)} for all n ≥ 0. Let h(x) =df ⌊2^{p(|x|)}/g(x)⌋ · f(x), and note that h ∈ #P. Now observe the following implications.

x ∈ A =⇒ h(x) > (f(x)/g(x)) · 2^{p(|x|)} − f(x) > (1 + ε/2) · 2^{p(|x|)}
x ∉ A =⇒ h(x) ≤ (f(x)/g(x)) · 2^{p(|x|)} < (1 − ε/2) · 2^{p(|x|)}

SBP's characterization in Proposition 2 allows amplification.

Proposition 3. A ∈ SBP if and only if for every h ∈ FP, h > 1, there exist f ∈ #P and g ∈ FP such that for all x ∈ Σ*:

x ∈ A =⇒ f(x) > h(x) · g(x)
x ∉ A =⇒ f(x) < g(x)

It is known that BPP is closed under union, intersection, and complement. We cannot show SBP to be likewise robust. We will see that there is an oracle relative to which SBP ≠ coSBP (cf. Corollary 5). Besides that, it remains open whether SBP is closed under intersection. We do not know whether there is an oracle relative to which SBP is not closed under intersection. However, we can prove that it is closed under union.
Proposition 4. SBP is closed under ∪.

Proof. Let A_1, A_2 ∈ SBP and let h(x) =df 4. Proposition 3 provides functions f_1, f_2 ∈ #P and g_1, g_2 ∈ FP. Let ε =df 1/3, f(x) =df f_1(x) · g_2(x) + f_2(x) · g_1(x), and g(x) =df 3 · g_1(x) · g_2(x). Observe that ε, f, and g characterize A_1 ∪ A_2 in the sense of Proposition 2.
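The arithmetic behind this proof can be checked numerically. In the sketch below (assumed sample values, not from the paper), the gap factor 4 from Proposition 3 gives f_1 > 4·g_1 when x ∈ A_1 and f_1 < g_1 otherwise, and likewise for f_2, g_2:

```python
def union_pair(f1, g1, f2, g2):
    """Return (f(x), g(x)) of the union construction with eps = 1/3."""
    return f1 * g2 + f2 * g1, 3 * g1 * g2

f, g = union_pair(41, 10, 3, 10)   # x in A1 only: f1 > 4*g1, f2 < g2
assert f > (1 + 1/3) * g           # 440 > 400
f, g = union_pair(9, 10, 9, 10)    # x in neither set: f1 < g1, f2 < g2
assert f < (1 - 1/3) * g           # 180 < 200
```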
3.2 Inclusion Relations
Babai [Bab85] introduced the Arthur-Merlin classes MA and AM as interacting proof systems with bounded number of rounds and public coin tosses.

Theorem 1. BP·UP ∪ MA ⊆ SBP.

Proof. This is achieved by Schöning's amplification technique [Sch89].

Babai and Moran introduced ε-approximate lower bound protocols of AM [BM88]. They used Sipser's technique on universal hashing [Sip83] and showed that these protocols solve the following promise problem in an AM-manner (AM-conditions only have to hold for inputs satisfying the promise).

Input: An NP machine M, a number N, and a string x.
Promise: acc_M(x) < N or acc_M(x) ≥ (1 + ε) · N.
Question: Is acc_M(x) ≥ (1 + ε) · N?

If N is the probability limit of an SBP machine M on input x, then, by definition of SBP, the promise is always satisfied. This shows:

Theorem 2. SBP ⊆ AM. Particularly, SBP is contained in Π_2^P.

Note that it is not clear whether the mentioned promise problem is in AM. The reason is that inputs that do not satisfy the promise may destroy the promise of possible AM machines.

Corollary 1. ∃·BPP = NP^BPP ⊆ MA ⊆ SBP ⊆ AM = BP·NP.

For PP the notions of balanced and unbalanced probabilistic machines are equal [Sim75]. However, in the case of BPP we obtain the new class BPPpath.

Definition 2 ([HHT97]). A set A is in BPPpath if there exist a nondeterministic polynomial-time Turing machine M and ε > 0 such that for all x ∈ Σ*:

x ∈ A =⇒ acc_M(x) > (1/2 + ε) · total_M(x)
x ∉ A =⇒ acc_M(x) < (1/2 − ε) · total_M(x)

Theorem 3 ([HHT97]). P^{NP[log]} ⊆ BPPpath.

If we use two #P functions in Proposition 2 instead of one #P function and one FP function, we obtain a new characterization of BPPpath.
Theorem 4. L ∈ BPPpath if and only if there exist f, g ∈ #P and ε > 0 such that:

x ∈ L =⇒ f(x) > (1 + ε) · g(x)
x ∉ L =⇒ f(x) < (1 − ε) · g(x)

Proof. If L ∈ BPPpath, it is easy to see that L satisfies the right-hand side of the proposition. For the other direction, let L satisfy the right-hand side of the proposition. Since f, g ∈ #P, there exist nondeterministic polynomial-time Turing machines N_1 and N_2 with acc_{N_1}(x) = f(x) and acc_{N_2}(x) = g(x) for all x ∈ Σ*. One can verify that the following nondeterministic polynomial-time Turing machine M accepts L in the sense of BPPpath for a certain polynomial q. M works as follows on input x: First, M produces two paths while making one nondeterministic step. On the first (resp., second) path M simulates N_1 (resp., N_2) on input x. Each time this simulation ends with a rejecting path, M makes one more nondeterministic step in order to produce one accepting and one rejecting path. If the simulation of N_1 (resp., N_2) ends with an accepting path, then M makes q(|x|) additional nondeterministic steps in order to produce 2^{q(|x|)} accepting (resp., rejecting) paths.
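The path counts of the machine M constructed in this proof can be written down directly; the following sketch (our names, with assumed sample counts) shows that for large q the acceptance ratio of M tends to f(x)/(f(x) + g(x)), which is bounded away from 1/2 by the gap between f and g.

```python
def counts_of_M(acc1, total1, acc2, total2, q):
    """(acc_M, total_M) given the acc/total counts of N1 and N2 and
    the padding value q(|x|)."""
    rej1, rej2 = total1 - acc1, total2 - acc2
    acc = acc1 * 2**q + rej1 + rej2      # accepting paths of M
    rej = acc2 * 2**q + rej1 + rej2      # rejecting paths of M
    return acc, acc + rej

acc, total = counts_of_M(acc1=12, total1=16, acc2=5, total2=16, q=10)
assert acc / total > 0.5 + 0.1           # here f(x) = 12 > (1+eps)*g(x) = 5
```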
4
Separation by Oracle Results
In the previous sections our observations aimed at localizing SBP with respect to known complexity classes. In particular, we obtained BP·UP∪MA ⊆ SBP ⊆ AM∩BPPpath . However, up to now we have not provided any evidence of the strictness of these inclusions. The objective of this section is to find hints that separate the classes BPP, BPPpath , and AM from SBP. Furthermore, we prove separation results with respect to ΣP 2 and other classes. We start with the separation of SBP from BPP and BPPpath . Theorem 5. If BPP = SBP or SBP = BPPpath , then the polynomial-time hierarchy collapses to ΣP 2. Proof. If SBP ⊆ BPP, then NP ⊆ BPP and the polynomial-time hierarchy collapses [Sip83,KL82]. If BPPpath ⊆ SBP, then coNP ⊆ AM and the polynomial-time hierarchy collapses [BHZ87]. In contrast to Theorem 5 we cannot prove consequences for SBP = MA or SBP = AM which are similarly unlikely. Therefore, we approach this question by utilizing relativizations. Theorem 6. There exists an oracle A such that AMA ⊆ SBPA and coAMA ⊆ SBPA . Proof. This follows by an oracle A showing that AMA ∩ coAMA ⊆ PPA [Ver92]. Remember that AM contains classes like NP, BPP, MA, and it is unlikely that AM is contained in ΣP 2 . In this light AM seems to be quite powerful. However, Boppana, H˚astad, and Zachos showed that unless the polynomial-time hierarchy collapses, AM (and therefore also SBP) is not powerful enough to contain coNP [BHZ87]. In connection with Yao’s oracle this gives the following consequence.
Error-Bounded Probabilistic Computations between MA and AM
255
Theorem 7 ([Yao85,BHZ87]). There exists an oracle A such that coNPA ⊆ AMA . A
Corollary 3. There exists an oracle A such that coNPA ⊆ SBPA and ΣP ⊆ SBPA . 2 We come to a new oracle construction showing that SBP is not contained in ΣP 2. We will prove a stronger result, namely that there exists an oracle relative to which P BP·UP ⊆ ΣP 2 . Since BP·UP ⊆ SBP and MA ⊆ Σ2 relative to all oracles, this finally A NP yields SBPA ⊆ MAA and SBPA ⊆ ΣP 2 . So relative to our oracle, BPPpath ⊆ R P and BPPpath ⊆ Σ2 . This solves an open question of Han, Hemaspaandra, and Thierauf [HHT97]. Theorem 8. There exists an oracle A such that BP·UPA ⊆ ∃·∀·PA . A
Corollary 4. There exists an oracle A such that SBPA ⊆ MAA , SBPA ⊆ ΣP 2 , and A PA BPPpath ⊆ Σ2 . Corollary 5. There exists an oracle relative to which SBP = coSBP. The remaining part of this section sketches the proof of Theorem 8. We want to mention that Santha [San89] constructs a similar oracle relative to which AM is not contained in ΣP 2 . An examination of his proof shows that it actually establishes Theorem 8. df Proof. We construct oracle stages A1 , A2 , . . ., and let A = i≥1 Ai . As an abbreviation df df df 12 for intervals of stages Ai we use A[k, j] = A . Let a and ai+1 = 2ai for 1 =2 k≤i≤j i i ≥ 1. It is easy to show that 2ai /4 ≥ (ai )i for i ≥ 1. Define the following conditions for B ⊆ Σ ∗ and i ≥ 1: df C1(B, i) = for every x ∈ Σ ai /4 there exists at most one y ∈ Σ ai ·3/4 with xy ∈ B 1 df |B ∩ Σ ai | = 2ai /4 ∨ |B ∩ Σ ai | ≤ · 2ai /4 C2(B, i) = 2
Our construction is such that Ai ⊆ Σ ai ∧ C1(A[1, i], i) ∧ C2(A[1, i], i) for each i ≥ 1. For B ⊆ Σ ∗ let df ai W (B) ={0 : i ≥ 1 and for all x ∈ Σ ai /4 there exists exactly one y ∈ Σ ai ·3/4 such that xy ∈ B}.
We use W (A) as witness language: On one hand it is clear that, if Ai ⊆ Σ ai ∧ C1(A[1, i], i) ∧ C2(A[1, i], i) for i ≥ 1, then W (A) ∈ BP·UPA . On the other hand we show that W (A) ∈ / ∃·∀·PA . Let T1 , T2 , . . . be an enumeration of all triples T = (M, r, s) where M is a deterministic polynomial-time oracle machine and r, s are polynomials. For Ti = (Mi , ri , si ) we may assume that ri (n) ≤ ni and there is a polynomial ti (n) ≤ ni such that the computation MiB (x, y, z) halts within ti (|x|) steps for all x ∈ Σ + , y ∈ Σ ri (|x|) , z ∈ Σ si (|x|) . To reach W (A) ∈ / ∃·∀·PA , the construction of Ai diagonalizes against the “∃·∀·P-machine” Ti : If Ti = (Mi , ri , si ), then the construction of Ai prevents A[1,i] 0ai ∈ W (A[1, i]) ⇐⇒ (∃y ∈ Σ ri (ai ) )(∀z ∈ Σ si (ai ) ) (0ai , y, z) ∈ L(Mi ) .
256
Elmar B¨ohler, Christian Glaßer, and Daniel Meister
So the construction will additionally satisfy conditions C3(A[1, i], i), which are defined as follows. df C3(B, i) = ¬ 0ai ∈ W (B) ⇐⇒ (∃y ∈ Σ ri (ai ) )(∀z ∈ Σ si (ai ) ) (0ai , y, z) ∈ L(MiB ) As an abbreviation for the conditions defined so far we use df C(B, i) = C1(B, i) ∧ C2(B, i) ∧ C3(B, i).
Claim 1. There exist oracle stages A1 , A2 , . . . such that Ai ⊆ Σ ai and C(A[1, i], i) for all i ≥ 1. The idea of the proof is as follows: We start with some large number N, transform it into a redundant representation, and use it as an oracle. If Claim 1. does not hold, then we can describe half of the information contained in the oracle by a small number of bits. Since our representation is redundant, we can reconstruct N. Hence, N has a description that is too short. This is a contradiction and proves Claim 1.. With Claim 1. at hand we complete the proof of the theorem: If W (A) ∈ ∃·∀·PA , then there exists some i ≥ 1 such that 0ai ∈ W (A) ⇐⇒ (∃y ∈ Σ ri (ai ) )(∀z ∈ Σ si (ai ) )[(0ai , y, z) ∈ L(MiA )]. The ai ’s grow fast enough such that MiB (0ai , y, z) cannot ask for words of length ≥ ai+1 . Moreover, 0ai ∈ W (A) ⇐⇒ 0ai ∈ W (A[1, i]). It follows that A[1,i]
0ai ∈ W (A[1, i]) ⇐⇒ (∃y ∈ Σ ri (ai ) )(∀z ∈ Σ si (ai ) )[(0ai , y, z) ∈ L(Mi
)].
This contradicts C3(A[1, i], i), which holds by Claim 1.; this proves the theorem.
5
Conclusions and Open Questions
We introduced SBP and showed that it arises in several contexts. We proved that it is located between MA and AM ∩ BPPpath . We found evidence that it is unlikely that SBP coincides with any of these classes. We showed that SBP is closed under union. By our oracle construction, SBP does not seem to be closed under complementation. We do not know whether SBP is closed under intersection. The idea of concatenating two machines does not work. Is there an oracle separating SBP and its closure under intersection? We know that BPPpath as well as AM are closed under intersection, so that the closure of SBP under intersection is contained in AM ∩ BPPpath . Is this inclusion strict? Han, Hemaspaandra, and Thierauf ask whether BPPpath has complete sets [HHT97]. Aspnes, Fischer, Fischer, Kao, and Kumar study the predictability of stock markets [AFF+ 01]. They give a problem that is hard for BPPpath , but its completeness is unknown. So far, we neither know complete sets for BPPpath , nor do we know an oracle relative to which BPPpath does not have complete sets. The same questions are interesting with respect to SBP. Note that there exists an oracle relative to which BPP does not have complete sets [Sip82,HH88].
Error-Bounded Probabilistic Computations between MA and AM
257
A similar open issue is to find natural problems in SBP that are not obviously in BPP. The graph non-isomorphism problem and some matrix group problems over finite fields are known to be in AM [GMW91,Sch88,Bab85]. Is one of them in SBP? Other open questions address the separation of SBP from MA and AM. Can one extend this to collapse consequences? In addition one should look for unlikely consequence of the assumption SBP ⊆ ΣP 2.
Acknowledgements We thank Klaus W. Wagner for initiating this work and for many helpful discussions. In particular, the idea of the class SBP is due to him. Furthermore, we thank Stephen A. Fenner, Frederic Green, Lane A. Hemaspaandra, Sven Kosub, and Heribert Vollmer for helpful hints. We thank an anonymous referee for informing us of Santha’s paper [San89].
References AFF+ 01.
J. Aspnes, D. F. Fischer, M. J. Fischer, M. Y. Kao, and A. Kumar. Towards understanding the predictability of stock markets from the perspective of computational complexity. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 745–754. ACM Press, 2001. Bab85. L. Babai. Trading group theory for randomness. In Proceedings 17th Symposium on Theory of Computing, pages 421–429. ACM Press, 1985. BGM02. E. B¨ohler, C. Glaßer, and D. Meister. Error-bounded probabilistic computations between MA and AM. Technical Report 299, Julius-Maximilians-Universit¨at W¨urzburg, 2002. Available at http://www.informatik.uni-wuerzburg.de/reports/tr.html. BHZ87. R. B. Boppana, J. H˚astad, and S. Zachos. Does co-NP have short interactive proofs? Information Processing Letters, 25(2):127–132, 1987. BM88. L. Babai and S. Moran. Arthur-Merlin games: A randomized proof system, and a hierarchy of complexity classes. Journal of Computer and System Sciences, 36:254– 276, 1988. BS79. T. P. Baker and A. L. Selman. A second step towards the polynomial hierarchy. Theoretical Computer Science, 8:177–187, 1979. dLMSS56. K. de Leeuw, E. F. Moore, C. E. Shannon, and N. Shapiro. Computability by probabilistic machines. In C. E. Shannon, editor, Automata Studies, volume 34 of Annals of Mathematical Studies, pages 183–198. Rhode Island, 1956. FFK94. S. Fenner, L. Fortnow, and S. Kurtz. Gap-definable counting classes. Journal of Computer and System Sciences, 48:116–148, 1994. Gil72. J. Gill. Probabilistic Turing Machines and Complexity of Computations. PhD thesis, University of California Berkeley, 1972. Gil77. J. Gill. Computational complexity of probabilistic turing machines. SIAM Journal on Computing, 6:675–695, 1977. GMW91. O. Goldreich, S. Micali, and A. Widgerson. Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems. Journal of the Association for Computing Machinery, 38(1):691–729, 1991. Gup95. S. Gupta. Closure properties and witness reductions. Journal of Computer and System Sciences, 50:412–432, 1995.
258 HH88. HHT97. KL82. San89. Sch88. Sch89. Sim75. Sip82.
Sip83. Sto77. Vaz86. Ver92. Wra77. Yao85.
Elmar B¨ohler, Christian Glaßer, and Daniel Meister J. Hartmanis and L. A. Hemachandra. Complexity classes without machines: On complete languages for UP. Theoretical Computer Science, 58:129–142, 1988. Y. Han, L. A. Hemaspaandra, and T. Thierauf. Threshold computation and cryptographic security. SIAM Journal on Computing, 26(1):59–78, 1997. R. Karp and R. Lipton. Turing machines that take advice. L’enseignement math´ematique, 28:191–209, 1982. M. Santha. Relativized Arthur-Merlin versus Merlin-Arthur games. Information and Computation, 80:44–49, 1989. U. Sch¨oning. Graph isomorphism is in the low hierarchy. Journal of Computer and System Sciences, 37:312–323, 1988. U. Sch¨oning. Probabilistic complexity classes and lowness. Journal of Computer and System Sciences, 39:84–100, 1989. J. Simon. On Some Central Problems in Computational Complexity. PhD thesis, Cornell University, 1975. M. Sipser. On relativization and the existence of complete sets. In Proceedings 9th ICALP, volume 140 of Lecture Notes in Computer Science, pages 523–531. Springer Verlag, 1982. M. Sipser. A complexity theoretic approach to randomness. In Proceedings of the 15th Symposium on Theory of Computing, pages 330–335, 1983. L. Stockmeyer. The polynomial-time hierarchy. Theoretical Computer Science, 3:1–22, 1977. U. Vazirani. Randomness, Adversaries and Computation. PhD thesis, University of California Berkeley, 1986. N. K. Vereshchagin. On the power of PP. In Proceedings 7th Structure in Complexity Theory, pages 138–143. IEEE Computer Society Press, 1992. C. Wrathall. Complete sets and the polynomial-time hierarchy. Theoretical Computer Science, 3:23–33, 1977. A. C. C. Yao. Separating the polynomial-time hierarchy by oracles. In Proceedings 26th Foundations of Computer Science, pages 1–10. IEEE Computer Society Press, 1985.
A Faster FPT Algorithm for Finding Spanning Trees with Many Leaves Paul S. Bonsma, Tobias Brueggemann, and Gerhard J. Woeginger University of Twente, The Netherlands {p.s.bonsma,t.brueggemann,g.j.woeginger}@math.utwente.nl
Abstract. We describe a new, fast, and fairly simple FPT algorithm for the problem of deciding whether a given input graph G has a spanning tree with at least k leaves. The time complexity of our algorithm is polynomially bounded in the size of G, and its dependence on k is roughly O(9.49k ). This is the fastest currently known algorithm for this problem.
1
Introduction
In the max-leaf spanning tree problem (MaxLeaf), an input consists of a connected, undirected, simple graph G = (V, E) on n = |V | vertices. The objective is to find a spanning tree for G with the maximum number of leaves. This problem has been well-studied over the last twenty years. On the negative side, MaxLeaf is known to be NP-hard (Garey & Johnson [11]), and APX-hard (Galbiati, Maffioli & Morzenti [9,10]). On the positive side, the literature contains some polynomial time approximation algorithms for MaxLeaf that have fairly small worst case performance guarantees (a guarantee of 3 by Lu & Ravi [15], and a guarantee of 2 by Solis-Oba [16]). Moreover, it is known that the following natural parameterized version of problem MaxLeaf falls into the complexity class FPT of Downey & Fellows [6]: “Given an n-vertex graph G and a positive integer parameter k, does G possess a spanning tree with at least k leaves? ” Sloppily speaking, a problem belongs to the complexity class FPT, if it has an algorithm with a time complexity of the form O(f (k) poly(n)). Here the dependence f (k) of the running time on k may be arbitrary; for instance, f (k) may grow doubly exponentially with k, or even worse. However, the running time must be polynomially bounded in the input size n. A problem with such an algorithm is said to be fixed-parameter tractable (FPT, for short). Fellows & Langston [8] observed that MaxLeaf belongs to FPT via the graph minors machinery of Robertson & Seymour; their argument was non-constructive and did not explicitly yield an algorithm. Bodlaender [2] constructed the first FPT algorithm for MaxLeaf. Its time complexity was linear in n and had a parameter function f (k) of roughly (17k 4 )!; we stress that [2] was only interested in proving the existence of such an algorithm, and did not put any effort in getting a good time complexity. A little bit later, Downey & Fellows [5] constructed a B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 259–268, 2003. c Springer-Verlag Berlin Heidelberg 2003
260
Paul S. Bonsma, Tobias Brueggemann, and Gerhard J. Woeginger
better FPT algorithm for MaxLeaf with f (k) = (2k)4k . The fastest known FPT algorithm for MaxLeaf prior to our work is due to Fellows, McCartin, Rosamond & Stege [7]; its parameter function is roughly k (14.23)k . The literature also contains a number of purely combinatorial results around problem MaxLeaf, mainly in extremal graph theory. Ding, Johnson & Seymour [4] prove that whenever a graph G = (V, E) satisfies |E| ≥ |V | + 12 (k − 1)(k − 2) and |V | = k + 1, then it possesses a spanning tree with at least k leaves. Another branch of extremal results deals with graphs with large minimum degree. Nati Linial conjectured around 1987 that if the minimum degree of G is at least δ, then there exists a spanning tree with at least n(δ − 2)/(δ + 1) + cδ leaves, where cδ is a constant only depending on δ. Alon [1] used a probabilistic argument to disprove Linial’s conjecture for the cases where the minimum degree δ is sufficiently large. However, for small values of δ the conjecture turned out to be true. Proposition 1. (Linial & Sturtevant [14]; Kleitman & West [13]) Every connected graph G = (V, E) with minimum vertex degree at least 3 has a spanning tree with at least 14 |V | + 2 leaves. Moreover, such a spanning tree can be determined in polynomial time. Kleitman & West [13] proved Linial’s conjecture for δ = 4 with c4 = 8/5, and Griggs & Wu [12] proved Linial’s conjecture for δ = 5 with c5 = 2. All these bounds are best possible. The cases with δ ≥ 6 are not well-understood. In particular, we do not know the value of the smallest δ for which Linial’s conjecture is false. For more information, we refer the reader to Caro, West & Yuster [3]. The contribution of this paper is the following: We construct a fast FPT algorithm for MaxLeaf with parameter function f (k) ≈ ck where c = 256/27 ≈ 9.4815. Our solution approach is quite different from the previous lines of attack in [2,5,7]. It uses heavier combinatorial machinery from the literature, and it is based on three main ingredients. The first ingredient is a preprocessing procedure that translates any instance (G; k) of MaxLeaf into an (equivalent) smaller instance (G ; k ) that satisfies a number of nice structural properties. This preprocessing procedure is described in detail in Section 3. The second ingredient is the combinatorial result in Proposition 1. Finally, the third ingredient is an (expensive!) enumerative procedure that checks an exponential number of vertex subsets whether they can form the leaf-set of a certain type of spanning tree. The second and third ingredient are discussed in Section 4. This section also describes our FPT algorithm, and analyzes its time complexity.
2
Notation and Preliminaries
The degree d(v) of a vertex v is the number of its neighbors. For a subset S ⊆ V , we denote by G[S] the subgraph of G induced by S. Moreover, we write G − S short for G[V − S]. By the leaf-set leaf(G), we denote the set of all vertices in V that have degree 1. Next, we consider the sequence of graphs that starts with the graph G0 = G and that has Gi = Gi−1 − leaf(Gi−1 ) for all i ≥ 1. Let k
A Faster FPT Algorithm for Finding Spanning Trees with Many Leaves
261
be the smallest index for which Gk = Gk+1 holds. Since this graph Gk results from repeatedly shaving off the leaves from G, the remaining graph Gk is either empty or has minimum vertex degree 2. The graph Gk is called the shaved graph of G, and we will denote it by shave(G). We partition the vertex set of G into three disjoint classes S ∅ (G), S ≥3 (G), and S =2 (G): – The set S ∅ (G) of so-called shaved-off vertices contains the vertices of G that do not show up in shave(G). – The set S ≥3 (G) of so-called shaved high-degree vertices contains the vertices that have degree at least 3 in the shaved graph shave(G). – The set S =2 (G) of so-called shaved degree-2 vertices contains the vertices that have degree 2 in the shaved graph shave(G). If S ≥3 (G) = ∅, then shave(G) is either a cycle, or it is the empty graph. Moreover, shave(G) is the empty graph if and only if G is cycle-free. Let u and v be two vertices in S ≥3 (G). A caterpillar path (or c-path, for short) between u and v is a subgraph of G that is a simple path with end vertices u and v, and all internal vertices in S =2 (G). If x1 , . . . , xr are the internal vertices of a c-path ordered from u to v, then P = u, x1 , . . . , x2 , v denotes this c-path. Note that a single edge uv with u, v ∈ S ≥3 (G) also forms a c-path. A vertex x ∈ S ∅ (G) is said to be in the neighborhood of a c-path from u to v, if there is a path P from an interior vertex of the c-path to x that only traverses vertices from S ∅ (G). In other words, x is in the neighborhood if it belongs to one of the trees that are dangling from the c-path. If P is a c-path, C(P ) denotes the (connected) subgraph of G induced by V (P ) and all the vertices in S ∅ (G) that are in the neighborhood of the c-path P (C(P ) is the actual caterpillar). A vertex x ∈ S =2 (G) without neighbors in S ∅ (G) is called an α-vertex, and a vertex x ∈ S =2 (G) with at least one neighbor in S ∅ (G) is called a β-vertex. Proposition 2. Let G = (V, E) be a graph that contains a (not necessarily spanning) tree T = (V , E ) with V ⊆ V and E ⊆ E, and assume that T has at least leaves. Then G has a spanning tree T with at least leaves.
3 The Preprocessing Phase
In this section, we will prove a number of reduction lemmas that altogether lead to the following theorem.

Theorem 3. There exists a polynomial time algorithm that translates any instance (G; k) of MaxLeaf into another instance (G′; k′) that satisfies the following properties:
– (G; k) is a YES-instance if and only if (G′; k′) is a YES-instance;
– G′ has at most as many edges as G; G′ has at most as many c-paths as G; and k′ ≤ k holds;
– Between any two vertices u, v ∈ S≥3(G′), there is at most one c-path;
– For any u ∈ S≥3(G′), none of the c-paths both starts and ends in u;
– Every vertex in S∅(G′) is a leaf;
– For every c-path in G′, its internal vertices form an alternating sequence of α- and β-vertices.

The first property in this theorem expresses the equivalence of the instances (G′; k′) and (G; k). The second property states that the reduction does not blow up the data. The structural information is in the third and fourth property: the c-paths in G′ are well-behaved. No two of them run in parallel, and none of them forms a loop. The fifth and sixth property limit the possible forms of c-paths in G′. We will present a number of lemmas that make small cosmetic changes to the instance, that make the instance simpler and smaller, and that step by step lead to the desired properties. When we say an instance (G; k) containing a certain structure is reducible, we mean that we can replace this structure with a different structure, giving a new instance (G′; k′) that satisfies the first two properties of Theorem 3. Furthermore, in such a replacement step either the number of edges or the number of c-paths decreases. This guarantees that the algorithm will terminate in polynomial time.

Lemma 4. If G has a bridge e = uv with d(u) ≥ 2 and d(v) ≥ 2, then (G; k) is reducible.

Proof. In every spanning tree, the bridge e is used and the vertices u and v are non-leaves. Hence, we may contract e without changing the problem.

Lemma 5. If G contains two leaves v1 and v2 that are adjacent to the same vertex u, then (G; k) is reducible.

We apply Lemmas 4 and 5 to G over and over again, as long as this is possible. The resulting graph G already has a fairly simple structure. Every vertex in S∅(G) is a leaf. Every vertex in S≥3(G) ∪ S=2(G) is adjacent to at most one vertex from S∅(G). Observe that if S≥3(G) = ∅, then (G; k) already satisfies the properties in Theorem 3. So from now on we will assume that S≥3(G) ≠ ∅.

Lemma 6. Consider a c-path P = u, x1, . . . , xr, v with u, v ∈ S≥3(G). (a) If for some i, the vertices xi and xi+1 both are α-vertices, then (G; k) is reducible. (b) If for some i, the vertices xi and xi+1 both are β-vertices, then (G; k) is reducible.

Proof. (a) We claim that there always exists an optimal tree for MaxLeaf that avoids the edge xi xi+1. Consider an optimal tree T that does use xi xi+1. If we remove the edge xi xi+1 from T, it breaks into two subtrees T1 and T2. By Lemma 4 we may assume that xi xi+1 is not a bridge in G, and so there is an edge y1 y2 in G that connects T1 and T2. Then it can be seen that the new tree T′ := T − xi xi+1 + y1 y2 is an optimal tree, too (the transformation cannot decrease the number of leaves). Since the edge xi xi+1 is not needed in an optimal tree, we may remove it from G and reduce the instance.
(b) We claim that there always exists an optimal tree for MaxLeaf that uses the edge xi xi+1. Consider an optimal tree T that avoids xi xi+1. If we insert the edge xi xi+1 into T, this creates a cycle C; let y1 y2 ≠ xi xi+1 be any edge on C. Then the new tree T′ := T + xi xi+1 − y1 y2 is optimal, too: the vertices xi and xi+1 were non-leaves in T, and they are non-leaves in T′. Since the edge xi xi+1 can be put into every optimal tree, and since the β-vertices xi and xi+1 can never be leaves, we may contract the edge xi xi+1 in G and reduce the instance.

Lemma 7. Consider a c-path P = u, x1, . . . , xr, v with u, v ∈ S≥3(G). If for some i, the vertices xi−1 and xi+1 both are α-vertices and xi is a β-vertex, then (G; k) is reducible.

Proof. Vertex xi−1 is incident to the edge xi−1 xi and to a second edge that we call e. Vertex xi+1 is incident to the edge xi xi+1 and to a second edge that we call f. The β-vertex xi has an incident edge g that connects it to a leaf y. We claim that there always exists an optimal tree for MaxLeaf that uses the three edges xi−1 xi, xi xi+1, and g. Indeed, if T is an optimal tree with xi−1 xi ∉ T, then T′ = T − e + xi−1 xi is a spanning tree with at least as many leaves. Similarly, if xi xi+1 ∉ T, then T′ = T − f + xi xi+1 is a spanning tree with at least as many leaves. The edge g must be in T, since it is the only edge incident to the leaf y. We reduce (G; k) as follows: We contract xi−1, xi, xi+1, and y into a single vertex, and we replace k by k − 1.

As an immediate consequence of Lemmas 6 and 7, we may assume from now on that on any c-path P = u, x1, . . . , xr, v with u, v ∈ S≥3(G), r ≤ 3 holds and that the interior vertices are alternatingly α-vertices and β-vertices. Moreover, if r = 3 then x1 and x3 are β-vertices and x2 is an α-vertex. Next, we will study the situations where G contains two vertices u and v in S≥3(G) that are connected by two distinct c-paths P = u, x1, . . . , xr, v and P′ = u, x′1, . . . , x′s, v. We will show that in all these situations, the instance (G; k) is reducible. Here is some intuition for this: Since the two paths P and P′ form a cycle, any spanning tree must avoid one edge from P, or one edge from P′, or both one edge from P and one from P′. A spanning tree cannot avoid two edges from the same path, since this would yield isolated vertices. There are not many possibilities for avoiding edges, since the c-paths and their neighborhood are strongly structured. We list and analyze all possible cases in the following two subsections.
3.1 The Cases Where Both Caterpillar Paths Have Interior Vertices
Lemma 8. Let P = u, x1, . . . , xr, v and P′ = u, x′1, . . . , x′s, v be two c-paths between u, v ∈ S≥3(G). If r ≥ 2 and s ≥ 1, and if x1 and x′1 both are β-vertices, then (G; k) is reducible.

Proof. Consider an optimal tree T for MaxLeaf. If T does not use the edge ux1, then x1 x2 ∈ T and the new tree T − x1 x2 + ux1 has at least as many leaves as T (in the new tree x2 is a leaf, and in the old tree u might have been a leaf). Hence, there always exists an optimal tree T that uses ux1.
If in this optimal tree T the vertex u is a leaf, then ux′1 ∉ T and x′1 x′2 ∈ T. Furthermore, the β-vertex x′1 cannot be a leaf in any spanning tree. Therefore, T + ux′1 − x′1 x′2 has at least as many leaves as T, and is another optimal tree. To summarize, there always exists an optimal tree that contains the edge ux1, and in which u and x1 are non-leaves. Hence, we may contract the edge ux1 in G to a single vertex without changing the problem.

Lemma 9. Consider two c-paths P = u, x1, . . . , xr, v and P′ = u, x′1, v between u, v ∈ S≥3(G). If r ≥ 1 and if the (unique) interior vertex x′1 of P′ is an α-vertex, then (G; k) is reducible.

Proof. We claim there always exists an optimal tree for MaxLeaf that does not use both of the edges ux′1 and x′1 v. Indeed, consider an optimal tree T. Suppose that it uses both edges ux′1 and x′1 v. Then T must avoid exactly one edge yz in P. We distinguish a number of cases: First, if y and z both are interior vertices of P, then one of them is an α-vertex and one of them is a β-vertex. Hence, at most one of them can be a leaf in T. Then the tree T − ux′1 + yz has x′1 as a new leaf, and hence it has at least as many leaves as T. In the second case, we assume that T avoids the edge yz = ux1. If at most one of u and x1 is in leaf(T), then T − ux′1 + ux1 has at least as many leaves as T. If both u and x1 are in leaf(T), then the tree T − ux′1 + ux1 still has u as a leaf, and it has the new leaf x′1 instead of the old leaf x1. Hence, also in this case, T − ux′1 + ux1 has at least as many leaves as T. The third case, where T avoids the edge yz = xr v, is symmetric to the second case. To summarize, there always exists an optimal spanning tree T in which x′1 is a leaf. The reduction is now as follows: We remove x′1 from G, and we replace k by k − 1. (In any spanning tree, one of u and v must be a non-leaf, and to that non-leaf we then can reconnect vertex x′1.)

Lemma 10. Consider two c-paths P = u, x1, v and P′ = u, x′1, v with one interior vertex between u, v ∈ S≥3(G). If x1 and x′1 both are β-vertices, then (G; k) is reducible.

Lemma 11. Consider two c-paths P = u, x1, x2, v and P′ = u, x′1, x′2, v between u, v ∈ S≥3(G). If x1 and x′2 both are α-vertices, and if x2 and x′1 both are β-vertices, then (G; k) is reducible.

Let us finish this subsection by checking that we have indeed settled all the cases where both c-paths P = u, x1, . . . , xr, v and P′ = u, x′1, . . . , x′s, v have at least one interior vertex (and hence satisfy r, s ≥ 1). By Lemmas 6 and 7, we may assume that these c-paths alternatingly consist of α-vertices and β-vertices. Moreover, there are at most 3 interior vertices, and if there are exactly 3, then the first and last one are β-vertices and the middle one is an α-vertex. Without loss of generality we assume that r ≥ s. First assume that s = 1: If x′1 is an α-vertex, then by Lemma 9 this situation is reducible. If x′1 is a β-vertex and r ≥ 2, then Lemma 8 can be applied to reduce the instance. If x′1 is a β-vertex and r = 1, then either Lemma 9 or
Lemma 10 leads to a reduction. Next assume that s = 2 (and r ≥ 2): If u or v is adjacent to two β-vertices on P and P′, then Lemma 8 applies; otherwise Lemma 11 applies. The case s = r = 3 can be settled by Lemma 8.
3.2 The Cases Where One of the Caterpillar Paths Is an Edge
In this subsection, we discuss the case where u, v ∈ S≥3(G) are connected by the edge uv and by a c-path P = u, x1, . . . , xr, v. There are four possible cases that are handled in the four lemmas below: (i) r = 3; (ii) r = 2; (iii) r = 1 and x1 is a β-vertex; (iv) r = 1 and x1 is an α-vertex.

Lemma 12. If there is a c-path P = u, x1, x2, x3, v between u, v ∈ S≥3(G) with uv ∈ E, then (G; k) is reducible.

Proof. We may assume by Lemmas 6 and 7 that x1 and x3 are β-vertices and that x2 is an α-vertex. We claim that there always exists an optimal tree for MaxLeaf that does not use the edge x1 x2. Indeed, consider an optimal tree T with x1 x2 ∈ T. If x2 x3 ∉ T, then we may simply switch the edges x1 x2 and x2 x3 and derive another optimal tree. Now assume T contains both x1 x2 and x2 x3. By symmetry, we may furthermore assume that ux1 ∈ T. There remain two cases to consider: First, if x3 v ∈ T, then uv ∉ T (otherwise there would be a cycle). Then T − x1 x2 + uv has at least as many leaves as T (since u and v cannot both be leaves in T). Second, if x3 v ∉ T, then T − x1 x2 + x3 v has at least as many leaves as T. Since the edge x1 x2 is not needed in an optimal tree, we may remove it from G and reduce the instance.

Lemma 13. If there is a c-path P = u, x1, x2, v between u, v ∈ S≥3(G) with uv ∈ E, then (G; k) is reducible.

Lemma 14. Consider a c-path P = u, x1, v between u, v ∈ S≥3(G) with uv ∈ E. If x1 is a β-vertex, then (G; k) is reducible.

Proof. In this case, there always exists an optimal tree for MaxLeaf that does not use the edge uv. We reduce the instance by removing uv from G.

Lemma 15. Consider a c-path P = u, x1, v between u, v ∈ S≥3(G) with uv ∈ E. If x1 is an α-vertex, then (G; k) is reducible.

To complete the proof of Theorem 3, we still have to get rid of c-paths that start and end in the same vertex u ∈ S≥3(G). But this is straightforward: exactly one edge has to be removed from this cycle, and one can compute the best edge to remove, and remove it, in polynomial time.
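All reduction steps of this section are local rewriting operations of this kind. As a concrete illustration, here is a sketch (entirely ours) of applying the Lemma 4 step exhaustively; the bridge test is deliberately naive (delete the edge and check connectivity), and the helper names are our own.

    # Sketch: contract every bridge uv with d(u) >= 2 and d(v) >= 2
    # (Lemma 4). k is unchanged: u and v are non-leaves in every
    # spanning tree, so contracting uv does not change the problem.
    def is_connected(adj):
        start = next(iter(adj))
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(adj)

    def contract_lemma4_bridges(adj):
        restart = True
        while restart:
            restart = False
            for u in list(adj):
                for v in list(adj[u]):
                    if len(adj[u]) < 2 or len(adj[v]) < 2:
                        continue
                    adj[u].discard(v)
                    adj[v].discard(u)
                    if is_connected(adj):
                        adj[u].add(v)        # not a bridge: undo the removal
                        adj[v].add(u)
                        continue
                    for w in adj.pop(v):     # bridge: contract v into u
                        adj[w].discard(v)
                        adj[w].add(u)
                        adj[u].add(w)
                    restart = True
                    break
                if restart:
                    break
        return adj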
4 The Algorithm
In this section, we will construct our fast FPT algorithm. The algorithm first goes through the preprocessing procedure described in the preceding Section 3.
By Theorem 3 it ends up with an instance (G; k) such that between any u, v ∈ S≥3(G) there is at most one connecting c-path in G, and such that there is no c-path that starts and ends in the same vertex. In addition, we know that every c-path consists of an alternating sequence of α-vertices and β-vertices, and that every vertex in S∅(G) is a leaf. Next, we distinguish three cases that depend on the cardinality of S≥3(G).

The first case deals with S≥3(G) = ∅. In this case either G is a tree and the problem is trivial, or shave(G) is a cycle C, and this cycle C is the unique cycle in G. This case is straightforward, since exactly one of the edges in C must be broken.

In the second case, |S≥3(G)| ≥ 4k holds. We define a graph H that has vertex set S≥3(G) and that has an edge uv for every two vertices u, v ∈ S≥3(G) that are connected by a c-path in shave(G). By Theorem 3, the graph H is simple and loop-free. Moreover, every vertex in S≥3(G) has degree at least 3 in shave(G), and so its incident c-paths connect it to at least three pairwise distinct other vertices in S≥3(G). This means that H is connected, has minimum vertex degree at least 3, and has at least 4k vertices. Hence, Proposition 1 can be applied to find a spanning tree T in H with at least k leaves. By replacing the edges in T by their corresponding c-paths in G, we get a (not necessarily spanning) tree in G that has at least k leaves. Finally, Proposition 2 yields a spanning tree with at least k leaves for G, and thus (G; k) is a YES-instance of MaxLeaf. This completes the discussion of the second case.

In the third case, |S≥3(G)| < 4k holds. We define an edge-weighted graph H that has vertex set S≥3(G) and that has an edge uv for every two vertices u, v ∈ S≥3(G) that are connected by a c-path in G. We assign the following weights to the edges in H: if the c-path corresponding to edge e contains an α-vertex, set w(e) = 1, otherwise set w(e) = 0. If T is a spanning tree in G and all edges of a c-path P are present in T, we say T uses the c-path P. Observe that for any spanning tree T in G, the edges in H corresponding to c-paths used in T induce a spanning tree in H. For any c-path P in G, if this c-path is used in T, none of the vertices in V(P) ∩ S=2(G) is a leaf. Since the α-vertices and β-vertices alternate and β-vertices can never be leaves, at most one of those vertices is a leaf if the c-path is not used. Also, if the c-path does not contain α-vertices, the c-path can never contain an internal vertex that is a leaf. So the weights assigned to edges in H represent the cost of using the corresponding c-path in a spanning tree in G, in terms of the loss of possible leaves.

We enumerate all subsets Y ⊆ S≥3(G) with |Y| ≤ k. For each such subset Y, we perform the following test procedure. A spanning tree T of G is called Y-compatible if Y ⊆ leaf(T) holds. We determine the maximum possible number of leaves in S=2(G) ∪ S∅(G) over all the Y-compatible spanning trees T for G. That means, vertices in Y must be leaves in T (but are not counted as leaves) and vertices in S≥3(G) − Y may be leaves or may be non-leaves in T (but in either case are not counted as leaves).
Lemma 16. G has a Y-compatible spanning tree if and only if G − Y is connected, and if in the graph H, every y ∈ Y has a neighbor that is not in Y.

How can we compute the Y-compatible spanning tree with the maximum number of leaves in S=2(G) ∪ S∅(G)? Since all vertices in S∅(G) are always leaves, this is also the Y-compatible spanning tree that maximizes the number of leaves in S=2(G). If Y does not satisfy the two conditions in Lemma 16, then we stop without a solution. Otherwise, we remove Y from H. For the remaining graph H − Y, we then compute a minimum spanning tree T_H with respect to the edge weights w defined above. This tree is used to construct a spanning tree T in G: for all edges in H − Y do the following: if the edge e in H − Y corresponding to c-path P is used in T_H, then all edges of C(P) are added to E(T). If this edge e is not used in T_H, then choose one edge in E(P) incident with an α-vertex if possible, otherwise choose an arbitrary edge of E(P). Add all other edges of C(P) to E(T). This guarantees that P is not used in T, and that P contributes w(e) leaves to T. Now for c-paths P with one end vertex y ∈ Y: add all edges of C(P) to E(T), except for the unique edge incident with y. For every y: if y has a neighbor in G that is a β-vertex or a neighbor in S≥3(G), connect y to this neighbor in T; otherwise connect it to an arbitrary neighbor (an α-vertex). This maximizes the number of leaves in c-paths with end vertex y, when y should be a leaf.

Lemma 16 shows that there are no c-paths in G between two vertices in Y with at least one internal vertex. Therefore this procedure has constructed a spanning tree in G: of every c-path, at most one edge is not present in E(T), and the edges in H corresponding to c-paths that are used in T form a spanning tree in H. Also, every vertex in Y is a leaf in T, so T is Y-compatible. By construction, T is also a Y-compatible spanning tree with the maximum number of leaves in S=2(G).

We determine the overall number ℓ of leaves in S=2(G) ∪ S∅(G) in T. If ℓ + |Y| ≥ k, then we have found the desired solution. If ℓ + |Y| < k, then there is no solution for this particular choice of Y, and we test the next set.

Lemma 17. If all the tests for all subsets Y ⊆ S≥3(G) with |Y| ≤ k fail, then the graph G does not possess a spanning tree with at least k leaves.

Lemma 17 yields the correctness of our algorithm. What about its time complexity? Before starting the testing of the subsets Y, we compute all the edge weights for H in O(n²) time. Every single test boils down to computing a minimum spanning tree T_H for H, and to translating T_H into a spanning tree for G. Since H has at most 4k vertices, all this can be done in O(k³) time. So the time complexity mainly depends on the number of tested subsets Y. By using Stirling's approximation x! ≈ x^x e^{−x} √(2πx) for factorials, and by suppressing some constant factors, we see that the number of tested subsets is at most

\binom{4k}{k} = (4k)! / ((3k)! · k!) ≈ (4k)^{4k} e^{−4k} / ((3k)^{3k} e^{−3k} · k^k e^{−k}) = 4^{4k} / 3^{3k} = (256/27)^k ≈ 9.4815^k.

Summarizing, we have proved the following theorem.
Theorem 18. The problem MaxLeaf of deciding whether a given input graph G has a spanning tree with at least k leaves can be solved by an FPT algorithm with time complexity O(n³ + 9.4815^k · k³). The parameter function of this algorithm is f(k) = O(9.49^k).
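The shape of the third-case test loop can be sketched as follows (our own rendering; the helpers y_compatible and count_extra_leaves stand for the Lemma 16 test and the tree-construction bookkeeping described above, and are deliberately left abstract).

    from itertools import combinations

    def mst_weight(vertices, wedges):
        # Kruskal on the weighted graph H - Y; wedges maps
        # frozenset({u, v}) -> w(e). Returns None if disconnected.
        parent = {v: v for v in vertices}
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        weight, used = 0, 0
        for uv, w in sorted(wedges.items(), key=lambda item: item[1]):
            u, v = tuple(uv)
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                weight += w
                used += 1
        return weight if used == len(vertices) - 1 else None

    def third_case(s_high, wedges, k, y_compatible, count_extra_leaves):
        # Enumerate all Y subsets of S^{>=3}(G) with |Y| <= k and test each.
        for size in range(k + 1):
            for Y in combinations(sorted(s_high), size):
                Y = set(Y)
                if not y_compatible(Y):          # the two Lemma 16 conditions
                    continue
                rest = [v for v in s_high if v not in Y]
                rest_edges = {uv: w for uv, w in wedges.items() if not (uv & Y)}
                if rest and mst_weight(rest, rest_edges) is None:
                    continue
                ell = count_extra_leaves(Y)      # leaves in S^{=2} and S^{0}
                if ell + len(Y) >= k:
                    return True                  # (G; k) is a YES-instance
        return False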
References

1. N. Alon (1990). Transversal numbers of uniform hypergraphs. Graphs and Combinatorics 6, 1–4.
2. H.L. Bodlaender (1993). On linear time minor tests and depth-first search. Journal of Algorithms 14, 1–23.
3. Y. Caro, D.B. West, and R. Yuster (2000). Connected domination and spanning trees with many leaves. SIAM Journal on Discrete Mathematics 13, 202–211.
4. G. Ding, Th. Johnson, and P. Seymour (2001). Spanning trees with many leaves. Journal of Graph Theory 37, 189–197.
5. R. Downey and M.R. Fellows (1995). Parameterized computational feasibility. In: P. Clote, J. Remmel (eds.): Feasible Mathematics II. Birkhäuser, Boston, 219–244.
6. R. Downey and M.R. Fellows (1998). Parameterized Complexity. Springer Verlag.
7. M.R. Fellows, C. McCartin, F.A. Rosamond, and U. Stege (2000). Coordinated kernels and catalytic reductions: An improved FPT algorithm for max leaf spanning tree and other problems. Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS'2000), Springer, LNCS 1974, 240–251.
8. M.R. Fellows and M.A. Langston (1992). On well-partial-order theory and its applications to combinatorial problems of VLSI design. SIAM Journal on Discrete Mathematics 5, 117–126.
9. G. Galbiati, F. Maffioli, and A. Morzenti (1994). A short note on the approximability of the maximum leaves spanning tree problem. Information Processing Letters 52, 45–49.
10. G. Galbiati, A. Morzenti, and F. Maffioli (1997). On the approximability of some maximum spanning tree problems. Theoretical Computer Science 181, 107–118.
11. M.R. Garey and D.S. Johnson (1979). Computers and Intractability. W.H. Freeman and Co., New York.
12. J.R. Griggs and M. Wu (1992). Spanning trees in graphs of minimum degree four or five. Discrete Mathematics 104, 167–183.
13. D.J. Kleitman and D.B. West (1991). Spanning trees with many leaves. SIAM Journal on Discrete Mathematics 4, 99–106.
14. N. Linial and D. Sturtevant (1987). Unpublished result.
15. H.-I. Lu and R. Ravi (1998). Approximating maximum leaf spanning trees in almost linear time. Journal of Algorithms 29, 132–141.
16. R. Solis-Oba (1998). 2-approximation algorithm for finding a spanning tree with the maximum number of leaves. Proceedings of the 6th Annual European Symposium on Algorithms (ESA'1998), Springer, LNCS 1461, 441–452.
Symbolic Analysis of Crypto-Protocols Based on Modular Exponentiation

Michele Boreale¹ and Maria Grazia Buscemi²

¹ Dipartimento di Sistemi e Informatica, Università di Firenze, Italy
[email protected]
² Dipartimento di Informatica, Università di Pisa, Italy
[email protected]

Abstract. Automatic methods developed so far for analysis of security protocols only model a limited set of cryptographic primitives (often, only encryption and concatenation) and abstract from low-level features of cryptographic algorithms. This paper is an attempt towards closing this gap. We propose a symbolic technique and a decision method for analysis of protocols based on modular exponentiation, such as Diffie-Hellman key exchange. We introduce a protocol description language along with its semantics. Then, we propose a notion of symbolic execution and, based on it, a verification method. We prove that the method is sound and complete with respect to the language semantics.
1 Introduction
During the last decade, a lot of research effort has been directed towards automatic analysis of crypto-protocols. Tools based on finite-state methods (e.g. [13]) take advantage of a well established model-checking technology, and are very effective at finding bugs. Infinite-state approaches, based on a variety of symbolic techniques ([2,3,8,14]), have emerged over the past few years. Implementations of these techniques (e.g. [4,16]) are still at an early stage. However, symbolic methods seem to be very promising in two respects. First, at least when the number of sessions is bounded, they can accomplish a complete exploration of the protocol's state space: thus they provide proofs or disproofs of correctness under Dolev-Yao-like [11] assumptions, even though the protocol's state space is infinite. Second, symbolic methods usually rely on representations of data that help to control very well the state explosion induced by communications. The application of automatic methods has mostly been confined to protocols built around 'black-box' enciphering and hashing functions. In this paper, we take a step towards broadening the scope of symbolic techniques, so as to include a class of low-level cryptographic operations. In particular, building on the general framework proposed in [5], we devise a complete analysis method for protocols that depend on modular exponentiation operations, like the Diffie-Hellman key exchange [10]. We expect that our methodology may be adapted to other low-level primitives (like RSA encryption).
This work has been partially supported by EU within the FET - Global Computing initiative, projects MIKADO and PROFUNDIS and by MIUR project NAPOLI.
The Diffie-Hellman protocol is intended for the exchange of a secret key over an insecure medium, without prior sharing of any secret. The protocol has two public parameters: a large prime p and a generator α for the multiplicative group Z*_p = {1, . . . , p − 1}. Assume A and B want to establish a shared secret key. First, A generates a random private value n_A ∈ Z*_p and B generates a random private value n_B ∈ Z*_p. Next, A and B exchange their public values (exp(x, y) denotes x^y mod p):

1. A → B : exp(α, n_A)
2. B → A : exp(α, n_B).

Finally, A computes the key as K = exp(exp(α, n_B), n_A) = exp(α, n_A × n_B), and B computes K = exp(exp(α, n_A), n_B) = exp(α, n_A × n_B). Now A and B share K, and A can use it to, say, encrypt a secret datum d and send it to B:

3. A → B : {d}_K.

The protocol's security depends on the difficulty of the discrete logarithm problem: computing y is computationally infeasible if only x and exp(x, y) are known. When defining a model for low-level protocols of this sort, one is faced with two conflicting requirements. On one hand, one should be accurate in accounting for the operations involved in the protocol (exponentiation, product) and their 'relevant' algebraic laws; even operations that are not explicitly mentioned in protocols, but that are considered feasible (like taking the k-th root modulo a prime, and division), must be accounted for, because an adversary could in principle take advantage of them. On the other hand, one must be careful in keeping the model effectively analysable. In this respect, recent undecidability results on related problems of equational unification [12] indicate that some degree of abstraction is unavoidable. The limitations of our model are discussed in Section 2. Technically, we simplify the model by avoiding explicit commutativity laws and by keeping a free algebra model and ordinary unification. In fact, we 'promote' commutativity to non-determinism. As an example, upon evaluation of the expression exp(exp(α, n), m), an attacker will non-deterministically produce exp(α, m × n) or exp(α, n × m). The intuition is that if there is some action that depends on these two terms being equal modulo ×-commutativity, then there is an execution trace of the protocol where this action will take place. This seems reasonable since we only consider safety properties (i.e., 'no bad action ever takes place'). Here is a more precise description of our work. In Section 2, paralleling [5], we introduce a syntax for expressions (including exp(·, ·) and related operations), along with a notion of evaluation. Based on this, we present a small protocol description language akin to the applied pi calculus [1], and its (concrete) semantics. The latter assumes a Dolev-Yao adversary and is therefore infinitary. In Section 3, we introduce a finitary symbolic semantics, which relies on a form of narrowing strategy, and discuss its relationship with the concrete semantics. A verification method based on the symbolic semantics is presented in Section 4: the main result is Theorem 2, which asserts the correctness and completeness of the method with respect to the concrete model. Remarkably, the presence of the modular root
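Referring back to the exchange described at the start of this section, the protocol is easy to express in code. Here is a minimal Python sketch (the parameters are toy values of our choosing; we have not verified that 7 actually generates Z*_p for this p, and a real deployment would use a much larger prime).

    import secrets

    p = 2_147_483_647          # public prime (2^31 - 1); toy-sized
    alpha = 7                  # public base, standing in for a generator

    n_A = secrets.randbelow(p - 2) + 1   # A's private value
    n_B = secrets.randbelow(p - 2) + 1   # B's private value

    msg1 = pow(alpha, n_A, p)  # 1. A -> B : exp(alpha, n_A)
    msg2 = pow(alpha, n_B, p)  # 2. B -> A : exp(alpha, n_B)

    K_A = pow(msg2, n_A, p)    # A computes exp(exp(alpha, n_B), n_A)
    K_B = pow(msg1, n_B, p)    # B computes exp(exp(alpha, n_A), n_B)
    assert K_A == K_B          # both equal alpha^(n_A * n_B) mod p

The shared value can then key the encryption {d}_K of step 3; the security discussion above explains why an eavesdropper who sees only msg1 and msg2 cannot feasibly recover n_A or n_B.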
operation plays a crucial role in the completeness proof. Directions for further research are discussed in Section 5. An extended version of the present paper is available as [6]. Complete proofs will appear in a forthcoming full version. Very recent work by Millen and Shmatikov [15] shows how to reduce the symbolic analysis problem in the presence of modular exponentiation and multiplication plus encryption to the solution of quadratic Diophantine equations; decidability, however, remains an open issue. Closely related to our problem is also protocol analysis in the presence of the xor operation, which has been recently proven to be decidable by Chevalier et al. [7] and, independently, by Comon-Lundh and Shmatikov [9].
2 The Model
We recall here the concept of frame from [5], and tailor it to the case of modular exponentiation and multiplication. We consider two countable disjoint sets of names m, n, . . . ∈ N and variables x, y, . . . ∈ V. The set N is in turn partitioned into a countable set of local names a, b, . . . ∈ LN and a countable set of environmental names a, b, . . . ∈ EN: these two sets represent infinite supplies of fresh basic values (keys, random numbers, . . . ) at the disposal of processes and of the (hostile) environment, respectively. It is technically convenient also to consider a set of marked variables x̂, ŷ, ẑ, . . . ∈ V̂, which will be used as place-holders for generic messages known to the environment. The set N ∪ V ∪ V̂ is ranged over by u, v, . . . . Given a signature Σ of function symbols f, g, . . . , each coming with its arity (constants have arity 0), we denote by E_Σ the algebra of terms (or expressions) on N ∪ V ∪ Σ, given by the grammar: ζ, η ::= u | f(ζ̃), where ζ̃ is a tuple of terms of the expected length. A term context C[·] is a term with a hole that can be filled with any ζ, thus yielding an expression C[ζ].

Definition 1 (Frame for Exponentiation). A frame F is a triple (Σ, M, ↓), where: Σ is a signature; M ⊆ E_Σ is a set of messages M, N, . . .; ↓ ⊆ E_Σ × E_Σ is an evaluation relation. We write ζ ↓ η for (ζ, η) ∈ ↓ and say that ζ evaluates to η.

Besides shared-key encryption {ζ}_η and decryption dec_η(ζ) (with η used as the key), the other symbols of Σ represent arithmetic operations modulo a fixed and public prime number, which is kept implicit: exponentiation exp(ζ, η), root extraction root(ζ, η), a constant α that represents a public generator, two constants for the multiplicative unit (unit, 1), two distinct symbols for the product, mult(ζ, η) and its result ζ × η, and three symbols, inv(ζ), inv′(ζ) and ζ⁻¹, representing the multiplicative inverse operation. The reason for using different symbols for the same operation is discussed below. All the underlying operations are computationally feasible. Evaluation (↓) is the reflexive and transitive closure of an auxiliary relation ⇝, as presented in Table 1. There, we use ζ1 × ζ2 × · · · × ζn as a shorthand for ζ1 × (ζ2 × · · · × ζn), while (i1, . . . , in) is any permutation of (1, . . . , n). The relation ⇝ is terminating, but not confluent. In fact, the non-determinism of ⇝ is intended to account for the commutativity of the product.
Table 1. F_DH, a frame for modular exponentiation

Signature:  Σ = { α, unit, 1, · × ·, {·}_(·), dec_(·)(·), mult(·, ·), inv(·), inv′(·), (·)⁻¹, exp(·, ·), root(·, ·) }

Factors    f ::= u | u⁻¹
Products   F ::= 1 | f1 × · · · × fk
Keys       K, H ::= f | exp(α, F)
Messages   M, N ::= F | K | {M}_K

(Dec)    dec_η({ζ}_η) ⇝ ζ
(Mult)   mult(ζ1 × · · · × ζk, ζk+1 × · · · × ζn) ⇝ ζi1 × · · · × ζin   (1 ≤ k < n)
(Inv1)   inv(ζ1 × · · · × ζn) ⇝ inv′(ζ1) × · · · × inv′(ζn)
(Inv2)   inv′(ζ⁻¹) ⇝ ζ
(Inv3)   inv′(ζ) ⇝ ζ⁻¹
(Unit1)  unit × ζ ⇝ ζ
(Exp)    exp(exp(ζ, η), η′) ⇝ exp(ζ, mult(η, η′))
(Ctx)    ζ ⇝ ζ′ implies C[ζ] ⇝ C[ζ′]
(Unit2), (Root): further rules for the unit constants and for root extraction.
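The promoted commutativity can be seen directly on the example from the Introduction. The following Python sketch is entirely our own illustration (a toy term encoding; it implements only the nested-exponentiation collapse and the product permutations described in the text, not the full frame) and collects all results of evaluating exp(exp(α, n), m).

    from itertools import permutations

    # Terms: ("exp", base, factors) with factors a tuple of atom names.
    # Collapsing nested exponentiation combines the factor tuples and,
    # as in rule (Mult), every permutation of the product is a result.
    def evaluate(term):
        if term[0] == "exp" and term[1][0] == "exp":
            base, inner = term[1][1], term[1][2]
            results = set()
            for fs in permutations(inner + term[2]):
                results |= evaluate(("exp", base, fs))
            return results
        return {term}

    t = ("exp", ("exp", "alpha", ("n",)), ("m",))
    print(evaluate(t))
    # two results: exp(alpha, n x m) and exp(alpha, m x n)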
As for proving lower bounds, the existence of a global variable ordering ensures that one can proceed as follows. Having put a cut through a σ-OBDD representing f at distance of, say, k from the source, the number of distinct subfunctions f|_π, where π ranges over all paths from the source to the frontier nodes of the cut, is a lower bound on the σ-OBDD size of f. OBDDs are highly restricted branching programs. Many even simple functions have exponential OBDD size (see [7], [11]). To maintain the essence of the above subfunction argument for more general models, the following observation is useful. If B is a deterministic BP1 on {x1, x2, . . . , xn}, then for each input a ∈ {0, 1}^n there is a variable ordering σ(a) according to which the bits of a are queried. But not every combination of variable orderings can be implemented by deterministic BP1s. Only those resulting from graph orderings, independently introduced by Gergov and Meinel (see [12]) and Sieling and Wegener (see [23]), are possible.

Definition 1. A graph ordering G is a deterministic BP1 such that each branching node has outdegree two, and each variable is tested on each path from the source to the target exactly once. A BP1 B is called a graph-driven one guided by a graph ordering G over the same set of variables as B, if the following condition is satisfied. For an arbitrary input a ∈ {0, 1}^n, the list of variables inspected on every computation path for a in B is a subsequence of the corresponding list resulting from G.

For every deterministic BP1 B, it is easy to construct a graph ordering G that guides B. But it is clear that there are BP1s that are not guided by a graph ordering. Of course, OBDDs are graph-driven deterministic BP1s. ⊕-OBDDs were introduced by Gergov and Meinel in [13]; they have been intensively studied in [25] from a theoretical point of view. Heuristics for a successful practical implementation are due to Meinel and Sack (see [21], [18], [19]). Examples of functions showing that ⊕-OBDDs are more powerful than OBDDs are given in [13]. Graph-driven ⊕-BP1s have a strictly larger descriptive power than both deterministic BP1s and ⊕-OBDDs with respect to polynomial size. This follows from results due to Sieling [24].
Up to now, proving superpolynomial lower bounds on the size of ⊕-BP1s is a challenging open problem in complexity theory. In [22] exponential lower bounds for pointer functions on the size of (⊕, k)-BP1s are proved. A (⊕, k)-BP1 is a read-once BP with the source being the only nondeterministic node, where k denotes the fan-out of the source. In [8] exponential lower bounds of magnitude 2^{Ω(√n)} on the size of well-structured graph-driven ⊕-BP1s for certain linear code functions have been proved. Well-structured ⊕-BP1s and ∨-BP1s have been further investigated in [4] and [6]. In [4] a strongly exponential lower bound for integer multiplication is proved. In [6] polynomial size well-structured ⊕-BP1s are separated from polynomial size general ⊕-BP1s. The notion of well-structured graph-driven BP1s was introduced in [23]. The reader who is not familiar with this notion is referred to [26].

The results of this paper can be summarized as follows. In Section 2 we characterize all BP1s that are guided by graph orderings. The latter is the case if and only if for each input there is a variable ordering that is compatible with each computation path for the input. This shows that the condition of being guided by a graph ordering is in fact a very natural combinatorial one. In Section 3 we prove a lower bound criterion for general graph-driven ⊕-BP1s. This criterion is applied in three cases. It is supposed that certain linear code functions do not have polynomially bounded unrestricted ⊕-BP1s. The result of Section 4 supports this supposition. We prove a lower bound of magnitude 2^{Ω(√n)} on the size of general graph-driven ⊕-BP1s representing them.

Read-once projections, introduced by Bollig and Wegener in [5], are projections of multiplicity one for each variable. They are an appropriate reduction notion for all branching program classes C subject to the read-once condition, since they preserve polynomial size. Given such a class C, in [26] on page 89 the following is observed: an exponential lower bound for the function PERM on the size of C-BP1s proves that projections with multiplicity 2 may lead to an exponential blow-up of the C-size. Such a lower bound on the size of OBDDs and ∨-OBDDs is proved in [16] and [15], respectively. In Section 5 we prove a lower bound of magnitude 2^{Ω(√n)} on the size of general graph-driven ⊕-BP1s representing PERM. Finally, in Section 6 it is proved that unrestricted ⊕-BP1s are strictly stronger than graph-driven ⊕-BP1s.
2 Characterizing General Graph–Driven ⊕–BP1s
In this section we give the following characterization for a ⊕-BP1 to be a graph-driven one.

Proposition 1. Let B be a ⊕-BP1 on the set of variables {x1, x2, . . . , xn}. Then there exists a graph ordering G such that B is guided by G if and only if the following condition is satisfied. For each input a there is an ordering σ(a) of
{x1, x2, . . . , xn} such that on each computation path for a the bits of a are queried according to σ(a).

Proof. It is clear that the condition is necessary. Assume now that the condition is fulfilled for a ⊕-BP1 B. We show that we can choose a variable xi such that for each input a the variables can be tested according to an ordering that starts with xi. Then the graph ordering G can be constructed as follows. At the very beginning we create the unlabeled source s. The unique successor u of s is labeled with xi. Then we calculate the subdiagrams B_{xi=0} (B_{xi=1}, resp.) by setting in B the variable xi to 0 (to 1, resp.). Now for B_{xi=b} (b = 0, 1) there is another variable x_{jb} that can be tested first in B_{xi=b} for all inputs a with ai = b. So we label the b-successor of u by x_{jb}, and then the procedure iterates.

We assume that for each variable xi there is an input ai such that all orderings compatible with ai cannot start with xi. Then for each xi there is an input ai and a computation path pi for ai such that a variable xj (j ≠ i) is tested on pi before xi. After having renamed the indices, we get inputs a1, a2, . . . , aν and computation paths p1, p2, . . . , pν such that the variable xν is tested before x1 on p1, and for i = 2, . . . , ν the variable x_{i−1} is the first variable tested on pi, and xi occurs on pi, too. Clearly, the number ν is always greater than or equal to 2 and less than or equal to n. We call the sequence xν, x1, . . . , x_{ν−1}, xν a cycle with respect to the inputs a1, a2, . . . , aν and the corresponding computation paths p1, p2, . . . , pν. For i = 1, . . . , ν, let Si be the set of variables tested on pi before xi, with xi being excluded. The number Σ_{i=1}^{ν} |Si| is called the weight of the cycle. Let us consider from now on a cycle xν, x1, . . . , x_{ν−1}, xν with respect to the inputs a1, a2, . . . , aν and the corresponding computation paths p1, p2, . . . , pν of minimal weight. We observe that the minimality entails that the sets Si (i = 1, . . . , ν) are pairwise disjoint. Since the sets S1, S2, . . . , Sν are pairwise disjoint, there is an input a such that for all i = 1, . . . , ν, a|_{Si} = ai|_{Si}. Contradiction.
3 A Lower Bound Criterion for Graph–Driven ⊕–BP1s
Let B be a graph-driven ⊕-BP1 on the set of variables {x1, x2, . . . , xn} guided by a graph ordering G representing the Boolean function f. We define two vector spaces over IF2. The space IB(B) is spanned by the functions Res_v, where v is a node of B and Res_v denotes the function represented by the subdiagram of B rooted at v. The second space, denoted by IB_G(f), is the span of all subfunctions f|_π, where π is a path from the source to a node w in G and f|_π results from f by setting the variables according to the labels of the nodes and edges on π. There is an equivalent way of defining the function f ∈ IB_n represented by a ⊕-BP B on {x1, x2, . . . , xn}. For each node u of the diagram B, we inductively
define its resulting function Res_u. The resulting function of the target equals the all-one function. For a branching node u labeled with the variable x,

Res_u := (x ⊕ 1) ∧ ⊕_{v ∈ Succ_0(u)} Res_v ⊕ x ∧ ⊕_{v ∈ Succ_1(u)} Res_v.

If s is the source, then Res_s := ⊕_{v ∈ Succ(s)} Res_v. The function Res(B) : {0, 1}^n → {0, 1} represented by the whole diagram is defined to be Res_s. We are now in the position to state a lower bound criterion, whose proof is very easy. It estimates the size of a graph-driven ⊕-BP1 by an invariant of the graph ordering and the function represented.

Theorem 1. Let B be a graph-driven ⊕-BP1 guided by a graph ordering G representing the Boolean function f. Then SIZE(B) ≥ dim_{IF2} IB_G(f).

Proof. First we observe that SIZE(B) ≥ dim_{IF2} IB(B). Let f|_π be any generating element of the vector space IB_G(f), and let α be the partial assignment to the set of variables {x1, x2, . . . , xn} associated with the path π. Since the branching program B is guided by the graph ordering G, we are led to nodes v1, v2, . . . , vν when traversing B starting at the source according to the partial assignment α. Consequently, f|_π = ⊕_{j=1}^{ν} Res_{vj}. This entails that IB(B) contains IB_G(f) as a subspace. The claim follows.

Corollary 1. Let π1, π2, . . . , πν be paths in G starting at the source. Let α1, . . . , αν be the partial assignments associated with these paths. If the subfunctions f_{α1}, . . . , f_{αν} are linearly independent, then SIZE(B) ≥ ν.

Proof. The subspace of IB_G(f) spanned by {f_{α1}, . . . , f_{αν}} is of dimension ν.
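The Res recursion above translates directly into code. In the following sketch (our own encoding, not the paper's) a branching node is a triple (variable index, 0-successor list, 1-successor list), the target is the string "target", and the source carries only its successor list.

    def res(node, a):
        # Evaluate Res_node on input a, following the recursion above:
        # XOR over the successor set selected by the tested bit.
        if node == "target":
            return 1
        if node[0] == "source":
            return sum(res(v, a) for v in node[1]) % 2
        var, succ0, succ1 = node
        succ = succ1 if a[var] else succ0
        return sum(res(v, a) for v in succ) % 2

    # Example: x0 XOR x1 as a tiny parity BP.
    pos = (1, [], ["target"])               # Res = x1
    neg = (1, ["target"], [])               # Res = x1 + 1
    root = ("source", [(0, [pos], [neg])])
    assert [res(root, a) for a in [(0,0), (0,1), (1,0), (1,1)]] == [0, 1, 1, 0]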
4 A Lower Bound for Linear Codes
A linear code C is a linear subspace of IF2^n. Our first explicit lower bound is for the characteristic function of such a linear code C, that is, f_C : IF2^n → {0, 1} defined by f_C(a) = 1 ⟺ a ∈ C. To this end we will give some basic definitions and facts on linear codes. The Hamming distance of two code words a, b ∈ C is defined to be the number of 1's of a ⊕ b. The minimal distance of a code C is the minimal Hamming distance of two distinct elements of C. The dual C⊥ is the set of all vectors b such that a1 b1 ⊕ . . . ⊕ an bn = 0 for all elements a ∈ C. A set D ⊆ IF2^n is defined to be k-universal if, for any subset of k indices I ⊆ {1, . . . , n}, the projection onto these coordinates restricted to the set D gives the whole space IF2^k. The next lemma is well-known. See [14] for a proof.

Lemma 1. If C is a code of minimal distance k + 1, then its dual C⊥ is k-universal.

Theorem 2. Let C ⊆ IF2^n be a linear code of minimal distance d whose dual C⊥ has minimal distance d⊥. Then each graph-driven ⊕-BP1 representing its characteristic function f_C has size bounded below by 2^{min{d, d⊥} − 1}.
Proof. Let B be a graph-driven ⊕-BP1 guided by G representing f = f_C. Consider the set of all nodes of the graph ordering G at depth k from the source, where k := min{d, d⊥} − 1. Thus for each such node v and each path π leading from the source to v, exactly k variables are tested on π. For m := 2^k, let α1, . . . , αm be the partial assignments of the variables x1, x2, . . . , xn resulting from these paths. Observe that the code C is both of minimal distance greater than k and k-universal. We consider the subfunctions f_{α1}, f_{α2}, . . . , f_{αm} and take notice of the fact that these functions formally depend on all variables x1, x2, . . . , xn. According to Corollary 1 it suffices to prove that f_{α1}, f_{α2}, . . . , f_{αm} are linearly independent. Let {f_{αi1}, f_{αi2}, . . . , f_{αiµ}} be any nonempty subset of {f_{α1}, f_{α2}, . . . , f_{αm}}. Having assumed without loss of generality that (αi1, αi2, . . . , αiµ) is equal to (α1, α2, . . . , αµ), we have to show that ⊕_{i=1}^{µ} f_{αi} ≠ 0. For each partial assignment α′1 to the variables {x1, x2, . . . , xn} whose domain is complementary to the domain of α1, we can define a vector (α1, α′1) := (a1, a2, . . . , an) ∈ {0, 1}^n as follows:

aj := α1(xj) if α1(xj) is defined; α′1(xj) if α′1(xj) is defined (j = 1, 2, . . . , n).

Since C is k-universal, there is an α′1 such that a := (α1, α′1) is a member of C. Consequently f_{α1}(a) = 1. For each 1 < i ≤ µ, we show that f_{αi}(a) = 0. Obviously, f_{αi}(a) = f(a^{(i)}), where

a^{(i)}_j := αi(xj) if αi(xj) is defined; aj otherwise.

Since the distance between a and a^{(i)} is less than or equal to k, the claim follows.

Corollary 2. Let n = 2^l and r = ⌊l/2⌋. Then every graph-driven ⊕-BP1 representing the characteristic function of the r-th order binary Reed-Muller code R(r, l) has size bounded below by 2^{Ω(√n)}.

Proof. We apply that the code R(r, l) is linear and has minimal distance 2^{l−r}. It is known that the dual of R(r, l) is R(l − r − 1, l) (see [17]).
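The quantities in Theorem 2 are easy to brute-force for small codes. The following sketch (our choice of example: the [7,4] Hamming code, whose dual is the [7,3] simplex code) computes d and d⊥ exhaustively.

    from itertools import product

    GEN = [(1,0,0,0,0,1,1),   # a generator matrix of the [7,4] Hamming code
           (0,1,0,0,1,0,1),
           (0,0,1,0,1,1,0),
           (0,0,0,1,1,1,1)]

    def span(rows):
        n = len(rows[0])
        return {tuple(sum(c * r[j] for c, r in zip(coeffs, rows)) % 2
                      for j in range(n))
                for coeffs in product((0, 1), repeat=len(rows))}

    def min_distance(code):
        # For a linear code this is the minimum weight of a nonzero word.
        return min(sum(w) for w in code if any(w))

    C = span(GEN)
    C_dual = {w for w in product((0, 1), repeat=7)
              if all(sum(x * y for x, y in zip(w, c)) % 2 == 0 for c in C)}
    print(min_distance(C), min_distance(C_dual))   # prints: 3 4

For this tiny code, Theorem 2 only gives a bound of 2^{min{3,4} − 1} = 4; the Reed-Muller family in Corollary 2 is what makes the bound exponential.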
5 A Lower Bound for Permutation Matrices
An n × n matrix over {0, 1} is defined to be a permutation matrix if each row and each column contains exactly one entry 1. The well-known function PERM, depending on n² Boolean variables, accepts exactly those inputs corresponding to permutation matrices. In this section we adopt ideas from [16] and [15].

Theorem 3. Each graph-driven ⊕-BP1 representing PERM has size bounded below by Ω(n^{−1/2} · 2^n).
Proof. Let B be a graph-driven ⊕-BP1 guided by a graph ordering G representing the function f := PERM_n depending on the variables x_{ij} (i, j = 1, 2, . . . , n). We consider the n! inputs a = (a_{ij})_{1≤i,j≤n} that correspond to permutation matrices and the corresponding paths in the graph ordering G. Having tested exactly n/2 variables 1, we truncate these paths. Let A1 := {α1, α2, . . . , αν} be the partial assignments to the set of variables {x_{ij} | i, j = 1, 2, . . . , n} associated with these truncated paths. For such a partial assignment α, let R(α) be the set of row indices i such that α(x_{ij}) = 1 for some column index j. Analogously, let C(α) be the set of column indices j such that α(x_{ij}) = 1 for some row index i. Then |C(α)| = |R(α)| = n/2 by construction. We consider sets A of the above defined partial assignments such that for distinct α, β ∈ A it holds that R(α) ≠ R(β) or C(α) ≠ C(β). We recapitulate the proof given in [16] for the fact that there is one of these subsets A such that |A| ≥ n!/((n/2)!)². Indeed, if we fix two subsets C, R ⊆ {1, 2, . . . , n} of columns and rows, where |C| = |R| = n/2, then there are exactly ((n/2)!)² inputs a that lead to partial assignments α such that C(α) = C and R(α) = R. They result from combining the (n/2)! bijections from R to C with the (n/2)! bijections from {1, 2, . . . , n} \ R to {1, 2, . . . , n} \ C. Since there are exactly n! accepted inputs, the claim follows.

Let A = {α1, α2, . . . , αm} (m ≥ n!/((n/2)!)²) be a set of partial assignments the existence of which we have just proved. To prove that A fulfils the prerequisites of Corollary 1, we choose an arbitrary subset A′ of A. We may assume without loss of generality that A′ = {α1, α2, . . . , αµ}. We show that ⊕_{i=1}^{µ} f_{αi} ≠ 0. By the choice of A there is a partial assignment α′1 such that (α1, α′1) is a permutation matrix. Thus f_{α1}(0, . . . , 0, α′1) = 1, where (0, . . . , 0, α′1) is the following matrix:

(0, . . . , 0, α′1)_{k,l} := α′1(x_{k,l}) if α′1(x_{k,l}) is defined; 0 otherwise.

We show now that f_{αi}(0, . . . , 0, α′1) = 0 for i > 1. For the sake of deriving a contradiction, let us assume that there is an index i > 1 such that f_{αi}(0, . . . , 0, α′1) = 1. Then there is a permutation matrix consisting of only those ones set by αi and α′1. But this implies R(αi) = R(α1) and C(αi) = C(α1), because otherwise there is a row or a column of the permutation matrix a defined by

a_{k,l} := αi(x_{k,l}) if αi(x_{k,l}) is defined; (0, . . . , 0, α′1)_{k,l} if αi(x_{k,l}) is not defined;

without any one. Contradiction. Now the claim follows from Stirling's formula.
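As a sanity check of the definition, and of the count n! of accepted inputs used in the proof, PERM is a one-liner to state as a predicate; the following sketch is ours.

    from itertools import product

    def perm(X):
        # X is an n x n 0/1 matrix given as a tuple of rows.
        n = len(X)
        return (all(sum(row) == 1 for row in X) and
                all(sum(X[i][j] for i in range(n)) == 1 for j in range(n)))

    n = 3
    inputs = [tuple(zip(*[iter(bits)] * n))
              for bits in product((0, 1), repeat=n * n)]
    print(sum(perm(X) for X in inputs))   # prints 6 = 3! accepted inputs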
6 Unrestricted ⊕–BP1s Are Stronger than Graph–Driven Ones
To get a function that has small unrestricted ⊕-BP1s but requires exponentially sized graph-driven ⊕-BP1s, we consider the following function that, as PERM, depends on a matrix X of n² Boolean variables. It is defined by 1l_C ∨ 1l_{R+}, where

1l_C(X) = 1 if each column of X contains exactly one 1, and 0 otherwise;
1l_{R+}(X) = 1 if n − 1 rows of X contain exactly one 1 and one row contains exactly two 1's, and 0 otherwise.

It is easy to construct an OBDD of size bounded above by O(n²) testing the variables in a columnwise manner and taking the value one if each column contains a single 1. In the same way we get a linear sized OBDD that tests the variables in a rowwise manner and accepts if n − 1 rows contain a single 1 and one row contains exactly two. Joining the sources of these two OBDDs together, we get a ⊕-BP1 of linear size that represents 1l_C ∨ 1l_{R+}.

Theorem 4. Each graph-driven ⊕-BP1 representing 1l_C ∨ 1l_{R+} has size bounded below by Ω(n^{−1/4} · 2^{n/2}).

Proof. Let B be a graph-driven ⊕-BP1 guided by a graph ordering G that represents f := 1l_C ∨ 1l_{R+} on the variables x_{ij} (i, j = 1, 2, . . . , n). Without loss of generality we suppose that n is even. As in the case of Theorem 3, we consider the n! inputs a = (a_{ij})_{1≤i,j≤n} that correspond to permutation matrices and the corresponding paths in the graph ordering G. Having tested exactly n/2 variables 1, we truncate these paths. We consider the partial assignments A1 := {α1, α2, . . . , αν} to the set of variables {x_{ij} | i, j = 1, 2, . . . , n} associated with these truncated paths. First, we observe that
|A1| = ν ≤ n² · \binom{n² − n/2}{n/2} ≤ n² · (2en)^{n/2}.   (1)

Indeed, there are \binom{i + n/2}{i} paths in G starting from the source along which i variables are tested 0 and n/2 variables are tested 1. Permutation matrices can only follow those paths where the number of variables tested 0 is less than or equal to n² − n. Equation (1) follows. Second, without loss of generality let A2 := {α1, α2, . . . , αµ} be those elements of A1 that can be extended to at least two permutation matrices. By the pigeonhole principle, we get (ν − µ) + µ · (n/2)! ≥ n!. Consequently, |A2| = µ ≥ (n! − ν)/((n/2)! − 1). Without loss of generality, let A = {α1, α2, . . . , ακ} for κ ≤ ν be those elements of A2 such that for distinct α, β ∈ A we have C(α) ≠ C(β) or R(α) ≠ R(β) (see the proof of
Henrik Brosenne, Matthias Homeister, and Stephan Waack
Theorem 3 for the definitions of R and C). Since at most (n/2)! elements n!−ν . α ∈ A2 may have the same pair (R, C), we obtain |A| = κ ≥ (n/2)!·(n/2)! ν n! Since limn→∞ (n/2)!·(n/2)! = 0 we get |A| = κ ≥ (n/2)!·(n/2)! − o(1). Now, let AC (AR ) be a subset of A of maximal size satisfying the following C(β) property. If α, β ∈ AC (α, β ∈ AR ) are two distinct elements, then C(α) = (R(α) = R(β)).It follows from an easy counting argument, that |AC | < |A| implies |AR | ≥ |A|. Thus we have the following two cases to distinguish. Case |AC | ≥ |A|. Let AC = {β1 , β2 , . . . , βν } be any subset of the set AC . ν We have to show, that i=1 fβi = 0. By the choice of A there is a partial assignment β1 such that (β1 , β1 ) is a permutation matrix. Thus fβ1 (0, . . . , 0, β1 ) = 1, where the matrix (0, . . . , 0, β1 ) is defined as in the proof of theorem 3. Moreover, we get that fβi (0, . . . , 0, β1 ) = 0 for i > 0. By the definition of AC we have that C(βi ) = C(β1 ) for i > 0. Thus there is a column of the matrix a defined by if βi (xk, ) is defined; βi (xk, ) ak, := (2) (0, . . . , 0, β1 )k, if βi (xk, ) is not defined; without any one. So we get that fβi (0, . . . , 0, β1 ) = 0 for i > 0 and the inequality ν i=1 fβi = 0 follows. Case |AR | ≥ |A|. Let AR = {β1 , β2 , . . . , βν } be any subset of the set AR . ν Again we have to show, that i=1 fβi = 0. Each element βi of AR can be extended to at least two permutation matrices (βi , βi ) and (βi , β ). So we can construct an assignment βi∗ such that (βi , βi∗ ) is a matrix that contains n − 1 rows with exactly one entry 1 and one row that contains exactly two ones. Thus fβ1 (0, . . . , 0, β1∗ ) = 1. For i > 0 we get that fβ1 (0, . . . , 0, βi∗ ) = 0 since similar to the first case there is a row of the matrix a as defined in (2) without any one. So the claim follows.
References 1. M. Ajtai. A non-linear time lower bound for Boolean branching programs. In Proceedings, 40th FOCS, pages 60–70, 1999. 2. P. Beame, M. Saks, X. Sun, and E. Vee. Super–linear time-space tradeoff lower bounds for randomized computations. In Proceedings, 41st FOCS, pages 169–179, 2000. 3. P. Beame and E. Vee. Time-space trade-offs, multiparty communication complexity, and nearest neighbour problems. In Proceedings, 34th STOC, pages 688–697, 2002. 4. B. Bollig, St. Waack, and P. Woelfel. Parity graph-driven read-once branching programs and an exponential lower bound for integer multiplication. In Proceedings 2nd IFIP International Conference on Theoretical ComputerScience, 2002. 5. B. Bollig and I. Wegener. Read-once projections and formal circuit verification with binary decision diagrams. In Proceedings, STACS’96, Lecture Notes in Computer Science, pages 491–502. Springer Verlag, 1996. 6. B. Bollig and P. Woelfel. A lower bound technique for nondeterministic graphdriven read-once branching programs and its applications. In Proceedings, 27th MFCS, Lecture Notes in Computer Science. Springer, 2002.
7. Y. Breitbart, H. B. Hunt, and D. Rosenkrantz. The size of binary decision diagrams representing Boolean functions. Theoretical Computer Science, 145:45–69, 1995. 8. H. Brosenne, Homeister M., and St. Waack. Graph–driven free parity BDDs: Algorithms and lower bounds. In Proceedings, 26th MFCS, volume 2136 of Lecture Notes in Computer Science, pages 212–223. Springer Verlag, 2001. 9. R. E. Bryant. Symbolic manipulation of Boolean functions using a graphical representation. In Proceedings, 22nd DAC, pages 688–694, Piscataway, NJ, 1985. IEEE. 10. R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, 35:677–691, 1986. 11. R. Bryant. On the complexity of VLSI implementations of Boolean functions with applications to integer multiplication. IEEE Transactions on Computers, 40:205– 213, 1991. 12. J. Gergov and Ch. Meinel. Frontiers of feasible and probabilistic feasible Boolean manipulation with branching programs. In Proceedings, 10th STACS, volume 665 of Lecture Notes in Computer Science, pages 576–585. Springer Verlag, 1993. 13. J. Gergov and Ch. Meinel. Mod-2-OBDDs – a data structure that generalizes exor-sum-of-products and ordered binary decision diagrams. Formal Methods in System Design, 8:273–282, 1996. 14. S. Jukna. Linear codes are hard for oblivious read-once parity branching programs. Information Processing Letters, 69:267–269, 1999. 15. M. Krause, Ch. Meinel, and St. Waack. Separating the eraser Turing machine classes Le , NLe , co-NLe , and Pe . Theoretical Computer Science, 86:267–275, 1991. 16. M. Krause. Exponential lower bounds on the complexity of local and real-time branching programs. Journal of Information Processing and Cybernetics (EIK), 24:99–110, 1988. 17. E. J. MacWilliams and N. J. A. Sloane. The Theory of Error–Correcting Codes. Elsevier, 1977. 18. Ch. Meinel and H. Sack. Heuristics for ⊕-OBDDs. In Proceedings, IEEE/ACM International Workshop of Logig and Synthesis, pages 304–309, 2001. 19. Ch. Meinel and H. Sack. Improving XOR-node placements for ⊕-OBDDs. In Proceedings, 5th International Workshop of Reed-Muller Expansion in Circuit Design, pages 51–55, 2001. ` I. Nechiporuk. A Boolean function. Sov. Math. Doklady, 7:999–1000, 1966. 20. E. 21. H. Sack. Improving the Power of OBDDs by Integrating Parity Nodes. PhD thesis, Univ. Trier, 2001. 22. P. Savick´ y and D. Sieling. A hierarchy result for read–once branching programs with restricted parity nondeterminism. In Proceedings, 25th MFCS, volume 1893 of Lecture Notes in Computer Science, pages 650–659. Springer Verlag, 2000. 23. D. Sieling and I. Wegener. Graph driven BDDs – a new data structure for Boolean functions. Theoretical Computer Science, 141:238–310, 1995. 24. D. Sieling. Lower bounds for linear transformed OBDDs and FBDDs. In Proceedings, 19th FSTTCS, number 1738 in Lecture Notes in Computer Science, pages 356–368. Springer Verlag, 1999. 25. St. Waack. On the descriptive and algorithmic power of parity ordered binary decision diagrams. Information and Computation, 166:61–70, 2001. 26. I. Wegener. Branching Programs and Binary Decision Diagrams – Theory and Applications. SIAM Monographs on Discrete Mathematics and Applications. SIAM, Philadelphia, 2000.
The Minimal Graph Model of Lambda Calculus

Antonio Bucciarelli¹ and Antonino Salibra²

¹ Université Paris 7, Equipe PPS, 2 place Jussieu, 75251 Paris Cedex 05, France
[email protected]
² Università Ca' Foscari di Venezia, Dipartimento di Informatica, Via Torino 155, 30172 Venezia, Italia
[email protected]

Abstract. A longstanding open problem in lambda-calculus, raised by G. Plotkin, is whether there exists a continuous model of the untyped lambda-calculus whose theory is exactly the beta-theory or the beta-eta-theory. A related question, raised recently by C. Berline, is whether, given a class of lambda-models, there is a minimal equational theory represented by it. In this paper, we give a positive answer to this latter question for the class of graph models à la Plotkin-Scott-Engeler. In particular, we build a graph model the equational theory of which is exactly the set of equations satisfied in any graph model.
1 Introduction
Lambda theories (that is, congruence relations on λ-terms closed under α- and β-conversion) are equational extensions of the untyped lambda calculus that are closed under derivation. Lambda theories arise by syntactical considerations (a lambda theory may correspond to a possible operational, i.e. observational, semantics of the lambda calculus) as well as by semantic ones (a lambda theory may be induced by a model of lambda calculus through the kernel congruence relation of the interpretation function); see e.g. [4], [7]. Since the lattice of the lambda theories is a very rich and complex structure, syntactical techniques are usually difficult to use in the study of lambda theories. Therefore, semantic methods have been extensively investigated. Computational motivations and intuitions justify Scott's view of models as partially ordered sets with a least element and of computable functions as monotonic functions over these sets. After Scott, mathematical models of the lambda calculus in various categories of domains were classified into semantics according to the nature of their representable functions (see e.g. [4], [7], [17]). Scott's continuous semantics [19] is given in the category whose objects are complete partial orders (cpo's) and morphisms are Scott continuous functions. The stable semantics introduced by Berry [8] and the strongly stable semantics introduced by Bucciarelli-Ehrhard [9] are strengthenings of the continuous semantics. The stable semantics is given in the category of DI-domains with stable functions as morphisms, while the strongly stable one is given in the category of DI-domains with coherence, and strongly stable functions as morphisms. All these
⋆ Partially supported by MURST Cofin'01 COMETA Project and by a visiting fellowship granted by the Equipe PPS of the University Paris 7-Denis Diderot.
semantics are structurally and equationally rich, in the sense that it is possible to build 2^ℵ₀ models in each of them inducing pairwise distinct lambda theories [14,15]. The following are longstanding open problems in lambda calculus (see Berline [7]):

Question 1. Is there a continuous (stable, strongly stable) model whose theory is exactly λβ or λβη (where λβ is the minimal lambda theory and λβη is the minimal extensional lambda theory)?

Question 1 can be weakened in two ways:

Question 2. Is λβ the intersection of all theories of the continuous semantics? (And similarly for λβη and extensional models, and similarly for the other semantics.)

A lambda theory T is the minimal theory of a class C of models if there exists a model in the class which induces T, while all the other models in the class induce theories including T.

Question 3. Given a class of models in a given semantics, is there a minimal lambda theory represented in it?

Two related questions have been answered. Given a semantics, it is natural to ask whether all possible λ-theories are induced by a model in the semantics. Negative answers to this question for the continuous, stable and strongly stable semantics were obtained respectively by Honsell-Ronchi della Rocca [13], Bastonero-Gouy [6] and Salibra [18]. All the known semantics are thus incomplete for arbitrary lambda theories. On the other hand, Di Gianantonio et al. [12] have shown that λβη can arise as the theory of a model in the ω₁-semantics (thus all questions collapse and have a positive answer in this case). If ω₀ and ω₁ denote, respectively, the first infinite ordinal and the first uncountable ordinal, then the models in the ω₁-semantics are the reflexive objects in the category whose objects are ω₀- and ω₁-complete partial orders, and whose morphisms preserve limits of ω₁-chains (but not necessarily of ω₀-chains). Another result of [12] is that Question 3 admits a positive answer for Scott's continuous semantics, at least if we restrict to extensional models. However, the proofs of [12] use logical relations, and since logical relations do not allow one to distinguish terms with the same applicative behavior, the proofs do not carry over to non-extensional models.

Among the set-theoretical models of the untyped lambda calculus that were introduced in the seventies and early eighties, there is a class whose members are particularly easy to describe (see Section 2 below). These models, referred to as graph models, were isolated by Plotkin, Scott and Engeler [4] within the continuous semantics. Graph models have proved useful for giving proofs of consistency of extensions of λ-calculus and for studying operational features of λ-calculus. For example, the simplest of all graph models, namely Engeler-Plotkin's model, has been used to give concise proofs of the head-normalization theorem and of the left-normalization theorem of λ-calculus (see [7]), while a semantical proof, based on graph models, of the "easiness" of (λx.xx)(λx.xx) was obtained by Baeten and Boerboom in [3].

Intersection types were introduced by Dezani and Coppo [10] to overcome the limitations of Curry's type discipline. They provide a very expressive type language which allows one to describe and capture various properties of λ-terms. By duality, type theories give rise to filter models of lambda calculus (see [1], [5]). Di Gianantonio and Honsell
[11] have shown that graph models are strictly related to filter models, since the class of λ-theories induced by graph models is included in the class of λ-theories induced by non-extensional filter models. Alessi et al. [2] have shown that this inclusion is strict, namely there exists an equation between λ-terms which cannot be proved in any graph model, whilst this is possible with non-extensional filter models.

In this paper we show that the graph models admit a minimal lambda theory. This result provides a positive answer to Question 3 for the restricted class of graph models. An interesting question arises: which equations between λ-terms are equated by this minimal lambda theory? The answer to this difficult question is still unknown; we conjecture that the right answer is the minimal lambda theory λβ. By what we said in the previous paragraph, this would solve the same problem for the class of filter models.

We conclude this introduction by giving a sketch of the technicalities used in the proof of the main theorem. For each equation between λ-terms which fails in some graph model, we fix a graph model where the equation fails. Then we use a completion technique for gluing together all these models into a single graph model. Finally, we show that the equational theory of this completion is the minimal lambda theory of graph models.
2 Graph Models
To keep this article self-contained, we summarize some definitions and results concerning graph models that we need in the subsequent parts of the paper. With regard to the lambda calculus we follow the notation and terminology of [4]. The class of graph models belongs to Scott's continuous semantics. Historically, the first graph model was Plotkin and Scott's Pω, which is also known in the literature as "the graph model". "Graph" refers to the fact that the continuous functions were encoded in the model via (a sufficient fragment of) their graphs. As a matter of notation, for every set D, D^∗ is the set of all finite subsets of D, while P(D) is the powerset of D.

Definition 1. A graph model is a pair (D, p), where D is an infinite set and p : D^∗ × D → D is an injective total function.

Let (D, p) be a graph model and Env_D be the set of D-environments ρ mapping the set of the variables of λ-calculus into P(D). We define the interpretation M^p : Env_D → P(D) of a λ-term M as follows:

– x^p_ρ = ρ(x)
– (MN)^p_ρ = {α ∈ D | ∃a ⊆ N^p_ρ s.t. p(a, α) ∈ M^p_ρ}
– (λx.M)^p_ρ = {p(a, α) | α ∈ M^p_{ρ[x:=a]}}

It is not difficult to show that any graph model (D, p) is a model of β-conversion, i.e., it satisfies the following condition: λβ ⊢ M = N ⇒ M^p_ρ = N^p_ρ, for all λ-terms M, N and all environments ρ. Then any graph model (D, p) defines a model for the untyped lambda calculus through the reflexive cpo (P(D), ⊆) determined by the continuous (w.r.t. the Scott topology) mappings
F : P(D) → [P(D) → P(D)];    G : [P(D) → P(D)] → P(D),
defined by

F(X)(Y) = {α ∈ D : (∃a ⊆ Y) p(a, α) ∈ X};    G(f) = {p(a, α) : α ∈ f(a), a ∈ D^∗},

where [P(D) → P(D)] denotes the set of continuous selfmaps of P(D). For more details we refer the reader to Berline [7] and to Chapter 5 of Barendregt's book [4].

Given a graph model (D, p), we say that M^p = N^p if, and only if, M^p_ρ = N^p_ρ for all environments ρ. The lambda theory Th(D, p) induced by (D, p) is defined as Th(D, p) = {M = N : M^p = N^p}. It is well known that Th(D, p) is never extensional because (λx.x)^p ≠ (λxy.xy)^p. Given this huge amount of graph models (one for each total pair (D, p)), one naturally asks how many different lambda theories are induced by these models. Kerth has shown in [14] that there exist 2^ℵ₀ graph models with different lambda theories.

A lambda theory T is the minimal lambda theory of the class of graph models if there exists a graph model (D, p) such that T = Th(D, p) and T ⊆ Th(E, i) for all other graph models (E, i).

The completion method for building graph models from "partial pairs" was initiated by Longo in [16] and recently developed on a wide scale by Kerth in [14,15]. This method is useful for building models satisfying prescribed constraints, such as domain equations and inequations, and it is particularly convenient for dealing with the equational theories of graph models.

Definition 2. A partial pair (D, p) is given by an infinite set D and a partial, injective function p : D^∗ × D → D. A partial pair is a graph model if and only if p is total.

We always suppose that no element of D is a pair. This is not restrictive because partial pairs can be considered up to isomorphism.

Definition 3. Let (D, p) be a partial pair. The Engeler completion of (D, p) is the graph model (E, i) defined as follows:

– E = ⋃_{n∈ω} E_n, where E_0 = D and E_{n+1} = E_n ∪ ((E_n^∗ × E_n) − dom(p)).
– Given a ∈ E^∗, α ∈ E,
  i(a, α) = p(a, α) if a ∪ {α} ⊆ D and p(a, α) is defined;
  i(a, α) = (a, α) otherwise.

It is easy to check that the Engeler completion of a given partial pair (D, p) is actually a graph model (a small computational illustration is given at the end of this section). The Engeler completion of a total pair (D, p) is equal to (D, p). A notion of rank can be naturally defined on the Engeler completion (E, i) of a partial pair (D, p). The elements of D are the elements of rank 0, while an element α ∈ E − D has rank n if α ∈ E_n and α ∉ E_{n−1}.

We conclude this preliminary section by remarking that the classic graph models, such as Plotkin and Scott's Pω [4] and Engeler-Plotkin's E_A (with A an arbitrary nonempty set of "atoms") [7], can be viewed as the Engeler completions of suitable partial pairs. In fact, Pω and E_A are respectively isomorphic to the Engeler completions of ({0}, p) (with p(∅, 0) = 0) and (A, ∅).
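To make Definition 3 concrete, the following Python sketch (ours, not from the paper) computes the first few rank levels E₀ ⊆ E₁ ⊆ . . . for a small partial pair; the subset size and the number of levels are truncated, since the actual construction is infinite.

```python
from itertools import combinations

def finite_subsets(xs, max_size):
    """Finite subsets of xs up to max_size, as frozensets (a stand-in for E*)."""
    xs = list(xs)
    return [frozenset(c) for r in range(max_size + 1)
            for c in combinations(xs, r)]

def engeler_levels(D, p, depth, max_size):
    """Rank levels of the Engeler completion of a partial pair (D, p).

    p is a dict mapping pairs (frozenset a, element x) to elements of D
    where defined.  Following Definition 3, the elements of rank n+1 are
    the pairs (a, x) with a in E_n*, x in E_n that are not in dom(p)."""
    E = set(D)
    levels = [set(E)]
    for _ in range(depth):
        E |= {(a, x) for a in finite_subsets(E, max_size) for x in E
              if (a, x) not in p}
        levels.append(set(E))
    i = lambda a, x: p.get((a, x), (a, x))  # the total map i of Definition 3
    return levels, i

# The partial pair ({0}, p) with p(∅, 0) = 0, whose Engeler completion is
# isomorphic to Pω (see the remark above):
levels, i = engeler_levels({0}, {(frozenset(), 0): 0}, depth=2, max_size=1)
```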
3 The Minimal Graph Model

Let I be the set of equations between λ-terms which fail to hold in some graph model. For every equation e ∈ I, we consider a fixed graph model (D_e, i_e) where the equation e fails to hold. Without loss of generality, we may assume that D_{e₁} ∩ D_{e₂} = ∅ for all distinct equations e₁, e₂ ∈ I. We consider the pair (D_I, q_I) defined by:

D_I = ⋃_{e∈I} D_e;    q_I = ⋃_{e∈I} i_e.
This pair fails to be a graph model because the map q_I : D_I^∗ × D_I → D_I is not total (q_I is defined only on the pairs (a, x) such that a ∪ {x} ⊆ D_e for some e ∈ I). Finally, let (E, i) be the Engeler completion of (D_I, q_I). We are going to show that the theory of (E, i) is the intersection of all the theories of graph models, i.e. that:

Theorem 1. The class of graph models admits a minimal lambda theory.

From now on, we focus on one of the (D_e, i_e), in order to show that all the equations between closed lambda terms true in (E, i) are true in (D_e, i_e). The idea is to prove that, for all closed λ-terms M,

M^{i_e} = M^i ∩ D_e.    (1)
This takes a structural induction on M, and hence the analysis of open terms too. Roughly, we are going to show that equation (1) holds for open terms as well, provided that the environments satisfy a suitable closure property introduced below.

Definition 4. Given e ∈ I, we call e-flattening the following function f_e : E → E, defined by induction on the rank of elements of E:

– if rank(x) = 0 then f_e(x) = x;
– if rank(x) = n + 1 and x = ({y₁, ..., y_k}, y) then
  f_e(x) = i_e({f_e(y₁), ..., f_e(y_k)} ∩ D_e, f_e(y)) if f_e(y) ∈ D_e;
  f_e(x) = x otherwise.

For all a ⊆ E, f_e(a) will denote the set {f_e(x) : x ∈ a}. The following easy facts will be useful:

Lemma 1. (a) For all x ∈ E, if f_e(x) ∈ D_e then f_e(x) = x.
(b) If a ⊆ E, z ∈ E and f_e(z) ∈ D_e, then f_e(i(a, z)) = i_e(f_e(a) ∩ D_e, f_e(z)) ∈ D_e.

We notice that Lemma 1(b) holds, a fortiori, if z ∈ D_e.

Definition 5. For a ⊆ E let â = a ∪ f_e(a); we say that a is e-closed if â = a.

Lemma 2. For all a ⊆ E, â ∩ D_e = f_e(a) ∩ D_e.
Proof. By definition, â = a ∪ f_e(a), hence â ∩ D_e = (a ∩ D_e) ∪ (f_e(a) ∩ D_e). Since f_e restricted to D_e is the identity function, we have a ∩ D_e ⊆ f_e(a) ∩ D_e, and we are done.

Definition 6. Let ρ : Var → P(E) be an E-environment. We define the e-restriction ρ_e of ρ by ρ_e(x) = ρ(x) ∩ D_e, while we say that ρ is e-closed if for every variable x, ρ(x) is e-closed.

The following proposition is the key technical lemma of the paper:

Proposition 1. Let M be a λ-term and ρ be an e-closed E-environment; then
(i) M^i_ρ is e-closed.
(ii) M^i_ρ ∩ D_e ⊆ M^{i_e}_{ρ_e}.

Proof. We prove (i) and (ii) simultaneously by induction on the structure of M. If M ≡ x, both statements are trivially true.

Let M ≡ λx.N, and let us start by proving statement (i): given y = i(a, z) ∈ M^i_ρ, we have to show that f_e(y) ∈ M^i_ρ. First we remark that, if rank(y) = 0 or f_e(z) ∉ D_e, then f_e(y) = y and we are done. Hence, let y = i(a, z) and f_e(z) ∈ D_e; we have

y ∈ M^i_ρ
⇒ z ∈ N^i_{ρ[x:=a]}   (by definition of (·)^i)
⇒ z ∈ N^i_{ρ[x:=â]}   (by monotonicity of (·)^i w.r.t. environments)
⇒ f_e(z) ∈ N^i_{ρ[x:=â]}   (by (i); remark that ρ[x := â] is e-closed)
⇒ f_e(z) ∈ N^{i_e}_{(ρ[x:=â])_e}   (by (ii), since f_e(z) ∈ D_e)
⇒ f_e(z) ∈ N^{i_e}_{ρ_e[x:=f_e(a)∩D_e]}   (by Lemma 2)
⇒ i(f_e(a) ∩ D_e, f_e(z)) ∈ M^{i_e}_{ρ_e}   (by definition of (·)^i)
⇒ i_e(f_e(a) ∩ D_e, f_e(z)) ∈ M^{i_e}_{ρ_e}   (by definition of (E, i))
⇒ f_e(y) ∈ M^{i_e}_{ρ_e}   (by definition of f_e)
⇒ f_e(y) ∈ M^i_ρ   (by monotonicity of (·)^i)
Let us prove that M ≡ λx.N satisfies (ii):

y ∈ M^i_ρ ∩ D_e
⇒ (∃a ⊆ D_e)(∃z ∈ D_e) y = i_e(a, z) and z ∈ N^i_{ρ[x:=a]}   (by definition of (·)^i and by y ∈ D_e)
⇒ z ∈ N^{i_e}_{(ρ[x:=a])_e}   (by (ii); remark that â = a)
⇒ z ∈ N^{i_e}_{ρ_e[x:=a]}   (since a ⊆ D_e)
⇒ y ∈ M^{i_e}_{ρ_e}   (by definition of (·)^i)
Let M ≡ PQ.
(i) Let z ∈ (PQ)^i_ρ. If f_e(z) = z we are done; otherwise, by Lemma 1(a), f_e(z) ∈ D_e. Moreover, ∃a ⊆ E such that i(a, z) ∈ P^i_ρ and a ⊆ Q^i_ρ. Applying (i) and Lemma 1(b) to P we get
f_e(i(a, z)) = i_e(f_e(a) ∩ D_e, f_e(z)) = i(f_e(a) ∩ D_e, f_e(z)) ∈ P^i_ρ.

Applying (i) to Q we get f_e(a) ⊆ Q^i_ρ. Hence f_e(z) ∈ M^i_ρ.
(ii) If z ∈ (PQ)^i_ρ ∩ D_e, then ∃a ⊆ E such that i(a, z) ∈ P^i_ρ and a ⊆ Q^i_ρ. Since ρ is e-closed and z ∈ D_e, by (i) and by Lemma 1(b) we get f_e(i(a, z)) = i_e(f_e(a) ∩ D_e, z) ∈ P^i_ρ and f_e(a) ∩ D_e ⊆ Q^i_ρ. Now, by (ii), we obtain i_e(f_e(a) ∩ D_e, z) ∈ P^{i_e}_{ρ_e} and f_e(a) ∩ D_e ⊆ Q^{i_e}_{ρ_e}, and we conclude z ∈ (PQ)^{i_e}_{ρ_e}.

Proposition 2. Let M be a λ-term and ρ : Var → P(D_e) be a D_e-environment; then we have M^i_ρ ∩ D_e = M^{i_e}_ρ.

Proof. We prove by induction on the structure of M that M^i_ρ ∩ D_e ⊆ M^{i_e}_ρ. The converse is ensured by M^{i_e}_ρ ⊆ M^i_ρ and M^{i_e}_ρ ⊆ D_e, both trivially true. If M ≡ x, the statement trivially holds.

Let M ≡ λx.N; if y ∈ M^i_ρ ∩ D_e, then y = i_e(a, z) with a ∪ {z} ⊆ D_e, and z ∈ N^i_{ρ[x:=a]}. By induction hypothesis z ∈ N^{i_e}_{ρ[x:=a]}, and hence i_e(a, z) = y ∈ M^{i_e}_ρ.

Let M ≡ PQ; if z ∈ (PQ)^i_ρ ∩ D_e, then ∃a ⊆ E such that i(a, z) ∈ P^i_ρ and a ⊆ Q^i_ρ. Since ρ is e-closed and z ∈ D_e, we can use Lemma 1(b) and Proposition 1(i) to obtain f_e(i(a, z)) = i_e(f_e(a) ∩ D_e, z) ∈ P^i_ρ. Hence we can use the induction hypothesis to get i_e(f_e(a) ∩ D_e, z) ∈ P^{i_e}_ρ. Moreover, f_e(a) ∩ D_e ⊆ Q^{i_e}_ρ by using again Proposition 1(i) and the induction hypothesis on Q. Hence z ∈ (PQ)^{i_e}_ρ.

Proposition 3. Th(E, i) ⊆ Th(D_e, i_e).

Proof. Let M^i = N^i. By the previous proposition we have M^{i_e} = M^i ∩ D_e = N^i ∩ D_e = N^{i_e}.

Theorem 1 is an immediate corollary of Proposition 3 and of the definition of (E, i).
4 Conclusion
We have shown that the graph models admit a minimal lambda theory Th(E, i). Graph models provide a suitable framework for proving the consistency of extensions of λβ. For instance, for every closed λ-term M there exists a graph model (D_M, i_M) such that ((λx.xx)(λx.xx))^{i_M} = M^{i_M} [3]. Symmetrically, one could use them in order to realise inequalities between non-β-equivalent terms: given M ≠_β N, this can be achieved by finding a graph model (D, i) such that M^i ≠ N^i. We are not yet able to perform this construction in general, but we conjecture that Th(E, i) = λβ.

Another question raised by this work concerns the generality of the notions of e-flattening and e-closure, introduced to prove the minimality of (E, i). Actually, it should be possible to apply our technique to prove that classes of models other than graph models which, informally, are closed under direct products of "pre-models" and free completions, admit a minimal lambda theory.
References

1. Abramsky, S.: Domain theory in logical form. Annals of Pure and Applied Logic 51 (1991) 1–77
2. Alessi, F., Dezani, M., Honsell, F.: Filter models and easy terms. In: ICTCS'01, LNCS 2202, Springer-Verlag (2001) 17–37
3. Baeten, J., Boerboom, B.: Omega can be anything it should not be. Indag. Mathematicae 41 (1979) 111–120
4. Barendregt, H.P.: The lambda calculus: Its syntax and semantics. Revised edition, Studies in Logic and the Foundations of Mathematics 103, North-Holland Publishing Co., Amsterdam (1984)
5. Barendregt, H.P., Coppo, M., Dezani, M.: A filter lambda model and the completeness of type assignment. J. Symbolic Logic 48 (1983) 931–940
6. Bastonero, O., Gouy, X.: Strong stability and the incompleteness of stable models of λ-calculus. Annals of Pure and Applied Logic 100 (1999) 247–277
7. Berline, C.: From computation to foundations via functions and application: The λ-calculus and its webbed models. Theoretical Computer Science 249 (2000) 81–161
8. Berry, G.: Stable models of typed lambda-calculi. Proc. 5th Int. Coll. on Automata, Languages and Programming, LNCS vol. 62, Springer-Verlag (1978)
9. Bucciarelli, A., Ehrhard, T.: Sequentiality and strong stability. Sixth Annual IEEE Symposium on Logic in Computer Science (1991) 138–145
10. Coppo, M., Dezani, M.: An extension of the basic functionality theory for the λ-calculus. Notre Dame J. Formal Logic 21 (1980) 685–693
11. Di Gianantonio, P., Honsell, F.: An abstract notion of application. In: M. Bezem and J.F. Groote, editors, Typed lambda calculi and applications, LNCS 664, Springer-Verlag (1993) 124–138
12. Di Gianantonio, P., Honsell, F., Plotkin, G.D.: Uncountable limits and the lambda calculus. Nordic J. Comput. 2 (1995) 126–145
13. Honsell, F., Ronchi della Rocca, S.: An approximation theorem for topological λ-models and the topological incompleteness of λ-calculus. Journal of Computer and System Sciences 45 (1992) 49–75
14. Kerth, R.: Isomorphism and equational equivalence of continuous lambda models. Studia Logica 61 (1998) 403–415
15. Kerth, R.: On the construction of stable models of λ-calculus. Theoretical Computer Science 269 (2001)
16. Longo, G.: Set-theoretical models of λ-calculus: theories, expansions and isomorphisms. Ann. Pure Applied Logic 24 (1983) 153–188
17. Plotkin, G.D.: Set-theoretical and other elementary models of the λ-calculus. Theoretical Computer Science 121 (1993) 351–409
18. Salibra, A.: Topological incompleteness and order incompleteness of the lambda calculus. ACM Transactions on Computational Logic (2003)
19. Scott, D.S.: Continuous lattices. In: Toposes, Algebraic Geometry and Logic (F.W. Lawvere, ed.), LNM 274, Springer-Verlag (1972) 97–136
Unambiguous Automata on Bi-infinite Words

Olivier Carton

LIAFA, Université Paris 7
[email protected]
http://www.liafa.jussieu.fr/~carton/
Abstract. We consider finite automata accepting bi-infinite words. We introduce unambiguous automata where each accepted word is the label of exactly one accepting path. We show that each rational set of bi-infinite words is accepted by such an automaton. This result is a counterpart of McNaughton’s theorem for bi-infinite words.
1 Introduction
Roughly speaking, an acceptor is unambiguous if each word is accepted in only one way. For instance, a context-free grammar is unambiguous if any word has at most one derivation tree. Similarly, a finite state automaton is generally called unambiguous if any finite word labels at most one accepting path. In this area, an important question is whether any acceptor of some type is equivalent to another one of the same type which is unambiguous. For instance, it is well known that any Turing machine is equivalent to a deterministic one, which is obviously unambiguous [6]. On the contrary, it is not true that any context-free grammar is equivalent to an unambiguous one. Some context-free languages are inherently ambiguous [3, Prop 1.8].

In this paper, we address the problem of ambiguity for automata accepting bi-infinite words. For automata on finite words, any automaton is equivalent to a deterministic one that can be computed using the classical subset construction [10, Thm 2.1]. This gives an equivalent unambiguous automaton. For ω-words, the same result holds but this is already a nontrivial result [8]. Furthermore, Muller automata, which have a more powerful acceptance condition, must be used, and all determinization algorithms for automata over ω-words that have been given so far are complex [11]. Determinization is not the only way to get unambiguous automata on ω-words [1]. In [5], it has been proved that any Büchi automaton is equivalent to an unambiguous one which is actually co-deterministic. The determinization of automata has even been extended to automata on transfinite words by Büchi [4].

In this paper, we show that any automaton on bi-infinite words is equivalent to an unambiguous one. We actually prove two results, a specific one for rational sets that are shift invariant and one for all rational sets. The former one holds for automata with a simple and natural acceptance mode which is only suitable for shift invariant sets. The latter one holds for automata with a more powerful acceptance mode suited to all rational sets. This result is also stronger because
it states the existence of automata that are unambiguous and complete. There was no counterpart of McNaughton's result for automata on bi-infinite words. Our result fills this gap.

Complementation is often proved using unambiguous automata. For instance, proving that the complement of a rational set of finite words is also rational is usually achieved through the computation of an equivalent deterministic and complete automaton (see [10, Cor 2.3]). For ω-words, complementation also becomes easy if a deterministic Muller automaton is provided. In both cases, the key idea for complementation is to use an automaton where each word labels exactly one path that can be accepting. In deterministic automata, this path is of course the unique path starting in the initial state. Our second result provides unambiguous and complete automata that can be used for complementation.

A rational expression is unambiguous if each word is described in a unique way by the expression. This means that all unions appearing in the expression are disjoint unions and that products, stars and ω-iterations are unambiguous. Recall for instance that a product XY is unambiguous if any word z in XY has a unique factorization z = xy where x and y are in X and Y. It is well known that any rational set of finite words is described by an unambiguous rational expression [10, Prop 4.3]. It follows from Arnold's results [1] that the same result holds for rational sets of infinite words. No similar result is known for sets of bi-infinite words, but the unambiguous automata that we have introduced seem to be a step towards such a result.

In symbolic dynamics [7,2], one studies sets of bi-infinite words that are shift invariant and closed for the usual topology. These sets are called subshifts and they are characterized by the factors of their elements. Recall that a factor of a bi-infinite word is a finite word that appears in it. When its set of factors is rational, a subshift is the set of labels of all bi-infinite paths in an automaton without any acceptance condition, and it is called a sofic subshift. Automata without any acceptance condition where each bi-infinite word labels at most one path are called local. They correspond to a strict subclass of sofic subshifts called subshifts of finite type.

The paper is organized as follows. The notion of bi-infinite words is defined in Section 2. Automata accepting these words are introduced in Section 3. The definition of unambiguous automata is given in Section 4. Each of the two main results is stated and proved in a subsection (Subsections 4.1 and 4.2) of this section.
2 Bi-infinite Words
In this section, we recall the notion of bi-infinite words. We actually define two notions of bi-infinite words, namely pointed and unpointed words. The latter one refers to shift invariant sets.

Let A be a finite alphabet whose elements are called letters. A finite word over A is a finite sequence a₁ . . . aₙ of letters. An ω-word x is an infinite sequence x₀x₁x₂ . . . of letters. The sets of finite and ω-words are respectively denoted by A^∗ and A^ω. A pointed bi-infinite word z, or a pointed word for short, is a sequence
. . . z₋₂z₋₁ · z₀z₁z₂ . . . of letters indexed by the set Z of relative integers. A dot is inserted between the letters indexed by −1 and 0 to mark that position. The set of all pointed bi-infinite words is denoted A^Z. It is equipped with the shift σ : A^Z → A^Z defined by σ((z_k)_{k∈Z}) = (z_{k−1})_{k∈Z}. The image by the shift of a pointed word . . . z₋₃z₋₂z₋₁ · z₀z₁z₂ . . . is thus the pointed word . . . z₋₃z₋₂ · z₋₁z₀z₁z₂ . . . where the marked position has been shifted one position to the left.

The shift induces the equivalence relation ∼ on A^Z defined as follows. Two pointed words z and z′ satisfy z ∼ z′ if z′ = σⁿ(z) for some relative integer n. The class of a pointed word z is denoted by σ^Z(z) = {σⁿ(z) | n ∈ Z}. If z is equal to . . . z₋₂z₋₁ · z₀z₁z₂ . . ., its class σ^Z(z) is written . . . z₋₂z₋₁z₀z₁z₂ . . . without any dot. A class σ^Z(z) is called an unpointed bi-infinite word, or an unpointed word for short. The set of all unpointed words is denoted by A^ζ. For a set Z of pointed words, we denote by σ^Z(Z) the set {σ^Z(z) | z ∈ Z} of unpointed words.

Let z = (z_i)_{i∈Z} be the pointed word defined by z_i = a if i is even and z_i = b if i is odd. The class σ^Z(z) of z only contains the two pointed words z = . . . abab · abab . . . = (ab)^ω̃ · (ab)^ω and σ(z) = . . . baba · baba . . . = (ba)^ω̃ · (ba)^ω. This class is the unpointed word . . . ababab . . ., which is denoted (ab)^ζ.

A set Z of pointed words is shift invariant if it satisfies σ(Z) = Z. If Z is shift invariant, each class σ^Z(z) is either contained in Z if z ∈ Z or disjoint from Z if z ∉ Z. The set Z is then equal to the union ⋃_{z∈Z} σ^Z(z) of classes. This means that a shift invariant set Z can be identified with the set σ^Z(Z) of unpointed words.

Let x = x₀x₁x₂ . . . and y = y₀y₁y₂ . . . be two ω-words. We denote by x̃ · y the pointed word z = . . . x₂x₁x₀ · y₀y₁y₂ . . . obtained by concatenating the mirror image of x with y. The corresponding unpointed word . . . x₂x₁x₀y₀y₁y₂ . . . is denoted by x̃y. For two sets X and Y of ω-words, we respectively denote by X̃ · Y and X̃Y the sets {x̃ · y | x ∈ X and y ∈ Y} and {x̃y | x ∈ X and y ∈ Y}. Note that σ^Z(X̃ · Y) = X̃Y.

Let A be the alphabet {a, b} and let X and Y be the sets A^ω and aA^ω of ω-words. The set X̃ · Y = A^ω̃ · aA^ω is the set of pointed words having an a at position 0, whereas X̃Y = A^ω̃aA^ω is the set of unpointed words having at least one occurrence of an a.
2.1 Rational Sets of Bi-infinite Words
A set Z of pointed (respectively unpointed) words is said to be rational if Z is equal to a finite union Z = ⋃_{i=1}^n X̃_i · Y_i (respectively Z = ⋃_{i=1}^n X̃_i Y_i) where each set X_i and Y_i is a rational set of ω-words. For a shift invariant set Z, Z is rational as a set of pointed words if and only if σ^Z(Z) is rational as a set of unpointed words. In the sequel, we need the following result.

Theorem 1. The complement of a rational set of pointed (respectively unpointed) words is also rational.

We refer the reader to [9, chap. 9] for a proof of this result.
3 Automata
In this section, we recall the notion of automata accepting bi-infinite words. These automata are a natural extension of Büchi automata to bi-infinite words. We refer the reader to [9, chap. 9] for a complete introduction to these notions. As we shall see, automata equipped with the most natural acceptance mode accept unpointed words. For pointed words, an additional set of states called middle states is needed to define the accepting paths.

An automaton over the alphabet A is given by a set Q of states, a set E ⊆ Q × A × Q of transitions and by sets I ⊆ Q and F ⊆ Q of initial and final states. For an automaton accepting pointed words, a set M ⊆ Q of middle states is also given. Such an automaton A is denoted (Q, A, E, I, F), or (Q, A, E, I, M, F) in the latter case. Each transition (p, a, q) is denoted by p –a→ q and the letter a is called its label. Generally speaking, a path in A is a sequence of consecutive transitions. As for words, we consider infinite and bi-infinite paths. An infinite (respectively bi-infinite) path is an infinite (respectively bi-infinite) sequence of consecutive transitions. The label of a path is the concatenation of the labels of its transitions. The label of an infinite (respectively bi-infinite) path is thus an infinite (respectively bi-infinite) word over A. More formally, the label of an infinite path

q₀ –a₀→ q₁ –a₁→ q₂ –a₂→ q₃ · · ·

is the ω-word a₀a₁a₂ . . ., and the label of a bi-infinite path γ

· · · q₋₂ –a₋₂→ q₋₁ –a₋₁→ q₀ –a₀→ q₁ –a₁→ q₂ –a₂→ q₃ · · ·

is the pointed bi-infinite word z = (a_k)_{k∈Z}. Note that the pointed bi-infinite word σⁿ(z) is the label of the path σⁿ(γ) obtained by shifting the path. Therefore, we say that the unpointed word σ^Z(z) is also the label of the path γ.

An infinite path is said to be initial if its first state is initial. This definition is of course not relevant to bi-infinite paths, which do not have any first state. A bi-infinite path is said to be initial if at least one initial state occurs infinitely often on the left. More formally, let us define the left limit lim₋∞ γ of a bi-infinite path γ by

lim₋∞ γ = {q | ∀n ∈ N ∃k : k < −n and q_k = q}.
A bi-infinite path γ is initial if lim₋∞ γ ∩ I is non-empty. An infinite or bi-infinite path is final if at least one final state occurs infinitely often on the right. Let us similarly define the right limit of an infinite or bi-infinite path γ by

lim₊∞ γ = {q | ∀n ∈ N ∃k : k > n and q_k = q}.

An infinite or bi-infinite path is final if lim₊∞ γ ∩ F is non-empty. In a Büchi automaton A = (Q, A, E, I, F), an infinite path is accepting if it is both initial and final. This means that its first state is initial and that it
satisfies the well-known Büchi condition [12, p. 136]. An ω-word is accepted by a Büchi automaton if it is the label of an accepting path. In the figures, initial and final states are respectively marked by a small incoming and outgoing arrow.
Fig. 1. An automaton for unpointed words
Example 1. Consider the automaton pictured in Figure 1 as a Büchi automaton. It accepts the set b^∗aA^ω of ω-words having at least one occurrence of an a.

In an automaton A = (Q, A, E, I, F), a bi-infinite path is accepting if it is both initial and final. This means that its left part satisfies the Büchi condition with respect to initial states and that its right part satisfies the Büchi condition with respect to final states. An unpointed word z is accepted by an automaton if it is the label of an accepting path.

Example 2. Consider again the automaton pictured in Figure 1. It accepts the set A^ω̃aA^ω of unpointed words having at least one occurrence of an a.

This acceptance mode is the most natural one for bi-infinite words because it simply generalizes the Büchi acceptance condition used for infinite words. For unpointed words, this mode is suitable since there is a Kleene-like theorem: a set of unpointed words is rational if and only if it is accepted by a finite automaton. For pointed words, this acceptance mode is unsuitable for the following reason. If γ is an accepting path, the path σ(γ) is also accepting. Therefore, if the pointed word z is accepted, the word σ(z) must also be accepted. It turns out that only shift invariant sets can be accepted by automata with this acceptance mode.

In an automaton A = (Q, A, E, I, M, F) with a set M of middle states, a bi-infinite path γ is accepting if it is both initial and final and if the state q₀ at position 0 in γ is a middle state. In the figures, the middle states are marked by a double circle.

Example 3. Consider the automaton pictured in Figure 2. It accepts the set A^ω̃ · aA^ω of pointed words having an a at position 0.
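The acceptance modes just described are easy to state in code for bi-infinite paths of an ultimately periodic shape. In the following Python sketch (our illustration, not from the paper; the representation of a path by a left period u, a middle part v and a right period w is an assumption), lim₋∞ is the set of states of the left period and lim₊∞ the set of states of the right period:

```python
def accepts_unpointed(u, v, w, I, F):
    """Acceptance of a bi-infinite path of shape ...uuu v www..., given by
    lists of states: some initial state must repeat on the left and some
    final state on the right, i.e. lim_{-inf} meets I and lim_{+inf} meets F."""
    return bool(set(u) & I) and bool(set(w) & F)

def accepts_pointed(u, v, w, I, M, F, pos=0):
    """Pointed acceptance additionally requires the state at position 0
    (here assumed to lie inside the middle part v) to be a middle state."""
    return accepts_unpointed(u, v, w, I, F) and v[pos] in M
```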
4 Unambiguous Automata
In this section, we introduce the notion of unambiguous automata on bi-infinite words. We first give the definitions of unambiguity and completeness for these
Fig. 2. An automaton for pointed words
automata. These definitions make sense for both automata on pointed bi-infinite words and automata on unpointed bi-infinite words. However, the results that can be obtained are different. For pointed words, the main result is that any rational set can be accepted by an automaton which is both unambiguous and complete. For unpointed words, it is only possible to accept any rational set by an unambiguous automaton. The automaton cannot be complete in general. This comes intrinsically from the acceptance mode. We begin with the definition and we provide some examples and counterexamples.

Definition 1. An automaton A is ζ-unambiguous (respectively ζ-complete) if any bi-infinite word is the label of at most (respectively at least) one path which is initial and final.

Note first that the initial and final paths of an automaton only depend on its initial and final states. Therefore, this definition can be applied to both automata with or without middle states. Second, an automaton is unambiguous (respectively complete) for pointed words if and only if it is unambiguous (respectively complete) for unpointed words. This is due to the fact that if a bi-infinite word z labels a path γ, the word σ(z) labels the path σ(γ). To illustrate this definition, we provide examples of ζ-ambiguous and ζ-unambiguous automata.
Fig. 3. ζ-ambiguous automaton of Example 4
Example 4. The automaton pictured in Figure 3 accepts the set b^ω̃(a + b)^∗b^ω of unpointed words having finitely many a. This automaton is ζ-ambiguous since the word b^ζ is the label of the two accepting paths 0^ω̃12^ω and 0^ω̃112^ω. The automaton pictured in Figure 4 accepts the same set but is ζ-unambiguous. This automaton is however not ζ-complete since the word a^ζ is not the label of a path.
Fig. 4. ζ-unambiguous automaton of Example 4
From the previous example, it might seem easy to find an unambiguous automaton recognizing a given set of unpointed words. The following example shows that it is not always so.
Fig. 5. ζ-unambiguous automaton of Example 5
Example 5. The set (ab^∗)^ω̃(a + b)^ω + (a + b)^ω̃(b^∗a)^ω of unpointed words having infinitely many a is accepted by the automaton pictured in Figure 5. This automaton is ζ-unambiguous. It is, however, not ζ-complete since the word b^ζ is not the label of a path.
4.1 Pointed Words
In this section, we focus our attention on pointed bi-infinite words. We first show that unambiguous and complete automata behave well with respect to complementation. Then we state and prove the main result for pointed words.

Complementation becomes easy with unambiguous and complete automata on pointed words. It suffices to exchange middle states and non-middle states to get an automaton accepting the complement set.

Proposition 1. Let A = (Q, A, E, I, M, F) be a ζ-unambiguous and ζ-complete automaton accepting a subset Z of bi-infinite pointed words. The automaton (Q, A, E, I, Q \ M, F) accepts the complement of Z. (A one-line sketch of this swap is given below.)

In the previous examples, we have seen that many typical rational sets of unpointed bi-infinite words can be accepted by ζ-unambiguous automata.
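The complementation step of Proposition 1 is literally a one-line transformation; a minimal Python sketch, assuming a dict-based automaton representation {'Q', 'E', 'I', 'M', 'F'} of our own (not from the paper):

```python
def complement(A):
    """Given a ζ-unambiguous and ζ-complete automaton with middle states
    A['M'], swapping middle and non-middle states yields an automaton for
    the complement set (Proposition 1)."""
    return {**A, 'M': A['Q'] - A['M']}
```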
The following theorem states that this is actually true for any rational set of pointed bi-infinite words. This is one of the main results of the paper.

Theorem 2. Any rational set of pointed bi-infinite words is accepted by a ζ-unambiguous and ζ-complete automaton.

The result of McNaughton states that any Büchi automaton can be replaced by an equivalent automaton which is deterministic and thus unambiguous. The previous theorem is thus the counterpart of McNaughton's result for bi-infinite words. The proof of Theorem 2 is based on a similar result concerning automata on ω-words. This result involves the notion of an unambiguous automaton on ω-words that we now define.

Definition 2. An automaton A = (Q, A, E, I, F) is said to be ω-unambiguous (respectively ω-complete) if any ω-word is the label of at most (respectively at least) one final path.

The main difference with the definitions of unambiguity and completeness for automata on bi-infinite words is that final paths are considered instead of paths which are both initial and final. We can now state the analogous result for ω-words.

Theorem 3 ([5]). Any rational set of ω-words is accepted by an ω-unambiguous and ω-complete automaton.

Let A = (Q, A, E, I, F) and A′ = (Q′, A, E′, I′, F′) be two automata on ω-words. The (synchronized) product Ã × A′ is the automaton B defined as follows. The set of states of B is Q × Q′, its sets of initial and final states are respectively F × Q′ and Q × F′, and its set T of transitions is defined by

T = {(p, q) –a→ (p′, q′) | p′ –a→ p ∈ E and q –a→ q′ ∈ E′}.

Note that the transitions of A are used backwards in the automaton Ã × A′. The following lemma allows us to combine unambiguous and complete automata on ω-words to make unambiguous and complete automata on bi-infinite words.

Lemma 1. If A and A′ are two ω-unambiguous and ω-complete automata, the synchronized product Ã × A′ is ζ-unambiguous and ζ-complete.

The main idea of the proof of Theorem 2 is to use the synchronized product of automata on ω-words and Lemma 1 (a short sketch of the construction closes this subsection).
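The synchronized product itself is straightforward to build. A Python sketch under the same dict-based encoding as above (E is a set of (source, letter, target) triples; all names are ours):

```python
def synchronized_product(A, B):
    """The product Ã × B: transitions of A are used backwards, the initial
    states are F_A × Q_B and the final states are Q_A × F_B, exactly as in
    the definition above."""
    T = {((y, q), a, (x, q2))
         for (x, a, y) in A['E']           # x -a-> y in A, traversed backwards
         for (q, b, q2) in B['E'] if a == b}
    return {'Q': {(p, q) for p in A['Q'] for q in B['Q']},
            'E': T,
            'I': {(f, q) for f in A['F'] for q in B['Q']},
            'F': {(p, f) for p in A['Q'] for f in B['F']}}
```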
4.2 Unpointed Words
In this section, we focus our attention on unpointed words.

Theorem 4. Any rational set of unpointed bi-infinite words is accepted by a ζ-unambiguous automaton.
Note that the automaton given by the previous theorem cannot be complete, since it would then accept the full set A^ζ. The result can however be applied to both a set X and its complement X̄ = A^ζ \ X to get two unambiguous automata A_X and A_X̄ recognizing X and X̄. The automaton A = A_X ∪ A_X̄ is still unambiguous and it is also complete. One obtains an unambiguous and complete automaton which is naturally divided into two parts. The unique path labeled by any unpointed bi-infinite word x is in the first part of the automaton if x belongs to X, and in the second part otherwise. If, for instance, the automata pictured in Figures 4 and 5 are joined, the resulting automaton is unambiguous and complete. The construction used in the proof of Theorem 4 is illustrated by the following example.
Fig. 6. Product automaton of Example 6
Example 6. Consider again the automaton of Example 2 pictured in Figure 1. It accepts the set X = A^ω̃aA^ω of bi-infinite words having at least one occurrence of a. The construction described in the proof of Theorem 4 can be used to get an unambiguous automaton recognizing X. The complement of X is the set X̄ = b^ζ. The unambiguous and complete automata A (the figure actually shows the automaton à obtained by reversing the transitions of A) and A′ pictured in Figure 6 accept the sets b^ω and b^ω. The automaton à × A′ is unambiguous and complete. One gets an unambiguous automaton recognizing X by removing the state 02 from it.
References

1. A. Arnold. Rational ω-languages are non-ambiguous. Theoret. Comput. Sci., 26:221–223, 1983.
2. M.-P. Béal and D. Perrin. Symbolic dynamics and finite automata. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 2, chapter 10. Springer-Verlag, 1997.
3. J. Berstel and L. Boasson. Context-free languages. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 2, pages 59–102. Elsevier, 1990.
4. J. R. Büchi. Transfinite automata recursions and weak second order theory of ordinals. In Proc. Int. Congress Logic, Methodology, and Philosophy of Science, Jerusalem 1964, pages 2–23. North Holland, 1964.
5. O. Carton and M. Michel. Unambiguous Büchi automata. Theoret. Comput. Sci., 297:37–81, 2003.
6. J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
7. D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1995.
8. R. McNaughton. Testing and generating infinite sequences by a finite automaton. Inform. Control, 9:521–530, 1966.
9. D. Perrin and J.-É. Pin. Infinite words. To appear, available at http://www.liafa.jussieu.fr/~jep/Resumes/InfiniteWords.html.
10. D. Perrin. Finite automata. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 1, pages 1–57. Elsevier, 1990.
11. S. Safra. On the complexity of ω-automata. In 29th Annual Symposium on Foundations of Computer Science, pages 24–29, 1988.
12. W. Thomas. Automata on infinite objects. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 4, pages 133–191. Elsevier, 1990.
Relating Hierarchy of Temporal Properties to Model Checking

Ivana Černá and Radek Pelánek

Department of Computer Science, Faculty of Informatics, Masaryk University Brno, Czech Republic
{cerna,xpelanek}@fi.muni.cz
Abstract. The hierarchy of properties as overviewed by Manna and Pnueli [18] relates language, topology, ω-automata, and linear temporal logic classifications of properties. We provide new characterisations of this hierarchy in terms of automata with Büchi, co-Büchi, and Streett acceptance conditions and in terms of the Σ_i^LTL and Π_i^LTL hierarchies. Afterwards, we analyse the complexity of the model checking problem for particular classes of the hierarchy and, thanks to the new characterisations, we identify those linear time temporal properties for which the model checking problem can be solved more efficiently than in the general case.
1 Introduction
Model checking has become a popular technique for formal verification of reactive systems. The model checking process has several phases, the major ones being modelling of the system, specification of the desired properties of the system, and the actual process of automatic verification. Each of these phases has its specific difficulties. In this paper we study linear temporal properties and algorithms for the automatic verification of these properties.

Reactive systems maintain an ongoing interaction with their environment and thus produce computations, i.e. infinite sequences of states. When analysing the behaviour of such a system we are interested in some finite set AP of observable propositions about states. Hence, we can view a computation of the system as an infinite word over 2^AP. In general, we define a temporal property as a language of infinite words. A reactive system S is said to have a property P if all possible computations of S belong to P.

The problem of proper and correct specification of the properties the system ought to satisfy has led to a careful study of theoretical aspects of properties. Manna and Pnueli [18] have proposed a classification of temporal properties into a hierarchy. They characterise the classes of the hierarchy through four views: a language-theoretic view, a topological view, a temporal logic view, and an automata view. The fact that the hierarchy can be defined in many different ways shows the robustness of this hierarchy.
⋆ Supported by GA ČR grant no. 201/03/0509
Model checking theory is devoted to the development of efficient algorithms for the automatic verification of properties of reactive systems. A very successful approach to verifying properties expressed as linear temporal logic (LTL) formulas makes use of automata over infinite words. Here the problem of verifying a property reduces to the problem of whether a given automaton recognises a non-empty language (the so-called non-emptiness check). The complexity of the non-emptiness check depends on the type of the automaton. Bloem, Ravi, and Somenzi [1] have studied two specialised types of automata, called weak and terminal, for which the non-emptiness check can be performed more efficiently than in the general case.

Our Contribution. Our aim is to classify temporal properties specifiable by linear temporal logic formulas with respect to the complexity of their verification. To this end we provide a classification of temporal properties through two new views. First, we characterise properties in terms of automata over infinite words (ω-automata) with Büchi, co-Büchi, and Streett acceptance conditions and in terms of weak and terminal automata. Weak and terminal automata are used in the verification process and are checked for non-emptiness. For the second characterisation we introduce a new hierarchy (called the Until-Release hierarchy) of LTL formulas based on the alternation depth of the temporal operators Until and Release. We provide a relationship between the Until-Release hierarchy and the hierarchy of Manna and Pnueli [18].

Our new classification provides us with an exact relationship between the type of a formula and the type of the automaton which is checked for non-emptiness in the model checking process for that formula. In the second part of the paper we enquire into particular automata and analyse the complexity of their non-emptiness check in connection with both explicit and implicit representations of automata. This gives us an exact relationship between types of properties and the complexity of their verification. Finally, we discuss the possibility of exact determination of the type of a formula. Due to space limitations complete proofs (and some formal definitions) are omitted; they can be found in the full version of the paper [3].

Related Work. The previous work on verification which takes into account a classification of properties is partly devoted to the proof-based approach to verification [4]. Papers on specialised model checking algorithms either cover only part of the hierarchy or have a heuristic nature. Vardi and Kupferman [15] study the model checking of safety properties. Schneider [19] is concerned with a translation of persistence properties into weak automata. Bloem and Somenzi study heuristics for the translation of a formula into a weak (terminal) automaton [21] and suggest specialised algorithms for the non-emptiness problem [1]. Our work covers all types of properties and brings out the correspondence between the type of a property and the complexity of its non-emptiness check.
2 Hierarchy of Temporal Properties
The hierarchy studied by Manna and Pnueli [18] classifies properties into six classes: guarantee, safety, obligation, persistence, recurrence, and reactivity properties.

Definition 1 (Language-Theoretic View [18]). Let P ⊆ Σ^ω be a property over Σ.

– P is a safety property if there exists a language of finite words L ⊆ Σ^∗ such that w ∈ P if and only if all finite prefixes of w belong to L.
– P is a guarantee property if there exists a language of finite words L ⊆ Σ^∗ such that w ∈ P if and only if some finite prefix of w belongs to L.
– P is an obligation property if P can be expressed as a positive boolean combination of safety and guarantee properties.
– P is a recurrence property if there exists a language of finite words L ⊆ Σ^∗ such that w ∈ P if and only if infinitely many prefixes of w belong to L.
– P is a persistence property if there exists a language of finite words L ⊆ Σ^∗ such that w ∈ P if and only if all but finitely many prefixes of w belong to L.
– P is a reactivity property if P can be expressed as a positive boolean combination of recurrence and persistence properties.

The first two classes are illustrated by the example below. In what follows, the abbreviation κ-property stands for a property of one of the six above mentioned types. Inclusions, which relate the corresponding classes into a hierarchy, are depicted in Fig. 1. Classes which are higher up strictly contain classes which are lower down.
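For a concrete illustration (ours, not from the paper), take Σ = 2^AP with p ∈ AP: "p holds in every position" is a safety property and "p holds in some position" is a guarantee property, witnessed by the following languages of finite words:

```latex
\[
\begin{array}{ll}
  P = \{w \in \Sigma^\omega \mid \forall i:\, p \in w(i)\}, &
  L = \{u \in \Sigma^* \mid \forall i:\, p \in u(i)\}
      \quad\text{(safety)}\\[4pt]
  P = \{w \in \Sigma^\omega \mid \exists i:\, p \in w(i)\}, &
  L = \Sigma^*\,\{a \in \Sigma \mid p \in a\}\,\Sigma^*
      \quad\text{(guarantee)}
\end{array}
\]
```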
2.1 Automata View
Manna and Pnueli [18] have defined the hierarchy of properties in terms of deterministic Streett predicate automata. Automata for the considered classes of properties differ in restrictions on their transition functions and acceptance conditions. In this section we provide a new characterisation of the hierarchy in terms of deterministic ω-automata which uses only restrictions on acceptance conditions (the transition function is always the same). We find this characterisation more uniform and believe that it provides better insight into the hierarchy. On top of that, we study other widely used types of ω-automata and show that each of them exactly corresponds to one class in the hierarchy.

An ω-automaton is a tuple A = (Σ, Q, q₀, δ, α), where Σ is a finite alphabet, Q is a finite set of states, q₀ ∈ Q is an initial state, δ is a transition function, and α is an acceptance condition. The transition function determines four types of automata: deterministic, nondeterministic, universal, and alternating. A nondeterministic automaton has a transition function of the type δ : Q × Σ → 2^Q. A run π of such an automaton on an infinite word w = w(0)w(1) . . . over Σ is a sequence of states π = r₀, r₁, . . . such that r₀ = q₀ and r_{i+1} ∈ δ(r_i, w(i)) for each i ≥ 0. A nondeterministic automaton accepts a word w if there exists an
accepting run (see below) on w. Universal automata are defined in the same way; the only difference is that a universal automaton accepts a word w if all runs on w are accepting. Deterministic automata are such that |δ(q, a)| = 1 for all q ∈ Q, a ∈ Σ (there is a unique run on each word). Alternating automata form a generalisation of nondeterministic and universal automata.
[Fig. 1 diagram: at the top, reactivity (Streett; general automata: accepting cycle); below it, recurrence (Büchi, Π₂^LTL) and persistence (co-Büchi, Σ₂^LTL; weak automata: fully accepting cycle); below these, obligation (occ. Streett); at the bottom, safety (occ. co-Büchi, Π₁^LTL) and guarantee (occ. Büchi, Σ₁^LTL; terminal automata: reachability).]

Fig. 1. Relations between classes of the hierarchy and their different characterisations. Classes which are higher up properly contain classes which are lower down. Classes on the same level are dual with respect to complementation, while the classes obligation and reactivity can be obtained by boolean combinations of properties from classes lower down.
For a run π we define the infinity set, Inf(π), to be the set of all states that appear infinitely often in π, and the occurrence set, Occ(π), to be the set of states that appear at least once in π. Acceptance conditions α are defined with respect to the infinity set as follows:

– Büchi condition α ⊆ Q: a run π is accepting iff Inf(π) ∩ α ≠ ∅
– co-Büchi condition α ⊆ Q: a run π is accepting iff Inf(π) ∩ α = ∅
– Streett condition α = {⟨G₁, R₁⟩, . . . , ⟨Gₙ, Rₙ⟩}, Gᵢ, Rᵢ ⊆ Q: a run π is accepting iff ∀i : (Inf(π) ∩ Gᵢ ≠ ∅ ⇒ Inf(π) ∩ Rᵢ ≠ ∅)

For every acceptance condition we can define its "occurrence" version [17,23], where Occ(π) substitutes for Inf(π) (also called Staiger-Wagner acceptance). According to the acceptance condition we denote ω-automata as Büchi and occurrence Büchi automata respectively, and so on. The three infinite-run conditions are restated operationally in the sketch below.
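Stated in code (a small Python sketch of ours, with inf the infinity set of a run and alpha the acceptance condition):

```python
def buchi(inf, alpha):
    """Büchi: some state of alpha is visited infinitely often."""
    return bool(inf & alpha)

def co_buchi(inf, alpha):
    """co-Büchi: no state of alpha is visited infinitely often."""
    return not (inf & alpha)

def streett(inf, pairs):
    """Streett: for every pair (G, R), if G is visited infinitely often
    then so is R."""
    return all(not (inf & G) or bool(inf & R) for G, R in pairs)

# Replacing inf by the occurrence set of the run yields the corresponding
# occurrence (Staiger-Wagner) conditions.
```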
Table 1. The expressivity – each of the 24 possible combinations of transition function and acceptance condition corresponds to one of the six hierarchy classes

                   Büchi       co-Büchi     Streett    | Occ. Büchi   Occ. co-Büchi  Occ. Streett
Deterministic      recurrence  persistence  reactivity | guarantee    safety         obligation
Nondeterministic   reactivity  persistence  reactivity | persistence  safety         persistence
Universal          recurrence  reactivity   reactivity | guarantee    recurrence     recurrence
Alternating        reactivity  reactivity   reactivity | persistence  recurrence     reactivity
A property P is defined to be specifiable by automata if there is an ω-automaton A which accepts a word w if and only if w ∈ P.

Theorem 1. Let P be a property specifiable by automata. Then P is a guarantee, safety, obligation, persistence, recurrence, or reactivity property if and only if it is specifiable by a deterministic occurrence Büchi, occurrence co-Büchi, occurrence Streett, co-Büchi, Büchi, or Streett automaton respectively (see Table 1).

Proof. For each class of the hierarchy a κ-automaton is defined in [18] by posing specific restrictions on the transition functions and acceptance conditions of deterministic Streett predicate automata. Using an adjustment of the acceptance conditions and a copy construction, one can effectively transform κ-automata into the above mentioned automata and vice versa. (Details can be found in the full version of the paper [3].)

To make the picture complete we have examined other types of automata as well (see Table 1). For every possible combination of transition function and acceptance condition, the class of specifiable properties exactly coincides with one class of the hierarchy. Results for infinite occurrence acceptance conditions follow from [16]. Universal occurrence Büchi and nondeterministic occurrence co-Büchi automata can be determinised through the power set construction, and thus they recognise the same classes as their deterministic counterparts. The other results for occurrence acceptance conditions follow from [17].
2.2 Linear Temporal Logic View
In this section we characterise the hierarchy of properties through a new hierarchy of LTL formulas based on alternation depth. The set of LTL formulas is defined inductively starting from a countable set AP of atomic propositions, Boolean operators, and the temporal operators X (Next), U (Until) and R (Release):

Ψ := a | ¬Ψ | Ψ ∨ Ψ | Ψ ∧ Ψ | X Ψ | Ψ U Ψ | Ψ R Ψ

LTL formulas are interpreted in the standard way on infinite words over the alphabet 2^AP. A property P is defined to be specifiable by LTL if there is an LTL formula ϕ such that w |= ϕ if and only if w ∈ P.

In recent years, considerable effort has been devoted to the study of LTL hierarchies defined with respect to the number of nested temporal
operators Until, Since, and Next ([10,22,14]). These hierarchies provide interesting characterizations of LTL definable languages. However, they do not seem to have a direct connection to the model checking problem. We propose a new hierarchy which is based on alternation depth instead of nesting depth, and establish its connection with the hierarchy of properties. In the next section we demonstrate that this classification directly reflects the hardness of the verification problem for particular properties.

Let us define the hierarchies Σ_i^LTL and Π_i^LTL, which reflect alternations of the Until and Release operators in formulas. We use the Σ/Π notation since the way the hierarchy is defined strongly resembles the quantifier alternation hierarchy of first-order logic formulas or the fixpoint alternation hierarchy of µ-calculus formulas.

Definition 2. The class Σ₀^LTL = Π₀^LTL is the least set containing all atomic propositions and closed under the application of Boolean and Next operators.
The class Σ_{i+1}^LTL is the least set containing Π_i^LTL and closed under the application of conjunction, disjunction, Next and Until operators.
The class Π_{i+1}^LTL is the least set containing Σ_i^LTL and closed under the application of conjunction, disjunction, Next and Release operators.

The following theorem shows that the type of a property and the alternation depth of its specification are closely related.

Theorem 2. A property that is specifiable by LTL is a guarantee (safety, persistence, recurrence respectively) property if and only if it is specifiable by a formula from the class Σ₁^LTL (Π₁^LTL, Σ₂^LTL, Π₂^LTL respectively) (see Fig. 1).

Proof. The proof makes use of the classification of LTL formulas by Chang, Manna, and Pnueli [4]. There every κ-property is syntactically characterised with the help of a κ-formula. We can transform any guarantee (safety, persistence, recurrence respectively) formula into an equivalent Σ₁^LTL (Π₁^LTL, Σ₂^LTL, Π₂^LTL respectively) formula.

Theorem 3. A property is specifiable by LTL if and only if it is specifiable by a positive boolean combination of Σ₂^LTL and Π₂^LTL formulas. Therefore both the Σ_i^LTL and Π_i^LTL hierarchies collapse, in the sense that every LTL formula is specifiable both by a Σ₃^LTL and by a Π₃^LTL formula.
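As an illustration (ours, not from the paper), writing the derived operators as Fϕ = ⊤ U ϕ and Gϕ = ⊥ R ϕ, the classical specification patterns land exactly where Theorem 2 predicts:

```latex
\begin{align*}
  F\,p    &= \top \,U\, p              \in \Sigma_1^{LTL} && \text{(guarantee)}\\
  G\,p    &= \bot \,R\, p              \in \Pi_1^{LTL}    && \text{(safety)}\\
  F\,G\,p &= \top \,U\, (\bot \,R\, p) \in \Sigma_2^{LTL} && \text{(persistence)}\\
  G\,F\,p &= \bot \,R\, (\top \,U\, p) \in \Pi_2^{LTL}    && \text{(recurrence)}
\end{align*}
```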
3 Model Checking and Hierarchy of Properties
The model checking problem is to determine, for a given reactive system K and a temporal formula ϕ, whether the system satisfies the formula. A common approach to model checking of finite state systems and LTL formulas is to construct an automaton A¬ϕ for the negation of the property and to model the system
as an automaton K. The product automaton K × A_¬ϕ is then checked for non-emptiness. The product automaton is a nondeterministic Büchi automaton. For the formal definition of the problem and a detailed description of the algorithm we refer to [5]. Our aim is to analyse the complexity of the non-emptiness check depending on the type of the verified property. As the complexity of the non-emptiness check is determined by attributes of an automaton, the question is whether for different types of formulas one can construct different types of automata. We give a comprehensive answer to this question in this section. In the next section we demonstrate how the complexity of the non-emptiness check varies depending on the type of automata. To classify nondeterministic Büchi automata we adopt the criteria proposed by Bloem, Ravi, and Somenzi [1]. They differentiate general, weak, and terminal automata according to the following restrictions posed on their transition functions:
- general: no restrictions
- weak: there exists a partition of the set Q into components Q_i and an ordering ≤ on these components, such that for each q ∈ Q_i, p ∈ Q_j, if ∃a ∈ Σ : q ∈ δ(p, a) then Q_i ≤ Q_j. Moreover, for each Q_i, either Q_i ∩ α = ∅, in which case Q_i is a rejecting component, or Q_i ⊆ α, in which case Q_i is an accepting component.
- terminal: for each q ∈ α, a ∈ Σ it holds that δ(q, a) ≠ ∅ and δ(q, a) ⊆ α.
Each transition of a weak automaton leads to a state in either the same or a lower component. Consequently each run of a weak automaton eventually gets trapped within one component. The run is accepting iff this component is accepting. The transition function of a terminal automaton is even more restricted – once a run of a terminal automaton reaches an accepting state the run is accepting regardless of the suffix. Terminal and weak automata are jointly called specialised automata; a procedure for recognising them is sketched after the proof of Theorem 5 below. It turns out that the classes of properties specifiable by weak and terminal automata coincide with classes of the hierarchy. Theorem 4. A property P specifiable by automata is a guarantee (persistence) property if and only if it is specifiable by a terminal (weak) automaton. Theorem 4 raises a natural question whether and how effectively one can construct for a given guarantee (persistence) formula the corresponding terminal (weak) automaton. A construction of an automaton for an LTL formula was first proposed by Wolper, Vardi, and Sistla [24]. This basic construction has been improved in several papers ([12,21,9]) where various heuristics have been used to produce an automaton as small and as “weak” as possible. Although these heuristics are quite sophisticated, they do not provide any insight into the relation between the formula and the “weakness” of the resulting automaton. Constructions for special types of properties can be found in [19,15]. We present a new modification of the original construction which, for a formula from the class Σ_1^LTL or Σ_2^LTL, yields a specialised automaton. Theorem 5. For every Σ_1^LTL (Σ_2^LTL) formula ϕ we can construct a terminal (weak) automaton accepting the property defined by ϕ.
Proof. States of the automaton are sets of subformulas of the formula ϕ. The transition function is constructed in such a way that the following invariant is valid: if the automaton is in a state S then the remaining suffix of the word should satisfy all formulas in S. The acceptance condition is used to enforce the fulfillment of Until operators. For Σ_1^LTL and Σ_2^LTL formulas the acceptance condition can be simplified thanks to the special structure of alternation of Until and Release operators in the formula.
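The weak and terminal restrictions above can be tested directly on a given automaton. The following sketch is ours, not from [1]: it checks the terminal condition literally, and checks weakness via strongly connected components, since an automaton is weak in the above sense exactly when every SCC lies entirely inside or entirely outside α (listing the SCCs in reverse topological order then yields the required ordered partition).

# Our sketch: classify a nondeterministic Buchi automaton. delta maps
# (state, letter) to a set of successor states; alpha is the accepting set.

def is_terminal(states, alphabet, delta, alpha):
    # for each q in alpha, a in the alphabet: delta(q, a) is nonempty
    # and contained in alpha
    return all(delta.get((q, a)) and delta[(q, a)] <= alpha
               for q in alpha for a in alphabet)

def strongly_connected_components(states, succ):
    # Tarjan's algorithm (recursive; adequate for a sketch)
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)
    for v in states:
        if v not in index:
            visit(v)
    return comps

def is_weak(states, alphabet, delta, alpha):
    succ = {q: set().union(*(delta.get((q, a), set()) for a in alphabet))
            for q in states}
    # weak iff every SCC is entirely accepting or entirely rejecting
    return all(c <= alpha or not (c & alpha)
               for c in strongly_connected_components(states, succ))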
4
Non-emptiness Algorithms
In the previous section we showed that we can effectively construct specialised automata for formulas from the lower classes of the hierarchy. Since the verified system K can be modelled as an automaton without acceptance conditions, the type of the product automaton is determined entirely by the type of the automaton A_¬ϕ; that is, even the product automaton is specialised. In this section we review both explicit and symbolic non-emptiness algorithms for the different types of automata.
General Automata. For general automata the non-emptiness check is equivalent to the reachability of an accepting cycle (i.e., a cycle containing an accepting state). The most efficient explicit algorithm is the nested depth-first search (DFS) algorithm [6,13]; a sketch of the idea appears at the end of this section. With the symbolic representation one has to use a nested fixpoint computation (e.g., the Emerson-Lei algorithm) with a quadratic number of symbolic steps (for an overview of symbolic algorithms see [11]).
Weak Automata. States of a weak automaton are partitioned into components and therefore the states on each cycle are either all accepting (the cycle is fully accepting) or all non-accepting. The non-emptiness problem is equivalent to the reachability of a fully accepting cycle. The explicit algorithm needs only a single DFS [8] in this case. With the symbolic representation a single fixpoint computation [1] with a linear number of steps is sufficient.
Terminal Automata. Once a terminal automaton reaches an accepting state, it accepts the whole word. Thus the non-emptiness of a terminal automaton can be decided by a simple reachability analysis. With the symbolic representation there is even an asymptotic difference between the algorithms for the general and specialised cases. All explicit algorithms have linear time complexity, but the use of specialised algorithms still brings several benefits. Time and space optimisations, “guided search” heuristics [8], and the partial-order reduction [13] can be employed more directly for specialised algorithms. Algorithms for specialised automata can be more effectively transformed into distributed ones [2]. These benefits have already been demonstrated experimentally. Edelkamp, Lafuente, and Leue [8] extended the explicit model checker SPIN by a non-emptiness algorithm which, to a certain extent, takes the type of the automaton into consideration. Bloem, Ravi, and Somenzi [1] performed experiments with symbolic algorithms, and in [2] experiments with distributed algorithms are presented.
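A minimal sketch of the nested DFS idea of [6] follows (ours; production implementations such as the one in SPIN interleave the two searches on the fly and add the improvements of [13]). The outer search explores the automaton and, from each accepting state in postorder, an inner search looks for a cycle back to that state; the crucial point is that states visited by some inner search are never revisited by later inner searches, which keeps the whole check linear.

# Our sketch of nested depth-first search for accepting-cycle detection.
# succ(q) yields the successors of q; accepting(q) tests membership in alpha.

def has_accepting_cycle(initial, succ, accepting):
    outer_visited, inner_visited = set(), set()

    def inner(seed, q):
        for r in succ(q):
            if r == seed:                 # closed a cycle through seed
                return True
            if r not in inner_visited:
                inner_visited.add(r)
                if inner(seed, r):
                    return True
        return False

    def outer(q):
        outer_visited.add(q)
        for r in succ(q):
            if r not in outer_visited and outer(r):
                return True
        # postorder: all successors are fully explored before the inner search
        return accepting(q) and inner(q, q)

    return outer(initial)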
5
Conclusions
The paper provides a new classification of temporal properties through deterministic ω-automata and through the Until-Release hierarchy. It provides an effective transformation of Σ_1^LTL (Σ_2^LTL) formulas into terminal (weak) automata and it argues that the non-emptiness problem for these automata can be solved more efficiently. It is decidable whether a given formula specifies a property of type κ [3]. In case it is a guarantee (persistence) formula, it is possible to transform it into an equivalent Σ_1^LTL (Σ_2^LTL) formula. Thus the new classifications provide us with an exact relationship between the type of a formula and the type of the corresponding non-emptiness problem. The determination of the type of a formula and the transformation are rather expensive (even deciding whether a given formula specifies a safety property is PSPACE-complete [20]). However, formulas are usually quite short and it is typical to run many tests for one fixed formula. In such a case, the work needed for determining the type of the formula is amortised over its verification. Moreover, most of the practically used formulas are simple. We have studied the Specification Patterns System [7], a collection of the most often verified properties. It turns out that most of the properties can be easily transformed into terminal (41%) or weak (54%) automata. We conclude that model checkers should take the type of the property into account and use the specialised non-emptiness algorithms as often as possible.
References
1. R. Bloem, K. Ravi, and F. Somenzi. Efficient decision procedures for model checking of linear time logic properties. In Proc. Computer Aided Verification, volume 1633 of LNCS, pages 222–235. Springer, 1999.
2. I. Černá and R. Pelánek. Distributed explicit fair cycle detection. In Proc. SPIN Workshop on Model Checking of Software, volume 2648 of LNCS, pages 49–73. Springer, 2003.
3. I. Černá and R. Pelánek. Relating hierarchy of linear temporal properties to model checking. Technical Report FIMU-RS-2003-03, Faculty of Informatics, Masaryk University, 2003. http://www.fi.muni.cz/informatics/reports/.
4. E. Y. Chang, Z. Manna, and A. Pnueli. Characterization of temporal property classes. In Proc. Automata, Languages and Programming, volume 623 of LNCS, pages 474–486. Springer, 1992.
5. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. The MIT Press, 1999.
6. C. Courcoubetis, M. Vardi, P. Wolper, and M. Yannakakis. Memory-efficient algorithms for the verification of temporal properties. Formal Methods in System Design, 1:275–288, 1992.
7. M. B. Dwyer, G. S. Avrunin, and J. C. Corbett. Property specification patterns for finite-state verification. In Proc. Workshop on Formal Methods in Software Practice, pages 7–15. ACM Press, 1998.
8. S. Edelkamp, A. L. Lafuente, and S. Leue. Directed explicit model checking with HSF-SPIN. In Proc. SPIN Workshop on Model Checking of Software, volume 2057 of LNCS, pages 57–79. Springer, 2001.
9. K. Etessami and G. J. Holzmann. Optimizing Büchi automata. In Proc. CONCUR, volume 1877 of LNCS, pages 153–167. Springer, 2000.
10. K. Etessami and T. Wilke. An Until hierarchy for temporal logic. In Proc. IEEE Symposium on Logic in Computer Science, pages 108–117. Computer Society Press, 1996.
11. K. Fisler, R. Fraer, G. Kamhi, M. Y. Vardi, and Z. Yang. Is there a best symbolic cycle-detection algorithm? In Proc. Tools and Algorithms for Construction and Analysis of Systems, volume 2031 of LNCS, pages 420–434. Springer, 2001.
12. R. Gerth, D. Peled, M. Y. Vardi, and P. Wolper. Simple on-the-fly automatic verification of linear temporal logic. In Proc. Protocol Specification Testing and Verification, pages 3–18. Chapman & Hall, 1995.
13. G. J. Holzmann, D. Peled, and M. Yannakakis. On nested depth first search. In Proc. SPIN Workshop, pages 23–32. American Mathematical Society, 1996.
14. A. Kučera and J. Strejček. The stuttering principle revisited: On the expressiveness of nested X and U operators in the logic LTL. In Proc. Computer Science Logic, volume 2471 of LNCS, pages 276–291. Springer, 2002.
15. O. Kupferman and M. Y. Vardi. Model checking of safety properties. Formal Methods in System Design, 19(3):291–314, 2001.
16. C. Löding. Methods for the transformation of omega-automata: Complexity and connection to second order logic. Master’s thesis, Christian-Albrechts-University of Kiel, 1998.
17. C. Löding and W. Thomas. Alternating automata and logics over infinite words. In Proc. IFIP International Conference on Theoretical Computer Science, volume 1872 of LNCS, pages 521–535. Springer, 2000.
18. Z. Manna and A. Pnueli. A hierarchy of temporal properties. In Proc. ACM Symposium on Principles of Distributed Computing, pages 377–410. ACM Press, 1990.
19. K. Schneider. Improving automata generation for linear temporal logic by considering the automaton hierarchy. In Proc. Logic for Programming, Artificial Intelligence, and Reasoning, volume 2250 of LNCS, pages 39–54. Springer, 2001.
20. A. P. Sistla. Safety, liveness, and fairness in temporal logic. Formal Aspects of Computing, 6(5):495–512, 1994.
21. F. Somenzi and R. Bloem. Efficient Büchi automata from LTL formulae. In Proc. Computer Aided Verification, volume 1855 of LNCS, pages 248–263. Springer, 2000.
22. D. Thérien and T. Wilke. Nesting Until and Since in linear temporal logic. In Proc. Symposium on Theoretical Aspects of Computer Science, volume 2285 of LNCS, pages 455–464. Springer, 2002.
23. W. Thomas. Languages, automata and logic. In Handbook of Formal Languages, volume 3, pages 389–455. Springer, 1997.
24. P. Wolper, M. Y. Vardi, and A. P. Sistla. Reasoning about infinite computation paths. In Proc. Symp. on Foundations of Computer Science, pages 185–194, Tucson, 1983.
Arithmetic Constant-Depth Circuit Complexity Classes

Hubie Chen

Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
[email protected] Abstract. The boolean circuit complexity classes AC^0 ⊆ AC^0[m] ⊆ TC^0 ⊆ NC^1 have been studied intensely. Other than NC^1, they are defined by constant-depth circuits of polynomial size and unbounded fan-in over some set of allowed gates. One reason for interest in these classes is that they contain the boundary marking the limits of current lower bound technology: such technology exists for AC^0 and some of the classes AC^0[m], while the other classes AC^0[m] as well as TC^0 lack such technology. Continuing a line of research originating from Valiant’s work on the counting class #P, the arithmetic circuit complexity classes #AC^0 and #NC^1 have recently been studied. In this paper, we define and investigate the classes #AC^0[m] and #TC^0, new arithmetic circuit complexity classes that are defined by constant-depth circuits and are analogues of the classes AC^0[m] and TC^0.
1
Introduction
The study of counting complexity was initiated by Valiant’s work on #P, the class of functions mapping a string x to the number of accepting paths of x on an NP-machine [11]. The class #L, defined similarly but with NL-machines, has also been studied [13,10,6,5]. Both #P and #L can be obtained by “arithmetizing” boolean circuit characterizations of NP and NL given in [12]. To arithmetize a boolean circuit, we propagate all NOT gates to the input level, convert OR gates to addition (+) gates, and convert AND gates to multiplication (∗) gates. Viewing the inputs to the circuit as taking on the values 0, 1 from the natural numbers, we obtain circuits which map naturally from the binary strings {0, 1}^∗ to the natural numbers. More recently, the arithmetic classes #AC^0, #BWBP, #NC^1, and #SAC^1 have been defined and studied [1,7,3,8,4,13]. Other than #BWBP, these classes are arithmetic versions of boolean classes typically defined by circuits, and arise from arithmetizing the corresponding boolean circuits. These classes obey the inclusion chain #AC^0 ⊊ #BWBP ⊆ #NC^1 ⊆ #SAC^1, which essentially mirrors the known relationships AC^0 ⊊ BWBP = NC^1 ⊆ SAC^1 of boolean classes. Lying in between the boolean classes AC^0 and NC^1 are a hierarchy of classes AC^0[m] and the class TC^0, which have been studied extensively. (For any m,
we have the inclusions AC^0 ⊆ AC^0[m] ⊆ TC^0 ⊆ NC^1.) Not only have these classes given insight into the structure of NC^1, but the class TC^0 captures the complexity of natural problems such as multiplication and division, while the AC^0[m] hierarchy is particularly interesting since it contains the boundary marking the limits of current lower bounds technology. In this paper, we introduce the classes #AC^0[m] and #TC^0, arithmetic versions of the boolean classes AC^0[m] and TC^0. Just as AC^0[m] and TC^0 give a refined view of NC^1, our new arithmetic classes refine #NC^1. Shadowing their boolean counterparts, these classes fall into the inclusion chain #AC^0 ⊆ #AC^0[m] ⊆ #TC^0 ⊆ #BWBP ⊆ #NC^1. Both the original boolean classes and the new arithmetic classes are defined by constant-depth circuits, which in this paper are always of unbounded fan-in and polynomial size. The class AC^0[m] (respectively TC^0) consists of those functions computable by constant-depth circuits with AND, OR, and MOD m (respectively MAJORITY) gates. In order to define the classes #AC^0[m] and #TC^0, we introduce arithmetic extensions of the functions MOD m and MAJORITY. Then, we arithmetize AC^0[m] and TC^0 as above, but in addition convert MOD m and MAJORITY gates into their arithmetic extensions. While #TC^0 is shown to be equal to its boolean analogue FTC^0, our definition of #AC^0[m] begets additional complexity classes. By defining DiffAC^0[m] to consist of those functions equal to the difference of two #AC^0[m] functions, we obtain another hierarchy of classes, which includes the already studied DiffAC^0 [1,7,3] at the bottom. Moreover, we define two generic operators on arithmetic function classes; acting on the classes DiffAC^0[m] and #AC^0[m] with these operators gives us new language classes. This paper focuses on studying the structure of three hierarchies: the #AC^0[m] hierarchy, the DiffAC^0[m] hierarchy, and a hierarchy of language classes. The hierarchy of language classes includes the classes AC^0[m] and the classes induced by applying the mentioned operators to the classes #AC^0[m] and DiffAC^0[m]. We prove class separations and containments where possible. Although making unconditional statements about these classes would in many cases require new lower bounds, it is often possible to show that a question in one hierarchy is equivalent to a question in another. For instance, we prove that the hierarchy of the classes #AC^0[m] has exactly the same structure as the hierarchy of the classes AC^0[m]: #AC^0[m] ⊆ #AC^0[m'] if and only if AC^0[m] ⊆ AC^0[m']. We also investigate closure properties of the classes #AC^0[m] and DiffAC^0[m]. The closure properties proved here generalize those appearing in previous work [1,7,3]. One reason why this plethora of new classes is interesting is that it offers rephrasings of open questions. Not only does the #AC^0[m] hierarchy have the same structure as the AC^0[m] hierarchy, but the classes #AC^0[m] seem to give an alternate decomposition^1 of #TC^0. As a result, any question regarding the structure of the boolean AC^0[m] hierarchy can be rephrased as a question concerning arithmetic classes, offering a new line of attack on such questions. As mentioned, the AC^0[m] hierarchy is particularly important because it contains
1 Note that by Theorem 7, FAC^0[m] is properly contained in #AC^0[m], assuming that AC^0[m] ≠ TC^0.
both classes for which we have lower bounds technology, and classes for which we do not: such technology exists for the class AC^0[m] when m is a prime power, but not when m is a composite with two or more distinct prime factors [9]. Note that ours are not the first results providing an interface between boolean and arithmetic circuit complexity in the constant-depth setting: an intriguing result obtained by Agrawal et al. is that deciding whether or not two #AC^0 circuits are equal characterizes exactly TC^0 [1]. Another reason to be interested in the classes introduced here is that they provide refinements of open questions, which may be more tractable than the original questions. For instance, for any odd positive integer m, new language classes sitting in between AC^0[m] and AC^0[2m] are induced by our arithmetic classes. These new classes offer us the ability to “interpolate” between existing classes. In particular, when m is an odd prime, our new classes sit in between a class (AC^0[m]) for which we have lower bound technology, and a class (AC^0[2m]) for which we do not. Thus, there is the natural question of whether or not one can prove lower bounds using one of these new classes. Establishing lower bounds technology for one of these new classes is necessarily no more difficult than doing so for AC^0[2m], since the new classes are contained in AC^0[2m]. Studying these “refined” questions is not only independently interesting, but may give insight into the original questions. The contents of this paper are as follows. In Section 2, we define the complexity classes to be studied, as well as the operators on arithmetic classes. In Section 3, we study the language classes induced by the arithmetic classes. Section 4 contains a normal form theorem, which essentially states that our arithmetic circuits need only use the arithmetic MOD and MAJORITY gates on 0-1 valued inputs. This theorem in turn allows us to show that the #AC^0[m] hierarchy is isomorphic to the AC^0[m] hierarchy. Section 5 studies the classes DiffAC^0[m]; focus is given to the question of whether or not DiffAC^0[m] = DiffAC^0[2m]. (This equality is unconditionally true when m = 1.) Aided by the notion of normal form, we derive a number of closure properties in Section 6. For more background on circuit complexity, we refer the reader to the book [14], which contains a chapter on arithmetic circuit complexity; the survey [2] is also a good source of information on arithmetic circuit complexity.
2
Preliminaries
We let N denote the set of natural numbers, {0, 1, 2, . . .}; and, we let N^+ denote the set of positive integers, {1, 2, 3, . . .}. The complexity classes that we study in this paper are defined by constant-depth circuits; what varies among the definitions of the classes are the types of gates allowed. We will instantiate the following definition with different bases of gates to define our classes. Definition 1. We say that a function is computable by AC^0 circuits over the basis B if the function can be computed by a family of constant-depth, polynomial-size circuits of unbounded fan-in with gates from B and inputs from {0, 1, x_i, ¬x_i}. (By the size of a circuit, we mean the number of gates plus the number of wires.)
2.1
Boolean Classes
The boolean functions and classes in the next two definitions have been studied in past work. Definition 2. We define the following boolean functions:
– MOD m on inputs x_1, . . . , x_k takes on the value 1 if the number of x_i’s that are nonzero is a multiple of m; and the value 0 otherwise.
– MAJORITY on inputs x_1, . . . , x_k takes on the value 1 if the number of x_i’s that are nonzero is strictly greater than k/2; and the value 0 otherwise.
Definition 3. We define the following classes of boolean functions^2:
– AC^0 (FAC^0) is the class of functions computable by AC^0 circuits over the basis of boolean functions {AND, OR} with exactly one output gate (one or more output gates).
– AC^0[m_1, . . . , m_l] (FAC^0[m_1, . . . , m_l]) is the class of functions computable by AC^0 circuits over the basis of boolean functions {AND, OR, MOD m_1, . . . , MOD m_l} with exactly one output gate (one or more output gates). (This definition is for all {m_1, . . . , m_l} ⊆ N^+.)
– TC^0 (FTC^0) is the class of functions computable by AC^0 circuits over the basis of boolean functions {AND, OR, MAJORITY} with exactly one output gate (one or more output gates).
We note a lower bound due to Smolensky. Theorem 1. [9] If p and q are distinct primes, then MOD q ∉ AC^0[p]. This separates the classes AC^0[q] and AC^0[p] (for p, q distinct primes), and will allow us to derive separations of some of the classes which we introduce.
2.2
Arithmetic Classes
We give arithmetic versions of the definitions of the functions MOD m and MAJORITY. Definition 4. Define η : N → N so that η(x) is 1 if x = 0, and equal to x otherwise. We define the following arithmetic functions:
– AMOD m on inputs x_1, . . . , x_k takes on the value ∏_{i=1}^k η(x_i) if the number of x_i’s that are nonzero is a multiple of m; and the value 0 otherwise.
2 Although these classes are often defined so that NOT gates are allowed anywhere in the corresponding circuits, it is an easy exercise to show that such definitions are equivalent to ours. Our definitions will make proving various properties more convenient.
332
Hubie Chen
– AMAJORITY on inputs x_1, . . . , x_k takes on the value ∏_{i=1}^k η(x_i) if the number of x_i’s that are nonzero is strictly greater than k/2; and the value 0 otherwise.
Notice that these arithmetic functions coincide with their boolean counterparts on 0-1 valued inputs, typing issues aside^3. With these new functions in hand, we can now define the arithmetic complexity classes to be studied in this paper. The definition is parallel to Definition 3. Definition 5. We define the following classes of functions from {0, 1}^∗ to N; the functions +, ∗ denote the usual arithmetic sum and product in N, and the input gates are interpreted as the values 0, 1 in N.
– #AC^0 is the class of functions computable by AC^0 circuits over the basis {+, ∗}.
– #AC^0[m_1, . . . , m_l] is the class of functions computable by AC^0 circuits over the basis {+, ∗, AMOD m_1, . . . , AMOD m_l}. (This definition is for all {m_1, . . . , m_l} ⊆ N^+.)
– #TC^0 is the class of functions computable by AC^0 circuits over the basis {+, ∗, AMAJORITY}.
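A direct transcription of Definitions 2 and 4 (our sketch, not from the paper) makes the remark above concrete: on 0-1 inputs every η-value is 1, so the product collapses to 1 and the arithmetic gates coincide with the boolean ones.

from math import prod

def eta(x):                 # Definition 4: eta(0) = 1, eta(x) = x otherwise
    return 1 if x == 0 else x

def mod_m(m, xs):           # boolean MOD m (Definition 2)
    return 1 if sum(x != 0 for x in xs) % m == 0 else 0

def majority(xs):           # boolean MAJORITY (Definition 2)
    return 1 if sum(x != 0 for x in xs) > len(xs) / 2 else 0

def amod_m(m, xs):          # arithmetic AMOD m (Definition 4)
    return prod(map(eta, xs)) if sum(x != 0 for x in xs) % m == 0 else 0

def amajority(xs):          # arithmetic AMAJORITY (Definition 4)
    return prod(map(eta, xs)) if sum(x != 0 for x in xs) > len(xs) / 2 else 0

# agreement with the boolean gates on 0-1 inputs:
for xs in [(0, 1, 1), (1, 1, 0, 1), (0, 0)]:
    assert amod_m(2, xs) == mod_m(2, xs) and amajority(xs) == majority(xs)

2.3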
Operators on Arithmetic Classes
By way of some generic operators defined on arithmetic classes, we will obtain yet more complexity classes. The class DiffAC^0 was studied in [1,7,3]; we generalize it here. Definition 6. We define the following classes of functions from {0, 1}^∗ to Z:
– DiffAC^0 is the class of functions expressible as the difference of two #AC^0 functions. (That is, DiffAC^0 = {f − g : f, g ∈ #AC^0}.)
– DiffAC^0[m_1, . . . , m_l] is the class of functions expressible as the difference of two #AC^0[m_1, . . . , m_l] functions.
The following two operators allow us to obtain language classes from arithmetic classes. Definition 7. Suppose that C is a class of functions from {0, 1}^∗ to Z. Define χC to be the class of languages with characteristic function in C. Define LowOrd C to be the class of languages with characteristic function equal to the low order bit of a function in C (i.e., the value of a function in C modulo two).
3 In this paper, we will generally ignore such typing issues, and associate the boolean values 0, 1 with the natural numbers 0, 1.
2.4
Unambiguous Circuits
We now observe a basic fact: our arithmetic classes are at least as powerful as their boolean analogues^4. Lemma 1. FAC^0 ⊆ #AC^0, FAC^0[m_1, . . . , m_l] ⊆ #AC^0[m_1, . . . , m_l] (for all m_1, . . . , m_l ∈ N), and FTC^0 = #TC^0.
3
Language Classes
In this section, we study the language classes that result from allowing the operators χ and LowOrd to act on the arithmetic classes. We first show that the characteristic functions in each of the function classes #AC^0, #AC^0[m_1, . . . , m_l], #TC^0 are exactly the corresponding boolean language classes^5. Lemma 2. AC^0 = χ#AC^0, AC^0[m_1, . . . , m_l] = χ#AC^0[m_1, . . . , m_l], and TC^0 = χ#TC^0. This lemma immediately allows us to derive separations of the arithmetic classes. Theorem 2. For all primes p, #AC^0 ⊊ #AC^0[p]. For all distinct primes p, q, #AC^0[p] \ #AC^0[q] is nonempty. The other language classes we get by operating on #AC^0[m] and DiffAC^0[m] fall into a chain of inclusions bounded below and above by AC^0[m] and AC^0[2, m], respectively. Theorem 3. For all odd positive integers m, we have AC^0[m] ⊆ χDiffAC^0[m] ⊆ LowOrd DiffAC^0[m] = LowOrd #AC^0[m] ⊆ AC^0[2, m]. If m is an even positive integer, all of the classes coincide.
4
Arithmetic Classes
We now study the relationships of the arithmetic classes to each other, and to the boolean classes. We begin by proving that every function from an arithmetic class can be computed by circuits in normal form: circuits where the inputs (and hence outputs) of the AMOD (or AMAJORITY) gates always have 0-1 values. This notion of normal form will allow us to derive many facts concerning the structure of the classes #AC^0[m], and will also aid us in proving closure properties. Definition 8. Let us say that a #AC^0[m_1, . . . , m_l] (#TC^0) circuit is in normal form if on all inputs x, all AMOD (AMAJORITY) gates appearing in the circuit receive as inputs only the values 0 and 1. We say that a #AC^0[m_1, . . . , m_l] (#TC^0) circuit family is in normal form if all of its circuits are in normal form.
4 Note that we view the classes FAC^0, FAC^0[m_1, . . . , m_l], and FTC^0 as classes of functions from {0, 1}^∗ to N, in order to compare them with the corresponding arithmetic classes. To do this, we view a string of bits as a natural number in the usual way: the string y_n . . . y_0 represents the natural number ∑_{i=0}^n 2^i y_i.
5 By associating languages with their characteristic functions, we view the classes AC^0, AC^0[m_1, . . . , m_l], and TC^0 as language classes.
Theorem 4. For every function f in #AC^0[m_1, . . . , m_l] (#TC^0), there is a #AC^0[m_1, . . . , m_l] (#TC^0) circuit family in normal form computing f. In some sense, what we are showing is that only the boolean function MOD (MAJORITY) is required in arithmetic circuits to capture the full power of #AC^0[m_1, . . . , m_l] (#TC^0). We now show that the structure of the classes #AC^0[m] is isomorphic to the structure of the classes AC^0[m], with respect to the subset relation ⊆. Theorem 5. For all positive integers m_1, m_2, #AC^0[m_1] ⊆ #AC^0[m_2] if and only if AC^0[m_1] ⊆ AC^0[m_2]. Corollary 1. For all positive integers m_1, m_2, #AC^0[m_1] = #AC^0[m_2] if and only if AC^0[m_1] = AC^0[m_2]. Thus far, we have sometimes restricted our attention to the classes #AC^0[m]; the next theorem justifies this restriction, showing that any class #AC^0[m_1, . . . , m_l] is equivalent to some class #AC^0[m]. Theorem 6. For every subset {m_1, . . . , m_l} ⊆ N^+, we have #AC^0[m_1, . . . , m_l] = #AC^0[∏_{i=1}^l m_i] = #AC^0[p_1, . . . , p_k] where p_1, . . . , p_k are the primes dividing ∏_{i=1}^l m_i. The following two corollaries can be derived by making simple modifications to the proof of Theorem 5, along with the fact that AC^0[m] ⊆ TC^0. Corollary 2. For all positive integers m, #AC^0[m] ⊆ #TC^0. Corollary 3. For all positive integers m, #AC^0[m] = #TC^0 if and only if AC^0[m] = TC^0. The next theorem demonstrates that for a particular m, the AC^0[m] versus TC^0 question is equivalent to the FAC^0[m] versus #AC^0[m] question. Theorem 7. Let m be a positive integer. Either FAC^0[m] = #AC^0[m] = FTC^0, or FAC^0[m] ⊊ #AC^0[m] ⊊ FTC^0.
5
Difference Classes
We now focus on the difference classes DiffAC^0[m]. First, we observe a separation. Theorem 8. For all odd primes p, DiffAC^0 ⊊ DiffAC^0[p]. In the case of DiffAC^0[2], we have class equality with DiffAC^0. Roughly, this is because the boolean function MOD 2 is contained in DiffAC^0.
Theorem 9. DiffAC^0 = DiffAC^0[2]. There is the more general question of whether or not DiffAC^0[m] = DiffAC^0[2m] for m > 1. We are not able to answer this question unconditionally, but can connect it to a question concerning language classes: equality holds if and only if AC^0[2m] ⊆ χDiffAC^0[m]. Theorem 10. Let m be a positive integer. The following are equivalent:
1. #AC^0[2m] ⊆ DiffAC^0[m]
2. DiffAC^0[2m] = DiffAC^0[m]
3. AC^0[2m] ⊆ χDiffAC^0[m]
6
Closure Properties
6.1
Maximum and Minimum
Theorem 11. Let m be a positive integer. Neither #AC^0[m] nor DiffAC^0[m] is closed under MAX, unless AC^0[2m] = TC^0. The same theorem holds for MIN in place of MAX.
6.2
Division by a Constant
For a positive integer c, we say that a function class C is closed under division by c if f ∈ C implies that ⌊f/c⌋ ∈ C. Theorem 12. Let m be a positive integer and let p be a prime. The class #AC^0[p, (p − 1), m] is closed under division by p. Corollary 4. Let m be a positive integer and let p be a prime. The class DiffAC^0[p, (p − 1), m] is closed under division by p. The idea behind the proofs of Theorem 12 and Corollary 4 is to modify #AC^0[p, (p − 1), m] circuits inductively so that for each gate g in the original circuit, both the remainder of g (modulo p) and ⌊g/p⌋ are computed in the modified circuit^6 (a concrete sketch of this bookkeeping is given below). The MOD p − 1 gates are used to compute the remainder of g from the remainders of its inputs g_i, when g is a multiplication gate (i.e., g = ∏_i g_i). In the direction of proving converses of Theorem 12 and Corollary 4, we have the following theorem and corollary. Some new terminology is required for their statement. We call a function f non-trivial if it is not constant. We say that a function f is symmetric if it is non-trivial and for any x_1, . . . , x_n ∈ {0, 1} and x'_1, . . . , x'_n ∈ {0, 1}, ∑_{i=1}^n x_i = ∑_{i=1}^n x'_i implies f(x_1, . . . , x_n) = f(x'_1, . . . , x'_n). We say that a symmetric function f has period k if k ≥ 1, and for any x_1, . . . , x_n ∈ {0, 1} and x'_1, . . . , x'_n ∈ {0, 1}, ∑_{i=1}^n x_i = k + ∑_{i=1}^n x'_i implies f(x_1, . . . , x_n) = f(x'_1, . . . , x'_n).
6 In light of the equalities GapAC^0 = DiffAC^0 = DiffAC^0[2], Corollary 4, when instantiated with p = 2 and m = 1, gives [3, Theorem 8].
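The gate-by-gate invariant can be illustrated with plain integer bookkeeping (our sketch; in the circuit construction itself the remainders are of course computed by MOD p and MOD p − 1 gates rather than by divmod): every gate value g is represented by the pair (⌊g/p⌋, g mod p), and the pair for a + or ∗ gate is computed from the pairs of its inputs.

# Our sketch of the invariant behind Theorem 12: propagate quotient and
# remainder pairs (g // p, g % p) through + and * gates.

def add_pair(p, a, b):
    (qa, ra), (qb, rb) = a, b
    carry, r = divmod(ra + rb, p)        # carry is 0 or 1 since ra, rb < p
    return (qa + qb + carry, r)

def mul_pair(p, a, b):
    (qa, ra), (qb, rb) = a, b
    # (qa*p + ra) * (qb*p + rb), with the multiple of p split off
    carry, r = divmod(ra * rb, p)
    return (qa * qb * p + qa * rb + qb * ra + carry, r)

p, x, y = 3, 7, 5
px, py = divmod(x, p), divmod(y, p)
assert add_pair(p, px, py) == divmod(x + y, p)
assert mul_pair(p, px, py) == divmod(x * y, p)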
Theorem 13. Let m be a positive integer. If #AC^0[m] is closed under division by p, then there exist symmetric functions with periods p and p − 1 in DiffAC^0[m]. Corollary 5. Let m be a positive integer. If #AC^0[m] is closed under division by p, then MOD p ∈ AC^0[2m], and there exists a divisor q > 1 of p − 1 such that MOD q ∈ AC^0[2m].
6.3
Choose
We say that a function class C is closed under the choose operation if f ∈ C implies that the function (f choose k) is in C for every positive integer k. Theorem 14. Let m be a positive integer. The classes #AC^0[m] and DiffAC^0[m] are closed under the choose operation.
7
Future Work
We identify some open issues as possible avenues for future work.
– Are there combinatorial problems complete for the classes #AC^0[m]? A characterization of #AC^0 by the problem of counting paths in a certain class of graphs is given in [3].
– Can Corollary 5 be improved to show that under the hypotheses, MOD p − 1 is in AC^0[2m] (as opposed to just MOD q for a non-trivial divisor q of p − 1)?
– Can one generalize the characterization of DiffAC^0 = DiffAC^0[2] as GapAC^0 (given in [7])? Access to the complex second roots of unity {1, −1} seems to be what gives GapAC^0 the ability to compute MOD 2. If we allow #AC^0 circuits the third roots of unity as constants, we obtain circuits that can compute both MOD 2 and MOD 3. What languages can be computed by #AC^0 circuits when the pth roots of unity (for a prime p) are allowed as constants? Can one give a general characterization of such circuits which includes the result of [7] as a particular case?
– Another possibility is to study the power of arithmetic circuits when the underlying algebraic structure is a group ring, such as NG for some finite group G. If the constants {1g : g ∈ G} are allowed in circuits over NG that can multiply, then for all m dividing |G|, MOD m is computable.
– Suppose that C_1, C_2 are classes from {AC^0[m] : m ≥ 2} ∪ {TC^0}. It was demonstrated that C_1 = C_2 if and only if #C_1 = #C_2 (Corollaries 1 and 3). It is the case that #TC^0 = #NC^1 implies TC^0 = NC^1 (since χ#TC^0 = TC^0, and χ#NC^1 = NC^1). Does the converse hold?
– Of course, this list would not be complete without a request for new lower bounds. We conjecture that for odd primes p, χDiffAC^0[p] is properly contained in AC^0[2p] (and hence that DiffAC^0[p] ≠ DiffAC^0[2p]). Can this be proved? More generally, can one prove any lower bounds using the classes χDiffAC^0[p]?
Acknowledgements The author would like to thank Eric Allender for many interesting discussions. Riccardo Pucella deserves thanks for his useful comments on a draft of this paper.
References
1. M. Agrawal, E. Allender, and S. Datta. On TC^0, AC^0, and arithmetic circuits. In Proceedings 12th Computational Complexity, pages 134–148. IEEE Computer Society Press, 1997.
2. E. Allender. Making computation count: arithmetic circuits in the nineties. SIGACT News, 28(4):2–15, 1998.
3. E. Allender, A. Ambainis, D. A. Mix Barrington, S. Datta, and H. LêThanh. Bounded depth arithmetic circuits: Counting and closure. In Proceedings 26th International Colloquium on Automata, Languages, and Programming (ICALP), Lecture Notes in Computer Science 1644, pages 149–158, 1999.
4. E. Allender, J. Jiao, M. Mahajan, and V. Vinay. Non-commutative arithmetic circuits: depth reduction and size lower bounds. Theoretical Computer Science, 209:47–86, 1998.
5. E. Allender and M. Ogihara. Relationships among PL, #L, and the determinant. RAIRO – Theoretical Informatics and Applications, 30:1–21, 1996.
6. C. Àlvarez and B. Jenner. A very hard log space counting class. Theoretical Computer Science, 107:3–30, 1993.
7. A. Ambainis, D. A. Mix Barrington, and H. LêThanh. On counting AC^0 circuits with negative constants. In Proceedings 23rd Mathematical Foundations of Computer Science, Lecture Notes in Computer Science 1450, pages 409–417, Berlin, 1998. Springer-Verlag.
8. H. Caussinus, P. McKenzie, D. Thérien, and H. Vollmer. Nondeterministic NC^1 computation. Journal of Computer and System Sciences, 57:200–212, 1998.
9. R. Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proceedings 19th Symposium on Theory of Computing, pages 77–82. ACM Press, 1987.
10. S. Toda. Classes of arithmetic circuits capturing the complexity of computing the determinant. IEICE Transactions on Communications/Electronics/Information and Systems, E75-D:116–124, 1992.
11. L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189–201, 1979.
12. H. Venkateswaran. Circuit definitions of non-deterministic complexity classes. SIAM Journal on Computing, 21:655–670, 1992.
13. V. Vinay. Counting auxiliary pushdown automata and semi-unbounded arithmetic circuits. In Proceedings 6th Structure in Complexity Theory, pages 270–284. IEEE Computer Society Press, 1991.
14. H. Vollmer. Introduction to Circuit Complexity. Springer-Verlag, 1999.
Inverse NP Problems

Hubie Chen

Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
[email protected] Abstract. One characterization of the class NP is as the class of all languages for which there exists a polynomial-time verifier with the following properties: for every member of the language, there exists a polynomially-sized proof causing the verifier to accept; and, for every non-member, there is no proof causing the verifier to accept. Relative to a particular verifier, every member x of the language induces a set of proofs, namely, the set of proofs causing the verifier to accept x. This paper studies the complexity of deciding, given a set Π of proofs, whether or not there exists some x inducing Π (relative to a particular verifier). We call this decision problem the inverse problem for the verifier. We introduce a new notion of reduction suited for inverse problems, and use it to classify as coNP-complete the inverse problems for the “natural” verifiers of many NP-complete problems.
1
Introduction
By now, the complexity class NP has become one of the most pervasive and successful notions in computer science. Indeed, the intimately connected notion of NP-completeness is described as “computer science’s favorite paradigm, fad, punching bag, buzzword, alibi, and intellectual export” in a lively discussion on the influence and nature of NP-completeness [11]. There are many equivalent characterizations of the class NP. According to one such characterization, a language is in NP if there is an efficient (polynomial-time) verifier with the following properties: for every member of the language, there exists a polynomially-sized proof causing the verifier to accept (upon being given the member along with the proof); and, for every non-member, there is no proof causing the verifier to accept. Thus, with a verifier for a language in hand, deciding membership in the language amounts to deciding whether or not a given potential member has a proof. Inverse Problems. Relative to a particular verifier, every member x of the language induces a set of proofs, namely, the set of proofs causing the verifier to accept x as a member. Call a set of proofs arising in this way induced. This paper studies a simply stated and natural question: what is the complexity of deciding, given a set Π of proofs, whether or not Π is induced? We call this decision problem the inverse problem for the verifier.
As a first example, consider the natural verifier for 3-SAT, which accepts a 3-SAT formula and an assignment to the variables of the formula when the assignment satisfies the formula; call the inverse problem for this verifier inverse 3-sat. Inverse 3-sat is then the question of deciding, given a set Π of assignments, if there is a 3-SAT formula with exactly Π as its set of satisfying assignments. It is fairly straightforward to show that inverse 3-sat is in coNP: for any set of assignments Π, we can efficiently compute the “candidate formula” F containing all 3-clauses satisfied by all assignments in Π. It can be verified that if any 3-SAT formula has Π as its set of satisfying assignments, then the candidate formula is such a formula. Intuitively, this is because we placed as many clauses as possible in the candidate formula F, so any 3-SAT formula including Π in its induced set can only be less constrained – that is, admit more satisfying assignments – than F. Thus, inverse 3-sat is in coNP: to show that a set Π is not in inverse 3-sat, it suffices to give an assignment outside of Π satisfied by the candidate formula (of Π). In previous work, Kavvadias and Sideri [8] considered the inverse satisfiability problem, which includes as a specific case the inverse 3-sat problem. (It is from them that we adopt the term inverse problem.) They examined the class of generalized boolean satisfiability problems on which Schaefer’s dichotomy theorem [12] was proved – a class which includes Horn SAT and 2-SAT, two well-known tractable subclasses of SAT; as well as not-all-equal SAT and one-in-three SAT, two intractable variants of SAT. For any satisfiability problem from this generalized class, the inverse problem is always in coNP by an argument similar to that given above. The intriguing result they obtained is that the inverse satisfiability problem is intractable (coNP-complete) if and only if the original satisfiability problem is intractable (NP-complete)! Thus, it follows from their result that inverse 3-sat is coNP-complete because 3-SAT is NP-complete, as well as that inverse 2-sat is in P because 2-SAT is in P. To give another example of an inverse problem, we begin with the language k-clique consisting of pairs ⟨G, k⟩ such that G is an undirected graph with a clique of size k. Consider the verifier for this language which accepts H as a proof for ⟨G, k⟩ if H is a size k subset of the vertex set of G that forms a clique in G. Call the inverse problem for this verifier inverse k-clique. Then, a well-formed input to the inverse k-clique problem is a set Π = {H_1, . . . , H_m}, where each H_i is a set of size k; and the problem is to decide whether or not there is a graph G such that {H_1, . . . , H_m} is exactly the set of k-cliques in G. In analogy to the “candidate formula” for an instance of inverse 3-sat, we can compute from an instance Π of inverse k-clique a “candidate graph” G with the property that if any graph has Π as its set of k-cliques, then G is such a graph: our candidate graph has vertex set ∪_{i=1}^m H_i, with an edge between two vertices if the two vertices are both contained in H_i, for some i. Thus, inverse k-clique is in coNP: to show that Π is not in inverse k-clique, it suffices to demonstrate that the candidate graph for Π has a k-clique not in Π. It will be demonstrated in this paper that inverse k-clique is also coNP-complete.
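The candidate-formula argument can be made completely concrete. The following sketch is ours, not from the paper: clauses are taken to have at most three literals, and the brute-force enumeration merely stands in for the coNP check.

from itertools import combinations, product

# A literal is a pair (variable, sign); assignment a satisfies (v, s)
# exactly when a[v] == s.

def inverse_3sat(n, Pi):
    def satisfies(a, clause):
        return any(a[v] == s for (v, s) in clause)

    literals = [(v, s) for v in range(n) for s in (0, 1)]
    # candidate formula: every clause satisfied by all assignments in Pi
    candidate = [c for k in (1, 2, 3)
                 for c in combinations(literals, k)
                 if all(satisfies(a, c) for a in Pi)]
    models = {a for a in product((0, 1), repeat=n)
              if all(satisfies(a, c) for c in candidate)}
    return models == set(Pi)

# x0 <-> x1 is expressible; parity of four variables is not, since every
# implicate of parity mentions all four variables:
assert inverse_3sat(2, {(0, 0), (1, 1)})
even = {a for a in product((0, 1), repeat=4) if sum(a) % 2 == 0}
assert not inverse_3sat(4, even)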
These examples might seem to be evidence for the conjecture that the inverse problem for verifiers accepting NP-complete languages will always be coNP-complete. However, a fairly succinct argument demonstrates that this conjecture fails. Suppose we take the verifier for k-clique and define a new verifier which accepts only “padded” proofs of the form ⟨s, H⟩ where H is a clique and s is a string with, say, length equal to |H|. Then, the inverse problem becomes “easier”: for every clique H accepted by the first verifier, exponentially many proofs of the form ⟨s, H⟩ are accepted by the second verifier. If padding of sufficient length is applied to the proofs of the verifier for k-clique, we obtain a verifier for k-clique with inverse complexity in P. Of course, such a padded verifier may seem to be quite artificial. For many NP-complete languages L, one may feel that there is a “natural” verifier for L (such as the verifiers described in the above examples), and we will show that the inverse problem for many such verifiers is coNP-complete. Nonetheless, we emphasize that for each language L in NP, there is more than one verifier^1 for L, and that the inverse problem is defined relative to a verifier for an NP language, and not just an NP language by itself. Related Work and Contributions. Our study of inverse problems contributes to a line of research which aims to understand the properties of the natural verifiers for NP-complete languages. This research program has its origins in the work done on the counting class #P by Simon and Valiant [13,14]; one important outcome of this work was the observation that many natural verifiers are related by parsimonious reductions, a stronger notion of many-one reduction that preserves the number of proofs. A notion of reduction even stronger than parsimonious reduction was studied by Lynch and Lipton [10]; they required that when a language member x many-one reduces to another language member y, there be an efficiently computable function mapping proofs for x to proofs for y. Fischer, Hemaspaandra, and Torenvliet [5] went one step further to study what they called witness-isomorphic reductions, which require that there be efficiently computable isomorphisms between the proof sets of language members reduced to one another. Other work along this line includes [1], in which Agrawal and Biswas gave sufficient conditions for a verifier to accept an NP-complete language. These sufficient conditions allow one to derive many of the known NP-completeness results in a uniform manner. Our work complements these previous results by identifying an entirely new feature shared by many of the natural verifiers for NP-complete languages – namely, coNP-complete “inverse complexity.” This paper is also a contribution to the study of structure identification [3], where the goal is to decide, given a set of data points, whether or not the set is meaningfully structured in a way that admits a description of desirable type. More broadly, we believe that the study of inverse complexity sheds new light on the multitude of natural properties of classical combinatorial structures yielding NP-complete decision problems. We note that inverse complexity has consequences for another task involving computationally difficult problems – the task of generating multiple solutions
1 In fact, there are infinitely many.
to a search problem, which is of both theoretical and practical interest (see, for example, [6,9,7,4]). Here, a relevant problem is to decide whether or not all proofs of a language member have (at some point in time) been generated – precisely, to decide, given a language member x and a non-empty set of proofs Π of x, whether or not Π is the full set of proofs induced by x. Hardness of the inverse problem implies hardness of this decision problem, under a mild assumption. Results and Paper Organization. The notation and conventions used throughout the paper are fixed in Section 2. In this section, we also define the notion of a candidate function for a verifier, examples of which include the “candidate formula” and “candidate graph” procedures given above. In Section 3, we prove that inverse k-clique is coNP-complete. We also formulate a sufficient condition for an inverse problem to be in P, and give an example application of the condition by showing that inverse bipartite matching is in P. We then develop a notion of reduction (Section 4) which allows us to compare the complexity of inverse problems with relative ease. In particular, we define what we call a π-reduction between verifiers with candidate functions, and show that the existence of a π-function implies the existence of a π-reduction. By giving a sequence of π-functions, we leverage the hardness of inverse k-clique and inverse 3-sat to show the coNP-completeness of many other inverse problems. We have hinted that the “inverse complexity” of a language depends strongly on the verifier used to accept the language. In the full version of this paper, we formally demonstrate and study this dependence. Among other results, we show that there exists a verifier with a Σ_2^p-complete inverse problem, giving a new and natural example of a Σ_2^p-complete problem. We also prove that for all natural NP-complete languages L and any language A in NP, there is a verifier for L whose inverse problem is equivalent to A.
2
Preliminaries
We assume familiarity with basic notions of complexity theory, such as the complexity classes P, NP, coNP, Σ_2^p, and Π_2^p, described for instance in [2]. We write A ≤_m^p B if there is a many-one polynomial time reduction from A to B, and A ≡_m^p B if A ≤_m^p B and B ≤_m^p A. The power set of a set S is denoted by P(S). Throughout this paper, Σ denotes a fixed finite alphabet. When S ⊆ Σ^∗ is a set of strings, ‖S‖ denotes ∑_{x∈S} |x|, that is, the sum of the lengths of the strings in S. (We assume that such a set S ⊆ Σ^∗ is represented by a string of length linear in ‖S‖.) Definition 1. A relation R ⊆ Σ^∗ × Σ^∗ is a verifier if membership in R is decidable in polynomial time and there is a polynomial p such that (for all x, π ∈ Σ^∗) R(x, π) implies that |π| ≤ p(|x|). When R is a verifier, we let R(x) denote the set of proofs for x ∈ Σ^∗, that is, {π ∈ Σ^∗ : R(x, π)}; and, we let L(R) denote the set {x ∈ Σ^∗ : ∃π ∈ Σ^∗ such that R(x, π)}, which we call the language associated with or accepted by R.
We now define the inverse problem corresponding to a verifier, which is the problem of focal interest in this paper. Definition 2. Suppose that R is a verifier. Let R^{-1} denote the language {Π ⊆ Σ^∗ : ∃x ∈ L(R) such that R(x) = Π}. We call this language the inverse problem for R. When R is a verifier, we will refer to the members of the language L(R) as theorems; when x is a theorem, the elements of R(x) are said to be its proofs. Using this terminology, the inverse problem is the question of deciding, given a set of proofs Π, whether or not there is a theorem x with exactly Π as its set of proofs. A natural initial consideration is whether or not there is an upper bound on the complexity of R^{-1}. We can obtain such an upper bound when we place a restriction which we call fairness on R. All verifiers considered in this paper obey this restriction. Definition 3. A verifier R is a fair verifier if there is a polynomial q such that for all x ∈ L(R), there exists x' ∈ L(R) where |x'| ≤ q(‖R(x)‖) and R(x') = R(x). Put differently, R is fair if for all sets of proofs Π, when there exists a theorem x with exactly Π as its set of proofs (that is, Π = R(x)), then there exists such an x with length bounded above by a polynomial in ‖Π‖. Suppose R is a fair verifier. In deciding whether or not a set of proofs Π is in R^{-1}, we need only consider as potential theorems the x which are bounded above in length by a polynomial in ‖Π‖, the length of the representation of Π. Moreover, by the definition of a verifier, a proof of x (that is, an element of R(x)) has length polynomial in |x|. These two facts lead to the following observation. Observation 1. If R is a fair verifier, then R^{-1} is in Σ_2^p. We note that without the restriction of fairness, the complexity of R^{-1} can be non-recursive – in fact, as high as the halting problem for general Turing machines^2. Definition 4. A non-empty set Π ⊆ Σ^∗ is well-formed (relative to an NP verifier R) if there exists x ∈ Σ^∗ such that Π ⊆ R(x). The inverse problem is only “interesting” on well-formed sets, since a set that is not well-formed cannot be in R^{-1}. Checking well-formedness of a set can be done in polynomial time for all verifiers considered in this paper.
2 For any recursively enumerable language L (such as the halting problem), there exists a verifier R such that R^{-1} = L. To see this, let M be a Turing machine accepting L. Define the verifier R(x, π) to be true when x is of the form ⟨0^n, π⟩ and M accepts when simulated for time n on input π: then, R^{-1} = L. Since R^{-1} is always recursively enumerable, we obtain that a language is recursively enumerable if and only if it is equal to R^{-1} for some verifier R.
We now introduce the notion of a candidate function for a verifier R. This notion can be thought of as a generalization of the “candidate formula” and “candidate graph” procedures described in the introduction. A candidate function efficiently maps a set of proofs Π to a “candidate theorem” having the property that if any theorem has Π as its set of proofs, then the candidate theorem is such a theorem. In addition, the candidate theorem for Π is optimistic in the sense that all proofs in Π are proofs of it. Definition 5. Let R be a verifier. A polynomial time computable function C : P(Σ^∗) → Σ^∗ is a candidate function for R if the following two conditions hold.
1. For all well-formed Π ⊆ Σ^∗, all proofs in Π are proofs for C(Π): Π ⊆ R(C(Π)).
2. For all well-formed Π ⊆ Σ^∗, if there exists an x ∈ L(R) such that Π = R(x), then Π = R(C(Π)).
Note that when C(Π) has exactly Π as its set of proofs, there is no requirement that C(Π) be the unique theorem with Π as its set of proofs. Indeed, there will generally not be such a unique theorem. When a verifier has a candidate function, we can give an even better upper bound on the complexity of its inverse problem than that of Observation 1. (Notice that every verifier with a candidate function is fair.) Observation 2. If R is a verifier with a candidate function, then R^{-1} is in coNP. This is because we no longer have to search over all theorems x to find a match for Π, but rather can simply compute the candidate theorem in polynomial time, and then check whether or not its set of proofs is Π. Definition 6. Let R be a verifier. Define the exhaustive proof problem for R to be the language Exhaustive(R) = {⟨x, Π⟩ ∈ Σ^∗ × P(Σ^∗) : R(x) = Π, Π ≠ ∅}. In other words, the exhaustive proof problem is to determine, given an x and a nonempty set of proofs Π for x, whether or not Π contains all proofs of x (relative to R). This is the case when π is a proof of x if and only if π is in Π (for all π). To verify this one needs only examine π of length polynomial in |x| (by definition of a verifier), leading to the following observation. Observation 3. If R is a verifier, then Exhaustive(R) is in coNP. Notice that without the restriction that the given set of proofs must be nonempty, the exhaustive proof problem for the verifier R of any NP-complete language L would trivially be coNP-complete. In this case, the reduction mapping a string x to the pair ⟨x, ∅⟩ would reduce co-L to the exhaustive proof problem for R. Although this trivial reduction fails for the given definition of the exhaustive proof problem, we can nonetheless establish hardness of this problem by using the hardness of the corresponding inverse problem.
Lemma 1. If R is a verifier with a candidate function, then R^{-1} ≤_m^p Exhaustive(R).
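The decision procedures behind Observations 2 and 3 each amount to a single proof-set comparison; spelled out as a sketch (ours; proofs(x) enumerates R(x) by brute force, standing in for the coNP computations):

# Our sketch of Observation 2 and of the exhaustive proof problem: given a
# candidate function C for R, membership of Pi in R^{-1} reduces to one
# comparison of proof sets.

def in_inverse(Pi, C, proofs):
    x = C(Pi)                      # candidate theorem, poly-time computable
    return set(proofs(x)) == set(Pi)

def exhaustive(x, Pi, proofs):     # <x, Pi> in Exhaustive(R)?
    return bool(Pi) and set(proofs(x)) == set(Pi)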
3
Inverse Problems
In this section, we give some initial complexity classifications of inverse problems. The hardness results given here will serve as starting points for deriving further hardness results. First, we note a theorem established in previous work by Kavvadias and Sideri. Theorem 4. [8] Inverse 3-sat is coNP-complete. Next, we establish that the inverse clique problem is coNP-hard. This is done by reducing from the clique problem itself. The idea is to take a graph G (an instance of the clique problem) and “expose” its edges by creating a clique for each edge of G. The resulting candidate graph corresponding to the set of cliques Π then has, for each clique in the original graph G, a clique not in Π. Theorem 5. Inverse k-clique is coNP-complete. It generally seems to be the case that the inverse problem for a natural verifier accepting an NP-complete language is coNP-complete: in the next section, we give many examples where this is true. However, there is at least one exception to this apparent rule of thumb. Observation 6. Inverse circuit-sat is in P. This is because every well-formed set Π has a circuit with exactly Π as its set of satisfying assignments, namely, a DNF circuit directly encoding Π. One might contrast this observation with Theorem 4; intuitively, the difference is that the class of boolean circuits is capable of expressing all possible sets of satisfying assignments, whereas the sets of satisfying assignments expressible by 3-SAT formulas are restricted. We can place the complexity of other inverse problems inside P by making use of a link between inverse problems and the notion of output polynomial time. (This notion has been discussed in previous work, for example [6].) Definition 7. Say that a verifier R has an output polynomial time algorithm if, given x, the set R(x) can be computed in time polynomial in |x| + ‖R(x)‖. When R has an output polynomial time algorithm, both the exhaustive proof problem and inverse problem for R become easy. The following lemma is implicit in [8]. Lemma 2. Suppose that R is a verifier with an output polynomial time algorithm. Then, Exhaustive(R) is in P. If in addition R has a candidate function, then R^{-1} is in P.
Lemma 2 allows us to establish the tractability of inverse bipartite matching, as there is an output polynomial time algorithm for bipartite matching³.

Theorem 7. Inverse bipartite matching is in P.
4
π-Reductions
We now develop a notion of reduction between pairs of verifiers with candidate functions, called π-reduction, which allows us to compare the complexity of the respective inverse problems. When there is a π-reduction from one verifier to another, the inverse problem of the first reduces in polynomial time to the inverse problem of the second. In addition, we show that the existence of a particular type of function, which we call a π-function, is sufficient for a π-reduction to exist; giving such π-functions will be our main tool for showing hardness of inverse problems. Throughout this section, we assume that R and S are verifiers with candidate functions CR and CS, respectively.

Definition 8. A polynomial time computable function g : P(Σ∗) → P(Σ∗) is a π-reduction from R to S if the following three conditions hold.
1. For all well-formed Π ⊆ Σ∗, g preserves the number of proofs: |Π| = |g(Π)|.
2. For all well-formed Π ⊆ Σ∗, the candidate theorems for Π and g(Π) have the same number of proofs: |R(CR(Π))| = |S(CS(g(Π)))|.
3. If Π ⊆ Σ∗ is not well-formed, then g(Π) is not well-formed.

Lemma 3. If g is a π-reduction from R to S, then R−1 ≤pm S−1 via g.

Definition 9. Suppose f : Σ∗ × Σ∗ → Σ∗ is a partial function computable in polynomial time, and let fx : Σ∗ → Σ∗ denote the function defined by fx(π) = f(x, π). Let us say that f is a π-function from R to S (relative to CR and CS) if for all well-formed Π ⊆ Σ∗, when x and y are set as x = CR(Π) and y = CS(fx(Π)), fx is a bijection between R(x) and S(y).

Lemma 4. Suppose that there exists a π-function from R to S (relative to CR and CS). Then there exists a π-reduction from R to S, and R−1 ≤pm S−1.
³ Although it would be tempting to conjecture that there is a correspondence between the complexity of the counting problem corresponding to a verifier R (given x, compute |R(x)|) and the inverse problem R−1, a general correspondence seems unlikely in light of Theorem 4, Theorem 5, Observation 6 and Theorem 7. The counting versions of 3-sat, k-clique, circuit-sat, and bipartite matching are all #P-complete [14]. Moreover, the decision problems for 3-sat and circuit-sat are both NP-complete, so even attempting to predict the inverse complexity based on both the counting complexity and decision complexity seems difficult.
Having developed a notion of reduction between inverse problems, we can classify the complexity of many natural inverse problems with relative ease.

Theorem 8. Inverse exact cover, inverse vertex cover, inverse knapsack, inverse steiner tree, and inverse partition are all coNP-complete.

In the full version of this paper, we give precise descriptions of the verifiers that are addressed by Theorem 8. All of these verifiers have candidate functions, and thus the corresponding inverse problems are all in coNP. We establish that the inverse problems are all coNP-hard by starting with the fact that inverse 3-sat and inverse k-clique are coNP-hard (Theorems 4 and 5), and then giving a sequence of π-functions. We note that parsimonious reductions h between R and S such that there is an efficient mapping from proofs of x ∈ L(R) to proofs of h(x) are often (but not always) of great help in developing π-functions.

Corollary 1. Exhaustive 3-sat, exhaustive circuit-sat, exhaustive k-clique, exhaustive exact cover, exhaustive vertex cover, exhaustive knapsack, exhaustive steiner tree, and exhaustive partition are all coNP-complete⁴.
5
Conclusions
In this paper, we formalized the inverse problem for an arbitrary NP verifier, which includes as particular cases the inverse satisfiability problems studied by Kavvadias and Sideri [8]. We developed a notion of reduction for inverse problems and used it to classify the inverse complexity of many natural verifiers. We also formally demonstrated that the “inverse complexity” of a verifier for a language L cannot be predicted from L, but rather exhibits a strong dependence on the choice of verifier used to accept L.

There are some interesting directions for future work. Can the intuitive claim that natural verifiers for NP-complete problems tend to have coNP-complete inverse problems be formalized and proved? (Obviously, a formalized notion of “natural verifier” having the property that all natural verifiers have coNP-complete inverse problems would have to exclude the verifier for circuit-sat, by Observation 6.) Along these lines, it would be of interest to classify the inverse complexity of the natural verifiers for graph 3-colorability and hamiltonian circuit. In addition, one could study inverse problems for other complexity classes definable by nondeterministic machines, such as NL and Σ2p.
⁴ For some (but not all) of the listed problems, we are aware of elementary proofs of coNP-completeness where the reduction is from the complement of the original language. For instance, co-k-clique reduces to exhaustive k-clique by mapping ⟨G, k⟩ to ⟨⟨G ∪ C, k⟩, {V(C)}⟩, where C is a clique of size k with vertex set disjoint from that of G.
Acknowledgements

The author would like to thank Andrew Blumberg, Carla Gomes, Dexter Kozen, Riccardo Pucella, Tim Roughgarden, and Bart Selman for useful discussions and comments. The author was supported by an NSF Graduate Research Fellowship.
References
1. M. Agrawal and S. Biswas. Universal relations. In Proc. 7th Structure in Complexity Theory Conference, pages 207–220, 1992.
2. J. L. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity I. Texts in Theoretical Computer Science – An EATCS Series. Springer-Verlag, Berlin, 2nd edition, 1995.
3. R. Dechter and J. Pearl. Structure identification in relational data. Artificial Intelligence, 58:237–270, 1992.
4. T. Eiter and K. Makino. On computing all abductive explanations. In AAAI/IAAI, pages 62–67, 2002.
5. S. Fischer, L. Hemaspaandra, and L. Torenvliet. Witness-isomorphic reductions and local search. In A. Sorbi, editor, Complexity, Logic and Recursion Theory, Lecture Notes in Pure and Applied Mathematics, pages 207–223, 1997.
6. D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. On generating all maximal independent sets. Information Processing Letters, 27:119–123, 1988.
7. D. J. Kavvadias, M. Sideri, and E. C. Stavropoulos. Generating all maximal models of a Boolean expression. Information Processing Letters, 74(3–4):157–162, 2000.
8. D. Kavvadias and M. Sideri. The inverse satisfiability problem. SIAM Journal on Computing, 28(1):152–163, 1998.
9. A. Kwan, E. P. K. Tsang, and J. E. Borrett. Phase transition in finding multiple solutions in constraint satisfaction problems. In Workshop on Studying and Solving Really Hard Problems, First International Conference on Principles and Practice of Constraint Programming, pages 119–126, 1995.
10. N. Lynch and R. Lipton. On structure preserving reductions. SIAM Journal on Computing, 7(2):119–125, 1978.
11. C. Papadimitriou. NP-completeness: A retrospective. In Proceedings of the 24th International Colloquium on Automata, Languages and Programming, volume 1256 of Lecture Notes in Computer Science, pages 2–6. Springer, 1997.
12. T. J. Schaefer. The complexity of satisfiability problems. In Proc. 10th Annual ACM Symposium on Theory of Computing, pages 216–226, 1978.
13. J. Simon. On some central problems in computational complexity. Technical report, Cornell University, 1975.
14. L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189–201, 1979.
A Linear-Time Algorithm for 7-Coloring 1-Planar Graphs (Extended Abstract)
Zhi-Zhong Chen and Mitsuharu Kouno
Dept. of Math. Sci., Tokyo Denki Univ., Hatoyama, Saitama 350-0394, Japan
[email protected]

Abstract. A graph G is 1-planar if it can be embedded in the plane in such a way that each edge crosses at most one other edge. Borodin showed that 1-planar graphs are 6-colorable, but his proof only leads to a complicated polynomial (but nonlinear) time algorithm. This paper presents a linear-time algorithm for 7-coloring 1-planar graphs (that are already embedded in the plane). The main difficulty in the design of our algorithm comes from the fact that the class of 1-planar graphs is not closed under the operation of edge contraction. This difficulty is overcome by a structure lemma that may prove useful in other problems on 1-planar graphs. This paper also shows that it is NP-complete to decide whether a given 1-planar graph is 4-colorable. The complexity of the problem of deciding whether a given 1-planar graph is 5-colorable is still unknown.
1
Introduction
The problem of coloring the vertices of a graph using few colors has been a central problem in graph theory. It has also been intensively studied in algorithm theory due to its applications in many practical fields such as scheduling, resource allocation, and VLSI design. Of special interest is the case where the graph is planar. Appel and Haken [1,2] showed that every planar graph is 4-colorable, but their proof only leads to a complicated polynomial (but nonlinear) time algorithm. Since then, a number of linear-time algorithms for 5-coloring planar graphs have appeared [9,12,11,15,10].

An interesting generalization of planar graphs is the class of 1-planar graphs. The problem of coloring the vertices of a 1-planar graph using few colors has also attracted much attention [14,13,3,4,5]. Indeed, the problem has been formulated in an equivalent way: it is equivalent to the problem of coloring the vertices of a plane graph so that the boundary vertices of every face of size at most 4 receive different colors [13]. Ringel [14] proved that every 1-planar graph is 7-colorable and conjectured that every 1-planar graph is 6-colorable. Ringel [14] and Archdeacon [3] confirmed the conjecture for two special cases.
The full version can be found at http://rnc.r.dendai.ac.jp/˜chen/papers/1planar.pdf
Supported in part by the Grant-in-Aid for Scientific Research of the Ministry of Education, Science, Sports and Culture of Japan, under Grant No. 14580390.
Borodin [4] settled the conjecture in the affirmative with a lengthy proof. He [5] later came up with a relatively shorter proof. However, his proof only leads to a complicated polynomial (but nonlinear) time algorithm for 6-coloring 1-planar graphs.

Chen, Grigni, and Papadimitriou [7] studied a modified notion of planarity, in which two nations of a (political) map are considered adjacent when they share any point of their boundaries (not necessarily an edge, as planarity requires). Such adjacencies define a map graph (see [8] for a comprehensive survey of known results on map graphs). The map graph is called a k-map graph if no more than k nations on the map meet at a point. As observed in [7], the adjacency graph of the United States is nonplanar but is a 4-map graph. Obviously, every 4-map graph is 1-planar. In Section 3, we will observe that every 1-planar graph can be modified to a 4-map graph by adding some edges (see Fact 1 below). By these facts, the problem of k-coloring 1-planar graphs is essentially equivalent to the problem of k-coloring 4-map graphs, for every integer k ≥ 4.

Recall that in the case of planar graphs, a linear-time 4-coloring algorithm seems to be difficult to design and hence it is of interest to look for a linear-time 5-coloring algorithm. Similarly, in the case of 1-planar graphs, a linear-time 6-coloring algorithm seems to be difficult to design and hence it is of interest to look for a linear-time 7-coloring algorithm. In this paper, we present the first linear-time algorithm for 7-coloring 1-planar graphs (that are already embedded in the plane). Our algorithm is much more complicated than all 5-coloring algorithms for planar graphs. The main reason is that unlike planar graphs, the class of 1-planar graphs is not closed under the operation of edge contraction (recall that contracting an edge {u, v} in a graph G is done by replacing u and v by a single new vertex z and adding an edge between z and each original neighbor of u and/or v). It is worth noting that many coloring algorithms (e.g., those for planar graphs) are crucially based on the property that the class of their input graphs is closed under the operation of edge contraction. In the case of 1-planar graphs, this property is not available and it becomes difficult to find suitable vertices to merge so that the resulting graph is still 1-planar. We overcome this difficulty with a structure lemma which essentially says that every 1-planar graph either has a constant fraction of vertices of degree at most 7, or has a constant fraction of vertices each of which is of degree 8 and has at least 5 neighbors of degree at most 8. We believe that this lemma will prove useful in the design of algorithms for other problems on 1-planar graphs.

Our algorithm works only when the input 1-planar graph is given together with its embedding in the plane. Since it is still unknown whether 1-planar graphs can be recognized in polynomial time, it is an interesting open question to ask whether 1-planar graphs can be 7-colored in linear time when they are given without an embedding in the plane.

Since planar graphs are special 1-planar graphs and it is NP-complete to decide whether a given planar graph is 3-colorable, it is also NP-complete to decide whether a given 1-planar graph is 3-colorable. This paper shows that it is NP-complete to decide whether a given 1-planar graph is 4-colorable. The problem of deciding whether a given 1-planar graph is 5-colorable remains open.
2
Preliminaries
Throughout this paper, a graph is always simple (i.e., has neither multiple edges nor self-loops) unless stated explicitly otherwise. Let G = (V, E) be a graph. The neighborhood of a vertex v in G, denoted NG(v), is the set of vertices in G adjacent to v; dG(v) = |NG(v)| is the degree of v in G. For U ⊆ V, let NG(U) = ∪u∈U NG(u). For U ⊆ V, the subgraph of G induced by U is the graph (U, F) with F = {{u, v} ∈ E : u, v ∈ U} and is denoted by G[U]. For U ⊆ V, we denote by G − U the subgraph induced by V − U. If u ∈ V, we write G − u instead of G − {u}. An independent set in G is a set of pairwise nonadjacent vertices in G. A maximal independent set in G is an independent set in G that is not a proper subset of another independent set in G.

A 1-plane embedding of G is an embedding of G in the plane in such a way that each edge crosses at most one other edge. G has a 1-plane embedding if and only if G is a 1-planar graph.

For a sequence u1, . . . , uk of two or more distinct pairwise nonadjacent vertices in G, merging u1, . . . , uk is the operation of modifying G by adding an edge between uk and every vertex in ∪1≤i≤k−1 NG(ui) − NG(uk) and further removing vertices u1, . . . , uk−1. Note that the sequence is ordered and uk is the last vertex in the sequence.

Let k be a natural number. A k-coloring of G is a coloring of the vertices of G with at most k colors such that no two adjacent vertices get the same color. The color classes of a coloring C of the vertices of G are the sets V1, V2, ..., Vk, where k is the number of colors used by C and Vi, 1 ≤ i ≤ k, is the set of all vertices with the ith color.
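Since the merging operation above is the workhorse of the whole algorithm, here is a small sketch of it on an adjacency-set representation; the representation and the function name are our own choices, not the paper's.

def merge(adj, seq):
    """Merge the ordered sequence seq = [u1, ..., uk] of pairwise
    nonadjacent vertices: connect uk to every vertex in
    N(u1) u ... u N(u_{k-1}) - N(uk), then delete u1, ..., u_{k-1}.
    adj maps each vertex to the set of its neighbors; modified in place."""
    *rest, uk = seq
    assert all(b not in adj[a] for a in seq for b in seq if b != a), \
        "merging is only defined for pairwise nonadjacent vertices"
    gained = set().union(*(adj[u] for u in rest)) - adj[uk]
    for u in rest:                       # remove u1, ..., u_{k-1}
        for w in adj[u]:
            adj[w].discard(u)
        del adj[u]
    for w in gained:                     # attach their neighbors to uk
        adj[uk].add(w)
        adj[w].add(uk)

For example, after merge(adj, [u1, u2]) the surviving vertex u2 carries u1's former neighbors, which is exactly how a coloring of the merged graph transfers back: u1 receives u2's color.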
3
The Algorithm
Since we can reduce the problem of 7-coloring 1-planar graphs to its special case where the input 1-planar graph is 3-connected, we can restrict our attention only to 3-connected 1-planar graphs. We assume that each input 1-planar graph G is given by its adjacency list and a list L of disjoint (unordered) pairs of edges of G such that G has a 1-plane embedding in which the two edges in each pair in L cross while no two other edges of G cross.

Given G in such a way, we first construct a graph H as follows. H contains all vertices of G and all those edges of G that are contained in no pair in L. Moreover, for each (unordered) pair {e, e′} ∈ L, H contains a new vertex ve,e′, and contains an edge between ve,e′ and each endpoint of e and/or e′. H does not contain other vertices or edges. Note that H is a planar graph by our assumption on L. Also note that some vertices ve,e′ may have only three neighbors in H because e and e′ may share one endpoint. We then compute a plane embedding of H in linear time. For convenience, we identify H with its plane embedding. Hereafter, for a vertex v of H, we say that two neighbors u and w of v in H are consecutive if u and w appear around v consecutively (clockwise or counterclockwise) in H.

Fact 1. G has a supergraph that is a 4-map graph. Consequently, every 1-planar graph has a supergraph that is a 4-map graph.
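The construction of H just described is mechanical; a sketch follows (the edge and dummy-vertex encodings are our own assumptions).

def planarize(vertices, edges, crossing_pairs):
    """Build H from G: keep every edge of G lying in no pair of
    crossing_pairs (the list L); for each pair {e, e'}, add a dummy
    vertex standing for v_{e,e'} joined to every endpoint of e and of
    e'.  Edges are 2-tuples of vertices; the result is planar by the
    assumption on L."""
    crossing = {frozenset(e) for pair in crossing_pairs for e in pair}
    h_vertices = list(vertices)
    h_edges = [e for e in edges if frozenset(e) not in crossing]
    for e, f in crossing_pairs:
        dummy = ('v', frozenset(e), frozenset(f))
        h_vertices.append(dummy)
        for endpoint in set(e) | set(f):   # only 3 endpoints if e and
            h_edges.append((dummy, endpoint))  # e' share one
    return h_vertices, h_edges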
Fact 2. Let ve,e′ be a vertex in H but not in G. Suppose that x and y are two consecutive neighbors of ve,e′ in H such that {x, y} is an edge in H. Then, the cycle C formed by the three edges {ve,e′, x}, {x, y}, and {y, ve,e′} together is the boundary of some face of H.

Lemma 1. We can modify H in linear time so that H satisfies the following conditions:
1. H has the same vertices as G, and contains all edges of G.
2. For each vertex v of H and for every two consecutive neighbors u and w of v in H, {u, w} is an edge in H and crosses no edge in H.
3. For each pair of edges of H that cross in H, the endpoints of the two edges induce a clique of size 4 in H.

Hereafter, without loss of generality (by Lemma 1), we assume the following:

Assumption 1. H satisfies the conditions in Lemma 1.

Now, by Condition 1 in Lemma 1, it suffices to 7-color H in order to 7-color G. Hereafter, we will work on H instead of G.

Corollary 1. The following two statements hold:
1. Let v be a vertex in H. Suppose that u is a neighbor of v in H such that edge {v, u} crosses another edge {x, y} in H. Then, {x, y} ⊆ NH(v), and x, u, y appear around v consecutively in H in this order (clockwise or counterclockwise).
2. Let u and w be two consecutive neighbors of v in H. Then, at least one of edges {v, u} and {v, w} crosses no edge in H.

3.1
A Structure Lemma
Fix two constants α and K with 1 < α < 2 and K > 7 + 9/(α − 1). Let v be a vertex of H. If dH(v) ≤ K, we say that v is small; otherwise, we say that v is large. We say that v is reducible if one of the following holds:
1. dH(v) ≤ 6.
2. dH(v) = 7 and NH(v) contains at most one large vertex.
3. dH(v) = 8, NH(v) contains no large vertex, and one of the following holds:
(a) There are at most two vertices u ∈ NH(v) with dH(u) ≥ 9.
(b) There are exactly three vertices u ∈ NH(v) with dH(u) ≥ 9 and there are distinct vertices u1, u2, u3 in NH(v) such that dH(u1) ≥ 9, dH(u2) ≥ 9, dH(u3) ≤ 8, and {v, u2} and {u1, u3} are edges of H and cross in H.

Lemma 2. Let R be the set of reducible vertices in H. Then, R contains a constant fraction of vertices of H.

Corollary 2. We can compute a set I of reducible vertices of H in linear time such that the following conditions are satisfied:
1. I contains a constant fraction of vertices of H.
2. For every two vertices u and v in I, there is no path P between u and v in H such that P has at most three edges and has no large vertex.
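For concreteness, here is how the reducibility conditions can be tested around a single vertex; the adjacency and crossing encodings, the helper name, and the sample constant are our own assumptions (any K > 7 + 9/(α − 1) with α ∈ (1, 2) works, e.g. K = 17 for α = 1.99).

def is_reducible(adj, v, crossings, K=17):
    """Test conditions 1-3 above for a vertex v of H.  adj maps each
    vertex to its neighbor set; crossings is the set of unordered pairs
    of crossing edges of H, each edge a frozenset of its endpoints."""
    deg = lambda u: len(adj[u])
    if deg(v) <= 6:                                        # condition 1
        return True
    if deg(v) == 7:                                        # condition 2
        return sum(deg(u) > K for u in adj[v]) <= 1
    if deg(v) == 8 and all(deg(u) <= K for u in adj[v]):   # condition 3
        big = [u for u in adj[v] if deg(u) >= 9]
        if len(big) <= 2:                                  # case (a)
            return True
        if len(big) == 3:                                  # case (b)
            xset = {frozenset(p) for p in crossings}
            return any(u1 != u2 and deg(u3) <= 8 and
                       frozenset({frozenset({v, u2}),
                                  frozenset({u1, u3})}) in xset
                       for u1 in big for u2 in big for u3 in adj[v])
    return False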
3.2
Outline of the Algorithm
We first give an outline of the algorithm. It first computes a set I of reducible vertices of H satisfying the conditions in Corollary 2. It then uses I and H to construct a new 1-planar graph G′ in linear time such that the number of vertices in G′ is a constant fraction of the number of vertices in H and a 7-coloring of H can be constructed in linear time from an arbitrarily given 7-coloring of G′. It further recurses on G′ to obtain a 7-coloring of G′ which is then used to obtain a 7-coloring of H in linear time. Since each recursion takes linear time and reduces the size of the graph by a constant fraction, the overall time is linear. The core of the algorithm is in the construction of G′; a schematic of the recursion appears below.
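The following skeleton shows the shape of that recursion; compute_I, contract, and extend_coloring are placeholders for the constructions of Corollary 2, Section 3.3, and the color-extension rules, not real implementations.

def compute_I(H):
    """Corollary 2: a well-separated set of reducible vertices (stub)."""
    raise NotImplementedError

def contract(H, I):
    """Section 3.3: the smaller 1-planar graph G' plus an undo log of
    the removals and merges performed around each vertex of I (stub)."""
    raise NotImplementedError

def extend_coloring(H, coloring, log):
    """Replay the log backwards; each restored vertex takes either the
    color of the vertex it was merged into or a color unused in its
    neighborhood, which the constructions guarantee exists (stub)."""
    raise NotImplementedError

def seven_color(H):
    """Schematic of the linear-time recursion outlined above."""
    if len(H) <= 7:                       # few vertices: color directly
        return {v: i for i, v in enumerate(H)}
    I = compute_I(H)
    G_prime, log = contract(H, I)
    return extend_coloring(H, seven_color(G_prime), log)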
3.3 Constructing Graph G′ for Recursion
To construct G′, we may simply remove all v ∈ I with dH(v) ≤ 6 from H because each 7-coloring of H − v extends to a 7-coloring of H. Similarly, for each v ∈ I such that NH(v) contains a vertex u with dH(u) ≤ 6, we may remove u from H. However, these are not enough because I may contain very few such vertices v. So, we need to do something about those vertices v ∈ I such that 7 ≤ dH(v) ≤ 8 and NH(v) contains no vertex u with dH(u) ≤ 6. We call such vertices v critical vertices. The idea is to explore the neighborhood structure of critical vertices. First, we need the following definitions:

Definition 1. A vertex x in H is dangerous for a critical vertex v if one of the following holds:
– dH(v) = 7 and x is a large neighbor of v in H.
– dH(v) = 8, x ∉ NH(v) ∪ {v}, and x is adjacent to some vertex u ∈ NH(v) in H.

Note that a vertex x may be dangerous for more than one critical vertex.

Definition 2. Let v be a critical vertex with dH(v) = 7. A mergable pair for v is a pair (u, w) of two nonadjacent neighbors of v in H such that u is small and the graph G1 obtained from H − v by merging u, w is a 1-planar graph.

In Definition 2, w may be dangerous for v. Moreover, no matter whether w is dangerous for v, w remains in G1. The intuition behind Definition 2 is that we can extend a given 7-coloring of G1 to a 7-coloring of H as follows: let u have the color of w, and then let v have a color that is assigned to no vertex in NH(v).

Definition 3. Let v be a critical vertex with dH(v) = 8.
1. A mergable triple for v is a set {u1, u2, u3} of three pairwise nonadjacent neighbors of v in H such that the graph G2 obtained from H − v by merging u1, u2, u3 is a 1-planar graph.
2. Two simultaneously mergable pairs for v are two pairs (u1, u2) and (w1, w2) such that u1, u2, w1, and w2 are distinct neighbors of v in H, neither {u1, u2} nor {w1, w2} is an edge of H, and the graph G3 obtained from H − v by merging u1, u2 and merging w1, w2 is a 1-planar graph.
3. A desired quadruple for v is an ordered list (u, w1, w2, w3) of four distinct neighbors of v in H such that
– dH(u) ≤ 8,
– {w1, w2} ⊆ NH(u), and
– {w1, w2, w3} is an independent set in H,
and the graph G4 obtained from H − {v, u} by merging w1, w2, w3 is a 1-planar graph.
4. A favorite quintuple for v is an ordered list (u, w1, w2, w3, w4) of five distinct neighbors of v in H such that
– dH(u) ≤ 8,
– {w1, w2} ⊆ NH(u), and
– neither {w1, w2} nor {w3, w4} is an edge in H,
and the graph G5 obtained from H − {v, u} by merging w1, w2 and merging w3, w4 is a 1-planar graph.
5. A desired quintuple for v is an ordered list (u1, u2, w1, w2, w3) of five distinct neighbors of v in H such that
– dH(u1) ≤ 8 and dH(u2) ≤ 8,
– {w1, w2} ⊆ NH(u1) and {w2, w3} ⊆ NH(u2), and
– {w1, w2, w3} is an independent set in H,
and the graph G6 obtained from H − {v, u1, u2} by merging w1, w2, w3 is a 1-planar graph.
6. A desired sextuple for v is an ordered list (u1, u2, w1, w2, w3, w4) of six distinct neighbors of v in H such that
– dH(u1) ≤ 8 and dH(u2) ≤ 8,
– {w1, w2} ⊆ NH(u1) and {w3, w4} ⊆ NH(u2), and
– neither {w1, w2} nor {w3, w4} is an edge in H,
and the graph G7 obtained from H − {v, u1, u2} by merging w1, w2 and merging w3, w4 is a 1-planar graph.
7. A useful sextuple for v is an ordered list (u1, u2, w1, x1, w2, x2) of six distinct vertices in H such that
– dH(u1) ≤ 8, dH(u2) ≤ 8, and {u1, u2} ⊆ {v} ∪ NH(v),
– w1 ∈ {v} ∪ NH(v) and w2 ∈ {v} ∪ NH(v),
– {w1, x1, w2, x2} ⊆ NH(u1) and {u1, w1, x1} ⊆ NH(u2), and
– neither {w1, x1} nor {w2, x2} is an edge in H,
and the graph G8 obtained from H − {u1, u2} by merging w1, x1 and merging w2, x2 is a 1-planar graph.
8. A useful triple for v is an ordered list (u, w, x) of three distinct vertices in H such that
– dH(u) ≤ 7 and u ∈ NH(v),
– w ∈ NH(v) and {w, x} ⊆ NH(u), and
– {w, x} is not an edge in H,
and the graph G9 obtained from H − {u} by merging w, x is a 1-planar graph.
In Definition 3(7), x1 and x2 may be dangerous for v. Also, in Definition 3(8), x may be dangerous for v.

The intuitions behind Definitions 3(1) and 3(2) are similar to that of Definition 2. The intuition behind Definition 3(3) is that we can extend a given 7-coloring of the graph G4 to a 7-coloring of H as follows: let w1 and w2 have the color of w3, let u have a color assigned to no vertex in NH(u) − {v}, and further let v have a color assigned to no vertex in NH(v) before. The intuition behind Definition 3(4) is that we can extend a given 7-coloring of the graph G5 to a 7-coloring of H as follows: let w1 have the color of w2, let w3 have the color of w4, let u have a color assigned to no vertex in NH(u) − {v}, and further let v have a color assigned to no vertex in NH(v) before. The intuition behind Definition 3(5) is that we can extend a given 7-coloring of the graph G6 to a 7-coloring of H as follows: let w1 and w2 have the color of w3, let u2 have a color assigned to no vertex in NH(u2) − {v}, let u1 have a color assigned to no vertex in NH(u1) − {v}, and further let v have a color assigned to no vertex in NH(v) before. The intuition behind Definition 3(6) is that we can extend a given 7-coloring of the graph G7 to a 7-coloring of H as follows: let w1 have the color of w2, let w3 have the color of w4, let u2 have a color assigned to no vertex in NH(u2) − {v}, let u1 have a color assigned to no vertex in NH(u1) − {v}, and further let v have a color assigned to no vertex in NH(v) before. The intuition behind Definition 3(7) is that we can extend a given 7-coloring of the graph G8 to a 7-coloring of H as follows: let w1 have the color of x1, let w2 have the color of x2, let u2 have a color assigned to no vertex in NH(u2) − {u1} before, and further let u1 have a color assigned to no vertex in NH(u1) before. Finally, the intuition behind Definition 3(8) is that we can extend a given 7-coloring of the graph G9 to a 7-coloring of H as follows: let w have the color of x, and then let u have a color assigned to no vertex in NH(u) before.

The following theorem can be proved by a case analysis. The proof is very tedious (15 pages long).

Theorem 1. For a critical vertex v, call an edge e in H a basic critical edge for v if at least one endpoint of e is v or a small neighbor of v in H. Moreover, for a critical vertex v, call an edge e in H a critical edge for v if e is a basic critical edge for v or e crosses a basic critical edge for v in H. Then, for every critical vertex v, the following hold:
1. If dH(v) = 7, then we can use the sub-embedding of H induced by the set of critical edges for v to find a mergable pair for v in O(1) time such that the graph G1 defined in Definition 2 has a 1-plane embedding H′ satisfying the following three conditions:
(C1) For every pair of edges e1 and e2 in H′, e1 and e2 cross each other in embedding H′ if and only if they cross each other in embedding H.
(C2) For every vertex x in H that is neither v nor a small neighbor of v in H, and for every sequence e1, . . . , ek of edges in H that are incident to x but incident to neither v nor a small neighbor of v, if edges e1, . . . , ek appear around x consecutively in this order in embedding
H, then edges e1, . . . , ek appear around x consecutively in this order in embedding H′.
(C3) Same as (C2) but with both occurrences of the word “consecutively” deleted.
2. If dH(v) = 8, then we can use the sub-embedding of H induced by the set of critical edges for v to compute one of the following for v in O(1) time:
– A mergable triple such that the graph G2 defined in Definition 3(1) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– Two simultaneously mergable pairs such that the graph G3 defined in Definition 3(2) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– A desired quadruple such that the graph G4 defined in Definition 3(3) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– A favorite quintuple such that the graph G5 defined in Definition 3(4) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– A desired quintuple such that the graph G6 defined in Definition 3(5) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– A desired sextuple such that the graph G7 defined in Definition 3(6) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– A useful sextuple such that the graph G8 defined in Definition 3(7) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
– A useful triple such that the graph G9 defined in Definition 3(8) has a 1-plane embedding H′ satisfying the above conditions (C1) through (C3).
In the constructions of graphs G1 through G9 (cf. Definitions 2 and 3), we may merge a dangerous vertex for v only with a small vertex in {v} ∪ NH(v) (and the dangerous vertex remains after the merging operation), may delete only v and/or some small vertices in NH(v), and may touch only some critical edges for v.

Note that the set of critical edges for a critical vertex is disjoint from the set of critical edges for another critical vertex, because of the second condition in Corollary 2. Thus, Conditions (C1) through (C3) together guarantee that for each critical vertex v, we can find and use a mergable pair, a mergable triple, two simultaneously mergable pairs, a desired quadruple, a favorite quintuple, a desired quintuple, a desired sextuple, a useful sextuple, or a useful triple for v to modify H in such a way that after the modification, we can still find a mergable pair, a mergable triple, two simultaneously mergable pairs, a desired quadruple, a favorite quintuple, a desired quintuple, a desired sextuple, a useful sextuple, or a useful triple for each other critical vertex.

Now, we are ready to explain how to construct G′. The construction of G′ from H is done as follows.
1. For each critical vertex v with dH(v) = 7, find a mergable pair for v as guaranteed in Theorem 1.
2. For each critical vertex v with dH(v) = 8, find a mergable triple, two simultaneously mergable pairs, a desired quadruple, a favorite quintuple, a desired quintuple, a desired sextuple, a useful sextuple, or a useful triple for v as guaranteed in Theorem 1.
3. For each critical vertex v with dH(v) = 7 and the mergable pair (u, w) found for v in Step 1, remove v from H and further merge u, w.
4. For each critical vertex v with dH(v) = 8, perform the following:
(a) If a mergable triple {u1, u2, u3} was found for v in Step 2, then remove v from H and further merge u1, u2, u3.
(b) If two simultaneously mergable pairs (u1, u2) and (w1, w2) were found for v in Step 2, then remove v from H, merge u1, u2, and further merge w1, w2.
(c) If a desired quadruple (u, w1, w2, w3) was found for v in Step 2, then remove v and u from H, and further merge w1, w2, w3.
(d) If a favorite quintuple (u, w1, w2, w3, w4) was found for v in Step 2, then remove v and u from H, merge w1, w2, and further merge w3, w4.
(e) If a desired quintuple (u1, u2, w1, w2, w3) was found for v in Step 2, then remove v, u1, and u2 from H, and further merge w1, w2, w3.
(f) If a desired sextuple (u1, u2, w1, w2, w3, w4) was found for v in Step 2, then remove v, u1, and u2 from H, merge w1, w2, and further merge w3, w4.
(g) If a useful sextuple (u1, u2, w1, x1, w2, x2) was found for v in Step 2, then remove u1 and u2 from H, merge w1, x1, and further merge w2, x2.
(h) If a useful triple (u, w, x) was found for v in Step 2, then remove u from H and further merge w, x.
5. Remove all v ∈ I with dH(v) ≤ 6 from H.
6. For each v ∈ I such that NH(v) contains a vertex u with dH(u) ≤ 6, remove all such vertices u from H.

By the discussion in the paragraph succeeding Theorem 1, the merging and removal operations in Steps 3 through 6 do not interfere with each other. It is also easy to see that the construction of G′ takes O(|I|) time (and hence linear time). Recall that a vertex x may be dangerous for more than one critical vertex. So, during the construction of G′, it is possible that after a dangerous vertex x for some critical vertex v is merged with a small vertex in {v} ∪ NH(v), x is merged with a small vertex in {v′} ∪ NH(v′) later, where v′ ≠ v is a critical vertex for which x is dangerous too. Fortunately, the second condition in Corollary 2 guarantees that during the construction of G′, adjacent vertices are never merged together.

By Condition (C1), G′ has a 1-plane embedding H′ such that for every pair of edges e1 and e2 in G′, e1 and e2 cross each other in embedding H′ if and only if they cross each other in embedding H. Thus, we can compute a list L′ of disjoint (unordered) pairs of edges of G′ in linear time such that G′ has a 1-plane embedding in which the two edges in each pair in L′ cross while no two other edges of G′ cross.
4
NP-Completeness of 4-Colorability
Since planar graphs are special 1-planar graphs and it is NP-complete to decide whether a given planar graph is 3-colorable, it is also NP-complete to decide whether a given 1-planar graph is 3-colorable. We can show the following:

Theorem 2. It is NP-complete to decide whether a given 1-planar graph is 4-colorable.

It is natural to consider the problem of deciding whether a given 1-planar graph is 5-colorable. Unfortunately, we still do not know whether this problem is NP-complete. This is an open question.
References
1. K. Appel and W. Haken. Every planar map is four colorable, Part I: Discharging. Illinois J. Math., 21:429–490, 1977.
2. K. Appel, W. Haken, and J. Koch. Every planar map is four colorable, Part II: Reducibility. Illinois J. Math., 21:491–567, 1977.
3. D. Archdeacon. Coupled colorings of planar graphs. Congres. Numer., 39:89–94, 1983.
4. O. V. Borodin. Solution of Ringel’s problems on vertex-face coloring of planar graphs and coloring of 1-planar graphs (in Russian). Met. Discret. Anal., Novosibirsk, 41:12–26, 1984.
5. O. V. Borodin. A new proof of the 6 color theorem. J. Graph Theory, 19:507–521, 1995.
6. Z.-Z. Chen. Approximation algorithms for independent sets in map graphs. J. Algorithms, 41:20–40, 2001.
7. Z.-Z. Chen, M. Grigni, and C. H. Papadimitriou. Planar map graphs. In Proc. ACM STOC’98, pages 514–523, 1998.
8. Z.-Z. Chen, M. Grigni, and C. H. Papadimitriou. Map graphs. J. ACM, 49:127–138, 2002.
9. N. Chiba, T. Nishizeki, and N. Saito. A linear 5-coloring algorithm of planar graphs. J. Algorithms, 8:470–479, 1981.
10. M. Chrobak and K. Diks. Two algorithms for coloring planar graphs with 5 colors. Tech. Report, Columbia University, January 1987.
11. G. N. Frederickson. On linear-time algorithms for five-coloring planar graphs. Inform. Process. Lett., 19:219–224, 1984.
12. D. W. Matula, Y. Shiloach, and R. E. Tarjan. Two linear-time algorithms for five-coloring a planar graph. Tech. Report STAN-CS-80-830, Stanford University, November 1980.
13. O. Ore and M. D. Plummer. Cyclic coloration of planar graphs. In Recent Progress in Combinatorics (Proc. 3rd Waterloo Conf. on Combinatorics, 1968), pages 287–293. Academic Press, New York–London, 1969.
14. G. Ringel. Ein Sechsfarbenproblem auf der Kugel. Abh. Math. Sem. Univ. Hamburg, 29:107–117, 1965.
15. M. H. Williams. A linear algorithm for colouring planar graphs with five colours. Comput. J., 28:78–81, 1985.
Generalized Satisfiability with Limited Occurrences per Variable: A Study through Delta-Matroid Parity
Victor Dalmau1 and Daniel K. Ford2
1
Universitat Pompeu Fabra
[email protected] 2 UC Santa Cruz
[email protected]

Abstract. In this paper we examine generalized satisfiability problems with limited variable occurrences. First, we show that 3 occurrences per variable suffice to make these problems as hard as their unrestricted version. Then we focus on generalized satisfiability problems with at most 2 occurrences per variable. It is known that some NP-complete generalized satisfiability problems become polynomially solvable when only 2 occurrences per variable are allowed. We identify two new families of generalized satisfiability problems, called local and binary, that are polynomially solvable when only 2 occurrences per variable are allowed. We achieve this result by means of a reduction to the Δ-matroid parity problem, which is another important theme of this work.
1
Introduction and Summary of Results
Satisfiability problems are studied intensively in theoretical computer science. One of the main lines of research consists in identifying the restrictions of the general satisfiability problem that give rise to tractable problems. In an attempt to define a common framework to study different classes of satisfiability problems Schaefer defined the class of generalized satisfiability problems [16]. He observed that many of the variants of the satisfiability problem considered in the literature, for example Horn and 2-CNF formulas, can be obtained by restricting the type of clauses that are allowed in a formula. In the generalized satisfiability framework such restrictions are implemented by means of a fixed basis, or set of Boolean relations, that define the basic building blocks from which formulas are created. Many families of satisfiability problems can be expressed using this formalism. The following example illustrates how 3-SAT can be expressed as a generalized satisfiability problem. The 3-clause, x ∨ y ∨ ¬z, can be expressed as the atomic formula, R(x, y, z), where R is a Boolean relation consisting of every assignment to x, y, z that satisfies x ∨ y ∨ ¬z, that is, {0, 1}3 \ {(0, 0, 1)}. Thus, if we let the basis Γ be the set consisting of the four relations, (1) {0, 1}3 \ {(0, 0, 0)}, (2) {0, 1}3 \ {(0, 0, 1)}, (3) {0, 1}3 \ {(0, 1, 1)}, and (4) {0, 1}3 \ {(1, 1, 1)}, then SAT (Γ ) is 3-SAT . Every 3-CN F formula can be generated, clause by clause, by picking the appropriate relation from Γ and assigning to
Research partially conducted when visiting UC Santa Cruz. Supported by NSF grant No. CCR9610257 and Spanish MCyT under projects TIC2002-04019-C03 and TIC2002-04470-C03. Ford was partially supported by NSF Grant No. ISS-9907419.
it the appropriate variables. Every finite collection of relations, Γ, defines the generalized satisfiability problem, SAT(Γ).

Schaefer’s Dichotomy Theorem showed that generalized satisfiability problems are either in P or they are NP-complete. This is of particular interest because of Ladner’s Theorem, which states that if P ≠ NP then there exist problems in NP that are neither NP-complete nor in P [12]. Schaefer’s Dichotomy Theorem additionally gives a complete classification of the computational complexity of generalized satisfiability problems: SAT(Γ) is in the class P if Γ is a subset of one of the following six families (1) Horn relations, (2) dual-Horn relations, (3) bijunctive relations, (4) affine relations, (5) 0-valid relations, and (6) 1-valid relations, and NP-complete otherwise [16].

The main motivation of this paper is to bring into the picture the number of occurrences per variable. It is well known that some NP-complete satisfiability problems become tractable when the maximum number of occurrences per variable is bounded. For example, Tovey showed that (3, 4)-SAT, that is, 3-SAT with at most four occurrences of each variable in the formula and at most one occurrence per clause, is NP-complete, and that, by a direct application of P. Hall’s Marriage Theorem, every (k, k)-SAT formula is satisfiable [17]. He also conjectured that there exists an exponential function l(k) such that every (k, l(k))-SAT formula is satisfiable. Kratochvíl et al. confirmed this conjecture and showed that (k, l(k) + 1)-SAT is NP-complete [11].

The study of generalized satisfiability problems with a bounded number of occurrences per variable was introduced independently in [7] and [10]. It was motivated by the fact that (1) some NP-complete satisfiability problems become solvable in polynomial time when the number of occurrences per variable is bounded and (2) the generalized satisfiability framework is a well established setting for the uniform analysis of satisfiability problems. The ultimate goal here is to find a refinement of Schaefer’s Dichotomy Theorem in which the maximum number of occurrences per variable is taken into account. More precisely, we let SAT(k, Γ) contain those formulas (instances) of SAT(Γ) in which every variable occurs at most k times, and we are interested in determining, for every positive integer k and every basis Γ, the computational complexity of SAT(k, Γ).

Matching on graphs with bounded degree is a typical example of a problem that can be expressed as a generalized satisfiability problem with a bounded number of occurrences per variable. Edges of the graph are represented by variables. Each vertex of degree j is represented by an ordered set of j variables, one per incident edge, and the j-ary Boolean relation 1-in-j: the j-element set of all j-dimensional unit vectors, e.g. the relation 1-in-3 is {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. A 1 in the relation corresponds to an edge that is assigned to the final matching and a 0 corresponds to an edge that is not. The relation 1-in-j constrains variable assignments to those that assign exactly one incident edge, of the corresponding vertex, to the final matching. Since every edge is incident to two vertices every variable must occur exactly twice. If we let Γm be the set consisting of the m relations 1-in-j for 1 ≤ j ≤ m, then SAT(2, Γm) can express every instance of the matching problem on graphs with degree ≤ m; the sketch below makes the encoding explicit.
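Here is that encoding in executable form, together with a naive satisfiability check used only to sanity-test it on tiny graphs; the data layout and function names are our own assumptions.

from itertools import product

def matching_instance(edges):
    """Encode a graph as an instance of SAT(2, Gamma_m): one Boolean
    variable per edge, and for each vertex of degree j one constraint
    applying the relation 1-in-j to the variables of its incident
    edges.  Each variable occurs in exactly two constraints."""
    incident = {}
    for idx, (u, v) in enumerate(edges):
        incident.setdefault(u, []).append(idx)
        incident.setdefault(v, []).append(idx)
    one_in = lambda j: {tuple(int(i == k) for i in range(j))
                        for k in range(j)}          # the relation 1-in-j
    return [(vs, one_in(len(vs))) for vs in incident.values()]

def satisfiable(constraints, nvars):
    """Brute-force check, for illustration only."""
    return any(all(tuple(a[i] for i in vs) in rel for vs, rel in constraints)
               for a in product((0, 1), repeat=nvars))

# a triangle has no perfect matching, a single edge does:
assert not satisfiable(matching_instance([(0, 1), (1, 2), (0, 2)]), 3)
assert satisfiable(matching_instance([(0, 1)]), 1)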
The problem so encoded is a subclass of the graph matching problem, which Edmonds’ Matching Algorithm solves in polynomial time [6]. However, Schaefer showed that the relation 1-in-3 is neither Horn, dual-Horn, bijunctive, affine, 0-valid, nor 1-valid, thereby proving that SAT(Γm) is NP-complete whenever m ≥ 3 [16]. Schaefer’s Theorem fails to show that the graph matching problem is tractable
because it doesn’t address the case where each variable occurs exactly twice. Istrate was the first to notice that if we limit the formulas of SAT(Γm) to those where variables occur at most twice, then the problem, SAT(2, Γm), reduces in polynomial time to the matching problem on graphs of degree at most m [10]. Hence, generalized satisfiability explains the polynomial solvability of graph matching only when we limit the number of variable occurrences.

Graph matching is not an isolated example of a generalized satisfiability problem that is tractable when variables occur at most twice and NP-complete otherwise. In particular, it is known that if every relation of a basis, Γ, is compact then SAT(2, Γ) is solvable in polynomial time whereas SAT(Γ) is, in general, NP-complete [10]. Another example of a broad family of generalized satisfiability problems that becomes tractable when the number of occurrences is bounded by 2 is the class of bases consisting of co-independent relations [7]. Generalized satisfiability with at most two occurrences per variable is a rich class of problems intimately related to algorithms, concepts and methods of matching theory. Moreover, the computational complexity of these problems remains largely unknown.

In order to simplify matters we only consider in this paper a variant of the generalized satisfiability problems, denoted SATC(Γ), already introduced by Schaefer, in which constants 0 and 1 can be used in the formulas. This assumption, widely used in the literature (see [5]), allows us to simplify the picture considerably. In particular, we are able to show that if k ≥ 3 then SATC(k, Γ) is polynomially equivalent to SATC(Γ) for every basis Γ. This result shows that three occurrences suffice to make a problem as hard as the unrestricted problem. As an immediate corollary we get that many common satisfiability problems, such as 1-in-3 SAT and not-all-equal 3-SAT (see [8] for definitions), are NP-complete with at most 3 occurrences per variable (these problems are known to be tractable with at most 2 occurrences).

Generalized satisfiability with exactly one occurrence per variable, SATC(1, Γ), is trivially tractable: every instance is satisfiable. Hence, we only need to determine the complexity of SATC(2, Γ). It was recently shown that SATC(2, Γ) is polynomially equivalent to SATC(Γ) if Γ contains a relation that is not a Δ-matroid [7]. These two results combine to identify SATC(2, Γ) where every relation in Γ is a Δ-matroid as the only family that may contain additional tractable families that were not previously identified by Schaefer. Thus, the only remaining task is to identify all the bases, Γ, consisting of Δ-matroids such that SATC(2, Γ) is tractable. As we mentioned above, only two families of such bases have been identified so far (prior to the results in this paper): the class of bases consisting of compact relations and the class of bases consisting of co-independent relations [10,7]. In this paper we add two new families of bases to the list: the class of bases consisting of local Δ-matroids and the class of bases consisting of binary Δ-matroids. It is possible to show, using the technique of forbidden minors, that these four classes of Δ-matroids, compact, co-independent, local, and binary, are pairwise incomparable. The two new classes, local and binary, are shown to be tractable via a reduction to the Δ-matroid parity problem (see [9] for definitions).
In addition we give specific conditions under which tractable cases of the Δ-matroid parity problem yield tractable cases of SATC(2, Γ). The only previously known tractable cases of the Δ-matroid parity problem are binary and linear Δ-matroids [13,9]. However, only the
family of binary Δ-matroids yields a tractable case of SATC(2, Γ). The reason for this is that all the known polynomial algorithms for these classes require that a representation, in terms of matrices, of the Δ-matroid be given. Whereas for binary Δ-matroids such a matrix representation can be constructed, it is still open whether this is possible for linear Δ-matroids. On the other hand, the class of local Δ-matroids constitutes a brand new family of tractable cases of the Δ-matroid parity problem. The proof of this result relies on a generalization of the concept of augmenting paths originally introduced by Berge in graph matching [1]. Figure 1 illustrates the current state of knowledge about the computational complexity of SATC(2, Γ).
[Figure 1 is a classification diagram; only its content is summarized here. Schaefer bases (Horn, dual-Horn, bijunctive/2-SAT, affine) are in P. For non-Schaefer bases: with 3 occurrences per variable the problems are NP-complete (this paper); with 2 occurrences, a basis containing a non-Δ-matroid relation is NP-complete (Feder), while bases of compact (Istrate), co-independent (Feder), binary (this paper), and local (this paper) Δ-matroids are in P, and the remaining Δ-matroid cases are open.]

Fig. 1. Computational complexity of generalized satisfiability with ≤ k occurrences per variable
Because of space restrictions, proofs are only available in the full version of this paper, available at http://www.soe.ucsc.edu/˜ford/gslvo2003.ps.gz.
2
Generalized Satisfiability with ≤ k Occurrences per Variable
We introduced generalized satisfiability problems, SAT(Γ), via the example of 3-SAT where clauses were replaced by an ordered set of variables and a relation. We now give a formal definition of SAT(Γ). An r-ary relation, R, is any nonempty subset of {0, 1}r; each relation R has an associated predicate symbol of the same arity; and a basis or constraint set, Γ, is a finite collection {R1, . . . , Rm} of relations. A CNF(Γ)-formula is a finite conjunction of clauses C1 ∧ · · · ∧ Cn such that each clause, Ci, is an atomic formula of the form
R(v1, . . . , vr) where v1, . . . , vr are Boolean variables in an infinite set V, and R is an r-ary relation in Γ. An atomic formula R(v1, ..., vr) is satisfied by a variable assignment f : V → {0, 1} if and only if (f(v1), ..., f(vr)) ∈ R, and a CNF(Γ)-formula is satisfiable if and only if there exists an assignment satisfying all its clauses. It is sometimes customary to assume that we can replace some variables in a CNF(Γ)-formula by the constant symbols 0 and 1, to be interpreted as 0 and 1 respectively. We call any formula obtained this way a CNFC(Γ)-formula. Each basis Γ gives rise to the generalized satisfiability problem SAT(Γ): given a CNF(Γ)-formula, is it satisfiable? The generalized satisfiability problem with constants, SATC(Γ), is defined similarly.

As we mentioned in the introduction, Schaefer completely classified the computational complexity of generalized satisfiability problems [16]. In order to state Schaefer’s Dichotomy Theorem (Theorem 1 below) we introduce the following definitions, where ∨, ∧, and ⊕ act on tuples component-wise. A relation, R, is Horn if x, y ∈ R ⇒ x ∧ y ∈ R, dual-Horn if x, y ∈ R ⇒ x ∨ y ∈ R, bijunctive if x, y, z ∈ R ⇒ (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z) ∈ R, affine if x, y, z ∈ R ⇒ x ⊕ y ⊕ z ∈ R, 1-valid if it contains the tuple (1, 1, . . . , 1), and 0-valid if it contains the tuple (0, 0, . . . , 0). A basis, Γ, is Horn (respectively, dual-Horn, bijunctive, affine, 1-valid, or 0-valid) if every relation in Γ is Horn (respectively, dual-Horn, bijunctive, affine, 1-valid, or 0-valid). We say that Γ is Schaefer if it is Horn, dual-Horn, bijunctive or affine.

Theorem 1. [16] If Γ is Schaefer, 1-valid, or 0-valid then SAT(Γ) is in P; otherwise, it is NP-complete. If Γ is Schaefer, then SATC(Γ) is in P; otherwise, it is NP-complete.

The ultimate goal of this research is to classify the complexity of generalized satisfiability problems (with and without constants) when the number of occurrences per variable is bounded. Although we did not completely achieve this objective we have been able to identify some subclasses of the NP-complete problems identified by Schaefer that become tractable when the number of occurrences per variable is bounded. In what follows we will only deal with generalized satisfiability problems with constants, SATC(Γ).

The first question we asked is how many occurrences per variable suffice to make the restricted problem as hard as the general problem. Let k ≥ 1 be a positive integer. We define CNFC(k, Γ)-formulas to be the subset of CNFC(Γ)-formulas restricted to those formulas where each variable occurs at most k times. SATC(k, Γ) is defined to be the following decision problem: given a CNFC(k, Γ)-formula φ, is it satisfiable? SAT(k, Γ) is defined similarly. The following theorem could be derived from Theorem 3 in [7].

Theorem 2. If k ≥ 3 and Γ is a basis then SATC(k, Γ) is polynomially equivalent to SATC(Γ).

2.1 At Most 2 Occurrences per Variable. Δ-Matroids

For any basis Γ, any generalized satisfiability problem with exactly one occurrence per variable, SATC(1, Γ), is trivially solvable in polynomial time, that is, every instance is satisfiable. Therefore, the only case of unknown complexity is when variables occur
at most twice. The computational complexity of SATC(2, Γ) appears to be quite interesting. Let {1-in-3} be the basis containing the single relation 1-in-3 given by {(0, 0, 1), (0, 1, 0), (1, 0, 0)}. Istrate first observed that SATC(2, {1-in-3}) is solvable in polynomial time, by means of a reduction to the matching problem in graphs, whereas SATC({1-in-3}) is NP-complete [16,10]. The guiding purpose of this work is to identify those bases, Γ, like {1-in-3}, such that SATC(2, Γ) is polynomially solvable and SATC(Γ) is NP-complete. That is, we ask, assuming P ≠ NP, when is tractability contingent on having at most two occurrences per variable?

When variables occur at most twice there are two previously known tractable classes in addition to those identified by Schaefer. SATC(2, Γ) is polynomially solvable when every relation in Γ is compact or when every relation in Γ is co-independent [10,7]. An n-ary relation, R, is co-independent if d(x, y) = 1 for all x, y ∈ {0, 1}n \ R where d is the Hamming distance. We do not provide a definition of compact relations since it is not needed to obtain our results and it is rather involved. On the other hand, it is also known that if a basis, Γ, contains a relation which is not a Δ-matroid relation, then SATC(2, Γ) is not any easier than SATC(Γ) [7]. Thus, any new tractable basis, that is, any non-Schaefer basis, Γ, such that SATC(2, Γ) is solvable in polynomial time, must consist entirely of Δ-matroids. We now give two equivalent definitions of Δ-matroids which will play a central role in our study.

Definition 1. [2] Let E be a finite set and F ⊆ P(E) a collection of subsets of E. The pair (E, F) is called a set system. The set E is called the universe and the subsets in F are called the feasible sets of (E, F). Let Δ be the symmetric difference operator, that is, for two sets A and B, AΔB = (A ∪ B) \ (A ∩ B). Then (E, F) is a Δ-matroid if F satisfies the following symmetric exchange axiom: ∀A, B ∈ F and ∀x ∈ AΔB, ∃y ∈ AΔB such that AΔ{x, y} ∈ F. Notice that y is not necessarily different from x.

Definition 2. [14] Let R ⊆ {0, 1}r be a relation. Let x, y, x′ ∈ {0, 1}r; then x′ is a step from x to y if d(x, x′) = 1 and d(x, x′) + d(x′, y) = d(x, y). R is a Δ-matroid (relation) if it satisfies the following two-step axiom: ∀x, y ∈ R and ∀x′ a step from x to y, either x′ ∈ R or ∃x″ ∈ R which is a step from x′ to y.

A Δ-matroid relation can be obtained as the set of indicator tuples of the feasible sets of a Δ-matroid. More formally, let (E, F) be a Δ-matroid where E = {u1, . . . , un}. Thus, the n-ary relation R containing for every F ∈ F the tuple tF given by tF[i] = 1 if ui ∈ F and 0 otherwise is a Δ-matroid relation. We say that R is obtained from (E, F) (via the ordering u1, . . . , un). Conversely, given an n-ary relation R we can construct a Δ-matroid (E, F) in the following way: E = {1, . . . , n} and F contains a feasible set F if and only if there exists a tuple t ∈ R such that F = {1 ≤ i ≤ n | t[i] = 1}. Consequently, Δ-matroids and Δ-matroid relations are essentially different mathematical embodiments of the same concept. We will change freely between the set system and the relation formalism of Δ-matroids as convenient. In the process we will often abuse notation and let R denote both the Δ-matroid relation and the Δ-matroid from which it is obtained. Now we have introduced all the necessary conceptual machinery to formally state Feder’s result.

Theorem 3. [7] Let Γ be a basis that contains a relation that is not a Δ-matroid.
Then SATC (2, Γ ) is polynomially equivalent to SATC (Γ ).
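Definition 2 is easy to test by brute force on small relations; the following sketch (our own code and examples, not from the paper) does exactly that.

from itertools import product

def is_delta_matroid(R, r):
    """Check the two-step axiom of Definition 2 for an r-ary relation R,
    given as a set of 0/1 tuples.  A step from x toward y flips one
    coordinate on which x and y differ."""
    def steps(x, y):
        return [x[:i] + (1 - x[i],) + x[i + 1:]
                for i in range(r) if x[i] != y[i]]
    for x, y in product(R, repeat=2):
        for x1 in steps(x, y):
            if x1 not in R and not any(x2 in R for x2 in steps(x1, y)):
                return False
    return True

# 1-in-3 is a delta-matroid, while "all three equal" is not:
assert is_delta_matroid({(1, 0, 0), (0, 1, 0), (0, 0, 1)}, 3)
assert not is_delta_matroid({(0, 0, 0), (1, 1, 1)}, 3)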
3 The Δ-Matroid Parity Problem

When Γ is a Δ-matroid basis the problem SATC(2, Γ) is intimately related with a problem in combinatorics called the Δ-matroid parity problem. The Δ-matroid parity problem was shown to be equivalent to the Δ-covering problem and is a generalization of the well known matroid parity problem [9,4,13]. Let (E, F) be a Δ-matroid and L a partition of E into pairs. For every u ∈ E, its mate will be denoted by ū, that is, ū is the only element in E such that {u, ū} ∈ L. Let F ∈ F be a feasible set. We will let LF denote the subset of L containing those pairs {u, ū} ∈ L such that either both u and ū are in F or neither u nor ū is in F. An instance of the Δ-matroid parity problem consists of a Δ-matroid, (E, F), and a partition, L, of E into pairs. The goal is to find a feasible set F ∈ F such that |LF| is maximum, that is, at least as large as |LG| for any other G ∈ F.

For computational complexity purposes, the Δ-matroid parity problem, as defined above, is not adequately described: it is not clear what the input of the problem is, that is, how is the Δ-matroid given? If the Δ-matroid is specified in its entirety then the problem is trivially tractable by exhaustive search. At the other end of the spectrum a Δ-matroid, (E, F), can be specified via a feasible oracle. That is, given F ⊆ E the oracle tells whether F is in F or not. Unless explicitly stated we will assume that the Δ-matroid is specified by means of a feasible oracle. We will also assume that one feasible set is always available. Using this representation, we say that the Δ-matroid parity problem is polynomially solvable if it can be solved in time polynomial in the size |E| of the universe of the Δ-matroid. Much of the work on the Δ-matroid parity problem (and on the matroid parity problem) uses this representation (see for example [9]). Lovász showed that the Δ-matroid parity problem, with feasible oracle representation, requires time exponential in the size of the universe even if the Δ-matroid of the instance is restricted to be a matroid [15]. However, the problem is polynomial-time solvable if the Δ-matroid in an instance is either linear or binary and is specified by its matrix representation [9].

In order to make precise the intimate relationship between generalized satisfiability problems with at most two occurrences per variable and the Δ-matroid parity problem we introduce the following definitions. Let S be a collection of Δ-matroids. The S-parity problem contains all the instances ((E, F), L) of the Δ-matroid parity problem such that (E, F) belongs to S. Let (E1, F1) and (E2, F2) be two Δ-matroids such that E1 ∩ E2 = ∅. Let E = E1 ∪ E2 and F = {F1 ∪ F2 : F1 ∈ F1, F2 ∈ F2}. Then (E, F) is the direct sum of (E1, F1) and (E2, F2). It is easy to see that the direct sum of two Δ-matroids is a Δ-matroid. A set S of Δ-matroids is closed under direct sum if the direct sum of any two elements in S is an element in S. Two Δ-matroids (E1, F1), (E2, F2) are called isomorphic if there exists some bijection h : E1 → E2 such that for every u1, . . . , un ∈ E1, {u1, . . . , un} ∈ F1 iff {h(u1), . . . , h(un)} ∈ F2. A set S of Δ-matroids is closed under isomorphism if every Δ-matroid (E2, F2) isomorphic to an element (E1, F1) of S belongs also to S.

Lemma 1. Let S be any collection of Δ-matroids closed under direct sum and under isomorphism such that the S-parity problem is polynomially solvable, and let Γ be a basis consisting of relations from S. Then SATC(2, Γ) is in P.
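As a reference point for what is being optimized, here is the objective |LF| and an exhaustive-search solver over a feasibility oracle (our own code; exponential, as the Lovász bound quoted above says is unavoidable in general with only oracle access).

from itertools import chain, combinations

def parity_value(F, L):
    """|L_F|: the number of pairs of L with both or neither element in F."""
    return sum(len(set(pair) & set(F)) != 1 for pair in L)

def solve_parity(E, feasible, L):
    """Return a feasible subset of E maximizing |L_F| by scanning all
    2^|E| subsets and querying the feasibility oracle on each."""
    best = None
    for F in chain.from_iterable(combinations(E, k)
                                 for k in range(len(E) + 1)):
        F = frozenset(F)
        if feasible(F) and (best is None or
                            parity_value(F, L) > parity_value(best, L)):
            best = F
    return best

For the local and binary classes studied in this paper, Theorem 5 below and the matrix-representation results of [9] are what replace this exhaustive scan with polynomial-time procedures.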
The two new tractable cases of SATC(2, Γ) introduced in this paper, local and binary, are obtained by identifying some polynomial-time solvable cases of the S-parity problem and transforming them, via Lemma 1, into tractable cases of SATC(2, Γ).

3.1 The Method of Augmenting Paths

Beginning with Edmonds' original matching algorithm (see [6]), the notion of an augmenting path has been central to matching theory. In this section we generalize the notion of an augmenting path to make it suitable for the Δ-matroid parity problem. Let ((E, F), L) be an instance of the Δ-matroid parity problem. A path in (E, F) (or simply a path when the Δ-matroid is implicit) is an ordered collection, u1, . . . , un, of distinct elements of E. Let L′ ⊆ L. A path u1, . . . , un is called L′-alternating if: (1) {u1, ū1}, {un, ūn} ∈ L′, and (2) for every 1 ≤ 2j < n, {u2j, u2j+1} ∈ L′. Let F ∈ F be any feasible set. We say that a path u1, . . . , un is an F path if: (1) F △ {u1, . . . , un} ∈ F, and (2) for every 1 < 2j ≤ n, F △ {u1, . . . , u2j} ∈ F (here △ denotes symmetric difference). An LF-alternating F path, u1, . . . , un, is called LF-augmenting (or simply augmenting when L and F are implicit) if either n is odd or {un, ūn} ∈ LF. The basic intuition behind this definition is that if F is a feasible set such that |LF| is not maximum then, by Theorem 4, there exists some LF-augmenting path, u1, . . . , un. This path can be used to obtain a new feasible set G = F △ {u1, . . . , un} which increases the objective function that we intend to maximize (|LG| > |LF|).

Theorem 4. Let ((E, F), L) be an instance of the Δ-matroid parity problem and F ∈ F such that |LF| is not maximum. Then there exists an LF-augmenting path. Furthermore, we can compute it in time polynomial in |E| given a G ∈ F with |LG| > |LF|.

3.2 A New Tractable Case of Δ-Matroid Parity: Local Δ-Matroids

From Theorem 4, the S-parity problem (and hence the SATC(2, Γ) problem when Γ ⊆ S) reduces in polynomial time to the problem of either finding an augmenting path or determining that none exists. The augmenting path approach can be viewed as searching for an augmenting path by extending alternating paths. In graph matching, two odd-length alternating paths to the same node can be extended in the same way. This is not generally the case for parity problems. Intuitively, the local Δ-matroids are a family of Δ-matroids for which this property still holds, that is, extending an alternating F path of a local Δ-matroid is independent of the path. This motivates the term local, and allows the local Δ-matroid parity problem to be solved via augmenting paths without the need for transforms (see [9] for definitions). In particular we show that if a Δ-matroid is local, then the existence of an augmenting path is equivalent to the existence of a specific type of path, called a local path. Furthermore, we show that it takes only polynomial time to (1) find an alternating local path or determine that none exists, and (2) transform an alternating local path into an augmenting path. Consequently, we conclude that the S-parity problem is tractable when S consists of local Δ-matroids. We now give a formal definition of local Δ-matroids in terms of concepts related to augmenting paths. An equivalent characterization of local Δ-matroids in terms of forbidden minors can be found in the full version of the paper.
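Before the formal definitions of local paths, it may help to see the augmentation step of Theorem 4 spelled out: the new feasible set is just the symmetric difference of F with the path, and for an augmenting path the pair count strictly increases. A minimal sketch (all names are ours):

```python
def covered(F, mate):
    # |L_F|: number of pairs {u, mate[u]} with both or neither element in F.
    return sum(1 for u in mate if u < mate[u] and ((u in F) == (mate[u] in F)))

def augment(F, path):
    # G = F symmetric-difference {u1, ..., un}, as in the text.
    return frozenset(F) ^ frozenset(path)

# If `path` is an L_F-augmenting path for a feasible set F, then
# covered(augment(F, path), mate) > covered(F, mate).
```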
Let (E, F) be a Δ-matroid and let F ∈ F be a feasible set. A path u1, . . . , un is called an F-local path (or simply a local path when F is implicit) if: (1) for every 1 < 2j ≤ n, F △ {u2j−1, u2j} ∈ F, and (2) F △ {un} ∈ F if n is odd. A cut of u1, . . . , un is any path u1, . . . , uj where 1 ≤ j ≤ n and j is odd. A shortcut of u1, . . . , un is any path u1, . . . , uj, uk, uk+1, . . . , un where 1 ≤ j < k ≤ n, j is odd, and k is even. A subpath of u1, . . . , un is any path obtained from u1, . . . , un by a (possibly empty) sequence of cuts and shortcuts. A path is F-local-minimal if it is F-local and it does not contain any proper F-local subpath. Let (E, F) be a Δ-matroid relation, and M ≥ 0. (E, F) is called M-local if, for every F ∈ F, every F-local-minimal path of length at most 2M is also an F path. (E, F) is called local if it is M-local for every M ≥ 0.

Theorem 5. If SL is a set of local Δ-matroids then the SL-parity problem is polynomially solvable.
4 Two New Tractable Cases of Generalized Satisfiability with at Most 2 Occurrences per Variable: Local and Binary Δ-Matroids

The general procedure adopted here is to use Lemma 1 to transform tractable families of the Δ-matroid parity problem into corresponding additional tractable families of SATC(2, Γ). This technique is immediately applicable to local Δ-matroids.

Corollary 1. If Γ is a basis consisting of local relations then SATC(2, Γ) is in P.

As previously mentioned in Sec. 3, linear (binary) Δ-matroids are also polynomially solvable cases of the Δ-matroid parity problem. As we did with local Δ-matroids, we would like to use Lemma 1 to transform the tractability of the linear (binary) Δ-matroid parity problem into corresponding additional tractable families of SATC(2, Γ). Unfortunately, Lemma 1 does not apply in these cases, since the linear (binary) Δ-matroid parity problems are only known to be tractable when the instances include a matrix representation of the Δ-matroid. However, in the binary case, we can work around this problem to get the additional tractable family of SATC(2, Γ) where Γ consists of binary Δ-matroids. We continue this section with the definition of the linear (binary) Δ-matroid parity problem, and conclude with Corollary 2, which provides the desired additional tractable case of SATC(2, Γ). Let R ⊆ {0, 1}^n be a Δ-matroid and x ∈ {0, 1}^n. Then R ⊕ x := {x ⊕ y : y ∈ R} is the twisting of R by x. It is easy to see that R ⊕ x is a Δ-matroid. Two Δ-matroids are said to be equivalent if one is a twisting of the other. A matrix A is said to be skew-symmetric if all its diagonal entries are 0 and A = −A^T; it is said to be symmetric if A = A^T. Hence, skew-symmetric matrices over GF(2) are a proper subset of symmetric matrices over GF(2). Let A be a (skew-)symmetric n × n matrix. For x ∈ {0, 1}^n, define A[x] to be the principal submatrix of A indexed by the non-zero entries of x. Define R(A) := {x ∈ {0, 1}^n : rank A[x] = |x|₁}, where |x|₁ is the number of ones in x. Then R(A) is an n-ary Δ-matroid relation [3]. A Δ-matroid R is called linear if it is equivalent to R(A) for some skew-symmetric matrix A, and binary if it is equivalent to R(A) for some symmetric matrix A over GF(2) [9]. (A, x) will be called the matrix representation of the linear or binary
Δ-matroid R(A) ⊕ x (if (A, x) is the matrix representation of R then x ∈ R). The linear (binary) Δ-matroid parity problem is: given a matrix representation for a linear (binary) n-ary Δ-matroid, R, and a partition, L, of {1, . . . , n} into pairs, find the maximum matching in R with respect to L.

Theorem 6 ([9]). The linear and binary Δ-matroid parity problems with matrix representations are in P.

Corollary 2. If Γ is a basis consisting of binary relations then SATC(2, Γ) is in P.
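The definition of a binary Δ-matroid is easy to make executable. The sketch below (our own code) enumerates R(A) = {x : rank A[x] = |x|₁} for a symmetric 0/1 matrix A over GF(2); it is exponential in n and only illustrates the definition, not the polynomial-time parity algorithm of [9].

```python
import itertools

def rank_gf2(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are given as int bitmasks."""
    rank = 0
    for col in range(max((r.bit_length() for r in rows), default=0)):
        pivot = next((i for i in range(rank, len(rows)) if rows[i] >> col & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] >> col & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def binary_delta_matroid(A):
    """R(A) = {x in {0,1}^n : rank A[x] = |x|_1} for a symmetric 0/1 matrix A over GF(2)."""
    n = len(A)
    result = []
    for x in itertools.product((0, 1), repeat=n):
        idx = [i for i in range(n) if x[i]]
        # principal submatrix A[x], rows packed as bitmasks over the chosen columns
        sub = [sum(A[i][j] << c for c, j in enumerate(idx)) for i in idx]
        if rank_gf2(sub) == len(idx):
            result.append(x)
    return result

# Example: A = [[1, 1], [1, 0]] yields R(A) = [(0, 0), (1, 0), (1, 1)].
```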
5 Concluding Remarks
Generalized satisfiability with at most two variable occurrences is a rich class of problems intimately related to algorithms, concepts and methods of matching theory. Moreover, the computational complexity of these problems remains largely unknown.
Acknowledgements We would like to thank Phokion Kolaitis for introducing us to this problem and for his many helpful suggestions.
References

1. C. Berge. Two theorems in graph theory. Proc. Nat. Acad. Sci. U.S.A., 43:842–844, 1957.
2. A. Bouchet and W. H. Cunningham. Delta-matroids, jump systems, and bisubmodular polyhedra. SIAM J. Discrete Math., 8(1):17–32, 1995.
3. A. Bouchet. Representability of Δ-matroids. In Combinatorics, pages 167–182, 1988.
4. A. Bouchet. Coverings and delta-coverings. In Integer Programming and Combinatorial Optimization (Copenhagen, 1995), pages 228–243.
5. N. Creignou, S. Khanna, and M. Sudan. Complexity Classifications of Boolean Constraint Satisfaction Problems. SIAM, 2001.
6. J. Edmonds. Paths, trees, and flowers. Canad. J. Math., 17:449–467, 1965.
7. T. Feder. Fanout limitations on constraint systems. Theoret. Comput. Sci., 255(1-2):281–293, 2001.
8. M. R. Garey and D. S. Johnson. Computers and Intractability. 1979.
9. J. Geelen, S. Iwata, and K. Murota. The linear delta-matroid parity problem. Technical Report RIMS Preprint 1149, Kyoto University, 1997.
10. G. I. Istrate. Looking for a version of Schaefer's dichotomy theorem when each variable occurs at most twice. Technical Report TR652, The University of Rochester, March 1997.
11. J. Kratochvíl, P. Savický, and Z. Tuza. One more occurrence of variables makes satisfiability jump from trivial to NP-complete. SIAM J. Comput., 22(1):203–210, 1993.
12. R. E. Ladner. On the structure of polynomial time reducibility. J. Assoc. Comput. Mach., 22:155–171, 1975.
13. L. Lovász and M. D. Plummer. Matching Theory. 1986.
14. L. Lovász. The membership problem in jump systems. J. Combin. Theory Ser. B, 70(1):45–66, 1997.
15. L. Lovász. The matroid matching problem. In Algebraic Methods in Graph Theory, Vol. I, II (Szeged, 1978), pages 495–517.
16. T. J. Schaefer. The complexity of satisfiability problems. In Theory of Computing, 1978.
17. C. A. Tovey. A simplified NP-complete satisfiability problem. Discrete Appl. Math., 8(1):85–89, 1984.
Randomized Algorithms for Determining the Majority on Graphs

Gianluca De Marco¹ and Andrzej Pelc²

¹ Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, via Moruzzi 1, 56124 Pisa, Italy
[email protected]
² Département d'informatique, Université du Québec en Outaouais, Hull, Québec J8X 3X7, Canada
[email protected]

Abstract. Every node of an undirected connected graph is colored white or black. Adjacent nodes can be compared and the outcome of each comparison is either 0 (same color) or 1 (different colors). The aim is to discover a node of the majority color, or to conclude that there is the same number of black and white nodes. We consider randomized algorithms for this task and establish upper and lower bounds on their expected running time. Our main contributions are lower bounds showing that some simple and natural algorithms for this problem cannot be improved in general.
1 Introduction
Given an undirected connected n-node graph G = (V, E), any assignment of colors white or black to the nodes of G such that there are at most n/2 black nodes, is referred to as a coloring of G. Given such a coloring, adjacent nodes can be compared and the outcome of each comparison is either 0 (same color) or 1 (different colors). The aim is to discover a white node, in the case when white nodes form a strict majority, or else to conclude that there is the same number of black and white nodes, using as few comparisons as possible. This problem has been investigated in [2,3,10] for the complete graph, i.e., in the case when any pair of nodes can be compared. It has been proved in [10,2] that the minimum worst-case number of comparisons to deterministically solve this problem is n − ν(n), where ν(n) is the number of 1-bits in the binary representation of n. In [3] the minimum average-case number of comparisons was investigated for this problem. The above problem has a similar flavor to that of diagnosis of multiprocessor systems, introduced in [9]. Such a system is represented by an undirected
This work was done during the first author’s visit at the Research Chair in Distributed Computing of the Universit´e du Qu´ebec en Outaouais. The work of the second author was supported in part by NSERC grant OGP 0008136 and by the Research Chair in Distributed Computing of the Universit´e du Qu´ebec en Outaouais.
connected graph whose nodes are processors. Some processors are faulty, there are fewer than one-half of such processors, and the aim of diagnosis is to locate all faulty processors, by performing tests on some processors by adjacent ones. Among many fault models considered in the literature ([9] introduced the first of them, called the PMC model), two are particularly relevant in our context: the symmetric comparison model of Chwa and Hakimi [4], and the asymmetric comparison model of Malek [8]. In both models, comparison tests can be conducted between adjacent processors. A comparison test between two fault-free processors gets answer 0 (no difference) and the comparison test between a fault-free and a faulty processor gets answer 1 (difference). This is also identical to our assumptions (black nodes representing faulty processors). The two models differ from each other, and also from our setting, when two faulty processors are compared. In the symmetric comparison model, the answer can then be arbitrary (0 or 1), and in the asymmetric comparison model the answer is 1. The justification usually given for the two above diagnosis models is the following. Comparison tests often consist in choosing a fairly complex computational task and submitting it to both compared processors. If the results of computations are identical for both processors, the answer to the test is 0; if not, the answer is 1. Two fault-free processors will clearly give the same result of the computation, and a faulty processor will likely make an error somewhere in the computation, thus producing a different result than a good one. The situation is less clear when two faulty processors are compared. They may either err in the same way, thus causing answer 0 to the comparison test, or make different mistakes, the test answer being 1 in this case. This argument justifies the symmetric model. However, one may say that, for a complex computational task, identical errors for two faulty processors are very unlikely, and thus the asymmetric comparison model could be more realistic. Our testing model, in which comparison test results faithfully describe the same or different fault status of tested processors (same or different node colors), can be justified by another scenario. Suppose that all processors of the system have a boolean variable identically initialized in the beginning. Then some processors (fewer than half of them) fail, and the fault consists in corrupting precisely this bit. We want to discover the original bit (which corresponds to the majority color, since fewer than half of the processors changed it) by comparing the value of this boolean variable between adjacent processors. This situation is similar to the persistent bit problem described in [7], although the focus in [7] was to distributedly restore the common bit in all nodes, using only local probes of the network by each processor. In the above context of fault-tolerance, it is natural to assume that faulty processors (black nodes) are significantly less numerous than the good ones (white nodes), since in realistic systems the number of faults rarely approaches 50%. Therefore, it is reasonable to suppose that the number of black nodes is at most αn, for some positive constant α < 1/2. A coloring satisfying this assumption will be called an α-coloring. Thus an α-coloring of an n-node graph G = (V, E)
is a function I : V → {b, w} (where b and w stand for black and white, respectively), with |I⁻¹({b})| ≤ αn. We now formulate two variations of the problem considered in this paper.

The Simple-Majority Problem on Graphs (MPG). Let G = (V, E) be an undirected connected graph with an input coloring defined on it. If white nodes strictly outnumber black nodes, we must discover a white node v ∈ V, and report equality otherwise, by making comparisons between adjacent nodes of G. The outcome of every comparison is either 0 (equal colors) or 1 (nonequal colors). The goal is to use as few comparisons as possible.

The α-Majority Problem on Graphs (α-MPG). Let G = (V, E) be an undirected connected graph with an input α-coloring defined on it, for some α < 1/2. We must discover a white node v ∈ V by making comparisons between adjacent nodes of G. The outcome of every comparison is either 0 (equal colors) or 1 (nonequal colors). The goal is to use as few comparisons as possible.

For both these problems, we refer to the number of comparisons used by an algorithm on a given input coloring (resp. α-coloring) as the running time of the algorithm on this input. Hence we assume that all operations other than tests take negligible time. It should be noted that in the α-MPG, the parameter α < 1/2 is known to the algorithm selecting comparisons. Aigner [1] considered this variation of the majority problem in the deterministic setting for the complete graph, and proved that the minimum number of comparisons is 2αn − ν(αn). Obviously, for any connected graph, 2αn comparisons are sufficient, by performing tests along edges of a spanning tree of any connected subgraph with 2αn + 1 nodes. Aigner [1] also pointed out interesting relations between the α-MPG and the diagnosis problem in the PMC model from [9], in the case of the complete graph. Hence (asymptotically) optimal solutions to both MPG and α-MPG are known in the deterministic setting. Since the running time of optimal algorithms is linear in the number of nodes in both cases, it is natural to seek randomized algorithms for both problems, hoping to improve efficiency. Thus in the present paper we concentrate on randomized algorithms for both these problems. It turns out that while randomization does not help significantly in the case of the MPG, it sometimes drastically improves algorithm complexity for the α-MPG. Our main contributions are lower bounds showing that the complexity of some simple and natural algorithms for these problems cannot be improved in general.

1.1 Our Results
We first show that the simple-majority problem does not allow efficient randomized algorithms on any graphs. Indeed, we prove that if the difference between the number of white and black nodes is bounded by a constant then every randomized algorithm for determining a white node in an n-node graph (with sufficiently
small constant error probability) uses expected running time Ω(n) on some input, even for the complete graph. (As mentioned above, O(n)-time algorithms – even deterministic – exist for any connected graph.) Hence, in the rest of the paper we investigate randomized algorithms for the α-majority problem on n-node connected graphs, with parameter α < 1/2. We study the expected running time of randomized Monte Carlo algorithms for the α-MPG, whose probability of error is at most ε > 0. For any connected graph, we show an algorithm whose running time on every input is O(D log(1/ε)), where D is the diameter of the graph. If the maximum degree of a graph is βn, where β > 2α, then there is an algorithm with running time O(log(1/ε)). We show that these bounds cannot be improved in general. Every algorithm, running on an arbitrary n-node graph, must use expected time Ω(min(n, log(1/ε))) on some input. We also show that the large constant β in the requirement of maximum degree βn is essential to get a fast algorithm: we show graphs of maximum degree Θ(n) for which every algorithm must use expected time Ω(n) on some input. On the other hand, for sufficiently small constant ε, the bound O(D) cannot be improved for a large class of graphs: we show that, for d-dimensional grids and tori, every algorithm must use expected time Ω(D) on some input.
2 Terminology and Preliminaries
Throughout the paper, we restrict attention to connected graphs. Given a graph G = (V, E) and an input coloring I on it, we define the comparison function LI of G on input I as the function LI : E → {0, 1} such that, for each edge e = {u, v} ∈ E, LI(e) = 0 if I(u) = I(v), and LI(e) = 1 otherwise. We will use deterministic algorithms as a tool to prove lower bounds on the performance of randomized algorithms. Fix a deterministic algorithm A for the MPG (resp. α-MPG). Given a graph G = (V, E) and an input coloring (resp. α-coloring) I on it, algorithm A works in consecutive steps defined as follows. At any step, the algorithm selects an edge e ∈ E and receives as an answer LI(e). After the last step, it shows a node of the graph as the solution (a discovered white node). For any input I, the set EI ⊆ E denotes the set containing all the edges selected during the execution of A. The running time of algorithm A on input I is r(I) = |EI|. For any input I and integer γ, we define EI(γ) := {e ∈ EI | e is selected at step s ≤ γ} if γ < |EI|, and EI(γ) := EI otherwise. Given an input I, we define the execution E(A, I) of algorithm A on input I as the pair E(A, I) = (GI, uI), where GI = (VI, EI) is the subgraph of G induced by the set of edges EI, and uI is the node representing the solution given by A. We will use the following version of the Chernoff bound.
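A minimal harness for this model (our own illustrative code): the oracle below answers LI(e) for an input coloring and counts the comparisons made, so the running time r(I) of an algorithm is exactly the number of oracle calls.

```python
class ComparisonOracle:
    """Answers L_I(e) for a coloring I : V -> {'b', 'w'} and counts
    comparisons, i.e., the running time r(I) of the querying algorithm."""
    def __init__(self, coloring, edges):
        self.coloring = coloring
        self.edges = set(map(frozenset, edges))
        self.count = 0
    def compare(self, u, v):
        assert frozenset((u, v)) in self.edges, "only adjacent nodes may be compared"
        self.count += 1
        return 0 if self.coloring[u] == self.coloring[v] else 1
```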
Lemma 1 (Chernoff bound [6]). Let X be the number of successes in a series of Bernoulli trials of length m with success probability q. Let q′ < q. Then Prob(X ≤ q′m) ≤ e^{−am}, where a is a positive constant depending on q and q′ but not on m.

The main tool for proving lower bounds on the performance of randomized algorithms will be the following well-known result.

Lemma 2 (Yao's minimax principle [11]). Let 0 < ε < 1/2. Let P be a probability distribution over the set of inputs. Let A denote the set of all deterministic algorithms that err with probability at most 2ε over P. For A ∈ A, let C(A, P) denote the expected running time of A over P. Let R be the set of randomized algorithms that err with probability at most ε for any input, and let E(R, I) denote the expected running time of R ∈ R on input I. Then, for all P and all R ∈ R, min_{A∈A} C(A, P) ≤ 2 max_I E(R, I).

The standard application of the above lemma to lower bound proofs is the following. We construct a probability distribution over the set of inputs for a given graph, for which any deterministic algorithm that errs with probability at most 2ε has a large expected running time over this probability distribution. (Note that there is a big flexibility in the choice of the distribution; in fact any set of inputs can be selected, by putting probability zero on other inputs.) In view of the lemma, this implies that every randomized (Monte Carlo) algorithm that errs with probability at most ε for any input must have large expected running time on some input.
3 The Simple-Majority Problem on Graphs
In this section we show that the simple-majority problem on graphs does not allow efficient randomized algorithms. Indeed, we prove that if the difference between the number of white and black nodes is bounded by a constant, then every randomized algorithm for determining a white node in an n-node graph (with sufficiently small constant error probability) uses expected running time Ω(n) on some input, even for the complete graph. Note that this lower bound holds even if the exact (constant) difference between the number of black and white nodes is known to the algorithm selecting comparisons. This lower bound on complexity is tight in view of the obvious algorithm using n − 1 comparisons on any connected graph: performing all tests along edges of a spanning tree, and determining the majority color.

Theorem 1. Consider an arbitrary graph G = (V, E) and an arbitrary positive integer constant d. Suppose that the number of white nodes exceeds that of black nodes by d. Any randomized algorithm for determining a white node in G, which errs with probability at most ε, for sufficiently small constant ε, must have expected running time Ω(n) on some input I.
Proof. Fix a positive integer constant d. Consider the set I of input colorings on G for which the number of white nodes exceeds that of black nodes by d. We prove that any deterministic algorithm for the MPG on G that errs with probability at most 2ε on the uniform distribution on I uses an expected number of Ω(n) comparisons. By Yao's minimax principle, this implies our theorem. Let g be the size of the set I. Let c = 10^d · d! and fix ε < 1/(4c + 4). Fix a deterministic algorithm A for the MPG on G which errs with probability at most 2ε on the uniform distribution on I. Two cases are possible: either r(I) ≤ n/10 for at least g/2 inputs I, or r(I) > n/10 for at least g/2 inputs I. Suppose that the first case holds and let J be the set of inputs for which r(I) ≤ n/10. Denote by J⁺ the set of inputs in J on which algorithm A is correct, and by J⁻ the set of inputs in J on which algorithm A is incorrect. Let VI be the set of nodes involved in comparisons made by A on input I and let uI be the node presented by A as a solution on input I. Denote UI = VI ∪ {uI}. We have |UI| ≤ 2r(I) + 1 ≤ 3n/10. Denote by WI the set of nodes outside of UI which are colored white on input I. We have |WI| ≥ n/5, for any I ∈ J. For any I ∈ J⁺ and any set D ⊆ WI of size d, let f(I, D) denote the coloring resulting from I by interchanging colors white and black at each node outside of D. Notice that f(I, D) ∈ I because the number of white nodes in f(I, D) exceeds that of black nodes by exactly d (the set D is the set of those extra d white nodes). For any I ∈ J⁺ and any D ⊆ WI, we have E(A, I) = E(A, f(I, D)). In particular, f(I, D) ∈ J. Moreover, for any I ∈ J⁺, algorithm A is incorrect on f(I, D), and hence f(I, D) ∈ J⁻. For a given I ∈ J⁺ and different sets D1, D2, the colorings f(I, D1) and f(I, D2) are different. On the other hand, for a fixed set D and fixed coloring J, there is only one coloring I such that J = f(I, D). Consider the bipartite graph on the set J of colorings with bipartition J⁺, J⁻ and edges between those I ∈ J⁺ and J ∈ J⁻ for which J = f(I, D), for some set D ⊆ WI of size d. Every I ∈ J⁺ has degree at least (n/5 choose d) ≥ (n/10)^d / d! = n^d / c, and every J ∈ J⁻ has degree at most (n choose d) ≤ n^d. Hence |J⁻| ≥ (n^d/c) · |J⁺| / n^d = |J⁺|/c. This implies |J⁺| ≤ c|J⁻|, hence g/2 ≤ |J| ≤ (c + 1)|J⁻|, and consequently |J⁻| ≥ g/(2c + 2). Thus the probability that the algorithm is incorrect is at least 1/(2c + 2) > 2ε, which is a contradiction. Hence the second case must hold, i.e., r(I) > n/10 for at least g/2 inputs I. This implies that the expected running time of algorithm A is Ω(n), and the theorem follows.
4 Upper Bounds for the α-Majority Problem on Graphs
In this section we establish upper bounds on the expected running time of randomized algorithms for the α-MPG, and show efficient randomized algorithms for this problem on a large class of graphs. We begin with the following lemma.

Lemma 3. Fix α < 1/2. Let β > 2α and let H be a connected subgraph of an n-node graph G, with k ≥ βn nodes. Let 0 < ε < 1 be the bound on error probability. Then there exists a randomized algorithm for the α-MPG on the graph G which errs with probability at most ε and has running time O(D log(1/ε)) on every input, where D is the diameter of the graph H.

Proof. Consider a series of m = c log(1/ε) Bernoulli trials with success probability q = (β − α)/β. Let E be the event that in this series there are at most m/2 successes. By Lemma 1, the probability of event E is at most ε, for some constant c, in view of q > 1/2. Fix such a constant c. The algorithm works as follows.

1. Make m random independent selections of nodes in H with uniform probability 1/k. Note that selections are with return, so it is possible to select the same node many times.
2. Let S = {s1, ..., sr} be the set of selected nodes. Construct a spanning subtree T of this set in the graph H in the following greedy manner: let Ti−1 be the part of T constructed till step i − 1 (T0 consists of node s1). At step i, join node si+1 to the closest (in H) among the nodes in Ti−1 by a shortest path in H.
3. Perform all tests along edges of this tree. Answers to these tests induce an assignment of colors 1 and 2 to all nodes of the tree. Consider colors assigned to nodes of S and count them with multiplicities, i.e., add x to the count of a given color if a node of this color was chosen x times in the selection.
4. Select a node v of whichever of the colors 1 or 2 received a count at least as large as that of the other color. Choose this node v as the solution (white) node.

In order to analyze this algorithm, observe that if it is incorrect then the majority among the m random selections must be black. This means that in the corresponding Bernoulli series with success probability q (success means selecting a white node) there are at most m/2 successes. By the choice of the constant c, this probability is at most ε, hence the algorithm errs with probability at most ε, as required. The total number of tests is at most rD ≤ mD ∈ O(D log(1/ε)), for any input. Notice that the bound on running time holds for every execution of the algorithm (not only for the expected value of the running time), for every input.

Putting H = G in the above lemma we obtain the following result.

Theorem 2. Fix α < 1/2. For any 0 < ε < 1 and any connected graph G there exists a randomized algorithm for the α-MPG, which errs with probability at most ε and has running time O(D log(1/ε)) on every input, where D is the diameter of G.
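A sketch of the sampling procedure from the proof of Lemma 3, in Python (all names are ours; `shortest_path` is an assumed helper returning the edges of a shortest path in H from the current tree to a node, oriented away from the tree). It draws m = ⌈c log(1/ε)⌉ nodes with replacement, tests along the greedily built tree, and returns a sampled node of the majority color class.

```python
import math
import random
from collections import Counter

def alpha_majority(H_nodes, shortest_path, compare, c, eps):
    """Randomized alpha-MPG routine following the proof of Lemma 3.
    `compare(u, v)` returns 0/1 for adjacent nodes u, v."""
    m = math.ceil(c * math.log(1 / eps))
    sample = [random.choice(H_nodes) for _ in range(m)]   # with replacement
    color = {sample[0]: 1}          # provisional colors 1/2 on tree nodes
    tree = {sample[0]}
    for s in sample[1:]:
        if s in color:
            continue
        for (u, v) in shortest_path(tree, s):   # u is already colored
            color[v] = color[u] if compare(u, v) == 0 else 3 - color[u]
            tree.add(v)
    counts = Counter(color[s] for s in sample)  # count with multiplicities
    majority = max(counts, key=counts.get)
    return next(s for s in sample if color[s] == majority)
```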
The next result implies that the α-MPG can be randomly solved with any constant error bound in constant time, for graphs of large maximum degree. It follows from Lemma 3.

Theorem 3. Fix α < 1/2. For any 0 < ε < 1 and any graph G of maximum degree at least βn, for β > 2α, there exists a randomized algorithm for the α-MPG, which errs with probability at most ε and has running time O(log(1/ε)) on every input.
5 Lower Bounds for the α-Majority Problem on Graphs
In this section we establish lower bounds on the performance of randomized algorithms for the α-MPG. The first lower bound shows that the complexity obtained in Theorem 3 for graphs of large maximum degree cannot be improved, regardless of this degree. Of course, we cannot get the lower bound Ω(log(1/ε)) for every error probability ε: when ε is a very fast decreasing function of the number n of nodes of the graph, n − 1 comparisons performed along edges of a spanning tree are always sufficient for connected graphs. Hence the best lower bound we can hope for in general is Ω(min(n, log(1/ε))). We show that it actually holds.

Theorem 4. Fix α < 1/2. Let G = (V, E) be an n-node graph. Every randomized algorithm for the α-MPG on G that errs with probability at most ε has expected running time Ω(min(n, log(1/ε))) for some input.

Proof. We define a set I of input α-colorings for the α-MPG on G, and a (nonuniform) probability distribution on I, such that any deterministic algorithm for the α-MPG on G that errs with probability at most 2ε on this distribution uses an expected number of Ω(min(n, log(1/ε))) comparisons. By Yao's minimax principle, this implies our theorem. Let p = α/2. Consider the set J of all assignments of colors white or black to nodes in V. For a given assignment J ∈ J, let x(J) denote the number of black nodes in J. Define a probability distribution P on J by the formula P(J) = p^{x(J)}(1 − p)^{n−x(J)}. Distribution P corresponds to random and independent coloring of every node: black with probability p and white with probability 1 − p. Let I = {J ∈ J | x(J) ≤ αn} and let q = Σ_{J∈I} P(J). Define the probability distribution 𝒫 on I by the formula 𝒫(J) = P(J)/q. The set I consists of all α-colorings of G, and the probability distribution 𝒫 is yielded by P by restricting it to I and normalizing. As usual, we extend 𝒫 to subsets of I by taking the sum of distribution values over all elements of a subset. By Lemma 1, there exists a positive constant b such that 1 − q ≤ e^{−bn}. Let d be a positive constant for which e^{−bn} ≤ p^{dn}/2. Let c be a positive constant satisfying p^{2c log(1/ε)+1}/(4q) > ε. (There exists an ε₀, depending only on p, such that, for ε < ε₀, such a constant exists. If ε ≥ ε₀ then the theorem is true because Ω(min(n, log(1/ε))) = Ω(1) in this case.) Let γ = min((dn − 1)/2, c log(1/ε)). Fix a deterministic algorithm A for the α-MPG on G that errs with probability
at most 2ε on the distribution 𝒫. Two cases are possible: either for some input I ∈ I we have r(I) ≤ γ, or r(I) > γ for all I ∈ I. Suppose that the first case holds and fix I ∈ I for which r(I) ≤ γ. Let U = VI be the set of nodes involved in comparisons made by A on input I and let u = uI be the node presented by A as a solution on input I. Let |U ∪ {u}| = δ. By definition, δ ≤ 2γ + 1. Consider the following assignment C of colors white and black to nodes of the set U ∪ {u}: if A is incorrect on I, then C is as in I; otherwise, C is reverse with respect to I (colors white and black are interchanged on U ∪ {u}). Let K be the set of all inputs in I which have the assignment C of colors on the set U ∪ {u}. We have P(K) ≥ p^δ − (1 − q) ≥ p^δ − p^{dn}/2 ≥ p^δ/2. Hence 𝒫(K) ≥ p^δ/(2q) ≥ p^{2c log(1/ε)+1}/(2q) > 2ε. For any J ∈ K, we have E(A, J) = E(A, I), hence A is incorrect on every input J ∈ K. This implies that A errs with probability larger than 2ε, which is a contradiction. Hence we must have r(I) > γ for all I ∈ I.

Our next result shows that the upper bound given by Theorem 2 cannot be improved in general. For constant error probability we got running time linear in the diameter of the graph. We now show that for a large class of graphs, including d-dimensional grids and tori, this running time is optimal. Fix a positive integer d. An m-node line is the graph with the set of nodes V = {v1, ..., vm} and set of edges E = {{vi, vi+1} | i = 1, ..., m − 1}. An m-node cycle has the same set of nodes and one edge more: {v1, vm}. A d-dimensional grid of size m is the graph Gd,m which is the graph product of d copies of the m-node line. A d-dimensional torus of size m, denoted by Td,m, is a d-dimensional grid of size m, supplemented with wrap-around edges. The proof of the following theorem will appear in the full version of this paper.

Theorem 5. Fix α < 1/2 and ε < α/8. For any randomized algorithm R for the α-MPG on the grid Gd,m or on the torus Td,m, having error probability at most ε, there exists an input α-coloring I, such that the expected number of comparisons used by R on I is Ω(m).

We conclude with the observation that the assumption of Theorem 3 cannot be weakened to only require maximum degree linear in the number of nodes: the large constant β, for maximum degree βn, turns out to be crucial to get a fast randomized algorithm. The proof of the following result will appear in the full version of this paper.

Proposition 1. Fix α < 1/2. There exist n-node connected graphs Gn of maximum degree Θ(n), such that every randomized algorithm for the α-MPG on Gn, with sufficiently small constant error probability, has expected running time Ω(n) on some input.
6 Conclusion and Open Problems
We showed that randomization does not help significantly to solve the simple-majority problem on graphs: expected time linear in the number of nodes is still necessary if white nodes outnumber black nodes only by a constant. What happens in the more general case, if the difference is o(n) for n-node graphs? Can the MPG be solved with constant error probability in sublinear expected time, e.g., if the algorithm knows that the difference is √n? On the other hand, randomization drastically improves algorithm complexity for the α-majority problem on graphs, α < 1/2, in some cases. For constant error probability, it is possible to solve the α-MPG in time linear in the diameter of a sufficiently large subgraph of the given graph. Thus, for graphs of large maximum degree, the α-MPG can be solved in constant time with constant error probability. However, for graphs of bounded maximum degree, constant-diameter subgraphs are only of constant size, so our upper bound argument does not apply. Nevertheless, for some graphs of bounded maximum degree, the α-MPG can be solved in constant time with constant error probability. This is the case, e.g., for regular expander graphs, in view of the results from [5]. On the other hand, our lower bound for d-dimensional grids and tori shows that for some bounded-degree graphs such a constant-time algorithm does not exist. The following question remains open: for which graphs can the α-MPG be solved in constant time with constant error probability?
References

1. M. Aigner, Variants of the majority problem, Discrete Applied Mathematics, to appear.
2. L. Alonso, E. M. Reingold, and R. Schott, Determining the majority, Information Processing Letters 47 (1993), 253-255.
3. L. Alonso, E. M. Reingold, and R. Schott, The average-case complexity of determining the majority, SIAM Journal on Computing 26 (1997), 1-14.
4. K. Y. Chwa and S. L. Hakimi, Schemes for fault-tolerant computing: A comparison of modularly redundant and t-diagnosable systems, Information and Control 49 (1981), 212-238.
5. D. Gillman, A Chernoff bound for random walks on expander graphs, SIAM Journal on Computing 27 (1998), 1203-1220.
6. T. Hagerup and C. Rüb, A guided tour of Chernoff bounds, Information Processing Letters 33 (1989/90), 305-308.
7. S. Kutten and D. Peleg, Fault-local distributed mending, Proc. 14th ACM Symposium on Principles of Distributed Computing (1995), 20-27.
8. M. Malek, A comparison connection assignment for diagnosis of multiprocessor systems, Proc. 7th Symp. Comput. Architecture (1980), 31-35.
9. F. P. Preparata, G. Metze, and R. T. Chien, On the connection assignment problem of diagnosable systems, IEEE Trans. on Electr. Computers 16 (1967), 848-854.
10. M. E. Saks and M. Werman, On computing majority by comparisons, Combinatorica 11 (1991), 383-387.
11. A. C-C. Yao, Probabilistic computations: Towards a unified measure of complexity, Proc. 18th Ann. IEEE Symp. on Foundations of Computer Science (1977), 222-227.
Using Transitive–Closure Logic for Deciding Linear Properties of Monoids

Christian Delhommé¹, Teodor Knapik¹, and D. Gnanaraj Thomas²

¹ ERMIT, Université de la Réunion, BP 7151, 97715 Saint Denis Messageries Cedex 9, Réunion
{delhomme,knapik}@univ-reunion.fr
² Dept. of Mathematics, Madras Christian College, Tambaram, Madras 600 059, India
Abstract. We use first–order logic with a transitive closure operator, FO(TC1), for deciding first–order linear monoid properties. These are written in the style of the linear sentences of Ron V. Book, but with a less restrictive language. The decidability of such properties concerns monoids presented by recognizable convergent suffix semi–Thue systems.
Keywords: string rewriting, monoid presentations, transitive closure logic
Introduction

String–rewriting systems, also known as semi–Thue systems, appear as perhaps the oldest model of computation, although their introduction by Axel Thue [18] early in the 20th century was related to (semi–)group theory. Later, semi–Thue systems have been confirmed as useful in various areas of computing including computer algebra, compiling [1], cryptographic protocols [6, 7] and public–key cryptography [17]. In the present paper, semi–Thue systems are studied in their primary context of combinatorial (semi–)group theory and are regarded as monoid presentations. Our leitmotiv is the question: which classes of properties are decidable for which classes of monoids? Of course, without restrictions on the general form of semi–Thue systems involved in the presentations, only undecidability results may be expected. Several restrictions have therefore been identified as being crucial in discussing decidability during the second half of the 20th century, in particular termination, confluence, and the property of being monadic. These restrictions lead to several classes of monoids. Concerning the classes of properties, a general approach consists in asking about the logical language necessary for their expression. A class of first–order monoid properties has been proposed by Ron V. Book in [5]. The properties of this class are definable by means of linear sentences, viz., sentences with no multiple occurrences of any one variable. More precisely, Ron V. Book considered a Σ⁰₂ ∪ Π⁰₂ fragment of first–order logic with equality, restricted to connectives ∧ and ∨, where the sentences are linear and
such that every term may contain only universally or only existentially bound variables. This may seem to be a rather weak logic but, as pointed out in [5], it is still powerful enough for expressing a number of interesting monoid properties. Moreover, the main result of [5] attests the decidability of those sentences for every monoid presented by a finite monadic and Church–Rosser semi–Thue system, and this problem is shown to be PSPACE–complete. In [16], the decidability of Book's linear sentences is established for context–free groups. We extend the aforementioned result of Ron V. Book by considering more general classes of both semi–Thue systems and sentences. First, we drop the restriction to the sole connectives ∨ and ∧, we let terms contain both existentially and universally bound variables, and we allow a more liberal alternation of quantifiers. More precisely, we use the full first–order logic with equality, still subject, however, to the linearity restriction and to the following condition: we require the order of quantifiers to be in accordance with the order of variables in terms. We show that this logic is decidable in the case of monoids presented by recognizable (possibly infinite) convergent and suffix semi–Thue systems. This is accomplished by reducing the truth of linear sentences to the model checking of a so-called Cayley–type graph with respect to the transitive closure logic FO(TC1).
1 Background
The powerset of a set E is written ℘(E). The set {1, . . . , n} is abbreviated as [n], with [0] = ∅. The domain (resp. range) of a binary relation R is written Dom(R) (resp. Ran(R)). We assume that the reader is familiar with the notions of monoid, word, language, rational and recognizable subsets of a monoid, and regular expression. The family of recognizable (resp. rational) subsets of a monoid M is written Rec(M) (resp. Rat(M)). The free monoid over Σ is written Σ*, ε stands for the empty word, and the length of a word w ∈ Σ* is denoted by |w|. The i–th letter of w, for i ∈ [|w|], is written w(i). The reversal w(|w|)w(|w| − 1) . . . w(1) of w is written w̃, and this notation is extended to sets in the usual way: W̃ = {w̃ | w ∈ W}. When w = uv then u (resp. v) is called a prefix (resp. suffix) of w. The set of suffixes of w is written suff(w).

Semi–Thue Systems and Monoid Presentations

A semi–Thue system (an sts for short) S is a subset of Σ* × Σ*. An element (l, r) of S is called a rule and is written l → r; the word l (resp. r) is its left hand (resp. right hand) side. The reversal of S is the following sts: S̃ := {l̃ → r̃ | l → r ∈ S}. The single–step reduction relation induced by S on Σ* is the binary relation →_S = {(xly, xry) | x, y ∈ Σ*, l → r ∈ S}. A word u reduces into a word v (alternatively, v is a descendant of u), if u →*_S v, where →*_S is the reflexive–transitive closure of →_S, called the reduction relation induced by S on Σ*. A word v is irreducible with respect to (w.r.t. for short) S when v does not belong to Dom(→_S). It is easy to see that the set of all irreducible words w.r.t. S, written Irr(S), is rational whenever Dom(S) is so, since Dom(→_S) = Σ* Dom(S) Σ*. The set of irreducible descendants of a word u w.r.t. S is written u↓S. The Thue congruence induced by S on Σ* is the reflexive–symmetric–transitive closure of →_S, written ↔*_S. We denote by MS the quotient of the free monoid Σ* by this congruence. We say that a monoid M is presented by ⟨Σ, S⟩, if S ⊆ Σ* × Σ* and M is isomorphic to MS. A semi–Thue system S over Σ is said to be

(1) monadic if Ran(S) ⊆ Σ ∪ {ε} and |l| > |r| for each l → r ∈ S,
(2) terminating or noetherian if there is no infinite chain u0 →_S u1 →_S · · ·,
(3) confluent if for all words u, v, v′ ∈ Σ* such that u →*_S v and u →*_S v′, there exists a word w ∈ Σ* such that v →*_S w and v′ →*_S w,
(4) convergent if it is both confluent and terminating,
(5) recognizable if it is a finite union S = (L1 × R1) ∪ · · · ∪ (Ln × Rn) with Li, Ri ∈ Rat(Σ*) for i ∈ [n].

We recall that, in a convergent sts, u↓S is a singleton for every word u.
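These definitions are directly executable for a finite sts. A naive sketch (ours; it assumes S is terminating, since otherwise the rewriting loop need not halt): one rewriting step, and the computation of u↓S for a convergent S.

```python
def step(w, S):
    """All single-step reducts of the word w under the sts S, a set of rules (l, r)."""
    return {w[:i] + r + w[i + len(l):]
            for (l, r) in S
            for i in range(len(w) - len(l) + 1)
            if w[i:i + len(l)] == l}

def normal_form(w, S):
    """For a convergent S, returns the unique irreducible descendant of w."""
    while True:
        reducts = step(w, S)
        if not reducts:
            return w                  # w is in Irr(S)
        w = next(iter(reducts))       # confluence makes the choice irrelevant

# Example: with S = {("ba", "ab")}, normal_form("bab", S) == "abb".
```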
Graphs and Transitive Closure Logic

Given an alphabet Σ, a simple directed edge–labeled graph G over Σ is a set of edges, viz. a subset of D × Σ × D, where D is an arbitrary set. Given d, d′ ∈ D, an edge from d to d′ labeled by a ∈ Σ is written d -a->_G d′. A (finite) path in G from some d ∈ D to some d′ ∈ D is a sequence of edges of the form d0 -a1->_G d1, . . . , dn−1 -an->_G dn, such that d0 = d and dn = d′. The word w = a1 . . . an is then the label of the path, and we write d =w=>_G d′ to mean that there is a path from d to d′ labelled by w. For L ⊆ Σ*, we write d =L=>_G d′, if d =w=>_G d′ for some w ∈ L. By convention, there is a path labelled by ε from every vertex to itself. For the purpose of this paper, the interest lies only in graphs the vertices of which are all accessible from some distinguished vertex. Thus, a graph G ⊆ D × Σ × D is said to be rooted on a vertex e ∈ D if there exists a path from e to each vertex of G. The following assumption is made for the sequel: whenever in a definition of a graph a vertex e is distinguished as root, then the maximal subgraph rooted on e is understood. We skip a general definition of the unary transitive–closure logic FO(TC1) of a relational structure and we focus on the particular case of a simple, directed,
edge–labelled graph G ⊆ D × Σ × D with a distinguished root e ∈ D. Such a graph G may be seen as the model-theoretic structure ⟨D, (-a->_G)_{a∈Σ}, e⟩ and used for interpreting FO(TC1). The formulae are constructed from vertex variables written x, y, x′, x1, etc., the binary predicate symbols (s_a)_{a∈Σ} and "=", the constant symbol r, the classical connectives and quantifiers, as well as the transitive closure operator TC, defined as follows. Let ϕ(x, y) be a formula with at least 2 free variables x and y and let s, t be two terms. The formula [TC_{x,y} ϕ](s, t) says that the ordered pair (s, t) belongs to the reflexive–transitive closure of the binary relation that is defined by ϕ(x, y). The predicate symbols s_a and "=" are interpreted on G resp. as -a->_G and equality. The constant symbol r is interpreted by e. The satisfaction of a formula under an assignment is defined in the usual way (see e.g. [11]). We denote by Fvar(ϕ) the set of variables which have free occurrences in ϕ. In the following, we give an example of a property that can be defined in FO(TC1).

Example 1.1. For every rational language L, one may construct an FO(TC1) formula path_L(x, y) which is true under every assignment ν into a graph G such that ν(x) =L=>_G ν(y). Assuming that L is presented by a rational expression,
path_L(x, y) is defined inductively:

path_∅(x, y)       :⇔ s_a(x, y) ∧ ¬s_a(x, y)
path_{a}(x, y)      :⇔ s_a(x, y)
path_{R∪R′}(x, y)   :⇔ path_R(x, y) ∨ path_{R′}(x, y)
path_{RR′}(x, y)    :⇔ ∃z (path_R(x, z) ∧ path_{R′}(z, y))
path_{R*}(x, y)     :⇔ [TC_{x,y} path_R(x, y)](x, y)
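The inductive clauses above translate directly into a recursive procedure. A sketch (ours) over a small rational-expression AST, emitting FO(TC1) formulas as strings; fresh-variable handling is deliberately simplified.

```python
from itertools import count

fresh = (f"z{i}" for i in count())

def path_formula(rexp, x, y):
    """path_L(x, y) for a rational expression given as a nested tuple:
    ('empty',) | ('sym', a) | ('union', r, s) | ('concat', r, s) | ('star', r)."""
    tag = rexp[0]
    if tag == 'empty':
        return f"(s_a({x},{y}) & ~s_a({x},{y}))"   # unsatisfiable, for a fixed letter a
    if tag == 'sym':
        return f"s_{rexp[1]}({x},{y})"
    if tag == 'union':
        return f"({path_formula(rexp[1], x, y)} | {path_formula(rexp[2], x, y)})"
    if tag == 'concat':
        z = next(fresh)
        return f"(exists {z} ({path_formula(rexp[1], x, z)} & {path_formula(rexp[2], z, y)}))"
    if tag == 'star':
        return f"[TC_{{{x},{y}}} {path_formula(rexp[1], x, y)}]({x},{y})"
    raise ValueError(tag)

# path_formula(('star', ('sym', 'a')), 'x', 'y')  ->  "[TC_{x,y} s_a(x,y)](x,y)"
```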
The Kleene star is a typical example of a situation where the transitive closure is needed; as shown by Fagin [12], the latter cannot be defined in first–order logic. A sentence is a formula with no free variable¹. Given a logic L, the L model checking problem for a class of structures S, written MCP(L, S), is the problem of deciding, for every structure S ∈ S, the membership to the set of L sentences which are true for S.

¹ Note that x and y are bound in [TC_{x,y} ϕ](x′, y′) whenever {x, y} ∩ {x′, y′} = ∅.
2 Cayley–Type Graphs of Semi–Thue Systems
As defined in [10], the Cayley–type graph CG(S) of an sts S is a simple, directed, edge–labeled graph: CG(S) := {u -a-> v | u, v ∈ Irr(S), a ∈ Σ, au →*_S v}.
We use here its symmetric definition, which has been considered in [14]: C̃G(S) := {u -a-> v | u, v ∈ Irr(S), a ∈ Σ, ua →*_S v}. Obviously C̃G(S) and CG(S̃) are isomorphic. From the above definition we obtain immediately the following facts:

(1) If S is noetherian, then every vertex of C̃G(S) has an outgoing edge labelled by a, for each a ∈ Σ.
(2) If S is confluent, then, for each a ∈ Σ, every vertex of C̃G(S) has at most one outgoing edge labelled by a.
(3) C̃G(S) may have an infinite number of connected components.

We shall focus on the subgraph C̃G(S, ε) of C̃G(S) rooted at the vertex ε: C̃G(S, ε) := {u -a-> v ∈ C̃G(S) | ε =Σ*=>_{C̃G(S)} u}. Note that C̃G(S) = ∅ when ε ∈ Dom(S), because then Irr(S) = ∅, and so is C̃G(S, ε). Just as properties of the Cayley graph of a group are meaningful for the group itself, some properties of C̃G(S, ε) and MS are related, especially when S is convergent. In particular, w ∈ u↓S whenever ε =u=>_{C̃G(S,ε)} w, and we have also the converse provided that S is confluent. This observation leads to the following lemma.

Lemma 2.1. For all u, v ∈ Σ* and every convergent S, the following conditions are equivalent:
(1) u ↔*_S v,
(2) there exists a vertex w such that ε =u=>_{C̃G(S,ε)} w and ε =v=>_{C̃G(S,ε)} w.
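For a finite convergent S, the rooted graph C̃G(S, ε) can be computed by breadth-first search from ε, reusing the normal_form sketch from Sect. 1: by convergence, the target of the edge u -a-> v is v = (ua)↓S. (Our code; the graph may be infinite, hence the vertex cap.)

```python
from collections import deque

def cayley_type_graph(sigma, S, normal_form, max_vertices=10000):
    """BFS construction of the rooted Cayley-type graph: vertices are the
    irreducible words reachable from the empty word, edges u -a-> normal_form(ua)."""
    edges, seen, queue = set(), {""}, deque([""])
    while queue and len(seen) < max_vertices:
        u = queue.popleft()
        for a in sigma:
            v = normal_form(u + a, S)
            edges.add((u, a, v))
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return edges

# Example: sigma = "ab" and S = {("ba", "ab")} give the Cayley-type graph of
# the free commutative monoid on two generators.
```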
We now turn our attention to a particular class of semi–Thue systems.

Definition 2.2. An sts S is suffix on L ⊆ Irr(S), if →*_S(LΣ) Σ⁻¹ ⊆ Irr(S), where →*_S(LΣ) is the image of the set LΣ under →*_S. An sts S is prefix on L ⊆ Irr(S), if S̃ is suffix on L̃.
In other words, in each step of a reduction starting from a word of LΣ (resp. ΣL), rules apply only on a suffix (resp. prefix). In the case when L ∈ Rat(Σ ∗ ) and S is recognizable, this property is decidable [10].
3 Linear Sentences of Ron V. Book and Their Translation into FO(TC1)
Book's linear sentences (BL sentences) are first–order sentences with atomic formulae of the form u ≡ v, where u, v ∈ (Σ ∪ X)* are words with variables in X and are called terms. Only two connectives are allowed, namely ∧ and ∨. The sentences considered are in the prenex normal form and may have at most one alternation of quantifiers². All variables within an occurrence of a term have to be bound by quantifiers of the same type (existential or universal). Each variable occurs at most once in a sentence; this is the linearity condition. We say that a BL sentence ϕ such that Var(ϕ) = x = (x1, . . . , xn) holds on a structure ⟨Σ, S, L1, . . . , Ln⟩, where S ⊆ Σ* × Σ* is an sts and L1, . . . , Ln ⊆ Σ*, when ϕ is true according to the usual first–order interpretation such that each xi ranges over Li and "≡" is interpreted as ↔*_S.

² In other words, BL sentences are Σ⁰₂ or Π⁰₂ sentences.
Many properties of monoids may be expressed by a BL sentence together with an appropriate structure. We mention only two of them. The reader may consult [8] for more examples.

Example 3.1.
(1) Extended Word Problem
Instance: A semi–Thue system S and two languages L1, L2.
Question: Do there exist u1 ∈ L1 and u2 ∈ L2 such that u1 ↔*_S u2?
The answer is yes if and only if the sentence ∃x1 ∃x2 x1 ≡ x2 holds on the structure ⟨Σ, S, L1, L2⟩.
(2) (Finite) Independent Set Problem
Instance: A semi–Thue system S and a finite set of nonempty words F = {u1, . . . , un}.
Question: Is each ui congruent to no word of (F ∖ {ui})*?
The answer is no if and only if the sentence ∃x1 . . . ∃xn ⋁_{i=1}^{n} ui ≡ xi holds on the structure ⟨Σ, S, (F ∖ {u1})*, . . . , (F ∖ {un})*⟩.
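When S is convergent and L1, L2 are finite, the Extended Word Problem reduces to comparing normal forms, since u1 ↔*_S u2 iff u1↓S = u2↓S (cf. Lemma 2.1). A sketch (ours) reusing normal_form from Sect. 1; the general rational case is what the logic-based approach of this section handles.

```python
def extended_word_problem(L1, L2, S, normal_form):
    """Finite-language instance: does some u1 in L1 satisfy u1 <->*_S u2
    for some u2 in L2?  For convergent S, test equality of normal forms."""
    nf1 = {normal_form(u, S) for u in L1}
    return any(normal_form(u, S) in nf1 for u in L2)
```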
According to [5], the truth of BL sentences is decidable (and even PSPACE–complete) on every structure ⟨Σ, S, R1, . . . , Rn⟩ such that S is finite, monadic and confluent and R1, . . . , Rn are rational languages. What are the properties of monoids that cannot be expressed using BL sentences? Unfortunately, we are not able to answer this question at present. Even for a single example of a property it may be difficult to establish that it is not expressible. The general technique of Ehrenfeucht–Fraïssé games does not apply here due to syntactic restrictions of BL sentences. We therefore only conjecture that the following Left Zero Problem cannot be expressed using BL sentences:
Instance: A semi–Thue system S over Σ.
Question: Does there exist z ∈ Σ* such that, for every u ∈ Σ*, zu ↔*_S z?
More General Linear Sentences

We shall now consider the full first–order logic with equality, restricted to the linear case. The sentences of this more general logic are called first–order linear monoid (FOLM) sentences. Besides the use of all connectives and arbitrary alternation of quantifiers, the syntactical difference with BL sentences is that instead of using a structure ⟨Σ, S, R1, . . . , Rn⟩ in order to constrain each xi to range over Ri, we use bounded quantification: ∃_{Ri} xi and ∀_{Ri} xi. An atomic FOLM formula is of the form u ≡ v, with u, v ∈ (Σ ∪ X)*. FOLM formulae use connectives ¬, ∨, ∧, ⇒, bounded quantifiers ∃_R, ∀_R, where R is a rational language, and obey the following restrictions:
• each variable occurs at most once in a formula,
• for every term t1 x t2 y t3 occurring in a formula, with x, y ∈ X and t1, t2, t3 ∈ (Σ ∪ X)*, the quantifier bounding x cannot occur in the scope of the quantifier bounding y.

According to the latter restriction, the order of quantifiers has to be in accordance with the order of appearance of variables in a term. We say that a FOLM sentence holds in MS, if the sentence is true according to the usual first–order interpretation where "≡" is interpreted by ↔*_S. The next lemma follows from the definitions.

Lemma 3.2. For every BL sentence ϕ such that Var(ϕ) = {x1, . . . , xn} and every structure ⟨Σ, S, R1, . . . , Rn⟩ such that R1, . . . , Rn are rational languages, there exists a FOLM sentence θ such that ϕ holds in ⟨Σ, S, R1, . . . , Rn⟩ if and only if θ holds in MS.

It is interesting that, in spite of the linearity restriction, the Left Zero Problem may be expressed using the following FOLM sentence: ∃_{Σ*} z ∃_{Σ*} x ∀_{Σ*} y xy ≡ z. This works because y may equal ε. However, this sentence cannot be directly translated into a BL sentence. Indeed, the variables of the term xy are not bound by the same type of quantifiers. The following main result establishes a connection between FOLM on MS and FO(TC1) on C̃G(S, ε):

Proposition 3.3. For every FOLM sentence ϕ, there exists effectively an FO(TC1) sentence θ such that, for every recognizable, convergent sts S, ϕ holds in MS if and only if θ holds in C̃G(S, ε).

Before proving Proposition 3.3, we need some preliminaries. We call a basic FOLM formula an FOLM formula of the form Q t ≡ t′, where t, t′ ∈ (Σ ∪ X)* and Q is a block: Q = Q1 . . . Qn, with each Qi either equal to a bounded existential quantification ∃_{Ri} xi or equal to a negation ¬. A formula which is a boolean combination of basic FOLM formulas is said to be in boolean normal form.

Lemma 3.4. Every FOLM formula is equivalent to an FOLM formula in boolean normal form.
Proof. As for the usual quantification, for the bounded one we have the following identities: ¬∃R x ϕ ⇔ ∀R x ¬ϕ,
¬∀R x ϕ ⇔ ∃R x ¬ϕ.
In addition, it is easy to establish the following identities when x ∈ / Fvar(ϕ): ∃R x (ϕ ∨ ϕ ) ⇔ ϕ ∨ ∃R x ϕ
∃R x (ϕ ∧ ϕ ) ⇔ ϕ ∧ ∃R x ϕ
Thanks to the linearity condition, we may apply the above identities from left to right in order to transform any FOLM formula into a boolean normal form. 2 We call basic FOLM formula any basic FOLM formula ϕ of the form Q zt ≡ z t with Fvar(ϕ) = {z, z } (in particular z and z are two distinct variables). To each such formula ϕ, we associate an FO(TC1 ) formula T(ϕ) according to the following inductive rules: T(zt ≡ z t ) := ∃x (patht (z, x) ∧ patht (z , x)), T(¬Q zt ≡ z t ) := ¬T(Q zt ≡ z t ), T(∃R x Q zuxt ≡ z t ) := ∃x (pathuR (z, x) ∧ T(Q xt ≡ z t )), T(∃R x Q zt ≡ z uxt ) := ∃x (pathuR (z , x) ∧ T(Q zt ≡ xt )). Notice that Fvar(T(ϕ)) = Fvar(ϕ). Lemma 3.5. For every basic FOLM formula Q zt ≡ z t , all r, r ∈ Σ ∗ and every recognizable, convergent sts S, MS |= Q rt ≡ r t
iff
CG(S, ε) |= ∃z ∃z′ (pathr (r, z) ∧ pathr′ (r, z′ ) ∧ T(Q zt ≡ z′ t′ )).
The proof of the above lemma, by induction on the structure of the formula, is available in a longer version of this paper.
Proof (Proposition 3.3). Thanks to Lemma 3.4, we only have to consider basic FOLM sentences. Now, given a FOLM sentence Q t ≡ t′ , consider the FO(TC1 ) sentence ∃z ∃z′ (pathε (r, z) ∧ pathε (r, z′ ) ∧ T(Q zt ≡ z′ t′ )) and invoke Lemma 3.5, given that CG(S, ε) |= ∀z (pathε (r, z) ⇔ z = r). □
We note that our translation of FOLM into FO(TC1 ) based on Cayley–type graphs cannot be extended to the case where the order of quantifiers is not in accordance with the order of appearance of variables in a term. Indeed, consider the following FOLM sentence: ∃x ∀y ∃z yxz ≡ u, where u ∈ Σ ∗ . On a Cayley–type graph G, the corresponding property may be expressed as follows: there exists a word, say v (i.e. ∃x), such that from every vertex of G (i.e. ∀y), there is a path to u↓ with a label in vΣ ∗ (i.e. Σ ∗ corresponds to ∃z). Here, the prefix v
of a path in vΣ ∗ is independent of the starting point of this path. This means that, in contrast to the proof of Lemma 3.5, in the present example the first existential quantification on the label of a path (i.e. ∃x) cannot be replaced with a quantification on vertices of G. Now, quantification on labels of a path cannot be expressed in FO(TC1 ) (not even in monadic second–order logic). For the sequel, we need to define the following classes of structures:
G0 := {CG(S) | S recognizable and prefix on Irr(S)},
G1 := {CG(S) | S recognizable, convergent and prefix on Irr(S)},
G2 := {CG(S, ε) | S recognizable, convergent and prefix on Irr(S)},
G2′ := {CG(S, ε) | S recognizable, convergent and suffix on Irr(S)} and
M2 := {MS | S recognizable, convergent and suffix on Irr(S)},
and let us denote by GPrefRec the class of prefix–recognizable graphs [9]. Using Theorem 4.6 of [10] we conclude that G0 ⊆ GPrefRec . Since G1 ⊆ G0 , we also have G1 ⊆ GPrefRec . Since reachability is expressible in monadic second–order (MSO) logic, each graph of G2 is definable within a graph of G1 . According to [3] (see also [4]), each graph of G2 is therefore in GPrefRec . Since G2 and G2′ are equal up to graph isomorphism, we have the following lemma.
Lemma 3.6. CG(S, ε) is prefix–recognizable for every recognizable, convergent sts S which is suffix on Irr(S).
Now, since MCP(MSO, GPrefRec ) is decidable [9] and FO(TC1 ) embeds into MSO, MCP(FO(TC1 ), G2′ ) is decidable too. Hence, from Proposition 3.3, we obtain the following:
Corollary 3.7. The FOLM theory and, in particular, the BL theory of MS is decidable for every recognizable, convergent sts S which is suffix on Irr(S).
This properly extends the decidability result of [5] beyond the case of finite monadic and Church–Rosser sts’s. Indeed, every monadic sts S is terminating and is suffix on Irr(S), but not vice versa [14]. Unfortunately, at present, we are not able to obtain an analogous result about complexity. Nevertheless, the following discussion may be useful for further investigations. Let µ denote the modal µ–calculus [2]. As established in [15, 19], MCP(µ, GPrefRec ) is EXPTIME–complete. Since on finite structures the complexity of model checking w.r.t. FO(TC1 ) is lower than w.r.t. µ (see e.g. [13]), we conjecture the following:
Conjecture 3.8. MCP(FO(TC1 ), GPrefRec ) is in EXPTIME.
Consequently, we conjecture that MCP(FO(TC1 ), G2′ ) is in EXPTIME, and also that MCP(FOLM, M2 ) is so, because the proof of Proposition 3.3 is based on a linear translation of FOLM sentences into FO(TC1 ) sentences.
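As a small illustrative aside (not from the original paper), the interpretation of “≡” in MS is concrete for a convergent sts: u ≡ v holds exactly when the normal forms u↓ and v↓ coincide. The Python sketch below implements this with naive leftmost rewriting; the rule format and helper names are our own choices, and termination and confluence of S are assumed rather than checked.

```python
def normal_form(w, rules):
    """Reduce w by repeatedly applying the leftmost applicable rule.
    For a convergent sts (a list of (lhs, rhs) pairs) the result is
    the unique normal form of w."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            i = w.find(lhs)
            if i >= 0:
                w = w[:i] + rhs + w[i + len(lhs):]
                changed = True
                break
    return w

def congruent(u, v, rules):
    """u ≡ v in the quotient monoid M_S iff normal forms coincide."""
    return normal_form(u, rules) == normal_form(v, rules)

# Toy example (ours): S = {ba -> ab} makes M_S the free commutative
# monoid on {a, b}, so any two anagrams are congruent.
assert congruent("baba", "aabb", [("ba", "ab")])
```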
References
1. A. V. Aho, R. Sethi, and J. D. Ullman. Code optimization and finite Church–Rosser systems. In R. Rustin, editor, Design and Optimization of Compilers, pages 89–106. Prentice–Hall, 1972.
2. A. Arnold and D. Niwiński. Rudiments of µ–calculus. Number 146 in Studies in Logic and the Foundations of Mathematics. Elsevier, 2001.
3. K. Barthelmann. On equational simple graphs. Technical Report 9/97, Johannes Gutenberg Universität, Mainz, 1998.
4. A. Blumensath. Axiomatising tree-interpretable structures. In H. Alt and A. Ferreira, editors, STACS 2002, LNCS 2285, pages 596–607, Antibes – Juan les Pins, Mar. 2002. Springer.
5. R. V. Book. Decidable sentences of Church–Rosser congruences. Theoretical Comput. Sci., 24:301–312, 1983.
6. R. V. Book and F. Otto. On the security of name–stamp protocols. Theoretical Comput. Sci., 39:319–325, 1985.
7. R. V. Book and F. Otto. On the verifiability of two–party algebraic protocols. Theoretical Comput. Sci., 40:101–130, 1985.
8. R. V. Book and F. Otto. String–Rewriting Systems. Texts and Monographs in Computer Science. Springer–Verlag, 1993.
9. D. Caucal. On infinite transition graphs having a decidable monadic second–order theory. Theoretical Comput. Sci., 290(1):79–115, 2003.
10. D. Caucal and T. Knapik. A Chomsky–like hierarchy of infinite graphs. In K. Diks and W. Rytter, editors, MFCS 2002, LNCS 2420, pages 177–187, Warsaw, Aug. 2002.
11. H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer–Verlag, 1999. Second edition.
12. R. Fagin. Monadic generalized spectra. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, 21:89–96, 1975.
13. N. Immerman and M. Y. Vardi. Model checking and transitive-closure logic. In O. Grumberg, editor, Computer Aided Verification, 9th International Conference, CAV ’97, pages 291–302, Haifa, June 1997.
14. T. Knapik and H. Calbrix. Thue specifications and their monadic second–order properties. Fundamenta Informaticae, 39(3):305–325, 1999.
15. O. Kupferman and M. Y. Vardi. An automata–theoretic approach to reasoning about infinite–state systems. In Computer Aided Verification, Proc. 12th Int. Conference, LNCS 1855, pages 36–52, Chicago, July 2000. Springer.
16. K. Madlener and F. Otto. Decidable sentences for context–free groups. In C. Choffrut and M. Jantzen, editors, Symposium on Theoretical Aspects of Computer Science, LNCS 480, pages 160–171, Hamburg, Feb. 1991. Springer.
17. V. A. Oleshchuk. On public–key cryptosystems based on Church–Rosser string–rewriting systems. In D.-Z. Du and M. Li, editors, First Annual International Computing and Combinatorics Conference, COCOON’95, LNCS 959, pages 264–269, Xian, Aug. 1995. Springer.
18. A. Thue. Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln. Skr. Vid. Kristiania, I. Mat.-Naturv. Klasse, 10:34 pp., 1914.
19. I. Walukiewicz. Monadic second–order logic on tree–like structures. Theoretical Comput. Sci., 275(1–2):311–346, 2002.
Linear-Time Computation of Local Periods
Jean-Pierre Duval1 , Roman Kolpakov2,⋆ , Gregory Kucherov3 , Thierry Lecroq4 , and Arnaud Lefebvre4
1 LIFAR, Université de Rouen, France
[email protected]
2 Department of Computer Science, University of Liverpool, UK
[email protected]
3 INRIA/LORIA, Nancy, France
[email protected]
4 ABISS, Université de Rouen, France
{Thierry.Lecroq,Arnaud.Lefebvre}@univ-rouen.fr
Abstract. We present a linear-time algorithm for computing all local periods of a given word. This subsumes (and is substantially more powerful than) the computation of the (global) period of the word on the one hand, and the computation of a critical factorization, implied by the Critical Factorization Theorem, on the other.
1
Introduction
Periodicities in words have been classically studied in word combinatorics and are at the core of many fundamental results [18,2,19]. Moreover, notions and techniques related to periodic structures in words find applications in different areas: data compression [24], molecular biology [12], as well as the design of more efficient string search algorithms [11,3,5]. In this paper, we concentrate, from the algorithmic perspective, on the important notion of local periods, which characterize the local periodic structure at each location of the word [9,8]. In informal terms, the local period at a given position is the size of the smallest square centered at this position. The importance of local periods is evidenced by the fundamental Critical Factorization Theorem [18,2,19], which asserts that there exists a position in the word (and a corresponding factorization) for which the local period is equal to the global period of the word. Designing efficient algorithms for computing different periodic structures in words has long been an active area of research. It is well known that the (global) period of a word can be computed in linear time, using the Knuth-Morris-Pratt string matching method [16,4]. On the other hand, in [3] it has been shown that a critical factorization can be constructed in linear time, by computing the smallest and largest suffixes under the lexicographical ordering. In the same work, the factorization has then been used to design a new string matching algorithm.
⋆ On leave from the French-Russian Institute for Informatics and Applied Mathematics, Moscow University, Russia
In this paper, we show how to compute all local periods in a word in time O(n), assuming an alphabet of constant size. This is substantially more powerful than the linear-time computations of a critical factorization and of the global period: indeed, once all local periods have been computed, the global period is simply the maximum of all local periods, and each such maximal value corresponds to a distinct critical factorization. Note that a great deal of work has been done on finding periodicities occurring in a word (see [13] for a survey). However, none of those algorithms allows one to compute all local periods in linear time. The reason is that most of them are intrinsically super-linear, which can be explained by the fact that they tend, explicitly or implicitly, to enumerate all squares in the word, the number of which can be super-linear. The closest result is the one of [17], which claims a linear-time algorithm for finding, for each position i of the string, the smallest square starting at i. The approach is based on a sophisticated analysis of the suffix tree. The absence of a complete proof prevents the comprehension of the algorithm in full detail; however, to the best of our understanding, this approach cannot be applied to finding local periods. Here we design a linear-time algorithm for finding all local periods, based on several different string matching techniques. Some of those techniques (s-factorization, Main-Lorentz extension functions) have already been successfully used for several repetition finding problems [7,21,20,13,14,15]. In particular, in [13], it has been shown that all maximal repetitions can be found in linear time, providing exhaustive information about the periodic structure of the word. However, here again, a direct application of this approach to finding local periods leads to a super-linear algorithm. We then propose a non-trivial modification of this approach, which allows us to find a subclass of local periods in linear time. Another tool we use is the simplified Boyer-Moore shift function, which allows us to complete the computation of local periods, staying within the linear time bound.
2
Local Periods: Preliminaries
Consider a word w = a1 . . . an over a finite alphabet. |w| denotes the length of w, and wR stands for the reverse of w, that is an an−1 . . . a1 . w[i..j], for 1 ≤ i, j ≤ n, denotes the subword ai . . . aj provided that i ≤ j, and the empty word otherwise. A position i in w is an integer between 0 and n, associated with the factorization w = uv, where |u| = i. A square s is a word of the form tt (i.e. a word of even length with two equal halves). t is called the root of s, and |t| is called its period.
Definition 1. Let w = uv, and |u| = i. We say that a non-empty square tt is centered at position i of w (or matches w at central position i) iff the following conditions hold: (i) t is a suffix of u, or u is a suffix of t, (ii) t is a prefix of v, or v is a prefix of t.
In the case when t is a suffix of u and t is a prefix of v, we have a square occurring inside w. We call it an internal square. If v is a proper prefix of t (respectively, u is a proper suffix of t), the square is called right external (respectively, left external).
Definition 2. The smallest square centered at a position i of w is called the minimal local square (hereafter simply minimal, for shortness). The local period at position i of w, denoted LPw (i), is the period of the minimal square centered at this position.
Note that for each position i of w, LPw (i) is well-defined, and 1 ≤ LPw (i) ≤ |w|. Any word w has the (global) period p(w), which is the minimal integer p such that w[i] = w[i + p] whenever 1 ≤ i, i + p ≤ |w|. Equivalently, p(w) is the smallest positive integer p such that the words w[1..n − p] and w[p + 1..n] are equal. The critical factorization theorem [18,2,19] is a fundamental result relating local and global periods:
Theorem 1 (Critical Factorization Theorem). For each word w, there exists a position i (and the corresponding factorization w = uv, |u| = i) such that LPw (i) = p(w). Moreover, such a position exists among any p(w) consecutive positions of w.
Apart from its combinatorial consequences, an interesting feature of the critical factorization is that it can be computed very efficiently, in time linear in the word length [3]. This can be done, for example, using the suffix tree construction [4]. On the other hand, it is well known that the (global) period of a word can be computed in linear time, using, for example, the Knuth-Morris-Pratt technique [4]. In this paper, we show how to compute all local periods in a word in linear time. This computation is much more powerful than that of a critical factorization or the global period: once all local periods are computed, the global period is equal to the maximum among them, and each such maximal local period corresponds to a critical factorization of the word. The method we propose consists of two parts. We first show, in Section 3, how to compute all internal minimal squares. Then, in Section 4, we show how to compute left and right external minimal squares, in particular for those positions for which no internal square has been found. Both computations will be shown to be linear-time, and therefore computing all local periods can be done within linear time too.
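As a concrete illustration of Definitions 1 and 2 (our own sketch, not part of the algorithm developed below), the following Python function computes LPw (i) for every position by testing candidate periods directly. The two halves of a candidate square are compared only where they overlap the word, which captures internal as well as left and right external squares; it runs in cubic time, whereas the goal of this paper is O(n).

```python
def local_periods(w):
    """Brute-force LP_w(i) for every position i = 0..n (Definition 2)."""
    n = len(w)
    lp = []
    for i in range(n + 1):            # position i: w = u v with |u| = i
        for p in range(1, n + 1):     # candidate period p
            # square tt centered at i: left copy at i-p..i-1, right copy
            # at i..i+p-1; letters outside w are unconstrained (external)
            if all(w[i - p + d] == w[i + d]
                   for d in range(p)
                   if i - p + d >= 0 and i + d < n):
                lp.append(p)
                break
    return lp

# The maximum local period equals the global period (Theorem 1):
# for "abaab", p(w) = 3, attained at the critical positions 2 and 4.
assert local_periods("abaab") == [1, 2, 3, 1, 3, 1]
```

One can also check against Theorem 1 that, on this example, every window of p(w) = 3 consecutive positions indeed contains a position whose local period is 3.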
3
Computing Internal Minimal Squares
Finding internal minimal squares amounts to computing, for each position of the word, the smallest square centered at this position and occurring entirely inside the word, provided that such a square exists. Thus, throughout this section we
will be considering only squares occurring inside the word and therefore, for the sake of brevity, omit the adjective “internal”. The problem of finding squares and, more generally, finding repetitions occurring in a given word has been studied for a long time in the string matching area; we refer to [13] for a survey. A natural idea is then to apply one of those methods in order to compute all squares and then select, for each central position, the smallest one. A direct application of this approach, however, cannot result in a linear-time algorithm, for the reason that the overall number of squares in a word can be as big as Θ(n log n) (see [6]). Therefore, manipulating the set of all squares explicitly is prohibitive for our purpose. In [13], maximal repetitions have been studied, which are maximally extended runs of consecutive squares. Importantly, the set of maximal repetitions encodes the whole set of squares, while being only of linear size. Our approach here is to use the technique of computing maximal repetitions in order to retrieve squares which are minimal for some position. To present the algorithm in full detail, we first need to describe the techniques used in [20,13] for computing maximal repetitions.
3.1 s-Factorization, Main-Lorentz Extension Functions, and Computing Repetitions
In this section we recall the basic ideas, methods and tools underlying our approach. The s-factorization [7] is a special decomposition of the word. It is closely related to the Lempel-Ziv factorization (implicitly) defined by the well-known Lempel-Ziv compression method. The idea of the s-factorization is to proceed from left to right and to find, at each step, the longest factor which has another copy on the left. Alternatively, the Lempel-Ziv factorization considers the shortest factor which does not appear to the left (i.e. extends by one letter the longest previously occurring factor). We refer to [12] for a discussion of these two variants of factorization. A salient property of both factorizations is that they can be computed in linear time [22] in the case of a constant alphabet. In their original definition, both of these factorizations allow an overlap between a factor and its left copy. However, we can restrict this and require the copy to be non-overlapping with the factor. This yields a factorization without copy overlap (see [15]). Computing the s-factorization (or Lempel-Ziv factorization) without copy overlap can still be done in linear time. In this work we will use the s-factorization without copy overlap:
Definition 3. The s-factorization of w without copy overlap is the factorization w = f1 f2 . . . fm , where the fi ’s are defined inductively as follows: (i) f1 = w[1], (ii) assume we have computed f1 f2 . . . fi−1 (i ≥ 2), and let w[bi ] be the letter immediately following f1 f2 . . . fi−1 (i.e. bi = |f1 f2 . . . fi−1 | + 1). If w[bi ] does not occur in f1 f2 . . . fi−1 , then fi = w[bi ], otherwise fi is the longest subword starting at position bi which has another occurrence in f1 f2 . . . fi−1 .
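As an illustration of Definition 3 (our own sketch, not from the paper), the following naive Python function computes the s-factorization without copy overlap; the repeated substring test makes it quadratic or worse, whereas the linear-time computation cited above relies on suffix-tree machinery.

```python
def s_factorization_no_overlap(w):
    """s-factorization of w without copy overlap (Definition 3).
    The test `w[b:b+l+1] in w[:b]` demands a copy lying entirely
    inside f_1 ... f_{i-1}, i.e. a non-overlapping copy."""
    factors = []
    b = 0                                # bi - 1 in 0-indexed terms
    while b < len(w):
        l = 0
        while b + l < len(w) and w[b:b + l + 1] in w[:b]:
            l += 1
        factors.append(w[b] if l == 0 else w[b:b + l])
        b += max(l, 1)
    return factors

# Example: "abaabaa" factorizes as a . b . a . aba . a
assert s_factorization_no_overlap("abaabaa") == ["a", "b", "a", "aba", "a"]
```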
Note, however, that the choice of the factorization definition is guided by the simplicity of algorithm design and presentation clarity, and is not unique. Our second tool is Main-Lorentz extension functions [21]. In its basic form, the underlying problem is the following. Assume we are given two words w1 , w2 and we want to compute, for each position i of w2 , the longest prefix of w1 which occurs at position i in w2 . This computation can be done in time O(|w1 | + |w2 |) [21]. Note that w1 and w2 can be the same word, and that if we reverse w1 and w2 , we come up with the symmetric computation of the longest suffixes of w2 [1..i] which are suffixes of w1 . We now recall how Main-Lorentz extension functions are used for finding repetitions. The key idea is illustrated by the following problem. Assume we have two words w1 = w1 [1..m] and w2 = w2 [1..n] and consider their concatenation w = w1 w2 . Assume we want to find all squares of w which cross the boundary between w1 and w2 , i.e. squares which start at some position ≤ m and end at some position > m in w (the start and end positions of a square are the positions of respectively its first and last letter). First, we divide all such squares into two categories – those centered at a position < m and those centered at a position ≥ m – and by symmetry, we concentrate on the squares centered at a position ≥ m only. We then compute the following extension functions:
– pref (i), 2 ≤ i ≤ n + 1, defined by pref (i) = max{j | w2 [1..j] = w2 [i..i + j − 1]} for 2 ≤ i ≤ n, and pref (n + 1) = 0,
– suf (i), 1 ≤ i ≤ n, defined by suf (i) = max{j | w1 [m − j + 1..m] = w[m + i − j + 1..m + i]}.
Then there exists a square with period p iff suf (p) + pref (p + 1) ≥ p
(1)
[20]. This gives the key to the algorithm: we first compute the values pref (p) and suf (p) for all possible p, which takes time O(m + n). Then we simply check inequality (1) for each p – each time it is verified, we witness new squares of period p. More precisely, whenever the inequality is verified we have identified, in general, a series (run) of squares centered at each position from the interval [m − suf (p) + p..m + pref (p + 1)]. This run is a maximal repetition in w (see [13]). Formally, this maximal repetition may contain squares centered at positions < m (if suf (p) > p), and squares starting at positions > m (if pref (p + 1) > p − 1). Therefore, if we want only squares centered at positions ≥ m and starting at positions ≤ m (as will be the case in the next section), we have to restrict the interval of centers to [max{m − suf (p) + p, m}.. min{m + pref (p + 1), m + p}]. Clearly, verifying inequality (1) takes constant time and the whole computation can be done in O(n). To find, in linear time, all squares in a word (and not only those which cross a given position), we have to combine the factorization and extension function techniques. In general terms, the idea is the following: we compute the s-factorization and process the factors one by one from left to right. For each factor fr , we consider separately those squares which occur completely inside fr , and
those ending in fr and crossing the boundary with fr−1 . The squares of the first type are computed using the fact that fr has a copy on the left – we can then retrieve those squares from this copy in time O(|fr |). The squares of the second type are computed using the extension function technique sketched above, together with an additional lemma asserting that those squares cannot extend to the left of fr by more than |fr | + 2|fr−1 | letters [20]. Therefore, finding all these squares, in the form of runs, takes time O(|fr−1 | + |fr |). The whole word can then be processed in time O(n). The reader is referred to [20,13] for full details. This general approach, initiated in [7,20], has been applied successfully to various repetition finding problems [13,14,15]. In this work we show that it can also be applied to obtain a linear-time algorithm for computing internal local periods. This gives yet another illustration of the power of the approach.
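Before turning to minimal squares, here is a small Python sketch of the boundary-crossing test (our own; the linear-time extension functions of Main and Lorentz are replaced by naive quadratic scans). It reports, for each period p, the run of crossing squares given by inequality (1), with the interval of centers restricted exactly as described above.

```python
def crossing_square_runs(w1, w2):
    """For w = w1 w2 with m = |w1|, report (p, lo, hi): squares of
    period p centered at each position in [lo..hi], with centers >= m
    and start positions <= m (inequality (1))."""
    w = w1 + w2
    m, n = len(w1), len(w2)

    def pref(i):      # longest prefix of w2 occurring at position i of w2
        if i == n + 1:
            return 0
        j = 0
        while i - 1 + j < n and w2[j] == w2[i - 1 + j]:
            j += 1
        return j

    def suf(i):       # longest suffix of w1 ending at position m + i of w
        j = 0
        while j < m and w1[m - 1 - j] == w[m + i - 1 - j]:
            j += 1
        return j

    runs = []
    for p in range(1, n + 1):
        if suf(p) + pref(p + 1) >= p:
            lo = max(m - suf(p) + p, m)
            hi = min(m + pref(p + 1), m + p)
            runs.append((p, lo, hi))
    return runs

# In w = "ab" + "ab", the single crossing square "abab" has period 2
# and is centered at position 2 = m.
assert crossing_square_runs("ab", "ab") == [(2, 2, 2)]
```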
3.2 Finding Internal Minimal Squares
We are now ready to present a linear-time algorithm for computing all internal minimal squares in a given word w. First, we compute, in linear time, the s-factorization of w without copy overlap and we keep, for each factor fr , a reference to its non-overlapping left copy. The algorithm processes all factors from left to right and computes, for each factor fr , all minimal squares ending in this factor. For each minimal square found, centered at position i, the corresponding value LPw (i) is set. After the whole word has been processed, the positions i for which the values LPw (i) have not been assigned are those positions for which no internal square centered at i exists. For those positions, minimal squares are external, and they will be computed at the second stage, presented in Section 4. Let fr = w[m + 1..m + l] be the current factor, and let w[p + 1..p + l] be its left copy (note that p + l ≤ m). If for some position m + i, 1 ≤ i < l, the minimal square centered at m + i occurs entirely inside the factor, that is LPw (m + i) ≤ min{i, l − i}, then LPw (m + i) = LPw (p + i). Note that LPw (p + i) has been computed before, as the minimal square centered at p + i ends before the beginning of fr . Based on this, we retrieve, in time O(|fr |), all values LPw (m + i) which correspond to squares occurring entirely inside fr . It remains to find those values LPw (m + i) which correspond to minimal squares that end in fr and extend to the left beyond the border between fr and fr−1 . To do this, we use the technique of computing squares described in the previous section. The idea is to compute all candidate squares and test which of them are minimal. However, this should be done carefully: as mentioned earlier, this can break down the linear time bound. The main trick is to keep squares in runs and to show that there is only a linear number of individual squares which need to be tested for minimality. As in [20], we divide all squares under consideration into those which are centered inside fr and those centered to the left of fr . The two cases are symmetric and therefore we concentrate on those squares centered at positions m..m + l − 1. In addition, we are interested in squares starting at positions ≤ m and ending inside fr . We compute all such squares in the increasing order of periods.
Fig. 1. Case where neither of inequalities (2), (3) holds (subcase k > j)
For each p = 1..l − 1 we compute the run of all squares of period p centered at positions belonging to the interval [m..m + l − 1], starting at a position ≤ m, and ending inside fr , as explained in Section 3.1. Assume we have computed a run of such squares of period p, and assume that q < p is the maximal period for which squares have been previously found. If p ≥ 2q, then we check each square of the run for minimality by inspecting the value LPw (i). If this square is not minimal, then its center i has already been assigned a value LPw (i). Indeed, if a smaller square centered at i exists, it has necessarily been computed by the algorithm already (recall that squares are computed in the increasing order of periods), and therefore a positive value LPw (i) has been set before. If no value LPw (i) has yet been assigned, then we have found the minimal square centered at i. Since there are at most p such squares of period p (their centers belong to the interval [m..m + p − 1]), checking all of them takes ≤ 2(p − q) individual checks (as q ≤ p/2 and p − q ≥ p/2). Now assume p < 2q. Consider a square sq = w[j − q + 1..j + q] of period q and center j, which has been previously found by the algorithm (the square of period q in Figure 1). We now prove that we need to check for minimality only those squares sp of period p whose center k verifies one of the following inequalities:
(2) (3)
In words, k is located either within distance p − q from j, or beyond the end of square sq . Show that one of inequations (2),(3) must hold. By contradiction, assume that neither of them holds. Consider the case k > j, case k < j is symmetric. The situation with k > j is shown in Figure 1. Now observe that word w[j + 1..k] has a copy w[j − q + 1..k − q] (shown with empty strips in Figure 1) and that its length is (k − j). Furthermore, since k − j > p − q (as inequation (2) does not hold), this copy overlaps by p − q letters with the left root of sp . Consider this overlap w[k − p + 1..k − q] (shadowed strip in Figure 1). It has a copy w[k + 1..k + (p − q)] and another copy w[k − (p − q) + 1..k] (see Figure 1). We thus have a smaller square centered at k, which proves that square sp cannot be minimal. Therefore, we need to check for minimality only those squares sp which verify, with respect to sq , one of inequations (2),(3). Note that there are at most 2(p−q)
squares sp verifying (2), and at most p − q squares sp verifying (3), the latter because sp must start before the current factor, i.e. k ≤ m + p. We conclude that there are ≤ 3(p − q) squares of period p to check for minimality, among all squares found for period p. Summing up the number of all individual checks results in a telescoping sum, and we obtain that processing all squares centered in the current factor can be done in time O(|fr |). A similar argument applies to the squares centered to the left of fr . Note that after processing fr , all minimal squares ending in fr have been computed. To sum up, we need to check for minimality only O(|fr−1 | + |fr |) squares among those crossing the border between fr and fr−1 , each check taking constant time. We also need O(|fr |) time to compute the minimal squares occurring inside fr . Processing fr then takes time O(|fr−1 | + |fr |) overall, and processing the whole word takes time O(n).
Theorem 2. In a word of length n, all internal minimal squares can be computed in time O(n).
4
Computing External Minimal Squares
In this section, we show how to compute minimal external squares for those positions which do not have internal squares centered at them. The algorithm is based on the simplified Boyer-Moore shift function, used in classical string matching algorithms [1,16].
Definition 4. For a word w of length n the simplified Boyer-Moore shift function is defined as follows [1,16]: dw (i) = min{ℓ | ℓ ≥ 1 and (for all j, i < j ≤ n, ℓ ≥ j or w[j] = w[j − ℓ])}.
In words, dw (i) is the smallest shift between the suffix v = w[i + 1..n] and its copy in w. If v has no other occurrence in w, then we look for the longest suffix of v occurring as a prefix of w. The function dw can be computed in O(n) time and space [1]. We will show that, given a word w of length n, all minimal external squares can be computed in time O(n). Consider a word w and assume that the function dw has been computed. Consider a factorization w = uv and assume that there is no internal square centered at position |u|. We first consider the case when |u| ≥ |v|, and show how to compute the minimal right external square centered at |u|.
Lemma 1. Let w = uv with |u| ≥ |v|. If there is no internal square centered at i = |u|, then the minimal right external square has period dw (i).
Proof. First note that dw (i) > |v| must hold, as otherwise there is a copy of v overlapping (or touching) the suffix occurrence of v, which implies that there is an internal square of period dw (i) centered at i, which contradicts our assumption. We now consider two cases. If dw (i) ≤ i, then there is an occurrence of v inside u and therefore u = u0 vu1 for some u0 , u1 . It follows that there is a right external
square centered at i with the root vu1 . This is the minimal such square, as the definition of dw guarantees that u1 is the shortest possible. If dw (i) > i, then v = v0 v1 and u = v1 u0 with |v1 u0 v0 | = dw (i). v1 u0 v0 forms the root of a right and left external square centered at i. Again, the existence of a smaller right external square would contradict the minimality requirement in the definition of dw . The case |u| < |v| is symmetric and can be treated similarly by considering the reverse of w. To conclude, all external minimal squares can be computed in time O(n) for those positions which do not have internal squares centered at them. We then obtain an O(n) algorithm for computing all minimal squares: first, using the algorithm of Section 3, we compute all internal minimal squares and then, using Lemma 1, we compute all external minimal squares for those positions for which no internal square has been found at the first stage. This proves the main result.
Theorem 3. In a word w of length n, all local periods LPw (i) can be computed in time O(n).
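A naive rendering of Definition 4 and Lemma 1 in Python (our own sketch; the O(n) table computation of [1] is replaced by direct search, and the toy word is our choice) may help to see the mechanics:

```python
def bm_shift(w, i):
    """Simplified Boyer-Moore shift d_w(i) of Definition 4, found by
    direct search (w is 1-indexed in the text, 0-indexed here)."""
    n = len(w)
    for ell in range(1, n + 1):
        if all(ell >= j or w[j - 1] == w[j - ell - 1]
               for j in range(i + 1, n + 1)):
            return ell

# w = "aab" at position i = 2 (u = "aa", v = "b", |u| >= |v|): no
# internal square is centered there, and by Lemma 1 the minimal right
# external square has period d_w(2) = 3 (its root is "baa").
assert bm_shift("aab", 2) == 3
```

Since ℓ = n always satisfies the condition of Definition 4, the loop is guaranteed to return.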
5
Conclusions
We presented an algorithm that computes all local periods in a word in time linear in the length of the word. This computation provides exhaustive information about the local periodic structure of the word. According to the Critical Factorization Theorem, the (global) period of the word is simply the maximum among all local periods. Therefore, as a sample application, our algorithm allows us to find all possible critical factorizations of the word. The main difficulty to overcome was to extract all shortest local squares without having to process all individual squares occurring in the word, which would break down the linear time bound. This made an off-the-shelf use of existing repetition-finding algorithms impossible, and necessitated a non-trivial modification of existing methods. An interesting research direction would be to study the combinatorics of possible sets of local periods, in a similar way as was done for the structure of all (global) periods [10,23]. The results presented in this paper might provide an initial insight for such a study.
Acknowledgments
GK, TL and AL have been supported by the French Action Spécifique “Algorithmes et Séquences” of CNRS. JPD, TL and AL have been supported by the NATO grant PST.CLG.977017. Part of this work was done during the stay of RK at LORIA in summer 2002, supported by INRIA.
References
1. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Communications of the ACM, 20:762–772, 1977.
2. Ch. Choffrut and J. Karhumäki. Combinatorics of words. In G. Rozenberg and A. Salomaa, editors, Handbook on Formal Languages, volume I, 329–438, Springer Verlag, 1997.
3. M. Crochemore and D. Perrin. Two-way string matching. J. ACM, 38:651–675, 1991.
4. M. Crochemore and W. Rytter. Text algorithms. Oxford University Press, 1994.
5. M. Crochemore and W. Rytter. Squares, cubes, and time-space efficient string searching. Algorithmica, 13:405–425, 1995.
6. M. Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12:244–250, 1981.
7. M. Crochemore. Recherche linéaire d’un carré dans un mot. Comptes Rendus Acad. Sci. Paris Sér. I Math., 296:781–784, 1983.
8. J.-P. Duval, F. Mignosi, and A. Restivo. Recurrence and periodicity in infinite words from local periods. Theoretical Computer Science, 262(1):269–284, 2001.
9. J.-P. Duval. Périodes locales et propagation de périodes dans un mot. Theoretical Computer Science, 204(1-2):87–98, 1998.
10. Leo J. Guibas and Andrew M. Odlyzko. Periods in strings. Journal of Combinatorial Theory, Series A, 30:19–42, 1981.
11. Z. Galil and J. Seiferas. Time-space optimal string matching. Journal of Computer and System Sciences, 26(3):280–294, 1983.
12. D. Gusfield. Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. Cambridge University Press, 1997.
13. R. Kolpakov and G. Kucherov. Finding maximal repetitions in a word in linear time. In Proc. of FOCS’99, New York (USA), 596–604, IEEE Comp. Soc., 1999.
14. R. Kolpakov and G. Kucherov. Finding repeats with fixed gap. In Proc. of the 7th SPIRE, La Coruña, Spain, 162–168, IEEE, 2000.
15. R. Kolpakov and G. Kucherov. Finding approximate repetitions under Hamming distance. In F. Meyer auf der Heide, editor, Proc. of the 9th ESA, Aarhus, Denmark, LNCS 2161, 170–181, 2001.
16. D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal of Computing, 6:323–350, 1977.
17. S. R. Kosaraju. Computation of squares in a string. In M. Crochemore and D. Gusfield, editors, Proc. of the 5th CPM, LNCS 807, 146–150, Springer Verlag, 1994.
18. M. Lothaire. Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and Its Applications. Addison Wesley, 1983.
19. M. Lothaire. Algebraic Combinatorics on Words. Cambridge University Press, 2002.
20. M. G. Main. Detecting leftmost maximal periodicities. Discrete Applied Mathematics, 25:145–153, 1989.
21. M. G. Main and R. J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5(3):422–432, 1984.
22. M. Rodeh, V. R. Pratt, and S. Even. Linear algorithm for data compression via string matching. Journal of the ACM, 28(1):16–24, 1981.
23. E. Rivals and S. Rahmann. Combinatorics of periods in strings. In F. Orejas, P. G. Spirakis, and J. van Leeuwen, editors, Proc. of the 28th ICALP, LNCS 2076, 615–626, Springer Verlag, 2001.
24. J. A. Storer. Data Compression: Methods and Theory. Computer Science Press, Rockville, MD, 1988.
Two Dimensional Packing: The Power of Rotation
Leah Epstein
School of Computer Science, The Interdisciplinary Center, Herzliya, Israel
[email protected]
Abstract. Recently there has been a rise in the study of two-dimensional packing problems. In such problems the input items are rectangles which need to be assigned into unit squares. However, most of the previous work concentrated on fixed items. Fixed items have a fixed direction and must be assigned so that their bottom is parallel to the bottom of the bin. In this paper we study two-dimensional bin packing of rotatable items. Those are rectangles which can be rotated by ninety degrees. We give almost tight bounds for bounded space bin packing of rotatable items, and introduce a new unbounded space algorithm. This improves the results of Fujita and Hada.
1
Introduction
Consider a situation where large sheets of paper need to be cut into smaller pages. A smaller page is cut off the large sheet so that its side is parallel to a side of the large paper. However, we are not restricted in the direction. We can rotate some of the requests by 90◦ if this makes the assignment of small pages into the large sheet easier. The difference between a page of height h and width w and a page of height w and width h is insignificant. Modeling this situation we get the familiar two-dimensional packing problem of packing rectangles into unit squares. However, our problem is slightly different as the requests may be rotated. This problem is called “rotatable items packing”, introduced and studied by Fujita and Hada [6]. The same problem is also known as “packing of non-oriented items” [4]. We use bins which are unit squares. The items are rectangles with sides bounded by 1. The items arrive one by one; each rectangle must be assigned to a bin before the next rectangle is introduced. The algorithm has to decide on a position for the rectangle in a previously opened bin, or in a new bin. This decision also involves deciding whether an item is rotated or not. The cost of the algorithm is the number of bins that were used. We study on-line algorithms, which are measured by the competitive ratio. This is the asymptotic worst case ratio between the cost of the on-line algorithm and the cost of an optimal off-line algorithm which sees the input stream as a set of items given in advance. Bounded space algorithms are algorithms which may have only a constant number of active bins. Active bins are bins that can be used to pack
Research supported in part by the Israel Science Foundation (grant no. 250/01).
new items. Other previously used bins are “closed”, which means that they cannot accommodate arriving items.
Our Results. We design a bounded space algorithm of competitive ratio at most 2935/1152 + δ ≈ 2.54775. We show that this algorithm comes very close to having the optimal competitive ratio among bounded space algorithms, by showing a lower bound of 120754/47628 ≈ 2.53537 on the competitive ratio of every such algorithm. We also design an algorithm of competitive ratio slightly below 2.45 which uses unbounded space. The first algorithm uses many ideas from the paper [6]. To improve the results we use a more advanced analysis, a better partition into types, and a technique which allows us to have a constant number of open bins. This technique is a special case of the technique developed in [5], where it allows the development of algorithms of optimal competitive ratio for non-rotatable rectangles, boxes and hyperboxes. The second algorithm tries to combine items which, roughly speaking, occupy a lot of space in any bounded space algorithm. Note that in our model, not only the assignment to a bin, but also the position of a rectangle inside the bin is decided upon arrival. Our lower bound holds also for the model where only the assignment to the bin has to be done right away, and the exact packing of each bin can be postponed till later.
Previous Work. For many years now, there has been extensive work on one-dimensional bin packing [9,10,11,15,13] and on two-dimensional bin packing of rectangles into unit squares [1,3,2,7,14]. The best on-line algorithm currently known for packing rectangles into squares achieves the competitive ratio 2.66013 and was given by [14]. Fujita and Hada [6] introduced the rotatable problem and designed two competitive algorithms. Neither of the algorithms was defined to be bounded space in their work. The first algorithm does not combine very different items in one bin, and using our techniques it can be converted into a bounded space algorithm without losing its properties and analysis. Unfortunately, there are two errors in the analysis; one of them can be fixed, but the other is more crucial. Given the current definition of the algorithm and the bounds used to analyze it, the competitive ratio should be slightly higher: instead of the at most 47/18 ≈ 2.61112 claimed in the paper, it is at most 95/36 ≈ 2.63889. It might be possible to get a better bound on the same algorithm using a more complicated analysis. The second algorithm cannot be converted into a bounded space algorithm. Unfortunately, it is not well defined in the paper and its analysis builds on the flawed analysis of the first algorithm. Its competitive ratio is claimed to be at most 100/39 ≈ 2.56411. As for lower bounds, Seiden and Van Stee [14] give bounds on the packing of squares, cubes and hypercubes, which clearly imply lower bounds on the packing of rotatable items. We focus on their results for d = 2. They show a lower bound of 2.28229 for bounded space algorithms and 1.62176 for unbounded space algorithms. The lower bound for bounded space square packing was recently improved [5] to 2.3552. Throughout the paper, we denote a rectangle by (x, y). This means that it has height x and width y. We always assume that the input is rotated in a way such that x ≥ y. The algorithms may assign a rectangle to a bin in this position
or rotated to the other position. In some cases we assume that we can split the bin into parts and rotate some of them. In practice this means that the rectangles are rotated to be placed into this part of the bin. The paper is organized as follows. We start with a proof of the lower bound. In that section we also prove some useful properties. Then we move to the bounded space algorithm, and in the following section we adapt it to combine different types of large items and show how this affects the competitive ratio.
2
A Lower Bound for Bounded Space Algorithms
In order to construct a sequence which allows us to prove the lower bound, we define seven types of relatively large rectangles. For each type we check how many instances of this rectangle can simultaneously fit into one bin (with no rectangles of other types). We will use the three following geometrical lemmas.
Lemma 1. Let γ > 0 be a small constant. Given a packing of squares of width and height 1/k + γ, where k ≥ 2 is an integer, each bin may have at most (k − 1)2 squares packed in it.
Proof. Any vertical or horizontal line through the bin can meet at most k − 1 squares. Take the projection of the squares and the bin on one axis. We get short intervals of length 1/k + γ (projections of squares) on a main interval of length 1 (the projection of the bin). Each point of the main interval can be covered by the projections of at most k − 1 squares. Consider the short intervals as an interval graph. The size of the largest clique is at most k − 1. Therefore, as interval graphs are perfect [8], we can colour the short intervals using k − 1 colours. Note that the number of intervals of each independent set is at most k − 1 (due to length), and so the total number of intervals is at most (k − 1)2 .
We omit the proofs of the two following lemmas.
Lemma 2. Consider a packing of identical rectangles of width 1/2 < w < 2/3 and height 1/3 < h < 1/2, such that w + h > 1. Each bin may have one or two rectangles.
Lemma 3. Consider a packing of identical rectangles of width 1/2 < w < 4/7 and height 1/7 < h < 1/6, such that w + 3h > 1. Each bin may have at most eight rectangles.
We turn to the specific definition of the items. Let δ > 0 be a small constant.
Type A: Rectangles of width and height 1/2 + δ. By Lemma 1, a bin can only contain a single such rectangle.
Type B: Rectangles of width and height 1/3 + δ. By Lemma 1, a bin can only contain at most four such rectangles.
Type C1 : Rectangles of width 2/3 − δ and height 1/3 + 2δ. By Lemma 2, a bin can only contain at most two such rectangles.
Type C2 : Rectangles of width 2/3 − 2δ and height 1/3 + 3δ. By Lemma 2, a bin can only contain at most two such rectangles.
Type D1 : Rectangles of width 11/21 − 4δ and height 10/63 + 2δ. By Lemma 3, a bin can only contain at most eight such rectangles.
Type D2 : Rectangles of width 32/63 − 4δ and height 31/189 + 2δ. By Lemma 3, a bin can only contain at most eight such rectangles.
Type E: Rectangles of width and height 1/7 + δ. By Lemma 1, a bin can only contain at most 36 such rectangles.
We also use tiny squares of very small height and width. The large rectangles were chosen so that one bin (of the optimal off-line algorithm) can contain exactly one of each type. Moreover, there are small gaps left in such a bin. The size of the tiny squares is picked so that they can fill up all the gaps in a bin containing one of each type of large rectangles. We show how the large rectangles can fit in a bin containing one of each type. We cut off one horizontal strip from the top of the bin. The height of this strip is 1/3 + 2δ. In this strip we assign one rectangle of type B and one C1 rectangle. From the part which is left, we cut off the rightmost part, which is a strip of height 2/3 − 2δ and width 1/3 + 3δ, and assign a rotated C2 item there. We are left with a bin of height 2/3 − 2δ and width 2/3 − 3δ. From this we cut off a horizontal strip of height 10/63 + 2δ (from the top). We use it to pack one E item and one D1 item. In the remainder of the bin we pack the A item and a rotated D2 item. Note that 31/189 + 1/2 < 2/3. The sizes of D1 and D2 were optimized so that they can fit together and with the E item, and under these conditions they need to have minimum total area. This leaves an area of V = 361/47628 − Θ(δ) ≈ 0.007579575 − Θ(δ). A sequence consists of n rectangles of each type, followed by the tiny squares, i.e. n rectangles of type A followed by n rectangles of type B, then C1 , C2 , D1 , D2 , and E (n items each), and finally tiny squares of total area V n. We can now compute the value of the lower bound.
Theorem 1. The competitive ratio of any on-line bounded space algorithm for packing of rotatable items is at least 120754/47628 ≈ 2.535357.
Proof. A bounded space algorithm can keep at most some constant number of bins open. Therefore each type of items is packed separately (except for a small number of items). A direct calculation gives a lower bound on the number of bins used by the on-line algorithm. We omit the details.
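The omitted calculation can be reproduced mechanically. Under our reading of the proof — the on-line algorithm asymptotically needs n, n/4, n/2, n/2, n/8, n/8 and n/36 bins for types A, B, C1 , C2 , D1 , D2 and E respectively, plus about V n bins for the tiny squares, against n off-line bins — exact rational arithmetic in Python recovers the stated bound:

```python
from fractions import Fraction as F

V = F(361, 47628)                      # area left for the tiny squares
bins_per_n = [F(1, c) for c in (1, 4, 2, 2, 8, 8, 36)]   # types A..E
ratio = sum(bins_per_n) + V
assert ratio == F(120754, 47628)       # = 2.5353573... as in Theorem 1
```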
3
A Bounded Space Algorithm
The algorithm classifies items by their size and packs each class separately. To do the classification, we use two parameters ε and M . M ≥ 10 is a positive integer, and ε > 0 is a small constant. The value of ε is chosen so that 1/(εi(i + 1)) is an integer for all i < M . The role of ε is to control the additive constant of the algorithm. As ε becomes smaller, there are more open bins. On the other hand, we bound the amount of occupied space in each bin. In those calculations, an exact computation would derive values which depend on ε. For simplicity, we assume that ε is small enough and neglect it; this can change the competitive
ratio by a very small constant (tending to zero as ε becomes smaller). The value of M also influences the number of classes. A safe choice for M is M = 20, and we use this value. We use the following terms in order to analyze the algorithm.
Occupation Ratio. For a class (or a subclass) of items, this is the minimum total area of items in a closed bin used to pack items of this class.
Weight. We give each item (assigned to a bin) a weight which is the fraction of a bin which it occupies. This is not the area of the item, but the fraction of the bin that it actually uses (which can be larger than its area). E.g. an item that is the single item in a bin gets weight one. When the algorithm terminates, some bins are active and did not receive the full amount of items that other bins with similar contents did. We ignore these bins; there is only a constant number of them. Some bins will contain a fixed number q of items (that are defined to have similar properties; the number q depends on the items). In this case the weight of each item is 1/q. For a bin B that contains a variety of items, if the occupation ratio is ORB , we simply give an item of area r the weight r/ORB .
Expansion. The expansion of an item is the ratio between its weight and its area.
As a first step, we would like to identify all cases where the occupation ratio is low, i.e. items with high expansion. This type of algorithm can be analyzed using the weighting method introduced already in [10] and further developed in [13]. We use the following theorem.
Theorem 2. Given a bin packing algorithm, define a weight function on items (rectangles with 0 < h ≤ w ≤ 1) such that for all bins packed by the algorithm (except a constant number of bins), the sum of weights of items in the bin is at least 1. Consider the (infinite) set of all possible (finite) sets of items such that there exists a feasible packing of them into a single bin. For each such set, define its total weight as the sum of weights of all items. Then the competitive ratio of the algorithm is bounded from above by the supremum of the weights of those sets.
Next we describe the classification. Each rectangle is classified according to its height and width. For a rectangle (x, y), let i be an integer such that 1/(i + 1) < x ≤ 1/i and let j be an integer such that 1/(j + 1) < y ≤ 1/j; clearly i ≤ j (as x ≥ y). If j < M , the class of rectangles with the given values i and j is called (large) class (i, j). If j ≥ M but i < M , the class is the medium class i, and otherwise we find an integer p ≥ 0 such that 1/(2M ) < x2p ≤ 1/M . Let i′ be an integer, M ≤ i′ ≤ 2M − 1, so that 1/(i′ + 1) < x2p ≤ 1/i′ . Then (x, y) belongs to small class i′ . We define the packing of each class of each size. For “medium” and “large” classes, we use either “simple packing” or “advanced packing”. The specific decision depends on the properties of each class. As a rule, in order to get a low competitive ratio we need to be more careful with packing relatively large items, and therefore some classes of large items will be split into sub-classes. The exact definition will be given later. We would like to make sure that the expansion ratio of all small and medium items is at most 1.5. For large items this cannot be true and therefore we need to use a deeper analysis for those items.
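The classification is easy to mechanize. The following Python sketch (ours, not from the paper) mirrors the definitions above, with the caveat that at the boundary values 1/k exact rational arithmetic would be needed instead of floats.

```python
import math

M = 20  # the paper's choice

def classify(x, y):
    """Class of a rectangle (x, y) with 0 < y <= x <= 1."""
    i = math.floor(1 / x)        # 1/(i+1) < x <= 1/i
    j = math.floor(1 / y)        # 1/(j+1) < y <= 1/j
    if j < M:
        return ("large", i, j)
    if i < M:
        return ("medium", i)
    p = 0                        # find p with 1/(2M) < x * 2**p <= 1/M
    while x * 2 ** p <= 1 / (2 * M):
        p += 1
    i_prime = math.floor(1 / (x * 2 ** p))   # M <= i' <= 2M - 1
    return ("small", i_prime, p)

assert classify(0.4, 0.3) == ("large", 2, 3)
assert classify(0.4, 0.04) == ("medium", 2)
assert classify(0.03, 0.02) == ("small", 33, 0)
```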
We start with defining the simple packing methods for large and medium items. As simple packing of large class items is easiest, and advanced packing of such items is the most complicated, we start and end with large items. The simple packing of a large class bin is natural. There is always at most one open bin used to pack this class. The bin is partitioned into exactly ij identical rectangles (of height 1/i and width 1/j). The partition is done when the bin is opened, by cutting it into i identical rows and j identical columns. Each slot can accommodate exactly one item. After assigning ij items, the bin is closed and a new bin is opened for this class. For a class (i, j), a closed bin has ij items of area at least 1/((i + 1)(j + 1)) each. Therefore the weight of each item is 1/(ij), the occupation ratio is ij/((i + 1)(j + 1)), and the expansion of items is (i + 1)(j + 1)/(ij). The simple packing of a medium class is done as follows; we use it for i ≥ 3. For each 3 ≤ i < M , we keep one open bin. This bin is initialized by cutting it into i horizontal strips of height 1/i and width 1. The items are packed into the strips in an any-fit fashion. Each strip is seen as a bin of one dimension (the width). Each item is packed immediately to the right of the previous item in the strip. All items in a medium class i are assigned to such a bin and packed into one of the strips (using any-fit), ignoring their heights. When an item does not fit into any of the strips, the bin is closed and a new bin for medium class i is initialized. In a closed bin, each strip is full to at least i/(i + 1) of its height, and to at least 1 − 1/M of its width (an item which did not fit has width at most 1/M ). This gives an occupation ratio of at least (3/4) · (M − 1)/M ≥ 57/80 ≥ 2/3. Advanced packing of medium classes is done for i = 1 and i = 2. We first consider medium class 1. In this case the height of a rectangle is classified further into Θ(1/ε) subclasses. Let α be an integer such that 1 ≤ α ≤ 1/(2ε). The subclass α consists of rectangles (x, y) such that x ∈ (1/2 + (α − 1)ε, 1/2 + αε]. The bin is split into one part of width 1 and height 1/2 + αε, and another part of height 1/2 − αε and width 1/2 + αε. (Some space remains unused.) After rotation we have two strips (one wide and one narrow) of height 1/2 + αε. We use them one-dimensionally, applying any-fit. When the bin is closed, each strip is full to at least its width minus 1/M . This gives a total occupied area of at least (1/2 + (α − 1)ε)(1 − 1/M + 1/2 − αε − 1/M ). The function x(1 − 1/M + 1 − x − 1/M ) is at least 7/10 > 2/3 (achieved for x = 0.5) for M = 20 and 1/2 < x ≤ 1. For i = 2, the height is also classified further into Θ(1/ε) subclasses. Let α be an integer such that 1 ≤ α ≤ 1/(6ε). The subclass α consists of rectangles (x, y) such that x ∈ (1/3 + (α − 1)ε, 1/3 + αε]. Given a subclass α, we partition the bin into two parts of width 1 and height 1/3 + αε, and two parts of height 1/3 − 2αε and width 1/3 + αε. Again some space remains unused. After rotation we have two wide and two narrow strips of height 1/3 + αε. Those strips are used in an any-fit fashion, in one dimension, to pack items of medium class 2. Similarly to the previous case, we get the function x(2 + 2(1 − 2x) − 4/M ) for 1/3 < x ≤ 1/2. This gives an occupied area of at least 37/45 > 2/3 (the minimum is obtained for x = 1/3). The packing of small items is done as follows. For a small class i the initial partition is the same as for medium classes with i ≥ 3, but the strips get further
partitioned. Packing into strips is again done in one dimension. For each i such that M ≤ i ≤ 2M − 1, there is a single open bin dedicated to it. When such a bin is initialized, it is split into i horizontal strips of height 1/i and width 1. The items which are assigned to this bin are in class i, i.e. for an item (x, y) there exists an integer p ≥ 0 such that 1/(i + 1) < x2p ≤ 1/i. The strips will have heights of the form 1/(i2k ). On arrival of an item there are several cases. Given p, if there is an open strip of height 1/(i2p ), and the item fits there, it is simply packed in it. Otherwise (no such strip, or the item does not fit), an empty strip of smallest height 1/(i2p′ ) that is still larger than 1/(i2p ) (i.e. largest p′ < p) is picked, and partitioned into horizontal strips of heights 1/(i2p′+1 ), . . . , 1/(i2p ). An additional strip of height 1/(i2p ) is created in this process, and it is used to pack the new item. Finally, if this is impossible (no such p′ exists), the bin is closed and a new one is initialized. For a bin for small classes, used for a small class i, M ≤ i ≤ 2M − 1, we can analyze the situation when the bin is closed. Note that an empty strip of some height 1/(i2k ), k ≥ 1, is created only if no such empty strip exists. When a strip of height 1/(i2p′ ) is partitioned in order to get a strip of height 1/(i2p ), there are no empty strips of heights 1/(i2p′+1 ), . . . , 1/(i2p ). The only height of which two identical strips are created is 1/(i2p ), but one of them is immediately used. Hence only one such empty strip exists at a time. A direct calculation shows that all closed bins for small classes have an occupation ratio of at least 2/3. We omit the details. As mentioned before, for some of the large classes we do not use the simple packing method, but advanced packing. We define and analyze it now. The choice of those classes is done in a way that makes the occupation ratio large enough for most items. Specifically, we use advanced packing for the following large classes (we picked M = 20, so all the classes listed below are large):
1. i = 1 and 2 ≤ j ≤ M − 1 = 19,
2. i = 2 and 3 ≤ j ≤ M − 1 = 19,
3. i = 3 and 4 ≤ j ≤ 7.
For items in classes (i, i) (i < M ), rotation or further classification does not help, so we pack them using simple packing. Other classes that are not discussed here already have an occupation ratio of at least 2/3. Each large class (i, j) (i ≤ j) that is packed by advanced packing is further partitioned into Θ(1/ε2 ) subclasses. Let α, β be integers such that 1 ≤ α ≤ 1/(εi(i + 1)) and 1 ≤ β ≤ 1/(εj(j + 1)). The subclass (i, j, α, β) consists of rectangles (x, y) such that x ∈ (1/(i + 1) + (α − 1)ε, 1/(i + 1) + αε] and y ∈ (1/(j + 1) + (β − 1)ε, 1/(j + 1) + βε]. We describe two advanced methods: “side by side” packing and “round” packing. Round packing is used only for (i = 1, j ≤ 10), (i = 2, j = 3, 4) and (i = 3, j = 4). We start with side by side packing. Consider a subclass (i, j, α, β); we describe a partition of a bin into parts which can contain one item each. The size of each part is at least (xα , yβ ), where xα = 1/(i + 1) + αε and yβ = 1/(j + 1) + βε. We first cut the bin into two horizontal strips of heights ixα and 1 − ixα and width 1. The first one is further cut vertically into j identical parts, and
Two Dimensional Packing: The Power of Rotation
405
horizontally into i identical parts. In result we get ij identical parts of width 1/j and height xα . The other part is cut into 1 − ixα /yβ horizontal strips, each of height at least yβ , and further cut into parts of width 1/i. This adds i1 − ixα /yβ parts which can keep one item each. To compute the occupation ratio we see that the total area packed in a closed bin is at least i(xα −ε)(yβ −ε)(j+(1 − ixα )/yβ −1) which is close to ixα yβ (j−1)+ ixα (1−ixα ) for small ε. Given that yβ > 1/(j+1) we get at least ixα (2j/(j + 1)− ixα ). This is at least min{(j − 1)/(j + 1), (i2 j + 2ij − i2 )/((i + 1)2 (j + 1))}. We get at least (j − 1)/(j + 1) for j ≤ 2i + 1. Hence for (i = 2, j = 5) and (i = 3, j = 5, 6, 7) we get occupation ratios of at least 2/3. Otherwise we get at least (i2 j + 2ij − i2 )/((i + 1)2 (j + 1)) and so for (i = 1, j ≥ 11) and (i = 2, j ≥ 6) we get occupation ratios of at least 2/3. Next we describe the “round packing” method. Given sizes s, t such that s+t ≤ 1, we fit two rectangles of width s and height t in the upper left and lower right corners, and two rectangles of width t and height s in the other corners. Since rotation is allowed, we can assume that we thus have four identical areas. Given i, j, α, β we can find positive integers k and r such that kxα + ryβ ≤ 1, we define s = kxα , t = ryβ and get four areas such that each area can contain rk items. In total we can pack 4rk items of subclass (i, j, α, β). Note that using side by side packing we can pack i(j + r) items when r is the largest integer satisfying ixα + ryβ ≤ 1. For i = 3, j = 4 we can pack 12 or 15 items using side by side packing and 16 items using round packing. The case 2xα + 2yβ > 1 implies 3xα + yβ > 1 (as xα > yβ ) so we need to consider only two options. We get 12 items if 2xα + 2yβ > 1, and we get 16 items if 2xα + 2yβ ≤ 1. In the first case the minimum for 12xα yβ (which is the occupation ratio) is 18/25 > 2/3. If 16 items fit, the occupation ratio is at least 16 · (1/5) · (1/4) = 4/5. For i = 2, j = 3, six items are assigned by simple packing. If xα + 2yβ ≤ 1, round packing allows to pack eight items. The first option gives occupation ratio of at least 6/9 = 2/3, the second gives at least 1/3 · (1/4) · 8 = 2/3 as well. Simple packing may be used if the inequality does not hold, and round packing if it does. For i = 2, j = 4, if xα + 2yβ > 1, simple packing assigns eight items. If 2xα + yβ ≤ 1 and xα + 3yβ > 1, we can assign ten items using side by side packing. If xα + 3yβ ≤ 1 we can pack 12 items using round packing. We get occupation ratios of 16/25, 20/27 and 4/5. For i = 1 we use either a simple packing of j items (if xα + yβ > 1) or if xα + ryβ ≤ 1 but xα + (r + 1)yβ > 1, we can choose between packing j + r items side by side or 4r items using round packing. Therefore, if j ≥ 3r we prefer side by side packing and otherwise round packing. For all simple packing cases we get xα +yβ > 1 and j items packed. This gives an occupation ratio of at least j 2 /((j + 1)2 ). If xα + yβ ≤ 1 (but xα + 2yβ > 1) we use round packing for j = 2 only, and for this case get occupation ratio of at least 2/3. For j = 3 we pack four items using round or side by side packing and get occupation ratio of at least 1/2. In all other cases where xα + yβ ≤ 1 and xα + 2yβ > 1 we get an occupation ratio of at least (j − 1)/(j + 1).
406
Leah Epstein
The case xα + 2yβ ≤ 1 and xα + 3yβ > 1 is relevant for j ≥ 4. If 4 ≤ j ≤ 5 we can use round packing and pack eight items. This gives occupation ratio of at least 2/3 (xα > 1/2 and yβ > 1/6). We get occupation ratio of at least (j 2 − 4)/((j + 1)2 ) for j ≥ 6. For j = 6 this gives 32/49 and for j ≥ 7, (j 2 − 4)/((j + 1)2 ) ≥ 2/3 The case xα + 3yβ ≤ 1 and xα + 4yβ > 1 is relevant for j ≥ 6. For 6 ≤ j ≤ 8 we pack 12 items using round packing. The occupation ratio is at least 2/3. For 9 ≤ j ≤ 10 we pack j + 3 items using side by side packing, and get occupation ratio of at least (j 2 − 9)/((j + 1)2 ) > 2/3 for both cases (j = 9 and j = 10). The last option xα + 4yβ ≤ 1 is useful only in some cases where 8 ≤ j ≤ 10. In this case, 16 items are packed using round packing. We get an occupation ratio of at least 8/11 ≥ 2/3. We analyze the competitive ratio using Theorem 2. Due to space constraints we omit the analysis which leads to the following theorem. Theorem 3. The competitive ratio of the above algorithm is at most
2935 1152
+ δ.
It is unclear whether an algorithm of this type can have the best performance among bounded space algorithms. It seems that a different algorithm could pack some items better. However, as we saw in the lower bound section, the performance cannot be improved by very much. Note that it is difficult to analyze algorithms for rotatable items using a computer program. For the one-dimensional case, and also for the standard twodimensional case, there is a fixed set of sizes which is critical. Those are sizes of the structure 1/k + δ (for small δ > 0). However, as we saw in the lower bound section, there are no such rules for rotatable items.
4
An Unbounded Space Algorithm
In the previous sections we saw that the drawback of bounded space algorithms is having to pack different types of items separately. As we saw earlier, the items with largest expansion are the relatively small items which still belong to the (1,1) class. We define an algorithm which combines large but different items together. The idea is to combine not all, but a certain percentage of relatively large items in the same bins. This is similar to the algorithms “Refined Harmonic” [11] and “Modified Harmonic” [12]. Our algorithm packs most items as in the previous algorithm. Some items in the largest three classes: (1,1), (1,2) and (2,2) are packed differently and combined. The analysis uses a generalized form of Theorem 2 and gives a competitive ratio slightly below 2.45. The details are omitted.
5
Conclusion
We showed an almost optimal algorithm (in terms of competitive ratio) for bounded space packing of rotatable items. It is interesting to find an algorithm which can be shown to be optimal. It could be the case that our algorithm, or a simple adaptation, could be used for this purpose. Another interesting issue is
Two Dimensional Packing: The Power of Rotation
407
simplifying the analysis of unbounded space algorithms (i.e. reducing the problem to a given set of possible items as in [13]) in a way that a computer program will be able to perform the analysis. This will allow a deeper analysis which can take smaller items into account, and should result in smaller competitive ratios. A natural extension of the problem is packing rotatable boxes in three dimensional cubes, and packing rotatable d-dimensional hyper-boxes in d-dimensional hyper-cubes. Most methods used in this paper can be applied for more dimensions as well, however there may be many cases to check and it seems that a good analysis can be done once a method to computerize such calculations is developed.
Acknowledgement The author would like to thank the MFCS referee who gave many helpful comments.
References 1. D. Coppersmith and P. Raghavan. Multidimensional online bin packing: Algorithms and worst case analysis. Operations Research Letters, 8:17–20, 1989. 2. J. Csirik and A. van Vliet. An on-line algorithm for multidimensional bin packing. Operations Research Letters, 13(3):149–158, Apr 1993. 3. J. Csirik, J. B. G. Frenk, and M. Labbe. Two dimensional rectangle packing: on line methods and results. Discrete Applied Mathematics, 45:197–204, 1993. 4. M. Dell’Amico, S. Martello, and D. Vigo. A lower bound for the non-oriented two-dimensional bin packing problem. Discrete Applied Mathematics, 118:13–24, 2002. 5. L. Epstein and R. van Stee. Optimal online bounded space multidimensional packing. Technical Report SEN-E0303, CWI, Amsterdam, 2003. 6. S. Fujita and T. Hada. Two-dimensional on-line bin packing problem with rotatable items. Theoretical Computer Science, 289(2):939–952, 2002. 7. G. Galambos and A. van Vliet. Lower bounds for 1-, 2-, and 3-dimensional online bin packing algorithms. Computing, 52:281–297, 1994. 8. T. R. Jensen and B. Toft. Graph coloring problems. Wiley, 1995. 9. D. S. Johnson. Near-optimal bin packing algorithms. PhD thesis, MIT, Cambridge, MA, 1973. 10. D. S. Johnson, A. Demers, J. D. Ullman, Michael R. Garey, and Ronald L. Graham. Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM Journal on Computing, 3:256–278, 1974. 11. C. C. Lee and D. T. Lee. A simple online bin packing algorithm. Journal of the ACM, 32:562–572, 1985. 12. P. Ramanan, D. J. Brown, C. C. Lee, and D. T. Lee. Online bin packing in linear time. Journal of Algorithms, 10:305–326, 1989. 13. S. S. Seiden. On the online bin packing problem. Journal of the ACM, 49(5):640– 671, 2002. 14. S. S. Seiden and R. van Stee. New bounds for multi-dimensional packing. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’2002), pages 486–495, 2002. 15. A. van Vliet. An improved lower bound for online bin packing algorithms. Information Processing Letters, 43:277–284, 1992.
Approximation Schemes for the Min-Max Starting Time Problem Leah Epstein1 and Tamir Tassa2 1
2
School of Computer Science, The Interdisciplinary Center P.O.B 167, 46150 Herzliya, Israel
[email protected] Department of Applied Mathematics, Tel-Aviv University Ramat Aviv, Tel Aviv, Israel
[email protected] Abstract. We consider the off-line scheduling problem of minimizing the maximal starting time. The input to this problem is a sequence of n jobs and m identical machines. The goal is to assign the jobs to the machines so that the first time in which all jobs have already started their processing is minimized, under the restriction that the processing of the jobs on any given machine must respect their original order. Our main result is a polynomial time approximation scheme for this problem in the case where m is considered as part of the input. As the input to this problem is a sequence of jobs, rather than a set of jobs where the order is insignificant, we present techniques that are designed to handle ordering constraints. Those techniques are combined with common techniques of assignment problems in order to yield a polynomial time approximation scheme.
1
Introduction
Consider the following scenario: a computer operator needs to run an ordered sequence of n jobs having known processing times. The operator may assign the jobs to one of m identical and parallel machines, where each job must be executed continuously and completely on one machine. After the jobs have been assigned to the m machines, they must run on each machine according to their original order. The operator needs to verify that each job has started running before he may go home. The goal of the operator is to minimize the time when he could go home. Hence, he aims at finding an assignment that minimizes the maximal starting time, namely, the maximum over all machines of the time in which the last job assigned to that machine starts running. The above scenario may be generalized to any setting where n clients are registered in a service center having m servers; e.g., patients in a clinic where there are m doctors or drivers that bring their car to a garage where there are m service stations. Each client has a priority that could be determined, for example, by the time the client has called in to make an appointment. The clients need
Research supported in part by the Israel Science Foundation (grant no. 250/01).
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 408–418, 2003. c Springer-Verlag Berlin Heidelberg 2003
Approximation Schemes for the Min-Max Starting Time Problem
409
to be assigned to servers according to their estimated service time so that the time in which the last client starts being served is minimized. That could be the relevant cost function if the waiting clients need to be attended (for example, if the receptionist in the clinic needs to stay and watch the patients that are still waiting, he would aim at assigning the clients to doctors so that he could leave as early as possible). An assignment constraint is that a client with a lower priority cannot be served before a client of a higher priority by the same server. We note that a similar off-line problem with ordering constraint was presented in [10]. There, the n jobs were to be assigned to m = 2 machines and be processed on each machine according to their original order so that the sum of all completion times is minimized. Many scheduling problems have been studied both in off-line [3,4,6,12,7] and on-line environments. In on-line environments, the jobs usually arrive in a sequence. In off-line environments, however, the input is usually a set of jobs, where the order of jobs is insignificant. The min-max starting time problem was first introduced as an on-line problem in [2]. Here, we study the off-line version of the problem where the input is still viewed as a sequence of jobs and order does matter. Note that if we disregard order in this problem, namely, if we view the sequence of jobs as merely a set of jobs, then it becomes equivalent to the standard makespan problem. Indeed, the non-ordered min-max starting time problem may be reduced to the makespan problem if we remove the m largest jobs and then solve a makespan minimization problem for the remaining n − m jobs. On the other hand, the makespan problem may be reduced to the non-ordered min-max starting time problem by adding m additional jobs of very large size to the given set of jobs. In view of the above, the non-ordered version of our problem is strongly NP-hard, and, consequently, so is the ordered version which we study herein. Due to the strong NP-hardness of the problem, Polynomial Time Approximation Schemes (PTAS) are sought. Such schemes aim at finding a solution whose target function value is larger than the optimal value by a factor of no more than (1 + ε), where ε > 0 is an arbitrarily small parameter. The run-time of such schemes depends polynomially on n and m, but it depends exponentially on 1/ε. In case m is viewed as a constant, one usually aims at finding a Fully Polynomial Time Approximation Scheme (FPTAS), the run time of which depends polynomially on n and 1/ε, but is exponential in the constant m. For example, PTAS for the classical makespan problem were designed by Hochbaum and Shmoys [6,5], while an FPTAS to that problem was presented by Graham in [4] and later by Sahni in [12]. Regarding the on-line version of our problem, an algorithm of competitive approximation ratio 12 is presented in [2]. It is also shown there that a greedy algorithm that performs list scheduling on the sequence [3] has a competitive ratio of Θ(log m). In this paper we design two approximation schemes: a PTAS for the case where m is part of the input and an FPTAS for the case where m is constant (the FPTAS is omitted due to space limitations). In doing so, we employ several techniques that appeared in previous studies: rounding the job sizes [6], distin-
410
Leah Epstein and Tamir Tassa
guishing between small and large jobs and preprocessing the small jobs [5,1], and enumeration. However, the handling of sequences of jobs, rather than sets thereof, is significantly more delicate: the small jobs require a different handling when order matters and one must keep track of the index of the final job that is scheduled to run on each machine in order to guarantee the legality of the assignment. The techniques presented here address those issues. Note that there exists some work on off-line scheduling of sequences. One such example is the precedence constraints problem. The jobs in that problem are given as the vertices of a directed acyclic graph. An edge (a, b) means that job a must be completed before job b is started. The goal is to minimize the makespan. It was proved in [9] that it is hard to achieve for this problem an approximation factor smaller than 4/3, unless P = N P . Schuurman and Woeginger [13] mention this problem as the first among ten main open problems in approximation of scheduling problems. Specifically they ask whether the problem can be approximated up to a factor smaller than 2 − 1/m, an approximation factor that is achieved by the on-line List Scheduling algorithm [3]. We proceed by presenting a formal definition of the problem. In the Minmax Starting Time Problem one is given a sequence of n jobs with processing times pi , 1 ≤ i ≤ n, and m < n identical machines, Mk , 1 ≤ k ≤ m. An assignment of the jobs to the machines is a function A : {1, . . . , n} → {1, . . . , m}. The subset of jobs that are scheduled to run on machine Mk is {pi : i ∈ A−1 (k)} where A−1 (k) = {i : A(i) = k}. The jobs are processed on each machine in the order that corresponds to their index. Hence, the index of the last job to run on machine Mk is given by fk = max{A−1 (k)}. Such jobs are referred to as the final jobs for assignment A. The time in which the final job on machine Mk will start running is given by Fk = pi . The goal is to find an assignment i∈A−1 (k)\{fk }
A such that the time in which the last final job starts running is minimized. Namely, we look for an assignment for which T (A) := max Fk is minimized. 1≤k≤m
The discussion of this problem becomes easier if we assume that the given problem instance is collision-free in the sense that all processing times are different, i.e., pi = pj for i = j. Such an assumption also allows us to identify a job with its processing time. In order to rely upon that assumption, we show how to translate a problem instance having collisions into another one that is collision-free and has a close target function value. To that end, define ∆ = min1≤i,j≤n {pi , |pi −pj |}. If C = {pi }1≤≤c is a cluster of colliding jobs, pi1 = pi }1≤≤c where pˇi = . . . = pic , we replace that cluster of jobs with Cˇ = {ˇ pi + ( − 1) · ε∆ , where 0 < ε ≤ 1. By the definition of ∆, it is clear that after ap2 n plying this procedure to all clusters among {p1 , . . . , pn }, we get a new sequence of perturbed jobs {ˇ p1 , . . . , pˇn } that is collision-free. Moreover, as 0 ≤ pˇi − pi < ε∆ n , we conclude that the value of the target function may increase in wake of such a perturbation by no more than ε∆. Since it is clear that all assignments satisfy T (A) ≥ ∆ (recall that n > m), we may state the following: Proposition 1. Let A be an assignment of {p1 , . . . , pn }, the original sequence of jobs, and let Aˇ be the corresponding assignment of the perturbed sequence of ˇ ≤ (1 + ε) · T (A) , 0 < ε ≤ 1. jobs, {ˇ p1 , . . . , pˇn }. Then T (A) ≤ T (A)
Approximation Schemes for the Min-Max Starting Time Problem
411
In view of the above, we assume henceforth that all processing times are different and we maintain the original notation (i.e., pi and not pˇi ). In the subsequent section we describe a PTAS for the case where m is part of the input. The FPTAS the case where m is constant appears in the full version of this paper.
2
A Polynomial Time Approximation Scheme
To facilitate the presentation of our PTAS, we introduce the following notations: 1. Given an assignment A, FA denotes the subset of indices of final jobs, FA = {fk : 1 ≤ k ≤ m}. 2. FAc denotes the complement subset of indices of non-final jobs. 3. plnf denotes the size of the largest non-final job, plnf = max{pi : i ∈ FAc }. 4. J m+1 = {pj1 , . . . , pjm+1 } denotes the subset of the m + 1 largest jobs. The above definitions imply that one of the jobs in J m+1 has processing time plnf . The main loop in the algorithm goes over all jobs p ∈ J m+1 and considers assignments in which plnf = p. Obviously, by doing so, we cover all possible assignments. For a given value of plnf , we may conclude that all corresponding assignments A satisfy T (A) ≥ plnf . In view of that, we decompose the set of jobs {p1 , . . . , pn } to small and large jobs as follows: pi : pi ≤ εplnf
S=
,
L=
pi : pi > εplnf
.
(1)
Discretizing the Large Jobs. Given a lower bound plnf , we discretize the processing times of all large jobs that are smaller than plnf . Namely, we treat all jobs pi for which εplnf < pi < plnf . To this end, we define a geometric mesh on the interval [ε, 1], ξ0 = ε ;
ξi = (1 + ε)ξi−1 ,
and then, for all p ∈ L,
p =
1≤i≤q ;
plnf · H p/plnf p
q :=
if p < plnf otherwise
− lg ε lg(1 + ε)
,
,
(2)
(3)
where H replaces its argument by the left end point of the interval [ξi−1 , ξi ) where it lies. Note that if p < plnf then p belongs to a finite set of size q, Ω = {ξ0 · plnf , . . . , ξq−1 · plnf }. With this definition, we state the following straightforward proposition. Proposition 2. For a given 0 < ε ≤ 1 and plnf ∈ J m+1 , let S and L be as in (1), and L = {p : p ∈ L} . (4)
412
Leah Epstein and Tamir Tassa
Let A be an assignment of the original jobs, S ∪L, and let A be the corresponding assignment of the modified jobs S ∪ L . Then T (A ) ≤ T (A) ≤ (1 + ε) · T (A ). Preprocessing the Small Jobs. Denote the subsequence of indices of small jobs by i1 < i2 < . . . < ib , where b = |S|. We describe below how to modify the small jobs into another set of jobs, the size of which is either εplnf or 0. To that end, let σr denote the sum of the first r small jobs, r σr = pik 0≤r≤b. (5) k=1
The modified small jobs are defined as follows: εplnf if σr /εplnf > σr−1 /εplnf . (6) Sˆ = {ˆ pi1 , . . . , pˆib } where pˆir = 0 otherwise Proposition 3. Let A be an assignment of the original jobs, S ∪ L, with target ˆ of the value T (A). Then for every 0 < ε ≤ 1 there exists a legal assignment, A, ˆ modified jobs, S ∪ L, such that ˆ ≤ T (A) + 2εplnf . T (A)
(7)
Notation Agreement. Each assignment A : {1, . . . , n} → {1, . . . , m} induces a unique function from the set of processing times {p1 , . . . , pn } to the set of machines {M1 , . . . , Mm }. In view of our assumption of distinct processing times, the converse holds as well. In order to avoid cumbersome notations, we identify between those two equivalent functions and use the same letter to denote them both. For example, notations such as A(i) or A−1 (k) correspond to the indexindex interpretation, while A : S ∪ L → {M1 , . . . , Mm } corresponds to the jobmachine interpretation. Proof. The order of the machines. Consider the subset of indices of final jobs, FA . Without loss of generality, we assume that they are monotonically increasing, i.e., f1 < f2 < . . . < fm . ˆ Define the prefix subsets and sums of A, Description of A. Ak = {pi : pi ∈ S and A(i) ≤ k} ,
τk =
pi
0≤k≤m.
(8)
pi ∈Ak
Namely, Ak denotes the prefix subset of small jobs that are assigned by A to one of the first k machines, while τk denotes the corresponding prefix sum. Next, we define σr τk = 0 ≤ k < m and r(m) = b , (9) r(k) = min r : εplnf εplnf where σr is given in (5) and b is the number of small jobs. Obviously,
Approximation Schemes for the Min-Max Starting Time Problem
0 = r(0) ≤ r(1) ≤ . . . ≤ r(m) = b . Next, we define the assignment Aˆ : Sˆ ∪ L → {M1 , . . . , Mm } as follows: ˆ A(p) = A(p)
∀p ∈ L ,
413
(10)
(11)
namely, it coincides with A for the large jobs, while for the modified small jobs ˆ pi ) = k A(ˆ r
for all 1 ≤ r ≤ b such that r(k − 1) + 1 ≤ r ≤ r(k).
(12)
ˆ Finally, for each of Note that (10) implies that (12) defines Aˆ for all jobs in S. the machines, we rearrange the jobs that were assigned to it in (11) and (12) in an increasing order according to their index. ˆ Similarly to (8), we define the prefix subsets and The prefix sets and sums of A. ˆ sums for the modified small jobs Sˆ and assignment A: ˆ ≤ k} , τˆk = pi : pˆi ∈ Sˆ and A(i) pˆi 0 ≤ k ≤ m . (13) Aˆk = {ˆ ˆk pˆi ∈A
Denote the largest index of a job in Ak by it(k) and the largest index of a job in Aˆk by itˆ(k) . As (12) implies that tˆ(k) = r(k) while, by (9), r(k) ≤ t(k), we conclude that tˆ(k) ≤ t(k). ˆ Given an assignment A : S ∪ L → {M1 , . . . , Mm } The Min-Max start time of A. ˆ we defined an assignment A : Sˆ ∪ L → {M1 , . . . , Mm }. Let Mk be an arbitrary machine, 1 ≤ k ≤ m. The final job in that machine, corresponding to the first assignment, is pfk , and its start time is θk = k + τk − τk−1 − pfk , where k is the sum of large jobs assigned to Mk . Similarly, letting pˆfˆk denote the final job in that machine as dictated by the second assignment, its start time is θˆk = k + τˆk − τˆk−1 − pˆfˆk . In order to prove (7) we show that θˆk ≤ θk + 2εplnf .
(14)
First, we observe that in view of the definition of the modified small jobs, (6), and r(k), (9), τˆk = εplnf · τk /εplnf . Consequently, τˆk − εplnf < τk ≤ τˆk .
(15)
Therefore, it remains to show only that pfk − pˆfˆk ≤ εplnf
(16)
in order to establish (14). If pfk ∈ S then (16) is immediate since then pfk − pˆfˆk ≤ pfk ≤ εplnf . Hence, we concentrate on the more interesting case where pfk ∈ L. In this case ˆ We claim that it is in fact also the final the job pfk is assigned to Mk also by A. job in that latter assignment, namely, pˆfˆk = pfk . This may be seen as follows:
414
Leah Epstein and Tamir Tassa
◦ The indices of the modified small jobs in Mk are bounded by itˆ(k) . ◦ As shown earlier, itˆ(k) ≤ it(k) . ◦ it(k) < fk since the machines are ordered in an increasing order of fk and, therefore, the largest index of a small job that is assigned to one of the first k machines, it(k) , is smaller than the index of the final (large) job on the kth machine, fk . ◦ The above arguments imply that the indices of the modified small jobs in Mk cannot exceed fk . ◦ Hence, the rearrangement of large and modified small jobs that were assigned to Mk by Aˆ keeps job number fk as the final job on that machine. This proves pˆfˆk = pfk and, consequently, (16).
Proposition 4. Let Aˆ be an assignment of the modified jobs, Sˆ ∪ L, with target ˆ Then there exists a legal assignment, A, of the original jobs, S ∪ L, value T (A). such that ˆ + 2εplnf . T (A) ≤ T (A) (17) Remark. Proposition 4 complements Proposition 3 as it deals with the inverse reduction, from a solution in terms of the modified jobs to a solution in terms of the original jobs. Hence the similarity in the proofs of the two complementary propositions. Note, however, that the direction treated in Proposition 4 is the algorithmically important one (as opposed to the direction treated in Proposition 3 that is needed only for the error estimate). Proof. Due to the similarity of this proof to the previous one, we focus on the constructive part of the proof. We first assume that the indices of the final jobs, ˆ are monotonically increasing, namely, fˆ1 < fˆ2 < . . . < fˆm . as dictated by A, ˆ Then, we consider the prefix subsets and sums of A, ˆ ≤ k} , τˆk = Aˆk = {ˆ pi : pˆi ∈ Sˆ and A(i) pˆi 0≤k≤m, (18) ˆk pˆi ∈A
and define
τˆk σr r(k) = min r : = εplnf εplnf
0 ≤ k < m and r(m) = b ,
(19)
where σr and b are as before. Note that τˆk is always an integral multiple of εplnf and 0 = r(0) ≤ r(1) ≤ . . . ≤ r(m) = b. Finally, we define the assignment A : S ∪ L → {M1 , . . . , Mm }: A coincides with Aˆ on L; as for the small jobs S, A is defined by A(pir ) = k
for all 1 ≤ r ≤ b such that r(k − 1) + 1 ≤ r ≤ r(k) .
(20)
The jobs in each machine – large and small – are then sorted according to their index. The proof of estimate (17) is analogous to the proof of (7) in Proposition 3. In fact, if we take the proof of (7) and swap there between every hat-notated ˆ pi ↔ pˆi , it (k) ↔ iˆ etc.), symbol with its non-hat counterpart (i.e., A ↔ A, t(k) we get the corresponding proof of (17).
Approximation Schemes for the Min-Max Starting Time Problem
415
The Algorithm. In the previous subsections we described two modifications of the given jobs that have a small effect on the value of the target function, Propositions 2–4. The first modification translated the values of the large jobs that are smaller than plnf into values from a finite set Ω. The second modification replaced the set of small jobs with modified small jobs of size either 0 or ξ0 · plnf = ε · plnf . Hence, after applying those two modifications, we are left with job sizes p where either p ≥ plnf or p ∈ Ω ∪ {0}. After those preliminaries, we are ready to describe our algorithm. The Main Loop 1. Identify the subsequence of m + 1 largest jobs, J m+1 = {pj1 , . . . , pjm+1 }, pj1 > pj2 > . . . > pjm+1 . 2. For r = 1, . . . , m + 1 do: (a) Set plnf = pjr . (b) Identify the subsets of small and large jobs, (1). (c) Discretize the large jobs, L → L , according to (3)+(4). ˆ (d) Preprocess the small jobs, S → S. (e) Solve the problem in an optimal manner for the modified sequence of jobs Sˆ ∪ L , using the core algorithm that is described below. (f) Record the optimal assignment, Ar , and its value, T (Ar ). 3. Select the assignment Ar for which T (Ar ) is minimal. 4. Translate Ar to an assignment A in terms of the original jobs, using Propositions 2 and 4. 5. Output assignment A. The Core Algorithm The core algorithm receives: (1) a value of r, 1 ≤ r ≤ m + 1; (2) a guess for the largest non-final job, plnf = pjr ; (3) a sequence Φ of r − 1 jobs pj1 , . . . , pjr−1 that are final jobs; (4) a sequence Γ of n − r jobs, the size of which belongs to Ω ∪ {0}. It should be noted that the given choice of plnf splits the remaining n − 1 jobs from Sˆ ∪ L into two subsequences of jobs: those that are larger than plnf (i.e., the r − 1 jobs in Φ that are non-modified jobs) and those that are smaller than plnf (i.e., the n − r jobs in Γ that are modified jobs, either large or small). Step 1: Filling in the remaining final jobs. The input to this algorithm dictates the identity of r − 1 final jobs, Φ. We need to select the additional m − r + 1 final jobs out of the n − r jobs in Γ . We omit the proof of the next lemma. Lemma 1. The essential number of selections of the remaining m − r + 1 final jobs out of the n − r jobs in Γ is polynomial in m.
416
Leah Epstein and Tamir Tassa
Step 2: A makespan problem with constraints. Assume that we selected in Step 1 the remaining m − r + 1 final jobs and let FA = {fk : 1 ≤ k ≤ m} be the resulting selection of final jobs, fk being the index of the selected final job in Mk . Without loss of generality, we assume that f1 < f2 < . . . < fm . Given this selection, an optimal solution may be found by solving the makespan problem on the remaining n − m jobs with the constraint that a job pi may be assigned only to machines Mk where i < fk . Next, define Γη∗ to be the subset of non-final jobs from Γη , −1 ≤ η ≤ q − 1, and let Γq∗ = {plnf }. For each 0 ≤ k ≤ m and −1 ≤ η ≤ q, we let zηk be the number of jobs from Γη∗ whose index is less than fk . Finally, we define for k each 0 ≤ k ≤ m the vector zk = (z−1 , . . . , zqk ) that describes the subset of non-final jobs that could be assigned to at least one of the first k machines. In particular, we see that 0 = z0 ≤ z1 ≤ . . . ≤ zm where the inequality sign is to be understood component-wise and zm describes the entire set of non-final jobs, FAc . In addition, we let P(zk ) = {z ∈ N q+2 : 0 ≤ z ≤ zk } be the set of all sub-vectors of zk . Step 3: Shortest path in a graph. Next, we describe all possible assignments of those jobs to the m machines using a layered graph G = (V, E). The set of vertices is composed of m+1 layers, k m V = ∪m k=0 Vk , where Vk = P(z ) 0 ≤ k < m and Vm = {z }. We see that 0 the first and last layers are composed of one vertex only, {z = 0} and {zm } respectively, while the intermediate layers are monotonically non-decreasing in size corresponding to the non-decreasing prefix subsets P(zk ). The set of edges is composed of m layers, E = ∪m k=1 Ek where Ek is defined by Ek = { (u, v) ∈ Vk−1 × Vk : u ≤ v }. It is now apparent that all possible assignments of jobs from FAc to the m machines are represented by all paths in G from V0 to Vm . Assigning weights to the edges of the graph in the natural q q−1 (vη − uη ) · plnf · ξη , where {ξη }η=0 are given in (2) and manner, w[(u, v)] = η=0
ξq = 1, we may define the cost of a path (u0 , . . . , um ) ∈ V0 × . . . × Vm as T [(u0 , . . . , um )] = max{w[(uk−1 , uk )] : 1 ≤ k ≤ m}. In view of the above, we need to find the shortest path in the graph, from the source V0 to the sink Vm , and then translate it to an assignment in the original jobs. Step 4: Translating the shortest path to an assignment of jobs. For 1 ≤ k ≤ m and −1 ≤ η ≤ q, let ukη be the number of jobs of type η that were assigned by the shortest path to machine k. Then assign the ukη jobs of the lowest indices in Γη∗ to machine Mk and remove those jobs from Γη∗ . Finally, assign the final jobs of indices fk , 1 ≤ k ≤ m. Performance Estimates. We omit the proofs of the following theorems. Theorem 1. For a fixed 0 < ε ≤ 1, the running time of the above described algorithm is polynomial in m and n.
Approximation Schemes for the Min-Max Starting Time Problem
417
Theorem 2. Let T o be the value of an optimal solution to the given Min-Max Starting Time Problem. Let A be the solution that the above described PTAS yields for that problem. Then for all 0 < ε ≤ 1, T (A) ≤ (1 + 35ε)T o .
3
Concluding Remarks
The min-max starting time problem is closely related to the makespan minimization problem in the hierarchical model [11]. In that model, each of the given jobs may be assigned to only a suffix subset of the machines, i.e., {Mk , . . . , Mm } for some 1 ≤ k ≤ m. An FPTAS for the makespan minimization problem in the case where m is constant is given in [7,8]. That FPTAS may serve as a building block for an FPTAS for the min-max starting time problem. We also observe that the same techniques that were presented above to construct a PTAS for the min-max starting time problem may be used in order to construct a PTAS for the makespan minimization problem in the hierarchical model (to the best of our knowledge, such a PTAS was never presented before).
Acknowledgement The authors would like to thank Yossi Azar from Tel-Aviv University for his helpful suggestions.
References 1. N. Alon, Y. Azar, G. Woeginger, and T. Yadid. Approximation schemes for scheduling on parallel machines. Journal of Scheduling, 1:1:55–66, 1998. 2. L. Epstein and R. van Stee. Minimizing the maximum starting time on-line. In Proc. of the 10th Annual European Symposium on Algorithms (ESA’2002), pages 449–460, 2002. 3. R.L. Graham. Bounds for certain multiprocessor anomalies. Bell System Technical Journal, 45:1563–1581, 1966. 4. R.L. Graham. Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math, 17:416–429, 1969. 5. D. Hochbaum and D. Shmoys. A polynomial approximation scheme for scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing, 17(3):539–551, 1988. 6. D. S. Hochbaum and D. B. Shmoys. Using dual approximation algorithms for scheduling problems: theoretical and practical results. Journal of the Association for Computing Machinery, 34(1):144–162, 1987. 7. E. Horowitz and S. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. Journal of the Association for Computing Machinery, 23:317– 327, 1976. 8. K. Jansen and L. Porkolab. Improved approximation schemes for scheduling unrelated parallel machines. In Proceedings of the 31st annual ACM Symposium on Theory of Computing (STOC’99), pages 408–417, 1999. 9. J. K. Lenstra and A. H. G. Rinnooy Kan. Complexity os scheduling under precedence constraints. Operations Research, 26:22–35, 1978.
418
Leah Epstein and Tamir Tassa
10. S.R. Mehta, R. Chandrasekaran, and H. Emmons. Order-presrving allocation of jobs to two machines. Naval Research Logistics Quarterly, 21:846–847, 1975. 11. J. Naor, A. Bar-Noy, and A. Freund. On-line load balancing in a hierarchical server topology. In Proc. of the 7th European Symp. on Algorithms (ESA’99), pages 77–88. Springer-Verlag, 1999. 12. S. Sahni. Algorithms for scheduling independent tasks. Journal of the Association for Computing Machinery, 23:116–127, 1976. 13. P. Schuurman and G. Woeginger. Polynomial time approximation algorithms for machine scheduling: Ten open problems. Journal of Scheduling, 2:203–213, 1999.
Quantum Testers for Hidden Group Properties Katalin Friedl1 , Fr´ed´eric Magniez2 , Miklos Santha2 , and Pranab Sen2 1 2
CAI, Hungarian Academy of Sciences, H-1111 Budapest, Hungary CNRS–LRI, UMR 8623 Universit´e Paris-Sud, 91405 Orsay, France
Abstract. We construct efficient or query efficient quantum property testers for two existential group properties which have exponential query complexity both for their decision problem in the quantum and for their testing problem in the classical model of computing. These are periodicity in groups and the common coset range property of two functions having identical ranges within each coset of some normal subgroup.
1
Introduction
In the paradigm of property testing one would like to decide whether an object has a global property by performing random local checks. The goal is to distinguish with sufficient confidence the objects which satisfy the property from those objects that are far from having the property. In this sense, property testing is a notion of approximation for the corresponding decision problem. Property testers, with a slightly different objective, were first considered for programs under the name of self-testers. Following the pioneering approach of Blum, Kannan, Luby and Rubinfeld [3], self-testers were constructed for programs purportedly computing functions with some algebraic properties such as linear functions, polynomial functions, and functions satisfying some functional equations [3,14]. The notion in its full generality was defined by Goldreich, Goldwasser and Ron and successfully applied among others to graph properties [8]. For surveys on property testing see [6]. Quantum computing (for surveys see e.g. [13]) is an extremely active research area, where a growing trend is to cast quantum algorithms in a group theoretical setting. In this setting, we are given a finite group G and, besides the group operations, we also have at our disposal a function f mapping G into a finite set. The function f can be queried via an oracle. The complexity of an algorithm is measured by the number of queries (i.e. evaluations of the function f ), and also by the overall running time counting one query as one computational step. We say that an algorithm is query efficient (resp. efficient) if its query complexity (resp. overall time complexity) is polynomial in the logarithm of the order of G. The most important unifying problem of group theory for the purpose of
Research partially supported by the EU 5th framework programs RESQ IST-200137559 and RAND-APX IST-1999-14036, and by CNRS/STIC 01N80/0502 and 01N80/0607 grants, by ACI Cryptologie CR/02 02 0040 grant of the French Research Ministry, and by OTKA T42559, T42706, and NWO-OTKA N34040 grants.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 419–428, 2003. c Springer-Verlag Berlin Heidelberg 2003
420
Katalin Friedl et al.
quantum algorithms has turned out to be the Hidden Subgroup Problem (HSP), which can be cast in the following broad terms: Let H be a subgroup of G such that f is constant on each left coset of H and distinct on different left cosets. We say that f hides the subgroup H. The task is to determine the hidden subgroup H. While no classical algorithm can solve this problem with polynomial query complexity, the biggest success of quantum computing until now is that it can be solved by a quantum algorithm efficiently whenever G is Abelian [15,11]. We will refer to this algorithm as the standard algorithm for the HSP. The main tool for this solution is Fourier sampling based on the (approximate) quantum Fourier transform for Abelian groups which can be efficiently implemented quantumly [11]. In strong opposition to these positive results, a natural generalization of the HSP has exponential quantum query complexity even in Abelian groups. In this generalization, the function f may not be distinct on different cosets. Indeed, the unordered database search problem can be reduced to the decision problem whether a function on a cyclic group has a non-trivial period or not. Two different extensions of property testing were studied recently in the quantum context. The first approach consists in testing quantum devices by classical procedures. Mayers and Yao [12] have designed tests for deciding if a photon source is perfect. These tests guarantee that if a source passes them, it is adequate for the security of the Bennett-Brassard [1] quantum key distribution protocol. Dam, Magniez, Mosca and Santha [4] considered the design of testers for quantum gates. They showed the possibility of classically testing quantum processes and they provided the first family of classical tests allowing one to estimate the reliability of quantum gates. The second approach considers testing deterministic functions by a quantum procedure. Quantum testing of deterministic function families was introduced by Buhrman, Fortnow, Newman, and R¨ ohrig [2], and they have constructed efficient quantum testers for several properties. One of their nicest contributions is that they have considered the possibility that quantum testing of periodicity might be easier than the corresponding decision problem. Indeed, they succeeded in giving a polynomial time quantum tester for periodic functions over Zn2 . They have also proved that any classical tester requires exponential time for this task. Independently and earlier, while working on the extension of the HSP to periodic functions over Z which may be many-to-one in each period, Hales and Hallgren [10] have given the essential ingredients for constructing a polynomial time quantum tester for periodic functions over the cyclic group Zn . But contrarily to [2], their result is not stated in the testing context. In this work, we construct efficient or query efficient quantum testers for two hidden group properties, that is, existential properties over groups whose decision problems have exponential quantum query complexity. We also introduce a new technique in the analysis of quantum testers. Our main contribution is a generalization of the periodicity property studied in [10,2]. For any finite group G and any normal subgroup K, a function f satisfies the property LARGER-PERIOD(K) if there exists a normal subgroup
Quantum Testers for Hidden Group Properties
421
H > K for which f is H-periodic (i.e. f (xh) = f (x) for all x ∈ G and h ∈ H). For this property, we give an efficient tester whenever G is Abelian (Theorem 1). This result generalizes the previous periodicity testers in three aspects. First, we work in any finite Abelian group G, while previously only G = Zn [10] and G = Zn2 [2] were considered. Second, the property we test is parametrized by some known normal subgroup K, while previously only the case K = {0} was considered. Third, our query complexity is only linear in the inverse of the distance parameter, whereas the previous works have a quadratic dependence. Our result implies that the period finding algorithm of [10] has, in fact, query complexity linear in the inverse of the distance parameter, as opposed to only quadratic dependence proved in that paper. The main technical ingredient of the periodicity test in Abelian groups is efficient Fourier sampling. This procedure remains a powerful tool also in nonAbelian groups. Unfortunately, currently no efficient implementation is known for it in general groups. Therefore, when dealing with non-Abelian groups, our aim is to construct query efficient testers. We construct query efficient testers, with query complexity linear in the inverse of the distance parameter, for two properties. First, we show that the tester used for LARGER-PERIOD(K) in Abelian groups yields a query efficient tester when G is any finite group and K any normal subgroup (Theorem 2). Second, we study in any finite group G the property COMMON-COSET-RANGE(k, t) (for short CCR(k, t)), can be thought of as a generalization of the hidden translation property [5,7]. The heart of the tester for CCR(k, t) is again Fourier sampling applied in the direct product group G × Z2 . Our tester is query efficient in any group if k is polylogarithmic in the size of the group (Theorem 4). After finishing this paper, we learnt from Lisa Hales that in her thesis [9], she has also obtained polynomial time quantum testers for periodic functions over any finite Abelian group, although her results, just as those of [10], are not stated explicitly in the testing context. Her proof technique is also closely related to that of [10], and the query complexity of her tester remains quadratic in the inverse of the distance parameter. After hearing a talk about our results, she has pointed out to us that our periodicity tester can be generalized to the integers. For the sake of completeness, with her permission, we include here in Section 4 this efficient periodicity tester over the integers Z. We present a complete correctness proof for this tester (Theorem 3) by combining Hales’s ideas with our earlier periodicity testing results about finite Abelian groups.
2 2.1
Preliminaries Fourier Sampling over Abelian Groups
For a finite set D, let the uniform superposition over D be |D = √1
|D|
x∈D
|x,
and for a function f from D to a finite set S, let the uniform superposition of f be |f = √1 x∈D |x|f (x). For two functions f, g from D to S, their distance is |D|
422
Katalin Friedl et al.
dist(f, g) = |{x ∈ D : f (x) = g(x)}|/|D|. In this paper, · denotes the 2 -norm and ·1 denotes the 1 -norm of a vector. Proposition 1. For functions f, g defined on the same finite set, dist(f, g) = 2 1 2 |f − |g . Let G be a finite Abelian group and H ≤ G a subgroup. The coset of x ∈ G with respect to H is denoted by x + H. We use the notation <X> for the subgroup generated by a subset X of G. We identify with G the set of characters of G, via some fixed isomorphism y → χy . The orthogonal G of H ≤ G is defined as H ⊥ = {y ∈ G : ∀h ∈ H, χy (h) = 1}, and we |H| set |H ⊥ (x) = y∈H ⊥ χy (x)|y. The quantum Fourier transform over |G| unitary transformation defined as follows: For every x ∈ G, G, QFTG , is the QFTG |x = √1 y∈G χy (x)|y. |G|
Proposition 2. Let G be a finite Abelian group, x ∈ G and H ≤ G. Then QFT
G |x + H −−−−→ |H ⊥ (x).
The following well known quantum Fourier sampling algorithm will be used as a building block in our quantum testers. In the algorithm, f : G → S is given by a quantum oracle. Fourier samplingf (G) 1. Create zero-state |0G |0S . 2. Create the superposition √1
|G|
x∈G
|xG in the first register.
3. Query function f . 4. Apply QFTG on the first register. 5. Observe and then output the first register. 2.2
Property Testing
Let D and S be two finite sets and let C be a family of functions from D to S. Let F ⊆ C be the sub-family of functions of interest, that is, the set of functions possessing the desired property. In the testing problem, one is interested in distinguishing functions f : D → S, given by an oracle, which belong to F, from functions which are far from every function in F. Definition 1 (δ-tester). Let F ⊆ C and 0 ≤ δ < 1. A quantum (resp. probabilistic) δ-tester for F on C is a quantum (resp. probabilistic) oracle Turing machine T such that, for every f ∈ C, 1. if f ∈ F then Pr[T f accepts] = 1, 2. if dist(f, F) > δ then Pr[T f rejects] ≥ 2/3, where the probabilities are taken over the observation results (resp. the coin tosses) of T . By our definition, a tester always accepts functions having the property F. We may also consider testers with two-sided error, where this condition is relaxed, and one requires only that the tester accept functions from F with probability at least 2/3.
Quantum Testers for Hidden Group Properties
3
423
Periodicity in Finite Groups
In this section, we design quantum testers for testing periodicity of functions from a finite group G to a finite set S. For a normal subgroup H G, a function f : G → S is H-periodic if for all x ∈ G and h ∈ H, f (xh) = f (x). Notice that our definition describes formally right H-periodicity, but this coincides with left H-periodicity since H is normal. The set of H-periodic functions is denoted by Per(H). For a known normal subgroup H, testing if f ∈ Per(H) can be easily done classically by sampling random elements x ∈ G and h ∈ H and verifying that f (xh) = f (x), as can be seen from the following proposition. Proposition 3. Let G be a finite group, H G and f : G → S a function. Let η = Prx∈G,h∈H [f (xh) = f (x)]. Then, η/2 ≤ dist(f, Per(H)) ≤ 2η. On the other hand, testing if a function has a non-trivial period is classically hard even in Zn2 [2]. The main result of this section is that we can test query efficiently (and efficiently in the Abelian case) by a quantum algorithm an even more general property: Does a function have a strictly larger period than a known normal subgroup K G? Indeed, we test the family LARGER-PERIOD(K) = {f : G → S | ∃H G, H > K and f is H-periodic}. 3.1
Finite Abelian Case
In this subsection, we give our algorithm for testing periodicity in finite Abelian groups. Theorem 1 below states that this algorithm is efficient. The algorithm assumes that G has an efficient exact quantum Fourier transform. When G only has an efficient approximate quantum Fourier transform, the algorithm has two-sided error. Efficient implementations of approximate quantum Fourier transforms exist in every finite Abelian group [11]. Test Larger periodf (G, K, δ) 1. N ← 4 log(|G|)/δ. 2. For i = 1, . . . , N do yi ← Fourier samplingf (G). 3. Accept iff 1≤i≤N < K ⊥ . Theorem 1. For a finite set S, finite Abelian group G, subgroup K ≤ G, and 0 < δ < 1, Test Larger period(G, K, δ) is a δ-tester for LARGER-PERIOD(K) on the family of all functions from G to S, with O(log(|G|)/δ) query complexity and (log(|G|)/δ)O(1) time complexity. Let S be a finite set and G a finite Abelian group. We describe now the ingredients of our two-step correction process. First, we generalize the notion of uniform superposition of a function to uniform superposition of a probabilistic function. By definition, a probabilistic function is a mapping µ : x → µx from the domain G to probability distributions on S. For every x ∈ G, define the unit
424
Katalin Friedl et al.
1 -norm vector |µx = s∈S µx (s)|s. Then the uniform superposition of µ is defined as |µ = √1 x∈G |x|µx . Notice that |µ has unit 2 -norm when µ is |G|
a (deterministic) function, otherwise its 2 -norm is smaller. A function f : G → S and a subgroup H ≤ G naturally define an H-periodic |f −1 (s)∩(x+H)| . The value µf,H probabilistic function µf,H , where µf,H x (s) = x (s) |H| is the proportion of elements in the coset x + H where f takes the value s. When f is H-periodic |µf,H = |f , and so |µf,H = 1, otherwise |µf,H < 1. The next two lemmas, which imply Theorem 1, give the connection 2 between the distance |f − |µf,H and respectively the probability that Fourier sampling outputs an element outside H ⊥ , and dist(f, Per(H)). 2 Lemma 1. |f − |µf,H = Pr[Fourier samplingf (G) outputs y ∈ H ⊥ ]. Proof. Since y ∈ H ⊥ iff y ∈ 1 ⊥ √ √ 1 x∈G |{0} (x)|f (x) − |G|
{0}⊥ − H ⊥ , the probability term is 2 ⊥ x∈G |H (x)|f (x) . We apply the
|G||H|
inverse quantum Fourier transform QFT−1 G , which is 2 -norm preserving, to the first register in the above expression. The probability becomes 2 1 |f − √ x∈G |x + H|f (x) , using Proposition 2. Changing the vari |G||H| ables, the second term inside the norm is 1 1 1 1 |x |f (x − h) = |x |f (x + h), |H| |H| |G| |G| x∈G
x∈G
h∈H
h∈H
where theequality holds because a subgroup of G. We conclude by observing H isf,H 1 f,H |f (x + h) = µ that |H| h∈H s∈S x (s)|s = |µx . 2 Lemma 2. dist(f, Per(H)) ≤ 2 |f − |µf,H . Proof.It will be useful to rewrite |f as a probabilistic function f f √1 x∈G |x s∈S δx (s)|s, where δx (s) = 1 if f (x) = s and 0 otherwise. |G|
Let us define the H-periodic function g : G → S by g(x) = Majh∈H f (x + h), where ties are decided arbitrarily. In fact, g is the correction of f with respect to H-periodicity. Proposition 1 and the H-periodicity of g imply dist(f, Per(H)) ≤ 2 1 |g − |µf,H ≤ |f − |µf,H . This will al|f − |g . We will show that 2 low us to prove the desired statement using the triangle inequality. Observe that for any function h : G → S, we have 2 |h − |µf,H 2 = 1 |δxh (s) − µf,H (1) x (s)| . |G| x∈G s∈S
Moreover for every x ∈ G, one can establish 2 2 f,H |δxg (s) − µf,H (µf,H x (s)| = 1 + x (s)) − 2µx (g(x)) ≤1+
s∈S
s∈S 2 (µf,H x (s))
−
2µf,H x (f (x))
=
s∈S
s∈S 2 |δxf (s) − µf,H x (s)| ,
(2)
Quantum Testers for Hidden Group Properties
425
f,H where the inequality follows from µf,H in turn follows x (f (x)) ≤ µx (g(x)), which |g − |µf,H immediately from the definition of g. From (1) and (2) we get that ≤ |f − |µf,H , which completes the proof.
Lemmas 1 and 2 together can be interpreted as the robustness [14] in the quantum context [4] of the property that Fourier samplingf (G) outputs only y ∈ H ⊥ : if f does not satisfy exactly the property but with error probability less than δ, then f is 2δ-close to a function that satisfies exactly the property. 3.2
Finite General Case
We now give our algorithm for testing periodicity in general finite groups. Our main tool continues to be the quantum Fourier√transform (over a general finite group). For any d × d matrix M , define |M = d 1≤i,j≤d Mi,j |M, i, j. Let G be a complete set of finite dimensional inequivalent be any finite group and let G irreducible unitary representations of G. Thus, for any ρ ∈ G of dimension dρ and x ∈ G, |ρ(x) = dρ 1≤i,j≤dρ (ρ(x))i,j |ρ, i, j. The quantum Fourier transform over G is the unitary transformation defined as follows: For every : x ∈ G, QFTG |x = √1 |ρ(x). For any H G set H ⊥ = {ρ ∈ G ρ∈G |G|
∀h ∈ H, ρ(h) = Idρ }, where Idρ is the dρ × dρ identity matrix. Let |H ⊥ (x) = |H| ρ∈H ⊥ |ρ(x). |G| QFT
G Proposition 4. If x ∈ G and H G, then |xH −−−−→ |H ⊥ (x).
Test Larger periodf (G, K, δ) 1. N ← 4 log(|G|)/δ. 2. For i = 1, . . . , N do ρi ← Fourier samplingf (G). 3. Accept iff ∩1≤i≤N ker ρi > K. In the above algorithm, Fourier samplingf (G) is as before, except that we only observe the representation ρ, and not the indices i, j. Thus, the output K is assumed to be a normal of Fourier samplingf (G) is an element of G. subgroup of G. For any ρ ∈ G, ker ρ denotes its kernel. We now prove the robustness of the property that Fourier samplingf (G) outputs only ρ ∈ H ⊥ , for any finite group G, normal subgroup H and H-periodic function f . This robustness corresponds to Lemmas 1 and 2 of the Abelian case. Lemma 3. Let f : G → S and H G. Then dist(f, Per(H)) ≤ 2 · Pr[Fourier samplingf (G) outputs ρ ∈ H ⊥ ]. Our second theorem states that Test Larger period is a query efficient tester for LARGER-PERIOD(K) for any finite group G. Theorem 2. For a finite set S, finite group G, normal subgroup K G, and 0 < δ < 1, Test Larger period(G, K, δ) is a δ-tester for LARGER-PERIOD(K) on the family of all functions from G to S, with O(log(|G|)/δ) query complexity.
426
4
Katalin Friedl et al.
Periodicity on Z
We address here the problem of periodicity testing when the group is finitely generated Abelian, but possibly infinite. For Z, it is still possible to test if a function is periodic. The proof involves Fourier sampling methods of [10] and the following lemma which was communicated to us by Hales. Lemma 4. Let G be a finite Abelian group, f : G → S a function and δ > 0. Set N = 4(log|G|)2 /δ. For i = 1, . . . , N , let yi = Fourier samplingf (G) and set Y = 1≤i≤N . Then Pr[f is δ-close to Per(Y ⊥ )] ≥ 2/3. Proof. Let E be the complementary event dist(f, Per(Y ⊥ )) > δ. Then E is realized exactly when there is a subgroup H ≤ G such that dist(f, Per(H)) > δ and H ⊥ = Y . Therefore Pr(E) is upper bounded by Pr[dist(f, Per(H)) > δ and H ⊥ = Y ] ≤ (Pr[y1 ∈ H ⊥ ])N . H≤G
H≤G,dist(f,Per(H))>δ
The number of subgroups of G is at most |G|^{log|G|}, and since by Lemmas 1 and 2 the probability that y_1 is in H^⊥ is at most 1 − δ/2, we can upper bound Pr[E] by |G|^{log|G|} (1 − δ/2)^N ≤ 1/3.

For the sake of clarity, we now restrict ourselves to functions defined over the natural numbers N. For any integer T ≥ 1, we identify the set {0, …, T−1} with Z_T in the usual way. We recast Test Larger period(G, K, δ) in the arithmetic formalism when G = Z_T and K = ⟨p_0⟩ ≤ G, for some p_0 dividing T.

Test Dividing period_f(T, p_0, δ)
1. N ← 4 log(T)/δ.
2. For i = 1, …, N do y_i ← Fourier sampling_f(Z_T) and compute the reduced fraction a_i/b_i of y_i/T.
3. p ← lcm{b_i : 1 ≤ i ≤ N}.
4. Accept iff p divides p_0 and p < p_0.

Then Lemma 4 can also be rewritten as follows.

Corollary 1. Let T ≥ 1 be an integer, f : Z_T → S a function and δ > 0. Set N = 4(log T)²/δ. For i = 1, …, N let y_i = Fourier sampling_f(Z_T), let a_i/b_i be the reduced fraction of y_i/T, and set p = lcm{b_i : 1 ≤ i ≤ N}. Then Pr[f is δ-close to Per(⟨p⟩)] ≥ 2/3.

We want to test periodicity in the family of functions defined on N. To make the problem finite, we fix an upper bound on the period. Then, a function f : {0, …, T − 1} → S is q-periodic, for 1 ≤ q < T, if f(x + aq) = f(x) for every x, a ∈ N such that x + aq < T. The problem we now want to test is whether there exists a period less than some given number t. More precisely, we define for integers 2 ≤ t ≤ T,

INT-PERIOD(T, t) = {f : {0, …, T − 1} → S | ∃q : 1 ≤ q < t, f is q-periodic}.

Here we do not require that q divides t since we do not have any finite group structure.
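The two notions just introduced are easy to state operationally. The following minimal sketch is our own illustration, not part of the paper's testers, and the function names are hypothetical: it checks exact q-periodicity of a function given as a table on {0, …, T−1}, and computes the relative distance to the nearest q-periodic function by majority vote within each residue class mod q.

```python
from collections import Counter

def is_q_periodic(f, T, q):
    # f(x + a*q) = f(x) whenever x + a*q < T; checking a = 1 suffices,
    # since larger shifts follow by chaining single steps.
    return all(f[x] == f[x + q] for x in range(T - q))

def dist_to_q_periodic(f, T, q):
    # Positions r, r+q, r+2q, ... must all carry one common value; keeping
    # the most frequent value in each residue class minimizes the changes.
    changes = 0
    for r in range(q):
        block = [f[x] for x in range(r, T, q)]
        changes += len(block) - max(Counter(block).values())
    return changes / T
```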
Test Integer period_f(T, t, δ)
1. N ← Ω((log T)²/δ).
2. For i = 1, …, N do y_i ← Fourier sampling_f(Z_T), and use the continued fractions method to round y_i/T to the nearest fraction a_i/b_i with b_i < t.
3. p ← lcm{b_i : 1 ≤ i ≤ N}.
4. If p ≥ t, reject.
5. T_p ← ⌊T/p⌋·p.
6. M ← Ω(1/δ).
7. For i = 1, …, M let a_i, x_i ∈_R Z_{T_p}.
8. Accept iff (1/M)·|{i : f(x_i + a_i·p mod T_p) ≠ f(x_i)}| < δ/2.

Theorem 3. For 0 < δ < 1, and integers 2 ≤ t ≤ T such that T/(log T)⁴ = Ω((t log t/δ)²), Test Integer period(T, t, δ) is a δ-tester with two-sided error for INT-PERIOD(T, t) on the family of functions from {0, …, T − 1} to S, with O((log T)²/δ) query complexity and (log T/δ)^{O(1)} time complexity.
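The only nontrivial classical work in steps 2–3 is the rounding by continued fractions and the lcm. A minimal sketch of that post-processing follows; it is our own reading of the procedure (we return the last continued-fraction convergent with denominator below t and ignore semiconvergents, which is enough for illustration).

```python
from fractions import Fraction
from math import gcd

def round_by_continued_fractions(y, T, t):
    # Last convergent h/k of y/T with denominator k < t.
    h_prev, k_prev = 0, 1          # convergent h_{-2}/k_{-2}
    h_cur, k_cur = 1, 0            # convergent h_{-1}/k_{-1}
    a, b, best = y, T, Fraction(0, 1)
    while b > 0:
        d = a // b
        h_cur, h_prev = d * h_cur + h_prev, h_cur
        k_cur, k_prev = d * k_cur + k_prev, k_cur
        if k_cur >= t:
            break
        best = Fraction(h_cur, k_cur)
        a, b = b, a % b
    return best

def candidate_period(samples, T, t):
    p = 1
    for y in samples:
        b = round_by_continued_fractions(y, T, t).denominator
        p = p * b // gcd(p, b)     # p <- lcm(p, b); step 4 rejects if p >= t
    return p
```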
5 Common Coset Range
In this section, G denotes a finite group and S a finite set. Let f_0, f_1 be functions from G to S. For a normal subgroup H ⊴ G, we say that f_0 and f_1 are H-similar if on all cosets of H the ranges of f_0 and f_1 are the same, that is, the multiset equality f_0(xH) = f_1(xH) holds for every x ∈ G. Consider the function f : G × Z_2 → S, where by definition f(x, b) = f_b(x). We will use f for (f_0, f_1) when it is convenient in the coming discussion. We denote by Range(H) the set of functions f such that f_0 and f_1 are H-similar. We say that H is (k, t)-generated, for some positive integers k, t, if |H| ≤ k and it is the normal closure of a subgroup generated by at most t elements. The aim of this section is to establish that for any positive integers k and t, the family COMMON-COSET-RANGE(k, t) (for short CCR(k, t)), defined as the set

{f : G × Z_2 → S | ∃H ⊴ G : H is (k, t)-generated, f_0 and f_1 are H-similar},

can be tested by the following quantum test. Note that a subgroup of size k is always generated by at most log k elements, therefore we always assume that t ≤ log k. In the testing algorithm, we assume that we have a quantum oracle for the function f : G × Z_2 → S.

Test Common coset range_f(G, k, t, δ)
1. N ← 2kt log(|G|)/δ.
2. For i = 1, …, N do (ρ_i, b_i) ← Fourier sampling_f(G × Z_2).
3. Accept iff ∃H ⊴ G : H is (k, t)-generated and ∀i (b_i = 1 ⟹ ρ_i ∉ H^⊥).
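H-similarity itself is a purely combinatorial condition; the following small sketch spells it out for the concrete special case G = Z_n and H = ⟨d⟩ with d dividing n (our own illustrative choice; the paper's test is for general, possibly non-Abelian G).

```python
from collections import Counter

def h_similar(f0, f1, n, d):
    # H = {0, d, 2d, ...} in Z_n; the cosets x + H are indexed by
    # x in {0, ..., d-1}. f0, f1 are H-similar iff the range multisets
    # agree on every coset.
    assert n % d == 0
    for x in range(d):
        coset = range(x, n, d)
        if Counter(f0[y] for y in coset) != Counter(f1[y] for y in coset):
            return False
    return True
```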
We first prove the robustness of the property that when Fourier sampling_f(G × Z_2) outputs (ρ, 1), where G is any finite group, H ⊴ G and f ∈ Range(H), then ρ is not in H^⊥.

Lemma 5. Let S be a finite set and G a finite group. Let f : G × Z_2 → S and H ⊴ G. Then dist(f, Range(H)) ≤ |H| · Pr[Fourier sampling_f(G × Z_2) outputs (ρ, 1) such that ρ ∈ H^⊥].

Our next theorem implies that CCR(k, t) is query efficiently testable when k is polynomial in log|G|.

Theorem 4. For any finite set S, finite group G, integers k ≥ 1, 1 ≤ t ≤ log k, and 0 < δ < 1, Test Common coset range(G, k, t, δ) is a δ-tester for CCR(k, t) on the family of all functions from G × Z_2 to S, with O(kt log(|G|)/δ) query complexity.

The proof technique of Theorem 4.2 of [2] yields:

Theorem 5. Let G be a finite Abelian group and let k be the exponent of G. For testing CCR(k, 1) on G, any classical randomized bounded error query algorithm on G requires Ω(√|G|) queries.
References

1. C. H. Bennett and G. Brassard. Quantum cryptography: Public key distribution and coin tossing. In Proc. IEEE International Conference on Computers, Systems, and Signal Processing, pages 175–179, 1984.
2. H. Buhrman, L. Fortnow, I. Newman, and H. Röhrig. Quantum property testing. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2003.
3. M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. J. Comput. System Sci., 47(3):549–595, 1993.
4. W. van Dam, F. Magniez, M. Mosca, and M. Santha. Self-testing of universal and fault-tolerant sets of quantum gates. In Proc. 32nd ACM STOC, pages 688–696, 2000.
5. M. Ettinger and P. Høyer. On quantum algorithms for noncommutative hidden subgroups. Adv. in Appl. Math., 25(3):239–251, 2000.
6. E. Fischer. The art of uninformed decisions: A primer to property testing, the computational complexity. In The Computational Complexity Column, volume 75, pages 97–126. The Bulletin of the EATCS, 2001.
7. K. Friedl, G. Ivanyos, F. Magniez, M. Santha, and P. Sen. Hidden translation and orbit coset in quantum computing. In Proc. 35th ACM STOC, 2003.
8. O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653–750, 1998.
9. L. Hales. The Quantum Fourier Transform and Extensions of the Abelian Hidden Subgroup Problem. PhD thesis, University of California, Berkeley, 2002.
10. L. Hales and S. Hallgren. An improved quantum Fourier transform algorithm and applications. In Proc. 41st IEEE FOCS, pages 515–525, 2000.
11. A. Kitaev. Quantum measurements and the Abelian Stabilizer Problem. Technical report no. 9511026, Quantum Physics e-Print archive, 1995.
12. D. Mayers and A. Yao. Quantum cryptography with imperfect apparatus. In Proc. 39th IEEE FOCS, pages 503–509, 1998.
13. M. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
14. R. Rubinfeld and M. Sudan. Robust characterizations of polynomials with applications to program testing. SIAM J. Comp., 25(2):252–271, 1996.
15. P. Shor. Algorithms for quantum computation: Discrete logarithm and factoring. SIAM J. Comp., 26(5):1484–1509, 1997.
Local LTL with Past Constants Is Expressively Complete for Mazurkiewicz Traces

Paul Gastin¹, Madhavan Mukund², and K. Narayan Kumar²

¹ LIAFA, Université Paris 7, 2, place Jussieu, F-75251 Paris Cedex 05, France
[email protected]
² Chennai Mathematical Institute, 92 G N Chetty Road, Chennai 600 017, India
{madhavan,kumar}@cmi.ac.in
Abstract. To obtain an expressively complete linear-time temporal logic (LTL) over Mazurkiewicz traces that is computationally tractable, we need to interpret formulas locally, at individual events in a trace, rather than globally, at configurations. Such local logics necessarily require past modalities, in contrast to the classical setting of LTL over sequences. Earlier attempts at defining expressively complete local logics have used very general past modalities as well as filters (side-conditions) that "look sideways" and talk of concurrent events. In this paper, we show that it is possible to use unfiltered future modalities in conjunction with past constants and still obtain a logic that is expressively complete over traces.

Keywords: Temporal logics, Mazurkiewicz traces, concurrency
1 Introduction
Linear-time temporal logic (LTL) [17] has established itself as a useful formalism for specifying the interleaved behaviour of reactive systems. To combat the combinatorial blow-up involved in describing computations of concurrent systems in terms of interleavings, there has been a lot of interest in using temporal logic more directly on labelled partial orders. Mazurkiewicz traces [13] are labelled partial orders generated by dependence alphabets of the form (Σ, D), where D is a dependence relation over Σ. If (a, b) ∉ D, a and b are deemed to be independent actions that may occur concurrently. Traces are a natural formalism for describing the behaviour of static networks of communicating finite-state agents [24]. LTL over Σ-labelled sequences is equivalent to FO_Σ(<).

Example 2. The system {ab → bac} is match-bounded by 1, {ab → ac, ca → bc} is match-bounded by 2, and {ab → ac, ca → b} is match-bounded by 3. (None of these systems is deleting.) See Section 5 for verification of the bounds.
4 Match-Bounded Systems Preserve Regularity
Theorem 2. If R is match-bounded, then R preserves regularity.

Proof. By Theorem 1, deleting systems preserve regularity. By Proposition 1, all we need is two more rational transducers to do the encoding and decoding.

Example 3. The system R = {ab → ba} on Σ = {a, b} is not regularity preserving, since R^*((ab)^*) ∩ a^*b^* = {a^n b^n | n ≥ 0} is not regular. So Theorem 2 implies that R is not match-bounded. (See Example 5 for a more direct proof.)

Example 4. Peg solitaire is a one-person game. The objective is to remove pegs from a board. A move consists of one peg X hopping over an adjacent peg Y, landing on the empty space on the other side of Y. After the hop, Y is removed. Peg solitaire on a one-dimensional board corresponds to the rewriting system

P = {●●○ → ○○●, ○●● → ●○○},
where ● stands for "peg" and ○ for "empty". One is interested in the language of all positions that can be reduced to one single peg, which is P^{−*}(○*●○*). Regularity of P^{−*}(○*●○*) is a "folklore theorem", see [16] for its history. The system P^− is match-bounded by 2, so we obtain yet another proof of that result. The automata A_c constructed for match_c(P^−)^*(○*●○*) have sizes of 2, 14, and 30 respectively for A_0, A_1, and A_2.

Remark 1. Ravikumar [17] proved that P^− preserves regularity by considering the system's change-bound (which is 4). Change-boundedness is a concept which is strongly related to match-boundedness. Given a length-preserving string rewriting system R (viz. |ℓ| = |r| for every rule ℓ → r), define the system

change(R) = {ℓ → r | (base(ℓ) → base(r)) ∈ R, height(succ(ℓ)) = height(r)}

over the alphabet Σ × N, where succ is the morphism succ : (Σ × N)* → (Σ × N)* induced by succ : (a, h) ↦ (a, h + 1). Ravikumar proves that if change(R) has bounded height, then R preserves regularity. Our results both generalize and strengthen this, the main improvement being that the definition of match does also apply to systems that are not length-preserving. For length-preserving systems, match(R) will always give lower or equal heights, so our result implies Ravikumar's.
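For small boards, membership in P^{−*}(○*●○*) can of course be checked directly by exhaustive search, which gives a useful sanity check against the automata A_c. Below is a minimal brute-force sketch of our own, with '*' and 'o' as ASCII stand-ins for peg and hole; since the rules are length-preserving, the search space is finite.

```python
P = [("**o", "oo*"), ("o**", "*oo")]   # one peg hops over another

def successors(w, rules=P):
    for lhs, rhs in rules:
        for i in range(len(w) - len(lhs) + 1):
            if w[i:i + len(lhs)] == lhs:
                yield w[:i] + rhs + w[i + len(lhs):]

def reducible(w):
    # True iff some P-derivation from w reaches a single-peg position,
    # i.e. w lies in the inverse image P^{-*}(o* peg o*).
    seen, stack = {w}, [w]
    while stack:
        v = stack.pop()
        if v.count("*") == 1:
            return True
        for u in successors(v):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return False

print(reducible("o**o"))   # True: two adjacent pegs reduce to one
```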
5 Verification and Refutation of Match-Bounds
Theorem 3. The following problem is decidable:
Given: A string rewriting system R, a regular language L, and c ∈ N.
Question: Is R match-bounded by c for L?

Proof. Construct (a finite automaton for) L_c = (lift_0 ∘ match_c(R)^*)(L), using Proposition 1. Then decide whether L_c contains a string x that has a factor lift_c(ℓ), for some rule ℓ → r in R. If this is not the case, then L_c = L_{c+1} = ··· and R is match-bounded by c for L. Otherwise, we have found a "high redex" in x, thus there is a string y with x →_{match(R)} y and max(height(y)) = c + 1.

For an implementation, the enormous growth of |match_c(R)| as a function of c is problematic. If we are computing match_c(R)^*(lift_0(L)), then we should restrict attention to those rules of match_c(R) that are accessible in derivations starting from lift_0(L). For a language L ⊆ Σ*, a system R over Σ, and a system S ⊆ match(R) define

accessible(L, R, S) = match(R) ∩ (factor(S^*(lift_0(L))) × (Σ × N)*).

Note that this construction is effective if a finite system S and a regular language L are effectively given. We construct a sequence of rewriting systems R_i by R_0 = ∅ and R_{i+1} = accessible(L, R, R_i). Induction on i shows R_i ⊆ match_i(R) for i ≥ 0. In particular, every system R_i is finite. By induction on i, using
monotonicity of S ↦ accessible(L, R, S), one also proves that R_i ⊆ R_{i+1}. Define R_∞ = ∪_{i∈N} R_i. Clearly, R_∞^*(lift_0(L)) = match(R)^*(lift_0(L)). If R is match-bounded by c, then R_∞ is a subset of match_c(R); so R_∞ is finite, and there is an index N such that R_N = R_{N+1} = ···. If R is not match-bounded then R_∞ contains for each c a rule with height c, and is thus infinite. The enumeration of R_i up to i = |match_c(R)| + 1 can be used as an alternative decision procedure for Theorem 3.

In some cases we can also verify automatically that a given rewriting system R is not match-bounded for a language L. For this purpose, we try to find a self-embedding set of witnesses, as follows. The set raised_c(R, L) consists of all strings that occur as the base of a "high factor" (with all positions of height > 0) of a string that is reachable by a match_c(R)-derivation starting from lift_0(L):

raised_c(R, L) = base(factor(match_c(R)^*(lift_0(L))) ∩ (Σ × {1, 2, …})*).

First we observe that a match(R)-derivation can be raised to larger heights. For u′, u ∈ (Σ × N)* we write u′ ≥ u if base(u′) = base(u) and height(u′) ≥_n height(u), where ≥_n denotes the pointwise greater-or-equal ordering on N^n.

Lemma 1. If u′ ≥ u →_{match(R)} v, then u′ →_{match(R)} v′ ≥ v for some string v′.

Proposition 4. Let R be a string rewriting system, let L be a language, both over Σ. If there are c ∈ N and a language W ⊆ L ∩ raised_c(R, W) with W ⊈ {ε}, then R is not match-bounded for L.

Proof. We call u ∈ Σ* a witness for height h if there is a match(R)-derivation from lift_0(u) to some string in (Σ × N)* that contains at least one position of height ≥ h. We will show that for each h ∈ N, there is some witness u ∈ W for height h. For h = 0, there is nothing to prove. By induction, assume u ∈ W is a witness for height h. Since W ⊆ raised_c(R, W), there is some v ∈ W such that u ∈ raised_c(R, v). We claim that v is a witness for height h + 1. By definition of raised_c, there is a match(R)-derivation D from lift_0(v) to some string xu′y with base(u′) = u and min(height(u′)) ≥ 1. Since u is a witness for h, there is a match(R)-derivation E from lift_0(u) to some word w with maximum height ≥ h. This derivation can be relabelled to a derivation from succ(lift_0(u)) = lift_1(u) to succ(w), where succ is the morphism defined in Section 4 that increases the height of each position by 1. By Lemma 1 and u′ ≥ lift_1(u), this derivation can be raised to a derivation E′ : u′ →* w′ for some string w′ ≥ succ(w). Now, D and E′ can be combined to lift_0(v) →* xu′y →* xw′y, such that max(height(w′)) ≥ h + 1.

Note that the condition in Proposition 4 can be effectively checked if a finite SRS R, a number c ∈ N, and regular languages W and L are effectively given.

Example 5. The system R = {ab → ba} (cf. Example 3) is not match-bounded for Σ*. Take W = (ab)^+. Then raised_1(R, W) = factor((ba)^+) ⊇ W.

Example 6. Neither is R = {aabb → ba} match-bounded, as witnessed by W = {a, b}* = raised_1(R, W). See Example 8 for a similar system with different behaviour.
We have implemented the algorithm according to Theorem 3 and Proposition 4, see http://theo1.informatik.uni-leipzig.de/~joe/bounded/.
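Our implementation works on finite automata; as a toy stand-in, one can also explore match(R)-derivations explicitly on lifted strings, which already lets one observe the heights reported in Example 2. The sketch below is our own scaffolding; only the height rule (inserted letters receive 1 plus the minimum height of the consumed redex) reflects the definition of match(R). Since derivations may be infinite, the search is capped by a step budget, so the result is a lower bound on the match-height rather than a proof of a bound.

```python
from itertools import product

def match_steps(w, rules):
    # w is a lifted string: a tuple of (letter, height) pairs. A rewrite
    # with base rule lhs -> rhs replaces a redex whose base letters spell
    # lhs; the inserted letters all receive 1 + min(heights in the redex).
    for lhs, rhs in rules:
        n = len(lhs)
        for i in range(len(w) - n + 1):
            if tuple(a for a, _ in w[i:i + n]) == lhs:
                h = 1 + min(hh for _, hh in w[i:i + n])
                yield w[:i] + tuple((b, h) for b in rhs) + w[i + n:]

def max_height_seen(rules, alphabet, max_len, step_budget=100000):
    best, seen, stack = 0, set(), []
    for length in range(1, max_len + 1):
        for s in product(alphabet, repeat=length):
            w = tuple((a, 0) for a in s)          # lift_0(s)
            seen.add(w)
            stack.append(w)
    while stack and step_budget > 0:
        step_budget -= 1
        for v in match_steps(stack.pop(), rules):
            best = max(best, max(h for _, h in v))
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return best

# First system of Example 2, reported match-bounded by 1:
print(max_height_seen([(("a", "b"), ("b", "a", "c"))], "ab", 4))
```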
6 Deciding Termination for Inverse Deleting and Inverse Match-Bounded Systems
In this section, we will prove that termination is decidable for inverse deleting string rewriting systems, and conclude that the same holds for inverse match-bounded systems.

Lemma 2. Let s ⊆ Σ* × Γ* be a substitution, and let K be a regular language over Γ. Then Inf(s ∩ (Σ* × K)) is regular.

Proof. Consider a finite automaton A with state set Q that accepts K. Denote by L(A, p, q) the set of strings x for which there is a path p →^x q in A. We define an automaton B over alphabet Σ × {F, I} as follows. The sets of states, initial states, and final states of B and A coincide. For p, q ∈ Q and a ∈ Σ, B contains the transition

– p →^{(a,I)} q iff the language s(a) ∩ L(A, p, q) is infinite,
– p →^{(a,F)} q iff the language s(a) ∩ L(A, p, q) is finite and non-empty.

We claim that a_1 … a_n ∈ Inf(s ∩ (Σ* × K)) for a_i ∈ Σ if and only if there is an accepting path in B that is labelled by (a_1, b_1) … (a_n, b_n) where at least one b_i equals I. Therefore, Inf(s ∩ (Σ* × K)) = π(L(B) \ (Σ × {F})*) where π : (Σ × {I, F})* → Σ* is the morphism induced by π : (a, b) ↦ a.

Lemma 3. Let Σ, Σ_0, Γ, Γ_0 be alphabets, let s ⊆ Σ* × Γ* be a substitution, and let T_1 ⊆ Σ_0* × Σ* and T_2 ⊆ Γ* × Γ_0* be finitely branching rational transductions. Then Inf(T_1 ∘ s ∘ T_2) is regular.

Proof. By Lemma 2, since Inf(T_1 ∘ s ∘ T_2) = T_1^−(Inf(s ∩ (Σ* × T_2^−(Γ_0*)))).
Remark 2. The regularity results in Lemma 2 and Lemma 3 are effective if s is an L-substitution for a family L of languages that is closed under intersection with regular sets, and for which emptiness and finiteness are decidable. This is the case, e.g., for the family of context-free languages, as in the proof of Proposition 5 below.

Proposition 5. For an inverse deleting SRS R, Inf(R^*) is effectively regular.

Proof. Let R be a system over alphabet Σ such that R^− is deleting. First we exclude some trivial cases. Since R is inverse deleting, we have ε ∉ rhs(R). And if R contains a rule ε → r ≠ ε, then Inf(R^*) = Σ*. So from now on, we may assume ε ∉ lhs(R) ∪ rhs(R).

By Theorem 1 we have R^{−*} = (s ∘ C^{−*}) ∩ (Σ* × Σ*), where s ⊆ Σ* × Γ* is a finite substitution into an extended alphabet Γ ⊇ Σ, and C is a context-free
rewriting system over Γ. Reviewing the construction in [10], we find that no ε can occur on either side of s and C, so C^{−*} is a context-free substitution c ⊆ Γ* × Γ*, and s^− ⊆ Γ* × Σ* is the inverse of a finite and ε-free substitution. We have R^* = e ∘ c ∘ s^−, where e ⊆ Σ* × Γ* is the embedding of Σ* in Γ*, therefore the claim follows by Lemma 3 and Remark 2.

Theorem 4. The following problem is decidable:
Given: A regular language L over Σ; an inverse deleting SRS R over Σ.
Question: Is there an infinite R-derivation starting from a string in L?

Proof. A finitely branching binary relation ρ is well-founded if and only if ρ^* is finitely branching and ρ^+ is irreflexive. Note that if R^− is deleting then R^{−+} is well-founded, hence irreflexive. So there is an infinite R-derivation starting from a string in L if, and only if, Inf(R^*) ∩ L ≠ ∅. By Proposition 5, Inf(R^*) is regular, so emptiness of Inf(R^*) ∩ L is decidable.

Corollary 2. Termination and uniform termination are decidable for inverse deleting string rewriting systems.

Proof. Choose L = {x} to decide whether there is an infinite derivation starting with string x, and choose L = Σ* to decide uniform termination.

Example 7. McNaughton [13] proves decidability of termination and of uniform termination for the following class of string rewriting systems: A system R is called an inhibitor system, if there is a letter a ∉ Σ such that ℓ ∈ Σ^+ and r ∈ (Σ ∪ {a})* \ Σ* for every rule ℓ → r in R. (Inhibitor systems play a vital role in solving the uniform termination problem of well-behaved SRSs [13].) We can give an alternative proof by observing that an inhibitor system R is inverse deleting for the ordering that makes a greater than every other letter. Hence decidability of (uniform) termination follows from Corollary 2. As a bonus, we get context-freeness of R^*(x) for x ∈ Σ*, a result by Ginsburg and Greibach [8]. This shows once more that language classes and the uniform termination problem are intrinsically related.

Theorem 5. Termination and uniform termination are decidable for string rewriting systems R for which R^− is match-bounded.

Proof. Assume R^− is match-bounded by c. Then each derivation modulo R corresponds to a derivation modulo match_c(R^−)^− = S, by the remark before Definition 1. So termination of R and S coincide. By Proposition 2, S is an inverse deleting system, and by Corollary 2, (uniform) termination of S is decidable.

Example 8. Proving termination of the one-rule system Z = {aabb → bbbaaa} is known as Zantema's Problem. This is a "modern classic" in rewriting [3,4,12,19,20,22], as it provides a test case where most of the automated methods for termination proofs fail. The match-bound of Z^− is 2, therefore termination can be mechanically verified. (Recall that the fact that Z is inverse match-bounded is in itself not a proof of termination for Z.) The computation of match(Z^−)^*(Σ*) according to Section 5 takes five iterations (i.e.,
Z^−_4 = Z^−_5 = Z^−_6 = ···). In our implementation (Haskell code compiled with ghc-5.04.2), this needs about 70 CPU seconds on a 2.4-GHz Pentium. The resulting automaton has 199 states. The intermediate constructions according to Theorem 1 involve much larger automata (up to 1576 states with 15999 transitions), on much larger alphabets (up to 283 letters).
7 Discussion
If the flow of information during rewriting is suitably restricted, nice properties hold: termination, bounded derivational complexity, or preservation of regular languages. For instance, McNaughton [13] and independently Ferreira and Zantema [5] use extra letters to indicate absence of information flow through certain positions. Kobayashi et al. [11] restrict derivations by using markers for the start and the end of a redex. Sénizergues [19] constructs finite automata to solve the termination problem for certain one-rule string rewriting systems. Moczydlowski and Geser [14,15] restrict the way the right hand side of a rule may be consumed in order to simulate the rewrite relation by the computation of a pushdown automaton. Our concepts of deleting and match-bounded rewriting aim at extending these approaches to a systematic theory of termination by language properties.

The concept of match-bounded string rewriting opens two novel approaches to automated termination proofs: match-bounded systems are terminating, and for inverse match-bounded systems, termination is decidable. These methods can be further strengthened by considering match-boundedness not for all strings over the respective alphabet, but only for suitably chosen subsets. As we have demonstrated elsewhere [7], the right hand sides of forward closures are a suitable such subset. We expect these powerful tools to enable some major progress in the problem of deciding uniform termination for one-rule string rewriting systems, an open problem for 13 years [12], see [18, Problem 21].

Single-player games like Peg Solitaire can be analyzed through the construction of reachability sets. It is very challenging to extend this approach to two-player rewriting games [21]. Instead of termination (which is required anyway to give a well-defined game), for instance, one would like to know whether winning sets are regular. Even the impartial case is hard; here the central question is whether Grundy values are bounded.

It seems natural to carry over the notion of match-boundedness to term rewriting, in order to obtain both closure properties and new automated termination proof methods.
Acknowledgements

This research was supported in part by the National Aeronautics and Space Administration (NASA) while the last two authors were visiting scientists at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center (LaRC), Hampton, VA, in September 2002.
References

1. J. Berstel. Transductions and Context-Free Languages. Teubner, Stuttgart, 1979.
2. R. V. Book and F. Otto. String-Rewriting Systems. Texts and Monographs in Computer Science. Springer-Verlag, New York, 1993.
3. T. Coquand and H. Persson. A proof-theoretical investigation of Zantema's problem. In M. Nielsen and W. Thomas (Eds.), 11th Annual Conf. of the EACSL CSL-97, Lect. Notes Comp. Sci. Vol. 1414, pp. 177–188. Springer-Verlag, 1998.
4. N. Dershowitz and C. Hoot. Topics in termination. In C. Kirchner (Ed.), Proc. 5th Int. Conf. Rewriting Techniques and Applications RTA-93, Lect. Notes Comp. Sci. Vol. 690, pp. 198–212. Springer-Verlag, 1993.
5. M. C. F. Ferreira and H. Zantema. Dummy elimination: Making termination easier. In H. Reichel (Ed.), 10th Int. Symp. Fundamentals of Computation Theory FCT-95, Lect. Notes Comp. Sci. Vol. 965, pp. 243–252. Springer-Verlag, 1995.
6. T. Genet and F. Klay. Rewriting for Cryptographic Protocol Verification. In D. A. McAllester (Ed.), 17th Int. Conf. Automated Deduction CADE-17, Lect. Notes Artificial Intelligence Vol. 1831, pp. 271–290. Springer-Verlag, 2000.
7. A. Geser, D. Hofbauer, and J. Waldmann. Match-bounded string rewriting systems and automated termination proofs. 6th Int. Workshop on Termination WST-03, Valencia, Spain, 2003.
8. S. Ginsburg and S. A. Greibach. Mappings which preserve context sensitive languages. Inform. and Control, 9(6):563–582, 1966.
9. T. N. Hibbard. Context-limited grammars. J. ACM, 21(3):446–453, 1974.
10. D. Hofbauer and J. Waldmann. Deleting string rewriting systems preserve regularity. In Proc. 7th Int. Conf. Developments in Language Theory DLT-03, Lect. Notes Comp. Sci., Springer-Verlag, 2003. To appear.
11. Y. Kobayashi, M. Katsura, and K. Shikishima-Tsuji. Termination and derivational complexity of confluent one-rule string-rewriting systems. Theoret. Comput. Sci., 262(1-2):583–632, 2001.
12. W. Kurth. Termination und Konfluenz von Semi-Thue-Systemen mit nur einer Regel. Dissertation, Technische Universität Clausthal, Germany, 1990.
13. R. McNaughton. Semi-Thue systems with an inhibitor. J. Automat. Reason., 26:409–431, 2001.
14. W. Moczydlowski Jr. Jednoregulowe systemy przepisywania słów. Masters thesis, Warsaw University, Poland, 2002.
15. W. Moczydlowski Jr. and A. Geser. Termination of single-threaded one-rule Semi-Thue systems. Technical Report TR 02-08 (273), Warsaw University, Dec. 2002. Available at http://research.nianet.org/~geser/papers/single.html.
16. C. Moore and D. Eppstein. One-dimensional peg solitaire, and duotaire. In R. J. Nowakowski (Ed.), More Games of No Chance, Cambridge Univ. Press, 2003.
17. B. Ravikumar. Peg-solitaire, string rewriting systems and finite automata. In H.-W. Leong, H. Imai, and S. Jain (Eds.), Proc. 8th Int. Symp. Algorithms and Computation ISAAC-97, Lect. Notes Comp. Sci. Vol. 1350, pp. 233–242. Springer-Verlag, 1997.
18. The RTA list of open problems. http://www.lsv.ens-cachan.fr/rtaloop/.
19. G. Sénizergues. On the termination problem for one-rule semi-Thue systems. In H. Ganzinger (Ed.), Proc. 7th Int. Conf. Rewriting Techniques and Applications RTA-96, Lect. Notes Comp. Sci. Vol. 1103, pp. 302–316. Springer-Verlag, 1996.
20. E. Tahhan Bittar. Complexité linéaire du problème de Zantema. C. R. Acad. Sci. Paris Sér. I Inform. Théor., t. 323:1201–1206, 1996.
21. J. Waldmann. Rewrite games. In S. Tison (Ed.), Proc. 13th Int. Conf. Rewriting Techniques and Applications RTA-02, Lect. Notes Comp. Sci. Vol. 2378, pp. 144–158. Springer-Verlag, 2002.
22. H. Zantema and A. Geser. A complete characterization of termination of 0^p 1^q → 1^r 0^s. Appl. Algebra Engrg. Comm. Comput., 11(1):1–25, 2000.
Probabilistic and Nondeterministic Unary Automata

Gregor Gramlich

Institut für Informatik, Johann Wolfgang Goethe-Universität Frankfurt
Robert-Mayer-Straße 11-15, 60054 Frankfurt am Main, Germany
[email protected]
Fax: +49 - 69 - 798-28814
Abstract. We investigate unary regular languages and compare deterministic finite automata (DFA's), nondeterministic finite automata (NFA's) and probabilistic finite automata (PFA's) with respect to their size. Given a unary PFA with n states and an ε-isolated cutpoint, we show that the minimal equivalent DFA has at most n^{1/(2ε)} states in its cycle. This result is almost optimal, since for any α < 1 a family of PFA's can be constructed such that every equivalent DFA has at least n^{α/(2ε)} states. Thus we show that for the model of probabilistic automata with a constant error bound, there is only a polynomial blowup for cyclic languages. Given a unary NFA with n states, we show that efficiently approximating the size of a minimal equivalent NFA within the factor √n/ln n is impossible unless P = NP. This result even holds under the promise that the accepted language is cyclic. On the other hand we show that we can approximate a minimal NFA within the factor ln n, if we are given a cyclic unary n-state DFA.
1 Introduction
Regular languages and finite state automata as their acceptance devices are well studied objects. We consider DFA's, NFA's and PFA's with isolated cutpoint and compare their sizes. For an n-state PFA with ε-isolated cutpoint, the equivalent DFA needs at most (1 + 1/(2ε))^{n−1} states [9]. For a unary alphabet, Milani and Pighizzini [8] show the tight bound of Θ(e^{√(n ln n)}) for the number of states in the cycle of the minimal DFA. This result does not depend on the size of the isolation, and the proof of the lower bound actually relies on an isolation that tends to zero. We show that the isolation plays a crucial role, namely that L can be accepted by a DFA with at most n^{1/(2ε)} states in its cycle. Thus, for constant isolation ε, we improve the upper bound of Milani and Pighizzini to be a polynomial in n.
Partially supported by DFG project SCHN503/2-1
The minimization problem for DFA's can be efficiently solved. But for a given DFA, the problem of determining the minimal number of states of an equivalent NFA is PSPACE-complete [5]. A result of Stockmeyer and Meyer [10] shows that the problem of minimizing a given NFA is PSPACE-complete for a binary alphabet and NP-complete for a unary alphabet. We show that, given an n-state NFA accepting L, it is impossible to efficiently approximate the number of states of a minimal NFA accepting L within a factor of √n/ln n unless P = NP. This result holds even under the promise that L is a unary cyclic language, and can be extended to PFA's with isolated cutpoint. On the other hand we show that if we are given a unary cyclic n-state DFA accepting L, then we can efficiently construct an equivalent NFA with at most k · (1 + ln n) states, where k is the number of states of a minimal NFA accepting L. This contrasts with a result of Jiang et al. [4] who show that the number of states of a minimal NFA, equivalent to a given unary DFA, cannot be computed in polynomial time, unless NP ⊆ DTIME(n^{O(ln n)}). This result even holds if we restrict the DFA to accept only cyclic languages.

The next section gives a short introduction into unary NFA's and unary PFA's. Unary PFA's with ε-isolated cutpoint, resp. unary NFA's, are investigated in Sections 3 and 4 respectively.
2 Preliminaries
We consider unary languages L ⊆ {a}*. A unary regular language is recognized by a DFA that starts with a possibly empty path and ends in a non-empty cycle. A language L is ultimately d-cyclic, if there is a μ ∈ N_0, so that (a^j ∈ L ⇔ a^{j+d} ∈ L) holds for any j ≥ μ, and we say that d is an ultimate period of L. A smallest ultimate period is called the minimal ultimate period c(L), and any ultimate period is a multiple of the minimal ultimate period. L is called cyclic, if the path of the minimal DFA for L is empty. For cyclic languages we use the term period instead of ultimate period and d-cyclic (resp. minimally d-cyclic) instead of ultimately d-cyclic (resp. minimally ultimately d-cyclic). The size of an automaton A is the number of states of A. For a given regular language L, we use nsize(L) as the minimal size of an NFA accepting L.

A normal form for unary NFA's is established by Chrobak in [1]. His construction converts a given NFA N with n states into an equivalent NFA N′ consisting of a deterministic path and several deterministic cycles. Only the last state of the path branches nondeterministically into one state of each cycle. The path of N′ has length O(n²), and the number of all states in the cycles is bounded by n. Chrobak proves that L(N′) is ultimately d-cyclic, where d is the least common multiple of the lengths of the cycles in N′. For cyclic languages we introduce union automata as automata in Chrobak normal form with an empty path.

Definition 1. A union automaton U is described by a collection (A_1, …, A_k) of cyclic DFA's. U accepts an input w iff there is an A_i such that A_i accepts w. The size of U is defined as Σ_{i=1}^k s_i, where s_i is the number of states of A_i.
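For unary inputs a union automaton reduces to pure modular arithmetic, as the following minimal sketch of our own shows: each cyclic DFA A_i is just its length together with the set of accepting residues.

```python
def union_accepts(cycles, j):
    # cycles: list of (length, set_of_final_positions); a^j is accepted
    # iff some cycle accepts the residue j mod its length.
    return any(j % length in finals for length, finals in cycles)

# Cycles of lengths 2 and 3 accepting multiples of 2 resp. 3; the least
# common multiple 6 is a period of the accepted language.
U = [(2, {0}), (3, {0})]
print([j for j in range(12) if union_accepts(U, j)])
# [0, 2, 3, 4, 6, 8, 9, 10]
```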
To convert a union automaton U into an NFA with a single initial state, we simply add one state q_0 and transitions from q_0 to each state that succeeds an initial state of the deterministic automata that U consists of. Jiang, McDowell and Ravikumar [4] show a structural result about minimal unary NFA's accepting cyclic languages.

Fact 1. [4] Let L be a minimally D-cyclic unary language. Every minimal NFA accepting L can be obtained by converting some minimal union automaton U accepting L into an NFA. Moreover D is the least common multiple of the cycle lengths of U. Consider the prime factorization of D = p_1^{α_1} · … · p_r^{α_r}, where the p_i are distinct primes and α_i ∈ N; then every NFA accepting L has at least p_1^{α_1} + … + p_r^{α_r} states.

This result offers some clues about the composition of the (ultimate) period of a unary language which also apply to probabilistic finite automata, which we define as follows. A unary PFA M with a set Q of n states is described by a stochastic n × n matrix A, a stochastic row vector π representing the initial distribution, and a column vector η ∈ {0, 1}^n indicating the final states. Observe that πA^jη is the acceptance probability for input a^j. The language accepted by M with respect to a cutpoint λ ∈ [0, 1] is L(M, λ) = {a^j | πA^jη > λ}. We call the cutpoint λ ε-isolated, if for any j ∈ N_0: |πA^jη − λ| ≥ ε. We call a cutpoint isolated, if there is an ε > 0, so that it is ε-isolated.

We regard A as the stochastic matrix of a finite Markov chain M, with rows and columns indexed by states, and consider the representation of M as a directed graph G_A = (V, E) with V = Q. An arc from state q to state p exists in G_A, if A_{p,q} > 0. We call a strongly connected component B ⊆ Q in G_A ergodic, if starting in any state q ∈ B, we cannot reach any state outside of B. (Unlike some authors, we do not require an ergodic component to be aperiodic.) States within an ergodic component are called ergodic states, non-ergodic states are called transient. For an ergodic component B, the period of q ∈ B is defined as d_q = gcd{j | starting in q one can reach q with exactly j steps}. All states q ∈ B have the same period d = d_q, which we call the period of B.

Factorization and primality play an important role for (ultimate) periods. To estimate the size of the i-th prime number we use the following fact.

Fact 2. [3] If p_i is the i-th prime number, then i ln i ≤ p_i ≤ 2i ln i for i ≥ 3.
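The acceptance probability πA^jη just defined is easy to evaluate numerically; the following sketch (our own check, using numpy) computes the probabilities for a prefix of inputs and thereby lets one inspect how isolated a given cutpoint is.

```python
import numpy as np

def acceptance_probs(pi, A, eta, j_max):
    probs, v = [], pi.copy()
    for _ in range(j_max + 1):
        probs.append(float(v @ eta))   # pi A^j eta for the current j
        v = v @ A                      # advance by one input letter
    return probs

# A deterministic 2-cycle viewed as a PFA, accepting in state 0: the
# probabilities alternate 1, 0, 1, 0, ..., so lambda = 1/2 is 1/2-isolated.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
pi, eta = np.array([1.0, 0.0]), np.array([1.0, 0.0])
print(acceptance_probs(pi, A, eta, 5))
```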
3 Unary PFA's with ε-Isolated Cutpoint
In [8] Milani and Pighizzini show that the ergodic components of a unary PFA with isolated cutpoint basically play the same role as the cycles of an NFA in Chrobak normal form. The least common multiple D of the periods of these components is an ultimate period of the language L(M, λ) accepted by the PFA. This result does not take the isolation into account and yields an exponential upper bound for the ultimate period, namely c(L(M, λ)) = O(e^{√(n ln n)}) where n
is the number of states in the PFA. We show that the ultimate period c(L(M, λ)) decreases significantly with increasing isolation, and this results in a polynomial upper bound for c(L(M, λ)) if ε is a constant.

As a first step, Lemma 1 shows that the period d_i of an ergodic component B_i with absorption probability r_i < 2ε, where

r_i := lim_{t→∞} Σ_{p∈B_i} (πA^t)_p = Pr[a random walk is eventually absorbed into B_i],

does not play a role for c(L(M, λ)); neither do periods of collections of ergodic components with small combined absorption probability.

Lemma 1. Let B_1, …, B_m be the ergodic components of a Markov chain with periods d_i and absorption probabilities r_i, respectively. If the corresponding PFA M accepts L := L(M, λ) with ε-isolated cutpoint, then for any I ⊆ {1, …, m} with Σ_{i∈I} r_i > 1 − 2ε, D(I) := lcm{d_i | i ∈ I} is an ultimate period of L and thus is a multiple of c(L).

Proof (Sketch). For an ultimate period D of L the limit A^∞ := lim_{t→∞} (A^D)^t exists, where we require convergence in each entry of the matrix. This can be shown by bringing the matrix A into a normal form (see Gantmacher [2]), so that the stochastic submatrix A_i for each ergodic component B_i forms a block within A. If B_i has period d_i, then lim_{t→∞} (A_i^{d_i})^t exists. Since D is a multiple of every d_i, the limit of (A^D)^t exists. As a consequence from [8] and from the existence of this limit, for every δ there must be a μ_δ ∈ N, such that for every j ≥ μ_δ, a^j ∈ L ⇔ a^{j+D} ∈ L and

Σ_{q∈Q} |(πA^j)_q − (πA^{(j mod D)} A^∞)_q| < δ.
Let I ⊆ {1, …, m} be a set of indices with Σ_{i∈I} r_i > 1 − 2ε. Assume that D(I) is not an ultimate period of L. Then there is some j > μ_δ with a^j ∈ L and a^{j+D(I)} ∉ L. So πA^jη ≥ λ + ε and πA^{j+D(I)}η ≤ λ − ε, and thus π(A^j − A^{j+D(I)})η ≥ 2ε. Let (x)^+ = x if x > 0, and let (x)^+ = 0 otherwise. Remember that η ∈ {0, 1}^n. Then we have, with Q_I := ∪_{i∈I} B_i ∪ {q | q transient},

2ε ≤ Σ_{q∈Q} (π(A^j − A^{j+D(I)}))_q^+ ≤ Σ_{q∈Q_I} (π(A^j − A^{j+D(I)}))_q^+ + Σ_{q∉Q_I} (π(A^j − A^{j+D(I)}))_q^+.   (1)

The proof of the existence of A^∞ also shows that if we restrict the matrix A to all the states in Q_I and call the resulting substochastic matrix A_I, then the limit lim_{t→∞} (A_I^{D(I)})^t exists as well. And so, for δ = 2ε − Σ_{i∉I} r_i and for any j ≥ μ_δ, we get

Σ_{q∈Q_I} (π(A^j − A^{j+D(I)}))_q^+ < δ.   (2)
But on the other hand, for any j ≥ 0,

Σ_{q∉Q_I} (π(A^j − A^{j+D(I)}))_q^+ ≤ Σ_{q∉Q_I} (πA^j)_q ≤ Σ_{i∉I} r_i = 2ε − δ.   (3)
The second inequality follows, since the absorption probability is the limit of a monotonically increasing sequence. So we have reached a contradiction, since the sum of (3) and (2) does not satisfy (1).

We can now exclude some prime powers as potential divisors of c(L(M, λ)).

Definition 2. Let M be a PFA with ergodic periods d_i and absorption probabilities r_i. We call a prime power q = p^s ε-essential (for M), if

Σ_{i: q divides d_i} r_i ≥ 2ε   and   Σ_{i: q·p divides d_i} r_i < 2ε.

Lemma 2. If λ is ε-isolated for a PFA M, then

D = Π_{q is ε-essential} q

is an ultimate period of L = L(M, λ). Hence D is a multiple of c(L).

Proof. Assume that c(L) is a multiple of a prime power p^k which does not divide any ε-essential prime power. Let J = {i | p^k divides d_i}, and let I = {1, …, m} \ J be the complement of J. Then p^k does not divide any d_i with i ∈ I and thus p^k does not divide D(I) = lcm{d_i | i ∈ I}. Since p^k does not divide any ε-essential prime power, we have that Σ_{i∈J} r_i < 2ε, and so Σ_{i∈I} r_i > 1 − 2ε. According to Lemma 1, D(I) is a multiple of c(L). But on the other hand D(I) is not a multiple of p^k. This is a contradiction, since p^k was assumed to divide c(L).

Now we show the tight upper bound for the minimal ultimate period of a language accepted by an ε-isolated PFA.

Theorem 1.
a) For any unary PFA M with n states and ε-isolated cutpoint λ,
c(L(M, λ)) ≤ n^{1/(2ε)}.
b) For any 0 ≤ α < 1 and any ε = 1/(2m) with m ∈ N, there is a PFA M with n states and ε-isolated cutpoint λ, such that c(L(M, λ)) > n^{α/(2ε)}.
=
q is −essential m i=1 q is −essential, q divides di
q ri ≤
m i=1
dri i .
Probabilistic and Nondeterministic Unary Automata
465
m Now, since i=1 ri = 1, the weighted arithmetic mean is at least as large as the geometric mean, and thus m
ri d i ≥
i=1
m i=1
dri i .
Since D ≥ c(L(M, λ)) with Lemma 2, we obtain n≥
m
di ≥
i=1
m
ri d i ≥
i=1
m i=1
dri i ≥ D2 ≥ c(L(M, λ))2 .
And the claim follows. b) Let p1 , p2 , . . . be the sequence of prime numbers. We define the languages
k+m−1 j Lk,m = a j ≡ 0 mod pi i=k
k+m−1
m for k, m ≥ 1. Obviously c(Lk,m ) = i=k p i ≥ pm k ≥ (k ln k) . 1 On the other hand Lk,m can be accepted by a PFA with isolation = 2m 1 and cutpoint λ = 1 − 2m as follows. We define a “union automaton with an initial distribution” by setting up m disjoint cycles of length pk , pk+1 , . . . , pk+m−1 , respectively. The transition probability from one state to the next in a cycle is 1. There is exactly one final state in each cycle and the initial distribution 1 places probability m on each final state. For every word az ∈ Lk,m we have z ≡ 0(mod pi ) for every k ≤ i ≤ k + m − 1 and for every word az ∈ Lk,m there is at least one i with z ≡ 0(mod pi ). Thus a word is either accepted with 1 probability 1, or it can reach acceptance probability at most 1 − m . Applying Fact 2, the number of states in the PFA is
k+m k+m−1 k+m−1 pi ≤ 2 i ln i ≤ 2 x ln x dx nk,m = i=k
=2
i=k
k
2 x=k+m
x x ln x − 2 4 2
x=k
≤ (k + 2km + m ) ln(k + m) − k 2 ln k m + (2km + m2 ) ln(k + m). = k 2 ln 1 + k m k ≤ ln em = m, But since k ln 1 + m k = ln 1 + k 2
2
nk,m ≤ km + (2km + m2 ) ln(k + m) ≤ (3km + m2 ) ln(k + m). Thus for any 0 ≤ α < 1, any constant m =
1 2
and a sufficiently large k, we have a
2 , c(Lk,m ) ≥ (k ln k)m > ((3km + m2 ) ln(k + m))αm ≥ nk,m
and the claim follows.
Our result shows that for a fixed isolation ε, the ultimate period of the language accepted by a PFA M with n states is only polynomial in n.
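The lower-bound construction is simple enough to sanity-check numerically. In the sketch below (our own test harness), the acceptance probability of a^z under the m-cycle PFA is |{i : z ≡ 0 mod p_i}|/m, so it equals 1 on L_{k,m} and is at most 1 − 1/m otherwise, confirming that λ = 1 − 1/(2m) is 1/(2m)-isolated.

```python
def accept_prob(z, primes):
    # One deterministic cycle per prime, one final state per cycle, and
    # initial probability 1/m on each final state.
    return sum(z % p == 0 for p in primes) / len(primes)

primes = [5, 7, 11]                 # k = 3, m = 3
m = len(primes)
lam, eps = 1 - 1 / (2 * m), 1 / (2 * m)
assert all(abs(accept_prob(z, primes) - lam) >= eps
           for z in range(5 * 7 * 11))
print("cutpoint", lam, "is", eps, "isolated over one full period of 385")
```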
4 Approximating the Size of a Minimal NFA
Stockmeyer and Meyer [10] show that the universe problem L(N) = Σ* is NP-complete for regular expressions and NFA's N, even if we consider only unary languages. Since our argument is based on their construction, we show the proof.

Fact 3. [10] For a unary NFA N, it is NP-hard to decide if L(N) = {a}*.

Proof. We reduce 3SAT to the universe problem for unary NFA's. Let Φ be a 3CNF-formula over n variables with m clauses. Let p_1, …, p_n be the first n primes and set D := Π_{i=1}^n p_i. According to the Chinese remainder theorem, the function μ : N_0 → N_0^n with μ(x) = (x mod p_1, …, x mod p_n) is injective, if we restrict the domain to {0, …, D − 1}. We call x a code (for an assignment), if μ(x) ∈ {0, 1}^n.

We construct a union automaton N_Φ that accepts {a}* iff Φ is not satisfiable. We first make sure that L_{0,Φ} = {a^k | k is not a code} is accepted. Therefore, for every prime p_i (p_i > 2) we construct a cycle that accepts the words a^j with j ≢ 0 (mod p_i) ∧ j ≢ 1 (mod p_i). So there are 2 non-final states and (p_i − 2) final states in the cycle. For every clause C of Φ with variables x_{i_1}, x_{i_2}, x_{i_3} we construct a cycle C* of length p_{i_1} p_{i_2} p_{i_3}. C* will accept {a^k | the assignment k mod p_{i_j} for x_{i_j} (j = 1, 2, 3) does not satisfy C}. Since the falsifying assignment is unique for the three variables in question, exactly one state is accepting in C*. The construction can be done in time polynomial in the length of Φ. If there is a word a^j ∉ L(N_Φ), then j is a code for a satisfying assignment. On the other hand every satisfying assignment has a code j and a^j is not accepted by N_Φ.

We set L_Φ = L(N_Φ) for the automaton N_Φ constructed above. Observe that L_Φ is a union of cyclic languages and hence itself cyclic. Obviously if Φ ∉ 3SAT, then the minimal NFA for L_Φ has size 1. We will show that for Φ ∈ 3SAT every NFA accepting L_Φ must have at least Σ_{i=2}^n p_i states, which implies Theorem 2.

Theorem 2. Given an NFA N with n states, it is impossible to efficiently approximate nsize(L(N)) within a factor of √n/ln n unless P = NP.

We first determine a lower bound for the period of L_Φ.

Lemma 3. For any given 3CNF-formula Φ ∈ 3SAT the minimal period of L_Φ is either D := Π_{i=2}^n p_i or 2D.

Proof. L_Φ is 2D-cyclic, since 2D is the least common multiple of the cycle lengths of N_Φ. Assume that neither D nor 2D is the minimal period of L_Φ. Then there is i ≥ 2, such that d = 2D/p_i is a period of L_Φ. We know that a^{qp_i+2} ∈ L_{0,Φ} for every q ∈ N, because qp_i + 2 does not represent a code. Since L_{0,Φ} ⊆ L_Φ and we assume that L_Φ is d-cyclic, a^{qp_i+2+rd} belongs to L_Φ for every r ∈ N as well. On the other hand, since L_Φ ≠ {a}*, there is an a^l ∉ L_Φ, and so a^{l+td} ∉ L_Φ for every t ∈ N. It is a contradiction if we find q, r, t ∈ N_0, so that q·p_i + 2 + r·d =
l + t·d, since the corresponding word has to be in L_Φ because of the left-hand side of the equation and cannot be in L_Φ because of the right-hand side.

∃q, r, t : q·p_i + 2 + r·d = l + t·d
⇔ ∃q, r, t : q·p_i = l − 2 + (t − r)·d
⇔ ∃q : q·p_i ≡ l − 2 (mod d)
⇔ ∃q : q ≡ (l − 2)·p_i^{−1} (mod d)

The multiplicative inverse of p_i modulo d exists, since gcd(p_i, d) = 1, and we have obtained the desired contradiction.

We will need a linear relation between the number of clauses and variables in the CNF-formula.

Fact 4. Let E3SAT−E5 be the satisfiability problem for formulae with exactly 3 literals in every clause and every variable appearing in exactly 5 distinct clauses; then E3SAT−E5 is NP-complete.

The following lemma determines a lower bound for the size of an NFA equivalent to N_Φ, if Φ is satisfiable.

Lemma 4. Let Φ ∈ E3SAT−E5 and assume that Φ consists of m clauses. Then nsize(L(N_Φ)) ≥ cm² ln m for some constant c.

Proof. We know from Lemma 3 that L(N_Φ) is either minimally D-cyclic or 2D-cyclic with D = Π_{i=2}^n p_i, where n is the number of variables in Φ. Applying Fact 1, the size of a minimal NFA accepting L_Φ is at least Σ_{i=2}^n p_i. We observe that

Σ_{i=2}^n p_i ≥ Σ_{i=1}^n i ln i ≥ ∫_1^n x ln x dx ≥ (n²/4) ln n.
We have 5n = 3m, and thus nsize(L_Φ) ≥ cm² ln m for some constant c.
Finally we determine an upper bound for the size of the NFA N_Φ.

Lemma 5. Let Φ be a 3CNF formula with m clauses and exactly 5 appearances of every variable. Then the NFA N_Φ has size Θ(m⁴(ln m)³).

Proof. The number of states in a cycle for a clause is a product of three primes. So there are at most m · p_n³ = Θ(m(m ln m)³) states in all of these cycles. The cycles recognizing L_{0,Φ} have Σ_{i=2}^n p_i = Θ(n² ln n) states, where n is the number of variables of Φ. Since n = Θ(m) the claim follows.

Proof (of Theorem 2). Assume that the polynomial time deterministic algorithm A approximates nsize(L(N)) within the factor √s/ln s for an NFA N with s states. We show that the satisfiability problem can be decided in polynomial time. Let Φ be the given input for the E3SAT−E5 problem, where we assume that Φ has n variables and m clauses. We construct the NFA N_Φ as in Fact 3. If Φ is not satisfiable, then nsize(L_Φ) = 1, and according to Lemma 5 the algorithm A claims that an equivalent NFA with at most

√s/ln s = √(Θ(m⁴(ln m)³)) / ln(Θ(m⁴(ln m)³)) = o(m² ln m)
states exists. Since Σ_{i=2}^n p_i = Θ(m² ln m), the claimed number of states is asymptotically smaller than nsize(L_Ψ) for any satisfiable formula Ψ with the same number of clauses as Φ. Hence with the help of A, we can decide if Φ is satisfiable within polynomial time.

Remark 1. For every 0 < ε ≤ 1 the same construction as in the proof of Theorem 2 can be used to show that it is not possible to approximate the size of a minimal PFA with isolation ε equivalent to a given n-state PFA with isolation c · n^{−1/4} within the factor √n/ln n.

For a given formula Φ with m clauses we construct the PFA M_Φ with m cycles (to check the validity of a code we can also use the clause cycles) and uniform initial distribution for the initial states of each cycle. We define the cutpoint as λ = 1/(2m). Hence a word is accepted by M_Φ iff it is accepted by at least one cycle. Thus the cutpoint λ is δ-isolated with δ = 1/(2m) ≥ c · n^{−1/4} for some appropriate c, and M_Φ behaves like a union automaton. Since L(M_Φ, λ) is the same language as considered before, it is 1-cyclic if Φ is not satisfiable, and has period D = Π_{i=2}^n p_i or 2D if Φ is satisfiable. Every PFA with isolated cutpoint that accepts a language with period Π_{i=2}^n p_i has at least Σ_{i=2}^n p_i states [7], independent of the actual isolation.

The approximation complexity changes if a unary cyclic language is specified by a DFA M, although the decision problem, namely to decide whether there is a k-state NFA accepting the cyclic language L(M), is not efficiently solvable unless NP ⊆ DTIME(n^{O(ln n)}) [4].

Theorem 3. Given a unary cyclic DFA accepting L with D states, an NFA for L with at most nsize(L) · (1 + ln D) states can be computed in polynomial time. Observe that nsize(L) · (1 + ln D) = O(nsize(L)^{3/2} ln nsize(L)).

Proof. We reduce the optimization problem for a given cyclic DFA M to an instance of the weighted set cover problem. We can assume M to be a minimal cyclic D-state DFA with the set of states Q = {0, …, D − 1}, 0 as the initial state, and final states F ⊆ Q. Then L(M) = {a^{j+kD} | j ∈ F, k ∈ N_0}. For every d_l that divides D we construct a deterministic cycle C_l with period d_l. The union automaton consisting of these cycles will accept L(M), if we choose the final states of C_l as follows: For each a^j ∈ L with 0 ≤ j < d_l, we let C_l accept a^j iff a^{j+k·d_l} ∈ L(M) for any 0 ≤ k < D/d_l. Remember that we don't have to check for a^x with x ≥ D, since L(M) is D-cyclic and d_l divides D.

At this stage the union automaton will have a lot of unnecessary cycles. Therefore we define an instance of the set cover problem, where we introduce a set T_l := {j | 0 ≤ j < D, a^j is accepted by C_l} of weight w_l := d_l for every cycle C_l. The universe is {j | 0 ≤ j < D, a^j ∈ L(M)}. The instance can be constructed in polynomial time, since the number of divisors of D is less than D and thus the set cover problem consists of at most D sets with at most D elements.

If N is a minimal NFA accepting L(M), then we know from Fact 1 that N is a union automaton (with an additional initial state) that consists of cycles
with periods that divide D. Every cycle C* of N corresponds to a set T_l, and the accepted words of C* up to length D − 1 are contained in T_l. So a minimal union automaton with n states can be expressed by a set cover of weight n. On the other hand, every set cover can be considered to be a union automaton. Thus a minimal set cover corresponds to a minimal NFA.

The greedy algorithm for the weighted set cover problem approximates the optimal set cover within the factor H(k) = Σ_{i=1}^k 1/i ≤ 1 + ln k, where k is the size of the largest set [6]. For an n-state NFA N, Chrobak [1] bounds c(L(N)) by the Landau function and obtains D = O(e^{√(n ln n)}).
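A minimal sketch of the greedy step, our own rendering of the textbook algorithm cited as [6]: repeatedly choose the cycle set T_l minimizing weight per newly covered element.

```python
def greedy_weighted_cover(universe, sets):
    # sets: dict name -> (frozenset_of_elements, weight). Assumes the
    # union of all sets covers the universe, as it does for the T_l.
    uncovered, chosen = set(universe), []
    while uncovered:
        name = min((n for n in sets if sets[n][0] & uncovered),
                   key=lambda n: sets[n][1] / len(sets[n][0] & uncovered))
        chosen.append(name)
        uncovered -= sets[name][0]
    return chosen
```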
5 Conclusions and Open Problems
In Theorem 1 we have shown that PFA's with constant isolation lead to only polynomially smaller automata in comparison to cyclic unary DFA's. It is not hard to observe that PFA's with constant isolation are negatively exponentially smaller than DFA's for non-cyclic unary languages. The size relation between minimal PFA's and minimal DFA's for non-cyclic unary languages is to be further explored.

The hardness result of Theorem 2 for minimizing unary NFA's is tight within a square, since the factor √n/ln n is excluded for a given NFA of size n. Is Theorem 2 "essentially" optimal?

Jiang and Ravikumar [5] state the open problem of approximating a minimal NFA given a DFA, specifically to determine the complexity of designing an NFA accepting L(M) with at most nsize(L(M))^k states for a given DFA M and a given k. We have answered the question for the case of unary cyclic DFA's and k > 3/2 in Theorem 3.
References

1. Chrobak, M.: Finite automata and unary languages, Theoretical Computer Science 47, 1986, pp. 149-158.
2. Gantmacher, F.R.: Theory of Matrices, Vol. II, Chelsea, New York, 1959.
3. Graham, R., Knuth, D., Patashnik, O.: Concrete Mathematics, Addison Wesley, Reading, Massachusetts, 1989.
4. Jiang, T., McDowell, E., Ravikumar, B.: The structure and complexity of minimal NFA's over a unary alphabet, Int. J. Found. of Comp. Sci., 2, 1991, pp. 163-182.
5. Jiang, T., Ravikumar, B.: Minimal NFA problems are hard, SIAM Journal on Computing, 22 (1), 1993, pp. 1117-1141.
6. Hochbaum, D. (editor): Approximation algorithms for NP-hard problems, PWS Publishing Company, Boston, 1997.
7. Mereghetti, C., Palano, B., Pighizzini, G.: On the succinctness of deterministic, nondeterministic, probabilistic and quantum finite automata, DCAGRS 2001.
8. Milani, M., Pighizzini, G.: Tight bounds on the simulation of unary probabilistic automata by deterministic automata, DCAGRS 2000.
9. Rabin, M.: Probabilistic automata, Information and Control, 1963, pp. 230-245.
10. Stockmeyer, L., Meyer, A.: Word Problems Requiring Exponential Time, Proc. of the 5th Ann. ACM Symposium on Theory of Computing, New York, 1973, pp. 1-9.
On Matroid Properties Definable in the MSO Logic

Petr Hliněný

Institute of Mathematics and Comp. Science (MÚ SAV)
Matej Bel University and Slovak Academy of Sciences
Severná ul. 5, 974 00 Banská Bystrica, Slovakia
[email protected]
Abstract. It has been proved by the author that all matroid properties definable in the monadic second-order (MSO) logic can be recognized in polynomial time for matroids of bounded branch-width which are represented by matrices over finite fields. (This result extends the so-called "MS₂-theorem" for graphs by Courcelle and others.) In this work we review the MSO theory of finite matroids and show some interesting matroid properties which are MSO-definable. In particular, all minor-closed properties are recognizable in such a way.

Keywords: matroid, branch-width, MSO logic, parametrized complexity.
1 Introduction
The theory of parametrized complexity provides a background for analysis of difficult algorithmic problems which is finer than classical complexity theory. We postpone formal definitions till Section 3. Briefly saying, a problem is called “fixed-parameter tractable” if there is an algorithm having running time with the (possible) super-polynomial part separated in terms of some natural “parameter”, which is supposed to be small even for large input in practice. (Successful practical applications of this concept are known, for example, in computational biology or in database theory.) We are interested in algorithmic problems that are parametrized by a “treelike” structure of the input objects. Graph “branch-width” is closely related to well-known tree-width [13], but a branch decomposition does not refer to vertices, and so branch-width directly generalizes from graphs to matroids. It follows from works of Courcelle [2] and Bodlaender [1] that all graph problems definable in the monadic second-order logic can be solved in linear time for graphs of bounded tree-width. Those include many notoriously hard problems like 3-colouring, Hamiltonicity, etc.
Parts of this research have been done during the author's stay at the Victoria University of Wellington in New Zealand. From August 2003 also Department of Computer Science, Technical University Ostrava, Czech Republic.
We study and present analogous results for matroids representable over finite fields. The motivation of our research is mainly theoretical — to show how the mentioned complexity phenomenon extends from graphs to a much larger class of combinatorial objects, and to stimulate further research interest in matroid branch-width and the complexity of matroid problems. (Unfortunately, wide generality of our approach leads to impractically huge constants involved in the algorithms, such as in Theorem 4.1.) Since not all computer scientists are familiar with structural matroid theory or with parametrized complexity, we give a basic overview of necessary concepts in the next two sections.
2 Matroids and Branch-Width
We refer to Oxley [12] for matroid terminology. A matroid is a pair M = (E, B) where E = E(M) is the ground set of M (elements of M), and B ⊆ 2^E is a nonempty collection of bases of M. Moreover, matroid bases satisfy the "exchange axiom": if B_1, B_2 ∈ B and x ∈ B_1 − B_2, then there is y ∈ B_2 − B_1 such that (B_1 − {x}) ∪ {y} ∈ B. We consider only finite matroids. Subsets of bases are called independent sets, and the remaining sets are dependent. Minimal dependent sets are called circuits. All bases have the same cardinality, called the rank r(M) of the matroid. The rank function r_M : 2^E → N of M tells the maximal cardinality r_M(X) of an independent subset of a set X ⊆ E(M). If G is a graph, then its cycle matroid on the ground set E(G) is denoted by M(G). The bases of M(G) are the (maximal) spanning forests of G, and the circuits of M(G) are the cycles of G. Another example of a matroid is a finite set of vectors with usual linear dependency. If A is a matrix, then the matroid formed by the column vectors of A is called the vector matroid of A, and denoted by M(A). The matrix A is a representation of the matroid M ≃ M(A). We say that the matroid M(A) is F-represented if A is a matrix over a field F.

The dual matroid M* of M is defined on the same ground set E, and the bases of M* are the set-complements of the bases of M. The dual rank function satisfies r_{M*}(X) = |X| − r(M) + r_M(E − X). A set X is coindependent in M if it is independent in M*. An element e of M is called a loop (a coloop), if {e} is dependent in M (in M*). The matroid M \ e obtained by deleting a non-coloop element e is defined as (E − {e}, B^−) where B^− = {B : B ∈ B, e ∉ B}. The matroid M/e obtained by contracting a non-loop element e is defined using duality M/e = (M* \ e)*. (This corresponds to contracting an edge in a graph.) A minor of a matroid is obtained by a sequence of deletions and contractions of elements. Since these operations naturally commute, a minor M′ of a matroid M can be uniquely expressed as M′ = M \ D / C, where D are the coindependent deleted elements and C are the independent contracted elements. A matroid family M is minor-closed if M ∈ M implies that all minors of M are in M. A matroid N is called an excluded minor (also known as "forbidden") for a minor-closed family M if N ∉ M but N′ ∈ M for all proper minors N′ of N.

The connectivity function λ_M of a matroid M is defined for all subsets A ⊆ E = E(M) by λ_M(A) = r_M(A) + r_M(E − A) − r(M) + 1. Notice that λ_M(A) = λ_M(E − A).
Fig. 1. Two examples of width-3 branch decompositions of the Pappus matroid (top left, rank 3) and of the binary affine cube (bottom left, rank 4). Here the lines depict linear dependencies between matroid elements.
Notice that λM(A) = λM(E − A). It is also routine to verify that λM(A) = λM∗(A), i.e., matroid connectivity is dual-invariant. A subset A ⊆ E is k-separating if λM(A) ≤ k. A partition (A, E − A) is called a k-separation if A is k-separating and both |A|, |E − A| ≥ k. For n > 1, the matroid M is called n-connected if it has no k-separation for k = 1, 2, . . . , n − 1, and |E(M)| ≥ 2n − 2. (A connected matroid corresponds to a vertex 2-connected graph. The geometric interpretation of a k-separation (A, B) is that the spans of A and of B intersect in a subspace of rank less than k.) Let L(T) denote the set of leaves of a tree T. A branch decomposition of a matroid M is a pair (T, τ) where T is a tree of maximal degree three, and τ is a bijection of E(M) onto L(T). Let f be an edge of T, and T1, T2 be the connected components of T − f. The width of the edge f in T is λM(A) = λM(B), where A = τ^{-1}(L(T1)) and B = τ^{-1}(L(T2)). The width of the branch decomposition (T, τ) is the maximum of the widths of all edges of T, and the branch-width of M is the minimum width over all branch decompositions of M. If T has no edge, then we take its width as 0. An example of a branch decomposition is presented in Fig. 1. Notice that matroid branch-width is invariant under duality. It is straightforward to verify that branch-width does not increase when taking minors: Let (T, τ) be a branch decomposition of a matroid M. Say, up to duality, that M′ = M \ e. We form T′ from T by deleting the leaf τ(e), and set τ′ to be τ restricted to E(M′). Then, for any partition (A, B) of E(M) given by an edge f in T, we obviously have λM′(A − {e}) ≤ λM(A), and so the width of (T′, τ′) is not bigger than the width of (T, τ) for M.
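The width computation itself is elementary once the connectivity function is available. A small Python sketch of the definition above (names ours; lam_oracle is any function returning λM of an element set, e.g. built from the previous sketch):

def decomposition_width(tree_adj, tau, lam_oracle):
    # Width of a branch decomposition (T, tau): the maximum, over the edges f
    # of T, of lambda_M(A), where A collects the matroid elements mapped to
    # the leaves of one component of T - f.
    edges = {frozenset((u, v)) for u in tree_adj for v in tree_adj[u]}
    width = 0
    for f in edges:
        u, _ = tuple(f)
        seen, stack = {u}, [u]            # flood-fill one side of T - f
        while stack:
            w = stack.pop()
            for x in tree_adj[w]:
                if frozenset((w, x)) != f and x not in seen:
                    seen.add(x)
                    stack.append(x)
        A = {e for e, leaf in tau.items() if leaf in seen}
        width = max(width, lam_oracle(A))
    return width

# e.g. lam_oracle = lambda A: lam(cols, A), with cols as in the previous sketch

Checking that the decompositions in Fig. 1 have width 3 amounts to evaluating this maximum on the depicted trees.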
We remark that the branch-width of a graph G is defined analogously, using the connectivity function λG, where λG(F) for F ⊆ E(G) is the number of vertices incident both with F and with E(G) − F. Clearly, the branch-width of a graph G is never smaller than the branch-width of its cycle matroid M(G). It is still an open conjecture whether these numbers are actually equal. On the other hand, branch-width is within a constant factor of tree-width in graphs [13]. Lastly in this section, we say a few words about the relations of matroid theory to computer science. As the reader surely knows, the greedy algorithm on a matroid is one of the basic tools in combinatorial optimization. That is why matroids naturally arise in a number of optimization problems, such as the minimum spanning tree or job assignment problems. More involved applications of matroids in combinatorial optimization can be found in numerous works of Edmonds, Cunningham and others. Besides that, the concept of branch-width has attracted increasing attention among matroid theorists recently, and several deep results of Robertson–Seymour graph minor theory have been extended from graphs to matroids representable over finite fields, such as [6]. Robertson–Seymour theory has been followed by many interesting algorithmic applications on graphs (mostly related to tree-width or branch-width). Therefore we think it is the right time now to look at the complexity aspects of branch-width in matroid problems. For example, we have given a straightforward polynomial algorithm for computing the Tutte polynomial [10] on a representable matroid of bounded branch-width. (It seems that matroids present a more suitable model than graphs for computing the Tutte polynomial on structures of bounded tree-/branch-width.) As yet another motivation we remark that linear codes over a finite field F are in a direct correspondence with F-represented matroids.
3 Parametrized Complexity
When speaking about parametrized complexity, we closely follow Downey and Fellows [4]. Here we present the basic definition of parametrized tractability. For simplicity, we restrict the definition to decision problems, although an extension to computation problems is straightforward. Let Σ be the input alphabet. A parametrized problem is an arbitrary subset Ap ⊆ Σ∗ × N. For an instance (x, k) ∈ Ap, we call k the parameter and x the input for the problem. (The parameter is sometimes implicit in the context.) We say that a parametrized problem Ap is (nonuniformly) fixed-parameter tractable if there is a sequence of algorithms {Ai : i ∈ N} and a constant c such that (x, k) ∈ Ap iff the algorithm Ak accepts (x, k), and the running time of Ak on (x, k) is O(|x|^c) for each k. Similarly, a parametrized problem Ap is uniformly fixed-parameter tractable if there is an algorithm A, a constant c, and an arbitrary function f : N → N such that (x, k) ∈ Ap iff the algorithm A accepts (x, k), and the running time of A on (x, k) is O(f(k) · |x|^c). There is a natural correspondence of a parametrized problem Ap to an ordinary problem A = {⟨x, k⟩ : (x, k) ∈ Ap} (for example, the problem of a
k-vertex cover in a graph), or to a problem A′ = {x : ∃k (x, k) ∈ Ap} if k is not "directly involved" in the question (such as a Hamiltonian cycle in a graph of tree-width k). On the other hand, an ordinary problem may have several natural parametrized versions respecting different parameters. We remark that the parameter is formally a natural number, but it may encode arbitrary finite structures in a standard way. As we have already noted above, our interest is in parametrized problems where the parameter is branch-width (tree-width). Inspired by the algorithm of Bodlaender [1], we have shown that branch-width of matroids represented over finite fields is fixed-parameter tractable, and that, moreover, we can efficiently construct a branch decomposition. Let Bt denote the class of all matroids of branch-width at most t. We have proved the following: Theorem 3.1. (PH [9]) Let t ≥ 1 be fixed, and let F be a finite field. Suppose that A is an r × n matrix over F (r ≤ n) such that the represented matroid M(A) ∈ Bt. Then there is an algorithm that finds a branch decomposition of the matroid M(A) of width at most 3t in time O(n^3). Actually, our algorithm directly constructs a so-called "parse tree" for the mentioned branch decomposition. Unfortunately, the algorithm in Theorem 3.1 does not necessarily produce the optimal branch decomposition. On the other hand, there are finitely many excluded minors for the class Bk for each k, and these excluded minors can be constructed algorithmically since they have size at most (6^(k+1) − 1)/5 by [5]. Hence, in this particular case, we can extend the idea in Theorem 5.2 to show: Corollary 3.2. Let F be a finite field. Suppose that A is a given matrix over F. Then branch-width of the matroid M(A) is uniformly fixed-parameter tractable.
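To illustrate the shape of these definitions on the classical example just mentioned, the k-vertex cover problem is uniformly fixed-parameter tractable by the standard bounded search tree; the following Python sketch (a textbook algorithm, not from this paper) runs in time O(2^k · |E|), which is of the required form f(k) · |x|^c:

def vertex_cover_at_most_k(edges, k):
    # Some endpoint of any uncovered edge (u, v) must go into the cover,
    # so branch on u and on v; the search tree has depth at most k.
    if not edges:
        return True
    if k == 0:
        return False
    u, v = edges[0]
    rest_u = [(a, b) for (a, b) in edges if u not in (a, b)]
    rest_v = [(a, b) for (a, b) in edges if v not in (a, b)]
    return (vertex_cover_at_most_k(rest_u, k - 1)
            or vertex_cover_at_most_k(rest_v, k - 1))

# A 4-cycle has a vertex cover of size 2 but none of size 1.
assert vertex_cover_at_most_k([(0, 1), (1, 2), (2, 3), (3, 0)], 2)
assert not vertex_cover_at_most_k([(0, 1), (1, 2), (2, 3), (3, 0)], 1)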
4 MSO Logic of Matroids
The monadic second-order (MSO) theory of matroids uses a language based on the monadic second-order logic. The syntax includes variables for matroid elements and element sets, the quantifiers ∀, ∃ applicable to these variables, the logical connectives ∧, ∨, ¬, and the following predicates:
1. =, the equality for elements and their sets,
2. e ∈ F, where e is an element variable and F is an element set variable,
3. indep(F), where F is an element set variable, and the predicate tells whether F is independent in the matroid.
Moreover, we write φ → ψ to stand for ¬φ ∨ ψ, and X ⊆ Y for ∀x (x ∈ Y ∨ x ∉ X). Notice that the "universe" of a formula (the model in logic terms) in the above theory is one particular matroid. To give a better feeling for the MSO theory of matroids, we provide a few simple predicates now. We write basis(B) ≡ indep(B) ∧ ∀e (e ∈ B ∨ ¬indep(B ∪ {e})), where indep(B ∪ {e}) is a shortcut for the obvious ∃X (indep(X) ∧ e ∈ X ∧ B ⊆ X ∧ ∀x (x = e ∨ x ∈ B ∨ x ∉ X)). Similarly,
we write a predicate circuit(C) ≡ ¬indep(C) ∧ ∀e (e ∈ C → indep(C − {e})), where indep(C − {e}) is a shortcut for ∃X (indep(X) ∧ e ∉ X ∧ X ⊆ C ∧ ∀x (x = e ∨ x ∉ C ∨ x ∈ X)). Let us now look at the (graph) property of being Hamiltonian. In matroid language, that means to have a circuit containing a basis. So we may write a sentence hamilton ≡ ∃C (circuit(C) ∧ ∃e basis(C − {e})). A related matroidal property is to be a paving matroid M, i.e., to have all circuits C in M of size |C| ≥ r(M). Let us explain this sample property in detail. Since C − {e} is independent for each e ∈ C by the definition of a circuit, we have |C| ≤ r(M) + 1 for any circuit C in M. Considering a basis B ⊇ C − {e} and the inequality |C| ≥ r(M) = |B| valid in a paving matroid, we conclude that there is an element f such that B ⊆ C ∪ {f}. The converse also holds. Hence we express paving ≡ ∀C (circuit(C) → ∃f, B (B ⊆ C ∪ {f} ∧ basis(B))). The reason why we are looking for properties definable in the MSO logic of matroids is that such properties can be recognized in polynomial time for matroids of bounded branch-width over finite fields. The following result is based on a finite-state recognizability of matroidal MSO properties, proved by the author in [8], and on Theorem 3.1. Theorem 4.1. (PH [7,8,9]) Let F be a finite field. Assume that M is a class of matroids defined in one of the following ways: (a) there is an MSO sentence φ such that M ∈ M iff φ is true on M, or (b) there is a sequence of MSO sentences {φk : k = 1, 2, . . .} and, for all k ≥ 1 and matroids M ∈ Bk, we have M ∈ M iff φk is true on M. Suppose that A is an n-column matrix over F such that M(A) ∈ Bt, where t ≥ 1 is fixed. Then there is an algorithm deciding whether M(A) ∈ M in time O(n^3), and this algorithm can be constructed from the given sentence(s) φ or φt for all t. Remark. In the language of parametrized complexity, Theorem 4.1 says that the class of F-represented matroids defined by MSO sentences φ or φt is fixed-parameter tractable with respect to the combined parameter ⟨F, t⟩. Moreover, in the case (a), or in the case (b) when the sentences φk are constructible by an algorithm, the class M is uniformly fixed-parameter tractable. So it follows that the properties of being Hamiltonian or a paving matroid can be efficiently recognized on F-represented matroids of bounded branch-width. Other simple matroidal properties definable in the MSO logic are, for example, the properties of being identically self-dual, or being a "free spike" [11]. Moreover, all properties definable in the extended MSO theory of graphs (MS2) are also MSO-definable over graphic matroids [8]. Several more interesting classical matroid properties are shown to be MSO-definable in the next sections.
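On an explicitly given small matroid, the semantics of these predicates can be checked by brute force. The following Python sketch (ours, purely illustrative; the real algorithms behind Theorem 4.1 work on parse trees instead of enumerating subsets) evaluates basis, circuit, hamilton and paving from an independence oracle:

from itertools import combinations

def predicates(E, indep):
    subsets = [frozenset(s) for r in range(len(E) + 1)
               for s in combinations(E, r)]
    rank = max(len(S) for S in subsets if indep(S))
    basis = lambda B: indep(B) and len(B) == rank
    circuit = lambda C: not indep(C) and all(indep(C - {e}) for e in C)
    # a circuit containing a basis, as in the sentence `hamilton' above
    hamilton = any(circuit(C) and any(basis(C - {e}) for e in C)
                   for C in subsets)
    paving = all(len(C) >= rank for C in subsets if circuit(C))
    return basis, circuit, hamilton, paving

# The cycle matroid M(K3) = U_{2,3}: any two of the three elements are
# independent, and the whole ground set is the unique circuit.
basis, circuit, hamilton, paving = predicates(frozenset({0, 1, 2}),
                                              lambda S: len(S) <= 2)
assert circuit(frozenset({0, 1, 2})) and hamilton and paving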
5 Minor-Closed Properties
It is easy to see that the class of F-representable matroids is minor-closed, and so is the class Bt of matroids of branch-width at most t. We say that a set S is well-quasi-ordered (WQO) if there are neither infinite antichains nor infinite strictly
descending chains in S. By a deep result of [6], matroids of bounded branch-width which are representable over a fixed finite field F are WQO in the minor order. (However, unlike graphs, matroids are not WQO in general.) So it follows that any minor-closed matroid family M has a finite number of F-representable excluded minors in Bt. We now show that the presence of one particular minor can be described by an MSO sentence. Lemma 5.1. Let N be a matroid. There is a (computable) MSO sentence ψN such that ψN is true on a matroid M if and only if M has an N-minor. Proof. N is a minor of M if and only if there are two sets C, D such that C is independent and D is coindependent in M, and N = M \ D/C. Suppose that N = M \ D/C holds. Then a set X ⊆ E(N) is dependent in N if and only if there is a dependent set Y ⊆ E(M) in M such that Y − X ⊆ C. (This simple claim may be more obvious when viewed over the dual matroid M∗: a set is dependent in M iff it intersects each basis of M∗, and N∗ = M∗/D \ C.) Since N is fixed, we may identify the elements of the (supposed) N-minor in M by variables x1, . . . , xn in order, where n = |E(N)|. Then, knowing the contract set C (and the implicit D), we are able to say which subsets of {x1, . . . , xn} are dependent in M \ D/C. For each J ⊆ [1, n], we write mdep(xj : j ∈ J; C) ≡ ∃Y (¬indep(Y) ∧ ∀y (y ∉ Y ∨ y ∈ C ∨ ⋁_{j∈J} y = xj)).
Now, M \ D/C is isomorphic to N iff the dependent subsets of {x1 , . . . , xn } exactly match the dependent sets of N . Hence we express ψN as
ψN ≡ ∃C ∃x1, . . . , xn ( ⋀_{J∈J+} ¬mdep(xj : j ∈ J; C) ∧ ⋀_{J∈J−} mdep(xj : j ∈ J; C) ),
where J+ is the set of all J ⊆ [1, n] such that {xj : j ∈ J} actually is independent in N, and where J− is the complement of J+. □ Hence, in connection with Theorem 4.1, we conclude: Theorem 5.2. Let t ≥ 1 be fixed, let F be a finite field, and let M be a minor-closed family. Given a matrix A over F with n columns such that M(A) ∈ Bt, one can decide whether the matroid M(A) belongs to M in time O(n^3). Proof. As already noted above, the family M has a finite number of F-representable excluded minors X1, . . . , Xp ∈ Bt. Keeping in mind that all minors of M(A) also belong to Bt, we see that M(A) ∈ M iff M(A) has no minors isomorphic to X1, . . . , Xp. (For formal completeness, we may verify M(A) ∈ Bt using Corollary 3.2.) We write φt ≡ ¬ψX1 ∧ . . . ∧ ¬ψXp using Lemma 5.1. Finally, we apply Theorem 4.1(b). □ Applications of this theorem include determining the exact branch-width (cf. Section 3) or tree-width of a matroid, or deciding matroid orientability and representability over another field.
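The characterization behind Lemma 5.1 can also be tested by brute force on explicit small matroids. The sketch below (ours; exponential time, purely illustrative) enumerates an independent contract-set C and a kept ground set, using the fact that, for an independent C, a set X disjoint from C is independent in M/C iff X ∪ C is independent in M:

from itertools import combinations, permutations

def has_minor(M_E, M_indep, N_E, N_indep):
    E, N = sorted(M_E), sorted(N_E)
    for c_size in range(len(E) - len(N) + 1):
        for C in map(frozenset, combinations(E, c_size)):
            if not M_indep(C):
                continue
            rest = [e for e in E if e not in C]
            for F in combinations(rest, len(N)):   # kept elements; D = rest - F
                for img in permutations(F):        # candidate isomorphisms
                    phi = dict(zip(N, img))
                    if all(N_indep(frozenset(X)) ==
                           M_indep(frozenset(phi[x] for x in X) | C)
                           for r in range(len(N) + 1)
                           for X in combinations(N, r)):
                        return True
    return False

# U_{2,3} (a triangle) is a minor of the 4-element line U_{2,4} by one deletion.
assert has_minor(range(4), lambda S: len(S) <= 2,
                 range(3), lambda S: len(S) <= 2)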
Remark. Unfortunately, the proof of Theorem 5.2 is non-constructive: there is in general no way to compute the excluded minors X1, . . . , Xp, nor even their number or size. So we cannot speak about uniform fixed-parameter tractability here.
6 Matroid Connectivity
Another interesting task is to describe matroid connectivity in the MSO logic. That can be done quite easily. Lemma 6.1. Let M be a matroid on the ground set E, and let k ≥ 1. There is an MSO formula σk (X) which is true for X ⊆ E if and only if λM (X) ≥ k + 1. Proof. By definition, λM (X) ≥ k + 1 iff rM (X) + rM (E − X) ≥ r(M ) + k. Using standard matroidal arguments, this is equivalent to stating that there exist two bases B1 , B2 of M such that B2 ∩ X ⊂ B1 and |(B1 − B2 ) ∩ X| ≥ k. We may formalize this statement as
σk(X) ≡ ∃B1, B2 ( basis(B1) ∧ basis(B2) ∧ ∀x ((x ∈ B2 ∧ x ∈ X) → x ∈ B1) ∧ ∃z1, . . . , zk ( ⋀_{i≠j} zi ≠ zj ∧ ⋀_i zi ∈ X ∧ ⋀_i zi ∈ B1 ∧ ⋀_i zi ∉ B2 ) ). □
So we may finish this section with the following immediate result: Corollary 6.2. For each n > 1, there is an MSO sentence κn which is true on a matroid M if and only if M is n-connected.
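On explicit matroids, the definitions from Section 2 can again be evaluated directly; a Python sketch (ours) of an n-connectedness test via the connectivity function:

from itertools import combinations

def is_n_connected(E, rank, n):
    # No k-separation for k = 1, ..., n-1, and |E| >= 2n - 2, where
    # lambda_M(A) = r(A) + r(E - A) - r(M) + 1 and a k-separation needs
    # lambda_M(A) <= k with both sides of size >= k.
    E = frozenset(E)
    rM = rank(E)
    lam = lambda A: rank(A) + rank(E - A) - rM + 1
    if len(E) < 2 * n - 2:
        return False
    for size in range(1, len(E)):
        for A in map(frozenset, combinations(E, size)):
            la = lam(A)
            if la <= n - 1 and size >= la and len(E) - size >= la:
                return False
    return True

# The 4-element line U_{2,4} is 3-connected.
assert is_n_connected(range(4), lambda S: min(len(S), 2), 3)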
7 Transversal Matroids
A matroid M is transversal if there is a bipartite graph G with vertex parts V = E(M) and W, such that the rank of any set X in M equals the largest size of a matching incident with X in G. (Equivalently, a transversal matroid is a union of rank-1 matroids.) We consider transversal matroids here mainly because they have a long history of research, but not much is known about their relation to branch-width. Two elements e, f in a matroid M are parallel if {e, f} is a circuit, and e, f are in series if e, f are parallel in the dual M∗. A series minor of a matroid M is obtained by a sequence of contractions of series elements and arbitrary deletions of elements in M. A matroid having a representation over GF(2) is called a binary matroid. The trouble with transversal matroids is that they are not closed under taking minors or duals. However, series minors of transversal matroids are transversal again. We cannot use a "series" analogue of Theorem 5.2 since there is no well-quasi-ordering property of series minors, even at bounded branch-width. Still, we can say a bit:
Theorem 7.1. There is an MSO sentence τ which is true on a matroid M if and only if M is a binary transversal matroid. Sketch of proof. Let Ck² denote the graph obtained from a cycle Ck of length k by adding one parallel edge to each edge of Ck. According to [3], the following is true: A matroid M is both binary and transversal if and only if M has no series minor isomorphic to either the 4-element line U2,4, or the graphic matroids M(K4) or M(Ck²) for k ≥ 3. Let N = M \ D/C be a minor of M, and let F = E(N). It is not a problem to express that N is a series minor of M, i.e., that C consists of series elements of M \ D. (For simplicity, we assume no coloops.) We write ∀x ∈ C ∃y ∈ F ∀Z ((Z ⊆ F ∪ C ∧ basis(Z)) → (x ∈ Z ∨ y ∈ Z)). Now let P be a matroid. We may express whether P is isomorphic to M(Ck²) (regardless of the value of k) as follows: ∃Z ( circuit(Z) ∧ ∀x ∈ Z ∃y ∉ Z circuit(x, y) ∧ ∀y ∉ Z ∃!x (x ∈ Z ∧ circuit(x, y)) ), where ∃!x Π(x) is a shortcut for ∃x Π(x) ∧ ∀x, x′ (x = x′ ∨ ¬Π(x) ∨ ¬Π(x′)). The rest of the proof proceeds by combining the previous formulas with the ideas in the proof of Lemma 5.1. (Considering the matroid P as a minor of M, we use the predicate mdep from that proof to express circuit in the above formula.) We leave the technical details to the reader. □ Since the proof of Theorem 7.1 is very specific to binary matroids, we doubt that it could be extended to all matroids. Thus we ask: Problem 7.2. Is the property of being a transversal matroid MSO-definable?
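We also note that the defining rank function of a transversal matroid is itself easy to evaluate: the rank of a set X is the maximum size of a matching saturating X, computable by augmenting paths. A Python sketch (ours) under the bipartite presentation from the beginning of this section, where adj maps each element of V = E(M) to its neighbors in W:

def transversal_rank(adj, X):
    match = {}                        # W-vertex -> element currently matched

    def augment(v, seen):
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                if w not in match or augment(match[w], seen):
                    match[w] = v
                    return True
        return False

    return sum(1 for v in X if augment(v, set()))

# The rank-1 uniform matroid U_{1,3}: all three elements see one W-vertex.
assert transversal_rank({0: {'w'}, 1: {'w'}, 2: {'w'}}, {0, 1, 2}) == 1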
Acknowledgement I would like to thank Prof. Geoff Whittle from Victoria University for introducing me to the beauties of structural matroid theory, and to Prof. Rod Downey for pointing my research towards parametrized complexity of matroid problems. Moreover, I am grateful to the NZ Marsden Fund and the Victoria University of Wellington for supporting my stay in New Zealand.
References
1. H.L. Bodlaender, A Linear Time Algorithm for Finding Tree-Decompositions of Small Treewidth, SIAM J. Computing 25 (1996), 1305–1317.
2. B. Courcelle, The Monadic Second-Order Logic of Graphs I. Recognizable Sets of Finite Graphs, Information and Computation 85 (1990), 12–75.
3. J. de Sousa, D.J.A. Welsh, A Characterisation of Binary Transversal Matroids, J. Math. Anal. Appl. 40 (1972), 55–59.
4. R.G. Downey, M.R. Fellows, Parametrized Complexity, Springer-Verlag, 1999.
5. J.F. Geelen, A.H.M. Gerards, N. Robertson, G.P. Whittle, On the Excluded Minors for the Matroids of Branch-Width k, J. Combin. Theory Ser. B, to appear (2003).
6. J.F. Geelen, A.H.M. Gerards, G.P. Whittle, Branch-Width and Well-Quasi-Ordering in Matroids and Graphs, J. Combin. Theory Ser. B 84 (2002), 270–290.
7. P. Hliněný, Branch-Width, Parse Trees, and Monadic Second-Order Logic for Matroids (Extended Abstract), In: STACS 2003, Lecture Notes in Computer Science 2607, Springer-Verlag (2003), 319–330.
8. P. Hliněný, Branch-Width, Parse Trees, and Monadic Second-Order Logic for Matroids, submitted, 2002.
9. P. Hliněný, A Parametrized Algorithm for Matroid Branch-Width, submitted, 2002.
10. P. Hliněný, The Tutte Polynomial for Matroids of Bounded Branch-Width, submitted, 2002.
11. P. Hliněný, It is Hard to Recognize Free Spikes, submitted, 2002.
12. J.G. Oxley, Matroid Theory, Oxford University Press, 1992, 1997.
13. N. Robertson, P.D. Seymour, Graph Minors X. Obstructions to Tree-Decomposition, J. Combin. Theory Ser. B 52 (1991), 153–190.
Characterizations of Catalytic Membrane Computing Systems (Extended Abstract)

Oscar H. Ibarra1, Zhe Dang2, Omer Egecioglu1, and Gaurav Saxena1

1 Department of Computer Science, University of California, Santa Barbara, CA 93106, USA; [email protected]; Fax: 805-893-8553
2 School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
Abstract. We look at 1-region membrane computing systems which only use rules of the form Ca → Cv, where C is a catalyst, a is a noncatalyst, and v is a (possibly null) string of noncatalysts. There are no rules of the form a → v. Thus, we can think of these systems as “purely” catalytic. We consider two types: (1) when the initial configuration contains only one catalyst, and (2) when the initial configuration contains multiple (not necessarily distinct) catalysts. We show that systems of the first type are equivalent to communication-free Petri nets, which are also equivalent to commutative context-free grammars. They define precisely the semilinear sets. This partially answers an open question in [19]. Systems of the second type define exactly the recursively enumerable sets of tuples (i.e., Turing machine computable). We also study an extended model where the rules are of the form q : (p, Ca → Cv) (where q and p are states), i.e., the application of the rules is guided by a finite-state control. For this generalized model, type (1) as well as type (2) with some restriction correspond to vector addition systems. Keywords: membrane computing, catalytic system, semilinear set, vector addition system, reachability problem.
1 Introduction
In recent years, there has been a burst of research in the area of membrane computing [16], which identifies an unconventional computing model (namely a P system) from natural phenomena of cell evolutions and chemical reactions [2]. Due to the built-in nature of maximal parallelism inherent in the model, P systems have a great potential for implementing massively concurrent systems in an efficient way, once future biotechnology (or silicon technology) gives way to a practical bio-realization (or a chip-realization). In this sense, it is important to study the computing power of the model.
This research was supported in part by NSF Grants IIS-0101134 and CCR02-08595.
Two fundamental questions one can ask of any computing device (such as a Turing machine) are: (1) What kinds of restrictions/variations can be placed on the device without reducing its computing power? (2) What kinds of restrictions/variations can be placed on the device which will reduce its computing power? For Turing machines, the answer to (1) is that Turing machines (as well as variations like multitape, nondeterministic, etc.) accept exactly the recursively enumerable (r.e.) languages. For (2), there is a wide spectrum of well-known results concerning various sub-Turing computing models that have been introduced during the past half century: to list a few, there are finite automata, pushdown automata, linearly bounded automata, various restricted counter automata, etc. Undoubtedly, these sub-Turing models have enhanced our understanding of the computing power of Turing machines and have provided important insights into the analysis and complexity of many problems in various areas of computer science. We believe that studying the computing power of P systems would lend itself to the discovery of new results if a similar methodology is followed. Indeed, much research work has shown that P systems and their many variants are universal (i.e., equivalent to Turing machines) [4,16,17,3,6,8,19] (surveys are found in [12,18]). However, there is little work addressing the sub-Turing computing power of restricted P systems. To this end, we present some new results in this paper, specifically focusing on catalytic P systems. A P system S consists of a finite number of membranes, each of which contains a multiset of objects (symbols). The membranes are organized as a Venn diagram or a tree structure where one membrane may contain zero or more membranes. The dynamics of S is governed by a set of rules associated with each membrane. Each rule specifies how objects evolve and move into neighboring membranes. The rule set can also be associated with priority: a lower priority rule does not apply if one with a higher priority is applicable. A precise definition of S can be found in [16]. Since, by a recent result in [19], P systems with one membrane (i.e., 1-region P systems) and without priority are already able to simulate two-counter machines and hence are universal [14], for the purposes of this paper we focus on catalytic 1-region P systems, or simply catalytic systems (CS's) [16,19]. A CS S operates on two types of symbols: catalytic symbols called catalysts (denoted by capital letters C, D, etc.) and noncatalytic symbols called noncatalysts (denoted by lower case letters a, b, c, d, etc.). An evolution rule in S is of the form Ca → Cv, where C is a catalyst, a is a noncatalyst, and v is a (possibly null) string (an obvious representation of a multiset) of noncatalysts. A CS S is specified by a finite set of rules together with an initial multiset (configuration) w0, which is a string of catalysts and noncatalysts. As with the standard semantics of P systems [16], each evolution step of S is a result of applying all the rules in S in a maximally parallel manner. More precisely, starting from the initial configuration w0, the system goes through a sequence of configurations, where each configuration is derived from the directly preceding configuration in one step by the application of a subset of rules, which are chosen nondeterministically. Note that a rule Ca → Cv is applicable if there is a C and an a in the preceding configuration.
The result of applying this rule is the replacement of a by v. If there is another occurrence of C and another occurrence of a, then the same rule or another rule with Ca on the left-hand side can be applied. We require that the chosen subset of rules to apply must be
maximally parallel in the sense that no other applicable rule can be added to the subset. A configuration w is reachable if it appears in some execution sequence; w is halting if none of the rules is applicable. The set of all reachable configurations is denoted by R(S). The set of all halting reachable configurations (which is a subset of R(S)) is denoted by Rh(S). We show that CS's whose initial configuration contains only one catalyst are equivalent to communication-free Petri nets, which are also equivalent to commutative context-free grammars [5,11]. They define precisely the semilinear sets. Hence R(S) and Rh(S) are semilinear. This partially answers an open problem in [19], where it was shown that when the initial configuration contains six catalysts, S is universal, and [19] raised the question of the optimal number of catalysts for universality. Our result shows that one catalyst is not enough. We also study an extended model where the rules are of the form q : (p, Ca → Cv) (where q and p are states), i.e., the application of the rules is guided by a finite-state control. For this generalized model, systems with one catalyst in their initial configuration, as well as systems with multiple catalysts in their initial configuration but with some restriction, correspond to vector addition systems. We conclude this section by recalling the definitions of semilinear sets and Parikh maps [15]. Let N be the set of nonnegative integers and k a positive integer. A set S ⊆ N^k is a linear set if there exist vectors v0, v1, . . . , vt in N^k such that S = {v | v = v0 + a1v1 + . . . + atvt, ai ∈ N}. The vectors v0 (referred to as the constant vector) and v1, v2, . . . , vt (referred to as the periods) are called the generators of the linear set S. A set S ⊆ N^k is semilinear if it is a finite union of linear sets. The empty set is a trivial (semi)linear set, where the set of generators is empty. Every finite subset of N^k is semilinear: it is a finite union of linear sets whose generators are constant vectors. Clearly, semilinear sets are closed under union and projection. It is also known that semilinear sets are closed under intersection and complementation. Let Σ = {a1, a2, . . . , an} be an alphabet. For each string w in Σ∗, define the Parikh map of w to be ψ(w) = (|w|a1, |w|a2, . . . , |w|an), where |w|ai is the number of occurrences of ai in w. For a language (set of strings) L ⊆ Σ∗, the Parikh map of L is ψ(L) = {ψ(w) | w ∈ L}.
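The maximally parallel semantics just described is easy to prototype for purely catalytic systems. The following Python sketch is our own illustration (configurations are Parikh-style multisets of noncatalysts, and names are ours); it enumerates all maximally parallel successors of a configuration:

from collections import Counter
from itertools import product

def parallel_successors(config, catalysts, rules):
    # `config` and `catalysts` are Counters; `rules` lists (C, a, v) for
    # Ca -> Cv, with v a (possibly empty) string of noncatalysts.
    slots = list(catalysts.elements())   # one slot per catalyst instance
    choices = [[None] + [i for i, (C, a, v) in enumerate(rules) if C == s]
               for s in slots]
    results = set()
    for pick in product(*choices):
        used = Counter(rules[i][1] for i in pick if i is not None)
        if any(used[a] > config[a] for a in used):
            continue                     # not enough copies of some noncatalyst
        left = config - used             # objects consumed by no rule
        # maximality: an idle catalyst instance must have no applicable rule
        if any(i is None and any(C == s and left[a] > 0 for (C, a, v) in rules)
               for s, i in zip(slots, pick)):
            continue
        new = left.copy()
        for i in pick:
            if i is not None:
                new.update(rules[i][2])
        results.add(frozenset(new.items()))
    return [Counter(dict(fs)) for fs in results]

rules = [("C", "a", "bb"), ("C", "b", "a")]
succ = parallel_successors(Counter("ab"), Counter("C"), rules)
# With one C and objects {a, b}, exactly one rule fires per step, so the
# successors are {b,b,b} (via Ca -> Cbb) and {a,a} (via Cb -> Ca).
assert sorted(sorted(s.elements()) for s in succ) == [['a', 'a'], ['b', 'b', 'b']]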
2 1-Region Catalytic Systems
In this section, we study 1-region membrane computing systems which use only rules of the form Ca → Cv, where C is a catalyst, a is a noncatalyst, and v is a (possibly null) string of noncatalysts. Note that we do not allow rules of the form a → v as in a P System. Thus, we could think of these systems as “purely” catalytic. As defined earlier, we denote such a system by CS. Let S be a CS and w be an initial configuration (string) representing a multiset of catalysts and noncatalysts. A configuration x is a reachable configuration if S can reach x starting from the initial configuration w. Call x a halting configuration if no rule is applicable on x. Unless otherwise specified, “reachable configuration” will mean any reachable configuration, halting or not. Note that a non-halting reachable configuration x is an intermediate configuration in a possibly infinite computation. We denote by R(S)
the set of Parikh maps of reachable configurations with respect to noncatalysts only. Since catalysts do not change in a computation, we do not include them in the Parikh map. Also, for convenience, when we talk about configurations, we sometimes do not include the catalysts. R(S) is called the reachability set of S. Rh(S) will denote the set of all halting reachable configurations.

2.1 The Initial Configuration Has Only One Catalyst

In this subsection, we assume that the initial configuration of the CS has only one catalyst C. A noncatalyst a is evolutionary if there is a rule in the system of the form Ca → Cv; otherwise, a is non-evolutionary. Call a CS simple if each rule Ca → Cv has at most one evolutionary noncatalyst in v. Our first result shows that semilinear sets and simple CS's are intimately related. Theorem 1. 1. Let Q ⊆ N^k. If Q is semilinear, then there is a simple CS S such that Q is definable by S, i.e., Q is the projection of Rh(S) on k coordinates. 2. Let S be a simple CS. Then Rh(S) and R(S) are semilinear. Later, in Section 4, we will see that, in fact, the above theorem holds for any CS whose initial configuration has only one catalyst. Suppose that we extend the model of a CS so that the rules are now of the form q : (p, Ca → Cv), i.e., the application of the rules is guided by a finite-state control. The rule means that if the system is in state q, application of Ca → Cv will land the system in state p. We call this system a CS with states, or CSS. In addition, we allow the rules to be prioritized, i.e., there is a partial order on the rules: a rule r of lower priority than r′ cannot be applied if r′ is applicable. We refer to such a system as a CSSP. For both systems, the computation starts at (q0, w), where q0 is a designated start state, and w is the initial configuration consisting of catalyst C and noncatalysts. In Section 4, we will see that a CSS can define only a recursive set of tuples. In contrast, the following result shows that a CSSP can simulate a Turing machine. Theorem 2. Let S be a CSSP with one catalyst and two noncatalysts. Then S can simulate a Turing machine. Directly from Theorem 2, we have: Corollary 1. Let S be a CSSP with one catalyst and two noncatalysts. Then R(S) ⊆ N^2 need not be a semilinear set. We will see later that, in contrast to the above result, when the rules are not prioritized, i.e., we have a CSS S with one catalyst and two noncatalysts, R(S) is semilinear.

2.2 The Initial Configuration Has Multiple Catalysts

In this subsection, we assume that the initial configuration of the CS can have multiple catalysts.
In general, we say that a noncatalyst is k-bounded if it appears at most k times in any reachable configuration. It is bounded if it is k-bounded for some k. Consider a CSSP whose initial configuration has multiple catalysts. Assume that, except for one noncatalyst, all other noncatalysts are bounded or make at most r (for some fixed r) alternations between nondecreasing and nonincreasing multiplicity in any computation. Call this a reversal-bounded CSSP. Corollary 2. If S is a reversal-bounded CSSP, then Rh(S) and R(S) are semilinear. Without the reversal-bounded restriction, a CSSP can simulate a TM. In fact, a CS (with multiple catalysts in its initial configuration) can simulate a TM. It was shown in [19] that a CS augmented with noncooperating rules of the form a → v, where a is a noncatalyst and v is a (possibly null) string of noncatalysts, is universal in the sense that such an augmented system with 6 catalysts can define any recursively enumerable set of tuples. A close analysis of the proof in [19] shows that all the rules can be made purely catalytic (i.e., of the form Ca → Cv) using at most 8 catalysts. Actually, this number 8 can be improved further using the newest results in [7]: Corollary 3. A CS with 7 catalysts can define any recursively enumerable set of tuples. There is another restriction on a CSSP S that makes it define only a semilinear set. Let T be a sequence of configurations corresponding to some computation of S starting from a given initial configuration w (which contains multiple catalysts). A noncatalyst a is positive on T if the following holds: if a occurs in the initial configuration, or does not occur in the initial configuration but later appears as a result of some catalytic rule, then the number of occurrences (multiplicity) of a in any configuration after the first time it appears is at least 1. (There is no bound on the number of times the multiplicity of a alternates between nondecreasing and nonincreasing, as long as it stays at least 1.) We say that a is negative on T if it is not positive on T, i.e., the number of occurrences of a in configurations in T can be zero. A sequence T of configurations for which every noncatalyst is bounded or positive is called a positive computation. Corollary 4. Any semilinear set is definable by a CSSP where every computation path is positive. Conversely, we have: Corollary 5. Let S be a CSSP. Suppose that every computation path of S is positive. Then Rh(S) and R(S) are semilinear. The previous corollary can be strengthened further. Corollary 6. Let S be a CSSP. Suppose we allow one (and only one) noncatalyst, say a, to be negative. This means that a configuration with a positive occurrence (multiplicity) of a can lead to a configuration with no occurrence of a. Suppose that every computation path of S is positive, except for a. Then Rh(S) and R(S) are semilinear.
3 Characterizations in Terms of Vector Addition Systems
An n-dimensional vector addition system (VAS) is a pair G = ⟨x, W⟩, where x ∈ N^n is called the start point (or start vector) and W is a finite set of vectors in Z^n, where Z is the set of all integers (positive, negative, zero). The reachability set of the VAS ⟨x, W⟩ is the set R(G) = {z | for some j, z = x + v1 + ... + vj, where, for all 1 ≤ i ≤ j, each vi ∈ W and x + v1 + ... + vi ≥ 0}. The halting reachability set is Rh(G) = {z | z ∈ R(G), z + v ≱ 0 for every v in W}. An n-dimensional vector addition system with states (VASS) is a VAS ⟨x, W⟩ together with a finite set T of transitions of the form p → (q, v), where q and p are states and v is in W. The meaning is that such a transition can be applied at point y in state p and yields the point y + v in state q, provided that y + v ≥ 0. The VASS is specified by G = ⟨x, T, p0⟩, where p0 is the starting state. The reachability problem for a VASS (respectively, VAS) G is to determine, given a vector y, whether y is in R(G). The equivalence problem is to determine, given two VASS (respectively, VAS) G and G′, whether R(G) = R(G′). Similarly, one can define the reachability problem and equivalence problem for halting configurations. We summarize the following known results concerning VAS and VASS [20,9,1,10,13]: Theorem 3. 1. Let G be an n-dimensional VASS. We can effectively construct an (n + 3)-dimensional VAS G′ that simulates G. 2. If G is a 2-dimensional VASS, then R(G) is an effectively computable semilinear set. 3. There is a 3-dimensional VASS G such that R(G) is not semilinear. 4. If G is a 5-dimensional VAS, then R(G) is an effectively computable semilinear set. 5. There is a 6-dimensional VAS G such that R(G) is not semilinear. 6. The reachability problem for VASS (and hence also for VAS) is decidable. 7. The equivalence problem for VAS (and hence also for VASS) is undecidable. Clearly, it follows from part 6 of the theorem above that the halting reachability problem for VASS (respectively, VAS) is decidable.

3.1 The Initial Configuration Has Only One Catalyst

We first consider CSS (i.e., CS with states) whose initial configuration has only one catalyst. There is an example of a 3-dimensional VASS G in [10] such that R(G) is not semilinear: G = ⟨x, T, p⟩, where x = (0, 0, 1), and the transitions in T are:
p → (p, (0, 1, −1)), p → (q, (0, 0, 0)), q → (q, (0, −1, 2)), q → (p, (1, 0, 0)).
Thus, there are only two states p and q. The following was shown in [10]: 1. (x1, x2, x3) is reachable in state p if and only if 0 < x2 + x3 ≤ 2^{x1}. 2. (x1, x2, x3) is reachable in state q if and only if 0 < 2x2 + x3 ≤ 2^{x1+1}. Hence R(G) is not semilinear. From this example, we can show: Corollary 7. There is a CSS S with 1 catalyst, 3 noncatalysts, and two states such that R(S) is not semilinear.
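This example is small enough to explore mechanically. The sketch below (ours) performs a bounded breadth-first search of the VASS above; the cap on coordinates keeps the search finite, so it only inspects a window of the true reachability set:

from collections import deque

def vass_explore(start, T, bound):
    # T maps a state to (next_state, vector) pairs; points stay nonnegative.
    seen, queue = {start}, deque([start])
    while queue:
        state, point = queue.popleft()
        for nxt, vec in T[state]:
            p = tuple(a + b for a, b in zip(point, vec))
            if all(0 <= x <= bound for x in p) and (nxt, p) not in seen:
                seen.add((nxt, p))
                queue.append((nxt, p))
    return seen

T = {'p': [('p', (0, 1, -1)), ('q', (0, 0, 0))],
     'q': [('q', (0, -1, 2)), ('p', (1, 0, 0))]}
reach = vass_explore(('p', (0, 0, 1)), T, bound=16)
# e.g. (x1, x2, x3) = (2, 0, 4) is reachable in state p, and indeed
# 0 < x2 + x3 = 4 <= 2**x1 = 4.
assert ('p', (2, 0, 4)) in reach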
In fact, as shown below, each CSS corresponds to a VASS and vice versa. Lemma 1. 1. Let S be a CSS. We can effectively construct a VASS G such that R(G) = R(S). 2. Every VASS can be simulated by a CSS. From Theorem 3 part 6, we have: Corollary 8. The reachability problem for CSS is decidable. Clearly, a reachable configuration is halting if no rule is applicable on the configuration. It follows from the above result that the halting reachability problem (i.e., determining if a configuration is in Rh(S)) is also decidable. A VASS is communication-free if for each transition q → (p, (j1, ..., jk)) in the VASS, at most one ji is negative, and if negative, its value is −1. From Lemma 1 and the observation that the VASS constructed for the proof of Lemma 1 can be made communication-free, we have: Theorem 4. The following systems are equivalent in the sense that each system can simulate the others: CSS, VASS, communication-free VASS. Now consider a communication-free VASS without states, i.e., a VAS where in every transition at most one component is negative, and if negative, its value is −1. Call this a communication-free VAS. Communication-free VAS's are equivalent to communication-free Petri nets, which are also equivalent to commutative context-free grammars [5,11]. It is known that they have effectively computable semilinear reachability sets [5]. It turns out that communication-free VAS's characterize CS's. Theorem 5. Every communication-free VAS G can be simulated by a CS, and vice versa. Corollary 9. If S is a CS, then R(S) and Rh(S) are effectively computable semilinear sets. The following is obvious, as we can easily construct a VAS from the specification of the linear set. Corollary 10. If Q is a linear set, then we can effectively construct a communication-free VAS G such that R(G) = Q. Hence, every semilinear set is a union of the reachability sets of communication-free VAS's. From the NP-completeness of the reachability problem for communication-free Petri nets (which are equivalent to commutative context-free grammars) [11,5], we have: Corollary 11. The reachability problem for CS is NP-complete. We have already seen that a CSS S with prioritized rules (CSSP) and with two noncatalysts can simulate a TM (Theorem 2); hence R(S) need not be semilinear. Interestingly, if we drop the requirement that the rules are prioritized, such a system has a semilinear reachability set. Corollary 12. Let S be a CSS with two noncatalysts. Then R(S) and Rh(S) are effectively computable semilinear sets. Open Problem: Suppose S is a system with only rules of the form Ca → Cv whose initial configuration has exactly one catalyst, and suppose the rules are prioritized. How is R(S) related to VASS?
3.2 The Initial Configuration Has Multiple Catalysts

We have seen that a CS with multiple catalysts can simulate a TM. Consider the following restricted version: instead of "maximal parallelism" in the application of the rules at each step of the computation, we only allow "limited parallelism" by organizing the rules to apply in one step in the following form (called a matrix rule): (D1b1 → D1v1, ..., Dsbs → Dsvs), where the Di's are catalysts (need not be distinct), the bi's are noncatalysts (need not be distinct), the vi's are strings of noncatalysts (need not be distinct), and s is the degree of the matrix. The matrix rules in a given system may have different degrees. The meaning of a matrix rule is that it is applicable if and only if each component of the matrix is applicable. The system halts if no matrix rule is applicable. Call this system a matrix CS, or MCS for short. We shall also consider MCS with states (called MCSS), where now the matrix rules have states and are of the form p : (q, (D1b1 → D1v1, ..., Dsbs → Dsvs)). Now the matrix is applicable if the system is in state p and all the matrix components are applicable. After the application of the matrix, the system enters state q. Lemma 2. Given a VAS (VASS) G, we can effectively construct an MCS (MCSS) S such that R(S) = R(G) × {1}. Lemma 3. Given an MCSS S over n noncatalysts, we can effectively construct an (n + 1)-dimensional VASS G such that R(S) = projn(R(G) ∩ (N^n × {1})). The VASS in Lemma 3 can be converted to a VAS. It was shown in [10] that if G is an n-dimensional VASS with states q1, ..., qk, then we can construct an (n + 3)-dimensional VAS G′ with the following property: if the VASS G is at (i1, ..., in) in state qj, then the VAS G′ will be at (i1, ..., in, aj, bj, 0), where aj = j for j = 1 to k, bk = k + 1, and bj = bj+1 + k + 1 for j = 1 to k − 1. The last three coordinates keep track of the state changes, and G′ has additional transitions for updating these coordinates. However, these additional transitions only modify the last three coordinates. Define the finite set of tuples Fk = {(j, (k − j + 1)(k + 1)) | j = 1, ..., k} (note that k is the number of states of G). Then we have: Corollary 13. Given an MCSS S over n noncatalysts, we can effectively construct an (n + 4)-dimensional VAS G′ such that R(S) = projn(R(G′) ∩ (N^n × {1} × Fk × {0})), for some effectively computable k (which depends only on the number of states and number of rules in G). From Theorem 4, Lemmas 2 and 3, and the above corollary, we have: Theorem 6. The following systems are equivalent in the sense that each system can simulate the others: CSS, MCS, MCSS, VAS, VASS, communication-free VASS. Corollary 14. It is decidable to determine, given an MCSS S and a configuration α, whether α is a reachable configuration (halting or not).
Corollary 15. It is decidable to determine, given an MCSS S and a configuration α, whether α is a halting reachable configuration. From Lemma 2 and Theorem 3 part 7, we have: Corollary 16. The equivalence and containment problems for MCSS are undecidable.
4 Closure Properties
Let S be a catalytic system of any type introduced in the previous sections. For the purposes of investigating closure properties, we will say that S defines a set Q ⊆ N^k (or Q is definable by S) if Rh(S) = Q × {0}^r for some given r. Thus, the last r coordinates of the (k + r)-tuples in Rh(S) are zero, and the first k components are exactly the tuples in Q. Fix the noncatalysts to be a1, a2, a3, . . .; thus, any system S has noncatalysts a1, ..., at for some t. We say that a class of catalytic systems of a given type is closed under:
1. Intersection if, given two systems S1 and S2 which define sets Q1 ⊆ N^k and Q2 ⊆ N^k, respectively, there exists a system S′ which defines Q′ = Q1 ∩ Q2.
2. Union if, given two systems S1 and S2 which define sets Q1 ⊆ N^k and Q2 ⊆ N^k, respectively, there exists a system S′ which defines Q′ = Q1 ∪ Q2.
3. Complementation if, given a system S which defines a set Q ⊆ N^k, there exists a system S′ which defines Q′ = N^k − Q.
4. Concatenation if, given two systems S1 and S2 which define sets Q1 ⊆ N^k and Q2 ⊆ N^k, respectively, there exists a system S′ which defines Q′ = Q1Q2, where Q1Q2 = {(i1 + j1, ..., ik + jk) | (i1, ..., ik) ∈ Q1, (j1, ..., jk) ∈ Q2}.
5. Kleene + if, given a system S which defines a set Q ⊆ N^k, there exists a system S′ which defines Q′ = ⋃_{n≥1} Q^n.
6. Kleene ∗ if, given a system S which defines a set Q ⊆ N^k, there exists a system S′ which defines Q′ = ⋃_{n≥0} Q^n.
Other unary and binary operations can be defined similarly. Theorem 7. The class CS with only one catalyst in the initial configuration is closed under intersection, union, complementation, concatenation, and Kleene + (or Kleene ∗). Investigation of the closure properties of other types of catalytic systems is a subject for future research.
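Concatenation and truncations of the Kleene operations are straightforward to compute for finite sets of tuples; a small Python sketch of the operations as defined above (ours):

def concat(Q1, Q2):
    # Q1 Q2 = {(i1 + j1, ..., ik + jk) | (i1,...,ik) in Q1, (j1,...,jk) in Q2}
    return {tuple(a + b for a, b in zip(x, y)) for x in Q1 for y in Q2}

def kleene_plus(Q, max_n):
    # Union of Q^n for 1 <= n <= max_n; the true Kleene + is the union over
    # all n >= 1, truncated here for illustration.
    power, result = set(Q), set(Q)
    for _ in range(max_n - 1):
        power = concat(power, Q)
        result |= power
    return result

Q = {(1, 0), (0, 2)}
assert concat(Q, Q) == {(2, 0), (1, 2), (0, 4)}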
Acknowledgment We would like to thank Dung Huynh and Hsu-Chun Yen for their comments and for pointing out some of the references concerning vector addition systems. We also appreciate the comments and encouragement of Gheorghe Paun and Petr Sosik on this work.
References 1. H. G. Baker. Rabin’s proof of the undecidability of the reachability set inclusion problem for vector addition systems. In C.S.C. Memo 79, Project MAC, MIT, 1973. 2. G. Berry and G. Boudol. The chemical abstract machine. In POPL’90, pages 81–94. ACM Press, 1990. 3. P. Bottoni, C. Martin-Vide, Gh. Paun, and G. Rozenberg. Membrane systems with promoters/inhibitors. Acta Informatica, 38(10):695–720, 2002. 4. J. Dassow and Gh. Paun. On the power of membrane computing. Journal of Universal Computer Science, 5(2):33–49, 1999. 5. J. Esparza. Petri nets, commutative context-free grammars, and basic parallel processes. In FCT’95, volume 965 of LNCS, pages 221–232. Springer, 1995. 6. R. Freund and M. Oswald. P Systems with activated/prohibited membrane channels. In WMC-CdeA’02, volume 2597 of LNCS, pages 261–269. Springer, 2003. 7. R. Freund, M. Oswald, and P. Sosik. Reducing the number of catalysts needed in computationally universal P systems without priorities. In the 5th Descriptional Complexity of Formal Systems Workshop (DFCS), July 12-14, 2003, Budapest, Hungary. 8. P. Frisco and H. Jan Hoogeboom. Simulating counter automata by P Systems with symport/antiport. In WMC-CdeA’02, volume 2597 of LNCS, pages 288–301. Springer, 2003. 9. M. H. Hack. The equality problem for vector addition systems is undecidable. In C.S.C. Memo 121, Project MAC, MIT, 1975. 10. J. Hopcroft and J.-J. Pansiot. On the reachability problem for 5-dimensional vector addition systems. TCS, 8(2):135–159, 1979. 11. D.T. Huynh. Commutative grammars: The complexity of uniform word problems. Information and Control, 57:21–39, 1983. 12. C. Martin-Vide and Gh. Paun. Computing with membranes (P Systems): Universality results. In MCU, volume 2055 of LNCS, pages 82–101. Springer, 2001. 13. E. Mayr. Persistence of vector replacement systems is decidable. Acta Informatica, 15:309– 318, 1981. 14. M. Minsky. Recursive unsolvability of Post’s problem of Tag and other topics in the theory of Turing machines. Ann. of Math., 74:437–455, 1961. 15. R. Parikh. On context-free languages. Journal of the ACM, 13:570–581, 1966. 16. Gh. Paun. Computing with membranes. JCSS, 61(1):108–143, 2000. 17. Gh. Paun. Computing with membranes (P Systems): A variant. International Journal of Foundations of Computer Science, 11(1):167–181, 2000. 18. Gh. Paun and G. Rozenberg. A guide to membrane computing. TCS, 287(1):73–100, 2002. 19. P. Sosik and R. Freund. P Systems without priorities are computationally universal. In WMC-CdeA’02, volume 2597 of LNCS, pages 400–409. Springer, 2003. 20. J. van Leeuwen. A partial solution to the reachability problem for vector addition systems. In STOC’74, pages 303–309.
Augmenting Local Edge-Connectivity between Vertices and Vertex Subsets in Undirected Graphs

Toshimasa Ishii and Masayuki Hagiwara

Department of Information and Computer Sciences, Toyohashi University of Technology, Aichi 441-8580, Japan
{ishii,masa}@algo.ics.tut.ac.jp
Abstract. Given an undirected multigraph G = (V, E), a family W of sets W ⊆ V of vertices (areas), and a requirement function rW : W → Z+ (where Z+ is the set of positive integers), we consider the problem of augmenting G by the smallest number of new edges so that the resulting graph has at least rW(W) edge-disjoint paths between v and W for every pair of a vertex v ∈ V and an area W ∈ W. So far this problem was shown to be NP-hard in the uniform case of rW(W) = 1 for each W ∈ W, and polynomially solvable in the uniform case of rW(W) = r ≥ 2 for each W ∈ W. In this paper, we show that the problem can be solved in O(m + pr∗n^5 log(n/r∗)) time, even in the general case of rW(W) ≥ 3 for each W ∈ W, where n = |V|, m = |{{u, v}|(u, v) ∈ E}|, p = |W|, and r∗ = max{rW(W) | W ∈ W}. Moreover, we give an approximation algorithm which finds a solution with at most one surplus edge over the optimal value, in the same time complexity, in the general case of rW(W) ≥ 2 for each W ∈ W.
1 Introduction
In a communication network, graph connectivity is a fundamental measure of its robustness. The problem of achieving a high connectivity between every (or specified) pair of vertices has been extensively studied as the network design problem (see [2,12] for surveys). Most of those studies have dealt with the connectivity between two vertices in a graph. However, in many real-world networks, the connectivity between every two vertices is not necessarily required. For example, in a multimedia network, for a set W of vertices offering a certain service i, such as mirror servers, a user at a vertex v can use service i by communicating with one vertex w ∈ W through a path between w and v. In such networks, it is desirable that the network have some pairwise disjoint paths from the vertex v to at least one of the vertices in W. This means that the measure of reliability is the connectivity between a vertex and a set of vertices rather than that between two vertices. From this point of view, H. Ito et al. considered the node-to-area connectivity (NA-connectivity, for
short) as a concept that represents the connectivity between vertices and sets of vertices (areas) in a graph [5,6,7]. In this paper, given a multigraph G = (V, E) with a family W of sets W of vertices (areas) and a requirement function rW : W → Z+, we consider the problem of augmenting G by adding the smallest number of new edges so that the resulting graph has at least rW(W) pairwise edge-disjoint paths between v and W for every pair of a vertex v ∈ V and an area W ∈ W. We call this problem the rW-NA-edge-connectivity augmentation problem (rW-NA-ECAP for short). Figure 1 gives an instance of rW-NA-ECAP with rW(W1) = 2, rW(W2) = 3, and rW(W3) = 4.

Fig. 1. Illustration of an instance of rW-NA-ECAP. (i) An initial graph G = (V, E) with a family W = {W1 = {v4, v7, v11}, W2 = {v1, v8, v9}, W3 = {v1, v2, v10}} of areas, where the requirement function rW : W → Z+ satisfies rW(W1) = 2, rW(W2) = 3, and rW(W3) = 4. (ii) An rW-NA-edge-connected graph obtained from G by adding the set of edges drawn as broken lines; there are at least rW(W) edge-disjoint paths between every pair of a vertex v ∈ V and an area W ∈ W.

So far, r-NA-ECAP in the uniform case that rW(W) = r holds for every area W ∈ W has been studied, and several algorithms have been developed. It was shown by H. Miwa et al. [9] that 1-NA-ECAP is NP-hard, whereas r-NA-ECAP is polynomially solvable in the case of r = 2 by H. Miwa et al. [9], and in the case of r ≥ 3 by T. Ishii et al. [4]. However, it was still open whether the problem with general requirements rW(W) ≥ 2, W ∈ W, is polynomially solvable or not. The above two algorithms for r-NA-ECAP are based on algorithms for solving the classical edge-connectivity augmentation problem, which augments the edge-connectivity of a graph, but they are essentially different: the former follows the method based on the minimum cut structure by T. Watanabe et al. [13], and the latter follows the so-called 'splitting off' method by A. Frank [1]. In this paper, by extending the approach in [4] and establishing a min-max formula for rW-NA-ECAP, we show that rW-NA-ECAP with general requirements rW(W) ≥ 3 for each W ∈ W can be solved in O(m + pr∗n^5 log(n/r∗)) time, where n = |V|, m = |{{u, v}|(u, v) ∈ E}|, p = |W|, and r∗ = max{rW(W) | W ∈ W}. We also give an approximation algorithm for rW-NA-ECAP with general requirements rW(W) ≥ 2, W ∈ W, which delivers a solution with at most one edge over the optimum, in the same time complexity. Some of the proofs are omitted from this extended abstract.
2 Problem Definition
Let G = (V, E) stand for an undirected graph with a set V of vertices and a set E of edges. An edge with end vertices u and v is denoted by (u, v). We denote |V| by n and |{{u, v}|(u, v) ∈ E}| by m. A singleton set {x} may be simply written as x, and "⊂" implies proper inclusion while "⊆" means "⊂" or "=". In G = (V, E), the vertex set V and edge set E may be denoted by V(G) and E(G), respectively. For a subset V′ ⊆ V in G, G[V′] denotes the subgraph induced by V′. For an edge set E′ with E ∩ E′ = ∅, we denote the augmented graph (V, E ∪ E′) by G + E′. For an edge set E′, we denote by V[E′] the set of all end vertices of edges in E′. An area graph is defined as a graph G = (V, E) with a family W of vertex subsets W ⊆ V which are called areas (see Figure 1). We denote an area graph G with W by (G, W). In the sequel, we may denote (G, W) simply by G if no confusion arises. For two disjoint subsets X, Y ⊂ V of vertices, we denote by EG(X, Y) the set of edges e = (x, y) such that x ∈ X and y ∈ Y, and we denote |EG(X, Y)| by dG(X, Y). A cut is defined as a subset X of V with ∅ ≠ X ≠ V, and the size of a cut X is defined by dG(X, V − X), which may also be written as dG(X). Moreover, we define dG(∅) = 0. For two cuts X, Y ⊂ V with X ∩ Y = ∅ in G, we denote by λG(X, Y) the minimum size of cuts which separate X and Y, i.e., λG(X, Y) = min{dG(S) | S ⊇ X, S ⊆ V − Y}. For two cuts X, Y ⊂ V with X ∩ Y ≠ ∅ in G, we define λG(X, Y) = ∞. The edge-connectivity of G, denoted by λ(G), is defined as min_{X⊂V, Y⊂V} λG(X, Y). For a vertex v ∈ V and a set W ⊆ V of vertices, the node-to-area edge-connectivity (NA-edge-connectivity, for short) between v and W is defined as λG(v, W). Note that λG(v, W) = ∞ holds for v ∈ W. Also note that, by Menger's theorem, λG(v, W) ≥ r holds if and only if there exist at least r edge-disjoint paths between v and W. For an area graph (G, W) and a function rW : W → Z+ ∪ {0}, we say that (G, W) is rW-NA-edge-connected if λG(v, W) ≥ rW(W) holds for every pair of a vertex v ∈ V and an area W ∈ W. Note that the area graph (G, W) in Figure 1(ii) is rW-NA-edge-connected, where rW(W1) = 2, rW(W2) = 3, and rW(W3) = 4. In this paper, we consider the following problem, called rW-NA-ECAP.
Problem 1. (rW-NA-edge-connectivity augmentation problem, rW-NA-ECAP)
Input: An area graph (G, W) and a requirement function rW : W → Z+.
Output: A set E∗ of new edges with the minimum cardinality such that G + E∗ is rW-NA-edge-connected.
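By Menger's theorem, λG(v, W) can be computed as a maximum flow from v to the area W contracted into a single sink. A self-contained Python sketch (ours; each edge has unit capacity, and parallel edges are allowed):

from collections import defaultdict, deque

def na_edge_connectivity(edges, v, W):
    if v in W:
        return float('inf')              # lambda_G(v, W) = infinity for v in W
    cap = defaultdict(int)
    node = lambda x: 't' if x in W else x
    for a, b in edges:                   # contract W into the sink 't'
        a, b = node(a), node(b)
        if a != b:
            cap[(a, b)] += 1
            cap[(b, a)] += 1
    flow = 0
    while True:
        parent, queue = {v: None}, deque([v])
        while queue and 't' not in parent:        # BFS for an augmenting path
            x = queue.popleft()
            for (a, b) in list(cap):
                if a == x and cap[(a, b)] > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if 't' not in parent:
            return flow
        x = 't'
        while parent[x] is not None:              # push one unit of flow
            cap[(parent[x], x)] -= 1
            cap[(x, parent[x])] += 1
            x = parent[x]
        flow += 1

G = [(0, 1), (0, 1), (1, 2), (0, 2)]
assert na_edge_connectivity(G, 0, {2}) == 2       # paths 0-2 and 0-1-2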
3 Lower Bound on the Optimal Value
For an area graph (G, W) and a fixed function rW : W → Z+, let opt(G, W, rW) denote the optimal value of rW-NA-ECAP in (G, W), i.e., the minimum size |E∗| of a set E∗ of new edges such that G + E∗ is rW-NA-edge-connected. In this section, we derive lower bounds on opt(G, W, rW) for rW-NA-ECAP with (G, W). In the sequel, let W = {W1, W2, . . . , Wp} and rW(Wi) = ri with r1 ≤ r2 ≤ · · · ≤ rp, if no confusion occurs.
A family X = {X1, . . . , Xt} of cuts in G is called a subpartition of V if every two distinct cuts Xi, Xj ∈ X satisfy Xi ∩ Xj = ∅ and ∪_{X∈X} X ⊆ V holds. For an area graph (G, W) and an area Wi ∈ W, a cut X with X ∩ Wi = ∅ is called type (Ai), and a cut X with X ⊇ Wi is called type (Bi) (note that a cut X of type (Bi) satisfies X ≠ V by the definition of a cut). We easily see the following property.

Lemma 1. An area graph (G, W) is rW-NA-edge-connected if and only if all cuts X ⊂ V of type (Ai) or (Bi) satisfy dG(X) ≥ ri for each area Wi ∈ W.
Let X be a cut in (G, W). If X is a cut of type (Ai) or (Bi) with dG(X) < ri for some area Wi ∈ W, then it is necessary to add at least ri − dG(X) edges between X and V − X. This follows since, if X is of type (Ai) (resp., type (Bi)), then the NA-edge-connectivity between a vertex in X (resp., in V − X) and an area Wi ∈ W with Wi ∩ X = ∅ (resp., Wi ⊆ X) needs to be augmented to at least ri. Here we define αG,W,rW(X) as follows; it indicates the number of edges that necessarily join a vertex in X to a vertex in V − X (note that r1 ≤ r2 ≤ · · · ≤ rp holds).

Definition 1. For each cut X of type (Aj) or (Bj) for some area Wj, we define iX as the maximum index i such that X is of type (Ai) or (Bi), and define αG,W,rW(X) = max{0, r_{iX} − dG(X)}. For any other cut X, we define αG,W,rW(X) = 0.

Lemma 2. For each cut X, it is necessary to add at least αG,W,rW(X) edges between X and V − X.
Let

α(G, W, rW) = max_X Σ_{X∈X} αG,W,rW(X),    (1)
where the maximization is taken over all subpartitions X of V. Then any feasible solution to rW-NA-ECAP with (G, W) must contain, for every cut X with αG,W,rW(X) > 0, an edge joining a vertex in X and a vertex in V − X. Since adding one edge can contribute to at most two ‘cut deficiencies’ in a subpartition of V, we obtain the following lemma.

Lemma 3. opt(G, W, rW) ≥ ⌈α(G, W, rW)/2⌉ holds.
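The quantities of Lemma 3 are straightforward to evaluate for any concrete subpartition. The sketch below is our own illustration of Definition 1 and Lemma 3 (d_G is assumed to be a cut-size oracle, and areas, reqs hold the areas Wi and requirements ri in nondecreasing order); for the subpartition of Figure 1(i) quoted next it would return ⌈8/2⌉ = 4.

def alpha(d_G, X, areas, reqs):
    # alpha_{G,W,rW}(X) from Definition 1: the largest requirement r_i over
    # all i such that X is of type (A_i) or (B_i), minus the cut size d_G(X)
    best = 0
    for W_i, r_i in zip(areas, reqs):
        if X.isdisjoint(W_i) or W_i <= X:    # type (A_i) or type (B_i)
            best = max(best, r_i)
    return max(0, best - d_G(X))

def lemma3_bound(d_G, subpartition, areas, reqs):
    # the lower bound ceil(sum of deficiencies / 2) of Lemma 3,
    # evaluated for one particular subpartition of V
    total = sum(alpha(d_G, X, areas, reqs) for X in subpartition)
    return (total + 1) // 2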
The area graph (G, W) in Figure 1(i) satisfies α(G, W, rW) = 8: we have Σ_{X∈X} αG,W,rW(X) = 8 for the subpartition X = {{v1}, {v2}, {v4}, {v6, v7, v8}, {v9, v11}, {v10}} of V.

We remark that there is an area graph (G, W) with opt(G, W, rW) > ⌈α(G, W, rW)/2⌉. Figure 2 gives an instance for r = r1 = r2 = r3 = 2. Each cut {vi}, i = 1, 2, 4, is of type (A3) and satisfies r − dG(vi) = 1, and the cut {v3} is of type (A1) and satisfies r − dG(v3) = 1. Then we see ⌈α(G, W, rW)/2⌉ = 2. In order to make (G, W) rW-NA-edge-connected by adding two new edges, we must, without loss of generality, add e = (v1, v2) and e′ = (v3, v4). However, G + {e, e′} is not rW-NA-edge-connected, since λ_{G+{e,e′}}(v1, W3) = 1. We will show that all such instances can be completely characterized as follows.
Fig. 2. Illustration of an area graph (G, W) with opt(G, W, rW) = ⌈α(G, W, rW)/2⌉ + 1, where rW(Wi) = 2 holds for i = 1, 2, 3.
Definition 2. We say that an area graph (G, W) has property (P) if α(G, W, rW) is even and there is a subpartition X = {X1, . . . , Xt} of V with Σ_{X∈X} αG,W,rW(X) = α(G, W, rW) satisfying the following conditions (P1)–(P3):
(P1) Each cut X ∈ X is of type (Ai) for some Wi ∈ W.
(P2) The cut X1 satisfies αG,W,rW(X1) = 1 and X1 ⊂ C1 for some component C1 of G with Xℓ ∩ C1 = ∅ for each ℓ = 2, 3, . . . , t.
(P3) For each ℓ = 2, 3, . . . , t, there is a cut Yℓ of type (Bj) for some Wj ∈ W such that Xℓ ∪ X1 ⊆ Yℓ and Σ_{X∈X, X⊂Yℓ} αG,W,rW(X) ≤ (rj + 1) − dG(Yℓ), and such that every cut X ∈ X satisfies X ⊂ Yℓ or X ∩ Yℓ = ∅.
Intuitively, condition (P3) says that for any feasible solution E′, if the number of edges e ∈ E′ incident to Yℓ equals Σ_{X∈X, X⊂Yℓ} αG,W,rW(X), then any edge e ∈ E′ incident to Yℓ must have its other end vertex in V − Yℓ, because dG+E′(Yℓ) ≥ rj must hold. Note that (G, W) in Figure 2 has property (P), because α(G, W, rW) = 4 holds and the subpartition X = {X1 = {v4}, X2 = {v1}, X3 = {v2}, X4 = {v3}} of V satisfies the conditions with Y2 = C1 ∪ {v1}, Y3 = C1 ∪ {v2}, and Y4 = C1 ∪ {v3} for the component C1 of G containing v4.

Lemma 4. If (G, W) has property (P), then opt(G, W, rW) ≥ α(G, W, rW)/2 + 1 holds.

Proof. Assume by contradiction that (G, W) has property (P) and there is an edge set E* with |E*| = α(G, W, rW)/2 such that G + E* is rW-NA-edge-connected (note that α(G, W, rW) is even). Let X = {X1, . . . , Xt} denote a subpartition of V satisfying Σ_{X∈X} αG,W,rW(X) = α(G, W, rW) and the above (P1)–(P3). Since |E*| = α(G, W, rW)/2 holds, each cut X ∈ X satisfies dG+E*(X) = r_{iX}, and hence dG′(X) = r_{iX} − dG(X) = αG,W,rW(X), where G′ = (V, E*). Therefore, any edge (x′, x″) ∈ E* satisfies x′ ∈ X′ and x″ ∈ X″ for two cuts X′, X″ ∈ X with X′ ≠ X″. From this, there exists a cut Xs ∈ X with s ≠ 1 and EG′(Xs, X1) ≠ ∅. Since (G, W) satisfies property (P), there is a cut Ys of type (Bj) which satisfies (P3), and hence Σ_{v∈Ys} dG′(v) ≤ (rj + 1) − dG(Ys). Since G′[Ys] contains one edge in EG′(Xs, X1), we have dG′(Ys) ≤ (rj − 1) − dG(Ys), which implies that dG+E*(Ys) = dG(Ys) + dG′(Ys) ≤ rj − 1. Hence a vertex v ∈ V − Ys satisfies λG+E*(v, Wj) ≤ rj − 1, contradicting that G + E* is rW-NA-edge-connected (note that Ys is of type (Bj) and hence Wj ⊆ Ys).
In this paper, we prove that rW-NA-ECAP enjoys the following min-max theorem and is polynomially solvable.

Theorem 1. For rW-NA-ECAP with rW(W) ≥ 3 for each area W ∈ W, opt(G, W, rW) = ⌈α(G, W, rW)/2⌉ holds if (G, W) does not have property (P), and opt(G, W, rW) = α(G, W, rW)/2 + 1 holds otherwise. Moreover, a solution E* with |E*| = opt(G, W, rW) can be obtained in O(m + p·rp·n^5·log(n/rp)) time.
Theorem 2. For rW-NA-ECAP with rW(W) ≥ 2 for each area W ∈ W, a solution E* with |E*| ≤ opt(G, W, rW) + 1 can be obtained in O(m + p·rp·n^5·log(n/rp)) time.
4
Algorithm
Based on the lower bounds in the previous section, we give an algorithm, called rW-NAEC-AUG, which finds a feasible solution E′ to rW-NA-ECAP with |E′| = opt(G, W, rW) for a given area graph (G, W) and a requirement function rW : W → Z+ − {1, 2}. It finds a feasible solution E′ with |E′| = α(G, W, rW)/2 + 1 if (G, W) has property (P), and with |E′| = ⌈α(G, W, rW)/2⌉ otherwise.

For a graph H = (V ∪ {s}, E) and a designated vertex s ∉ V, an operation called edge-splitting (at s) is defined as deleting two edges (s, u), (s, v) ∈ E and adding one new edge (u, v). That is, the graph H′ = (V ∪ {s}, (E − {(s, u), (s, v)}) ∪ {(u, v)}) is obtained by such an edge-splitting operation. Then we say that H′ is obtained from H by splitting the pair of edges (s, u) and (s, v). A sequence of splittings is complete if in the resulting graph H′ the vertex s has no neighbor. Conversely, we say that H is obtained from H′ by hooking up an edge (u, v) ∈ E(H′ − s) at s if we construct H by replacing the edge (u, v) with the two edges (s, u) and (s, v) in H′. The edge-splitting operation is known to be a useful tool for solving connectivity augmentation problems [1].

An outline of our algorithm is as follows. We first add a new vertex s and the minimum number of new edges between s and the area graph (G, W) to construct a graph H satisfying the connectivity requirements, and then convert it, by splitting off edges incident to s and eliminating s, into an rW-NA-edge-connected graph. More precisely, we describe the algorithm below and introduce three theorems necessary to justify it, whose proofs are omitted due to space limitations. An example of the computational process of rW-NAEC-AUG is shown in Figure 3.

Algorithm rW-NAEC-AUG.
Input: An area graph (G = (V, E), W = {W1, W2, . . . , Wp}) and a requirement function rW : W → Z+ − {1, 2}.
Output: A set E* of new edges with |E*| = opt(G, W, rW) such that G + E* is rW-NA-edge-connected.
Step 1: We add a new vertex s and a set F1 of new edges between s and V such that, in the resulting graph H = (V ∪ {s}, E ∪ F1),

all cuts X ⊂ V of type (Ai) or (Bi) satisfy dH(X) ≥ ri for each Wi ∈ W,   (2)
Fig. 3. Computational process of algorithm rW-NAEC-AUG applied to the area graph (G, W) in Figure 1 with (rW(W1), rW(W2), rW(W3)) = (2, 3, 4). The lower bound from Section 3 is ⌈α(G, W, rW)/2⌉ = 4. (i) H = (V ∪ {s}, E ∪ F1) obtained by Step 1. Edges in F1 are drawn as broken lines. Then λH(v, W) ≥ rW(W) holds for every pair of v ∈ V and W ∈ W. (ii) H1 = (H − {(s, v1), (s, v2)}) ∪ {(v1, v2)} obtained from H by an admissible splitting of (s, v1) and (s, v2). (iii) H2 = (H1 − {(s, v3), (s, v4)}) ∪ {(v3, v4)} obtained from H1 by an admissible splitting of (s, v3) and (s, v4). (iv) H3 obtained from H2 by a complete admissible splitting at s. The graph G3 = H3 − s is rW-NA-edge-connected.
and no F′ ⊂ F1 satisfies this property (as will be shown, |F1| = α(G, W, rW) holds). If dH(s) is odd, then we add to F1 one extra edge between s and V.
Step 2: We split two edges incident to s while preserving (2) (such a splitting pair is called admissible). We continue to execute admissible edge-splittings at s until no pair of edges incident to s is admissible. Let H2 = (V ∪ {s}, E ∪ E2 ∪ F2) be the resulting graph, where F2 = EH2(s, V) and E2 denotes the set of split edges. If F2 = ∅ holds, then halt after outputting E* := E2. Otherwise, dH2(s) = 4 holds and the graph H2 − s has two components C1 and C2 with dH2(s, C1) = 3 and dH2(s, C2) = 1, where EH2(s, C2) = {(s, u*)}. We have the following four cases (a)–(d).
(a) The vertex u* is contained in no cut X ⊆ C2 of type (Ai) with dH2(X) = ri for any i. Then, after replacing (s, u*) with a new edge (s, v) for some vertex v ∈ C1 while preserving (2), execute a complete admissible splitting at s. Output the set E* of all split edges, where |E*| = ⌈α(G, W, rW)/2⌉ holds.
(b) E2 ∩ E(H2[V − C1]) ≠ ∅ holds. Then, after hooking up one edge e ∈ E2 ∩ E(H2[V − C1]), execute a complete admissible splitting at s. Output the set E* of all split edges, where |E*| = ⌈α(G, W, rW)/2⌉ holds.
(c) There is a set E′ ⊆ E2 of at most two split edges such that the graph H3 resulting from hooking up the edges of E′ in H2 has an admissible pair {(s, u*), f} for some f ∈ EH3(s, V). After a complete admissible splitting at s in H3, output the set E* of all split edges, where |E*| = ⌈α(G, W, rW)/2⌉ holds.
(d) None of (a)–(c) holds. Then we can prove that (G, W) has property (P). After adding one new edge e* between C1 and C2, execute a complete admissible splitting at s in H2 + {e*}. Output the edge set E* := E3 ∪ {e*}, where E3 denotes the set of all split edges and |E*| = α(G, W, rW)/2 + 1 holds.
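The two elementary operations used above act on a multigraph and are inverses of each other. Here is a minimal sketch in our own notation (H is a Counter-based multiset of undirected edges); the substantial part of the algorithm, namely testing whether a splitting pair is admissible, i.e., preserves (2), is not shown.

from collections import Counter

def edge(u, v):
    return tuple(sorted((u, v), key=str))   # canonical key for an undirected edge

def split_off(H, s, u, v):
    # edge-splitting at s: delete the edges (s, u) and (s, v), add (u, v)
    assert H[edge(s, u)] > 0 and H[edge(s, v)] > 0
    H[edge(s, u)] -= 1
    H[edge(s, v)] -= 1
    H[edge(u, v)] += 1

def hook_up(H, s, u, v):
    # the inverse operation: replace one edge (u, v) by (s, u) and (s, v)
    assert H[edge(u, v)] > 0
    H[edge(u, v)] -= 1
    H[edge(s, u)] += 1
    H[edge(s, v)] += 1

A complete splitting at s then amounts to repeatedly calling split_off on admissible pairs until s has no incident edge left.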
To justify the algorithm rW-NAEC-AUG, it suffices to show the following three theorems.

Theorem 3. Let (G = (V, E), W = {W1, . . . , Wp}) be an area graph, and 0 ≤ r1 ≤ · · · ≤ rp be integers. Let H = (V ∪ {s}, E ∪ F1) be a graph with s ∉ V and F1 = EH(s, V) such that H satisfies (2) and no F′ ⊂ F1 satisfies this property. Then |F1| = α(G, W, rW) holds.
Theorem 4. Let (G = (V, E), W = {W1, . . . , Wp}) be an area graph, and 2 ≤ r1 ≤ · · · ≤ rp be integers. Let H = (V ∪ {s}, E ∪ F) with F = EH(s, V) ≠ ∅, s ∉ V, and dH(s) even, satisfy (2). If no pair of two edges in F is admissible, then we have dH(s) = 4 and G has two components C1 and C2 with dH(s, C1) = 3 and dH(s, C2) = 1. Moreover, in the graph H + e* obtained by adding one arbitrary new edge e* between C1 and C2, there is a complete admissible splitting at s.
Theorem 5. Let (G, W) and H satisfy the assumptions of Theorem 4, and let 3 ≤ r1 ≤ · · · ≤ rp be integers. Let H* be a graph obtained from H by a sequence of admissible splittings at s such that EH*(s, V) ≠ ∅ holds and no pair of two edges in EH*(s, V) is admissible in H*. Let C1 and C2 be the two components in H* − s with dH*(s, C1) = 3 and dH*(s, C2) = 1 (they exist by Theorem 4). If H* satisfies one of the following conditions (a)–(c), then H has a complete admissible splitting at s after replacing at most one edge in EH(s, V); otherwise (G, W) has property (P).
(a) For {(s, u*)} = EH*(s, C2), u* is contained in no cut X ⊆ C2 of type (Ai) with dH*(X) = ri for any i.
(b) E1 ∩ E(H*[V − C1]) ≠ ∅ holds, where E1 denotes the set of all split edges.
(c) There is a set E′ ⊆ E1 of at most two split edges such that the graph H′ resulting from hooking up the edges of E′ in H* has an admissible pair {(s, u*), f} for some f ∈ EH′(s, V).
By Theorems 4 and 5, for the set E* of edges obtained by algorithm rW-NAEC-AUG, the graph H* = (V ∪ {s}, E ∪ E*) satisfies (2), i.e., all cuts X ⊂ V of type (Ai) or (Bi) satisfy dH*(X) ≥ ri for each area Wi ∈ W. By dH*(s) = 0, all cuts X ⊂ V satisfy dG+E*(X) = dH*(X). By Lemma 1, this implies that G + E* is rW-NA-edge-connected. By Theorems 3 and 5, we have |E*| =
α(G, W, rW)/2 + 1 in the case where the initial area graph (G, W) has property (P), and |E*| = ⌈α(G, W, rW)/2⌉ otherwise. By Lemmas 3 and 4, we have |E*| = opt(G, W, rW).

Finally, we analyze the time complexity of algorithm rW-NAEC-AUG. By the maximum flow technique of [3], we can compute λG(v, W) for a vertex v ∈ V and an area W ∈ W in O(mn log(n^2/m)) time. Hence it can be checked in O(mpn^2 log(n^2/m)) time whether H satisfies (2) or not. In Step 1, for each vertex v ∈ V, after deleting all edges between s and v, we check whether the resulting graph H′ satisfies (2) or not. If (2) is violated, then we add max_{x∈V, Wi∈W} {ri − λH′(x, Wi)} edges between s and v in H′. In Step 2, for each pair {u, v} ⊆ V, after splitting min{dH(s, u), dH(s, v)} pairs {(s, u), (s, v)}, we check whether the resulting graph H′ satisfies (2) or not. If (2) is violated, then we hook up ⌈max_{x∈V, Wi∈W} {ri − λH′(x, Wi)}/2⌉ pairs in H′. The procedures (a)–(d) can also be executed in polynomial time, since the number of hooking-up operations is O(n^4). By a further analysis, we can prove that hooking up split edges O(n^2) times suffices for these procedures, but we omit the details here. Therefore, algorithm rW-NAEC-AUG can be implemented to run in O(mpn^4 log(n^2/m)) time. As a result, this total complexity can be reduced to O(m + p·rp·n^5·log(n/rp)) by applying the procedure to a sparse spanning subgraph of G with O(rp·n) edges, where such a sparsification takes O(m + n log n) time [10,11]. Summarizing the arguments given so far, Theorem 1 is now established.

Notice that the assumption r1 ≥ 3 is necessary only for Theorem 5. Therefore, even in the case of r1 = 2, we see by Theorem 4 that we can obtain a feasible solution E′ to rW-NA-ECAP with |E′| ≤ ⌈α(G, W, rW)/2⌉ + 1 ≤ opt(G, W, rW) + 1. This implies Theorem 2. We remark, though, that there are some differences between the case of r1 = 2 and the case of r1 ≥ 3. For example, the graph (G′ = (V ∪ {v′}, E), W) obtained from the graph (G, W) in Figure 2 by adding an isolated vertex v′ does not have property (P), but satisfies opt(G′, W, rW) > ⌈α(G′, W, rW)/2⌉.
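As an aside, the sparsification step cited from [10,11] can be pictured as follows: the union of rp edge-disjoint maximal spanning forests preserves, for every cut X, min{rp, dG(X)} edges across X, which is all that conditions on cuts of size at most rp can see. The sketch below is our own illustration (not the linear-time construction of [10] that the complexity bound actually uses) and assumes vertices 0, . . . , n−1.

def sparse_certificate(n, edges, k):
    # union of k edge-disjoint maximal spanning forests of a multigraph;
    # every cut X keeps at least min(k, d_G(X)) of its edges
    remaining, kept = list(edges), []
    for _ in range(k):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        leftover = []
        for (u, v) in remaining:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv                 # edge joins two components: keep it
                kept.append((u, v))
            else:
                leftover.append((u, v))
        remaining = leftover
    return kept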
5
Conclusion
In this paper, given an area multigraph (G = (V, E), W) and a requirement function rW : W → Z+, we have proposed a polynomial time algorithm for rW-NA-ECAP in the case where each area W ∈ W satisfies rW(W) ≥ 3. The time complexity of our algorithm is O(m + p·r*·n^5·log(n/r*)). Moreover, we have shown that in the case of rW(W) ≥ 2, W ∈ W, a solution with at most one edge more than the optimum can be found within the same time complexity. However, it is still open whether the problem in the case of rW(W) ≥ 2, W ∈ W, is polynomially solvable. We finally remark that the method of this paper cannot be applied to the problem of augmenting a given simple graph while preserving the simplicity of the graph. For such simplicity-preserving problems, it was shown [8] that even the edge-connectivity augmentation problem is NP-hard.
Acknowledgments This research is supported by a Grant-in-Aid for the 21st Century COE Program “Intelligent Human Sensing” from the Ministry of Education, Culture, Sports, Science, and Technology.
References
1. A. Frank, Augmenting graphs to meet edge-connectivity requirements, SIAM J. Discrete Math., 5(1), (1992), 25–53.
2. A. Frank, Connectivity augmentation problems in network design, in Mathematical Programming: State of the Art 1994, J.R. Birge and K.G. Murty (Eds.), The University of Michigan, Ann Arbor, MI, (1994), 34–63.
3. A. V. Goldberg and R. E. Tarjan, A new approach to the maximum flow problem, J. Assoc. Comput. Mach., 35, (1988), 921–940.
4. T. Ishii, Y. Akiyama, and H. Nagamochi, Minimum augmentation of edge-connectivity between vertices and sets of vertices in undirected graphs, Electr. Notes Theo. Comp. Sci., vol. 78, Computing Theory: The Australian Theory Symposium (CATS'03), (2003).
5. H. Ito, Node-to-area connectivity of graphs, Transactions of the Institute of Electrical Engineers of Japan, 11C(4), (1994), 463–469.
6. H. Ito, Node-to-area connectivity of graphs, in M. Fushimi and K. Tone, editors, Proceedings of APORS94, World Scientific Publishing, (1995), 89–96.
7. H. Ito and M. Yokoyama, Edge connectivity between nodes and node-subsets, Networks, 31(3), (1998), 157–164.
8. T. Jordán, Two NP-complete augmentation problems, Preprint no. 8, Department of Mathematics and Computer Science, Odense University, (1997).
9. H. Miwa and H. Ito, Edge augmenting problems for increasing connectivity between vertices and vertex subsets, 1999 Technical Report of IPSJ, 99-AL-66(8), (1999), 17–24.
10. H. Nagamochi and T. Ibaraki, A linear-time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph, Algorithmica, 7, (1992), 583–596.
11. H. Nagamochi and T. Ibaraki, Computing edge-connectivity of multigraphs and capacitated graphs, SIAM J. Discrete Math., 5, (1992), 54–66.
12. H. Nagamochi and T. Ibaraki, Graph connectivity and its augmentation: applications of MA orderings, Discrete Applied Mathematics, 123(1), (2002), 447–472.
13. T. Watanabe and A. Nakamura, Edge-connectivity augmentation problems, J. Comput. System Sci., 35, (1987), 96–144.
Scheduling and Traffic Allocation for Tasks with Bounded Splittability

Piotr Krysta¹, Peter Sanders¹, and Berthold Vöcking²

¹ Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, Saarbrücken, Germany, {krysta,sanders}@mpi-sb.mpg.de
² Dept. of Computer Science, Universität Dortmund, Baroper Str. 301, 44221 Dortmund, Germany, [email protected]
Abstract. We investigate variants of the problem of scheduling tasks on uniformly related machines so as to minimize the makespan. In the k-splittable scheduling problem each task can be broken into at most k ≥ 2 pieces, to be assigned to different machines. In the more general SAC problem each task j has its own splittability parameter kj ≥ 2. These problems are NP-hard, and previous research focused mainly on approximation algorithms. Our motivation to study these scheduling problems is traffic allocation for server farms, based on a variant of the Internet Domain Name Service (DNS) that uses a stochastic splitting of request streams. We show that the traffic allocation problem with standard latency functions from Queueing Theory cannot be approximated in polynomial time within any finite factor, because of the extreme behavior of these functions. Our main result is a polynomial time, exact algorithm for the k-splittable scheduling problem as well as the SAC problem with a fixed number of machines. The running time of our algorithm is exponential in the number of machines but only linear in the number of tasks. This result is the first proof that bounded splittability reduces the complexity of scheduling, as unsplittable scheduling is known to be NP-hard already for two machines. Furthermore, since our algorithm solves the scheduling problem exactly, it also solves the traffic allocation problem.
1
Introduction
A server farm is a collection of servers delivering data to a set of clients. Large scale server farms are distributed all over the Internet and deliver various types of site content, including graphics, streaming media, downloadable files, and HTML, on behalf of content providers who pay for an efficient and reliable delivery of their site data. To satisfy these requirements, one needs an advanced traffic management that takes care of the assignment of traffic streams to individual servers. Such streams can be formed, e.g., by traffic directed to the same page,
Partially supported by DFG grants Vo889/1-1, Sa933/1-1, and the IST program of the EU under contract IST-1999-14186 (ALCOM-FT).
traffic directed to pages of the same content provider, by the traffic requested from clients in the same geographical region or domain, or by combinations of these criteria. The objective is to distribute these streams as evenly as possible over all servers in order to ensure site availability and optimal performance.

For each traffic stream there is a corresponding stream of requests sent from the clients to the server farm. Current implementations of commercial Web server farms use the Internet Domain Name Service (DNS) to direct the requests to the server that is responsible for delivering the data of the corresponding traffic stream. The DNS can answer a query such as "What is www.uni-dortmund.de?" with a short list of IP addresses rather than only a single IP address. The original idea behind returning this list is that, in case of failures, clients can redirect their requests to alternative servers. Nowadays, slightly deviating from this idea, these lists are also used for the purpose of load balancing among replicated servers (cf., e.g., [8]). When clients make a DNS query for a name mapped to a list of addresses, the server responds with the entire list of IP addresses, rotating the ordering of the addresses for each reply. As clients typically send their HTTP requests to the IP address listed first, DNS rotation distributes the requests more or less evenly among all the replicated servers in the list.

Suppose the request streams are formed by a sufficiently large number of clients, so that they are reasonably well described by a Poisson process. Let λj denote the rate of stream j, i.e., the expected number of requests in some specified time interval. Under this assumption, rotating a list of ℓ servers corresponds to splitting stream j into ℓ substreams, each of rate λj/ℓ. We propose a slightly more sophisticated stochastic splitting policy that allows for better load balancing and additionally preserves the Poisson property of the request streams. Suppose the DNS attaches a vector pj1, . . . , pjℓ with Σi pji = 1 to the list of each stream j. In this way, every individual request in stream j can be directed to the ith server on this list with probability pji. This policy breaks Poisson stream j into ℓ Poisson streams of rate pj1·λj, . . . , pjℓ·λj, respectively.

The possibility to split streams into smaller substreams can obviously reduce the maximum latency. It is not obvious, however, whether it is easier or more difficult to find an optimal assignment if every stream is allowed to be broken into a bounded number of substreams. Observe that the allocation problem above is a variant of machine scheduling in which streams correspond to jobs and servers to machines. In the context of machine scheduling, bounded splittability has been investigated before, with the motivation of speeding up the execution of parallel programs. We first introduce the relevant background in scheduling.

Scheduling on Uniformly Related Machines. Suppose a set of jobs [n] = {1, . . . , n} needs to be scheduled on a set of machines [m] = {1, . . . , m}. Jobs are described by sizes λ1, . . . , λn ∈ ℝ>0, and machines are described by their speeds s1, . . . , sm ∈ ℝ>0. In the classical, unsplittable scheduling problem on uniformly related machines, every job must be assigned to exactly one machine. This mapping can be described by an assignment matrix (xij), i ∈ [m], j ∈ [n], where xij is an indicator variable with xij = 1 if job j is assigned to machine i and 0 otherwise.
The objective is to minimize the makespan z = max_{i∈[m]} Σ_{j∈[n]} λj·xij/si. It is
well known that this problem is strongly NP-hard. Hochbaum and Shmoys [5,6] gave the first polynomial time approximation schemes (PTAS) for this problem. If the number of machines is fixed, then the problem is only weakly NP-hard and it admits a fully polynomial time approximation scheme (FPTAS) [7]. A fractional relaxation of the problem leads to splittable scheduling. In the fully splittable scheduling problem the variables xij can take arbitrary real values from [0, 1], subject to the constraints Σ_{i∈[m]} xij ≥ 1 for every j ∈ [n]. This problem is trivially solvable, e.g., by assigning to each machine a piece of each job whose size is proportional to the speed of the machine.

k-Splittable Machine Scheduling and the SAC Problem. In the k-splittable machine scheduling problem each job can be broken into at most k ≥ 2 pieces that must be placed on different machines, i.e., at most k of the variables xij ∈ [0, 1], for every j, are allowed to be positive. Recently, Shachnai and Tamir [12] introduced a generalization of this problem, called scheduling with machine allotment constraints (SAC). In this problem, each job j has its own splittability parameter kj ≥ 1. In our study, we will mostly assume kj ≥ 2 for every j ∈ [n]. Shachnai and Tamir [12] prove that, in contrast to the fully splittable scheduling problem, the k-splittable machine scheduling problem is strongly NP-hard even on identical machines. They also give a PTAS for the SAC problem, whose running time, however, is impractical, as the splittability appears doubly exponentially in the running time. As a more practical result, they present a very fast maxj(1 + 1/kj)-approximation algorithm. This result suggests that, in fact, approximation should get easier when the splittability is increased. We should mention that there is a related scheduling problem in which preemption is allowed, that is, jobs can be split arbitrarily but pieces of the same job cannot be processed at the same time on different machines. Shachnai and Tamir also study combinations of SAC and scheduling with preemption, in which jobs can be broken into a bounded number of pieces and additionally there are bounds on the number of pieces that can be executed at the same time. Further variants of scheduling with different notions of splittability, with motivations from parallel computing and production planning, can be found in [12] and [13].

Scheduling with Non-linear Latency Functions. The only difference between the k-splittable scheduling and the traffic allocation problems is that the latency occurring at the servers may not be linear. A typical example of a latency function at a server of speed s with an incoming Poisson stream at rate λ is fs(λ) = λ/(s·(s − min{s, λ})), the formula for the waiting time in an M/M/1 queueing system. Of course, M/M/1 waiting time is only one out of many examples of latency functions that can be obtained from Queueing Theory. In fact, a typical property of such functions is that the latency goes to infinity when the injection rate approaches the service rate. Instead of focusing on particular latency functions, we will set up a more general framework to analyze the effects of non-linearity. The k-splittable traffic allocation problem is a variant of k-splittable scheduling. Streams are described by rates λ1, . . . , λn, and servers by bandwidths or service rates s1, . . . , sm. Hence, traffic streams can be identified with jobs and servers with machines.
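For concreteness, here is the M/M/1 family together with its inverse in the sense defined in Section 4; a small sketch of ours, where the inverse follows from solving y = λ/(s(s − λ)) for λ.

def mm1_latency(s, lam):
    # f_s(lam) = lam / (s * (s - min(s, lam))): M/M/1 waiting time at
    # service rate s; infinite once the arrival rate reaches s
    if lam >= s:
        return float('inf')
    return lam / (s * (s - lam))

def mm1_inverse(s, y):
    # f_s^{-1}(y) = sup{lam : f_s(lam) <= y}; solving
    # y * s * (s - lam) = lam gives lam = y * s**2 / (1 + y * s)
    if y < 0:
        return 0.0          # f_s(0) = 0, so no rate achieves a negative latency
    return y * s * s / (1 + y * s)

Note that the inverse always stays strictly below s, matching the infinity pole of the latency function at the service rate.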
The latencies occurring at the servers are described by a family of latency functions F = {fs : ℝ≥0 → ℝ≥0 ∪ {∞} | s ∈ ℝ>0}, where fs denotes a non-decreasing latency function for a server with service rate s. Scheduling under non-linear latency functions has been considered before. Alon et al. [2] give a PTAS for makespan minimization on identical machines with certain well-behaved latency functions. This was extended to a PTAS for makespan minimization on uniformly related machines by Epstein and Sgall [4]. In both studies, the latency functions must fulfill some analytical properties like convexity and uniform continuity under a logarithmic scale. Unfortunately, the uniform continuity condition excludes typical functions from Queueing Theory.

Our Results. The main result of this paper is a fixed-parameter tractable algorithm for the k-splittable scheduling problem and the more general SAC problem with splittability at least two for every job. Our algorithm has polynomial running time for every fixed number of machines. This result is remarkable, as unsplittable scheduling is known to be NP-hard already on two machines. In fact, our result is the first proof that bounded splittability reduces the complexity of scheduling. In more detail, given any upper bound T on the makespan of an optimal assignment, our algorithm computes a feasible assignment with makespan at most T in time O(n + m^(m+m/(k0−1))) with k0 = min{k1, k2, . . . , kn}. Furthermore, despite the possibility of splitting the jobs into pieces of non-rational size, we prove that the optimal makespan can be represented by a rational number with only a polynomial number of bits. Thus the optimal makespan can be found using binary search techniques over the rationals. This yields an exact, polynomial-time algorithm for SAC with a fixed number of machines. (We have recently improved the running time in the case of identical machines [1].) Note that this problem is strongly NP-hard when the number of machines is not fixed and k0 ≥ 2 [12]. In addition, we study the effects due to the non-linearity of latency functions. The algorithm above can be adapted to work efficiently for a wide class of latency functions, containing even such extreme functions as M/M/1 waiting time. On the negative side, we prove that latency functions like M/M/1 do not admit polynomial time approximation algorithms with finite approximation ratio if the number of machines is unbounded. The latter result is an ultimate rationale for our approach of devising efficient algorithms for a fixed number of machines.
2
An Exact Algorithm for SAC with Given Makespan
We present here an exact algorithm for SAC with kj ≥ 2 for every job. Our algorithm has polynomial running time for any fixed number of machines. We assume that an upper bound on the optimal makespan is given. This upper bound defines a capacity for each machine; the capacity of machine i is denoted by ci. The computed schedule has to satisfy Σ_{j∈[n]} λj·xij ≤ ci for every i ∈ [m]. A difficult subproblem is to decide into which pieces of which sizes the jobs should be cut. In principle, the number of possible cuts is unbounded. We will show that it suffices to consider only those cuts that "saturate" a machine.
Let πij = λj·xij denote the size of the piece of job j allocated to machine i. Machine i is saturated by job j if πij = ci. Our algorithm (Algorithm 1) schedules the bulkiest job j first, where the bulkiness of j is λj/(kj − 1). Using backtracking, it tries all ways to cut one piece from job j such that a machine is saturated. The saturated machine is removed from the problem; the splittability and size of j are reduced accordingly. The remaining problem is solved recursively. Two special cases arise. If j is too small to saturate kj machines, all remaining jobs can be scheduled using a simple greedy approach known as McNaughton's rule [10]. Since the splittability kj of a job is decreased whenever a piece is cut off, a remaining piece can eventually become unsplittable. Since this remaining piece will be infinitely bulky, it will be scheduled next. In this case, all machines that can accommodate the piece are tried. For the precise description see Fig. 1.

I := [m]  – – machines to be saturated
J := [n]  – – jobs to be scheduled
if Σ_{j∈J} λj > Σ_{i∈I} ci ∨ ¬solve() then output "no solution possible"
else output nonzero πij values

Function solve() : Boolean
  if J = ∅ then return true
  find a j ∈ J that maximizes λj/(kj − 1)
  if kj = 1 then                        – – unsplittable remaining piece
    forall i ∈ I with ci ≥ λj do
      πij := λj; ci := ci − λj; J := J \ {j}                      – – (*)
      if solve() then return true
      undo changes made in line (*)
  else                                  – – job j is splittable
    if λj/(kj − 1) ≤ min{ci : i ∈ I} then McNaughton(); return true
    forall i ∈ I with ci < λj do
      πij := ci; λj := λj − ci; kj := kj − 1; I := I \ {i}        – – (**)
      if solve() then return true
      undo changes made in line (**)
  return false

Procedure McNaughton()                  – – schedule greedily
  pick any i ∈ I
  foreach j ∈ J do
    while ci ≤ λj do
      πij := ci; λj := λj − ci; I := I \ {i}; pick any new i ∈ I
    πij := λj; ci := ci − λj

Fig. 1. Algorithm 1: Find a schedule of n jobs with splittabilities kj on m machines.
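A direct transcription of Algorithm 1 into executable form may look as follows; this is a sketch under our own naming, using exact rational arithmetic. For the decision version with makespan bound T on machines with speeds si, the capacities would be ci = T·si. The priority-queue refinement for selecting the bulkiest job (see the proof of Theorem 1) is omitted.

from fractions import Fraction

def schedule(lams, ks, caps):
    # Backtracking search of Algorithm 1. lams[j]: size of job j,
    # ks[j]: splittability k_j, caps[i]: capacity c_i of machine i.
    # Returns {(i, j): piece size} or None if no feasible schedule exists.
    lam = [Fraction(x) for x in lams]
    k = list(ks)
    c = [Fraction(x) for x in caps]
    I = set(range(len(c)))       # machines still to be saturated
    J = set(range(len(lam)))     # jobs still to be scheduled
    pi = {}

    def mcnaughton():
        # greedy wrap-around filling (McNaughton's rule); applicable when
        # lam_j/(k_j - 1) <= min c_i for the bulkiest remaining job (Lemma 1);
        # it cannot run out of machines since sum(lam) <= sum(c) is invariant
        machines, pos = sorted(I), 0
        for j in sorted(J):
            rem = lam[j]
            while rem > 0:
                i = machines[pos]
                take = min(rem, c[i])
                pi[(i, j)] = take
                c[i] -= take
                rem -= take
                if c[i] == 0:
                    pos += 1     # machine filled completely; move to the next
        J.clear()

    def solve():
        if not J:
            return True
        if not I:
            return all(lam[j] == 0 for j in J)
        # bulkiest job first; an unsplittable remainder (k_j = 1) counts
        # as infinitely bulky and is therefore scheduled before anything else
        j = max(J, key=lambda jj: (k[jj] == 1, lam[jj] / max(k[jj] - 1, 1)))
        if k[j] == 1:
            for i in sorted(I):
                if c[i] >= lam[j]:
                    pi[(i, j)] = lam[j]; c[i] -= lam[j]; J.discard(j)
                    if solve():
                        return True
                    del pi[(i, j)]; c[i] += lam[j]; J.add(j)   # undo (*)
            return False
        if lam[j] / (k[j] - 1) <= min(c[i] for i in I):
            mcnaughton()
            return True
        for i in sorted(I):
            if c[i] < lam[j]:          # cut off a piece that saturates machine i
                piece = c[i]
                pi[(i, j)] = piece; lam[j] -= piece; k[j] -= 1; I.discard(i)
                if solve():
                    return True
                del pi[(i, j)]; lam[j] += piece; k[j] += 1; I.add(i)   # undo (**)
        return False

    if sum(lam) > sum(c) or not solve():
        return None
    return pi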
Theorem 1. Algorithm 1 finds a feasible solution for SAC with a given possible makespan, provided that the splittability of each job is at least two. It can be implemented to run in time O(n + m^(m+m/(k0−1))), where k0 = min{k1, . . . , kn}.

Proof. All the necessary data structures can be initialized in time O(m + n) if we use a representation of the piece size matrix (πij) that only stores nonzero entries. There can be at most m recursive calls that saturate a machine and at
most m/(k0 − 1) recursive calls made for unsplittable pieces that remain after a job j was split kj − 1 times. All in all, the backtrack tree considers no more than m!·m^(m/(k0−1)) possibilities. The selection of the bulkiest job can be implemented to run in time O(log m), independent of n: only the m largest jobs can ever be candidates. Hence it suffices to select these jobs initially using an O(n) time algorithm [3] and to keep them in a priority queue data structure. Greedy scheduling using McNaughton's rule takes time O(n + m). Overall, we get an execution time of O(n + m + m!·m^(m/(k0−1))·log m) = O(n + m^(m+m/(k0−1))).

The algorithm also produces only correct schedules. In particular, when λj/(kj − 1) ≤ min{ci : i ∈ I}, McNaughton's rule can complete the schedule because no remaining job is large enough to saturate more than kj − 1 of the remaining machines. More precisely, solve() maintains the invariant Σ_{j∈J} λj ≤ Σ_{i∈I} ci, and when McNaughton's rule is called, it can complete the schedule:

Lemma 1. McNaughton's rule computes a correct schedule if Σ_{j∈J} λj ≤ Σ_{i∈I} ci and ∀i ∈ I, j ∈ J : λj/(kj − 1) ≤ ci.

Proof. The only thing that can go wrong is that a job j is split more than kj − 1 times, i.e., into ≥ kj + 1 pieces. Then it completely fills at least kj − 1 machines with capacity at least min_{i∈I} ci, contradicting λj/(kj − 1) ≤ min_{i∈I} ci.

Now we come to the interesting part of the proof. We have to show that the search succeeds if a feasible schedule exists. We show the stronger claim that the algorithm is correct even if unsplittable jobs are present. (In this case only the above running time analysis would fail.) The proof is by induction on m. For m = 1 this is trivial since no splits are necessary. Consider the case m > 1. If there are unsplittable jobs, they are infinitely bulky and are therefore scheduled first. Since all possible placements for them are tried, nothing can be missed. When a splittable job is bulkiest, only those splits are considered that saturate one machine. Lemma 2 shows that if there is a feasible schedule, there must also be one with this property. The recursive call leaves a problem with one machine less, and the induction hypothesis is applicable.

Lemma 2. If a feasible schedule exists and the bulkiest job is large enough to saturate a machine, then there is a feasible schedule where the bulkiest job saturates a machine.

Our approach to proving Lemma 2 is to show that any feasible schedule can be transformed into a feasible schedule where the bulkiest job saturates a machine. To simplify this task, we first establish a toolbox of simpler transformations. We begin with two very simple transformations that affect only two jobs and obviously maintain feasibility. See Fig. 2-(a) and 2-(b) for illustrations.

Lemma 3. For any feasible schedule, consider two jobs p and q sharing machine i′, i.e., π_{i′p} > 0 and π_{i′q} > 0. For any machine i such that π_{i′q} < π_{ip}, there is a feasible schedule where the overlapping piece of q is moved to machine i, i.e., (π_{i′p}, π_{ip}, π_{i′q}, π_{iq}) := (π_{i′p} + π_{i′q}, π_{ip} − π_{i′q}, 0, π_{iq} + π_{i′q}).
Fig. 2. Manipulating schedules. Lines represent jobs. (Bent) boxes represent machines. (a): The move from Lemma 3; (b): The swap from Lemma 4; (c): Saturation using Lemma 5; (d): The rotation from Lemma 6; (e): Moving j away from r.
Lemma 4. For any feasible schedule, consider two jobs p and q sharing machine i, i.e., π_{ip} > 0 and π_{iq} > 0. Furthermore, consider two other pieces π_{i_p p} and π_{i_q q} of p and q. If π_{i_p p} ≤ π_{iq} + π_{i_q q} and π_{i_q q} ≤ π_{ip} + π_{i_p p}, then there is a feasible schedule where the pieces π_{i_p p} and π_{i_q q} are swapped as follows:

(π_{i_p p}, π_{ip}, π_{i_q p}, π_{i_p q}, π_{iq}, π_{i_q q}) := (0, π_{ip} + π_{i_p p} − π_{i_q q}, π_{i_q p} + π_{i_q q}, π_{i_p q} + π_{i_p p}, π_{iq} + π_{i_q q} − π_{i_p p}, 0).

As a first application of Lemma 3 we now explain how a large job j allocated to at most kj − 1 machines can “take over” a small machine.

Lemma 5. Consider a job j and a machine i such that λj/(kj − 1) ≥ ci. If there is a feasible schedule where j is scheduled on at most kj − 1 machines, then there is a feasible schedule where j saturates machine i.

Proof. Let i′ denote a machine index that maximizes π_{i′j}, and note that π_{i′j} ≥ λj/(kj − 1) ≥ ci. We can now apply Lemma 3 to subsequently move all the pieces on machine i to machine i′. Lemma 3 remains applicable because π_{i′j} is large enough to saturate machine i. See Fig. 2-(c) for an illustration.

After these local transformations, we come to a global transformation that greatly simplifies the kind of schedules we have to consider.

Definition 1. Job j is called split if |{i : π_{ij} > 0}| > 1. The split graph corresponding to a schedule is an undirected hypergraph G = ([m], E) where each split job j corresponds to a hyperedge {i : π_{ij} > 0} ∈ E.

Lemma 6. If a feasible schedule exists, then there is a feasible schedule whose split graph is a forest.

Proof. It suffices to show that for a feasible schedule whose split graph G contains a cycle, there is also a feasible schedule whose corresponding split graph has a smaller value of Σ_{e∈E} |e|. Then it follows that a feasible schedule that minimizes Σ_{e∈E} |e| is a forest. So suppose G contains a cycle involving ℓ edges. Let succ(j) stand for (j mod ℓ) + 1. By appropriately renumbering machines and jobs we can assume
without loss of generality that this cycle is made up of jobs 1 to ℓ and machines 1 to ℓ such that, for j ∈ [ℓ], π_{jj} > 0, π_{succ(j)j} > 0, and δ = π_{11} = min_{j∈[ℓ]} min{π_{jj}, π_{succ(j)j}}. Fig. 2-(d) depicts this normalized situation. Now we rotate the pieces in the cycle by decreasing π_{jj} by δ and increasing π_{succ(j)j} by the same amount. The schedule remains feasible since the load of the machines in the cycle remains unchanged. Since the first job is now split into one piece less, Σ_{e∈E} |e| decreases.

Now we have all the necessary tools to establish Lemma 2.

Proof. Consider any feasible schedule; let j denote the bulkiest job and let there be a machine i0 with λj/(kj − 1) ≥ c_{i0}. We transform this schedule in several steps. We first apply Lemma 6 to obtain a schedule whose split graph is a forest. We now concentrate on the tree T where j is allocated. If job j is allocated to at most kj − 1 machines, we can saturate i0 using Lemma 5 and we are done.

If one piece of j is allocated to a leaf i of T, then all other jobs mapped to machine i are allocated there entirely. Let i′ denote another machine j is mapped to. We apply Lemma 3 to move small jobs from i to i′. When this is no longer possible, either job j saturates machine i and we are done, or there is a job j′ with λ_{j′} = π_{ij′} > π_{i′j}. Now we can apply Lemma 4 to the pieces π_{ij}, π_{i′j}, π_{ij′}, and a zero size piece of job j′. This transformation reduces the number of pieces of job j, so that we can saturate machine i0 using Lemma 5.

Finally, j could be allocated to machines that are all interior nodes of T. We focus on the two largest pieces π_{ij} and π_{i′j}, so that π_{ij} + π_{i′j} ≥ 2λj/kj. Now fix a leaf r that is connected to i via a path that does not involve j as an edge. This is possible since j is connected to interior nodes only. Now we intend to move job j away from r, i.e., we transform the schedule such that the path between node r and job j becomes longer. (The path between a node v and a job e in a tree starts at v and uses edges e′ ≠ e until a node is reached that has e as an incident edge.) We do this iteratively until j is incident to a leaf in T. Then we can apply the transformations described above and we are done.

We first apply Lemma 3 to move small pieces of jobs allocated to machine i′ to machine i. Although this changes the shape of T, it leaves the distance between job j and r invariant, unless the piece of j on machine i vanishes completely, in which case we can apply Lemma 5 and we are done. When Lemma 3 is no longer applicable, either j saturates machine i and we are done, or there is a job q with π_{i′q} > π_{ij}. In that case we consider the smallest other piece π_{i_q q} of job q. More precisely, if q is split into at most kq − 1 nonzero pieces, we pick some i_q with π_{i_q q} = 0. Otherwise we pick i_q minimizing π_{i_q q} over machines i_q ≠ i′ with π_{i_q q} > 0. In either case π_{i_q q} ≤ λq/(kq − 1). Recall that π_{ij} + π_{i′j} ≥ 2λj/kj, since this sum is invariant under the move operations we have performed. Furthermore, j is the bulkiest job, so that
π_{i_q q} ≤ λq/(kq − 1) ≤ λj/(kj − 1) = 2λj/((kj − 1) + (kj − 1)) ≤ 2λj/((kj − 1) + 1) = 2λj/kj ≤ π_{ij} + π_{i′j}.
So we can apply Lemma 4 to the pieces of job j on machines i and i′ and to the pieces of job q on machines i′ and i_q. This increases the distance from job j to machine r, as desired. Fig. 2-(e) gives an example where we apply Lemma 3 once and then Lemma 4.
3
Finding the Optimal Makespan
We assumed so far that an upper bound on the optimal makespan is known. The obvious idea now is to find the optimal makespan using binary search. In order to show that this search terminates, one needs to prove that the optimal makespan is a rational number. This is not completely obvious, as in principle jobs might be broken into pieces of non-rational size. The following lemma, however, shows that the optimal makespan can be represented by rational numbers of polynomial length. Let ℚ_ℓ denote the set of non-negative rational numbers that can be represented by an ℓ-bit numerator and an ℓ-bit denominator, together with the symbol ∞. The proof of the next lemma is omitted in this extended abstract.

Lemma 7. There is a constant κ > 0 s.t. the value of an optimum solution to the SAC problem with kj ≥ 2 (for all j) is in ℚ_{N^κ}, where N is the problem size.

By Lemma 7, the optimal makespan can be found by binary search methods over the rationals (see, e.g., [9,11]) with Algorithm 1 as a decision oracle. Thus:

Corollary 1. For every fixed number of machines, there is an exact polynomial time optimization algorithm for the SAC problem with splittability at least two.
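The search itself can stay elementary: two distinct rationals with denominators at most d differ by more than 1/d^2, so bisecting the makespan down to a gap below that and rounding to the nearest small-denominator rational recovers the optimum exactly. A sketch with our own names; feasible would test a candidate makespan with Algorithm 1, and max_den is the denominator bound supplied by Lemma 7.

from fractions import Fraction

def optimal_makespan(feasible, upper, max_den):
    # binary search over the rationals: feasible(T) is the decision oracle,
    # upper is any feasible makespan bound, and the optimum is known to
    # have denominator at most max_den (Lemma 7)
    lo, hi = Fraction(0), Fraction(upper)
    gap = Fraction(1, 3 * max_den ** 2)
    while hi - lo > gap:
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    # hi is within gap of the optimum, while any other rational with
    # denominator <= max_den is at least 1/max_den**2 - gap = 2*gap away,
    # so rounding to the closest such rational returns the optimum
    return hi.limit_denominator(max_den)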
4
Solving the Traffic Allocation Problem
In this section, we show how to apply the binary search approach to the traffic allocation problem, i.e., we solve the SAC problem with non-linear latency functions. We need to make some very modest assumptions about these functions. A latency function is monotone if it is positive and non-decreasing. The functions need not be continuous or strictly increasing; e.g., step functions are covered. For a monotone function f : ℝ≥0 → ℝ≥0 ∪ {∞}, let the inverse of f be defined by f⁻¹(y) = sup{λ | f(λ) ≤ y} for y ≥ f(0), and f⁻¹(y) = 0 for y < f(0). We say that a function f is polynomially length-bounded if, for every λ ∈ ℚ_ℓ, f(λ) ∈ ℚ_{poly(ℓ)}. For example, the M/M/1 waiting time function is polynomially length-bounded although lim_{λ→s⁻} fs(λ) = ∞. This is because, for λ, s ∈ ℚ_ℓ with λ < s, one can show (s − λ) ∈ ℚ_{2ℓ}, s(s − λ) ∈ ℚ_{4ℓ} and λ/(s(s − λ)) ∈ ℚ_{8ℓ}, so that f(λ) ∈ ℚ_{8ℓ}. We say that a family of latency functions F is efficiently computable if, for every s, λ ∈ ℚ_ℓ, fs(λ) and fs⁻¹(λ) can be calculated in time polynomial in ℓ. Observe that the functions from an efficiently computable family must also be polynomially length-bounded. It is easy to check that the M/M/1 waiting time family and other typical function families from Queueing Theory are efficiently computable. We obtain the following result, whose proof is omitted.
Theorem 2. Let F be any efficiently computable family of monotone functions. Consider the SAC problem with latency functions from F and splittability at least two. Suppose the best possible maximum latency can be represented by a number from ℚ_ℓ. Then an optimal solution can be found in time O(poly(ℓ) · (n + m^(m+m/(k0−1)))) with k0 = min{kj : j = 1, 2, . . . , n}.

Note that ℓ is an obvious lower bound on the running time of any algorithm computing the exact, optimal makespan. It is unclear whether there exist latency functions s.t. ℓ cannot be bounded polynomially in the input length. If an appropriate upper bound on ℓ is not known in advance, we can use geometric search, which can be stopped after computing the optimal latency with the desired precision.
5
Non-approximability for Non-linear Scheduling
The M/M/1 waiting time cost function family defined in Section 1 has an infinity pole as λ → s⁻. Intuitively, this pole reflects the capacity restriction on the servers, and it is typical also for other families that can be derived from Queueing Theory. The following theorem, whose proof is omitted, shows that non-linear k-splittable scheduling, even with identical servers, is completely inapproximable.

Theorem 3. Let F be an efficiently computable family of monotone latency functions. Suppose there is s ∈ ℚ_{>0} s.t. lim_{λ→s} fs(λ) = ∞. Then there does not exist a polynomial time approximation algorithm with finite approximation ratio for the non-linear k-splittable scheduling problem under F, provided P ≠ NP.
References
1. A. Agarwal, T. Agarwal, S. Chopra, A. Feldmann, N. Kammenhuber, P. Krysta and B. Vöcking. An Experimental Study of k-Splittable Scheduling for DNS-Based Traffic Allocation. To appear in Proc. of the 9th EUROPAR, 2003.
2. N. Alon, Y. Azar, G. J. Woeginger and T. Yadid. Approximation schemes for scheduling on parallel machines. Journal of Scheduling, 1:55–66, 1998.
3. M. Blum, R. Floyd, V. Pratt, R. Rivest, and R. Tarjan. Time bounds for selection. J. Computer and System Science, 7(4):448–461, August 1973.
4. L. Epstein and J. Sgall. Approximation schemes for scheduling on uniformly related and identical parallel machines. Proc. of the 7th ESA, 151–162, 1999.
5. D. S. Hochbaum and D. B. Shmoys. Using dual approximation algorithms for scheduling problems: theoretical and practical results. J. ACM, 34:144–162, 1987.
6. D. S. Hochbaum and D. B. Shmoys. A polynomial approximation scheme for scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing, 17:539–551, 1988.
7. E. Horowitz and S. K. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. J. ACM, 23:317–327, 1976.
8. J. F. Kurose and K. W. Ross. Computer networking: a top-down approach featuring the Internet. Addison-Wesley, 2001.
9. St. Kwek and K. Mehlhorn. Optimal search for rationals. Information Processing Letters, 86:23–26, 2003.
10. R. McNaughton. Scheduling with deadlines and loss functions. Management Science, 6:1–12, 1959.
11. C. H. Papadimitriou. Efficient search for rationals. Information Processing Letters, 8:1–4, 1979.
12. H. Shachnai and T. Tamir. Multiprocessor Scheduling with Machine Allotment and Parallelism Constraints. Algorithmica, 32(4):651–678, 2002.
13. W. Xing and J. Zhang. Parallel machine scheduling with splitting jobs. Discrete Applied Mathematics, 103:259–269, 2000.
Computing Average Value in Ad Hoc Networks

Mirosław Kutyłowski¹ and Daniel Letkiewicz²

¹ Inst. of Mathematics, Wrocław University of Technology, [email protected]
² Inst. of Engineering Cybernetics, Wrocław University of Technology, [email protected]
Abstract. We consider a single-hop sensor network with n = Θ(N) stations using R independent communication channels. Communication between the stations can fail at random or be scrambled by an adversary so that it cannot be distinguished from random noise. Assume that each station Si holds an integer value Ti. The problem that we consider is to replace the values Ti by their average (rounded to integer values). A typical situation is that we have a local sensor network that needs to make a decision based on the values read by the sensors, by computing the average value or by some kind of voting. We design a protocol that solves this problem in O(N/R · log N) steps. The protocol is robust: a constant random fraction of messages can be lost (by communication channel failure, by the action of an adversary, or by synchronization problems). Also, a constant fraction of stations may go down (or be destroyed by an adversary) without serious consequences for the rest. The algorithm is well suited for dynamic systems in which the values Ti may change; the protocol, once started, works forever.
Keywords: mobile computing, radio network, sensor network
1 Introduction

Ad hoc networks that communicate via radio channels gain importance due to many new application areas: sensor networks used to monitor the environment, self-organizing networks of mobile devices, and mobile networks used in military and rescue operations. Ad hoc networks provide many features that are very interesting from a practical point of view. They have no global control (which could be either attacked or accidentally destroyed) and should keep working if some stations leave or join the network. So the systems based on ad hoc networks are robust (once they work). On the other hand, it is quite difficult to design efficient algorithms for ad hoc networks. Classical distributed algorithms have been designed for wired environments with quite different communication features. For instance, in many cases one can assume that an ad hoc network works synchronously (due to GPS signals); if the network works in a small area, then two stations may communicate directly (single-hop model) and there is no communication latency. On the other hand, stations compete for access to a limited number of radio channels. They may disturb each other, making the transmission
This research was partially supported by Komitet Badań Naukowych grant 8T11C 04419.
unreadable if they broadcast at the same time on the same channel. Therefore quite a different algorithmic approach is necessary. Recently, there has been a lot of research on fundamental issues for ad hoc networks (as a starting point for references see [7]).

Problem Statement. In this paper we consider the following task. Each station Si of a network initially holds an integer value Ti. The goal is to compute the average of all numbers Ti so that each station changes its Ti into the average value. We demand that the numbers held by the stations remain integers and that their sum is not changed (so at the end, some small differences may be inevitable and the stations hold not exactly the average value but values close to it). The intuition is that Ti might be a physical value measured by a sensor, or a preference of Si expressed as an integer value (for instance, 0 meaning totally against, 50 meaning undecided, and 100 a fully supporting voice). The network may have to compute the average in order to produce an output of a group of sensors or to make a common decision regarding its behavior. This task, which can trivially be solved for most computing systems (for instance, by collecting data, simple arithmetic, and broadcasting the result), becomes nontrivial in ad hoc networks.

Computation Model. We consider networks consisting of identical stations with no IDs (the idea is that it is unpredictable which stations appear in the network and the devices are bulk produced). However, we assume that the stations know n, the number of stations in the network, within a constant factor. This parameter is called N for the rest of the paper. (Let us remark that N can be derived by an efficient algorithm [5].) Communication between stations is accomplished through R independent communication channels labeled by numbers 0 through R−1. A station may either send a message through a chosen channel or listen to a chosen channel (but not both at the same time, according to the IEEE 802.11 standard). If more than one station is sending on the same channel, then a collision occurs and the messages on this channel are scrambled. We assume that a station listening to a channel on which there is a collision receives noise and cannot even recognize that a collision has occurred (no-collision-detection model). In this paper we consider only networks that are concentrated in a local area: we assume that if a station sends a message, then every station (except the sender) can hear it. So we talk about a single-hop network.

Computation of each station consists of steps. During a step a station may perform a local computation and either send or receive messages through a chosen channel. For the sake of simplicity of presentation we assume that computation is synchronous. However, our results also hold for asynchronous systems, provided that the stations work with comparable speeds and lack of synchronization may only result in failure of a constant fraction of communication rounds. We do not use any global clock available to all stations. (In fact, our algorithm can be used to agree upon a common time for other purposes.)

Design Goals. We design a protocol that has to remain stable and efficient in the following sense:
– Each station may break down or leave the network. However, we assume that Ω(N) stations remain active.
– A message sent by one station may fail to be received by a listening station, with probability bounded by a fixed constant p < 1.
– An adversary, who knows all details of the algorithm, may scramble communication for a constant fraction of the total communication time over all communication channels. (So no "hot spots" in the communication pattern of the protocol are admissible: they would be easily attacked by an adversary.)
– The protocol has to be suited for dynamic systems, which once started compute the average forever. So it has to be applicable in systems where the values Ti may change. (For a discussion on dynamic systems see [8].) Preferably, a solution should be periodic, with the same code executed repeatedly.
– The protocol should ensure that the number of steps at which a station transmits a message or listens is minimized. Also, there should be no station that is involved in communication substantially longer than an average station. This is due to the fact that energy consumption is caused mainly by radio communication and that battery operated devices have only limited energy resources.

Former Results. Computing an average value is closely related to load balancing in distributed systems. (In fact, in the latter case we have not only to compare the loads but also to forward some tasks.) However, these algorithms are designed for wired networks. An exception is a solution proposed by Gosh and Muthukrishnan [4]: they propose a simple protocol based on random matchings in the connection graph. In one round of their protocol the load is balanced between the nodes connected by an edge of a matching. This approach is very different from straightforward algorithms that try to gather information and then make decisions based on it. They provide an estimation of the convergence of this algorithm to the equal load based on global characteristics of the graph (its degree and the second eigenvalue). Their proof shows the decrease of a certain potential describing how much the load is unbalanced. The protocol of Gosh and Muthukrishnan has been reused for permuting at random in a distributed system [2]. However, in this case the analysis is based on a rapid-mixing property of a corresponding Markov chain and a refined path coupling approach [3].

New Results. We extend the results from [4] and adapt them to the case of a single-hop ad hoc network. The issue is that for the protocol of Gosh and Muthukrishnan one needs to show not only that a certain potential decreases fast, but also that there are no "bad points" in the network that have results far from the valid ones (if the potential is low, we can only guarantee that the number of bad points is small).

Theorem 1. Consider an ad hoc network consisting of Θ(N) stations using R communication channels. Let D denote the maximum difference of the form Ti − Tj at the beginning of the protocol. With high probability, i.e., with probability at least 1 − 1/N, after executing O(N/R · (log N + log D)) steps of the protocol:
– the sum of all values Ti remains unchanged and each station keeps one value,
– either all stations keep the same value Ti, or these values differ by at most 1, or they differ by at most 2 and the number of stations that keep the biggest and the smallest values is bounded by ε · N for a small constant ε.
514
Mirosław Kutyłowski and Daniel Letkiewicz
2 Protocol Description The protocol repeats a single stage consisting of 3N/R steps, three consecutive steps are called a round. A stage is in fact a step of protocol of Gosh and Muthukrishnan. Description of a Stage. 1. Each station Si chooses t, t ∈ [1, . . . , N ] and a bit b uniformly at random (the choices of t, t and b are stochastically independent). 2. Station Si performs the following actions during round t/R on the channel t mod R. – If b = 0, then Si transmits Ti at step 1. Otherwise, it transmits Ti at step 2. – At step 3, station Si listens. If a message comes from another station with the Ti transmitted and an another value, say Tj , then Si changes Ti as follows: • if Ti + Tj is even, then station Si puts Ti := 12 (Ti + Tj ), • if Ti + Tj is odd, then station Si puts Ti := 12 (Ti + Tj ), if its b equals 0, and Ti := 12 (Ti + Tj ) otherwise. 3. If t = t, then during round t /R station Si uses channel t mod R: – it listens during the first two steps, – it concatenates the messages heard and sends them during the third step. The idea of the protocol is that 3N/R steps of a single stage are used as a place for N slots in which pairs of stations can balance their values. If everything works fine, then for a given channel at a given round: – during step 1 a station Su for which b = 0 sends Tu , – during step 2 a station Sv for which b = 1 sends Tv , – step 3 is used to avoid Byzantine problems [6]: another station Sw repeats Tu and Tv . Otherwise, neither Su nor Sv could be sure that its message came through. (An update of only one of the values Su or Sv would violate the condition that the sum of all values must not change.) Of course, such a situation happens only for some slots. However, standard considerations (see for instance [3]) show that the following fact: Lemma 1. With high probability, during a stage balancing does occur at step 3 for at least c · N slots, where c is a fixed constant, 0 < c < 1. Note that Lemma 1 holds also, if for a constant fraction of slots communication failure occurs or an adversary scrambles messages. Since the stations communicate at randomly chosen moments, it is difficult for an adversary to attack only some group of stations.
3 Analysis of the Protocol The analysis consists of three different phases (even if the stations behave in exactly the same way all the time). In Phase I we show that some potential function reaches a certain low level – this part is borrowed from [4]. In Phase II we guarantee with high probability that all stations deviate by at most β from the average value. Then Phase 3 is used to show that with high probability all stations hold one of at most 3 consecutive values. In order to simplify the presentation we call the stations S1 , . . . Sn , even if the algorithm does not use any ID’s of the stations.
Computing Average Value in Ad Hoc Networks
515
3.1 Phase I Let Tt,j denote the value of Tj hold by S j nimmediately after executing stage t. Let T denote the average value, that is, T = n1 i=1 T0,i . We examine the values xt,j = Tt,j − T . In order to examine the differences from the average value we consider the following potential function: ∆t =
n
x2t,i .
i=1
Claim A. E [∆t+1 ] ≤ ρ · ∆t +
n 4
for some constant ρ < 1.
Proof. We first assume that new values hold by two stations after balancing their values become equal (possibly reaching non-integer values). Then we make an adjustment to the real situation when the values must remain to be integers. By linearity of expectation, n n E [∆t+1 ] = E x2t+1,i = E x2t+1,i . i=1
i=1
So now we inspect a single E x2t+1,i . As already mentioned, with a constant probability δ station Si balances Tt,j with some other station, say with Ss . Assume that the values held by Si and Ss become equal. So xt+1,i and xt+1,s become equal to z = 12 (xt,i + xt,s ). Therefore 2 2 xt,i + xt,s 2 E xt+1,i = (1 − δ) · xt,i + δ · E 2 = (1 − δ) · x2t,i + δ · E 14 x2t,i + 14 x2t,s + 12 xt,i · xt,s = (1 − 34 δ) · x2t,i + 14 δ · E x2t,s + 12 δ · E [xt,i · xt,s ] . Since s is uniformly distributed over {1, . . . , n}, we get n 1 2 1 · xt,j = · ∆t . E x2t,s = n n j=1
The next expression we have to evaluate is n n 1 1 E [xt,i · xt,s ] = · (xt,i · xt,j ) = · xt,i · xt,j . n n j=1 j=1
n
xt,j = 0. So E [xt,i · xt,s ] equals 0 and finally, 1 E x2t+1,i = (1 − 34 δ) · x2t,i + 14 δ · · ∆t . n When we sum up all expectations E x2t+1,i we get n 1 E [∆t+1 ] = (1 − 34 δ) · x2t,i + 14 δ · ∆t n i=1 Obviously,
j=1
= (1 − 34 δ) · ∆t + 14 δ · ∆t = (1 − 12 δ) · ∆t .
516
Mirosław Kutyłowski and Daniel Letkiewicz
Now, let us consider the case when Tt,i + Tt,s (or equivalently xt,i + xt,s ) is odd. Let us see how it contributes to the change of ∆t+1 compared to the value computed previously. In the simplified case, Si and Ss contribute 2 xt,i + xt,s 2 2 to the value of ∆t+1 . Now, this contribution could be 2 2 xt,i + xt,s + 1 xt,i + xt,s − 1 + . 2 2 For every y,
y+1 2
2 +
y−1 2
2
=2·
y 2 2
+
so we have to increase the value computed for ∆t+1 by at most established a link. It follows finally that n E [∆t+1 ] ≤ ρ · ∆t + 4 1 for ρ = (1 − 2 δ).
1 2 1 2
for each pair that has (1)
Claim B. After τ0 = O(log D + log n) stages, ∆τ0 ≤ α · n + 1 with probability 1 − O( n12 ). Proof. Let ∇t = ∆t − αn, for α =
1 4(1−ρ) .
By inequality (1)
E [∇t+1 ] = E [∆t+1 − αn] ≤ ρ · ∆t +
n − αn = 4
n − αn = ρ · ∇t . 4 It follows that E [∇t+1 ] ≤ ρ · E [∇t ]. Let τ0 = logρ−1 D · n2 . Then E [∇τ0 ] ≤ n−2 . So by Markov inequality, Pr[∇τ0 ≥ 1] ≤ n−2 . We conclude that Pr[∆τ0 < 1 + αn] is at least 1 − n−2 . ρ · ∇t + ραn +
3.2 Phase II
√ We assume that ∆τ0 < 1 + αn. Let β = 2 α. Let B = Bt be the set of stations Si such that |xt,i | > β, and G = Gt be the set of stations Sj such that |xt,j | = β. Claim C. |B ∪ G| < 14 n + O(1)
for each t ≥ τ0 .
Proof. The stations from B ∪ G contribute at least |B ∪ G| · β 2 = |B ∪ G| · 4α to ∆t . 1 Since ∆t ≤ ∆τ0 < αn + 1, we must have |B ∪ G| < 14 n + 4α . Now we define a potential function ∆ used to measure the size of Bt . Definition 1. For station Si ∈ Bt define x ˜i,t = xi,t − β and as a zero otherwise. Then x ˜2i,t . ∆t = i∈Bt
Computing Average Value in Ad Hoc Networks
Claim D. E ∆t+1 ≤ µ · ∆t
517
for some constant µ < 1.
Proof. We consider a single station Si ∈ Bt , with xi,t > β. By Claim C, with a constant probability it communicates with a station Sj ∈ Bt ∪ Gt . In this case, station Sj may join B, but let us consider contribution of stations Si and Sj to ∆t+1 . Let δ = β − xj,t . Then: 2 2 2 x ˜2i,t x ˜i,t − δ x˜i,t − 1 x˜i,t 2 2 . ˜j,t+1 ≤ 2 · τ1 we may assume that for no station |xi,t | ≥ β. Our goal now is to reduce the maximal value of |xi,t |. We achieve this in at most 2β − 1 subphases, each consisting of O(log n) stages: during each subphase we “cut off” one of the values that can be taken by xi,t . Always it is the smallest or the biggest value. Let V (s) = Vt (s) be the set of stations, for which xi,t takes the value s. Consider t1 > τ1 . Let l = min{xi,t1 : i ≤ n} and and g = max{xi,t1 : i ≤ n}. Assume that l + 1 < g − 1 (so, there are at least four values of numbers xi,t1 ). We show that for t = t1 + O(log n) either Vt (l) = ∅ or Vt (g) = ∅. Obviously, no station may join Vt (l) or Vt (g), so their sizes are non-increasing. Now consider a single stage. Observe that |Vt (l)∪Vt (l+1)| ≤ 12 n or |Vt (g)∪Vt (g−1)| ≤ 12 n. W.l.o.g. we may assume that |Vt (g) ∪ Vt (g − 1)| ≤ 12 n. Consider a station Si ∈ Vt (g). With a constant probability Si communicates with a station Sj that does not belong to Vt (g) ∪ Vt (g − 1). Then station Si leaves Vt (g) and Sj remains outside Vt (g). Indeed, the values xi,t and xj,t differ by at most 2, so xi,t+1 , xj,t+1 ≤ xi,t − 1. It follows that E [|Vt+1 (l)|] ≤ ψ · |Vt (l)| for some ψ < 1. We see that in a single stage we expect either |Vt (l)| or |Vt (g)| to shrink by a constant factor. Using Markov inequality as in the proof of Claim E we may then easily derive the following property:
518
Mirosław Kutyłowski and Daniel Letkiewicz
Claim F. For some T = O(log n), if t > T , then with probability 1 − O( n12 ) either the set Vτ1 +t (l) or the set Vτ1 +t (g) is empty. By Claim F, after O(β log n) = O(log n) stages we end up in the situation in which there are at most three values taken by xi,t . Even then, we may proceed in the same way as before in order to reduce the sizes of Vt (l) or Vt (g) as long as one of these sets has size Ω(n). So we can derive the following claim which concludes the proof of Theorem 1: Claim G. For some T = O(log n), for τ2 = τ1 + T and t ≥ τ2 with probability 1 − O( n12 ) either xi,t takes only two values, or there are three values and the number of stations holding the smallest and the largest values is at most γ · n.
4 Properties of the Protocol and Discussion Changes in the Network. By a simple examination of the proof we get the following additional properties: – the result holds even if a constant fraction of messages is lost. This only increases the number of stages by a constant factor. – if some number of stations goes down during the execution of the protocol, then the final values do not concentrate around the average value of the original values of the stations that have survived, but anyway they differ by at most 2. If a new station joins the network and its value deviates from the values hold by the rest of the stations, then we may proceed with the same analysis. Conclusions regarding the rate of convergence and the rate at which new stations emerge can be derived as before. Energy Efficiency. Time complexity is not the most important complexity measure for mobile networks. Since the devices are powered by batteries, it is important to design algorithms that are energy efficient (otherwise, the network may fail due to exhaustion of batteries). The main usage of energy is for transmitting messages and listening to the communication channel. Energy usage of internal computations and sensors is substantially smaller and can be neglected. Surprisingly, for transmitting and listening comparable amounts of energy are necessary. A properly designed algorithm should require a small number of messages (not only messages sent, but also messages awaited by the stations). Additionally, the differences between the number of messages sent or awaited by different stations should be as small as possible. The reason is that with similar energy resources no station should be at higher risk of going down due to energy exhaustion. In out algorithm energy usage of each station is Θ(log n). This is optimal since we need that many sending trials to ensure that its value has been transmitted successfully with high probability in the presence of constant probability of transmission failures. Protocol Extensions – Getting Exactly One Value. Our algorithm leaves the network in a state in which there might be 3 different values. The bound from Theorem 1 regarding behavior of the algorithm cannot be improved. Indeed, consider the following example:
Computing Average Value in Ad Hoc Networks
519
assume that initially exactly one station holds value T − 1, exactly one station holds T + 1, and the rest has value T . Then in order to get into the state when all stations get value T we need that the station with T − 1 communicates with the station with value T + 1. However, probability that it happens during a single stage is Θ(1/N ). Therefore, probability that these two station encounter each other within logarithmic number of stages is O(log N/N ). Once we are left with the values that differ from the average value by less than two, it is quite reasonable to start a procedure of computing the minimum over all active stations. In fact, it suffices to redefine part 3 of a stage: instead of computing the average of two values both stations are assigned the smaller of their two values. Arguing as in Claim E, we may easily show that after O(log n) stages with high probability all stations know the minimum. If each station knows the number n of the active stations, a simple trick may be applied to compute the average value exactly. At the beginning, each value is multiplied by n. Then the average value becomes s = Ti and it is an integer. So after executing the protocol we end up in the second situation described in Theorem 1 or all stations hold the same value. In order to get rid of the first situation a simple protocol may broadcast the minimal and maximal values to all stations within O(log N ) steps. Then all stations may find s and thereby the average s/n. Dynamic Processes. For a dynamic process, in which the the values considered are changing (think for instance about the output of sensors), we may observe that the protocol works quite well. For the sake of discussion assume that the values may only increase. If we wish to ignore the effect of increments of the values we may think about old units of the values existing at the beginning of the protocol, and the new ones due to incrementing the values. When executing part 3 of a stage and “allocating” the units to stations A and B we may assume that first the same (up to 1) amount of old units is given to A and B and afterwards the new units are assigned. In this way, the new units do not influence the behavior of the old ones. So a good distribution of the old units will be achieved as stated by Theorem 1 despite the fact that the values have changed. Security Issues. Since the stations exchange information at random moments an adversary which only disturbs communication can only slow down the rate of convergence. However, if it knows that there is a station with a value X that differs a lot from the average, it may increase its chances a little bit: if X has not occurred so far during a stage, then it might be advantageous for the adversary to scramble the rest of the stage making sure that X remains untouched. Of course, serious problems occur when an adversary can fake messages of the legitimate stations. If the legitimate stations have a common secret, say K, then the problems can be avoided. Faking messages becomes hard, when the messages are secured with MAC code using K. In order to avoid the first problem it is necessary to encipher all messages (together with random nounces). In this case an adversary cannot say which values are exchanged by the algorithm. The only information that might be derived is the fact that somebody has transmitted at a given time. But this seems not to bring any substantial advantage, except that then it could be advantageous to attack the third step
520
Mirosław Kutyłowski and Daniel Letkiewicz
of a round. Encryption with a symmetric algorithm should be no problem regarding speed differences between transmission and internal computations.
Acknowledgment We thank Artur Czumaj for some ideas developed together and contained in this paper.
References 1. Chlebus, B. S.: Randomized communication in radio networks. A chapter in “Handbook on Randomized Computing” (P. M. Pardalos, S. Rajasekaran, J. H. Reif, J. D. P. Rolim, Eds.), Kluwer Academic Publishers, 2001, vol. I, 401-456 2. Czumaj, A., Kanarek, P., Kutyłowski, M., and Lory´s, K.: Distributed stochastic processes for generating random permutations. ACM-SIAM SODA’99, 271-280 3. Czumaj, A., Kutyłowski, M.: Generating random permutations and delayed path coupling method for mixing time of Markov chains. Random Structures and Algorithms 17 (2000), 238–259 4. Gosh, B., Muthukrishnan, S.: Dynamic Load Balancing in Parallel and Distributed Networks by Random Matchings. JCSS 53(3) (1996), 357–370 5. Jurdzi´nski, T., Kutyłowski, M., Zatopia´nski, J.: Energy-Efficient Size Approximation for Radio Networks with no Collision Detection. COCOON’2002, LNCS 2387, Springer-Verlag, 279-289 6. L. Lamport, R. Shostak and M. Pease: The Byzantine Generals Problem. ACM TOPLAS 4 (1982), 382-401 7. I. Stojmenoviˇc (Ed.): Handbook of Wireless Networks and Mobile Computing, Wiley, 2002 8. E. Upfal: Design and Analysis of Dynamic Processes: A Stochastic Approach, ESA’1998, LNCS 1461, Springer-Verlag, 26–34
A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences of Basic Parallel Processes Sławomir Lasota Institute of Informatics, Warsaw University, Poland [email protected]
Abstract. A polynomial-time algorithm is presented to decide distributed bisimilarity of Basic Parallel Processes. As a direct conclusion, several other noninterleaving semantic equivalences are also decidable in polynomial time for this class of process, since they coincide with distributed bisimilarity.
1 Introduction One important problem in the verification of concurrent systems is to check whether two given systems P and Q are equivalent under a chosen notion of equivalence. For process algebras generating infinite-state systems the equivalence checking problem cannot be decidable in general, therefore restricted classes of processes have been defined and investigated. We study here the class of Basic Parallel Processes [9] (BPP), an extension of recursively defined finite-state systems by parallel composition. Strong bisimilarity [25] is a well accepted behavioural equivalence, which often remains decidable for infinite-state systems. An elegant proof of decidability of bisimilarity for BPP and even for BPPτ , extension of BPP by communication, was given in [10]. The PSPACE lower bound has been recently proved in [26], followed by the PSPACEcompleteness result of Janˇcar [18]. On the other hand, all other equivalences in van Glabbeek’s spectrum are undecidable [17]. BPP is the natural class of processes to investigate non-interleaving equivalences, intended to capture true concurrent computations of a system. One of the bisimulationlike non-interleaving equivalences is distributed bisimilarity [6], taking into account spatial distribution of a process. Already in [8] distributed bisimilarity was shown to be decidable on BPPτ by means of a sound and complete tableau proof system. Concerning complexity, the tableau depth was only bounded exponentially. In this paper we design a polynomial-time decision procedure for distributed bisimilarity. It strongly uses a polynomial-time algorithm for deciding strong bisimilarity on normed BPP processes proposed in [16]. Distributed bisimilarity is therefore very likely to be computationally more feasible than interleaving bisimilarity, in the light of the recent PSPACE lower bound for the latter one by Srba [26]. Further interesting conclusions follow from the fact that many non-interleaving equivalences coincide on BPP. As mentioned in [12], Kiehn proved [21] that location equivalence [7], causal equivalence [11] and distributed bisimilarity all coincide
A part of this work has been performed during the post-doc stay at Laboratoire Specification et Verification, ENS Cachan. Partially supported by the KBN grant 7 T11C 002 21 and the EC Research Training Network “Games and Automata for Synthesis and Validation” (GAMES).
B. Rovan and P. Vojt´asˇ (Eds.): MFCS 2003, LNCS 2747, pp. 521–530, 2003. c Springer-Verlag Berlin Heidelberg 2003
522
Sławomir Lasota
on CPP, a sublanguage of BPPτ without explicit τ , hence also on BPP. Furthermore, causal equivalence and history preserving bisimilarity [13] coincide on BPP by the result of Aceto [1]; moreover, Fr¨oschle showed coincidence of distributed and history preserving bisimilarity on BPP [12]. The coincidence with performance equivalence, a timed bisimulation equivalence proposed by [14], has been shown in [23], to complete the picture1 . As a direct conclusion from all these results, all the mentioned equivalences can be also decided in polynomial time on BPP. Related results are [15] decision procedures for causal equivalence, location equivalence and ST-bisimulation equivalence of BPPτ as well as for their weak versions on a subset of BPPτ . However, complexity issues were not addressed there. Furthermore, polynomial-time complexity of performance equivalence extends the result of [5] shown only for timed BPP in a full standard form [9]. Surprisingly, polynomial-time complexity of history preserving bisimilarity on BPP can be contrasted with the EXPTIME-completeness on finite-state systems (finite 1-safe nets) [19]. Similarly, decidability of hereditary history preserving bisimilarity [4] on BPP, proved in [12], can be contrasted with undecidability on finite 1-safe nets shown by Jurdzi´nski and Nielsen in [20]. We start by Section 2 containing definitions and some basic facts and then we outline our algorithm in Section 3. The algorithm works for BPP processes in standard form, similarly as in [9]. A polynomial-time preprocessing procedure transforming a process into standard form can be found in the full version of the paper [24].
2 Basic Definitions and Facts Let Act be a finite set of actions, ranged over by a, b, etc. and let Const be a finite set of process constants, ranged over by X, Y , etc. The set of BPP process expressions [9] over Act and Const is given by: Pi | P P | P P (1) P ::= 0 | X | a.P | i∈I
where 0 denotes the empty process, a. is an action prefix, i∈I Pi is a finite nondeterministic choice for a finite nonempty set I and stands for a parallel composition. The only operator not present in CCS [25] is the left merge , which differs from the parallel composition only in that the very first action must be performed in the left argument. The purpose of considering here is the standard form (3) below. A BPP process definition ∆ consists of a finite set Act(∆) of actions, a finite set Const(∆) of constants and a finite number of recursive process equations X = P, def
one for each constant X ∈ Const(∆), where P is a process expression over Act(∆) and Const(∆). Sets Const(∆) and Act(∆) are often even not mentioned explicitly, as they can be deduced from process equations. In the sequel we shall assume that a 1
Related results are [2], where Aceto proved that distributed bisimilarity coincides with timed bisimilarity on BPP without recursion, and [22], where decidability of strong bisimilarity for timed BPP was shown, that generalizes the result of [10].
A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences
523
process definitions that our algorithm inputs is guarded, i.e., that each occurrence of a constant on the right-hand side is within the scope of an action prefix. For instance, def X = a.Y X + b.Z is not guarded. By a BPP process we mean a pair (P, ∆), where ∆ is a BPP process definition and P is a BPP process expression over Act(∆) and Const(∆). When ∆ is evident from the context, P itself is called a process too. Distributed bisimilarity was introduced in [6], but here we follow [9]. Given a BPP process definition ∆, consider the following SOS transition rules: a
P → [P , P ]
(X = P ) ∈ ∆ def
a
X → [P , P ] a
Pj → [P , P ] for some j∈I a i∈I Pi → [P , P ] a
P → [P , P ] a
P Q →
a
a.P → [P, 0] a
P → [P , P ] a
P Q → [P , P Q] a
(2)
Q → [Q , Q ]
[P , P Q]
a
P Q → [Q , P Q ]
a
We write P → [P , P ] if this transition can be derived from the above rules. The rules a reflect a view on a process as distributed in space. Each transition P → [P , P ] gives rise to a local derivative P , which intuitively records a location at which the action is observed, and a concurrent derivative P , recording the part of the process separated from the local component. BPP processes (P1 , ∆1 ) and (P2 , ∆2 ) are distributed bisimilar, denoted (P1 , ∆1 ) ∼ (P2 , ∆2 ) if they are related by some distributed bisimulation R, i.e., a binary relation over BPP process expressions such that whenever (P, Q) ∈ R, for each a ∈ Act, a
a
– if P → [P , P ] then Q → [Q , Q ] for some Q , Q such that (P , Q ) ∈ R and (P , Q ) ∈ R, a a – if Q → [Q , Q ] then P → [P , P ] for some P , P such that (P , Q ) ∈ R and (P , Q ) ∈ R. In the next section we prove polynomial-time complexity of the problem of checking distributed bisimilarity for a given pair of constants. We do not lose generality, as checking P ∼Q for arbitrary P, Q is equivalent to checking XP ∼XQ , where XP and def def XQ are new fresh constants with defining equations: XP = a.P and XQ = a.Q, for arbitrary a. Moreover, w.l.o.g. we assume that both constants share a process definition. Problem: D ISTRIBUTED BISIMILARITY FOR BPP Instance: A BPP process definition ∆ and X, Y ∈ Const(∆) Question: (X, ∆)∼(Y, ∆) ? Christensen presented in [8] a sound and complete tableau proof system for ∼ on BPPτ and proved an exponential upper bound for the depth of a tableau. Theorem 1 ([8, 9]). Distributed bisimilarity is decidable on BPP.
524
Sławomir Lasota
Christensen [9] showed also that each BPP process definition ∆ can be effectively transformed into an equivalent process definition ∆ in standard form, i.e., consisting exclusively of process equations in the restricted form def X= (ai .Pi )Qi , (3) i∈I
where all Pi and Qi are merely a parallel composition of constants, i.e., of the form X1 X2 . . .Xn ,
for n > 0 and X1 , . . . , Xn ∈ Const(∆ ).
(4)
Note that (3) is not guarded in general. We omit brackets in (4) as is associative and commutative w.r.t. ∼ (and w.r.t. any other known semantical equivalence, in fact). A parallel composition of constants (4) is called basic process (expression) in the sequel. Observe that processes Pi in (3) are precisely local derivatives of X and processes Qi are precisely its concurrent derivatives. Hence the left merge operator allows here to syntactically separate both derivatives. Consequently, in the next section we will only consider basic processes since both local and concurrent derivatives of a basic process are basic again. Since ∆ is guarded, the process definition ∆ produced by Christensen’s transformation is bounded in the following sense: Definition 1. Define a binary relation over Const(∆ ) as follows: Y ≺1 X iff Y appears in some concurrent derivative of X. We say that ∆ is bounded if the transitive + closure ≺+ 1 of ≺1 is irreflexive, i.e., no constant satisfies X≺1 X.
Christensen’s transformation produces ∆ that contains only basic processes consisting of at most three constants. The price for this constant bound is the exponential size of ∆ w.r.t. the size of ∆, defined as the number of process equations in ∆ plus the sum of lengths of right-hand sides. This is why we proved (Theorem 2 below) that the transformation to the standard form (3) can be done in polynomial time.
Theorem 2. There exists a polynomial-time algorithm that transforms a guarded process definition ∆ into a process definition ∆ such that: 1. ∆ is in standard form (3), 2. ∆ is bounded, 3. Const(∆) ⊆ Const(∆ ) and for each X ∈ Const(∆), (X, ∆)∼(X, ∆ ). (The proof is given in the full version of this paper [24].) a → P obtained Strong bisimilarity is defined w.r.t. single-derivative transitions P − by the rules (2) when the distribution of a process is ignored – for details we refer eg. to [25]. A remarkable result is that strong bisimilarity can be decided for BPP processes in polynomial time [16], but only when all the constants are normed. A constant, or generally a process P , is normed if an inactive process is reachable from P , i.e., if there a1 an a2 P1 −→ . . . −− → Pn , n ≥ 0, such that there are is a finite sequence of transitions P −→ no further transition from Pn . The norm of P is the length of the shortest such sequence. Strong bisimilarity is less restrictive than distributed bisimilarity. Hence a process definition ∆ can be transformed into a strong bisimilarity equivalent process definition ∆ in a more restrictive than (3) full standard form [9]. Full standard form admits exdef clusively process equations in the form: X = i∈I ai .Pi , where all Pi are basic again. Theorem 3 ([16]). There exists a polynomial-time algorithm to decide strong bisimilarity on normed BPP in full standard form.
A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences
525
3 Algorithm Throughout this section we fix ∆, and hence also sets of constants and actions. ∆ is assumed to be in standard form and bounded. A more succinct representation is possible for ∆ in standard form. Namely, due to associativity and commutativity of , it is sufficient to remember the number of occurrences of each constant in each basic expression Pi and Qi in the right-hand side of all equations (3) in ∆, encoded in binary. In this section, complexity is measured w.r.t. the size of ∆ defined as the number of process equations plus the sum of lengths of the succinct representations of the right-hand sides. Theorem 3 is still valid for this definition of size. The reachability relation over process expressions is defined as the smallest transitive relation such that each P is related to all its local and concurrent derivatives, i.e, a whenever P → [P , P ], the relation contains pairs (P, P ) and (P, P ). We say that Q is reachable from P if the pair (P, Q) is in the reachability relation. Let us denote by P the set of all process expressions reachable from the constants. As observed previously, all P ∈ P are basic. Unless stated otherwise, all the relations mentioned below are binary relations over P. This includes also ∼, which is restricted now to P only. Unless explicitly stated otherwise, P , Q, etc., range over P. An exponential-time algorithm can be easily derived as follows. First define two monotonic operators as follows. The operator B2 acts on a pair of relations as follows: (P, Q) ∈ B2 (R , R ) iff for each a ∈ Act, a
a
– if P → [P , P ] then Q → [Q , Q ] for some Q , Q such that (P , Q ) ∈ R and (P , Q ) ∈ R , a a – if Q → [Q , Q ] then P → [P , P ] for some P , P such that (P , Q ) ∈ R and (P , Q ) ∈ R . Then the operator B1 is defined by B1 (R) := B2 (R, R). Now, define the approximating equivalences ∼i as follows: – P ∼0 Q for all P and Q, – ∼i+1 := B1 (∼i ), for i ≥ 0. Distributed bisimulations R are exactly the post-fixed points of B1 , i.e., are defined by R ⊆ B1 (R). Hence ∼ being the union of all distributed bisimulations is then the greatest fixed point of B1 , by the Knaster-Tarski theorem. Recall that BPP is imagefinite, that is to say each process expression has only finitely many local and concurrent derivatives. Thus by a standard argument the decreasing chain {∼i } converges to ∼: ∼= ∼i . (5) i∈N
Furthermore, observe that each local derivative of a basic process X1 . . .Xn is a local derivative of some Xj , i.e., is equal to some process Pi appearing in a process equation (3) in ∆. In consequence, the number of local derivatives of all basic processes is polynomial. Let us denote the set of all those by L. Moreover, there are only exponentially many processes reachable from each P ∈ L – this follows easily from boundedness of ∆. Consequently, the cardinality N of the whole P is exponential. Hence ∼
526
Sławomir Lasota
can be computed over P in exponential time, e.g., as the limit of the sequence {∼i } of equivalences, since the sequence stabilizes after at most N −1 steps. We have not focused on details of the exponential-time algorithm, as in the rest of this section we argue that one can do better: the problem can be solved in polynomial time. Essentially, this is possible due to a ”quicker” convergence to the greatest fixed point, as explained in Lemma 2 below and thereafter. Then, in the crucial Lemma 4 we reduce distributed bisimilarity to strong bisimilarity between normed processes. To this aim we incorporate local derivatives into actions and obtain single-derivative transitions. We start by a couple of definitions and simple facts. Definition 2. Given a binary relation S, a distributed bisimulation w.r.t. S is any binary relation R such that R ⊆ B2 (S, R). P and Q are distributed bisimilar w.r.t. S, denoted P ∼S Q, if they are related by some distributed bisimulation w.r.t. S. Definition 3. We say that a relation R is a distributed bisimilarity w.r.t. itself if R = ∼R . Let ≈ denote the greatest distributed bisimilarity w.r.t. itself. A relation is a distributed bisimilarity w.r.t. itself precisely if it is a fixed point of the monotonic mapping R → ∼R . Hence the greatest distributed bisimilarity w.r.t. itself always exists. Lemma 1. ∼ and ≈ coincide. Proof. For one inclusion, recall that ∼ is the union of all distributed bisimulations while ∼∼ is the union of all distributed bisimulations w.r.t. ∼. Since each distributed bisimulation is a distributed bisimulation w.r.t. ∼, we have ∼ ⊆ ∼∼ , i.e., ∼ is a post-fixed point of the mapping R → ∼R . As ≈ is the greatest fixed point of that mapping, we obtain ∼ ⊆ ≈. For the other inclusion, assume a relation S is a distributed bisimilarity w.r.t. itself, S = ∼S . Since ∼S is the union of all distributed bisimulations w.r.t. S, it is the (greatest) fixed point of the monotonic mapping R → B2 (S, R), i.e., ∼S = B2 (S, ∼S ). Substituting S in place of ∼S we get S = B2 (S, S), i.e., S is a fixed point of B1 . Hence S ⊆ ∼. As S was chosen arbitrary, we showed that each distributed bisimilarity w.r.t. itself is included in ∼. Hence also ≈ does, i.e., ≈ ⊆ ∼. 2 We proved that ≈ is just another formulation of ∼. But ≈ gives rise to another sequence of approximating equivalences {≈i } that converges more rapidly than {∼i }, namely after a polynomial number of iterations: Lemma 2. ≈ = i∈N ≈i , where the sequence {≈i } is defined by: – P ≈0 Q for all P and Q, – ≈i+1 := ∼≈i . Proof. Obviously ≈ ⊆ i∈N ≈i , so we only need to show the opposite inclusion. Similarly as ∼, ∼S is the greatest fixed point of the monotonic mapping R → B2 (S, R), for any fixed S. So ∼S is also a limit of a decreasing sequence of approximations: ∼Si , (6) ∼S = i∈N
where relations ∼Si are defined by:
A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences
527
– P ∼S0 Q for all P and Q, – ∼Si+1 := B2 (S, ∼Si ). Having this, by an easy induction we show ≈i ⊆ ∼i , for all i ≥ 0. As the induction assumption suppose ≈i ⊆ ∼i , for a fixed i ≥ 0. Now substitute ≈i in place of S in (6) and in the definition of ∼Sj , j ≥ 0. Due to monotonicity of B2 we derive by another ≈i i easy induction on j ≤ i that ∼≈ j+1 = B2 (≈i , ∼j ) ⊆ B2 (∼j , ∼j ) = ∼j+1 , since i ≈i ⊆ ∼i ⊆ ∼j by the induction assumption and by j ≤ i. Hence ∼≈ i+1 ⊆ ∼i+1 . ≈i ≈i By (6) we know that ≈i+1 = ∼ ⊆ ∼i+1 , hence we conclude ≈i+1 ⊆ ∼i+1 . This completes the induction step. 2 Having ≈i ⊆ ∼i , for all i ≥ 0, we apply Lemma 1 and (5). Equipped with Lemmas 1 and 2, we are ready to describe the polynomial-time algorithm. Recall that L denotes the set of all local derivatives. The algorithm consists of two phases as outlined in the figure below. By R∩(L×L) we mean here the restriction of a relation R to pairs from L. P HASE 1: let 0 := L×L REPEAT FOR n = 0, 1, . . . compute n+1 ⊆ L×L as follows: n+1 := ∼n ∩(L×L) UNTIL n = n+1 P HASE 2:
decide whether X ∼n Y
The first phase is a crucial one, but amounts simply to computing an initial part of {≈i } up to the position n where it eventually stabilizes. The trick is that ≈i is computed only for local derivatives. Then, in the second phase we only need to check whether the input pair (X, Y ) belongs to ∼≈n . Assuming that the first phase of the algorithm terminates, the outcome ∼n coincides with ≈. Lemma 3. If n = n+1 then ∼n = ≈. Proof. Assuming n = n+1 , we will show ∼n ⊆ ≈ and ∼n ⊇ ≈. For both inclusions we will silently use an obvious fact: when two relations S1 and S2 coincide on L × L, i.e., S1 ∩(L×L) = S2 ∩(L×L), then ∼S1 = ∼S2 . For ∼n ⊆ ≈, it is sufficient to show that ∼n is a distributed bisimilarity w.r.t. itn n = ∼∼ ∩(L×L) = ∼n+1 = ∼n . self. Indeed, ∼∼ n For ∼ ⊇ ≈, we show by induction that for all i ≥ 0, (a) i = ≈i ∩(L×L), (b) ∼i = ≈i+1 . For i = 0 it is obvious, so assume i > 0 and (a) and (b) hold for i−1. We prove (b)
(a) first: i = ∼i−1 ∩(L×L) = ≈i ∩(L×L). Then (b) follows easily from (a): (a)
∼i = ∼≈i ∩(L×L) = ∼≈i = ≈i+1 . Now ∼n ⊇ ≈ follows from (b) since ≈n+1 ⊇ ≈, by Lemma 2.
2
528
Sławomir Lasota
Termination of the first phase of the algorithm after a polynomial number of iterations of the main loop is guaranteed as the sequence {i } is non-increasing: 0 ⊇ 1 ⊇ . . . , and each i contains only polynomially many pairs. What we still need to show is that the single iteration of the loop body, i.e., computation of i+1 from i , can be done in polynomial time. To this aim we will prove the following Lemma 4. In the proof we will profit from Theorem 3. Lemma 4. Let S ⊆ L×L be an equivalence such that there exists a polynomial-time algorithm (w.r.t. the size of ∆) to decide whether (P, Q) ∈ S, for given P, Q ∈ L. Then there exists a polynomial-time algorithm to decide P ∼S Q, for given P , Q ∈ P. Proof. As the first stage, we construct from ∆ a new process definition ∆ in the full standard form, equivalent to ∆ in the following sense: P, Q ∈ P, (P, ∆) ∼S (Q, ∆)
iff (P, ∆ ) is strongly bisimilar to (Q, ∆ ).
(7)
The construction of ∆ is as follows: Const(∆ ) := Const(∆)
Act(∆ ) := Act(∆) × L/S,
where L/S denotes the set of equivalence classes of S and can be computed in polynomial time. Furthermore, whenever ∆ contains a process equation def X= (ai .Pi )Qi , (8) i∈I
∆ contains
X= def
i∈I
(ai , [Pi ]S ).Qi ,
(9)
where [Pi ]S denotes the equivalence class of Pi in S. Having this, (7) is clear, by the very definitions of the two bisimilarities involved. Now, the crucial point is that ∆ is always normed, since ∆ is bounded. We have Y ≺1 X (cf. Section 2) iff Y appears in some Qi on the right-hand side of process equation (8) defining X. As the transitive closure ≺+ 1 is irreflexive, the following equations (10) and (11) are well-defined and give the norm of all the constants. First, for each constant X defined by (9) in ∆ , norm(X) = 1 + min{norm(Qi )}. i∈I
(10)
Second, the norm is additive w.r.t. parallel composition: norm(P Q) = norm(P ) + norm(Q),
(11)
and this implies that the norm of each concurrent derivative Qi in equation (10) is the sum of norms of all parallel components. Now we apply Theorem 3 to ∆ and by (7) get a polynomial-time procedure to decide ∼S . 2 Evidently each i is an equivalence, hence the lemma applies and we conclude that the body of the main loop of the first phase requires only polynomial time: it amounts to invoking the decision procedure for strong bisimilarity on normed BPP polynomially many times, since the set L×L has polynomial cardinality. By Lemma 4 the second phase can be computed in polynomial time as well. Correctness of the algorithm follows by Lemmas 3 and 1. This completes the proof of the following:
A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences
529
Theorem 4. There exists a polynomial-time algorithm to decide distributed bisimilarity for BPP processes.
4 Final Remarks We have proposed a polynomial-time decision procedure for distributed bisimilarity on BPP. As mentioned in the introduction, many non-interleaving equivalences coincide on BPP. Therefore, we directly conclude from Theorem 4: Corollary 1. There exists a polynomial-time algorithm to decide the following equivalences for BPP processes: location equivalence, causal equivalence, history preserving equivalence, performance equivalence. Consider BPPτ , an extension of BPP with communication between parallel components expressed by one additional rule: a
P → [P , P ] τ
a ¯
Q → [Q , Q ]
P Q → [P Q , P Q ]
.
(12)
A local derivative of a τ -transition can be composed of two local derivatives of parallel components. Hence local derivatives cannot be encoded directly into actions, and the reduction of distributed bisimilarity to strong bisimilarity in the proof of Lemma 4 fails. A crucial ingredient of our decision procedure is the polynomial-time transformation of a process definition to the standard form, described in the full version of this paper [24]. It is different from the transformation proposed by Christensen in [9], since the process definition in standard form yielded by the latter is of exponential size. Our algorithm needs Θ(n2 ) calls to the polynomial-time algorithm of [16] in each iteration in the first phase, where n stands for the size of ∆. At most n iterations are needed, since all i are equivalences, and therefore total cost is Θ(n3 ) calls to the procedure of [16]. On the other hand, P-completeness of the problem follows easily since it subsumes strong bisimilarity for finite-state systems and the latter is P-complete [3]. An interesting continuation of the work would be to develop a more efficient direct algorithm, not referring to the procedure of [16].
Acknowledgements The author is very grateful to Philippe Schnoebelen for many fruitful discussions.
References 1. L. Aceto. History preserving, causal and mixed-ordering equivalence over stable event structures. Fundamenta Informaticae, 17:319–331, 1992. 2. L. Aceto. Relating distributed, temporal and causal observations of simple processes. Fundamenta Informaticae, 17:369–397, 1992. 3. J. Balc´azar, J. Gabarr´o, and M. S´antha. Deciding bisimilarity is P-complete. Formal Aspects of Computing, (6A):638–648, 1992.
530
Sławomir Lasota
4. M. Bednarczyk. Hereditary history preserving bisimulation or what is the power of the future perfect in program logics. Technical report, Polish Academy of Sciences, Gda´nsk, 1991. 5. B. B´erard, A. Labroue, and P. Schnoebelen. Verifying performance equivalence for timed Basic Parallel Processes. In Proc. FOSSACS’00, LNCS 1784, pages 35–47, 2000. 6. I. Castellani. Bisimulations for Concurrency. PhD thesis, University of Edinburg, 1988. 7. I. Castellani. Process algebras with localities. In J. Bergstra, A. Ponse, S. Smolka, eds., Handbook of Process Algebra, chapter 15, pages 945–1046, 2001. 8. S. Christensen. Distributed bisimilarity is decidable for a class of infinite state systems. In Proc. 3th Int. Conf. Concurrency Theory (CONCUR’92), LNCS 630, pages 148–161, 1992. 9. S. Christensen. Decidability and Decomposition in process algebras. PhD thesis, Dept. of Computer Science, University of Edinburgh, UK, 1993. 10. S. Christensen, Y. Hirshfeld, and F. Moller. Bisimulation equivalence is decidable for Basic Parallel Processes. In Proc. CONCUR’93, LNCS 713, pages 143–157, 1993. 11. P Darondeau and P. Degano. Causal trees. In Proc. ICALP’89, LNCS 372, pages 234–248, 1989. 12. S. Fr¨oschle. Decidability of plain and hereditary history-preserving bisimulation for BPP. In Proc. EXPRESS’99, volume 27 of ENTCS, 1999. 13. R. van Glabbeek and U. Goltz. Equivalence notions for concurrent systems and refinement of actions. In Proc. MFCS’89, LNCS 379, pages 237–248, 1989. 14. R. Gorrieri, M. Roccetti, and E. Stancampiano. A theory of processes with durational actions. Theoretical Computer Science, 140(1):73–94, 1995. 15. M. Hennessy and A. Kiehn. On the decidability of non-interleaving process equivalences. In Proc. 5th Int Conf. Concurrency Theory (CONCUR’94), pages 18–33, 1994. 16. Y. Hirshfeld, M. Jerrum, and F. Moller. A polynomial time algorithm for deciding bisimulation equivalence of normed basic parallel processes. Mathematical Structures in Computer Science, 6:251–259, 1996. 17. H. H¨uttel. Undecidable equivalences for basic parallel processes. In Proc. TACS’94, LNCS 789, pages 454–464, 1994. 18. P. Janˇcar. Bisimilarity of basic parallel processes is PSPACE-complete. In Proc. LICS’03, to appear, 2003. 19. L Jategaonkar and A. R. Meyer. Deciding true concurrency equivalences on safe, finite nets. Theoretical Computer Science, 154:107–143, 1996. 20. M. Jurdzi´nski and M. Nielsen. Hereditary history preserving bisimilarity is undecidable. In Proc. STACS’00, LNCS 1770, pages 358–369, 2000. 21. A. Kiehn. A note on distributed bisimulations. Unpublished draft, 1999. 22. S. Lasota. Decidability of strong bisimilarity for timed BPP. In Proc. 13th Int. Conf. on Concurrency Theory (CONCUR’02), LNCS 2421, pages 562–578. Springer-Verlag, 2002. 23. S. Lasota. On coincidence of distributed and performance equivalence for Basic Parallel Processes. http://www.mimuw.edu.pl/˜sl/papers/, unpublished draft, 2002. 24. S. Lasota. A polynomial-time algorithm for deciding true concurrency equivalences of Basic Parallel Processes. Research Report LSV-02-13, LSV, ENS de Cachan, France, 2002. 25. R. Milner. Communication and Concurrency. Prentice Hall, 1989. 26. J. Srba. Strong bisimilarity and regularity of Basic Parallel Processes is PSPACE-hard. In Proc. STACS’02, LNCS 2285, 2002.
Solving the Sabotage Game Is PSPACE-Hard Christof L¨ oding and Philipp Rohde Lehrstuhl f¨ ur Informatik VII, RWTH Aachen {loeding,rohde}@informatik.rwth-aachen.de
Abstract. We consider the sabotage game as presented by van Benthem. In this game one player moves along the edges of a finite multigraph and the other player takes out a link after each step. One can consider usual algorithmic tasks like reachability, Hamilton path, or complete search as winning conditions for this game. As the game definitely ends after at most the number of edges steps, it is easy to see that solving the sabotage game for the mentioned tasks takes at most PSPACE in the size of the graph. In this paper we establish the PSPACE-hardness of this problem. Furthermore, we introduce a modal logic over changing models to express tasks corresponding to the sabotage games and we show that model checking this logic is PSPACE-complete.
1
Introduction
In some fields of computer science, especially the controlling of reactive systems, an interesting sort of tasks arises, which consider temporal changes of a systems itself. In contrast to the usual tasks over reactive systems, where movements within a system are considered, an additional process affects: the dynamic change of the system itself. Hence we have two different processes: a local movement within the system and a global change of the system. Consider, for example, a network where connections or servers may break down. Some natural questions arise for such a system: is it possible – regardless of the removed connections – to interchange information between two designated servers? Is there a protocol which guarantees that the destination can be reached? Another example for a task of this kind was recently given by van Benthem [1], which can be described as the real Travelling Salesman Problem: is it possible to find your way between two cities within a railway network where a malevolent demon starts cancelling connections? As usual one can model such kind of reactive system as a two-person game, where one player tries to achieve a certain goal given by a winning condition and the other player tries to prevent this. As winning conditions one can consider algorithmic tasks over graphs as, e.g., reachability, Hamilton path, or complete search. Determining the winner of these games gives us the answers for our original tasks. In this paper we show that solving sabotage games where one player (the Runner ) moves along edges in a multi-graph and the other player (the Blocker ) removes an edge in each round is PSPACE-hard for the three mentioned winning B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 531–540, 2003. c Springer-Verlag Berlin Heidelberg 2003
532
Christof L¨ oding and Philipp Rohde
conditions. The main aspect of the sabotage game is that the Runner can only act locally by moving one step further from his actual position whereas the Blocker has the possibility to behave globally on the arena of the game. So the sabotage game is in fact a match between a local and a global player. This distinguishes the sabotage game from the classical games that are studied in combinatorial game theory (see [2] for an overview). In Sect. 2 we introduce the basic notions of the sabotage game. In Sect. 3 we show the PSPACE-hardness for the sabotage game with the reachability condition on undirected graphs by giving a polynomial time reduction from the PSPACE-complete problem of Quantified Boolean Formulas to these games. In Sect. 4 we give polynomial time reductions from sabotage games with reachability condition to the other winning conditions. In the last section we introduce the extension SML of modal logic over transitions systems which captures the concept of removing edges, i.e., SML is a modal logic over changing models. We give the syntax and the semantics of SML and provide a translation to first order logic. By applying the results of the first part we will show that model checking for this logic is PSPACE-complete. We would like to thank Johan van Benthem and Peter van Emde Boas for several ideas and comments on the topic.
2
The Sabotage Game
In this section we give the definition of the sabotage game and we repeat three algorithmic tasks over graphs which can be considered as winning conditions for this game. A multi-graph is a pair (V, e) where V is a non-empty, finite set of vertices and e : V × V → N is an edge multiplicity function, i.e., e(u, v) denotes the number of edges between the vertices u and v. e(u, v) = 0 means that u and v are not connected. In case of an undirected graph we have in addition e(u, v) = e(v, u) for all u, v ∈ V . A single-graph is given by a multiplicity function with e(u, v) ≤ 1 for all vertices u, v ∈ V . The size of a multi-graph (V, e) is given by |V | + |E|, where we set |E| := u,v∈V e(u, v) for directed graphs and |E| := 12 u,v∈V e(u, v) for undirected graphs. Let (V, e0 ) be a multi-graph and v0 ∈ V be an initial vertex. The two-person sabotage game is played as follows: initially the game arena is A0 = (V, e0 , v0 ). The two players, which we call Runner and Blocker, move alternatingly, where the Runner starts his run from vertex v0 . At the start of round n the Runner moves one step further along an existing edge of the graph, i.e., if vn is his actual position, he chooses a vn+1 ∈ V with en (vn , vn+1 ) > 0 and moves to vn+1 . Afterwards the Blocker removes one edge of the graph, i.e., he chooses two vertices u and v somewhere in the graph with en (u, v) > 0. In the directed case we define en+1 (u, v) := en (u, v) − 1 and en+1 (·, ·) := en (·, ·) otherwise. In the undirected case we let en+1 (u, v) := en+1 (v, u) := en (u, v) − 1. The multi-graph An+1 = (V, en+1 , vn+1 ) becomes the arena for the next round. The game ends, if either the Runner cannot make a move, i.e., there is no link starting from his actual position or if the winning condition is fulfilled.
Solving the Sabotage Game Is PSPACE-Hard
533
As winning conditions for the sabotage game on an undirected or directed graph one can consider the usual tasks over graphs, for example: 1. Reachability: the Runner wins iff he can reach a given vertex (which we call the goal ) 2. Hamilton Path or Travelling Salesman: the Runner wins iff he can move along a Hamilton Path, i.e., he visits each vertex exactly once 3. Complete Search: the Runner wins iff he can visit each vertex (possibly more than once) It is easy to see that for the reachability game with one single goal the use of multi-graphs is crucial, but we can bound the multiplicity uniformly by two or, if we allow a second goal vertex, we even can transform every multi-graph game into a single-graph game: Lemma 1. Let G be a sabotage game with reachability condition on a multigraph arena A. Then there are games G , G on arenas A , A with a size polynomial in the size of A such that the Runner wins G iff he wins G , resp. G , and A is a single-graph with two goals and A is a multi-graph with one goal and only single or double edges where the double edges occur only connected with the goal. Proof. We only sketch the proof for directed graphs. To obtain A one adds a new goal and replaces each edge between vertices u and v with multiplicity k > 0 by the construction depicted in Fig. 1 (with k new vertices). We actually need a new goal if v is the original goal. The arena A is constructed similarly: if v is not the original goal we apply the same construction (Fig. 1), but reusing the existing goal instead of adding a new one. If v is the goal then we add double edges from the new vertices to v (see Fig. 2). Note that Blocker does not gain additional moves because all new vertices are directly connected to the goal. u •
•
···
u •
•
v Fig. 1. Replacement for A
•
•
• 2
2
··· v
2
•
•
2
Fig. 2. Replacement for A
Since edges are only deleted but not added during the play the following fact is easy to see: Lemma 2. If the Runner has a winning strategy in the sabotage game with reachability condition then he can win without visiting any vertex twice. In the sequel we will introduce several game arenas where we use edges with a multiplicity ‘high enough’ to ensure that the Blocker cannot win the game
534
Christof L¨ oding and Philipp Rohde
by reducing these edges. In figures these edges are represented by a curly link • . For the moment we can consider these links to be ‘unremovable’. • Due to the previous lemma we have: if the Runner can win the reachability game at all, then he can do so within at most |V | − 1 rounds. Hence we can set the multiplicity of the ’unremovable’ edges to |V | − 1. To bound the multiplicity of edges uniformly one can apply Lemma 1.
3
PSPACE-Hardness for Sabotage Reachability Games
In this section we prove that the PSPACE-complete problem of Quantified Boolean Formulas (cf. [3]), QBF for short, can be reduced by a polynomial time reduction to sabotage games on undirected graphs with the reachability condition. Let ϕ ≡ ∃x1 ∀x2 ∃x3 . . . Qxn ψ be an instance of QBF, where Q is ∃ for n odd and ∀ otherwise and ψ is a quantifier-free Boolean formula in conjunctive normal form. We will construct an undirected game arena for a sabotage game Gϕ with a reachability condition such that the Runner has a winning strategy in the game iff the formula ϕ is satisfiable. A reduction like the classical one from QBF to the Geography Game (cf. [3]) does not work here, since the Blocker may destroy connections in a part of the graph which should be visited only later in the game. This could be solved by blowing up the distances, but this approach results in an arena with a size exponential in the size n of ϕ. So we have to restrict the liberty of the Blocker in a more sophisticated way, i.e., to force him removing edges only ‘locally’. The game arena Gϕ consists of two parts: a chain of n gadgets where first the Runner chooses an assignment for x1 , then the Blocker chooses an assignment for x2 before the Runner chooses an assignment for x3 and so on. The second part gives the Blocker the possibility to select one of the clauses of ψ. The Runner must certify that this clause is indeed satisfied by the chosen assignment: he can reach the goal vertex and win the game iff at least one literal in the clause is true under the assignment. Figure 5 shows an example of the sabotage game Gϕ for the formula ϕ ≡ ∃x1 ∀x2 ∃x3 (c1 ∧c2 ∧c3 ∧c4 ) where we assume that each clause consists of exactly three literals. In the following we describe in detail the several components of Gϕ and their arrangement. The main step of the construction is to take care about the opportunity of the Blocker to remove edges somewhere in the graph. The ∃-Gadget. The gadget where the Runner chooses an assignment for the xi with i odd is displayed in Fig. 3. We are assuming that the run reaches this gadget at vertex A at the first time. Vertex B is intended to be the exit. In the complete construction there are also edges from Xi , resp. Xi leading to the last gadget of the graph, represented as dotted lines labelled by back. We will see later that taking these edges as a shortcut, starting from the ∃-gadget directly to the last gadget is useless for the Runner. The only meaningful direction is coming from the last gadget back to the ∃-gadget. So we temporary assume that
Solving the Sabotage Game Is PSPACE-Hard
in
•
•
•
Xi
in
A 4
4
•
535
•
•
•
D
Xi
•
•
Xi
back
back
•
A 4
4
•
Xi
•
3
C back
back
B
B
out
out
Fig. 3. ∃-gadget for xi with i odd
Fig. 4. ∀-gadget for xi with i even
start
•
•
•
X1
•
•
•
X2
•
•
•
X3
• 4
•
4
• 4
•
4
• 4
•
4
•
•
X1
•
•
•
X2
•
3
•
•
X3
•
• A1 C1 • • •
A2 •
C2 • • •
A3 •
C3
•
• • •
• •
• unremovable link n
• edge of multiplicity n • single edge
•
C4
•
• • •
Fig. 5. The arena for ∃x1 ∀x2 ∃x3 (c1 ∧ c2 ∧ c3 ∧ c4 ) Types of edges: •
A4
536
Christof L¨ oding and Philipp Rohde
the Runner does not take these edges. In the sequel we further assume, due to Lemma 2, that the Runner does not move backwards. The Runner makes his choice simply by moving from A either to the left or to the right. Thereby he moves either towards Xi if he wants xi to be false or towards Xi if he wants xi to be true. We consider only the first case. The Blocker has exactly four steps to remove all the links between Xi and the goal before the Runner reaches this vertex. On the other hand the Blocker cannot remove edges somewhere else in the graph without loosing the game. Why we use four steps here will be clarified later on. If the Runner has reached Xi and he moves towards B then the Blocker has to delete the edge between B and Xi since otherwise the Runner can reach the goal on this way (there are still four edges left between Xi and the goal). The ∀-Gadget. The gadget where the Blocker chooses an assignment for the xi with i even is a little bit more sophisticated. Figure 4 shows the construction. If the Blocker wants xi to be false he tries to lead the Runner towards Xi . In this case he simply removes the three edges between C and Xi during the first three steps. Then the Runner has to move across D and in the meantime the Blocker deletes the four edges between Xi and the goal to ensure that the Runner cannot win directly. As above he removes in the last step the link between B and Xi to prevent a premature end of the game. If the Blocker wants to assign true to xi he should lead the Runner towards Xi . To achieve this aim he removes three of the four links between Xi and the goal before the Runner reaches C. Nevertheless the Runner has the free choice at vertex C whether he moves towards Xi or towards Xi , i.e., the Blocker cannot guarantee that the run goes across Xi . But let us consider the two possible cases: first we assume that the Runner moves as intended and uses an edge between C and Xi . In this round the Blocker removes the last link from Xi to the goal. Then the Runner moves to B and the Blocker deletes the edge from B to Xi . Now assume that the Runner ‘misbehaves’ and moves from C to D and further towards Xi . Then the Blocker first removes the four edges between Xi and the goal. When the Runner now moves from Xi to B the Blocker has to take care that the Runner cannot reach the goal via the link between B and Xi (there is still one edge left from Xi to the goal). For that he can delete the last link between Xi and the goal and isolate the goal completely within this gadget. The Verification Gadget. The last component of the arena is a gadget where the Blocker can choose one of the clauses of the formula ψ. Before we give the representation of this gadget let us explain the idea. If the Blocker chooses the clause c then the Runner can select for his part one literal xi of c. There is an edge back to the ∃-gadget if i is odd or to the ∀-gadget if i is even, videlicet to Xi if xi is positive in c, resp. to Xi if xi is negative in c. So if the chosen assignment satisfies ψ, then for all clauses of ψ there is at least one literal which is true. Since the path through the assignment gadgets visits the opposite truth values this means that there is at least one edge back to an Xi , resp. Xi , which itself is connected to the goal by an edge with a multiplicity of four (assuming
that the Runner did not misbehave in the ∀-gadget). Therefore the Runner can reach the goal and wins the game. For the converse, if the chosen assignment does not satisfy ψ, then there is a clause c in ψ such that every literal in c is assigned false. If the Blocker chooses this clause c, then every edge back to the assignment gadgets ends in an Xi, resp. X̄i, which is disconnected from the goal. If we show that there is no other way to reach the goal, this means that the Runner loses the game. But we have to be very careful neither to allow any shortcuts for the Runner nor to give the Blocker too much liberty.

Figure 5 contains the verification gadget for ψ ≡ c1 ∧ c2 ∧ c3 ∧ c4, where each clause ci has exactly three literals. The curly edges at the bottom of the gadget lead back to the corresponding literals of each clause. The Blocker chooses the clause ck by first removing the edges from Aj to Cj for j < k, one after the other. Then he cuts the link between Ak and Ak+1, resp. between Ak and the goal if ck is the last clause. By Lemma 2 it is useless for the Runner to go back, thus he can only follow the given path to Ck. If he reaches this vertex, the Blocker must remove the link from Ck to the goal to prevent a win for the opponent. In the next step the Runner selects a literal xi, resp. ¬xi, in ck, moves towards the corresponding vertex and afterwards along the curly edge back to the assignment gadgets as described above. At this point the Blocker has exactly two moves left, i.e., he is allowed to remove two edges somewhere in the graph. But we have: if the 'right' assignment for this literal has been chosen, then there are exactly four edges left connecting the corresponding vertex and the goal. So the Blocker has no opportunity to isolate the goal, and the Runner wins the game. Otherwise, if the 'wrong' assignment has been chosen, then there is no link from Xi, resp. X̄i, to the goal left. Any continuation which the Runner could take either leads him back to an already visited vertex (which is a loss by Lemma 2) or, by taking another back-edge in the 'wrong' direction, to another vertex in the verification gadget.

We handle the latter case in general: if the Runner uses a shortcut starting from a literal vertex and moves directly to the bottom of the verification gadget, then the Blocker can prevent the continuation of the run by removing the corresponding single edge between the clause vertex Ck and the vertex beneath it, and the Runner has to move back. So the Runner wins the game if and only if he wins it without using any shortcut.

If the Runner reaches a vertex Ak and the Blocker removes either the edge between Ak and Ck, or the one between Ck and the goal, or one of the edges leading to the vertices beneath Ck (one for each literal in ck), then the Runner moves towards Ak+1, resp. towards the goal if ck is the last clause. The Runner has to do so since, in the latter two cases, entering the 'damaged' area around Ck could be a disadvantage for him. Finally we consider the case that the Blocker removes an edge somewhere else in the graph instead. This behaviour is only reasonable if the chosen assignment satisfies ψ. So consider the round in which the Runner reaches for the first time an Ak such that the edge from Ak to Ak+1, resp. the goal, as well as all edges connected to Ck are still left. If ck is the last clause then the Runner just reaches
the goal and wins the game. Otherwise he moves to Ck and chooses an appropriate literal xi, resp. ¬xi, such that at least three edges from the corresponding vertex are still left (at least one literal of this kind exists in each clause). Since Ak is the first vertex with this property, the Blocker has gained only one additional move, so at least one edge from the vertex Xi, resp. X̄i, to the goal remains. Hence, if the Runner can choose a satisfying assignment at all, then the Blocker cannot prevent the win for the Runner by this behaviour. This explains the multiplicity of four within the assignment gadgets.

This completes the construction of the game Gϕ. Obviously, this construction can be done in polynomial time. Therefore, we obtain the following results.

Lemma 3. The Runner has a winning strategy in the sabotage game Gϕ iff ϕ is satisfiable.

Theorem 4. There is a polynomial time reduction from QBF to sabotage games with reachability winning condition on undirected graphs. In particular, solving these games is PSPACE-hard.

Since each edge of the game Gϕ has an 'intended direction', it is straightforward to check that a similar construction works for directed graphs as well. The construction can also be adapted to prove the PSPACE-hardness of other variants of the game, e.g., if the Blocker is allowed to remove up to n edges in each round for a fixed number n, or if the Blocker removes vertices instead of edges. For the details we refer the reader to [4].
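To make the game itself concrete, the following brute-force solver decides sabotage games with the reachability condition on tiny instances. It is a sketch under our own assumptions, not code from the paper: the undirected multigraph is a frozenset of (u, v, k) triples with k indexing parallel edges, and in each round the Runner first moves along an incident edge, after which the Blocker deletes an arbitrary edge, matching the round structure used in the proofs above.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def runner_wins(edges, pos, goal):
    """True iff the Runner can force reaching `goal` from `pos`."""
    if pos == goal:
        return True
    for (u, v, k) in edges:
        if pos not in (u, v):
            continue
        nxt = v if pos == u else u
        if nxt == goal:
            return True  # the Runner arrives before the Blocker can react
        # after the move, the Runner must survive every possible deletion
        if all(runner_wins(edges - {e}, nxt, goal) for e in edges):
            return True
    return False

path = frozenset({("a", "b", 0), ("b", "g", 0)})
print(runner_wins(path, "a", "g"))                    # False: the Blocker cuts b-g
print(runner_wins(path | {("b", "g", 1)}, "a", "g"))  # True: a parallel edge survives
```

The two test calls illustrate why multiplicities matter in the gadgets above: a single edge on the path to the goal can be sabotaged in time, while a parallel copy survives one deletion.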
4 The Remaining Winning Conditions
In this section we give polynomial time reductions from sabotage games with the reachability condition to the ones with the complete search condition and with the Hamilton path condition. We only consider games on undirected graphs.

Let G be a sabotage game on an undirected arena A = (V, e, v0) with the reachability condition. We present an arena B such that the Runner wins G iff he wins the game G′ on B with the complete search condition iff he wins the game G′′ on B with the Hamilton path condition. To obtain B we add several vertices to A: let m := |V| − 2 and let v1, . . . , vm be an enumeration of all vertices in A except the initial vertex and the goal. We add a sequence P1, . . . , Pm of new vertices to A together with several chains of new vertices such that each chain has length max{|V|, |E|} and their nodes are linked among each other by 'unremovable' edges. We add such chains from Pi as well as from Pi+1 to vertex vi for i < m, and one chain from Pm to vertex vm. Furthermore, we add for i < m shortcuts from the last vertices in the chains between Pi and vi to the last vertices in the chains between Pi+1 and vi, to give the Runner the possibility to skip the visit of vi. Additionally there is one link with multiplicity |V| from P1 to the goal in A, see Fig. 6.

If the Runner can reach the goal in the original game G, then by Lemma 2 he can do so within at most |V| − 1 steps. In this case there is at least one link
Fig. 6. Game arena B: the original arena A with start and goal, the vertices v1, . . . , vm joined to the new vertices P1, . . . , Pm by chains of length max{|V|, |E|}, and a link of multiplicity |V| from P1 to the goal.
to P1, which he uses to reach P1. He follows the chain to v1. If he has already visited v1 on his way to the goal he uses the shortcut at the last vertex in the chain, otherwise he visits v1. Afterwards he moves to P2 using the next chain. Continuing like this he reaches Pm and moves towards the last vertex vm. If he has already visited vm he just stops one vertex before; otherwise he stops at vm. Moving this way he visits each vertex of B exactly once and wins both games G′ and G′′.

For the converse: if the Runner cannot reach the goal in G, then he cannot do so in the games G′ and G′′ either. If he tries to use a shortcut via some Pi, the Blocker has enough time on the way to Pi to cut all the links between the goal and P1. On the Runner's way back from some Pj to a vertex in A, the Blocker is able to remove all edges in the original game arena A and thereby isolate the goal completely. Thus the Runner loses both games G′ and G′′ on B. So we have:

Theorem 5. There is a polynomial time reduction from sabotage games with reachability condition to sabotage games with complete search condition, resp. with Hamilton path condition. In particular, solving these games is PSPACE-hard.
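The arena transformation is mechanical enough to transcribe directly. The sketch below follows the description above; the data layout (an edge list with multiplicities as repeated pairs) and all names are ours, and 'unremovable' edges are modelled by a multiplicity the Blocker cannot exhaust, as Lemma 1 permits.

```python
def make_arena_B(V, E, start, goal):
    """Build arena B from arena A = (V, E, start, goal)."""
    chain_len = max(len(V), len(E))
    unremovable = len(E) + len(V) + 2          # more edges than Blocker moves
    nodes, edges = set(V), list(E)

    def add(u, v, mult=1):
        edges.extend([(u, v)] * mult)

    others = [v for v in V if v not in (start, goal)]   # v_1, ..., v_m
    m = len(others)
    P = [f"P{i + 1}" for i in range(m)]
    nodes.update(P)

    def chain(src, dst):
        """Chain of chain_len fresh vertices from src to dst, 'unremovable'
        edges throughout; returns the last chain vertex before dst."""
        prev = src
        for j in range(chain_len):
            c = f"c[{src}->{dst}]{j}"
            nodes.add(c)
            add(prev, c, unremovable)
            prev = c
        add(prev, dst, unremovable)
        return prev

    last = {}
    for i in range(m):
        last[(i, i)] = chain(P[i], others[i])              # chain P_i -- v_i
        if i + 1 < m:
            last[(i + 1, i)] = chain(P[i + 1], others[i])  # chain P_{i+1} -- v_i
    for i in range(m - 1):                                 # shortcuts skipping v_i
        add(last[(i, i)], last[(i + 1, i)])
    if P:
        add(P[0], goal, len(V))                            # |V|-fold link P_1 -- goal
    return nodes, edges, start
```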
5 A Sabotage Modal Logic
In [1] van Benthem considered a 'sabotage modal logic', i.e., a modal logic over changing models to express tasks corresponding to sabotage games. He introduced a cross-model modality referring to submodels from which objects have been removed. In this section we give a formal definition of a sabotage modal logic with a 'transition-deleting' modality and show how to apply the results of the previous sections to determine the complexity of uniform model checking for this logic. To realise the use of multi-graphs we will interpret the logic over edge-labelled transition systems. By applying Lemma 1, the complexity results for the reachability game can be obtained for multi-graphs with a uniformly bounded multiplicity. Hence we can make do with a finite alphabet Σ.
Definition 6. Let p be a unary predicate symbol and a ∈ Σ. Formulae of the sabotage modal logic SML over transition systems are defined by

ϕ ::= p | ¬ϕ | ϕ ∨ ϕ | ♦a ϕ | ♦̄a ϕ

The dual modality □a and the label-free versions ♦, □ are defined as usual. The modalities □̄a, ♦̄ and □̄ are defined analogously.

Let T = (S, {Ra | a ∈ Σ}, L) be a transition system. For t, t′ ∈ S and a ∈ Σ we define the submodel T^a_(t,t′) := (S, {Rb | b ∈ Σ \ {a}} ∪ {Ra \ {(t, t′)}}, L). For a given state s ∈ S the semantics of SML is defined as for usual modal logic, together with

(T, s) |= ♦̄a ϕ   iff   there is (t, t′) ∈ Ra such that (T^a_(t,t′), s) |= ϕ
For a transition system T let T̂ be the corresponding FO-structure. Similarly to the usual modal logic, one can translate the logic SML into first-order logic. Since FO-model checking is in PSPACE we obtain (see [4] for a proof):

Theorem 7. For every SML-formula ϕ there is an effectively constructible FO-formula ϕ̂(x) such that for every transition system T and state s of T one has (T, s) |=SML ϕ iff T̂ |=FO ϕ̂[s]. The size of ϕ̂(x) is polynomial in the size of ϕ. In particular, SML-model checking is in PSPACE.

We can express the winning of the Runner in the sabotage game G on directed graphs with the reachability condition by an SML-formula. For that we consider the game arena as a transition system T(G) such that the multiplicity of edges is captured by the edge labelling and such that the goal vertex of the game is viewed as the only state with predicate p. We inductively define the SML-formula γn by γ0 := p and γi+1 := (♦ □̄ γi) ∨ p. Then we obtain the following lemma (see [4] for a proof) and, in combination with Theorem 4, the PSPACE-completeness of SML model checking.

Lemma 8. The Runner has a winning strategy from vertex s in the sabotage game G iff (T(G), s) |= γn, where n is the number of edges of the game arena.

Theorem 9. Model checking for the sabotage logic SML is PSPACE-complete.
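A naive model checker makes the PSPACE upper bound tangible: the recursion depth is bounded by the formula size, and each sabotage modality branches over the current transition relation. The tuple encoding of formulas and all names below are our own illustration, not an implementation from [4]; ('sdia', a, f) stands for the transition-deleting modality ♦̄a.

```python
def sml_holds(states, rel, label, s, phi):
    """rel maps every letter a to a set of transitions (t, t');
    label is the set of states satisfying the predicate p."""
    op = phi[0]
    if op == 'p':
        return s in label
    if op == 'not':
        return not sml_holds(states, rel, label, s, phi[1])
    if op == 'or':
        return (sml_holds(states, rel, label, s, phi[1])
                or sml_holds(states, rel, label, s, phi[2]))
    if op == 'dia':                      # ♦a: move along an a-transition
        a, f = phi[1], phi[2]
        return any(sml_holds(states, rel, label, t2, f)
                   for (t1, t2) in rel[a] if t1 == s)
    if op == 'sdia':                     # ♦̄a: delete one a-transition anywhere
        a, f = phi[1], phi[2]
        for e in rel[a]:
            smaller = dict(rel)
            smaller[a] = rel[a] - {e}    # the submodel T^a_e
            if sml_holds(states, smaller, label, s, f):
                return True
        return False
    raise ValueError(op)
```

The formula γn of Lemma 8 can be expressed in this encoding by expanding the label-free modalities ♦ and □̄ as disjunctions, resp. conjunctions, over the finite alphabet Σ.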
References

1. van Benthem, J.: An essay on sabotage and obstruction. In: Hutter, D., Werner, S. (eds.): Festschrift in Honour of Prof. Jörg Siekmann. LNAI. Springer (2002)
2. Demaine, E.D.: Playing games with algorithms: algorithmic combinatorial game theory. In: Proceedings of MFCS 2001. Volume 2136 of LNCS, Springer (2001) 18–32
3. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley (1994)
4. Löding, C., Rohde, P.: Solving the sabotage game is PSPACE-hard. Technical Report AIB-05-2003, RWTH Aachen (2003)
The Approximate Well-Founded Semantics for Logic Programs with Uncertainty

Yann Loyer¹ and Umberto Straccia²

¹ PRiSM, Université de Versailles, 45 Avenue des Etats-Unis, 78035 Versailles, France
² I.S.T.I. – C.N.R., Via G. Moruzzi 1, I-56124 Pisa, Italy
Abstract. The management of uncertain information in logic programs becomes important whenever the real world information to be represented is of imperfect nature and the classical crisp true/false approximation is not adequate. A general framework, called Parametric Deductive Databases with Uncertainty (PDDU) [10], was proposed as a unifying umbrella for many existing approaches towards the manipulation of uncertainty in logic programs. We extend PDDU with (non-monotonic) negation, a well-known and important feature of logic programs. We show that, dealing with uncertain and incomplete knowledge, atoms should be assigned only approximations of uncertainty values, unless some assumption is used to complete the knowledge. We rely on the closed world assumption to infer as much default 'false' knowledge as possible. Our approach also leads to novel characterizations, both epistemic and operational, of the well-founded semantics in PDDU, and preserves the continuity of the immediate consequence operator, a major feature of the classical PDDU framework.
1 Introduction
The management of uncertainty within deduction systems is an important issue whenever the real world information to be represented is of imperfect nature. In logic programming, the problem has attracted the attention of many researchers and numerous frameworks have been proposed. Essentially, they differ in the underlying notion of uncertainty (e.g. probability theory [9,13,14,15], fuzzy set theory [16,17,19], multivalued logic [7,8,10], possibilistic logic [2]) and in how uncertainty values, associated to rules and facts, are managed. Lakshmanan and Shiri have recently proposed a general framework [10], called Parametric Deductive Databases with Uncertainty (PDDU), that captures and generalizes many of the preceding approaches. In [10], a rule is of the form A ←^α B1, ..., Bn. Computationally, given an assignment I of certainties to the Bi's, the certainty of A is computed by taking the 'conjunction' of the certainties I(Bi) and then somehow 'propagating' it to the rule head, taking into account the certainty α of the implication. However, despite its generality, one fundamental issue that remains unaddressed in PDDU is non-monotonic negation, a well-known and important feature in logic programming.

In this paper, we extend PDDU [10] to normal logic programs, logic programs with negation. In order to deal with knowledge that is usually not only uncertain, but also incomplete, we believe that one should rely on approximations of uncertainty values only. We then study the problem of assigning a semantics to a normal logic program in such
a framework. We first consider the least model and show that it extends the Kripke-Kleene semantics [4] from Datalog programs to normal logic programs, but that it is usually too weak. We then explain how one should try to determine approximations as precise as possible by completing the available knowledge with a kind of default reasoning based on the well-known Closed World Assumption (CWA). Our approach consists in determining how much knowledge 'extracted' from the CWA can 'safely' be used to 'complete' a logic program. It leads to novel characterizations, both epistemic and operational, of the well-founded semantics [3] for logic programs, and extends that semantics to PDDU. Moreover, we show that the continuity of the immediate consequence operator, used for inferring information from the program, is preserved. This is important as it is a major feature of classical PDDU, as opposed to classical frameworks like [8].

Negation has already been considered in some deductive databases with uncertainty frameworks. In [13,14], the stable semantics has been considered, but limited to the case where the underlying uncertainty formalism is probability theory. That semantics has also been considered in [19], where a semi-possibilistic logic has been proposed, a particular negation operator has been introduced, and a fixed min/max evaluation of conjunction and disjunction is adopted. To the best of our knowledge, there is no work dealing with default negation within PDDU, except for our previous attempt [11]. The semantics defined in [11] is weaker than the one presented in this paper: in the approach presented here more knowledge can be extracted from a program, whereas the semantics of [11] has no epistemic characterization and relies on a less natural management of negation.

In the remainder, we proceed as follows. In the following section, the syntax of PDDU with negation, called normal parametric programs, is given; Section 3 contains the definitions of interpretation and model of a program. In Section 4, we present the fundamental notion of the support of a program provided by the CWA with respect to (w.r.t.) an interpretation. Then we propose novel characterizations of the well-founded semantics and compare our approach with the usual semantics. Section 5 concludes.
2 Preliminaries
Consider an arbitrary first order language that contains infinitely many variable symbols, finitely many constants, and predicate symbols, but no function symbols. The predicate symbol π(A) of an atomic formula A given by A = p(X1, . . . , Xn) is defined by π(A) = p. The truth-space is given by a complete lattice: atomic formulae are mapped into elements of a certainty lattice L = ⟨T, ⪯, ⊗, ⊕⟩ (a complete lattice), where T is the set of certainty values, ⪯ is a partial order, and ⊗ and ⊕ are the meet and join operators, respectively. With ⊥ and ⊤ we denote the least and greatest element in T. With B(T) we denote the set of finite multisets (denoted {| · |}) over T. For instance, a typical certainty lattice is L[0,1] = ⟨T, ⪯, ⊗, ⊕⟩, where T = [0, 1], α ⪯ β iff α ≤ β, α ⊗ β = min(α, β), α ⊕ β = max(α, β), ⊥ = 0 and ⊤ = 1. While the language does not contain function symbols, it contains symbols for families of conjunction (Fc), propagation (Fp) and disjunction functions (Fd), called combination functions. Roughly, as we will see below, the conjunction function (e.g. ⊗) determines the certainty of the conjunction of L1, ..., Ln (the body) of a logic program rule like A ←^α L1, ..., Ln, a propagation function (e.g. ⊗) determines how to 'propagate' the certainty, resulting from the evaluation of the body L1, ..., Ln, to the head A, by taking into account the certainty α of the implication,
while the disjunction function (e.g. ⊕) dictates how to combine the certainties in case an atom appears in the heads of several rules (evaluates a disjunction). Examples of conjunction, propagation and disjunction over L[0,1] are fc(x, y) = min(x, y), fp(x, y) = xy, fd(x, y) = x + y − xy. Formally, a propagation function is a mapping from T × T to T, and a conjunction or disjunction function is a mapping from B(T) to T. Each combination function is monotonic and continuous w.r.t. each one of its arguments. Conjunction and disjunction functions are commutative and associative. Additionally, each kind of function must verify some of the following properties¹: (i) bounded-above: f(α1, α2) ⪯ αi, for i = 1, 2, ∀α1, α2 ∈ T; (ii) bounded-below: f(α1, α2) ⪰ αi, for i = 1, 2, ∀α1, α2 ∈ T; (iii) f({α}) = α, ∀α ∈ T; (iv) f(∅) = ⊥; (v) f(∅) = ⊤; and (vi) f(α, ⊤) = α, ∀α ∈ T. The following should be satisfied: a conjunction function in Fc should satisfy properties (i), (iii), (v) and (vi); a propagation function in Fp should satisfy properties (i) and (vi); while a disjunction function in Fd should satisfy properties (ii), (iii) and (iv). We also assume that there is a function from T to T, called the negation function, denoted ¬, that is anti-monotone w.r.t. ⪯ and satisfies ¬¬α = α, ∀α ∈ T. E.g., in L[0,1], ¬α = 1 − α is quite typical. Finally, a literal is an atomic formula or its negation.

Definition 1 (Normal Parametric Program [10]). A normal parametric program P (np-program) is a 5-tuple ⟨L, R, C, P, D⟩, whose components are defined as follows: (i) L = ⟨T, ⪯, ⊗, ⊕⟩ is a complete lattice, where T is a set of certainties partially ordered by ⪯, ⊗ is the meet operator and ⊕ the join operator; (ii) R is a finite set of normal parametric rules (np-rules), each of which is a statement of the form r : A ←^αr L1, ..., Ln, where A is an atomic formula, L1, ..., Ln are literals or values in T, and αr ∈ T \ {⊥} is the certainty of the rule; (iii) C maps each np-rule to a conjunction function in Fc; (iv) P maps each np-rule to a propagation function in Fp; (v) D maps each predicate symbol in P to a disjunction function in Fd.

For ease of presentation, we write r : A ←^αr L1, ..., Ln; ⟨fd, fp, fc⟩ to represent an np-rule in which fd ∈ Fd is the disjunction function associated with π(A), and fc ∈ Fc and fp ∈ Fp are respectively the conjunction and propagation functions associated with r. Note that, by Definition 1, rules with the same head must have the same associated disjunction function. The following example illustrates the notion of np-program.

Example 1. Consider an insurance company, which has information about its customers used to determine the risk coefficient of each customer. The company has: (i) data grouped into a set F of facts; and (ii) a set R of rules. Suppose the company has the following database (which is an np-program P = F ∪ R), where a value of the risk coefficient may be already known, but has to be re-evaluated (the client may be a new client and his risk coefficient is given by his precedent insurance company). The certainty lattice is L[0,1], with fp(x, y) = xy.
F = { Experience(John) ←^1 0.7; ⟨⊕, fp, ⊗⟩
      Risk(John) ←^1 0.5; ⟨⊕, fp, ⊗⟩
      Sport car(John) ←^1 0.8; ⟨⊕, fp, ⊗⟩ }

¹ For simplicity, we formulate the properties treating any function as a binary function on T.
R = { Good driver(X) ←^1 Experience(X), ¬Risk(X); ⟨⊕, ⊗, ⊗⟩
      Risk(X) ←^0.8 Young(X); ⟨⊕, fp, ⊗⟩
      Risk(X) ←^0.8 Sport car(X); ⟨⊕, fp, ⊗⟩
      Risk(X) ←^1 Experience(X), ¬Good driver(X); ⟨⊕, fp, ⊗⟩ }
Using another disjunction function associated with the rules with head Risk, such as fd(x, y) = x + y − xy, might have been more appropriate in such an example (i.e. we accumulate the risk factors, rather than take the max only), but we will use ⊕ in order to facilitate the reader's comprehension later on when we compute the semantics of P. We further define the Herbrand base BP of an np-program P as the set of all instantiated atoms corresponding to atoms appearing in P, and define P* to be the Herbrand instantiation of P, i.e. the set of all ground instantiations of the rules in P (P* is finite). Note that a Datalog program with negation P is equivalent to the np-program constructed by replacing each rule in P of the form A ← L1, ..., Ln by the rule A ←^t L1, ..., Ln; ⟨⊕, ⊗, ⊗⟩, where the classical certainty lattice L{t,f} is considered: L{t,f} = ⟨T, ⪯, ⊗, ⊕⟩, with T = {t, f}, ⪯ defined by f ⪯ t, ⊕ = max, ⊗ = min, ¬f = t, ¬t = f, ⊥ = f and ⊤ = t.
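To make the formalism concrete, the following sketch renders the lattice L[0,1] and the ground instantiation of Example 1 executably. The tuple encoding of rules and all names are ours, not part of the PDDU framework; fc = min and fd = max throughout, while the propagation function varies per rule as in the example.

```python
def times(alpha, x): return alpha * x      # fp(x, y) = xy
def meet(alpha, x):  return min(alpha, x)  # propagation by the lattice meet

# Ground np-rules as (head, certainty, body, fp); a body literal is
# ('pos', atom), ('neg', atom) or ('val', constant).
P1 = [
    ("Experience(J)",  1.0, (("val", 0.7),), times),
    ("Risk(J)",        1.0, (("val", 0.5),), times),
    ("Sport_car(J)",   1.0, (("val", 0.8),), times),
    ("Good_driver(J)", 1.0, (("pos", "Experience(J)"), ("neg", "Risk(J)")), meet),
    ("Risk(J)",        0.8, (("pos", "Young(J)"),), times),
    ("Risk(J)",        0.8, (("pos", "Sport_car(J)"),), times),
    ("Risk(J)",        1.0, (("pos", "Experience(J)"), ("neg", "Good_driver(J)")), times),
]
```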
3 Interpretations of Programs
The semantics of a program P is determined by selecting a particular interpretation of P in the set of models of P, where an interpretation I of an np-program P is a function that assigns to all atoms of the Herbrand base of P a value in T. In Datalog programs, as well as in PDDU, that chosen model is usually the least model of P w.r.t. ⪯². Unfortunately, the introduction of negation may have the consequence that some logic programs do not have a unique minimal model, as shown in the following example.

Example 2. Consider the certainty lattice L[0,1] and the program P = {(A ← ¬B), (B ← ¬A), (A ← 0.2), (B ← 0.3)}. Informally, an interpretation I is a model of the program if it satisfies every rule, while I satisfies a rule X ← Y if I(Y) ⪯ I(X)³. So, this program has an infinite number of models Ixy, where 0.2 ⪯ x ⪯ 1, 0.3 ⪯ y ⪯ 1, y ≥ 1 − x, Ixy(A) = x and Ixy(B) = y. There is also an infinite number of minimal models: the minimal models Ixy are such that y = 1 − x. □

Concerning the previous example we may note that the certainty of A in the minimal models is in the interval [0.2, 0.7], while for B the interval is [0.3, 0.8]. An obvious question is: what should be the answer to a query A to the program proposed in Example 2? There are at least two answers: (i) the certainty of A is undefined, as there is no unique minimal model. This is clearly a conservative approach, which in case of ambiguity prefers to leave A unspecified; (ii) the certainty of A is in [0.2, 0.7], which means that even if there is no unique value for A, in all minimal models the certainty of A is in [0.2, 0.7]. In this approach we still try to provide some information. Of course, some
² ⪯ is extended to the set of interpretations as follows: I ⪯ J iff for all atoms A, I(A) ⪯ J(A).
³ Roughly, X ← Y dictates that 'X should be at least as true as Y'.
care should be used. Indeed, from I(A) ∈ [0.2, 0.7] and I(B) ∈ [0.3, 0.8] we should not conclude that I(A) = 0.2 and I(B) = 0.3 is a model of the program. Applying a usual approach, like the well-founded semantics [18] or the Kripke-Kleene semantics [4], would lead us to choose the conservative solution (i). This was also the approach in our early attempt to deal with normal parametric programs [11]. Such a semantics seems to be too weak, in the sense that it loses some knowledge (e.g. the value of A should be at least 0.2). In this paper we address solution (ii). To this end, we propose to rely on T × T. Any element of T × T is denoted by [a; b] and interpreted as an interval on T, i.e. [a; b] is interpreted as the set of elements x ∈ T such that a ⪯ x ⪯ b. For instance, turning back to Example 2 above, in the intended model of P, the certainty of A is 'approximated' with [0.2; 0.7], i.e. the certainty of A lies in between 0.2 and 0.7 (similarly for B).

Formally, given a complete lattice L = ⟨T, ⪯, ⊗, ⊕⟩, we construct a bilattice over T × T, according to a well-known construction method (see [3,6]). We recall that a bilattice is a triple ⟨B, ⪯t, ⪯k⟩, where B is a nonempty set and ⪯t, ⪯k are both partial orderings giving to B the structure of a lattice with a top and a bottom [6]. We consider B = T × T with the orderings: (i) the truth ordering ⪯t, where [a1; b1] ⪯t [a2; b2] iff a1 ⪯ a2 and b1 ⪯ b2; and (ii) the knowledge ordering ⪯k, where [a1; b1] ⪯k [a2; b2] iff a1 ⪯ a2 and b2 ⪯ b1. The intuition of those orders is that truth increases if the interval contains greater values (e.g. [0.1; 0.4] ⪯t [0.2; 0.5]), whereas the knowledge increases when the interval (i.e. in our case the approximation of a certainty value) becomes more precise (e.g. [0.1; 0.4] ⪯k [0.2; 0.3], i.e. we have more knowledge). The least and greatest elements of T × T are respectively (i) f = [⊥; ⊥] (false) and t = [⊤; ⊤] (true), w.r.t. ⪯t; and (ii) ⊥ = [⊥; ⊤] (unknown – the least precise interval, i.e. the atom's certainty value is unknown) and ⊤ = [⊤; ⊥] (inconsistent – the empty interval) w.r.t. ⪯k. The meet, join and negation on T × T w.r.t. both orderings are defined by extending the meet, join and negation from T to T × T in the natural way: let [a1; b1], [a2; b2] ∈ T × T, then

– [a1; b1] ⊗t [a2; b2] = [a1 ⊗ a2; b1 ⊗ b2] and [a1; b1] ⊕t [a2; b2] = [a1 ⊕ a2; b1 ⊕ b2];
– [a1; b1] ⊗k [a2; b2] = [a1 ⊗ a2; b1 ⊕ b2] and [a1; b1] ⊕k [a2; b2] = [a1 ⊕ a2; b1 ⊗ b2];
– ¬[a1; b1] = [¬b1; ¬a1].

⊗t and ⊕t (⊗k and ⊕k) denote the meet and join operations on T × T w.r.t. the truth (knowledge) ordering, respectively. For instance, taking L[0,1], [0.1; 0.4] ⊕t [0.2; 0.5] = [0.2; 0.5], [0.1; 0.4] ⊗t [0.2; 0.5] = [0.1; 0.4], [0.1; 0.4] ⊕k [0.2; 0.5] = [0.2; 0.4], [0.1; 0.4] ⊗k [0.2; 0.5] = [0.1; 0.5] and ¬[0.1; 0.4] = [0.6; 0.9]. Finally, we extend in a similar way the combination functions from T to T × T. Let fc (resp. fp and fd) be a conjunction (resp. propagation and disjunction) function over T and [a1; b1], [a2; b2] ∈ T × T: (i) fc([a1; b1], [a2; b2]) = [fc(a1, a2); fc(b1, b2)]; (ii) fp([a1; b1], [a2; b2]) = [fp(a1, a2); fp(b1, b2)]; and (iii) fd([a1; b1], [a2; b2]) = [fd(a1, a2); fd(b1, b2)]. It is easy to verify that these extended combination functions preserve the original properties of combination functions. The following theorem holds.

Theorem 1. Consider T × T with the orderings ⪯t and ⪯k.
Then (i) ⊗t, ⊕t, ⊗k, ⊕k and the extensions of combination functions are continuous (and, thus, monotonic) w.r.t. ⪯t and ⪯k; (ii) any extended negation function is monotonic w.r.t. ⪯k; and (iii) if the negation function satisfies the de Morgan laws, i.e. ∀a, b ∈ T. ¬(a ⊕ b) = ¬a ⊗ ¬b, then the extended negation function is continuous w.r.t. ⪯k.
Proof: We prove only the last item, as the others are immediate. Consider a chain of intervals x0 ⪯k x1 ⪯k . . ., where xj = [aj; bj] with aj, bj ∈ T. To show the continuity of the extended negation function w.r.t. ⪯k, we show that ¬⊕k_{j≥0} xj = ⊕k_{j≥0} ¬xj: ¬⊕k_{j≥0} xj = ¬[⊕_{j≥0} aj; ⊗_{j≥0} bj] = [¬⊗_{j≥0} bj; ¬⊕_{j≥0} aj] = [⊕_{j≥0} ¬bj; ⊗_{j≥0} ¬aj] = ⊕k_{j≥0} [¬bj; ¬aj] = ⊕k_{j≥0} ¬[aj; bj] = ⊕k_{j≥0} ¬xj.

We can now extend interpretations over T to the above specified 'interval' bilattice.

Definition 2 (Approximate Interpretation). Let P be an np-program. An approximate interpretation of P is a total function I from the Herbrand base BP to the set T × T. The set of all the approximate interpretations of P is denoted CP.

Intuitively, assigning the logical value [a; b] to an atom A means that the exact certainty value of A lies between a and b with respect to ⪯. Our goal will be to determine, for each atom of the Herbrand base of P, the most precise interval that can be inferred. First, we extend the two orderings on T × T to the set of approximate interpretations CP in the usual way: let I1 and I2 be in CP, then (i) I1 ⪯t I2 iff I1(A) ⪯t I2(A) for all ground atoms A; and (ii) I1 ⪯k I2 iff I1(A) ⪯k I2(A) for all ground atoms A. Under these two orderings CP becomes a complete bilattice. The meet and join operations over T × T for both orderings are extended to CP in the usual way (e.g. for any atom A, (I ⊕k J)(A) = I(A) ⊕k J(A)). Negation is extended similarly: for any atom A, ¬I(A) = I(¬A), and approximate interpretations are extended to T by I(α) = [α; α] for any α ∈ T. Second, we identify the models of a program. The definition extends the one given in [10] to intervals.

Definition 3 (Models of a Logic Program). Let P be an np-program and let I be an approximate interpretation of P.
1. I satisfies a ground np-rule r : A ←^αr L1, ..., Ln; ⟨fd, fp, fc⟩ in P, denoted |=I r, iff fp([αr; αr], fc({|I(L1), . . . , I(Ln)|})) ⪯t I(A);
2. I is a model of P, or I satisfies P, denoted |=I P, iff for all atoms A ∈ BP, fd(X) ⪯t I(A), where fd is the disjunction function associated with π(A) and X = {| fp([αr; αr], fc({|I(L1), . . . , I(Ln)|})) : A ←^αr L1, ..., Ln; ⟨fd, fp, fc⟩ ∈ P* |}.

Third, among all possible models of an np-program, we now have to specify which one is the intended model. The characterization of that model will require the definition of an immediate consequence operator that will be used to infer knowledge from a program. That operator is a simple extension from T to T × T of the immediate consequence operator defined in [10] to give semantics to classical PDDU.

Definition 4. Let P be any np-program. The immediate consequence operator TP is a mapping from CP to CP defined as follows: for every interpretation I and every ground atom A, TP(I)(A) = fd(X), where fd is the disjunction function associated with π(A) and X = {| fp([αr; αr], fc({|I(L1), . . . , I(Ln)|})) : A ←^αr L1, ..., Ln; ⟨fd, fp, fc⟩ ∈ P* |}.

Note that from property (iv) of combination functions, satisfied by all disjunction functions, it follows that if an atom A does not appear as the head of a rule, then TP(I)(A) = f. Note also that any fixpoint of TP is a model of P. We have:

Theorem 2. For any np-program P, TP is monotonic and, if the de Morgan laws hold, continuous w.r.t. ⪯k.
Proof: The proof of monotonicity is easy. To prove the continuity w.r.t. ⪯k, consider a chain of interpretations I0 ⪯k I1 ⪯k . . .. We show that for any A ∈ BP,

TP(⊕k_{j≥0} Ij)(A) = ⊕k_{j≥0} TP(Ij)(A).   (1)

As CP is a complete lattice, the sequence I0 ⪯k I1 ⪯k . . . has a least upper bound, say Ī = ⊕k_{j≥0} Ij. For any B ∈ BP, we have ⊕k_{j≥0} Ij(B) = Ī(B) and, from Theorem 1, ⊕k_{j≥0} Ij(¬B) = ⊕k_{j≥0} ¬Ij(B) = ¬⊕k_{j≥0} Ij(B) = ¬Ī(B); thus, for any literal or certainty value L,

⊕k_{j≥0} Ij(L) = Ī(L).   (2)

Now, consider the finite set (P* is finite) of all ground rules r1, . . . , rk having A as head, where ri = A ←^αi Li1, ..., Lini; ⟨fd, fpi, fci⟩. Let us evaluate the left-hand side of Equation (1): TP(⊕k_{j≥0} Ij)(A) = TP(Ī)(A) = fd({| fpi([αi; αi], fci({| Ī(Li1), . . . , Ī(Lini) |})) : 1 ≤ i ≤ k |}). On the other side, ⊕k_{j≥0} TP(Ij)(A) = ⊕k_{j≥0} fd({| fpi([αi; αi], fci({| Ij(Li1), . . . , Ij(Lini) |})) : 1 ≤ i ≤ k |}). But fd, fpi and fci are continuous and, thus, by Equation (2), ⊕k_{j≥0} TP(Ij)(A) = fd({| ⊕k_{j≥0} fpi([αi; αi], fci({| Ij(Li1), . . . , Ij(Lini) |})) : 1 ≤ i ≤ k |}) = fd({| fpi([αi; αi], ⊕k_{j≥0} fci({| Ij(Li1), . . . , Ij(Lini) |})) : 1 ≤ i ≤ k |}) = fd({| fpi([αi; αi], fci({| ⊕k_{j≥0} Ij(Li1), . . . , ⊕k_{j≥0} Ij(Lini) |})) : 1 ≤ i ≤ k |}) = fd({| fpi([αi; αi], fci({| Ī(Li1), . . . , Ī(Lini) |})) : 1 ≤ i ≤ k |}). Therefore, Equation (1) holds and, thus, TP is continuous.
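As an illustration of Definitions 2-4, the following sketch implements the interval bilattice over [0,1] and the extended operator TP for the rule encoding introduced after Section 2. It reflects our reading of the definitions (with fc = ⊗ and fd = ⊕ fixed, as in the examples) and is not code from the paper; the interpretation I is assumed to be a dict over all atoms of BP.

```python
def join_t(x, y):  # ⊕_t: componentwise join of intervals (a, b)
    return (max(x[0], y[0]), max(x[1], y[1]))

def join_k(x, y):  # ⊕_k: the more precise common interval
    return (max(x[0], y[0]), min(x[1], y[1]))

def meet_k(x, y):  # ⊗_k: the least precise hull of the two intervals
    return (min(x[0], y[0]), max(x[1], y[1]))

def neg_i(x):      # ¬[a; b] = [¬b; ¬a]
    return (1.0 - x[1], 1.0 - x[0])

def lit_value(I, lit):
    tag, v = lit
    if tag == "val":
        return (v, v)                 # certainty values are exact intervals
    return I[v] if tag == "pos" else neg_i(I[v])

def T_P(P, I):
    """Immediate consequence operator of Definition 4; atoms without an
    applicable rule get f = [0; 0], as property (iv) requires."""
    out = {A: (0.0, 0.0) for A in I}
    for head, alpha, body, fp in P:
        vals = [lit_value(I, L) for L in body] or [(1.0, 1.0)]  # fc(∅) = ⊤
        conj = (min(v[0] for v in vals), min(v[1] for v in vals))
        out[head] = join_t(out[head], (fp(alpha, conj[0]), fp(alpha, conj[1])))
    return out
```

Iterating T_P from the ⪯k-least interpretation, which maps every atom to (0.0, 1.0), yields the Kripke-Kleene semantics discussed next.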
4 Semantics of Normal Logic Programs
Usually, the semantics of a normal logic program is the least model of the program w.r.t. the knowledge ordering. That model always exists and coincides with the least fixed-point of TP with respect to ⪯k (which exists as TP is monotonic w.r.t. ⪯k). Note that this least model with respect to ⪯k corresponds to an extension of the classical Kripke-Kleene semantics [4] of Datalog programs with negation to normal parametric programs: if we restrict our attention to Datalog with negation, then we have to deal with four values [f; f], [t; t], [f; t] and [t; f] that correspond to the truth values false, true, unknown and inconsistent, respectively. Then our bilattice coincides with Belnap's logic [1] and, for any Datalog program with negation P, the least fixed-point of TP w.r.t. ⪯k is a model of P that coincides with the Kripke-Kleene semantics of P. To illustrate the different notions introduced in the paper, we rely on Example 3.

Example 3 (Running example). The certainty lattice is L[0,1] and the np-program is

P = {(A ←^1 B, 0.6; ⟨⊕, ⊗, ⊗⟩), (B ←^1 B; ⟨⊕, ⊗, ⊗⟩), (A ←^1 0.3; ⟨⊕, ⊗, ⊗⟩)}. □
For ease of presentation, we represent an interpretation as a set of expressions of the form A: [x; y], where A is a ground atom, indicating that I(A) = [x; y]. E.g., the following sequence of interpretations I0, I1, I2 shows how the Kripke-Kleene semantics, KKP, of the running Example 3 is computed (as the iterated fixed-point of TP, starting from I0 = I⊥, the ⪯k-minimal interpretation that maps any A ∈ BP to [⊥; ⊤], with In+1 = TP(In)): I0 = {A: [0; 1], B: [0; 1]}, I1 = {A: [0.3; 0.6], B: [0; 1]}, I2 = I1 = KKP. In that model, which is minimal w.r.t. ⪯k and contains only the knowledge provided by P, the certainty of B lies between 0 and 1, i.e. is unknown, and the certainty of A then lies between 0.3 and 0.6.

As is well known, that semantics is usually considered too weak. We propose to use the Closed World Assumption (CWA) to complete our knowledge (the CWA assumes that all atoms whose value cannot be inferred from the program are false by default). This is done by defining the notion of support, introduced
in [12], of a program w.r.t. an interpretation. Given a program P and an interpretation I, the support of P w.r.t. I, denoted CP(I), determines in a principled way how much false knowledge, i.e. how much knowledge provided by the CWA, can 'safely' be joined to I w.r.t. the program P. Roughly speaking, a part of the CWA is an interpretation J such that J ⪯k If, where If maps any A ∈ BP to [⊥; ⊥], and we consider that such an interpretation can be safely added to I if J ⪯k TP(I ⊕k J), i.e. if J does not contradict the knowledge represented by P and I.

Definition 5. The support of an np-program P w.r.t. an interpretation I, denoted CP(I), is the maximal interpretation J w.r.t. ⪯k such that J ⪯k If and J ⪯k TP(I ⊕k J).

It is easy to note that CP(I) = ⊕k{J | J ⪯k If and J ⪯k TP(I ⊕k J)}. The following theorem provides an algorithm for computing the support.

Theorem 3. CP(I) coincides with the iterated fixpoint of the function FP,I, beginning the computation with If, where FP,I(J) = If ⊗k TP(I ⊕k J).

From Theorems 1 and 2, it can be shown that FP,I is monotone and, if the de Morgan laws hold, continuous w.r.t. ⪯k. It follows that the iteration of the function FP,I starting from If decreases w.r.t. ⪯k. We will refer to CP as the closed world operator.

Corollary 1. Let P be an np-program. The closed world operator CP is monotone and, if the de Morgan laws hold, continuous w.r.t. the knowledge order ⪯k.

The following sequence of interpretations J0, J1, J2 shows the computation of CP(KKP), i.e. the additional knowledge that can be considered using the CWA on the Kripke-Kleene semantics KKP of the running Example 3 (I = KKP, J0 = If and Jn+1 = FP,I(Jn)): J0 = {A: [0; 0], B: [0; 0]}, J1 = {A: [0; 0.3], B: [0; 0]}, J2 = J1 = CP(KKP). CP(KKP) asserts that, according to the CWA and w.r.t. P and KKP, the certainty of A should be at most 0.3, while that of B is exactly 0.

We now have two ways to infer information from an np-program P and an approximate interpretation I: using TP and using CP. To maximize the knowledge derived from P and the CWA, but without introducing any other extra knowledge, we propose to choose as the semantics of P the least model of P containing its own support, i.e. the least model that cannot be completed any further according to the CWA. This consideration leads to the following epistemic definition of the semantics of a program P.

Definition 6. The approximate well-founded semantics of an np-program P, denoted WP, is the least model I of P w.r.t. ⪯k such that CP(I) ⪯k I.

Now we provide a fixpoint characterization and, thus, a way of computing the approximate well-founded semantics. It is based on an operator, called the approximate well-founded operator, that combines the two operators defined above. Given an interpretation I, we complete it with its support provided by the CWA, and then activate the rules of the program on the obtained interpretation using the immediate consequence operator.

Definition 7. Let P be an np-program. The approximate well-founded operator, denoted AWP, takes as input an approximate interpretation I ∈ CP and returns AWP(I) ∈ CP, defined by AWP(I) = TP(I ⊕k CP(I)).
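Continuing the sketches above, the support can be computed by iterating FP,I as in Theorem 3, and the approximate well-founded operator then arises by composition. This is our own rendering of Definitions 5 and 7 (reusing T_P, join_k, meet_k and meet from the earlier sketches); on the running example the iteration reproduces the interpretations listed in the text.

```python
def F(P, I, J):   # F_{P,I}(J) = I_f ⊗_k T_P(I ⊕_k J), with I_f = [0; 0] everywhere
    T = T_P(P, {A: join_k(I[A], J[A]) for A in I})
    return {A: meet_k((0.0, 0.0), T[A]) for A in T}

def support(P, I):                       # C_P(I), computed as in Theorem 3
    J = {A: (0.0, 0.0) for A in I}       # start the iteration from I_f
    while (J2 := F(P, I, J)) != J:
        J = J2
    return J

def AW(P, I):                            # AW_P(I) = T_P(I ⊕_k C_P(I))
    C = support(P, I)
    return T_P(P, {A: join_k(I[A], C[A]) for A in I})

# Running example (Example 3): A <-1 B, 0.6; B <-1 B; A <-1 0.3.
P3 = [("A", 1.0, (("pos", "B"), ("val", 0.6)), meet),
      ("B", 1.0, (("pos", "B"),), meet),
      ("A", 1.0, (("val", 0.3),), meet)]
I = {"A": (0.0, 1.0), "B": (0.0, 1.0)}   # the k-least interpretation I_⊥
I = AW(P3, I); I = AW(P3, I)
print(I)  # {'A': (0.3, 0.3), 'B': (0.0, 0.0)} — the interpretation W_P of the text
```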
From [12], the following theorems can be shown.

Theorem 4. Let P be an np-program. Any fixed-point I of AWP is a model of P.

Using the monotonicity and continuity of TP and CP w.r.t. the knowledge order ⪯k over CP, and the fact that CP is a complete lattice w.r.t. ⪯k, it follows by the well-known Knaster-Tarski theorem that:

Theorem 5. Let P be an np-program. The approximate well-founded operator AWP is monotone and, if the de Morgan laws hold, continuous w.r.t. the knowledge order ⪯k. Therefore, AWP has a least fixed-point w.r.t. the knowledge order ⪯k. Moreover, that least fixpoint coincides with the approximate well-founded semantics WP of P.

The following sequence of interpretations shows the computation of WP for Example 3 (I0 = I⊥ and In+1 = AWP(In)). The certainty of A is 0.3 and the certainty of B is 0. Note that KKP ⪯k WP, i.e. the well-founded semantics contains more knowledge than the Kripke-Kleene semantics that was completed with some default knowledge from the CWA. I0 = {A: [0; 1], B: [0; 1]}, CP(I0) = {A: [0; 0.3], B: [0; 0]}, I1 = {A: [0.3; 0.3], B: [0; 0]}, CP(I1) = {A: [0; 0.3], B: [0; 0]}, I2 = I1 = WP.

Example 4. Consider the program P = R ∪ F given in Example 1. The computation of the approximate well-founded semantics WP of P gives the following result⁴: WP = {R(J): [0.64; 0.7], S(J): [0.8; 0.8], Y(J): [0; 0], G(J): [0.3; 0.36], E(J): [0.7; 0.7]}, which establishes that John's risk degree lies in the interval [0.64; 0.7]. □

Finally, our approach captures and extends the usual semantics of logic programs.

Theorem 6. If we restrict our attention to PDDU, then for any program P the approximate well-founded semantics WP assigns exact values to all atoms and coincides with the semantics of P proposed in [10].

Theorem 7. If we restrict our attention to Datalog with negation, then we have to deal with Belnap's bilattice [1] and, for any Datalog program with negation P, (i) any stable model [5] of P is a fixpoint of AWP, and (ii) the approximate well-founded semantics WP coincides with the well-founded semantics of P [18].
5 Conclusions
We present a novel characterization, both epistemic and operational, of the well-founded semantics in PDDU [10], a unifying umbrella for many existing approaches towards the manipulation of uncertainty in logic programs, and we extend it with non-monotonic (default) negation. The main features of our extension are: (i) dealing with uncertain and incomplete knowledge, atoms are assigned approximations of uncertainty values; (ii) the CWA is used to complete the knowledge, so as to infer approximations as precise as possible, relying on a natural management of negation; (iii) the continuity of the immediate consequence operator is preserved (a major feature of the classical PDDU framework); and (iv) our approach extends to PDDU with negation not only the semantics proposed in [10] for PDDU, but also the usual semantics of Datalog with negation: the well-founded semantics and the Kripke-Kleene semantics.
⁴ For ease of presentation, we use the first letter of predicates and constants only.
References

1. N. D. Belnap. How a computer should think. In Gilbert Ryle, editor, Contemporary Aspects of Philosophy, pages 30–56. Oriel Press, Stocksfield, GB, 1977.
2. D. Dubois, J. Lang, and H. Prade. Towards possibilistic logic programming. In Proc. of the 8th Int. Conf. on Logic Programming (ICLP-91), pages 581–595, 1991.
3. M. Fitting. The family of stable models. J. of Logic Programming, 17:197–225, 1993.
4. M. Fitting. A Kripke-Kleene semantics for general logic programs. J. of Logic Programming, 2:295–312, 1985.
5. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proc. of the 5th Int. Conf. on Logic Programming, pages 1070–1080, 1988.
6. M. L. Ginsberg. Multi-valued logics: a uniform approach to reasoning in artificial intelligence. Computational Intelligence, 4:265–316, 1988.
7. M. Kifer and A. Li. On the semantics of rule-based expert systems with uncertainty. In Proc. of the Int. Conf. on Database Theory (ICDT-88), LNCS 326, pages 102–117, 1988.
8. M. Kifer and V.S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. of Logic Programming, 12:335–367, 1992.
9. L. V.S. Lakshmanan and N. Shiri. Probabilistic deductive databases. In Int. Logic Programming Symposium, pages 254–268, 1994.
10. L. V.S. Lakshmanan and N. Shiri. A parametric approach to deductive databases with uncertainty. IEEE Transactions on Knowledge and Data Engineering, 13(4):554–570, 2001.
11. Y. Loyer and U. Straccia. The well-founded semantics in normal logic programs with uncertainty. In Proc. of the 6th Int. Symposium on Functional and Logic Programming (FLOPS-2002), LNCS 2441, pages 152–166, 2002.
12. Y. Loyer and U. Straccia. The well-founded semantics of logic programs over bilattices: an alternative characterisation. Technical Report ISTI-2003-TR-05, Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy, 2003. Submitted.
13. T. Lukasiewicz. Fixpoint characterizations for many-valued disjunctive logic programs with probabilistic semantics. LNCS 2173, pages 336–350, 2001.
14. R. Ng and V.S. Subrahmanian. Stable model semantics for probabilistic deductive databases. In Proc. of the 6th Int. Symposium on Methodologies for Intelligent Systems (ISMIS-91), LNAI 542, pages 163–171, 1991.
15. R. Ng and V.S. Subrahmanian. Probabilistic logic programming. Information and Computation, 101(2):150–201, 1993.
16. E.Y. Shapiro. Logic programs with uncertainties: a tool for implementing rule-based systems. In Proc. of the 8th Int. Joint Conf. on Artificial Intelligence (IJCAI-83), pages 529–532, 1983.
17. M.H. van Emden. Quantitative deduction and its fixpoint theory. J. of Logic Programming, 4(1):37–53, 1986.
18. A. van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. J. of the ACM, 38(3):620–650, January 1991.
19. G. Wagner. Negation in fuzzy and possibilistic logic programs. In T. Martin and F. Arcelli, editors, Logic Programming and Soft Computing. Research Studies Press, 1998.
Which Is the Worst-Case Nash Equilibrium?

Thomas Lücking¹, Marios Mavronicolas², Burkhard Monien¹, Manuel Rode¹,⋆, Paul Spirakis³,⁴, and Imrich Vrťo⁵

¹ Faculty of Computer Science, Electrical Engineering and Mathematics, University of Paderborn, Fürstenallee 11, 33102 Paderborn, Germany. {luck,bm,rode}@uni-paderborn.de
² Department of Computer Science, University of Cyprus, P. O. Box 20537, Nicosia CY-1678, Cyprus. [email protected]
³ Computer Technology Institute, P. O. Box 1122, 261 10 Patras, Greece. [email protected]
⁴ Department of Computer Engineering and Informatics, University of Patras, Rion, 265 00 Patras, Greece
⁵ Institute of Mathematics, Slovak Academy of Sciences, Dúbravská 9, 841 04 Bratislava 4, Slovak Republic. [email protected]
Abstract. A Nash equilibrium of a routing network represents a stable state of the network where no user finds it beneficial to unilaterally deviate from its routing strategy. In this work, we investigate the structure of such equilibria within the context of a certain game that models selfish routing for a set of n users each shipping its traffic over a network consisting of m parallel links. In particular, we are interested in identifying the worst-case Nash equilibrium – the one that maximizes social cost. Worst-case Nash equilibria were first introduced and studied in the pioneering work of Koutsoupias and Papadimitriou [9]. More specifically, we continue the study of the Conjecture of the Fully Mixed Nash Equilibrium, henceforth abbreviated as FMNE Conjecture, which asserts that the fully mixed Nash equilibrium, when existing, is the worst-case Nash equilibrium. (In the fully mixed Nash equilibrium, the mixed strategy of each user assigns (strictly) positive probability to every link.) We report substantial progress towards identifying the validity, methodologies to establish, and limitations of, the FMNE Conjecture.
1 Introduction
Motivation and Framework. Nash equilibrium [12,13] is arguably the most important solution concept in (non-cooperative) Game Theory¹. It represents
This work has been partially supported by the IST Program of the European Union under contract numbers IST-1999-14186 (ALCOM-FT) and IST-2001-33116 (FLAGS), by funds from the Joint Program of Scientific and Technological Collaboration between Greece and Cyprus, by research funds at the University of Cyprus, and by VEGA grant No. 2/3164/23.
⋆ Graduate School of Dynamic Intelligent Systems.
¹ See [14] for a concise introduction to contemporary Game Theory.
a stable state of the play of a strategic game in which each player holds an accurate opinion about the (expected) behavior of other players and acts rationally. Understanding the combinatorial structure of Nash equilibria is a necessary prerequisite to either designing efficient algorithms to compute them, or establishing corresponding hardness and thereby designing (efficient) approximation algorithms².

² Computation of Nash equilibria has long been observed to be a very challenging, yet notoriously hard algorithmic problem; see [15] for an advocation.

In this work, we embark on a systematic study of the combinatorial structure of Nash equilibria in the context of a simple routing game that models selfish routing over a non-cooperative network such as the Internet. This game was originally introduced in a pioneering work of Koutsoupias and Papadimitriou [9]; that work defined coordination ratio (also known as price of anarchy [15]) as a worst-case measure of the impact of the selfish behavior of users on the efficiency of routing over a non-cooperative network operating at a Nash equilibrium. As a worst-case measure, the coordination ratio bounds the maximum loss of efficiency due to selfish behavior of users at the worst-case Nash equilibrium; in sharp contrast, the principal motivation of our work is to identify the actual worst-case Nash equilibrium of the selfish routing game.

Within the framework of the selfish routing game of Koutsoupias and Papadimitriou [9], we assume a collection of n users, each employing a mixed strategy, which is a probability distribution over m parallel links, to control the shipping of its own assigned traffic. For each link, a capacity specifies the rate at which the link processes traffic. In a Nash equilibrium, each user selfishly routes its traffic on those links that minimize its expected latency cost, given the network congestion caused by the other users. The social cost of a Nash equilibrium is the expectation, over all random choices of the users, of the maximum, over all links, latency through a link. The worst-case Nash equilibrium is one that maximizes social cost. Our study distinguishes between pure Nash equilibria, where each user chooses exactly one link (with probability one), and mixed Nash equilibria, where the choices of each user are modeled by a probability distribution over links. Of special interest to our work is the fully mixed Nash equilibrium [10], where each user chooses each link with non-zero probability; henceforth, denote F the fully mixed Nash equilibrium. We will also introduce and study disjointly mixed Nash equilibria, where (loosely speaking) mixed strategies of different users do not intersect.

Allowing link capacities to vary arbitrarily gives rise to the standard model of related links, also known as the model of uniform links in the scheduling literature (cf. Gonzales et al. [5]); the name is due to the fact that the order of the delays a user experiences on each of the links is the same across all users. A special case of the model of related links is the model of identical links, where all link capacities are equal (cf. Graham [6]); thus, in this model, each user incurs the same delay on all links. We also consider the model of unrelated links, where, instead of associating a traffic and a capacity with each user and link, respectively, we assign a delay for each pair of a user and a link in an arbitrary way (cf. Horowitz and Sahni [7]); thus, in the unrelated links model, there is no relation between the delays incurred by a user on different links. Reciprocally, in the model of identical traffics, all user traffics are equal; they may vary arbitrarily in the model of arbitrary traffics. We are interested in understanding the impact of model assumptions on links and users on the patterns of the worst-case Nash equilibria for the selfish routing game we consider.

Results and Contribution. In this work, we embark on a systematic study of a natural conjecture due to Gairing et al. [4], which asserts that the fully mixed Nash equilibrium is the worst-case Nash equilibrium (with respect to social cost).

Fully Mixed Nash Equilibrium Conjecture [4]. Consider the model of arbitrary traffics and related links. Then, for any traffic vector w such that the fully mixed Nash equilibrium F exists, and for any Nash equilibrium P, SC(w, P) ≤ SC(w, F).

Henceforth, abbreviate the Fully Mixed Nash Equilibrium Conjecture as the FMNE Conjecture. Our study reports substantial progress towards the settlement of the FMNE Conjecture:

– We prove the FMNE Conjecture for several interesting special cases of it (within the model of related links).
– In doing so, we provide proof techniques and tools which, while so far applied to interesting special cases, may suffice for the general case as well.
– We reveal limitations of the FMNE Conjecture by establishing that it is not, in general, valid over the model of unrelated links; we present both positive and negative instances for the conjecture.

Related Work, Comparison and Significance. The selfish routing game considered in this paper was first introduced and studied in the pioneering work of Koutsoupias and Papadimitriou [9]. This game was subsequently studied in the work of Mavronicolas and Spirakis [10], where fully mixed Nash equilibria were introduced and analyzed. Both works focused mainly on proving bounds on the coordination ratio. Subsequent works that provided bounds on the coordination ratio include [1,2,8]. The work of Fotakis et al. [3] was the first to study the combinatorial structure and the computational complexity of Nash equilibria for the selfish routing game we consider; that work was subsequently extended by Gairing et al. [4]. (See details below.) The closest to our work are the one by Fotakis et al. [3] and the one by Gairing et al. [4].

– The FMNE Conjecture has been inspired by two results due to Fotakis et al. [3] that confirm or support the conjecture. First, Fotakis et al. [3, Theorem 6] establish the Fully Mixed Nash Equilibrium Conjecture for the model of identical links and assuming that n = 2; Theorem 3 in this work extends this
result to the model of related links, still assuming that n = 2 while assuming, in addition, that traffics are identical. Second, Fotakis et al. [3, Theorem 7] prove that, for the model of related links and of identical traffics, the social cost of any Nash equilibrium is no more than 49.02 times the social cost of the fully mixed Nash equilibrium.

– The FMNE Conjecture was explicitly stated in the work of Gairing et al. [4, Conjecture 1.1]. In the same paper, two results are shown that confirm or support the conjecture. First, Gairing et al. [4, Theorem 4.2] establish the validity of the FMNE Conjecture when restricted to pure Nash equilibria. Second, Gairing et al. [4, Theorem 5.1] prove that, for the model of identical links, the social cost of any Nash equilibrium is no more than 6 + ε times the social cost of the fully mixed Nash equilibrium, for any constant ε > 0. (Note that since this result does not assume identical traffics, it is incomparable to the related result by Fotakis et al. [3, Theorem 7] (for the model of related links), which does.)

The ultimate settlement of the FMNE Conjecture (for the model of related links) may reveal an interesting complexity-theoretic contrast between the worst-case pure and the worst-case mixed Nash equilibria. On the one hand, identifying the worst-case pure Nash equilibrium is an NP-hard problem [3, Theorem 4]; on the other hand, if the FMNE Conjecture is valid, identification of the worst-case mixed Nash equilibrium is immediate in the cases where the fully mixed Nash equilibrium exists. (In addition, the characterization of the fully mixed Nash equilibrium shown in [10, Theorem 14] implies that such existence can be checked in polynomial time.)

Road Map. The rest of this paper is organized as follows. Section 2 presents our definitions and some preliminaries. The case of disjointly mixed Nash equilibria is treated in Section 3. Section 4 considers the case of identical traffics and related links with n = 2. The reciprocal case of identical traffics and identical links with m = 2 is studied in Section 5. Section 6 examines the case of unrelated links. We conclude, in Section 7, with a discussion of our results and some open problems.
2 Framework
Most of our definitions are patterned after those in [10, Section 2], [3, Section 2] and [4, Section 2], which, in turn, were based on those in [9, Sections 1 & 2]. Mathematical Preliminaries and Notation. Throughout, denote for any integer m ≥ 2, [m] = {1, . . . , m}. For a random variable X, denote E(X) the expectation of X. General. We consider a network consisting of a set of m parallel links 1, 2, . . . , m from a source node to a destination node. Each of n network users 1, 2, . . . , n, or users for short, wishes to route a particular amount of traffic along a (non-fixed) link from source to destination. (Throughout, we will be using subscripts for users and superscripts for links.) In the model of related links, denote wi the
traffic of user i ∈ [n], and W = Σ_{i∈[n]} w_i. Define the n × 1 traffic vector w in the natural way. Assume throughout that m > 1 and n > 1. Assume also, without loss of generality, that w_1 ≥ w_2 ≥ . . . ≥ w_n. In the model of unrelated links, denote C_i^j the cost of user i ∈ [n] on link j ∈ [m]. Define the n × m cost matrix C in the natural way. A pure strategy for user i ∈ [n] is some specific link. A mixed strategy for user i ∈ [n] is a probability distribution over pure strategies; thus, a mixed strategy is a probability distribution over the set of links. The support of the mixed strategy for user i ∈ [n], denoted support(i), is the set of those pure strategies (links) to which i assigns positive probability. A pure strategy profile is represented by an n-tuple ⟨ℓ_1, ℓ_2, . . . , ℓ_n⟩ ∈ [m]^n; a mixed strategy profile is represented by an n × m probability matrix P of nm probabilities p_i^j, i ∈ [n] and j ∈ [m], where p_i^j is the probability that user i chooses link j. For a probability matrix P, define indicator variables I_i^j ∈ {0, 1}, where i ∈ [n] and j ∈ [m], such that I_i^j = 1 if and only if p_i^j > 0. Thus, the support of the mixed strategy for user i ∈ [n] is the set {j ∈ [m] | I_i^j = 1}. For each link j ∈ [m], define the view of link j, denoted view(j), as the set of users i ∈ [n] that potentially assign their traffics to link j; so, view(j) = {i ∈ [n] | I_i^j = 1}. For each link j ∈ [m], denote V^j = |view(j)|.

Syntactic Classes of Mixed Strategies. A mixed strategy profile P is disjointly mixed if for all links j ∈ [m], |{i ∈ view(j) : p_i^j < 1}| ≤ 1, that is, there is at most one non-pure user on each link. A mixed strategy profile P is fully mixed [10, Section 2.2] if for all users i ∈ [n] and links j ∈ [m], I_i^j = 1.³ Throughout, we will cast a pure strategy profile as a special case of a mixed strategy profile in which all (mixed) strategies are pure.

System, Models and Cost Measures. In the model of related links, denote c^ℓ > 0 the capacity of link ℓ ∈ [m], representing the rate at which the link processes traffic, and C = Σ_{l∈[m]} c^l. So, the latency for traffic w through link ℓ equals w/c^ℓ. In the model of identical capacities, all link capacities are equal to c, for some constant c > 0; link capacities may vary arbitrarily in the model of arbitrary capacities. Assume throughout, without loss of generality, that c^1 ≥ c^2 ≥ . . . ≥ c^m. In the model of identical traffics, all user traffics are equal to 1; user traffics may vary arbitrarily in the model of arbitrary traffics. For a pure strategy profile ⟨ℓ_1, ℓ_2, . . . , ℓ_n⟩, the latency cost for user i ∈ [n], denoted λ_i, is the latency cost of the link it chooses, that is, (Σ_{k: ℓ_k = ℓ_i} w_k)/c^{ℓ_i}. For a mixed strategy profile P, denote δ^ℓ the actual traffic on link ℓ ∈ [m]; so, δ^ℓ is a random variable. For each link ℓ ∈ [m], denote θ^ℓ the expected traffic on link ℓ; thus, θ^ℓ = E(δ^ℓ) = Σ_{i=1}^n p_i^ℓ w_i. For a mixed strategy profile P, the expected latency cost for user i ∈ [n] on link ℓ ∈ [m], denoted λ_i^ℓ, is the expectation, over all random choices of the remaining users, of the latency cost for user i had its traffic been assigned to link ℓ; thus,
An earlier treatment of fully mixed strategies in the context of bimatrix games has been found in [16], called there completely mixed strategies. See also [11] for a subsequent treatment in the context of strategically zero-sum games.
556
Thomas L¨ ucking et al.
λi
=
wi +
k=1,k=i c
pk wk
=
(1 − pi )wi + θ . c
For each user i ∈ [n], the minimum expected latency cost, denoted λi , is the minimum, over all links ∈ [m], of the expected latency cost for user i on link ; thus, λi = min∈[m] λi . Associated with a traffic vector w and a mixed strategy profile P is the social cost [9, Section 2], denoted SC(w, P), which is the expectation, over all random choices of the users, of the maximum (over all links) latency of traffic through a link; thus,
SC(w, P) = E
max
∈[m]
k:k = c
wk
=
1 ,2 ,...,n ∈[m]n
n k=1
pkk
· max
∈[m]
k:k = c
wk
.
Note that SC (w, P) reduces to the maximum latency through a link in the case of pure strategies. On the other hand, the social optimum [9, Section 2] associated with a traffic vector w, denoted OPT(w), is the least possible maximum (over all links) latency of traffic through a link. Note that while SC(w, P) is defined in relation to a mixed strategy profile P, OPT(w) refers to the optimum pure strategy profile. In the model of unrelated links, the latency of user i on link l is its cost . expected latency cost of user i on link l translates to λli = Cil + C il Thus, the l on C and the strategy profile k=1,k=i pi Ckl , and the social cost, now depending n lk P, is defined by SC(C, P) = l1 ,l2 ,...,ln ∈[m]n k:lk =l Ckl . k=1 pk · maxl∈[m] Nash Equilibria. We are interested in a special class of mixed strategies called Nash equilibria [13] that we describe below. Formally, the probability matrix P is a Nash equilibrium [9, Section 2] if for all users i ∈ [n] and links ∈ [m], λi = λi if Ii = 1, and λi ≥ λi if Ii = 0. Thus, each user assigns its traffic with positive probability only on links for which its expected latency cost is minimized; this implies that there is no incentive for a user to unilaterally deviate from its mixed strategy in order to avoid links on which its expected latency cost is higher than necessary. The coordination ratio [9] is the maximum value, over all traffic vectors w and Nash equilibria P of the ratio SC (w, P) /OPT (w). In the model of unrelated links, the coordination ratio translates to the maximum value of SC (C, P) /OPT (C). Mavronicolas and Spirakis [10, Lemma 15] show that in the model of identical links, all links are equiprobable in a fully mixed Nash equilibrium. Lemma 1 (Mavronicolas and Spirakis [10]). Consider the fully mixed case under the model of identical capacities. Then, there exists a unique Nash equilibrium with associated Nash probabilities pi = 1/m, for any user i ∈ [n] and link ∈ [m]. Gairing et al. [4, Lemma 4.1] show that in the model of related links, the minimum expected latency cost of any user i ∈ [n] in a Nash equilibrium P is bounded by its minimum expected latency cost in the fully mixed Nash equilibrium F.
Which Is the Worst-Case Nash Equilibrium?
557
Lemma 2 (Gairing et al. [4]). Fix any traffic vector w, mixed Nash equilibrium P and user i. Then, λi (w, P) ≤ λi (w, F).
3
Disjointly Mixed versus Fully Mixed Nash Equilibria
In this section, we restrict ourselves to the case of disjointly mixed Nash equilibria, and we establish the FMNE Conjecture for this case. We prove: Theorem 1. Fix any traffic vector w such that F exists, and any disjointly mixed Nash equilibrium P. Then, SC (w, P) ≤ SC (w, F). Corollary 1. Consider the model of related links, and assume that n = 2 and m = 2. Then, the FMNE Conjecture is valid.
4
Identical Traffics, Related Links and n = 2
In this section we restrict to 2 users with identical traffics, that is, w1 = w2 . Without loss of generality we assume w1 = w2 = 1 and c1 ≥ · · · ≥ cm . In the following, we denote by support(1) and support(2) the supports of user 1 and 2, respectively, and by pji and fij the probabilities for user i to choose link j in P and F, respectively. Since we consider two users with identical traffics, we have f1j = f2j for all j ∈ [m], and we write f j = fij . In order to prove the FMNE Conjecture for this type of Nash equilibria we will use the following formula for the social cost of any Nash equilibrium P in this setting. Theorem 2. In case of two users with identical traffics on m related links, the social cost of any Nash equilibrium P is
1 1 i j SC(w, P) = λ2 (P) + p2 p1 − i . cj c 1≤i<j≤m
We now show that we only have to consider Nash equilibria P of certain structure. Lemma 3. For any Nash equilibrium P = F of two users with identical traffics on m related links the following holds: 1. The supports of the two users are support(1) = [r] ∪ I1
and
support(2) = [r] ∪ I2 ,
where I1 , I2 are disjoint sets of links not containing a link i ∈ [r], such that [r] ∪ I1 ∪ I2 = [r + |I1 | + |I2 |]. 2. All links in I1 (I2 ) have the same capacity.
558
Thomas L¨ ucking et al.
In order to prove the FMNE Conjecture for two users with identical traffics on m related links in Theorem 3, we show that the following lemma holds. Lemma 4. Let G be the fully mixed Nash equilibrium of two users with identical traffics on m related links with capacities c1 ≥ . . . ≥ cm . Furthermore, let the last s ≥ 1 links have the same capacity, and let F be the fully mixed Nash equilibrium of the instance received by increasing the capacities of the last s links to cm−s . Then SC(w, F) ≤ SC(w, G). Theorem 3. Consider the model of identical traffics and related links, and assume that n = 2. Then, the FMNE Conjecture is valid.
5
Identical Traffics, Identical Links and m = 2
We show: Theorem 4. Consider the model of identical traffics and identical links, and assume that m = 2 and n is even. Then, the FMNE Conjecture is valid. Proof. Since both the traffics and the link capacities are identical, we can assume without loss of generality that wi = 1 for all i ∈ [n] and cj = 1 for all j ∈ [m]. Recall that in the case of identical capacities, the fully mixed Nash equilibrium F exists always (that is, for all traffic vectors w). Hence, we will show that for any other Nash equilibrium P, SC (w, P) ≤ SC (w, F). Fix any Nash equilibrium P. We can identify three sets of users in P: U1 = {i : support(i) = {1}}, U2 = {i : support(i) = {2}} and U12 = {i : support(i) = {1, 2}}. There are u = min(|U1 |, |U2 |) (pure) users, which choose link 1 and link 2, respectively, with probability 1. Therefore, SC(w, P) = SC(w, P ) + u, where P is the Nash equilibrium derived from P by omitting those 2u users. We will show, that SC (w, F ) ≥ SC (w, P ) for the fully mixed Nash equilibrium F of n − 2u users. As SC (w, F) > SC (w, F ) + 2u (Lemma 5), this will prove the theorem. Without loss of generality, we can assume that P is of the following form: r (pure) users go on link 1 with probability 1, and n − r users choose both links with positive probability. We write Pr for this kind of Nash equilibrium. Lemma 5. For the fully mixed Nash equilibrium F,
n n−1 n SC (w, F) = + n n . 2 2 2 −1 Lemma 6. For the Nash equilibrium Pr with two sets of users U1 = {i : support(i) = {1}} and U12 = {i : support(i) = {1, 2}} with |U1 | = r < n and |U12 | = n − r the Nash probabilities are p := p1i =
r 1 − , 2 2(n − r − 1)
and
q := p2i =
for all users i ∈ U12 . Furthermore, n > 2r + 1 holds.
r 1 + , 2 2(n − r − 1)
Which Is the Worst-Case Nash Equilibrium?
559
Lemma 7. The social cost of the Nash equilibrium Pr is given by n SC (w, Pr ) = 2 +
n−r n −r 2
n−r i= n +1 2
i·
p
n −r 2
q
n 2
+
n i= n +1 2
i·
n−r i−r
pi−r q n−i
n − r n−r−i i p q . i
The proof is completed by showing that ∆ := SC (w, F) − SC (w, Pr ) ≥ 0.
6
Unrelated Links
In this section, we consider the case of unrelated links. We prove Proposition 1. Consider the model of unrelated links. Fix any cost matrix C for which F exists, and a pure Nash equilibrium P. Assume that n ≤ m. Then, for any user i, λi (P) < λi (F). Theorem 5. Consider the model of unrelated links. Assume that n ≤ m. Consider any cost matrix C such that the fully mixed Nash equilibrium F exists, and any pure Nash equilibrium P. Then, SC (C, P) ≤ SC (C, F). Proof. Clearly, the social cost of any pure Nash equilibrium P is equal to the selfish cost of some user, while the social cost of a fully mixed Nash equilibrium F is at least the selfish cost of any user. Hence, Proposition 1 implies the claim. Proposition 2. Consider the model of unrelated links. Assume that n = 2. Fix any cost matrix C for which F exists, and any Nash equilibrium P. Then, for any user i ∈ [2], λi (P) ≤ λi (F). Theorem 6. Consider the model of unrelated links. Assume that n = 2 and m = 2. Then, the FMNE Conjecture is valid. We remark that Theorem 6 generalizes Corollary 1 to the case of unrelated links. We finally prove: Theorem 7 (Counterexample to the FMNE Conjecture). Consider the model of unrelated links. Then, the FMNE Conjecture is not valid even if n = 3 and m = 2.
7
Conclusion and Directions for Further Research
We have verified the FMNE Conjecture over several interesting restrictions of the selfish routing game we considered for the case of related links. We have also investigated the FMNE Conjecture in the case of unrelated links, for which we have identified instances of the game that validate and falsify the FMNE Conjecture, respectively. The most obvious problem left open by our work is to
560
Thomas L¨ ucking et al.
establish the FMNE Conjecture in its full generality for the case of related links. We hope that several of the combinatorial techniques introduced in this work for settling special cases of the conjecture may be handy for the general case. The FMNE Conjecture attempts to study a possible order on the set of Nash equilibria (for the specific selfish routing game we consider) that is defined with respect to their social costs; in the terminology of partially ordered sets, the FMNE Conjecture asserts that the fully mixed Nash equilibrium is a maximal element of the defined order. We feel that this order deserves further study. For example, what are the minimal elements of the order? More generally, is there a characterization of measures on Nash equilibria such that the fully mixed Nash equilibrium is a maximal element of the order defined with respect to any specific measure? (Our study considers the social cost as one such measure of interest.)
Acknowledgments We thank Rainer Feldmann and Martin Gairing for several helpful discussions.
References 1. A. Czumaj and B. V¨ ocking, “Tight Bounds for Worst-Case Equilibria”, Proceedings of the 13th Annual ACM Symposium on Discrete Algorithms, pp. 413–420, 2002. 2. R. Feldmann, M. Gairing, T. L¨ ucking, B. Monien and M. Rode, “Nashification and the Coordination Ratio for a Selfish Routing Game”, 30th International Colloquium on Automata, Languages and Programming, 2003. 3. D. Fotakis, S. Kontogiannis, E. Koutsoupias, M. Mavronicolas and P. Spirakis, “The Structure and Complexity of Nash Equilibria for a Selfish Routing Game,’ Proceedings of the 29th International Colloquium on Automata, Languages and Programming, LNCS 2380, pp. 123–134, 2002. 4. M. Gairing, T. L¨ ucking, M. Mavronicolas, B. Monien and P. Spirakis, “Extreme Nash Equilibria”, submitted for publication, March 2003. Also available as Technical Report FLAGS-TR-02-5, Computer Technology Institute, Patras, Greece, November 2002. 5. T. Gonzalez, O.H. Ibarra and S. Sahni, “Bounds for LPT schedules on uniform processors”, SIAM Journal on Computing, Vol. 6, No. 1, pp. 155–166, 1977. 6. R. L. Graham, “Bounds on Multiprocessing Timing Anomalies”, SIAM Journal on Applied Mathematics, Vol. 17, pp. 416–426, 1969. 7. E. Horowitz and S. Sahni, “Exact and aproximate algorithms for scheduling nonidentical processors”, Journal of the Association of Computing Machinery, Vol. 23, No. 2, pp. 317–327, 1976. 8. E. Koutsoupias, M. Mavronicolas and P. Spirakis, “Approximate Equilibria and Ball Fusion”, Proceedings of the 9th International Colloquium on Structural Information and Communication Complexity, 2002, accepted to Theory of Computing Systems. 9. E. Koutsoupias and C. H. Papadimitriou, “Worst-case Equilibria”, Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science, LNCS 1563, pp. 404–413, 1999.
Which Is the Worst-Case Nash Equilibrium?
561
10. M. Mavronicolas and P. Spirakis, “The Price of Selfish Routing”, Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 510–519, 2001. 11. H. Moulin and L. Vial, “Strategically Zero-Sum Games: The Class of Games whose Completely Mixed Equilibria Cannot be Improved Upon”, International Journal of Game Theory, Vol. 7, Nos. 3/4, pp. 201–221, 1978. 12. J. F. Nash, “Equilibrium Points in N -Person Games”, Proceedings of the National Academy of Sciences, Vol. 36, pp. 48–49, 1950. 13. J. F. Nash, “Non-cooperative Games”, Annals of Mathematics, Vol. 54, No. 2, pp. 286–295, 1951. 14. M. J. Osborne and A. Rubinstein, A Course in Game Theory, MIT Press, 1994. 15. C. H. Papadimitriou, “Algorithms, Games and the Internet”, Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 749–753, 2001. 16. T. E. S. Raghavan, “Completely Mixed Strategies in Bimatrix Games”, Journal of London Mathematical Society, Vol. 2, No. 2, pp. 709–712, 1970.
A Unique Decomposition Theorem for Ordered Monoids with Applications in Process Theory (Extended Abstract) Bas Luttik Dept. of Theoretical Computer Science, Vrije Universiteit Amsterdam De Boelelaan 1081a, NL-1081 HV Amsterdam, The Netherlands [email protected], http://www.cs.vu.nl/˜luttik
Abstract. We prove a unique decomposition theorem for a class of ordered commutative monoids. Then, we use our theorem to establish that every weakly normed process definable in ACPε with bounded communication can be expressed as the parallel composition of a multiset of weakly normed parallel prime processes in exactly one way.
1
Introduction
The Fundamental Theorem of Arithmetic states that every element of the commutative monoid of positive natural numbers under multiplication has a unique decomposition (i.e., can be expressed as a product of prime numbers uniquely determined up to the order of the primes). It has been an invaluable tool in number theory ever since the days of Euclid. In the realm of process theory, unique decomposability with respect to parallel composition is crucial in the proofs that bisimulation is decidable for normed BPP [5] and normed PA [8]. It also plays an important rˆ ole in the analysis of axiom systems involving an operation for parallel composition [1,6,12]. Milner and Moller [10] were the first to establish the unique decomposition property for a commutative monoid of finite processes with a simple operation for parallel composition. In [11], Moller presents an alternative proof of this result which he attributes to Milner; we shall henceforth refer to it as Milner’s technique. Moller explains that the reason for presenting Milner’s technique is that it serves “as a model for the proof of the same result in more complicated languages which evade the simpler proof method” of [10]. He refines Milner’s technique twice. First, he adds communication to the operational semantics of the parallel operator. Then, he turns from strong bisimulation semantics to weak bisimulation semantics. Christensen [4] shows how Milner’s technique can be further refined so that also certain infinite processes can be dealt with. He proves unique decomposition theorems for the commutative monoids of weakly normed BPP and of weakly normed BPPτ expressions modulo strong bisimulation. Milner’s technique hinges on some special properties of the operational semantics of parallel composition. The main contribution of this paper is to place these properties in a general algebraic context. Milner’s technique employs a well-founded subrelation of the transition relation induced on processes by the B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 562–571, 2003. c Springer-Verlag Berlin Heidelberg 2003
A Unique Decomposition Theorem for Ordered Monoids
563
operational semantics. We consider commutative monoids equipped with a wellfounded partial order (rather than an arbitrary well-founded relation) to tie in with the theory of ordered monoids as put forward, e.g., in [3,7]. In Section 2 we propose a few simple conditions on ordered commutative monoids, and we prove that they imply the unique decomposition property (Theorem 13). Then, to prove that a commutative monoid has the unique decomposition property, it suffices to define a partial order and establish that it satisfies our conditions. From Section 3 onwards, we illustrate this technique, discussing unique decomposability for the process theory ACPε [13]. ACPε is more expressive than any of the process theories for which unique decomposition was investigated previously. Firstly, it distinguishes two forms of termination (successful and unsuccessful). Secondly, it has a more general communication mechanism (an arbitrary number of parallel components may participate in a single communication, and communication not necessarily results in τ ). These two features make the extension of Milner’s technique to ACPε nontrivial; in fact, they both lead to counterexamples obstructing a general unique decomposition result (see Examples 16 and 19). In Section 4 we introduce for ACPε an appropriate notion of weak normedness that takes into account the distinction between successful and unsuccessful termination, and we propose a requirement on the communication mechanism. In Section 5 we prove that if the communication mechanism meets the requirement, then the commutative monoid of weakly normed ACPε expressions modulo bisimulation satisfies the abstract specification of Section 2, and hence admits a unique decomposition theorem. Whether or not a commutative monoid satisfies the conditions put forward in Section 2 is independent of the nature of its elements (be it natural numbers, bisimulation equivalence classes of process expressions, or objects of any other kind). Thus, in particular, our unique decomposition theorem for ordered monoids is independent of a syntax for specifying processes. We think that it will turn out to be a convenient tool for establishing unique decomposability results in a wide range of process theories, and for a wide range of process semantics. For instance, we intend to investigate next whether our theorem can be applied to establish unique decomposition results for commutative monoids of processes definable in ACPε modulo weak- and branching bisimulation, and of processes definable in the π-calculus modulo observation equivalence.
2
Unique Decomposition in Commutative p.o. Monoids
A positively ordered monoid (a p.o. monoid ) is a nonempty set M endowed with: (i) an associative binary operation ⊗ on M with an identity element ι ∈ M ; the operation ⊗ stands for composition and ι represents the empty composition; (ii) a partial order on M that is compatible with ⊗, i.e., x y implies x ⊗ z y ⊗ z and z ⊗ x z ⊗ y for all x, y, z ∈ M , and for which the identity ι is the least element, i.e., ι x for all x ∈ M . A p.o. monoid is commutative if its composition is commutative.
564
Bas Luttik
An example of a commutative p.o. monoid is the set N of natural numbers with addition (+) as binary operation, 0 as identity element and the less-thanor-equal relation (≤) as (total) order; we call it the additive p.o. monoid of natural numbers. Another example is the set N∗ of positive natural numbers with multiplication (·) as binary operation, 1 as identity element and the divisibility relation (|) as (partial) order; we call it the multiplicative p.o. monoid of positive natural numbers. In the remainder of this section we shall use N and N∗ to illustrate the theory of decomposition in commutative p.o. monoids that we are about to develop. However, they are not meant to motivate it; the motivating examples stem from process theory. In particular, note that N and N∗ are so-called divisibility monoids [3] in which x y is equivalent to ∃z(x ⊗ z = y). The p.o. monoids arising from process theory generally do not have this property. Definition 1. An element p of a monoid M is called prime if p = ι and p = x⊗y implies x = ι or y = ι. Example 2. The natural number 1 is the only prime element of N. The prime elements of N∗ are the prime numbers. Let x1 , . . . , xn be a (possibly empty) sequence of elements of a monoid M ; we formally define its composition x1 ⊗ · · · ⊗ xn by the following recursion: (i) if n = 0, then x1 ⊗ · · · ⊗ xn = ι; and (ii) if n > 0, then x1 ⊗ · · · ⊗ xn = (x1 ⊗ · · · ⊗ xn−1 ) ⊗ xn . n Occasionally, we shall write i=1 xi instead of x1 ⊗ · · · ⊗ xn . Furthermore, we write xn for the n-fold composition of x. Definition 3. If x is an element of a monoid M and p1 , . . . , pn is a sequence of prime elements of M such that x = p1 ⊗ · · · ⊗ pn , then we call the expression p1 ⊗ · · · ⊗ pn a decomposition of x in M . Two decompositions p1 ⊗ · · · ⊗ pm and q1 ⊗ · · · ⊗ qn of x are equivalent if there is a bijection σ : {1, . . . , m} → {1, . . . , n} such that pi = qσ(i) for all 1 ≤ i ≤ m; otherwise, they are distinct. The identity element ι has the composition of the empty sequence of prime elements as a decomposition, and every prime element has itself as a decomposition. We now proceed to discuss the existence and uniqueness of decompositions in commutative p.o. monoids. We shall present two conditions that together guarantee that every element of a commutative p.o. monoid has a unique decomposition. Definition 4. Let M be a commutative p.o. monoid; by a stratification of M we understand a mapping | | : M → N from M into the additive p.o. monoid N of natural numbers that is a strict homomorphism, i.e., (i) |x ⊗ y| = |x| + |y|, and (ii) x ≺ y implies |x| < |y| (where ≺ and < are the strict relations corresponding to and ≤, respectively).
A Unique Decomposition Theorem for Ordered Monoids
565
A commutative p.o. monoid M together with a stratification | | : M → N we call a stratified p.o. monoid; the number |x| thus associated with every x ∈ M is called the norm of x. Observe that |x| = 0 iff x = ι (since |ι| + |ι| ≤ |ι ⊗ ι| = |ι| by the first condition in Definition 4, it follows that |ι| = 0, and if x = ι, then ι ≺ x, whence 0 = |ι| < |x| by the second condition in Definition 4). Example 5. The additive p.o. monoid N is stratified with the identity mapping idN on N as stratification. The multiplicative p.o. monoid N∗ is stratified with | | : N∗ → N defined by |k| = max{n ≥ 0 : ∃k0 < k1 < · · · < kn (1 = k0 | k1 | · · · | kn = k)}. Proposition 6. In a stratified commutative p.o. monoid every element has a decomposition. Proof. Straightforward by induction on the norm. The next two propositions are straightforward consequences of the definition of stratification; we need them later on. Proposition 7. If M is a stratified commutative p.o. monoid, then M is strict: x ≺ y implies x ⊗ z ≺ y ⊗ z and z ⊗ x ≺ z ⊗ y for all x, y, z ∈ M . Proposition 8. The order of a stratified p.o. monoid M is well-founded : every nonempty subset of M has a -minimal element. Definition 9. We call a p.o. monoid M precompositional if for all x, y, z ∈ M : x y ⊗ z implies that there exist y y and z z such that x = y ⊗ z . Example 10. That N∗ is precompositional can be shown using the well-known property that if p is a prime number such that p | k · l, then p | k or p | l (see, e.g., [9, p. 11]). If x ≺ y, then x is called a predecessor of y, and y a successor of x. If there is no z ∈ M such that x ≺ z ≺ y, then x is an immediate predecessor of y, and y is an immediate successor of x. The following two lemmas establish a crucial relationship between the immediate predecessors of a composition and certain immediate predecessors of its components. Lemma 11. Let M be a precompositional stratified commutative p.o. monoid, and let x, y and z be elements of M . If x is a predecessor of y of maximal norm, then x ⊗ z is an immediate predecessor of y ⊗ z. Lemma 12. Suppose that x = x1 ⊗ . . . ⊗ xn and y are elements of a precompositional stratified commutative p.o. monoid M . If y is an immediate predecessor of x, then there exist i ∈ {1, . . . , n} and an immediate predecessor yi of xi such that y = x1 ⊗ · · · ⊗ xi−1 ⊗ yi ⊗ xi+1 ⊗ · · · ⊗ xn .
566
Bas Luttik
Theorem 13 (Unique Decomposition). In a stratified and precompositional commutative p.o. monoid every element has a unique decomposition. Proof. Let M be a stratified and precompositional commutative p.o. monoid. By Proposition 6, every element of M has a decomposition. To prove uniqueness, suppose, to the contrary, that the subset of elements of M with two or more distinct decompositions is nonempty. Since is well-founded by Proposition 8, this subset has a -minimal element a. That a has at least two distinct decompositions means that there must be a sequence p, p1 , . . . , pn of distinct primes, and sequences k, k1 , . . . , kn and l, l1 , . . . , ln of natural numbers such that (A) a = pk ⊗ pk11 ⊗ · · · ⊗ pknn and a = pl ⊗ pl11 ⊗ · · · ⊗ plnn ; (B) k < l; and (C) |p| < |pi | implies ki = li for all 1 ≤ i ≤ n. That a is -minimal means that the predecessors of a, i.e., the elements of the initial segment I(a) = {x ∈ M : x ≺ a} of M determined by a, all have a unique decomposition. Let x be an element of I(a). We define #p (x), the multiplicity of p in x, as the number of occurrences of the prime p in the unique decomposition of x. The index of p in x, denoted by [x : p], is the maximum of the multiplicities of p in the weak predecessors of x, i.e., [x : p] = max{#p (y) : y x}. We now use that a = pk ⊗ pk11 ⊗ · · · ⊗ pknn to give an upper bound for the multiplicity of p in an element x of I(a). Since M is precompositional there exist y1 , . . . , yk p and zi1 , . . . , ziki pi (1 ≤ i ≤ n) such that k n k i x= i=1 yi ⊗ i=1 j=1 zij . From yi p it follows that #p (yi ) ≤ [p : p] = 1, and from zij pi it follows that #p (zij ) ≤ [pi : p], so for all x ∈ I(a) #p (x) =
k
#p (yi ) +
i=1
ki n i=1 j=1
#p (zij ) ≤ k +
n
ki · [pi : p].
(1)
i=1
We shall now distinguish two cases, according to the contribution of the second term to the right-hand side of the above inequality, and show that either case leads inevitably to a contradiction with condition (B) above. n First, suppose that i=1 ki · [pi : p] > 0; then [pj : p] > 0 for some 1 ≤ j ≤ n. Let x1 , . . . , xn be such that xi pi and #p (xi ) = [pi : p] for all 1 ≤ i ≤ n, and x = pl ⊗ xl11 ⊗ · · · ⊗ xlnn . Since #p (pi ) = 0, if #p (xi ) > 0 then xi ≺ pi . In particular, since #p (xj ) = [pj : p] > 0, this means that x is an element of I(a) (use that a = pl ⊗pl11 ⊗· · ·⊗plnn and apply Proposition 7), and hence, that #p (x) is defined, by n #p (x) = l + li · [pi : p]. i=1
We combine this definition with the inequality in (1), to conclude that
A Unique Decomposition Theorem for Ordered Monoids
l+
n
li · [pi : p] ≤ k +
i=1
n
567
ki · [pi : p].
i=1
To arrive at a contradiction with condition (B), it therefore suffices to prove that ki · [pi : p] = li · [pi : p] for all 1 ≤ i ≤ n. If [pi : p] = 0, then this is clear at once. If [pi : p] > 0, then, since #p (pi ) = 0, there exists x ≺ pi such that #p (x) = [pi : p] > 0. Every occurrence of p in the decomposition of x contributes |p| to the norm of x, so |p| ≤ |x| < |pi |, from which itfollows by condition (C) n that ki · [pi : p] = li · [pi : p]. This settles the case that i=1 ki · [pi : p] > 0. n We continue with the hypothesis that i=1 ki · [pi : p] = 0. First, assume li > 0 for some 1 ≤ i ≤ n; then, by Proposition 7, pl is a predecessor of a, but that implies l = #p (pl ) ≤ k, a contradiction with (B). In the case that remains, we may assume that li = 0 for all 1 ≤ i ≤ n, and consequently, since a = pl cannot be prime, that l > 1. Clearly, pl−1 is a predecessor of a, so 0 < l − 1 = #p (pl−1 ) ≤ k; it follows that k > 0. Now, let y be a predecessor of p of maximal norm; by Lemma 11, it gives rise to an immediate a-predecessor x = y ⊗ pk−1 ⊗ pk11 ⊗ · · · ⊗ pknn . Then, since a = pl , it follows by Lemma 12 that there exists an immediate predecessor z of p such that x = z ⊗pl−1 . We conclude that k−1 = #p (x) = l−1, again a contradiction with condition (B).
3
ACPε
We fix two disjoint sets of constant symbols A and V; the elements of A we call actions; the elements of V we call process variables. With a ∈ A, X ∈ V and H ranging over finite subsets of A, the set P of process expressions is generated by P ::= ε | δ | a | X | P ·P | P +P | ∂H (P ) | P P | P |P | P P. If X is a process variable and P is a process expression, then the expression def X = P is called a process equation defining X. A set of such expressions is called a process specification if it contains precisely one defining process equation for each X ∈ V. For the remainder of this paper we fix a guarded process specification S: every occurrence of a process variable in a right-hand side P of an equation in S occurs in a subexpression of P of the form a · Q with a ∈ A. We also presuppose a communication function, a commutative and associative partial mapping γ : A×A A. It specifies which actions may communicate: if γ(a, b) is undefined, then the actions a and b cannot communicate, whereas if γ(a, b) = c then they can and c stands for the event that they do. The transition system specification in Table 1 defines on the set P a unary predicate ↓ and binary relations −−a→ (a ∈ A). A bisimulation is a symmetric binary relation R on P such that P R Q implies (i) if P ↓, then Q↓; and (ii) if P −−a→ P , then there exists Q such that Q −−a→ Q and P R Q .
568
Bas Luttik Table 1. The transition system specification for ACPε .
ε↓
P ↓, Q↓ (P · Q)↓
P↓ (P + Q)↓, (Q + P )↓
P ↓, Q↓ (P Q)↓, (Q P )↓
a
a −−→ ε a
P −−→ P a P + Q −−→ P , Q + P −−→ P a
b
a
a
P Q −−→
P
P ↓, Q −−→ Q a P · Q −−→ Q a
def
P −−→ P , (X = P ) ∈ S a X −−→ P
c
a
P −−→ P , Q −−→ Q , a = γ(b, c) a P | Q −−→ P Q
P −−→ P a P Q −−→ P Q
a
a
P −−→ P a P · Q −−→ P · Q
a
P −−→ P a Q, Q P −−→ Q P
P↓ ∂H (P )↓
b
P −−→ P , a ∈ H a ∂H (P ) −−→ ∂H (P ) c
P −−→ P , Q −−→ Q , a = γ(b, c) a P Q −−→ P Q
Process expressions P and Q are said to be bisimilar (notation: P ↔ Q) if there exists a bisimulation R such that P R Q. The relation ↔ is an equivalence relation; we write [P ] for the equivalence class of process expressions bisimilar to P , and we denote by P/↔ the set of all such equivalence classes. Baeten and van Glabbeek [2] prove that ↔ has the substitution property with respect to , and that P (Q R) ↔ (P Q) R, P ε ↔ ε P ↔ P and P Q ↔ Q P . Hence, we have the following proposition. Proposition 14. The set P/↔ with ⊗ and ι defined by [P ] ⊗ [Q] = [P Q] and ι = [ε] is a commutative monoid.
4
Weakly Normed ACPε with Bounded Communication
In this section we present three counterexamples obstructing a general unique decomposition theorem for the monoid P/↔ defined in the previous section. They will guide us in defining a submonoid of P/↔ which does admit a unique decomposition theorem, as we shall prove in the next section. The first counterexample already appears in [10]; it shows that perpetual processes need not have a decomposition. def
Example 15. Let a be an action, let γ(a, a) be undefined and let X = a·X. One can show that X ↔ P1 · · · Pn implies Pi ↔ X for some 1 ≤ i ≤ n. It follows that [X] has no decomposition in P/↔ . For suppose that [X] = [P1 ] ⊗ · · · ⊗ [Pn ]; then [Pi ] = [X], whereas [X] is not a prime element of P/↔ (e.g., X ↔ a X). The second counterexample employs the distinction between successful and unsuccessful termination characteristic of ACP-like process theories. Example 16. Let a be an action; then [a], [a + a · δ] and [a · δ + ε] are prime a elements of P/↔ . Moreover, a ↔ a + a · δ (the transition a + a · δ −−→ δ cannot be
A Unique Decomposition Theorem for Ordered Monoids
569
simulated by a). However, it is easily verified that a(a·δ+ε) ↔ (a+a·δ)(a·δ+ε), so a decomposition in P/↔ need not be unique. w
Let w ∈ A∗ , say w = a1 · · · an ; we write P −−→ P if there exist P0 , . . . , Pn an a1 such that P = P0 −−→ · · · −−→ Pn = P . To exclude the problems mentioned in Examples 15 and 16 above we use the following definition. Definition 17. A process expression P is weakly normed if there exist w ∈ A∗ w and a process expression P such that P −− → P ↔ ε. The set of weakly normed ε process expressions is denoted by P . It is straightforward to show that bisimulation respects the property of being weakly normed, and that a parallel composition is weakly normed iff its parallel components are. Hence, we have the following proposition. Proposition 18. The set P ε /↔ is a submonoid of P/↔ . Moreover, if [P Q] ∈ P ε /↔ , then [P ] ∈ P ε /↔ and [Q] ∈ P ε /↔ . Christensen et al. [5] prove that every element of the commutative monoid of weakly normed BPP expressions modulo bisimulation has a unique decomposition. Presupposing a communication function γ that is everywhere undefined, the operational semantics for BPP expressions is as given in Table 1. So, in BPP there is no communication between parallel components. Christensen [4] extends this result to a unique decomposition theorem for the commutative monoid of weakly normed BPPτ expressions modulo bisimulation. His BPPτ is obtained by replacing the parallel operator of BPP by a parallel operator that allows a restricted form of handshaking communication. Our next example shows that the more general communication mechanism of ACPε gives rise to weakly normed process expressions without a decomposition. def
Example 19. Let a be an action, suppose that a = γ(a, a) and X = a · X + a. Then one can show that X ↔ P1 · · · Pn implies that Pi ↔ X for some 1 ≤ i ≤ n, from which it follows by a similar argument as in Example 15 that [X] has no decomposition in P/↔ . The communication function in the above example allows an unbounded number of copies of the action a to participate in a single communication. To exclude this phenomenon, we use the following definition. Definition 20. A communication function γ is bounded if every action can be assigned a weight ≥ 1 in such a way that a = γ(b, c) implies that the weight of a is the sum of the weights of b and c.
5
Unique Decomposition in P ε /↔
We now prove that every element of the commutative monoid P ε /↔ of weakly normed process expressions modulo bisimulation has a unique decomposition, provided that the communication function is bounded. We proceed by defining on P ε /↔ a partial order and a stratification | | : P ε /↔ → N turning it into
570
Bas Luttik
a precompositional stratified commutative p.o. monoid. That every element of P ε /↔ has a unique decomposition then follows from the theorem of Section 2. Throughout this section we assume that the presupposed communication function γ is bounded so that every action has a unique weight assigned to it (cf. Definition 20). We use it to define the weighted length (w) of w ∈ A∗ inductively as follows: if w is the empty sequence, then (w) = 0; and if w = w a and a is an action of weight i, then (w) = (w ) + i. This definition takes into account that a communication stands for the simultaneous execution of multiple actions. It allows us to formulate the following crucial property of the operational semantics of ACPε . w Lemma 21. If P , Q and R are process expressions such that P Q −− → R, then u ε ∗ there exist P , Q ∈ P and u, v ∈ A such that R = P Q , P −−→ P , Q −−v→ Q and (u) + (v) = (w).
Definition 22. The norm |P | of a weakly normed process expression is the least natural number n such that there exists w ∈ A∗ of weighted length n and w a process expression P such that P −− → P ↔ ε. Lemma 23. If P ↔ Q, then |P | = |Q| for all P, Q ∈ P ε . Lemma 24. |P Q| = |P | + |Q| for all P, Q ∈ P ε . We define on P ε binary relations i (i ≥ 1) and by a
P i Q ⇐⇒ there exists a ∈ A of weight i s.t. P −−→ Q and |P | = |Q| + i. P Q ⇐⇒ P i Q for some i ≥ 1. The reflexive-transitive closure ∗ of is a partial order on P ε . Definition 25. We write [P ] [Q] iff there exist P ∈ [P ] and Q ∈ [Q] such that Q ∗ P . It is straightforward to verify that is a partial order on P ε /↔ . Furthermore, that is compatible with ⊗ can be established by means of Lemma 24, and that ι is its least element essentially follows from weak normedness. Hence, we get the following proposition. Proposition 26. The set P ε /↔ is a commutative p.o. monoid. By Lemmas 23 and 24, the mapping | | : (P ε /↔ ) → N defined by [P ] → |P | is a strict homomorphism. Proposition 27. The mapping | | : (P ε /↔ ) → N is a stratification of P ε /↔ . Lemma 28. If P Q ∗ R, then there exist P and Q such that P ∗ P , Q ∗ Q and R = P Q . The following proposition is an easy consequence of the above lemma.
A Unique Decomposition Theorem for Ordered Monoids
571
Proposition 29. The p.o. monoid P ε /↔ is precompositional. According to Propositions 26, 27 and 29, P ε /↔ is a stratified and precompositional commutative p.o. monoid, so by Theorem 13 we get the following result. Theorem 30. In the p.o. monoid P ε /↔ of weakly normed processes expressions modulo bisimulation every element has a unique decomposition, provided that the communication function is bounded.
Acknowledgment The author thanks Clemens Grabmayer, Jeroen Ketema, Vincent van Oostrom, Simona Orzan and the referees for their comments.
References 1. L. Aceto and M. Hennessy. Towards action-refinement in process algebras. Inform. and Comput., 103(2):204–269, 1993. 2. J. C. M. Baeten and R. J. van Glabbeek. Merge and termination in process algebra. In K. V. Nori, editor, Proc. of FST TCS 1987, LNCS 287, pages 153–172, 1987. 3. G. Birkhoff. Lattice theory, volume XXV of American Mathematical Society Colloquium Publications. American Mathematical Society, third edition, 1967. 4. S. Christensen. Decidability and Decomposition in Process Algebras. PhD thesis, University of Edingburgh, 1993. 5. S. Christensen, Y. Hirshfeld, and F. Moller. Decomposability, decidability and axiomatisability for bisimulation equivalence on basic parallel processes. In Proc. of LICS 1993, pages 386–396. IEEE Computer Society Press, 1993. 6. W. J. Fokkink and S. P. Luttik. An ω-complete equational specification of interleaving. In U. Montanari, J. D. P. Rolim, and E. Welzl, editors, Proc. of ICALP 2000, LNCS 1853, pages 729–743, 2000. 7. L. Fuchs. Partially Ordered Algebraic Systems, volume 28 of International Series of Monographs on Pure and Applied Mathematics. Pergamon Press, 1963. 8. Y. Hirshfeld and M. Jerrum. Bisimulation equivalence is decidable for normed process algebra. In J. Wiedermann, P. van Emde Boas, and M. Nielsen, editors, Proc. of ICALP 1999, LNCS 1644, pages 412–421, 1999. 9. T. W. Hungerford. Algebra, volume 73 of GTM. Springer, 1974. 10. R. Milner and F. Moller. Unique decomposition of processes. Theoret. Comput. Sci., 107:357–363, January 1993. 11. F. Moller. Axioms for Concurrency. PhD thesis, University of Edinburgh, 1989. 12. F. Moller. The importance of the left merge operator in process algebras. In M. S. Paterson, editor, Proc. of ICALP 1990, LNCS 443, pages 752–764, 1990. 13. J. L. M. Vrancken. The algebra of communicating processes with empty process. Theoret. Comput. Sci., 177:287–328, 1997.
Generic Algorithms for the Generation of Combinatorial Objects Conrado Mart´ınez and Xavier Molinero Departament de Llenguatges i Sistemes Inform` atics, Universitat Polit`ecnica de Catalunya, E-08034 Barcelona, Spain {conrado,molinero}@lsi.upc.es
Abstract. This paper briefly describes our generic approach to the exhaustive generation of unlabelled and labelled combinatorial classes. Our algorithms receive a size n and a finite description of a combinatorial class A using combinatorial operators such as union, product, set or sequence, in order to list all objects of size n in A. The algorithms work in constant amortized time per generated object and thus they are suitable for rapid prototyping or for inclusion in general libraries.
1
Introduction
Exhaustively generating all the objects of a given size is an important problem with numerous applications that has attracted the interest of combinatorialists and computer scientists for many years. There is a vast literature on the topic and many ingenious techniques and efficient algorithms have been devised for the generation of objects of relevant combinatorial classes (permutations, trees, sets, necklaces, words, etc.). Indeed, it is common to find introductory material in many textbooks on algorithms (see for instance [8]). Furthermore, several distinct natural (and useful) orderings have been considered for the generation of combinatorial classes, for example, lexicographic and Gray ordering. Many stateof-the-art algorithms for exhaustive generation can be found (and executed) in the Combinatorial Object Server (www.theory.csc.uvic.ca/˜cos), where the interested reader can also find further references. The ultimate goal is to achieve algorithms with constant amortized time per generated object, that is, the cost of generating all N objects of size n takes time proportional to N . Many such algorithms are known, but there is still on-going and active research on this topic. In this work, we combine some well-known principles and a few novel ideas in a generic framework to design algorithms that solve the problem of exhaustive generation, given the size and a finite specification of the combinatorial class whose elements are to be listed. This kind of approach was pioneered by Flajolet et al. [2] for the random generation of combinatorial objects and later applied
This research was supported by the Future and Emergent Technologies programme of the EU under contract IST-1999-14186 (ALCOM-FT) and the Spanish “Ministerio de Ciencia y Tecnolog´ıa” programme TIC2002-00190 (AEDRI II).
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 572–581, 2003. c Springer-Verlag Berlin Heidelberg 2003
Generic Algorithms for the Generation of Combinatorial Objects
573
by the authors for the unranking problem [6] and for the generation of labelled objects [5]. Somewhat different, but with a similar spirit, is the general approach of Kemp [4] for the generation of words in lexicographic order. We show that all our algorithms work in constant amortized time and provide a general framework for the analysis of the performance of these algorithms in the form of a calculus or set of rules. Most existing algorithms exploit particular characteristics of the combinatorial class to be generated, thus achieving improved performance over na¨ıve or brute force methods. The main contribution of this work is to provide a few generic algorithms which solve the problem of iteration over the subset of objects of a given size, given the size and a finite specification of the combinatorial class. These finite specifications are built from basic - and atomic classes, and combinatorial operators like unions (‘+’), Cartesian products (‘×’), sequences (‘S’), multisets (‘M’), cycles (‘C’), etc. Our algorithms, deprived of specific knowledge of the problem at hand, are likely to be a bit worse than their specific counterparts, but still have competitive performance, making them good candidates for rapid prototyping and for inclusion into general combinatorial libraries such as the combstruct package [1] for Maple1 and MuPAD-combinat for MuPAD (mupad-combinat.sourceforge.net). Typically, complex objects in a given class are composed by smaller units, called atoms. Atoms are objects of size 1 and the size of an object is the number of atoms it contains. For instance, a string is composed by the concatenation of symbols, where each of these is an atom, and the size of the string is its length or the number of symbols it is composed of. Similarly, a tree is built out of nodes – its atoms – and the size of the tree is its number of nodes. Objects of size 0 and 1 will be generically denoted by and Z, respectively2 . Unlabelled objects are those whose atoms are indistinguishable. On the contrary, each of n atoms of a labelled object of size n bears a distinct label drawn from the numbers 1 to n. For the rest of this paper, we will use calligraphic uppercase letters to denote classes (A, B, C, . . . ). Given a class A, An will denote the subset of objects of size n in A and an the number of such objects. We use the corresponding uppercase roman letter (A, B, C, . . . ) to denote the counting generating functions (ordinary GFs for unlabelled classes, exponential GF for labelled classes). The n-th coefficient of A(z) is denoted [z n ]A(z); hence, an = [z n ]A(z) if A(z) is ordinary and an = n! · [z n ]A(z) if A(z) is exponential. As it will become apparent, our approach to the exhaustive generation problem requires an efficient algorithm for counting, that is, given a specification of a class and a size, compute the number of objects with the given size. Hence, we will only deal with so-called admissible combinatorial classes [10,9]. Those are constructed from admissible operators, operations over classes that yield new 1
2
The current implementation of combstruct offers a routine allstructs to generate all objects of a given size and a finite specification of the class; but it does the job by repeatedly generating objects at random until all them have been generated. Also, we will use these symbols to denote not only objects but the classes that contain just one object of size 0 and of size 1, respectively.
574
Conrado Mart´ınez and Xavier Molinero
classes, and such that the number of objects of a given size in the new class can be computed from the number of objects of that size or smaller sizes in the constituent classes. Tables 1 and 2 give a few examples of both labelled and unlabelled admissible classes; as such, our algorithms are able to generate all their objects of a given size and the specification of the class. Table 1. Examples of labelled classes and their specifications Labelled class Cayley trees Binary plane trees Hierarchies Surjections Functional graphs
Specification A = Z M(A) B =Z +BB C = Z + M(C, card ≥ 2)) D = S(M(Z, card ≥ 1))) E = M(C(A))
Any combinatorial object belonging to an admissible class can be suitably represented as a string of symbols or as an expression tree whose leaves correspond to the object’s atoms and whose internal nodes are labelled by admissible operators. However, such a representation is not the most convenient for the exhaustive generation problem; our algorithms will act upon a new kind of objects, which we call iterators. An iterator contains a combinatorial object (represented as a tree-like structure), but also additional information which helps and speeds up the generation process. This additional information is also organized as a tree-like structure – which we call deep structure – and reflects that of the corresponding combinatorial object, but each node contains information about the class, the rank, the size, the labelling of the “subobject” rooted at the node in the case of labelled objects, etc. Furthermore, there are pointers between the object’s representation and the deep structure to allow fast access and update of the object’s representation. Table 2. Examples of unlabelled classes and their specifications Unlabelled class Binary sequences Necklaces Rooted unlabelled trees Non plane ternary trees Integer partititons without repetition
Specification A = S(Z + Z) B = C(M(Z, card ≥ 1)) C = Z × M(C) D = Z + M(D, card = 3) E = P(S(Z, card ≥ 1))
From the user’s point of view, we shall offer the following four routines: 1) a procedure to initialize the iterator (init iter), which given a finite description of a combinatorial class A and a size n, returns the iterator corresponding to the first object in An ; 2) a function next, which given an iterator modifies it so that it corresponds to the object following the previous one; 3) a function get obj to retrieve the combinatorial object from the iterator, in order to print or process it as needed; 4) a boolean function is last to check whether the
Generic Algorithms for the Generation of Combinatorial Objects
575
iterator corresponds to the past-the-end object: a fictitious sentinel which follows the last object in An . These will be typically used as follows: it:= init_iter(A, n); while not is_last(it) do print(get_obj(it)); it:= next(it); end In next section we will describe our algorithms and their performance for the generation of admissible unlabelled combinatorial classes. Section 3 briefly considers the generation of labelled objects, based on our previous work [5], thus integrating the generation of labelled and unlabelled classes into a single and elegant framework. Except for the case of unlabelled cycles (not describe here), most common combinatorial constructions can be dealt within this framework. In section 4 we comment our current work, extensions and future developments.
2
Unlabelled Classes
Here, by an admissible class we mean that the class can be finitely specified using the class (the class with a single object of size 0), atomic classes (classes that contain a single object of size 1), and disjoint unions (‘+’), products (‘×’), sequences (‘S’), multisets (‘M’) and powersets (‘P’) of admissible classes. We have also developed a generic algorithm for cycles (‘C’) of admissible unlabelled classes; however both the algorithm and its analysis use rather different ideas and techniques from the other operators and will be not explained here because of the space limitations. There exist other interesting admissible operations, but we shall restrict our attention to those mentioned above. Even within this restricted framework, many important combinatorial classes can be specified. Furthermore, the techniques and results presented here can be easily extended to other admissible operators such as substitutions, sequences, multisets and powersets of restricted cardinality, etc. 2.1
The Algorithms
The problem of generating the class and atomic classes is trivial. We shall only consider the function next, as the others functions are more or less straightforward. We assume that we have a function count(A, n) which returns the number of objects of size n in the combinatorial class A [2]. The function next actually uses a recursive routine which receives a pointer p to some node of the deep structure; the initial recursive call is with p pointing to the root of the deep structure. However, we will use the same name for the recursive routine which actually performs the job. If the current object (of size n) belongs to a class A+B, we need only to check whether the current object is the last of A or not. If it is, then the next object will be the first object in B; otherwise, we generate the next object in the appropriate class (A if the current rank is smaller than or equal to count(A, n) − 1, B if the
576
Conrado Mart´ınez and Xavier Molinero
current rank is greater than or equal to count(A, n) − 1). This actually means recursively applying the procedure to the unique subtree hanging from p. All the checks above can be easily done as the node pointed to by p in the deep structure contains the specification, the rank of the current object, its size, etc. On the other hand, if the current subobject of size n corresponds to a product, say A×B, we check if the second component (given by p.second ) is the last object of size n−k of its class B, where k = p.first.size. If it is not, we recursively obtain the next object of the second component. If the second component is the last object of size n − k in B, but the first component (pointed to by p.first) is not the last object of size k in A then the algorithm is recursively applied to the first component; we also reset the information in the current node and set the second component of the new object to be the first object in B of size n − k. If both first and second components were the last objects of sizes k and n − k in A and B, respectively, then a loop looks for the smallest k > k such that Ak × Bn−k is not empty. After that, the first objects in Ak and Bn−k are generated and the appropriate information is updated in the current node p. Multisets are dealt with in a similar spirit as products. The basis of the algorithm is the isomorphism ΘM(A) = ∆ΘA × M(A), where ∆A denotes the diagonal or stacking of the class A, that is, ∆A = {α | α ∈ A} + {(α, α) | α ∈ A} + {(α, α, α) | α ∈ A} + · · · , and ΘA is the pointing (marking) of the class A, that is, the class that we obtain by making copies of each object of size k in A, but marking a different atom in each copy. If we mark an atom in a multiset we might think that we have marked the object that contains the atom, say the m-th copy of the object; on the right hand side we produce the marked object, make m copies of the marked object and attach a multiset. A multiset γ consists of two parts: a first component α ∈ A of size k together with its number of occurrences, say , and a second component β which is a multiset of size n − k. This second component contains objects in A whose size is less than or equal to k; in the latter case, their rank is strictly smaller than the rank of α (implying that they have been used as the first component previously). In order to get the next object, we check whether there exist multisets satisfying the conditions above, that is, whether there is some object following β. If not, we obtain the object following α in ∆Ak . When the first component in the current object is the last object in ∆Ak , we loop until we find a suitable size j > j = k, and obtain the respective first objects. The generation of the next object of α is also easy: we obtain the object following α in Ak if it exists; if not, we look for the smaller divisor k of j = k which is larger than k and produce the first object in Ak and attach the appropiate number of occurrences = j/k . Powersets are generated in the same vein. For a fixed first component α of size j, we produce all powersets of size n − j made up of objects of size smaller than or equal to j and whose rank is strictly smaller than the rank of α; if there are no more such powersets, we recursively apply the procedure to α. If α is the last object of size j then we look for the next available size j for the first component. The isomorphim here is given by ΘP(A) = ∆[odd] A × P(A) − ∆[even] A × P(A), where ∆[odd] and ∆[even] are like the diagonal operator, but for odd and even
Generic Algorithms for the Generation of Combinatorial Objects
577
numbers of copies, respectively. The proof of this isomorphism is a bit more involved, and exploits the principle of inclusion-exclusion to guarantee that no element is repeated. On a purely formal basis, we can introduce ∆ˆ = ∆[odd] − ∆[even] , so that we ˆ could say ΘP(A) = ∆ΘA × P(A). The operator ∆ˆ allows for more convenient symbolic manipulations when computing the cost of the algorithms, but has no combinatorial meaning, though. 2.2
The Performance Let ΛAn = α∈An cn(α), where cn(α) denotes the cost of applying next to object α. Then µA,n = ΛAn /an is the amortized cost per generated object. We will not include in the cost the preprocessing time needed to compute the tables of counts nor the cost of constructing the first object (and associated information) of a given size in a given class. These can be precomputed just once (or computed the first time they are needed and stored into tables for later reuse) and its contribution to the overall performance will be neglected. Also, we will not include the time to parse the specification and transform it to standard form either as this cost does not depend on n. Lemma 1. Given an admissible unlabelled class A, let ΛA(z) denote the ordinary generating function of the cumulated costs {ΛAn }n≥0 . Then 1. 2. 3. 4. 5. 6.
Λ∅ = Λ = ΛZ = 0, Λ(A + B) = ΛA + ΛB + [[A]] + [[B]] − [[A + B]], Λ(A × B) = ΛA · [[B]] + A · ΛB + [[A]] · [[B]] − [[A × B]], ΛΘA = ΘΛA + Θ[[A]] − [[ΘA]], Λ∆A = ∆ΛA + ∆[[A]] − [[∆A]], Λ∆[t] A = ∆[t] ΛA + ∆[t] [[A]] − [[∆[t] A]], t ∈ {odd, even}
where ΘA(z) ≡ z dA dz , ΘA denotes the pointing or marking of the class A, ∆A(z) ≡ k>0 A(z k ), ∆A denotes the diagonal of the class A, ∆[odd] A(z) = 2k−1 ), ∆[even] A(z) = k>0 A(z 2k ), ∆[odd] A and ∆[even] denote the odd k>0 A(z and even diagonals of the class A, respectively, and [[A]] = n≥0 [[an = 0]]z n , with [[P ]] = 1 if the predicate P is true and 0 otherwise. Proof. For classes that contain just one item, the cumulated cost is 0, as we do not count the cost of generating the first object. The rule for unions is straightforward, but we must take care to charge the cost corresponding to computing the next of the last element in A; this is accounted for by the terms [[A]]+[[B]]−[[A + B]]. For products, we generate pairs in Ak ×Bn−k for k = 0, . . . , n. For a fixed object α of size k we generate all objects in Bn−k to form all pairs whose first component is α; since there are ak objects of size k and we have to consider all possible values of k, this contribution is given by A · ΛB. The other main contribution to the cost comes from the generation of all objects in A with sizes k from 0 to n (and such that there are objects of size Bn−k to form at least a pair). This is given by the term ΛA · [[B]]. The remaining terms account for the
578
Conrado Mart´ınez and Xavier Molinero
application of next to the last pair in Ak × Bn−k whenever there exist such a pair, but not for the first pair in A × B. The algorithm for the marking is also straightforward: list all elements in A of size n with the first atom marked, then list all them again but with the second atom marked and so on. The terms Θ[[A]] − [[ΘA]] account for the cost of passing from the first listing to the second, from the second to the third, etc. To generate all objects of size n in the diagonal of the class A, recall that we loop through all divisors d of n such that there are objects of size d. For each such d, we list all objects of size d and for each one we attach the number of copies (n/d) that make up the corresponding object in ∆A. Thus Λ∆An = d divides n (ΛAd + [[ad = 0]]) − [[∆An = ∅]]. The rule for the odd and even diagonals of A are similarly obtained. From Lemma 1 we can easily obtain rules for sequences, sets and multisets. Corollary 1. Let A be an admissible class such that ∈ A and let A be its counting generating function. Then 1. Let S(A) = 1/(1 − A). Then ΛS(A) = S(A) · (1 + (ΛA + [[A]] − 1)[[S(A)]]) . 2. Let M(A) = exp( k>0 A(z k )/k). Then z dz ∆Θ(ΛA + [[A]]) · [[M(A)]] − Θ[[M(A)]] . ΛM(A) = M(A) · z · M(A) 0 3. Let P(A) = exp( k>0 (−1)k−1 A(z k )/k). Then ΛP(A) = P(A) ·
z
dz ˆ ∆Θ(ΛA + [[A]]) · [[P(A)]] − Θ[[P(A)]] z · P(A)
+[[ΘP(A)]] − [[∆[odd] ΘA × P(A)]] + [[∆[even] ΘA × P(A)]] . 0
Proof. It suffices to use the isomorphisms S(A) = + A × S(A), ΘM(A) = ∆ΘA × M(A) and ΘP(A) = ∆[odd] ΘA × P(A) − ∆[even] ΘA × P(A), and apply rules 1-6 in the statement of Lemma 1. In the case of multisets and powersets, the rules can be obtained applying Λ to both sides of the isomorphisms given above, inverting Θ and Λ with rule 4, and solving the resulting linear differential equations. In the case of powersets, the sought cost arises from the difference of costs; we have thus ΛΘP(A) = Λ(Θ∆[odd] A × P(A)) − Λ(Θ∆[even] A × P(A)). We have the following theorem that can be easily established from either the rules that we have just derived or directly reasoning about the algorithms. Theorem 1. For any unlabelled admissible class A which can be finitely specified using , Z, +, ×, S, M and P, we have µA,n = ΛAn /an = Θ(1).
Generic Algorithms for the Generation of Combinatorial Objects
579
Proof (Sketch of proof ). The proof is by structural induction on the specification of the class and on the size of the objects. We consider thus what happens for unions, products, sequences, etc. and assume that the statement is true for smaller sizes. Since we charge just one “time” unit for the update of one node in the deep structure and assume that the initialization and calls to the count routine are free, we actually have µA,n → 1 as n → ∞. In practice, µA,n is different for different classes if we take into account that the (constant) overhead associated with each operator varies. We conclude with a few simple examples of application of these rules. 1. K-shuffles. A K-shuffle is a sequence of a’s and b’s that contains exactly K b’s. Let LK = S(a) × (b × LK−1 ) for K > 0 and L0 = S(a). It is not zK difficult to show that ΛLK ∼ (1−z) K+1 near the singularity z = 1; hence the n K-shuffles of size n. amortized cost µLK ,n → 1 since there are exactly K We get the same asymptotic performance if we use alternative specifications, e.g., LK = LK−1 × (b × S(a)). 2. Motzkin trees. For √M √= + Z × M + Z × M × M, one readily gets 6z + z 2 + O((1 − 6z + z 2 )3/2 ) near the domΛM ∼ −1/2(3 − 2 2) 1 − √ inant singularity at z = 3 − 2 2; hence ΛM ∼ M and µM,n = 1 + o(1).
3
Labelled Classes
By admissible labelled classes we mean those that can be finitely specified using the -class, atomic labelled classes, unions (+), labelled products ( ), sequences (Seq), sets (Set) and cycles (Cycle) of admissible labelled classes. As in the previous section, there exist other admissible operators over labelled classes, but we shall restrict our attention to those mentioned above. Again, many important combinatorial classes can be specified within this framework and the ideas that we present here carry on to other admissible operators such as substitutions, sequences, sets and cycles of restricted cardinality, etc. For example, the class C of Cayley trees is admissible, as it can be specified by C = Z Set(C), where Z denotes an atomic class. The class F of functional graphs is also admissible, since a functional graph is a set of cycles of Cayley trees; therefore, F = Set(Cycle(C)). The exhaustive generation of labelled combinatorial objects uses similar ideas to those sketched in Section 2; in fact, we face easier problems since sets and cycles can be specified by means of the so-called boxed product, denoted by 2 [3]. We recall that in a boxed product, we obtain a collection of labelled objects from a pair of given objects, much in the same manner as for the usual labelled product, but the smallest label must always correspond to an atom belonging to the first object in the pair. Boxed products are related to the pointing (see subsection 2.1) of a class A by ΘA = ΘB C ⇐⇒ A = B 2 C. The isomorphisms for sequences, sets and cycles in terms of the other constructors (union, product and boxed product) that we use in our algorithms are the following: 1) Seq(A) = +A Seq(A), 2) Set(A) = +A 2 Set(A), and 3) Cycle(A) = A 2 Seq(A). Thus every admissible specification (a finite set of equations specifying admissible
580
Conrado Mart´ınez and Xavier Molinero
classes, like in the example of functional graphs) can be transformed into an equivalent specification that involves only unions, products and boxed products. The algorithms for unions and products are very similar to those for unlabelled classes, and the algorithm for boxed products works much like the algorithm for products. In the case of labelled and boxed products, we change the partition or relabelling of the current object if possible; otherwise we recursively apply the next routine to the second component or the first component of the object. In order to “traverse” the nk possible partitions of the labels3 of a pair α, β , where n is the size of the objects to be generated and k is the size of the first component of the current object, we use Nijenhuis and Wilf’s routine for the next k-subset [7] (alternatively, we can use the algorithm by Kemp [4]). Also, like for unlabelled classes, we can set up a calculus for the complexity of our algorithms with rules such as ΘΛ(A 2 B) = ΘΛA · [[B]] + ΘA · ΛB + Θ[[A]] · [[B]] − Θ[[A 2 B]], n for boxed products. Here, [[A]] = n≥0 [[an = 0]] zn! and ΛA is the exponential generating function of the total costs to generate all elements of each size. The rules for the other combinatorial constructions are similar in spirit and can be easily derived from the rules for unions, products and boxed products. We make here the same assumptions as in the analysis of the performance of the algorithms for unlabelled classes; moreover, we take into account the fact that the algorithm for the generation of k-subsets works in constant amortized time [7]. Then it is not difficult to show that this cost can be easily “absorbed” by terms like A · ΛB in the rule for products and there is no need to include a term of the type c · A · B. Using the same techniques as in the proof of Theorem 1 is not hard to establish an analogous result for labelled generation. The detailed account of the complexity calculus for the generation of labelled objects and of the proof of the following theorem will be given in the full version of this extended abstract. Theorem 2. For any admissible labelled class A which can be finitely specified using , Z, +, , Seq, Set and Cycle, we have µA,n = ΛAn /an = Θ(1).
4
Current and Future Work
As we have mentioned earlier, we already have a constant amortized time algorithm to generate unlabelled cycles of A’s. However, this algorithm for the generation of unlabelled cycles is based upon techniques quite different from the one here and it doesn’t fit nicely in the framework here sketched (in sharp contrast with labelled cycles). We have implemented all the algorithms described here for the Maple system, on top of the basic routines provided by the combstruct package. Also, 3
There are only
n−1 k−1
partitions of the labels in the case of boxed products.
Generic Algorithms for the Generation of Combinatorial Objects
581
there are plans for a port of these routines to the MuPAD-combinat package in the near future. Furthermore, we also have routines for the generation of labelled substitutions and for labelled sequences, sets and cycles when their cardinalities are restricted. We have conducted extensive experiments to asses the practical performance of our algorithms. These experiments show that the practical performance is in good agreement to the theoretical predictions (namely, the cost grows linearly with the total number N of generated objects, if N is sufficiently large; the slope of the plot is independent of the size of the objects being generated). Our current work is now centered in the extension of the techniques presented here to other admissible operators. We also are trying to design an algorithm for unlabelled cycles that fits within the framework here sketched. If we obtained such an algorithm, it would immediately suggest an efficient answer for the unranking of unlabelled cycles, a question that still remains open, to the best of the authors’ knowledge. We are also working on alternative isomorphisms and orderings which could improve the efficiency of the generation algorithms (similar ideas yield significant improvements for the random generation and unranking of objects, see [2,6]).
References 1. Ph. Flajolet and B. Salvy. Computer algebra libraries for combinatorial structures. J. Symbolic Computation, 20:653–671, 1995. 2. Ph. Flajolet, P. Zimmerman, and B. Van Cutsem. A calculus for the random generation of combinatorial structures. Theoret. Comput. Sci., 132(1-2):1–35, 1994. 3. D.H. Greene. Labelled Formal Languages and Their Uses. PhD thesis, Computer Science Dept., Stanford University, 1983. 4. R. Kemp. Generating words lexicographically: An average-case analysis. Acta Informatica, 35(1):17–89, 1998. 5. C. Mart´ınez and X. Molinero. Generic algorithms for the exhaustive generation of labelled objects. In Proc. Workshop on Random Generation of Combinatorial Structures and Bijective Combinatorics (GASCOM), pages 53–58, 2001. 6. C. Mart´ınez and X. Molinero. A generic approach for the unranking of labelled combinatorial classes. Random Structures & Algorithms, 19(3–4):472–497, 2001. 7. A. Nijenhuis and H. S. Wilf. Combinatorial Algorithms. Academic Press, 1978. 8. E.M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977. 9. R. Sedgewick and Ph. Flajolet. An Introduction to the Analysis of Algorithms. Addison-Wesley, Reading, MA, 1996. 10. J.S. Vitter and Ph. Flajolet. Average-case analysis of algorithms and data structures. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 9. North-Holland, 1990.
On the Complexity of Some Problems in Interval Arithmetic K. Meer Department of Mathematics and Computer Science Syddansk Universitet, Campusvej 55, 5230 Odense M, Denmark [email protected] Fax: 0045 6593 2691
Abstract. We study some problems in interval arithmetic treated in Kreinovich et al. [6]. First, we consider the best linear approximation of a quadratic interval function. Known to be N P -hard in the Turing model, we analyze its complexity in the real number model and the analoguous class N PR . We give new upper complexity bounds by locating the decision version in DΣR2 (a real analogue of Σ 2 ) and solve a problem left open in [6].
1
Introduction
Problems in interval arithmetic model situations in which input data only is known within a certain accuracy. Starting from an exact description with input values ai , i ∈ I (say ai ∈ R or ∈ Q, I an index set), a corresponding formalization in terms of interval arithmetic would only supply the information that the ai ’s belong to some given intervals [ai , ai ] ⊂ R. This framework provides a way to formalize and study problems related to the presence of uncertainties. The latter both includes data errors occuring during data measurements and rounding errors during the performance of computer algorithms. Interval arithmetic thus can be seen as an approach to validate numerical calculations. The computational complexity of solving a problem in the interval setting might be significantly larger than solving the corresponding problem with accurate input data. In fact, many such results are known in interval arithmetic. As an example, consider the solvability problem for a linear equation system A · x = b. If A and b are given precisely Gaussian elimination efficiently yields computation of a solution (or proves its non-existence). This holds both for the bit measure and the algebraic cost measure, see below. However, if we only know the entries in A and b to belong to given intervals, then the complexity changes dramatically; deciding the question whether concrete choices for A and b exist within the given (rational) interval bounds such that the resulting system for these choices is solvable is N P -complete, see [6].
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT) and by the Danish Natural Science Research Council SNF.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 582–591, 2003. c Springer-Verlag Berlin Heidelberg 2003
On the Complexity of Some Problems in Interval Arithmetic
583
From a logical point of view this increase in the complexity (provided P = N P ) is due to the presence of additional quantifiers in the interval problem description. For example, in the above linear system problem the interval approach has additional existential quantifiers ranging over the intervals of the given accuracy bounds and asking for the existence of “right” coefficients followed by the linear equation problem. Since the new quantifiers are ranging over real intervals, they in principle introduce a quantified formula in the first-order theory of the reals (considered as a real closed field; this of course only holds for algebraic problems). Even though this theory is decidable the complexity of currently known algorithms is tremendous. It is well known that already the existential theory over R is an N PR -complete problem; here, N PR denotes the real analogue of N P in the framework of real Turing (or Blum-Shub-Smale, shortly: BSS) machines, see [1]. Therefore, it is natural to consider interval problems in that framework. In this paper we want to analyze whether for interval problems known to be hard in the Turing setting the shift from the Turing to the real number model implies N PR -completeness or N PR -hardness of the problem in the latter as well. We shall substantiate that in the real number model a finer complexity analysis can be done. More precisely, for some problems the interval formulation will likely not lead to N PR -hardness, even though the restriction to rational problems and the Turing model implies N P -hardness (or completeness, respectively). This will be due to the fact that even though formally the new quantifiers range over the reals, in certain situations they can be replaced by Boolean quantifiers, i.e., quantifiers ranging over {0, 1}, only. We study the following problem treated in [6] for clarifying our approach: Best approximation of quadratic interval functions by linear ones. The problem is known to be N P -hard in the Turing model, [7]; note, however, that membership in the polynomial hierarchy is not established in [7]. Definition 1. (a) Let B := [b1 , b1 ] × . . . [bn , bn ] be a box in Rn , bi < bi for 1 ≤ i ≤ n. An interval function f on B is a mapping which assigns to each point y ∈ B an interval f (y) := [f (y), f (y)] ⊆ R. If both functions f and f are linear or quadratic functions, i.e., if they are polynomials of degree 1 or 2, resp., we call f a linear respectively a quadratic interval function. (b) Given a box B as above, a linear interval function X := [X, X] and a quadratic interval function f := [f , f ] on B, we say that X approximates f on B iff [f (y), f (y)] ⊆ [X(y), X(y)] for all y ∈ B. Definition 2. (a) The problem BLAQIF (best linear approximation of a quadratic interval function) is defined as follows: given n ∈ N, a box B ⊆ Rn , a quadratic interval function f (y) := [f (y), f (y)] on B and a bound M ∈ R, is there a linear approximation X = [X, X] of f on B such that max X(y)−X(y) ≤ M ? y∈B
(b) The computational version of BLAQIF asks to compute min max X(y)− X
y∈B
X(y) under the constraints that X = (X, X) is an approximation of f.
584
K. Meer
Our main results now can be summarized as follows: Main results: (i) In the real number model BLAQIF is not N PR -complete under weak polynomial time reductions and likely (to be precised!) neither N PR complete under (full) polynomial time reductions. (ii) In the Turing model BLAQIF can be located in Σ2 . For fixed input dimension n both the decision and the computational version can be solved in polynomial time. Part (ii) complements the results in [7] by providing an upper complexity bound in the Turing setting. It also answers a question posed in [6].
2
Basic Notations; Structural Properties
We first recall the definition of some complexity classes important for our results. Then, we analyze the consequence for problems belonging to one of these classes with respect to completeness or hardness properties in the real number model. 2.1
Complexity Classes
Though there are different equivalent definitions for the classes we need, for our purposes those based on alternating quantifiers are most appropriate. Definition 3. (a) A decision problem A over the alphabet {0, 1} is in class Σ k , k ∈ Nx iff there are a problem B ∈ P and polynomials p1 , . . . , pk such that x ∈ A ⇐⇒ Q1 y1 ∈ {0, 1}p1 (|x|) . . . Qk yk ∈ {0, 1}pk (|x|) (x, y1 , . . . , yk ) ∈ B , where the variable blocks yi range over {0, 1}pi (|x|) and the quantifiers Qi ∈ {∃, ∀} alternate, starting with Q1 = ∃ (and |x| describes the bit size of x). ∞ Ri is in class ΣRk , k ∈ N in the real (b) A decision problem A over R∞ := i=1
number model iff there are a problem B ∈ PR and polynomials p1 , . . . , pk such that x ∈ A ⇐⇒ Q1 y1 ∈ Rp1 (|x|R ) . . . Qk yk ∈ Rpk (|x|R ) (x, y1 , . . . , yk ) ∈ B , where the variable blocks yi range over Rpi (|x|R ) and the quantifiers Qi ∈ {∃, ∀} alternate, starting with Q1 = ∃ (and |x|R describes the algebraic size of x). (c) If we in b) restrict the quantifiers to be Boolean ones, i.e., if the variable blocks range over {0, 1}∗ instead of R∞ we obtain the digital classes DΣRk . Clearly, Σ 1 = N P, ΣR1 = N PR and DΣR1 = DN PR , where the latter is the class digital N PR of problems in N PR that require a discrete search space for verification, only.
On the Complexity of Some Problems in Interval Arithmetic
2.2
585
The Real Number Complexity of Problems in DΣR2
This section is on structural complexity of problems in DΣR2 . The main goal is to argue that problems in DΣR2 likely do not bear the full complexity of ΣR2 , and even not of N PR -hard problems. This shows that the complexity analysis of several NP-hard interval arithmetic problems can be considerably refined. We turn this into more precise statements as follows. We give an absolute statement with respect to so called weak reductions (introduced by Koiran [5] for a weak version of the BSS model): No problem in DΣR2 is N PR -hard under weak reductions. Then, we give an analogous statement for (general) polynomial time reductions under a widely believed hypothesis concerning computations of resultant polynomials: No problem in DΣR2 is N PR -hard under polynomial time reductions unless there is a (non-uniform) polynomial time algorithm computing a multiple of the resultant polynomial on a Zariski-dense subset. Though some definitions are necessary to precisely state these results, the proofs are almost straightforward extensions of similar statements for DN PR given in [2] and [8]. Definition 4. (Weak running time, [5]) (a) Let M be a real machine with (real) machine constants c := (c1 , . . . , cs ) and having a running time bounded by a function t of the (algebraic) input size. Any intermediate result computed by M on x 1 ,...,cs ) is a rational function of the form p(x,c q(x,c1 ,...,cs ) , where p and q are polynomials with integer coefficients over x and c. The weak costs of computing this intermediate result are given as the maximum among the degrees of p, q and the bit sizes of any of its coefficient. Other operations of M (like branches and copying) have weak costs 1. The weak running time of M on input x ∈ R∞ is the sum of the weak costs of all intermediate results and branch-nodes along the computational path of M on x. (b) We call a many-one reduction a weak polynomial time reduction if it can be computed in weak polynomial time by a BSS machine. The notion of N PR -completeness under weak polynomial time reductions then is defined in a straightforward manner. Note that using the weak cost measure we still allow real number algorithms, but operation sequences like repeated squaring now get more expensive than in the BSS model, see [5]. The next definition introduces (a particular subcase of) the well known resultant polynomials. Consider the problem of deciding whether a given system f := (f1 , . . . , fn ) ∈ R[x1 , . . . , xn ]n of n homogeneous polynomial systems of degree 2 in n variables has a zero x ∈ Cn \ {0}, i.e., fi (x) = 0 ∀i. We denote by H the set of all such systems and by H0 those being solvable in Cn \ {0}. The implicitly stated claims in the following definition are well known, see,e.g., [11]. Definition 5. Let n ∈ N, N := 12 · n2 · (n + 1). The resultant polynomial RESn : RN → R is a polynomial which as its indeterminates takes the coefficient vectors of homogeneous systems in H. It is the unique (up to sign) irreducible polynomial with integer coefficients that generates the variety of (coefficient vectors of ) solvable instances H0 of problems in H, i.e., RESn (f ) = 0 iff f ∈ H
586
K. Meer
has a zero x ∈ Cn \ {0}. In this notation, RESn (f ) is interpreted as evaluating RESn on the coefficient vector of f in RN . It is generally believed that no efficient algorithms for computing RESn exist. This is, for example, substantiated by the close relation of this problem to other potentially hard computational problems like the computation of mixed volumes. For more see [11] and the literature cited in there. Hardness results for certain resultant computations can be found in [9]; relations between computation of resultants and the real PR versus N PR question are studied in [10]. Theorem 6. (a) No problem in DN PR is N PR -complete under weak polynomial time reductions. No problem in DΣR2 is N PR -hard under weak polynomial time reductions. (b) Suppose there is no (non-uniform) polynomial time algorithm which for each n ∈ N computes a non-zero multiple of RESn on a Zariski-dense subset of H0 . Then no problem in DN PR is N PR -complete and no problem in DΣR2 is N PR -hard under polynomial time reductions in the BSS model. The proof is an extension of ideas developed in [2] and [8].
Remark 1. Note that in (b) we cannot expect an absolute statement of noncompleteness like in (a) unless PR = N PR is proven (for the weak model the relation weak-PR = weak-N PR is shown in [2]). In the next sections the above theorem is used to substantiate the conjecture that interval problems which either belong to DN PR or DΣR2 do not share the full difficulty of complete problems in N PR . Thus, in the real number model their complexities seem to be not the hardest possible among all (algebraic) interval problems belonging to the corresponding real complexity class.
3
Approximation of Interval Functions
The BLAQIF problem is closely related to a semi-infinite optimization problem. Towards this end, suppose for a while that we have found an optimal linear approximation X(y) := x0 + x1 y1 + . . . + xn yn , X(y) − f (y) ≥ 0 ∀ y ∈ B and X(y) := x0 + x1 y1 + . . . + xn yn , f (y) − X(y) ≥ 0 ∀ y ∈ B . As shown in [6] it is easy to calculate max X(y)−X(y) once X, X are known. y∈B
The components yi∗ of the optimal y ∗ are determined by the signs of xi − xi according to yi∗ := bi if xi ≥ xi and yi∗ := bi if xi < xi . Knowing these signs we obtain a linear semi-infinite programming problem. For example, if we suppose xi ≥ xi ∀ 1 ≤ i ≤ n the problem turns into
On the Complexity of Some Problems in Interval Arithmetic
587
n n min x + x · b − x − xi · bi 0 i i 0 i=1 i=1 T s.t. x0 + x · y − f (y) ≥ 0 ∀y∈B (LSI) T f (y) − x − x · y ≥ 0 ∀ y∈B 0 xi ≥ xi ∀1≤i≤n, where x := (x1 , . . . , xn ), and similarly for x. This problem is linear on the upper variable level (i.e., the 2n + 2 many xvariables) and quadratic for the lower variable level (i.e., y). It is semi-infinite because there are infinitely many side-constraints for X, X, parametrized through y. Note that in general we do not know in advance which sign-conditions for the components of an optimal solution X, X hold. Later on, we shall guess the right conditions as the first part of our DΣR2 algorithm and start from the resulting (LSI) problem. General assumption: For sake of simplicity, in the following we assume without loss of generality xi ≥ xi ∀ 1 ≤ i ≤ n and deal with the above (LSI). It is easy to see that the decision version of BLAQIF belongs to ΣR2 . The result, however, is not strong enough for what we want. It neither proves a similar statement in the Turing model (since we do not know how to bound the bit-sizes of the guessed reals) nor does it give any information that BLAQIF in a real setting is likely to be an easier problem than complete ones for class ΣR2 (and even for class N PR ) are. In order to see how general quantifier elimination procedures over R can be avoided when solving the real BLAQIF problem we have to study semi-infinite optimization problems a bit more deeply. 3.1
Optimality Conditions for (LSI)
A fundamental idea for studying (LSI) is to reduce the infinitely many constraints to finitely many in order to apply common optimization criteria. The following can be deduced from semi-infinite programming theory, see, e.g., [4]. Theorem 7. A feasible point (X, X) is optimal for (LSI) iff the following conditions are satisfied: there exist two sets {y (i) , i ∈ I} and {y (j) , j ∈ J}, each of at most n points in B, together with Lagrange parameters λi , i ∈ I, νj , j ∈ J and µk , 1 ≤ k ≤ n such that 1 0 1 0 (i) y · λ i + 0 · νj + µ = b i) 0 −1 0 −1 i∈I j∈J −µ 0 −y (j) −b where µ := (µ1 , . . . , µn ) and 0 ∈ Rn ; ii) λi ≥ 0 ∀ i ∈ I, νj ≥ 0 ∀ j ∈ J, µk ≥ 0 ∀1 ≤ k ≤ n; iii) either λi = 0 or the point y (i) is optimal for the problem min x0 + xT · y (i) − y∈B
f (y), and the optimal value is 0;
588
K. Meer
iv) either νj = 0 or the point y (j) is optimal for the problem min f (y) − x0 − y∈B
xT · y (j) , and the optimal value is 0; v) µk · (xk − xk ) = 0 ∀ 1 ≤ k ≤ n.
This theorem is most important for obtaining our results by the following (j) reasons. First, it states that at least one point y (i) and one point y satisfying conditions iii) and iv), respectively, exist; this follows from λi = 1 = νj . i∈I
j∈J
Therefore, we can search for it and are sure to find it if we guarantee the search to be exhaustive. Secondly, as global optima for the corresponding subproblems y (i) and y (j) satisfy the following optimality conditions on the lower level of the semi-infinite problem. Corollary 1. Using the setting of Theorem 7 let y (i) be a point satisfying con(i) (i) dition iii), where λi > 0. Let AC(y (i) ) := {k|yk = bk or yk = bk } be the set of active components of y (i) in B. Then thereexist Lagrange parameters ηj ≥ 0, j ∈ AC(y (i) ) such that x − Dy f (y (i) ) = ηj · (±ej ) , where ej is j∈AC(y (i) )
the j-th unit vector and the sign is +1 iff 3.2
(i) yj
(i)
= bj and −1 iff yj = bj .
Linear Approximation of Quadratic Functions Is in DΣR2
The previous results on the relations of BLAQIF and (LSI) are used in this subsection in order to prove membership in DΣR2 resp. in Σ 2 as follows. The overall goal is to find a solution (X, X) and check that it realizes the demanded bound M. It has to be shown how that can be realized using binary (digital) quantifiers. Towards this end 1) we guess the right set of signs for xk − xk , 1 ≤ k ≤ n and produce the corresponding (LSI); without loss of generality we again assume all these signs to be 0 or 1. 2) Assuming (X, X) to be known we guess certain discrete information which then is used to compute at least one point y (i) , i ∈ I and one point y (j) , j ∈ J satisfying Theorem 7. This is done using Corollary 1 and the ideas developed in [8]. 3) From the corollary and the information obtained in 2) we deduce conditions that have to be fulfilled by an optimal solution (X, X). These conditions lead to a linear programming problem. By means of a DN PR algorithm we obtain a candidate (X, X) for the optimum. 4) Finally, the candidate obtained in 3) is checked for optimality. This mainly requires to check the constraints, which now are quadratic programs in y. Using the results of [8] this problem belongs to class co-DN PR = DΠR1 . Together, we obtain a DΣR2 algorithm. Theorem 8. Let y (i) ∈ S and y (j) ∈ S be two points in the statement of Theorem 7 such that the corresponding Lagrange parameters λi and νj are positive.
On the Complexity of Some Problems in Interval Arithmetic
589
Suppose that we do neither know an optimal (X, X) nor y (i) , y (j) . Then having the correct information about the signs for xi − xi of an optimal solution and about the active components of y (i) and y (j) in S (i.e., those components that correspond either to bk or to bk ) we can compute an optimal solution (X, X) of (LSI) as (any) solution of a specific linear programming problem. Moreover, the latter linear programming problem can be constructed deterministically in polynomial time if the active components are known. Corollary 2. There is a DΣR1 algorithm which computes a set X of vectors in which an optimal solution of (LSI) can be found, i.e., a non-deterministic algorithm that guesses a vector in {0, 1}∗ of polynomial length in n and produces a candidate (X, X) for each guess such that at least one of the candidates produced is an optimal solution. Proof. The active components of y (i) and y (j) can be coded by a bit-vector. Now use Theorem 8 together with the results in [8]. Proof of Theorem 8. As it can be seen from the proof below it will be sufficient to argue for one of the points y (i) , y (j) only, so let us consider y (i) . W.l.o.g. (i) suppose the first s many components to be active and to satisfy yk = bk for 1 ≤ k ≤ s. This actually is the most difficult case because the values of the active (i) components yk = bk do not correspond to the assumed inequalities xk − xk ≥ 0 which in the objective function result in the terms (xk − xk ) · bk (instead of (i) (xk − xk ) · bk which would correspond to yk = bk ). However, the difference only results in an additional LP-problem which is of no concern in our analysis. We plug the active components into the constraint x0 + xT · y (i) − f (y (i) ) ≥ 0 and obtain a quadratic minimization problem in the remaining components ys+1 , . . . , yn : s n min x + x · b + x · y − f (b , . . . , b , y , . . . , y ) 0 k k n k 1 s s+1 (∗) k=1 k=s+1 such that bk < yk < bk , s + 1 ≤ k ≤ n If the guess was correct we know that an interior solution for y˜ := ys+1 , . . . , yn T exists. Now define f (b1 , . . . , bs , ys+1 , . . . , yn ) := 12 y˜T · D · y˜ + h · y˜ + e , where D ∈ R(n−s)×(n−s) , h ∈ Rn−s , e ∈ R. Then Corollary 1 together with a straightforward calculation gives: i) an optimal (interior) solution y˜ lies in the kernel of D; ii) an optimal (interior) solution y˜ satisfies (xs+1 , . . . , xn )T = D · y˜ + h . Thus, i) implies (xs+1 , . . . , xn )T = h and we can compute these components of the (LSI) solution directly; s iii) the optimal value of (∗) is 0; using ii) this results in x0 + xk · bk = e . k=1
In a completely analogue fashion we obtain a similar condition for the part X of a solution when studying an optimal y (j) for min f (y) − xT · y − x0 . If without
590
K. Meer
loss of generality the last n − components of a solution y (j) are active we get s (x1 , . . . , x )T = h as well as x0 + xk · bk = e for appropriate values h, e that k=1
can easily be computed knowing the active components of y (j) . Putting all the information together we have the following situation: knowing the active components of y (i) , y (j) we can directly compute in polynomial time from the problem input those components of an optimal solution (X, X) (i) (j) that correspond to non-active yk , yk . The remaining ones can be obtained as optimal solution of the linear program min x0 + s.t. x0 +
s k=1 s k=1
xk · bk +
n k=s+1
hk · bk − x0 +
xk · bk = e and x0 +
n k=+1
k=1
hk · bk −
n k=+1
xk · bk
xk · bk = e
Following [8] such a solution can be computed non-deterministically in polynomial time using a binary vector as a guess. The theorem can be used to prove the main result of this section: Theorem 9. The BLAQIF decision problem belongs to DΣR2 in the real number model and to Σ 2 in the Turing model. Proof. It is clear that there exists a best linear approximation for each instance. The first sequence of binary existential quantifiers is used to find the correct (LSI) version, i.e., to guess the correct signs for xk − xk in an optimal solution. We use the guess to construct the right objective function for the problem (as described before). According to Theorem 7 there exist two points y (i) , y (j) as described in Theorem 8. Moreover, we can guess a binary vector of polynomial length in the algebraic input size, perform the algorithm described in the proof of Theorem 8 and compute a candidate (X, X) for an optimal solution of the (LSI) instance. The proof also guarantees that if we would handle in a deterministic (but inefficient) algorithm all possible guesses at least one would give an optimal solution, see Corollary 2. In the remaining part of the DΣR2 algorithm we have to verify that the computed candidate (X, X) indeed is feasible and gives a bound ≤ M for the objective function. Whereas the latter is done by a simple evaluation the former requires the computation of an optimal point for two quadratic programming problems with linear constraints. These problems have the lower level variables y as unknowns and are obtained by plugging X and X into the lower level equations. We have seen this problem to be in co-DN PR . Note that if we want to get a globally minimal point we have to compare it with all other candidates. Thus, this part corresponds to checking validity of a formula containing a sequence of O(n) many universal binary quantifiers. This implies BLAQIF ∈ DΣR2 . In the Turing model the above structure of binary quantifiers still describes a Σ 2 procedure. The only point to check is that the intermediate computations can be performed in polynomial time with respect to the bit-measure. This is
On the Complexity of Some Problems in Interval Arithmetic
591
true for the arguments relying in [8] as well as for the proof of Theorem 8: No additional constants are introduced in these algorithms and the construction of intermediate matrices and LP-subproblems is done by only rearranging some of the input data. Corollary 3. BLAQIF is not N PR -hard under weak polynomial time reductions; it is not N PR -hard under polynomial time reductions unless a non-zero multiple of RESn can be computed non-uniformly in polynomial time on a Zariski-dense subset of H0 . Theorem 9 also answers a question posed in [6], chapter 19, concerning the complexity of the rational BLAQIF problem if the dimension n is fixed. Our result extends the one in [7]. Theorem 10. Let n ∈ N be fixed. The computational version of the BLAQIF problem for rational inputs and fixed dimension n is solvable in polynomial time in the Turing model. We finally mention that results similar to Corollary 3 can be obtained for several versions of interval linear systems, see [6], which are known to be NPcomplete in the Turing model. We postpone our discussion to the full version.
References 1. L. Blum, F. Cucker, M. Shub, S. Smale: Complexity and Real Computation. Springer, 1998. 2. F. Cucker, M. Shub, S. Smale: Complexity separations in Koiran’s weak model. Theoretical Computer Science, 133, 3 – 14, 1994. 3. E. Gr¨ adel, K. Meer: Descriptive Complexity Theory over the Real Numbers. In: Lectures in Applied Mathematics, J. Renegar, M. Shub, S. Smale (eds.), 32, 381– 403, 1996. 4. S.˚ A. Gustafson, K.O. Kortanek: Semi-infinte programming and applications. In: Mathematical Programming: The State of the Art, A. Bachem, M. Gr¨ otschel, B. Korte, eds., Springer, 132 – 157, 1983. 5. P. Koiran: A weak version of the Blum-Shub-Smale model. In 34th Annual IEEE Symposium on Foundations of Computer Science, 486 – 495, 1993. 6. V. Kreinovich, A.V. Lakeyev, J. Rohn, P. Kahl: Computational Complexity and Feasibility of Data Processing and Interval Computations. Kluwer, 1997. 7. M. Koshelev, L. Longpr´e, P. Taillibert: Optimal Enclusure of Quadratic Interval Functions. Reliable Computing 4, 351 – 360, 1998. 8. K. Meer: On the complexity of quadratic programming in real number models of computation. Theoretical Computer Science 133, 85 – 94, 1994. 9. D.A. Plaisted: New NP-hard and NP-complete polynomial and integer divisibility problems. Theoretical Computer Science 31, 125 – 138, 1984. 10. M. Shub: Some remarks on Bezout’s theorem and complexity theory. In: M. Hirsch, J. Marsden, M. Shub (eds.), Form topology to computation: Proc. of the Smalefest, Springer, 443 – 455, 1993. 11. B. Sturmfels: Introduction to resultants. In: Application of Computational Algebraic Geometry, D.A. Cox B. Sturmfels (eds.), Proc. of Symposia in Applied Mathematics, Vol. 53, AMS, 25 – 39, 1998.
An Abduction-Based Method for Index Relaxation in Taxonomy-Based Sources Carlo Meghini1 , Yannis Tzitzikas1, , and Nicolas Spyratos2 1
Consiglio Nazionale delle Ricerche Istituto della Scienza e delle Tecnologie della Informazione, Pisa, Italy 2 Laboratoire de Recherche en Informatique Universite de Paris-Sud, France
Abstract. The extraction of information from a source containing termclassified objects is plagued with uncertainty. In the present paper we deal with this uncertainty in a qualitative way. We view an information source as an agent, operating according to an open world philosophy. The agent knows some facts, but is aware that there could be other facts, compatible with the known ones, that might hold as well, although they are not captured for lack of knowledge. These facts are, indeed, possibilities. We view possibilities as explanations and resort to abduction in order to define precisely the possibilities that we want our system to be able to handle. We introduce an operation that extends a taxonomy-based source with possibilities, and then study the property of this operation from a mathematical point of view.
1
Introduction
Taxonomies are probably the oldest conceptual modeling tool. Nevertheless, they make a powerful tool still used for indexing by terms books in libraries, and very large collections of heterogeneous objects (e.g. see [8]) and the Web (e.g. Yahoo!, Open Directory). The extraction of information from an information source (hereafter, IS) containing term-classified objects is plagued with uncertainty. From the one hand, the indexing of objects, that is the assignment of a set of terms to each object, presents many difficulties, whether it is performed manually by some expert or automatically by a computer programme. In the former case, subjectivity may play a negative role (e.g. see [10]); in the latter case, automatic classification methods may at best produce approximations. On the other hand, the query formulation process, being linguistic in nature, would require perfect attuning of the system and the user language, an assumption that simply does not hold in open settings such as the Web. A collection of textual documents accessed by users via natural language queries is clearly a kind of IS, where documents play the role of objects and words play the role of terms. In this context, the above mentioned uncertainty is
This work has been carried out while Dr. Tzitzikas was a visiting researcher at CNR-ISTI as an ERCIM fellow. Our thanks to ERCIM.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 592–601, 2003. c Springer-Verlag Berlin Heidelberg 2003
An Abduction-Based Method for Index Relaxation
593
typically dealt with in a quantitative way, i.e. by means of numerical methods: in a document index, each term is assigned a weight, expressing the extent to which the document is deemed to be about the term. The same treatment is applied to each user query, producing an index of the query which is a formal representation of the user information need of the same kind as that of each document. Document and query term indexes are then matched against each other in order to estimate the relevance of the document to a query (e.g. see [1]). In the present study, we take a different approach, and deal with uncertainty in a qualitative way. We view an IS as an agent, operating according to an open world philosophy. The agent knows some facts, but it does not interpret these facts as the only ones that hold; the agent is somewhat aware that there could be other facts, compatible with the known ones, that might hold as well, although they are not captured for lack of knowledge. These facts are, indeed, possibilities. One way of defining precisely in logical terms the notion of possibility, is to equate it with the notion of explanation. That is, the set of terms associated to an object is viewed as a manifestation of a phenomenon, the indexing process, for which we wish to find an explanation, justifying why the index itself has come to be the way it is. In logic, the reasoning required to infer explanations from given theory and observations, is known as abduction. We will therefore resort to abduction in order to define precisely the possibilities that we want our system to be able to handle. In particular, we will define an operation that extends an IS by adding to it a set (term, object) pairs capturing the sought possibilities, and then study the property of this operation from a mathematical point of view. The introduced operation can be used also for ordering query answers using a possibility-based measure of relevance. The paper is structured as follows. Sections 2 and 3 provide the basis of our framework, introducing ISs and querying. Section 4 introduces extended ISs and Section 5 discusses query answering in such sources. Subsequently, Section 6 generalizes extended ISs and introduces iterative extensions of ISs. Finally, Section 7 concludes the paper. For reasons of space, proofs are just sketched.
2
Information Sources
An IS consists of two elements. The first one is a taxonomy, introduced next. Definition 1: A taxonomy is a pair O = (T, K) where T is a finite set of symbols, called the terms of the taxonomy, and K is a finite set of conditionals on T, i.e. formulae of the form p → q where p and q are terms; K is called the knowledge base of the taxonomy. The knowledge graph of O is the directed graph GO = (T, L), such that (t, t ) ∈ L iff t → t is in K. 2 The second element of an IS is a structure, in the logical sense of the term. Definition 2: Given a taxonomy O = (T, K), a structure on O is a pair U = (Obj, I) where: Obj is a countable set of objects, called the domain of the structure, and I is a finite relation from T to Obj, that is I ⊆ T × Obj, called the interpretation of the structure. 2
594
Carlo Meghini, Yannis Tzitzikas, and Nicolas Spyratos
As customary, we will treat the relation I as a function from terms to sets of objects and, where t is a term in T, write I(t) to denote the extension of t, i.e. I(t) = {o ∈ Obj | (t, o) ∈ I}. Definition 3: An information source (IS) S is a pair S = (O, U ) where O is a taxonomy and U is a structure on O. 2 It is not difficult to see the strict correspondence between the notion of IS and that of a restricted monadic predicate calculus: the taxonomy plays the role of the theory, by providing the predicate symbols (the terms) and a set of axioms (the knowledge base); the structure plays the basic semantical role, by providing a domain of interpretation and an extension for each term. These kinds of systems have also been studied in the context of description logics [3], where terms are called concepts and axioms are called terminological axioms. For the present study, we will mostly focus on the information relative to single objects, which takes the form of a propositional theory, introduced by the next Definition. Definition 4: Given an IS S and an object o ∈ Obj, the index of o in S, indS (o), is the set of terms in whose extension o belongs according to the structure S, formally: indS (o) = {t ∈ T | (t, o) ∈ I}. The context of o in S, CS (o), is defined as: CS (o) = indS (o) ∪ K. 2 For any object o, CS (o) consists of terms and simple conditionals that collectivelly form all the knowledge about o that S has. Viewing the terms as propositional variables makes object contexts propositional theories. This is the view that will be adopted in this study. Example 1: Throughout the paper, we will use as an example the IS graphically illustrated in Figure 1, given by (the abbreviations introduced in Figure 1 are used for reasons of space): T = {, C, SC, MPC, UD, R, M, UMC}, K = {C → , SC → C, MPC → C, UD → , R → SC, M → SC, UMC → MPC, UMC → UD}, and U is the structure given by: Obj = {1, 2} and I = {(SC, 1), (M, 2), (MPC, 2)}. The index of object 2 in S, indS (2) is {M, MPC}, while the context of 2 in S is CS (2) = indS (2) ∪ K. Notice that the taxonomy of the example has a maximal element, , whose existence is not required in every taxonomy. 2 Given a set of propositional variables P, a truth assignment for P is a function mapping P to the set of standard truth values, denoted by T and F, respectively [5]. A truth assignment V satisfies a sentence σ, V |= σ, if σ is true in V, according to the truth valuation rules of predicate calculus (PC). A set of sentences Σ logically implies the sentence α, Σ |= α, iff every truth assignment which satisfies every sentence in Σ also satisfies α. In the following, we will be interested in deciding whether a certain conditional is logically implied by a knowledge base. Proposition 1: Given a taxonomy O = (T, K) and any two terms p, q in T, K |= p → q iff there is a path from p to q in GO . 2 From a complexity point of view, the last Proposition reduces logical implication of a conditional to the well-known problem on graphs REACHABILITY, which has been shown to have time complexity equal to O(n2 ), where n is the
An Abduction-Based Method for Index Relaxation
595
Cameras(C)
StillCameras(SC)
Reflex (R)
MovingPictureCams(MPC)
Miniatures(M)
1
UnderwaterDevices(UD)
UnderwaterMovingCams(UMC)
2
Fig. 1. A source
number of nodes of the graph [7]. Consequently, for any two terms p, q in T, K |= p → q can be decided in time O(|T |2 ).
3
Querying Information Sources
We next introduce the query language for extracting information from an IS in the traditional question-answering way. Definition 5: Given a taxonomy O = (T, K), the query language for O, LO , is defined by the following grammar, where t is a term in T : q ::= t | q ∧ q | q ∨ q | ¬q | (q) 2 The answer to queries is defined in logical terms by taking a model-theoretic approach, compliant with the fact that the semantical notion of structure is used to model the extensional data of an IS. To this end, we next select, amongst the models of object contexts, the one realizing a closed-world reading of an IS, whose existence and uniqueness trivially follow from the next Definition. Definition 6: Given an IS S, for every object o ∈ Obj, the truth model of o in S, Vo,S , is the truth assignment for T defined as follows, for each term t ∈ T : T if CS (o) |= t Vo,S (t) = F otherwise Given a query ϕ in LO , the answer of ϕ in S is the set of objects whose truth model satisfies the query: ans(ϕ, S) = {o ∈ Obj | Vo,S |= ϕ}. 2 In the Boolean model of information retrieval, a document is returned in response to a query if the index of the document satisfies the query. Thus, the above definition extends Boolean retrieval by considering also the knowledge base in the retrieval process. Example 2: The answer to the query C in the IS introduced in Example 1, ans(C, S), consists of both object 1 (since {SC, SC → C} ⊆ CS (1) hence V1,S (C) = T) and object 2 (since {MPC, MPC → C} ⊆ CS (2) hence V2,S (C) = T). 2 The next definition introduces the function αS , which, along with Proposition 1, provides a mechanism for the computation of answers.
596
Carlo Meghini, Yannis Tzitzikas, and Nicolas Spyratos
Definition 7: Given an IS S, the solver of S, αS , is the total function from queries to sets of objects, αS : LO → P(Obj), defined as follows: αS (t) = {I(u) | K |= u → t} αS (q∧q ) = αS (q)∩αS (q ), αS (q∨q ) = αS (q)∪αS (q ), and αS (¬q) = Obj\αS (q). 2 As intuition suggests, solvers capture sound and complete query answerers. Proposition 2:
For all ISs S and queries ϕ ∈ LO , ans(ϕ, S) = αS (ϕ).
We shall also use I
−
to denote the restriction of αS on T , i.e. I
−
2
= αS|T .
Example 3: In the IS previously introduced, the term C can be reached in the knowledge graph by each of the following terms: C, SC, MPC, R, M, and UMC. Hence: ans(C, S) = αS (C) = I(C) ∪ I(SC) ∪ I(MPC) ∪ I(R) ∪ I(M) ∪ I(UMC) = {1, 2}. Likewise, it can be verified that ans(M, S) = {2} and ans(UMC, S) = ∅. 2 In the worst case, answering a query requires (a) to visit the whole knowledge graph for each term of the query and (b) to combine the so obtained sets of objects via the union, intersection and difference set operators. Since the time complexity of each such operation is polynomial in the size of the input, the time complexity of query answering is polynomial.
4
Extended Information Sources
Let us suppose that a user has issued a query against an IS and that the answer does not contain objects that are relevant to the user information need. The user may not be willing to replace the current query with another one, for instance because of lack of knowledge on the available language or taxonomy. In this type of situation, both database and information retrieval (IR) systems offer practically no support. If the IS does indeed contain relevant objects, the reason of the user’s disappointment is indexing mismatch: the objects have been indexed in a way that is different from the way the user would expect. One way of handling the problem just described, would be to consider the index of an IS not as the ultimate truth about how the world is and is not, but as a flexible repository of information, which may be interpreted in a more liberal or more conservative way, depending on the context. For instance, the above examples suggest that a more liberal view of the IS, in which the camera in question is indexed under the term M, could help the user in getting out of the impasse. One way of defining precisely in logical terms the discussed extension, is to equate it with the notion of explanation. That is, we view the index of an object as a manifestation, or observation, of a phenomenon, the indexing process, for which we wish to find an explanation, justifying why the index itself has come to be as it is. In logic, the reasoning required to infer explanations from given theory and observations, is known as abduction. The model of abduction that we adopt is the one presented in [4]. Let LV be the language of propositional logic over an alphabet V of propositional variables,
An Abduction-Based Method for Index Relaxation
597
with syntactic operators ∧, ∨, ¬, →, (a constant for truth) and ⊥ (falsity). A propositional abduction problem is a tuple A = V, H, M, T h , where V is a finite set of propositional variables, H ⊆ V is the set of hypotheses, M ⊆ V is the set of manifestations, and T h ⊆ LV is a consistent theory. S ⊆ H is a solution (or explanation) for A iff T h ∪ S is consistent and T h ∪ S |= M. Sol(A) denotes the set of the solutions to A. In the context of an IS S, the terms in S taxonomy play both the role of the propositional variables V and of the hypotheses H, as there is no reason to exclude apriori any term from an explanation; the knowledge base in S taxonomy plays the role of the theory T h; the role of manifestation, for a fixed object, is played by the index of the object. Consequently, we have the following Definition 8: Given an IS S and object o ∈ Obj, the propositional abduction problem for o in S, AS (o), is the propositional abduction problem AS (o) = T, T, indS (o), K . The solutions to AS (o) are given by: Sol(AS (o)) = {A ⊆ T | K ∪ A |= indS (o)} where the consistency requirement on K ∪ A has been omitted since for no knowledge base K and set of terms A, K ∪ A can be inconsistent. 2 Usually, certain explanations are preferable to others, a fact that is formalized in [4] by defining a preference relation over Sol(A). Letting a ≺ b stand for a b and b a, the set of preferred solutions is given by: Sol (A) = {S ∈ Sol(A) | ∃S ∈ Sol(A) : S ≺ S}. In the present context, we require the preference relation to satisfy the following criteria, reflecting the application priorities in order of decreasing priority: (1) explanations including only terms in the manifestation are less preferable than explanations including also terms not in the manifestation; (2) explanations altering the behaviour of the IS to a minimal extent, are to be preferred; (3) between two explanations that alter the behaviour of the IS equally, the simpler, that is the smaller, one is to be preferred. Without the first criterion, all minimal solutions would be found amongst the subsets of M, a clearly undesirable effect, at least as long as alternative explanations are possible. In order to formalize our intended preference relation, we start by defining perturbation. Definition 9: Given an IS S, an object o ∈ Obj and a set of terms A ⊆ T, the perturbation of A on S with respect to o, p(S, o, A) is given by the number of additional terms in whose extension o belongs, once the index of o is extended with the terms in A. Formally: p(S, o, A) = |{t ∈ T | (CS (o) ∪ A) |= t and CS (o) |= t}|. 2 As a consequence of the monotonicity of the PC, for all ISs S, objects o ∈ Obj and sets of terms A ⊆ T, p(S, o, A) ≥ 0. In particular, p(S, o, A) = 0 iff A ⊆ indS (o). We can now define the preference relation over solutions of the above stated abduction problem. Definition 10: Given an IS S, an object o ∈ Obj and two solutions A and A to the problem AS (o), A A if either of the following holds:
598
Carlo Meghini, Yannis Tzitzikas, and Nicolas Spyratos
1. p(S, o, A ) = 0 2. 0 < p(S, o, A) < p(S, o, A ) 3. 0 < p(S, o, A) = p(S, o, A ), and A ⊆ A .
2
In order to derive the set Sol (AS (o)), we introduce the following notions. Definition 11: Given an IS S and an object o ∈ Obj, the depth of Sol(AS (o)), do , is the maximum perturbation of the solutions to AS (o), that is: do = max{p(S, o, A) | A ∈ Sol(AS (o))} Moreover, two solutions A and A are equivalent, A ≡ A , iff they have the same perturbation, that is p(S, o, A) = p(S, o, A ). 2 It can be readily verified that ≡ is an equivalence relation over Sol(AS (o)), determining the partition π≡ whose elements are the set of solutions having the same perturbation. Letting Pi stand for the solutions having perturbation i, Pi = {A ∈ Sol(AS (o)) | p(S, o, A) = i} it turns out that π≡ includes one element for each perburbation value in between 0 and do , as the following Proposition states. Proposition 3: For all ISs IS S and objects o ∈ Obj, π≡ = {Pi | 0 ≤ i ≤ do }. In order to prove the Proposition, it must be shown that {Pi | 0 ≤ i ≤ do } is indeed a partition, that is: (1) Pi = ∅ for each 0 ≤ i ≤ do ; (2) Pi ∩ Pj = ∅ for 0 ≤ i, j ≤ do , i = j; (3) {Pi | 0 ≤ i ≤ do } = Sol(AS (o)). Items 2 and 3 above are easily established. Item 1 is trivial for do = 0. For do > 0, item 1 can be established by backward induction on i : the basis step, Pdo = ∅, is true by definition. The inductive step, Pk = ∅ for k > 0 implies Pk−1 = ∅, can be proved by constructing a solution having perturbation k − 1 from a solution with perturbation k. Finally, it trivially follows that this partition is the one induced by the ≡ relation. 2 We are now in the position of deriving Sol (AS (o)). Proposition 4: For all ISs S and objects o ∈ Obj, if do = 0 P0 Sol (AS (o)) = {A ∈ P1 | for no A ∈ P1 , A ⊂ A} if do > 0 This proposition is just a corollary of the previous one. Indeed, if do is 0, by Proposition 3, Sol(AS (o)) = P0 and by Definition 10, all elements in Sol(AS (o)) are minimal. If, on the other hand, do is positive, then by criterion (1) of Definition 10, all solutions with non-zero perturbation are preferable to those in P0 , and not viceversa; and by criterion (2) of Definition 10, all solutions with perturbation equal to 1 are preferable to the remaining, and not viceversa. Hence, for a positive do , minimal solutions are to be found in P1 . Finally, by considering the containment criterion set by item (3) of Definition 10, the Proposition results. Example 4: Let us consider again the IS S introduced in Example 1, and the problem AS (1). The manifestation is given by {SC}. Letting B stand for the set {UMC, MPC, UD, , C}, it can be verified that: Sol(AS (1)) = P(T ) \ P(B) as B includes all the terms in T not implying SC. Since do = 5, minimal solutions are to be found in the set P1 . By considering all sets of terms in Sol(AS (1)), it
An Abduction-Based Method for Index Relaxation
599
can be verified that: P1 = {{M} ∪ A | A ∈ P({SC, C, })} ∪ {{R} ∪ A | A ∈ P({SC, C, })} ∪ {{SC, UD} ∪ A | A ∈ P({, C})} ∪ {{SC, MPC} ∪ A | A ∈ P({, C})}. By applying the set containment criterion, we have: Sol (AS (1)) = {{M}, {R}, {SC, UD}, {SC, MPC}}. Analogously, it can be verified that: 2 Sol (AS (2)) = {{M, MPC, UD}, {R, M, MPC}}. We now introduce the notion of extension of an IS. The idea is that an extended IS (EIS for short) adds to the original IS all and only the indexing information captured by the abduction process illustrated in the previous Section. In order to maxime the extension, all the minimal solutions are included in the EIS. Definition 12: Given an IS S and an object o ∈ Obj, the abduced index of o, abindS (o), is given by: abindS (o) = Sol (AS (o)). The abduced interpretation of S, I + , is given by I + = I ∪ {t, o ∈ (T × Obj) | t ∈ abindS (o)}. Finally, the extended IS, S e , is given by S e = (O, U e ) where U e = (Obj, I + ). 2 Example 5: From the last Example, it follows that the extended S is given by S e = (O, U e ), U e = (Obj, I + ) where: abindS (1) = {SC, M, R, UD, MPC}, abindS (2) = {M, MPC, UD, R} and I + = {(SC, 1), (M, 1), (R, 1), (UD, 1), (MPC, 1), (M, 2), (MPC, 2), (UD, 2), (R, 2)} 2
5
Querying Extended Information Sources
As anticipated in Section 4, EISs are meant to be used in order to obtain more results about an already stated query, without posing a new query to the underlying information system. The following Example illustrates the case in point. Example 6: The answer to the query M in the extended IS derived in the last Example, ans(M, S e ), consists of both object 1 (since M ∈ abindS (1) hence M ∈ CS e (1)) and object 2 (since (M, 2) ∈ I hence (M, 2) ∈ I + ). Notice that 1 is not returned when M is stated against S, i.e. ans(M, S) ⊂ ans(M, S e ). Instead, ans(UMC, S) = ans(UMC, S e ) = ∅. 2 It turns that queries stated against an EIS can be answered without actually computing the whole EIS. In order to derive an answering procedure for queries posed against an EIS, we introduce a recursive function on the IS query language LO , in the same style as the algorithm for querying IS presented in Section 3. Definition 13: Given an IS S, the extended solver of S, αSe , is the total function from queries to sets of objects, αSe : LO → P(Obj), defined as follows: αSe (t) = {αS (u) | t → u ∈ K and K |= u → t} αSe (q ∧ q ) = αSe (q) ∩ αSe (q ) αSe (q ∨ q ) = αSe (q) ∪ αSe (q ) αSe (¬q) = Obj \ αSe (q)
where αS is the solver of S.
2
600
Carlo Meghini, Yannis Tzitzikas, and Nicolas Spyratos
Note that since is the maximal element the set {αS (u) | → u ∈ K and K |= u → } is empty. This means that αSe (), i.e. {αS (u) | → u ∈ K and K |= u → } is actually the intersection of an empty family of subsets of Obj. However, according to the Zermelo axioms of set theory (see [2] for an overview), the intersection of an empty family of subsets of a universe equals to the universe. In our case, the universe is the set of all objects known to the source, i.e. the set Obj, thus we conclude that αSe () = Obj. The same holds for each maximal element (if the taxonomy has more than one maximal elements). 2 Proposition 5: For all ISs S and queries ϕ ∈ LO , ans(ϕ, S e ) = αSe (ϕ). Example 7: By applying the last Proposition, we have: 2 ans(M, S e ) = αSe (M) = αS (SC) = I(SC) ∪ I(R) ∪ I(M) = {1, 2}.
6
Iterative Extension of Information Sources
Intuitively, we would expect that ·+ be a function which, applied to an IS interpretation, produces a new interpretation that is equal to or larger than the original extension, the former case corresponding to the situation in which the knowledge base of the IS does not enable to find any explanations for each object index. Technically, this amounts to say that ·+ is a monotonic function, which is in fact the case. Then, by iterating the ·+ operator, we expect to move from an interpretation to a larger one, until an interpretation is reached which cannot be extended any more. Also this turns out to be true, and in order to show it, we will model the domain of the ·+ operator as a complete partial order, and use the notion of fixed point in order to capture interpretations that are no longer extensible. Proposition 6: Given an IS S, the domain of S is the set D given by D = {I ∪ A | A ∈ P(T × Obj)}. Then, ·+ is a continuous function on the complete partial order (D, ⊆). The proof that (D, ⊆) is a complete partial order is trivial. The continuity of ·+ follows from its monotonicity (also, a simple fact to show) and the fact that in the considered complete partial order all chains are finite, hence the class of monotonic functions coincides with the class of continuous functions [6]. 2 As a corollary of the previous Proposition and of the Knaster-Tarski fixed point theorem, we have: Proposition 7: The function ·+ has a least fixed point that is the least upper bound of the chain {I, I + , (I + )+ , . . .}. 2 Example 8: Let R be the EIS derived in the last Example, i.e. R = S e , and let us consider the problem AR (1), for which the manifestation is given by the set abindS (1) above. It can be verified that Sol(AR (1)) = P0 ∪ P1 , where: P0 = {{R, M, MPC, UD} ∪ A | A ∈ P({SC, C, })} P1 = {{R, M, UMC} ∪ A | A ∈ P({SC, C, , MPC, UD})} Therefore: Sol (AR (1)) = {{R, M, UMC}} from which we obtain: abindR (1) = {R, M, UMC} which means that the index of object 1 in R has been extended with the term UMC. If we now set P = Re , and consider the problem AP (1), we find
An Abduction-Based Method for Index Relaxation
601
Sol(AP (1)) = P0 = {{R, M, UMC} ∪ A | A ∈ P({SC, MPC, UD, C, })} Consequently, Sol (AP (1)) = {{R, M, UMC}} and abindP (1) ⊆ indP (1). Analogously, we have abindR (2) = indR (2) ∪ {UMC} and abindP (2) ⊆ indP (2). Thus, since ((I + )+ )+ = (I + )+ , (I + )+ is a fixed point, which means that P is no longer extensible. Notice that ∅ = ans(UMC, S) = ans(UMC, R) ⊂ ans(UMC, P ) = {1, 2}. 2
7
Conclusion and Future Work
To alleviate the problem of indexing uncertainty we have proposed a mechanism which allows liberating the index of a source in a gradual manner. This mechanism is governed by the notion of explanation, logically captured by abduction. The proposed method can be implemented as an answer enlargement1 process where the user is not required to give additional input, but from expressing his/her desire for more objects. Another interesting remark is that the abduced extension operation can be applied not only to manually constructed taxonomies but also to taxonomies derived automatically on the basis of an inference service. For instance, it can be applied on sources indexed using taxonomies of compound terms which are defined algebraically [9]. The introduced framework can be also applied for ranking the objects of an answer according to an explanation-based measure of relevance. In particular, we can define the rank of an object o as (k)e follows: rank(o) = min{ k | o ∈ αS (ϕ)}.
References 1. R. Baeza-Yates and B. Ribeiro-Neto. “Modern Information Retrieval”. ACM Press, Addison-Wesley, 1999. 2. George Boolos. “Logic, Logic and Logic”. Harvard University Press, 1998. 3. F.M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Reasoning in description logics. In G. Brewka, editor, Principles of Knowledge Representation, Studies in Logic, Language and Information, pages 193–238. CSLI Publications, 1996. 4. T. Eiter and G. Gottlob. The complexity of logic-based abduction. Journal of the ACM, 42(1):3–42, January 1995. 5. H.B. Enderton. A mathematical introduction to logic. Academic Press, N. Y., 1972. 6. P.A. Fejer and D.A. Simovici. Mathematical Foundations of Computer Science. Volume 1: Sets, Relations, and Induction. Springer-Verlag, 1991. 7. C.H. Papadimitriou. Computational complexity. Addison-Wesley, 1994. 8. Giovanni M. Sacco. “Dynamic Taxonomies: A Model for Large Information Bases”. IEEE Transactions on Knowledge and Data Engineering, 12(3), May 2000. 9. Y. Tzitzikas, A. Analyti, N. Spyratos, and P. Constantopoulos. “An Algebra for Specifying Compound Terms for Faceted Taxonomies”. In 13th European-Japanese Conf. on Information Modelling and Knowledge Bases, Kitakyushu, J, June 2003. 10. P. Zunde and M.E. Dexter. “Indexing Consistency and Quality”. American Documentation, 20(3):259–267, July 1969. 1
If the query contains negation then the answer can be reduced.
On Selection Functions that Do Not Preserve Normality Wolfgang Merkle and Jan Reimann Ruprecht-Karls-Universit¨ at Heidelberg Mathematisches Institut Im Neuenheimer Feld 294 D-69120 Heidelberg, Germany {merkle,reimann}@math.uni-heidelberg.de
Abstract. The sequence selected from a sequence R(0)R(1) . . . by a language L is the subsequence of all bits R(n + 1) such that the prefix R(0) . . . R(n) is in L. By a result of Agafonoff [1], a sequence is normal if and only if any subsequence selected by a regular language is again normal. Kamae and Weiss [11] and others have raised the question of how complex a language must be such that selecting according to the language does not preserve normality. We show that there are such languages that are only slightly more complicated than regular ones, namely, normality is neither preserved by linear languages nor by deterministic one-counter languages. In fact, for both types of languages it is possible to select a constant sequence from a normal one.
1
Introduction
It is one of the fundamental beliefs about chance experiments that any infinite binary sequence obtained by independent tosses of a fair coin will, in the long run, produce any possible finite sequence with frequency 2−n , where n is the length of the finite sequence considered. Sequences of zeros and ones having this property are called normal. It is a basic result of probability theory that, with respect to the uniform Bernoulli measure, almost every sequence is normal. One may now pose the following problem: If we select from a normal sequence an infinite subsequence, under what selection mechanisms is the thereby obtained sequence again normal, i.e. which restrictions must and can one impose on the class of admissible selection rules to guarantee that normality is preserved. This problem originated in the work of von Mises (see for example [22]). His aim was to base a mathematical theory of probability on the primitive notion of a Kollektiv, which are objects having two distinguished properties. On the one hand, individual symbols possess an asymptotic frequency (as normal sequences do) which allows in turn to assign probabilities. On the other hand, the limiting frequencies are preserved when a subsequence is selected from the original sequence. Of course, not arbitrary selection rules, or place selection rules, as von Mises calls them, will be allowed in this context, since one might simply select all zeroes from a given sequence. Von Mises did not give a formal definition of an B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 602–611, 2003. c Springer-Verlag Berlin Heidelberg 2003
On Selection Functions that Do Not Preserve Normality
603
admissible selection rule, however, he requires them to select a subsequence “independently of the result of the corresponding observation, i.e., before anything is known about this result.” There have been various attempts to clarify and rigorously define what an admissible selection rule and hence a Kollektiv is. One approach allowed only rules that were in some sense effective, for instance, computable by a Turing machine. This effort was initiated by Church [9] and lead to the study of effective stochastic sequences (see the survey by Uspensky, Semenov and Shen [20] for more on this). Knowing Champernowne’s construction (see Champernowne’s paper [8] and Section 2 below), normal numbers disqualify as stochastic sequences, as some of them are easy to describe by algorithms. On the other hand, from a purely measure theoretic point of view, normal sequences seem to be good candidates for a Kollektiv, as they have the right limiting frequency of individual symbols. Furthermore, their dynamic behavior is as complex as possible: They are generic points in {0, 1}∞ with respect to a measure with highest possible entropy – the uniform (1/2, 1/2)-Bernoulli measure. (In Section 3 we will explain this further, also refer to Weiss [23].) So one might ask a question contrary to the problem set up by Church and others: Which selection rules preserve normality, i.e. map normal sequences to normal ones? In particular, such rules will preserve the limiting frequency of zeroes and ones and hence satisfy von Mises’ requirements for a Kollektiv. There are two kinds of selection rules that are commonly considered: oblivious ones, for which the decision of selecting a bit for the subsequence does not depend on the input sequence up to that bit (i.e., the places to be selected are fixed in advance), and those selection rules that depend on the input sequence. For oblivious selection rules, Kamae [10] found a necessary and sufficient condition for them to preserve normality. For input dependent rules, Agafonoff [1] obtained the result that if a sequence N is normal, then any infinite subsequence selected from N by a regular language L is again normal. Detailed proofs and further discussion can be found in Schnorr and Stimm [18] as well as in Kamae and Weiss [11]. (It is not hard to see that the reverse implication holds, too, as observed by Postnikova [17] and others [15,7], hence the latter property of a sequence N is equivalent to N being normal). It has been asked by Kamae and Weiss [11] whether Agafonoff’s result can be extended to classes of languages that are more comprising than the class of regular languages, e.g., to the class of context-free languages (see also Li and Vit´ anyi [13], p. 59, problem 1.9.7). In the sequel, we give a negative answer to this question for two classes of languages that are the least proper superclasses of the class of regular languages that are usually considered in the theory of formal languages. More precisely, Agafonoff’s result can neither be extended to the class of linear languages nor to the class of languages that are recognized by a deterministic pushdown automat with unary stack alphabet, known as deterministic one-counter languages. Recall that these two classes are incomparable and that the latter fact is witnessed, for example, by the languages used in the proofs of Propositions 10 and 11, i.e., the language of all words that contains as
604
Wolfgang Merkle and Jan Reimann
many 0’s as 1 and the language of even-length palindromes. (For background on formal language theory we refer to the survey by Autebert, Berstel, and Boasson [5].) However, determining exactly the class of languages preserving normality remains an open problem. The outline of the paper is as follows. In Section 2 we review the basic definitions related to normality and recap Champernowne’s constructions of normal sequences. Section 3 discusses the two kinds of selection rules, oblivious and input dependent ones. In Section 4, we show that normality is not preserved by selection rules defined by deterministic one-counter languages, while Section 5 is devoted to proving that normality is not preserved by linear languages. Our notation is mostly standard, for unexplained terms and further details we refer to the textbooks and surveys cited in the bibliography [3,4,6,13,14,16]. Unless explicitly stated otherwise, sequences are always infinite and binary. A word is a finite sequence. For i = 0, 1, . . ., we write A(i) for bit i of a sequence A, hence, A = A(0)A(1) . . . and we proceed similarly for words. A word w is a prefix of a sequence A if A(i) = w(i) for i = 0, . . . , |w| − 1, where |w| is the length of w. The prefix of a sequence A of length m is denoted by A|m. The concatenation of two words v and w is denoted by vw. A word u is a subword of a word w if w = v1 uv2 for appropriate words v1 and v2 .
2
Normal Sequences
For a start, we review the concept of a normal sequence and standard techniques for the construction of such sequences. Definition 1. (i) For given words u and w, let occu (w) be the number of times that u appears as a subword of w, and let frequ (w) = occu (w)/|w|. (ii) A sequence N is normal if and only if for any word u lim frequ (N |m) =
m→∞
1 . 2|u|
(1)
Remark 2. A sequence N is normal if for any word u and any ε > 0, we have for all sufficiently large m, 1 + ε. (2) 2|u| For a proof, it suffices to observe that for any given ε > 0 and for all sufficiently large m, inequality (2) holds with u replaced by any word v that has the same length as u, while the sum of the relative frequencies freqv (N |m) over these 2|u| words differ from 1 by less than ε; hence by (2) all such m, 1 (1 − ε) ≤ freqv (N |m) ≤ frequ (N |m) + (2|u| − 1)( 2|u| + ε), frequ (N |m)
0 has been chosen arbitrarily. Definition 3. A set W of words is normal in the limit if and only if for any nonempty word u and any ε > 0 for all but finitely many words w in W , 1 1 − ε < frequ (w) < |u| + ε. 2|u| 2
(3)
Definition 4. For any n, let vn = 0n 0n−1 1 0n−2 10 . . . 1n be the word that is obtained by concatenating all words of length n in lexicographic order. Proposition 5. The set {v1 , v2 , . . .} is normal in the limit. Proof. By an argument similar to the one given in Remark 2, it suffices to show that for any word u and any given ε > 0 we have for almost all words vi , frequ (vi )
0 and consider any index i such that |u|/i < ε. Recalling that vi is the concatenation of all words of length i, call a subword of vi undivided if it is actually a subword of one of these words of length i, and call all other subwords of vi divided. It is easy to see that u can occur at most 2i |u| many times as a divided subword of vi . Furthermore, a symmetry argument shows that among the at most |vi | many undivided subwords of vi of length |u|, each of the 2|u| words of length |u| occurs exactly the same number of times. In summary, we have |vi | 1 2i |u| 1 i occu (vi ) ≤ |u| + 2 |u| = |vi | < |vi | + +ε , |vi | 2 2|u| 2|u| where the last inequality follows by |vi | = 2i i and the choice of i. Equation (4) is then immediate by definition of frequ (vi ). Lemma 6. Let W be a set of words that is normal in the limit. Let w1 , w2 , . . . be a sequence of words in W such that |{i ≤ t : wi = w}| = 0, t→∞ t
(i) for all w ∈ W, lim
|wt+1 | = 0. t→∞ |w1 . . . wt |
(ii) lim
Then the sequence N = w1 w2 . . . is normal. Remark 7. The sequence v1 v2 v2 v3 v3 v3 v4 . . ., which consists of i copies of vi concatenated in length-increasing order, is normal. This assertion is immediate by definition of the sequence, Proposition 5, and Lemma 6.
606
Wolfgang Merkle and Jan Reimann
Due to lack of space, we omit the proof of Lemma 6. The arguments and techniques (also for the other results in this section) are essentially the same as the ones used by Champernowne [8], who considered normal sequences over the decimal alphabet {0, 1, . . . , 9} and proved that the decimal analogues of the sequences N1 = v1 v2 v2 v3 v3 v3 v4 . . .
and
N2 = v1 v2 v3 v4 . . .
are normal. In Remark 7, we have employed Lemma 6 and the fact that the set of all words vi is normal in the limit in order to show that N1 is normal. In order to demonstrate the normality of the decimal analogue of the sequence N2 , Champernowne [8, item (ii) on page 256] shows a fact about the decimal versions of the vi that is stronger than just being normal in the limit, namely, for any word u and any constant k, we have for all sufficiently large i, and all m ≤ |vi |, frequ (vi (0) . . . vi (m − 1))
0 for all nonempty proper prefixes of vs . (8)
We proceed by induction on i. For i = 0 there is nothing to prove, so assume i > 0. Let v0i and v1i be the first and the second half of vi , respectively. For r = 0, 1, the string vri is obtained from vi−1 by inserting 2i−1 times r, where d(vi−1 ) = 0 by the induction hypothesis. Hence (i) follows because d(vi ) = d(v0i ) + d(v1i ) = d(vi−1 ) + 2i−1 + d(vi−1 ) − 2i−1 = 0 . In order to show (ii), fix any nonempty proper prefix u of vi . First assume that u is a proper prefix of v0i . Then u can be obtained from a nonempty, proper prefix of vi−1 by inserting some 0’s, hence we are done by the induction hypothesis. Next assume u = v0i v for some proper prefix v of v1i . We have already argued that the induction hypothesis implies d(v0i ) = 2i−1 . Furthermore, v can be obtained from a proper prefix v of vi−1 by inserting at most 2i−1 many 1’s, where by the induction hypothesis we have d(v ) > 0. In summary, we have d(u) = d(v0i ) + d(v) ≥ 2i−1 + d(v ) − 2i−1 > 0 ,
which finishes the proof of the proposition.
5
Normality Is Not Preserved by Linear Languages
Proposition 11. There is a normal sequence N and a linear language L such that the sequence selected from N by L is infinite and constant. Proof. For any word w = w(0) . . . w(n − 1) of length n, let wR = w(n − 1) . . . w(0) be the mirror word of w and let L = {wwR : w is a word} be the language of palindromes of even length. The language L is linear because it can be generated by a grammar with start symbol S and rules S → 0S0 | 1S1 | λ. The sequence N is defined in stages s = 0, 1, . . . where during stage s we z0 and z0 both be equal to the specify prefixes zs and zs of N . At stage 0, let empty string. At any stage s > 0, obtain zs by appending 2s copies of vs to zs−1 zs its own mirror word zR and obtain zs by appending to s , i.e., zs = zs−1 vs . . . vs
(2s copies of vs ),
i.e., for example, we have z1 = v1 ,
and
zs = zs zR s;
(9)
On Selection Functions that Do Not Preserve Normality
z1 z2 z2 z3
609
= v1 vR 1, = v1 vR 1 v2 v2 , R R R = v1 vR 1 v2 v2 v2 v2 v1 v1 , R R R = v1 vR 1 v2 v2 v2 v2 v1 v1 v3 v3 v3 v3 ,
R R R R R R R R R R R z3 = v1 vR 1 v2 v2 v2 v2 v1 v1 v3 v3 v3 v3 v3 v3 v3 v3 v1 v1 v2 v2 v2 v2 v1 v1 .
We show next that the set of prefixes of N that are in L coincides with the set {zs : s ≥ 0}. From the latter, it is then immediate that L selects from N an infinite subsequence that consists only of 0’s, since any prefix zs of N is followed by the word vs+1 , where all these words start with 0. By definition of the zs , all words zs are prefixes of N and are in L. In order to show that the zs are the only prefixes of N contained in L, let us = 01s 1s 0 . By induction on s, we show for all s > 2 that (i) in zs occurs exactly one subword us−1 and no subword us ; (ii) in zs occur exactly two subwords us−1 and one subword us ; Inspection shows that both assertions are true in case s = 3. In the induction step, consider some s > 3. Assertion (i) follows by zs = zs−1 vs . . . vs , the induction hypothesis on zs−1 , and because by definition of vs , the block of copies of vs cannot overlap with a subword us . Assertion (ii) is then immediate by R assertion (i), by zs = zs zR and 01s is a suffix s , and because us is equal to us of zs . Now fix any prefix w of N and assume that w is in L, i.e., is a palindrome of even length. Let s be maximum such that zs is a prefix of w. We can assume s ≥ 3, because inspection reveals that w cannot be a prefix of z3 unless w is equal to some zi , where in the latter case we are done. By (ii), the words zs and zs+1 contain us as a subword exactly once and twice, respectively, hence w contains us as a subword at least once and at most twice. When mirroring the palindrome w onto itself, the first occurrence of the palindrome us in w must either be mapped to itself or, if present at all, to the second occurrence of us in w, in which cases w must be equal to zs and zs+1 , respectively. Since w was chosen as an arbitrary prefix of N in L, this shows that the zs are the only prefixes of N in L. It remains to show that N is normal. Let W = {vi : i ∈ N} ∪ {vi R : i ∈ N} and write the sequence N in the form N = w1 w2 . . .
(10)
where the words wi correspond in the natural way to the words in the set W that occur in the inductive definition of N (e.g., w1 , w2 , and w3 are equal to v1 , v1 R , and v2 ). For the scope of this proof, we will call the subwords wi of N in (10) the designated subwords of N .
610
Wolfgang Merkle and Jan Reimann
We conclude the proof by showing that the assumptions of Lemma 6 are satisfied. By Proposition 5, the set of all words of the form vi is normal in the limit, and the same holds, by literally the same proof, for the set of all words vi R ; the union of these two sets, i.e., the set W , is then also normal in the limit because the class of sets that are normal in the limit is easily shown to be closed under union. Next observe that in every prefix zs of N each of the 2s words v1 , . . . , vs and v1 R , . . . , vs R occurs exactly 2s−1 many times; in particular, zs contains at least s2s designated subwords and has length of at least 2s−1 |vs |. Now fix any t > 0 and let z = w1 . . . wt ; let s be maximum such that zs is a prefix of z. By the preceding discussion, we have for any w in W , |{i ≤ t : wi = w}| 1 2s < s = t s2 s and, furthermore, |wt+1 | |vs+1 | 2s+1 1 < ≤ s−1 = s−2 . |w1 . . . wt | |zs | 2 |vs | 2 Since t was chosen arbitrarily and s goes to infinity when t does, this shows that assumptions (i) and (ii) in Lemma 6 are satisfied.
Acknowledgements We are grateful to Klaus Ambos-Spies, Frank Stephan, and Paul Vit´ anyi for helpful discussions.
References 1. V. N. Agafonoff. Normal sequences and finite automata. Soviet Mathematics Doklady, 9:324–325, 1968. 2. K. Ambos-Spies. Algorithmic randomness revisited. In B. McGuinness (ed.), Language, Logic and Formalization of Knowledge. Bibliotheca, 1998. 3. K. Ambos-Spies and A. Kuˇcera. Randomness in computability theory. In P. Cholak et al. (eds.), Computability Theory: Current Trends and Open Problems, Contemporary Mathematics, 257:1–14. American Mathematical Society, 2000. 4. K. Ambos-Spies and E. Mayordomo. Resource-bounded balanced genericity, stochasticity and weak randomness. In Complexity, Logic, and Recursion Theory. Marcel Dekker, 1997. 5. J.-M. Autebert, J. Berstel, and L. Boasson, Context-Free Languages and Pushdown Automata. In G. Rozenberg and A. Salomaa (eds.), Handbook of formal languages. Springer, 1997. 6. J.L. Balc´ azar, J. D´ıaz and J. Gabarr´ o. Structural Complexity, Vol. I and II. Springer, 1995 and 1990. 7. A. Broglio and P. Liardet. Predictions with automata. Symbolic dynamics and its applications, Proc. AMS Conf. in honor of R. L. Adler, New Haven/CT (USA) 1991, Contemporary Mathematics, 135:111–124. American Mathematical Society, 1992.
On Selection Functions that Do Not Preserve Normality
611
8. D. G. Champernowne, The construction of decimals normal in the scale of ten. Journal of the London Mathematical Society, 8:254–260, 1933. 9. A. Church. On the concept of a random number. Bulletin of the AMS, 46:130–135, 1940. 10. T. Kamae. Subsequences of normal seuqences. Isreal Journal of Mathematics, 16:121–149, 1973. 11. T. Kamae and B. Weiss. Normal numbers and selection rules. Isreal Journal of Mathematics, 21(2-3):101–110, 1975. 12. M. van Lambalgen. Random Sequences, Doctoral dissertation, University of Amsterdam, Amsterdam, 1987. 13. M. Li and P. Vit´ anyi An Introduction to Kolmogorov Complexity and Its Applications, second edition, Springer, 1997. 14. J. H. Lutz. The quantitative structure of exponential time. In Hemaspaandra, L. A. and A. L. Selman, editors, Complexity Theory Retrospective II. Springer, 1997. 15. M. G. O’Connor. An unpredictability approach to finite-state randomness. Journal of Computer and System Sciences, 37(3):324–336, 1988. 16. P. Odifreddi. Classical Recursion Theory. Vol. I. North-Holland, 1989. 17. L. P. Postnikova, On the connection between the concepts of collectives of MisesChurch and normal Bernoulli sequences of symbols. Theory of Probability and its Applications, 6:211–213, 1961. 18. C. P. Schnorr and H. Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1:345–359, 1972. 19. A. Kh. Shen’. On relations between different algorithmic definitions of randomness. Soviet Mathematics Doklady, 38:316–319, 1988. 20. V. A. Uspensky, A. L. Semenov, and A. Kh. Shen’. Can an individual sequence of zeros and ones be random? Russian Math. Surveys, 45:121–189, 1990. ´ 21. J. Ville, Etude Critique de la Notion de Collectif. Gauthiers-Villars, 1939. 22. R. von Mises. Probability, Statistics and Truth. Macmillan, 1957. 23. B. Weiss. Single Orbit Dynamics. CBMS Regional Conference Series in Mathematics. American Mathematical Society, 2000.
On Converting CNF to DNF Peter Bro Miltersen1,∗ , Jaikumar Radhakrishnan2,∗∗ , and Ingo Wegener3,∗∗∗ 1
3
Department of Computer Science, University of Aarhus, Denmark [email protected] 2 School of Technology and Computer Science Tata Institute of Fundamental Research, Mumbai 400005, India [email protected] FB Informatik LS2, University of Dortmund, 44221 Dortmund, Germany [email protected]
Abstract. We study how big the blow-up in size can be when one switches between the CNF and DNF representations of boolean functions. For a function f : {0, 1}n → {0, 1}, cnfsize(f ) denotes the minimum number of clauses in a CNF for f ; similarly, dnfsize(f ) denotes the minimum number of terms in a DNF for f . For 0 ≤ m ≤ 2n−1 , let dnfsize(m, n) be the maximum dnfsize(f ) for a function f : {0, 1}n → {0, 1} with cnfsize(f ) ≤ m. We show that there are constants c1 , c2 ≥ 1 and > 0, such that for all large n and all m ∈ [ 1 n, 2n ], we have n n−c1 log(m/n)
2
n n−c2 log(m/n)
≤ dnfsize(m, n) ≤ 2
.
In particular, when m is the polynomial nc , we get dnfsize(nc , n) = n ) n−θ(c−1 log n
2
1
.
Introduction
Boolean functions are often represented as disjunctions of terms (i.e. in DNF) or as conjunctions of clauses (i.e. in CNF). Which of these representations is preferable depends on the application. Some functions are represented more succinctly in DNF whereas others are represented more succinctly in CNF, and switching between these representations can involve an exponential increase in size. In this paper, we study how big this blow-up in size can be. We recall some well-known concepts (for more details see Wegener [15]). The set of variables is denoted by Xn = {x1 , . . . , xn }. Literals are variables and negated variables. Terms are conjunctions of literals. Clauses are disjunctions of literals. Every Boolean function f can be represented as a conjunction of clauses, s , (1) i=1 ∈Ci
as well as a disjunction of terms, ∗ ∗∗ ∗∗∗
Supported by BRICS, Basic Research in Computer Science, a centre of the Danish National Research Foundation. Work done while the author was visiting Aarhus. Supported by DFG-grant We 1066/9.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 612–621, 2003. c Springer-Verlag Berlin Heidelberg 2003
On Converting CNF to DNF s
,
613
(2)
i=1 ∈Ti
where Ti and Ci are sets of literals. The form (1) is usually referred to as conjunctive normal form (CNF) and the form (2) is usually referred to as disjunctive normal form (DNF), although it would be historically more correct to call them conjunctive and disjunctive forms and use normal only when the sets Ci and Ti have n literals on distinct variables. In particular, this would ensure that normal forms are unique. However, in the computer science literature such a distinction is not made, and we will use CNF and DNF while referring to expressions such as (1) or (2) even when no restriction is imposed on the sets Ci and Ti , and there is no guarantee of uniqueness. The size of a CNF is the number of clauses (the parameter s in (1)), and cnfsize(f ) is the minimum number of clauses in a CNF for f . Similarly, dnfsize(f ) is the minimum number of terms in a DNF for f . We are interested in the maximal blow-up of size when switching from the CNF representation to the DNF representation (or vice versa). For 0 ≤ m ≤ 2n−1 , let dnfsize(m, n) be the maximum dnfsize(f ) for a function f : {0, 1}n → {0, 1} with cnfsize(f ) ≤ m. Since ∧ distributes over ∨, a CNF with m clauses each with k literals can be converted to a DNF with k m terms each with at most m literals. If the clauses do not share any variable, this blow-up cannot be avoided. If the clauses don’t share variables, we have km ≤ n, and the maximum n dnfsize(f ) that one can achieve by this method is 2 2 . Can the blow-up be worse? In particular, we want to know the answer to the following question: For a function f : {0, 1}n → {0, 1}, how large can dnfsize(f ) be if cnfsize(f ) is bounded by a fixed polynomial in n? The problem is motivated by its fundamental nature: dnfsize(f ) and cnfsize(f ) are fundamental complexity measures. Practical circuit designs like programmable logic arrays (PLAs) are based on DNFs and CNFs. Lower bounds on unbounded fan-in circuits are based on the celebrated switching lemma of H˚ astad (1989) which is a statement about converting CNFs to DNFs where some variables randomly are replaced by constants. Hence, it seems that the exact relationship between CNFs and DNFs ought to be understood as completely as possible. Fortunately, CNFs and DNFs have simple combinatorial properties allowing the application of current combinatorial arguments to obtain such an understanding. In contrast, the results of Razborov and Rudich [12] show that this is not likely to be possible for complexity measures like circuit size and circuit depth. Another motivation for considering the question is the study of SAT algorithms and heuristics with “mild” exponential behaviour; a study which has gained a lot of momentum in recent years (e.g., Monien and Speckenmeyer[9], Paturi et al. [10], Dantsin et al. [4], Sch¨ oning [13], Hofmeister et al. [7], and Dantsin et al. [5]). Despite many successes, the following fundamental question is still open: Is there an algorithm that decides SAT of a CNF with n variables and m clauses (without any restrictions on the length of clauses) in time mO(1) 2cn for some constant c < 1? The obvious brute force algorithm solves the
614
Peter Bro Miltersen, Jaikumar Radhakrishnan, and Ingo Wegener
problem in time mO(1) 2n . One method for solving SAT is to convert the CNF to a DNF, perhaps using sophisticated heuristics to keep the final DNF and any intermediate results small (though presumably not optimally small, due to the hardness of such a task). Once converted to a DNF, satisfiability of the formula is trivial to decide. A CNF-DNF conversion method for solving SAT, phrased in a more general constraint satisfaction framework was recently studied experimentally by Katajainen and Madsen [8]. Answering the question above limits the worst case complexity of any algorithm obtained within this framework. The Monotone Case: Our final motivation for considering the question comes from the monotone version of the problem. Let dnfsize+ (m, n) denote the maximum dnfsize(f ) for a monotone function f : {0, 1}n → {0, 1}. In this case (see, e.g., Wegener [15, Chapter 2, Theorem 4.2]), the number of prime clauses of f is equal to cnfsize(f ) and the number of prime implicants of f is equal to dnfsize(f ). Our problem can then be modelled on a hypergraph Hf whose edges are precisely the prime clauses of f . A vertex cover or hitting set for a hypergraph is a subset of vertices that intersects every edge of the hypergraph. The number of prime implicants of f is precisely the number of minimal vertex covers in Hf . The problem of determining dnfsize+ (m, n) then immediately translates to the following problem on hypergraphs: What is the maximum number of distinct minimal vertex covers in a hypergraph on n vertices with m distinct edges? In particular, how many minimal vertex covers can a hypergraph with nO(1) edges have? Previous Work: Somewhat surprisingly, the exact question we consider does not seem to have been considered before, although some related research has been reported. As mentioned, H˚ astad’s switching lemma can be considered as a result about approximating CNFs by DNFs. The problem of converting polynomialsize CNFs and DNFs into representations by restricted branching programs for the purpose of hardware verification has been considered since a long time (see Wegener [16]). The best lower bounds for ordered binary decision diagrams (OBDDs) and read-once branching programs (BP1s) are due to Bollig and We1/2 gener [3] and are of size 2Ω(n ) even for monotone functions representable as disjunctions of terms of length 2. The Results in this Paper: In Section 2, we show functions where the the blow-up when going from CNF to DNF is large: n
for 2 ≤ m ≤ 2n−1 , dnfsize(m, n) ≥ 2n−2 log(m/n) ; log log(m/n) n for 2 ≤ m ≤ n2 , dnfsize+ (m, n) ≥ 2n−n log(m/n) −log(m/n) . In particular, for m = nO(1) , we have n
dnfsize(m, n) = 2n−O( log n ) and dnfsize+ (m, n) = 2n−O(
n log log n ) log n
.
In Section 3, we show that functions with small CNFs do not need very large DNFs There is a constant c > 0 such that for all large n and all m ∈ − [104 n, 210 4n ],
On Converting CNF to DNF
615
n
dnfsize(m, n) ≤ 2n−c log(m/n) . In particular, for m = nO(1) , we have dnfsize(m, n) = 2n−Ω(n/log n) . For the class of CNF-DNF conversion based SAT algorithms described above, our results imply that no algorithm within this framework has complexity mO(1) 2cn for some constant c < 1, though we cannot rule out an algorithm of this kind with complexity mO(1) 2n−Ω(n/ log n) which would still be a very interesting result.
2
Functions with a Large Blow-Up
In this section, we show functions with small cnfsize but large dnfsize. Our functions will be the conjunction of a small number of parity and majority functions. To estimate the cnfsize and the dnfsize of such functions, we will need use a lemma. Recall, that a prime implicant t of a boolean function f is called an essential prime implicant if there is an input x such that t(x) = 1 but t (x) = 0 for all other prime implicants t of f . We denote the number of essential prime implicants of f by ess(f ). Lemma 1. Let f (x) = i=1 gi (x), where the gi ’s depend on disjoint sets of variables and no gi is identically 0. Then, cnfsize(f ) =
cnfsize(gi ) and
dnfsize(f ) ≥ ess(f ) =
i=1
ess(gi ).
i=1
Proof. First, consider cnfsize(f ). This part is essentially Theorem 1 of Voigt and Wegener [14]. We recall their argument. Clearly, we can put together the CNFs of the gi ’s and produce a CNF for f with size at most i=1 cnfsize(gi ). To show that cnfsize(f ) ≥ i=1 cnfsize(gi ), let C be the set of clauses of the smallest CNF of f . We may assume that all clauses in C are prime clauses of f . Because the gi ’s depend on disjoint variables, every prime clause of f is a prime clause of exactly one gi . Thus we obtain a natural partition {C1 , C2 , . . . , C } of C where each clause in Ci is a prime clause of gi . Consider a setting to the variables of gj (j = i) that makes each such gj take the value 1 (this is possible because no gj is identically 0). Under this restriction, the function f reduces to gi and all clauses outside Ci are set to 1. Thus, gi ≡ c∈Ci c, and |Ci | ≥ cnfsize(gi ). The first claim follows from this. It is well-known since Quine [11] (see also, e.g., Wegener [15, Chapter 2, Lemma 2.2]) that dnfsize(f ) ≥ ess(f ). Also, it is easy to see that any essential prime implicant of f is the conjunction of essential prime implicants of gi and every conjunction of essential prime implicants of gi is an essential prime implicant of f . Our second claim follows from this.
We will apply the above lemma with the parity and majority functions as gi ’s. It is well-known that the parity function on n variables, defined by ∆
Parn (x) =
n
i=1
xi =
n i=1
xi
(mod 2),
616
Peter Bro Miltersen, Jaikumar Radhakrishnan, and Ingo Wegener
has cnfsize and dnfsize equal to 2n−1 . For monotone functions, it is known that for the majority function on n variables, defined by Maj(x) = 1 ⇔ has cnfsize and dnfsize equal to
n n/2
n
xi ≥
i=1
n , 2
.
Definition 1. Let the set of n variables {x1 , x2 , . . . , xn } be partitioned into = n/k sets S1 , . . . S where |Si | = k for i < . The functions fk,n , hk,n : {0, 1}n → {0, 1} are defined as follows: fk,n (x) =
i=1 j∈Si
xj and hk,n (x) =
Maj(xj : j ∈ Si ).
i=1
Theorem 1. Suppose 1 ≤ k ≤ n. Then n · 2k−1 and dnfsize(fk,n ) = 2n−n/k ; cnfsize(fk,n ) ≤ k
n/k n k k cnfsize(hk,n ) ≤ · and dnfsize(hk,n ) ≥ . k k/2 k/2 k Proof. As noted above cnfsize(Park ) = 2k−1 and cnfsize(Majn ) = k/2 . Also, k k−1 and ess(Majn ) = k/2 . Our theorem it is easy to verify that ess(Park ) = 2 follows easily from this using Lemma 1.
Remark: One can determine the dnfsize of fk,n and hk,n directly using a general result of Voigt and Wegener [14], which states that the dnfsize(g1 ∧ g2 ) = dnfsize(g1 ) · dnfsize(g2 ) whenever g1 and g2 are symmetric functions on disjoint sets of variables. This is not true for general functions g1 and g2 (see Voigt and Wegener [14]). Corollary 1. 1. Let 2n ≤ m ≤ 2n−1 . There is a function f with cnfsize(f ) ≤ m and dnfsize(f ) ≥ 2n−2n/ log(m/n) . n 2. Let 4n ≤ m ≤ n/2 . Then, there is a monotone function h with cnfsize(h) ≤ m and dnfsize(h) ≥ 2n−n
log log(m/n) −log(m/n) log(m/n)
.
Proof. The first part follows from Theorem 1, by considering fk,n for k = log2 (m/n). The second part follows from the Theorem 1, by considering hk,n k ≤ 2k−1 (valid for with the same value of k. We use the inequality 2k /k ≤ k/2 k ≥ 2).
Let us understand what this result says for a range of parameters, assuming n is large.
On Converting CNF to DNF
617
Case m = cn: There is a function with linear cnfsize but exponential dnfsize. For > 0, by choosing c = θ(22/ ), the dnfsize can be made at least 2(1−)n . −1 n Case m = nc : We can make dnfsize(f ) = 2n−O(c log n ) . By choosing c large we obtain in the exponent an arbitrarily small constant for the (n/ log n)-term. Case m = 2o(n) : We can make dnfsize(f ) grow at least as fast as 2n−α(n) , for each α = ω(1). Monotone functions: We obtain a monotone function whose cnfsize is nat most log log n a polynomial m = nc , but whose dnfsize can be made as large as 2n−ε log n . Here, ε = O(c−1 ).
3
Upper Bounds on the Blow-Up
In this section, we show the upper bound on dnfsize(m, n) claimed in the introduction. We will use restrictions to analyse CNFs. So, we first present the necessary background about restrictions, and then use it to derive our result. 3.1
Preliminaries
Definition 2 (Restriction). A restriction on a set of variables V is a function ρ : V → {0, 1, }. The set of variables in V assigned by ρ are said to have been left free by ρ and denoted by free(ρ); the remaining variables set(ρ) = V − free(ρ) are said to be set by ρ. Let S ⊆ V . We use RVS to denote the set of all restrictions ρ with set(ρ) = S. For a Boolean function f on variables V and a restriction ρ, we denote by fρ the function with variables free(ρ) obtained from f by fixing all variables x ∈ set(V ) at the value ρ(x). The following easy observation lets us conclude that if the subfunctions obtained by applying restrictions have small dnfsize then the original function also has small dnfsize. Lemma 2. For all S ⊆ V and all boolean functions f with variables V , dnfsize(f ) ≤ dnfsize(fρ ). ρ∈RV S
Proof. Let Φfρ denote the smallest DNF for fρ . For a restriction ρ ∈ RVS , let t(ρ) be the term consisting of literals from variables in S that is made 1 by ρ and 0 by all other restrictions in RVS . (No variables outside S appears in t(ρ). Every variables in S appears in t(ρ): the variable x appears unnegated if and only if ρ(x) = 1.) Then, Φ = ρ∈RV t(ρ) ∧ Φfρ gives us a DNF for f of the required S size.
In light of this observation, to show that the dnfsize of some function f is small, it suffices to somehow obtain restrictions of f that have small dnfsize. Random restrictions are good for this. We will use random restrictions in two ways. If the clauses of a CNF have a small number of literals, then the switching
618
Peter Bro Miltersen, Jaikumar Radhakrishnan, and Ingo Wegener
lemma of H˚ astad[6] and Beame [1] when combined with Lemma 2 immediately gives us a small DNF (see Lemma 4 below). We are, however, given a general CNF not necessarily one with small clauses. Again, random restrictions come to our aid: with high probability large clauses are destroyed by random restrictions (see Lemma 5). Definition 3 (Random Restriction). When we say that ρ is a random restriction on the variables in V leaving variables free, we mean that ρ is generated as follows: first, pick a set S of size |V | − at random with uniform distribution; next, pick ρ with uniform distribution from RVS . We will need the following version of the switching lemma due to Beame [1]. Lemma 3 (Switching Lemma). Let f be a function on n variables with a a CNF whose clauses have at most r literals. Let ρ be a random restriction leaving variables free. Then Pr[fρ does not have a decision tree of depth d] < (7r/n)d . We can combine Lemma 2 and the switching lemma to obtain small DNFs for functions with CNFs with small clauses. n Lemma 4. Let 1 ≤ r ≤ 100 . Let f have a CNF on n variables where each clause 1 n has at most r literals. Then, dnfsize(f ) ≤ 2n− 100 · r .
Proof. Let V be 1the set of variables of f . Let ρ be a random restriction on V that leaves = 15 · nr variables free. By the switching lemma, with probability more than 1 − 2−d , fρ has a decision tree of depth at most d. We can fix S ⊆ V so that this event happens with this probability even when conditioned on set(ρ) = S, that is, when ρ is chosen at random with uniform distribution from RVS . If fρ has a decision tree of depth at most d, then it is easy to see that dnfsize(fρ ) ≤ 2d . In any case, dnfsize(fρ ) ≤ 2−1 . Thus, by Lemma 2, we have dnfsize(f ) ≤ 2n− · 2d + 2n− · 2−d · 2−1 . 1 n Set d = 2 . Then, dnfsize(f ) ≤ ρ∈RV dnfsize(fρ ) ≤ 2n− 2 +1 ≤ 2n− 100 · r . S
Lemma 5. Let V be a set of n variables, and K a set of literals distinct on variables. Let |K| = k. Let ρ be a random restriction that leaves n2 variables free. Then, k Pr[no literal in K is assigned 1] ≤ 2e− 8 . ρ
Proof. Let W be the set of variables that appear in K either in negated or nonnegated form. Using estimates for the tail of the hypergeometric distribution [2], we see first have k k Pr[|W ∩ set(ρ)| ≤ ] ≤ exp(− ). 4 8 k k Furthermore, Pr[no literal in K is assigned 1 | |W ∩ set(ρ)| ≥ ] ≤ 2− 4 . Thus, 4 k
k
k
Pr[no literal in K is assigned 1] ≤ e− 8 + 2− 4 < 2e− 8 . ρ
On Converting CNF to DNF
3.2
619
Small DNFs from Small CNFs
We now show that the blow-up obtained in the previous section (see Corollary 1) is essentially optimal. Theorem 2. There is a constant c > 0, such that for all large n, and m ∈ −4 [104 n, 210 n ], n dnfsize(m, n) ≤ 2n−c log(m/n) . Proof. Let f be a Boolean function on a set V of n variables, and let Φ be a CNF for f with at most m clauses. We wish to show that f has a DNF of small size. By comparing the present bound with Lemma 4, we see that our job would be done if we could somehow ensure that the clauses in Φ have at most O(log(m/n)) literals. All we know, however, is that Φ has at most m clauses. In order to prepare Φ for an application of Lemma 4, we will attempt to destroy the large clauses of Φ by applying a random restriction. Let ρ be a random restriction on V that leaves n2 variables free. We cannot claim immediately that all large clause are likely to be destroyed by this restriction. Instead, we will use the structure of the surviving large clauses to get around them. The following predicate will play a crucial role in our proof. E(ρ): There is a set S0 ⊆ free(ρ) of size at most n/10 so that every clause ∆ of Φ that is not killed by ρ has at most r = 100 log(m/n) free variables outside S0 . n
Claim. Prρ [E(ρ)] ≥ 1 − 2− 100 . Before we justify this claim, let us see how we can exploit it to prove our n theorem. Fix a choice of S ⊆ V such that Pr[E(ρ) | set(ρ) = S] ≥ 1 − 2− 100 . Let F = V − S. We will concentrate only on ρ’s with set(ρ) = S, that is, ρ’s from the set RVS . We will build a small DNF for f by putting together the DNFs for the different fρ ’s. The key point is that whenever E(ρ) is true, we will be able to show that fρ has a small DNF. E(ρ) is true: Consider the set S0 ⊆ free(ρ) whose existence is promised in the definition of E(ρ). The definition of S0 implies that for each σ ∈ RF S0 all clauses of Φσ◦ρ have at most r literals. By Lemma 4, dnfsize(fσ◦ρ ) ≤ 2|F |−|S0 |−
|F |−|S0 | 100r
, and by Lemma 2, we have |F |−|S0 | |F |−|S0 | dnfsize(fσ◦ρ ) ≤ 2|S0 | 2|F |−|S0 |− 100r ≤ 2|F |− 100r . dnfsize(fρ ) ≤ σ∈RF S
0
E(ρ) is false: We have dnfsize(fρ ) ≤ 2|F |−1 . Using these bounds for dnfsize(fρ ) for ρ ∈ RVS in Lemma 2 we obtain dnfsize(f ) ≤ 2|S| · 2|F |−
|F |−|S0 | 100r
n
+ 2|S| 2− 100 2|F |−1 = 2n (2−
|F |−|S0 | 100r
n
+ 2− 100 ).
The theorem follows from this because |F | − |S0 | = Ω(n) and r = O(log(m/n)). We still have to prove the claim.
620
Peter Bro Miltersen, Jaikumar Radhakrishnan, and Ingo Wegener
Proof of Claim. Suppose E(ρ) is false. We will first show that there is a set of at most n/(10(r + 1)) surviving clauses in Φρ that together involve at least n/10 variables. The following sequential procedure will produce this set of clauses. Since E does not hold, there is some (surviving) clause c1 of Φρ with at least r +1 variables. Let T be the set of variables that appear in this clause. If |T | ≥ n/10, then we stop: {c1 } is the set we seek. If |T | < n/10, there must be another clause c2 of Φρ with r + 1 variables outside T , for otherwise, we could take S0 = T and E(ρ) would be true. Add to T all the variables in c2 . If |T | ≥ n/10, we stop with the set of clauses {c1 , c2 }; otherwise, arguing as before there must be another clause c3 of Φρ with r + 1 variables outside T . We continue in this manner, picking a new clause and adding at least r + 1 elements to T each time, as long n n as |T | < 10 . Within n/(10(r + 1)) steps we will have |T | ≥ 10 , at which point we stop. For a set C of clauses of Φ, let K(C) be a set of literals obtained by picking one literal for each variable that appears in some clause in C. By the discussion above, for E(ρ) to be false, there must be some set C of clauses of Φ such that ∆ n |C| ≤ n/(10(r + 1)) = a, K(C) ≥ 10 and no literal in K(C) is assigned 1 by ρ. Thus, using Lemma 5, we have Pr[¬E(ρ)] ≤ Pr[no literal in K(C) is assigned 1 by ρ] ρ
n C,|C|≤a,|K(C)|≥ 10
≤
a m j=1
j
n
ρ
n
· 2e− 80 ≤ 2− 100 .
To justify the last inequality, we used the assumption that n is large and m ∈ −4 [104 n, 210 n ]. We omit the detailed calculation. This completes the proof of the claim.
4
Conclusion and Open Problems n
We have shown lower and upper bounds for dnfsize(m, n) of the form 2n−c log(m/n) . The constant c in the lower and upper bounds are far, and it would be interesting to bring them closer, especially when m = An for some constant A. Our bounds are not tight for monotone functions. In particular, what is the largest possible blow-up in size when converting a polynomial-size monotone CNF to an equivalent optimal-size monotone DNF? Equivalently, what is the largest possible number of distinct minimal vertex covers for a hypergraph with n vertices and nO(1) edges? We have given an upper bound 2n−Ω(n/ log n) and a lower bound 2n−O(n log log n/ log n) . Getting tight bounds seems challenging.
Acknowledgements We thank the referees for their comments.
On Converting CNF to DNF
621
References 1. Beame, P.: A switching lemma primer. Technical Report UW-CSE-95-07-01, Department of Computer Science and Engineering, University of Washington (November 1994). Available online at www.cs.washington.edu/homes/beame/. 2. Chv´ atal, V.: The tail of the hypergeometric distribution. Discrete Mathematics 25 (1979) 285–287. 3. Bollig, B. and Wegener, I.: A very simple function that requires exponential size read-once branching programs. Information Processing Letters 66 (1998) 53–57. 4. Dantsin, E., Goerdt, A., Hirsch, E.A., and Sch¨ oning, U.: Deterministic algorithms for k-SAT based on covering codes and local search. Proceedings of the 27th International Colloquium on Automata, Languages and Programming. Springer. LNCS 1853 (2000) 236–247. 5. Dantsin, E., Goerdt, A., Hirsch, E.A., Kannan, R., Kleinberg, J., Papadimitriou, C., Raghavan, P., and Sch¨ oning, U.: A deterministic (2 − 2/(k + 1))n algorithm for k-SAT based on local search. Theoretical Computer Science, to appear. 6. H˚ astad, J.: Almost optimal lower bounds for small depth circuits. In: Micali, S. (Ed.): Randomness and Computation. Advances in Computing Research, 5 (1989) 143–170. JAI Press. 7. Hofmeister, T., Sch¨ oning, U., Schuler, R., and Watanabe, O.: A probabilistic 3-SAT algorithm further improved. Proceedings of STACS, LNCS 2285 (2002) 192–202. 8. Katajainen, J. and Madsen, J.N.: Performance tuning an algorithm for compressing relational tables. Proceedings of SWAT, LNCS 2368 (2002) 398–407. 9. Monien, B. and Speckenmeyer, E.: Solving satisfiability in less than 2n steps. Discrete Applied Mathematics 10 (1985) 287–295. 10. Paturi, R., Pudl` ak, P., Saks, M.E., and Zane, F.: An improved exponential-time algorithm for k-SAT. Proceedings of the 39th IEEE Symposium on the Foundations of Computer Science (1998) 628–637. 11. W. V. O. Quine: On cores and prime implicants of truth functions. American Mathematics Monthly 66 (1959) 755–760. 12. Razborov, A. and Rudich, S.: Natural proofs. Journal of Computer and System Sciences 55 (1997) 24–35. 13. Sch¨ oning, U.: A probabilistic algorithm for k-SAT based on limited local search and restart. Algorithmica 32 (2002) 615–623. 14. Voigt, B., Wegener, I.: Minimal polynomials for the conjunctions of functions on disjoint variables an be very simple. Information and Computation 83 (1989) 65– 79. 15. Wegener, I.: The Complexity of Boolean Functions. Wiley 1987. Freely available via http://ls2-www.cs.uni-dortmund.de/∼wegener. 16. Wegener, I.: Branching Programs and Binary Decision Diagrams – Theory and Applications. SIAM Monographs on Discrete Mathematics and Applications 2000.
A Basis of Tiling Motifs for Generating Repeated Patterns and Its Complexity for Higher Quorum∗ N. Pisanti1 , M. Crochemore2,3,∗∗ , R. Grossi1 , and M.-F. Sagot4,3,∗∗∗ 1
2
Dipartimento di Informatica, Universit` a di Pisa, Italy {pisanti,grossi}@di.unipi.it Institut Gaspard-Monge, University of Marne-la-Vall´ee, France [email protected] 3 INRIA Rhˆ one Alpes, France [email protected] 4 King’s College London, UK
Abstract. We investigate the problem of determining the basis of motifs (a form of repeated patterns with don’t cares) in an input string. We give new upper and lower bounds on the problem, introducing a new notion of basis that is provably smaller than (and contained in) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all these bases grows exponentially with the quorum, the minimal number of times a motif must appear. We show that a polynomial-time algorithm exists only for fixed quorum.
1
Introduction
Identifying repeated patterns in strings is a computationally-demanding task on the large data sets available in computational biology, data mining, textual document processing, system security, and other areas; for instance, see [6]. We consider patterns with don’t cares in a given string s of n symbols drawn over an alphabet Σ. The don’t care is a special symbol ‘◦’ matching any symbol of Σ; for example, pattern T◦E matches both TTE and TEE inside s = COMMITTEE (note that a pattern cannot have a don’t care at the beginning or at the end, as this is not considered informative). Contrarily to string matching with don’t cares, the pattern T◦E is not given in advance for searching s. Instead, the patterns with don’t cares appearing in s are unknown and, as such, have to be discovered and extracted by processing s efficiently. In our example, T◦E and M◦◦T◦E are among the patterns appearing repeated in COMMITTEE. In this paper we focus ∗ ∗∗ ∗∗∗
The full version of this paper is available in [11] as technical report TR-03-02. Supported by CNRS action AlBio, NATO Sc. Prog. PST.CLG.977017, and Wellcome Trust Foundation. Supported by CNRS-INRIA-INRA-INSERM action BioInformatique and Wellcome Trust Foundation.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 622–631, 2003. c Springer-Verlag Berlin Heidelberg 2003
A Basis of Tiling Motifs for Generating Repeated Patterns
623
on finding the patterns called motifs, which appear at least q times in s for an input parameter q ≥ 2 called the quorum. Different formulations in the known literature address the problem of detecting motifs in several contexts, revealing its algorithmic relevance. Unfortunately, the complexity of the algorithms for motif discovery may easily become exponential due to the explosive growth of the motifs in strings, such as in the artificial string A · · · ATA · · · A (same number of As on both sides of T) generating many motifs with As intermixed with don’t cares, and in other “real” strings over a small alphabet occurring in practice, e.g., DNA sequences. Some heuristics try to alleviate this drawback by reducing the number of interesting motifs to make feasible any further processing of them, but they cannot guarantee sub-exponential bounds in the worst case [7]. In this paper, we explore the algorithmic ideas behind motif discovery while getting some insight into their combinatorial complexity and their connections with string algorithmics. Given a motif x for a string s of length n, we denote the set of positions on s at which the occurrences of x start by Lx ⊆ [0. .n−1], where |Lx | ≥ q holds for the given quorum q ≥ 2. We single out the maximal motifs x, informally characterized as satisfying |Lx | = |Ly | for any other motif y more specific than x, i.e., obtained from x by adding don’t cares and alphabet letters or by replacing one or more don’t cares with alphabet letters. In other words, x appears in y but x occurs in s more times than y does, which is considered informative for discovering the repetitions in s. For example, M◦◦T◦E is maximal in COMMITTEE for q = 2 while M◦◦◦◦E and T◦E are not maximal since M◦◦T◦E is more specific with the same number of occurrences. Maximality provides an intuitive notion of relevance as each maximal motif x indirectly represents all non-maximal motifs z that are less specific than it. Unfortunately, this property does not bound significantly the number of maximal motifs. For example, A · · · ATA · · · A contains an exponential number of them for q = 2 (see Section 2). A further requirement on the maximal motifs is the notion of irredundant motifs ([7]). A maximal motif x is redundant if there exist maximal motifs y1 , . . . , yk = x such that the set of occurrences of x satisfies Lx = Ly1 ∪ . . . ∪ Lyk ; it is irredundant otherwise. The set of occurrences of a redundant motif can be covered by other sets of occurrences while that of an irredundant motif is not the union of the sets of occurrences of other maximal motifs. The basis of the irredundant motifs of string s with quorum q is the set of irredundant motifs in s. Informally speaking, a basis can generate all the motifs by simple rules and can be expressed mathematically in the algebraic sense of the term. According to Parida et al. [7], what makes interesting the irredundant motifs is that their number is always upper bounded by 3n independently of any chosen q ≥ 2; moreover, they can be found in O(n3 log n) time by this bound, notwithstanding the possibly exponential number of maximal motifs that are candidates for the basis. Our results: We study the complexity of finding the basis of motifs with novel algorithms to represent all motifs succinctly. We show that, in the worst case, there is an infinite family of strings for which the basis contains Ω(n2 ) irredundant motifs for q = 2 (see Section 2). 
This contradicts the upper bound of 3n for any q ≥ 2 given in [7] (in the Appendix of [11] we give a
counterexample to its charging scheme, which crucially relies on a lemma that is not valid). As a result, the O(n³ log n)-time bound in [7] for any q does not hold either, since it relies on the upper bound of 3n, thus leaving open the problem of discovering a basis in polynomial time for any q. We also introduce a new definition, the basis of the tiling motifs of string s with quorum q. The condition for tiling motifs is stronger than that of irredundancy. A maximal motif x is tiled if there exist maximal motifs y1, …, yk ≠ x such that the set of occurrences of x satisfies L_x = (L_{y1} + d1) ∪ … ∪ (L_{yk} + dk) for some integers d1, …, dk; it is tiling otherwise. Note that the motifs y1, …, yk are not necessarily distinct, and the union of their occurrences is taken after displacing them by d1, …, dk, respectively. Since a redundant motif is also tiled, with d1 = · · · = dk = 0, a tiling motif is surely irredundant. Hence the basis of tiling motifs is included in the basis of irredundant motifs, while both of them are able to generate the same set of motifs with mechanical rules. Although the definition of tiling motifs is derived from that of irredundant ones, the difference is much more substantial than it may appear. The basis of tiling motifs is symmetric, namely, the tiling motifs of the reversed string are the reversed tiling motifs of s, whereas the irredundant motifs of a string and of its reverse are apparently unrelated, unlike the entropy and other properties related to the repetitions in strings. Moreover, the number of tiling motifs can be provably upper bounded in the worst case by n − 1 for q = 2, and they occur in s for a total of 2n times at most, whereas we demonstrate that there can be Ω(n²) irredundant motifs. We give more details in Section 3, and we also discuss in the full paper [11] how to find the longest motifs with a limited number of don't cares. Finally, in Section 4, we reveal an exponential dependency on the quorum q for the number of motifs, both for the basis of irredundant motifs and for the basis of tiling motifs, which went unnoticed in previous work. We prove that there is an infinite family of strings for which the basis contains at least $\binom{(n-1)/2-1}{q-1} = \Omega\big(\frac{1}{2^q}\binom{n-1}{q-1}\big)$ tiling (hence, irredundant) motifs. Hence, no worst-case polynomial-time algorithm can exist for finding the basis for arbitrary values of q ≥ 2. Nonetheless, we can prove that the tiling motifs in our basis are fewer than $\binom{n-1}{q-1}$ in number and occur in s a total of $q\binom{n-1}{q-1}$ times at most. For them there exists a pseudo-polynomial-time algorithm taking $O\big(q^2\binom{n-1}{q-1}^2\big)$ time, which shows that the tiling motifs can be found in polynomial time if and only if the quorum q satisfies either q = O(1) or q = n − O(1) (the latter is hardly meaningful in practice). Experimenting with small strings exhibits a non-constant growth of the basis for increasing values of q up to O(log n), but larger values of q are possible in the worst case. More experimental analysis of the implementation can be found in [11]. Proofs of all results can also be found in [11]. Related work: As previously mentioned, the seminal idea of a basis was introduced by Parida et al. [7]. The unpublished manuscript [1] adopted an identical definition of irredundant motifs in its first part. Very recently, Apostolico [4] observed that the O(n³)-time algorithm proposed in the second part of [1] contains an implicit definition different from that of the first part.
Namely, in a redundant motif x, the list L_x can be "deduced" from the union of the others
(see also [3]). Note, however, that no formal specification of this alternative definition is made explicit. Applications of the basis of repeated patterns (with just q = 2) to data compression are described in [2]. Tiling motifs can be employed in this context because of their linear number of occurrences in total. The idea of the basis was also explored by Pelfrêne et al. [8,9], who introduced the notion of primitive motifs. They gave two alternative definitions claimed to be equivalent, one definition reported in the two-page abstract accompanying the poster and the other in the poster itself. The basis defined in the poster is not symmetric and is a superset of the one presented in this paper. On the other hand, the definition of primitive motifs given in the two-page abstract is essentially equivalent to that given in this paper and introduced independently in our technical report [10]. Because of the lower bounds proved in this paper, the algorithm in [9] is exponential with respect to q. The problem of finding a polynomial-size basis for higher values of q remains unsolved.
2 Irredundant Motifs: The Basis and Its Size for q = 2
We consider strings that are finite sequences of letters drawn from an alphabet Σ, whose elements are also called solid characters. We introduce an additional letter (denoted by ◦ and called don't care) that does not belong to Σ and matches any letter. The length of a string t with don't cares, denoted by |t|, is the number of letters in t, and t[i] indicates the letter at position i in t for 0 ≤ i ≤ |t| − 1 (hence, t = t[0]t[1] · · · t[|t| − 1], also noted t[0..|t| − 1]). A pattern is a string in Σ ∪ Σ(Σ ∪ {◦})∗Σ, that is, it starts and ends with a solid character. The pattern occurrences are related to the specificity relation ⪯. For individual characters σ1, σ2 ∈ Σ ∪ {◦}, we have σ1 ⪯ σ2 if σ1 = ◦ or σ1 = σ2. Relation ⪯ extends to strings in (Σ ∪ {◦})∗ under the convention that each string t is implicitly surrounded by don't cares, namely, letter t[j] is ◦ when j < 0 or j ≥ |t|. In this way, v is more specific than u (shortly, u ⪯ v) if u[j] ⪯ v[j] for any integer j. We also say that u occurs at position ℓ in v if u[j] ⪯ v[ℓ + j], for 0 ≤ j ≤ |u| − 1. Equivalently, we say that u matches v[ℓ] · · · v[ℓ + |u| − 1]. For the input string s ∈ Σ∗ with n = |s|, we consider the occurrences of arbitrary patterns x in s. The location list L_x ⊆ [0..n − 1] denotes the set of all the positions on s at which x occurs. For example, the location list of x = T◦E in s = COMMITTEE is L_x = {5, 6}. Definition 1 (Motif). Given a parameter q ≥ 2 called quorum, we say that pattern x is a motif according to s and q if |L_x| ≥ q. Given any location list L_x and any integer d, we adopt the notation L_x + d = {ℓ + d | ℓ ∈ L_x} for indicating the occurrences in L_x "displaced" by the offset d. Definition 2 (Maximality). A motif x is maximal if any other motif y such that x occurs in y satisfies L_y ≠ L_x + d for every integer d. Making a maximal motif x more specific (thus obtaining y) reduces the number of its occurrences in s. Definition 2 is equivalent to that in [7] stating that x is
maximal if there exist no other motif y and no integer d ≥ 0 verifying L_x = L_y + d such that x[j] ⪯ y[j + d] for 0 ≤ j ≤ |x| − 1. Definition 3 (Irredundant Motif). A maximal motif x is irredundant if, for any maximal motifs y1, y2, …, yk such that L_x = ∪_{i=1}^{k} L_{y_i}, motif x must be one of the y_i's. Vice versa, if all the y_i's are different from x, pattern x is said to be covered by motifs y1, y2, …, yk. The basis of irredundant motifs for string s is the set of all irredundant motifs in s, useful as a generator for all maximal motifs in s (see [7]). The size of the basis is the number of irredundant motifs contained in it. We now show the existence of an infinite family of strings s_k (k ≥ 5) for which there are Ω(n²) irredundant motifs in the basis already for quorum q = 2, where n = |s_k|. In this way, we disprove the upper bound of 3n, which is based on an incorrect lemma (see also [11]). Each string s_k is a suitable extension of t_k = A^k T A^k, where A^k denotes the letter A repeated k times (our argument works also for z^k w z^k, where |z| = |w| and z is a string not sharing any common character with w). String t_k has an exponential number of maximal motifs, including those having the form A{A, ◦}^{k−2}A with exactly two don't cares. To see why, each such motif x occurs four times in t_k: specifically, two occurrences of x match the first and the last k letters in t_k, while each distinct don't care in x matching the letter T in t_k contributes one of the two remaining occurrences. Extending x or replacing a don't care with a solid character reduces the number of these occurrences, so x is maximal. The idea of our proof is to obtain strings s_k by prefixing t_k with O(|t_k|) symbols to transform the above maximal motifs x into irredundant motifs for s_k. Since there are Θ(k²) of them, and n = |s_k| = O(|t_k|) = O(k), this leads to the result. In order to define s_k on the alphabet {A, T, u, v, w, x, y, z, a1, a2, …, a_{k−2}}, we introduce a few notations. Let $\overline{u}$ denote the reversal of u (the bar denotes reversal throughout), and let ev_k, od_k, u_k, v_k be defined as follows:
if k is even: ev_k = a_2 a_4 · · · a_{k−2}, od_k = a_1 a_3 · · · a_{k−3}, u_k = ev_k u $\overline{ev_k}$ v w ev_k, v_k = od_k x y $\overline{od_k}$ z od_k;
if k is odd: ev_k = a_2 a_4 · · · a_{k−3}, od_k = a_1 a_3 · · · a_{k−2}, u_k = ev_k u v $\overline{ev_k}$ w x ev_k, v_k = od_k y $\overline{od_k}$ z od_k.
The strings s_k are then defined by s_k = u_k v_k t_k for k ≥ 5. Lemma 1. The length of u_k v_k is 3k, and that of s_k is n = 5k + 1. Proposition 1. For 1 ≤ p ≤ k − 2, any motif of the form A^p ◦ A^{k−p−1} with one don't care cannot be maximal in s_k. Also motif A^k cannot be maximal in s_k. Proposition 2. Each motif of the form A{A, ◦}^{k−2}A with exactly two don't cares is irredundant in s_k. Theorem 1. The basis for string s_k contains Ω(n²) irredundant motifs, where n = |s_k| and k ≥ 5.
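Since the combinatorics above are easy to explore by machine, here is a small sketch (ours, not from the paper) of patterns with don't cares and their location lists; we write the don't care ◦ as a dot, and the function names are illustrative.

```python
def location_list(x, s):
    """L_x: positions of s at which pattern x occurs ('.' matches anything)."""
    return [i for i in range(len(s) - len(x) + 1)
            if all(a == '.' or a == b for a, b in zip(x, s[i:]))]

def is_motif(x, s, q=2):
    """Definition 1: x is a motif for s and quorum q if |L_x| >= q."""
    return len(location_list(x, s)) >= q

# The example of this section: L_{T◦E} = {5, 6} in COMMITTEE.
assert location_list("T.E", "COMMITTEE") == [5, 6]
# In t_k = A^k T A^k, a motif A{A,◦}^{k-2}A with two don't cares occurs 4 times:
t = "A" * 6 + "T" + "A" * 6           # t_k with k = 6
assert len(location_list("A.A.AA", t)) == 4
```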
3 Tiling Motifs: The Basis and Its Properties
In this section we introduce a natural notion of basis for generating all maximal motifs occurring in a string s of length n. Analogously to what was done for maximal motifs in Definition 2, we introduce displacements while defining tiling motifs for this purpose. Definition 4 (Tiling Motif). A maximal motif x is tiling if, for any maximal motifs y1, y2, …, yk and for any integers d1, d2, …, dk such that L_x = ∪_{i=1}^{k} (L_{y_i} + d_i), motif x must be one of the y_i's. Vice versa, if all the y_i's are different from x, pattern x is said to be tiled by motifs y1, y2, …, yk. The notion of tiling is more selective than that of irredundancy in general. For example, in the string s = FABCXFADCYZEADCEADC, motif x1 = A◦C is irredundant but it is tiled by x2 = FA◦C and x3 = ADC according to Definition 4, since its location list, L_{x1} = {1, 6, 12, 16}, can be obtained from the union of L_{x2} = {0, 5} and L_{x3} = {6, 12, 16} with respective displacements d2 = 1 and d3 = 0. A fairly direct consequence of Definition 4 is that if x is tiled by y1, y2, …, yk with associated displacements d1, d2, …, dk, then x occurs at position d_i in each y_i for 1 ≤ i ≤ k (hence d_i ≥ 0). Note that the y_i's in Definition 4 are not necessarily distinct and that k > 1 for tiled motifs (this follows from the fact that L_x = L_{y1} + d1 with x ≠ y1 would contradict the maximality of both x and y1). As a result, a maximal motif x occurring exactly q times in s is tiling, as it cannot be tiled by any other motifs (we would need at least two of them, which is impossible). The basis of tiling motifs is the complete set of all tiling motifs for s, and the size of the basis is the number of these motifs. For example, the basis B for FABCXFADCYZEADCEADC contains FA◦C, EADC, and ADC as tiling motifs. Although Definition 4 is derived from that of irredundant motifs given in Definition 3, the difference is much more substantial than it may appear. The basis of tiling motifs relies on the fact that tiling motifs are considered invariant by displacement, as for maximality. Consequently, our definition of basis is symmetric, that is, each tiling motif in the basis for the reverse string of s is the reverse of a tiling motif in the basis of s. This follows from the symmetry in Definition 4 and from the fact that maximality is also symmetric in Definition 2. It is a sine qua non condition for having a notion of basis invariant under the left-to-right or right-to-left order of the symbols in s (like the entropy of s), while this property does not hold for the irredundant motifs. The basis of tiling motifs has further interesting properties. Later in this section, we show that our basis is linear for quorum q = 2 (i.e., its size is at most n − 1) and that the total size of the location lists for the tiling motifs is less than 2n, describing how to find the basis in O(n² log n log |Σ|) time. In the full paper [11], we discuss some applications such as generating all maximal motifs with the basis and finding motifs with a constraint on the number of don't cares. Given a string s of length n, let B denote its basis of tiling motifs for quorum q = 2. Although the number of maximal motifs may be exponential and the basis of irredundant motifs may be at least quadratic (see Section 2), we show that the size of B is always less than n. For this, we introduce an operator ⊕ between the symbols of Σ to define merges, which are at the heart of
the properties of B. Given two letters σ1, σ2 ∈ Σ with σ1 ≠ σ2, the operator satisfies σ1 ⊕ σ2 = ◦ and σ1 ⊕ σ1 = σ1. The operator applies to any pair of strings x, y ∈ Σ∗, so that u = x ⊕ y satisfies u[j] = x[j] ⊕ y[j] for all integers j. A merge is the motif resulting from applying the operator ⊕ to s and to its suffix at position k. Definition 5 (Merge). For 1 ≤ k ≤ n − 1, let s_k be the string whose character at position i is s_k[i] = s[i] ⊕ s[i + k]. If s_k contains at least one solid character, Merge_k denotes the motif obtained by removing all the leading and trailing don't cares in s_k (i.e., those appearing before the leftmost solid character and after the rightmost solid character). For example, the string FABCXFADCYZEADCEADC has Merge_4 = EADC, Merge_5 = FA◦C, Merge_6 = Merge_10 = ADC and Merge_11 = Merge_15 = A◦C. The latter is the only merge that is not a tiling motif. Lemma 2. If Merge_k exists, it must be a maximal motif. Lemma 3. For each tiling motif x in the basis B, there is at least one k for which Merge_k = x. Theorem 2. Given a string s of length n and the quorum q = 2, let M be the set of Merge_k, for 1 ≤ k ≤ n − 1 such that Merge_k exists. The basis B of tiling motifs for s satisfies B ⊆ M, and therefore the size of B is at most n − 1. A simple consequence of Theorem 2 is a tight bound on the number of tiling motifs for periodic strings: if s = w^e for a string w repeated e > 1 times, then s has at most |w| tiling motifs. Corollary 1. The number of tiling motifs for s is at most p, the smallest period of s. The bound in Corollary 1 is not valid for irredundant motifs. For example, string s = ATATATATA has period p = 2 and only one tiling motif ATATATA, while its irredundant motifs are A, ATA, ATATA and ATATATA. We now describe how to compute the basis B for string s when q = 2. A brute-force algorithm generating first all maximal motifs of s takes exponential time in the worst case. Theorem 2 plays a crucial role in that we first compute the motifs in M and then discard those being tiled. Since B ⊆ M, what remains is exactly B. To appreciate this approach, it is worth noting that we are left with the problem of selecting B from at most n − 1 maximal motifs in M, rather than selecting B among all the maximal motifs in s, which may be exponential in number. Our simple algorithm takes O(n² log n log |Σ|) time and is faster than previous (and more complicated) methods. Step 1. Compute the Multiset M̃ of Merges. Letting s_k[i] be the leftmost solid character of string s_k in Definition 5, we define occ_x = {i, i + k} to be the positions of the two occurrences of x whose superposition generates x = Merge_k. For k = 1, 2, …, n − 1, we compute string s_k in O(n − k) time. If s_k contains some
solid characters, we compute x = Merge_k and occ_x in the same time complexity. As a result, we compute the multiset M̃ of merges in O(n²) time. Each merge x in M̃ is identified by a triplet ⟨i, i + k, |x|⟩, from which we can recover the jth symbol of x in constant time by simple arithmetic operations and comparisons. Step 2. Transform the Multiset M̃ into the Set M of Merges. Since there can be two or more merges in M̃ that are identical and correspond to the same merge in M, we put together all identical merges in M̃ by performing radix sorting on the triplets representing them. The total cost of this step is dominated by radix sorting, giving O(n² log |Σ|) time. As a byproduct, we produce the temporary location list T_x = ∪_{x′ ∈ M̃ : x′ = x} occ_{x′} for each distinct x ∈ M thus obtained. Lemma 4. Each motif x ∈ B satisfies T_x = L_x. Step 3. Select M∗ ⊆ M, where M∗ = {x ∈ M : T_x = L_x}. In order to build M∗, we employ the Fischer–Paterson algorithm based on convolution [5] for string matching with don't cares to compute the whole list of occurrences L_x for each merge x ∈ M. Its cost is O((|x| + n) log n log |Σ|) time for each merge x. Since |x| < n and there are at most n − 1 motifs x ∈ M, we obtain O(n² log n log |Σ|) time to construct all lists L_x. We can then compute M∗ by discarding the merges x ∈ M such that T_x ≠ L_x in additional O(n²) time. Lemma 5. The set M∗ satisfies the conditions B ⊆ M∗ and Σ_{x ∈ M∗} |L_x| < 2n. The property of M∗ in Lemma 5 is crucial in that Σ_{x ∈ M} |L_x| = Θ(n²) when many lists contain Θ(n) entries. For example, s = A^n has n − 1 distinct merges, each of the form x = A^i for 1 ≤ i ≤ n − 1, and so |L_x| = n − i + 1. This would be a sharp drawback in Step 4 when removing tiled motifs, as it may turn into a Θ(n³) algorithm. Using M∗ instead, we are guaranteed that Σ_{x ∈ M∗} |L_x| = O(n); we may still have some tiled motifs in M∗, but their total number of occurrences is O(n). Step 4. Discard the Tiled Motifs in M∗. We can now check for tiling motifs in O(n²) time. Given two distinct motifs x, y ∈ M∗, we want to test whether L_x + d ⊆ L_y for some integer d and, in that case, we want to mark the entries in L_y that are also in L_x + d. At the end of this task, the lists having all entries marked are tiled (see Definition 4). By removing their corresponding motifs from M∗, we eventually obtain the basis B by Lemma 5. Since the meaningful values of d are equal to the individual entries of L_y, we have only |L_y| possible values to check. For a given value of d, we avoid merging L_x and L_y in O(|L_x| + |L_y|) time to perform the test, as it would contribute a total of Θ(n³) time. Instead, we exploit the fact that each list has values ranging from 1 to n, and use a couple of bit-vectors of size n to perform the above check in O(|L_x| × |L_y|) time for all values of d. This gives O(Σ_y Σ_x |L_x| × |L_y|) = O(Σ_y |L_y| × Σ_x |L_x|) = O(n²) by Lemma 5. We now detail how to perform the above check with L_x and L_y in O(|L_x| × |L_y|) time. We use two bit-vectors V1 and V2, initially set to all zeros. Given y ∈ M∗, we set V1[i] = 1 if i ∈ L_y. For each x ∈ M∗ − {y} and
for each d ∈ L_y, we then perform the following test: if all j ∈ L_x + d satisfy V1[j] = 1, we set V2[j] = 1 for all such j; otherwise, we take the next value of d, or the next motif if there are no more values of d, and we repeat the test. After examining all x ∈ M∗ − {y}, we check whether V1[i] = V2[i] for all i ∈ L_y. If so, y is tiled, as its list is covered by possibly shifted location lists of other motifs. We then reset the ones in both vectors in O(|L_y|) time. Summing up Steps 1–4, the dominant cost is that of Step 3, leading to the following result. Theorem 3. Given an input string s of length n over the alphabet Σ, the basis of tiling motifs with quorum q = 2 can be computed in O(n² log n log |Σ|) time. The total number of motifs in the basis is less than n, and the total number of their occurrences in s is less than 2n.
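Steps 1–4 condense into the following naive reference implementation for q = 2 (ours; it trades the O(n² log n log |Σ|) bound for clarity by computing location lists directly and testing displaced containment as in Step 4):

```python
def location_list(x, s):
    """Positions of s at which pattern x occurs ('.' is the don't care)."""
    return [i for i in range(len(s) - len(x) + 1)
            if all(a == '.' or a == b for a, b in zip(x, s[i:]))]

def merge_k(s, k):
    """Definition 5: s_k[i] = s[i] (+) s[i+k], trimmed of outer don't cares."""
    m = ''.join(a if a == b else '.' for a, b in zip(s, s[k:])).strip('.')
    return m or None

def tiling_basis(s):
    """Basis of tiling motifs for quorum q = 2, via Theorem 2 and Step 4."""
    merges = {m for k in range(1, len(s)) if (m := merge_k(s, k))}
    lists = {x: set(location_list(x, s)) for x in merges}
    basis = []
    for y, Ly in lists.items():
        covered = set()
        for x, Lx in lists.items():
            if x == y:
                continue
            # only displacements aligning min(Lx) with an entry of Ly matter
            for d in {l - min(Lx) for l in Ly}:
                shifted = {l + d for l in Lx}
                if shifted <= Ly:
                    covered |= shifted
        if covered != Ly:          # not fully covered, hence tiling
            basis.append(y)
    return basis

s = "FABCXFADCYZEADCEADC"
assert merge_k(s, 4) == "EADC" and merge_k(s, 5) == "FA.C"
assert sorted(tiling_basis(s)) == ["ADC", "EADC", "FA.C"]   # A.C is tiled
```

Trying only displacements that align min(L_x) with an entry of L_y is sound because any containment L_x + d ⊆ L_y must map the smallest element of L_x into L_y.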
4 q > 2: Pseudo-Polynomial Bases for Higher Quorum
We now discuss the general case of quorum q ≥ 2 for finding the basis of a string of length n. Differently from previous work claiming a polynomial-time algorithm for any arbitrary value of q, we show in this section that no such polynomial-time algorithm can exist in the worst case, both for the basis of irredundant motifs and for the basis of tiling motifs. The size of these bases provably depends exponentially on suitable values of q ≥ 2, i.e., we give a lower bound of $\Omega\big(\binom{(n-1)/2-1}{q-1}\big)$. In practice, this size has an exponential growth for increasing values of q up to O(log n), but larger values of q are theoretically possible in the worst case. Fixing q = (n − 1)/4 + 1 in our lower bound, we get a size of Ω(2^{(n−1)/4}) motifs in the bases. On the average, q = O(log_{|Σ|} n) by extending the argument after Theorem 3. We also show a further property for the basis of tiling motifs, giving an upper bound of $\binom{n-1}{q-1}$ on its size with a simple proof. Since we can find an algorithm taking time proportional to the square of that size, we can conclude that a polynomial-time algorithm for finding the basis of tiling motifs exists in the worst case if and only if the quorum q satisfies either q = O(1) or q = n − O(1) (the latter condition is hardly meaningful in practice). We now show the existence of a family of strings for which there are at least $\binom{(n-1)/2-1}{q-1}$ tiling motifs for a quorum q. Since a tiling motif is also irredundant, this gives a lower bound for the irredundant motifs, to be combined with that in Section 2 (the latter lower bound still gives Ω(n²) for q ≥ 2). The strings used in the bound are this time t_k = A^k T A^k (k ≥ 5) themselves, without the left extension of Section 2. The proof proceeds by exhibiting $\binom{k-1}{q-1}$ motifs that are maximal and each have exactly q occurrences, whence it follows immediately that they are tiling (indeed the remark made after Definition 4 holds for any q ≥ 2). Proposition 3. For 2 ≤ q ≤ k and 1 ≤ p ≤ k − q + 1, any motif of the type A^p ◦ {A, ◦}^{k−p−1} ◦ A^p with exactly q don't cares is tiling (and so irredundant) in t_k. Theorem 4. String t_k has $\binom{(n-1)/2-1}{q-1} = \Omega\big(\frac{1}{2^q}\binom{n-1}{q-1}\big)$ tiling (and irredundant) motifs, where n = |t_k| and k ≥ 2. We now prove that $\binom{n-1}{q-1}$ is, instead, an upper bound for the size of a basis of tiling motifs for a string s and quorum q ≥ 2. Let us denote as before such
a basis by B. To prove the upper bound, we use again the notion of a merge, except that it now involves q strings. The operator ⊕ between the elements of Σ is the same as before. Let k be an array of q − 1 positive values k1, …, k_{q−1} with 1 ≤ k_i < k_j ≤ n − 1 for all 1 ≤ i < j ≤ q − 1. A merge is the (non-empty) pattern that results from applying the operator ⊕ to the string s and to s itself q − 1 times, each time shifted by k_i positions to the right for 1 ≤ i ≤ q − 1. Lemma 6. If Merge_k exists for quorum q, it must be a maximal motif. Lemma 7. For each tiling motif x in the basis B with quorum q, there is at least one k for which Merge_k = x. Theorem 5. Given a string s of length n and a quorum q, let M be the set of Merge_k, for any of the $\binom{n-1}{q-1}$ possible choices of k for which Merge_k exists. The basis B of tiling motifs satisfies B ⊆ M, and therefore |B| ≤ $\binom{n-1}{q-1}$. The tiling motifs in our basis appear in s for a total of $q\binom{n-1}{q-1}$ times at most. A generalization of the algorithm given in Section 3 yields a pseudo-polynomial time complexity of $O\big(q^2\binom{n-1}{q-1}^2\big)$.
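The q-ary merge admits the same direct reading as Definition 5; in the sketch below (ours, for illustration), one candidate motif arises per (q−1)-subset of shifts, matching the $\binom{n-1}{q-1}$ count of Theorem 5.

```python
from itertools import combinations

def merge_tuple(s, shifts):
    """Merge of s with itself shifted by k_1 < ... < k_{q-1} positions."""
    width = len(s) - max(shifts)
    def glue(i):
        letters = {s[i]} | {s[i + k] for k in shifts}
        return s[i] if len(letters) == 1 else '.'
    m = ''.join(glue(i) for i in range(width)).strip('.') if width > 0 else ''
    return m or None

def candidate_basis(s, q):
    """A superset of the basis B, of size at most C(n-1, q-1) (Theorem 5)."""
    return {m for ks in combinations(range(1, len(s)), q - 1)
              if (m := merge_tuple(s, ks))}

t = "A" * 5 + "T" + "A" * 5               # the lower-bound string t_k, k = 5
print(sorted(candidate_basis(t, 3))[:5])  # candidate motifs for quorum q = 3
```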
References
1. A. Apostolico and L. Parida. Incremental paradigms of motif discovery. Unpublished manuscript, 2002.
2. A. Apostolico and L. Parida. Compression and the wheel of fortune. In IEEE Data Compression Conference (DCC 2003), pages 143–152, 2003.
3. A. Apostolico. Pattern discovery and the algorithmics of surprise. In NATO ASI on Artificial Intelligence and Heuristic Methods for Bioinformatics. IOS Press, 2003.
4. A. Apostolico. Personal communication, May 2003.
5. M. Fischer and M. Paterson. String matching and other products. In R. Karp, editor, SIAM AMS Complexity of Computation, pages 113–125, 1974.
6. H. Mannila. Local and global methods in data mining: basic techniques and open problems. In P. et al., editor, International Colloquium on Automata, Languages, and Programming, volume 2380 of LNCS, pages 57–68. Springer-Verlag, 2002.
7. L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern Discovery on Character Sets and Real-valued Data: Linear Bound on Irredundant Motifs and Efficient Polynomial Time Algorithm. In SIAM Symposium on Discrete Algorithms, 2000.
8. J. Pelfrêne, S. Abdeddaïm, and J. Alexandre. Un algorithme d'indexation de motifs approchés. In Journées Ouvertes Biologie Informatique Mathématiques (JOBIM), pages 263–264, 2002.
9. J. Pelfrêne, S. Abdeddaïm, and J. Alexandre. Extracting approximate patterns. In Combinatorial Pattern Matching, 2003. To appear.
10. N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot. A basis for repeated motifs in pattern discovery and text mining. Technical Report IGM 2002-10, Institut Gaspard-Monge, University of Marne-la-Vallée, July 2002.
11. N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot. Bases of motifs for generating repeated patterns with don't cares. Technical Report TR-03-02, Dipartimento di Informatica, University of Pisa, January 2003.
On the Complexity of Some Equivalence Problems for Propositional Calculi

Steffen Reith
3Soft GmbH, Frauenweiherstr. 14, D-91058 Erlangen, Germany
[email protected]
Abstract. In the present paper¹ we study the complexity of Boolean equivalence problems (i.e., do two given propositional formulas have the same truth table?) and of Boolean isomorphism problems (i.e., does there exist a permutation of the variables of one propositional formula such that the truth table of this modified formula coincides with the truth table of the second formula?) for generalized propositional formulas and certain classes of Boolean circuits. Keywords: Computational complexity, Boolean functions, Boolean isomorphism, Boolean equivalence, closed classes, dichotomy, Post, satisfiability problems.
1 Introduction
In 1921 E. L. Post gave a full characterization of all classes of Boolean functions that are closed under superposition (i.e., substitution of Boolean functions, permutation and identification of variables, and introduction of fictive variables). Based on his results (see [9]) we define, for a finite set B of Boolean functions, the so-called B-formulas and B-circuits, which are closely related to Post's closed classes of Boolean functions. To be more precise: every B-formula and B-circuit represents a Boolean function in the closure of B; hence B-formulas form generalized propositional calculi, since the classical formulas and circuits are mostly restricted to B = {∧, ∨, ¬}. The satisfiability problem of B-formulas was first studied by H. Lewis. In his paper [8] he showed that the satisfiability problem of B-formulas is NP-complete iff the Boolean function represented by x ∧ ¬y is in the closure of B, and solvable in deterministic polynomial time otherwise. Theorems of this form are called dichotomy theorems, because they deal with problems which are either among the hardest in a given complexity class or easy to solve. One of the best known and the first theorem of this kind was proven by Schaefer (see [12]), giving exhaustive results about the satisfiability of generalized propositional formulas in conjunctive normal form. The work [2] can be seen as
¹ Work done in part while employed at Julius-Maximilians-Universität Würzburg. For a full version of this paper see: http://www.streit.cc/dl/
a counterpart of the present paper in Schaefer's framework, because there the same equivalence problems are studied for formulas in generalized conjunctive normal form. Besides asking for a satisfying assignment, other interesting problems in the theory of Boolean functions have been studied. Two of them are the properties of being equal or isomorphic (i.e., is there a permutation of the variables of one function such that the modified function is equal to the other function?). A list of early references can be found in [3], stressing the importance of this kind of problem. In the case of classical propositional formulas it is known that the Boolean equivalence-problem for formulas and circuits is coNP-complete, whereas only very weak lower and upper bounds for the Boolean isomorphism-problem are known. By a reduction from the tautology problem, which is coNP-complete, a lower bound for the isomorphism-problem can be easily derived. An upper bound for the isomorphism-problem is clearly Σ^p_2: a Σ^p_2-machine can existentially guess a permutation and universally check the resulting formula for equality by using its oracle. In [1] Agrawal and Thierauf show that the complement of the Boolean isomorphism-problem for formulas and circuits has a one-round interactive proof, where the verifier has access to an NP oracle. Using this result they also show that if the Boolean isomorphism-problem is Σ^p_2-complete, then the polynomial hierarchy collapses to Σ^p_3. Agrawal and Thierauf also give in their paper a better lower bound for the isomorphism-problem of classical propositional formulas. More precisely, they have proven that UOCLIQUE ≤^p_m ISOF holds, where ISOF is the isomorphism-problem of propositional formulas built out of ∧, ∨ and ¬, and UOCLIQUE is the problem of checking whether the biggest clique in a graph is unique. It is known that UOCLIQUE is ≤^p_m-hard for 1-NP, a superclass of coNP, where 1-NP denotes the class of all problems whose solution can be found on exactly one path in nondeterministic polynomial time. In the present paper we focus on the complexity of checking whether two given B-formulas (B-circuits, resp.) represent the same Boolean function (the equivalence-problem for B-formulas (B-circuits, resp.)) or whether they represent isomorphic Boolean functions (the isomorphism-problem for B-formulas (B-circuits, resp.)). We give, where possible, tight upper and lower bounds for the isomorphism-problem of B-formulas and B-circuits. In all other cases we show coNP-hardness for the isomorphism-problem, which is as good as the trivial lower bound in the classical case. Note that the known upper bounds for the usual isomorphism-problem hold for our B-formulas and B-circuits as well, since we work with special non-complete sets of Boolean functions as connectors. In the case of equivalence-problems we always give tight upper and lower bounds, showing that these problems are either in L, NL-complete, ⊕L-complete or coNP-complete, where the complexity class L (NL, resp.) is defined as the class of problems which can be solved by a deterministic (nondeterministic, resp.) logarithmically space-bounded Turing machine, and
by ⊕L we denote the class of decision problems solvable by an NL machine that accepts its input iff the number of accepting paths is odd. After presenting some notions and preliminary results for closed classes of Boolean functions in Section 2, we turn to the equivalence- and isomorphism-problems for B-circuits and B-formulas in Section 3. Finally, Section 4 concludes.
2 Preliminaries
Any function of the kind f : {0, 1}^k → {0, 1} will be called a (k-ary) Boolean function. The set of all Boolean functions will be denoted by BF. Now let B be a finite set of Boolean functions. In the following we give a description of B-circuits and B-formulas. A B-circuit is a directed acyclic graph where each node is labeled either with a variable x_i or with a function out of B. The nodes of such a B-circuit are called gates, the edges are called wires. The number of wires pointing into a gate is called fan-in and the number of wires leaving a gate is called fan-out. Moreover, we order the wires pointing to a gate. If a wire leaves a gate u and points to a gate v, we call u a predecessor-gate of v. Additionally, the gates labeled by a variable x_i must have fan-in 0 and we call them input-gates. The gates labeled by a k-ary function f^k ∈ B must have fan-in k. Finally, we mark one particular gate o and call this gate the output-gate. Since a Boolean formula can be interpreted as a tree-like circuit, it is reasonable to define B-formulas as the subset of B-circuits C such that each gate of C has fan-out at most 1. Each B-circuit C(x1, . . . , xn) computes a Boolean function f_C^n(x1, . . . , xn). Given an n-bit input string a = a1 . . . an, every gate in C computes a Boolean value as follows: the input-gate x_i computes a_i for 1 ≤ i ≤ n, and each non-input-gate v computes the value g(b1, . . . , bm), where g^m ∈ B and b1, . . . , bm are the values computed by the predecessor-gates of v, ordered according to the order of the wires pointing to v. The value f_C^n(a1, . . . , an) computed by C is defined as the value computed by the output-gate o. Let V = {x1, . . . , xn} be a finite set of Boolean variables. An assignment w.r.t. V is a function I : {x1, . . . , xn} → {0, 1}. If V is clear from the context we simply say assignment. If there is an obvious ordering of the variables, we also use (a1, . . . , an) instead of {I(x1) := a1, . . . , I(xn) := an}. Let {x_{i1}, . . . , x_{im}} ⊆ V. By I′ = I/{x_{i1}, . . . , x_{im}} we denote the restricted assignment w.r.t. {x_{i1}, . . . , x_{im}}, which is defined by I′(x) = I(x) iff x ∈ {x_{i1}, . . . , x_{im}}. In order for an assignment I w.r.t. V to be compatible with a B-circuit C (B-formula H, resp.) we must have V = Var(C) (V = Var(H), resp.). An assignment I satisfies a circuit C(x1, . . . , xn) (formula H(x1, . . . , xn), resp.) iff f_C(I(x1), . . . , I(xn)) = 1 (f_H(I(x1), . . . , I(xn)) = 1, resp.). For an assignment I which satisfies C (H, resp.) we write I |= C (I |= H, resp.). The number of satisfying assignments (non-satisfying assignments, resp.) of C is denoted by #1(C) =def |{I | I |= C}| (#0(C) =def |{I | I ⊭ C}|, resp.). A variable x_i is called fictive iff f(a1, . . . , a_{i−1}, 0, a_{i+1}, . . . , an) = f(a1, . . . , a_{i−1}, 1, a_{i+1}, . . . , an) for all a1, . . . , a_{i−1}, a_{i+1}, . . . , an.
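The evaluation semantics just described is easy to make concrete; the sketch below (our encoding, since the paper fixes no representation) stores the DAG in a dictionary and memoizes gate values, so gates with fan-out greater than one are computed once.

```python
from typing import Callable, Dict, List, Tuple, Union

# A gate is either an input variable name, or a pair (f, predecessors)
# with f a function from B and predecessors the ordered input wires.
Gate = Union[str, Tuple[Callable[..., int], List[str]]]

def eval_circuit(gates: Dict[str, Gate], output: str,
                 assignment: Dict[str, int]) -> int:
    """Value computed by the output-gate of a B-circuit on an assignment."""
    memo: Dict[str, int] = {}
    def val(g: str) -> int:
        if g not in memo:
            node = gates[g]
            if isinstance(node, str):       # input-gate x_i: fan-in 0
                memo[g] = assignment[node]
            else:                           # gate labeled f^m in B
                f, preds = node
                memo[g] = f(*(val(p) for p in preds))
        return memo[g]
    return val(output)

# Example over B = {and}: a circuit computing x1 and x2 and x3.
and2 = lambda a, b: a & b
C = {"g1": "x1", "g2": "x2", "g3": "x3",
     "g4": (and2, ["g1", "g2"]), "o": (and2, ["g4", "g3"])}
assert eval_circuit(C, "o", {"x1": 1, "x2": 1, "x3": 1}) == 1
assert eval_circuit(C, "o", {"x1": 1, "x2": 0, "x3": 1}) == 0
```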
For simplicity we often use a formula instead of a Boolean function. For example, the functions id(x), and(x, y), or(x, y), not(x), xor(x, y) are represented by the formulas x, x ∧ y, x ∨ y, ¬x and x ⊕ y. Sometimes x̄ is used instead of ¬x. We will use 0 and 1 for the constant 0-ary Boolean functions. Finally, keep in mind that the term gate-type is replaced by function-symbol when we work with B-formulas. Now we identify the class of Boolean functions which can be computed by a B-circuit (B-formula, resp.). For this let B be a set of Boolean functions. By [B] we denote the smallest set of Boolean functions which contains B ∪ {id} and is closed under superposition, i.e., under substitution (composition of functions), permutation and identification of variables, and introduction of fictive variables. We call a set F of Boolean functions a base for B if [F] = B, and F is called closed if [F] = F. A base B is called complete if [B] = BF, where BF is the set of all Boolean functions. For an n-ary Boolean function f, its dual function dual(f) is defined by dual(f)(x1, . . . , xn) =def ¬f(¬x1, . . . , ¬xn). Let B be a set of Boolean functions. We define dual(B) =def {dual(f) | f ∈ B}. Clearly dual(dual(B)) = B. Furthermore, we define dual(H) (dual(C), resp.) to be the dual(B)-formula (dual(B)-circuit, resp.) that emerges when we replace all function-symbols in H (gate-types in C, resp.) by the symbol of their dual function (by the gate-type of their dual function, resp.). Clearly f_dual(H) = dual(f_H) (f_dual(C) = dual(f_C), resp.). Emil Post gave in [9] a complete list of all classes of Boolean functions being closed under superposition. Moreover, he showed that each closed class has a finite base. The following proposition gives bases for the closed classes which play a role in this paper. Proposition 1 ([9,7,10]). Every closed class of Boolean functions has a finite base. In particular:

BF: {and, or, not}
M: {and, or, 0, 1}
D: {(x ∧ ¬y) ∨ (x ∧ ¬z) ∨ (¬y ∧ ¬z)}
D2: {(x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z)}
L: {xor, 1}
L2: {x ⊕ y ⊕ z}
N: {not, 1}
E: {and, 0, 1}
E2: {and}
V: {or, 0, 1}
V2: {or}
S00: {x ∨ (y ∧ z)}
S10: {x ∧ (y ∨ z)}
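Although [B] is infinite as a set of functions, its k-ary part is finite and computable: it is the closure of the k projections under composition with functions from B (projections account for permutation, identification, and fictive variables). The sketch below (ours) represents k-ary functions by their truth tables; it assumes every function in B has arity at least 1 (0-ary constants could be accommodated by seeding `tables` with constant tables).

```python
from itertools import product

def kary_clone(B, k):
    """All k-ary members of [B], as truth tables over the 2^k ordered inputs."""
    inputs = list(product((0, 1), repeat=k))
    tables = {tuple(a[i] for a in inputs) for i in range(k)}  # the k projections
    changed = True
    while changed:
        changed = False
        for f, arity in B:
            for args in product(sorted(tables), repeat=arity):
                t = tuple(f(*vals) for vals in zip(*args))
                if t not in tables:
                    tables.add(t)
                    changed = True
    return tables

# The binary part of [{and}] = E2 contains x, y and x AND y, but no disjunction.
clone = kary_clone([(lambda a, b: a & b, 2)], 2)
assert (0, 0, 0, 1) in clone        # truth table of and on inputs 00,01,10,11
assert (0, 1, 1, 1) not in clone    # or is not generated from and alone
```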
Now we need some definitions. Definition 2. Let C(x1, . . . , xn) be a B-circuit and π : {1, . . . , n} → {1, . . . , n} be a permutation. By π(C(x1, . . . , xn)) we denote the B-circuit which emerges when we replace the variables x_i for 1 ≤ i ≤ n by x_{π(i)} in C(x1, . . . , xn) simultaneously. Next we define two equivalence relations for B-circuits:
Definition 3. Let C1 and C2 be B-circuits. The Boolean equivalence and isomorphism relations for B-circuits are defined as follows: C1 ≡ C2 ⇔def f_{C1}(a1, . . . , an) = f_{C2}(a1, . . . , an) for all a1, . . . , an ∈ {0, 1}, and C1 ≅ C2 ⇔def there exists a permutation π : {1, . . . , n} → {1, . . . , n} such that π(C1) ≡ C2. These definitions are used analogously for B-formulas. Note that in both cases the sets of input variables of C1 and C2 are equal; this can easily be achieved by adding fictive variables when needed. Moreover, note that C1 ≡ C2 iff dual(C1) ≡ dual(C2). Now we are ready to define the Boolean equality-problem and the Boolean isomorphism-problem for B-circuits:

Problem: EQC(B)
Instance: B-circuits C1, C2
Question: Is C1 ≡ C2?

Problem: ISOC(B)
Instance: B-circuits C1, C2
Question: Is C1 ≅ C2?
Analogously we define the Boolean equality-problem EQF(B) and the Boolean isomorphism-problem ISOF(B) for B-formulas. The next proposition shows that if we permute the variables of a given B-formula H, then the number of satisfying assignments #1(H) (non-satisfying assignments #0(H), resp.) remains unchanged. This proposition is used to show the coNP-hard cases. Proposition 4. Let H1(x1, . . . , xn) and H2(x1, . . . , xn) be B-formulas such that H1 ≅ H2. Then #1(H1) = #1(H2) and #0(H1) = #0(H2) hold. It is obvious that Proposition 4 works for B-circuits too. Note that the opposite direction of Proposition 4 does not hold, as x ⊕ y ≇ ¬(x ⊕ y) together with #1(x ⊕ y) = #1(¬(x ⊕ y)) = 2 shows. Now let E be an arbitrary problem related to a pair of B-circuits: E(B) =def {(C1, C2) | C1 and C2 are B-circuits such that f_{C1} and f_{C2} have property E}. This gives the following obvious proposition: Proposition 5. Let E be a property of two B-circuits, and let B and B′ be finite sets of Boolean functions. 1. If B ⊆ [B′] then E(B) ≤^log_m E(B′). 2. If [B] = [B′] then E(B) ≡^log_m E(B′).
This proposition clarifies that the complexity of EQF(B) and ISOC(B) can be determined by studying the classes of Post's lattice. Now we can give an upper bound for the equivalence-problem of B-formulas and B-circuits. Later we will see that this upper bound is tight for B-circuits and B-formulas in some cases. Moreover, we show that the equivalence-problem and the isomorphism-problem for B-circuits and dual(B)-circuits (B-formulas and dual(B)-formulas, resp.) are of equal complexity; hence we have a vertical symmetry axis in Post's lattice for the complexity of EQF(B), EQC(B), ISOF(B) and ISOC(B).
Proposition 6. Let B be a finite set of Boolean functions. Then the following four statements hold:
1. EQF(B) ≤^log_m EQC(B) and ISOF(B) ≤^log_m ISOC(B),
2. EQF(B) ∈ coNP and EQC(B) ∈ coNP,
3. EQF(B) ≡^log_m EQF(dual(B)) and EQC(B) ≡^log_m EQC(dual(B)),
4. ISOF(B) ≡^log_m ISOF(dual(B)) and ISOC(B) ≡^log_m ISOC(dual(B)).
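For small, explicitly given circuits, the relations of Definition 3 can be checked by brute force on truth tables; the sketch below (ours, exponential by design, since the paper's point is the fine structure below this trivial bound) also replays the counterexample to the converse of Proposition 4.

```python
from itertools import product, permutations

def table(f, n):
    """Truth table of f on all 2^n assignments, in lexicographic order."""
    return tuple(f(*a) for a in product((0, 1), repeat=n))

def equivalent(f, g, n):
    return table(f, n) == table(g, n)

def isomorphic(f, g, n):
    """Is there a permutation pi of the variables with pi(f) equivalent to g?"""
    tg = table(g, n)
    return any(tuple(f(*(a[pi[i]] for i in range(n)))
                     for a in product((0, 1), repeat=n)) == tg
               for pi in permutations(range(n)))

xor_ = lambda x, y: x ^ y
nxor = lambda x, y: 1 - (x ^ y)
assert sum(table(xor_, 2)) == sum(table(nxor, 2)) == 2   # same #1(...)
assert not isomorphic(xor_, nxor, 2)                     # yet not isomorphic
```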
The circuit-value problem for B-circuits is defined as follows:

Problem: VALC(B)
Instance: A B-circuit C(x1, . . . , xn) and an assignment (a1, . . . , an)
Question: Is f_C(a1, . . . , an) = 1?

Similarly we define the formula-value problem VALF(B) for B-formulas. The following proposition gives us some information about the circuit-value problem of B-circuits. A complete classification of the circuit-value problem for all closed classes can be found in [11,10], which generalizes a result of [4], where only two-input gate-types were studied. Proposition 7 ([11]). 1. If B is a finite set of Boolean functions such that V2 ⊆ [B] ⊆ V or E2 ⊆ [B] ⊆ E, then VALC(B) is ≤^log_m-complete for NL. 2. If B is a finite set of Boolean functions such that L2 ⊆ [B] ⊆ L, then VALC(B) is ≤^log_m-complete for ⊕L. Proposition 8 ([5,6]). 1. L^⊕L = ⊕L. 2. L^NL = NL.
To show, in certain cases, lower bounds for the equivalence- and isomorphism-problems of B-formulas and B-circuits, we use the following well-known graph-theoretic problems:

Problem: Graph Accessibility Problem (GAP)
Instance: A directed acyclic graph G whose vertices have outdegree 0 or 2, a start vertex s, and a target vertex t.
Question: Is there a path in G which leads from s to t?

Problem: Graph Odd Accessibility Problem (GOAP)
Instance: A directed acyclic graph G whose vertices have outdegree 0 or 2, a start vertex s, and a target vertex t.
Question: Is the number of paths in G which lead from s to t odd?
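Sequentially, both problems reduce to counting the s-t paths of the DAG; the sketch below (ours, for intuition only) does this by dynamic programming, which is of course a polynomial-time illustration and not the logspace computation that NL- and ⊕L-completeness are about.

```python
from functools import lru_cache

def count_paths(adj, s, t):
    """Number of s-t paths in a DAG (adjacency dict); assumes acyclicity."""
    @lru_cache(maxsize=None)
    def c(v):
        return 1 if v == t else sum(c(w) for w in adj.get(v, ()))
    return c(s)

adj = {0: (1, 2), 1: (3, 2), 2: (3, 4)}   # every vertex has outdegree 0 or 2
print(count_paths(adj, 0, 3) > 0)         # GAP instance: True (t is reachable)
print(count_paths(adj, 0, 3) % 2 == 1)    # GOAP instance: True (3 paths)
```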
3 Main Results
In this section we use Post’s lattice (see [9,7]) to determine the complexity of EQC (B) and EQF (B) step by step. Similarly we are able to give lower bounds for the isomorphism problem of B-circuits and B-formulas. The basic idea in the case [B] ⊆ N, where N is the class of k-ary negations with Boolean constants is, that there exists a unique path from the output-gate to some input-gate or a gate which is labeled by a constant function. This is because every allowed Boolean function has one non-fictive variable at most. Lemma 9. Let B be a finite set of Boolean functions and [B] ⊆ N. Then EQC (B) ∈ L, EQF (B) ∈ L, ISOC (B) ∈ L and ISOF (B) ∈ L. If we restrict ourselves to or -functions or and -functions the isomorphismand equivalence-problem for such B-circuits is complete for NL. For the proof we use the NL-complete graph accessibility problem (GAP). In contrast to this, if we only use exclusive-or functions for our B-circuits the equivalence- and isomorphism-problem is ⊕L-complete. Here the ⊕L-complete graph odd accessibility problem (GOAP) is used. The basic idea in all cases is to interpret a B-circuit as directed acyclic graph and the observation that the equivalenceand isomorphism-problem can be solved by using the suitable reachability problem. Theorem 10. Let B be a finite set of Boolean functions. If E2 ⊆ [B] ⊆ E or V2 ⊆ [B] ⊆ V, then EQC (B) and ISOC (B) are ≤log m -complete for NL. If L2 ⊆ [B] ⊆ L, then EQC (B) and ISOC (B) are ≤log m -complete for ⊕L. Proposition 7 shows that in the cases V2 ⊆ [B] ⊆ V and E2 ⊆ [B] ⊆ E the circuit value problem is complete for NL and in the case L2 ⊆ [B] ⊆ L that it is complete for ⊕L. The next lemma show that this does not hold for B-formulas. In contrast to the circuit value problem the formula value problem VALF (B) is in L in all these three cases, which gives us a hint that the corresponding equality- and isomorphism-problem for B-formulas is easier than for B-circuits, since it can be solved with the help of a VALF (B) oracle. Lemma 11. Let B be a finite set of Boolean functions such that V2 ⊆ [B] ⊆ V, E2 ⊆ [B] ⊆ E or L2 ⊆ [B] ⊆ L, then VALF (B) ∈ L. When EQC (B) and ISOC (B) are NL-complete or ⊕L-complete, the formula case is still easy to solve, as we will see in the Theorem below. Theorem 12. Let B be a finite set of Boolean functions. If E2 ⊆ [B] ⊆ E, V2 ⊆ [B] ⊆ V or L2 ⊆ [B] ⊆ L, then EQF (B) ∈ L and ISOF (B) ∈ L. The next lemma shows that it is possible to construct for every monotone 3 -DNF-formula an equivalent (B ∪ {0, 1})-formula in logarithmic space, if (B ∪ {0, 1}) is a base for M.
Lemma 13. Let k > 0 be fixed and B be a finite set of Boolean functions such that E(x, y, v, u) and V(x, y, v, u) are B-formulas fulfilling E(x, y, 0, 1) ≡ x ∧ y and V(x, y, 0, 1) ≡ x ∨ y. Then, for any monotone k-DNF (k-CNF, resp.) formula H(x1, . . . , xn), there exists a B-formula H′(x1, . . . , xn, u, v) such that H′(x1, . . . , xn, 0, 1) ≡ H(x1, . . . , xn). Moreover, H′ can be computed from H in logarithmic space. Now we use Lemma 13 to build two monotone formulas out of a 3-DNF formula, such that these two monotone formulas are equivalent iff the 3-DNF formula is a tautology. For this let 3-TAUT be the coNP-complete set of 3-DNF formulas which are tautologies. Lemma 14. Let B be a finite set of Boolean functions such that {or, and} ⊆ [B]. Then there exist logspace-computable B-formulas H1 and H2, which can be computed out of H, such that H ∈ 3-TAUT iff H1 ≡ H2 iff H1 ≅ H2 iff #1(H1) = #1(H2). Moreover, the formulas H1 and H2 do not represent constant Boolean functions (i.e., H1 ≢ 0, H1 ≢ 1, H2 ≢ 0 and H2 ≢ 1). The property that the formulas H1 and H2 cannot represent a constant Boolean function plays an important role in the proof of Theorem 17. To show the coNP-completeness, the basic idea of the next theorems works as follows: since we have and ∈ [B] and or ∈ [B ∪ {1}], we can almost apply Lemma 14. The only problem is that we have to "simulate" the constant 1. The idea here is to introduce a new variable u which plays the role of the missing constant 1. By connecting the formulas from Lemma 14 and u with ∧, we ensure that every satisfying assignment assigns 1 to u. Theorem 15. Let B be a finite set of Boolean functions such that and ∈ [B] and or ∈ [B ∪ {1}]. Then EQF(B) is ≤^log_m-complete for coNP and ISOF(B) is ≤^log_m-hard for coNP. Corollary 16. Let B be a set of Boolean functions such that S10 ⊆ [B] or S00 ⊆ [B]. Then EQF(B) and EQC(B) are ≤^log_m-complete for coNP, and ISOF(B) and ISOC(B) are ≤^log_m-hard for coNP. Now only two closed classes of Boolean functions are left: D and D2. In this case the construction of Theorem 15 cannot work, because neither and ∈ D2 nor or ∈ D2; hence we have no possibility to use the and-function to force an additional variable to 1 in all satisfying assignments. However, this is not needed. Instead, we use two new variables as a replacement for 0 and 1 and show that in any case we get either two formulas representing the same constant function or formulas which match Lemma 14. Theorem 17. Let B be a finite set of Boolean functions such that D2 ⊆ [B] ⊆ D. Then EQF(B) and EQC(B) are ≤^log_m-complete for coNP, and ISOF(B) and ISOC(B) are ≤^log_m-hard for coNP. This leads us to the following classification theorems for the complexity of the equivalence- and isomorphism-problems of B-circuits and B-formulas:
Theorem 18. Let B be a finite set of Boolean functions. The complexity of EQC(B) and ISOC(B) can be determined as follows: if B ⊆ N, then EQC(B) ∈ L and ISOC(B) ∈ L. If B ⊆ E or B ⊆ V, then EQC(B) and ISOC(B) are ≤^log_m-complete for NL. In the case that B ⊆ L, EQC(B) and ISOC(B) are ≤^log_m-complete for ⊕L. In all other cases EQC(B) is ≤^log_m-complete for coNP and ISOC(B) is ≤^log_m-hard for coNP. Theorem 19. Let B be a finite set of Boolean functions. The complexity of EQF(B) and of ISOF(B) can be determined as follows: if B ⊆ V, B ⊆ E or B ⊆ L, then EQF(B) ∈ L and ISOF(B) ∈ L. In all other cases EQF(B) is ≤^log_m-complete for coNP and ISOF(B) is ≤^log_m-hard for coNP. Another interesting problem arises in the context of the isomorphism-problem of Boolean formulas. It is well known that the satisfiability-problem for unrestricted Boolean formulas is complete for NP, but the satisfiability-problem for monotone Boolean formulas is solvable in P. We showed that in both cases the isomorphism-problem is coNP-hard, but it might be possible that a better upper bound can be found for the isomorphism-problem of monotone formulas (M2 ⊆ [B] ⊆ M) than for the isomorphism-problem of unrestricted formulas.
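Given B explicitly as truth tables, the case distinction of Theorem 18 is itself easy to evaluate; the helper predicates below (our names, testing membership in N, E, V and L semantically) yield a toy classifier under our reading of the theorem.

```python
from itertools import product

def relevant(f, k):
    """Indices of the non-fictive variables of a k-ary function f."""
    A = list(product((0, 1), repeat=k))
    return [i for i in range(k)
            if any(f(*a) != f(*(a[:i] + (1 - a[i],) + a[i+1:])) for a in A)]

def in_N(f, k):  # depends on at most one variable
    return len(relevant(f, k)) <= 1

def in_E(f, k):  # a constant or a conjunction of variables
    S = relevant(f, k)
    return all(f(*a) == min((a[i] for i in S), default=f(*((0,) * k)))
               for a in product((0, 1), repeat=k))

def in_V(f, k):  # a constant or a disjunction of variables
    S = relevant(f, k)
    return all(f(*a) == max((a[i] for i in S), default=f(*((0,) * k)))
               for a in product((0, 1), repeat=k))

def in_L(f, k):  # affine: a constant xor some of the variables
    c = f(*((0,) * k))
    coef = [f(*tuple(int(j == i) for j in range(k))) ^ c for i in range(k)]
    return all(f(*a) == c ^ (sum(coef[i] & a[i] for i in range(k)) % 2)
               for a in product((0, 1), repeat=k))

def classify_EQC(B):
    """Complexity of EQC(B) following the cases of Theorem 18."""
    if all(in_N(f, k) for f, k in B):
        return "in L"
    if all(in_E(f, k) for f, k in B) or all(in_V(f, k) for f, k in B):
        return "NL-complete"
    if all(in_L(f, k) for f, k in B):
        return "⊕L-complete"
    return "coNP-complete"

assert classify_EQC([(lambda x, y: x & y, 2)]) == "NL-complete"  # B = {and}
assert classify_EQC([(lambda x, y: x ^ y, 2),
                     (lambda: 1, 0)]) == "⊕L-complete"           # B = {xor, 1}
```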
4 Conclusion
In the present work we determined the complexity of the equivalence-problem of B-formulas and B-circuits and were able to give a complete characterization of the complexity w.r.t. all possible finite sets of Boolean functions. We showed that the equivalence-problem for B-circuits is, depending on the used set of Boolean functions, coNP-complete, NL-complete, ⊕L-complete or in L. Interestingly, because of the succinctness of circuits, the equivalence-problem for B-formulas is sometimes easier to solve. To be more precise, if EQC(B) is NL-complete or ⊕L-complete, then EQF(B) is still solvable in deterministic logarithmic space. In all other cases the representation as formula or circuit has no influence, and the complexities of EQC(B) and EQF(B) coincide. In the case of the isomorphism-problems we were not always able to prove completeness results. In these cases we showed hardness for coNP as a lower bound. Note that in these cases the trivial upper bound Σ^p_2 remains valid, so our results are as strong as the well-known trivial upper and lower bounds for the isomorphism-problem of unrestricted Boolean formulas and circuits. In the easier case [B] ⊆ N we proved that the isomorphism-problem for B-circuits is decidable in deterministic logarithmic space. For V2 ⊆ [B] ⊆ V and E2 ⊆ [B] ⊆ E (L2 ⊆ [B] ⊆ L, resp.) we showed the NL-completeness (⊕L-completeness, resp.) of the isomorphism-problem of B-circuits. Similarly to the equivalence-problem, the isomorphism-problem for B-formulas is still solvable in deterministic logarithmic space if ISOC(B) is NL-complete or ⊕L-complete. We use the same reduction for showing the coNP-hardness of EQF(B) and ISOF(B); therefore it cannot be expected that this reduction is powerful enough
to show a better lower bound for the isomorphism-problem. Note that this reduction does not use the ability to permute variables. Hence it seems possible that any reduction showing a better lower bound for the isomorphism-problem has to take a non-trivial permutation into account.
Acknowledgments. I am grateful to Heribert Vollmer for many helpful discussions, and in particular to Sven Kosub for the first idea of Lemma 14.
References
1. Manindra Agrawal and Thomas Thierauf. The Boolean isomorphism problem. In 37th Symposium on Foundations of Computer Science, pages 422–430. IEEE Computer Society Press, 1996.
2. Elmar Böhler, Edith Hemaspaandra, Steffen Reith, and Heribert Vollmer. Equivalence problems for Boolean constraint satisfaction. In Proc. Computer Science Logic, Lecture Notes in Computer Science. Springer Verlag, 2002.
3. B. Borchert, D. Ranjan, and F. Stephan. On the Computational Complexity of Some Classical Equivalence Relations on Boolean Functions. Theory of Computing Systems, 31:679–693, 1998.
4. L. M. Goldschlager and I. Parberry. On The Construction Of Parallel Computers From Various Bases Of Boolean Functions. Theoretical Computer Science, 43:43–58, 1986.
5. Ulrich Hertrampf, Steffen Reith, and Heribert Vollmer. A note on closure properties of logspace MOD classes. Information Processing Letters, 75:91–93, 2000.
6. Neil Immerman. Nondeterministic space is closed under complementation. SIAM Journal on Computing, 17(5):935–938, 1988.
7. S. W. Jablonski, G. P. Gawrilow, and W. B. Kudrjawzew. Boolesche Funktionen und Postsche Klassen. Akademie-Verlag, 1970.
8. Harry R. Lewis. Satisfiability Problems for Propositional Calculi. Mathematical Systems Theory, 13:45–53, 1979.
9. E. L. Post. The two-valued iterative systems of mathematical logic. Annals of Mathematical Studies, 5:1–122, 1941.
10. Steffen Reith. Generalized Satisfiability Problems. PhD thesis, University of Würzburg, 2001.
11. Steffen Reith and Klaus W. Wagner. The Complexity of Problems Defined by Boolean Circuits. Technical Report 255, Institut für Informatik, Universität Würzburg, 2000. To appear in Proceedings International Conference Mathematical Foundation of Informatics, Hanoi, October 25–28, 1999.
12. T. J. Schaefer. The complexity of satisfiability problems. In Proceedings 10th Symposium on Theory of Computing, pages 216–226. ACM Press, 1978.
13. R. Szelepcsényi. The method of forcing for nondeterministic automata. Bulletin of the European Association for Theoretical Computer Science, 33:96–100, 1987.
Quantified Mu-Calculus for Control Synthesis

Stéphane Riedweg and Sophie Pinchinat
IRISA-INRIA, F-35042, Rennes, France
{sriedweg,pinchina}@irisa.fr
Fax: +33299847171

Abstract. We consider an extension of the mu-calculus as a general framework to describe and synthesize controllers. This extension is obtained by quantifying atomic propositions; we call the resulting logic quantified mu-calculus. We study its main theoretical properties and show its adequacy to control applications. The proposed framework is expressive: it offers a uniform way to describe as varied parameters as the kind of systems (closed or open), the control objective, the type of interaction between the controller and the system, the optimality criteria (fairness, maximally permissive), etc. To our knowledge, none of the former approaches can capture such a wide range of concepts.
1 Introduction
To generalize the control synthesis theory of Ramadge and Wonham [1], many works use temporal logics as specification languages [2–4]. All those approaches suffer from substantial limitations: there is no way to impose properties on the interaction between the system and its controller, nor to require optimality of controllers. The motivation of our work is to fill these gaps. We put forward an extension of the mu-calculus well suited to describe general control objectives and to synthesize finite-state controllers. The proposed framework is expressive: it offers a uniform way to describe as varied parameters as the kind of systems (closed or open), the control objective, the type of interaction between the controller and the system, the optimality criteria (fairness, maximally permissive), etc. To our knowledge, none of the former approaches can capture such a wide range of concepts. As in [5–7], we extend a temporal logic (the mu-calculus) by quantifying atomic propositions. We call the resulting logic quantified mu-calculus. We study its main theoretical properties and show its adequacy to control applications. We start from alternating tree automata for the mu-calculus [8, 9] and we extend their theory using the Simulation Theorem [10, 11, 8] and a projection of automata. The Simulation Theorem states that alternating automata and nondeterministic automata are equivalent. The projection is an adaptation of the construction of [12]. The meaning of the existential quantifier is defined by projecting automata onto sets of propositions. Decision procedures for model-checking and satisfiability can therefore be obtained. Both problems are non-elementary when we consider the full logic. We can, however, exhibit interesting fragments of lower complexity, still covering a wide class of control problems.
The following explains the applications to control. We view supervision of systems as pruning the systems' computation trees. Consequently, a controller can be represented by a labeling c of the (uncontrolled) system's computation tree into {0, 1}, such that the (downwards closed) 1-labeled subtree is the behavior of the controlled system. For any proposition c, we define a transformation α∗c of mu-calculus formulas α such that some controller-induced restriction of S satisfies α if and only if α∗c holds of some c-labeling of the computation tree. Labeling allows us to consider the forbidden part of the controlled system, and we derive controllers for large classes of specifications, using a constructive model-checking. Beyond the capability to specify controllers which only cut controllable transitions, we can more interestingly specify (and synthesize) a maximally permissive controller for α, i.e., a controller c such that the c-controlled system satisfies α and no c′-controlled system with c ⊏ c′ satisfies α, where c ⊏ c′ is the mu-calculus formula expressing that the 1-labeled subtree defined by c is a proper subtree of the 1-labeled subtree defined by c′. A maximally permissive controller enforcing α can therefore be specified by the quantified mu-calculus formula: ∃c. (α∗c ∧ ∀c′. (c ⊏ c′ ⇒ ¬(α∗c′))). Controllers and maximally permissive controllers for open systems [2] may also be specified and synthesized. Such controllers are moreover required to be robust against the environment's policy. Also, the implementation considerations of [13] and decentralized controllers may be formulated in quantified mu-calculus. Not surprisingly, the expressive power of the mu-calculus enables us to deal with fairness. The rest of the paper is organized as follows: Section 2 presents the logic. Section 3 studies applications to control theory. Algorithms are developed in Section 4, based on the automata-theoretic semantics. Finally, control synthesis is illustrated in Section 5.
2 Quantified Mu-Calculus
We assume given a finite set of events A, a finite set of propositions AP, and an infinite set of variables Var = {X, Y, . . .}.

Definition 1. (Syntax of qLµ) The set of formulas of the quantified mu-calculus on Γ ⊆ AP, written qLµ(Γ), is defined by the grammar:

∃Λ.α | ¬α₁ | α₁ ∨ α₂ | β

where Λ ⊆ AP, α ∈ qLµ(Γ ∪ Λ), α₁ and α₂ are formulas in qLµ(Γ), and β is a formula of the pure mu-calculus on Γ. The set of formulas of the pure mu-calculus on Γ ⊆ AP, written Lµ(Γ), is defined by the grammar:
⊤ | p | X | ¬β | ⟨a⟩β | β ∨ β′ | µX.β(X)
where a ∈ A, p ∈ Γ, X ∈ Var, and β and β′ are in Lµ(Γ). To ensure that fixed-point formulas are meaningful, X must occur under an even number of negation symbols ¬ in α(X), in each formula µX.α(X). Extending the terminology of the mu-calculus, we call sentences all quantified mu-calculus formulas without free variables. We write ⊥, [a]α, α ∧ β, νX.α(X), and ∀Λ.α respectively for the negations of ⊤, ⟨a⟩¬α, ¬α ∨ ¬β, µX.¬α(¬X), and ∃Λ.¬α. We also write →ᵃ, ↛ᵃ, α ⇒ β, and ∃x.α respectively for ⟨a⟩⊤, [a]⊥, ¬α ∨ β, and ∃{x}.α. Since fixed-point operators and quantifiers do not commute in general, we allow no quantification inside fixed-point terms. The quantified mu-calculus qLµ, as a generalization of the mu-calculus, is also given an interpretation over deterministic transition structures, called processes in [3].

Definition 2. A process on Γ ⊆ AP is a tuple S = ⟨Γ, S, s⁰, t, L⟩, where S is the set of states, s⁰ ∈ S is the initial state, t : S × A → S is a partial function called the transition function, and L : S → P(Γ) maps states to subsets of propositions. We say that S is finite if S is finite, and that it is complete if for all (a, s) ∈ A × S, t(s, a) is defined. Compound processes can be built up by synchronous product.

Definition 3. The (synchronous) product of two processes S₁ = ⟨Γ₁, S₁, s⁰₁, t₁, L₁⟩ and S₂ = ⟨Γ₂, S₂, s⁰₂, t₂, L₂⟩ on disjoint sets Γ₁ and Γ₂ is the process S₁ × S₂ = ⟨Γ, S₁ × S₂, (s⁰₁, s⁰₂), t, L⟩ on Γ = Γ₁ ∪ Γ₂ such that (1) t((s₁, s₂), a) = (s′₁, s′₂) whenever t₁(s₁, a) = s′₁ and t₂(s₂, a) = s′₂, and (2) L(s₁, s₂) = L₁(s₁) ∪ L₂(s₂). In the sequel, we shall in particular take the product of a process on Γ with another (complete) process on a disjoint set of propositions Λ, in order to obtain a similar process on Γ ∪ Λ. This is the way in which qLµ will be applied to solve control problems (see Theorem 1, Section 3).

Definition 4. A labeling process on Λ ⊆ AP is simply a complete process E on Λ. Now, for any process S = ⟨Γ, S, s⁰, t, L⟩ with Γ disjoint from Λ, S × E is called a labeling of S (by E) on Λ. We let LabΛ denote the set of labeling processes on Λ.

Definition 5. (Semantics of qLµ) The interpretation of the qLµ(Γ)-formulas is relative to a process S = ⟨Γ, S, s⁰, t, L⟩ and a valuation val : Var → P(S). This interpretation ⟦α⟧_S^[val] (⊆ S) is defined by:

⟦⊤⟧_S^[val] = S, ⟦p⟧_S^[val] = {s ∈ S | p ∈ L(s)}, ⟦X⟧_S^[val] = val(X),
⟦¬α⟧_S^[val] = S \ ⟦α⟧_S^[val], ⟦α ∨ β⟧_S^[val] = ⟦α⟧_S^[val] ∪ ⟦β⟧_S^[val],
⟦⟨a⟩α⟧_S^[val] = {s ∈ S | ∃s′ : t(s, a) = s′ and s′ ∈ ⟦α⟧_S^[val]},
⟦µX.α(X)⟧_S^[val] = ∩{V ⊆ S | ⟦α⟧_S^[val(V/X)] ⊆ V},
⟦∃Λ.α⟧_S^[val] = {s ∈ S | ∃E = ⟨Λ, E, ε⁰, t′, L′⟩ ∈ LabΛ, (s, ε⁰) ∈ ⟦α⟧_{S×E}^[val×E]},

where (val × E)(X) = val(X) × E for any X ∈ Var.
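To make Definitions 2 and 3 concrete, here is a minimal Python sketch of processes and their synchronous product; the names (Process, product, the events argument) are our own illustrative choices, not notation from the paper.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Process:
        props: frozenset   # Gamma, the propositions of the process
        states: frozenset  # S
        init: object       # s0, the initial state
        trans: dict        # the partial transition function t: (state, event) -> state
        label: dict        # L: state -> frozenset of propositions

    def product(s1, s2, events):
        """Synchronous product of Definition 3 (sketch): a joint a-transition
        exists exactly when both components have one; labels are united."""
        assert not (s1.props & s2.props), "proposition sets must be disjoint"
        states = frozenset((p, q) for p in s1.states for q in s2.states)
        trans = {((p, q), a): (s1.trans[(p, a)], s2.trans[(q, a)])
                 for (p, q) in states for a in events
                 if (p, a) in s1.trans and (q, a) in s2.trans}
        label = {(p, q): s1.label[p] | s2.label[q] for (p, q) in states}
        return Process(s1.props | s2.props, states, (s1.init, s2.init), trans, label)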
Notice that the valuation val does not influence the semantics of a sentence α ∈ qLµ; we write S |= α whenever the initial state of S belongs to ⟦α⟧_S. Clearly, bisimilar processes satisfy the same qLµ formulas.
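On finite processes the clauses of Definition 5 are directly computable. A sketch of the ⟨a⟩ clause and of least-fixed-point iteration, reusing the Process encoding above (the function names are ours):

    def diamond(proc, a, V):
        """[[<a>alpha]] when V = [[alpha]]: states whose a-successor lies in V."""
        return {s for s in proc.states
                if (s, a) in proc.trans and proc.trans[(s, a)] in V}

    def lfp(f):
        """Least fixed point of a monotone f : P(S) -> P(S), by iteration
        from the empty set; this terminates because S is finite."""
        V = set()
        while True:
            W = f(V)
            if W == V:
                return V
            V = W

    # Example: [[mu X. p \/ <a>X]], the states from which p is reachable along a's:
    # lfp(lambda V: {s for s in proc.states if 'p' in proc.label[s]} | diamond(proc, 'a', V))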
3 Control Specifications
This section presents various examples of specifications for control objectives in qLµ. First, a transformation of formulas is defined, which is used to link qLµ model-checking with control problems, as shown by Theorem 1. Variants of the theorem are then exploited to capture requirements such as maximally permissive controllers, controllers for open systems, etc.

Definition 6. For any sentence α ∈ qLµ(Γ) and any x ∈ AP, the x-lift of α is the formula α∗x ∈ qLµ(Γ ∪ {x}), inductively defined by (by convention, ∗ has highest priority):
⊤∗x = ⊤, p∗x = p, X∗x = X, (¬α)∗x = ¬α∗x, (α ∨ β)∗x = α∗x ∨ β∗x, (⟨a⟩α)∗x = ⟨a⟩(x ∧ α∗x), (µX.α)∗x = µX.α∗x, (∃Λ.α)∗x = ∃Λ.α∗x.

Definition 7. Given a process S = ⟨Γ, S, s⁰, t, L⟩ and some x ∈ Γ, the x-pruning of S is the process S↓x = ⟨Γ, S, s⁰, t′, L⟩ such that, for all s ∈ S and a ∈ A, t′(s, a) = t(s, a) if x ∈ L(t(s, a)), and t′(s, a) is undefined otherwise.

Lemma 1. For any process S on Γ, any x ∈ Γ, and any sentence α ∈ qLµ(Γ), we have: ⟦α⟧_{S↓x} = ⟦α∗x⟧_S.

Proof. Straightforward by induction on α.
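The x-lift of Definition 6 is a plain structural recursion; here is a sketch over an ad-hoc tuple encoding of formulas (the constructors and the name lift are ours, not the paper's):

    # Encoding: ('top',), ('prop', p), ('var', X), ('not', f), ('or', f, g),
    # ('and', f, g), ('dia', a, f), ('mu', X, f), ('exists', L, f).
    def lift(phi, x):
        """Compute alpha*x; only the modal case inserts the proposition x."""
        tag = phi[0]
        if tag in ('top', 'prop', 'var'):
            return phi                                 # T*x = T, p*x = p, X*x = X
        if tag == 'not':
            return ('not', lift(phi[1], x))
        if tag in ('mu', 'exists'):                    # binders commute with the lift
            return (tag, phi[1], lift(phi[2], x))
        if tag in ('or', 'and'):                       # the 'and' case mirrors 'or'
            return (tag, lift(phi[1], x), lift(phi[2], x))
        if tag == 'dia':                               # (<a>f)*x = <a>(x /\ f*x)
            return ('dia', phi[1], ('and', ('prop', x), lift(phi[2], x)))
        raise ValueError(tag)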
Control synthesis [1, 3, 14, 4] is a generic problem that consists in enforcing a plant S to have some property α by composing it with another process R, the controller of S for α. The goal is to synthesize R given α. We focus here on the joint specification of α and of constraints on R: the latter reflect the way R exerts the control. This capability relies on the following theorem.

Theorem 1. Given a sentence α ∈ qLµ(Λ ∪ Γ), where Λ and Γ are disjoint, and a process S on Γ, the following assertions are equivalent:
– There exists a controller on Λ of S for α.
– S |= ∃c.∃Λ. α∗c, where c is a fresh proposition.

Proof. First, suppose that there exists a process R = ⟨Λ, R, r⁰, t, L⟩ such that S × R |= α. Given c ∈ AP \ (Λ ∪ Γ), we can easily define E ∈ Lab_{Λ∪{c}} such that R is E↓c without the label c. Now, (S × E)↓c, or equivalently S × (E↓c), satisfies α, since R is E↓c without c and c does not occur in α. Using Lemma 1, we conclude that S × E |= α∗c. Suppose now that S |= ∃c.∃Λ.α∗c. By Definition 5, there is a process E ∈ Lab_{{c}∪Λ} such that S × E |= α∗c. By Lemma 1, (S × E)↓c satisfies α. Then take R to be E↓c.
We now illustrate the use of qLµ to specify various control requirements. The formula ∃c.α∗c of Theorem 1 is enriched to integrate control rules. In the sequel, we let ⟨A⟩ = ⋁_{a∈A} ⟨a⟩, [A] = ⋀_{a∈A} [a], Reach_c(γ) = (µY. ⟨A⟩Y ∨ γ)∗c, and Inv_c(γ) = (νY. [A]Y ∧ γ)∗c. Also, the x-lift is canonically extended to conjunctions of propositions.

Maximally Permissive Admissible Controller for α. When a system S has uncontrollable transitions, denoted by the set of labels A_uc, an admissible controller for α should not disable any of them. Its existence may be expressed by formula (1). An admissible controller c for α is maximally permissive if no other admissible controller c′ for α cuts strictly fewer transitions than c. Writing c ⊏ c′ for the mu-calculus formula Inv_c(c′) ∧ Reach_{c′}(¬c), this requirement is expressed by formula (2).

S |= ∃c. Inv_c([A_uc]c) ∧ α∗c   (1)
S |= ∃c. Inv_c([A_uc]c) ∧ α∗c ∧ ∀c′. (Inv_{c′}([A_uc]c′) ∧ (c ⊏ c′)) ⇒ ¬α∗c′.   (2)

Maximally Permissive Open Controller for α. As studied in [2], an open system S takes the environment's policy into account: the alphabet A of transitions is a disjoint union of the alphabet A_co of controllable transitions and the alphabet A_uc of uncontrollable transitions, permitted or not by the environment. The open controller must ensure α for any possible choice of the environment. This requirement is expressed by formula (3), where the proposition e represents the environment's policy. The ad-hoc solution of [2] cannot easily be extended to maximally permissive open controllers; this requirement is expressed by formula (4).

S |= ∃c. Inv_c([A_uc]c) ∧ ∀e. (Inv_e([A_co]e)) ⇒ α∗(e∧c).   (3)
S |= ∃c. Inv_c([A_uc]c) ∧ ∀e. (Inv_e([A_co]e)) ⇒ α∗(e∧c)
  ∧ ∀c′. (Inv_{c′}([A_uc]c′) ∧ (cut(c) ⊏ cut(c′))) ⇒ ∃e′. Inv_{e′}([A_co]e′) ∧ ¬α∗(e′∧c′).   (4)

Implementable Controller for "Non-blocking". Such a controller [13] is an admissible controller which, moreover, selects exactly one controllable transition at a time, and such that, in the resulting supervised system, a final state (given by the proposition P_f) is always reachable. Let Nblock = νZ. (µX. P_f ∨ ⟨A⟩X) ∧ [A]Z and let Impl = (⋁_{a∈A_co} →ᵃ) ⇒ ⋁_{a∈A_co} (⟨a⟩c ∧ [A_co \ {a}]¬c); a non-blocking implementable controller of a system S may be expressed by the formula:

S |= ∃c. c ∧ (Nblock)∗c ∧ Inv_c([A_uc]c) ∧ Inv_c(Impl)

Decentralized Controllers for α. The existence of decentralized controllers R₁ and R₂ such that S × R₁ × R₂ |= α may be expressed by: S |= ∃c₁∃c₂. α∗(c₁ ∧ c₂).
4 Quantified Mu-Calculus and Automata
Automata-theoretic approaches provide the model theory of the mu-calculus, and they offer decision algorithms for the satisfiability and model-checking problems [15, 10, 16–18, 8]. Depending on the approach followed, different automata
have been considered, differing mainly in two orthogonal parameters: the more or less restricted kind of transitions, ranging from alternating automata to the subclass of nondeterministic automata, and the acceptance conditions, e.g. Rabin, Streett, Mostowski/parity. The class of tree automata for the mu-calculus which we shall adapt to qLµ is the class of alternating parity automata, or shortly simple automata, considered in [3]. This adaptation is stated by Theorem 2 below, which constitutes the main result of this section; the remainder recalls the material needed for its proof.

Theorem 2. (Main result) For any sentence α ∈ qLµ(Γ), there exists a simple automaton A_α on Γ such that, for any process S on Γ: S |= α iff S is accepted by A_α.

Definition 8. (Simple Automata on Processes) A simple automaton on Γ is a tuple A = ⟨Γ, Q, Q∃, Q∀, q⁰, δ, r⟩, where Q is a finite set of states, partitioned into two subsets Q∃ and Q∀ of respectively existential and universal states, q⁰ ∈ Q is the initial state, r : Q → IN is the parity condition, and the transition function δ : Q × P(Γ) → P(Moves(Q)) assigns to each state q and each subset of Γ a set of possible moves included in Moves(Q) = ((A ∪ {ε}) × Q) ∪ (A × {→, ↛}).

Definition 9. (Nondeterministic Automata on Processes) A simple automaton is nondeterministic if, for any set of labels Λ ⊆ Γ, δ(q, Λ) ⊆ {ε} × Q for any q ∈ Q∃, and δ(q, Λ) ⊆ Moves(Q) \ ({ε} × Q) for any q ∈ Q∀. Moreover, in the case when q ∈ Q∀, it is required that (a₁, q₁), (a₂, q₂) ∈ δ(q, Λ) ∩ (A × Q) and a₁ = a₂ entail q₁ = q₂. Finally, the initial state should be an existential state. A nondeterministic automaton is bipartite if for any Λ ⊆ Γ, δ(q, Λ) ⊆ {ε} × Q∀ for any q ∈ Q∃ and δ(q, Λ) ∩ (A × Q) ⊆ A × Q∃ for any q ∈ Q∀.

Parity games provide the automata semantics. A parity game is a graph with an initial vertex v⁰, with a partition (V_I, V_II) of the vertices, and with a partial mapping r from the vertices to a given finite set of integers. A play from some vertex v proceeds as follows: if v ∈ V_I, then player I chooses a successor vertex v′, else player II chooses a successor vertex v′, and so on ad infinitum, unless one player cannot make any move. The play is winning for player I if it is finite and ends in a vertex of V_II, or if it is infinite and the upper bound of the set of ranks r(v) of vertices v that are encountered infinitely often is even. A strategy for player I is a function σ assigning a successor vertex to every sequence of vertices v⃗ ending in a vertex of V_I. A strategy σ is memoryless if σ(v⃗) = σ(w⃗) whenever the sequences v⃗ and w⃗ end in the same vertex. A strategy for player I is winning if all plays following the strategy from the initial vertex are winning for player I. Winning strategies for player II are defined similarly. The fundamental result on parity games is the memoryless determinacy theorem, established in [10, 8].

Theorem 3. (Memoryless determinacy) For any parity game, one of the players has a (memoryless) winning strategy.
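Theorem 3 is what makes the constructions of this section effective: winning regions of finite parity games are computable and memoryless strategies can be extracted. As a concrete illustration (a standard technique, not a construction from the paper), here is a sketch of Zielonka's recursive algorithm; it assumes every vertex has at least one successor.

    def attractor(G, target, player):
        """Vertices from which `player` can force the play into `target`.
        G maps vertex -> (owner, rank, successor list); owners are 0 (player I)
        and 1 (player II); every vertex is assumed to have a successor."""
        A = set(target)
        changed = True
        while changed:
            changed = False
            for v, (owner, _, succ) in G.items():
                if v in A:
                    continue
                if (owner == player and any(w in A for w in succ)) or \
                   (owner != player and all(w in A for w in succ)):
                    A.add(v)
                    changed = True
        return A

    def zielonka(G):
        """Winning regions (win_I, win_II) of a finite parity game; player I
        wins a play iff the highest rank seen infinitely often is even."""
        if not G:
            return set(), set()
        d = max(rank for (_, rank, _) in G.values())
        player = d % 2                        # the player favoured by rank d
        top = {v for v, (_, rank, _) in G.items() if rank == d}
        A = attractor(G, top, player)
        rest = {v: (o, r, [w for w in s if w not in A])
                for v, (o, r, s) in G.items() if v not in A}
        W = zielonka(rest)
        if not W[1 - player]:                 # opponent wins nothing outside A,
            return (set(G), set()) if player == 0 else (set(), set(G))
        B = attractor(G, W[1 - player], 1 - player)
        rest2 = {v: (o, r, [w for w in s if w not in B])
                 for v, (o, r, s) in G.items() if v not in B}
        W2 = list(zielonka(rest2))
        W2[1 - player] |= B
        return tuple(W2)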
Definition 10. Given a simple automaton A = ⟨Γ, Q, Q∃, Q∀, q⁰, δ, r⟩ and a process S = ⟨Γ, S, s⁰, t, L⟩, we define the parity game G(A, S), where the vertices of player I are in Q∃ × S ∪ {⊤} and the vertices of player II are in Q∀ × S ∪ {⊥}; the initial vertex v⁰ is (q⁰, s⁰), and the other vertices and transitions are defined inductively as follows. Vertices ⊤ and ⊥ have no successor. For any vertex (q, s) and for all a ∈ A:
– there is an ε-edge to a successor vertex (q′, s) if (ε, q′) ∈ δ(q, L(s)),
– there is an a-edge to a successor vertex (q′, s′) if (a, q′) ∈ δ(q, L(s)) and t(s, a) = s′,
– there is an a-edge to the successor vertex ⊤ if (a, →) ∈ δ(q, L(s)) and t(s, a) is defined, or (a, ↛) ∈ δ(q, L(s)) and t(s, a) is not defined,
– there is an a-edge to the successor vertex ⊥ if (a, →) ∈ δ(q, L(s)) and t(s, a) is not defined, or (a, ↛) ∈ δ(q, L(s)) and t(s, a) is defined.
The automaton A accepts S (written S |= A) if there is a winning strategy for player I in G(A, S). Like automata on infinite trees [10, 11], simple automata on processes are equivalent to bipartite nondeterministic automata. This fundamental result, due to [8, 3], is called the Simulation Theorem:

Theorem 4. (Simulation Theorem for processes) Every simple automaton on processes is equivalent to a bipartite nondeterministic automaton.

Since a constructive proof of Theorem 2 for α ∈ Lµ may be found in [8, 18, 9], in order to extend it to qLµ we consider projections of automata: the projection is the semantic counterpart of the existential quantification in qLµ. The projections presented here are similar to the projections of nondeterministic tree automata presented in [12, 19]: projected automata are obtained by forgetting a subset of propositions in the condition of the transitions.

Definition 11. (Projection) Let Γ ⊆ Γ′, let Λ = Γ′ \ Γ, and let A = ⟨Γ′, Q, Q∃, Q∀, q⁰, δ, r⟩ be a bipartite nondeterministic automaton. The projection of A on Γ is the bipartite nondeterministic automaton A↓Γ = ⟨Γ, Q∃ ∪ (Q∀ × P(Λ)), Q∃, Q∀ × P(Λ), q⁰, δ↓Γ, r↓Γ⟩, where for all l ⊆ Λ and all l′ ⊆ Γ:
1. ∀q ∈ Q∃: δ↓Γ(q, l′) = {(ε, (q′, l)) | (ε, q′) ∈ δ(q, l′ ∪ l)},
2. ∀q ∈ Q∀: δ↓Γ((q, l), l′) = δ(q, l′ ∪ l),
3. ∀q ∈ Q∃: r↓Γ(q) = r(q) and ∀q ∈ Q∀: r↓Γ((q, l)) = r(q).

Theorem 5. (Projection) Let A = ⟨Γ′, Q, Q∃, Q∀, q⁰, δ, r⟩ be a bipartite nondeterministic automaton. For any process S = ⟨Γ, S, s⁰, t, L⟩ on Γ ⊆ Γ′, S |= A↓Γ if and only if there exists a labeling process E on Λ = Γ′ \ Γ such that S × E |= A.

Proof. First, suppose S |= A↓Γ. Let σ be a winning memoryless strategy for player I in the game G = G(A↓Γ, S) (Theorem 3) and let V_II ⊆ Q∀ × P(Λ) × S be the set of player II nodes in G other than ⊥. Let E ∈ LabΛ be an arbitrary
completion of the process ⟨Λ, V_II, σ(q⁰, s⁰), t′, L′⟩, where for any (q, l, s) ∈ V_II, L′(q, l, s) = l, and for all a ∈ A, t′((q, l, s), a) = σ(s′, q′) for the (unique) (s′, q′) such that there is an a-arc from ((q, l), s) to (q′, s′) in G. Then it can be shown that a winning strategy σ′ for player I in the game G(A, S × E) is defined by σ′((s, σ(s, q)), q) = σ(s, q). Conversely, suppose S × E |= A for some E ∈ LabΛ. It suffices to show that any memoryless winning strategy for player I in G(A, S × E) defines a memoryless winning strategy for player I in the game G(A↓Γ, S).

We now prove Theorem 2, by induction on the structure of α ∈ qLµ(Γ). We only prove the case where α is quantified, since α ∈ Lµ is dealt with following [8] and [9]. Without loss of generality, we can assume that α is of the form QΛ.α′, where Q ∈ {∃, ∀}. For Q = ∃, let A be the bipartite nondeterministic automaton equivalent to A_{α′} (Theorem 4). Now take A_α = A↓Γ and conclude by Theorem 5. The case Q = ∀ is obtained by complementation: since parity games are determined (Theorem 3), we complement A_{∃Λ.¬α′} (as in [11]) to obtain A_α.

Theorem 2 gives an effective construction of finite controllers on finite processes: given a finite process S and a sentence ∃c.α ∈ qLµ expressing a control problem, we construct the automaton A_{∃c.α}. If we find a memoryless winning strategy in the finite game G(A_{∃c.α}, S), Theorem 5 gives a finite controller; otherwise, there is no solution. We can show that the complexity of this problem is (k + 1)-EXPTIME-complete, where k is the number of alternations of existential and universal quantifiers in α. The result of [2] is thus retrieved: synthesizing controllers for open systems is 2-EXPTIME-complete for mu-calculus control objectives.
5 Controller Synthesis
This section illustrates the constructions on a simple example. The plant S (to be controlled) is drawn on the next page. Both states s₀ and s₁ are labeled with the empty set of propositions, so S is a process on Γ = ∅. The control objective is the formula α = νY. ⟨b⟩Y ∧ ⟨a⟩(µX.[a]X). There is a controller of S for α iff S |= ∃c.α∗c, but also iff S |= ∃c.c ∧ α∗c. Let φ be the formula c ∧ α∗c ≡ c ∧ νY. ⟨b⟩(c ∧ Y) ∧ ⟨a⟩(c ∧ µX.[a](c ⇒ X)). The bipartite nondeterministic automaton A_φ is shown in Figure 1, where the following graphical conventions are used: circled states are existential states, while states enclosed in squares are universal states; the transitions between states are represented by edges labeled in {a, b, ε} × P({c}); the other transitions are represented by labeled edges from states to the special box containing the symbol →. The rank function maps q⁰ to 2 and q₂ to 1. The projected automaton A_φ↓∅ is shown in Figure 2, using similar conventions. Note that all transitions are labeled in {a, b, ε} × {∅}, since A_φ↓∅ is an automaton on Γ = ∅, but all universal states are now labeled in Q × P({c}), as a result of the projection. Now, S |= ∃c.φ iff A_φ↓∅ accepts S, and this condition is equivalent to the existence of a winning strategy for player I in the finite parity game G(A_φ↓∅, S) of Figure 3. Clearly, player I has a unique memoryless winning strategy σ, which maps the vertex (q₂, s₀) to (q₂, ∅, s₀). The labeling process E on {c} derived from σ is shown in Figure 4. Four states and
transitions between them are first computed, yielding an incomplete process on {c}. A last state c is then added so as to obtain a complete process. The dashed transitions (and all dead transitions) are finally suppressed to yield the synthesized controller.
6 Conclusion
The logical formalism we have developed allows us to synthesize controllers for a large class of control objectives. All the constraints, such as maximally permissive controllers or admissible controllers for open systems, are formulated as objectives. As it is, the class of controllers is left free, and we cannot, for example, deal with partial observation. The recent work of [3] offers two constructions that we can use to interpret the quantified mu-calculus relative to some fixed classes of labeling processes. The first construction, the quotient of automata, forces the labeling processes to be in some mu-calculus (definable) class. It can be seen as a generalization of the automata projection, and can be used instead. The quantified mu-calculus could hence be extended by constraining each quantifier to range
over some mu-calculus class. Nevertheless, the class of controllers under partial observation being undefinable in the mu-calculus, we need to consider the second construction: the quotient of automata over a process exhibits (when it exists) a controller under partial observation inside some mu-calculus class. The outermost quantification of a sentence is then made relative to some class of partial observation. Therefore, we can seek a controller under partial observation for open systems, but we cannot synthesize a maximally permissive controller among the controllers under partial observation.
References
1. Ramadge, P.J., Wonham, W.M.: The control of discrete event systems. Proceedings of the IEEE, Special Issue on Dynamics of Discrete Event Systems 77 (1989)
2. Kupferman, O., Madhusudan, P., Thiagarajan, P., Vardi, M.: Open systems in reactive environments: Control and synthesis. CONCUR 2000, LNCS 1877
3. Arnold, A., Vincent, A., Walukiewicz, I.: Games for synthesis of controllers with partial observation. To appear in TCS (2003)
4. Vincent, A.: Synthèse de contrôleurs et stratégies gagnantes dans les jeux de parité. MSR 2001
5. Sistla, A., Vardi, M., Wolper, P.: The complementation problem for Büchi automata with applications to temporal logic. TCS 49 (1987)
6. Kupferman, O.: Augmenting branching temporal logics with existential quantification over atomic propositions. Journal of Logic and Computation 9 (1999)
7. Patthak, A.C., Bhattacharya, I., Dasgupta, A., Dasgupta, P., Chakrabarti, P.P.: Quantified computation tree logic. IPL 82 (2002)
8. Arnold, A., Niwiński, D.: Rudiments of mu-calculus. North-Holland (2001)
9. Walukiewicz, I.: Automata and logic. In: Notes from the EEF Summer School '01 (2001)
10. Emerson, E.A., Jutla, C.S.: Tree automata, mu-calculus and determinacy. FOCS 1991, IEEE Computer Society Press (1991)
11. Muller, D.E., Schupp, P.E.: Simulating alternating tree automata by nondeterministic automata: New results and new proofs of the theorems of Rabin, McNaughton and Safra. TCS 141 (1995)
12. Rabin, M.O.: Decidability of second-order theories and automata on infinite trees. Trans. Amer. Math. Soc. 141 (1969)
13. Dietrich, P., Malik, R., Wonham, W., Brandin, B.: Implementation considerations in supervisory control. In: Synthesis and Control of Discrete Event Systems. Kluwer Academic Publishers (2002)
14. Bergeron, A.: A unified approach to control problems in discrete event processes. Theoretical Informatics and Applications 27 (1993)
15. Emerson, E.A., Sistla, A.P.: Deciding full branching time logic. Information and Control 61 (1984)
16. Emerson, E.A., Jutla, C.S., Sistla, A.P.: On model-checking for fragments of mu-calculus. CAV 1993, LNCS 697
17. Streett, R.S., Emerson, E.A.: The propositional mu-calculus is elementary. ICALP 1984, LNCS 172
18. Kupferman, O., Vardi, M.Y., Wolper, P.: An automata-theoretic approach to branching-time model checking. Journal of the ACM 47 (2000)
19. Thomas, W.: Automata on infinite objects. In: van Leeuwen, J. (ed.): Handbook of TCS, vol. B. Elsevier Science Publishers (1990)
On Probabilistic Quantified Satisfiability Games

Marcin Rychlik

Institute of Informatics, Warsaw University
Banacha 2, 02-097 Warsaw, Poland
[email protected]
Abstract. We study the complexity of a new probabilistic variant of the problem Quantified Satisfiability (QSAT). Let a sentence ∃v₁∀v₂ . . . ∃vₙ₋₁∀vₙ φ be given. In the classical game associated with the QSAT problem, the players ∃ and ∀ alternately choose Boolean values of the variables v₁, . . . , vₙ. In our game one (or both) of the players can instead determine the probability that vᵢ is true. We call such a player a probabilistic player, as opposed to a classical player. The payoff (of ∃) is the probability that the formula φ is true. We study the complexity of the problem of whether ∃ (probabilistic or classical) has a strategy to achieve a payoff of at least c playing against ∀ (probabilistic or classical). We completely answer the question for the case of threshold c = 1, exhibiting that the case when ∀ is probabilistic is easier to decide (Σ₂ᴾ-complete) than the remaining cases (PSPACE-complete). For thresholds c < 1 we have a number of partial results. We establish PSPACE-hardness of the question whether ∃ can win in the case when only one of the players is probabilistic, and Σ₂ᴾ-hardness when both players are probabilistic. We also show that the set of thresholds c for which a related problem is in PSPACE is dense in [0, 1]. We study the set of reals c ∈ [0, 1] that can be game values of our games. This set turns out to include the set of binary rationals, but also some irrational numbers.
1 Introduction
In this paper we study a certain probabilistic variant of the problem Quantified Satisfiability (QSAT). Games with coin tosses (see e.g. [9], [3], [2], [5]) and games where players use randomized strategies (see e.g. [6], [11], [4]) have been widely considered in previous works in complexity theory. Many papers consider the possibility for players to choose probability distributions (mixed strategies [11], [4], [7], [8] or behavior strategies [6], [7]), but the choices are made by the players just once per game, either independently or with just one alternation. A crucial difference between these works and ours is that in our framework probabilities are chosen by the players in turn, according to the values of the probabilities chosen so far. To our knowledge, such a situation has not been considered so far. Quantified Satisfiability was studied in [1]. It can be considered as a game between two players, call them ∃ and ∀. Fix some Boolean formula φ(x₁, . . . , xₙ).
Supported by Polish KBN grant No. 7 T11C 027 20
The two players move alternately, with ∃ moving first. If i is odd then ∃ fixes the value of xᵢ, whereas if i is even ∀ fixes the value of xᵢ. ∃ tries to make the expression φ true, while ∀ tries to make it false. Then ∃ has a winning strategy iff ∃x₁∀x₂∃x₃ . . . φ(x₁, . . . , xₙ) is true. If we assume that ∀ is uninterested in winning and plays at random, then the game becomes a Game Against Nature, studied in [9] (see also [10]). The decisions of Nature are probabilistic in manner: Nature chooses xᵢ = 0 or xᵢ = 1 with probability ½. In this case a winning strategy for ∃ is a strategy that enforces the probability of success to be greater than ½. Both in the case of the game Quantified Satisfiability and in the case of the Game Against Nature the following problem is PSPACE-complete [9]: given φ, decide whether there exists a winning strategy for ∃. There will be a difference between Games Against Nature and our probabilistic variant of the game Quantified Satisfiability. In Games Against Nature the players use deterministic (pure) strategies. It means that at a particular node in a game, a player ∃ (playing against Nature) is required to make a strategic move, say to choose the side of the coin. Or else Nature is required to toss a coin, but the probabilities associated with the coin tosses are fixed in advance and not chosen by Nature. Hence, coin tosses correspond to "chance moves" in standard game-theoretic terminology. In our game, the biases of the coins will be chosen strategically, in turn, by both players. Once the biases of all the coins are determined, the coins are tossed. Thus the values of x₁, . . . , xₙ are determined, and ∃ wins iff φ(x₁, . . . , xₙ) is true. More specifically, we consider two types of players. A probabilistic player, instead of determining the value of xᵢ, chooses the probability pᵢ of xᵢ being 1, where we assume that the events {xᵢ = 1} and {xⱼ = 1} are independent when i ≠ j. A player using a classical strategy, i.e. choosing values 0 or 1, can be viewed as a probabilistic player as well, but restricted to pᵢ = 0 or pᵢ = 1. The chosen probabilities p₁, p₂, . . . , pₙ determine the probability P(φ) of the event that φ is true. Now ∃ tries to make P(φ) as large as possible, whereas ∀ tries to minimize P(φ). So P(φ) can be regarded as the payoff of ∃. Notice that in the classical Quantified Satisfiability game the payoff P(φ) can only be 0 or 1. The following computational problem arises: given a formula φ, decide if ∃ can make P(φ) greater than a fixed threshold c ∈ [0, 1). We shall study this problem and related ones in this paper. We prove that the problem of whether ∃ can make P(φ) = 1 is Σ₂ᴾ-complete (see e.g. [10]) when ∀ is probabilistic, and that this question is PSPACE-complete when ∀ is classical. We show that it is PSPACE-hard to tell whether a probabilistic ∃ can enforce P(φ) ≥ c when the opponent ∀ is classical. Similarly, it is PSPACE-hard to tell whether a classical ∃ can make P(φ) > c when ∀ is probabilistic. In both cases we assume that the thresholds are fixed. We also present a Poly(|φ|, |log₂ ε|)-space algorithm which, given φ and ε > 0, returns a value that is ε-close to the maximal value of P(φ) attainable by ∃. We prove that for ⋄ ∈ {>, ≥} and for all types of players ∃ and ∀ (classical or probabilistic) the following set is
dense in [0, 1]: the set of constants c ∈ [0, 1] such that the language of Boolean formulas φ such that ∃ can make P(φ) ⋄ c is in PSPACE. For the proofs we refer the reader to [12].
2 Variants of the Problem of Quantified Satisfiability
Let V be a countable set of variables. Recall the definition of the set of Boolean formulas:

Φ ::= 0 | 1 | V | ∼Φ | (Φ ∨ Φ) | (Φ ∧ Φ).
Fix φ(v₁, . . . , vₙ) ∈ Φ. Let xᵢ ∈ {0, 1}, 1 ≤ i ≤ n. Then the meaning of φ(x₁, . . . , xₙ) is the logical value of φ after replacing the variables v₁, . . . , vₙ in φ by x₁, . . . , xₙ respectively. Now let X₁, . . . , Xₙ be pairwise independent random variables with range {0, 1}. Naturally φ(X₁, . . . , Xₙ) can be understood as the random variable with range {0, 1} such that P(φ(X₁, . . . , Xₙ) = 1), also written P(φ(X₁, . . . , Xₙ)) for short, equals the probability of the event that (X₁, . . . , Xₙ) satisfies φ:

P(φ(X₁, . . . , Xₙ) = 1) = Σ_{(x₁,...,xₙ)∈{0,1}ⁿ : φ(x₁,...,xₙ)=1} ∏ᵢ₌₁ⁿ P(Xᵢ = xᵢ).   (1)

Note that P(φ(X₁, . . . , Xₙ) = 1) is the expected value of φ(X₁, . . . , Xₙ). In the sequel, P_{p₁,...,pₙ}(φ) stands for P(φ(X₁, . . . , Xₙ) = 1), where the Xᵢ are arbitrary pairwise independent random variables satisfying P(Xᵢ = 1) = pᵢ, 1 ≤ i ≤ n. For all p₁, . . . , pₙ ∈ [0, 1],

P_{p₁,...,pₙ}(φ) = Σ_{(x₁,...,xₙ)∈{0,1}ⁿ : φ(x₁,...,xₙ)=1} ∏ᵢ₌₁ⁿ pᵢ⟨xᵢ⟩,   (2)

where pᵢ⟨xᵢ⟩ = pᵢ if xᵢ = 1 and pᵢ⟨xᵢ⟩ = 1 − pᵢ if xᵢ = 0.

For the rest of this paper we shall assume that the range of the random variables we consider is {0, 1} and that differently named random variables are pairwise independent. For instance, X₁ and X₂ would denote two pairwise independent random variables with range {0, 1}. We shall write φ(X₁, . . . , Xₙ) as an abbreviation for P(φ(X₁, . . . , Xₙ) = 1) = 1. Consider the following statement: "There is a random variable X such that for every random variable Y we have P(X ↔ Y) ≥ ½" (here we write φ₁ ↔ φ₂ as an abbreviation for ((φ₁ ∨ ∼φ₂) ∧ (∼φ₁ ∨ φ₂))). It is a true statement: consider the random variable X with P(X = 1) = ½. This statement can be rewritten as

∃X∀Y P(X ↔ Y) ≥ ½.
We used uppercase letters X and Y to emphasize that they represent random variables. Sometimes we would like to state also that: "There is a random variable X such that for every y ∈ {0, 1} we have P(X ↔ y) ≥ ½." This can be viewed as the previous statement with Y restricted to two random variables: those with P(Y = 1) = 1 or P(Y = 0) = 1. We will denote it by

∃X∀y P(X ↔ y) ≥ ½.

Here and subsequently, ∃X means that there is a random variable X, while ∃x means that there is a random variable x restricted to the two random variables with P(x = 1) = 1 or P(x = 0) = 1. Similarly for the quantifier ∀. We extend this notation to longer prefixes in the obvious way. Note that ∃x₁∀y₁∃x₂ . . . φ has its usual meaning. Consider a formula of the form:

Q₁y₁ Q₂y₂ Q₃y₃ . . . Qₙyₙ P(φ(y₁, y₂, y₃, . . . , yₙ)) ⋄ c   (3)

where ⋄ ∈ {≥, ≤, >, <}. In the case of '>', ∃ tries to make P(φ(y₁, y₂, y₃, . . . , yₙ)) as big as possible, and then it is natural to call P(φ(y₁, y₂, y₃, . . . , yₙ)) the payoff of ∃. If yᵢ = Xᵢ for every yᵢ chosen by ∃, then we call ∃ a probabilistic player, and we say that he uses a probabilistic strategy. If yᵢ = xᵢ for every yᵢ chosen by ∃, then we call ∃ a classical player, and we say that he uses a classical strategy. We use similar terminology for the player ∀. For the rules of the game described in the introduction we can consider the following problem.

Problem 1. Fix c ∈ [0, 1). Given a Boolean formula φ, decide whether

∃X₁∀X₂∃X₃ . . . QₙXₙ P(φ(X₁, X₂, X₃, . . . , Xₙ)) > c   (4)

where the nth quantifier Qₙ is ∃ if n is odd, and ∀ if n is even.

In the case of a threshold c given by a finitely representable rational number, decidability of Problem 1 and of similar ones follows from Tarski's theorem on the decidability of the first-order theory of the field of real numbers. For example, we can rewrite the formula ∃X∀Y P(X ↔ Y) ≥ ½ as the following sentence of the theory of reals:

∃p_X [(0 ≤ p_X ≤ 1) ∧ ∀p_Y ((0 ≤ p_Y ≤ 1) ⇒ p_X p_Y + (1 − p_X)(1 − p_Y) ≥ ½)].
In general, an expression representing P(φ(X₁, X₂, X₃, . . . , Xₙ)) can be of exponential size with respect to the size of φ. The following problem is PSPACE-complete [1].

Problem 2 (Quantified Satisfiability). Given a formula φ, decide whether ∃x₁∀x₂∃x₃ . . . Qₙxₙ φ.

One may conjecture that ∃X₁∀X₂∃X₃ . . . QₙXₙ φ(X₁, X₂, X₃, . . . , Xₙ) is equivalent to ∃x₁∀x₂ . . . Qₙxₙ φ(x₁, x₂, x₃, . . . , xₙ). But this is not true, as the following example shows.

Example 1. Let φ = v₁ ↔ v₂. Then ∀x₁∃x₂ φ(x₁, x₂) is true, but it is not true that ∀X₁∃X₂ φ(X₁, X₂), because if P(X₁ = 1) = ½ then

P(φ(X₁, X₂)) = P(X₁ = 0)P(X₂ = 0) + P(X₁ = 1)P(X₂ = 1) = ½P(X₂ = 0) + ½P(X₂ = 1) = ½ < 1,

whatever X₂ is chosen.
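Equation (2) can be evaluated by brute force over all assignments; the following sketch (with our own tuple encoding of formulas, ('var', i) standing for v_{i+1}) also reproduces the computation of Example 1:

    from itertools import product as assignments
    from math import prod

    def prob(phi, ps):
        """P_{p1,...,pn}(phi) via equation (2): sum the products p_i<x_i>
        over the satisfying assignments of phi."""
        def ev(f, xs):
            tag = f[0]
            if tag == 'var':  return xs[f[1]] == 1
            if tag == 'not':  return not ev(f[1], xs)
            if tag == 'or':   return ev(f[1], xs) or ev(f[2], xs)
            if tag == 'and':  return ev(f[1], xs) and ev(f[2], xs)
            raise ValueError(tag)
        return sum(prod(p if x else 1 - p for p, x in zip(ps, xs))
                   for xs in assignments((0, 1), repeat=len(ps)) if ev(phi, xs))

    # Example 1: phi = v1 <-> v2; with P(X1 = 1) = 1/2 the payoff is 1/2
    # whatever X2 is, e.g. for P(X2 = 1) = 0.3:
    iff = ('or', ('and', ('var', 0), ('var', 1)),
                 ('and', ('not', ('var', 0)), ('not', ('var', 1))))
    assert abs(prob(iff, [0.5, 0.3]) - 0.5) < 1e-12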
The next example shows that for some Boolean formulas φ the quantified formula ∃x₁∀x₂ . . . Qₙxₙ φ is true whereas ∃X₁∀X₂ . . . QₙXₙ P(φ(X₁, . . . , Xₙ)) ≥ c is true only when c is negligible.

Example 2. Let φ = ⋀ᵢ₌₁ⁿ (v₂ᵢ₋₁ ↔ v₂ᵢ). Then ∀x₁∃x₂ . . . ∃x₂ₙ φ(x₁, . . . , x₂ₙ) is true, but ∀X₁∃X₂ . . . ∃X₂ₙ P(φ(X₁, . . . , X₂ₙ)) ≥ c is not true unless c ≤ 1/2ⁿ. If we set P(X₂ᵢ₋₁ = 1) = ½ for all 1 ≤ i ≤ n, then P(X₂ᵢ₋₁ ↔ X₂ᵢ) = ½ for all 1 ≤ i ≤ n (see the previous example), and in consequence P(φ(X₁, . . . , X₂ₙ)) = ∏ᵢ₌₁ⁿ P(X₂ᵢ₋₁ ↔ X₂ᵢ) = 1/2ⁿ, no matter how ∃ chooses X₂, . . . , X₂ₙ. We used the fact that for arbitrary Boolean formulas φ₁(v₁, . . . , vₙ) and φ₂(w₁, . . . , wₘ), P(φ₁(X₁, . . . , Xₙ) ∧ φ₂(Y₁, . . . , Yₘ)) = P(φ₁(X₁, . . . , Xₙ)) · P(φ₂(Y₁, . . . , Yₘ)) when the Xᵢ's and Yᵢ's are pairwise independent random variables.
The example above may seem to suggest that if a player has no winning strategy, then the best he can do is to always choose probability ½. But the following example illustrates that this need not be the case.

Example 3. Consider a formula φ(v₁, v₂, v₃, v₄) such that φ(x₁, x₂, x₃, x₄) is true if and only if (x₁, x₂, x₃, x₄) ∈ {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 1, 1, 0), (1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1)}.
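Under the conventions above, the failure of the classical quantified formula noted next can be confirmed by a brute-force check (the set name SAT is ours):

    # Example 3's satisfying assignments; exists-forall-exists-forall over {0,1}:
    SAT = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 1, 1, 0),
           (1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1)}
    print(any(all(any(all((x1, x2, x3, x4) in SAT
                          for x4 in (0, 1))
                      for x3 in (0, 1))
                  for x2 in (0, 1))
              for x1 in (0, 1)))    # -> False: the classical game is lost by exists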
One can check that ∃x₁∀x₂∃x₃∀x₄ φ(x₁, x₂, x₃, x₄) is not true. The value F(p) is defined piecewise, with one branch for 0 ≤ p ≤ ½ and one for ½ ≤ p ≤ 1, by algebraic expressions in p involving the radical √(−1 + 2p + p²); the maximum of F over [0, 1] is attained at a point p∗.

Theorem 1.

∃x₁∀X₂∃x₃ . . . Qₙχₙ P(φ) > 1 − 1/2^(n/2) ⇔ ∃X₁∀X₂∃X₃ . . . QₙXₙ φ ⇔ ∃X₁∀X₂∃X₃ . . . QₙXₙ P(φ) > 1 − 1/2^(n/2)

where χₙ is xₙ if n is odd, and Xₙ if n is even, and ι = n, κ = n − 1 if n is odd, and ι = n − 1, κ = n if n is even.

The following theorem shows that if the player ∀ is classical, then a probabilistic strategy does not add power to ∃ when the threshold c is set to 1.

Theorem 2.

∃x₁∀x₂∃x₃ . . . Qₙxₙ φ ⇔ ∃X₁∀x₂∃X₃ . . . Qₙκₙ φ ⇔ ∃X₁∀x₂∃X₃ . . . Qₙκₙ P(φ) > 1 − 1/2^(n/2)

where κₙ is Xₙ if n is odd, and xₙ if n is even.
¹ We used the command FullSimplify[x ∈ Rationals] in Mathematica ver. 4.0.1.0, created by Wolfram Research, Inc., to get this result.
3 Game Value
Definition. Let φ(v₁, . . . , vₙ) ∈ Φ. Define

c_φ = max_{p₁∈[0,1]} min_{p₂∈[0,1]} . . . opt_{pₙ∈[0,1]} P_{p₁,...,pₙ}(φ)   (5)
c̄_φ = max_{p₁∈[0,1]} min_{p₂∈{0,1}} . . . opt_{pₙ∈Δₙ} P_{p₁,...,pₙ}(φ)   (6)
c̲_φ = max_{p₁∈{0,1}} min_{p₂∈[0,1]} . . . opt_{pₙ∈Λₙ} P_{p₁,...,pₙ}(φ)   (7)

where optₙ, Δₙ, Λₙ are max, [0, 1], {0, 1} respectively if n is odd, and min, {0, 1}, [0, 1] if n is even. Let

Γ = {c_φ : φ ∈ Φ}, Γ̄ = {c̄_φ : φ ∈ Φ}, Γ̲ = {c̲_φ : φ ∈ Φ}.

The values on the right-hand sides of formulas (5), (6) and (7), call them the game values, are well defined because the sets [0, 1] and {0, 1} are compact and for 1 < i ≤ n the following maps are continuous with respect to p₁, . . . , pᵢ:

(p₁, . . . , pᵢ) ↦ opt_{pᵢ₊₁∈[0,1]} . . . opt_{pₙ∈[0,1]} P_{p₁,...,pₙ}(φ)
(p₁, . . . , pᵢ) ↦ opt_{pᵢ₊₁∈Δᵢ₊₁} . . . opt_{pₙ∈Δₙ} P_{p₁,...,pₙ}(φ)
(p₁, . . . , pᵢ) ↦ opt_{pᵢ₊₁∈Λᵢ₊₁} . . . opt_{pₙ∈Λₙ} P_{p₁,...,pₙ}(φ).
P_{p₁,...,pₙ}(φ) is continuous (case i = n) because it is a multilinear map (recall (2)). The continuity of the maps in the case i < n can be proved inductively using the following lemma.

Lemma 1. Assume f : S × T → ℝ is a continuous map and S, T are compact spaces. Then F defined by F(s) = max_{t∈T} f(s, t) is also continuous.
The values c_φ, c̄_φ, c̲_φ defined by (5), (6) and (7) are the maximal attainable payoffs of ∃ in the corresponding games. To see this, observe that if f(p) is the payoff of the player corresponding to a choice p ∈ P, where P is the compact set of all possible choices, then F = max_{p∈P} f(p) is the maximal attainable payoff of the player, provided f is a continuous map.

Example 4. Let φ be as in Example 3. Then

c_φ = max_{p₁∈[0,1]} min_{p₂∈[0,1]} max_{p₃∈[0,1]} min_{p₄∈[0,1]} P_{p₁,p₂,p₃,p₄}(φ) = F(p∗)

c̄_φ = max_{p₁∈[0,1]} min_{p₂∈{0,1}} max_{p₃∈[0,1]} min_{p₄∈{0,1}} P_{p₁,p₂,p₃,p₄}(φ) = (√5 − 1)/2 ≈ 0.618034

where F and p∗ are defined in Example 3.
One can easily check that for every formula φ the following equations hold, relating the game values for φ and ∼φ:

1 = c_{φ(v₁,...,vₙ)} + c_{∼φ(v₀,v₁,...,vₙ)} = c̄_{φ(v₁,...,vₙ)} + c̲_{∼φ(v₀,v₁,...,vₙ)} = c̄_{∼φ(v₀,v₁,...,vₙ)} + c̲_{φ(v₁,...,vₙ)}   (8)

where we used a dummy variable v₀, not occurring in the formula φ, to enforce that x₁ or X₁ (according to the type of the game) be chosen by ∀. Observe that by (8) we have Γ̲ = {1 − γ : γ ∈ Γ̄}. We also have the following inequalities:

c̲_{φ(v₁,...,vₙ)} ≤ c_{φ(v₁,...,vₙ)} ≤ c̄_{φ(v₁,...,vₙ)}.

Theorem 3. For every c ∈ Γ̄ \ {0} the following problem is PSPACE-hard: Given φ, decide whether ∃X₁∀x₂∃X₃ . . . Qₙκₙ P(φ) ≥ c.

Theorem 4. For every c ∈ Γ̲ \ {1} the following problem is PSPACE-hard: Given φ, decide whether ∃x₁∀X₂∃x₃ . . . Qₙχₙ P(φ) > c.

Theorems 1, 2, 3, 4 are summarized below. We rephrase them in game-theoretic terms. That is, the problem concerning ∃X₁∀x₂∃X₃ . . . Qₙκₙ P(φ) > c is considered as the problem of ∃, using a probabilistic strategy, playing against ∀, who uses a classical strategy. Similarly for the other cases.

Summary of the complexity results. Assume φ is given and c is an arbitrary fixed number in [0, 1), unless otherwise stated. We pose three questions: whether ∃ can make (i) P(φ) = 1, (ii) P(φ) > c, (iii) P(φ) ≥ c. Our complexity results (when one or both players are probabilistic) depend on the nature of the strategies that both players use. (Of course, if both players are classical, the results are obvious consequences of the PSPACE-completeness of QSAT.)
P(φ) = 1

∃ \ ∀         | Probabilistic | Classical
Classical     | Σ₂ᴾ-complete  | PSPACE-complete
Probabilistic | Σ₂ᴾ-complete  | PSPACE-complete

P(φ) > c

∃ \ ∀         | Probabilistic | Classical
Classical     | PSPACE-hard** | PSPACE-complete
Probabilistic | Σ₂ᴾ-hard*     | PSPACE-hard*

P(φ) ≥ c

∃ \ ∀         | Probabilistic | Classical
Classical     | ?             | PSPACE-complete
Probabilistic | ?             | PSPACE-hard***

* when c is part of the input
** when c ∈ Γ̲ \ {1}
*** when c ∈ Γ̄ \ {0}
The next theorem yields partial information concerning the shape of the sets Γ, Γ̄ and Γ̲. A number b is a binary rational if b = Σᵢ₌₁ⁿ bᵢ/2ⁱ for some n and some b₁, . . . , bₙ ∈ {0, 1}. Let Υ be the set of all binary rationals in [0, 1].

Theorem 5. Υ ⊊ Γ, Υ ⊊ Γ̄ and Υ ⊊ Γ̲.

Corollary. The sets Γ, Γ̄ and Γ̲ are dense subsets of the interval [0, 1].

We say that λ′ is ε-close to λ if |λ − λ′| ≤ ε.

Theorem 6. Let Δᵢ = [0, 1] or Δᵢ = {0, 1} for every 1 ≤ i ≤ n. Given φ(x₁, . . . , xₙ) and ε > 0, we can compute in O(log₂|φ| + n log₂ n + n|log₂ ε|) space a number λ′ that is ε-close to λ = max_{p₁∈Δ₁} min_{p₂∈Δ₂} max_{p₃∈Δ₃} . . . opt_{pₙ∈Δₙ} P_{p₁,...,pₙ}(φ).
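Theorem 6 rests on a careful space-efficient evaluation; as a rough illustration only (not the paper's algorithm), the alternating optimum can be approximated by discretizing each continuous choice, reusing prob from the sketch above (the grid parameter and names are ours):

    def approx_value(phi, domains, grid=101):
        """Approximate max_{p1} min_{p2} ... P_{p1..pn}(phi); domains[i] is
        'unit' for [0,1] (sampled on a grid) or 'bool' for {0,1}. Exponential
        time and only grid-accurate: an illustration, not Theorem 6's bound."""
        def rec(prefix):
            i = len(prefix)
            if i == len(domains):
                return prob(phi, prefix)
            pts = ([k / (grid - 1) for k in range(grid)]
                   if domains[i] == 'unit' else [0.0, 1.0])
            best = max if i % 2 == 0 else min
            return best(rec(prefix + [p]) for p in pts)
        return rec([])

    # e.g. approx_value(iff, ['unit', 'bool']) is close to 1/2.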
In particular, we can compute approximations of the game values c_φ, c̄_φ, c̲_φ within the bound just mentioned. One may ask if Theorem 6 could be used to solve Problem 1 in polynomial space, at least for some c. Lemma 2 enables us to give an affirmative answer to this question.

Lemma 2. Let D ⊆ Σ* be a language over a finite alphabet Σ, |Σ| ≥ 2, and let P be a map P : D → [0, 1]. Assume that for a given d ∈ D we can compute in space O(Poly(|d|, |log ε|)) a value P̃(d, ε) that is ε-close to P(d). Let ⋄ ∈ {≥, >}. Then the set {c ∈ [0, 1] : the language {d ∈ D | P(d) ⋄ c} is in PSPACE} is a dense subset of [0, 1].

As a corollary we get:

Theorem 7. Let ⋄ ∈ {≥, >}. The sets

{c ∈ [0, 1] : the language {φ ∈ Φ | c_φ ⋄ c} is in PSPACE},
{c ∈ [0, 1] : the language {φ ∈ Φ | c̄_φ ⋄ c} is in PSPACE},
{c ∈ [0, 1] : the language {φ ∈ Φ | c̲_φ ⋄ c} is in PSPACE}

are dense subsets of [0, 1].
4 Conclusion
We have completely answered the question of the complexity of the problem of whether ∃ has a strategy to achieve payoff 1, for all combinations of types of players. (For both players classical this is the classical QSAT problem.) We have shown PSPACE-hardness of the question whether a classical ∃ can make the payoff greater than a fixed c when ∀ uses a probabilistic strategy. In the case of a probabilistic ∃ and a classical ∀, we need c to be part of the input to
get PSPACE-hardness. We have a PSPACE-hardness result in the case of fixed c when we ask whether ∃ can make the payoff greater than or equal to c. We have given a Σ₂ᴾ lower bound for the question "P(φ) > c?" in the case of both players being probabilistic and c belonging to the input. We also indicate that for every mentioned problem it is possible to find a dense subset of thresholds for which the problem is in PSPACE. Still, many problems remain open. It would be nice to have a PSPACE-completeness result for the question "P(φ) > c?" or "P(φ) ≥ c?" for some fixed c (c = ½ for instance) and for all combinations of types of players. Also, the complexity of the problem of computing approximations of game values (or exact values if possible) remains to be studied. This is the subject of ongoing research.
Acknowledgement. The author wishes to express his thanks to Prof. Damian Niwiński for many stimulating conversations.
References
1. A. Chandra, D. Kozen, and L. Stockmeyer, Alternation, Journal of the ACM, 28 (1981), pp. 114-133
2. A. Condon, Computational Models of Games, ACM Distinguished Dissertation, MIT Press, Cambridge, 1989
3. A. Condon and R. Ladner, Probabilistic Game Automata, Proceedings of the 1st Structure in Complexity Theory Conference, Lecture Notes in Computer Science, vol. 223, Springer, Berlin, 1986, pp. 144-162
4. J. Feigenbaum, D. Koller, P. Shor, A Game-Theoretic Classification of Interactive Complexity Classes, Proceedings of the 10th Annual IEEE Conference on Structure in Complexity Theory (STRUCTURES), Minneapolis, Minnesota, June 1995, pp. 227-237
5. S. Goldwasser, M. Sipser, Private coins versus public coins in interactive proof systems, Randomness and Computation, S. Micali, editor, vol. 5 of Advances in Computing Research, JAI Press, Greenwich, 1989, pp. 73-90
6. D. Koller, N. Megiddo, The Complexity of Two-Person Zero-Sum Games in Extensive Form, Games and Economic Behavior, 4:528-552, 1992
7. D. Koller, N. Megiddo, B. von Stengel, Fast Algorithms for Finding Randomized Strategies in Game Trees, Proceedings of the 26th Symposium on Theory of Computing, ACM, New York, 1994, pp. 750-759
8. R. Lipton, N. Young, Simple strategies for large zero-sum games with applications to complexity theory, Contributions to the Theory of Games II, H. Kuhn, A. Tucker, editors, Princeton University Press, Princeton, 1953, pp. 193-216
9. C. Papadimitriou, Games Against Nature, Journal of Computer and System Sciences, 31 (1985), pp. 288-301
10. C. Papadimitriou, Computational Complexity, Addison-Wesley, 1994
11. C. Papadimitriou, M. Yannakakis, On Complexity as Bounded Rationality, Proceedings of the 26th Symposium on Theory of Computing, ACM, New York, 1994, pp. 726-733
12. M. Rychlik, On Probabilistic Quantified Satisfiability Games, available at http://www.mimuw.edu.pl/~mrychlik/papers
A Completeness Property of Wilke's Tree Algebras

Saeed Salehi

Turku Center for Computer Science
Lemminkäisenkatu 14 A
FIN-20520 Turku
[email protected]
Abstract. Wilke's tree algebra formalism for characterizing families of tree languages is based on six operations involving letters, binary trees and binary contexts. In this paper a completeness property of these operations is studied. It is claimed that all functions involving letters, binary trees and binary contexts which preserve all syntactic tree algebra congruence relations of tree languages are generated by Wilke's functions, provided the alphabet contains at least seven letters. The long proof is omitted due to the page limit. Instead, a corresponding theorem for term algebras, which yields a special case of the above-mentioned theorem, is proved: in every term algebra whose signature contains at least seven constant symbols, all congruence preserving functions are term functions.
1 Introduction
A new formalism for characterizing families of tree languages was introduced by Wilke [13]; it can be regarded as a combination of the universal algebraic framework of Steinby [11] and Almeida [1] (in the case of binary trees, based on syntactic algebras) and the syntactic monoid/semigroup framework of Thomas [12] and of Nivat and Podelski [8], [9]. It is based on three-sorted algebras, whose signature Σ consists of six operation symbols involving the sorts Alphabet, Tree and Context. Binary trees over an alphabet are represented by terms over Σ, namely as Σ-terms of sort Tree. A tree algebra is a Σ-algebra satisfying certain identities which identify (some) pairs of Σ-terms representing the same tree. The syntactic tree algebra congruence relation of a tree language is defined in a natural way (see Definition 1 below). The Tree-sort component of the syntactic tree algebra of a tree language is the syntactic algebra of the language in the sense of [11], while its Context-component is the semigroup part of the syntactic monoid of the tree language, as in [12]. A tree language is regular iff its syntactic tree algebra is finite ([13], Proposition 2). A special subclass of regular tree languages, that of k-frontier testable tree languages, is characterized in [13] by a set of identities satisfied by the corresponding syntactic tree algebra. For characterizing this subclass, the three-sorted tree algebra framework appears to be more suitable, since "frontier testable tree languages cannot be characterized by syntactic semigroups and there is no known
finite characterization of frontier testability (for an arbitrary k) in the universal algebra framework" [7]. This paper concerns Wilke's functions (Definition 2), by which the tree algebra formalism for characterizing families of tree languages is established [13]. We claim that Wilke's functions generate all congruence preserving operations on the term algebra of trees, when the alphabet contains at least seven labels. For the sake of brevity, we do not treat tree languages and Wilke's functions in the many-sorted algebra framework as is done in [13]; our approach is rather a continuation of the traditional framework, as in e.g. [11]. A more comprehensive general study of tree algebras and Wilke's formalism (independent from this work) has been initiated by Steinby and Salehi [10].
2 Tree Algebraic Functions
For an alphabet A, let Σ A be the signature which contains a constant symbol ca and a binary function symbol fa for every a ∈ A, that is Σ A = {ca | a ∈ A} ∪ {fa | a ∈ A}. The set of binary trees over A, denoted by TA , is defined inductively by: – ca ∈ TA for every a ∈ A; and – fa (t1 , t2 ) ∈ TA whenever t1 , t2 ∈ TA and a ∈ A. Fix a new symbol ξ which does not appear in A. Binary contexts over A are binary trees over A ∪ {ξ} in which ξ appears exactly once. The set of binary contexts over A, denoted by CA , can be defined inductively by: – ξ, fa (t, ξ), fa (ξ, t) ∈ CA for every a ∈ A and every t ∈ TA ; and – fa (t, p), fa (p, t) ∈ CA for every a ∈ A, every t ∈ TA , and every p ∈ CA . For p, q ∈ CA and t ∈ TA , p(q) ∈ CA and p(t) ∈ TA are obtained from p by replacing the single occurrence of ξ by q or by t, respectively. Definition 1. For a tree language L ⊆ TA we define the syntactic tree algebra L L congruence relation of L, denoted by ≈L = (≈L A , ≈T , ≈C ), as follows: 1. For any a, b ∈ A, a ≈L A b ≡ ∀p ∈ CA {p(ca ) ∈ L ↔ p(cb ) ∈ L} & ∀p ∈ CA ∀t1 , t2 ∈ TA {p(fa (t1 , t2 )) ∈ L ↔ p(fb (t1 , t2 )) ∈ L}. 2. For any t, s ∈ TA , t ≈L T s ≡ ∀p ∈ CA {p(t) ∈ L ↔ p(s) ∈ L}. 3. For any p, q ∈ CA , p ≈L C q ≡ ∀r ∈ CA ∀t ∈ TA {r(p(t)) ∈ L ↔ r(q(t)) ∈ L}. Remark 1. Our definition of syntactic tree algebra congruence relation of a tree language is that of [13], but we have corrected a mistake in Wilke’s definition of ≈L A ; it is easy to see that the original definition (page 72 of [13]) does not yield a congruence relation. Another difference is that ξ is not a context in [13].
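A tiny sketch of the trees and contexts just defined, under an ad-hoc tuple encoding that is ours, not the paper's: leaves c_a are ('c', a), inner nodes f_a(t1, t2) are ('f', a, t1, t2), and the hole ξ is the token 'xi'.

    def apply(p, x):
        """p(x): substitute x (a tree or another context) for the unique
        occurrence of the hole xi in the context p."""
        if p == 'xi':
            return x
        if p[0] == 'c':            # a leaf c_a contains no hole
            return p
        _, a, left, right = p      # p = ('f', a, left, right)
        return ('f', a, apply(left, x), apply(right, x))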
Definition 2. ([13], page 88) For an alphabet A, Wilke's functions over A are defined by:

ι^A : A → T_A,           ι^A(a) = c_a
κ^A : A × T_A² → T_A,    κ^A(a, t₁, t₂) = f_a(t₁, t₂)
λ^A : A × T_A → C_A,     λ^A(a, t) = f_a(ξ, t)
ρ^A : A × T_A → C_A,     ρ^A(a, t) = f_a(t, ξ)
σ^A : C_A² → C_A,        σ^A(p₁, p₂) = p₁(p₂)
η^A : C_A × T_A → T_A,   η^A(p, t) = p(t)
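Under the same encoding, Wilke's six functions become one-liners (a sketch reusing apply from above); as a usage example, Example 1's decomposition of F below can be written and evaluated directly:

    iota  = lambda a: ('c', a)                  # iota^A(a) = c_a
    kappa = lambda a, t1, t2: ('f', a, t1, t2)  # kappa^A(a, t1, t2) = f_a(t1, t2)
    lam   = lambda a, t: ('f', a, 'xi', t)      # lambda^A(a, t) = f_a(xi, t)
    rho   = lambda a, t: ('f', a, t, 'xi')      # rho^A(a, t)    = f_a(t, xi)
    sigma = lambda p1, p2: apply(p1, p2)        # sigma^A(p1, p2) = p1(p2)
    eta   = lambda p, t: apply(p, t)            # eta^A(p, t)     = p(t)

    # Example 1's F(x, t, p), composed from Wilke's functions:
    F = lambda x, t, p: sigma(lam('a', eta(p, kappa('b', t, iota(x)))),
                              rho(x, kappa('b', iota('a'), iota('a'))))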
Recall that projection functions πⱼⁿ : B₁ × ··· × Bₙ → Bⱼ (for sets B₁, . . . , Bₙ) are defined by πⱼⁿ(b₁, . . . , bₙ) = bⱼ. For b ∈ Bⱼ, the constant function from B₁ × ··· × Bₙ to Bⱼ determined by b is defined by (b₁, . . . , bₙ) ↦ b.

Definition 3. For an alphabet A, a function F : Aⁿ × T_A^m × C_A^k → X, where X ∈ {A, T_A, C_A}, is called tree-algebraic over A if it is a composition of Wilke's functions over A, constant functions, and projection functions.

Example 1. Let A = {a, b}. The function F : A × T_A × C_A → C_A defined by F(x, t, p) = f_a(f_x(f_b(c_a, c_a), ξ), p(f_b(t, c_x))), for x ∈ A, t ∈ T_A and p ∈ C_A, is tree-algebraic over A. Indeed, F(x, t, p) = σ^A(λ^A(a, η^A(p, κ^A(b, t, ι^A(x)))), ρ^A(x, f_b(c_a, c_a))).
m
k
Definition 4. A function F : A × TA × CA → X where X ∈ {A, TA , CA } is called congruence preserving over A, if for every tree language L ⊆ TA and for all a1 , b1 , · · · , an , bn ∈ A, t1 , s1 , · · · , tm , sm ∈ TA , p1 , q1 , · · · , pk , qk ∈ CA , L L L if a1 ≈L A b1 , · · · , an ≈A bn , t1 ≈T s1 , · · · , tm ≈T sm , L L and p1 ≈C q1 , · · · , pk ≈C qk , then F (a1 , · · · , an , t1 , · · · , tm , p1 , · · · , pk ) ≈L x F (b1 , · · · , bn , s1 , · · · , sm , q1 , · · · , qk ),
where x is A, T, or C, if X = A, X = TA , or X = CA , respectively. Remark 2. In universal algebra, the functions which preserve congruence relations of an algebra, are called congruence preserving functions. On the other hand it is known that every congruence relation over an algebra is the intersection of some syntactic congruence relations (see Remark 2.12 of [1] or Lemma 6.2 of [11].) So, a function preserve all congruence relations of an algebra iff it preserves the syntactic congruence relations of all subsets of the algebra. This justifies the notion of congruence preserving function in our Definition 4, even though we require that the function preserves only all the syntactic tree algebra congruence relations of tree languages.
Example 2. For A = {a, b}, the root function root : T_A → A, which maps a tree to its root label, is not congruence preserving: let L = {f_a(c_b, c_b)}; then f_a(c_a, c_a) ≈L_T f_b(c_a, c_a), but since f_a(c_b, c_b) ∈ L and f_b(c_b, c_b) ∉ L, root(f_a(c_a, c_a)) = a ≉L_A b = root(f_b(c_a, c_a)).

Lemma 1. All tree-algebraic functions are congruence preserving.

The easy proof is omitted. We claim the converse for alphabets containing at least seven labels:

Theorem 1. For an alphabet A which contains at least seven labels, every congruence preserving function over A is tree-algebraic over A.

Remark 3. The condition |A| ≥ 7 in Theorem 1 may seem odd at first glance, but the theorem does not hold for |A| = 2: let A = {a, b} and define F : A → T_A by F(a) = f_a(c_b, c_b), F(b) = f_b(c_a, c_a). It can easily be seen that F is congruence preserving but is not tree-algebraic over A. It is not clear at the moment whether Theorem 1 holds for 3 ≤ |A| ≤ 6.

The long detailed proof of Theorem 1 will not be given in this paper for reasons of space. Instead, in the next section, a corresponding theorem for term algebras is proved, which immediately yields Theorem 1 for congruence preserving functions of the form F : T_A^m → T_A.
3 Congruence Preserving Functions in Term Algebras
Our notation follows mainly [2], [3], [5], [6], and [11]. A ranked alphabet is a finite nonempty set of symbols each of which has a unique non-negative arity (or rank). The set of m-ary symbols in a ranked alphabet Σ is denoted by Σₘ (for each m ≥ 0). T_Σ(X) is the set of Σ-terms with variables in X. For empty X it is simply denoted by T_Σ. Note that (T_Σ(X), Σ) is a Σ-algebra, and (T_Σ, Σ) is called the term algebra over Σ. For L ⊆ T_Σ, let ≈L be the syntactic congruence relation of L ([11]), i.e., the greatest congruence on the term algebra T_Σ saturating L. Let Σ denote a signature with the property that Σ ≠ Σ₀. Throughout, X is always a set of variables.

Definition 5. A function F : (T_Σ)ⁿ → T_Σ is congruence preserving if for every congruence relation Θ over T_Σ and all t₁, . . . , tₙ, s₁, . . . , sₙ ∈ T_Σ, if t₁Θs₁, . . . , tₙΘsₙ, then F(t₁, . . . , tₙ)ΘF(s₁, . . . , sₙ).

Remark 4. A congruence preserving function F : (T_Σ)ⁿ → T_Σ induces a well-defined function F_Θ : (T_Σ/Θ)ⁿ → T_Σ/Θ on any quotient algebra, for any congruence Θ on T_Σ, defined by F_Θ([t₁]_Θ, . . . , [tₙ]_Θ) = [F(t₁, . . . , tₙ)]_Θ.
For terms u₁, . . . , uₙ ∈ T_Σ(X) and t ∈ T_Σ(X ∪ {x₁, . . . , xₙ}) with x₁, . . . , xₙ ∉ X, the term t[x₁/u₁, . . . , xₙ/uₙ]¹ ∈ T_Σ(X) results from t by replacing all occurrences of xᵢ by uᵢ, for all i ≤ n. The function (T_Σ)ⁿ → T_Σ(X) defined by (u₁, . . . , uₙ) ↦ t[x₁/u₁, . . . , xₙ/uₙ] for all u₁, . . . , uₙ ∈ T_Σ is called the term function² defined by t. The rest of the paper is devoted to the proof of the following theorem:

Theorem 2. If |Σ₀| ≥ 7, then every congruence preserving F : (T_Σ)ⁿ → T_Σ, for every n ∈ IN, is a term function (i.e., there is a term t ∈ T_Σ({x₁, . . . , xₙ}), where x₁, . . . , xₙ are variables, such that F(u₁, . . . , uₙ) = t[x₁/u₁, . . . , xₙ/uₙ] for all u₁, . . . , uₙ ∈ T_Σ).

Remark 5. Theorem 2 does not hold for |Σ₀| = 1: Let Σ = Σ₀ ∪ Σ₁ be a signature with Σ₁ = {α} and Σ₀ = {ζ₀}. The term algebra (T_Σ, Σ) is isomorphic to (IN, 0, S), where 0 is the constant zero and S is the successor function. Let F : IN → IN be defined by F(n) = 2n. It is easy to see that F is congruence preserving: for every congruence relation Θ, if nΘm then SnΘSm, and by repeating the same argument n times we get SⁿnΘSⁿm, or 2nΘn + m. Similarly SᵐnΘSᵐm, so m + nΘ2m; hence 2mΘ2n, that is, F(n)ΘF(m). But F is not a term function, since all term functions are of the form n ↦ Sᵏn = k + n for a fixed k ∈ IN. It is not clear at the moment whether Theorem 2 holds for 2 ≤ |Σ₀| ≤ 6.

Remark 6. Finite algebras having the property that all congruence preserving functions are term functions are called hemi-primal in universal algebra (see e.g. [3]). Our assumption Σ ≠ Σ₀ in Theorem 2 implies that T_Σ is infinite.

Remark 7. Theorem 2 yields Theorem 1 for congruence preserving functions of the form F : T_Aⁿ → T_A, since (T_A, Σ_A) is the term algebra over the signature Σ_A, and its every term function can be represented by ι^A and κ^A (recall that c_a = ι^A(a) and f_a(t₁, t₂) = κ^A(a, t₁, t₂) for every a ∈ A and t₁, t₂ ∈ T_A).

Proof of Theorem 2

Definition 6.
– An interpretation of X in T_Σ is a function ε : X → T_Σ. Its unique extension to a Σ-homomorphism T_Σ(X) → T_Σ is denoted by ε*.
– Any congruence relation Θ on T_Σ is extended to a congruence relation Θ* on T_Σ(X) defined by the following relation for any p, q ∈ T_Σ(X): p Θ* q if for every interpretation ε : X → T_Σ, ε*(p) Θ ε*(q) holds.
– A function G : T_Σ → T_Σ(X) is congruence preserving if for every congruence relation Θ on T_Σ and t, s ∈ T_Σ, if tΘs, then G(t)Θ*G(s).

The classical proof of the following lemma is not presented here.

¹ Denoted by t[u₁, . . . , uₙ] in [4].
² It is also called the tree substitution operation, see e.g. [4].
Lemma 2. The term function T_Σ → T_Σ(X), u ↦ t[x/u], defined by any term t ∈ T_Σ(X ∪ {x}) (x ∉ X), is congruence preserving.

Definition 7. Let t be a term in T_Σ(X) and C ⊆ T_Σ(X); then t is called independent from C if it is not a subterm of any member of C and no member of C is a subterm of t.

For a term rewriting system R and a term u, let Δ*_R(u) be the set of R-descendants of {u} (cf. [6]), and for a set of terms C, let Δ*_R(C) = ⋃_{u∈C} Δ*_R(u). A useful property of the notion of independence is the following:

Lemma 3. Let u ∈ T_Σ(X) be independent from C ⊆ T_Σ(X) and let R be the single-rule (ground-)term rewriting system {w → u}, where w is any term in T_Σ(X). Then L = Δ*_R(C) is closed under the rewriting rule u → w, and also u ≈L w. Moreover, every member of L results from a member of C by replacing some w subterms of it by u.

Proof. Straightforward, once we note that any application of the rule w → u to a member of C does not result in a new subterm of the form w, and that all u's appearing in the members of L (as subterms) are obtained by applying the (ground-term) rewriting rule w → u.
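Lemma 3's set of descendants is computable for ground instances by exhaustive rewriting; a sketch (our encoding: terms as nested tuples (symbol, child, ..., child)), which terminates in the situation of the lemma, where rewriting w to u creates no new occurrence of w:

    def rewrite_once(t, w, u):
        """All terms obtained from t by replacing one occurrence of w by u."""
        out = [u] if t == w else []
        for i in range(1, len(t)):
            for s in rewrite_once(t[i], w, u):
                out.append(t[:i] + (s,) + t[i + 1:])
        return out

    def descendants(seed, w, u):
        """Delta*_{w -> u}({seed}): close {seed} under the rule w -> u."""
        seen, todo = {seed}, [seed]
        while todo:
            for s in rewrite_once(todo.pop(), w, u):
                if s not in seen:
                    seen.add(s)
                    todo.append(s)
        return seen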
Proposition 1. For any C ⊂ T_Σ(X) such that |C| < |Σ_0|, there is a term in T_Σ which is independent from C.

Proof. For each c ∈ Σ_0 choose a t_c ∈ T_Σ that is higher (has greater height) than all terms in C and contains no constant symbol other than c. Then no t_c is a subterm of any member of C. On the other hand, no term in C may appear as a subterm of more than one of the terms t_c (for c ∈ Σ_0). Since the number of the t_c's for c ∈ Σ_0 is greater than the number of elements of C, by the Pigeonhole Principle there must exist a t_c that is independent from C.
Lemma 4. Let G : T_Σ → T_Σ(X) be congruence preserving, ε : X → T_Σ be an interpretation, and u, v ∈ T_Σ. If v is independent from {u, ε*(G(u))}, then ε*(G(v)) ∈ ∆*_{u→v}(ε*(G(u))). Moreover, ε*(G(v)) results from ε*(G(u)) by replacing some u subterms by v.

Proof. Let L = ∆*_{u→v}(ε*(G(u))). By Lemma 3, u ≈_L v. The function G is congruence preserving, so ε*(G(u)) ≈_L ε*(G(v)), and since ε*(G(u)) ∈ L, also ε*(G(v)) ∈ L. The second claim follows from the independence of v from {u, ε*(G(u))}.

Recall that for a position p of the term t, t|_p is the subterm of t at the position p (cf. [2]).
Lemma 5. Suppose |Σ_0| ≥ 7, and let G : T_Σ → T_Σ(X) be congruence preserving. If v is independent from {u, G(u)}, for u, v ∈ T_Σ, then G(v) results from G(u) by replacing some of its u subterms by v.

Proof. By Proposition 1, there are w, w_1, w_2 such that w is independent from {u, G(u), v, G(v)}, w_1 is independent from {w, u, G(u), v, G(v)}, and w_2 is independent from {w, w_1, u, G(u), v, G(v)}. Define the interpretation ε : X → T_Σ by setting ε(x) = w for all x ∈ X. By the choice of w, v is independent from {u, ε*(G(u))}. So we can apply Lemma 4 to infer that ε*(G(v)) results from ε*(G(u)) by replacing some u subterms by v. Note that G(v) is obtained by substituting all w's in ε*(G(v)) by members of X; the same is true for G(u) and ε*(G(u)). The positions of ε*(G(v)) in which w appears are exactly the same positions of ε*(G(u)) in which w appears (by the choice of w). So the positions of G(v) in which a member of X appears are exactly the same positions of G(u) in which a member of X appears. We claim that identical members of X appear in those identical positions of G(u) and G(v): if not, there are x_1, x_2 ∈ X such that G(v)|_p = x_1 and G(u)|_p = x_2 for some position p of G(u) (and of G(v)). Define the interpretation δ : X → T_Σ by δ(x_1) = w_1, δ(x_2) = w_2, and δ(x) = w for all x ≠ x_1, x_2. Then δ*(G(v))|_p = w_1 and δ*(G(u))|_p = w_2. On the other hand, by Lemma 4, δ*(G(v)) results from δ*(G(u)) by replacing some u subterms by v. By the choice of w_1 and w_2, such a replacement cannot affect the appearance of w_1 or w_2, and hence the subterms of δ*(G(v)) and δ*(G(u)) at the position p must be identical, a contradiction. This proves the claim, which implies that G(v) results from G(u) by replacing some u subterms by v.
Lemma 6. Suppose |Σ_0| ≥ 7, and let G : T_Σ → T_Σ(X) be congruence preserving. Then for any u, v ∈ T_Σ, G(v) results from G(u) by replacing some u subterms by v.

Proof. By Proposition 1, there is a w ∈ T_Σ independent from {u, G(u), v, G(v)}. By Lemma 5, G(w) is obtained from G(u) by replacing some u subterms by w, and it also results from G(v) by replacing some v subterms by w. By the choice of w, all w's appearing in G(w) have been obtained either by replacing u by w in G(u) or by replacing v by w in G(v). Since the only difference between G(v) and G(w) is in the positions of G(w) where w appears, and the same is true for the difference between G(u) and G(w), G(v) can be obtained from G(u) by replacing by v exactly those u subterms that were replaced by w to obtain G(w).
Lemma 7. If |Σ_0| ≥ 7, then every congruence preserving function G : T_Σ → T_Σ(X) is a term function (i.e., there is a term t ∈ T_Σ(X ∪ {x}), where x ∉ X, such that G(u) = t[x/u] for all u ∈ T_Σ).
Proof. Fix a u_0 ∈ T_Σ, and choose a v ∈ T_Σ such that v is independent from {u_0, G(u_0)}. (By Proposition 1 such a v exists.) Then by Lemma 6, G(v) results from G(u_0) by replacing some u_0 subterms by v. Let y be a new variable (y ∉ X) and let t ∈ T_Σ(X ∪ {y}) result from G(u_0) by putting y exactly in those positions where u_0's are replaced by v's to get G(v). So G(u_0) = t[y/u_0] and G(v) = t[y/v]; moreover, all v's in G(v) are obtained from t by substituting all y's by v. We show that G(u) = t[y/u] holds for an arbitrary u ∈ T_Σ: Take a u ∈ T_Σ. By Proposition 1, there is a w independent from the set {u_0, G(u_0), v, G(v), u, G(u)}. By Lemma 6, G(w) results from G(v) by replacing some v subterms by w. We claim that all v's in G(v) are replaced by w's to get G(w). If not, then v must be a subterm of G(w). From the fact (Lemma 6) that G(u_0) results from G(w) by replacing some w subterms by u_0 (and the choice of w), we can infer that v is a subterm of G(u_0), which contradicts the choice of v. So the claim is proved, and we can write G(w) = t[y/w]; moreover, all w's in G(w) are obtained from t by substituting y by w. Again by Lemma 6, G(u) results from G(w) by replacing some w subterms by u. We claim that all w's appearing in G(w) are replaced by u to get G(u), since otherwise w would be a subterm of G(u), in contradiction with the choice of w. This shows that G(u) = t[y/u].
Theorem 2. If |Σ_0| ≥ 7, then every congruence preserving F : (T_Σ)^n → T_Σ, for every n ∈ IN, is a term function.

Proof. We proceed by induction on n. For n = 1 it is Lemma 7 with X = ∅. For the induction step let F : (T_Σ)^{n+1} → T_Σ be a congruence preserving function. For any u ∈ T_Σ define F_u : (T_Σ)^n → T_Σ by F_u(u_1, · · · , u_n) = F(u_1, · · · , u_n, u). By the induction hypothesis every F_u is a term function, i.e., there is an s ∈ T_Σ({x_1, · · · , x_n}) such that F_u(u_1, · · · , u_n) = s[x_1/u_1, · · · , x_n/u_n] for all u_1, · · · , u_n ∈ T_Σ. Denote the corresponding term for u by t_u (it is straightforward to see that such a term s is unique for every u). The mapping T_Σ → T_Σ({x_1, · · · , x_n}) defined by u ↦ t_u is also congruence preserving; hence, by Lemma 7, it is a term function. So there is a t ∈ T_Σ({x_1, · · · , x_n, x_{n+1}}) such that t_u = t[x_{n+1}/u], hence F(u_1, · · · , u_n, u_{n+1}) = F_{u_{n+1}}(u_1, · · · , u_n) = t_{u_{n+1}}[x_1/u_1, · · · , x_n/u_n] = t[x_{n+1}/u_{n+1}][x_1/u_1, · · · , x_n/u_n]. So F(u_1, · · · , u_{n+1}) = t[x_1/u_1, · · · , x_n/u_n, x_{n+1}/u_{n+1}], i.e., F is a term function.
Acknowledgement

I am very grateful to Professor Magnus Steinby for reading drafts of this paper and for his fruitful ideas, comments and support.
References
1. Almeida J., "On pseudovarieties, varieties of languages, filters of congruences, pseudoidentities and related topics", Algebra Universalis, Vol. 27 (1990), pp. 333–350.
2. Bachmair L., "Canonical equational proofs", Progress in Theoretical Computer Science, Birkhäuser, Boston MA, 1991.
3. Denecke K. & Wismath S. L., "Universal algebra and applications in theoretical computer science", Chapman & Hall/CRC, Boca Raton FL, 2002.
4. Fülöp Z. & Vágvölgyi S., "Minimal equational representations of recognizable tree languages", Acta Informatica, Vol. 34, No. 1 (1997), pp. 59–84.
5. Gécseg F. & Steinby M., "Tree languages", in: Rozenberg G. & Salomaa A. (eds.), Handbook of Formal Languages, Vol. 3, Springer, Berlin (1997), pp. 1–68.
6. Jantzen M., "Confluent string rewriting", EATCS Monographs on Theoretical Computer Science 14, Springer-Verlag, Berlin, 1988.
7. Salomaa K., Review of [13] in AMS MathSciNet, MR 97f:68134.
8. Nivat M. & Podelski A., "Tree monoids and recognizability of sets of finite trees", Resolution of Equations in Algebraic Structures, Vol. 1, Academic Press, Boston MA (1989), pp. 351–367.
9. Podelski A., "A monoid approach to tree languages", in: Nivat M. & Podelski A. (eds.), Tree Automata and Languages, Elsevier, Amsterdam (1992), pp. 41–56.
10. Salehi S. & Steinby M., "Tree algebras and regular tree languages", in preparation.
11. Steinby M., "A theory of tree language varieties", in: Nivat M. & Podelski A. (eds.), Tree Automata and Languages, Elsevier, Amsterdam (1992), pp. 57–81.
12. Thomas W., "Logical aspects in the study of tree languages", Ninth Colloquium on Trees in Algebra and in Programming (Proc. CAAP'84), Cambridge University Press (1984), pp. 31–51.
13. Wilke T., "An algebraic characterization of frontier testable tree languages", Theoretical Computer Science, Vol. 154, No. 1 (1996), pp. 85–106.
Symbolic Topological Sorting with OBDDs
(Extended Abstract)

Philipp Woelfel

FB Informatik, LS2, Univ. Dortmund, 44221 Dortmund, Germany
[email protected]
Abstract. We present a symbolic OBDD algorithm for topological sorting which requires O(log^2 N) OBDD operations. Then we analyze its true runtime for the directed grid graph and show an upper bound of O(log^4 N). This is the first true runtime analysis of a symbolic OBDD algorithm for a fundamental graph problem, and it demonstrates that one may hope for such algorithms to behave well on sufficiently structured inputs.
1 Introduction
Algorithms on graphs form one of the best studied areas in computer science. Usually, a graph G = (V, E) is given by an adjacency list or by an adjacency matrix. Such an explicit representation of a graph requires space Θ(|V| + |E|) or Θ(|V|²), and for many graph problems efficient algorithms are known. However, there are several application areas where typical problem instances have such a large size that a linear or even polynomial runtime is not feasible, or where even the explicit representation of the problem instance itself may not fit into memory anymore. In order to deal with very large graphs, symbolic (or implicit) graph algorithms have been devised, where the vertex and edge sets of the involved graphs are stored symbolically, i.e., in terms of their characteristic functions. The characteristic functions are usually represented by so-called Binary Decision Diagrams (BDDs) or, more specifically, by Ordered Binary Decision Diagrams (OBDDs); see Section 2 for definitions. Such approaches have been successfully applied in the areas of model checking, circuit verification and finite state machine verification (see e.g. [2,3,4]). These applications can be viewed as particular cases of symbolic graph problems, which raises the question whether it is also possible to devise symbolic graph algorithms with a good behavior for fundamental graph theoretical problems. One approach in this direction was undertaken by Hachtel and Somenzi [5], who introduced a symbolic OBDD algorithm for the maximum flow problem in 0-1 networks. Promising experimental studies demonstrated that the algorithm is able to handle graphs with over 10^36 edges and that it is competitive with traditional algorithms on dense random graphs. The paper lacks, however, a theoretical analysis of its performance with
Supported in part by DFG grant We 1066/10-1
respect to runtime. Recently, Sawitzki [11,9] has analyzed the number of OBDD operations (i.e., the number of required synthesis operations on characteristic functions) of the flow algorithm of Hachtel and Somenzi and has proposed an improved algorithm. But note that there is only a very weak relation between the number of OBDD operations and the true runtime of a symbolic OBDD algorithm. The time required for one synthesis step is mainly influenced by the sizes of the involved OBDDs, which may range from linear to exponential (in the number of variables of the represented characteristic functions). In fact, we are not aware of any analysis of a symbolic OBDD algorithm with respect to its true runtime. (However, there is a recent true runtime analysis for a related type of decision diagrams, called Binary Moment Diagrams, showing that certain multiplier circuits can be verified in polynomial time [7].) The results and techniques presented here aim to be a first step towards filling this gap. First, we present a new OBDD algorithm for topologically sorting the N vertices of a directed acyclic graph which requires only O(log² N) OBDD operations on OBDDs for functions with at most 4 log N variables. Hence, if all OBDDs obtained during the execution of the algorithm have subexponential size, the total runtime is sublinear in the number of vertices of the graph. Then we analyze its true runtime for the directed grid graph and show an upper bound of O(log⁴ N). This demonstrates that one can in fact hope that such a fundamental graph algorithm behaves well for sufficiently structured inputs. For the analysis, we generalize the notion of threshold functions to multivariate threshold functions. We investigate the OBDD size of multivariate threshold (and modulo) functions and obtain strong results about the effect of OBDD operations such as quantification on these functions. Clearly, our analysis is a "good-case" analysis which is only valid for one particular input instance. We hope, though, that the techniques presented here are a good starting point for developing a framework which allows one to analyze symbolic algorithms for fundamental graph problems on larger classes of input instances. In fact, Sawitzki [10] has already successfully applied our framework to analyze the true runtime of his 0-1 network flow algorithm on the grid network.
2 OBDDs and Implicit Graph Representation
In the following, let B_n denote the class of boolean functions {0,1}^n → {0,1} and let X_n = {x_1, . . . , x_n} be a set of boolean variables. Let f ∈ B_n be a function defined over the variables in X_n. The subfunction of f where k variables x_{i_1}, . . . , x_{i_k} are fixed to k constants c_1, . . . , c_k ∈ {0,1} is denoted by f_{|x_{i_1}=c_1,...,x_{i_k}=c_k}. A variable ordering π on X_n is a permutation of the indices {1, . . . , n}, leading to the ordered list x_{π(1)}, . . . , x_{π(n)} of the variables. A π-OBDD on X_n for a variable ordering π is a directed acyclic graph with one root, two sinks labeled with 0 and 1, resp., and the following properties: Each inner node is labeled by a variable from X_n and has two outgoing edges, one of them labeled by 0, the other by 1. If an edge leads from a node labeled by x_i to a node labeled by x_j,
then π^{−1}(i) < π^{−1}(j). This means that any directed path passes the nodes in an order respecting the variable ordering π. A π-OBDD is said to represent a boolean function f ∈ B_n if, for any a = (a_1, . . . , a_n) ∈ {0,1}^n, the path starting at the root and leading from any x_i-node over the edge labeled by the value of a_i ends at the sink with label f(a). The size of a π-OBDD G is the number of its nodes and is denoted by |G|. The π-OBDD of minimal size for a given function f and a fixed variable ordering π is unique up to isomorphism. A π-OBDD is called reduced if it is the minimal π-OBDD. It is well-known that the size of any reduced π-OBDD for a function f ∈ B_n is bounded by O(2^n/n) (see [1] for the upper bound with the best constants known). Let f and g be functions in B_n and let G_f and G_g be π-OBDDs representing f and g, resp., for an arbitrary variable ordering π. In the following, we summarize the operations on OBDDs to which we will refer in this text. For a more detailed discussion of OBDDs and their operations we refer to the monograph [12].
– Evaluation: Given x ∈ {0,1}^n, compute f(x). This can trivially be done in time O(n).
– Minimization: Compute the reduced π-OBDD for f. This is possible in time O(|G_f|).
– Binary synthesis: Given a boolean operation ⊗ ∈ B_2, compute a reduced π-OBDD G_h representing the function h = f ⊗ g. This can be done in time O(|G*_h|), where G*_h is the graph which consists of all nodes in the product graph of G_f and G_g reachable from the root. The size of G_h is at most O(|G*_h|) = O(|G_f| · |G_g|).
– Replacement by constants: Given a sequence of variables x_{i_1}, . . . , x_{i_k} ∈ X_n and a sequence of constants c_1, . . . , c_k, compute a reduced π-OBDD G_h for the subfunction h := f_{|x_{i_1}=c_1,...,x_{i_k}=c_k} ∈ B_{n−k}. This is possible in time O(|G_f|) and the reduced π-OBDD G_h is of smaller size than G_f.
– Quantification: Given a variable x_i ∈ X_n and a quantifier Q ∈ {∃, ∀}, compute a reduced π-OBDD for the function h ∈ B_{n−1} with h := (Qx_i)f, where (∃x_i)f := f_{|x_i=0} ∨ f_{|x_i=1} and (∀x_i)f := f_{|x_i=0} ∧ f_{|x_i=1}. The time for computing this π-OBDD is determined by the time for determining the π-OBDDs for f_{|x_i=0} and f_{|x_i=1} and the time required for the binary synthesis of the two. Hence, it is bounded by O(|G_f|²).
– SAT enumeration: Enumerate all inputs x ∈ f^{−1}(1). Using simple DFS techniques, this can be done in optimal time O(|G_f| + n·|f^{−1}(1)|).

We can use OBDDs for an implicit graph representation by letting them represent the characteristic functions of the vertex and edge sets. For practical reasons, though, we assume throughout this text that the vertex set is V = {0,1}^n for some n ∈ N, so that a representation of V is not needed. It is easy to adapt the algorithm to other vertex sets. In order to encode integers using the binary notation we define |x| = 2^{n−1}x_{n−1} + · · · + 2^0 x_0 for x ∈ {0,1}^n.
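To make the evaluation operation concrete, the following minimal sketch stores an OBDD as a dictionary of nodes and walks a single root-to-sink path; the node encoding is our own illustration, not an implementation referenced by the paper.

```python
# A minimal OBDD sketch (illustrative encoding only).
# Inner nodes map a node id to (variable_index, low_child, high_child);
# the sinks are the ids 0 and 1, interpreted as the function value.

def evaluate(nodes, root, a):
    """Evaluate the function represented by the OBDD on input bits a.

    Follows one path from the root: at an x_i-node take the edge
    labeled by a[i]. Runs in time O(n), as stated in the text.
    """
    v = root
    while v not in (0, 1):                  # until a sink is reached
        i, low, high = nodes[v]
        v = high if a[i] else low
    return v

# Example: f(x0, x1) = x0 AND x1 with the ordering x0 < x1.
nodes = {
    "u": (0, 0, "w"),   # x0 = 0 -> 0-sink, x0 = 1 -> node w
    "w": (1, 0, 1),     # x1 = 0 -> 0-sink, x1 = 1 -> 1-sink
}
assert evaluate(nodes, "u", (1, 1)) == 1
assert evaluate(nodes, "u", (1, 0)) == 0
```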
3 The Topological Sorting Algorithm
Let G = (V, E), V = {0,1}^n, be a directed acyclic graph represented by a π-OBDD as described in the former section. The edge relation E defines in a natural way a partial order ⪯ on V, where v ⪯ w if and only if there exists a path from v to w. In the explicit case a topological sorting algorithm would enumerate all vertices in such a way that if u is enumerated before v, then v ⋠ u. In the implicit case, we hope for runtimes in the order of o(|V|), in which the enumeration of all vertices is not possible. Hence, a goal might be to obtain a complete order ≺ which inherits the properties of ⪯ (i.e., v ⪯ u implies v ≺ u for v ≠ u). Unless ⪯ is a complete order, ≺ is not uniquely defined by ⪯, and thus we assume that an arbitrary complete order ⊑ on the vertex set V is given (this may be fixed in advance for the algorithm or may be given as an additional parameter), which determines the order of the elements which are incomparable with respect to ⪯ (i.e., those with u ⋠ v and v ⋠ u). An alternative is to compute an OBDD which allows one to enumerate the elements in their topological order by simple SAT enumeration operations.

For any two vertices u, v we denote by ∆(u, v) the length of the longest path leading from u to v. (The length of a path is the number of its edges.) If no such path exists, then ∆(u, v) := −∞. Note that ∆(v, v) = 0, since the graph is acyclic. Furthermore, let ∆(v) := max{∆(u, v) | u ∈ V}. We call ∆(v) the length of the longest path to the vertex v. Let now DIST ∈ B_{2n} be defined to be 1 for an input (d, v) ∈ {0,1}^n × {0,1}^n if and only if ∆(v) = |d|. Clearly, |d_u| < |d_v| implies v ⋠ u, where d_u, d_v are the unique values with DIST(d_u, u) = 1 and DIST(d_v, v) = 1. Hence, if we have a π-OBDD G_DIST for the function DIST, we can use it to enumerate the vertices in an order respecting ⪯ by computing the π-OBDDs for DIST_{|d=a} for |a| = 0, 1, . . . and enumerating their satisfying inputs using the SAT enumeration procedure. We will see below how the OBDD G_DIST can in addition be used to obtain a complete order respecting ⪯.

In order to compute the function DIST, we use a method which is similar to that of computing the transitive closure by matrix squaring. For i ∈ {0, . . . , n} and u, v ∈ V let T_i(u, v) be the boolean function with function value 1 if and only if there exists a simple path from u to v which has length exactly 2^i. We can compute OBDDs for all T_i as follows: T_0(u, v) = E(u, v)
and T_{i+1}(u, v) = ∃w : T_i(u, w) ∧ T_i(w, v).   (S1)
Now we define the functions DIST_j ∈ B_{2n−j} for 0 ≤ j ≤ n. Such a function takes as input an (n − j)-bit value d* = d_{n−1} . . . d_j and a vertex v (for j = n, d* is the empty string). The function value DIST_j(d*, v) is defined by

DIST_j(d*, v) = 1  ⇔  2^j·|d*| ≤ ∆(v) < 2^j·(|d*| + 1).   (∗)
I.e., DIST_j(d*, v) is true if the bits d_{n−1} . . . d_j are exactly the n − j most significant bits of the binary representation of the integer ∆(v). Clearly, DIST = DIST_0. The functions DIST_j can be computed by DIST_n(v) := 1 and, for j = n − 1, . . . , 0,

DIST_j(d_{n−1} . . . d_j, v) = DIST_{j+1}(d_{n−1} . . . d_{j+1}, v) ∧ ( d_j ⇔ ∃u ( T_j(u, v) ∧ DIST_{j+1}(d_{n−1} . . . d_{j+1}, u) ) ).   (S2)

It is easy to verify that the boolean functions DIST_j do in fact fulfill property (∗) (the proof can be found in the full version of this paper). Once we have computed the function DIST, we can use it together with an arbitrary given complete order ⊑ in order to compute a complete order ≺ by letting

u ≺ v  ⇔  ∃d_u, d_v : DIST(d_u, u) ∧ DIST(d_v, v) ∧ ( |d_u| < |d_v| ∨ (|d_u| = |d_v| ∧ u ⊑ v) ).   (S3)

It can be easily checked that ≺ defines a complete order on V respecting ⪯. Thus, the following theorem follows from simply counting the number of OBDD operations.
Theorem 1. Let V = {0,1}^n and G = (V, E) be an acyclic directed graph represented by OBDDs. Applying the OBDD operations as described in (S1)–(S3) yields an OBDD for a relation ≺ which defines a complete order on V such that v ≺ w for all v, w ∈ V with (v, w) ∈ E. The number of OBDD operations is O(n²), where each OBDD represents a function on at most 4n variables.

Note that the algorithm can be easily adapted to an arbitrary vertex set V ⊆ {0,1}^n given by an OBDD for the relation V. This is done by simply executing the algorithm from above for the edge relation E′(u, v) = E(u, v) ∧ V(u) ∧ V(v). While the complete order ≺ returned by such a modified algorithm is defined on {0,1}^n × {0,1}^n, its restriction to V × V is obviously a correct complete order.

Since any (not necessarily reduced) OBDD in n variables has O(2^n) nodes, the theorem shows that the true worst-case runtime of our algorithm is O(|V|⁴ log² |V|). Clearly, this is much worse than the O(|V| + |E|) upper bound obtained by a well-known explicit algorithm. On the other hand, if all OBDDs obtained during the execution of the algorithm have a subexponential size (with respect to n), then its runtime is sublinear with respect to the number of vertices. In the following sections we show that it is justifiable to hope that this is the case for very structured input graphs.
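To illustrate the structure of steps (S1) and (S2), the following sketch mirrors the algorithm on explicit Python sets instead of OBDDs, so it exhibits none of the symbolic savings; the names T and delta follow the text, everything else is our own illustration. In the symbolic algorithm, each set operation below becomes a constant number of OBDD synthesis and quantification steps, which is where the O(log² N) operation bound comes from.

```python
def topological_dist(vertices, edges, n):
    """Explicit-set analogue of steps (S1)-(S2): returns Delta(v) for all v.

    T[i] contains all pairs (u, v) joined by a path of length exactly 2^i,
    and the bits of Delta(v) are then fixed from the most significant one
    downward, mimicking the computation of the functions DIST_j.
    """
    T = [set(edges)]                                  # (S1): T_0 = E
    for i in range(n - 1):
        T.append({(u, v) for (u, w) in T[i] for (w2, v) in T[i] if w == w2})
    delta = {v: 0 for v in vertices}                  # bits above j of Delta(v)
    for j in reversed(range(n)):                      # (S2): fix bit d_j
        new = {}
        for v in vertices:
            # d_j = 1 iff some u whose Delta has the same higher bits
            # reaches v by a path of length exactly 2^j.
            bit = any((u, v) in T[j] and delta[u] == delta[v] for u in vertices)
            new[v] = delta[v] + ((1 << j) if bit else 0)
        delta = new
    return delta

# Example: the directed path 0 -> 1 -> 2 -> 3 (so N = 4, n = 2).
print(topological_dist(range(4), [(0, 1), (1, 2), (2, 3)], 2))
# {0: 0, 1: 1, 2: 2, 3: 3}
```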
4 Runtime Analysis for the Grid Graph
We analyze the behavior of the topological sorting algorithm for a 2^n × 2^n grid, where all edges are directed from left to right and from bottom to top. The directed grid graph consists of the vertex set V = {0,1}^n × {0,1}^n and the edge set E, where ((x, y), (x′, y′)) ∈ E if and only if either |x| = |x′| and |y′| − |y| = 1, or |y| = |y′| and |x′| − |x| = 1.
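For concreteness, here is a small sketch of this edge relation together with a brute-force check that in the grid the longest-path length to (x, y) equals |x| + |y|; this is the fact the analysis of the DIST functions below exploits, and the code is our own illustration.

```python
from functools import lru_cache

def grid_edge(v, w):
    """Edge relation of the directed grid: one step right or one step up."""
    (x, y), (x2, y2) = v, w
    return (x == x2 and y2 - y == 1) or (y == y2 and x2 - x == 1)

@lru_cache(maxsize=None)
def delta(v):
    """Longest path length to v, computed over the in-grid predecessors."""
    x, y = v
    preds = [p for p in [(x - 1, y), (x, y - 1)] if p[0] >= 0 and p[1] >= 0]
    return 0 if not preds else 1 + max(delta(p) for p in preds)

# On a 4 x 4 grid, Delta((x, y)) = x + y at every vertex.
assert all(delta((x, y)) == x + y for x in range(4) for y in range(4))
assert grid_edge((0, 0), (0, 1)) and not grid_edge((0, 1), (0, 0))
```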
In the analysis to follow, we assume an interleaved variable ordering, that is, a variable ordering where, e.g., for a function depending on two vertices u, v, the variable v_i precedes the corresponding variable u_i. Note that in practice, heuristics such as sifting algorithms [8] are used to optimize the variable orderings during the execution of an algorithm, and it can be expected that a good variable ordering is found this way. The idea for proving that the topological sorting algorithm is very efficient for the grid graph is that all functions represented by OBDDs after each step of the algorithm belong to a class of functions which have a small OBDD representation. The functions we consider are compositions of certain threshold and modulo functions, which we define and investigate in the following. We denote by X_{k,n} the set of variables x^i_j with 1 ≤ i ≤ k and 0 ≤ j < n. By x^i we denote the vector of n variables (x^i_{n−1}, . . . , x^i_0).

Definition 1.
1. A boolean function f ∈ B_{kn} defined on the variable set X_{k,n} is called a k-variate threshold function if there exist a threshold T ∈ Z and weights w_1, . . . , w_k ∈ Z such that f(x^1, . . . , x^k) = 1 if and only if ∑_{i=1}^{k} w_i·|x^i| ≥ T. The maximum absolute weight of f is defined as w(f) := max{|w_1|, . . . , |w_k|}. The set of k-variate threshold functions with maximum absolute weight w defined on the set of variables X_{k,n} is denoted by T^w_{k,n}.
2. A boolean function g ∈ B_{kn} defined on the variable set X_{k,n} is called a k-variate modulo M function if there exist a constant C ∈ Z and w_1, . . . , w_k ∈ Z such that g(x^1, . . . , x^k) = 1 if and only if ∑_{i=1}^{k} w_i·|x^i| ≡ C (mod M). The set of k-variate modulo M functions defined on the set of variables X_{k,n} is denoted by M^M_{k,n}.

Definition 2. Let f ∈ B_n and C be a class of functions defined on the variable set X_n. We say that f can be decomposed into m functions in C if there exist a formula F on m variables and f_1, . . . , f_m ∈ C such that f = F(f_1, . . . , f_m). The set of functions decomposable into m functions in C is denoted by D[C, m]. For any k ∈ N we denote by D_k the set of function sequences (f_n)_{n∈N} such that ∃m ∈ N ∀n ∈ N : f_n ∈ D[T^1_{k,n}, m].
The main idea in our proof is based on two observations. Firstly, any function decomposable into a constant number of threshold and modulo functions has a small OBDD size. Secondly, all intermediate OBDDs obtained during the execution of the topological sorting algorithm on the directed grid graph represent functions which are decomposable into threshold and modulo functions. Let π_{k,n} be the variable ordering in which the variables in X_{k,n} appear in the order x^1_0, x^2_0, . . . , x^k_0, x^1_1, . . . , x^k_1, . . . , x^k_{n−1}. I.e., a π_{k,n}-OBDD tests all bits of the input integers in an interleaved order with increasing significance of the bits. The following result is a generalization of Proposition 4 in [6]. The proof will be given in the full version of this extended abstract.
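Before turning to the formal lemmas, here is a small sanity check of Definitions 1 and 2: it evaluates 2-variate threshold functions and verifies a decomposition of |x¹| = |x²| into two functions of T¹_{2,n}; the helper names are our own illustration.

```python
from itertools import product

def int_val(bits):
    """|x| = 2^(n-1) x_{n-1} + ... + 2^0 x_0; here bits[j] holds x_j."""
    return sum(b << j for j, b in enumerate(bits))

def threshold(weights, T):
    """k-variate threshold function: 1 iff sum_i w_i * |x^i| >= T."""
    return lambda *xs: int(sum(w * int_val(x) for w, x in zip(weights, xs)) >= T)

# |x^2| >= |x^1| is a threshold function with weights (-1, 1) and T = 0,
# so it lies in T^1_{2,n}; symmetrically for |x^1| >= |x^2|.
ge = threshold((-1, 1), 0)      # |x^2| >= |x^1|
ge_rev = threshold((1, -1), 0)  # |x^1| >= |x^2|

# Decomposition in the sense of Definition 2: |x^1| = |x^2| is the formula
# AND applied to two functions of T^1_{2,n}, i.e., it lies in D[T^1_{2,n}, 2].
eq = lambda x1, x2: ge(x1, x2) & ge_rev(x1, x2)

n = 3
for x1, x2 in product(product((0, 1), repeat=n), repeat=2):
    assert eq(x1, x2) == (int_val(x1) == int_val(x2))
```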
Lemma 1. Let f_1, . . . , f_m ∈ T^w_{k,n} ∪ M^M_{k,n} be given by reduced π_{k,n}-OBDDs for f_i, 1 ≤ i ≤ m. Further, let f = F(f_1, . . . , f_m) for a formula F of size s and let L = L(k, w, M) = max{4kw + 5, M}. The minimal π_{k,n}-OBDD for f has at most L^{s+1}·kn nodes and can be computed in time and space O((kn)²·s·L^{s+1} + 1).
Now we show, for functions decomposable into threshold functions (and no modulo functions), that the quantification over one of the variable blocks x^ℓ_0, . . . , x^ℓ_{n−1}, 1 ≤ ℓ ≤ k, can be done efficiently.

Theorem 2. Let (f_n)_{n∈N} be such that there exist w, m ∈ N with f_n ∈ D[T^w_{k,n}, m] for all n ∈ N, and let Q ∈ {∃, ∀}. If f_n is given as a π_{k,n}-OBDD, then for any 1 ≤ ℓ ≤ k a minimal π_{k,n}-OBDD for (Qx^ℓ)f_n can be computed in time n³·k^{O(1)}.

We need the following lemma, which states that the result of quantifying over one variable block of a function decomposable into threshold functions is a function which is decomposable into threshold and modulo functions. The proof has to be omitted due to space restrictions. Let lcm denote the least common multiple.
Lemma 2. Let f ∈ D[T^w_{k,n}, m], Q ∈ {∃, ∀} and ℓ ∈ {1, . . . , k}. Then (Qx^ℓ)f ∈ D[T^{2w·w*}_{k−1,n} ∪ M^{w*}_{k−1,n}, m*], where w* ≤ lcm{1, 2, . . . , w} and m* = O(2^m·w*·m²). In particular, for any fixed k ∈ N and a sequence of functions (f_n)_{n∈N} ∈ D_k we have (Qx^i)f_n ∈ D[T²_{k−1,n}, m′], where m′ = O(1).

Proof (of Theorem 2). Fix w, m ∈ N such that f_n ∈ D[T^w_{k,n}, m] for all n ∈ N. W.l.o.g. we assume ℓ = 1 and, for the sake of readability, we write x instead of x^1 and f instead of f_n. We only prove the theorem for the case Q = ∀; the proof for Q = ∃ works analogously. We can write (∀x)f as (∀x_{n−1} ∀x_{n−2} . . . ∀x_0) f(x, x², . . . , x^k). If we apply the OBDD quantification operations to the bits x_0, . . . , x_{n−1} in this order, then after the i-th quantification (0 ≤ i ≤ n) the resulting OBDD G_i represents the function g_i = (∀x_{i−1} . . . ∀x_0)f in B_{kn−i}. Since each of the n quantification operations can be done in time O(|G_i|²), the total time required is bounded by ∑_{i=0}^{n−1} |G_i|². Hence, it suffices to show that G_i has a size of at most O(n·k^{O(1)}) for all 0 ≤ i ≤ n − 1. Note that g_i does not depend on the variables x_0, . . . , x_{i−1}. What we do in the following is to introduce n dummy variables z_0, . . . , z_{n−1} and to show that g_i can be written as ((∀z_0, . . . , z_{n−1}) g*_i)_{|x_0=0,...,x_{i−1}=0}, where g*_i is a function in D[T^w_{k+1,n}, m + 1]. Hence, g_i is obtained from the function (∀z_0, . . . , z_{n−1})g*_i by restricting some variables to constants. By Lemma 2, this function is decomposable into a constant number of threshold and modulo functions, and therefore its OBDD size is bounded sufficiently. Note that the variables z_0, . . . , z_{n−1} are merely artificial helper variables, and that none of the functions we "really" deal with (i.e., which are represented by OBDDs) depend on these variables.

Let f = F(f_1, . . . , f_m) for a formula F and f_1, . . . , f_m ∈ T^w_{k,n}. Since m = O(1), we may assume w.l.o.g. that the size s of F is a constant, too. We introduce
n new variables, which we denote by z_0, . . . , z_{n−1}. Then we replace the variables x_j with the variables z_j for 0 ≤ j ≤ i − 1. This way we obtain

g_i = (∀x_{i−1} . . . x_0) f(x_{n−1} . . . x_i x_{i−1} . . . x_0, x², . . . , x^k)
    = (∀z_{i−1} . . . z_0) f(x_{n−1} . . . x_i z_{i−1} . . . z_0, x², . . . , x^k)
    = (∀z_{n−1} . . . z_0) ( |z| ≥ 2^i ∨ f(x_{n−1} . . . x_i z_{i−1} . . . z_0, x², . . . , x^k) ).   (1)
Now consider an arbitrary threshold function f_j for some 1 ≤ j ≤ m, i.e., f_j(x, x², . . . , x^k) = 1 if and only if w_1|x| + w_2|x²| + · · · + w_k|x^k| ≥ T. Let f*_j ∈ B_{(k+1)n} be the function with

f*_j(z, x, x², . . . , x^k) = 1  ⇔  w_1|z| + w_1|x| + w_2|x²| + · · · + w_k|x^k| ≥ T

and f* = F(f*_1, . . . , f*_m). Obviously, f* ∈ D[T^w_{k+1,n}, m]. If |z| < 2^i, then |x_{n−1} . . . x_i z_{i−1} . . . z_0| is the same as |x_{n−1} . . . x_i 0 . . . 0| + |z|. Hence, it is easy to conclude from (1) that

g_i = (∀z_{n−1} . . . z_0) ( |z| ≥ 2^i ∨ f*(z, x_{n−1} . . . x_i 0 . . . 0, x², . . . , x^k) )
    = (∀z_{n−1} . . . z_0) ( |z| ≥ 2^i ∨ f*_{|x_{i−1}=···=x_0=0}(z, x¹, x², . . . , x^k) ).
Now let

g*_i(z, x¹, . . . , x^k) = ( |z| ≥ 2^i ) ∨ f*(z, x¹, x², . . . , x^k).

Then g*_i ∈ D[T^w_{k+1,n}, m + 1] and g_i = ((∀z)g*_i)_{|x_0=0,...,x_{i−1}=0}. Since g*_i ∈ D[T^w_{k+1,n}, m + 1] and k, w, and m are constants, we can conclude from Lemma 2 that (∀z)g*_i ∈ D[T^{w′}_{k,n} ∪ M^{M′}_{k,n}, m′] for some constants w′, M′, and m′. Thus, by Lemma 1 the π_{k,n}-OBDD size of (∀z)g*_i is bounded by O(n·k^{O(1)}). But as we have shown above, the π_{k,n}-OBDD for g_i can be obtained from the π_{k,n}-OBDD for (∀z)g*_i by simply replacing some variables with the constant 0. Hence, the resulting minimal π_{k,n}-OBDD for g_i can only be smaller than that for (∀z)g*_i, and thus its size is also bounded by O(n·k^{O(1)}).

Remark 1. All the upper bounds in Lemma 1 and Theorem 2 proven for functions decomposable into threshold and modulo functions hold equivalently for their subfunctions f_{|α_1...α_i}, where α_1 . . . α_i is a restriction of arbitrary variables except those being quantified in the case of Theorem 2.

The following corollary summarizes the above results in a more convenient way. It follows from the statements in Lemma 1, Theorem 2, Lemma 2, and Remark 1.

Corollary 1. Fix a constant k ∈ N and let i, j ∈ {1, . . . , k} and Q, Q′ ∈ {∃, ∀}. Further, let (g_n)_{n∈N} ∈ D_k and f_n = g_n|_α, where α is an assignment of constants to arbitrary variables except those in {x^i_0, . . . , x^i_{n−1}}. If g_n is either given by a reduced π_{k,n}-OBDD or by the reduced π_{k,n}-OBDDs for the threshold functions into which it is decomposable, then the reduced π_{k,n}-OBDDs for (Qx^i)g_n, (Qx^i)f_n, and (Qx^i Q′x^j)g_n can be computed in time O(n³).
We can now apply these results in order to analyze the true runtime of the topological sorting algorithm for the grid graph. Whenever we talk in the following about an OBDD for some function sequence in D_k, we assume that the variable ordering is π_{k,n}. We have to specify the complete order ⊑ for the operations in (S3). A very natural order is the lexicographical order, i.e., (x¹, y¹) ⊑ (x², y²) if and only if |x¹| < |x²| ∨ (|x¹| = |x²| ∧ |y¹| ≤ |y²|).

Recall the steps (S1)–(S3) of the topological sorting algorithm from Section 3. We start the analysis with the edge relation E. By the definition of the grid graph, ((x¹, y¹), (x², y²)) ∈ E if and only if (|x²| − |x¹| = 0 ∧ |y²| − |y¹| = 1) ∨ (|y²| − |y¹| = 0 ∧ |x²| − |x¹| = 1). Clearly, this function is in D_4. Now we look at the functions T_i obtained by (S1). Recall that T_i(u, v) is defined to be 1 if and only if there exists a path from u to v which has length exactly 2^i. Note also that in the directed grid graph all paths from vertex u to vertex v have the same length. Hence, for the directed grid graph, T_i((x¹, y¹), (x², y²)) = 1 if and only if

|y²| ≥ |y¹|  ∧  |x²| ≥ |x¹|  ∧  |x²| − |x¹| + |y²| − |y¹| = 2^i.
Clearly, this function is in D_4 and thus, according to Corollary 1, T_{i+1} can be computed from T_i in time O(n³). (Note also that the quantification over one vertex in the grid graph is a quantification over two integers.) Hence, computing T_1, . . . , T_n requires time O(n⁴) in total.

Next, we analyze the construction of the OBDDs for the functions DIST_j in (S2). Recall that for any vertex v and any d* = d_{n−1} . . . d_j, the function DIST_j(d*, v) is true if and only if d* describes the n − j most significant bits of the binary representation of ∆(v). Let f_j ∈ B_{3n}, 0 ≤ j ≤ n, be defined by f_j(d, x, y) = 1 if and only if |d| ≤ |x| + |y| < |d| + 2^j. Hence, f_j is the conjunction of two functions in T¹_{3,n}. Furthermore, it is easy to see that DIST_j(d_{n−1} . . . d_j, (x, y)) = f_{j|d_{j−1}=···=d_0=0}(d, x, y). Therefore, DIST_j is obtained from a function in D_3 by replacing some variables with the constant 0. Note also that DIST = DIST_0 is in fact in D_3. Moreover, due to the analysis of T_j above, it becomes obvious that T_j(u, v) ∧ DIST_{j+1}(d_{n−1} . . . d_{j+1}, u) is a function in D_5, where some variables are replaced with the constant 0. Hence, according to Corollary 1, the OBDD for g_j := ∃u : T_j(u, v) ∧ DIST_{j+1}(d_{n−1} . . . d_{j+1}, u) can be computed in time O(n³). The function g_j is obtained from a function in D[T⁸_{3,n} ∪ M²_{3,n}, O(1)] by replacing some variables with the constant 0 (see Lemma 2). Now it is easy to see that the final two synthesis operations of (S2) required to compute DIST_j run in time O(n³). (Apply Lemma 1 and Remark 1, and note that the function d_j ∈ B_1 can be viewed as a subfunction of f ∈ D_1 with f(d_{n−1} . . . d_0) = 1 if and only if |d_{n−1} . . . d_0| = 2^j.) Altogether, the total time required for computing DIST_{n−1}, . . . , DIST_0 = DIST is O(n⁴).

Finally, we have to investigate the computation of the complete order ≺ using the operations in (S3). Recall that DIST ∈ D_3. Hence, if one takes the definition of ⊑ into account, the complete term in (S3) before the first quantification describes
a function h in D_4. According to Corollary 1, the function h′ = (∃d_v ∃d_u)h can be computed in time and space O(n³). Summing up the time bounds for all OBDD operations, we have obtained the following result.

Theorem 3. The OBDD algorithm for topological sorting takes time O(n⁴) on the directed 2^n × 2^n grid graph for an appropriate variable ordering π_{k,n} and the complete order ⊑ as defined above.
5 Conclusion
Since the results about the threshold and modulo functions are quite general, we hope that they may be applicable to the analysis of other symbolic OBDD algorithms as well. It would be nice to extend the techniques in such a way that not only single input instances but small graph classes can be handled. An interesting example would be grids where some arbitrary or randomly chosen edges have been removed.
Acknowledgments I thank Daniel Sawitzki and Ingo Wegener for helpful comments and discussions.
References
1. Y. Breitbart, H. B. Hunt III, and D. J. Rosenkrantz. On the size of binary decision diagrams representing boolean functions. Theor. Comp. Sci., 145:45–69, 1995.
2. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. Inform. and Comp., 98:142–170, 1992.
3. H. Cho, G. Hachtel, S.-W. Jeong, B. Plessier, E. Schwarz, and F. Somenzi. ATPG aspects of FSM verification. In IEEE Int. Conf. on CAD, pp. 134–137, 1990.
4. H. Cho, S.-W. Jeong, F. Somenzi, and C. Pixley. Synchronizing sequences and symbolic traversal techniques in test generation. Journal of Electronic Testing: Theory and Applications, 4:19–31, 1993.
5. G. D. Hachtel and F. Somenzi. A symbolic algorithm for maximum flow in 0-1 networks. Formal Methods in System Design, pp. 207–219, 1997.
6. S. Jukna. The graph of integer multiplication is hard for read-k-times networks. Technical Report 95-10, Universität Trier, 1995.
7. M. Keim, R. Drechsler, B. Becker, M. Martin, and P. Molitor. Polynomial formal verification of multipliers. Formal Methods in System Design, 22:39–58, 2003.
8. R. Rudell. Dynamic variable ordering for ordered binary decision diagrams. In IEEE Int. Conf. on CAD, pp. 42–47, 1993.
9. D. Sawitzki. Implicit flow maximization by iterative squaring. Manuscript. http://ls2-www.cs.uni-dortmund.de/~sawitzki.
10. D. Sawitzki. Implicit flow maximization on grid networks. Manuscript. http://ls2-www.cs.uni-dortmund.de/~sawitzki.
11. D. Sawitzki. Implizite Algorithmen für Graphprobleme. Diploma thesis, Univ. Dortmund, 2002.
12. I. Wegener. Branching Programs and Binary Decision Diagrams – Theory and Applications. SIAM, 2000.
Ershov’s Hierarchy of Real Numbers Xizhong Zheng1 , Robert Rettinger2 , and Romain Gengler1 1
2
BTU Cottbus, 03044 Cottbus, Germany [email protected] FernUniversit¨ at Hagen, 58084 Hagen, Germany
Abstract. Analogous to Ershov's hierarchy for ∆^0_2-subsets of the natural numbers, we discuss a similar hierarchy for recursively approximable real numbers. Namely, with respect to different representations of real numbers, we define k-computability and f-computability for natural numbers k and functions f. We show that these notions are not equivalent for representations based on Cauchy sequences, Dedekind cuts and binary expansions.
1 Introduction
In classical mathematics, real numbers are typically represented by Dedekind cuts, Cauchy sequences of rational numbers, and binary or decimal expansions. The effectivization of these representations leads to equivalent definitions of computable real numbers. This notion was first explored by Alan Turing in his famous paper [14], where the Turing machine is also introduced. According to Turing, the computable numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means (page 230, [14]). In other words, a real number x ∈ [0; 1]¹ is called computable if there is a computable function f : N → {0, 1, · · · , 9} such that x = ∑_{i∈N} f(i)·10^{−i}. Robinson [9] has observed that computable real numbers can be equivalently defined via Dedekind cuts and Cauchy sequences.

Theorem 1 (Robinson [9], Myhill [6] and Rice [8]). For any real number x ∈ [0; 1], the following are equivalent.
1. x is computable;
2. The Dedekind cut L_x := {r ∈ Q : r < x} of x is a recursive set;
3. There is a recursive set A ⊆ N such that x = x_A := ∑_{i∈A} 2^{−i};
4. There is a computable sequence (x_s) of rational numbers which converges to x effectively in the sense that

(∀s, t ∈ N)(t ≥ s =⇒ |x_s − x_t| ≤ 2^{−s}).   (1)
¹ In this paper we consider only the real numbers of the unit interval [0; 1]. For any other real number y, there are an n ∈ N and an x ∈ [0; 1] such that y := x ± n; y and x are regarded as being of the same computability.
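To illustrate items 3 and 4 of Theorem 1, the following sketch turns a decision procedure for a recursive set A into a sequence of rationals converging effectively to x_A = ∑_{i∈A} 2^{−i}; the helper names are our own illustration.

```python
from fractions import Fraction

def x_A_approx(in_A, s):
    """s-th term of a computable sequence converging effectively to x_A.

    Truncating the sum after bit s+1 omits at most
    sum_{i > s+1} 2^-i = 2^-(s+1), so |x_s - x_t| <= 2^-s for all t >= s,
    which is condition (1) of Theorem 1.
    """
    return sum(Fraction(1, 2 ** i) for i in range(s + 2) if in_A(i))

# Example: A = set of odd numbers, so x_A = 1/2 + 1/8 + 1/32 + ... = 2/3.
odd = lambda i: i % 2 == 1
print(float(x_A_approx(odd, 20)))   # 0.666666...
```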
Because of Specker's example of an increasing computable sequence of rational numbers with a non-computable limit in [13], the extra condition (1) of effective convergence is essential for the computability of x. As observed by Specker [13], Theorem 1 does not hold if the effectivization is carried out at the primitive recursive instead of the computable level. Let R_1 be the class of all limits of primitive recursive sequences of rational numbers which converge primitive recursively, R_2 the class of all real numbers with primitive recursive binary expansions, and R_3 the class of all real numbers with primitive recursive Dedekind cuts. It is shown in [13] that R_3 ⫋ R_2 ⫋ R_1. For polynomial time computability of real numbers, Ko [5] shows this dependence on representations of real numbers too. Let P_C be the class of limits of all polynomial time computable sequences of dyadic rational numbers which converge effectively, let P_D contain all real numbers with polynomial time computable Dedekind cuts, and let P_B be the class of real numbers whose binary expansions are polynomial time computable (with the input n written in unary notation). Ko [5] shows that P_D = P_B ⫋ P_C, and that P_C is a real closed field while P_D is not closed under addition and subtraction. In [5], the dyadic rational numbers D := ⋃_{n∈N} D_n for D_n := {m·2^{−n} : m ∈ N}, instead of Q, are used as base set. For the complexity discussion D seems more natural and easier to use, but for computability it makes no essential difference, and we use both D and Q in this paper.

In this paper, we investigate similar classes where we weaken the notion of computability in several quite natural ways instead of strengthening it. A typical approach to exploring non-computable objects is to classify them into equivalence classes, or so-called degrees, by various reductions (see e.g. [12]). This can easily be implemented for real numbers by mapping each set A ⊆ N to a real number x_A := ∑_{i∈A} 2^{−i} and then defining the Turing reduction x_A ≤_T x_B by A ≤_T B. This definition is robust, as shown in [2]. The benefit of this approach is that techniques and results from well-developed recursion theory can be applied straightforwardly. For example, Ho [4] shows that a real number x is Turing reducible to 0′, the degree of the halting problem K, iff there is a computable sequence of rational numbers which converges to x. This is a counterpart of Shoenfield's Limit Lemma ([10]) in recursion theory, which says that A ≤_T K iff A is the limit of a computable sequence of subsets of natural numbers. However, the classification of real numbers by Turing reductions seems not fine enough, and it does not relate very closely to the analytical properties of real numbers.

In this paper we will give another classification of real numbers which is analogous to Ershov's hierarchy ([3]) for subsets of natural numbers. Notice that, if A ⊆ N is recursive, then there is an algorithm which tells us whether a natural number n belongs to A or not. In this case, corrections are not allowed. However, if we allow the algorithm to change its mind about the membership of n in A from negative to positive, but at most once, then the corresponding set A is an r.e. set. In other words, the algorithm may claim n ∉ A at some stage and correct its claim to n ∈ A at a later stage. In general, given a function h : N → N, if the algorithm is allowed to change the answer to the question "n ∈ A?" at most h(n) times for any n ∈ N, then the corresponding
Ershov’s Hierarchy of Real Numbers
683
set A is called h-r.e., according to Ershov [3]. Especially, for a constant function h(n) ≡ k, the h-r.e. sets are called k-r.e., and for recursive functions h, the h-r.e. sets are called ω-r.e. This introduces a classification of the ∆^0_2 subsets of N (the so-called Ershov hierarchy). Obviously, we can transfer this hierarchy to real numbers via their binary expansions straightforwardly. More precisely, we call x_A h-binary computable if A is h-r.e. Similarly, after extending Ershov's hierarchy to subsets of the rational numbers, we can call x h-Dedekind computable if the Dedekind cut of x is an h-r.e. set. For the Cauchy representation of real numbers a classification similar to Ershov's can be introduced too. In this case, we count the number of "big jumps" of the sequence instead of the number of "mind-changes". According to Theorem 1.4, x is computable if there is a computable sequence (x_s) of rational numbers which converges to x and the sequence (x_s) makes no big jumps in the sense of (1). However, if up to h(n) (non-overlapping) "big jumps" are allowed, then x is called h-Cauchy computable. Thus, three kinds of h-computability of real numbers can be naturally introduced. In this paper, we will investigate these notions and compare them with other known notions of weak computability of real numbers discussed in [15]. We will find that Cauchy computability is the most natural of these notions, although several interesting results about binary and Dedekind computability are obtained in this paper as well.
2 Basic Definitions
In this section, we first recall some notions of weak computability of real numbers and Ershov's hierarchy. Then we give the precise definitions of binary, Dedekind and Cauchy computability. As mentioned in the previous section, a real number x is computable if there is a computable sequence (x_s) of rational numbers which converges to x effectively in the sense of (1). The limit of an increasing or decreasing computable sequence of rational numbers is called left computable or right computable, respectively. Left and right computable real numbers are called semi-computable. If x is a difference of two left computable real numbers, then x is called weakly computable. According to Ambos-Spies, Weihrauch and Zheng [1], x is weakly computable iff there is a computable sequence (x_s) of rational numbers which converges to x weakly effectively, in the sense that ∑_{s∈N} |x_s − x_{s+1}| ≤ c for a constant c. More generally, if x is simply the limit of a computable sequence of rational numbers, then x is called recursively approximable. The classes of computable, left computable, right computable, semi-computable, weakly computable and recursively approximable real numbers are denoted by EC, LC, RC, SC, WC and RA, respectively.

For any finite set A := {x_1 < x_2 < · · · < x_k} of natural numbers, the natural number i := 2^{x_1} + 2^{x_2} + · · · + 2^{x_k} is called the canonical index of A. The set with canonical index i is denoted by D_i. A sequence (A_s) of finite subsets of N is called computable if there is a computable function g : N → N such that A_s = D_{g(s)} for any s ∈ N. Similarly, we can introduce canonical indices for subsets of dyadic rational numbers. Let σ : N → D be a one-to-one coding of
the dyadic numbers. For any finite set A ⊆ D, its canonical index is defined as the canonical index of the set A_σ := σ^{−1}(A) := {n ∈ N : σ(n) ∈ A}. In this paper, the subset A ⊆ D of canonical index n is denoted by V_n. A sequence (A_s) of finite subsets of dyadic numbers is called computable if there is a recursive function h such that A_s = V_{h(s)} for all s ∈ N.

Definition 1 (Ershov [3]). For any function h : N → N, a set A ⊆ N is called h-recursively enumerable (h-r.e. for short) if there is a computable sequence (A_s) of finite subsets A_s ⊆ N such that
1. A_0 = ∅ and A = ⋃_{i=0}^{∞} ⋂_{j=i}^{∞} A_j;
2. (∀n ∈ N)(|{s : n ∈ A_s ∆ A_{s+1}}| ≤ h(n)), where A∆B := (A \ B) ∪ (B \ A) is the symmetric difference of A and B.

In this case, the sequence (A_s) is called an effective h-enumeration of A. For k ∈ N, a set A is called k-r.e. if it is h-r.e. for the constant function h(n) ≡ k, and A is called ω-r.e. if it is h-r.e. for some recursive function h. For convenience, recursive sets are called 0-r.e.

Theorem 2 (Hierarchy Theorem, Ershov [3]). Let f, g : N → N be recursive functions. If (∃^∞ n ∈ N)(f(n) < g(n)), then there is a g-r.e. set which is not f-r.e.

Thus, there is an ω-r.e. set which is not k-r.e. for any k ∈ N; there is a (k + 1)-r.e. set which is not k-r.e. (for every k ∈ N); and there is also a ∆^0_2-set which is not ω-r.e. The definitions of h-r.e., k-r.e. and ω-r.e. subsets of the natural numbers can be transferred straightforwardly to subsets of the dyadic rational numbers. Of course, h should be a function of type h : D → N in this case. This should be clear from context and is usually not indicated explicitly later on. Thus, we can easily introduce corresponding hierarchies for real numbers by means of binary or Dedekind representations of real numbers. However, if the real numbers are represented by sequences of rational numbers, we should count the number of their jumps of a certain size. More precisely, we have the following definition.

Definition 2. Let n be a natural number and (x_s) be a sequence of real numbers which converges to x.
1. An n-jump of (x_s) is a pair (i, j) with n < i < j & 2^{−n} ≤ |x_i − x_j| < 2^{−n+1}.
2. The n-divergence of (x_s) is the maximal number of non-nested n-jump pairs of (x_s), i.e., the maximal natural number m such that there is a chain n < i_1 < j_1 ≤ i_2 < j_2 ≤ · · · ≤ i_m < j_m with 2^{−n} ≤ |x_{i_t} − x_{j_t}| < 2^{−n+1} for t = 1, 2, · · · , m.
3. For h : N → N, if the n-divergence of (x_s) is bounded by h(n) for any n ∈ N, then we say that (x_s) converges to x h-effectively.

Definition 3. Let x ∈ [0; 1] be a real number and h : N → N a function.
1. x is h-binary computable (h-bEC for short) if there is an h-r.e. set A ⊆ N such that x = x_A;
Ershov’s Hierarchy of Real Numbers
685
2. x is h-Cauchy computable (h-cEC for short) if there is a computable sequence (x_s) of rational numbers which converges to x h-effectively;
3. x is h-Dedekind computable (h-dEC for short) if the left Dedekind cut L_x := {r ∈ Q : r < x} is an h-r.e. set;
4. For δ ∈ {b, c, d}, x is called k-δEC if x is h-δEC for the constant function h(n) ≡ k, and x is called ω-δEC if it is h-δEC for a recursive function h.

The classes of all k-δEC, ω-δEC and h-δEC real numbers are denoted by k-δEC, ω-δEC and h-δEC, respectively, for δ ∈ {b, c, d}. Besides, let ∗-δEC := ⋃_{n∈N} n-δEC. The following proposition follows directly from the definition.

Proposition 1. For δ ∈ {b, c, d} and f, g : N → N, the following hold.
1. 0-δEC = EC.
2. k-δEC ⊆ (k + 1)-δEC ⊆ ∗-δEC ⊆ ω-δEC, for any k ∈ N.
3. If f(n) ≤ g(n) holds for almost all n ∈ N, then f-δEC ⊆ g-δEC.
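To make Definition 1 concrete, the following sketch counts, over a finite initial segment of an enumeration (A_s), how often each element changes its membership; in an effective h-enumeration this count is bounded by h(n). The encoding is our own illustration.

```python
def mind_changes(enumeration):
    """Count, for every n, how often n enters or leaves along (A_s).

    In an effective h-enumeration (Definition 1), this count is
    bounded by h(n) for every n.
    """
    changes = {}
    for prev, cur in zip(enumeration, enumeration[1:]):
        for n in prev ^ cur:                      # symmetric difference
            changes[n] = changes.get(n, 0) + 1
    return changes

# A 2-r.e. style enumeration: 3 enters at stage 1 and leaves at stage 2.
stages = [set(), {3}, set(), set()]
assert mind_changes(stages) == {3: 2}             # consistent with h(n) = 2
```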
3 Binary Computability
In this section we discuss binary computability. From Theorem 2 it follows immediately that g-bEC \ f-bEC ≠ ∅ if (∃^∞ n ∈ N)(f(n) < g(n)). Thus, we have the following hierarchy theorem for binary computability.

Proposition 2. k-bEC ⫋ (k + 1)-bEC ⫋ ∗-bEC ⫋ ω-bEC, for any k ∈ N.

Now we compare binary computability with semi-computability. It turns out that SC is incomparable with ∗-bEC but properly included in ω-bEC.

Theorem 3.
1. SC ⫋ ω-bEC;
2. SC ⊈ ∗-bEC;
3. 2-bEC ⊈ SC.

Proof. 1. As pointed out by Soare ([11], page 217), if the real number x_A is left computable, then the set A is 2^{n+1}-r.e. Combining this with Theorem 2, SC ⫋ ω-bEC follows immediately.
2. We construct a set A ⊆ N in stages such that x_A is left computable and, for all i, j ∈ N, the following requirements are satisfied:

R_{i,j}: (D_{ϕ_i(s)})_s is an effective j-enumeration =⇒ A ≠ lim_{s→∞} D_{ϕ_i(s)},
where (ϕ_i) is an effective enumeration of all computable partial functions ϕ :⊆ N → N. This implies that A is not ∗-r.e. To satisfy R_e for e := ⟨i, j⟩, we choose an n_e > j. We put n_e into A as long as n_e is not in D_{ϕ_i(s)}. If n_e enters D_{ϕ_i(s)} for some s, then we take n_e out of A. n_e may be put into A again if n_e leaves D_{ϕ_i(t)} for some t > s, and so on. Obviously, we need to change the membership of n_e in A at most j times, and the strategy succeeds eventually. To make x_A left computable, we reserve an interval [m_e; n_e] of natural numbers with n_e − m_e > j
exclusively for R_e and put a new element from this interval into A whenever n_e is taken out of A.
3. Ambos-Spies, Weihrauch and Zheng (Theorem 4.8 of [1]) show that, for Turing incomparable r.e. sets A, B ⊆ N, x_{A⊕B̄} is not semi-computable, where B̄ is the complement of B and A ⊕ B := {2n : n ∈ A} ∪ {2n + 1 : n ∈ B}. On the other hand, for any r.e. sets A, B, the join A ⊕ B̄ = (2A ∪ (2N + 1)) \ (2B + 1) is a 2-r.e. set, and hence x_{A⊕B̄} is 2-bEC.

Theorem 4. WC ⊈ ω-bEC and ω-bEC ⊈ WC.

Proof. In [16] Zheng shows that there are r.e. sets A, B ⊆ N such that the set C ⊆ N defined by x_C := x_A − x_B is not of ω-r.e. Turing degree. This means that x_C is weakly computable but not ω-bEC. That is, WC ⊈ ω-bEC. The part ω-bEC ⊈ WC follows immediately from a result of [1], that if x_{A⊕∅′} is weakly computable, then A is a 2^{3n}-r.e. set. By Ershov's Hierarchy Theorem 2, we can choose an ω-r.e. set A which is not 2^{3n}-r.e. Then B := A ⊕ ∅′ is obviously also an ω-r.e. set and hence x_B is ω-bEC. But x_B is not weakly computable because A is not 2^{3n}-r.e.
4 Dedekind Computability
We investigate Dedekind computability in this section. Again, the classes ω-dEC and WC are incomparable. But, different from the case of binary computability, the hierarchy theorem no longer holds. Between ω-binary and ω-Dedekind computability we have the following result.

Theorem 5. ω-bEC ⊆ ω-dEC.

Proof. Let x_A ∈ ω-bEC and let (A_s) be an effective h-enumeration of A for a recursive function h. We define a computable sequence (E_s) of finite subsets of dyadic numbers by E_s := {r ∈ D_s : r ≤ x_{A_s}}, where D_s is the set of all dyadic rational numbers of precision s. It is easy to see that E := lim_s E_s exists and that it is in fact the left Dedekind cut of the real number x_A. On the other hand, (E_s) is an effective g-enumeration of E, where g(n) := ∑_{i≤n} h(i). Thus, x is a g-dEC and hence an ω-dEC real number.

The next result shows that the class ∗-dEC collapses to SC and hence the hierarchy theorem does not hold.

Lemma 1.
1. 1-dEC = LC and SC ⊆ 2-dEC.
2. ∗-dEC = SC.

Proof. 1. This follows directly from the definition.
2. By item 1, it suffices to prove that ∗-dEC ⊆ SC. For any x ∈ ∗-dEC, let k := min{n : x ∈ n-dEC}. Then the Dedekind cut L_x of x is a k-r.e. but not (k − 1)-r.e. set. Let (A_s) be an effective k-enumeration of L_x. Then there are infinitely many r ∈ D such that |{s ∈ N : r ∈ A_{s+1} ∆ A_s}| = k, where
Ershov’s Hierarchy of Real Numbers
687
A∆B := (A \ B) ∪ (B \ A). Let O_k := {r ∈ D : |{s ∈ N : r ∈ A_{s+1} ∆ A_s}| = k}. Obviously, O_k is an r.e. set. If k > 0 and k is even, then x < r for any r ∈ O_k, and we can choose a decreasing computable sequence (r_s) from O_k such that lim r_s = x. Otherwise, there is a rational number y such that x < y < r for all r ∈ O_k. In this case, we can construct an effective (k − 1)-enumeration of L_x by allowing any r > y to enter L_x at most k/2 − 1 times. This contradicts the hypothesis. Thus x is a right computable real number. Similarly, if k is odd, then x is left computable.

Theorem 6. WC ⊈ ω-dEC.

Proof. We construct recursive enumerations (A_s) and (B_s) of r.e. sets A and B and define C_s by x_{C_s} = x_{A_s} − x_{B_s}. Let C := lim_{s→∞} C_s = ⋃_{s∈N} ⋂_{t≥s} C_t. Then x_C is a weakly computable real number. To guarantee that x_C is not ω-dEC, it suffices to satisfy the following requirements for all i, j ∈ N:

R_{i,j}: ϕ_i and ψ_j are total functions, (V_{ϕ_i(s)})_{s∈N} is an effective ψ_j-enumeration, and E_i := lim_{s→∞} V_{ϕ_i(s)} is a Dedekind cut =⇒ sup(E_i) ≠ x_C,

where (ϕ_i) and (ψ_j) are recursive enumerations of the partial computable functions ϕ_i :⊆ N → N and ψ_j :⊆ D → N, respectively. This can be achieved by a finite injury priority construction.

Corollary 1. The class ω-dEC is incomparable with the class WC, and hence the class ∗-dEC is a proper subset of ω-dEC.

Corollary 2. The class ω-dEC is not closed under addition and subtraction.

Proof. By Lemma 1.2, we have SC ⊆ ω-dEC. If ω-dEC were closed under addition and subtraction, then WC ⊆ ω-dEC would hold, because WC is the closure of SC under addition and subtraction. This contradicts Theorem 6.
5 Cauchy Computability
We discuss Cauchy computability in this section. We will show that the classes k-cEC and ∗-cEC are incomparable with the classes LC and SC, and that the class ∗-cEC is not closed under addition. However, the hierarchy theorem holds. From the definition of ω-Cauchy computability it is easy to see that x is ω-Cauchy computable iff there are a recursive function h and a computable sequence (x_s) of rational numbers converging to x such that, for any n ∈ N, there are at most h(n) non-nested pairs (i, j) of indices with |x_i − x_j| ≥ 2^{−n}. Thus, the class ω-cEC is in fact the class DBC (divergence bounded computable real numbers) discussed in [7], and hence it is the image of the class of left computable real numbers under total computable real functions. We summarize some known results about the class ω-cEC in the next theorem, where CTF denotes the class of all computable real functions f : [0; 1] → [0; 1].
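As an illustration of the n-divergence from Definition 2, the following sketch computes it for a finite initial segment of a sequence by collecting all n-jumps and selecting a maximal non-nested chain with the classical earliest-endpoint greedy rule; the code is our own illustration, not taken from the paper.

```python
from fractions import Fraction

def n_divergence(xs, n):
    """Maximal number of non-nested n-jumps (i, j), n < i < j, with
    2^-n <= |x_i - x_j| < 2^-(n-1), in the finite sequence xs.

    All valid pairs are collected, and a maximal chain
    i1 < j1 <= i2 < j2 <= ... is picked greedily by earliest endpoint
    (the standard activity-selection argument shows this is optimal).
    """
    lo, hi = Fraction(1, 2 ** n), Fraction(1, 2 ** (n - 1))
    pairs = [(i, j) for i in range(n + 1, len(xs)) for j in range(i + 1, len(xs))
             if lo <= abs(xs[i] - xs[j]) < hi]
    count, last_j = 0, 0
    for i, j in sorted(pairs, key=lambda p: p[1]):
        if i >= last_j:
            count, last_j = count + 1, j
    return count

# The sequence jumps up by 1/2 and back once each after position 1,
# so its 1-divergence is 2.
xs = [Fraction(0), Fraction(0), Fraction(0), Fraction(1, 2), Fraction(0)]
print(n_divergence(xs, 1))   # 2
```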
Theorem 7 (Rettinger, Zheng, Gengler and von Braunmühl [7]).
1. The class ω-cEC is a field;
2. ω-cEC = CTF(LC) := {f(y) : f ∈ CTF & y ∈ LC}; and
3. WC ⊊ ω-cEC ⊊ RA.

Now let us look at the relationship among the classes 1-cEC, ∗-cEC and the classes SC and WC.

Theorem 8. (1) 1-cEC ⊈ SC; (2) SC ⊈ ∗-cEC; and (3) ∗-cEC ⊊ WC.

Proof. For the first noninclusion, consider the number x_{A⊕B} for two Turing incomparable r.e. sets A, B ⊆ N. By Theorem 4.8 of [1], it is not semi-computable but is 1-cEC. For the second noninclusion, we can construct, by a priority construction, a left computable real number which is not k-cEC for any k ∈ N. To prove ∗-cEC ⊆ WC, let (x_s) be a computable sequence of rational numbers which converges k-effectively to a ∗-cEC real number x for some k ∈ N. For any n ∈ N, let S_n := {s ∈ N : 2^{−n} ≤ |x_s − x_{s+1}| < 2^{−n+1}}. Then

Σ_{s∈N} |x_s − x_{s+1}| = Σ_{n∈N} ( Σ_{s∈S_n, s≤n} |x_s − x_{s+1}| + Σ_{s∈S_n, s>n} |x_s − x_{s+1}| ) ≤ Σ_{n∈N} (n · 2^{−n+1} + k · 2^{−n+1}) ≤ 8 + 2k.

That is, x is a weakly computable real number. Therefore, ∗-cEC ⊆ WC. By item (2), this inclusion is also proper.
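For the first noninclusion above, here is a small sketch of the join construction, assuming the standard effective join A ⊕ B := {2i : i ∈ A} ∪ {2i + 1 : i ∈ B}; the finite sets below are illustrative placeholders only, since Turing incomparable r.e. sets exist but cannot be listed explicitly.

from fractions import Fraction

def join(A, B):
    # A (+) B := {2i : i in A} | {2i + 1 : i in B}
    return {2 * i for i in A} | {2 * i + 1 for i in B}

def x_of(S):
    # the real associated with a set S, x_S = sum of 2^-(i+1) over i in S
    return sum(Fraction(1, 2 ** (i + 1)) for i in S)

# placeholder finite approximations of the r.e. sets A and B
A_s, B_s = {0, 3}, {1, 2}
print(x_of(join(A_s, B_s)))  # a stage approximation of x_{A join B}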
Theorem 9. For any recursive functions f, g with ∃^∞ n (f(n) < g(n)), there is a g-cEC real number which is not f-cEC, i.e., g-cEC \ f-cEC ≠ ∅.

Proof. We construct a computable sequence (x_s) of rational numbers which satisfies, for any e ∈ N, the following requirements:

N: (x_s) converges g-effectively to x, and

R_e: if (ϕ_e(s))_s converges f-effectively, then x ≠ lim_s ϕ_e(s),

where (ϕ_e) is an effective enumeration of all computable partial functions ϕ_e :⊆ N → Q.
To satisfy a single requirement R_e, choose a rational interval I_e of length 2^{−n_e} for some n_e ∈ N such that f(n_e) < g(n_e). Divide it equally into four subintervals I^i, for i < 4, of length 2^{−(n_e+2)}. Define x_s as the midpoint of the interval I^1 as long as the sequence (ϕ_e(s))_s does not enter the interval I^1. Otherwise, if ϕ_e(s) enters I^1 for some s, then let x_s be the midpoint of I^3. Later, if ϕ_e(t) enters I^3 for some t > s, then let x_t be the midpoint of I^1 again, and so on. If (ϕ_e(s))_s converges f-effectively, then (x_s) needs at most f(n_e) + 1 ≤ g(n_e) jumps to guarantee that lim x_s ≠ lim_s ϕ_e(s). Thus the requirement N is satisfied too.

To satisfy all the requirements simultaneously, we construct an increasing sequence (n_e) of natural numbers such that f(n_e) < g(n_e) and n_e + 2 ≤ n_{e+1} for all e ∈ N, and two sequences (I_e) and (J_e) of rational intervals I_e := [a_e; b_e] and J_e := [c_e; d_e] which satisfy the conditions

a_e < b_e < c_e < d_e  &  b_e − a_e = d_e − c_e = 2^{−(n_e+1)}  &  c_e − b_e = 2^{−n_e}    (2)
and I_{e+1} ∪ J_{e+1} ⊂ I_e for all e ∈ N. The intervals I_e and J_e are reserved for the requirement R_e. That is, we construct a computable sequence (x_s) of rational numbers such that x_s is properly chosen from I_e or J_e in order to guarantee lim_s x_s ≠ lim_s ϕ_e(s). In general, the sequences (n_e), (I_e) and (J_e) are not computable, but they can be effectively approximated. Namely, at stage s, we can construct the finite approximation sequences (n_{e,s})_{e≤k(s)}, (I_{e,s})_{e≤k(s)} and (J_{e,s})_{e≤k(s)}, where k(s) ∈ N satisfies lim_s k(s) = ∞. At any stage s, we choose a rational number x_s such that x_s ∈ I_{e,s} for all e ≤ k(s). If, for some t, ϕ_{e,s}(t) enters the interval I_{e,s} too, then we exchange I_{e,s} and J_{e,s}; in this case, we denote this t by t_{e,s}. For any i > e, the intervals I_i and J_i are then cancelled and will be redefined with a new n_{i,t} > n_{i,s} at some later stage t > s. For the same n_e, the intervals I_e and J_e can be exchanged at most f(n_e) times if (ϕ_e(s))_s converges f-effectively. Therefore, a finite injury priority construction can be applied.

Corollary 3. For any k ∈ N, we have k-cEC ⊊ (k + 1)-cEC.

Theorem 10. There are x, y ∈ 1-cEC such that x − y ∉ ∗-cEC. Therefore, k-cEC and ∗-cEC are not closed under addition and subtraction for any k > 0.

Proof. We will construct two computable increasing sequences (x_s) and (y_s) of rational numbers which converge 1-effectively to x and y, respectively, such that z := x − y satisfies all the following requirements:

R_{i,j}: (ϕ_i(s))_s converges j-effectively to u_i ⟹ u_i ≠ z,
where (ϕ_i) is an effective enumeration of all partial computable functions ϕ_i :⊆ N → Q. To satisfy R_e (e := ⟨i, j⟩), we choose two natural numbers n_e and m_e such that m_e = 2j + n_e + 2, and a rational interval I := [a^0_e; a^8_e] of length 2^{−m_e+2}. The interval I is divided equally into eight subintervals I^k_e := [a^k_e; a^{k+1}_e] for k < 8. At the beginning, let x_0 := a^2_e and y_0 := 0, and hence z_0 := x_0 − y_0 = a^2_e ∈ J := I^2_e, where J serves as a witness interval of R_e such that any element z ∈ J satisfies R_e. If, at some stage s_0 > 0, ϕ_i(t_0) enters the interval J for some t_0, then we define x_{s_0} := x_0 + 2^{−(n_e+1)} + 3 · 2^{−(m_e+1)}, y_{s_0} := y_0 + 2^{−(n_e+1)} and J := I^5_e. Accordingly, we have z_{s_0} := x_{s_0} − y_{s_0} = z_0 + 3 · 2^{−(m_e+1)} and hence z_{s_0} ∈ J. If, at a later stage s_1 > s_0, ϕ_i(t_1) enters the interval J = I^5_e for some t_1 > t_0, then we define x_{s_1} := x_{s_0} + 2^{−(n_e+2)}, y_{s_1} := y_{s_0} + 2^{−(n_e+2)} + 3 · 2^{−(m_e+1)} and J := I^2_e. In this case, we have z_{s_1} := x_{s_1} − y_{s_1} = z_{s_0} − 3 · 2^{−(m_e+1)} = z_0 and hence z_{s_1} ∈ J. This can happen at most j times if (ϕ_i(s))_s converges j-effectively. Thus we have lim_s z_s ≠ lim_s ϕ_i(s) and R_e is satisfied. To satisfy all the requirements, we apply a finite injury priority construction.
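The single-requirement strategy from the proof of Theorem 9 is easy to simulate. The sketch below is an illustration only, with a hypothetical list phi standing in for the opponent (ϕ_e(s))_s: x sits at the midpoint of the first or third quarter of I_e and swaps quarters each time the opponent's value lands in the quarter currently occupied.

from fractions import Fraction

def diagonalize(a, n_e, phi):
    # I_e = [a, a + 2^-n_e], split into four quarters of length q;
    # x oscillates between the midpoints of quarters 1 and 3
    q = Fraction(1, 2 ** (n_e + 2))
    midpoint = {1: a + Fraction(3, 2) * q, 3: a + Fraction(7, 2) * q}
    quarter, xs = 1, []
    for value in phi:
        lo = a + quarter * q
        if lo <= value < lo + q:       # opponent entered our quarter: swap 1 <-> 3
            quarter = 4 - quarter
        xs.append(midpoint[quarter])
    return xs

phi = [Fraction(3, 8), Fraction(7, 8), Fraction(7, 8)]  # hypothetical opponent
print(diagonalize(Fraction(0), 0, phi))  # x ends away from lim phi

If the opponent converges f-effectively, it can force at most f(n_e) swaps, so (x_s) makes at most f(n_e) + 1 ≤ g(n_e) jumps inside I_e, as the proof requires.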
References

1. K. Ambos-Spies, K. Weihrauch, and X. Zheng. Weakly computable real numbers. Journal of Complexity, 16(4):676–690, 2000.
2. A. J. Dunlop and M. B. Pour-El. The degree of unsolvability of a real number. In J. Blanck, V. Brattka, and P. Hertling, editors, Computability and Complexity in Analysis (CCA 2000, Swansea, UK, September 2000), volume 2064 of LNCS, pages 16–29. Springer, Berlin, 2001.
3. Y. L. Ershov. A certain hierarchy of sets I, II, III (Russian). Algebra i Logika, 7(1):47–73, 1968; 7(4):15–47, 1968; 9:34–51, 1970.
4. C.-K. Ho. Relatively recursive reals and real functions. Theoretical Computer Science, 210:99–120, 1999.
5. K.-I. Ko. Complexity Theory of Real Functions. Progress in Theoretical Computer Science. Birkhäuser, Boston, 1991.
6. J. Myhill. Criteria of constructibility for real numbers. The Journal of Symbolic Logic, 18(1):7–10, 1953.
7. R. Rettinger, X. Zheng, R. Gengler, and B. von Braunmühl. Weakly computable real numbers and total computable real functions. In Proceedings of COCOON 2001, Guilin, China, August 20–23, 2001, volume 2108 of LNCS, pages 586–595. Springer, 2001.
8. H. G. Rice. Recursive real numbers. Proceedings of the American Mathematical Society, 5:784–791, 1954.
9. R. M. Robinson. Review of "Péter, R., Rekursive Funktionen". The Journal of Symbolic Logic, 16:280–282, 1951.
10. J. R. Shoenfield. On degrees of unsolvability. Annals of Mathematics (2), 69:644–653, 1959.
11. R. Soare. Cohesive sets and recursively enumerable Dedekind cuts. Pacific Journal of Mathematics, 31:215–231, 1969.
12. R. I. Soare. Recursively Enumerable Sets and Degrees. A Study of Computable Functions and Computably Generated Sets. Perspectives in Mathematical Logic. Springer-Verlag, Berlin, 1987.
13. E. Specker. Nicht konstruktiv beweisbare Sätze der Analysis. The Journal of Symbolic Logic, 14(3):145–158, 1949.
14. A. M. Turing. On computable numbers, with an application to the "Entscheidungsproblem". Proceedings of the London Mathematical Society, 42(2):230–265, 1936.
15. X. Zheng. Recursive approximability of real numbers. Mathematical Logic Quarterly, 48(Suppl. 1):131–156, 2002.
16. X. Zheng. On the Turing degrees of weakly computable real numbers. Journal of Logic and Computation, 13(2):159–172, 2003.
Author Index

Àlvarez, C. 142
Amano, Kazuyuki 152
Ambos-Spies, Klaus 162
Anantharaman, Siva 169
Ausiello, G. 179
Baba, Kensuke 189
Banderier, Cyril 198
Bannai, Hideo 208
Bazgan, C. 179
Beier, René 198
Benkoczi, Robert 218
Bhattacharya, Binay 218
Blanchard, F. 228
Blesa, M. 142
Bodlaender, Hans L. 239
Böhler, Elmar 249
Bonsma, Paul S. 259
Boreale, Michele 269, 279
Brosenne, Henrik 290
Brueggemann, Tobias 259
Bucciarelli, Antonio 300
Buhrman, Harry 1
Buscemi, Maria Grazia 269
Carton, Olivier 308
Černá, Ivana 318
Cervelle, J. 228
Chen, Hubie 328, 338
Chen, Zhi-Zhong 348
Chrobak, Marek 218
Crochemore, M. 622
Dalmau, Victor 358
Dang, Zhe 480
Delhommé, Christian 378
Demange, M. 179
Díaz, J. 142
Duval, Jean-Pierre 388
Egecioglu, Omer 480
Epstein, Leah 398, 408
Feldmann, R. 21
Fellows, Michael R. 239
Fernández, A. 142
Ford, Daniel K. 358
Formenti, E. 228
Friedl, Katalin 419
Gadducci, Fabio 279
Gairing, M. 21
Gastin, Paul 429, 439
Gengler, Romain 681
Geser, Alfons 449
Glaßer, Christian 249
Gorrieri, Roberto 46
Gramlich, Gregor 460
Grossi, R. 622
Hagiwara, Masayuki 490
Hannay, Jo 68
Hliněný, Petr 470
Hofbauer, Dieter 449
Homeister, Matthias 290
Ibarra, Oscar H. 480
Inenaga, Shunsuke 208
Ishii, Toshimasa 490
Katsumata, Shin-ya 68
Knapik, Teodor 378
Kolpakov, Roman 388
Kouno, Mitsuharu 348
Krysta, Piotr 500
Kucherov, Gregory 388
Kumar, K. Narayan 429
Kutylowski, Miroslaw 511
Larmore, Lawrence L. 218
Lasota, Slawomir 521
Lecroq, Thierry 388
Lefebvre, Arnaud 388
Leporati, Alberto 92
Letkiewicz, Daniel 511
Löding, Christof 531
Loyer, Yann 541
Lücking, Thomas 21, 551
Luttik, Bas 562
Magniez, Frédéric 419
Marco, Gianluca De 368
Martinelli, Fabio 46
Martínez, Conrado 572
Maruoka, Akira 152
Mauri, Giancarlo 92
Mavronicolas, Marios 551
Meer, K. 582
Meghini, Carlo 592
Mehlhorn, Kurt 198
Meister, Daniel 249
Merkle, Wolfgang 602
Miltersen, Peter Bro 612
Molinero, Xavier 572
Monien, Burkhard 21, 551
Mukund, Madhavan 429
Narendran, Paliath 169
Oddoux, Denis 439
Paschos, V. Th. 179
Pelánek, Radek 318
Pelc, Andrzej 368
Pinchinat, Sophie 642
Pisanti, N. 622
Radhakrishnan, Jaikumar 612
Reimann, Jan 602
Reith, Steffen 632
Rettinger, Robert 681
Riedweg, Stéphane 642
Rode, Manuel 21, 551
Rohde, Philipp 531
Röhrig, Hein 1
Rusinowitch, Michael 169
Rychlik, Marcin 652
Rytter, Wojciech 218
Sagot, M.-F. 622
Salehi, Saeed 662
Salibra, Antonino 300
Sanders, Peter 500
Sannella, Donald 68
Santha, Miklos 419
Saxena, Gaurav 480
Sen, Pranab 419
Serna, M. 142
Shinohara, Ayumi 189, 208
Spirakis, Paul 551
Spyratos, Nicolas 592
Straccia, Umberto 541
Takeda, Masayuki 189, 208
Tassa, Tamir 408
Thilikos, Dimitrios M. 239
Thomas, D. Gnanaraj 378
Thomas, Wolfgang 113
Tsuruta, Satoshi 189
Tzitzikas, Yannis 592
Vöcking, Berthold 500
Vrto, Imrich 551
Waack, Stephan 290
Waldmann, Johannes 449
Wegener, Ingo 125, 612
Woeginger, Gerhard J. 259
Woelfel, Philipp 671
Zheng, Xizhong 681