Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2827
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Andreas Albrecht Kathleen Steinhöfel (Eds.)
Stochastic Algorithms: Foundations and Applications Second International Symposium, SAGA 2003 Hatfield, UK, September 22-23, 2003 Proceedings
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Andreas Albrecht University of Hertfordshire Computer Science Department Hatfield, Herts AL10 9AB, UK E-mail:
[email protected] Kathleen Steinhöfel FIRST - Fraunhofer Institute for Computer Architecture and Software Engineering 12489 Berlin, Germany E-mail:
[email protected] Cataloging-in-Publication Data applied for A catalog record for this book is available from the Library of Congress. Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at .
CR Subject Classification (1998): F.2, F.1.2, G.1.2, G.1.6, G.2, G.3 ISSN 0302-9743 ISBN 3-540-20103-3 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2003 Printed in Germany Typesetting: Camera-ready by author, data conversion by Olgun Computergrafik Printed on acid-free paper SPIN: 10954935 06/3142 543210
Preface
The Second Symposium on Stochastic Algorithms, Foundations and Applications (SAGA 2003) took place on September 22–23, 2003, in Hatfield, England. The present volume comprises 12 contributed papers and 3 invited talks.

The contributed papers included in the proceedings present results in the following areas: ant colony optimization; randomized algorithms for the intersection problem; local search for constraint satisfaction problems; randomized local search methods for combinatorial optimization, in particular simulated annealing techniques; probabilistic global search algorithms; network communication complexity; open shop scheduling; aircraft routing; traffic control; randomized straight-line programs; and stochastic automata and probabilistic transformations.

The invited talk by Roland Kirschner provides a brief introduction to quantum informatics and addresses the requirements and prospects of the physical implementation of a quantum computer. Lucila Ohno-Machado and Winston P. Kuo describe the factors that make the analysis of high-throughput gene expression data especially challenging, and indicate why properly evaluated stochastic algorithms can play a particularly important role in this process. John Vaccaro et al. review a fundamental element of quantum information theory, source coding, which entails the compression of quantum data; a recent experiment demonstrating this fundamental principle is presented and discussed.

Our special thanks go to all who supported SAGA 2003: to all authors who submitted papers, to the members of the program committee, to the invited speakers, and to the members of the organizing committee.
Andreas Albrecht
Kathleen Steinhöfel
Organization
SAGA 2003 was organized by the University of Hertfordshire, Department of Computer Science, Hatfield, Hertfordshire AL10 9AB, United Kingdom.
Organization Committee

Andreas Albrecht
Mickael Hammar
Kathleen Steinhöfel
Sally Ensum
Georgios Lappas
Program Committee

Andreas Albrecht (Chair, University of Hertfordshire, UK)
Luisa Gargano (Salerno University, Italy)
Juraj Hromkovič (RWTH Aachen, Germany)
Oktay Kasim-Zade (Moscow State University, Russia)
Roland Kirschner (University of Leipzig, Germany)
Michael Kolonko (TU Clausthal-Zellerfeld, Germany)
Frieder Lohnert (DaimlerChrysler AG, Germany)
Lucila Ohno-Machado (Harvard University, USA)
Christian Scheideler (Johns Hopkins University, USA)
Jonathan Shapiro (Manchester University, UK)
Gregory Sorkin (IBM Research, NY, USA)
Kathleen Steinhöfel (FhG FIRST, Germany)
John Vaccaro (University of Hertfordshire, UK)
Lusheng Wang (City University, Hong Kong)
Peter Widmayer (ETH Zürich, Switzerland)
CK Wong (The Chinese University, Hong Kong)
Thomas Zeugmann (Medical University of Lübeck, Germany)
Table of Contents
Prospects of Quantum Informatics (Invited Talk) ......................... 1
Roland Kirschner

A Converging ACO Algorithm for Stochastic Combinatorial Optimization .... 10
Walter J. Gutjahr

Optimality of Randomized Algorithms for the Intersection Problem ........ 26
Jérémy Barbay

Stochastic Algorithms for Gene Expression Analysis (Invited Talk) ....... 39
Lucila Ohno-Machado and Winston Patrick Kuo

Analysis of a Randomized Local Search Algorithm for LDPCC Decoding Problem ... 50
Osamu Watanabe, Takeshi Sawai, and Hayato Takahashi

Testing a Simulated Annealing Algorithm in a Classification Problem ..... 61
Karsten Luebke and Claus Weihs

Global Search through Sampling Using a PDF .............................. 71
Benny Raphael and Ian F.C. Smith

Simulated Annealing for Optimal Pivot Selection in Jacobian Accumulation ... 83
Uwe Naumann and Peter Gottschling

Quantum Data Compression (Invited Talk) ................................. 98
John A. Vaccaro, Yasuyoshi Mitsumori, Stephen M. Barnett, Erika Andersson,
Atsushi Hasegawa, Masahiro Takeoka, and Masahide Sasaki

Who's The Weakest Link? ................................................ 108
Nikhil Devanur, Richard J. Lipton, and Nisheeth Vishnoi

On the Stochastic Open Shop Problem .................................... 117
Roman A. Koryakin

Global Optimization – Stochastic or Deterministic? ..................... 125
Mike C. Bartholomew-Biggs, Steven C. Parkhurst, and Simon P. Wilson

Two-Component Traffic Modelled by Cellular Automata:
Imposing Passing Restrictions on Slow Vehicles Increases the Flow ...... 138
Paul Baalham and Ole Steuernagel

Average-Case Complexity of Partial Boolean Functions ................... 146
Alexander Chashkin
Classes of Binary Rational Distributions Closed under Discrete Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Roman Kolpakov
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Prospects of Quantum Informatics

Roland Kirschner

Institut für Theoretische Physik, Univ. Leipzig, D-04109 Leipzig, Germany
[email protected] Abstract. A minimal introduction for non-physicists to basic notions of quantum physics is given with respect to quantum informatics. The requirements and the prospects of the physical implementation of a quantum computer are reviewed. Keywords: Quantum systems, qubits, quantum entanglement, decoherence.
1 Introduction

Modern computer technology relies essentially on the understanding of the microstructure of matter gained in the 20th century by quantum physics. The integration of thousands of transistor elements in one chip has led to an enormous increase in computer capacities. Still, much effort is being spent on pushing microelectronics further, to smaller and more compact structures.

Information is physical, i.e. the storage and processing of information uses physical systems which can exist in several states in a controllable way. The classical case is that of systems decomposable into subsystems, called bits, each of which allows for just two distinguished states, |0), |1). The real physical system has many more degrees of freedom, allowing for a variety of states besides the ones used for information processing. The latter may be the direction of a current or the magnetization on a small piece of the chip. The basic physical theory describing the real system is quantum theory; the classical theory is a limiting case, which applies to the mentioned bit states.

The exciting idea of using the quantum features of physical systems for information storage and processing has been attracting much interest in recent years. For extensive introductions we refer to [1–3]. More details on the relevant physics can be found in the collection of articles [4].
2 Quantum Theory and Informatics

2.1 Quantum States

The states of a quantum degree of freedom are described quite differently from the classical ones. In particular there is no case with just two distinguished states. The smallest non-trivial space of states, in our context referred to as a qubit, is represented by the set of one-dimensional subspaces of the two-dimensional complex space $\mathcal{H}_1 = \mathbb{C}^2$, spanned by the basis vectors $|0\rangle$, $|1\rangle$. The states of two qubits are accommodated in the tensor product $\mathcal{H}_1 \otimes \mathcal{H}_1$, spanned by the products $|0,0\rangle = |0\rangle_1 \otimes |0\rangle_2$, $|0,1\rangle$, $|1,0\rangle$, $|1,1\rangle$. A mathematically trivial remark has essential consequences in
physics and therefore in quantum informatics: a system of two (or more) qubits can exist in states represented by a vector that is not decomposable into products of vectors referring to the separate qubits. Such indecomposable states are called entangled. The basis of maximally entangled two-qubit states is given by

$\psi_\pm = \tfrac{1}{\sqrt{2}}\,(|0,1\rangle \pm |1,0\rangle), \qquad \phi_\pm = \tfrac{1}{\sqrt{2}}\,(|0,0\rangle \pm |1,1\rangle).$   (1)
The peculiar correlations between qubits in entangled states are understood as the essential resource behind the advantages of quantum informatics. We shall give illustrations in the following. The other side of the coin will also be discussed: the main difficulties of quantum informatics are related to this basic feature of quantum entanglement as well.

2.2 Stochastic and Quantum Computations

There is a view of the features of quantum informatics which experts concerned with stochastic algorithms may find suitable [5]. In a classical deterministic computation the elementary unit of information is the bit, being either in state |0) or in state |1). n bits represent $2^n$ distinct states; let us look at them as unit basis vectors in a $2^n$-dimensional vector space. Operations are performed by Boolean gates, and a sequence of gates forms a circuit. A few elementary gates, acting on one bit or on a pair of bits, are enough as building blocks for all Boolean operations. Typically the registers for input and output have more bits than are actually used for the computation. Including passive (ancilla) bits, the gates can be made time-reversible, i.e. such that the input values can be recovered from the output ones. In this scheme stochastic computation is modeled by adding a stochastic gate acting on one bit,

$\begin{pmatrix} |0) \\ |1) \end{pmatrix} \;\to\; \begin{pmatrix} p & 1-p \\ q & 1-q \end{pmatrix} \begin{pmatrix} |0) \\ |1) \end{pmatrix}.$   (2)
The entries of the stochastic matrix are probabilities, i.e. non-negative real numbers whose sum in each row equals 1. The resulting stochastic states of a bit are, instead of just two distinct ones, linear superpositions of the type $p\,|0) + (1-p)\,|1)$, which for $0 \le p \le 1$ describe a line segment in the two-dimensional space $P_1$. A multi-bit state then lies in the positive simplex of the vector space $P_n$ spanned by the original $2^n$ values of the $n$ bits,

$|\alpha) = \sum_{x \in \{0,1\}^n} p_x\,|x), \qquad 0 \le p_x \le 1.$   (3)

The allowed operations are those preserving this positive simplex; in particular they preserve the $\ell_1$ norm, $\sum_x |p_x| = 1$.
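To make the stochastic-bit picture concrete, here is a minimal Python sketch (illustrative only, not part of the original text; the gate entries and the initial distribution are arbitrary choices) applying the one-bit stochastic gate (2) to a probability vector and checking that the l1 norm is preserved.

```python
import numpy as np

# One-bit stochastic gate as in (2): each row holds probabilities summing to 1.
p, q = 0.8, 0.3
gate = np.array([[p, 1 - p],
                 [q, 1 - q]])

# A stochastic bit state: a point (p0, p1) in the positive simplex, p0 + p1 = 1.
state = np.array([1.0, 0.0])          # the bit is certainly in state |0)

# The new distribution mixes the rows of the gate according to the old one.
new_state = gate.T @ state

print(new_state)        # [0.8, 0.2]: |0) with probability p, |1) with 1 - p
print(new_state.sum())  # 1.0 -- the l1 norm is preserved
```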
This scheme now allows a formal passage to quantum computation: replace the two-dimensional real space, suitable for describing the states of a stochastic bit, by the complex two-dimensional space $\mathcal{H}_1$. Vectors differing by a non-zero complex factor represent one and the same state. The $2^n$ distinct $n$-bit states are replaced by the $2^n$ basis vectors of the $n$-qubit system, called the computational basis. A generic state vector in $\mathcal{H}_n$ has the form

$|\alpha\rangle = \sum_{x \in \{0,1\}^n} a_x\,|x\rangle.$   (4)

Allowed operations are represented by unitary operators acting on the state vector and preserving the $\ell_2$ norm, $\sum_x |a_x|^2 = 1$. All time-reversible Boolean gates have quantum counterparts, which perform the analogous operation on the basis elements $|01\ldots 10\rangle$ by a unitary operator. Formally similar to the stochastic one-bit gate (2) is the following quantum gate,

$\begin{pmatrix} |0\rangle \\ |1\rangle \end{pmatrix} \;\to\; \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} |0\rangle \\ |1\rangle \end{pmatrix}.$   (5)

All operations can be composed out of a few elementary gates, e.g. the rotation (5) with the particular angles $\phi = \pi/4,\ \pi/8$, together with one two-qubit gate, e.g. the quantum CNOT gate [6],

$|x_1\rangle \otimes |x_2\rangle \;\to\; |x_1\rangle \otimes |x_1 \oplus x_2\rangle, \qquad x_{1,2} \in \{0,1\},$   (6)

where $\oplus$ stands for addition modulo 2.
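As an illustrative aside (not taken from the original text), the rotation gate (5) and the CNOT gate (6) can be written directly as unitary matrices and applied to computational basis states with numpy; the chosen angle and input state are arbitrary.

```python
import numpy as np

def rotation(phi):
    """One-qubit gate (5): rotates the two real amplitude components by phi."""
    return np.array([[np.cos(phi),  np.sin(phi)],
                     [-np.sin(phi), np.cos(phi)]])

# CNOT (6) in the two-qubit basis |00>, |01>, |10>, |11>:
# the second qubit is flipped exactly when the first qubit is |1>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

ket0 = np.array([1.0, 0.0])

# Rotate qubit 1 into an equal-weight superposition, leave qubit 2 in |0>.
state = np.kron(rotation(np.pi / 4) @ ket0, ket0)

# CNOT turns this product state into a maximally entangled one,
# with all weight on |00> and |11> (proportional to |00> - |11>).
entangled = CNOT @ state
print(entangled)                  # [0.707, 0, 0, -0.707]
print(np.linalg.norm(entangled))  # 1.0 -- the l2 norm is preserved
```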
2.3 States of Open Systems

Only in the ideal situation of a quantum system isolated from its environment, or if the influence of the environment can be described in a classical approximation, is the representation of states as lines in a complex space $\mathcal{H}$ applicable. States of open quantum systems are instead described by density matrices, i.e. positive Hermitean operators $\hat\rho$ on $\mathcal{H}$ with trace 1, or equivalently positive Hermitean forms. The data encoded in the density matrix are equivalent to fixing an orthonormal basis $|\psi_i\rangle$ in $\mathcal{H}_n$ plus a point $\{p_i\}$ in the positive simplex $P_n$,

$\hat\rho = \sum_{i=1}^{2^n} p_i\,|\psi_i\rangle\langle\psi_i|.$   (7)

Here $\langle\psi_i|$ is the Dirac symbol for the dual basis, and $|\psi_i\rangle\langle\psi_i|$ denotes the projector onto the one-dimensional subspace determined by $|\psi_i\rangle$. The extreme case $p_{i_0} = 1$, $p_i = 0$ for $i \neq i_0$, corresponds to the states discussed before; such states are called pure and can be represented by the vector $|\psi_{i_0}\rangle$. Generic states are called mixed. The density matrix proportional to the unit operator represents the maximally mixed state. The von Neumann entropy characterizes the degree of mixing,

$S_{vN} = -\mathrm{tr}(\hat\rho \ln \hat\rho) = -\sum_i p_i \ln p_i.$   (8)
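For concreteness, the von Neumann entropy (8) can be computed from the eigenvalues of a density matrix; the following short sketch (illustrative only, not part of the original text) checks the two extreme cases of a pure and a maximally mixed qubit.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S_vN = -tr(rho ln rho), computed from the eigenvalues p_i of rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]          # 0 * ln 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

pure = np.array([[1.0, 0.0],
                 [0.0, 0.0]])          # pure state |0><0|
maximally_mixed = np.eye(2) / 2        # rho = I/2

print(von_neumann_entropy(pure))             # 0.0
print(von_neumann_entropy(maximally_mixed))  # ln 2 ~ 0.693
```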
Operations $\rho \to T\rho$ on the state represented by the density matrix should preserve the trace and the positivity property. There is a stronger condition on $T$, complete positivity, demanding that the extensions of $T$ to an operator on $\mathcal{H}_n \otimes \mathcal{H}_a$ by $T \otimes I_a$ preserve positivity. Unitary operators on state vectors act on the density matrix as $T_U\rho = U^+ \rho\, U$. They obey all these conditions and they do not increase the degree of mixing $S_{vN}$.

2.4 On the Virtues of Entanglement

Consider the computation of a function $f(x)$ on the positive integers $x \le 2^n$, with integer values in the same range [7]. It takes two $n$-qubit systems $\mathcal{H}_n$ for input and output. Assume that an algorithm computing $f$ has been implemented as a quantum circuit performing a unitary transformation on the $n$-qubit quantum states, such that the action on a computational basis vector results in another basis vector, $|f(x)\rangle = U_f|x\rangle$. Instead of performing this operation successively on all $2^n$ basis vectors, one operation on the entangled tensor product state in $\mathcal{H}_n \otimes \mathcal{H}_n$,

$(I \otimes U_f)\; 2^{-n/2} \sum_{x \in \{0,1\}^n} |x\rangle \otimes |x\rangle \;=\; 2^{-n/2} \sum_{x \in \{0,1\}^n} |x\rangle \otimes |f(x)\rangle,$   (9)
leads to a state that in principle contains the result. To extract the value of the function for a particular $x_0$, further unitary transformations have to be applied, iteratively enhancing the amplitude of the corresponding basis state $|x_0\rangle \otimes |f(x_0)\rangle$ at the expense of the other amplitudes. This enhancement procedure is part of Grover's search algorithm [8].

Consider the task of finding, out of $2^n$ objects, one of a special type, and a function $f$ of the object type $t$ taking the value 0 for all types except the special one $\tilde t$ of interest, for which its value is 1. Put the objects in some arbitrary order $t_x$, $0 \le x < 2^n$. Assume that this function $f$ has been implemented in a quantum circuit such that

$U_f\,|x\rangle = (-1)^{f(t_x)}\,|x\rangle.$   (10)

The search algorithm starts from the uniform superposition state

$2^{-n/2} \sum_{x \in \{0,1\}^n} |x\rangle.$   (11)

The application of the above $U_f$ just marks the contribution of the wanted item in this coherent sum by a sign. In the next step the operation acting on the basis as

$|x\rangle \;\to\; \sum_y (D_n)_{xy}\,|y\rangle, \qquad x, y \in \{0,1\}^n,$   (12)

is applied to the result, where the unitary matrix $D_n$ is

$(D_n)_{xy} = \frac{2}{2^n} - \delta_{xy}.$   (13)

These two operations, $D_n U_f$, are then applied repeatedly, enhancing in this way the amplitude $a_{x_0}$ of the wanted item; its order number $x_0$ can then be read off by a quantum measurement.
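The amplitude enhancement can be watched directly by iterating $D_n U_f$ on the vector of amplitudes; the following simulation is an illustrative sketch (not part of the original text), with the number of qubits and the marked index chosen arbitrarily.

```python
import numpy as np

n = 5                        # number of qubits
N = 2 ** n
marked = 13                  # order number x0 of the wanted item (arbitrary)

# Start from the uniform superposition (11).
amp = np.full(N, 1 / np.sqrt(N))

# D_n from (13): (D_n)_{xy} = 2/2^n - delta_{xy}.
D = 2 / N * np.ones((N, N)) - np.eye(N)

# Roughly (pi/4) * sqrt(N) iterations maximize the marked amplitude.
for _ in range(int(np.pi / 4 * np.sqrt(N))):
    amp[marked] *= -1        # U_f from (10): flip the sign of the marked item
    amp = D @ amp            # diffusion step (12)

print(np.argmax(np.abs(amp)))     # 13
print(np.abs(amp[marked]) ** 2)   # probability close to 1
```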
The information contained in a classical bit can be encoded in a qubit by mapping the bit values 0, 1 to the basis vectors $|0\rangle$, $|1\rangle$, respectively. It is possible, however, to store and to transfer the information of up to 2 bits with one qubit [9]. This takes a second, passive qubit entangled with the first one. Let the two-qubit system be initially in the entangled state $\psi_-$ of (1) and assume that the qubits are carried by two particles which can be moved independently. Transfer first particle 1 to the sender and particle 2 to the receiver. The sender then performs unitary operations on the qubit state of particle 1. He applies $U_1^{(1)}$, $U_1^{(2)}$, acting on the computational basis $(|0\rangle_1, |1\rangle_1)^T$ of qubit 1 by matrix multiplication, with

$U_1^{(1)} = \begin{pmatrix} e^{-i\pi/2} & 0 \\ 0 & e^{i\pi/2} \end{pmatrix}, \qquad U_1^{(2)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$   (14)

The two-qubit state changes, by applying either $U_1^{(1)}$ or $U_1^{(2)}$ or $U_1^{(1)}U_1^{(2)}$, from $\psi_-$ to either $\psi_+$ or $\phi_-$ or $\phi_+$. In this way the sender is able to produce 4 different states by manipulating qubit 1 only. Transferring particle 1 now also to the receiver and performing measurements on the two-particle system there, the receiver can read off which of the 4 options the sender has chosen. Since the sender did not interact with the second particle carrying the auxiliary qubit 2, the result is indeed that two bits have been transferred by sending only one particle carrying a single qubit.
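To make the dense-coding step explicit, the matrices in (14) can be applied numerically to the first qubit of $\psi_-$ and the resulting Bell state identified; this is an illustrative check, not part of the original text.

```python
import numpy as np

s2 = np.sqrt(2)
# Two-qubit basis order: |00>, |01>, |10>, |11>.
psi_minus = np.array([0, 1, -1, 0]) / s2
psi_plus  = np.array([0, 1,  1, 0]) / s2
phi_minus = np.array([1, 0, 0, -1]) / s2
phi_plus  = np.array([1, 0, 0,  1]) / s2

U1 = np.diag([np.exp(-1j * np.pi / 2), np.exp(1j * np.pi / 2)])  # U_1^(1) in (14)
U2 = np.array([[0, 1], [1, 0]])                                   # U_1^(2) in (14)
I2 = np.eye(2)

def overlap(a, b):
    """|<a|b>|: agreement of two states up to a global phase."""
    return abs(np.vdot(a, b))

# Acting on qubit 1 only (the sender's particle):
print(overlap(psi_plus, np.kron(U1, I2) @ psi_minus))        # 1.0 -> psi_+
print(overlap(phi_minus, np.kron(U2, I2) @ psi_minus))       # 1.0 -> phi_-
print(overlap(phi_plus, np.kron(U1 @ U2, I2) @ psi_minus))   # 1.0 -> phi_+
```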
2.5 On the Drawbacks of Entanglement

Only a part of the degrees of freedom of the real physical system is used for information processing in the role of bits or qubits. One cannot get rid of the remaining degrees of freedom, the environment, and one cannot completely eliminate their influence on the degrees of freedom of interest. In the classical case it is known how to deal with the effects of noise and heat production and how to control possible errors caused by these effects. In the quantum case the unavoidable extra degrees of freedom result, in addition, in a much more problematic effect: decoherence.

The mechanism of decoherence can be understood as the result of the evolution of the whole system, the qubit system plus the environment, starting from a disentangled state, where the state vectors of the qubit subsystem and of the environment enter as factors, to an entangled state. The state of the qubit subsystem is then described by the projection of this entangled state onto the subsystem. The result is a mixed state of the qubits, described by a density matrix as discussed in Section 2.3. In a large environment the state of the qubit subsystem becomes maximally mixed after a short time, called the coherence time. In this way all quantum features of the qubit system are washed out; in particular, any trace of entanglement between qubits is lost. The maximally mixed state of a qubit can be considered as the uniform probability distribution on the unit sphere and does not carry any information, like the stochastic bit in the unbiased state with $p_0 = p_1 = \tfrac{1}{2}$.

There is some analogy of decoherence to heat dissipation. Put a system in a heat bath (an environment in thermal equilibrium at a given temperature). The (weak)
interaction between system and bath drives the system towards thermal equilibrium with the bath within the relaxation time. Typically, coherence times are much shorter than relaxation times.

Let us return to the ideal situation of pure qubit states (4). Theoretically, a measuring apparatus refers to a basis. Assume that we are given a measuring device adapted to the computational basis. Then the measurement of the qubit system in a generic state $|\alpha\rangle$ results in the value $x \in \{0,1\}^n$ with probability $|a_x|^2$. As the effect of the measurement with result $x$, the state of the qubit system reduces to $|x\rangle$. This means that in the generic case only repeated measurements on an ensemble of qubit systems, equally prepared to be in state $|\alpha\rangle$, allow one to extract the values $|a_x|^2$ for all $x$. A complete reconstruction of $|\alpha\rangle$ from the measurement results is impossible even in this stochastic sense. The situation is comfortable if we know that the system is in one of the basis states, i.e. that only one $a_{x_0}$ is non-vanishing. Then the measurement adapted to this basis tells us the value of $x_0$ with probability 1. Therefore quantum algorithms should end up in states of the computational basis. The peculiarities of the measurement process are understood as the result of the interaction of the (macroscopic) apparatus with the system, producing an overall maximally entangled state.
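As a small numerical illustration of these measurement statistics (not part of the original text; the state is an arbitrary example), repeated measurements estimate the probabilities $|a_x|^2$ but reveal nothing about the phases of the amplitudes:

```python
import numpy as np

rng = np.random.default_rng(0)

# A generic 2-qubit state |alpha> with amplitudes a_x, normalized to l2 norm 1.
a = np.array([0.8, 0.4j, -0.4, 0.2])
a = a / np.linalg.norm(a)

probs = np.abs(a) ** 2                       # measurement probabilities |a_x|^2
samples = rng.choice(len(a), size=10_000, p=probs)

# Repeated measurements on equally prepared systems estimate |a_x|^2 ...
print(np.bincount(samples, minlength=len(a)) / len(samples))
# ... but the phases of a_x (here the factors 1j and -1) stay invisible.
```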
3 Problems of the Physical Implementation

3.1 Basic Requirements for a Quantum Computer

The basic requirements for the physical implementation of a quantum computer have been formulated by DiVincenzo [10]. These requirements seem to be generally accepted, and they are usually taken as the guideline in discussing the prospects of particular proposals. The physicist's vocabulary used in formulating the requirements has been explained above.

1. A scalable physical system with well characterized qubits.
2. The ability to initialize the state of the qubit system to a simple fiducial state, e.g. to the one with the factorized state vector where all qubit vectors are the basis vector $|0\rangle$.
3. Long relevant coherence times, much longer than the gate operation time.
4. A universal set of quantum gates.
5. A qubit-specific measurement capability.
6. The ability to interconvert stationary and flying qubits.
7. The ability to faithfully transmit flying qubits.

The latter two requirements are the ones for quantum communication. This would allow the exchange of information in quantum form, i.e. without intermediate conversion into the classical form and back.
3.2 The Critical View

A straightforward comparison of contemporary experimental abilities with a quantum computer meeting the above requirements leads to a pessimistic conclusion [11]. To justify the effort, the computing capacity of the new device should be comparable with that of existing ones. The proposed quantum error correction codes [12, 13] promise to compensate errors on the level of $10^{-5}$ per qubit and per gate. This would take about 10 ancilla qubits per working qubit for beating these potential errors. This leads to an estimate of $n \approx 10^5$ qubits for doing useful computations with error control.

If spins of electrons or nuclei are used as qubits, then initialization (requirement 2) could be done by cooling (below 100 mK) and by applying a strong (1 T) magnetic field. It is questionable whether this can be done with a precision of $10^{-5}$ (the probability of any of the qubits not being in the initial state $|0\rangle$). Requirements 4 and 5 mean manipulating and measuring the quantum state of $n = 10^5$ spins, i.e. switching on and off about $n^2$ pairwise interactions of the spins at each operation of the computing algorithm.

The most essential problem is that of coherence (requirement 3). To keep coherence during a sequence of controlled interactions of $10^5$ qubits seems to be a requirement that cannot be achieved in the foreseeable future. In any case one cannot expect to achieve this by scaling up the present experiments, which are concerned with the coherence of up to 7 qubits.

Most theoretical treatments of proposed qubit systems and of realizations of gates are presented in the formulation of closed systems. This ignores decoherence or assumes it to be a small correction. A physically realistic treatment of qubit systems should be formulated in the language of open systems, describing the states by density matrices and the gates by operators in the more general class beyond the unitary ones. In this way decoherence can be included inherently in the theoretical model.

In the critical note [11] it has been proposed to study a qubit system in the opposite extreme of small coherence, i.e. in almost maximally mixed states. The author refers to the classical limit of the quantum system. The classical limit of a qubit system would be one of classical spins, the states of each of them represented by points on the unit sphere. On the other hand, the maximally mixed qubit state can be considered as a uniform probability distribution on the unit sphere. This is not a classical spin state, which on the contrary corresponds to one distinguished point on this sphere. From this we conclude that the classical spin system does not tell us much about the situation of small coherence.

3.3 The Optimistic View

In recent years a number of experiments with different types of physical systems have demonstrated the controlled evolution of a few qubit-like quantum degrees of freedom. Actually, quantum coherence phenomena have been a topic in quantum optics since the laser was invented. The improvement of experimental techniques, e.g. ion and atom traps, high-quality microcavities, and the advanced technology of nuclear magnetic resonance, is the basis of the new coherence experiments. Implementations of qubit-like systems and quantum gate operations have been proposed, discussed and experimentally investigated in different physical situations, among them ions trapped in a
clever setup of static and oscillating electromagnetic fields, microcavities at optical and microwave frequencies interacting with atoms, electron and nuclear spin states at donors in silicon, nuclear spins bound in molecules and controlled by the nuclear magnetic resonance technique, and charge states of superconductors or flux states of superconducting circuits. This experimental progress is the real source of optimism about quantum informatics.

Several schemes of quantum communication have been demonstrated experimentally with the qubits implemented as the polarization states of photons, in particular the dense coding mechanism described theoretically above [14]. Two entangled photon polarizations have also been used to implement secure quantum cryptography and quantum teleportation, i.e. the transmission of an (unknown) qubit state without converting the information to the classical form (compare requirements 6, 7). It is at present difficult to transform a product polarization state of photons, $|0\rangle_1 |0\rangle_2$, into an entangled one. Fortunately there are sources of entangled photon pairs. Some crystals have peculiar, highly non-linear optical properties and are able to convert a high-frequency photon into a pair of equal-frequency photons polarized orthogonally to each other. The polarization depends on the direction of emission, and the polarization states appear disentangled in general. In the particular situation of type II conversion there are special directions of emission where the photons emerge in an entangled polarization state. By letting one of the photons pass through additional birefringent plates, one compensates for the relative time delay and turns the polarization of this photon (corresponding to the transformations $U_1^{(1)}, U_1^{(2)}$ of (14)) in order to produce one of the 4 entangled basis states (1). The resulting two-photon polarization state can be detected by coincidence counters. By this technique one is able to distinguish only 3 of the 4 basis states; the states $\phi_\pm$ unfortunately result in the same coincidence signal. But this was enough to demonstrate the transport of information stored in 3 distinguished states by one photon, i.e. a channel capacity exceeding the classical one by a factor of 1.5.

Ions can be kept for some time in a small region by a superposition of static and oscillating electromagnetic fields. Several ions can be placed close to each other, each ion feeling the vibrations of its neighbors. Laser cooling allows the thermal motion to be slowed down to 1 K. By controlled laser pulses one stimulates transitions between selected ion energy levels. The subsystem of two levels ($|0\rangle$, $|1\rangle$) serves as a qubit and is carried by one ion. A third ion level with a high transition rate to and from $|0\rangle$ (but not to $|1\rangle$) can be used to detect whether the qubit is in state $|0\rangle$ and to initialize the qubit to $|0\rangle$ [15]. One-bit gates are constructed by laser-stimulated interactions between the levels $|0\rangle$ and $|1\rangle$. Two-bit gates can work with the vibrational interaction between ions. Experiments demonstrating the controlled interaction of 4 qubits have been reported [16].

The nuclear spin qubits considered in the nuclear magnetic resonance set-up reside on one molecule. Working with a macroscopic number of molecules allows the detection of coherence effects of up to 7 qubits despite the rather low coherence.

Two superconducting pieces separated by a thin oxide layer can be in discrete charge states, characterized by the number of electronic Cooper pairs in one of the pieces.
This number changes by tunneling through the layer (Josephson effect). By an applied electric field two adjacent charge states become distinguished as the ones of lowest
energy; they can form the qubit subsystem. A superconducting circuit, interrupted at one place by a thin oxide layer, can be in discrete states characterized by the units of induced magnetic flux through the circuit contour. By an applied external magnetic field two adjacent flux states are separated as the ones of lowest energy, and they can also form the qubit subsystem. In the superconducting setup the influence of degrees of freedom other than the mentioned charge and flux ones is suppressed. Coherence times can be longer, but the gate operation times are also longer here. Experiments on the controlled coupling of two charge-state qubits have been reported recently [17].

These few (and incomplete) details on some physical implementations may be sufficient to provide an illustration of the physics involved and of the challenges of quantum informatics.
References

1. Steane, A.: Quantum computing. Rep. Prog. Phys. 61 (1998) 117
2. Alber, G., et al.: Quantum Information. Springer Tracts in Modern Physics, Vol. 173. Springer, 2001
3. Preskill, J.: Lecture notes. http://www.theory.caltech.edu/people/~preskill/ph229
4. (Collection of papers): Fortschr. Phys. 48 (2000) 9-11
5. Fenner, A.: A physics-free introduction to quantum computational model. Preprint arXiv:cs.CC/0304008, http://xxx.lanl.gov/find/cs
6. Barenco, A., et al.: Phys. Rev. A 52 (1995) 3457
7. Deutsch, D.: Proc. R. Soc. London A 400 (1985) 97
8. Grover, L.: Phys. Rev. Lett. 80 (1996) 4329
9. Bennett, C.H., Wiesner, S.J.: Phys. Rev. Lett. 69 (1992) 2881
10. DiVincenzo, D.P.: In ref. [4], p. 771
11. Dyakonov, M.I.: Quantum computing: A view from the enemy camp. In: Luryi, S., Xu, J., Zaslavsky, A. (eds.): Future Trends in Microelectronics. Wiley, 2002
12. Steane, A.: Phys. Rev. Lett. 77 (1996) 793
13. Calderbank, A., Shor, P.: Phys. Rev. A 54 (1996) 1098
14. Mattle, K., Weinfurter, H., Kwiat, P.G., Zeilinger, A.: Phys. Rev. Lett. 76 (1996) 4656
15. Cirac, J.I., Zoller, P.: Phys. Rev. Lett. 74 (1995) 4091
16. Sackett, C.A., et al.: Nature 404 (2000) 256
17. Pashkin, Yu.A., et al.: Nature 421 (2003) 823
A Converging ACO Algorithm for Stochastic Combinatorial Optimization

Walter J. Gutjahr

Dept. of Statistics and Decision Support Systems, University of Vienna
[email protected] http://mailbox.univie.ac.at/walter.gutjahr/
Abstract. The paper presents a general-purpose algorithm for solving stochastic combinatorial optimization problems with the expected value of a random variable as objective and deterministic constraints. The algorithm follows the Ant Colony Optimization (ACO) approach and uses Monte-Carlo sampling for estimating the objective. It is shown that under rather mild conditions, including that of linear increment of the sample size, the algorithm converges with probability one to the globally optimal solution of the stochastic combinatorial optimization problem. Contrary to most convergence results for metaheuristics in the deterministic case, the algorithm can usually be recommended for practical application in an unchanged form, i.e., with the “theoretical” parameter schedule.

Keywords: Ant colony optimization, combinatorial optimization, convergence results, metaheuristics, Monte-Carlo simulation, stochastic optimization.
1 Introduction

In many practical applications of combinatorial optimization, a smaller or larger degree of uncertainty about the outcome of a decision must be taken into account. A well-established way to represent uncertainty is by means of a stochastic model. If this approach is chosen, the objective function of the optimization problem under consideration depends not only on the decision, but on a random influence as well; in other words, it becomes a random variable. The aim is then to optimize a specific functional of this random variable. Usually, this functional is the expected value; e.g., if the objective function represents cost, then the quantity to be minimized can be the expected cost. (Particular applications of risk theory, especially in financial engineering, also consider other functionals, e.g. the variance of the objective. We do not deal with this situation here.)

In some formally relatively simple stochastic models, the expected value of the objective function can either be represented explicitly as a mathematical expression, or at least be computed numerically to any desired degree of accuracy. Then the solution of the stochastic optimization problem is not essentially different from that of a deterministic optimization problem; the stochastic structure is hidden in the representation of the expected objective function, and exact or heuristic techniques of combinatorial optimization can be used. The situation changes if it is only possible to determine estimates of the expected objective function by means of sampling or simulation. For
example, consider the single machine total tardiness problem, an NP-hard sequencing problem (see Du and Leung [8]): n jobs have to be scheduled on a single machine, each job has a processing time and a due date, and the objective is to minimize the sum of the tardiness values, where tardiness is defined as the positive part of the difference between completion time and due date. Although the formula for the objective in the deterministic case is simple, no closed-form expression is known for the expected total tardiness in the case where the processing times are random variables (following a given joint distribution), and its numerical computation would be very complicated and time-consuming. However, a relatively straightforward approach for approximating the expected total tardiness is to draw a sample of random scenarios and to take the average total tardiness over these scenarios as an estimate. Examples of other problems where the same approach seems promising are stochastic vehicle routing problems (see, e.g., Bertsimas and Simchi-Levi [3]), emergency planning based on simulation (Bakuli and Smith [2]), facility location problems involving queuing models (Marianov and Serra [18]), project financing with uncertain costs and incomes (Norkin, Ermoliev and Ruszczynski [19]), manpower planning under uncertainty (Futschik and Pflug [9]), and activity crashing using PERT (Gutjahr, Strauss and Wagner [16]).

For the approximate solution of hard deterministic combinatorial optimization problems, several metaheuristics have been developed. One of these metaheuristics, with a currently rapidly growing number of applications, is Ant Colony Optimization (ACO), rooted in work by Dorigo, Maniezzo and Colorni [7] at the beginning of the nineties and formulated more recently as a metaheuristic by Dorigo and Di Caro [6]. Like some other metaheuristics, ACO derives its basic idea from a biological analogy; in the case of ACO, the food-searching behavior of biological ant colonies is considered as an optimization process, and from this metaphor strategies for solving a given combinatorial optimization problem by simulating walks of “artificial ants” are derived. It has been shown that certain ACO variants have the favorable property that the intermediate solutions found by the system converge to the globally optimal solution of the problem (see [12]–[14]).

The aim of the present investigation is to develop an ACO algorithm that is able to treat the more general case of a stochastic combinatorial optimization problem, using the generally applicable sampling approach described above. As in the deterministic case, guarantees on the convergence to the optimal solution are highly desirable. It turns out that such a convergence result is indeed possible for the algorithm presented here. It will be argued that, contrary to most convergence results for metaheuristics for deterministic problems, our algorithm can be recommended for practical use in an unchanged form, i.e., with the same parameter schedule as assumed for obtaining the convergence result.

Whereas for other metaheuristic approaches, extensions to stochastic problems have already been studied intensively (see, e.g., Arnold [1] for Evolutionary Strategies or Gutjahr and Pflug [15] for Simulated Annealing), research on ACO algorithms for stochastic problems is currently only at the very beginning.
An interesting first paper has been published by Bianchi, Gambardella and Dorigo [4]; it concerns the solution of the probabilistic travelling salesman problem. Nevertheless, the approach chosen in [4] is tailored to the specific problem under consideration, and it depends on the availabil-
ity of a closed-form expression of the expected objective function value. The algorithm presented here has a considerably broader range of application.

We think that ACO is especially promising for problems of the considered type for three reasons. First, it works with a “memory” (the pheromone trails, see below), which provides a certain robustness against noise; this is a feature it has in common with Evolutionary Strategies and Genetic Algorithms, but different from Simulated Annealing or Tabu Search. Second, problems with a highly constrained solution space (e.g., permutation problems) can also be encoded in a natural way. Third, problem-specific heuristics can be incorporated to improve the performance. The last two issues seem to give the ACO approach a specific advantage in the field of highly constrained combinatorial optimization.
2 Stochastic Combinatorial Optimization Problems

We deal with stochastic combinatorial optimization problems of the following general form:

Minimize $F(x) = E(f(x, \omega))$  s.t.  $x \in S$.   (1)

Therein, x is the decision variable, f is the (deterministic) objective function, ω denotes the influence of randomness (formally: ω ∈ Ω, where (Ω, Σ, P) is the probability space specifying the chosen stochastic model), E denotes the mathematical expectation, and S is a finite set of feasible decisions. We need not assume that E(f(x, ω)) is numerically computable, since it can be estimated by sampling: draw N random scenarios ω1, . . . , ωN independently of each other. A sample estimate is given by

$\tilde F(x) = \frac{1}{N} \sum_{\nu=1}^{N} f(x, \omega_\nu) \;\approx\; E(f(x, \omega)).$   (2)
Obviously, F̃(x) is an unbiased estimator for F(x). For example, in the single machine total tardiness problem mentioned in Section 1, N arrays, each consisting of n random processing times for jobs 1 to n according to the given distribution(s), can be generated from independent random numbers. For each of these arrays, the total tardiness of the considered schedule (permutation) x can be computed. The average over the N total tardiness values is the sample estimate F̃(x) of F(x).

Let us emphasize that, contrary to its deterministic counterpart, problem (1) can be nontrivial already for a very small number |S| of feasible solutions: even for |S| = 2 we obtain, unless F(x) can be computed directly, a nontrivial statistical hypothesis testing problem (see [19]).
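A minimal sketch of this sampling estimator for the stochastic total tardiness example is given below (illustrative only; the exponential distribution of the processing times and all numerical values are assumptions made for the example, not taken from the paper).

```python
import numpy as np

rng = np.random.default_rng(42)

due_dates = np.array([4.0, 7.0, 9.0, 12.0])   # one due date per job (assumed data)
mean_proc = np.array([3.0, 2.0, 4.0, 3.0])    # mean processing times (assumed data)

def total_tardiness(schedule, proc_times):
    """Sum of positive parts of (completion time - due date) for one scenario."""
    completion = np.cumsum(proc_times[schedule])
    return np.sum(np.maximum(completion - due_dates[schedule], 0.0))

def sample_estimate(schedule, n_scenarios):
    """F_tilde(x) from (2): average over independently drawn random scenarios."""
    values = []
    for _ in range(n_scenarios):
        proc = rng.exponential(mean_proc)     # one random scenario omega_nu
        values.append(total_tardiness(schedule, proc))
    return float(np.mean(values))

x = np.array([1, 0, 3, 2])                    # a schedule, i.e. a permutation of jobs
print(sample_estimate(x, n_scenarios=1000))   # unbiased estimate of E(f(x, omega))
```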
3 Ant Colony Optimization

For the sake of a clearer understanding of the algorithm given in the next section, we recapitulate the main ideas of ACO by presenting one particular ACO algorithm, GBAS
(see [12]), designed for deterministic problems. GBAS has been chosen since it is also the kernel of the algorithm S-ACO in Section 4. Essential general features of ACO are the following:

– Solutions are constructed randomly and step by step.
– Construction steps that have turned out to be part of good solutions are favored.
– Construction steps that can be expected to be part of good solutions are favored.

In GBAS (Graph-Based Ant System), the given problem instance is encoded by a construction graph C, a directed graph with a distinguished start node. For sequencing problems such as the TSP or the single-machine total tardiness problem mentioned above, the construction graph is essentially a complete graph with the items to be scheduled as nodes. For problems with other combinatorial structures (e.g., subset problems), suitable other graphs are used. The stepwise construction of a solution is represented by a random walk in the construction graph. In this walk, each node is visited at most once; already visited nodes are “tabu” (infeasible). There may also be additional rules defining particular nodes as infeasible after a certain partial walk has been traversed. When there is no feasible unvisited successor node anymore, the walk stops and is decoded as a complete solution of the problem. The encoding must be such that each walk that is feasible in the sense above corresponds to exactly one feasible solution. (Usually, the reverse also holds, but we do not make this an explicit condition.) Since, if the indicated condition is satisfied, the objective function value is uniquely determined by a feasible walk, we may denote a walk by the same symbol x as a decision or solution and consider S as the set of feasible walks.

When constructing a walk in the algorithm, the probability pkl of going from a node k to a feasible successor node l is chosen as proportional to τkl · ηkl(u), where τkl is the so-called pheromone or trail level, a memory value storing how good step (k, l) has been in previous runs, and ηkl(u) is the so-called attractiveness or visibility, a pre-evaluation of how good step (k, l) will presumably be (e.g., the reciprocal of the distance from k to l in a TSP). ηkl(u) is allowed to depend on the partial walk u constructed so far. This pre-evaluation is done in a problem-specific manner. Pheromone initialization and update are performed as follows (an illustrative sketch in code follows this description):

Pheromone initialization: Set τkl = 1/m, where m is the number of arcs of the construction graph.

Pheromone update: First, set τkl = (1 − ρ)τkl for each arc, where ρ is a so-called evaporation factor between 0 and 1. This step is called evaporation. Then, on each arc of the best walk x̂ found up to now, increase τkl by ρ/L(x̂), where L(x) denotes the length of walk x, defined as the number of arcs on x. Thus, the overall amount of pheromone remains equal to unity. This step reinforces the arcs (partial construction steps) of already found good solutions.

Random walk construction and pheromone update are iterated. Instead of a single walk (“one ant”), s walks (s > 1) are usually constructed sequentially or, in parallel implementations, simultaneously (“s ants”).
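The following Python sketch (illustrative only, not code from the paper) shows the two GBAS pheromone operations for a construction graph given as a list of arcs; the tiny instance and the walk are placeholder assumptions.

```python
def init_pheromone(arcs):
    """Pheromone initialization: tau_kl = 1/m for each of the m arcs."""
    m = len(arcs)
    return {arc: 1.0 / m for arc in arcs}

def update_pheromone(tau, best_walk, rho):
    """Evaporate all arcs, then reinforce the arcs of the best walk found so far.

    The total pheromone stays 1: evaporation removes a fraction rho, and
    rho / L(best_walk) is added on each of the L(best_walk) arcs of that walk.
    """
    for arc in tau:
        tau[arc] *= (1.0 - rho)
    for arc in best_walk:
        tau[arc] += rho / len(best_walk)
    return tau

# Tiny placeholder instance: complete graph on items 0..2 plus a start node 's'.
arcs = [('s', 0), ('s', 1), ('s', 2),
        (0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
tau = init_pheromone(arcs)
best_walk = [('s', 1), (1, 0), (0, 2)]          # a feasible walk visiting all items
tau = update_pheromone(tau, best_walk, rho=0.1)
print(sum(tau.values()))                        # still 1.0 (up to rounding)
```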
Note that in order to perform the pheromone update as described above, the best walk found up to now has to be stored in a global variable x̂. Each time a new random walk x is completed, the objective function value of the corresponding feasible solution is computed and compared with the objective function value of x̂. If x turns out to be better than x̂, the walk stored in x̂ is replaced by x.
4 Extension of the Algorithm to Stochastic Problems

We now present an extension S-ACO of the algorithm GBAS of the last section to the case of the stochastic optimization problem (1). S-ACO leaves the basic procedure largely unchanged, but modifies the pheromone-update subprocedure by introducing a stochastic test of whether the solution stored as the current best one should still be considered optimal. In the pseudo-code formulation below, we write τkl(n) instead of τkl in order to denote the dependence on the round number n; the same for pkl(n). For τmin(n), see the comments after the procedure. Feasibility of a continuation (k, l) of a partial walk u ending with node k is defined as in Section 3: the continuation (k, l) is feasible if node l is not yet contained in u, and none of the (eventual) additional rules specifies l as infeasible after u has been traversed.

procedure S-ACO {
  do pheromone-initialization;
  for round n = 1, 2, . . . {
    for ant σ = 1, . . . , s {
      set k, the current position of the ant, equal to the start node of C;
      set u, the current walk of the ant, equal to the empty list;
      while (a feasible continuation (k, l) of the walk u of the ant exists) {
        select successor node l with probability pkl(n), where
          pkl(n) = 0, if (k, l) is infeasible,
          pkl(n) = τkl(n) ηkl(u) / Σ(k,r) τkr(n) ηkr(u), otherwise,
          the sum being over all feasible (k, r);
        set k = l, and append l to u;
      }
      set xσ = u;
} from {x1 , . . . , xs }, select a walk x; do S-pheromone-update(x, n); // see below }
}
procedure S-pheromone-update(x, n) { compute estimate 1 ˜ F(n) = Nn
Nn
∑
ν=1
f (x, ων )
A Converging ACO Algorithm for Stochastic Combinatorial Optimization
15
by applying Nn independent sample scenarios ων to x; if (n = 1) { set xˆ = x; ˆ ˜ set F(n) = F(n); } else { compute estimate 1 ˆ F(n) = Nn
Nn
∑
ν=1
f (x, ˆ ων )
ˆ by applying Nn independent sample scenarios ων to x; ˜ ˆ if (F(n) < F(n)) set xˆ = x;
} set, with L(x) denoting the length of walk x, max ((1 − ρ) τkl (n) + ρ / L(x), ˆ τmin (n)), if (k, l) ∈ x, ˆ τkl (n + 1) := otherwise; max ((1 − ρ) τkl (n), τmin (n)),
(3)
} Comments: The essential difference to the deterministic case is that in the stochastic case, it is not possible anymore to decide with certainty whether a current solution x is better than the solution currently considered as the best found, x, ˆ or not. This can only be tested by statistical sampling, which happens in the specific pheromone update subprocedure used here, S-pheromone-update. Even the result of this test can be erroneous, due to the stochastic nature of all objective function evaluations, i.e., the test yields the correct comparison result only with a certain probability. For the same reason, it is not even possible to decide which ant has, in the current round, produced the best walk. The procedure above prescribes that one of the s produced walks is selected, according to whatever rule. A promising way to do that would be to evaluate each xσ at a random scenario drawn specifically for this round and to take x as the walk with best objective value. The first part of the subprocedure S-pheromone-update compares the solution x selected in the present round with the solution considered currently as the best, x. ˆ This is done by determining sample estimates for both solutions (practically speaking: by estimating the expected costs of both solutions by means of Monte-Carlo simulation with Nn runs each). Scenarios ων have to be drawn independently from scenarios ων (i.e., the simulation runs have to be executed with two independent series of random numbers). The winner of the comparison is stored as the new x. ˆ The question which sample size Nn should be chosen in round n will be dealt with in the next section. In the second part of the subprocedure, pheromone update is performed essentially in the same way as described in Section 3, but with an additional feature: If the computed pheromone value τkl (n) would fall below some predefined lower bound τmin (n),
16
Walter J. Gutjahr
we set τkl (n) = τmin (n). (The idea of using lower pheromone bounds in ACO is due to St¨utzle and Hoos [20], [21]). Again, the question how τmin (n) should be chosen in dependence of n will be treated in the following section. The computation of the attractiveness values ηkl (u) needs some explanation. As mentioned, these values are obtained from a suitable problem-specific heuristic. Although, in principle, one could work with “zero-information” attractiveness values, all set equal to a constant, the choice of a good attractiveness heuristic will improve the performance of the algorithm considerably. In the stochastic case, there is the difficulty that certain variables possibly used by such a heuristic are not known with certainty, because they depend on the random influence ω. This difficulty can be solved either by taking the expected values (with respect to the distribution of ω) of the required variables as the base of the attractiveness computation (in most stochastic models, these expected values are directly given as model parameters), or by taking those variable values that result from a random scenario ω drawn for the current round. Presumably, both will perform much better than applying zero-information attractiveness.
5 Convergence For the validity of the algorithm S-ACO presented in Section 4, we are able to give a strong theoretical justification: It is possible to prove that, under rather mild conditions, the current solutions produced by the algorithm converge with probability one to the globally optimal solution. In the sequel, we first present and then discuss these conditions. (i) The optimal walk x∗ is unique. (ii) The function f (x, ων ) observed at random scenario ων can be decomposed in expected value and error term as follows: f (x, ων ) = f (x) + εxν , where f (x) = E ( f (x, ω)), εxν is normally distributed with mean 0 and variance (σ(x))2 , and all error variables εxν are stochastically independent. (iii) The attractiveness values ηkl (u) satisfy ηkl (u) > 0 for all prefices u of x∗ and for all (k, l) on x∗ . (iv) The lower pheromone bound is chosen as τmin (n) =
cn log(n + 1)
with limn→∞ cn > 0. (E.g., τmin (n) = c/ log n for some c > 0 fulfills the condition.) (v) The sample size Nn grows at least linearly in n, i.e., Nn ≥ C · n for some constant C > 0. Condition (i) is in some sense the strongest of the four conditions, but it can probably be removed along the same lines as the corresponding condition in [14] for the
A Converging ACO Algorithm for Stochastic Combinatorial Optimization
17
deterministic special case. Also if this should not be the case, a negligible change of the objective function (e.g., adding ε · i(x), where i(x) is the index of solution x according to some order, and ε is sufficiently small) makes (i) satisfied. As to Condition (ii), it should be observed that it is always possible to decompose a random variable f (x, ω) with existing expected value into this expected value and an error term. That the error terms are normally distributed is not a very restrictive assumption, since in S-ACO, the observations f (x, ων ) are always used as independent terms in the sample estimate (2), where they produce (after suitable normalization and for large sample size Nn ) an approximately normally distributed random variable by the central limit theorem, so it does not make an essential difference if they are assumed as normally distributed from the beginning. If one wants to get rid of the assumption of normally distributed error terms, one can also apply stochastic dominance arguments, as, e.g., in [15] for the convergence of stochastic Simulated Annealing. By independent simulation runs each time a value f (x, ων ) is required, the condition on stochastic independence of the error terms can easily be made satisfied. Condition (iii) is very weak, since it is only violated if the problem-specific attractiveness values have been chosen in such an inappropriate way that the optimal walk x∗ is blocked by them a priori. Condition (iv) is easy to satisfy. Also Condition (v) makes, in general, no problems (cf. Remark 1 after the theorem below). Theorem 1. If conditions (i) – (v) are satisfied, then for the currently best found walk x(n) ˆ in round n, for the pheromone values τkl (n) in round n and for the probability Pσ (n) that some fixed considered ant σ traverses the optimal walk x∗ in round n, the following assertions hold: ˆ = x∗ lim x(n)
n→∞
lim τkl (n) =
n→∞
lim Pσ (n) = 1.
n→∞
with probability 1, 1/L(x∗ ), if (k, l) on x∗ , 0, otherwise
(4) with probability 1,
(5) (6)
In informal terms: On the indicated conditions, the solutions subsequently produced by S-ACO tend to the (globally) optimal solution, pheromone concentrates on the optimal walk and vanishes outside, and the current walks of the ants concentrate on the optimal walk. We prove Theorem 1 with the help of five lemmas. In the proofs, we use the following notational conventions: – x(n) is the walk selected in round n, i.e., the first parameter given to the procedure S-pheromone-update when it is called in round n, – x(n) ˆ is the current value of xˆ before the update of xˆ in the else-branch of Spheromone-update in round n. In particular: x(1) ˆ = x(1). In all five lemmas, we always assume implicitly that conditions (i) – (v) are satisfied.
18
Walter J. Gutjahr
Lemma 1. For each fixed positive integer n1 , there exists with probability one an integer n = n(ω) ≥ n1 , such that x∗ is traversed by all ants in round n. Proof. Because of the lower pheromone bound as given by condition (iv), τkl (n) ≥ τmin (n) =
cn c ≥ log(n + 1) 2 log(n + 1)
(7)
for some c > 0 and for n ≥ n0 with some n0 ∈ IN. By condition (iii), for all prefices u of x∗ and for all (k, l) on x∗ ,
ηkl (u) ≥ γ > 0
since the optimal walk x∗ contains only a finite number of arcs. Moreover, ηkl (u) ≤ Γ for some Γ ∈ IR, since there is only a finite number of feasible paths. Therefore, for the probability that the optimal walk x∗ is traversed by a fixed ant in round n, the estimate below is obtained, where uk (x∗ ) denotes the prefix of walk x∗ ending with node k (note that the sum of the pheromone values is unity):
∏
(k,l)∈x∗
pkl (n, uk (x∗ )) =
τkl (n) γ γ ≥ ∏ τkl (n) Γ ∑(k,r) τk,r (n) (k,l)∈x∗ Γ
∏∗
≥
(k,l)∈x
≥
∏
(k,l)∈x∗
γc = 2Γ log(n + 1)
τkl (n) ηkl (uk (x∗ )) ∏ τ (n) ηkr (uk (x∗ )) (k,l)∈x∗ ∑(k,r) kr
γc 2Γ log(n + 1)
L(x∗ )
.
(8)
Obviously, estimation (8) holds as well, if the l.h.s. refers to the probability of traversing x∗ conditional on any event in round 1 to n − 1. Now, let Bn denote the event that x∗ is traversed in round n by all ants. Evidently, ¬Bn1 ∧ ¬Bn1 +1 ∧ . . . is equivalent to the statement that no round n ≥ n1 exists such that x∗ is traversed in round n by all ants. We show that Prob (¬Bn1 ∧ ¬Bn1 +1 ∧ . . .) = 0.
(9)
This is seen as follows. With n = max(n0 , n1 ), the last probability is equal to Prob (¬Bn1 ) · Prob (¬Bn1 +1 | ¬Bn1 ) · Prob (¬Bn1 +2 | ¬Bn1 ∧ ¬Bn1 +1 ) · . . . ≤
∞
∏ Prob (¬Bn | ¬Bn1 ∧ ¬Bn1 +1 ∧ . . . ∧ ¬Bn−1)
n=n
≤
∞
∏
n=n
γc 1− 2Γ log(n + 1)
L(x∗ )·s
A Converging ACO Algorithm for Stochastic Combinatorial Optimization
19
because of (8) and the remark thereafter. The logarithm of the last expression is L(x∗ )·s ∞ γc ∑ log 1 − 2Γ log(n + 1) n=n ≤ −
∞
∑
n=n
γc 2Γ log(n + 1)
L(x∗ )·s
= −∞,
since ∑_n (log n)^{−λ} diverges for every λ > 0. It follows that (9) holds, which proves the lemma.

Lemma 2. Conditionally on the event that x̂(n) = x* and x(n) ≠ x*,

Prob(F̃(n) < F̂(n)) ≤ g(n),

and conversely, conditionally on the event that x̂(n) ≠ x* and x(n) = x*,

Prob(F̂(n) < F̃(n)) ≤ g(n),

where g(n) = Φ(−C√n), with Φ denoting the distribution function of the standard normal distribution, and C is a constant only depending on

σ = min_{x∈S} σ(x)    (10)

(cf. condition (ii)) and on

δ = min{F(x) − F(x*) | x ≠ x*} > 0    (11)

(cf. condition (i)). In other words: the probability that x* loses a comparison against a suboptimal solution is always smaller than or equal to g(n).

Proof. Because of condition (ii) and by definition of F̃(n),

F̃(n) = (1/N_n) ∑_{ν=1}^{N_n} f(x, ω_ν)

is normally distributed with mean F(x) and

var(F̃(n)) = (1/N_n²) · N_n · (σ(x))² = (σ(x))² / N_n.

For the same reason, F̂(n) is normally distributed with mean F(x̂) and

var(F̂(n)) = (σ(x̂))² / N_n,

and F̃(n) and F̂(n) are stochastically independent. Hence F̃(n) − F̂(n) is normally distributed with mean F(x) − F(x̂) and variance

(σ(x))²/N_n + (σ(x̂))²/N_n ≤ 2σ²/N_n ≤ 2σ²/(an)
with a > 0 given by condition (v), and σ given by (10). For x̂(n) = x*, this yields:

Prob(F̃(n) − F̂(n) < 0) = Φ( − (F(x) − F(x*)) / √( (σ(x))²/N_n + (σ(x*))²/N_n ) )
    ≤ Φ( − δ / √( 2σ²/(an) ) ) = Φ(−C√n)

with C = δ√a / (√2 σ) > 0. The second part of the assertion follows immediately because of the symmetry in the computation of F̂(n) and F̃(n).

Lemma 3. For the function g(n) defined in Lemma 2,

lim_{n1→∞} ∏_{n=n1}^{∞} (1 − g(n)) = 1    (12)
holds.

Proof. Because of C > 0, we have 0 < g(n) < 1. Taking logarithms, we obtain that (12) is equivalent to

lim_{n1→∞} ∑_{n=n1}^{∞} ( − log(1 − g(n)) ) = 0,    (13)

where each term in the sum is positive. Since g(n) → 0 and − log(1 − x) ≤ 2x for 0 ≤ x ≤ 1/2, a sufficient condition for (13) to be satisfied is

lim_{n1→∞} ∑_{n=n1}^{∞} g(n) = 0.    (14)
Let ϕ(x) = Φ′(x) denote the density function of the standard normal distribution. By elementary calculations, it is seen that Φ(x) ≤ ϕ(x)/(−x) for x < 0. Therefore one obtains

∑_{n=1}^{∞} g(n) = ∑_{n=1}^{∞} Φ(−C√n) ≤ ∑_{n=1}^{∞} ϕ(−C√n) / (C√n)
    ≤ (1/C) ∑_{n=1}^{∞} ϕ(−C√n) = 1/(C√(2π)) · ∑_{n=1}^{∞} exp(−C²n/2).

The function exp(−C²n/2) is decreasing in n, so

∑_{n=1}^{∞} exp(−C²n/2) ≤ ∫_0^{∞} exp(−C²x/2) dx < ∞.

Thus ∑_{n=1}^{∞} g(n) < ∞. Since g(n) > 0, this proves (14) and therefore also the lemma.
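As a purely numerical illustration of Lemma 3 (not part of the original proof), one can check in a few lines of Python that the partial sums of ∑_n g(n) stabilize quickly; the value C = 0.5 below is an arbitrary choice.

    import math

    def g(n, C=0.5):
        # Phi(-C*sqrt(n)), computed via the complementary error function
        return 0.5 * math.erfc(C * math.sqrt(n) / math.sqrt(2))

    partial_sums = [sum(g(n) for n in range(1, N + 1)) for N in (10, 100, 1000)]
    print(partial_sums)   # the partial sums stabilize, illustrating that the series converges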
Lemma 4. With probability one, there is an n2 = n2(ω) such that x̂(n) = x* for all n ≥ n2.

Proof. Let n0 be the index introduced in the proof of Lemma 1, such that (7) holds for all n ≥ n0. We choose n1 ≥ n0 in such a way that

∏_{n=n1}^{∞} (1 − g(n)) ≥ 1 − ε,

which is possible by Lemma 3. Let G_n denote the event that in round n, the optimal solution x* is taken for the comparison in S-pheromone-update, either as the currently selected solution x(n), or as the current best-solution candidate x̂(n), or both, and that x* wins the comparison, such that x̂(n+1) = x*. Event G_n occurs in two possible situations:
(a) x(n) = x̂(n) = x*. Then automatically (i.e., with probability 1) x̂(n+1) = x*.
(b) x(n) ≠ x̂(n), and either x(n) = x* or x̂(n) = x*. In this situation, by Lemma 2, x* wins the comparison with a probability of at least 1 − g(n), with the effect that x̂(n+1) = x*.
Furthermore, let D_n denote the event that round n is the first round with n ≥ n1 where x* is traversed by all ants. With the notation in the proof of Lemma 1, D_n = ¬B_{n1} ∧ ¬B_{n1+1} ∧ ... ∧ ¬B_{n−1} ∧ B_n. Consider two arbitrary fixed rounds n2 and n with n1 ≤ n2 ≤ n. For n > n2, the event D_{n2} ∧ G_{n2} ∧ G_{n2+1} ∧ ... ∧ G_{n−1} implies that x̂(n) = x*, hence

Prob(G_n | D_{n2} ∧ G_{n2} ∧ G_{n2+1} ∧ ... ∧ G_{n−1}) ≥ 1 − g(n)    (15)

by the consideration above. For n = n2, on the other hand, the event D_{n2} ∧ G_{n2} ∧ ... ∧ G_{n−1} reduces to D_{n2}, which implies x(n2) = x*, such that also in this case, (15) holds by the consideration above. Therefore,

Prob(G_{n2} ∧ G_{n2+1} ∧ ... | D_{n2}) = ∏_{n=n2}^{∞} Prob(G_n | D_{n2} ∧ G_{n2} ∧ G_{n2+1} ∧ ... ∧ G_{n−1})
    ≥ ∏_{n=n2}^{∞} (1 − g(n)) ≥ ∏_{n=n1}^{∞} (1 − g(n)) ≥ 1 − ε.
The events D_{n1}, D_{n1+1}, ... are mutually exclusive, and by Lemma 1, Prob(D_{n1}) + Prob(D_{n1+1}) + ... = 1. Using this, we obtain: the probability that there is a round n2 ≥ n1, such that round n2 is the first round after round n1 where x* is traversed by all ants and x̂(n) = x* for all n ≥ n2, is given by

∑_{n2=n1}^{∞} Prob(D_{n2} ∧ G_{n2} ∧ G_{n2+1} ∧ ...)
    = ∑_{n2=n1}^{∞} Prob(G_{n2} ∧ G_{n2+1} ∧ ... | D_{n2}) · Prob(D_{n2})
    ≥ (1 − ε) ∑_{n2=n1}^{∞} Prob(D_{n2}) = 1 − ε.    (16)
Since the l.h.s. of (16) does not depend on ε and ε > 0 is arbitrary, the considered probability must be exactly 1, which proves the assertion.

Lemma 5. With probability one, τ_kl(n) → 1/L(x*) for (k, l) ∈ x* and τ_kl(n) → 0 for (k, l) ∉ x*, as n → ∞.

Proof. By Lemma 4, there is with probability one an integer n2 such that x̂(n) = x* for all n ≥ n2.
(i) Let (k, l) ∈ x*. In round n2 and all subsequent rounds, (k, l) is always reinforced. Set L = L(x*) for abbreviation. A lower bound for τ_kl(n) is obtained by omitting the rule that τ_kl(n+1) is set equal to τ_min(n) if it would otherwise decrease below τ_min(n) by evaporation. Based on this lower bound estimation, we get by induction w.r.t. t = 1, 2, ... that

τ_kl(n2 + t) ≥ (1 − ρ)^t τ_kl(n2) + (ρ/L) ∑_{i=0}^{t−1} (1 − ρ)^i.    (17)

As t → ∞, the expression on the r.h.s. of (17) tends to

(ρ/L) ∑_{i=0}^{∞} (1 − ρ)^i = 1/L.

Therefore, for sufficiently large t, τ_kl(n2 + t) > 1/(2L). On the other hand, τ_min(n) → 0, hence τ_min(n2 + t) < 1/(2L) for sufficiently large t, which means that updates setting τ_kl(n+1) equal to τ_min(n) do not happen anymore for large values of t. Thus, for some t0 and all integers t′ ≥ 1, we find in analogy to (17) (but now with equality instead of inequality) that

τ_kl(n2 + t0 + t′) = (1 − ρ)^{t′} τ_kl(n2 + t0) + (ρ/L) ∑_{i=0}^{t′−1} (1 − ρ)^i,

and the expression on the r.h.s. tends to 1/L as t′ → ∞.
(ii) Let (k, l) ∉ x*. Then (k, l) is never reinforced anymore in round n2 or any subsequent round. Thus the pheromone on (k, l) decreases geometrically until the lower bound τ_min is reached. Since τ_min → 0 as well, we have τ_kl(n) → 0 as n → ∞.

Proof of Theorem 1. The first two assertions of the theorem, eqs. (4) and (5), are the assertions of Lemma 4 and Lemma 5, respectively. The third assertion, eq. (6), is seen as follows: from (5), we obtain for (k, l) ∈ x* and a prefix u of x* that, with δ_kr = 1 if r = l and δ_kr = 0 otherwise,

lim_{n→∞} p_kl(n, u) = 1 · η_kl(u) / ( ∑_{(k,r)} δ_kr · η_kr(u) ) = 1.
Therefore also the probability that a fixed ant σ traverses x*, which is given by

P_σ(n) = ∏_{(k,l)∈x*} p_kl(n, u),

tends to unity as n → ∞.
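For illustration only (this is not part of the paper), the geometric recursion used in the proof of Lemma 5, τ(n+1) = (1 − ρ) τ(n) + ρ/L(x*), can be checked numerically in Python; the values of ρ and L(x*) below are arbitrary choices.

    L_star = 5        # length L(x*) of the optimal walk (illustrative value)
    rho = 0.1         # evaporation rate (illustrative value)
    tau = 0.0         # pheromone on an arc of x*, starting from round n2
    for _ in range(200):
        tau = (1 - rho) * tau + rho / L_star   # reinforcement step, cf. (17)
    print(tau, 1 / L_star)                     # tau is now very close to 1/L(x*) = 0.2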
Remark 1. In Gutjahr and Pflug [15], a similar convergence result has been shown for a modification of the Simulated Annealing metaheuristic designed for application to stochastic optimization problems. There, however, a growth of the sample size N_n of order Ω(n^{2γ}) with γ > 1 was required for obtaining the convergence property. The growth of order Ω(n) required in Theorem 1 is much more favorable. While runtime limits are reached soon when the sample size is increased faster than with quadratic order, a linear increment usually does not impose severe practical restrictions.

Remark 2. For the solution of deterministic combinatorial optimization problems by metaheuristic approaches, some convergence results exist. For Simulated Annealing, e.g., it has been demonstrated by Gelfand and Mitter [11] and by Hajek [17] that by applying a suitable "cooling schedule", one can achieve that the distribution of the current solution tends to the uniform distribution on the set of globally optimal solutions. A related result for two particular ACO algorithms (both of the GBAS type outlined above) has been obtained in [13]. Nevertheless, it is clear that when applied to NP-hard problems, these algorithms cannot overcome the general limitations demonstrated by NP-completeness theory: if an algorithm is designed in such a way that it is guaranteed to find the optimal solution of any (or even: some) NP-hard problem, a price must be paid, namely that runtime becomes prohibitive for larger problem instances. For example, the theoretical cooling schedule assumed in the convergence results for Simulated Annealing is too slow to be well-suited for practical applications; it has to be modified towards faster cooling, which, on the other hand, introduces the risk of premature convergence to suboptimal solutions. (This dilemma has sometimes been formulated under the term "No-Free-Lunch Theorems".) For this reason, algorithms with a theoretical guarantee of convergence to optimality are sometimes considered as not practicable.

It is interesting to see that this restriction need not hold for the algorithm S-ACO presented here: of course, when applied to large instances of problems that are NP-hard even in the deterministic boundary case, S-ACO is subject to the same limitations as the deterministic-problem algorithms converging to optimality. Very large problem instances, however, are not typical for stochastic combinatorial problems in current practice. As argued at the end of Section 2, such problems are already nontrivial in the case of small feasible sets, say, with a few hundred elements or even less. For such problem instances, the algorithm S-ACO can be implemented without any modification; also, the linear increase of the sample size will not lead to prohibitive runtime behavior.
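To make the sample-size discussion of Remark 1 concrete, here is a minimal Python sketch (not from the paper) of a comparison step with a linearly growing number of samples; the schedule N_n = ⌈a·n⌉, the constant a, and the way the noisy objective f is invoked are assumptions made for this example only.

    import math, random

    def sample_estimate(f, x, n, a=5.0, rng=random):
        # sample average of the noisy objective with N_n = ceil(a*n) independent draws
        N_n = math.ceil(a * n)
        return sum(f(x, rng.random()) for _ in range(N_n)) / N_n

    def keep_better(f, x_selected, x_best, n):
        # one comparison in round n: keep the walk whose sample average is smaller
        F_tilde = sample_estimate(f, x_selected, n)
        F_hat = sample_estimate(f, x_best, n)
        return x_selected if F_tilde < F_hat else x_best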
6 Modifications

The algorithm S-ACO can be modified in several different ways. Let us only indicate one possible line of extension:
Our procedure S-pheromone-update follows a "global-best" reinforcement strategy (see Gambardella and Dorigo [10]): the arcs on the walk that is considered the best found up to now (in any of the previous rounds) are reinforced. An alternative strategy is the classical pheromone update of Ant System [7], where the amount of reinforcement is chosen proportional to the "fitness" of the solution, or the rank-based pheromone update introduced by Bullnheimer, Hartl and Strauss [5]: the arcs on the k best walks found in the current round are reinforced by a pheromone increment proportional to (k − j + 1)/k for the walk with rank j (j = 1, ..., k); a small code sketch of this weighting is given at the end of this section. We briefly outline the rank-based case; the classical case can be treated analogously. In the stochastic context, one cannot determine the absolute ranks of the walks, but, as indicated in the Comments in Section 4, one can evaluate the walks at a random scenario or at a small sample of random scenarios drawn specifically for this round. In this way, ranks relative to the current scenario(s) can be computed. Now, one can choose between two alternatives:
(i) Perform S-ACO in two phases: in the first phase, replace in S-pheromone-update the global-best update rule by rank-based pheromone update w.r.t. the currently drawn scenario(s). For this kind of update, sampling for obtaining the estimates F̃ and F̂ is not required. In the second phase, start with the pheromone values obtained in the first phase, and perform, from now on, in S-pheromone-update the (global-best) update rule described in Section 4.
(ii) Instead of working in two phases, perform pheromone update in each round by a weighted mix of the global-best update described in Section 4 and the rank-based update w.r.t. the current scenario(s).
It is likely that the convergence result of Section 5 can be generalized to alternative (i) above. A generalization to alternative (ii) is much more difficult; presumably, convergence to the optimal solution can only be obtained if the weight for the application of the rank-based update scheme is gradually reduced. Both alternatives may be advantageous in practice compared with the basic algorithm, since they allow a broad initial exploration of the solution space (the results of these "learning" rounds are stored in the pheromone), which can possibly speed up convergence by guiding the search in later rounds.
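The rank-based increment (k − j + 1)/k mentioned above can be sketched as follows. This is only an illustration: the representation of the pheromone as a dictionary over arcs, the evaporation rate and the normalization of the increments are assumptions made for the example, not the update rule of Section 4.

    def rank_based_update(tau, ranked_walks, rho=0.1):
        # tau: dictionary mapping an arc (k, l) to its pheromone value
        # ranked_walks: the kappa best walks of the current round, best first;
        # the walk of rank j is reinforced proportionally to (kappa - j + 1) / kappa
        kappa = len(ranked_walks)
        increment = {}
        for j, walk in enumerate(ranked_walks, start=1):
            weight = (kappa - j + 1) / kappa
            for arc in walk:                      # a walk is given as a list of arcs
                increment[arc] = increment.get(arc, 0.0) + weight
        total = sum(increment.values()) or 1.0
        for arc in tau:
            tau[arc] = (1 - rho) * tau[arc] + rho * increment.get(arc, 0.0) / total
        return tau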
7 Conclusion

We have presented a general-purpose algorithm S-ACO applicable to all problems of one of the most frequent problem types in stochastic combinatorial optimization, namely expected-value optimization under deterministic constraints, and we have shown that under specific, rather mild conditions, S-ACO converges with probability one to the globally optimal solution of the given stochastic optimization problem. Since the algorithm can usually be applied without the necessity of tuning parameters from "theoretical" to "practical" schemes and still keeps the property of convergence to optimality, it might be a promising candidate for computational experiments in diverse areas of application of stochastic combinatorial optimization. Of course, experimental comparisons with other metaheuristic algorithms for this problem field, either ACO-based or derived from other concepts, would be very interesting and could be of considerable practical value.
References

1. Arnold, D.V., "Evolution strategies in noisy environments - a survey of existing work", Theoretical Aspects of Evolutionary Computing, Kallel, L., Naudts, B., Rogers, A. (eds.), Springer (2001), pp. 239–250.
2. Bakuli, D.L., MacGregor Smith, J., "Resource allocation in state-dependent emergency evacuation networks", European J. of Op. Res. 89 (1996), pp. 543–555.
3. Bertsimas, D., Simchi-Levi, D., "A new generation of vehicle routing research: robust algorithms, addressing uncertainty", Operations Research 44 (1996), pp. 286–304.
4. Bianchi, L., Gambardella, L.M., Dorigo, M., "Solving the homogeneous probabilistic travelling salesman problem by the ACO metaheuristic", Proc. ANTS '02, 3rd Int. Workshop on Ant Algorithms (2002), pp. 177–187.
5. Bullnheimer, B., Hartl, R.F., Strauss, C., "A new rank-based version of the Ant System: A computational study", Central European Journal for Operations Research 7 (1) (1999), pp. 25–38.
6. Dorigo, M., Di Caro, G., "The Ant Colony Optimization metaheuristic", in: New Ideas in Optimization, D. Corne, M. Dorigo, F. Glover (eds.), pp. 11–32, McGraw-Hill (1999).
7. Dorigo, M., Maniezzo, V., Colorni, A., "The Ant System: An autocatalytic optimization process", Technical Report 91-016, Dept. of Electronics, Politecnico di Milano, Italy (1991).
8. Du, J., Leung, J.Y.T., "Minimizing total tardiness on one machine is NP-hard", Mathematics of Operations Research 15 (1990), pp. 483–495.
9. Futschik, A., Pflug, Ch., "Confidence sets for discrete stochastic optimization", Annals of Operations Research 56 (1995), pp. 95–108.
10. Gambardella, L.M., Dorigo, M., "Ant-Q: A Reinforcement Learning approach to the traveling salesman problem", Proc. of ML-95, Twelfth Intern. Conf. on Machine Learning (1995), pp. 252–260.
11. Gelfand, S.B., Mitter, S.K., "Analysis of Simulated Annealing for Optimization", Proc. 24th IEEE Conf. on Decision and Control (1985), pp. 779–786.
12. Gutjahr, W.J., "A graph-based Ant System and its convergence", Future Generation Computer Systems 16 (2000), pp. 873–888.
13. Gutjahr, W.J., "ACO algorithms with guaranteed convergence to the optimal solution", Information Processing Letters 82 (2002), pp. 145–153.
14. Gutjahr, W.J., "A generalized convergence result for the Graph-based Ant System", accepted for publication in: Probability in the Engineering and Informational Sciences.
15. Gutjahr, W.J., Pflug, G., "Simulated annealing for noisy cost functions", J. of Global Optimization 8 (1996), pp. 1–13.
16. Gutjahr, W.J., Strauss, Ch., Wagner, E., "A stochastic branch-and-bound approach to activity crashing in project management", INFORMS J. on Computing 12 (2000), pp. 125–135.
17. Hajek, B., "Cooling schedules for optimal annealing", Mathematics of OR 13 (1988), pp. 311–329.
18. Marianov, V., Serra, D., "Probabilistic maximal covering location-allocation models for congested systems", J. of Regional Science 38 (1998), pp. 401–424.
19. Norkin, V.I., Ermoliev, Y.M., Ruszczynski, A., "On optimal allocation of indivisibles under uncertainty", Operations Research 46 (1998), pp. 381–395.
20. Stützle, T., Hoos, H.H., "The MAX-MIN Ant System and local search for the travelling salesman problem", in: T. Baeck, Z. Michalewicz and X. Yao (eds.), Proc. ICEC '97 (Int. Conf. on Evolutionary Computation) (1997), pp. 309–314.
21. Stützle, T., Hoos, H.H., "MAX-MIN Ant System", Future Generation Computer Systems 16 (2000), pp. 889–914.
Optimality of Randomized Algorithms for the Intersection Problem

Jérémy Barbay

Department of Computer Science, University of British Columbia, 201-2366 Main Mall, Vancouver, B.C. V6T 1Z4, Canada
[email protected] Abstract. The “Intersection of sorted arrays” problem has applications in indexed search engines such as Google. Previous works propose and compare deterministic algorithms for this problem, and offer lower bounds on the randomized complexity in different models (cost model, alternation model). We refine the alternation model into the redundancy model to prove that randomized algorithms perform better than deterministic ones on the intersection problem. We present a randomized and simplified version of a previous algorithm, optimal in this model. Keywords: Randomized algorithm, intersection of sorted arrays.
1 Introduction

We consider search engines where queries are composed of several keywords, each one being associated with a sorted array of references to entries in a database. The answer to a conjunctive query is the intersection of the sorted arrays corresponding to each keyword. Most search engines implement these queries. The algorithms are in the comparison model, where comparisons are the only operations permitted on references. The intersection problem has been studied before [1, 4, 5], but the lower bounds apply to randomized algorithms as well, while some deterministic algorithms are already optimal. Does this mean that no randomized algorithm can do better than a deterministic one on the intersection problem?

In this paper we present a new analysis of the intersection problem, called the redundancy analysis, which is more precise and makes it possible to prove that, for the intersection problem, randomized algorithms perform better than deterministic algorithms in terms of the number of comparisons. The redundancy analysis also makes more natural assumptions on the instances: the worst case in the alternation analysis is such that an element considered by the algorithm is matched by almost all of the keywords, while in the redundancy analysis the maximum number of keywords matching such an element is parametrized by the measure of difficulty.

We define formally the intersection problem and the redundancy model in Section 2. We give in Section 3 a randomized algorithm inspired by the small adaptive algorithm, and give its complexity in the redundancy model, for which we prove it is optimal in Section 4. We answer the question of the utility of randomized algorithms for the intersection problem in Section 5: no deterministic algorithm is optimal in the redundancy model. We list in Section 6 several points on which we will try to extend this work.
2 Definitions

In the search engines we consider, queries are composed of several keywords, and each keyword is associated with a sorted array of references. The references can be, for instance, addresses of web pages, but the only important point is that there is a total order on them, i.e. any pair of distinct references can be ordered. To study the problem of intersection, we hence consider any set of arrays on a totally ordered space to form an instance [1]. To perform any complexity analysis on such instances, we need to define a measure representing the size of the instance. We define for this the signature of an instance.

Definition 1 (Instance and Signature). We consider U to be a totally ordered space. An instance is composed of k sorted arrays A1, ..., Ak of positive sizes n1, ..., nk and composed of elements from U. The signature of such an instance is (k, n1, ..., nk). An instance is "of signature at most" (k, n1, ..., nk) if it can be completed by adding arrays and elements to form an instance of signature exactly (k, n1, ..., nk).

Example 1. Consider the instance of Figure 1, where the ordered space is the set of positive integers: it has signature (7, 1, 4, 4, 4, 4, 4, 4).

A = (9)
B = (1, 2, 9, 11)
C = (3, 9, 12, 13)
D = (9, 14, 15, 16)
E = (4, 10, 17, 18)
F = (5, 6, 7, 10)
G = (8, 10, 19, 20)

Fig. 1. An instance of the intersection problem, given by its array representation. (The original figure also shows a second representation, which expresses the structure of the instance better, in which the abscissa of each element equals its value.)
Definition 2 (Intersection). The Intersection of an instance is the set A1 ∩ . . . ∩ Ak composed of the elements that are present in k distinct arrays. Example 2. The intersection A ∩ B ∩ . . . ∩ G of the instance of Figure 1 is empty as no element is present in more than 4 arrays. Any algorithm (even non-deterministic) computing the intersection must certify the correctness of the output: first, it must certify that all the elements of the output are indeed elements of the k arrays; second, it must certify that no element of the intersection has been omitted by exhibiting some proof that there can be no other elements in the intersection than those output. We define the partition-certificate as a proof of the intersection.
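As a concrete check of Definition 2 and Example 2 (for illustration only; the array contents are those read off Figure 1), a few lines of Python:

    A = [9]
    B = [1, 2, 9, 11]
    C = [3, 9, 12, 13]
    D = [9, 14, 15, 16]
    E = [4, 10, 17, 18]
    F = [5, 6, 7, 10]
    G = [8, 10, 19, 20]
    arrays = [A, B, C, D, E, F, G]

    intersection = set(arrays[0]).intersection(*arrays[1:])
    print(sorted(intersection))        # []  : the intersection is empty

    from collections import Counter
    occurrences = Counter(x for a in arrays for x in a)
    print(max(occurrences.values()))   # 4   : no element is present in more than 4 arrays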
Definition 3 (Partition-Certificate). A partition-certificate is a partition (I_j)_{j≤δ} of U into intervals such that any singleton {x} corresponds to an element x of ∩_i A_i, and each other interval I has an empty intersection I ∩ A_i with at least one array A_i.

Imagine a function which indicates for each element x ∈ U the name of an array not containing x if x is not in the intersection, and "all" if x is in the intersection. The minimal number of times such a function alternates names, for x scanning U in increasing order, is also the minimal size of a partition-certificate of the instance (minus one), which is called alternation.

Definition 4 (Alternation). The alternation δ(A1, ..., Ak) of an instance (A1, ..., Ak) is the minimal number of intervals forming a partition-certificate of this instance.

Example 3. The alternation of the instance in Figure 1 is δ = 3, as can be seen from Figure 1: the partition (−∞, 9), [9, 10), [10, +∞) is a partition-certificate of size 3, and none can be smaller.

The alternation measure was used as a measure of the difficulty of the instance [1], as it is the non-deterministic complexity of the instance, and as there is a lower bound increasing with δ on the complexity of any randomized algorithm. By definition of the partition-certificate:
– for each singleton {x} of the partition, any algorithm must find the position of x in all arrays A_i, which takes k searches;
– for each interval I_j of the partition, any algorithm must find an array, or a set of arrays, such that the intersection of I_j with this array, or with the intersection of those arrays, is empty.
The cost for finding such a set of arrays can vary and depends on the choices performed by the algorithm. In general it requires fewer searches if there are many possible answers. To take this into account, for each interval I_j of the partition-certificate we count the number r_j of arrays whose intersection with I_j is empty. The smaller r_j is, the harder the instance: 1/r_j measures the contribution of this interval to the difficulty of the instance.

Example 4. Consider for instance the interval [10, 11) in the instance of Figure 1: r_j = 4. A random algorithm choosing an array uniformly has probability r_j/k of finding an array which does not intersect [10, 11), and will do so on average within k/r_j trials, even if it does not memorize which arrays it tried before. k being fixed, 1/r_j measures the difficulty of proving that no element of [10, 11) is in the intersection of the instance.

We name the sum of those contributions the redundancy of the instance, and it forms our new measure of difficulty:

Definition 5 (Redundancy). Let A1, ..., Ak be k sorted arrays, and let (I_j)_{j≤δ} be a partition-certificate for this instance.
– The redundancy ρ(I) of an interval or singleton I is defined as equal to 1 if I is a singleton, and equal to 1/#{i : A_i ∩ I = ∅} otherwise.
– The redundancy ρ((I_j)_{j≤δ}) of a partition-certificate (I_j)_{j≤δ} is the sum ∑_j ρ(I_j) of the redundancies of the intervals composing it.
– The redundancy ρ((A_i)_{i≤k}) of an instance of the intersection problem is the minimal redundancy of a partition-certificate of the instance, min{ρ((I_j)_{j≤δ}) over all partition-certificates (I_j)_{j≤δ}}.
Note that the redundancy is always well defined and finite: if I is not a singleton then by definition there is at least one array A_i whose intersection with I is empty, and #{i : A_i ∩ I = ∅} > 0.

Example 5. The partition-certificate (−∞, 9), [9, 10), [10, 11), [11, +∞) has redundancy at most 1/2 + 1/3 + 1/4 + 1/2, and no other partition-certificate has a smaller redundancy, hence our instance has redundancy 7/6.

The redundancy analysis makes it possible to measure the difficulty of the instance in a finer way than with the alternation: for fixed k, n1, ..., nk, δ, several instances of signature (k, n1, ..., nk) and alternation δ may present different difficulties for any algorithm, and different redundancies.

Example 6. In the instance from Figure 1 the only way to prove that the intersection of those arrays is empty is to compute the intersection of one of the arrays from {A, B, C, D} with one of the arrays from {E, F, G}. For simplicity, and without loss of generality, we suppose the algorithm searches to intersect A with another array in {B, C, D, E, F, G}, and we focus for this example on the number of unbounded searches performed, instead of the number of comparisons: the randomized algorithm looking for the element of A in an array from {B, C, D, E, F, G} chosen at random performs on average only 2 searches in the first instance, as the probability of finding an array whose intersection with A is empty is then 1/2. On the other hand, consider a subtle variant of the instance of Figure 1, where the element 9 would be present in all the arrays but one, for instance E (only two elements need to change, F[4] and G[2], which were equal to 10 and are now equal to 9). As the two instances have the same signature and alternation, the alternation analysis yields the same lower bound for both instances. But the randomized algorithm described above now performs on average k/2 searches, as opposed to 2 searches on the original instance. This difference of performance is not expressed by a difference of alternation, but is expressed by a difference of redundancy: the new instance has a redundancy of 1/2 + 1 + 1/2 = 2, larger than the redundancy 7/6 of the original instance (this is just a particular case given as an example; see Section 5 for the general proof).
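For illustration only (not from the paper), Definition 5 can be evaluated directly on the instance of Figure 1; the encoding of an interval as a half-open pair (lo, hi), with None standing for ±∞, is an assumption made for this sketch.

    A = [9]; B = [1, 2, 9, 11]; C = [3, 9, 12, 13]; D = [9, 14, 15, 16]
    E = [4, 10, 17, 18]; F = [5, 6, 7, 10]; G = [8, 10, 19, 20]
    arrays = [A, B, C, D, E, F, G]

    def in_interval(x, lo, hi):
        return (lo is None or x >= lo) and (hi is None or x < hi)

    def redundancy(certificate, arrays):
        # sum over the intervals of 1 (for a singleton of the intersection) or
        # 1 / #{i : A_i has an empty intersection with the interval} otherwise
        total = 0.0
        for lo, hi in certificate:
            single = lo is not None and hi is not None and hi == lo + 1
            if single and all(lo in a for a in arrays):
                total += 1.0
            else:
                empty = sum(1 for a in arrays if not any(in_interval(x, lo, hi) for x in a))
                total += 1.0 / empty
        return total

    # the partition-certificate of Example 5: (-inf, 9), [9, 10), [10, 11), [11, +inf)
    certificate = [(None, 9), (9, 10), (10, 11), (11, None)]
    print(redundancy(certificate, arrays))   # 1/2 + 1/3 + 1/4 + 1/2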
3 Randomized Algorithm

The algorithm we propose here is a randomized and simplified version of the small adaptive algorithm [5]. It uses the unbounded search algorithm, which looks for an element x in a sorted array A of unknown size, starting at position init, with complexity 2 log2(p − init), where p is the insertion position of the element in the array. It returns a value p such that A[p − 1] < x ≤ A[p]. This algorithm has already been studied before;
it can be implemented using the doubling search and binary search algorithms [1, 4–6], or directly to improve the complexity by a constant factor [3]. Given t∈{1, . . . , k}, and k non-empty sorted sets A1 , . . . , Ak of sizes n1 , . . . , nk , the rand intersection algorithm (algorithm 1) computes the intersection I=A1 ∩ . . . ∩Ak . For simplicity, we assume that all arrays contain the element −∞ at position 0 and the element +∞ at position ni + 1. The algorithm is composed of two nested loops. The outer loop iterates through potential elements of the intersection in variable m and in increasing order, and the inner loop checks for each value of m if it is in the intersection. In each pass of the inner loop, the algorithm searches for m in one array As which potentially contains it. The invariant of the inner loop is that, at the start of each pass and for each array Ai , pi denotes the first potential position for m in Ai : Ai [pi − 1] < m. The variables #YES and #NO count how many arrays are known to contain m, and are updated depending on the result of each search. A new value for m is chosen every time we enter the outer loop, at which time the current subproblem is to compute the intersection on the sub-arrays Ai [pi , . . . , ni ] for all values of i. Any first element Ai [pi ] of a sub-array could be a candidate, but a better candidate is one which is larger than the last value of m: the algorithm chooses As [ps ], which is by definition larger than m. Then only one array As is known to contain m, hence #YES ← 1, and no array is known not to contain it, hence #NO ← 0. The algorithm terminates when all the values of the current array have been considered, and m has taken the last value +∞.
Algorithm 1 Rand Intersection(A1, ..., Ak)
Given k non-empty sorted sets A1, ..., Ak of sizes n1, ..., nk, the algorithm computes in variable I the intersection A1 ∩ ... ∩ Ak. Note that the only random instruction is the choice of the array in the inner loop.

    for all i do p_i ← 1 end for
    I ← ∅; s ← 1
    repeat
        m ← A_s[p_s]
        #NO ← 0; #YES ← 1
        while #YES < k and #NO = 0 do
            let A_s be a random array such that A_s[p_s] ≠ m
            p_s ← Unbounded Search(m, A_s, p_s)
            if A_s[p_s] ≠ m then
                #NO ← 1
            else
                #YES ← #YES + 1
            end if
        end while
        if #YES = k then I ← I ∪ {m} end if
        for all i such that A_i[p_i] = m do p_i ← p_i + 1 end for
    until m = +∞
    return I
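The following compact Python sketch mirrors Algorithm 1 and is given for illustration only (it is not the author's implementation): a doubling-then-binary search stands in for Unbounded Search, a sentinel +∞ is appended to each array, and a small guard handles arrays whose current position already points at m without an explicit search.

    import random
    from bisect import bisect_left

    INF = float('inf')

    def unbounded_search(x, a, init):
        # smallest p >= init with a[p] >= x, found by doubling then binary search
        step, hi = 1, init
        while a[hi] < x:                        # a ends with INF, so this terminates
            hi = min(hi + step, len(a) - 1)
            step *= 2
        return bisect_left(a, x, init, hi + 1)

    def rand_intersection(arrays, rng=random):
        k = len(arrays)
        a = [list(arr) + [INF] for arr in arrays]   # sentinel at position n_i + 1
        p = [0] * k
        result = []
        s = 0
        while True:
            m = a[s][p[s]]                      # next candidate, larger than the previous one
            if m == INF:
                return result
            yes, no = 1, 0
            while yes < k and no == 0:
                candidates = [i for i in range(k) if a[i][p[i]] != m]
                if not candidates:              # every remaining array already points at m
                    yes = k
                    break
                s = rng.choice(candidates)
                p[s] = unbounded_search(m, a[s], p[s])
                if a[s][p[s]] != m:
                    no = 1
                else:
                    yes += 1
            if yes == k:
                result.append(m)
            for i in range(k):
                if a[i][p[i]] == m:
                    p[i] += 1

    print(rand_intersection([[1, 3, 9], [2, 3, 9, 11], [3, 7, 9]]))   # [3, 9]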
Theorem 1. Algorithm rand intersection (Algorithm 1) performs on average O(ρ ∑_i log(n_i/ρ)) comparisons on an instance of signature (k, n1, ..., nk) and of redundancy ρ.

Proof. Let (I_j)_{j≤δ} be a partition-certificate of minimal redundancy ρ. Each comparison performed by the algorithm is said to be performed in phase j if m ∈ I_j for some interval I_j of the partition. Let C_i^j be the number of binary searches performed by the algorithm during phase j in array A_i, let C_i = ∑_j C_i^j be the number of binary searches performed by the algorithm in array A_i over the whole execution, and let (r_j)_{j≤δ} be such that r_j is equal to 1 if I_j is a singleton, and to #{i : A_i ∩ I_j = ∅} otherwise.
– If Ai ∩ I j = 0/ then Ci ∈ {0, 1} as the algorithm will terminate the phase whenever j / = 1r j , searching in Ai . The probability to do such a search is Pr{Ci = 1|Ai ∩ I j = 0} j
j
/ = 1 ∗ Pr{Ci = 1|Ai ∩ I j = so the average number of searches is E(Ci |Ai ∩ I j = 0) / = 1r . 0} j
j
– If Ai ∩ I j = 0/ then at each new search, either Ci is incremented with probability j 1 k−1 , because the search occurred in Ai ; or Ci is fixed in a final way with probarj bility k−1 , because an array of empty intersection with I was searched; or neither 1+r
incremented nor fixed, with probability 1 − k−1j , in the other cases. This system j 1 and fixed is equivalent to a system where Ci is incremented with probability 1+r j with probability 1+r j rj
−1 =
1 rj
rj 1+r j .
j
From this we can deduce that Ci is incremented on average
times before it is fixed.
So the algorithm performs on average E(Ci ) = ∑ j
1 rj
= ρ binary searches in array Ai .
Let gli, j be the increment of pi due to the lth unbounded search in array Ai during phase j. Notice that ∑ j,l gli, j ≤ ni . The algorithm performs 2 log(gli, j + 1) comparisons during the lth search of phase j in array Ai . So it performs 2 ∑ j,l log(gli,l + 1) comparisons between m and an element of array Ai during the whole execution. Because of the gl
concavity of the function log(x + 1), this is smaller than 2Ci log(∑ j,l Ci, ij + 1), and be cause of the preceding remark ∑ j,l gli, j ≤ni , this is still smaller than 2Ci log( nCii + 1). The functions fi (x)=2x log( nxi +1) are concave for x≤ni , so E( fi (Ci ))≤ fi (E(Ci )). As the average complexity of the algorithm in array Ai is E( f (Ci )), and as E(Ci ) = ρ, on average the algorithm performs less than 2ρ log( nρi + 1) comparisons between m and an element in array Ai . Summing over i we get the final result, which is O(ρ ∑i log nρi ).
4 Randomized Complexity Lower Bound

We now prove that no randomized algorithm can do asymptotically better. The proof is quite similar to the lower bound of the alternation model [1], and differs mostly in Lemma 1, which must be adapted to the redundancy and whose lower bound is improved by a constant multiplicative factor. In Lemma 1 we prove a lower bound on average on a distribution of instances of redundancy at most ρ = 4 and of output size at most 1. We use this result in Lemma 2 to define a distribution on instances of redundancy at most ρ ∈ {4, ..., 4 n1} by combining p = o(ρ) sub-instances. In Lemma 3 we prove that any instance of signature (k, n1, ..., nk) has redundancy ρ at most 2 n1 + 1, so that the result of Lemma 2 holds for any ρ ≥ 4. Finally, applying the Yao-von Neumann principle [7–9] in Theorem 2, this gives us a lower bound of Ω(ρ ∑_{i=2}^k log(n_i/ρ)) on the complexity of any randomized algorithm for the intersection problem.

Lemma 1. For any k ≥ 2, and 0 < n1 ≤ ... ≤ nk, there is a distribution on instances of the intersection problem with signature at most (k, n1, ..., nk), and redundancy at most 4, such that any deterministic algorithm performs at least (1/4) ∑_{i=2}^k log n_i + ∑_{i=2}^k 1/(2 n_i + 1) − k + 2 comparisons on average.

Proof. Let C be the total number of comparisons performed by the algorithm, and for each array A_i note F_i = log2(2 n_i + 1), and F = ∑_{i=2}^k F_i. Let us draw an index w ∈ {2, ..., k} equal to i with probability F_i/F, and (k − 1) positions (p_i)_{i∈{2,...,k}} such that each p_i is chosen uniformly at random in {1, ..., n_i}. Let P and N be two instances such that in both P and N, for any 1 < j ≤ k, a ∈ A1, b, c ∈ A_i and d, e ∈ A_j then b