Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5799
Zhiming Liu Anders P. Ravn (Eds.)
Automated Technology for Verification and Analysis
7th International Symposium, ATVA 2009
Macao, China, October 14-16, 2009
Proceedings
Volume Editors

Zhiming Liu
United Nations University, International Institute of Software Technology (UNU-IIST)
Macao, China
E-mail: [email protected]

Anders P. Ravn
Aalborg University, Department of Computer Science
9220 Aalborg, Denmark
E-mail: [email protected]

Library of Congress Control Number: 2009935680
CR Subject Classification (1998): D.2, D.3, F.3, G.4, I.2.2, F.4, B.6, B.7, C.5.4
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN: 0302-9743
ISBN-10: 3-642-04760-2 Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-04760-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com

© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12768271 06/3180 543210
Preface
This volume contains the papers presented at the 7th International Symposium on Automated Technology for Verification and Analysis, held during October 13–16, 2009 in Macao SAR, China. The primary objective of the ATVA conferences remains the same: to exchange and promote the latest advances of state-of-the-art research on theoretical and practical aspects of automated analysis, verification, and synthesis.

Among 74 research papers and 10 tool papers submitted to ATVA 2009, the Program Committee accepted 23 as regular papers and 3 as tool papers. In all, 33 experts from 17 countries worked hard to make sure that every submission received a rigorous and fair evaluation. In addition, the program included three excellent tutorials and keynote talks by Mark Greenstreet (U. British Columbia), Orna Grumberg (Technion), and Bill Roscoe (Oxford University). The conference organizers were truly grateful to have such distinguished researchers as keynote speakers.

Many worked hard and offered their valuable time so generously to make ATVA 2009 successful. First of all, the conference organizers thank all 229 researchers who worked hard to complete and submit papers to the conference. The PC members, reviewers, and Steering Committee members also deserve special recognition. Without them, a competitive and peer-reviewed international symposium simply cannot take place.

Many organizations sponsored the symposium. They include: the United Nations University, International Institute of Software Technology (UNU-IIST); Macao Polytechnic Institute (MPI); Macao POST; and Formal Methods Europe (FME). The conference organizers thank them for their generous support and assistance.

Many individuals offered their enthusiastic help to the conference. We are grateful to Lei Heong Iok, President of MPI; Cheang Mio Han, Head of the Academic Affairs Department of MPI; and Hau Veng San, Head of the Division for Pedagogical Affairs of MPI, for their support. We would like to thank Chris George, the Conference Chair; Antonio Cerone, the Local Organization Chair; and Jun Pang, the Publicity Chair, for their hard work; Wang Zhen for writing the online registration program; Charles Morisset for his help in managing the proceedings and the program; and, last but not least, the general staff of UNU-IIST (Wendy Hoi, Kitty Chan, Michelle Ho, Sandy Lee, and Kyle Au) for their help with the local organization.

We sincerely hope that the readers find the proceedings of ATVA 2009 informative and rewarding.

August 2009
Zhiming Liu Anders P. Ravn
Organization
ATVA 2009 was organized by the United Nations University, International Institute for Software Technology (UNU-IIST) in cooperation with the Macao Polytechnic Institute.
Conference Chairs

General Chair: Chris George (UNU-IIST, Macao)
Program Chairs: Zhiming Liu (UNU-IIST, Macao) and Anders P. Ravn (Aalborg University, Denmark)
Organization Chair: Antonio Cerone (UNU-IIST, Macao)
Publicity Chair: Jun Pang (University of Luxembourg)
Program Committee

Rajeev Alur, Christel Baier, Jonathan Billington, Laurent Fribourg, Masahiro Fujita, Susanne Graf, Mark Greenstreet, Wolfgang Grieskamp, Teruo Higashino, Moonzoo Kim, Orna Kupferman, Marta Kwiatkowska, Insup Lee, Xuandong Li, Shaoying Liu, Kedar Namjoshi, Hanne Nielson, Ernst-Ruediger Olderog, Jun Pang, Doron A. Peled, Abhik Roychoudhury, Natarajan Shankar, Irek Ulidowski, Mahesh Viswanathan, Farn Wang, Xu Wang, Ji Wang, Hsu-Chun Yen, Wang Yi, Tomohiro Yoneda, Wenhui Zhang
Steering Committee

E. Allen Emerson (U. Texas at Austin)
Teruo Higashino (Osaka University)
Oscar H. Ibarra (U. California, Santa Barbara)
Insup Lee (U. Pennsylvania)
Doron A. Peled (Univ. Warwick and Univ. Bar Ilan)
Farn Wang (National Taiwan Univ.)
Hsu-Chun Yen (National Taiwan Univ.)
External Referees

Eugene Asarin, Souheib Baarir, Rena Bakhshi, Bruno Barras, Denes Bisztray, Laura Bocchi, Benedikt Bollig, Olivier Bournez, Lei Bu, Doina Bucur, Lin-Zan Cai, Radu Calinescu, Rohit Chadha, Taolue Chen, Vivien Chinnapongse, Thao Dang, Arnab De, Ton van Deursen, Alin Deutsch, Catalin Dima, Bruno Dutertre, Jochen Eisinger, Dana Fisman, Guy Gallasch, Pierre Ganty, Han Gao, Vijay Gehlot, Ankit Goel, Stefan Haar, Stefan Hallerstede, Alejandro Hernandez, Hsi-Ming Ho, Shin Hong, Guo-Chou Huang, Chung-Hao Huang, Lei Ju, Barbara Kordy, Mark Kattenbelt, Joachim Klein, Vijay Korthikanti, Jaewoo Lee, Wanwei Liu, Christof Loeding, Gavin Lowe, Yoad Lustig, Xiaodong Ma, Stephane Messika, Gethin Norman, Chun Ouyang, Jorge A. Perez, Henrik Pilegaard, Pavithra Prabhakar, Jan-David Quesel, Nataliya Skrypnyuk, Martin Stigge, Tim Strazny, Mani Swaminathan, Ashish Tiwari, Paolo Torrini, Ashutosh Trivedi, Somsak Vanit-Anunchai, Jeffrey Vaughan, Ramesh Viswanathan, Xi Wang, Zhaofei Wang, Shaohui Wang, Andrew West, Anton Wijs, Hong-Hsin Wu, Rong-Hsuan Wu, Shaofa Yang, Hsuen-Chin Yang, Lu Yang, Ender Yuksel, Jianhua Zhao, Christian W. Probst
Sponsoring Institutions

United Nations University – International Institute for Software Technology
Macao Post
Macao Polytechnic Institute
Formal Methods Europe
Table of Contents
Invited Talks

Verifying VLSI Circuits .... 1
    Mark R. Greenstreet
3-Valued Abstraction for (Bounded) Model Checking (Abstract) .... 21
    Orna Grumberg
Local Search in Model Checking .... 22
    A.W. Roscoe, P.J. Armstrong, and Pragyesh

State Space Reduction

Exploring the Scope for Partial Order Reduction .... 39
    Jaco Geldenhuys, Henri Hansen, and Antti Valmari
State Space Reduction of Linear Processes Using Control Flow Reconstruction .... 54
    Jaco van de Pol and Mark Timmer
A Data Symmetry Reduction Technique for Temporal-epistemic Logic .... 69
    Mika Cohen, Mads Dam, Alessio Lomuscio, and Hongyang Qu

Tools

TAPAAL: Editor, Simulator and Verifier of Timed-Arc Petri Nets .... 84
    Joakim Byg, Kenneth Yrke Jørgensen, and Jiří Srba
CLAN: A Tool for Contract Analysis and Conflict Discovery .... 90
    Stephen Fenech, Gordon J. Pace, and Gerardo Schneider
UnitCheck: Unit Testing and Model Checking Combined .... 97
    Michal Kebrt and Ondřej Šerý

Probabilistic Systems

LTL Model Checking of Time-Inhomogeneous Markov Chains .... 104
    Taolue Chen, Tingting Han, Joost-Pieter Katoen, and Alexandru Mereacre
Statistical Model Checking Using Perfect Simulation .... 120
    Diana El Rabih and Nihal Pekergin
Quantitative Analysis under Fairness Constraints .... 135
    Christel Baier, Marcus Groesser, and Frank Ciesinski
A Decompositional Proof Scheme for Automated Convergence Proofs of Stochastic Hybrid Systems .... 151
    Jens Oehlerking and Oliver Theel

Medley

Memory Usage Verification Using Hip/Sleek .... 166
    Guanhua He, Shengchao Qin, Chenguang Luo, and Wei-Ngan Chin
Solving Parity Games in Practice .... 182
    Oliver Friedmann and Martin Lange
Automated Analysis of Data-Dependent Programs with Dynamic Memory .... 197
    Parosh Aziz Abdulla, Muhsin Atto, Jonathan Cederberg, and Ran Ji

Temporal Logic I

On-the-fly Emptiness Check of Transition-Based Streett Automata .... 213
    Alexandre Duret-Lutz, Denis Poitrenaud, and Jean-Michel Couvreur
On Minimal Odd Rankings for Büchi Complementation .... 228
    Hrishikesh Karmarkar and Supratik Chakraborty
Specification Languages for Stutter-Invariant Regular Properties .... 244
    Christian Dax, Felix Klaedtke, and Stefan Leue

Abstraction and Refinement

Incremental False Path Elimination for Static Software Analysis .... 255
    Ansgar Fehnker, Ralf Huuck, and Sean Seefried
A Framework for Compositional Verification of Multi-valued Systems via Abstraction-Refinement .... 271
    Yael Meller, Orna Grumberg, and Sharon Shoham
Don't Know for Multi-valued Systems .... 289
    Alarico Campetelli, Alexander Gruler, Martin Leucker, and Daniel Thoma
Logahedra: A New Weakly Relational Domain .... 306
    Jacob M. Howe and Andy King

Fault Tolerant Systems

Synthesis of Fault-Tolerant Distributed Systems .... 321
    Rayna Dimitrova and Bernd Finkbeiner
Formal Verification for High-Assurance Behavioral Synthesis .... 337
    Sandip Ray, Kecheng Hao, Yan Chen, Fei Xie, and Jin Yang
Dynamic Observers for the Synthesis of Opaque Systems .... 352
    Franck Cassez, Jérémy Dubreil, and Hervé Marchand

Temporal Logic II

Symbolic CTL Model Checking of Asynchronous Systems Using Constrained Saturation .... 368
    Yang Zhao and Gianfranco Ciardo
LTL Model Checking for Recursive Programs .... 382
    Geng-Dian Huang, Lin-Zan Cai, and Farn Wang
On Detecting Regular Predicates in Distributed Systems .... 397
    Hongtao Huang

Author Index .... 413
Verifying VLSI Circuits

Mark R. Greenstreet
Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
[email protected]

1 Introduction

Circuit-level verification is a promising area for formal methods research. Simulation using tools such as SPICE remains the main method for circuit validation. Increasing integration densities have increased the prevalence of analog/mixed-signal designs. It is now common for analog components such as DLLs and phase correction circuits to be embedded deep in digital designs, making these circuits critical for chip functionality yet hard to test. While digital design flows have benefited from systematic methodologies, including the use of formal methods, circuit design remains an art. As a consequence, analog design errors account for a growing percentage of design re-spins. All of these factors have created a pressing need for better circuit-level CAD and motivated a strong interest in formal verification.

There are many properties of circuit-level design that make it attractive for formal approaches: key circuits tend to be small. Thus, unlike digital designs, the problems are not primarily ones of scale. Instead, the challenge is to correctly specify, model, and verify circuit behavior. Furthermore, circuit-level design for both analog and digital cells is the domain of design experts who expect to spend a substantial amount of time on each cell designed. Thus, they can consider working with a verification expert if that interaction leads to a reduction in design time or risk.

In this paper, we identify three types of verification problems: introductory, intermediate, and open challenges; give examples of each; and present verification results. Section 2 presents "introductory problems" using a simple tunnel-diode oscillator that has been extensively studied by formal methods researchers. These introductory problems have low-dimensional dynamics described by models chosen for mathematical convenience rather than physical realism. Such problems can provide a useful introduction to circuit behavior for verification researchers with expertise developed in other application domains.

Intermediate problems are introduced in Section 3. These are circuits that are used on real integrated circuits and modeled with a reasonable level of physical fidelity. Examples include the ring-oscillator challenge put forward by Rambus last year. Verification results for these problems may be of immediate use to design experts. Equally important, the process of formalizing the specification and circuit model provides a common language for designers, verifiers, and CAD tool developers. Finally, Section 4 presents challenge problems whose solutions would have significant impact on the circuit design community but at present have not been addressed by
formal methods. Chief among these are design problems that arise from uncertainty in device parameters. On-chip analog circuits such as oscillators, voltage regulators, and sense amplifiers often include a large number of digital control inputs that compensate for uncontrollable variations in the fabrication process and operating conditions. These circuits are genuine hybrid systems with a digital control loop regulating analog components. CAD tools that could verify such designs over the full range of operating modes and fabrication parameters would be of great interest to designers and present a great challenge and opportunity for formal methods research.
2 Introductory Examples

Research in applying formal methods to circuit-level verification dates back to Kurshan and McMillan's verification of an nMOS arbiter using COSPAN in 1991 [1]. Interest in the topic has grown rapidly in the past five years as advances in reachability tools have increased the range of problems that can be addressed formally, and as advances in fabrication technology have led to a prevalence of mixed-signal designs and to a wide range of circuit-level behaviors that must be considered to obtain a working design but for which existing CAD flows provide no automated means of verification or validation.

Not surprisingly, much of the early research in this area has focused on simple circuits and/or used simple models. These simplifications serve two purposes: first, they provide problems that are tractable with current verification tools; second, they serve a pedagogical role for formal methods researchers by providing examples that can be easily understood without a deep knowledge of circuit design and analysis techniques. In this section, I will use the tunnel-diode oscillator circuit first considered in a formal verification context by Hartong et al. [2] as an example of an "introductory problem" and as a way to illustrate dynamical systems concepts that can be applied to circuit verification. This section closes with a brief description of other problems of a similarly introductory nature.

2.1 The Tunnel-Diode Oscillator

[Fig. 1. A Tunnel-Diode Oscillator: a voltage source V_in drives a series resistor R and inductor L, feeding a tunnel diode in parallel with a capacitor C (schematic omitted).]

The tunnel-diode oscillator consists of the following components:
- a voltage source, which provides the voltage (corresponding to mechanical force) that drives the oscillator.
- a resistor, the unavoidable friction or drag in circuit components. By Ohm's law, I = V/R, where I is the current flowing through the resistor, V is the voltage applied across the resistor, and R is the "resistance" of the resistor. In the mechanical analogy, current is the velocity of an object, and charge is the distance that it has moved.
- an inductor, the electrical equivalent of "mass" in mechanical systems. The current flowing through an inductor grows at a rate proportional to the voltage applied across the inductor, just as the velocity of a mass grows at a rate proportional to the force applied to it. The circuit equation for the inductor is $\frac{d}{dt} I_L = V_L / L$.
- a capacitor, the electrical equivalent of a spring. The charge held in a (linear) capacitor is proportional to the voltage applied, just as the displacement of a (linear) spring is proportional to the force applied. The circuit equation for a capacitor is $\frac{d}{dt} V_C = I_C / C$.
- a tunnel diode, a non-linear device that compensates for the energy dissipated by the resistor. The distinctive feature of the tunnel diode is the region where the current decreases with increasing applied voltage. This creates a "negative incremental resistance" that can be used to cancel the resistive losses in the inductor (and other components). Figure 2 shows the piecewise cubic relationship between current and voltage from [3] that we use for this example.

$$I_{TD}(V_{TD}) = \begin{cases} 6.0105\,V_{TD}^3 - 0.9917\,V_{TD}^2 + 0.0545\,V_{TD}, & 0.000 \le V_{TD} \le 0.055 \\ 0.0692\,V_{TD}^3 - 0.0421\,V_{TD}^2 + 0.0040\,V_{TD} + 8.85794\times10^{-4}, & 0.055 \le V_{TD} \le 0.350 \\ 0.2634\,V_{TD}^3 - 0.2765\,V_{TD}^2 + 0.0968\,V_{TD} - 0.0112, & 0.350 \le V_{TD} \end{cases}$$

Fig. 2. Current vs. Voltage for a Tunnel Diode (plot of I_TD in amperes against V_TD in volts omitted)
We can obtain an ordinary differential equation that models the oscillator circuit using Kirchhoff's current and voltage laws. Kirchhoff's current law (KCL) states that the sum of currents entering any node must be zero. This yields:

$$I_{V_{in}} = I_R = I_L = I_{TD} + I_C \tag{1}$$
Kirchhoff's voltage law (KVL) states that the sum of the voltages around any loop must be zero. This yields:

$$V_{TD} = V_C, \qquad V_{in} = V_R + V_L + V_C \tag{2}$$

Combining these with the component models described above yields:

$$\dot{I}_L = \tfrac{1}{L}\,(V_{in} - I_L R - V_C), \qquad \dot{V}_C = \tfrac{1}{C}\,(I_L - I_{TD}(V_C)) \tag{3}$$

where $\dot{I}_L$ denotes the time derivative of I_L, and likewise for $\dot{V}_C$.
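To make Equation 3 concrete, here is a minimal Python sketch (my own illustration, not code from the paper) that integrates the ODE with SciPy, using the tunnel-diode model of Figure 2 and the circuit parameters from [4] discussed below; the 0.05 V swing threshold used to decide "oscillates" is an arbitrary choice:

```python
import numpy as np
from scipy.integrate import solve_ivp

def i_td(v):
    """Piecewise-cubic tunnel-diode model of Figure 2 (coefficients from [3])."""
    if v <= 0.055:
        return 6.0105*v**3 - 0.9917*v**2 + 0.0545*v
    if v <= 0.350:
        return 0.0692*v**3 - 0.0421*v**2 + 0.0040*v + 8.85794e-4
    return 0.2634*v**3 - 0.2765*v**2 + 0.0968*v - 0.0112

VIN, L, C = 0.3, 1e-6, 1e-12          # circuit parameters from [4]

def oscillator(t, x, r):
    il, vc = x
    return [(VIN - il*r - vc) / L,    # Equation 3: dI_L/dt
            (il - i_td(vc)) / C]      # Equation 3: dV_C/dt

for r in (100.0, 200.0, 300.0):
    sol = solve_ivp(oscillator, (0.0, 30e-9), [0.0, 0.0],
                    args=(r,), max_step=2e-11, rtol=1e-8)
    swing = np.ptp(sol.y[1, sol.t > 20e-9])   # V_C swing over the last 10 ns
    print(f"R = {r:.0f} ohm: {'oscillates' if swing > 0.05 else 'locks up'}")
```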
Numerically integrating Equation 3 yields waveforms for the tunnel-diode oscillator. Here, we use the circuit parameters from [4]: V_in = 0.3 V, L = 1 μH, C = 1 pF, and the tunnel-diode model from Figure 2, and we illustrate the behavior for three values of R: 100 Ω, 200 Ω, and 300 Ω, with the initial condition I_L = 0 and V_C = 0 in all three cases. Figure 3(a) shows that the circuit oscillates for R = 100 Ω and R = 200 Ω but locks up at a fixpoint when R = 300 Ω.

[Fig. 3. Waveforms and Trajectories for the Tunnel Diode Oscillator: (a) V_C waveforms for R = 100, 200, and 300 Ω; (b) trajectories in the I_L × V_C plane for R = 200 Ω (plots omitted).]

Natural questions that a designer could have about this circuit include: "For what values of R will the circuit oscillate?" "Does oscillation occur from all initial states, or does proper operation depend on the initial conditions?" "How much amplitude and period variation will appear at the output of the oscillator?" "How is power supply noise converted into amplitude and phase noise by the oscillator?" I examine the first two of these questions using methods from dynamical systems theory in the next section and then conclude this treatment of this simple oscillator by comparing the dynamical systems approach with time-domain analysis methods published earlier.

A Dynamical Systems Analysis of the Tunnel-Diode Oscillator. The ODE model for the tunnel-diode oscillator has two state variables: I_L and V_C. Thus, the ODE has a two-dimensional phase space. Solutions to Equation 3 can be visualized as trajectories in this I_L × V_C space. Figure 3(b) shows two such trajectories for the oscillator circuit with R = 200 Ω. The first starts at the point (I_L, V_C) = (1.2 mA, 0.0591 V), which is very close to the (unstable) equilibrium point of the oscillator. The other trajectory starts
at (I_L, V_C) = (0 mA, 0.75 V), which is well "outside" the stable orbit of the oscillator. As can be seen, both trajectories converge rapidly to a clockwise, periodic orbit; the convergence is fast enough that the two trajectories are visually indistinguishable for most of the orbit.

Because the phase space for the tunnel-diode oscillator's ODE is two-dimensional, the trajectory originating at any given initial condition must either converge to a fixed equilibrium point, diverge to infinity, or enter a periodic orbit [5]. Thus, we first consider the equilibrium points of the oscillator; these are the points at which $\dot{I}_L = 0$ and $\dot{V}_C = 0$. We can find all such points by solving for the roots of three cubic polynomials, one for each of the regions of the tunnel-diode model. When R = 200 Ω, the oscillator has one equilibrium point, with I_L = 0.84 mA and V_C = 0.132 V. Let f be the function such that $(\dot{I}_L, \dot{V}_C) = f(I_L, V_C)$. We check the stability of the equilibrium point by computing the eigenvalues of the Jacobian matrix for f at the equilibrium point:

$$\mathrm{Jac}(f, (0.84\,\mathrm{mA}, 0.132\,\mathrm{V})) = \begin{bmatrix} \frac{\partial \dot{I}_L}{\partial I_L} & \frac{\partial \dot{I}_L}{\partial V_C} \\[2pt] \frac{\partial \dot{V}_C}{\partial I_L} & \frac{\partial \dot{V}_C}{\partial V_C} \end{bmatrix}_{I_L = 0.84\,\mathrm{mA},\, V_C = 0.132\,\mathrm{V}} = \begin{bmatrix} -2\times10^{8}\,\mathrm{s}^{-1} & -1\times10^{6}\,\Omega^{-1}\mathrm{s}^{-1} \\ 1\times10^{12}\,\Omega\,\mathrm{s}^{-1} & 3.51\times10^{9}\,\mathrm{s}^{-1} \end{bmatrix} \tag{4}$$

The eigenvalues of this Jacobian matrix are 9.3 × 10^7 s^-1 and 3.2 × 10^9 s^-1. Both have positive real parts, which shows that this equilibrium point is a repeller: trajectories starting near this point diverge exponentially away from it.

Having characterized the one equilibrium point of the oscillator, we now know that every trajectory, except for the one that starts exactly at the equilibrium point, must converge to a periodic attractor. Furthermore, every periodic attractor in a two-dimensional system must have an equilibrium point in its interior. Because the tunnel-diode oscillator has only one equilibrium point, it has only one periodic attractor. We conclude that from all initial conditions (except for the equilibrium point itself), trajectories converge to the desired oscillation. In other words, the oscillator is guaranteed to start up and to enter the desired oscillation mode.

For concreteness, the preceding discussion analyzed the tunnel-diode oscillator circuit assuming that R = 200 Ω. The conclusions only require that the circuit have a unique equilibrium point that is a repeller. It is straightforward to show that this condition holds for any R < 243.59 Ω. For larger values of R, the equilibrium point becomes an attractor, and all trajectories converge to this point. This is an example of a Hopf bifurcation, where the qualitative behavior of the system changes at a critical value of a parameter. Furthermore, the analysis can be extended to consider other values of V_in. We note that for values of V_in greater than the local minimum in the tunnel-diode's current-vs.-voltage function (see Figure 2), it is possible to have multiple equilibrium points. For example, with V_in = 0.5 V and R = 600 Ω, there are equilibria at (I_L, V_C) ∈ {(0.955 mA, 0.023 V), (0.561 mA, 0.219 V), (0.407 mA, 0.186 V)}. The first and last of these are stable equilibria, while the middle one is unstable. In this case, the circuit has two stable states, and the middle equilibrium is the saddle point (a.k.a. "metastable" point) between them.
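The repeller check above is easy to reproduce numerically. The sketch below (again my own illustration) forms the Jacobian of Equation 3 at the R = 200 Ω equilibrium, using a finite-difference slope of the tunnel-diode curve in place of an analytic derivative, and confirms that both eigenvalues have positive real parts:

```python
import numpy as np

R, L, C = 200.0, 1e-6, 1e-12
V_EQ = 0.132                          # equilibrium V_C for R = 200 ohm

def i_td(v):
    """Piecewise-cubic tunnel-diode model of Figure 2."""
    if v <= 0.055:
        return 6.0105*v**3 - 0.9917*v**2 + 0.0545*v
    if v <= 0.350:
        return 0.0692*v**3 - 0.0421*v**2 + 0.0040*v + 8.85794e-4
    return 0.2634*v**3 - 0.2765*v**2 + 0.0968*v - 0.0112

# Slope of the tunnel-diode curve at the equilibrium; it is negative here
# because the equilibrium sits in the negative-resistance region.
h = 1e-6
g = (i_td(V_EQ + h) - i_td(V_EQ - h)) / (2*h)

# Jacobian of Equation 3, matching the entries of Equation 4.
J = np.array([[-R/L, -1.0/L],
              [1.0/C, -g/C]])
eigs = np.linalg.eigvals(J)
print(eigs)                           # approx. 3.2e9 and 9.3e7, as in the text
assert all(e.real > 0 for e in eigs)  # both positive: the point is a repeller
```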
In this section, I have shown that the behavior of the tunnel-diode oscillator can be understood using standard methods from dynamical systems theory. The two-dimensional phase space makes a detailed analysis tractable. I have shown that, for appropriate values of the series resistor, the oscillator is guaranteed to oscillate from all initial conditions except for a single point in the phase space, and I have presented constraints on the value of R that are necessary and sufficient for correct oscillation. Without external disturbances, the oscillator settles to a unique, periodic attractor, which means that the amplitude and phase noise go to zero as the oscillator operates. Similar techniques can be used to determine the sensitivity of the output amplitude and period to power supply noise and to answer other questions that a designer might have about the circuit.

Time Domain Analysis of the Tunnel Diode Oscillator. Many formal methods researchers have used the tunnel-diode oscillator circuit from Figure 1, starting with Hartong et al., who introduced the problem in [2]. They used a recursive subdivision algorithm to partition the continuous state space into smaller boxes, constructed a next-state relation on these boxes, and then used CTL model checking on this relation. They showed that nearly all boxes lead to the desired oscillating behavior and identified a fairly tight approximation of the stable orbit. Because of their discretization, they could not verify that start-up occurs everywhere except for a single point, as in our analysis: the discretization leaves many small boxes near the equilibrium point for which they could not prove that oscillation is guaranteed. There are also boxes along the border of their discretization for which the verification fails because they cannot show that trajectories remain within the discretized region.

Other researchers have verified the tunnel-diode oscillator example, generally following the same basic approach as Hartong et al. of discretizing the state space and model checking the discretization. The variations are mainly in how the discretization is performed and how the next-state relation is determined. Gupta et al. [6] present a verification of the oscillator with CheckMate, which represents the reachable space with flowpipes. They show how they can verify proper oscillation for a 200 Ω resistor, but show that the circuit fails to oscillate with a 242.13 Ω resistor. They do not give the polynomial coefficients for their tunnel-diode model, and given that there seem to be slight variations in these coefficients between different papers, this could account for the slight discrepancy with our conclusion that the circuit oscillates for any R < 243.59 Ω. As another example, the authors of [7] verify the oscillator using PHAVer, which uses high-dimensional polyhedra to represent reachable sets or over-approximations thereof. Little et al. [3] present a verification of the tunnel-diode oscillator where the circuit is modeled with a labeled hybrid Petri net (LHPN) and the reachable state space is represented using difference bound matrices.

Other Introductory Examples. There are numerous other examples that have appeared in the research literature that I will include in the category of introductory examples. These are characterized by using simplified, often idealized, circuit models with a low number of dimensions, typically four or less. Much of the early work was done at UBC, including the verification of an asynchronous arbiter [8], a toggle flip-flop [9], and a van der Pol oscillator [10].
Like the tunnel-diode oscillator, these examples have simple, cubic polynomials for their derivative functions, and the verification approaches were built on results from dynamical systems theory. These simple examples show how it is possible to specify a wide range of circuit behaviors,
both analog and digital, as topological properties of invariant sets of trajectories. The simple models make the analysis tractable and intuitive. In addition to the tunnel-diode oscillator, Hartong's paper [2] also presented a Schmitt trigger and a low-pass filter. Gupta et al. [6] added a Delta-Sigma modulator (analog-to-digital converter) circuit to the set of "standard" examples. This design differs from the oscillators and other circuits described above as it is modeled with discrete time. Thus, the verification problem is one of computing the reachable space for an iterated sequence of linear mappings, where each mapping is selected according to the sequence of threshold decisions made by the comparator. Little et al. [3] presented a switched-capacitor integrator, which has continuous, linear dynamics with discrete event transitions between different dynamics. In a different direction, Seger [Seger08] described the use of the Forte verification tool to find state assignments that minimize the leakage current of logic blocks in standby, power-saving modes.
3 Intermediate Examples

This section presents design and verification examples that use accurate circuit models for deep-submicron fabrication processes, are larger circuits that hence have higher-dimensional state spaces, and are problems of interest to practicing VLSI designers. Section 3.1 examines a differential ring oscillator that was put forward by Jones et al. [11] as a challenge problem: "For what transistor sizings is the oscillator guaranteed to start up properly?" I will then describe some recent work on computing synchronizer failure probabilities. Synchronizers are used to implement reliable communication in systems that have multiple, independent clocks. It is well known that it is impossible to implement a perfect synchronizer. Instead, our goal is to show that the failure probability for the synchronizer is sufficiently low. While this is not "formal" verification, the techniques that we present build on the same dynamical systems framework that we are using throughout this paper, and the problem is critical for today's chip designs. Section 3.3 concludes this section with brief descriptions of other intermediate examples and their verification.

3.1 The Rambus Ring Oscillator

[Fig. 4. A Four-Stage Differential Ring Oscillator: eight nodes x(i,j) driven by forward inverters (fwd) and cross-coupling inverters (cc) (schematic omitted).]

Figure 4 shows the four-stage instantiation of the differential ring-oscillator circuit put forward by researchers at Rambus [11] as a verification challenge. Although the circuit has been used by VLSI designers since long before it was proposed as a challenge
fwd x(0,1) cc
cc fwd x(1,1)
fwd x(0,2) cc
cc fwd x(1,2)
fwd x(0,3) cc
cc fwd x(1,3)
Fig. 4. A Four-Stage Differential Ring Oscillator
cc
problem, its introduction to the verification community has given it the moniker "the Rambus oscillator," and I will refer to it as the "Rambus oscillator" in the following. Designers have at times had versions of this circuit that worked in simulation but, once fabricated on a chip, would occasionally fail to start. It was understood that these lock-up issues were related to the relative sizes of the transistors in the "forward" and "cross-coupling" inverters in the design. If the "forward" inverters (labeled fwd in the figure) are much larger than the "cross-coupling" inverters (labeled cc), then the circuit acts like a ring of 2n inverters and will settle to one of two states:

State 1: x(0,0), x(1,0), x(0,2), x(1,2) are high, and x(0,1), x(1,1), x(0,3), x(1,3) are low.
State 2: x(0,1), x(1,1), x(0,3), x(1,3) are high, and x(0,0), x(1,0), x(0,2), x(1,2) are low.   (5)
Conversely, if the cross-coupling inverters are much larger than the forward ones, then the circuit acts like n separate static latches and has 2^n stable states. If the forward and cross-coupling inverters have comparable strength, then the circuit should oscillate in a stable fashion. The Rambus challenge problem is to determine conditions on the sizes of the inverters that guarantee that the circuit will enter a stable oscillating condition from any initial condition. We note that it is easy to show that if a Rambus oscillator has an odd number of stages, then it has two stable DC equilibria. Thus, we only consider oscillators where n is even in the following.

A Dynamical Systems Analysis of the Rambus Oscillator. Our analysis proceeds in two phases (see [12] for details). First, we describe a method for locating all DC equilibria of a ring-oscillator circuit as depicted in Figure 4. We then analyze the stability of each equilibrium point. If all equilibria are unstable, then it is impossible for the circuit to settle at some stationary state. However, our analysis does not rule out the possibility that the circuit has some chaotic or higher-harmonic behavior. For an oscillator with a small number of stages, chaotic solutions seem unlikely, and our demonstration that it does not "lock up" addresses the main initialization concern of practicing designers.

An n-stage Rambus oscillator has 2n nodes. We model MOSFETs as voltage-controlled current sources with associated capacitances, and we model the interconnect in the oscillator as additional capacitances. We use HSPICE to generate tables of current data for the transistors in our circuit and use bilinear interpolation within this table. By using a fine grid for the table (0.01 V steps with a 1.8 V power supply), our models are very close to the HSPICE model. For simplicity, we ignore wire resistance and inductance, noting that they should be negligible for any reasonably compact layout of the circuit.

The state of the circuit is given by a vector of 2n elements, corresponding to the voltages on the 2n nodes of the circuit. A state is a DC equilibrium if the total current flowing into each node through the transistors is zero. A brute-force analysis of the ring oscillator would require searching a 2n-dimensional space for DC equilibria. In addition to being time consuming, it would be difficult to show that such a search was exhaustive. Instead, we first show how we can simplify the problem to the search of a 2-dimensional space, and then further reduce it to a
1-dimensional search problem. This formulation makes the identification of DC equilibria straightforward.

Our analysis is based on simple observations of the monotonicity of the drain-source current of a MOSFET. In particular, the drain-to-source current for a transistor (n-channel or p-channel) is positive monotonic in both the gate-to-source and drain-to-source voltages. As a consequence, the output current of an inverter (see Figure 5) is negative monotonic in both the input and output voltages of the inverter.

[Fig. 5. An Inverter: input V_in at the transistor gates, output V_out carrying output current I_out (schematic omitted).]

Because the I_inv function is monotonic, its inverses are also functions. In particular, we define in_inv(V_out, I_out) to be the input voltage to the inverter such that if the output voltage is V_out, then the output current will be I_out. In other words:

$$I_{inv}(in_{inv}(V_{out}, I_{out}),\, V_{out}) = I_{out} \tag{6}$$
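As a small illustration of Equation 6 (my own sketch, with a made-up smooth inverter model standing in for the tabulated HSPICE data), in_inv can be computed by one-dimensional root finding, since I_inv is monotonic in the input voltage:

```python
import numpy as np
from scipy.optimize import brentq

VDD = 1.8

def i_inv(vin, vout):
    """Toy inverter output-current model: negative monotonic in both vin
    and vout (a stand-in for the tabulated transistor data)."""
    return 1e-3*(np.tanh(4*(VDD/2 - vin)) - np.tanh(4*(vout - VDD/2))) \
           - 1e-4*vout

def in_inv(vout, iout):
    """Equation 6: the vin with i_inv(vin, vout) == iout. Monotonicity in
    vin guarantees a unique root inside a wide enough bracket."""
    return brentq(lambda vin: i_inv(vin, vout) - iout, -1.0, VDD + 1.0)

vout, iout = 0.6, 2.0e-4
vin = in_inv(vout, iout)
assert abs(i_inv(vin, vout) - iout) < 1e-12    # checks Equation 6
```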
We assume that all forward inverters have the same transfer function and that all cross-coupling inverters have the same transfer function; this assumption is not essential to the remaining analysis, but it simplifies the presentation. Let I_inv,fwd and I_inv,cc denote these two transfer functions, and let in_inv,fwd denote the in_inv function for a forward inverter. For i ∈ {0, 1} and 0 ≤ j < n, let V_i,j denote the voltage on node x(i,j) as indicated in Figure 4. We note that V is a DC equilibrium iff

$$\begin{array}{ll} I_{inv,fwd}(x(1,n{-}1),\,x(0,0)) + I_{inv,cc}(x(1,0),\,x(0,0)) = 0 & \wedge \\ I_{inv,fwd}(x(0,n{-}1),\,x(1,0)) + I_{inv,cc}(x(0,0),\,x(1,0)) = 0 & \wedge \\ I_{inv,fwd}(x(0,i),\,x(0,i{+}1)) + I_{inv,cc}(x(1,i{+}1),\,x(0,i{+}1)) = 0, & 0 \le i < n{-}1 \ \wedge \\ I_{inv,fwd}(x(1,i),\,x(1,i{+}1)) + I_{inv,cc}(x(0,i{+}1),\,x(1,i{+}1)) = 0, & 0 \le i < n{-}1 \end{array} \tag{7}$$
Let

$$back(V_{0,i}, V_{1,i}) = \big(in_{inv,fwd}(V_{0,i},\, -I_{inv,cc}(V_{1,i}, V_{0,i})),\; in_{inv,fwd}(V_{1,i},\, -I_{inv,cc}(V_{1,i}, V_{0,i}))\big) \tag{8}$$
In English: given the voltages on the outputs of an oscillator stage, back calculates the voltages that must be present on the inputs of the stage if the outputs are to be in DC equilibrium. We can use back repeatedly to work backwards all the way around the oscillator ring. In particular, if back^n(x, y) = (y, x), then we have found a DC equilibrium. Otherwise, we note that back^n(x, y) is positive monotonic in both x and y. Thus, for any choice of x, there exists exactly one choice of y such that there is some z with back^n(x, y) = (z, x), and this value of y can be found using standard root-finding techniques. Our verification procedure now is:
1. Sweep x from ground to the power supply voltage in small steps.
2. For each choice of x, find the y such that there exists a z with back^n(x, y) = (z, x).
3. Find consecutive choices of x between which the sign of y − z changes, and perform binary search on these intervals to find a pair (x, y) such that back^n(x, y) = (y, x).

The pairs found this way are the DC-equilibrium points of the circuit. The soundness of this method depends on having a small enough step in the sweep of the x values. In practice, we observe that the value of z changes smoothly with x; for example, Figure 6 shows the values of y and z that we obtain while sweeping x, and the smoothness of these curves suggests that we are not missing any equilibrium points. Thus, we have reduced the problem of finding all DC equilibria to a single, one-dimensional sweep.

[Fig. 6. Searching for DC Equilibria: y and z plotted against x over the supply range (plot omitted).]

To determine whether or not a DC equilibrium is stable, we construct the small-signal, transient model at each equilibrium point. This gives us a linear model for circuit behavior in a small neighborhood of the equilibrium:
$$\dot{V} \approx J_{eq}\,(V - V_{eq}) \tag{9}$$
where $\dot{V}$ is the time derivative of V, V_eq is the voltage vector for the equilibrium point, and J_eq is the matrix representation of the linear approximation for $\dot{V}$ when V is near V_eq. The solution to Equation 9 is

$$V(t_0 + t) = V_{eq} + e^{tJ_{eq}}\,(V(t_0) - V_{eq}) \tag{10}$$
If all of the eigenvalues of J_eq have negative real parts, then $e^{tJ_{eq}}$ goes to zero as t increases, and V(t_0 + t) converges to V_eq. Thus, such a DC equilibrium is stable (see [5] for an introduction to the dynamical systems theory). Conversely, if J_eq has any eigenvalues with positive real parts, then its DC equilibrium is unstable. To compute the Jacobian of $-C^{-1} I_m(V)$, we note that J is a matrix with

$$J(i,j) = \frac{\partial \dot{V}_i}{\partial V_j} \tag{11}$$
For simplicity, we assume that C is constant, and it suffices to compute the partial derivatives of I_m(V) with respect to the components of V.
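A generic version of this stability test is only a few lines of Python (an illustrative sketch; vdot is a hypothetical callable returning the node-voltage derivatives, e.g., −C⁻¹ I_m(V)): the Jacobian of Equation 11 is approximated by central differences, and a circuit is lock-up-free when every equilibrium has an eigenvalue with positive real part, which is the property the test of Fig. 7(a) below checks:

```python
import numpy as np

def jacobian(vdot, v, h=1e-6):
    """Central-difference Jacobian J(i,j) = d(vdot_i)/d(V_j) (Equation 11)."""
    v = np.asarray(v, dtype=float)
    J = np.empty((v.size, v.size))
    for j in range(v.size):
        e = np.zeros(v.size)
        e[j] = h
        J[:, j] = (np.asarray(vdot(v + e)) - np.asarray(vdot(v - e))) / (2*h)
    return J

def free_of_lockup(vdot, equilibria):
    """True iff every DC equilibrium is unstable (max real eigenvalue > 0),
    i.e. the circuit cannot settle at any stationary state."""
    return all(np.linalg.eigvals(jacobian(vdot, q)).real.max() > 0
               for q in equilibria)
```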
(a) Pseudo-code for the stability test:

```
Q := equilibrium points;
for each q ∈ Q do
    J := Jacobian operator at q;
    Λ := eigenvalues(J);
    μ := max_{λ ∈ Λ} Real(λ);
    if (μ > 0) then return(unstable); endif;
od;
return(stable);
```

(b) [Plot omitted: instability vs. size(cross-couple)/size(forward) for a 4-stage oscillator.]

Fig. 7. Classifying DC Equilibria
Figure 7(a) shows how we combine these pieces to obtain an algorithm to verify that a Rambus ring oscillator has no stable DC equilibria. We tested the methods described in the previous section by applying them to a family of Rambus ring-oscillator circuits designed in the TSMC 180 nm CMOS process. We implemented our analysis algorithms as Matlab scripts and compared our results with HSPICE simulations. The p-channel device in each inverter has twice the width of the n-channel transistor. We varied the transistor width to determine the inverter sizings required to guarantee oscillation. Figure 7(b) displays the results. Let

$$r = \frac{\mathrm{size(cross\text{-}coupling\ inverter)}}{\mathrm{size(forward\ inverter)}}, \qquad instability(r) = \min_{q \in Q(r)} \max_{e \in \Lambda(q)} \mathrm{real}(e) \tag{12}$$
where Q(r) is the set of all DC equilibria for an oscillator with an inverter size ratio of r, and Λ(q) is the set of eigenvalues of the Jacobian matrix for $\dot{V}$ at q. Thus, if instability(r) > 0, all DC equilibria are unstable and the oscillator will not lock up. Conversely, if instability(r) < 0, the circuit can lock up for some initial conditions. We determined that lock-up is excluded for 0.638 < r < 2.243. We then simulated the oscillator circuit using both HSPICE and Matlab's numerical integrator function (ode113) to explore the behavior near the critical values. As expected, we found lock-up behaviors for values of r < 0.638 and r > 2.243. To our surprise, we found that the four-stage oscillator can support stable oscillations for values of r very close to zero. For ε < r < 0.638 (with a very small, positive ε), the four-stage ring has three stable behaviors: it can lock up at either of the two stable DC equilibria described in Equation 5, or it can oscillate. The actual behavior depends on the initial conditions. This is a potentially treacherous situation for an analog circuit designer: it shows that there are designs for which many initial conditions (including the default in HSPICE) lead to the desired behavior, but some conditions can lead to lock-up. Thus, this is a failure mode that could easily go undetected using standard simulation methods and only be discovered in the test lab, or after the chip has been shipped in products. We believe that examples such as this demonstrate the value of a formal approach to analyzing analog circuits.
The method presented here (and in more detail in [12]) establishes conditions that ensure that the oscillator circuit is free from lock-up. While we used a four-stage oscillator as an example, the method works for any (even) number of stages because it only searches a one-dimensional space regardless of the number of stages in the oscillator. In particular, it works for the two-stage version originally presented in [11] or for oscillators with more stages. However, this analysis does not guarantee that there is only one stable oscillatory mode, nor does it ensure that unstable, higher-order modes die out quickly. Showing the uniqueness of the desired oscillatory mode could be attempted with reachability techniques such as those developed for hybrid systems, or through finding appropriate Lyapunov functions. These remain areas for future research.

Other Approaches to Verifying the Oscillator. Little and Myers [13] analyzed a two-stage oscillator for which they obtained an LHPN model derived from simulation traces and user-defined thresholds for node voltages. Their goal was to establish the stability of the oscillation, and they did not attempt to show that the oscillator starts correctly from any initial condition. Finding lock-up traces with their method requires starting from an appropriate set of simulation traces and choosing the right set of node-voltage thresholds for partitioning the state space. Tiwary et al. [14] used the Rambus oscillator as an example of their SAT-solver-based approach to circuit verification. In particular, they used a SAT solver to identify regions of the state space that might contain DC equilibrium points. They did not classify the stability of these equilibrium points, and the regions that they identified covered a large fraction of the total space. They noted that more exact analysis may be possible by using abstraction refinement. More recently, Zaki et al. [15] presented a similar approach using HySat [16] that identifies a very small region for each equilibrium point and characterizes the stability of the point. Comparing the relative strengths and weaknesses of satisfiability-based methods such as [15] and the dynamical systems approach of [12] is a topic for future research.

3.2 Synchronizer Failure Probabilities

Today's VLSI designs are often "systems-on-a-chip" [17] that combine a large number of independently designed "intellectual property" blocks. These blocks can operate at different clock frequencies because of different I/O requirements, different design targets, power optimization, etc. Consider two communicating blocks that operate from different clocks. A signal from the sending block may change at an arbitrary time relative to the receiver's clock. This can create a synchronization failure as depicted in Figure 8.
[Fig. 8. A D Flip-Flop and Metastability: Q traces for a D input that changes well before (A), near (B), and well after (C) the triggering clock edge, observed at time tx (diagram omitted).]
If the D input to the flip-flop changes well before the triggering transition of the clock, then the value of the Q output well after this rising edge (e.g., at time tx) will reflect the new value of D. This scenario is depicted by the traces labeled A in Figure 8. On the other hand, if D changes well after the triggering transition of the clock, the Q output will retain its old value, as depicted by the traces labeled C in the figure. The flip-flop is implemented by circuits that can be accurately modeled by ODEs with smooth derivative functions. Thus, the value of Q at any time is a continuous function of the parameters of the system. In particular, the value of Q at time tx is a continuous function of the time of the transition of the D input. Thus, there is some time (depicted in the figure with the traces labeled B) at which the D input can change such that the Q output will be at an intermediate value at time tx, no matter how long tx is after the clock event. Such a scenario is called a metastability failure.

Metastability has been known since the early work of Chaney and Molnar [18], and many proofs have been published showing that metastability is unavoidable given any reasonable model of the physical implementation of the synchronizer (e.g., [19,20,21]). Thus, we cannot hope to prove that a synchronizer circuit will never fail. Fortunately, the probability of failure decreases exponentially with the time allotted for the synchronizer output to settle. Thus, with a sufficient settling time, the probability of a synchronization failure can be made extremely small. On the other hand, designers do not want to make the settling time too large, as the extra latency of the synchronizer can adversely impact system performance. Kinniment et al. [22] have presented experimental measurements showing that simple linear models fail to give accurate estimates of synchronizer failure probabilities because they neglect non-linear circuit behaviors that occur as metastability "moves" from one latch to the next in a synchronizer chain. Thus, there is a need to be able to analyze synchronizer failure probabilities.

Consider a synchronizer that must handle an input transition every 10 ns. To achieve a mean time between failures (MTBF) of one year, the synchronizer must have a failure probability of less than about 10^-15 (a year is roughly 3.2 × 10^7 s, or about 3.2 × 10^15 transitions at one every 10 ns). This is close to the resolution of double-precision arithmetic and is impossible to verify with traditional circuit simulators. In fact, from our experience, simulators such as HSPICE have sufficient numerical accuracy to establish MTBFs of a few milliseconds, but they cannot establish any MTBF that would be acceptable for a practical design.

Figure 9 shows a simple latch that could be used in the implementation of a flip-flop; a typical flip-flop consists of two latches, a master and a slave, that are enabled on opposite polarities of the clock signal.

[Fig. 9. A Jamb Latch: cross-coupled inverters on nodes x and x̄, with data input d gated by clk and output q (schematic omitted).]

Metastability occurs when a transition on the d input occurs just before the falling edge of clk, bringing nodes x and x̄ to nearly the
same voltage. In particular, the latch has an equilibrium point where both x and x̄ are at the point where the input and output voltages of the inverters are equal. For any reasonable inverter designs, this equilibrium point is unstable. In fact, the Jacobian matrix for the derivative operator must have one positive eigenvalue (the instability), and the other eigenvalues must be negative. This is what makes the circuit challenging for the numerical integrators used in circuit simulators: the positive eigenvalue means that errors are amplified in the forward-time direction, and the negative eigenvalues mean that trying to integrate backwards in time is also numerically unstable.

This instability has an important physical interpretation. Consider two trajectories that both spend a long time near the metastable point before one settles with q high and the other settles with q low. For most of this time, both trajectories are very close to the unstable equilibrium point, and therefore they are very close to each other. This means that a small-signal, linear model can be used to model the difference between the two trajectories. The full, non-linear circuit model must be used to determine the common component of the two trajectories.

Figure 10 shows how we used these observations to obtain an accurate way to compute synchronizer failure probabilities.

[Fig. 10. Simulation with Interpolation and Restart: Matlab simulations of trajectories bracketing the metastable one, with linear interpolation over the voltage interval (V_L, V_H) at each restart time t1, t2, ... (diagram omitted).]

As with traditional methods, we start with two simulations with two different input transition times: one that leads to the synchronizer settling high and the other that leads to it settling low. We bisect on this time interval until we get two trajectories that are close enough together to safely use linear interpolation to derive intermediate trajectories, but still sufficiently separated to allow a clear numerical distinction between these two paths. Our algorithm finds a time, t2 in the diagram, such that linear interpolation is valid in the interval [t1, t2], and the points in phase space, V_L1 and V_H1, are sufficiently separated as to allow a new round of bisection to be performed. Our algorithm repeats this process, finding a sequence of time points and trajectory endpoints that move closer and closer to the "perfectly metastable" trajectory. Crucial to our analysis is that at each step, the algorithm has a linear mapping from states at t_{i+1} back to states at t_i. Because the circuit model is highly non-linear, this mapping is only valid for trajectories that are close to the ones that we have found, but that is sufficient in our case. Composing these linear mappings, we can determine the width of the equivalent time interval at t1 that leads to synchronization failures at any given settling time.
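The following toy sketch is entirely illustrative: the latch is reduced to its single unstable mode dv/dt = λv, and λ, the gain, and the failure threshold are invented constants. It shows the bisection-plus-interpolation mechanics; the real analysis in [23] uses the full non-linear circuit model with the restart scheme of Figure 10:

```python
import numpy as np

LAM  = 3.0e9     # 1/s: unstable eigenvalue at the metastable point (invented)
GAIN = 1.0e9     # V/s: maps input-transition timing to initial imbalance
VTH  = 0.1       # V: |v| below this at the settling time counts as a failure

def v_final(t_d, t_settle):
    """Linearized latch: initial imbalance set by the d-input time t_d,
    amplified exponentially along the unstable direction."""
    return GAIN * t_d * np.exp(LAM * t_settle)

def failure_window(t_settle, lo=-1e-9, hi=1e-9, v_close=1.0):
    """Bisect on t_d until the settles-high and settles-low trajectories end
    close enough together that linear interpolation between them is
    trustworthy, then interpolate the failure-window width directly."""
    while v_final(hi, t_settle) - v_final(lo, t_settle) > v_close:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if v_final(mid, t_settle) > 0 else (mid, hi)
    slope = (v_final(hi, t_settle) - v_final(lo, t_settle)) / (hi - lo)
    return 2 * VTH / slope   # width of {t_d : |v(t_settle)| < VTH}

for ts in (1e-9, 2e-9, 5e-9):   # the window shrinks like exp(-LAM * t_settle)
    print(f"settling time {ts*1e9:.0f} ns -> window {failure_window(ts):.2e} s")
```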
We do not attempt to compute the location of this interval any more accurately than traditional simulation methods, as the precise location is unimportant for determining failure probabilities. We only need to know the width of the interval, and our algorithm computes very small widths without any numerical difficulties. More details on this approach are given in [23], and [24] shows how this approach can be used to generate waveforms, the "counterexamples" that show how metastability failures occur for particular circuits. Recently, our software has been used by research collaborators at SUN [25]. The analysis revealed failure modes that had not been previously considered and showed ways to improve the reliability of the synchronizer circuits being used in commercial designs.

This is a paper for a formal verification conference, so the reader might ask: "Is this really formal verification?" In many ways, the answer to this question is "Of course not." As noted above, no synchronizer can be perfect, so we cannot hope to prove that a synchronizer never fails. We might hope to prove that its failure probability is acceptably small. While this is possible, our current techniques are very pragmatically numerical, and they do not offer that kind of proof. Nevertheless, I believe that this kind of research is relevant to the formal methods community. First, it came out of the same dynamical systems perspective that is behind many of the techniques used for formal verification of circuits. By learning how to specify and verify circuits, we developed many of the conceptual tools (as well as much of the software) for computing synchronizer failure probabilities. Second, when an analysis tool indicates that changes should be made to a design, we need a high level of confidence that these changes really will improve the quality of the product. Design changes are expensive and risky, and when they come out of newly developed analysis tools, project managers have a good reason to scrutinize the claims and results very carefully. Although we have not formally verified the correctness of our approach, we have carefully checked the results by many different methods. A formal correctness proof would contribute even more confidence to using new methods like these, and we see this as a valuable area for future work.

3.3 Other Intermediate Examples

To the best of my knowledge, the first published research on circuit-level verification was by Kurshan and McMillan [1]. They verified an arbiter, the asynchronous cousin of the synchronizers described above. I classify this as an "intermediate" example as they used fairly realistic transistor models with a real circuit. Their verification assumed that inputs to the arbiter change instantaneously. Dastidar and Chakrabarti [26] implemented a model checker for analog circuits. As in [13], they discretized the state space and used SPICE simulations to obtain a next-state relation. They extended their discretized state-space model by also discretizing the possible input waveforms; in other words, inputs are fine-grained "stair-step" functions. I am including this as an "intermediate" example because they presented a systematic procedure for enumerating all possible state transitions relative to these discretizations. They used their model checker to verify properties of several circuits including a comparator, an operational amplifier, a VCO, and a successive-approximation analog-to-digital converter.
Recently, we have verified the CMOS counterpart of Kurshan and McMillan's arbiter [27]. Our verification uses the Brockett annulus construction [28], shown in Figure 11, to specify the allowed input transitions.

[Fig. 11. Brockett's Annulus: the annulus in the (x, ẋ) plane with low and high voltage bands V0l–V0h and V1l–V1h and transition regions 1–4, next to a "typical" trajectory (plot omitted).]

The annulus defines a relation between a signal's voltage and its time derivative. This bounds signals into appropriate intervals for logical low and logical high values; it also specifies monotonic transitions and bounds the slopes, and thus the transition times, of these transitions. Our verification applies for any input functions that satisfy these constraints. More details of the construction are in [27]. We have used similar techniques to verify a toggle flip-flop [29].
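As a sketch of how such a constraint can be checked on simulation data (my own illustration; the band edges and slope bounds below are invented, not the values from [27]):

```python
import numpy as np

def in_annulus(v, dv, v0l=0.0, v0h=0.4, v1l=1.4, v1h=1.8,
               slope_min=1e8, slope_max=1e10):
    """Membership test for a Brockett-style annulus: inside the low or high
    band the signal may idle; between the bands it must be in transit with
    a bounded, non-vanishing slope; outside both bands it is illegal."""
    if v0l <= v <= v0h or v1l <= v <= v1h:
        return abs(dv) <= slope_max
    if v0h < v < v1l:
        return slope_min <= abs(dv) <= slope_max
    return False

# Check a sampled waveform (a smoothed 200 MHz square wave) point by point.
t = np.linspace(0.0, 10e-9, 1001)
v = 0.9 + 0.9*np.tanh(3.0*np.sin(2*np.pi*2e8*t))
dv = np.gradient(v, t)
print(all(in_annulus(a, b) for a, b in zip(v, dv)))   # True for this signal
```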
4 Open Problems

The previous two sections presented examples that show how traditional formal methods techniques, such as reachability analysis and model checking, can be combined with concepts from dynamical systems theory to verify circuits of simple or moderate complexity. Where do we go from here?

Presently, one of the biggest concerns of circuit designers is device parameter variability [30]. Traditionally, circuit designers have had to cope with the fact that there could be large parameter variation between different wafers or different dies, but devices on the same die tend to be very well matched. More recently, this matching has been limited to devices that are physically proximate, but designers have been extremely adept at exploiting this matching. As fabrication technology enables the manufacture of chips with extremely small transistors and wires, statistical fluctuations between devices have become relatively large, and the traditional design methods no longer work. Instead, designers use digital control inputs to analog circuits to adaptively correct for variations in process parameters and operating conditions. For example, many microprocessors adaptively adjust the power supply voltage and the body bias (the voltage on the "fourth" terminal of the transistor that we often do not bother to include in schematics) to balance speed and leakage-power trade-offs [31,32]. Many chips include dedicated on-chip processors simply to manage power supply voltage, clock skew, body bias, and other variations (e.g., [33]). Thus, chips really are hybrid systems: analog circuits with non-linear dynamics and discrete control inputs whose values are determined by a software program.
This mix of analog circuitry with digital, adaptive control creates numerous verification challenges. First, detailed simulation with circuit simulators such as SPICE is too slow to be used while simulating the adaptive algorithm. Thus, designers use mixed-mode simulation languages such as Verilog-AMS [34] or VHDL-AMS to describe their designs. Individual circuits are simulated extensively with SPICE, and then the designer writes a Verilog-AMS model to match the observed behavior. Presently, there is little or no automatic support for verifying that the Verilog-AMS model matches the actual circuit. Furthermore, the detailed simulations may omit some assignments of the control inputs, while the Verilog-AMS code does something "sensible" with that input even though it may not be what the actual circuit does. This creates many opportunities for errors to be hidden in a design.

Based on these observations, I expect that many of the critical issues for circuit-level verification will be related to handling designs with parameter variations in a systematic way. This leads to three critical areas for further research:

1. Developing tools to verify that simplified circuit models such as those used in Verilog-AMS are valid abstractions of the actual circuit. Formal methods researchers have developed many ways of describing and verifying abstractions. I expect that there are good opportunities for formal methods research to contribute to this area.

2. Verifying the correct operation of the circuit over the range of possible parameter values. While a designer can often obtain a high level of confidence in digital designs by simulating the "corner cases," analog circuits can exhibit more subtle failures that require consideration of parameters in the interior of the parameter space as well. While symbolic tools such as PHAVer [4] can analyze a circuit with parameters left as symbolic quantities and then derive relations that must hold amongst the parameters for correct operation, these tools are presently limited to designs that have a small number of dimensions in their phase space and/or have very simple dynamics.

3. Verifying the composition of analog circuits with adaptive, digital control. As noted above, this is a true hybrid systems problem, and there appears to be a good opportunity to build on the experience and techniques of the hybrid systems community to make contributions here. Again, the complexity of the circuits and control algorithms, along with the non-linearity of the circuit models, presents challenges that go beyond current hybrid systems research.
5 Conclusions

I have presented an introduction to and survey of results and opportunities for formal verification of circuits using continuous models. My emphasis has been on using basic results from dynamical systems theory to specify and analyze circuits. In Section 2, I described the tunnel-diode oscillator problem that has often been used as an example for reachability tools, and showed how it can be verified and characterized using straightforward dynamical systems methods. This does not invalidate the reachability approach. Indeed, the challenge of circuit-level verification is largely one of the "curse of dimensionality": with continuous variables, a state space of only five to ten variables
can be challenging for existing tools. Thus, progress will almost certainly rely on combining reachability methods from the formal methods community with key results from dynamical systems theory to create tractable methods.

In Section 3, I described "intermediate problems" that are of interest to practicing circuit designers and can be addressed with existing techniques. Again, I used an oscillator as the primary example, in this case the differential ring oscillator challenge problem posed by researchers at Rambus. I again used a dynamical systems approach, first finding the equilibrium points of the circuit and then characterizing their stability. Unlike the tunnel-diode oscillator, refuting the possibility of fixed-point attractors does not prove that the circuit oscillates as intended. This is because the higher dimensionality of the Rambus oscillator's phase space opens the possibility of more complicated behaviors than are possible with the tunnel-diode circuit. I also sketched how a dynamical systems approach can be applied to synchronizer analysis and has been used by industrial designers.

Finally, I posed challenge problems for where we could go next with this work. Chief among these are finding ways of verifying analog circuits that have digital control inputs to compensate for variations in device parameters and operating conditions. These are genuine hybrid systems that pose challenges for modeling, abstraction, and verification.

In the past ten years, circuit-level verification has progressed from being an academic curiosity to a topic that addresses concerns of designers working with cutting-edge fabrication processes. The high cost of design and fabrication, the prevalence of mixed-signal designs, and the challenges of designing for deep-submicron processes all motivate developing new approaches to circuit design. Formal methods have the potential to make significant contributions to emerging design techniques.
Acknowledgements

I am grateful for interaction with many students and colleagues including Robert Drost, Kevin Jones, Ian Mitchell, Frank O'Mahony, Chao Yan and Suwen Yang. This work has been supported in part by grants from Intel, SUN Microsystems, and NSERC CRD-356905-07.
References

1. Kurshan, R., McMillan, K.: Analysis of digital circuits through symbolic reduction. IEEE Transactions on Computer-Aided Design 10(11), 1356–1371 (1991)
2. Hartong, W., Hedrich, L., Barke, E.: Model checking algorithms for analog verification. In: Proceedings of the 39th ACM/IEEE Design Automation Conference, June 2002, pp. 542–547 (2002)
3. Little, S., Seegmiller, N., Walter, D., Myers, C., Yoneda, T.: Verification of analog/mixed-signal circuits using labeled hybrid petri nets. In: Proceedings of the International Conference on Computer Aided Design, November 2006, pp. 275–282 (2006)
4. Frehse, G.: PHAVer: Algorithmic verification of hybrid systems past HyTech. In: Morari, M., Thiele, L. (eds.) HSCC 2005. LNCS, vol. 3414, pp. 258–273. Springer, Heidelberg (2005)
5. Hirsch, M.W., Smale, S.: Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, San Diego (1974)
6. Gupta, S., Krogh, B.H., Rutenbar, R.A.: Towards formal verification of analog designs. In: Proceedings of 2004 IEEE/ACM International Conference on Computer Aided Design, November 2004, pp. 210–217 (2004)
7. Frehse, G., Krogh, B.H., Rutenbar, R.A.: Verifying analog oscillator circuits using forward/backward abstraction refinement. In: Proceedings of Design Automation and Test Europe, March 2006, pp. 257–262 (2006)
8. Mitchell, I., Greenstreet, M.: Proving Newtonian arbiters correct, almost surely. In: Proceedings of the Third Workshop on Designing Correct Circuits, Båstad, Sweden (September 1996)
9. Greenstreet, M.R., Huang, X.: A smooth dynamical system that counts in binary. In: Proceedings of the 1997 International Symposium on Circuits and Systems, Hong Kong, vol. II, pp. 977–980. IEEE, Los Alamitos (1997)
10. Greenstreet, M.R., Mitchell, I.: Integrating projections. In: Henzinger, T.A., Sastry, S. (eds.) HSCC 1998. LNCS, vol. 1386, pp. 159–174. Springer, Heidelberg (1998)
11. Jones, K.D., Kim, J., Konrad, V.: Some "real world" problems in the analog and mixed-signal domains. In: Proc. Workshop on Designing Correct Circuits (April 2008)
12. Greenstreet, M.R., Yang, S.: Verifying start-up conditions for a ring oscillator. In: Proceedings of the 18th Great Lakes Symposium on VLSI (GLSVLSI 2008), May 2008, pp. 201–206 (2008)
13. Little, S., Myers, C.: Abstract modeling and simulation aided verification of analog/mixed-signal circuits. Presented at the 2008 Workshop on Formal Verification for Analog Circuits (FAC 2008) (July 2008)
14. Tiwari, S.K., Gupta, A., et al.: fSpice: a boolean satisfiability based approach for formally verifying analog circuits. Presented at the 2008 Workshop on Formal Verification for Analog Circuits (FAC 2008) (July 2008)
15. Zaki, M.H., Mitchell, I., Greenstreet, M.R.: Towards a formal analysis of DC equilibria of analog designs. Presented at the 2009 Workshop on Formal Verification for Analog Circuits (FAC 2009) (June 2009)
16. Fränzle, M., Herde, C., Ratschan, S., Schubert, T., Teige, T.: Efficient solving of large non-linear arithmetic constraint systems with complex boolean structure. JSAT Special Issue on Constraint Programming and SAT 1, 209–236 (2007)
17. Saleh, R., Wilton, S., Mirabbasi, S., Hu, A.J., Greenstreet, M., Ivanov, A., Lemieux, G., Pande, P., Grecu, C.: System-on-chip: Reuse and integration. Proceedings of the IEEE 94(6), 1050–1069 (2006)
18. Chaney, T., Molnar, C.: Anomalous behavior of synchronizer and arbiter circuits. IEEE Transactions on Computers C-22(4), 421–422 (1973)
19. Hurtado, M.: Structure and Performance of Asymptotically Bistable Dynamical Systems. PhD thesis, Sever Institute, Washington University, Saint Louis, MO (1975)
20. Marino, L.: General theory of metastable operation. IEEE Transactions on Computers C-30(2), 107–115 (1981)
21. Mendler, M., Stroup, T.: Newtonian arbiters cannot be proven correct. In: Proceedings of the 1992 Workshop on Designing Correct Circuits (January 1992)
22. Kinniment, D., Heron, K., Russell, G.: Measuring deep metastability. In: Proceedings of the Twelfth International Symposium on Asynchronous Circuits and Systems, March 2006, pp. 2–11 (2006)
23. Yang, S., Greenstreet, M.R.: Computing synchronizer failure probabilities. In: Proceedings of the 13th Design, Automation and Test in Europe Conference, April 2007, pp. 1361–1366 (2007)
24. Yang, S., Greenstreet, M.R.: Simulating improbable events. In: Proceedings of the 44th ACM/IEEE Design Automation Conference, June 2007, pp. 154–157 (2007)
25. Jones, I.W., Yang, S., Greenstreet, M.: Synchronizer behavior and analysis. In: Proceedings of the Fifteenth International Symposium on Asynchronous Circuits and Systems, May 2009, pp. 119–126 (2009)
26. Dastidar, T., Chakrabarti, P.: A verification system for transient response of analog circuits using model checking. In: Proceedings of the 18th International Conference on VLSI Design (VLSID 2005), January 2005, pp. 195–200 (2005)
27. Yan, C., Greenstreet, M.R.: Verifying an arbiter circuit. In: Proceedings of the 8th Conference on Formal Methods in Computer Aided Design (FMCAD 2008) (November 2008)
28. Brockett, R.: Smooth dynamical systems which realize arithmetical and logical operations. In: Nijmeijer, H., Schumacher, J.M. (eds.) Three Decades of Mathematical Systems Theory: A Collection of Surveys at the Occasion of the 50th Birthday of J.C. Willems. LNCIS, vol. 135, pp. 19–30. Springer, Heidelberg (1989)
29. Yan, C., Greenstreet, M.R.: Circuit level verification of a high-speed toggle. In: Proceedings of the 7th Conference on Formal Methods in Computer Aided Design (FMCAD 2007) (November 2007)
30. Bowman, K.A., Duvall, S.G., Meindl, J.D.: Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration. IEEE Journal of Solid-State Circuits 37(2), 183–190 (2002)
31. Tschanz, J., Kao, J., et al.: Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage. IEEE Journal of Solid-State Circuits 37(11), 1396–1402 (2002)
32. Chen, T., Naffziger, S.: Comparison of adaptive body bias (ABB) and adaptive supply voltage (ASV) for improving delay and leakage under the presence of process variation. IEEE Transactions on VLSI Systems 11(5), 888–899 (2003)
33. Naffziger, S., Stackhouse, B., et al.: The implementation of a 2-core, multi-threaded Itanium family processor. IEEE Journal of Solid-State Circuits 41(1), 197–209 (2006)
34. Kundert, K.S.: The Designer's Guide to Verilog-AMS. Kluwer, Dordrecht (2004)
3-Valued Abstraction for (Bounded) Model Checking

Orna Grumberg

Computer Science Department, Technion, Haifa, Israel
Abstract. Model checking is the problem of verifying that a given model satisfies a specification, given in a formal specification language. Abstraction is one of the most successful approaches to avoiding the state explosion problem in model checking. It simplifies the model being checked in order to save memory and time. 3-valued abstraction is a strong type of abstraction that can be used for both verification and refutation. For hardware verification, 3-valued abstraction can be obtained by letting state variables and inputs range over the ternary domain {0, 1, X}, where X stands for "unknown". X is used to abstract away parts of the circuit that are irrelevant to the property being checked. For 3-valued abstractions, checking an abstract model may result in 1 or 0, indicating that the checked property holds or fails, respectively, on the original model. Alternatively, model checking may result in X, indicating that it is impossible to determine whether the property holds or fails because the abstraction is too coarse. In the latter case, the abstract model is refined by replacing some of the X's with the relevant parts of the circuit. The 3-valued abstraction and refinement can be applied either automatically or manually.

In this talk we present an automata-theoretic approach to 3-valued abstraction in hardware model checking. We show how our 3-valued framework can be incorporated into SAT-based bounded model checking and induction-based unbounded model checking. Our method enables applying formal verification of LTL formulae to very large industrial designs. We developed our method within Intel's bounded and unbounded model checking framework, implemented on top of a state-of-the-art CNF SAT solver. We used it for checking real-life assertions on a large CPU design, and obtained outstanding results.
This is joint work with Avi Yadgar, Alon Flaisher, and Michael Lifshits.
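As a concrete illustration of the ternary domain (an illustration only, not taken from the talk or from Intel's implementation), the standard Kleene-style semantics of basic gates over {0, 1, X} can be sketched in Python as follows:

X = "X"

def t_not(a):
    return X if a == X else 1 - a

def t_and(a, b):
    if a == 0 or b == 0:
        return 0      # a definite 0 forces the output even against an X
    if a == X or b == X:
        return X
    return 1

def t_or(a, b):
    if a == 1 or b == 1:
        return 1      # a definite 1 forces the output even against an X
    if a == X or b == X:
        return X
    return 0

Note how X propagates only when the definite inputs do not already determine the output; this is what allows a 3-valued check to return 1 or 0 even though parts of the circuit have been abstracted away.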
Local Search in Model Checking

A.W. Roscoe, P.J. Armstrong, and Pragyesh

Oxford University Computing Laboratory
[email protected]

Abstract. We introduce a new strategy for structuring large searches in model checking, called local search, as an alternative to depth-first and breadth-first search. It is designed to optimise the amount of checking that is done relative to communication, where communication can mean either between parallel processors or between fast main memory and backing store, whether virtual memory or disc files. We report on it in the context of the CSP refinement checker FDR.
1 Introduction
Recent years have seen an enormous increase in the power of model checking technology, thanks in part to the more powerful computers that we now have to run it on and in part to improved algorithms such as techniques for SAT checking [2,8], partial order reduction [14] and state-space compression [13]. It is clear, however, that there is still a need for the basic function of searching through the states of a large but finite automaton to test whether each reachable state is satisfactory. The first author has yet to find a problem where FDR, the refinement checker for CSP (using explicit searching combined with the compression of subprocesses), could not prove a property (i.e. the absence of a counter-example) faster than the SAT checking models of [10,16]. Even when finding counter-examples, the option of using FDR in DFS mode is very competitive with these, as shown in [10].

It is well known that the state spaces of parallel systems tend to grow exponentially with the number of parallel components, and also grow rapidly with the sizes of the types used in defining systems. This means that many of the examples one might wish to run on a tool such as FDR either take only a trivial time or are well beyond the bounds of possibility. What we are going to investigate in this paper is the region between these two extremes where, for example, it becomes desirable to split the effort of a particular run across several CPUs, or the number of states needing to be stored exceeds the limits of fast memory on the computer, or both. In any case it is clear that for the foreseeable future there will be demand for tools that can handle as large as possible a system as quickly as possible.

In what follows we will examine how model-checking technology for explicitly searching through a state space has evolved with time, and has been affected by the developments in computing technology over the last two decades. In this
last respect technology has been hugely positive and is alone responsible for the complete transformation of the capabilities of FDR and other methods. However it is also true to say that, relative to the huge increase in processing power these years have brought, the rate at which large amounts of data can be moved around, in particular on and off backing store such as disc, has not developed so rapidly on typical workstations. There has therefore been a gradually increasing need for search algorithms that minimise and optimise the need for such data shifting.

One particular statistic that is immediately visible to the user of a tool like FDR is the change in speed that occurs when it comes to occupy more space than is available to it in fast memory. We will refer to this as the slow-down factor: 1 implying none, 2 meaning that it goes to half speed, and so on. A crude measure of the slow-down factor is the reciprocal of the proportion of CPU time that the search tool gets when operating on backing store. (In the authors' experience it always gets 100% when not fettered in this way.) This paper presents techniques that we believe will greatly improve this factor. It represents work in progress in the sense that only relatively primitive and experimental versions of our methods have been implemented at the time of writing. The relative importance and effectiveness of the heuristics we introduce will only become clear after a good deal more work.

The main concepts of local search were proposed by the first author, and developed by all three authors during a period when the third author was an intern in Oxford during the summer of 2009. We expect that an implementation of it will be released in version 2.91 of FDR towards the end of 2009.

In the next section we review the background to the problem, and see how model checking algorithms have evolved in relation to this slow-down, and how both have been affected by the developments in computing technology. We then introduce the main ideas of local searching, before concentrating on the problem of how to partition state spaces effectively. We then review the results we have obtained to date before considering how local searching will transfer to parallel architectures.

This paper focuses on FDR. While we describe those parts of its behaviour directly relevant to this paper, inevitably we leave out a large amount of related detail. The interested reader can discover much more in, for example, [4,9], [11] (especially Appendices B and C) and [12] (forthcoming 2010).
2 Background
Early model checkers, including the first version of FDR, stored state information in hash tables. To discover whether one of the successor states S' of the current state S you are examining has been seen before, just see if it is in the table. This works well provided the hash table will fit in the main memory of the computer, but extremely badly otherwise, because of the way in which hash tables tend to create truly random access into the memory they consume: a section of backing store that is fetched into main memory (and this happens in large blocks) is unlikely to be much used before being written back.
This problem generated a number of ingenious techniques such as one-bit hashing [5,6,15], in which one does not worry about hash collisions (so the search one performs is not complete). In this, only one bit is allocated per hash value, which simply records whether a state with that value has been seen before. Such techniques were never implemented in FDR1, which saw slow-down factors approaching 100, making it entirely unusable for checks exceeding the bounds of RAM.

The release of FDR2 in 1994 saw the first introduction of a searching technique in which, instead of performing each membership test individually, the tests are grouped together into large batches in such a way that the stored state space can be accessed effectively, with all checks against a particular block of stored state space being performed together. The specific technique used by early versions of FDR, and described in [9], was to store the entire explored state space in a sorted list. The search was then performed in breadth-first search order, so that all the membership tests of each level of the check can be done together as follows:

– Initially, the explored state list E(0) is empty and we need to explore the root state r. The initial ply of the search Ply(0) is therefore {r}.
– Each ply Ply(n) is sorted and combined with the explored states E(n) by merging to create E(n + 1), adding only those members of Ply(n) not in E(n). Ply(n + 1) is created as the set of all successors of these added states.
– The search is over when Ply(n) is empty.

At the same time FDR started to apply compression techniques to these sorted lists, storing them in blocks that were either delta-compressed (i.e. only the bytes that were different between consecutive states were stored, together with a table showing where to put them), or compressed using the gzip utility, which was more expensive in terms of computing, but created compressed files that were perhaps 30% smaller than delta-compression. When this was implemented the typical slow-down was reduced to somewhere between 2 and 4, so this represented a considerable success. In many cases the more aggressive compression regime gave lower execution times for complete checks.

The fact that disc access speed did not keep up with processor speed had the effect, however, of gradually increasing the slow-down factor between 1994 and the early 2000s, so that by this time the typical slow-down associated with this method had increased to perhaps 8–12. This was particularly noticeable towards the end of a refinement check, when relatively small numbers of new states are introduced per ply of the BFS, but the above algorithm was still ploughing through the entire state space each time. To counter that, from FDR 2.64 onwards, the simple sorted list structure was replaced by a B-tree, still a sorted structure, but one in which it is possible to gain rapid access to a particular state and to omit whole blocks of states that are not required in a particular pass. Thus the merge of the algorithm reported above was replaced by repeated insertion into the B-tree representing the already-explored states. This was particularly effective towards the end of searches, and perhaps halved the slow-down factor overall.
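The following minimal Python sketch shows this batched BFS, with the merge done in a single linear pass per ply. The representation is illustrative only (FDR's actual lists are compressed and held on backing store), and successors is an assumed parameter giving the successor states of a state.

def bfs_sorted_merge(root, successors):
    explored = []                    # E(n): sorted list of all states seen
    ply = [root]                     # Ply(n): candidate new states
    while ply:
        ply = sorted(set(ply))       # sort the ply, removing repetitions
        fresh, merged, i, j = [], [], 0, 0
        while i < len(explored) or j < len(ply):
            if j == len(ply) or (i < len(explored) and explored[i] < ply[j]):
                merged.append(explored[i]); i += 1
            elif i < len(explored) and explored[i] == ply[j]:
                j += 1               # already explored: drop it
            else:
                merged.append(ply[j]); fresh.append(ply[j]); j += 1
        explored = merged            # E(n+1)
        ply = [t for s in fresh for t in successors(s)]   # Ply(n+1)
    return explored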
Of course this was not as good as we would have liked, and we are naturally now once again seeing an increase in this factor. It is this issue that the present paper addresses: we are looking for ways to reduce the amount of data that has to be moved to and from memory for each state explored.

Although the memory size of modern workstations is huge compared to those of a few years ago (2–4Gb being typical at the time of writing), it nevertheless seems to be the case that this limit is reached more quickly by FDR than in previous years. This means that, if anything, the problem of a high slow-down factor has become greater.

Believing that the new technology of disc drives built from flash memory would help greatly with the problems identified here, we obtained such a machine in 2008. We were disappointed that, at least with that particular version, the performance was actually marginally worse than with the same machine's conventional disc. This led us to believe that, though this type of technology may well improve in the future, it is very unlikely to solve the slow-down problem sufficiently.

As we will discuss later, many of the same issues also apply to the parallel execution of model checking, where a significant barrier may be the amount of data that has to be communicated between processors.
3 Local Search
As we have seen, BFS permits all of the membership tests in each ply of the search to be combined, which greatly reduces the amount of memory churn that is required. Nevertheless, even when improved by B-tree structures, it still more-or-less requires the entire accumulated state space to pass through the CPU on each ply. This would not matter if the memory bandwidth were sufficiently high that this happened without diminishing performance, but this is not true. It gradually became apparent to the first author that the present BFS strategy required improvement, and so he proposed the following alternative method (sketched in code at the end of this section):

– Begin the search using the same BFS strategy as at present, with a parameter set for how much physical memory FDR is supposed to use. Maintain statistics on the states that are reached, the details of which will be discussed later.
– Monitor the position at the end of each ply of the BFS, and when the memory used by the combination of the accumulated state space X(n) and the unexplored successors S(n + 1) exceeds the defined limit, these two sets are split into a small number (at least two) of blocks by applying some partitioning function.
– This partitioning might either be done as a separate exercise, or be combined with the next ply of the search.
– From here on the search always consists of a number of these blocks, and a partitioning function Π that allocates states between them. Each such block B(i) will have an explored state space EB(i) and a set of unexplored states UB(i), which may intersect EB(i).
– At any time FDR pursues the search on a single block B(i) only, as a BFS, but the successors generated are partitioned using Π. The successors belonging to B(i) are pursued in this BFS. Those belonging to other blocks are added to UB(j) in a way that does not require operations on the existing body of UB(j) that would bring it into memory.
– The search on B(i) might terminate because after some ply UB(i) becomes empty. In this case begin or resume the search on some other block with a nonempty UB(j): if there is none, the search is complete and in FDR the refinement check gives a positive result. If there is another block to resume, we must realise that B(i) may not be finished for all time, since other B(j) might create a new UB(i) for the block that has just terminated.
– On the other hand the size of the search of B(i) might grow beyond our limit. In this case Π is refined so that B(i) is split as above. After that a block, which may or may not be one of the results of this last split, is resumed.
– On resuming a block B(i), the set UB(i) might have grown very large thanks to input from other blocks, and furthermore it is possible that there might be much repetition. There is therefore a strong argument for (repetition-removing) sorting of UB(i) before assessing whether to split B(i) at this stage.

It is clear that this search will find exactly as many states and successor states as either BFS or DFS: we explore the successors of every state exactly once. The objective of this approach is to minimise the amount of transfer in and out of main memory during the search. The basic rationale is that it makes sense to explore the successors of some states without always backgrounding them before doing so. In other words:

– Like DFS, tend to explore states that are already in the foreground.
– Seek to reduce the amount of memory churning performed by BFS.

There are three major parameters to this search method:

– How many pieces to divide blocks into? There are clear arguments for making this depend on the evolution of the search. If it is growing quickly we would expect a larger number of pieces to be better.
– Where there is a choice of blocks to resume, which one should one pick? One could, for example, pick the largest one, follow a depth-first strategy (where blocks are arranged into a stack, with new blocks being pushed onto the stack, and the top of the stack being chosen for resumption), a breadth-first strategy (where they are organised into a queue), or a hybrid in which, when a split occurs, we pursue one of the resulting blocks but push the rest to the back of a queue. Suppose, for example, an initial split generates blocks A and B, pursuing A generates AA and AB, and that pursuing AA at that point means that it terminates before splitting. Then the three strategies would initially follow A, AA, AB; A, B; and A, AA, B.
– Perhaps most importantly, what algorithm should we use for partitioning the state space? There are two desiderata here: firstly, that the parts we divide
the state space into are the sizes we want. While we might well want them to be equal in size, this is not inevitable. The second objective is that as large a proportion as possible of the transitions from a state in each block should be to the same block: that way there will be relatively little state space to transfer to other blocks, and the search of a generated block is unlikely to peter out quickly. We will term this property, that transitions tend to be to "nearby" states, locality, and measure it as a percentage: the percentage of computed transitions that lie inside a single block. It follows that we should hope to do better than we would with a randomised partition. How successful we are with this second objective might well influence the other choices that have to be made.

Whilst we hope that this approach will give us advantages thanks to better memory management, we also need to be aware of the extra work it introduces:

– Based on the above, we would expect each state to have partitioning functions applied to it as often as the blocks it happens to be in are split.
– Similarly we would expect there to be a cost in re-organising the data structures used to store states each time a block is split.
– There will also be costs in devising partitioning functions and collecting and analysing statistics to help in this.
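The following is a minimal Python sketch of the block-based loop described above. The splitting of oversized blocks and the choice of resumption order are omitted, and partition (mapping a state to a block identifier) and successors are assumed parameters; FDR's real blocks are, of course, disk-backed rather than in-memory sets.

from collections import deque

def local_search(root, successors, partition):
    EB = {}                                # block id -> explored states EB(i)
    UB = {partition(root): {root}}         # block id -> unexplored states UB(i)
    pending = deque(UB)                    # blocks awaiting (re)exploration
    queued = set(UB)
    while pending:
        i = pending.popleft()
        queued.discard(i)
        explored = EB.setdefault(i, set())
        while UB.get(i):                   # a BFS confined to block i
            ply = UB.pop(i) - explored     # remove repetitions first
            explored |= ply
            for s in ply:
                for t in successors(s):
                    j = partition(t)
                    if j == i:             # local successor: pursue it here
                        if t not in explored:
                            UB.setdefault(i, set()).add(t)
                    else:                  # park t for block j; resume later
                        UB.setdefault(j, set()).add(t)
                        if j not in queued:
                            pending.append(j)
                            queued.add(j)
    return EB

Here the deque gives the breadth-first resumption strategy discussed above; replacing it by a stack gives the depth-first one.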
4 Partitioning the State Space
We will assume in what follows that the overall state space is stored in a sorted structure in the general style of FDR as described earlier. We do not, however, discount the possibility that the same ideas might work in conjunction with hash tables.

Given this choice, there is an obvious way of partitioning a state space quickly: to break it into k pieces choose k − 1 states as pivots, with the pieces being the k − 2 sections between consecutive pairs of pivots, and the other two being those less than, and greater than, all pivots. We might conventionally include the pivots themselves in the section immediately above them. The most obvious way of picking these pivots is to spread them evenly either in the relevant UB(i) or EB(i) or both. The most obvious advantage of this approach is that the partitioning and restructuring work essentially disappears. A clear disadvantage is that there is no good reason to suspect that this partitioning approach will give good locality properties: for this to happen we would need to design an order where most transitions are to states that are close (in the order) to their origins. We will show later how this can be achieved.

If we are to look for ways of improving the locality of a partition we must understand how one's tool, in our case FDR, represents states and generates transitions. Since FDR is checking the refinement of a "specification" process Spec by an "implementation" process Impl, the things it searches through are not single states but actually "state-pairs": the pairs (ss, is) where ss is a state of the
specification and is is a state of the implementation that are reachable on some trace t. ss is an integer index into an explicit representation of the normal form of Spec. An explicit state machine is an array of states, each of which carries a list of transitions, with each transition being the label of the action and the index of the target state. In some cases the states might have additional labelling. A normal form state machine is one in which (i) there are no τ (invisible) actions and (ii) no state has more than one action with any given label. Thus, given the root state of a normal form machine and a trace t, if the machine can perform t then its state after t is completely determined. If the refinement check is over any model other than T, the traces model, then the normal form state will be labelled with information representing, for example, divergence and refusal-set information.

In most cases FDR computes the normal form in advance, but there is also the option to use the function lazynorm(Spec), which means that only those normal form states relevant to traces of Impl are explored. The motivation for lazynorm is that sometimes the normalisation of Spec is a time-consuming activity, and lazynorm means that only the necessary parts of the normalisation are done. This is potentially an important distinction for our partitioning activities, since preliminary analysis can be carried out on a complete Spec, but less on one that has yet to be evaluated.

In a large proportion of FDR checks, the specification is extremely simple, frequently having just one or two states. For example, the specifications for "deadlock-free" and "the event a never occurs" each have just one state. There is clearly very little potential for partitioning the space of state pairs based on the specification state in cases like this. (It is possible in these cases artificially to increase the number of states in the specification, and indeed in our trial implementations we have sometimes done this. There are various good reasons, such as loss of compression and the difficulty of automating this process, that make this option very unattractive as a long-term solution.)

The situation with the implementation state is is analogous to that with lazy normalisation: at the start of the check we have no idea which states will be reached or what their transitions will be. They are both examples of implicit state machines: a representation of the root state together with a set of recipes for computing the actions and resulting target states from any given state.

To understand the structure of an implementation state we need to understand the two-level treatment of CSP's operational semantics in FDR. The CSPM input language of FDR allows the user to lay out a network using functional programming. Broadly speaking FDR will reduce (in functional programming terms) the description of a network until it either hits a true recursion or gets below the level of all "high-level" operators, the most important of which are parallel, hiding and renaming. The syntax it encounters below that dividing line is compiled into explicit state machines using a symbolic representation of the operational semantics: these must be finite state and will be called components. Above that level it devises one or more recipes for combining together the states and actions
of the components into states and actions of the whole implementation. We call these recipes supercombinators: see below.

Components do not only arise from compiling low-level syntax. They can also be generated by one of the various operators in CSP that compresses the state space of a process, typically because it has been applied to the parallel/hiding combination of a proper subset of what would otherwise have been the components of the complete system. Components can vary in size from a single state to hundreds of thousands or even millions.

Each of the recipes for combining components is called a format. In the majority of practical checks, there is just a single format that is effectively a parallel composition of the N (say) components, with perhaps some hiding and renaming mixed in. Where there is more than one format, it is because one or more CSP operators that are usually low level get pushed up to high level by having some parallel operator beneath them in the reduced syntax tree, as in

(P ∥ Q); R    and    (a → (P ∥ Q)) □ (b → R)
In both of these there will be a format that consists of a state of P and one of Q, and a format that consists of a state of R. In the right-hand process there is also a format representing the initial state, which has no components. FDR already stores the different formats separately, but where there is more than one they will frequently have very different numbers of states. So while it may well make sense to incorporate formats into a partitioning strategy, they are unlikely ever to be an important part of it. For the rest of this section, therefore, we will concentrate on partitioning an implementation process that has only a single format.

FDR calculates the actions of such a state in one of two ways. Firstly, any τ action of one of the components is promoted to be an action of the complete system, in which only the component that performs the τ changes state. Secondly, FDR produces a number of supercombinator rules. Each of these can be described by a partial function φ from the component indices to Σ, the visible event labels, together with a result action, which may either be in Σ or τ. This rule can fire just when, for each i ∈ dom(φ), component i can perform the event φ(i). Just the components that are in dom(φ) change state. Thus each component process simply moves around its own state space as the complete implementation progresses.

One of the best prospects for partitioning the complete state space is to partition one of the components, and simply assign each state or state pair to a group determined by what state that component is in. This can clearly be generalised to look at the states of a small collection of the components, particularly in the case where the individual components each have very few states.

The mapping that sends each overall state to the state of the jth component may be far from uniform: some states may be represented many more times than others. It follows that what seems like a well-balanced partition of this component may not yield a well-balanced partition of the complete space. It is
not at all unusual, for example, for a component process to have one or more error states that we hope will never be reached in the entire system. It therefore seems unlikely that there will be good automatic ways of deciding what component-based partition will be used in advance of running the search. We will, however, later show how to create a range of options for this in advance.

In our single-format case, where the format has N components, we can think of each state pair as an (N + 1)-tuple of states, the extra one being the specification. There is no reason why partitions cannot be based on the specification state just as for a component of the implementation, but nor is there any reason to expect specification state distributions to be any more uniform.

We offer two different strategies for partitioning the state space, which we term pre-computed and dynamic.
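To illustrate the supercombinator firing rule described above, here is a Python sketch under assumed representations: a global state is a tuple of local states, delta[i] maps a local state of component i to a dictionary from event label to the list of local target states, and a rule is a pair (phi, result) with phi a dictionary from component index to event label.

from itertools import product

def rule_successors(gamma, rule, delta):
    """Yield the (successor state, action) pairs one rule produces (sketch)."""
    phi, result = rule
    targets = {}
    for i, x in phi.items():
        # The rule can fire only if every component i in dom(phi) can
        # perform the event phi(i) from its current local state.
        local = delta[i].get(gamma[i], {}).get(x, [])
        if not local:
            return
        targets[i] = local
    # Exactly the components in dom(phi) change state; every combination
    # of their local successors gives a global successor.
    idxs = sorted(targets)
    for choice in product(*(targets[i] for i in idxs)):
        succ = list(gamma)
        for i, t in zip(idxs, choice):
            succ[i] = t
        yield tuple(succ), result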
4.1 Pre-computed Partitioning
In pre-computed partitioning, we decide on some partitioning options in advance, based on the state spaces of one or more of the N + 1 processes that represent a state pair. We need an algorithm which will take a state machine and divide it into two or more state machines that are relatively self-contained, in the sense that as few transitions as possible pass between these parts.

This algorithm will come in two parts. The first part will assign weights to the various nodes and transitions of the component state machine. These will assess how likely it is that the machine will be in a given state or take a given transition. The second part will be to choose partition(s) of the state machine that attempt to deliver roughly equal node weight in each part and minimal edge weight between the parts, so as to improve the locality of the search.

Each of these two algorithms will inevitably only give approximations to optimal results. In the first case this is because we cannot predict the weights that apply to a given search without running the search, and in the second case because, even if we could formulate the correct balance between the two criteria, it would almost certainly be NP-hard to optimise them. Since the size of the components we will want to partition will vary from two states to millions, we will have to choose extremely efficient algorithms, place an upper bound on the size of component we will apply our algorithms to, or have different algorithms for different sizes of component. This is not the place to go into great detail about these algorithms, but we give a few ideas below.

Weighting algorithms. In the absence of evidence about how a system behaves in practice, we can make intelligent guesses about how often particular transitions and nodes are used. The state explosion problem arises because potentially, and frequently, the state spaces of the component processes multiply to give the state space of the entire system. Though in many cases this is not literally true, it is hard to think of a general way, other than running the search, to identify statistical information about which of these potential states are reachable and which are not. What we
will therefore do is to build up a stochastic model based on the assumption that all combinations of component states are reachable. We exclude the specification from the computation of this model since it participates in every visible action.

What we will estimate is the probability, for each state γ of each component i of the system, that if the ith component of a global state Γ is γ, and Γ's other components are random, a randomly chosen action of Γ leads to component i being in each of γ and the states immediately reachable from it. The sum of the probabilities of those states that are in the same partition as γ provides a measure of the locality of the actions starting from a randomly chosen Γ, conditional on this ith component. We will call γ an i-state. We note that the component might stay in γ either because it took an action that did not change the state, or because the global action did not involve component i.

For each labelled action x that each component j can make, we can compute Ej(x), the expected number of times j can perform x from a random state: this is simply the number of x actions from its states divided by the number of states j has. Note that some of the states might have more than one x, so this expectation can be greater than 1. We can therefore calculate, for each supercombinator ρ = (φ, x), the expected number of times E(ρ) it can fire in a random state, namely the product of Ej(φ(j)) as j ranges over dom(φ). We can similarly calculate Ei(ρ | γ), the expected number of times that ρ can fire given that the i-state is γ:

– If i ∉ dom(φ), it is E(ρ).
– If i ∈ dom(φ), it is the product of the number of φ(i) actions γ has and all the Ej(φ(j)) as j ranges over dom(φ) − {i}.

The total number of actions EAi(γ) that we can expect a random state Γ whose i-state is some fixed γ to perform is

ct(τ, γ) + Σ{Ej(τ) | j ∈ {1..N} − {i}} + Σ{Ei(ρ | γ) | ρ ∈ SC}

where ct(x, γ) is the number of different x actions that the state γ can perform. The expected number of times E1i(x | γ) that component i has a single one of its x actions enabled is then:

– 1 if x = τ.
– If x ≠ τ, it is the sum, over all supercombinators (φ, y) with φ(i) = x, of the product of Ej(φ(j)) as j ranges over dom(φ) − {i}.

We can now say that the probability of a specific action (x, γ′) of γ firing from a Γ whose other components are randomly chosen is E1i(x | γ)/EAi(γ), namely the number of times we expect it to fire as an action of such a Γ divided by the total number of actions of Γ. If EAi(γ) = 0 then it is also 0. The probability PTi(γ, γ′) of γ making a transition to γ′ ≠ γ is therefore the sum of this value over all labels x such that γ −x→ γ′. The probability that the state is unchanged is

PTi(γ, γ) = 1 − Σγ′≠γ PTi(γ, γ′)
which takes into account the possibility of some x with γ −x→ γ firing, and of component i not being involved in the transition.

The above model depends crucially on our simplifying assumption that all states of the Cartesian product of the component state spaces are reachable. Under that assumption we know that every state of a given component i is reached equally often, so there is no point in trying to assign weights to these states representing how often each is represented in the complete system. There may, however, be reason to assign weights to them which include the number of times we expect a state to appear as a successor in the search, since that will affect the memory consumption of each block. This is because some component states may be represented more often than others in successors, even though they all appear equally often in the final state space. We do not propose a specific method at this time.
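The model above can be sketched directly in code. The representations here are assumptions for illustration only: trans[j] maps each state of component j to its list of (label, target) pairs, rules is a list of supercombinators (phi, result) with phi a dictionary from component index to visible event, and TAU stands for the invisible action.

from math import prod

TAU = "tau"

def expected_counts(trans):
    """E_j(x): expected number of x actions from a random state of j."""
    E = {}
    for j, tj in trans.items():
        counts = {}
        for edges in tj.values():
            for x, _ in edges:
                counts[x] = counts.get(x, 0) + 1
        E[j] = {x: c / len(tj) for x, c in counts.items()}
    return E

def PT(i, gamma, gamma2, trans, rules, E):
    """PT_i(gamma, gamma2) for gamma2 != gamma (sketch)."""
    def ct(x, g):                       # number of x actions from state g
        return sum(1 for y, _ in trans[i][g] if y == x)
    def E1(x):                          # E1_i(x | gamma)
        if x == TAU:
            return 1.0
        return sum(prod(E[j].get(phi[j], 0.0) for j in phi if j != i)
                   for phi, _ in rules if phi.get(i) == x)
    def Ei(phi):                        # E_i(rho | gamma)
        if i not in phi:
            return prod(E[j].get(phi[j], 0.0) for j in phi)
        return ct(phi[i], gamma) * prod(E[j].get(phi[j], 0.0)
                                        for j in phi if j != i)
    EA = (ct(TAU, gamma)                # EA_i(gamma)
          + sum(E[j].get(TAU, 0.0) for j in trans if j != i)
          + sum(Ei(phi) for phi, _ in rules))
    if EA == 0:
        return 0.0
    # Sum the firing probability E1_i(x | gamma)/EA_i(gamma) over all
    # labels x with a transition from gamma to gamma2.
    return sum(E1(x) / EA for x, t in trans[i][gamma] if t == gamma2)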
4.2 Example
In Chapter 15 of [11], the first case study given is peg solitaire; see Figure 1. We have added the letters A–G to show typical examples of the seven symmetry classes of slot under rotation and reflection.

[Fig. 1. Solitaire board with representatives of symmetry classes A–G]

Games like this are useful benchmarks for FDR because they demonstrate the complexity of the problems
being solved, because they usually give a counter-example (which is much more appealing than demonstrating an example where refinement holds) and because in FDR's standard breadth-first search the counter-example is found (as in solitaire) at the very end of the search, meaning that the search profile is almost identical to a check that succeeds.

The model consists of 33 two-state processes, one representing each peg. A solitaire move is in one of the directions up, down, left and right, and hops a peg over another peg into an empty slot. The complete alphabet has 19 moves in each of the four directions, plus a special event done that all the processes synchronise on when they reach their target state. The diagram shows the initial configuration, and the target is the opposite one: a single peg in the middle. None of the processes can perform a τ.

The nine central slots (in classes E, F and G) can each perform 12 different moves: they can be hopped out of, over, or into in any of the four directions. The twelve slots adjacent to them (classes C and D) can each perform 6 different move events, and the twelve slots around the edges (A and B) can each perform only 4.

Since there is one supercombinator for each event – the one that demands that each of the 3 or 33 processes that can perform the event do so – and since only one of the two states of each process can perform it, our probability model calculates the probability of each move event being enabled as 2^−3 and the probability that done is enabled as 2^−33. In other words Ei(x) = 0.5 for every event x and slot i. The model predicts that the probability of any component process being involved in an arbitrary transition is close to the number of moves it is involved in, divided by the total number of moves (76).

This model, of course, is based on the assumption that all 2^33 states are reachable, whereas in fact 187,636,299 are (about 2.2%). (As discussed in [1], the 2^33 states break down naturally into 16 equal-size equivalence classes such that every reachable state is in the same class as the starting one. Thus the 2.2% reachability actually represents about 35% of the equivalence class of the starting state.) By symmetry of the puzzle, it follows immediately from the fact that not all reachable configurations can be carried forward to a solution that more of the states will be encountered beyond the half-way point (i.e. the 16th move of 31) than before it. This means that Ei(x) is, on average, less than 0.5, since each move requires two pegs present and one absent. Thus it is not surprising that the actual number of successor states found (1,487M) is less than the 1,781M that the model's (76/8) × 187M suggests. While the model suggests that all 76 moves will be enabled roughly equally often, in fact they vary from 12.4M times to 24.1M.

Both the model and the statistics from the check itself suggest that the outer pegs (classes A and B) are the best ones to partition on, since they change state in the smallest proportion of the transitions.

Partitioning a weighted graph. Now suppose that we have computed, for each component i, a node weight for each state γ, representing the proportion by which we expect to encounter Γs with this γ during the search, and an edge
weight on each pair (γ, γ′) such that γ′ ≠ γ and γ′ is reachable in one step from γ. The edge weight represents the probability that, given that the ith component of Γ is γ, a successor Γ′ will have γ′. In the previous section we assumed that all node weights are equal, and that the edge weights are PTi(γ, γ′). We will suggest some more possibilities later. In a case where node weights are not equal, the edge weights from a state γ should be an estimate of the product of w(γ) and the probabilities of the various actions given that we are in γ.

In partitioning component i into roughly equal-weight but localised parts (localised meaning that most of its transitions lie within the part), there is a tension between these two objectives. In general, of course, we will want to divide a component machine into more than two pieces, or to be able to subdivide a machine that has already been partitioned. Some possibilities are as follows.

– We could deploy algorithms for finding a minimum cut between two regions of the state machine that have been identified and have non-trivial node weight. An example of this (and the one we have deployed in our preliminary implementation) is the Kernighan–Lin algorithm [7].
– We could successively remove inter-node edges in reverse order of weight until the machine becomes disconnected, again with restrictions to ensure the pieces do not become too large.
– Starting from a number of well-spaced points in the graph of nodes, build up components Ci. Successively add a node to the lightest Ci: the node for which the sum of edge weights between Ci and it (both ways) is maximised.

Where we have partitioned a component into more than two parts C1, . . . , Cn, whatever algorithm we use for the above task should also generate a linear order on the Ci, with a view to each of the unions C_r ∪ · · · ∪ C_s of consecutive parts also making sense in terms of locality. One way of achieving this is to join together the two Ci with the largest (sum of edge weights) connection between them, and then successively add on another whose connection at either end makes most sense.

To follow the pre-computed partitioning algorithm, we then arrange the components into order by the quality of the partitioning, judged in some way. Suppose the jth part (in its linear order) of the ith component process (ordered in decreasing order of quality) is Ci,j. We then define the partitioning function Π, when applied to a state Γ, to be a tuple whose ith component is j, where the ith component γ(i) of Γ lies in Ci,j. The range of Π is naturally linearly ordered by lexicographic order (i.e., the first component takes precedence, then the second, and so on).

Recalling that FDR stores the accumulated state space in a sorted structure, we can make Π(Γ) the primary sorting key for the states Γ. Then to partition the state space all we have to do is find the point(s) in the range of Π which are closest to giving the desired sizes of the sections, or (probably better) find points determined by as few components γ(i) as possible that are within some tolerance of the desired division points.
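A sketch of the resulting partitioning function Π, assuming each component i is supplied with the ordered list of its parts Ci,1, . . . (represented here as sets of local states), the components having already been sorted in decreasing order of partition quality:

def make_pi(parts):
    # parts[i] is the ordered list of parts of component i; precompute,
    # for each component, a map from local state to the index of its part.
    index = [{gamma: j for j, C in enumerate(ps) for gamma in C}
             for ps in parts]
    def pi(state):              # state is an (N+1)-tuple of local states
        return tuple(index[i][gamma] for i, gamma in enumerate(state))
    return pi

Since Python tuples compare lexicographically, sorting states with pi(state) as the primary key keeps each potential block contiguous, which is exactly what makes the splits cheap.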
The effect of this technique is to pre-compute many potential ways to partition the state space, and then structure the search so that it is trivial to split either B-trees or sorted lists of states.

Only experience can tell us how effective the probability-based weight calculations set out above are. It is reasonable to expect that we will find methods that are effective for searches that are in some sense "typical". It is also very likely that it will be possible to construct pathological examples for which they work badly. There are two alternatives to this. One is to use Monte Carlo methods to conduct a partial exploration of the state space in advance, simply to be able to give estimates for the node and edge weights of the components. Using this it will still be possible to use a partitioning function Π that stays constant during a search. The other alternative is to use a dynamic partitioning method.
4.3 Dynamic Partitioning
We have seen that there is a significant data-structuring advantage in using a pre-computed, linearly ordered partitioning function. There are also the disadvantages that we have to work to give predictions about the structure of the check, and that we are committed to using the N + 1 component processes in a fixed order regarding partitioning.

The alternative is to gather information about the node and edge weights of the various components as the check progresses: we can actually count how many times each of the states of these processes is represented in the accumulated state space, and how often each of their transitions is possible in these states. This is very easy and relatively cheap to collect during a search, and we can then use whatever part of it we wish when a particular block of states has to be partitioned. A possible approach to this is, for each block that is searched, to collect information about those states in it that arose since that block either began as the initial one with the root state, or was last split. We should, however, notice that the statistics from the part of the check already done may not be a good predictor of the future. For example, in the solitaire example, we would expect that the probability that any given slot is empty will increase during a run.

It is likely that in many cases it will be possible to choose a better-performing partition of such a block, probably based on only one component, than using the fixed function. The algorithms for choosing the best ways to partition a given component process will, however, follow the same lines as those above.

Whether using dynamic or pre-computed partitioning, we always have the option of deciding how many pieces to break a block into at the point when the split is made. We note that it may be wise to estimate the eventual size of the block to be partitioned in order to guide this decision.
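A sketch of the bookkeeping this requires, again assuming (for illustration) that states are tuples of local component states:

from collections import Counter

class ComponentStats:
    """Count, per component, how often each local state occurs in the
    accumulated states and how often each local transition is taken."""
    def __init__(self, n_components):
        self.node = [Counter() for _ in range(n_components)]
        self.edge = [Counter() for _ in range(n_components)]

    def record(self, state, succs):
        # Call once per explored state, with its computed successors.
        for i, gamma in enumerate(state):
            self.node[i][gamma] += 1
            for s in succs:
                if s[i] != gamma:
                    self.edge[i][(gamma, s[i])] += 1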
5 Preliminary Practical Results
There are two main options as to how to use FDR in respect of memory management. The default mode is to have it run as a simple UNIX process, storing everything as part of process state. Typical Linux implementations limit the size of a process to 2Gb or 4Gb. A modern processor typically fills this up in less than an hour when running FDR, in many cases without needing backing store at all. The other option is via FDRPAGEDIRS and FDRPAGESIZE, environment variables which direct the tool to store blocks of states, whose size is determined by the latter, in directories specified by the former. These directories naturally represent backing store, though of course in many cases they are buffered in main memory.

Our main experiments have been based on two classes of example: variants of the peg solitaire puzzle discussed earlier, and the distributed database example set out in Chapter 15 of [11]. Our implementations to date use a partitioning algorithm based on the specification process only, without edge or node weighting calculations of the sort set out earlier. Nevertheless they have shown reasonable locality on the above examples: 93% (of states leading to the same partition) in the database example, and 84% for solitaire.

They have shown considerable promise in speeding up large checks, in particular in the database example where we found better locality, but are still considerably sub-optimal in the way they handle the sets of states that are passed around between blocks. For this reason we have decided not to publish a performance table in this version of this paper, but will instead include one in a later version to appear on the web.
6 Parallel Implementation
A parallel version of FDR was reported in [3], the modus operandi of which is to allocate states between processors using a randomising hash function whose range is the same size as the number of processors, and which implements the usual BFS by having the processors “trade” successors at the end of each ply. Though its performance was excellent in terms of speed up (linear in most cases), this parallel version has never been included in the general FDR release because its usability would be restricted to just one of the many parallel architectures in existence. It is clear, however, that a parallel version for at least multi-core architectures is now required. The concepts of local search clearly transfer extremely well to parallel execution, and offer the prospects of reducing the amount of communication between processors and reducing the amount of time that one process will spend waiting for another. One possible implementation would be to use the idea of processor farming: keep a set of search blocks in a queue and, each time a processor becomes free, give it the next member of this queue.
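A minimal sketch of the processor-farming idea (ours; explore_block stands in for the local search within one block and is assumed to return any newly created blocks destined for later processing):

```python
# Processor farm: each time a worker becomes free, hand it the next
# outstanding search block; finished blocks may enqueue further blocks.
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

def explore_block(block):
    # ... run the local search within this block; return follow-on blocks ...
    return []

def farm(initial_blocks, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        running = {pool.submit(explore_block, b) for b in initial_blocks}
        while running:
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for finished in done:
                for new_block in finished.result():
                    running.add(pool.submit(explore_block, new_block))
```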
The effectiveness of local search versus BFS, and the most efficient way to use it, is likely to differ between parallel architectures. Let us examine the difference between a cluster of processors with independent memory and discs, and multi-core architectures that share these things. It is, of course, likely that we will have two-level architectures consisting of clusters of multi-core processors.
– In a shared-resource multi-core architecture we should have very fast inter-core communication. On the other hand the imbalance (from the point of view of model checking) between processor power and fast memory (i.e. the problem that local search is designed to alleviate) will simply be multiplied when running a check on more than one core, and disc bandwidth-per-core will be divided. It may therefore prove a good idea to concentrate on high locality between the set of blocks currently being explored on all the cores, and the set of blocks presently stored on disc. There might also be an argument for performing different aspects of the search on different cores, though that might be harder to balance and scale.
– The behaviour of a cluster of processors is likely to be governed by the relative rates at which data (transfers of states) is passed between processors, and the speeds at which the processors can (a) transfer data in and out of their own disc store and (b) process the results.
Our initial aim will be to release a multi-core parallel implementation, hopefully by the end of 2009.
7 Conclusions
In this paper we have analysed the issue of keeping the usage of slow forms of memory down to a level where it does not force processing to be delayed. We have seen that this challenge has gradually increased as processing capability has grown faster, and that it will increase further with the number of processor cores. We have developed the local search strategy in an attempt to solve this problem; it is crucially dependent on our ability to partition the state space into pieces whose transitions are primarily local, namely to other members of the same piece of the partition. We have shown that by analysing the transition patterns of the component processes that are combined by FDR to generate the transitions of a particular refinement check, there is a good prospect that we can choose good partitioning algorithms. Since local search is not BFS, it removes the guarantee that the first counterexample it finds will be a shortest one. There is, however, still a choice akin to the distinction between DFS and BFS that we have to make, namely the order in which we process the outstanding blocks of our search. We discussed three possibilities for this earlier. The next stage of our work will be to implement these possibilities and a variety of partitioning algorithms in FDR and experiment with their effectiveness. Our objective is to provide a limited range of options so that the user does not have to understand all of this technology to use it.
Acknowledgements. We are grateful for discussions with Michael Goldsmith. The computing support staff at Oxford University Computing Laboratory were very helpful in providing the machines with a small amount of memory that we requested. This work has benefited from funding from the US Office of Naval Research and EPSRC.
References
1. Beasley, J.D.: The ins and outs of peg solitaire. Oxford University Press, Oxford (1985)
2. Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Heidelberg (2004)
3. Goldsmith, M.H., Martin, J.M.R.: The parallelisation of FDR. In: Proc. Workshop on Parallel and Distributed Model Checking (2002)
4. Goldsmith, M.H., et al.: Failures-Divergences Refinement (FDR) manual (1991–2009)
5. Holzmann, G.: An improved reachability analysis technique. Software P&E 18(2), 137–161 (1988)
6. Holzmann, G.: Design and validation of computer protocols. Prentice Hall, Englewood Cliffs (1991)
7. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell System Tech. Journal 49, 291–307 (1970)
8. McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1–13. Springer, Heidelberg (2003)
9. Roscoe, A.W.: Model checking CSP. In: A classical mind: essays in honour of C.A.R. Hoare. Prentice Hall, Englewood Cliffs (1994)
10. Palikareva, H., Ouaknine, J., Roscoe, A.W.: Faster FDR counter-example generation using SAT solving. To appear in proceedings of AVoCS 2009 (2009)
11. Roscoe, A.W.: The theory and practice of concurrency. Prentice-Hall, Englewood Cliffs (1997)
12. Roscoe, A.W.: Understanding concurrent systems. Springer, Heidelberg (forthcoming, 2010)
13. Roscoe, A.W., Gardiner, P.H.B., Goldsmith, M.H., Hulance, J.R., Jackson, D.M., Scattergood, J.B.: Hierarchical compression for model-checking CSP or how to check 10^20 dining philosophers for deadlock. In: Brinksma, E., Steffen, B., Cleaveland, W.R., Larsen, K.G., Margaria, T. (eds.) TACAS 1995. LNCS, vol. 1019. Springer, Heidelberg (1995)
14. Valmari, A.: Stubborn sets for reduced state space generation. In: Proceedings of the 10th International Conference on Theory and Applications of Petri Nets (1989)
15. Wolper, P., Leroy, D.: Reliable hashing without collision detection. In: Courcoubetis, C. (ed.) CAV 1993. LNCS, vol. 697. Springer, Heidelberg (1993)
16. Sun, J., Liu, Y., Dong, J.S.: Bounded model checking of compositional processes. In: Proc. 2nd IEEE International Symposium on Theoretical Aspects of Software Engineering, pp. 23–30. IEEE, Los Alamitos (2008)
Exploring the Scope for Partial Order Reduction

Jaco Geldenhuys1, Henri Hansen2, and Antti Valmari2

1 Computer Science Division, Department of Mathematical Sciences, Stellenbosch University, Private Bag X1, 7602 Matieland, South Africa
[email protected]
2 Department of Software Systems, Tampere University of Technology, PO Box 553, FI-33101 Tampere, Finland
{henri.hansen,antti.valmari}@tut.fi
Abstract. Partial order reduction methods combat state explosion by exploring only a part of the full state space. In each state a subset of enabled transitions is selected using well-established criteria. Typically such criteria are based on an upper approximation of dependencies between transitions. An additional heuristic is needed to ensure that currently disabled transitions stay disabled in the discarded execution paths. Usually rather coarse approximations and heuristics have been used, together with fast, simple algorithms that do not fully exploit the information available. More powerful approximations, heuristics, and algorithms had been suggested early on, but little is known whether their use pays off. We approach this question, not by trying alternative methods, but by investigating how much room the popular methods leave for better reduction. We do this via a series of experiments that mimic the ultimate reduction obtainable under certain conditions.
1 Introduction
Partial order reduction is a widely-used and particularly effective approach to combat the state explosion problem. Broadly speaking, partial order reduction rules out a part of the state space as unnecessary for verifying a given property. This is achieved by exploiting the commutativity of transitions that arises from their concurrent execution or other reasons. The term “partial order reduction” is somewhat inaccurate, but it is used for historical reasons. There are several approaches to partial order reduction. For the purpose of this paper, we consider methods that for each state expand some subset of enabled transitions. Such methods are highly similar; the sets of transitions that are expanded are called either stubborn sets [12], ample sets [11], or persistent sets [4]. The methods differ slightly in the way they are defined, and each method has a number of different formulations. Nonetheless, the key ideas in all of them are more or less equivalent [13]. In this paper we use the ample set method as presented in [1] as the starting point of our investigation. It is easy to implement, and its primitive operations are fast but, in return, it wastes some reduction power. We investigate experimentally how much potential there is for better reduction. We are interested in
the ultimate results that would be obtainable by an ideal method that is based on partial order principles. We restrict our attention to reductions that preserve deadlocks. More sophisticated verification questions use additional rules such as visibility of transitions and various cycle closing conditions. We postpone them for future work, because they introduce more complications than can be discussed in this paper. The concept of dependency between transitions refers to situations where two transitions may interfere with each other. Dependency is central in the calculation of ample sets, and some approximation that overapproximates dependency is used. We explore how much additional reduction in the resulting state space is to be gained from using a more accurate dependency relation. Analysis of dependency could be taken further by engaging in dynamic analysis as in [3,10], where the notion of dependency is refined during the generation of state spaces. There is even more variability in the treatment of disabled transitions that are dependent on transitions in the ample set. The correctness of the methods requires that they remain disabled until a transition in the ample set occurs. This can be ensured using different heuristics, some more complicated and presumably also more powerful than others. In this paper we use a rather straightforward precedence heuristic and leave the allegedly stronger ones for future work. In addition to the issues above, there is the question of calculating ample sets. The basic algorithm considers only subsets that are local to a single process. If no suitable process is found, it gives up and returns the set of all enabled transitions. However, a more sophisticated algorithm can often do better in this situation. How much further reduction is gained from using the more sophisticated algorithm is also a matter of investigation here. Section 2 defines the formal model of concurrent systems that is used throughout the paper. Section 3 describes exactly how ample sets and the transition dependency and precedence we employ are calculated. In Section 4 we compare experimental results from using the different algorithms for ample sets and different versions of dependencies and precedences, and conclusions are presented in Section 5.
2 Mathematical Background
In the first part of this section we present a formal description of our model of computation. This is not mere formality for its own sake. Although the formalization is detailed, its purpose is to make it possible to describe the computation of ample sets in a precise manner.
2.1 Model of Computation
Our model of computation has three components: a set of variables, a set of transitions, and a set of processes. Definition 1. Let V = (v_1, v_2, ..., v_n) be an ordered set of variables. The values of variable v_i are taken from some finite domain X_i. Let X = X_1 × X_2 × ··· × X_n.
– An evaluation e ∈ X associates a value with each variable.
– A guard g: X → B is a total function that maps each evaluation to a Boolean value. B = {true, false}.
– An assignment a: X → X is a total function that maps one evaluation to another.

Each process is described fully by its transitions and a designated variable called a program counter. A transition is a pair of the form (g, a) where g is a guard and a is an assignment. A transition is enabled in states where its guard evaluates as true. If the transition fires (i.e., is executed), it changes the values of variables as described by its assignment component. In almost all cases, the guard checks that the program counter has an appropriate value, and the assignment changes the value of the program counter.

Definition 2. A process over variables V is a pair (Σ, pc), where Σ is a set of transitions, and pc ∈ V is a variable called the program counter. For each transition (g, a) in Σ there is a value x such that the guard g is of the form (pc = x) ∧ ϕ. For e ∈ X, we write pc(e) to denote the value of the program counter at e.

Definition 3. Let V be an ordered set of variables over the finite domain X and let P = {P_1, P_2, ..., P_k} be a set of processes over V, such that P_i = (Σ_i, pc_i). We assume all the pc_i are different. Let ê be an evaluation. The state space of P from ê is M = (S, ê, Σ, Δ), where ê ∈ X is the initial state, Σ = Σ_1 ∪ ... ∪ Σ_k, and S and Δ are the smallest sets such that S ⊆ X, Δ ⊆ S × Σ × S, ê ∈ S, and if t = (g, a) ∈ Σ and s ∈ S and g(s) = true, then a(s) ∈ S and (s, t, a(s)) ∈ Δ. We say that S is the set of states and ê is the initial state.

The elements of Σ above are referred to as structural transitions, while the elements of Δ are called semantic transitions. From now on, when we write just “transition”, we are referring to the structural ones. This convention agrees with Petri net terminology and disagrees with process algebra and Kripke structure terminology. The following notation is used throughout the paper:

Definition 4. Let M = (S, ê, Σ, Δ) be the state space of P from ê, and let s be some state of M.
– The enabled transitions of s are en(s) = {t ∈ Σ | ∃s′ ∈ S: (s, t, s′) ∈ Δ}.
– We write s −t→ when t ∈ en(s).
– We write s −t→ s′ when (s, t, s′) ∈ Δ.
– We write s −t1 t2 t3 ···→ when ∃s_1, s_2, ... ∈ S: s −t1→ s_1 −t2→ s_2 −t3→ ···.
– The local transitions of process P_i in state s are current_i(s) = {(g, a) ∈ Σ_i | ∃e ∈ X: pc_i(s) = pc_i(e) ∧ g(e) = true}.
– The enabled local transitions of P_i are en_i(s) = current_i(s) ∩ en(s).
– The disabled local transitions of P_i are dis_i(s) = current_i(s) \ en(s).
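As an aside, the definitions above translate almost directly into code. The following sketch (ours; evaluations are represented as dictionaries from variable names to values) builds the state space of Definition 3 and the en(s) of Definition 4:

```python
# Illustrative encoding of Definitions 1-4: a transition is a (guard,
# assignment) pair over evaluations, here dicts from variable name to value.
def freeze(e):
    """Hashable snapshot of an evaluation."""
    return tuple(sorted(e.items()))

# One example transition: enabled when pc = 0; fires by bumping x and pc.
t0 = (lambda e: e["pc"] == 0,
      lambda e: {**e, "pc": 1, "x": e["x"] + 1})

def en(transitions, s):
    """en(s): indices of the transitions whose guard holds in state s."""
    return [i for i, (g, _) in enumerate(transitions) if g(s)]

def state_space(transitions, init):
    """Smallest S and Delta closed under firing enabled transitions."""
    S, Delta, frontier = {freeze(init)}, set(), [init]
    while frontier:
        s = frontier.pop()
        for i in en(transitions, s):
            s2 = transitions[i][1](s)
            Delta.add((freeze(s), i, freeze(s2)))
            if freeze(s2) not in S:
                S.add(freeze(s2))
                frontier.append(s2)
    return S, Delta

S, Delta = state_space([t0], {"pc": 0, "x": 0})
```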
Lastly, we need to know which other variables are involved in which transitions. These relationships are only defined in the context of a state space.
Definition 5. Let V = (v_1, v_2, ..., v_n) be an ordered set of variables, let P be a set of processes over V, and let M = (S, ê, Σ, Δ) be the state space of P from ê. Given evaluations e, e′ ∈ X, for 1 ≤ i ≤ n:
– e_i denotes the value of v_i in e, and
– δ(e, e′) = {i | e_i ≠ e′_i} is the set of indices on which e and e′ disagree.
If t = (g, a) ∈ Σ, then
– the test set of t is Ts(t) = {v_i | ∃e, e′ ∈ X: δ(e, e′) = {i} ∧ g(e) ≠ g(e′)},
– the write set of t is Wr(t) = {v_i | ∃e ∈ X: e_i ≠ a(e)_i},
– the read set of t is Rd(t) = {v_i | ∃e, e′ ∈ X: δ(e, e′) = {i} ∧ ∃j: v_j ∈ Wr(t) ∧ a(e)_j ≠ a(e′)_j}, and
– the variable set of t is Vr(t) = Ts(t) ∪ Rd(t) ∪ Wr(t).
The test, write, and read sets are conservative syntactic estimates of those variables that may be involved in the different aspects of a transition's execution. Transitions with disjoint variable sets are clearly independent, but an even finer condition will be given in Section 3.1.
2.2 Ample Sets, Dependency and Precedence
It is important to distinguish which transitions may interfere with one another and to this end we define the following:

Definition 6. Let M = (S, ê, Σ, Δ) be the state space of P from ê, and S′ ⊆ S. A dependency relation D ⊆ Σ × Σ for S′ is a symmetric, reflexive relation such that (t_1, t_2) ∉ D implies that for all states s ∈ S′, the following are true:
1. If s −t1→ s′ and s −t2→, then s′ −t2→ (independent transitions do not disable one another).
2. If s −t1 t2→ s′ and s −t2 t1→ s′′, then s′ = s′′ (the final state is independent of the transition order).
Given D and T ⊆ Σ, we write dep(T) = {t′ | ∃t ∈ T: (t, t′) ∈ D}.
The definition implies that if D is a dependency relation, D′ is symmetric, and D ⊆ D′, then D′ is also a dependency relation. If S′ = S in this definition, we get the usual notion of dependency. We shall, however, also use other S′. Here we use the definition of ample sets from [1], modified to accommodate S′ instead of S.

Definition 7. Let M = (S, ê, Σ, Δ) be the state space of P from ê, let S′ ⊆ S and let D be a dependency relation for S′. A set ample(s) ⊆ Σ of transitions is an ample set for state s ∈ S′ if and only if the following hold:
A0 ample(s) = ∅ if and only if en(s) = ∅.
A1 For every path of the state space that begins at s, no transition that is not in ample(s) and is dependent on a transition in ample(s) can occur without some transition in ample(s) occurring first.
Because condition A1 talks about future transitions, some of which may not be enabled in the current state, we need information about not only those transitions that are currently enabled, but also those that may become enabled in the future.

Definition 8. Let M = (S, ê, Σ, Δ) be the state space of P from ê, and S′ ⊆ S. A precedence relation R ⊆ Σ × Σ for S′ is such that if there is some state s ∈ S′ such that ¬ s −t2→ and s −t1 t2→, then (t_1, t_2) ∈ R. Given R and T ⊆ Σ, we write pre(T) = {t′ | ∃t ∈ T: (t′, t) ∈ R}.

The definition implies that if R is a precedence relation and R ⊆ R′, then R′ is also a precedence relation. The precedence relation makes it possible to detect all the transitions that can enable a given transition. It is a coarse heuristic. A finer heuristic, commonly used in the stubborn set method, takes advantage of the fact that if the guard of t is of the form ϕ ∧ ψ where ϕ evaluates to true and ψ evaluates to false in the current state, then it would suffice to only consider those transitions which may affect ψ.
3 The Calculation of D, R, and Ample Sets

3.1 Calculating D and R
In Section 2.1 we carefully make a distinction between structural and semantic transitions, and between the description of a model (its variables and processes) and the state space it generates. This difference plays an important role when it comes to the calculation of the D and R relations. Specifically, the static (structural) description of a model is almost always available and can be used to calculate an overapproximation of the smallest possible D and R. The full state space, on the other hand, is seldom available for realistic models; the purpose of partial order methods is to avoid its construction! In this paper we consider three versions of D/R. The first, which we refer to as Ds/Rs, is based on the static model description and is calculated as follows: Let M = (S, ê, Σ, Δ) be a state space, and t_1, t_2 ∈ Σ.
– (t_1, t_2) ∈ Ds if and only if Wr(t_1) ∩ Vr(t_2) ≠ ∅ or Wr(t_2) ∩ Vr(t_1) ≠ ∅.
– (t_1, t_2) ∈ Rs if and only if Wr(t_1) ∩ Ts(t_2) ≠ ∅.
It is easy to see that these dependency and precedence relations are not the smallest possible. For example, consider the two transitions
ta: true −→ x := (x + 1) mod 5
tb: true −→ x := (x + 2) mod 5
(Here we use Dijkstra's guarded command notation to describe the guards and assignments of the transitions.) If ta and tb belong to two different processes, they are clearly independent: they cannot disable each other and the order in which they execute has no effect on the final state. Nevertheless, according to
the rules above (ta, tb) ∈ Ds, because each transition reads a variable (x) that is written to by the other. The second version of the relations is called Df/Rf, and is based on the full state space:
– (t_1, t_2) ∈ Df if and only if for some state s ∈ S where s −t1→ s_1 and s −t2→ s_2, either ¬ s_1 −t2→ or s_1 −t2→ s′ ∧ ¬ s_2 −t1→ s′.
– (t_1, t_2) ∈ Rf if and only if for some state s ∈ S, both ¬ s −t2→ and s −t1 t2→.
The third and final version of the relations, Dd/Rd, is based on the full state space and the current state:
– (t_1, t_2) ∈ Dd(s) if and only if for some state s′ ∈ S reachable from s and where s′ −t1→ s_1 and s′ −t2→ s_2, either ¬ s_1 −t2→ or s_1 −t2→ s′′ ∧ ¬ s_2 −t1→ s′′.
– (t_1, t_2) ∈ Rd(s) if and only if for some state s′ ∈ S reachable from s, both ¬ s′ −t2→ and s′ −t1 t2→.
Note that the definition of ample sets in Definition 7 is only sensitive to what happens in the current state and its subsequent states. It is therefore correct to restrict Dd and Rd to the part of the state space that is reachable from the current state. Here we make use of the S′ in Definitions 6 and 8: for Dd(s) and Rd(s), we let S′ be the set of all those states that are reachable from s. Ds/Rs are based on structural transitions, whereas both Dd/Rd and Df/Rf are defined with respect to semantic transitions. While in practice the latter two versions may be expensive to calculate in full, they provide some idea of the limits of partial order reduction.
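A sketch of how the static and full relations might be computed (our own illustration: the static case assumes precomputed Ts/Wr/Vr sets per transition, and the full case assumes the explicit state space as a set of (s, t, s′) triples):

```python
# Static relations from syntactic variable sets; full relations from an
# explicit state space. Transitions are deterministic per (state, t).
def static_relations(trans):
    """trans: dict t -> {'Ts': set, 'Wr': set, 'Vr': set} of variable names."""
    Ds, Rs = set(), set()
    for t1, v1 in trans.items():
        for t2, v2 in trans.items():
            if v1['Wr'] & v2['Vr'] or v2['Wr'] & v1['Vr']:
                Ds.add((t1, t2))
            if v1['Wr'] & v2['Ts']:
                Rs.add((t1, t2))
    return Ds, Rs

def full_relations(states, delta, all_trans):
    succ = {}
    for (s, t, s2) in delta:
        succ.setdefault(s, {})[t] = s2        # a(s) is unique per transition
    Df, Rf = set(), set()
    for s in states:
        en = succ.get(s, {})
        for t1, s1 in en.items():
            for t2, s2 in en.items():
                after1 = succ.get(s1, {})
                # t1 disables t2, or the two interleavings end differently
                if t2 not in after1 or after1[t2] != succ.get(s2, {}).get(t1):
                    Df.add((t1, t2)); Df.add((t2, t1))
            for t2 in all_trans:
                if t2 not in en and t2 in succ.get(s1, {}):
                    Rf.add((t1, t2))          # t1 can enable t2
    return Df, Rf
```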
3.2 Calculating Ample Sets
It is reasonable to always consider all the current transitions in a given process as dependent, and therefore the smallest possible sets that are eligible as ample sets are the sets en_i(s) of enabled local transitions in each process. A conservative estimate of when such a set can be selected is based on the following sufficient condition [1]:

Proposition 1. en_i(s) is an ample set if for each process P_j ≠ P_i we have
1. pre(dis_i(s)) ∩ Σ_j = ∅, and
2. dep(en_i(s)) ∩ Σ_j = ∅.
A straightforward method for using this information tests the en i sets one by one. If either of the conditions fails to hold, the set is discarded and we consider the next candidate. If no suitable candidate is found, the set en(s) is used as an ample set. This approach is shown in Figure 1, and it is also roughly how partial order reduction is implemented in SPIN [7]. As it stands, the algorithm returns the first valid ample set it encounters. This is somewhat arbitrary, since it depends on the order in which the processes are examined, which, in turn, often depends on their order of declaration. This may
ample1(s)
1  for i ∈ {1, ..., k} such that en_i(s) ≠ ∅ do
2    A ← true
3    for j ≠ i do
4      if pre(dis_i(s)) ∩ Σ_j ≠ ∅ or dep(en_i(s)) ∩ Σ_j ≠ ∅ then
5        A ← false
6        break
7    if A then return en_i(s)
8  return en(s)

Fig. 1. Ample set selection from [2]
give the user some control over the selection of ample sets, but it is doubtful whether such control is ever exercised and whether it is effective. Instead, we refer to this default version in Figure 1 as first choice, and we consider two alternative approaches:
– Minimum choice: The algorithm is modified to compute all the valid ample sets of the form en_i(s) (it merely records the set index in line 7), and returns the smallest set (or one of the smallest sets) in line 8, reverting to en(s) if no en_i(s) qualifies.
– Random choice: As for minimum choice, the algorithm computes all valid ample sets of the form en_i(s) and then randomly picks one of these to return in line 8, reverting to en(s) if necessary.
On the surface, random choice seems just as arbitrary as first choice. However, in Section 4 the same partial order reduction run is repeated many times with the random choice approach. This allows us to measure experimentally how sensitive the reduction is to the choice of ample set.
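The following is a runnable rendering of ample1 together with the three selection strategies (a sketch under assumed interfaces: en_i, dis_i, pre, dep and the process alphabets sigma are supplied by the model and are not part of the original figure):

```python
# ample1 with first/minimum/random selection over the valid en_i(s) sets.
import random

def candidate_sets(s, k, en_i, dis_i, pre, dep, sigma):
    """Process indices i whose en_i(s) is a valid, non-empty ample set."""
    valid = []
    for i in range(k):
        if not en_i(s, i):
            continue
        if all(not (pre(dis_i(s, i)) & sigma[j]) and
               not (dep(en_i(s, i)) & sigma[j])
               for j in range(k) if j != i):
            valid.append(i)
    return valid

def ample1(s, k, en, en_i, dis_i, pre, dep, sigma, choice="first"):
    valid = candidate_sets(s, k, en_i, dis_i, pre, dep, sigma)
    if not valid:
        return en(s)                     # fall back to all enabled transitions
    if choice == "first":
        i = valid[0]
    elif choice == "minimum":
        i = min(valid, key=lambda i: len(en_i(s, i)))
    else:                                # "random"
        i = random.choice(valid)
    return en_i(s, i)
```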
3.3 Using SCCs for Ample Sets
One drawback of the approach in the previous section is that the ample set contains the enabled transitions of either all or exactly one of the processes. It is easy to imagine a scenario of four processes P_1 ... P_4 where en_1 and en_2 are mutually dependent, and en_3 and en_4 are mutually dependent, and all other pairings are independent. In this scenario it is possible to choose ample = en_1 ∪ en_2 or ample = en_3 ∪ en_4, but this is never done. In [12], an algorithm is presented that constructs a graph whose maximal strongly connected components (SCCs) are used as candidates for ample.

Definition 9. Let M = (S, ê, Σ, Δ) be the state space of P from ê, D be a dependency relation, R be a precedence relation, and s ∈ S a state of M.
– For two processes P_i and P_j, if one or both of the conditions in Proposition 1 are violated, then P_j is a conflicting process for P_i in state s.
ample2(s)
1  E ← ∅
2  for i ∈ {1, ..., k} do
3    A ← true
4    for j ≠ i do
5      if pre(dis_i(s)) ∩ Σ_j ≠ ∅ or dep(en_i(s)) ∩ Σ_j ≠ ∅ then
6        A ← false
7        E ← E ∪ {(i, j)}
8    if A ∧ en_i(s) ≠ ∅ then return en_i(s)
9  return en_H(s) where H is some SCC of Gs = ({1, ..., k}, E) that satisfies the conditions of Proposition 2

Fig. 2. Ample set selection using a conflict graph
– Gs = (W, E) is a conflict graph for state s such that the vertices are process indices: W = {1, ..., k} and (i, j) ∈ E if and only if P_j is a conflicting process for P_i in state s.
– If H is an SCC of the conflict graph Gs, then en_H(s) = ∪_{i∈H} en_i(s).
Then we have the following:

Proposition 2. Let M = (S, ê, Σ, Δ) be the state space of P from ê, s ∈ S be some state of M, and Gs be the conflict graph for state s. If H is an SCC of Gs such that
1. en_H(s) ≠ ∅, and
2. for all SCCs H′ ≠ H that are reachable from H, en_H′(s) = ∅,
then en_H(s) is an ample set for state s.
This gives us the correctness of the algorithm in Figure 2. To see how this approach can improve upon ample1, consider the model of the philosophers' banquet, shown in Figure 3. The whole system consists of two completely independent copies of the classic four dining philosophers system, as illustrated in Figure 3(b). (The details of a single philosopher are shown in Figure 3(a).) No reduction is possible with ample1, because each en_i(s) set of each philosopher contains transitions that are dependent on the transitions of the philosopher to the left or right, and is therefore invalid according to Proposition 1. On the other hand, ample2 is able to select an ample set ∪_{i∈Table 1} en_i(s). The full system has 6400 states and 33920 transitions, which ample2 reduces to 95 states and 152 transitions; as mentioned, ample1 does not reduce the state space at all. As in the case of ample1, we shall consider three versions of ample2:
– First choice: The algorithm as it stands.
– Minimum choice: The algorithm modified to compute all valid en_H, and to return the smallest such candidate.
– Random choice: The algorithm modified to compute all valid en_H, and to return a random candidate.
Fig. 3. Model of the philosophers’ banquet. (a) Details of a single philosopher; fL and fR refer to a philosopher’s left and right forks. (b) A banquet consisting of two independent copies of the classic dining philosophers model.
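The SCC-based selection of Figure 2 and Proposition 2 can be sketched as follows; Tarjan's algorithm computes the SCCs of the conflict graph, and the inputs are the same assumed interfaces as in the ample1 sketch:

```python
# Sketch of ample2: build the conflict graph, condense it into SCCs, and
# return en_H(s) for an SCC H whose other reachable SCCs are all disabled.
def tarjan_sccs(vertices, edges):
    index, low, on_stack, stack, sccs, counter = {}, {}, set(), [], [], [0]
    def strongconnect(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop(); on_stack.discard(w); scc.add(w)
                if w == v:
                    break
            sccs.append(frozenset(scc))
    for v in vertices:
        if v not in index:
            strongconnect(v)
    return sccs

def ample2(s, k, en, en_i, dis_i, pre, dep, sigma):
    conflicts = {i: set() for i in range(k)}
    for i in range(k):
        for j in range(k):
            if j != i and (pre(dis_i(s, i)) & sigma[j] or
                           dep(en_i(s, i)) & sigma[j]):
                conflicts[i].add(j)          # P_j conflicts with P_i
    sccs = tarjan_sccs(range(k), conflicts)
    comp = {v: H for H in sccs for v in H}
    def reachable_sccs(H):
        seen, todo = {H}, [H]
        while todo:
            for v in todo.pop():
                for w in conflicts[v]:
                    if comp[w] not in seen:
                        seen.add(comp[w]); todo.append(comp[w])
        return seen
    for H in sccs:                           # arbitrary order = "first choice"
        en_H = set().union(*(en_i(s, i) for i in H))
        if en_H and all(not any(en_i(s, i) for i in H2)
                        for H2 in reachable_sccs(H) if H2 != H):
            return en_H
    return en(s)                             # defensive fallback
```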
4 Experimental Results
The previous section presented three different versions of the D and R relations, two different algorithms (ample1 and ample2) that use the relations to compute potential ample sets, and three different ways (first, minimum, and random) of choosing an actual ample set. Each of the 18 combinations of techniques is evaluated by applying them to models taken from the BEEM repository [8]. The full repository contains 300 variants of 57 basic models, and covers a range of genres: protocols, mutual exclusion and leader election algorithms, hardware control, scheduling and planning, and others. Since the experiments are long-running, only the 114 smallest models were chosen, but with at least one variant for each basic model. The sizes of the models range from 80 to 124 704 states, and 212 to 399 138 transitions. For each model, the original model source code is converted to a C program that generates the full state space and writes the resulting graph (without detailed state contents) to a file. The Ds/Rs and Df/Rf relations are also computed and stored along with the graphs. In all cases the full state space is identical to those described on the BEEM website. All the numbers in the tables that follow refer to the percentage of states in the full state space explored by a method. This provides some measure of savings with respect to memory consumption, independent of specific data structures and architecture. The overhead costs involved in calculating ample sets make it hard to give a similarly independent measure for computation time.
4.1 Conflict Graph in the Static Case
The first question we address is whether the use of the conflict graph makes any difference. Algorithm ample2 never fares worse than algorithm ample1; Table 1(a) shows those cases where it fares better. For all but a small number of models its impact is negligible.
Table 1. Comparison of ample1 and ample2 algorithms in the static case. (a) Instances where ample1 and ample2 differ. (b) Further instances where ample1 or ample2 or both achieve reduction.

(a)
Model    ample1 Ds/Rs   ample2 Ds/Rs
phils3   100.00          32.37
trgate2  100.00          54.77
trgate3  100.00          55.61
trgate1  100.00          59.12
phils1   100.00          60.00
bopdp1   100.00          95.74
elev21   100.00          96.30

(b)
Model    ample1 Ds/Rs   ample2 Ds/Rs
mcs4       5.88           5.88
anders4   46.47          46.47
anders2   61.55          61.55
peters1   62.25          62.25
mcs2      65.06          65.06
szyman1   69.46          69.46
ldrfil2   76.57          76.57
peters2   82.42          82.42
ldrfil4   82.71          82.71
mcs1      89.30          89.30
ldrfil3   96.39          96.39
krebs2    97.60          97.60
This is partly due to the fact that the Ds/Rs relations are crude overapproximations of the true dependency between transitions, and do not provide much information for either ample1 or ample2 to exploit. Table 1(b) shows all further models for which any reduction at all was achieved. For the remaining 92 models both ample1 and ample2 explored the entire state space. In all tables the rows are ordered according to the reduction reported in the rightmost column. In some tables we omit results for the same model with a different parameterization, due to lack of space.
4.2 The Static v. Full Calculation of D/R
With the exception of the mcs4 model, the results in Tables 1(a) and (b) could be seen as disappointing. On the other hand, they demonstrate what can be achieved with very basic static analysis. The question that arises is how much can be gained by refining the D/R relations. There are essentially two ways of achieving this. Firstly, a more sophisticated analysis of the structural transition guards and assignments can eliminate unnecessary dependencies between some transitions. Such an analysis may include, for example, the kind of reasoning employed to conclude that transitions ta and tb in Section 3.1 are independent of one another. This is what Godefroid and Pirottin call refined dependency [5]. Secondly, some dependencies can be computed on-the-fly. For example, in models of concurrency with asynchronous communication, the emptiness or fullness of a bounded-length communication channel affects the dependency of some operations on the channel. Godefroid and Pirottin refer to this approach as conditional
Table 2. Comparison of static and full D/R relations

Model     ample2 Ds/Rs   ample1 Df/Rf   ample2 Df/Rf
cycsch1   100.00           1.19           1.19
mcs4        5.88           4.13           4.13
fwtree1   100.00           6.25           6.25
phils3     32.37          29.63          10.84
mcs1       89.30          18.35          18.35
anders4    46.47          22.78          22.78
iprot2    100.00          26.16          25.97
mcs2       65.06          34.45          34.45
phils1     60.00          60.00          47.50
fwlink2   100.00          51.37          50.66
krebs1    100.00          90.06          51.09
ldrelc3   100.00          54.26          54.26
teleph2   100.00          59.55          59.55
ldrelc1   100.00          60.52          60.52
szyman1    69.46          62.98          62.98
prodcl2   100.00          63.12          63.12
at1       100.00          65.71          65.35
szyman2    72.22          65.76          65.76
ldrfil2    76.57          65.94          65.94
lamp1     100.00          66.24          66.24
prots2    100.00          72.51          67.69
collsn1   100.00          68.46          68.46
drphls1   100.00          68.54          68.54
prots3    100.00          72.45          70.75
peters2    82.42          71.54          71.54
drphls2   100.00          72.25          72.25
collsn2   100.00          73.94          73.94
prodcl1   100.00          74.08          74.08
teleph1   100.00          75.16          75.16
lamp3     100.00          75.18          75.18
fwlink1   100.00          81.61          78.83
pgmprt4   100.00          80.82          80.82
ldrfil3    96.39          83.20          83.20
bopdp2     95.35          85.81          84.78
fischr1   100.00          87.38          87.38
bakery3   100.00          87.51          87.51
exit2     100.00         100.00          87.78
brp21     100.00          87.96          87.96
pubsub1   100.00          94.48          88.97
fwtree2   100.00          98.93          89.43
pgmprt2   100.00          89.49          89.49
brp2      100.00          99.51          96.46
extinc2   100.00          98.95          96.87
cycsch2   100.00         100.00          98.55
synaps2   100.00         100.00          99.75
plc1      100.00          99.95          99.95
dependency [5], and a related technique is briefly described by Peled [9] and more fully by Holzmann [6]. Both these approaches are subsumed by Df /Rf as defined in Section 3.1. In other words, the full versions of the D and R relations supply an upper bound to the reduction that can be achieved with refined and conditional dependency. Table 2 compares the ample2 algorithm for the static D/R, and the ample1 and ample2 algorithms for the full D/R. As the table shows, a significant reduction can be achieved for some models, even when using ample1. Nevertheless, the SCC approach is able to exploit the improved dependency relations even further. These results lead to a further question: given the choice of improving D or improving R, which of the two relations should be refined, if possible? To answer this question, we combined the static version of D with the full version of R, and vice versa. The results of these experiments are shown in Table 3. Only for the four topmost models in the table does the Ds /Rf combination make a greater impact than the Df /Rs combination. In all other cases, a more accurate version of D leads to greater reduction.
Table 3. Combinations of static/full D/R

Model     ample2 Ds/Rs  ample2 Ds/Rf  ample2 Df/Rs  ample2 Df/Rf
fwtree1   100.00          9.19        100.00          6.25
bopdp2     95.35         93.24         95.11         84.78
bopdp1     95.74         93.42         95.33         86.88
exit2     100.00         87.78        100.00         87.78
phils3     32.37         32.37         10.84         10.84
mcs1       89.30         89.30         20.11         18.35
anders4    46.47         46.47         22.78         22.78
iprot2    100.00        100.00         27.40         25.97
iprot1    100.00        100.00         30.82         28.94
mcs2       65.06         65.06         38.28         34.45
brp1      100.00        100.00         99.15         96.98
brp2      100.00        100.00         99.51         96.46
plc1      100.00        100.00         99.97         99.95
Table 4. Comparison of the full and dynamic D/R relations

Model     ample2 Df/Rf  ample2 Dd/Rd
mcs2       34.45         22.73
fwtree2    89.43         28.51
ldrflt1    88.70         50.62
cycsch2    98.55         53.73
prots3     70.75         70.18
pubsub1    88.97         88.10
needhm1   100.00         89.54
bakery1    94.42         93.63
bakery2   100.00         98.87
gear1     100.00         99.40

4.3 Dynamic Version of D/R
The effect of using the dynamic version of D/R is compared to the full version in Table 4. For the majority of the 114 models, little further reduction is achieved, and only a few models (mcs2, fwtree2, ldrflt1, cycsch2, and needhm1) exhibit significant reduction (more than 10%). Of course, the use of Df/Rf already reduces the state space for many models, and there is less “room” for additional reduction. Furthermore, if the state space of a system is strongly connected, the Df and Dd relations are identical, as are Rf and Rd.
Table 5. Comparison of the first, minimum, and random choice (an asterisk marks the models for which the minimum choice improves on the first choice)

ample2, Df/Rf
Model      1st     min     random
fwtree1    6.25    6.25     6.25...6.99
mcs1      18.35   18.35    18.05...18.37
iprot2    25.97   25.98    27.48...29.61
iprot1    28.94   28.96    29.78...34.00
mcs2      34.45   34.45    34.38...34.87
fwlink2   50.66   50.68    50.66...50.68
ldrelc3   54.26   54.28    54.12...54.21
ldrelc1   60.52   60.55    60.27...60.57
szyman1   62.98   62.98    63.12...63.33
prodcl2   63.12   63.12    63.30...64.06
ldrelc2*  64.38   64.05    64.07...64.20
krebs2    65.17   65.17    65.18...65.22
at1       65.35   65.35    65.35...65.38
krebs1*   51.09   47.95    57.92...65.51
ldrflt2   65.94   65.94    65.86...65.99
szyman2   65.76   65.76    66.05...66.22
drphls1   68.54   68.54    68.54...68.55
prots2    67.69   68.25    68.99...69.52
drphls2   72.25   72.25    72.30...72.36
at2       72.73   72.73    72.73...72.75
prots3    70.75   72.31    71.28...73.09
prodcl1   74.08   74.08    74.40...75.45
prots1*   78.77   78.11    78.27...79.51
pgmprt4   80.82   80.82    80.87...80.92
fwlink4   82.06   82.06    82.45...82.73
collsn2   73.94   73.94    83.09...85.50
exit2*    87.78   87.10    87.84...88.01
fischr1   87.38   87.38    86.75...88.17

ample2, Dd/Rd
Model      1st     min     random
mcs2      22.73   22.87    23.08...24.43
fwtree2   28.51   28.76    31.54...35.35
ldrflt1*  50.62   50.28    50.24...52.17
cycsch2   53.73   57.42    54.09...55.15
prots3    70.18   72.31    71.67...72.91
prots1*   78.77   78.11    78.44...79.42
teleph1*  75.78   70.00    77.42...82.97
pubsub1   88.10   88.10    87.41...88.10
needhm1   89.54   89.54    89.34...91.15
bakery1   93.63   93.63    93.63...94.02
bakery2   98.87   98.87    98.95...99.48
gear1     99.40   99.55    99.44...99.70

4.4 First, Minimum, and Random Choice
Lastly, Table 5 shows the results of experiments in which a different choice of valid ample sets is exercised. In the case of Ds/Rs, the choice of first, minimum, and random ample sets produces no effect for either the ample1 or ample2 approaches. This may be explained by the fact that the choices are so limited that it does not matter which ample set is selected. In the case of the full and dynamic versions of D/R, however, some variation can be observed. For six models, selecting the SCC with the smallest number of enabled transitions produces an improvement in the reduction; their names are marked with an asterisk in the table. Note, however, that this strategy does not consistently improve the reduction and that the same model behaved differently in the full and dynamic versions. The improvement is largest for teleph1 (−5.78%) and for krebs1 (−3.14%); for the other models it is less than 1%. For 12 models the minimum choice leads to losses of reduction, although these are generally smaller than the improvements. The situation is somewhat similar when a valid SCC is chosen at random. Each experiment was repeated 50 times for Df/Rf and 20 times for Dd/Rd to
produce the results in Table 5. In the case of Df/Rf, this strategy produces a range of 7.59% for the krebs1 model, and for Dd/Rd a range of 5.55% for the teleph1 model. All other ranges are smaller than 5% for the models shown, and zero for the rest. The relatively small ranges seem to indicate that in the majority of cases, reduction is not overly sensitive to the choice of ample set. However, this does not rule out the possibility that more advanced, systematic heuristics for choosing an ample set could produce significant savings.
5 Discussion
The use of partial order reduction is widespread, and many improvements to the basic techniques have been proposed. Before such proposals are pursued, it is worthwhile to try to determine whether any significant improvement is possible at all. This paper has attempted to partially address this question. We have
– presented empirical lower bounds for partial order reduction based on a rough approximation of the dependency relation between transitions;
– presented empirical upper bounds based on information derived from the full state space;
– demonstrated that it is possible to improve reduction using a relatively simple technique such as a conflict graph that exploits information about transition dependency more fully than the standard technique; and
– shown that, given a choice of ample sets, choosing the smallest set or a random set does not lead to significantly greater reduction in any of our experiments.
It is important to point out that it is unlikely that the upper bounds we present here are achievable. We have left the effect of the cycle-closing conditions and visibility for future work. Both tend to reduce the effect of partial order reduction. Generally speaking, the reduction does not appear to be as significant as reported elsewhere.
References
1. Clarke, E.M., Grumberg, O., Minea, M., Peled, D.: State space reduction using partial order techniques. Software Tools for Technology Transfer 2(3), 279–287 (1999)
2. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (1999)
3. Flanagan, C., Godefroid, P.: Dynamic partial-order reduction for model checking software. In: Palsberg, J., Abadi, M. (eds.) Proceedings of the 32nd Annual ACM Symposium on Principles of Programming Languages, January 2005, pp. 110–121 (2005)
4. Godefroid, P.: Partial-order Methods for the Verification of Concurrent Systems: an Approach to the State-explosion Problem. LNCS, vol. 1032. Springer, Heidelberg (1996)
5. Godefroid, P., Pirottin, D.: Refining dependencies improves partial-order verification methods (extended abstract). In: Courcoubetis, C. (ed.) CAV 1993. LNCS, vol. 697, pp. 438–449. Springer, Heidelberg (1993)
6. Holzmann, G.J.: The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, Reading (2004)
7. Holzmann, G.J., Peled, D.: An improvement in formal verification. In: Hogrefe, D., Leue, S. (eds.) Proceedings of the 7th IFIP TC6/WG6.1 International Conference on Formal Description Techniques (FORTE 1994), June 1994, pp. 197–211 (1994)
8. Pelánek, R.: BEEM: Benchmarks for explicit model checkers. In: Bošnački, D., Edelkamp, S. (eds.) SPIN 2007. LNCS, vol. 4595, pp. 263–267. Springer, Heidelberg (2007), http://anna.fi.muni.cz/models/
9. Peled, D.: Combining partial order reductions with on-the-fly model-checking. In: Dill, D.L. (ed.) CAV 1994. LNCS, vol. 818, pp. 377–390. Springer, Heidelberg (1994)
10. Peled, D., Valmari, A., Kokkarinen, I.: Relaxed visibility enhances partial order reduction. Formal Methods in System Design 19(3), 275–289 (2001)
11. Peled, D.A.: All from one, one for all: on model checking using representatives. In: Courcoubetis, C. (ed.) CAV 1993. LNCS, vol. 697, pp. 409–423. Springer, Heidelberg (1993)
12. Valmari, A.: A stubborn attack on state explosion. Formal Methods in System Design 1(1), 297–322 (1992)
13. Varpaaniemi, K.: On the Stubborn Set Method in Reduced State Space Generation. PhD thesis, Digital Systems Laboratory, Helsinki University of Technology (May 1998)
State Space Reduction of Linear Processes Using Control Flow Reconstruction

Jaco van de Pol and Mark Timmer

University of Twente, Department of Computer Science, Formal Methods & Tools, The Netherlands
{pol,timmer}@cs.utwente.nl
Abstract. We present a new method for fighting the state space explosion of process algebraic specifications, by performing static analysis on an intermediate format: linear process equations (LPEs). Our method consists of two steps: (1) we reconstruct the LPE’s control flow, detecting control flow parameters that were introduced by linearisation as well as those already encoded in the original specification; (2) we reset parameters found to be irrelevant based on data flow analysis techniques similar to traditional liveness analysis, modified to take into account the parallel nature of the specifications. Our transformation is correct with respect to strong bisimilarity, and never increases the state space. Case studies show that impressive reductions occur in practice, which could not be obtained automatically without reconstructing the control flow.
1 Introduction
Our society depends heavily on computer systems, asking increasingly for methods to verify their correctness. One successful approach is model checking: performing an exhaustive state space exploration. However, for concurrent systems this approach suffers from the infamous state space explosion, an exponential growth of the number of reachable states. Even a small system specification can give rise to a gigantic, or even infinite, state space. Therefore, much attention has been given to methods for reducing the state space. It is often inefficient to first generate a state space and then reduce it, since most of the complexity is in the generation process. As a result, intermediate symbolic representations such as Petri nets and linear process equations (LPEs) have been developed, upon which reductions can be applied. We concentrate on LPEs, the intermediate format of the process algebraic language μCRL [12]. Although LPEs are a restricted part of μCRL, every specification can be transformed to an LPE by a procedure called linearisation [13, 19]. Our results could also easily be applied to other formalisms employing concurrency. An LPE is a flat process description, consisting of a collection of summands that describe transitions symbolically. Each summand can perform an action and advance the system to some next state, given that a certain condition based
This research has been partially funded by NWO under grant 612.063.817 (SYRUP).
on the current state is true. It has already been shown useful to reduce LPEs directly (e.g. [5, 14]), instead of first generating their entire (or partial) state spaces and reducing those, or performing reductions on-the-fly. The state space obtained from a reduced LPE is often much smaller than the equivalent state space obtained from an unreduced LPE; hence, both memory and time are saved. The reductions we will introduce rely on the order in which summands can be executed. The problem when using LPEs, however, is that the explicit control flow of the original parallel processes has been lost, since they have been merged into one linear form. Moreover, some control flow could already have been encoded in the state parameters of the original specification. To solve this, we first present a technique to reconstruct the control flow graphs of an LPE. This technique is based on detecting which state parameters act as program counters for the underlying parallel processes; we call these control flow parameters (CFPs). We then reconstruct the control flow graph of each CFP based on the values it can take before and after each summand. Using the reconstructed control flow, we define a parameter to be relevant if, before overwritten, it might be used by an enabling or action function, or by a next-state function to determine the value of another parameter that is relevant in the next state. Parameters that are not relevant are irrelevant, also called dead. Our syntactic reduction technique resets such irrelevant variables to their initial value. This is justified, because these variables will be overwritten before ever being read. Contributions. (1) We present a novel method to reconstruct the control flow of linear processes. Especially when specifications are translated between languages, their control flow may be hidden in the state parameters (as will also hold for our main case study). No such reconstruction method appeared in literature before. (2) We use the reconstructed control flow to perform data flow analysis, resetting irrelevant state parameters. We prove that the transformed system is strongly bisimilar to the original, and that the state space never increases. (3) We implemented our method in a tool called stategraph and provide several examples, showing that significant reductions can be obtained. The main case study clearly explains the use of control flow reconstruction. By finding useful variable resets automatically, the user can focus on modelling systems in an intuitive way, instead of formulating models such that the toolset can handle them best. This idea of automatic syntactic transformations for improving the efficiency of formal verification (not relying on users to make their models as efficient as possible) already proved to be a fruitful concept in earlier work [21]. Related work. Liveness analysis techniques are well-known in compiler theory [1]. However, their focus is often not on handling the multiple control flows arising from parallelism. Moreover, these techniques generally only work locally for each block of program code, and aim at reducing execution time instead of state space. The concept of resetting dead variables for state space reduction was first formalised by Bozga et al. [7], but their analysis was based on a set of sequential processes with queues rather than parallel processes. Moreover, relevance of variables was only dealt with locally, such that a variable that is passed to a queue
or written to another variable was considered relevant, even if it is never used afterwards. A similar technique was presented in [22], using analysis of control flow graphs. It suffers from the same locality restriction as [7]. Most recent is [10], which applies data flow analysis to value-passing process algebras. It uses Petri nets as its intermediate format, featuring concurrency and taking into account global liveness information. We improve on this work by providing a thorough formal foundation including bisimulation preservation proofs, and by showing that our transformation never increases the state space. Most importantly, none of the existing approaches attempts to reconstruct control flow information that is hidden in state variables, missing opportunities for reduction. The μCRL toolkit already contained a tool parelm, implementing a basic variant of our methods. Instead of resetting state parameters that are dead given some context, it simply removes parameters that are dead in all contexts [11]. As it does not take into account the control flow, parameters that are sometimes relevant and sometimes not will never be reset. We show by examples from the μCRL toolset that stategraph indeed improves on parelm. Organisation of the paper. After the preliminaries in Section 2, we discuss the reconstruction of control flow graphs in Section 3, the data flow analysis in Section 4, and the transformation in Section 5. The results of the case studies are given in Section 6, and conclusions and directions for future work in Section 7. Due to space limitations, we refer the reader to [20] for the full version of the current paper, containing all the complete proofs, and further insights about additional reductions, potential limitations, and potential adaptions to our theory.
2 Preliminaries
Notation. Variables for single values are written in lowercase, variables for sets or types in uppercase. We write variables for vectors and sets or types of vectors in boldface.

Labelled transition systems (LTSs). The semantics of an LPE is given in terms of an LTS: a tuple ⟨S, s_0, A, Δ⟩, with S a set of states, s_0 ∈ S the initial state, A a set of actions, and Δ ⊆ S × A × S a transition relation.

Linear process equations (LPEs). The LPE [4] is a common format for defining LTSs in a symbolic manner. It is a restricted process algebraic equation, similar to the Greibach normal form for formal grammars, specifications in the language UNITY [8], and the precondition-effect style used for describing automata [16]. Usenko showed how to transform a general μCRL specification into an LPE [13, 19]. Each LPE is of the form

X(d: D) = Σ_{i∈I} Σ_{e_i: E_i} c_i(d, e_i) ⇒ a_i(d, e_i) · X(g_i(d, e_i)),
where D is a type for state vectors (containing the global variables), I a set of summand indices, and E_i a type for local variable vectors for summand i.
The summations represent nondeterministic choices; the outer between different summands, the inner between different possibilities for the local variables. Furthermore, each summand i has an enabling function c_i, an action function a_i (yielding an atomic action, potentially with parameters), and a next-state function g_i, which may all depend on the state and the local variables. In this paper we assume the existence of an LPE with the above function and variable names, as well as an initial state vector init. Given a vector of formal state parameters d, we use d_j to refer to its j-th parameter. An actual state is a vector of values, denoted by v; we use v_j to refer to its j-th value. We use D_j to denote the type of d_j, and J for the set of all parameters d_j. Furthermore, g_{i,j}(d, e_i) denotes the j-th element of g_i(d, e_i), and pars(t) the set of all parameters d_j that syntactically occur in the expression t. The state space of the LTS underlying an LPE consists of all state vectors. It has a transition from v to v′ by an atomic action a(p) (parameterised by the possibly empty vector p) if and only if there is a summand i for which a vector of local variables e_i exists such that the enabling function is true, the action is a(p) and the next-state function yields v′. Formally, for all v, v′ ∈ D, there is a transition v −a(p)→ v′ if and only if there is a summand i such that ∃e_i ∈ E_i · c_i(v, e_i) ∧ a_i(v, e_i) = a(p) ∧ g_i(v, e_i) = v′.

Example 1. Consider a process consisting of two buffers, B1 and B2. Buffer B1 reads a datum of type D from the environment, and sends it synchronously to B2. Then, B2 writes it back to the environment. The processes are given by

B1 = Σ_{d: D} read(d) · w(d) · B1,    B2 = Σ_{d: D} r(d) · write(d) · B2,
put in parallel and communicating on w and r. Linearised [19], they become

X(a: {1, 2}, b: {1, 2}, x: D, y: D) =
    Σ_{d: D} a = 1 ⇒ read(d) · X(2, b, d, y)    (1)
  + b = 2 ⇒ write(y) · X(a, 1, x, y)             (2)
  + a = 2 ∧ b = 1 ⇒ c(x) · X(1, 2, x, x)         (3)

where the first summand models behaviour of B1, the second models behaviour of B2, and the third models their communication. The global variables a and b are used as program counters for B1 and B2, and x and y for their local memory.
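As an illustration, the LPE above can be encoded and unfolded directly; this is our sketch, with an assumed two-element data domain and assumed initial values for x and y (the example leaves them unspecified):

```python
# The LPE of Example 1, encoded as summands over the state vector (a, b, x, y).
D = ["d0", "d1"]  # assumed data domain

summands = [
    # (1) a = 1  =>  read(d) . X(2, b, d, y), for each d in D
    lambda st: [(("read", d), (2, st[1], d, st[3])) for d in D if st[0] == 1],
    # (2) b = 2  =>  write(y) . X(a, 1, x, y)
    lambda st: [(("write", st[3]), (st[0], 1, st[2], st[3]))] if st[1] == 2 else [],
    # (3) a = 2 and b = 1  =>  c(x) . X(1, 2, x, x)
    lambda st: [(("c", st[2]), (1, 2, st[2], st[2]))] if st[0] == 2 and st[1] == 1 else [],
]

def successors(st):
    for summand in summands:
        yield from summand(st)

# Unfold the state space from the assumed initial state (1, 1, d0, d0):
init = (1, 1, D[0], D[0])
seen, todo = {init}, [init]
while todo:
    st = todo.pop()
    for action, nxt in successors(st):
        if nxt not in seen:
            seen.add(nxt); todo.append(nxt)
print(len(seen), "states")
```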
Strong bisimulation. When transforming a specification S into S′, it is obviously important to verify that S and S′ describe equivalent systems. For this we will use strong bisimulation [17], one of the most prominent notions of equivalence, which relates processes that have the same branching structure. It is well-known that strongly bisimilar processes satisfy the same properties, as for instance expressed in CTL∗ or μ-calculus. Formally, two processes with initial states p and q are strongly bisimilar if there exists a relation R such that (p, q) ∈ R, and
– if (s, t) ∈ R and s −a→ s′, then there is a t′ such that t −a→ t′ and (s′, t′) ∈ R;
– if (s, t) ∈ R and t −a→ t′, then there is an s′ such that s −a→ s′ and (s′, t′) ∈ R.
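On finite LTSs this definition yields a simple greatest-fixpoint check; a naive sketch (ours, quadratic in the number of state pairs):

```python
# Refine the full relation until both transfer conditions hold, then test
# whether the pair of initial states survived.
def bisimilar(states, delta, p, q):
    succ = {}
    for (s, a, s2) in delta:
        succ.setdefault(s, []).append((a, s2))
    R = {(s, t) for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(R):
            ok = all(any(b == a and (s2, t2) in R for (b, t2) in succ.get(t, []))
                     for (a, s2) in succ.get(s, [])) and \
                 all(any(b == a and (s2, t2) in R for (b, s2) in succ.get(s, []))
                     for (a, t2) in succ.get(t, []))
            if not ok:
                R.discard((s, t)); changed = True
    return (p, q) in R
```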
3 Reconstructing the Control Flow Graphs
First, we define a parameter to be changed in a summand i if its value after taking i might be different from its current value. A parameter is directly used in i if it occurs in its enabling function or action function, and used if it is either directly used or needed to calculate the next state.

Definition 1 (Changed, used). Let i be a summand, then a parameter d_j is changed in i if g_{i,j}(d, e_i) ≠ d_j, otherwise it is unchanged in i. It is directly used in i if d_j ∈ pars(a_i(d, e_i)) ∪ pars(c_i(d, e_i)), and used in i if it is directly used in i or d_j ∈ pars(g_{i,k}(d, e_i)) for some k such that d_k is changed in i.

We will often need to deduce the value s that a parameter d_j must have for a summand i to be taken; the source of d_j for i. More precisely, this value is defined such that the enabling function of i can only evaluate to true if d_j = s.

Definition 2 (Source). A function f: I × (d_j: J) → D_j ∪ {⊥} is a source function if, for every i ∈ I, d_j ∈ J, and s ∈ D_j, f(i, d_j) = s implies that ∀v ∈ D, e_i ∈ E_i · c_i(v, e_i) =⇒ v_j = s. Furthermore, f(i, d_j) = ⊥ is always allowed; it indicates that no unique value s complying to the above could be found.

In the following we assume the existence of a source function source. Note that source(i, d_j) is allowed to be ⊥ even though there might be some source s. The reason for this is that computing the source is in general undecidable, so in practice heuristics are used that sometimes yield ⊥ when in fact a source is present. However, we will see that this does not result in any errors. The same holds for the destination functions defined below. Basically, the heuristics we apply to find a source can handle equations, disjunctions and conjunctions. For an equational condition x = c the source is obviously c, for a disjunction of such terms we apply set union, and for conjunction intersection. If for some summand i a set of sources is obtained, it can be split into multiple summands, such that each again has a unique source.

Example 2. Let c_i(d, e_i) be given by (d_j = 3 ∨ d_j = 5) ∧ d_j = 3 ∧ d_k = 10, then obviously source(i, d_j) = 3 is valid (because ({3} ∪ {5}) ∩ {3} = {3}), but also (as always) source(i, d_j) = ⊥.

We define the destination of a parameter d_j for a summand i to be the unique value d_j has after taking summand i. Again, we only specify a minimal requirement.

Definition 3 (Destination). A function f: I × (d_j: J) → D_j ∪ {⊥} is a destination function if, for every i ∈ I, d_j ∈ J, and s ∈ D_j, f(i, d_j) = s implies ∀v ∈ D, e_i ∈ E_i · c_i(v, e_i) =⇒ g_{i,j}(v, e_i) = s. Furthermore, f(i, d_j) = ⊥ is always allowed, indicating that no unique destination value could be derived. In the following we assume the existence of a destination function dest.
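The source heuristic just described (singleton value sets for equations, union for disjunction, intersection for conjunction, ⊥ otherwise) can be sketched as follows; the condition representation is our own assumption:

```python
# Conditions as little ASTs; None plays the role of ⊥ ("no information").
def possible_sources(cond, param):
    kind = cond[0]
    if kind == "eq" and cond[1] == param:        # ("eq", "d_j", 3)
        return {cond[2]}
    if kind == "or":
        left = possible_sources(cond[1], param)
        right = possible_sources(cond[2], param)
        return None if left is None or right is None else left | right
    if kind == "and":
        left = possible_sources(cond[1], param)
        right = possible_sources(cond[2], param)
        if left is None:
            return right
        if right is None:
            return left
        return left & right
    return None                                   # cannot constrain param

def source(cond, param):
    vals = possible_sources(cond, param)
    return next(iter(vals)) if vals and len(vals) == 1 else None  # None = ⊥

# Example 2: (d_j = 3 or d_j = 5) and d_j = 3 and d_k = 10
c = ("and", ("and", ("or", ("eq", "d_j", 3), ("eq", "d_j", 5)),
             ("eq", "d_j", 3)), ("eq", "d_k", 10))
assert source(c, "d_j") == 3
```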
We define the destination of a parameter dj for a summand i to be the unique value dj has after taking summand i. Again, we only specify a minimal requirement.

Definition 3 (Destination). A function f : I × (dj : J) → Dj ∪ {⊥} is a destination function if, for every i ∈ I, dj ∈ J, and s ∈ Dj, f(i, dj) = s implies ∀v ∈ D, ei ∈ Ei · ci(v, ei) =⇒ gi,j(v, ei) = s. Furthermore, f(i, dj) = ⊥ is always allowed, indicating that no unique destination value could be derived. In the following we assume the existence of a destination function dest.
Our heuristics for computing dest(i, dj) simply substitute source(i, dj) for dj in the next-state function of summand i, and try to rewrite the result to a closed term.

Example 3. Let ci(d, ei) be given by dj = 8 and gi,j(d, ei) by dj + 5; then dest(i, dj) = 13 is valid, but so is (as always) dest(i, dj) = ⊥. If, for instance, ci(d, ei) = (dj = 5) and gi,j(d, ei) = e3, then dest(i, dj) can only yield ⊥, since the value of dj after taking i is not fixed.
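In the same illustrative style as the previous sketch, the dest heuristic can be read as substitute-then-constant-fold; any remaining free variable (such as the summation variable e3) means no closed term exists, hence ⊥:

```python
# A toy sketch of the dest heuristic. The term AST (Var/Add) is
# invented for illustration only.
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

def fold(term, env):
    """Evaluate `term` under `env`; None if any variable stays unbound."""
    if isinstance(term, Var):
        return env.get(term.name)
    if isinstance(term, Add):
        l, r = fold(term.left, env), fold(term.right, env)
        return None if l is None or r is None else l + r
    return term                      # a literal constant

def dest(next_term, var, src):
    return None if src is None else fold(next_term, {var: src})

assert dest(Add(Var("dj"), 5), "dj", 8) == 13   # first case of Example 3
assert dest(Var("e3"), "dj", 5) is None          # second case: not fixed
```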
We say that a parameter rules a summand if both its source and its destination for that summand can be computed.

Definition 4 (Rules). A parameter dj rules a summand i if source(i, dj) ≠ ⊥ and dest(i, dj) ≠ ⊥. The set of all summands that dj rules is denoted by Rdj = { i ∈ I | dj rules i }. Furthermore, Vdj denotes the set of all possible values that dj can take before and after taking one of the summands it rules, plus its initial value. Formally, Vdj = { source(i, dj) | i ∈ Rdj } ∪ { dest(i, dj) | i ∈ Rdj } ∪ { initj }.

Examples will show that summands can be ruled by several parameters. We now define a parameter to be a control flow parameter if it rules all summands in which it is changed. Stated differently, in every summand a control flow parameter is either left alone or we know what happens to it. Such a parameter can be seen as a program counter for the summands it rules, and therefore its values can be seen as locations. All other parameters are called data parameters.

Definition 5 (Control flow parameters). A parameter dj is a control flow parameter (CFP) if for all i ∈ I, either dj rules i, or dj is unchanged in i. A parameter that is not a CFP is called a data parameter (DP). The set of all summands i ∈ I such that dj rules i is called the cluster of dj. The set of all CFPs is denoted by C, the set of all DPs by D.

Example 4. Consider the LPE of Example 1 again. For the first summand we may define source(1, a) = 1 and dest(1, a) = 2. Therefore, parameter a rules the first summand. Similarly, it rules the third summand. As a is unchanged in the second summand, it is a CFP (with summands 1 and 3 in its cluster). In the same way, we can show that parameter b is a CFP ruling summands 2 and 3. Parameter x is a DP, as it is changed in summand 1 while both its source and its destination are not unique. From summand 3 it follows that y is a DP.

Based on CFPs, we can define control flow graphs. The nodes of the control flow graph of a CFP dj are the values dj can take, and the edges denote possible transitions. Specifically, an edge labelled i from value s to t denotes that summand i might be taken if dj = s, resulting in dj = t.

Definition 6 (Control flow graphs). Let dj be a CFP; then the control flow graph for dj is the tuple (Vdj, Edj), where Vdj was given in Definition 4, and Edj = { (s, i, t) | i ∈ Rdj ∧ s = source(i, dj) ∧ t = dest(i, dj) }.
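Once the source and dest tables have been filled in, Definition 6 is directly computable. A minimal sketch under that assumption (the initial value, also part of Vdj by Definition 4, is omitted for brevity):

```python
# A sketch of Definition 6, assuming the source/dest tables were filled
# in by heuristics like the ones sketched above.
def control_flow_graph(dj, ruled, source, dest):
    """Nodes and edges of the control flow graph of CFP dj."""
    edges = {(source[i, dj], i, dest[i, dj]) for i in ruled}
    nodes = {s for (s, _, _) in edges} | {t for (_, _, t) in edges}
    return nodes, edges

# Example 1's parameter a: summand 1 moves 1 -> 2, summand 3 moves 2 -> 1.
src = {(1, "a"): 1, (3, "a"): 2}
dst = {(1, "a"): 2, (3, "a"): 1}
print(control_flow_graph("a", [1, 3], src, dst))
# ({1, 2}, {(1, 1, 2), (2, 3, 1)})
```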
[Fig. 1 shows the two control flow graphs. For a (left): nodes a = 1 and a = 2, an edge labelled (1) from a = 1 to a = 2, and an edge labelled (3) back. For b (right): nodes b = 1 and b = 2, an edge labelled (3) from b = 1 to b = 2, and an edge labelled (2) back.]

Fig. 1. Control flow graphs for the LPE of Example 1
Figure 1 shows the control flow graphs for the LPE of Example 1. The next proposition states that if a CFP dj rules a summand i, and i is enabled for some state vector v = (v1 , . . . , vj , . . . , vn ) and local variable vector ei , then the control flow graph of dj contains an edge from vj to gi,j (v, ei ). Proposition 1. Let dj be a CFP, v a state vector, and ei a local variable vector. Then, if dj rules i and ci (v, ei ) holds, it follows that (vj , i, gi,j (v, ei )) ∈ Edj . Note that we reconstruct a local control flow graph per CFP, rather than a global control flow graph. Although global control flow might be useful, its graph can grow larger than the complete state space, completely defeating its purpose.
4  Simultaneous Data Flow Analysis
Using the notion of CFPs, we analyse to which clusters DPs belong. Definition 7 (The belongs-to relation). Let dk be a DP and dj a CFP, then dk belongs to dj if all summands i ∈ I that use or change dk are ruled by dj . We assume that each DP belongs to at least one CFP, and define CFPs to not belong to anything. Note that the assumption above can always be satisfied by adding a dummy parameter b of type Bool to every summand, initialising it to true, adding b = true to every ci , and leaving b unchanged in all gi . Also note that the fact that a DP dk belongs to a CFP dj implies that the complete data flow of dk is contained in the summands of the cluster of dj . Therefore, all decisions on resetting dk can be made based on the summands within this cluster. Example 5. For the LPE of the previous example, x belongs to a, and y to b. If a DP dk belongs to a CFP dj , it follows that all analyses on dk can be made by the cluster of dj . We begin these analyses by defining for which values of dj (so during which part of the cluster’s control flow) the value of dk is relevant. Basically, dk is relevant if it might be directly used before it will be changed, otherwise it is irrelevant. More precisely, the relevance of dk is divided into three conditions. They state that dk is relevant given that dj = s, if there is a
summand i that can be taken when dj = s, such that either (1) dk is directly used in i; or (2, 3) dk is indirectly used in i to determine the value of a DP that is relevant after taking i. Basically, clause (2) deals with temporal dependencies within one cluster, whereas (3) deals with dependencies through concurrency between different clusters. The next definition formalises this.

Definition 8 (Relevance). Let dk ∈ D and dj ∈ C, such that dk belongs to dj. Given some s ∈ Dj, we use (dk, dj, s) ∈ R (or R(dk, dj, s)) to denote that the value of dk is relevant when dj = s. Formally, R is the smallest relation such that

1. If dk is directly used in some i ∈ I, dk belongs to some dj ∈ C, and s = source(i, dj), then R(dk, dj, s);
2. If R(dl, dj, t), and there exists an i ∈ I such that (s, i, t) ∈ Edj, and dk belongs to dj, and dk ∈ pars(gi,l(d, ei)), then R(dk, dj, s);
3. If R(dl, dp, t), and there exists an i ∈ I and an r such that (r, i, t) ∈ Edp, and dk ∈ pars(gi,l(d, ei)), and dk belongs to some cluster dj to which dl does not belong, and s = source(i, dj), then R(dk, dj, s).

If (dk, dj, s) ∉ R, we write ¬R(dk, dj, s) and say that dk is irrelevant when dj = s. Although it might seem that the second and third clause could be merged, we provide an example in [20] where this would decrease the number of reductions.

Example 6. Applying the first clause of the definition of relevance to the LPE of Example 1, we see that R(x, a, 2) and R(y, b, 2). Then, no clauses apply anymore, so ¬R(x, a, 1) and ¬R(y, b, 1). Now, we hide the action c, obtaining

X(a : {1, 2}, b : {1, 2}, x : D, y : D) =
    ∑d:D a = 1    ⇒ read(d) · X(2, b, d, y)    (1)
  + b = 2         ⇒ write(y) · X(a, 1, x, y)   (2)
  + a = 2 ∧ b = 1 ⇒ τ · X(1, 2, x, x)          (3)
In this case, the first clause of relevance only yields R(y, b, 2). Moreover, since x is used in summand 3 to determine the value that y will have when b becomes 2, also R(x, a, 2). Formally, this can be found using the third clause, substituting l = y, p = b, t = 2, i = 3, r = 1, k = x, j = a, and s = 2.

Since clusters have only limited information, they do not always detect a DP's irrelevance. However, they always have sufficient information to never erroneously find a DP irrelevant. Therefore, we define a DP dk to be relevant given a state vector v if it is relevant for the valuations of all CFPs dj it belongs to.

Definition 9 (Relevance in state vectors). The relevance of a parameter dk given a state vector v, denoted Relevant(dk, v), is defined by

Relevant(dk, v) = ⋀{dj ∈ C | dk belongs to dj} R(dk, dj, vj).
Note that, since a CFP belongs to no other parameters, it is always relevant.
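Since R is a smallest relation, it can be computed by saturation. The following Python sketch (our own illustration, not the tool's implementation) restricts attention to clauses (1) and (2) for brevity; clause (3), which crosses cluster boundaries, would propagate along the edges of other clusters inside the same loop:

```python
# A sketch of the relevance fixpoint of Definition 8, clauses (1)-(2).
# All inputs (cluster edges, belongs-to map, directly-used sets, the
# parameters occurring in each next-state function, and the source
# table) are assumed to have been precomputed.
def relevance(edges, belongs_to, directly_used, next_pars, source):
    R = set()
    # clause (1): a directly used parameter is relevant at each source
    for i, used in directly_used.items():
        for dk in used:
            for dj in belongs_to.get(dk, ()):
                s = source.get((i, dj))
                if s is not None:
                    R.add((dk, dj, s))
    # clause (2): propagate backwards through each cluster until stable
    changed = True
    while changed:
        changed = False
        for dj, cluster_edges in edges.items():
            for (s, i, t) in cluster_edges:
                for (dl, dj2, t2) in list(R):
                    if dj2 != dj or t2 != t:
                        continue
                    for dk in next_pars.get((i, dl), ()):
                        if dj in belongs_to.get(dk, ()) and (dk, dj, s) not in R:
                            R.add((dk, dj, s))
                            changed = True
    return R

# Example 6 under clauses (1)-(2) only: y is relevant when b = 2; the
# missing R(x, a, 2) is exactly what the omitted clause (3) would add.
R = relevance(
    edges={"a": {(1, 1, 2), (2, 3, 1)}, "b": {(1, 3, 2), (2, 2, 1)}},
    belongs_to={"x": {"a"}, "y": {"b"}},
    directly_used={2: {"y"}},
    next_pars={(3, "y"): {"x"}},
    source={(1, "a"): 1, (3, "a"): 2, (2, "b"): 2, (3, "b"): 1},
)
assert R == {("y", "b", 2)}
```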
Example 7. For the LPE of the previous example we derived that x belongs to a, and that it is irrelevant when a = 1. Therefore, the valuation x = d5 is not relevant in the state vector v = (1, 2, d5, d2), so we write ¬Relevant(x, v).

Obviously, the value of a DP that is irrelevant in a state vector does not matter. For instance, v = (w, x, y) and v′ = (w, x′, y) are equivalent if ¬Relevant(d2, v). To formalise this, we introduce a relation ≅ on state vectors, given by

v ≅ v′ ⇐⇒ ∀dk ∈ J : (Relevant(dk, v) =⇒ vk = v′k),

and prove that it is a strong bisimulation; one of the main results of this paper.
Theorem 1. The relation ≅ is a strong bisimulation.
Proof (sketch). It is easy to see that ≅ is an equivalence relation (1). Then, it can be proven that if a summand i is enabled given a state vector v, it is also enabled given a state vector v′ such that v ≅ v′ (2). Finally, it can be shown that if a summand i is taken given v, its action is identical to when i is taken given v′ (3), and their next-state vectors are equivalent according to ≅ (4).

Now, let v0 and v′0 be state vectors such that v0 ≅ v′0. Also, assume that v0 -a-> v1. By (1) ≅ is symmetric, so we only need to prove that a transition v′0 -a-> v′1 exists such that v1 ≅ v′1. By the operational semantics there is a summand i and a local variable vector ei such that ci(v0, ei) holds, a = ai(v0, ei), and v1 = gi(v0, ei). Now, by (2) we know that ci(v′0, ei) holds, and by (3) that a = ai(v′0, ei). Therefore, v′0 -a-> gi(v′0, ei). Using (4) we get gi(v0, ei) ≅ gi(v′0, ei), proving the theorem.
5  Transformations on LPEs
The most important application of the data flow analysis described in the previous section is to reduce the number of reachable states of the LTS underlying an LPE. Note that by modifying irrelevant parameters in an arbitrary way, this number could even increase. We present a syntactic transformation of LPEs, and prove that it yields a strongly bisimilar system and can never increase the number of reachable states. In several practical examples, it yields a decrease. Our transformation uses the idea that a data parameter dk that is irrelevant in all possible states after taking a summand i can just as well be reset by i to its initial value.

Definition 10 (Transforms). Given an LPE X of the familiar form, we define its transform to be the LPE X′ given by

X′(d : D) = ∑i∈I ∑ei:Ei ci(d, ei) ⇒ ai(d, ei) · X′(g′i(d, ei)),

with

g′i,k(d, ei) = gi,k(d, ei)   if dk ∈ C ∨ ⋀{dj ∈ C | dj rules i and dk belongs to dj} R(dk, dj, dest(i, dj)),
g′i,k(d, ei) = initk        otherwise.
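Read operationally, the new next-state function copies the old one, except that a data parameter is reset to its initial value unless it is relevant after the summand in every cluster ruling it. A Python sketch under the assumption that R, dest, the clusters and the belongs-to map have been computed as in the earlier sketches:

```python
# A sketch of Definition 10; all auxiliary tables are assumed given.
def transform_next_state(i, g_i, params, cfps, rules, belongs_to, dest, R, init):
    def g_prime(d, e):
        nxt = g_i(d, e)                  # the original next state
        out = {}
        for dk in params:
            ruling = [dj for dj in cfps
                      if i in rules[dj] and dj in belongs_to.get(dk, ())]
            # keep the old value for CFPs, and for DPs that stay relevant
            keep = dk in cfps or all((dk, dj, dest[i, dj]) in R
                                     for dj in ruling)
            out[dk] = nxt[dk] if keep else init[dk]
        return out
    return g_prime
```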
We will use the notation X(v) to denote state v in the underlying LTS of X, and X′(v) to denote state v in the underlying LTS of X′. Note that g′i(d, ei) only deviates from gi(d, ei) for parameters dk that are irrelevant after taking i, as stated by the following lemma.

Lemma 1. For every i ∈ I, state vector v, and local variable vector ei, given that ci(v, ei) = true it holds that gi(v, ei) ≅ g′i(v, ei).

Using this lemma we show that X(v) and X′(v) are bisimilar, by first proving an even stronger statement.

Theorem 2. Let ≈ be defined by

X(v) ≈ X′(v′) ⇐⇒ v ≅ v′,

then ≈ is a strong bisimulation. The relation ≅ is used as it was defined for X.

Proof. Let v0 and v′0 be state vectors such that X(v0) ≈ X′(v′0), so v0 ≅ v′0. Assume that X(v0) -a-> X(v1). We need to prove that there exists a transition X′(v′0) -a-> X′(v′1) such that X(v1) ≈ X′(v′1). By Theorem 1 there exists a state vector v′1 such that X(v′0) -a-> X(v′1) and v1 ≅ v′1. By the operational semantics, for some i and ei we thus have ci(v′0, ei), ai(v′0, ei) = a, and gi(v′0, ei) = v′1. By Definition 10, we have X′(v′0) -a-> X′(g′i(v′0, ei)), and by Lemma 1, gi(v′0, ei) ≅ g′i(v′0, ei). Now, by transitivity and reflexivity of ≅ (statement (1) of the proof of Theorem 1), v1 ≅ v′1 = gi(v′0, ei) ≅ g′i(v′0, ei), hence X(v1) ≈ X′(g′i(v′0, ei)). By symmetry of ≅, this completes the proof.

The following corollary, stating the desired bisimilarity, immediately follows.

Corollary 1. Let X be an LPE, X′ its transform, and v a state vector. Then, X(v) is strongly bisimilar to X′(v).

We now show that our choice of g′i(d, ei) ensures that the state space of X′ is at most as large as the state space of X. We first give the invariant that if a parameter is irrelevant for a state vector, it is equal to its initial value.

Proposition 2. For X′(init) invariably ¬Relevant(dk, v) implies vk = initk.

Using this invariant it is possible to prove the following lemma, providing a functional strong bisimulation relating the states of X(init) and X′(init).

Lemma 2. Let h be a function over state vectors, such that for any v ∈ D it is given by hk(v) = vk if Relevant(dk, v), and by hk(v) = initk otherwise. Then, h is a strong bisimulation relating the states of X(init) and X′(init).

Since the bisimulation relation is a function, and the domain of every function is at least as large as its image, the following corollary is immediate.
Corollary 2. The number of reachable states in X′ is at most as large as the number of reachable states in X.

Example 8. Using the above transformation, the LPE of Example 6 becomes

X′(a : {1, 2}, b : {1, 2}, x : D, y : D) =
    ∑d:D a = 1    ⇒ read(d) · X′(2, b, d, y)    (1)
  + b = 2         ⇒ write(y) · X′(a, 1, x, d1)   (2)
  + a = 2 ∧ b = 1 ⇒ τ · X′(1, 2, d1, x)          (3)

assuming that the initial state vector is (1, 1, d1, d1). Note that for X′ the state (1, 1, di, dj) is only reachable for di = dj = d1, whereas in the original specification X it is reachable for all di, dj ∈ D such that di = dj.
6  Case Studies
The proposed method has been implemented in the context of the μCRL toolkit by a tool called stategraph. For evaluation purposes we first applied it to a model of a handshake register, modelled and verified by Hesselink [15]. We used a MacBook with a 2.4 GHz Intel Core 2 Duo processor and 2 GB memory. A handshake register is a data structure that is used for communication between a single reader and a single writer. It guarantees recentness and sequentiality: any value that is read was, at some point during the read action, the last value written, and the values of sequential reads occur in the same order as they were written. It is also wait-free: both the reader and the writer can complete their actions in a bounded number of steps, independent of the other process. Hesselink provides a method to construct a handshake register of a certain data type based on four so-called safe registers and four atomic boolean registers. We used a μCRL model of the handshake register, and one of the implementation using four safe registers. We generated their state spaces, minimised, and indeed obtained identical LTSs, showing that the implementation is correct. However, using a data type D of three values the state space before minimisation is already very large, so that its generation is quite time-consuming. We therefore applied stategraph (in combination with the existing μCRL tool constelm [11]) to reduce the LPE for different sizes of D. For comparison we also reduced the specifications in the same way using the existing, less powerful tool parelm. For each specification we measured the time for reducing its LPE and generating the state space. We also used a recently implemented tool for symbolic reachability analysis [6] (available from http://fmt.cs.utwente.nl/tools/ltsmin) to obtain the state spaces when not using stategraph, since in that case not all specifications could be generated explicitly. Every experiment was performed ten times; the average run times are shown in Table 1, where x:y.z means x minutes and y.z seconds.
Table 1. Modelling a handshake register; parelm versus stategraph

             constelm | parelm | constelm        constelm | stategraph | constelm
                  states     time      time          states      time      time
                            (expl.)   (symb.)                  (expl.)   (symb.)
|D| = 2          540,736    0:23.0    0:04.5         45,504    0:02.4    0:01.3
|D| = 3       13,834,800   10:10.3    0:06.7        290,736    0:12.7    0:01.4
|D| = 4      142,081,536         –    0:09.0      1,107,456    0:48.9    0:01.6
|D| = 5      883,738,000         –    0:11.9      3,162,000    2:20.3    0:01.8
|D| = 6    3,991,840,704         –    0:15.4      7,504,704    5:26.1    0:01.9
Observations. The results show that stategraph provides a substantial reduction of the state space. Using parelm, explicit generation was infeasible with just four data elements (after sixteen hours about half of the states had been generated), whereas using stategraph we could easily continue until six elements. Note that the state space reduction for |D| = 6 was more than a factor 500. Also observe that stategraph is impressively useful for speeding up symbolic analysis, as the time for symbolic generation improves by an order of magnitude. To gain an understanding of why our method works for this example, observe the μCRL specification of the four safe registers below.

Y(i : Bool, j : Bool, r : {1, 2, 3}, w : {1, 2, 3}, v : D, vw : D, vr : D) =
    r = 1              ⇒ beginRead(i, j) · Y(i, j, 2, w, v, vw, vr)      (1)
  + r = 2 ∧ w = 1      ⇒ τ · Y(i, j, 3, w, v, vw, v)                     (2)
  + ∑x:D r = 2 ∧ w ≠ 1 ⇒ τ · Y(i, j, 3, w, v, vw, x)                     (3)
  + r = 3              ⇒ endRead(i, j, vr) · Y(i, j, 1, w, v, vw, vr)    (4)
  + ∑x:D w = 1         ⇒ beginWrite(i, j, x) · Y(i, j, r, 2, v, x, vr)   (5)
  + w = 2              ⇒ τ · Y(i, j, r, 3, vw, vw, vr)                   (6)
  + w = 3              ⇒ endWrite(i, j) · Y(i, j, r, 1, vw, vw, vr)      (7)

The boolean parameters i and j are just meant to distinguish the four components. The parameter r denotes the read status, and w the write status. Reading consists of a beginRead action, a τ step, and an endRead action. During the τ step either the contents of v is copied into vr, or, when writing is taking place at the same time, a random value is copied to vr. Writing works by first storing the value to be written in vw, and then copying vw to v.

The tool discovered that after summand 4 the value of vr is irrelevant, since it will not be used before summand 4 is reached again. This is always preceded by summand 2 or 3, both overwriting vr. Thus, vr can be reset to its initial value in the next-state function of summand 4. This turned out to drastically decrease the size of the state space. Other tools were not able to make this reduction, since it requires control flow reconstruction. Note that using parallel processes for the reader and the writer instead of our solution of encoding control flow in the data parameters would be difficult, because of the shared variable v.
Table 2. Modelling several specifications; parelm versus stategraph

                 constelm | parelm | constelm         constelm | stategraph | constelm
specification     time      states   summands pars     time      states   summands pars
bke               0:47.9      79,949      50    31     0:48.3      79,949      50    21
ccp33                  –           –    1082    97          –           –     807    94
onebit            0:25.1     319,732      30    26     0:21.4     269,428      30    26
AIDA-B            7:50.1   3,500,040      89    35     7:11.9   3,271,580      89    32
AIDA              0:40.1     318,682      85    35     0:30.8     253,622      85    32
ccp221            0:28.3      76,227     562    63     0:25.6      76,227     464    62
locker            1:43.3     803,830      88    72     1:32.9     803,830      88    19
swp32             0:11.7     156,900      13    12     0:11.8     156,900      13    12
Although the example may seem artificial, it is an almost one-to-one formalisation of its description in [15]. Without our method for control flow reconstruction, finding this useful variable reset could not be done automatically.

Other specifications. We also applied stategraph to all the example specifications of μCRL, and to five from industry: two versions of an Automatic In-flight Data Acquisition unit for a helicopter of the Dutch Royal Navy [9]; a cache coherence protocol for a distributed JVM [18]; an automatic translation from Erlang to μCRL of a distributed resource locker in Ericsson's AXD 301 switch [2]; and the sliding window protocol (with three data elements and window size two) [3]. The same analysis as before was performed, but now also counting the number of summands and parameters of the reduced LPEs. Decreases of these quantities are due to stategraph resetting variables to their initial value, which may turn them into constants and allow them to be removed. As a side effect, some summands might be removed, as their enabling condition is shown never to be satisfied. These effects provide a syntactic cleanup and speed up state space generation, as seen for instance in the ccp221 and locker specifications. The reductions obtained are shown in Table 2; values that differ significantly are listed in boldface.

Not all example specifications benefited from stategraph (these are omitted from the table). This is partly because parelm already performs a rudimentary variant of our method, and also because the lineariser removes parameters that are syntactically out of scope. However, although optimising LPEs has been a research focus for years, stategraph could still reduce some of the standard examples. Especially for the larger, industrial specifications, reductions were obtained in the state space, but also in the number of summands and parameters of the linearised form. Both are shown to speed up state space generation, proving stategraph to be a valuable addition to the μCRL toolkit.
7  Conclusions and Future Work
We presented a novel method for reconstructing the control flow of linear processes. This information is used for data flow analysis, aiming at state space reduction by resetting variables that are irrelevant given a certain state. We
introduced a transformation and proved both that it preserves strong bisimilarity and that it never increases the state space. The reconstruction process enables us to interpret some variables as program counters; something other tools are not able to do. Case studies using our implementation stategraph showed that although for some small academic examples the existing tools already suffice, impressive state space reductions can be obtained for larger, industrial systems. Since we work on linear processes, these reductions are obtained before the entire state space is generated, saving valuable time. Surprisingly, a recently implemented symbolic tool for μCRL also profits considerably from stategraph.

As future work it would be interesting to find additional applications for the reconstructed control flow. One possibility is to use it for invariant generation; another (already implemented) is to visualise it so that process structure can be understood better. It might also be used to optimise confluence checking [5], since it could assist in determining which pairs of summands may be confluent. Another direction for future work is based on the insight that the control flow graph is an abstraction of the state space. It could be investigated whether other abstractions, such as a control flow graph that also contains the values of important data parameters, might result in a more accurate data flow analysis.

Acknowledgements. We thank Jan Friso Groote for his specification of the handshake register, upon which our model has been based. Furthermore, we thank Michael Weber for fruitful discussions about Hesselink's protocol.
References

[1] Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading (1986)
[2] Arts, T., Earle, C.B., Derrick, J.: Verifying Erlang code: A resource locker case study. In: Eriksson, L.-H., Lindsay, P.A. (eds.) FME 2002. LNCS, vol. 2391, pp. 184–203. Springer, Heidelberg (2002)
[3] Badban, B., Fokkink, W., Groote, J.F., Pang, J., van de Pol, J.: Verification of a sliding window protocol in μCRL and PVS. Formal Aspects of Computing 17(3), 342–388 (2005)
[4] Bezem, M., Groote, J.F.: Invariants in process algebra with data. In: Jonsson, B., Parrow, J. (eds.) CONCUR 1994. LNCS, vol. 836, pp. 401–416. Springer, Heidelberg (1994)
[5] Blom, S., van de Pol, J.: State space reduction by proving confluence. In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, pp. 596–609. Springer, Heidelberg (2002)
[6] Blom, S., van de Pol, J.: Symbolic reachability for process algebras with recursive data types. In: Fitzgerald, J.S., Haxthausen, A.E., Yenigun, H. (eds.) ICTAC 2008. LNCS, vol. 5160, pp. 81–95. Springer, Heidelberg (2008)
[7] Bozga, M., Fernandez, J.-C., Ghirvu, L.: State space reduction based on live variables analysis. In: Cortesi, A., Filé, G. (eds.) SAS 1999. LNCS, vol. 1694, pp. 164–178. Springer, Heidelberg (1999)
[8] Chandy, K.M., Misra, J.: Parallel program design: a foundation. Addison-Wesley, Reading (1988)
[9] Fokkink, W., Ioustinova, N., Kesseler, E., van de Pol, J., Usenko, Y.S., Yushtein, Y.A.: Refinement and verification applied to an in-flight data acquisition unit. In: Brim, L., Jančar, P., Křetínský, M., Kučera, A. (eds.) CONCUR 2002. LNCS, vol. 2421, pp. 1–23. Springer, Heidelberg (2002)
[10] Garavel, H., Serwe, W.: State space reduction for process algebra specifications. Theoretical Computer Science 351(2), 131–145 (2006)
[11] Groote, J.F., Lisser, B.: Computer assisted manipulation of algebraic process specifications. Technical report, SEN-R0117, CWI (2001)
[12] Groote, J.F., Ponse, A.: The syntax and semantics of μCRL. In: Proc. of the 1st Workshop on the Algebra of Communicating Processes (ACP 1994), pp. 26–62. Springer, Heidelberg (1994)
[13] Groote, J.F., Ponse, A., Usenko, Y.S.: Linearization in parallel pCRL. Journal of Logic and Algebraic Programming 48(1-2), 39–72 (2001)
[14] Groote, J.F., van de Pol, J.: State space reduction using partial τ-confluence. In: Nielsen, M., Rovan, B. (eds.) MFCS 2000. LNCS, vol. 1893, pp. 383–393. Springer, Heidelberg (2000)
[15] Hesselink, W.H.: Invariants for the construction of a handshake register. Information Processing Letters 68(4), 173–177 (1998)
[16] Lynch, N., Tuttle, M.: An introduction to input/output automata. CWI Quarterly 2(3), 219–246 (1989)
[17] Milner, R.: Communication and Concurrency. Prentice-Hall, Englewood Cliffs (1989)
[18] Pang, J., Fokkink, W., Hofman, R.F.H., Veldema, R.: Model checking a cache coherence protocol of a Java DSM implementation. Journal of Logic and Algebraic Programming 71(1), 1–43 (2007)
[19] Usenko, Y.S.: Linearization in μCRL. PhD thesis, Eindhoven University (2002)
[20] van de Pol, J., Timmer, M.: State space reduction of linear processes using control flow reconstruction (extended version). Technical report, TR-CTIT-09-24, CTIT, University of Twente (2009)
[21] Winters, B.D., Hu, A.J.: Source-level transformations for improved formal verification. In: Proc. of the 18th IEEE Int. Conference on Computer Design (ICCD 2000), pp. 599–602 (2000)
[22] Yorav, K., Grumberg, O.: Static analysis for state-space reductions preserving temporal logics. Formal Methods in System Design 25(1), 67–96 (2004)
A Data Symmetry Reduction Technique for Temporal-epistemic Logic

Mika Cohen¹, Mads Dam², Alessio Lomuscio¹, and Hongyang Qu³

¹ Department of Computing, Imperial College London, UK
² Access Linnaeus Center, Royal Institute of Technology, Sweden
³ Oxford University Computing Laboratory, UK
Abstract. We present a data symmetry reduction approach for model checking temporal-epistemic logic. The technique abstracts the epistemic indistinguishability relation for the knowledge operators, and is shown to preserve temporal-epistemic formulae. We show a method for statically detecting data symmetry in an ISPL program, the input to the temporal-epistemic model checker MCMAS. The experiments we report show an exponential saving in verification time and space while verifying security properties of the NSPK protocol.
1  Introduction
Abstraction by data symmetry reduction [1] is one of the techniques put forward to tackle the state-explosion problem in model checking reactive systems. While the effectiveness of the methodology is well understood in the context of temporal logics, this is not the case for richer logics. Specifically, no analysis has been conducted so far in the context of temporal-epistemic logic [2]. This seems unsatisfactory, as efficient symbolic checkers for epistemic languages have recently been put forward [3,4,5] and the usefulness of temporal-epistemic specifications has been demonstrated in a number of application-critical scenarios including web-services [6], automatic fault-detection [7], and security [8].

The models for the applications above display large numbers of initial states, often resulting from randomisation of data parameters (such as nonces and messages), exacerbating further the problem of checking large models. In such models, as is known, a group of computation paths may well be the same up to renaming of the data parameters in question. While in pure temporal logic we can safely collect representatives of these traces and conduct our checks only on those, this is not immediately possible in the presence of temporal-epistemic specifications. In fact, as we recall below, in these frameworks the epistemic operators are defined by analysing all possible global states in which the local component for the agent in question is the same, even if these belong to different computation paths. Because of this, simply collapsing traces would make some epistemic specifications satisfied in the abstract model even if they were not in the concrete one.

In this paper we show that an alternative methodology can be defined to solve this problem. Specifically, we show how we can still reduce the number of
initial states to be considered by using an "abstracted" version of the epistemic relations for the agents in the system. We show this reduction is sound and complete for temporal-epistemic specifications, in the sense that no false positives or false negatives are found in the reduced model. We also show how to compute the abstract epistemic relations efficiently, and how to statically detect data symmetry in the input to the temporal-epistemic model checker MCMAS [5] by means of scalarset annotations [1]. The experiments we report on a prototype extension to MCMAS show an exponential reduction in time and space for the verification of the security protocol NSPK.

Related work. Data symmetry reduction is a known abstraction technique aiming to collapse states that are equivalent up to a renaming of data, thereby yielding a bisimilar quotient model [1]. There has been no attempt in the literature to extend data symmetry reduction from temporal logic to epistemic logic. Indeed, while abstraction for temporal properties is a well-established research area, abstraction for epistemic properties has only recently begun to receive some attention. In [9,10], Kripke models for epistemic logic are abstracted by approximating the epistemic relations. However, the models are not computationally grounded, which hampers concrete applications [11]. In [12], computationally grounded systems are abstracted by collapsing local states and actions of agents. Closer to our contribution, [13] gives a technique for component symmetry reduction [14,15] not too dissimilar from the data symmetry reduction technique in this paper. Indeed, Theorem 1 in this paper has a close analogue in [13], although the semantics for the epistemic modality is abstracted there into a counterpart semantics [16]. Beyond this, the main contribution in this paper is to address abstraction and symmetry detection in terms of concrete models represented in the MCMAS model checker. Specifically, we introduce a symbolic extension of MCMAS for which a syntactic criterion can be given that guarantees data symmetry, and we compute the abstract semantics without quantifying over permutations. This allows significantly improved savings in relation to [13].

Overview of paper. The rest of the paper is organised as follows. In Section 2 we review the interpreted systems framework, the temporal-epistemic logic CTLK, and the model checker MCMAS. In Section 3 we present the data symmetry reduction technique for CTLK properties of interpreted systems. In Section 4 we show how to detect data symmetry in an interpreted system description in the input language of MCMAS. In Section 5 we report on experimental results for a prototype extension to MCMAS. Finally, Section 6 concludes.
2  Interpreted Systems, CTLK, and MCMAS
We model multi-agent systems in the mainstream interpreted systems framework [2] and express system requirements in the temporal-epistemic logic CTLK [17]; this section summarises the basic definitions. More details can be found in [2]. We also describe MCMAS [5], a model checker for the verification of CTLK properties of interpreted systems.
Interpreted systems. Consider a set Ag = {1, ..., n} of agents. For each agent i, assume a non-empty set Li of local states that agent i can be in, and a non-empty set ACTi of actions that agent i can perform. Assume also a non-empty set LEnv of states for the environment and a non-empty set ACTEnv of actions for the environment. Let S = L1 × · · · × Ln × LEnv be the set of all possible global states and ACT = ACT1 × · · · × ACTn × ACTEnv the set of all possible joint actions. For each agent i assume a local protocol Pi : Li → 2^ACTi selecting actions depending on the local state of i, and a local evolution function ti : Li × ACT → Li specifying how agent i evolves from one local state to another depending on its action, the actions of the other agents, and the action of the environment. Analogously, assume an environment protocol PEnv : LEnv → 2^ACTEnv and an environment evolution function tEnv : LEnv × ACT → LEnv. Let P = ⟨P1, ..., Pn, PEnv⟩ be the joint protocol and t = ⟨t1, ..., tn, tEnv⟩ the joint evolution function. Finally, consider a non-empty set I0 ⊆ S of initial states, and an evaluation function V : A → 2^S for some non-empty set A of propositional atoms.

Definition 1 (Interpreted system). An interpreted system is a tuple I = ⟨S, ACT, P, t, I0, V⟩ with a set S of possible global states, a set ACT of possible joint actions, a joint protocol P, a joint evolution function t, a set I0 of initial states, and an evaluation function V.

For any global state g = ⟨l1, ..., ln, lEnv⟩ ∈ S, we write gi for the local state li of agent i in g, and gEnv for the environment state lEnv in g. The local protocols and the local evolution functions together determine how the system of agents proceeds from one global state to the next. The global transition relation R ⊆ S × S is such that ⟨g, g′⟩ ∈ R if and only if there exists a = ⟨a1, ..., an, aEnv⟩ ∈ ACT such that for all i ∈ Ag ∪ {Env}, ti(gi, a) = g′i and ai ∈ Pi(gi). We assume throughout the paper that the global transition relation R is serial, i.e., for every g ∈ S there is a g′ ∈ S such that gRg′. A path in I is an infinite sequence g⁰, g¹, ... of global states in S such that each pair of adjacent states forms a transition, i.e., gʲRgʲ⁺¹ for all j. The set G of reachable states in I contains all global states g ∈ S for which there is a path g⁰, g¹, ..., g, ... starting from some g⁰ ∈ I0.

Intuitively, the local state gi contains all the information available to agent i: if gi = g′i, then global state g could, for all agent i can tell, be global state g′. This observation can be used to employ a knowledge modality defined on the relation given by equality of the local components [2]:

Definition 2 (Epistemic relation). The epistemic indistinguishability relation ∼i ⊆ G × G for agent i is such that g ∼i g′ iff gi = g′i.

Computation Tree Logic of Knowledge. We consider specifications in the temporal-epistemic logic CTLK, which extends CTL with epistemic modalities.

Definition 3 (CTLK). Assume a set Ag = {1, ..., n} of agents i and a non-empty set A of propositional atoms p. CTLK formulae are defined by the expression:

φ ::= p | ¬φ | φ ∧ φ | Ki φ | EXφ | EGφ | E(φ U φ)
The knowledge modality Ki is read "agent i knows that", the quantifier E is read "for some computation path", and the temporal operators X, G and U are read "in the next state", "always" and "until" respectively. We assume the customary abbreviations: the diamond epistemic modality encodes ¬Ki¬; AG φ represents ¬E(true U ¬φ) ("for all paths, always φ"); AF φ abbreviates ¬EG ¬φ ("for all paths, eventually φ").

Satisfaction of the language above with respect to interpreted systems is defined as standard. Specifically, the knowledge modality Ki is evaluated on an interpreted system I at a point g by means of Definition 2 as follows:

– (I, g) |= Ki φ iff for all g′ such that g ∼i g′ we have that (I, g′) |= φ.

The CTL modalities are interpreted on the serial paths generated by the global transition relation R: (I, g) |= EXφ iff for some path g⁰, g¹, ... in I such that g = g⁰, we have (I, g¹) |= φ; (I, g) |= EGφ iff for some path g⁰, g¹, ... in I such that g = g⁰, we have (I, gⁱ) |= φ for all i ≥ 0; (I, g) |= E(φ U φ′) iff for some path g⁰, g¹, ... in I such that g = g⁰, there is a natural number i such that (I, gⁱ) |= φ′ and (I, gʲ) |= φ for all 0 ≤ j < i.

We write [[φ]] for the extension of formula φ in I, i.e., the set of reachable states g ∈ G such that (I, g) |= φ. We say that formula φ is true in system I, written I |= φ, iff I0 ⊆ [[φ]].

MCMAS. The tool MCMAS is a symbolic model checker for the verification of CTLK properties of interpreted systems [5]. We illustrate its input language, the Interpreted Systems Programming Language (ISPL), with the bit-transmission protocol, a standard example in temporal-epistemic logic [2].

Example 1 (Bit-transmission protocol [2]). A sender and a receiver communicate over a lossy channel. Their goal is to transmit a bit value b ∈ {0, 1} from the sender to the receiver in such a way that the sender will know that the receiver knows the value of b. The sender sends the bit value and continues to do so until it receives an acknowledgement of receipt. The receiver waits until it receives a bit value and then sends an acknowledgement and re-sends it indefinitely. The protocol can be modelled as an interpreted system I with a sender agent S, a receiver agent R, and the environment Env as the unreliable channel. We would like to verify that once the sender receives the acknowledgement, the bit held by the receiver is the same as the bit held by the sender, that the receiver knows this, and that the sender knows that the receiver knows this:
(1)
where KS and KR are the epistemic modalities for the sender and receiver agents respectively, and the propositional atom agree holds at a global state g if the variable bit in the receiver agrees with the variable bit in the sender, i.e., gR (bit) = gS (bit), and the propositional atom reckack holds when the sender has received an acknowledgement. The code in Fig.1 describes the system I and the CTLK specification (1) in ISPL. The code should be straightforward to understand in view of the above. We refer to [5] for more details.
Agent S
  Vars
    bit: {0,1};
    rec_ack: {true,false};
  end Vars
  Actions = {send__0,send__1,null};
  Protocol
    rec_ack=false and bit=0: {send__0};
    rec_ack=false and bit=1: {send__1};
    rec_ack=true: {null};
  end Protocol
  Evolution
    rec_ack=true if R.Action=sendack and Env.Action=transmit;
  end Evolution
end Agent

Agent Env
  Vars
    state: {ok, error};
  end Vars
  Actions = {transmit,drop};
  Protocol
    state=ok: {transmit};
    state=error: {drop};
  end Protocol
  -- Evolution omitted
end Agent

Agent R
  Vars
    bit: {0, 1};
    rec_bit: {true, false};
  end Vars
  -- Actions, Protocol omitted
  Evolution
    bit=0 and rec_bit=true if S.Action=send__0 and Env.Action=transmit;
    bit=1 and rec_bit=true if S.Action=send__1 and Env.Action=transmit;
  end Evolution
end Agent

InitStates
  S.rec_ack=false and R.rec_bit=false;
end InitStates

Evaluation
  agree if S.bit=R.bit;
  recack if S.rec_ack=true;
end Evaluation

Formulae
  AG recack -> K(S,K(R,agree))
end Formulae

Fig. 1. Bit-transmission protocol in ISPL
We briefly describe the features of basic ISPL needed to follow the discussion below; Fig. 1 can be consulted for an example. An ISPL program σ reflects the structure of the defined interpreted system I(σ), with one program section for each agent, and for each agent section four subsections containing, respectively:

– Local variable and domain definitions of the form X: {d0, ..., dn}.
– Action declarations listing the actions, such as send__0, available to the agent.
– Local protocol specifications of the form lcond : {a0, ..., an}, where the ai are actions and lcond is a boolean combination of (domain-correct) local equalities of the form X = X′ or X = d.
– Local evolution function specifications of the form assign if acond, where assign is a conjunction of local equalities and acond is a boolean combination of atoms i.Action = a, where a is an action declared in the agent named i.

In addition, an ISPL program provides an initial state condition and truth conditions for atomic propositions, all in the form of global state conditions built from (domain-correct) equalities of the form i.X = j.Y or i.X = d, where X and Y are local variables of agents i and j respectively.
3  A Data Symmetry Reduction Technique
In this section we present a data symmetry reduction technique for CTLK properties of interpreted systems. Subsection 3.1 extends the notion of data symmetry [1] to interpreted systems; Subsection 3.2 establishes the reduction result; Subsection 3.3 shows how to compute the reduced epistemic relations.
3.1  Data Symmetry
We assume that local states are built from variables (as they are in ISPL programs). In detail, an interpreted system I is given together with a set Vari of local variables for every agent i ∈ Ag ∪ {Env}, where each X ∈ Vari is associated with a non-empty set DX, the data domain of X. A local state l ∈ Li of agent i is a type-respecting assignment to the variables in Vari, i.e., l(X) ∈ DX for every X ∈ Vari. We write D for the collection of all domains. Following [1] we mark domains as either ordered or unordered (unordered domains are called scalarsets in [1]). Informally, a system is expected to treat all data from the same unordered domain in a symmetric fashion: every permutation of data from such a domain should preserve the behaviours of the system.

Definition 4 (Domain permutation). A domain permutation is a family π = {πD}D∈D of bijections πD : D → D that only change values in unordered domains, i.e., if D is ordered, then πD(d) = d for d ∈ D.

The domain permutation π naturally defines a bijection on the local states Li of agent i and a bijection on the global states S by point-wise application on the data elements inside the states. In detail, for each l ∈ Li, π(l) ∈ Li is defined by π(l)(X) = πDX(l(X)) for each local variable X ∈ Vari; for each global state g ∈ S, π(g) = ⟨π(g1), ..., π(gn), π(gEnv)⟩.

Definition 5 (Data symmetry). A set Δ ⊆ S of states is data symmetric iff g ∈ Δ iff π(g) ∈ Δ for all domain permutations π. A relation Δ ⊆ S × S between states is data symmetric iff ⟨g, g′⟩ ∈ Δ iff ⟨π(g), π(g′)⟩ ∈ Δ for all domain permutations π. The system I is data symmetric iff the induced global transition relation R, the set I0 of initial states, and each extension V(p) of a propositional atom p are data symmetric.

Example 2. Consider the protocol model I in Example 1 and mark the bit domain {0, 1} as an unordered domain. Two domain permutations are possible: the identity ι, leaving all values unchanged, and the transposition flip such that flip{0,1}(0) = 1 and flip{0,1}(1) = 0. It can be checked that both ι and flip preserve the global transition relation, the set of initial states, and the extension of the propositional atom agree. Therefore the system I is data symmetric.

Lemma 1. If system I is data symmetric, then so is the set G of reachable states, each epistemic relation ∼i, and any formula extension [[φ]].

Proof (sketch). G is data symmetric, since I0 and R are data symmetric. ∼i is data symmetric: assume g ∼i g′; then gi = g′i and g, g′ ∈ G. From the former, π(gi) = π(g′i), i.e., π(g)i = π(g′)i. But, since G is data symmetric, π(g), π(g′) ∈ G. Thus, π(g) ∼i π(g′). [[φ]] is data symmetric: by induction on φ we can show that (I, g) |= φ iff (I, π(g)) |= φ. For the base step, note that the extension V(p) of an atomic proposition is data symmetric. Induction step, epistemic modalities: since ∼i is data symmetric. Induction step, temporal modalities: since the global transition relation R is data symmetric.
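For illustration, the lifting of Definition 4 to local states can be coded directly; the representation of states and permutations below is our own assumption:

```python
# A sketch of a domain permutation applied pointwise to a local state
# (a dict from variable names to values). `pi` maps each unordered
# domain name to a value bijection; ordered domains stay untouched,
# as Definition 4 requires.
def apply_permutation(pi, local_state, domain_of, unordered):
    return {X: (pi[domain_of[X]][v] if domain_of[X] in unordered else v)
            for X, v in local_state.items()}

# Example 2's flip on the unordered bit domain:
flip = {"bit": {0: 1, 1: 0}}
print(apply_permutation(flip, {"bit": 0, "rec_ack": False},
                        {"bit": "bit", "rec_ack": "bool"}, {"bit"}))
# {'bit': 1, 'rec_ack': False}
```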
Given a data symmetric system I, the global states g, g′ ∈ S are said to be data symmetric, written g ≡ g′, if and only if π(g) = g′ for some domain permutation π. The equivalence class [g] of global state g with respect to ≡ is called the orbit of g. Analogously, the local states l, l′ ∈ Li are said to be data symmetric, l ≡ l′, if and only if π(l) = l′ for some domain permutation π.

3.2  Data Symmetry Reduction
In this section we present a technique that exploits data symmetries to reduce the number of initial states in interpreted systems. Let I be a data symmetric interpreted system. An abstraction of I is an interpreted system I^A = ⟨S, ACT, P, t, I′0, V⟩, where I′0 ⊆ I0 is minimal such that I0 = ∪{[g] : g ∈ I′0}. Thus, the abstract system has a single representative initial state g for each orbit [g] of symmetric initial states in I0.

Example 3. Consider the system I from Example 2. There are eight initial states in I0, reflecting the eight possible joint assignments to the variable bit in the sender/receiver and the variable state in the environment. We can form an abstraction I^A = ⟨S, ACT, P, t, I′0, V⟩ of I such that I′0 contains only four initial states, in each of which S.bit=1. Observe that I^A |= KR agree ∨ KR ¬agree, i.e., in the abstract system the receiver initially knows whether its variable agrees with the sender's variable. This follows from the fact that agree ↔ R.bit = 1 holds at all reachable states in I^A. By contrast, I ⊭ KR agree ∨ KR ¬agree.

As the example illustrates, temporal-epistemic formulae are not preserved from the abstraction I^A to the original system I (or vice versa). However, we show that we can make formulae invariant between the original system and the abstract system by abstracting the satisfaction relation.

Definition 6 (Abstract epistemic relation). The abstract epistemic indistinguishability relation ∼^A_i ⊆ G × G for agent i is such that g ∼^A_i g′ iff gi ≡ g′i.

In other words, data symmetric local states are indistinguishable under the abstract epistemic relation ∼^A_i. In the abstract semantics the knowledge modality Ki for agent i is defined by the abstract epistemic relation ∼^A_i for agent i.

Definition 7 (Abstract satisfaction). Abstract satisfaction of φ at g in I, written (I, g) |=^A φ, is defined inductively by:

– Non-epistemic cases are the same as for standard satisfaction (Section 2)
– (I, g) |=^A Ki φ iff (I, g′) |=^A φ for all g′ ∈ G such that g ∼^A_i g′

Example 4. Continuing Example 3, the abstract semantics avoids the unintended validity, i.e., I^A ⊭^A KR agree ∨ KR ¬agree. Pick an initial state g ∈ I′0 in which gS(bit) = gR(bit) = 1. Then, g′ = ⟨gS, flip(gR), gEnv⟩ ∈ I′0 and g ∼^A_R g′, since gR ≡ flip(gR). However, the atom agree has different truth values in g and g′.
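The abstract clause only differs from the standard one in the relation used. In the same explicit-state style as the sketch in Section 2, and purely as an illustration:

```python
# A sketch of Definition 7's clause for K_i: concrete equality of local
# states is replaced by data symmetry of local states. `equiv` can be
# implemented as in Proposition 1 below; other names as before.
def sat_abstract_K(i, sat_phi, reachable, local, equiv):
    return {g for g in reachable
            if all(gp in sat_phi
                   for gp in reachable if equiv(local(g, i), local(gp, i)))}
```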
Standard satisfaction on a data symmetric interpreted system I is equivalent to abstract satisfaction on the abstract system I^A.

Theorem 1 (Reduction). I |= φ iff I^A |=^A φ, assuming I is data symmetric.

Proof (sketch). By Lemma 1, G is data symmetric, and so G = {π(g) | any π, g ∈ G^A}, where G and G^A are the sets of reachable states in I and I^A respectively. Therefore, we can evaluate the epistemic modality in I by scanning the reduced space G^A and applying domain permutations "on the fly", expanding each state g′ into its equivalence class [g′]. So, (I, g) |= Ki φ iff ∀g′ ∈ G^A : ∀π : g ∼i π(g′) ⇒ (I, π(g′)) |= φ. By Lemma 1, [[φ]] is data symmetric, and so we can replace the test of the property φ at π(g′) with the test of φ at g′, obtaining: (I, g) |= Ki φ iff ∀g′ ∈ G^A : ∀π : g ∼i π(g′) ⇒ (I, g′) |= φ. In other words, (I, g) |= Ki φ iff ∀g′ ∈ G^A : gi ≡ g′i ⇒ (I, g′) |= φ. By induction over φ, therefore, we obtain: (I, g) |= φ iff (I^A, g) |=^A φ, for all g ∈ G^A. The theorem follows from this, since [[φ]] is data symmetric by Lemma 1.

3.3  Computing the Abstract Epistemic Relations
Computing the abstract epistemic relations may seem expensive when there is a large number of domain permutations. However, as we show below, we can compute the abstract epistemic relations without applying any domain permutation at all: two local states l and l′ are data symmetric if l and l′ satisfy the same equalities between variables with the same unordered domain, and in addition each variable with an ordered domain has the same value in l and l′.

Proposition 1 (Equivalence check). For any l, l′ ∈ Li, l ≡ l′ if and only if, for all variables X, Y ∈ Vari with the same unordered domain DX = DY,

1. l(X) = l(Y) iff l′(X) = l′(Y)

and for all variables X ∈ Vari with an ordered domain DX,

2. l(X) = l′(X)

Proof. Pick two local states l, l′ ∈ Li. For each domain D ∈ D, define the relation πD = {⟨l(X), l′(X)⟩ | X ∈ Vari, DX = D}. Condition (1) holds iff for each unordered domain D, πD is functional and injective, i.e., can be extended to a bijection on D. Condition (2) holds iff for each ordered domain D, πD preserves values, i.e., can be extended to the identity on D.

For symbolic model checkers, Proposition 1 provides a constructive way of computing a boolean formula encoding of the abstract epistemic relations.
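A direct reading of Proposition 1 as executable pseudo-code, under the assumption that local states are dictionaries from variable names to values:

```python
# A sketch of Proposition 1's equivalence check. `domain_of` names each
# variable's domain; `unordered` is the set of unordered domains.
def data_symmetric(l1, l2, domain_of, unordered):
    for X in l1:
        if domain_of[X] in unordered:
            # condition (1): identical equality pattern among variables
            # sharing the same unordered domain
            for Y in l1:
                if domain_of[Y] == domain_of[X] and \
                        (l1[X] == l1[Y]) != (l2[X] == l2[Y]):
                    return False
        elif l1[X] != l2[X]:
            # condition (2): ordered variables must agree exactly
            return False
    return True

# Two receiver states that differ only by flipping the unordered bit:
assert data_symmetric({"bit": 0, "rec_bit": True}, {"bit": 1, "rec_bit": True},
                      {"bit": "bit", "rec_bit": "bool"}, {"bit"})
```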
4  Data Symmetry Detection
In this section we give a static test on ISPL programs that establishes whether a given interpreted system is data symmetric, and so amenable to reduction. A natural check [1] is to verify whether or not the program explicitly distinguishes between different values from an unordered domain.
Agent S
  -- Vars, Evolution as in Fig. 1
  Actions = {send(?bit),null};
  Protocol
    rec_ack=false: {send(bit)};
    rec_ack=true: {null};
  end Protocol
end Agent

-- Agent Env, InitStates, Evaluation as in Fig. 1

Agent R
  -- Vars, Actions, Protocol as in Fig. 1
  Evolution
    bit=?bit and rec_bit=true if S.Action=send(?bit) and Env.Action=transmit;
  end Evolution
end Agent

Fig. 2. Bit-transmission protocol in extended ISPL
Definition 8 (Symbolic program). Assume an ISPL program σ and a partition of the variables' domains in σ into ordered and unordered. The program σ, or a program section of σ, is symbolic iff ground values from an unordered domain appear only in domain definitions.

For the concept of a symbolic ISPL program to be useful, the actions in it need to carry parameters from unordered domains.

Example 5. Consider the program in Fig. 1 with the bit domain marked as unordered; the program is not symbolic. Intuitively, the atomic actions send__0 and send__1 could be replaced by one action and a parameter.

4.1  Extended ISPL
We extend ISPL with structured actions that explicitly carry parameters from a specified domain; the parameter can be indicated symbolically by a variable. The syntax of the extended version of ISPL is as follows; Fig. 2 can be consulted for an example. Local variables X of agents are declared as in basic ISPL. Each entry in the list of actions has the form a(?X), where X is a local variable (for ease of presentation we restrict attention to the unary case). Intuitively, the macro-variable ?X represents an arbitrary value of the domain of X. A term t is either a local variable X, a macro-variable ?X, or a ground element d (drawn from some domain). A parametric action is an expression of the form a(t), where the operation a and the argument term t have identical domains. The syntax for protocol sections and evolution sections is given in the same way as for basic ISPL, but using the above definitions of term and parametric action. The initial states condition and the evaluation section are the same as in basic ISPL (i.e., no macro-variables are allowed in equalities).

Intuitively, local variables such as bit are evaluated and bound at time of execution in the expected fashion. For instance, in the protocol of S in Fig. 2, the parametric action send(bit) represents both send__0 and send__1, depending on how the term bit is instantiated. By contrast, the macro-variable ?bit in the protocol of R represents an arbitrary bit value.

Translation into basic ISPL. Extended ISPL programs are expanded into basic ones by means of the rewrite rules given in Fig. 3.
{a1(?X1), ..., an(?Xn)} → {a1(d1,1), ..., a1(d1,m1), ..., an(dn,1), ..., an(dn,mn)}   (2)

lcond : actions(x, y) → ⋀d1,d2 (x = d1 and lcond : actions(d1, d2))   (3)

assign(y) if acond(x, y) → ⋀d1,d2 (assign(d2) if acond(d1, d2))   (4)

Fig. 3. Extended ISPL Translation Rules
As an illustration, the (symbolic) program in Fig. 2 translates to the (non-symbolic) program in Fig. 1. Action lists are expanded according to (2), where {di,1, ..., di,mi} is the domain of ?Xi. Protocol and evolution rules are expanded to sets of rules (denoted as conjunctions) according to (3) and (4), where x is the list of local variables and y the list of macro-variables occurring in the rule under consideration, and where we assume that substitutions respect domain assignments. Thus, each macro-variable is replaced during the translation by the elements from its domain.
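As an illustration of rule (2) alone (the action-list expansion), under an invented representation of parametric actions:

```python
# A sketch of rewrite rule (2): expand each parametric action a(?X)
# into one ground action per element of ?X's domain. An action is
# represented, purely for illustration, as a pair (name, macro_or_None).
def expand_actions(action_list, domain_of):
    out = []
    for name, x in action_list:
        if x is None:
            out.append(name)                     # no parameter to expand
        else:
            out.extend(f"{name}({d})" for d in domain_of[x])
    return out

print(expand_actions([("send", "bit"), ("null", None)], {"bit": [0, 1]}))
# ['send(0)', 'send(1)', 'null']
```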
4.2  Detection Theorem
We show that symbolic programs in extended ISPL define data symmetric systems. We assume throughout an extended ISPL program Σ which translates into a basic ISPL program σ, and we write I(Σ) for the interpreted system I(σ) defined by σ. We assume domains are divided into ordered and unordered. Observe that a domain permutation π defines a substitution on code fragments of σ. In particular, π(a(d)) = a(π(d)) for atomic actions a(d), and π(x = d) = (x = π(d)) for local equalities (x = d). Continuing, π lifts to a bijection on the set ACT of joint actions by component-wise application: π(a) = ⟨π(a1), ..., π(an), π(aEnv)⟩.

According to the following lemma, if program Σ is symbolic then agent i is "syntactically data symmetric", in the sense that the set of protocol rules and the set of evolution rules in the translation σ are closed under domain permutations.

Lemma 2 (Syntactic agent closure). Assume agent i is symbolic in Σ.

1. If Δ is a rule in i's protocol in σ, then so is π(Δ).
2. If Δ is a rule in i's evolution in σ, then so is π(Δ).

Proof (sketch). (1): By rewrite rule (3) of Fig. 3, there is a "source" protocol entry lcond : actions(x, y) in agent i in Σ such that Δ is the rule (x = d1 and lcond : actions(d1, d2)), for some d1, d2. By rewrite rule (3) again, (x = π(d1) and lcond : actions(π(d1), π(d2))) is a protocol rule in agent i in σ. But this rule is precisely π(Δ), since both lcond and actions(x, y) are symbolic. (2): By rewrite rule (4) of Fig. 3, there is an evolution entry assign(y) if acond(x, y) in agent i in Σ such that Δ is the rule (assign(d2) if acond(d1, d2)), for some
d1, d2. By rewrite rule (4) again, (assign(π(d2)) if acond(π(d1), π(d2))) is also an evolution rule in agent i in σ. But this rule is π(Δ), since both assign(y) and acond(x, y) are symbolic.

It follows that agents are "semantically data symmetric" as defined below.

Definition 9 (Symmetric agent). Agent i is data symmetric in I(Σ) iff

1. a ∈ Pi(l) iff π(a) ∈ Pi(π(l))
2. ti(l, a) = l′ iff ti(π(l), π(a)) = π(l′).

Lemma 3. If agent i is symbolic in Σ, agent i is data symmetric in I(Σ).

Proof (sketch). The local protocol Pi is data symmetric: assume a ∈ Pi(l). By the semantics of basic ISPL, there is a protocol rule lcond : actions in agent i in the translation σ such that a ∈ actions and l satisfies lcond, and so π(a) ∈ π(actions) and π(l) |= π(lcond). But, by Lemma 2.1, π(lcond : actions) is also a protocol rule in agent i in σ, and so π(a) ∈ Pi(π(l)). We conclude that a ∈ Pi(l) implies π(a) ∈ Pi(π(l)). The converse implication follows by applying π⁻¹. The local evolution function ti is data symmetric: assume ti(l, a) = l′. By the semantics of basic ISPL, there is an evolution entry assign if acond in agent i in σ such that a satisfies acond and l [assign] l′, using Floyd–Hoare-style notation for local state updates. It follows that π(a) satisfies π(acond) and π(l) [π(assign)] π(l′). But, by Lemma 2.2, π(assign if acond) is an evolution rule in agent i in σ, and so ti(π(l), π(a)) = π(l′). We conclude that ti(l, a) = l′ implies ti(π(l), π(a)) = π(l′). The converse implication follows by applying π⁻¹.
5 Implementation and Experiments
In this section we describe a prototype extension to MCMAS implementing the data symmetry reduction presented above and report on its performance for a well-known security protocol.
Implementation. The prototype extension takes as input an extended ISPL program in which some domains are marked as unordered, checks that the supplied program is data symmetric (using Detection Theorem 2), compiles it to basic ISPL (using the translation in Section 4), reduces the initial states (as described below) and, finally, checks the supplied CTLK specifications against the abstract semantics (using Proposition 1).

To reduce the initial states, the prototype constructs the symbolic representation of a set S′ which contains exactly one representative state for each orbit class, i.e., S′ ⊆ S is minimal such that S = ⋃{[s] : s ∈ S′}, where S is the set of possible global states for the supplied program. Roughly, the symbolic representation of S′ is a disjunction of assignments, one assignment for each possible pattern of identities between variables with unordered domains. In detail, let V be the set of variables with unordered domains, and let Δ ⊆ 2^V be a partition of V such that each block δ ∈ Δ contains only variables that share the same domain. Intuitively, the partition Δ represents a pattern of identities between variables in V: X = Y if and only if X and Y belong to the same block δ ∈ Δ. For every such partition Δ, the prototype selects an assignment to the variables in V that "agrees" with Δ; the symbolic representation of S′ is the disjunction of all such assignments:

symS′ = ⋁_Δ ⋀_{δ∈Δ, X∈δ} X = d(δ)      (5)
where d(δ) is a value from the domain shared by the variables in block δ; the value d(δ) is different for different blocks δ ∈ Δ from the same partition. A symbolic representation of the reduced set I′0 of initial states can then be obtained by conjuncting symS′ with the initial states condition symI0 in the supplied program. To optimise the construction, the prototype distributes symI0 over the disjunction in symS′:

symI′0 = ⋁_Δ (symI0 ∧ ⋀_{δ∈Δ, X∈δ} X = d(δ))      (6)
i.e., it conjuncts with symI0 "on the fly" as symS′ is being constructed. As a further optimisation, the prototype generates only some of the possible variable partitions Δ. In particular, if (X = Y) ∧ symI0 is empty for two variables X, Y ∈ V, the prototype excludes variable partitions Δ that have a subset δ containing both X and Y. The prototype computes the symbolic representation of the extension [[Ki φ]] with respect to the abstract satisfaction relation as follows:

sym[[Ki φ]] = symG ∧ ¬ ⋁_Δ (symΔ ∧ PreImage(sym[[¬φ]] ∧ symΔ, ≈))
where symG is the symbolic representation of the set of reachable states; Δ ranges over partitions of the set of agent i's local variables with unordered domains; symΔ expresses that the variables in agent i agree with Δ, i.e., the conjunction of equalities X = Y and inequalities X ≠ Z for X, Y belonging to the same block in Δ and X, Z belonging to different blocks in Δ; sym[[¬φ]] is the symbolic representation of the extension [[¬φ]]; ≈ relates global states with identical values for the variables in agent i with ordered domains, i.e., ≈ is the condition ⋀_X X = prim(X), where X ranges over agent i's variables with ordered domains.
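The following Java sketch (ours, not the MCMAS prototype, whose symbolic data structures are BDD-based) illustrates the construction behind equations (5) and (6): the partitions of the unordered variables are enumerated and, for each partition, one representative assignment is emitted. The class name, the string encoding of formulas, and the example values are illustrative only.

import java.util.ArrayList;
import java.util.List;

public class InitialStateReduction {
    // Enumerate all partitions of vars by recursive block placement.
    static List<List<List<String>>> partitions(List<String> vars) {
        List<List<List<String>>> result = new ArrayList<>();
        collect(vars, 0, new ArrayList<>(), result);
        return result;
    }

    static void collect(List<String> vars, int i, List<List<String>> cur,
                        List<List<List<String>>> out) {
        if (i == vars.size()) {
            List<List<String>> copy = new ArrayList<>();
            for (List<String> b : cur) copy.add(new ArrayList<>(b));
            out.add(copy);
            return;
        }
        for (List<String> block : cur) {          // put vars[i] into an existing block
            block.add(vars.get(i));
            collect(vars, i + 1, cur, out);
            block.remove(block.size() - 1);
        }
        cur.add(new ArrayList<>(List.of(vars.get(i)))); // or open a new block
        collect(vars, i + 1, cur, out);
        cur.remove(cur.size() - 1);
    }

    // Build the disjunction of representative assignments of equation (5),
    // here as a formula string; the prototype builds a symbolic (BDD)
    // representation instead.
    static String symS(List<String> vars, List<String> domain) {
        StringBuilder disj = new StringBuilder();
        for (List<List<String>> p : partitions(vars)) {
            if (p.size() > domain.size()) continue; // not enough distinct values
            StringBuilder conj = new StringBuilder();
            for (int b = 0; b < p.size(); b++)      // a different value per block
                for (String x : p.get(b))
                    conj.append(conj.length() == 0 ? "" : " & ")
                        .append(x).append(" = ").append(domain.get(b));
            disj.append(disj.length() == 0 ? "" : " | ")
                .append("(").append(conj).append(")");
        }
        return disj.toString();
    }

    public static void main(String[] args) {
        System.out.println(symS(List.of("x", "y"), List.of("n1", "n2", "n3")));
        // prints: (x = n1 & y = n1) | (x = n1 & y = n2)
    }
}

The "on the fly" optimisation of equation (6) corresponds to conjoining symI0 with each disjunct as it is produced, and the pruning step corresponds to skipping a partition as soon as one of its blocks contains a pair X, Y for which (X = Y) ∧ symI0 is unsatisfiable.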
NSPK. To evaluate the performance of the technique we tested the prototype on the Needham-Schroeder Public Key protocol (NSPK), a standard example in the security literature [18]. The NSPK protocol involves a number of A–agents and B–agents; each A–agent starts with a nonce (a unique, unpredictable number) Na, and each B–agent starts with a nonce Nb. We considered the following CAPSL [19] authentication goal for the protocol:

Knows B: Knows A: agree A: B: Na, Nb      (7)

stating that when a protocol session between an A–agent and a B–agent ends, the agents share the nonces Na and Nb, the A–agent knows this, and the B–agent knows that the A–agent knows this.3 To verify the CAPSL goal (7) we modelled the NSPK protocol as an extended ISPL program with N agents, some of them A–agents and the others B–agents. In addition, we modelled a Dolev-Yao attacker [21] in the environment agent. The intruder and all agents start with a unique, non-deterministic nonce value – a value for Na in the case of A–agents and a value for Nb in the case of B–agents. Thus we assumed a domain of N + 1 nonces: one nonce for the intruder and one for each agent.4 We marked the domain of nonces as unordered. Finally, we translated the CAPSL goal (7) into the following CTLK formula:

⋀_{i:B} AG (i.Step = 3 → Ki ⋁_{j:A} (agree(i, j) ∧ Kj ⋁_{i′:B} agree(i′, j)))      (8)
where i : B ranges over B–agents, j : A ranges over A–agents, and agree(i, j) states that agents i and j agree on the protocol variables Na, Nb, A and B, i.e., i.Na = j.Na, i.Nb = j.Nb, etc. The specification (8) states that whenever a B–agent i has completed all three protocol steps, the agent i knows that some A–agent j agrees with i, and agent i knows that this agent j knows that some B–agent i′ agrees with j.

3 The goal was derived manually in [20].
4 It would be reasonable to provide the intruder with more than just one initial nonce; the reduction would then yield even bigger savings.

Experiments. Table 1 shows the total verification time (including the time it takes to reduce the initial states) in seconds and the number of reachable states for the CTLK specification (8) and different numbers of participating agents. The experiments ran on a 2 GHz Intel machine with 2 GB of memory running Linux. Each run was given a time limit of 24 hours. For this experiment we observed an exponential reduction in both time and space in the number N of agents. Specifically, the state space is reduced by the factor (N + 1)!, while the reduction in verification time is more irregular given that MCMAS is a symbolic model checker.
Table 1. Verification results for NSPK

Agents   Without reduction           With reduction
         States      Time            States     Time
3        1 536       3               64         1
4        11 400      28              95         4
5        651 600     7 716           905        9
6        –           > 86 400        12 256     24
7        –           > 86 400        21 989     91
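As a quick sanity check of the (N + 1)! factor (our arithmetic, derived from Table 1): for 3 agents the orbit factor is 4! = 24 and 1 536/24 = 64; for 4 agents, 11 400/5! = 11 400/120 = 95; and for 5 agents, 651 600/6! = 651 600/720 = 905, matching the reduced state counts exactly.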
We can expect even greater savings for security protocols that involve more than just one unique, unpredictable data value (nonce, session key, password, etc.) per agent.
6 Conclusions
We presented a data symmetry reduction technique for temporal-epistemic logic in the mainstream interpreted systems framework. The technique uses an abstract satisfaction relation in the reduced system; this was shown to make the reduction sound and complete, i.e., there are no false positives or false negatives in the reduced system. To facilitate the detection of data symmetric systems, i.e., systems amenable to reduction, we extended the interpreted systems programming language (ISPL) with parametric actions. We showed that symbolic programs in the extended ISPL define data symmetric systems. Experiments on the NSPK security protocol show an exponential reduction in verification time and state space for temporal-epistemic security goals. The reduction technique in this paper reduces initial states only. However, we emphasize that for some applications, such as the security protocol model considered in this paper, collapsing data symmetric initial states alone yields the same reduced state space as collapsing all data symmetric states, i.e., it yields the quotient model with respect to the orbit relation. Acknowledgments. The research described in this paper is partly supported by EPSRC funded project EP/E035655, by the European Commission Framework 6 funded project CONTRACT (IST Project Number 034418), and by grant 2003-6108 from the Swedish Research Council.
References
1. Ip, C.N., Dill, D.L.: Better verification through symmetry. Form. Methods Syst. Des. 9(1-2), 41–75 (1996)
2. Fagin, R., Halpern, J.Y., Vardi, M.Y., Moses, Y.: Reasoning about knowledge. MIT Press, Cambridge (1995)
3. Gammie, P., van der Meyden, R.: MCK: Model checking the logic of knowledge. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 479–483. Springer, Heidelberg (2004)
4. Nabialek, W., Niewiadomski, A., Penczek, W., Półrola, A., Szreter, M.: VerICS 2004: A model checker for real time and multi-agent systems. In: Proc. CS&P 2004, pp. 88–99. Humboldt University (2004)
5. Lomuscio, A., Qu, H., Raimondi, F.: MCMAS: A model checker for multiagent systems. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 682–688. Springer, Heidelberg (2009)
6. Lomuscio, A., Qu, H., Solanki, M.: Towards verifying contract regulated service composition. In: Proc. ICWS 2008, pp. 254–261. IEEE Computer Society, Los Alamitos (2008)
7. Ezekiel, J., Lomuscio, A.: Combining fault injection and model checking to verify fault tolerance in multi-agent systems. In: Proc. AAMAS 2009 (to appear, 2009)
8. van der Meyden, R., Su, K.: Symbolic model checking the knowledge of the dining cryptographers. In: Proc. CSFW 2004, Washington, DC, USA, p. 280. IEEE Computer Society, Los Alamitos (2004)
9. Dechesne, F., Orzan, S., Wang, Y.: Refinement of Kripke models for dynamics. In: Fitzgerald, J.S., Haxthausen, A.E., Yenigun, H. (eds.) ICTAC 2008. LNCS, vol. 5160, pp. 111–125. Springer, Heidelberg (2008)
10. Enea, C., Dima, C.: Abstractions of multi-agent systems. In: Burkhard, H.-D., Lindemann, G., Verbrugge, R., Varga, L.Z. (eds.) CEEMAS 2007. LNCS (LNAI), vol. 4696, pp. 11–21. Springer, Heidelberg (2007)
11. Wooldridge, M.: Computationally grounded theories of agency. In: Proc. ICMAS 2000, pp. 13–22. IEEE Press, Los Alamitos (2000)
12. Cohen, M., Dam, M., Lomuscio, A., Russo, F.: Abstraction in model checking multi-agent systems. In: Proc. AAMAS 2009 (to appear, 2009)
13. Cohen, M., Dam, M., Lomuscio, A., Qu, H.: A symmetry reduction technique for model checking temporal epistemic logic. In: Proc. IJCAI 2009 (to appear, 2009)
14. Clarke, E.M., Enders, R., Filkorn, T., Jha, S.: Exploiting symmetry in temporal logic model checking. Form. Methods Syst. Des. 9(1-2), 77–104 (1996)
15. Emerson, E.A., Sistla, A.P.: Symmetry and model checking. Form. Methods Syst. Des. 9(1-2), 105–131 (1996)
16. Lewis, D.: Counterpart theory and quantified modal logic. Journal of Philosophy 65, 113–126 (1968)
17. van der Meyden, R., Wong, K.S.: Complete axiomatizations for reasoning about knowledge and branching time. Studia Logica 75(1), 93–123 (2003)
18. Needham, R.M., Schroeder, M.D.: Using encryption for authentication in large networks of computers. Commun. ACM 21(12), 993–999 (1978)
19. Denker, G., Millen, J.: CAPSL integrated protocol environment. In: Proc. DISCEX 2000, pp. 207–221. IEEE Computer Society, Los Alamitos (2000)
20. Burrows, M., Abadi, M., Needham, R.: A logic of authentication. ACM Trans. Comput. Syst. 8(1), 18–36 (1990)
21. Dolev, D., Yao, A.: On the security of public key protocols. IEEE Transactions on Information Theory 29(2), 198–208 (1983)
TAPAAL: Editor, Simulator and Verifier of Timed-Arc Petri Nets

Joakim Byg, Kenneth Yrke Jørgensen, and Jiří Srba

Department of Computer Science, Aalborg University, Selma Lagerlöfs Vej 300, 9220 Aalborg Øst, Denmark
Abstract. TAPAAL is a new platform independent tool for modelling, simulation and verification of timed-arc Petri nets. TAPAAL provides a stand-alone editor and simulator, while the verification module translates timed-arc Petri net models into networks of timed automata and uses the UPPAAL engine for the automatic analysis. We report on the status of the first release of TAPAAL (available at www.tapaal.net), on its new modelling features and we demonstrate the efficiency and modelling capabilities of the tool on a few examples.
1 Introduction
The Petri net is a popular mathematical model of discrete distributed systems introduced in 1962 by Carl Adam Petri in his PhD thesis. Since then numerous extensions of the basic place/transition model have been studied and supported by a number of academic as well as industrial tools [8]. Many recent works consider various extensions of the Petri net model with time features that can be associated to places, transitions, arcs or tokens. A recent overview aiming at a comparison of the different time dependent models (including timed automata) is given in [15]. In the TAPAAL tool we consider Timed-Arc Petri Nets (TAPN) [4, 7] where an age (a real number) is associated with each token in a net and time intervals are placed on arcs in order to restrict the ages of tokens that can be used for firing a transition. The advantages of this model are an intuitive semantics and the decidability of a number of problems like coverability and boundedness (for references see [15]). On the other hand, the impossibility to describe urgent behaviour limited its modelling power and wider applicability. TAPAAL extends the TAPN model with the new features of invariants and transport arcs in order to model urgent behaviour and the transportation of tokens without resetting their age. TAPAAL has an intuitive modelling environment for editing and simulation of TAPN. It also provides a verification module with automatic checking of bounded TAPN models against safety and liveness requirements via a translation to networks of timed automata and then using the UPPAAL [16] engine as a back-end for verification.
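As an illustration of the firing rule just described, the following self-contained Java sketch (ours; TAPAAL's actual data structures are not shown in this paper) checks whether a transition is enabled, i.e., whether every input arc can find a token whose age lies in the arc's interval. For simplicity it does not model several arcs competing for distinct tokens in the same place.

import java.util.ArrayList;
import java.util.List;

// Illustrative model of the TAPN enabledness check: each token carries an
// age, each input arc carries a time interval restricting usable token ages.
class Interval {
    final double lo, hi; // use Double.POSITIVE_INFINITY for an open bound
    Interval(double lo, double hi) { this.lo = lo; this.hi = hi; }
    boolean contains(double age) { return lo <= age && age <= hi; }
}

class Place {
    final List<Double> tokenAges = new ArrayList<>();
}

class InputArc {
    final Place place;
    final Interval guard;
    InputArc(Place place, Interval guard) { this.place = place; this.guard = guard; }
}

class Transition {
    final List<InputArc> inputs = new ArrayList<>();

    // Enabled iff every input arc finds some token of a suitable age
    // (distinctness of the chosen tokens is not modelled here).
    boolean isEnabled() {
        for (InputArc arc : inputs) {
            boolean found = false;
            for (double age : arc.place.tokenAges) {
                if (arc.guard.contains(age)) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }
}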
The author was partially supported by the Ministry of Education of the Czech Republic, project No. MSM 0021622419.
The connection between bounded TAPN and timed automata was studied in [13, 14, 5] and, while theoretically satisfactory, the translations described in these papers are not suitable for a tool implementation as they either cause an exponential blow-up in the size or create a new parallel component with a fresh local clock for each place in the net. As UPPAAL performance becomes significantly slower with the growing number of parallel processes and clocks, the verification of larger nets with little or no concurrent behaviour (few tokens in the net) becomes intractable. In TAPAAL we suggest a novel translation technique where a fresh parallel component (with a local clock) is created only for each token in the net. The proposed translation also transforms safety and liveness properties (EF, AG, EG, AF) into equivalent UPPAAL queries. One of the main advantages of this translation approach is the possibility to use the active clock reduction and symmetry reduction techniques recently implemented in UPPAAL. As a result the verifiable size of models increases by orders of magnitude.

To the best of our knowledge, TAPAAL is the first publicly available tool which offers modelling, simulation and verification of timed-arc Petri nets with continuous time. There is only one related tool prototype, mentioned in [1], where the authors discuss a coverability algorithm for general (unbounded) nets, though without any urgent behaviour. Time features (time stamps) connected to tokens can also be modelled in Coloured Petri Nets using CPN Tools [10]; however, time passing is represented there by a global clock rather than by local ones as in TAPN, only a discrete time semantics is implemented in CPN Tools, and the analysis can be nondeterministic as the time stamps are in some situations ignored during the state-space construction.
2 TAPAAL Framework
TAPAAL offers an editor, a simulator and a verifier for TAPN. It is written in Java 6.0 using Java Swing for the GUI components and is thus available for the majority of existing platforms. TAPAAL's graphical editor features all the elements necessary for the creation of TAPN models, including invariants on places and transport arcs. The user interface supports, among others, a select/move feature for moving a selected subnet of the model as well as undo/redo buttons allowing the user to move backward and forward in the history during the creation of larger models. Constructed nets and queries are saved in an interchangeable XML format using the Petri Net Markup Language (PNML) [12], further extended with TAPAAL specific timing features. An important aspect of the graphical editor is that it disallows entering syntactically incorrect nets and hence no syntax checks are necessary before calling further TAPAAL modules. The simulator part of TAPAAL allows one to inspect the behaviour of a TAPN by graphically simulating the effects of time delays and transition firings. When firing a transition the user can either manually select the concrete tokens that will be used for the firing or simply allow the simulator to automatically
select the tokens based on some predefined strategy: youngest, oldest or random. The simulator also allows the user to step back and forth in the simulated trace, which makes it easier to investigate alternative net behaviours. TAPAAL's verification module enables us to check safety and liveness queries on the constructed net. Queries are created using a graphical query dialog, completely eliminating the possibility of introducing syntactical errors and offering an intuitive and easy-to-use query formulation mechanism. The TAPAAL query language is a subset of the CTL logic comprising the EF, AG, EG and AF temporal operators1; however, several TCTL properties can be verified by encoding them into the net. The actual verification is done via translating TAPN models into networks of timed automata and by using the model checker UPPAAL. The verification calls to UPPAAL are seamlessly integrated inside the TAPAAL environment and the returned error traces (if any) are displayed in TAPAAL's simulator. For safety questions concrete traces are displayed whenever the command-line UPPAAL engine can output them; otherwise the user is offered an untimed trace and can in the simulation mode experiment with suitable time delays in order to realize the displayed trace in the net. A number of verification/trace options found in UPPAAL are also available in TAPAAL, including a symmetry reduction option which often provides an orders-of-magnitude improvement with respect to verification time, though at the expense of disallowing trace options (a current limitation of UPPAAL). Finally, it is possible to check whether the constructed net is k-bounded or not, for any given k. The tool provides a suitable under-approximation of the net behaviour in case the net is unbounded.

The TAPAAL code consists of two parts: the editor and simulator, extending the Platform Independent Petri net Editor project PIPE version 2.5 [9], which is licensed under the Open Software License 3.0, and a framework for translating TAPN models into UPPAAL, licensed under the BSD License.
3 Experiments
We shall now report on a few experiments investigating the modelling capabilities and verification efficiency of TAPAAL. All examples are included in the TAPAAL distribution and can also be downloaded directly from www.tapaal.net.

3.1 Workflow Processes with Deadlines
Workflow processes provide a classical case study suitable for modelling in different variants of Petri nets. A recent focus is, among others, on the addition of timing aspects into the automated analysis process. Gonzalez del Foyo and Silva consider in [6] workflow diagrams extended with task durations and the latest execution deadline of each task. They provide a translation into Time Petri Nets (TPN), where clocks are associated with each transition in the net, and use the tool TINA [3] to analyze schedulability questions. An example of a workflow process (taken from [6]) is illustrated in Fig. 1.

1 At the moment the EG and AF queries are supported only for nets with transitions that do not contain more than two input and two output places.
Task   Duration   Deadline
A0     5          5
A1     4          9
A2     4          15
A3     2          9
A4     2          8
A5     3          13
A6     3          18
A7     2          25

Fig. 1. A simple workflow diagram and its timed-arc Petri net model
The translation described in [6] relies on a preprocessing of the workflow so that the individual (relative) deadlines for each task must be computed before a TPN model can be constructed. In our model of extended timed-arc Petri nets a more direct translation without any preprocessing can be given. See Fig. 1 for a TAPAAL Petri net model resulting from the translation of the workflow example. Every transition with the naming scheme Ai Done corresponds to the finalisation of the execution of the task Ai. The duration constraints are encoded directly into the net and the global deadlines are handled by adding a fresh place called Deadlines, containing one token (initially of age 0). The latest execution deadline Xi of a task Ai (where 0 ≤ i ≤ 7) is then ensured by adding (for each i) a pair of transport arcs constrained with the time interval [0, Xi] between the place Deadlines and the corresponding transition Ai Done. A schematic illustration is given in Fig. 1. Notice that transport arcs have different arrow tips than the standard arcs and are annotated with the label 1 to denote which arcs are paired into a route for the age-preserving token transportation (the annotations are relevant only in the presence of multiple transport arcs connected to a single transition). As the age of the token in the place Deadlines is never reset, it is guaranteed that the latest execution deadlines of all tasks are met. The workflow example was verified in TAPAAL against the query EF (Work Done = 1) and by selecting the fastest trace option the tool returned in
0.1s the following scheduling of task executions together with the necessary time delays: 5, A0 Done, 2, A3 Done, A4 Done, Sync1 Done, 2, A1 Done, 1, A5 Done, 3, A2 Done, A6 Done, Sync2 Done, 2, A7 Done.
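As a quick check against the task table in Fig. 1 (our arithmetic, not part of the tool output): the cumulative delays place A0 at time 5 (deadline 5), A3 and A4 at time 7 (deadlines 9 and 8), A1 at time 9 (deadline 9), A5 at time 10 (deadline 13), A2 and A6 at time 13 (deadlines 15 and 18), and A7 at time 15 (deadline 25), so every latest execution deadline is indeed met.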
3.2 Fischer's Protocol and Alternating Bit Protocol
Fischer's protocol [11] for mutual exclusion and the alternating bit protocol [2] for network communication over a lossy communication medium are well-known and scalable examples used for tool performance testing. In our experiments we managed to verify Fischer's protocol (with a TAPN model taken from [1]) for 200 processes (each with its own clock) within 29 minutes, while an equivalent timed automaton model of the protocol provided in the UPPAAL distribution verified 200 processes in 2 hours and 22 minutes. The experiment showed a speed-up of 205% for 100 processes, 293% for 150 processes and 393% for 200 processes. The explanation of this seemingly surprising phenomenon is that the translated timed automata model of the protocol contains on the one hand more discrete states than the native UPPAAL model (about twice as many), but on the other hand the zones that are tested for inclusion are smaller. As a result, the verification times for the TAPAAL-produced automata are significantly faster. Correctness of the alternating bit protocol was verified in less than two hours for up to 50 messages concurrently present in the protocol (each message has its own time-stamp), while a native UPPAAL model verification took more than a day without any result. The speed-up was even more significant than in the case of Fischer's protocol: for 15 messages UPPAAL used 136 seconds and TAPAAL 7.3 seconds; for 17 messages UPPAAL needed 32 minutes and TAPAAL 13.7 seconds. There was also a difference in the tool performance depending on what kind of verification options were used, demonstrating a similar pattern of behaviour as in the case of Fischer's protocol (models with more discrete states and smaller zones verify faster).
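To put the last figures in perspective (our arithmetic): for 17 messages the speed-up amounts to roughly a factor of 140, since 32 minutes is 1 920 seconds and 1 920/13.7 ≈ 140.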
4 Conclusion
TAPAAL offers a graphical environment for editing, simulation and verification of timed-arc Petri nets, and the introduction of novel elements like invariants on places and transport arcs provides useful features particularly suitable for the modelling of workflow processes, time sensitive communication protocols and other systems. The tool shows promising performance in the verification of safety and liveness properties. Future development will focus on incorporating C-like functions and data structures into tokens in the net, on extending the firing policies with urgency and priorities, and on generalizing the query language. The aim is also to provide concrete error traces for liveness properties, rather than only the abstract or untimed ones (a current limitation of UPPAAL). We also plan to extend the model with cost, probability and game semantics and then use the corresponding UPPAAL branches (CORA, PROB and TIGA) for verification.
Acknowledgments. We would like to thank the UPPAAL team at Aalborg University and in particular Alexandre David for numerous discussions on the topic.
References
[1] Abdulla, P.A., Nylén, A.: Timed Petri nets and BQOs. In: ICATPN 2001. LNCS, vol. 2075, pp. 53–70. Springer, Heidelberg (2001)
[2] Bartlett, K.A., Scantlebury, R.A., Wilkinson, P.T.: A note on reliable full-duplex transmission over half-duplex links. Commun. ACM 12(5), 260–261 (1969)
[3] Berthomieu, B., Ribet, P.-O., Vernadat, F.: The tool TINA — construction of abstract state spaces for Petri nets and time Petri nets. International Journal of Production Research 42(14), 2741–2756 (2004)
[4] Bolognesi, T., Lucidi, F., Trigila, S.: From timed Petri nets to timed LOTOS. In: Proceedings of the IFIP WG 6.1 Tenth International Symposium on Protocol Specification, Testing and Verification, Ottawa, pp. 1–14 (1990)
[5] Bouyer, P., Haddad, S., Reynier, P.-A.: Timed Petri nets and timed automata: On the discriminating power of Zeno sequences. Information and Computation 206(1), 73–107 (2008)
[6] Gonzalez del Foyo, P.M., Silva, J.R.: Using time Petri nets for modelling and verification of timed constrained workflow systems. In: ABCM Symposium Series in Mechatronics, vol. 3, pp. 471–478. ABCM - Brazilian Society of Mechanical Sciences and Engineering (2008)
[7] Hanisch, H.M.: Analysis of place/transition nets with timed-arcs and its application to batch process control. In: Ajmone Marsan, M. (ed.) ICATPN 1993. LNCS, vol. 691, pp. 282–299. Springer, Heidelberg (1993)
[8] Heitmann, F., Moldt, D., Mortensen, K.H., Rölke, H.: Petri nets tools database quick overview, http://www.informatik.uni-hamburg.de/TGI/PetriNets/tools/quick.html
[9] Platform Independent Petri net Editor 2.5, http://pipe2.sourceforge.net
[10] Jensen, K., Kristensen, L., Wells, L.: Coloured Petri nets and CPN tools for modelling and validation of concurrent systems. International Journal on Software Tools for Technology Transfer (STTT) 9(3), 213–254 (2007)
[11] Lamport, L.: A fast mutual exclusion algorithm. ACM Transactions on Computer Systems 5(1), 1–11 (1987)
[12] Petri Net Markup Language, http://www2.informatik.hu-berlin.de/top/pnml
[13] Sifakis, J., Yovine, S.: Compositional specification of timed systems. In: Puech, C., Reischuk, R. (eds.) STACS 1996. LNCS, vol. 1046, pp. 347–359. Springer, Heidelberg (1996)
[14] Srba, J.: Timed-arc Petri nets vs. networks of timed automata. In: Ciardo, G., Darondeau, P. (eds.) ICATPN 2005. LNCS, vol. 3536, pp. 385–402. Springer, Heidelberg (2005)
[15] Srba, J.: Comparing the expressiveness of timed automata and timed extensions of Petri nets. In: Cassez, F., Jard, C. (eds.) FORMATS 2008. LNCS, vol. 5215, pp. 15–32. Springer, Heidelberg (2008)
[16] UPPAAL, http://www.uppaal.com
CLAN: A Tool for Contract Analysis and Conflict Discovery

Stephen Fenech1, Gordon J. Pace1, and Gerardo Schneider2

1 Dept. of Computer Science, University of Malta, Malta
2 Dept. of Informatics, University of Oslo, Norway
{sfen002,gordon.pace}@um.edu.mt,
[email protected] Abstract. As Service-Oriented Architectures are more widely adopted, it becomes more important to adopt measures for ensuring that the services satisfy functional and non-functional requirements. One approach is the use of contracts based on deontic logics, expressing obligations, permissions and prohibitions of the different actors. A challenging aspect is that of service composition, in which the contracts composed together may result in conflicting situations, so there is a need to analyse contracts and ensure their soundness. In this paper, we present CLAN, a tool for automatic analysis of conflicting clauses of contracts written in the contract language CL. We present a small case study of an airline check-in desk illustrating the use of the tool.
1 Introduction and Background In Service-Oriented Architectures services are frequently composed of different subservices, each with its own contract. Not only does the service user require a a guarantee that each single contract is conflict-free, but also that the combination of the contracts is also conflict-free — meaning that the contracts will never lead to conflicting or contradictory normative directives. This is even more challenging in a dynamic setting, in which contracts may only be acquired at runtime. A common view of contracts is that of properties which the system must (or is guaranteed) to satisfy. However, when analysing contracts for conflicts, the need to analyse and reason about contracts is extremely important, and looking at contracts simply as logical properties may hide conflicts altogether. The use of deontic logic to enable reasoning explicitly about normative information in a contract and about exceptional behaviour is one alternative approach to contract analysis. CL [3] is a formal language to specify deontic electronic contracts. The language has a trace semantics [2], which although useful for runtime monitoring of contracts, lacks the deontic information concerning the obligations, permissions and prohibitions of the involved parties in the contract, and thus it is not suitable for conflict analysis. We have recently developed conflict analysis techniques for CL [1], and in this paper, we present a tool implementing these techniques for the automatic analysis of contracts written in CL.
Partially supported by the Nordunet3 project COSoDIS: “Contract-Oriented Software Development for Internet Services”.
The Contract Language CL. Deontic logics enable explicit reasoning about not only the actual state of affairs in a system, e.g. 'the client has paid', but also about the ideal state of affairs, e.g. 'the client is obliged to pay' or 'the client is permitted to request a service'. CL is based on a combination of deontic, dynamic and temporal logics, allowing the representation of the deontic notions of obligations, permissions and prohibitions, as well as of temporal aspects. Moreover, it also gives a means to specify exceptional behaviours arising from the violation of obligations (what is to be demanded in case an obligation is not fulfilled) and of prohibitions (what is the penalty in case a prohibition is violated). These are usually known as Contrary-to-Duties (CTDs) and Contrary-to-Prohibitions (CTPs) respectively. CL contracts are written using the following syntax:

C := CO | CP | CF | C ∧ C | [β]C | ⊤ | ⊥
CO := OC(α) | CO ⊕ CO
CP := P(α) | CP ⊕ CP
CF := FC(δ) | CF ∨ [α]CF
α := 0 | 1 | a | α&α | α·α | α+α
β := 0 | 1 | a | β&β | β·β | β+β | β∗

A contract clause C can be either an obligation (CO), a permission (CP) or a prohibition (CF) clause, a conjunction of two clauses, or a clause preceded by the dynamic logic square brackets. OC(α) is interpreted as the obligation to perform α, in which case, if violated, the reparation contract C must be executed (a CTD). FC(α) is interpreted as forbidden to perform α, and if α is performed then the reparation C must be executed (a CTP). Permission is represented as P(α), identifying that the action expression α is permitted. Note that repetition in actions (using the ∗ operator) is not allowed inside the deontic modalities. It is, however, allowed in dynamic logic-style conditions. [β]C is interpreted as if action β is performed then the contract C must be executed — if β is not performed, the contract is trivially satisfied. ∧ allows the conjunction of clauses, ⊕ is used as an exclusive choice between certain clauses, and ⊤ and ⊥ are the trivially satisfied (violated) contracts. Compound actions can be constructed from basic ones using the operators &, ·, + and ∗, where & stands for actions occurring concurrently, · stands for actions occurring in sequence, + stands for choice, and ∗ for repetition. 1 is an action expression matching any action, while 0 is the impossible action. In order to avoid paradoxes, the operators combining obligations, permissions and prohibitions are restricted syntactically. See [3,2] for more details on CL.

As a simple example, let us consider the following clause from an airline company contract: 'When checking in, the traveller is obliged to have luggage within the weight limit — if exceeded, the traveller is obliged to pay extra.' This would be represented in CL as [checkIn]OO(pay)(withinWeightLimit).

Trace Semantics. The trace semantics presented in [2] enables checking whether or not a trace satisfies a contract. However, deontic information is not preserved in the trace and thus it is not suitable for conflict detection. By a conflict we mean for instance that the contract permits and forbids performing the same action at the same time (see below for a more formal definition). We will use lower case letters (a, b, . . .) to represent atomic actions, Greek letters (α, β, . . .) for compound actions, and Greek letters with a subscript & (α&, β&, . . .) for
compound concurrent actions built from atomic actions and the concurrency operator &. The set of all such concurrent actions will be written A&. We use # to denote mutually exclusive actions (for example, if a stands for 'opening the check-in desk' and b for 'closing the check-in desk', we write a#b). A trace is a sequence of sets of actions, giving the set of actions present at each point in time. The Greek letter σ will be used to represent traces, using functional notation with indexing starting at zero, i.e. σ(n) is the (n + 1)-st element of trace σ. For a trace σ to satisfy an obligation OC(α&), α& must be a subset of σ(0) or the rest of the trace must satisfy the reparation C; thus for the obligation to be satisfied all the atomic actions in α& must be present in the first set of the sequence. For prohibitions the converse is required, e.g. not all the actions of α& are executed in the first step. In order to enable conflict analysis, we start by adding deontic information in an additional trace, giving two parallel traces — a trace of actions (σ) and a trace of deontic notions (σd). Similar to σ, σd is defined as a sequence of sets whose elements are from the set Da, which is defined as {Oa | a ∈ A} ∪ {Fa | a ∈ A} ∪ {Pa | a ∈ A}, where Oa stands for the obligation to do a, Fa stands for the prohibition to do a and Pa for the permission to do a. Also, since conflicts may result in sequences of finite behaviour which cannot be extended (due to the conflict), we reinterpret the semantics over finite traces. A conflict may result in reaching a state where we have only the option of violating the contract; thus any infinite trace which leads to this conflicting state will not be accepted by the semantics. We need to be able to check that a finite trace has not yet violated the contract and then check if the following state is conflicting. Furthermore, if α is a set of concurrent atomic actions then we will use Oα to denote the set {Oa | a ∈ α}. Note that the semantics is given in the form σ, σd ⊨ C, which says that an action trace σ and a trace with deontic information σd satisfy a contract C. We show the trace semantics for the obligation as an example:
⟨⟩, ⟨⟩ ⊨ OC(α&)
(β : σ), (βd : σd) ⊨ OC(α&)   if (α& ⊆ β and Oα& ⊆ βd) or σ, σd ⊨ C

Note that in the above, pattern matching is used to split the traces into the heads (β and βd) and the tails (σ and σd). The above says that empty traces satisfy an obligation, while non-empty traces satisfy the contract if either (i) the obligation is satisfied, and the obligation is registered in the deontic trace; or (ii) the reparation (CTD) is satisfied by the remainder of the trace. See [1] for more details.

Conflict Analysis. Conflicts in contracts arise for four different reasons:1 (i) obligation and prohibition on the same action; (ii) permission and prohibition on the same action; (iii) obligation to perform mutually exclusive actions; and (iv) permission and obligation to perform mutually exclusive actions. With conflicts of the first type one would end up in a state where performing any action leads to a violation of the contract. The second conflict type results in traces
1 By tabulating all combinations of the deontic operators, one finds that there are two basic underlying cases of conflict — concurrent obligation and prohibition, and concurrent permission and prohibition. Adding constraints on the concurrency of actions, one can identify the additional two cases. More complex temporal constraints on actions may give rise to others, which can, however, also be reduced to these basic four cases.
which also violate the contract even though permissions cannot be broken, since the deontic information is kept in the semantics. The remaining two cases correspond to mutually exclusive actions. Freedom from conflict can be defined formally as follows: Given a trace σd of a contract C, let D, D′ ⊆ σd(i) (with i ≥ 0). We say that D is in conflict with D′ iff there exists at least one element e ∈ D such that:

e = Oa ∧ (Fa ∈ D′ ∨ (Pb ∈ D′ ∧ a#b) ∨ (Ob ∈ D′ ∧ a#b)) or
e = Pa ∧ (Fa ∈ D′ ∨ (Pb ∈ D′ ∧ a#b) ∨ (Ob ∈ D′ ∧ a#b)) or
e = Fa ∧ (Pa ∈ D′ ∨ Oa ∈ D′)

A contract C is said to be conflict-free if for all traces σ and σd such that σ, σd ⊨ C, and for any D, D′ ⊆ σd(i) (0 ≤ i ≤ len(σd)), D and D′ are not in conflict.

As an example, let us consider the contract C = [a]O(b + c) ∧ [b]F(b), stipulating the obligation of the choice of doing b or c after an a, and the prohibition of doing b if previously a b has been done. We have that C is not conflict-free, since for σ = ⟨{a, b}, {b}⟩ and σd = ⟨∅, {Ob, Oc, Fb}⟩ with σ, σd ⊨ C, there are D, D′ ⊆ σd(1) such that D and D′ are in conflict. To see this, let us take D = {Ob, Oc} and e = Ob. We then have that for D′ = {Fb}, Fb ∈ D′ (satisfying the first line of the above definition).

By unwinding a CL formula according to the finite trace semantics, we create an automaton which accepts all non-violating traces, and such that any trace resulting in a violation ends up in a violating state. Furthermore, we label the states of the automaton with the deontic information provided in σd, so we can ensure that a contract is conflict-free simply through the analysis of the resulting reachable states (non-violating states). States of the automaton contain a set of formulae still to be satisfied, following the standard sub-formula construction (e.g., as for CTL). Each transition is labelled with the set of actions that are to be performed in order to move along the transition. Once the automaton is generated we can check for the four types of conflicts on all the states. If there is a conflict of type (i) or (iii), then all transitions out of the state go to a special violation state. In general we might need to generate all possible transitions before processing each sub-formula, resulting in a big automaton. In practice, we improve the algorithm in such a way that we create all and only the required transitions, reducing the size considerably. Conflict analysis can also be done on-the-fly without the need to create the complete automaton. One can process the states without storing the transitions and store only satisfied subformulae (for termination); in this manner, memory issues are reduced since only a part of the automaton is stored in memory.
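The pairwise conflict test of the definition above is straightforward to implement; the following Java sketch (our encoding, not CLAN's internals) represents deontic notions as tagged strings and reproduces the three cases, including the example D = {Ob, Oc}, D′ = {Fb}.

import java.util.Set;

// Deontic notions encoded as "O:a", "P:a", "F:a"; mutex holds the pairs of
// mutually exclusive actions (a#b), each pair as a two-element set.
public class ConflictCheck {
    static boolean exclusive(String a, String b, Set<Set<String>> mutex) {
        return mutex.contains(Set.of(a, b));
    }

    // Does some e in d conflict with d2, per the three cases of the definition?
    static boolean inConflict(Set<String> d, Set<String> d2, Set<Set<String>> mutex) {
        for (String e : d) {
            String mode = e.substring(0, 1), a = e.substring(2);
            for (String f : d2) {
                String mode2 = f.substring(0, 1), b = f.substring(2);
                boolean same = a.equals(b);
                boolean excl = !same && exclusive(a, b, mutex);
                if (mode.equals("O") || mode.equals("P")) {
                    // O_a / P_a conflict with F_a, or with P_b / O_b when a#b
                    if ((mode2.equals("F") && same)
                            || ((mode2.equals("P") || mode2.equals("O")) && excl))
                        return true;
                } else { // mode F: F_a conflicts with P_a or O_a
                    if ((mode2.equals("P") || mode2.equals("O")) && same)
                        return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The example above: D = {O_b, O_c} and D' = {F_b} are in conflict.
        System.out.println(inConflict(Set.of("O:b", "O:c"), Set.of("F:b"), Set.of()));
        // prints: true
    }
}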
2 A Tool for Contract Analysis

CLAN2 is a tool for the detection of normative conflicts in CL contracts, enabling: (i) the automatic analysis of the contract for normative conflicts; and (ii) the automatic generation of a monitor for the contract. The analyses are particularly useful when the contract is being written (enabling the discovery of undesired conflicts), before adhering to a given contract (to ensure unambiguous enforcement of the contract), and during contract enforcement (monitoring).
2 CLAN can be downloaded from http://www.cs.um.edu.mt/~svrg/Tools/CLTool
Fig. 1. (a) Screen shot of the tool; (b) Automaton generated for [c]O(b) ∧ [a]F(b)
The core of CLAN is implemented in Java, consisting of 700 lines of code. This does not include the additional code, over and above the conflict discovery algorithm, for the graphical user interface (a screen shot can be seen in Fig. 1-(a)). CL contracts and additional information (such as which actions are mutually exclusive) are given to the tool, which then performs the conflict analysis. Upon discovery of a conflict, it gives a counter-example trace. The tool also offers an aid to analyse traces, and the possibility of visualising the generated automaton, such as the one shown in Fig. 1-(b). CLAN has been used to analyse a large contract resulting in a graph with 64,000 states, consuming approximately 700 MB of memory; the analysis took around 94 minutes. The analysis seems to scale linearly in memory and polynomially in the number of states. The complexity of the automaton increases exponentially with the number of actions, since all the possible combinations generating concurrent actions must be considered. We are currently working on optimising the analysis to avoid such an exponential state explosion.
3 Case Study

We briefly present here a portion of a small case study, starting from a draft contract written in English, which is formalised in CL and analysed using CLAN. The full example can be found in [1]. The case study concerns a contract between an airline company
and a company taking care of the ground crew (mainly the check-in process). Some clauses of the contract, expressed in English and CL, are given below:

1. The ground crew is obliged to open the check-in desk and request the passenger manifest two hours before the flight leaves.
[1∗][twoHBefore]OO(issueFine)(openCheckIn & requestInfo)

2. After the check-in desk is opened the check-in crew is obliged to initiate the check-in process with any customer present by checking that the passport details match what is written on the ticket and that the luggage is within the weight limits. Then they are obliged to issue the boarding pass.
[1∗][openCheckIn][1∗](O(correctDetails & luggageInLimit) ∧ [correctDetails & luggageInLimit]OO(issueFine)(issueBoardingCard))

3. The ground crew is obliged to close the check-in desk 20 minutes before the flight is due to leave and not before.
([1∗][20mBefore]OO(issueFine)(closeCheckIn)) ∧ ([20mBefore∗]FO(issueFine)(closeCheckIn))

4. If any of the above obligations and prohibitions are violated, a fine is to be paid.
[1∗][closeCheckIn][1∗](FO(issueFine)(openCheckIn) ∧ FO(issueFine)(issueBoardingCard))
On an example of this size, the tool gives practically instantaneous results, identifying conflicts such as the concurrent obligation and prohibition to perform the action issueBoardingCard, together with a counter-example trace. Looking at clause 2, once the crew opens the check-in desk, they are always obliged to issue a boarding pass if the client has the correct details. However, according to clause 4 it is prohibited to issue a boarding pass once the check-in desk is closed. These two clauses are in conflict once the check-in desk is closed and a client arrives at the desk with the correct details. To fix this problem one has to change clause 2 so that after the check-in desk is opened, the ground crew is obliged to issue the boarding pass only as long as the desk has not been closed. In the full case study, other conflicts due to mutual exclusion are also found and fixed in the English contract.
4 Conclusions

The analysis of contracts for the discovery of potential conflicts can be crucial to enable safe and dependable contract adoption and composition at runtime. In this paper we have presented CLAN, a tool for the automatic detection of conflicting clauses in contracts written in the deontic language CL. Currently the tool only provides the functionality of detecting conflicts. However, many other analyses could be performed by slightly modifying the underlying algorithm, for instance the generation of a model from the contract which can be processed by a model checker. Furthermore, other contract analysis techniques are planned to be added to the tool to enable analysis for overlapping, superfluous and unreachable sub-clauses of a contract. See [1] for related work.
References
1. Fenech, S., Pace, G.J., Schneider, G.: Automatic Conflict Detection on Contracts. In: Leucker, M., Morgan, C. (eds.) ICTAC 2009. LNCS, vol. 5684, pp. 200–214. Springer, Heidelberg (2009)
2. Kyas, M., Prisacariu, C., Schneider, G.: Run-time monitoring of electronic contracts. In: Cha, S., Choi, J.-Y., Kim, M., Lee, I., Viswanathan, M. (eds.) ATVA 2008. LNCS, vol. 5311, pp. 397–407. Springer, Heidelberg (2008)
3. Prisacariu, C., Schneider, G.: A Formal Language for Electronic Contracts. In: Bonsangue, M.M., Johnsen, E.B. (eds.) FMOODS 2007. LNCS, vol. 4468, pp. 174–189. Springer, Heidelberg (2007)
UnitCheck: Unit Testing and Model Checking Combined

Michal Kebrt and Ondřej Šerý

Charles University in Prague
Malostranské náměstí 25, 118 00 Prague 1, Czech Republic
[email protected],
[email protected] http://dsrg.mff.cuni.cz
Abstract. Code model checking is a rapidly advancing research topic. However, apart from very constrained scenarios (e.g., verification of device drivers by Slam), code model checking tools are not widely used in the general software development process. We believe that this could be changed if developers could use the tools in the same way they already use testing tools. In this paper, we present the UnitCheck tool, which enhances standard unit testing of Java code with model checking. A developer familiar with unit testing can apply the tool to standard unit test scenarios and benefit from the exhaustive traversal performed by a code model checker, which is employed inside UnitCheck. The UnitCheck plugin for Eclipse presents the checking results in a convenient way known from unit testing, while also providing a verbose output for expert users.
1 Introduction
In recent years, the field of code model checking has advanced significantly. There exist a number of code model checkers targeting mainstream programming languages such as C, Java, and C# (e.g., Slam [2], CBMC [6], Blast [11], Java PathFinder [15], and MoonWalker [7]). In spite of this fact, the adoption of code model checking technologies in the industrial software development process is still very slow. This is caused by two main reasons: (i) limited scalability to large software, and (ii) missing tool-supported integration into the development process. The current model checking tools can handle programs up to tens of KLOC and often require manual simplifications of the code under analysis [13]. Unfortunately, such a program size is still several orders of magnitude smaller than the size of many industrial projects.
This work was partially supported by the Czech Academy of Sciences project 1ET400300504, and the Q-ImPrESS research project (FP7-215013) by the European Union under the Information and Communication Technologies priority of the Seventh Research Framework Programme.
Apart from the scalability issues, there is virtually no support for integration of the code model checkers into the development process. Although some tools feature a user interface in the form of a plugin for a mainstream IDE (e.g., SatAbs [5]), creation of a particular checking scenario is often not discussed or supported in any way. A notable exception is Slam and its successful application in the very specific domain of kernel device drivers. These two obstacles might be overcome by employing code model checking in a way similar to unit testing – we use the term unit checking, first proposed in [10]. Unit testing is widely used in industry and developers are familiar with writing test suites. Providing model checking tools with a similar interface would allow developers to benefit directly from model checking technology (e.g., the exploration of all thread interleavings) without changing their habits. Moreover, applying model checking to smaller code units also helps to avoid the state explosion problem, the main issue of model checking tools.

Goals and structure of the paper. We present the UnitCheck tool, which allows for creation, execution and evaluation of checking scenarios using the Java PathFinder model checker (JPF). UnitCheck accepts standard JUnit tests [1] and exhaustively explores the reachable state space including all admissible thread interleavings. Moreover, the checking scenarios might feature nondeterministic choices, in which case all possible outcomes are examined. The tool is integrated into the Eclipse IDE in a similar way as the JUnit framework, and an Ant task is also provided. As a result, users familiar with unit testing using JUnit might immediately start using UnitCheck.

public class Account {
  private double balance = 0;

  public void deposit(double a) {
    balance = balance + a; // (1)
  }

  public void withdraw(double a) {
    balance = balance - a;
  }

  public double getBalance() {
    return balance;
  }
}

public class Banker implements Runnable {
  private Account account;
  private double amount;
  private int cnt;

  // Constructor reconstructed from the usage in the tests below.
  public Banker(Account account, double amount, int cnt) {
    this.account = account;
    this.amount = amount;
    this.cnt = cnt;
  }

  @Override
  public void run() {
    for (int i = 0; i < cnt; ++i) {
      account.deposit(amount);
    }
  }
}

@Test
public void testDepositInThreads() throws InterruptedException {
  Account account = new Account();
  Thread t1 = new Thread(new Banker(account, 5, 5));
  Thread t2 = new Thread(new Banker(account, 10, 5));
  t1.start();
  t2.start();
  t1.join();
  t2.join();
  assertEquals(account.getBalance(), 25 + 50, 0);
}

@Test
public void testDepositWithdraw() {
  Account account = new Account();
  int income = Verify.getInt(0, 10);
  int outcome = Verify.getInt(0, 10);

  account.deposit(income);
  assertEquals(account.getBalance(), income, 0);
  account.withdraw(outcome);
  assertEquals(account.getBalance(), Math.max(income - outcome, 0), 0);
}

Fig. 1. Tests that benefit from UnitCheck's features
Test example. Examples of two JUnit tests that would benefit from the analysis performed by UnitCheck (in contrast to standard unit testing) are listed in Figure 1. The code under test, on the left, comprises bank accounts and bankers that sequentially deposit money to an account. The first test creates two bankers for the same account, executes them in parallel, and checks the total balance when they are finished. UnitCheck reports a test failure because the line marked with (1) is not synchronized. Therefore, with a certain thread interleaving of the two bankers the total balance will not be correct due to the race condition. In most cases, JUnit misses this bug because it uses only one thread interleaving. The second test demonstrates the use of a random generator (the Verify class) to conveniently specify the range of values to be used in the test. UnitCheck exhaustively examines all values in the range and discovers an error, i.e., the negative account balance. When using standard unit testing, the test designer could either use a pseudorandom number generator (the Random class) and take the risk of missing an error, or explicitly loop through all possible values, thus obfuscating the testing code.
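For completeness, one simple way to make the first test pass (our suggestion, not something prescribed by the paper) is to serialise the read-modify-write on the balance, e.g. by declaring the update method synchronized:

// With this change the increment at line (1) becomes atomic and
// UnitCheck finds no interleaving that violates the assertion.
public synchronized void deposit(double a) {
    balance = balance + a;
}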
2 Tool
In this section, the internal architecture of UnitCheck is described. Additionally, we discuss three user interfaces to UnitCheck which can be used to employ the tool in the development process. 2.1
Architecture
An overview of the UnitCheck’s architecture is depicted in Figure 2. The core module of UnitCheck, the actual integration of JPF and JUnit, is enclosed in the central box. It is compiled into a Java library so that it can be easily embedded into other Java applications (e.g., into an IDE). As an input, the core module takes an application under analysis, the JUnit-compliant test cases, and optionally additional properties to fine-tune JPF. The analysis is then driven and monitored via the UnitCheckListener interface. It is important to note that neither JPF nor JUnit functionality and structures are directly exposed outside the core. The UnitCheckListener interface hides all JPF and JUnit specific details. This solution brings a couple of advantages. (i) Extensions (e.g., Eclipse plugin) implement only the single (and simple) listener interface. (ii) In future, both JPF and JUnit can be replaced with similar tools without modifying existing extensions. Inside the core, UnitCheckListener is built upon two interfaces – JPF’s SearchListener and JUnit’s RunListener. SearchListener notifies about property violations (e.g., deadlocks) and provides complete execution history leading to a violation. RunListener informs about assertion violations and other uncaught exceptions. UnitCheck processes reports from both listeners and provides them in a unified form to higher levels through UnitCheckListener.
[Diagram omitted: the architecture figure shows the Eclipse IDE, command-line tool and Ant task front-ends built on the UnitCheckListener interface, with the JUnit runner, the tests and the application under analysis executed inside the JPF virtual machine, and the TestReportListener peer bridging to the host VM.]

Fig. 2. JPF and JUnit integration
When analyzing the test cases, two Java virtual machines are employed. The first one is the host virtual machine, in which UnitCheck itself and the underlying JPF are executed. The second one is JPF, a special kind of virtual machine, which executes JUnit. Subsequently, JUnit runs the input test cases (in Figure 2, the code executed inside the JPF virtual machine is explicitly marked). The information about test case progress provided by JUnit's RunListener interface is available only in the JPF virtual machine. To make this information accessible in the host virtual machine, JPF's Model Java Interface (MJI) API is used. It allows parts of the application under analysis to be executed in the host virtual machine instead of the JPF virtual machine. Each class that is to be executed in the host VM has a corresponding peer counterpart. This mechanism is used for the TestReportListener class.
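To illustrate the peer mechanism (with hypothetical names; the paper does not show the actual UnitCheck peer code), JPF locates a native peer by the JPF_<mangled class name> naming convention and executes its methods in the host VM instead of interpreting their bytecode:

import gov.nasa.jpf.annotation.MJI;
import gov.nasa.jpf.vm.MJIEnv;

// Hypothetical peer for a class unitcheck.TestReportListener: the method
// below runs in the host VM whenever the model class calls
// reportTestStarted(String) inside the JPF VM.
public class JPF_unitcheck_TestReportListener {
    @MJI
    public static void reportTestStarted__Ljava_lang_String_2__V(
            MJIEnv env, int objRef, int testNameRef) {
        // Read the String argument out of the JPF heap.
        String testName = env.getStringObject(testNameRef);
        System.out.println("[host VM] test started: " + testName);
    }
}

Note that the @MJI annotation and the JNI-style name mangling shown here belong to recent JPF versions; the 2009-era API differed in details.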
User Interface
Currently, there are three different extensions that provide user interface for checking JUnit tests using UnitCheck – simple command-line application, Ant task, and Eclipse plugin. The interfaces are designed to make their usage as close to the usage of the corresponding JUnit tools as possible. As an example, the Eclipse plugin provides a user interface very similar to the JUnit plugin for Eclipse, including test summaries, progress reports, and run configurations. On the other hand, the tools provide also a verbose output for the expert users which are familiar with model checking tools. In addition to the easy-to-understand
UnitCheck: Unit Testing and Model Checking Combined
101
JUnit-like result summaries, the Eclipse plugin provides also a navigable panel with a detailed error trace obtained from JPF with different levels of verbosity.1
3
Related work
The notion of unit checking was first used in [10]. The authors study the problem of symbolic model checking of code fragments (e.g., individual functions) in separation. In a similar vein, we use the term unit checking in parallel to unit testing. However, the focus of UnitCheck is in providing users with tool support for integration of model checking into the software development process. To our knowledge, the Slam project [2] is by far the most successful application of code model checking in real-life software development process. Nevertheless, its success stems from the very constrained domain of device drivers, where the environment is fixed and known in advance by the tool’s developers. In contrast, UnitCheck offers benefits of code model checking in a convenient (and familiar) interface to developers of general purpose Java software. Another related project is Chess, which is a testing tool that can execute the target .NET or Win32 program in all relevant thread interleavings. Chess comes in a form of a plugin into Microsoft Visual Studio and can be used to execute existing unit tests. As well as with UnitCheck, the user of Chess is not forced to change his/her habits with unit testing and gets the benefit of test execution under all relevant thread interleaving for free. In contrast to UnitCheck, Chess cannot cope with random values in tests, because it uses a layer over the scheduler-relevant API calls. The presence of a random event in a test would result in the loss of error reproducibility. Orthogonal to our work is the progress on generating test inputs for unit tests for achieving high code coverage [9,16,17]. To name one, the Pex tool [14] uses symbolic execution and an automated theorem prover Z3 [12] to automatically produce a small test suite with high code coverage for a .NET program. We believe that similar techniques can be used in synergy with unit checking. There are other approaches for assuring program correctness than code model checking. Static analysis tools (e.g., Splint [8]) are easy to use and scale well. However, there is typically a trade off between amount of false positives and completeness of such analysis. The design-by-contract paradigm (e.g., JML [4], Spec# [3]) relies on user provided code annotations. Specifying these annotations is a demanding task that requires an expertise in formal methods.
4 Conclusion
We presented the UnitCheck tool, which brings the benefits of code model checking to the unit testing setting developers are familiar with. Of course, not all tests are amenable to unit checking. Only tests for which standard testing is not complete (i.e., tests that feature random values or concurrency) would benefit from exhaustive traversal using UnitCheck.2 As UnitCheck accepts standard JUnit tests, developers can seamlessly switch among the testing engines as necessary.

1 The UnitCheck tool and all three user interfaces are available for download at http://aiya.ms.mff.cuni.cz/unitchecking
2 Of course, only the (increasing) portion of the standard Java libraries supported by JPF can be used in the tests.
References

1. JUnit testing framework, http://www.junit.org
2. Ball, T., Bounimova, E., Cook, B., Levin, V., Lichtenberg, J., McGarvey, C., Ondrusek, B., Rajamani, S.K., Ustuner, A.: Thorough static analysis of device drivers. SIGOPS Oper. Syst. Rev. 40(4), 73–85 (2006)
3. Barnett, M., DeLine, R., Fähndrich, M., Jacobs, B., Leino, K.R.M., Schulte, W., Venter, H.: The Spec# programming system: Challenges and directions. In: Meyer, B., Woodcock, J. (eds.) VSTTE 2005. LNCS, vol. 4171, pp. 144–152. Springer, Heidelberg (2008)
4. Chalin, P., Kiniry, J.R., Leavens, G.T., Poll, E.: Beyond assertions: Advanced specification and verification with JML and ESC/Java2. In: de Boer, F.S., Bonsangue, M.M., Graf, S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 342–363. Springer, Heidelberg (2006)
5. Clarke, E., Kröning, D., Sharygina, N., Yorav, K.: SATABS: SAT-based predicate abstraction for ANSI-C. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 570–574. Springer, Heidelberg (2005)
6. Clarke, E., Kröning, D., Lerda, F.: A tool for checking ANSI-C programs. In: Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp. 168–176. Springer, Heidelberg (2004)
7. de Brugh, N.H.M.A., Nguyen, V.Y., Ruys, T.C.: MoonWalker: Verification of .NET programs. In: Kowalewski, S., Philippou, A. (eds.) TACAS 2009. LNCS, vol. 5505, pp. 170–173. Springer, Heidelberg (2009)
8. Evans, D., Larochelle, D.: Improving security using extensible lightweight static analysis. IEEE Software 19(1), 42–51 (2002)
9. Godefroid, P., Klarlund, N., Sen, K.: DART: Directed automated random testing. In: PLDI 2005: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 213–223. ACM, New York (2005)
10. Gunter, E.L., Peled, D.: Unit checking: Symbolic model checking for a unit of code. In: Dershowitz, N. (ed.) Verification: Theory and Practice. LNCS, vol. 2772, pp. 548–567. Springer, Heidelberg (2004)
11. Henzinger, T.A., Jhala, R., Majumdar, R., Sutre, G.: Lazy abstraction. SIGPLAN Not. 37(1), 58–70 (2002)
12. de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008)
13. Mühlberg, J.T., Lüttgen, G.: Blasting Linux code. In: Brim, L., Haverkort, B.R., Leucker, M., van de Pol, J. (eds.) FMICS 2006 and PDMC 2006. LNCS, vol. 4346, pp. 211–226. Springer, Heidelberg (2007)
14. Tillmann, N., de Halleux, J.: Pex – white box test generation for .NET. In: 2nd International Conference on Tests and Proofs, April 2008, pp. 134–153 (2008)
15. Visser, W., Havelund, K., Brat, G., Park, S., Lerda, F.: Model checking programs. Automated Software Engineering 10(2), 203–232 (2003)
16. Visser, W., Păsăreanu, C.S., Khurshid, S.: Test input generation with Java PathFinder. In: Proceedings of the International Symposium on Software Testing and Analysis (ISSTA 2004), pp. 97–107. ACM, New York (2004)
17. Xie, T., Marinov, D., Schulte, W., Notkin, D.: Symstra: A framework for generating object-oriented unit tests using symbolic execution. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 365–381. Springer, Heidelberg (2005)
LTL Model Checking of Time-Inhomogeneous Markov Chains

Taolue Chen¹, Tingting Han²,³, Joost-Pieter Katoen²,³, and Alexandru Mereacre²

¹ Design and Analysis of Communication Systems, University of Twente, The Netherlands
² Software Modelling and Verification, RWTH Aachen University, Germany
³ Formal Methods and Tools, University of Twente, The Netherlands
Abstract. We investigate the problem of verifying linear-time properties against inhomogeneous continuous-time Markov chains (ICTMCs). A fundamental question we address is how to compute reachability probabilities. We consider two variants: time-bounded and time-unbounded reachability. It turns out that both can be characterized as the least solution of a system of integral equations. We show that for the time-bounded case, the obtained integral equations can be transformed into a system of ordinary differential equations; for the time-unbounded case, we identify two sufficient conditions, namely the eventually periodic assumption and the eventually uniform assumption, under which the problem can be reduced to solving a time-bounded reachability problem for the ICTMC and a reachability problem for a DTMC. These results provide the basis for a model checking algorithm for LTL. Under the eventually stable assumption, we show how to compute the probability of the set of ICTMC paths which satisfy a given LTL formula. By an automata-based approach, we reduce this problem to the previously established results for reachability problems.
1 Introduction

Continuous-time Markov chains (CTMCs) are one of the most important models in performance and dependability analysis. They are exploited in a broad range of applications, and constitute the underlying semantic model of a plethora of modeling formalisms for real-time probabilistic systems such as Markovian queueing networks, stochastic Petri nets, stochastic variants of process algebras, and, more recently, calculi for systems biology. These Markov chains are typically homogeneous, i.e., the rates that determine the speed of changing state as well as the probabilistic nature of mode transitions are constant. However, in some situations constant rates do not adequately model real behaviors. This applies, e.g., to failure rates of hardware components [10] (which usually depend on the component's age), battery depletion [7] (where the power extraction rate depends non-linearly on the remaining amount of energy), and random phenomena that are subject to environmental influences. In these circumstances, Markov models
Financially supported by the DFG research training group 1295 AlgoSyn, the Dutch Bsik project BRICKS, the NWO project QUPES, the EU project QUASIMODO, and the SRO DSN project of CTIT, University of Twente.
with inhomogeneous rates, i.e., rates that are time-varying functions, are much more appropriate [17]. Temporal logics and accompanying model-checking algorithms have been developed for discrete-time Markov chains (DTMCs for short) against linear-time properties [8,9] and branching-time properties [11], and for CTMCs against branching-time properties [2,3] and linear real-time properties [6]. Some of them have resulted in a number of successful model checkers such as PRISM [12] and MRMC [14]. However, the verification of time-inhomogeneous CTMCs (ICTMCs) has – to the best of our knowledge – not yet been investigated in depth, with the notable exception of [15], which considered model checking a simple stochastic variant of Hennessy-Milner logic (without fixed points) for piecewise-constant ICTMCs. The main aim of the current paper is to fill this gap by considering model checking of ICTMCs w.r.t. linear-time properties.

Among the most fundamental linear-time properties are reachability properties. Here we address two variants: time-bounded and time-unbounded reachability. The former asks, given a set of goal states and a time bound, what the probability is of the paths of a given ICTMC that reach the goal states within the time bound. Time-unbounded reachability is similar except that the time bound is infinity. To solve both of them, we first provide a characterization in terms of the least solution of a system of integral equations. This can be regarded as a generalization of similar results for CTMCs [2,3] to ICTMCs. Furthermore, we show that for the time-bounded case, the obtained integral equations can be transformed into a system of (homogeneous) ordinary differential equations, which often enjoys an efficient numerical solution; for the time-unbounded case, this is generally not possible and one has to solve the system of integral equations directly, which is inefficient and numerically unstable. To remedy this deficiency, we identify two sufficient conditions, i.e., eventual periodicity and eventual uniformity, under which the problem can be reduced to the time-bounded reachability problem for ICTMCs and a (time-unbounded) reachability problem for DTMCs, and thus can be solved efficiently. These classes subsume some interesting and important subclasses of ICTMCs, such as the piecewise-constant case studied in [15] and ICTMCs with rate functions representing Weibull failure rates. The latter distributions are important for modeling hazards and failures, and are popular in, e.g., reliability engineering.

We then turn to model checking ICTMCs against LTL. Strictly speaking, we focus on computing the probability of the set of paths of a given ICTMC which satisfy the LTL formula. One of the main difficulties here, compared to CTMCs, is that in an ICTMC the rates between states are functions of time instead of constants, and thus the topological structure of an ICTMC, when considered as a digraph, is not stable. To circumvent this problem, we identify a condition, the eventually stable assumption, which intuitively means that after a (finite) time the topological structure of the ICTMC does not change any more. Under this assumption, we can adapt the standard automata-based approach. A crucial ingredient is that we can construct a corresponding separated Büchi automaton from an LTL formula,1 based on which one can build the product of the given ICTMC and the separated Büchi automaton while obtaining a well-defined stochastic process. We then reduce the LTL model checking problem to the previously established results for reachability problems.
1 Note that one can also use deterministic automata, but that would incur an extra (unnecessary) exponential blowup.
2 Preliminaries

Given a set $S$, let $Distr(S)$ denote the set of probability distributions over $S$.

Definition 1 (ICTMC). A (labeled) inhomogeneous continuous-time Markov chain (ICTMC) is a tuple $\mathcal{C} = (S, AP, L, \alpha, R(t))$, where $S$ is a finite set of states; $AP$ is a finite set of atomic propositions; $L : S \to 2^{AP}$ is a labeling function; $\alpha \in Distr(S)$ is an initial distribution; and $R(t) : S \times S \times \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ is a rate matrix.

Let the diagonal matrix $E(t) = \mathrm{diag}[E_s(t)] \in \mathbb{R}_{\geq 0}^{n \times n}$, where $n = |S|$ and $E_s(t) : S \times \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ is defined as $E_s(t) = \sum_{s' \in S} R_{s,s'}(t)$ for all $s \in S$, i.e., $E_s(t)$ is the exit rate of state $s$ at time $t$. We require that all rates and exit rates, as functions of time $t$, are integrable. If all rates (and thus all exit rates) are constant, we obtain a CTMC. A state $s$ is absorbing if $R_{s,s'}(t) = 0$ for all $s' \neq s$.
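To make the definition concrete, the following is a minimal sketch of an ICTMC as a data structure in Python; all names, the example chain, and the use of NumPy are our own illustrative choices, not part of the paper:

```python
# A minimal ICTMC representation: because rates are time-varying, we store a
# callable t -> R(t) instead of a fixed matrix; exit rates E_s(t) are row sums.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set
import numpy as np

@dataclass
class ICTMC:
    states: List[str]
    ap: Set[str]                          # atomic propositions AP
    label: Dict[str, Set[str]]            # labeling function L : S -> 2^AP
    alpha: np.ndarray                     # initial distribution over states
    R: Callable[[float], np.ndarray]      # t -> rate matrix R(t) of shape (n, n)

    def exit_rate(self, s: int, t: float) -> float:
        """E_s(t) = sum over s' of R_{s,s'}(t)."""
        return float(self.R(t)[s].sum())

# Example: a two-state chain whose 0 -> 1 rate grows linearly with time.
chain = ICTMC(
    states=["s0", "s1"], ap={"a"}, label={"s0": {"a"}, "s1": set()},
    alpha=np.array([1.0, 0.0]),
    R=lambda t: np.array([[0.0, 1.0 + t], [2.0, 0.0]]),
)
```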
Semantics. An ICTMC induces a stochastic process. The probability to take a transition from $s$ to $s'$ at time $t$ within $\Delta t$ time units is given by:
$$\mathrm{Prob}(s \to s', t, \Delta t) = \int_0^{\Delta t} R_{s,s'}(t+\tau)\, e^{-\int_0^{\tau} E_s(t+v)\,dv}\, d\tau = \int_t^{t+\Delta t} R_{s,s'}(\tau)\, e^{-\int_t^{\tau} E_s(v)\,dv}\, d\tau.$$
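Since this jump probability is a one-dimensional integral of an integrable density, it can be approximated by standard quadrature. A hedged sketch with SciPy, reusing the illustrative ICTMC structure above (the nested quad calls are a simple but not especially fast choice):

```python
# Evaluate Prob(s -> s', t, Δt) numerically: the inner quad integrates the
# exit rate E_s over [0, τ], the outer quad integrates the jump density.
import numpy as np
from scipy.integrate import quad

def jump_prob(chain, s, s_prime, t, dt):
    def density(tau):
        cum_exit, _ = quad(lambda v: chain.exit_rate(s, t + v), 0.0, tau)
        return chain.R(t + tau)[s, s_prime] * np.exp(-cum_exit)
    prob, _ = quad(density, 0.0, dt)
    return prob

print(jump_prob(chain, 0, 1, t=0.0, dt=0.5))  # probability of one 0 -> 1 jump
```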
Definition 2 (Timed paths). Let $\mathcal{C}$ be an ICTMC. An infinite path starting at time $x$ is a sequence $\rho_x = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} s_2 \cdots$ such that for each $i \in \mathbb{N}$, $s_i \in S$, $t_i \in \mathbb{R}_{>0}$ and $R_{s_i,s_{i+1}}(t) > 0$ where $t = x + \sum_{j=0}^{i} t_j$. A finite path is a prefix of an infinite path ending in a state.

We will sometimes omit the subscript of $\rho_x$ if the starting time $x$ is irrelevant. Let $Paths^{\mathcal{C}}$ and $Paths^{\mathcal{C}}(s, x)$ denote the set of (finite and infinite) paths in $\mathcal{C}$ and those starting from state $s$ at time $x$, respectively. The superscript $\mathcal{C}$ is omitted whenever convenient. Let $\rho[n] := s_n$ be the $n$-th state of $\rho$ (if it exists) and $\rho\langle n\rangle := t_n$ the time spent in state $s_n$. Let $\rho_x@t$ be the state occupied in $\rho$ at time $t \in \mathbb{R}_{\geq 0}$, i.e., $\rho_x@t := \rho_x[n]$ where $n$ is the smallest index such that $x + \sum_{i=0}^{n} \rho_x\langle i\rangle > t$. We assume w.l.o.g. that the time spent in any state is strictly greater than 0. Let $\mathcal{I}$ denote the set of all nonempty intervals $I \subseteq \mathbb{R}_{\geq 0}$ and let $I \oplus t$ (resp. $I \ominus t$) denote $\{x + t \mid x \in I\}$ (resp. $\{x - t \mid x \in I \wedge x \geq t\}$).

The definition of a Borel space over paths through ICTMCs follows [3]. An ICTMC $\mathcal{C}$ with initial state $s_0$ and initial time $x$ yields a probability measure $\Pr^{\mathcal{C}}_{s_0,x}$ on paths as follows. Let $C_x(s_0, I_0, \ldots, I_{k-1}, s_k)$ denote the cylinder set consisting of all paths $\rho \in Paths(s_0, x)$ such that $\rho[i] = s_i$ ($i \leq k$) and $\rho\langle i\rangle \in I_i$ ($i < k$). $\mathcal{F}(Paths(s_0, x))$ is the smallest $\sigma$-algebra on $Paths(s_0, x)$ which contains all cylinder sets $C_x(s_0, I_0, \ldots, I_{k-1}, s_k)$ for all state sequences $(s_0, \ldots, s_k) \in S^{k+1}$ and $I_0, \ldots, I_{k-1} \in \mathcal{I}$. The probability measure $\Pr^{\mathcal{C}}_{s_0,x}$ on $\mathcal{F}(Paths(s_0, x))$ is the unique measure recursively defined by:
$$\Pr^{\mathcal{C}}_{s_0,x}\big(C_x(s_0, I_0, \ldots, I_{k-1}, s_k)\big) = \int_{I_0 \oplus x} R_{s_0,s_1}(\tau_0)\, e^{-\int_x^{\tau_0} E_{s_0}(v)\,dv} \cdot \Pr^{\mathcal{C}}_{s_1,\tau_0}\big(C_{\tau_0}(s_1, I_1, \ldots, I_{k-1}, s_k)\big)\, d\tau_0$$
Example 1. An example ICTMC is illustrated in Fig. 2(a), where $AP = \{a, b, c\}$ and the rate functions are $r_i(t)$ ($1 \leq i \leq 6$). In particular, the exit rate function of $s_1$ is $r_2(t) + r_3(t)$. The initial distribution is $\alpha(s_0) = 1$ and $\alpha(s) = 0$ for $s \neq s_0$. Possible rate functions are the ones depicted in Fig. 1.

Linear temporal logic. The set of linear temporal logic (LTL) formulae over a set of atomic propositions $AP$ is defined as follows:

Definition 3 (LTL syntax). Given a set of atomic propositions $AP$, ranged over by $a, b, \ldots$, the syntax of LTL formulae is defined by:
$$\varphi ::= \mathrm{tt} \mid a \mid \neg\varphi \mid \varphi \wedge \varphi \mid X\,\varphi \mid \varphi\, U\, \varphi.$$

The semantics of LTL for an ICTMC $\mathcal{C}$ is defined in the standard way by a satisfaction relation, denoted $\models$, which is the least relation $\models \subseteq Paths^{\mathcal{C}} \times \mathbb{R}_{\geq 0} \times \mathrm{LTL}$ (here we use LTL to denote the set of LTL formulae) satisfying:
- $(\rho, t) \models \mathrm{tt}$
- $(\rho, t) \models a$ iff $a \in L(\rho@t)$
- $(\rho, t) \models \neg\varphi$ iff $(\rho, t) \not\models \varphi$
- $(\rho, t) \models \varphi_1 \wedge \varphi_2$ iff $(\rho, t) \models \varphi_1$ and $(\rho, t) \models \varphi_2$
- $(\rho, t) \models X\,\varphi$ iff $\exists \Delta t \geq 0.\ (\rho, t+\Delta t) \models \varphi$ and $\rho[1] = \rho@(t+\Delta t)$
- $(\rho, t) \models \varphi_1\, U\, \varphi_2$ iff $\exists \Delta t \geq 0.\ (\rho, t+\Delta t) \models \varphi_2$ and $\forall t' < t+\Delta t.\ (\rho, t') \models \varphi_1$

We use $at_s \in AP$ as an atomic proposition which holds solely at state $s$. For $F \subseteq S$, we write $at_F$ for $\bigvee_{s \in F} at_s$. Let $Paths(s, x, \varphi) = \{\rho \in Paths(s, x) \mid (\rho, x) \models \varphi\}$. Note that a timed path $\rho = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots$ satisfies a formula $\varphi$ iff the "discrete part" of $\rho$, namely $s_0 s_1 s_2 \cdots$ $(= \rho[0]\rho[1]\rho[2]\cdots)$, satisfies $\varphi$. It can thus easily be shown that the set $Paths(s, x, \varphi)$ is measurable. We denote the probability measure of $Paths(s, x, \varphi)$ as $Prob(s, x, \varphi) = \Pr^{\mathcal{C}}_{s,x}(Paths(s, x, \varphi))$ and let $Prob^{\mathcal{C}}(\varphi) = \sum_{\alpha(s_0) > 0} \alpha(s_0) \cdot Prob(s_0, 0, \varphi)$ be the probability that ICTMC $\mathcal{C}$ satisfies $\varphi$.
3 Reachability Analysis

In this section, we tackle reachability problems for ICTMCs. We distinguish two variants: time-bounded and time-unbounded reachability. To solve both of them, we first give a characterization of $Prob(s, x, \Diamond^I at_F)$, namely the probability of the set of paths which, starting from state $s$ at time point $x$, reach a set of goal states $F \subseteq S$ within time interval $I$. This is done by resorting to a system of integral equations, which is a generalization of a similar characterization for CTMCs [3].

Proposition 1. Let $\mathcal{C} = (S, AP, L, \alpha, R(t))$ be an ICTMC with $s \in S$, $x \in \mathbb{R}_{\geq 0}$, $F \subseteq S$ and interval $I \subseteq \mathbb{R}_{\geq 0}$ with $T_1 = \inf I$ and $T_2 = \sup I$. The function $S \times \mathbb{R}_{\geq 0} \times \mathcal{I} \to [0, 1]$, $(s, x, I) \mapsto Prob(s, x, \Diamond^I at_F)$, is the least fixed point of the operator
$$\Omega : (S \times \mathbb{R}_{\geq 0} \times \mathcal{I} \to [0, 1]) \to (S \times \mathbb{R}_{\geq 0} \times \mathcal{I} \to [0, 1]),$$
where
$$\Omega(f)(s, x, I) = \begin{cases} \displaystyle\int_0^{T_2} \sum_{s' \in S} R_{s,s'}(x+\tau)\, e^{-\int_0^\tau E_s(x+v)\,dv} \cdot f(s', x+\tau, I \ominus \tau)\, d\tau, & \text{if } s \notin F \quad (1) \\[2mm] \displaystyle e^{-\int_0^{T_1} E_s(x+v)\,dv} + \int_0^{T_1} \sum_{s' \in S} R_{s,s'}(x+\tau)\, e^{-\int_0^\tau E_s(x+v)\,dv} \cdot f(s', x+\tau, I \ominus \tau)\, d\tau, & \text{if } s \in F \quad (2) \end{cases}$$
3.1 Time-Bounded Reachability

We now solve the time-bounded reachability problem: given an ICTMC $\mathcal{C}$, a set of goal states $F \subseteq S$ and a time bound $T \in \mathbb{R}_{\geq 0}$, compute $Prob(s, x, \Diamond^{\leq T} at_F)$, the probability of $Paths(s, x, \Diamond^{\leq T} at_F)$, which is the set of paths that reach $F$ within $T$ time units given the initial time $x$. To accomplish this, we first compute $Prob(s, x, \Diamond^{=T} at_F)$, where the slightly different property $\Diamond^{=T} at_F$, in contrast to $\Diamond^{\leq T} at_F$, requires that states in $F$ are reached at exactly time $T$. Note that $\Diamond^{\leq T} at_F$ and $\Diamond^{=T} at_F$ can also be written as $\Diamond^{[0,T]} at_F$ and $\Diamond^{[T,T]} at_F$, respectively, where $I = [0, T]$ or $I = [T, T]$ is a time interval. By instantiating (1) and (2) in Prop. 1, we obtain that
$$Prob(s, x, \Diamond^{=T} at_F) = \begin{cases} \displaystyle\int_0^{T} \sum_{s' \in S} R_{s,s'}(x+\tau)\, e^{-\int_0^\tau E_s(x+v)\,dv} \cdot Prob(s', x+\tau, \Diamond^{=T-\tau} at_F)\, d\tau, & \text{if } s \notin F \quad (3) \\[2mm] \displaystyle e^{-\int_0^{T} E_s(x+v)\,dv} + \int_0^{T} \sum_{s' \in S} R_{s,s'}(x+\tau)\, e^{-\int_0^\tau E_s(x+v)\,dv} \cdot Prob(s', x+\tau, \Diamond^{=T-\tau} at_F)\, d\tau, & \text{if } s \in F \quad (4) \end{cases}$$
Intuitively, (3) and (4) are justified as follows. If $s \notin F$, the probability of reaching an $F$-state from $s$ after exactly $T$ time units given the starting time $x$ equals the probability of reaching some direct successor $s'$ of $s$ in $\tau$ time units, multiplied by the probability of reaching an $F$-state from $s'$ in the remaining $T - \tau$ time units. If $s \in F$ at time $x$, then it can either stay in $s$ (i.e., delay) for $T$ time units (the first summand in (4)), or be regarded as a non-$F$ state and take a transition (the second summand in (4)).

We now address the problem of solving (3) and (4), read as a system of integral equations. We define $\Pi(x, T)$ as the matrix with entries $\Pi_{i,j}(x, T)$ denoting the probability of the set of paths starting from state $i$ at time $x$ and reaching state $j$ at time $x + T$. For any ICTMC, the following equation holds:
$$\Pi(x, T) = \underbrace{\int_0^{T} M(x, \tau)\, \Pi(x+\tau, T-\tau)\, d\tau}_{\text{Markovian jump}} + \underbrace{D(x, T)}_{\text{delay}} \qquad (5)$$
$M(x, T)$ is the probability density matrix, where $M_{i,j}(x, T) = R_{i,j}(x+T) \cdot e^{-\int_0^T E_i(x+v)\,dv}$ is the density to move from state $i$ to $j$ at exactly time $T$, and $D(x, T)$ is the diagonal delay probability matrix with $D_{i,i}(x, T) = e^{-\int_0^T E_i(x+v)\,dv}$.

We note that $\Pi(x, T)$ is actually the (equivalent) matrix form of (3) and (4). For (4), it follows directly that each of its summands has a counterpart in (5). For (3), note that $D(x, T)$ is a diagonal matrix where all the off-diagonal elements are 0 and that (3)
does not allow a delay transition from a non-$F$ state. This correspondence builds a half-bridge between $Prob(s, x, \Diamond^{=T} at_F)$ and $\Pi(x, T)$, whereas the following proposition completes the other half-bridge between $\Pi(x, T)$ and the transient probability vector $\pi(t)$ of ICTMCs:

Proposition 2. Given an ICTMC $\mathcal{C}$ with initial distribution $\alpha$ and rate matrix $R(t)$, $\Pi(0, t)$ and $\pi(t)$ satisfy the following two equations:
$$\pi(t) = \alpha \cdot \Pi(0, t), \qquad (6)$$
$$\frac{d\pi(t)}{dt} = \pi(t) \cdot Q(t), \quad \pi(0) = \alpha, \qquad (7)$$
where $Q(t) = R(t) - E(t)$ is the infinitesimal generator of $\mathcal{C}$.

Intuitively, this proposition implies that solving the system of integral equations for $\Pi(x, t)$ boils down to computing the transient probability vector $\pi(t)$, with each element $\pi_s(t)$ indicating the probability to be in state $s$ at time $t$ given the initial probability distribution $\alpha = \pi(0)$. The transient probability is specified by the system of ODEs (7), the celebrated Chapman-Kolmogorov equations. Given an ICTMC $\mathcal{C}$, let $\mathcal{C}[F]$ be the ICTMC obtained by making the states in $F$ absorbing in $\mathcal{C}$. We have the following theorem:

Theorem 1. For any ICTMC $\mathcal{C}$, $Prob^{\mathcal{C}}(s, x, \Diamond^{\leq T} at_F) = Prob^{\mathcal{C}[F]}(s, x, \Diamond^{=T} at_F)$.

To sum up, Propositions 1 and 2 together with Theorem 1 suggest that computing time-bounded reachability probabilities in an ICTMC can be done by first making the $F$-states absorbing (thus obtaining $\mathcal{C}[F]$) and then solving the system of homogeneous ODEs (7) for $\mathcal{C}[F]$. Using standard numerical approaches, e.g., the Euler or Runge-Kutta methods and their variants [16], this system of ODEs (i.e., the transient probability vector) can be solved.
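As a concrete illustration of this recipe, here is a hedged Python sketch (our own, using SciPy's solve_ivp; the example model and tolerances are illustrative) that integrates the Chapman-Kolmogorov equations (7) for $\mathcal{C}[F]$ and reads off the time-bounded reachability probability:

```python
# Time-bounded reachability via (7): the rows of F-states in R are already
# zeroed (F absorbing), so summing π(T) over F yields Prob(s, 0, ♦≤T at_F).
import numpy as np
from scipy.integrate import solve_ivp

def bounded_reachability(R, alpha, F, T):
    """R: t -> rate matrix of C[F]; alpha: initial distribution; F: goal indices."""
    def derivative(t, pi):
        Rt = R(t)
        Q = Rt - np.diag(Rt.sum(axis=1))      # Q(t) = R(t) - E(t)
        return pi @ Q                         # dπ/dt = π · Q(t)
    pi_T = solve_ivp(derivative, (0.0, T), alpha, rtol=1e-8, atol=1e-10).y[:, -1]
    return pi_T[sorted(F)].sum()

# Example: reach state 1 within T = 2, with state 1 made absorbing.
R_abs = lambda t: np.array([[0.0, 1.0 + t], [0.0, 0.0]])
print(bounded_reachability(R_abs, np.array([1.0, 0.0]), F={1}, T=2.0))
```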
3.2 Time-Unbounded Reachability

We now turn to the time-unbounded reachability problem, i.e., there is no constraint on the time to reach the $F$-states. Let $Prob(s, x, \Diamond at_F)$ denote the probability, starting from state $s$ at time $x$, to reach $F$ within the time interval $[0, \infty)$. Using Proposition 1, we can characterize $Prob(s, x, \Diamond at_F)$ as follows:
$$Prob(s, x, \Diamond at_F) = \begin{cases} \displaystyle\int_0^{\infty} \sum_{s' \in S} R_{s,s'}(x+\tau)\, e^{-\int_0^\tau E_s(x+v)\,dv} \cdot Prob(s', x+\tau, \Diamond at_F)\, d\tau, & \text{if } s \notin F \quad (8) \\[2mm] 1, & \text{if } s \in F \quad (9) \end{cases}$$
The case $s \in F$ is derived from (2): the probability to delay in an $F$-state for zero time units is 1, and the probability to leave an $F$-state (i.e., take a Markovian jump) in zero time units is 0. When $s \notin F$, Eq. (8) is similar to (3) except that there is no bound on the time to leave a state $s \notin F$. Note that, in contrast to the time-bounded case, it is in general not possible to reduce this system of integral equations to a system of ODEs. Since solving a system of integral equations is generally time-consuming and numerically unstable, we propose to investigate some special cases (subsets
Fig. 1. Eventually periodic assumption (left) and eventually stable assumption (right)
of ICTMCs) for which the reduction to ODEs is possible. Here we consider two such classes: eventually periodic ICTMCs and eventually uniform ICTMCs. Their common feature is that the rate functions of the given ICTMC exhibit regular behavior after some time $T$. This allows for computing time-unbounded reachability probabilities efficiently (e.g., via DTMCs). Hence the problem turns out to be reducible to computing time-bounded reachability probabilities with time bound $T$, which was tackled in the previous section, and reachability probabilities for DTMCs. Both of them, fortunately, enjoy efficient computational methods.

Eventually periodic assumption. We first consider eventually periodic ICTMCs.

Definition 4 (Eventually periodic assumption (EPA)). An ICTMC $\mathcal{C}$ is eventually periodic if there exists some time $P \in \mathbb{R}_{>0}$ such that for any two states $s, s' \in S$, there exist some $T_{s,s'} \in \mathbb{R}_{\geq 0}$ and $n_{s,s'} \in \mathbb{N}$ such that
$$R_{s,s'}(t) = \begin{cases} R^{(1)}_{s,s'}(t) & \text{if } t \leq T_{s,s'} \\ R^{(2)}_{s,s'}(t) & \text{if } t > T_{s,s'} \end{cases}$$
where $R^{(2)}_{s,s'}(t) = R^{(2)}_{s,s'}(t + n_{s,s'} \cdot P)$.

An example rate function under the EPA is illustrated in Fig. 1 (left). After time point $T_{s,s'}$, the function $R_{s,s'}(t)$ becomes periodic with period $n_{s,s'} \cdot P$, where $P$ is the "common factor" of all the periods of the rate functions $R_{s,s'}(t)$, for all $s, s' \in S$. For any ICTMC $\mathcal{C}$ satisfying the EPA, let $T_{EP} = \max_{s,s' \in S} T_{s,s'}$ and $P_{EP} = (\mathrm{lcm}_{s,s' \in S}\, n_{s,s'}) \cdot P$. Intuitively, $T_{EP}$ is the time from which on all rate functions are periodic, and $P_{EP}$ is the common period of all the periodic rate functions. For instance, suppose there are two rate functions with $R^{(2)}_{s_1,s_2}(t) = 2 + \cos(\frac{1}{2}t)$ and $R^{(2)}_{s_2,s_3}(t) = 3 - \sin(\frac{1}{3}t)$, and let $T_{s_1,s_2} = 10$, $T_{s_2,s_3} = 15$. Then $T_{EP} = \max\{10, 15\} = 15$, $P = \pi$, $n_{s_1,s_2} = 4$ and $n_{s_2,s_3} = 6$, and $P_{EP} = \mathrm{lcm}\{4, 6\} \cdot \pi = 12\pi$.
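The computation of $T_{EP}$ and $P_{EP}$ for this example is elementary; the following snippet (with illustrative names) reproduces it. Note that a common period of functions with periods $4\pi$ and $6\pi$ must be a common multiple of the individual periods, hence the lcm:

```python
# Reproduce T_EP and P_EP for the running example: T_EP is the latest
# stabilization time; the common period is lcm of the multipliers times P = π.
import math

T = {("s1", "s2"): 10, ("s2", "s3"): 15}   # T_{s,s'}: when periodicity starts
n = {("s1", "s2"): 4, ("s2", "s3"): 6}     # n_{s,s'}: period is n·P

T_EP = max(T.values())                     # 15
P_EP = math.lcm(*n.values()) * math.pi    # 12·π
print(T_EP, P_EP)
```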
Time-unbounded reachability probabilities for an ICTMC under the EPA can be computed according to Alg. 1, as justified by Theorem 2. Let us explain this in more detail. Due to (9), once $F$-states are reached, it is irrelevant how the paths continue. This justifies the model transformation from $\mathcal{C}$ to $\mathcal{C}[F]$. The reachability problem can be divided into two subproblems: (I) compute the probability to reach each state $s' \in S$ at exactly time $T_{EP}$ (the second $Prob$ in (10) below); and (II) compute the time-unbounded reachability from $s' \in S$ to $F$ (the third $Prob$ in (10)). In the following we focus on (II). Recall that $Prob(s, T_{EP}, \Diamond^{=P_{EP}} at_{s'})$ denotes the probability to reach $s'$ from $s$ after time $P_{EP}$, starting from time point $T_{EP}$. Since after time $T_{EP}$ all rate functions are periodic with period $P_{EP}$, it holds that $Prob(s, T_{EP}, \Diamond^{=P_{EP}} at_{s'}) = Prob(s, T_{EP} + n \cdot P_{EP}, \Diamond^{=P_{EP}} at_{s'})$ for all $n \in \mathbb{N}$. It then suffices to compute $Prob(s, T_{EP}, \Diamond^{=P_{EP}} at_{s'})$ for any $s, s' \in S$. Given the ICTMC $\mathcal{C}$ with state space $S$, we build a DTMC $\mathcal{D}_{\mathcal{C}} = (S, \mathbf{P})$ with $\mathbf{P}_{s,s'} = Prob^{\mathcal{C}}(s, T_{EP}, \Diamond^{=P_{EP}} at_{s'})$. Intuitively, $\mathbf{P}_{s,s'}$ is the one-step probability (one step here means one period) to move from $s$ to $s'$, and problem (II) is now reduced to computing the reachability probability from $s$ to the $F$-states in arbitrarily many steps (since the time-unbounded case is considered), i.e., $Prob^{\mathcal{D}_{\mathcal{C}[F]}}(s, \Diamond at_F)$. This can be done by standard methods, e.g., value iteration or solving a system of linear equations; see, among others, [4] (Ch. 10).

Theorem 2. Let $\mathcal{C} = (S, AP, L, \alpha, R(t))$ be an ICTMC satisfying the EPA with time $T_{EP}$ and period $P_{EP}$, $s \in S$ and $F \subseteq S$. Then:
$$Prob^{\mathcal{C}}_{EP}(s, 0, \Diamond at_F) = \sum_{s' \in S} Prob^{\mathcal{C}[F]}(s, 0, \Diamond^{=T_{EP}} at_{s'}) \cdot Prob^{\mathcal{D}_{\mathcal{C}[F]}}(s', \Diamond at_F) \qquad (10)$$
Remark 1. Sometimes we need to compute $Prob^{\mathcal{C}}_{EP}(s, x, \Diamond at_F)$ for an ICTMC $\mathcal{C} = (S, AP, L, \alpha, R(t))$, i.e., with starting time $x$ instead of 0. To accomplish this, we define an ICTMC $\mathcal{C}' = (S, AP, L, \alpha, R'(t))$ such that $R'(t) = R(t + x)$. It follows that $\mathcal{C}'$ still satisfies the EPA (with $T'_{EP} = T_{EP} - x$ if $x \leq T_{EP}$ and 0 otherwise, and $P'_{EP} = P_{EP}$) and $Prob^{\mathcal{C}}_{EP}(s, x, \Diamond at_F) = Prob^{\mathcal{C}'}_{EP}(s, 0, \Diamond at_F)$.

Algorithm 1. Time-unbounded reachability for ICTMCs satisfying the EPA
Require: ICTMC $\mathcal{C} = (S, AP, L, \alpha, R(t))$, EPA time $T_{EP}$, period $P_{EP}$
Ensure: $Prob^{\mathcal{C}}_{EP}(s, 0, \Diamond at_F)$
1: For any two states $s, s' \in S$ in $\mathcal{C}[F]$, compute the time-bounded reachability probability with time bound $T_{EP}$, starting from time point 0, i.e., $Prob^{\mathcal{C}[F]}(s, 0, \Diamond^{=T_{EP}} at_{s'})$;
2: For any two states $s, s' \in S$ in $\mathcal{C}[F]$, compute the time-bounded reachability probability with time bound $P_{EP}$, starting from time point $T_{EP}$, i.e., $Prob^{\mathcal{C}[F]}(s, T_{EP}, \Diamond^{=P_{EP}} at_{s'})$;
3: Construct a discrete-time Markov chain (DTMC for short) $\mathcal{D}_{\mathcal{C}[F]} = (S, \mathbf{P})$ with $\mathbf{P}_{s,s'} = Prob^{\mathcal{C}[F]}(s, T_{EP}, \Diamond^{=P_{EP}} at_{s'})$; denote the reachability probability from $s'$ to $F$ in $\mathcal{D}_{\mathcal{C}[F]}$ by $Prob^{\mathcal{D}_{\mathcal{C}[F]}}(s', \Diamond at_F)$;
4: Return $\sum_{s' \in S} Prob^{\mathcal{C}[F]}(s, 0, \Diamond^{=T_{EP}} at_{s'}) \cdot Prob^{\mathcal{D}_{\mathcal{C}[F]}}(s', \Diamond at_F)$.
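A compact sketch of Alg. 1 in Python follows; it is our own illustration, assuming SciPy and that the rate matrix of $\mathcal{C}[F]$ is available as a callable, and solving step 3 as a linear system rather than by value iteration:

```python
# Alg. 1 sketch: steps 1-2 are transient analyses of C[F] (one from time 0 up
# to T_EP, one over a single period starting at T_EP); step 3 solves the DTMC
# reachability x = P·x with x = 1 on F as a linear system; step 4 combines.
import numpy as np
from scipy.integrate import solve_ivp

def transient_matrix(R, t0, dt, n):
    """Row i holds the distribution at time t0+dt when starting in state i at t0."""
    def deriv(t, pi):
        Rt = R(t)
        return pi @ (Rt - np.diag(Rt.sum(axis=1)))
    rows = []
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        rows.append(solve_ivp(deriv, (t0, t0 + dt), e).y[:, -1])
    return np.array(rows)

def epa_unbounded_reachability(R_F, alpha, F, T_EP, P_EP, n):
    A = transient_matrix(R_F, 0.0, T_EP, n)       # step 1
    P = transient_matrix(R_F, T_EP, P_EP, n)      # steps 2-3: one-period DTMC
    nonF = [i for i in range(n) if i not in F]
    b = P[np.ix_(nonF, sorted(F))].sum(axis=1)    # one-step hits into F
    x = np.ones(n)                                # x = 1 on F (absorbing)
    x[nonF] = np.linalg.solve(np.eye(len(nonF)) - P[np.ix_(nonF, nonF)], b)
    return alpha @ A @ x                          # step 4, weighted by alpha
```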
Eventually uniform assumption. The previous section discussed rate functions exhibiting periodic behavior. A different class of rate functions are those which increase or decrease uniformly, e.g., an ICTMC in which all rates are multiples of the Weibull failure rate, which is characterized by the function $f(t) = \frac{\gamma}{\alpha}\left(\frac{t}{\alpha}\right)^{\gamma-1}$, where $\gamma$ and $\alpha$ are the shape and scale parameters of the Weibull distribution, respectively. These distributions can, e.g., approximate normal distributions, and are frequently used in reliability analysis. This suggests investigating eventually uniform ICTMCs.

Definition 5 (Eventually uniform assumption (EUA)). An ICTMC $\mathcal{C}$ is eventually uniform if there exist some time $T_{EU} \in \mathbb{R}_{>0}$ and an integrable function $f(t) : \mathbb{R}_{>0} \to \mathbb{R}_{>0}$ such that $\lim_{t \to \infty} \int_{T_{EU}}^{t} f(\tau)\,d\tau = \infty$ and, for any two states $s, s' \in S$ and $t \geq T_{EU}$, $R_{s,s'}(t) = f(t) \cdot R^c_{s,s'}$, where $R^c_{s,s'}$ is a constant.
In terms of the infinitesimal generator $Q(t)$ of the ICTMC $\mathcal{C}$, the EUA intuitively entails that there exist some function $f(t)$ and a constant infinitesimal generator $Q^c = R^c - E^c$ ($R^c$ and $E^c$ being the constant rate matrix and exit rate matrix, respectively) such that $Q(t) = f(t) \cdot Q^c$ for all $t \geq T_{EU}$. We also define the constant transition probability matrix $P^c$ such that $P^c_{s,s'} = \frac{R^c_{s,s'}}{E^c_s}$.

By restricting to the EUA, one can reduce the time-unbounded reachability problem for an ICTMC $\mathcal{C}$ to computing the time-bounded reachability probability with time bound $T_{EU}$ and the reachability probability in a DTMC $\mathcal{D}^{\mathcal{C}}_{EU}[F]$ with transition probability matrix $P^c[F]$, where $P^c[F]_{s,s'} = P^c_{s,s'}$ for $s \notin F$, and $P^c[F]_{s,s} = 1$ and $P^c[F]_{s,s'} = 0$ for $s \in F$ and $s' \neq s$. This is shown by the following theorem.

Theorem 3. Let $\mathcal{C} = (S, AP, L, \alpha, R(t))$ be an ICTMC with $s \in S$. Given a set $F$ of goal states and the eventually uniform assumption with the associated time $T_{EU}$ and DTMC $\mathcal{D}^{\mathcal{C}}_{EU}$, it holds that
$$Prob^{\mathcal{C}}(s, 0, \Diamond at_F) = \sum_{s' \in S} Prob^{\mathcal{C}[F]}(s, 0, \Diamond^{=T_{EU}} at_{s'}) \cdot Prob^{\mathcal{D}^{\mathcal{C}}_{EU}[F]}(s', \Diamond at_F). \qquad (11)$$
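Constructing the DTMC of Theorem 3 is a purely algebraic step. A small sketch (our own illustration; it assumes every non-goal state has a positive constant exit rate $E^c_s$, so the row-wise division is well defined):

```python
# Build P^c from the constant rate matrix R^c, then P^c[F] by absorbing F.
import numpy as np

def embedded_dtmc(Rc, F):
    Pc = Rc / Rc.sum(axis=1, keepdims=True)    # P^c_{s,s'} = R^c_{s,s'} / E^c_s
    for s in F:
        Pc[s] = 0.0                            # P^c[F]: goal states become
        Pc[s, s] = 1.0                         # absorbing with a self-loop
    return Pc
```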
Remark 2. We note that the two assumptions, EUA and EPA, are incomparable. There are rate functions (e.g., polynomials) which cannot be represented as periodic functions but satisfy the EUA; on the other hand, in the case of the EPA one can, for instance, assign the same sort of rate functions (e.g., sine functions) with different periods, and thus obtain an ICTMC which invalidates the EUA.
4 LTL Model Checking

In this section, we tackle the problem of model checking ICTMCs against properties specified by LTL formulae. Model checking CTMCs against LTL is not very difficult, since one can easily extract the embedded DTMC of the given CTMC and thus reduce the problem to the corresponding model checking problem for DTMCs, which is well studied; see, e.g., [8]. However, this approach does not work for ICTMCs, since the rates of the ICTMC vary with time. Below we employ an automata-based approach. For this purpose, some basic definitions are in order.

Definition 6 (Generalized Büchi automata). A generalized Büchi automaton (GBA) is a tuple $\mathcal{A} = (\Sigma, Q, \Delta, Q_0, \mathcal{F})$, where $\Sigma$ is a finite alphabet; $Q$ is a finite set of states; $\Delta \subseteq Q \times \Sigma \times Q$ is a transition relation; $Q_0 \subseteq Q$ is a set of initial states; and $\mathcal{F} \subseteq 2^Q$ is a set of acceptance sets.

We sometimes write $q \xrightarrow{\sigma} q'$ if $(q, \sigma, q') \in \Delta$ for simplicity. An infinite word $w \in \Sigma^\omega$ is accepted by $\mathcal{A}$ if there exists an infinite run $\theta \in Q^\omega$ such that $\theta[0] \in Q_0$, $(\theta[i], w[i], \theta[i+1]) \in \Delta$ for $i \geq 0$, and for each $F \in \mathcal{F}$ there exist infinitely many indices $j \in \mathbb{N}$ such that $\theta[j] \in F$. Note that $w[i]$ (resp. $\theta[i]$) denotes the $i$-th letter (resp. state) of $w$ (resp. $\theta$). The accepted language of $\mathcal{A}$, denoted $L(\mathcal{A})$, is the set of all words accepted by $\mathcal{A}$. Given a GBA $\mathcal{A}$ and a state $q$, we denote by $\mathcal{A}[q]$ the automaton $\mathcal{A}$ with $q$ as the unique initial state. Note that $L(\mathcal{A}) = \bigcup_{q \in Q_0} L(\mathcal{A}[q])$. A GBA $\mathcal{A}$ is separated if for any two distinct states $q, q'$, $L(\mathcal{A}[q]) \cap L(\mathcal{A}[q']) = \emptyset$.
It follows from [9] that a correspondence between LTL formulae and separated GBA can be established:

Theorem 4. For any LTL formula $\varphi$ over $AP$, there exists a separated GBA $\mathcal{A}_\varphi = (\Sigma, Q, \Delta, Q_0, \mathcal{F})$, where $\Sigma = 2^{AP}$ and $|Q| \leq 2^{O(|\varphi|)}$, such that $L(\mathcal{A}_\varphi)$ is the set of computations satisfying the formula $\varphi$.

We note that the notion of separatedness is crucial for the remainder of this paper. A closely related notion, referred to as unambiguity, has been widely studied in automata and language theory, dating back to [1]; see also, among others, [5,13] for relevant literature. To the best of our knowledge, the notion of separated automata was first exploited in [9] for model checking DTMCs against LTL.

Definition 7 (Product). Given an ICTMC $\mathcal{C} = (S, AP, L, \alpha, R(t))$ and a separated GBA $\mathcal{A} = (\Sigma, Q, \Delta, Q_0, \mathcal{F})$, the product $\mathcal{C} \otimes \mathcal{A}$ is defined as
$$\mathcal{C} \otimes \mathcal{A} = (Loc, AP, \tilde{L}, \tilde{\alpha}, \tilde{R}(t)),$$
where $Loc = S \times Q$; $\tilde{L}(\langle s, q\rangle) = L(s)$; $\tilde{\alpha}(\langle s_0, q_0\rangle) = \alpha(s_0)$ if $\alpha(s_0) > 0$ and $q_0 \in Q_0$, and undefined elsewhere; and $\tilde{R}_{\langle s,q\rangle,\langle s',q'\rangle}(t) = R_{s,s'}(t)$ if $q \xrightarrow{L(s)} q'$. For the sake of clarity, we call the states of a product locations.

Example 2. Given the ICTMC $\mathcal{C}$ (Fig. 2(a)) and the separated GBA $\mathcal{A}$ (Fig. 2(b)), the product $\mathcal{C} \otimes \mathcal{A}$ is shown in Fig. 2(c).
Fig. 2. Example product construction of an ICTMC and a separated GBA: (a) ICTMC $\mathcal{C}$, (b) separated GBA $\mathcal{A}$, (c) product $\mathcal{C} \otimes \mathcal{A}$
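As a programmatic rendering of Definition 7, the following sketch builds the location graph of the product, reusing the earlier illustrative ICTMC structure; the GBA is assumed, as our own convention, to be given by its state set and a set of (q, σ, q') triples with σ a set of atomic propositions:

```python
# Product C ⊗ A as a location graph: locations are (state, q) pairs, and
# there is an edge ((s, q), (s', q')) whenever q has an A-transition labeled
# L(s) to q'; the rate attached to such an edge is the function t -> R(t)[s, s'].
def product(chain, gba_states, gba_delta):
    n = len(chain.states)
    locations = [(s, q) for s in range(n) for q in gba_states]
    edges = []
    for (s, q) in locations:
        sigma = frozenset(chain.label[chain.states[s]])
        for (q1, a, q2) in gba_delta:
            if q1 == q and frozenset(a) == sigma:
                # enumerate all ICTMC successors; edges whose rate function is
                # identically zero are vacuous and can be pruned later
                edges += [((s, q), (s2, q2)) for s2 in range(n)]
    return locations, edges
```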
Remark 3. Note that in general the product itself is not an ICTMC. The reason is twofold: (1) if $|Q_0| > 1$, then $\tilde{\alpha}$ is not a distribution; (2) the sum of the rates of outgoing transitions from a location might exceed the exit rate of the location. For instance, in Example 2 the exit rate of $\ell_0$, as defined, is $\tilde{E}_{\ell_0}(t) = E_{s_0}(t) = r_1(t)$, while the sum of the rates of its outgoing transitions is $2r_1(t)$. However, due to the fact that $\mathcal{A}$ is separated, as we will see later, this is not a problem; cf. Proposition 3.

The generalized Büchi acceptance condition, roughly speaking, requires visiting some states infinitely often. As is traditional in model checking Markovian models, we need to identify bottom strongly connected components (BSCCs) of the product (when read as a graph). A strongly connected component (SCC for short) of the product is a strongly connected set of locations such that no proper superset is strongly connected. A BSCC is an SCC from which no location outside it is reachable. Unfortunately, in general there is no way to define a BSCC over the product of an ICTMC, since the rate of each transition is a function of time instead of a constant, and thus a BSCC at time $t$ might not be a BSCC at time $t'$. In other words, the topological structure (edge relation) of the product might change at any moment of time, which is one of the main difficulties of model checking ICTMCs. To circumvent this problem, we make an (arguably mild) assumption: we assume that ICTMCs are eventually stable, in the following sense.

Definition 8 (Eventually stable assumption (ESA)). An ICTMC $\mathcal{C}$ is eventually stable if for each state $s \in S$ there exists some time $T_s$ such that for each $s' \in S$, either $R_{s,s'}(t) > 0$ for all $t \geq T_s$, or $R_{s,s'}(t) = 0$ for all $t \geq T_s$.

W.l.o.g., we assume $T_s$ is the smallest time point from which the above assumption holds for state $s$. Let $T_{ES} = \max_{s \in S} T_s$ be the smallest time point from which the ICTMC is stable. Intuitively, an ICTMC is stable once its topological structure does not change any more. More specifically, transitions can alter their rates, but not from positive to zero or vice versa, i.e., no transitions will "disappear" or be newly created. An example rate function is illustrated in Fig. 1 (right), where after $T_{ES}$ the rate stays strictly positive (the particular values are irrelevant here). It turns out that the ESA is essential for identifying stable BSCCs (and for model checking LTL). The stable product as well as stable BSCCs are defined in the same way, relative to the time point $T_{ES}$. In the sequel, when we refer to BSCCs, we implicitly refer to the stable BSCCs in the stable product. In accordance with this, we will sometimes write $s \to s'$ for an ICTMC $\mathcal{C}$ if $R_{s,s'}(t) > 0$ for $t \geq T_{ES}$, and similarly for $\to$ in the product.

Definition 9 (aBSCC). Given the product $\mathcal{C} \otimes \mathcal{A}$ of an ICTMC $\mathcal{C} = (S, AP, L, \alpha, R(t))$ satisfying the ESA and a GBA $\mathcal{A} = (\Sigma, Q, \Delta, Q_0, \mathcal{F})$, we define:

I. an SCC is a set of locations $B \subseteq S \times Q$ such that (i) $B$ is strongly connected, meaning that for any two locations $\ell, \ell' \in B$, $\ell \to^* \ell'$, where $\to^*$ denotes the reflexive and transitive closure of $\to$, and (ii) no proper superset of $B$ is strongly connected;
II. an SCC $B$ is accepting if $\forall F \in \mathcal{F}$, there exists some $\langle s, q\rangle \in B$ such that $q \in F$;
III. an SCC $B$ is an accepting bottom SCC ($B \in$ aBSCC for short) if (i) $B$ is accepting; (ii) for each location $\ell \in B$, there does not exist any location $\ell'$ such that
$\ell \to \ell'$ and $\ell'$ is in any other accepting SCC; (iii) for each location $\ell = \langle s, q\rangle \in B$ and any $s'$ with $s \to s'$, $\langle s', q'\rangle \in B$ for some $q'$.

As an example, note (supposing that $r_4(t), r_5(t), r_6(t) > 0$ when $t \geq T_{ES}$) that an accepting BSCC in the stable product in Fig. 2(c) is formed by $\ell_2, \ell_3, \ell_4$. Note that $\{\ell_7\}$ is not an (accepting) SCC, so III(ii) is not violated. $\{\ell_8\}$ is not an SCC either, since we would require $\ell_8 \to^* \ell_8$, which fails to hold.

Recall that given an LTL formula, one can obtain a corresponding separated automaton, which gives very nice properties to the product defined in Definition 7. A couple of lemmas dedicated to illustrating these properties are in order. The following two essentially exploit the fact that for each accepted word of a separated GBA, there is a unique accepting path.

Lemma 1. Given the product $\mathcal{C} \otimes \mathcal{A}$ where $\mathcal{A}$ is separated, for any aBSCC $B$ of the stable product $\mathcal{C} \otimes \mathcal{A}$, it cannot be the case that $\langle s, q\rangle \to \langle s', q'\rangle$ and $\langle s, q\rangle \to \langle s', q''\rangle$ for any $\langle s, q\rangle, \langle s', q'\rangle, \langle s', q''\rangle$ in $B$ with $q' \neq q''$.

We say that two locations $\langle s, q\rangle$ and $\langle s', q'\rangle$ in the product $\mathcal{C} \otimes \mathcal{A}$ are connected if $q \xrightarrow{L(s)} q'$.2 We say that from location $\langle s, q\rangle$ there is a path leading to a BSCC $B$ if there is a sequence $\langle s_0, q_0\rangle, \langle s_1, q_1\rangle, \ldots, \langle s_n, q_n\rangle$ such that $\langle s, q\rangle = \langle s_0, q_0\rangle$, $\langle s_i, q_i\rangle$ and $\langle s_{i+1}, q_{i+1}\rangle$ are connected for $0 \leq i < n$, and $\langle s_n, q_n\rangle \in B$.

Lemma 2. Given the product $\mathcal{C} \otimes \mathcal{A}$ of an ICTMC $\mathcal{C}$ and a separated GBA $\mathcal{A}$, it cannot be the case that there are two locations $\langle s, q\rangle$ and $\langle s, q'\rangle$ with $q \neq q'$ such that both of them have a path reaching an aBSCC.

As said, given an ICTMC $\mathcal{C}$ and a separated GBA $\mathcal{A}$, the product $\mathcal{C} \otimes \mathcal{A}$ itself is not an ICTMC (see Example 2). However, thanks to the fact that $\mathcal{A}$ is separated, we can transform $\mathcal{C} \otimes \mathcal{A}$ into an ICTMC. Lemmas 1 and 2 entail that in the product $\mathcal{C} \otimes \mathcal{A}$ we can safely remove the locations which do not lead to an accepting BSCC, and thus obtain an ICTMC model, denoted $\mathcal{C} \bar{\otimes} \mathcal{A}$. Let us illustrate this by continuing Example 2. First note that the dashed locations are the trap locations from which the accepting location $\ell_2$ cannot be reached. Those locations can safely be removed, since the paths passing through them will never be accepted. It is not a coincidence that at most one of the outgoing transitions from the "nondeterministic" locations (i.e., $\ell_0, \ell_1, \ell_3, \ell_4$) can reach the accepting locations; this is guaranteed by the separatedness of the automaton (Lemma 2). By deleting all the dashed locations, we obtain $\mathcal{C} \bar{\otimes} \mathcal{A}$. The following proposition states that $\mathcal{C} \bar{\otimes} \mathcal{A}$ can be viewed as an ICTMC in the sense that it defines a stochastic process exactly as an ICTMC does. The crucial point is that in $\mathcal{C} \bar{\otimes} \mathcal{A}$, for each location $\ell$ and time $t$, the sum of the rates of the transitions emanating from $\ell$ does not exceed the exit rate of $\ell$. (Note that the sum could be strictly less than the exit rate, as for $\ell_1$ in Fig. 2(c); the model is thus "substochastic".) With a slight abuse of terms, we call this model an ICTMC.

Proposition 3. $\mathcal{C} \bar{\otimes} \mathcal{A}$ is an ICTMC. Moreover, for each accepting cylinder set, $\mathcal{C}$ and $\mathcal{C} \bar{\otimes} \mathcal{A}$ give rise to the same probability.
2 Note that we do not require that $s \to s'$; "connected" is purely a graph-theoretic notion where time is irrelevant.
Let $(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]$ be obtained from $\mathcal{C} \bar{\otimes} \mathcal{A}$ by making each location in the aBSCCs absorbing, where $F^*$ denotes the set of locations in any aBSCC. Given an ICTMC $\mathcal{C}$ satisfying the eventually stable assumption (with $T_{ES}$) and an LTL formula $\varphi$, the probability of the set of paths of $\mathcal{C}$ satisfying $\varphi$, denoted $Prob^{\mathcal{C}}_{ES}(\varphi)$, can be computed by Alg. 2.

Theorem 5. For an ICTMC $\mathcal{C}$ with $T_{ES}$ and an LTL formula $\varphi$,
$$Prob^{\mathcal{C}}_{ES}(\varphi) = \sum_{\tilde{\alpha}(\ell_0) > 0}\ \sum_{\ell \in Loc} \tilde{\alpha}(\ell_0) \cdot Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=T_{ES}} at_\ell) \cdot Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, T_{ES}, \Diamond at_{F^*}).$$
Algorithm 2. Model checking an ICTMC against LTL
Require: ICTMC $\mathcal{C}$, LTL formula $\varphi$, ESA time $T_{ES}$
Ensure: $Prob^{\mathcal{C}}_{ES}(\varphi)$
1: Transform $\varphi$ into a separated generalized Büchi automaton $\mathcal{A}$;
2: Build the product $\mathcal{C} \otimes \mathcal{A} = (Loc, AP, \tilde{L}, \tilde{\alpha}, \tilde{R}(t))$;
3: Find all accepting BSCCs in the (stable) product $\mathcal{C} \otimes \mathcal{A}$;
4: Remove all the trap locations, yielding $\mathcal{C} \bar{\otimes} \mathcal{A}$;
5: Compute the time-bounded reachability in $\mathcal{C} \bar{\otimes} \mathcal{A}$ from the initial location $\ell_0$ to each $\ell \in Loc$, i.e., $Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=T_{ES}} at_\ell)$;
6: Make each location in the aBSCCs absorbing, thus obtaining $(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]$ and $F^*$;
7: Compute the time-unbounded reachability probability in $(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]$ from each $\ell \in Loc$ to $F^*$, i.e., $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, T_{ES}, \Diamond at_{F^*})$;
8: $Prob^{\mathcal{C}}_{ES}(\varphi) = \sum_{\tilde{\alpha}(\ell_0) > 0} \sum_{\ell \in Loc} \tilde{\alpha}(\ell_0) \cdot Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=T_{ES}} at_\ell) \cdot Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, T_{ES}, \Diamond at_{F^*})$.
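Steps 3 and 4 of Alg. 2 are plain graph computations on the stable product. A sketch using networkx (a library choice of ours, not of the paper; the bottom-closure test here is simplified to graph successors):

```python
# Find accepting BSCCs of the stable product and the trap locations to remove.
import networkx as nx

def accepting_bsccs_and_traps(locations, edges, acceptance_sets):
    g = nx.DiGraph()
    g.add_nodes_from(locations)
    g.add_edges_from(edges)
    bsccs = []
    for scc in nx.strongly_connected_components(g):
        bottom = all(v in scc for u in scc for v in g.successors(u))
        accepting = all(any(q in F for (_, q) in scc) for F in acceptance_sets)
        if bottom and accepting:
            bsccs.append(scc)
    good = set().union(*bsccs) if bsccs else set()
    reach_good = {u for u in g.nodes if any(nx.has_path(g, u, v) for v in good)}
    return bsccs, set(locations) - reach_good    # second item: trap locations
```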
Note that $Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=T_{ES}} at_\ell)$ and $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, T_{ES}, \Diamond at_{F^*})$ can be computed by the approaches in Sections 3.1 and 3.2, respectively. Computing the former relies on solving a system of ODEs, whereas for the latter, as stated in Section 3.2, one in general has to solve a system of integral equations.

Remark 4 (EPA, EUA and ESA). The EPA and the ESA are incomparable, i.e., there are ICTMCs that are eventually periodic but not stable (see, e.g., the ICTMC with the rate function in Fig. 1 (left)), and vice versa (see, e.g., that in Fig. 1 (right)). When both assumptions are applied, we obtain ICTMCs that are "eventually positive periodic", i.e., eventually periodic where all rate function values in the periods are either strictly positive or zero. For this subset of ICTMCs, one can resort to solving a system of ODEs and linear equations, as presented in Theorem 2 and Alg. 1. The EUA and the ESA are incomparable as well; counterexamples for both directions can easily be constructed. When both assumptions are applied, as in the previous case, the resulting subset of ICTMCs (where $f(t)$ is eventually strictly positive) can be dealt with by solving a system of ODEs and linear equations (Theorem 3). The comparison of the EPA and the EUA can be found in Remark 2. We emphasize once again that the ESA is of most importance in LTL model checking, in order to find stable BSCCs, whereas the EPA and the EUA delineate subsets of ICTMCs that we can deal with efficiently (meaning by solving a system of ODEs and linear equations). We mention that there are other approaches which can handle the system of integral equations, e.g., approximation by truncating the infinite range of the integral.
Example 3. We continue Example 2 to show how to compute the probability of the set of paths of the ICTMC $\mathcal{C}$ accepted by $\mathcal{A}$. Let the rate functions be defined as $r_i(t) = i$ for $t \geq 0$ and $3 \leq i \leq 6$, and
$$r_1(t) = \begin{cases} t & t \in [0, 9.5) \\ 0 & t \in [9.5, 10) \\ 2 + \cos(\frac{1}{2}t) & t \in [10, \infty) \end{cases} \qquad r_2(t) = \begin{cases} 4.1 & t \in [0, 15) \\ 7.6 - \sin(\frac{1}{3}t) & t \in [15, \infty) \end{cases}$$
It is not difficult to see that this ICTMC satisfies both the ESA and the EPA, with $T_{ES} = 10$, $T_{EP} = 15$ and $P_{EP} = 12\pi$. To compute $Prob^{\mathcal{C}}_{ES}(\mathcal{A})$, Alg. 2 is applied. (We omit the step of transforming an LTL formula into a GBA; the notation $Prob^{\mathcal{C}}_{ES}(\mathcal{A})$ is self-explanatory.) We first consider the first $Prob$ appearing in step 8, namely $Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=10} at_\ell)$ for all
$\ell$ in $\mathcal{C} \bar{\otimes} \mathcal{A}$. This amounts to computing the transient probability vector of $\mathcal{C} \bar{\otimes} \mathcal{A}$ at time $T_{ES} = 10$, which can be done by solving a system of ODEs. We then consider the second $Prob$ appearing in step 8, namely $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, 10, \Diamond at_{\ell_2})$ (note that $F^* = \{\ell_2\}$). Finally, we wrap them up as follows:
$$Prob^{\mathcal{C}}_{ES}(\mathcal{A}) = \begin{aligned}[t] & Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=10} at_{\ell_0}) \cdot Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_0, 10, \Diamond at_{\ell_2}) \\ {}+{} & Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=10} at_{\ell_1}) \cdot Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_1, 10, \Diamond at_{\ell_2}) \\ {}+{} & Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=10} at_{\ell_2}) \cdot 1 \\ {}+{} & Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=10} at_{\ell_3}) \cdot 1 \\ {}+{} & Prob^{\mathcal{C} \bar{\otimes} \mathcal{A}}(\ell_0, 0, \Diamond^{=10} at_{\ell_4}) \cdot 1 \end{aligned} \qquad (12)$$
The left column of (12) is the transient probability vector; we now show how to compute the elements in the right column. For this purpose, in general we have to solve a system of integral equations. Here we obtain the following one:
$$\begin{cases} f_0(x) = \int_0^\infty r_1(x+\tau)\, e^{-\int_0^\tau r_1(x+v)\,dv} \cdot f_1(x+\tau)\, d\tau \\ f_1(x) = \int_0^\infty r_2(x+\tau)\, e^{-\int_0^\tau (r_2(x+v)+r_3(x+v))\,dv} \cdot f_2(x+\tau)\, d\tau \\ f_2(x) = 1 \end{cases}$$
In this case, one obtains that $f_2(x) = 1$,
$$f_1(10) = \int_{10}^{15} 4.1\, e^{-\int_{10}^{\tau}(4.1+3)\,dv}\, d\tau + \int_{15}^{\infty} \big(7.6 - \sin(\tfrac{1}{3}\tau)\big)\, e^{-\int_{10}^{15}(4.1+3)\,dv - \int_{15}^{\tau}(7.6 - \sin(\frac{1}{3}v) + 3)\,dv}\, d\tau,$$
and $f_0(10)$ can be computed accordingly. It follows that $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_0, 10, \Diamond at_{\ell_2}) = f_0(10)$ and $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_1, 10, \Diamond at_{\ell_2}) = f_1(10)$. Hence (12) can be obtained.
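The integrals above are easy to evaluate numerically; a sketch with SciPy follows, where the truncation horizon for the improper integral is our own choice, justified by the integrand decaying at least like $e^{-10.6(\tau-15)}$:

```python
# Numerically evaluate f_1(10): the survival exponent of the first term is
# -7.1·(τ-10); the second term carries the accumulated mass e^{-7.1·5} and is
# truncated at τ = 25, where the remaining tail is negligible.
import numpy as np
from scipy.integrate import quad

first, _ = quad(lambda tau: 4.1 * np.exp(-7.1 * (tau - 10.0)), 10.0, 15.0)

def tail_integrand(tau):
    inner, _ = quad(lambda v: 7.6 - np.sin(v / 3.0) + 3.0, 15.0, tau)
    return (7.6 - np.sin(tau / 3.0)) * np.exp(-7.1 * 5.0 - inner)

second, _ = quad(tail_integrand, 15.0, 25.0)
print(first + second)   # ≈ f_1(10)
```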
Alternatively, note that fortunately in this case the EPA is satisfied, so one can apply Alg. 1 to compute $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, 10, \Diamond at_{\ell_2})$. Let us illustrate the case $\ell = \ell_0$ (the case $\ell = \ell_1$ is similar). For the first $Prob$ in Alg. 1, step 4, we again compute a transient probability matrix, $Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, 10, \Diamond^{=15-10} at_{\ell'})$ for $\ell, \ell' \in \{\ell_i \mid 1 \leq i \leq 4\}$ (note that $\ell_2$ is already absorbing in $(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]$); for the second $Prob$, we construct the DTMC $\mathcal{D}$ with $\mathbf{P}_{\ell,\ell'} = Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell, 15, \Diamond^{=12\pi} at_{\ell'})$. It follows that
$$Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}_{EP}(\ell_0, 10, \Diamond at_{\ell_2}) = \begin{aligned}[t] & Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_0, 10, \Diamond^{=15-10} at_{\ell_0}) \cdot Prob^{\mathcal{D}}(\ell_0, \Diamond at_{\ell_2}) \\ {}+{} & Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_0, 10, \Diamond^{=15-10} at_{\ell_1}) \cdot Prob^{\mathcal{D}}(\ell_1, \Diamond at_{\ell_2}) \\ {}+{} & Prob^{(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]}(\ell_0, 10, \Diamond^{=15-10} at_{\ell_2}) \cdot 1 \end{aligned}$$
The $\mathbf{P}$ matrix of the DTMC $\mathcal{D}$ is as follows (let $\Theta$ denote $(\mathcal{C} \bar{\otimes} \mathcal{A})[F^*]$):
$$\mathbf{P} = \begin{pmatrix} Prob^{\Theta}(\ell_0, 15, \Diamond^{=12\pi} at_{\ell_0}) & Prob^{\Theta}(\ell_0, 15, \Diamond^{=12\pi} at_{\ell_1}) & Prob^{\Theta}(\ell_0, 15, \Diamond^{=12\pi} at_{\ell_2}) \\ 0 & Prob^{\Theta}(\ell_1, 15, \Diamond^{=12\pi} at_{\ell_1}) & Prob^{\Theta}(\ell_1, 15, \Diamond^{=12\pi} at_{\ell_2}) \\ 0 & 0 & 1 \end{pmatrix}.$$
Hence $Prob^{\Theta}_{EP}(\ell_0, 10, \Diamond at_{\ell_2})$ can easily be computed.
5 Conclusion

We have studied the problem of verifying linear-time properties of ICTMCs. Two variants of reachability problems, i.e., time-bounded and time-unbounded reachability, as well as LTL properties were considered. Future work consists of identifying more classes of ICTMCs for which efficient computational methods exist, such that the approach studied in this paper can be applied. Other specifications, like (D)TA and M(I)TL, will also be investigated.
References

1. Arnold, A.: Rational omega-languages are non-ambiguous. Theor. Comput. Sci. 26, 221–223 (1983)
2. Aziz, A., Sanwal, K., Singhal, V., Brayton, R.K.: Model-checking continuous-time Markov chains. ACM Trans. Comput. Log. 1(1), 162–170 (2000)
3. Baier, C., Haverkort, B.R., Hermanns, H., Katoen, J.-P.: Model-checking algorithms for continuous-time Markov chains. IEEE Trans. Software Eng. 29(6), 524–541 (2003)
4. Baier, C., Katoen, J.-P.: Principles of Model Checking. MIT Press, Cambridge (2008)
5. Carton, O., Michel, M.: Unambiguous Büchi automata. Theor. Comput. Sci. 297(1-3), 37–81 (2003)
6. Chen, T., Han, T., Katoen, J.-P., Mereacre, A.: Quantitative model checking of continuous-time Markov chains against timed automata specifications. In: LICS (to appear, 2009)
7. Cloth, L., Jongerden, M.R., Haverkort, B.R.: Computing battery lifetime distributions. In: DSN, pp. 780–789 (2007)
8. Courcoubetis, C., Yannakakis, M.: The complexity of probabilistic verification. J. ACM 42(4), 857–907 (1995)
9. Couvreur, J.-M., Saheb, N., Sutre, G.: An optimal automata approach to LTL model checking of probabilistic systems. In: Vardi, M.Y., Voronkov, A. (eds.) LPAR 2003. LNCS, vol. 2850, pp. 361–375. Springer, Heidelberg (2003)
10. Gokhale, S.S., Lyu, M.R., Trivedi, K.S.: Analysis of software fault removal policies using a non-homogeneous continuous time Markov chain. Software Quality Control 12(3), 211–230 (2004)
11. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Asp. Comput. 6(5), 512–535 (1994)
12. Hinton, A., Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM: A tool for automatic verification of probabilistic systems. In: Hermanns, H., Palsberg, J. (eds.) TACAS 2006. LNCS, vol. 3920, pp. 441–444. Springer, Heidelberg (2006)
13. Kähler, D., Wilke, T.: Complementation, disambiguation, and determinization of Büchi automata unified. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 724–735. Springer, Heidelberg (2008)
14. Katoen, J.-P., Khattri, M., Zapreev, I.S.: A Markov reward model checker. In: QEST, pp. 243–244 (2005)
15. Katoen, J.-P., Mereacre, A.: Model checking HML on piecewise-constant inhomogeneous Markov chains. In: Cassez, F., Jard, C. (eds.) FORMATS 2008. LNCS, vol. 5215, pp. 203–217. Springer, Heidelberg (2008)
16. Lambert, J.D.: Numerical Methods for Ordinary Differential Systems. John Wiley & Sons, Chichester (1991)
17. Rindos, A., Woolet, S.P., Viniotis, Y., Trivedi, K.S.: Exact methods for the transient analysis of non-homogeneous continuous-time Markov chains. In: Numerical Solution of Markov Chains (NSMC), pp. 121–134 (1995)
Statistical Model Checking Using Perfect Simulation

Diana El Rabih and Nihal Pekergin

LACL, University of Paris-Est (Paris 12), 61 avenue Général de Gaulle, 94010 Créteil, France
[email protected], [email protected]

Abstract. We propose to perform statistical probabilistic model checking by using perfect simulation in order to verify steady-state and time-unbounded until formulas over Markov chains. The model checking of probabilistic models by statistical methods has received increased attention in recent years, since it provides an interesting alternative to numerical model checking, which scales poorly with increasing model size. In previous statistical model checking works, unbounded until formulas could not be verified efficiently, and steady-state formulas had not been considered, due to the burn-in time problem of detecting the steady state. Perfect simulation is an extension of Markov chain Monte Carlo (MCMC) methods that allows us to obtain exact steady-state samples of the underlying Markov chain, and thus it avoids the burn-in time problem. Therefore we suggest verifying time-unbounded until and steady-state dependability properties for large Markov chains through statistical model checking, by combining perfect simulation and statistical hypothesis testing.
1 Introduction
Probabilistic model checking is an extension of formal verification methods for systems exhibiting stochastic behavior. The system model is usually specified as a state transition system with probabilities attached to transitions, for example a Markov chain. From this model, a wide range of quantitative performance, reliability, and dependability measures of the original system can be computed. These measures can be specified using temporal logics such as PCTL [5] for discrete-time Markov chains (DTMCs) and CSL [2,3] for continuous-time Markov chains (CTMCs). Most high-level formalisms assume a CTMC as the underlying stochastic process. There are two distinct approaches to probabilistic model checking: a numerical approach, based on the computation of the transient-state or steady-state distribution of the underlying Markov chain, and a statistical approach, based on hypothesis testing and on sampling by means of discrete event simulation or measurement. The numerical approach is highly accurate, but it suffers from the state space explosion problem. The statistical
This work is supported by a French research project, ANR-06-SETI-002.
approach overcomes this problem, but it does not guarantee that the verification result is correct. However, it is possible to bound the probability of generating an incorrect answer, so it provides probabilistic guarantees of correctness. Hence statistical model checking techniques constitute an interesting alternative to numerical techniques for large-scale systems. A comparison of numerical and statistical approaches for probabilistic model checking has been done in [16].

Statistical approaches to model checking have received increasing attention in recent years [15,18,9,16,17]. Younes et al. have proposed a statistical approach based on hypothesis testing and discrete event simulation, but with a focus on time-bounded until formulas [15,18]. In [9], Sen et al. have proposed a statistical approach, also based on hypothesis testing and discrete event simulation, for verifying time-unbounded until properties by introducing a stopping probability $p_s$, which is the probability of terminating the generation of a trajectory after each state transition. This stopping probability must be extremely small to give correctness guarantees, and the accuracy of their verification result depends on the state space size, making the approach impractical except for models with small state spaces. In fact, the steady-state formula was not studied in any existing statistical approach, and the unbounded until formula cannot be checked efficiently by the existing approach [9] because of the steady-state detection problem (the stopping probability problem). Thus we proposed in [1] a novel approach that performs statistical model checking by combining perfect simulation and statistical hypothesis testing in order to check steady-state formulas, and we showed some preliminary results. Perfect simulation is an extension of MCMC methods allowing us to obtain exact steady-state samples of the underlying Markov chain while avoiding the burn-in time problem of detecting the steady state. Propp and Wilson designed the coupling-from-the-past algorithm to perform perfect simulation [8]; a web page dedicated to this method is maintained by them (http://research.microsoft.com/enus/um/people/dbwilson/exact/). As a perfect sampler, we use Ψ2, proposed in [12,10] and designed for the steady-state evaluation of various monotone queueing networks (http://psi.gforge.inria.fr/website/Psi2-Unix-Website/). This tool permits simulating the stationary distribution, or directly a cost function or reward, of large Markov chains. An extension of this tool is proposed in [14] to study non-monotone queueing networks.

In this paper, we extend our approach proposed in [1] by applying more efficient statistical hypothesis testing methods. Another extension is the design of an algorithm to generate samples for the verification of time-unbounded until formulas. Moreover, we integrate our proposed approach into the perfect sampler Ψ2 [12], to be able to model check large performability and dependability models. Note that the proposed approach is applicable both to monotone and to non-monotone systems, and it allows us to generate samples for time-unbounded until and steady-state formulas to perform statistical model checking. Moreover, this approach is extremely efficient for monotone systems and lets us overcome the state-space explosion problem. As an application of our approach, we verify steady-state and time-unbounded until dependability properties for a multistage interconnection queueing network, illustrating its efficiency when considering very large models.

The rest of this paper is organised as follows. In Section 2, we present some preliminaries on the considered temporal logics CSL and PCTL and on the concept of statistical hypothesis testing. Section 3 is devoted to perfect simulation. In Section 4, we present our contribution to statistical probabilistic model checking using perfect simulation, as well as the applicability and complexity of our proposed method. Section 5 presents the case study with experimental results. Finally, we conclude and present our future work in Section 6.
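For intuition, here is a sketch of the coupling-from-the-past scheme in the monotone case (our own minimal Python rendition; a real sampler such as Ψ2 is far more elaborate): run the top and bottom states of the ordered state space with the same randomness from farther and farther in the past; once they coalesce at time 0, the common state is an exact steady-state sample.

```python
# Coupling from the past for a monotone chain on {0, ..., n-1}: us[i] is the
# fixed randomness for time -(i+1); doubling the horizon reuses it, which is
# what makes the returned sample exact rather than approximate.
import random

def cftp(update, n, rng=random.Random(0)):
    """update(state, u) must be monotone in state for every fixed u in [0, 1)."""
    us, k = [], 1
    while True:
        while len(us) < 2 ** k:
            us.append(rng.random())
        lo, hi = 0, n - 1
        for u in reversed(us[: 2 ** k]):    # simulate from time -2^k up to 0
            lo, hi = update(lo, u), update(hi, u)
        if lo == hi:
            return lo                       # sandwiching => exact sample
        k += 1

# Example: a birth-death (queue-like) update, monotone in the state.
step = lambda s, u: min(s + 1, 4) if u < 0.3 else max(s - 1, 0)
print(cftp(step, n=5))
```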
2 Preliminaries

2.1 Temporal Logics for Markov Chains
In this subsection we give a brief introduction to the considered temporal logic operators. We consider essentially the until formulas, for verification over execution paths, and the steady-state operator, for the long-run behavior of the underlying model. The stochastic behavior of the underlying system is described by a labelled Markov chain $M$, which may be in discrete or continuous time. These operators are defined in CSL over CTMCs [2,3] and in PCTL over DTMCs [5]. Let $M$ take values in a finite set of states $S$, and let $AP$ denote the finite set of atomic propositions. $L : S \to 2^{AP}$ is the labeling function which assigns to each state $s \in S$ the set $L(s)$ of atomic propositions $a \in AP$ that are valid in $s$. Let $\theta$ be a probability, $I$ a time interval, and $\bowtie$ a comparison operator, $\bowtie \in \{<, \leq, >, \geq\}$. The syntax is given as follows:
$$\varphi ::= \mathrm{true} \mid a \mid \varphi \wedge \varphi \mid \neg\varphi \mid P_{\bowtie\theta}(\varphi_1\, U^I\, \varphi_2) \mid P_{\bowtie\theta}(\varphi_1\, U\, \varphi_2) \mid S_{\bowtie\theta}(\varphi)$$
The satisfaction relation is denoted by $\models$; for all states $s \in S$, $s \models \mathrm{true}$. An atomic proposition $a$ is satisfied by state $s$ ($s \models a$) iff $a \in L(s)$. The logic operators are obtained using standard logic equivalence rules: $s \models \neg\varphi$ iff $s \not\models \varphi$, and $s \models \varphi_1 \wedge \varphi_2$ iff $s \models \varphi_1$ and $s \models \varphi_2$. Until formulas are evaluated over the paths initiated from a given initial state $s$. A state $s$ satisfies $P_{\bowtie\theta}(\varphi_1\, U^I\, \varphi_2)$ iff the probability measure of the paths starting from $s$, passing only through states satisfying $\varphi_1$, and reaching a state satisfying $\varphi_2$ within time interval $I$ meets the bound $\bowtie\theta$. The until formula without a time interval is the time-unbounded until formula, for which $I = [0, \infty)$. The steady-state operator $S_{\bowtie\theta}(\varphi)$ lets us analyze the long-run behavior of the system: it is satisfied if the sum of the steady-state probabilities of the states satisfying $\varphi$ meets $\bowtie\theta$. If $M$ is ergodic, the steady-state distribution is independent of the initial state, and this formula is then satisfied or not regardless of the initial state.

2.2 Statistical Hypothesis Testing
Probabilistic model checking consists in deciding whether the probability that the considered system satisfies the underlying property $\varphi$ meets a given threshold $\theta$ or not. Without loss of generality, we consider the case $P_{\geq\theta}(\varphi)$, where $\varphi$ is a path formula; this is equivalent to verifying $\neg P_{<\theta}(\varphi)$.
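To fix ideas, the following is a sketch of one classical instantiation of this decision problem, Wald's sequential probability ratio test over Bernoulli path verdicts; the indifference region and error bounds are user parameters, and the whole snippet is our illustration rather than this paper's specific method:

```python
# Wald's SPRT for H0: p >= θ+δ against H1: p <= θ-δ, with error bounds α, β.
import math

def sprt(sample, theta, delta=0.01, alpha=0.05, beta=0.05):
    """sample() returns True iff a freshly drawn path satisfies φ."""
    p0, p1 = theta + delta, theta - delta
    accept_h0 = math.log(beta / (1 - alpha))        # lower stopping boundary
    accept_h1 = math.log((1 - beta) / alpha)        # upper stopping boundary
    llr = 0.0                                       # log-likelihood ratio H1/H0
    while accept_h0 < llr < accept_h1:
        x = sample()
        llr += math.log((p1 if x else 1 - p1) / (p0 if x else 1 - p0))
    return llr <= accept_h0                         # True: conclude P >= θ
```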
$\ldots) < \varepsilon$, and therefore, per condition (1), $P(\exists t : \|x(t)\| > \varepsilon) < \varepsilon$. This theorem can also be adapted to the case where the differential inclusions per mode are replaced by stochastic differential equations $\dot{x} = f(x, \sigma)$. In this case, condition (2) can be replaced by
dV m m (2’) E dV dx f (x, σ) = dx E(f (x, σ)) is negative definite on Inv(m) For non-hybrid systems, a proof outline for this case based on results by Kushner [7] can be found in [8]. For the hybrid case, it can be combined with the proof for Theorem 1 to accommodate for stochastic differential equations in hybrid systems. For linear dynamics and quadratic Lyapunov function candidates of the form V (x) = xT P x, P ∈ Rn × Rn , the conditions (1) to (4) can directly be mapped onto a linear matrix inequality (LMI) problem [9], which in turn can be solved 2
Supermartingales are stochastic processes for which, given an evolution to time s, the expected value at time t ≥ s is never higher than the value at time s.
A Decompositional Proof Scheme for Automated Convergence Proofs
157
automatically with nonlinear optimization techniques [10]. The solution of such an LMI problem consists of valuations of the entries of P and some auxiliary varii ables μim , νm , and ηei that are used to express the invariants and guards through the so-called S-procedure [10]. Invariants are encoded through an arbitrary number of matrices Qim per mode m that satisfy x ∈ Inv(m) ⇒ xT Qim x ≥ 0. The same applies to the guard sets G and the matrices Rei . The S-procedure always results in correct over-approximations of invariant and guard sets, although it can be conservative [2]. Denote “xT P x is positive semidefinite” as “P 0”. Assume the dynamics for each mode m are given as the conic hull of a family of linear dynamics, i.e., x˙ ∈ cone {Am,1 x, . . . , Am,k x}. Then, the associated LMI problem looks as follows. Theorem 2 (LMI Formulation). If the following LMI problem has a solution, then the system is GAS-P wrt. 0: i Find Pm ∈ Rn × Rn , α > 0, μim ≥ 0, νm ≥ 0, ηei ≥ 0, such that for each mode m: Pm − μim Qim − αI 0 i
for each mode m and each j: − ATm,j P − P Am,j −
(1)
i νm Qim − αI 0 (2)
i
for each transition e and each target mode m with Target(m ) > 0: Pm − Target(m ) · Pm − ηei Rei 0 m
(3)
i
Conditions (1) to (3) directly map onto the same conditions of Theorem 1. Condition (4) of Theorem 1 is already satisfied through the use of quadratic function templates. If the system dynamics are not linear or a non-quadratic Lyapunov function candidate is needed, then the sums-of-squares decomposition [3] can be applied to transform the constraints into an LMI problem. LMI problems are a representation of semidefinite programming (SDP) problems, which in turn form a special class of convex optimization problems [10] that can be solved with dedicated software, e.g., CSDP [11] or SeDuMi [12]. The result of the computation – if it is successful – yields valuations of the matrix variables Pm , and thereby a suitable discontinuous Lyapunov function, completing the stability proof. If no positive result is obtained, then no conclusion about the stability or instability of the system can be drawn, and it is not easy to identify the cause of the problem. It is possible that the system is indeed unstable, or that a different Lyapunov function parameterization (for instance applying the sums-of-squares decomposition [3]) or a different hybrid automaton representation of the system might allow for a solution to the LMI problem. Moreover, the computation might simply fail for numerical reasons, despite the existence of a Lyapunov function. These problems are more likely to occur, the larger the LMI problem grows, i.e., the more complex the hybrid automaton is. For this reason, we next turn to decompositional proofs of GAS-P, keeping the LMI problems comparatively small, and in case of failure, giving constructive information about the part of the hybrid automaton that is most likely responsible
158
J. Oehlerking and O. Theel
for the failure. Furthermore, decomposition can also be turned into composition, in the sense that a stable hybrid automaton can be designed step by step, by solving LMI problems for the different sub-automata. These sub-automata can then be composed according to the results in the next section, yielding a new stable automaton.
4
Decompositional Computation of Lyapunov Functions
This section deals with automaton-based decomposition of stability proofs, as opposed to techniques like the composition of input-to-state stable systems [4], which work on the continuous state space. The hybrid automaton is divided into sub-automata, for which LMI problems can either be solved completely independently, or sequentially, but with some information being passed from one computation to the next. In contrast to the non-stochastic setting covered in [5], it turned out that stochastic stability proofs allow for stronger decompositional results, which exploit the knowledge of transition properties. The different levels of decomposition are outlined in the following. The decompositions take place on the graph structure defined by the automaton. Hence, we will apply graph-theoretic terms to the automaton by viewing the automaton as a hypergraph with some labels. The modes in the hybrid automaton are therefore sometimes referred to as nodes, and the transitions as hyperedges. Since one hybrid system can be represented by different hybrid automata with potentially different graph structures, some representations of the system might be more amenable to decomposition than others. The first level of decomposition concerns the strongly connected components of the hybrid automaton. Definition 5 (Strongly Connected Components). A strongly connected component (SCC) of a hypergraph is a maximal subgraph G, such that each node in G is reachable from each other node. Note that, here, reachability is exclusively based on the graph structure: continuous dynamics, invariants, and guards are not taken into account. It is well known that every node of a hypergraph belongs to exactly one SCC, and that the reachability relation between the SCCs of a hypergraph forms an acyclic graph structure. This property can be exploited to allow for completely independent Lyapunov function computation for the SCCs of a hybrid automaton. Theorem 3 (Decomposition into Strongly Connected Components). Let H be a probabilistic hybrid automaton. If all sub-automata pertaining to the SCC of H are GAS-P wrt. 0, then so is H. The consequence of this theorem is that LMI problems as per Theorem 2 can be solved locally for each SCC, still yielding a proof of GAS-P for the entire system. The proof is a variant of the proof for non-probabilistic hybrid automata stated in [5]. The following corollary is a consequence of this property.
A Decompositional Proof Scheme for Automated Convergence Proofs
159
Corollary 1 (Multiple Equilibria). If all SCCs of probabilistic hybrid automaton H are GA-P, but with respect to different equilibrium states, then each trajectory of H will converge to one of these states with probability 1. The second level of decomposition concerns cycles within an SCC and is also described in detail for non-probabilistic systems in [5]. Since it is based on the decompositional properties of Lyapunov functions, and not on the system itself, the result applies to both non-probabilistic and probabilistic hybrid automata. Therefore, we only give a brief summary of the decomposition technique. These results are based on a theorem that allows the decomposition of LMI computations within an SCC. Consider an automaton consisting of two subgraphs C1 and C2 , which overlap in exactly one node b (see Figure 2(a)). Again, LMI computations can be conducted separately for C1 and C2 . However, the computations are not completely independent, but a conic predicate given by a family of local Lyapunov functions Vbi for b is used to “connect” the two Lyapunov function computations for C1 and C2 . First, a local LMI problem is solved for C1 , computing the Vbi (see Figure 2(b)). These Vbi have the property that, whenever the LLF for node b in a GLF for C2 is a conic combination of the Vbi , there exists a GLF for the entire system comprising both subgraphs. Therefore, this requirement on the LLF of b is added as an additional constraint to the LMI for C2 (see Figure 2(c)). If there is a solution to this second local LMI problem, then the system is GAS-P. The probabilistic version of this theorem is given next. Again, a straightforward modification of the proof from [5] yields a variant for the probabilistic case. Theorem 4 (Decomposition inside an SCC). Let H be a probabilistic hybrid automaton consisting of two subgraphs C1 and C2 with a single common node b. Let b, n1 , . . . , nj be the nodes of C1 and b, m1 , . . . , mk be the nodes of C2 . Let each Vb1 , . . . , Vbm be a LLF for b belonging to a GLF Vbi , Vni1 , . . . , Vnij for the entire subgraph C1 . If there exists a GLF Vb , Vm1 , . . . , Vmk for subgraph C2 with ∃λ1 , . . . , λm > 0 : Vb = i λi Vbi , then H is GAS-P wrt. 0. We employ this theorem for cycle-based decomposition of the stability proof within an SCC. In a slight abuse of terminology, we will use the term “cycle” for the graphs generated by a cyclic path, as defined below. n3
n1
C1
n2
m2
b
C2
m1
(a) Two cycles
n3
m3
n1
C1
m2 Vbi
Vbi
b
b
n2
(b) Computation of the Vbi for C1
C2
m3
m1
(c) Computation for C2 , using the Vbi
Fig. 2. Decomposition with separate computations for subgraphs C1 and C2
160
J. Oehlerking and O. Theel
Definition 6 (Cycle). A cycle of a hypergraph G is a subgraph G , such that there exists a closed path in G, covering all edges and nodes of G . A cycle C is simple, if there exists such a path that only traverses each node once. Since each node inside an SCC is part of at least one simple cycle, this theorem allows for local LMI computations on a per-cycle basis. While solving the LMI problem for a cycle C, one can already compute adequate Vbi , which are then taken into account for other cycles that intersect with C. Note, that this decomposition is in general conservative, i.e., some Lyapunov functions are lost in the computation of the Vbi . It is, however, possible to approximate the real set of existing Lyapunov functions of the chosen parameterization arbitrarily close by increasing the number of computed Vbi . The previous two results stem from analysis of stability properties for nonprobabilistic systems. We will now show another decompositional property that is only applicable to probabilistic hybrid automata. However, this decomposition will only preserve global attractivity in probability (GA-P), as opposed to global asymptotic stability in probability (GAS-P). Since attractivity (that is, convergence to the equilibrium) is usually the more interesting property, this is still a useful result. If the automaton has certain local graph structures, then Lyapunov functions can sometimes be computed completely independently per mode, while still allowing for a proof of global attractivity in probability. We first define finiteness in probability, a property of individual cycles. Informally, this property means that, with probability 1, it is only possible for a trajectory to traverse the cycle finitely long until the cycle is either not entered again or the trajectory ends up in a mode of the cycle which is not left any more. Definition 7 (Finiteness in Probability). A cycle C with node set VC in a probabilistic hybrid automaton H is called finite in probability if for all trajectories of H with infinite mode sequences (mi ) the property P (∃m ∈ VC ∃i0 ∀i ≥ i 0 : mi = m) = 1 holds. Finiteness in probability is a property that can often be derived directly from the graph structure, as stated by the following theorem (see also Figure 3). Lemma 1 (Criterion for Finite in Probability Cycles). Let C be a cycle in a probabilistic hybrid automaton H. If at least one edge in C belongs to a hyperedge e, such that there exists a mode m with Targete (m) > 0 belonging to a different SCC, then C is finite in probability. Proof. Let (mi ) be an infinite mode sequence belonging to a trajectory of H. Let m ˜ be the source mode of edge e. Show that P (∃i0 ∀i ≥ i0 : mi = m) ˜ = 1. Proof by contradiction. Assume that P (∀i0 ∃i ≥ i0 : mi = m) ˜ > 0. Since edge e branches to another SCC with probability p > 0, this implies that 0 = ∞ (1 − p) ≥ P (∀i ∃i ≥ i : m = m) ˜ > 0, which is a contradiction. 0 0 i i=1 The consequence of this property for stability verification is as follows. A cycle that is finite in probability does not need to be mapped onto one LMI problem
A Decompositional Proof Scheme for Automated Convergence Proofs
161
m2 Flow(m2 ) Inv(m2 ) G2 ∧ Update2 G1 ∧ Update1
Target2 (m3 ) > 0 m1
m3
Flow(m1 ) Inv(m1 )
Flow(m3 ) Inv(m3 )
Fig. 3. Finite in probability cycle consisting of m1 and m2
for the entire cycle. Instead, it is sufficient to provide a local Lyapunov function for each mode of the cycle, with no constraints spanning several modes. The existence of such local Lyapunov functions ensures that the system will always stabilize, in case a trajectory “gets stuck in a mode”. If it does not get stuck, then finiteness in probability ensures that the cycle is eventually left with probability 1. The following lemma breaks GA-P down into a probability on the cycles of the system and will be used to prove the decomposition theorem. Lemma 2 (GA-P and Finite in Probability Cycles). Let H be a probabilistic hybrid automaton. If for each cycle C of H one of the following two conditions holds, then H is GA-P wrt. 0: (1) C is finite in probability and for each m ∈ C there exists a LLF Vm , or (2) there exists a GLF for C Proof. Case 1, (mi ) is finite: Let mn be the final element of (mi ). There either exists a LLF Vmn for mn per condition (1), or per condition (2) as part of a GLF. Therefore, x(t) → ∞. Case 2, (mi ) is infinite: Since (mi ) only contains modes belonging to a finite number of SCC C1 , . . . , Cm , Cm cannot contain any finite in probability cycles. Therefore, per condition (2), there exists a GLF for Cm (which can be covered by a cycle) and therefore Cm is GA-P. By itself, this result is of limited use, because all cycles of the hybrid automaton need to be considered. However, it is used to prove the following theorem, which allows a further decomposition. It implies that nodes lying only on finite in probability simple cycles can be treated completely separately within an SCC. Theorem 5 (Decomposition of Lyapunov Functions within an SCC). Let C be an SCC of a probabilistic hybrid automaton. Let N be the set of modes that lie only on finite in probability simple cycles. Let C be the hybrid automaton obtained by removing the modes of N and all incident transitions from C. If the following two conditions both hold, then C is GA-P wrt. 0:
162
J. Oehlerking and O. Theel
(1) for each m ∈ N there exists a LLF Vm , and (2) there exists a GLF for C Proof. We show that the prerequisites of Lemma 2 are always satisfied. Let D be a cycle in C as required by Lemma 2. There are three cases: Case 1, D is simple and finite in probability: there exists a LLF for each node of D, either per condition (1) or as part of a GLF per condition (2). Therefore, condition (1) of Lemma 2 is satisfied for D. Case 2, D is not simple, but finite in probability: D can be broken down into a family D1 , . . . , Dn of simple cycles. For each node in a Di , there exists a LLF either per condition (1) or condition (2), and therefore for all nodes in D. Case 3, D is not finite in probability: D cannot contain any simple subcycles that are finite in probability. Therefore, D is a subgraph of C , and a GLF exists for D per condition (2), implying condition (2) of Lemma 2. To check whether a node is in N or not, an enumeration of only the simple cycles of the automaton is necessary. The result is, that nodes in N can each be treated separately, since they only require the existence of a local Lyapunov function. Once a LLF is found, they can be removed from the automaton, and the cycle-based decomposition procedure from Theorem 4 can be applied to the remainder of the SCC. Next, we will apply these decomposition results to an example automaton.
5
Example
As an example, we present a simple cruise controller system with a probabilistic transition (see Figure 4). The variable v models the difference between actual speed and desired speed v0 (i.e., the system dynamics are shifted, such that v = 0 is the desired speed), and a models the acceleration. Mode A1 represents a saturation, enforcing a maximum acceleration, while mode A2 is the standard acceleration/deceleration mode that is active whenever a is below the saturation level and v is close to 0. B1 and B2 represent two different service brake modes, modeling different brake dynamics that are chosen probabilistically with probabilities p1 > 0 and p2 > 0 (for instance depending on the inclination of the track or the weather conditions, which are considered random in this model). Additionally, with a (small) probability p3 > 0, the service brake system might fail altogether, activating an emergency brake mode F . From any mode except F , it is also possible that the vehicle is ordered to stop in a regular manner, e.g., because the destination has been reached. This is modeled in mode E. By applying the theorems of Section 4, it is possible to decompose the stability proof for this system into a number of subproofs. We want to show that the system will either converge to v = a = 0, or the vehicle will come to a stop in modes E or F , which means convergence to v = −v0 , where v0 is the desired speed. The modes E and F each form a separate SCC and can therefore be treated separately through Theorem 3 and Corollary 1. Since the cycles formed by A2 and B1 and by A2 and B2 are both finite in probability, the nodes B1
A Decompositional Proof Scheme for Automated Convergence Proofs
B1
v˙ = a a˙ = −0.1v − 0.2a 8 ≤ v ≤ 50 −5 ≤ a ≤ 0
B2
163
v˙ = a a˙ = −0.15v − 0.3a 8 ≤ v ≤ 50 −5 ≤ a ≤ 0
p1 p2 p3
F
v˙ = −5 a˙ = 0 −v0 ≤ v ≤ 50
A2
v˙ = a a˙ = −0.05v − 0.1a −20 ≤ v ≤ 10 −5 ≤ a ≤ 5
A1
v˙ = 3 a˙ = −0.05v − 0.1a −30 ≤ v ≤ −18 −5 ≤ a ≤ 5
E
v˙ = −5 a˙ = 0 −v0 ≤ v ≤ 50
Fig. 4. Example automaton (guards and updates not pictured) and decomposition (dashed lines: Theorem 3, dotted lines: Theorem 5)
and B2 can also be analyzed separately per Theorem 5. This yields one LMI problem as per Theorem 2 for E, F , B1 and B2. Furthermore, one LMI problem for the cycle given by nodes A1 and A2 must be solved, including the two edges connecting them. If solutions for all of these independent LMI problems can be found, then all SCCs are GA-P, and all trajectories converge to either v = a = 0 or v = −v0 with probability 1. In contrast, solving an LMI containing all constraints for all modes and transitions of the whole graph in one step is intractable in practice. The GLF for the SCC consisting of A1 and A2, as computed by an LMI solver, is given by VA1 = 3.9804v 2 + 2.0001ta + 10.502a2 and VA2 = 0.625v 2 + 2va + 10.5a2 . Examples for LLF for the other nodes are VB1 = 18.496v 2 + 26.314av + 100a2 , VB2 = 31.843v 2 + 40.98av + 100a2 and VE = VF = v + v0 .
6
Quantitative Analysis
In this section, we outline how the results of the previous section can be employed for quantitative stability analysis, i.e., the computation of convergence properties that lie below 1. Generally, this type of analysis is oriented along the SCCs of the hybrid system. If the convergence probability is strictly smaller than 1, then
164
J. Oehlerking and O. Theel
some trajectories need to reach a permanent decision point, where a stabilizing decision is taken with probability p < 1 and a non-stabilizing decision is taken with probability 1 − p. The permanent decision points that are visible in the hybrid automaton are the transitions between SCCs: once an SCC is left, a return is impossible. Therefore, such a “decision” is irreversible for the trajectory. This leads to the conclusion that quantitative stability analysis can be conducted with help of SCCs: those that are “visible in the graph structure” and those that are “hidden in the guards and dynamics” and can be exposed by using alternate hybrid automaton representations of the system. First, we define global attractivity with probability of less than 1, and then we give a theorem on the SCCs that are visible in the graph structure. Definition 8 (Global Attractivity with Probability p < 1). A probabilistic hybrid automaton H is globally attractive in probability with probability p (GAP(p)) with respect to an equilibrium state xe , if for all trajectories x(t) of H: P (limt→∞ x(t) = xe ) ≥ p. Theorem 6 (Quantitative Analysis). Let H be a probabilistic hybrid automaton, consisting of an SCC C, that is GA-P wrt. 0 and a number of SCC C1 , . . . , Cm that are successors of C. Assume that Init only contains hybrid states with nodes of C as their discrete state. Furthermore, assume that each Ci is known to be GA-P(pi ) wrt. 0 for some 0 ≤ pi ≤ 1. Let ci be a lower bound on the probability that a trajectory of H ends up in Ci Then, the sub-automaton of H consisting of C and the Ci is GA-P(p) wrt. 0, with p = ci pi . Proof. Let (mi ) be a mode sequence belonging to a trajectory x(t) of H. If (mi ) never leaves C, then x(t) must converge to 0 with probability 1 since C is GA-P. If this is not the case, then (mi ) will enter SCC Ci with a probability of at least ci . Since ci is GA-P(pi ), x(t) will then converge with probability pi . Summing up over all Ci , we get a lower bound for the probability of convergence as p = ci pi . Therefore, H is GA-P(p). Lower bounds ci can, for instance, be computed with the help of discrete time Markov decision processes, where the steady-state probability of ending up in an SCC is such a ci . However, to obtain tight bounds on the stabilization property, it is necessary to have an automaton model of the system where all “branching points” are visible as transitions between SCC in the graph structure. At this point, methods for reachable set computation of hybrid systems can be employed to discover semantically equivalent automata (wrt. the continuous behavior) that have a “finer” SCC structure.
7
Conclusions
In this paper, we presented a scheme for decomposition of proofs of stability in probability for probabilistic hybrid automata. The decomposition results can be used to make automatic stability proofs through Lyapunov function computation more tractable in practice. Furthermore, the results give conditions, under
A Decompositional Proof Scheme for Automated Convergence Proofs
165
which stability properties of sub-automata to be composed transfer to the newly obtained larger automaton. Therefore, stable automata can be designed step by step, applying Lyapunov function arguments that are local in the graph during the design process. Furthermore, failure of a Lyapunov function is now less problematic, since it will be visible which computational step – and therefore which part of the automaton – caused the problem. This knowledge allows for an easier diagnosis of the problem that prevented the stability proof from succeeding. In general, we postulate that decompositional reasoning makes it easier to see what makes or breaks stability properties in probabilistic hybrid automata.
References 1. Branicky, M.: Multiple Lyapunov functions and other analysis tools for switched and hybrid systems. IEEE Transactions on Automatic Control 43(4), 475–482 (1998) 2. Pettersson, S.: Analysis and Design of Hybrid Systems. PhD thesis, Chalmers University of Technology, Gothenburg (1999) 3. Parrilo, P.A.: Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, Series B (96), 293–320 (2003) 4. Heemels, M., Weiland, S., Juloski, A.: Input-to-state stability of discontinuous dynamical systems with an observer-based control application. In: Bemporad, A., Bicchi, A., Buttazzo, G. (eds.) HSCC 2007. LNCS, vol. 4416, pp. 259–272. Springer, Heidelberg (2007) 5. Oehlerking, J., Theel, O.: Decompositional construction of Lyapunov functions for hybrid systems. In: Majumdar, R., Tabuada, P. (eds.) HSCC 2009. LNCS, vol. 5469, pp. 276–290. Springer, Heidelberg (2009) 6. Shiryaev, A.N.: Probability, 2nd edn. Springer, Heidelberg (1996) 7. Kushner, H.J.: Stochastic stability. Lecture Notes in Mathematics, vol. (249), pp. 97–124 (1972) 8. Loparo, K.A., Feng, X.: Stability of stochastic systems. In: The Control Handbook, pp. 1105–1126. CRC Press, Boca Raton (1996) 9. Boyd, S., El Ghaoui, L., Feron, E., Balakrishnan, V.: Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, SIAM (1994) 10. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004) 11. Borchers, B.: CSDP, a C library for semidefinite programming. Optimization Methods and Software 10(1), 613–623 (1999), https://projects.coin-or.org/Csdp/ 12. Romanko, O., P´ olik, I., Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones (1999), http://sedumi.ie.lehigh.edu
Memory Usage Verification Using Hip/Sleek Guanhua He1 , Shengchao Qin1 , Chenguang Luo1 , and Wei-Ngan Chin2 1
Durham University, Durham DH1 3LE, UK 2 National University of Singapore {guanhua.he,shengchao.qin,chenguang.luo}@durham.ac.uk,
[email protected] Abstract. Embedded systems often come with constrained memory footprints. It is therefore essential to ensure that software running on such platforms fulfils memory usage specifications at compile-time, to prevent memory-related software failure after deployment. Previous proposals on memory usage verification are not satisfactory as they usually can only handle restricted subsets of programs, especially when shared mutable data structures are involved. In this paper, we propose a simple but novel solution. We instrument programs with explicit memory operations so that memory usage verification can be done along with the verification of other properties, using an automated verification system Hip/Sleek developed recently by Chin et al. [10,19]. The instrumentation can be done automatically and is proven sound with respect to an underlying semantics. One immediate benefit is that we do not need to develop from scratch a specific system for memory usage verification. Another benefit is that we can verify more programs, especially those involving shared mutable data structures, which previous systems failed to handle, as evidenced by our experimental results.
1
Introduction
Ubiquitous embedded systems are often supplied with limited memory and computation resources due to various constraints on, e.g., product size, power consumption and manufacture cost. The consequences of violating memory safety requirements can be quite severe because of the close coupling of these systems with the physical world; in some cases, they can put human lives at risk. The Mars Rover’s anomaly problem was actually due to a memory leak error and it took fifteen days to fix the problem and bring the Rover back to normal [21]. For applications running on resource-constrained platforms, a challenging problem would be how to make memory usage more predictable and how to ensure that memory usage fulfils the restricted memory requirements. To tackle this challenge, a number of proposals have been reported on memory usage analysis and verification, with most of them focused on functional programs where data structures are mostly immutable and thus easier to handle [1,2,5,7,15,23]. Memory usage verification for imperative/OO languages can be more challenging due to mutability of states and object sharing. Existing solutions to this are mainly type-based [11,12,16]. Instead of capturing all aliasing Z. Liu and A.P. Ravn (Eds.): ATVA 2009, LNCS 5799, pp. 166–181, 2009. c Springer-Verlag Berlin Heidelberg 2009
Memory Usage Verification Using Hip/Sleek
167
information, they impose restrictions on object mutability and sharing. Therefore, they can only handle limited subsets of programs manipulating shared mutable data structures. The emergence of separation logic [17,22] promotes scalable reasoning via explicit separation of structural properties over the memory heap where recursive data structures are dynamically allocated. Since then, dramatic advances have been made in automated software verification via separation logic, e.g. the Smallfoot tool [3] and the Space Invader tool [6,13,24] for the analysis and verification on pointer safety (i.e. shape properties asserting that pointers cannot go wrong), the Hip/Sleek tool [10,18,19] for the verification of more general properties involving both structural (shape) and numerical (size) information, the verification on termination [4], and the verification for object-oriented programs [9,14,20]. Given these significant advances in the field, a research question that we post to ourselves is: can we make use of some of these state-of-the-art verification tools to do a better job for memory usage verification, without the need of constructing a memory usage verifier from scratch? This paper addresses this question by proposing a simple but novel mechanism to memory usage verification using the Hip/Sleek system developed by Chin et al. [10,19]. Separation logic offers a powerful and expressive mechanism to capture structural properties of shared mutable data structures including aliasing information. The specification mechanism in Hip/Sleek leverages structural properties with numerical information and is readily capable for the use of memory usage specification. Approach and contributions. Memory usage occur in both the heap and stack spaces. While heap space is used to store dynamically allocated data structures, stack memory is used for local variables as well as return addresses of method calls. On the specification side, we assume that two special global variables heap and stk of type int are reserved to represent respectively the available heap and stack memory in the pre-/post-conditions of each method. On the program side, we instrument the program to be verified with explicit operations over variables heap and stk using rewriting rules. We call the instrumented programs as memory-aware programs. The memory usage behaviour of the original program is now mimicked and made explicit in its memory-aware version via the newly introduced primitive operations over heap and stk. We also show that the original program and its memory-aware version are observationally equivalent modulo the behaviour of the latter on the special variables heap and stk as well as a fixed memory cost for storing the two global variables. Instead of constructing and implementing a fresh set of memory usage verification rules for the original program, we can now pass to Hip/Sleek as inputs the corresponding memory-aware program together with the expected memory specification for automated memory usage verification. In summary, this paper makes the following contributions: – We propose a simple but novel solution to memory usage verification based on a verification tool Hip/Sleek by first rewriting programs to their memoryaware counterparts.
168
G. He et al.
– We demonstrate that the syntax-directed rewriting process is sound in the sense that the memory-aware programs are observationally equivalent to their original programs with respect to an instrumented operational semantics. – We have integrated our solution with Hip/Sleek and conducted some initial experiments. The experimental results confirm the viability of our solution and show that we can verify the memory safety of more programs compared with previous type-based approaches. The rest of the paper is structured as follows. We introduce our programming and specification languages in Section 2. In Section 3 we present our approach to memory usage verification in Hip/Sleek . Section 4 defines an underlying semantics for the programming language and formulates the soundness of our approach w.r.t. the semantics. Experimental results are shown in Section 5, followed by related work and concluding remarks afterwards.
2
Language and Specifications
In this section, we first introduce a core imperative language we use to demonstrate the work, and then depict the general specification mechanism used by Hip/Sleek and show how memory usage specifications can be incorporated in. 2.1
Programming Language
To simplify presentation, we focus on a strongly-typed C-like imperative language in Figure 1. A program P in our language consists of user-defined data types tdecl, global variables gVar and method definitions meth. The notation datat stands for the standard data type declaration used in programs, for example as below: data node { int val; node next } data node2 { int val; node2 prev; node2 next } data node3 { int val; node3 left; node3 right; node3 parent } The notation spred denotes a user-defined predicate which may be recursively defined and can specify both structural and numerical properties of data structures involved. The syntax of spred is given in Figure 2. P datat τ meth e
::= tdecl∗ gVar∗ meth∗ tdecl ::= datat | spred ::= data c { field∗ } field ::= t v t ::= c | τ ::= int | bool | void gVar ::= t v ::= t mn (([ref] t v)∗ ) mspec {e} ::= null | kτ | v | v.f | v:=e | v1 .f :=v2 | new c(v ∗ ) | free(v) | e1 ; e2 | t v; e | mn(v ∗ ) | if v then e1 else e2 Fig. 1. A Core (C-like) Imperative Language
Memory Usage Verification Using Hip/Sleek
169
Note that a parameter can be either pass-by-value or pass-by-reference, distinguished by the ref before a parameter definition. The method specification mspec, written in our specification language in Figure 2, specifies the expected behaviour of the method, including its memory usage behaviour. Our aim is to verify the method body against this specification. Our language is expressionoriented, so the body of a method is an expression composed of standard instructions and constructors of an imperative language. Note that the instructions new and free explicitly deal with memory allocation and deallocation, respectively. The term k τ denotes a constant value of type τ . While loops are transformed to tail-recursive methods in a preprocessing step. 2.2
Specification Language
Our specification language is given in Figure 2. Note spred defines a new separation predicate c in terms of the formula Φ with a given pure invariant π. Such user-specified predicates can be used in the method specifications. The method specification requires Φpr ensures Φpo comprises a precondition Φpr and a postcondition Φpo . The separation formula Φ, which appears in the predicate definition spred or in the pre-/post-conditions of a method, is in disjunctive normal form. Each disjunct consists of a ∗-separated heap constraint κ, referred to as heap part, and a heap-independent formula π, referred to as pure part. The pure part does not contain any heap nodes and is restricted to pointer equality/disequality γ and Presburger arithmetic φ. As we will see later, γ is used to capture the alias information of pointers during the verification, and φ is to record the numerical information of data structures, such as length of a list or height of a tree. Furthermore, Δ denotes a composite formula that could always be normalized into the Φ form [19]. The formula emp represents an empty heap. If c is a data node, the formula p::cv∗ represents a singleton heap p →[(f : v)∗ ] with f∗ as fields of data declaration c. For example, p::node0, null denotes that p points to a node structure in the heap, whose fields have values 0 and null, respectively. If c is a (userspecified) predicate, p::cv∗ stands for the formula c(p, v∗ ) which signifies that
spred mspec Φ γ κ Δ φ b s
::= root::cv ∗ ≡ Φ inv π ::= requires Φpr ensures Φpo ::= (∃v ∗ ·κ∧π)∗ π ::= γ∧φ ::= v1 =v2 | v=null | v1 =v2 | v=null | γ1 ∧γ2 ::= emp | v::cv ∗ | κ1 ∗ κ2 ::= Φ | Δ1 ∨Δ2 | Δ∧π | Δ1 ∗Δ2 | ∃v·Δ ::= b | a | φ1 ∧φ2 | φ1 ∨φ2 | ¬φ | ∃v · φ | ∀v · φ ::= true | false | v | b1 = b2 a ::= s1 =s2 | s1 ≤s2 ::= kint | v | kint ×s | s1 +s2 | −s | max(s1 ,s2 ) | min(s1 ,s2 ) Fig. 2. The Specification Language
170
G. He et al.
the data structure pointed to by p has the shape c with parameters v∗ . As an example, one may define the following predicate for a singly linked list with length n: root::lln≡(root=null∧n=0)∨(∃i, m, q · root::nodei, q∗q::llm∧n=m+1) inv n≥0
The above definition asserts that an ll list either can be empty (the base case root=null where root is the “head pointer” pointing to the beginning of the whole structure described by ll), or consists of a head data node (specified by root::nodei, q) and a separate tail data structure which is also an ll list (q::llm saying that q points to an ll list with length m). The separation conjunction ∗ introduced in separation logic signifies that two heap portions are domain-disjoint. Therefore, in the inductive case of ll’s definition, the separation conjunction ensures that the head node and the tail ll reside in disjoint heaps. A default invariant n≥0 is specified which holds for all ll lists. Existential quantifiers are for local values and pointers in the predicate, such as i, m and q. A slightly more complicated shape, a doubly linked-list with length n, is described by: root::dllp, n≡(root=null∧n=0)∨(root::node2 , p, q∗q::dllroot, n−1) inv n≥0
The dll predicate has a parameter p to represent the prev field of the root node of the doubly linked list. This shape includes node root and all the nodes reachable through the next field starting from root, but not the ones reachable through prev from root. Here we also can see some shortcuts that underscore denotes an anonymous variable, and non-parameter variables in the right hand side of the shape definition, such as q, are implicitly existentially quantified. As can be seen from the above, we can use κ to express the shape of heap and φ to express numerical information of data structures, such as length. This allows us to specify data structures with sophisticated invariants. For example, we may define a non-empty sorted list as below: root::sortln, min ≡ (root::nodemin, null∧n=1 ∨ (root::nodemin, q∗q::sortlm, k∧n=m+1∧min≤k) inv n≥0
The sortedness property is captured with the help of an additional parameter min denoting the minimum value stored in the list. The formula min≤k ensures the sortedness. With the aforesaid predicates, we can now specify the insertion-sort algorithm as follows: node insert(node x, node vn) requires x::sortln, min ∗ vn::nodev, ensures res::sortln+1, min(v, min); {· · · }
node insertion sort(node y) requires y::lln ∧ n>0 ensures res::sortln, ; {· · · }
where a special identifier res is used in the postcondition to denote the result of a method. The postcondition of insertion sort shows that the output list is sorted and has the same number of nodes. We can also specify that the input
Memory Usage Verification Using Hip/Sleek
171
and output lists contain the same set of values by adding another parameter to the sortl predicate to capture the bag of values stored in the list [10]. The semantics of our specification formula is similar to the model given for separation logic [22] except that we have extensions to handle user-defined shape predicates. We assume sets Loc of memory locations, Val of primitive values, with 0 ∈ Val denoting null, Var of variables (program and logical variables), and ObjVal of object values stored in the heap, with c[f1 →ν1 , .., fn →νn ] denoting an object value of data type c where ν1 , .., νn are current values of the corresponding fields f1 , .., fn . Let s, h |= Φ denote the model relation, i.e. the stack s and heap h satisfy Φ, with h, s from the following concrete domains: h ∈ Heaps =df Loc fin ObjVal
s ∈ Stacks =df Var → Val ∪Loc
Note that each heap h is a finite partial mapping while each stack s is a total mapping, as in the classical separation logic [17,22]. The detailed definitions of the model relation can be found in Chin et al. [10]. 2.3
Memory Usage Specification
To incorporate memory usage into the specification mechanism of Hip/Sleek, we employ two global variables heap and stk to represent the available heap and stack memory (in bytes). The memory requirement of a method can then be specified as a pure constraint over heap and stk in the precondition of the method. The remaining memory space upon the return from a method call can also be exhibited using a pure formula over heap and stk in the postcondition.1 Due to perfect recovery of stack space upon return from a method call, stk in a method’s postcondition will always be the same as its initial value stk. As an example, the method new list(int n), which creates a singly linked list with length n, is given as follows together with its memory usage specification: node new list(int n) requires heap≥8 ∗ n ∧ n≥0 ∧ stk≥12 ∗ n+4 ensures res::lln ∧ heap =heap−8 ∗ n ∧ stk =stk { node r := null; if (n>0) { r := new list(n−1); r := new node(n, r)}; r }
where the node was declared earlier in Sec 2.1. We assume that we use a 32-bit architecture; therefore, one node requires 8 bytes of memory. This assumption can be easily changed for a different architecture. The precondition specifies that the method requires at least 8 ∗ n bytes of heap space and 12 ∗ n + 4 stack space before each execution with n denoting the size of the input.2 After method 1 2
A primed variable x in a specification formula denotes the latest value of variable x, with x representing its initial value. When a new local variable r is declared, 4 bytes of stack memory is consumed. Later when the method new list is invoked recursively, its parameters, return address and local variables are all placed on top of the stack. This is why it requires at least 12 ∗ n+4 bytes of stack space.
172
G. He et al.
execution, 8 ∗ n bytes of heap memory is consumed by the returned list, but the stack space is fully recovered. This is reflected by the formula (heap = heap − 8 ∗ n ∧ stk = stk) in the postcondition. As another example, the following method free list deallocates a list: void free list(node2 x) requires x::dlln ∧ heap≥0 ∧ stk≥12 ∗ n ensures emp ∧ heap =heap+12 ∗ n ∧ stk =stk { if (x = null) { node t := x; x := x.next; free(t); free list(x) } }
We can see that 12 ∗ n bytes of heap space is expected to be claimed back by the method as signified in the postcondition. Notice here the stack and heap memory are specified in terms of the logical variable n denoting the length of the list x, showing the possible close relation between the separation (shape and size) specification and the memory specification. Next we will show how to rewrite the program to its memory-aware version by using the two global variables heap and stk to mimic the memory behaviour, so that Hip/Sleek can step in for memory usage verification.
3
Memory Usage Verification
In this section, we first present the instrumentation process which converts programs to be verified to memory-aware programs. We then briefly introduce the automated verification process in Hip/Sleek. 3.1
The Instrumentation Process
The instrumentation process makes use of primitive operations over the global variables heap and stk to simulate the memory usage behaviour of the original program. It is conducted via the rewriting rules given in Figure 3. These rewriting rules form a transformer M which takes in a program and returns its memory-aware version. Note that M conducts identical rewriting except for the following four cases: (1) heap allocation new c(v ∗ ); (2) heap deallocation free(v); (3) local block {t v; e}; (4) method declaration t0 mn(t1 v1 , .., tn vn ){e}. M(E) ::= E where E ∈ {null, kτ , v, v.f, v1 .f :=v2 , mn(v ∗ )} M(new c(v ∗ )) ::= dec hp(ssizeof(c)); new c(v ∗ ) M(free(v)) ::= free(v); inc hp(ssizeof(type(v))) M({t v; e}) ::= dec stk(sizeof(t)); {t v; M(e)}; inc stk(sizeof(t)) M(v:=e) ::= v:=M(e) M(e1 ; e2 ) ::= M(e1 ); M(e2 ) M(if v then e1 else e2 ) ::= if v then M(e1 ) else M(e2 ) M(t0 mn(t1 v1 , .., tn vn ){e}) ::= t0 mn(t1 v1 , .., tn vn ){ dec stk(sizeof(t0 , t1 , .., tn )+4); M(e); inc stk(sizeof(t0 , t1 , .., tn )+4)} Fig. 3. Rewriting Rules for Instrumentation
Memory Usage Verification Using Hip/Sleek
173
To simulate the memory effect of new c(v ∗ ), we employ a primitive method over variable heap, called dec hp, which is subject to the specification: void dec hp(int n) requires heap≥n ∧ n≥0 ensures heap =heap−n
To successfully call dec hp(n), the variable heap must hold a value no less than the non-negative integer n at the call site. Upon return, the value of heap is decreased by n. To simulate the memory effect of free(v), we employ a primitive method over heap, called inc hp: void inc hp(int n) requires n≥0 ensures heap =heap+n
The memory effect of local blocks and method bodies can be simulated in a similar way, and the difference is that they count on the stack instead of heap. For code blocks, we employ dec stk to check the stack space is sufficient for the local variable to be declared, and decrease the stack space; meanwhile, at the end of the block, we recover such space by inc stk due to the popping out of the local variables. As for method body, stack space is initially acquired (and later recovered) for method parameters and return address (four bytes), as the last rewriting rule suggests. The specifications for these two primitive methods are as follows: void dec stk(int n) requires stk≥n ∧ n≥0 ensures stk =stk−n void inc stk(int n) requires n≥0 ensures stk =stk+n
Note that two different functions sizeof and ssizeof are used in the rewriting rules: sizeof is applied to both primitive and reference types, while ssizeof is applied to (user-defined) data types, by summing up the sizes of all declared fields’ types obtained via sizeof. For example, sizeof(int) = 4, sizeof(node) = 4, and ssizeof(node) = 8, since the node data structure (defined in Section 2) contains an int field and a reference to another node. We also abuse these functions by applying them to a list of types, expecting them to return the sum of the results when applied to each type. We present below the memory-aware versions for the two examples given in Section 2. node new list(int n) requires emp ∧ heap≥8 ∗ n∧ n≥0 ∧ stk≥12 ∗ n + 4 ensures res::lln ∧ stk =stk∧ heap =heap−8 ∗ n; { dec stk(4); node r := null; if (n > 0) { dec stk(8); r := new list(n−1); inc stk(8); dec hp(8); r := new node(n, r) }; inc stk(4); r } Fig. 4. Example 1
void free list(node2 x) requires x::dllp, n ∧ heap≥0∧ stk≥12 ∗ n ensures emp ∧ stk =stk ∧ heap =heap+12 ∗ n; { if (x = null) { dec stk(4); node2 t := x; x := x.next; free(t); inc hp(12); dec stk(8); free list(x); inc stk(8); inc stk(4) } } Fig. 5. Example 2
174
G. He et al.
Note that the memory effect is simulated via explicit calls to the afore-mentioned four primitive methods over heap and stk, which are highlighted in bold. As one more example, we show in Figure 6 a program with more complicated memory usage behaviour. The program translates a doubly linked list (node2) into a singly linked list (node), by deallocating node2 x and then creating a singly linked list with the same length and content. A heap memory of 4 ∗ n bytes is reclaimed back since each node2 object has one more field (which takes 4 bytes) than a node object. node dl2sl(node2 x) requires x::dll , n ∧ stk≥20∗n ∧ heap≥0 ensures res::lln ∧ stk =stk ∧ heap =heap+4∗n; { dec stk(4); node r := null; if (x = null) { dec stk(4); int v := x.val; dec stk(4); node2 t := x; x := x.next; free(t); inc hp(12); dec stk(8); r := dl2sl(x); inc stk(8); dec hp(8); r := new node(v, r); inc stk(4); inc stk(4) }; inc stk(4); r } Fig. 6. Example 3
The instrumented programs are then passed to Hip/Sleek for automated verification. 3.2
The Hip/Sleek Automated Verification System
An overview of the User Supplied Items Hip/Sleek autoShape Program mated verification Pre/Post Predicates Code system is given in Figure 7. The front-end of the system is a standard SLEEK: Entailment HIP: Hoare-style Prover Forward Verifier Hoare-style forward verifier Hip, which Automated Verification System invokes the entailFig. 7. The Hip/Sleek Verification System ment prover Sleek. The Hip verifier comprises a set of forward verification rules to systematically check that the precondition is satisfied at each call site, and that the declared postcondition is successfully verified (assuming the given precondition) for each method definition. The forward verification rules are of the form {Δ1 } e {Δ2 } which expect the symbolic abstract state Δ1 to be given before computing Δ2 . Given two separation formulas Δ1 and Δ2 , the entailment prover Sleek attempts to prove that Δ1 entails Δ2 ; if it succeeds, it returns a frame R such that Δ1 Δ2 ∗ R. More details of the Hip and Sleek provers can be found in Chin et al. [10].
Memory Usage Verification Using Hip/Sleek
4
175
Soundness
This section presents the soundness of our approach with respect to an underlying operational semantics given in Figure 8. Note that we instrument the state with memory size information, so a program state is represented by s, h, σ, μ, e, where s, h denote respectively the current stack and heap state as mentioned earlier, σ (μ) represents current available stack (heap) memory in bytes, and e is the program code to be executed. If the execution leads to an error, we denote the error state as er1 if it is due to memory inadequacy, or as er2 for all other errors (e.g. null pointer dereference). Note also that an intermediate construct ret(v∗ , e) is introduced to denote the return value of call invocation and local blocks as in Chin et al. [10]. Later, we use →∗ to denote the composition of any non-negative number of transitions, and ↑ for program divergence.
s, h, σ, μ, v→s, h, σ, μ, s(v)
s, h, σ, μ, k→s, h, σ, μ, k
s, h, σ, μ, v:=k→s[v →k], h, σ, μ, ()
s, h, σ, μ, (); e→s, h, σ, μ, e
s(v) ∈ dom(h) s, h, σ, μ, v.f →s, h, σ, μ, h(s(v))(f ) s, h, σ, μ, e1 →s1 , h1 , σ1 , μ1 , e3 s, h, σ, μ, e1 ; e2 →s1 , h1 , σ1 , μ1 , e3 ; e2
s(v) ∈ / dom(h) s, h, σ, μ, v.f →er2
s, h, σ, μ, e→s1 , h1 , σ1 , μ1 , e1 s, h, σ, μ, v:=e→s1 , h1 , σ1 , μ1 , v:=e1
s(v)=true s, h, σ, μ, if v then e1 else e2 →s, h, σ, μ, e1 s(v)=false s, h, σ, μ, if v then e1 else e2 →s, h, σ, μ, e2 s(v1 ) ∈ dom(h) r = h(s(v1 ))[f →s(v2 )] h1 = h[s(v1 ) →r] s(v1 ) ∈ / dom(h) s, h, σ, μ, v1 .f := v2 →s, h1 , σ, μ, () s, h, σ, μ, v1 .f := v2 →er2 s(v) →l ∈ h h1 =h\[s(v) →l] μ1 =μ+ssizeof(type(v)) s, h, σ, μ, free(v)→s, h1 , σ, μ1 , ()
s(v) ∈ / dom(h) s, h, σ, μ, free(v)→er2
data c {t1 f1 , .., tn fn }∈P ι∈dom(h) / μ≥ssizeof(c) μ1 =μ−ssizeof(c) r=c[fi →s(vi )]n i=1 s, h, σ, μ, new c(v ∗ )→s, h+[ι → r], σ, μ1 , ι
μ<ssizeof(c) s, h, σ, μ, new c(v ∗ )→er1
s, h, σ, μ, ret(v1 , .., vn , k)→s−{v1 , .., vn }, h, σ+sizeof(type(v1 ), .., type(vn )), μ, k s, h, σ, μ, e→s1 , h1 , σ1 , μ1 , e1 s, h, σ, μ, ret(v ∗ , e)→s1 , h1 , σ1 , μ1 , ret(v ∗ , e1 ) σ≥sizeof(t) σ1 =σ−sizeof(t) s, h, σ, μ, {t v; e}→s+[v →⊥], h, σ1 , μ, ret(v, e) s1 =s+[wi →s(vi )]n i=m n n σ≥Σi=m sizeof(ti ) σ1 =σ−Σi=m sizeof(ti ) m−1 t0 mn((ref ti wi )i=1 , (ti wi )n i=m ) {e} s, h, σ, μ, mn(v1 , .., vn ) → m−1 s1 , h, σ1 , μ, ret({wi }n i=m , [vi /wi ]i=1 e)
σ<sizeof(t) s, h, σ, μ, {t v; e}→er1
n σ 0 s.t. v ∈ (Vi ∩ Attr i (U )) \ Attr i (U ) Attr k−1 σi (v) := and w ∈ Attr i (U ) ∩ vE ⎪ ⎩ ⊥, otherwise
186
O. Friedmann and M. Lange
Note that the choice of w is not unique, but any w with the prescribed property will suffice. An important property that has been noted before [18,14] is that removing the i-attractor of any set of nodes from a game will still result in a total game graph.
3
Universal Optimisations and a Generic Solver
This section describes some universal optimisations – in the form of pre-transformations or incomplete solvers. These try to efficiently reduce the overall complexity of a given parity game in order to reduce the effort spent by any solver. Clearly, such optimisations have to ensure that a solution of the modified game can be effectively and efficiently translated back into a valid solution of the original game. Here we describe four: (1) SCC decomposition [5]; (2) detection of three special cases – two from [1]; (3) priority compression [5], and (4) priority propagation [1]. At the end we present a generic solver that makes use of most of them. The choice is motivated by two facts: their worst-case running time is low in comparison to that of a real solver: at most O(|Ω| · |E|), resp. O(|V | log |V |). More importantly, they are empirically found to be beneficial. 3.1
SCC Decomposition
Let G = (V, V0 , V1 , E, Ω) be a parity game. A strongly connected component (SCC) is a non-empty set C ⊆ V with the property that every node in C can reach every other node in C, i.e. uE ∗ v for all u, v ∈ C (where E ∗ denotes the reflexive-transitive closure of E). We always assume SCCs to be maximal. We call an SCC C proper if |C| > 1 or C = {v} for some v with vEv. Every parity game G = (V, V0 , V1 , E, Ω) can, in time O(|E|), be partitioned into SCCs C0 , ..., Cn using Tarjan’s algorithm for example [15]. There is a topological ordering → on these SCCs which is defined as Ci → Cj iff i = j and there are u ∈ Ci , v ∈ Cj with uEv. An SCC C is called final if there is no SCC C s.t. C → C . Note that every finite graph must have at least one final SCC. Parity games can be solved SCC-wise. Each play eventually gets trapped in an SCC, and the winner of the play is determined by the priorities of the nodes in this SCC alone, in particular not by priorities of nodes not in this SCC. Hence, an entire parity game can be solved by solving its SCCs starting with the final ones and working backwards in their order. It is reasonable to assume that SCC decomposition speeds up the solving of a game. Suppose that the time it takes to solve a game G is f (G), and that G can be decomposed into SCCs C0 , . . . , Cn . Then solving SCC-wise will take time f (C1 ) + . . . + f (Cn ) + O(|G|) which is asymptotically better than f (G) if f is superlinear. Note that it takes at least linear time to solve a parity game because every node has to be visited at least once in order to determine which Wi it belongs to.
Solving Parity Games in Practice
187
Attr 0(W0) W1
W0
W1
W0
Fig. 1. Solving a game SCC-wise with refined decompositions
A na¨ıve implementation of SCC-wise solving handles final SCCs and replaces the winning regions in those with two single self-looping nodes that are won by the respective players, and then continues with the next SCCs. A slightly more clever way is the following, as suggested by Jurdzi´ nski [5]. First, let WiG and σiG be empty sets resp. strategies for the players i ∈ {0, 1} on G. 1. Decompose G into SCCs C0 , . . . , Cn . W.l.o.g. say that C0 , . . . , Cm are final for some m ≤ n. Then one solves these obtaining winning regions Wij and strategies σij for i ∈ {0, 1} and j = 0, . . . , m. Add Wij to WiG for every i, j and add σij to σiG via right-join. 2. Compute Ai := Attr i (Wi0 ∪ . . . ∪ Wim ) for i ∈ {0, 1} and corresponding attractor strategies σi which are also added to WiG and σiG via right-join. 3. Repeat step 1 with (G \ A0 ) \ A1 until G is entirely solved. Note that the attractors of the winning regions in some SCC can extend into SCCs further up, and eliminating them can result in a finer SCC structure than before. Hence, it suffices to decompose those of Cm+1 , . . . , Cn that intersect with one of the attractors. An example is depicted in Fig. 1. On the left it shows a parity game that is decomposed into 5 SCCs of which one is final. That is then solved using an arbitrary solver which partitions it into the winning regions W0 and W1 . The middle then shows the attractor of W0 reaching into other SCCs. On the right it shows the shaded regions already declared as winning for the respective players, and the two affected non-final SCCs being decomposed into SCCs again. This then yields a smaller parity game with 6 SCCs which can be solved iteratively until the winning regions partition the entire game. 3.2
Detection of Special Cases
There are certain games that can be solved very efficiently. W.l.o.g. we assume games to be proper and final SCCs. Note that non-proper SCCs are being solved using attractor computations in the procedure described above. Self-cycle games. Suppose there is a node v such that vEv. Then there are two cases depending on the node’s owner p and the parity of the node’s priority. If
188
O. Friedmann and M. Lange
Ω(v) ≡2 p then taking the edge (v, v) is always a bad choice for player p and this edge can be removed from the game. If v is v’s only successor then v itself can be removed and the process iterated in order to preserve totality. If Ω(v) ≡2 p then taking this edge is always good in the sense that the partial function [v → v] is a winning strategy for player p on v. Hence, its attractor can be removed as described above. It is therefore possible to remove self-cycles from a game in time O(|E|), returning winning sets Wi and strategies σi for i ∈ {0, 1} that result from the attractors of those nodes that are good for the respective players. One-parity games. If all nodes in a proper SCC have the same parity the whole game is won by the corresponding player no matter which choice she makes. Hence, a winning strategy can be found by random choice in time O(|V |). One-player games. A game G is a one-player game for player i iff for all v ∈ V1−i we have |vE| = 1. It can be solved in time O(|Ω| · |E|) as follows. Consider the largest priority p in G, and let P := {v | Ω(v) = p}. There are two cases. – If p ≡2 i then player i wins the entire G because it is assumed to be a proper SCC and player 1 − i does not make any choices, so she can reach a node in P from any node in the game. A winning strategy can easily be constructed from an attractor strategy for P . – If p ≡2 then let A := Attr 1−i (P ). Note that A consists of all nodes from which player i has to move through a node with priority p which is bad for her. Let C0 , . . . , Cm be a decomposition of G \ A into SCCs. Now, player i wins from all nodes in G iff she wins from all nodes in one of C0 , . . . , Cm , simply because they are part of the original SCC G in which player 1 − i does not move, so the attractor of any winning node is always the entire G. This gives a simple recursive algorithm which considers the largest priority and either terminates or removes attractors, decomposes into SCCs and calls itself recursively on the sub-SCCs. If player 1 − i does not win on the entire G, then player 1 − i wins on the entire G, and this is the case if in all the recursive calls no sub-SCC is won by player i. Clearly, this can be realised in time O(|Ω| · |E|). 3.3
3.3 Priority Compression
The complexity of a parity game rises with |Ω|. This optimisation step attempts to reduce this number. Note that it is not the actual values of the priorities that determine the winner; it is rather their parity on the one hand and their ordering on the other. For instance, if there are two priorities p1 < p2 in a game with p1 ≡2 p2, but there is no priority p occurring in the game such that p1 < p < p2 and p ≢2 p1, then every occurrence of p2 can be replaced by p1. The compression of G is a partial mapping ω : N → N that is defined on Ω(v) for every node v of the underlying game G; monotonic (x ≤ y implies ω(x) ≤ ω(y)); decreasing (ω(x) ≤ x); parity-preserving (ω(x) ≡2 x); dense (ω(x) < ω(y) − 1 implies ∃z. ω(x) < ω(z) < ω(y)); and minimal (min{ω(x) | ω(x) ≠ ⊥} < 2). Note that the compression of G is unique.
It can easily be computed in time O(|V| log |V|): sort all nodes in ascending order of priority and then construct ω in a single sweep through this order, starting with 0 or 1 depending on the parity of the least priority in G. If G = (V, V0, V1, E, Ω) and ω is its compression then Comp(G) = (V, V0, V1, E, ω ◦ Ω). It is the case that G and Comp(G) have the same winning regions and strategies.
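The sweep can be sketched in a few lines (Python, our own naming; the argument is the collection of priorities occurring in the game):

    def compress(priorities):
        # Map each priority to the least value that preserves parity and
        # relative order: e.g. compress([2, 4, 5, 9]) == {2: 0, 4: 0, 5: 1, 9: 1}.
        omega, next_val = {}, None
        for p in sorted(set(priorities)):
            if next_val is None:
                next_val = p % 2               # minimality: start with 0 or 1
            elif p % 2 != next_val % 2:
                next_val += 1                  # parity changes: move up by one
            omega[p] = next_val                # same parity: reuse it (density)
        return omega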
3.4 Priority Propagation
Suppose a play passes through a node v. Then it ultimately has to pass through one of its successors as well. If the priority of v is at most as high as that of all its successors, then one can replace the priority of v with the minimum of its successors' priorities without changing the winning regions and strategies. This is backwards propagation. Equally, in forwards propagation one replaces a node's priority with the minimum of the priorities of all its predecessors if that minimum is greater than the current priority. This is sound because a node can only contribute to the determination of the winner of a play if it is visited repeatedly, i.e. if the play passes through one of its predecessors repeatedly as well. Technically, priority propagation on G = (V, V0, V1, E, Ω) computes a game G′ = (V, V0, V1, E, Ω′) s.t. for all v ∈ V: Ω′(v) := max{Ω(v), min{Ω(w) | w ∈ U_v}}, where U_v = vE in the case of backwards propagation and U_v = Ev otherwise. Note that propagation can be iterated, the forwards and backwards facets can be intertwined, and this process is guaranteed to terminate because priorities are only ever increased, and never beyond the maximal priority in G. Hence, the worst-case running time is O(|Ω| · |V| · |E|); on average it will be much faster though. Note that, even though priority propagation increases priorities, it decreases the range of priorities in a game, and is therefore supposedly beneficial when it precedes compression, because it allows for more compressed priorities. Empirically, however, the use of priority propagation turns out to be harmful. This is why it is not considered in the generic solver presented next.
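Iterated propagation can be sketched as follows (Python, our own encoding: succ maps v to vE and pred maps v to Ev; both facets are intertwined until a fixpoint, which exists because priorities only increase and are bounded by the maximal priority):

    def propagate(prio, succ, pred):
        prio, changed = dict(prio), True
        while changed:
            changed = False
            for neighbours in (succ, pred):    # backwards, then forwards
                for v, us in neighbours.items():
                    if us:
                        m = min(prio[u] for u in us)
                        if m > prio[v]:        # raise the priority of v
                            prio[v] = m
                            changed = True
        return prio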
3.5 A Generic Solver
We propose to solve parity games using the following generic algorithm which takes as parameter a real solver and applies it only where necessary. It relies heavily on SCC decomposition and attractor computations. Self-cycle elimination is done first because it can only be applied once and for all. Then the game is decomposed, and from then on, only final SCCs are being solved. Their priorities are being compressed – note that compression within an SCC rather than the entire game generally leads to better results – and are checked for being special cases. If this does not solve the SCC then the parameter solver is put to work on it. Finally, attractors of computed winning regions are formed, and the SCC decomposition is refined accordingly.
GenericSolver(G = (V, V0, V1, E, Ω), S) =
 1    initialise empty winning regions W0, W1 and strategies σ0, σ1
 2    eliminate self-cycles from G
 3    while G is not empty do
 4        decompose G into SCCs
 5        for each final SCC C do
 6            if C is a one-player SCC then
 7                solve C directly
 8            else if C is a one-parity SCC then
 9                solve C directly
10            else
11                compress priorities on C
12                solve C using S
13        compute and remove attractors of the winning regions in C
Here we assume that the procedures in lines 2, 7, 9 and 12 update the variables W0, W1, σ0, σ1 with the information about winning regions and strategies that they have found on parts of the game. Hence, the solution to the entire game is stored in these variables in the end. Note that this generic algorithm is sound whenever the backend S is sound, meaning that every answer it computes for a node is correct. It is complete (an answer is computed for every node) if the backend is complete. Completeness of the backend is, however, not necessary: it suffices if the backend solves at least one node of any given game. Then the generic algorithm will still eventually terminate with W0 ∪ W1 = V.
4 Empirical Evaluation
The generic solver described above, together with 8 real solvers from the literature, has been implemented in a tool called PGSolver, publicly available via http://www.tcs.ifi.lmu.de/pgsolver. The tool is written in OCaml and uses standard array representations for manipulating game graphs; in particular, no symbolic methods. Here we report on some of PGSolver's runtime results on benchmarking families of games. These benchmarks should cover typical applications of parity games, in particular games from the area of model checking and decision problems for branching-time logics. However, we remark that so far there is no standard collection of parity game benchmarks. Here we start with the following.
– Decision procedures. We apply to certain (hard) formulas the exponential reduction of the validity problem for the modal μ-calculus to the parity game problem using Piterman's determinisation procedure [9].
– Model checking. We encode two verification problems (fairness and reachability) as parity games.
– Random games. Because of the absence of meaningful standardised parity game benchmarks, we also evaluate the generic solver on random games.
The benchmarking games are presented in detail in the following. We only report on runtime results of the generic solver using the recursive algorithm, strategy improvement and the small progress measures algorithm as backends because these three algorithms turn out to be the best, in the sense that in general they solve games faster than the other algorithms, like the local model checker for example. This holds regardless of whether they are used directly or as a backend to the generic solver. In order to exhibit the benefits of using the generic solver we present runtime results using various combinations of the optimisations: table columns labeled all contain running times obtained from the generic solver as presented above, i.e. using removal of self-cycles, SCC decomposition, priority compression and detection of special cases. Equally, columns labeled none indicate using none of these, i.e. the entire game is solved by the backend. Columns labeled scc contain results from using SCC decomposition only, while cyc indicates the application of both SCC decomposition and removal of self-cycles. Finally, columns named pcsg imply the application of SCC decomposition, priority compression as well as detection of special cases. Note that the series presented in the tables to follow do not start with the smallest instances; we only present instances with non-negligible running times. On the other hand, solving larger instances than those presented in the tables has experienced time-outs after one hour, marked †, or the games were already too large to be stored in the heap space. All tests have been carried out on a machine with two 2.4 GHz Intel® Xeon™ processors and 4 GB of RAM. The implementation does not (yet) support parallel computations; hence, each test is run on one processor only.
Decision Procedures. Consider the following μ-calculus formulas ϕn := ψn ∨ ¬ψn, n ∈ N, where
ψn := μX1.νX2. ··· σnXn. q1 ∨ ♦(X1 ∧ (q2 ∨ ♦(X2 ∧ ··· (qn ∨ ♦Xn) ···)))
with σn = μ if n is odd, and σn = ν otherwise. Obviously, ϕn is valid. It has been chosen because of its high alternation depth, which requires a relatively large NBA An that checks for the unfoldings of ν-formulas. The nodes of the parity game Gn resulting from ϕn are sets of subformulas of ϕn together with a state of a deterministic parity automaton Bn which is equivalent to An and which gives the game node its priority. The number of priorities in Bn depends on the size of An [9]. Hence, ϕn is chosen in order to yield games of large index. Another family to be considered is the following μ-calculus formula
ϕ′n := νX. q ∧ ♦(q ∧ ♦(··· ♦(q ∧ ♦(¬q ∧ ♦X)) ···))   (with 2n occurrences of q)
       → νZ.μY.(¬q ∧ ♦Z) ∨ (q ∧ ♦(q ∧ ♦Y))
which describes the language inclusion ((aa)^n b)^ω ⊆ ((aa)^* b)^ω.
                           Recursive Algorithm              Strategy Improvement             Small Progress Measures
       n    nodes   sccs |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none
ϕn     2    462     84   | 0.0s  0.0s  0.0s  0.0s  0.0s | 0.0s  0.2s  0.0s  0.2s  1.0s | 0.0s  0.1s  0.0s  0.1s  0.3s
       3    2.5K    219  | 0.0s  0.0s  0.0s  0.0s  0.5s | 0.2s  5.8s  0.2s  6.2s  4.7m | 0.2s  1.6s  0.2s  1.7s  4.6s
       4    14K     966  | 0.1s  0.1s  0.2s  0.2s  6.4s | 7.4s  4.1m  8.1s  4.6m     † | 0.6s   12s  0.8s   13s   39s
       5    58K     3.4K | 0.6s  0.7s  1.2s  1.2s   53s |  49s   33m   75s   40m     † | 2.2s   56s  2.9s   60s  2.4m
       6    262K    15K  | 5.0s  5.2s   29s   29s   17m |  12m     †   14m     †     † | 5.1s  6.7m   32s  7.8m     †
       7    982K    55K  |  22s   25s  4.6m  5.2m     † |    †     †     †     †     † |  35s   42m  5.1m   44m     †
ϕ′n   10    4.5K    1.1K | 0.1s  0.1s  0.1s  0.2s  0.3s | 0.1s   22m  0.1s   30m     † | 0.0s  0.4s  0.0s  0.4s  0.9s
      50    21.5K   5.5K | 0.5s  0.5s  3.3s  4.5s  7.8s | 0.6s     †  6.5s     †     † | 0.0s 13.6s  0.0s   15s   61s
     100    42.6K  10.8K | 0.6s  0.7s   20s   20s  2.7m | 1.1s     †   21s     †     † | 0.1s  2.2m  0.2s  2.3m  6.5m
     500    211K   54.1K | 5.4s  6.7s   13m   13m   29m | 6.5s     †   11m     †     † | 0.3s     †  5.1s     †     †
      1K    423K   108K  | 6.5s  7.6s   53m   54m     † | 7.9s     †   43m     †     † | 0.6s     †   17s     †     †
      2K    846K   216K  |  24s   27s     †     †     † |  18s     †     †     †     † | 1.4s     †   87s     †     †

Fig. 2. Runtime results on games from the decision procedures domain
The times needed to solve the resulting games, as well as their sizes, are presented in Fig. 2.
Model Checking I. We encode a simple fairness verification problem as a parity game. States of a transition system modelling an elevator for n floors are of type {1, ..., n} × {o, c} × (∪{Perm(S) | S ⊆ {1, ..., n}}). The first component describes the current position of the elevator as one of the floors. The second component indicates whether the door is open or closed. The third component (a permutation of a subset of all available floors) holds the requests, i.e. those floors that should be served next. The transitions on these are as follows.
– At any moment, any request or none can be issued. For simplicity, we assume that at most one floor is added to the requests per transition. Note that, non-deterministically, no request may be issued, and a request for a floor that is already contained in the current requests does not change them.
– If the door is open then it is closed in the next step; the current floor does not change.
– If it is closed, the elevator moves one floor (up or down) in the direction of the first request. If the floor reached that way is among the requested ones, the door is opened and that floor is removed from the current requests. Otherwise, the door remains closed.
We consider two different implementations of this elevator model: the first one stores requests in FIFO style, the second in LIFO style. The games Gn (with FIFO), resp. G′n (with LIFO), result from encoding the model checking problem for this transition system and the CTL∗ formula A(GF isPressed → GF isAt) as a parity game [14]. The proposition isPressed holds in any state s.t. the request list contains the number n, and isAt holds in a state where the current floor is n. Hence, this formula requires all runs of the elevator to satisfy the following fairness property: if the top floor is requested infinitely often then it is served infinitely often. It can easily be formulated in the modal μ-calculus using a formula of size 11 and alternation depth 2 (of type ν–μ–ν). Hence the resulting parity games have constant index 3.
                          Recursive Algorithm              Strategy Improvement             Small Progress Measures
       n    nodes   sccs |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none
Gn     3    564     95   | 0.0s  0.0s  0.0s  0.0s  0.0s | 0.0s  0.1s  0.0s  0.1s 0.10s | 0.0s  0.0s  0.0s  0.0s  0.1s
       4    2.6K    449  | 0.1s  0.1s  0.1s  0.1s  0.2s | 0.1s  1.8s  0.1s  2.0s  3.1s | 0.1s  0.4s  0.1s  0.4s  0.4s
       5    15.6K   2.6K | 0.4s  0.5s  0.6s  0.7s  1.4s | 0.5s  2.0s  0.7s  2.2s  2.3s | 0.5s  2.9s  0.7s  3.0s  3.9s
       6    108K    18K  | 3.1s  4.7s  4.9s  6.0s   11s | 3.1s     †  4.5s     †     † | 4.0s   33s  5.8s   33s   37s
       7    861K    143K |  34s   44s   50s   73s  1.8m |  36s     †   53s     †     † |  39s  6.7m   59s  6.9m  7.6m
G′n    3    588     99   | 0.0s  0.0s  0.0s  0.0s  0.0s | 0.0s  0.0s  0.0s  0.0s  0.0s | 0.0s  0.0s  0.0s  0.0s  0.0s
       4    2.8K    473  | 0.1s  0.1s  0.1s  0.1s  0.1s | 0.1s  0.2s  0.1s  0.3s  0.7s | 0.1s  0.2s  0.1s  0.2s  0.2s
       5    16.3K   2.7K | 0.6s  0.7s  0.8s  0.9s  1.0s | 0.6s  2.4s  0.8s  5.7s   13s | 0.5s  1.3s  0.5s  1.5s  1.5s
       6    111K   18.5K | 3.8s  4.3s  5.6s  6.0s  7.1s | 3.8s   21s  8.7s   46s  5.3m | 5.2s   20s  7.0s   24s   24s
Fig. 3. Runtime results on games from the elevator verification example
Note that Gn encodes a positive instance of the model checking problem whereas G′n encodes a negative one. The times needed to solve them, as well as their sizes, are presented in Fig. 3. Larger instances caused out-of-memory failures due to the size of the underlying transition system.
Model Checking II. Typical verification problems often lead to special parity games for which there are specialised solvers. For instance, CTL model checking problems lead to alternation-free parity games, i.e. those in which every SCC is a single-parity SCC with index 0 or 1. We therefore consider a second set of benchmarks from the verification domain in the form of very special games. We model the well-known Towers of Hanoi, represented as a transition system in which states consist of three stacks containing the numbers {1, ..., n}. The initial state is ([1, ..., n], [], []), and each state has up to 6 successors resulting from shifting the top element of one stack to another, as long as the top of the target stack is not smaller. The property to be tested is the CTL formula EF fin, where fin holds in the state ([], [1, ..., n], []) only. The resulting game Gn = (V, V0, V1, E, Ω) is special because V1 = ∅ and only the priorities 0 and 1 are assigned to the states. The times needed to solve these games and their sizes are shown in Fig. 4. Note that the interesting part of solving the games of the former example is the computation of the winning regions, which show those states from which the elevator has fair runs. Here, however, the interesting part is the computation of the winning strategy for player 0, since it represents a strategy for solving the Towers-of-Hanoi puzzle.
Random Games. Finally, we evaluate the generic solver on random games. Note that the standard model of a random game, which chooses for each node some d successors and randomly assigns priorities as well as node owners, leads to graphs which typically consist of one large SCC and several 1-node SCCs which have successors in the large one. Those do not add significantly to the runtime of the solving process, which is predominantly determined by the large SCC. Hence, SCC decomposition would not necessarily prove to be useful in this random model. The truth, however, is that SCC decomposition is indeed useful in general, but this random model creates special games on which it is not.
                      Recursive Algorithm              Strategy Improvement             Small Progress Measures
   n    nodes   sccs |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none
   5    972     244  | 0.0s  0.0s  0.0s  0.0s  0.0s | 0.0s  0.0s  0.0s  0.0s  0.9s | 0.0s  0.0s  0.0s  0.0s  0.3s
   6    2.9K    730  | 0.0s  0.0s  0.0s  0.0s  0.1s | 0.0s  0.0s  0.0s  0.0s   15s | 0.0s  0.0s  0.0s  0.0s  2.2s
   7    8.7K    2.1K | 0.1s  0.1s  0.1s  0.1s  0.2s | 0.1s  0.1s  0.1s  0.1s  5.4m | 0.1s  0.1s  0.1s  0.1s   13s
   8    26K     6.5K | 0.2s  0.2s  0.4s  0.4s  0.7s | 0.2s  0.2s  0.4s  0.4s     † | 0.2s  0.2s  0.4s  0.4s   77s
   9    78K     19K  | 0.7s  0.7s  1.0s  1.0s  2.2s | 0.7s  0.7s  1.0s  1.0s     † | 0.7s  0.7s  1.0s  1.0s  9.9m
  10    236K    59K  | 2.3s  2.3s  3.3s  3.3s  4.1s | 2.3s  2.3s  3.3s  3.3s     † | 2.3s  2.3s  3.3s  3.3s   37m
  11    708K    177K | 7.2s  7.2s   13s   13s   21s | 7.2s  7.2s   13s   13s     † | 7.2s  7.2s   13s   13s     †
Fig. 4. Runtime results on games from the Towers-of-Hanoi example
                        Recursive Algorithm              Strategy Improvement             Small Progress Measures
 nodes  avg.sccs |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none |  all   cyc  pcsg   scc  none
   1K      31    | 0.0s  0.0s  0.0s  0.0s  0.0s | 0.0s  0.0s  0.3s  0.3s  1.2s | 0.0s  0.0s     †     †     †
   2K      71    | 0.0s  0.0s  0.0s  0.0s  0.1s | 0.0s  0.0s  0.3s  0.3s  6.7s | 0.0s  0.0s     †     †     †
   5K      130   | 0.1s  0.1s  0.1s  0.1s  0.2s | 0.1s  0.1s  0.7s  0.7s   61s | 0.1s  0.1s     †     †     †
  10K      244   | 0.2s  0.2s  0.2s  0.2s  0.5s | 0.2s  0.2s  0.8s  0.9s  5.9m | 0.4s  0.4s     †     †     †
  20K      458   | 0.3s  0.4s  0.3s  0.4s  1.1s | 0.4s  0.4s  2.7s  2.7s   32m | 0.7s  0.7s     †     †     †
  50K      1K    | 0.8s  1.1s  0.8s  1.1s  2.9s | 1.1s  1.1s  3.7s  3.7s     † | 1.7s  1.8s     †     †     †
 100K      1.5K  | 2.7s  4.1s  2.9s  4.1s  6.3s | 3.9s  4.0s  6.0s   11s     † | 6.2s  6.5s     †     †     †
 200K      2.3K  | 5.9s  8.4s  5.5s  8.4s   14s | 8.4s  8.4s   16s   22s     † |  13s   14s     †     †     †
 500K      3.4K  |  16s   20s   18s   19s   60s |  19s   20s   34s   34s     † |  30s   31s     †     †     †
   1M      12K   |  99s  2.1m  1.7m  2.3m   14m | 1.7m  1.7m   13m   26m     † | 2.8m  3.0m     †     †     †
Fig. 5. Runtime results on random games
While special games are important to consider, random games should be more general ones, since a random model is typically employed in order to capture all sorts of other games. Thus, we enhance this simple random model in order to obtain more interesting games of size n: first, create clusters of sizes < n according to this model, then combine these whilst adding random edges between the clusters. Fig. 5 presents the average number of SCCs that these random games possess, as well as the corresponding average runtime results. Each row represents 100 random games of the corresponding size.
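A sketch of this steered random model (Python; the parameter names and the exact wiring between clusters are our own choices, not a specification of PGSolver's generator):

    import random

    def clustered_random_game(n, max_cluster, deg, max_prio):
        owner, prio, succ, clusters, v = {}, {}, {}, [], 0
        while v < n:                               # build small random clusters
            size = min(random.randint(2, max_cluster), n - v)
            cluster = list(range(v, v + size))
            for u in cluster:
                owner[u] = random.randint(0, 1)
                prio[u] = random.randint(0, max_prio)
                succ[u] = random.sample(cluster, min(deg, size))
            clusters.append(cluster)
            v += size
        for c in clusters:                         # wire the clusters together
            target = random.choice(clusters)
            succ[random.choice(c)].append(random.choice(target))
        return owner, prio, succ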
5 Conclusions
The previous section shows that it is possible to solve large parity games efficiently in practice. Contrary to common belief, even a large number of priorities does not necessarily pose a great difficulty in practice. All in all, there are five notable, maybe even surprising, observations that can generally be made here.
(1) The recursive algorithm is much better than the other two algorithms if applied without any optimisation and preprocessing techniques. We believe that this is due to the nature of the recursive algorithm being itself based on a continuous decomposition of the game.
(2) SCC decomposition alone is already highly profitable, and in general even more so when combined with any of the other optimisations, particularly the elimination of self-cycles.
(3) Not every optimisation speeds up all algorithms alike. The recursive algorithm seems to profit more from self-cycle elimination than from priority compression and the solving of special cases. This could be due to the fact that, without eliminating self-cycles, it typically requires a deep recursive descent in order to detect them by the recursion mechanism. On the other hand, the considered special cases can be solved quite fast by the recursive algorithm. Regarding strategy iteration and small progress measures iteration, it is the other way round: the detection of special cases as well as priority compression speed up the algorithms much more than treating self-cycles beforehand. The former is not surprising because both algorithms summarize nodes with the same priority, and it is clear that the direct solution of special cases is faster than applying the iteration techniques to them. Finally, it is also clear why strategy iteration does not profit as much from the elimination of self-cycles as the recursive algorithm: strategy iteration basically detects cycles and computes attractor-like strategies seeking cycles, which is similar to the preprocessing technique that removes self-cycles. Similarly, the small progress measures algorithm easily detects self-cycles and back-propagates them through the graph, which also corresponds to the computation of attractor-like strategies.
(4) There are even complex instances, like the Towers-of-Hanoi example, that are completely solved by the generic algorithm, i.e. without calling the backend even once. Obviously, generic optimisations as discussed in this paper do not have the potential to give rise to a polynomial-time algorithm that solves arbitrary parity games. But solving real-world parity game problems in practice can be heavily sped up by generic optimisation techniques.
(5) In general, it is advisable to enable all optimisations. Thus even an inexperienced user is on the safe side by activating all of them. Also, using them can cause tremendous speed-ups, which is witnessed for example by the drop of the average runtime from 14 to 1.5 minutes on random games with 1 million nodes using the recursive algorithm. Besides developing additional universal optimisation techniques and parallelizing existing backend algorithms, there is another approach that should turn out to be of high value for practical solving: since it is very unlikely that one would ever find a real-world family of games on which all of the known backend algorithms show bad performance, there is an immediate improvement for the generic solver: it could take an arbitrary number of complete solvers as arguments and run them in parallel on those parts that cannot be solved by simpler methods. As soon as any of them provides a solution, the other computations can be killed.
References
1. Antonik, A., Charlton, N., Huth, M.: Polynomial-time under-approximation of winning regions in parity games. Technical report, Dept. of Computer Science, Imperial College London (2006)
2. Arnold, A., Vincent, A., Walukiewicz, I.: Games for synthesis of controllers with partial observation. Theor. Comput. Sci. 303(1), 7–34 (2003)
3. Emerson, E.A., Jutla, C.S.: Tree automata, μ-calculus and determinacy. In: Proc. 32nd Symp. on Foundations of Computer Science, San Juan, Puerto Rico, pp. 368–377. IEEE, Los Alamitos (1991)
4. Jurdziński, M.: Deciding the winner in parity games is in UP ∩ co-UP. Inf. Process. Lett. 68(3), 119–124 (1998)
5. Jurdziński, M.: Small progress measures for solving parity games. In: Reichel, H., Tison, S. (eds.) STACS 2000. LNCS, vol. 1770, pp. 290–301. Springer, Heidelberg (2000)
6. Jurdziński, M., Paterson, M., Zwick, U.: A deterministic subexponential algorithm for solving parity games. In: Proc. ACM-SIAM Symp. on Discrete Algorithms, SODA 2006, pp. 114–123. ACM/SIAM (2006)
7. Kähler, D., Wilke, T.: Complementation, disambiguation, and determinization of Büchi automata unified. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 724–735. Springer, Heidelberg (2008)
8. Lange, M.: Solving parity games by a reduction to SAT. In: Majumdar, R., Jurdziński, M. (eds.) Proc. Int. Workshop on Games in Design and Verification, GDV 2005 (2005)
9. Piterman, N.: From nondeterministic Büchi and Streett automata to deterministic parity automata. In: Proc. 21st Symp. on Logic in Computer Science, LICS 2006, pp. 255–264. IEEE Computer Society, Los Alamitos (2006)
10. Safra, S.: On the complexity of ω-automata. In: Proc. 29th Symp. on Foundations of Computer Science, FOCS 1988, pp. 319–327. IEEE, Los Alamitos (1988)
11. Schewe, S.: Solving parity games in big steps. In: Arvind, V., Prasad, S. (eds.) FSTTCS 2007. LNCS, vol. 4855, pp. 449–460. Springer, Heidelberg (2007)
12. Schewe, S.: An optimal strategy improvement algorithm for solving parity and payoff games. In: Kaminski, M., Martini, S. (eds.) CSL 2008. LNCS, vol. 5213, pp. 369–384. Springer, Heidelberg (2008)
13. Stevens, P., Stirling, C.: Practical model-checking using games. In: Steffen, B. (ed.) TACAS 1998. LNCS, vol. 1384, pp. 85–101. Springer, Heidelberg (1998)
14. Stirling, C.: Local model checking games. In: Lee, I., Smolka, S.A. (eds.) CONCUR 1995. LNCS, vol. 962, pp. 1–11. Springer, Heidelberg (1995)
15. Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Computing 1, 146–160 (1972)
16. van de Pol, J., Weber, M.: A multi-core solver for parity games. Electr. Notes Theor. Comput. Sci. 220(2), 19–34 (2008)
17. Vöge, J., Jurdziński, M.: A discrete strategy improvement algorithm for solving parity games. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855, pp. 202–215. Springer, Heidelberg (2000)
18. Zielonka, W.: Infinite games on finitely coloured graphs with applications to automata on infinite trees. TCS 200(1–2), 135–183 (1998)
Automated Analysis of Data-Dependent Programs with Dynamic Memory
Parosh Aziz Abdulla¹, Muhsin Atto², Jonathan Cederberg¹, and Ran Ji³
¹ Uppsala University, Sweden
² University of Duhok, Kurdistan-Iraq
³ Chalmers University of Technology, Gothenburg, Sweden
Abstract. We present a new approach for automatic verification of data-dependent programs manipulating dynamic heaps. A heap is encoded by a graph where the nodes represent the cells, and the edges reflect the pointer structure between the cells of the heap. Each cell contains a set of variables which range over the natural numbers. Our method relies on standard backward reachability analysis, where the main idea is to use a simple set of predicates, called signatures, in order to represent bad sets of heaps. Examples of bad heaps are those which contain either garbage, lists which are not well-formed, or lists which are not sorted. We present the results for the case of programs with a single next-selector, and where variables may be compared for (in)equality. This allows us to verify, for instance, that a program like bubble sort or insertion sort returns a list which is well-formed and sorted, or that the merging of two sorted lists is a new sorted list. We report on the results of running a prototype based on the method on a number of programs.
1 Introduction
We consider the automatic verification of data-dependent programs that manipulate dynamic linked lists. The contents of the linked lists, here referred to as a heap, is represented by a graph. The nodes of the graph represent the cells of the heap, while the edges reflect the pointer structure between the cells (see Figure 1 for a typical example).

[Fig. 1. A typical graph representing the heap]

The program has a dynamic behaviour in the sense that cells may be created and deleted, and that pointers may be re-directed during the execution of the program. The program is also data-dependent since the cells contain variables, ranging over the natural numbers, that can be compared for (in)equality and whose values may be updated by the program. The values of the local variables are provided as attributes to the corresponding cells. Finally, we have a set of (pointer) variables which point to different cells inside the heap.
In this paper, we consider the case of programs with a single next-selector, i.e., where each cell has at most one successor. For this class of programs, we give a method for automatic verification of safety properties. Such properties can be either structural properties such as absence of garbage, sharing, and dangling pointers; or data properties such as sortedness and value uniqueness. We provide a simple symbolic representation, which we call signatures, for characterizing (infinite) sets of heaps. Signatures can also be represented by graphs. One difference, compared to the case of heaps, is that some parts may be missing from the graph of a signature. For instance, the absence of a pointer means that the pointer may point to an arbitrary cell inside a heap satisfying the signature. Another difference is that we only store information about the ordering on values of the local variables rather than their exact values. A signature can be interpreted as a forbidden pattern which should not occur inside the heap. The forbidden pattern is essentially a set of minimal conditions which should be satisfied by any heap in order for the heap to satisfy the signature. A heap satisfying the signature is considered to be bad in the sense that it contains a bad pattern which in turn implies that it violates one of the properties mentioned above. Examples of bad patterns in heaps are garbage, lists which are not well-formed, or lists which are not sorted. This means that checking a safety property amounts to checking the reachability of a finite set of signatures. We perform standard backward reachability analysis, using signatures as a symbolic representation, and starting from the set of bad signatures. We show how to perform the two basic operations needed for backward reachability analysis, namely checking entailment and computing predecessors on signatures. For checking entailment, we define a pre-order ⊑ on signatures, where we view a signature as three separate graphs with identical sets of nodes. The edge relation in one of the three graphs reflects the structure of the heap graph, while the other two reflect the ordering on the values of the variables (equality resp. inequality). Given two signatures g1 and g2, we have g1 ⊑ g2 if g1 can be obtained from g2 by a sequence of transformations consisting of either deleting an edge (in one of the three graphs), a variable, or an isolated node, or contracting segments (i.e., sequences of nodes) without sharing in the structure graph. In fact, this ordering also induces an ordering on heaps where h1 ⊑ h2 if, for all signatures g, h2 satisfies g whenever h1 satisfies g. When performing backward reachability analysis, it is essential that the underlying symbolic representation, signatures in our case, is closed under the operation of computing predecessors. More precisely, for a signature g, let us define Pre(g) to be the set of predecessors of g, i.e., the set of signatures which characterize those heaps from which we can perform one step of the program and as a result obtain a heap satisfying g. Unfortunately, the set Pre(g) does not exist in general under the operational semantics of the class of programs we consider in this paper. Therefore, we consider an over-approximation of the transition relation where a heap h is allowed first to move to a smaller heap (w.r.t. the ordering ⊑) before performing the transition. For the approximated transition relation, we show that the set Pre(g) exists, and that it is finite and computable.
One advantage of using signatures is that it is quite straightforward to specify sets of bad heaps. For instance, forbidden patterns for the properties of list well-formedness and absence of garbage can each be described by 4-6 signatures, with 2-3 nodes in
each signature. Also, the forbidden pattern for the property that a list is sorted consists of only one signature with two nodes. Furthermore, signatures offer a very compact symbolic representation of sets of bad heaps. In fact, when verifying our programs, the number of nodes in the signatures which arise in the analysis does not exceed ten. In addition, the rules for computing predecessors are local in the sense that they change only a small part of the graph (typically one or two nodes and edges). This makes it possible to check entailment and compute predecessors quite efficiently. The whole verification process is fully automatic since both the approximation and the reachability analysis are carried out without user intervention. Notice that if we verify a safety property in the approximate transition system then this also implies its correctness in the original system. We have implemented a prototype based on our method, and carried out automatic verification of several programs such as insertion into a sorted list, bubble sort, insertion sort, merging of sorted lists, list partitioning, reversing sorted lists, etc. Although the procedure is not guaranteed to terminate in general, our prototype terminates on all these examples.
Outline. In the next section, we describe our model of heaps, and introduce the programming language together with the induced transition system. In Section 3, we introduce the notion of signatures and the associated ordering. Section 4 describes how to specify sets of bad heaps using signatures. In Section 5 we give an overview of the backward reachability scheme, and show how to compute the predecessor and entailment relations on signatures. The experimental results are presented in Section 6. In Section 7 we give some conclusions and directions for future research. Finally, in Section 8, we give an overview of related approaches and the relationship to our work.
2 Heaps
In this section, we give some preliminaries on programs which manipulate heaps. Let N be the set of natural numbers. For sets A and B, we write f : A → B to denote that f is a (possibly partial) function from A to B. We write f(a) = ⊥ to denote that f(a) is undefined. We use f[a ← b] to denote the function f′ such that f′(a) = b and f′(x) = f(x) if x ≠ a. In particular, we use f[a ← ⊥] to denote the function f′ which agrees with f on all arguments, except that f′(a) is undefined.
Heaps. We consider programs which operate on dynamic data structures, here called heaps. A heap consists of a set of memory cells (cells for short), where each cell has one next-pointer. Examples of such heaps are singly linked lists and circular lists, possibly sharing their parts (see Figure 1). A cell in the heap may contain a datum which is a natural number. A program operating on a heap may use a finite set of variables representing pointers whose values are cells inside the heap. A pointer may have the special value null which represents a cell without successors. Furthermore, a pointer may be dangling, which means that it does not point to any cell in the heap. Sometimes, we write the “x-cell” to refer to the cell pointed to by the variable x. We also write “the value of the x-cell” to refer to the value stored inside the cell pointed to by x. A heap can naturally be encoded by a graph, like the one of Figure 1. A vertex in the graph represents a cell in the heap, while the edges reflect the successor (pointer) relation on
the cells. A variable is attached to a vertex in the graph if the variable points to the corresponding cell in the heap. Cell values are written inside the nodes (absence of a number means that the value is undefined). Assume a finite set X of variables. Formally, a heap is a tuple (M, Succ, λ, Val) where
– M is a finite set of (memory) cells. We assume two special cells # and ∗ which represent the constant null and the dangling pointer value respectively. We define M• := M ∪ {#, ∗}.
– Succ : M → M•. If Succ(m1) = m2 then the (only) pointer of the cell m1 points to the cell m2. The function Succ is total, which means that each cell in M has a successor (possibly # or ∗). Notice that the special cells # and ∗ have no successors.
– λ : X → M• defines the cells pointed to by the variables. The function λ is total, i.e., each variable points to one cell (possibly # or ∗).
– Val : M → N is a partial function which gives the values of the cells.
In Figure 1, we have 17 cells of which 15 are in M. The set X is given by {x, y, z, v, w}. The successor of the z-cell is null. Variable w is attached to the cell ∗, which means that w is dangling (w does not point to any cell in the heap). Furthermore, the value of the x-cell is 6, the value of the y-cell is not defined, the value of the successor of the y-cell is 3, etc.
Remark. In fact, we can allow cells to contain multiple values. However, to simplify the presentation, we keep the assumption that a cell contains only one number. This will be sufficient for our purposes; furthermore, all the definitions and methods we present in the paper can be extended in a straightforward manner to the general case. Also, we can use ordered domains other than the natural numbers, such as the integers, rationals, or reals.
Programming Language. We define a simple programming language. To this end, we assume, together with the earlier mentioned set X of variables, the constant null where null ∉ X. We define X# := X ∪ {null}. A program P is a pair (Q, T) where Q is a finite set of control states and T is a finite set of transitions. The control states represent the locations of the program. A transition is a triple (q1, op, q2) where q1, q2 ∈ Q are control states and op is an operation. In the transition, the program changes location from q1 to q2, while it checks and manipulates the heap according to the operation op. The operation op is of one of the following forms:
– x = y or x ≠ y where x, y ∈ X#. The program checks whether the x- and y-cells are identical or different.
– x := y or x.next := y where x ∈ X and y ∈ X#. In the first operation, the program makes x point to the y-cell, while in the second operation it updates the successor of the x-cell, making it equal to the y-cell.
– x := y.next where x, y ∈ X. The variable x will now point to the successor of the y-cell.
– new(x), delete(x), or read(x), where x ∈ X. The first operation creates a new cell and makes x point to it; the second operation removes the x-cell from the heap; while the third operation reads a new value and assigns it to the x-cell.
Fig. 2. Starting from the heap h0 , the heaps h1 , h2 , h3 , h4 , and h5 are generated by performing the following sequence of operations: z.num :> x.num, x := y.next, delete(x), new(x), and z.next := y. To simplify the figures, we omit the special nodes # and ∗ unless one of the variables x, y, z is attached to them. For this reason the cell # is missing in all the heaps, and ∗ is present only in h3 , h4 , h5 .
– x.num = y.num, x.num < y.num, x.num := y.num, x.num :> y.num, or x.num :< y.num, where x, y ∈ X. The first two operations compare the values (the numbers stored inside) of the x- and y-cells. The third operation copies the value of the y-cell to the x-cell. The fourth (fifth) operation assigns non-deterministically a value to the x-cell which is larger (smaller) than that of the y-cell.
Figure 2 illustrates the effect of a sequence of operations of the forms described above on a number of heaps. Examples of some programs can be found in [2].
Transition System. We define the operational semantics of a program P = (Q, T) by giving the transition system induced by P. In other words, we define the set of configurations and a transition relation on configurations. A configuration is a pair (q, h) where q ∈ Q represents the location of the program, and h is a heap. We define a transition relation (on configurations) that reflects the manner in which the instructions of the program change a given configuration. First, we define some operations on heaps. Fix a heap h = (M, Succ, λ, Val). For m1, m2 ∈ M, we use (h.Succ)[m1 ← m2] to denote the heap h′ we obtain by updating the successor relation such that the cell m2 now becomes the successor of m1 (without changing anything else in h). Formally, h′ = (M, Succ′, λ, Val) where Succ′ = Succ[m1 ← m2]. Analogously, (h.λ)[x ← m] is the heap we obtain by making x point to the cell m; and (h.Val)[m ← i] is the heap we obtain by assigning the value i to the cell m. For instance, in Figure 2, let hi be of the form (Mi, Succi, λi, Vali) for i ∈ {0, 1, 2, 3, 4, 5}. Then, we have h1 = (h0.Val)[λ0(z) ← 9] since we make the value of the z-cell equal to 9. Also, h2 = (h1.λ1)[x ← Succ1(λ1(y))] since we make x point to the successor of the y-cell. Furthermore, h5 = (h4.Succ4)[λ4(z) ← λ4(y)] since we make the y-cell the successor of the z-cell.
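To make the encoding concrete, here is a minimal Python sketch of heaps as dictionaries, together with the operation x := y.next; the representation (strings '#' and '*' for the special cells) is our own choice and not fixed by the paper:

    NULL, DANGLING = "#", "*"

    def assign_next(succ, var, x, y):
        # x := y.next on a heap given by succ (cell -> cell, '#' or '*')
        # and var (pointer variable -> cell, '#' or '*').  Only defined when
        # y points to a real cell whose successor is not the dangling cell.
        m = var[y]
        if m in (NULL, DANGLING) or succ[m] == DANGLING:
            raise RuntimeError("operation undefined on this heap")
        var[x] = succ[m]

For instance, the step from h1 to h2 in Fig. 2 corresponds to assign_next(succ, var, 'x', 'y').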
Consider a cell m ∈ M. We define h ⊖ m to be the heap h′ we get by deleting the cell m from h. More precisely, we define h′ := (M′, Succ′, λ′, Val′) where
– M′ = M − {m}.
– Succ′(m′) = Succ(m′) if Succ(m′) ≠ m, and Succ′(m′) = ∗ otherwise. In other words, cells whose successor was m become dangling in h′.
– λ′(x) = ∗ if λ(x) = m, and λ′(x) = λ(x) otherwise. In other words, variables pointing to m in h become dangling in h′.
– Val′(m′) = Val(m′) if m′ ∈ M′. That is, the function Val′ is the restriction of Val to M′: it assigns the same values as Val to all the cells which remain in M′ (since m ∉ M′, it is not meaningful to speak about Val′(m)).
In Figure 2, we have h3 = h2 ⊖ λ2(x). Let t = (q1, op, q2) be a transition and let c = (q, h) and c′ = (q′, h′) be configurations. We write c −t→ c′ to denote that q = q1, q′ = q2, and h −op→ h′, where h −op→ h′ holds if we obtain h′ by performing the operation op on h. For brevity, we give the definition of the relation −op→ for three types of operations. The rest of the cases can be found in [2].
– op is of the form x := y.next, λ(y) ∈ M, Succ(λ(y)) ≠ ∗, and h′ = (h.λ)[x ← Succ(λ(y))].
– op is of the form new(x), M′ = M ∪ {m} for some m ∉ M, λ′ = λ[x ← m], Succ′ = Succ[m ← ∗], Val′(m′) = Val(m′) if m′ ≠ m, and Val′(m) = ⊥. This operation creates a new cell and makes x point to it. The value of the new cell is not defined, while the successor is the special cell ∗. As an example of this operation, see the transition from h3 to h4 in Figure 2.
– op is of the form x.num :> y.num, λ(x) ∈ M, λ(y) ∈ M, Val(λ(y)) ≠ ⊥, and h′ = (h.Val)[λ(x) ← i], where i > Val(λ(y)).
We write c −→ c′ to denote that c −t→ c′ for some t ∈ T, and use −→* to denote the reflexive transitive closure of −→. The relations −→ and −→* are extended to sets of configurations in the obvious manner.
Remark. One could also allow deterministic assignment operations of the form x.num := y.num + k or x.num := y.num − k for some constant k. However, according to the approximate transition relation which we define in Section 5, these operations will have identical interpretations as the non-deterministic operations given above.
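The deletion operation h ⊖ m used above can be sketched in the same dictionary encoding (again our own representation):

    DANGLING = "*"

    def delete(succ, val, var, x):
        # delete(x): remove the x-cell m; successors and pointer variables
        # that referred to m become dangling, exactly as in h ⊖ m.
        m = var[x]
        del succ[m]
        val.pop(m, None)
        for c in succ:
            if succ[c] == m:
                succ[c] = DANGLING
        for z in var:
            if var[z] == m:
                var[z] = DANGLING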
3 Signatures
In this section, we introduce the notion of signatures. We will define an ordering on signatures from which we derive an ordering on heaps. We will then show how to use signatures as a symbolic representation of infinite sets of heaps.
Signatures. Roughly speaking, a signature is a graph which is “less concrete” than a heap in the following sense:
– We do not store the actual values of the cells in a signature. Instead, we define an ordering on the cells which reflects their values.
– The functions Succ and λ in a signature are partial (in contrast to a heap, in which these functions are total).
Formally, a signature g is a tuple of the form (M, Succ, λ, Ord), where M, Succ, λ are defined in the same way as in heaps (Section 2), except that Succ and λ are now partial. Furthermore, Ord is a partial function from M × M to the set {≺, ≡}. Intuitively, if Succ(m) = ⊥ for some cell m ∈ M, then g puts no constraints on the successor of m, i.e., the successor of m can be any arbitrary cell. Analogously, if λ(x) = ⊥, then x may point to any of the cells. The relation Ord constrains the ordering on the cell values. If Ord(m1, m2) = ≺ then the value of m1 is strictly smaller than that of m2; and if Ord(m1, m2) = ≡ then their values are equal. This means that we abstract away the actual values of the cells, and only keep track of their ordering (and whether they are equal). For a cell m, we say that the value of m is free if Ord(m, m′) = ⊥ and Ord(m′, m) = ⊥ for all other cells m′. Abusing notation, we write m1 ≺ m2 (resp. m1 ≡ m2) if Ord(m1, m2) = ≺ (resp. Ord(m1, m2) = ≡). We represent signatures graphically in a manner similar to that of heaps. Figure 3 shows graphical representations of six signatures g0, ..., g5 over the set of variables {x, y, z}. If a vertex in the graph has no successor, then the successor of the corresponding cell is not defined in g (e.g., the y-cell in g4). Also, if a variable is missing in the graph, then this means that the cell to which the variable points is left unspecified (e.g., variable z in g3). The ordering Ord on cells is illustrated by dashed arrows. A dashed single-headed arrow from a cell m1 to a cell m2 indicates that m1 ≺ m2. A dashed double-headed arrow between m1 and m2 indicates that m1 ≡ m2. To simplify the figures, we omit self-loops indicating value reflexivity (i.e., m ≡ m). In this manner, we can view a signature as three graphs with a common set of vertices and with three edge relations, where the first edge relation gives the graph structure, and the other two define the ordering on cell values (inequality resp. equality). In fact, each heap h = (M, Succ, λ, Val) induces a unique signature which we denote by sig(h). More precisely, sig(h) := (M, Succ, λ, Ord) where, for all cells m1, m2 ∈ M, we have m1 ≺ m2 iff Val(m1) < Val(m2) and m1 ≡ m2 iff Val(m1) = Val(m2). In other words, in the signature of h, we remove the concrete values in the cells and replace them by the ordering relation on the cell values. For example, in Figure 2 and Figure 3, we have g0 = sig(h0).
Fig. 3. Examples of signatures
Signature Ordering. We define an entailment relation, i.e., an ordering on signatures. The intuition is that each signature can be interpreted as a predicate which characterizes an infinite set of heaps.
The ordering is then the inverse of implication: smaller signatures impose fewer restrictions and hence characterize larger sets of heaps. We derive a small signature from a larger one by deleting cells, edges, and variables in the graph of the signature, and by weakening the ordering requirements on the cells (the latter corresponds to deleting edges encoding the two relations on data values). To define the ordering, we first give some definitions and describe some operations on signatures. Fix a signature g = (M, Succ, λ, Ord). A cell m ∈ M is said to be semi-isolated if there is no x ∈ X with λ(x) = m, the value of m is free, Succ⁻¹(m) = ∅, and either Succ(m) = ⊥ or Succ(m) = ∗. In other words, m is not pointed to by any variable, its value is not related to that of any other cell, it has no predecessors, and it has no successors (except possibly ∗). We say that m is isolated if it is semi-isolated and in addition Succ(m) = ⊥. A cell m ∈ M is said to be simple if there is no x ∈ X with λ(x) = m, the value of m is free, |Succ⁻¹(m)| = 1, and Succ(m) ≠ ⊥. In other words, m has exactly one predecessor, one successor, and no label. In Figure 3, the topmost cell of g3 is isolated, and the successor of the x-cell in g4 is simple. In Figure 1, the cell to the left of the w-cell is semi-isolated in the signature of the heap. The operations (g.Succ)[m1 ← m2] and (g.λ)[x ← m] are defined in identical fashion to the case of heaps. Furthermore, for cells m1, m2 and r ∈ {≺, ≡, ⊥}, we define (g.Ord)[(m1, m2) ← r] to be the signature we obtain from g by making the ordering relation between m1 and m2 equal to r. For a variable x, we define g ⊖ x to be the signature we get from g by deleting the variable x from the graph, i.e., (g.λ)[x ← ⊥]. For a cell m, we define the signature g′ = g ⊖ m = (M′, Succ′, λ′, Ord′) in a manner similar to the case of heaps. The only difference is that Ord′ (rather than Val′) is the restriction of Ord to pairs of cells both of which are different from m. Now we are ready to define the ordering. For signatures g = (M, Succ, λ, Ord) and g′ = (M′, Succ′, λ′, Ord′), we write g ⊏ g′ to denote that one of the following properties is satisfied:
We write g g to denote that there are g0 g1 g2 · · · gn with n ≥ 0, g0 = g, and gn = g . That is, we can obtain g from g by performing a finite sequence of variable deletion, cell deletion, edge deletion, order deletion, and contraction operations. In Figure 3 we obtain: g1 from g0 through three order deletions; g2 from g1 through one order deletion; g3 from g2 through one variable deletion and two edge deletions; g4 from g3 through one node deletion and one edge deletion; and g5 from g4 through one contraction. It means that g5 g4 g3 g2 g1 g0 and hence g5 g0 . Heap Ordering We define an ordering on heaps such that h h iff sig (h) sig (h ). For a heap h and a signature g, we say that h satisfies g, denoted h |= g, if g sig (h). In this manner,
each signature characterizes an infinite set of heaps, namely the set [[g]] := {h | h |= g}. Notice that [[g]] is upward closed w.r.t. the ordering ⊑ on heaps. We also observe that, for signatures g and g′, we have that g ⊑ g′ iff [[g′]] ⊆ [[g]]. For a (finite) set G of signatures we define [[G]] := ∪_{g∈G} [[g]]. Considering the heaps of Figure 2 and the signatures of Figure 3, we have h1 |= g0, h2 |= g0, h0 ⊑ h1, h0 ⊑ h2, etc.
Remark. Our definition implies that signatures cannot specify “exact distances” between cells. For instance, we cannot specify the set of heaps in which the x-cell and the y-cell are exactly at distance one from each other. In fact, if such a heap is in the set then, since we allow contraction, heaps where the distance is larger than one will also be in the set. On the other hand, we can characterize sets of heaps where two cells are at distance at least k from each other for some k ≥ 1.
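As an illustration of the ordering steps, here is a sketch of a single contraction on the structure graph of a signature (Python; the dictionary encoding and the assertion-based side conditions are our own):

    def contract(succ, var, m1, m2, m3):
        # Contraction: m2 must be simple, i.e. unlabelled, with m1 as its
        # unique predecessor and m3 as its successor; redirect m1 past m2.
        assert succ[m1] == m2 and succ[m2] == m3
        assert m2 not in var.values()                      # no label on m2
        assert all(s != m2 for v, s in succ.items() if v != m1)
        succ[m1] = m3
        del succ[m2]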
4 Bad Configurations
In this section, we show how to use signatures in order to specify sets of bad heaps for programs which produce ordered linear lists. A signature is interpreted as a forbidden pattern which should not occur inside the heap. Typically, we would like such a program to produce a heap which is a linear list. Furthermore, the heap should not contain any garbage, and the output list should be ordered. For each of these three properties, we describe the corresponding forbidden patterns as a set of signatures which characterize exactly those heaps which violate the property. Later, we will collect all these signatures into a single set which exactly characterizes the set of bad configurations. First, we give some definitions. Fix a heap h = (M, Succ, λ, Val). A loop in h is a set {m0, ..., mn} of cells such that Succ(mi) = mi+1 for all i : 0 ≤ i < n, and Succ(mn) = m0. For cells m, m′ ∈ M, we say that m′ is visible from m if there are cells m0, m1, ..., mn for some n ≥ 0 such that m0 = m, mn = m′, and mi+1 = Succ(mi) for all i : 0 ≤ i < n. In other words, there is a (possibly empty) path in the graph leading from m to m′. We say that m′ is strictly visible from m if n > 0 (i.e. the path is not empty). A set M′ ⊆ M is said to be visible from m if some m′ ∈ M′ is visible from m.
Well-Formedness. We say that h is well-formed w.r.t. a variable x if # is visible from the x-cell. Equivalently, neither the cell ∗ nor any loop is visible from the x-cell. Intuitively, if a heap satisfies this condition, then the part of the heap visible from the x-cell forms a linear list ending with #. For instance, the heap of Figure 1 is well-formed w.r.t. the variables v and z. In Figure 2, h0 is not well-formed w.r.t. the variables x and z (a loop is visible), and h4 is not well-formed w.r.t. x (the cell ∗ is visible). The set of heaps violating well-formedness w.r.t. x is characterized by four signatures b1, b2, b3, b4 shown in the figure to the right. The signatures b1 and b2 characterize (together) all heaps in which the cell ∗ is visible from the x-cell. The signatures b3 and b4 characterize (together) all heaps in which a loop is visible from the x-cell.
Garbage. We say that h contains garbage w.r.t. a variable x if there is a cell m ∈ M in h which is not visible from the x-cell. In Figure 2, the heap h0 contains one cell which is garbage w.r.t. x, namely the cell with value 1. The figure to the right shows six signatures b5, ..., b10 which together characterize the set of heaps which contain garbage w.r.t. x.
Sortedness. A heap is said to be sorted if it satisfies the condition that whenever a cell m2 ∈ M is visible from a cell m1 ∈ M then Val(m1) ≤ Val(m2). For instance, in Figure 2, only h5 is sorted. The figure to the right shows a signature b11 which characterizes all heaps which are not sorted.
Putting Everything Together. Given a (reference) variable x, a configuration is considered to be bad w.r.t. x if it violates one of the conditions of being well-formed w.r.t. x, not containing garbage w.r.t. x, or being sorted. As explained above, the signatures b1, ..., b11 characterize the set of heaps which are bad w.r.t. x. We observe that b1 ⊑ b9, b2 ⊑ b10, b3 ⊑ b5 and b4 ⊑ b6, which means that the signatures b9, b10, b5, b6 can be discarded from the set above. Therefore, the set of bad configurations w.r.t. x is characterized by the set {b1, b2, b3, b4, b7, b8, b11}.
Remark. Other types of bad patterns can be defined in a similar manner. Examples can be found in [2].
5 Reachability Analysis
In this section, we show how to check safety properties through backward reachability analysis. First, we give an abstract transition relation −→A which is an over-approximation of the transition relation −→. Then, we describe how to compute predecessors of signatures w.r.t. −→A, and how to check the entailment relation. Finally, we introduce sets of initial heaps (from which the program starts running), and describe how to check safety properties using backward reachability analysis.
Over-Approximation. The basic step in backward reachability analysis is to compute the set of predecessors of sets of heaps characterized by signatures. More precisely, for a signature g and an operation op, we would like to compute a finite set G of signatures such that [[G]] = {h | h −op→ [[g]]}. Consider the signature g shown to the right. The set [[g]] contains exactly all heaps where x and y point to the same cell. Consider the operation op defined by y := z.next. The set H of heaps from which we can perform the operation and obtain a heap in [[g]] are all those where the x-cell is the immediate successor of the z-cell. Since signatures cannot capture the immediate-successor relation (see the remark at the end of Section 3), the set H cannot be characterized by a set G of signatures, i.e., there is no G such that [[G]] = H. To overcome this problem, we define an approximate transition relation −→A which is an over-approximation of the relation −→. More precisely, for heaps h and h′, we have h −op→A h′ iff there is a heap h1 such that h1 ⊑ h and h1 −op→ h′.
Computing Predecessors. We show that, for an operation op and a signature g, we can compute a finite set Pre(op)(g) of signatures such that [[Pre(op)(g)]] = {h | h −op→A [[g]]}. For instance, in the above case the set Pre(op)(g) is given by the set {g1, g2} shown in the figure to the right. Notice that [[{g1, g2}]] is the set of all heaps in which the x-cell is strictly visible from the z-cell. In fact, if we take any heap satisfying g1 or g2, then we can perform deletion and contraction operations (possibly several times) until the x-cell becomes the immediate successor of the z-cell, after which we can perform op, thus obtaining a heap where x and y point to the same cell. For each signature g and operation op, we show how to compute Pre(op)(g) as a finite set of signatures. Due to lack of space, we show the definition only for the operation new. The definitions for the rest of the operations can be found in [2]. For a cell m ∈ M and a variable x ∈ X, we define m being x-isolated in a manner similar to m being isolated, except that we now allow m to be pointed to by x (and only x). More precisely, we say m is x-isolated if λ(x) = m, λ(y) ≠ m whenever y ≠ x, the value of m is free, Succ⁻¹(m) = ∅, and Succ(m) = ⊥. We define m being x-semi-isolated in a similar manner, i.e., by also allowing ∗ to be the successor of the x-cell. For instance, the leftmost cell of the signature b1 in Section 4, and the x-cell in the signature sig(h5) in Figure 2, are x-semi-isolated. We define Pre(new(x))(g) to be the set of signatures g′ such that one of the following conditions is satisfied:
– the cell λ(x) is x-semi-isolated, and there is a signature g1 such that g1 = g ⊖ λ(x) and g′ = g1 ⊖ x.
– λ(x) = ⊥, and g′ = g or g′ = g ⊖ m for some semi-isolated cell m.
Initial Heaps. A program starts running from a designated set HInit of initial heaps. For instance, in a sorting program, HInit is the set of well-formed lists which are (potentially) not sorted. Notice that this set is infinite since there is no bound on the lengths of the input lists. To deal with input lists, we follow the methodology of [7], and augment the program with an initialization phase. The program starts from an empty heap (denoted hε) and systematically (and non-deterministically) builds an arbitrary initial heap. In the case of sorting, the initial phase builds a well-formed list of an arbitrary length. We can now take the set HInit to be the singleton containing the empty heap hε.
Checking Entailment. For signatures g and g′, checking whether g ⊑ g′ amounts to constructing an injection from the cells of g to those of g′. It turns out that a vast majority (more than 99%) of the signatures compared during the reachability analysis are not related by entailment. Therefore, we have implemented a number of heuristics to detect negative answers as quickly as possible. An example is that a cell m in g should have (at most) the same labels as its image m′ in g′; or that the in- and out-degrees of m are smaller than those of m′. The details of the entailment algorithm are included in [2].
Checking Safety Properties. To check a safety property, we start from the set GBad of bad signatures, and generate a sequence G0, G1, G2, ... of finite sets of signatures, where G0 = GBad and G_{i+1} = ∪_{g∈Gi} Pre(g). Each time we generate a signature g such that g′ ⊑ g for some already generated signature g′, we discard g from the analysis.
We terminate the procedure when we reach a point where no new signatures can be added (all new signatures are subsumed by existing ones). In such a case, we have generated a set G of signatures that characterizes all heaps from which we can reach a bad heap through the approximate transition relation −→A. The program satisfies the safety property if g ⋢ sig(hε) for all g ∈ G.
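To make this loop concrete, the following Python sketch shows the backward reachability procedure with subsumption. It is a schematic illustration only: the helpers pre(g) (standing for the union of Pre(op)(g) over all operations op), entails(g1, g2) (deciding g1 ⊑ g2), and sig_init (standing for sig(hε)) are assumed to be supplied, and their names are ours, not the prototype's.

```python
def backward_check(g_bad, pre, entails, sig_init):
    """Return True iff the program is safe, i.e., no signature in the
    backward-reachable set entails the signature of the empty heap."""
    visited = list(g_bad)   # G: all signatures generated so far
    work = list(g_bad)      # worklist of signatures still to process
    while work:
        g = work.pop()
        for p in pre(g):
            # discard p if it is subsumed by an already generated signature
            if any(entails(q, p) for q in visited):
                continue
            visited.append(p)
            work.append(p)
    # safe iff g does not entail sig(h_eps) for every generated g
    return all(not entails(g, sig_init) for g in visited)
```

A real implementation would also prune previously generated signatures that become subsumed by new ones; the sketch keeps only the discarding step described above.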
6 Experimental Results
We have implemented the method described above in a prototype written in Java. We have run the tool on several examples, including all the benchmarks on singly linked lists with data known to us from the TVLA and PALE tools. Table 1 shows the results of our experiments. The column “#Sig.” shows the total number of signatures that were computed throughout the analysis, the column “#Final” shows the number of signatures that remain in the visited set upon termination, the column “#Ent” shows the total number of calls to entailment that were made, and the last column shows the percentage of such calls that returned true. We have also considered buggy versions of some programs, in which case the prototype reports an error. All experiments were performed on a 2.2 GHz Intel Core 2 Duo with 4 GB of RAM. For each program, we verify well-formedness, absence of garbage, and sortedness. Also, in the case of the Partition program, we verify that the two resulting lists do not have common elements.

Table 1. Experimental results

Prog.               Time       #Sig.  #Final  #Ent      Ratio
EfficientInsert     0.1 s         44      40  1570      0.7%
NonDuplicateInsert  0.4 s        111      99  8165      0.2%
Insert              2.6 s       2343    1601  2.2·10^6  0.03%
Insert (bug)        1.4 s        337     268  86000     0.09%
Merge               23.5 s     11910    5830  3.6·10^7  0.017%
Reverse             1.5 s        435     261  70000     0.3%
ReverseCyclic       1.6 s       1031     574  375000    0.1%
Partition           2 m 49 s   21058   15072  1.8·10^8  0.003%
BubbleSort          35.9 s     11023   10034  7.5·10^7  0.001%
BubbleSortCyclic    36.6 s     11142   10143  7.7·10^7  0.001%
BubbleSort (bug)    1.76 s       198     182  33500     0.07%
InsertionSort       11 m 53 s  34843   23324  4.4·10^8  0.003%
7 Conclusions, Discussion, and Future Work
We have presented a method for automatic verification of safety properties for programs which manipulate heaps containing data. There are potentially two drawbacks of our method, namely that the analysis is not guaranteed to terminate, and that it may generate false positives (since we use an over-approximation). A sufficient condition for termination is well quasi-ordering of the entailment relation on signatures (see e.g. [3]). The only example known to us of non-well-quasi-ordering of this relation is based on a complicated sequence pattern by Nash-Williams (described in [13]) which shows
non-well-quasi-ordering of permutations of sequences of natural numbers. Such artificial patterns are unlikely to ever appear in the analysis of typical pointer-manipulating programs. In fact, it is quite hard even to construct artificial programs for which the Nash-Williams pattern arises during backward reachability analysis. This is confirmed by the fact that our implementation terminates on all the given examples. As for false positives, the definition of the heap ordering means that the abstract transition relation −→A allows three types of imprecision: it allows (i) deleting garbage (nodes which are not visible from any variables), (ii) performing contraction, and (iii) storing only the ordering on the data values of cells rather than the actual values. Program runs are not changed by (i), since we only delete nodes which are not accessible from the program pointers in the first place. Also, most program behaviors are not sensitive to the exact distances between nodes in a heap and are therefore not affected by (ii). Finally, data-dependent programs (such as sorting or merging algorithms) check only the ordering, rather than more complicated relations, on the data inside the heap cells. This explains why we do not get false positives on any of the examples on which we have run our implementation. The experimental results are quite encouraging, especially considering the fact that our code is still highly unoptimized. For instance, most of the verification time is spent on checking entailment between signatures. We believe that adapting specialized algorithms, e.g. [20], for checking entailment will substantially improve the performance of the tool. Several extensions of our framework can be carried out by refining the considered preorder (and the abstraction it induces). For instance, if needed, our framework can be extended in a straightforward manner to handle arithmetical relations which are more complicated than simple ordering on data values, such as gap-order constraints [17] or Presburger arithmetic. Given the fact that the analysis terminates on all benchmarks, it is tempting to characterize a class of programs which covers the current examples and for which termination is theoretically guaranteed. Another direction for future work is to consider more general classes of heaps with multiple selectors, and then study programs operating on data structures such as doubly-linked lists and trees, both with and without data.
8 Related Work
Several works consider the verification of singly linked lists with data. The paper [14] presents a method for automatic verification of sorting programs that manipulate linked lists. The method is defined within the framework of TVLA which provides an abstract description of the heap structures in 3-valued logic [19]. The user may be required to provide instrumentation predicates in order to make the abstraction sufficiently precise. The analysis is performed in a forward manner. In contrast, the search procedure we describe in this paper is backward, and therefore also property-driven. Thus, the signatures obtained in the traversal do not need to express the state of the entire heap, but only those parts that contribute to the eventual failure. This makes the two methods conceptually and technically different. Furthermore, the difference in search strategy implies that forward and backward search procedures often offer varying degrees of efficiency in different contexts, which makes them complementary to each other in many
cases. This has been observed also for other models such as parameterized systems, timed Petri nets, and lossy channel systems (see e.g. [4,9,1]). Another approach to the verification of linked lists with data is proposed in [6,7], based on abstract regular model checking (ARMC) [8]. In ARMC, finite-state automata are used as a symbolic representation of sets of heaps. This means that the ARMC-based approach needs to manipulate quite complex encodings of the heap graphs into words or trees. In contrast, our symbolic representation uses signatures, which provide a simpler and more natural representation of heaps as graphs. Furthermore, ARMC uses a sophisticated machinery for manipulating the heap encodings based on representing program statements as (word/tree) transducers. However, as mentioned above, our operations for computing predecessors are all local in the sense that they only update limited parts of the graph, thus making much more efficient implementations possible. The paper [5] uses counter automata as abstract models of heaps which contain data from an ordered domain. The counters are used to keep track of the lengths of list segments without sharing. The analysis reduces to manipulation of counter automata, and thus requires techniques and tools for these automata. Recently, there has been extensive work on using separation logic [18] for performing shape analysis of programs that manipulate pointer data structures (see e.g. [10,21]). The paper [16] describes how to use separation logic in order to provide a semi-automatic procedure for verifying data-dependent programs which manipulate heaps. In contrast, the approach we present here uses a built-in abstraction principle which is different from the ones used above and which makes the analysis fully automatic. The tool PALE (Pointer Assertion Logic Engine) [15] automatically checks properties of programs manipulating pointers. The user is required to supply assertions expressed in the weak monadic second-order logic of graph types. This means that the verification procedure as a whole is only partially automatic. The tool MONA [11], which uses translations to finite-state automata, is employed to verify the provided assertions. Recently, there have been several works which aim at algorithmic verification of systems whose configurations are finite graphs (e.g. [12,3]). These works may seem similar since they are all based on backward reachability using finite graphs as symbolic representations. However, they use different orderings on graphs, which leads to entirely different methods for computing predecessor and entailment relations. In fact, the main challenge when designing verification algorithms on graphs is to come up with the “right” notion of ordering: an ordering which allows computing entailment and predecessors, and which is sufficiently precise to avoid too many false positives. For instance, the graph minor ordering used in [12] to analyze distributed algorithms is too weak to employ in shape analysis. The reason is that the contraction operation (in the case of the graph minor relation) is insensitive to the directions of the edges; furthermore, the ordering allows merging vertices which carry different labels (different variables), meaning that we would get false positives in almost all examples, since they often rely on tests like x = y for termination. In our previous work [3], we combined abstraction with backward reachability analysis for verifying heap-manipulating programs.
However, the programs in [3] are restricted to be data-independent. The extension to the case of
data-dependent programs requires a new ordering on graphs which involves an intricate treatment of structural and data properties. For instance, at the heap level, the data ordering amounts to keeping track of (in)equalities, while the structural ordering is defined in terms of garbage elimination and edge contractions (see the discussion in Section 7). This gives the two orderings entirely different characteristics when computing predecessors and entailment. Also, there is a non-trivial interaction between the structural and the data orderings. This is illustrated by the fact that even specifications of basic data-dependent properties like sortedness require forbidden patterns that contain edges from both orderings (see Section 4). Consequently, none of the programs we consider in this paper can be analyzed in the framework of [3]. In fact, since the programs here are data-dependent, the method of [3] may fail even to verify properties which are purely structural. For instance, the program EfficientInsert (described in [2]) gives a false non-well-formedness warning if data is abstracted away.
References
1. Abdulla, P.A., Annichini, A., Bouajjani, A.: Using forward reachability analysis for verification of lossy channel systems. Formal Methods in System Design (2004)
2. Abdulla, P.A., Atto, M., Cederberg, J., Ji, R.: Automated analysis of data-dependent programs with dynamic memory. Technical Report 2009-018, Dept. of Information Technology, Uppsala University, Sweden (2009), http://user.it.uu.se/~jonmo/datadependent.pdf
3. Abdulla, P.A., Bouajjani, A., Cederberg, J., Haziza, F., Rezine, A.: Monotonic abstraction for programs with dynamic memory heaps. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 341–354. Springer, Heidelberg (2008)
4. Abdulla, P.A., Henda, N.B., Delzanno, G., Rezine, A.: Regular model checking without transducers (on efficient verification of parameterized systems). In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 721–736. Springer, Heidelberg (2007)
5. Bouajjani, A., Bozga, M., Habermehl, P., Iosif, R., Moro, P., Vojnar, T.: Programs with lists are counter automata. In: Ball, T., Jones, R.B. (eds.) CAV 2006. LNCS, vol. 4144, pp. 517–531. Springer, Heidelberg (2006)
6. Bouajjani, A., Habermehl, P., Moro, P., Vojnar, T.: Verifying programs with dynamic 1-selector-linked structures in regular model checking. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 13–29. Springer, Heidelberg (2005)
7. Bouajjani, A., Habermehl, P., Rogalewicz, A., Vojnar, T.: Abstract tree regular model checking of complex dynamic data structures. In: Yi, K. (ed.) SAS 2006. LNCS, vol. 4134, pp. 52–70. Springer, Heidelberg (2006)
8. Bouajjani, A., Habermehl, P., Vojnar, T.: Abstract regular model checking. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 372–386. Springer, Heidelberg (2004)
9. Ganty, P., Raskin, J., Begin, L.V.: A complete abstract interpretation framework for coverability properties of WSTS. In: Emerson, E.A., Namjoshi, K.S. (eds.) VMCAI 2006. LNCS, vol. 3855, pp. 49–64. Springer, Heidelberg (2006)
10. Guo, B., Vachharajani, N., August, D.I.: Shape analysis with inductive recursion synthesis. In: Proc. PLDI 2007, vol. 42 (2007)
11. Henriksen, J., Jensen, J., Jørgensen, M., Klarlund, N., Paige, B., Rauhe, T., Sandholm, A.: Mona: Monadic second-order logic in practice. In: Brinksma, E., Steffen, B., Cleaveland, W.R., Larsen, K.G., Margaria, T. (eds.) TACAS 1995. LNCS, vol. 1019. Springer, Heidelberg (1995)
12. Joshi, S., König, B.: Applying the graph minor theorem to the verification of graph transformation systems. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 214–226. Springer, Heidelberg (2008)
13. Laver, R.: Well-quasi-orderings and sets of finite sequences. In: Mathematical Proceedings of the Cambridge Philosophical Society, vol. 79, pp. 1–10 (1976)
14. Lev-Ami, T., Reps, T.W., Sagiv, S., Wilhelm, R.: Putting static analysis to work for verification: A case study. In: Proc. ISSTA 2000 (2000)
15. Møller, A., Schwartzbach, M.I.: The pointer assertion logic engine. In: Proc. PLDI 2001, vol. 26, pp. 221–231 (2001)
16. Nguyen, H.H., David, C., Qin, S., Chin, W.-N.: Automated verification of shape and size properties via separation logic. In: Cook, B., Podelski, A. (eds.) VMCAI 2007. LNCS, vol. 4349, pp. 251–266. Springer, Heidelberg (2007)
17. Revesz, P.: Introduction to Constraint Databases. Springer, Heidelberg (2002)
18. Reynolds, J.C.: Separation logic: A logic for shared mutable data structures. In: Proc. LICS 2002 (2002)
19. Sagiv, S., Reps, T., Wilhelm, R.: Parametric shape analysis via 3-valued logic. ACM Trans. on Programming Languages and Systems 24(3), 217–298 (2002)
20. Valiente, G.: Constrained tree inclusion. J. Discrete Algorithms 3(2-4), 431–447 (2005)
21. Yang, H., Lee, O., Berdine, J., Calcagno, C., Cook, B., Distefano, D., O’Hearn, P.W.: Scalable shape analysis for systems code. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 385–398. Springer, Heidelberg (2008)
On-the-fly Emptiness Check of Transition-Based Streett Automata
Alexandre Duret-Lutz¹, Denis Poitrenaud², and Jean-Michel Couvreur³
¹ EPITA Research and Development Laboratory (LRDE)
² Laboratoire d'Informatique de Paris 6 (LIP6)
³ Laboratoire d'Informatique Fondamentale d'Orléans (LIFO)
Abstract. In the automata theoretic approach to model checking, checking a state-space S against a linear-time property ϕ can be done in O(|S| × 2^O(|ϕ|)) time. When model checking under n strong fairness hypotheses expressed as a Generalized Büchi automaton, this complexity becomes O(|S| × 2^O(|ϕ|+n)). Here we describe an algorithm to check the emptiness of Streett automata, which allows model checking under n strong fairness hypotheses in O(|S| × 2^O(|ϕ|) × n) time. We focus on transition-based Streett automata, because they allow us to express strong fairness hypotheses by injecting Streett acceptance conditions into the state-space without any blowup.
1 Introduction
The Automata Theoretic Approach to Model Checking [29, 28] is a way to check that a model M verifies some property expressed as a temporal logic formula ϕ, in other words: to check whether M |= ϕ. This verification is achieved in four steps, using automata over infinite words (ω-automata):
1. Computation of the state space of M. This graph can be seen as an ω-automaton A_M whose language L(A_M) is the set of all possible executions of M.
2. Translation of the temporal property ϕ into an ω-automaton A_¬ϕ whose language, L(A_¬ϕ), is the set of all executions that would invalidate ϕ.
3. Synchronized product of these two objects. This constructs an automaton A_M ⊗ A_¬ϕ whose language is L(A_M) ∩ L(A_¬ϕ): the set of executions of the model M that invalidate the temporal property ϕ.
4. Emptiness check of this product. This operation tells whether A_M ⊗ A_¬ϕ accepts an infinite word (a counterexample). The model M verifies ϕ iff L(A_M ⊗ A_¬ϕ) = ∅.
On-the-fly algorithms. In practice the above steps are usually tightly tied together by the implementation, due to transversal optimizations that forbid a sequential approach. One such optimization is on-the-fly model checking, where the computation of the product, the state space, and the formula automaton are all driven by the progression of the emptiness check procedure: nothing is computed before it is required. Being able to work on-the-fly has three practical advantages:
– Large parts of A_M may not need to be built because of the constraints of A_¬ϕ.
– The emptiness check may find a counterexample without exploring (and thus constructing) the entire synchronized product.
– To save memory we can throw away states that have been constructed but are not actually needed, and rebuild them later should they be needed again [13].
From an implementation standpoint, on-the-fly model checking puts requirements on the interface of the automata representing the product, the state graph, and the formula. For instance the interface used in Spot [8] amounts to two functions: one to obtain the initial state of the automaton, another to get the successors of a given state. It is common to say that an emptiness check is on-the-fly when it is compatible with such an interface. For instance Kosaraju's algorithm [2, §23.5] for computing strongly connected components (SCCs) will not work on-the-fly because it has to know the entire graph in order to transpose it. The algorithms of Tarjan [25] and Dijkstra [5, 6] are better suited to computing SCCs on-the-fly, because they perform a single depth-first search.
Fairness hypotheses [10] are a way to restrict the verification to a subset of “fair” executions of the model. For instance if we have a model of two concurrent processes running on the same host, we might want to assume that the scheduler is fair and that both processes will get slices of CPU time infinitely often. When considering all the possible executions of the model, this hypothesis amounts to discarding all executions in which a process is stuck.
Transition-based Büchi and Streett automata. We shall consider two kinds of ω-automata that are expressively equivalent: Büchi automata and Streett automata. Büchi automata are more commonly used because there exist simple translations from LTL formulæ to Büchi automata and there exist many emptiness check algorithms for these automata [4]. Readers familiar with Büchi automata might be surprised that we use transition-based acceptance conditions rather than state-based ones. As noted by several authors [19, 3, 11, 12, 4, 26], this allows LTL formulæ to be translated into smaller automata, and for our purpose it will be useful to show why weak (resp. strong) fairness hypotheses can be added to a Büchi (resp. Streett) automaton without any blowup. Streett automata can also be used as intermediate steps in some methods to complement Büchi automata [27]. For instance, in the automata theoretic approach to model checking, we could want to express a property P to verify, not as an LTL formula, but as a (more expressive) Büchi automaton A_P (or equivalently, an ω-regular expression). To ensure that M |= A_P we should check that L(A_M ⊗ ¬A_P) = ∅. One way to compute ¬A_P is to use Safra's construction [21] to build a Streett automaton, and then convert this Streett automaton back into a Büchi automaton. Our objective is to introduce an on-the-fly emptiness check for transition-based Streett automata, in order to efficiently verify linear-time properties under strong fairness hypotheses, or simply to check the emptiness of A_M ⊗ ¬A_P without the cost of converting ¬A_P into a Büchi automaton. Existing emptiness checks for Streett automata [20, 15] share the same asymptotic complexity, but are state-based and will not work on-the-fly.
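As an illustration of such an interface, here is a minimal Python sketch (our own naming, not Spot's actual C++ API) of an automaton explored lazily through two functions, together with a product that is computed only on demand.

```python
from typing import Hashable, Iterable, Tuple

State = Hashable
# A transition: (source, letter, acceptance marks, destination).
Transition = Tuple[State, str, frozenset, State]

class OnTheFlyAutomaton:
    """The two-function interface: nothing is built ahead of time."""
    def initial_state(self) -> State:
        raise NotImplementedError
    def successors(self, s: State) -> Iterable[Transition]:
        raise NotImplementedError

class Product(OnTheFlyAutomaton):
    """Synchronized product computed lazily: a product state is just a pair
    of component states, and its successors are derived only on request."""
    def __init__(self, a: OnTheFlyAutomaton, b: OnTheFlyAutomaton):
        self.a, self.b = a, b
    def initial_state(self) -> State:
        return (self.a.initial_state(), self.b.initial_state())
    def successors(self, s: State) -> Iterable[Transition]:
        sa, sb = s
        for (_, la, fa, da) in self.a.successors(sa):
            for (_, lb, fb, db) in self.b.successors(sb):
                if la == lb:  # synchronize on identical letters
                    yield (s, la, fa | fb, (da, db))
```

An emptiness check driving such an interface explores only the product states it actually reaches, which is exactly what makes the three advantages listed above possible.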
Outline. Section 2 briefly reviews LTL and transition-based Büchi automata. Section 3 then introduces fairness hypotheses and Streett automata. We recall that weak fairness hypotheses are free and show that strong fairness hypotheses are less costly to express with Streett automata. Finally, Section 4 gives an on-the-fly algorithm to check the emptiness of a Streett automaton in a way that is only linearly slower (in the number of acceptance conditions) than the emptiness check of a Büchi automaton.
2 Background
2.1 Linear-time Temporal Logic (LTL)
An LTL formula is constructed from a set AP of atomic propositions, the usual boolean operators (¬, ∨, ∧, →) and some temporal operators: X (next), U (until), F (eventually), G (globally). An LTL formula can express a property on the execution of the system to be checked. Because we focus on fairness properties we shall not be concerned with the full semantics of LTL [1, 18]; it is enough to describe the following two idioms:
– G F p means that property p is true infinitely often (i.e., at any instant of the execution you can always find a later instant so that p is true),
– F G p means that property p is eventually true continuously (i.e., at some instant in the future p will stay true for the remainder of the execution).
The size |ϕ| of an LTL formula ϕ is its number of operators plus atomic propositions.
2.2 Büchi Automata
Definition 1. (TGBA) A Transition-based Generalized Büchi Automaton [12] over Σ is a Büchi automaton with labels and generalized acceptance conditions on transitions. It can be defined as a tuple A = ⟨Σ, Q, q⁰, δ, F⟩ where Σ is an alphabet, Q is a finite set of states, q⁰ ∈ Q is a distinguished initial state, δ ⊆ Q × Σ × Q is the (nondeterministic) transition relation, and F ⊆ 2^δ is a set of sets of accepting transitions.
Graphically we represent the elements of F (which we call acceptance conditions) as small circles on the transitions, as on Fig. 1a, 1b and 1d. We will also merge into a single transition all transitions between two states with identical acceptance conditions, as if the transition relation was actually in Q × 2^Σ × Q.¹ For the purpose of model checking we take AP equal to the set of all atomic propositions that can characterize a configuration, and we use these automata with Σ = 2^AP (i.e., each configuration of the system can be mapped to a letter of Σ). Graphically, with the aforementioned merging of transitions, it is therefore equivalent to label the transitions of the automata by propositional formulæ over AP.
An infinite word σ = σ(0)σ(1)··· over the alphabet Σ is accepted by A if there exists an infinite sequence ρ = (q0, l0, q1)(q1, l1, q2)... of transitions of δ, starting at q0 = q⁰, and such that ∀i ≥ 0, σ(i) = l_i, and ∀f ∈ F, ∀i ≥ 0, ∃j ≥ i, ρ(j) ∈ f. That is, each letter of the word is recognized, and ρ traverses each acceptance condition infinitely often.
Given two TGBAs A and B, the synchronous product of A and B, denoted A ⊗ B, is a TGBA that accepts only the words that are accepted by both A and B. If we denote by |A|_s the number of accessible states of A, we have |A ⊗ B|_s ≤ |A|_s × |B|_s. If we denote by |A|_t the number of transitions of A, we always have |A|_t ≤ |A|_s² × |Σ|. Also |A ⊗ B|_t ≤ (|A|_s × |B|_s)² × |Σ| ≤ |A|_t × |B|_t. Finally, if a TGBA C has only one state and is deterministic, then |A ⊗ C|_s ≤ |A|_s and |A ⊗ C|_t ≤ |A|_t.
¹ This optimization is pretty common in implementations; we only use it to simplify figures.
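For instance, the acceptance of an ultimately periodic word can be checked directly on its lasso: the hypothetical helper below (ours, not from the paper) only has to verify that the set of transitions repeated on the cycle meets every acceptance condition.

```python
def lasso_accepted(cycle_transitions, F):
    """cycle_transitions: the transitions traversed infinitely often on the
    lasso's cycle; F: the acceptance conditions, each a set of transitions.
    Generalized transition-based acceptance: every f in F must be visited
    infinitely often, i.e., must intersect the repeated transitions."""
    repeated = set(cycle_transitions)
    return all(repeated & set(f) for f in F)
```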
3 Coping with Fairness Hypotheses
Fairness hypotheses are a way to filter out certain behaviors of the model that are deemed irrelevant. For instance when modeling a communication between two processes over a lossy channel, we might want to assume that any message will eventually reach its destination after a finite number of retransmissions. Although there is one behavior of the model in which the retransmitted message is always lost, we may want to ignore this possibility during verification.
3.1 Weak and Strong Fairness
Let us give a definition of fairness involving a pair of events en and oc in a model M. An event could be the progress of some process, the execution of a particular instruction of the model, or even the fact that an instruction is enabled (i.e., could be executed).
Definition 2. (Unconditional fairness) An event oc is unconditionally fair if it will happen infinitely often, i.e., if M |= G F oc.
Definition 3. (Weak fairness) A pair of events (en, oc) is weakly fair if whenever en occurs continuously, then oc will occur infinitely often: M |= (F G en → G F oc).
Because we have F G en → G F oc ≡ G F((¬en) ∨ oc), weak fairness can be handled like unconditional fairness. Fig. 1a shows an example of a 1-state TGBA recognizing G F((¬en) ∨ oc). This TGBA is deterministic: for any configuration given by a set of truth values of en and oc, there is only one transition that can be followed.
[Figure 1 (drawings omitted): (a) a TGBA equivalent to the LTL formula G F((¬en) ∨ oc); (b),(d) two TGBAs equivalent to ϕ = (G F en → G F oc); (c),(e) two TSAs equivalent to ϕ; ⊤ denotes the true value. Transitions are labelled by propositional formulæ over {en, oc}.]
In fact, any formula of the form ⋀_{i=1}^{n} (F G en_i → G F oc_i), representing a combination of n weak (or unconditional) fairness hypotheses, can be translated into a 1-state deterministic TGBA with 2^n transitions. Note that this “1-state determinism” property holds because we are considering both generalized automata and transition-based acceptance conditions; it would not hold for state-based acceptance conditions.
Definition 4. (Strong fairness) A pair of events (en, oc) is strongly fair if whenever en occurs infinitely often, then oc will occur infinitely often: M |= (G F en → G F oc).
Fig. 1b and 1d show two TGBAs corresponding to the formula G F en → G F oc. The first, bigger automaton is produced by LTL-to-Büchi translation algorithms such as those of Couvreur [3] or Tauriainen [26]. The smaller one is a TGBA adaptation of an automaton shown by Kesten et al. [14]; we do not know of any general LTL-to-Büchi translation algorithm that would produce this automaton. Attempts to construct automata for conjunctions of n strong fairness hypotheses, i.e. formulæ of the form ⋀_{i=1}^{n} (G F en_i → G F oc_i), will lead to a nondeterministic automaton that has either 3^n + 1 or 3^n states depending on whether we base the construction on Fig. 1b or 1d. These automata have 2^O(n) transitions.
3.2 Fairness in the Automata Theoretic Approach
Given a model M and an LTL formula ϕ, we can check whether M |= ϕ by checking whether the automaton A_M ⊗ A_¬ϕ accepts any infinite word (such a word would be a counterexample). Because |A_¬ϕ|_t = 2^O(|ϕ|), we have |A_M ⊗ A_¬ϕ|_t ≤ |A_M|_t × 2^O(|ϕ|). Checking the emptiness of a TGBA can be done in linear time with respect to its size, regardless of the number of acceptance conditions [4], so the whole verification process requires O(|A_M|_t × 2^O(|ϕ|)) time.
Verifying ϕ under some fairness hypothesis represented as an LTL formula ψ amounts to checking whether M |= (ψ → ϕ), i.e., ϕ should hold only for the runs where ψ also holds. We can see that A_M ⊗ A_¬(ψ→ϕ) = A_M ⊗ A_ψ∧¬ϕ = A_M ⊗ A_ψ ⊗ A_¬ϕ. In other words, a fairness hypothesis can be represented by just an extra synchronized product before doing the emptiness check.
Weak fairness. We have seen that n weak fairness hypotheses can be represented by a 1-state deterministic TGBA. This means that the operation A_M ⊗ A_ψ is basically free: it will not add new states to those of A_M. In practice each transition of A_M would be labelled during its on-the-fly construction with the acceptance conditions of A_ψ. Model checking under n weak fairness hypotheses is therefore independent of n,² and requires O(|A_M|_t × 2^O(|ϕ|)) time.
² This is because we assume that we are using a generalized emptiness check [4].
Strong fairness. Model checking under n strong fairness hypotheses is costly with Büchi automata: we have seen that these n hypotheses can be represented by a TGBA with 2^O(n) transitions; the verification therefore requires O(|A_M|_t × 2^O(|ϕ|+n)) time.
3.3 Streett Automata
Definition 5. (TSA) A Transition-based Streett Automaton is a kind of TGBA in which acceptance conditions are paired. It can also be defined as a tuple
A = ⟨Σ, Q, q⁰, δ, F⟩ where F = {(l1, u1), (l2, u2), . . . , (lr, ur)} is a set of pairs of acceptance conditions, with l_i ⊆ δ and u_i ⊆ δ.
The difference between a TSA and a TGBA lies in the interpretation of F. An infinite word σ over the alphabet Σ is accepted by A if there exists an infinite sequence ρ = (q0, l0, q1)(q1, l1, q2)... of transitions of δ, starting at q0 = q⁰, and such that ∀i ≥ 0, σ(i) = l_i, and ∀(l, u) ∈ F, (∀i ≥ 0, ∃j ≥ i, ρ(j) ∈ l) ⟹ (∀i ≥ 0, ∃j ≥ i, ρ(j) ∈ u). That is, each letter of the word is recognized, and for each pair (l, u) of acceptance conditions, if ρ encounters l infinitely often, then it encounters u infinitely often. Given two TSAs A and B, it is also possible to define their synchronous product A ⊗ B such that |A ⊗ B| = O(|A| × |B|) and L(A ⊗ B) = L(A) ∩ L(B).
Büchi and Streett automata are known to be expressively equivalent [21]. Obviously a TGBA with acceptance conditions F = {u1, u2, . . . , un} can be translated into an equivalent TSA without changing its structure: we simply use the acceptance conditions F′ = {(δ, u1), . . . , (δ, un)}. For instance Fig. 1e shows the TSA resulting from this rewriting applied to the TGBA of Fig. 1d. The converse operation, translating Streett automata to Büchi automata, induces an exponential blowup of the number of states [22]. For instance, Löding [16] shows how to translate a state-based Streett automaton of |Q| states and n pairs of acceptance conditions into a state-based Büchi automaton with |Q| × (4^n − 3^n + 2) states (and 1 acceptance condition). The following construction shows how to translate a TSA of |Q| states and n pairs of acceptance conditions into a TGBA of |Q| × (2^n + 1) states and n acceptance conditions. (The same construction could be achieved for state-based automata: here the gain is only due to the use of generalized acceptance conditions.)
Given a TSA A = ⟨Σ, Q, q⁰, δ, F⟩ with F = {(l1, u1), (l2, u2), . . . , (ln, un)}, let N = {1, 2, . . . , n}, and for any (S, t) ∈ 2^N × δ let pending(S, t) = (S ∪ {i ∈ N | t ∈ l_i}) \ {i ∈ N | t ∈ u_i}. Now define the TGBA A′ = ⟨Σ, Q′, q⁰, δ′, F′⟩ where Q′ = Q ∪ (Q × 2^N), δ′ = δ ∪ {(s, g, (d, ∅)) | (s, g, d) ∈ δ} ∪ {((s, S), g, (d, pending(S, (s, g, d)))) | (s, g, d) ∈ δ, S ∈ 2^N}, and F′ = {f_i | i ∈ N} with f_i = {((s, S), l, (d, D)) ∈ δ′ | i ∈ N \ S}. Then L(A) = L(A′).
The justification behind this construction is that any run accepted by a Streett automaton can be split in two parts: a finite prefix, where any transition can occur, followed by an infinite suffix where it is guaranteed that any transition in l_i will eventually be followed by a transition in u_i. The original TSA is therefore cloned 2^n + 1 times to construct the corresponding TGBA. The first clone, using Q and δ, is where the prefix is read. From there the automaton can non-deterministically switch to the clone that is using states in Q × {∅}. From then on the automaton has to remember which u_i it has to expect: this is the purpose of the extra set added to the state. An automaton in state (s, S) that follows a transition in l_i will therefore reach state (s, S ∪ {i}), and conversely, following a transition in u_i will reach state (s, S \ {i}). The function pending(S, t) defined above computes those pending u_i's. The acceptance conditions are defined to complement the set of pending u_i's, to be sure they are eventually fulfilled.
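The pending function of this construction transcribes directly into code; the following sketch (Python used for illustration, names ours) assumes the pairs are given as two dictionaries mapping each index i to the transition sets l_i and u_i.

```python
def pending(S, t, l_sets, u_sets):
    """Compute (S union {i | t in l_i}) minus {i | t in u_i}: firing
    transition t adds a pending obligation u_i for every l_i it belongs to,
    and discharges the obligation for every u_i it belongs to."""
    added = {i for i, l in l_sets.items() if t in l}
    discharged = {i for i, u in u_sets.items() if t in u}
    return frozenset((S | added) - discharged)
```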
3.4 Strong Fairness with Streett Automata
The TSA of Fig. 1e is, however, not the most compact way to translate a strong fairness formula: Fig. 1c shows how it can be done with a 1-state deterministic TSA.
Actually any LTL formula ⋀_{i=1}^{n} (G F en_i → G F oc_i) representing n strong fairness hypotheses can be translated into a 1-state deterministic TSA with n pairs of acceptance conditions and 4^n transitions. It is the TSA A = ⟨2^AP, {q}, q, δ, F⟩ where AP = {oc_1, oc_2, . . . , oc_n, en_1, en_2, . . . , en_n}, δ = {(q, E, q) | E ∈ 2^AP}, and F = {(l1, u1), (l2, u2), . . . , (ln, un)} with l_i = {(q, E, q) ∈ δ | en_i ∈ E} and u_i = {(q, E, q) ∈ δ | oc_i ∈ E}. Again this “1-state determinism” would not hold for state-based Streett acceptance conditions.
Combining this automaton with the construction of Section 3.3, we can represent n strong fairness hypotheses using a TGBA of 2^n + 1 states (and 4^n(2^n + 1) transitions). This is better than the TGBA of 3^n states presented in Section 3.1, but the complexity of the verification would remain O(|A_M|_t × 2^O(|ϕ|+n)) time.
As when model checking under weak fairness hypotheses, the Streett acceptance conditions representing strong fairness hypotheses can be injected into the automaton A_M during its on-the-fly generation: any transition of A_M labelled by E ∈ 2^AP receives the acceptance conditions α(E) = {l_i | en_i ∈ E} ∪ {u_i | oc_i ∈ E}. The verification under n strong fairness hypotheses then amounts to checking the emptiness of a TSA of size O(|A_M|_t × 2^O(|ϕ|)), with n pairs of acceptance conditions. We now show how to check the emptiness of this TSA in O(|A_M|_t × n × 2^O(|ϕ|)) time by adapting an algorithm by Couvreur [3, 4] that was originally designed for the emptiness check of TGBA.
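Under the encoding above (l_i collects the transitions whose letter contains en_i, and u_i those whose letter contains oc_i), the labelling function α is a one-liner. The sketch below is illustrative only, with marks encoded as ('l', i) and ('u', i) pairs of our own choosing.

```python
def alpha(E, n):
    """E: the set of atomic propositions true on a transition of A_M;
    n: the number of strong fairness pairs.  Returns the Streett marks
    the transition receives during the on-the-fly construction."""
    marks = set()
    for i in range(1, n + 1):
        if f"en{i}" in E:
            marks.add(("l", i))
        if f"oc{i}" in E:
            marks.add(("u", i))
    return marks
```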
4 Emptiness Check for Streett Automata
The behavior of the algorithm is illustrated in Fig. 2 on a TSA with 2 pairs of acceptance conditions: (l1, u1) and (l2, u2). We are looking for runs that visit u1 (resp. u2) infinitely often if they visit l1 (resp. l2) infinitely often. As its older brother (for TGBA [3, 4]), this algorithm performs a DFS to discover strongly connected components (SCCs). Each SCC is labelled with the set of acceptance conditions that can be found on its edges, and the search stops as soon as it finds an SCC whose label acc verifies (l1 ∈ acc → u1 ∈ acc) ∧ (l2 ∈ acc → u2 ∈ acc). Figures 2a–2f show the first steps until a terminal SCC (i.e., one with no outgoing transition) is found.
Let us denote by F = {(l1, u1), (l2, u2), . . . , (ln, un)} the set of acceptance conditions of the Streett automaton, and by acc ⊆ F the set of acceptance conditions of the terminal SCC encountered. When such a terminal SCC is found we can be in one of the three following cases.
1. Either the SCC is trivial (i.e., has no loops): it cannot be accepting and all its states can be ignored from now on.
2. Or the SCC is accepting: ∀i, l_i ∈ acc ⟹ u_i ∈ acc. In that case the algorithm terminates and reports the existence of an accepting run. It is better to check this condition any time a non-trivial SCC is formed, not only for terminal SCCs: this gives the algorithm more chances to terminate early.
3. Or ∃i, l_i ∈ acc ∧ u_i ∉ acc. In that case we cannot state whether the SCC is accepting or not. Maybe it contains an accepting run that does not use any transition of l_i. Fig. 2f is an instance of this case: the label acc of the terminal SCC contains some l_i but not the matching u_i, so the algorithm cannot conclude immediately.
[Figure 2 (drawings omitted). Running the emptiness check on a TSA with F = {(l1, u1), (l2, u2)}: (a) a DFS numbers states and stacks them as trivial SCCs; (b) backlinks cause SCCs to be merged; (c)–(e) the DFS continues; (f) this terminal SCC could hide an accepting run that visits the problematic l_i only finitely often; (g)–(h) we start another DFS, this time handling that l_i differently; (i) crossing a transition of the avoided condition sets a threshold (dashed vertical line); (j) merges are allowed above the threshold; (k) but disallowed across the threshold; (l) the right SCC is accepting; we stop.]
To solve the third case, the algorithm will revisit the whole SCC, but avoiding transitions t such that ∃i, t ∈ l_i ∧ u_i ∉ acc. Practically, we define the set avoid = {l_i ∈ acc | u_i ∉ acc} of the l_i that cannot be satisfied, all the states of the SCC are removed from the hash table of visited states, and the algorithm makes another DFS with the following changes:
– amongst the outgoing transitions of a state, those that carry an acceptance condition of avoid are visited last,
– crossing a transition labelled by an avoided acceptance condition sets up a threshold (denoted by a dashed vertical line on Fig. 2i),
– if a transition going out of an SCC goes back to another SCC in the search stack, then the two SCCs will be merged only if both SCCs are behind the last threshold
set. Fig. 2j shows one case where merging has been allowed, while Fig. 2k shows a forbidden attempt to merge two SCCs. This new visit will construct smaller SCCs instead of the original terminal SCC. The only way to merge these smaller SCCs would be to accept a cycle using a transition from an acceptance condition (of avoid) that cannot be satisfied. For each of these smaller SCCs we can then decide whether they are trivial, accepting, or whether they contain acceptance conditions (not already listed in avoid) that cannot be satisfied. In the latter case avoid is augmented and the process is repeated. This recursion cannot exceed |F| levels since we complete avoid at each step with at least one pair of F. Compared to the original emptiness check for TGBA, which visits each state and transition only once, this variant will in the worst case visit each state and transition |F| + 1 times. On a TSA A this algorithm therefore works in O(|A|_t × |F|) time.
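The avoid set itself is straightforward to compute from the label of the offending SCC; a one-line Python sketch (ours):

```python
def compute_avoid(acc, pairs):
    """acc: the acceptance conditions labelling the SCC; pairs: the Streett
    pairs F.  Returns the l_i present in acc whose matching u_i is absent."""
    return {l for (l, u) in pairs if l in acc and u not in acc}
```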
Input: A Streett automaton A = ⟨Σ, Q, q⁰, δ, F⟩
Output: ⊤ iff L(A) = ∅
 1  Data: SCC: stack of ⟨state ∈ Q, root ∈ ℕ, la ⊆ F, acc ⊆ F, rem ⊆ Q, succ ⊆ δ, fsucc ⊆ δ⟩
 2        H: map of Q → ℕ;  avoid: stack of ⟨root ∈ ℕ, acc ⊆ F⟩;  min: stack of ℕ
 3        max ← 0
 4  begin
 5      min.push(0)
 6      avoid.push(⟨1, ∅⟩)
 7      DFSpush(∅, q⁰)
 8      while ¬SCC.empty() do
 9          if SCC.top().succ = ∅ then
10              if SCC.top().fsucc ≠ ∅ then
11                  swap(SCC.top().succ, SCC.top().fsucc)
12                  min.push(max)
13              else
14                  DFSpop()
15          else
16              pick one ⟨s, e, d⟩ off SCC.top().succ
17              a ← {f ∈ F | (s, e, d) ∈ f}
18              if d ∉ H then
19                  DFSpush(a, d)
20              else if H[d] > min.top() then
21                  merge(a, H[d])
22                  acc ← SCC.top().acc
23                  if ∀⟨l, u⟩ ∈ F, (l ∈ acc) ⟹ (u ∈ acc) then
24                      return ⊥
25      return ⊤
    end

26  DFSpush(a ⊆ F, q ∈ Q)
27      max ← max + 1
28      H[q] ← max
29      SCC.push(⟨q, max, a, ∅, ∅, {⟨s, l, a′, d⟩ ∈ δ | s = q ∧ a′ ∩ avoid.top().acc = ∅},
30               {⟨s, l, a′, d⟩ ∈ δ | s = q ∧ a′ ∩ avoid.top().acc ≠ ∅}⟩)
31  end
32  DFSpop()
33      ⟨q, n, la, acc, rem, −, −⟩ ← SCC.pop()
34      max ← n − 1
35      if n ≤ min.top() then min.pop()
36      old_avoid ← avoid.top().acc
37      if n = avoid.top().root then
38          avoid.pop()
39      new_avoid ← old_avoid ∪ {l | ⟨l, u⟩ ∈ F, l ∩ acc ≠ ∅, u ∩ acc = ∅}
40      if new_avoid ≠ old_avoid then
41          foreach s ∈ rem do
42              delete H[s]
43          avoid.push(⟨n, new_avoid⟩)
44          DFSpush(la, q)
45      else
46          foreach s ∈ rem do
47              H[s] ← 0
48  end
49  merge(a ⊆ F, t ∈ ℕ)
50      r ← ∅
51      s ← ∅
52      f ← ∅
53      while t < SCC.top().root do
54          a ← a ∪ SCC.top().acc ∪ SCC.top().la
55          r ← r ∪ SCC.top().rem ∪ {SCC.top().state}
56          s ← s ∪ SCC.top().succ
57          f ← f ∪ SCC.top().fsucc
58          SCC.pop()
59      SCC.top().acc ← SCC.top().acc ∪ a
60      SCC.top().rem ← SCC.top().rem ∪ r
61      SCC.top().succ ← SCC.top().succ ∪ s
62      SCC.top().fsucc ← SCC.top().fsucc ∪ f
63  end

Fig. 3. Emptiness check of a Streett automaton
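For comparison, the following Python sketch implements the classic, non-on-the-fly SCC-based Streett emptiness check in the spirit of [17, 9, 20, 15]: compute the SCCs, accept if some nontrivial SCC satisfies every pair, and otherwise re-analyse the SCC after pruning the transitions of its unsatisfiable l_i. It deliberately omits the succ/fsucc ordering, the min thresholds, and the avoid stack that make Fig. 3 on-the-fly; it is an illustration under our own encoding, not Spot's implementation.

```python
def sccs(states, trans):
    """Tarjan's algorithm; trans is a list of (src, dst, idx) triples.
    Returns the SCCs as lists of states.  (Recursive: fine for a sketch.)"""
    adj = {s: [] for s in states}
    for (src, dst, _) in trans:
        adj[src].append(dst)
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def dfs(v):
        index[v] = low[v] = len(index)
        stack.append(v); on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                dfs(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            result.append(comp)

    for s in states:
        if s not in index:
            dfs(s)
    return result

def streett_nonempty(states, trans, pairs):
    """states: all reachable states; trans: list of (src, dst, idx);
    pairs: list of (L, U) where L, U are sets of transition indices.
    Returns True iff some reachable cycle satisfies every Streett pair."""
    def check(sub_states, sub_trans):
        for comp in sccs(sub_states, sub_trans):
            cset = set(comp)
            inside = [t for t in sub_trans if t[0] in cset and t[1] in cset]
            if not inside:            # trivial SCC: no cycle here
                continue
            idx = {i for (_, _, i) in inside}
            bad = [k for k, (L, U) in enumerate(pairs)
                   if (idx & L) and not (idx & U)]
            if not bad:               # every L present has its U: accept
                return True
            # prune transitions of the unsatisfiable L-sets and re-analyse;
            # at least one transition disappears, so this terminates
            avoid = set().union(*(pairs[k][0] for k in bad))
            pruned = [t for t in inside if t[2] not in avoid]
            if check(cset, pruned):
                return True
        return False
    return check(states, trans)
```

An accepting cycle cannot use a transition of an unsatisfiable L-set (its U-set is absent from the whole SCC), so pruning never loses a counterexample; this is the same observation that justifies the avoid mechanism of Fig. 3.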
Relation to other algorithms. The basic idea of using strongly connected components to check strong fairness is old [17, 9], and has been declined in a few algorithms to check the emptiness of (state-based) Streett automata [20, 15]. But these algorithms modify the graph before visiting it again, hindering on-the-fly computations. At a high level, our algorithm is close to the one presented by Latvala and Heljanko [15], who suggest using any algorithm to compute SCCs. However we have more than implementation-detail differences. Our algorithm is targeted to transition-based acceptance conditions, actually shows how to make the emptiness check on-the-fly, and uses two tricks that are dependent on the algorithm used to compute SCCs. As mentioned in the introduction, there exist two similar algorithms to compute SCCs on-the-fly: Tarjan's [25] and Dijkstra's [5, 6]. The latter is less known, but better suited to model checking (it has less overhead and can abort earlier). Our trick of using a threshold to prevent SCC merges could work with either algorithm, but for the emptiness check to be correct we also need to perform the DFS in terms of SCCs instead of working in terms of states. This ordering is possible with Dijkstra's algorithm, but not Tarjan's.
Implementation. Fig. 3 presents the algorithm. Its structure mimics that of the emptiness check for TGBA of Couvreur et al. [4]; in particular it profits from the idea of performing the DFS in terms of SCCs rather than states: the stack SCC serves both as a stack of connected components and as the DFS stack. The constituents of each entry are state (the root state of the SCC), root (its DFS number), la (the acceptance conditions of the incoming transition to state), acc (the acceptance conditions of the cycles inside the SCC), rem (the other states of the SCC), and succ and fsucc (the unexplored successors of the SCC). These unexplored successors are split into succ and fsucc to ensure a proper ordering with respect to avoided acceptance conditions. When a state is pushed onto SCC at line 29, fsucc is loaded with all transitions in acceptance conditions that must be avoided, while succ receives the others. The latter will be visited first: the algorithm always picks the next successor to visit from succ (line 16) and will swap fsucc and succ once succ is empty (lines 9–11).
Thresholds, meant to prevent merging SCCs through a cycle that would use an unsatisfiable acceptance condition, are represented by the number (in DFS order) of the last state of the SCC from which the threshold transition goes out (2 in the example of Fig. 2). These numbers form the min stack; they are used at line 20 before deciding whether to merge; they are pushed when fsucc and succ are swapped (line 12), and are popped when the state of that number is removed (line 35). The acceptance conditions to avoid are pushed on top of a stack called avoid, which is completed anytime the algorithm needs to revisit an SCC (line 43). Each element of this stack is a pair ⟨ar, acc⟩ where ar is the number of the first state of the SCC starting from which the acceptance conditions in acc should be avoided. This stack is popped when the SCC rooted at ar has been visited and has to be removed (lines 37–38).
Correctness. Termination is guaranteed by the DFS and the fact that the number of avoided acceptance conditions cannot exceed |F|. For lack of space, we only give the scheme of our proof that this algorithm returns ⊥ if an accepting run exists in the input TSA, and returns ⊤ otherwise.
(A complete proof is available in French [7].)
Let us use the following notations to describe the state of the algorithm:

SCC = ⟨state_0, root_0, la_0, acc_0, rem_0, succ_0, fsucc_0⟩ ⟨state_1, root_1, la_1, acc_1, rem_1, succ_1, fsucc_1⟩ . . . ⟨state_n, root_n, la_n, acc_n, rem_n, succ_n, fsucc_n⟩
min = min_0 min_1 . . . min_p
avoid = ⟨ar_0, acc_0⟩ ⟨ar_1, acc_1⟩ . . . ⟨ar_r, acc_r⟩

Furthermore, let us denote by S_i the set of states represented by SCC[i], and by ϕ(x) the index of the SCC containing the state numbered x:

S_i = {s ∈ Q | root_i ≤ H[s] < root_{i+1}}   for 0 ≤ i < n
S_n = {s ∈ Q | root_n ≤ H[s]}
ϕ(x) = max{i | root_i ≤ x}

Lemma 1. At any time between the execution of lines 8–23, for any pair ⟨ar_i, acc_i⟩ on the avoid stack, there exists a unique entry ⟨state_j, root_j, la_j, acc_j, rem_j, succ_j, fsucc_j⟩ on the SCC stack such that ar_i = root_j. In other words, the avoid entries are always associated to roots of SCCs.
Lemma 2. When line 16 is run to pick a state amongst the successors of the top of SCC, the value of acc_r is the same as when this set of successors was created at line 29.
Lemma 3. The values of (root_i)_{i∈[0,n]} are strictly increasing, and we have root_n ≤ max at all times between the execution of lines 8–23.
Lemma 4. Let us call n′ the value of n at a moment right after lines 11–12 have been run. The sets succ_{n′} and fsucc_{n′} will never increase.
Lemma 5. The function g that to any i ∈ {0, ..., p} associates g(i) = ϕ(min_i) is injective. In other words, two states numbered min_{i1} and min_{i2} (with i1 ≠ i2) cannot belong to the same SCC. Furthermore, if n > min_p, then root_{ϕ(min_p)+1} = min_p + 1. In other words, min_p is the number of the last state of the SCC whose root has the number root_{ϕ(min_p)}. Finally, root_{ϕ(min_p)} ≤ min_p ≤ max.
The state set Q of the TSA to check can be partitioned into three sets:
– active states are those which appear in H associated to a non-null value,
– removed states are those which appear in H with a null value,
– unexplored states are not yet in H.
The algorithm can move a state from the unexplored set to the active set, and from there it can move it either to the removed set or back to the unexplored set (lines 41–42). The following invariants are preserved by all the lines of the main function (lines 8–23). They need to be proved together as their proofs are interdependent.
Invariant 1. For all i ≤ n, the subgraph induced by the states of S_i is an SCC. Furthermore there exists a cycle in this SCC that visits all acceptance conditions of acc_i. Finally, S_0, S_1, . . . , S_n is a partition of the set of active states.
Invariant 2. ∀i < n, ∃s ∈ S_i, ∃s′ ∈ S_{i+1}, ∃p ∈ 2^Σ, {f ∈ F | (s, p, s′) ∈ f} = la_{i+1}. I.e., there exists a transition between the SCCs indexed by i and i + 1 that is in all the acceptance conditions of la_{i+1}.
Invariant 3. There are exactly max active states. No state of H is associated to a value greater than max. If two different states are associated to the same value in H, this value is 0. In particular, this means that for any value v between 1 and max, there exists a unique active state s such that H[s] = v.
Invariant 4. For all integers i ≤ n, the set rem_i holds all the states of S_i \ {state_i}.
Invariant 5. A removed state q cannot be part of an accepting run.
Invariant 6. There is no state accessible from state_n from which we could find an accepting cycle using a transition in an acceptance condition from acc_r.
Invariant 7. All transitions going from S_{ϕ(min_p)} to S_{ϕ(min_p)+1} are labelled by an acceptance condition of acc_r. (In particular, la_{ϕ(min_p)+1} ∩ acc_r ≠ ∅.)
Invariant 8. ∀j ≥ ϕ(min_p), acc_j ∩ acc_r = ∅, and ∀j > ϕ(min_p) + 1, la_j ∩ acc_r = ∅. In other words, the SCCs built after the last threshold, and the transitions between them, are not in acceptance conditions from acc_r, except for the first transition visited after the last threshold (in la_{ϕ(min_p)+1}).
The first two invariants imply that if the algorithm finds an i such that ∀(l, u) ∈ F, (l ∈ acc_i) ⟹ (u ∈ acc_i), then SCC[i] is an accepting SCC (inv. 1) that is accessible (inv. 1 & 2), so the algorithm can terminate with ⊥. Invariant 5 ensures that no accepting run exists once all states have been removed: the algorithm therefore terminates with ⊤.
5 Conclusion
We have introduced a new algorithm for the on-the-fly emptiness check of transition-based Streett automata (TSA) that generalizes the algorithm for transition-based Büchi automata of Couvreur [3]. This algorithm checks the emptiness of a TSA A with |A|_t transitions and |F| acceptance pairs in O(|A|_t × |F|) time. We have seen that this algorithm allows us to check a linear-time property on a model A_M under n strong fairness hypotheses in O(|A_M|_t × 2^O(|ϕ|) × n) time instead of the O(|A_M|_t × 2^O(|ϕ|+n)) we would have using Büchi automata. It should be noted that since Büchi automata can be seen as Streett automata without any structural change, this very same algorithm can also be used to check the emptiness of Büchi automata. In that case SCCs will never have to be revisited (the avoid stack stays empty) and the algorithm performs the same operations as the original algorithm for Büchi automata.
Using Streett automata could also be useful to translate some LTL properties that look like strong fairness properties. For instance Sebastiani et al. [24] give the following LTL formula as an example of a property that is hard to translate to Büchi automata (most of the tools blow up):
(G F p0 → G F p1) ∧ (G F p2 → G F p0) ∧ (G F p3 → G F p2) ∧ (G F p4 → G F p2) ∧ (G F p5 → G F p3) ∧ (G F p6 → G F(p5 ∨ p4)) ∧ (G F p7 → G F p6) ∧ (G F p1 → G F p7) → G F p8
Spot's LTL-to-Büchi translator [8] produces a TGBA with 1731 states. With a dedicated algorithm, Sebastiani et al. were able to produce a 1281-state Generalized Büchi automaton. However this formula has the form ψ → ϕ where ψ is a combination of 8 strong fairness hypotheses, and ¬ϕ can be expressed as a Büchi automaton with 2 states and one acceptance condition. The whole formula can therefore be expressed as a transition-based Streett automaton with two states and 9 pairs of acceptance conditions.³ This reduction should not be a surprise since Streett automata are exponentially more succinct than Büchi automata [23]; however, this example shows that it would be useful to have an efficient algorithm to translate LTL formulæ to Streett automata. Unfortunately we are not aware of any published work in this area.
References
[1] Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (2000)
[2] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press, Cambridge (2001)
[3] Couvreur, J.-M.: On-the-fly verification of temporal logic. In: Wing, J.M., Woodcock, J.C.P., Davies, J. (eds.) FM 1999. LNCS, vol. 1708, pp. 253–271. Springer, Heidelberg (1999)
[4] Couvreur, J.-M., Duret-Lutz, A., Poitrenaud, D.: On-the-fly emptiness checks for generalized Büchi automata. In: Godefroid, P. (ed.) SPIN 2005. LNCS, vol. 3639, pp. 169–184. Springer, Heidelberg (2005)
[5] Dijkstra, E.W.: EWD 376: Finding the maximum strong components in a directed graph (May 1973), http://www.cs.utexas.edu/users/EWD/ewd03xx/EWD376.PDF
[6] Dijkstra, E.W.: Finding the maximal strong components in a directed graph. In: A Discipline of Programming, ch. 25, pp. 192–200. Prentice-Hall, Englewood Cliffs (1976)
[7] Duret-Lutz, A.: Contributions à l'approche automate pour la vérification de propriétés de systèmes concurrents. PhD thesis, Université Pierre et Marie Curie (Paris 6) (July 2007)
[8] Duret-Lutz, A., Poitrenaud, D.: Spot: an extensible model checking library using transition-based generalized Büchi automata. In: Proc. MASCOTS 2004, Volendam, The Netherlands, October 2004, pp. 76–83. IEEE Computer Society, Los Alamitos (2004)
³ Combining this with the TSA-to-TGBA construction from Section 3.3 yields a TGBA of 2 × (2⁹ + 1) = 1026 states, which is even smaller than that of Sebastiani et al.
[9] Emerson, E.A., Lei, C.-L.: Modalities for model checking: Branching time logic strikes back. Science of Computer Programming 8(3), 275–306 (1987)
[10] Francez, N.: Fairness. Springer, Heidelberg (1986)
[11] Gastin, P., Oddoux, D.: Fast LTL to Büchi automata translation. In: Berry, G., Comon, H., Finkel, A. (eds.) CAV 2001. LNCS, vol. 2102, pp. 53–65. Springer, Heidelberg (2001)
[12] Giannakopoulou, D., Lerda, F.: From states to transitions: Improving translation of LTL formulæ to Büchi automata. In: Peled, D.A., Vardi, M.Y. (eds.) FORTE 2002. LNCS, vol. 2529, pp. 308–326. Springer, Heidelberg (2002)
[13] Godefroid, P., Holzmann, G., Pirottin, D.: State-space caching revisited. Formal Methods in System Design 7(3), 227–241 (1995)
[14] Kesten, Y., Pnueli, A., Vardi, M.Y.: Verification by augmented abstraction: The automata-theoretic view. Journal of Computer and System Sciences 62(4), 668–690 (2001)
[15] Latvala, T., Heljanko, K.: Coping with strong fairness. Fundamenta Informaticae 43(1–4), 1–19 (2000)
[16] Löding, C.: Methods for the transformation of ω-automata: Complexity and connection to second order logic. Diploma thesis, Institute of Computer Science and Applied Mathematics (1998)
[17] Lichtenstein, O., Pnueli, A.: Checking that finite state concurrent programs satisfy their linear specification. In: Proc. of the 12th ACM Symposium on Principles of Programming Languages (POPL 1985), pp. 97–107. ACM, New York (1985)
[18] Merz, S.: Model checking: A tutorial overview. In: Cassez, F., Jard, C., Rozoy, B., Dermot, M. (eds.) MOVEP 2000. LNCS, vol. 2067, pp. 3–38. Springer, Heidelberg (2001)
[19] Michel, M.: Algèbre de machines et logique temporelle. In: Fontet, M., Mehlhorn, K. (eds.) STACS 1984. LNCS, vol. 166, pp. 287–298. Springer, Heidelberg (1984)
[20] Rauch Henzinger, M., Telle, J.A.: Faster algorithms for the nonemptiness of Streett automata and for communication protocol pruning. In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 16–27. Springer, Heidelberg (1996)
[21] Safra, S.: Complexity of Automata on Infinite Objects. PhD thesis, The Weizmann Institute of Science, Rehovot, Israel (March 1989)
[22] Safra, S.: Exponential determinization for ω-automata with strong-fairness acceptance condition. In: Proc. STOC 1992. ACM, New York (1992)
[23] Safra, S., Vardi, M.Y.: On ω-automata and temporal logic (preliminary report). In: Proc. STOC 1989, pp. 127–137. ACM, New York (1989)
[24] Sebastiani, R., Tonetta, S., Vardi, M.Y.: Symbolic systems, explicit properties: on hybrid approaches for LTL symbolic model checking. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp. 350–363. Springer, Heidelberg (2005)
[25] Tarjan, R.: Depth-first search and linear graph algorithms. SIAM Journal on Computing 1(2), 146–160 (1972)
[26] Tauriainen, H.: Automata and Linear Temporal Logic: Translation with Transition-based Acceptance. PhD thesis, Helsinki University of Technology, Espoo, Finland (September 2006)
[27] Vardi, M.Y.: The Büchi complementation saga. In: Thomas, W., Weil, P. (eds.) STACS 2007. LNCS, vol. 4393, pp. 12–22. Springer, Heidelberg (2007)
[28] Vardi, M.Y.: Automata-theoretic model checking revisited (invited paper). In: Cook, B., Podelski, A. (eds.) VMCAI 2007. LNCS, vol. 4349, pp. 137–150. Springer, Heidelberg (2007)
[29] Vardi, M.Y.: An automata-theoretic approach to linear temporal logic. In: Moller, F., Birtwistle, G. (eds.) Logics for Concurrency. LNCS, vol. 1043, pp. 238–266. Springer, Heidelberg (1996)
On Minimal Odd Rankings for Büchi Complementation
Hrishikesh Karmarkar and Supratik Chakraborty
Department of Computer Science and Engineering
Indian Institute of Technology Bombay
Abstract. We study minimal odd rankings (as defined by Kupferman and Vardi [KV01]) for run-DAGs of words in the complement of a nondeterministic Büchi automaton. We present an optimized version of the ranking-based complementation construction of Friedgut, Kupferman and Vardi [FKV06] and of Schewe's [Sch09] variant of it, such that every accepting run of the complement automaton assigns a minimal odd ranking to the corresponding run-DAG. This allows us to determine minimally inessential ranks and redundant slices in ranking-based complementation constructions. We exploit this to reduce the size of the complement Büchi automaton by eliminating all redundant slices. We demonstrate the practical importance of this result through a set of experiments using the NuSMV model checker.
1 Introduction
The problem of complementing nondeterministic ω-word automata is fundamental in the theory of automata over infinite words. In addition to the theoretical aspects of the study of complementation techniques, efficient complementation techniques are extremely useful in practical applications as well. Vardi's excellent survey paper on the saga of Büchi complementation spanning more than 45 years provides a brief overview of various such applications of complementation techniques [Var07]. Various complementation constructions for nondeterministic Büchi automata on words (henceforth referred to as NBW) have been developed over the years, starting with Büchi [Büc62]. Büchi's algorithm resulted in a complement automaton with 2^(2^O(n)) states, starting from an NBW with n states. This upper bound was improved to 2^O(n^2) by Sistla et al. [SVW87]. Safra [Saf88] provided the first asymptotically optimal n^O(n) upper bound for complementation that passes through determinization. By a theorem of Michel [Mic88], it was known that Büchi complementation has an n! lower bound. With this, Löding [Löd99] showed that Safra's construction is asymptotically optimal for Büchi determinization, and hence for complementation. The O(n!) (approximately (0.36n)^n) lower bound for Büchi complementation was recently sharpened to (0.76n)^n by Yan [Yan08] using a full-automata technique. The complementation constructions of Klarlund [Kla91], Kupferman and Vardi [KV01] and Kähler and
Wilke [KW08] for nondeterministic Büchi automata are examples of determinization-free (or Safraless, as they are popularly called) complementation constructions for Büchi automata. The best known upper bound for the problem until recently was (0.97n)^n, given by Kupferman and Vardi. This was recently sharpened by Schewe [Sch09] to an almost tight upper bound of (0.76n)^n modulo a factor of n^2. NBW complementation techniques based on optimized versions of Safra's determinization construction (see, for example, Piterman's recent work [Pit07]) have been experimentally found to work well for automata of small sizes (typically 8–10 states) [TCT+08]. However, these techniques are complex and present fewer opportunities for optimized implementations. Ranking-based complementation constructions [KV01, FKV06, Sch09] are comparatively simpler and appear more amenable to optimizations, especially when dealing with larger automaton sizes. For example, several optimization techniques for ranking-based complementation constructions have been proposed recently [FKV06, GKSV03]. Similarly, language universality and containment checking techniques that use the framework of ranking-based complementation but avoid explicit construction of complement automata have been successfully applied to NBW with more than 100 states [DR09, FV09]. This leads us to believe that ranking-based complementation constructions hold much promise, and motivates our study of new optimization techniques for such constructions. The primary contributions of this paper can be summarized as follows: (i) We present an improvement to the ranking-based complementation constructions of [FKV06] and [Sch09] for NBW. All accepting runs of our automaton on a word in the complement language correspond to a minimal odd ranking of the run-DAG. (ii) We show how to reduce the size of the complement automaton by efficiently identifying and removing redundant slices without language containment checks. (iii) We present an implementation of the proposed technique using the BDD-based symbolic model checker NuSMV, and experimentally demonstrate the advantages of our technique on a set of examples.
2 Ranking-Based NBW Complementation
Let A = (Q, q_0, Σ, δ, F) be an NBW, where Q is a set of states, q_0 ∈ Q is an initial state, Σ is an alphabet, δ : Q × Σ → 2^Q is a transition function, and F ∈ 2^Q is a set of accepting states. An NBW accepts a set of ω-words, where an ω-word α is an infinite sequence α_0 α_1 . . ., and α_i ∈ Σ for all i ≥ 0. A run ρ of A on α is an infinite sequence of states given by ρ : N → Q, where ρ(0) = q_0 and ρ(i+1) ∈ δ(ρ(i), α_i) for all i ≥ 0. A run ρ of A on α is called accepting if inf(ρ) ∩ F ≠ ∅, where inf(ρ) is the set of states that appear infinitely often along ρ. The run ρ is called rejecting if inf(ρ) ∩ F = ∅. An ω-word α is accepted by A if A has an accepting run on it, and is rejected otherwise. The set of all words accepted by A is called the language of A, and is denoted L(A). The complementation problem for NBW is to construct an automaton A^c from a given NBW A such that L(A^c) = Σ^ω \ L(A). We will henceforth denote the
complement language Σ^ω \ L(A) by L̄(A). An NBW is said to be in initial normal form (INF) if (i) the initial state is non-accepting and has a transition back to itself on every letter in Σ, and (ii) no other state has a transition to the initial state. An NBW is said to be complete if every state has at least one outgoing transition on every letter in Σ. Every NBW can be transformed to INF and made complete without changing the accepted language by adding at most one non-accepting initial state and at most one non-accepting "sink" state. All NBW considered in the remainder of this paper are assumed to be complete and in INF. The (possibly infinite) set of all runs of an NBW A = (Q, q_0, Σ, δ, F) on a word α can be represented by a directed acyclic graph G_α = (V, E), where V is a subset of Q × N and E ⊆ V × V. The root vertex of the DAG is (q_0, 0). For all i > 0, vertex (q, i) ∈ V iff there is a run ρ of A on α such that ρ(i) = q. The set of edges of G_α is E ⊆ V × V, where ((q, i), (q′, j)) ∈ E iff both (q, i) and (q′, j) are in V, j = i+1 and q′ ∈ δ(q, α_i). Graph G_α is called the run-DAG of α in A. A vertex (q, l) ∈ V is called an F-vertex if q ∈ F, i.e., q is a final state of A. A vertex (q, l) is said to be F-free if there is no F-vertex that is reachable from (q, l) in G_α. Furthermore, (q, l) is called finite if only finitely many vertices are reachable from (q, l) in G_α. For every l ≥ 0, the set of vertices {(q, l) | (q, l) ∈ V} constitutes level l of G_α. An accepting path in G_α is an infinite path (q_0, 0), (q_{i_1}, 1), (q_{i_2}, 2) . . . such that q_0, q_{i_1}, . . . is an accepting run of A. The run-DAG G_α is called rejecting if there is no accepting path in G_α. Otherwise, G_α is said to be accepting.
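Level by level, the run-DAG is easy to build mechanically for an ultimately periodic input word. The following is a minimal sketch in Python (our own illustration; the explicit transition function and the tuple representation of the word u v^ω are assumptions of this snippet, not part of the paper):

```python
def run_dag_levels(q0, delta, u, v, n_levels):
    """Build the first n_levels levels of the run-DAG G_alpha defined above,
    for an NBW read on the ultimately periodic word alpha = u v^omega.
    delta(q, a) returns the set of a-successors of q; u and v are finite
    tuples of letters."""
    def letter(l):                     # alpha(l), the (l+1)-st letter
        return u[l] if l < len(u) else v[(l - len(u)) % len(v)]
    levels, edges = [{q0}], set()
    for l in range(n_levels - 1):
        nxt = set()
        for q in levels[l]:
            for q2 in delta(q, letter(l)):
                nxt.add(q2)
                edges.add(((q, l), (q2, l + 1)))
        levels.append(nxt)
    return levels, edges

# Example: a 2-state NBW over {a, b}, read on the word (ab)^omega.
delta = lambda q, a: {0, 1} if q == 0 else ({1} if a == "b" else set())
levels, edges = run_dag_levels(0, delta, (), ("a", "b"), 5)
# levels[l] is the set Q_l of states occurring at level l of G_alpha
```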
2.1 Ranking Functions and Complementation
Kupferman and Vardi [KV01] introduced the idea of assigning ranks to vertices of run-DAGs, and described a rank-based complementation construction for alternating Büchi automata. They also showed how this technique can be used to obtain a ranking-based complementation construction for NBW that is easier to understand and implement than the complementation construction based on Safra's determinization construction [Saf88]. In this section, we briefly overview ranking-based complementation constructions for NBW. Let [k] denote the set {1, 2, . . . , k}, and [k]^odd (respectively [k]^even) denote the set of all odd (respectively even) numbers in the set {1, 2, . . . , k}. Given an NBW A with n states and an ω-word α, let G_α = (V, E) be the run-DAG of α in A. A ranking r of G_α is a function r : V → [2n] that satisfies the following conditions: (i) for all vertices (q, l) ∈ V, if r((q, l)) is odd then q ∉ F, and (ii) for all edges ((q, l), (q′, l+1)) ∈ E, we have r((q′, l+1)) ≤ r((q, l)). A ranking associates with every vertex in G_α a rank in [2n] such that the ranks along every path in G_α are non-increasing, and vertices corresponding to final states always get even ranks. A ranking r is said to be odd if every infinite path in G_α eventually gets trapped in an odd rank. Otherwise, r is called an even ranking. We use max_odd(r) to denote the highest odd rank in the range of r. A level ranking for A is a function g : Q → [2n] ∪ {⊥} such that for every q ∈ Q, if g(q) ∈ [2n]^odd, then q ∉ F. Let L be the set of all level rankings for
A. Given two level rankings g, g′ ∈ L, a set S ⊆ Q and a letter σ, we say that g′ covers (g, S, σ) if for all q ∈ S and q′ ∈ δ(q, σ), if g(q) ≠ ⊥, then g′(q′) ≠ ⊥ and g′(q′) ≤ g(q). For a level ranking g, we abuse notation and let max_odd(g) denote the highest odd rank in the range of g. A ranking r of G_α induces a level ranking for every level l ≥ 0 of G_α. If Q_l = {q | (q, l) ∈ V} denotes the set of states in level l of G_α, then the level ranking g induced by r for level l is as follows: g(q) = r((q, l)) for all q ∈ Q_l and g(q) = ⊥ otherwise. It is easy to see that if g and g′ are level rankings induced for levels l and l+1 respectively, then g′ covers (g, Q_l, α_l), where α_l is the l-th letter in the input word α. A level ranking g is said to be tight if the following conditions hold: (i) the highest rank in the range of g is odd, and (ii) for all i ∈ [max_odd(g)]^odd, there is a state q ∈ Q with g(q) = i.

Lemma 1 ([KV01]). The following statements are equivalent: (P1) All paths of G_α see only finitely many F-vertices. (P2) There is an odd ranking for G_α.

Kupferman and Vardi [KV01] provided a constructive proof of (P1) ⇒ (P2) in the above lemma. Their construction is important for some of our subsequent discussions, hence we outline it briefly here. Given an NBW A with n states, an ω-word α ∈ L̄(A) and the run-DAG G_α of A on α, the proof in [KV01] inductively defines an infinite sequence of DAGs G_0 ⊇ G_1 ⊇ . . ., where (i) G_0 = G_α, (ii) G_{2i+1} = G_{2i} \ {(q, l) | (q, l) is finite in G_{2i}}, and (iii) G_{2i+2} = G_{2i+1} \ {(q, l) | (q, l) is F-free in G_{2i+1}}, for all i ≥ 0. An interesting consequence of this definition is that for all i ≥ 0, G_{2i+1} is either empty or has no finite vertices. It can be shown that if all paths in G_α see only finitely many F-vertices, then G_{2n−1} and all subsequent G_i's must be empty. A ranking r^KV_{A,α} of G_α can therefore be defined as follows: for every vertex (q, l) of G_α, r^KV_{A,α}((q, l)) = 2i if (q, l) is finite in G_{2i}, and r^KV_{A,α}((q, l)) = 2i+1 if (q, l) is F-free in G_{2i+1}. Kupferman and Vardi showed that r^KV_{A,α} is an odd ranking [KV01]. Throughout this paper, we will use r^KV_{A,α} to denote the odd ranking computed by the above technique due to Kupferman and Vardi (hence KV in the superscript) for NBW A and α ∈ L̄(A). When A and α are clear from the context, we will simply use r^KV for notational convenience. The NBW complementation construction and upper size bound presented in [KV01] was subsequently tightened in [FKV06], where the following important observation was made.

Lemma 2 ([FKV06]). Given a word α ∈ L̄(A), there exists an odd ranking r of G_α and a level l_lim ≥ 0, such that for all levels l > l_lim, the level ranking induced by r for l is tight.

Lemma 2 led to a reduced upper bound for the size of ranking-based complementation constructions, since all non-tight level rankings could now be ignored after reading a finite prefix of the input word. Schewe [Sch09] tightened
the construction and analysis further, resulting in a ranking-based complementation construction with an upper size bound that is within a factor of n^2 of the best known lower bound [Yan08]. Hence, Schewe's construction is currently the best known ranking-based construction for complementing NBW. Gurumurthy et al. [GKSV03] presented a collection of practically useful optimization techniques for keeping the size of complement automata constructed using ranking techniques under control. Their experiments demonstrated the effectiveness of their optimizations for NBW with an average size of 6 states. Interestingly, their work also highlighted the difficulty of complementing NBW with tens of states in practice. Doyen and Raskin [DR09] have recently proposed powerful anti-chain optimizations in ranking-based techniques for checking universality (L(A) = Σ^ω?) and language containment (L(A) ⊆ L(B)?) of NBW. Fogarty and Vardi [FV09] have evaluated Doyen and Raskin's technique and also Ramsey-based containment checking techniques in the context of proving size-change termination (SCT) of programs. Their results bear testimony to the effectiveness of Doyen and Raskin's anti-chain optimizations for ranking-based complementation in SCT problems, especially when the original NBW is known to have reverse-determinism [FV09]. Given an NBW A, let KVF(A) be the complement NBW constructed using the Friedgut, Kupferman and Vardi construction with tight level rankings [KV01, FKV06]. For notational convenience, we will henceforth refer to this construction as the KVF-construction. Similarly, let KVFS(A) be the complement automaton constructed using Schewe's variant [Sch09] of the KVF-construction. We will henceforth refer to this construction as the KVFS-construction. The work presented in this paper can be viewed as an optimized variant of Schewe's [Sch09] ranking-based complementation construction. The proposed method is distinct from other optimization techniques proposed in the literature (e.g., those in [GKSV03]), and adds to the repertoire of such techniques. We first show that for every NBW A and word α ∈ L̄(A), the ranking r^KV is minimal in the following sense: if r is any odd ranking of G_α, then every vertex (q, l) in G_α satisfies r^KV((q, l)) ≤ r((q, l)). We then describe how to restrict the transitions of the complement automaton obtained by the KVFS-construction, such that every accepting run on α assigns the same rank to all vertices in G_α as is assigned by r^KV. Thus, our construction ensures that acceptance of α happens only through minimal odd rankings. This allows us to partition the set of states of the complement automaton into slices such that for every word α ∈ L̄(A), all its accepting runs lie in exactly one slice. Redundant slices can then be identified as those that never contribute to accepting any word in L̄(A). Removal of such redundant slices results in a reduced state count, while preserving the language of the complement automaton. The largest k (> 0) such that there is a non-redundant slice that assigns rank k to some vertex in the run-DAG gives the rank of A, as defined by [GKSV03]. Notice that our sliced view of the complement automaton is distinct from the notion of slices as used in [KW08]. Gurumurthy et al. have shown [GKSV03] that for every NBW A, there exists an NBW B with L(B) = L(A), such that both the KVF- and KVFS-constructions
for B^c require at most 3 ranks. However, obtaining B from A is non-trivial, and requires an exponential blowup in the worst case [FKV06]. Therefore, ranking-based complementation constructions typically focus on reducing the state count of the complement automaton starting from a given NBW A, instead of first computing B and then constructing B^c. We follow the same approach in this paper. Thus, we do not seek to obtain an NBW with the minimum rank for the complement of a given ω-regular language. Instead, we wish to reduce the state count of Kupferman and Vardi's rank-based complementation construction, starting from a given NBW A.
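Tightness of level rankings (defined in Section 2.1) is easy to check mechanically. A minimal sketch in Python, with level rankings represented as dictionaries (this representation is our own illustration, not the paper's):

```python
BOT = None   # stands for ⊥: the state does not occur at the current level

def is_tight(g):
    """Check whether a level ranking g (a dict from states to ranks, BOT for
    absent states) is tight: the highest rank in its range is odd, and every
    odd rank below it also occurs."""
    ranks = [v for v in g.values() if v is not BOT]
    if not ranks:
        return False
    m = max(ranks)
    return m % 2 == 1 and all(i in ranks for i in range(1, m + 1, 2))

assert is_tight({'q0': 3, 'q1': 1, 'q2': 2})
assert not is_tight({'q0': 3, 'q1': 2})      # odd rank 1 is missing
```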
3 Minimal Odd Rankings
Given an NBW A and an ω-word α ∈ L̄(A), an odd ranking r of G_α is said to be minimal if for every odd ranking r′ of G_α, we have r′((q, l)) ≥ r((q, l)) for all vertices (q, l) in G_α.

Theorem 1. For every NBW A and ω-word α ∈ L̄(A), the ranking r^KV_{A,α} is minimal.
Proof. Let α be an ω-word in L̄(A). Since A and α are clear from the context, we will use the simpler notation r^KV to denote the ranking computed by Kupferman and Vardi's method. Let r be any (other) odd ranking of G_α, and let V_{r,i} denote the set of vertices in G_α that are assigned the rank i by r. Since A is assumed to be a complete automaton, there are no finite vertices in G_α. Hence, by Kupferman and Vardi's construction, V_{r^KV,0} = ∅. Note that this is consistent with our requirement that all ranking functions have range [2n] = {1, . . . , 2n}. We will prove the theorem by showing that V_{r,i} ⊆ ⋃_{k=1}^{i} V_{r^KV,k} for all i > 0. The proof proceeds by induction on i, and by following the construction of DAGs G_0, G_1, . . . in Kupferman and Vardi's proof of Lemma 1.

Base case: Consider G_1 = G_0 \ {(q′, l′) | (q′, l′) is finite in G_0} = G_α \ ∅ = G_α. Let (q_1, l_1) be a vertex in G_1 such that r((q_1, l_1)) = 1. Let (q_f, l_f) be an F-vertex reachable from (q_1, l_1) in G_α, if possible. By virtue of the requirements that F-vertices must get even ranks, and ranks cannot increase along any path, r((q_f, l_f)) must be < 1. However, this is impossible given that the range of r must be [2n]. Therefore, no F-vertex can be reachable from (q_1, l_1). In other words, (q_1, l_1) is F-free in G_1. Hence, by definition, we have r^KV((q_1, l_1)) = 1. Thus, V_{r,1} ⊆ V_{r^KV,1}.

Hypothesis: Assume that V_{r,j} ⊆ ⋃_{k=1}^{j} V_{r^KV,k} for all 1 ≤ j ≤ i.

Induction: By definition, G_{i+1} = G_α \ ⋃_{s=1}^{i} V_{r^KV,s}. Let (q_{i+1}, l_{i+1}) be a vertex in G_{i+1} such that r((q_{i+1}, l_{i+1})) = i+1. Since r is an odd ranking, all paths starting from (q_{i+1}, l_{i+1}) must eventually get trapped in some odd rank ≤ i+1 (assigned by r). We consider two cases.

– Suppose there are no infinite paths starting from (q_{i+1}, l_{i+1}) in G_{i+1}. This implies (q_{i+1}, l_{i+1}) is a finite vertex in G_{i+1}. We have seen earlier that for
all k ≥ 0, G_{2k+1} must be either empty or have no finite vertices. Therefore, i+1 must be even, and r^KV((q_{i+1}, l_{i+1})) = i+1 by Kupferman and Vardi's construction.

– There exists a non-empty set of infinite paths starting from (q_{i+1}, l_{i+1}) in G_{i+1}. Since G_{i+1} = G_α \ ⋃_{s=1}^{i} V_{r^KV,s}, none of these paths reach any vertex in ⋃_{s=1}^{i} V_{r^KV,s}. Since V_{r,j} ⊆ ⋃_{k∈{1,2,...,j}} V_{r^KV,k} for all 1 ≤ j ≤ i, and since ranks cannot increase along any path, it follows that r must assign i+1 to all vertices along each of the above paths. This, coupled with the fact that r is an odd ranking, implies that i+1 is odd. Since F-vertices must be assigned even ranks by r, it follows from above that (q_{i+1}, l_{i+1}) is F-free in G_α. Therefore, r^KV((q_{i+1}, l_{i+1})) = i+1 by Kupferman and Vardi's construction.

We have thus shown that all vertices in G_{i+1} that are assigned rank i+1 by r must also be assigned rank i+1 by r^KV. Therefore, V_{r,i+1} \ ⋃_{j=1}^{i} V_{r^KV,j} ⊆ V_{r^KV,i+1}. In other words, V_{r,i+1} ⊆ ⋃_{j=1}^{i+1} V_{r^KV,j}. By the principle of mathematical induction, it follows that V_{r,i} ⊆ ⋃_{k=1}^{i} V_{r^KV,k} for all i > 0. Thus, if a vertex is assigned rank i by r, it must be assigned a rank ≤ i by r^KV. Hence r^KV is minimal.

Lemma 3. For every α ∈ L̄(A), the run-DAG G_α ranked by r^KV_{A,α} (or simply r^KV) satisfies the following properties.
1. For every vertex (q, l) that is not an F-vertex such that r^KV((q, l)) = k, there must be at least one immediate successor (q′, l+1) such that r^KV((q′, l+1)) = k.
2. For every vertex (q, l) that is an F-vertex such that r^KV((q, l)) = k, there must be at least one immediate successor (q′, l+1) such that r^KV((q′, l+1)) = k or r^KV((q′, l+1)) = k−1.
3. For every vertex (q, l) such that r^KV((q, l)) = k, where k is odd and > 1, there is a vertex (q′, l′) for l′ > l such that (q′, l′) is an F-vertex reachable from (q, l) and r^KV((q′, l′)) = k−1.
4. For every vertex (q, l) such that r^KV((q, l)) = k, where k is even and > 0, every path starting at (q, l) eventually visits a vertex with rank less than k. Furthermore, there is a vertex (q′, l′) for l′ > l such that (q′, l′) is reachable from (q, l) and r^KV((q′, l′)) = k−1.

We omit the proof due to lack of space. The reader is referred to [KC09] for details of the proof.
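The peeling G_0 ⊇ G_1 ⊇ . . . behind r^KV can be prototyped directly. Below is a minimal sketch in Python, under the assumption (ours, purely for illustration) that the infinite run-DAG has been folded into a finite graph in which "finite vertex" means that no cycle is reachable and "F-free" means that no F-vertex is reachable:

```python
def kv_ranks(vertices, edges, is_f):
    """Peel G_0 ⊇ G_1 ⊇ ... as in Kupferman and Vardi's construction:
    G_{2i+1} drops the finite vertices of G_{2i} (rank 2i) and G_{2i+2}
    drops the F-free vertices of G_{2i+1} (rank 2i+1)."""
    def succs(x, live):
        return [w for (y, w) in edges if y == x and w in live]
    def reach(starts, live):
        seen, stack = set(), list(starts)
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(succs(x, live))
        return seen
    def finite(v, live):   # no reachable vertex lies on a cycle
        return all(u not in reach(succs(u, live), live)
                   for u in reach([v], live))
    def f_free(v, live):
        return not any(is_f(u) for u in reach([v], live))
    rank, live, i = {}, set(vertices), 0
    while live:
        removed = False
        for r, pred in ((2 * i, finite), (2 * i + 1, f_free)):
            dead = {v for v in live if pred(v, live)}
            removed |= bool(dead)
            for v in dead:
                rank[v] = r
            live -= dead
        if not removed:    # some path sees F-vertices infinitely often:
            break          # the word is accepted, no odd ranking exists
        i += 1
    return rank

# Tiny example: x -> y -> y with no F-vertices; both get rank 1 (F-free).
print(kv_ranks({"x", "y"}, {("x", "y"), ("y", "y")}, is_f=lambda v: False))
```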
4 A Motivating Example
We have seen above that the KVFS-construction leads to almost tight worst-case bounds for NBW complementation. This is a significant achievement considering the long history of Büchi complementation [Var07]. However, the KVFS-construction does not necessarily allow us to construct small complement automata for every NBW. Specifically, there exists a family of NBW
A = {A_3, A_5, . . .} such that for every i ∈ {3, 5, . . .}: (i) A_i has i states, (ii) each of the ranking-based complementation constructions in [KV01], [FKV06] and [Sch09] produces a complement automaton with at least i^{(i−1)/2}/e^i states, and (iii) a ranking-based complementation construction that assigns minimal ranks to all run-DAGs results in a complement automaton with Θ(i) states. Automata in the family A can be described as follows. Each A_i is an NBW (Q_i, q_1, Σ, δ_i, F_i), where Q_i = {q_1, q_2, . . . , q_i}, Σ = {a, b}, q_1 is the initial state and F_i = {q_j | q_j ∈ Q_i, j is even}. The transition relation δ_i is given by: δ_i = {(q_j, a, q_{j+1}), (q_j, b, q_{j+1}) | 1 ≤ j ≤ i−2} ∪ {(q_1, a, q_1), (q_1, b, q_1), (q_i, a, q_i), (q_i, b, q_i), (q_{i−1}, b, q_{i−1}), (q_{i−1}, a, q_i)}. Figure 1 shows the structure of automaton A_5 defined in this manner. Note that each A_i ∈ A is a complete automaton in INF. Furthermore, b^ω ∈ L(A_i) and a^ω ∈ L̄(A_i) for each A_i ∈ A. Let KVF(A_i) and KVFS(A_i) be the complement automata for A_i constructed using the KVF-construction and KVFS-construction respectively.

Fig. 1. Automaton A_5

Lemma 4. For every A_i ∈ A, the number of states in KVF(A_i) and KVFS(A_i) is at least i^{(i−1)/2}/e^i. Furthermore, there exists a ranking-based complementation construction for A_i that gives a complement automaton A′_i with Θ(i) states.

Proof sketch: The proof is obtained by considering the run-DAG for a^ω (which is in L̄(A_i)), and by showing that at least i^{(i−1)/2}/e^i states are required in both the KVF- and KVFS-constructions in order to allow all possible consistent rank assignments to vertices of the run-DAG. On the other hand, a complementation construction that uses the ranking f(q_i) = 1, f(q_1) = 3 and f(q_j) = 2 for all 2 ≤ j ≤ i−1 at all levels of the run-DAG can be shown to accept L̄(A_i) with Θ(i) states. Details of the proof may be found in [KC09].

This discrepancy in the size of a sufficient rank set and the actual set of ranks used by the KVF- and KVFS-constructions motivates us to ask if we can devise a ranking-based complementation construction for NBW that uses the minimum number of ranks when accepting a word in the complement language. In this paper, we answer this question in the affirmative, by providing such a complementation construction.
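For experimenting with this family, the transition relation δ_i is easy to generate mechanically. A small helper of ours (an illustration, not part of the paper's tool):

```python
def build_A(i):
    """Construct A_i from the family A above (i odd, i >= 3): states q_1..q_i
    represented by their indices, alphabet {a, b}, F_i = even-indexed states,
    and delta_i given as triples (source, letter, target)."""
    assert i >= 3 and i % 2 == 1
    Q = list(range(1, i + 1))
    F = {j for j in Q if j % 2 == 0}
    delta = set()
    for j in range(1, i - 1):                    # 1 <= j <= i-2
        delta |= {(j, "a", j + 1), (j, "b", j + 1)}
    delta |= {(1, "a", 1), (1, "b", 1),          # self-loops on q_1
              (i, "a", i), (i, "b", i),          # self-loops on q_i
              (i - 1, "b", i - 1),               # b-loop on q_{i-1}
              (i - 1, "a", i)}                   # q_{i-1} --a--> q_i
    return Q, 1, F, delta

Q, q0, F, delta = build_A(5)                     # the automaton of Fig. 1
```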
5 Complementation with Minimal Ranks
Motivated by the example described in the previous section, we now describe an optimized ranking-based complementation construction for NBW. Given an NBW A, the complement automaton A′ obtained using our construction has the special property that when it accepts an ω-word α, it assigns a ranking r to G_α that agrees with the ranking r^KV_{A,α} at every vertex in G_α. This is achieved by nondeterministically mimicking the process of rank assignment used to arrive at r^KV_{A,α}.
Our construction imposes additional constraints on the states and transitions of the complement automaton, beyond those in the KVF- and KVFS-constructions. For example, if k is the smallest rank that can be assigned to vertex (q, l), then the following conditions must hold: (i) if (q, l) is not an F-vertex then it must have a successor of the same rank at the next level, and (ii) if (q, l) is an F-vertex then it must have a successor at the next level with rank k or k−1. The above observations are coded as conditions on the transitions of A′, and are crucial if every accepting run of A′ on α ∈ L̄(A) must correspond to the unique ranking r^KV_{A,α} of G_α. Recall that in [FKV06], a state of the automaton that tracks the ranking of the run-DAG vertices at the current level is represented as a triple (S, O, f), where S is a set of Büchi states reachable after reading a finite prefix of the word, f is a tight level ranking, and the O-set checks if all even-ranked vertices in a (possibly previous) level of the run-DAG have moved to lower odd ranks. In our construction, we use a similar representation of states, although the O-set is much more versatile. Specifically, the O-set is populated turn-wise with states of the same rank k, for both odd and even k. This is a generalization of Schewe's technique that uses the O-set to check if even-ranked states present at a particular level have moved to states with lower odd ranks. The O-set in our construction, however, does more. It checks if every state of rank k (whether even or odd) in an O-set eventually reaches a state with rank k−1. If k is even, then it also checks if all runs starting at states in O eventually reach a state with rank k−1. When all states in an O-set tracking rank 2 reach states with rank 1, the O-set is reset and loaded with states that have the maximal odd rank in the range of the current level ranking. The process of checking ranks is then re-started. This gives rise to the following construction for the complement automaton A′. Let A′ = (Q′, Q′_0, Σ, δ′, F′), where

– Q′ ⊆ 2^Q × 2^Q × R is the state set, such that if (S, O, f) ∈ Q′, then S ⊆ Q and S ≠ ∅, f ∈ R is a level ranking, O ⊆ S, and either O = ∅ or ∃k ∈ [2n−1] ∀q ∈ O, f(q) = k.
– Q′_0 = ⋃_{i∈[2n−1]^odd} {(S, O, f) | S = {q_0}, f(q_0) = i, O = ∅} is the set of initial states.
– For every σ ∈ Σ, the transition function δ′ is defined such that if (S′, O′, f′) ∈ δ′((S, O, f), σ), the following conditions are satisfied.
  1. S′ = δ(S, σ), f′ covers (f, S, σ).
  2. For all q ∈ S \ F, there is a q′ ∈ δ(q, σ) such that f′(q′) = f(q).
  3. For all q ∈ S ∩ F one of the following must hold:
     (a) There is a q′ ∈ δ(q, σ) such that f′(q′) = f(q).
     (b) There is a q′ ∈ δ(q, σ) such that f′(q′) = f(q) − 1.
  4. In addition, we have the following restrictions:
     (a) If f′ is not a tight level ranking, then O′ = O.
     (b) If f′ is a tight level ranking and O ≠ ∅, then
        i. If for all q ∈ O, we have f(q) = k, where k is even, then
           • Let O_1 = δ(O, σ) \ {q′ | f′(q′) < k}.
           • If O_1 = ∅ then O′ = {q′ | q′ ∈ Q ∧ f′(q′) = k−1}, else O′ = O_1.
        ii. If for all q ∈ O, we have f(q) = k, where k is odd, then
           • If k = 1 then O′ = ∅.
           • If k > 1 then let O_2 = O \ {q | ∃q′ ∈ δ(q, σ), f′(q′) = k−1}.
             - If O_2 ≠ ∅ then O′ ⊆ δ(O_2, σ) such that ∀q ∈ O_2 ∃q′ ∈ O′, q′ ∈ δ(q, σ) ∧ f′(q′) = k.
             - If O_2 = ∅ then O′ = {q′ | f′(q′) = k−1}.
     (c) If f′ is a tight level ranking and O = ∅, then O′ = {q′ | f′(q′) = max_odd(f′)}.
– As in the KVF- and KVFS-constructions, the set of accepting states of A′ is F′ = {(S, O, f) | f is a tight level ranking, and O = ∅}.

Note that unlike the KVF- and KVFS-constructions, the above construction does not have an initial phase of unranked subset construction, followed by a non-deterministic jump to ranked subset construction with tight level rankings. Instead, we start directly with ranked subsets of states, and the level rankings may indeed be non-tight for some finite prefix of an accepting run. The value of O is inconsequential until the level ranking becomes tight; hence it is kept as ∅ during this period. Note further that the above construction gives rise to multiple initial states in general. Since an NBW with multiple initial states can be easily converted to one with a single initial state without changing its language, this does not pose any problem, and we will not dwell on this issue any further.

Theorem 2. L(A′) = L̄(A).

The proof proceeds by establishing three sub-results: (i) every accepting run of A′ on a word α assigns an odd ranking to the run-DAG G_α and hence corresponds to an accepting run of KVF(A), (ii) the run corresponding to the ranking r^KV_{A,α} is an accepting run of A′ on α, and (iii) L(KVF(A)) = L̄(A) [KV01]. Details of the proof may be found in [KC09].

Given an NBW A and the complement NBW A′ constructed using the above algorithm, we now ask if A′ has an accepting run on some α ∈ L̄(A) that induces an odd ranking r different from r^KV_{A,α}. We answer this question negatively in the following lemma.

Lemma 5. Let α ∈ L̄(A), and let r be the odd ranking corresponding to an accepting run of A′ on α. Let V_{r,i} (respectively, V_{r^KV_{A,α},i}) be the set of vertices in G_α that are assigned rank i by r (respectively, r^KV_{A,α}). Then V_{r,i} = V_{r^KV_{A,α},i} for all i > 0.
Proof. We prove the claim by induction on the rank i. Since A and α are clear from the context, we will use r^KV in place of r^KV_{A,α} in the remainder of the proof.

Base case: Let (q, l) ∈ V_{r^KV,1}. By definition, (q, l) is F-free. Suppose r((q, l)) = m, where m > 1, if possible. If m is even, the constraints embodied in steps (2), (3)
and (4(b)i) of our complementation construction, coupled with the fact that the O-set becomes ∅ infinitely often, imply that (q, l) has an F-vertex descendant (q′, l′) in G_α. Therefore, (q, l) is not F-free – a contradiction! Hence m cannot be even. Suppose m is odd and > 1. The constraint embodied in step (4(b)ii) of our construction, and the fact that the O-set becomes ∅ infinitely often, imply that (q, l) has a descendant (q′, l′) that is assigned an even rank by r. The constraints embodied in steps (2), (3) and (4(b)i), coupled with the fact that the O-set becomes ∅ infinitely often, further imply that (q′, l′) has an F-vertex descendant in G_α. Hence (q, l) has an F-vertex descendant, and is not F-free. This leads to a contradiction again! Therefore, our assumption must have been incorrect, i.e., r((q, l)) ≤ 1. Since 1 is the minimum rank in the range of r, we finally have r((q, l)) = 1. This shows that V_{r^KV,1} ⊆ V_{r,1}. Now suppose (q, l) ∈ V_{r,1}. Since r corresponds to an accepting run of A′, it is an odd ranking. This, coupled with the fact that ranks cannot increase along any path in G_α, implies that all descendants of (q, l) in G_α are assigned rank 1 by r. Since F-vertices must be assigned even ranks, this implies that (q, l) is F-free in G_α. It follows that r^KV((q, l)) = 1. Therefore, V_{r,1} ⊆ V_{r^KV,1}. From the above two results, we have V_{r,1} = V_{r^KV,1}.

Hypothesis: Assume that V_{r,j} = V_{r^KV,j} for 1 ≤ j ≤ i.

Induction: Let (q, l) ∈ V_{r^KV,i+1}. Then by the induction hypothesis, (q, l) cannot be in any V_{r,j} for j ≤ i. Suppose r((q, l)) = m, where m > i+1, if possible. We have two cases.

1. i+1 is odd: In this case, the constraints embodied in steps (2), (3), (4(b)i) and (4(b)ii) of our construction, coupled with the fact that the O-set becomes ∅ infinitely often, imply that vertex (q, l) has an F-vertex descendant (q′, l′) (possibly (q, l) itself) such that (i) r((q′, l′)) = i+2, and (ii) vertex (q′, l′) in turn has a descendant (q′′, l′′) such that r((q′′, l′′)) = i+1. The constraint embodied in (2) of our construction further implies that there must be an infinite path π starting from (q′′, l′′) in G_α such that every vertex on π is assigned rank i+1 by r, and none of these are F-vertices. Since r^KV((q, l)) = i+1 is odd, and since (q′, l′) is an F-vertex descendant of (q, l), we must have r^KV((q′, l′)) ≤ i, where i is even. Furthermore, since r^KV is an odd ranking, every path in G_α must eventually get trapped in an odd rank assigned by r^KV. Hence, eventually every vertex on π is assigned an odd rank < i by r^KV. However, we already know that the vertices on π are eventually assigned the odd rank i+1 by r. Hence V_{r,j} ≠ V_{r^KV,j} for some j ∈ {1, . . . , i}. This contradicts the inductive hypothesis!
2. i+1 is even: In this case, the constraints embodied in steps (2), (3), (4(b)i) and (4(b)ii) of our construction, and the fact that the O-set becomes ∅ infinitely often, imply that (q, l) has a descendant (q′, l′) in G_α such that r((q′, l′)) = i+2 (which is odd), and there is an infinite path π starting from (q′, l′) such that all vertices on π are assigned rank i+2 by r. However, since r^KV is an odd ranking and since r^KV((q, l)) = i+1 (which is even), vertices on π must eventually get trapped in an odd rank ≤ i assigned by r^KV. This
implies that V_{r,j} ≠ V_{r^KV,j} for some j ∈ {1, . . . , i}. This violates the inductive hypothesis once again!

It follows from the above cases that r((q, l)) ≤ i+1. However, (q, l) ∉ V_{r^KV,j} for 1 ≤ j ≤ i (since (q, l) ∈ V_{r^KV,i+1}), and V_{r,j} = V_{r^KV,j} for 1 ≤ j ≤ i (by the inductive hypothesis). Therefore, (q, l) ∉ V_{r,j} for 1 ≤ j ≤ i. This implies that r((q, l)) = i+1, completing the induction. By the principle of mathematical induction, we have V_{r^KV_{A,α},i} = V_{r,i} for all i > 0.

Theorem 3. Every accepting run of A′ on α ∈ L̄(A) induces the unique minimal ranking r^KV_{A,α}.

Proof. Follows from Lemma 5.
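The O-set update of condition 4 is the novel part of the construction and can be prototyped for explicit-state experiments. The following is a minimal sketch in Python (our own illustration: the helpers tight and max_odd, the dictionary representation of level rankings, and the convention that states not at the current level carry rank 0 in place of ⊥ are all assumptions of this snippet; for rule 4(b)ii we return just one canonical choice among the allowed sets O′):

```python
def update_O(O, f, f2, delta_sigma, Q, max_odd, tight):
    """One step of the O-set bookkeeping of condition 4 above.  O is the
    current O-set, f and f2 are the level rankings f and f' (dicts over Q),
    and delta_sigma(q) yields the sigma-successors of q."""
    if not tight(f2):                                   # rule 4(a): O' = O
        return frozenset(O)
    if not O:                                           # rule 4(c): reload O
        return frozenset(q for q in Q if f2[q] == max_odd(f2))
    k = f[next(iter(O))]                                # every q in O has rank k
    if k % 2 == 0:                                      # rule 4(b)i
        O1 = {q2 for q in O for q2 in delta_sigma(q) if f2[q2] >= k}
        if not O1:
            return frozenset(q for q in Q if f2[q] == k - 1)
        return frozenset(O1)
    if k == 1:                                          # rule 4(b)ii, k = 1
        return frozenset()
    O2 = {q for q in O                                  # rule 4(b)ii, k > 1
          if all(f2[q2] != k - 1 for q2 in delta_sigma(q))}
    if not O2:
        return frozenset(q for q in Q if f2[q] == k - 1)
    # one allowed O' ⊆ δ(O2, σ): keep all rank-k successors of states in O2
    return frozenset(q2 for q in O2 for q2 in delta_sigma(q) if f2[q2] == k)
```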
5.1 Size of Complement Automaton
The states of A′ are elements of the set 2^Q × 2^Q × R. While some of these states correspond to tight level rankings, others do not. We first use an extension of the idea in [Sch09] to encode a state (S, O, f) with tight level ranking f as a pair (g, i), where g : Q → {1, . . . , r} ∪ {−1, −2} and r = max_odd(f). Thus, for all q ∈ Q, we have q ∉ S iff g(q) = −2. If q ∈ S and q ∉ O, we have g(q) = f(q). If q ∈ O and f(q) is even, then we let g(q) = −1 and i = f(q). This part of the encoding is exactly as in [Sch09]. We extend this encoding to consider cases where q ∈ O and f(q) = k is odd. There are two sub-cases to consider: (i) O ⊊ {q | q ∈ S ∧ f(q) = k}, and (ii) O = {q | q ∈ S ∧ f(q) = k}. In the first case, we let i = k and g(q) = −1 for all q ∈ O. In the second case, we let i = k and g(q) = f(q) = k for all q ∈ O. Since f is a tight level ranking, the O-set cannot be empty when we check for states with an odd rank in our construction. Therefore, there is no ambiguity in identifying the set O in both cases (i) and (ii) above. It is now easy to see that g is always onto one of the three sets {1, 3, . . . , r}, {−1} ∪ {1, 3, . . . , r} or {−2} ∪ {1, 3, . . . , r}. By Schewe's analysis [Sch09], the total number of such (g, i) pairs is upper bounded by O(tight(n+1)). Now, let us consider states with non-tight level rankings. Our construction ensures that once an odd rank i appears in a level ranking g along a run ρ, all subsequent level rankings along ρ contain every rank in {i, i+2, . . . , max_odd(g)}. The O-set in states with non-tight level ranking is inconsequential; hence we ignore this. Suppose a state with non-tight level ranking g contains the odd ranks {i, i−2, . . . , j}, where 1 < j ≤ i = max_odd(g). To encode this state, we first replace g with a level ranking g′ as follows. For all k ∈ {j, . . . , i} and q ∈ Q, if g(q) = k, then g′(q) = k − j + c, where c = 0 if j is even and 1 otherwise. Effectively, this transforms g to a tight level ranking g′ by shifting all ranks down by j − c. The original state can now be represented as the pair (g′, −(j − c)). Note that the second component of a state represented as (g, i) is always non-negative for states with tight level ranking, and always negative for states with non-tight level ranking. Hence, there is no ambiguity in decoding the state representation. Clearly, the total number of states with non-tight rankings
is O(n · tight(n)) = O(tight(n+1)). Hence, the size of A′ is upper bounded by O(tight(n+1)), which differs from the lower bound of Ω(tight(n−1)) given by [Yan08] by only a factor of n^2.
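The encoding of tight-ranking states as pairs (g, i) follows the case analysis above directly. A small sketch in Python (the dictionary representation of f and g is our own choice; the analogous non-tight shift encoding is omitted):

```python
def encode_tight(S, O, f):
    """Encode a state (S, O, f) with a *tight* level ranking f as a pair
    (g, i).  f is a dict defined on S; states outside S would be mapped to
    -2 in g (left implicit here).  Members of O are 'anonymized' to -1 with
    i recording their common rank k, except in sub-case (ii), where O equals
    the whole odd-rank-k class and the ranks are kept verbatim."""
    if not O:
        return dict(f), 0              # i carries no information when O = ∅
    k = f[next(iter(O))]               # all states in O share the rank k
    cls = {q for q in S if f[q] == k}
    anonymize = (k % 2 == 0) or (set(O) != cls)   # even k, or odd sub-case (i)
    g = {q: (-1 if (anonymize and q in O) else f[q]) for q in S}
    return g, k
```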
6 Slices of Complement Automaton
The transitions of the complement automaton A′ obtained by our construction have the property that a state (S, O, f) has a transition to (S′, O′, f′) only if max_odd(f) = max_odd(f′). Consequently, the set of states Q′ can be partitioned into slices Q′_1, Q′_3, . . . , Q′_{2n−1}, where the set Q′_i = {(S, O, f) | (S, O, f) ∈ Q′ ∧ max_odd(f) = i} is called the i-th slice of A′. It is easy to see that Q′ = ⋃_{i∈[2n−1]^odd} Q′_i. Let ρ be an accepting run of A′ on α ∈ L̄(A). We say that ρ is confined to a slice Q′_i of A′ iff ρ sees only states from Q′_i. If ρ is confined to slice i, and if r is the odd ranking induced by ρ, then max_odd(r) = i.

Lemma 6. All accepting runs of A′ on α ∈ L̄(A) are confined to the same slice.

Proof. Follows from Theorem 3.
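By Lemma 6, each slice can be checked in isolation for an accepting lasso (Büchi non-emptiness). A small sketch, assuming an explicit-state representation of A′ (the function names, the slice_of helper extracting max_odd of the level-ranking component, and the naive search are our own illustration):

```python
from collections import defaultdict

def essential_ranks(states, initials, succ, is_accepting, slice_of):
    """Partition the states of A' into slices by slice_of, and mark rank i as
    minimally essential iff its slice alone contains a reachable accepting
    lasso.  succ(st) enumerates the successor states of st in A'."""
    part = defaultdict(set)
    for st in states:
        part[slice_of(st)].add(st)
    def reach(starts, allowed):
        seen, stack = set(), [s for s in starts if s in allowed]
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(y for y in succ(x) if y in allowed)
        return seen
    essential = set()
    for i, slc in part.items():
        for st in reach(initials, slc):
            # an accepting state on a cycle within the slice: slice nonempty
            if is_accepting(st) and st in reach(succ(st), slc):
                essential.add(i)
                break
    return essential
```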
The above results indicate that if a word α is accepted by the i-th slice of our automaton A′, then it cannot be accepted by KVF(A) using a tight ranking with max odd rank < i. It is however possible that the same word is accepted by KVF(A) using a tight ranking with max odd rank > i. Figure 1 shows an example of such an automaton, where the word a^ω is accepted by KVF(A) using a tight ranking with max odd rank 5, as well as with a tight ranking with max odd rank 3. The same word is accepted by only the 3rd slice of our automaton A′. This motivates the definition of minimally inessential ranks. Given an NBW A with n states, odd rank i (1 ≤ i ≤ 2n−1) is said to be minimally inessential if every word α that is accepted by KVF(A) using a tight ranking with max odd rank i is also accepted by KVF(A) using a tight ranking with max odd rank j < i. An odd rank that is not minimally inessential is called minimally essential. As the example in Figure 1 shows, neither the KVF-construction nor the KVFS-construction allows us to detect minimally essential ranks in a straightforward way. Specifically, although the 5th slice of KVF(A) for this example accepts the word a^ω, 5 is not a minimally essential rank. In order to determine if 5 is minimally inessential, we must isolate the 5th slice of KVF(A), and then check whether the language accepted by this slice is a subset of the language accepted by KVF(A) sans the 5th slice. This involves complementing KVF(A) sans the 5th slice, requiring a significant blowup. In contrast, the properties of our automaton A′ allow us to detect minimally (in)essential ranks efficiently. Specifically, if we find that the i-th slice of A′ accepts a word α, we can infer that i is minimally essential. Once all minimally essential ranks have been identified in this manner, we can prune automaton A′ to retain only those slices that correspond to minimally essential ranks. This gives us a way of eliminating redundant slices (and hence states) in KVF(A).
7 An Implementation of Our Algorithm
We have implemented the complementation algorithm presented in this paper as a facility on top of the BDD-based model checker NuSMV. In our implementation, states of the complement automaton are encoded as pairs (g, i), as explained in Section 5.1. Our tool takes as input an NBW A, and generates the state transition relation of the complement automaton A′ using the above encoding in NuSMV's input format. Generating the NuSMV file from a given description of an NBW takes negligible time (< 0.01s). The number of Boolean constraints used in expressing the transition relation in NuSMV is quadratic in n. We use NuSMV's fair CTL model checking capability to check whether there exists an infinite path in a slice of A′ (corresponding to a maximum odd rank k) that visits an accepting state infinitely often. If NuSMV responds negatively to such a query, we disable all transitions to and from states in this section of the NuSMV model. This allows us to effectively detect and eliminate redundant slices of our complement automaton, resulting in a reduction of the overall size of the automaton.

Fig. 2. Example automaton with gaps

For purposes of comparison, we have also implemented the algorithm presented in [Sch09] in a similar manner using a translation to NuSMV. We also compared the performance of our technique with those of the Safra-Piterman [Saf88, Pit07] determinization-based complementation technique and Kupferman-Vardi's ranking-based complementation technique [KV01], as implemented in the GOAL tool [TCT+08]. Table 1 shows the results of some of our experiments. We used the CUDD BDD library with NuSMV 2.4.3, and all our experiments were run on an Intel Xeon 3GHz with 2 GB of memory and a timeout of 10 minutes. The entry for each automaton¹ lists the number of states, transitions and final states of the original automaton, the number of states of the complement automaton computed by the KVFS-construction and by our construction, and the set of minimally inessential ranks (denoted as "MI Ranks") identified by each of these techniques. In addition, each row also lists the number of states computed by the "Safra-Piterman" (denoted as "SP") technique and the "Weak Alternating Automata" (denoted as "WAA") technique in GOAL.

Table 1. Experimental results. XX: timeout (after 10 min).

Automaton (states,trans.,final)   KVFS algorithm          Our algorithm          GOAL
                                  States    MI Ranks      States   MI Ranks      WAA  SP
michel4 (6,15,1)                  157       {9,7}         117      {9,7}         XX   105
g15 (6,17,1)                      324       {9,7,5}       155      {9,7,5}       XX   39
g47 (6,13,3)                      41        {5}           29       {5,3}         XX   28
ex4 (8,9,4)                       3956      ∅             39       {7,5}         XX   7
ex16 (9,13,5)                     10302     ∅             909      {7}           XX   30
ex18 (9,13,5)                     63605     ∅             2886     ∅             XX   71
ex20 (9,12,5)                     17405     ∅             156      {7}           XX   24
ex22 (11,18,7)                    258       {7,5}         23       {7,5}         XX   6
ex24 (13,16,8)                    4141300   ∅             19902    {9,7}         XX   51
ex26 (15,22,11)                   1042840   ∅             57540    {7,5}         XX   XX
gap1 (15,22,11)                   99        {5,1}         26       {5,1}         XX   16
gap2 (15,22,11)                   532       {1}           80       {5,1}         XX   48

¹ All example automata from this paper and the translator from automaton description to NuSMV are available at http://www.cfdvs.iitb.ac.in/reports/minrank

A significant advantage of our construction is its ability to detect "gaps" in slices. As an example, the automaton in Figure 2 has ranks 1 and 5 as minimally inessential, while ranks 3 and 7 are minimally essential (see Table 1). In this case, if we compute the rank of the NBW (as suggested in [GKSV03]) and then consider slices only up to this rank, we will fail to detect that rank 5 is minimally inessential. Therefore, eliminating redundant slices is a stronger optimization than identifying and eliminating states with ranks greater than the rank of the NBW.
8 Conclusion
In this paper, we presented a complementation algorithm for nondeterministic Büchi automata that is based on the idea of ranking functions introduced by Kupferman and Vardi [KV01]. We showed that the ranking assignment presented in [KV01] always results in a minimal odd ranking for run-DAGs of words in the complement language. We then described a complementation construction for NBW such that the complement NBW accepts only the run-DAG with the minimal odd ranking for every word in the complement. We observed that the states of the complement NBW are partitioned into slices, and that each word in the complement is accepted by exactly one such slice. This allowed us to check for redundant slices and eliminate them, leading to a reduction in the size of the complement NBW. It is noteworthy that this ability to reduce the size of the final complement NBW comes for free, since the worst-case bounds coincide with the worst-case bounds of the best known NBW complementation construction. In the future, we wish to explore techniques to construct unambiguous automata and deterministic Rabin automata from NBW, building on the results presented here.
References

[Büc62] Büchi, J.R.: On a decision method in restricted second order arithmetic. In: Proc. 1960 Int. Congr. for Logic, Methodology and Philosophy of Science, pp. 1–11. Stanford Univ. Press, Stanford (1962)
[DR09] Doyen, L., Raskin, J.-F.: Antichains for the automata based approach to model checking. Logical Methods in Computer Science 5, 1–20 (2009)
[FKV06] Friedgut, E., Kupferman, O., Vardi, M.Y.: Büchi complementation made tighter. Int. J. Found. Comput. Sci. 17(4), 851–868 (2006)
[FV09] Fogarty, S., Vardi, M.Y.: Büchi complementation and size-change termination. In: Proc. TACAS, pp. 16–30 (2009)
[GKSV03] Gurumurthy, S., Kupferman, O., Somenzi, F., Vardi, M.Y.: On complementing nondeterministic Büchi automata. In: Geist, D., Tronci, E. (eds.) CHARME 2003. LNCS, vol. 2860, pp. 96–110. Springer, Heidelberg (2003)
[KC09] Karmarkar, H., Chakraborty, S.: On minimal odd rankings for Büchi complementation (May 2009), http://www.cfdvs.iitb.ac.in/reports/index.php
[Kla91] Klarlund, N.: Progress measures for complementation of ω-automata with applications to temporal logic. In: Proc. 32nd IEEE FOCS, San Juan, pp. 358–367 (1991)
[KV01] Kupferman, O., Vardi, M.Y.: Weak alternating automata are not that weak. ACM Transactions on Computational Logic 2(3), 408–429 (2001)
[KW08] Kähler, D., Wilke, T.: Complementation, disambiguation, and determinization of Büchi automata unified. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 724–735. Springer, Heidelberg (2008)
[Löd99] Löding, C.: Optimal bounds for transformations of ω-automata. In: Pandu Rangan, C., Raman, V., Sarukkai, S. (eds.) FST TCS 1999. LNCS, vol. 1738, pp. 97–109. Springer, Heidelberg (1999)
[Mic88] Michel, M.: Complementation is more difficult with automata on infinite words. CNET, Paris (1988)
[Pit07] Piterman, N.: From nondeterministic Büchi and Streett automata to deterministic parity automata. Logical Methods in Computer Science 3(3:5), 1–21 (2007)
[Saf88] Safra, S.: On the complexity of ω-automata. In: Proc. 29th IEEE FOCS, pp. 319–327 (1988)
[Sch09] Schewe, S.: Büchi complementation made tight. In: Proc. STACS, pp. 661–672 (2009)
[SVW87] Sistla, A.P., Vardi, M.Y., Wolper, P.: The complementation problem for Büchi automata with applications to temporal logic. Theoretical Computer Science 49, 217–237 (1987)
[TCT+08] Tsay, Y.-K., Chen, Y.-F., Tsai, M.-H., Chan, W.-C., Luo, C.-J.: GOAL extended: Towards a research tool for omega automata and temporal logic. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 346–350. Springer, Heidelberg (2008)
[Var07] Vardi, M.Y.: The Büchi complementation saga. In: Thomas, W., Weil, P. (eds.) STACS 2007. LNCS, vol. 4393, pp. 12–22. Springer, Heidelberg (2007)
[Yan08] Yan, Q.: Lower bounds for complementation of omega-automata via the full automata technique. Logical Methods in Computer Science 4(1), 1–20 (2008)
Specification Languages for Stutter-Invariant Regular Properties

Christian Dax¹, Felix Klaedtke¹, and Stefan Leue²

¹ ETH Zurich, Switzerland
² University of Konstanz, Germany
Abstract. We present specification languages that naturally capture exactly the regular and ω-regular properties that are stutter invariant. Our specification languages are variants of the classical regular expressions and of the core of PSL, a temporal logic, which is widely used in industry and which extends the classical linear-time temporal logic LTL by semi-extended regular expressions.
1 Introduction
Stutter-invariant specifications do not distinguish between system behaviors that differ from each other only by the number of consecutive repetitions of the observed system states. Stutter invariance is crucial for refining specifications and for modular reasoning [13]. Apart from these conceptual reasons for restricting oneself to stutter-invariant specifications, there is also a more practical motivation: stutter invariance is an essential requirement for using partial-order reduction techniques (see, e.g., [2, 11, 15, 16, 20]) in finite-state model checking. Unfortunately, checking whether an LTL formula or an automaton describes a stutter-invariant property is PSPACE-complete [18]. To leverage partial-order reduction techniques in finite-state model checking even when it is unknown whether the given property is stutter-invariant, Holzmann and Kupferman [12] suggested using a stutter-invariant overapproximation of the given property. However, if the given property is not stutter-invariant, we might obtain counterexamples that are false positives. Moreover, the overapproximation of the property blows up the specification and decelerates the model-checking process. Another approach for avoiding the expensive check whether a given property is stutter-invariant is to use specification languages that only allow one to specify stutter-invariant properties. For instance, LTL without the next operator X, LTL−X for short, captures exactly the stutter-invariant star-free properties [10, 17]. An advantage of such a syntactic characterization is that it yields a sufficient and easily checkable condition for whether partial-order reduction techniques are applicable. However, LTL−X is limited in its expressive power. Independently, Etessami [9] and Rabinovich [19] gave similar syntactic characterizations of the stutter-invariant ω-regular properties. However, these characterizations are not satisfactory from a practical point of view. Both extend
Partly supported by the Swiss National Science Foundation.
fragments of LTL−X by allowing one to existentially quantify over propositions. To preserve stutter invariance, the quantification is semantically restricted. Due to this restriction, the meaning of quantifying over propositions becomes unintuitive and expressing properties in the proposed temporal logics becomes difficult. Note that even the extension of LTL with the standard quantification over propositions is considered difficult to use in practice [21]. Another practical drawback of the temporal logic in [19] is that the finite-state model-checking problem has a non-elementary worst-case complexity. The finite-state model-checking problem with the temporal logic in [9] remains in PSPACE, as for LTL. This upper bound on the complexity of the model-checking problem is achieved by additionally restricting syntactically the use of the non-standard quantification over propositions. The downside of this restriction is that the logic is not syntactically closed under negation anymore, which can make it more difficult or even impossible to express properties naturally and concisely in it. Expressing the complement of a property might lead to an exponential blow-up. In this paper, we give another syntactic characterization, in terms of a temporal logic, of the ω-regular properties that are stutter invariant. Our characterization overcomes the limitations of the temporal logics from [9] and [19]. Namely, it is syntactically closed under negation, it is easy to use, and the finite-state model-checking problem with it is solvable in practice. Furthermore, we also present a syntactic characterization of the stutter-invariant regular properties. Our characterizations are given as variants of the classical regular expressions and the linear-time core of the industrial-strength temporal logic PSL [1], which extends LTL with semi-extended regular expressions (SEREs). We name our variants siSEREs and siPSL, respectively. Similar to PSL, siPSL extends LTL−X with siSEREs. For siSEREs, the use of the concatenation operator and the Kleene star is syntactically restricted. Moreover, siSEREs make use of a novel iteration operator, which is a variant of the Kleene star.
2 Preliminaries
Words. For an alphabet Σ, we denote the set of finite and infinite words by Σ^∗ and Σ^ω, respectively. Furthermore, we write Σ^∞ := Σ^∗ ∪ Σ^ω and Σ^+ := Σ^∗ \ {ε}, where ε denotes the empty word. The concatenation of words is written as juxtaposition. The concatenation of the languages K ⊆ Σ^∗ and L ⊆ Σ^∞ is K ; L := {uv : u ∈ K and v ∈ L}, and the fusion of K and L is K : L := {ubv ∈ Σ^∞ : b ∈ Σ, ub ∈ K, and bv ∈ L}. Furthermore, for L ⊆ Σ^∗, we define L^∗ := ⋃_{n≥0} L^n and L^+ := ⋃_{n≥1} L^n with L^0 := {ε} and L^{i+1} := L ; L^i, for i ∈ N. We write |w| for the length of w ∈ Σ^∞ and we denote the (i+1)-st letter of w by w(i), where we assume that i < |w|. For a word w ∈ Σ^ω and i ≥ 0, we define w≥i := w(i)w(i+1) . . . and w≤i := w(0) . . . w(i).

Stutter-Invariant Languages. Let us recall the definition of stutter invariance from [18]. The stutter-removal operator ♮ : Σ^∞ → Σ^∞ maps a word v ∈ Σ^∞ to the word that is obtained from v by replacing every maximal finite substring of identical letters by a single copy of the letter. For instance, ♮(aabbbccc) = abc,
♮(aab(bbc)^ω) = a(bc)^ω, and ♮(aabbbccc^ω) = abc^ω. A language L ⊆ Σ^∞ is stutter-invariant if u ∈ L ⇔ v ∈ L, for all u, v ∈ Σ^∞ with ♮(u) = ♮(v). A word w ∈ Σ^∞ is stutter-free if w = ♮(w). For L ⊆ Σ^∞, we define ♮L := {♮(w) : w ∈ L}.

Propositional Logic. For a set of propositions P, we denote the set of Boolean formulas over P by B(P), i.e., B(P) consists of the formulas that are inductively built from the propositions in P and the connectives ∧ and ¬. For M ⊆ P and b ∈ B(P), we write M |= b iff b evaluates to true when assigning true to the propositions in M and false to the propositions in P \ M.

Semi-extended Regular Expressions. The syntax of semi-extended regular expressions (SEREs) over the proposition set P is defined by the grammar

r ::= ε | b | r^∗ | r ; r | r : r | r ∪ r | r ∩ r,

where b ∈ B(P). We point out that in addition to the concatenation operator ;, SEREs have the operator : for expressing the fusion of two languages. The language of an SERE over P is inductively defined:

L(r) := {ε} if r = ε,
L(r) := {b ∈ 2^P : b |= r} if r ∈ B(P),
L(r) := L(s) ⊙ L(t) if r = s ⊙ t, where ⊙ ∈ {;, :, ∪, ∩},
L(r) := L(s)^∗ if r = s^∗.

The size of an SERE is its syntactic length, i.e., ||ε|| := 1, ||b|| := 1, for b ∈ B(P), ||r ⊙ s|| := 1 + ||r|| + ||s||, for ⊙ ∈ {∪, ∩, ;, :}, and ||r^∗|| := 1 + ||r||.

Propositional Temporal Logic. The core of the linear-time fragment of PSL [1] is as follows. Its syntax over the set P of propositions is given by the grammar

ϕ ::= p | cl(r) | ¬ϕ | ϕ ∧ ϕ | Xϕ | ϕ U ϕ | r ♦→ ϕ,

where p ∈ P and r is an SERE over P. A PSL formula¹ over P is interpreted over an infinite word w ∈ (2^P)^ω as follows:

w |= p iff p ∈ w(0)
w |= cl(r) iff ∃k ≥ 0 : w≤k ∈ L(r), or ∀k ≥ 0 : ∃v ∈ L(r) : w≤k is a prefix of v
w |= ϕ ∧ ψ iff w |= ϕ and w |= ψ
w |= ¬ϕ iff w ⊭ ϕ
w |= Xϕ iff w≥1 |= ϕ
w |= ϕ U ψ iff ∃k ≥ 0 : w≥k |= ψ and ∀j < k : w≥j |= ϕ
w |= r ♦→ ϕ iff ∃k ≥ 0 : w≤k ∈ L(r) and w≥k |= ϕ

The language of a PSL formula ϕ is L(ϕ) := {w ∈ (2^P)^ω : w |= ϕ}. As for SEREs, we define the size of a PSL formula as its syntactic length. That means, ||p|| := 1, ||cl(r)|| := 1 + ||r||, ||¬ϕ|| := ||Xϕ|| := 1 + ||ϕ||, ||ϕ ∧ ψ|| := ||ϕ U ψ|| := 1 + ||ϕ|| + ||ψ||, and ||r ♦→ ϕ|| := 1 + ||r|| + ||ϕ||.
¹ For the ease of exposition, we identify PSL with its linear-time core.
Syntactic Sugar. We use the standard conventions to omit parentheses, e.g., temporal operators bind stronger than Boolean connectives and the binary operators of the SEREs are left associative. We also use standard syntactic sugar for the Boolean values, the Boolean connectives, and the linear-time temporal operators: ff := p ∧ ¬p, for some proposition p ∈ P, tt := ¬ff, ϕ ∨ ψ := ¬(¬ϕ ∧ ¬ψ), ϕ → ψ := ¬ϕ ∨ ψ, Fϕ := tt U ϕ, Gϕ := ¬F¬ϕ, and ϕ W ψ := (ϕ U ψ) ∨ Gϕ, where ϕ and ψ are formulas. Moreover, r □→ ϕ abbreviates ¬(r ♦→ ¬ϕ).
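The stutter-removal operator ♮ is straightforward to realize on finite words. A minimal sketch in Python (our own illustration; the treatment of infinite words is omitted):

```python
def stutter_remove(w):
    """The stutter-removal operator ♮ on finite words: collapse every maximal
    block of identical letters to a single copy of the letter."""
    out = []
    for a in w:
        if not out or out[-1] != a:
            out.append(a)
    return "".join(out)

def stutter_equivalent(u, v):
    """Two finite words are indistinguishable by stutter-invariant languages
    iff their stutter-removals coincide."""
    return stutter_remove(u) == stutter_remove(v)

assert stutter_remove("aabbbccc") == "abc"
assert stutter_equivalent("aabbbccc", "abcc")
```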
3 Stutter-Invariant Regular Properties
In this section, we present syntactic characterizations for stutter-invariant regular and ω-regular languages. In Section 3.1, we define a variant of SEREs that can describe only stutter-invariant languages. Furthermore, we show that this variant of SEREs is complete in the sense that any stutter-invariant regular language can be described by such an expression. Similarly, in Section 3.2, we present a variant of PSL for expressing stutter-invariant ω-regular languages. In Section 3.3, we give examples that illustrate the use of our stutter-invariant variant of PSL.
3.1 Stutter-Invariant SEREs
It is straightforward to see that stutter-invariant languages are not closed under the concatenation and the Kleene star. A perhaps surprising example is the SERE p⁺ ; q⁺ over the proposition set {p, q}, which does not describe a stutter-invariant language, although L(p⁺) and L(q⁺) are stutter-invariant languages.² In our variant of SEREs, we restrict the use of concatenation and replace the Kleene star by an iteration operator, which uses the fusion instead of the concatenation for gluing words together. Namely, for a language L of finite words, we define L^⊕ := ⋃_{n∈N} L^n, where L^0 := L and L^{i+1} := L^i : L, for i ∈ N. The following lemma summarizes some closure properties of the class of stutter-invariant languages.

Lemma 1. Let K ⊆ Σ^∗ and L, L′ ⊆ Σ^∞ be stutter-invariant languages. The languages L ∩ L′, L ∪ L′, K : L, and K^⊕ are stutter-invariant. Furthermore, Σ^∗ \ K, Σ^ω \ L, and Σ^∞ \ L are stutter-invariant.

Proof. We only show that the language K : L is stutter-invariant. The other closure properties are similarly proved. Assume that u ∈ K : L and ♮(u) = ♮(v) for u, v ∈ Σ^∞. Let u = u′bu′′, for some u′ ∈ Σ^∗, u′′ ∈ Σ^∞, and b ∈ Σ with u′b ∈ K and bu′′ ∈ L. Since K is stutter-invariant, we can assume without loss of generality that if u′ is nonempty then u′(|u′| − 1) ≠ b. Since ♮(u) = ♮(v), there are v′ ∈ Σ^∗ and v′′ ∈ Σ^∞ such that v = v′bv′′, ♮(v′) = ♮(u′), and ♮(bv′′) = ♮(bu′′). From the stutter invariance of K and L, it follows that v ∈ K : L.

Our variant of SEREs is defined as follows.
Note that the word {p, q} {p, q} belongs to L(p+ ; q + ) but the word {p, q} does not.
248
C. Dax, F. Klaedtke, and S. Leue
Definition 1. The syntax of siSEREs over the proposition set P is given by the grammar r ::= ε b+ b∗ ; r r ; b∗ r : r r ∪ r r ∩ r r⊕ , where b ranges over the Boolean formulas in B(P ). The language L(r) of an siSERE r is defined as expected. By an induction over the structure of siSEREs, which uses the closure properties from Lemma 1, we easily obtain the following theorem. Theorem 1. The language of every siSERE is stutter-invariant. In the remainder of this subsection, we show that any regular language that is stutter-invariant can be described by an siSERE. We prove this result by defining a function κ that maps SEREs to siSEREs. We show that it preserves the language if the given SERE describes a stutter-invariant language. The function κ is defined recursively over the structure of SEREs: κ(ε) := ε κ(b) := b+ κ(s ∪ t) := κ(s) ∪ κ(t) κ(s ∩ t) := κ(s) ∩ κ(t) κ(s : t) := κ(s) : κ(t)
∗ κ(t) if ε ∈ L(s) + κ(s ; t) := κ(s) : a ˆ : a ˆ ; κ(t)) ∪ ff otherwise a∈2P
⊕ κ(s∗ ) := ε ∪ κ(s) ∪ κ(s) : a ˆ+ : (ˆ a∗ ; κ(s)) , a∈2P
where b ∈ B(P ), s, t are SEREs, and a ˆ :=
p∈a
p∧
p
∈a
¬p, for a ∈ 2P .
Lemma 2. For every SERE r, the equality L (r) = L (κ(r)) holds. Proof. We show the lemma by induction over the structure of the SERE r. The base cases where r is ε or b with b ∈ B(P ) are obvious. The step cases where r is of one of the forms s ∪ t, s ∩ t, or s : t follow straightforwardly from the induction hypothesis. Next, we prove the step case where r is of the form s ; t. For showing L (r) ⊆ L (κ(r)), assume that u ∈ L (r). There are words x ∈ L(s) and y ∈ L(t) such that u = (xy). By induction hypothesis, we have that (x) ∈ L (κ(s)) and (y) ∈ L (κ(t)). The case where x the empty word is obvious. Assume that x = ε and a ∈ 2P is the last letter of x. We have that (xy) ∈ L (κ(s) : a ˆ) ; κ(t) and L (κ(s) : a ˆ) ; κ(t) ⊆ L (κ(s) : (ˆ a ; κ(t)) ⊆ L κ(s) : ((ˆ a:a ˆ) ; κ(t)) ⊆ L κ(s) : (ˆ a+ : (ˆ a∗ ; κ(t))) . For showing L (r) ⊇ L (κ(r)), assume that u ∈ L (κ(r)). We make a case split.
Specification Languages for Stutter-Invariant Regular Properties
249
1. If ε ∈ L(s) and u ∈ L (κ(t)) then u ∈ L (t) by induction hypothesis. We conclude that u ∈ L (ε ; t)⊆ L (s ;+t) =∗ L (r). 2. Assume that u ∈ L (κ(s): a∈2P a ˆ :(ˆ a ;κ(t)) ). There is a letter a ∈ 2P such + ∗ that u ∈ L (κ(s) : (ˆ a : (ˆ a ; κ(t)))) = L (κ(s) : (ˆ a ; κ(t))). It follows that there are words x and y such that u = xay, xa ∈ L (κ(s)), and ay ∈ L (ˆ a ; κ(t)). We have that either ay ∈ L (κ(t)) or y ∈ L (κ(t)). By induction hypothesis, we have that xa ∈ L (s) and either ay ∈ L (t) or y ∈ L (t). It follows that u ∈ L (r). Finally, we prove the step case where r is of the form s∗ . We first show L (r) ⊆ L (κ(r)). Assume that u ∈ L (s∗ ). If u is the empty word or u ∈ L (s) then there is nothing to prove. Assume that u is of the form u1 u2 . . . un with ui ∈ L (s) and ui = ε, for all 1 ≤ i ≤ n. By induction hypothesis, we have that ui ∈ L (κ(s)). Let ai be the last letter of ui , for each 1 ≤ i < n, respectively. We have that (ai−1 ui ) ∈ L (ˆ a+ a∗i−1 ; κ(s))), for all 1 < i ≤ n. It follows that i−1 : (ˆ (u1 a1 u2 . . . an−1 an ) ∈ L(κ(s)) : L (ˆ a+ a∗2 ; κ(s))) : . . . : L (ˆ a+ a∗n ; κ(s))). 1 : (ˆ n−1 : (ˆ Since (u) = (u1 a1 u2 . . . an−1 an ), we conclude that (u) ∈ L (κ(r)). For showing L (r) ⊇ L (κ(r)), we assume that u ∈ L (κ(r)). cases u = ε The and u ∈ L (κ(s)) are obvious. So, we assume that u ∈ L κ(s) : a+ : a∈2P (ˆ ⊕ ⊕ ⊕ (ˆ a∗ ; κ(s))) = L κ(s) : a ; κ(s)) = L s : a ; s) , where a∈2P (ˆ a∈2P (ˆ the last equality holds by induction hypothesis. There is an integer n ≥ 2 and words u1 , u2 , . . . , un ∈ L(s) and letters a1 , a2 , . . . , an−1 ∈ 2P such that u = (u1 a1 u2 . . . an−1 un ) and (ui ) = (ui ai ), for all 1 ≤ i < n. It follows that u = (u1 u2 . . . un ) ∈ L (s∗ ). A consequence of Lemma 2 is that the translated siSERE describes the minimal stutter-invariant language that overapproximates the language of the given SERE. Lemma 3. For every SERE r, L(r) ⊆ L(κ(r)) and if K is a stutter-invariant language with L(r) ⊆ K then L(κ(r)) ⊆ K. Proof. Let K be a stutter-invariant language with L(r) ⊆ K and let w ∈ L(κ(r)). We have to show that w ∈ K. Since L(κ(r)) is stutter-invariant, we have that (w) ∈ L(κ(r)). With Lemma 2, we conclude that (w) ∈ L (r). It follows that there is a word u ∈ L(r) with (u) = (w). Since K ⊇ L(r), we have that (w) ∈ K and thus, w ∈ K since K is stutter-invariant. It remains to be proven that L(r) ⊆ L(κ(r)). For w ∈ L(r), we have that (w) ∈ L (r). By Lemma 2, we have that (w) ∈ L (κ(r)). Since L(κ(r)) is stutter-invariant, we conclude that w ∈ L(κ(r)). From Lemma 3 we immediately obtain the following theorem. Theorem 2. For every stutter-invariant regular language L, there is an siSERE r such that L(r) = L. Note that the intersection and the fusion operation is not needed for SEREs to describe the class of regular languages. However, they are convenient for expressing regular languages naturally and concisely. It follows immediately from the
250
C. Dax, F. Klaedtke, and S. Leue
definition of the function κ that siSEREs even without the intersection operation exactly capture the class of stutter-invariant regular languages. However, in contrast to the intersection operator, the fusion operator is essential for describing this class of languages with siSEREs. Finally, we remark that when translating an SERE of the form r ; s or s∗ , we obtain an siSERE that contains a disjunction of all the letters in 2P that contains 2|P | copies of κ(s). We conclude that in the worst case, the size of the siSERE κ(r) for a given SERE r is exponential in ||r||. It remains open whether for every SERE that describes a stutter-invariant regular language, there is a language-equivalent siSERE of polynomial size. 3.2
Stutter-Invariant PSL
Similar to the previous subsection, we define a variant of the core of PSL and show that this temporal logic describes exactly the class of stutter-invariant ω-regular languages. Definition 2. The syntax of siPSL formulas is similar to that of PSL formulas except that the formulas do not contain the temporal operator X and instead of SEREs they contain siSEREs. The semantics is defined as expected. By a straightforward induction over the structure of siPSL formulas and by using the closure properties from Lemma 1, we obtain the following theorem. Note that L(r ϕ) = L(r) : L(ϕ). Furthermore, it is easy to see that the language L(cl(r)) is stutter-invariant if r is an SERE or siSERE that describes a stutter-invariant language. Theorem 3. The language of every siPSL formula is stutter-invariant. In the following, we show that every stutter-invariant ω-regular language can be described by an siPSL formula. We do this by extending the translations in [17] for eliminating the temporal operator X in LTL formulas to PSL formulas. We define the function τ that translates PSL formulas into siPSL formulas as follows. It is defined recursively over the formula structure and it uses the function κ from Section 3.1 for translating SEREs into siSEREs. τ (p) := p τ (cl(r)) := cl(κ(r)) τ (¬ϕ) := ¬τ (ϕ) τ (ϕ ∧ ψ) := τ (ϕ) ∧ τ (ψ) τ (ϕ U ψ) := τ (ϕ) U τ (ψ) τ (r ϕ) := κ(r) τ (ϕ) τ (Xϕ) := Gˆ a ∧ τ (ϕ) ∨ a∈2P
ˆ a ˆ U b ∧ τ (ϕ)
b∈2P \{a}
The intuition of the elimination of the outermost operator X in a formula Xϕ is as follows: “the first time after now that some new event happens, ϕ must hold, or else, if nothing new ever happens, ϕ must hold right now.”
Specification Languages for Stutter-Invariant Regular Properties
251
Note that the size of the resulting siPSL formula is in the worst case exponential in the size of the given PSL formula. The sources of the blow-up are (1) the translation of the SEREs in the given PSL formula into siSEREs and (2) the elimination of the temporal operator X. We can improve the translation τ with respect to the size of the resulting formula by using the translation defined in [10] for eliminating the operator X in LTL formulas that describe stutter-invariant languages. The translation in [10] avoids the conjunctions over the letters in 2P . Instead the conjunctions only range over the propositions in P . The elimination of an operator X is not exponential in |P | anymore. However, the resulting translation for PSL into siPSL is still exponential in the worst case because of (1). The question whether the exponential blow-up can be avoided remains open. The following lemma for τ is the analog of Lemma 2 for the function κ. Lemma 4. For every PSL formula ϕ, the equality L (ϕ) = L (τ (ϕ)) holds. Similar to Lemma 3 for SEREs, we obtain that the function τ translates PSL formulas into siPSL formulas that minimally overapproximate the described languages with respect to stutter invariance. Lemma 5. For every PSL formula ϕ, L(ϕ) ⊆ L(τ (ϕ)) and if L is a stutterinvariant language with L(ϕ) ⊆ L then L(τ (ϕ)) ⊆ L. From Lemma 5 we immediately obtain the following theorem. Theorem 4. For every stutter-invariant ω-regular language L, there is an siPSL formula ϕ such that L(ϕ) = L. We remark that the finite-state model-checking problem for PSL and siPSL fall into the same complexity classes. Namely, the finite-state model-checking problem for siPSL is EXPSPACE-complete and the problem becomes PSPACEcomplete when the number of intersection operators in the given siPSL formulas is bounded. These complexity bounds can be easily established from the existing bounds on PSL, see [4] and [5, 14]. Note that the automata-theoretic realization of the iteration operator ⊕ is similar to the one that handles the Kleene-star. Recently, we proposed an extension of PSL with past operators [7]. As for LTL−X [17], we remark that our result on the stutter invariance of siPSL straightforwardly carries over to an extension of siPSL with past operators. 3.3
siPSL Examples
In the following, we illustrate that stutter-invariant ω-regular properties can be naturally expressed in siPSL. For comparison, we describe these properties in siPSL and other temporal logics that express stutter-invariant properties. Star-Free Properties. Consider the following commonly used specification patterns taken from [8]: (P1) Absence: p is false after q until r. (P2) Existence: p becomes true between q and r.
252
C. Dax, F. Klaedtke, and S. Leue Table 1. siPSL formulas and LTL−X formulas of the specification patterns
pattern siPSL formula P1 P2 P3 P4 P5
+
LTL−X formula
+
G(q : ¬r ¬p) G((q ∧ ¬r)+ : (¬p∗ ; r + ) ff) G(q + : ¬r + : ¬p : (¬r ∗ ; r + ) ff) G(q + : (¬r ∧ ¬s)+ ¬p) G(q + : ¬r + : p (¬r + : s+ tt))
G(q ∧ ¬r → (¬p) W r) G(q ∧ ¬r → (¬r) W (p ∧ ¬r)) G(q ∧ ¬r ∧ Fr → p U r) G(q ∧ ¬r → (¬p) W (s ∨ r)) G(q ∧ ¬r → (p → (¬r) U (s ∧ ¬r)) W r)
(P3) Universality: p is true between q and r. (P4) Precedence: s precedes p, after q until r. (P5) Response: s responds to p, after q until r. Table 1 contains the formalization of these specification patterns in siPSL and LTL−X . Note that any LTL−X is also an siPSL formula. However, since practitioners often find it easier to use (semi-extended) regular expressions than the temporal operators in LTL, we have used siSEREs in the siPSL formulas to formalize the patterns in siPSL. An advantage of siPSL over LTL−X is that one can choose between the two specifications styles and mix them. Omega-regular Properties. We consider the stutter-invariant ω-regular language Ln := {w ∈ (2{p} )ω : the number of occurrences of the subword {p}∅ in w is divisible by n} , for n ≥ 2. The following siPSL formula describes the language Ln : neverswitch ∨ ((¬p∗ ; switch) : . . . : (¬p∗ ; switch))⊕ neverswitch , n times
where switch := p+ : (p∗ ; ¬p+ ) and neverswitch := (¬p) W Gp. Note that the language Ln is not star-free and thus, it cannot be described in LTL−X . In the following, we compare our siPSL formalization of Ln with a formalization in the temporal logic SI-EQLTL from [9], which has the same expressive power as siPSL. We briefly recall the syntax and semantics of SIEQLTL. The formulas in SI-EQLTL are of the form ∃h q1 . . . ∃h qn ϕ, where ϕ is an LTL−X formula over a proposition set that contains the propositions q1 , . . . , qn . The semantics of the quantifier ∃h is as follows. Let P be a proposition set with q ∈ P . The word w ∈ (2P ∪{q} )ω is a harmonious extension of v ∈ (2P )ω if for all i ∈ N, it holds that v(i) = w(i) ∩ P and if v(i) = v(i + 1) then w(i) = w(i + 1). For v ∈ (2P )ω , we define v |= ∃h q ϕ iff w |= ϕ, for some harmonious extension w ∈ (2P ∪{q} )ω of v. For readability, we only state an SI-EQLTL formula that describes the language L2 (the formula can be straightforwardly generalized for describing the language Ln with n ≥ 2):
Specification Languages for Stutter-Invariant Regular Properties
253
∃h q q ∧ G(q → neverswitch ∨ switch2 ) ∧ F neverswitch , where
switch2 := (¬p ∧ q) U (p ∧ q) U (¬p ∧ ¬q) U (p ∧ ¬q) U (¬p ∧ q) . Intuitively, the subformula switch2 matches subwords that contain two occurrences of {p}∅. Furthermore, the harmoniously existentially quantified proposition q marks every position k of a word in L2 , where the number of occurrences of {p}∅ in w≤k is even. We remark that we did not manage to come up with a simpler SI-EQLTL formula for describing the language Ln .3 Nevertheless, we consider the SI-EQLTL formula for Ln still hard to read because of the harmonious quantified variable q and the nesting of the temporal operators, which is linear in n. Furthermore, note that the advantage of siPSL over LTL−X , namely, to mix different specification styles, is also an advantage of siPSL over SI-EQLTL.
4
Concluding Remarks
We have presented the specification languages siSEREs and siPSL, which capture exactly the classes of stutter-invariant regular and ω-regular languages, respectively. siSEREs are a variants of SEREs and siPSL is a variant of the temporal logic PSL [1], which is nowadays widely used in industry. siPSL inherits the following pleasant features from PSL. First, siPSL is easy to use. Second, the computational complexities for solving the finite-state model-checking problem with siPSL and fragments thereof are similar to the corresponding problems for PSL. Third, with only minor modifications we can use the existing tool support for PSL (like the model checker RuleBase [3], the formula translator into nondeterministic B¨ uchi automata rtl2ba [7], or the translator used in [6] with all its optimizations) for siPSL. We only need to provide additional support for the new Kleene-star-like iteration operator ⊕ of the siSEREs.
References 1. IEEE standard for property specification language (PSL). IEEE Std 1850TM (October 2005) 2. Alur, R., Brayton, R.K., Henzinger, T.A., Qadeer, S., Rajamani, S.K.: Partial-order reduction in symbolic state-space exploration. Form. Method. Syst. Des. 18(2), 97–116 (2001) 3. Beer, I., Ben-David, S., Eisner, C., Geist, D., Gluhovsky, L., Heyman, T., Landver, A., Paanah, P., Rodeh, Y., Ronin, G., Wolfsthal, Y.: RuleBase: Model checking at IBM. In: Marie, R., Plateau, B., Calzarossa, M.C., Rubino, G.J. (eds.) TOOLS 1997. LNCS, vol. 1245, pp. 480–483. Springer, Heidelberg (1997) 3
We encourage the reader to find a simpler SI-EQLTL formula that describes Ln .
254
C. Dax, F. Klaedtke, and S. Leue
4. Ben-David, S., Bloem, R., Fisman, D., Griesmayer, A., Pill, I., Ruah, S.: Automata construction algorithms optimized for PSL. Technical report, The Prosyd Project (2005), http://www.prosyd.org 5. Bustan, D., Havlicek, J.: Some complexity results for SystemVerilog assertions. In: Ball, T., Jones, R.B. (eds.) CAV 2006. LNCS, vol. 4144, pp. 205–218. Springer, Heidelberg (2006) 6. Cimatti, A., Roveri, M., Tonetta, S.: Symbolic compilation of PSL. IEEE Trans. on CAD of Integrated Circuits and Systems 27(10), 1737–1750 (2008) 7. Dax, C., Klaedtke, F., Lange, M.: On regular temporal logics with past. In: Proceedings of the 36th International Colloquium on Automata, Languages, and Programming, ICALP (to appear, 2009) 8. Dwyer, M.B., Avrunin, G.S., Corbett, J.C.: Patterns in property specifications for finite-state verification. In: Proceedings of the 21st International Conference on Software Engineering (ICSE), pp. 411–420 (1999), http://patterns.projects.cis.ksu.edu/ 9. Etessami, K.: Stutter-invariant languages, ω-automata, and temporal logic. In: Halbwachs, N., Peled, D.A. (eds.) CAV 1999. LNCS, vol. 1633, pp. 236–248. Springer, Heidelberg (1999) 10. Etessami, K.: A note on a question of Peled and Wilke regarding stutter-invariant LTL. Inform. Process. Lett. 75(6), 261–263 (2000) 11. Godefroid, P., Wolper, P.: A partial approach to model checking. Inf. Comput. 110(2), 305–326 (1994) 12. Holzmann, G., Kupferman, O.: Not checking for closure under stuttering. In: Proceedings of the 2nd International Workshop on the SPIN Verification System. Series in Discrete Mathematics and Theoretical Computer Science, vol. 32, pp. 163–169 (1996) 13. Lamport, L.: What good is temporal logic? In: Proceedings of the 9th IFIP World Computer Congress. Information Processing, vol. 83, pp. 657–668 (1983) 14. Lange, M.: Linear time logics around PSL: Complexity, expressiveness, and a little bit of succinctness. In: Caires, L., Vasconcelos, V.T. (eds.) CONCUR 2007. LNCS, vol. 4703, pp. 90–104. Springer, Heidelberg (2007) 15. Peled, D.: Combining partial order reductions with on-the-fly model-checking. Form. Method. Syst. Des. 8(1), 39–64 (1996) 16. Peled, D.: Ten years of partial order reduction. In: Y. Vardi, M. (ed.) CAV 1998. LNCS, vol. 1427, pp. 17–28. Springer, Heidelberg (1998) 17. Peled, D., Wilke, T.: Stutter-invariant temporal properties are expressible without the next operator. Inform. Process. Lett. 63(5), 243–246 (1997) 18. Peled, D., Wilke, T., Wolper, P.: An algorithmic approach for checking closure properties of temporal logic specifications and ω-regular languages. Theoret. Comput. Sci. 195(2), 183–203 (1998) 19. Rabinovich, A.M.: Expressive completeness of temporal logic of action. In: Brim, L., Gruska, J., Zlatuˇska, J. (eds.) MFCS 1998. LNCS, vol. 1450, pp. 229–238. Springer, Heidelberg (1998) 20. Valmari, A.: A stubborn attack on state explosion. Form. Method. Syst. Des. 1(4), 297–322 (1992) 21. Vardi, M.Y.: From philosophical to industrial logics. In: Ramanujam, R., Sarukkai, S. (eds.) Logic and Its Applications. LNCS, vol. 5378, pp. 89–115. Springer, Heidelberg (2009)
Incremental False Path Elimination for Static Software Analysis Ansgar Fehnker, Ralf Huuck, and Sean Seefried National ICT Australia Ltd. (NICTA) Locked Bag 6016 University of New South Wales Sydney NSW 1466, Australia
Abstract. In this work we introduce a novel approach for removing false positives in static program analysis. We present an incremental algorithm that investigates paths to failure locations with respect to feasibility. The feasibility test it done by interval constraint solving over a semantic abstraction of program paths. Sets of infeasible paths can be ruled out by enriching the analysis incrementally with observers. Much like counterexample guided abstraction refinement for software verification our approach enables to start static program analysis with a coarse syntactic abstraction and use richer semantic information to rule out false positives when necessary and possible. Moreover, we present our implementation in the Goanna static analyzer and compare it to other tools for C/C++ program analysis.
1
Introduction
One technique to find bugs in large industrial software packages is static program analysis. While it has been proven to be scalable and fast, it typically suffers to one degree or another from potential false positives. The main reason is that unlike software model checking, static program analysis typically works on a rather abstract level, such as control flow graphs (CFG) without any data abstraction. Therefore, a syntactic model of a program is a coarse abstraction, and reported error paths can be spurious, i.e. they may not correspond to an actual run in the concrete program. In this paper we present an incremental algorithm that automatically investigates error paths and checks if they are infeasible. To do so, semantic information in the form of interval equations is automatically added to a previously purely syntactic model. This semantic information is only incorporated once a potential bug has been detected, i.e., a counterexample generated. While this ensures scalability and speed for the bug-free parts of a program, it allows to drill down further once a bug has been found.
NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
Z. Liu and A.P. Ravn (Eds.): ATVA 2009, LNCS 5799, pp. 255–270, 2009. c Springer-Verlag Berlin Heidelberg 2009
256
A. Fehnker, R. Huuck, and S. Seefried
The feasibility of a counterexample is analyzed using constraint solving over the interval domain, i.e., the possible ranges of all variables within a given path. A counterexample path is spurious if the least solution of a corresponding equation system is empty. The subset of equations responsible for an empty solution is called a conflict. In a second step we refine our syntactic model by a finite observer which excludes all paths that generate the same conflict. These steps are applied iteratively, until either no new counterexample can be found, or until a counterexample is found that cannot be proven to be spurious. This approach is obviously inspired by counterexample guided abstraction refinement (CEGAR), as used in [1,2]. A main difference is the use of the precise least solution for interval equations [3] in the place of SAT-solving. This technique deals directly with loops, without the need to discover additional loop-predicates [2,4], or successive unrolling of the transition relation [5]. An alternative to SAT-based CEGAR in the context of static analysis, using polyhedral approximations, was proposed in [6,7]. Pre-image computations along the counterexamples are used to improve the accuracy of the polyhedral approximation. Our approach in contrast uses a precise least solution of an interval equation system, which is computationally faster, at the expense of precision. Our proposed approach combines nicely with the static analysis approach put forward in [8]. It defines, in contrast to semantic software model checking, syntactic properties on a syntactic abstraction of the program. The experimental results confirm that the proposed technique situates our tool in-between software model checking and static analysis tools. False positives are effectively reduced while interval solving converges quickly, even in the presence of loops. The next section introduces preliminaries of labeled transition systems, interval solving and model checking for static properties. Section 3 introduces Interval Automata, the model we use to capture the program semantics. Section 4 presents the details of our approach. Implementation details and a comparison with other tools are given in Section 5.
2 2.1
Basic Definitions and Concepts Labeled Transition Systems
This paper uses labeled transition systems (LTS) to describe the semantics of our abstract programs. An LTS is defined by (S, S0 , A, R, F ) where S is a set of states, S0 ⊆ S is a sub-set of initial states, A is a set of actions and R ⊆ S ×A×S is a transition relation where each transition is labeled with an action a ∈ A, and F ⊆ S is a set of final states. An LTS is deterministic if for every state s ∈ S and action a ∈ A there is at most one successor state such that (s, a, s ) ∈ R. The finite sequence ρ = s0 a0 s1 a1 . . . an−1 sn is an execution of an LTS P = (S, S0 , A, R, F ), if s0 ∈ So and (si , ai , si+1 ) ∈ R for all i ≥ 0. An execution is accepting if sn ∈ F . We say w = a0 . . . an−1 ∈ A∗ is a word in P , if there exist si , i ≥ 0, such that s0 a0 s1 a1 . . . an−1 sn form an execution in P . The language of P is defined by the set of all words for which there exists an accepting execution. We denote this language as LP .
Incremental False Path Elimination for Static Software Analysis
257
The product of two labeled transition systems P1 = (S1 , S10 , A, R1 , F1 ) and P2 = (S2 , S20 , A, R2 , F2 ), denoted as P× = P1 × P2 , is defined as P× = (S1 × S2 , S10 × S10 , A, R× , F1 × F2 ) where ((s1 , s2 ), a, (s1 , s2 )) ∈ R× if and only if (s1 , a, s1 ) ∈ R1 and (s2 , a, s2 ) ∈ R2 . The language of P× is the intersection of the language defined by P1 and P2 . 2.2
Interval Equation Systems
We define an interval lattice I = (I, ⊆) by the set I = {∅} ∪ {[z1 , z2 ]|z1 ∈ Z∪{−∞}, z2 ∈ Z∪{∞}, z1 ≤ z2 } with the partial order implied by the contained in relation ”⊆” , where a non-empty interval [a, b] is contained in [c, d], if a ≥ c and b ≤ d. The empty element is the bottom element of this lattice, and [−∞, +∞] the top element. Moreover, we consider the following operators on intervals: addition (+), multiplication (·), union , and intersection with the usual semantics [[.]] defined on them. For a given finite set of variables X = {x0 , . . . , xn } over I we define an interval expression φ as follows: . φ =a | x| φ φ| φ φ | φ+φ| φ·φ where x ∈ X, and a ∈ I. The set of all expression over X is denoted as C(X). For all operation we have that [[φ ◦ ϕ]] is [[φ]] ◦ [[ϕ]], where ◦ can be any of , , +, ·. A valuation is a mapping v : X → I from an interval variable to an interval. Given an interval expression φ ∈ C(X), and a valuation v, the [[φ]]v denoted the expression φ evaluated in v, i.e. it is defined to be the interval [[φ[v(x0 )/x0 , . . . , v(xn )/xn ]]], which is obtained by substituting each variable xi with the corresponding interval v(xi ). An interval equation system is a mapping IE : X → C(X) from interval variables to interval expressions. We also denote this by xi = φi where i ∈ 1, . . . , n. The solution of such an interval equation system is a valuation satisfying all equations, i.e., [[xi ]] = [[φi ]]v for all i ∈ 1, . . . , n. As shown in [3] there always is a precise least solution which can be efficiently computed. By precise we mean precise with respect to the interval operators’s semantics and without the use of additional widening techniques. Of course, from a program analysis point of view over-approximations are introduced, e.g., when joining two intervals [1, 2] [4, 5] results in [1, 5]. This, however, is due to the domain we have chosen. 2.3
Static Analysis by Model Checking
This work is based on an automata based static analysis framework as described in [8], which is related to [9,10,11]. The basic idea is to map a C/C++ program to its CFG, and to label it with occurrences of syntactic constructs of interest. The CFG together with the labels are mapped to, either a Kripke structure, or to the input language of a model checker, in our case NuSMV. A simple example of this approach is shown in Fig. 1. Consider the contrived program foo which is allocating some memory, copying it a number of times
258 1
A. Fehnker, R. Huuck, and S. Seefried
void foo() {
2
int x, *a;
3
int *p=malloc(sizeof(int));
4
for(x = 10; x > 0; x--) {
5
a = p;
6
if(x == 1)
7
free(p)
8 9
l0 l1 mallocp l2 l7 l3 usedp
f reep
l5
}
l4 l6
} Fig. 1. Example program and labeled CFG for use-after-free check
to a, and freeing the memory in the last loop iteration. In our automata based approach we syntactically identify program locations that allocate, use, and free resource p. We automatically label the program’s CFG with this information as shown on the right hand side of Fig. 1. To check whether after freeing some allocated resource, it is not used afterwards, we can check the CTL property: AG (mallocp ⇒ AG (f reep ⇒ ¬EF usedp )), which means that whenever there is free after malloc for a resource p, there is no path such that p is used later on. Obviously, neglecting any further semantic information will lead to an alarm in this example, which is a false positive as p only gets freed in the last loop iteration.
3
Interval Automata
This section introduces interval automata (IA) which abstract programs and capture their operational semantics on the domain of intervals. We define an IA as an extended state machine where the control structure is a finite state machine, extended by a mapping from interval variables to interval expressions. We will show later how to translate a C/C++ program to an IA. Definition 1 (Syntax). An interval automaton is a tuple (L, l0 , X, E, update), with – – – – –
a finite set of locations L, an initial location l0 , a set of interval variables X, a finite set of edges E ⊆ L × L, and an effect function update : E → (X × C(X)).
The effect update assigns to each edge a pair of an interval variable and an interval expression. We will refer to the (left-hand side) variable part update|X
Incremental False Path Elimination for Static Software Analysis
259
as lhs, and to the (right-hand side) expression part update|C(X) as rhsexpr. The set of all variables that appear in rhsexpr will be denoted by rhsvars. Note, that only one variable is updated on each edge. This restriction is made for the sake of simplicity, but does not restrict the expressivity of an IA. Definition 2 (Semantics). The semantics of an IA P = (L, l0 , X, E, update) are defined by a labeled transition system LT S(P ) = (S, S0 , A, R, F ), such that – the set of states S contains all states (l, v) with location l, and an interval valuation v. – the set of initial states S0 contains all states s0 = (l0 , v0 ), with v0 ≡ [−∞, ∞]. – the alphabet A is the set of edges E. – the transition relation R ⊆ S × A × S contains a triple ((l, v), (l, l ), (l , v )), i.e a transition from state (l, v) to (l , v ) labeled (l, l ), if there exists a (l, l ) in E, such that v = v[lhs(e) ← [[rhsexpr(e)]]v ] and [[rhsexpr(e)]]v = ∅. – the set of final states F = S, i.e. all states are final states It might seem a bit awkward that the transitions in the LTS are labeled with the edges of the IA, but this will be used later to define the synchronous composition with an observer. Since each transition is labeled with its corresponding edge we obtain a deterministic system, i.e., for a given word there exists only one possible run. We identify a word ((l0 , l1 ), (l1 , l2 ), . . . , (lm−1 , lm )) in the remainder by the sequence of locations (l0 , . . . , lm ). Given an IA P . Its language LP contains all sequences (l0 , . . . , ln ) which satisfy the following: l0 = l 0
(1)
∧ ∀i = 0, . . . , n − 1. (li , li+1 ) ∈ E
(2)
∧ v0 ≡ [−∞, +∞] ∧ ∃v1 , . . . , vn .([[rhsexpr(li , li+1 )]]vi =∅ ∧ vi+1 = vi [lhs(li , li+1 ) ← [[rhsexpr(li , li+1 )]]vi ])
(3)
This mean that a word (1) starts in the initial location, (2) respects the edge relation E, and (3) that there exists a sequence of non-empty valuations that satisfies the updates associated with the edges. We use this characterization of words as satisfiability problem to generate systems of interval equations that have a non-empty solution only if a sequence (l0 , . . . , ln ) is a word. We will define for a given IA P and sequence w a conflict as an interval equation system with an empty least solution which proves that w cannot be a word of the IA P .
4
Path Reduction
The labeled CFG as defined in Section 2.3 is a coarse abstraction of the actual program. Like most static analysis techniques this approach suffers from false positives. In the context of this paper we define a property as a regular language, and satisfaction of a property as language inclusion. The program itself will be defined by an Interval Automaton P and its behavior is defined by the language
260
A. Fehnker, R. Huuck, and S. Seefried
of the corresponding LT S(P ). Since interval automata are infinite state systems, we do not check the IA itself but an abstraction Pˆ . This abstraction is initially an annotated CFG as depicted in Fig. 1. A positive is a word in the abstraction Pˆ that does not satisfy the property. A false positive is a positive that is not in the actual behavior of the program, i.e. it is not in the language of the LT S(P ). Path reduction is then defined as the iterative process that restricts the language of the abstraction, until either a true positive has been found, or until the reduced language satisfies the property. 4.1
Path Reduction Loop
Given an IA P = (L, l0 , E, X, update) we define its finite abstraction Pˆ as follows: Pˆ = (L, l0 , E, E , L) is a labeled transition system with states L, initial state l0 , alphabet E, transition relation E = {(l, (l, l), l )|(l, l ) ∈ E}, and the entire set L as final states. The LTS Pˆ is an abstraction of the LT S(P ), and it represents the finite control structure of P . The language of Pˆ will be denoted by LPˆ . Each word of Pˆ is by construction a word of LT S(P ). Let Lφ be the language of the specification. We assume to have a procedure that checks if the language of LTS LPˆ is a subset of Lφ , and produces a counterexample if this is not the case (cf. Section 4.5). If it finds a word in LPˆ that is not in Lφ , we have to check whether this word is in LP , i.e. we have to check whether it satisfies equation (1) to (3). Every word w = (l0 , . . . , lm ) in LPˆ satisfies by construction (1) and (2). If a word w = (l0 , . . . , lm ) can be found such that there exists no solution for (3), it cannot be a word of LP . In this case we call the word spurious. If we assume in addition to have a procedure to check whether a word is spurious (cf. Section 4.2), we can use these in an iterative loop to check if the infinite LTS of IA P satisfies the property, by checking a finite product of abstraction Pˆ with observers instead as follows: 1. Let Pˆ0 := Pˆ , and i = 0. 2. Check if w ∈ LPˆi \ Lφ exists. If such a w exists got to step 3, otherwise exit with “property satisfied”. 3. Check if w ∈ LP . If w ∈ LP build observer Obsw , otherwise exit with “property not satisfied”. The observer (1) accepts w, and (2) satisfies that all accepted words w are not in LP . C ˆ 4. Let Pˆi+1 := Pˆi × ObsC w , with . Pi × Obcw is the synchronous composition of ˆ Pi and the complement of Obsw . Increment i and goto step 3. The remainder of this section explains how to check if a word is in LP , how to build a suitable observer, and how to combine it in a framework which uses NuSMV to model check the finite abstraction Pˆi . Example. The initial coarse abstraction as CFG in Fig. 1 loses the information that p cannot be used after it was freed. The shortest counterexample in Fig. 1 initializes x to 10 in line 4, enters the for-loop, takes the if-branch, frees p in line 7, decrements x, returns to the beginning of the for-loop, and then uses
Incremental False Path Elimination for Static Software Analysis
1
void foo() {
2
int x, *a;
3
int* p=malloc(sizeof(int));
4
for(x = 10; x > 0; x--) { a = p;
6
if(x == 1)
7
free(p)
8 9
l1
x = [10, 10]
x = x [−∞, 0] l
2
l7 l3 x = x [1, 1]
} }
p = [1, ∞]
l4
l5 p = [−∞, ∞]
l6
x = x [1, ∞] a = p x = (x [−∞, 0]) (x [2, ∞])
x = x + [−1, −1]
5
l0
261
Fig. 2. Abstraction of the program as IA. Analysis of the syntactic properties of the annotated CFG in Fig.1 is combined with an analysis of the IA to the right.
p in line 5. This counterexample is obviously spurious. Firstly, because the ifbranch with condition x == 1 at line 7 is not reachable while x = 10. Secondly, because if the programm could enter the if-loop, it would imply x == 1, and hence impossible to reenter the loop, given decrement x-- and condition x > 0. 4.2
Checking for Spurious Words
Every word w = (l0 , . . . , lm ) in LPˆ satisfies by construction (1) and (2). It remains to be checked if condition (3) can be satisfied. A straightforward approach is to execute the trace on LT S(P ). However this can only determine if that particular word is spurious. Our proposed approach builds an equation system instead, which allows us to find a set of conflicting interval equations, which can in turn be used to show that an entire class of words is spurious. Another straightforward approach to build such an equation system is to introduce for each variable and edge in w an interval equation, and to use an interval solver to check if a solution to this system of interval equations exists. A drawback of this approach is that it introduces (m + 1) × n variables, and m × n equations. In the following we present an approach to construct an equation system with at most one equation and one variable for each edge in w. Interval Equation System for a Sequence of Transitions. We describe how to obtain an equation system for a word w ∈ LPˆ , such that it has a nonempty least solution only if w ∈ LP . This system is generated in three steps: I. Tracking variables. Let XL be a set of fresh variables xl . For each update of a variable, i.e. x = lhs(li−1 , li ) we introduce a fresh variable xli . We use the notation x(i) to refer to the last update of x before the i-th transition in w = (l0 , . . . , lm ). If no such update exist x(i) will be xl0 .
A. Fehnker, R. Huuck, and S. Seefried
var
exprw (i)
(l0 , l1 ) (l1 , l2 ) (l2 , l3 ) (l3 , l4 ) (l4 , l5 ) (l5 , l6 ) (l6 , l2 ) (l2 , l3 ) (l3 , l4 )
pl1 x l2 x l3 al4 x l5 pl6 x l2 x l3 al4
[1, ∞] [10, 10] xl2 [1, ∞] pl1 xl3 [1, 1] [−∞, ∞] xl5 + [−1, −1] xl2 [1, ∞] pl6
IE w ⎫ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ pl1 = ⎪ ⎪ ⎪ ⎪ x l2 = ⎪ ⎪ ⎬ x l3 = ⎪ ⎪ ⎪ ⎪ al4 = ⎪ ⎪ ⎪ ⎪ x l5 = ⎪ ⎪ ⎪ pl6 = ⎪ ⎪ ⎪ ⎪ ⎭
Reduced conflicts ⎫ ⎪ ⎪ ⎪ ⎬ xl2 [1, ∞] xl3 [10, 10] ⎪ ⎪ xl (xl5 + [−1, −1]) ⎪ ⎭ 5 ⎫ (xl2 [1, ∞]) ⎪ ⎪ pl1 pl6 ⎪ ⎬ xl5 xl3 [1, 1] xl2 ⎪ [−∞, ∞] ⎪ ⎪ xl3 ⎭
= [10, 10] = xl2 [1, ∞] = xl3 [1, 1]
conflict 1
w
= [−∞, ∞] [1, 1] = xl5 + [−1, −1] = xl2 [1, ∞]
conflict 2
262
Fig. 3. Equations for counterexample w of the IA depicted in Fig. 2
II. Generating equations. For each edge in w we generate an interval expression over XL . We define exprw : {0, . . . , m} → C(XL ) as follows: exprw (i) → rhsexpr(li−1 , li )[x(i−1) /x]x∈rhsvars(li−1 ,li )
(4)
An expression exprw (i) is the right-hand side expression of the update on (li−1 , li ), where all occuring variables are substituted by variables in XL . III. Generating equation system. Locations may occur more than once in a word, and variables maybe updated by multiple edges. Let writesw ⊆ XL the set {xl |∃i s.t.x = lhs(li−1 , li ), and Ωw be a mapping each xl ∈ writesw the set {i|x = lhs(li−1 , li )∧li = l}. The system IE w : XL → C(XL ) is defined as follows: xl →
i∈Ωw (xl )
[−∞, ∞]
exprw (i)
if xl ∈ writesw otherwise
(5)
System IE w assigns each variable xl ∈ writesw to a union of expressions; one expression for each element in Ωw (x, l). Example. Fig. 3 depicts for word w = (l0 , l1 , , l2 , l3 , l4 , l5 , l6 , l2 , l3 , l4 ) how to generate IE w . The first column gives the transitions in w. The second column gives the variable in writesw . The variable pl1 , for example, refers to the update of p on the first transition (l0 , l1 ) of the IA in Fig. 2. We have that x(5) = xl3 , since the last update of x before (l4 , l5 ) was on edge (l2 , l3 ). The third column gives the equations exprw (i). For example, the right-hand side rhsexpr(l4 , l5 ) is x = x [1, 1]. Since x(4) = xl3 , we get that exprw (5) is xl3 [1, 1]. The fourth column shows the equation system IE w derived from the equations. We have, for example, that x is updated on (l1 , l2 ), the second edge in w, and (l6 , l2 ), the 8th edge. Hence, Ωw (xl2 ) = {2, 8}. Equation IE w (xl2 ) is then defined as the union of exprw (2), which is [10, 10], and exprw (8), which is xl5 + [−1, −1]. The least solution of the equation system IE w is pl1 = [1, ∞], xl2 = [10, 10], xl3 = [10, 10], al4 = [−∞, ∞], xl5 = ∅, and pl6 = [−∞, ∞]. Since xl5 = ∅, there exists no solution, and w is spurious.
Incremental False Path Elimination for Static Software Analysis
263
Lemma 1. Given a word w ∈ LPˆ . Then there exist a sequence of non-empty valuations v1 , . . . , vm such that (3) holds for w only if IE w has a non-empty least solution. Proof. Given solution to (3) we can construct non-empty solution of IE w , which must be included in the least solution of IE w . An advantage of creating interval equations as described is that reason naturally over loops. A third or fourth repetition does not introduce new variable in writesw , and neither new expressions. This means that the equation system for a word, which is a concatenation αβββγ, has no non-empty solution, if the concatenation αββγ has. The least solution will be the same. 4.3
Conflict Discovery
The previous subsection described how to check if a given word w ∈ LPˆ is spurious. Interval solving, however, leads to an over-approximation, mostly due to the -operation. This subsection describes how to reduce the numbers of (nontrivial) equations in a conflict and at the same time the over-approximation error, by restricting conflicts to fragments and cone-of influence. For CEGAR approaches for infinite-state systems it has been observed that it is sufficient and often more efficient to find a spurious fragments of a counterexample, rather than a spurious counterexample [12,13]. The effect is similar to small or minimal predicates in SAT-based approaches. The difference between a word and a fragment in our context is that a fragment may start in any location. Given a word w = (l0 , . . . , lm ) a fragment w = (l0 , . . . , lm ) is a subsequence of w. For fragment w of a word w we can show the analog of Lemma 1. If the least solution of IE w is empty, i.e. if the fragment w is spurious, then the word w is spurious. If there exists a sequence of non-empty valuations v1 , . . . , vm for w, then they also define a non-empty subsequence of valuations for w , and these have to be contained in the least solution of IE w . A word of length m − 1 has m2 /2 fragments. Rather than checking all of these we can rule out most of these based on two observations. To be useful at least the last edge of a fragment has to result in an empty solution. An update in (3) can only result in an empty solution, if there exist an element in I that is mapped to the empty set by update. We call such updates restricting. An example of such updates are intersections with constants such as x = x [1, ∞]. We can omit all (li−1 , li ) after the last last restricting update in w . Similarly, we find that edges with updates that map valuation v ≡ [−∞, ∞] to [−∞, ∞], can be omitted from the beginning of a fragment. Given thus reduced fragment w , we can reduce IE w further, by omitting all equations that are not in the cone-of-influence of the last restricting update. We refer to the resulting system of equations as IE w . In the remainder we will refer to IE w as the reduced conflict, which is uniquely determined by the fragment w . The reduction to the cone-of-influence, guarantees that IE w has an empty least solution if IE w has. The converse is not true. However, since we consider all possible fragments, it is guaranteed that if a set of removed equations result in an conflict, these will appear as reduced conflict of another fragment.
264
A. Fehnker, R. Huuck, and S. Seefried
Example. There are two conflicts among the candidate fragments in Fig. 3. Conflict 1, for fragment (l1 , l2 , l3 , l4 , l5 ), has as least solution xl2 = [10, 10], xl3 = [10, 10], xl5 = ∅. Conflict 2, for fragment (l4 , l5 , l6 , l2 , l3 ), has as least solution xl5 = [1, 1], xl2 = [0, 0], xl3 = ∅. The equation was pl6 was not included in the second conflict, as it is not in the cone-of-influence of xl3 . Note, that variable x in equation IE (xl5 ) was substituted by [−∞, ∞]. This expression was generated by the first edge (l4 , l5 ) of the fragment (l4 , l5 , l6 , l2 , l3 ). The last update before this edge is outside of the scope of the fragment, and hence x was assumed to be [−∞, ∞], which is a conservative over-approximation. 4.4
Conflict Observer
Given a reduced conflict IE w for a fragment w = (l0 , . . . , lm ), we construct an observer such that if a word w ∈ LPˆ is accepted, then w ∈ / LP . The observer is a LTS over the same alphabet E as LT S(P ) and Pˆ . Definition 3. Given IA P = (L, l0 , E, X, update) and reduced conflict IE w for a fragment w = (l0 , . . . , lm ), define Xw as the set of all variables x ∈ X such that x = lhs(li−1 , li ), for some edge (li−1 , li ) in w. Observer Obsw is a LTS with – set SObs of states (current, eqn, conflict) with valuation current : Xw → L, valuation eqn : writesw → {unsat, sat} , and location conflict ∈ {all, some, none}. – initial state (write0 , eqn0 , loc0 ) with current0 ≡ l0 , eqn0 ≡ unsat, and conflict0 = none. – alphabet E – transition relation SObs × E × SObs . – a set final states F . A state is final if conflict = all. Due to the limited space we give an informal definition of the transition relation. The detailed formal definition can be found on [14]. – Variable conflict changes its state based on the next state of eqn. We have that conflict (xl ) is all, some, none, if eqn (xl ) = sat for all, some or no xl ∈ writesw , respectively. Once conflict is in all it remains there forever. – Variable eqn represents IE w (xl ) for xl ∈ writesw . The successor eqn (xl ) will change if x = lhs(λ, λ ) and λ = l. If substituting variables x by xcurrent(x) in rhserpr(λ, λ ), results in an expression that does not appear in IE w (xl ), for any xl , then eqn (xl ) = unsat for all xl . Otherwise, the successor eqn (xl ) is sat for matching variables xl , and remains unchanged for all others. This is a simple syntactic match, achieved by comparing the indices and variables that appear in IE w (xl ) with the values of current(x). – Variable current is used to record for each variable the location of the last update. If conflict’ is none, then current’(x) = l0 . Otherwise, if (λ, λ ) is in w, and x = lhs(λ, λ ), then current’(x) = λ . Otherwise, it remains unchanged. The interaction between current, eqn, and conflict is somewhat subtle. The idea is that the observer is initially in conflict = none. If an edge is observed such
Incremental False Path Elimination for Static Software Analysis
265
which generates an expression expr(i) that appears in IE w (xl ), with i ∈ Ωw (xl ) (see Eq. (5)), then conflict = some, and the corresponding eqn(xl ) = sat. It can be deduced IE w (xl ) is satisfied, unless another expression is encountered that might enlarge the fix-point. This is the case when an expression for xl will generated, that does not appear in IE w (xl ). It is conservatively assumed that this expression increases the fixpoint solution. If conflict = all it can be deduced that the observed edges produce an equation system IE w that has a non-empty solution only if IE w has a non-empty solution. And from the assumption we know that IE w has an empty-solution, and thus also IE w . Which implies that the currently observed run is infeasible and cannot satisfy Eq. 3. Given a reduced conflict IE w for fragment w we have two properties. – If a word w ∈ LPˆ is accepted by Obsw , then w ∈ LP . – Given a fragment w such that IE w = IE w , and a word w such that w is a subsequence, then w is accepted by Obsw . The first property states that the language of Pˆ = Pˆ × ObsC w contains LP . The second ensures that each observer is only constructed once. This is needed to ensure termination. Each observer is uniquely determined by the finite set expressions that appear in it, and since XL and E are finite, there exists only a finite set of possible expressions that may appear in (4). Consequently, there can only exist a finite set of conflicts as defined by (5). Example. The observer for the first conflict in Fig. 3 accepts a word if a fragment generates conflict xl2 → [10, 10], xl3 → xl2 [1, ∞], xl5 → xl3 [1, 1]. This is the case if it observes edge (l1 , l2 ), edge (l2 , l3 ) with a last write to x at l2 , and edge (l4 , l5 ) with a last write to x at l3 . All other edges are irrelevant, as long as they do not write to x2 , x3 or x5 , and change the solution. This would be, for example, the case for (l2 , l3 ) if current(x) = l2 . This would create an expression different from xl2 [1, ∞], and thus potentially enlarge the solution set. The observer for the other conflicts is constructed similarly. The complement of these observers are obtained by labeling all states in S \F as final. The product of the complemented observers with the annotated CFG in Fig.1 removes all potential counterexamples. The observer for the first conflict prunes all runs that enter the for-loop once, and then immediately enter the if-branch. The observer for the second conflict prunes all words that enter the if-branch and return into the loop. 4.5
Path Reduction with NuSMV
The previous subsections assumed a checker for language inclusion for LTS. In practice we use however the CTL model checker NuSMV. The product of Pˆ with the complement of the observers is by construction proper a abstractions of LT S(P ). The results language inclusion would therefore extend also to invariant checking, path inclusion, LTL and ACTL model checking. For pragmatic reasons we do not restrict ourselves to any of these, but use full CTL and LTL1 . 1
NuSMV 2.x supports LTL as well.
266
A. Fehnker, R. Huuck, and S. Seefried
Whenever NuSMV produces a counterexample path, we use interval solving as described before to determine if this path is spurious. Note, that path reduction can also be used to check witnesses, for example for reachability properties. In this case path reduction will check if a property which is true on the level of abstraction is indeed satisfied by the program. The abstraction Pˆ and the observers are composed synchronously in NuSMV. The observer synchronizes on the current and next location of Pˆ . The property is defined as a CTL property of Pˆ . The acceptance condition of the complements of observers is modeled as LTL fairness condition G¬(conflict = all).
5 5.1
Implementation and Experiments C to Interval Equations
This section describes how to abstract a C/C++ program to a set of interval equations, and covers briefly expression statements, condition statements as well as the control structures. Expressions statements involving simple operations such as addition and multiplication are directly mapped to interval equations. E.g., an expression statement x=(x+y)*5 is represented as xi+1 = (xi + yi ) ∗ [5, 5]. Subtraction such as x = x − y can be easily expressed as xi+1 = xi + ([−1, −1] ∗ yi ). Condition statements occur in constructs such as if-then-else, for-loops, whileloops etc. For every condition such as x 0, hence ci ∈ αY (I). Likewise ci+1 ∈ αY (I). Although αY is partial, it can be used to abstract arbitrary TVPI systems: Definition 12. The abstraction map α : ℘f (TVPI) → ℘f (Log) is given by α(I) = ∪{αY (∃Y.I) | Y ⊆ X ∧ |Y | = 2} 6
pppp !p
ci+1 := −x + 2y ≤ 1 !! ! q! ci+1 := −x + (5/2)y ≤ 2
c i := −x + (3/2)y ≤ 0 ci := −x + 2y ≤ 1 q c := −x ≤ 0 ci := −x + y ≤ 0 pppp i−1 p
Fig. 3. Approximation with logahedral constraints
316
J.M. Howe and A. King
In the above ∃Y.I denotes project I onto Y , that is, the repeated application of Fourier-Motzkin to elimination of all variables x ∈ X \ Y from I. However, if I is complete then α(I) = ∪{πY (I) | Y ⊆ X ∧ |Y | = 2}. Interestingly, α can be immediately lifted to α : ℘f (Poly) → ℘f (Log) since if I is a finite set then ∃Y.I is a finite set. The symbol α hints at the existence of a Galois connection between ℘f (Poly), |= and ℘f (Log), |= , and indeed α is monotonic. But such a structure can only be obtained by quotienting ℘f (Poly) and ℘f (Log) by ≡ to obtain posets. The upper adjoint of α is the identity. To abstract the inequalities I or I ∪ Boxk to Logk put ci := sign(bi )y ≤ sign(bi )φi−1 and ci := ai x + sign(bi )2k y ≤ ai ψi + sign(bi )2k φi if ai = sign(bi ) and k < lg(|bi |). Likewise, if lg(|bi |) < −k then put ci := ai x + sign(bi )2−k y ≤ ai ψi−1 + sign(bi )2−k φi−1 and ci := ai x ≤ ai ψi . Similarly for ai = sign(bi ). 4.5
Meet and Abstraction
Meet over ℘f (Log), |= can be defined by [I1 ]≡ [I2 ]≡ = [I1 ∪ I2 ]≡ . Thus, meet reduces to set union when a class [I]≡ is represented by a set I. Henceforth quotienting will be omitted, for brevity, and to reflect implementation. A less trivial problem is that of computing α(I1 ∪ I2 ) where I1 ∈ ℘f (Log) and I2 ∈ ℘f (Poly). This arises as a special case when a statement is encountered, such as a loop condition x ≤ y+z or an assignment x = y+z, that cannot be expressed logahedrally. If these statements are described respectively by I2 = {x ≤ y + z} and I2 = {y +z ≤ x, x ≤ y +z}, and I1 is a logahedral description of the program state, then the subsequent state is described by α(I1 ∪ I2 ). Thus, in an analysis, α(I1 ∪ I2 ) performs the role of meet when I2 is not logahedral. This problem has been tackled linear programming [15]. To illustrate, using n suppose I2 = {c } where c := k=1 ak xk ≤ d . Let i, j ∈ [1, n] where i = j and minimise k =i,k =j ak xk subject to I1 and c . If a minimum di,j exists then it follows I1 ∪ {c } |= ai xi + aj xj ≤ d − di,j . TVPI inequalities found this way are added to I1 . An alternative and potentially more precise approach, which is applicable to both TVPI and logahedra, is based on extending the resultant operator to polyhedral inequalities in the following fashion: n Definition 13. If c := ai xi + aj xj ≤ d ∈ TVPI, c := k=1 ak xk ≤ d ∈ Poly, aj =0 and ai .ai < 0 then result(c, c , xi ) = (|ai |aj + |ai |aj )xj + |ai | k ∈{i,j} ak xk ≤ |ai |d + |ai |d otherwise result(c, c , xi ) = ⊥. The aj = 0 condition ensures |vars(result(c, c , xi ))| < |vars(c )|. Thus if c is ternary then result(c, c , xi ) is either binary or undefined. The partial result map can be lifted to a total map result(I1 , I2 ) for I1 ⊆ TVPI and I2 ⊆ Poly by analogy with definition 9. With this in place, an operator extend(I1 , I2 ) is introduced, designed so that α(I1 ∪ I2 ) |= extend(I1 , I2 ). Definition 14. The map extend : ℘f (TVPI) × ℘f (Poly) → ℘f (TVPI) is defined extend(I, δ) = ∪i≥0 Ii where I0 = complete(I ∪ (δ ∩ TVPI)), δ0 = δ \ TVPI, Ri+1 = result(Ii , δi ), Ii+1 = complete(Ii ∪ (Ri+1 ∩ TVPI)) and δi+1 = Ri+1 \ Ii+1 .
Logahedra: A New Weakly Relational Domain
317
If δ is comprised of ternary constraints, which is the dominating case [15], then result is applied once. If desired, incremental closure can be used to compute complete(Ii ∪ (Ri+1 ∩ TVPI)), since proposition 2 extends to TVPI [8]. If extend(I, δ) is not logahedral, then α must subsequently be applied. Example 5. Consider augmenting I = {x − y ≤ 0, −x + y ≤ 0} with c := x−2y +z ≤ 0. Then I0 = I and δ0 = {c }. Thus R1 = result(I0 , δ0 ) = {y +z ≤ 0} whence δ1 = ∅ and I1 = complete(I0 ∪ R1 ) = I ∪ {y + z ≤ 0, x + z ≤ 0}. These inequalities cannot be inferred with the linear programming technique of [15]. Example 6. Let I = {w − x ≤ 0, y − z ≤ 0} and c := w + x + y + z ≤ 1. Then I0 = complete(I) = I and δ0 = {c }. Thus R1 = result(I0 , δ0 ) = {2w + y + z ≤ 1, x + w + 2y ≤ 1}, I1 = I0 and δ1 = R1 . Next R2 = result(I2 , δ1 ) = {2w + 2y ≤ 1}, I2 = complete(I1 ∪ R2 ) = I1 ∪ R2 = I ∪ {w + y ≤ 1/2} and δ2 = ∅. 4.6
Join
Join is required to merge abstractions on different paths, as is shown with P5 in section 3. If quotienting is omitted, then join over ℘f (Log) can be defined I1 I2 = ∪{∃Y.I1 Y ∃Y.I2 | Y ⊆ X ∧|Y | = 2} where J1 Y J2 = αY (J1 ∨J2 ) and ∨ is join (planar convex hull) for TVPI [15]. Join also benefits from completeness since if Ii is complete then ∃Y.Ii = πY (Ii ). The TVPI operation ∨ is O(n lg n) where n = |J1 | + |J2 |, hence the overall cost of is dominated by completion. With and thus defined, ℘f (Log), |=, , forms a lattice. Furthermore, Logk has additional structure since Ck is finite (see definition 5). In particular, if I ⊆ Logk then there exists a finite K ⊆ I such that I ≡ K. As a consequence ℘f (Logk ), |=, , is a complete lattice. Widening [6] is often applied with join in order to enforce termination. As has been explained elsewhere [11,15], care must be taken not to reintroduce inequalities through completion that have been deliberately discarded in widening.
5
Logahedra versus Octagons and TVPI
Logahedra are theoretically well motivated and their representational advantages, as a generalisation of octagons, are of interest. To provide preliminary data on the power of bounded logahedra two sets of experiments were performed. In the first experiment sets of integer points, { x, y | x, y ∈ [−32, 32]} were randomly selected. For each set, between 1 and 63 points were generated. The best Oct = Log0 and Logk (for k ∈ [1, 5]) abstractions were computed and compared with the best TVPI abstraction. The comparison is based on the number of integer points in the abstractions. For example, one set of 23 points had 7 extreme points. The Oct, the five Log and the TVPI descriptions were satisfied by 3221, 3027, 2881, 2843, 2835, 2819, 2701 integer points. Thus the precision loss incurred by Oct over TVPI is (3321−2701)/2701 = 0.230. Likewise the losses for Log relative to TVPI are 0.121, 0.067, 0.053, 0.050 and 0.044. This was done
Table 1. Precision comparison of Oct and Logk against TVPI on random data

vertices    sets      Oct      k=1      k=2      k=3      k=4      k=5
    1       2044    0.000    0.000    0.000    0.000    0.000    0.000
    2       2958   82.640   43.941   30.675   26.396   25.024   24.481
    3       5923    1.874    0.998    0.700    0.611    0.584    0.576
    4      10423    0.557    0.294    0.195    0.163    0.153    0.149
    5      14217    0.352    0.192    0.125    0.100    0.092    0.089
    6      13550    0.276    0.152    0.097    0.075    0.067    0.064
    7       9058    0.234    0.131    0.081    0.061    0.054    0.051
    8       4345    0.205    0.115    0.071    0.053    0.046    0.043
    9       1508    0.188    0.105    0.064    0.047    0.040    0.038
   10        398    0.171    0.096    0.058    0.042    0.035    0.033
   11         64    0.165    0.095    0.054    0.037    0.031    0.029
   12          6    0.179    0.107    0.060    0.045    0.038    0.036

(the columns k=1 to k=5 are the Logk abstractions)
for 64K sets and the results are summarised in Table 1. The table presents the mean precision loss where the sets are grouped by their number of vertices. The data reveals that describing exactly two points is by far the most inaccurate scenario for Oct and Logk; if the angle between the points is suitably acute then the TVPI constraints are satisfied by just the two points whereas the Oct and Logk constraints can be satisfied by a band of integral points. Precision loss decreases beyond this two point case. Enlarging the sample space does not noticeably affect the relative precision loss other than accentuating the two point case. Observe that the relative precision loss declines as k increases, suggesting that logahedra can offer a significant precision improvement over octagons.
The second set of experiments compares the abstractions of two variable projections of the results of a polyhedral analysis tightened to integer points for a series of benchmarks [1,2,13]. As in the first experiment, comparisons are made between the number of points in the TVPI, Oct and Logk abstractions. Table 2 details the benchmarks, their number of variables, the pair of variables in the projection, the number of integer points in the TVPI abstraction and the number of additional points for the Oct and Logk abstractions. Benchmarks and projections where Oct and TVPI give the same abstraction are omitted from the table. This was the case for the additional 12 programs in the benchmark suite.
The data illustrates the power of octagons, with the majority of two variable projections being octagonal (indeed, many of these were intervals). It also illustrates that there are cases when non-octagonal constraints occur, with (as in the first experiment) Logk precision improving as k increases. Together, the results motivate further study. The random data demonstrates that logahedra have the potential to deliver precision gains over octagons; however, the analysis data adds a note of caution. It is not surprising that many program invariants can be described by intervals, nor that many of the remainder are octagonal. The data (and the example in section 3) shows that the descriptive power of logahedra can improve analysis, leaving open the question of whether
Table 2. Precision comparison of Oct and Logk against TVPI on analysis data

fixpoint             vars  project    Oct    k=1    k=2    k=3    k=4    k=5   TVPI
cars.inv               5   (4, 5)     312    234     78     54     54     54   3640
efm1.inv               6   (3, 6)       1      0      0      0      0      0    128
heap.inv               5   (1, 2)    1056      0      0      0      0      0     33
heap.inv               5   (3, 4)     465      0      0      0      0      0   1055
heap.inv               5   (3, 5)     465      0      0      0      0      0   1055
robot.inv              3   (1, 2)     528      0      0      0      0      0   1089
scheduler-2p.invl1     7   (3, 6)     135    120     90     30     18     18    180
scheduler-2p.invl1     7   (4, 6)     115    100     70     20     12     12    260
scheduler-2p.invl1     7   (6, 7)     135    120     90     30     18     18    180
scheduler-2p.invl2     7   (4, 6)     189    168    126     42     26     26    245
scheduler-2p.invl2     7   (6, 7)      90     80     60     20     12     12    215
scheduler-3p.invl1    10   (4, 9)     264    198     66     45     45     45    390
scheduler-3p.invl1    10   (9, 10)    264    198     66     45     45     45    390
scheduler-3p.invl3    10   (4, 8)     312    234     78     54     54     54    455
scheduler-3p.invl3    10   (8, 9)     144    108     36     24     24     24    405
scheduler-3p.invl3    10   (9, 10)    534    128    128    128    128    128   1725
see-saw.inv            2   (1, 2)     990    231    110    110    110    110   2454
swim-pool-1.inv        9   (4, 6)      62     61     59     55     47     31   4162

(the columns k=1 to k=5 are the Logk abstractions; Oct and Logk entries give points additional to the TVPI count)
the additional cost (a higher constant in the complexity) pays for itself with improved accuracy. The answer to this question is in part application-specific.
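For reference, the precision-loss metric used in both experiments is easily reproduced from point counts; the snippet below recomputes the worked figures quoted for the 23-point example, with 2701 points as the TVPI baseline.

    def precision_loss(abs_points, tvpi_points):
        """Relative precision loss against the TVPI baseline: extra integral
        points admitted by the abstraction, normalised by the TVPI count."""
        return (abs_points - tvpi_points) / tvpi_points

    counts = {'Oct': 3321, 'k=1': 3027, 'k=2': 2881,
              'k=3': 2843, 'k=4': 2835, 'k=5': 2819}
    for name, n in counts.items():
        print(name, round(precision_loss(n, 2701), 3))
    # Oct 0.23, k=1 0.121, k=2 0.067, k=3 0.053, k=4 0.05, k=5 0.044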
6 Conclusion
This paper has introduced logahedra, a new weakly relational abstract domain. A variant of logahedra, bounded logahedra Logk, where k is the maximum exponent, is also introduced. Logahedra lie strictly between octagons and TVPI in terms of expressive power, with octagons forming a special case, Oct = Log0. Bounded logahedra retain the good computational properties of octagons, whilst being less restrictive. The theory for the abstract domain has been given, along with algorithms for each domain operation. Logahedra have the further advantage that their variable coefficients can be represented by their exponents, mitigating the problem of large coefficients that arises when using polyhedra or TVPI. A preliminary investigation into the expressive power of logahedra, plus a worked example, suggests that they can lead to more accurate analysis, but cautions that many invariants can be described using intervals and octagons. Future work will further investigate the application of logahedra in verification.
Acknowledgements. The authors thank Phil Charles, Tim Hopkins, Stefan Kahrs and Axel Simon for useful discussions. The work was supported by EPSRC projects EP/E033105/1 and EP/E034519/1. The authors would like to thank the University of St Andrews and the Royal Society for their generous support.
References
1. Bardin, S., Finkel, A., Leroux, J., Petrucci, L.: FAST: Fast Acceleration of Symbolic Transition Systems. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 118–121. Springer, Heidelberg (2003)
2. Charles, P.J., Howe, J.M., King, A.: Integer Polyhedra for Program Analysis. In: AAIM. LNCS, vol. 5564, pp. 85–99. Springer, Heidelberg (2009)
3. Clarisó, R., Cortadella, J.: The Octahedron Abstract Domain. Science of Computer Programming 64, 115–139 (2007)
4. Cousot, P., Halbwachs, N.: Automatic Discovery of Linear Restraints among Variables of a Program. In: POPL, pp. 84–97. ACM Press, New York (1978)
5. Eisenbrand, F., Laue, S.: A Linear Algorithm for Integer Programming in the Plane. Mathematical Programming 102(2), 249–259 (2005)
6. Halbwachs, N.: Détermination Automatique de Relations Linéaires Vérifiées par les Variables d'un Programme. PhD thesis, Université Scientifique et Médicale de Grenoble (1979)
7. Henzinger, T.A., Ho, P.-H., Wong-Toi, H.: HyTech: A Model Checker for Hybrid Systems. Software Tools for Technology Transfer 1, 110–122 (1997)
8. Howe, J.M., King, A.: Closure Algorithms for Domains with Two Variables per Inequality. Technical Report TR/2009/DOC/01, School of Informatics, City University London (2009), http://www.soi.city.ac.uk/organisation/doc/research/tech_reports/
9. Larsen, K.G., Larsson, F., Pettersson, P., Yi, W.: Efficient Verification of Real-time Systems: Compact Data Structure and State-space Reduction. In: IEEE Real-Time Systems Symposium, pp. 14–24. IEEE Computer Society, Los Alamitos (1997)
10. Logozzo, F., Fähndrich, M.: Pentagons: A Weakly Relational Abstract Domain for the Efficient Validation of Array Accesses. In: ACM Symposium on Applied Computing, pp. 184–188. ACM Press, New York (2008)
11. Miné, A.: The Octagon Abstract Domain. Higher-Order and Symbolic Computation 19(1), 31–100 (2006)
12. Nelson, C.G.: An n lg n Algorithm for the Two-Variable-Per-Constraint Linear Programming Satisfiability Problem. Technical Report STAN-CS-78-689, Stanford University, Computer Science Department (1978)
13. Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Constraint-Based Linear Relations Analysis. In: Giacobazzi, R. (ed.) SAS 2004. LNCS, vol. 3148, pp. 53–68. Springer, Heidelberg (2004)
14. Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Scalable Analysis of Linear Systems Using Mathematical Programming. In: Cousot, R. (ed.) VMCAI 2005. LNCS, vol. 3385, pp. 25–41. Springer, Heidelberg (2005)
15. Simon, A.: Value-Range Analysis of C Programs: Towards Proving the Absence of Buffer Overflow Vulnerabilities. Springer, Heidelberg (2008)
16. Simon, A., King, A.: Exploiting Sparsity in Polyhedral Analysis. In: Hankin, C., Siveroni, I. (eds.) SAS 2005. LNCS, vol. 3672, pp. 336–351. Springer, Heidelberg (2005)
17. Simon, A., King, A., Howe, J.M.: Two Variables per Linear Inequality as an Abstract Domain. In: Leuschel, M. (ed.) LOPSTR 2002. LNCS, vol. 2664, pp. 71–89. Springer, Heidelberg (2003)
Synthesis of Fault-Tolerant Distributed Systems Rayna Dimitrova and Bernd Finkbeiner Saarland University, Germany
Abstract. A distributed system is fault-tolerant if it continues to perform correctly even when a subset of the processes becomes faulty. Fault-tolerance is highly desirable but often difficult to implement. In this paper, we investigate fault-tolerant synthesis, i.e., the problem of determining whether a given temporal specification can be implemented as a fault-tolerant distributed system. As in standard distributed synthesis, we assume that the specification of the correct behaviors is given as a temporal formula over the externally visible variables. Additionally, we introduce the fault-tolerance specification, a CTL∗ formula describing the effects and the duration of faults. If, at some point in time, a process becomes faulty, it becomes part of the external environment and its further behavior is only restricted by the fault-tolerance specification. This allows us to model a large variety of fault types. Our method accounts for the effect of faults on the values communicated by the processes, and, hence, on the information available to the non-faulty processes. We prove that for fully connected system architectures, i.e., for systems where each pair of processes is connected by a communication link, the fault-tolerant synthesis problem from CTL∗ specifications is 2EXPTIME-complete.
1 Introduction
Fault-tolerance is an important design consideration in distributed systems. A fault-tolerant system is able to withstand situations where a subset of its components breaks: depending on the chosen type of fault-tolerance, the system may completely mask the fault, return to correct behavior after a finite amount of time, or switch to a behavior that is still safe but possibly less performant. Fault-tolerance is highly desirable but often difficult to implement. Thus, formal methods for verification [7] and synthesis [10] of fault-tolerance are necessary. Traditionally, fault-tolerance requirements are chosen manually. While it is obviously desirable to stay as close as possible to the normal behavior, the question of which type of fault-tolerance can be realized in a given system is difficult to decide and requires a careful analysis of both the desired system functionality and the possible faults. In this paper, we develop algorithmic support for this
This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Center Automatic Verification and Analysis of Complex Systems (SFB/TR 14 AVACS). Supported by a Microsoft Research European PhD Scholarship.
design step. We present a synthesis algorithm that determines if a given temporal specification has a fault-tolerant implementation, and, in case the answer is positive, automatically derives such an implementation. Our goal is thus more ambitious than previous approaches (cf. [10, 2, 4]) to fault-tolerant synthesis, which transform an existing fault-intolerant implementation into a fault-tolerant version. While such approaches are often able to deliver fault-tolerant systems, they are inherently incomplete, and can therefore not be used to decide whether a given fault-tolerance requirement can be realized. In order to obtain a decision procedure, we cannot treat the implementation of the system functionality and the implementation of its fault-tolerance as two separate tasks, but must rather extend the synthesis algorithm to address both concerns at once. In the restricted setting of closed systems, i.e., of systems without input, such a combination has already been carried out: Attie et al. [1] represent faults by a finite set of fault actions that may be carried out by a malicious environment. Their method then synthesizes a program that is correct with respect to a specified set of such possible environments. The key challenge in moving from simple closed systems to general distributed systems is to account for the incomplete information available to the individual processes. Faults may affect the communication between processes, which affects the information the non-faulty processes have. Our setting builds on that of standard distributed synthesis [6], where the communication links between the processes are described as a directed graph, called the system architecture. We assume the architecture is fully connected and the system specification is external [13, 8], i.e., it does not refer to the internal variables. For standard synthesis, this case is known to be decidable: while the processes may read different inputs, they can simply transmit all information to the other processes through the internal communication links. The distributed system thus resembles a monolithic program in the sense that all processes are aware of the global state. The situation is more difficult in a fault-tolerant system, since, when a fault occurs in some process, the process essentially becomes part of the hostile environment and the remaining processes can no longer rely on receiving accurate information about the external input at its site. We present a synthesis algorithm for CTL∗ specifications that accounts for the resulting incomplete information. The given CTL∗ specification is a fault-tolerance specification which encodes the effects and the durations of the faults and the desired type of tolerance. Our algorithm is based on a transformation of the architecture and of the fault-tolerance specification. The architecture transformation changes the set of external input variables by introducing a new input variable for each process and making the original input variables unobservable for all processes in the architecture. The transformation of the specification establishes the relation between the original input of a process and the new faulty input. The two inputs are constrained to be the same during the normal operation of the corresponding process, which guarantees the correctness of the transformation, and may differ when the process is faulty, which allows us to assume that in the transformed architecture the faults do not affect the transmission of external input. Thus, we
can reduce the distributed synthesis problem for the original architecture and fault-tolerance specification to the one of finding a monolithic implementation that satisfies the transformed specification in the presence of faults. We hence establish that the synthesis of fault-tolerant distributed systems with fully connected system architectures and external specifications is decidable. In fact, the problem is no more expensive than standard synthesis: fault-tolerant distributed synthesis from CTL∗ specifications is 2EXPTIME-complete.
2 Modelling Fault-Tolerant Systems
2.1 Faults and Fault-Tolerance
Types of Faults. In the field of fault-tolerant distributed computing, faults are categorized in a variety of ways. Categorizing faults according to the behavior they cause results in several standard classes [3, 1]. Stuck-at faults can, for example, cause a component or a wire to be stuck in some state. If a process is affected by a fail-stop or a crash fault, it stops (potentially permanently) executing any actions before it violates its input-output specification. In both cases the process is uncorrectably corrupted, but while fail-stop faults are detectable, that is, other processes are explicitly notified of the fault, crash faults are undetectable. If a process fails to respond to an input from another component, i.e., some action is omitted, it is said to exhibit an omission fault. Omission faults are a subset of the class of timing faults, which cause the component to respond with the correct value but outside the required time interval. The most general class, that of Byzantine faults, encompasses all possible faults, including arbitrary and even malicious behavior of the affected process; such faults are in general undetectable. According to their duration, faults can be permanent, transient, or intermittent. In the latter two cases, upon recovery the affected process returns to normal operation from the arbitrary state it has reached in the presence of the fault.
Fault-Tolerance Requirements. Usually the system is not required to satisfy the original specification after a fault occurs, but instead to comply with some fault-tolerance policy. Fault-tolerance properties are generally classified according to whether and how they respect the safety and liveness parts of the original specification. This classification yields three main types of tolerance. Masking tolerance always respects both safety and liveness. In non-masking tolerance, however, the safety property might be temporarily violated, but is guaranteed to be eventually restored, while the liveness part is again always respected. A third type is fail-safe tolerance. When formalizing fail-safe tolerance it is assumed that the original specification is given as a conjunction of a safety and a liveness specification [12], and after a fault occurrence only the safety conjunct has to be satisfied.
2.2 Architectures for Fault-Tolerant Synthesis
An architecture describes the communication between the processes in a distributed system and their interaction with the external environment. We model
the occurrence of a fault as an action of the environment. In the following, we assume that faults are detectable, that is, there exists a reliable unit of the external environment that notifies all processes immediately when a fault occurs in some of them, and also informs them exactly which processes were faulty in the previous execution step. To this end, we consider architectures with a distinguished set of external input fault-notification variables, which all processes in the system are allowed to read. Alternatively, the fault-notification variables could be made invisible to the system processes, in which case finding the fault-detection mechanism would be part of the synthesis problem. An architecture A = (env, P, Ext, C, (D(v))v∈Ext, (In(p), Out(p))p∈P) is a tuple that consists of: an environment env, a finite set of processes P, a set Ext of external variables together with a finite domain D(v) for each v ∈ Ext, a set C (disjoint from Ext) of internal variables, and read and write permissions for each p ∈ P. The set Ext is the union of the disjoint sets I, O, H and N, where:
– The set I consists of the external input variables whose values are supplied by the environment env. The set I is the union of the sets Ip, where for each process p, Ip is the set of external input variables that this process can read. Each variable in I is read by at least one (possibly several) processes in P.
– The set O consists of the external output variables, via which the processes provide their output to the environment env. The set O is the union of the disjoint sets Op, where for each process p, Op is the set of external output variables written by that process, which no other process in P can read.
– The set H consists of the external private environment variables written by the environment env, which none of the processes in P can read.
– The set N = {np, mp | p ∈ P} of external input variables for fault notification contains, for each process p, one variable np that is used by the environment to notify all processes of a fault occurrence in p and a variable mp that indicates whether p was faulty in the previous execution step. The variables in N can be read by all processes and are written only by the environment. The domain D(np) of np is a finite subset of ℕ that consists of the different faults that can occur in process p, where 0 indicates normal operation. Similarly for mp.
The set C consists of the variables used for internal communication between the processes. It is the union of the disjoint sets Cp, where for each process p, Cp is the set of internal variables written by p via which it communicates to the other processes. We denote with V the set of all variables in an architecture A. For a process p, the set In(p) consists of all variables (internal or external) this process is allowed to read and Out(p) = Cp ∪ Op consists of all variables that this process is allowed to write. By definition, the sets Out(p) are disjoint. The architecture associates with each external variable v ∈ Ext a finite nonempty domain D(v) together with some designated element d0(v) ∈ D(v). The domains of the internal variables are unconstrained by the architecture, and hence the capacity of the communication channels is not limited a priori. Consider some nonempty finite domains (D(v))v∈C for the internal variables in A. For a subset U ⊆ V, we denote with D(U) the Cartesian product Π_{u∈U} D(u) and with d0(U) the tuple (d0(u))u∈U. For d ∈ D(U), u ∈ U and U' ⊆ U, d↾u and
d↾U' denote the projections of d on the variable u and on the subset of variables U', respectively. For a finite or infinite sequence σ = d0 d1 d2 . . . of elements of D(U) and j ≥ 0, we denote by σ[j] the element dj and with σ(j) the prefix of length j of σ (if j = 0, then σ(j) = ε). Projection trivially extends to sequences and prefixes. When σ is finite, we denote with |σ| the number of elements of σ.
We consider synchronous communication with delay: at each step, each process reads its current external input and the output of the processes in P delayed by one step. For a global computation history σ ∈ D(V)∗ we have that σ[0] = d0(V) and for every j ≥ 1, σ[j]↾(I ∪ H ∪ N) is the input provided by the environment at step j and σ[j]↾(V \ (I ∪ H ∪ N)) is the output of the processes at step j − 1, i.e., the history reflects the delay. Thus, for simplicity of the presentation we have assumed that the delay of each variable v ∈ V \ (I ∪ H ∪ N) is 1. Our results can be easily extended to the case of arbitrary a priori fixed delays.
Fully Connected Architectures. An architecture A is fully connected if every pair of processes is connected via a communication link with sufficient capacity. From now on, we consider only fully connected architectures and w.l.o.g. assume that C = {cp, tp | p ∈ P}, where for each process p, the variables cp and tp are written by p and read by all processes, and the domain of cp is fixed to be D(Ip). Thus, process p can use cp to communicate its input. We denote with cp^v the component of the variable cp used for the transmission of v ∈ Ip. The domains of the variables tp for p ∈ P are left unspecified in the architecture.
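As a concrete, if simplified, reading of this execution model, the following sketch builds a global history under the delay-1 convention; env_input, the strategies and the read sets are placeholders supplied by the caller, not part of the formal model.

    def restrict(d, vs):
        """Projection d↾vs of a valuation onto a set of variables."""
        return {v: d[v] for v in vs if v in d}

    def run(steps, env_input, strategies, in_vars, d0):
        """Global history under synchronous communication with delay 1:
        position j carries the fresh environment input for step j together
        with the process outputs computed from the history up to step j - 1.
        env_input(j) yields a valuation of I ∪ H ∪ N; strategies[p] maps p's
        local history to a valuation of Out(p); d0 is the initial valuation."""
        history = [dict(d0)]
        for j in range(1, steps + 1):
            state = dict(env_input(j))
            for p, strat in strategies.items():
                local = [restrict(d, in_vars[p]) for d in history]
                state.update(strat(local))
            history.append(state)
        return history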
2.3 The Specification Language CTL∗
Syntax. Let AP be a finite set of atomic propositions. The logic CTL∗ distinguishes state and path formulas. State formulas are called CTL∗ formulas. State formulas over AP are formed according to the following grammar, where p ∈ AP and θ stands for a path formula: ϕ ::= true | p | ¬ϕ | ϕ1 ∧ ϕ2 | Eθ. Path formulas are formed according to the following grammar, where ϕ is a state formula and θ, θ1 and θ2 are path formulas: θ ::= ϕ | ¬θ | θ1 ∧ θ2 | Xθ | θ1 U θ2. As abbreviations we can define the remaining usual boolean operators over state and path formulas. For a path formula θ, we define the state formula Aθ as ¬E¬θ, the path formula Fθ as true U θ and the path formula Gθ as ¬F¬θ.
Trees. As usual, for a finite set X, an X-tree is a prefix-closed subset T ⊆ X∗ of finite words over X. The direction of every nonempty node σ · x ∈ X+ is defined as dir(σ · x) = x, and for ε, dir(ε) = x0 where x0 ∈ X is some designated root direction. An X-tree T is called total if ε ∈ T and for every σ ∈ T there exists at least one successor σ · x ∈ T, x ∈ X. If T = X∗, then T is called full. For a given finite set Y, a Y-labeled X-tree is a pair ⟨T, l⟩, where T is an X-tree and l : T → Y is a labelling function that maps each node in T to an element of Y.
Semantics. Consider a set of variables V with D(V) being the Cartesian product of their domains. Let AP be a finite set of atomic propositions over V. A CTL∗ formula ϕ over AP can then be interpreted over total D(V)-labeled trees
according to the standard semantics [5] of CTL∗. A total D(V)-labeled tree ⟨T, l⟩ is a model of ϕ, written ⟨T, l⟩ |= ϕ, iff the root node of ⟨T, l⟩ satisfies ϕ.
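For readers who prefer code, the CTL∗ grammar above can be captured by a small algebraic datatype; the representation below is a hypothetical sketch for illustration, not the formalism used in the paper.

    from dataclasses import dataclass

    # State formulas: true, atomic propositions, ¬, ∧, E(path); path formulas
    # additionally allow X (next) and U (until), matching the grammar above.

    @dataclass(frozen=True)
    class Atom: name: str

    @dataclass(frozen=True)
    class Not: arg: object

    @dataclass(frozen=True)
    class And: left: object; right: object

    @dataclass(frozen=True)
    class E: path: object

    @dataclass(frozen=True)
    class X: path: object

    @dataclass(frozen=True)
    class U: left: object; right: object

    def A(theta):  # Aθ abbreviates ¬E¬θ
        return Not(E(Not(theta)))

    def F(theta):  # Fθ abbreviates true U θ
        return U(Atom("true"), theta)

    def G(theta):  # Gθ abbreviates ¬F¬θ
        return Not(F(Not(theta)))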
2.4 Specifying Fault-Tolerance
The system specification describes the desired input-output behavior of the system in the absence of faults and leaves the internal communication unconstrained. That is, we are given an external specification as a CTL∗ formula ϕ over atomic propositions from the set AP = {v = a | v ∈ Ext \ N, a ∈ D(v)}, i.e., about external variables. The models of ϕ are total D(Ext)-labeled D(Ext)-trees. In the presence of faults, the system need not satisfy the original specification, but instead comply with some (possibly weaker) fault-tolerance specification. An external specification for an architecture A can refer to the fault notification variables in N. This allows for specifying the intended fault-tolerance policy as well as encoding the effects and durations of faults in the input CTL∗ formula.
Given the original specification ϕ, we first construct a formula ϕTOL according to the required type of fault-tolerance. The user can describe manually as a CTL∗ formula the desired properties of the behavior of the system in the presence of particular faults in particular processes and combinations thereof. Of course, classical fault-tolerance requirements, such as masking, non-masking or fail-safe, can also be specified (for masking tolerance it suffices to leave the specification unchanged). Moreover, in the case of simple specifications such as invariants, i.e., of the form AGψ, this compilation can be done automatically: For fail-safe and non-masking tolerance, the tolerance properties are respectively AG(ψ ∨ (fault-present ∧ ψsafe)) and AG(ψ ∨ (fault-present ∧ AFAGψ)), where fault-present = ∨_{p∈P} ¬(np = 0) and ψsafe is the safety conjunct of ψ.
In our model, the occurrence of a fault causes the affected process to behave in an arbitrary way, i.e., it exhibits maximal behavior. However, by constraining this behavior in the fault-tolerance specification, we can model several of the fault types mentioned in the beginning of this section, as well as many more. Given a set of faults with their effects on the behavior of a process and their durations, we transform the formula ϕTOL into the fault-tolerance specification Φt by relativizing the path quantifiers in the formula ϕTOL w.r.t. the corresponding assumptions on the environment. These assumptions are encoded in the formulas fault-behavior, fault-duration, and fault-distribution, whose construction we discuss below. Thus, the fault-tolerance specification Φt is obtained from the formula ϕTOL by substituting each occurrence of Aθ by A((fault-behavior ∧ fault-duration ∧ fault-distribution) → θ), and each occurrence of Eθ by E(fault-behavior ∧ fault-duration ∧ fault-distribution ∧ θ).
The formula fault-behavior describes the possible behaviors of the processes in the presence of each of the given faults. Let faulty-output(d, p) be a state formula describing the possible outputs of process p when affected by the fault of type d (we can assume that a stopped process outputs some default element ⊥). Then, fault-behavior = G ∧_{p∈P} ∧_{d∈D(np)} (np = d → X(faulty-output(d, p))).
The formula fault-duration constrains the duration of faults. Let D'(np) and D''(np) be the subsets of D(np) consisting of the permanent and the transient
faults, respectively, for a process p ∈ P. For each d ∈ D''(np), we assume the existence of a boolean variable rp^d in H, where rp^d being true means that process p has recovered from d. The path formulas permanent(d, p) = G((np = d) → G(np = d)) and transient(d, p) = G(((np = d) → (np = d U rp^d)) ∧ (rp^d → (¬(np = d) ∧ G rp^d))) state that the occurrence of a fault of type d in process p is permanent, respectively transient (i.e., the duration of the fault is finite and the recovered process cannot perturb again, cf. [7]). Finally, we define the formula fault-duration = ∧_{p∈P} ((∧_{d∈D'(np)} permanent(d, p)) ∧ (∧_{d∈D''(np)} transient(d, p))).
The user can also provide a path formula fault-distribution that constrains the number of faulty processes in the considered system during the execution, e.g., that there is at most one faulty process at every point of the execution. To avoid restriction to only memoryless implementations, we assume that at every point of the system's execution at least one process is not faulty, i.e., the synthesized system is designed to tolerate up to n − 1 simultaneously faulty processes, where n is the total number of processes. Thus, we assume that the formula fault-distribution is a conjunction of the user-specified requirements and: the formula G ∨_{p∈P} (np = 0), which guarantees that there is at least one non-faulty process at every point, and the formula G ∧_{p∈P} ∧_{d∈D(np)} (np = d → X(mp = d)), which states that the values of the variables mp and np are correctly related.
Note that for common fault types such as fail-stop or stuck-at, as well as for the usual constraints on the duration of faults, the fault-tolerance specification can be compiled automatically from the original specification. The user can also specify customized requirements expressible in CTL∗. From now on, we assume that the fault-tolerance specification is given as input to our algorithm.
Example (Reliable Broadcast). In a broadcast protocol, the environment consists of n clients E1, . . . , En, which broadcast messages. The system consists of n servers S1, . . . , Sn that correspond to the processes p1, . . . , pn, which deliver the messages to the clients. Each client Ej communicates only with the corresponding server Sj and we assume that each message sent by a client is unique. A system with 3 servers is depicted in Fig. 1, where we have omitted the internal communication variables. Let M be the finite set of possible message contents. The domain of each input, output or communication variable v ∈ {ij, oj, cj | j = 1, 2, 3} is D(v) = {(m, j) | m ∈ M, j ∈ {1, 2, 3}} ∪ {⊥}, where j denotes the broadcaster's name and ⊥ indicates the absence of a message. In the absence of faults, the correctness can be specified by the following standard requirements: (1) If a client Ej broadcasts a message m, then the server Sj eventually delivers m; (2) If a server delivers a message m, then all
Fig. 1. Architecture of a distributed server (the clients E1, E2, E3 of the environment env, the servers S1, S2, S3 with inputs i1, i2, i3 and outputs o1, o2, o3, and the fault-notification component D with variables n1, n2, n3)
servers eventually deliver m; (3) For every message m, every server delivers (m, l) at most once and does so only if m was previously broadcast by the client El. The environment component D notifies all servers when a fault in server j occurs, by setting nj to 1. A faulty server sends arbitrary messages to the corresponding client and to the other servers. The duration of faults is unconstrained. Thus, both formulas fault-behavior and fault-duration are equivalent to true. We specify the fault-tolerance requirement as follows. We replace each requirement that a server eventually delivers a message m by the weaker requirement that it has to eventually deliver a message, provided that from that point on it is never faulty. The safety property that the servers do not invent messages is also weakened to hold only for non-faulty processes. Thus, we obtain a variation of the standard requirements for reliable broadcast from [9].
Validity: If a client Ej broadcasts a message m and the corresponding server Sj is never faulty from that point on, then Sj eventually delivers m.
ϕj^V = AG ∧_{m∈M} ((ij = (m, j) ∧ G(nj = 0)) → F(oj = (m, j)))
Agreement: If a non-faulty server Sj delivers a message (m, l), then all servers that are non-faulty from that point on eventually deliver (m, l).
ϕj^A = AG ∧_{m∈M, l∈P} ((oj = (m, l) ∧ nj = 0) → ∧_{p∈P} (G(np = 0) → F(op = (m, l))))
Integrity: For every message m, every non-faulty server Sj delivers (m, l) at most once and does so only if m was previously broadcast by the client El.
ϕj^I = AG ∧_{m∈M, l∈P} ((nj = 0 ∧ oj = (m, l)) → XG(nj = 0 → ¬(oj = (m, l)))) ∧ AG ∧_{m∈M, l∈P} ¬((¬(il = (m, l))) U (oj = (m, l) ∧ nj = 0 ∧ ¬(il = (m, l))))
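The relativization step that produces Φt from ϕTOL (Sect. 2.4) is purely syntactic and easy to mechanize over the datatype sketched in Sect. 2.3. Since Aθ is an abbreviation for ¬E¬θ, rewriting only the E quantifier suffices; the sketch below is one possible rendering.

    def relativize(f, assume):
        """Φt from ϕTOL: every Eθ becomes E(assume ∧ θ). Because Aθ
        abbreviates ¬E¬θ, this simultaneously turns each Aθ into
        A(assume → θ), as required by the transformation in the text."""
        if isinstance(f, Atom):
            return f
        if isinstance(f, Not):
            return Not(relativize(f.arg, assume))
        if isinstance(f, And):
            return And(relativize(f.left, assume), relativize(f.right, assume))
        if isinstance(f, E):
            return E(And(assume, relativize(f.path, assume)))
        if isinstance(f, X):
            return X(relativize(f.path, assume))
        if isinstance(f, U):
            return U(relativize(f.left, assume), relativize(f.right, assume))
        raise TypeError(f)

    # assume = fault-behavior ∧ fault-duration ∧ fault-distribution:
    assume = And(Atom("fault-behavior"),
                 And(Atom("fault-duration"), Atom("fault-distribution")))
    phi_t = relativize(A(G(Atom("psi"))), assume)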
2.5 The Fault-Tolerant Synthesis Problem
Let A = (env, P, Ext, C, (D(v))v∈Ext, (In(p), Out(p))p∈P) be a fully connected architecture. A distributed implementation for the architecture A consists of a tuple (D(v))v∈C of sets defining domains for the internal variables and a tuple ŝ = (sp)p∈P of implementations for the processes in P, where an implementation for a process p is a function (strategy) sp : D(In(p))∗ → D(Out(p)) which maps each local input history for p to an assignment to the variables written by p. A distributed implementation ŝ is finite-state if for each process p, the domain D(Cp) is finite and the strategy sp can be represented by a finite automaton.
In the absence of faults, a distributed implementation ŝ = (sp)p∈P defines a computation tree CT(ŝ) = ⟨T, dir⟩, where T ⊆ D(V)∗ is the greatest total tree such that for all σ ∈ D(V)∗ and all d ∈ D(V), if σ · d ∈ T, then σ ∈ T and for every p ∈ P it holds that d↾Out(p) = sp(σ↾In(p)). Recall that the realizability problem for an architecture A with fixed finite domains for all variables and a CTL∗ specification ϕ over Ext \ N is to decide whether there exists a finite-state distributed implementation ŝ for A such that CT(ŝ) |= ϕ. The distributed synthesis problem requires finding such an implementation if one exists.
In the presence of faults, a distributed implementation ŝ = (sp)p∈P defines a fault computation tree FCT(ŝ) = ⟨T, dir⟩, where T ⊆ D(V)∗ is the greatest
total tree such that for all σ ∈ D(V)∗ and d ∈ D(V), if σ · d ∈ T, then σ ∈ T and for every p ∈ P such that dir(σ)↾np = 0 it holds that d↾Out(p) = sp(σ↾In(p)), i.e., the output of only the non-faulty processes determines the successors of a node.
The implementations for the processes in P should be independent of information about the external environment the processes do not have. Consider an external input variable v ∈ I and assume that at some point all processes allowed to read v are faulty. The behavior of the (non-faulty) processes in P at this and at later points of the execution of the system should not depend on the value of v at that moment, because the faulty processes may communicate the value incorrectly and their states may be arbitrarily perturbed. Thus, we say that two local histories for a process p are equivalent up to faults if they differ only in the values of external input variables at points when all processes reading them were faulty. A distributed implementation is consistent w.r.t. faults if all strategies produce the same output for histories that are equivalent up to faults.
Note that for an implementation that is consistent w.r.t. faults, the output of a strategy for a process p is allowed to depend on the value of a variable v ∈ Ip from points in the history in which process p was faulty, as long as some process allowed to read v was not faulty. Thus, provided that at each point of the execution of the system there exists at least one process that is not faulty (an assumption that can be specified in the fault-tolerance specification as shown above), a recovered process is allowed to depend on the history up to equivalence w.r.t. faults. This is possible since in a fully connected architecture a process can, upon recovery from a fault, receive information about the current state of the system from another process that was not faulty at the previous step.
Definition 1 (Equivalence up to faults). For a process p ∈ P, we define the equivalence relation ≡p^f on D(In(p))∗ in the architecture A as follows. For σ1 ∈ D(In(p))∗ and σ2 ∈ D(In(p))∗, we have σ1 ≡p^f σ2 if and only if |σ1| = |σ2|, σ1[|σ1| − 1] = σ2[|σ2| − 1] and for every 0 ≤ j < |σ1| − 1 it holds that: (1) σ1[j]↾(In(p) \ Ip) = σ2[j]↾(In(p) \ Ip) and (2) for every v ∈ Ip, if there exists a process q ∈ P with v ∈ Iq and σ1[j]↾nq = 0, then it holds that σ1[j]↾v = σ2[j]↾v.
Definition 2 (Consistency w.r.t. faults). We say that a distributed strategy ŝ for the architecture A is consistent w.r.t. faults if for every p ∈ P and for every σ1 and σ2 in D(In(p))∗ with σ1 ≡p^f σ2 it holds that sp(σ1) = sp(σ2).
Definition 3 (Fault-Tolerant Synthesis Problem). The fault-tolerant realizability problem for an architecture A and a CTL∗ fault-tolerance specification Φt is to decide whether there exists a finite-state distributed implementation ŝ for A that is consistent w.r.t. faults and such that FCT(ŝ) |= Φt. The fault-tolerant synthesis problem requires finding such an implementation if the answer is yes.
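Definition 1 is operational enough to check directly. The sketch below tests two equal-length local histories for equivalence up to faults; the key names 'n_q' for the notification variables and the readers map (processes allowed to read each variable) are hypothetical encodings, not the paper's notation.

    def equiv_up_to_faults(s1, s2, Ip, readers):
        """σ1 ≡p^f σ2 (Definition 1): equal length, equal last elements, and
        at every earlier point the histories agree on In(p) \ Ip and on each
        v ∈ Ip whenever some process q allowed to read v was not faulty."""
        if len(s1) != len(s2) or (s1 and s1[-1] != s2[-1]):
            return False
        for d1, d2 in zip(s1[:-1], s2[:-1]):
            for v in d1:
                if v not in Ip and d1[v] != d2.get(v):
                    return False                        # condition (1)
                if v in Ip and d1[v] != d2.get(v):
                    # condition (2): some reader q with n_q = 0 saw v
                    if any(d1[f"n_{q}"] == 0 for q in readers[v]):
                        return False
        return True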
3 Synthesis
Our synthesis algorithm builds on that of single-process synthesis under incomplete information [11], via a standard reduction for fully connected architectures
and external specifications [8]. We now provide the necessary preliminaries based on the classical synthesis case and in the next sections we present the methodology we developed to employ a similar approach in the fault-tolerant setting.
3.1 Synthesis for Fully Connected Architectures
Transmission Delay. In a fully connected architecture, the output of every process p may depend, with a certain delay, on all external input variables in I ∪ N. The delay, delay(v, p), of the transmission of an external input variable v to a process p in a fully connected architecture is 0 if v ∈ Ip ∪ N and 1 otherwise.
Input-Output Functions. An input-output function for a process p is a function gp : D(I ∪ N)∗ → D(Op) which, based on the global input history, assigns values to the variables in Op. A global input-output function g : D(I ∪ N)∗ → D(O) assigns values to all external output variables, based on the global input history. An input-output function gp is delay-compatible if gp(σ1) = gp(σ2) for all σ1, σ2 ∈ D(I ∪ N)∗ such that (σ1↾v)(|σ1| − delay(v, p)) = (σ2↾v)(|σ2| − delay(v, p)) for every v ∈ I ∪ N. A global input-output function g is delay-compatible iff so is the projection of g on Op, for every process p ∈ P.
Routing Strategies. Given a nonempty set D(v) for each v ∈ C, a routing r̂ = (rp)p∈P for an architecture A is a tuple of local memoryless strategies, called routing strategies, where for each p ∈ P, rp : D(In(p)) → D(Cp) is a function that, given values for the variables in In(p), assigns values to the variables in Cp. In a fully connected architecture A, each process p can transmit the values of Ip to the other processes via the variable cp. A simple routing r̂ = (rp)p∈P is one for which rp(d)↾cp = d↾Ip, and it allows every process to trivially reconstruct the value of every variable in I. A distributed implementation ŝ for A has simple routing if for every σ ∈ D(V)∗ and p ∈ P it holds that sp(σ↾In(p))↾cp = dir(σ)↾Ip, i.e., the strategies directly forward the external input. For fully connected architectures it suffices to consider only implementations with simple routing.
Synthesis for Fully Connected Architectures and External Specifications. In [8] it was shown that the distributed synthesis problem is decidable for uniformly well-connected architectures with linearly preordered information and external specifications, and that it can be reduced to finding a collection of delay-compatible input-output functions for the processes in P. Fully connected architectures are a special case of uniformly well-connected architectures in which all processes have the same information and hence fall in this class. Moreover, in order to find such a collection of input-output functions, it suffices to find a delay-compatible global input-output function and use projection to obtain functions for the processes in P. Thus, the problem reduces to the single-process synthesis problem under incomplete information with the additional requirement of delay-compatibility.
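Delay-compatibility says that an output may not depend on input values that cannot yet have been transmitted. Under the delays stated above it can be tested on a pair of global input histories as follows; valuations are plain dicts and the naming is illustrative only.

    def delay(v, p, Ip, N):
        """delay(v, p): 0 if p reads v directly or v is a notification variable."""
        return 0 if v in Ip or v in N else 1

    def same_visible_prefix(s1, s2, p, Ip, N, inputs):
        """True iff σ1 and σ2 agree, per input variable v, on the prefix of
        length |σ| − delay(v, p) of the projection onto v. A delay-compatible
        g_p must then produce the same output on σ1 and σ2."""
        if len(s1) != len(s2):
            return False
        return all(
            [d[v] for d in s1[: len(s1) - delay(v, p, Ip, N)]]
            == [d[v] for d in s2[: len(s2) - delay(v, p, Ip, N)]]
            for v in inputs
        )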
3.2 Single-Process Synthesis under Incomplete Information
Let A = (env , {p}, Ext, ∅, (D(v))v∈Ext , (I ∪ N, O)) be a single-process architecture and ψ be a CTL∗ specification over the variables in V = I ∪ H ∪ N ∪ O.
Tree Automata. An alternating parity tree automaton is a tuple A = (Y, X, Q, q0, δ, α), where Y is a finite alphabet, X is a finite set of directions, Q is a finite set of states, q0 ∈ Q is the initial state, δ : Q × Y → B+(Q × X) is a transition function that maps a state and an input letter to a positive boolean combination of pairs of states and directions, and α : Q → Col ⊂ ℕ is a coloring function that maps each state to some color from a finite set Col. An alternating automaton runs on full Y-labeled X-trees. A run tree on a given full Y-labeled X-tree ⟨X∗, l⟩ is a Q × X∗-labeled tree where the root is labeled with (q0, ε) and where for every node ρ with label (q, σ) the set of children K of ρ satisfies the following properties: (1) for every ρ' ∈ K, the label of ρ' is (q', σ · x') for some q' ∈ Q and x' ∈ X such that (q', x') is an atom of δ(q, l(σ)), and (2) the set of atoms defined by the set of children K satisfies δ(q, l(σ)). An infinite path fulfills the parity condition if the maximal color of the states appearing infinitely often is even. A run tree is accepting if all infinite paths fulfill the parity condition. A full Y-labeled X-tree is accepted if it has an accepting run tree. A nondeterministic automaton is an alternating automaton in which, in the DNF of each transition, every disjunct contains exactly one pair (q, x) for every x ∈ X.
Symmetric alternating automata are a variant of alternating automata that run on total Y-labeled X-trees. For a symmetric alternating automaton S = (Y, Q, q0, δ, α), the components Q, q0 and α are as above, but the transition function δ : Q × Y → B+(Q × {□, ♦}) maps a state and an input letter to a positive boolean combination over atoms that refer to some (♦) or all (□) successors in the tree. A run tree on a given Y-labeled X-tree ⟨T, l⟩ is a Q × X∗-labeled tree where the root is labeled with (q0, ε) and where for every node ρ with label (q, σ) the set of children K of ρ satisfies the following properties: (1) for every ρ' ∈ K, the label of ρ' is (q', σ · x) for some q' ∈ Q and x ∈ X with σ · x ∈ T such that (q', □) or (q', ♦) is an atom of δ(q, l(σ)), and (2) interpreting each occurrence of (q', □) as ∧_{x∈X, σ·x∈T} (q', x) and each occurrence of (q', ♦) as ∨_{x∈X, σ·x∈T} (q', x), the set of atoms defined by the set of children K satisfies δ(q, l(σ)).
Automata-Theoretic Solution. For a CTL∗ formula ψ one can construct an alternating parity tree automaton Aψ with 2^O(|ψ|) states that accepts exactly the models of ψ [11]. Via the automata transformations described in [11], the realizability problem for the single-process architecture A and the specification ψ can be reduced to the nonemptiness of an alternating tree automaton Cψ that is obtained from Aψ and that has the same number of states as Aψ.
Theorem 1 (from [11]). The single-process synthesis problem for CTL∗ with incomplete information is 2EXPTIME-complete.
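As a data-structure sketch (this is not the construction of [11]), an alternating parity tree automaton can be represented with transitions in DNF, each disjunct being a set of (state, direction) atoms:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class APT:
        """Alternating parity tree automaton (Y, X, Q, q0, δ, α). delta maps
        (state, letter) to a DNF: a frozenset of disjuncts, each a frozenset
        of (state, direction) atoms. A path of a run tree is accepting iff
        the maximal color seen infinitely often is even."""
        alphabet: frozenset
        directions: frozenset
        states: frozenset
        initial: object
        delta: dict
        color: dict

    # Toy automaton over X = {L, R}, Y = {a, b}: on 'a' require state q in
    # ALL directions (one disjunct, two atoms); on 'b' in SOME direction.
    delta = {
        ('q', 'a'): frozenset({frozenset({('q', 'L'), ('q', 'R')})}),
        ('q', 'b'): frozenset({frozenset({('q', 'L')}), frozenset({('q', 'R')})}),
    }
    aut = APT(frozenset('ab'), frozenset('LR'), frozenset({'q'}), 'q', delta, {'q': 0})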
4 Encoding Fault-Tolerant Realizability
In this section we present a transformation of the architecture and the fault-tolerance specification to an architecture and a specification for which the distributed fault-tolerant synthesis problem can be reduced to single-process synthesis under incomplete information. The key challenge is to account for the
fact that the non-faulty processes cannot rely on receiving accurate information about the external input of a faulty process. We describe how we model the effect of a fault occurrence on the informedness of the non-faulty processes. We show that the described transformations do not affect fault-tolerant realizability.
Architecture Transformation. To circumvent the change of informedness of the processes in the architecture caused by the occurrences of faults, and yet allow a faulty process to communicate its external input incorrectly to the other processes, we introduce a faulty copy of the external input of each process. Formally, we transform the architecture A into an architecture Af defined as follows: Af = (env, P, Ext^f, C, (D(v))v∈Ext^f, (In^f(p), Out(p))p∈P), where V^f is partitioned into I^f = F, H^f = H ∪ I, N, C and O. The new set of variables F = {fp | p ∈ P} consists of the external faulty-input variables, one for each process p in P, whose values are supplied by the environment. For p ∈ P we have In^f(p) = (In(p) \ Ip) ∪ {fp}. The domain of each fp is D(Ip) and we denote with fp^v the component of fp that corresponds to a variable v ∈ Ip. In the architecture Af, none of the processes is allowed to read the values of the original input variables in I and thus, a routing strategy for process p can only transmit the value of the faulty-input variable fp to the other processes.
Specification Transformation. The relation between the correct and the faulty input in normal execution and after the occurrence of a fault is established by an assumption on the environment introduced in the specification: While the tuple of values given by the environment to the input variables of a process p and the value given to the corresponding faulty-input variable for that process are constrained to be the same during the normal execution of process p, during the time process p is faulty this may not be the case. Formally, the assumption is faulty-input = G ∧_{p∈P, v∈Ip} (np = 0 → fp^v = v). Then the formula Φf is obtained by substituting in the fault-tolerance specification Φt each occurrence of Aθ by A(faulty-input → θ) and each occurrence of Eθ by E(faulty-input ∧ θ).
For implementations for the architecture Af we assume that faults do not affect the forwarding of external input by a faulty process. This results in the fault computation tree FCT^f(ŝ) = ⟨T, dir⟩, where T ⊆ D(V^f)∗ is the greatest total tree such that for all σ ∈ D(V^f)∗ and all d ∈ D(V^f), if σ · d ∈ T, then σ ∈ T and for every p ∈ P it holds that d↾cp = sp(σ↾In^f(p))↾cp and if dir(σ)↾np = 0 then d↾Out(p) = sp(σ↾In^f(p))↾Out(p). The following theorem establishes the connection between the fault computation trees for the implementations for the architecture Af and the implementations for the original architecture A.
Theorem 2. There exists a finite-state distributed implementation ŝ with simple routing for the architecture A that is consistent w.r.t. faults and such that the fault computation tree FCT(ŝ) is a model of Φt iff there exists a finite-state distributed implementation ŝ^f with simple routing for the architecture Af such that the fault computation tree FCT^f(ŝ^f) is a model of Φf.
Proof (Idea). We define mappings from local input histories for a process p ∈ P in the architecture A to local input histories for p in Af and vice versa. An
element of D(In(p))∗ is mapped to an element of D(In^f(p))∗, where the value of fp in a state of the image prefix is the value of cp from the next state in the original prefix, if such a state exists, and is equal to the tuple of values for Ip in the corresponding state of the original prefix otherwise. When mapping D(In^f(p))∗ to D(In(p))∗, the value of a variable v ∈ Ip in a state of the image prefix is the same as the value of cq^v from the next state in the original prefix, if such a state exists and there exists a process q with v ∈ Iq that is not faulty in the corresponding state of the original prefix, and is the value of fp^v from the corresponding state in the original prefix otherwise. Based on these mappings we define the respective strategies and show that they have the required properties. The formal definitions and proof can be found in the full version of this paper.
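The second mapping in the proof idea is the interesting one; the sketch below spells it out for a single history. The key names ('f_p', 'c_q', 'n_q') and the readers map are hypothetical encodings, not the paper's notation.

    def to_original_history(sigma, p, Ip, readers):
        """Map a local history of p in A^f back to one in A: the value of
        v ∈ Ip at position j is taken from a non-faulty reader's forwarded
        value c_q at position j+1 when possible, and from p's faulty input
        f_p otherwise (both f_p and c_q carry dicts over the input names)."""
        out = []
        for j, d in enumerate(sigma):
            e = {v: val for v, val in d.items() if not v.startswith('f_')}
            for v in Ip:
                q = next((q for q in readers[v] if d[f"n_{q}"] == 0), None)
                if q is not None and j + 1 < len(sigma):
                    e[v] = sigma[j + 1][f"c_{q}"][v]   # forwarded by non-faulty q
                else:
                    e[v] = d[f"f_{p}"][v]              # fall back to faulty copy
            out.append(e)
        return out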
5 From Fault Input-Output Trees to Full Trees
We now present a modification of the classical construction that transforms an automaton on total trees into one that accepts full trees. Our construction accounts for the shape of the total tree resulting from the occurrences of faults.
Consider the architecture Af and the formula Φf obtained as described in Sect. 4 from the architecture A and the fault-tolerance specification Φt. For a global input-output function g for Af, we define a fault input-output tree FOT(g) = ⟨T, dir⟩ similarly to fault computation trees: T ⊆ D(Ext^f)∗ is the greatest total tree such that for all σ ∈ D(Ext^f)∗ and all d ∈ D(Ext^f), if σ · d ∈ T, then σ ∈ T and for all p ∈ P with dir(σ)↾np = 0 it holds that d↾Op = g(σ↾I^f)↾Op. The tree FOT(g), which is a total D(Ext^f)-labeled D(Ext^f)-tree, can be represented as a full D(Ext^f) × D(O)-labeled D(Ext^f)-tree where the nodes are labeled additionally with the output of g that determines the enabled directions. Given a full D(Ext^f) × D(O)-labeled D(Ext^f)-tree T, we can determine the corresponding total characteristic tree under faults, char_F(T).
Definition 4. Let ⟨D(Ext^f)∗, l⟩ be a full D(Ext^f) × D(O)-labeled D(Ext^f)-tree. We define the characteristic tree under faults as the total D(Ext^f)-labeled D(Ext^f)-tree ⟨T, dir⟩ = char_F(⟨D(Ext^f)∗, l⟩) as follows: T ⊆ D(Ext^f)∗ is the greatest total tree such that for all σ ∈ D(Ext^f)∗ and all d ∈ D(Ext^f), if σ · d ∈ T, then σ ∈ T and the condition (∗) below holds:
(∗) for every p ∈ P, if d̂↾np = 0, then d↾Op = do↾Op, where l(σ) = ⟨d̂, do⟩.
For the CTL∗ formula Φf, we can construct [11] a symmetric parity automaton SΦ that accepts exactly the models of Φf. This automaton runs on total D(Ext^f)-labeled trees, has 2^O(|Φf|) states and five colors. Since automata transformations are simpler for automata running on full trees, from the symmetric automaton SΦ we construct the alternating parity automaton AΦ that accepts a full tree iff its characteristic tree under faults is a model of Φf. The transition function of AΦ uses the information about enabled successors given in the labels: Where SΦ sends a copy to all successors (some successor), AΦ sends a copy to all enabled successors (some enabled successor). However, here the enabled successors are determined only according to the output of the processes that are non-faulty according to the node's label.
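Condition (∗) is a per-direction membership test, which is exactly what the transition function of AΦ evaluates when it restricts □ and ♦ to enabled successors. A direct rendering, with the variable naming again illustrative:

    def enabled(label, d, processes):
        """Condition (∗) of Definition 4: direction d is enabled at a node
        labeled (dhat, do) iff d agrees with the recorded output do on O_p
        for every process p reported as non-faulty (n_p = 0)."""
        dhat, do = label
        return all(
            d[f"o_{p}"] == do[f"o_{p}"]
            for p in processes
            if dhat[f"n_{p}"] == 0
        )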
Theorem 3. If SΦ is a symmetric automaton over D(Ext^f)-labeled D(Ext^f)-trees, we can construct an alternating parity automaton AΦ such that AΦ accepts a D(Ext^f) × D(O)-labeled D(Ext^f)-tree T iff SΦ accepts char_F(T). The automaton AΦ has the same state space and acceptance condition as SΦ.
6 Synthesis of Fault-Tolerant Systems
We reduced the fault-tolerant synthesis problem for the architecture A and the formula Φt to the corresponding problem for the architecture Af and Φf. In Af we can assume that faults do not affect the routing of input and therefore we can reduce the problem to finding delay-compatible input-output functions.
Delay-Compatible Global Input-Output Functions for Af. From a symmetric automaton SΦ that accepts exactly the total D(Ext^f)-labeled D(Ext^f)-trees that are models of Φf, we construct, as explained in the previous section, an alternating parity tree automaton AΦ that accepts a full D(Ext^f) × D(O)-labeled D(Ext^f)-tree T iff char_F(T) is a model of Φf. Via standard transformations we obtain from AΦ a nondeterministic automaton NΦ, with a number of states doubly exponential in the size of Φf, that accepts a D(O)-labeled D(I^f ∪ N)-tree T iff T corresponds to a global input-output function for Af whose fault input-output tree is a model of Φf. Via a construction similar to the one in [8], we transform NΦ into a nondeterministic tree automaton DΦ that accepts exactly the labeled trees accepted by NΦ that correspond to delay-compatible global input-output functions. The size of DΦ is linear in the size of NΦ. If the language of DΦ is nonempty, the nonemptiness test produces a finite-state delay-compatible global input-output function g : D(I^f ∪ N)∗ → D(O) for Af for which FOT(g) |= Φf.
From Global Input-Output Functions to Distributed Implementations. By projecting a finite-state delay-compatible global input-output function g for Af on the sets Op of variables we obtain a set of delay-compatible input-output functions (gp)p∈P for the processes in Af. These functions are represented as finite automata which have the same set of states Qg, labeled by elements of D(Op), and the same deterministic transition function δg. For each variable tp ∈ C, we define D(tp) = Qg ∪ {⊥} and d0(tp) = ⊥, and for each cp ∈ C we have D(cp) = D(Ip^f). Then, we define a simple routing r̂ = (rp)p∈P as follows. For every process p and d ∈ D(In^f(p)), the value assigned to tp by rp is determined as follows. If d↾tq = ⊥ for all q, this value is qg0 (the initial state). Otherwise, the value is δg(t, d'), where t = d↾tq for some process q with d↾mq = 0 if one exists, or some fixed process q otherwise, and for every fq ∈ I^f, d'↾fq = d↾cq. Combining the functions (gp)p∈P with this simple routing we obtain a finite-state distributed implementation ŝ^f = (sp^f)p∈P for Af. If the tree FOT(g) is a model of Φf, then so is the tree FCT^f(ŝ^f). Clearly, vice versa, if there exists a distributed implementation ŝ^f for Af with FCT^f(ŝ^f) |= Φf, then there exists a delay-compatible global input-output function g with FOT(g) |= Φf and hence the language of DΦ is not empty.
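The routing just described is mechanical: each process forwards its faulty-input copy via cp and a tracked automaton state via tp. A sketch of the tp component follows, with processes a list, ⊥ modelled as None, and the variable naming ('t_q', 'c_q', 'm_q') once more hypothetical:

    def routing_value_tp(d, processes, delta_g, q0):
        """Value assigned to t_p by the simple routing r_p: the δg-successor
        of a tracked automaton state, taken preferably from a process q that
        was not faulty in the previous step (m_q = 0)."""
        if all(d[f"t_{q}"] is None for q in processes):
            return q0                                  # initial automaton state
        q = next((q for q in processes if d[f"m_{q}"] == 0), processes[0])
        t = d[f"t_{q}"]
        dprime = {f"f_{r}": d[f"c_{r}"] for r in processes}  # reconstruct d'
        return delta_g(t, dprime)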
Recalling the relation between the implementations in the architectures A and Af we established in Sect. 4, we obtain the following result.
Theorem 4. The fault-tolerant distributed synthesis problem is 2EXPTIME-complete for fully connected architectures and external specifications.
7 Conclusion
We have presented a synthesis algorithm that determines for a fully connected architecture and a temporal specification whether a fault-tolerant implementation exists, and, in case the answer is positive, automatically derives such an implementation. We demonstrated that the framework of incomplete information is well-suited for encoding the effects of faults on the informedness of individual processes in a distributed system. This allowed us to reduce the fault-tolerant distributed synthesis problem to single-process synthesis with incomplete information for a modified specification. We thus showed that the fault-tolerance synthesis problem is decidable and no more expensive than standard synthesis. Establishing general decidability criteria for architectures that are not fully connected as well as extending the scope to broader fault types such as Byzantine faults are two open problems that deserve further study.
References [1] Attie, P.C., Arora, A., Emerson, E.A.: Synthesis of fault-tolerant concurrent programs. ACM Trans. Program. Lang. Syst. 26(1), 125–185 (2004) [2] Bonakdarpour, B., Kulkarni, S.S., Abujarad, F.: Distributed synthesis of faulttolerant programs in the high atomicity model. In: Masuzawa, T., Tixeuil, S. (eds.) SSS 2007. LNCS, vol. 4838, pp. 21–36. Springer, Heidelberg (2007) [3] Dolev, D., Strong, H.R.: A simple model for agreement in distributed systems. In: Simons, B., Spector, A.Z. (eds.) Fault-Tolerant Distributed Computing. LNCS, vol. 448, pp. 42–50. Springer, Heidelberg (1990) [4] Ebnenasir, A., Kulkarni, S.S., Arora, A.: FTSyn: a framework for automatic synthesis of fault-tolerance. STTT 10(5), 455–471 (2008) [5] Emerson, E.A.: Temporal and modal logic. In: Handbook of Theoretical Computer Science. Formal Models and Sematics (B), vol. B, pp. 995–1072 (1990) [6] Finkbeiner, B., Schewe, S.: Uniform distributed synthesis. In: Proc. LICS 2005, June 2005, pp. 321–330 (2005) [7] Fisman, D., Kupferman, O., Lustig, Y.: On verifying fault tolerance of distributed protocols. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 315–331. Springer, Heidelberg (2008) [8] Gastin, P., Sznajder, N., Zeitoun, M.: Distributed synthesis for well-connected architectures. In: Arun-Kumar, S., Garg, N. (eds.) FSTTCS 2006. LNCS, vol. 4337, pp. 321–332. Springer, Heidelberg (2006) [9] Hadzilacos, V., Toueg, S.: A modular approach to fault-tolerant broadcasts and related problems. In: Distributed Systems, 2nd edn., Addison-Wesley, Reading (1993) [10] Kulkarni, S.S., Arora, A.: Automating the addition of fault-tolerance. In: Formal Techniques in Real-Time and Fault-Tolerant Systems, pp. 82–93 (2000)
[11] Kupferman, O., Vardi, M.Y.: Church's problem revisited. Bulletin of Symbolic Logic 5(2), 245–263 (1999)
[12] Manolios, P., Trefler, R.: Safety and liveness in branching time. In: Proc. LICS, pp. 366–374. IEEE Computer Society Press, Los Alamitos (2001)
[13] Pnueli, A., Rosner, R.: Distributed reactive systems are hard to synthesize. In: Proc. FOCS, vol. II, pp. 746–757. IEEE, Los Alamitos (1990)
Formal Verification for High-Assurance Behavioral Synthesis

Sandip Ray^1, Kecheng Hao^2, Yan Chen^3, Fei Xie^2, and Jin Yang^4

^1 Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712
^2 Department of Computer Science, Portland State University, Portland, OR 97207
^3 Toyota Technological Institute at Chicago, Chicago, IL 60637
^4 Strategic CAD Labs, Intel Corporation, Hillsboro, OR 97124
Abstract. We present a framework for certifying hardware designs generated through behavioral synthesis, by using formal verification to certify the associated synthesis transformations. We show how to decompose this certification into two components, which can be handled respectively by the complementary verification techniques of theorem proving and model checking. The approach produces a certified reference flow, composed of transformations distilled from production synthesis tools but represented as transformations on graphs with an associated formal semantics. This tool-independent abstraction disentangles our framework from the inner workings of specific synthesis tools while permitting certification of hardware designs generated from a broad class of behavioral descriptions. We provide experimental results suggesting the scalability of our approach on practical designs.
1 Introduction

Recent years have seen a steep increase in the complexity of hardware designs, making it challenging to develop reliable, high-quality systems through hand-crafted Register Transfer Level (RTL) or gate-level implementations. This has motivated a gradual migration away from RTL towards Electronic System Level (ESL) designs, which permit abstract descriptions of design functionality in high-level languages, e.g., SystemC. However, the ESL approach crucially depends on reliable tools for behavioral synthesis, that is, the automated synthesis of a hardware circuit from its ESL description. Behavioral synthesis tools apply a sequence of transformations to compile the ESL description to an RTL design. Several behavioral synthesis tools are available today [1,2,3,4]. Nevertheless, and despite the great need for it, behavioral synthesis has not yet found wide acceptance in industrial practice. A major barrier to its adoption is the lack of designers' confidence in the correctness of the synthesis tools themselves. The difference in abstraction level between a synthesized design and the ESL description puts the onus on behavioral synthesis to ensure that the synthesized design indeed conforms to the description. On the other hand, the synthesis transformations necessary to produce designs satisfying the growing demands of performance and power include complex and aggressive optimizations which must respect subtle invariants. Consequently, synthesis tools are often either (a) error-prone or (b) overly conservative, producing circuits of poor quality and performance [4,5].
This research was partially supported by a grant from Intel Corporation. Yan Chen was an M.S. student at Portland State University when he participated in this research.
In this paper, we develop a scalable, mechanized framework for certifying behavioral synthesis flows. Certification of a synthesis flow amounts to the guarantee that its output preserves the semantics of its input description; thus, the question of correctness of the synthesized design is reduced to the question of analysis of the behavioral description. Our approach is distinguished by two key features:
– Our framework is independent of the inner workings of a specific tool, and can be applied to certify designs synthesized by different tools from a broad class of ESL descriptions. This makes our approach particularly suitable for certifying security-critical hardware, which is often synthesized from domain-specific languages [6].
– The approach produces a certified reference flow, which makes explicit the generic invariants that must be preserved by different transformations. The reference flow serves as a formal specification for reliable, aggressive synthesis transformations.
Formal verification has enjoyed significant successes in the analysis of industrial hardware designs [7,8]. Nevertheless, applying formal verification directly to certify a synthesized design is undesirable for two reasons. First, it defeats the very purpose of behavioral synthesis as a vehicle for raising design abstraction, since it requires reasoning at the level of the synthesized design rather than the behavioral description. Second, the cost of analyzing a complex design is substantial, and the cost must be incurred for each design certification. Instead, our approach targets the synthesis flow, thereby raising the level of abstraction necessary for design certification.
In the remainder of this section, we first provide a brief overview of behavioral synthesis with an illustrative example; we then describe our approach in greater detail.

1.1 Behavioral Synthesis and an Illustrative Example

A behavioral synthesis tool accepts a design description and a library of hardware resources; it performs a sequence of transformations on the description to generate RTL. The transformations are roughly partitioned into the following three phases.
– Compiler transformations. These include loop unrolling, common subexpression elimination, copy propagation, code motion, etc. Furthermore, expensive operations (e.g., division) are often replaced with simpler ones (e.g., subtraction).
– Scheduling. This phase determines the clock step for each operation. The ordering between operations is constrained by the data and control dependencies. Scheduling transformations include chaining operations across conditional blocks and decomposing one operation into a sequence of multi-cycle operations based on resource constraints. Furthermore, several compiler transformations are employed, exploiting (and creating opportunities for) operation decomposition and code motion.
– Resource binding and control synthesis. This phase binds operations to functional units, allocates and binds registers, and generates the control circuit to implement the schedule.
After these transformations, the design can be expressed as RTL. This design is subjected to further manual optimizations to fine-tune for performance and power.
Each synthesis transformation is non-trivial. The consequence of their composition is a significant difference in abstraction from the original description. To illustrate this, consider the synthesis of the Tiny Encryption Algorithm (TEA) [9].
(A) C code for the TEA encryption function:

    void encrypt(uint32_t *v, uint32_t *k) {
        /* set up */
        uint32_t v0 = v[0], v1 = v[1], sum = 0, i;
        /* a key schedule constant */
        uint32_t delta = 0x9e3779b9;
        /* cache key */
        uint32_t k0 = k[0], k1 = k[1], k2 = k[2], k3 = k[3];
        /* basic cycle start */
        for (i = 0; i < 32; i++) {
            sum += delta;
            v0 += ((v1 << 4) + k0) ^ (v1 + sum) ^ ((v1 >> 5) + k1);
            v1 += ((v0 << 4) + k2) ^ (v0 + sum) ^ ((v0 >> 5) + k3);
        }
        /* end cycle */
        v[0] = v0;
        v[1] = v1;
    }

(B) [Schema of the synthesized RTL: pipeline logic feeding a combinational unit that computes out from inputs i0, i1, i2 by shift, add, and xor operations, with registers and phi nodes for v[0], v[1] (Phi v0_0), the intermediate nodes tmp41 and tmp49, the cached keys, and the constant 0x9e3779b9.]

Fig. 1. (A) C code for TEA encryption function. (B) Schema of RTL synthesized by AutoPilot.
Fig. 1 shows a C implementation and the circuit synthesized by the AutoPilot behavioral synthesis tool [10]. The following transformations are involved in the synthesis of the circuit.
– In the first phase, constant propagation removes unnecessary variables.
– In the second phase, the key scheduling transformation performed is pipelining, which enables overlapping execution of operations from different loop iterations.
– In the third phase, operations are bound to hardware resources (e.g., the "+" operation to an adder), and the FSM module is generated to schedule circuit operations.
Each transformation must respect subtle design invariants. For instance, overlapping operations from different loop iterations must avoid race conditions, and scheduling must respect data dependencies. Since such considerations are entangled with low-level heuristics, it is easy to introduce errors in the synthesis tool implementation, resulting in buggy designs [5]. However, the difference in abstraction level makes direct comparison between the C and RTL descriptions impractical; performing such a comparison through sequential equivalence checking [11] requires cost-prohibitive symbolic co-simulation to check input/output correspondence.

1.2 Approach Overview

We address the above issue by breaking the certification of behavioral synthesis transformations into two components, verified and verifying.^1 Fig. 2 illustrates our framework. A verified transformation is formally certified once and for all using theorem proving; a verifying transformation is not itself verified, but each instance is accompanied by a verification of correspondence between its input and output.
The terms “verified” and “verifying” as used here are borrowed from analogous notions in the compiler certification literature.
[Figure 2: the behavioral design description and the hardware resource library feed both the behavioral synthesis tool and a certified compiler. The verified component (theorem proving or decision procedures) applies certified primitive transformations, with offline proofs of the transformation rules; the synthesis tool supplies the sequence of applied primitive transformations (the algorithms, heuristics, or user guidance that decide the sequence need not be trusted). The result is a golden circuit model (a clocked control/data flow graph), which the verifying component (model checking) compares by equivalence checking against the synthesized RTL, possibly after manual RTL optimizations, yielding a yes/no answer.]

Fig. 2. Framework for certification of behavioral synthesis flows
The viability of this decomposition is justified by the nature of behavioral synthesis. Transformations applied at the higher levels (e.g., compiler and scheduling transformations) are generic. The cost of a monolithic proof is therefore mitigated by the reusability of the transformation over different designs. Such transformations make up the verified component. On the other hand, the optimizations performed at the lower levels are unique to the design being synthesized; these transformations constitute the verifying component. Since the verification is discharged per instance, it must be fully automatic. However, these transformations tend to be localized and independent of global invariants, making it tractable to verify them automatically by sequential equivalence checking.

1.3 Golden Circuit Model and Synthesis Certification

In a practical synthesis tool, transformations are implemented with low-level, optimized code. A naive approach for the verified component, namely formally verifying such a tool with all its optimizations, would be prohibitively expensive. Furthermore, such an approach would tie the framework to a single tool, limiting reusability. To mitigate this challenge, we develop a formal, graph-based abstraction called the clocked control/data flow graph (CCDFG), which serves as the universal golden circuit model. We discuss our formalization of CCDFG in Section 2. A CCDFG is an abstraction of the control/data flow graph (CDFG), used as an intermediate representation in most synthesis tools, augmented with a schedule. The close connection between the formal abstraction and the representation used in a synthesis flow enables us to view synthesis transformations as transformations on CCDFGs, while obviating a morass of tool-specific details.
We construct a reference flow as a sequence of CCDFG transformations as follows: each transformation generates a CCDFG that is guaranteed to preserve semantic correspondence with its input. A production transformation is decomposed into primitive transformations, together with algorithms/heuristics that determine the application sequence of these transformations.
Once the primitive transformations are certified, the algorithms or heuristics do not affect the correctness of a transformation sequence, only its performance. The reference flow requires no knowledge about the algorithms/heuristics, which are often confidential to a synthesis tool.
Given a synthesized hardware design D and its corresponding behavioral description, the certification of the hardware can be mechanically performed as follows (a sketch of this driver appears at the end of this section).
– Extract the CCDFG C from the behavioral description.
– Apply the certified primitive transformations from the reference flow, following the application sequence provided by the synthesis tool. The result is a CCDFG C′ that is close to D in abstraction level.
– Apply equivalence checking to guarantee correspondence between C′ and D.
The overall correctness of this certification is justified by the correctness of the verified and verifying components and their coupling through the CCDFG C′.
How does the approach disentangle the certification of a synthesized hardware design from the inner workings of the synthesis tool? Although each certified transformation mimics a corresponding transformation applied by the tool, from the perspective of certifying the hardware they are merely heuristic guides transforming CCDFGs to facilitate equivalence checking: certification of the synthesized hardware reduces to checking that the initial CCDFG reflects the design intent. The initial CCDFG can be automatically extracted from the synthesis tool's initial internal representation.^2 Furthermore, the framework abstracts low-level optimizations, making the verification problem tractable.
The rest of the paper is organized as follows. In Section 2 we present the semantics of CCDFGs. In Section 3 we discuss how to use theorem proving to verify the correctness of generic CCDFG transformations. In Section 4 we present our equivalence checking procedure. We provide initial experimental results in Section 5, discuss related work in Section 6, and conclude in Section 7.
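A minimal sketch of the certification driver described above (ours, not from the paper; extract_ccdfg, equivalence_check, and the trace format are hypothetical placeholders):

    def certify(description, rtl, tool_trace, certified):
        """Mechanical certification of a synthesized design D.

        tool_trace: the application sequence of primitive transformations
        reported by the synthesis tool; certified: name -> certified
        primitive transformation from the reference flow."""
        ccdfg = extract_ccdfg(description)        # initial CCDFG C
        for name, args in tool_trace:             # replay the tool's sequence
            ccdfg = certified[name](ccdfg, args)  # each step preserves semantics
        return equivalence_check(ccdfg, rtl)      # compare C' against D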
2 Clocked Control/Data Flow Graphs

A CCDFG can be viewed as a formal control/data flow graph (CDFG), used as an internal representation in most synthesis tools including Spark and AutoPilot, augmented with a schedule. Fig. 3 shows two CCDFGs for the TEA encryption. The semantics of CCDFGs are formalized in the logic of the ACL2 theorem prover [12]. This section briefly discusses the formulation of a CCDFG; for a more complete account, see [13].
The formalization of CCDFG assumes that the underlying language provides the semantics for a collection ops of primitive operations. The primitive operations in Fig. 1 include comparison and arithmetic operations. We also assume a partition of the design variables into state variables and input variables. Variable assignments are assumed to be in Static Single Assignment (SSA) form. Design descriptions are assumed to be amenable to control and data flow analysis. Control flow is broken up into basic blocks.
Since the input description is normally unclocked, the initial CCDFG does not contain schedule information, and can be viewed as a CDFG. Schedules are generated by synthesis transformations that turn the unclocked representation into a clocked one.
[Figure 3: two CCDFG diagrams for TEA. Nodes include the phi assignments (e.g., newPhi = phi(0, newbin); v1_0 = phi(v[1], tmp56); v0_0 = phi(v[0], tmp41)), the loop test newPhi == 32, the loop-body operations defining tmp26, tmp39, tmp41, tmp54, tmp56, the final writes v[0] = v0_0; v[1] = v1_0, and return; the pipelined version (B) additionally uses pl_start.]

Fig. 3. (A) Initial CCDFG of TEA encryption function. (B) Transformed CCDFG after pipelining. The shaded regions represent scheduling steps, and white boxes represent microsteps. For brevity, only the control flow is shown; data flow is omitted. Although the underlying operations are assumed to be in SSA form, the diagrams aggregate several single assignments for simplicity.
Data dependency is given by the "read after write" paradigm: op_j is data dependent on op_i if op_j occurs after op_i in some control flow path and computes an expression over some state variable v that is assigned most recently by op_i in the path. The language is assumed to disallow circular data dependencies.

Definition 1 (Control and Data Flow Graphs). Let ops = {op_1, ..., op_n} be a set of operations over some set V of (state and input) variables, and bb be a set of basic blocks, each consisting of a sequence of operations. A data flow graph G_D over ops is a directed acyclic graph with vertex set ops. A control flow graph G_C is a graph with vertex set bb and each edge labeled with an assertion over V.

An edge in G_D from op_i to op_j represents a data dependency, and an edge in G_C from bb_i to bb_j indicates that bb_i is a direct predecessor of bb_j in the control flow. An assertion on an edge holds whenever program control makes the corresponding transition.

Definition 2 (CDFG). Let ops = {op_1, ..., op_m} be a set of operations over a set of variables V, let bb = {bb_1, ..., bb_n} be a set of basic blocks over ops, and let G_D and G_C be data and control flow graphs over ops and bb respectively. A CDFG is the tuple G_CD = ⟨G_D, G_C, H⟩, where H is a mapping H : ops → bb such that H(op_i) = bb_j iff op_i occurs in bb_j.
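As an illustration of Definitions 1 and 2 (our sketch, not from the paper; the class and field names are hypothetical), a CDFG can be represented with plain adjacency structures:

    from dataclasses import dataclass

    @dataclass
    class CDFG:
        ops: set          # operation identifiers op_1, ..., op_m
        bbs: set          # basic blocks bb_1, ..., bb_n
        data_edges: set   # pairs (op_a, op_b): op_b is data dependent on op_a
        ctrl_edges: dict  # (bb_i, bb_j) -> assertion (a predicate over V)
        H: dict           # the mapping H: op -> the basic block containing it

        def ops_of(self, bb):
            # ops(bb): the operations mapped into basic block bb by H
            return {op for op in self.ops if self.H[op] == bb}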
The execution order of operations in a CDFG is irrelevant as long as control and data dependencies are respected. The definition of microsteps makes this notion explicit.

Definition 3 (Microstep Ordering and Partition). Let G_CD = ⟨G_C, G_D, H⟩, where the set of vertices of G_C is bb = {bb_1, ..., bb_l}, and the set of vertices of G_D is ops = {op_1, ..., op_n}. For each bb_k ∈ bb, a microstep ordering is a relation ≺_k over ops(bb_k) = {op_i : H(op_i) = bb_k} such that op_a ≺_k op_b if and only if there is a path from op_a to op_b in the subgraph G_{D,k} of G_D induced by ops(bb_k). A microstep partition of bb_k under ≺_k is a partition M_k of ops(bb_k) satisfying the following two conditions. (1) For each p ∈ M_k, if op_a, op_b ∈ p then op_a ⊀_k op_b and op_b ⊀_k op_a. (2) If p, q ∈ M_k with p ≠ q, op_a ∈ p, op_b ∈ q, and op_a ≺_k op_b, then for each op_a′ ∈ p and op_b′ ∈ q, op_b′ ⊀_k op_a′. A microstep partition of G_CD is a set M containing each microstep partition M_k.

If op_a and op_b are in the same partition, their order of execution does not matter; if p and q are two microsteps with p ≺_k q, the operations in p must be executed before those in q to respect the data dependencies. Note that we treat different instances of the same operation as different (with the same semantics); this permits stipulation of H as a function instead of a relation, and simplifies the formalization. In Fig. 3, each white box corresponds to a microstep partition. Since G_D is acyclic, ≺_k is an irreflexive partial order on ops(bb_k), and the notion of microstep partition is well-defined. Given a microstep partition M = {m_0, m_1, ...} of G_CD, each m_i is called a microstep of G_CD. It is convenient to view ≺_k as a partial order over the microsteps of bb_k.
CCDFGs are formalized by augmenting a CDFG with a schedule. Consider a microstep partition M of G_CD. A schedule T of M is a partition or grouping of M; for m_1, m_2 ∈ M, if m_1 and m_2 are in the same group in T, we say that they belong to the same scheduling step. Informally, if two microsteps in M are in the same group in T, then they are executed within the same clock cycle.

Definition 4 (CCDFG). A CCDFG is a tuple G = ⟨G_CD, M, T⟩, where G_CD is a CDFG, M is a microstep partition of G_CD, and T is a schedule of M.

We formalize CCDFG executions through a state-based semantics. A CCDFG state is a valuation of the state variables, and a CCDFG input is a valuation of the input variables. We also assume a well-defined initial state. Given a sequence I of inputs, an execution of a CCDFG G = ⟨G_CD, M, T⟩ is a sequence of CCDFG states that corresponds to an evaluation of the microsteps in M respecting T.
Finally, we consider outputs and observation. An output of a CCDFG G is some computable function f of (a subset of) the state variables of G; informally, f corresponds to some output signal in the circuit synthesized from G. To formalize this in ACL2's first-order logic, the output is restricted to a Boolean expression over the state variables; the domain of each state variable itself is unrestricted, which enables us to represent programs such as the Greatest Common Divisor (GCD) algorithm that do not return Boolean values. For each state s of G, the observation corresponding to an output f at state s is the valuation of f under s. Given a set F of output functions, any sequence E of states of G induces a sequence of observations O; we refer to O as the observable behavior of E under F.
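A minimal sketch (ours, with hypothetical names) of checking conditions (1) and (2) of Definition 3 for a candidate partition of one basic block, given the induced precedence ≺_k as a set of ordered pairs:

    from itertools import product

    def is_microstep_partition(partition, prec):
        """partition: list of sets of ops in one basic block bb_k.
        prec: set of pairs (a, b) meaning a precedes b under the
        path relation of the induced data-flow subgraph."""
        for p in partition:
            # (1) no two ops inside a microstep are ordered by the precedence
            for a, b in product(p, p):
                if (a, b) in prec:
                    return False
        for p in partition:
            for q in partition:
                if p is q:
                    continue
                # (2) if anything in p precedes anything in q, then
                # nothing in q may precede anything in p
                if any((a, b) in prec for a, b in product(p, q)):
                    if any((b, a) in prec for a, b in product(p, q)):
                        return False
        return True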
3 Certified Compilation

Certifying a transformation T requires showing that if the application of T on a CCDFG G generates a new CCDFG G′, then there is a provable correspondence between the executions of G and G′. The certification process crucially depends on a formal notion of correspondence relating the executions of G and G′. Note that the notion must accommodate differences in the execution order of operations as long as the sequence of observations is unaffected. The notion we use is loosely based on stuttering trace containment [14,15]. Roughly, a CCDFG G′ refines G if for each execution of G′ there is an execution of G that produces the same observable behavior up to stuttering. We formalize this notion below.

Definition 5 (Compressed Execution). Let E = s_0, s_1, ... be an execution of a CCDFG G and let F be a set of output functions over G. The compression of E under F is the subsequence of E obtained by removing each s_i such that f(s_i) = f(s_{i+1}) for every f ∈ F.

Definition 6 (Trace Equivalence). Let G and G′ be two CCDFGs on the same set of state and input variables, let E and E′ be executions of G and G′ respectively, and let F be a set of output functions. We say that E is trace equivalent to E′ if the observable behavior of the compression of E under F is the same as the observable behavior of the compression of E′ under F.

Definition 7 (CCDFG Refinement). We say that a CCDFG G′ refines G if for each execution E′ of G′ there is an execution E of G such that E is trace equivalent to E′.

Remark 1. For the verified component, we use refinement instead of full equivalence as the notion of correspondence between CCDFGs, to permit connecting the same ESL description with a number of different concrete implementations. In the verifying framework, we will use a stronger notion of equivalence (indeed, equivalence without stuttering), to facilitate sequential equivalence checking.

In addition to showing that a transformation on a CCDFG G produces a refinement of G, we must account for the possibility that a transformation may be applicable to G only if G has a specific structural characteristic; furthermore, the result of application might produce a CCDFG with a characteristic that facilitates a subsequent transformation. To make explicit the notion of applicability of a transformation, we view a transformation as a "guarded command" τ = ⟨pre, T, post⟩: τ is applicable to a CCDFG which satisfies pre and produces a CCDFG which satisfies post.

Definition 8 (Transformation Correctness). A transformation τ = ⟨pre, T, post⟩ is correct if the result of applying T to any CCDFG G satisfying pre refines G and satisfies post.

The following theorem is proved by a straightforward induction on the sequence of transformations. Here [T_0, ..., T_n] represents the composition of T_0, ..., T_n.

Theorem 1 (Correctness of Transformation Sequences). Let τ_0, ..., τ_n be a sequence of correct transformations, where τ_i = ⟨pre_i, T_i, post_i⟩, and let post_i ⇒ pre_{i+1} for 0 ≤ i < n. Then the transformation ⟨pre_0, [T_0, T_1, ..., T_n], post_n⟩ is correct.
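Definitions 5 and 6 are directly executable; a small sketch (ours, with hypothetical names) in which states are arbitrary values and output functions are callables:

    def compress(execution, outputs):
        """Definition 5: drop each state whose observation equals the next one's."""
        obs = [tuple(f(s) for f in outputs) for s in execution]
        return [s for i, s in enumerate(execution)
                if i == len(execution) - 1 or obs[i] != obs[i + 1]]

    def trace_equivalent(e1, e2, outputs):
        """Definition 6: equal observable behavior of the two compressions."""
        def observe(execution):
            return [tuple(f(s) for f in outputs)
                    for s in compress(execution, outputs)]
        return observe(e1) == observe(e2)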
Theorem 1 justifies the decomposition of a transformation into a sequence of primitive transformations. Note that the proof of Theorem 1 is independent of any specific transformation. We thus construct a reference flow as follows: (1) identify and distill a sequence τ_0, ..., τ_n of primitive transformations; (2) verify each τ_i individually; and (3) check that for each 0 ≤ i < n, post_i ⇒ pre_{i+1}. Theorem 1 guarantees the correctness of the flow.
Verifying the correctness of individual guarded transformations using theorem proving might involve significant manual effort. To ameliorate this cost, we identify and derive generic theorems that can certify a class of similar transformations. As a simple example, consider any transformation that refines the schedule. The following theorem states that each such transformation is correct.

Theorem 2 (Correctness of Schedule Refinement). Let G = ⟨G_CD, M, T⟩ and G′ = ⟨G_CD, M, T′⟩ be CCDFGs such that for any two microsteps m_i, m_j ∈ M, if T′ assigns m_i and m_j to the same group then so does T. Then G′ is a refinement of G.

Theorem 2 is admittedly trivial; it is shown here only for illustration purposes. However, the same approach can verify more complex transformations. For example, consider the constant propagation and pipelining transformations shown in Figure 3 for our TEA example. The implementations of these transformations involve significant heuristics, for instance, to determine whether to apply the transformations in a specific case, how many iterations of the loop should be pipelined, etc. However, from the perspective of correctness, the only relevant conditions for the two transformations are: (1) if a variable v is assigned a constant c, then v can be eliminated by replacing each occurrence with c; and (2) a microstep m_i can be overlapped with a microstep m_j from a subsequent iteration if for each op_i ∈ m_i and op_j ∈ m_j, op_j ⊀ op_i in G. Since these conditions are independent of the specific design (e.g., TEA) to which the transformation is applied, the same certification can be used to justify its applicability to diverse designs. The approach is viable because we employ theorem proving, which supports an expressive logic, thereby permitting stipulation of the general conditions above as formal predicates in the logic. For example, as we show in previous work [16], we can make use of first-order quantification to formalize a generic refinement proof of arbitrary pipelines, which is directly reusable for verification of the pipeline transformation in our framework. Another generic transformation widely employed in behavioral synthesis is operation balancing; its correctness depends only on the fact that the operations involved are associative and commutative, and it can be proven for CCDFGs containing arbitrary associative-commutative operations.
We end the discussion of the verified framework with another observation. Since the logic of ACL2 is executable, pre and post can be efficiently executed for a given concrete transformation.
Thus, a transformation τ = ⟨pre, T, post⟩ can be applied even before verification by using pre and post as runtime checks: if a CCDFG G indeed satisfies pre and the application of τ on G results in a CCDFG satisfying post, then this instance of application of τ on G can be composed with other compiler transformations; furthermore, the expense of the runtime assertion checking can be alleviated by generating a proof obligation for the specific instance, which is normally more tractable than a monolithic generic proof of the correctness of τ. This provides a trade-off between the computational expense of runtime checks and verification of individual instances, on the one hand, and a (perhaps deep) one-time proof of the correctness of a transformation, on the other.
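A sketch of such a guarded application (ours, not the paper's implementation; the Transform tuple and names are hypothetical):

    from collections import namedtuple

    # a guarded transformation: tau = (pre, apply, post)
    Transform = namedtuple("Transform", ["pre", "apply", "post"])

    def apply_unverified(tau, ccdfg):
        """Apply an as-yet-unverified transformation under runtime checks.

        If pre holds on the input and post holds on the output, this
        instance may be composed with certified transformations; the
        refinement obligation for this one instance remains to be
        discharged separately."""
        if not tau.pre(ccdfg):
            raise ValueError("precondition violated: transformation inapplicable")
        result = tau.apply(ccdfg)
        if not tau.post(result):
            raise ValueError("postcondition violated: instance rejected")
        return result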
4 Equivalence Checking

We now discuss how to check equivalence between a CCDFG and its synthesized circuit. The verified component ensures a close correspondence between the transformed CCDFG and the synthesized circuit, which is critical to the scalability of equivalence checking.

4.1 Circuit Model

We represent a circuit as a Mealy machine specifying the updates to the state elements (latches) in each clock cycle. Our formalization of circuits is typical of traditional hardware verification, but we make combinational nodes explicit to facilitate the correspondence with CCDFGs.

Definition 9 (Circuit). A circuit is a tuple M = ⟨I, N, F⟩ where I is a vector of inputs; N is a pair ⟨N_c, N_d⟩ where N_c is a set of combinational nodes and N_d is a set of latches; and F is a pair ⟨F_c, F_d⟩ where F_c maps each combinational node c ∈ N_c to an expression over N_c ∪ N_d ∪ I, and F_d maps each latch d ∈ N_d to a node n ∈ N_c ∪ N_d ∪ I; F_d is a delay function which takes the current value of n to be the next-state value of d.

A circuit state is an assignment to the latches in N_d. Given a sequence of valuations to the inputs i_0, i_1, ..., a circuit trace of M is the sequence of states s_0, s_1, ..., where (1) s_0 is the initial state and (2) for each j > 0, the state s_j is obtained by updating the elements in N_d given the state valuation s_{j-1} and input valuation i_{j-1}. The observable behavior of the circuit is the sequence of valuations of the outputs, which are a subset of the latches and combinational nodes.

4.2 Correspondence between CCDFGs and Circuits

Given a CCDFG G and a synthesized circuit M, it is tempting to define a notion of correspondence as follows: (1) establish a fixed mapping between the state variables of G and the latches in M, and (2) stipulate an execution of G to be equivalent to an execution of M if they have the same observable behavior. However, this does not work in general, since the mappings between state variables and latches may be different in each clock cycle. To address this, we introduce EMap : ops → N_c, mapping CCDFG operations to the combinational nodes in the circuit: each operation is mapped to the combinational node that implements the operation; the mapping is independent of clock cycles. Fig. 4 shows the mapping for the synthesized circuit of TEA. Recall from Section 1.1 that the FSM decides the control signals for the circuit; the FSM is thus excluded from the mapping. We now define the equivalence between G and M.

Definition 10. A CCDFG state x of G is equivalent to a circuit state s of M with respect to an input i and a microstep partition t if, for each operation op in t, the inputs to op according to x and i are equivalent to the inputs to EMap(op) according to s and EMap(i), i.e., the values of each input to op and the corresponding input to EMap(op) are equivalent, and the outputs of op are equivalent to the outputs of EMap(op).
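A minimal sketch (ours; all names hypothetical) of one clock cycle of the Mealy-style circuit of Definition 9, with F_c given as callables over the current valuation and F_d as the source node of each latch:

    def circuit_step(Fc, Fd, latches, inputs):
        """One clock cycle: evaluate combinational nodes, then latch updates.

        Fc: dict node -> function(valuation) (combinational expressions),
        assumed to be listed in topological order (no combinational cycles);
        Fd: dict latch -> source node; latches, inputs: current valuations."""
        val = dict(latches)
        val.update(inputs)
        for node, expr in Fc.items():  # evaluate the combinational logic
            val[node] = expr(val)
        # the next-state value of each latch is the current value of its source
        return {d: val[n] for d, n in Fd.items()}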
[Figure 4: the pipelined CCDFG of TEA (the test pl_start == 1, the phi nodes, the operations defining tmp39, tmp41, tmp49, tmp54, the loop test newPhi == 32, and newbin = newPhi + 1) drawn next to the synthesized circuit; dotted lines connect each CCDFG operation to the combinational node implementing it.]

Fig. 4. Synthesized circuit for TEA and the corresponding operation mapping with pipelined CCDFG; dotted lines represent mapping from CCDFG operations to combinational circuit nodes
Definition 11. Given a CCDFG G and a circuit M, G is equivalent to M if and only if for any execution [x_0, x_1, x_2, ...] of G generated by an input sequence [i_0, i_1, i_2, ...] and by a microstep partition [t_0, t_1, ...] of G, and the state sequence [s_0, s_1, s_2, ...] of M generated by the input sequence [EMap(i_0), EMap(i_1), EMap(i_2), ...], x_k and s_k are equivalent with respect to t_k under i_k for all k ≥ 0.

4.3 Dual-Rail Simulation for Equivalence Checking

We check equivalence between a CCDFG G and a circuit M by dual-rail symbolic simulation (Fig. 5); the two rails simulate G and M respectively, and are synchronized by clock cycle. The equivalence checking in clock cycle k is conducted as follows:
1. The current CCDFG state x_k and circuit state s_k are checked to see whether, for the input i_k, the inputs to each operation op in the scheduling step t_k are equivalent to the inputs to EMap(op). If yes, continue; otherwise, report inequivalence.
2. G is simulated by executing t_k on x_k under i_k to compute x_{k+1}, recording the outputs of each op ∈ t_k. M is simulated for one clock cycle from s_k under input EMap(i_k) to compute s_{k+1}. The outputs of each op are checked for equivalence with the outputs of EMap(op). If they are equivalent, continue; otherwise, report inequivalence.
3. The next scheduling step t_{k+1} is determined from the control flow. If t_k has multiple outgoing control edges, the last microstep of t_k executed is identified. The outgoing control edge from this microstep whose condition evaluates to true leads to t_{k+1}.
We permit both bounded and unbounded (fixed-point) simulations. In particular, the simulation proceeds until (i) the equivalence check fails, (ii) the end of a bounded input sequence is reached, or (iii) a fixed point is reached for an unbounded input sequence.
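The cycle-synchronized loop can be sketched as follows (our concrete-simulation rendering with hypothetical helper methods; the actual checkers work symbolically, over BDDs or SMT bit-vectors, and also support the fixed-point mode):

    def dual_rail_check(ccdfg, circuit, emap, inputs, bound):
        x, s = ccdfg.initial_state(), circuit.initial_state()
        t = ccdfg.initial_scheduling_step()
        for k in range(bound):
            i = inputs(k)
            # step 1: inputs of every op in t_k must match EMap(op)'s inputs
            if not inputs_equivalent(ccdfg, x, i, t, circuit, s, emap):
                return "inequivalent at cycle %d" % k
            # step 2: advance both rails one clock cycle, compare op outputs
            x_next, op_outs = ccdfg.execute_step(x, i, t)
            s_next = circuit.step(s, emap.map_input(i))
            if not outputs_equivalent(op_outs, circuit, s_next, emap, t):
                return "inequivalent at cycle %d" % k
            # step 3: pick t_{k+1} via the enabled outgoing control edge
            t = ccdfg.next_step(t, x_next)
            x, s = x_next, s_next
        return "equivalent up to bound"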
[Figure 5: the CCDFG and the circuit, related by the equivalence mapping, are each simulated one clock cycle at a time under shared input constraints; after each cycle an equivalence check either reports inequivalence or continues, until a fixed point is computed or execution reaches a given bound.]

Fig. 5. Dual-rail simulation scheme for equivalence checking between CCDFG and circuit
We have implemented the dual-rail scheme at the bit level in the Intel Forte environment [17], where symbolic states are represented using BDDs. We have also implemented the scheme at the word level with several built-in optimizations, using Satisfiability Modulo Theories (SMT); this is viable since word-level mappings between operations and circuit nodes are explicit. We use bit-vectors to encode the variables in the CCDFG and the circuit; the SMT engine checks input/output equivalence and determines control paths. Our word-level checker is based on the CVC3 SMT engine [18].
The bit-level and word-level checkers are complementary. The bit-level checker ensures that the equivalence checking is decidable, while the word-level checker provides the optimizations crucial to scalability. The word-level checker can make effective use of results from bit-level checking in many cases. One typical scenario is as follows. Suppose M is a design module of modest complexity but is awkward to check at the word level. Then the bit-level checker is used to check the equivalence of the CCDFG of M with its circuit implementation; when the word-level checker is used for equivalence checking of a module that calls M, it skips the check of M, treating the CCDFG of M and its circuit implementation as equivalent black boxes.
5 Experimental Results

We used the bit-level checker on a set of CCDFGs for GCD and the corresponding circuits synthesized by AutoPilot. The experiments were conducted on a workstation with a 3 GHz Intel Xeon processor and 2 GB of memory. Table 1 shows the statistics before and after schedule refinement (Theorem 2). Since we bit-blast all CCDFG operations, the running time grows exponentially with the bit width; for 8-bit GCD, checking requires about two hours.
It is interesting to understand how schedule refinement affects the performance of equivalence checking. Schedule refinement partitions the operations in the loop body into two clock cycles. This does not change the fixed-point computation; however, the number of cycles for which the circuit is simulated doubles. For small bit widths, the running time after schedule refinement is about two times slower than before. However, for large bit widths, the running time is dominated by the complexity of the CCDFG simulation instead of the circuit simulation. The decrease in time with the increase in bit width from 7 to 8 is likely due to BDD variable reordering.
Table 1. Bit-level equivalence checking statistics

    Bit     Circuit      Before schedule refinement    After schedule refinement
    Width   # of Nodes   Time (Sec.)   BDD Nodes       Time (Sec.)   BDD Nodes
    2       96           0.02          503             0.02          783
    3       164          0.05          4772            0.07          11113
    4       246          0.11          42831           0.24          20937
    5       342          0.59          16244           1.93          99723
    6       452          12.50         39968           27.27         118346
    7       576          369.31        220891          383.98        164613
    8       714          6850.56       1197604         3471.74       581655

Table 2. Word-level equivalence checking statistics

    Design                      GCD    TEA     DCT    3DES     3DES key
    C Code Size (# of Lines)    14     12      52     325      412
    RTL Size (# of Lines)       364    1001    688    18053    79976
    Time (Seconds)              2      15.6    30.1   355.7    2351.7
    Memory (Megabytes)          4.1    24.6    49.2   59.4     307.2
Using our word-level checker, we have checked several RTL designs synthesized by AutoPilot against CCDFGs derived from AutoPilot's intermediate representations; the statistics are shown in Table 2. The designs illustrate different facets of the framework. GCD contains a loop whose number of iterations depends on the inputs. TEA has an explicitly bounded loop. DCT contains sequential computation without loops. 3DES represents a practical design of significant size. 3DES key is included to illustrate the scalability of our approach on relatively large synthesized designs. The results demonstrate the efficacy of our word-level equivalence checking. In contrast, full word-level symbolic simulation comparing the input/output relations of C and RTL runs out of memory on all the designs except DCT (for which it needs twice as much time and memory).
6 Related Work

An early effort [19] on the verification of high-level synthesis targets the behavioral portion of VHDL [20]. A translation from behavioral VHDL to dependence flow graphs [21] was verified by structural induction based on the CSP [22] semantics. Recently, there has been research on certified synthesis of hardware from formal languages such as HOL [23], in which a compiler that automatically translates recursive function definitions in HOL to clocked synchronous hardware has been developed. A certified hardware synthesis from programs in Esterel, a synchronous design language, has also been developed [24], in which a variant of Esterel was embedded in HOL.
Dave [25] provides a comprehensive bibliography of compiler verification. One of the earliest works on compiler verification was the Piton project [26], which verified a simple assembly language compiler. Compiler certification forms a critical component of the Verisoft project [27], which aims at the correctness of implementations of computing systems with both hardware and software components. The Verifix [28] and CompCert [29]
projects have explored a general framework for the certification of compilers for various C subsets [30,31]. There has also been work on verifying compilers, where each instance of a transformation generates a proof obligation discharged by a theorem prover [32].
There has been much research on sequential equivalence checking (SEC) between RTL and gate-level hardware designs [33,34]. Research has also been done on combinational equivalence checking between high-level designs in software-like languages (e.g., SystemC) and RTL-level designs [11]. There has also been an effort on SEC between software specifications and hardware implementations [35]: GSTE assertion graphs [36] were extended so that an assertion graph edge has pre- and post-condition labels, as well as associated assignments that update state variables. There has also been work on equivalence checking with other graph representations, e.g., Signal Flow Graphs [37].
7 Conclusion

We have presented a framework for certifying behavioral synthesis flows. The framework combines the verified and verifying paradigms: high-level transformations are certified once and for all by theorem proving, while low-level tweaks and optimizations are handled through model checking. We demonstrated the use of the CCDFG structure as an interface between the two components. Certification of different compiler transformations is uniformly specified by viewing them as manipulations of CCDFGs. The transformed CCDFG can then be used for equivalence checking with the synthesized design. One key benefit of the approach is that it obviates the need for developing formal semantics for each intermediate representation generated by the compiler. Furthermore, the low-level optimizations implemented in a synthesis tool are abstracted from the reasoning framework without weakening the formal guarantee on the synthesized design. Our experimental results indicate that the approach scales to the verification of realistic designs synthesized by production synthesis tools.
In future work, we plan further improvements to scalability. In the verified component, we are formalizing other generic transformations, e.g., code motion across loop iterations. In the verifying component, we are considering the use of theorem proving to partition a CCDFG into smaller subgraphs for compositional certification. We are also exploring ways to tolerate limited perturbations in the mappings between CCDFGs and circuits (e.g., due to manual RTL tweaks) during equivalence checking.
References

1. Forte Design Systems: Behavioral Design Suite, http://www.forteds.com
2. Celoxica: DK Design Suite, http://www.celoxica.com
3. Cong, J., Fan, Y., Han, G., Jiang, W., Zhang, Z.: Behavioral and Communication Co-Optimizations for Systems with Sequential Communication Media. In: DAC (2006)
4. Gajski, D., Dutt, N.D., Wu, A., Lin, S.: High Level Synthesis: Introduction to Chip and System Design. Kluwer Academic Publishers, Dordrecht (1993)
5. Kundu, S., Lerner, S., Gupta, R.: Validating High-Level Synthesis. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 459–472. Springer, Heidelberg (2008)
6. Galois, Inc.: Cryptol: The Language of Cryptography (2007)
7. Russinoff, D.: A Mechanically Checked Proof of IEEE Compliance of a Register-Transfer-Level Specification of the AMD-K7 Floating-point Multiplication, Division, and Square Root Instructions. JCM 1 (1998)
8. O'Leary, J., Zhao, X., Gerth, R., Seger, C.J.H.: Formally Verifying IEEE Compliance of Floating-point Hardware. Intel Technology Journal Q1 (1999)
9. Wheeler, D.J., Needham, R.M.: TEA, a tiny encryption algorithm. In: Fast Software Encryption (1994)
10. AutoESL: AutoPilot Reference Manual. AutoESL (2008)
11. Hu, A.J.: High-level vs. RTL combinational equivalence: An introduction. In: ICCD (2006)
12. Kaufmann, M., Manolios, P., Moore, J.S.: Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers, Boston (2000)
13. Ray, S., Chen, Y., Xie, F., Yang, J.: Combining theorem proving and model checking for certification of behavioral synthesis flows. Technical Report TR-08-48, University of Texas at Austin (2008)
14. Abadi, M., Lamport, L.: The Existence of Refinement Mappings. TCS 82(2) (1991)
15. Lamport, L.: What Good is Temporal Logic? Information Processing 83 (1983)
16. Ray, S., Hunt Jr., W.A.: Deductive Verification of Pipelined Machines Using First-Order Quantification. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 31–43. Springer, Heidelberg (2004)
17. Seger, C.J.H., Jones, R., O'Leary, J., Melham, T., Aagaard, M., Barrett, C., Syme, D.: An industrially effective environment for formal hardware verification. TCAD 24(9) (2005)
18. Barrett, C., Tinelli, C.: CVC3. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 298–302. Springer, Heidelberg (2007)
19. Chapman, R.O.: Verified high-level synthesis. PhD thesis, Ithaca, NY, USA (1994)
20. IEEE: IEEE Std 1076: IEEE standards VHDL language reference manual
21. Johnson, R., Pingali, K.: Dependence-based program analysis. In: PLDI (1993)
22. Hoare, C.A.R.: Communicating Sequential Processes. Prentice-Hall, Englewood Cliffs (1985)
23. Gordon, M., Iyoda, J., Owens, S., Slind, K.: Automatic formal synthesis of hardware from higher order logic. TCS 145 (2006)
24. Schneider, K.: A verified hardware synthesis for Esterel. In: DIPES (2000)
25. Dave, M.A.: Compiler verification: a bibliography. SIGSOFT SEN 28(6) (2003)
26. Moore, J.S.: Piton: A Mechanically Verified Assembly Language. Kluwer Academic Publishers, Dordrecht (1996)
27. Verisoft Project: http://www.verisoft.de
28. Verifix Project: http://www.info.uni-karlsruhe.de/~verifix
29. CompCert Project: http://pauillac.inria.fr/~xleroy/compcert
30. Leinenbach, D., Paul, W.J., Petrova, E.: Towards the formal verification of a C0 compiler: Code generation and implementation correctness. In: SEFM (2005)
31. Leroy, X.: Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In: POPL (2006)
32. Pike, L., Shields, M., Matthews, J.: A Verifying Core for a Cryptographic Language Compiler. In: ACL2 (2006)
33. Baumgartner, J., Mony, H., Paruthi, V., Kanzelman, R., Janssen, G.: Scalable sequential equivalence checking across arbitrary design transformations. In: ICCD (2006)
34. Kaiss, D., Goldenberg, S., Hanna, Z., Khasidashvili, Z.: Seqver: A sequential equivalence verifier for hardware designs. In: ICCD (2006)
35. Feng, X., Hu, A.J., Yang, J.: Partitioned model checking from software specifications. In: ASP-DAC (2005)
36. Yang, J., Seger, C.J.H.: Introduction to generalized symbolic trajectory evaluation. TVLSI 11(3) (2003)
37. Claesen, L., Genoe, M., Verlind, E.: Implementation/specification verification by means of SFG-Tracing. In: CHARME (1993)
Dynamic Observers for the Synthesis of Opaque Systems

Franck Cassez^{1,*}, Jérémy Dubreil^{2,**}, and Hervé Marchand^{2,**}

^1 National ICT Australia & CNRS, Sydney, Australia
^2 INRIA Rennes - Bretagne Atlantique, Campus de Beaulieu, Rennes, France
Abstract. In this paper, we address the problem of synthesizing opaque systems by selecting the set of observable events. We first investigate the case of static observability where the set of observable events is fixed a priori. In this context, we show that checking whether a system is opaque and computing an optimal static observer ensuring opacity are both PSPACE-complete problems. Next, we introduce dynamic partial observability where the set of observable events can change over time. We show how to check that a system is opaque w.r.t. a dynamic observer and also address the corresponding synthesis problem: given a system G and secret states S, compute the set of dynamic observers under which S is opaque. Our main result is that the synthesis problem can be solved in EXPTIME.
1 Introduction

Security is one of the most important and challenging aspects in designing services deployed on large open networks like the Internet or mobile phone networks, e-voting systems, etc. For such services, naturally subject to malicious attacks, methods to certify their security are crucial. In this context, there has been a lot of research to develop formal methods for the design of secure systems, and a growing interest in the formal verification of security properties [1,2,3] and in their model-based testing [4,5,6,7,8]. Security properties are generally divided into three categories: integrity, availability and confidentiality. We focus here on confidentiality and especially information flow properties. We use the notion of opacity defined in [9], formalizing the absence of information flow, or more precisely, the impossibility for an attacker to infer the truth of a predicate (it could be the occurrence in the system of some particular sequences of events, or the fact that the system is in some particular configurations). Consider a predicate ϕ over the runs of a system G and an attacker observing only a subset of the events of G. We assume that the attacker knows the model G. In this context, the attacker should not be able to infer that a run of G satisfies ϕ. The secret ϕ is opaque for G with respect to a given partial observation if for every run of G that satisfies ϕ, there exists a run, observationally equivalent from the attacker's point of view, that does not satisfy ϕ. In such a case, the attacker can never be sure that a run of G satisfying ϕ has occurred. In the sequel, we shall consider a secret ϕ corresponding to a set of secret states.
* Author supported by a Marie Curie International Outgoing Fellowship within the 7th European Community Framework Programme.
** Authors partially supported by the Politess RNRT project.
Finally, note that the definition of opacity is general enough to capture other notions of information flow like trace-based non-interference and anonymity (see [9]). Note also that secrecy [10] can be handled as a particular case of opacity (see Section 3), and thus our framework applies to secrecy as well.

Related Work. Methods for the synthesis of opaque systems have already been investigated from the supervisory control point of view. In these frameworks, some of the events are uncontrollable and the set of events an external attacker can observe is fixed. If the system is G, the approach is then to restrict G (remove some of its behaviors) using a supervisor (or controller) C, in order to render a secret ϕ opaque in the supervised system C(G). In [11], the authors consider several secrets and attackers with different sets of observable events. They provide sufficient conditions to compute an optimal controller preserving all secrets, assuming that the controller has complete knowledge of the system and full control over it. In [12,13], the authors consider a control problem under partial observation and provide algorithms to compute the optimal controller ensuring the opacity of one secret against one attacker. Other works on the enforcement of opacity by means of controllers can be found in [14]. Note that these approaches are intrusive in the sense that the system G has to be modified.

Our Contribution. In this paper, instead of restricting the behavior of the system by means of a controller C which disables some actions, we consider dynamic observers that dynamically change the set of observable events in order to ensure opacity. Compared to the previous approaches based on supervisory control theory, this approach is not intrusive: it does not restrict G but only hides some events at different points in the course of the execution of the system. Indeed, one can think of a dynamic observer as a filter (see Figure 1) which is added on top of G.
[Figure 1: the system G emits a sequence u ∈ Σ*, which passes through the filter D; the attacker U observes only D(u).]

Fig. 1. Architecture Filtering out Sequences of Events in G
The main contributions of this paper are two-fold. First, we extend the notion of opacity from static observers (i.e., the natural projection) to dynamic observers.^1 We show how to check opacity when the dynamic observer is given by a finite automaton. Second, we give an algorithm to compute the set of all dynamic observers which can ensure opacity of a secret ϕ for a given system G. Finally, we consider an optimization problem, which is to compute a least expensive dynamic observer. The notion of dynamic observers was already introduced in [15] for the fault diagnosis problem. Notice that the fault diagnosis problem and the opacity problems are not reducible one to the other, and thus we have to design new algorithms to solve the opacity problems under dynamic observations.

Organization of the Paper. In Section 2 we introduce some notations for words, languages and finite automata. In Section 3 we define the notion of opacity with static observers and show that deciding opacity for finite automata is PSPACE-complete.
At this point, it should be mentioned that we assume the attacker has not only a perfect knowledge of the system but also of the observer.
We also consider the optimization problem of computing a largest set (cardinality-wise) of observable events ensuring opacity, and we show that this problem is PSPACE-complete as well. Section 4 is the core of the paper and considers dynamic observers for ensuring opacity. We prove that the set of all observers that ensure opacity can be computed in EXPTIME. In Section 5 we briefly discuss how to compute optimal dynamic observers. Omitted proofs and details are given in the appendix and in the extended version of this paper [16].
2 Notation and Preliminaries

Let Σ be a finite alphabet. Σ* is the set of finite words over Σ and contains the empty word ε. A language L is any subset of Σ*. Given two words u ∈ Σ* and v ∈ Σ*, we denote by u.v the concatenation of u and v, which is defined in the usual way. |u| stands for the length of the word u (the length of the empty word is zero). We let Σ^n with n ∈ N denote the set of words of length n over Σ.
Given Σ_1 ⊆ Σ, we define the projection operator on finite words, P_{Σ_1} : Σ* → Σ_1*, that removes from a sequence of Σ* all the events that do not belong to Σ_1. Formally, P_{Σ_1} is recursively defined as follows: P_{Σ_1}(ε) = ε and, for λ ∈ Σ and s ∈ Σ*, P_{Σ_1}(s.λ) = P_{Σ_1}(s).λ if λ ∈ Σ_1, and P_{Σ_1}(s) otherwise. Let K ⊆ Σ* be a language. The definition of projection for words extends to languages: P_{Σ_1}(K) = {P_{Σ_1}(s) | s ∈ K}. Conversely, let K ⊆ Σ_1*. The inverse projection of K is P^{-1}_{Σ,Σ_1}(K) = {s ∈ Σ* | P_{Σ_1}(s) ∈ K}. We omit the subscript Σ_1 in the sequel when it is clear from the context.
We assume that the system is given by an automaton G, which is a tuple (Q, q_0, Σ, δ, F) with Q a set of states, q_0 ∈ Q the initial state, δ : Q × Σ → 2^Q the transition relation, and F ⊆ Q the set of accepting states. If Q is finite, G is a finite automaton (FA). We write q →^λ whenever δ(q, λ) ≠ ∅. An automaton is complete if for each λ ∈ Σ and each q ∈ Q, q →^λ. G is deterministic if for all q ∈ Q and λ ∈ Σ, |δ(q, λ)| ≤ 1.
A run ρ from state q_0 in G is a finite sequence of transitions q_0 →^{λ_1} q_1 →^{λ_2} q_2 ⋯ q_{i−1} →^{λ_i} q_i ⋯ q_{n−1} →^{λ_n} q_n s.t. λ_{i+1} ∈ Σ and q_{i+1} ∈ δ(q_i, λ_{i+1}) for 0 ≤ i ≤ n − 1. The trace of the run ρ is tr(ρ) = λ_1.λ_2 ⋯ λ_n. We let last(ρ) = q_n, and the length of ρ, denoted |ρ|, is n. For i ≤ n we denote by ρ[i] the prefix of the run ρ truncated at state q_i, i.e., ρ[i] = q_0 →^{λ_1} q_1 ⋯ q_{i−1} →^{λ_i} q_i. The set of finite runs from q_0 in G is denoted Runs(G). A word u ∈ Σ* is generated by G if u = tr(ρ) for some ρ ∈ Runs(G). Let L(G) be the set of words generated by G. The word u ∈ Σ* is accepted by G if u = tr(ρ) for some ρ ∈ Runs(G) with last(ρ) ∈ F. The language of (finite) words L_F(G) of G is the set of words accepted by G. If G is a FA such that F = Q, we simply omit F in the tuple that defines G.
In the sequel we shall use the Post operator defined by: for X ⊆ Q, Post(X, ε) = X and, for u ∈ Σ* and λ ∈ Σ, Post(X, u.λ) = ∪_{q∈Post(X,u)} δ(q, λ). We also let Post(X, L) = ∪_{u∈L} Post(X, u) for a nonempty language L.
The product of automata is defined in the usual way: the product automaton represents the concurrent behavior of the automata with synchronization on the common events.
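The projection operator defined above is easy to make concrete; a small sketch (ours, with hypothetical names, representing words as lists of event names):

    def project(word, sigma1):
        """P_{Sigma_1}: erase every event not in the sub-alphabet sigma1."""
        return [e for e in word if e in sigma1]

    # Example: with Sigma = {a, b, tau} and Sigma_1 = {a, b},
    # the internal event tau is invisible to the observer.
    assert project(["a", "tau", "b", "tau"], {"a", "b"}) == ["a", "b"]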
Given G_1 = (Q_1, q_0^1, Σ_1, δ_1, F_1) and G_2 = (Q_2, q_0^2, Σ_2, δ_2, F_2), we denote by G_1 × G_2 the product of G_1 and G_2.
3 Opacity with Static Projections

In the sequel, we let G = (Q, q_0, Σ, δ, F) be a non-deterministic automaton over Σ and Σ_o ⊆ Σ. Enforcing opacity aims at preventing an attacker U from deducing confidential information on the execution of a system from the observation of the events in Σ_o. Given a run of G with trace u, the observation of the attacker U is given by the static natural projection P_{Σ_o}(u), following the architecture of Figure 1 with D(u) = P_{Σ_o}(u). In this paper, we consider that the confidential information is directly encoded in the system by means of a set of states F.^2 If the current trace of a run is u ∈ L(G), the attacker should not be able to deduce, from the knowledge of P_{Σ_o}(u) and the structure of G, that the current state of the system is in F.
As stressed earlier, the attacker U has full knowledge of the structure of G (he can perform computations using G, like subset constructions) but has only a partial observation of its behaviors, namely the observed traces in Σ_o*. The set of Σ_o-traces of G is Tr_{Σ_o}(G) = P_{Σ_o}(L(G)). We define the operator [[·]]_{Σ_o} by: [[ε]]_{Σ_o} = {ε} and, for μ ∈ Σ_o* and λ ∈ Σ_o, [[μ.λ]]_{Σ_o} = P^{-1}_{Σ_o}(μ).λ ∩ L(G). In other words, u ∈ [[μ.λ]]_{Σ_o} iff (1) the projection of u is μ.λ, (2) the sequence u ends with an observable "λ" event, and (3) u ∈ L(G).

Remark 1. We suppose that U reacts faster than the system. Therefore, when an observable event occurs, U can compute the possible set of states of G before G moves, even if G can do an unobservable action.
Next we introduce the notion of opacity defined in [9]. Intuitively, a set of states F is said to be opaque with respect to a pair (G, Σo) if the attacker U can never be sure that the current state of G is in the set F.

Definition 1 (State Based Opacity). Let F ⊆ Q. The secret F is opaque with respect to (G, Σo) if for all μ ∈ TrΣo(G), Post({q0}, [[μ]]Σo) ⊈ F.

We can extend Definition 1 to a (finite) family of sets F = {F1, F2, · · ·, Fk}: the secret F is opaque with respect to (G, Σo) if each F ∈ F is opaque w.r.t. (G, Σo). This can be used to express other kinds of confidentiality properties. For example, [10] introduced the notion of secrecy of a set of states F. Intuitively, F is not a secret w.r.t. G and Σo whenever, after an observation μ, the attacker either knows that the system is in F or knows that it is not in F. Secrecy can thus be handled by considering opacity w.r.t. the family {F, Q \ F}. In the sequel we consider only one set of states F and, when necessary, we point out what has to be done to solve the problems with families of sets.

Fig. 2. State based opacity illustration (an automaton over Σ = {a, b} with states q0, ..., q6; cf. Example 1)
² Equivalently, the secret can be given by a regular language over Σ∗; see [16].
Example 1. Consider the automaton G of Figure 2, with Σo = Σ = {a, b}. The secret is given by the states F = {q2, q5}. The secret F is certainly not opaque with respect to (G, Σ): by observing a trace in b∗.a.b, the attacker U knows that the system is in a secret state. Note that he does not know whether it is q2 or q5, but he still knows that the state of the system is in F.

In the sequel we shall focus on variations of the State Based Opacity Problem:

Problem 1 (Static State Based Opacity Problem).
INPUT: A non-deterministic FA G = (Q, q0, Σ, δ, F) and Σo ⊆ Σ.
PROBLEM: Is F opaque w.r.t. (G, Σo)?

3.1 Checking State Based Opacity

In order to check the opacity of F w.r.t. (G, Σo), we first introduce the classical determinization via subset construction, adapted to our definition of opacity. Deto(G) = (X, X0, Σo, Δ, Fo) denotes the deterministic automaton given by:
– X ⊆ 2^Q \ {∅}, X0 = {q0} and Fo = 2^F,
– given λ ∈ Σo, if X′ = Post(X, (Σ \ Σo)∗.λ) ≠ ∅ then Δ(X, λ) = X′.
Checking whether F is opaque w.r.t. (G, Σo) amounts to checking whether a state in Fo is reachable. To check opacity for a family {F1, F2, · · ·, Fk}, we define Fo to be the set 2^F1 ∪ 2^F2 ∪ · · · ∪ 2^Fk (as pointed out before, this enables us to handle secrecy). The previous construction shows that opacity of non-deterministic FA can be checked in exponential time. Actually, checking state based opacity for (non-deterministic) FA is PSPACE-complete. Given a FA G over Σ with F the set of accepting states, the (language) universality problem is to decide whether LF(G) = Σ∗; if not, then G is not universal. Checking language universality for non-deterministic FA is PSPACE-complete [17], and Problem 1 is equivalent to universality.

Theorem 1. Problem 1 is PSPACE-complete for non-deterministic FA.

Proof. We assume that G is complete, i.e., L(G) = Σ∗. Note that [[u]]Σ = {u}. Now, G is not universal iff there exists u ∈ Σ∗ such that Post({q0}, [[u]]Σ) ⊆ Q \ F. With the definition of state based opacity, taking Σo = Σ, we have: Q \ F is not opaque w.r.t. (G, Σ) ⇐⇒ ∃μ ∈ Σ∗ s.t. Post({q0}, [[μ]]Σ) ⊆ Q \ F. PSPACE-easiness was already known and follows from a result in [18]: the model-checking problem for a temporal logic which can specify security properties is proved there to be PSPACE-complete.

3.2 Maximum Cardinality for Static Projections

If a secret is opaque w.r.t. a set of observable events Σo, it is worthwhile noticing that it will still be opaque w.r.t. any subset of Σo. It might thus be of interest to hide as few events as possible from the attacker while still preserving opacity of the secret. Indeed, hiding an event can be seen as energy consuming, or as limiting the interactions or visibility for users of the system (not all of whom are malicious attackers), and thus should be avoided. Given the set of events Σ of G, we can check whether the secret is opaque w.r.t. some Σo ⊆ Σ. In that case, we may increase the number of visible letters and check again whether the secret remains opaque.
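Before formalizing this optimization, we illustrate the subset construction Deto(G) and the naive search over observable alphabets with a Python sketch (ours, illustrative only; its exponential cost is consistent with the PSPACE-hardness results). Here delta maps (state, event) to successor sets; all names are our own.

from itertools import combinations

def is_opaque(delta, q0, sigma, sigma_o, F):
    unobs = sigma - sigma_o

    def closure(states):                     # Post(X, (Sigma \ Sigma_o)*)
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for ev in unobs:
                for q2 in delta.get((q, ev), ()):
                    if q2 not in seen:
                        seen.add(q2)
                        stack.append(q2)
        return seen

    start = frozenset({q0})
    frontier, visited = [start], {start}
    while frontier:
        x = frontier.pop()
        if x <= F:                           # a state of Fo: secret disclosed
            return False
        pre = closure(x)
        for ev in sigma_o:                   # Delta(X, ev) = Post(X, (Sigma\Sigma_o)*.ev)
            x2 = frozenset(q2 for q in pre for q2 in delta.get((q, ev), ()))
            if x2 and x2 not in visited:
                visited.add(x2)
                frontier.append(x2)
    return True

def max_observable(delta, q0, sigma, F):
    # Problem 2.(B), brute force: the largest |Sigma_o| keeping F opaque.
    for n in range(len(sigma), -1, -1):
        for subset in combinations(sorted(sigma), n):
            if is_opaque(delta, q0, sigma, set(subset), F):
                return n, set(subset)

delta = {('s0', 'a'): {'s1'}, ('s1', 'b'): {'s2'}}
assert not is_opaque(delta, 's0', {'a', 'b'}, {'a', 'b'}, {'s2'})
assert max_observable(delta, 's0', {'a', 'b'}, {'s2'})[0] == 1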
This suggests the following optimization problem:

Problem 2 (Maximum Cardinality of Observable Events).
INPUT: A non-deterministic FA G = (Q, q0, Σ, δ, F) and n ∈ N s.t. n ≤ |Σ|.
PROBLEMS:
(A) Is there any Σo ⊆ Σ with |Σo| = n such that F is opaque w.r.t. (G, Σo)?
(B) If the answer to (A) is “yes”, find the maximum n0 such that there exists Σo ⊆ Σ with |Σo| = n0 and F is opaque w.r.t. (G, Σo).

Theorem 2. Problem 2.(A) and Problem 2.(B) are PSPACE-complete.

Proof. PSPACE-easiness follows directly, as we can guess a set Σo with |Σo| = n and check in PSPACE whether F is opaque w.r.t. (G, Σo); thus Problem 2.(A) is in NPSPACE and hence in PSPACE. PSPACE-hardness is also easy, because taking n = |Σ| amounts to checking that F is opaque w.r.t. (G, Σ), which has been shown equivalent to the universality problem (proof of Theorem 1). To solve Problem 2.(B) it suffices to iterate a binary search, and thus Problem 2.(B) is also in PSPACE. For its hardness: to check whether F is opaque w.r.t. (G, Σ), it suffices to solve Problem 2.(B) and then check whether n0 = |Σ|.
4 Opacity with Dynamic Projection

So far, we have assumed that the observability of events is given a priori; this is why we used the term static projections. We now generalize this approach by considering dynamic projections, encoded by means of dynamic observers as introduced in [15]. A dynamic projection allows us to render some events unobservable after a given observed trace (for example, some outputs of the system). To illustrate the benefits of such projections, consider the following example:

Example 2. Consider again the automaton G of Example 1, Figure 2, where F = {q2, q5}. With Σo = Σ = {a, b}, F is not opaque. If either Σo = {a} or Σo = {b}, then the secret becomes opaque. Thus, if we have to define static sets of observable events, at least one event will have to be permanently unobservable. However, the less we hide, the more informative the observable behavior of the system remains, so we should reduce the hiding of events as much as possible. We can be more efficient by using a dynamic projection that renders an event unobservable only when necessary. Indeed, after observing b∗, the attacker knows that the system is in the initial state. However, if a subsequent “a” occurs, then the attacker should not be able to observe “b”, as in this case it could know that the system is in a secret state. We can then design a dynamic event hider as follows: at the beginning, everything is observable; when an “a” occurs, the observer hides any subsequent “b” occurrences and permits only the observation of “a”; once another “a” has been observed, the observer releases its hiding by letting both “a” and “b” be observable again.
4.1 Opacity Generalized to Dynamic Projection

We now define the notion of dynamic projection and its associated dynamic observer.
Dynamic Projections and Observers. A dynamic projection is a function that decides whether to let an event be observable or to hide it, thus playing the role of a filter between the system and the attacker to prevent information flow (see Figure 1).

Definition 2. A dynamic observability choice is a mapping T : Σ∗ → 2^Σ. The (observation-based) dynamic projection induced by T is the mapping D : Σ∗ → Σ∗ defined by D(ε) = ε and, for all u ∈ Σ∗ and all λ ∈ Σ,

D(u.λ) = D(u).λ if λ ∈ T(D(u)) and D(u.λ) = D(u) otherwise.    (1)
If u ∈ Σ∗ has occurred in the system and μ ∈ Σ∗ has been observed by the attacker, i.e., μ = D(u), then the events of T(μ) are the ones currently observable. Note that this choice does not change until an observable event occurs. Given μ ∈ Σ∗, D⁻¹(μ) = {u ∈ Σ∗ | D(u) = μ} is the set of sequences that project onto μ.

Example 3. A dynamic projection D : Σ∗ → Σ∗ corresponding to the one of Example 2 can be induced by the dynamic observability choice T defined by T(u) = {a} for all u ∈ b∗.a and T(u) = {a, b} for all other sequences u ∈ Σ∗.
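Definition 2 and Example 3 translate almost literally into the following Python sketch (ours); the names dynamic_projection and T_example are our own.

import re

def dynamic_projection(T, word):
    observed = ()                      # D(epsilon) = epsilon
    for ev in word:
        if ev in T(observed):          # lambda in T(D(u)): keep it observable
            observed += (ev,)          # D(u.lambda) = D(u).lambda
    return observed                    # otherwise D(u.lambda) = D(u)

def T_example(mu):
    # T(u) = {a} for observed traces in b*.a, and {a, b} otherwise.
    return {'a'} if re.fullmatch('b*a', ''.join(mu)) else {'a', 'b'}

# After b*.a the subsequent b is hidden, as required in Example 2:
assert dynamic_projection(T_example, ('b', 'a', 'b')) == ('b', 'a')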
Given a FA G and a dynamic projection D, we denote by TrD(G) = D(L(G)) the set of observed traces. Conversely, given μ ∈ TrD(G), the set [[μ]]D of words of G that are compatible with μ is defined by [[ε]]D = {ε} and, for μ ∈ Σ∗ and λ ∈ Σ, [[μ.λ]]D = D⁻¹(μ).λ ∩ L(G). Given two different dynamic projections D1 and D2 and a system G over Σ, we say that D1 and D2 are G-equivalent, denoted D1 ∼G D2, whenever D1(u) = D2(u) for all u ∈ L(G). The relation ∼G identifies two dynamic projections when they agree on L(G); they may disagree on other words of Σ∗, but since those will not be generated by G, this makes no difference from the attacker's point of view. In the sequel we will be interested in computing only the relevant part of dynamic projections given G, and thus will compute one dynamic projection in each equivalence class.

Opacity with Dynamic Projection. We generalize Definition 1 by taking into account the new observation interface given by D.

Definition 3. Given a FA G = (Q, q0, Σ, δ, F), F is opaque with respect to (G, D) if

∀μ ∈ TrD(G), Post({q0}, [[μ]]D) ⊈ F.    (2)

Again, this definition extends to families of sets. We say that D is a valid dynamic projection if (2) is satisfied (i.e., whenever F is opaque w.r.t. (G, D)), and we denote by D the set of valid dynamic projections. Obviously, if D1 ∼G D2, then D1 is valid if and only if D2 is valid. We denote by D∼G the quotient set of D by ∼G.
Fig. 3. Example of a Dynamic Observer (a three-state observer over {a, b} with Γ(1) = {a, b}, Γ(2) = {a}, Γ(3) = {a, b})
Remark 2. Let Σo ⊆ Σ. If D is the dynamic projection given by the constant choice that makes the actions in Σo always observable (and the others always unobservable), then D(u) = PΣo(u) and we retrieve Definition 1. The property of secrecy can be extended as well using dynamic projections.
In the sequel, we shall be interested in checking the opacity of F w.r.t. (G, D), or in synthesizing a dynamic projection D ensuring this property. In Section 3, the dynamic projection was merely the natural projection, and computing the observational behavior of G was easy. Here, we need a characterization of dynamic projections that can be used to check opacity or to enforce it. To this end, we introduce the notion of dynamic observer [15], which encodes a dynamic projection in terms of automata.

Definition 4 (Dynamic observer). A dynamic observer is a complete deterministic labeled automaton O = (X, x0, Σ, δo, Γ), where X is a (possibly infinite) set of states, x0 ∈ X is the initial state, Σ is the set of input events, δo : X × Σ → X is the transition function (a total function), and Γ : X → 2^Σ is a labeling function that specifies the set of events that the observer keeps observable at state x. We require that for all x ∈ X and all λ ∈ Σ, if λ ∉ Γ(x), then δo(x, λ) = x, i.e., if the observer does not want an event to be observed, it does not change its state when such an event occurs. We extend δo to words of Σ∗ by δo(q, ε) = q and, for u ∈ Σ∗ and λ ∈ Σ, δo(q, u.λ) = δo(δo(q, u), λ).

Assuming that the observer is at state x and an event λ occurs, it outputs λ whenever λ ∈ Γ(x), or nothing (ε) if λ ∉ Γ(x), and moves to state δo(x, λ). An observer can thus be interpreted as a functional transducer taking a string u ∈ Σ∗ as input and producing as output the sequence of events it has chosen to keep observable. An example of a dynamic observer is given in Figure 3. We now relate the notion of dynamic observer to that of dynamic projection.

Proposition 1. Let O = (X, x0, Σ, δo, Γ) be an observer and define DO by: DO(ε) = ε and, for all u ∈ Σ∗, DO(u.λ) = DO(u).λ if λ ∈ Γ(δo(x0, u)) and DO(u.λ) = DO(u) otherwise. Then DO is a dynamic projection. In the sequel, we write [[μ]]O for [[μ]]DO.

Proof. To prove that DO defined above is a dynamic projection, it is sufficient to exhibit a dynamic observability choice T : Σ∗ → 2^Σ and to show that (1) holds. Let T(u) = Γ(δo(x0, DO(u))). It is easy to show by induction that δo(x0, u) = δo(x0, DO(u)), because δo(x, λ) = x when λ ∉ Γ(x). We can therefore define T(u) = Γ(δo(x0, u)), and the result follows from this remark.

Proposition 2. Given a dynamic projection D induced by T, let OD = (Σ∗, ε, Σ, δD, T) where δD(w, λ) = D(w.λ). Then OD is a dynamic observer.

Proof. OD is complete and deterministic by construction. Moreover, after a sequence u, if D(u.λ) = D(u) then δD(u, λ) = u.

Note that several equivalent observers may encode the same dynamic projection. For example, the observer depicted in Figure 3 is one observer encoding the dynamic projection of Example 3, but one can obtain others by unfolding the self-loops in states 1 or 3 an arbitrary number of times. Finally, to mimic the terminology of language theory, we say that a dynamic projection D is regular whenever there exists a finite-state dynamic observer O such that DO = D.
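To make Definition 4 concrete, here is a minimal Python sketch (ours) of a finite observer run as a transducer; the states and Γ labels follow the behavior described in Examples 2 and 3 and, up to renaming, the observer of Figure 3.

delta_o = {(1, 'a'): 2, (1, 'b'): 1,
           (2, 'a'): 3, (2, 'b'): 2,   # b hidden in state 2: self-loop required
           (3, 'a'): 3, (3, 'b'): 3}
gamma = {1: {'a', 'b'}, 2: {'a'}, 3: {'a', 'b'}}

def observe(word, x0=1):
    x, output = x0, []
    for ev in word:
        if ev in gamma[x]:             # observable at x: output ev
            output.append(ev)          # hidden events produce epsilon instead
        x = delta_o[(x, ev)]           # delta_o(x, ev) = x whenever ev is hidden
    return ''.join(output)

assert observe('bab') == 'ba'          # the trailing b is hidden
assert observe('baab') == 'baab'       # after the second a, b is visible again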
To summarize this part: with each dynamic projection D we can associate a dynamic observer OD such that D = DOD. In other words, we can work with a dynamic projection or with one of its associated dynamic observers, whichever representation is more convenient. If the dynamic projection D derived from O is valid, we say that O is a valid dynamic observer; in that case, we say that F is opaque w.r.t. (G, O), and we denote by OBS(G) the set of all valid dynamic observers.

4.2 Checking Opacity with Dynamic Observers

The first problem we address consists in checking whether a given dynamic projection ensures opacity. To do so, we assume given a dynamic observer which defines this projection map. The problem we are interested in is then the following:

Problem 3 (Dynamic State Based Opacity Problem).
INPUT: A non-deterministic FA G = (Q, q0, Σ, δ, F) and a dynamic observer O = (X, x0, Σ, δo, Γ).
PROBLEM: Is F opaque w.r.t. (G, O)?

We first construct an automaton which represents what an attacker sees under the dynamic choices of observable events made by O. To do so, we define the automaton G ⊗ O = (Q × X, (q0, x0), Σ ∪ {τ}, δ, F × X), where τ is a fresh letter not in Σ and δ is defined, for each λ ∈ Σ ∪ {τ} and (q, x) ∈ Q × X, by:
– δ((q, x), λ) = δG(q, λ) × {δo(x, λ)} if λ ∈ Γ(x);
– δ((q, x), τ) = ∪_{λ∈Σ\Γ(x)} δG(q, λ) × {x}.

Proposition 3. F is opaque w.r.t. (G, O) iff F × X is opaque w.r.t. (G ⊗ O, Σ).

Proof. Let μ ∈ TrO(G) be a trace observed by the attacker. We prove the following by induction on the length of μ: q ∈ PostG({q0}, [[μ]]O) ⇐⇒ (q, x) ∈ PostG⊗O({(q0, x0)}, [[μ]]Σ) for some x ∈ X. If μ = ε, the result is immediate. Assume that μ′ = μ.λ and let q ∈ PostG({q0}, [[μ′]]O). By definition of [[μ′]]O, we have q0 −u→ q′ −v→ q″ −λ→ q with u ∈ [[μ]]O and u.v.λ ∈ [[μ.λ]]O. By the induction hypothesis, (q′, δo(x0, u)) ∈ PostG⊗O({(q0, x0)}, [[μ]]Σ), where δo(x0, u) is the (unique) state of O after reading u. Then, there exists a word w ∈ (Σ ∪ {τ})∗ such that PΣ(w) = μ and (q0, x0) −w→ (q′, δo(x0, u)) is a run of G ⊗ O. Assume v = v1.v2. ··· .vk, k ≥ 0. As DO(u.v) = DO(u), we must have vi ∉ Γ(δo(x0, u.v1. ··· .vi−1)) for 1 ≤ i ≤ k. Hence, by construction of G ⊗ O, there is a sequence of transitions in G ⊗ O of the form

(q′, δo(x0, u)) −τ→ (·, δo(x0, u.v1)) −τ→ ··· −τ→ (q″, δo(x0, u.v))

with λ ∈ Γ(δo(x0, u.v)). Thus, (q0, x0) −w→ (q′, δo(x0, u)) −τ^k.λ→ (q, δo(x0, u.v.λ)) is a run of G ⊗ O with PΣ(w.τ^k.λ) = μ.λ = μ′. This implies (q, δo(x0, u.v.λ)) ∈ PostG⊗O({(q0, x0)}, [[μ′]]Σ). For the converse, if we have a sequence of τ transitions in G ⊗ O, they must originate from actions in G which are not observable.
The previous result is general; if O is a FA, we obtain the following theorem:

Theorem 3. For finite-state observers, Problem 3 is PSPACE-complete.

Proof. As the size of the product G ⊗ O is the product of the sizes of G and O, and opacity can be checked in PSPACE, PSPACE-easiness follows. For hardness, checking state based opacity with respect to (G, Σ) can be done using a simple observer with one state which always keeps Σ observable, and PSPACE-hardness follows.

As Proposition 3 reduces the problem of checking opacity with dynamic observers to the problem of checking opacity with static observers, Theorem 3 extends to families of sets (and thus to secrecy).

4.3 Enforcing Opacity with Dynamic Projections

So far, we have assumed that the dynamic projection/observer was given. Next, we are interested in synthesizing one such that the secret becomes opaque w.r.t. the system and this observer.

Problem 4 (Dynamic Observer Synthesis Problem).
INPUT: A non-deterministic FA G = (Q, q0, Σ, δ, F).
PROBLEM: Compute the set of valid dynamic observers OBS(G)³.

³ Our aim is actually to be able to generate at least one observer for each representative of D∼G, thus capturing all the interesting dynamic projections.

Deciding the existence of a valid observer is trivial: it suffices to check whether always hiding Σ is a solution. Moreover, note that OBS(G) can be infinite. To solve Problem 4, we reduce it to a 2-player safety game. Player 1 plays the role of an observer and Player 2 chooses what the attacker observes. Assume the automaton G can be in any of the states s = {q1, q2, · · ·, qn} after a sequence of actions has occurred. A round of the game is: given s, Player 1 chooses which letters should be observable next, i.e., a set t ⊆ Σ; it then hands over to Player 2, who picks an observable letter λ ∈ t; this determines a new set of states G can be in after λ, and the turn is back to Player 1. The goals of the two players are as follows:
– The goal of Player 2 is to pick a sequence of letters such that the set of states that can be reached after this sequence is included in F. If Player 2 can do this, then it can infer the secret F. Player 2 thus plays a reachability game, trying to enforce a particular set of states, say Bad (the states in which the secret is disclosed).
– The goal of Player 1 is the opposite: it must keep the game in a safe set of states where the secret is not disclosed. Thus Player 1 plays a safety game, trying to keep the game in the complement of Bad.
As we are playing a (finite) turn-based game, Player 2 has a strategy to enforce Bad iff Player 1 has no strategy to keep the game in the complement of Bad (turn-based finite games are determined [19]). We now formally define the 2-player game and show that it allows us to obtain a finite representation of all the valid dynamic observers. Let H = (S1 ∪ S2, s0, M1 ∪ M2, δH) be the deterministic game automaton derived from G and given by:
– S1 = 2^Q is the set of Player 1 states and S2 = 2^Q × 2^Σ is the set of Player 2 states;
– the initial state of the game is the Player 1 state s0 = {q0};
– Player 1 chooses which events of Σ to render observable, so Player 1 actions are in the alphabet M1 = 2^Σ, and Player 2 actions are in M2 = Σ;
– the transition relation δH ⊆ (S1 × M1 × S2) ∪ (S2 × M2 × S1) is given by:
  • Player 1 moves (observable events): if s ∈ S1 and t ⊆ Σ, then δH(s, t) = (s, t);
  • Player 2 moves (observed events): if (s, t) ∈ S2, λ ∈ t and s′ = Post(s, (Σ \ t)∗.λ) ≠ ∅, then δH((s, t), λ) = s′.

Remark 3. If we want to exclude the possibility of Player 1 hiding everything, it suffices to build the game H with this constraint on Player 1 moves, i.e., δH(s, t) = (s, t) only for s ∈ S1 and t ≠ ∅.
We define the set Bad of bad states to be the set of Player 1 states s s.t. s ⊆ F. For a family of sets F1, F2, · · ·, Fk, Bad is the set of states 2^F1 ∪ 2^F2 ∪ · · · ∪ 2^Fk. Let Runsi(H), i = 1, 2, be the set of runs of H that end in a Player i state. A strategy for Player i is a mapping fi : Runsi(H) → Mi that associates with each run ending in a Player i state the new choice of Player i. Given two strategies f1, f2, the game H generates the set of runs Outcome(f1, f2, H) obtained by combining the choices of Players 1 and 2 according to f1 and f2. f1 is a winning strategy for Player 1 in H for avoiding Bad if, for all Player 2 strategies f2, no run of Outcome(f1, f2, H) contains a Bad state. A winning strategy for Player 2 is a strategy f2 s.t., for all strategies f1 of Player 1, Outcome(f1, f2, H) reaches a Bad state. As turn-based games are determined, either Player 1 or Player 2 has a winning strategy. We now relate the set of winning strategies for Player 1 in H to the set of valid dynamic projections. For a run ρ of H, let PM2(ρ) = PΣ(tr(ρ)). The proof of the following Proposition 4 is given in the Appendix.

Definition 5. Given a dynamic projection D induced by the dynamic observability choice TD, we define the strategy fD such that, for every ρ ∈ Runs1(H), fD(ρ) = TD(PM2(ρ)).

Proposition 4. Let D be a dynamic projection. D is valid if and only if fD is a winning strategy for Player 1 in H.

Given a strategy f for Player 1 in H and μ ∈ Σ∗, there exists at most one run ρμ ∈ Outcome1(f, H) such that PM2(ρμ) = μ.

Definition 6. Let f be a strategy for Player 1 in H. We define the dynamic projection Df induced by the dynamic observability choice Tf : Σ∗ → 2^Σ given by: Tf(μ) = f(ρμ) if ρμ is in Outcome(f, H), and Tf(μ) = Σ otherwise.

Notice that when ρμ is not in Outcome(f, H), it does not really matter how we define Tf(μ), because there is no word w ∈ L(G) s.t. μ = Df(w).

Proposition 5. If f is a winning strategy for Player 1 in H, then Df is a valid dynamic projection.

Proof. Applying the construction of Definition 5 yields fDf = f. Since f is a winning strategy, by Proposition 4, Df is a valid dynamic projection.
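Solving such a finite turn-based safety game reduces to a standard attractor computation, sketched below in Python (ours; the encoding of states and moves is an assumption, not the paper's data structures).

# Compute Player 2's attractor of Bad; Player 1 wins exactly from the states
# outside the attractor. player[s] is 1 or 2; moves[s] is the successor set.

def winning_region_player1(states, player, moves, bad):
    attr = set(bad)                                # Player 2 wins from these
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in attr:
                continue
            succ = moves.get(s, set())
            if player[s] == 2 and succ & attr:     # Player 2 can move into attr
                attr.add(s)
                changed = True
            elif player[s] == 1 and succ and succ <= attr:
                attr.add(s)                        # Player 1 cannot avoid attr
                changed = True
    return set(states) - attr

# The most permissive strategy keeps, in each winning Player 1 state, every
# choice t whose successor delta_H(s, t) stays inside the winning region.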
Notice that we generate only one representative for each of the equivalence classes induced by ∼G. An immediate consequence of the two previous propositions, however, is that there is a bijection between the set of winning strategies of Player 1 and D∼G.

4.4 Most Permissive Dynamic Observer

We now define the notion of most permissive valid dynamic observer. For an observer O = (X, x0, Σ, δo, Γ) and w ∈ Σ∗, recall that Γ(δo(x0, w)) is the set of events that O chooses to render observable after observing w. Assume that w = λ1λ2 · · · λk and let

w̃ = Γ(x0).λ1.Γ(δo(x0, w[1])).λ2 · · · Γ(δo(x0, w[k−1])).λk,

i.e., w̃ contains the history of what O has chosen to observe at each step and the next observable event that occurred after each choice.

Definition 7. Let O∗ : (2^Σ.Σ)∗ → 2^{2^Σ}. The mapping O∗ is the most permissive valid dynamic observer⁴ ensuring the opacity of F if the following holds: O = (X, x0, Σ, δo, Γ) is a valid observer ⇐⇒ ∀w ∈ L(G), Γ(δo(x0, w)) ∈ O∗(w̃).

⁴ Strictly speaking, O∗ is not an observer, because it maps to sets of sets of events whereas observers map to sets of events. Still, we use this term because it is the usual terminology in the literature.

The definition of the most permissive valid observer states that any valid observer O must choose its set of observable events in O∗(w̃) after the history w̃; conversely, any observer that always chooses its set of observable events in O∗(w̃) is a valid observer.

Theorem 4. The most permissive valid observer O∗ can be computed in EXPTIME.

Proof. The detailed proof is given in the Appendix. As a sketch, the most permissive valid dynamic observer is obtained from the most permissive winning strategy in the game H. It is a well-known result [20] that, for a finite game, if there is a winning strategy then there is a memoryless most permissive one; moreover, the existence of a winning strategy can be decided in linear time in the size of the game. As the size of H is exponential in the sizes of G and Σ, the result follows.

We let FH be the automaton representing the most permissive observer. Theorem 4 states that FH can be used to generate any valid observer. In particular, given a finite-memory winning strategy, the corresponding valid observer is finite, and thus its associated dynamic projection is regular. An immediate corollary of Theorem 4 is the following:

Corollary 1. Problem 4 is in EXPTIME.

Example 4. To illustrate this section, we consider the following small example. The system is depicted by the automaton in Figure 4(a); the set of secret states is reduced to state (2). Figure 4(b) represents the associated game automaton, where the states of Player 1 are represented by circles and those of Player 2 by squares. The only bad state is state (2). The most permissive valid dynamic observer is obtained when Player 1 does not allow transition {a, b} to be triggered in state (1)
(otherwise, Player 2 could choose to observe either event a or b, and in this case the game would evolve into state (2) and the secret would be revealed). The dashed lines represent the transitions that are removed from the game automaton to obtain the most permissive observer. Finally, Figure 4(c) represents a valid observer O generated from the most permissive observer with the memoryless strategy f(1) = {a} and f(12) = {a, b}.

Fig. 4. Most Permissive Dynamic Observer: (a) the automaton G; (b) the game H; (c) a finite observer with Γ(x1) = {a} and Γ(x2) = {a, b}
5 Optimal Dynamic Observer

Among all the observers that ensure the opacity of the secret, some are better (in some sense) than others: they hide fewer events on average. We define here a notion of cost for observers which captures this intuition. We first introduce a general cost function and show how to compute the cost of a given pair (G, O), where G is a system and O a finite-state observer. Second, we show that among all the valid observers (those that ensure opacity) there is an optimal cost, and we can compute an observer which achieves this cost. The problems in this section and their solutions are closely related to the results in [15] and use the same tools: Karp's mean-weight algorithm [21] and a result of Zwick and Paterson [22].

We want a notion of cost which takes into account the set of events the observer chooses to hide and also for how long it hides them. We assume that the observer is a finite automaton O = (X, x0, Σ, δo, Γ). With each set of observable events Σ′ ∈ 2^Σ we associate a cost of hiding Σ \ Σ′, which is a positive integer; we denote this function Cost : 2^Σ → N. Now, if O is in state x, the current cost per time unit is Cost(Γ(x)). Let Runsn(G) be the set of runs of length n in Runs(G). Given a run ρ = q0 −λ1→ q1 · · · qn−1 −λn→ qn ∈ Runsn(G), let xi = δo(x0, wi) with wi = tr(ρ[i]). The cost associated with ρ ∈ Runsn(G) is defined by:

Cost(ρ, G, O) = (1/(n+1)) · Σ_{i=0..n} Cost(Γ(xi)).

Notice that the time basis we take is the number of steps which occurred in G. Thus, if the observer is in state x and chooses to observe Γ(x) at steps i and i+1, then Cost(Γ(x)) is counted twice: at steps i and i+1. The definition of the cost of a run corresponds to the average cost per time unit, the time unit being a step of the run in G. Define the cost of the set of runs of length n by:

Cost(n, G, O) = max{Cost(ρ, G, O) | ρ ∈ Runsn(G)}.

The cost of an observer with respect to a system G is

Cost(G, O) = lim sup_{n→∞} Cost(n, G, O)    (3)

(notice that the limit may not exist, whereas the limit sup is always defined). To compute the cost of a given observer, we can use an algorithm similar to the one given in [15], based on Karp's maximum mean-weight cycle algorithm [21]:

Theorem 5. Computing Cost(G, O) is in PTIME.

Proof. One can prove that the cost of an observer is equal to the maximum mean-weight cycle in G ⊗ O. The size of G ⊗ O is polynomial in the sizes of G and O, and computing the maximum mean-weight cycle can be done in linear time w.r.t. the size of G ⊗ O.

Finally, we can solve the following optimization problem:

Problem 5 (Bounded Cost Observer).
INPUTS: An automaton G = (Q, q0, Σ, δ, F) and an integer k ∈ N.
PROBLEMS:
(A) Is there any O ∈ OBS(G) s.t. F is opaque w.r.t. (G, O) and Cost(G, O) ≤ k?
(B) If the answer to (A) is “yes”, compute a witness observer O s.t. Cost(G, O) ≤ k.

To solve this problem we use a result from Zwick and Paterson [22], which extends Karp's algorithm to finite-state games.

Theorem 6. Problem 5 can be solved in EXPTIME.

The solution to this problem is the same as the one given in [15], and the proof for the opacity problem is detailed in [16]. The key result is Theorem 4, which enables us to represent all the winning strategies in H as a finite automaton. Synchronizing G with the most permissive valid dynamic observer FH produces a weighted game, the optimal value of which can be computed in PTIME (in the size of the product) using the algorithm in [22]; the optimal strategies can be computed in PTIME as well. As G × FH has size exponential in G and Σ, the result follows.
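For Theorem 5, the following Python sketch (ours) shows Karp's maximum mean-weight cycle computation [21]; to obtain Cost(G, O), one would run it on the reachable product G ⊗ O, putting the weight Cost(Γ(x)) on every edge leaving a product state (q, x). The graph encoding is an assumption of the sketch.

def max_mean_cycle(n, edges):
    # Vertices are 0..n-1 (all assumed reachable from vertex 0);
    # edges is a list of (u, v, w) triples.
    NEG = float('-inf')
    d = [[NEG] * n for _ in range(n + 1)]   # d[k][v]: best k-edge walk 0 -> v
    d[0][0] = 0.0
    for k in range(1, n + 1):
        for (u, v, w) in edges:
            if d[k - 1][u] > NEG and d[k - 1][u] + w > d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = NEG
    for v in range(n):
        if d[n][v] == NEG:
            continue
        best = max(best, min((d[n][v] - d[k][v]) / (n - k)
                             for k in range(n) if d[k][v] > NEG))
    return best                              # -inf if no cycle is reachable

# A 2-cycle with weights 3 and 1 has mean weight 2:
assert max_mean_cycle(2, [(0, 1, 3), (1, 0, 1)]) == 2.0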
366
F. Cassez, J. Dubreil, and H. Marchand
6 Conclusion

In this paper, we have investigated the synthesis of opaque systems. In the context of static observers, where the observability of events is fixed a priori, we provided a (PSPACE-complete) algorithm to compute a maximal subalphabet of observable actions ensuring opacity. We have also defined a model of dynamic observers that determine whether an event is observable depending on the trace observed so far. We proved that the synthesis of dynamic observers can be solved in EXPTIME; EXPTIME-hardness is left open. We assumed that a dynamic observer can change the set of observable events only after an observable event has occurred. This assumption should fit most applications, since the knowledge of the attacker also depends on the observed traces. It would be interesting to investigate the case where this decision depends on the word executed by the system. The case where observability depends on the state of the system should also be considered, as it would be easy to implement in practice. Finally, the semantics of an observed trace used throughout this article is based on the assumption that the attacker can react, i.e., acquire knowledge, faster than the system evolves. It would be interesting to adapt this work to other types of semantics.
References

1. Lowe, G.: Towards a completeness result for model checking of security protocols. Journal of Computer Security 7(2-3), 89–146 (1999)
2. Blanchet, B., Abadi, M., Fournet, C.: Automated verification of selected equivalences for security protocols. In: 20th IEEE Symposium on Logic in Computer Science (LICS 2005), Chicago, IL, pp. 331–340. IEEE Computer Society, Los Alamitos (2005)
3. Hadj-Alouane, N., Lafrance, S., Lin, F., Mullins, J., Yeddes, M.: On the verification of intransitive noninterference in multilevel security. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 35(5), 948–957 (2005)
4. Schneider, F.B.: Enforceable security policies. ACM Trans. Inf. Syst. Secur. 3(1), 30–50 (2000)
5. Ligatti, J., Bauer, L., Walker, D.: Edit automata: enforcement mechanisms for run-time security policies. Int. J. Inf. Sec. 4(1-2), 2–16 (2005)
6. Darmaillacq, V., Fernandez, J.C., Groz, R., Mounier, L., Richier, J.L.: Test generation for network security rules. In: Uyar, M.Ü., Duale, A.Y., Fecko, M.A. (eds.) TestCom 2006. LNCS, vol. 3964, pp. 341–356. Springer, Heidelberg (2006)
7. Le Guernic, G.: Information flow testing: the third path towards confidentiality guarantee. In: Cervesato, I. (ed.) ASIAN 2007. LNCS, vol. 4846, pp. 33–47. Springer, Heidelberg (2007)
8. Dubreil, J., Jéron, T., Marchand, H.: Monitoring information flow by diagnosis techniques. Technical Report 1901, IRISA (August 2008)
9. Bryans, J., Koutny, M., Mazaré, L., Ryan, P.: Opacity generalised to transition systems. International Journal of Information Security 7(6), 421–435 (2008)
10. Alur, R., Černý, P., Zdancewic, S.: Preserving secrecy under refinement. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 107–118. Springer, Heidelberg (2006)
11. Badouel, E., Bednarczyk, M., Borzyszkowski, A., Caillaud, B., Darondeau, P.: Concurrent secrets. Discrete Event Dynamic Systems 17, 425–446 (2007)
12. Dubreil, J., Darondeau, P., Marchand, H.: Opacity enforcing control synthesis. In: Proceedings of the 9th International Workshop on Discrete Event Systems (WODES 2008), Göteborg, Sweden, May 2008, pp. 28–35 (2008)
13. Dubreil, J., Darondeau, P., Marchand, H.: Opacity enforcing control synthesis. Technical Report 1921, IRISA (February 2009)
14. Takai, S., Oka, Y.: A formula for the supremal controllable and opaque sublanguage arising in supervisory control. SICE Journal of Control, Measurement, and System Integration 1(4), 307–312 (2008)
15. Cassez, F., Tripakis, S.: Fault diagnosis with static or dynamic diagnosers. Fundamenta Informaticae 88(4), 497–540 (2008)
16. Cassez, F., Dubreil, J., Marchand, H.: Dynamic observers for the synthesis of opaque systems. Technical Report 1930, IRISA (May 2009)
17. Stockmeyer, L.J., Meyer, A.R.: Word problems requiring exponential time: preliminary report. In: STOC, pp. 1–9. ACM, New York (1973)
18. Alur, R., Černý, P., Chaudhuri, S.: Model checking on trees with path equivalences. In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 664–678. Springer, Heidelberg (2007)
19. Martin, D.A.: Borel determinacy. Annals of Mathematics 102(2), 363–371 (1975)
20. Thomas, W.: On the synthesis of strategies in infinite games. In: Mayr, E.W., Puech, C. (eds.) STACS 1995. LNCS, vol. 900, pp. 1–13. Springer, Heidelberg (1995), invited talk
21. Karp, R.: A characterization of the minimum mean cycle in a digraph. Discrete Mathematics 23, 309–311 (1978)
22. Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theoretical Computer Science 158(1–2), 343–359 (1996)
Symbolic CTL Model Checking of Asynchronous Systems Using Constrained Saturation

Yang Zhao and Gianfranco Ciardo

Department of Computer Science and Engineering, University of California, Riverside
{zhaoy,ciardo}@cs.ucr.edu
Abstract. The saturation state-space generation algorithm has demonstrated clear improvements over state-of-the-art symbolic methods for asynchronous systems. This work is motivated by efficiently applying saturation to CTL model checking. First, we introduce a new “constrained saturation” algorithm which constrains state exploration to a set of states satisfying given properties. This algorithm avoids expensive after-the-fact intersection operations and retains the advantages of saturation, namely, exploiting event locality and benefiting from recursive local fixpoint computations. Then, we employ constrained saturation to build the set of states satisfying EU and EG properties for asynchronous systems. The new algorithm can achieve orders-of-magnitude reductions in runtime and memory consumption with respect to methods based on breadth-first search, and even with respect to a previously proposed hybrid approach that alternates between “safe” saturation and “unsafe” breadth-first searches. Furthermore, the new approach is fully general, as it does not require the next-state function to be expressible in Kronecker form. We conclude this paper with a discussion of possible future work, such as building the set of states belonging to strongly connected components.
1 Introduction

CTL model checking is an important state-of-the-art approach in formal verification. Paired with the use of BDDs [2], which provide a time- and space-efficient data structure to perform operations such as union, intersection, and relational product over sets of states, symbolic model checking [13] is one of the most successful techniques to verify industrial hardware and embedded software systems. Most current symbolic model checkers, such as NuSMV [12], employ methods based on breadth-first search (BFS). The saturation algorithm [7] employs a very different philosophy, recursively computing “local fixpoints”. A series of publications has proven the clear advantages of saturation for state-space generation over traditional symbolic approaches [6,8,9,14], while extending its applicability to increasingly general settings. However, our previous attempts to apply saturation to CTL model checking have been only partially successful [10].
Work supported in part by the National Science Foundation under grant CCF-0848463.
This paper addresses CTL model checking for asynchronous systems by proposing an extended constrained saturation algorithm. This algorithm constrains saturation-based state-space exploration to a given set of states without explicitly executing the expensive intersection operations normally required to implement CTL operators. Furthermore, unlike the original approach [10], where the next-state function had to satisfy a Kronecker expression, the proposed algorithm is fully general, as it employs a disjunctive-then-conjunctive encoding that exploits the common characteristics of asynchronous systems. Constrained saturation can be used to compute the set of states satisfying an EU formula, as well as to efficiently compute the backward reachability relation, which we in turn use in a new algorithm to compute the set of states satisfying an EG formula. The remainder of this paper is organized as follows. Section 2 introduces the relevant background on MDDs and the saturation algorithm. Section 3 introduces constrained saturation and the new EU computation algorithm. Section 4 proposes a new EG computation algorithm based on the backward reachability relation. We conclude and outline future work in the last section.
2 Preliminaries

Consider a discrete-state model (S, Sinit, E, N) where the state space S is given by the product SL × · · · × S1 of the local state spaces of L submodels, that is, each (global) state i is a tuple (iL, · · ·, i1), where ik ∈ Sk for L ≥ k ≥ 1; the set of initial states is Sinit ⊆ S; the set of (asynchronous) events is E; the next-state function N : S → 2^S is described in disjunctively partitioned form as N = ∪_{α∈E} Nα, where Nα(i) is the set of states that can be reached in one step when α occurs, or fires, in state i. We say that α is enabled in state i if Nα(i) ≠ ∅. Correspondingly, N⁻¹ and Nα⁻¹ denote the inverse next-state functions, e.g., Nα⁻¹(i) is the set of states that can reach i in one step by firing event α.

For high-level models, the generation of the state space S is an important and interesting problem in itself. This is particularly true for models such as Petri nets, where the sets Sk might not be known a priori. However, using the saturation algorithm, the state spaces of complex models can usually be generated in acceptable runtime, so we assume here that state-space generation is a preprocessing step already performed prior to model checking. Consequently, the sets Sk, their sizes nk, and the reachable state space Srch ⊆ S are assumed known in the following discussion, and we let Sk = {0, ..., nk − 1} without loss of generality. More details about our state-space generation algorithm are given in Section 2.3.

2.1 Symbolic Encoding of Sets of States
Binary decision diagrams (BDDs) [2] are the most widely used data structure to store and manipulate sets of states. Instead of BDDs, we employ multi-way decision diagrams (MDDs) to encode sets of states. MDDs extend BDDs by allowing integer-valued variables, so that the choices for nodes at level k naturally correspond to the local states of submodel k.
Definition 1. A quasi-reduced MDD is an acyclic directed edge-labeled graph where:
– Each node a belongs to a level in {L, · · ·, 1, 0}, denoted a.lvl.
– There is a single root node.
– The only terminal nodes are 0 and 1, and they are at level 0.
– A nonterminal node a at level k, with L ≥ k ≥ 1, has nk outgoing edges, each labeled with a different integer in {0, ..., nk − 1}. The node pointed to by the edge labeled with ik is denoted a[ik] and, if not 0, it must be at level k − 1.
The set encoded by MDD node a at level k > 1 is then recursively defined as

B(a) = ∪_{ik ∈ Sk} {ik} × B(a[ik]),
with terminal cases B(0) = ∅ and B(a) = {i1 ∈ S1 : a[i1] = 1} if a.lvl = 1.
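An illustrative Python sketch (ours) of Definition 1, with B(a) computed by explicit enumeration (a real MDD library would of course share nodes and cache results):

class Node:
    def __init__(self, lvl, children):
        self.lvl = lvl               # level in {L, ..., 1}
        self.children = children     # children[i_k], one per i_k in S_k

ZERO, ONE = 'zero', 'one'            # the two terminal nodes, at level 0

def B(a):
    if a == ZERO:
        return set()
    if a == ONE:
        return {()}                  # one (empty) completion
    return {(i,) + rest
            for i, child in enumerate(a.children)
            for rest in B(child)}

# Two local states per level; a 2-level MDD encoding {(0, 1), (1, 0)}:
root = Node(2, [Node(1, [ZERO, ONE]), Node(1, [ONE, ZERO])])
assert B(root) == {(0, 1), (1, 0)}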
2.2 Symbolic Encoding of the Next-State Functions
In a traditional symbolic approach, the next-state function N is encoded using a 2L-level BDD or MDD, with levels ordered as L, L′, ..., 1, 1′, 0, where levels L through 1, corresponding to the “from” states, and levels L′ through 1′, corresponding to the “to” states, are interleaved. Given a next-state function N and a set of states X, the image computation or relational product builds the set N(X) = {j : ∃i ∈ X, (i, j) ∈ N}. A commonly employed technique is to conjunctively or disjunctively partition the next-state function encoded by a monolithic MDD into several MDDs [3,11]. For asynchronous systems, it is natural to disjunctively store each next-state function Nα, for each event α ∈ E, so that the overall next-state function N = ∪_{α∈E} Nα is actually never stored as a single MDD.

Locality is a fundamental property enjoyed by most asynchronous systems: an event α is independent of the kth submodel (or, briefly, of k) if its enabling does not depend on ik and its firing does not change the value of ik. A level k belongs to the support set of event α, denoted supp(α), if α is not independent of k. We define Top(α) and Bot(α) to be the maximum and minimum levels in supp(α), and Ek to be the set of events {α ∈ E : Top(α) = k}. Also, we let Nk be the next-state function corresponding to all events in Ek, i.e., Nk = ∪_{Top(α)=k} Nα.

In previous work [9], the locality of events is indicated by the presence of identity matrices in a Kronecker description of the model's next-state function. However, while the existence of a Kronecker description is not a restriction for some formalisms (e.g., ordinary Petri nets, even if extended with inhibitor and reset arcs), it does impose constraints on the decomposition granularity in others (e.g., Petri nets with arbitrary marking-dependent arc cardinalities or transition guards). [11] proposes a new encoding for the transition relation based on a disjunctive-conjunctive partition and a fully-identity reduction rule for MDDs. [14] compares several possible MDD reduction rules and finally adopts the quasi-identity-fully (QIF) reduction rule for MDDs encoding next-state functions.
Fig. 1. An example of set and next-state function encoding using MDDs: (a) a 3-level MDD encoding Sinit = {000}; (b) MDDs encoding Nα, Nβ, and Nγ; (c) the next-state functions merged by levels (N3 and N2)
This encoding is successfully applied to saturation and allows for complex variable dependencies in asynchronous systems. A 2L-level MDD encoding next-state function Nα with the QIF reduction rule satisfies the following:
– The root node is at level Top(α), an unprimed level.
– (Interleaving) Level k is immediately followed by level k′, for L ≥ k ≥ 1.
– (Quasi-reduced rule) If k ∈ supp(α), level k is present on every path in the MDD encoding Nα.
– (Identity-reduced rule) If k ∈ supp(α), level k′ can be present on a path in the MDD encoding Nα, or it can be absent. In the latter case, the meaning of an edge a[ik] = b, with a.lvl = k > b.lvl, is that a singular node c at level k′ has been skipped, i.e., a node with c[ik] = b and c[jk] = 0 for jk ∈ Sk \ {ik}.
– (Fully-reduced rule) If k ∉ supp(α), level k is absent on every path in the MDD encoding Nα. The meaning of an edge a[il] = b, with a.lvl > k > b.lvl, is that a redundant node c at level k has been skipped, i.e., a node with c[ik] = b for every ik ∈ Sk.
– (Fully-identity-reduced pattern) This is a combination of the two preceding cases. The interpretation of an edge skipping levels k and k′ (or of the root being at a level below k′) is that α is independent of k, i.e., event α is enabled when the kth local state is ik, for any ik ∈ Sk, and, if α fires, the kth local state remains equal to ik in the new state.

Figure 1(a) shows a 3-level MDD encoding an initial set of states containing a single state, Sinit = {000}. Figure 1(b) shows three MDDs encoding the next-state functions for events α, β, and γ, respectively. Figure 1(c) shows the next-state functions merged by levels: N3 = Nα, N2 = Nβ ∪ Nγ, and N1 = ∅ (not shown). Note, for example, that N2 does not depend on level 1 and that the primed node at level 3′ for the 1-child of the root of N3 is skipped due to the identity-reduced rule.
2.3 State Space Generation
All symbolic approaches to state-space generation use some variant of symbolic image computation. The simplest approach is a breadth-first iteration, directly
implementing the definition of the state space Srch as the fixpoint of the expression Sinit ∪ N(Sinit) ∪ N²(Sinit) ∪ N³(Sinit) ∪ · · ·. Locality and the disjunctive partition of the next-state function form instead the basis for the saturation algorithm. The key idea is to fire events node-wise, bottom-up, and exhaustively, instead of level-wise and exactly once per iteration. A node a at level k is saturated if it is a fixpoint with respect to firing any event that is independent of all levels above k:

∀h, k ≥ h ≥ 1, ∀α ∈ Eh, SL × · · · × Sk+1 × B(a) ⊇ Nα(SL × · · · × Sk+1 × B(a)).

Starting from the MDD encoding the initial states, the nodes in the MDD are saturated in order, from the bottom level to the top level. One of the advantages of saturation is that, given the QIF reduction, the application of Nα needs to start only at level Top(α), and not all the way up at the root of the entire MDD. Every node is saturated before being checked in the unique table, including nodes newly created when firing an event on a node at a higher level.
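For contrast, the breadth-first fixpoint is easy to state over explicit sets; a symbolic engine performs the same loop on MDDs, while saturation instead fires events exhaustively per node, bottom-up. A sketch (ours):

def bfs_state_space(s_init, next_events):
    # next_events: one function per event, mapping a state to its successors
    # (the disjunctive partition of N).
    reached = set(s_init)
    frontier = set(s_init)
    while frontier:
        image = {j for i in frontier for n in next_events for j in n(i)}
        frontier = image - reached
        reached |= frontier
    return reached

# A modulo-4 counter with a single increment event:
assert bfs_state_space({0}, [lambda i: {(i + 1) % 4}]) == {0, 1, 2, 3}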
2.4 CTL Model Checking
CTL is a widely used temporal logic to describe system properties, thanks to its simple yet expressive syntax. All CTL properties are state properties and can be checked by manipulating sets of states. It is well known that {EX, EU, EG} is a complete set of operators for CTL, that is, it can express any other CTL operator (for example, the EF operator is a special case of EU, since EFp ≡ E true U p). The set of states satisfying EXp can be easily computed as the relational product N⁻¹(P), where P is the set of states satisfying property p. Building the set of states satisfying EFp is instead essentially the same process as state-space generation, the only differences being that we start from P instead of Sinit and that we go backwards instead of forwards, thus using N⁻¹ (or Nα⁻¹, or Nk⁻¹, as appropriate) instead of N.

The traditional algorithm to obtain the set of states satisfying EpUq computes a least fixpoint (see EUtrad in Figure 2; all sets of states and relations over states in our pseudocode are, of course, encoded using MDDs). Starting from Q, the set of states satisfying q, it adds the intersection of the preimage of the explored states X with the states in P; the newly computed states join the explored states for the next iteration. The number of iterations thus equals the length of the longest path of states in P \ Q reaching a state in Q.

A saturation-based EU computation algorithm was instead proposed in [10] (see EUsat in Figure 2). First, it partitions the set E of events into safe events ES and unsafe events EU = E \ ES, where α ∈ E is safe iff Nα⁻¹(P ∪ Q) ⊆ P, i.e., following its firing backwards from P ∪ Q we can only find states in P (alternatively, we can restrict all sets by intersecting them with the reachable states Srch in the above test). The algorithm iteratively (1) saturates the MDD encoding the set of explored states using only safe events, then (2) fires each unsafe event once, using NU⁻¹ = ∪_{α∈EU} Nα⁻¹, in breadth-first fashion, then (3) intersects the result with P ∪ Q, and finally (4) adds the result to the working set X.
EUtrad(in P, Q): set of states
1 declare X : set of states
2 X ← Q;
3 repeat
4   X ← X ∪ (N⁻¹(X) ∩ P);
5 until X does not change;
6 return X;

EGtrad(in P): set of states
1 declare X : set of states
2 X ← P;
3 repeat
4   X ← X ∩ N⁻¹(X);
5 until X does not change;
6 return X;

EUsat(in P, Q): set of states
1 declare X, Y : set of states;
2 declare EU, ES : set of events;
3 ClassifyEvents(P ∪ Q, EU, ES);
4 X ← Q;
5 Saturate(X, ES);
6 repeat
7   Y ← X;
8   X ← X ∪ (NU⁻¹(X) ∩ (P ∪ Q));
9   if X ≠ Y then
10    Saturate(X, ES);
11 until X = Y;
12 return X;

Fig. 2. Traditional and saturation-based CTL model checking algorithms
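Read over explicit sets, the two traditional fixpoints of Figure 2 correspond to the following Python sketch (ours); pre(X) plays the role of the symbolic preimage N⁻¹(X).

def eu_trad(pre, P, Q):
    X = set(Q)
    while True:
        new = X | (pre(X) & P)       # least fixpoint: grow backwards inside P
        if new == X:
            return X
        X = new

def eg_trad(pre, P):
    X = set(P)
    while True:
        new = X & pre(X)             # greatest fixpoint: keep only states
        if new == X:                 # that still have a successor in X
            return X
        X = new

# Tiny example: 0 -> 1 -> 2, 2 -> 2.
edges = {(0, 1), (1, 2), (2, 2)}
pre = lambda X: {i for (i, j) in edges if j in X}
assert eu_trad(pre, {0, 1}, {2}) == {0, 1, 2}
assert eg_trad(pre, {0, 1, 2}) == {0, 1, 2}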
The traditional EG algorithm (see EGtrad in Figure 2) computes a greatest fixpoint by iteratively eliminating from the working set X the states with no successor in X. [10] also attempts to compute the set of states satisfying EGp using forward and backward EU saturation from a single state in P; however, this approach is more efficient than the traditional algorithm only in very special cases.
3 Constrained Saturation for the EU Operator
The set of states satisfying EpUq is a least fixpoint, so the saturation algorithm could in principle be employed to compute it efficiently. The challenge, however, arises from the need to “filter out” the states not in P before exploring their predecessors: failure to do so can result in paths that temporarily leave P, so that the result may include states not satisfying EpUq. The saturation algorithm does not find states in breadth-first order, as the process of saturating a node often consists of firing a series of events, and performing an expensive intersection operation after each firing would be enormously less time and memory efficient. The advantage of Algorithm EUsat [10] over EUtrad depends on the structure of the model: if there are no safe events with respect to a given property p, EUsat degrades to the simple breadth-first exploration of EUtrad. To overcome this difficulty, we propose two approaches, both aimed at exploring only states in P without requiring an expensive intersection after each firing.

1. Saturation with Constrained Next-State Functions. For each Nk⁻¹, we build a constrained inverse next-state function N⁻¹k,P such that

j ∈ N⁻¹k,P(i) ⇐⇒ (j ∈ P) ∧ j ∈ Nk⁻¹(i).

Algorithm ConsNSF in Figure 3 builds the MDD representation of N⁻¹α,P.
2. Constrained Saturation. This is the main contribution of our paper. We do not explicitly constrain the next-state functions, but instead perform a “check-and-fire” step when computing the constrained preimage (function ConsRelProd in the pseudocode of Figure 3), based on the following observation:

B(t) = RelProd(s, r) ∩ B(a) ⇐⇒ ∀i′ ∈ Sl, B(t[i′]) = (∪_{i∈Sl} RelProd(s[i], r[i][i′])) ∩ B(a[i′]),    (1)
where t and s are L-level MDDs encoding sets of states, l = s.lvl, and r is a 2L-level MDD encoding a next-state function. This can be seen as a form of the ITE operator [2], widely used in BDD operations, but extended from boolean to integer variables. The overall process of EU computation based on constrained saturation is shown in Figure 3, where the key differences from the saturation algorithm in [9] are marked.

The idea of the first approach is straightforward: all constrained next-state functions N⁻¹α,P are safe by definition. According to the saturation-based EU computation algorithm of Figure 2, the result is then obtained in a single iteration (a single call to Saturate). The downside of this approach is a possible decrease in locality. A property p is dependent on level k if the value of ik affects the satisfiability of p, i.e., if the (fully-reduced) encoding of p has nodes at level k. After constraining a next-state function Nα with p, the levels on which p depends become part of the support, thus Top(N⁻¹α,P) = max{Top(P), Top(Nα⁻¹)}. If P depends on level L, all the constrained next-state functions belong to EL and the saturation algorithm degrades to BFS, losing its advantages.

The second approach, constrained saturation, does not modify the transition relation explicitly, but constrains the state exploration “on-the-fly”, following the “check-and-fire” policy. This policy guarantees that the state exploration is confined to the set P while retaining the advantages of saturation: exploiting event locality and employing recursive local fixpoint computations. In the pseudocode of Figure 3, the “check-and-fire” policy can be summarized by two cases:
1. If a[i] = 0, s[i] is kept unchanged, without adding new states (line 4 in function ConsSaturate).
2. When computing the relational product, check on each level whether the newly generated local state is included in the constraint (line 7 in function ConsSaturate and line 5 in function ConsRelProd). If instead a[i′] = 0 in formula (1), the relational product stops the recursive descent and returns 0.

Another tradeoff affecting efficiency is how to select the set of states P when checking EpUq. In high-level models, P is often associated with an atomic property, e.g., “place a of the Petri net is empty”, or with a “localized” property dependent on just a few levels. There are then two reasonable choices to define P:
– P = Ppot: include all states in the potential state space S that satisfy the given property, even if they are not reachable (recall that the potential state space is finite, because the bound for each local state space Sk is known).
– P = Prch: include in P only the reachable states that satisfy the given property, Prch = Ppot ∩ Srch.
mdd EUsatConsNSF (mdd P, mdd Q)
1 foreach α ∈ E do N⁻¹α,P ← ConsNSF(P, Nα⁻¹);
2 mdd s ← Saturate(Q);
3 s ← intersection(s, Srch);
4 return s;

mdd ConsNSF (mdd a, mdd r)
1 if a = 1 and r = 1 then return 1;
2 if InCacheConsNSF(a, r, t) then return t;
3 mdd t ← 0; level lr ← r.lvl; level la ← a.lvl;
4 if lr < la then
5   foreach i ∈ Sla s.t. a[i] ≠ 0 do t[i][i] ← ConsNSF(a[i], r);
6 else if lr > la then
7   foreach i, i′ ∈ Slr s.t. r[i][i′] ≠ 0 do t[i][i′] ← ConsNSF(a, r[i][i′]);
8 else    • lr = la
9   foreach i, i′ ∈ Slr s.t. r[i][i′] ≠ 0 and a[i′] ≠ 0 do t[i][i′] ← ConsNSF(a[i′], r[i][i′]);
10 CacheAddConsNSF(a, r, t);
11 return t;

mdd EUconsSat (mdd a, mdd b)    • a: the constraint; b: the set being saturated
1 mdd s ← ConsSaturate(a, b);
2 s ← intersection(s, Srch);
3 return s;

mdd ConsSaturate (mdd a, mdd s)    • a: the constraint; s: the set being saturated
1 if InCacheConsSaturate(a, s, t) then return t;
2 level l ← s.lvl; mdd t ← NewNode(l); mdd r ← Nl⁻¹;
3 foreach i ∈ Sl s.t. s[i] ≠ 0 do
4   if a[i] ≠ 0 then t[i] ← ConsSaturate(a[i], s[i]); else t[i] ← s[i];
5 repeat
6   foreach i, i′ ∈ Sl s.t. r[i][i′] ≠ 0 do
7     if a[i′] ≠ 0 then
8       mdd u ← ConsRelProd(a[i′], t[i], r[i][i′]); t[i′] ← Or(t[i′], u);
9 until t does not change;
10 t ← UniqueTablePut(t); CacheAddConsSaturate(a, s, t);
11 return t;

mdd ConsRelProd (mdd a, mdd s, mdd r)
1 if s = 1 and r = 1 then return 1;
2 if InCacheConsRelProd(a, s, r, t) then return t;
3 level l ← s.lvl; mdd t ← 0;
4 foreach i, i′ ∈ Sl s.t. r[i][i′] ≠ 0 do
5   if a[i′] ≠ 0 then
6     mdd u ← ConsRelProd(a[i′], s[i], r[i][i′]);
7     if u ≠ 0 then
8       if t = 0 then t ← NewNode(l);
9       t[i′] ← Or(t[i′], u);
10 t ← ConsSaturate(a, UniqueTablePut(t)); CacheAddConsRelProd(a, s, r, t);
11 return t;

Fig. 3. EU computation: saturation using a constrained next-state function (EUsatConsNSF) and constrained saturation (EUconsSat)
We are normally only interested in reachable states and, of course, backward state exploration from unreachable states can only lead to more unreachable states; all these unreachable states can be filtered out after saturation without affecting the correctness of the result (unlike the discussion at the beginning of this section, pertaining to filtering out states not in P). Exploration including the unreachable states might result in greater time and memory requirements, in which case using Prch is preferable for algorithmic efficiency. On the other hand, Ppot often depends on very few levels while, for most models, Prch is a strict subset of S, thus depends on many levels, and this increases the complexity of the algorithm, especially for the first approach. In the ideal case, we can constrain the state exploration to Prch with an acceptable overhead. The experimental results in Section 5 demonstrate that constrained saturation using Prch tends to perform much better than saturation with constrained next-state functions in both runtime and memory consumption; we therefore select it as our main method to compute the EU operator, as well as the reachability relation introduced in the next section.
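Over explicit sets, the check-and-fire policy amounts to testing each predecessor against the constraint as it is generated, so states outside P are never materialized. A Python sketch (ours) follows; the MDD algorithm additionally exploits locality and the saturation order, which this explicit version cannot show.

def constrained_backward_search(pre_events, P, Q):
    # pre_events: one predecessor function per event; computes E P U Q.
    reached = set(Q)
    frontier = list(Q)
    while frontier:
        j = frontier.pop()
        for pre in pre_events:
            for i in pre(j):
                if i in P and i not in reached:   # the check before the fire
                    reached.add(i)
                    frontier.append(i)
    return reached

pre = lambda j: {j - 1} if j > 0 else set()
assert constrained_backward_search([pre], {1, 2}, {3}) == {1, 2, 3}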
4  Reachability Relation and the EG Operator
The EGp property describes the existence of a path in P from a state leading to a nontrivial strongly-connected component (SCC), where p holds in all states along the path and in the SCC. In this section, we propose a new EGp computation algorithm based on the reachability relation, built using constrained saturation. The following defines the (backward) reachability relation of a set of states X within P, denoted (N^{-1}_{X,P})^+.

Definition 2. Given a state i ∈ X, j ∈ (N^{-1}_{X,P})^+(i) iff there exists a nontrivial (i.e., positive-length) forward path π from j to i and all states in π belong to P.

If j ∈ (N^{-1}_{X,P})^+(i), we know that j is in P. Since it is not always necessary to compute the reachability relation for all i ∈ S, we can build the reachability relation starting only from the states in X, to reduce time and memory consumption.

Claim 1. If j ∈ (N^{-1}_{S,P})^+(i), then ∃i′ ∈ N^{-1}(i) ∩ P s.t. j ∈ ConsSaturate(P, {i′}).

This claim follows from the definition of constrained saturation and yields a way of building the reachability relation efficiently. Starting from the MDD encoding N^{-1}, appropriately restricted to a set of states (e.g., P), we compute the constrained saturation for the states encoded at the primed levels. Analogously to constrained saturation, this process can be performed bottom-up, recursively on each level.

Claim 2. EGp holds in state j iff ∃i ∈ P s.t. i ∈ (N^{-1}_{P,P})^+(i) and j ∈ (N^{-1}_{P,P})^+(i).

From this claim, we can obtain an algorithm to compute the set of states satisfying EGp. Given a 2L-level MDD encoding the reachability relation, it is
easy to obtain the set of states Sscc = {i : i ∈ (N^{-1}_{P,P})^+(i)}. These states belong to SCCs where property p holds continuously. Then, the result of EGp can be obtained by computing RelProd(Sscc, (N^{-1}_{P,P})^+). Building the reachability relation is a time- and memory-intensive task, constituting the bottleneck of our new EG algorithm. On the other hand, the reachability relation contains more information than the basic EG property and has further applications. We discuss one of them: EG computation under a weak fairness constraint. Fairness is widely used in the formal specification of protocols; in particular, weak fairness specifies that there is an infinite execution on which some states, say in F, appear infinitely often. The difficulty lies in the fact that executions in SCCs which do not contain states in F must be eliminated to guarantee fairness, and the traditional EG algorithm cannot handle this problem. However, this extension is easy in our framework, as discussed next.
Claim 3. EGp under a weak fairness constraint F holds in state j iff ∃i ∈ F s.t. i ∈ (N^{-1}_{F∩P,P})^+(i) and j ∈ (N^{-1}_{F∩P,P})^+(i).

Since i ∈ F, the SCCs containing such a state satisfy the fairness constraint. We only need to build the reachability relation on these states. An interesting point is that many fewer state pairs are in (N^{-1}_{F∩P,P})^+ than in (N^{-1}_{P,P})^+. Although, in symbolic approaches, fewer states do not always lead to smaller MDDs, and thus lower time and memory requirements, it is often the case in our framework that considering fairness reduces the runtime, quite the opposite of traditional approaches.
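Claims 2 and 3 translate directly into a deliberately naive explicit-relation computation, sketched below in Python. Here rel is an assumed explicit set of pairs (i, j) with j ∈ N^{-1}_P(i); the actual algorithm builds and closes this relation symbolically as a 2L-level MDD, so the quadratic closure below only illustrates the claims.

def transitive_closure(rel):
    """Naive transitive closure of a relation given as a set of pairs."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def eg_states(rel, fair=None):
    plus = transitive_closure(rel)   # (i, j) in plus: nontrivial path from j to i within P
    # Claim 2: witnesses i lying on a cycle; Claim 3 additionally asks i in F.
    scc = {i for (i, j) in plus if i == j and (fair is None or i in fair)}
    # EGp holds in j if j backward-reaches some witness i.
    return scc | {j for (i, j) in plus if i in scc}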
5  Experimental Results
We implemented the proposed approach in SmArT [5] and report on experiments run on an Intel Xeon 3.0 GHz workstation with 3 GB RAM under SuSE Linux 9.1. Detailed descriptions of the models used in the experiments can be found in the SmArT User Manual [4]. The state-space size of each model is controlled by a parameter N. For comparison, we study each model in both SmArT and NuSMV version 2.4.3 [1].

5.1  Results for the EU Computation
Table 1 shows the results for each EU query. The runtime (seconds), final (mem-f) and peak (mem-p) memory consumption (KBytes) required by NuSMV, the old version of SmArT [10], and our new approach are shown in the corresponding columns, for each model. We compare the following five approaches:
– BFS: the traditional EU algorithm implemented in SmArT.
– ConsNSFSat-Ppot: saturation using constrained next-state functions, where the next-state functions are constrained using Ppot.
– ConsNSFSat-Prch: saturation using constrained next-state functions, where the next-state functions are constrained using Prch.
– ConsSat-Ppot: constrained saturation with Ppot.
– ConsSat-Prch: constrained saturation with Prch.
Table 1. Results for the EU computation (per method: sec / KB(f) / KB(p); NuSMV reports sec / KB(f) only)

leader: EU query E(pref1 = 0) U (status0 = leader)
 N     NuSMV [1]          OldSmart [10]            BFS                        ConsNSFSat-Ppot            ConsNSFSat-Prch              ConsSat-Ppot               ConsSat-Prch
 5     44.1 / 62,057      9.8 / 163 / 6,998        1.6 / 3,112 / 6,539        28.8 / 632 / 809           1.1 / 1,351 / 2,375          17.6 / 544 / 776           1.2 / 602 / 728
 6     304.8 / 180,988    296.1 / 463 / 40,429     8.3 / 12,175 / 24,080      2,022.5 / 1,283 / 2,328    19.8 / 3,153 / 8,046         1,032.9 / 1,473 / 2,284    14.9 / 1,054 / 1,337
 7     1,791.5 / 666,779  – / – / –                39.8 / 44,551 / 91,466     – / – / –                  347.3 / 8,726 / 23,332       – / – / –                  114.9 / 2,078 / 2,616
 8     – / –              – / – / –                371.5 / 171,867 / 277,649  – / – / –                  – / – / –                    – / – / –                  4,058.2 / 4,110 / 5,139

phil.: EU query E(phil1 = eat) U (phil0 = eat)
 50    1,644.2 / 75,282   <0.1 / 73 / 287          4.7 / 3,248 / 4,586        <0.1 / 464 / 465           0.2 / 1,117 / 1,591          0.2 / 436 / 436            <0.1 / 435 / 435
 100   – / –              0.2 / 147 / 872          60.6 / 7,141 / 16,446      0.2 / 861 / 864            0.8 / 3,119 / 4,907          0.4 / 817 / 843            0.3 / 816 / 841
 500   – / –              2.9 / 740 / 16,082       – / – / –                  0.3 / 3,837 / 3,870        160.7 / 52,447 / 87,791      0.7 / 3,856 / 3,885        0.4 / 3,850 / 3,876
 1,000 – / –              4.1 / 1,036 / 30,707     – / – / –                  0.6 / 5,262 / 5,269        1,228.2 / 100,516 / 164,137  4.8 / 7,589 / 7,697        0.8 / 5,253 / 5,271

robin: EU query E(p1 = load) U (p0 = send)
 10    29.2 / 70,583      5.1 / 186 / 4,447        0.1 / 494 / 574            <0.1 / 92 / 92             <0.1 / 204 / 204             <0.1 / 90 / 90             <0.1 / 64 / 64
 20    – / –              – / – / –                1.4 / 3,088 / 3,738        <0.1 / 283 / 317           <0.1 / 598 / 659             0.1 / 312 / 312            <0.1 / 166 / 166
 100   – / –              – / – / –                – / – / –                  5.7 / 14,272 / 14,274      636.1 / 17,398 / 30,833      3.6 / 15,899 / 15,907      <0.1 / 4,269 / 4,298
 200   – / –              – / – / –                – / – / –                  163.1 / 105,787 / 105,788  – / – / –                    111.6 / 119,988 / 119,988  0.4 / 28,781 / 28,790

fms: EU query E(M1 > 0) U (P1s = P2s = P3s = N)
 10    578.9 / 181,353    0.1 / 27 / 628           5.2 / 35,979 / 35,980      0.4 / 407 / 477            43.2 / 993 / 2,965           <0.1 / 257 / 288           <0.1 / 256 / 286
 25    – / –              1.6 / 155 / 8,223        – / – / –                  31.5 / 638 / 1,362         – / – / –                    1.8 / 981 / 1,107          1.9 / 977 / 1,104
 50    – / –              24.0 / 812 / 75,884      – / – / –                  2,566.3 / 1,449 / 6,972    – / – / –                    39.1 / 1,247 / 6,018       40.5 / 1,246 / 6,006
 100   – / –              – / – / –                – / – / –                  – / – / –                  – / – / –                    1,128.0 / 4,302 / 40,588   1,200.9 / 4,299 / 40,504

queens: EU query E(p[N−1][N] = 1) U (p[N−1][N] = 0)
 10    15.3 / 77,861      – / – / –                1.1 / 9,744 / 9,744        <0.1 / 2,728 / 2,728       <0.1 / 1,899 / 1,899         0.2 / 3,532 / 3,532        <0.1 / 1,841 / 1,841
 11    84.0 / 327,907     – / – / –                7.2 / 41,252 / 41,252      0.5 / 12,445 / 12,445      0.1 / 7,289 / 7,312          1.6 / 15,877 / 15,877      <0.1 / 7,043 / 7,062
 12    595.1 / 1,355,128  – / – / –                237.2 / 186,360 / 186,360  2.3 / 52,764 / 52,765      0.7 / 80,785 / 80,859        9.1 / 75,347 / 75,347      <0.1 / 29,714 / 29,763
 13    – / –              – / – / –                – / – / –                  10.9 / 236,501 / 236,506   6.1 / 166,711 / 166,837      86.9 / 337,265 / 337,265   <0.1 / 133,121 / 133,121
Our main method, constrained saturation using Prch, outperforms (sometimes by orders of magnitude) NuSMV and the other methods in both time and memory. In comparison with NuSMV, the saturation-based methods excel because of the local fixpoint iteration scheme. The improvement of our new work over our old approach [10] can be attributed both to the MDD encoding of the next-state function and to the more advanced saturation schemes. Overall, ConsSat requires less runtime as well as memory than ConsNSFSat, because the constrained next-state functions often impose overhead on relational product operations. The difference between the results of ConsNSFSat-Ppot and ConsNSFSat-Prch shows the tradeoff discussed at the end of Section 3. ConsSat constrained with Prch is more advantageous than with Ppot because ConsSat is not sensitive to the complexity of the constraint set, thanks to our lightweight "check-and-fire" policy. The reduction in state exploration becomes the dominant factor for efficiency.

5.2  Results for the EG Computation
Table 2 compares the results of NuSMV, BFS (SmArT-BFS), and the method of Section 4 (SmArT-RchRel) for EG computation, with and without fairness constraints. For BFS, the number of iterations is listed in column itr.

Table 2. Results for the EG computation (per method: sec / KB(f) / KB(p); NuSMV without fairness: sec / KB(f); NuSMV with fairness: sec / KB; BFS additionally reports itr)

                        without fairness                                                      with fairness
Model  N   NuSMV              SmArT-BFS                     SmArT-RchRel              NuSMV              SmArT-RchRel
leader: EG(status0 = leader), fairness constraint pref0 = 1
       3   0.2 / 9,308        14 / 0.1 / 139 / 196          4.9 / 934 / 1,115         0.6 / 11,428       0.02 / 266 / 268
       4   3.0 / 50,187       18 / <0.1 / 400 / 436         791.6 / 5,999 / 7,225     11.2 / 49,655      207.4 / 5,271 / 6,394
phil.: EG(phil0 = eat), fairness constraint phil0 = has left fork
       10  0.1 / 7,193        4 / <0.1 / 95 / 95            0.1 / 170 / 170           0.2 / 8,447        <0.1 / 113 / 113
       50  3.0 / 50,187       4 / <0.1 / 220 / 241          0.2 / 682 / 682           1,244.5 / 75,274   <0.1 / 393 / 399
       100 – / –              4 / 0.1 / 444 / 562           1.1 / 1,180 / 1,191       – / –              0.1 / 704 / 705
robin: EG(true), fairness constraint p1 = Ask
       10  2.3 / 70,581       1 / <0.1 / 86 / 86            0.1 / 437 / 437           73.5 / 73,145      <0.1 / 222 / 222
       50  – / –              1 / 0.1 / 1,263 / 1,263       4.8 / 15,676 / 15,676     – / –              0.3 / 1,902 / 1,902
       100 – / –              1 / 1.0 / 7,688 / 7,688       53.9 / 100,719 / 102,941  – / –              1.5 / 9,317 / 9,317
fms: EG ¬(P1s = P2s = P3s = N), fairness constraint P1s = N
       5   0.8 / 18,474       1 / <0.1 / 61 / 135           1.9 / 1,022 / 1,024       2.5 / 35,238       0.27 / 419 / 475
       10  16.5 / 60,559      1 / <0.1 / 128 / 220          1,062.4 / 4,338 / 6,231   191.8 / 62,188     77.7 / 607 / 1,050
kanban: EG(P1out > 0 ∨ P2out > 0 ∨ P3out > 0 ∨ P4out > 0), fairness constraint P1out = N
       8   2.1 / 42,925       1 / <0.1 / 279 / 415          1,131.1 / 1,949 / 2,714   2.2 / 43,511       6.2 / 1,303 / 1,486
       10  4.4 / 58,693       1 / <0.1 / 529 / 930          – / – / –                 4.6 / 58,705       27.0 / 2,507 / 2,939

Without fairness, traditional BFS (in NuSMV or SmArT-BFS) is often orders of magnitude faster than our algorithm based on the reachability relation. This result is not surprising, given the time and memory complexity of building the reachability relation, even when this is done using saturation.
Another experiment shows the merit of our algorithm. We build a simple model with a long path from an SCC where EGp holds to a terminal state, with p holding in every state on this path. We parameterize the length of
the path to control the number of iterations that a traditional EG algorithm requires to reach the fixpoint. For different numbers of iterations, from 500 to 50,000, we compare the runtimes (in seconds) of the traditional (BFS) search and our algorithm (SatTR) in Figure 4. As the number of iterations grows, the runtime of our algorithm grows much more slowly than that of the traditional algorithm, thanks to the efficient state exploration scheme of constrained saturation.

Fig. 4. EG computation based on BFS vs. the reachability relation

If we consider fairness, as discussed in Section 4, the time and memory complexity of building the reachability relation is often reduced, while that of the traditional algorithm in NuSMV increases. The advantage of our algorithm is observable in this case, especially for some complex models which are not manageable in NuSMV.
6  Conclusion and Future Work
In this paper, we focus on symbolic CTL model checking based on the idea of the saturation algorithm. To constrain state exploration to a given set of states, we present a constrained saturation algorithm: its "check-and-fire" policy filters out the states not in the given set while saturating MDD nodes recursively. For the EG operator, we first build the reachability relation symbolically, using constrained saturation, and then compute the set of states satisfying EGp. We discuss desirable properties of our new EU and EG algorithms and analyze a set of experimental results.
Constrained saturation enables building the reachability relation for some complex systems. The application of the reachability relation can be further extended to SCC construction, a basic problem in emptiness checking for Büchi automata. Another direction for future work is to reduce the cost of building the reachability relation: for SCC enumeration, the set X in R^+_{X,S} can be refined to reduce the computational complexity.
References
1. NuSMV: a new symbolic model checker, http://nusmv.irst.itc.it/
2. Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers 35(8), 677–691 (1986)
3. Burch, J.R., Clarke, E.M., Long, D.E.: Symbolic model checking with partitioned transition relations. In: Halaas, A., Denyer, P.B. (eds.) Int. Conference on Very Large Scale Integration, Edinburgh, Scotland, August 1991. IFIP Transactions, pp. 49–58. North-Holland, Amsterdam (1991)
4. Ciardo, G., et al.: SMART: Stochastic Model checking Analyzer for Reliability and Timing, User Manual, http://www.cs.ucr.edu/~ciardo/SMART/
5. Ciardo, G., Jones, R.L., Miner, A.S., Siminiceanu, R.: Logical and stochastic modeling with SMART. Perf. Eval. 63, 578–608 (2006)
6. Ciardo, G., Luettgen, G., Miner, A.S.: Exploiting interleaving semantics in symbolic state-space generation. Formal Methods in System Design 31, 63–100 (2007)
7. Ciardo, G., Lüttgen, G., Siminiceanu, R.: Saturation: An efficient iteration strategy for symbolic state space generation. In: Margaria, T., Yi, W. (eds.) TACAS 2001. LNCS, vol. 2031, pp. 328–342. Springer, Heidelberg (2001)
8. Ciardo, G., Marmorstein, R., Siminiceanu, R.: Saturation unbound. In: Garavel, H., Hatcliff, J. (eds.) TACAS 2003. LNCS, vol. 2619, pp. 379–393. Springer, Heidelberg (2003)
9. Ciardo, G., Marmorstein, R., Siminiceanu, R.: The saturation algorithm for symbolic state space exploration. Software Tools for Technology Transfer 8(1), 4–25 (2006)
10. Ciardo, G., Siminiceanu, R.: Structural symbolic CTL model checking of asynchronous systems. In: Hunt Jr., W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 40–53. Springer, Heidelberg (2003)
11. Ciardo, G., Yu, A.J.: Saturation-based symbolic reachability analysis using conjunctive and disjunctive partitioning. In: Borrione, D., Paul, W. (eds.) CHARME 2005. LNCS, vol. 3725, pp. 146–161. Springer, Heidelberg (2005)
12. Cimatti, A., Clarke, E., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M., Sebastiani, R., Tacchella, A.: NuSMV version 2: An open-source tool for symbolic model checking. In: Brinksma, E., Larsen, K.G. (eds.) CAV 2002. LNCS, vol. 2404, p. 359. Springer, Heidelberg (2002)
13. McMillan, K.L.: Symbolic Model Checking: An Approach to the State Explosion Problem. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (May 1992), CMU-CS-92-131
14. Wan, M., Ciardo, G.: Symbolic state-space generation of asynchronous systems using extensible decision diagrams. In: Nielsen, M., Kucera, A., Miltersen, P.B., Palamidessi, C., Tuma, P., Valencia, F.D. (eds.) SOFSEM 2009. LNCS, vol. 5404, pp. 582–594. Springer, Heidelberg (2009)
LTL Model Checking for Recursive Programs

Geng-Dian Huang¹, Lin-Zan Cai¹, and Farn Wang¹,²

¹ Dept. of Electrical Engineering, National Taiwan University
² Grad. Inst. of Electronic Engineering, National Taiwan University
Abstract. We propose a complete algorithm to model check LTL (Linear Temporal Logic) formulas on recursive programs. Our program models are control-flow graphs extended with procedure calls. The LTL formulas may then specify constraints on the global variables and on the local variables in the current scope. Our algorithm is based on semi-symbolic simulation of control-flow graphs to search for counter-examples. We apply the post-dominance relation to reduce the number of exploration traces. The existence of counter-examples is reduced to Boolean satisfiability, while the termination of the exploration is reduced to Boolean unsatisfiability. We report on our implementation and experiments.
1  Introduction
For finite-state systems, the LTL (Linear Temporal Logic) model checking problem can be reduced to the problem of finding a loop, which is a counter-example for an LTL property, in a given state graph [15]. Likewise, for a recursive program, the corresponding problem can be reduced to the problem of finding a loop in a given control-flow graph [7] extended with procedure calls. Such a loop corresponds either to a repetition of state sequences or to a divergent recursive procedure invocation sequence.
An execution of a recursive program can be characterized by a state sequence. Conceptually, a state is a tuple (g, Ω) where g is a valuation of the global variables and Ω is a stack. A stack element consists of a location n in a control-flow graph and a valuation l of the local variables. An execution forms a loop if (i) the valuation of the global variables in the head is the same as that in the tail, (ii) the top of the stack in the head is the same as that of the tail, and (iii) the stack height of the head is the lowest in the program trace. Conditions (i) and (ii) identify a possible loop; condition (iii) ensures that the loop is genuine, i.e., that it can be repeated infinitely many times (see the sketch at the end of this section).
Based on the dataflow algorithm in [13], we propose an algorithm to model check LTL formulas for recursive programs. We employ the semi-symbolic simulation technique [2], i.e., we construct a simulation tree by unfolding the control-flow graphs and record the program variables symbolically in every tree node. For each tree path, a Boolean formula can be constructed to check the loop conditions. Note that we do not record the stack in tree nodes; instead, we use summary techniques to bypass procedure-call and return pairs [4]. Condition (iii) is ensured by the construction of the simulation tree. To manage the complexity of formula construction, instead of constructing a whole formula for
satisfiability checking, we present techniques to incrementally construct intermediate formulas using semi-symbolic simulation. Without recording the stack explicitly in tree nodes, each tree path has a loop diameter. We can construct another Boolean formula to check whether the loop diameter is reached. Our algorithm terminates when either a counter-example is found or all tree paths reach their loop diameters.
The size of a simulation tree grows exponentially with the number of branches in a program. To reduce the size of the simulation tree, we propose to join the branches at the post-dominators. The post-dominance relation is widely used in the static analysis of programs [9]. In our experiments, the sizes of simulation trees were reduced effectively.
Section 2 reviews related work. Section 3 defines the problem of LTL model checking with recursive programs. Section 4 explains semi-symbolic simulation [2]. Section 5 presents our LTL model-checking algorithm. Section 6 reports our experiments. Finally, Section 7 concludes this work.
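As promised above, here is a small explicit-state Python sketch of the loop conditions (i)-(iii); the representation of a state as a (g, stack) pair is illustrative only, since the tool checks the same three conditions symbolically via Boolean satisfiability.

def is_loop(trace, head, tail):
    """trace[head] .. trace[tail] forms a genuine loop; each trace entry is
    (g, stack), where the stack holds (location, locals) frames."""
    g_h, stack_h = trace[head]
    g_t, stack_t = trace[tail]
    return (g_h == g_t                                   # (i) same globals
            and stack_h[-1] == stack_t[-1]               # (ii) same stack top
            and all(len(trace[k][1]) >= len(stack_h)     # (iii) head's stack
                    for k in range(head, tail + 1)))     #     height is lowest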
2  Related Work
By generalizing a dataflow algorithm [13], Ball et al. proposed a reachability analysis algorithm for recursive programs in [4]. They used BDDs (binary decision diagrams) to record summaries that represent pairs of program states before and after a procedure call. Based on [4], Basler et al. also proposed a reachability algorithm for recursive programs [3]. To handle recursion, they used QBF (Quantified Boolean Formula) solvers to compute summaries. SAT solvers can replace the QBF solvers by representing the summaries as Boolean formulas which encode program traces of sufficient recursive depths and sufficient lengths [2,14]. In contrast to the above work, we consider the LTL model-checking problem. Our algorithm extends the summary technique to search for counter-examples for LTL formulas.
Esparza et al. [6] proposed an algorithm to model check LTL formulas on pushdown automata. Instead of recording summaries of procedures, their algorithm works on head reachability graphs. A head consists of a pushdown automaton state and the top symbol of the corresponding stack. In a head reachability graph, a head h connects to a head h′ if there is a valid program trace from h to h′. A loop in the head reachability graph corresponds to a counter-example for the corresponding LTL formula. Esparza et al. also extended the technique to model check LTL formulas on Boolean programs [7]. Program states correspond to heads, and program transition relations are encoded with BDDs in the head reachability graph. SAT-based algorithms have been shown useful for this approach [5]. In contrast, we encode the head reachability relation with a Boolean formula and reduce the loop search to Boolean satisfiability.
Huang and Wang [11] used SAT solvers to model check the universal fragment of alternation-free μ-calculus formulas on context-free processes by encoding the local model checking algorithm of [12]. The formalization of context-free processes does not have variables; encoding variables in context-free processes blows up the model size.
3  Problem Definition
In Section 3.1, we introduce our modeling language, control-flow graphs. In Section 3.2, we briefly explain LTL (Linear Temporal Logic), our specification language. In LTL model checking, Büchi automata are used to express LTL formulas [15]. In Section 3.3, we define our problem in terms of control-flow graphs and Büchi automata.

3.1  Control Flow Graph with Procedure-Calls
Assume that a program P consists of a set of procedures p0, . . . , pm, where p0 is the main procedure. We consider variables of Boolean type. Let B = {true, false} be the domain of a Boolean variable. For simplicity, assume that all procedures have the same set of variables. Let VG and VL be the sets of global and local variables, and let G and L denote their domains, B^{|VG|} and B^{|VL|} respectively. In the sequel, we use g and l to denote valuations of VG and VL respectively. Let R = 2^{G×L×G×L} be the set of all possible transition relations. A transition relation defines the pre-condition and post-condition of a program statement; the mapping from program statements to transition relations can be found in [4]. Given a transition relation r ∈ R, we write r(g, l, g′, l′) for (g, l, g′, l′) ∈ r. We model each procedure by a control-flow graph; let Gi be the control-flow graph of pi.

Definition 1. For a procedure pi in a program, its control-flow graph Gi is a directed graph (Ni, Ei, Δi, callee_i, ς_i, ℓ_i) where
– Ni is the set of nodes,
– Ei ⊆ Ni × Ni is the set of edges,
– Δi : Ei → R labels edges with transition relations,
– callee_i : E^↷_i → P labels call-to-return edges with callees, and
– ς_i, ℓ_i ∈ Ni are the entry node and the exit node.
Let succ_Ei(n) = {n′ | (n, n′) ∈ Ei} denote the successors of n in Ei. Ei can be partitioned into a set of intraprocedural edges E^→_i and a set of call-to-return edges E^↷_i. An intraprocedural edge e is labelled with the transition relation Δi(e). A call-to-return edge e represents a procedure call to procedure callee_i(e). Assume that e = (n, n′) and callee_i(e) = pj; then n is a call-location and n′ is a return-location. Given e, let e^↱ denote the call-to-entry edge (n, ς_j) and e^↴ denote the exit-to-return edge (ℓ_j, n′). Assume that the passed parameters and return values are stored in auxiliary global variables. We label e^↱ with the transition relation r = {(g, l, g, l′) | g ∈ G, l ∈ L, l′ ∈ L} and label e^↴ with r = {(g, l, g, l′) | g ∈ G, l ∈ L, l′ ∈ L}. There is no constraint between l and l′ because the scope of VL is limited to a procedure. In the sequel, we will show that the valuation of VL at the call-location n is the same as the one at the return-location n′.
Given the control-flow graphs, the program P can be represented as a control-flow graph (N, E, Δ, callee, ς, ℓ) that combines the control-flow graphs of p0, . . . , pm, defined as follows. Let N = ∪_i Ni, E^→ = ∪_i E^→_i, E^↷ = ∪_i E^↷_i, E^↱ = {e^↱ | e ∈ E^↷}, E^↴ = {e^↴ | e ∈ E^↷}, and E = E^→ ∪ E^↷ ∪ E^↱ ∪ E^↴. Let Δ(e) = Δi(e) for e ∈ E^→_i and Δ(e) = {(g, l, g, l′) | g ∈ G, l ∈ L, l′ ∈ L} for e ∈ E^↱ ∪ E^↴. Let callee(e) = callee_i(e) for e ∈ E^↷_i. We may write callee(e) as callee(n, n′) for e = (n, n′).
A program state is a tuple ⟨n, g, l⟩ where n ∈ N is a location, g ∈ G is a valuation of the global variables, and l ∈ L is a valuation of the local variables. We characterize the executions of the program P as program traces. A program trace is a sequence of program states ⟨n1, g1, l1⟩⟨n2, g2, l2⟩ . . . ⟨nm, gm, lm⟩ that fulfills the following constraints.
– For each consecutive pair of program states ⟨ni, gi, li⟩ and ⟨ni+1, gi+1, li+1⟩, ⟨ni, gi, li⟩ transits to ⟨ni+1, gi+1, li+1⟩ via an edge ei = (ni, ni+1), which is either an intraprocedural edge, a call-to-entry edge, or an exit-to-return edge.
– Each transition relation Δ(ei) = ri is fulfilled, i.e., ri(gi, li, gi+1, li+1). Moreover, assume that ei is a call-to-entry edge and ej is the corresponding exit-to-return edge. The valuation of VL at ni is the same as the one at nj+1, i.e., li = lj+1, since a procedure call only affects the valuations of the global variables.
– The sequence of edges e1 e2 . . . em−1 can be derived from the non-terminal symbol valid in the grammar of Table 1.

Table 1. Context-free grammar for program traces
  valid ::= match | match e^↱ valid
  match ::= ε | e^→ match | (n, ς_i) match (ℓ_i, n′) match

Here e^↱ ∈ E^↱ is a procedure-call transition edge, ε is the null sequence, and e^→ ∈ E^→ is an intraprocedural transition edge. (n, ς_i) and (ℓ_i, n′) are a pair of matching procedure-call and procedure-return transitions such that there is an e = (n, n′) with callee(e) = pi. That is, a program trace must be context-sensitive [13]. The grammar rule for the non-terminal match specifies that, in a trace, transitions for procedure entries and procedure exits must match in a nested fashion. A program trace is matched if match can derive e1 e2 . . . em−1. The language of a program P, denoted by L(P), is the set of its program traces.
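For concreteness, Definition 1 could be carried by a plain data structure such as the following Python sketch; the field names are our own and not part of the formalization.

from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str]

@dataclass
class ControlFlowGraph:
    nodes: Set[str]
    intra_edges: Dict[Edge, Callable]   # edge -> transition relation r(g, l, g', l')
    call_edges: Dict[Edge, str]         # call-to-return edge -> callee name
    entry: str                          # the entry node
    exit: str                           # the exit node

    def successors(self, n: str) -> Set[str]:
        """succ_E(n): targets of all edges leaving n."""
        return {v for (u, v) in list(self.intra_edges) + list(self.call_edges)
                if u == n}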
3.2  LTL and Büchi Automata
We use LTL (Linear Temporal Logic) as our specification language for program traces. An LTL formula is constructed on top of atomic propositions about program states. Consider a program with a set of locations N and a set of Boolean variables VG ∪ VL. We define the following atomic propositions for program states: loc = n, v = true, v = false, true, and false, where n ∈ N and v ∈ VG ∪ VL. loc = n asserts that the location of a program state is n; v = true, as a proposition, asserts that the value of v is true. Let AP be the set of atomic propositions. A state predicate is a combination of atomic propositions with the Boolean operators ¬, ∨, and ∧. Let Ψ denote the set of state predicates. A program state ⟨n, g, l⟩ satisfies a state predicate ψ, written ⟨n, g, l⟩ |= ψ, if and only if ψ evaluates to true under ⟨n, g, l⟩.
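A direct, if naive, evaluator can make this satisfaction relation concrete. In the Python sketch below, predicates are nested tuples and g and l are dicts from variable names to Booleans; the representation is purely illustrative.

def holds(pred, n, g, l):
    """Evaluate a state predicate at program state <n, g, l>."""
    env = {**g, **l}
    kind = pred[0]
    if kind == "loc":   return n == pred[1]                 # loc = n'
    if kind == "var":   return env[pred[1]] == pred[2]      # v = true / v = false
    if kind == "const": return pred[1]                      # true / false
    if kind == "not":   return not holds(pred[1], n, g, l)
    if kind == "and":   return holds(pred[1], n, g, l) and holds(pred[2], n, g, l)
    if kind == "or":    return holds(pred[1], n, g, l) or holds(pred[2], n, g, l)
    raise ValueError(kind)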
Definition 2. Suppose we are given a set AP of atomic propositions. An LTL formula ϕ is inductively defined by the following rule:
  ϕ ::= α | ¬ϕ | ϕ1 ∨ ϕ2 | ◯ϕ | ϕ1 U ϕ2

α ∈ AP is an atomic proposition. "◯ϕ" means "ϕ is true in the next state". "ϕ1 U ϕ2" means "from now on, ϕ1 is always true until ϕ2 is true". "♦ϕ" is the abbreviation of "true U ϕ", which means "from now on, ϕ is eventually true". "□ϕ" is the abbreviation of "¬♦¬ϕ", which means "from now on, ϕ is always true". Given an LTL formula ϕ, we use L(ϕ) to denote the set of program traces satisfying ϕ. For LTL model checking, we usually convert LTL formulas to Büchi automata [10] accepting the same program traces.

Definition 3. A Büchi automaton B is a tuple (Q, δ, q0, F) where
– Q is a finite set of states,
– δ ⊆ Q × Ψ × Q defines a finite set of transitions,
– q0 is the initial state, and
– F is a set of final states.

Given (q, ψ, q′) ∈ δ and a program state ⟨n, g, l⟩, we write q →^{⟨n,g,l⟩} q′ if ⟨n, g, l⟩ satisfies ψ. A run is a sequence of the form q1 →^{⟨n1,g1,l1⟩} q2 →^{⟨n2,g2,l2⟩} . . . →^{⟨nm,gm,lm⟩} qm+1. The word of the run is its sequence of program states ⟨n1, g1, l1⟩⟨n2, g2, l2⟩ . . . ⟨nm, gm, lm⟩. Note that a word need not be a program trace. An infinite run ρ is accepting if and only if it starts from q0 and visits a final state infinitely often; that is, ρ contains a loop which visits a final state. B accepts a sequence of program states w if and only if there is an accepting run whose word is w. The language of B, denoted by L(B), is the set of sequences of program states accepted by B.
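One automaton step is then just a filtered lookup in δ. The one-line sketch below assumes the holds evaluator from the previous sketch and represents δ as a set of triples (q, psi, q2); it is an illustration, not the tool's data structure.

def buchi_successors(delta, q, n, g, l):
    """Automaton states reachable from q while reading program state <n, g, l>."""
    return {q2 for (q1, psi, q2) in delta if q1 == q and holds(psi, n, g, l)}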
3.3  LTL Model Checking Problem
A program P satisfies an LTL formula ϕ, denoted P |= ϕ, if and only if L(P) is included in the set of sequences specified by ϕ. To verify P |= ϕ, we negate ϕ and construct a Büchi automaton B¬ϕ, which accepts the sequences of program states not specified by ϕ. P |= ϕ if and only if L(P) ∩ L(B¬ϕ) = ∅.
Given a program P = {p0, . . . , pm} and a Büchi automaton B = (Q, δ, q0, F), we can simulate P and B concurrently to verify L(P) ∩ L(B) = ∅. A simulation state ⟨q, n, g, l⟩ records a state q ∈ Q and a program state ⟨n, g, l⟩. A simulation state ⟨q, n, g, l⟩ is final if q ∈ F. Let → denote the transition relation on simulation states: ⟨q, n, g, l⟩ → ⟨q′, n′, g′, l′⟩ if and only if q →^{⟨n,g,l⟩} q′ and (g, l, g′, l′) ∈ Δ(n, n′). A simulation trace is a sequence of the form ⟨q1, n1, g1, l1⟩ → ⟨q2, n2, g2, l2⟩ → . . . → ⟨qm, nm, gm, lm⟩ where ⟨n1, g1, l1⟩⟨n2, g2, l2⟩ . . . ⟨nm, gm, lm⟩ is a program trace. The simulation trace is matched if this program trace is matched. A tuple (ς_i, g, l, ℓ_i, g′, l′) is a
summary of procedure pi if there is a matched simulation trace from ⟨ς_i, g, l⟩ to ⟨ℓ_i, g′, l′⟩, where ς_i and ℓ_i are the entry and exit locations respectively.
Assume (n, n′) ∈ E^→ ∪ E^↷ ∪ E^↱. A simulation state ⟨q, n, g, l⟩ can reach another simulation state ⟨q′, n′, g′, l′⟩, denoted ⟨q, n, g, l⟩ ⇒ ⟨q′, n′, g′, l′⟩, if either one of the following conditions holds.
– (n, n′) ∈ E^→ ∪ E^↱ and ⟨q, n, g, l⟩ → ⟨q′, n′, g′, l′⟩.
– (n, n′) ∈ E^↷ and there is a matched simulation trace ⟨q, n, g, l⟩ → ⟨q1, n1, g1, l1⟩ → . . . → ⟨qm, nm, gm, lm⟩ → ⟨q′, n′, g′, l′⟩ such that callee(n, n′) = pi, n1 = ς_i, and nm = ℓ_i. Here, (q1, n1, g1, l1, qm, nm, gm, lm) is a summary of procedure pi.
We write ⟨q, n, g, l⟩ ⇒_true ⟨q′, n′, g′, l′⟩ if either q ∈ F or ∃1 ≤ i ≤ m. qi ∈ F; that is, ⟨q, n, g, l⟩ reaches ⟨q′, n′, g′, l′⟩ via a simulation trace which visits a final simulation state. Otherwise, we write ⟨q, n, g, l⟩ ⇒_false ⟨q′, n′, g′, l′⟩. A reachability path is a sequence of the form ⟨q1, n1, g1, l1⟩ ⇒ ⟨q2, n2, g2, l2⟩ ⇒ . . . ⇒ ⟨qm, nm, gm, lm⟩. Intuitively, the stack height of the program is monotonically non-decreasing along a reachability path. A reachability path is matched if the stack height never increases, i.e., ∀i. (ni, ni+1) ∈ E^→ ∪ E^↷. Let ⇒+ denote the transitive closure, and let ⇒* denote the reflexive transitive closure. If ⟨q1, n1, g1, l1⟩ ⇒_b1 . . . ⇒_b(m−1) ⟨qm, nm, gm, lm⟩, we have ⟨q1, n1, g1, l1⟩ ⇒+_(b1∨...∨b(m−1)) ⟨qm, nm, gm, lm⟩. For verifying L(P) ∩ L(B) = ∅, we have the following theorem.

Theorem 1. [7] L(P) ∩ L(B) ≠ ∅ if and only if ∃g0, l0, q, n, g, l. ⟨q0, ς0, g0, l0⟩ ⇒* ⟨q, n, g, l⟩ ⇒+_true ⟨q, n, g, l⟩.
Intuitively, the program trace accepted by both P and B must be able to loop. The stack height at the second occurrence of ⟨q, n, g, l⟩ is higher than or equal to that at the first occurrence, so the program trace can loop.
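Theorem 1 suggests a brute-force reading: enumerate the simulation states reachable from an initial state and look for one that reaches itself along a path visiting a final simulation state. The Python sketch below does exactly that over an assumed explicit step relation annotated with the b flag; the paper, of course, performs this search symbolically rather than by enumeration.

def has_accepting_lasso(init, step):
    """init: iterable of initial simulation states; step(s) yields (s2, b)
    with s => s2, b = True iff that step visits a final simulation state."""
    seen, stack = set(), list(init)
    while stack:                          # all states reachable via =>*
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        stack.extend(s2 for (s2, _) in step(s))
    for s in seen:                        # look for s =>+_true s
        frontier = set(step(s))
        closed = set()
        while frontier:
            (t, b) = frontier.pop()
            if (t, b) in closed:
                continue
            closed.add((t, b))
            if t == s and b:
                return True
            frontier.update((t2, b or b2) for (t2, b2) in step(t))
    return False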
Fig. 1. (a) Control flow graph of p0. (b) Büchi automaton B¬(□♦(loc=n3))
Now, we show an example of checking P |= □♦(loc = n3), which checks whether n3 is entered infinitely often. Assume that P = {p0}; the control flow graph of p0 is given in Fig. 1(a). ς0 is the entry node and ℓ0 is the exit node. Location n1 branches to location n2 and location n3. (n2, ℓ0) and (n3, ℓ0) are call-to-return edges, which represent procedure calls to p0 itself. In Fig. 1(b), we show a Büchi automaton B¬ϕ for the negated property ¬ϕ = ¬(□♦(loc = n3)). By simulating P and B¬ϕ concurrently, we can find the reachability path ⟨q0, ς0, true, true⟩ ⇒_false ⟨q1, n1, true, true⟩ ⇒_true ⟨q1, n2, true, true⟩ ⇒_true ⟨q1, ς0, true, true⟩ ⇒_true ⟨q1, n1, true, true⟩. The state ⟨q1, n1, true, true⟩ reoccurs in the reachability path, so we can infer that L(P) ∩ L(B) ≠ ∅ by Theorem 1, which implies P ⊭ □♦(loc = n3).
4  Semi-symbolic Simulation
Suppose a program P = {p0, . . . , pk} and a Büchi automaton B = (Q, δ, q0, F). We semi-symbolically simulate P and B concurrently: we explicitly simulate the locations while keeping the states Q and the program variables symbolic. We use a set of Boolean variables VQ to encode the set of states Q; each state is represented by a unique valuation of VQ. We write [q] to denote the valuation for a specific state q. Let v̄ denote a set of Boolean variables. Let q̄, ḡ, and l̄ be of size |VQ|, |VG|, and |VL| respectively. We give the definition of semi-symbolic states in Definition 4.

Definition 4. A semi-symbolic state is a tuple ⟨q̄, n, ḡ, l̄⟩ in which we keep a location n and sets of Boolean variables q̄, ḡ, and l̄.

A semi-symbolic state ⟨q̄, n, ḡ, l̄⟩ can represent a set of simulation states with the same location n. An assignment π on a set of Boolean variables v̄ is a mapping from v̄ to B. Let |[v̄]|π ∈ B^{|v̄|} denote the valuation of v̄ under the assignment π. Given an assignment π of (q̄ ∪ ḡ ∪ l̄), ⟨|[q̄]|π, n, |[ḡ]|π, |[l̄]|π⟩ is a simulation state. Let |[⟨q̄, n, ḡ, l̄⟩]|π denote ⟨|[q̄]|π, n, |[ḡ]|π, |[l̄]|π⟩. Assume σ = ⟨q̄, n, ḡ, l̄⟩ and σ′ = ⟨q̄′, n′, ḡ′, l̄′⟩. Let eq(σ, σ′) denote a Boolean formula which is satisfied by an assignment π if and only if |[σ]|π = |[σ′]|π. Let [F](q̄) be a Boolean formula which is satisfied by an assignment π if and only if |[q̄]|π ∈ F. Let τ(σ, σ′) be the characteristic function of the transition relation on simulation states: τ(σ, σ′) is satisfied by an assignment π if and only if |[σ]|π → |[σ′]|π. The construction of these Boolean formulas is straightforward and is skipped here.

Definition 5. A semi-symbolic state ⟨q̄, n, ḡ, l̄⟩ can reach another semi-symbolic state ⟨q̄′, n′, ḡ′, l̄′⟩, denoted ⟨q̄, n, ḡ, l̄⟩ → ⟨q̄′, n′, ḡ′, l̄′⟩, if and only if (n, n′) ∈ E^→ ∪ E^↷ ∪ E^↱.

Definition 5 gives the reachability relation for semi-symbolic states. Compared with the definition for simulation states, we only check the locations, since only the locations are kept explicitly in semi-symbolic states. Assuming σ = ⟨q̄, n, ḡ, l̄⟩ and σ′ = ⟨q̄′, n′, ḡ′, l̄′⟩, we define a Boolean formula T(σ, b, σ′) in
Table 2. Intuitively, T(σ, b, σ′) is the characteristic function of the reachability relation on simulation states: T(σ, b, σ′) is satisfied by an assignment π if and only if |[σ]|π ⇒_{|[b]|π} |[σ′]|π. That is,
– (n, n′) ∈ E^→ ∪ E^↱ and |[σ]|π → |[σ′]|π, or
– (n, n′) ∈ E^↷, callee(n, n′) = pi, and there is a matched simulation trace |[σ]|π → ⟨qς, ς_i, gς, lς⟩ → . . . → ⟨qℓ, ℓ_i, gℓ, lℓ⟩ → |[σ′]|π.
Moreover, |[b]|π = true if and only if one of the visited automaton states is in F, and |[b]|π = false otherwise. For the case (n, n′) ∈ E^↷, T(σ, b, σ′) is defined in terms of Σ(σς, b′, σℓ). Intuitively, Σ(σς, b′, σℓ) encodes the summary [2,14] of procedure pi: it is satisfied by an assignment π if and only if there is a matched reachability path from |[σς]|π to |[σℓ]|π, i.e., a matched simulation trace ⟨qς, ς_i, gς, lς⟩ → . . . → ⟨qℓ, ℓ_i, gℓ, lℓ⟩ with |[σς]|π = ⟨qς, ς_i, gς, lς⟩ and |[σℓ]|π = ⟨qℓ, ℓ_i, gℓ, lℓ⟩. Moreover, |[b′]|π = true if and only if the matched simulation trace visits a final simulation state, and |[b′]|π = false otherwise.

Table 2. Definitions of T(σ, b, σ′) and Σ(σ, b, σ′)

T(σ, b, σ′) ≜
  τ(σ, σ′) ∧ (b ⇔ [F](q̄))                                                          if (n, n′) ∈ E^→ ∪ E^↱
  τ(σ, σς) ∧ Σ(σς, b′, σℓ) ∧ τ(σℓ, σ′) ∧ (b ⇔ ([F](q̄) ∨ b′ ∨ [F](q̄ℓ)))             if (n, n′) ∈ E^↷
  where callee(n, n′) = pi, σς = ⟨q̄ς, ς_i, ḡς, l̄ς⟩, and σℓ = ⟨q̄ℓ, ℓ_i, ḡℓ, l̄ℓ⟩.

Σ(σ, b, σ′) ≜
  eq(σ, σ′) ∧ (b ⇔ false)                                                           if n = n′
  ⋁_{1≤i≤m} T(σ, b_i, σi) ∧ Σ(σi, b′_i, σ′) ∧ (b ⇔ (b_i ∨ b′_i))                    if n ≠ n′
  where succ(n) = {n1, . . . , nm} and σi = ⟨q̄i, ni, ḡi, l̄i⟩.
Definition 6. A semi-symbolic path is a sequence of the form σ1 → σ2 → . . . → σm.

Given a semi-symbolic path σ1 → σ2 → . . . → σm, we define the following Boolean formula to check the existence of loops:
  χLoop(σ1, σ2, . . . , σm) ≜ ⋀_{1≤i<m} T(σi, b_{σiσi+1}, σi+1) ∧ ⋁_{1≤i<j≤m} (eq(σi, σj) ∧ ⋁_{i≤k<j} b_{σkσk+1}).
Intuitively, the satisfiability of χLoop(σ1, σ2, . . . , σm) corresponds to the existence of a loop: χLoop(σ1, σ2, . . . , σm) is satisfied by an assignment π if and only if there is a reachability path |[σ1]|π ⇒ |[σ2]|π ⇒ . . . ⇒ |[σm]|π such that ∃i, j. |[σi]|π = |[σj]|π and ∃k. b_{σkσk+1}. We also define the following Boolean formula to check whether the loop diameter is reached:
  χDiameter(σ1, σ2, . . . , σm) ≜ ⋀_{1≤i<m} T(σi, b_{σiσi+1}, σi+1) ∧ ⋀_{1≤i<j≤m} ¬eq(σi, σj).
The unsatisfiability of χDiameter(σ1, σ2, . . . , σm) corresponds to the loop diameter being reached.
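The structure of χLoop can be seen in a toy formula builder. In the Python sketch below, formulas are nested tuples, and T, eq, and b are assumed callbacks returning the corresponding (already constructed) subformulas; the encoding is illustrative, not the implementation's.

def chi_loop(m, T, eq, b):
    """chi_Loop over a path sigma_1 .. sigma_m, as a nested-tuple formula."""
    steps = ("and",) + tuple(T(i) for i in range(1, m))       # all T(sigma_i, ., sigma_i+1)
    loops = []
    for i in range(1, m):
        for j in range(i + 1, m + 1):
            fair = ("or",) + tuple(b(k) for k in range(i, j)) # some b on the cycle
            loops.append(("and", eq(i, j), fair))             # sigma_i = sigma_j
    return ("and", steps, ("or",) + tuple(loops))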
Construction of T(σ, b, σ′) and Σ(σ, b, σ′). We construct the Boolean formulas T(σ, b, σ′) and Σ(σ, b, σ′) inductively, as a set of defining equations. The LHS (left-hand side) of an equation is a term of the form T(σ, b, σ′) or Σ(σ, b, σ′); note that in the sequel we interpret T(σ, b, σ′) and Σ(σ, b, σ′) as terms rather than Boolean formulas. Following Table 2, we define the LHS term by the RHS (right-hand side), which may contain further terms. We illustrate the construction by example: T(σ, b, σ′) = τ(σ, σς) ∧ Σ(σς, b′, σℓ) ∧ τ(σℓ, σ′) ∧ (b ⇔ ([F](q̄) ∨ b′ ∨ [F](q̄ℓ))) is an equation defining T(σ, b, σ′), and its RHS contains the term Σ(σς, b′, σℓ); in turn, we define an equation for Σ(σς, b′, σℓ). We iteratively define a set of equations Φ for T(σ, b, σ′). Let Undef(Φ) denote the terms which appear only in the RHS of equations, i.e., terms without a defining equation; intuitively, if a term t ∈ Undef(Φ), t is undefined. We can solve the set of equations with SAT solvers. The set of solutions of T(σ, b, σ′) ∧ Φ over-approximates the set of solutions of the Boolean formula T(σ, b, σ′); in contrast, the set of solutions of T(σ, b, σ′) ∧ Φ ∧ ⋀_{t∈Undef(Φ)} ¬t is an under-approximation. With sufficiently many equations, the sets of solutions of T(σ, b, σ′) ∧ Φ and of T(σ, b, σ′) ∧ Φ ∧ ⋀_{t∈Undef(Φ)} ¬t reach a fixpoint, which is equivalent to the set of solutions of the Boolean formula T(σ, b, σ′) [14].
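The resulting decision procedure alternates solving and unfolding. The following control-flow sketch in Python assumes a generic solve (standing for any SAT interface) and a define_next_equation callback that unfolds one undefined term; both names are hypothetical, and the formula representation is left abstract.

def solve_to_fixpoint(query, solve, define_next_equation):
    equations, undefined = [], {query}
    while True:
        # Over-approximation: undefined terms left unconstrained.
        if not solve([query] + equations):
            return False                     # even the over-approximation is UNSAT
        # Under-approximation: undefined terms forced to false.
        if solve([query] + equations + [("not", t) for t in undefined]):
            return True                      # already SAT without further unfolding
        term = undefined.pop()               # otherwise unfold one more term
        eqn, new_terms = define_next_equation(term)
        equations.append(eqn)
        undefined |= new_terms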
5  Model Checking Algorithm
Suppose a program P = {p0, . . . , pk} and a Büchi automaton B = (Q, δ, q0, F). We construct a simulation tree to enumerate possible semi-symbolic paths. The simulation tree is constructed by unfolding the control-flow graphs and by instantiating semi-symbolic states for the locations. An instantiation for a location n is a semi-symbolic state ⟨q̄, n, ḡ, l̄⟩, where q̄, ḡ, and l̄ are new sets of Boolean variables serving as copies of VQ, VG, and VL respectively. A tree node is a semi-symbolic state. A tree node ⟨q̄, n, ḡ, l̄⟩ has a child ⟨q̄′, n′, ḡ′, l̄′⟩ if (n, n′) ∈ E^→ ∪ E^↷ ∪ E^↱. A tree node ⟨q̄, n, ḡ, l̄⟩ has no child if n is an exit location. A tree path in the simulation tree is a semi-symbolic path.

Example 1. In Fig. 2(a), we give the control flow graph of a procedure p0. The entry node ς0 branches to location n1 and location n2. (n1, ℓ0) is a call-to-return edge, which represents a procedure call to p0 itself. In Fig. 2(b), we give the simulation tree constructed by unfolding the control flow graph. A node with double circles is a leaf of the simulation tree.

Given a simulation tree, we define a set of equations. Let σ = ⟨q̄, n, ḡ, l̄⟩ and σ′ = ⟨q̄′, n′, ḡ′, l̄′⟩. For each tree edge (σ, σ′), we define an equation φ for T(σ, b_{σσ′}, σ′) according to Table 2. If (n, n′) ∈ E^→ ∪ E^↱, it is straightforward to define φ as T(σ, b_{σσ′}, σ′) = τ(σ, σ′) ∧ (b_{σσ′} ⇔ [F](q̄)). If (n, n′) ∈ E^↷, φ depends on a summary of procedure callee(n, n′) = pk. Since (n, n′) ∈ E^↷, we have (n, ςk) ∈ E^↱. Assume that σ has a child σς = ⟨q̄ς, ςk, ḡς, l̄ς⟩. We instantiate a semi-symbolic state σℓ = ⟨q̄ℓ, ℓk, ḡℓ, l̄ℓ⟩ for the exit location ℓk. We define φ as T(σ, b_{σσ′}, σ′) = τ(σ, σς) ∧ Σ(σς, b_{σςσℓ}, σℓ) ∧ τ(σℓ, σ′) ∧ (b_{σσ′} ⇔ ([F](q̄) ∨ b_{σςσℓ} ∨ [F](q̄ℓ))).
Fig. 2. (a) Control flow graph of p0. (b) Simulation tree. (c) Simulation tree applying the immediate post-dominance relation
Let σς = ⟨q̄ς, ςk, ḡς, l̄ς⟩ and σℓ = ⟨q̄ℓ, ℓk, ḡℓ, l̄ℓ⟩. For a summary Σ(σς, b_{σςσℓ}, σℓ), we define a set of equations. For each tree node σ = ⟨q̄, n, ḡ, l̄⟩ reachable from σς without visiting an entry location, we define an equation φ for Σ(σ, b_{σσℓ}, σℓ). Assume n ≠ ℓk and σ has m children σ1, . . . , σm, where σi = ⟨q̄i, ni, ḡi, l̄i⟩ and ni is not an entry location. We define φ as Σ(σ, b_{σσℓ}, σℓ) = ⋁_{1≤i≤m} T(σ, b_{σσi}, σi) ∧ Σ(σi, b_{σiσℓ}, σℓ) ∧ (b_{σσℓ} ⇔ (b_{σσi} ∨ b_{σiσℓ})). If n = ℓk, σ is an instantiation for the exit location ℓk and the valuation of σ shall be the same as that of σℓ; we define φ as Σ(σ, b_{σσℓ}, σℓ) = eq(σ, σℓ) ∧ (b_{σσℓ} ⇔ false).

Algorithm. We list our algorithm in Table 3. We construct a simulation tree G and define a set of equations Φ on the fly. The worklist wl records the current leaves. In each iteration, we pop a leaf σ = ⟨q̄, n, ḡ, l̄⟩ from the worklist. We expand the simulation tree from σ and define an equation φ_{T(σ,bσσ′,σ′)} for T(σ, b_{σσ′}, σ′) for each new edge (σ, σ′). After the expansion, we check whether Σ(σ, b_{σσℓ}, σℓ) ∈ Undef(Φ); if so, we define an equation φ_{Σ(σ,bσσℓ,σℓ)} for Σ(σ, b_{σσℓ}, σℓ). Then, we check all semi-symbolic paths in the simulation tree to find a looping reachability path. Let ρ ∈ G denote that ρ is a path from the root to a leaf in the simulation tree G. We define the Boolean formula
  Ξ−(G, Φ) ≜ ⋁_{ρ∈G} χLoop(ρ) ∧ Φ ∧ ⋀_{t∈Undef(Φ)} ¬t.
The satisfiability of Ξ−(G, Φ) corresponds to the existence of a looping reachability path. We have the following lemma.

Lemma 1. Ξ−(G, Φ) is eventually satisfiable if and only if ∃g0, l0, q, n, g, l. ⟨q0, ς0, g0, l0⟩ ⇒* ⟨q, n, g, l⟩ ⇒+_true ⟨q, n, g, l⟩.
Table 3. Bounded Model Checking Algorithm
 1: G = ∅; Φ = ∅; wl = {⟨[q0], ς0, ḡ, l̄⟩}
 2: while wl ≠ ∅ do
 3:   wl = wl \ σ, where σ = ⟨q̄, n, ḡ, l̄⟩
 4:   for each (n, n′) ∈ E^→ ∪ E^↷ ∪ E^↱ do
 5:     instantiate a semi-symbolic state σ′ = ⟨q̄′, n′, ḡ′, l̄′⟩; wl = wl ∪ {σ′}
 6:     G = G ∪ {(σ, σ′)}
 7:     Φ = Φ ∪ {φ_{T(σ,bσσ′,σ′)}}
 8:   end for
 9:   if Σ(σ, b_{σσℓ}, σℓ) ∈ Undef(Φ) then
10:     Φ = Φ ∪ {φ_{Σ(σ,bσσℓ,σℓ)}}
11:   end if
12:   if Ξ−(G, Φ) is satisfiable then
13:     print L(P) ∩ L(B) ≠ ∅; exit
14:   else if Ξ+(G, Φ) is unsatisfiable then
15:     print L(P) ∩ L(B) = ∅; exit
16:   end if
17: end while
We can stop expanding a semi-symbolic path once it has reached its loop diameter; otherwise, we must continue to expand it. Our algorithm terminates when no semi-symbolic path needs expansion. We define the Boolean formula
  Ξ+(G, Φ) ≜ ⋁_{ρ∈G} χDiameter(ρ) ∧ Φ.
The unsatisfiability of Ξ+(G, Φ) corresponds to the absence of semi-symbolic paths needing expansion; in other words, our algorithm can terminate. We have the following lemma.

Lemma 2. Ξ+(G, Φ) is eventually unsatisfiable if and only if there are no g0, l0, q, n, g, l such that ⟨q0, ς0, g0, l0⟩ ⇒* ⟨q, n, g, l⟩ ⇒+_true ⟨q, n, g, l⟩.
Applying the Post-dominance Relation. Consider a control-flow graph G = (N, E, Δ, callee, ς, ℓ). A location n is post-dominated by a location n′, denoted n ⇒ n′, if and only if all paths from n to the exit node ℓ must visit n′. A location n is immediately post-dominated by a location n′ if and only if n ⇒ n′ and there is no location n″ such that n ⇒ n″ and n″ ⇒ n′. Given a location n, let ipd(n) denote the location which immediately post-dominates n. While the program variables are kept symbolic, the locations are still tracked explicitly in simulation, so the size of the simulation tree grows exponentially with the number of branches in a program. To reduce the size of the simulation tree, we join the branches at the immediate post-dominators and continue to unfold the control-flow graphs from the immediate post-dominators.

Example 2. Following Example 1, the location ς0 has two branches and has the immediate post-dominator n1. In Fig. 2(c), we give the simulation tree applying
the immediate post-dominance relation: we stop these two branches at n1, add an additional edge (ς0, n1) (on the right-most side), and continue to unfold the control-flow graphs from n1.

Consider a tree node σ = ⟨q̄, n, ḡ, l̄⟩ and assume that n has more than one successor. We instantiate a semi-symbolic state σipd = ⟨q̄ipd, ipd(n), ḡipd, l̄ipd⟩ for the immediate post-dominator ipd(n) and add an edge (σ, σipd). For the edge (σ, σipd), we define an equation T(σ, b_{σσipd}, σipd) = Σ(σ, b_{σσipd}, σipd). Intuitively, T(σ, b_{σσipd}, σipd) shall be satisfied by an assignment π if there is a matched reachability path from |[σ]|π to |[σipd]|π, which is exactly Σ(σ, b_{σσipd}, σipd). For each node σ′ in the branches, we define an equation for Σ(σ′, b_{σ′σipd}, σipd) according to Table 2. Assume that there is a term Σ(σ, b_{σσ′}, σ′) ∈ Undef(Φ). We change the equation for Σ(σ, b_{σσ′}, σ′) to Σ(σ, b_{σσ′}, σ′) = T(σ, b_{σσipd}, σipd) ∧ T(σipd, b_{σipdσ′}, σ′), since we join the branches at ipd(n).
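Immediate post-dominators can be obtained with the standard iterative dataflow computation on the control-flow graph; the Python sketch below assumes every node can reach the exit and is not tied to the paper's implementation.

def post_dominators(nodes, succ, exit_node):
    """pdom[n]: the set of locations that post-dominate n (including n)."""
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            # n plus whatever post-dominates every successor of n.
            new = {n} | set.intersection(*(pdom[s] for s in succ(n)))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def ipd(pdom, n):
    """The strict post-dominator of n post-dominated by all the others."""
    strict = pdom[n] - {n}
    for d in strict:
        if strict <= pdom[d]:
            return d
    return None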
6  Experiments
Given a program P = {p0, . . . , pk} and an LTL formula ϕ, we have implemented our model checking algorithm to verify P |= ϕ. In our implementation, we construct the Büchi automaton B¬ϕ for the negation of ϕ using LTL2BA [10] and solve instances of Boolean satisfiability with the zChaff solver. We collected our data on a 1.6 GHz Intel machine with 1 GB of memory.
In Figure 3, we show the growth of the tree size on a quicksort program [7]. We construct the simulation tree in breadth-first order and record the number of tree nodes in each iteration. As can be seen, the growth of the tree size is smooth when applying the post-dominance relation: in the 25th iteration, the simulation tree is constructed completely, with 1682 tree nodes. Without the post-dominance relation, the tree size grows rapidly and the algorithm ran out of memory in the 17th iteration.
Fig. 3. Tree-size comparison with and without IPD
Table 4. Performance Comparison (times in seconds)

#procedure/        liveness                           safety
avg. #location     ans.  moped    lmc     IPD         ans.  moped   lmc     IPD
3/1k               No    0.04     0.03    0.04        Yes   0.02    0.08    0.02
4/2k               No    0.06     0.13    0.09        No    0.12    0.02    0.11
5/4k               No    0.09     0.16    0.10        Yes   0.14    1.70    0.09
6/8k               No    0.26     0.52    0.26        Yes   0.44    6.82    0.27
7/16k              No    0.93     1.42    0.71        No    1.98    3.79    4.88
8/32k              No    14.87    6.61    2.49        No    28.32   6.88    3.04
9/64k              No    18.99    7.82    3.01        No    37.90   7.87    2.96
10/128k            No    361.47   37.46   13.24       No    O/M     42.70   17.96

moped: BDD-based model checker. lmc: SAT-based model checker. O/M: out of memory.

Table 5. Profiling Data

#process/ avg. #location: 3/1k, 4/2k, 5/4k, 6/8k, 7/16k, 8/32k, 9/64k, 10/128k
Columns: liveness and safety, each with lmc and IPD, each split into create and solve times