Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2517
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Mark D. Aagaard John W. O’Leary (Eds.)
Formal Methods in Computer-Aided Design 4th International Conference, FMCAD 2002 Portland, OR, USA, November 6-8, 2002 Proceedings
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Mark D. Aagaard Department of Electrical and Computer Engineering, University of Waterloo 200 University Avenue West, Waterloo, ON N2L 3G1, Canada E-mail:
[email protected] John W. O’Leary Strategic CAD Labs, Intel Corporation 5200 NE Elam Young Parkway, Hillsboro OR, 97124-6497, USA E-mail:
[email protected] Cataloging-in-Publication Data applied for Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at
CR Subject Classification (1998): B.1.2, B.1.4, B.2.2-3, B.6.2-3, B.7.2-3, F.3.1, F.4.1, I.2.3, D.2.4, J.6 ISSN 0302-9743 ISBN 3-540-00116-6 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2002 Printed in Germany Typesetting: Camera-ready by author, data conversion by PTP-Berlin, Stefan Sossna e. K. Printed on acid-free paper SPIN: 10870994 06/3142 543210
Preface
This volume contains the proceedings of the Fourth Biennial Conference on Formal Methods in Computer-Aided Design (FMCAD). The conference is devoted to the use of mathematical methods for the analysis of digital hardware circuits and systems. The work reported in this book describes the use of formal mathematics and associated tools to design and verify digital hardware systems. Functional verification has become one of the principal costs in a modern computer design effort. FMCAD provides a venue for academic and industrial researchers and practitioners to share their ideas and experiences of using discrete mathematical modeling and verification. Over the past 20 years, this area has grown from just a few academic researchers to a vibrant worldwide community of people from both academia and industry. This volume includes 23 papers selected from the 47 submitted papers, each of which was reviewed by at least three program committee members. The history of FMCAD dates back to 1984, when the earliest meetings on this topic occurred as part of IFIP WG10.2.

IFIP WG10.2 Workshops
1984 Darmstadt Eveking
1985 Edinburgh Milne and Subrahmanyam
1986 Grenoble Borrione
1988 Glasgow Milne
1989 Leuven Claessen
1990 Miami Subrahmanyam
1991 Torino Prinetto and Camurati
At the IFIP WG10.2 meeting in 1991, a presentation by the ESPRIT group “CHARME” led to the creation of the conference on Correct Hardware Design and Verification Methods (CHARME). For several years, CHARME alternated with the conference on Theorem Provers in Circuit Design (TPCD), which evolved into FMCAD. Traditionally, FMCAD and CHARME are held in alternate years on different continents.

Correct Hardware Design and Verification Methods (CHARME)
1993 Arles Milne and Pierre (LNCS 683)
1995 Frankfurt Eveking and Camurati (LNCS 987)
1997 Montreal Li and Probst
1999 Bad Herrenalb Kropf and Pierre (LNCS 1703)
2001 Livingston Margaria and Melham (LNCS 2144)
Theorem Provers in Circuit Design (TPCD)
1992 Nijmegen Boute, Melham, and Stavridou
1994 Bad Herrenalb Kropf and Kumar (LNCS 901)

Formal Methods in Computer-Aided Design (FMCAD)
1996 San Jose Camilleri and Srivas (LNCS 1166)
1998 San Jose Gopalakrishnan and Windley (LNCS 1522)
2000 Austin Hunt and Johnson (LNCS 1954)
The organizers are grateful to Intel, Motorola, Xilinx, and Synopsys for their financial sponsorship, which considerably eased the organization of the conference. Sandy Ellison and Kelli Dawson of Intel Meeting Services are to be thanked for their tireless effort; they kept us on an organized and orderly path. Waterloo, Ontario Portland, Oregon November 2002
Mark D. Aagaard John W. O’Leary
Conference Organization John O’Leary (General Chair) Mark Aagaard (Program Chair)
Program Committee Mark Aagaard (Canada) Dominique Borrione (France) Randal E. Bryant (USA) Jerry Burch (USA) Eduard Cerny (USA) Shiu-Kai Chin (USA) Ed Clarke (USA) David Dill (USA) Hans Eveking (Germany) Masahiro Fujita (Japan) Steven German (USA) Ganesh Gopalakrishnan (USA) Mike Gordon (UK) Susanne Graf (France) Kiyoharu Hamaguchi (Japan) Ravi Hosabettu (USA) Alan Hu (Canada) Warren Hunt (USA) Steve Johnson (USA)
Robert Jones (USA) Thomas Kropf (Germany) Andreas Kuehlmann (USA) John Launchbury (USA) Tim Leonard (USA) Andy Martin (USA) Ken McMillan (USA) Tom Melham (UK) Paul Miner (USA) John O’Leary (USA) Laurence Pierre (France) Carl Pixley (USA) David Russinoff (USA) Mary Sheeran (Sweden) Eli Singerman (Israel) Anna Slobodova (USA) Ranga Vemuri (USA) Matthew Wilding (USA) Jin Yang (USA)
Additional Reviewers Roy Armoni Ritwik Bhattacharya Jesse Bingham Annette Bunker Pankaj Chauhan Limor Fix Amit Goel
John Harrison Gila Kamhi James Kukula Shuvendu Lahiri Madhubanti Mukherjee Rajesh Radhakrishnan Sanjit Seshia
Ali Sezgin Robert de Simone Subramanyan Siva Ofer Strichman Rob Sumners Vijay Sundaresan
Table of Contents
Abstraction
Abstraction by Symbolic Indexing Transformations . . . . . . . . . . . . . . . . . . . . . 1
Thomas F. Melham, Robert B. Jones
Counter-Example Based Predicate Discovery in Predicate Abstraction . . . . . 19
Satyaki Das, David L. Dill
Automated Abstraction Refinement for Model Checking Large State Spaces Using SAT Based Conflict Analysis . . . . . 33
Pankaj Chauhan, Edmund Clarke, James Kukula, Samir Sapra, Helmut Veith, Dong Wang
Symbolic Simulation
Simplifying Circuits for Formal Verification Using Parametric Representation . . . . . 52
In-Ho Moon, Hee Hwan Kwak, James Kukula, Thomas Shiple, Carl Pixley
Generalized Symbolic Trajectory Evaluation — Abstraction in Action . . . . . 70
Jin Yang, Carl-Johan H. Seger
Model Checking: Strongly-Connected Components
Analysis of Symbolic SCC Hull Algorithms . . . . . 88
Fabio Somenzi, Kavita Ravi, Roderick Bloem
Sharp Disjunctive Decomposition for Language Emptiness Checking . . . . . 106
Chao Wang, Gary D. Hachtel
Microprocessor Specification and Verification
Relating Multi-step and Single-Step Microprocessor Correctness Statements . . . . . 123
Mark D. Aagaard, Nancy A. Day, Meng Lou
Modeling and Verification of Out-of-Order Microprocessors in UCLID . . . . . 142
Shuvendu K. Lahiri, Sanjit A. Seshia, Randal E. Bryant
Decision Procedures
On Solving Presburger and Linear Arithmetic with SAT . . . . . 160
Ofer Strichman
Deciding Presburger Arithmetic by Model Checking and Comparisons with Other Methods . . . . . 171
Vijay Ganesh, Sergey Berezin, David L. Dill
Qubos: Deciding Quantified Boolean Logic Using Propositional Satisfiability Solvers . . . . . 187
Abdelwaheb Ayari, David Basin
Model Checking: Reachability Analysis
Exploiting Transition Locality in the Disk Based Murϕ Verifier . . . . . 202
Giuseppe Della Penna, Benedetto Intrigila, Enrico Tronci, Marisa Venturini Zilli
Traversal Techniques for Concurrent Systems . . . . . 220
Marc Solé, Enric Pastor
Model Checking: Fixed Points
A Fixpoint Based Encoding for Bounded Model Checking . . . . . 238
Alan Frisch, Daniel Sheridan, Toby Walsh
Using Edge-Valued Decision Diagrams for Symbolic Generation of Shortest Paths . . . . . 256
Gianfranco Ciardo, Radu Siminiceanu
Verification Techniques and Methodology
Mechanical Verification of a Square Root Algorithm Using Taylor’s Theorem . . . . . 274
Jun Sawada, Ruben Gamboa
A Specification and Verification Framework for Developing Weak Shared Memory Consistency Protocols . . . . . 292
Prosenjit Chatterjee, Ganesh Gopalakrishnan
Model Checking the Design of an Unrestricted, Stuck-at Fault Tolerant, Asynchronous Sequential Circuit Using SMV . . . . . 310
Meine van der Meulen
Hardware Description Languages
Functional Design Using Behavioural and Structural Components . . . . . 324
Richard Sharp
Compiling Hardware Descriptions with Relative Placement Information for Parametrised Libraries . . . . . 342
Steve McKeever, Wayne Luk, Arran Derbyshire
Prototyping and Synthesis
Input/Output Compatibility of Reactive Systems . . . . . 360
Josep Carmona, Jordi Cortadella
Smart Play-out of Behavioral Requirements . . . . . 378
David Harel, Hillel Kugler, Rami Marelly, Amir Pnueli
Author Index . . . . . 399
Abstraction by Symbolic Indexing Transformations

Thomas F. Melham¹ and Robert B. Jones²

¹ Department of Computing Science, University of Glasgow, Glasgow, Scotland, G12 8QQ
² Strategic CAD Labs, Intel Corporation, JF4-211, 2511 NE 25th Avenue, Hillsboro, OR 97124, USA
Abstract. Symbolic indexing is a data abstraction technique that exploits the partially-ordered state space of symbolic trajectory evaluation (STE). Use of this technique has been somewhat limited in practice because of its complexity. We present logical machinery and efficient algorithms that provide a much simpler interface to symbolic indexing for the STE user. Our logical machinery also allows correctness assertions proved by symbolic indexing to be composed into larger properties, something previously not possible.
1 Introduction
Symbolic trajectory evaluation (STE) is an efficient model checking algorithm especially suited to verifying properties of large datapath designs [1]. STE is based on symbolic ternary simulation [2], in which the Boolean data domain {0, 1} is extended to a partially-ordered state space by the addition of an unknown value ‘X’. This gives circuit models in STE a built-in and flexible data abstraction hierarchy. Symbolic indexing is a technique for formulating STE logic formulas in a way that exploits this partially-ordered state space and reduces the number of BDD variables needed to verify a property. The method can make a dramatic difference in the time and space needed to check a formula, and can be used to verify circuit properties that are infeasible to verify directly [3]. Although symbolic indexing has been known for a long time [4], our experience is that it is not exploited nearly as often as it is applicable. In part, this is because only limited user-level support has been available in libraries provided to verification engineers. But, more importantly, correctness assertions proved by symbolic indexing are not formulated in a way that makes them composable at higher levels. Two formulas written using symbolic indexing might express two circuit properties that imply some desired result but encode these properties using incompatible indexing schemes. Moreover, there is no explicit characterization of the conditions under which more composable formulas can be derived from the indexed ones.

M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 1–18, 2002.
© Springer-Verlag Berlin Heidelberg 2002
This paper describes some logical machinery aimed at bridging these gaps. We present an algorithm to transform ordinary verification problems into symbolically indexed form, together with an account of the side-conditions that must hold for this transformation to be sound. We also describe how the algorithm can be applied in the presence of environmental constraints, an important consideration in practice. Finally, we provide some experimental results on a CAM (content-addressable memory) circuit. The work presented in this paper does not completely automate the use of symbolic indexing in the verification flow. Our algorithms require the user to supply an indexing relation that expresses the desired abstraction scheme; we do not provide a method whereby an effective indexing relation can be discovered in the first place. Our results do, however, guarantee the soundness, subject to certain well-characterized side-conditions, of using an indexing relation to transform a verification property. This key result paves the way for future work on automatic abstraction techniques for STE, in which an attempt might be made to discover suitable indexing relations automatically.
2 STE Model Checking
Symbolic trajectory evaluation [1] is an efficient model checking algorithm especially suited to verifying properties of large datapath designs. The most basic form of STE works on a very simple linear-time temporal logic, limited to implications between formulas built from only conjunction and the next-time operator. STE is based on ternary simulation [2], in which the Boolean data domain {0, 1} is extended with a third value ‘X’ that stands for an indeterminate value (‘0’ or ‘1’). This provides STE with powerful state-space abstraction capabilities, as will be illustrated subsequently. While the basic STE logic is weak, its expressive power is greatly extended by implementing a symbolic ternary simulation algorithm. Symbolic ternary simulation [4] uses BDDs [5] to represent classes of data values on circuit nodes. With this representation, STE can combine many (ternary) simulation runs—one for each assignment of values to the BDD variables—into a single symbolic simulation run covering them all. In this section, we provide a brief overview of STE model checking theory. A full account of the theory can be found in [1] and an alternative perspective in [6].

2.1 Circuit Models
Symbolic trajectory evaluation employs a ternary data model with values drawn from the set D = {0, 1, X}. A partial order relation ≤ is introduced, with X ≤ 0 and X ≤ 1:

    0   1
     \ /
      X
This orders values by information content: X stands for an unknown value and so is ordered below 0 and 1. We suppose there is a set of nodes, N, naming observable points in circuits. A state is an instantaneous snapshot of circuit behavior given by assigning a value in D to every circuit node in N. The ordering ≤ on D is extended pointwise to get an ordering on states. We wish this to form a complete lattice, and so introduce a special ‘top’ state, ⊤, and define the set of states S to be (N→D) ∪ {⊤}. The required ordering ⊑ is then defined for states s1, s2 ∈ S by

    s1 ⊑ s2  =  s2 = ⊤, or s1, s2 ∈ N→D and s1(n) ≤ s2(n) for all n ∈ N
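To make the ordering concrete, here is a small Python sketch (ours, not part of the paper's formal development) of the information ordering on D and its pointwise lifting to states; the top state ⊤ is omitted for simplicity.

```python
# A sketch (ours, not the paper's formalisation) of the ternary ordering
# on D = {0, 1, X} and its pointwise lifting to states; the top state
# is omitted for simplicity.
X = "X"

def leq(a, b):
    """a <= b in the information ordering: X lies below both 0 and 1."""
    return a == X or a == b

def state_leq(s1, s2, nodes):
    """Pointwise lifting of <= to states, i.e. maps from nodes to D."""
    return all(leq(s1[n], s2[n]) for n in nodes)

nodes = ["i1", "i2", "o"]
weak = {"i1": X, "i2": 0, "o": X}    # has less information ...
strong = {"i1": 1, "i2": 0, "o": 0}  # ... than this refinement of it
assert state_leq(weak, strong, nodes)
assert not state_leq(strong, weak, nodes)
```

A state with more Xs is thus weaker (lower in the lattice) than any state that refines those Xs to 0s and 1s.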
The intuition is that if s1 ⊑ s2, then s1 may have ‘less information’ about node values than s2, i.e. it may have Xs in place of some 0s and 1s. If one considers the three-valued ‘states’ s1 and s2 as constraints or predicates on the actual, i.e. Boolean, state of the hardware, then s1 ⊑ s2 means that every Boolean state that satisfies s1 also satisfies s2. We say that s1 is ‘weaker than’ s2. (Strictly speaking, ⊑ is reflexive and we really mean ‘no stronger than’, but it is common to be somewhat inexact and just say ‘weaker than’.) The top value ⊤ represents the unsatisfiable constraint. The join operator on pairs of states in the lattice is denoted by ‘⊔’. To model dynamic behavior, a sequence of the values that occur on circuit nodes over time will be represented by a function σ ∈ N→S from time (the natural numbers N) to states. Such a function, called a sequence, assigns a value in D to each node at each point in time. For example, σ 3 reset is the value present on the reset node at time 3. We lift the ordering on states pointwise to sequences in the obvious way:

    σ1 ⊑ σ2  =  σ1(t) ⊑ σ2(t) for all t ∈ N
One convenient operation, used later in stating the semantics of STE, is taking the ith suffix of a sequence. The ith suffix of a sequence σ is written σ i and defined by σ i t = σ (t+i) for all t ∈ N.
The suffix operation σ i simply shifts the sequence σ forward i points in time, ignoring the states at the first i time units. In symbolic trajectory evaluation, the formal model of a circuit c is given by a next-state function Yc ∈ S → S that maps states to states. Intuitively, the next-state function expresses a constraint on the real, Boolean states into which the circuit may go, given a constraint on the current Boolean state it is in. The next-state function must be monotonic, and a requirement for implementations of STE is that they extract a next-state function that has this property from the circuit under analysis.¹

¹ In practice, the circuit model Yc is constructed on-the-fly by ternary symbolic simulation of a netlist description of the circuit c.
A sequence σ is said to be a trajectory of a circuit if it represents a set of behaviors that the circuit could actually exhibit. That is, the set of behaviors that σ represents (i.e. possibly using unknowns) is a subset of the Boolean behaviors that the real circuit can exhibit (where there are no unknowns). For a circuit c, we define the set of all its trajectories, T(c), as follows:

    T(c) = {σ | Yc(σ t) ⊑ σ(t+1) for all t ∈ N}

For a sequence σ to be a trajectory, the result of applying Yc to any state must be no more specified (with respect to the ordering) than the state at the next moment of time. This ensures that σ is consistent with the circuit model Yc.
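As an illustrative sketch (ours, with a hypothetical unit-delay buffer as the circuit), the trajectory condition can be checked over a finite prefix of a sequence:

```python
# A hedged sketch (not the paper's model): the trajectory condition
# Y(sigma(t)) below-or-equal sigma(t+1), checked over a finite prefix
# for a hypothetical unit-delay buffer copying node a to node b.
X = "X"

def leq(a, b):
    """Information ordering on D = {0, 1, X}."""
    return a == X or a == b

def state_leq(s1, s2):
    return all(leq(s1[n], s2[n]) for n in s1)

def Y_buffer(s):
    """Next-state function: b takes a's previous value; a becomes unknown."""
    return {"a": X, "b": s["a"]}

def is_trajectory_prefix(sigma):
    """Check Y(sigma[t]) is weaker than sigma[t+1] at every step."""
    return all(state_leq(Y_buffer(sigma[t]), sigma[t + 1])
               for t in range(len(sigma) - 1))

good = [{"a": 1, "b": X}, {"a": 0, "b": 1}, {"a": X, "b": 0}]
bad = [{"a": 1, "b": X}, {"a": X, "b": 0}]  # b = 0 contradicts a's old value 1
assert is_trajectory_prefix(good)
assert not is_trajectory_prefix(bad)
```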
2.2 Trajectory Evaluation Logic
One of the keys to the efficiency of STE and its success with datapath circuits is its restricted temporal logic. A trajectory formula is a simple linear-time temporal logic formula with the following syntax:

    f, g :=  n is 0    (node n has value 0)
          |  n is 1    (node n has value 1)
          |  f and g   (conjunction of formulas)
          |  P → f     (f is asserted only when P is true)
          |  N f       (f holds in the next time step)
where f and g range over formulas, n ∈ N ranges over the nodes of the circuit, and P is a propositional formula (‘Boolean function’) called a guard. The basic trajectory formulas ‘n is 0’ and ‘n is 1’ say that the node n has value 0 or value 1, respectively. The operator and forms the conjunction of trajectory formulas. The trajectory formula P → f weakens the subformula f by requiring it to be satisfied only when the guard P is true. Finally, Nf says that the trajectory formula f holds in the next point of time. Guards are the only place that variables may occur in the primitive definition of trajectory formulas. At first sight, this seems to rule out assertions such as ‘node n has value b’, where b is a variable. But the following syntactic sugar allows variables—indeed any propositional formula—to be associated with a node:
    n is P  =  P → (n is 1) and ¬P → (n is 0)

where n ∈ N ranges over nodes and P ranges over propositional formulas. The definition of when a sequence σ satisfies a trajectory formula f is now given. Satisfaction is defined with respect to an assignment φ of Boolean truth-values to the variables that appear in the guards of the formula:

    φ, σ |= n is 0   =  σ(0) = ⊤, or σ(0) ∈ N→D and σ 0 n = 0
    φ, σ |= n is 1   =  σ(0) = ⊤, or σ(0) ∈ N→D and σ 0 n = 1
    φ, σ |= f and g  =  φ, σ |= f and φ, σ |= g
    φ, σ |= P → f    =  φ |= P implies φ, σ |= f
    φ, σ |= N f      =  φ, σ 1 |= f
where φ |= P means that the propositional formula P is satisfied by the assignment φ of truth-values to the Boolean variables in P. The key feature of this logic is that for any trajectory formula f and assignment φ, there exists a unique weakest sequence that satisfies f. This sequence is called the defining sequence for f and is written [f]φ. It is defined recursively as follows:

    [m is 0]φ t   =  λn. 0 if m=n and t=0, otherwise X
    [m is 1]φ t   =  λn. 1 if m=n and t=0, otherwise X
    [f and g]φ t  =  ([f]φ t) ⊔ ([g]φ t)
    [P → f]φ t    =  [f]φ t if φ |= P, otherwise λn. X
    [N f]φ t      =  [f]φ (t−1) if t ≠ 0, otherwise λn. X
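The recursion above can be rendered as a small Python sketch; the tuple encoding of formulas and all function names are ours, and the ⊤ state (join of conflicting 0 and 1 requirements) is not modelled:

```python
# Illustrative sketch: trajectory formulas as nested tuples and the defining
# sequence computed directly from the recursion; the encoding is ours, and
# the top state (join of 0 and 1) is not modelled.
X = "X"

def join(a, b):
    """Lattice join on D; joining 0 with 1 would give top, outside this sketch."""
    if a == X:
        return b
    if b == X:
        return a
    assert a == b, "join of 0 and 1 is the top value, not modelled here"
    return a

def defseq(f, phi, t, node):
    """Value required on `node` at time t by formula f under assignment phi."""
    tag = f[0]
    if tag == "is":        # ("is", n, v): node n has value v (at time 0)
        _, n, v = f
        return v if (n == node and t == 0) else X
    if tag == "and":       # ("and", f1, f2)
        return join(defseq(f[1], phi, t, node), defseq(f[2], phi, t, node))
    if tag == "guard":     # ("guard", P, f1), with P a predicate on phi
        return defseq(f[2], phi, t, node) if f[1](phi) else X
    if tag == "next":      # ("next", f1)
        return defseq(f[1], phi, t - 1, node) if t != 0 else X

# (a is 1) and N (b -> (o is 0)), with b a guard variable
f = ("and", ("is", "a", 1),
     ("next", ("guard", lambda phi: phi["b"], ("is", "o", 0))))
assert defseq(f, {"b": True}, 0, "a") == 1
assert defseq(f, {"b": True}, 1, "o") == 0
assert defseq(f, {"b": False}, 1, "o") == X
```

When the guard b is false, the sub-formula contributes only Xs, exactly as the [P → f] clause prescribes.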
The crucial property enjoyed by this definition is that [f]φ is the unique weakest sequence that satisfies f for the given φ. That is, for any φ and σ, φ, σ |= f if and only if [f]φ ⊑ σ. The algorithm for STE is also concerned with the weakest trajectory that satisfies a particular formula. This is the defining trajectory for a formula, written [[f]]φ. It is defined by the following recursive calculation:

    [[f]]φ 0      =  [f]φ 0
    [[f]]φ (t+1)  =  [f]φ (t+1) ⊔ Yc([[f]]φ t)

The defining trajectory of a formula f is its defining sequence with the added constraints on state transitions imposed by the circuit, as modeled by the next-state function Yc. It can be shown that [[f]]φ is the unique weakest trajectory that satisfies f.
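The recursion can be illustrated with a sketch (ours; the unit-delay buffer and all names are hypothetical) showing how the circuit model joins extra information into the defining sequence at each step:

```python
# A sketch (ours) of the defining-trajectory recursion for the antecedent
# "a is 1" and a hypothetical unit-delay buffer from a to b: the circuit
# model joins in the values it forces at each step.
X = "X"

def join(a, b):
    if a == X:
        return b
    if b == X:
        return a
    assert a == b  # the top state never arises in this example
    return a

def state_join(s1, s2):
    return {n: join(s1[n], s2[n]) for n in s1}

def Y_buffer(s):
    """Next-state function of the buffer: b takes a's previous value."""
    return {"a": X, "b": s["a"]}

def defstate(t):
    """Defining sequence of "a is 1": only time 0 constrains node a."""
    return {"a": 1 if t == 0 else X, "b": X}

def deftraj(depth):
    """[[f]](0) = [f](0);  [[f]](t+1) = [f](t+1) joined with Y([[f]](t))."""
    traj = [defstate(0)]
    for t in range(depth):
        traj.append(state_join(defstate(t + 1), Y_buffer(traj[t])))
    return traj

traj = deftraj(2)
assert traj[1]["b"] == 1   # the circuit forces b = 1 at time 1 ...
assert traj[2]["b"] == X   # ... and nothing at time 2
```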
2.3 Symbolic Trajectory Evaluation
Circuit correctness in symbolic trajectory evaluation is stated with trajectory assertions of the form A ⇒ C, where A and C are trajectory formulas. The intuition is that the antecedent A provides stimuli to circuit nodes and the consequent C specifies the values expected on circuit nodes as a response. A trajectory assertion is true for a given assignment φ of Boolean values to the variables in its guards exactly when every trajectory of the circuit that satisfies the antecedent also satisfies the consequent. For a given circuit c, we define φ |= A ⇒ C to mean that for all σ ∈ T(c), if φ, σ |= A then φ, σ |= C. The notation |= A ⇒ C means that φ |= A ⇒ C holds for all φ. The fundamental theorem of trajectory evaluation [1] follows immediately from the previously-stated properties of [f]φ and [[f]]φ. It states that for any φ, the trajectory assertion φ |= A ⇒ C holds exactly when [C]φ ⊑ [[A]]φ. The intuition is that the sequence characterizing the consequent must be ‘included in’ the weakest sequence satisfying the antecedent that is also consistent with the circuit.
This theorem gives a model-checking algorithm for trajectory assertions: to see if φ |= A ⇒ C holds for a given φ, just compute [C]φ and [[A]]φ and compare them point-wise for every circuit node and point in time. This works because both A and C will have only a finite number of nested next-time operators N, and so only finite initial segments of the defining trajectory and defining sequence need to be calculated and compared. Much of the practical utility of STE comes from the key observation that it is possible to compute [C]φ ⊑ [[A]]φ not just for a specific φ, but as a symbolic constraint on an arbitrary φ. This constraint takes the form of a propositional formula (e.g. a BDD) which is true exactly for variable assignments φ for which [C]φ ⊑ [[A]]φ holds. Such a constraint is called a residual, and represents precisely the conditions under which the property A ⇒ C is true of the circuit.
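A sketch of this check (ours, for the same hypothetical unit-delay buffer and the assertion (a is 1) ⇒ N (b is 1); the encoding and names are assumptions, not the paper's tool):

```python
# Sketch (ours) of the STE check for a hypothetical unit-delay buffer and
# the assertion (a is 1) => N (b is 1): compute [C] and [[A]] to a finite
# depth and compare them pointwise under the information ordering.
X = "X"

def leq(a, b):
    return a == X or a == b

def join(a, b):
    return b if a == X else a   # adequate here: 0 joined with 1 never arises

def Y_buffer(s):
    return {"a": X, "b": s["a"]}

def seq_A(t):
    """Defining sequence of the antecedent: a is 1 (at time 0)."""
    return {"a": 1 if t == 0 else X, "b": X}

def seq_C(t):
    """Defining sequence of the consequent: N (b is 1)."""
    return {"a": X, "b": 1 if t == 1 else X}

def deftraj_A(depth):
    """Defining trajectory of the antecedent under the buffer model."""
    traj = [seq_A(0)]
    for t in range(depth):
        forced = Y_buffer(traj[t])
        traj.append({n: join(seq_A(t + 1)[n], forced[n]) for n in forced})
    return traj

traj = deftraj_A(2)
ok = all(leq(seq_C(t)[n], traj[t][n]) for t in range(3) for n in ("a", "b"))
assert ok   # [C] is weaker than [[A]], so the assertion holds for this model
```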
3 Symbolic Indexing in STE
Two important properties follow from the STE theory just presented. Consider an STE assertion A ⇒ C. Suppose we replace the antecedent A with a new antecedent B that has a defining sequence no stronger than that of A (i.e. [B]φ ⊑ [A]φ for all φ). Then by monotonicity of the underlying circuit model we will also have that [[B]]φ ⊑ [[A]]φ for all φ. Hence if we can prove |= B ⇒ C, then the original STE assertion |= A ⇒ C also holds. This is called antecedent weakening. Likewise, if we replace the consequent C with a new consequent D that has a defining sequence no weaker than that of C (i.e. [C]φ ⊑ [D]φ for all φ) and we can prove |= A ⇒ D, then the original STE assertion |= A ⇒ C also holds. This is called consequent strengthening. Symbolic indexing is the systematic use of antecedent weakening to perform data abstraction for certain circuit structures. It exploits the partially-ordered state space of STE to reduce the complexity of the BDDs needed to verify a circuit property. Intuitively, symbolic indexing is a way to use BDD variables only ‘when needed’. The idea can be illustrated using the following trivial example. Consider the three-input AND gate shown below:

[Figure: a three-input AND gate with inputs i1, i2, i3 and output o]

With direct use of STE, an assertion that could be used to verify this device is

    |= (i1 is a) and (i2 is b) and (i3 is c) ⇒ (o is a ∧ b ∧ c)    (1)
In primitive form, this would be expressed as follows:

    |= ¬a → (i1 is 0) and a → (i1 is 1) and
       ¬b → (i2 is 0) and b → (i2 is 1) and
       ¬c → (i3 is 0) and c → (i3 is 1)
    ⇒  ¬a ∨ ¬b ∨ ¬c → (o is 0) and a ∧ b ∧ c → (o is 1)    (2)
The strategy here is to place unique and unconstrained Boolean variables on each input node in the device, and symbolically simulate the circuit to check that the desired function of these variables will appear on the output node. STE’s unknown value X allows us to reduce the number of variables needed to verify the desired property. Because of the functionality of the AND gate, only the four cases enumerated in the table below need to be verified:

    case  i1  i2  i3  o
      0    0   X   X  0
      1    X   0   X  0
      2    X   X   0  0
      3    1   1   1  1

If at least one of the AND inputs is 0, the output will be 0 regardless of the values on the other two inputs. In these cases, X may be used to represent the unknown value on the other two input nodes. If all three inputs are 1, then the output is 1 as well. Antecedent weakening, and the fact that the four cases enumerated above cover all input patterns of 0s and 1s, means this is sufficient for a complete verification. Symbolic indexing is the technique of using Boolean variables to enumerate or ‘index’ groups of cases in this efficient way. For the AND gate, there are just four cases to check, so these can be indexed with two Boolean variables, say p and q. These cases can then be verified simultaneously with STE by checking the following trajectory assertion:

    |= ¬p ∧ ¬q → (i1 is 0) and p ∧ q → (i1 is 1) and
       p ∧ ¬q → (i2 is 0) and p ∧ q → (i2 is 1) and
       ¬p ∧ q → (i3 is 0) and p ∧ q → (i3 is 1)
    ⇒  ¬p ∨ ¬q → (o is 0) and p ∧ q → (o is 1)    (3)
4 Indexing Transformations
The technical contribution of this paper addresses two problems with using symbolic indexing in practice. First, how can we gain the efficiency of symbolic indexing and yet still obtain properties that make direct, non-indexed statements about circuit correctness? Second, what side conditions must hold to ensure the soundness of such a process? We show how to construct indexed STE assertions from direct ones, given a user-supplied specification of the indexing scheme to be employed. For example, applying the method to the AND gate formula (1) above produces the indexed formula (3). This provides an accessible interface to the indexing technique. The user no longer needs to generate indexed antecedents and consequents explicitly, but can describe the indexing scheme abstractly and let a computer program construct the correct indexed formulas. Moreover, if the resulting indexed assertions are proven true, then the original assertion is also true by construction (subject to a certain side condition). This means that the original assertion can subsequently be used in higher-level reasoning. For example, it might be composed via theorem proving with other assertions verified using a different indexing scheme.
4.1 Indexing Relations
The user’s interface to our indexing method is an indexing relation that specifies the indexing scheme to be applied to the problem at hand. The relation is a propositional logic formula of the form R(xs, ts). It relates the Boolean variables ts appearing in the original problem and the Boolean variables xs that will index the cases being grouped together in the abstraction. The original problem variables ts are called the index target variables and the variables to be introduced xs are called the index variables. For the AND gate, the index targets are a, b, c and the index variables are p and q. The indexing relation R is:

    R(p, q, a, b, c) ≡ (¬p ∧ ¬q ⊃ ¬a) ∧ (p ∧ ¬q ⊃ ¬b) ∧ (¬p ∧ q ⊃ ¬c) ∧ (p ∧ q ⊃ a ∧ b ∧ c)

As can be seen, this relation represents in logical form an enumeration of the four cases in the table of Section 3. Note that the indexing relation is not one-to-one (though other indexing relations may be). This reflects the Xs that appear in the table in Section 3, and indeed is essential to making the indexing a data abstraction at all.
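Rendering the relation as a Python predicate (reconstructed from the four-case table of Section 3; the encoding is ours) makes its properties easy to check, including the fact that it is not one-to-one and that it covers every target assignment (a condition needed later for soundness):

```python
# Sketch (ours): the AND-gate indexing relation as a Python predicate,
# reconstructed from the four-case table; each non-(1,1) index forces one
# input low, and index (1,1) forces all inputs high.
from itertools import product

def R(p, q, a, b, c):
    return ((not (not p and not q) or not a) and      # case 0: index (0,0) forces a = 0
            (not (p and not q) or not b) and          # case 1: index (1,0) forces b = 0
            (not (not p and q) or not c) and          # case 2: index (0,1) forces c = 0
            (not (p and q) or (a and b and c)))       # case 3: index (1,1) forces a = b = c = 1

# Not one-to-one: index (p,q) = (0,0) admits every target with a = 0.
targets_00 = [t for t in product((False, True), repeat=3) if R(False, False, *t)]
assert targets_00 == [(False, b, c) for b in (False, True) for c in (False, True)]

# Index (1,1) relates to exactly one target, (1,1,1).
assert [t for t in product((False, True), repeat=3)
        if R(True, True, *t)] == [(True, True, True)]

# The relation also covers every target assignment.
covered = all(any(R(p, q, *t) for p, q in product((False, True), repeat=2))
              for t in product((False, True), repeat=3))
assert covered
```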
4.2 Preimage and Strong Preimage
It is convenient to specify two operations on predicates using an indexing relation. The first is the ordinary preimage operation. Given a relation R and a predicate P on the target variables, the preimage PR is defined by
PR = ∃ts. R(xs, ts) ∧ P (ts)
The second is the strong preimage of a predicate. Given a relation R and a predicate P on the target variables, the strong preimage P R is defined by P R = PR ∧ ¬ [∃ts. R(xs, ts) ∧ ¬P (ts)]
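For small variable counts both operations can be computed by brute-force enumeration; a sketch (ours, with a toy one-bit relation):

```python
# Sketch (ours): preimage and strong preimage computed by brute-force
# enumeration of small index and target spaces, with predicates and the
# relation given as Python functions over tuples of Booleans.
from itertools import product

def preimage(R, P, n_xs, n_ts):
    """The set of indexes xs such that some ts with R(xs, ts) satisfies P."""
    return {xs for xs in product((False, True), repeat=n_xs)
            if any(R(xs, ts) and P(ts)
                   for ts in product((False, True), repeat=n_ts))}

def strong_preimage(R, P, n_xs, n_ts):
    """Indexes in the preimage of P but not in the preimage of not-P."""
    return (preimage(R, P, n_xs, n_ts)
            - preimage(R, lambda ts: not P(ts), n_xs, n_ts))

# Toy relation: one index bit x is related to target pairs (t0, t1) with t0 == x.
R = lambda xs, ts: ts[0] == xs[0]
P = lambda ts: ts[0] and ts[1]   # P holds iff both target bits are set

assert preimage(R, P, 1, 2) == {(True,)}
# x = True also reaches (True, False), where P fails, so it is excluded:
assert strong_preimage(R, P, 1, 2) == set()
```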
Intuitively, the strong preimage P^R(xs) holds of an index xs precisely when xs is in the preimage of P and not in the preimage of the negation of P. These operations are illustrated in Figure 1.

[Fig. 1. Index Relation Preimages: the index space xs is related by R to the target space ts.]

The solid circle is the preimage PR of P and the dotted circle the preimage (¬P)R of the negation of P. The strong preimage P^R is the shaded region, i.e. that part of PR that does not also lie within (¬P)R.

4.3 Transforming STE Formulas with Indexing Relations
Our indexing transformation for an STE assertion A ⇒ C applies the strong preimage operation to the guards of the antecedent A and the preimage operation to the guards of the consequent C. For a given trajectory formula f and indexing relation R, we write f_R for the preimage of f under R and f^R for the strong preimage of f under R. The definitions of these operations are given by recursion over the syntax of trajectory formulas in the obvious way:

(n is 0)^R = n is 0            (n is 0)_R = n is 0
(n is 1)^R = n is 1            (n is 1)_R = n is 1
(f and g)^R = f^R and g^R      (f and g)_R = f_R and g_R
(P → f)^R = P^R → f^R          (P → f)_R = P_R → f_R
(N f)^R = N f^R                (N f)_R = N f_R
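The recursion can be rendered over a small AST for trajectory formulas (the tuple encoding below is hypothetical, not Forte's representation); one traversal serves for both operations, parameterized by the guard-level transformation op (preimage or strong preimage).

```python
def transform(f, op):
    """Apply a guard transformation op to every guard in trajectory formula f."""
    tag = f[0]
    if tag == 'is':                    # ('is', node, bit): n is 0 / n is 1
        return f                       # unchanged by both operations
    if tag == 'and':                   # ('and', f, g)
        return ('and', transform(f[1], op), transform(f[2], op))
    if tag == 'guard':                 # ('guard', P, f): P -> f
        return ('guard', op(f[1]), transform(f[2], op))
    if tag == 'next':                  # ('next', f): N f
        return ('next', transform(f[1], op))
    raise ValueError(f'unknown trajectory formula: {tag}')
```

Instantiating op with the strong preimage of guards yields f^R; instantiating it with the ordinary preimage yields f_R.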
T.F. Melham and R.B. Jones

Two theorems about the preimage and strong preimage operations on trajectory formulas are used in the sequel. The first is that applying the strong preimage of an indexing relation to the guards of an STE formula is a weakening operation:

Theorem 1 For all R, f and φ, if φ |= R, then [f^R]_φ ⊑ [f]_φ.

This is really the core of our abstraction transformation. Taking the strong preimage under an indexing relation can strictly weaken the guards of the formula by ‘subtracting out’ the indexes of cases in which the guard can be false. This achieves an abstraction by introducing Xs into the defining sequence of the formula. The second theorem is that applying the preimage of an indexing relation to the guards of an STE formula is a strengthening operation:

Theorem 2 For all R, f and φ, if φ |= R, then [f]_φ ⊑ [f_R]_φ.

Each of these theorems follows by a straightforward induction on the structure of the trajectory formula f.

4.4 Transforming STE Assertions with Indexing Relations
The theorems just cited, combined with the STE antecedent weakening and consequent strengthening properties of Section 2, allow an arbitrary property A ⇒ C to be indexed by an indexing relation R. Intuitively, we can use an indexing scheme to weaken the antecedent by grouping some of its separate Boolean input configurations using Xs (thereby assuming less about circuit behavior). If we use the same indexing to strengthen the consequent, and the resulting STE assertion holds, then we can also conclude the original STE assertion. To guarantee soundness, a technical side condition must be satisfied—namely that the indexing scheme R completely ‘covers’ the target variables:

∀ts. ∃xs. R(xs, ts)    (4)
This says that for any values of the target variables ts (the variables that appear in A and C), there is an assignment to the index variables xs that indexes it. This condition ensures that every verification case included in the original problem is also covered in the indexed verification—which is clearly necessary, for otherwise the indexed verification would be incomplete. Before considering the soundness of our transformation, we introduce a notation for the truth of a trajectory formula under a propositional assumption about its Boolean variables. If P is a propositional Boolean formula (for example an indexing relation) and A ⇒ C a trajectory assertion, we write P |= A ⇒ C to mean that for any valuation φ for which φ |= P , we have that φ |= A ⇒ C. Informally, we are saying that A ⇒ C is true whenever the condition P holds. More detail on how such an assertion can be checked in practice is given in Section 5.1. Soundness of our abstraction transformation is given by the following theorem.
Theorem 3 If we can show that R(xs, ts) |= A^R ⇒ C_R and the indexing relation coverage condition ∀ts. ∃xs. R(xs, ts) holds, then we may conclude |= A ⇒ C.

Proof. By the following derivation.

1. R(xs, ts) |= A^R ⇒ C_R      [assumption]
2. R(xs, ts) |= A ⇒ C_R        [1 and Theorem (1)]
3. R(xs, ts) |= A ⇒ C          [2 and Theorem (2)]
4. ∃xs. R(xs, ts) |= A ⇒ C     [3, because xs do not appear in A or C]
5. ∀ts. ∃xs. R(xs, ts)         [side condition]
6. |= A ⇒ C                    [4 and 5]
Note that although the variables ts do not appear in the trajectory assertion AR ⇒ CR of line 1, the variables xs do. The condition given by R(xs, ts) is therefore significant to verification of this assertion. Indeed in this context it is equivalent to ∃ts.R(xs, ts), which restricts the verification to values of xs that actually do index something. If the STE algorithm produces a residual when checking the formula shown in line 1, then this will of course be given in terms of the index variables rather than the target variables from the original problem. The user must therefore analyze the residual by taking its image under the indexing relation, mapping it back into the original target variables for inspection there.
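For small relations, the coverage side condition (4) can be discharged by brute-force enumeration. The Python sketch below is hypothetical, standing in for a BDD computation, and checks the condition for the AND-gate indexing relation given earlier in Section 4.

```python
from itertools import product

def covers(R, n_xs, n_ts):
    # Coverage condition (4): forall ts. exists xs. R(xs, ts)
    return all(any(R(xs, ts) for xs in product([0, 1], repeat=n_xs))
               for ts in product([0, 1], repeat=n_ts))

def R(xs, ts):
    # AND-gate indexing relation, xs = (p, q), ts = (a, b, c)
    p, q = xs
    a, b, c = ts
    return (((p or q) or not a) and
            ((p or not q) or not b) and
            ((not p or q) or not c) and
            ((not p or not q) or (a and b and c)))

assert covers(R, 2, 3)   # every assignment to (a, b, c) is indexed by some (p, q)
```

A relation that misses part of the target space fails the check; for example, conjoining the extra constraint ¬a onto R leaves the cases with a = 1 unindexed.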
5 Indexing under Environmental Constraints
Few verifications take place in isolation from complex environmental and other operating assumptions. In this section, we extend our indexing algorithm to incorporate such conditions. We present two methods for indexing under an environmental constraint. The first is the simpler option, and requires little or no user intervention. The second is an alternative that can be applied to certain problems for which the direct approach is infeasible. Both methods use the technique of parametric representation of environmental constraints, which we now briefly introduce.

5.1 Parametric Representation
The parametric representation of Boolean predicates is useful for restricting verification to a care set and for reducing complexity by input-space decomposition [7,8,9]. The technique is independent of the symbolic simulation algorithm in STE, does not require modifications to the circuit, and can be used to constrain both input and internal signals. Consider a Boolean predicate P that constrains input and state variables vs. Suppose we express the required behavior of the circuit as a trajectory assertion A ⇒ C over the same variables, but expect this assertion to hold only under the constraint P . That is, we wish to establish that P |= A ⇒ C. One way of
doing this is to use STE to obtain a residual from |= A ⇒ C and then check that P implies this. But this is usually not practical; the complexity of directly computing |= A ⇒ C with a symbolic simulator is too great. A better way is to evaluate φ |= A ⇒ C only for those variable assignments φ that actually do satisfy P. The parametric representation does exactly this, by encoding the care predicate implicitly by means of parametric functions. Given a satisfiable P, we compute a vector of Boolean functions Qs = param(P, vs) that are substituted for the variables vs in the original trajectory assertion.2 These functions are constructed so that P |= A ⇒ C holds exactly when |= A[Qs/vs] ⇒ C[Qs/vs] holds. An algorithm for param and its correctness proof are found in [9]. Suppose M is an arbitrary expression—either a propositional logic formula or a trajectory formula—and P is a predicate over the variables vs appearing in M. We write ‘M[P]’ for M[param(P, vs)/vs]. A complicating factor is that the parametric functions will, in general, contain fresh variables vs′ distinct from the original variables vs. When necessary, we will write M[P](vs′) to emphasize the appearance of these in the resulting expression.

5.2 Method 1: Direct Parametric Encoding
We wish to apply an indexing relation R to a verification problem P |= A ⇒ C that includes a constraint P. With our first method, a fully automatic procedure uses the parametric representation to ‘fold’ the constraint P into both the trajectory assertion being checked and the relation R. Indexed verification then proceeds as before. Suppose we wish to check an STE assertion P |= A ⇒ C under an environmental constraint P and using an indexing relation R(xs, ts). First, we compute a parametrically-encoded STE assertion |= A[P] ⇒ C[P] and indexing relation R[P]. We then just supply these to the symbolic indexing algorithm of Section 4. The soundness of the optimization provided by our transformation is justified as follows. Note that we also write the encoded indexing relation R[P] as R[P](xs, ts′), where ts′ are the fresh variables introduced by the parametric encoding process.

Theorem 4 If R[P](xs, ts′) |= A[P]^{R[P]} ⇒ C[P]_{R[P]} and the indexing relation coverage condition ∀ts′. ∃xs. R[P](xs, ts′) holds, then |= A ⇒ C.

Proof. By the following derivation.

1. R[P](xs, ts′) |= A[P]^{R[P]} ⇒ C[P]_{R[P]}   [assumption]
2. R[P](xs, ts′) |= A[P] ⇒ C[P]_{R[P]}          [1 and Theorem (1)]
3. R[P](xs, ts′) |= A[P] ⇒ C[P]                 [2 and Theorem (2)]
4. ∃xs. R[P](xs, ts′) |= A[P] ⇒ C[P]            [3, because xs do not appear in A or C]
5. ∀ts′. ∃xs. R[P](xs, ts′)                     [side condition]
6. |= A[P] ⇒ C[P]                               [4 and 5]
7. P |= A ⇒ C                                   [parametric theorem (see [8])]

2 As usual, we write f[Qs/vs] to denote the result of substituting Qs for all occurrences of vs (respectively) in a formula f.
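The behaviour of the parametric encoding that the parametric theorem of [8] relies on can be sanity-checked on a toy constraint. The care predicate and parametric functions below are hand-picked for illustration (they are not the output of the param algorithm of [9]); the property being checked is that the range of the parametric functions is exactly the care set.

```python
from itertools import product

P = lambda a, b: a or b                      # care predicate: at least one input true
Qs = lambda a1, b1: (a1, (not a1) or b1)     # hand-chosen parametric functions over fresh a', b'

# Every point in the image of Qs satisfies P, and every satisfying
# assignment of P is reached by some assignment to the fresh variables.
care_set = {(a, b) for a, b in product([False, True], repeat=2) if P(a, b)}
image = {Qs(a1, b1) for a1, b1 in product([False, True], repeat=2)}
assert image == care_set
```

Substituting such functions for the original variables therefore restricts a symbolic evaluation to precisely the assignments satisfying P, which is what justifies the final proof step.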
As before, if the STE run that checks line 1 produces a non-trivial residual, this must first be mapped back through the relation R[P] to derive a residual in terms of the target variables of |= A[P] ⇒ C[P]. But these will, of course, be the fresh variables introduced by the parametric encoding, so we must also undo this encoding in turn to get back to the user’s variables of the original assertion A ⇒ C.

5.3 Method 2: Analyzing Indexed Residuals
While the method presented above is straightforward, it is often infeasible in practice to construct the parameterized indexing relation R[P]. Our second method avoids this, while still allowing us to use a constraint predicate P. We initially run the STE model-checking algorithm on A^R ⇒ C_R. This will then produce a residual that describes the indexed situations under which the property holds. The predicate P is then itself indexed with R, to produce an indexed predicate P_R. This is then checked to ensure it implies the indexed residual obtained from STE. This process is sound only for certain indexing relations R, and the main technical innovation here consists in identifying the required side conditions on R. The first side condition is similar to the coverage side condition (4) in Section 4.4. It requires the indexing relation to cover all values of the target variables that satisfy the constraint P:

∀ts. P(ts) ⊃ ∃xs. R(xs, ts)    (5)
The second side condition is new. It is that the preimage P_R and the preimage (¬P)_R must be disjoint, making P_R = P^R. The intuition for this condition is provided by considering Figure 1, where P_R and (¬P)_R overlap. We wish to index the condition P in order to check that it implies the residual—and we must do this by either taking the preimage P_R or the strong preimage P^R. If the preimage P_R is selected, and there is an overlap, then false negatives may occur. Every point in the overlap will be included in the verification, but also maps via R to elements of ¬P, and the property may simply not hold for some of these ‘don’t care’ elements. On the other hand, false positives could occur if the strong preimage P^R is selected. In this case, there may be points in P that are indexed only from points in the overlap area, but for which the verification property fails. The solution is to ban the overlap. One way to ensure P_R = P^R is to make the preimage (¬P)_R empty. The following condition does this by restricting R from indexing anything in ¬P:

∀ts. (∃xs. R(xs, ts)) ⊃ P(ts)    (6)
If we choose an indexing relation R that exactly partitions P,

∀ts. P(ts) ≡ ∃xs. R(xs, ts)    (7)

then both side conditions are satisfied.
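Side conditions (5)–(7) can likewise be checked by enumeration for small relations. In the hypothetical example below, P holds of the two one-hot assignments to (t0, t1) and a single index variable selects between them, so R exactly partitions P and both side conditions follow.

```python
from itertools import product

def exists_xs(R, ts, n_xs):
    return any(R(xs, ts) for xs in product([0, 1], repeat=n_xs))

def cond5(R, P, n_xs, n_ts):
    # forall ts. P(ts) implies exists xs. R(xs, ts)
    return all(exists_xs(R, ts, n_xs)
               for ts in product([0, 1], repeat=n_ts) if P(ts))

def cond6(R, P, n_xs, n_ts):
    # forall ts. (exists xs. R(xs, ts)) implies P(ts)
    return all(bool(P(ts))
               for ts in product([0, 1], repeat=n_ts) if exists_xs(R, ts, n_xs))

def cond7(R, P, n_xs, n_ts):
    # forall ts. P(ts) iff exists xs. R(xs, ts): R exactly partitions P
    return all(bool(P(ts)) == exists_xs(R, ts, n_xs)
               for ts in product([0, 1], repeat=n_ts))

# Toy constraint and relation (invented for illustration)
P = lambda ts: ts[0] != ts[1]
R = lambda xs, ts: bool((xs[0] and ts[0] and not ts[1]) or
                        (not xs[0] and not ts[0] and ts[1]))
assert cond7(R, P, 1, 2) and cond5(R, P, 1, 2) and cond6(R, P, 1, 2)
```

As the text observes, condition (7) implies both (5) and (6), since the set of indexed targets then coincides with the satisfying set of P.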
The soundness of the optimization provided by our transformation is justified as follows. Note again that we write R(xs, ts) as just ‘R’ when we do not need to emphasize the particular variables involved.

Theorem 5 Let Q be the residual condition under which the model-checking assertion R(xs, ts) |= A^R ⇒ C_R holds. Suppose that ∀ts. P(ts) ≡ ∃xs. R(xs, ts) and that P_R ⊃ Q. Then P |= A ⇒ C.

Proof. By the following derivation.

1. Q ∧ R(xs, ts) |= A^R ⇒ C_R             [assumption]
2. P_R ⊃ Q                                [assumption]
3. (∃ts. R(xs, ts) ∧ P(ts)) ⊃ Q           [2 and definition of P_R]
4. P(ts) ∧ R(xs, ts) |= A^R ⇒ C_R         [1 and 3, by logic]
5. P(ts) ∧ R(xs, ts) |= A ⇒ C_R           [4 and Theorem (1)]
6. P(ts) ∧ R(xs, ts) |= A ⇒ C             [5 and Theorem (2)]
7. P(ts) ∧ ∃xs. R(xs, ts) |= A ⇒ C        [6, because xs do not appear in A or C]
8. ∀ts. P(ts) ≡ ∃xs. R(xs, ts)            [side conditions]
9. P(ts) |= A ⇒ C                         [7 and 8, by logic]

6 Experimental Results
We have implemented the above algorithm as an experimental extension to Forte, a formal verification environment developed in Intel’s Strategic CAD Labs. Forte combines STE model checking with lightweight theorem proving in higher-order logic and has successfully been used in large-scale industrial trials on datapath-dominated hardware [10,11,12]. The implementation of our algorithm is highly optimized, to ensure that the cost of computing an indexed STE property does not exceed the benefit gained by the abstraction. As usual with symbolic treatment of relations in model-checking algorithms, the main computational overhead arises from the existential quantifier of the preimage. We use the common strategy of partitioning the indexing relation to allow early quantification. The implementation is also carefully engineered to eliminate redundant computations. One circuit structure we studied is the simple CAM shown in Figure 2. This compares a 64-bit query against the contents of an n-entry memory, producing a bit that indicates whether the query value is in the memory or not. CAM devices have previously been verified using symbolic indexing by Pandey et al. [3], who devised an indexing scheme with a logarithmic reduction in the number of variables needed—bringing an otherwise infeasible verification within reach of STE. Our experiments on CAMs showed that we could add our indexing transformation to get a verification of directly-stated CAM properties with acceptable computational overhead. As an example, we present results for the following simple property: if the query value is equal to the contents of one of the CAM memory entries, then the ‘hit’ output will be true. The formalization of this
[Figure: an n-entry memory whose entries are each compared by 64-bit equality against the query input; the comparator outputs combine to drive the ‘hit’ output.]
Fig. 2. Simple Content-Addressable Memory (CAM)
property in STE involves the use of an environmental constraint to express the condition that the query is equal to one of the CAM entries. The verification therefore employs the methods of Section 5. Of course, this is not a complete characterization of correct behavior for the CAM device. However, it is typical of the kind of property for memory arrays that cannot be verified directly but that yields to the symbolic indexing technique. Figure 3 shows the CPU time required to verify this property for different numbers of entries in the CAM memory, from 4 up to 64. All runs were performed on a 400 MHz Intel Pentium® II Processor running Red Hat® Linux, and user time was determined with the system time command. The verification of this property by symbolic indexing, including our indexing transformation algorithm, is much faster than the best-known alternative, namely using the parametric representation to case-split on the location of the hit while simultaneously weakening other circuit nodes. The numbers reported are for the model-checking portions of the verification. Both approaches require similar amounts of deductive reasoning, namely coverage analysis for case splitting and the coverage side condition for symbolic indexing. As shown in Figure 4, our automatic indexing transformation did not add significant computational overhead to the indexed verification, a requirement for our technique to be feasible in practice. The computational overhead for our indexing algorithm is roughly constant at 50-60% of the total verification time.
7 Conclusions
We have presented algorithms that facilitate easier application of symbolic indexing in STE model checking. Our approach provides a simpler interface for the STE user, making it easier to include the technique in the verification flow. Our theoretical results also provide the logical foundation for composing multiple indexed results into larger properties. The method allows us to transform
16
T.F. Melham and R.B. Jones
[Figure: log-scale plot of CPU time in seconds against number of CAM entries (4 to 64), comparing case splitting with symbolic indexing.]
Fig. 3. Symbolic Indexing vs. Case Splitting
an STE formula into the more efficiently-checkable indexed form, but still conclude the truth of the original formula. A top-level verification can, therefore, be decomposed into separate sub-properties that are verified under different, and possibly incompatible, indexing schemes. We have demonstrated the efficiency of an implementation of our algorithms by verifying a simple property of a CAM, a hardware structure commonly encountered in microprocessor designs. The indexing scheme applied in this example comes from past work by Pandey et al. [3]. Of course, the single property chosen as an illustration in Section 6 doesn’t provide a complete characterization of the desired behavior of a CAM. Our contribution has been to show that we can both obtain the computational advantages of this indexing scheme and
[Figure: plot of CPU time in seconds against number of CAM entries (4 to 64), showing total verification time and the portion spent in the indexing transformation.]
Fig. 4. Overhead of Automatic Indexing Algorithm
justifiably conclude a direct statement of the desired property—with negligible additional cost.

Our algorithm requires a user-supplied abstraction scheme, presented formally as a Boolean relation. Of course the indexing scheme could also be provided as a set of (possibly overlapping) predicates over the target variables in the original formula. For example, the indexing scheme in Section 3 for the AND gate can also be given by the following set of predicates:

{¬a, ¬b, ¬c, a ∧ b ∧ c}

These cover the whole input space and precisely characterize the four cases to be verified in terms of the ‘target’ variables in the original property. A formal indexing relation can just be an arbitrary enumeration of these predicates in terms of a suitable number of index variables and can easily be generated automatically. But this still leaves the problem of discovering the indexing scheme in the first place. Part of our current research is directed at finding techniques to automatically discover abstractions that can leverage the indexing algorithms presented here.

Finally, we observe that our transformation is a pre-processing step for STE model checking. In this paper, we have assumed a BDD-based STE algorithm. But of course the data abstraction capability of STE’s partially-ordered state spaces is orthogonal to the propositional logic technology employed. It is therefore reasonable to suppose that our method would also work with STE algorithms based on SAT [13], provided the formula representation supports our preimage and strong preimage operations. It would also be very interesting to see how our algorithms could be applied to generalized STE [14], a promising new model checking method that combines the efficiency of STE’s partially-ordered state spaces with a much more expressive and flexible framework for stating properties.

Acknowledgments. We thank the anonymous referees for their careful reading of the paper and very helpful comments.
John Harrison and Ashish Darbari also provided useful remarks on notation.
References

1. Seger, C.J.H., Bryant, R.E.: Formal verification by symbolic evaluation of partially-ordered trajectories. Formal Methods in System Design 6 (1995) 147–189
2. Bryant, R.E.: A methodology for hardware verification based on logic simulation. Journal of the ACM 38 (1991) 299–328
3. Pandey, M., Raimi, R., Bryant, R.E., Abadir, M.S.: Formal verification of content addressable memories using symbolic trajectory evaluation. In: ACM/IEEE Design Automation Conference, ACM Press (1997) 167–172
4. Bryant, R.E., Beatty, D.L., Seger, C.J.H.: Formal hardware verification by symbolic ternary trajectory evaluation. In: ACM/IEEE Design Automation Conference, ACM Press (1991) 397–402
5. Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers C-35 (1986) 677–691
6. Chou, C.T.: The mathematical foundation of symbolic trajectory evaluation. In Halbwachs, N., Peled, D., eds.: Computer Aided Verification (CAV). Volume 1633 of Lecture Notes in Computer Science, Springer-Verlag (1999) 196–207
7. Jain, P., Gopalakrishnan, G.: Efficient symbolic simulation-based verification using the parametric form of Boolean expressions. IEEE Transactions on Computer-Aided Design of Integrated Circuits 13 (1994) 1005–1015
8. Aagaard, M.D., Jones, R.B., Seger, C.J.H.: Formal verification using parametric representations of Boolean constraints. In: ACM/IEEE Design Automation Conference, ACM Press (1999) 402–407
9. Jones, R.B.: Applications of Symbolic Simulation to the Formal Verification of Microprocessors. PhD thesis, Department of Electrical Engineering, Stanford University (1999)
10. O’Leary, J.W., Zhao, X., Gerth, R., Seger, C.J.H.: Formally verifying IEEE compliance of floating-point hardware. Intel Technical Journal (First quarter, 1999) Available at developer.intel.com/technology/itj/.
11. Kaivola, R., Aagaard, M.D.: Divider circuit verification with model checking and theorem proving. In Aagaard, M., Harrison, J., eds.: Theorem Proving in Higher Order Logics. Volume 1869 of Lecture Notes in Computer Science, Springer-Verlag (2000) 338–355
12. Aagaard, M.D., Jones, R.B., Seger, C.J.H.: Combining theorem proving and trajectory evaluation in an industrial environment. In: ACM/IEEE Design Automation Conference, ACM Press (1998) 538–541
13. Bjesse, P., Leonard, T., Mokkedem, A.: Finding bugs in an Alpha microprocessor using satisfiability solvers. In Berry, G., Comon, H., Finkel, A., eds.: Computer Aided Verification (CAV). Volume 2102 of Lecture Notes in Computer Science, Springer-Verlag (2001) 454–464
14. Yang, J., Seger, C.J.H.: Introduction to generalized symbolic trajectory evaluation.
In: Proceedings of 2001 IEEE International Conference on Computer Design. (2001) 360–365
Counter-Example Based Predicate Discovery in Predicate Abstraction

Satyaki Das and David L. Dill
Computer Systems Laboratory, Stanford University
[email protected], [email protected]

Abstract. The application of predicate abstraction to parameterized systems requires the use of quantified predicates. These predicates cannot be found automatically by existing techniques and are tedious for the user to provide. In this work we demonstrate a method of discovering most of these predicates automatically by analyzing spurious abstract counter-example traces. Since predicate discovery for unbounded state systems is an undecidable problem, it can fail on some problems. The method has been applied to a simplified version of the Ad hoc On-Demand Distance Vector Routing protocol where it successfully discovers all required predicates.
1 Introduction
Unbounded state systems have to be reasoned about to prove the correctness of a variety of real life systems including microprocessors, network protocols, software device drivers and security protocols. Predicate Abstraction is an efficient way of reducing these infinite state systems into more tractable finite state systems. A finite set of abstraction predicates defined on the concrete system is used to define the finite-state model of the system. The states of the abstract system consist of truth assignments to the set of abstraction predicates, that is each predicate is assigned a value of true or false. The abstraction is conservative, meaning that for any property proved on the abstract system, a concrete counterpart holds on the actual system. There are many hard problems that need to be solved to make predicate abstraction useful. The first is that the problem of proving arbitrary safety properties of a transition system is (obviously) undecidable. Given a pre-selected set of predicates and certain other assumptions, it is possible to prove in some cases that the system satisfies a safety property, but a failed proof may indicate that the property is violated, or simply that the abstraction is not sufficiently precise to complete the proof. Automating such
This work was supported by National Science Foundation under grant number 0121403 and DARPA contract 00-C-8015. The content of this paper does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 19–32, 2002. c Springer-Verlag Berlin Heidelberg 2002
S. Das and D.L. Dill
proofs is quite difficult in practice, since it involves automatically solving logic problems that have high complexity and searching potentially large state spaces. In spite of the difficulty of this problem, there has been substantial progress towards solving it in the last few years. Another problem is how to discover the appropriate set of predicates. In much of the work on predicate abstraction, the predicates were assumed to be given by the user, or they were extracted syntactically from the system description (for example, predicates that appear in conditionals are often useful). It is obviously difficult for the user to find the right set of predicates (indeed, it is a trial-and-error process involving inspecting failed proofs), and the predicates appearing in the system description are rarely sufficient. There has been less work, and less progress, on solving the problem of finding the right set of predicates. In addition to the challenge of finding a sufficient set of predicates, there is the challenge of avoiding irrelevant predicates, since the cost of checking the abstract system usually increases exponentially with the number of predicates. In our system quantified predicates are used to deal with parameterized systems. In a parameterized system, it is often interesting (and necessary) to find properties that hold for all values of the parameter. For instance if a message queue is modeled as an array and rules parameterized by the array index are used to deliver messages then the absence of certain kinds of messages is expressed by a universally quantified formula. So predicates with quantifiers in them are used. This paper describes new ways of automatically discovering useful predicates by diagnosing failed proofs. The method is designed to find hard predicates that do not appear syntactically in the system description, including quantified predicates, which are necessary for proving most interesting properties.
As importantly, it tries to avoid discovering useless predicates that do not help to avoid a known erroneous result. Furthermore, the diagnosis process can tell when a proof fails because of a genuine violation of the property by the actual system.

Implementation

The system was implemented using Binary Decision Diagrams (BDDs) to represent the abstract system. A decision procedure for quantifier-free first-order logic, CVC [1], was used to do the satisfiability checks. The system is built around the predicate abstraction tool described in Das and Dill [9]. The state variable declarations describe the state of the concrete system. The transition relation is described using a list of parameterized guarded commands. Each guarded command consists of a guard and an action. The guard is a logic formula over the state variables and possibly the parameters that evaluates to either true or false. Each of the actions is a procedure that modifies the current concrete state into a new value. At each point the action corresponding to one of the enabled rules (rules whose guards evaluate to true) is non-deterministically executed and the concrete state changes. The prototype is implemented as shown in Figure 1. The upper block is the tool described in our previous work [9]. Given a set of abstraction predicates,
[Figure: the abstraction-and-model-checking block takes the concrete system, verification condition, and initial predicates, and either reports the property verified or emits an abstract counter-example; the counter-example checking and predicate discovery block either reports a concrete counter-example or feeds discovered predicates back to the abstraction block.]
Fig. 1. Predicate Abstraction Algorithm
a verification condition and the concrete system description it first computes an approximate abstract model. This abstract model is model checked and the abstract system refined appropriately if it was too inexact. Notice that this refinement does not change the set of abstraction predicates and concentrates on using the existing predicates more efficiently. Finally this process terminates with either the verification condition verified (in which case nothing else needs to be done) or with an abstract counter-example trace. The current work, represented by the lower block in the diagram, checks whether a concrete counter-example trace corresponding to the abstract trace exists. If so the verification condition is violated and an error is reported; otherwise new predicates are discovered which avoid this counter-example. The new predicates are added to the already present abstraction predicates and the process starts anew. Since all the old predicates are retained, much of the work from previous iterations is reused.

Related Work

Recently a lot of work has been done on predicate abstraction. The use of automatic predicate abstraction for model checking infinite-state systems was first presented by Graf and Saïdi in 1997 [11]. The method used monomials (conjunctions of abstract state variables or their negations) to represent abstract states. Parameterized systems are handled by using a counting abstraction [13]. Similar work has also been proposed in [17] and [14]. In 1998 [8], Colón and Uribe described a method of constructing a finite abstract system and then model checking it. The abstractions produced by both methods are coarse and could fail to prove the verification condition even if all necessary predicates were present. By constructing the abstraction in a demand driven fashion, the method of Das and Dill [9] is able to compute abstractions efficiently that are as precise
22
S. Das and D.L. Dill
as possible given a fixed finite set of predicates. This ensures that if the desired properties can be proved with the abstraction predicates then the method will be able to do so. The predicate abstraction methods described so far have relied on user-provided predicates to produce the abstract system. Counter-example guided refinement is a generally useful technique. It has been used by Kurshan et al. [2] for checking timed automata, Balarin et al. [3] for language containment and Clarke et al. [7] in the context of verification using abstraction for different variables in a version of the SMV model checker. Counter-example guided refinement has even been used with predicate abstraction by Lakhnech et al. [12]. Invariant generation techniques have also used similar ideas [19,5]. Invariant generation techniques generally produce too many invariants, many of which are not relevant to the property being proved. This can cause problems with large systems. The counter-example guided refinement techniques do not produce the quantified predicates that our method needs. Predicate abstraction is also being used for software verification. Device drivers are being verified by the SLAM project [4]. The SLAM project has used concrete simulation of the abstract counter-example trace to generate new predicates. The BLAST project [18] also uses spurious counter-examples to generate new predicates. Predicate abstraction has also been used in software verification as a way of finding loop invariants [10]. These systems do not deal with parameterized systems, hence they do not need quantified predicates.
2 Abstraction Basics
As in previous work [9], sets of abstract and concrete states will be represented by logical formulas. For instance, the concrete predicate X represents the set of concrete states which satisfy X. The main idea of predicate abstraction is to construct a conservative abstraction of the concrete system. This ensures that if some property is proved for the abstract system, then the corresponding property also holds for the concrete system. Formally, the concrete transition system is described by a set of initial states represented by the predicate IC and a transition relation represented by the predicate RC. IC(x) is true iff x is an initial state. Similarly, RC(x, y) is true iff y is a successor of x. The safety property P is the verification condition that needs to be proved in the concrete system. An execution of the concrete system is defined to be a sequence of states x0, x1, . . . xM such that IC(x0) holds and for every i ∈ [0, M), RC(xi, xi+1) holds. A partial trace is an execution that does not necessarily start from an initial state. A counter-example trace is defined to be an execution x0, x1, . . . xM such that ¬P(xM) holds (i.e., the counter-example trace ends in a state which violates P). The abstraction is determined by a set of N predicates φ1, φ2, . . . φN. The abstract state space is just the set of all bit-vectors of length N. An abstraction function α maps sets of concrete states to sets of abstract states while the concretization function γ does the reverse. In the following definitions the predicates QC and QA represent sets of concrete states and abstract states respectively. Then α(QC) is a predicate over abstract states such that α(QC)(s)
Counter-Example Based Predicate Discovery in Predicate Abstraction
holds exactly when s is an abstraction of some concrete state x in QC. Similarly, γ(QA)(x) holds exactly when there exists an abstract state s in QA such that s is the abstraction of x.

Definition 1. Given predicates QC and QA over concrete and abstract states respectively, the abstraction and concretization functions are defined as:

α(QC)(s) = ∃x. QC(x) ∧ ⋀_{i∈[1,N]} (φi(x) ≡ s(i))

γ(QA)(x) = ∃s. QA(s) ∧ ⋀_{i∈[1,N]} (φi(x) ≡ s(i))
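To make Definition 1 concrete, here is a small Python sketch (ours, not the paper's) that computes α and γ by brute force over an explicit finite state space; the predicates are arbitrary boolean functions on concrete states, and an abstract state is the bit-vector of predicate values.

```python
def alpha(concrete_states, predicates):
    """Abstract a set of concrete states: each x maps to the bit-vector
    (phi_1(x), ..., phi_N(x)) of predicate values."""
    return {tuple(phi(x) for phi in predicates) for x in concrete_states}

def gamma(abstract_states, predicates, universe):
    """Concretize a set of abstract bit-vectors over a finite universe of
    concrete states: keep every x whose bit-vector is in the set."""
    return {x for x in universe
            if tuple(phi(x) for phi in predicates) in abstract_states}

# Tiny example: concrete states are integers 0..7, two predicates.
universe = range(8)
preds = [lambda x: x % 2 == 0,   # phi_1: x is even
         lambda x: x >= 4]       # phi_2: x is in the upper half
abs_set = alpha({0, 5}, preds)   # two abstract bit-vectors
conc = gamma(abs_set, preds, universe)
```

Note that γ(α(Q)) ⊇ Q in general: here γ maps the two bit-vectors back to the four concrete states {0, 2, 5, 7}, which is the overapproximation that makes the abstraction conservative.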
Using the above definitions, the abstract system is defined by the set of abstract initial states IA = α(IC) and the abstract transition relation RA(s, t) = ∃x, y. γ(s)(x) ∧ γ(t)(y) ∧ RC(x, y). An abstract execution is a sequence of abstract states s0, s1, . . . , sM such that IA(s0) holds and for each i ∈ [0, M), RA(si, si+1) holds. An abstract counter-example trace is an abstract execution s0, s1, . . . , sM for which α(¬P)(sM) holds. The atomic predicates in the verification condition P are used as the initial set of predicates. The abstract system is constructed and the abstract property ¬α(¬P) is checked for all reachable states. If this check succeeds then the verification condition holds. Otherwise the generated abstract counter-example is analyzed to see if a concrete execution corresponding to the abstract trace exists. If it does, a concrete counter-example has been constructed. Otherwise the abstract counter-example is used to discover new predicates, and the process is repeated with the discovered predicates added to those already present. An abstract trace is called a real trace if there exists a concrete trace corresponding to it. Conversely, if there are no concrete traces corresponding to an abstract trace then it is called a spurious trace.
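The loop just described (abstract, model check, validate the counter-example, refine) can be sketched as a generic driver. The callbacks `model_check`, `is_real`, and `discover_predicates` stand in for the components described in the text; they and the bound on iterations are assumptions of this sketch, not the authors' implementation.

```python
def cegar(initial_preds, model_check, is_real, discover_predicates, max_iters=100):
    """Counter-example guided refinement: model-check the abstraction,
    confirm or refute the abstract counter-example, refine, repeat."""
    preds = list(initial_preds)
    for _ in range(max_iters):
        ok, trace = model_check(preds)       # check the abstract property
        if ok:
            return ("proved", preds)         # conservative: concrete holds too
        if is_real(trace):                   # a concrete counterpart exists
            return ("counterexample", trace)
        preds = preds + discover_predicates(trace)  # refine the abstraction
    return ("unknown", preds)
```

A toy run with mocked callbacks: if the checker succeeds once a second predicate is present, the driver returns "proved" after one refinement round.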
3 Predicate Discovery
As described in the previous section, the system generates a counter-example trace to the verification condition that was to be proved. Now the system must analyze the abstract counter-example trace to either confirm that the trace is real, that is, a concrete trace corresponding to it exists, or come up with additional predicates that would eliminate the spurious counter-example. First the trace is minimized to get a minimal spurious trace. A minimal spurious trace is defined to be an abstract trace which is
1. spurious (no corresponding concrete trace exists), and
2. minimal (removing even a single state from either the beginning or end of the trace makes the remainder real).
S. Das and D.L. Dill
Checking the Abstract Counter-Example Trace

There is a concrete counter-example trace x1, x2, . . . , xL corresponding to the abstract counter-example trace s1, s2, . . . , sL if these conditions are satisfied:
1. For each i ∈ [1, L], γ(si)(xi) holds. This means that each concrete state xi corresponds to the abstract state si in the trace.
2. IC(x1) ∧ ¬P(xL) holds. The concrete counter-example trace starts from an initial state and ends in a state which violates P.
3. For each i ∈ [1, L), RC(xi, xi+1) holds. For every i, xi+1 is a successor of xi.
Conditions (1) and (3) determine that a concrete trace corresponding to the abstract trace exists, and condition (2) determines that the trace starts from the set of concrete initial states and ends in a state that violates the verification condition. To keep the formulas concise, the constraint for the initial state has been disregarded: in the implementation, a totally unconstrained state x0 is added to the front of the trace and it is assumed that the initial rule produces the initial state of the system. Since all the atomic predicates of P are present among the abstraction predicates, the condition ¬P(xL) is implied by γ(sL)(xL). Hence, if the formula

⋀_{i=1}^{L} γ(si)(xi) ∧ ⋀_{i=0}^{L−1} RC(xi, xi+1)
is satisfiable then the abstract counter-example trace is real. Otherwise there is no satisfying assignment and the abstract counter-example trace is spurious. To simplify the presentation it shall be assumed that the same transition relation RC can be used for each of the concrete steps, including the first, where RI is actually used. In our implementation the first step is handled specially and RI is used instead of RC. The test for spuriousness is entirely a property of the transition relation and the trace itself; it does not depend on either the initial states or the verification condition. So we generalize the definition of spuriousness to partial traces: a partial trace is spurious if the above formula is unsatisfiable.

Predicate Discovery

To understand predicate discovery we must first understand when predicate abstraction produces a spurious counter-example. Assume that in Figure 2 the whole abstract trace s1, s2, . . . , sL is spurious but the partial trace s2, s3, . . . , sL is real. So there are two kinds of concrete states in γ(s2):
1. successor states of states in γ(s1), and
2. states (like x2) that are part of some concrete trace corresponding to s2, . . . , sL.
It must be the case that the above two types of states are disjoint. Otherwise it would be possible to find a concrete trace corresponding to the whole trace, thereby making it real. If predicates to distinguish the two kinds of states were added, then the spurious counter-example would be avoided. In the method described here, the discovered predicates will be able to characterize states of the second type above. Once it has been determined that the abstract counter-example is spurious, states are removed from the beginning of the trace while still keeping the remainder spurious. When states can no longer be removed from the beginning, the same process is carried out by removing states from the end of the trace. This will eventually produce a minimal spurious trace.
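The spuriousness test and the minimization loop can be sketched together for a small explicit-state system. This brute-force enumeration is our stand-in for the decision-procedure query; the names and the toy system are illustrative.

```python
from itertools import product

def spurious(abstract_trace, predicates, universe, RC):
    """A partial abstract trace is spurious iff no concrete trace matches it:
    no choice of x_i in gamma(s_i) with consecutive states RC-related."""
    layers = [[x for x in universe
               if tuple(phi(x) for phi in predicates) == s]
              for s in abstract_trace]
    return not any(all(RC(t[i], t[i + 1]) for i in range(len(t) - 1))
                   for t in product(*layers))

def minimize_spurious(trace, is_spurious):
    """Trim states from the front, then from the back, while the remainder
    stays spurious; the result is a minimal spurious trace."""
    while len(trace) > 1 and is_spurious(trace[1:]):
        trace = trace[1:]
    while len(trace) > 1 and is_spurious(trace[:-1]):
        trace = trace[:-1]
    return trace

# Toy system: states 0..3, RC steps by +1, one predicate "x is even".
universe = range(4)
preds = [lambda x: x % 2 == 0]
RC = lambda x, y: y == x + 1
check = lambda t: spurious(t, preds, universe, RC)
```

In this toy system the abstract trace even, odd, even is real (0 → 1 → 2), while even, even is spurious because a +1 step can never connect two even states; minimization isolates exactly such an infeasible core.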
[Figure: the sets γ(s1), γ(s2), γ(s3), . . . , γ(sL) drawn as circles of concrete states, with x1 ∈ γ(s1), x2 ∈ γ(s2), and s2 marked as the state to refine.]
Fig. 2. Abstraction Refinement
Now consider the minimal spurious trace s1, s2, s3, . . . , sL shown in Figure 2. Here the circles representing γ(s1), γ(s2), etc. are sets of concrete states while the black dots inside the sets represent individual concrete states. Since the trace s2, s3, . . . , sL is real,

Q0 = ⋀_{i=2}^{L} γ(si)(xi) ∧ ⋀_{i=2}^{L−1} RC(xi, xi+1)
is satisfiable for some concrete states x2, x3, . . . , xL. Now CVC is queried about the satisfiability of Q0. This returns a finite conjunction of formulas ψ1(x2) ∧ ψ2(x2) ∧ . . . ∧ ψK(x2) ∧ θ(x3, . . . , xL) which implies
Q0. So the ψi's are conditions that any x2 must satisfy for it to be the first state of the concrete trace corresponding to s2, s3, . . . , sL. Now it must be the case that

γ(s1)(x1) ∧ RC(x1, x2) ∧ ⋀_{i=1}^{K} ψi(x2) ∧ θ(x3, . . . , xL)
is unsatisfiable. Otherwise it would be possible to find a concrete trace corresponding to s1, s2, . . . , sL! More specifically, if the predicates ψ1, ψ2, . . . , ψK are added to the set of abstraction predicates and the verifier is rerun, this particular spurious abstract counter-example will not be generated. So we have an automatic way of discovering new abstraction predicates. However, it is possible to reduce the number of additional abstraction predicates. In fact, it is quite likely that not all of the predicates ψ1, . . . , ψK are needed to avoid the spurious counter-example. The satisfiability of the above formula is checked after leaving out the ψ1(x2) conjunct. If the formula is still unsatisfiable then ψ1 is dropped altogether. The same procedure is repeated with the other ψi's until an essential set of predicates remains (dropping any one of them makes the formula satisfiable). Notice that there may be multiple essential sets of predicates that make the above formula unsatisfiable; this method finds one such set. Now consider the effect that the abstraction refinement has on the abstract system. The original abstract state s2 will be split into two parts: in one part all the added predicates hold, while in the other at least one of them does not. Also, in the abstract transition relation, the transition from the state s1 to the first partition of s2 is removed. It is still possible that there is a path from s1 to s3 through the other partition of s2. However, the refined abstraction will never again generate a spurious counter-example in which a concrete state corresponding to s1 has a successor which satisfies all of the assertions ψ1, ψ2, . . . , ψK.

Parameterized Rules and Quantified Predicates

When proving properties of parameterized systems, quantified predicates are needed. These quantified predicates cannot be found either from the system description or by existing predicate discovery techniques.
Invariant generation methods do find quantified invariants which may be useful in some cases. But the problem there is that a large number of invariants are generated and there is no good way of deciding which ones are useful. In the presence of parameterized rules, predicate discovery works exactly as described above. But the parameters (which are explicitly not part of the concrete state) in the rules may appear in the predicates finally generated. Recall that the discovered predicates characterize the set of states like x2 (in Figure 2) that are part of a real abstract trace. The appearance of a rule parameter in these expressions implies that the parameter must satisfy some conditions in the concrete counterpart of the abstract trace. Any other value of the parameter which satisfies the same conditions could produce another concrete trace. Naturally,
Counter-Example Based Predicate Discovery in Predicate Abstraction
27
state
  N : positive integer
  status : array [N] of enum {GOOD, BAD}
  error : boolean
initialize
  status := all values are initialized to GOOD
  error := false  /* No error initially */
rule (p : subrange [1..N])
  (status[p] = BAD) ⇒ error := true
property
  ¬error
Fig. 3. Quantified predicate example
an existential quantifier wrapped around these expressions yields a predicate that is consistent with all possible behaviors of the (possibly unbounded) parameter. Quantifier scope minimization is carried out so that smaller predicates may be found. In some cases the existential quantifiers can be eliminated altogether. Often predicates of the form ∃x. Q(x) ∧ (x = a), where a is independent of x, are discovered. Heuristics were added so that such a predicate is simplified to Q(a). To illustrate the way quantified predicates are discovered automatically, a very simple example is presented in Figure 3. In the example system we want to prove that error is always false. So the initial abstraction predicate chosen will be just the atomic formula of the verification condition, in this case the predicate B1 ≡ error. With this abstraction the property cannot be proved and an abstract counter-example trace, ¬B1, B1, is returned. Since the initialization rule is handled like any other rule (only with the implicit guard true), the abstract counter-example that shall be analyzed is true, ¬B1, B1. Using the test for spuriousness described earlier, the counter-example is shown to be a minimal spurious trace. Also, the partial trace ¬B1, B1 is real (that is, a concrete counterpart exists) when status[p0] = BAD holds (p0 is the specific value of the parameter chosen). However, the initialization rule specifically sets all the elements of the status array to GOOD. Hence the predicate discovered will be status[p0] = BAD. But notice that the parameter appears in the predicate. Hence the new predicate will be B2 ≡ ∃q. status[q] = BAD. Now the abstraction is refined with the extra predicate. The additional bit is initialized to false. Also, the transition rule will now be enabled only when the new bit is true. Since that never happens the rule is never enabled and the desired property holds.
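The one-point-rule simplification mentioned above (∃x. Q(x) ∧ x = a rewrites to Q(a) when a does not mention x) is a standard logical identity. A toy term-rewriting sketch, with a term representation of our own invention (not the tool's internal one):

```python
# Toy term language: a term is a string (variable or constant) or a tuple
# ('exists', var, body), ('and', t1, t2), ('eq', lhs, rhs), ('app', pred, arg).

def free_vars(t):
    """Free variables of a term (predicate names count as atoms here)."""
    if isinstance(t, str):
        return {t}
    if t[0] == 'exists':
        return free_vars(t[2]) - {t[1]}
    return set().union(*(free_vars(s) for s in t[1:]))

def substitute(t, var, val):
    """Substitution t[var := val]; an inner binder of var shadows it."""
    if isinstance(t, str):
        return val if t == var else t
    if t[0] == 'exists':
        return t if t[1] == var else ('exists', t[1], substitute(t[2], var, val))
    return (t[0],) + tuple(substitute(s, var, val) for s in t[1:])

def one_point_rule(t):
    """Rewrite  exists x. Q(x) and (x = a)  to  Q(a)  when a is
    independent of x; otherwise return the term unchanged."""
    if isinstance(t, tuple) and t[0] == 'exists':
        x, body = t[1], t[2]
        if (isinstance(body, tuple) and body[0] == 'and'
                and isinstance(body[2], tuple) and body[2][0] == 'eq'
                and body[2][1] == x and x not in free_vars(body[2][2])):
            return substitute(body[1], x, body[2][2])
    return t
```

For example, ∃x. Q(x) ∧ (x = a) rewrites to Q(a), while ∃x. Q(x) ∧ (x = x) is left alone because the right-hand side mentions the bound variable.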
4 Application to AODV
As an application of this method we shall consider a simplified version of the Ad Hoc On-demand Distance Vector (AODV) routing protocol [15,16]. The simplification was to remove timeouts from the protocol, since we could not find a way of reasoning about them in our system. The protocol is used for routing in a dynamic environment where networked nodes are entering and leaving the system. The main correctness condition of the protocol is to avoid the formation of routing loops. This is hard to accomplish and bugs have been found [6]. Finite instances of the protocol have been analyzed with model checkers, and a version of the protocol has been proved correct using manual theorem proving techniques. Briefly, the protocol works as follows. When a node needs to find a route to another, it broadcasts a route request (RREQ) message to its neighbors. If any of them has a route to the destination it replies with a route reply (RREP) message. Otherwise it sends out a RREQ to its own neighbors. This continues until the destination node is reached or some node has a route to the final destination. Then the RREP message is propagated back to the node requesting the route. When a node receives a RREQ message it adds a route to the original sender of the message, so that it can propagate the RREP back. Also, nodes will replace longer paths by shorter ones to optimize communication. The routing tables are modeled by three two-dimensional arrays: route p, route and hops. Given nodes i and j, route p[i][j] is true iff i has a route to j, route[i][j] is the node to which i forwards packets whose final destination is j, and hops[i][j] is the number of hops that i believes are needed for a packet to reach j. The message queue is modeled as an unbounded array of records. Each record has type, src, dst, from, to and hops fields. The src and dst fields are the original source and final destination of the current request (or reply).
The from and to fields are the message source and destination of the current hop. The field hops is an integer which keeps track of the number of hops the message has traversed. As explained before, for every route that a node has, it keeps track of the number of hops necessary to get to the destination. Consider three arbitrary but distinct nodes a, b and c. The node a has a route to c and its next hop is b. In this situation the protocol maintains the invariant that b has a route to c and a's hop count to c is strictly greater than b's hop count to c. This ensures that along a route to the destination the hop count always decreases, so there cannot be a cycle in the routing table. This is the property that was verified automatically. In the actual protocol, where links between nodes can go down, the age of the routes is tracked with a sequence number field, and the ordering relation is more complex in that case. To simplify the system for the sake of discussion here, the sequence numbers have been dropped. The simplified version is described in Figures 4 and 5. The atomic predicates in the verification condition are used as the initial set of predicates. The initial predicates are B1 ≡ route p[a][c], B2 ≡ route[a][c] = b, B3 ≡ route p[b][c] and B4 ≡ hops[a][c] > hops[b][c]. The abstract
type
  cell index type : subrange(1..N)
  msg index type : subrange(1..infinity)
  msg sort : enum of [INVALID, RREQ, RREP]
  msg type : record of [type : msg sort; from, to, src, dst : cell index type; hops : integer]
state
  route p : array [N][N] of boolean
  route : array [N][N] of cell index type
  queue : array [infinity] of msg type
  a, b, c : cell index type
initialize
  queue := all messages have type INVALID
  route p := all array elements are false
/* Generate RREQ */
rule (msg : msg index type; src, dst : cell index type)
  queue[msg].type = INVALID ∧ ¬route p[src][dst] ⇒
    queue[msg] := [# type = RREQ; src = src; dst = dst; from = src; hops = 0 #]
/* Receive RREP */
rule (in, out : msg index type)
  queue[in].type = RREP ∧ queue[out].type = INVALID ⇒
    /* Add route to immediate neighbor */
    route p[queue[in].to][queue[in].from] := true
    route[queue[in].to][queue[in].from] := queue[in].from
    hops[queue[in].to][queue[in].from] := 1
    /* Add route to RREP source if this is a better route */
    if hops[queue[in].to][queue[in].src] > queue[in].hops ∨ ¬route p[queue[in].to][queue[in].src] then
      route p[queue[in].to][queue[in].src] := true
      route[queue[in].to][queue[in].src] := queue[in].from
      hops[queue[in].to][queue[in].src] := queue[in].hops + 1
    end
    /* Forward RREP */
    if queue[in].to = queue[in].dst ∧ route p[queue[in].to][queue[in].dst] then
      queue[out] := [# type = RREP; src = queue[in].src; dst = queue[in].dst;
                      from = queue[in].to; to = route[queue[in].to][queue[in].dst];
                      hops = hops[queue[in].to][queue[in].src] #]
    end
Fig. 4. AODV protocol
/* Receive RREQ */
rule (in, out : msg index type)
  queue[in].type = RREQ ∧ queue[out].type = INVALID ⇒
    /* Add route to immediate neighbor */
    route p[queue[in].to][queue[in].from] := true
    route[queue[in].to][queue[in].from] := queue[in].from
    hops[queue[in].to][queue[in].from] := 1
    /* Add route to RREQ source if this is a better route */
    if hops[queue[in].to][queue[in].src] > queue[in].hops ∨ ¬route p[queue[in].to][queue[in].src] then
      route p[queue[in].to][queue[in].src] := true
      route[queue[in].to][queue[in].src] := queue[in].from
      hops[queue[in].to][queue[in].src] := queue[in].hops + 1
    end
    /* RREQ has reached final destination */
    if queue[in].dst = queue[in].to then
      queue[out] := [# type = RREP; src = queue[in].dst; dst = queue[in].src;
                      from = queue[in].to; to = queue[in].from; hops = 0 #]
    /* The RREQ receiver has a route to final destination */
    elsif route p[queue[in].to][queue[in].dst] then
      queue[out] := [# type = RREP; src = queue[in].dst; dst = queue[in].src;
                      from = queue[in].to; to = queue[in].from;
                      hops = hops[queue[in].to][queue[in].dst] #]
    /* Forward RREQ */
    else
      queue[out] := [# type = RREQ; src = queue[in].src; dst = queue[in].dst;
                      from = queue[in].from; hops = queue[in].hops + 1 #]
    end
property
  (route p[a][c] ∧ route[a][c] = b) → (route p[b][c] ∧ hops[a][c] > hops[b][c])
Fig. 5. AODV protocol (contd.)
system generates a counter-example of length one in which a receives a RREQ and adds a route to c through b while b does not have a route to c. The predicate discovery algorithm deduces that this cannot happen since in the initial state there are no RREQs present. So the predicate ∃x. queue[x].type = RREQ is added and the new abstraction is model checked again. Now a two step counterexample is generated. In the first step an arbitrary cell generates an RREQ. In the next step a receives an RREQ from b, originally requested by c, and sets its routing table entry for node c to b. Since b does not have a routing table entry
to c, this violates the desired invariant. Again the predicate discovery algorithm deduces that such a message cannot exist. So the predicate ∃x. (queue[x].type = RREQ ∧ queue[x].from = b ∧ queue[x].src = c ∧ queue[x].to = a) is discovered. Continuing in this manner, in the next iteration the predicate ∃x. (queue[x].type = RREQ ∧ queue[x].from = b ∧ queue[x].src = c ∧ queue[x].to = a ∧ hops[b][c] > queue[x].hops) is discovered. This is exactly the predicate that is required to prove the desired invariant. While verifying the actual protocol, similar predicates are discovered for the RREP branch of the protocol as well. The predicates needed to prove the actual protocol are different from the predicates listed here but are of the same flavor. The program requires thirteen predicate discovery cycles to find all the necessary predicates.
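The verified invariant implies loop freedom by a simple well-foundedness argument: along any route toward a destination the hop count strictly decreases, so no node can be revisited. An illustrative check of this argument (our sketch, not part of the verified model), with routing tables as plain dictionaries mapping a node to its next hop:

```python
def invariant_holds(route, hops, dest):
    """Check the verified invariant for one destination: whenever node a
    routes to dest via next hop b (b != dest), b also has a route to dest
    and hops[a] > hops[b]."""
    for a, b in route.items():
        if b != dest and (b not in route or not hops[a] > hops[b]):
            return False
    return True

def has_routing_loop(route, dest):
    """Follow next hops from every node; report True iff some node is
    revisited before reaching dest."""
    for start in route:
        seen, node = set(), start
        while node is not None and node != dest:
            if node in seen:
                return True
            seen.add(node)
            node = route.get(node)
    return False

# Hop counts strictly decrease along a -> b -> c: invariant holds, no loop.
good_route, good_hops = {'a': 'b', 'b': 'c'}, {'a': 2, 'b': 1}
# a and b point at each other: a loop, and the invariant necessarily fails.
bad_route, bad_hops = {'a': 'b', 'b': 'a'}, {'a': 1, 'b': 1}
```

In the looping table the invariant would require hops[a] > hops[b] > hops[a], which is impossible; this is exactly why the invariant rules out routing cycles.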
References
1. Aaron Stump, Clark W. Barrett, and David L. Dill. CVC: a cooperating validity checker. In Conference on Computer Aided Verification, Lecture Notes in Computer Science. Springer-Verlag, 2002.
2. R. Alur, A. Itai, R. P. Kurshan, and M. Yannakakis. Timing verification by successive approximation. Information and Computation, 118(1):142–157, 1995.
3. F. Balarin and A. L. Sangiovanni-Vincentelli. An iterative approach to language containment. In 5th International Conference on Computer-Aided Verification, pages 29–40. Springer-Verlag, 1993.
4. Thomas Ball and Sriram K. Rajamani. The SLAM project: debugging system software via static analysis. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 1–3. ACM Press, 2002.
5. Saddek Bensalem, Yassine Lakhnech, and Sam Owre. InVeSt: A tool for the verification of invariants. In 10th International Conference on Computer-Aided Verification, pages 505–510. Springer-Verlag, 1998.
6. Karthikeyan Bhargavan, Davor Obradovic, and Carl A. Gunter. Formal verification of standards for distance vector routing protocols, August 1999. Presented in the Recent Research Session at Sigcomm 1999.
7. Edmund M. Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. Counterexample-guided abstraction refinement. In Computer Aided Verification, pages 154–169. Springer-Verlag, 2000.
8. Michael A. Colón and Tomás E. Uribe. Generating finite-state abstractions of reactive systems using decision procedures. In Conference on Computer-Aided Verification, volume 1427 of Lecture Notes in Computer Science, pages 293–304. Springer-Verlag, 1998.
9. Satyaki Das and David L. Dill. Successive approximation of abstract transition relations. In Proceedings of the Sixteenth Annual IEEE Symposium on Logic in Computer Science, pages 51–60. IEEE Computer Society, June 2001, Boston, USA.
10. C. Flanagan and S. Qadeer. Predicate abstraction for software verification. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 2002.
11. Susanne Graf and Hassen Saïdi. Construction of abstract state graphs with PVS. In Orna Grumberg, editor, Conference on Computer Aided Verification, volume 1254 of Lecture Notes in Computer Science, pages 72–83. Springer-Verlag, June 1997, Haifa, Israel.
12. Yassine Lakhnech, Saddek Bensalem, Sergey Berezin, and Sam Owre. Incremental verification by abstraction. In T. Margaria and W. Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems: 7th International Conference, TACAS 2001, pages 98–112, Genova, Italy, 2001. Springer-Verlag.
13. D. Lessens and Hassen Saïdi. Automatic verification of parameterized networks of processes by abstraction. Electronic Notes in Theoretical Computer Science (ENTCS), 1997.
14. Z. Manna and A. Pnueli. Temporal Verification of Reactive Systems: Safety. Springer-Verlag, 1995.
15. Charles E. Perkins and Elizabeth M. Royer. Ad Hoc On-Demand Distance Vector (AODV) Routing. In Workshop on Mobile Computing Systems and Applications, pages 90–100. ACM Press, February 1999.
16. Charles E. Perkins, Elizabeth M. Royer, and Samir Das. Ad Hoc On-Demand Distance Vector (AODV) Routing. Available at http://www.ietf.org/internet-drafts/draft-ietf-manet-aodv-05.txt, 2000.
17. A. P. Sistla and S. M. German. Reasoning with many processes. In Symposium on Logic in Computer Science, Ithaca, pages 138–152. IEEE Computer Society, June 1987.
18. Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Gregoire Sutre. Lazy abstraction. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM Press, 2002.
19. A. Tiwari, H. Rueß, H. Saïdi, and N. Shankar. A technique for invariant generation. In Tiziana Margaria and Wang Yi, editors, TACAS 2001 - Tools and Algorithms for the Construction and Analysis of Systems, volume 2031 of Lecture Notes in Computer Science, pages 113–127, Genova, Italy, April 2001. Springer-Verlag.
Automated Abstraction Refinement for Model Checking Large State Spaces Using SAT Based Conflict Analysis
Pankaj Chauhan (1), Edmund Clarke (1), James Kukula (3), Samir Sapra (1), Helmut Veith (2), and Dong Wang (1)
(1) Carnegie Mellon University  (2) TU Vienna, Austria  (3) Synopsys Inc., Beaverton, OR
Abstract. We introduce a SAT based automatic abstraction refinement framework for model checking systems with several thousand state variables in the cone of influence of the specification. The abstract model is constructed by designating a large number of state variables as invisible. In contrast to previous work, where invisible variables were treated as free inputs, we describe a computationally more advantageous approach in which the abstract transition relation is approximated by pre-quantifying invisible variables during image computation. The abstract counterexamples obtained from model checking the abstract model are symbolically simulated on the concrete system using a state-of-the-art SAT checker. If no concrete counterexample is found, a subset of the invisible variables is reintroduced into the system and the process is repeated. The main contributions of this paper are two new algorithms for identifying the relevant variables to be reintroduced. These algorithms monitor the SAT checking phase in order to analyze the impact of individual variables. Our method is complete for safety properties (AG p) in the sense that, performance permitting, a property is either verified or disproved by a concrete counterexample. Experimental results are given to demonstrate the power of our method on real-world designs.
1 Introduction
Symbolic model checking has been successful at automatically verifying temporal specifications on small to medium sized designs. However, the inability of BDD based model checking to handle large state spaces of “real world” designs hinders the wide scale acceptance of these techniques. There have been advances
This research is sponsored by the Semiconductor Research Corporation (SRC) under contract no. 99-TJ-684, the Gigascale Silicon Research Center (GSRC), the National Science Foundation (NSF) under Grant No. CCR-9803774, and the Max Kade Foundation. One of the authors is also supported by Austrian Science Fund Project N Z29-INF. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of SRC, GSRC, NSF, or the United States Government.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 33–51, 2002.
© Springer-Verlag Berlin Heidelberg 2002
P. Chauhan et al.
on various fronts to push the limits of automatic verification. On the one hand, improving BDD based algorithms improves the ability to handle large state machines, while on the other hand, various abstraction algorithms reduce the size of the design by focusing only on its relevant portions. It is important to make improvements on both fronts for successful verification. A conservative abstraction is one which preserves all behaviors of a concrete system. Conservative abstractions benefit from a preservation theorem which states that the correctness of any universal (e.g. ACTL∗) formula on an abstract system automatically implies the correctness of the formula on the concrete system. However, a counterexample on an abstract system may not correspond to any real path, in which case it is called a spurious counterexample. To get rid of a spurious counterexample, the abstraction needs to be made more precise via refinement. It is obviously desirable to automate this procedure. This paper focuses on automating the abstraction process for handling large designs containing up to a few thousand latches. This means that any BDD based computation on the concrete system will be too expensive. Abstraction refinement [1,6,8,11,13,17] is a general strategy for automatic abstraction. It usually involves the following process.
1. Generation of an initial abstraction. It is desirable to derive the initial abstraction automatically.
2. Model checking of the abstract system. If this results in a conclusive answer for the abstract system, then the process is terminated. For example, in the case of existential abstraction, a “yes” answer for an ACTL∗ property in this step means that the concrete system also satisfies the property, and we can stop. However, if the property is false on the abstract system, an abstract counterexample is generated.
3. Checking whether the counterexample holds on the concrete system.
If the counterexample is valid, then we have actually found a bug. Otherwise, the counterexample is spurious and the abstraction needs to be refined. Usually, refinement of the abstraction is based on the analysis of the counterexample(s) generated. Our abstraction function is based on hiding irrelevant parts of the circuit by making a set of variables invisible. This simple abstraction function yields an efficient way to generate minimal abstractions, a source of difficulty in previous approaches. We describe two techniques to produce abstract systems by removing invisible variables. The first is simply to turn the invisible variables into input variables. This is shown to be a minimal abstraction. However, it leaves a large number of input variables in the abstract system and, consequently, BDD based model checking even on this abstract system becomes very difficult [19]. We propose an efficient method to pre-quantify these variables on the fly during image computation. The resulting abstract systems are usually small enough to be handled by standard BDD based model checkers. We use an enhanced version [3,4] of NuSMV [5] for this. If a counterexample is produced for the abstract system, we try to simulate it on the concrete system symbolically using a fast SAT checker (Chaff [16,21] in our case).
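For intuition, pre-quantifying invisible variables during image computation can be sketched over explicit valuations. Here a state is a tuple of variable values and `visible_idx` picks the visible coordinates; the representation and names are ours (real implementations do this symbolically with BDDs), so treat this purely as a semantic sketch.

```python
def project(states, visible_idx):
    """Existentially quantify out the invisible variables: keep only the
    visible coordinates of each valuation."""
    return {tuple(s[i] for i in visible_idx) for s in states}

def abstract_image(abs_states, R, universe, visible_idx):
    """One abstract image step: any concrete state whose visible part lies
    in abs_states may fire a transition, and the successors are projected
    back onto the visible variables."""
    sources = {s for s in universe
               if tuple(s[i] for i in visible_idx) in abs_states}
    succs = {t for s in sources for t in universe if R(s, t)}
    return project(succs, visible_idx)

# Two boolean variables (v, w); only v (index 0) is visible.
universe = [(v, w) for v in (False, True) for w in (False, True)]
R = lambda s, t: t == (s[1], not s[0])   # v' = w, w' = not v
```

Because the invisible coordinate is quantified away at every step, this image overapproximates the visible behavior, which is exactly the conservative direction needed for the preservation theorem.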
The refinement is done by identifying a small set of invisible variables to be made visible. We call these variables the refinement variables. Identification of refinement variables is the main focus of this paper. Our techniques for identifying important variables are based on analysis of effective boolean constraint propagation (BCP) and conflicts [16] during the SAT checking run of the counterexample simulation. Recently, propositional SAT checkers have demonstrated tremendous success on various classes of SAT formulas. The key to the effectiveness of SAT checkers like Chaff [16], GRASP [18] and SATO [20] is nonchronological backtracking, efficient conflict driven learning of conflict clauses, and improved decision heuristics. SAT checkers have been successfully used for Bounded Model Checking (BMC) [2], where the design under consideration is unrolled and the property is symbolically verified using SAT procedures. BMC is effective for showing the presence of errors. However, BMC is not at all effective for showing that a specification is true unless the diameter of the state space is known. Moreover, BMC performance degrades when searching for deep counterexamples. Our technique can be used to show that a specification is true, and it is able to search for deeper concrete counterexamples because of the guidance derived from abstract counterexamples. The efficiency of SAT procedures has made it possible to handle circuits with a few thousand variables, much larger than any BDD based model checker is able to handle at present. Our approach is similar to BMC, except that the propositional formula for simulation is constrained by assignments to visible variables. This formula is unsatisfiable for a spurious counterexample. We propose heuristic scores based on backtracking and conflict clause information, similar to the VSIDS heuristic in Chaff, and a conflict dependency analysis algorithm to extract the reason for unsatisfiability.
Our techniques are able to identify those variables that are critical for unsatisfiability of the formula and are, therefore, prime candidates for refinement. The main strength of our approach is that we use the SAT procedure itself for refinement. We do not need to invoke multiple SAT instances or solve separation problems as in [8]. Thus the main contributions of our work are: (a) use of SAT for counterexample validation, (b) refinement procedures based on SAT conflict analysis, and (c) a method to remove invisible variables from the abstract system for computational efficiency.

Outline of the Paper

The rest of the paper is organized as follows. Section 2 briefly reviews how abstraction is used in model checking and introduces notation that is used in the following sections. In Section 3, we describe in detail our abstraction technique and how we check an abstract counterexample on the concrete model. The most important part of the paper is Section 4, where we discuss our refinement algorithms based on scoring heuristics for variables and conflict dependency analysis. In Section 5, we present experimental evidence to show the ability of our approach to handle large state systems. In Section 6, we describe related work in detail. Finally, we conclude in Section 7 with directions for future research.
P. Chauhan et al.

2 Abstraction in Model Checking
We give a brief summary of the use of abstraction in model checking and introduce notation that we will use in the remainder of the paper (refer to [7] for a full treatment). A transition system is modeled by a tuple M = (S, I, R, A, L), where S is the set of states, I ⊆ S is the set of initial states, R is the set of transitions, and A is the set of atomic propositions that label each state in S via the labeling function L : S → 2^A. The set I is also used as a predicate I(s), meaning the state s is in I. Similarly, the transition relation R is also used as a predicate R(s1, s2), meaning there exists a transition between states s1 and s2. Each program variable vi ranges over its non-empty domain Dvi. The state space of a program with a set of variables V = {v1, v2, ..., vn} is defined by the Cartesian product Dv1 × Dv2 × ... × Dvn. In existential abstraction [7] a surjection h : S → Ŝ maps a concrete state si ∈ S to an abstract state ŝi = h(si) ∈ Ŝ. We denote the set of concrete states that map to an abstract state ŝi by h⁻¹(ŝi).

Definition 1. The minimal existential abstraction M̂ = (Ŝ, Î, R̂, Â, L̂) corresponding to a transition system M = (S, I, R, A, L) and an abstraction function h is defined by:
1. Ŝ = {ŝ | ∃s. s ∈ S ∧ h(s) = ŝ}.
2. Î = {ŝ | ∃s. I(s) ∧ h(s) = ŝ}.
3. R̂ = {(ŝ1, ŝ2) | ∃s1. ∃s2. R(s1, s2) ∧ h(s1) = ŝ1 ∧ h(s2) = ŝ2}.
4. Â = A.
5. L̂(ŝ) = ∪_{h(s)=ŝ} L(s).

Condition 3 can be stated equivalently as

∃s1, s2. (R(s1, s2) ∧ h(s1) = ŝ1 ∧ h(s2) = ŝ2) ⇔ R̂(ŝ1, ŝ2)    (1)
An atomic formula f respects h if for all s ∈ S, h(s) |= f ⇒ s |= f. The labeling L̂(ŝ) is consistent if for all s ∈ h⁻¹(ŝ) it holds that s |= ⋀_{f ∈ L̂(ŝ)} f. The following theorem from [6,15] is stated without proof.

Theorem 1. Let h be an abstraction function and φ an ACTL∗ specification where the atomic sub-formulae respect h. Then the following holds: (i) for all ŝ ∈ Ŝ, L̂(ŝ) is consistent, and (ii) M̂ |= φ ⇒ M |= φ.

This theorem is the core of all abstraction refinement frameworks. However, the converse may not hold, i.e., even if M̂ ⊭ φ, the concrete model M may still satisfy φ. In this case, the counterexample on M̂ is said to be spurious, and we need to refine the abstraction function. Note that the theorem holds even if only the right implication holds in Equation 1. In other words, even if we add more transitions to the minimal transition relation R̂, the validity of an ACTL∗ formula on M̂ implies its validity on M.

Definition 2. An abstraction function h′ is a refinement for the abstraction function h and the transition system M = (S, I, R, A, L) if for all s1, s2 ∈ S, h′(s1) = h′(s2) implies h(s1) = h(s2). Moreover, h′ is a proper refinement of h if there exist s1, s2 ∈ S such that h(s1) = h(s2) and h′(s1) ≠ h′(s2).
Automated Abstraction Refinement
In general, ACTL∗ formulae can have tree-like counterexamples [9]. In this paper, we focus only on safety properties, which have finite path counterexamples. It is possible to generalize our approach to full ACTL∗ as done in [9]. The following iterative abstraction refinement procedure for a system M and a safety formula φ follows immediately.
1. Generate an initial abstraction function h.
2. Model check M̂. If M̂ |= φ, return TRUE.
3. If M̂ ⊭ φ, check the generated counterexample T on M. If the counterexample is real, return FALSE.
4. Refine h, and go to step 2.
Since each refinement step partitions at least one abstract state, the above procedure is complete for finite state systems for ACTL∗ formulae that have path counterexamples; the number of iterations is therefore bounded by the number of concrete states. However, as we will show in the next two sections, the number of refinement steps is at most the number of program variables. We would like to emphasize that we model check the abstract system in step 2 using BDD-based symbolic model checking, while steps 3 and 4 are carried out with the help of SAT checkers.
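The four-step loop above can be sketched as a generic driver with pluggable checking, simulation, and refinement steps. All function names and the toy callbacks below are illustrative; they are not the paper's implementation.

```python
# A minimal sketch of the iterative abstraction refinement loop of Section 2.
# `model_check`, `simulate_on_concrete`, and `refine` are stand-in callables.

def abstraction_refinement(h, model_check, simulate_on_concrete, refine):
    """Return (True, None) if the property holds, else (False, concrete trace)."""
    while True:
        holds, abstract_cex = model_check(h)               # step 2
        if holds:
            return True, None
        concrete_cex = simulate_on_concrete(abstract_cex)  # step 3 (SAT check)
        if concrete_cex is not None:
            return False, concrete_cex                     # real counterexample
        h = refine(h, abstract_cex)                        # step 4: refine h

# Toy demo: the property holds, but only abstractions in which both v1 and v2
# are visible are precise enough to prove it.
def toy_model_check(h):
    return (True, None) if {"v1", "v2"} <= h else (False, tuple(sorted(h)))

def toy_simulate(cex):
    return None                  # every abstract counterexample is spurious

def toy_refine(h, cex):
    return h | {"v2"}            # make one more variable visible

print(abstraction_refinement({"v1"}, toy_model_check, toy_simulate, toy_refine))
# -> (True, None) after one refinement
```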
3 Generating Abstract State Machine
We consider a special type of abstraction for our methodology, wherein we hide a set of variables that we call invisible variables, denoted by I. The set of variables that we retain in our abstract machine are called visible variables, denoted by V. The visible variables are considered important for the property and hence are retained in the abstraction, while the invisible variables are considered irrelevant for the property. The initial abstraction and the refinement in steps 1 and 4 respectively correspond to different partitions of V. Typically, we would want |V| ≪ |I|. Formally, the value of a variable v ∈ V in state s ∈ S is denoted by s(v). Given a set of variables U = {u1, u2, ..., up}, U ⊆ V, let sU denote the portion of s that corresponds to the variables in U, i.e., sU = (s(u1)s(u2)...s(up)). Let V = {v1, v2, ..., vk}. This partitioning of variables defines our abstraction function h : S → Ŝ. The set of abstract states is Ŝ = Dv1 × Dv2 × ... × Dvk and h(s) = sV. In our approach, the initial abstraction takes the set of variables mentioned in the property as visible variables. Another option is to make the variables in the cone of influence (COI) of the property visible. However, the COI of a property may be too large, and we may end up with a large number of visible variables. The idea is to begin with a small set of visible variables and then let the refinement procedure come up with a small set of invisible variables to make visible. We also assume that the transition relation is described not as a single predicate, but as a conjunction of bit relations Rj for each individual variable vj. More formally, we consider a sequential circuit with registers V = {v1, v2, ..., vm} and inputs I = {i1, i2, ..., iq}. Let s = (v1, v2, ..., vm), i = (i1, i2, ..., iq)
and s′ = (v′1, v′2, ..., v′m). The primed variables denote the next-state versions of the unprimed variables, as usual. Thus the bit relation for vj becomes Rj(s, i, v′j) = (v′j ↔ fvj(s, i)).

R(s, s′) = ∃i. ⋀_{j=1}^{m} Rj(s, i, v′j)    (2)
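To make Equation 2 concrete, the following sketch evaluates the transition relation of a tiny two-register circuit by enumerating the inputs; the circuit and its next-state functions are invented for illustration and are not from the paper.

```python
from itertools import product

# Each register's next-state function f_vj over (state, input), as in Eq. 2.
# This hypothetical circuit has registers v1, v2 and a single input i1.
NEXT = {
    "v1": lambda s, i: s["v2"] ^ i["i1"],
    "v2": lambda s, i: s["v1"] and s["v2"],
}
INPUTS = ["i1"]

def has_transition(s, s2):
    """R(s, s') holds iff some input i makes every bit relation true (Eq. 2)."""
    for bits in product([False, True], repeat=len(INPUTS)):
        i = dict(zip(INPUTS, bits))
        if all(s2[v] == f(s, i) for v, f in NEXT.items()):
            return True
    return False

s = {"v1": True, "v2": True}
print(has_transition(s, {"v1": False, "v2": True}))   # i1=1 works -> True
print(has_transition(s, {"v1": False, "v2": False}))  # v2' must be 1 -> False
```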
3.1 Abstraction by Making Invisible Variables as Input Variables
As shown in [8], the minimal transition relation R̂ corresponding to R and the h described above is obtained by removing the logic defining the invisible variables and treating them as free input variables of the circuit. Hence, R̂ looks like:

R̂(ŝ, ŝ′) = ∃sI. ∃i. ⋀_{vj ∈ V} Rj(sV, sI, i, v′j)    (3)

The quantifications in Equation 3 are performed during each image computation in symbolic model checking of the abstract system. This is done so as not to build a monolithic BDD for R̂ and to enjoy the benefits of early quantification. We call this type of abstraction an input abstraction. We write s as (sV, sI) to stress the fact that we are leaving invisible variables as input variables in R̂. When dealing with systems with a large number of registers, quantifying so many variables for each image computation is expensive (e.g. [19]). An invisible variable can be in the support of multiple partitions of the transition relation. In input abstraction, each occurrence of an invisible variable has the same value in different partitions of the abstract transition relation. Thus, we say input abstraction preserves correlations between different occurrences of an invisible variable. In the next type of abstraction, we pre-quantify most of the invisible variables to reduce the number of variables during image computation. This means that different occurrences of an invisible variable get de-coupled when we push the quantifications inside Equation 3, making the abstraction more approximate.

3.2 Abstraction by Pre-quantifying Invisible Variables
Input abstraction leaves a large number of variables to quantify during the image computation process. We can, however, quantify these variables a priori, leaving only visible variables in R̂. The transition relation that we get by quantifying invisible variables from R̂ in the beginning is denoted by R̃. We can even quantify some of the input variables a priori in this fashion to control the total number of variables appearing in R̃. Let Q ⊆ I ∪ I (the invisible variables together with the inputs) denote the set of variables to be pre-quantified, and let W = (I ∪ I) \ Q be the set of variables that are not pre-quantified. Quantification of a large number of invisible variables in Equation 3 is computationally expensive [15]. To alleviate this difficulty, it is customary to
approximate this abstraction by pushing the quantification inside the conjunctions as follows.

R̃(ŝ, ŝ′) = ∃sW. ⋀_{vj ∈ V} ∃sQ. Rj(sV, sI, i, v′j)    (4)
Since the BDDs for state sets do not contain input variables in their support, this is a safe step. It does not violate the soundness of the approximation, i.e., for each concrete transition in R, there will be a corresponding transition in R̃, as stated below.

Theorem 2. ∃s1, s2. (R(s1, s2) ∧ h(s1) = ŝ1 ∧ h(s2) = ŝ2) ⇒ R̃(ŝ1, ŝ2).

The other direction of this implication does not hold because of the approximations introduced.

Preserving Correlations. We can see in Equation 4 that by existentially quantifying each invisible variable separately for each conjunct of the transition relation, we lose the correlation between different occurrences of a variable. For example, consider the trivial bit relations x′1 = x3, x′2 = ¬x3 and x′3 = x1 ⊕ x2, and suppose x3 is made an invisible variable. Then quantifying x3 separately from the bit relations of x1 and x2 will result in a transition relation that always evaluates to 1, meaning the state graph is a clique. However, we can see that in any reachable state, x1 and x2 are always opposite of each other. To partially solve this problem without giving up the pre-quantification of Equation 4, we propose to cluster those bit relations that share many common variables. Since this problem is very similar to the quantification scheduling problem (which occurs during image computations), we propose to use a modification of the VarScore algorithms [3] for evaluating this quantification. The algorithm can be viewed as producing clusters of bit relations; we use it to produce clusters with controlled approximations. The idea is to delay variable quantifications as much as possible, without letting the conjoined BDDs grow too large. When a BDD grows larger than some threshold, we quantify away a variable. We can of course quantify a variable that no longer appears in the support of other BDDs. Effective quantification scheduling algorithms put closely related occurrences of a variable in the same cluster.
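The correlation loss described above can be checked by brute force on the paper's three-variable example: `joint` quantifies the shared x3 once, while `per_conjunct` uses a decoupled copy of x3 per conjunct, as happens when the quantification is pushed inside as in Equation 4.

```python
from itertools import product

# Brute-force illustration of correlation loss: x1' = x3, x2' = not x3,
# with x3 invisible. n1, n2 stand for candidate next-state values x1', x2'.

def joint(n1, n2):
    # exists x3 . (n1 == x3) and (n2 == not x3): x3 quantified once
    return any(n1 == x3 and n2 == (not x3) for x3 in (False, True))

def per_conjunct(n1, n2):
    # (exists x3 . n1 == x3) and (exists x3 . n2 == not x3): decoupled copies
    return (any(n1 == x3 for x3 in (False, True)) and
            any(n2 == (not x3) for x3 in (False, True)))

for n1, n2 in product((False, True), repeat=2):
    print(n1, n2, joint(n1, n2), per_conjunct(n1, n2))
# joint allows only n1 != n2; per_conjunct allows all four combinations,
# i.e. the per-conjunct quantification makes the relation trivially true
```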
Figure 1 shows the VarScore algorithm for approximating existential abstraction. A static, circuit min-cut based structural method to reduce the number of invisible variables was proposed in [12] and used in [19]. Our method introduces approximations as needed, based on actual image computations, while their method removes the variables statically. Our algorithm thus achieves a balance between performance and accuracy: the approximations introduced by our algorithm are more accurate, as the parts of the circuit statically removed in [12] could be important.

3.3 Checking the Validity of an Abstract Counterexample
Given a set of conjuncts RV and variables sQ to pre-quantify, repeat until all sQ variables are quantified:
1. Quantify away sQ variables appearing in only one BDD.
2. Score the variables by summing up the sizes of the BDDs in which a variable occurs.
3. Pick the two smallest BDDs for the variable with the smallest score.
4. If any BDD is larger than the size threshold, quantify the variable from the BDD(s) and go back to step 2.
5. If the BDDs are smaller than the threshold, do BDDAnd or BDDAndExists, depending upon the case.

Fig. 1. VarScore algorithm for approximating existential abstraction

Given an abstract model M̂ and a safety formula φ, we run the usual BDD-based symbolic model checking algorithm to determine if M̂ |= φ. Suppose that the model checker produces an abstract path counterexample s̄m = (ŝ0, ŝ1, ..., ŝm). To check whether this counterexample holds on the concrete model M or not, we symbolically simulate M beginning with the initial state I(s0) using a fast SAT checker. At each stage of the symbolic simulation, we constrain the values of the visible variables according to the counterexample produced. The equation for symbolic simulation is:

(I(s0) ∧ (h(s0) = ŝ0)) ∧ (R(s0, s1) ∧ (h(s1) = ŝ1)) ∧ ... ∧ (R(sm−1, sm) ∧ (h(sm) = ŝm))    (5)
Each h(si) is just a projection of the state si onto the visible variables. If this propositional formula is satisfiable, then we can successfully simulate the counterexample on the concrete machine and conclude that M ⊭ φ. The satisfying assignments to the invisible variables, along with the assignments to the visible variables produced by model checking, give a valid counterexample on the concrete machine. If this formula is not satisfiable, the counterexample is spurious and the abstraction needs refinement. Assume that the counterexample can be simulated up to the abstract state ŝf, but not up to ŝf+1 ([6,8]). Thus Formula 6 is satisfiable while Formula 7 is not, as shown in Figure 2.

(I(s0) ∧ (h(s0) = ŝ0)) ∧ (R(s0, s1) ∧ (h(s1) = ŝ1)) ∧ ... ∧ (R(sf−1, sf) ∧ (h(sf) = ŝf))    (6)

(I(s0) ∧ (h(s0) = ŝ0)) ∧ (R(s0, s1) ∧ (h(s1) = ŝ1)) ∧ ... ∧ (R(sf, sf+1) ∧ (h(sf+1) = ŝf+1))    (7)
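A brute-force stand-in for the SAT checks of Formulas 6 and 7 is sketched below on a tiny hypothetical machine (one visible register v, one invisible register x, with v′ = x and x′ = ¬x): it extends concrete prefixes consistent with the abstract counterexample and reports the largest index f for which the prefix is still satisfiable.

```python
# Enumerative stand-in for the SAT-based prefix check: the invisible register
# x is enumerated instead of being handled symbolically. The machine and the
# abstract trace are invented for illustration.

def step(s):
    return {"v": s["x"], "x": not s["x"]}    # v' = x, x' = not x

def initial_states():
    return [{"v": False, "x": x} for x in (False, True)]   # I: v = 0, x free

def failure_index(abstract_cex):
    """Largest f such that the prefix up to \\hat{s}_f is satisfiable (Eq. 6)."""
    frontier = [s for s in initial_states()
                if s["v"] == abstract_cex[0]["v"]]          # h(s0) = \hat{s}_0
    f = 0
    for t in range(1, len(abstract_cex)):
        frontier = [step(s) for s in frontier
                    if step(s)["v"] == abstract_cex[t]["v"]]
        if not frontier:
            return f            # \hat{s}_f is the failure state
        f = t
    return f

# Abstract trace v: 0 -> 1 -> 1 is spurious: reaching v = 1 forces x = 0
# in the next state, so v cannot stay 1.
print(failure_index([{"v": False}, {"v": True}, {"v": True}]))   # -> 1
```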
Using the terminology introduced in [6], we call the abstract state ŝf a failure state. The abstract state ŝf contains many concrete states, given by all possible combinations of the invisible variables while keeping the values of the visible variables given by ŝf. The concrete states in ŝf reachable from the initial states following the spurious counterexample are called the dead-end states. The concrete states in ŝf that have transitions into ŝf+1 are called bad states. Because the
[Figure 2 shows the abstract trace ŝ0, ŝ1, ..., ŝf+1 above the corresponding concrete trace through h⁻¹(ŝ0), ..., h⁻¹(ŝf+1), with the dead-end states and the bad states both contained in the failure state ŝf.]

Fig. 2. A spurious counterexample showing the failure state [8]. No concrete path can be extended beyond the failure state.
dead-end states and the bad states are part of the same abstract state, we get the spurious counterexample. The refinement step then is to separate the dead-end states and the bad states by making a small subset of invisible variables visible. It is easy to see that the set of dead-end states is given by the values of the state variables in the f-th step over all satisfying solutions to Equation 6. Note that in the symbolic simulation formulas, we have a copy of each state variable for each time frame. We do this symbolic simulation using the SAT checker Chaff [16]. We assume that there are concrete transitions corresponding to each abstract transition from ŝi to ŝi+1, where 0 < i ≤ f; it is fairly straightforward to extend our algorithm to handle spurious abstract transitions. Under this assumption, the set of bad states is not empty. Since s̄f+1 is the shortest prefix that is unsatisfiable, there must be information passed through the invisible registers at time frame f in order for the SAT solver to prove the counterexample spurious. Specifically, the SAT solver implicitly generates constraints on the invisible registers at time frame f based on either the last abstract transition or the prefix s̄f. Obviously, the intersection of these two constraints on those invisible registers is empty. Thus the set of invisible registers that are constrained in time frame f during the SAT process is sufficient to separate the dead-end states and the bad states after refinement. Therefore, our algorithm limits the refinement candidates to the registers that are constrained in time frame f. Equation 5 is exactly like symbolic simulation with Bounded Model Checking; the only difference is that the values of the visible state variables at each step are constrained to the counterexample values. Since the original input variables of the system are unconstrained, we also constrain their values according to the abstract counterexample. This puts many constraints on the SAT formula.
Hence, the SAT checker is able to prune the search space significantly. We rely on the ability of Chaff to identify important variables in this SAT check to separate dead-end and bad states, as described in the next section.
    while (1) {
        if (decide_next_branch()) {            // Branching
            while (deduce() == conflict) {     // Propagate implications
                blevel = analyse_conflict();   // Learning
                if (blevel == 0)
                    return UNSAT;
                else
                    backtrack(blevel);         // Non-chronological backtrack
            }
        } else
            return SAT;                        // no branch means all vars
                                               // have been assigned
    }

Fig. 3. Basic DPLL backtracking search (used from [16] for illustration purposes)
4 SAT Based Refinement Heuristics
The basic framework for these SAT procedures is the Davis-Putnam-Logemann-Loveland backtracking search, shown in Figure 3. The function decide_next_branch() chooses the branching variable at the current decision level. The function deduce() does Boolean constraint propagation to deduce further assignments. While doing so, it might infer that the present set of assignments to variables does not lead to any satisfying solution, resulting in a conflict. In case of a conflict, new clauses are learned by analyse_conflict() that hopefully prevent the same unsuccessful search in the future. The conflict analysis also returns a variable for which another value should be tried. This variable may not be the most recently decided variable, leading to a non-chronological backtrack. If all variables have been decided, then we have found a satisfying assignment and the procedure returns. The strength of various SAT checkers lies in their implementations of constraint propagation, decision heuristics, and learning. Modern SAT checkers work by introducing conflict clauses in the learning phase and by non-chronological backtracking. Implication graphs are used for Boolean constraint propagation. The vertices of this graph are literals, and each edge is labeled with the clause that forces the assignment. When a clause becomes unsatisfiable as a result of the current set of assignments (decision assignments or implied assignments), a conflict clause is introduced to record the cause of the conflict, so that the same futile search is never repeated. The conflict clause is learned from the structure of the implication graph. When the search backtracks, it backtracks to the most recent variable in the conflict clause just added, not to the variable that was assigned last. For our purposes, note that Equation 7 is unsatisfiable, and hence there will be much backtracking; many conflict clauses will be introduced before the SAT checker concludes that the formula is unsatisfiable.
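Figure 3's loop can be rendered as a small runnable sketch. For brevity, this version omits clause learning and non-chronological backtracking, keeping only branching, unit propagation (deduce), and chronological backtracking; it is far simpler than Chaff and is meant only to show the control structure.

```python
# Toy DPLL over clauses given as lists of DIMACS-style integer literals
# (positive = variable true, negative = variable false).

def dpll(clauses, assign=None):
    assign = dict(assign or {})
    # deduce(): Boolean constraint propagation via unit clauses
    while True:
        unit = None
        for clause in clauses:
            if any(assign.get(abs(l)) == (l > 0) for l in clause):
                continue                     # clause already satisfied
            free = [l for l in clause if abs(l) not in assign]
            if not free:
                return None                  # conflict: all literals false
            if len(free) == 1:
                unit = free[0]
                break
        if unit is None:
            break
        assign[abs(unit)] = unit > 0         # implied assignment
    # decide_next_branch(): no unassigned variable left means SAT
    variables = {abs(l) for c in clauses for l in c}
    unassigned = variables - assign.keys()
    if not unassigned:
        return assign
    v = min(unassigned)
    for value in (True, False):              # chronological backtracking only
        model = dpll(clauses, {**assign, v: value})
        if model is not None:
            return model
    return None

print(dpll([[1], [-1]]))                     # -> None (UNSAT)
print(dpll([[1, 2], [-1, 3], [-2, -3]]))     # a satisfying assignment
```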
A conflict clause records a reason for the formula being unsatisfiable. The variables in a conflict clause are thus important for distinguishing between dead-end and bad states. The decision variable to which the search backtracks is responsible for the current conflict and hence is an important variable. We call the implication graph associated with each conflict a conflict graph. The source nodes of this graph are the variable decisions; the sink node is the conflicting assignment to one of the variables. At least one conflict clause is generated from a conflict graph. We propose the following two algorithms to identify important variables from conflict analysis and backtracking.

4.1 Refinement Based on Scoring Invisible Variables
We score invisible variables based on two factors: first, the number of times a variable gets backtracked to, and second, the number of times a variable appears in a conflict clause. Note that we have to adjust the first score by an exponential factor based on the decision level a variable is at, as a variable at the root node can potentially get just two backtracks, while a variable at decision level dl can get 2^dl backtracks globally. Every time the SAT procedure backtracks to an invisible variable at decision level dl, we add the following number to the backtrack score:

2^(|I|−dl) / c

We use c as a normalizing constant. For computing the second score, we just keep a global conflict score counter for each variable and increment the counter for each variable appearing in any conflict clause. The method used for identifying conflict clauses from conflict graphs greatly affects SAT performance. As shown in [21], we use the most effective method, called the first unique implication point (1UIP), for identifying conflict clauses. We then use a weighted average of these two scores to derive the final score as follows:

w1 · backtrack score + w2 · conflict score    (8)
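The two scores and their combination in Equation 8 can be sketched as follows. The constants c, w1, w2, the value of |I|, and the event sequence are illustrative, not the paper's settings.

```python
# Sketch of the variable scoring of Section 4.1. Backtrack events at shallow
# decision levels earn exponentially more credit (weight 2^(|I|-dl)/c),
# compensating for the fewer backtracks possible near the root.

C, W1, W2 = 8.0, 1.0, 1.0
NUM_INVISIBLE = 10                            # |I|, illustrative

backtrack_score = {}
conflict_score = {}

def on_backtrack(var, decision_level):
    backtrack_score[var] = (backtrack_score.get(var, 0.0)
                            + 2 ** (NUM_INVISIBLE - decision_level) / C)

def on_conflict_clause(clause_vars):
    for v in clause_vars:                     # VSIDS-like global counter
        conflict_score[v] = conflict_score.get(v, 0) + 1

def final_score(var):                         # Equation 8
    return (W1 * backtrack_score.get(var, 0.0)
            + W2 * conflict_score.get(var, 0))

# hypothetical events from one SAT run
on_backtrack("x3", 2)
on_backtrack("x7", 9)
on_conflict_clause(["x3", "x5"])
on_conflict_clause(["x3"])

ranked = sorted(["x3", "x5", "x7"], key=final_score, reverse=True)
print(ranked[0])   # x3: backtracked near the root and in two conflict clauses
```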
Note that the second factor is very similar to the decision heuristic VSIDS used in Chaff. The difference is that Chaff uses these per-variable global scores to arrive at local decisions (of the next branching variable), while we use them to derive global information about important variables. Therefore, we do not periodically divide the variable scores as Chaff does. We also have to be careful to guide Chaff not to decide on the intermediate variables introduced while converting various formulae to CNF, the required input format for SAT checkers. This is done automatically in our method.

4.2 Refinement Based on Conflict Dependency Graph
The choice of which invisible registers to make visible is the key to the success of the refinement algorithm. Ideally, we want this set of registers to be small and still
be able to prevent the spurious trace. Obviously, the set of registers appearing in the conflict graphs during the checking of the counterexample could prevent the spurious trace. However, this set can be very large. We will show here that it is unnecessary to consider all conflict graphs.

Dependencies between Conflict Graphs. We call the implication graph associated with a conflict a conflict graph. At least one conflict clause is generated from a conflict graph.

Definition 3. Given two conflict graphs A and B, if at least one of the conflict clauses generated from A labels one of the edges in B, then we say that conflict B directly depends on conflict A.

For example, consider the conflicts depicted in the conflict graphs of Figure 4. Suppose that at a certain stage of the SAT checking, conflict graph A is generated. This produces the conflict clause ω9 = (¬x9 + x11 + ¬x15); we are using the first UIP (1UIP) learning strategy [21] to identify the conflict clause here. This conflict clause can be rewritten as x9 ∧ ¬x11 → ¬x15. In the other conflict graph, B, clause ω9 labels one of the edges and forces variable x15 to be 0. Hence, we say that conflict graph B directly depends on conflict graph A.

[Figure 4 shows the two conflict graphs A and B, with decision levels in parentheses. The 1UIP cut in conflict graph A yields the conflict clause ω9, which labels an edge of conflict graph B.]

Fig. 4. Two dependent conflict graphs. Conflict B depends on conflict A, as the conflict clause ω9 derived from conflict graph A produces conflict B.
Given the set of conflict graphs generated during satisfiability checking, we construct the unpruned conflict dependency graph as follows:
– Vertices of the unpruned dependency graph are all conflict graphs created by the SAT algorithm.
– Edges of the unpruned dependency graph are direct dependencies.
Figure 5 shows an unpruned conflict dependency graph with five conflict graphs. A conflict graph B depends on another conflict graph A if vertex A is reachable from vertex B in the unpruned dependency graph. In Figure 5, conflict graph E depends on conflict graph A. When the SAT algorithm detects unsatisfiability, it terminates with the last conflict graph, corresponding to the last conflict. The subgraph of the unpruned conflict dependency graph on which the last conflict graph depends is called the conflict dependency graph. Formally:

Definition 4. The conflict dependency graph is a subgraph of the unpruned dependency graph. It includes the last conflict graph and all the conflict graphs on which the last one depends.
[Figure 5 shows conflict graphs A through E; E is the last conflict graph, and the dependency graph (within dotted lines) contains A, C, D, and E.]

Fig. 5. The unpruned dependency graph and the dependency graph (within dotted lines)
In Figure 5, conflict graph E is the last conflict graph, hence the conflict dependency graph includes conflict graphs A, C, D, and E. Thus, the conflict dependency graph can be constructed from the unpruned dependency graph by any directed graph traversal algorithm for reachability. Typically, many conflict graphs can be pruned away in this traversal, so that the dependency graph becomes much smaller than the unpruned dependency graph. Intuitively, all SAT decision strategies are based on heuristics. For a given SAT problem, the initial set of decisions and conflicts a SAT solver comes up with may not be related to the final unsatisfiability result. Our dependency analysis helps to remove that irrelevant reasoning.

Generating the Conflict Dependency Graph Based on zchaff. We have implemented the conflict dependency analysis algorithm on top of zchaff [21], which has a powerful learning strategy called first UIP (1UIP). Experimental results from [21] show that 1UIP is the best known learning strategy. In 1UIP, only one conflict clause is generated from each conflict graph, and it only includes those implications that are closer to the conflict; refer to [21] for the details. We have built our algorithms on top of 1UIP, and we restrict the following discussion to the case where only one conflict clause is generated from a conflict graph. Note that the algorithms can be easily adapted to other learning strategies.
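The reachability traversal described above can be sketched as a depth-first search from the last conflict graph. The edge set below mirrors the situation of Figure 5, though the exact edges chosen here are illustrative.

```python
# Sketch of conflict dependency pruning (Definition 4): keep only the conflict
# graphs reachable from the last one through "directly depends" edges.

def prune(direct_deps, last):
    """direct_deps maps a conflict graph to the graphs it directly depends on;
    return the set of graphs in the (pruned) conflict dependency graph."""
    keep, stack = set(), [last]
    while stack:
        g = stack.pop()
        if g in keep:
            continue
        keep.add(g)
        stack.extend(direct_deps.get(g, []))
    return keep

# Figure 5's situation: E is the last conflict graph; B is pruned away.
deps = {"B": ["A"], "C": ["A"], "D": ["C"], "E": ["C", "D"]}
print(sorted(prune(deps, "E")))   # -> ['A', 'C', 'D', 'E']
```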
After SAT terminates with unsatisfiability, our pruning algorithm starts from the last conflict graph. Based on the clauses contained in this conflict graph, the algorithm traverses the other conflict graphs that this one depends on. The result of this traversal is the pruned dependency graph.

Identifying Important Variables. The dependency graph records the reasons for unsatisfiability. Therefore, only the variables appearing in the dependency graph are important: instead of collecting all the variables appearing in any conflict graph, those in the dependency graph are sufficient to disable the spurious counterexample. Suppose s̄f+1 = (ŝ0, ŝ1, ..., ŝf+1) is the shortest prefix of a spurious counterexample that cannot be simulated on the concrete machine. Recall that ŝf is the failure state. During the satisfiability checking of s̄f+1, we generate an unpruned conflict dependency graph. When Chaff terminates with unsatisfiability, we collect the clauses from the pruned conflict dependency graph. Some of the literals in these clauses correspond to invisible registers at time frame f. Only those portions of the circuit that correspond to the clauses contained in the pruned conflict dependency graph are necessary for the unsatisfiability. Therefore, the candidates for refinement are the invisible registers that appear at time frame f in the conflict dependency graph.

Refinement Minimization. The set of refinement candidates identified from conflict analysis is usually not minimal, i.e., not all registers in this set are required to invalidate the current spurious abstract counterexample. To remove those that are unnecessary, we have adapted the greedy refinement minimization algorithm of [19]. The algorithm in [19] has two phases. The first phase is the addition phase, in which a set of invisible registers that suffices to disable the spurious abstract counterexample is identified.
In the second phase, a minimal subset of those registers that is necessary to disable the counterexample is identified: the algorithm checks whether removing a newly added register from the abstract model still disables the abstract counterexample. If so, this register is unnecessary and is no longer considered for refinement. In our case, we only need the second phase of the algorithm, since the set of refinement candidates provided by our conflict dependency analysis already suffices to disable the current spurious abstract counterexample. As the first phase of their algorithm takes at least as long as the second phase, this speeds up our minimization considerably.
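The second (removal) phase can be sketched as a greedy pass over the candidates. Here `still_disabled` is a stand-in for re-running the abstract counterexample check on a trial abstraction; the register names and the toy check are illustrative.

```python
# Greedy removal phase of the minimization adapted from [19]: drop each
# candidate register in turn and keep it out if the spurious counterexample
# stays disabled without it.

def minimize(candidates, still_disabled):
    kept = list(candidates)
    for reg in list(candidates):
        trial = [r for r in kept if r != reg]
        if still_disabled(trial):
            kept = trial          # reg was unnecessary for refinement
    return kept

# Toy check: the counterexample is disabled as long as r1 or r4 is visible.
disabled = lambda regs: "r1" in regs or "r4" in regs
print(minimize(["r1", "r2", "r3", "r4"], disabled))   # -> ['r4']
```

Note that the result of a greedy pass depends on the order in which candidates are tried; it yields a minimal (irredundant) set, not necessarily a minimum one.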
5 Experimental Results
We have implemented our abstraction refinement framework on top of the NuSMV model checker [5]. We modified the SAT checker Chaff to compute heuristic scores, to produce conflict dependency graphs, and to do incremental SAT. The IU-p1 benchmark was verified by conflict analysis based refinement on a SunFire 280R machine with two 750 MHz UltraSparc III CPUs and 8 GB of RAM running Solaris. All other experiments were performed on a dual 1.5 GHz Athlon machine
with 3 GB of RAM running Linux. The experiments were performed on two sets of benchmarks. The first set of benchmarks, in Table 1, are industrial benchmarks obtained from various sources. The benchmarks IU-p1 and IU-p2 refer to the same circuit, IU, but different properties are checked in each case. This circuit is an integer unit of a picoJava microprocessor from Sun. The D series benchmarks are from a processor design. The properties verified were simple AG properties. The property for IU-p2 has 7 registers, while IU-p1 and the D series circuits have only one register in the property. The circuits in Table 2 are various abstractions of the IU circuit; the property being verified has 17 registers. They are smaller circuits that are easily handled by our methods, but they have been shown to be difficult for Cadence SMV [8]. We include these results here to compare our methods with the results reported in [8] for property 2. We do not report the results for property 1 of [8] because it is too trivial (all counterexamples can be found in 1 iteration). It is interesting to note that all benchmarks but IU-p1 and IU-p2 have a valid counterexample.

Table 1. Comparison between Cadence SMV (CSMV), heuristic score based refinement, and dependency analysis based refinement for larger circuits. The experiment marked with a * was performed on the SunFire machine with more memory because a length 72 abstract counterexample was encountered.

circuit  # regs  ctrex   CSMV     Heuristic Score         Dependency
                 length  time     time    iters  # regs   time    iters  # regs
D2          105  15         152      105     10      51       79     11      39
D5          350  32       1,192       29      3      16     38.2      8      10
D6          177  20      45,596      784     24     121      833     48      90
D18         745  28      >4 hrs   12,086     69     346    9,995    142     253
D20         562  14      >7 hrs    1,493     56     281    1,947     74     265
D24         270  10       7,850       14      1       6        8      1       4
IU-p1      4855  true         -    9,138     22     107   3,350*     13      19
IU-p2      4855  true         -    2,820      7      36      712      6      13
In Table 1, we compare our methods against the BDD-based model checker Cadence SMV (CSMV), with cone of influence reduction and dynamic variable reordering enabled. The performance of "vanilla" NuSMV was worse than Cadence SMV, hence we do not report those numbers. We report the total running time, the number of iterations, and the number of registers in the final abstraction. The columns labeled "Heuristic Score" report the results of our heuristic variable scoring method; we introduce 5 latches at a time in this method. The columns labeled "Dependency" report the results of our dependency analysis based refinement, which employs pruning of candidate refinement sets. A "-" in a cell indicates that the model checker ran out of memory. Table 2 compares our methods against those reported in [8] on the IU series benchmarks for verifying property 2.
48
P. Chauhan et al.
Table 2. Comparison between [8], heuristic score based refinement and dependency analysis based refinement for smaller circuits.

circuit  # regs  ctrex    [8]     Heuristic Score        Dependency
                 length   time    time   iters  # regs   time   iters  # regs
IU30         30  11         6.5    2.3       2      27    1.9       4      20
IU35         35  20          11    8.9       2      27   10.4       5      21
IU40         40  20        16.1   28.4       3      32   13.3       6      22
IU45         45  20        22.1   32.9       3      32     25       6      22
IU50         50  20        85.1     36       3      32   32.8       6      22
IU55         55  11       130.5     43       2      27   61.9       4      20
IU60         60  11       153.4   52.8       2      27   65.5       4      20
IU65         65  11       167.7   50.3       2      27   67.5       4      20
IU70         70  11       167.1   55.6       2      27   71.4       4      20
IU75         75  11               38.5       4      37   15.7       5      21
IU80         80  11               47.1       4      37   21.1       5      21
IU85         85  11               44.7       4      37   24.6       5      21
IU90         90  11               49.9       4      37   24.3       5      21
We can see that our conflict dependency analysis based method outperforms a standard BDD-based model checker, the method reported in [8], and the heuristic score based method. We also conclude that the computational overhead of our dependency analysis based method is well justified by the smaller abstractions it produces. The variable scoring based method does not enjoy the benefits of the reduced candidate refinement sets obtained through dependency analysis; therefore, it results in a coarser abstraction in general. The heuristic based refinement method adds 5 registers at a time, resulting in some uniformity in the final number of registers, especially evident in Table 2. Due to the smaller number of refinement steps it performs, the total time it spends in model checking abstract machines may be smaller (as for D5, D6, D20, IU60, IU65, IU70).
6 Related Work
Our work compares most closely to that presented in [6] and, more recently, [8]. There are three major differences between our work and [6]. First, their initial abstraction is based on predicate abstraction, where a new set of program variables is generated to represent various predicates. They symbolically generate and manipulate these abstractions with BDDs. Our abstraction is based on hiding certain parts of the circuit, which yields an easier way to generate abstractions. Secondly, the biggest bottleneck in their method is the use of BDD-based image computations on concrete systems for validating counterexamples. We use symbolic simulation based on SAT to accomplish this task, as in [8]. Finally, their refinement is based on splitting the variable domains. The problem of finding the coarsest refinement is shown to be NP-hard in [6]. Because our abstraction functions are simpler, we can identify refinement variables during the SAT
Automated Abstraction Refinement
49
checking phase. We do not need to solve any other problem for refinement. We differ from [8] in three aspects. First, we propose to remove invisible variables from abstract systems on the fly by quantification. This reduces the complexity of BDD-based model checking of abstract systems; leaving a large number of input variables in the system makes it very difficult to model check even an abstract system [19]. Secondly, the computational overhead of our separation heuristics is minimal. In their approach, refinement is done by separating dead-end and bad states (sets of concrete states contained in the failure state) with ILP solvers or machine learning. This requires enumerating all dead-end and bad states, or producing samples of these states and separating them. We avoid this step altogether and cheaply identify refinement variables from the analysis of a single SAT check that is already done. We do not claim any optimality on the number of variables; however, this is a small price to pay for efficiency. We have been able to handle a circuit with about 5000 variables in the cone of influence of the specification. Finally, we believe our method can identify a better set of invisible registers for refinement. Although [8] uses optimization algorithms to minimize the number of registers to refine, their algorithm relies on sampling to provide the candidate separation sets. When the size of the problem becomes large, there can be many possible separation sets. Our method is based on SAT conflict analysis. The Boolean constraint propagation (BCP) algorithm in a SAT solver naturally limits the number of candidates that we need to consider, and we use conflict dependency analysis to reduce the number of candidates for refinement further. The work of [10] focuses on algorithms to refine an approximate abstract transition relation.
Given a spurious abstract transition, they combine a theorem prover with a greedy strategy to enumerate the part of the abstract transition that has no corresponding concrete transitions. The identified bad transition is removed from the current abstract model for refinement. Their enumeration technique is potentially expensive. More importantly, they do not address the problem of how to refine abstract predicates. Previous work on abstraction by making variables invisible includes the localization reduction of Kurshan [13] and other techniques (e.g. [1,14]). Localization reduction begins with the set of variables in the property as the visible variables. The set of variables adjacent to the present set of visible variables in the variable dependency graph is chosen as the set of candidates for refinement, and counterexamples are analyzed in order to choose variables among these candidates. The work presented in [19] combines three different engines (BDD, ATPG and simulation) to handle large circuits using abstraction and refinement. The main difference between our method and that in [19] is the strategy for refinement. In [19], the candidates for refinement are those invisible registers that get assigned in the abstract counterexample. In our approach, we intentionally throw away invisible registers in the abstract counterexample and rely on our SAT conflict analysis to select the candidates. We believe there are two advantages to disallowing invisible registers in the abstract counterexample. First of all, generating an abstract counterexample is computationally expensive when the number of invisible registers is large. In fact, for efficiency reasons, a BDD/ATPG hybrid engine is used in [19] to model check the abstract model. By quantifying
the invisible variables early, we avoid this bottleneck. More importantly, in [19] the invisible registers are free inputs in the abstract model, so their values are totally unconstrained. When such an abstract counterexample is checked on the concrete machine, it is more likely to be spurious. In our case, the abstract counterexample only includes assignments to the visible registers, and hence a real counterexample can be found more cheaply.
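The refinement loop compared throughout this section follows the same overall shape in [6,8,19] and in our approach. The following Python sketch is illustrative only: the helpers `model_check`, `check_concrete`, and `pick_refinement` are hypothetical stand-ins for a BDD-based model checker, a SAT-based simulation of the counterexample on the concrete machine, and a refinement heuristic (variable scoring or conflict dependency analysis), respectively.

```python
def cegar(visible, model_check, check_concrete, pick_refinement):
    """Generic counterexample-guided abstraction refinement loop (sketch).

    visible         -- initial set of visible registers (e.g. those in the property)
    model_check     -- returns None if the abstract model satisfies the property,
                       otherwise an abstract counterexample
    check_concrete  -- returns True if the abstract counterexample is real
    pick_refinement -- selects invisible registers to make visible
    """
    while True:
        cex = model_check(visible)
        if cex is None:
            return ("property holds", visible)
        if check_concrete(cex):
            return ("real counterexample", cex)
        # Spurious counterexample: refine the abstraction and iterate.
        visible = visible | pick_refinement(cex, visible)

# Toy instantiation (invented): the property holds once {a, b} are visible,
# and every abstract counterexample is spurious.
concrete = {"a", "b", "c"}

def model_check(vis):
    return None if {"a", "b"} <= vis else ("cex", frozenset(vis))

def check_concrete(cex):
    return False

def pick_refinement(cex, vis):
    return {sorted(concrete - vis)[0]}   # add one hidden register at a time

result, final = cegar({"a"}, model_check, check_concrete, pick_refinement)
print(result)   # property holds
```

In our setting, `pick_refinement` would return the registers identified from the conflict analysis of the failed SAT check.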
7 Conclusions
We have presented an effective and practical automatic abstraction refinement framework based on our novel SAT-based conflict analysis. We have described a simple variable scoring heuristic as well as a more elaborate conflict dependency analysis for identifying important variables. Our schemes are able to handle large industrial scale designs. Our work highlights the importance of using SAT-based methods for handling large circuits. We believe these techniques complement bounded model checking in that they enable us to handle true specifications efficiently. An obvious extension of our framework is to handle all ACTL* formulae; we believe this can be done as in [9]. Further experimental evaluation will help us fine-tune our procedures. We can also use circuit structure information to accelerate the SAT-based simulation of counterexamples, for example, by identifying replicated clauses. We are investigating the use of the techniques described in this paper for software verification. We already have a tool for extracting a Boolean program from an ANSI C program by using predicate abstraction.

Acknowledgements. We would like to thank Ofer Strichman for providing us with some of the larger benchmark circuits. We would also like to thank the anonymous reviewers for carefully reading the paper and making useful suggestions.
References

[1] Felice Balarin and Alberto L. Sangiovanni-Vincentelli. An iterative approach to language containment. In Proceedings of CAV’93, pages 29–40, 1993.
[2] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS’99), number 1579 in LNCS, 1999.
[3] Pankaj Chauhan, Edmund M. Clarke, Somesh Jha, Jim Kukula, Tom Shiple, Helmut Veith, and Dong Wang. Non-linear quantification scheduling in image computation. In Proceedings of ICCAD’01, pages 293–298, November 2001.
[4] Pankaj Chauhan, Edmund M. Clarke, Somesh Jha, Jim Kukula, Helmut Veith, and Dong Wang. Using combinatorial optimization methods for quantification scheduling. In Tiziana Margaria and Tom Melham, editors, Proceedings of CHARME’01, volume 2144 of LNCS, pages 293–309, September 2001.
[5] A. Cimatti, E. M. Clarke, F. Giunchiglia, and M. Roveri. NuSMV: A new Symbolic Model Verifier. In N. Halbwachs and D. Peled, editors, Proceedings of the International Conference on Computer-Aided Verification (CAV’99), number 1633 in Lecture Notes in Computer Science, pages 495–499. Springer, July 1999.
[6] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In E. A. Emerson and A. P. Sistla, editors, Proceedings of CAV, volume 1855 of LNCS, pages 154–169, July 2000. [7] E. M. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 2000. [8] Edmund Clarke, Anubhav Gupta, James Kukula, and Ofer Strichman. SAT based abstraction-refinement using ILP and machine learning techniques. In Proceedings of CAV’02, 2002. To appear. [9] Edmund Clarke, Somesh Jha, Yuan Lu, and Helmut Veith. Tree-like counterexamples in model checking. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science (LICS’02), 2002. To appear. [10] Satyaki Das and David Dill. Successive approximation of abstract transition relations. In Proceedings of the 16th Annual IEEE Symposium on Logic in Computer Science (LICS’01), 2001. [11] Shankar G. Govindaraju and David L. Dill. Counterexample-guided choice of projections in approximate symbolic model checking. In Proceedings of ICCAD’00, San Jose, CA, November 2000. [12] P.-H. Ho, T. Shiple, K. Harer, J. Kukula, R. Damiano, V. Bertacco, J. Taylor, and J. Long. Smart simulation using collaborative formal and simulation engines. In Proceedings of ICCAD’00, November 2000. [13] R. Kurshan. Computer-Aided Verification of Co-ordinating Processes: The Automata-Theoretic Approach. Princeton University Press, 1994. [14] J. Lind-Nielsen and H. Andersen. Stepwise CTL model checking of state/event systems. In N. Halbwachs and D. Peled, editors, Proceedings of the International Conference on Computer Aided Verification (CAV’99), 1999. [15] David E. Long. Model checking, abstraction and compositional verification. PhD thesis, Carnegie Mellon University, 1993. CMU-CS-93-178. [16] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the Design Automation Conference (DAC’01), pages 530–535, 2001. 
[17] Abelardo Pardo and Gary D. Hachtel. Incremental CTL model checking using BDD subsetting. In Proceedings of the Design Automation Conference (DAC’98), pages 457–462, June 1998. [18] J. P. Marques Silva and K. A. Sakallah. GRASP: A new search algorithm for satisfiability. Technical Report CSE-TR-292-96, Computer Science and Engineering Division, Department of EECS, Univ. of Michigan, April 1996. [19] Dong Wang, Pei-Hsin Ho, Jiang Long, James Kukula, Yunshan Zhu, Tony Ma, and Robert Damiano. Formal property verification by abstraction refinement with formal, simulation and hybrid engines. In Proceedings of the DAC, pages 35–40, 2001. [20] Hantao Zhang. SATO: An efficient propositional prover. In Proceedings of the Conference on Automated Deduction (CADE’97), pages 272–275, 1997. [21] Lintao Zhang, Conor F. Madigan, Matthew W. Moskewicz, and Sharad Malik. Efficient conflict driven learning in a Boolean satisfiability solver. In Proceedings of ICCAD’01, November 2001.
Simplifying Circuits for Formal Verification Using Parametric Representation

In-Ho Moon1, Hee Hwan Kwak1, James Kukula1, Thomas Shiple2, and Carl Pixley1

1 Synopsys Inc., Hillsboro, OR
2 Synopsys Inc., Grenoble, France
{mooni,hkwak,kukula,shiple,cpixley}@synopsys.com
Abstract. We describe a new method to simplify combinational circuits while preserving the set of all possible values (that is, the range) on the outputs. The method is performed iteratively and on the fly while building BDDs of the circuits. It is composed of three steps: 1) identifying a cut in the circuit, 2) identifying a group of nets within the cut, and 3) replacing the logic driving the group of nets in such a way that the range of values for the entire cut is unchanged and, hence, the range of values on the circuit outputs is unchanged. Thus, we parameterize the circuit in such a way that the range is preserved and the representation is much more efficient than the original circuit. In fact, these replacements are not done in terms of logic gates but in terms of BDDs directly. This is enabled by a new generalized parametric representation algorithm that deals with both input and output variables at the same time. We applied this method to combinational equivalence checking, and the experimental results show that this technique outperforms an existing related method which replaces one logic net at a time. We also prove that the previous method is a special case of ours. This technique can be applied to various other problem domains such as symbolic simulation and image computation in model checking.
1 Introduction
Given a complex Boolean expression that defines a function from an input bit vector to an output bit vector, one can compute, by a variety of methods, the range of output values that the function can generate. This range computation has a variety of applications, such as equivalence checking and model checking. BDDs (Binary Decision Diagrams [4]) and SAT (satisfiability [13,19]) are two major techniques that can be used to perform the computation. In this paper we present a new BDD-based method and describe its use in equivalence checking; however, the new method can also be applied to other areas. The Boolean equivalence checking problem is to determine whether two circuits are equivalent. Typically, the circuits are at different levels of abstraction: one is a reference design and the other is its implementation. Equivalence checking is used intensively in industrial design and is a mature problem. However, there are still many real designs that current state-of-the-art equivalence checking tools cannot verify. BDD-based equivalence checking is trivial if the BDD size does not grow too large, but that is not the case in most real designs. Therefore, cut-based methods [2,14,10] have been used to avoid building huge monolithic BDDs. The cut-based method

M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 52–69, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Simplifying Circuits for Formal Verification
53
introduces free variables for the nets in a cut, causing the false negative problem [2], since we lose the correlations among the cut variables. When the verification result is false, this method has to resolve the possible false negatives by composing the free variables with their original functions. Even though this method has been used successfully, false negative resolution remains very expensive and is infeasible in many real designs. To overcome the false negative problem, Moondanos et al. proposed the normalized function method [18]. Instead of simply introducing a free variable for a net on a cut, the function driving the net is replaced with a simplified function that preserves the range of values on the cut; this simplified function is called a normalized function. However, we have observed that the normalized function is not optimal, and we have generalized it so that it contains no redundant variables, as explained in Section 4. A similar approach to the normalized function has been presented by Cerny and Mauras [6], which uses cross-controllability and cross-observability to compute the range of a cut from the primary inputs, and the reverse range from the primary outputs. Equivalence checking can then be done by checking whether the reverse range covers the range. In this method, once a set of gates is composed to compute the range, the variables feeding only those gates are quantified, just as the fanout-free variables are quantified in the normalized function. However, this method suffers from BDD blowup, since the range computation is expensive and the range of a cut represented by BDDs is very large in general. In this paper we present a new method to simplify circuits while preserving the range of all outputs. The method makes the work of Cerny and Mauras practical and also extends normalized functions to apply to a set of nets in a cut, instead of a single net.
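The false negative problem described above is easy to demonstrate by brute force. In this sketch (the circuit and cut are invented for illustration, not taken from the paper), an implementation of out = a computed through the cut nets x = a∧b and y = a∧¬b becomes inequivalent to its specification once the cut nets are replaced by uncorrelated free variables:

```python
from itertools import product

def equivalent(f, g, nvars):
    """Exhaustively check Boolean equivalence over nvars inputs."""
    return all(f(*v) == g(*v) for v in product([0, 1], repeat=nvars))

# Specification and implementation of the same function out = a.
spec = lambda a, b: a
impl = lambda a, b: (a & b) | (a & (1 - b))
print(equivalent(spec, impl, 2))           # True: truly equivalent

# Cut-based check: replace the cut nets x = a&b and y = a&~b by free
# variables u and w, losing their correlation with a.
cut_impl = lambda a, u, w: u | w
cut_spec = lambda a, u, w: a
print(equivalent(cut_impl, cut_spec, 3))   # False -- a false negative
```

The false negative would be resolved by composing u and w back with their driving functions, which is exactly the expensive step the normalized function method tries to avoid.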
The new method is performed iteratively and on the fly while building BDDs of the circuits, and is composed of three steps: 1) identifying a cut in the circuit, 2) identifying a group of nets within the cut, and 3) replacing the logic driving the group of nets in such a way that the range of values for the entire cut, and hence the range of values on the circuit outputs, is unchanged. We apply the range computation selectively, by first identifying the group to be replaced in step 2) and then estimating the feasibility of and the gain from the computation in step 3). Furthermore, once the range is computed, we do not keep the computed range as Cerny and Mauras do. Instead, we derive a simplified circuit from the range by using a parametric representation [1,11]. We also prove that the normalized function method is a special case of our method. Parametric representation has been used to model the verification environment based on design constraints [1,11]. Various parametric representations of Boolean expressions have been discussed in [5,7,8,9,1,11]. Parametric representation using BDDs was introduced by Coudert et al. [7,8] and improved by Aagaard et al. [1]. The authors of [1] proposed a method to generate the parameterized outputs as BDDs from the constraints represented by a single BDD. However, this method can deal only with the output variables of the environment; in other words, the variables do not depend on the states of the design. Kukula and Shiple presented a method that handles output variables as well as input variables that depend on the states of the design [11]. However, this method takes the environment represented by a relation BDD and generates the parameterized outputs as circuits instead of BDDs.
54
I.-H. Moon et al.
In this paper we also present a generalized approach to parametric representation that deals with both input and output variables and generates the parameterized outputs as BDDs. We also identify that the method in [1] is a special case of the one in [11] in the framework of our generalized approach. Combining the range computation and the generalized parametric representation yields more efficient and compact representations of the circuits under verification, so that the circuits can be verified more easily. This approach can be applied not only to equivalence checking but also to symbolic simulation and to image computation. The rest of the paper is organized as follows. Section 2 reviews background material and Section 3 discusses prior work. We present our algorithm to find sets of variables for early quantification in Section 4. Section 5 shows the overall algorithm for equivalence checking and compares ours to the prior work. Section 6 describes a special type of range computation and Section 7 presents our methods for parametric representation. Section 8 shows the relationship between normalization and parameterization. Experimental results are shown in Section 9 and we conclude with Section 10.
2 Preliminaries
Image computation is finding all successor states from a given set of states in one step, and is a key step in model checking of sequential circuits [15,17,16]. Let x and y be the sets of present and next state variables, and let w be the set of primary input variables. Suppose we have a transition relation T(x, w, y) that represents all transitions, being true of just those triples a, b, c such that there is a transition from state a to state c labeled by input b. The image I(y) of a given set of states C(x) is formally defined as

I(y) = Image(T, C) = ∃x,w. T(x, w, y) ∧ C(x) .

Range computation is a special type of image computation in which C(x) is the universe; in other words, it finds all possible successor states in a transition system. The range R(y) is defined as

R(y) = Range(T) = Image(T, 1) = ∃x,w. T(x, w, y) .   (1)

3 Related Work

3.1 Normalization
To overcome the false negative problem in BDD-based equivalence checking, Moondanos et al. proposed a normalization method [18]. The authors split the set of input variables of the current cut into N and R. N is the set of fanout-free variables, in other words, the variables feeding only one net in the cut. R is the set of fanout variables that fan out to more than one net in the cut. Then, the function F of a net can be simplified without causing false negatives by using its normalized function, which preserves the range of the cut. To make the normalized function of F, the possible term Fp and the forced term Ff of F are defined as below.

Fp(R) = ∃N. F(R, N)
Ff(R) = ∀N. F(R, N)
Then the normalized function Fnorm is defined by

Fnorm = (v ∧ Fp) ∨ Ff = (v ∧ ∃N. F(R, N)) ∨ ∀N. F(R, N) ,   (2)

where v is a newly introduced eigenvariable.

3.2 Parameterization with Output Variables
Parametric representation using BDDs was introduced by Coudert et al. [7,8] and improved by Aagaard et al. [1]. The authors of [1] used the parametric representation to model the verification environment from the input constraints of the design under verification. Thus only output variables of the environment are considered, since there is no constraint relating the states of the design. The basic idea is that each variable is parameterized according to three cases for each path from the root to a leaf of the constraint BDD; this operation is performed implicitly by calling Param recursively and by using a cache. The three cases are: 1) the positive cofactor of a node is empty, 2) the negative cofactor is empty, and 3) both cofactors are non-zero; BDD ZERO, BDD ONE, or a parametric variable is assigned in each case, respectively. Then the sub-results from the two children of a branch are merged by bdd_ite operations from bottom to top.

3.3 Parameterization with Input and Output Variables
Kukula and Shiple proposed a method of parametric representation that deals with the input and output variables of the verification environment [11]. The input variables depend on the states of the design under verification. This method generates circuits from the BDD relation representing the environment. The conceptual procedure consists of three phases, as follows.
– Phase 1 (DFS): Find all paths to constant 1 for each child of each node, through bottom-up traversal from the leaf node to the root node.
– Phase 2 (DFS): Propagate signals from the root node to the leaf node to activate a single path from root to leaf.
– Phase 3: Compute the circuit output for each output variable.
This method uses two template modules: one for input variables and the other for output variables. Using the template modules, the parameterized output circuit is generated by the following procedure.
1. Replace all BDD nodes for input and output variables with the pre-defined input and output template modules, respectively.
2. Connect the pins of the modules through Phases 1 and 2.
3. Produce outputs using a mux for each output variable.
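As a rough illustration of the three-case parameterization of [1,7,8] (not their actual BDD implementation), the following Python sketch walks the variables in order, examines the cofactors of the remaining constraint semantically instead of on BDD nodes, and checks that the range of the parameterized assignment equals the satisfying set of the constraint. The helper names are invented for this sketch.

```python
from itertools import product

def param_assign(constraint, names, params):
    """Walk the variables in order; at each one, inspect the cofactors of the
    remaining constraint (cases 1-3 of the Param recursion) to pick a value."""
    def sat(f, n):                        # is f satisfiable over n variables?
        return any(f(*v) for v in product([0, 1], repeat=n))
    out = []
    f = constraint
    for i, _ in enumerate(names):
        rest = len(names) - i - 1
        hi = lambda *v, f=f: f(1, *v)     # positive cofactor
        lo = lambda *v, f=f: f(0, *v)     # negative cofactor
        if not sat(hi, rest):
            val = 0                       # case 1: positive cofactor empty
        elif not sat(lo, rest):
            val = 1                       # case 2: negative cofactor empty
        else:
            val = params[i]               # case 3: free -- use a parametric var
        out.append(val)
        f = hi if val else lo
    return tuple(out)

# Constraint: x0 implies x1, i.e. not(x0) or x1.
c = lambda x0, x1: (1 - x0) | x1
sols = {v for v in product([0, 1], repeat=2) if c(*v)}
ranged = {param_assign(c, ["x0", "x1"], p) for p in product([0, 1], repeat=2)}
print(ranged == sols)   # True: the parameterization covers exactly the constraint
```

A real implementation performs the same case analysis on BDD nodes with memoization, so each node is visited once rather than re-enumerating cofactors.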
3.4 Cross-Controllability and Cross-Observability Relations
Cerny and Mauras have used cross-controllability and cross-observability for equivalence checking [6]. Suppose we make a cut in a circuit containing two outputs of the implementation and the specification, namely yI and yS, respectively. Let x be the set of input variables, and let u and v be the sets of cut variables in the implementation and the specification, respectively. We first compute I1, the relation between u and x, and S1, the relation between v and x. Cross-controllability is then defined as

Cross-controllability(u, v) = ∃x. (I1(u, x) ∧ S1(v, x)) .

We can see that the cross-controllability is the range of the cut. Similarly, we compute I2, the relation between yI and u, and S2, the relation between yS and v. Cross-observability is then defined as

Cross-observability(u, v) = ∃y. (I2(u, y) ∧ S2(v, y)) .

We can also see that the cross-observability is the reverse range of the two outputs in terms of u and v. Equivalence checking can then be done by

Cross-controllability(u, v) ≤ Cross-observability(u, v) .   (3)

The authors proposed three different checking strategies, one of which is the forward sweep. In this strategy, the cut is placed at the primary outputs and the cross-controllability of the cut is computed by composing gates iteratively from the inputs to the outputs, in such a way as to eliminate any local variables that feed only some of the gates to be composed. When all gates are composed, Equation 3 is applied with trivial cross-observability.
4 Definition of K and Q Sets
In this section, we start with an example showing that the method in Section 3.1 can introduce redundant variables. Then we define the set of variables we can quantify early so as not to have those redundant variables when simplifying the functions in a given cut. Furthermore, we extend the definition to handle a group in the cut. Consider two functions f and g in terms of variables a, b, and c:

f = (a ∧ ¬b) ∨ (¬a ∧ b)
g = a ∧ c

Then, following the normalization method, R becomes {a} and N becomes {b, c}, and the normalized functions for f and g are as below.

fnorm = v1
gnorm = a ∧ v2

In this example, it is easy to see that the variable a is redundant in gnorm, since the variable a occurs only in gnorm. In fact, the range of {f, g} is tautologous. So gnorm
could be just v2, which is optimum. This is because, even though the variable a fans out to both f and g, the effect of the signal a on f is blocked by the signal b, which is non-reconvergent. Therefore, we can move the signal a into N in this case, so that we can quantify even a. Now we formally define K and Q for a cut. K is the set of variables to keep in the simplified functions and Q is the set of variables to quantify out.
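The claims in this example can be checked by brute-force enumeration. The sketch below is illustrative only (the paper computes these ranges with BDDs); it verifies that the range of {f, g} is tautologous, that the normalized functions preserve it, and that replacing gnorm by v2 alone also preserves it:

```python
from itertools import product

def rng(*fns, n):
    """Range of a vector of Boolean functions: all output combinations."""
    return {tuple(fn(*v) for fn in fns) for v in product([0, 1], repeat=n)}

f = lambda a, b, c: (a & (1 - b)) | ((1 - a) & b)   # f = a XOR b
g = lambda a, b, c: a & c

# The range of {f, g} is tautologous: every output pair is achievable.
print(rng(f, g, n=3))   # all four pairs

# Normalized functions from Section 3.1 (R = {a}, N = {b, c}), over (a, v1, v2):
fnorm = lambda a, v1, v2: v1
gnorm = lambda a, v1, v2: a & v2
print(rng(fnorm, gnorm, n=3) == rng(f, g, n=3))     # True: range preserved

# But a is redundant in gnorm -- the optimal replacement is just v2:
gopt = lambda a, v1, v2: v2
print(rng(fnorm, gopt, n=3) == rng(f, g, n=3))      # True: still preserved
```

This is exactly the redundancy that the K and Q sets are designed to eliminate: a can safely be quantified out.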
Fig. 8. Refined Quaternary Simulation for the FIFO
Now, let us turn to the canonization algorithm. The algorithm canonizes an extended quaternary assignment in two phases, assuming a total ordering among precise nodes. Given a quaternary assignment, the first phase of the algorithm extracts the list of precise node and symbolic value pairs, ordered by the precise nodes, and generates the unique parametric representation for the list without building the intermediate state predicate. It also returns a Boolean relation between the unique set of parametric variables generated from the precise nodes and the symbolic variables in the original list. The entire algorithm is listed in Figure 9, where the input old is a vector of precise node and symbolic value pairs such that all symbolic variables in the list have been properly renamed to avoid overlapping with the unique set of parametric variables, and depend(f) is a function that returns the set of symbolic variables in f. It should be pointed out that if only new needed to be computed, the algorithm could be significantly improved by existentially quantifying out variables from old as soon as they were no longer needed. However, in order not to lose the information implied by the precise nodes on the non-precise nodes, it is critical to compute this relation. The second phase of the canonization algorithm normalizes the symbolic quaternary value for every other circuit node in the quaternary assignment. The value of a non-precise node in the original quaternary assignment may depend on the constraint among precise nodes captured by the original symbolic variables. Such a dependency relation is maintained in the canonized assignment by replacing the original constraint with the equivalent constraint over the set of unique parametric variables.
82
J. Yang and C.-J.H. Seger

Algorithm: UniqueParam(old[1:n])
 1. Rel := true;
 2. Z := depend(old[1:n]);
 3. for i := 1 to n
 4.     (Node, Value) := old[i];
 5.     H := ∃Z. Rel ∧ Value;
 6.     L := ∃Z. Rel ∧ ¬Value;
 7.     Value' := (UniqVar(Node) → H) ∧ (¬UniqVar(Node) → ¬L);
 8.     new[i] := (Node, Value');
 9.     Rel := Rel ∧ (Value' = Value);
    endfor
10. return (new[1:n], Rel);
end.

Fig. 9. Unique Parameterization
Although the model refinement approach might be computationally more expensive than the specification refinement approach, it has many advantages. First, deep knowledge about the circuit behavior is not required; what is required is to identify the set of precise nodes in the circuit. This is a considerably simpler task and can be aided by the debugging capability in GSTE. Also, heuristics exist for identifying precise nodes, e.g., all control nodes may be good candidates for precise nodes. Second, the graph does not need to be unfolded to match the implementation, a process which can sometimes be very tedious. Last, the regression becomes easier to maintain, since an internal change in the circuit may require very little change in the specification. Nevertheless, the key to successful verification is to make the best trade-off between specification refinement and model refinement, and between model checking efficiency and accuracy.
5 Model Checking with Backward Simulation
In this section, we show that in GSTE we can also use a more sophisticated model checking algorithm as yet another weapon against over-abstraction. More specifically, we will illustrate that by using backward simulation, i.e., model checking using normal satisfiability, a problem can get significantly simplified. In order to illustrate the approach, we will use the very simple 8-bit counter circuit shown in Figure 10 (a). The property we want to verify is that, after the circuit has been properly reset, the output outb is the complement of the output out. In Figure 10 (b) we give a typical GSTE assertion stating this property. Intuitively, after a reset and some arbitrary number of clock cycles later, if the output out has some value O, then outb should be !O. O is a vector of symbolic constants used to encode the 256 different values that out can take on. The difficulty with this verification is that we need to maintain the relation between outb and out. We could split the assertion graph and effectively have a vertex for every value of the counter; however, for large registers, this is clearly not practical. On the other hand, we could simply make the nodes in out and outb precise; however, this would lead to a large number of variables if the registers are large. Although both approaches are feasible for the 8-bit wide register shown in Figure 10 (a), there is in
Generalized Symbolic Trajectory Evaluation — Abstraction in Action

Fig. 10. Dual-rail counter: (a) circuit, (b) assertion graph
fact a more elegant solution that requires neither a change to the assertion graph nor extra variables. The idea is to use the normal-satisfaction model checking algorithm to effectively move the output assumption onto the internal signals of the circuit. As mentioned in Section 2, the model checking algorithm for normal satisfaction is a two-phase process. In the first phase, a pre-image fixpoint computation strengthens earlier antecedents in the assertion graph by propagating later antecedent constraints backwards. By using this algorithm, we "pull" the value on out back to the inverter input mid, which is then propagated forward during the second phase of the model checking algorithm to the node outb. In Figure 11 we show the strengthened assertion graph obtained after the first phase of the model checking algorithm. Note that we did not assume here that the backward propagation was able to pull the value O back through the adder. For this to be possible, the state-holding register out would almost certainly have to have been made precise. However, for this verification, it is sufficient to pull the value O back through the flip-flop.
[Figure: the assertion graph of Fig. 10 (b) with the self-loop antecedent on loop strengthened to !reset&!clk&mid=O[7:0] / true; the remaining edges (reset&!clk / true on init, clk / true, !reset&!clk / true, and clk&out=O[7:0] / outb=!O[7:0] into done) are unchanged.]
Fig. 11. Assertion graph after backward strengthening phase.
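The backward strengthening phase can be sketched on an explicit-state model. The following is a greatly simplified, hypothetical illustration (GSTE itself operates on symbolic abstract state sets, and the helper names here are ours): each edge's antecedent is repeatedly intersected with the pre-image of the union of its successor edges' antecedents, until a fixpoint is reached.

```python
def pre(trans, target):
    """Circuit states with at least one successor in `target` (pre-image)."""
    return {s for s, succs in trans.items() if succs & target}

def backward_strengthen(edges, ant, trans):
    """Fixpoint strengthening of antecedents (explicit-state sketch).

    edges: maps each assertion-graph edge to its set of successor edges.
    ant:   maps each edge to the set of circuit states satisfying its antecedent.
    """
    ant = {e: set(states) for e, states in ant.items()}
    changed = True
    while changed:
        changed = False
        for e, succs in edges.items():
            if not succs:
                continue  # terminal edges keep their antecedents
            target = set().union(*(ant[f] for f in succs))
            strengthened = ant[e] & pre(trans, target)
            if strengthened != ant[e]:
                ant[e], changed = strengthened, True
    return ant
```

On the counter example, this is the mechanism that pulls the constraint on out back through the flip-flop to the internal node mid.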
In our final example, we turn our focus to the verification of a high-level property of our FIFO circuit. Although much less complete than our earlier FIFO specifications, the property illustrates the need for a combination of the techniques introduced in this paper. The property we would like to establish states simply: if, after a reset, a value D is never enqueued, then the value D cannot be dequeued. A natural assertion graph for the property is shown in Figure 12 (a). As an example, let us consider verifying this property against the stationary FIFO implementation in Figure 6. On the self loop (loop, loop), the antecedent can be converted into a parametric form: [enq = z, din = (if z then param(d[9:0] != D[9:0], d[9:0]) else X[9:0])]
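The effect of param can be illustrated with an explicit stand-in for the symbolic parametric encoding (a hypothetical sketch; the real encoding produces a parametric symbolic vector rather than an enumerated set): param ranges exactly over the values of d satisfying the predicate, so din covers all values different from D.

```python
def param(pred, width):
    """Explicit stand-in for the parametric encoding: the set of values of a
    `width`-bit variable that satisfy `pred`."""
    return {d for d in range(2 ** width) if pred(d)}

# For a 4-bit data input and an excluded value D, the parametric form
# covers exactly the 15 values different from D.
D = 0b1010
allowed = param(lambda d: d != D, 4)
assert D not in allowed and len(allowed) == 15
```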
J. Yang and C.-J.H. Seger

[Figure: Fig. 12 (a), the original assertion graph, with vertices init, loop, and done, and edges labeled reset / true, enq -> din!=D[9:0] / true, and !empty&deq&dout=E[9:0] / E[9:0]!=D[9:0].]

Table 1. Conventions used to label the nodes in Figs. 2–5
label  meaning
x      has self loop, belongs to no fairness constraints
n      has self loop, belongs to all fairness constraints except the nth
·      trivial SCC, belongs to no fairness constraints
o      trivial SCC, belongs to all fairness constraints
F      has self loop, belongs to all fairness constraints

4
Complexity of GSH
Following [1], we measure the cost of GSH in steps, the number of EX and EY operations applied to a nonempty set. We express the cost in terms of the number of fairness constraints |C|, the diameter of the graph d, the height h of the SCC quotient graph (i.e., the length of its longest paths), and the numbers of SCCs (N) and nontrivial SCCs (N′). Since d, h, and N are often much smaller than n, this analysis provides more practical bounds than using the number of states. In [1, Theorem 1] it was shown that EL takes Θ(|C|dh) steps. In this section we extend this result to account for the flexibility in the scheduling of operators that characterizes the GSH algorithm. Throughout the analysis, we shall analyze worst-case behavior: we pick graphs that are hard for the algorithms. In Sections 4.1 and 4.2, we shall look at how many steps are needed if the scheduler chooses operations badly. In Section 4.3, we shall study how many steps are needed if the scheduler makes optimal decisions. The conventions used to label the nodes in Figs. 2–5 are shown in Table 1. To avoid clutter, the arcs controlling the diameter are not shown, but rather described in the captions.
4.1
Bounds for Unrestricted Schedules
Theorem 5. GSH takes O(|C|dN) steps.
Proof. Let t = |TB| = 2|C| + 2 be the number of operators applied by GSH. Clearly, O(t) = O(|C|). We must make progress at least once every t iterations, because otherwise all operators have been applied without progress and the algorithm terminates. Each operator application costs O(d) steps. Hence, we do O(|C|d) work between two advances toward the fixpoint. The number of times we make progress is O(N); hence the desired bound. To show that the number of times we make progress is O(N), we argue as follows. The initial Z is SCC-closed, because it is either V or the reachable subset of V. When any of the operators in TB is applied, SCC-closedness is preserved. In particular, this holds for EU and ES because Z is (inductively) SCC-closed. Indeed, if v ∈ Z has a path to Z′ ⊆ Z entirely within Z, then the SCC of v is contained in Z; hence, every state in the SCC of v has a path to Z′ entirely within Z. The result is therefore SCC-closed, and the set of dropped states, which is the difference of two SCC-closed sets, is also SCC-closed. In summary, when there is progress, Z loses an integral number of unfair SCCs that is greater than or equal to 1. Thus, progress cannot occur more than N times.
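For intuition, the hull computation analyzed by Theorem 5 can be sketched on an explicit graph. This is a simplified, set-based rendering with only EX and future-tense EU operators (the names are ours, not the paper's): Z shrinks monotonically, and every application that changes Z drops at least one SCC, which is the counting argument in the proof.

```python
def ex(succ, Z):
    """States in Z with a successor in Z (one application of EX)."""
    return {v for v in Z if succ[v] & Z}

def eu(succ, Z, target):
    """States in Z that reach `target` along a path inside Z (E[Z U target])."""
    reach = set(target) & Z
    frontier = set(reach)
    while frontier:
        frontier = {v for v in Z - reach if succ[v] & frontier}
        reach |= frontier
    return reach

def gsh(succ, states, fair_sets):
    """Shrink Z to a fixpoint of all operators; empty Z means no fair cycle."""
    Z = set(states)
    changed = True
    while changed:
        changed = False
        ops = [lambda S: ex(succ, S)] + \
              [lambda S, c=c: eu(succ, S, S & c) for c in fair_sets]
        for op in ops:
            Znew = op(Z)
            if Znew != Z:
                Z, changed = Znew, True
    return Z
```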
F. Somenzi, K. Ravi, and R. Bloem

[Figure: r rows of SCCs with labels 0, 1, 2, … (labeling conventions as in Table 1).]
Fig. 2. Graph showing that GSH is Ω(|C|dN). Not shown are the arcs from any node with a label different from 0 to all the nodes to its right
The bound of Theorem 5 is tight in the following sense.
Theorem 6. There exists a family of graphs and a corresponding family of schedules such that GSH takes Ω(|C|dN) steps.
Proof. Consider the family of graphs parameterized by r that is exemplified by Fig. 2. A graph in the family has r rows, each of which consists of 4r nontrivial SCCs. Hence, there are N = 4r² SCCs, and |C| = r + 1 acceptance conditions; the height of the SCC graph is 4r, and the diameter is d = 2r + 1. Let U = {EUi | 1 ≤ i ≤ |C|}. We consider the following schedule:
– All elements of U \ {EU3} in decreasing index order r times, followed by EU3.
– All elements of U \ {EU4} in decreasing index order r − 1 times, followed by EU4.
– ...
– All elements of U \ {EU|C|} in decreasing index order twice, followed by EU|C|.
– All elements of U \ {EU|C|} in decreasing index order once.
We now count the steps. The first series of subsequences takes (h/4) O(|C|d) steps. The second series takes (h/4 − 1) O(|C|d) steps, and so on. The total number of steps is therefore Ω(|C|dh²), which is also Ω(|C|dN).
4.2
Bounds for Restricted Schedules
If we strengthen the assumption about pick, we can prove an O(|C|dh + N − N′) bound. (N − N′ is the number of trivial SCCs.) The additional assumption is that the computation is performed in passes. We shall show that this bound is tight for EL2, but not for EL.
Definition 1. A pass is a sequence over TB that satisfies the constraints imposed by GSH, and such that:
1. No EUi or ESi appears more than once.
2. Either all operators in TF or all operators in TP appear.
Having thus divided the computation into passes, we can use reasoning similar to that of [1, Theorem 1].
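Definition 1 can be made concrete with a small checker. The sketch below uses a hypothetical string encoding of the operators and checks only the two conditions of the definition (the ordering constraints imposed by GSH itself are not modeled):

```python
def is_pass(seq, num_constraints):
    """Check the two conditions of Definition 1 on an operator sequence.

    Operators are encoded as strings: 'EX', 'EY', 'EU1', ..., 'ES1', ...
    """
    eus = [op for op in seq if op.startswith('EU')]
    ess = [op for op in seq if op.startswith('ES')]
    if len(eus) != len(set(eus)) or len(ess) != len(set(ess)):
        return False  # some EUi or ESi appears more than once
    tf = {'EX'} | {f'EU{i}' for i in range(1, num_constraints + 1)}
    tp = {'EY'} | {f'ES{i}' for i in range(1, num_constraints + 1)}
    return tf <= set(seq) or tp <= set(seq)
```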
Analysis of Symbolic SCC Hull Algorithms
Table 2. Schedules and tenses. The algorithms are classified according to the mix of operators (EX or EY, and EU or ES). Within these categories, they differ by tense.

              EL    EL2
future-tense  [4]   [7,5]
past-tense          [7,8]
both tenses         [7,6]
Theorem 7. If the operator schedule can be divided into passes, GSH takes O(|C|dh + N − N′) steps.
Proof. A pass in which all EU operators and at least one EX have been applied removes all the terminal unfair SCCs present at the beginning of the pass. Likewise, a pass in which all ES operators and at least one EY have been applied removes all the minimal unfair SCCs present at the beginning of the pass. Then, by induction on h, we can prove that we cannot have more than h passes of either type, for a total of 2h passes. Each pass may contain more than one EX or EY. We charge their cost separately, and we argue that the total cost of the successful applications is O(N − N′), because each extra EX or EY removes a trivial SCC. The cost of the unsuccessful applications is dominated by the cost of the EUs and ESs, which is O(|C|d).
The algorithms of Table 2 all satisfy the restricted scheduling policy (GSH can implement a simplification of Kesten's algorithm that disregards the issues of Streett emptiness), and are therefore O(|C|dh + N − N′). N − N′ is the linear penalty discussed in [5]. Though this penalty does not alter the complexity bounds in terms of the number of states n, it cannot be ignored when the bounds are given in terms of |C|, d, and h. Consider the following family G_{r,s,f} of graphs. Here, r is the number of rows, 0 < s < 2r determines the diameter, and f is the number of fairness conditions. (Shown in Fig. 3 is G_{3,2,2}.) For this family of graphs, d = s + 2, |C| = f, and h = 4r − 1. We consider the EL2 schedule. The future-tense version of EL2 applies EU1 through EUf followed by EG until convergence. The first application of EU1 through EUf removes the f rightmost nontrivial SCCs of each row. The successive EG removes what is left of the first row, and the rightmost trivial SCC of all the other rows. The second round of EUs again removes the f rightmost nontrivial SCCs of each surviving row. EG then removes the second row entirely and the rightmost trivial SCC of each other row. We need a total of r passes to converge. Each pass costs |C|(s + 3) + 2r + 2. The |C|(s + 3) term is for the EUs and the 2r + 2 term is for the EG. Hence, the total cost of EL2 is (|C|(s + 3) + 2r + 2)r, which is not O(|C|dh) = O(|C|sr). So, even though EL2 may beat EL on specific examples, EL's O(|C|dh) bound is better. For EL2 we have the following lower bound, which is a special case of our previous observation about schedules that can be divided into passes.
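The cost expressions derived above are easy to check numerically. The helper below is our own back-of-envelope sketch, using d = s + 2, |C| = f, and h = 4r − 1 for the family G_{r,s,f}: with s and f fixed, the ratio of the EL2 cost to |C|dh grows with r, as the analysis predicts.

```python
def el2_cost(r, s, f):
    """Total EL2 cost (|C|(s + 3) + 2r + 2) * r on G_{r,s,f}."""
    return (f * (s + 3) + 2 * r + 2) * r

def cdh(r, s, f):
    """The reference quantity |C| * d * h with d = s + 2, h = 4r - 1."""
    return f * (s + 2) * (4 * r - 1)

# The ratio grows with r, so the EL2 cost is not O(|C|dh).
ratios = [el2_cost(r, 2, 2) / cdh(r, 2, 2) for r in (10, 100, 1000)]
assert ratios[0] < ratios[1] < ratios[2]
```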
[Figure: the graph G_{3,2,2}, built from rows of x, o, and numbered nodes (labels as in Table 1).]
Fig. 3. Graph G_{3,2,2} showing that EL2 is Θ(|C|dh + N − N′). Not shown are the arcs from each node of type o or n to every node to its right on the same row, and from every node of type x, but the first s, to each x to its right and to the first o node on the row
Theorem 8. Algorithm GSH with schedule EL2 runs in Θ(|C|dh + N − N′) steps.
Proof. EL2 is O(|C|dh + N − N′) thanks to Theorem 7. To show that EL2 is Ω(|C|dh + N − N′) we resort to the family of graphs G_{r,s,f} we used to show that EL2 is not O(|C|dh). We counted (|C|(s + 3) + 2r + 2)r steps for EL2. Since N − N′ = 5r²/2 + r/2, |C|dh + N − N′ = |C|(s + 2)(4r − 1) + 5r²/2 + r/2, so (|C|(s + 3) + 2r + 2)r is Ω(|C|dh + N − N′).
A similar analysis can be carried out for the variant of EL2 that uses both tenses (HH). In particular, for the upper bound Theorem 7 applies. For the lower bound, one can take the family of graphs we used for EL2, add a fair SCC at the beginning of each row, and then mirror each row to the left. On the other hand, every schedule divided into passes in which the cost of applying EX and EY in each pass is dominated by the cost of applying the EU and ES operators shares the (optimal) bounds of EL.
4.3
Bounds for Optimal Schedules
Theorem 6 is concerned with how badly things can go if the schedule is not well matched to the graph at hand. It is also interesting to consider what an optimal schedule can do. To this purpose, we provide GSH with an oracle that computes an optimal schedule, and we call the resulting method OSH.
Theorem 9. OSH takes Θ(|C|dh) steps.
Proof. For the upper bound we rely on [1, Theorem 1], which shows that EL is O(|C|dh). For the lower bound, we use the example of Fig. 4. In the graph shown, |C| = 3. The diameter is determined by the number of x nodes. Assume there are at least as many o nodes as there are x nodes. (This assumption guarantees that the number of o nodes, which determines the number of "rounds" of EUs or ESs, is Ω(h).)
[Figure: chains of x, o, and numbered nodes on either side of a node F (labels as in Table 1).]
Fig. 4. Graph showing that OSH is Ω(|C|dh). The o and n nodes have arcs forward to the other o and n nodes on the same side of F
OSH takes Ω(|C|dh) steps on this family of graphs. The cost of an EU or ES does not change until some x nodes are removed. At that point, the optimal schedule simply removes the remaining exposed x nodes to reach convergence. Hence, in this case, a unidirectional schedule is optimal. To fix ideas, suppose we use a future-tense schedule. Initially, we can only make progress by applying one EU. After that, we need to apply all remaining EUs before the rightmost o is exposed. At that point we can only make progress by applying EX. Therefore, we need to apply all EUs and one EX Ω(h) times. (Here is where we use the assumption about the number of o nodes.) The number of EUs is thus Ω(|C|h) and their cost is Ω(|C|dh).
A consequence of Theorem 9 is that EL is an optimal schedule. This optimality has its limits: it depends on our choice of measures, and there are graphs on which other schedules need fewer steps.
Corollary 2. If cost is measured in terms of steps, and expressed in terms of |C|, d, h, N, and N′, there is no schedule of GSH that has better worst-case asymptotic complexity than EL.
4.4
Bidirectional vs. Unidirectional Schedules
We conclude our analysis of GSH with a discussion of the advantages and disadvantages of schedules that use all four types of operators relative to schedules that use only past-tense operators, or only future-tense operators. The proof of Theorem 7 suggests that trimming from both sides may take more steps than trimming from one side only, because if we work from one side only, we need at most h passes instead of 2h. Occasionally, though, working from both sides will speed things up, especially when there are no fair SCCs. (As noted in [6].) One reason is that a search in one direction can reduce the diameter of the graph, which helps the search in the other direction. The following example illustrates this point.
Example 2. Consider the family of graphs exemplified in Fig. 5. The arcs out of the x nodes all go to the direct neighbor to the right, while the remaining nodes form a complete acyclic subgraph. In the example graph, d = 12 and |C| = 3. OSH first applies an EH at a cost of d/2; it then applies d/2 EUs. Each EU costs 2 steps; so the total cost is Θ(d) steps for the EUs and O(d) steps total. Any purely future-tense or purely past-tense algorithm needs to apply d/2 EUs too, but this time every one costs Ω(d) steps, giving quadratic behavior. Note that an EG does not help.
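The cost claim of Example 2 can be checked with a back-of-envelope model (our own sketch of the step counts, not a measurement): the bidirectional schedule pays d/2 for one EH plus 2 steps per EU, while a unidirectional schedule pays on the order of d steps for each of its d/2 EUs.

```python
def bidirectional_cost(d):
    """Model of OSH on Fig. 5: one EH of cost d/2, then d/2 EUs of cost 2."""
    return d // 2 + 2 * (d // 2)

def unidirectional_cost(d):
    """Model of a one-direction schedule: d/2 EUs, each costing about d."""
    return sum(d for _ in range(d // 2))

# Growing d by 10x grows the first cost 10x (linear) and the second 100x.
assert bidirectional_cost(1000) / bidirectional_cost(100) == 10
assert unidirectional_cost(1000) / unidirectional_cost(100) == 100
```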
[Figure: a chain of twelve x nodes followed by numbered nodes around a node F.]
Fig. 5. Graph illustrating the possible advantages of bidirectional schedules
The preceding example shows that some bidirectional schedules may outperform all unidirectional ones. Obviously, there are even more cases in which bidirectional schedules outperform schedules in one direction, but not in the other.
5
Early Termination in Lockstep
Lockstep is a symbolic cycle detection algorithm based on SCC enumeration [1]. Given a seed v, the SCC containing v is computed as the intersection of the set β(v) of states that can reach v nontrivially and the set φ(v) of states that are nontrivially reachable from v. If there are short cycles involving the seed state v, the intersection of β(v) and φ(v) may be non-empty well before the two sets have been computed in their entirety. This suggests an early termination criterion for the algorithm.
Theorem 10. Let F(v) and B(v) be the subsets of φ(v) and β(v) computed by Lockstep at some iteration. Let I(v) = F(v) ∩ B(v) and U(v) = F(v) ∪ B(v). If I(v) has a non-null intersection with all fair sets, then U(v) contains a fair cycle.
Proof. Every state in B(v) has a non-trivial path to v entirely contained in B(v). Every state in F(v) has a non-trivial path from v entirely contained in F(v). Hence, every state in I(v) has non-trivial paths to and from v entirely contained in U(v). Therefore, every state in I(v) is connected to every other state of I(v) by a non-trivial path entirely in U(v). Since I(v) contains representatives from all fair sets, one can trace a fair cycle in U(v).
Once a fair set intersects I(v), it will continue to intersect it in all successive iterations of Lockstep. Furthermore, at each iteration, one can stop testing intersections as soon as one fair set is found that does not intersect I(v). Hence, if there are |C| fair sets and convergence requires s steps, the number of intersection checks is O(|C| + s). The overhead for this early termination check is O(s) intersections (for I(v)) and O(s) intersection checks, because the original Lockstep performs O(|C|) intersection checks on the maximal SCC. Early termination imposes a simple change to counterexample generation: the fair sets are intersected with I(v). This intersection guarantees that the path connecting the fair sets can always be closed.
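The early termination test can be sketched explicitly. The following is a simplified, explicit-state rendering of only the termination check from Theorem 10, not of full Lockstep (helper names are ours): F(v) and B(v) grow one breadth-first layer per iteration, and the search stops as soon as I(v) = F(v) ∩ B(v) meets every fair set.

```python
def lockstep_early_term(succ, pred, v, fair_sets):
    """Return True as soon as F(v) ∩ B(v) intersects every fair set.

    A False result only means the check never fired; full Lockstep would
    then recur on the enumerated SCCs.
    """
    F, B = set(), set()                       # nontrivially reached / reaching
    ffront, bfront = set(succ[v]), set(pred[v])
    while ffront or bfront:
        F |= ffront
        B |= bfront
        I = F & B
        if fair_sets and all(I & c for c in fair_sets):
            return True                       # fair cycle inside F | B
        ffront = {u for s in ffront for u in succ[s]} - F
        bfront = {u for s in bfront for u in pred[s]} - B
    return False
```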
6
Experiments
In this section we present preliminary results obtained with implementations of GSH and Lockstep in VIS 1.4 [2]. The CPU times were measured on an IBM IntelliStation running Linux with a 1.7 GHz Pentium 4 CPU and 1 GB of RAM. The experiments involved three types of schedules for GSH: EL, EL2, and a random schedule, which applies about the same fraction of EXs as EL. We experimented with different levels of use of don't care conditions in the computation of the fixpoints. Our experiments involved both language emptiness checks and CTL model checking problems. For the language emptiness experiments, we considered both future-tense and past-tense schedules, and we also ran the enhanced version of Lockstep described in Section 5. All experiments included reachability analysis and used fixed BDD variable orders to minimize noise. We present three summary tables, one for each class of experiments. The parameters we vary are the algorithm/schedule, the tense (future or past) for GSH schedules, and the degree of exploitation of don't care conditions in the EX and EY computations. For all GSH schedules, a "low DC level" means that the reachable states are used to simplify the transition relation, that the fixpoint computation for the fair states is started from Z = true, and that no frontiers are used in the EU and ES computations. A "medium DC level" means that, in addition to simplifying the transition relation, frontiers are used. A "high DC level" would simplify the argument to each EX computation with respect to the reachable states. (This simplification is not possible for EY computations.) One such scheme that we tried did not produce significant improvements over the medium DC level. We have not yet implemented the technique described in [14]. The columns of the tables have the following meaning. Total is the total time (in seconds) for all experiments. For timed-out experiments, the limit of 1800 s is taken.
Gmt is the geometric mean of the running time of all experiments. A win is a case in which a method is at least 2% and at least 0.1 s faster than any other method. A tie is a case in which there was no win, and the method was either fastest or less than 2% slower than the fastest method. T/O is the number of experiments that timed out. Steps is the total number of steps (EXs and EYs) performed, and gms is the geometric mean of the number of steps. All experiments for which at least one method took 0 s are excluded from the computation of geometric mean time, wins, and ties. Table 3 compares different GSH schedules for CTL model checking experiments; Table 4 compares GSH schedules and Lockstep on language emptiness problems; and Table 5 shows the results of LTL and CTL model checking for families of parameterized models. The properties used for these experiments required cycle detection for both CTL and LTL model checking.

Table 3. Summary of CTL model checking experiments for 39 models
schedule  DC level  total  gmt  wins  ties  T/O  steps  gms
EL        low       10307  7.7  1     20    4    78734  139
EL2       low        8002  6.3  0     23    3    31153  126
random    low       10028  8.5  2     20    4    64420  148
EL        medium     9606  7.5  0     14    5    78622  137
EL2       medium     9128  6.7  1     17    4    31013  123
random    medium    11284  9.0  1     16    6    64059  147

Table 4. Summary of language emptiness experiments for 59 models
schedule  tense   DC level  total  gmt   wins  ties  T/O  steps   gms
EL        future  low       15389  23.1  0     13    7    37252   64
EL2       future  low       10273  16.4  0     13    4    15182   57
random    future  low       15167  22.7  0     13    6    31500   63
EL        future  medium    12367  19.6  1     13    4    38412   68
EL2       future  medium     8342  14.8  1     13    1    16033   59
random    future  medium    11875  18.9  1     14    2    32840   67
EL        past    low       11440  18.3  0     13    4    118711  101
EL2       past    low        5999   9.3  0     17    2    19894   71
random    past    low       10666  17.4  0     13    4    105842  98
EL        past    medium     6759  14.2  1     14    0    153188  105
EL2       past    medium     3362   8.4  3     16    0    20692   72
random    past    medium     5771  13.4  0     15    0    116904  101
lockstep  both    medium    10613  22.2  7     12    3    140673  86

Table 5. Summary of model checking experiments for 57 models from 11 parameterized families
logic  schedule  tense   DC level  total  gmt   wins  ties  T/O  steps    gms
LTL    EL        future  low        7746   6.8  0     10    3    606502   587
LTL    EL2       future  low       13419  10.7  0     10    5    626093   1061
LTL    random    future  low       12614  10.5  0      8    5    685794   1053
LTL    EL        future  medium     7503   6.7  0      9    3    606400   587
LTL    EL2       future  medium    14009  10.5  0      8    7    595326   1007
LTL    random    future  medium    13896  10.3  0      8    7    656409   996
LTL    EL        past    low        8422   6.0  0     17    3    2873990  1220
LTL    EL2       past    low        8573   5.9  0     18    4    2749041  1164
LTL    random    past    low        7236   5.9  0     14    2    2865320  1237
LTL    EL        past    medium     8587   6.2  0     17    4    2816871  1188
LTL    EL2       past    medium     8588   6.0  0     18    4    2714631  1139
LTL    random    past    medium     7908   6.0  0     15    2    2857436  1222
LTL    lockstep  both    medium    26597  30.2  0      8    12   3362850  2603
CTL    EL        future  low        2699   2.8  7     26    1    396251   429
CTL    EL2       future  low       10076   4.9  4     23    5    370797   720
CTL    random    future  low       11525   4.9  0     23    6    483961   676
CTL    EL        future  medium     2969   3.2  0     29    1    396218   428
CTL    EL2       future  medium    10928   5.5  0     27    5    342588   703
CTL    random    future  medium    11725   5.5  0     27    6    463425   667

[Figure: scatter plot of CPU time (seconds, logarithmic scale) versus number of steps (logarithmic scale) for 53 of 57 experiments, for the families arbiter, bakery, drop, elev-1-f, elev-c-3, hrglass, lock, minmax, philo, tree-arb, and vending.]
Fig. 6. CPU time as a function of the number of steps

The data in the tables supports the following remarks.
– No type of schedule dominates the others, even though on individual models there are sometimes large differences. On average, EL-type schedules are the fastest for the parameterized models, while EL2 is the best for the non-parameterized ones.
– The complexity bounds of Section 4 are in terms of number of steps. While within a homogeneous group of experiments (same tense and DC level) the schedule performing fewer steps is often the fastest, it is obvious that the cost of a step is not constant. Figure 6, for instance, shows the relation between number of steps and CPU time for the EL schedule with low DC level applied to the parameterized families of models. It is readily seen that for most families, the computation time grows much faster than the number of steps. Also, the past-tense schedules of Table 5 perform many more steps than the corresponding future-tense schedules. However, the majority of them are due to just one model, and are very cheap.
– In our experiments, the tense did not affect in a significant way the comparison between different types of schedules (e.g., EL vs. EL2).
– Past-tense schedules usually did better than future-tense schedules. (In Table 5, the best results are for CTL model checking, but it is not possible to compare those future-tense schedules to the others because the LTL models have more state variables.) However, the advantage of past-tense schedules may depend on several factors. These include different quantification schedules for EX and EY, different diameters for a graph and its reverse, and the positions of the fair SCCs in the graphs, as well as various BDD-related factors, like the fact that some fixed variable orders are saved at the end of reachability analysis runs with dynamic reordering, and the hard-to-predict effects of the BDD computed table. In addition, our current implementation applies the same don't care techniques for past and future schedules. All these reasons may
explain the differences between our results and those of [11] with regard to tenses. It should also be mentioned that future-tense schedules may be applied without preliminary reachability analysis. For past-tense schedules, one must then prove reachability of the fair SCCs.
– For all the experiments that complete within 1800 s, the number of steps does not depend on the DC level. However, in case of timeout, the number of steps until the timeout is counted. This explains small differences in the numbers of steps between methods that differ only in the use of don't cares.
– Early termination for Lockstep is effective. The results with early termination are uniformly better than or equal to those without. (We also found that the best performance is achieved by not trimming [11] the initial set of states, i.e., the reachable states. The results shown for Lockstep are for the algorithm that does not trim the initial set.) Compared to GSH, Lockstep loses in most cases, but has the largest number of wins for both non-parameterized and parameterized language emptiness experiments. (That is, not counting the CTL experiments in Table 5.)

7
Conclusions

We have presented an improved Generic SCC Hull algorithm, and we have proved several bounds on the performance of classes of algorithms that can be cast as particular operator schedules for GSH. We have proved, in particular, that when complexity is measured in steps (EX and EY computations) and is given as a function of the number of fairness constraints |C|, the diameter of the graph d, the height of the SCC quotient graph h, and the numbers of total and nontrivial SCCs N and N′, then algorithm EL is optimal (Θ(|C|dh)) among those that can be simulated by GSH. Variants like EL2, on the other hand, are not optimal in that sense. (They are Θ(|C|dh + N − N′).) Of course, on a particular graph, EL2 may outperform EL for at least two reasons: on the one hand, the theoretical bounds are for worst-case performance; on the other hand, the cost of individual steps can vary widely. This implies that the theoretical analysis should be accompanied by an experimental evaluation. We have performed such an assessment, conducting experiments with several competing algorithms on a large set of designs. We have found that no GSH schedule dominates the others. Also, Lockstep is slower on average than GSH, but it produces the best results in quite a few cases. On individual experiments the ranges of CPU times for the various schedules may cover three orders of magnitude, which suggests that having more than one method at one's disposal may allow more model checking problems to be solved.

References
[1] R. Bloem, H. N. Gabow, and F. Somenzi. An algorithm for strongly connected component analysis in n log n symbolic steps. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 37–54. Springer-Verlag, November 2000. LNCS 1954.
[2] R. K. Brayton et al. VIS: A system for verification and synthesis. In T. Henzinger and R. Alur, editors, Eighth Conference on Computer Aided Verification (CAV'96), pages 428–432. Springer-Verlag, Rutgers University, 1996. LNCS 1102.
[3] E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, Cambridge, MA, 1999.
[4] E. A. Emerson and C.-L. Lei. Efficient model checking in fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium on Logic in Computer Science, pages 267–278, June 1986.
[5] K. Fisler, R. Fraer, G. Kamhi, M. Vardi, and Z. Yang. Is there a best symbolic cycle-detection algorithm? In T. Margaria and W. Yi, editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 420–434. Springer-Verlag, April 2001. LNCS 2031.
[6] R. H. Hardin, R. P. Kurshan, S. K. Shukla, and M. Y. Vardi. A new heuristic for bad cycle detection using BDDs. In O. Grumberg, editor, Ninth Conference on Computer Aided Verification (CAV'97), pages 268–278. Springer-Verlag, Berlin, 1997. LNCS 1254.
[7] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Verification, pages 371–382, Montréal, Canada, June 1992.
[8] Y. Kesten, A. Pnueli, and L.-o. Raviv. Algorithmic verification of linear temporal logic specifications. In International Colloquium on Automata, Languages, and Programming (ICALP-98), pages 1–16, Berlin, 1998. Springer. LNCS 1443.
[9] R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1994.
[10] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Boston, MA, 1994.
[11] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 143–160. Springer-Verlag, November 2000. LNCS 1954.
[12] F. Somenzi. Symbolic state exploration. Electronic Notes in Theoretical Computer Science, 23, 1999. http://www.elsevier.nl/locate/entcs/volume23.html.
[13] A. Tarski. A lattice-theoretic fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.
[14] C. Wang, R. Bloem, G. D. Hachtel, K. Ravi, and F. Somenzi. Divide and compose: SCC refinement for language emptiness. In International Conference on Concurrency Theory (CONCUR'01), pages 456–471, Berlin, August 2001. Springer-Verlag. LNCS 2154.
[15] A. Xie and P. A. Beerel. Implicit enumeration of strongly connected components. In Proceedings of the International Conference on Computer-Aided Design, pages 37–40, San Jose, CA, November 1999.
Sharp Disjunctive Decomposition for Language Emptiness Checking Chao Wang and Gary D. Hachtel Department of Electrical and Computer Engineering University of Colorado at Boulder, CO, 80309-0425 {wangc,hachtel}@Colorado.EDU
Abstract. We propose a "sharp" disjunctive decomposition approach for language emptiness checking which is specifically targeted at large or difficult problems. Based on the SCC (strongly connected component) quotient graph of the property automaton, our method partitions the entire state space so that each state subspace accepts a subset of the language, the union of which is exactly the language accepted by the original system. The decomposition is "sharp" in that it allows BDD operations on the concrete model to be restricted to small subspaces, and also in the sense that unfair and unreachable parts of the submodules and automaton can be pruned away. We also propose "sharp" guided search algorithms for the traversal of the state subspaces, guided by the approximate distance to the fair SCCs. We give experimental data showing that our algorithm outperforms previously published algorithms, especially on harder problems.
1
Introduction
Language emptiness checking on a fair Kripke structure is an essential problem in LTL [1,2] and fair-CTL [3] model checking, and in language-containment-based verification [4]. Symbolic fair cycle detection algorithms, both the SCC-hull algorithms [5,6,7,8] and the SCC enumeration algorithms [9,10], can be used to solve this problem. However, checking language emptiness is in general harder than checking invariants, since the latter is equivalent to reachability analysis and has linear complexity. Due to the well-known state space explosion, checking language emptiness can be prohibitively more expensive and is still considered impractical on industrial-scale circuits. Symbolic fair cycle detection requires, in the general case, more than linear complexity: O(n²) for SCC-hull algorithms and O(n log n) for SCC enumeration algorithms, where n is the number of states. For those cases where the automata are weak or terminal [11,12], special model checking algorithms usually outperform the general ones. This idea was further extended by [13], which combines compositional SCC analysis with specific decision procedures tailored to the cases of strong, weak, or terminal automata. It thus takes advantage of strong automata with weak or terminal SCCs, and of strong SCCs that turn into weak or terminal SCCs after the automata are composed with the model.
This work was supported in part by SRC contract 2001-TJ-920 and NSF grant CCR-99-71195.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 106–122, 2002. c Springer-Verlag Berlin Heidelberg 2002
Sharp Disjunctive Decomposition for Language Emptiness Checking
107
In [13], SCC analysis is also used during localization reduction to limit BDD attention to one fair SCC of the partially composed abstract model at a time. This permits BDD restriction to a small state subspace during expensive operations that need to be performed on the entire concrete model. Sometimes, by partitioning the sequential system into subsystems and inspecting each of these small pieces separately, the chance of solving the problem increases. In the context of reachability analysis, [14] proposed the machine decomposition algorithm: it partitions the sequential system using its latch connectivity graph, so that each subsystem contains a subset of the latches of the original system. For language emptiness checking, we propose in this paper a new algorithm for state space decomposition, based on the notion of sharpness. Our algorithm partitions the original state space S into a collection of state subspaces S_i, according to the SCC quotient graph structure of the amassed property automaton. A nice feature of these state subspaces is that each of them can be viewed as a separate fair Kripke structure. Further, if we use L(S) to represent the original language and L(S_i) to represent the language accepted within each state subspace, we have L(S_i) ⊆ L(S) and ∪_i L(S_i) = L(S). This allows us to check language emptiness on each state subspace separately. Thus our decomposition is "sharp" in that the BDD operations on the concrete model are focused on very small state subspaces, and also in the sense that unfair and unreachable parts of the submodules and automaton can be pruned away. We further propose a "sharp" forward (and backward) guided search algorithm for the traversal of the state subspaces, which uses the approximate distance to the fair SCCs to guide the search. At each breadth-first search step, we compute only a subset of the normal image, with a smaller BDD size (sharp) and a closer distance to the potential fair SCCs (guided).
Whenever the reachable subset intersects a promising state – a state that is in the fair SCC-closed set (defined later) and satisfies some fairness constraints – we use that state as a seed for the fair SCC search. If a fair SCC can be found, we know that the language is not empty; otherwise, we continue the forward search. If we cannot find any fair SCC when the forward search reaches a fixpoint, or when the entire fair SCC-closed set has been explored, we know the language is empty. Note that our new algorithm does not use the weak/terminal automata strength reduction techniques of [12]. On practical circuits, reachability analysis or even a single image computation can be prohibitively expensive. In fact, our research is directed specifically toward such larger problems. Thus it is to be expected that algorithms with less heuristic overhead might outperform our "sharp" algorithm on easily soluble problems. The experimental results show this, but they also show that when the LTL model checking problems become harder, our "sharp" algorithm outperforms both Emerson-Lei (the standard language emptiness checking algorithm in VIS [15]) and D'n'C [13]. The flow of the paper is as follows. In Section 2 we present the basic definitions. In Section 3 we present the state space decomposition theory. In Section 4 we describe the algorithm and analyze its complexity. The experimental results are given in Section 5, and we conclude and discuss potentially fruitful future work in Section 6.
C. Wang and G.D. Hachtel

2 Preliminaries
We combine the model M and the property automaton A_¬ψ and represent the entire system as a labelled, generalized Büchi automaton¹ A = M ∗ A_¬ψ.

Definition 1. A (labelled, generalized) Büchi automaton is a six-tuple A = ⟨S, S_0, T, F, A, L⟩, where S is the finite set of states, S_0 ⊆ S is the set of initial states, T ⊆ S × S is the transition relation, F ⊆ 2^S is the set of fairness conditions, A is the finite set of atomic propositions, and L : S → 2^A is the labelling function.

A run of A is an infinite sequence ρ = ρ_0, ρ_1, ... over S such that ρ_0 ∈ S_0 and, for all i ≥ 0, (ρ_i, ρ_{i+1}) ∈ T. A run ρ is accepting if, for each F_i ∈ F, there exists s_j ∈ F_i that appears infinitely often in ρ. The automaton accepts an infinite word σ = σ_0, σ_1, ... in A^ω if there exists an accepting run ρ such that, for all i ≥ 0, σ_i ∈ L(ρ_i). The language of A, denoted by L(A), is the subset of A^ω accepted by A. The language of A is nonempty iff A contains a fair cycle: a cycle that is reachable from an initial state and intersects all the fair sets. A Strongly Connected Component (SCC) C of an automaton A is a maximal set of nodes such that there is a directed path from any node in C to any other. A reachable SCC that intersects all fair sets is called a fair SCC. An SCC that intersects some initial states is called an initial SCC. Given an automaton A, the SCC (quotient) graph Q(A) is the result of contracting each SCC of A into one node, merging the parallel edges, and removing the self-loops.

Definition 2. The SCC (quotient) graph of the automaton A is a four-tuple Q(A) = ⟨S^C, S_0^C, T^C, S_F^C⟩, where S^C is the finite set of SCCs, S_0^C ⊆ S^C is the set of initial SCCs, T^C = {(C_1, C_2) | s_1 ∈ C_1, s_2 ∈ C_2, (s_1, s_2) ∈ T, and C_1 ≠ C_2} is the transition relation, and S_F^C ⊆ S^C is the set of fair SCCs.

The SCC graph forms a Directed Acyclic Graph (DAG), which induces a partial order: a minimal (maximal) SCC has no incoming (outgoing) edges. In symbolic model checking, we assume that all automata are defined over the same state space and agree on the state labels, and communication proceeds through the common state space. The composition A_1 ∗ A_2 = ⟨S, S_0, T, F, A, L⟩ of two Büchi automata A_1 = ⟨S, S_01, T_1, F_1, A, L⟩ and A_2 = ⟨S, S_02, T_2, F_2, A, L⟩ is defined by S_0 = S_01 ∩ S_02, T = T_1 ∩ T_2, and F = F_1 ∪ F_2. Hence, composing two automata restricts the transition relation and results in the intersection of the two languages. We also want to define a quotient restriction operation.

Definition 3. The restriction of A = ⟨S, S_0, T, F, A, L⟩ by a subset SCC graph Q⁻ = ⟨S^C, S_0^C, T^C, S_F^C⟩ is defined as A ⇓ Q⁻ = ⟨S⁻, S_0⁻, T⁻, F, A, L⟩, with S⁻ = {s | s ∈ C and C ∈ S^C}, S_0⁻ = {s_0 | s_0 ∈ C_0 and C_0 ∈ S_0^C}, and T⁻ = {(s_1, s_2) ∈ T | s_1 ∈ C_1, s_2 ∈ C_2, and C_1, C_2 ∈ S^C}.
¹ Note that when the context is clear we will use ∗ to denote the composition operation between two FSMs. Similarly, consistent with BDD usage, we will sometimes use ∗ in place of × to refer to the Cartesian product of two sets, or to the product/composition of two automata.
Obviously we have A ⇓ Q(A) = A. Note that, unlike BDD restriction operations, the right argument is a segment of a quotient graph; inside the definition, however, the automaton is actually operated upon by the sets of states implied by the quotient graph. An SCC-closed set of A is a subset V ⊆ S such that, for every SCC C in A, either C ⊆ V or C ∩ V = ∅. Note that if C is an SCC in A_1 (or A_2), it is an SCC-closed set of the composition A_1 ∗ A_2.
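The constructions of Definitions 1 and 2 can be illustrated on a small explicit-state automaton. The paper works symbolically on BDDs; the function names below and the omission of the reachability requirement on fair SCCs are our simplifications, for illustration only:

```python
# Explicit-state illustration of Definitions 1-2 (hypothetical names).
from itertools import count

def tarjan_sccs(states, succ):
    """Return the SCCs (as frozensets) of the graph given by `succ`."""
    index, low, onstack, stack, sccs = {}, {}, set(), [], []
    counter = count()

    def strongconnect(v):
        index[v] = low[v] = next(counter)
        stack.append(v)
        onstack.add(v)
        for w in succ(v):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in onstack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                onstack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in states:
        if v not in index:
            strongconnect(v)
    return sccs

def quotient_graph(states, succ, init, fair_sets):
    """Contract SCCs: merge parallel edges, drop self-loops (Definition 2).

    An SCC is marked fair here if it intersects every fair set; the
    reachability condition of the paper's definition is omitted for brevity.
    """
    sccs = tarjan_sccs(states, succ)
    of = {s: c for c in sccs for s in c}
    edges = {(of[u], of[v]) for u in states for v in succ(u) if of[u] != of[v]}
    init_sccs = {c for c in sccs if c & init}
    fair_sccs = {c for c in sccs if all(c & f for f in fair_sets)}
    return sccs, edges, init_sccs, fair_sccs
```

On the four-state automaton 0 ⇄ 1 → 2 ⇄ 3 with initial state 0 and the single fair set {3}, this yields two SCCs, {0,1} initial and {2,3} fair, with one quotient-graph edge between them.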
3 State Space Decomposition – Theory
The automaton A contains an accepting cycle iff its SCC graph Q(A) contains a fair SCC.

Definition 4. An SCC graph Q(A) is "pruned" if all the minimal nodes are initial, all the maximal nodes are fair, and all the other nodes are on paths from initial nodes to fair nodes.

Pruning (defined as removing the nodes that are not in the pruned SCC graph Q(A)) does not change the language of the corresponding automaton A. In the following, we assume that all the SCC graphs are pruned.² The entire state space of A can be decomposed into state subspaces according to the structure of Q(A). For brevity, we do not give proofs for the following theorems, since they are straightforward.

Definition 5. For each fair SCC C_i in Q(A), we construct an SCC subgraph Q^F_i by marking all the other SCCs "non-fair" and then pruning Q(A).

Theorem 1. The language accepted by each state subspace A ⇓ Q^F_i and the language accepted by A satisfy the following relations:

L(A ⇓ Q^F_i) ⊆ L(A)
∪_i L(A ⇓ Q^F_i) = L(A)

Note that in each SCC subgraph Q^F_i, the (only) maximal node is fair.

Definition 6. In the SCC subgraph Q^F_i, each "initial-fair" path constitutes an SCC subgraph Q^L_ij.

Theorem 2. The language accepted by each state subspace A ⇓ Q^L_ij satisfies the following relations:

L(A ⇓ Q^L_ij) ⊆ L(A ⇓ Q^F_i)
∪_j L(A ⇓ Q^L_ij) = L(A ⇓ Q^F_i)
² Note that in the pruned SCC graph, all the maximal nodes are fair. However, the fair SCCs are not always maximal – they might be on a path from initial SCCs to other maximal fair SCCs.
Thus, checking language emptiness of the original automaton A can be done on each individual subgraph A ⇓ Q^L_ij separately.

Theorem 3. L(A) = ∅ iff L(A ⇓ Q^L_ij) = ∅ for every SCC subgraph Q^L_ij.

In order to clarify the distinction between the Cartesian product and composition operations in the sequel (see Methods (b) and (c) in Section 4.3), we also include the following proposition.

Proposition 1. Let {C^1_i} be the SCCs of A_1 and {C^2_j} be the SCCs of A_2. Then the SCCs {C_ij} of the composition A_1 ∗ A_2 satisfy: there exist k, l such that (1) C_ij ⊆ C^1_k × C^2_l, and (2) C_ij ∗ (C^1_{k'} × C^2_{l'}) = ∅ for all (k', l') ≠ (k, l), with equality holding in (1) only when the edges inside C^1_k and C^2_l either:
1. have no labels; or
2. have labels whose supports are disjoint from each other; or
3. have mutually consistent labels (meaning nonempty conjunction).

Note that although the first two conditions for equality are subsumed by the third, they provide cheap tests which might be used to avoid the expensive composition operation in some cases.
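Definitions 5-6 can be made concrete on an explicit pruned SCC quotient DAG. The sketch below (invented names, explicit representation) enumerates one group of hyperlines per fair SCC, one hyperline per initial-fair path:

```python
# Illustrative sketch of Definitions 5-6 on an explicit quotient DAG:
# one Q^F_i per fair SCC, one hyperline Q^L_ij per initial-fair path.
def hyperlines(nodes, edges, init_nodes, fair_nodes):
    """Yield (fair_scc, path) pairs over the pruned SCC quotient DAG."""
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)

    def paths_to(target, node, prefix):
        prefix = prefix + [node]
        if node == target:
            # the target is the only maximal node of Q^F_i, so stop here
            yield prefix
            return
        for nxt in succ[node]:
            yield from paths_to(target, nxt, prefix)

    for f in fair_nodes:              # builds Q^F_i implicitly
        for i in init_nodes:          # each emitted path is one Q^L_ij
            yield from ((f, p) for p in paths_to(f, i, []))
```

On the DAG with edges A→B, B→C, A→C, B→D, initial node A, and fair nodes {C, D}, this yields three hyperlines ([A,B,C], [A,C] for C; [A,B,D] for D), consistent with the O(fe) hyperline count stated later (Theorem 4).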
4 The Algorithm

4.1 The Overall Algorithm

In this algorithm, we combine the idea of "sharp" guided search (explained in Section 4.4) with the "disjunctive" decomposition (explained in Section 3). The pseudo code of the overall algorithm is given in Figure 1. check-language-emptiness is the main procedure; it accepts three parameters: the concrete system A, the property automaton A_¬ψ, and the list of (circuit) model submodules M = {M_1, M_2, ..., M_m}. The algorithm goes through the following phases:

1. The amassing phase: The property automaton A_¬ψ is composed with submodules from {M_i}, one at a time, and its SCC graph Q_{A+} is built at each step. This phase continues until either Q_{A+} becomes an empty graph or the amassing threshold is reached. We explain the amassing phase in detail in Section 4.2.
2. The decompose and pre-jump phase: Each fair SCC in A+ is pre-processed by intersecting it with the remaining submodules in the list {M_i}. The details of this pre-jump process are explained in Section 4.3. By building the Q^F/Q^L subgraphs, Q_{A+} is decomposed into a collection of SCC subgraphs.
3. The jump phase: Now we "jump" to the concrete system A, with a collection of SCC subgraphs Q^L. Language emptiness is checked on each individual state subspace A ⇓ Q^L. The "sharp" guided search idea is implemented in sharpsearch-and-lockstep, together with the LockStep search, with focus on the goal of "early termination". This is described in detail in Section 4.4.
// entire system, property, submodules
check-language-emptiness(A, A_¬ψ, {Mi}) {
    Reach := compute-initial-states(A)
    A+ := A_¬ψ
    // amassing phase
    while (amassing threshold not reached) do
        Mi := pick-next-submodule(A+, {Mi})
        A+ := A+ ∗ Mi
        QA+ := build-sccgraph(A+)
        if QA+ is an empty graph then return true fi
    od
    // decompose and pre-jump
    for each fair SCC C ∈ QA+ do
        QF := build-qf-subgraph(QA+, C)
        Queue := {C}
        for each remaining submodule Mi do
            Queue := refine-sccs(A_¬ψ ∗ Mi, Queue)
        od
        for each dfs path pj in QF do
            QL := build-ql-subgraph(QF, pj)
            if (sharpsearch-and-lockstep(A, QL, C, Reach, Queue) = false) then
                return false
            fi
        od
    od
    return true
}

// model, hyperline, fair SCC, reachable set, and SCC queue
sharpsearch-and-lockstep(A, QL, C, Reach, Queue) {
    Front := Reach
    absRings := compute-reachable-onionrings(A+ ⇓ QL)    // (see Definition 3)
    FS: while (Front ≠ ∅) and (Front ∩ Queue = ∅) do
        Front := img#(A ⇓ QL, Front, absRings) \ Reach
        if (Front = ∅) then Front := img(A ⇓ QL, Reach) \ Reach fi
        Reach := Reach ∪ Front
    od
    if (Front = ∅) then
        return true
    else if (lockstep-with-earlytermination(A ⇓ C, Queue, absRings)) then
        return false
    else
        goto FS
    fi
}

Fig. 1. The Overall Algorithm for Checking Language Emptiness
4.2 Amassing and Decomposition

Amassing the Property Automaton. The property automaton A_¬ψ is usually small, and its SCC graph exhibits limited structure/sparsity. In order to get a finer decomposition, we need to augment A_¬ψ with a small portion of the submodules of M = ∗_i M_i. At the very beginning, A+ = A_¬ψ. As we pick up the M_i and gradually add them to A+, we are able to see the structural interaction between the property automaton and the model. As a consequence, the SCCs in A+ gradually get fractured, and the SCC graph becomes larger and shows more structure/sparsity. We call this augmentation process "amassing the automaton". The order in which the remaining submodules M_i are brought in is critical, as is the way in which the original model was partitioned to form the submodules M_i in the first place. Since our "sharpness" goal is to fracture the SCC graph Q(A+) and make it show more structure/sparsity, we use the following criteria:

1. Cone-of-influence (localization) reduction: Only state variables that are in the transitive fan-ins of A_¬ψ are considered. These state variables are grouped into clusters {M_i} so that the interaction between clusters is minimized [14]. For each M_i, we compute the SCC graph Q(A_i), with A_i = A_¬ψ ∗ M_i.
2. When we augment A+, we give priority to clusters which are both in the immediate fan-ins of A+ and have the relatively most complex SCC graph Q(A_i).
3. We repeat the previous step until either all the M_i are added, or the amassing phase reaches a certain threshold.

At each amassing step, the current A+ is a refinement of the previous A+ (the SCC graph Q(A+) is also a refinement of its previous counterpart). This means that we can build the SCC graph incrementally, as opposed to building it from scratch each time. We use LockStep to refine each SCC in the previous Q(A+), and then update the edges. Also, the SCCs that are in the previous SCC graph but become redundant (not in the pruned graph) are removed.
If at any time Q(A+) becomes empty, we can stop, knowing that the language is empty. In order to avoid an excessive partitioning cost, with a consequent exponential number of subgraphs, we exercise heuristic control over the activation of SCC refinement:

1. If the size of an SCC in the previous Q(A+) is below a certain threshold, and it is not fair, skip refining it.
2. If the total number of edges in Q(A+), e, exceeds a certain threshold, stop the amassing.
3. If the total number of fair SCCs in Q(A+), f, exceeds a certain threshold, stop the amassing.

After the amassing phase, the SCC graph Q(A+) is available. SCC subgraphs Q^F_i and Q^L_ij are built as discussed in Section 3. Since each Q^L_ij corresponds to a depth-first search path in the SCC graph Q(A+), we also call them hyperlines in the sequel. In fact, each hyperline is an envelope of Abstract Counter-Examples (ACEs). The total number of SCC subgraphs is bounded by the size of Q(A+).

Theorem 4. For an SCC graph with f fair SCCs and e edges, the total number of Q^F_i SCC subgraphs is f; the total number of Q^L_ij SCC subgraphs is O(fe).

Let us denote the total number of states in A+ by η_k. Without the control by the amassing threshold, in the worst case e = O(η_k^2) and f = O(η_k). However, in our method, the amassing threshold bounds both f and e to constant values.
4.3 The Jump Phase

In the jump phase we determine whether any of the abstract counter-examples in the current hyperlines contains a concrete counter-example. We currently use an intermediate iterative state subspace restriction process that can be inserted before the "jump", which is related to the work of [14] and [16]. Assume that the submodules of the model (M = ∗_i M_i) have each been intersected with the property automaton, creating a series of A_i, where i = 1, 2, ..., m.³ After the amassing phase, we have: (1) the amassed automaton A+ = A_1 ∗ A_2 ∗ ... ∗ A_{k−1}, (2) the remaining automata A_k, A_{k+1}, ..., A_m, and (3) the list of SCC-closed sets in A+, which we shall call L+. At this point, LockStep can be used to partition and refine each SCC-closed set of L+ into one or more SCCs according to the Transition Relations (TRs) of the A_i. We briefly discuss four different approaches to the "jump" phase. The first is the one used for a similar purpose in D'n'C, while the last three are part of our new algorithm. In the last three approaches, the last step, called the jump step, is the same: it is computed on the entire concrete system, subject to a computed state subspace restriction. Only the state subspace restriction varies from method to method. First, in the D'n'C approach⁴ [13], which we shall call Method (a),

L = EL(∗_i A_i, ∪_{C∈L+} C)
Here EL stands for the Emerson-Lei algorithm, ∪_{C∈L+} C is the union of all the fair SCC-closed sets of A+, and ∗_i A_i is the concrete system. Its main feature is that fair cycle detection is restricted to each state subspace C ∈ L+. In [13], the advantageous experimental results were attributed mainly to this restriction and to the automata strength reduction [12]. The second approach, which is the one currently in our implementation, can be called the "Cartesian product" approach, and will be referred to as Method (b). It is based directly on Proposition 1, and can be characterized as follows:⁵

Compute the jump state space restriction:
L_k     = LockStep(A_k, L+)
L_{k+1} = LockStep(A_{k+1}, L_k)
L_{k+2} = LockStep(A_{k+2}, L_{k+1})
...
L_m     = LockStep(A_m, L_{m−1})
Jump in the restricted state space:
L       = LockStep(∗_i A_i, L_m)
³ A_i = A_¬ψ ∗ M_i, and A = ∗_i A_i.
⁴ We acknowledge here that these methods fall partially under the purview of the "Policy" discussed in [13], in terms of the lattice of over-approximations which derive from composing arbitrary subsets of all the submodules of the concrete system A+. However, the Cartesian product approach (Method (b) below) has an element (see Proposition 1) distinct from the topic of which approximations to use, and that element appears in (c) and (d) below as well.
⁵ In the pseudo code, this is described by the function refine-sccs.
A direct analogy can be observed between Method (b) and the MBM (Machine By Machine) approach of [14]. For each SCC-closed set C in the list L_k, LockStep will further partition it into a collection of SCCs according to the TR of A_{k+1}. The submachines still remaining to be composed in testing the ACE are treated "one machine at a time". Note that the quotient graph of the machine A_k = A_¬ψ ∗ M_k has been computed a priori, and the searches inside LockStep are restricted to the state subspace C × C', where C' is a specific SCC of A_k and × represents the Cartesian product. The product C × C' is a smaller set than C, because the product operation further refines the partition block C. Further, this process can fracture the closed sets, since C and C' are sometimes disjoint sets. Thus the closed sets in L_k can be smaller than those in L+; similarly, those in L_{k+1} are smaller still, and so on. As the machine that LockStep operates on becomes progressively more concrete, the size of the considered state space becomes progressively smaller, and the state restriction subspaces of Method (b) are thus generally much smaller than those in Method (a).

To illustrate this effect, consider a simple example, in which the original property automaton A_¬ψ has two fair SCCs (scc1, scc2), and the pre-jump amassed automaton A+ also has two fair SCCs (SC1, SC2). We assume scc1 ⊇ SC1 and scc1 × SC2 = ∅. Suppose there are two submodules M_a, M_b yet to be composed in the jump phase, that M_a has a single fair SCC C_a, and that M_b also has a single fair SCC C_b. Summarizing, we have

Module                 Fair SCCs
A_¬ψ                   scc1, scc2
A+                     SC1, SC2
A_a = A_¬ψ ∗ M_a       C1, C2
A_b = A_¬ψ ∗ M_b       C3, C4
M_a                    C_a
M_b                    C_b

After the composition A_¬ψ ∗ M_a, C_a is decomposed into two SCCs (C1, C2). In this case, it is obvious that C1 ⊆ (scc1 × C_a) and C2 ⊆ (scc2 × C_a). The same thing happens in the composition with M_b: its only fair SCC C_b is decomposed into (C3, C4), and the following holds: C3 ⊆ (scc1 × C_b), C4 ⊆ (scc2 × C_b). We take the two Cartesian products to yield

LockStep(M_a, {SC1, SC2})             = {SC1 × C_a, SC2 × C_a}
LockStep(M_b, {SC1 × C_a, SC2 × C_a}) = {SC1 × C_a × C_b, SC2 × C_a × C_b}

Notice that

SC1 × C_a × C_b ⊇ SC1 × (C1 + C2) × (C3 + C4)
SC2 × C_a × C_b ⊇ SC2 × (C1 + C2) × (C3 + C4)

To summarize Method (b):

L+  = {SC1, SC2}                                           (amassing)
L_a = LockStep(A_a, L+)  = {SC1 × C1, SC2 × C2}            (refining with A_a)
L_b = LockStep(A_b, L_a) = {SC1 × C1 × C3, SC2 × C2 × C4}  (refining with A_b)
L   = LockStep(∗_i A_i, L_b)             (jump in the restricted state space)
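Since all the component automata share one state space (Section 2), the Cartesian products in the refinement chain of Method (b) act as set intersections, and the chain can be mimicked on explicit sets. Below, `refine` is a toy stand-in for LockStep, and the 8-state universe and particular SCCs are invented for illustration:

```python
# Explicit-set sketch of Method (b)'s refinement chain (toy example).
def refine(closed_sets, component_sccs):
    """Split each SCC-closed set against the SCCs of one more submachine."""
    out = []
    for c in closed_sets:
        for d in component_sccs:
            block = c & d
            if block:              # empty blocks are fractured away
                out.append(block)
    return out

SC1, SC2 = frozenset({0, 1, 2}), frozenset({5, 6, 7})  # fair SCCs of A+
Ca = frozenset({0, 1, 2, 5, 6})                        # fair SCC of Ma
Cb = frozenset({1, 2, 6, 7})                           # fair SCC of Mb

La = refine([SC1, SC2], [Ca])   # analogue of LockStep(Aa, L+)
Lb = refine(La, [Cb])           # analogue of LockStep(Ab, La)
# every refined block is contained in a block of the previous list
assert all(any(b <= c for c in La) for b in Lb)
```

Each round of `refine` can only shrink (or fracture) the closed sets, which mirrors how the restriction subspaces become progressively smaller as more submachines are taken into account.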
Note that since SC1 ⊆ scc1 and SC2 ⊆ scc2, Method (b) gives a smaller restriction subspace than Method (a) as used in D'n'C. The third approach, Method (c), can be called the "one-step composition" approach, and can be characterized briefly as follows:

L_k     = LockStep(A+ ∗ A_k, L+)
L_{k+1} = LockStep(A+ ∗ A_{k+1}, L_k)
...
L_m     = LockStep(A+ ∗ A_m, L_{m−1})
L       = LockStep(∗_i A_i, L_m)
Whereas Method (b) does no composition prior to making the full jump, Method (c) invests more heavily in sharpness by composing A+ with each of the remaining submodules. At each step we use the refined SCC-closed sets computed in the previous step. This is certainly more work than Method (b), but it produces still "sharper" (that is, smaller) restriction subspaces, due to the SCC-fracturing process inherent in composition. In comparing Methods (b) and (c), the reader should pay attention to Proposition 1. Working with unlabelled graphs might give the impression that Methods (b) and (c) give identical results. Note, however, that for an edge to exist in the STG of the composition, it must exist in both of the machines being composed. Thus, whereas Method (b) never "fractures" any individual SCC, Method (c) does, ultimately leading to much smaller restriction subspaces in the jump (that is, the last) step. The fourth approach, Method (d), can be called the "full iterative composition" approach, and can be characterized briefly as follows:

L_k     = LockStep(A+ ∗ A_k, L+)
L_{k+1} = LockStep((A+ ∗ A_k) ∗ A_{k+1}, L_k)
L_{k+2} = LockStep(((A+ ∗ A_k) ∗ A_{k+1}) ∗ A_{k+2}, L_{k+1})
...
L       = LockStep(∗_i A_i, L_{m−1})
Note that in the calls to LockStep, the next of the remaining uncomposed submachines is composed with the result of the previous composition. At each step, computation is restricted to an SCC-closed set computed in the previous step. This composition process maximally fractures the SCC-closed sets. Each step is thus done on a maximally reduced restriction subspace, due to the restriction to the state subspace of an SCC computed in the previous step. Further, the SCCs of L_{k+1} are generally smaller than those in L_k. Thus, as the machine LockStep operates on becomes progressively more concrete, the size of the considered state space becomes progressively smaller. Method (d) is offered to complete the spectrum of available sharpness options; it has not yet been implemented. The principle at work in Methods (a)-(d) is to use the maximum affordable sharpness with each composition step. Method (a) represents the least investment in sharpness, and therefore suffers the least amount of overhead. However, it performs the most expensive
step (the jump step) on the largest restriction subspace. Similarly, Method (d) is sharpest at the jump step, but incurs the greatest overhead. Roughly speaking, we expect that

CPUTIME(a) ≤ CPUTIME(b) ≤ CPUTIME(c) ≤ CPUTIME(d)

However, in the experimental results section we show that the largest computations are only possible with maximum affordable sharpness. The larger investment is clearly justified when the cheaper approach fails anyway.
4.4 Sharp Search and Fair Cycle Detection
Now we have "jumped" to the concrete system, and language emptiness needs to be checked on each individual state subspace A ⇓ Q^L_ij. Fortunately, the subspaces are smaller than the entire state space; thus, both forward traversal and fair cycle detection are easier. Since fair cycle detection is generally harder than forward traversal, and it does not make sense to search unreachable areas for a fair cycle, we want to do the forward search first, and only start fair cycle detection when the forward search hits a promising state. A promising state is a state that is both in an SCC-closed set of A+ and in some of the fair sets. These promising states are also prioritized: those that intersect more fair sets get higher priorities.

Sharp Search. We notice that not all the hyperlines are as "sharp" as expected. This is because the SCC size varies, and sometimes a big SCC stays in the hyperline. In this case, we need to sharpen it further. The "sharp" guided search algorithm is proposed to address this issue. Instead of using the normal image computation in the forward search, at each step we use its "sharp" counterpart, img#. The pseudo code of img# is given in Figure 2. First, a subset of the "from" set is computed heuristically (it could be a minterm, a cube, or an arbitrary subset with a small BDD size), and states in the subset are selected in a way that favors those with a shorter approximate distance to the fair SCCs. img# is fast even on the concrete system, and it heuristically targets the fair SCCs. In other words, it is able to hit a fair SCC by visiting only a portion of the states in the stem (the states between the initial states and the fair SCCs).

img#(A, From, absRings) {    // model, from set, and abstract onion rings
    i := length(absRings)
    while (From ∩ absRings[i] = ∅) do i−− od
    From# := bdd-subsetting(From ∩ absRings[i])
    return img(A, From#)
}

Fig. 2. The "sharp" image computation algorithm
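An explicit-state rendition of img# may help; here `subset` stands in for bdd-subsetting, the min-element choice is an arbitrary placeholder, and abs_rings[i] approximates the set of states i steps from the initial states (so deeper rings are closer to the fair SCCs):

```python
# Explicit-state sketch of the img# computation of Fig. 2 (names assumed).
def img(succ, states):
    """Ordinary (full) image."""
    return {t for s in states for t in succ(s)}

def img_sharp(succ, frm, abs_rings, subset=lambda s: {min(s)}):
    """Image of a small subset of `frm`, taken from its deepest onion ring."""
    i = len(abs_rings) - 1
    while not (frm & abs_rings[i]):   # back up to the deepest nonempty ring
        i -= 1
    frm_sharp = subset(frm & abs_rings[i])
    return img(succ, frm_sharp)
```

On the chain 0→1→2→3 with rings [{0}, {1}, {2}, {3}], calling img_sharp from {0, 1} advances from state 1 only, skipping the shallower state 0 – the "closer to the fair SCC" guidance of the text.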
Since img# computes only a subset of the image each time, a dead-end might be reached before the forward search reaches the fixpoint. Whenever this happens, we need to backtrack and use the normal img to recover (the algorithm is described in Figure 1). If fair cycles exist, the sharp guided search algorithm might find one by exploring only part of the reachable states and going directly to its target, the fair SCC. If there is no fair cycle, however, all the reachable states or the entire SCC-closed set (whichever finishes first) must be explored. In the worst case, the sharp search will have to be executed on every hyperline. It is possible that some areas (states) are shared by more than one hyperline; the variable Reach is used (Figure 1) to avoid computing them more than once. Given η_R as the total number of reachable states on each state subspace A ⇓ Q^L_ij, and fe as the total number of hyperlines (or Q^L SCC subgraphs), the cost of the sharp search on all the subgraphs is O(η_R + fe).

Prioritized LockStep with Early Termination. LockStep with early termination is used together with the sharp guided search to find a fair cycle on each A ⇓ Q^L_ij. All the SCC-closed sets are put into a priority queue, prioritized according to their approximate distances to the initial states (these distances are computed on the abstract model A+). The recursion in LockStep is also implemented using the priority queue [17]. LockStep is started as soon as the sharp forward search hits some promising states. At this time, one promising state (with higher priority) is selected as a seed. This guarantees that every fair SCC found is in the reachable area. Early termination is implemented such that, as soon as the cycle found so far intersects all the fair sets, we stop (as opposed to finding the entire fair SCC). Assuming that η is the total number of states of the concrete system A, the cost of fair cycle detection is clearly bounded by O(η log η).
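The seed prioritization described above can be sketched with a heap. The scoring below is our assumption of a reasonable policy consistent with the text: states hitting more fair sets come first, with ties broken by smaller approximate distance (computed on A+):

```python
# Sketch of promising-state prioritization (hypothetical scoring policy).
import heapq

def seed_queue(promising, fair_sets, approx_dist):
    """Return the promising states ordered best-seed-first."""
    heap = []
    for s in promising:
        hits = sum(1 for f in fair_sets if s in f)    # fair sets intersected
        heapq.heappush(heap, (-hits, approx_dist[s], s))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For example, with fair sets {1, 2} and {2}, state 2 (two hits) is dequeued before state 1 (one hit), which precedes state 3 (no hits) regardless of distance.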
4.5 Complexity
Here A+_1, A+_2, ..., A+_k denote the series of A+ in the amassing phase. Assume A contains r state variables; the total number of states is then η = O(2^r). Since all the abstract models and the concrete system are defined over the same state space and agree on the state labels, each A+_i has η states. However, on each A+_i the entire state space can be partitioned into η_i parts such that the states inside each part are "indistinguishable". We define η_i as the effective number of states of the abstract model. If A+_i contains t_i ≤ r state variables, we have η_i = O(2^{t_i}). It is obvious that LockStep takes O(η_i log η_i) on each A_i.
Amassing and Decomposition. Building the SCC quotient graph for each A+_i takes O(η_i log η_i) symbolic steps. For any two consecutive abstract models A+_i and A+_{i+1}, A+_{i+1} has at least one more state variable. This gives us the following relation over their effective numbers of states:

η_{i+1} / η_i ≥ 2
Thus, the total cost of the amassing phase is bounded by

η_k log η_k (1 + 1/2 + 1/4 + ...) ≤ 2 η_k log η_k

which is O(η_k log η_k). The same bound applies to the pre-jump process. During the decomposition phase, the total number of hyperlines is O(fe), given that the SCC quotient graph of A+_k has a total of f fair SCCs and e edges.

Sharp Search and LockStep. In the worst case, the sharp search traverses all the reachable states, plus at least one image computation on each state subspace; thus its cost is bounded by O(η_R + fe). Fair cycle detection on the concrete system is bounded by O(η log η) symbolic steps. Putting all of these together, we have the overall complexity

O(η_k log η_k + fe + η_R + η log η + fe) = O(η log η + fe)

In our implementation, fe is bounded by a constant value (the amassing threshold), though leaving it uncontrolled would result in O(fe) = O(η_k^3) in the worst case.
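The geometric-series bound can be written out explicitly. Since η_{i+1} ≥ 2η_i, we have η_i ≤ η_k / 2^{k−i}, and the per-step costs telescope:

```latex
\sum_{i=1}^{k} \eta_i \log \eta_i
  \;\le\; \log\eta_k \sum_{i=1}^{k} \frac{\eta_k}{2^{\,k-i}}
  \;\le\; \eta_k \log\eta_k \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right)
  \;\le\; 2\,\eta_k \log\eta_k \;=\; O(\eta_k \log \eta_k).
```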
5 Experiments
We implemented our algorithm in VIS-1.4 (we call it LEC#), and compared its performance with both Emerson-Lei (EL, the standard language emptiness checking command) and D'n'C on the circuits of [13] and the texas97 benchmark circuits. All experiments use a static variable ordering (obtained by the dynamic variable reordering command in VIS). Tables 1 and 2 were run on a 400 MHz Pentium II with 1 GB of RAM, while Table 3 was run on a 1.7 GHz Pentium 4 with 1 GB of RAM. All machines ran Linux with the data size limit set to 750 MB. Table 1 shows that with VIS dcLevel=2 (using the prior reachability analysis result as don't cares where possible), D'n'C consistently outperforms our new algorithm. To summarize the comparison of the new algorithm and D'n'C, we denote by "CL" (Constant factor Loss) the case in which both algorithms complete, but D'n'C is faster. Similarly, we denote by "CW" (Constant factor Win) the case in which both algorithms complete, but LEC# is faster. We also denote by "AL"/"AW" (Arbitrary factor Loss/Win) the cases where D'n'C (respectively, the new algorithm) completes but the other does not. With this notation, a tally of Table 1 gives:

Cases   LEC# vs. D'n'C   LEC# vs. EL
CL      15               6
CW      3                6
AL      0                0
AW      0                6
We see that compared to D'n'C, LEC# has only 3 constant factor wins vs. 15 for D'n'C. However, of D'n'C's 15 CWs, only 4 were for problems needing more than 100 seconds to complete – that is, most were easy problems. In contrast, on LEC#'s 3 CWs, D'n'C took 1337, 1683, and 233 seconds. Neither algorithm had an AW case.
Sharp Disjunctive Decomposition for Language Emptiness Checking
119
We conclude that on harder problems LEC# is at least competitive even when advance reachability is feasible. Making a similar tally for LEC# vs. EL, we see that LEC# ties the constant factor competition with 6 CWs each, and has a convincing advantage in AWs: New 9, EL 0. Tables 2 and 3 show that with VIS dcLevel=3 (using the approximate reachability analysis result as don't cares), LEC# consistently outperforms both D'n'C and EL on the circuits of [13] and the Texas-97 benchmark circuits. The difference here is that both D'n'C and EL depend strongly on full reachability⁶ to restrict the search spaces; the sharp searches of the new algorithm minimize this dependency. For the circuits of [13] the tallies are:

Cases            CL  CW  AL  AW
LEC# vs. D'n'C    6   4   0   7
LEC# vs. EL       3   5   0   9
We see that compared to D'n'C, LEC# has only 4 constant factor wins vs. 6 for D'n'C. However, all of D'n'C's 6 CWs were for problems that needed less than 100 seconds to complete; that is, the easy problems. In contrast, on LEC#'s CWs, D'n'C took 7565, 5, 2165, and 1139 seconds. Except for one case, LEC# "wins the big ones". The bottom line that we are seeking is completion on large problems. In that respect, note that the AWs (Arbitrary factor Wins) are LEC# 7, D'n'C 0. Making a similar tally for LEC# vs. EL, we see that LEC# wins the constant factor competition, and has an even more convincing advantage in AWs: New 9, EL 0. Finally, we look at the same comparisons for the Texas-97 benchmark circuits. Tallying Table 3 similarly, we obtain:

Cases            CL  CW  AL  AW
LEC# vs. D'n'C    2   3   0   2
LEC# vs. EL       1   3   0   3
For these mostly larger circuits, for some of which reachability is prohibitively expensive, we see a decisive advantage of LEC# vs. both D’n’C and EL.
6 Conclusion
In this paper we proposed a new algorithm for language emptiness, based on a series of "sharpness" heuristics, which enable us to perform the most expensive parts of language emptiness checking restricted to minimal state subspaces. We presented theoretical and experimental results which support our hypothesis that for large or otherwise difficult problems, heavy investment in sharpness-based heuristic state subspace restriction and guidance is justified.

⁶ Full reachability analysis is usually impossible on practical circuits.
120
C. Wang and G.D. Hachtel
Table 1. On the circuits of [13]: * means dcLevel=0 (no don't cares), otherwise dcLevel=2 (reachable don't cares). T/O means time-out after 4 hours. Circuit and LTL, Pass (P) or Fail (F): bakery1 F bakery2 P bakery3 P bakery4 F bakery5 F eisen1 F eisen2 F elevator1 F nmodem1 P peterson1 F philo1 F philo2 F philo3 P shamp1 F shamp2 F shamp3 F twoq1* P twoq2* P
latch num 56 49 50 58 59 35 35 37 56 70 133 133 133 143 144 145 69 69
CPU (s) Memory (MB) BDD (M) EL D’n’C LEC# EL D’n’C LEC# EL Dnc LEC# 212 27 159 262 69 20 28 152 421 43 1514 550 T/O 1337 655 T/O 623 737 23 16 128 69 T/O 1683 944 210 41 192 489 4384 233 227 569 17 21 41 73 371 7 56 401 73 12 58 145 T/O 115 207 44 87 303 96 T/O 101 239 T/O 335 1383 12 4 14 36 241 30 289 333
75 73 111 411 555 50 564 132 63 83 26 44 119 113 187 478 23 47
125 74 125 476 554 64 340 369 169 78 37 42 329 401 268 500 24 509
5.1 3.4 14 1.0 14 11 1.1 12 2.8 2.1 0.4 8.9
1.3 1.2 1.8 4.7 6.1 0.9 7.7 2.2 0.6 1.2 0.2 0.5 1.2 2.2 2.9 4.4 0.1 0.9
1.5 1.2 1.5 4.8 9.9 0.6 1.7 10.3 2.2 1.2 0.1 0.3 7.0 9.2 3.5 5.8 0.0 7.9
The experimental results show that while D'n'C mostly outperforms our new algorithm on problems where prior reachability is possible, the new algorithm outperforms both the Emerson-Lei algorithm and the D'n'C algorithm on difficult circuits. Although our new algorithm does not win in every case, it tends to win on the harder problems. Out of the 25 LTL model checking cases we described in Tables 2 and 3, Emerson-Lei timed out in 13 cases, more than half. This attests to the fact that the circuits studied, while not huge, are definitely non-trivial. The D'n'C algorithm timed out on 10 of the 25 cases. Since our algorithm never timed out (and usually had much smaller memory requirements when time was an issue), we can only say that the speedup achieved in these cases was arbitrarily large. We note that our new algorithm does not yet employ the strength reduction techniques of D'n'C. This suggests that sharpness itself is very powerful; when combined with the strength reduction techniques, our advantage over both D'n'C and Emerson-Lei might improve further on some problems. A priority in future work is to diagnose the qualities of a given design that make language emptiness checking compute-intensive. This might afford guidance on how to set the various parameters of the algorithm, such as how many latches to compose before jumping, and how to choose, for example, between sharp forward search and sharp backward search at the end of the jump phase (currently, we start both and abandon the one that seems to be stalling).
Table 2. On the circuits of [13]: with dcLevel=3 (approximate reachable don't cares). T/O means time-out after 4 hours. Circuit and LTL, Pass (P) or Fail (F): bakery1 F bakery2 P bakery3 P bakery4 F bakery5 F eisen1 F eisen2 F ele F nullmodem P peterson F philo1 F philo2 F philo3 P shampoo1 F shampoo2 F shampoo3 F twoq1 P twoq2 P
latch num 56 49 50 58 59 35 35 37 56 70 133 133 133 143 144 145 69 69
CPU Memory EL D’n’C LEC# EL D’n’C LEC# T/O 7565 5367 183 5 2 241 2794 48 174 609 T/O T/O 1964 T/O T/O 1294 23 6 107 36 T/O T/O 1150 3504 2156 585 663 T/O T/O 3375 4 8 176 21 T/O T/O 385 T/O T/O 267 T/O 1139 241 12 T/O 168 21 T/O T/O 189 T/O 53 735 12 4 23 37 172 30 665 322
609 25 128 26 612 42 609 51 14 15
BDD (M) EL D'n'C LEC#
447 - 17.6 15 4.1 0.1 133 18.8 2.1 477 416 73 0.3 0.3 365 657 24.4 21.1 306 121 0.0 0.3 64 144 119 - 21.4 127 0.0 153 331 - 0.3 24 0.4 0.1 496 7.7 0.9
8.0 0.0 1.5 4.0 4.9 0.5 3.0 23.6 2.6 1.4 0.9 2.1 1.4 2.0 3.0 5.0 0.0 8.2
Table 3. On Texas-97 benchmark circuits, with dcLevel=3 (approximate reachable don't cares). T/O means time-out after 8 hours. Circuit and LTL, Pass (P) or Fail (F): Blackjack1 F MSI cache1 P MSI cache2 F PI bus1 P PI bus2 F PPC60X1 F PPC60X2 P
latch EL num 176 7296 65 T/O 65 T/O 387 T/O 385 501 67 1109 69 13459
CPU (s) Memory (MB) D’n’C LEC# EL D’n’C LEC# 2566 237 618 T/O 51 T/O 165 73 1700 292 1302 467 1690 651 609 2811 531 745
610 243 477 611 625
551 83 342 539 609 445 327
BDD (M) EL D’n’C LEC# 26.8 17.0 20.1 17.8
24.2 3.5 15.4 22.4 18.9
18.1 2.0 6.7 13.4 22.6 10.6 6.9
Further research should focus on both the clustering algorithms that create the submodules and the corresponding refinement scheduling (guidance on the order in which the submodules are processed in the amassing and jump phases).

Acknowledgements. We acknowledge the contributions of "deep in the shed" research sessions with Roderick Bloem, Kavita Ravi, and Fabio Somenzi.
References [1] O. Lichtenstein and A. Pnueli. Checking that finite state concurrent programs satisfy their linear specification. In Proceedings of the Twelfth Annual ACM Symposium on Principles of Programming Languages, pages 97–107, New Orleans, January 1985. [2] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proceedings of the First Symposium on Logic in Computer Science, pages 322–331, Cambridge, UK, June 1986. [3] K. L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Boston, MA, 1994. [4] R. P. Kurshan. Computer-Aided Verification of Coordinating Processes. Princeton University Press, Princeton, NJ, 1994. [5] E. A. Emerson and C.-L. Lei. Efficient model checking in fragments of the propositional mu-calculus. In Proceedings of the First Annual Symposium of Logic in Computer Science, pages 267–278, June 1986. [6] R. Hojati, H. Touati, R. P. Kurshan, and R. K. Brayton. Efficient ω-regular language containment. In Computer Aided Verification, pages 371–382, Montr´eal, Canada, June 1992. [7] H. J. Touati, R. K. Brayton, and R. P. Kurshan. Testing language containment for ω-automata using BDD’s. Information and Computation, 118(1):101–109, April 1995. [8] Y. Kesten, A. Pnueli, and L.-o. Raviv. Algorithmic verification of linear temporal logic specifications. In International Colloquium on Automata, Languages, and Programming (ICALP-98), pages 1–16, Berlin, 1998. Springer. LNCS 1443. [9] A. Xie and P.A. Beerel. Implicit enumeration of strongly connected components and an application to formal verification. IEEE Transactions on Computer-Aided Design, 19(10):1225– 1230, October 2000. [10] R. Bloem, H. N. Gabow, and F. Somenzi. An algorithm for strongly connected component analysis in n log n symbolic steps. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 37–54. Springer-Verlag, November 2000. LNCS 1954. [11] O. Kupferman and M. Y. Vardi. 
Freedom, weakness, and determinism: From linear-time to branching-time. In Proc. 13th IEEE Symposium on Logic in Computer Science, June 1998. [12] R. Bloem, K. Ravi, and F. Somenzi. Efficient decision procedures for model checking of linear time logic properties. In N. Halbwachs and D. Peled, editors, Eleventh Conference on Computer Aided Verification (CAV’99), pages 222–235. Springer-Verlag, Berlin, 1999. LNCS 1633. [13] C. Wang, R. Bloem, G. D. Hachtel, K. Ravi, and F. Somenzi. Divide and compose: SCC refinement for language emptiness. In International Conference on Concurrency Theory (CONCUR01), pages 456–471, Berlin, August 2001. Springer-Verlag. LNCS 2154. [14] H. Cho, G. D. Hachtel, E. Macii, M. Poncino, and F. Somenzi. A state space decomposition algorithm for approximate FSM traversal. In Proceedings of the European Conference on Design Automation, pages 137–141, Paris, France, February 1994. [15] R. K. Brayton et al. VIS: A system for verification and synthesis. In T. Henzinger and R. Alur, editors, Eighth Conference on Computer Aided Verification (CAV’96), pages 428– 432. Springer-Verlag, Rutgers University, 1996. LNCS 1102. [16] D. L. Dill. What’s between simulation and formal verification? In Proceedings of the Design Automation Conference, pages 328–329, San Francisco, CA, June 1998. [17] K. Ravi, R. Bloem, and F. Somenzi. A comparative study of symbolic algorithms for the computation of fair cycles. In W. A. Hunt, Jr. and S. D. Johnson, editors, Formal Methods in Computer Aided Design, pages 143–160. Springer-Verlag, November 2000. LNCS 1954.
Relating Multi-step and Single-Step Microprocessor Correctness Statements

Mark D. Aagaard¹, Nancy A. Day², and Meng Lou²

¹ Electrical and Computer Engr., University of Waterloo
[email protected]
² Computer Science, University of Waterloo, Waterloo, ON, Canada
[email protected], [email protected]

Abstract. A diverse collection of correctness statements has been proposed and used in microprocessor verification efforts. Correctness statements have evolved from criteria that match a single step of the implementation against the specification to seemingly looser, multi-step criteria. In this paper, we formally verify conditions under which two categories of multi-step correctness statements logically imply single-step correctness statements. The first category of correctness statements compares flushed states of the implementation; the second compares states that are able to retire instructions. Our results are applicable to superscalar implementations, which fetch or retire multiple instructions in a single step.
1 Introduction
Microprocessor verification efforts usually compare a state-machine description of a microarchitectural-level implementation against an Instruction Set Architecture (ISA). The correctness statement describes the intended relationship between the implementation and the specification ISA. In early verification efforts, correctness statements were based on Milner's pointwise notion of simulation: a commuting diagram that says for any step the implementation takes, the specification must take a corresponding step [15]. Pipelining and other optimizations increased the gap between the behaviour of the implementation and the specification, making it more difficult to show that an individual implementation step corresponds to a specification step. In a seminal paper, Burch and Dill proposed constructing abstraction functions automatically by flushing pipelines [5]. Their correctness criterion compares each step of the implementation against the specification by flushing the implementation. As verification efforts have tackled complexities such as out-of-order execution and interrupts, correctness statements have evolved from single-step criteria to seemingly looser, multi-step criteria. Sawada and Hunt [16], Hosabettu et al. [10], Jones et al. [14], and Arons and Pnueli [3] check that the implementation corresponds with the specification only at flushed implementation states, i.e., states with no in-flight instructions. Fox and Harman [7] compare the implementation and specification only at states where an instruction is about to retire. Berezin et al. [4] compare multi-step implementation traces that fetch a single instruction against a single step of the specification.

M.D. Aagaard and J.W. O'Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 123–141, 2002. © Springer-Verlag Berlin Heidelberg 2002
The change from single-step to multi-step correctness statements raises the questions "are they proving the same relationship?", "are there correct machines that satisfy multi-step correctness but not single-step?", and finally, "are there bugs that are undetectable with multi-step correctness statements?" To explore the relationship between multi-step and single-step correctness statements, we build on the Microbox framework [1,2] for microprocessor correctness statements. Using Microbox, Aagaard et al. [2] described and compared thirty-seven correctness statements from twenty-nine papers. Day et al. [6] mechanized Microbox in the HOL theorem prover [8] and verified a partial order relationship between correctness statements. Day et al. proved that tighter criteria, such as single-step correctness statements, logically imply looser criteria, such as testing only flushed states of the implementation. In this paper we examine whether some reverse implications hold, i.e., if a multi-step correctness statement is verified, is there a single-step statement that also holds? Section 2 provides background material on Microbox. Section 3 characterizes the microprocessor-specific functions used in the correctness statements. Section 4 describes the relationship between multi-step correctness that compares flushed states and single-step correctness using Burch-Dill style flushing. The main result of the section is Theorem 3, which says that comparing flushed states of the implementation against the specification is equivalent to using flushing to compare each step of the implementation, for deterministic specifications with no internal state. We also provide an example of a non-deterministic specification and implementation that satisfy the multi-step correctness statement, but not the single-step statement with flushing. Section 5 describes the relationship between multi-step correctness at retirement and single-step correctness.
Theorem 6 says that comparing the implementation to the specification when instructions are about to retire is equivalent to checking each step of the implementation. Our results are applicable to superscalar implementations, which can fetch and retire multiple instructions in a single step. Sections 6 and 7 consider the relevance of our results to existing verification efforts and summarize the paper.
2 The Microbox Framework

The Microbox framework uses four parameters to characterize a correctness statement: alignment, match, implementation execution, and specification execution. Alignment is the method used to align the traces of the implementation and specification (Section 2.1). Match is the relation established between aligned states in the implementation and specification traces (Section 2.2). Implementation execution and specification execution describe the type of state machines used, either deterministic or non-deterministic. The Microbox framework provides a list of options for each of these parameters based on verification efforts discussed in the literature (Table 1). By choosing options for the parameters, Microbox can produce a wide variety of correctness statements. Each correctness statement contains a base case and an induction step. The base cases deal with initial states and are generally quite straightforward, so we concentrate on the induction steps. The alignment parameter determines the overall form of the induction clause. For each alignment option, Microbox defines a correctness statement for an other match (O), non-deterministic implementation (N), and non-deterministic specification
Table 1. Options for correctness statement parameters

alignment               match                impl. execution         spec. execution
(F) Flushpoint          (O) Other            (N) Non-deterministic   (N) Non-deterministic
(W) Will-retire         (A) Abstraction      (D) Deterministic       (D) Deterministic
(M) Must-issue          (U) Flushing
(S) Stuttering          (E) Equality
(I) Informed-pointwise  (R) Refinement Map
(P) Pointwise
Example: IUND = informed-pointwise alignment (I), flushing match (U), non-deterministic implementation (N) and deterministic specification (D).
(N). Correctness statements for different match and execution options are generated by substitutions into the *ONN definitions. In Microbox, both the specification and implementation machines have program memories as part of their state, and so do not take instructions as inputs. Invariants, which limit the state space of a machine to reachable states or an over-approximation of reachable states, are encoded in the set of states for a machine. Table 2 summarizes the notation.

Table 2. State-machine notation
N             Next-state relation
N^k(q, q′)    q′ is reachable from q in k steps of N
n             Next-state function
π             External state projection function
qi =π qs      Externally visible equivalence: πi(qi) = πs(qs)
Identifiers are subscripted with “s” for specification and “i” for implementation.
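The four-parameter naming scheme can be decoded mechanically; a small sketch (the option tables are transcribed from Table 1, but the function expand is ours, for illustration only):

```python
ALIGNMENT = {"F": "flushpoint", "W": "will-retire", "M": "must-issue",
             "S": "stuttering", "I": "informed-pointwise", "P": "pointwise"}
MATCH = {"O": "other", "A": "abstraction", "U": "flushing",
         "E": "equality", "R": "refinement map"}
EXECUTION = {"N": "non-deterministic", "D": "deterministic"}

def expand(name):
    """Decode a four-letter correctness-statement name such as 'IUND'."""
    a, m, i, s = name
    return (ALIGNMENT[a], MATCH[m],
            EXECUTION[i] + " implementation", EXECUTION[s] + " specification")

print(expand("IUND"))
```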
In Sections 2.1 and 2.2, we describe the alignment and match options that are relevant to this paper. In Section 2.3, we characterize the correctness statements in terms of the type of synchronization used, i.e., at fetch or at retire. In Section 2.4, we describe the partial order relationships between these correctness statements.

2.1 Alignment

Alignment describes which states in the execution trace are tested for matching. Pointwise alignment (P, Definition 1) is the classic commuting diagram. Informed-pointwise (I, Definition 2) is a variation of pointwise alignment suitable for superscalar implementations, which allows the implementation to inform the correctness statement of the number of specification steps to take. In practice, numInstr is instantiated with either the number of instructions that were fetched (numFetch) or the number of instructions that were retired (numRetire), depending on the synchronization method (Section 2.3).
Definition 1 (Pointwise induction clause: PONN).
PONN(R, Ni, Ns) ≡ ∀ qi, qi′. ∀ qs. ∃ qs′. (Ni(qi, qi′) ∧ R(qi, qs)) =⇒ (Ns(qs, qs′) ∧ R(qi′, qs′))
Definition 2 (Informed-pointwise induction clause: IONN).
IONN(numInstr, R, Ni, Ns) ≡ ∀ qi, qi′. ∀ qs. ∃ qs′. let j = numInstr(qi, qi′) in (Ni(qi, qi′) ∧ R(qi, qs)) =⇒ (Ns^j(qs, qs′) ∧ R(qi′, qs′))
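On small explicit-state machines, the pointwise clause of Definition 1 can be checked by brute-force enumeration. A sketch, with relations represented as sets of pairs (the two-state toggle machines are illustrative assumptions, not from the paper):

```python
from itertools import product

def ponn(R, Ni, Ns, states_i, states_s):
    """Pointwise induction clause (Definition 1): every implementation step
    from a matched state is answered by one specification step that
    re-establishes the match R."""
    for qi, qi2, qs in product(states_i, states_i, states_s):
        if (qi, qi2) in Ni and (qi, qs) in R:
            if not any((qs, qs2) in Ns and (qi2, qs2) in R
                       for qs2 in states_s):
                return False
    return True

# Toy machines: a 2-state toggle implementation mirroring its specification.
states = {0, 1}
toggle = {(0, 1), (1, 0)}
match = {(q, q) for q in states}  # identity as the match relation
print(ponn(match, toggle, toggle, states, states))  # True
```

The informed-pointwise clause (Definition 2) differs only in letting the specification take numInstr(qi, qi′) steps instead of exactly one.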
Will-retire alignment (W, Definition 3) compares the implementation and specification whenever the implementation is ready to retire instructions. The implementation retires one or more instructions in the first step of the trace and continues until it is ready to retire again.

Definition 3 (Will-retire induction clause: WONN).
WONN(numRetire, willRetire, R, Ni, Ns) ≡
∀ qi0, qi1, . . . , qik. ∀ qs. ∃ qs′. let r = numRetire(qi0, qi1) in
( Ni(qi0, qi1) ∧ willRetire(qi0, qi1)
  ∧ (∀ j ∈ 1 . . . k−1. Ni(qij, qij+1) ∧ ¬willRetire(qij, qij+1))
  ∧ (∃ qi′. Ni(qik, qi′) ∧ willRetire(qik, qi′))
  ∧ R(qi0, qs) )
=⇒ ( Ns^r(qs, qs′) ∧ R(qik, qs′) )
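The windows quantified over in Definition 3 can be extracted from an explicit trace by cutting it at the retiring steps; a sketch (the integer trace and the willRetire predicate are illustrative assumptions):

```python
def retire_windows(trace, will_retire):
    """Cut a trace at its retiring steps: each window starts with a step
    that retires and runs until the machine is about to retire again."""
    points = [j for j in range(len(trace) - 1)
              if will_retire(trace[j], trace[j + 1])]
    return [trace[a:b + 1] for a, b in zip(points, points[1:])]

# Toy trace over integers; a step "retires" when it reaches a multiple of 3.
trace = list(range(10))
windows = retire_windows(trace, lambda q, q2: q2 % 3 == 0)
print(windows)  # [[2, 3, 4, 5], [5, 6, 7, 8]]
```

Each window [qi0, ..., qik] has the WONN shape: its first step retires, its interior steps do not, and its last state has a retiring successor.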
Flushpoint alignment (F, Definition 4) compares flushed states of the implementation against the specification. It says that if there is a trace between flushed implementation states, then there must exist a trace in the specification between a pair of states that match the flushed implementation states.
Definition 4 (Flushpoint induction clause: FONN).
FONN(isFlushed, R, Ni, Ns) ≡
∀ qi, qi′, qs. ∃ qs′.
( isFlushed(qi) ∧ (∃ k. Ni^k(qi, qi′)) ∧ isFlushed(qi′) ∧ R(qi, qs) )
=⇒ ( (∃ j. Ns^j(qs, qs′)) ∧ R(qi′, qs′) )

2.2 Match
Instantiations for the match parameter are relations between an implementation state qi and specification state qs that mean “qi is a correct representation of qs ”. Figure 1 shows the match options that are relevant to this paper and the partial order on the options.
[Fig. 1. Options and partial order for the match parameter: (O) a general relation R(qi, qs); (U) the flushing match, flush(qi) =π qs; (E) the equality match, qi =π qs.]
An other match (O) is any relation between implementation and specification states. The flushing match (U) uses a flushing function to compute an implementation state that should be externally equivalent to a specification state. The equality match (E) requires that the implementation and specification states be externally equivalent.

2.3 Synchronization
In the implementation projection function (πi ), there are two common representations of the program counter: the address of the next instruction to fetch, and the address of the next instruction to retire. We refer to the first option as synchronization at fetch and the second option as synchronization at retirement. For a projection function to be sensible, the program counter, register file, and other state components must all reflect the same point in the execution of a program. Synchronization at fetch is only appropriate when applied to a flushed implementation state.
Hence, synchronization at fetch can only be used with the flushing match, which flushes the implementation before applying the projection function, and with flushpoint alignment. With synchronization at retirement, the register file and program counter always correspond to the same point of execution. The function numInstr is instantiated with numFetch for synchronization at fetch and numRetire for synchronization at retirement. Instructions in the shadow of a mispredicted branch or an exception should not be executed by the specification, and so do not count toward the number of instructions fetched. The function numRetire counts the number of instructions that retire. Every instruction that retires should be executed by the specification.

2.4 Correctness Space
Figure 2 shows the partial order of logical implication for the first two parameters of correctness statements (alignment and match). For the third and fourth parameters, the execution of the implementation and specification machines, it is easy to consider deterministic as an instance of non-deterministic, thereby providing the ordering amongst these options. The alignment parameter iF (Definition 5, informed-flushpoint, a common instance of F) will be introduced in Section 4.1. The non-shaded lines show the natural ordering amongst correctness criteria, which was verified in Day et al. [6]. In this paper, we verify the arrows in the shaded boxes, which proves equivalences between the correctness statements. In Section 4.2, we verify that informed-flushpoint with the equality match, for deterministic specifications with no internal state, is equivalent to informed-pointwise with flushing (iFE ⇐⇒ IU). The dashed line between iFE and IU indicates that this implication holds only for deterministic specifications. In Section 5, we prove that will-retire equality is equivalent to informed-pointwise equality (WE ⇐⇒ IE). In related work, we verified that the multi-step correctness statement of must-issue with the flushing match, in which the implementation takes some number of stalled steps followed by one step where it fetches an instruction, is equivalent to the single-step informed-pointwise flushing (IUNN) [6].
3 Characterization of Microprocessor-Specific Functions
The relationships between correctness statements are based on microprocessor-specific functions and relations (Table 3) behaving appropriately. In this section, we describe the required conditions on these functions. These conditions often appear as lemmas in verification efforts. To apply our results to a particular specification and implementation, these conditions would have to be verified. Conditions 1–5 are for synchronization at fetch. Conditions 6–8 are for synchronization at retirement.

3.1 Fetching and Flushing Conditions
Condition 1 states that numFetch is zero in a step if-and-only-if doesFetch is false.
[Fig. 2. Partial order for correctness statements. Alignment options (Flushpoint F, Informed Flushpoint iF, Will-Retire W, Informed Pointwise I, Pointwise P) are crossed with match options (Other O, Flushing U, Equality E), giving statements FO, FU, FE, iFO, iFU, iFE, WO, WE, IO, IU, IE, PO, PU, PE.]

Table 3. Microprocessor-specific functions
doesFetch(qi, qi′)    true if an instruction is fetched in a step
numFetch(qi, qi′)     returns the number of instructions fetched in a step
willRetire(qi, qi′)   true if an instruction is retired in a step
numRetire(qi, qi′)    returns the number of instructions retired in a step
flush(qi)             flushes qi, i.e., completes the execution of any in-flight instructions
isFlushed(qi)         true if a state is flushed
Condition 1 (numFetch and doesFetch)
numFetch_doesFetch(numFetch, doesFetch) ≡ ∀ qi, qi′. (numFetch(qi, qi′) = 0) ⇐⇒ ¬doesFetch(qi, qi′)

We characterize the required behaviour of a flushing function with Conditions 2 and 3. Condition 2 relates the function flush to the predicate isFlushed and says that if a state qi is flushed, then flushing qi returns qi, i.e., flush is the identity function for a flushed state.
Condition 2 (isFlushed and flush)
isFlushed_flush(isFlushed, flush) ≡ ∀ qi. isFlushed(qi) =⇒ (flush(qi) = qi)

Condition 3 says that if an instruction is not fetched in a step where the implementation transitions from qi to qi′, then flushing qi′ returns the same state as flushing qi. Equivalently, flushing a stalled state results in the same state as allowing the machine to take one (unproductive) step and then flushing.

Condition 3 (doesFetch and flush)
doesFetch_flush(doesFetch, flush, Ni) ≡ ∀ qi, qi′. (¬doesFetch(qi, qi′) ∧ Ni(qi, qi′)) =⇒ (flush(qi) = flush(qi′))

Conditions 2 and 3 are the only restrictions on flushing functions. The construction of the flushing function is up to the verifier. The most common method for constructing a flushing function was originated by Burch and Dill [5]. They iterate a deterministic implementation's next-state function without fetching new instructions. Another method for constructing flushing functions was developed by Hosabettu et al. [10], who define completion functions for each stage in the pipeline and then compose the completion functions to create a flushing function. We also need a reachability condition and a liveness condition. Condition 4 says that for any implementation state, qi, there exists a trace from a flushed implementation state to qi.

Condition 4 (Past Flush)
past_flush(isFlushed, Ni) ≡ ∀ qi. ∃ k, qi0. isFlushed(qi0) ∧ Ni^k(qi0, qi)

Condition 5 says that from any state, the implementation can reach a flushed state by passing through a series of states where it does not fetch an instruction. If the implementation does not already have the ability to prevent instructions from being fetched, then flushing circuitry must be added.

Condition 5 (Eventually Flushed)
eventually_flushed(isFlushed, doesFetch, Ni) ≡ ∀ qi. ∃ k, qi0, . . . , qik. qi = qi0 ∧ (∀ j < k. Ni(qij, qij+1) ∧ ¬doesFetch(qij, qij+1)) ∧ isFlushed(qik)

3.2 Retiring and Projection Conditions
Condition 6 states that numRetire is zero for an implementation step if-and-only-if willRetire is false. It is the dual of Condition 1 for synchronization at retirement.

Condition 6 (numRetire and willRetire)
numRetire_willRetire(numRetire, willRetire) ≡ ∀ qi, qi′. (numRetire(qi, qi′) = 0) ⇐⇒ ¬willRetire(qi, qi′)
Condition 7, relating the predicate willRetire to the implementation projection function πi appropriate for synchronization at retirement, is the dual of Condition 3. Condition 7 says that if an instruction is not retired in a step where the implementation transitions from qi to qi′, then the projections of qi and qi′ are equivalent.

Condition 7 (willRetire and πi)
willRetire_pi(willRetire, πi, Ni) ≡ ∀ qi, qi′. (¬willRetire(qi, qi′) ∧ Ni(qi, qi′)) =⇒ (πi(qi) = πi(qi′))

Condition 8 is a liveness condition. The condition says that from any implementation state, it is possible to reach a state that can retire an instruction.

Condition 8 (Eventually Retires)
eventually_retires(willRetire, Ni) ≡ ∀ qi. ∃ k, qi′, qi′′. Ni^k(qi, qi′) ∧ Ni(qi′, qi′′) ∧ willRetire(qi′, qi′′)
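For a finite toy machine, conditions of this kind can be discharged by exhaustive enumeration rather than proved as lemmas. A sketch for Conditions 2 and 3 (the three-state pipeline model is an illustrative assumption, not from the paper):

```python
from itertools import product

def flush_conditions_hold(states, Ni, flush, is_flushed, does_fetch):
    """Exhaustively check Conditions 2 and 3 on an explicit-state machine."""
    # Condition 2: flush is the identity on flushed states.
    cond2 = all(flush(q) == q for q in states if is_flushed(q))
    # Condition 3: a step that fetches nothing does not change the flush result.
    cond3 = all(flush(q) == flush(q2)
                for q, q2 in product(states, states)
                if (q, q2) in Ni and not does_fetch(q, q2))
    return cond2 and cond3

# Toy pipeline whose state is the number of in-flight instructions (0..2).
states = {0, 1, 2}
Ni = {(0, 1), (1, 2), (1, 0), (2, 1)}  # fetch one, or drain one
flush = lambda q: 0                     # draining everything reaches state 0
is_flushed = lambda q: q == 0
does_fetch = lambda q, q2: q2 > q       # fetching grows the in-flight count
print(flush_conditions_hold(states, Ni, flush, is_flushed, does_fetch))  # True
```

Conditions 6 and 7 admit the same style of check with willRetire and πi in place of doesFetch and flush.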
4 Flushpoint Equality and Informed-Pointwise Flushing
In this section, we discuss the relationship between the two correctness statements, flushpoint equality (FE) and informed-pointwise flushing (IU), which use synchronization at fetch. IU is Burch-Dill style flushing. In Section 4.1, we introduce a commonly used version of flushpoint alignment, which we call informed-flushpoint (iF). In Section 4.2, we prove that informed-flushpoint equality and informed-pointwise flushing are equivalent for a deterministic specification with no internal state (iFEND ⇐⇒ IUND, Theorem 3). A similar relationship does not exist between flushpoint equality (FE) and informed-pointwise flushing (IU), because flushpoint alignment does not constrain the number of steps in the specification trace. In Section 4.3, we describe an implementation and a non-deterministic specification that satisfy informed-flushpoint equality but not informed-pointwise flushing, thereby providing a counterexample to iFENN =⇒ IUNN.
4.1 Informed-Flushpoint
Flushpoint alignment (Definition 4) does not impose any constraints on the number of specification steps taken. However, in most verification efforts that use flushpoint alignment (e.g., [16,10,14]), the number of steps in the specification trace is the number of instructions executed in the implementation trace. We introduce informed-flushpoint alignment (iF) to capture this common practice. Informed-flushpoint is most commonly used with the equality match, as shown in Definition 5. We overload numFetch to return the total number of instructions fetched in either a sequence of implementation steps or in a single implementation step.
M.D. Aagaard, N.A. Day, and M. Lou
Definition 5 (Informed-Flushpoint Equality induction clause: iFENN).
iFENN(isFlushed, numFetch, πi, πs, Ni, Ns) ≡
  ∀ qi0, qi1, ..., qik. ∀ qs. ∃ qs'.
    let f = numFetch(qi0, ..., qik) in
      (  isFlushed(qi0)
       ∧ (∀ j < k. Ni(qij, qij+1))
       ∧ isFlushed(qik)
       ∧ qi0 =π qs )
      =⇒ ( Ns^f(qs, qs') ∧ qik =π qs' )
4.2 Informed-Flushpoint and Informed-Pointwise: Deterministic Specification
In this section, we prove Theorem 3, which says that, for a deterministic specification without internal state (i.e., Ns is ns and πs is the identity), informed-flushpoint with the equality match (iFEND, an instantiation of Definition 5) is equivalent to informed-pointwise with the flushing match (IUND, an instantiation of Definition 2). Showing that the single-step informed-pointwise correctness statement logically implies multi-step informed-flushpoint (IUND =⇒ iFEND) is straightforward by induction. Here we describe the more difficult reverse direction (iFEND =⇒ IUND). First, we introduce an intermediate point, which we call iFflush (Definition 6), and prove iFEND =⇒ iFflush (Theorem 1). Second, we show iFflush =⇒ IUND (Theorem 2).

Definition 6 (iFflush).
iFflush(isFlushed, numFetch, flush, πi, πs, Ni, Ns) ≡
  ∀ qi0, qi1, ..., qik. ∀ qs. ∃ qs'.
    let f = numFetch(qi0, ..., qik) in
      (  isFlushed(qi0)
       ∧ (∀ j < k. Ni(qij, qij+1))
       ∧ qi0 =π qs )
      =⇒ ( Ns^f(qs, qs') ∧ flush(qik) =π qs' )
Definition 6 is the same as informed-flushpoint (Definition 5), except that the final states must satisfy the flushing match rather than be externally equivalent.

Theorem 1 (iFENN =⇒ iFflush).
∀ isFlushed, numFetch, doesFetch, flush, πi, πs, Ni, Ns.
  (  eventually_flushed(isFlushed, doesFetch, Ni)   — Condition 5
   ∧ doesFetch_flush(doesFetch, flush, Ni)          — Condition 3
   ∧ isFlushed_flush(isFlushed, flush)              — Condition 2
   ∧ numFetch_doesFetch(numFetch, doesFetch) )      — Condition 1
  =⇒ (  iFENN(isFlushed, numFetch, πi, πs, Ni, Ns)
      =⇒ iFflush(isFlushed, numFetch, flush, πi, πs, Ni, Ns) )
Relating Multi-step and Single-Step Microprocessor Correctness Statements
Figure 3 outlines the proof of iFENN =⇒ iFflush (Theorem 1). This theorem depends on conditions described in Section 3. We begin in Step 0 assuming the left and lower sides of the commuting diagram for iFflush. In Step 1, we extend the path from qik to a flushed state, qi◦, using the condition that the implementation can always reach a flushed state by taking steps that do not fetch instructions (Condition 5, eventually_flushed). In Step 2, we use the condition that flushing a state after taking a series of steps that do not fetch an instruction is the same as flushing the state at the beginning of the series (Condition 3, doesFetch_flush). In Step 3, we conclude that flushing qik results in qi◦ because flushing a flushed state has no effect (Condition 2, isFlushed_flush). In Step 4, we use the fact that iFENN holds for traces between flushed states to complete the commuting diagram. Condition 1, which relates numFetch and doesFetch, is needed to relate the number of steps in the specification traces.
[Figure: commuting diagrams for the five proof steps — Step 0 (assumptions); Step 1: using eventually_flushed; Step 2: using doesFetch_flush; Step 3: using isFlushed_flush; Step 4: using iFENN.]

Fig. 3. Steps in proof of iFENN =⇒ iFflush (Theorem 1)
In the second half of the proof of iFEND =⇒ IUND, we use iFflush to arrive at IUND (Theorem 2). The steps of the proof are outlined in Figure 4.
Theorem 2 (iFflush =⇒ IUND).
∀ isFlushed, numFetch, flush, πi, πs, Ni, ns.
  (  past_flush(isFlushed, Ni)   — Condition 4
   ∧ πs = (λx.x) )
  =⇒ (  iFflush(isFlushed, numFetch, flush, πi, πs, Ni, ns)
      =⇒ IUND(numFetch, flush, πi, πs, Ni, ns) )
[Figure: commuting diagrams for the four proof steps — Step 0 (assumptions); Step 1: using past_flush; Step 2: using iFflush twice; Step 3: IUND.]

Fig. 4. Steps in proof of iFflush =⇒ IUND (Theorem 2)
In Step 0 of Figure 4, we start with the left and lower edges of the IUND commuting diagram, leaving out πs because it is the identity function in this case. In Step 1, we extend the path from qi back to a flushed state, qi◦, using the condition that for any state there is always a previous flushed state (Condition 4, past_flush). In Step 2, we use iFflush to deduce the two commuting diagrams, both beginning at qi◦. Because the matching relationship is a function, and because the specification is deterministic, from these two commuting diagrams we can conclude IUND in Step 3.

We combine Theorem 1, specialized for a deterministic specification with no internal state; Theorem 2; and the result that IUND logically implies iFEND to conclude that iFEND is equivalent to IUND under the conditions listed in Section 3 (Theorem 3).

Theorem 3 (iFEND ⇐⇒ IUND).
∀ isFlushed, numFetch, doesFetch, flush, πi, πs, Ni, ns.
  (  eventually_flushed(isFlushed, doesFetch, Ni)   — Condition 5
   ∧ doesFetch_flush(doesFetch, flush, Ni)          — Condition 3
   ∧ isFlushed_flush(isFlushed, flush)              — Condition 2
   ∧ numFetch_doesFetch(numFetch, doesFetch)        — Condition 1
   ∧ past_flush(isFlushed, Ni)                      — Condition 4
   ∧ πs = (λx.x) )
  =⇒ (  iFEND(isFlushed, numFetch, πi, πs, Ni, ns)
      ⇐⇒ IUND(numFetch, flush, πi, πs, Ni, ns) )
4.3 Informed-Flushpoint and Informed-Pointwise: Non-Deterministic Specification Counterexample
In Section 4.2, we proved iFEND ⇐⇒ IUND. In this section, we illustrate that a non-deterministic specification paired with an implementation can satisfy iFENN without satisfying IUNN. Figure 5 is an example of a reasonable non-deterministic specification and a slightly strange, but arguably correct, implementation that satisfies iFENN but not IUNN. In the specification states (S1–S9), the letters in the top of the box represent instructions to execute. The lower part of the box lists completed instructions. In the implementation states (I1–I7), the middle shaded area holds in-flight instructions. States with no in-flight instructions are flushed. The larger, shaded arrows show the projection of the implementation states. In the step marked "X", the implementation kills its currently executing instruction "B" and fetches the instructions "C" and "D"; however, it reports fetching only one instruction. Figure 6 shows how the iFENN commuting diagram is satisfied for all possible paths between flushed implementation states. In all three cases, the length of the specification trace is the reported number of instructions fetched. Because there is a bug in the fetch mechanism, this is not the actual number of instructions fetched in Path 3. Figure 7 illustrates that IUNN does not hold for the implementation step "X".
[Figure: specification states S1–S9 and implementation states I1–I7; shaded arrows show the projection πi of implementation states; the step marked X kills in-flight instruction B and fetches C and D.]

Fig. 5. Specification and implementation of counterexample
[Figure: the iFENN commuting diagrams for Path 1, Path 2, and Path 3 between flushed implementation states.]

Fig. 6. iFENN paths of counterexample
[Figure: the failing IUNN commuting diagram for implementation step X.]

Fig. 7. IUNN path of counterexample
5 Will-Retire and Informed-Pointwise

The will-retire correctness statement (WONN, Definition 3) uses synchronization at retirement to compare an implementation trace that retires instructions only in the first step against one specification step. The implementation trace continues until it is ready to retire another instruction. The main result of this section is Theorem 6, which says that will-retire equality (WENN) is equivalent to informed-pointwise with equality (IENN, Definition 2 with the equality match).

The first insight in the proof that WENN is equivalent to IENN is the introduction of an alternative way of expressing WONN, which we call single-step will-retire (ssWONN, Definition 7). ssWONN decomposes WONN into two simpler, single-step properties based on whether the implementation will retire any instructions. As a single-step correctness statement, ssWONN is similar to informed-pointwise (IONN) in examining only a single step of the implementation. IONN and ssWONN are equivalent under Condition 6, numRetire_willRetire, which states that the function numRetire returns zero if-and-only-if willRetire is false (Theorem 4).

Definition 7 (Single-step will-retire induction clause: ssWONN).
ssWONN(numRetire, willRetire, R, Ni, Ns) ≡
  ∀ qi, qi'. ∀ qs.
    let r = numRetire(qi, qi') in
      ( Ni(qi, qi') ∧ R(qi, qs) )
      =⇒ (  (   willRetire(qi, qi') =⇒ ∃ qs'. Ns^r(qs, qs') ∧ R(qi', qs') )
          ∧ ( ¬ willRetire(qi, qi') =⇒ R(qi', qs) ) )

Theorem 4 (ssWONN ⇐⇒ IONN).
∀ numRetire, willRetire, R, Ni, Ns.
  numRetire_willRetire(numRetire, willRetire)   — Condition 6
  =⇒ (  ssWONN(numRetire, willRetire, R, Ni, Ns)
      ⇐⇒ IONN(numRetire, R, Ni, Ns) )

The next and more challenging step in the proof is to show that will-retire with the equality match is equivalent to the seemingly tighter single-step will-retire correctness statement (WENN ⇐⇒ ssWENN). Showing ssWENN =⇒ WENN is straightforward by induction. The other direction (WENN =⇒ ssWENN, Theorem 5) holds under Conditions 7 and 8.
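The shape of ssWONN can be checked by brute force on a tiny deterministic machine: the implementation counts micro-steps and retires one instruction every second step, while the specification counts retired instructions. This toy encoding (our own, not the paper's HOL theory) makes the two single-step cases of the definition concrete:

```python
# Toy machine: implementation state = number of micro-steps taken;
# specification state = number of retired instructions.
N = 10
Ni = lambda qi: qi + 1                 # implementation next-state
Ns = lambda qs: qs + 1                 # specification next-state
R  = lambda qi, qs: qi // 2 == qs      # matching relation
will_retire = lambda qi, qi2: qi2 % 2 == 0
num_retire  = lambda qi, qi2: 1 if will_retire(qi, qi2) else 0

def ssWONN_holds():
    # For every implementation step and every matching spec state:
    # if the step retires, the spec takes numRetire steps to a
    # matching state; if not, the spec state still matches unchanged.
    for qi in range(N):
        qi2 = Ni(qi)
        for qs in range(N):
            if not R(qi, qs):
                continue
            if will_retire(qi, qi2):
                qs2 = qs
                for _ in range(num_retire(qi, qi2)):
                    qs2 = Ns(qs2)
                if not R(qi2, qs2):
                    return False
            elif not R(qi2, qs):
                return False
    return True

assert ssWONN_holds()
```

Here Ns is deterministic, so the existential qs' in Definition 7 is realized simply by iterating Ns.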
Theorem 5 (WENN ⇐⇒ ssWENN).
∀ numRetire, willRetire, πi, πs, Ni, Ns.
  (  willRetire_pi(willRetire, πi, Ni)    — Condition 7
   ∧ eventually_retires(willRetire, Ni) ) — Condition 8
  =⇒ (  WENN(numRetire, willRetire, πi, πs, Ni, Ns)
      ⇐⇒ ssWENN(numRetire, willRetire, πi, πs, Ni, Ns) )
Figure 8 is an illustration of the proof of Theorem 5. In Step 0, we start with the left and lower sides of the commuting diagram for ssWENN. In Step 1, we use the eventually_retires condition (Condition 8) to reach the first future state, qi', that retires an instruction. In Step 2, we use the willRetire_pi condition (Condition 7) to conclude that the projections of qi and qi' are equal. In Step 3, we use WENN to complete the commuting diagram. Step 4 shows ssWENN, where the left case follows from Step 3 and the right case follows directly from Condition 7.
[Figure: commuting diagrams for the five proof steps — Step 0 (assumptions); Step 1: using eventually_retires; Step 2: using willRetire_pi; Step 3: using WENN; Step 4: left case from Step 3, right case from willRetire_pi.]

Fig. 8. Steps in proof of WENN =⇒ ssWENN (Theorem 5)
Theorem 6 (WENN ⇐⇒ IENN).
∀ numRetire, willRetire, πi, πs, Ni, Ns.
  (  willRetire_pi(willRetire, πi, Ni)              — Condition 7
   ∧ eventually_retires(willRetire, Ni)             — Condition 8
   ∧ numRetire_willRetire(numRetire, willRetire) )  — Condition 6
  =⇒ (  WENN(numRetire, willRetire, πi, πs, Ni, Ns)
      ⇐⇒ IENN(numRetire, πi, πs, Ni, Ns) )

By specializing R in Theorem 4 to the equality match, we are able to conclude that IENN is equivalent to ssWENN under Condition 6. Combining this specialization of Theorem 4 with Theorem 5, we conclude WENN ⇐⇒ IENN under Conditions 6, 7, and 8 (Theorem 6).
6 Relating Theory to Practice
We now consider the relevance of our results to existing microprocessor verification efforts that use multi-step correctness statements based on flushpoint and will-retire alignment. Using our theorems is contingent upon showing that the implementation satisfies the conditions in Section 3.

Sawada and Hunt [16] verified that a non-deterministic implementation with out-of-order retirement satisfies informed-flushpoint equality with a deterministic specification with no internal state (iFEND). Their verification strategy is to build an intermediate model with history variables, called the MAETT. From our result, they can now conclude that informed-pointwise flushing (IUND) also holds. In later work [17,18], they enhanced their implementation to support external interrupts, which led them to add non-determinism to their specification because of the problem of predicting how many instructions the implementation will have completed when an interrupt occurs. Because of the non-deterministic specification, we cannot conclude that pointwise flushing holds in this case.

Skakkebæk et al. [14,13] verify that a deterministic implementation with in-order retirement satisfies informed-flushpoint equality with a deterministic specification with no internal state (iFEDD). They build a non-deterministic intermediate model that computes the result of each instruction when it enters the machine and queues the result for later retirement. Because of our result, they are able to conclude that informed-pointwise flushing (IUDD) holds.

Hosabettu, Srivas, and Gopalakrishnan [10,11,12,9] prove that a deterministic out-of-order implementation satisfies informed-flushpoint equality with a deterministic specification with no internal state. They first prove informed-pointwise flushing (IUDD), then apply induction to prove informed-flushpoint equality (iFEDD). Because they use IUDD as a step toward iFEDD, there is no need for our result in this work.
Arons and Pnueli [3] use flushpoint alignment, not informed-flushpoint. Thus, our result is not applicable to their verification effort. Fox and Harman [7] use will-retire alignment for a deterministic implementation and specification where the match is projection of the implementation (WEDD). Based on the results of this paper, they can also conclude informed-pointwise equality (IEDD).
7 Conclusions
This paper contains three results. First, we prove that for deterministic specifications with no internal state, from multi-step informed-flushpoint equality one can conclude single-step informed-pointwise with the flushing match. Second, we provide a counterexample showing that, for non-deterministic specifications, informed-flushpoint equality does not always imply informed-pointwise with the flushing match. Third, we prove that a multi-step correctness statement based on synchronization at retirement with the equality match is equivalent to informed-pointwise with the equality match. Our results are applicable to superscalar implementations, which fetch or retire multiple instructions in a single step.

Our long-term goal in studying correctness statements abstractly is to determine decomposition strategies that will ease the verification effort. The proofs described in this paper have been mechanized in the HOL theorem prover. We have created a reusable theory of microprocessor correctness that allows the comparison and extension of existing verification efforts.

Acknowledgments. We thank Robert Jones of Intel and the reviewers for detailed comments on this paper. The authors are supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). Aagaard is supported in part by Intel Corporation.
References 1. M. D. Aagaard, B. Cook, N. A. Day, and R. B. Jones. A framework for microprocessor correctness statements. In CHARME, volume 2144 of LNCS, pages 433–448. Springer, 2001. 2. M. D. Aagaard, B. Cook, N. A. Day, and R. B. Jones. A framework for superscalar microprocessor correctness statements, 2002. To appear in Software Tools for Technology Transfer. 3. T. Arons and A. Pnueli. Verifying Tomasulo’s algorithm by refinement. In Int’l Conf. on VLSI Design, pages 92–99. IEEE Comp. Soc. Press, 1999. 4. S. Berezin, E. Clarke, A. Biere, and Y. Zhu. Verification of out-of-order processor designs using model checking and a light-weight completion function. Formal Methods in System Design, 20(2):159–186, March 2002. 5. J. Burch and D. Dill. Automatic verification of pipelined microprocessor control. In CAV, volume 818 of LNCS, pages 68–80. Springer, 1994. 6. N. A. Day, M. D. Aagaard, and M. Lou. A mechanized theory for microprocessor correctness statements. Technical Report 2002-11, U. of Waterloo, Dept. of Comp. Sci., 2002. 7. A. Fox and N. Harman. Algebraic models of correctness for microprocessors. Formal Aspects in Computing, 12(4):298–312, 2000. 8. M. Gordon and T. Melham. Introduction to HOL: A Theorem Proving Environment for Higher Order Logic. Cambridge University Press, 1993. 9. R. Hosabettu, G. Gopalakrishnan, and M. Srivas. Verifying advanced microarchitectures that support speculation and exceptions. In CAV, volume 1855 of LNCS, pages 521–537. Springer, 2000. 10. R. Hosabettu, M. Srivas, and G. Gopalakrishnan. Decomposing the proof of correctness of pipelined microprocessors. In CAV, volume 1427 of LNCS, pages 122–134. Springer, 1998. 11. R. Hosabettu, M. Srivas, and G. Gopalakrishnan. Proof of correctness of a processor with reorder buffer using the completion functions approach. In CAV, volume 1633 of LNCS, pages 47–59. Springer, 1999.
12. R. Hosabettu, M. Srivas, and G. Gopalakrishnan. Proof of correctness of a processor without reorder buffer using the completion functions approach. In CHARME, volume 1703 of LNCS, pages 8–22. Springer, 1999. 13. R. Jones, J. Skakkebæk, and D. Dill. Reducing manual abstraction in formal verification of out-of-order execution. In FMCAD, volume 1522 of LNCS, pages 2–17. Springer, 1998. 14. R. B. Jones, J. U. Skakkebæk, and D. L. Dill. Formal verification of out-of-order execution using incremental flushing. Formal Methods in System Design, 20(2):39–58, March 2002. 15. R. Milner. An algebraic definition of simulation between programs. In Joint Conference on Artificial Intelligence, pages 481–489. British Computer Society, 1971. 16. J. Sawada and W. Hunt. Trace table based approach for pipelined microprocessor verification. In CAV, volume 1254 of LNCS, pages 364–375. Springer, 1997. 17. J. Sawada and W. Hunt. Processor verification with precise exceptions and speculative execution. In CAV, volume 1427 of LNCS, pages 135–146. Springer, 1998. 18. J. Sawada and W. Hunt. Results of the verification of a complex pipelined machine model. In CHARME, volume 1703 of LNCS, pages 313–316. Springer, 1999.
Modeling and Verification of Out-of-Order Microprocessors in UCLID

Shuvendu K. Lahiri², Sanjit A. Seshia¹, and Randal E. Bryant¹,²

¹ School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
{Randy.Bryant, Sanjit.Seshia}@cs.cmu.edu
² Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA
[email protected]

Abstract. In this paper, we describe the modeling and verification of out-of-order microprocessors with unbounded resources using an expressive, yet efficiently decidable, quantifier-free fragment of first order logic. This logic includes uninterpreted functions, equality, ordering, constrained lambda expressions, and counter arithmetic. UCLID is a tool for specifying and verifying systems expressed in this logic. The paper makes two main contributions. First, we show that the logic is expressive enough to model components found in most modern microprocessors, independent of their actual sizes. Second, we demonstrate UCLID's verification capabilities, ranging from full automation for bounded property checking to a high degree of automation in proving restricted classes of invariants. These techniques, coupled with a counterexample generation facility, are useful in establishing correctness of processor designs. We demonstrate UCLID's methods using a case study of a synthetic model of an out-of-order processor where all the invariants were proved automatically.
1 Introduction
Present-day microprocessors are complex systems, incorporating features such as pipelining, speculative, out-of-order execution, register renaming, exceptions, and multi-level caching. Several formal verification techniques, including symbolic model checking [4,12], theorem proving [17,2,11], and approaches based on decision procedures for the logic of equality with uninterpreted functions [8,6,20], have been used to verify such microarchitectures. In previous work, Bryant et al. [5,6] presented PEUF, a logic of positive equality with uninterpreted functions. PEUF has been shown to be expressive enough to model pipelined processors and also has a very efficient decision procedure based on Boolean techniques. Lahiri et al. [13] demonstrate the use of this technique for the verification of the superscalar, deeply pipelined MCORE¹ processor, by finding bugs in the real
¹ MCORE is a registered trademark of Motorola Inc.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 142–159, 2002. c Springer-Verlag Berlin Heidelberg 2002
design. However, this approach cannot handle models with unbounded queues and reorder buffers, which limits its applicability to processors with bounded resources. To overcome this problem, we have generalized PEUF to yield a more expressive logic called CLU [7], a logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions. UCLID is a system for modeling and verifying systems expressed in CLU. It can be used to model a large class of infinite-state systems, including those with unbounded resources, while retaining the advantage of having an efficient decision procedure.

In this paper, we explore the application of UCLID to out-of-order processor designs. First, we illustrate that CLU is expressive enough to model different processor components with unbounded resources. This includes components with infinite resources (e.g., infinite memory) or resources of finite but arbitrary size (e.g., a circular queue of arbitrary length). Next, we show that UCLID has useful verification capabilities that build upon the efficient decision procedure and a counterexample generator. We demonstrate the successful use of bounded property checking, i.e., checking an invariant on all states of the system that are reachable within a fixed (bounded) number of steps from the reset state. The efficiency of UCLID's decision procedure enables a completely automatic exploration of a much larger state space than is possible with other techniques that can model infinite-state systems. UCLID can also be used for inductive invariant checking, for a restricted class of invariants of the form ∀x1 ... ∀xk. Ψ(x1, ..., xk), where Ψ(x1, ..., xk) is a CLU formula. In our experience, this class of invariants is expressive enough to specify most invariants about out-of-order processors of unbounded size, and these are the invariants we have encountered most frequently with UCLID.
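A typical invariant of the universally quantified class just described asserts something about every entry of an unbounded structure. A small Python sketch, using a hypothetical finite encoding of a reorder buffer (UCLID would keep the buffer unbounded and the index symbolic):

```python
def rob_invariant(state):
    # ∀ i. head <= i < tail  =>  valid(entry[i])
    # Every in-flight entry between head and tail must be valid.
    return all(state["valid"][i]
               for i in range(state["head"], state["tail"]))

state = {"head": 1, "tail": 3, "valid": [False, True, True, False]}
assert rob_invariant(state)
```

In UCLID the quantified index is not enumerated; the body of the invariant is checked as a CLU formula for an arbitrary index.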
As a case study, we present the modeling and verification of a synthetic out-of-order processor, OOO, with ALU instructions, infinite memory, arbitrarily large data words, and an unbounded-size reorder buffer (first with an infinite-size queue, and then with a finite but arbitrary-size circular buffer). Bounded property checking was used initially to debug the design. The processor model was then formally verified by inductive invariant checking, by showing that it refines an instruction set architecture (ISA) model. The highlight of the verification was that all the invariants were proved fully automatically. Moreover, very little manual effort was needed to come up with auxiliary invariants, which were inferred fairly easily from counterexample traces.

Related Work. Jhala and McMillan [12] use compositional model checking to verify a microarchitecture with speculative, out-of-order execution, load-store buffers, and branch prediction. Apart from requiring the user to write down the refinement maps and case splits to prove lemmas, the rest of the verification is performed automatically using Cadence SMV. The out-of-order processor we verify is similar in complexity to the model of Tomasulo's algorithm that McMillan verified using compositional reasoning [14]. The author acknowledges that the proof is not automatic and that substantial human effort is required to decompose the proof into lemmas about small components of the state. The main advantage of using model checking is in automatically computing the strongest invariants for the most general state of the system; in our case, once the invariants have
been figured out by the user, the rest of the proof is fully automatic and no manual decomposition is required. Berezin et al. [4] use special data structures called reference files, along with other symmetry-reduction techniques, to manually reduce a generic out-of-order execution model to a finite model, which is verified using a model checker. The manual guidance involved in decomposing the model limits the applicability of this approach to small, simple designs. Sawada and Hunt [17] use a theorem-proving methodology to verify the correctness of microarchitectures with out-of-order execution, load-store instructions, and speculation. They use a trace-table-based intermediate representation called MAETT to record both committed and in-flight instructions. This method requires extensive user guidance during the verification process, first in discovering invariants, and then in proving them using the ACL2 theorem prover. The authors claim that automating the proofs of the lemmas would make the verification easier. Automating proof is central to our work, and we illustrate it with the verification of an out-of-order unit. Hosabettu et al. [10,11] use a completion-function approach to verify advanced microarchitectures, which include reorder buffers, using the PVS [16] theorem prover. The method requires user ingenuity to construct a completion function for the different instruction types and then compose the different completion functions to obtain the abstraction function. The approach further requires extensive user guidance in discharging the proofs. Although the out-of-order unit we verify is of similar complexity to that in their original work [10], we shall show that the invariants required in our verification are few and simple, and that they are discharged in a completely automatic manner. Arons et al. [1,2] also verify out-of-order processors using refinement within the PVS theorem prover.
Our verification scheme is very similar to their approach, as it also uses prediction to establish the correspondence with a sequential ISA. The model verified in [1] is similar in complexity to ours, but once again substantial manual assistance is required to prove the invariants using PVS. Skakkebæk et al. [19] manually transform an out-of-order model of a processor to an intermediate in-order model, and use incremental flushing to show the correspondence of the intermediate model with the ISA model. The manual component in the entire process is significant, both in constructing the intermediate model and in proving correctness. Velev [20] has verified an out-of-order execution unit exploiting positive equality and rewrite rules. The model does not have register renaming and considers only bounded (although very large) resources.

The rest of the paper is organized as follows. We begin by describing the UCLID system in Section 2; this section outlines the underlying logic CLU in Section 2.1 and the verification techniques supported in the UCLID framework in Section 2.2. Modeling primitives for various processor components are described in Section 3. Section 4 describes in detail the case study of the verification of an out-of-order processor unit (OOO), including a description of the processor, all the invariants required, and the use of bounded property checking and inductive invariant checking for its verification. We conclude in Section 5.
2 The UCLID System
2.1 The CLU Logic
The logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions (CLU) is a generalization of the Logic of Equality with Uninterpreted Functions (EUF) [8] with constrained lambda expressions, ordering, and interpreted functions for the successor (succ) and predecessor (pred) operations, which we refer to as counter arithmetic.

bool-expr ::= true | false | ¬bool-expr | (bool-expr ∧ bool-expr)
            | (int-expr = int-expr) | (int-expr < int-expr)
            | predicate-expr(int-expr, ..., int-expr)
int-expr ::= int-var | ITE(bool-expr, int-expr, int-expr)
           | succ(int-expr) | pred(int-expr)
           | function-expr(int-expr, ..., int-expr)
predicate-expr ::= predicate-symbol | λ int-var, ..., int-var . bool-expr
function-expr ::= function-symbol | λ int-var, ..., int-var . int-expr
Fig. 1. CLU Syntax.
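The grammar can be given a direct executable reading. Below is a small Python interpreter for a fragment of CLU (succ, pred, ITE, equality, ordering, and uninterpreted functions supplied through an interpretation). This is an illustrative sketch with our own constructor names, not the UCLID implementation:

```python
from dataclasses import dataclass

@dataclass
class Succ:
    e: object

@dataclass
class Pred:
    e: object

@dataclass
class ITE:
    c: object
    t: object
    f: object

@dataclass
class App:        # application of an uninterpreted function symbol
    name: str
    args: tuple

@dataclass
class Eq:
    l: object
    r: object

@dataclass
class Lt:
    l: object
    r: object

def eval_int(e, interp):
    # Evaluate an integer expression relative to interpretation `interp`.
    if isinstance(e, int):
        return e
    if isinstance(e, Succ):
        return eval_int(e.e, interp) + 1
    if isinstance(e, Pred):
        return eval_int(e.e, interp) - 1
    if isinstance(e, ITE):
        branch = e.t if eval_bool(e.c, interp) else e.f
        return eval_int(branch, interp)
    if isinstance(e, App):
        return interp[e.name](*[eval_int(a, interp) for a in e.args])
    raise TypeError(f"not an integer expression: {e!r}")

def eval_bool(e, interp):
    # Evaluate a Boolean expression relative to `interp`.
    if isinstance(e, Eq):
        return eval_int(e.l, interp) == eval_int(e.r, interp)
    if isinstance(e, Lt):
        return eval_int(e.l, interp) < eval_int(e.r, interp)
    raise TypeError(f"not a Boolean expression: {e!r}")

# ITE(f(3) < 10, succ(0), pred(0)) under the interpretation f(x) = 2x
interp = {"f": lambda x: 2 * x}
expr = ITE(Lt(App("f", (3,)), 10), Succ(0), Pred(0))
assert eval_int(expr, interp) == 1
```

The interpretation dictionary plays the role of I below: it assigns meanings to the uninterpreted function symbols, while succ, pred, ITE, =, and < are fixed.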
Expressions in CLU describe means of computing four different types of values. Boolean expressions, also termed formulas, yield true or false. Integer expressions, also referred to as terms, yield integer values. Predicate expressions denote functions from integers to Boolean values. Function expressions denote functions from integers to integers. Figure 1 summarizes the expression syntax.

The simplest Boolean expressions are true and false. Boolean expressions can also be formed by comparing two integer expressions for equality or for ordering, by applying a predicate expression to a list of integer expressions, or by combining Boolean expressions using Boolean connectives. Integer expressions can be integer variables², or can be formed by applying a function expression (including the interpreted functions succ and pred) to a list of integer expressions, or by applying the ITE (for "if-then-else") operator. The ITE operator chooses between two values based on a Boolean control value, i.e., ITE(true, x1, x2) yields x1 while ITE(false, x1, x2) yields x2. Function (predicate) expressions can be either function (predicate) symbols, representing uninterpreted functions (predicates), or lambda expressions, defining the value of the function (predicate) as an integer (Boolean) expression containing references to a set of argument variables. We will omit parentheses for function and predicate symbols with zero arguments, writing a instead of a(). An integer variable x is said to be bound in expression E when it occurs inside a lambda expression for which x is one of the argument variables. We say that an
² Integer variables are used only as the formal arguments of lambda expressions.
expression is well-formed when it contains no unbound variables. The value of a well-formed expression in CLU is defined relative to an interpretation I of the function and predicate symbols. Let Z denote the set of integers. Interpretation I assigns to each function symbol of arity k a function from Z^k to Z, and to each predicate symbol of arity k a function from Z^k to {true, false}. The value of a well-formed expression E in CLU relative to an interpretation I, [E]_I, is defined inductively over the expression structure. We shall omit the details in this paper. A well-formed formula F is true under interpretation I if [F]_I is true. It is valid when it is true under all possible interpretations. It can easily be shown that CLU has a small-model property, i.e., a CLU formula F_clu is valid iff F_clu is valid over all interpretations whose domain size equals the number of distinct terms in F_clu. The decision procedure for CLU checks the validity of a well-formed formula F by translating it to an equivalent propositional formula. The structure of the formula is exploited for positive equality [5] to dramatically reduce the number of interpretations to consider, yielding a very efficient decision procedure for CLU [7]. For brevity, we will not discuss the decision procedure in this paper.
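The small-model property suggests a naive but complete check: enumerate all interpretations over a domain whose size equals the number of distinct terms. The sketch below handles only equality formulas over variables (no function symbols) and is purely illustrative, not the actual decision procedure:

```python
from itertools import product

def is_valid(formula, variables, num_terms):
    # `formula` is a Python predicate over a variable-assignment dict.
    # By the small-model property, a domain of size `num_terms`
    # (the number of distinct terms in the formula) suffices.
    domain = range(num_terms)
    for values in product(domain, repeat=len(variables)):
        if not formula(dict(zip(variables, values))):
            return False   # found a falsifying interpretation
    return True

# (x = y) => (y = x) has two distinct terms, so domain size 2 suffices.
symm = lambda a: (a["x"] != a["y"]) or (a["y"] == a["x"])
assert is_valid(symm, ["x", "y"], 2)

# x = y alone is not valid.
assert not is_valid(lambda a: a["x"] == a["y"], ["x", "y"], 2)
```

UCLID's actual procedure instead translates to a propositional formula and exploits positive equality to avoid most of this enumeration.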
Verification with UCLID
The UCLID specification language can be used to specify a state machine, where the state variables either have primitive types — Boolean, enumerated, or (unbounded) integer — or are functions of integer arguments that evaluate to these primitive types. The concept of using functions or predicates as state variables has previously been used in Cadence SMV, and in theorem provers as well. A system is specified in UCLID by describing initial-state and next-state expressions for each state variable. The UCLID verification engine comprises a symbolic simulator that can be “configured” for different kinds of verification tasks, and a decision procedure for CLU. We shall illustrate the use of two particular techniques for the verification of out-of-order processors. The reader is referred to [7] for more details.

1. Bounded property checking: The system is symbolically simulated for a fixed number of steps starting from the reset state. At each step, the decision procedure is invoked to check the validity of some safety property. If the property fails, we can generate a counterexample trace from the reset state.

2. Inductive invariant checking: The system is started from the most general state which satisfies the invariants and then simulated for one step. The invariants are checked at the next step to ensure that the state transition preserves them. If the invariants hold for the reset state, and the invariants are preserved by the transition function, then the invariants hold for any reachable state of the model. As we shall see in the next section, we can express an interesting class of invariants with universal quantifiers and can automatically decide that the transition function preserves the invariants.
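The control flow of bounded property checking can be sketched as follows. This is an illustrative stand-in, not UCLID itself: `bounded_check`, the toy counter model, and the property are all hypothetical, replacing the symbolic simulator and CLU decision procedure with concrete Python functions.

```python
# Hypothetical sketch of bounded property checking: simulate k steps
# from the reset state, checking the safety property at each step.

def bounded_check(init_state, next_state, prop, k):
    """Returns (True, None) if prop held for k steps from reset,
    else (False, trace) where trace is a counterexample prefix."""
    state, trace = init_state, [init_state]
    for _ in range(k + 1):
        if not prop(state):
            return False, trace        # counterexample trace from reset
        state = next_state(state)
        trace.append(state)
    return True, None

# Toy model: a counter that must stay below 5 -- the property fails.
ok, trace = bounded_check(0, lambda s: s + 1, lambda s: s < 5, 10)
```

In UCLID the states are symbolic expressions and `prop` is discharged by the decision procedure; here the same loop runs over a concrete model for illustration.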
Modeling and Verification of Out-of-Order Microprocessors in UCLID
Counterexample Generation. One of the useful features of UCLID is its ability to generate counterexample traces, much like a model checker. A counterexample to a CLU formula Fclu is a partial interpretation I of the various function and predicate symbols in the formula. If the system has been symbolically simulated for k steps, then the interpretation I generated above can be applied to the expressions at each step, thereby yielding a complete counterexample trace of k steps. Counterexample generation is useful both in bounded property checking, to discover bugs in the design, and in inductive invariant checking, for adding more auxiliary invariants.

Invariant Checking and Quantifiers. The logic of CLU has been restricted to be quantifier-free; hence a well-formed formula in this logic can be decided for validity using the small-model property of CLU. Although this restriction is not severe in the modeling of the out-of-order processors we consider, the need for quantifiers becomes apparent when UCLID is used for invariant checking. The invariants we encounter are frequently of the form ∀x1∀x2 . . . ∀xk Φ(x1, . . . , xk), where x1, . . . , xk are integer variables free in the CLU formula Φ(x1, . . . , xk). To prove that such an invariant is actually preserved by the state transition function, we need to decide the validity of formulas of the form:

∀x1 . . . ∀xm Ψ(x1, . . . , xm) =⇒ ∀y1 . . . ∀yk Φ(y1, . . . , yk)   (1)
where Ψ(x1, . . . , xm) and Φ(y1, . . . , yk) are CLU formulas, and x1 . . . xm and y1 . . . yk are free in Ψ(x1, . . . , xm) and Φ(y1, . . . , yk) respectively. In general, the problem of checking the validity of first-order formulas of the form (1) with uninterpreted functions is undecidable [9]. Note that this class of formulas cannot be expressed in CLU, since CLU is a quantifier-free logic. However, UCLID has a preprocessor for formulas of the form (1), which translates them to a CLU formula that is more conservative than the original formula, i.e., if the CLU formula is valid then the original formula is valid. As we shall demonstrate, this has proved very effective for automatically checking the class of invariants encountered in our verification of out-of-order processors. We employ a very simple heuristic to convert formulas of the form (1) to a CLU formula. First, the universal quantifiers to the right of the implication in (1) are removed by skolemization to yield the following formula, which is equivalent to the formula in (1):

∀x1 . . . ∀xm Ψ(x1, . . . , xm) =⇒ Φ(ȳ1, . . . , ȳk)   (2)
where ȳ1, . . . , ȳk are fresh function symbols of arity 0. Second, as in deductive verification, we instantiate x1 . . . xm with concrete terms, and the universal quantifiers to the left of the implication are replaced by a finite conjunction over these concrete terms. The resulting formula is a CLU formula whose validity implies the validity of (1). The set of terms over which to instantiate the antecedent is chosen as follows. Let T(Fclu) be the set of all terms (integer expressions) which occur in a CLU expression Fclu. For each bound variable xi in ∀x1 . . . ∀xm Ψ(x1, . . . , xm), we denote F^j_{xi} = { f | f is a function or predicate symbol and xi occurs as the j-th
argument to f in Ψ(x1, . . . , xm) }. Further, for each function or predicate symbol f which occurs in Ψ(x1, . . . , xm), denote G^f_k = { T | T ∈ T(Φ) and T appears as the k-th argument to f in Φ(ȳ1, . . . , ȳk) }. The set of terms over which each bound variable xi is instantiated is given by A_{xi} = ∪_j { T | T ∈ G^f_j for some f ∈ F^j_{xi} }. Finally, Ψ(x1, . . . , xm) is instantiated over all the terms in the Cartesian product A_{x1} × A_{x2} × . . . × A_{xm}.

For example, consider the following quantified formula:

∀x1∀x2. f(x1, x2) = g(x2, x1) =⇒ ∀y. f(h2(y), h1(y)) = g(h1(y), h2(y))

where Ψ ≡ f(x1, x2) = g(x2, x1) and Φ ≡ f(h2(ȳ), h1(ȳ)) = g(h1(ȳ), h2(ȳ)). In this case, F^1_{x1} = {f}, F^2_{x1} = {g} and F^1_{x2} = {g}, F^2_{x2} = {f}. Similarly, G^f_1 = {h2(ȳ)}, G^f_2 = {h1(ȳ)} and G^g_1 = {h1(ȳ)}, G^g_2 = {h2(ȳ)}. Finally, A_{x1} = {h2(ȳ)} and A_{x2} = {h1(ȳ)}. Hence the bound variables x1, x2 are instantiated over {h2(ȳ)} and {h1(ȳ)} respectively, and the CLU formula becomes:

f(h2(ȳ), h1(ȳ)) = g(h1(ȳ), h2(ȳ)) =⇒ f(h2(ȳ), h1(ȳ)) = g(h1(ȳ), h2(ȳ))

which is valid. It is easy to see that this method can cause a blowup exponential in the number of bound variables in ∀x1 . . . ∀xm Ψ(x1, . . . , xm). However, our experience shows that the invariants we normally consider have very few bound variables, which the decision procedure for UCLID can handle. More importantly, we will demonstrate in Section 4.2 that this simple translation to a CLU formula lets us decide many formulas of the form (1).
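The instantiation heuristic can be prototyped directly. Below is a rough sketch using our own representation (not UCLID's): applications are (symbol, arguments...) tuples, argument positions are 0-based, and the string "h2(Y)" stands for the term h2(ȳ). It is checked against the worked example above.

```python
# Sketch of the instantiation heuristic: compute the sets A[x] of terms
# over which each bound variable of the antecedent is instantiated.

def instantiation_sets(psi_apps, phi_apps, bound_vars):
    # F[x][j] = symbols f such that x occurs as j-th argument of f in Psi
    F = {x: {} for x in bound_vars}
    for (f, *args) in psi_apps:
        for j, a in enumerate(args):
            if a in bound_vars:
                F[a].setdefault(j, set()).add(f)
    # G[f][j] = terms appearing as j-th argument of f in Phi
    G = {}
    for (f, *args) in phi_apps:
        for j, t in enumerate(args):
            G.setdefault(f, {}).setdefault(j, set()).add(t)
    # A[x] = union over j of terms some f in F[x][j] takes as j-th argument
    A = {}
    for x in bound_vars:
        A[x] = set()
        for j, fs in F[x].items():
            for f in fs:
                A[x] |= G.get(f, {}).get(j, set())
    return A

# Psi: f(x1, x2) = g(x2, x1);  Phi: f(h2(Y), h1(Y)) = g(h1(Y), h2(Y))
psi = [("f", "x1", "x2"), ("g", "x2", "x1")]
phi = [("f", "h2(Y)", "h1(Y)"), ("g", "h1(Y)", "h2(Y)")]
A = instantiation_sets(psi, phi, ["x1", "x2"])
```

On the example this yields A[x1] = {h2(ȳ)} and A[x2] = {h1(ȳ)}, matching the sets derived above.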
3 Modeling Components of Microprocessors
This section presents techniques for modeling structures commonly found in modern superscalar processor designs. Primitive constructs have been drawn from a wide spectrum of industrial processor designs, including those of the MIPS R10000, PowerPC 620, and Pentium Pro [18].

3.1 Terms, Uninterpreted Functions, and Data Abstraction
Microprocessors are described using the standard term-level modeling primitives [17,12,21], where data words and bit vectors are abstracted with terms, and functional units are abstracted with uninterpreted functions.

3.2 Memories
In this section, we look at a few different formulations of memories found in processors and show how lambda notation offers a very natural way of modeling them.

Indexed Memories. Data memory and the register file are examples of indexed memories. The operations supported by this form of memory are read and
write. At any point in system operation, an indexed memory is represented by a function expression M denoting a mapping from addresses to data values. The initial state of the memory is given by an uninterpreted function symbol m0, which denotes an arbitrary memory state. The effect of a write operation with integer expressions A and D denoting the address and data values yields a function expression M′:

M′ = λ addr . ITE(addr = A, D, M(addr))

where M(addr) denotes a read from the memory M at address addr.

Content Addressable Memories. Register rename units and Translation Lookaside Buffers (TLBs) are examples of Content Addressable Memories (CAMs), which store associations between keys and data. We represent a CAM as a pair C = ⟨C.data, C.present⟩, where C.present is a predicate expression such that C.present(k) is true for any key k that is stored in the CAM, and C.data is a function expression such that C.data(k) yields the data associated with key k, assuming the key is present. The next-state components of a CAM for the different operations are shown in Figure 2.

Operation       | C′.present                           | C′.data
Insert(C, K, D) | λ key . (key = K) ∨ C.present(key)   | λ key . ITE(key = K, D, C.data(key))
Delete(C, K)    | λ key . ¬(key = K) ∧ C.present(key)  | C.data

Fig. 2. CAM operations
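The lambda formulation maps directly onto higher-order functions: a memory is a function from addresses to data, and a write returns a new function. The following Python sketch is an illustration of that idea, not UCLID syntax; the symbolic initial state m0 is mimicked with a tagged tuple.

```python
# write: M' = lambda addr. ITE(addr = A, D, M(addr))
def write(M, A, D):
    return lambda addr: D if addr == A else M(addr)

# Initial state: an arbitrary "uninterpreted" memory m0.
m0 = lambda addr: ("m0", addr)
m1 = write(m0, 4, "x")     # m1(4) reads "x"; other addresses read m0

# CAM operations, following Fig. 2: a CAM is a (present, data) pair.
def cam_insert(present, data, K, D):
    return (lambda k: k == K or present(k),
            lambda k: D if k == K else data(k))

def cam_delete(present, data, K):
    return (lambda k: k != K and present(k), data)
```

Each update builds a new closure over the old one, mirroring how the lambda expressions nest as the symbolic simulation proceeds.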
Simultaneous-update arrays. Many structures in processors, such as reorder buffers and reservation stations, snoop on the result bus to update an arbitrary number of entries in the array at a single instant. At any point in time, the entry at index i in M can be updated with data D(i) if the predicate P(i) is satisfied. The next state of the array is denoted as:

M′ = λ i . ITE(P(i), D(i), M(i))

Note that an arbitrary subset of entries in the array can be updated at any time.

3.3 Queues and FIFO Buffers
Processors which employ out-of-order execution or prefetching use a variety of queues in the microarchitecture: instruction buffers, reorder buffers, queues for deferring store instructions to memory, and load queues to hold load instructions that suffer a cache miss are found in most modern processors.

Queues. A finite circular queue of arbitrary length can be modeled by augmenting a CAM with two pointers to the head and the tail of the queue.
Insertion (push) of data takes place only at the tail of the queue, and deletion (pop) takes place only at the head. Thus a circular queue can be modeled as a record Q = ⟨Q.data, Q.present, Q.head, Q.tail⟩. Q.data and Q.present are defined exactly as in Section 3.2. Q.head is the index of the head of the queue, and Q.tail is the index of the tail (the next insertion point) of the queue. Let the symbolic constants s and e represent the start and end points of the array over which the circular queue is implemented. The queue is empty when Q.head = Q.tail and Q.present(Q.head) = false. The queue is full when Q.head = Q.tail and Q.present(Q.head) = true. To model the effect of succ and pred modulo a certain integer, we define the modulo increment and decrement functions succ[s,e] and pred[s,e] as follows:

succ[s,e] := λ i . ITE(i = e, s, succ(i))
pred[s,e] := λ i . ITE(i = s, e, pred(i))

Popping a data item from Q returns a new queue Q′ whose components have the values:

Q′.head = succ[s,e](Q.head)
Q′.tail = Q.tail
Q′.present = λ i . ¬(i = Q.head) ∧ Q.present(i)
Q′.data = Q.data

Pushing a data item X into Q returns a new queue Q′ where:

Q′.head = Q.head
Q′.tail = succ[s,e](Q.tail)
Q′.present = λ i . (i = Q.tail) ∨ Q.present(i)
Q′.data = λ i . ITE(i = Q.tail, X, Q.data(i))

This formulation of a queue is used when the index into the queue is used as a key in the system. The reorder buffers in processors follow this formulation, because the index in the reorder buffer uniquely identifies the instruction at that index. It is easy to see that for the case when succ[s,e] = succ and pred[s,e] = pred, we obtain an unbounded infinite queue; Q.present would be redundant in that situation.

FIFO Buffers. An alternate formulation of queues, in which the index in the queue is not used as a key (normally referred to as FIFO buffers), is also found in processors. Instruction buffers and load buffers are examples of this form of queue. Every time an entry is dequeued, the entire content of the queue is shifted by one place towards the head of the queue. If the symbolic constant max denotes the maximum length of the queue, then the queue is full when (Q.tail = max) and empty when (Q.tail = Q.head). The other operations on the queue are given below.

Operation  | Q′.head | Q′.tail      | Q′.data
Push(Q, X) | Q.head  | succ(Q.tail) | λ i . ITE(i = Q.tail, X, Q.data(i))
Pop(Q)     | Q.head  | pred(Q.tail) | λ i . Q.data(succ(i))
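The circular-queue definitions translate directly into executable form. The following illustrative Python sketch uses dicts instead of lambdas for the data and present components so updates are easy to inspect; it is our own rendering, not the UCLID model text. s and e are the start and end indices of the underlying array.

```python
# Sketch of the circular queue with modulo successor.

def succ_mod(i, s, e):
    """succ[s,e] := lambda i. ITE(i = e, s, succ(i))"""
    return s if i == e else i + 1

def push(q, x, s, e):
    data = dict(q["data"]);       data[q["tail"]] = x
    present = dict(q["present"]); present[q["tail"]] = True
    return {"head": q["head"], "tail": succ_mod(q["tail"], s, e),
            "data": data, "present": present}

def pop(q, s, e):
    present = dict(q["present"]); present[q["head"]] = False
    return {"head": succ_mod(q["head"], s, e), "tail": q["tail"],
            "data": q["data"], "present": present}

def is_empty(q):
    return q["head"] == q["tail"] and not q["present"].get(q["head"], False)

def is_full(q):
    return q["head"] == q["tail"] and q["present"].get(q["head"], False)
```

Note how pop only clears the present bit and advances the head; the data map is untouched, exactly as in the Q′ definitions above.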
4 OOO: A Synthetic Out-of-Order Processor
OOO is a simple, non-speculative, out-of-order execution unit with unbounded resources, depicted in Figure 3. The only instructions permitted are arithmetic and logical (ALU) instructions with two source operands and one destination operand. As shown in Figure 3, an instruction is read from program memory,
Fig. 3. OOO: An Out-of-order execution unit.
decoded, and dispatched to the end of the reorder buffer, which is modeled as an infinite queue. Instructions with ready operands can execute out of order. Finally, an instruction is retired (the program state updated) once it is at the head of the reorder buffer. On each step, the system nondeterministically chooses to either dispatch a new instruction, execute an instruction, or retire an instruction. The register file is modeled as an infinite memory indexed by register ID. Each entry of the register file has a bit reg.valid, a value reg.val, and a tag reg.tag. If the reg.valid bit is true, then reg.val contains a valid value; otherwise, reg.tag holds the tag of the most recent instruction that will write to this register. The reorder buffer has two pointers: rob.head, which points to the oldest instruction in the reorder buffer, and rob.tail, where a newly dispatched instruction is added. The index of an entry in the reorder buffer serves as its tag. Each entry in the reorder buffer has a valid bit rob.valid indicating whether the instruction has finished execution. It has fields for the two operands, rob.src1val and rob.src2val. The bit rob.src1valid indicates whether the first operand is ready. If the first operand does not have valid data, rob.src1tag holds the tag of the instruction which will produce the operand data. There is a similar bit for the second operand. Each entry also contains the destination register identifier rob.dest and the result of the instruction rob.value to be written back. Finally, each entry stores the program counter (PC) of its instruction in rob.PC.
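The register-file and reorder-buffer fields above suggest the following sketch of a dispatch-time operand lookup. This is illustrative Python, not the UCLID model: plain dicts stand in for the term-level state, and the function name is our own.

```python
# Sketch: resolve one source operand at dispatch.
# Returns (True, value) if the operand is ready, or (False, tag) if the
# instruction must wait on the producer identified by `tag`.

def resolve_operand(reg, rob, src):
    if reg["valid"][src]:
        return True, reg["val"][src]       # register file holds the value
    tag = reg["tag"][src]                  # producer's rob index (its tag)
    if rob["valid"][tag]:
        return True, rob["value"][tag]     # producer already executed
    return False, tag                      # mark operand invalid, keep tag
```

The three branches correspond to the three dispatch cases described in the following paragraph: valid register, finished producer, and pending producer.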
When an instruction is dispatched, if a source register is marked valid in the register file, the contents of that register are copied into the corresponding operand field for the instruction in the reorder buffer and the operand is marked valid. If the instruction which will write to the source register has finished execution, then the corresponding operand field copies the result of that instruction and the operand is marked valid. Otherwise, the operand copies the tag present with the source register into its tag field and the operand is marked invalid. When an instruction executes, it updates its result and broadcasts it on the result bus, so that all other instructions in the reorder buffer that are waiting on it can update their operand fields. Finally, when a completed instruction reaches the head of the reorder buffer, it is retired. If the tag of the retiring instruction matches reg.tag for the destination register, the result of the instruction is written back into the destination register and that register is marked valid. Otherwise, the register file remains unchanged.

4.1 Bounded Property Checking of OOO
The verification of the OOO model was carried out in two phases. In the first phase, we applied bounded property checking to eliminate most of the bugs present in the original model of OOO. For instance, in the original model, a dispatched instruction only looked at the register file for its source operands. If the source was invalid, it was enqueued into the reorder buffer with its operand invalid. The counterexample trace demonstrated that an instruction in the rob can hold the tag of an already retired instruction. Bounded property checking serves not only to discover bugs, but also as a very useful semi-formal verification tool. We can argue that for a model with a circular rob of size k, all the states of OOO where (i) the length of the rob is anywhere between 0, . . . , k, (ii) the value of the control bits rob.src1valid, rob.src2valid, rob.valid are arbitrary for each entry in the rob, and (iii) the control bit of each register reg.valid is arbitrary, can be reached within 2k steps from the reset state; 2k steps are needed to reach the state where the rob is full and all the instructions in the rob have finished execution. Thus a property verified up to 2k steps gives a reasonable guarantee that it will always hold for an implementation of OOO where the number of rob entries is bounded by k. This also means that if there is a bug for a particular implementation of OOO where the size of the rob is bounded by k, then there is a high likelihood of the bug being detected within 2k steps of bounded property checking. In Figure 4, we demonstrate that the efficiency of the decision procedure enables UCLID to perform bounded property checking for a reasonable number of steps (up to 20), thus providing guarantees for OOO models with up to 10 rob entries. Figure 4 shows the results of checking the following two properties: 1. tag-consistency:
∀r1∀r2 [(¬(r1 = r2) ∧ ¬reg.valid(r1) ∧ ¬reg.valid(r2)) =⇒ ¬(reg.tag(r1) = reg.tag(r2))]
2. rf-rob: ∀r [¬reg.valid(r) =⇒ rob.dest(reg.tag(r)) = r]
The experiments were performed on a 1400 MHz Pentium with 256 MB memory running Linux. zChaff [15] was used as the SAT solver within UCLID. To compare the performance of UCLID's decision procedure, we also used SVC [3] to decide the CLU formulas. Although SVC's logic is more expressive than CLU (it includes bit-vectors and linear arithmetic in addition to the CLU constructs), the decision procedure for CLU outperforms SVC when checking the properties of interest in bounded property checking. The key point to note is that UCLID (coupled with powerful SAT solvers like zChaff) enables automatic exploration of much larger state spaces than was previously possible with other techniques.

Property         #steps  Fclu size  Fbool size  UCLID time  SVC time
tag-consistency      6        346       1203        0.87        0.22
                    10       2566      15290       10.80      233.18
                    14       7480      62504       76.55      > 5 hrs
                    18      15098     173612      542.30      > 1 day
                    20      19921     263413     1679.12      > 1 day
rf-rob              10       2308      14666       10.31      160.84
                    14       7392      61196       71.29      > 8 hrs
                    18      14982     171364      485.09      > 1 day
                    20      19791     260599      777.12      > 1 day
Fig. 4. Experimental results for bounded property checking with OOO. Here “#steps” indicates the number of steps of symbolic simulation, “Fclu” denotes the CLU formula obtained after the symbolic simulation, “Fbool” denotes the Boolean formula obtained by translating the CLU formula to a propositional formula with the decision procedure, and the “size” of a formula denotes the number of distinct nodes in the Directed Acyclic Graph (DAG) representing the formula. “UCLID time” is the time taken by the UCLID decision procedure and “SVC time” is the time taken by SVC 1.1 to decide the CLU formula. “tag-consistency” and “rf-rob” denote the properties to be verified.
4.2 Verification of the OOO Unit by Invariant Checking
We verify the OOO processor by proving a refinement map between OOO and a sequential Instruction Set Architecture (ISA) model. The ISA contains a program counter Isa.PC and a register file Isa.rf. The program counter Isa.PC is synchronized with the program counter of OOO. Isa.rf maintains the state of the register file when all the instructions in the reorder buffer (rob) have retired and the rob is empty. Every time an instruction I = (r1, r2, d, op) is decoded and put into the rob, the result of the instruction is computed and written to the destination register d in the ISA register file as follows:

Isa.rf[d] ← Alu(op, Isa.rf[r1], Isa.rf[r2])

where Alu is an uninterpreted function that abstracts the actual computation of the execution unit. To state the invariants for the OOO processor, we maintain some auxiliary state elements in addition to the state variables of the OOO unit. These structures are very similar to the auxiliary structures used by McMillan [14] and Arons [1] for
verifying the correctness of out-of-order processors. We maintain the following structures to reason about correctness.

1. A shadow reorder buffer, Shadow.rob, where each entry contains the correct values of the operands and the result. This structure is used to reason about the correctness of values in the rob entries. Shadow.rob is a triple (Shadow.value, Shadow.src1val, Shadow.src2val). Shadow.value(t) contains the correct value of rob.value(t) in the rob. Similarly, the other fields in Shadow.rob contain the correct values of the two data operands. When an instruction I = (r1, r2, d, op) is decoded, the Shadow.rob structure at rob.tail is updated as follows:

   Shadow.value(rob.tail) ← Alu(op, Isa.rf(r1), Isa.rf(r2))
   Shadow.src1val(rob.tail) ← Isa.rf(r1)
   Shadow.src2val(rob.tail) ← Isa.rf(r2)

2. A shadow program counter Shadow.PC, which points to the next instruction to be retired. It is incremented every time an instruction retires in OOO. Shadow.PC is used to prove that OOO retires instructions in sequential order.

Correctness criteria. Correctness is established by proving the following refinement map between the register file of the OOO unit and the ISA register file:

∀r.[reg.valid(r) =⇒ (Isa.rf(r) = reg.val(r))]   (ΨHa)

The lemma states that if a register is not the destination of any of the instructions in the rob, then its values in the OOO model and the ISA model are the same.

In-order Retirement. We also prove that OOO retires instructions in sequential order with the following lemma:

Shadow.PC = ITE(¬(rob.head = rob.tail), rob.PC(rob.head), PC)   (ΨPC)
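The shadow-structure update at decode can be sketched as follows. This is a hypothetical Python rendering of the assignments above (dicts for Shadow.rob and Isa.rf, a lambda for the uninterpreted Alu), not UCLID input.

```python
# Sketch: update the shadow rob entry at rob.tail when instruction
# I = (r1, r2, d, op) is decoded, then let the ISA model "execute" it.

def decode_shadow(shadow, isa_rf, tail, instr, alu):
    r1, r2, d, op = instr
    shadow["src1val"][tail] = isa_rf[r1]             # correct first operand
    shadow["src2val"][tail] = isa_rf[r2]             # correct second operand
    shadow["value"][tail] = alu(op, isa_rf[r1], isa_rf[r2])
    isa_rf[d] = shadow["value"][tail]                # Isa.rf[d] <- Alu(...)
    return shadow
```

Note the ordering: the shadow operand values are read from Isa.rf before the destination register is overwritten, so an instruction whose source equals its destination still records the pre-execution operand values.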
Note that this lemma is not required for establishing the correctness of OOO.

4.3 Invariants for the OOO Unit
We needed to come up with 12 additional invariants to establish the correctness of the OOO model, and we describe all of them in this section. The invariants broadly fall into three categories. The first four invariants, ΨA, ΨB1, ΨC, ΨD, are concerned with maintaining a consistent state within the OOO model. These invariants are required mainly due to the redundancy present in the OOO model. The invariants ΨE, ΨGa establish the correctness of data in the register file and rob. Lastly, the invariants ΨGb, ΨHc, ΨK1 are auxiliary invariants that were required to prove some of the invariants above. The invariant names have no special bearing, except that ΨB1, ΨE1, and ΨK1 indicate that there are similar invariants
for the second operand. For the sake of readability, we define ∀t.Φ(t) to be an abbreviation for ∀t.(rob.head ≤ t < rob.tail =⇒ Φ(t)).

Consistency Invariants. Invariant ΨA asserts that an instruction in the rob can execute only when both of its operands are ready:

∀t.[rob.valid(t) =⇒ (rob.src1valid(t) ∧ rob.src2valid(t))]   (ΨA)
For any rob entry t, if an operand is not valid, then the operand should hold the tag of an older entry which produces the data but has not yet completed execution. There is a similar invariant for the second operand.

∀t.[¬rob.src1valid(t) =⇒ (¬rob.valid(rob.src1tag(t)) ∧ (rob.head ≤ rob.src1tag(t) < t))]   (ΨB1)
Invariant ΨC claims that if the instruction at index t writes to a register r = rob.dest(t), then r cannot have valid data and the tag carried by r must be t or a newer entry:

∀t.[(t ≤ reg.tag(rob.dest(t)) < rob.tail) ∧ ¬reg.valid(rob.dest(t))]   (ΨC)
Invariant ΨD asserts that a register r can only be modified by an active instruction in the rob which has r as the destination register. ∀r.[¬reg.valid(r) =⇒ ((rob.dest(reg.tag(r)) = r) ∧ (rob.head ≤ reg.tag(r) < rob.tail))]
(ΨD )
All the above invariants restrict the state of the OOO model to be a reachable state. Note that there is no reference to any shadow structure, because the shadow structures only provide correctness of values in the OOO model. Correctness Invariants. Invariant ΨE1 establishes the constraint between the Shadow.src1val and rob.src1val. It states that if any rob entry has a valid operand, then it should be correct (equals the value in the Shadow structure for that entry). There is a similar invariant for the second operand. ∀t.[rob.src1valid(t) =⇒ (Shadow.src1val(t) = rob.src1val(t))]
(ΨE1 )
The following invariant asserts that if an rob entry has completed execution, then the result matches with the value in the shadow rob. ∀t.[rob.valid(t) =⇒ (Shadow.value(t) = rob.value(t))]
(ΨGa )
Auxiliary Invariants. We needed the following auxiliary invariants for the Shadow.src1val, Shadow.value and Isa.rf respectively to prove the previous invariants inductive. ∀t.[¬rob.src1valid(t) =⇒ Shadow.src1val(t) = Shadow.value(rob.src1tag(t))] (ΨK1 )
The above invariant asserts that the correct value of a data operand which is not ready is the result of the instruction which would produce the data. ∀t.[(Shadow.value(t) = Alu(rob.opcode(t), Shadow.src1val(t), Shadow.src2val(t)))]
(ΨGb )
The above invariant relates the result of execution to the correct value for any entry. ∀r.[¬reg.valid(r) =⇒ Isa.rf (r) = Shadow.value(reg.tag(r))]
(ΨHc )
The invariant ΨHc relates the value of a register r in the shadow register file with the result of the instruction which will write back to the register. Finally, we conjoin all the invariants to form the monolithic invariant Ψall. Since ∀ distributes over ∧, we pull the quantifiers out:

Ψall ≐ ∀r.∀t.[ΨA(t) ∧ ΨB1(t) ∧ ΨB2(t) ∧ ΨC(t) ∧ ΨD(r) ∧ ΨE1(t) ∧ ΨE2(t) ∧ ΨK1(t) ∧ ΨK2(t) ∧ ΨGa(t) ∧ ΨGb(t) ∧ ΨHa(r) ∧ ΨHc(r)]

Proof of the invariants. Some of the invariants were manually deduced from a failure trace produced by the counterexample generator. The most complicated among them were the invariants for the shadow register file and shadow rob entries. We spent two man-days coming up with all the invariants. The invariants were proved completely automatically, by translating them to a CLU formula with the method described in Section 2.2 and using the decision procedure for CLU to decide the formula. As we claimed earlier, the translation of quantified formulas to a CLU formula does not blow up the formula significantly, since most of the formulas have at most two bound variables. For instance, consider the proof of the invariant ΨHa as given in the UCLID framework:
Proof of the invariants. Some of the invariants were manually deduced from a failure trace from the counterexample generator. The most complicated among them were the invariants for the shadow register file and shadow rob entries. We spent two man-days to come up with all the invariants. The invariants were proved in a completely automatic way by automatically translating the invariants to a formula in CLU by the method described in Section 2.2, and using the decision procedure for CLU to decide the formula. As we claimed earlier, the translation of quantified formulas to a CLU formula does not blow up the formula in a huge way, since most of the formulas have at most two bound variables. For instance, consider the proof for the invariant ΨHa as given in the UCLID framework: decide(Inv_all => Inv_Ha_next(r1));
Here the invariant ΨHa (written above as Inv_Ha) is checked in the next state, given that Ψall (written as Inv_all) holds in the current state for all registers r and all tags t. There are only two bound variables, r and t, in the antecedent. Since all our invariants are of the form ∀r.Φ(r) or ∀t.Ψ(t), we had to consider at most two bound variables in the antecedent. The final proof script had 13 such formulas (one for each invariant) to be decided, and they were discharged automatically by UCLID in 76.44 sec. on a 1400 MHz Pentium IV Linux machine with 256 MB of memory. The memory requirement was less than 20 MB for the entire run. There is still a lot of scope for improvement in the decision procedure. The proof script consisted of the shadow structures, the definitions of the invariants of Section 4.3, and 13 lines of proof to establish all the invariants in the next state. To prove the lemma ΨPC for in-order retirement, we required two more auxiliary lemmas; nPC is an uninterpreted function giving the next sequential value of a program counter.

∀t.[(t > rob.head) =⇒ rob.PC(t) = nPC(rob.PC(t − 1))]   (ΨPC1)

[¬(rob.head = rob.tail) =⇒ PC = nPC(rob.PC(rob.tail − 1))]   (ΨPC2)
4.4 Using a Circular Reorder Buffer
The model verified in this section is somewhat unrealistic because of the infinite reorder buffer, which never wraps around. Most reorder buffer implementations use a finite circular queue, so tags are reused, unlike in the above model. Hence we re-did the verification using a model with a circular buffer of arbitrary size. We needed very little change to our original proof. First, the reorder buffer was modeled as a circular buffer with the modulo successor and predecessor functions defined in Section 3.3. Second, each rob entry had an additional field rob.present to indicate whether the entry holds a valid instruction, and to disambiguate between the rob being full or empty. Third, the

1. For all i s.t. a_{i,n} > 0:  a_{i,n} · xn ≤ bi − Σ_{j=1}^{n−1} a_{i,j} · xj
2. For all i s.t. a_{i,n} < 0:  Σ_{j=1}^{n−1} a_{i,j} · xj − bi ≤ −a_{i,n} · xn
3. For all i s.t. a_{i,n} = 0:  Σ_{j=1}^{n−1} a_{i,j} · xj ≤ bi
The first and second segments correspond to upper and lower bounds on xn, respectively. To eliminate xn, FM replaces each pair of lower and upper bound constraints L ≤ cl · xn and cu · xn ≤ U, where cl, cu > 0, with the new constraint cu · L ≤ cl · U. If, in the process of elimination, the procedure derives the constraint c ≤ 0 where c is a constant greater than 0, it terminates and indicates that the system is unsatisfiable. Note that it is possible that variables are not bounded from both ends. In this case it is possible to simplify the system by removing these variables together with all the constraints to which they belong. This can make other variables unbounded, so this simplification stage iterates until no such variables are left. The FM method can result in the worst case in m^(2^n) constraints, which is the reason that it is only suitable for a relatively small set of constraints with a small number of variables. There are various heuristics for choosing the elimination order. A standard greedy criterion gives priority to variables whose elimination produces the fewest new constraints.

Example 1. Consider the following formula:

ϕ = x1 − x2 ≤ 0 ∧ x1 − x3 ≤ 0 ∧ −x1 + 2x3 + x2 ≤ 0 ∧ −x3 ≤ −1
The following table demonstrates the elimination steps following the variable order x1, x2, x3:

Eliminated var | Lower bound         | Upper bound  | New constraint
x1             | −x1 + 2x3 + x2 ≤ 0 | x1 − x2 ≤ 0  | 2x3 ≤ 0
x1             | −x1 + 2x3 + x2 ≤ 0 | x1 − x3 ≤ 0  | x2 + x3 ≤ 0
x2             | no lower bound     |              |
x3             | −x3 ≤ −1           | 2x3 ≤ 0      | 2 ≤ 0

The last line results in a contradiction, which implies that this system is unsatisfiable. The extension of FM to handle a combination of strict (… > (cu − 1) · (cl − 1) for given lower and upper bounds on xn: L/cl ≤ xn ≤ U/cu, where cu and cl are integer constants. We refer the reader to [18] for a proof of this derivation. The dark shadow test is sound, but not complete. It is possible that the dark shadow is unsatisfiable, but there is still a solution to C. If the dark shadow is unsatisfiable, the omega test generates a set of constraints in DNF, called splinters, which define integral solutions outside the dark shadow (DNF is required because the solution area is not necessarily contiguous). The algorithm in Figure 1, adopted from [18], gives a rough idea of how this algorithm works. Unlike
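Example 1 can be replayed mechanically. The following is a minimal sketch of FM elimination over rational constraints Σ_j a_j · x_j ≤ b (no integer tightening, no elimination-order heuristic); the representation of a constraint as a (coefficient-dict, bound) pair is our own.

```python
# Minimal Fourier-Motzkin eliminator (a sketch, not the paper's code).
# A constraint sum_j a[j]*x_j <= b is a pair (coeffs_dict, bound).

from fractions import Fraction

def eliminate(constraints, x):
    """Project variable x out of the constraint set."""
    lowers, uppers, rest = [], [], []
    for (a, b) in constraints:
        c = a.get(x, 0)
        if c > 0:
            uppers.append((a, b, c))          # cu*x <= U form
        elif c < 0:
            lowers.append((a, b, -c))         # L <= cl*x form
        else:
            rest.append((a, b))
    for (al, bl, cl) in lowers:
        for (au, bu, cu) in uppers:
            # cu*L <= cl*U, rewritten back into sum-form
            a = {v: Fraction(cu) * al.get(v, 0) + Fraction(cl) * au.get(v, 0)
                 for v in (set(al) | set(au)) if v != x}
            rest.append((a, Fraction(cu) * bl + Fraction(cl) * bu))
    return rest

def unsat(constraints, order):
    """order must list every variable; after elimination each remaining
    constraint reads 0 <= b, so a contradiction is simply b < 0."""
    for x in order:
        constraints = eliminate(constraints, x)
    return any(b < 0 for (_, b) in constraints)
```

Running this on the four constraints of Example 1 with the order x1, x2, x3 derives the variable-free contradiction corresponding to 2 ≤ 0 and reports the system unsatisfiable.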
O. Strichman
the description above, this is a non-recursive version of the algorithm, which is therefore more suitable for reduction to SAT. Given a set of inequality constraints C and an integer variable xn that should be quantified out, it generates a logically equivalent formula that is a disjunction between two sub-formulas: the first does not contains xn , and the second contains xn as part of an equality constraint (which means it can be eliminated by simple substitution). % Input: ∃xn .C where xn is an integer variable and C is a conjunction of inequalities. R = false C = all constraints from C that do not involve xn . for each lower bound on xn : L ≤ cl · xn for each upper bound on xn : cu · xn ≤ U C = C ∧ (cu · L + (cu − 1)(cl − 1) ≤ cl · U ) Let cmax = max coefficient of xn in upper bound on xn . for (i = 0 to ((cmax − 1)(cl − 1) − 1)/cmax ) do R = R ∨ (C ∧ (L + i = cl · xn )). % C is the dark shadow. % R contains the splinters % Output: C ∨ (∃ integer xn s.t. R) Fig. 1. Existential quantification of an integer xn from a set of constraints C
In the next section we present a propositional version of the FM method and the omega test.
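For reference, the plain FM procedure over the rationals, as demonstrated in the elimination table above, can be sketched as follows (a minimal illustration of ours, not an optimized implementation; constraints are (coeffs, b) pairs for Σᵢ aᵢ·xᵢ ≤ b):

```python
def fm_eliminate(constraints, var):
    """One Fourier-Motzkin step: eliminate variable index `var` from a list
    of constraints (coeffs, b), each meaning sum(coeffs[i]*x[i]) <= b."""
    lowers, uppers, rest = [], [], []
    for c in constraints:
        coeff = c[0][var]
        (lowers if coeff < 0 else uppers if coeff > 0 else rest).append(c)
    # Combine every lower bound with every upper bound; in the combination
    # cu*(lower) + cl*(upper) the coefficient of `var` cancels to zero.
    for lc, lb in lowers:
        for uc, ub in uppers:
            cl, cu = -lc[var], uc[var]          # both positive
            coeffs = [cu * x + cl * y for x, y in zip(lc, uc)]
            rest.append((coeffs, cu * lb + cl * ub))
    return rest

def fm_unsat(constraints, order):
    """True iff the conjunction is unsatisfiable over the rationals."""
    for var in order:
        constraints = fm_eliminate(constraints, var)
    # All variables eliminated: the remaining constraints read 0 <= b.
    return any(b < 0 for _, b in constraints)
```

Running fm_unsat on the four constraints of the example above (x1 − x2 ≤ 0, x1 − x3 ≤ 0, −x1 + 2x3 + x2 ≤ 0, −x3 ≤ −1) with the order x1, x2, x3 reproduces the derivation in the table and reports unsatisfiability (the final constraint 0 ≤ −2 is the contradiction 2 ≤ 0).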
3
A Propositional Version of Fourier-Motzkin
Given a DLA formula ϕ, we now show how to derive a propositional formula ϕ′ s.t. ϕ′ is satisfiable iff ϕ is satisfiable. The procedure for generating ϕ′ emulates the FM method.

1. Normalize ϕ:
   - Rewrite equalities as conjunctions of inequalities.
   - Transform ϕ into Negation Normal Form (negations are allowed only over atomic constraints).
   - Eliminate negations by reversing inequality signs.
2. Encode each inequality i with a Boolean variable ei. Let ϕ′ denote the encoded formula.
3. - Perform FM elimination on the set of all constraints in ϕ, while assigning new Boolean variables to the newly generated constraints.
   - At each elimination step, for every pair of constraints ei, ej that result in the new constraint ek, add the constraint ei ∧ ej → ek to ϕ′.
   - If ek represents a contradiction (e.g., 1 ≤ 0), replace ek by false.

We refer to this procedure from here on as Boolean Fourier-Motzkin (BFM).
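Step 3 can be sketched end-to-end (an illustrative sketch of ours; in the full procedure the Boolean structure of ϕ′ is kept alongside). Boolean variables are integer ids, and the id 0 stands for false:

```python
from itertools import product

def bfm(constraints, order):
    """Boolean Fourier-Motzkin sketch. `constraints` maps a Boolean variable
    id e_i to (coeffs, b) for sum(coeffs[i]*x[i]) <= b. Returns implication
    clauses (i, j, k) meaning e_i & e_j -> e_k; k == 0 encodes `false`."""
    clauses = []
    next_id = max(constraints) + 1
    for var in order:
        lowers = {i: c for i, c in constraints.items() if c[0][var] < 0}
        uppers = {i: c for i, c in constraints.items() if c[0][var] > 0}
        rest = {i: c for i, c in constraints.items() if c[0][var] == 0}
        for (i, (lc, lb)), (j, (uc, ub)) in product(lowers.items(), uppers.items()):
            cl, cu = -lc[var], uc[var]
            coeffs = [cu * x + cl * y for x, y in zip(lc, uc)]
            bound = cu * lb + cl * ub
            if all(a == 0 for a in coeffs):
                if bound < 0:                    # contradiction, e.g. 1 <= 0
                    clauses.append((i, j, 0))    # e_i & e_j -> false
            else:
                rest[next_id] = (coeffs, bound)  # new constraint encoded as e_k
                clauses.append((i, j, next_id))
                next_id += 1
        constraints = rest
    return clauses
```

On the three constraints of Example 3 below, this produces exactly the two clauses e3 ∧ e1 → e4 and e4 ∧ e2 → false.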
On Solving Presburger and Linear Arithmetic with SAT
165
Example 3. Consider the following formula: ϕ = (2x1 − x2 ≤ 0) ∧ ((2x2 − 4x3 ≤ 0) ∨ (x3 − x1 ≤ −1))
By assigning an increasing index to the predicates from left to right, we initially get ϕ′ = e1 ∧ (e2 ∨ e3). Let x1, x2, x3 be the elimination order. The following table illustrates the process of updating ϕ′:

Eliminated var | Lower bound     | Upper bound    | New constraint | Encoding | Add to ϕ′
x1             | x3 − x1 ≤ −1    | 2x1 − x2 ≤ 0   | 2x3 − x2 ≤ −2  | e4       | e3 ∧ e1 → e4
x2             | 2x3 − x2 ≤ −2   | 2x2 − 4x3 ≤ 0  | 4 ≤ 0          | false    | e4 ∧ e2 → false

Thus, the resulting satisfiable formula is:

ϕ′ = (e1 ∧ (e2 ∨ e3)) ∧ (e1 ∧ e3 → e4) ∧ (e4 ∧ e2 → false)

A propositional version of the omega test, which is needed for solving QFP arithmetic, works in a similar way. The main difference is that in step 3, ei and ej can result in a Boolean combination of predicates rather than a single predicate ek. Example 3 demonstrates the main drawback of this method. Since in step 2 we consider all inequalities, regardless of the Boolean connectives between them, the number of constraints that the FM procedure adds is potentially larger than the number we would add if we considered each case separately (where a 'case' corresponds to a conjoined list of inequalities). In the above example, case splitting would result in two cases, neither of which results in added constraints. Since the complexity of FM is the bottleneck of this procedure, this drawback may significantly worsen the overall run time and limit the procedure's usability. As a remedy, we suggest in Section 4 a polynomial method that bounds the number of constraints to the same number that would otherwise be added by solving the various cases separately.

Complexity of deciding ϕ′. The encoded formula ϕ′ has a unique structure that makes it easier to solve compared to a general propositional formula of similar size. Let m be the set of encoded predicates of ϕ and n be the number of variables.
Proposition 1. ϕ′ can be decided in time bounded by O(2^|m| · |m|^(2^n)).

Proof. SAT is worst-case exponential in the number of decision variables and linear in the number of clauses. The Boolean values assigned to the predicates in m imply the values of all the generated predicates³. Thus, we can restrict the
³ Note that the constraints added in step 3 are Horn clauses. This means that for a given assignment to the predicates in m, these constraints are solvable in linear time.
SAT solver to split only on m. Hence, in the worst case the SAT procedure will be exponential in |m| and linear in the number of clauses, which is in the worst case |m|^(2^n).
4
Conjunctions Matrices
Case splitting can be thought of as a two-step procedure, where in the first step the formula is transformed to DNF, and in the second each clause, which now includes a conjunction of constraints, is solved separately. In this section we show how to predict, in polynomial time, whether a given pair of predicates would share a clause if the formula were transformed to DNF. It is clear that there is no need to generate a new constraint from two predicates that do not share a clause.

4.1
Joining Operands
We assume that ϕ is normalized, as explained in step 1. Let ϕf denote the encoded formula after step 2 and ϕc denote the constraints added in step 3 (thus, after step 3, ϕ′ = ϕf ∧ ϕc). All the internal nodes of the parse tree of ϕf correspond to either disjunctions or conjunctions. Consider the lowest common parent of two leaves ei, ej in the parse tree. We call the Boolean operand represented by this node the joining operand of these two leaves and denote it by J(ei, ej).

Example 4. In the formula ϕf = e1 ∧ (e2 ∨ e3), J(e1, e2) = '∧' and J(e2, e3) = '∨'.

For simplicity, we first assume that no predicate appears in ϕ more than once. In Section 4.2 we solve the more general case. Denote by ϕD the DNF representation of ϕ. The following proposition is the basis for the prediction technique:

Proposition 2. Two predicates ei, ej share a clause in ϕD iff J(ei, ej) = '∧'.

Proof. Recall that ϕf does not contain negations and no predicate appears more than once.
(⇒) Let node denote the node joining ei and ej, and assume it represents a disjunction (J(ei, ej) = '∨'). Transform the right and left branches descending from node to DNF. A disjunction of two DNF formulas is a DNF formula, and therefore the formula under node is now a DNF expression. If node is the root, or if there are only disjunctions on the path from node to the root, we are done. Otherwise, the distribution of conjunctions only adds elements to each of the clauses under node but does not join them into a single clause. Thus, ei and ej do not share a clause if their joining operand is a disjunction.
(⇐) Again let node denote the node joining ei and ej, and assume it represents a conjunction (J(ei, ej) = '∧'). Transform the right and left branches descending from node to DNF. Transforming a conjunction of two DNF sub-formulas back to DNF is done by forming a clause for each pair of clauses from the two sub-formulas. Thus, at least one clause contains ei ∧ ej.
Since there are no negations in the formula, the literals in this clause will remain together in ϕD regardless of the Boolean operands above node. □
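J(ei, ej) is simply the operator at the lowest common ancestor of the two leaves, which admits a direct sketch (the tuple tree encoding and function names are ours):

```python
def joining_op(tree, ei, ej):
    """Operator at the lowest common ancestor of leaves ei and ej. A tree
    is either a leaf (a predicate id) or ('and' | 'or', left, right).
    Assumes each predicate occurs once, as in Section 4.1."""
    def contains(t, e):
        if isinstance(t, tuple):
            return contains(t[1], e) or contains(t[2], e)
        return t == e
    op, left, right = tree
    if contains(left, ei) and contains(left, ej):
        return joining_op(left, ei, ej)      # both on the left: descend
    if contains(right, ei) and contains(right, ej):
        return joining_op(right, ei, ej)     # both on the right: descend
    return op                                # this node joins them

# phi_f = e1 & (e2 | e3), as in Example 4:
phi_f = ('and', 1, ('or', 2, 3))
pairs = {(i, j): joining_op(phi_f, i, j)
         for i in (1, 2, 3) for j in (1, 2, 3) if i < j}
```

Here pairs[(1, 2)] and pairs[(1, 3)] are 'and' while pairs[(2, 3)] is 'or', matching Example 4; recording, for every pair, whether J(ei, ej) = '∧' is exactly how the conjunctions matrix Mϕ below is filled.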
For a given pair of predicates, it is a linear operation (in the height h of the parse tree) to check whether their joining operand is a conjunction or a disjunction. If there are m predicates in ϕ, constructing the initial m × m conjunctions matrix Mϕ of ϕ has complexity O(m²h). Mϕ is a binary, symmetric matrix, where Mϕ[ei, ej] = 1 if and only if J(ei, ej) = '∧'. For example, Mϕ corresponding to ϕf of Example 4 is given by
          e1  e2  e3
Mϕ =  e1   0   1   1
      e2   1   0   0
      e3   1   0   0
By Proposition 2, an entry Mϕ[ei, ej] = 1 means that these predicates share at least one clause in ϕD. New entries are added to Mϕ when new constraints are generated, and other entries, corresponding to constraints with non-zero coefficients over eliminated variables, are removed. The entry for a new predicate ek that was formed from the predicates ei, ej is updated as follows:

∀l ∈ [1..k − 1]. Mϕ[ek, el] = Mϕ[ei, el] ∧ Mϕ[ej, el]

This reflects the fact that the new predicate is relevant only to predicates that share a clause with both ei and ej.

4.2
Handling Repeating Predicates
In practice, most formulas contain predicates that appear more than once, in different parts of the formula. We denote by e_i^k, k ≥ 1, the kth instance of the predicate ei in ϕ′. It is possible that the same pair of predicates has different joining operands, e.g. J(e_i^1, e_j^1) = '∧' but J(e_i^1, e_j^2) = '∨'. There are two possible solutions to this problem:

1. Represent each predicate instance as a separate predicate.
2. Assign Mϕ[ei, ej] = 1 if there exist instances e_i^k and e_j^l s.t. J(e_i^k, e_j^l) = '∧'.

The second option has a more concise representation, but may result in redundant constraints, as the example below demonstrates.

Example 5. Let ϕf = e1 ∧ (e2 ∨ e3) ∨ (e2 ∧ e3). According to option 2, ϕ′ contains only three predicates e1 . . . e3 and therefore Mϕ is a 3 × 3 matrix with an entry '1' in all its cells. Thus, Mϕ does not capture the fact that the three predicates never appear together in the same clause, which will potentially result in redundant constraints.

Conjunctions matrices can be used to speed up many of the other decision procedures that were published in the last few years for subsets of linear arithmetic [9,5,3,4,15,20]. We refer the reader to a technical report [19] for a detailed description of how this can be done.
4.3
A Revised Decision Procedure
Given the initial conjunctions matrix Mϕ , we now change step 3: 3.
- Perform FM elimination on the set of all constraints in ϕ, while assigning new Boolean variables to the newly generated constraints.
- At each elimination step, consider the pair of constraints ei, ej only if Mϕ[ei, ej] = 1. In this case, let ek be the new predicate:
  · Add the constraint ei ∧ ej → ek to ϕ′.
  · If ek represents a contradiction (e.g., 1 ≤ 0), replace ek by false.
  · Otherwise, update Mϕ as follows: ∀l ∈ [1..k − 1]. Mϕ[ek, el] = Mϕ[ei, el] ∧ Mϕ[ej, el].
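The matrix update in the last bullet can be sketched as follows (Mϕ stored as a dict over ordered pairs, a representation of ours; diagonal entries are taken as 0, as in the example matrix above):

```python
def add_predicate(M, k, i, j):
    """When e_k is generated from e_i and e_j, its row/column in the
    conjunctions matrix is the pointwise AND of the rows of e_i and e_j:
    M[e_k, e_l] = M[e_i, e_l] & M[e_j, e_l] for every existing e_l."""
    existing = {e for pair in M for e in pair}
    for l in existing:
        v = M.get((i, l), 0) & M.get((j, l), 0)   # missing entries count as 0
        M[(k, l)] = M[(l, k)] = v                 # keep M symmetric
```

For the Mϕ of Example 4, generating e4 from e1 and e3 yields M[e4, e2] = M[e1, e2] ∧ M[e3, e2] = 1 ∧ 0 = 0, so e4 and e2 are never paired in later elimination steps; this prunes exactly the redundant e4 ∧ e2 → false clause generated by plain BFM in Example 3.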
The revised procedure guarantees that the total number of constraints generated is less than or equal to the total number of constraints that are generated by solving each set of conjoined constraints separately. In fact, it is expected to generate far fewer, because constraints that are repeated in many separate cases result in a single new constraint in BFM. For example, naive case splitting over the formula ϕ = e1 ∧ e2 ∧ (e3 ∨ e4) will generate the resolvent of e1 and e2 twice, while BFM will only generate it once⁴.
5
Experiments
An implementation of BFM turned out to be harder than expected, because of the lack of efficient and sound implementations of FM and the omega test in the public domain. We implemented BFM for real variables on top of PORTA (A Polyhedron Representation and Transformation Algorithm) [16]. We randomly generated formulas in 2-CNF style (that is, a 2-CNF where the literals are linear inequalities) with different numbers of clauses and variables. The (integer) coefficients were chosen randomly in the range −10..10. The time it took to generate the SAT instance with BFM⁵ is summarized in Fig. 2. The time it took Chaff [12] to solve each of the instances that we were able to generate was relatively negligible. Normally it was less than a second, with the exception of 3 instances that took 10–20 seconds each to solve. We also ran these instances with ICS, which solves this type of formula with FM combined with case splitting. ICS could solve only one of these instances (the 10 × 10 instance) in the specified time bound (it took about 10 minutes). It either ran out of memory or out of time in all other cases. This is not very surprising, because in the worst case it has to solve 2^c separate cases, where c is the number of clauses. CNF-style formulas are also harder for BFM because they make conjunctions matrices ineffective. Each predicate in ϕ appears with

⁴ A smarter implementation of case splitting could possibly identify, in this simple example, that the resolvent has to be generated only once. But in the general case redundant constraints will be generated.
⁵ All experiments were run on a 1.5 GHz AMD Athlon machine with 1.5 GB of memory running Linux.
# vars \ # clauses    10    30    50    70     90    110    130    150    170
  10                 0.1   0.2   0.2   1.1     56    103    208    254      *
  30                 0.1   0.1   0.2   2.5   61.1     68    618      *      *
  50                 0.1   0.1   0.2   0.3    4.9      8    173    893   2772
  70                 0.1   0.2   0.2   0.4   13.4    108      *      *      *
  90                 0.2   0.2   0.3   0.3    0.5      1     14    181    347
 110                 0.3   0.3   0.5   8.2    396    594      *      *      *
 130                 0.3   0.3   0.4   0.7    2.9    195   2658      *      *
 150                 0.2   0.3   0.5   0.8   18.4    334   1227      *      *
 170                 0.2   0.3   0.5  58.2    999      *      *      *      *

Fig. 2. Time, in seconds, required for generating a SAT instance for random 2-CNF style linear inequalities with a varying number of clauses and variables. '*' indicates running time exceeding 2 hours.
all other predicates in some clause of ϕD, except those predicates it shares a clause with in ϕ. Thus, almost all the entries of Mϕ are equal to '1'. We performed two other sets of tests. In the first set, we ran BFM and ICS on seven formulas resulting from symbolic simulation of hardware designs. The only type of inequality found in these formulas is separation predicates, i.e. predicates of the form x < y + c, where c is a constant. While BFM solved all seven formulas in a few seconds, ICS timed out on two formulas and solved the other five in a few seconds. In the second set, we ran some of the standard ICS benchmarks (e.g., 'linsys-035', 'linsys-100'). ICS performed much better than BFM on these instances. In some cases it terminated in a few seconds, while BFM timed out. The reason for this seeming inconsistency is that all the ICS benchmark formulas are conjunctions of linear equalities, and therefore no case splitting is required. The better performance of ICS can be attributed to the higher quality of its implementation of FM compared to that of PORTA. PORTA itself is, unfortunately, not an optimized implementation of FM. For example, it does not have heuristics for dynamically choosing the variable elimination order; rather, it requires the user to supply a static order. It also does not have a mechanism for identifying subsumed or even equivalent inequalities. These inefficiencies apparently have a very strong effect on the results, which indicates that if BFM were implemented on top of a better implementation of FM (for example, on top of ICS itself), the results would, hopefully, further improve.
References

1. C. Barrett, D. Dill, and J. Levitt. Validity checking for combinations of theories with equality. In M. Srivas and A. Camilleri, editors, Proc. FMCAD 1996, volume 1166 of LNCS. Springer-Verlag, 1996.
2. A.J.C. Bik and H.A.G. Wijshoff. Implementation of Fourier-Motzkin elimination. Technical Report 94-42, Dept. of Computer Science, Leiden University, 1994.
3. R.E. Bryant, S. German, and M. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In Proc. 11th Intl. Conference on Computer Aided Verification (CAV'99), 1999.
4. R.E. Bryant, S. German, and M. Velev. Processor verification using efficient reductions of the logic of uninterpreted functions to propositional logic. ACM Transactions on Computational Logic, 2(1):1–41, 2001.
5. R.E. Bryant and M. Velev. Boolean satisfiability with transitivity constraints. In E.A. Emerson and A.P. Sistla, editors, Proc. 12th Intl. Conference on Computer Aided Verification (CAV'00), volume 1855 of Lect. Notes in Comp. Sci. Springer-Verlag, 2000.
6. G. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.
7. W. M. Farmer, J. D. Guttman, and F. J. Thayer. IMPS: System description. In D. Kapur, editor, Automated Deduction–CADE-11, volume 607 of Lect. Notes in Comp. Sci., pages 701–705. Springer-Verlag, 1992.
8. J.-C. Filliâtre, S. Owre, H. Rueß, and N. Shankar. ICS: Integrated canonizer and solver. In G. Berry, H. Comon, and A. Finkel, editors, Proc. 13th Intl. Conference on Computer Aided Verification (CAV'01), LNCS. Springer-Verlag, 2001.
9. A. Goel, K. Sajid, H. Zhou, A. Aziz, and V. Singhal. BDD based procedures for a theory of equality with uninterpreted functions. In A.J. Hu and M.Y. Vardi, editors, CAV98, volume 1427 of LNCS. Springer-Verlag, 1998.
10. P. Johannsen. Reducing bitvector satisfiability problems to scale down design sizes for RTL property checking. In IEEE Proc. HLDVT'01, 2001.
11. L. G. Khachiyan. A polynomial algorithm in linear programming. Soviet Mathematics Doklady, 1979.
12. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In Proc. Design Automation Conference 2001 (DAC'01), 2001.
13. G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Transactions on Programming Languages and Systems, 1979.
14. S. Owre, N. Shankar, and J.M. Rushby. User guide for the PVS specification and verification system. Technical report, SRI International, 1993.
15. A. Pnueli, Y. Rodeh, O. Shtrichman, and M. Siegel. Deciding equality formulas by small-domains instantiations. In Proc. 11th Intl. Conference on Computer Aided Verification (CAV'99), Lect. Notes in Comp. Sci. Springer-Verlag, 1999.
16. PORTA. http://elib.zib.de/pub/packages/mathprog/polyth/porta/.
17. W. Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, pages 102–114, 1992.
18. W. Pugh and D. Wonnacott. Experiences with constraint-based array dependence analysis. In Principles and Practice of Constraint Programming, pages 312–325, 1994.
19. O. Strichman. Optimizations in decision procedures for propositional linear inequalities. Technical Report CMU-CS-02-133, Carnegie Mellon University, 2002.
20. O. Strichman, S.A. Seshia, and R.E. Bryant. Deciding separation formulas with SAT. In Proc. 14th Intl. Conference on Computer Aided Verification (CAV'02), LNCS, Copenhagen, Denmark, July 2002. Springer-Verlag.
Deciding Presburger Arithmetic by Model Checking and Comparisons with Other Methods Vijay Ganesh, Sergey Berezin, and David L. Dill Stanford University {vganesh, berezin, dill}@stanford.edu
Abstract. We present a new way of using Binary Decision Diagrams in automata-based algorithms for solving the satisfiability problem of quantifier-free Presburger arithmetic. Unlike previous approaches [5,2,19], we translate the satisfiability problem into a model checking problem and use the existing BDD-based model checker SMV [13] as our primary engine. We also compare the performance of various Presburger tools, based on both automata and ILP approaches, on a large suite of parameterized, randomly generated test cases. The strengths and weaknesses of each approach as a function of these parameters are reported, and the reasons for them are discussed. The results show that no single tool performs better than the others for all the parameters. On the theoretical side, we provide tighter bounds on the number of states of the automata.
1
Introduction
Efficient decision procedures for logical theories can greatly help in the verification of programs or hardware designs. For instance, quantifier-free Presburger arithmetic [15] has been used in RTL-datapath verification [3] and symbolic timing verification [1].¹ However, the satisfiability problem for the quantifier-free fragment is known to be NP-complete [14]. Consequently, the search for practically efficient algorithms becomes very important. Presburger arithmetic is defined to be the first-order theory of the structure ⟨Z, 0, ≤, +⟩, where Z is the set of integers. The satisfiability of Presburger arithmetic was shown to be decidable by Presburger in 1927 [15,12]. This theory is usually defined over the natural numbers N, but can easily be extended to the integers (which is important for practical applications) by representing any integer variable x by two natural variables: x = x⁺ − x⁻. This reduction obviously has no effect on known decidability or complexity results.
This research was supported by GSRC contract SA2206-23106PG-2 and in part by National Science Foundation CCR-9806889-002. The content of this paper does not necessarily reflect the position or the policy of GSRC, NSF, or the Government, and no official endorsement should be inferred. 1 In [1] Presburger formulas have quantifiers, but without alternation, and therefore, are easy to convert into quantifier-free formulas.
M.D. Aagaard and J.W. O'Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 171–186, 2002. © Springer-Verlag Berlin Heidelberg 2002
172
V. Ganesh, S. Berezin, and D.L. Dill
The remainder of the paper focuses on quantifier-free Presburger arithmetic because many verification problems do not require quantification, and because the performance of decision procedures on quantifier-free formulas may be qualitatively different from the quantified case. This paper has two primary goals: presentation of a new decision procedure based on model checking, and comparison of the various approaches to deciding quantifier-free Presburger arithmetic and their implementations. There are three distinct ways of solving the satisfiability problem of quantifier-free Presburger arithmetic, namely Cooper's method [8], the integer linear programming (ILP) based approaches, and the automata-based methods. Cooper's method is based on Presburger's original method for solving quantified formulas, only more efficient. Using Cooper's method on a quantifier-free formula still requires introducing existential quantifiers and then eliminating them. This process results in an explosion of new atomic formulas, so the method is probably too inefficient to be competitive with other approaches. Since atomic formulas are linear integer equalities and inequalities, it is natural to think of the integer linear programming (ILP) algorithms as a means to determine the satisfiability of quantifier-free formulas in Presburger arithmetic. ILP algorithms maximize an objective function, subject to constraints in the form of a conjunction of linear equalities and inequalities. Along the way, the system is checked for satisfiability (usually called feasibility), which is the problem of interest in this paper. There are many efficient implementations of ILP solvers available. We have experimented with the commercial tool CPLEX and the open source implementations LP SOLVE and OMEGA [16]. The OMEGA tool is specifically tuned to solve integer problems, and is an extension of the Fourier-Motzkin linear programming algorithm [9] to the integers [18].
In order to solve an arbitrary quantifier-free formula, it must first be converted to disjunctive normal form (DNF), and then ILP must be applied to each disjunct until a satisfiable one is found. If any of the disjuncts is satisfiable, then the entire formula is satisfiable. This conversion to DNF may lead to an exponential explosion of the formula size. In addition, unlike automata methods, the existing implementations lack support for arbitrarily large integers and use native machine arithmetic. This has two consequences. Firstly, it obstructs making a fair comparison of the ILP tools with automata methods, since the two are not feature-equivalent. The use of native machine arithmetic by ILP tools gives them an unfair performance advantage. Secondly, the support for large integers may be crucial in certain hardware verification problems, where the solution set may have integers larger than the int types supported natively by the hardware. For instance, many current RTL-datapath verification approaches use ILP [11,3], but these approaches cannot be scaled with the bit-vector size in the designs. A third approach uses finite automata theory. The idea that an atomic Presburger formula can be represented by a finite-state automaton goes back at least to Büchi [5]. Boudet and Comon [2] proposed a more efficient encoding than Büchi's. Later, Wolper and Boigelot [19] further improved the method of Boudet and Comon and implemented the technique in the system called LASH. Another automata-based approach is to translate the atomic formulas into WS1S (weak monadic second-order logic with one successor) and then use the MONA tool [10]. MONA is a decision procedure for WS1S and uses Binary Decision Diagrams (BDDs, [4]) internally to represent automata.
Deciding Presburger Arithmetic
173
In this paper, a new automata-based approach using symbolic model checking [7] is proposed and evaluated. The key idea is to convert the quantifier-free Presburger formula into a sequential circuit which is then model checked using SMV [13]. Experiments indicate that the SMV approach is quite efficient and more scalable on formulas with large coefficients than all the other automata-based techniques. The reason for this is the use of BDDs to represent both the states and the transitions of the resulting automaton. Another factor which contributes to the efficiency is that SMV uses a highly optimized BDD package. In addition, the use of an existing tool saves a lot of implementation effort. The experiments required only a relatively small Perl script to convert Presburger formulas into the SMV language. The other tools do not use BDDs for the states because they perform quantifier elimination by manipulating the automata directly. Namely, each quantifier alternation requires projection and determinization of the automaton. The use of BDDs for the states can make the implementation of the determinization step particularly hard. We also compare various automata and ILP-based approaches on a suite of 400 randomly generated Presburger formulas. The random generation was controlled by several parameters, such as the number of atomic formulas, the number of variables, and maximum coefficient size. For every approach we identify classes of Presburger formulas for which it either performs very poorly or very efficiently. Only one similar comparison has been done previously in [17]. However, their examples consist of a rather small set of quantified Presburger formulas obtained from real hardware verification problems. The goal of our comparison is to study the performance trends of various approaches and tools depending on different parameters of quantifier-free Presburger formulas. The paper is organized as follows. 
Section 2 explains the automata construction algorithms which are the same as in [19,2], except for the tighter bounds on the number of states of the automata. Section 3 then describes the implementation issues, the conversion of the satisfiability problem into a model checking problem, and construction of a circuit corresponding to the automaton. Section 4 provides our experimental results and comparisons with other tools. Finally, Section 5 concludes the paper with the discussion of experimental results and the future work.
2
Presburger Arithmetic
Definition 1. We define Presburger arithmetic to be the first-order theory over atomic formulas of the form

a1 x1 + a2 x2 + · · · + an xn ∼ c,    (1)

where the ai and c are integer constants, the xi are variables ranging over integers, and ∼ is an operator from {=, ≠, <, ≤, >, ≥}. The semantics of these operators are the usual ones. In the rest of the paper we restrict ourselves to the quantifier-free fragment of Presburger arithmetic.
A formula f is either an atomic formula (1), or is constructed from formulas f1 and f2 recursively as follows: f ::= ¬f1 | f1 ∧ f2 | f1 ∨ f2. Throughout the paper we use the following typographic conventions.

Notation 1. We reserve boldface letters, e.g. b, to represent column vectors and bT to represent row vectors. The term vector shall always refer to a column vector unless specified otherwise. In this notation, x = (x1, . . . , xn)T represents the column vector of variables of the atomic formula, and b represents an n-bit Boolean column vector. A row vector of coefficients in an atomic formula is denoted by aT = (a1, a2, . . . , an). In particular, an atomic formula in the vector notation is written as follows: f ≡ aT · x ∼ c, where aT · x is the scalar product of the two vectors aT and x.

We give the formal semantics of quantifier-free Presburger arithmetic in terms of sets of solutions. A variable assignment for a formula φ (not necessarily atomic) with n free variables is an n-vector of integers w. An atomic formula f under a particular assignment w can easily be determined to be true or false by evaluating the expression aT · w ∼ c. A solution is a variable assignment w which makes the formula φ true. We denote the set of all solutions of φ by Sol(φ), which is defined recursively as follows:

– if φ is atomic, then Sol(φ) = {w ∈ Zn | aT · w ∼ c};
– if φ ≡ ¬φ1, then Sol(φ) = Zn − Sol(φ1);
– if φ ≡ φ1 ∧ φ2, then Sol(φ) = Sol(φ1) ∩ Sol(φ2);
– if φ ≡ φ1 ∨ φ2, then Sol(φ) = Sol(φ1) ∪ Sol(φ2).

To simplify the definitions, we assume that all atomic formulas of φ always contain the same set of variables. If this is not true and some variables are missing in one of the atomic formulas, then these variables can be added with zero coefficients.

2.1
Idea behind the Automaton
The idea behind the automata-based approach is to construct a deterministic finite-state automaton (DFA) Aφ for a quantifier-free Presburger formula φ such that the language of this automaton L(Aφ ) corresponds to the set of all solutions of φ. When such an
automaton is constructed, the satisfiability problem for φ is effectively reduced to the emptiness problem of the automaton, that is, checking whether L(Aφ) ≠ ∅. If a formula is not atomic, then the corresponding DFA can be constructed from the DFAs for the subformulas using the complement, intersection, and union operations on the automata. Therefore, to complete our construction of Aφ for an arbitrary quantifier-free Presburger formula φ it is sufficient to construct DFAs for each of the atomic formulas of φ. Throughout this section we fix a particular atomic Presburger formula f: f ≡ aT · x ∼ c. Recall that a variable assignment is an n-vector of integers w. Each integer can be represented in binary in 2's complement, so a solution vector can be represented by a vector of binary strings. We can now look at this representation of a variable assignment w as a binary matrix where each row, or track, represents an integer for the corresponding variable, and the ith column represents the vector of the ith bits of all the components of w. Alternatively, this matrix can be seen as a string of its columns, a string over the alphabet Σ = Bn, where B = {0, 1}. The set of all strings that together represent all the solutions of a formula f forms a language Lf over the alphabet Σ. Our problem is now reduced to building a DFA for the atomic formula f that accepts exactly the language Lf. Intuitively, the automaton Af must read a string π, extract the corresponding variable assignment w from it, instantiate it into the formula f, and check that the value of the left-hand side (LHS) is indeed related to the right-hand side (RHS) constant as the relation ∼ prescribes. If it is, the string is accepted; otherwise it is rejected. Since the RHS constant and the relation ∼ are fixed in f, the value of the LHS of f solely determines whether the input string π should be accepted or not. Assume that the automaton Af reads a string from left to right.
If the value of the LHS of f is l after reading the string π, then after appending one more "letter" b ∈ Bn to π on the right, the LHS value changes to l′ = 2l + aT · b. Notice that only the original value of the LHS l and the new "letter" b are needed to compute the new value l′ of the LHS for the resulting string. This property directly corresponds to the property of the transition relation of an automaton, namely, that the next state is solely determined by the current state and the next input letter. Following the above intuition, we can define an automaton Af as follows. The states of Af are integers representing the values of the LHS of f; the input alphabet is Σ = Bn; and on an input b ∈ Σ the automaton transitions from a state l to l′ = 2l + aT · b. The set of accepting states are those states l that satisfy l ∼ c. Special care has to be taken of the initial state sinitial ∉ Z. First, we interpret the empty string as a vector of 0's. Thus, the value of the left-hand side in the initial state must be equal to 0. The first "letter" read by Af is the vector of sign bits, and, according to the 2's complement interpretation, the value of the LHS in the next state after sinitial must be l = −aT · b. Notice that this automaton is not finite, since we have explicitly defined the set of states to be integers. Later we examine the structure of this infinite automaton and show how to trim the state space to a finite subset and obtain an equivalent DFA, similar to the one in Figure 1.
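Because Python has arbitrary-precision integers, the infinite-state automaton can be simulated as-is (a sketch of ours, with ∼ fixed to ≤ for brevity; tracks are MSB-first 2's-complement bit strings, one per variable, and the first column carries the sign bits):

```python
def accepts(a, c, tracks):
    """Run the automaton for a^T * x <= c on the word whose letters are
    the columns of `tracks`. The state is the current LHS value: the first
    transition gives l = -(a^T . b) for the sign-bit letter b, and each
    later letter updates l' = 2l + a^T . b. Accept iff finally l <= c."""
    cols = list(zip(*tracks))                 # letters b in B^n
    if not cols:                              # empty word encodes the zero vector
        return 0 <= c
    lhs = -sum(ai * int(bi) for ai, bi in zip(a, cols[0]))
    for col in cols[1:]:
        lhs = 2 * lhs + sum(ai * int(bi) for ai, bi in zip(a, col))
    return lhs <= c
```

For the formula x − y ≤ −2 of Figure 1, accepts([1, -1], -2, ['000', '011']) (x = 0, y = 3) returns True, while accepts([1, -1], -2, ['0101', '0011']) (x = 5, y = 3) returns False.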
176
V. Ganesh, S. Berezin, and D.L. Dill
[Figure 1: an automaton with states sinitial, −∞, −2, −1, 0, 1, 2, and +∞, whose transitions are labeled by two-bit column vectors over the tracks for x and y.]
Fig. 1. Example of an automaton for an atomic Presburger formula x − y ≤ −2.
2.2 Formal Description of the Automaton
An (infinite-state) automaton corresponding to an atomic Presburger formula f is defined as follows: Af = (S, Bn, δ, sinitial, Sacc), where
– S = Z ∪ {sinitial} is the set of states, where Z is the set of integers and sinitial ∉ Z; sinitial is the start state;
– Bn is the alphabet, the set of n-bit vectors, where B = {0, 1};
– the transition function δ : S × Bn → S is defined as follows:
δ(sinitial, b) = −aT · b
δ(l, b) = 2l + aT · b,
where l ∈ Z is a non-initial state;
– the set of accepting states is
Sacc = {l ∈ Z | l ∼ c} ∪ {sinitial}, if aT · 0 ∼ c, and
Sacc = {l ∈ Z | l ∼ c}, otherwise.
In the rest of this section we show how this infinite automaton can be converted into an equivalent finite-state automaton. Intuitively, there is a certain finite range of values of the LHS of f such that if Af transitions outside of this range, it starts diverging, or “moving away” from this range, and is guaranteed to stay outside of this range and on the same side of it (i.e. diverging to +∞ or −∞). We show that all of the states outside of the range can be collapsed into only two states (representing +∞ and −∞ respectively), and that those states can be meaningfully labeled as accepting or rejecting without affecting the language of the original automaton Af .
Deciding Presburger Arithmetic
177
Definition 2. For a vector of LHS coefficients aT = (a1, . . . , an) define
||aT||− = Σ_{i | ai < 0} |ai| and ||aT||+ = Σ_{i | ai > 0} |ai|.
Notice that both ||aT||− and ||aT||+ are non-negative. Let b denote an n-bit binary vector, that is, b ∈ Bn. Observe that −aT · b ≤ ||aT||− for any value of b, since the expression −aT · b can be rewritten as
−aT · b = Σ_{j | aj < 0} |aj| bj − Σ_{i | ai > 0} |ai| bi.
Therefore, the largest positive value of −aT · b is obtained by setting bi to 0 whenever ai > 0 and setting bj to 1 whenever aj < 0, in which case −aT · b = ||aT||−. Any other assignment to b can only make −aT · b smaller. Similarly, aT · b ≤ ||aT||+.
Lemma 3. Given an atomic Presburger formula aT · x ∼ c, a corresponding automaton Af as defined in Section 2.2, and a current state of the automaton l ∈ Z, the following two claims hold:
1. If l > ||aT||−, then any next state l′ will satisfy l′ > l.
2. If l < −||aT||+, then any next state l′ will satisfy l′ < l.
Proof. The upper bound (claim 1). Assume that l > ||aT||− for some state l ∈ Z. Then the next state l′ satisfies the following:
l′ = 2l + aT · b ≥ 2l − ||aT||− > 2l − l = l.
The proof of the lower bound (claim 2) is similar to that of claim 1.
We now discuss bounds on the states of the automaton based on Lemma 3. From this lemma it is easy to see that once the automaton reaches a state outside of the range
[min(−||aT||+, c), max(||aT||−, c)],
it is guaranteed to stay outside of this range and on the same side of it. That is, if it reaches a state l < min(−||aT||+, c), then l′ < min(−||aT||+, c) for any subsequent state l′ reachable from l. If the relation ∼ in f is an equality, then l = c is guaranteed to be false from the moment Af transitions to l onward. Similarly, it will be false forever when ∼ is ≥ or >; however, it will always be true for the < and ≤ relations. In any case, either all of the states l of the automaton Af below min(−||aT||+, c) are accepting, or
all of them are rejecting. Since the automaton will never leave this set of states, it will either always accept any further inputs or always reject. Therefore, replacing all states below min(−||aT ||+ , c) with one single state s−∞ with a self-loop transition for all inputs and marking this state appropriately as accepting or rejecting will result in an automaton equivalent to the original Af . Exactly the same line of reasoning applies to the states l > max(||aT ||− , c), and they all can be replaced by just one state s+∞ with a self-loop for all inputs. Formally, the new finite automaton has the set of states
S′ = [min(−||aT||+, c), max(||aT||−, c)] ∪ {sinitial, s−∞, s+∞}. Transitions within the range coincide with the transitions of the original (infinite) automaton Af. If in the original automaton l′ = δ(l, b) for some state l and input b, and l′ > max(||aT||−, c), then in the new automaton the corresponding next state is δ′(l, b) = s+∞, and subsequently δ′(s+∞, b) = s+∞ for any input b. Similarly, if the next state l′ < min(−||aT||+, c), then the new next state is s−∞, and the automaton remains in s−∞ forever:
δ′(sinitial, b) = −aT · b
δ′(s+∞, b) = s+∞
δ′(s−∞, b) = s−∞
δ′(l, b) = s+∞, if 2l + aT · b > max(||aT||−, c);
δ′(l, b) = s−∞, if 2l + aT · b < min(−||aT||+, c);
δ′(l, b) = 2l + aT · b, otherwise.
The accepting states within the range are those that satisfy the ∼ relation. The new "divergence" states are labeled accepting if the ∼ relation holds for some representative state: for instance, for a formula aT · x < c the state s−∞ is accepting and s+∞ is rejecting. Finally, the initial state sinitial is accepting if and only if it is accepting in the original infinite automaton. We can use the bounds from Lemma 3 to repeat the analysis from [19] for the number of states of the automaton and obtain new bounds that are tighter by a factor of 2. Since we have to know the bounds in advance when constructing an SMV model, this saves one bit of state for every atomic formula. Asymptotically, of course, our new bounds remain the same as in [19].
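To make the collapsed construction concrete, here is a Python sketch (our own illustration, not the authors' SMV-based implementation) that computes the two norms, clips the state space to the range [min(−||aT||+, c), max(||aT||−, c)], and decides acceptance of an input string of bit-vector columns:

```python
def make_dfa(a, rel, c):
    """Collapsed finite automaton for aT . x `rel` c, where rel is one of
    '<=', '<', '=', '>=', '>'. States are the integers in the clipped
    range plus 'init', '+inf', and '-inf'."""
    norm_minus = sum(-ai for ai in a if ai < 0)   # ||aT||-
    norm_plus  = sum(ai for ai in a if ai > 0)    # ||aT||+
    lo, hi = min(-norm_plus, c), max(norm_minus, c)
    ops = {'<=': lambda l: l <= c, '<': lambda l: l < c, '=': lambda l: l == c,
           '>=': lambda l: l >= c, '>': lambda l: l > c}
    sat = ops[rel]

    def step(state, b):
        if state in ('+inf', '-inf'):
            return state                     # divergence states are absorbing
        dot = sum(ai * bi for ai, bi in zip(a, b))
        l = -dot if state == 'init' else 2 * state + dot
        return '+inf' if l > hi else '-inf' if l < lo else l

    def accepts(columns):
        state = 'init'
        for b in columns:
            state = step(state, b)
        if state == 'init':
            return sat(0)                    # the empty string encodes 0
        if state == '+inf':
            return sat(hi + 1)               # any representative above the range
        if state == '-inf':
            return sat(lo - 1)               # any representative below the range
        return sat(state)

    return accepts

accepts = make_dfa([1, -1], '<=', -2)        # the automaton of Figure 1
assert accepts([(0, 0), (0, 1), (1, 1)])     # x = 1, y = 3: 1 - 3 <= -2
assert not accepts([(0, 0), (1, 1)])         # x = 1, y = 1: 0 <= -2 fails
```

The representatives hi + 1 and lo − 1 are sound because every state above max(||aT||−, c) compares to c the same way, and likewise below min(−||aT||+, c).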
3 Implementation
In the previous section we presented a mathematical construction of a deterministic finite-state automaton corresponding to a quantifier-free Presburger formula f. In practice, building such an automaton explicitly is very inefficient, since the number of states is proportional to the values of the coefficients in aT and the right-hand side constant c and, most importantly, the number of transitions from each state is exponential (2^n) in the number n of variables in f.
Instead, we use the existing symbolic model checker SMV [13] as a means to build a symbolic representation of the automaton and check its language for emptiness. Symbolic model checking expresses a design as a finite-state automaton, and properties of this design are then checked by traversing the states of the automaton. In the past decade, there has been a great deal of research on boosting the performance of model checkers. The most notable breakthrough came in the early 1990s, when binary decision diagrams (BDDs) [4] were successfully used in model checking [13], pushing the tractable size of an automaton to as many as 10^20 states and beyond [6]. Therefore, it is only natural to try to utilize such powerful and well-developed techniques for handling finite-state automata in checking the satisfiability of Presburger formulas. The obvious advantages of this approach are that state-of-the-art verification engines such as SMV are readily available, and the only remaining task is to transform the emptiness problem for an automaton into a model checking problem efficiently. In addition, with SMV we exploit the efficient BDD representation for both the states and the transitions of the automata, whereas in other automata-based approaches such as MONA or LASH the states are represented explicitly. We have performed all of our experiments with the CMU version of the SMV model checker. Although the SMV language allows us to express the automaton and its transitions directly in terms of arithmetic expressions, the cost of evaluating these expressions in SMV is prohibitively high. Internally, SMV represents all the state variables as vectors of boolean variables. Similarly, the representation of the transition relation is a function² that takes the boolean vectors of the current state variables and the inputs and returns new boolean vectors for the state variables in the next state.
[Figure 2: a register R holds the current state; a combinational next-state function, driven by the clock and the input, computes the new state; a tester circuit produces a 0/1 accept output.]
Fig. 2. Circuit implementing a finite-state automaton.
Effectively, SMV builds the equivalent of a sequential digital circuit operating on boolean signals, as shown in Figure 2. The current state of the automaton is stored in the register R. The next state is computed by a combinational circuit from the value of the current state and the new inputs, and the result is latched back into the register R at the next clock cycle. A special tester circuit checks whether the current state is accepting,
² Strictly speaking, SMV constructs a transition relation, which does not have to be a function; here, however, it is indeed a function, so this distinction is not important.
and if it is, the sequence of inputs read so far (or the string, in our original terminology) is accepted by the automaton and represents a solution to f. The property that we check is that the output of the circuit never becomes 1 for any sequence of inputs. In the logical specification language of SMV, this is written as AG(output ≠ 1). If this property is true, then the language of the automaton is empty, and the original formula f is unsatisfiable. If this property is violated, SMV generates a counterexample trace, which is a sequence of transitions leading to an accepting state. This trace represents a satisfying assignment to the formula f. The translation of arithmetic expressions to such a boolean circuit is the primary bottleneck in SMV. Hence, providing the circuit explicitly greatly speeds up the process of building the transition relation. A relatively simple Perl script generates such a circuit and the property very efficiently and transforms them into an SMV description. The structure of the resulting SMV code follows the mathematical definition of the automaton very closely, but all the state variables are explicitly represented by several boolean variables, and all the arithmetic operations are converted into combinational circuits (or, equivalently, boolean expressions). In particular, ripple-carry adders are used for addition, "shift-and-add" circuits implement multiplication by a constant, and comparators implement the equality and inequality relations in the tester circuit.
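What SMV does symbolically can be mimicked with an explicit search: the check AG(output ≠ 1) amounts to asking whether an accepting state of the product of the per-formula automata is reachable. The sketch below is ours; it uses an explicit breadth-first search where SMV uses BDDs, and the `atomic` helper is our compact version of the collapsed construction from Section 2:

```python
from collections import deque
from itertools import product

def atomic(a, rel, c):
    """Collapsed finite automaton for aT . x rel c, as (step, init, acc)."""
    nm = sum(-x for x in a if x < 0)          # ||aT||-
    np_ = sum(x for x in a if x > 0)          # ||aT||+
    lo, hi = min(-np_, c), max(nm, c)
    cmp = {'<=': lambda l: l <= c, '>=': lambda l: l >= c,
           '=': lambda l: l == c}[rel]
    def step(s, b):
        if s in ('+inf', '-inf'):
            return s                          # divergence states are absorbing
        d = sum(ai * bi for ai, bi in zip(a, b))
        l = -d if s == 'init' else 2 * s + d
        return '+inf' if l > hi else '-inf' if l < lo else l
    # representatives: 0 for the empty string, hi+1 / lo-1 beyond the range
    acc = lambda s: cmp({'init': 0, '+inf': hi + 1, '-inf': lo - 1}.get(s, s))
    return step, 'init', acc

def satisfiable(dfas, n):
    """BFS over the product automaton: a reachable accepting product state
    is a counterexample to AG(output != 1), i.e. a satisfying assignment."""
    alphabet = list(product((0, 1), repeat=n))
    start = tuple(init for _, init, _ in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if all(acc(s) for (_, _, acc), s in zip(dfas, state)):
            return True
        for b in alphabet:
            nxt = tuple(step(s, b) for (step, _, _), s in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# x - y <= -2 and x + y >= 4 is satisfiable (e.g. x = 1, y = 3), while
# x - y <= -2 and x - y >= 0 is not.
assert satisfiable([atomic([1, -1], '<=', -2), atomic([1, 1], '>=', 4)], 2)
assert not satisfiable([atomic([1, -1], '<=', -2), atomic([1, -1], '>=', 0)], 2)
```

A counterexample trace, as produced by SMV, would correspond here to the path of letters leading BFS to the first accepting product state.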
4 Experimental Results
Since the satisfiability problem for quantifier-free Presburger arithmetic is NP-complete, the hope of finding an efficient general-purpose decision procedure for it is quite thin. Therefore, for practical purposes, it is more important to collect several different methods and evaluate their performance on different classes of formulas. When the strengths and weaknesses of each of the approaches and tools are identified, it is easier to pick the best one for solving the concrete problems that arise in practice. The primary purpose of our experiments is to study the performance of automata-based and ILP-based methods and their variations depending on different parameters of Presburger formulas. The tools and approaches that we picked are the following:
– Automata-based tools:
• our approach using the SMV model checker (we refer to it as "SMV");
• LASH [19], a direct implementation of the automata-based approach dedicated to Presburger arithmetic;
• MONA [10], an automata-based solver for WS1S and a general-purpose automata library.
– Approaches based on Integer Linear Programming (ILP):
• LP SOLVE, a simplex-based open-source tool with branch-and-bound for integer constraints;
• CPLEX, one of the best commercial simplex-based LP solvers;
• OMEGA [16], a tool based on the Fourier-Motzkin algorithm [18].
The benchmarks consist of many randomly generated, relatively small quantifier-free Presburger formulas. The examples have three main parameters: the number of variables, the number of atomic formulas (the resulting formula is a conjunction of atomic formulas), and the maximum value of the coefficients. For each set of parameters we generate 5 random formulas and run the same set of examples through each of the tools. The results of the comparisons appear in Figures 3, 4, and 5 as plots showing how the execution time of each automata-based tool depends on one particular parameter with the other parameters fixed, together with the success rate of all the tools for the same parameters. Each point in the run-time graphs represents a successful run of an experiment in a particular tool. That is, if a certain tool has fewer points in a certain range, it failed more often in that range (ran out of memory or time, hit a fatal error, etc.). The ILP tools either complete an example within a small fraction of a second or fail. Therefore the run-time is not as informative for the ILP tools as the number of completed examples, and hence only the success rates are shown for them. In the case of MONA, the only readily available input language is WS1S, and we have found that translating Presburger formulas into WS1S is extremely inefficient. Even rather simple examples which SMV and LASH solve in no time take significant time in MONA. Due to this inefficient translation, the comparison of MONA with the other approaches is not quite fair. Therefore, it is omitted from the graphs and is not considered further in our discussion. LASH and SMV both have obvious strengths and weaknesses that can be easily characterized. SMV suffers the most from the number of atomic formulas, as can be seen from Figure 3, where the run-time is plotted as a function of the number of atomic formulas.
The largest number of atomic formulas it could handle in this batch is 11, whereas the other tools, including LASH, finished most of the experiments with up to 20 atomic formulas. This suggests that the implementation of the parallel composition of the automata for atomic formulas in SMV is suboptimal; LASH apparently has a better way of composing automata. Varying the number of variables (Figure 4) makes SMV and LASH look very much alike. Both tools complete all of the experiments, and the run-time grows approximately exponentially with the number of variables, at the same rate in both tools. This suggests that the BDD-like structure for the transitions in LASH indeed behaves very similarly to the BDDs in SMV. However, since the number of states in the automata is proportional to the values of the coefficients, LASH cannot complete any of the experiments with coefficients larger than 4096 and fails on many experiments even with smaller values. SMV, on the other hand, can handle coefficients as large as 2^30 with only a moderate increase in run-time and failure rate. We attribute this behavior to the fact that in SMV both the states and the transitions of the automata are represented with BDDs, while in LASH (and all the other available automata-based tools) the states are always represented explicitly. Finally, we have to say a few words about the ILP-based methods. First of all, these methods are greatly superior to the automata-based ones in general, and they do not exhibit any noticeable increase in run-time when the number of variables or the number of formulas increases. The only limiting factor for the ILP tools is the values of the coefficients, which cause
[Figure 3: log-scale run-times (seconds) of SMV and LASH versus the number of atomic formulas, for number of variables = 4 and max. coefficient size = 32, with bar charts of completed experiments for SMV, LASH, OMEGA, LP SOLVE, and CPLEX grouped by the number of atomic formulas.]
Fig. 3. Run-time and the number of completed experiments depending on the number of atomic formulas in each test case.
many failures and overflows starting at about 10^7, especially in LP SOLVE. Although all of the successful runs of the ILP-based tools take well under a fraction of a second, there are also many failures due to a non-terminating branch-and-bound search, overflow exceptions, and program errors. OMEGA is especially notorious for segmentation faults, and its failure rate greatly increases when the values of the coefficients approach the limit of the machine-native integer or float representation. Despite the overall superiority of the ILP-based methods over the automata-based ones, there are a few cases where the ILP methods fail while the automata-based methods work rather efficiently. The most interesting class of such examples can be characterized as follows. The formula must have a solution in real numbers, but integer solutions either do not exist or are rather sparse in the feasibility set (the set of real solutions) of the formula. Additionally, the direct implementation of the branch-and-bound method is incomplete when the feasibility set is unbounded, since there are infinitely many integer points that have to be checked. This claim still holds to some extent even in the
[Figure 4: log-scale run-times (seconds) of SMV and LASH versus the number of variables, for number of formulas = 1 and max. coefficient size = 32, with bar charts of completed experiments for OMEGA, LP SOLVE, and CPLEX grouped by the number of variables.]
Fig. 4. Run-time and the number of completed experiments depending on the number of variables in a single atomic formula. SMV and LASH finish all of the experiments, hence there is no bar chart for those.
heuristic-rich, top-quality commercial tools such as CPLEX, and we have observed their divergence on a few examples that are trivial even for the automata-based techniques. The OMEGA approach stands out from the rest of the ILP tools, since it is based on the Fourier-Motzkin method, which is complete for integer linear constraints. Unfortunately, the only readily available implementation of this method is very unstable. Another common weakness of all the ILP-based approaches is the limit on the coefficient and solution values due to the rounding errors of native computer arithmetic. It is quite easy to construct an example with large integer coefficients for which CPLEX
[Figure 5: log-scale run-times (seconds) of SMV and LASH versus the maximum coefficient value (up to 10^10), for number of variables = 4 and number of formulas = 1, with bar charts of completed experiments for SMV, LASH, OMEGA, LP SOLVE, and CPLEX grouped by log2 of the maximum coefficient.]
Fig. 5. Run-time and the number of completed examples depending on the (maximum) values of the coefficients in a single atomic formula.
returns a plainly wrong answer. Large coefficients can be extremely useful in hardware verification, when operations on long bit-vectors are translated into Presburger arithmetic. We conjecture that the efficiency of the ILP methods depends strongly on their use of native computer arithmetic, and that a fair comparison with the automata-based methods is only possible if the ILP tools use arbitrary-precision arithmetic.
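The rounding hazard behind these failures is easy to demonstrate in a few lines (our minimal illustration, not one of the paper's benchmarks): in IEEE double precision, integers beyond 2^53 lose distinctions that arbitrary-precision arithmetic preserves.

```python
# Two different constants become equal once rounded through 64-bit floats,
# so a solver that stores its coefficients as doubles can silently conflate
# the constraints x = 2^53 and x = 2^53 + 1.
c1, c2 = 2 ** 53, 2 ** 53 + 1
assert c1 != c2                  # exact integers: distinct constraints
assert float(c1) == float(c2)    # double precision: indistinguishable
```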
5 Conclusion
Efficient decision procedures for Presburger arithmetic are key to solving many formal verification problems. We have developed a decision procedure based on the idea of converting the satisfiability problem into a model checking problem. Experimental comparisons show that our method can be more efficient than other automata-based methods like LASH and MONA, particularly for formulas with large coefficients. In our approach
we use BDDs both for the states and the transitions of the automata, while LASH and MONA use BDDs or similar structures only for the transitions. As an additional theoretical result, we provide tighter bounds on the number of states of the automata. This makes our automaton construction in SMV even more efficient. Another advantage of our approach is that converting the satisfiability problem into a model checking problem requires very little implementation effort. We exploit the existing SMV model checker as a back-end, which employs a very efficient BDD package. Therefore, the only effort required from us is the translation of a Presburger formula into the SMV input language. In addition, we compare various automata- and ILP-based approaches on a suite of parameterized, randomly generated Presburger formulas. For every approach we identify classes of Presburger formulas on which it performs either very poorly or very efficiently. For instance, we found that the ILP-based tools are more likely to fail on examples with unbounded but sparse solution sets and cannot handle large coefficients due to their use of native machine arithmetic. The automata-based tools are not as sensitive to these parameters. On the other hand, the ILP-based approaches scale much better in the number of variables and atomic formulas. We also believe that the ILP tools have an unfair advantage over the automata methods due to their use of native arithmetic. However, until further experiments are done with an ILP tool that supports arbitrarily large integers, we cannot tell how much difference this makes. Among the automata-based approaches, SMV scales better with the size of the coefficients but performs worse than LASH for large numbers of atomic formulas; both perform equally well as the number of variables is varied. The reason the other tools do not use BDDs for the states is that they perform quantifier elimination by manipulating the automata directly.
Namely, each quantifier alternation requires projection and determinization of the automaton, and the use of BDDs for the states can make the implementation of the determinization step particularly hard. This difference is one of the reasons for the relative efficiency of our approach. The extension of our approach to full Presburger arithmetic can be achieved by combining it with the traditional quantifier elimination method [12]. This method introduces a new type of atomic formulas with the divisibility operator, c | aT · x, and our automaton construction can easily be extended to handle them. We also believe that our approach may prove useful for other theories and logics that use automata-based decision procedures.
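The divisibility atoms come with a finite automaton directly: taking the states to be residues modulo the divisor makes the state space finite from the outset, so no collapsing argument is needed. A sketch (our illustration, assuming the same MSB-first 2's-complement column encoding, and reading the constraint as "c divides aT · x"):

```python
def divisibility_dfa(a, m):
    """Automaton accepting encodings of x with aT . x == 0 (mod m).
    States are residues mod m plus the initial state; the update rule is
    the residue of the rule for the linear automaton, l' = 2l + aT . b."""
    def accepts(columns):
        l = None
        for b in columns:
            d = sum(ai * bi for ai, bi in zip(a, b))
            # sign-bit letter has weight -aT . b; congruences respect both rules
            l = (-d) % m if l is None else (2 * l + d) % m
        return (0 if l is None else l) == 0

    return accepts

acc = divisibility_dfa([1, 1], 3)          # 3 | x + y
assert acc([(0, 0), (0, 1), (1, 0)])       # x = 1, y = 2: 3 divides 3
assert not acc([(0, 0), (0, 1), (1, 1)])   # x = 1, y = 3: 3 does not divide 4
```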
References 1. Tod Amon, Gaetano Borriello, Taokuan Hu, and Jiwen Liu. Symbolic timing verification of timing diagrams using Presburger formulas. In Design Automation Conference, pages 226–231, 1997. 2. Alexandre Boudet and Hubert Comon. Diophantine equations, Presburger arithmetic and finite automata. In H. Kirchner, editor, Colloquium on Trees in Algebra and Programming (CAAP’96), volume 1059 of Lecture Notes in Computer Science, pages 30–43. Springer Verlag, 1996. 3. R. Brinkmann and R. Drechsler. RTL-datapath verification using integer linear programming. In IEEE VLSI Design’01 & Asia and South Pacific Design Automation Conference, Bangalore, pages 741–746, 2002.
4. R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, 35(8):677–691, 1986. 5. J. R. Büchi. Weak second-order arithmetic and finite automata. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 6:66–92, 1960. 6. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, 98:142–170, 1992. 7. E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244–263, 1986. 8. D. C. Cooper. Theorem proving in arithmetic without multiplication. In Machine Intelligence, volume 7, pages 91–99, New York, 1972. American Elsevier. 9. George B. Dantzig and B. Curtis Eaves. Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A), 14:288–297, 1973. 10. Jacob Elgaard, Nils Klarlund, and Anders Møller. Mona 1.x: new techniques for WS1S and WS2S. In Computer Aided Verification, CAV '98, Proceedings, volume 1427 of LNCS. Springer-Verlag, 1998. 11. P. Johannsen and R. Drechsler. Formal verification on the RT level computing one-to-one design abstractions by signal width reduction. In IFIP International Conference on Very Large Scale Integration (VLSI'01), Montpellier, 2001, pages 127–132, 2001. 12. G. Kreisel and J. Krivine. Elements of Mathematical Logic, 1967. 13. K. L. McMillan. Symbolic Model Checking: An Approach to the State Explosion Problem. Kluwer Academic Publishers, 1993. 14. Derek C. Oppen. A 2^2^2^pn upper bound on the complexity of Presburger arithmetic. Journal of Computer and System Sciences, 16(3):323–332, June 1978. 15. M. Presburger. Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt.
In Comptes Rendus du Premier Congrès des Mathématiciens des Pays Slaves, pages 92–101, 395, Warsaw, 1927. 16. William Pugh. The Omega test: a fast and practical integer programming algorithm for dependence analysis. In Supercomputing, pages 4–13, 1991. 17. T. R. Shiple, J. H. Kukula, and R. K. Ranjan. A comparison of Presburger engines for EFSM reachability. In A. J. Hu and M. Y. Vardi, editors, Proceedings of the 10th International Conference on Computer Aided Verification, volume 1427, pages 280–292. Springer-Verlag, 1998. 18. H. P. Williams. Fourier-Motzkin elimination extension to integer programming problems. Journal of Combinatorial Theory (A), 21:118–123, 1976. 19. Pierre Wolper and Bernard Boigelot. On the construction of automata from linear arithmetic constraints. In Proc. 6th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, volume 1785 of Lecture Notes in Computer Science, pages 1–19, Berlin, March 2000. Springer-Verlag.
Qubos: Deciding Quantified Boolean Logic Using Propositional Satisfiability Solvers
Abdelwaheb Ayari and David Basin
Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Germany. www.informatik.uni-freiburg.de/~{ayari,basin}
Abstract. We describe Qubos (QUantified BOolean Solver), a decision procedure for quantified Boolean logic. The procedure is based on nonclausal simplification techniques that reduce formulae to a propositional clausal form after which off-the-shelf satisfiability solvers can be employed. We show that there are domains exhibiting structure for which this procedure is very effective and we report on experimental results.
1 Introduction
In recent years there has been considerable work on developing and applying satisfiability (SAT) solvers for quantified Boolean logic (QBL). Applications include program verification using bounded model checking [3] and bounded model construction [1], hardware applications including testing and equivalence checking [17], and artificial intelligence tasks like planning [14]. Solvers for (unquantified) Boolean logic have reached a state of maturity; there are many success stories in which SAT-solvers such as [11,19,22] have been applied to industrial-scale problems. However, the picture for QBL is rather different. Despite the growing body of research on this topic, the current generation of Q(uantified)SAT-solvers [8,10,15] is still in its infancy. These tools work by translating QBL formulae into a quantified clausal normal form and applying extensions of the Davis-Putnam method to the result. The extensions concern generalizing Davis-Putnam heuristics such as unit propagation and backjumping. These tools have not yet achieved the successes that SAT tools have, and our understanding of which classes of formulae these procedures work well on, and why, is also poor. In this article, we present a different approach to the QSAT problem. It arose from our work on bounded model construction for monadic second-order logics [1], where we reduce the problem of finding small models for monadic formulae to QBL satisfiability. Our experience with the available QBL solvers was disappointing. Their application to formulae involving more than a couple of quantifier alternations would often fail, even for fairly simple formulae. In particular, our model construction procedure generates formulae where the scope of quantification is generally small in proportion to the overall formula size, and in many cases quantifiers can be eliminated, without blowing up the formulae, by combining quantifier elimination with simplification.
This motivated our work on a procedure based on combining miniscoping (pushing quantifiers in, in contrast to out, which is used in clause-based procedures), quantifier expansion, and eager simplification using a generalization of Boolean constraint propagation. The transformation process is carried out until the result has only one kind of quantifier remaining, at which point the result can be converted to clausal form and given to an off-the-shelf (Boolean) SAT-solver.
M.D. Aagaard and J.W. O'Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 187–201, 2002. © Springer-Verlag Berlin Heidelberg 2002
Our thesis in this paper is that our decision procedure works well (it is superior to other state-of-the-art approaches) when certain kinds of structure are present in the problems to be solved. Our contribution is to identify a notion of structure based on relative quantifier scope, to show that certain classes of problems naturally have this structure (i.e., that the ideas presented in this paper have general applicability), and to validate our thesis experimentally. Our experimental comparison uses two sets of problems: those arising in bounded model construction, which always exhibit significant structure, and those arising in conditional planning, which have varying degrees of structure.
Related Work. The idea of tuning a solver to exploit structure also arises in bounded model checking, where SAT-solvers are tuned to exploit the problem-specific structure arising there. In [18], such heuristics were embedded within a generic SAT algorithm that generalizes the Davis-Putnam procedure. Techniques similar to miniscoping and quantifier expansion are also used by Williams et al. [20] to optimize computation tasks such as the calculation of fixed points. Most QBL algorithms generalize the Davis-Putnam procedure to operate on formulae transformed into quantified clausal normal form. Cadoli et al. [6] and Rintanen [16,15] present different heuristic extensions of the Davis-Putnam method. Cadoli et al.'s techniques were tuned for randomly generated problems, and Rintanen's strategies were specially designed for planning problems whose quantifiers have a fixed ∃∀∃-structure.
Other work includes that of Letz [10] and Giunchiglia et al. [7], who have generalized the backjumping heuristic (also called dependency-directed backtracking) to QBL. Our approach differs from all of these in that it is not based on Davis-Putnam; it can operate freely on subformulae of the input formula (this avoids a major source of inefficiency of Davis-Putnam based procedures, namely that the selection of branching variables is strongly restricted by the ordering induced by the prefix of the input formula); and for structured problems (in our sense) it yields significantly better results. The most closely related work is that of Plaisted et al. [13], who present a decision procedure for QBL that also operates directly on quantified Boolean formulae by iteratively applying equivalence-preserving transformations. However, rather than expanding quantifiers, their approach replaces a subformula with a set of free variables X by a large conjunction of all negated evaluations of X that make the subformula unsatisfiable. Plaisted et al. [13] suggest that their procedure should work well for hardware systems that have structure in the sense of being "long and thin"; as indicated by their examples (ripple-carry adders), these systems form a subclass of well-structured problems in our sense. As no implementation is currently available, we were unable to compare our approaches experimentally.
Qubos: Deciding Quantified Boolean Logic
189
Organization. The rest of the paper is organized as follows. In Section 2, we provide background on QBL and introduce notation. In Section 3, we explain what kind of structure we will exploit and why certain classes of problems are naturally structured. In Section 4, we introduce our procedure and in Section 5, we present experimental results. Finally, in Section 6, we draw conclusions.
2
Background
The formulae of Boolean logic (BL) are built from the constants ⊤ and ⊥ and the variables x ∈ V, and are closed under the standard connectives ¬ (negation), ∧ (conjunction), ∨ (disjunction), → (implication), and ↔ (logical equivalence). The formulae φ are interpreted in B = {0, 1}. A substitution σ : V → B is a mapping from variables to truth values that is extended homomorphically to formulae. We say σ satisfies φ if σ(φ) = 1. Quantified Boolean logic (QBL) extends Boolean logic by allowing quantification over Boolean variables, i.e., ∀x. φ and ∃x. φ. A substitution σ satisfies ∀x. φ if σ satisfies φ[⊤/x] ∧ φ[⊥/x], and dually a substitution σ satisfies ∃x. φ if σ satisfies φ[⊤/x] ∨ φ[⊥/x]. As notational shorthand, we allow quantification over sets of variables and we write Qx1, . . . , xn. φ for the formula Qx1. · · · Qxn. φ, where Q ∈ {∀, ∃}. We denote by free(φ) the set of free variables in φ. Unless indicated otherwise, by “formulae” we mean quantified Boolean formulae instead of (unquantified) Boolean formulae. A formula x or ¬x, where x is a variable, is called a literal. A formula φ is in negation normal form (nnf) if, besides the quantifiers, it contains only the connectives ∨, ∧ and ¬, and ¬ appears only before variables. A formula φ is in prenex normal form (pnf) if it has the form Q1X1 · · · QkXk. ψ, where Qi ∈ {∃, ∀}, each Xi is a finite set of variables, and ψ is a Boolean formula called the matrix of φ. A formula φ is in quantified clausal normal form (qcnf) if it is in pnf and its matrix is a conjunction of disjunctions of literals. We define the prefix-type of a formula in pnf inductively as follows. A Boolean formula has the prefix-type Σ0 = Π0. A formula ∀x. φ has the prefix-type Πn+1 (respectively Πn) if φ has the prefix-type Σn (respectively Πn). A formula ∃x. φ has the prefix-type Σn+1 (respectively Σn) if φ has the prefix-type Πn (respectively Σn).
Finally, the size of a formula φ, denoted by | φ |, is the number of variable occurrences, connectives and (maximal) quantifier blocks in φ, i.e., the size of the abstract syntax tree for φ, where like quantifiers are grouped in blocks and only counted once.
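The expansion semantics just given can be rendered directly as a (naive, exponential) evaluator. The following sketch uses a toy nested-tuple representation of formulae of our own devising; it illustrates the semantics only and is not part of Qubos.

```python
# Toy QBL evaluator following the expansion semantics above:
# σ satisfies ∀x. φ iff it satisfies φ[⊤/x] ∧ φ[⊥/x], and dually for ∃.
# Formulae are nested tuples; this representation is our own illustration.

def evaluate(phi, sigma):
    """Evaluate a quantified Boolean formula under substitution sigma."""
    if isinstance(phi, bool):                 # the constants ⊤ / ⊥
        return phi
    op = phi[0]
    if op == 'var':
        return sigma[phi[1]]
    if op == 'not':
        return not evaluate(phi[1], sigma)
    if op == 'and':
        return evaluate(phi[1], sigma) and evaluate(phi[2], sigma)
    if op == 'or':
        return evaluate(phi[1], sigma) or evaluate(phi[2], sigma)
    if op in ('forall', 'exists'):
        _, x, body = phi
        # quantifier expansion: try both truth values for x
        vals = [evaluate(body, {**sigma, x: b}) for b in (True, False)]
        return all(vals) if op == 'forall' else any(vals)
    raise ValueError(f"unknown connective {op}")
```

For instance, evaluate applied to ∀x. x ∨ ¬x under the empty substitution yields true, while ∀x. x yields false.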
3
Structured Problems
Our thesis is that our decision procedure works well (in particular, it is superior to other state-of-the-art approaches) when certain kinds of structure are present in the problems to be solved. In this section we explain what structure is, how one measures it, and why certain classes of problems will naturally have this structure.
A. Ayari and D. Basin
The structure we exploit is based on a notion of quantifier scope, in particular the size of quantified subterms relative to the size of the entire term. When the average quantifier scope is small, our transformations can often successfully eliminate quantifiers in manageable time and space. In our experiments, it is important to be able to measure structure to assess its effects on the decision procedure’s performance. Our measure is based on the average quantifier weight, defined as follows:

Definition 1. Let φ be a quantified Boolean formula, Q ∈ {∀, ∃}, MQ be the multiset of all Q-quantified subformulae of φ, and ψ ∈ MQ. The relative Q-weight of ψ with respect to φ is rwφQ(ψ) = |ψ| / |φ|. The average Q-weight of φ is awQ(φ) = (1/|MQ|) · Σψ∈MQ rwφQ(ψ).

Now, well-structured formulae are those with either a small average ∀-weight or small average ∃-weight (typically under 5%, as we will see for the first problem domain we consider), i.e., those in which, for at least one of the quantifiers, quantified variables have small scopes on average. In contrast, poorly structured formulae with large average weight have many quantifiers with large scopes. The two domains we consider are system verification using bounded model construction [1], and conditional planning [14]. For the first domain, we show that problems are always well-structured. In the second domain, the degree of structure varies considerably. The corresponding effectiveness of our decision procedure also varies in relationship to this structure.

3.1
Bounded Model Construction
Bounded model construction (BMC) [1] is a method for generating models for a monadic formula by reducing its satisfiability problem to a QBL satisfiability problem. This method has been applied to problems in hardware verification, protocol verification, and reasoning about Java bytecode correctness. We will present a small example to show how structured problems arise in BMC and then explain why this is generally the case. The example is reasoning about a parameterized family of ripple-carry adders: verifying the equivalence of adders in the family with their specification, for all parameter instances. The monadic formulae describing part of the implementation and specification of the adder are as follows:

adder(n, A, B, S, ci, co) ≡ ∃C. (C(0) ↔ ci) ∧ (C(n) ↔ co) ∧ ∀p. p < n → full_adder(A(p), B(p), S(p), C(p), C(p + 1))

spec(n, A, B, S, ci, co) ≡ ∃C. (C(0) ↔ ci) ∧ (C(n) ↔ co) ∧ ∀p. p < n → at_least_two(A(p), B(p), C(p), C(p + 1)) ∧ mod_two(A(p), B(p), S(p), C(p))

The monadic second-order variables (written in capitals) A and B represent n-bit input vectors, S represents the n-bit output, C the (n + 1)-bit vector of carries, and the Booleans ci and co are the carry-in and carry-out respectively. The
definition of the n-bit adder, for example, states that an n-bit adder is built by chaining together (ripple-carry fashion) n copies of a full one-bit adder, where carries are propagated along an internal line of carries C. The specifications of the auxiliary formulae full_adder, at_least_two and mod_two are straightforward Boolean formulae and can be found in [2]. The equivalence between the specification and the implementation of the adder is stated by the formula

Φ ≡ ∀n. ∀A, B, S. ∀ci, co. adder(n, A, B, S, ci, co) ↔ spec(n, A, B, S, ci, co) .   (1)

In this example, BMC takes as input the negation of (1) and a natural number k. It produces a quantified Boolean formula as follows. First, first-order quantified subformulae are unfolded k times; that is, formulae having the form ∀x. φ (respectively, ∃x. φ), where x ranges over the natural numbers, are unfolded into the formula ⋀i∈{1,...,k} φ[i/x] (respectively, ⋁i∈{1,...,k} φ[i/x]). In our example, the quantification over n in (1) and over p in the predicates adder and spec are unfolded k times. Afterwards, second-order quantification is eliminated: each second-order variable is replaced with k Boolean variables. For example, ∀A is replaced with the quantifier block ∀a1, . . . , ak and every occurrence of the predicate A(i) is replaced with the Boolean variable ai. This kind of transformation produces a quantified Boolean formula whose size is O(k²·|φ|) in the bound k and original formula φ. In general, applications to practical verification problems give rise to large quantified Boolean formulae, often on the order of 20 megabytes for the larger examples that we have tackled. Central to our approach here is the fact that the transformation always produces formulae with a large amount of structure, as we explain below. In the above transformation, large formulae (due to the k² factor in the expansion) result from expanding first-order quantification. In this example, we quantify outermost over n in stating our correctness theorem, and this is always the case when verifying theorems about parameterized systems. Similarly, when reasoning about time dependent systems, like sequential circuits or protocols, one also always quantifies outermost over n, which represents time or the number of steps.
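The k-fold unfolding step just described can be sketched as follows, on a toy tuple representation of our own (the node tags 'fovar' and 'index' are illustrative names, not Qubos internals).

```python
from functools import reduce

# Sketch of the k-fold BMC unfolding: ∀x. φ becomes the conjunction of
# φ[i/x] for i = 1..k, and ∃x. φ the corresponding disjunction.

def subst(phi, x, i):
    """Replace first-order variable x by the concrete index i."""
    if isinstance(phi, tuple):
        if phi[0] == 'fovar' and phi[1] == x:
            return ('index', i)
        return tuple(subst(a, x, i) for a in phi)
    return phi

def unfold(quantifier, x, phi, k):
    """Unfold Qx. φ (x ranging over 1..k) into a k-fold ∧ / ∨."""
    instances = [subst(phi, x, i) for i in range(1, k + 1)]
    op = 'and' if quantifier == 'forall' else 'or'
    return reduce(lambda a, b: (op, a, b), instances)
```

Unfolding ∀n. p(n) with k = 3 then yields the conjunction p(1) ∧ p(2) ∧ p(3), in agreement with the O(k²·|φ|) blow-up discussed above once nested quantifiers are unfolded as well.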
The unfolding of this outermost quantifier alone is the main reason why BMC results in a quantified Boolean formula of small average quantifier weight since, after the unfolding, the remaining quantified subformulae have a relative weight of at most 1/k of the original formula. The unfolding of additional first-order quantifiers only serves to further reduce the average weight. Hence we have:

Lemma 1. Let Φ ≡ Qn. φ be a first-order quantified monadic formula where Q ∈ {∀, ∃} and let Φ′ (respectively φ′) be the result of the BMC expansion of Φ (respectively φ) with bound k ∈ N. It holds that awQ(Φ′) = (1/k) · awQ(φ′), for Q ∈ {∀, ∃}.

Of course, BMC also eliminates second-order quantification, where a second-order quantifier is replaced with a block of Boolean quantifiers. In general, this has a negligible effect on the amount of structure since, after the outermost unfolding, these quantifiers have small relative scope. It follows then that BMC
produces well-structured problems. Moreover, there is a positive correlation between problem size (resulting from large values of k) and structure, which helps to explain the good performance of our decision procedure on problems in this class.

3.2
Conditional Planning in Artificial Intelligence
The second problem domain that we use for experiments is conditional planning in QBL. A conditional planning problem is the task of finding a finite sequence of actions (which comprise a plan) whose successive application, starting from an initial state, leads to a goal state. Applications of conditional planning include robotics, scheduling, and building controllers. The main difference between conditional and classical planning is that the initial states as well as the moves from one state to another depend on different circumstances that can be tested. This leads to interesting QBL problems. As shown in [14], finding a solution for a conditional planning problem can be expressed as a satisfiability problem for a quantified Boolean formula of the form:

P ≡ ∃P1, . . . , Pm. ∀C1, . . . , Cn. ∃O1, . . . , Op. Φ .

The validity of the formula P means that there is a plan (represented by the variables P1, . . . , Pm) such that for any contingencies (represented by the variables C1, . . . , Cn) that could arise, there is a finite sequence of operations (O1, . . . , Op) whose applications allow one to reach the goal state starting from an initial state. The body Φ is a conjunction of formulae stating the initial states, goal states, and the next-state relation. If n = 0 then P encodes a classical (non-conditional) planning problem. In this case, the validity of P can be checked using a SAT-solver. In the n ≠ 0 case, in general miniscoping can only partially succeed in pushing the quantifier ∃O1, . . . , Op down in Φ; this in turn limits the miniscoping of the other quantifiers, e.g., ∀C1, . . . , Cn. As a result, even after miniscoping, the average ∀-weight is

(n + p + |Φ|) / (m + n + p + |Φ|) = 1 − m / (m + n + p + |Φ|),

which is high, up to 90%, for large n, m, p, and |Φ|. The average ∃-weight tends to be better since by pushing down, even partially, the ∃O1, . . . , Op, we increase the amount of (∃-)structure in P and obtain a better average weight, typically between 50% and 70%. Furthermore, the average ∃-weight generally becomes larger (respectively smaller) when we decrease (respectively increase) one of the factors p and |Φ|. Hence conditional planning gives us a potentially large spectrum of problems with differing amounts of structure. Moreover, there are standard databases of such planning problems that exhibit such variations, which we can use for testing.
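The weight measure of Definition 1 can be computed on the same kind of toy formula representation as follows (a sketch of our own; for simplicity every quantifier node is counted as its own block).

```python
# Definition 1 on nested tuples: ('var', x), ('not', φ), ('and'/'or', φ, ψ),
# ('forall'/'exists', x, φ). Our own illustration, not Qubos code.

def size(phi):
    """|φ|: variable occurrences, connectives, and quantifier nodes."""
    if isinstance(phi, bool) or phi[0] == 'var':
        return 1
    if phi[0] == 'not':
        return 1 + size(phi[1])
    if phi[0] in ('and', 'or'):
        return 1 + size(phi[1]) + size(phi[2])
    return 1 + size(phi[2])               # quantifier node + body

def subformulae(phi):
    yield phi
    if isinstance(phi, bool) or phi[0] == 'var':
        return
    children = phi[1:] if phi[0] in ('not', 'and', 'or') else (phi[2],)
    for child in children:
        yield from subformulae(child)

def average_weight(phi, q):
    """aw^Q(φ): mean of |ψ|/|φ| over all q-quantified subformulae ψ."""
    m_q = [psi for psi in subformulae(phi)
           if not isinstance(psi, bool) and psi[0] == q]
    if not m_q:
        return 0.0
    return sum(size(psi) / size(phi) for psi in m_q) / len(m_q)
```

For ∀x. (x ∨ ∃y. y), for example, the average ∀-weight is 1.0 (the only ∀ spans the whole formula) while the average ∃-weight is 2/5.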
proc Qubos(φ, SAT) ≡
  let Q ∈ {∀, ∃} be the quantifier kind with smallest awQ;
  while (φ contains Q’s) do
    miniscope the quantifiers in φ;
    eliminate the innermost Q block;
    simplify φ;
  od;
  compute input α for SAT from φ;
  invoke SAT with the input α;
end

Fig. 1. The Qubos Main Loop
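The loop of Figure 1 can be rendered schematically in Python. Every helper callable below is a stand-in (with names of our own) for a transformation the paper describes; this is not the actual C++ Qubos implementation.

```python
# Schematic rendering of the Fig. 1 main loop. All helpers are supplied
# by the caller; the final SAT handoff follows the description in the text:
# an ∃-prefix is dropped, a ∀-prefix is handled by negating the matrix
# and complementing the SAT answer.

def qubos(phi, sat_solve, *, average_weight, contains, miniscope,
          eliminate_innermost, simplify, strip_quantifiers, negate, to_cnf):
    # Expand the quantifier kind with the smaller average weight.
    q = min(('forall', 'exists'), key=lambda k: average_weight(phi, k))
    while contains(phi, q):
        phi = miniscope(phi)
        phi = eliminate_innermost(phi, q)
        phi = simplify(phi)
    # Only the other quantifier kind can remain; drop its blocks.
    remaining = 'exists' if q == 'forall' else 'forall'
    matrix = strip_quantifiers(phi)
    if remaining == 'exists':
        # ∃X. m is satisfiable iff the matrix m is.
        return sat_solve(to_cnf(matrix))
    # ∀X. m is valid iff ¬m is unsatisfiable; complement the SAT answer.
    return not sat_solve(to_cnf(negate(matrix)))
```

Passing trivial stubs (identity transformations, a Boolean "formula") exercises the two handoff branches without a real SAT-solver.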
4
Qubos
We present in this section the decision procedure implemented by our system Qubos. The main idea is to iterate normalization using miniscoping with selective quantifier expansion and simplification. For well-structured problems, the combination often does not require significant additional space; we will provide experimental evidence for this thesis in Section 5. The structure of the main routine of our decision procedure is given in Figure 1. It takes as arguments a quantified Boolean formula φ and a SAT-solver SAT. The initial step determines whether the average quantifier weight is smaller for ∀ or ∃. Afterwards Qubos iterates three transformations to reduce φ to a Boolean formula. As each iteration results in fewer Q-quantifiers, the procedure always terminates (given sufficient memory). At the end of this step, the formula φ contains only one kind of quantifier. Afterwards, Qubos computes the input formula of the SAT-solver SAT depending on the quantifier kind Q and on whether SAT operates on Boolean formulae or on formulae in clausal form. If Q is the quantifier ∃ then Qubos deletes all the occurrences of Q and generates the input of SAT. If Q is the quantifier ∀ then Qubos also deletes all the occurrences of Q, negates the resulting formula, generates the input of SAT, and finally complements the result returned by the SAT-solver. Below, we describe the transformations used in the main loop in more detail.

Miniscoping. Miniscoping is the process of pushing quantifiers down inside a formula to their minimal possible scope. By reducing the scope of quantifiers, miniscoping reduces the size of the formula resulting from subsequent quantifier expansion. The following rules for miniscoping are standard.

∀x. φ ∧ ψ ⇒ (∀x. φ) ∧ ∀x. ψ
∀x. φ ∨ ψ ⇒ (∀x. φ) ∨ ψ,  if x ∉ free(ψ)
∀x. φ ⇒ φ,  if x ∉ free(φ)
∃x. φ ∨ ψ ⇒ (∃x. φ) ∨ ∃x. ψ
∃x. φ ∧ ψ ⇒ (∃x. φ) ∧ ψ,  if x ∉ free(ψ)
∃x. φ ⇒ φ,  if x ∉ free(φ)
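One recursive pass of these miniscoping rules can be sketched as follows, again on a toy tuple representation of our own devising.

```python
# Miniscoping on nested tuples: ('var', x), ('not', φ), ('and'/'or', φ, ψ),
# ('forall'/'exists', x, φ). ∀ distributes over ∧, ∃ over ∨; over the dual
# connective a quantifier may be pushed onto the side where x is free.

def free(phi):
    if isinstance(phi, bool):
        return set()
    if phi[0] == 'var':
        return {phi[1]}
    if phi[0] == 'not':
        return free(phi[1])
    if phi[0] in ('and', 'or'):
        return free(phi[1]) | free(phi[2])
    return free(phi[2]) - {phi[1]}            # quantifier binds phi[1]

def miniscope(phi):
    if isinstance(phi, bool) or phi[0] == 'var':
        return phi
    if phi[0] == 'not':
        return ('not', miniscope(phi[1]))
    if phi[0] in ('and', 'or'):
        return (phi[0], miniscope(phi[1]), miniscope(phi[2]))
    q, x, body = phi
    if x not in free(body):                   # Qx. φ ⇒ φ
        return miniscope(body)
    dist = 'and' if q == 'forall' else 'or'   # ∀ over ∧, ∃ over ∨
    if isinstance(body, tuple) and body[0] == dist:
        return (dist, miniscope((q, x, body[1])), miniscope((q, x, body[2])))
    other = 'or' if dist == 'and' else 'and'
    if isinstance(body, tuple) and body[0] == other:
        if x not in free(body[2]):            # Qx. φ ∘ ψ ⇒ (Qx. φ) ∘ ψ
            return (other, miniscope((q, x, body[1])), miniscope(body[2]))
        if x not in free(body[1]):
            return (other, miniscope(body[1]), miniscope((q, x, body[2])))
    return (q, x, miniscope(body))
```

For example, ∀x. (x ∨ y) miniscopes to (∀x. x) ∨ y, since x is not free in the second disjunct.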
Note that similar kinds of simplification are performed in first-order theorem proving, where quantifiers are pushed down to reduce dependencies and generate Skolem functions with minimal arities (see [12]). Although miniscoping is simple and intuitively desirable, other QSAT solvers instead work by maxiscoping, i.e., moving quantifiers outwards when transforming formulae to quantified clausal normal form.

Elimination of Quantified Variables. We explain only the elimination of universally quantified variables, as the elimination of existentially quantified variables is similar. In an expansion phase, we eliminate blocks of universally quantified variables by replacing subformulae of the form ∀x. φ with the conjunction φ[⊤/x] ∧ φ[⊥/x]. In special cases (when eliminating universally quantified variables), we can avoid duplication altogether, e.g., when φ does not contain existential quantifiers (cf. [5]). In this case, we proceed as follows: we transform φ into the clausal normal form ψ, remove all tautologies from ψ, and then replace each literal from {y | y is universally quantified in φ} ∪ {¬y | y is universally quantified in φ} with ⊥ in ψ.

Simplification. The application of simplification after each expansion step is important in keeping the size of formulae manageable. We distinguish between four kinds of simplification rules. The first kind consists of the standard simplification rules for Boolean logic that are used to remove tautologies, or perform direct simplification using the idempotence of the connectives ∨ and ∧ and the fact that ⊥ and ⊤ are their (respective) identities. The second kind of simplification rule is based on a generalization of the unit clause rule (also called Boolean constraint propagation [21]). These rules are as follows (where l is a literal):

l ∨ φ ⇒ l ∨ φ[⊥/l]
l ∧ φ ⇒ l ∧ φ[⊤/l]
These rules are especially useful in combination with miniscoping as they often lead to new opportunities for miniscoping to be applied. For example, using the above rules, the formula ∀x. ∃y, z. x ∨ (y ∧ ¬z) ∨ (¬y ∧ z ∧ ¬x) can be simplified to ∀x. ∃y, z. x ∨ (y ∧ ¬z) ∨ (¬y ∧ z), which can be further transformed using the miniscoping rules to

(∀x. x) ∨ ((∃y. y) ∧ (∃z. ¬z) ∨ (∃y. ¬y) ∧ (∃z. z)) .   (2)
This example also motivates why miniscoping is in the Qubos main loop, as opposed to being applied only once initially.
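A sketch of the generalized unit-clause rules on the toy tuple representation used earlier (our own illustration; the constant folding performed by the first kind of simplification rules, e.g. ⊤ ∧ φ ⇒ φ, is omitted here).

```python
# l ∨ φ ⇒ l ∨ φ[⊥/l]  and  l ∧ φ ⇒ l ∧ φ[⊤/l]: inside the sibling of a
# literal, the literal may be assumed false (under ∨) or true (under ∧).

def assume(phi, x, val):
    """Substitute the constant val for variable x, folding ¬ over constants."""
    if isinstance(phi, bool):
        return phi
    if phi[0] == 'var':
        return val if phi[1] == x else phi
    if phi[0] == 'not':
        inner = assume(phi[1], x, val)
        return (not inner) if isinstance(inner, bool) else ('not', inner)
    if phi[0] in ('forall', 'exists'):
        if phi[1] == x:
            return phi                        # x is rebound below this point
        return (phi[0], phi[1], assume(phi[2], x, val))
    return (phi[0], assume(phi[1], x, val), assume(phi[2], x, val))

def literal(phi):
    """Return (variable, polarity) if phi is a literal, else None."""
    if isinstance(phi, tuple) and phi[0] == 'var':
        return phi[1], True
    if (isinstance(phi, tuple) and phi[0] == 'not'
            and isinstance(phi[1], tuple) and phi[1][0] == 'var'):
        return phi[1][1], False
    return None

def bcp(phi):
    """Apply the two rules top-down, assuming the literal in its sibling."""
    if not (isinstance(phi, tuple) and phi[0] in ('and', 'or')):
        return phi
    op, a, b = phi
    lit = literal(a)
    if lit is None:
        return (op, bcp(a), bcp(b))
    x, positive = lit
    # l is assumed true under ∧ and false under ∨ (flipped when l = ¬x).
    assumed_value = (op == 'and') == positive
    return (op, a, bcp(assume(b, x, assumed_value)))
```

On x ∨ (¬x ∧ y) this assumes x false in the second disjunct, turning ¬x into ⊤ there, after which constant folding would leave x ∨ y.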
The third kind of simplification rule consists of the following quantifier-specific rules.

∃x. φ ⇒ φ,  if x ∉ free(φ)
∃x. l ⇒ ⊤,  for l ∈ {x, ¬x}
∃x. x ∧ φ ⇒ φ[⊤/x]
∃x. (¬x) ∧ φ ⇒ φ[⊥/x]
∀x. φ ⇒ φ,  if x ∉ free(φ)
∀x. l ⇒ ⊥,  for l ∈ {x, ¬x}
∀x. x ∨ φ ⇒ φ[⊥/x]
∀x. (¬x) ∨ φ ⇒ φ[⊤/x]
These rules are often effective in eliminating both kinds of quantifiers and therefore avoiding expansion steps. The application of these rules to the formula (2) above simplifies it to ⊤. The fourth kind of simplification rule is based on a technique commonly used by solvers based on clausal normal form and consists of dropping variables that occur only positively or only negatively in the clause set. This technique can also be applied to quantified Boolean formulae that are in nnf. Let φ be a quantified Boolean formula in nnf and x a variable occurring in φ; we say that x is monotone in φ if it occurs only positively or only negatively in φ. It is easy to show that formulae with monotone variables have the following property.

Proposition 1. Let φ be a quantified Boolean formula in nnf and let Qx.ψ (for Q ∈ {∀, ∃}) be a subformula of φ where x is monotone in φ. Then the formulae φ and φ′ are equivalent, where:
(i) If Q is the quantifier ∃ then φ′ is obtained from φ by replacing Qx.ψ with ψ[⊤/x] (respectively ψ[⊥/x]) if x occurs positively (respectively negatively).
(ii) If Q is the quantifier ∀ then φ′ is obtained from φ by replacing Qx.ψ with ψ[⊥/x] (respectively ψ[⊤/x]) if x occurs positively (respectively negatively).
This proposition provides a way of eliminating both universally and existentially quantified variables without applying the expansion step, provided the variables are monotone. Clausal Normal Form. Before handing off the normalized formula to a SAT solver we must transform it into clausal normal form. We do this using the renaming technique of [4] where subformulae are replaced with new Boolean variables and definitions of these new Booleans are added to the formula. This technique allows the generation of the clauses in time linear in the size of the input formula.
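The renaming idea can be sketched with a plain Tseitin-style encoding: every internal connective gets a fresh definition variable plus clauses defining it, giving an equisatisfiable CNF of linear size. Note the paper uses the optimized renaming of Boy de la Tour [4]; this sketch (our own) shows only the basic idea.

```python
import itertools

# Plain Tseitin-style renaming to CNF. A literal is (name, polarity);
# a clause is a list of literals. Fresh variables are named _d0, _d1, ...
# (illustrative naming of ours). Output size is linear in the input size.

def tseitin(phi):
    clauses = []
    counter = itertools.count()

    def neg(lit):
        return (lit[0], not lit[1])

    def enc(f):
        if f[0] == 'var':
            return (f[1], True)
        if f[0] == 'not':
            return neg(enc(f[1]))
        a, b = enc(f[1]), enc(f[2])
        d = (f"_d{next(counter)}", True)      # fresh definition variable
        if f[0] == 'and':
            # d ↔ a ∧ b : (¬d ∨ a), (¬d ∨ b), (d ∨ ¬a ∨ ¬b)
            clauses.extend([[neg(d), a], [neg(d), b], [d, neg(a), neg(b)]])
        else:
            # d ↔ a ∨ b : (¬d ∨ a ∨ b), (d ∨ ¬a), (d ∨ ¬b)
            clauses.extend([[neg(d), a, b], [d, neg(a)], [d, neg(b)]])
        return d

    clauses.append([enc(phi)])               # assert the root definition
    return clauses
```

Each binary connective contributes three clauses plus one final unit clause for the root, so the clause count is 3c + 1 for c connectives.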
5
Experimental Results
We have built a system, Qubos (QUantified BOolean Solver), based on the ideas presented in Section 4. The system is written in C++ and supports the use of
Table 1. Examples from the BMC library
different SAT-solvers including Prover [19], Heerhugo [9], Sato [22] and Zchaff [11]. The times reported below are based on Zchaff. In these timings, typically 60% of the time is consumed by our system and 40% by Zchaff. We carried out comparisons with the Qbf [16] and Semprop [10] systems, which are both state-of-the-art systems based on extensions of Davis-Putnam. The runtimes (on a 750 MHz Sun Ultra Sparc workstation) depicted in the tables below are user time (in seconds) reported by the operating system for all computation required. Times greater than one hour are indicated by the symbol abort. We used two sets of benchmarks for our comparison. The first is obtained by applying bounded model construction to a library of monadic formulae modeling several verification tasks. These problems include:

1. Formulae encoding the equivalence of the specification and implementation of a ripple-carry adder for different bit-widths.
2. Formulae stating safety properties of a lift-controller.
3. Formulae encoding the equivalence of von Neumann adders and ripple-carry adders with varying bit-width.
4. Formulae stating the stability of a timed flip-flop model.
5. Formulae stating the mutual exclusion property for two protocols.

The second set contains encodings of conditional planning problems generated by Rintanen [16] as well as their negations. Table 1 shows the results of the comparison. Each table gives information on quantificational structure, the size k of the model investigated, running times, Qubos space requirements in megabytes, the average quantifier weight, and the prefix type of the problems. The input formulae are of size 10⁵, on average, with respect to |·| defined in Section 2. Qubos has dramatically better performance on all of these examples. The reason is that these problems all have very high structure and, as explained previously, the amount of structure improves (the average quantifier weight decreases) as k and the formulae become larger.
These examples also demonstrate that, for well-structured formulae, memory requirements are typically modest; for example, the adder problems use 2 megabytes on average. On the other hand, Qbf and Semprop translate the problems into quantified clausal form, which drastically increases the quantifier scope and the time and space required to find a solution. The second set of examples contains encodings of block-world planning problems where there is significantly less structure, although varied. Table 2 shows the time required to solve different block planning problems and their negations. The instances are called x.iii.y, where x denotes the number of blocks, y denotes the length of the plan, and iii stands for the encoding strategy used to generate the problem (cf. [15]). The instances are ordered by the number of blocks and their size. The left part of Table 2, titled “Positive (∃∀∃), Q ≡ ∃”, contains the results of the (positive) block planning problems and the right part, titled “Negative (∀∃∀∃), Q ≡ ∀”, contains the results of the negated block planning problems. A (positive) block planning problem has the general form ∃∀∃φ, where φ is a Boolean formula, and its negation has the form ∀∃∀¬φ. Since the negative problems are just the negations of the positive problems, the average ∃-weight in
Table 2. Block-World Planning Problems
the positive case and the average ∀-weight in the negative case are identical, and their values are displayed in the second column of Table 2. In the positive case, the system Semprop generally either diverges or is very fast. The system Qbf always succeeds with respectable runtimes. For Qubos there is a close relationship between its success and the average quantifier weight: the performance of Qubos decreases as the average quantifier weight rises. Qubos succeeds for the small problems, up to size 10³ (with respect to |·|), even when the average quantifier weight is high, but it requires significantly more time than Qbf. When the problems become larger, up to size 10⁵, and the average quantifier weight is high, then Qubos exhausts memory. The superior performance of Qbf in this domain is not too surprising: it was developed and tuned precisely to solve this class of planning problems. In the negative case, the results show that Qubos is robust with respect to the quantificational structure and its success depends decisively on the average
quantifier weight. Notice that although the problems in the positive case as well as in the negative case have the same average quantifier weight, Qubos requires in general less CPU time for the negative problems. This can be explained by the fact that the negation makes these problems easier. When applying Qbf and Semprop to the negative problems, the negated formula ¬φ is first transformed into clausal form and thereby a new block of existentially quantified variables (due to the renaming technique described in Section 4) is introduced, and so these problems have a ∀∃∀∃-structure. As a result these problems no longer have the shape of ∃∀∃ planning problems, which accounts for the divergence of Qbf. Notice that the Mona system can also be used for these examples. A detailed comparison of Mona with the BMC approach can be found in [1]. On the examples given here, Mona yields comparable results for the ripple-carry adder, flip-flop, and mutex examples. It yields poorer results for the von Neumann adders, lift-controller, and planning problems. For example, for the von Neumann adders with bit-width less than 11 it is up to a factor of 3 slower than Qubos, and it diverges on the rest of the von Neumann adders, the lift-controller, and all of the planning problems.
6
Conclusion and Future Work
We presented an approach to deciding quantified Boolean logic that works directly on fully-quantified Boolean formulae. We gave a characterization of structure, defined an interesting, natural class of well-structured problems, and showed experimentally that our approach works well for problems in this class. One issue that is not addressed in our implementation of Qubos is the impact of the order in which quantified subformulae are expanded. Currently Qubos selects the innermost quantified subformula. As future work, we intend to investigate the effect of different selection strategies, such as ordering the quantified formulae with respect to their relative structure.

Acknowledgments. The authors would like to thank Jussi Rintanen for providing us with the planning examples used in Section 5.
References 1. Abdelwaheb Ayari and David Basin. Bounded model construction for monadic second-order logics. In 12th International Conference on Computer-Aided Verification (CAV’00), number 1855 in LNCS. Springer-Verlag, 2000. 2. Abdelwaheb Ayari, David Basin, and Stefan Friedrich. Structural and behavioral modeling with monadic logics. In Rolf Drechsler and Bernd Becker, editors, The Twenty-Ninth IEEE International Symposium on Multiple-Valued Logic. IEEE Computer Society, Los Alamitos, Freiburg, Germany, May 1999. 3. Armin Biere, Alessandro Cimatti, Edmund Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In TACAS’99, volume 1579 of LNCS. Springer, 1999. 4. Thierry Boy de la Tour. An optimality result for clause form translation. Journal of Symbolic Computation, 14(4), October 1992.
5. Hans Kleine Büning and Theodor Lettmann. Aussagenlogik: Deduktion und Algorithmen. B. G. Teubner, Stuttgart, 1994. 6. Marco Cadoli, Andrea Giovanardi, and Marco Schaerf. An algorithm to evaluate quantified Boolean formulae. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and of the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98), July 26–30 1998. 7. Enrico Giunchiglia, Massimo Narizzano, and Armando Tacchella. Backjumping for quantified boolean logic satisfiability. In Proceedings of the 17th International Conference on Artificial Intelligence (IJCAI-01), August 4–10 2001. 8. Enrico Giunchiglia, Massimo Narizzano, and Armando Tacchella. QuBE: A system for deciding Quantified Boolean Formulas Satisfiability. In Proceedings of the International Joint Conference on Automated Reasoning (IJCAR’01), June 2001. 9. Jan Friso Groote and Joost P. Warners. The propositional formula checker HeerHugo. In Ian Gent, Hans van Maaren, and Toby Walsh, editors, SAT2000: Highlights of Satisfiability Research in the year 2000, Frontiers in Artificial Intelligence and Applications. Kluwer Academic, 2000. 10. Reinhold Letz. Advances in decision procedures for quantified boolean formulas. In Uwe Egly, Rainer Feldmann, and Hans Tompits, editors, Proceedings of the QBF2001 workshop at IJCAR’01, June 2001. 11. Matthew Moskewicz, Conor Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an Efficient SAT Solver. In Proceedings of the 38th Design Automation Conference (DAC’01), June 2001. 12. Andreas Nonnengart and Christoph Weidenbach. Computing small clause normal forms. In Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume I, chapter 6. Elsevier Science B.V., 2001. 13. David Plaisted, Armin Biere, and Yunshan Zhu. A satisfiability procedure for quantified boolean formulae. Unpublished, 2001. 14. Jussi Rintanen. Constructing conditional plans by a theorem-prover.
Journal of Artificial Intelligence Research, 10, 1999. 15. Jussi Rintanen. Improvements to the evaluation of quantified boolean formulae. In Dean Thomas, editor, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol2). Morgan Kaufmann Publishers, S.F., July 31–August 6 1999. 16. Jussi Rintanen. Partial implicit unfolding in the Davis-Putnam procedure for quantified Boolean formulae. In R. Nieuwenhuis and A. Voronkov, editors, Proceedings of the 8th International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 2250 of LNCS. Springer-Verlag, Berlin, 2001. 17. Christoph Scholl and Bernd Becker. Checking equivalence for partial implementations. In Design Automation Conference, 2001. 18. Ofer Shtrichman. Tuning SAT checkers for bounded model checking. In 12th International Conference on Computer-Aided Verification (CAV’00), number 1855 in LNCS. Springer-Verlag, 2000. 19. Gunnar Stålmarck. A system for determining propositional logic theorems by applying values and rules to triplets that are generated from a formula. Technical report, European Patent Nr. 0403 454 (1995), US Patent Nr. 5 276 897, Swedish Patent Nr. 467 076 (1989), 1989. 20. Poul Williams, Armin Biere, Edmund Clarke, and Anubhav Gupta. Combining Decision Diagrams and SAT Procedures for Efficient Symbolic Model Checking. In Proceedings of CAV’00, number 1855 in LNCS. Springer-Verlag, 2000.
21. Ramin Zabih and David McAllester. A rearrangement search strategy for determining propositional satisfiability. In Reid G. Smith and Tom M. Mitchell, editors, Proceedings of the 7th National Conference on Artificial Intelligence, St. Paul, MN, August 1988. Morgan Kaufmann. 22. Hantao Zhang. SATO: An efficient propositional prover. In CADE’97, volume 1249 of LNAI. Springer, 1997.
Exploiting Transition Locality in the Disk Based Murϕ Verifier Giuseppe Della Penna1, Benedetto Intrigila1, Enrico Tronci2, and Marisa Venturini Zilli2 1
Dip. di Informatica, Università di L’Aquila, Coppito 67100, L’Aquila, Italy {gdellape,intrigil}@univaq.it 2 Dip. di Informatica, Università di Roma “La Sapienza”, Via Salaria 113, 00198 Roma, Italy {tronci,zilli}@dsi.uniroma1.it
Abstract. The main obstruction to automatic verification of Finite State Systems is the huge amount of memory required to complete the verification task (state explosion). This motivates research on distributed as well as disk based verification algorithms. In this paper we present a disk based Breadth First Explicit State Space Exploration algorithm as well as an implementation of it within the Murϕ verifier. Our algorithm exploits transition locality (i.e. the statistical fact that most transitions lead to unvisited states or to recently visited states) to decrease disk read accesses, thus reducing the time overhead due to disk usage. A disk based verification algorithm for Murϕ has already been proposed in the literature. To measure the speedup due to locality exploitation we compared our algorithm with that previously proposed algorithm. Our experimental results show that our disk based verification algorithm is typically more than 10 times faster than that previously proposed disk based verification algorithm. To measure the time overhead due to disk usage we compared our algorithm with RAM based verification using the (standard) Murϕ verifier with enough memory to complete the verification task. Our experimental results show that even when using 1/10 of the RAM needed to complete verification, our disk based algorithm is only between 1.4 and 5.3 times (3 times on average) slower than (RAM) Murϕ with enough RAM memory to complete the verification task at hand. Using our disk based Murϕ we were able to complete verification of a protocol with about 10⁹ reachable states. This would require more than 5 gigabytes of RAM using RAM based Murϕ.
1 Introduction
State Space Exploration (Reachability Analysis) is at the very heart of all algorithms for automatic verification of concurrent systems. As is well known, the
This research has been partially supported by MURST projects MEFISTO and SAHARA Corresponding Author: Enrico Tronci. Tel: +39 06 4991 8361 Fax: +39 06 8541 842
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 202–219, 2002. c Springer-Verlag Berlin Heidelberg 2002
main obstruction to automatic verification of Finite State Systems (FSS) is the huge amount of memory required to complete state space exploration (state explosion). For protocol-like systems, Explicit State Space Exploration often outperforms Symbolic (i.e. OBDD based, [1,2]) State Space Exploration [8]. Since here we are mainly interested in protocol verification, we focus on explicit state space exploration. Tools based on explicit state space exploration are, e.g., SPIN [6,14] and Murϕ [4,11]. In our context, roughly speaking, two kinds of approaches have been studied to counteract (i.e. delay) state explosion: memory saving and auxiliary storage. In a memory saving approach one essentially tries to reduce the amount of memory needed to represent the set of visited states. Examples of the memory saving approach are, e.g., in [23,9,10,17,18,7]. In an auxiliary storage approach one tries to exploit disk storage as well as distributed processors (network storage) to enlarge the available memory (and CPU). Examples of this approach are, e.g., in [15,16,12,20,13,5]. By exploiting statistical properties of protocol transition graphs it is possible to trade space for time [21,22], thus enlarging the class of systems for which automatic verification is feasible. In particular, in [21] it has been shown that protocols exhibit locality. That is, w.r.t. levels of a Breadth First Search (BFS), state transitions tend to be between states belonging to close levels of the transition graph. In [21] an algorithm exploiting locality in order to save RAM was also presented, together with an implementation of it within the Murϕ verifier. It is then natural, and worthwhile, to look for a way to exploit locality also when using a disk based state exploration algorithm. In this paper we present a Disk based Breadth First Search (DBFS) algorithm that exploits transition locality. Our algorithm is obtained by modifying the DBFS algorithm presented in [16].
Our main results can be summarized as follows.
– We present a DBFS algorithm that is able to exploit transition locality. Essentially, our algorithm is obtained from the one in [16] by using only a suitable subset of the states stored on disk to clean up the unchecked states BFS queue of [16]. By reducing disk read accesses we also reduce our time overhead w.r.t. a RAM based BFS state space exploration.
– We implemented our algorithm within the Murϕ verifier. As the algorithm in [16], our algorithm is compatible with all state reduction techniques implemented in the Murϕ verifier.
– We ran our DBFS algorithm on some of the protocols included in the standard Murϕ distribution [11]. Our experimental results can be summarized as follows.
  • Even when using 1/10 of the RAM needed to complete verification, our disk based Murϕ is only between 1.4 and 5.3 times slower (3 times on average) than (RAM based) standard Murϕ [11] with enough RAM to complete the verification task at hand.
  • Our disk based algorithm is typically more than 10 times faster than the disk based algorithm presented in [16].
– Using our disk based Murϕ we were able to complete verification of a protocol with almost 10^9 reachable states. Using standard Murϕ this protocol would require more than 5 gigabytes of RAM.
2 Transition Locality for Finite State Systems
In this section we define (from [21]) our notion of locality for transitions. For our purposes, a protocol is represented as a Finite State System. A Finite State System (FSS) S is a 4-tuple (S, I, A, R) where: S is a finite set (of states), I ⊆ S is the set of initial states, A is a finite set (of transition labels) and R is a relation on S × A × S. R is usually called the transition relation of S. Given states s, s′ ∈ S and a ∈ A we say that there is a transition from s to s′ labeled with a iff R(s, a, s′) holds. We say that there is a transition from s to s′ (notation R(s, s′)) iff there exists a ∈ A s.t. R(s, a, s′) holds. The set of successors of state s (notation next(s)) is the set of states s′ s.t. R(s, s′). The set of reachable states of S (notation Reach) is the set of states of S reachable in 0 or more steps from I. Formally, Reach is the smallest set s.t.: 1. I ⊆ Reach; 2. for all s ∈ Reach, next(s) ⊆ Reach. The transition relation R of a given system defines a graph (transition graph). Computing Reach (reachability analysis) means visiting (exploring) the transition graph starting from the initial states in I. This can be done, e.g., using a Depth First Search (DFS) or a Breadth First Search (BFS). In the following we will focus on BFS. As is well known, a BFS defines levels on the transition graph. Initial states (i.e. states in I) are at level 0. The states in (next(I) − I) (states reachable in one step from I and not in I) are at level 1, etc. Formally, we define the set of states at level k (notation L(k)) as follows: L(0) = I, L(k + 1) = {s′ | ∃s s.t. s ∈ L(k) and R(s, s′) and s′ ∉ ∪_{i=0}^{k} L(i)}. Given a state s ∈ Reach we define level(s) = k iff s ∈ L(k). That is, level(s) is the level of state s in a BFS of S. The set Visited(k) of states visited (by a BFS) by level k is defined as follows: Visited(k) = ∪_{i=0}^{k} L(i). Informally, transition locality means that for most transitions source and target states will be in levels not too far apart.
Let S = (S, I, A, R) be an FSS. A transition in S from state s to state s′ is said to be k-local iff |level(s′) − level(s)| ≤ k. In [21] the following fact is shown experimentally: for most protocols and for most states, more than 75% of the transitions are 1-local.
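The definitions above are easy to make concrete. The following sketch (function names are ours, not from the paper) computes BFS levels for a toy transition graph and measures the fraction of k-local transitions; in the toy graph, the single "far back" transition 4 → 0 is the only one that is not 1-local.

```python
from collections import deque

def bfs_levels(init, next_states):
    """Compute level(s) for every reachable state by BFS from the initial states."""
    level = {s: 0 for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        for t in next_states(s):
            if t not in level:            # first visit fixes the BFS level
                level[t] = level[s] + 1
                queue.append(t)
    return level

def k_local_fraction(init, next_states, k):
    """Fraction of transitions (s, t) with |level(t) - level(s)| <= k."""
    level = bfs_levels(init, next_states)
    total = local = 0
    for s in level:
        for t in next_states(s):
            total += 1
            local += abs(level[t] - level[s]) <= k
    return local / total

# Toy FSS: levels are 0:{0}, 1:{1,2}, 2:{3}, 3:{4}; the edge 4 -> 0 is 3-local.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [0]}
frac = k_local_fraction([0], lambda s: graph[s], k=1)   # 5 of 6 transitions
```

On a real protocol the graph would of course be generated on the fly from the transition relation rather than stored explicitly.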
/* Global Variables */
hash table M;         /* main memory table */
file D;               /* disk table */
FIFO queue Q_ck;      /* checked state queue */
FIFO queue Q_unck;    /* unchecked state queue */
int disk_cloud_size;  /* number of blocks to be read from file D */

Fig. 1. Data Structures
3 A Disk Based State Space Exploration Algorithm Exploiting Transition Locality
Magnetic disk read/write times are much larger than RAM read/write times. Thus, not surprisingly, the main drawback of a DBFS (Disk based Breadth First Search) w.r.t. a RAM-BFS (RAM based Breadth First Search) is the time overhead due to disk usage. On the other hand, because of state explosion, memory is one of the main obstructions to automatic verification. Thus using magnetic disks to increase the amount of memory available during verification is very appealing. In [16] a DBFS algorithm has been proposed for the Murϕ verifier. Here we show that by exploiting transition locality (Section 2) the algorithm in [16] can be improved. In particular, disk accesses for reading can be reduced. This decreases the time overhead (w.r.t. a RAM-BFS) due to disk usage. As in [16], we actually have two DBFS algorithms: one for the case in which hash compaction [17,18] (Murϕ option -c) is enabled and one for the case in which hash compaction is not enabled. Like the algorithm in [16], our algorithm handles both cases. In the following we only present the version which is compatible with the hash compaction option. When hash compaction is not enabled the algorithm is actually simpler and can easily be obtained from the version compatible with hash compaction. In the following we call LDBFS our Locality based DBFS algorithm. Figs. 1, 2, 3, 4, 5, and 7 define our LDBFS using a C-like programming language.

Search() {
  /* initialization */
  M = empty; D = empty; Q_ck = empty; Q_unck = empty;
  for each startstate s {Insert(s);}  /* startstate generation */
  do /* search loop */
  {
    while (Q_ck is not empty) {
      s = dequeue(Q_ck);
      for all s' in successors(s) {Insert(s');}
    } /* while */
    Checktable();
  } while (Q_ck is not empty); /* do */
} /* Search() */

Fig. 2. Function Search()
3.1 Data Structures
The data structures used by LDBFS are in Fig. 1 and are essentially the same as the ones used in [16]. We have: a table M to store signatures of recently visited states; a file D to store signatures of all visited states (old states); a checked queue Q_ck to store the states in the BFS level currently explored by the algorithm (BFS front); an unchecked queue Q_unck to store pairs (s, h) where s is a state candidate to be on the next BFS level and h is the signature of state s. As in [16], the state signatures in M do not necessarily represent all visited states. In M we just have recently visited states. Using the information in M we build the unchecked queue Q_unck, i.e. the set of states candidate to be on the next BFS level. Note that the states in Q_unck may be old (i.e. previously visited) since using M we can only avoid re-inserting in Q_unck recently visited states. As in [16] we use disk file D to remove old state signatures from table M as well as to check Q_unck to get rid of old states. The result of this checking process is the checked queue Q_ck. The main difference between our algorithm and the one in [16] is that in the checking process we only use a subset of the state signatures in D. In fact we divide D into blocks and then use only some of such blocks to clean up M and Q_unck. The global variable disk_cloud_size holds the number of blocks of D we use to remove old state signatures from table M. Our algorithm dynamically adjusts the value of disk_cloud_size during the search. Using only a subset of the states in D decreases disk usage and thus speeds up verification. Note however that in [16] the checked queue Q_ck only contains new (i.e. not previously visited) states, whereas in our case Q_ck may also contain some old (i.e. already visited) states. As a result our algorithm may mark as new (unvisited) a state that is indeed old (visited). This means that some state may be visited more than once and thus appended to file D more than once.
However, thanks to transition locality (Section 2), this does not happen too often. It is exactly this statistical property of transition graphs that makes our approach effective. Table M is in main memory (RAM) whereas file D is on disk. We use disk memory also for the BFS queues Q_ck and Q_unck, which instead are kept in main memory in the algorithm proposed in [16]. Our low level algorithm to handle the disk queues Q_ck and Q_unck is exactly the same one we used in Cached Murϕ [21,3] for the same purpose, thus we do not show it here. Note that all the data structures that grow with the state space size (namely: D, Q_ck, Q_unck) are on disk in LDBFS. In [16] D is on disk, but the state queues are in RAM. Since states in the BFS queue are not compressed [11], for large verification problems the BFS queue can be a limiting factor for [16]. For this reason in LDBFS we implemented the state queues on disk.

3.2 Function Search()
Function Search() (Fig. 2) is the same as the one used in the DBFS algorithm in [16].
Exploiting Transition Locality in the Disk Based Murϕ Verifier Insert(state s) { h = hash(s); /* compute signature of state s */ if (h is not in M) { insert h in M; enqueue((s, h), Q_unck); if (M is full) Checktable(); } /* if */ } /* Insert()
207
*/
Fig. 3. Function Insert()
Function Search() is a Breadth First Search using the checked queue Q_ck as the current level state queue. Function Search() first loads the BFS queue (Q_ck) with the initial states. Then Search() begins dequeuing states from Q_ck. For each successor s' of each state dequeued from Q_ck, Search() calls Insert(s') to store potentially new states in M as well as in Q_unck. When queue Q_ck becomes empty it means that all transitions from all states in the current BFS level have been explored. Thus we want to move to the next BFS level. Function Search() does this by calling function Checktable(), which refills the checked queue Q_ck with fresh (non visited) states, if any, from the unchecked queue Q_unck. If, after calling Checktable(), Q_ck is still empty, it means that all reachable states have been visited and the BFS ends.

3.3 Function Insert()
Function Insert() (Fig. 3) is the same as the one used in the DBFS algorithm in [16]. Consider the pair (s, h), where s is a state whose signature is h. If signature h is not in table M then Insert(s) inserts pair (s, h) in the unchecked queue Q_unck and signature h in table M. When M is full, function Insert() calls function Checktable() to clean up M as well as the queues. Function Checktable() is also called at the end of each BFS level (when Q_ck is empty).

3.4 Exploiting Locality in State Filtering
Function Checktable() in the DBFS algorithm in [16] uses all state signatures in disk file D to remove old states from Q_unck. Exploiting locality (Section 2), here we are able to use only a fraction of the state signatures on disk D to clean up table M and queue Q_unck. Disk usage is what slows down a DBFS w.r.t. a RAM-BFS. Thus, by reading fewer states from disk, we save w.r.t. [16] some of the time overhead due to disk (read) accesses. The rationale of our approach stems from the following observations. First, we should note that state signatures are appended to D in the same order in which new states are discovered by the BFS. Thus, as we move towards the tail of file D we find (signatures of) states whose BFS level is closer and closer to the current BFS level, i.e. the BFS level reached by the search. From [21] we know that most transitions are local, i.e. they lead to states that are on BFS levels close to the current one. This means that most of the old states in M can be detected and removed by only looking at the tail of file D. We can take advantage of the above remarks by using the following approach. We divide the disk file D into blocks. Rather than using the whole file D in Checktable() (as done in [16]) we only use a subset of the set of disk blocks. We call such a subset the disk cloud. The disk cloud is created by selecting several disk blocks at random. The selection probability of disk blocks is not uniform. Instead, to exploit locality, the disk block selection probability increases as we approach the tail of D (see Fig. 6). In [21] it is shown that locality allows us to save about 40% of the memory required to complete verification. This suggests using, say, 60% of the disk blocks. Thus the size (number of blocks) of the disk cloud should be 60% of the number of disk blocks. This works fine. However, we can do more. Our experimental results show that, most of the time, we need much less than 60% of the disk blocks to carry out the clean up implemented by function Checktable(). Thus we dynamically adjust the fraction of disk blocks used by function Checktable().

Checktable() /* old/new check for main memory table */
{ /* Disk cloud defined in Section 3.4 */
  /* number of states deleted from M that are in disk cloud */
  deleted_in_cloud = 0;
  /* number of states deleted from M that are on disk but not in disk cloud */
  deleted_not_in_cloud = 0;
  /* Randomly choose indexes of disk blocks to read (disk cloud) */
  DiskCloud = GetDiskCloud();
  /* something_not_in_cloud is true iff there exists a state on disk
     that is not in the disk cloud */
  if (there exists a disk block not selected in DiskCloud)
       something_not_in_cloud = true;
  else something_not_in_cloud = false;
  Calibration_Required = QueryCalibration();
  for each Block in D {
    if (Block is in DiskCloud or Calibration_Required) {
      for all state signatures h in Block {
        if (h is in M) {
          remove h from M;
          if (Block is in DiskCloud) { deleted_in_cloud++; }
          else /* Block is not in DiskCloud */ { deleted_not_in_cloud++; }}}}}
  /* remove old states from state queue and add new states to disk */
  while (Q_unck is not empty) {
    (s, h) = dequeue(Q_unck);
    if (h is in M) { append h to D; remove h from M; enqueue(Q_ck, s); }}
  /* clean up the hash table */
  remove all entries from M;
  /* adjust disk cloud size, if requested */
  if (Calibration_Required) {
    if (something_not_in_cloud and (deleted_in_cloud + deleted_not_in_cloud > 0))
      { Calibrate(deleted_in_cloud, deleted_not_in_cloud); }
    if (disk access rate has been too long above a given critical limit)
      { reset disk cloud size to its initial value with given probability P; }
  } /* if Calibration_Required */
} /* Checktable() */

Fig. 4. Function Checktable() (state filtering)

GetDiskCloud() {
  Randomly select disk_cloud_size blocks from disk according to
  the probability distribution shown in Fig. 6.
  Return the indexes of the selected blocks.
}

Fig. 5. Function GetDiskCloud()

3.5 Function Checktable()
Function Checktable() (Fig. 4), using disk file D, removes signatures of old (i.e. visited) states from table M. Then, using the cleaned M, Checktable() removes old states from the unchecked queue Q_unck. Finally, Checktable() moves the states that are in the (now cleaned) unchecked queue Q_unck to the checked queue Q_ck.

3.6 Disk Cloud Creation
Function GetDiskCloud() (Fig. 5) is called by function Checktable() to create our disk cloud. Function GetDiskCloud() selects disk_cloud_size disk blocks according to the probability curve shown in Fig. 6. We number disk blocks starting from 0 (oldest block). Thus the lower the disk block index, the older (closer to the head of file D) the disk block. On the x axis of Fig. 6 we have the relative disk block index ρ, i.e. ρ = (disk block index)/(largest disk block index). E.g. ρ = 0 is the (relative index of the) first (oldest) disk block inserted in disk D, whereas ρ = 1 is the last (newest) disk block inserted. On the y axis of Fig. 6 we have the probability of selecting a disk block with a given ρ.
[Figure 6 shows a piecewise-linear curve: x axis, relative disk block index (breakpoints a0, a1, a2, a3); y axis, selection probability (values b0, b1, b2, b3).]

Fig. 6. Probability curve for disk cloud block selection (used by GetDiskCloud())
The selection probability curve in Fig. 6 ensures that the most recently created blocks (ρ close to 1) are selected with a higher probability than old blocks thus exploiting transition locality [21]. Note that, defensively, the selection probability of old blocks (ρ close to 0) is b0 > 0. This is because we want to have some old blocks to remove occasional far back states (i.e. states belonging to an old BFS level far from the current one) reached by occasional non local transitions. Function GetDiskCloud() returns to Checktable() the indexes of the selected blocks. Since our min and max values for the relative disk block indexes are, respectively, 0 and 1, in Fig. 6 we have a0 = 0 and a3 = 1. The value of b3 is always 1/K, where K is a normalization constant chosen so that the sum over all disk blocks of the selection probabilities is 1. The pairs (a1 , b1 ), (a2 , b2 ) define our selection strategy. The values we used in our experiments are: a1 = 0.4, b1 = 0.4/K, a2 = 0.7, b2 = 0.6/K. Two strategies are possible to partition disk D in state signature blocks. We can have either a variable number of fixed size blocks or a fixed number of variable size blocks. Reading a block from disk D can be done with a sequential transfer, whereas moving disk heads from one block to another requires a disk seek operation. Since seeks take longer than sequential transfers we decided to limit the number of seeks. This led us to use a fixed number of variable size blocks. Let N be the number of disk blocks we want to use and let S be the number of state signatures in file D. Then each block (possibly with the exception of the last one that will be smaller) has S/N state signatures. As a matter of fact, to avoid having too small blocks, we also impose a minimum value B for the number of state signatures in a block. Thus we may have less than N blocks if S is too small.
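The partitioning rule and the tail-biased selection just described can be sketched as follows. This is our own Python illustration, not the authors' code; the interior breakpoints (0.4, 0.4) and (0.7, 0.6) are the values the paper reports, while b0 = 0.1 is an assumed placeholder, since the paper only requires b0 > 0.

```python
import random

def block_layout(S, N=100, B=10_000):
    """Fixed number N of variable-size blocks with at least B signatures each.
    Returns (number of blocks, signatures per full block) for S signatures."""
    size = max(B, S // N)        # signatures per block
    blocks = -(-S // size)       # ceiling division; last block may be smaller
    return blocks, size

def block_weight(rho, b0=0.1):
    """Unnormalized selection weight for relative block index rho in [0, 1].
    Piecewise linear through (0, b0), (0.4, 0.4), (0.7, 0.6), (1.0, 1.0)."""
    pts = [(0.0, b0), (0.4, 0.4), (0.7, 0.6), (1.0, 1.0)]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if rho <= x1:
            return y0 + (y1 - y0) * (rho - x0) / (x1 - x0)
    return 1.0

def get_disk_cloud(num_blocks, cloud_size, rng=None):
    """Pick cloud_size distinct block indexes; newer blocks are more likely."""
    rng = rng or random.Random(0)
    weights = [block_weight(i / (num_blocks - 1)) for i in range(num_blocks)]
    chosen = set()
    while len(chosen) < min(cloud_size, num_blocks):
        chosen.add(rng.choices(range(num_blocks), weights=weights, k=1)[0])
    return sorted(chosen)
```

With the paper's values N = 100 and B = 10^4, block_layout(10**6) yields 100 blocks of 10,000 signatures, matching the remark that at least 10^6 reachable states are needed to get 100 blocks.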
Calibrate(deleted_in_cloud, deleted_not_in_cloud) {
  deleted_states = deleted_in_cloud + deleted_not_in_cloud;
  beta = deleted_not_in_cloud / deleted_states;
  if (beta is close to 1)
  /* low disk cloud effectiveness: increase disk access rate */
  {
    /* increase disk_cloud_size by a given percentage */
    disk_cloud_size = (1 + speedup)*disk_cloud_size;
  }
  else if (beta is close to 0)
  /* high disk cloud effectiveness: decrease disk access rate */
  {
    /* decrease disk_cloud_size by a given percentage */
    disk_cloud_size = (1 - slowdown)*disk_cloud_size;
  }
}

Fig. 7. Function Calibrate()
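The adjustment rule of Fig. 7 is easy to exercise numerically. Below is our own Python transliteration; the thresholds modeling "beta close to 0/1" (here 0.1 and 0.9) are illustrative assumptions, since the paper does not fix them, while speedup = slowdown = 0.15 are the values the paper reports.

```python
def calibrate(cloud_size, deleted_in_cloud, deleted_not_in_cloud,
              speedup=0.15, slowdown=0.15, lo=0.1, hi=0.9):
    """Return the adjusted disk cloud size (number of blocks).
    lo/hi model 'beta close to 0/1' from Fig. 7 (assumed thresholds)."""
    deleted = deleted_in_cloud + deleted_not_in_cloud
    beta = deleted_not_in_cloud / deleted
    if beta >= hi:    # cloud missed most old states: read more blocks
        return (1 + speedup) * cloud_size
    if beta <= lo:    # cloud caught almost all old states: read fewer blocks
        return (1 - slowdown) * cloud_size
    return cloud_size
```

E.g. with a 60-block cloud, 95 of 100 deletions coming from blocks outside the cloud pushes the size up to 69 blocks, while 95 deletions inside the cloud shrinks it to 51.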
In our experiments here we used N = 100 and B = 10^4. Thus, e.g., to have 100 disk blocks we need at least 10^6 reachable states.

3.7 Disk Cloud Size Calibration
Function Calibrate() (Fig. 7) is called by function Checktable() every time a calibration is needed for the disk cloud size. Two parameters are passed to function Calibrate(), namely: the number of state signatures deleted from M by Checktable() using only disk blocks that are in the disk cloud (deleted_in_cloud in Fig. 7) and the number of state signatures deleted from M using only disk blocks that are not in the disk cloud (deleted_not_in_cloud in Fig. 7). When a calibration is required, the whole file D is read (see Checktable() in Fig. 4) and function Calibrate() computes the ratio (beta in Fig. 7) between the number of deleted states not in the disk cloud and the total number of deleted states (deleted_states in Fig. 7). A value of beta close to 1 (low disk cloud effectiveness) means that the disk cloud has not been very effective in removing old states from table M. In this case, the variable disk_cloud_size (holding the disk cloud size) is increased by (speedup*disk_cloud_size). A value of beta close to 0 (high disk cloud effectiveness) means that the disk cloud has been very effective in removing old states from table M. In this case, we decrease the value of disk_cloud_size by (slowdown*disk_cloud_size) in order to lower the disk access rate. In our experiments here we used speedup = 0.15 and slowdown = 0.15.

3.8 Calibration Frequency
Function QueryCalibration() called by function Checktable() (Fig. 4) tells us whether a calibration has to be performed or not. The rationale behind function QueryCalibration() is the following. Calling function Calibrate() too often nullifies our efforts for reducing disk usage. In fact a calibration of the disk cloud size requires reading the whole file
D. However, calling function Calibrate() too sporadically may have the same effect. In fact, waiting too long between calibrations may lead to using an oversized disk cloud or an undersized one. An oversized disk cloud increases disk usage beyond needs. An undersized disk cloud also increases disk usage, since many old states will not be removed from M and we will revisit many already visited states. In our current implementation, function QueryCalibration() enables a calibration every 10 calls of function Checktable() (Fig. 4). Our experimental results suggest that this is a reasonable calibration frequency.
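The interplay of Search(), Insert(), and Checktable() can be simulated in miniature. The toy model below is our own sketch, not the authors' implementation: it keeps a bounded "RAM" table M, an append-only "disk" list D, and cleans the unchecked queue against a tail-biased sample of D only, so a state may occasionally be revisited, exactly the trade-off LDBFS accepts. On an acyclic graph the search still terminates and visits every reachable state.

```python
import random
from collections import deque

def ldbfs(init, next_states, mem_limit=3, old_frac=0.5, rng=None):
    """Toy locality-based disk BFS; returns the list of visited states."""
    rng = rng or random.Random(0)
    M = set()                    # recent signatures (bounded 'RAM' table)
    D = []                       # all signatures ever written, in BFS order
    Q_unck, Q_ck = deque(), deque()
    visited = []

    def checktable():
        # 'disk cloud': the newest half of D plus a random sample of the rest
        cut = len(D) // 2
        cloud = set(D[cut:]) | {h for h in D[:cut] if rng.random() < old_frac}
        for h in cloud:
            M.discard(h)         # h is certainly old: drop it from M
        while Q_unck:            # survivors in M are treated as new
            s = Q_unck.popleft()
            if s in M:
                D.append(s)
                Q_ck.append(s)
        M.clear()

    def insert(s):
        if s not in M:
            M.add(s)
            Q_unck.append(s)
            if len(M) > mem_limit:
                checktable()

    for s in init:
        insert(s)
    while True:
        while Q_ck:
            s = Q_ck.popleft()
            visited.append(s)
            for t in next_states(s):
                insert(t)
        checktable()
        if not Q_ck:
            return visited

# 3x3 grid DAG: edges go right and down, so the toy search must terminate.
grid = {(i, j): ([(i + 1, j)] if i < 2 else []) + ([(i, j + 1)] if j < 2 else [])
        for i in range(3) for j in range(3)}
visited = ldbfs([(0, 0)], lambda s: grid[s])
```

Because the cleanup only ever removes signatures that are already on "disk", a genuinely new state is never filtered out, so every reachable state is visited at least once; states missed by the sampled cloud may simply appear in the visited list more than once.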
4 Experimental Results
We implemented the LDBFS algorithm of Sect. 3 within the Murϕ verifier. In the following we call DMurϕ the version of the Murϕ verifier we obtained. In this section we report the experimental results we obtained using DMurϕ. Our experiments have two goals. First, we want to know whether by using locality there is indeed some gain w.r.t. the algorithm proposed in [16]. Second, we want to measure DMurϕ's time overhead w.r.t. standard Murϕ performing a RAM-BFS. To meet our goals we proceed as follows. First, for each protocol in our benchmark we determine the minimum amount of memory needed to complete verification using the Murϕ verifier (namely Murϕ version 3.1 from [11]). Then we compare Murϕ's performance with that of DMurϕ and with that of the disk based algorithm proposed in [16]. Our benchmark consists of some of the protocols in the Murϕ distribution [11] and the kerb protocol from [19].

4.1 Results with Murϕ
The Murϕ verifier takes as input the amount of memory M to be used during the verification run as well as the fraction g (in [0, 1]) of M used for the queue (i.e. g is gPercentActive in Murϕ parlance). We say that the pair (M, g) is suitable for protocol p iff the verification (with Murϕ) of p can be completed with memory M and queue gM. For each protocol p we determine the least M s.t. for some g, (M, g) is suitable for p. In the sequel we denote by M(p) such an M. Of course M(p) depends on the compression options used. Murϕ offers bit compression (-b) and hash compaction (-c). Our approach (like the one in [16]) is compatible with all Murϕ compression options. However, a disk based approach is really interesting only when, even using all compression options, one runs out of RAM. For this reason we only present results about experiments in which all compression options (i.e. -b -c) are enabled. Fig. 8 gives some useful information about the protocols we considered in our experiments. The meaning of the columns in Fig. 8 is explained in Fig. 9.
Protocol | Parameters | Bytes | Diam | Reach | Rules | Max Q | M | g | T
ns | 1,1,3,2,10 | 96 | 12 | 2,455,257 | 8,477,970 | 1,388,415 | 145,564,125 | 0.57 | 1,211.02
n peterson | 9 | 20 | 241 | 2,871,372 | 25,842,348 | 46,657 | 15,290,000 | 0.02 | 764.27
newlist6 | 7 | 32 | 91 | 3,619,556 | 21,612,905 | 140,382 | 22,590,004 | 0.04 | 1,641.67
ldash | 1,4,1,false | 144 | 72 | 8,939,558 | 112,808,653 | 509,751 | 118,101,934 | 0.06 | 12,352.93
sci | 3,1,1,2,1 | 60 | 94 | 9,299,127 | 30,037,227 | 347,299 | 67,333,575 | 0.04 | 2,852.03
mcslock1 | 6 | 16 | 111 | 12,783,541 | 76,701,246 | 392,757 | 70,201,817 | 0.03 | 3,279.45
sci | 3,1,1,5,1 | 64 | 95 | 75,081,011 | 254,261,319 | 2,927,550 | 562,768,255 | 0.04 | 35,904.86
sci | 3,1,1,7,1 | 68 | 143 | 126,784,943 | 447,583,731 | 4,720,612 | 954,926,331 | 0.04 | 99,904.47
kerb | NumIntruders=2 | 148 | 15 | 7,614,392 | 9,859,187 | 4,730,277 | 738,152,956 | 0.62 | 2,830.83
newlist6 | 8 | 40 | 110 | 81,271,421 | 563,937,480 | 2,875,471 | 521,375,945 | 0.03 | 31,114.87
Fig. 8. Results on an Intel Pentium III 866MHz with 512MB RAM. Murϕ options used: -b (bit compression), -c (40 bit hash compaction), -ndl (no deadlock detection).

Attribute | Meaning
Protocol | Name of the protocol.
Parameters | Values of the parameters we used for the protocol. We show our parameter values in the same order in which such parameters appear in the Const section of the protocol file included in the Murϕ distribution [11]. When such a list is too long, as for the kerb protocol, we just list the assignments we modified in the Const section w.r.t. the distribution.
Bytes | Number of bytes needed to represent a state in the queue when bit compression is used. For protocol p we denote such number by StateBytes(p). Note that since we are using bit compression as well as hash compaction (-b -c), 5 bytes are used to represent (the signature of) a state in the hash table.
Reach | Number of reachable states for the protocol. For protocol p, we denote such number by |Reach(p)|.
Rules | Number of rules fired during state space exploration. For protocol p, we denote such number by RulesFired(p).
Max Q | Maximum queue size (i.e. number of states) attained during state space exploration. For protocol p we denote such number by MaxQ(p).
Diam | Diameter of the transition graph.
M | Minimum amount of memory (in bytes) needed to complete state space exploration, i.e. M(p). Let bh be the number of bytes taken by a state in the hash table (for us bh = 5 since we are using hash compaction). From the Murϕ source code [11] we can compute M(p). We have: M(p) = |Reach(p)| (bh + (MaxQ(p)/|Reach(p)|) StateBytes(p)).
g | Fraction of memory M used for the queue. From the Murϕ source code [11] we can compute g. We have: g = MaxQ(p)/|Reach(p)|.
T | CPU time (in seconds) to complete state space exploration when using memory M and queue gM. For protocol p, we denote such number by T(p).

Fig. 9. Meaning of the columns in Fig. 8.
From column M of Fig. 8 we see that there are protocols requiring more than 512M bytes of RAM to complete. Thus we could not use standard Murϕ on our 512M PC. However, we were able to complete verification of such protocols using Cached Murϕ (CMurϕ) [3]. Giving CMurϕ enough RAM we get a very low collision rate, and from [21] we know that in this case the CPU time taken by CMurϕ is essentially the same as that taken by standard Murϕ with enough RAM to complete the verification task. For this reason, in the following we regard the results in Fig. 8 as if they were all obtained using standard Murϕ with enough (i.e. M(p)) RAM to complete the verification task.
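The M and g columns of Fig. 8 can be reproduced from the formulas of Fig. 9. As a sanity check (our own script, using the ns row of Fig. 8):

```python
def min_memory(reach, max_q, state_bytes, bh=5):
    """M(p) = |Reach(p)| * (bh + (MaxQ(p)/|Reach(p)|) * StateBytes(p)),
    which simplifies to reach*bh + max_q*state_bytes; bh = 5 bytes per
    hash-compacted state signature (option -c)."""
    return reach * bh + max_q * state_bytes

def queue_fraction(reach, max_q):
    """g = MaxQ(p) / |Reach(p)|, the fraction of M devoted to the BFS queue."""
    return max_q / reach

# ns row of Fig. 8: Reach = 2,455,257, Max Q = 1,388,415, Bytes = 96
M_ns = min_memory(2_455_257, 1_388_415, 96)   # 145,564,125 bytes
g_ns = queue_fraction(2_455_257, 1_388_415)   # about 0.57
```

The computed values match the M and g entries of the ns row exactly.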
4.2 Results with DMurϕ
Our next step is to run each protocol p in Fig. 8 with less and less (RAM) memory using our DMurϕ. Namely, we run protocol p with memory limits M(p), 0.5M(p) and 0.1M(p). This approach allows us to easily compare the experimental results obtained from different protocols. The results we obtained are in Fig. 10. Columns Protocol and Parameters have the meaning given in Fig. 9. Column α (with α = 1, 0.5, 0.1) gives information about the run of protocol p with memory αM(p). Row States gives the ratio between the states visited (by DMurϕ) when using memory αM(p) and |Reach(p)| (in Fig. 8). This is the state overhead due to revisiting already visited states. This may happen since in function Checktable() (Fig. 4) we do not use the whole disk file D to remove old states from table M. Row Rules gives the ratio between the rules fired (by DMurϕ) when using memory αM(p) and RulesFired(p) (in Fig. 8). This is the rule overhead due to revisiting already visited states. Row Time gives the ratio between the time T_{DMurϕ,α}(p) (in seconds) to complete state space exploration (with DMurϕ) when using memory αM(p) and T(p) in Fig. 8. This is our time overhead w.r.t. a RAM-BFS. Note that T_{DMurϕ,α}(p) is the time elapsed between the start and the end of the state space exploration process. That is, T_{DMurϕ,α}(p) is not just the CPU time; it also includes the time spent on disk accesses. Note that for the big protocols in Fig. 8 (i.e. those requiring more than 512M of RAM) we could not run the experiments with α = 1 on our machine with 512M of RAM. Of course, the most interesting column for us is the one with α = 0.1. The experimental results in Fig. 10 show that even when α = 0.1 our disk based approach is only between 1.4 and 5.3 (3 on average) times slower than a RAM-BFS with enough RAM to complete the verification task.
Protocol (Parameters) | Row | α = 1 | α = 0.5 | α = 0.1
n peterson (9) | States | 1.178 | 1.124 | 1.199
n peterson (9) | Rules | 1.178 | 1.124 | 1.199
n peterson (9) | Time | 2.148 | 2.056 | 2.783
ns (1,1,3,2,10) | States | 1.348 | 1.405 | 1.373
ns (1,1,3,2,10) | Rules | 1.487 | 2.011 | 1.645
ns (1,1,3,2,10) | Time | 1.734 | 2.144 | 1.953
newlist6 (7) | States | 1.366 | 1.335 | 1.384
newlist6 (7) | Rules | 1.365 | 1.334 | 1.382
newlist6 (7) | Time | 1.703 | 1.765 | 2.791
ldash (1,4,1,false) | States | 1.566 | 1.668 | 1.702
ldash (1,4,1,false) | Rules | 1.528 | 1.626 | 1.658
ldash (1,4,1,false) | Time | 2.037 | 2.226 | 3.770
sci (3,1,1,2,1) | States | 1.260 | 1.189 | 1.183
sci (3,1,1,2,1) | Rules | 1.279 | 1.206 | 1.200
sci (3,1,1,2,1) | Time | 1.811 | 1.798 | 2.888
mcslock1 (6) | States | 1.346 | 1.550 | 1.703
mcslock1 (6) | Rules | 1.346 | 1.550 | 1.703
mcslock1 (6) | Time | 1.915 | 2.477 | 5.259
sci (3,1,1,5,1) | States | — | 1.169 | 1.143
sci (3,1,1,5,1) | Rules | — | 1.195 | 1.167
sci (3,1,1,5,1) | Time | — | 1.828 | 2.553
sci (3,1,1,7,1) | States | — | 1.130 | 1.097
sci (3,1,1,7,1) | Rules | — | 1.152 | 1.115
sci (3,1,1,7,1) | Time | — | 1.421 | 1.743
kerb (NumIntruders=2) | States | — | 1.282 | 1.279
kerb (NumIntruders=2) | Rules | — | 1.060 | 1.080
kerb (NumIntruders=2) | Time | — | 1.234 | 1.438
newlist6 (8) | States | — | 1.416 | 1.406
newlist6 (8) | Rules | — | 1.412 | 1.405
newlist6 (8) | Time | — | 2.612 | 4.436
Min | Time | 1.703 | 1.234 | 1.438
Avg | Time | 1.891 | 1.954 | 2.961
Max | Time | 2.148 | 2.612 | 5.259

Fig. 10. Comparing DMurϕ with RAM Murϕ [11] (compression options: -b -c)
4.3 Results with Disk Based Murϕ
To measure the speedup we obtain by exploiting locality, we are also interested in comparing our locality based disk algorithm DMurϕ with the disk based Murϕ presented in [16]. The algorithm in [16] is not available in the standard Murϕ distribution [11]. However, if we omit the calibration step (Fig. 7) in function Checktable() (Fig. 4) and always use all disk blocks to clean up the unchecked queue Q_unck and table M (Fig. 1), we obtain exactly the algorithm in [16] (quite obviously, since [16] was our starting point). Thus, in the sequel, for the algorithm in [16] we use the implementation obtained as described above.

For the algorithm in [16] (implemented as above) we wanted to repeat the same set of experiments we ran for DMurϕ. However, the big protocols of Fig. 8 took too long, so we did not include them in this set of experiments. Our results are in Fig. 11. Rows and columns in Fig. 11 have the same meaning as those in Fig. 10, but those of Fig. 11 refer to the algorithm in [16] (while those of Fig. 10 refer to DMurϕ). Computations taking much longer than the time in Fig. 8 were aborted; in such cases we get a lower bound on the time overhead w.r.t. standard Murϕ, indicated with a > sign before the bound. For aborted computations the rows States and Rules are, of course, less than 1, and give an idea of the fraction of the state space explored before the computation was terminated.

Protocol    Parameters   Mem:    1        0.5      0.1
n_peterson  9            States  1.000    1.000    0.527
                         Rules   1.000    1.000    0.507
                         Time    2.623    2.430    >90.704
ns          1,1,3,2,10   States  1.000    1.000    0.747
                         Rules   1.000    1.000    0.309
                         Time    1.259    242.131  >77.895
newlist6    7            States  1.000    1.000    0.253
                         Rules   1.000    1.000    0.203
                         Time    1.331    1.357    >42.817
ldash       1,4,1,false  States  0.355    —        —
                         Rules   0.245    —        —
                         Time    >50.660  —        —
sci         3,1,1,2,1    States  1.000    0.361    —
                         Rules   1.000    0.647    —
                         Time    1.616    >11.863  —
mcslock1    6            States  1.000    1.000    0.137
                         Rules   1.000    1.000    0.115
                         Time    1.821    1.691    >11.605

Fig. 11. Comparing Disk Murϕ in [16] with RAM Murϕ [11] (compression options: -b -c)

Fig. 12 compares the performance of our DMurϕ with that of the disk based Murϕ in [16]. The meaning of rows and columns of Fig. 12 is as follows. Columns Protocol and Parameters and column α (with α = 1, 0.5, 0.1) have the meaning given in Fig. 9. Row Time gives the ratio (or a lower bound on the ratio) between the verification time when using the disk based Murϕ in [16] and the verification time when using DMurϕ. Of course, the interesting cases for us are those for which α = 0.1 (i.e., there is not enough RAM to complete verification using a RAM-BFS). For such cases,
Protocol    Parameters   Time (Mem = 1)  Time (Mem = 0.5)  Time (Mem = 0.1)
n_peterson  9            1.221           1.182             >32
ns          1,1,3,2,10   0.726           112.934           >39
newlist6    7            0.781           0.768             >15
ldash       1,4,1,false  >24             >24               >24
sci         3,1,1,2,1    0.892           >6                >6
mcslock1    6            0.950           0.683             >2
Min                      0.726           0.683             >2
Avg                      >4.762          >24.261           >19.667
Max                      >24             112.934           >39
Fig. 12. Comparing DMurϕ with disk based Murϕ in [16].
from the results in Fig. 12 we see that our algorithm is typically more than 10 times faster than the one presented in [16]. Note, however, that the results in Fig. 12 should be regarded as qualitative rather than quantitative. In fact, as described above, we obtained the algorithm in [16] by eliminating the calibration step from our algorithm, and it is quite conceivable that when calibration need not be performed one can devise optimizations that are not possible when it must. Still, the message of Figs. 10, 11, and 12 is quite clear: because of transition locality, most of the time we do not need to read the whole disk file D. This saves disk accesses and thus verification time.

Protocol  Parameters  Bytes  Reach        Rules          MaxQ        Diam  T        Mem  HMem       QMem     TotMem
mcslock2  N=4         16     945,950,806  3,783,803,224  30,091,568  153   406,275  300  4,729,754  481,465  5,211,219
Fig. 13. Results for DMurϕ on a 1GHz Pentium IV PC with 512M of RAM. Murϕ options used: -ndl (no deadlock detection), -b (bit compression), -c (40 bit hash compaction).
4.4 A Large Protocol
We also wanted to test our disk based approach on a protocol out of reach for both standard Murϕ [4,11] and Cached Murϕ [21,3] on our 512M machine. We found that the protocol mcslock2 (with N = 4) in the Murϕ distribution suits our needs. Our results are in Fig. 13. The meaning of the columns of Fig. 13 is as follows. Columns Protocol, Parameters, Bytes, Reach, Rules, MaxQ, Diam, and T have the same meaning as in Fig. 8, but they refer to DMurϕ (while those of Fig. 8 refer to standard Murϕ). Column Mem gives the total RAM (in megabytes) given to DMurϕ to carry out the given verification task.
G.D. Penna et al.
Column HMem gives the hash table size (in kilobytes) that would be needed if we were to store all reachable states in a RAM hash table. Column QMem gives the RAM size (in kilobytes) needed for the BFS queue if we were to keep the whole BFS queue in RAM. Column TotMem gives the RAM size (in kilobytes) needed to complete the verification task using a RAM-BFS with standard Murϕ; TotMem is equal to HMem + QMem.
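As a quick arithmetic check of these column definitions against the Fig. 13 values (a sketch, not part of the tool):

```python
# Sanity check of the memory accounting in Fig. 13 (values in kilobytes):
# TotMem is the RAM a standard RAM-BFS would need, i.e. hash table plus queue.

hmem_kb = 4_729_754   # hash table for all 945,950,806 reachable states
qmem_kb = 481_465     # BFS queue at its peak (MaxQ = 30,091,568 states)
totmem_kb = hmem_kb + qmem_kb

print(totmem_kb)                  # 5211219, as in column TotMem
print(totmem_kb / (1024 * 1024))  # roughly 5 GB, versus the 300 MB given to DMurphi
```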
5 Conclusions
We presented a disk based Breadth First Explicit State Space Exploration algorithm as well as an implementation of it within the Murϕ verifier. Our algorithm has been obtained from the one in [16] by exploiting transition locality [21] to decrease disk usage (namely, disk read accesses). Our experimental results show the following. Our algorithm is typically more than 10 times faster than the disk based algorithm proposed in [16]. Moreover, even when using 1/10 of the RAM needed to complete verification, our algorithm is only between 1.4 and 5.3 times (3 times on average) slower than RAM-BFS (namely, standard Murϕ) with enough RAM to complete the verification task at hand. Statistical properties of transition graphs (such as transition locality) have proven quite effective in improving state space exploration algorithms ([21,22]) on a single processor machine. Looking for new statistical properties, and for ways to exploit such properties when performing verification on distributed processors, are natural further developments of our research work. Acknowledgements. We are grateful to Igor Melatti and to the FMCAD referees for helpful comments and suggestions on a preliminary version of this paper.
References
[1] R. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Computers, C-35(8), Aug 1986.
[2] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. Information and Computation, (98), 1992.
[3] url: http://univaq.it/~tronci/cached.murphi.html.
[4] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol verification as a hardware design aid. In IEEE International Conference on Computer Design: VLSI in Computers and Processors, pages 522–525, 1992.
[5] F. Lerda and R. Sisto. Distributed-memory model checking with SPIN. In Proc. of 5th International SPIN Workshop, volume 1680 of LNCS. Springer, 2000.
[6] G. J. Holzmann. The SPIN model checker. IEEE Trans. on Software Engineering, 23(5):279–295, May 1997.
[7] G. J. Holzmann. An analysis of bitstate hashing. Formal Methods in System Design, 1998.
[8] A. J. Hu, G. York, and D. L. Dill. New techniques for efficient verification with implicitly conjoined BDDs. In 31st IEEE Design Automation Conference, pages 276–282, 1994.
[9] C. N. Ip and D. L. Dill. Better verification through symmetry. In 11th International Conference on Computer Hardware Description Languages and their Applications, pages 97–111, 1993.
[10] C. N. Ip and D. L. Dill. Efficient verification of symmetric concurrent systems. In IEEE International Conference on Computer Design: VLSI in Computers and Processors, pages 230–234, 1993.
[11] url: http://sprout.stanford.edu/dill/murphi.html.
[12] R. K. Ranjan, J. V. Sanghavi, R. K. Brayton, and A. Sangiovanni-Vincentelli. Binary decision diagrams on network of workstations. In IEEE International Conference on Computer Design, pages 358–364, 1996.
[13] J. V. Sanghavi, R. K. Ranjan, R. K. Brayton, and A. Sangiovanni-Vincentelli. High performance BDD package by exploiting memory hierarchy. In 33rd IEEE Design Automation Conference, 1996.
[14] url: http://netlib.bell-labs.com/netlib/spin/whatispin.html.
[15] U. Stern and D. L. Dill. Parallelizing the Murϕ verifier. In Proc. 9th Int. Conference on Computer Aided Verification, volume 1254 of LNCS, pages 256–267, Haifa, Israel, 1997. Springer.
[16] U. Stern and D. L. Dill. Using magnetic disk instead of main memory in the Murϕ verifier. In Proc. 10th Int. Conference on Computer Aided Verification, volume 1427 of LNCS, pages 172–183, Vancouver, BC, Canada, 1998. Springer.
[17] U. Stern and D. L. Dill. Improved probabilistic verification by hash compaction. In IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods (CHARME), pages 206–224, 1995.
[18] U. Stern and D. L. Dill. A new scheme for memory-efficient probabilistic verification. In IFIP TC6/WG6.1 Joint International Conference on Formal Description Techniques for Distributed Systems and Communication Protocols, and Protocol Specification, Testing, and Verification, 1996.
[19] url: http://verify.stanford.edu/uli/research.html.
[20] T. Stornetta and F. Brewer. Implementation of an efficient parallel BDD package. In 33rd IEEE Design Automation Conference, pages 641–644, 1996.
[21] E. Tronci, G. Della Penna, B. Intrigila, and M. Venturini Zilli. Exploiting transition locality in automatic verification. In IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods (CHARME), LNCS. Springer, Sept 2001.
[22] E. Tronci, G. Della Penna, B. Intrigila, and M. Venturini Zilli. A probabilistic approach to space-time trading in automatic verification of concurrent systems. In Proc. of 8th IEEE Asia-Pacific Software Engineering Conference (APSEC), Macau SAR, China, Dec 2001. IEEE Computer Society Press.
[23] P. Wolper and D. Leroy. Reliable hashing without collision detection. In Proc. 5th Int. Conference on Computer Aided Verification, pages 59–70, Elounda, Greece, 1993.
Traversal Techniques for Concurrent Systems

Marc Solé and Enric Pastor
Department of Computer Architecture, Technical University of Catalonia, 08860 Castelldefels (Barcelona), Spain
{msole, enric}@ac.upc.es
Abstract. Symbolic model checking based on Binary Decision Diagrams (BDDs) is a verification technique that has received increasing attention from the research community. The conventional breadth-first approach to state generation is often responsible for inefficiencies due to the growth of the BDD sizes. This is especially true for concurrent systems, for which existing research (mostly oriented to synchronous designs) is ineffective. In this paper we show that it is possible to improve BFS symbolic traversal for concurrent systems by scheduling the application of the transition relations. The scheduling scheme is devised by analyzing the causality relations between the events that occur in the system. We apply the scheduled symbolic traverse to invariant checking. We present a number of scheduling schemes and analyze their implementation and effectiveness in a prototype verification tool.
1 Introduction
A lot of effort has been devoted by the verification community to developing efficient traversal methods [1,2]. Unfortunately most of them are designed to improve the traversal process of synchronous systems and are not suitable or relevant for concurrent systems (concurrent systems may include asynchronous circuits [3,4], distributed systems [5,6], etc.). In synchronous systems, transition relations (TRs) are usually partitioned, and the sequence of application of each part must be decided in order to reduce the BDD sizes of intermediate results. The application order in this case is important because the way the variables are quantified depends on it, affecting the size of the intermediate representation. This is usually referred to as the quantification schedule problem. Algorithms developed to solve the quantification schedule problem have no practical application for concurrent systems. In this latter case we usually have a disjunctive collection of small TRs, each one describing the behavior of some component. Each individual TR is applied assuming interleaved semantics and the result is immediately added to the reachable set of states, so the order in which these TRs are fired has a strong influence on the overall performance.
This work has been partially funded by the Ministry of Science and Technology of Spain under contract TIC 2001-2476-C03-02 and grant AP2001-2819.
M.D. Aagaard and J.W. O'Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 220–237, 2002.
© Springer-Verlag Berlin Heidelberg 2002
Some authors have studied the influence of ordering the application of the TRs to avoid the BDD explosion problem. Their goal is to schedule the exploration of the state space by taking only selected portions of the TR, or by delaying the exploration of certain states. In [7] Ravi and Somenzi proposed a "high density" traverse, which does not use the set of newly reached states as the from set for the next iteration. Instead it uses a subset of the newly reached states that has a more compact representation. This is a partial traverse, so it must be completed afterwards. In [8] Cabodi et al. use "activity profiles" for each BDD node in the TRs and prune the BDDs to perform a partial traversal, again completed at the end. The "activity profiles" are obtained in a preliminary reachability learning phase. In [9] Hett et al. propose a sequence of partial traverses that combine subsets of the newly reached states and dynamic TR pruning. Both manipulations are applied using the Hamming distance as the main part of the heuristic function. In [10] Ravi and Cabodi allow the user to provide hints to guide symbolic search. User-defined hints are used to simplify the TR, but require the user to understand the design and also to predict the BDD behavior. Our objective is to minimize the CPU time of the traversal process. Usually the problems appear in the intermediate steps, as big BDDs start to be generated. In these cases, the faster the remaining states can be discovered, the better the performance, due to BDD recombination. The speed of new-state generation is highly related to the number of TR applications needed to finish the process. Hence an algorithm for determining a good TR application order is crucial. This paper proposes a method that intends to complete symbolic traversal with the minimum number of TR applications. The number of intermediate steps is reduced, thus reducing the probability of generating an intermediate BDD that is too big to cope with.
We present four symbolic traverse algorithms that schedule the application of the TRs. The TR application schemes are named: token traverse (TOK), weighted token traverse (WTOK), dynamic event-clustered traverse (DEC), and TR cluster-closure traverse (TRCC). TOK and WTOK require a static analysis of the system to build the TR application schema. The analysis is basically an a priori causality analysis between TRs (see Section 3). Once we have derived a TR application schema we use it to decide the order in which the TRs will be applied. The schema does not imply a static TR application order, because it uses feedback from the traversal to adapt the order dynamically. TOK and WTOK differ in the kind of feedback they receive from the traversal analysis. DEC tries to be more accurate in its TR application schema, so it is completely adaptive and has no initial precomputation phase. DEC keeps constantly updated information on the number of states on which each TR may be applied for the first time. Hence we can decide at each point which TR has the highest probability of generating new states at the fastest rate. Finally, TRCC is an adaptation of partial iterative squaring to the scope of concurrent systems. We combine some TRs to (1) reduce the number of TRs while keeping their size small, thus reducing the number of intermediate results,
and (2), thanks to squaring, reduce the number of iterations needed by the schema to complete the analysis. The paper is organized as follows. Section 2 is devoted to basic models for formal verification of concurrent systems. Section 3 reviews some of the known peculiarities of symbolic traverse for concurrent systems and their impact on performance. Sections 4, 5, 6 and 7 are the core of this paper, as they explain the four traversal proposals: TOK, WTOK, DEC, and TRCC, respectively. Section 8 presents some preliminary results on the performance of the different methods on some benchmarks. Finally, Section 9 concludes the paper.
2 Background
A finite transition system (TS) [11] is a 4-tuple A = ⟨S, Σ, T, S_i⟩, where S is a finite set of states, Σ is a non-empty alphabet of events, T is a transition relation such that T ⊆ S × Σ × S, and S_i ⊆ S are the initial states. Transitions are denoted by s –e→ s′. An event e is enabled at state s if ∃ s –e→ s′ ∈ T. Given an event e, its firing region Fr(e) is defined as Fr : Σ → 2^S such that Fr(e) = {s ∈ S | ∃ s –e→ s′ ∈ T}. Event e is said to be firable at state s if s ∈ Fr(e). The concurrent execution of events is described by means of interleaving; that is, weaving the execution of events into sequences. Given the significance of individual events, the transition relation of a TS can be naturally partitioned into a disjoint set of relations, one for each event e_i ∈ Σ: T_ei = {s –ei→ s′ ∈ T}. To represent events symbolically we use a set of Boolean variables that encode the states in the TS and a Boolean relation to encode the TR. The application of a TR T_e to some set of states R results in a set of states R′ that contains all the states reachable from R through a transition of event e. Although a TS is a powerful formalism, it is not usually used directly to specify concurrent systems. Instead, other high-level formalisms like Petri nets [12] or circuit structural descriptions are used, which are later translated into transition systems for analysis. A Petri net (PN) is a 4-tuple N = ⟨P, T, W, M0⟩, where P = {p1, p2, ..., pn} and T = {t1, t2, ..., tm} are finite sets of places and transitions satisfying P ∩ T = ∅ and P ∪ T ≠ ∅; W : (P × T) ∪ (T × P) → Z defines the weighted flow relation, and M0 is the initial marking. The function M : P → N is called a marking; that is, an assignment of a nonnegative integer to each place. If k is assigned to place p, we say that p is marked with k tokens. If W(u, v) > 0 then there is an arc from u to v with weight (or multiplicity) W(u, v).
PNs are graphically represented by drawing places as circles, transitions as boxes or bars, the flow relation as directed arcs, and tokens as dots inscribed in the places (see the example in Fig. 5).
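To make the preceding definitions concrete, here is a small explicit-state sketch in Python. It is illustrative only: the paper encodes states and TRs symbolically with BDDs, and the state and event names below are invented.

```python
# A small explicit-state rendering of the TS definitions above.
# TS A = (S, Sigma, T, Si): transitions stored as (s, e, s') triples.
T = {('s0', 'a', 's1'), ('s0', 'b', 's2'), ('s1', 'b', 's3'), ('s2', 'a', 's3')}

def Fr(e, T):
    """Firing region of e: the states with an outgoing e-transition."""
    return {s for (s, ev, _) in T if ev == e}

def apply_tr(e, R, T):
    """Image of state set R under the partitioned relation T_e."""
    return {s2 for (s1, ev, s2) in T if ev == e and s1 in R}

print(sorted(Fr('a', T)))                # ['s0', 's2']
print(sorted(apply_tr('a', {'s0'}, T)))  # ['s1']
```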
3 Causality and Chaining Traversal
To speed up the generation of new states we combine two kinds of techniques: causality analysis and chaining. In traditional breadth first search (BFS) the TR
Fig. 1. Example of the exploration process of a 13-state concurrent system using (a) BFS, (b) BFS with chaining in lexicographical order of TR application, and (c) BFS with chaining in inverse lexicographical order.
is applied to the same from set to generate a new to set. Using chaining, after each TR application the from set is updated with the states recently generated. Thus, a domino effect is produced and more states are discovered in only one TR application. Figures 2 and 3 show the difference between BFS traversal and BFS traversal with chaining. When chaining is used, the order in which TRs are applied plays a crucial role. As an example, Fig. 1 shows a TS in which the behavior of symbolic traverse depends to a great extent on the selected TR application order. Each one of the subfigures in Fig. 1 shows the performance of a different approach on the same system. Subfigure (a) corresponds to a traditional BFS traversal. The progress of the reachability set is indicated by means of labeled arcs of type "iteration n", indicating that all the states over that arc were discovered in BFS step n. Subfigures (b) and (c) also show a BFS approach, but using chaining. The difference between these two subfigures is the order in which the chaining of events is applied. In (b) we used lexicographical order (so in each step we applied the events as follows: {a, b, c, d, e, f, g}), and in (c) we used inverse lexicographical order ({g, f, e, d, c, b, a}). In this case the length of the traverse process was: (b) 1 iteration, (c) 3 iterations. In (b) we show the detailed behavior of chaining and we draw the reachability set after each event is fired. As we can see, the whole system is traversed in only one step, while in (c) three steps are needed, although chaining is also used. The state generation rate of this technique may be limited if a TR application order is established that does not pay attention to the causality between TRs. The causality between pairs of TRs can be approximated by the following heuristic, which tries to numerically indicate the a priori causal relationship between events. Let T_ei and T_ej be the TRs of two events ei and ej.
We define XTo_{ei→ej}(V) as

XTo_{ei→ej}(V) = ∃_{v∈V} [T_{ei}(V, V′) ∩ Fr(ei)(V) ∩ ¬Fr(ej)(V)]_{V←V′}
Fig. 2. State generation using BFS traversal.
Fig. 3. State generation using chained traversal.
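The contrast between Figs. 2 and 3 can be sketched in a few lines of Python. This is an explicit-state toy (the paper's TRs are BDDs); the transition relations are hand-written functions over a hypothetical four-state chain:

```python
# Plain BFS applies every TR to the same `from` set; chaining folds each
# TR's image back into `from` immediately, producing the domino effect.

def bfs_step(from_set, trs):
    to = set()
    for tr in trs:
        to |= tr(from_set)          # all TRs see the same from set
    return from_set | to

def chained_step(from_set, trs):
    for tr in trs:
        from_set = from_set | tr(from_set)  # image folded back at once
    return from_set

# Toy chain 0 -a-> 1 -b-> 2 -c-> 3, one TR per event.
tr_a = lambda S: {1} if 0 in S else set()
tr_b = lambda S: {2} if 1 in S else set()
tr_c = lambda S: {3} if 2 in S else set()
trs = [tr_a, tr_b, tr_c]

print(bfs_step({0}, trs))            # {0, 1}: one level per step
print(chained_step({0}, trs))        # {0, 1, 2, 3}: whole chain in one step
print(chained_step({0}, trs[::-1]))  # {0, 1}: wrong order loses the effect
```

The last line mirrors Fig. 1(c): with the causal order reversed, chaining degenerates to the speed of plain BFS.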
(see Fig. 4). From now on we will avoid the overhead of explicitly stating the present-state set of Boolean variables V and the next-state set V′ in the formulas. Therefore the previous formula will be rewritten as

XTo_{ei→ej} = ∃_{v∈V} [T_{ei} ∩ Fr(ei) ∩ ¬Fr(ej)]_{V←V′}

The XTo_{ei→ej} operator simply gives us the set of states reached after the firing of event ei from the states in which event ej was not fireable. The heuristic causality(ei → ej) is defined as

causality(ei → ej) = |XTo_{ei→ej} ∩ Fr(ej)| / |Fr(ej)|
and indicates the proportion between the set found with the XTo operator and Fr(ej). Graphically (see Fig. 4(c)), it is the proportion of the dashed area with respect to the whole Fr(ej) set.
Fig. 4. The XTo_{A→B} operator: (a) shows the To operator, (b) depicts XTo_{A→B}, and (c) shows their relationship.
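As a sketch, the causality heuristic can be evaluated on an explicit-state encoding (the paper computes it symbolically with BDDs; the two-event system below is invented):

```python
# Explicit-state sketch of the causality heuristic. XTo(ei -> ej) collects
# the states reached by firing ei from states where ej was NOT fireable;
# the Fr(ei) restriction is implicit because only ei-transitions are used.

def fr(e, T):
    return {s for (s, ev, _) in T if ev == e}

def xto(ei, ej, T):
    frj = fr(ej, T)
    return {s2 for (s1, ev, s2) in T if ev == ei and s1 not in frj}

def causality(ei, ej, T):
    frj = fr(ej, T)
    return len(xto(ei, ej, T) & frj) / len(frj) if frj else 0.0

# b is only enabled after a fires, so causality(a -> b) is maximal.
T = {(0, 'a', 1), (1, 'b', 2)}
print(causality('a', 'b', T))  # 1.0
print(causality('b', 'a', T))  # 0.0
```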
Intuitively, big values of causality(ei → ej) show that the activation of TR T_{ei} will tend to produce states in which the application of TR T_{ej} is possible. It must be noted that it is possible to define the symmetric heuristic of causality(ei → ej), denoted negative_causality(ei → ej), by defining the operator CTo_{ei→ej} as

CTo_{ei→ej} = ∃_{v∈V} [T_{ei} ∩ Fr(ei) ∩ Fr(ej)]_{V←V′}

This function returns the set of states reached after firing event ei from the states in which event ej was fireable. negative_causality(ei → ej) is defined as:

negative_causality(ei → ej) = |CTo_{ei→ej} ∩ ¬Fr(ej)| / |CTo_{ei→ej}|
Definition 1. Two TRs A and B are said to be independent iff each one of the transitions of A falls into one of the following categories: 1. it goes from a state where TR B is fireable to a state where TR B is still fireable, or 2. it goes from a state where TR B is not fireable to a state where TR B is still not fireable; and the same must hold if TR B is applied with respect to the fireability of TR A.

Theorem 1. Two TRs A and B are independent ⇔ causality(A → B) = 0 and negative_causality(A → B) = 0 and causality(B → A) = 0 and negative_causality(B → A) = 0.

Proof. If A and B are independent, the application of one of them to some set S of states cannot change the enableness/disableness of the other. Suppose in set S we
have states from which B can be fired (the set S_B) and states from which it cannot (the set S_¬B). If we apply A to S_B or S_¬B there may be states that do not change; these immediately satisfy the property above. The states that have changed must satisfy the following: if they were states of S_B, the application of A must produce only states in which B is fireable (negative_causality(A → B) is then 0); if they were part of S_¬B, then the states generated cannot be in Fr(B) (so causality(A → B) must be 0). The same holds if we exchange A and B. It must be noted that this concept of independence may be viewed as a strong or structural independence, as it can happen that two dependent TRs behave, in fact, as independent given some particular initial states.

Definition 2. The set of variables which constitute a formula ϕ is called the support of ϕ, written Sup(ϕ). To specify the formula for a TR we use two sets of variables, one to represent the present state and another to represent the next state.

Definition 3. Let V be the set of variables used to represent the present state, and V′ the set of variables used to represent the next state. We define the function related(v′) as a bijective function between V′ and V: given a variable v′ in V′, related(v′) returns the corresponding variable v in V. We extend the function related to sets of variables; i.e., related(V′_a) returns the set of variables related to V′_a. Formally, related(V′_a) = {related(v′) | v′ ∈ V′_a}. For instance, assuming V = {p1, p2, p3} and V′ = {q1, q2, q3}, then related(q3) = p3 and related({q1, q3}) = {p1, p3}.

Definition 4. An event ei is said to have independent causal support from event ej iff related(Sup(T_{ej})) ∩ Sup(T_{ei}) = ∅.

Theorem 2. Events that have mutual independent causal support one from each other are independent.

Proof. If related(Sup(T_{ej})) ∩ Sup(T_{ei}) = ∅, then event ej is not able to write on the variables on which the enableness of ei depends.
Then, any state obtained from the activation of ej will preserve the enableness of ei. Thus, ei is independent from ej. The same can be stated interchanging ei and ej, so ei and ej are independent events. Theorem 2 can be used to simplify the computation of the causality matrix (see Section 4), as this independence check only involves variable-set manipulation, which is usually very fast. Only for those events that do not satisfy this check do we need to compute the causality heuristic to determine their final causality value.
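The Theorem 2 shortcut can be sketched as follows. The supports are plain variable-name sets here, and the pN/qN naming of present/next variables is an assumption borrowed from the paper's running example; a BDD package would extract supports automatically:

```python
# Support-based independence check: if the next-state support of T_ej,
# mapped back through related(), is disjoint from the support of T_ei,
# then firing ej cannot affect the enabledness of ei.

def related(next_vars):
    """Map next-state variables (q1, q2, ...) to present-state ones (p1, ...)."""
    return {'p' + v[1:] for v in next_vars}

def independent_causal_support(sup_tei, sup_tej_next):
    return not (related(sup_tej_next) & sup_tei)

# TR A depends on p1; TR B writes only q2: B cannot enable or disable A.
print(independent_causal_support({'p1'}, {'q2'}))  # True
print(independent_causal_support({'p1'}, {'q1'}))  # False
```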
4 Token Traversal
Given a concurrent system, it is possible to compute causality(ei → ej ) for each pair of events (TRs), resulting in a causality matrix. This matrix can be analyzed in such a way that we produce a PN model of the event firing. This
Fig. 5. Event PN inferred from the causality matrix: (a) the causality matrix (rows A, B, C): A → (0, 0.3, 0.2), B → (0, 0, 0.4), C → (0.1, 0, 0); (b) the same matrix with every nonzero entry set to 1: A → (0, 1, 1), B → (0, 0, 1), C → (1, 0, 0); (c) the net obtained by connecting the places of the related events.
transformation is done as follows: suppose that each event is a place; then, for every position of the matrix different from 0, we establish a relation between the places of the two events related to that matrix position. For instance, imagine the causality matrix for a three-event system shown in Fig. 5(a). All matrix positions that have a value greater than 0 are changed to 1, otherwise their value remains 0 (see Fig. 5(b)). The corresponding PN is depicted in Fig. 5(c). Although we use the same graphical representation as a PN, it does not behave as a normal PN as defined in Section 2. Instead, the traverse scheme always fires the place with the most tokens in it. Obviously we must define some initial tokens. In order to do this we put a token in all places corresponding to events that are initially fireable; more precisely, for each event e ∈ Σ such that New ∩ Fr(e) ≠ ∅. Initially, the set of new states is equal to the initial set of states (New = From). A brief outline of the algorithm is given below:

1. select the place with the highest number of tokens
2. fire the event associated to this place
3. if the event has generated new states
4. then put one token on all successors
5. else absorb tokens
Figure 6 shows an example of the algorithm execution on the system represented by Fig. 5. We assume initial tokens on events A and B. When there is more than one place with the maximum number of tokens, one of them is chosen randomly. In our case event A was selected, although event B was also a possible choice. Let us assume that event A generates new states (states not already visited); then one token is placed on its successors, that is, places B and C. Next, event B is selected (the only possible choice this time) and is fired, successfully generating new states. As a result two tokens are placed on event C (the initial one plus the token from B), which is our next choice, corresponding to the last net shown in the figure. Now consider what happens if event C is not successfully fired. All tokens on the net are absorbed, so no event can be selected afterwards. In this case, the algorithm starts up again, first by recalculating the
Fig. 6. Token firing scheme (TOK).
New set as the set of new states generated since the last setup, and then placing tokens on the events fireable given this new set. Proceeding so, the number of steps can be considered as the number of setups, and inside one step all firings use chaining to take advantage of the causality relation. The algorithm for TOK is shown next. The external loop is repeated until traversal is finished (no new states generated in the last step). The inner loop represents one step: we select events until all tokens are absorbed.

1. repeat
2.   oldFrom = from
3.   initial_tokens( net, new )
4.   stop = FALSE
5.   while (¬stop)
6.     event = select_event_max_token( net )
7.     to = fire_event( event, from )
8.     if (to ⊆ reached)
9.       absorb_token( net, event )
10.    else
11.      propagate_token( net, event )
12.    from = from ∪ to
13.    reached = reached ∪ to
14.    if ( no_more_tokens( net ) )
15.      stop = TRUE
16.  new = reached \ oldFrom
17.  from = new
18. until (new = ∅)
We provide a brief definition of all functions called in this pseudo-code:
– initial_tokens scans all events and adds a token to the corresponding place if the event is fireable in some state contained in New, i.e., New ∩ Fr(e) ≠ ∅.
– select_event_max_token selects the event that has the most tokens in its corresponding place.
– absorb_token simply removes the tokens from the place assigned to the event passed as argument.
– propagate_token removes the tokens from the place assigned to the event passed as argument and adds one token to each of the successors of that event.
– no_more_tokens returns true if there is no token left in the net.
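An executable sketch of the TOK scheme on explicit state sets follows (the paper operates on BDDs, and the toy system is invented). One deliberate simplification relative to the pseudo-code above: the sketch snapshots `reached` at the start of each step and derives `new` from that snapshot, which guarantees termination of its outer loop:

```python
def fire(tr, from_set):
    """Image of from_set under one event's transition map."""
    return {tr[s] for s in from_set if s in tr}

def tok_traverse(events, succ, init):
    """events: {name: {state: next_state}}; succ: causal successor lists."""
    reached = set(init)
    from_set = set(init)
    new = set(init)
    while new:
        old_reached = set(reached)
        # setup: one token on each event fireable in some new state
        tokens = {e: int(bool(new & set(tr))) for e, tr in events.items()}
        while any(tokens.values()):
            e = max(tokens, key=tokens.get)    # place with most tokens
            to = fire(events[e], from_set)
            tokens[e] = 0                      # absorb the event's tokens
            if not (to <= reached):            # new states: propagate token
                for nxt in succ.get(e, []):
                    tokens[nxt] += 1
            from_set |= to                     # chaining
            reached |= to
        new = reached - old_reached
        from_set = new
    return reached

# Toy chain 0 -A-> 1 -B-> 2 -C-> 3 with causal matrix A -> B -> C.
events = {'A': {0: 1}, 'B': {1: 2}, 'C': {2: 3}}
succ = {'A': ['B'], 'B': ['C']}
print(sorted(tok_traverse(events, succ, {0})))  # [0, 1, 2, 3]
```

On this chain a single setup suffices: the token flows A → B → C and the whole state space is discovered in one step, as in Fig. 1(b).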
5 Weighted Token Traversal
It is possible to expand the preceding idea and consider that, when an event is successfully fired, we do not add only one token to its successors. Instead we can add a number of tokens related to the number of new states generated in which each particular successor is fireable. This solves one of the problems of our previous proposal: ineffective activation. The problem arises when a token is placed on one event because its predecessor generated new states, but there are no real new states in which this particular event can be fired; as a result its activation is superfluous. We illustrate this problem with an example. Suppose we have a TS with three variables V = {p1, p2, p3} and three events A, B and C. To specify the TRs we also use an extra set of variables V′ = {q1, q2, q3} on the next state. The TRs for the events are given below:

– TR A: ¬p1 · q1
– TR B: ¬p2 · q2
– TR C: p1 · p2 · ¬q1 · ¬q2 · q3

The initial state s0 is p1 = 0, p2 = 0, p3 = 0, which we write as 000. This system has the reachable set of states S that we depict in Fig. 7. The causality matrix of this system (once all values greater than 0 are converted to 1) is:

    A B C
A   0 0 1
B   0 0 1
C   1 1 0

which translates into the net of Fig. 7(b). Fig. 8 depicts the execution of the TOK scheme (see Section 4) on this example. We start at state 000, where events A and B can be activated. This is shown in the first net of Fig. 8 by the two tokens placed on places A and B. The algorithm may select A to fire. A token is placed on C, since in the causality matrix A is related to C. However, the activation of A has only produced state 100, from which C cannot fire, although a token has been placed on its place. Now the algorithm may select C to fire (3rd net in the figure): a superfluous activation, because no new state can be produced.
M. Solé and E. Pastor

Fig. 7. (a) State space for the ineffective activation example; and (b) firing net for the ineffective activation example.

Fig. 8. Ineffective activation example for the system of Figure 7(b) using token traversal.
In order to tighten the relationship between the number of tokens and the real number of new states in which an event can be fired, we redefine the number of tokens as a lower bound on the number of states in which the event may be fired. Later on we justify why it is only a lower bound. The setup is done as in TOK, except that the number of tokens placed on every event is given by |New ∩ Fr(e)|. When an event is selected and fired, we compute the set of new states (inNew = to \ reached), and we add to each successor of the event the quantity |inNew ∩ Fr(e)|. We stated that the number of tokens in a place is a lower bound on the number of fireable states for the event related to that place. We illustrate this with an example. In Fig. 7 the causality matrix has zeros for the relationships between A and B, as they are independent events. However, it can be seen in Fig. 7 that if our starting point is state 000 and we fire event A, we obtain state 100; that is, a state in which B is also fireable. No new token has been placed on B because there is a zero in the causality matrix for those two events, although now B can be fired in two different states. These "untracked" states are always states in which an event ei was already fireable and the activation of an independent event ej added new fireable states for ei. Although they are not counted by the number of tokens, the algorithm indirectly keeps track of them because initial tokens are placed on all possible fireable events. In this example, although no additional token is added to the place of event B, there is already a token there and eventually B will be fired. Next we present the main WTOK traversal schema, which resembles the TOK algorithm:

1.  repeat
2.    oldFrom = from
3.    initial_tokens( net, new )
4.    stop = FALSE
5.    while (¬stop)
6.      event = select_event_max_token( net )
7.      to = fire_event( event, from )
8.      inNew = to \ reached
9.      distribute_tokens( net, event, inNew )
10.     from = from ∪ to
11.     reached = reached ∪ to
12.     if ( no_more_tokens( net ) )
13.       stop = TRUE
14.   new = reached \ oldFrom
15.   from = new
16. until (new = ∅)
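The schema above admits a compact sketch, again with explicit state sets standing in for BDDs (all helper names are illustrative). One liberty is taken for clarity: the set of reached states is snapshotted at the top of each outer iteration, so the loop terminates as soon as an iteration discovers no new states.

```python
# Sketch of the WTOK schema with explicit state sets standing in for
# BDDs. fire_event(e, S) returns the states obtained by firing e from S;
# the helper signatures are illustrative, not the paper's implementation.

def wtok_traverse(events, fireable, successors, fire_event, initial):
    reached = set(initial)
    from_set = set(initial)
    new = set(initial)
    while new:
        old_reached = set(reached)
        # initial_tokens: weight each place by |New ∩ Fr(e)|
        tokens = {e: len(new & fireable[e]) for e in events}
        while any(tokens.values()):
            event = max(tokens, key=tokens.get)   # select_event_max_token
            tokens[event] = 0                     # consume its tokens
            to = fire_event(event, from_set)
            in_new = to - reached
            for s in successors[event]:           # distribute_tokens, weighted
                tokens[s] += len(in_new & fireable[s])
            from_set |= to
            reached |= to
        new = reached - old_reached
        from_set = new
    return reached
```

On the three-event example of Fig. 7, the weighted distribution places no token on C after A fires from 000 (since |inNew ∩ Fr(C)| = 0), avoiding the ineffective activation.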
Using this schema, the sequence of firings for the TS in Fig. 7 is shown in Fig. 9. Note that, with respect to Fig. 8, the ineffective activation problem has been eliminated. Compared with TOK, WTOK allows a greater level of accuracy, but is computationally slightly more expensive, because a BDD AND operation is performed for every possible successor.
Fig. 9. Solution to the ineffective activation provided by the weighed token traverse.
6   Dynamic Event-Clustered Traversal
We have seen that WTOK does not guarantee an exact equivalence between the number of tokens and the new states to fire from. The main problem is the untracked states produced by firings of independent events. This is a side effect of using only causality to determine the successor events of an event, as we stated in the previous section. Using causality was motivated by the wish to produce sequences of firings favorable to chaining. However, if we fire not only causally related events but also independent events, then the use of chaining is inadvisable. A generalized use of chaining usually implies larger execution times, as all events are fired in each iteration. To avoid the ineffective application of the TRs, we propose to keep track of all states in which each particular event is enabled (DEC). Hence, we store a From set for each event in the system (denoted From(e)). This set holds all states, up to the current point of the reachability analysis, from which the event has not been fired yet; that is, all new states for the event. Firing an event from the set of states assigned to it implicitly uses chaining. The firing scheme is as follows. Given a set of new states, they are distributed over the events in the TS. Those states in which a certain event is enabled are associated with it and accumulated with the states that were previously assigned. The set is updated as: From(e) = From(e) ∪ (New ∩ Fr(e)). The number of tokens "assigned" to each event is computed as the cardinality of the set From(e). The event with the greatest number of fireable states is selected, the event is fired, its From(e) set emptied, and the new states generated are distributed again. The scheme ends when all events have an empty From set. The main algorithm is given below:

1.  stop = FALSE
2.  while (¬stop)
3.    event = select_event_max_from( event_list )
4.    if (event = NULL)
5.      stop = TRUE
6.    else
7.      to = fire_event( event, event→from )
8.      event→from = ∅
9.      new = to \ reached
10.     reached = reached ∪ to
11.     distribute_tokens( new, event_list )
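The DEC scheme above admits a short sketch in the same style (explicit state sets in place of BDDs; names illustrative): the per-event From(e) sets replace the token counts entirely.

```python
# Sketch of the DEC scheme: a per-event From(e) set replaces the token
# counts. Explicit state sets stand in for BDDs; names are illustrative.

def dec_traverse(events, fireable, fire_event, initial):
    reached = set(initial)
    # From(e): states assigned to e from which it has not been fired yet.
    from_of = {e: set(initial) & fireable[e] for e in events}
    while True:
        # select_event_max_from: the event with the most pending states
        event = max(events, key=lambda e: len(from_of[e]))
        if not from_of[event]:
            break                      # every From(e) is empty: done
        to = fire_event(event, from_of[event])
        from_of[event] = set()
        new = to - reached
        reached |= to
        # distribute_tokens: hand each new state to the events it enables
        for e in events:
            from_of[e] |= new & fireable[e]
    return reached
```

Because each event fires exactly from its From(e) set, every TR application here is effective by construction, at the cost of one intersection per event when distributing new states.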
The price to pay for the exact knowledge this scheme provides is an increased computational complexity. For every event activation, the state distribution process implies n BDD operations, n being the number of events, compared to WTOK, in which only k BDD operations are performed, k being the number of successors of the fired event. Another drawback is the BDD blowup problem: the From sets tend to grow due to poor BDD recombination. To mitigate this problem the From sets are minimized using the reachability set.
Fig. 10. Closure of an ORed event (closure of event "a OR b" over states s0–s5).
7   Transition Relation Cluster-Closure Traversal
One of the main bottlenecks of symbolic verification is the size of the TR as a result of its monolithic structure. After partitioned TRs were introduced, the bottleneck moved to the representation of the intermediate sets of reachable states. In concurrent systems partitioning the TR is even more natural due to their inherent structure. However, the additional number of intermediate sets and BDD operations increases the probability of a BDD blowup. We propose a firing scheme that reduces the number of TR applications by clustering subsets of events (TRCC). A monolithic closured TR is created for each cluster. Events are added to clusters incrementally. Without loss of generality, two events are clustered together by ORing their TRs. ORing produces a single TR whose activation has the same effect as activating both TRs independently. Note that the size of a TR grows as its set of support variables grows. Hence, clustering stops when a certain BDD size is reached. As a result, we perform fewer TR applications, though each is normally more expensive. In concurrent systems it is common to have concurrency diamonds due to independent events. In order to generate such a diamond in only one firing we also concatenate TRs. This process is a particular case of iterative squaring. Iterative squaring is a powerful technique because, when used with a monolithic TR, it may exponentially reduce the number of steps required to complete the reachability analysis. Unfortunately, it is often computationally too expensive. However, when transitive closure is used with smaller TRs it may be effective and computationally affordable. If we take a two-event TR and compute its closure, we obtain a TR that can compute at least the full concurrency diamond in one step. In fact more states can be discovered, depending on whether the events can be iteratively activated or not (see Fig. 10). In practice we add events to the event clusters iteratively.
First we OR the TR of the new event into the cluster and then compute the transitive closure of the resulting TR (usually obtaining smaller BDD sizes). Our approach does not assume any hierarchical structure in the system. To avoid an uncontrolled BDD growth we cluster the events that share as many variables as possible. In the results presented in Section 8, each event was clustered with another event that had the most variables in common. Doing so, the number of events can be reduced at most to half of the original number.
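The clustering step can be illustrated on explicit relations, with sets of (state, next-state) pairs standing in for the TR BDDs; the events and state names below are invented for the example. Two independent events are ORed by set union and the result is closed by iterative squaring, after which a single application of the clustered TR covers the full concurrency diamond.

```python
# Sketch of TR clustering with closure: two event TRs, represented as
# sets of (state, next_state) pairs standing in for BDDs, are ORed by
# set union, then closed by iterative squaring. Names are illustrative.

def compose(r1, r2):
    # Relational product: one step of r1 followed by one step of r2.
    by_src = {}
    for s, t in r2:
        by_src.setdefault(s, []).append(t)
    return {(s, u) for s, t in r1 for u in by_src.get(t, [])}

def closure(tr):
    # Transitive closure by iterative squaring: each round doubles the
    # path length covered, so it converges in O(log diameter) rounds.
    c = set(tr)
    while True:
        nxt = c | compose(c, c)
        if nxt == c:
            return c
        c = nxt

# Cluster two independent events a and b into one closed TR:
tr_a = {("s0", "s1"), ("s2", "s3")}   # a: fires from s0 and from s2
tr_b = {("s0", "s2"), ("s1", "s3")}   # b: fires from s0 and from s1
cluster = closure(tr_a | tr_b)        # OR, then close
assert ("s0", "s3") in cluster        # full concurrency diamond in one step
```

With BDDs the composition is the usual relational product with variable renaming; the set-based version only shows the shape of the computation.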
8   Experimental Results
All results are from executions on a Pentium III at 833 MHz. In the following tables several concurrent systems are analyzed using the schemes described in this article. Due to space constraints we use abbreviations in the tables. The correspondence between abbreviations and methods is:

– Seq: BFS traversal.
– GChain: greedy chaining traversal.
– TOK: token traversal (see Section 4).
– WTOK: weighed token traversal (see Section 5).
– DEC: dynamic event-clustered traversal (see Section 6).
– TRCC: transition relation cluster-closure traversal (see Section 7).

For TRCC, some examples include an additional entry: the default method is TRCC, while an entry written as TRCC* indicates that the execution used manual clustering. As it is not always easy to define good partitioning schemes, we only report such results when this was possible. The greedy chaining traversal is equal to the BFS algorithm (i.e., same firing order), with the only exception that it makes use of the chaining technique (see Section 3). Column (Events) shows the total number of event firings used to traverse the system. Note that some results are not directly comparable; e.g., TRCC reduces the number of events in the system. Column (Peak) shows the peak number of BDD nodes reached during traversal (in thousands). Finally, the last column specifies in seconds the wall-clock time needed to finish the analysis, or timeout if the algorithm failed to finish within an hour (3600 s). When the total time was eventually obtained, it is specified in brackets; when the time at which the algorithm was stopped exceeded an hour, this is indicated with a ">" sign. We analyzed different types of systems. Their characteristics are described in Table 1. Basic size information is given: number of Boolean variables, number of events, and number of reachable states. The second column shows the original formalism of the system (before generating the equivalent TS): C for circuits and P for Petri nets.
We give a brief description of the most relevant systems:

– RGD-arbiter: the ASP* RGD arbiter presented in [13], at transistor level.
– STARI (16): a self-timed pipeline.
– Slotted ring (n): slotted ring protocol for LANs (n is the number of nodes).
– dme / DME (n): various DME implementations/specifications.
– Muller (n): Muller's C-element pipeline of n elements.

In all examples the TR application count is largely reduced (the original goal of this work). We can also see that the way in which TRs are applied also provides benefits in terms of CPU execution times and BDD sizes. The RGD-arbiter, the slotted ring, the DME specification and the Muller pipeline are examples in which almost any traversal scheme will provide improvements. On the contrary, the STARI pipeline does not respond to any of the schemes, except TRCC when a set of clusters was manually provided. The motivation for this behavior is the structure of the pipeline itself: it is a deep structure with lots of concurrency at each level. Clustering the events at each step reduced the depth of the traversal and yielded a BDD reduction due to the complete diamond generation in one step. More experiments are necessary in order to correlate the efficiency of each scheme with the topology of the system under analysis.

Table 1. Concurrent systems under test.

Name               Type  Variables  Events  Size
RGD-arbiter        C     63         47      5.49046e+13
STARI (16)         C     100        100     4.21776e+22
Slotted ring (10)  P     100        100     8.49079e+12
Slotted ring (15)  P     150        150     4.79344e+19
Slotted ring (20)  P     200        200     2.86471e+26
dme (3)            C     295        492     6579
dme (5)            C     491        820     859996
DME (8)            P     134        128     311296
DME (9)            P     152        144     3.2768e+06
parallelizer (16)  P     130        100     2.82111e+12
Muller (30)        P     120        60      6.009e+07
Muller (40)        P     160        80      4.64139e+10
Muller (50)        P     200        100     3.61071e+13
Muller (60)        P     240        120     8.38369e+15
buf (100)          P     200        101     1.26765e+30
sdl arq deadlock   P     154        92      3954
— RGD-arbiter —
Method  Steps  Events  Peak   Time (s)
Seq     >38    >1786   >1755  [>14400]
GChain  24     1175    1755   1476
TOK     8      1430    20     30
WTOK    10     1280    20     26
DEC     N/A    1334    50     82
TRCC*   10     55      63     45
TRCC    17     468     1335   1501

— STARI (16) —
Method  Steps  Events  Peak     Time (s)
Seq     >329   >33000  –        [>8800]
GChain  127    12800   440      2435
TOK     >34    N/A     [>1590]  [>10800]
WTOK    67     10555   698      [8890]
DEC     N/A    8135    572      [7997]
TRCC*   48     833     106      138
TRCC    110    5550    852      [4318]

— slotted ring (10) —
Method  Steps  Events  Peak  Time (s)
Seq     189    19000   445   4195
GChain  17     1800    68    65
TOK     1      1486    17    16
WTOK    1      1296    20    20
DEC     N/A    2500    307   802
TRCC    12     780     32    15

— slotted ring (15) —
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  24     3750    391   781
TOK     1      3206    71    248
WTOK    1      2690    220   414
DEC     N/A    5621    893   [7196]
TRCC    18     1710    77    87

— slotted ring (20) —
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  32     6600    1562  [5296]
TOK     1      5463    191   966
WTOK    1      4474    118   545
DEC     –      –       –     timeout
TRCC    22     2760    311   531

— parallelizer (16) —
Method  Steps  Events  Peak  Time (s)
Seq     99     10000   70    189
GChain  5      600     20    26
TOK     1      314     20    18
WTOK    1      342     20    30
DEC     N/A    194     20    39
TRCC    3      272     20    18
— dme (3) —
Method  Steps  Events  Peak  Time (s)
Seq     114    56580   150   289
GChain  46     23124   70    105
TOK     1      2938    87    305
WTOK    1      3235    78    156
DEC     N/A    544     45    103
TRCC    46     11562   77    145

— dme (5) —
Method  Steps  Events  Peak   Time (s)
Seq     >83    >68060  >1297  [>9900]
GChain  86     71340   977    [4166]
TOK     1      9453    1055   [9865]
WTOK    1      10328   857    [11989]
DEC     N/A    2708    373    2321
TRCC    86     35670   756    3089

— DME (8) —
Method  Steps  Events  Peak  Time (s)
Seq     40     5248    36    26
GChain  12     1664    20    9
TOK     1      545     26    20
WTOK    1      528     26    19
DEC     N/A    250     20    7
TRCC    12     936     20    11

— DME (9) —
Method  Steps  Events  Peak  Time (s)
Seq     51     7488    51    82
GChain  15     2304    20    18
TOK     1      690     20    22
WTOK    1      697     25    34
DEC     N/A    392     20    10
TRCC    15     1216    20    30

— Muller (30) —
Method  Steps  Events  Peak  Time (s)
Seq     140    8460    258   1386
GChain  23     1440    43    32
TOK     1      901     20    16
WTOK    1      774     20    16
DEC     N/A    666     113   98
TRCC    23     720     41    17

— Muller (40) —
Method  Steps  Events  Peak  Time (s)
Seq     248    19920   1026  [15361]
GChain  29     2400    103   151
TOK     1      1536    30    52
WTOK    1      1305    47    59
DEC     N/A    >211    –     timeout
TRCC    29     1200    103   79

— Muller (50) —
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  35     3600    219   456
TOK     1      2336    57    155
WTOK    1      1965    57    111
DEC     –      –       –     timeout
TRCC    35     1800    213   246

— Muller (60) —
Method  Steps  Events  Peak  Time (s)
Seq     –      –       –     timeout
GChain  43     5280    431   907
TOK     1      3185    80    244
WTOK    1      2763    155   320
DEC     –      –       –     timeout
TRCC    43     2640    429   582

— buf (100) —
Method  Steps  Events  Peak  Time (s)
Seq     >352   >35552  –     [>8100]
GChain  100    10201   13    7
TOK     1      10202   51    690
WTOK    1      6200    21    155
DEC     N/A    7864    –     [13407]
TRCC    100    5151    334   595

— sdl arq deadlock —
Method  Steps  Events  Peak  Time (s)
Seq     120    11132   42    35
GChain  40     3772    20    7
TOK     1      1354    15    3
WTOK    1      1242    15    3
DEC     N/A    448     20    8
TRCC    35     1800    22    22

9   Conclusions
This paper proposes four different schemes to speed up reachability analysis of concurrent systems. Their main contribution is to establish different heuristic orderings for the application of the TRs that can substantially reduce the time required to generate the full state space. Although firing order has been studied in state reduction techniques (e.g., partial order methods [14]), to our knowledge this is the first time the issue is addressed for generating all the reachable states of concurrent systems. Experimental evidence has been given that the proposed methods are usually faster than a classical BFS approach, or even a BFS with chaining. For all benchmarks, the use of the simple greedy chaining (BFS) scheme has proved very useful. However, it is important to note that at least one of the proposed schemes always performed better than the latter. It remains an open problem to decide a priori which method is more suitable for a given system. If this cannot be decided in a reasonable amount of time, there is always the possibility of trying all the schemes sequentially or in parallel.
Traversal Techniques for Concurrent Systems
References

1. R. E. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Computers, vol. C-35, pp. 677–691, Aug. 1986.
2. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang, "Symbolic model checking: 10^20 states and beyond," Information and Computation, vol. 98, no. 2, pp. 142–170, 1992.
3. O. Roig, J. Cortadella, and E. Pastor, "Verification of asynchronous circuits by BDD-based model checking of Petri nets," in 16th International Conference on Application and Theory of Petri Nets, pp. 374–391, June 1995.
4. J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev, "Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers," IEICE Transactions on Information and Systems, vol. E80-D, no. 3, pp. 315–325, March 1997.
5. A. S. Miner and G. Ciardo, "Efficient reachability set generation and storage using decision diagrams," in ICATPN, pp. 6–25, 1999.
6. E. Pastor, J. Cortadella, and O. Roig, "Symbolic analysis of bounded Petri nets," IEEE Transactions on Computers, vol. 50, no. 5, pp. 432–448, May 2001.
7. K. Ravi and F. Somenzi, "High-density reachability analysis," in Proc. of the IEEE/ACM International Conference on Computer Aided Design, pp. 154–158, 1995.
8. G. Cabodi, P. Camurati, and S. Quer, "Improving symbolic traversals by means of activity profiles," in Design Automation Conference, pp. 306–311, 1999.
9. A. Hett, C. Scholl, and B. Becker, "State traversal guided by Hamming distance profiles."
10. K. Ravi and F. Somenzi, "Hints to accelerate symbolic traversal," in Conference on Correct Hardware Design and Verification Methods, pp. 250–264, 1999.
11. A. Arnold, Finite Transition Systems. Prentice Hall, 1994.
12. C. Petri, Kommunikation mit Automaten. PhD thesis, Schriften des Institutes für Instrumentelle Mathematik, Bonn, 1962.
13. M. R. Greenstreet and T. Ono-Tesfaye, "A fast, ASP*, RGD arbiter," in Proceedings of the Fifth International Symposium on Advanced Research in Asynchronous Circuits and Systems, (Barcelona, Spain), pp. 173–185, IEEE, Apr. 1999.
14. P. Godefroid, Partial-Order Methods for the Verification of Concurrent Systems: An Approach to the State-Explosion Problem, vol. 1032. New York, NY, USA: Springer-Verlag, 1996.
A Fixpoint Based Encoding for Bounded Model Checking

Alan Frisch (1), Daniel Sheridan (1), and Toby Walsh (2)

(1) University of York, York, UK. Email: {frisch,djs}@cs.york.ac.uk
(2) Cork Constraint Computation Centre, University College Cork, Cork, Ireland. Email: [email protected]

Abstract. The Bounded Model Checking approach to the LTL model checking problem, based on an encoding to Boolean satisfiability, has seen a growth in popularity due to recent improvements in SAT technology. The currently available encodings have certain shortcomings, particularly in the size of the clause forms that they generate. We address this by making use of the established correspondence between temporal logic expressions and the fixpoints of predicate transformers, as used in symbolic model checking. We demonstrate how an encoding based on fixpoints can result in improved performance in the SAT checker.
1   Introduction
Bounded Model Checking (BMC) [2] is an encoding to Boolean satisfiability (SAT) of the LTL model checking problem. The encoding is achieved by placing a bound on the number of time steps of the model that are to be checked against the specification. The resulting Boolean formula contains variables representing the state variables of the model at each step along a path, together with constraints requiring the path to be contained within the model and to violate the specification. The result of the SAT checker is thus a path in the model which is a counterexample to the specification, or failure, which means that no such path exists within the bound. The encoding of the LTL specification in BMC is defined recursively on the structure of the formula. While for simple specifications this is sufficient, more complex specifications such as bounded existence and response patterns [7] lead to an exponential blowup in the size of the resulting Boolean formula. Recent improvements to the encoding in NuSMV [4] have not removed this restriction. The fixpoint characterisations of temporal operators [8] have been exploited in other model checking systems such as SMV [14]; we discuss an approach to their use in an encoding of LTL for BMC which produces more compact encodings that can be solved more quickly by the SAT solver.
M.D. Aagaard and J.W. O'Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 238–255, 2002.
© Springer-Verlag Berlin Heidelberg 2002
2   Bounded Model Checking

2.1   Background
A model checking problem is a pair ⟨M, f⟩ of a model and a temporal logic specification. A model M is defined as a Kripke structure ⟨S, R, L, I⟩ where S is a set of states; R ⊆ S × S is the transition relation; L : S → P(AP) is the labelling function, marking each state with the set of atomic propositions (AP) that hold in that state; and I is the set of initial states, which may be equal to S. A path π ∈ M is a sequence of states s0, s1, ... ∈ M such that ∀i. (si, si+1) ∈ R. We write π(i) to refer to the ith state along the path. The model checking problem for LTL is to verify that, for an LTL formula f, for all paths πi ∈ M such that πi(0) ∈ I, (M, πi) |= f.

2.2   Path Loops
We say that a path π is a k-loop if for all i ≥ 0, the (k + i)th state in π is identical to the (l + i)th state for some l, 0 ≤ l < k. If a path is known to be a loop, it is possible to verify the correctness of infinite-time specifications such as always (G) by checking just the first k states in the path.

2.3   Boolean Satisfiability
Boolean satisfiability (SAT) is the problem of assigning Boolean values to the variables in a propositional formula in such a way as to make the formula evaluate to true (to satisfy the formula). For example, the formula (a ∨ ¬b) ∧ (b ∨ ¬c) ∧ (¬c ∨ ¬a) can be satisfied by, e.g., the assignment a = 1, b = 1, c = 0. SAT solvers derived from the Davis-Putnam algorithm [5] require input in clause form (CNF): a conjunction of clauses, each of which is a disjunction of literals. A number of high-performance SAT solvers are available, making SAT a convenient 'black box' back end for a number of different problems.

2.4   The Bounded Model Checking Encoding
The bounded model checking encoding represents k states along a bounded path πbmc together with a conjunction of constraints requiring πbmc to be a valid path in M and be a counterexample of f . The ‘valid path’ constraint is a propositional encoding of the transition relation. We can see from the bounded semantics of LTL (Figure 1) that there are two ways of violating each operator in the specification, depending on whether πbmc is a k-loop; the ‘counterexample’ constraint is therefore a disjunction of the ways in which the specification may be violated. We write the bounded model checking encoding of a problem with bound k, model M and specification f as [[M, ¬f ]]k
(M, π) |=_k^i a         ⇔  a ∈ L(π(i))    for atomic a
(M, π) |=_k^i ¬f1       ⇔  (M, π) ⊭_k^i f1
(M, π) |=_k^i f1 ∧ f2   ⇔  (M, π) |=_k^i f1 and (M, π) |=_k^i f2
(M, π) |=_k^i f1 ∨ f2   ⇔  (M, π) |=_k^i f1 or (M, π) |=_k^i f2

(M, π) |=_k^i X f1      ⇔  (M, π) |=_k^{i+1} f1                      if π is a k-loop
                           (M, π) |=_k^{i+1} f1 ∧ i < k              otherwise

(M, π) |=_k^i F f1      ⇔  ∃j, i ≤ j. (M, π) |=_k^j f1               if π is a k-loop
                           ∃j, i ≤ j ≤ k. (M, π) |=_k^j f1           otherwise

(M, π) |=_k^i G f1      ⇔  ∀j, i ≤ j. (M, π) |=_k^j f1               if π is a k-loop
                           ⊥                                          otherwise

(M, π) |=_k^i [f1 U f2] ⇔  ∃j, i ≤ j. (M, π) |=_k^j f2 ∧ ∀n, i ≤ n < j. (M, π) |=_k^n f1      if π is a k-loop
                           ∃j, i ≤ j ≤ k. (M, π) |=_k^j f2 ∧ ∀n, i ≤ n < j. (M, π) |=_k^n f1   otherwise

(M, π) |=_k^i [f1 R f2] ⇔  ∃j, i ≤ j. (M, π) |=_k^j f1 ∧ ∀n, i ≤ n ≤ j. (M, π) |=_k^n f2      if π is a k-loop
                           ∃j, i ≤ j ≤ k. (M, π) |=_k^j f1 ∧ ∀n, i ≤ n ≤ j. (M, π) |=_k^n f2   otherwise

Fig. 1. The Bounded Semantics of LTL
Given the function _lL_k(π), which holds when π is a k-loop with π(k) = π(l), and L_k(π) = ∨_{l=0}^{k} _lL_k(π), which holds when π is any k-loop, the general translation is defined as¹:

[[M, f]]_k := [[M]]_k ∧ ( (¬L_k(π) ∧ [[f]]_k^0) ∨ ∨_{l=0}^{k} ( _lL_k(π) ∧ _l[[f]]_k^0 ) )        (1)
where [[M]]_k denotes the encoding of the transition relation of M as a constraint on π with bound k; [[f]]_k^i and _l[[f]]_k^i denote the encoding of the LTL formula f evaluated along path π at time i, where π is a non-looping path and a k-loop to l respectively. These encodings are given in Table 1. Biere et al. show the correctness of some of these encodings in [2]; we will not repeat their proofs here. Theorem 1 in Biere et al. [2] states that bounded model checking of this form is complete provided that the bound k is sufficiently large.
This comes from Definition 15 in [2]
Table 1. The BMC Encoding for LTL

f        [[f]]_k^i                                         _l[[f]]_k^i
G f1     ⊥                                                 ∧_{j=min(i,l)}^{k} _l[[f1]]_k^j
F f1     ∨_{j=i}^{k} [[f1]]_k^j                            ∨_{j=min(i,l)}^{k} _l[[f1]]_k^j
X f1     i < k ∧ [[f1]]_k^{i+1}                            (i < k ∧ _l[[f1]]_k^{i+1}) ∨ (i = k ∧ _l[[f1]]_k^l)
f1 U f2  ∨_{j=i}^{k} ([[f2]]_k^j ∧ ∧_{n=i}^{j−1} [[f1]]_k^n)
         loop case: ∨_{j=i}^{k} (_l[[f2]]_k^j ∧ ∧_{n=i}^{j−1} _l[[f1]]_k^n) ∨ ∨_{j=l}^{i−1} (_l[[f2]]_k^j ∧ ∧_{n=i}^{k} _l[[f1]]_k^n ∧ ∧_{n=l}^{j−1} _l[[f1]]_k^n)
f1 R f2  ∨_{j=i}^{k} ([[f1]]_k^j ∧ ∧_{n=i}^{j} [[f2]]_k^n)
         loop case: ∧_{j=min(i,l)}^{k} _l[[f2]]_k^j ∨ ∨_{j=i}^{k} (_l[[f1]]_k^j ∧ ∧_{n=i}^{j} _l[[f2]]_k^n) ∨ ∨_{j=l}^{i−1} (_l[[f1]]_k^j ∧ ∧_{n=i}^{k} _l[[f2]]_k^n ∧ ∧_{n=l}^{j} _l[[f2]]_k^n)
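As a toy illustration of the encoding idea (not the paper's construction), the sketch below unrolls a one-variable model for k steps and searches for a path that satisfies the transition and initial-state constraints while violating G p; exhaustive enumeration stands in for the SAT solver.

```python
# Toy illustration of BMC: unroll a one-variable model k steps and
# search (by brute force, standing in for a SAT solver) for a path that
# is valid in M and violates the specification G p.
from itertools import product

def transition(p, p_next):
    # Model M: from p=1 we may stay at 1 or move to 0; from 0 we stay at 0.
    return (p, p_next) in {(1, 1), (1, 0), (0, 0)}

def bmc_violates_Gp(k, init=1):
    # [[M]]_k ∧ [[¬(G p)]]_k, with ¬(G p) encoded as: some state has p = 0.
    for path in product([0, 1], repeat=k + 1):
        if path[0] != init:
            continue                                      # initial constraint
        if not all(transition(path[i], path[i + 1]) for i in range(k)):
            continue                                      # valid-path constraint
        if any(p == 0 for p in path):                     # counterexample to G p
            return path
    return None

print(bmc_violates_Gp(2))   # prints (1, 0, 0)
```

With bound k = 0 no counterexample exists (the single initial state satisfies p), matching the remark above that an insufficient bound simply yields no counterexample.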
3   Exploiting Fixpoints in BMC
The approach that we have taken to making a fixpoint-based encoding for BMC is based on a clause-style normal form for temporal logic. After converting the specification to this form, we can redefine the encoding to specifically take advantage of the properties of the normal form.

3.1   The Separated Normal Form
Gabbay's Separation Theorem [11] states that arbitrary temporal formulæ may be written in the form G(∧_i (P_i ⇒ F_i)), where the P_i are (strict) past-time formulæ and the F_i are (non-strict) future-time formulæ. Fisher [9] defines a normal form for temporal logic based on the Separation Theorem and gives a series of transformations for reaching it. The general form of SNF is the same as in the separation theorem; the implications P_i ⇒ F_i are referred to as rules. Since neither LTL nor CTL has explicit past-time operators, Bolotov and Fisher [3] define the start operator, which holds only at the beginning of time:

(M, π) |=_k^i start ⇔ π(i) ∈ I

The possible rules are thus:

start ⇒ ∨_j l_j           an initial rule
∧_i l_i ⇒ X ∨_j l_j       a global X-rule
∧_i l_i ⇒ F ∨_j l_j       a global F-rule

where the l_i and l_j are literals. The transformation functions T(Ψ) recursively convert a set of rules which do not conform to the normal form into a set of rules which do. To convert any temporal logic formula f to SNF, it is sufficient to apply the transformation rules to the singleton set {start ⇒ f}. For brevity, we do not list the full set of transformations here; in general they are trivially adapted from those in [3], or from standard propositional logic.
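The rewriting style of these transformations can be sketched concretely for the always operator, whose rule P ⇒ G f is replaced by two rules built from the greatest-fixpoint characterisation G f ≡ f ∧ X G f. The tuple representation of formulæ and all names below are illustrative, not the paper's implementation.

```python
# Sketch of the SNF-style renaming for G: each rule P => G f becomes two
# rules with a fresh variable x, following G f = f ∧ X G f.
# Formulas are nested tuples; all names are hypothetical.

counter = 0

def fresh():
    # Generate a fresh renaming variable x1, x2, ...
    global counter
    counter += 1
    return f"x{counter}"

def t_g(rules):
    out = []
    for lhs, rhs in rules:
        if isinstance(rhs, tuple) and rhs[0] == "G":
            x, f = fresh(), rhs[1]
            out.append((lhs, ("and", f, x)))         # P => f ∧ x
            out.append((x, ("X", ("and", f, x))))    # x => X (f ∧ x)
        else:
            out.append((lhs, rhs))
    return out

rules = t_g([("start", ("G", "p"))])
# yields the rule set {start => p ∧ x1,  x1 => X (p ∧ x1)}
```

One application of `t_g` removes every top-level G; a full conversion would interleave this with the transformations for the other operators and for flattening.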
T_G({P ⇒ G f} ∪ Ψ)         =  { P ⇒ f ∧ x,  x ⇒ X(f ∧ x) } ∪ T_G(Ψ)

T_U({P ⇒ f U g} ∪ Ψ)       =  { P ⇒ g ∨ (f ∧ x),  x ⇒ X(g ∨ (f ∧ x)),  P ⇒ F g } ∪ T_U(Ψ)

T_ren1({P ⇒ G f(F g)} ∪ Ψ) =  { P ⇒ G f(x),  x ⇒ F g } ∪ T_ren1(Ψ)
In each of the above transformations, a new variable x is introduced: the conversion to SNF introduces one variable for each removed operator (in the first two transformations above) in addition to the renaming variables used to flatten the structure of the formula (in the last transformation above). The transformations to rules are based on the fixpoint characterisations of the LTL operators. All LTL operators can be represented as the fixpoint of a recursive function [8]; the transformations encode the corresponding function as a rule which is required to hold in all states. Only those operators characterised by greatest fixpoints are converted (always (G) and weak until (W); until (U) is first converted to weak until and sometime for its transformation), which means that the sometime operator remains unchanged. By Tarski's fixpoint theorem [18] we know that a finite number of iterations of a rule is sufficient to find its fixpoint. Thus the instance of the introduced variable at time i holds iff the original operator held at time i. For a formal proof of the correctness of the transformations, see [10].
3.2   Bounded SNF
Although the fixpoint characterisations are given for unbounded temporal logic, they are preserved for most of bounded LTL, since we have bounded semantics for next-state (X). We note that the characterisation of always is valid if and only if the path is a k-loop; we encapsulate this constraint in the new operator next-loop-state (X_l), with semantics

(M, π) |=_k^i X_l f1 ⇔ (M, π) |=_k^{i+1} f1    if π is a k-loop
                       ⊥                        otherwise

and modify the transformation accordingly. The bounded semantics of always also fails to capture the concept of rules holding in all reachable states. We give the semantics for a modified operator bounded always (G_k) for bounded LTL, without the restriction to paths with loops:

(M, π) |=_k^i G_k f1 ⇔ ∀j, i ≤ j. (M, π) |=_k^j f1        if π is a k-loop
                       ∀j, i ≤ j < k. (M, π) |=_k^j f1     otherwise
The correctness of the transformations relies on a sufficient number of instances of the rules occurring. In BMC, this means that the transformations based on fixpoints are correct only when the bound is sufficiently large. It is easy to see, by appealing to the semantics, that the failure mode with an insufficiently large bound is the same as that for the original encoding: no counterexample is found. Introducing this operator allows us to restate the general form as

G_k ( ∧_i (P_i ⇒ F_i) )

The rules P_i ⇒ F_i are now of the following form:

start ⇒ ∨_j l_j            an initial rule
∧_i l_i ⇒ X ∨_j l_j        a global X-rule
∧_i l_i ⇒ X_l ∨_j l_j      a global X_l-rule
∧_i l_i ⇒ F ∨_j l_j        a global F-rule

with the transformation for the always operator being amended to

T_G({P ⇒ G f} ∪ Ψ) = { P ⇒ f ∧ x,  x ⇒ X_l(f ∧ x) } ∪ T_G(Ψ)
The correctness of bounded SNF is covered in [16].

3.3   Encoding Bounded SNF
The distributivity of bounded always follows directly from its semantics; because of the unusual semantics of start, this means that any LTL formula may be represented as a conjunction of instances of the following 'universal' rules:

start ⇒ ∨_j l_j
G_k ( ∧_i l_i ⇒ X ∨_j l_j )
G_k ( ∧_i l_i ⇒ X_l ∨_j l_j )
G_k ( ∧_i l_i ⇒ F ∨_j l_j )

Although it is simple to encode these rules using the standard BMC encodings in Table 1, we can take advantage of the limited nesting depth characteristic of these normal forms to define a more efficient encoding, in the same way as for the depth-1 case in [4] and [17]. We give the more efficient encodings in Table 2. Note that although we make use of the BMC encodings, they are only used for purely propositional formulæ. No further proof of these encodings is required: they are trivial simplifications of those proved in [2].
Table 2. The BMC Encoding for SNF-LTL

f                  [[f]]_k^0                                             _l[[f]]_k^0
start ⇒ f1         [[f1]]_k^0                                            _l[[f1]]_k^0
G_k(f1 ⇒ X_l f2)   ⊥                                                    ∧_{n=0}^{k} (_l[[f1]]_k^n ⇒ _l[[f2]]_k^{n+1})
G_k(f1 ⇒ X f2)     ∧_{n=0}^{k−1} ([[f1]]_k^n ⇒ [[f2]]_k^{n+1})           ∧_{n=0}^{k} (_l[[f1]]_k^n ⇒ _l[[f2]]_k^{n+1})
G_k(f1 ⇒ F f2)     ∧_{n=0}^{k} ([[f1]]_k^n ⇒ ∨_{m=n}^{k} [[f2]]_k^m)     ∧_{n=0}^{k} (_l[[f1]]_k^n ⇒ ∨_{m=min(n,l)}^{k} _l[[f2]]_k^m)
For propositional f, [[f]]_k^i ≡ _l[[f]]_k^i, so we can deduce from Table 2 that this relationship also holds for many cases where f is a rule. Under these circumstances, we can factorise the encodings for f out of the disjunctions in Equation 1, either explicitly during the encoding or by processing the resulting propositional formula. Often the checks for the looping nature of π will cancel each other out entirely, further simplifying the encoding. While this type of optimisation can be made with the standard BMC encoding, it only occurs where operators are not nested; the renaming effect of SNF simplifies this optimisation and makes it more widely applicable.

3.4   The Fixpoint Normal Form
We noted in Section 3.1 that SNF converts only the greatest fixpoint operators, leaving rules containing the sometime operator; we see from Table 2 that these rules are the pathological case for this encoding. Converting the sometime operator in the same way requires care. A transformation based directly on the fixpoint characterisation would be
TF({P ⇒ F f} ∪ Ψ) = {P ⇒ f ∨ x, x ⇒ X (f ∨ x)} ∪ TF(Ψ)

The problem stems from the disjunction in the second rule. Since we are trying to show satisfiability, it is simple to satisfy each occurrence of the rule by setting the right hand disjunct to true for all time: the rule can always be satisfied. Since we are interested only in the bounded semantics of the operator, it is possible to break this chain at the bound by introducing an extra operator:

(M, π) |=ik bound ⇔ i ≥ k

The transformation is now
TF({P ⇒ F f} ∪ Ψ) = {P ⇒ f ∨ x, x ⇒ X (f ∨ x), bound ⇒ f ∨ ¬x} ∪ TF(Ψ)
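A small sketch of how this transformation might be applied mechanically to a rule set; the tagged-tuple representation and variable names here are ours, not the authors'.

```python
import itertools

# Each eventuality rule P => F f is replaced by three sometime-free rules
# over a fresh variable x, following the transformation above. Rules are
# represented as ad hoc tagged tuples; this representation is hypothetical.
def tf(rules):
    fresh = (f"x{n}" for n in itertools.count())
    out = []
    for kind, lhs, rhs in rules:
        if kind == "F":                                        # lhs => F rhs
            x = next(fresh)
            out.append(("now", lhs, ("or", rhs, x)))           # P => f v x
            out.append(("next", x, ("or", rhs, x)))            # x => X (f v x)
            out.append(("now", "bound", ("or", rhs, ("not", x))))  # bound => f v ~x
        else:                                                  # pass through
            out.append((kind, lhs, rhs))
    return out

transformed = tf([("F", "P", "f")])
# three sometime-free rules replace the single eventuality rule
```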
3.5 Correctness of the Fixpoint Normal Form Transformation
We take the outline of the proof from [10]. For a transformation T to preserve the semantics of an arbitrary formula f , we require that
A Fixpoint Based Encoding for Bounded Model Checking
for all models M and for all LTL formulæ f, (M, π) |=k f iff there exists an M′ such that M ∼x M′ and (M′, π) |=k T(f)

where x is the new propositional variable introduced, and M ∼x M′ if and only if M differs from M′ in at most the valuation given to x. We express this in temporal logic with quantification over propositions (QPTL)² as ⊨QPTL f ⇔ ∃x.T(f). The proof is given for the case that the rule set is a singleton set since, for all transformations, T is independent of Ψ. The proofs may easily be extended to non-empty Ψ.

Lemma 1. For sufficiently large k, (M, π) |=k F f1 if and only if (M′, π) |=k (x ∨ f1) and (M′, π) |=k Gk (x ⇔ X(x ∨ f1)), where M ∼x M′.

Proof. Consider the fixpoint expression τ(Z) = f1 ∨ X Z. We introduce the variable x such that for all n,

(M′, π) |=nk x ⇔ (M′, π) |=nk X τ^(k−n)(true)

By substituting the definition of x and by one substitution of the definition of τ, we have

(M′, π) |=nk x ⇔ (M′, π) |=nk X (f1 ∨ x)

and by reference to the semantics, (M′, π) |=k Gk (x ⇔ X(x ∨ f1)). From the least fixpoint characterisation [8], (M′, π) |=k x ⇔ F f1, and by unrolling τ by one step and substituting the definition of x, we get (M′, π) |=k f1 ∨ x.

Theorem 1. For any rule A, ⊨QPTL A ⇔ ∃x.TF(A)

Proof. Proving each direction independently:

– ⊨QPTL A ⇒ ∃x.TF(A). Substituting Lemma 1,

Gk (P ⇒ F B) ⇒ ∃x. Gk (x ⇔ X(x ∨ B)) ∧ Gk (bound ⇒ ¬x) ∧ Gk (P ⇒ (x ∨ B))
⇒ ∃x. Gk ((x ⇔ X(x ∨ B)) ∧ (bound ⇒ ¬x) ∧ (P ⇒ (x ∨ B)))

which implies the set of rules {x ⇒ X(x ∨ B), bound ⇒ ¬x, P ⇒ x ∨ B}.
– ⊨QPTL ∃x.TF(f) ⇒ f. Starting with the transformed set of rules {x ⇒ X(x ∨ B), bound ⇒ ¬x, P ⇒ x ∨ B}, and exploiting the corollary of Lemma 1,

(M′, si) |=k (x ∨ f1) ⇔ (M′, si) |=k F f1 iff (M′, si) |= Gk (bound ⇒ ¬x)

Gk ((x ⇔ X(x ∨ B)) ∧ (bound ⇒ ¬x) ∧ (P ⇒ (x ∨ B)))
⇔ Gk (x ⇔ X(x ∨ B)) ∧ Gk (bound ⇒ ¬x) ∧ Gk (P ⇒ (x ∨ B))
⇔ Gk (x ⇔ X F B) ∧ Gk (bound ⇒ ¬x) ∧ Gk (P ⇒ (x ∨ B))
⇒ Gk ((x ⇒ X F B) ∧ (P ⇒ (x ∨ B)))
⇒ Gk (P ⇒ ((X F B) ∨ B))
⇒ Gk (P ⇒ F B)

That is, the singleton rule set {P ⇒ F B}.

² See [19] for full details; briefly, (M, i) |= ∃p.A iff there exists an M′ such that (M′, i) |= A, and M and M′ differ at most in the valuation given to p.
4 Comparisons
We compare the encodings on an example specification G F f. This is a reachability specification, with many applications. Before encoding, the specification is negated to

F G ¬f    (2)
We consider only the loop encoding, as the non-loop encoding is ⊥ for all methods due to the semantics of always. The original, recursive encoding decomposes in two steps. In the loop case,

l[[F G ¬f, π]]0k = ∨(i=0..k) l[[G ¬f, π]]ik = ∨(i=0..k) ∧(j=min(i,l)..k) ¬f(j)
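The quadratic growth of this unrolled formula can be checked directly by counting its time-stamped literals; a throwaway sketch (ours, not part of the paper):

```python
# Number of time-stamped literals in the loop encoding of F G ~f at
# bound k with loop start l: the sum over i of the conjunction lengths
# k - min(i, l) + 1, which is O(k^2).
def literal_count(k, l):
    return sum(k - min(i, l) + 1 for i in range(k + 1))

# with l = 0 every conjunction spans all k+1 steps: (k+1)^2 literals
sizes = [literal_count(k, 0) for k in (1, 2, 3, 10)]
# sizes == [4, 9, 16, 121]
```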
This is a disjunction of conjunctions: the pathological case for conversion to clause form. It is possible to define a more efficient encoding using renamed subformulæ [4], but this approach is difficult to generalise. The size of the formula is O(k²), hence the cost to build it before CNF conversion is quadratic. The conversion to SNF yields the following rules³

start ⇒ F x1
x1 ⇒ ¬f ∧ x2
x2 ⇒ Xl (¬f ∧ x2)

which encode to the three conjuncts

∨(i=0..k) x1(i) ∧ ∧(i=0..k) (x1(i) ⇒ ¬f(i) ∧ x2(i)) ∧ ∧(i=0..k) (x2(i) ⇒ ¬f(i+1) ∧ x2(i+1))
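The shape of these three conjunct families can be sketched with atoms as plain strings (a sketch under our own naming, not the paper's implementation):

```python
# Build the three conjunct families for bound k: the disjunction from
# start => F x1, and the step implications from the two renaming rules.
def snf_conjuncts(k):
    eventually = [f"x1({i})" for i in range(k + 1)]                      # v_i x1(i)
    unfold = [(f"x1({i})", f"~f({i})", f"x2({i})") for i in range(k + 1)]
    step = [(f"x2({i})", f"~f({i+1})", f"x2({i+1})") for i in range(k + 1)]
    return eventually, unfold, step

ev, unfold, step = snf_conjuncts(2)
# ev == ["x1(0)", "x1(1)", "x1(2)"]; the other families hold k+1 implications each
```

Each implication is a fixed-size triple, so the whole encoding stays linear in k, in contrast with the quadratic unrolled formula above.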
We have two introduced variables: the first establishes a renaming of the G ¬f subformula, and the second renames each successive step of this subformula. This means that steps are shared between references from the F operator, leading to a simplification of the problem which is easier to solve as well as being smaller. The added complexity of the introduced variables is balanced by the ability to reuse subformulæ many times. The encoding corresponds to an ideal renaming of the formula above, but the conversion is performed in linear time, and results in a formula of size O(k). Furthermore, we can show in advance that the encoding of each rule used here is invariant with respect to l, which means that the subformulæ can be factorised out of the disjunction of loops seen in Equation 1.

³ Further reduction of the second and third rules is necessary for correct SNF; we disregard this as it makes no difference to the final encoding.

Finally, we examine the fixpoint normal form conversion. The set of rules corresponding to the specification is

start ⇒ x0 ∨ x1
x0 ⇒ X (x0 ∨ x1)
bound ⇒ x1 ∨ ¬x0
x1 ⇒ ¬f ∧ x2
x2 ⇒ Xl (¬f ∧ x2)

which encode to the conjuncts

(x0(0) ∨ x1(0)) ∧ ∧(i=0..k) (x0(i) ⇒ x0(i+1) ∨ x1(i+1)) ∧ (x1(k) ∨ ¬x0(k)) ∧ ∧(i=0..k) (x1(i) ⇒ ¬f(i) ∧ x2(i)) ∧ ∧(i=0..k) (x2(i) ⇒ ¬f(i+1) ∧ x2(i+1))
The main difference between the SNF encoding and the fixpoint normal form encoding is the omission of the long disjunction in the first conjunct which would be encoded as a single long clause. This is replaced by an array of conjunctions which rename each step in much the same way as for the G operator. Although in this case the advantage is dependent on the SAT checker, it is clear that where the F operator is nested, similar advantages would be seen as for SNF with the G operator.
5 Results
We compare the SNF and Fixpoint encodings with the encoding used in NuSMV version 2.0.2; this version of NuSMV includes several of the optimisations discussed in [4]. For consistency, we have implemented the SNF and Fixpoint encodings as options in NuSMV. All of the experiments have been done using the SAT solver zChaff [15] on a 700MHz AMD Athlon with 256Mb main memory, running Linux.
[Figure 2 comprises three log–log scatter plots of clause counts: SNF vs. NuSMV, Fixpoint vs. NuSMV, and Fixpoint vs. SNF, each with points for the Global, After, and Before response specifications.]

Fig. 2. Number of clauses generated by a shift register model
5.1 Scalability
We observe the difference in the behaviour of the encodings with increasing problem size by choosing a simple problem that is easy to scale. The benchmark circuits have been kept deliberately simple, as it is the encoding of the specification, not the model, that differentiates the encodings. A shift register is a storage device of length n which, at each time step, moves the values stored in each of its elements to the next element, reading in a new value to fill the now empty first element. That is, storage elements x0 . . . xn−1 and input in are transformed such that ∀i, 0 < i < n · (xi ← xi−1) and x0 ← in. The specification that the shift register must fulfil will depend on its application; we explore a number of response patterns taken from [6]. The specifications depend on the number of elements in the shift register, referring to points at the end and middle of the register. For example, in the case of a three element register:

– Global response (depth 2): x2 goes high in response to in: G(in ⇒ F x2)
– After response (depth 3): x2 goes high in response to in, after x1 has gone high: G(x1 ⇒ G(in ⇒ F x2))
– Before response (depth 3): x1 goes high in response to in, before x2 has gone high (this property is only true if all the registers are zero, so we test for empty ≡ ¬x0 ∧ ¬x1 ∧ ¬x2 too): [((in ∧ empty) ⇒ [¬x2 U (x1 ∧ ¬x2)]) U (x2 ∨ G x2)]
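A minimal simulation of the shift register (our own sketch; the benchmark itself is an SMV circuit model) illustrates why the global response property holds:

```python
# n-element shift register: x_i <- x_{i-1} for 0 < i < n, and x_0 <- in.
def shift_step(regs, inp):
    return [inp] + regs[:-1]

regs, trace = [0, 0, 0], []
for inp in [1, 0, 0, 0]:
    regs = shift_step(regs, inp)
    trace.append(regs)
# the 1 read in on the first step reaches x2 two steps later,
# as the global response property G(in => F x2) predicts
```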
Number of Clauses. We see in Figure 2 that the number of clauses produced by both SNF and Fixpoint grows, in general, less quickly than the number produced by NuSMV, as the length of the register increases. The differing gradients follow the behaviour predicted by the differing depths of the specifications: the slopes become shallower with increasing depth indicating an exponential improvement in the number of clauses.
[Figure 3 comprises three log–log scatter plots of zChaff run times: SNF vs. NuSMV, Fixpoint vs. NuSMV, and Fixpoint vs. SNF, each with points for the Global, After, and Before response specifications.]

Fig. 3. Time taken by zChaff for a shift register model
The advantage of the Fixpoint encoding over SNF is dependent upon the number of occurrences of the always operator in the specification, since this is the only difference between the encodings. We see the greatest advantage for Fixpoint in the after response and before response specifications, with two occurrences of the always operator; the first operator in the after response specification has a smaller encoding than the second as one of the corresponding rules is an initial rule. We can conclude that, as far as the number of clauses is concerned, the Fixpoint encoding outperforms SNF and NuSMV in the way that is expected: size and rate of size increase decreasing with the nesting depth and the occurrence of least fixpoint operators.
zChaff timings. Counting the number of clauses is far from being an effective method of determining the efficiency of an encoding. We also look at one of the current state-of-the-art SAT solvers, zChaff [15]. The behaviour is far less clear than for the number of clauses; zChaff is a complex system. Broadly, the SNF and Fixpoint encodings always result in a shorter runtime than the NuSMV encoding; the Fixpoint encoding outperforms the SNF encoding only for the after response specification (for the global response specification, the trend is towards an improvement for larger problems). We see a clear exponential improvement for certain specifications: the timings for Before with SNF and Fixpoint grow exponentially slower than NuSMV; the global response specification shows the same trend less dramatically. We see an exponential improvement for the after response specification only with the Fixpoint encoding: with the SNF encoding, the trend appears to be towards NuSMV being faster.
5.2 Distributed Mutual Exclusion
The distributed mutual exclusion circuit from [13] forms a good basis for comparing the performance of different encodings as it meaningfully implements several specifications. We look at three here, applied to a DME of four elements:

– Accessibility: if an element wishes to enter the critical region, it eventually will. We check the accessibility of the first two elements. This specification is correct, so as in [2] we check at a chosen bound to illustrate the timing differences. G(request(0) → F enter(0)) ∧ G(request(1) → F enter(1))

– Precedence given token possession: the mutual exclusion property is enforced by a token passing mechanism; if an element of the DME holds the token, then its requests to enter the critical region are given precedence. We check the converse: if the first element holds the token, the second does not have precedence, and vice versa. Since the token begins at the first element, this is the quicker to prove, with a bound of 14. For the second element, a bound of 54 is required to find the counterexample. G((request(0) ∧ request(1) ∧ token(0)) → [¬enter(0) U enter(1)])

– Bounded overtaking given token possession: if two elements wish to enter the critical region, then the higher priority one may enter a given number of times before the other. We check bounded overtaking of one and two entrances. Both specifications are correct, so as above we check at a bound of 40. These specifications are the most complex, including up to four nested until operators. For one entrance: G((request(0) ∧ request(1) ∧ token(0)) → [(¬enter(0) ∧ ¬enter(1)) U (enter(0) ∧ X(enter(0) U [(¬enter(0) ∧ ¬enter(1)) U enter(1)]))])

The results are summarised in Table 3 together with the timings for CMU SMV on CTL representations of the same problems⁴. For the bounded overtaking problems, we note that NuSMV took nearly 10 minutes to generate the formula in the first case, and after 25 minutes had not completed in the second case.
In contrast, the time taken to perform the SNF and Fixpoint encodings was insignificant. While both the SNF and Fixpoint encodings outperform the NuSMV encoding and SMV, we do not see a consistent advantage to either. The results for accessibility suggest that Fixpoint scales better with increasing bound, while the results for bounded overtaking suggest that SNF scales better with increasing specification depth.

5.3 Texas-97 Benchmarks
We examine a number of model checking benchmarks from the Texas-97 benchmark suite [1]. These benchmarks have been converted from the Blif-mv representation to SMV format by a locally modified version of the VIS model checker [12]. We run these benchmarks at fixed bounds and report the time spent by zChaff.

⁴ We note that for SMV to terminate in a reasonable time on these problems, it must be started with the -inc switch. No similar knowledge of model checker behaviour is needed for BMC.

Table 3. Timing results in zChaff for the distributed mutual exclusion circuit

Specification      | Bound | NuSMV encoding | SNF encoding | Fixpoint encoding | SMV
Accessibility      | 30    | 2.65           | 0.33         | 0.36              | 13.13
Accessibility      | 40    | 20.93          | 4.84         | 4.33              | 13.13
Priority for 0     | 14    | 0.13           | 0.02         | 0.02              | 12.97
Priority for 1     | 54    | 14.93          | 0.44         | 0.76              | 15.00
Overtaking depth 1 | 40    | 85.73          | 2.15         | 1.11              | 13.96
Overtaking depth 2 | 40    | *              | 4.92         | 5.15              | 14.14

Table 4. Timing results in zChaff for the MSI cache coherence protocol

Processors | Specification | Bound | NuSMV | SNF   | Fixpoint
2          | Request A     | 10    | 4.40  | 1.73  | 1.53
2          | Request A     | 20    | 19.40 | 5.82  | 9.97
2          | Request B     | 10    | 2.65  | 3.63  | 2.69
2          | Request B     | 20    | 49.78 | 8.63  | 16.42
3          | Request A     | 10    | 13.00 | 3.03  | 2.50
3          | Request A     | 20    | 39.22 | 8.2   | 5.79
3          | Request B     | 10    | 4.60  | 6.66  | 5.93
3          | Request B     | 20    | 54.94 | 62.11 | 40.25
3          | Request C     | 10    | 4.58  | 6.64  | 5.91
3          | Request C     | 20    | 44.8  | 50.27 | 37.65

MSI Cache Coherence Protocol. This is an implementation of a Modified Shared Invalid cache coherence protocol between two or three processors. We examine two of the specifications of behaviour from the benchmark. The results are summarised in Table 4.

– Whenever processor A requests the bus, it gets the bus in the next clock cycle. Listed as “Request A” in the results. G(bus reqA → X bus ackA)
– Whenever processor B (or C) requests the bus, it gets the bus only when processor A did not request the bus. Listed as “Request B” or “Request C” in the results. G(bus reqB → F bus ackB)

Instruction Fetch Control Module. This is a model of the instruction fetch control module of the experimental torch microprocessor developed at Stanford University. Three models are examined; from the text accompanying the benchmark set:

– IFetchControl1: The original instruction module with several assumptions on the environmental signal.
– IFetchControl2: As IFetchControl1 except that the memory stall signal is always low.
– IFetchControl3.v: As IFetchControl1 except that the instruction cache line is assumed to be always valid.

We examine three specifications from the benchmark. The results are summarised in Table 5.

– The delayed version of a signal should, in the next state, have the signal’s previous value. Listed as “Delay” in the results. G(IStall s1 → X IStall s2)
– As above, for the Refetch state. Listed as “Refetch” in the results. G((PresState s1 = REFETCH) → X(PrevState s2 = REFETCH))
– WriteCache s2 becomes one in some paths before WriteTag s2 becomes one. Listed as “WriteCache” in the results. ¬[¬WriteTag s2 U (WriteCache s2 ∧ ¬WriteTag s2)]

Table 5. Timing results in zChaff for the Instruction Fetch Control Module

Model          | Specification | Bound | NuSMV | SNF  | Fixpoint
IFetchControl1 | Delay         | 10    | 0.94  | 0.45 | 0.44
IFetchControl2 | Delay         | 10    | 0.99  | 0.40 | 0.40
IFetchControl3 | Delay         | 10    | 1.29  | 0.39 | 0.50
IFetchControl1 | Refetch       | 10    | 3.69  | 0.91 | 0.82
IFetchControl2 | Refetch       | 10    | 3.30  | 0.89 | 0.81
IFetchControl3 | Refetch       | 10    | 3.74  | 1.49 | 1.88
IFetchControl1 | WriteCache    | 10    | 3.58  | 1.68 | 2.47
IFetchControl2 | WriteCache    | 10    | 2.67  | 1.65 | 1.78
IFetchControl3 | WriteCache    | 10    | 2.78  | 2.24 | 1.40
Pentium Pro Split-Transaction Bus. This is a model of the Modified Exclusive Shared Invalid cache coherence protocol used by the Intel Pentium Pro processor for SMP. We examine a number of different combinations of opcodes running on the processors, with the memory address of the transaction being nondeterministically 0 or 1. We examine three specifications from the benchmark. The results are summarised in Table 6.

– Correctness of the bus transaction IOQ. Listed as “IOQ” in the results. G(¬((processor0.fifo = REQUEST) ∧ (processor1.fifo = REQUEST)))
– Liveness of processor 0 (part 1). Listed as “Live 1” in the results. G((processor0.stage = FETCH) → F(processor0.stage = EXECUTE))
– Liveness of processor 0 (part 2). Listed as “Live 2” in the results. G((processor0.stage = EXECUTE) → F(processor0.stage = FETCH))
Table 6. Timing results in zChaff for the Pentium Pro Split-Transaction Bus

Opcode 0   | Opcode 1   | Specification | NuSMV   | SNF    | Fixpoint
Load2Store | Load2Store | IOQ           | 949.53  | 202.83 | 202.90
Load2Store | Store      | IOQ           | 753.26  | 156.43 | 156.24
Load2Store | Load2Store | Live 1        | 923.12  | 176.64 | 169.97
Load2Store | Load       | Live 1        | 745.94  | 131.61 | 145.88
Load2Store | Store      | Live 1        | 1111.63 | 175.58 | 199.19
Load2Store | Load2Store | Live 2        | 919.61  | 167.23 | 160.38
Load2Store | Load       | Live 2        | 883.51  | 134.52 | 155.23
Load2Store | Store      | Live 2        | 738.74  | 128.96 | 143.04

Summary. While we can see that the SNF and Fixpoint encodings outperform the NuSMV encoding in many cases, the gains are typically less dramatic than those seen for the mutual exclusion circuit. The models are encoded in the same way regardless of the encoding used for the specification; these benchmarks are very large circuits with several thousand variables, so it is reasonable to suppose that the performance gains due to the new encodings are mitigated by the time taken to process the model. The specifications used in these benchmarks are much simpler than those used to test the DME: typically of the form G(a → X b) or G(a → F b). This suggests again that the advantages of the SNF and Fixpoint encodings are dependent on the nesting depth of the specification.
6 Conclusions
We have described two new encoding schemes for bounded model checking which build on the existing encodings and use the fixpoint characterisations of LTL. The first is a novel application of the Separated Normal Form, while the second extends SNF by the introduction of a transformation for the eventually operator. We have shown that these new encodings are correct, provided that the original bounded model checking encoding is correct. We have demonstrated, for both encodings, a reduction in the number of clauses generated that is exponential in the size of the problem instance, and also that the improvement in performance in the SAT checker can be exponential in the size of the problem instance, depending on the specification. We have demonstrated a clear performance advantage for these encodings over the NuSMV bounded model checking implementation in several real-world examples, and we have demonstrated the advantage that these encodings give BMC over conventional symbolic model checkers.
References

1. Adnan Aziz et al. Examples of HW verification using VIS, 1997. http://vlsi.colorado.edu/~vis/texas-97/
2. Armin Biere, Alessandro Cimatti, Edmund Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In W.R. Cleaveland, editor, Tools and Algorithms for the Construction and Analysis of Systems, 5th International Conference, TACAS’99, volume 1579 of Lecture Notes in Computer Science, pages 193–207. Springer-Verlag Inc., July 1999.
3. Alexander Bolotov and Michael Fisher. A resolution method for CTL branching-time temporal logic. In Proceedings of the Fourth International Workshop on Temporal Representation and Reasoning (TIME). IEEE Press, 1997.
4. Alessandro Cimatti, Marco Pistore, Marco Roveri, and Roberto Sebastiani. Improving the encoding of LTL model checking into SAT. In Agostino Cortesi, editor, Third International Workshop on Verification, Model Checking and Abstract Interpretation, volume 2294 of Lecture Notes in Computer Science. Springer-Verlag Inc., January 2002.
5. Martin Davis and Hilary Putnam. A computing procedure for quantification theory. Journal of the ACM, 7:201–215, 1960.
6. M.B. Dwyer, G.S. Avrunin, and J.C. Corbett. Property Specification Patterns for Finite-State Verification. In M. Ardis, editor, 2nd Workshop on Formal Methods in Software Practice, pages 7–15, March 1998.
7. M.B. Dwyer, G.S. Avrunin, and J.C. Corbett. Patterns in property specifications for finite-state verification. In 21st International Conference on Software Engineering, Los Angeles, California, May 1999.
8. E. Allen Emerson and Edmund M. Clarke. Characterizing correctness properties of parallel programs using fixpoints. In J. W. de Bakker and Jan van Leeuwen, editors, Automata, Languages and Programming, 7th Colloquium, volume 85 of Lecture Notes in Computer Science, pages 169–181. Springer-Verlag Inc., 1980.
9. Michael Fisher. A resolution method for temporal logic. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI). Morgan Kaufmann, August 1991.
10. Michael Fisher and Philippe Noël. Transformation and synthesis in MetateM, Part I: Propositional MetateM. Technical Report UMCS-92-2-1, Department of Computer Science, University of Manchester, Manchester M13 9PL, England, February 1992.
11. Dov Gabbay. The declarative past and imperative future. In H. Barringer, editor, Proceedings of the Colloquium on Temporal Logic and Specifications, volume 398 of Lecture Notes in Computer Science, pages 409–448. Springer-Verlag, 1989.
12. The VIS Group. VIS: A system for verification and synthesis. In R. Alur and T. Henzinger, editors, Proceedings of the 8th International Conference on Computer Aided Verification, volume 1102 of Lecture Notes in Computer Science, pages 428–432, New Brunswick, NJ, July 1996. Springer.
13. A. J. Martin. The design of a self-timed circuit for distributed mutual exclusion. In Henry Fuchs, editor, Proceedings of the 1985 Chapel Hill Conference on VLSI, pages 245–260. Computer Science Press, 1985.
14. K. L. McMillan. Symbolic Model Checking: An Approach to the State Explosion Problem. PhD thesis, Carnegie Mellon University, 1992.
15. M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an efficient SAT solver. In 39th Design Automation Conference, Las Vegas, June 2001.
16. Daniel Sheridan. Using fixpoint characterisations of LTL for bounded model checking. Technical Report APES-41-2002, APES Research Group, January 2002. Available from http://www.dcs.st-and.ac.uk/~apes/apesreports.html
17. Daniel Sheridan and Toby Walsh. Clause forms generated by bounded model checking. In Andrei Voronkov, editor, Eighth Workshop on Automated Reasoning, 2001.
18. A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.
19. Pierre Wolper. Specification and synthesis of communicating processes using an extended temporal logic. In Proceedings of the 9th Symposium on Principles of Programming Languages, pages 20–33, Albuquerque, January 1982.
Using Edge-Valued Decision Diagrams for Symbolic Generation of Shortest Paths

Gianfranco Ciardo and Radu Siminiceanu

College of William and Mary, Williamsburg, Virginia 23187
{ciardo,radu}@cs.wm.edu
Abstract. We present a new method for the symbolic construction of shortest paths in reachability graphs. Our algorithm relies on a variant of edge–valued decision diagrams that supports efficient fixed–point iterations for the joint computation of both the reachable states and their distance from the initial states. Once the distance function is known, a shortest path from an initial state to a state satisfying a given condition can be easily obtained. Using a few representative examples, we show how our algorithm is vastly superior, in terms of both memory and time, to alternative approaches that compute the same information, such as ordinary or algebraic decision diagrams.
1 Introduction
Model checking [13] is an exhaustive, fully automated approach to formal verification. Its ability to provide counterexamples or witnesses for the properties that are checked makes it increasingly popular. In many cases, however, this feature is the most time– and space–consuming stage of the entire verification process. For example, [15] shows how to construct traces for queries expressed in the temporal logic CTL [11] under fairness constraints. Another direction is taken in SAT–based model checking, where satisfiability checkers are used to find shortest–length counterexamples (as is the case of the bounded model checking technique [4]), conduct the entire reachability analysis [1], or combine the state–space exploration method with SAT solvers [24]. Since a trace is usually meant to be examined by a human, it is particularly desirable for a model–checking tool to compute a minimal–length trace. Unfortunately, finding such a trace is an NP-complete problem [17], thus a sub–optimal trace is sought in most cases. For some operators, finding minimal–length witnesses is instead easy in principle. An example is the EF operator, which is closely related to the (backward) reachability relation: a state satisfies EF p if there is an execution path from it to a state where property p holds. Even using symbolic encodings [7], though, the generation and storage of the sets of states required to generate an EF witness can be a major limitation in practice.
Work supported in part by the National Aeronautics and Space Administration under NASA Grants NAG-1-2168 and NAG-1-02095.
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 256–273, 2002. © Springer-Verlag Berlin Heidelberg 2002
Our goal is then to adapt a very fast and memory–efficient state–space generation algorithm we recently developed [10] and endow the symbolic data structure with information that captures the minimum distance of each state from any of the initial states. Knowledge of this distance significantly simplifies the generation of shortest–length EF witnesses. To encode this information, we employ a variant of the edge–valued decision diagrams [21], appropriately generalized so that it is applicable to our fast state–space generation strategy. We show that the new variant we define is still canonical, and emphasize the importance of using edge–values, which give us increased flexibility when performing guided fixed–point iterations. The paper is organized as follows. Section 2 defines basic concepts in discrete–state systems, ordinary and edge–valued decision diagrams, state–space generation, and traces, and formulates the one–to–many shortest path problem. Section 3 introduces our extensions to edge–valued decision diagrams, including a different type of canonical form, EV+MDDs. Section 4 discusses the efficient manipulation of EV+MDDs and our algorithm for constructing the distance function. Section 5 evaluates the performance of the new data structure and algorithm by comparing them with existing technologies: regular and algebraic decision diagrams. Section 6 concludes with final remarks and future research directions.
2 State Spaces, Decision Diagrams, and Distances
A discrete–state model is a triple (Ŝ, S^init, N), where the discrete set Ŝ is the potential state space of the model; the set S^init ⊆ Ŝ contains the initial states; and N : Ŝ → 2^Ŝ is the transition function specifying which states can be reached from a given state in one step, which we extend to sets: N(X) = ∪i∈X N(i). We consider structured systems modeled as a collection of K submodels. A (global) system state i is then a K-tuple (iK, . . . , i1), where ik is the local state for submodel k, for K ≥ k ≥ 1, and Ŝ is given by SK × · · · × S1, the cross–product of K local state spaces Sk, which we identify with {0, . . . , nk−1} since we assume that Ŝ is finite. The (reachable) state space S ⊆ Ŝ is the smallest set containing S^init and closed with respect to N, i.e.:

S = S^init ∪ N(S^init) ∪ N(N(S^init)) ∪ · · · = N*(S^init).

Thus, S is the fixed point of the equation S = N(S) when S is initialized to S^init.
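This fixed point can be computed by the familiar breadth-first iteration, which also yields the distance function the paper is ultimately after; a sketch with our own names (the paper's actual algorithm is the saturation-based one of Section 4, not this explicit-state loop):

```python
# Compute S = N*(S_init) by breadth-first iteration; as a by-product,
# record each reachable state's minimum distance from the initial states.
def reach_with_distance(s_init, next_states):
    dist = {s: 0 for s in s_init}
    frontier, d = set(s_init), 0
    while frontier:
        d += 1
        frontier = {j for i in frontier for j in next_states(i)} - dist.keys()
        for j in frontier:
            dist[j] = d
    return dist

# toy model: a counter from 0 to 5, N(i) = {i+1} below 5
dist = reach_with_distance({0}, lambda i: {i + 1} if i < 5 else set())
# dist == {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
```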
2.1 Decision Diagrams
It is well known that the state spaces of realistic models are enormous, and that decision diagrams are an effective way to cope with this state–space explosion problem. Their boolean incarnation, binary decision diagrams (BDDs) [5], can compactly encode boolean functions of K variables, hence subsets of {0, 1}^K, which can then be manipulated very efficiently. BDDs have been successfully employed to verify digital circuits and other types of synchronous and
asynchronous systems. In the last decade, their application has expanded to areas of computer science beyond computer–aided verification. A comprehensive overview of decision diagrams is presented in [14]. We consider exclusively ordered decision diagrams (the variables labelling nodes along any path from the root must follow the order iK, . . . , i1) that are either reduced (no duplicate nodes and no node with all edges pointing to the same node, but edges possibly spanning multiple levels) or quasi–reduced (no duplicate nodes, and all edges spanning exactly one level), either form being canonical. We adopt the extension of BDDs to integer variables, i.e., multi–valued decision diagrams (MDDs) [19], an example of which is in Figure 1. MDDs are often more naturally suited than BDDs to represent the state space of arbitrary discrete systems, since no binary encoding must be used to represent the local states for level k when nk > 2. An even more important reason to use MDDs in our work, as it will be apparent, is that they better allow us to exploit the event locality present in systems exhibiting a globally–asynchronous locally–synchronous behavior. When combined with the Kronecker representation of the transition relation inspired by [2] and applied in [9,22], MDDs accommodate different fixed–point iteration strategies that result in remarkable efficiency improvements [10]. To discuss locality in a structured model, we require a disjunctively–partitioned transition function [18], i.e., N must be a union of (asynchronous) transition functions: N(iK, . . . , i1) = ∪e∈E Ne(iK, . . . , i1), where E is a finite set of events and Ne is the transition function associated with event e. Furthermore, we must be able to express each transition function Ne as the cross–product of K local transition functions: Ne(iK, . . . , i1) = Ne,K(iK) × · · · × Ne,1(i1).
This is a surprisingly natural requirement: for example, it is satisfied by any Petri net [23], regardless of how it is decomposed into K subnets (by partitioning its places into K sets). Moreover, if a given model does not exhibit this behavior, we can always coarsen K or refine E so that it does. If we identify Ne,k with a boolean matrix of size nk × nk, where entry (ik, jk) is 1 iff jk ∈ Ne,k(ik), the overall transition relation is encoded by the boolean Kronecker expression ∨e∈E ⊗K≥k≥1 Ne,k. We say that event e affects level k if Ne,k is not the identity, we denote the top and bottom levels affected by e with Top(e) and Bot(e), respectively, and we let Ek = {e ∈ E : Top(e) = k}.
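As a sketch of this product form (the set-valued representation here is ours; the paper stores each Ne,k as an nk × nk boolean matrix in Kronecker form), an event can be built from its local transition functions:

```python
from itertools import product

# Build Ne from K local functions Ne,k, each mapping a local state to the
# set of its local successors; the global effect is their cross-product.
def make_event(local_fns):
    def ne(state):  # state is a tuple (i_K, ..., i_1)
        options = [sorted(f(i)) for f, i in zip(local_fns, state)]
        return {tuple(s) for s in product(*options)}
    return ne

# an event that is the identity on level 2 and toggles level 1:
# only level 1 is affected, so Top(e) = Bot(e) = 1
e = make_event([lambda i: {i}, lambda i: {(i + 1) % 2}])
# e((0, 0)) == {(0, 1)}
```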
2.2 Symbolic State–Space Generation: Breadth–First vs. Saturation
The traditional approach to generating the reachable states of a system is based on a breadth–first traversal, as derived from classical fixed–point theory, and applies a monolithic N (even when encoded as ∪e∈E Ne): after d iterations, the currently–known state space contains exactly all states whose distance from any state in S init is at most d. However, recent advances have shown that non–BFS, guided, or chaotic [16] exploration can result in a better iteration strategy. An example is the saturation algorithm introduced in [10], which exhaustively fires (explores) all events of Ek in an MDD node at level k, thereby bringing it to its final "saturated" form. We only briefly summarize the main characteristics of
Using Edge-Valued Decision Diagrams

[Figure 1: a 4-level MDD with S4 = {0, 1, 2, 3}, S3 = {0, 1, 2}, S2 = {0, 1}, S1 = {0, 1, 2}, encoding the set S = {0210, 1000, 1010, 1100, 1110, 1210, 2000, 2010, 2100, 2110, 2210, 3010, 3110, 3200, 3201, 3202, 3210, 3211, 3212}.]
Fig. 1. A 4-level MDD on {0,1,2,3}×{0,1,2}×{0,1}×{0,1,2} and the encoded set S.
saturation in this section, since the algorithm we present in Section 4.1 follows the same idea, except that it is applied to a richer data structure. Saturation considers the nodes in a bottom–up fashion, i.e., when a node is processed, all its descendants are already known to be saturated. There are major advantages in working with saturated nodes. A saturated node at level k encodes a fixed point with respect to the events in Ek ∪ … ∪ E1, thus it need not be visited again when considering such events. By contrast, traditional symbolic algorithms manipulate and store a large number of non–saturated nodes; these nodes cannot be present in the encoding of the final state space, thus will necessarily be deleted before reaching the fixed point and replaced by (saturated) nodes encoding a larger subspace.

Similar advantages apply to the manipulation of the auxiliary data structures used in any symbolic state–space generation algorithm, the unique table and the operation cache: only saturated nodes are inserted in them, resulting in substantial memory savings. Exploring a node exhaustively once, instead of once per iteration, also facilitates the idea of in–place updates: while traditional algorithms frequently create updated versions of a node, to avoid using stale unique table and cache entries, saturation checks in a node only when all possible updates on it have been performed.

Experimental studies [10] show that our saturation strategy performs orders of magnitude faster than previous algorithms. Even more importantly, its peak memory requirements are often very close to the final requirements, unlike traditional approaches, where the memory consumption grows rapidly until midway through the exploration, only to drop sharply in the last phases.
Our next challenge for saturation is then to apply it to other types of symbolic computation, such as the one discussed in this paper: the generation of shortest–length traces, where the use of chaotic iteration strategies would not seem applicable at first.

2.3 The Distance Function
The distance of a reachable state i ∈ S from the set of initial states S init is defined as δ(i) = min{d : i ∈ N^d(S init)}. We can naturally extend δ : S → N to all states in Ŝ by letting δ(i) = ∞ for any non–reachable state i ∈ Ŝ \ S. Alternatively, given such a function δ : Ŝ → N ∪ {∞}, we can identify S as the subset of the domain where the function is finite: S = {i ∈ Ŝ : δ(i) < ∞}.
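On a small explicit model, δ can be computed directly by iterating N breadth-first. The sketch below is a reference implementation only (the four-state chain is a hypothetical toy example; the paper of course computes δ symbolically):

```python
from math import inf

def distance_function(S_init, N, potential_states):
    # delta(i) = min{ d : i in N^d(S_init) }, and inf for unreachable states.
    delta = {i: inf for i in potential_states}
    frontier = set(S_init)
    for i in frontier:
        delta[i] = 0
    d = 0
    while frontier:
        d += 1
        reached = {j for i in frontier for j in N(i)}
        frontier = {j for j in reached if delta[j] == inf}  # newly discovered
        for j in frontier:
            delta[j] = d
    return delta

# Toy chain 0 -> 1 -> 2 -> 3, with state 4 unreachable.
delta = distance_function({0}, lambda i: {i + 1} if i < 3 else set(), range(5))
```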
G. Ciardo and R. Siminiceanu
The formulation of our problem is then: given a description of a structured discrete–state system (Ŝ, S init, N), determine the distance to all reachable states, i.e., compute and store δ : Ŝ → N ∪ {∞} (note that the reachable state space S is not an input; rather, it is implicitly an output). This can be viewed as a least fixed–point computation for the functional Φ : D → D, where D is the set of functions mapping Ŝ to N ∪ {∞}. In other words, Φ refines an approximation of the distance function from the initial δ^[0] ∈ D, defined as δ^[0](i) = 0 if i ∈ S init and δ^[0](i) = ∞ otherwise, via the iteration
δ^[m+1](i) = Φ(δ^[m])(i) = min{ δ^[m](i), min{ 1 + δ^[m](i′) : i ∈ N(i′) } }.

Note that the state–space construction is itself a fixed–point computation, so we now seek to efficiently combine the two fixed–point operations into one. Before showing our algorithm to accomplish this, in Section 3, we first describe a few approaches to compute distance information based on existing decision diagram technology.

2.4 Explicit Encoding of State Distances
Algebraic decision diagrams (ADDs) [3] are an extension of BDDs where multiple terminals are allowed (thus, they are also called MTBDDs [12]). ADDs can encode arithmetic functions from Ŝ to R ∪ {∞}. The value of the function on a specific input (representing a state in our case) is the value of the terminal node reached by following the path encoding the input. While ADDs are traditionally associated with boolean argument variables, extending the arguments to finite integer sets is straightforward.

The compactness of the ADD representation is related to the merging of nodes, exploited to a certain degree in all decision diagrams. In this case, there is a unique root, but having many terminal values can greatly reduce the degree of node merging, especially at the lower levels, with respect to the support decision diagram, i.e., the MDD that encodes S ⊆ Ŝ. In other words, the number of terminal nodes for the ADD that encodes δ : Ŝ → N ∪ {∞} equals the number of distinct values of δ (hence the "explicit" in the title of this section); if we merged all finite–valued terminals into one, thus encoding just S but not the state distances, many ADD nodes could be merged into one MDD node.

An alternative explicit encoding of state distances can be achieved by simply using a forest of MDDs. This approach is derived from the traditional ROBDD method, by extending it to multi–valued variables. Each of the distance sets N^d(S init) = {i ∈ S | δ(i) = d} (or {i ∈ S | δ(i) ≤ d}, which may require fewer nodes in some cases) can be encoded using a separate MDD. Informally, this reverses the region where most sharing of nodes occurs compared to ADDs: the roots are distinct, but they are likely to share nodes downstream. The cardinality of the range of the function is critical to the compactness of either representation: the wider the range, the less likely it is that nodes are merged.
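The contrast between the two explicit encodings can be seen on the function f of Figure 2, here held in a plain dictionary (a sketch only: the sets below stand in for the actual diagrams; an ADD needs one terminal per distinct value, a forest needs one root per finite distance):

```python
from math import inf

# f of Figure 2 over (i3, i2, i1): values 0 2 3 2 2 4 1 0 for inputs 000..111.
delta = {(0, 0, 0): 0, (0, 0, 1): 2, (0, 1, 0): 3, (0, 1, 1): 2,
         (1, 0, 0): 2, (1, 0, 1): 4, (1, 1, 0): 1, (1, 1, 1): 0}

terminals = set(delta.values())            # one ADD terminal per distinct value
forest = {d: {i for i, v in delta.items() if v == d}
          for d in terminals if d != inf}  # one MDD root per finite distance
```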
Figure 2 (a) and (b) show an example of the same distance function represented as an ADD and as a forest of MDDs, respectively.
[Figure 2: the distance function f over (i3, i2, i1) ∈ {0, 1}³, with values 0, 2, 3, 2, 2, 4, 1, 0 for inputs 000 through 111; (a) is an ADD with one terminal per value 0–4, (b) is a forest of MDDs with one root per distance, dist = 0 through dist = 4, sharing the terminal T.]
Fig. 2. Storing the distance function: an ADD (a) vs. a forest of MDDs (b).
2.5 Symbolic Encoding of State Distances
The idea of associating numerical values with the edges of regular BDDs was proposed in [20,21], resulting in a new type of decision diagram, edge–valued BDDs (EVBDDs)¹. In the following definition of EVBDDs, instead of using the original terminology and notation, we use the terminology and notation needed to introduce the new data structure presented in the next section, so that differences and similarities will be more apparent.

Definition 1. An EVBDD is a directed acyclic graph that encodes a total function f : {0, 1}^K → Z as follows:
1. There is a single terminal node, at level 0, with label 0, denoted by 0.0.
2. A non–terminal node at level k, K ≥ k ≥ 1, is denoted by k.p, where p is a unique identifier within level k, and has two children, k.p[0].child and k.p[1].child (corresponding to the two possible values of ik), which are nodes at some (not necessarily the same) level l, k > l ≥ 0.
3. The 1–edge is labelled with an integer value k.p[1].val ∈ Z, while the label k.p[0].val of the 0–edge is always (implicitly) 0.
4. There is a single root node kr.r, for some K ≥ kr ≥ 0, with no incoming edges, except for a "dangling" edge labelled with an integer value ρ ∈ Z.
5. Canonicity restrictions analogous to those of reduced ordered BDDs apply: uniqueness: if k.p[0].child = k.q[0].child, k.p[1].child = k.q[1].child, and k.p[1].val = k.q[1].val, then p = q; reducedness: there is no redundant node k.p satisfying k.p[0].child = k.p[1].child and k.p[1].val = 0.

The function encoded by an EVBDD node k.p is recursively defined by

f_{k.p}(ik, …, i1) = f_{k.p[0].child}(il, …, i1)                 if ik = 0,
f_{k.p}(ik, …, i1) = f_{k.p[1].child}(ir, …, i1) + k.p[1].val    if ik = 1,
¹ We observe that binary moment diagrams (BMDs), independently introduced in [6], also associate values with edges. For BMDs, however, evaluating the function on a particular argument requires the traversal of multiple paths, as opposed to a unique path for EVBDDs. Thus, while very effective for verifying circuits such as multipliers, BMDs are not as well suited to our approach.
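The recursive evaluation rule of Definition 1 amounts to summing the edge values along the unique path selected by the input. A minimal sketch, assuming for simplicity a quasi-reduced diagram (every edge spans exactly one level) and a hypothetical tuple-based node representation:

```python
# A node is (child0, (val1, child1)); the terminal 0.0 is None.
# An EVBDD is a pair (rho, root).
def evbdd_eval(rho, node, bits):
    # Follow the unique path for input (i_K, ..., i_1), summing edge values.
    total = rho
    for b in bits:
        child0, (val1, child1) = node
        if b:
            total += val1
            node = child1
        else:
            node = child0
    return total

# Example encoding f(i2, i1) = 2*i2 + i1:
n1 = (None, (1, None))   # adds i1
n2 = (n1, (2, n1))       # adds 2*i2, sharing the level-1 node
```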
[Figure 3: three EVBDDs encoding the function f of Figure 2, with edge values drawn in the box from which each edge departs; only (a) is normalized.]
Fig. 3. Canonical (a) and non–canonical (b),(c) EVBDDs for the same function f .
where l and r are the levels of k.p[0].child and k.p[1].child, respectively, and f_{0.0} = 0. The function encoded by an EVBDD edge, that is, by a (value, node) pair, is then simply obtained by adding the constant value to the function encoded by the node. In particular, the function encoded by the EVBDD is f = ρ + f_{kr.r}.

Note that the nodes are normalized to enforce canonicity: the value of the 0–edge is always 0. If this requirement were relaxed, there would be an infinite number of EVBDDs representing the same function, obtained by rearranging the edge values. An example of multiple ways to encode the function of Figure 2 with non–canonical EVBDDs is shown in Figure 3, where, for better readability, we show the edge value in the box from which the edge departs, except for the top dangling arc. Only the EVBDD in Figure 3(a) is normalized.

This node normalization implies that ρ = f(0, …, 0) and may require the use of both negative and positive edge values even when the encoded function is non–negative, as is the case for Figure 3(a). More importantly, if we want to represent functions such as our distance δ : Ŝ → N ∪ {∞}, we can allow edge values to be ∞; however, if δ(0, …, 0) = ∞, i.e., state (0, …, 0) is not reachable, we cannot enforce the required normalization, since it would imply that ρ is ∞, and hence that f is identically ∞ as well. This prompted us to introduce a more general normalization rule, which we present next.
3 A New Approach
We use quasi–reduced, ordered, non–negative edge–valued, multi–valued decision diagrams. To the best of our knowledge, this is the first attempt to use edge–valued decision diagrams of any type in fixed–point computations or in the generation of traces.

3.1 Definition of EV+MDDs
We extend EVBDDs in several ways. The first extension is straightforward: from binary to multi–valued variables. Then, we change the normalization of nodes to a slightly more general one needed for our task. Finally, we allow the value of an edge to be ∞, since this is required to describe our distance functions. Note that the choice to use quasi–reduced instead of reduced decision diagrams is not dictated by limitations in the descriptive power of EVBDDs, but by efficiency considerations in the saturation–based algorithm we present in Section 4.

Definition 2. Given a function f : Ŝ → Z ∪ {∞}, f ≢ ∞, an EV+MDD for f is a directed acyclic graph with labelled edges that satisfies the following properties:
1. There is a single terminal node, at level 0, with label 0, denoted by 0.0.
2. A non–terminal node at level k, K ≥ k ≥ 1, is denoted by k.p, where p is a unique identifier within the level, and has nk ≥ 2 edges to children, k.p[ik].child, labelled with values k.p[ik].val ∈ N ∪ {∞}, for 0 ≤ ik < nk.
3. If k.p[ik].val = ∞, the value of k.p[ik].child is irrelevant, so we simply require it to be 0 for canonicity; otherwise, k.p[ik].child is the index of a node at level k − 1.
4. There is a single root node, K.r, with no incoming edges, except for a "dangling" incoming edge labelled with an integer value ρ ∈ Z.
5. Each non–terminal node has at least one outgoing edge labelled with 0.
6. All nodes are unique, i.e., if, for all ik with 0 ≤ ik < nk, k.p[ik].child = k.q[ik].child and k.p[ik].val = k.q[ik].val, then p = q.

[Figure 4: (a) an EV+MDD encoding the total function f of Figures 2 and 3; (b) an EV+MDD encoding a partial function with values 0, 2, 3, ∞, ∞, 4, 1, 0.]
Fig. 4. Storing total (a) and partial (b) arithmetic functions with EV+MDDs.

Figure 4 shows two EV+MDDs storing a total and a partial² function, respectively (the total function encoded is that of Figures 2 and 3). Note that, unlike the normalization for EVBDDs, our normalization requires that the labels on (non–dangling) edges be non–negative, and that at least one label per node be zero, but not in a pre–determined location; compare the EVBDD of Figure 3(a) with the equivalent EV+MDD of Figure 4(a).

² By "partial", we mean that some of its values can be ∞; whenever this is the case, we omit the corresponding value and edge from the graphical representation.

The function encoded by the EV+MDD node k.p is

f_{k.p}(ik, …, i1) = k.p[ik].val + f_{(k−1).(k.p[ik].child)}(ik−1, …, i1)
and we let f_{0.0} = 0. As for EVBDDs, the function encoded by the EV+MDD (ρ, K.r) is f = ρ + f_{K.r}. However, now ρ = min{f(i) : i ∈ S_K × · · · × S_1}. In our application we encode distances, which are non–negative, thus ρ = 0. If we wanted to cope with the degenerate case S init = ∅, so that f is identically ∞, we could allow a special EV+MDD with ρ = ∞ and root 0.0.
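As a sketch, the evaluation rule just given can be coded with a hypothetical tuple-based node representation, building the partial function of Figure 4(b) by hand (node names a, b, c, d, p, q are illustrative only):

```python
from math import inf

TERM = ()  # the terminal node 0.0

def evmdd_eval(rho, node, states):
    # f(i_K, ..., i_1) = rho + sum of the edge values along the selected path.
    total = rho
    for i in states:
        val, node = node[i]
        if val == inf:
            return inf
        total += val
    return total

# Partial function with values 0, 2, 3, inf, inf, 4, 1, 0 on inputs 000..111.
a = ((0, TERM), (2, TERM))
b = ((0, TERM), (inf, TERM))
c = ((inf, TERM), (0, TERM))
d = ((1, TERM), (0, TERM))
p = ((0, a), (3, b))     # the residual value 3 sits on the edge to b
q = ((4, c), (0, d))
root = ((0, p), (0, q))  # every node has at least one 0-valued edge
```

Note how each node keeps at least one 0-valued edge, with the remainder of the value pushed up onto the incoming edge, as required by property 5.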
3.2 Canonicity of EV+MDDs
Lemma 1. From every non–terminal EV+MDD node, there is an outgoing path, with all edges labelled 0, reaching 0.0.

Corollary 1. The function f_{k.p} encoded by a node k.p is non–negative and min(f_{k.p}) = 0.

Definition 3. The graphs rooted at two EV+MDD nodes k.p and k.q are isomorphic if there is a bijection b from the nodes of the first graph to the nodes of the second graph such that, for each node l.s of the first graph and each il ∈ Sl (with k ≥ l ≥ 1): b(l.s)[il].child = b(l.s[il].child) and b(l.s)[il].val = l.s[il].val.

Theorem 1. (Canonicity) If two EV+MDDs (ρ1, K.r1) and (ρ2, K.r2) encode the same function f : Ŝ → N ∪ {∞}, then ρ1 = ρ2 and the two labelled graphs rooted at K.r1 and K.r2 are isomorphic.

Proof. It is easy to see that, since the value on the dangling edge of each of the two EV+MDDs equals the minimum value ρ the encoded function f can assume, we must have ρ1 = ρ2 = ρ, and the two nodes K.r1 and K.r2 must encode the same function f − ρ. We then need to prove by induction that, if two generic EV+MDD nodes k.p and k.q encode the same function, the labelled graphs rooted at them are isomorphic.

Basis (k = 1): if 1.p and 1.q encode the same function f : S1 → N ∪ {∞}, then 1.p[i1].child = 1.q[i1].child = 0 and 1.p[i1].val = 1.q[i1].val = f(i1) for all i1 ∈ S1, thus the two labelled graphs rooted at 1.p and 1.q are isomorphic.

Inductive step (assume the claim true for k − 1): if k.p and k.q encode the same function f : Sk × · · · × S1 → N ∪ {∞}, consider the function obtained when we fix ik to a particular value t, i.e., f_{ik=t}. Let g and h be the functions encoded by k.p[t].child and k.q[t].child, respectively; also, let k.p[t].val = α and k.q[t].val = β, and observe that the functions α + g and β + h must coincide with f_{ik=t}. However, because of Corollary 1, we know that both g and h evaluate to 0, their minimum possible value, for at least one choice of the arguments (ik−1, …, i1). Thus, the minimum values α + g and β + h can attain are α and β, respectively; since α + g and β + h are the same function, they must have the same minimum, hence α = β. This implies that g = h and, by the inductive hypothesis, that the graphs rooted at k.p[t].child and k.q[t].child are isomorphic. Since this argument applies to a generic child t, the two nodes k.p and k.q are then themselves isomorphic, completing the proof. ✷
UnionMin(k : level, (α, p) : edge, (β, q) : edge) : edge
 1. if α = ∞ then return (β, q);
 2. if β = ∞ then return (α, p);
 3. if k = 0 then return (min(α, β), 0);            • the only node at level k = 0 has index 0
 4. if UCacheFind(k, p, q, α−β, (γ, u)) then        • match (k, p, q, α−β), return (γ, u)
 5.   return (γ + min(α, β), u);
 6. u ← NewNode(k);                                 • create new node at level k with edges set to (∞, 0)
 7. µ ← min(α, β);
 8. for ik = 0 to nk − 1 do
 9.   p′ ← k.p[ik].child; α′ ← α − µ + k.p[ik].val;
10.   q′ ← k.q[ik].child; β′ ← β − µ + k.q[ik].val;
11.   k.u[ik] ← UnionMin(k−1, (α′, p′), (β′, q′));  • continue downstream
12. CheckInUniqueTable(k, u);
13. UCacheInsert(k, p, q, α − β, (µ, u));
14. return (µ, u);

Fig. 5. The UnionMin algorithm for EV+MDDs.
4 Operations with EV+MDDs
We are now ready to discuss manipulation algorithms for EV+MDDs. We do so in the context of our state–space and distance generation problem, although, of course, the UnionMin function we introduce in Figure 5 has general applicability. The types and variables used in the pseudo–code of Figures 5 and 7 are event (model event, e), level (EV+MDD level, k), index (node index within a level, p, q, p′, q′, s, u, f), value (edge value, α, β, α′, β′, µ, γ, φ), local (local state index, ik, jk), and localset (set of local states for one level, L). In addition, we let edge denote the pair (value, index), i.e., the type of k.p[i]; note that only index is needed to identify a child, since the level itself is known: k − 1.

The UnionMin algorithm computes the minimum of two partial functions (which must be defined over the same potential state space Ŝ). It acts like a dual operator, by performing the union on the support sets of states of the two operands and by finding the minimum value for the common elements. The algorithm starts at the roots of the two operand EV+MDDs and recursively descends along matching edges. If at some point one of the edges has value ∞, the recursion stops and returns the other edge (since ∞ is the neutral value with respect to the minimum); if the other edge has value ∞ as well, the returned value is (∞, 0), i.e., no states are added to the union; otherwise, if the other edge has a finite value, we have just found states reachable in one set but not in the other. If the recursion reaches instead all the way to the terminal node 0.0, the returned value is the minimum of the two input values α and β.

If both α and β are finite and p and q are non–terminal, UnionMin "keeps" the minimum value µ on the incoming arcs to the operands, and "pushes down" any residual value α − µ, if µ = β < α, or β − µ, if µ = α < β, on the children of
[Figure 6: the operand EV+MDDs encoding f and g over {0, 1, 2} × {0, 1} × {0, 1} and the result EV+MDD encoding h = min(f, g), together with a table listing the values of f, g, and h for every state.]
Fig. 6. An example of the UnionMin operator for EV+MDDs.
p or q, respectively, in its recursive downstream calls. In this case, the returned edge (µ, u) is such that µ + f_{k.u} = min(α + f_{k.p}, β + f_{k.q}).

An example of the application of the UnionMin algorithm is illustrated in Figure 6. The potential state space is S3 × S2 × S1 = {0, 1, 2} × {0, 1} × {0, 1}. The functions encoded by the operands, f and g, are listed in the table to the left, along with the result function h = min(f, g).

Lemma 2. The call UnionMin(k, (α, p), (β, q)) returns an edge (µ, u) such that µ = min(α, β) and k.u and its descendants satisfy property 5 of Definition 2, if k.p and k.q do.

Proof. It is immediate to see that µ = min(α, β). To prove that k.u satisfies property 5, we use induction: if k = 0, there is nothing to prove, since property 5 applies to non–terminal nodes only. Assume now that the lemma is true for all calls at level k − 1 and consider an arbitrary call UnionMin(k, (α, p), (β, q)), where the input nodes k.p and k.q satisfy property 5. If α or β is ∞, the returned node is one of the input nodes, so it satisfies property 5. Otherwise, since µ = min(α, β), at least one of α − µ and β − µ is 0; say α − µ = 0. The values labelling the edges of k.u are computed in line 11 of UnionMin. Since k.p satisfies property 5, there exists ik ∈ {0, …, nk − 1} such that k.p[ik].val = 0. Then, for the corresponding iteration of the for–loop, α′ is 0 and the edge returned by UnionMin(k − 1, (α′, p′), (β′, q′)) is (min(α′, β′), u′) = (0, u′), where (k − 1).u′ satisfies property 5 by induction; thus, k.u[ik].val is set to 0. ✷

We conclude the discussion of UnionMin by observing that the hash–key for the entries in our "union/min cache" is formed by the two nodes (passed as level, index, index, since the nodes are at the same level) plus the difference α − β of the values labelling the two edges pointing to these nodes.
This is better than using the key (k, p, q, α, β), which would unnecessarily clutter the cache with entries of the form (k, p, q, α + τ, β + τ, (µ + τ, u)), for all the values of τ arising in a particular execution.
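The pseudo-code of Figure 5, including this cache-key choice, can be transliterated almost directly. The sketch below uses hashable tuples as nodes (so the unique table is implicit in tuple equality) and a dictionary keyed by (k, p, q, α − β) as the union/min cache; since the value returned is always min(α, β), the cache only needs to store the result node:

```python
from math import inf

TERM = ()  # the terminal node at level 0

def union_min(k, a, b, cache):
    # Returns (mu, u) such that mu + f_u = min(alpha + f_p, beta + f_q).
    (alpha, p), (beta, q) = a, b
    if alpha == inf:
        return (beta, q)
    if beta == inf:
        return (alpha, p)
    if k == 0:
        return (min(alpha, beta), TERM)
    mu = min(alpha, beta)
    key = (k, p, q, alpha - beta)  # the result node depends only on these
    if key in cache:
        return (mu, cache[key])
    # Push down the residuals alpha - mu and beta - mu on the children.
    u = tuple(union_min(k - 1, (alpha - mu + av, ac), (beta - mu + bv, bc), cache)
              for (av, ac), (bv, bc) in zip(p, q))
    cache[key] = u
    return (mu, u)

def evaluate(rho, node, states):
    # Evaluate an EV+MDD edge (rho, node) on a global state.
    total = rho
    for i in states:
        val, node = node[i]
        if val == inf:
            return inf
        total += val
    return total

# Two-level operands over {0,1} x {0,1} (hypothetical example):
# f: f(0,0)=0, f(0,1)=2, f(1,0)=inf, f(1,1)=1
p = ((0, ((0, TERM), (2, TERM))), (1, ((inf, TERM), (0, TERM))))
# g: g(0,0)=1, g(0,1)=0, g(1,0)=0, g(1,1)=inf
q = ((0, ((1, TERM), (0, TERM))), (0, ((0, TERM), (inf, TERM))))
mu, h = union_min(2, (0, p), (0, q), {})
```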
4.1 State-Space and Distance Generation Using EV+MDDs
Our fixed–point algorithm to build and store the distance function δ, and implicitly the state space S, is described by the pseudo–code for BuildDistance, Saturate, and RecursiveFire, shown in Figure 7. Given a model (Ŝ, S init, N), we follow these steps:

1. Encode S init into an initial EV+MDD node K.r. This can be done by building the MDD for S init, then setting to 0 all edge values for edges going to true (called 1 in the MDD terminology of [10]), setting the remaining edge values to ∞, eliminating the terminal node false, and renaming the terminal node true as 0 (in EV+MDD terminology). See [10] on how to build an MDD when S init contains a single state. In general, the MDD encoding of S init will be derived from some other symbolic computation, e.g., it will already be available as the result of a temporal logic query.
2. Call BuildDistance(K, r).

Functions CheckInUniqueTable, LocalsToExplore, UCacheFind, FCacheFind, UCacheInsert, FCacheInsert, PickAndRemoveElementFromSet, and CreateNode have the intuitive semantics associated with their names (see also the comments in the pseudo–code). Normalize(k, s) puts node k.s in canonical form by computing µ = min{k.s[ik].val : ik ∈ Sk} and subtracting µ from each k.s[ik].val (so that at least one of them becomes 0), then returns µ; in particular, if all edge values in k.s are ∞, it returns ∞ (this is the case in Statement 17 of RecursiveFire if the while–loop did not manage to fire e from any of the local states in L).

The hash–key for the firing cache does not use the value α on the incoming edge, because the node k.s corresponding to the result (γ, s) of RecursiveFire is independent of this quantity. The edge value returned by RecursiveFire depends instead on α: it is simply obtained by adding the result of Normalize(k, s) to α. RecursiveFire may push excess values upwards when normalizing a node in line 17, that is, residual values are moved in the opposite direction than in UnionMin.
However, the normalization procedure is called only once per node (when the node has been saturated), therefore excess values are not bounced back and forth repeatedly along edges.
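The Normalize step described above can be sketched as follows (a hypothetical list-of-edges node representation; the update happens in place, mirroring the in-place philosophy of saturation):

```python
from math import inf

def normalize(edges):
    # edges: list of [value, child] pairs of a node k.s.
    # Subtract the minimum edge value mu from every finite edge, leaving at
    # least one 0-valued edge, and return mu as the excess pushed upwards;
    # if all edges are inf (the event could not be fired), return inf.
    mu = min(v for v, _ in edges)
    if mu == inf:
        return inf
    for e in edges:
        if e[0] != inf:
            e[0] -= mu
    return mu

node = [[3, "a"], [5, "b"], [inf, 0]]  # "a", "b" stand in for child indices
mu = normalize(node)
```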
4.2 Trace Generation Using EV+MDDs
Once the EV+MDD (ρ, K.r) encoding δ and S is built, a shortest–length trace from any of the states in S init to one of the states in a set X (given as input in the form of an MDD) can be obtained by backtracking. For simplicity, the following algorithm does not output the identity of the events along the trace, but this option could easily be added, if desired:

1. Transform the MDD for X into an EV+MDD (ρx, K.x) encoding X and δx, using the approach previously described for S init, where δx(i) = 0 if i ∈ X and δx(i) = ∞ if i ∈ Ŝ \ X.
BuildDistance(k : level, p : index)
 1. if k > 0 then
 2.   for ik = 0 to nk − 1 do
 3.     if k.p[ik].val < ∞ then BuildDistance(k − 1, k.p[ik].child);
 4.   Saturate(k, p);

Saturate(k : level, p : index)
 1. repeat
 2.   pChanged ← false;
 3.   foreach e ∈ Ek do
 4.     L ← LocalsToExplore(e, k, p);                 • {ik : Ne,k(ik) ≠ ∅ ∧ k.p[ik].val ≠ ∞}
 5.     while L ≠ ∅ do
 6.       ik ← PickAndRemoveElementFromSet(L);
 7.       (α, f) ← RecursiveFire(e, k−1, k.p[ik]);
 8.       if α ≠ ∞ then
 9.         foreach jk ∈ Ne,k(ik) do
10.           (β, u) ← UnionMin(k−1, (α + 1, f), k.p[jk]);
11.           if (β, u) ≠ k.p[jk] then
12.             k.p[jk] ← (β, u);
13.             pChanged ← true;
14.             if Ne,k(jk) ≠ ∅ then L ← L ∪ {jk};    • remember to explore jk later
15. until pChanged = false;
16. CheckInUniqueTable(k, p);

RecursiveFire(e : event, k : level, (α, q) : edge) : edge
 1. if k < Bot(e) then return (α, q);                 • level k is not affected by event e
 2. if FCacheFind(k, q, e, (γ, s)) then               • match (k, q, e), return (γ, s)
 3.   return (γ + α, s);
 4. s ← NewNode(k);                                   • create new node at level k with edges set to (∞, 0)
 5. sChanged ← false;
 6. L ← LocalsToExplore(e, k, q);                     • {ik : Ne,k(ik) ≠ ∅ ∧ k.q[ik].val ≠ ∞}
 7. while L ≠ ∅ do
 8.   ik ← PickAndRemoveElementFromSet(L);
 9.   (φ, f) ← RecursiveFire(e, k−1, k.q[ik]);
10.   if φ ≠ ∞ then
11.     foreach jk ∈ Ne,k(ik) do
12.       (β, u) ← UnionMin(k−1, (φ, f), k.s[jk]);
13.       if (β, u) ≠ k.s[jk] then
14.         k.s[jk] ← (β, u);
15.         sChanged ← true;
16. if sChanged then Saturate(k, s);
17. γ ← Normalize(k, s);
18. s ← CheckInUniqueTable(k, s);
19. FCacheInsert(k, q, e, (γ, s));
20. return (γ + α, s);

Fig. 7. BuildDistance, our saturation–based algorithm using EV+MDDs.
2. Compute IntersectionMax(K, (ρ, r), (ρx, x)), which is the dual of UnionMin and whose pseudo–code is exactly analogous; let (µ, K.m) be the resulting EV+MDD, which encodes X ∩ S and the restriction of δ to this set (µ is then the length of one of the shortest paths we are seeking).
3. Extract from (µ, K.m) a state j^[µ] = (jK^[µ], …, j1^[µ]) encoded by a path from K.m to 0.0 labelled with 0 values (j^[µ] is a state in X at the desired minimum distance µ from S init). The algorithm proceeds now with an explicit flavor.
4. Initialize ν to µ and iterate:
a) Find all states i ∈ Ŝ such that j^[ν] ∈ N(i). With our boolean Kronecker encoding of N, this "one step backward" is easily performed: we simply have to use the transposes of the matrices Ne,k.
b) For each such state i, compute δ(i) using (ρ, K.r) and stop on the first i* such that δ(i*) = ν − 1 (there exists at least one such state).
c) Decrement ν.
d) Let j^[ν] be i*.
5. Output j^[0], …, j^[µ].

The cost of obtaining j^[µ] as the result of the IntersectionMax operation is O(#K.r · #K.x), where # indicates the number of EV+MDD nodes. The complexity of the rest of the algorithm is then simply O(µ · M · K), where M is the maximum number of incoming arcs to any state in the reachability graph of the model, i.e., M = max{|N^−1(j)| : j ∈ S}, and K comes from traversing one path in the EV+MDD. In practice M is small but, if this were not the case, the set N^−1(j^[ν]) could be computed symbolically at each iteration instead.

Generating the same trace using traditional symbolic approaches could follow a similar idea. If we used ADDs, we would start with an ADD encoding the same information as the EV+MDD (ρx, K.x), compute the ADD equivalent to the EV+MDD (µ, K.m) using a breadth–first approach, and pick as j^[µ] any state leading to a terminal with minimal value µ. If we used a forest of MDDs, we would compute µ = min{d : N^d(S init) ∩ X ≠ ∅}, and pick as j^[µ] any state in N^µ(S init) ∩ X.
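The backtracking loop of step 4 is a simple walk backwards through decreasing distance values. An explicit-state sketch (δ held in a dictionary and predecessors computed from a hypothetical explicit N; the paper instead evaluates δ on the EV+MDD and obtains predecessors from the transposed Kronecker matrices):

```python
def shortest_trace(delta, predecessors, target):
    # Walk back from a state at distance mu to an initial state (distance 0),
    # picking at each step any predecessor that is one step closer to S_init.
    nu = delta[target]
    trace = [target]
    while nu > 0:
        j = trace[0]
        i_star = next(i for i in predecessors(j) if delta[i] == nu - 1)
        trace.insert(0, i_star)
        nu -= 1
    return trace

# Toy chain 0 -> 1 -> 2 -> 3, with delta as computed by breadth-first search.
delta = {0: 0, 1: 1, 2: 2, 3: 3}
trace = shortest_trace(delta, lambda j: {j - 1} if j > 0 else set(), 3)
```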
Then, the backtracking would proceed in exactly the same way. In either case, however, we are discovering states symbolically in breadth–first order, thus we could choose to perform an intersection with X after finding each additional set of states N^d(S init), and stop as soon as N^d(S init) ∩ X ≠ ∅. Overall, we would then have explored only {i : δ(i) ≤ µ}, which might be a strict subset of the entire state space S. However, two observations are in order. First, while this "optimization" manages fewer states, it may well require many more nodes in the symbolic representation: decision diagrams are quite counter–intuitive in this respect. Second, in many verification applications, the states in X satisfy some property, e.g., "being a deadlock", and they can only be reached in some obscure and tortuous way, so that the minimum distance µ to any state in X is in practice close, if not equal, to the maximum distance to any of the states in S. The advantage of our approach is that, while it must explore the entire S, it can do so using the idea of saturation, thus the resulting decision diagrams are
Table 1. Comparison of the five approaches ("—" means "out of memory").

[Table 1: for each model (dining philosophers, with D = 2N and K = N/2; Kanban system, with D = 14N and K = 4; flexible manufacturing system, with D = 14N and K = 19; round–robin mutex protocol, with D = 8N − 6 and K = N + 1) and for increasing values of N and |S|, the table reports the runtime in seconds and the final and peak node counts for Es, Eb, Mb, As, and Ab; only Es completes the largest instances, e.g., the dining philosophers with N = 1000 and |S| = 9.2·10^626.]
built much more efficiently and require much less memory than with breadth–first approaches. The following section confirms this, focusing on the first and expensive phase of trace generation, the computation of the distance information, since the backtracking phase has negligible cost in comparison and is in any case essentially required by any approach.
5 Results
To stress the importance of using a saturation–based approach, we compare the three types of encodings for the distance function we have discussed, EV+MDDs, forests of MDDs, and ADDs, in conjunction with two iteration strategies, based on breadth–first exploration and on saturation, respectively (see Table 1). Since only breadth–first is applicable in the case of forests of MDDs, this leads to five cases: EV+MDDs with saturation (Es), EV+MDDs with breadth–first (Eb), forests of MDDs with breadth–first (Mb), ADDs with saturation (As), and ADDs with breadth–first (Ab). Note that only Mb and Ab have been used in the literature before, while Es and Eb use our new data structure and As (which we cannot
Using Edge-Valued Decision Diagrams
271
discuss in detail for lack of space) applies the idea of saturation to ADDs, so it too is a new approach. We implemented the five algorithms (their MDD, not BDD, versions) in our tool SMART [8] and used them to generate the distance function for the entire state space. The suite of examples is chosen from the same benchmark we used in [10]; each model is scalable by a parameter N. All experiments were run on an 800 MHz Pentium III workstation with 1 GB of memory. For each model, we list the maximum distance D, the number K of levels in the decision diagram, and the sizes of the local state spaces. For each experiment we list the maximum distance to a reachable state, which is also the number of iterations in the breadth–first approaches, the runtime, and the number of nodes (both final and peak). In terms of runtime, there is a clear order: Es < Eb < Mb < As < Ab, with Es easily managing much larger systems. The ordering Es, Eb < Mb < As, Ab attests to the effectiveness of the data structures, while Es < Eb and As < Ab attest to the improvements obtainable with saturation–based iteration. With EV+MDDs, in particular with Es, we can scale the models up to huge parameters; the other two data structures do not scale nearly as well and run out of memory. In terms of memory consumption: Es < As < Eb ≈ Mb < Ab for the peak number of nodes, while Es = Eb < As = Ab ≈ Mb for the final number of nodes. The key observation is that Es substantially outperforms all other methods: compared to Ab, it is over 1,000 times faster and uses about 1,000 times fewer peak nodes.
6
Conclusion
We introduced EV+MDDs, a new canonical variation of EVBDDs, which can be used to store the state space of a model and the distance of every state from the initial set of states within a single decision diagram. A key contribution is that we extend the saturation approach we previously introduced for state–space generation alone, and apply it to this data structure, resulting in a very fast and memory–efficient algorithm for joint state–space and distance generation. One conclusion of our research is a clear confirmation of the effectiveness of saturation as opposed to a traditional breadth–first iteration, not just in conjunction with our EV+MDDs, but even with ADDs. A second, orthogonal conclusion is that edge–valued decision diagrams in general are much better suited than ADDs to the task at hand: they implicitly encode the possible distance values, while ADDs have an explicit terminal node for each possible value, which greatly reduces the degree of node merging in the diagram. Future work along these research lines includes exploring smarter cache management policies that exploit properties of the involved operators (e.g., additivity), extending the idea to the EU and EG operators (probably a major challenge), comparing the performance of our method with that of non–BDD–based techniques (such as SAT solvers [4]), and investigating other fields of application for EV+MDDs.
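The contrast drawn above between edge–valued and terminal–valued encodings can be illustrated with a toy computation (a sketch of the general idea only, not the SMART implementation): an ADD-style encoding needs one distinct terminal per distance value, while an edge–valued encoding uses a single terminal and carries additive offsets on the edges, so identical substructures can be shared regardless of the values they contribute.

```python
# Toy illustration: encoding the distance function d(x1, x2) = 2*x1 + x2
# over two Boolean variables.

# ADD-style: a decision structure whose leaves are the explicit distance
# values.  Every distinct value forces a distinct terminal node.
add = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}

# Edge-valued style: one shared terminal; each edge carries an additive
# offset, and the value of an assignment is the sum of offsets along its
# path through the levels.
ev_offsets = [{0: 0, 1: 2},   # level 1: choosing x1 = 1 adds 2
              {0: 0, 1: 1}]   # level 2: choosing x2 = 1 adds 1

def ev_eval(assignment):
    """Sum the edge offsets along the path selected by the assignment."""
    return sum(level[bit] for level, bit in zip(ev_offsets, assignment))

# Both encodings represent the same function:
for x1 in (0, 1):
    for x2 in (0, 1):
        assert ev_eval((x1, x2)) == add[(x1, x2)]

# The ADD must materialize four distinct terminals ...
print(sorted(set(add.values())))   # [0, 1, 2, 3]
# ... while the edge-valued encoding needs only per-level offsets.
print([sorted(set(lvl.values())) for lvl in ev_offsets])
```

The point of the sketch is structural: adding a constant to every distance changes every ADD terminal, but only one edge offset in the edge–valued encoding, which is why node merging degrades far less as the range of distance values grows.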
272
G. Ciardo and R. Siminiceanu
References
1. P. A. Abdulla, P. Bjesse, and N. Eén. Symbolic reachability analysis based on SAT-solvers. In S. Graf and M. Schwartzbach, editors, Proc. Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Berlin, Germany, volume 1785 of LNCS, pages 411–425. Springer-Verlag, 2000.
2. V. Amoia, G. De Micheli, and M. Santomauro. Computer-oriented formulation of transition-rate matrices via Kronecker algebra. IEEE Trans. Rel., 30:123–132, June 1981.
3. R. I. Bahar, E. A. Frohm, C. M. Gaona, G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Algebraic decision diagrams and their applications. Formal Methods in System Design, 10(2/3):171–206, Apr. 1997.
4. A. Biere, A. Cimatti, E. Clarke, and Y. Zhu. Symbolic model checking without BDDs. LNCS, 1579:193–207, 1999.
5. R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comp., 35(8):677–691, Aug. 1986.
6. R. E. Bryant and Y.-A. Chen. Verification of arithmetic circuits with binary moment diagrams. In Proc. Design Automation Conf. (DAC), pages 535–541, 1995.
7. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. In Proc. 5th Annual IEEE Symp. on Logic in Computer Science, pages 428–439, Philadelphia, PA, 4–7 June 1990. IEEE Comp. Soc. Press.
8. G. Ciardo, R. L. Jones, A. S. Miner, and R. Siminiceanu. SMART: Stochastic Model Analyzer for Reliability and Timing. In P. Kemper, editor, Tools of Aachen 2001 Int. Multiconference on Measurement, Modelling and Evaluation of Computer-Communication Systems, pages 29–34, Aachen, Germany, Sept. 2001.
9. G. Ciardo, G. Luettgen, and R. Siminiceanu. Efficient symbolic state-space construction for asynchronous systems. In M. Nielsen and D. Simpson, editors, Application and Theory of Petri Nets 2000 (Proc. 21st Int. Conf. on Applications and Theory of Petri Nets, Aarhus, Denmark), LNCS 1825, pages 103–122. Springer-Verlag, June 2000.
10. G. Ciardo, G. Luettgen, and R. Siminiceanu. Saturation: An efficient iteration strategy for symbolic state space generation. In T. Margaria and W. Yi, editors, Proc. Tools and Algorithms for the Construction and Analysis of Systems (TACAS), LNCS 2031, pages 328–342, Genova, Italy, Apr. 2001. Springer-Verlag.
11. E. Clarke, E. Emerson, and A. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans. Progr. Lang. and Syst., 8(2):244–263, Apr. 1986.
12. E. Clarke and X. Zhao. Word level symbolic model checking: A new approach for verifying arithmetic circuits. Technical Report CS-95-161, Carnegie Mellon University, School of Computer Science, May 1995.
13. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
14. R. Drechsler and B. Becker. Overview of decision diagrams. IEE Proc.-Comput. Digit. Tech., 144(3):187–193, May 1997.
15. E. M. Clarke, O. Grumberg, K. L. McMillan, and X. Zhao. Efficient generation of counterexamples and witnesses in symbolic model checking. In 32nd Design Automation Conference (DAC 95), pages 427–432, San Francisco, CA, USA, 1995.
16. A. Geser, J. Knoop, G. Lüttgen, B. Steffen, and O. Rüthing. Chaotic fixed point iterations. Technical Report MIP-9403, Univ. of Passau, 1994.
17. R. Hojati, R. K. Brayton, and R. P. Kurshan. BDD-based debugging of designs using language containment and fair CTL. In C. Courcoubetis, editor, Computer Aided Verification (CAV’93), volume 697 of LNCS, pages 41–58, Elounda, Greece, June/July 1993. Springer-Verlag.
18. J. R. Burch, E. M. Clarke, and D. E. Long. Symbolic model checking with partitioned transition relations. In A. Halaas and P. B. Denyer, editors, Int. Conference on Very Large Scale Integration, pages 49–58, Edinburgh, Scotland, Aug. 1991. IFIP Transactions, North-Holland.
19. T. Kam, T. Villa, R. Brayton, and A. Sangiovanni-Vincentelli. Multi-valued decision diagrams: theory and applications. Multiple-Valued Logic, 4(1–2):9–62, 1998.
20. Y.-T. Lai, M. Pedram, and B. K. Vrudhula. Formal verification using edge-valued binary decision diagrams. IEEE Trans. Comp., 45:247–255, 1996.
21. Y.-T. Lai and S. Sastry. Edge-valued binary decision diagrams for multi-level hierarchical verification. In Proceedings of the 29th Conference on Design Automation, pages 608–613, Los Alamitos, CA, USA, June 1992. IEEE Computer Society Press.
22. A. S. Miner and G. Ciardo. Efficient reachability set generation and storage using decision diagrams. In H. Kleijn and S. Donatelli, editors, Application and Theory of Petri Nets 1999 (Proc. 20th Int. Conf. on Applications and Theory of Petri Nets, Williamsburg, VA, USA), LNCS 1639, pages 6–25. Springer-Verlag, June 1999.
23. T. Murata. Petri nets: properties, analysis and applications. Proc. of the IEEE, 77(4):541–579, Apr. 1989.
24. P. F. Williams, A. Biere, E. M. Clarke, and A. Gupta. Combining decision diagrams and SAT procedures for efficient symbolic model checking. In Proceedings of CAV’00, pages 124–138, 2000.
Mechanical Verification of a Square Root Algorithm Using Taylor’s Theorem

Jun Sawada (IBM Austin Research Laboratory, Austin, TX 78759, [email protected])
Ruben Gamboa (Department of Computer Science, University of Wyoming, Laramie, WY 82071, [email protected])

Abstract. The IBM Power4™ processor uses Chebyshev polynomials to calculate square root. We formally verified the correctness of this algorithm using the ACL2(r) theorem prover. The proof requires an analysis of the approximation error of the Chebyshev polynomial, which is carried out by proving Taylor’s theorem and then analyzing the Chebyshev polynomial using Taylor polynomials. Taylor’s theorem is proven by way of non-standard analysis, as implemented in ACL2(r). Since a Taylor polynomial is less accurate than a Chebyshev polynomial of the same degree, we used hundreds of Taylor polynomials generated by ACL2(r) to evaluate the error of the Chebyshev polynomial.
1
Introduction
We discuss the formal verification of a floating-point square root algorithm used in the IBM Power4™ processor. The same algorithm was first presented and proven, though not formally, by Agarwal et al. in [2]. The obvious drawback of a hand proof is that it does not provide absolute assurance of correctness; formal verification gives a higher level of confidence by mechanically checking every detail of the algorithm. The formal verification of square root algorithms used in industrial processors has been studied in the past. Russinoff used the ACL2 theorem prover [12] to verify the microcode of the K5 microprocessor [18]; later he also verified the square root algorithm in the K7 microprocessor [19]. Aagaard et al. [1] verified the square root algorithm used in an Intel processor with the Forte system [15], which combines symbolic trajectory evaluation and theorem proving. The square root algorithms mentioned above use the Newton-Raphson algorithm or one of its variants. This algorithm starts with an initial estimate and iteratively calculates a better estimate from the previous one. The formula to obtain the new estimate is relatively simple, and a few iterations suffice to obtain an estimate that is accurate enough; this estimate is then rounded to the final answer according to a specified rounding mode. In Newton-Raphson’s algorithm, many
M.D. Aagaard and J.W. O’Leary (Eds.): FMCAD 2002, LNCS 2517, pp. 274–291, 2002. © Springer-Verlag Berlin Heidelberg 2002
instructions are dependent on earlier instructions, so the algorithm may require more execution cycles on a processor with many pipeline stages and high latency. The IBM Power4 processor and its predecessor, the Power3™ processor, use a different iteration algorithm: from the initial approximation, they obtain a better approximation using a Chebyshev polynomial. Polynomial calculation needs more instructions than a single iteration of the Newton-Raphson algorithm, but only a single iteration is sufficient to obtain the necessary precision. Since instructions in the polynomial calculation are less dependent on earlier instructions than those in the Newton-Raphson algorithm, more instructions can be executed in parallel by a pipelined floating-point unit. We verify that this algorithm returns a final estimate accurate enough to guarantee that it is rounded to the correct answer. The verification was carried out with the ACL2(r) theorem prover [4]. ACL2(r) is an extension of the ACL2 theorem prover that reasons about real numbers using non-standard analysis [17]. The verification of the square root algorithm took place in three steps:
S1 Prove Taylor’s theorem.
S2 Bound the error of a Chebyshev polynomial using the result from S1.
S3 Prove the algorithm correct using the result from S2.
One challenge in the formal verification of this algorithm is the error analysis of the Chebyshev polynomial approximating the square root function. Our approach uses Taylor polynomials to measure the error of a Chebyshev polynomial. However, a Chebyshev polynomial gives a better approximation than a Taylor polynomial of the same degree, so this cannot be done in a straightforward fashion. Certainly, we could use a high-degree Taylor polynomial to obtain better precision, as was done by Harrison [8,9] in his analysis of exponential and trigonometric functions.
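The data-dependence point about Newton-Raphson can be seen in a minimal sketch (illustrative only; the actual K5/K7/Power4 microcode and rounding logic differ). Each step refines an estimate y of 1/√x via y ← y(3 − x·y²)/2, and every step consumes the result of the previous one, so the steps cannot overlap in a pipeline:

```python
import math

def newton_rsqrt(x, y0, iterations=4):
    """Refine an initial estimate y0 of 1/sqrt(x) by Newton-Raphson.

    Each iteration reads the previous estimate, so the steps are
    serially dependent -- the pipeline hazard discussed in the text.
    """
    y = y0
    for _ in range(iterations):
        y = y * (3.0 - x * y * y) / 2.0  # quadratically convergent step
    return y

# sqrt(x) is recovered as x * (1/sqrt(x)); y0 would come from a lookup table.
x = 2.0
estimate = x * newton_rsqrt(x, y0=0.7)
assert abs(estimate - math.sqrt(2.0)) < 1e-12
```

By contrast, evaluating a single Chebyshev polynomial exposes many mutually independent multiply-add operations, which is why one polynomial evaluation pipelines better than several dependent Newton-Raphson steps.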
In order to measure the error of a polynomial p(x) approximating a function f(x), Harrison used a high-degree Taylor polynomial t(x) that approximates f(x) far better than p(x) does. The upper bound of |t(x) − p(x)| can be obtained by calculating its value at the points where the derivatives of the polynomials satisfy t′(x) − p′(x) = 0. However, calculating all the roots of this equation is a major bottleneck in automating the proof. Our approach instead generates hundreds of Taylor polynomials of no higher degree than p(x), and measures the error of p(x) over divided segments. This approach does not require solving equations, and can be automated easily. This paper is organized as follows. In Section 2, we introduce the non-standard analysis features of ACL2(r) that form the basis for our proof. In Section 3, we describe the proof of Taylor’s theorem in ACL2(r), which corresponds to step S1. In Section 4, we describe the square root algorithm used in the Power4 processor and its verification, which corresponds to step S3; this section assumes that certain proof obligations are met. These proof obligations are proven in Section 5 using Taylor’s theorem, which corresponds to step S2. Finally, we conclude in Section 6.
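The divided-segment idea can be made concrete with a toy computation (our own sketch in Python, not the ACL2(r) development; f = exp stands in for the actual function only because its Taylor remainder is easy to bound). Splitting the interval lets low-degree Taylor polynomials give tight rigorous bounds; comparing p(x) against each segment's Taylor polynomial then bounds |f(x) − p(x)| to within the remainder, with no roots of t′(x) − p′(x) to compute:

```python
import math

def taylor_remainder_bound(c, r, degree):
    """Lagrange remainder bound for f = exp about center c on [c - r, c + r]:
    |exp(x) - t_c(x)| <= exp(c + r) * r^(n+1) / (n+1)!  with n = degree,
    since every derivative of exp on the segment is at most exp(c + r)."""
    return math.exp(c + r) * r ** (degree + 1) / math.factorial(degree + 1)

def piecewise_bound(a, b, segments, degree):
    """Worst-case Taylor remainder when [a, b] is split into equal segments,
    each covered by its own degree-`degree` Taylor polynomial."""
    h = (b - a) / segments
    return max(taylor_remainder_bound(a + (k + 0.5) * h, h / 2.0, degree)
               for k in range(segments))

# One degree-3 Taylor polynomial over all of [0, 1] is mediocre, but one
# hundred degree-3 Taylor polynomials over subsegments are very accurate.
print(piecewise_bound(0.0, 1.0, 1, 3))    # about 7e-3
print(piecewise_bound(0.0, 1.0, 100, 3))  # about 7e-11
```

The numbers illustrate the trade made in the paper: many low-degree polynomials over small segments recover the accuracy that would otherwise require one high-degree polynomial and a root-finding step.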
2
ACL2(r): Real Analysis Using Non-standard Analysis
Non-standard analysis, introduced by Robinson in the 1960s using model-theoretic techniques and later given an axiomatization by Nelson [17,14], provides a rigorous foundation for the informal reasoning about infinitesimal quantities used by Leibniz when he co-invented the calculus and still used today by engineers and scientists when applying calculus. There are several good introductions to non-standard analysis, for example [16,13]. In this section, we give the reader enough of the background to follow subsequent discussions. Non-standard analysis changes our intuitive understanding of the real number line in a number of ways. Some real numbers, including all numbers that are uniquely determined by a first-order formula, such as 0, 1, e, and π, are called standard. There are real numbers that are larger in magnitude than all the standard reals; these numbers are called i-large. Numbers that are not i-large are called i-limited. Moreover, there are reals smaller in magnitude than any positive standard real; these numbers are called i-small. It follows that 0 is the only number that is both standard and i-small. Notice that if N is an i-large number, 1/N must be i-small. Two numbers are called i-close if their difference is i-small. It turns out that every i-limited number is i-close to a standard number. That is, if x is i-limited, it can be written as x = x∗ + ε, where x∗ is standard and ε is i-small. The number x∗ is called the standard-part of x. The terms i-large, i-small, and i-close give mathematical precision to the informal ideas “infinitely large,” “infinitely small,” and “infinitely close.” These informal notions are ubiquitous in analysis, where they are often replaced by formal statements about series or by ε–δ arguments. A feature of non-standard analysis is that it restores the intuitive aspects of analytical proofs. For example, the sequence {an} is said to converge to the limit A if and only if aN is i-close to A for all i-large N.
This agrees with the intuitive notion of convergence: “an gets close to A when n is large enough.” Similarly, consider the notion of derivatives: the function f has derivative f′(x) at a standard point x if and only if (f(x) − f(y))/(x − y) is i-close to f′(x) whenever y is i-close to x. Again, the formal definition follows closely the intuitive idea of the derivative as the slope of a chord with endpoints “close enough.” The non-standard definition principle allows the definition of functions by specifying their behavior only at standard points. For example, consider the function √x. One way to define it is to provide an approximation scheme fn(x) so that {fn(x)} converges to the square root of x. For standard points x, the function √x can be defined by √x = (fN(x))∗, where N is an i-large integer. Using the non-standard definitional principle, this function defined over the standard numbers is extended to the function √x defined over the entire real number line. The transfer principle allows us to prove a first-order statement P(x) about the reals by proving it only when x is standard. This principle can be applied only when the statement P(x) does not use the new functions of non-standard analysis, such as standard, i-large, i-small, i-close, or standard-part. Consider the example given above for √x. The function fN(x) is an approximation to the square root of x, so it is reasonable that fN(x) · fN(x) is i-close
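The non-standard characterizations just described can be restated compactly (our own notation: $\approx$ denotes i-close, $\operatorname{st}$ the standard part):

```latex
\begin{align*}
  \lim_{n\to\infty} a_n = A
    &\iff a_N \approx A \ \text{for every i-large } N,\\[2pt]
  f'(x) = L \ \text{at standard } x
    &\iff \frac{f(x)-f(y)}{x-y} \approx L
      \ \text{whenever } y \approx x,\ y \neq x,\\[2pt]
  \sqrt{x} &= \operatorname{st}\bigl(f_N(x)\bigr)
    \ \text{for i-large } N, \ \text{where } f_n(x) \to \sqrt{x}.
\end{align*}
```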
to x when x is i-limited and N is i-large. In fact, such a theorem can be proved in ACL2(r) using induction on N. What this means is that for standard x, √x · √x = (fN(x))∗ · (fN(x))∗ = (fN(x) · fN(x))∗ = x. The transfer principle then establishes √x · √x = x for all x. Using the non-standard definition and transfer principles in tandem is a powerful and ubiquitous technique in ACL2(r). To illustrate it, we present a proof of the maximum theorem in ACL2(r). The theorem states that if f is a continuous function on the closed interval [a, b], there is a point x ∈ [a, b] such that f(x) ≥ f(y) for all y ∈ [a, b]. This theorem is used in the proof of Rolle’s Lemma, which in turn is the key to proving Taylor’s Theorem. We begin by introducing an arbitrary continuous function f on a domain. This can be done in ACL2 using the encapsulate event:

(encapsulate ((f (x) t) (domain-p (x) t))
  (local (defun f (x) x))
  (local (defun domain-p (x) (realp x)))
  (defthm domain-real
    (implies (domain-p x) (realp x)))
  (defthm domain-is-interval
    (implies (and (domain-p l) (domain-p h) (realp x) (