Robustness and Usability in Modern Design Flows
by
Görschwin Fey, University of Bremen, Germany
and
Rolf Drechsler, University of Bremen, Germany
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-1-4020-6535-4 (HB) ISBN 978-1-4020-6536-1 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved © 2008 Springer Science + Business Media B.V.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
To Liva Jolanthe and Luna Sophie
CONTENTS
Dedication v
List of Figures xi
List of Tables xv
Preface xvii
1. INTRODUCTION 1
2. PRELIMINARIES 9
2.1 Boolean Reasoning 9
2.1.1 Boolean Functions 9
2.1.2 Binary Decision Diagrams 10
2.1.3 Boolean Satisfiability 13
2.2 Circuits 19
2.2.1 Circuits and Traces 19
2.2.2 BDD Circuits 22
2.2.3 Transformation into CNF 23
2.3 Formal Verification 25
2.3.1 Equivalence Checking 25
2.3.2 Bounded Model Checking 27
2.4 Automatic Test Pattern Generation 31
2.4.1 Fault Models 31
2.4.2 Combinational ATPG 32
2.4.3 Classical ATPG Algorithms 33
3. ALGORITHMS AND DATA STRUCTURES 37
3.1 Combining SAT and BDD Provers 37
3.1.1 Proof Techniques 38
3.1.2 Hybrid Approach 40
3.1.3 Experimental Results 45
3.2 Summary and Future Work 49
4. SYNTHESIS 51
4.1 Synthesis of SystemC 52
4.1.1 SystemC 54
4.1.2 SystemC Parser 55
4.1.3 Characteristics 59
4.1.4 Experimental Results 60
4.2 Synthesis for Testability 65
4.2.1 BDD Transformation 66
4.2.2 Testability 68
4.2.3 Experimental Results 69
4.3 Summary and Future Work 72
5. PROPERTY GENERATION 75
5.1 Detecting Gaps in Testbenches 77
5.1.1 Generating Properties 78
5.1.2 Selection of Properties 81
5.1.3 Experimental Results 83
5.2 Design Understanding 87
5.2.1 Methodology 87
5.2.2 Comparison to Other Techniques 91
5.2.3 Work Flow 91
5.2.4 Experimental Results 92
5.3 Summary and Future Work 97
6. DIAGNOSIS 99
6.1 Comparing SAT-based and Simulation-based Approaches 101
6.1.1 Diagnosis Approaches 102
6.1.2 Relation Between the Approaches 107
6.1.3 Qualitative Comparison 109
6.1.4 Experimental Results 112
6.2 Generating Counterexamples for Diagnosis 115
6.2.1 Choosing Good Counterexamples 116
6.2.2 Heuristics to Choose Counterexamples 123
6.2.3 Experimental Results 126
6.3 Debugging Properties 130
6.3.1 Other Diagnosis Approaches 132
6.3.2 Diagnosis for Properties 133
6.3.3 Source Level Diagnosis 141
6.3.4 Experimental Results 142
6.4 Summary and Future Work 147
7. SUMMARY AND CONCLUSIONS 149
References 151
Index of Symbols 163
Index 165
LIST OF FIGURES
1.1 Traditional design flow 3
1.2 Enhanced design flow 5
2.1 Example for a BDD 11
2.2 BDD $G^{\pi}_{b}$ 12
2.3 BDD $G^{\varphi}_{b}$ 12
2.4 DPLL procedure 15
2.5 Decision stack 17
2.6 Basic gates 19
2.7 Simulation trace for the shift-register 21
2.8 1-bit-shift-register 21
2.9 Multiplexor cell MUX 22
2.10 Example for a BDD circuit 23
2.11 Example for the conversion into CNF 25
2.12 Miter circuit for equivalence checking 26
2.13 SAT instance for BMC 27
2.14 1-bit-shift-register 30
2.15 Example for the SAFM 31
2.16 Boolean difference of the faulty circuit and the fault free circuit 33
2.17 Justification and propagation 34
3.1 Different approaches 41
3.2 Overview over different node types 42
3.3 Depth first traversal 44
3.4 Modified node structure 45
3.5 Solution to the 5-Queens problem 46
4.1 Synthesis part of the design flow 52
4.2 Data types in the process counter proc 56
4.3 Process counter proc of the robot controller from [GLMS02] 56
4.4 AST for Example 16 56
4.5 Overall synthesis procedure 57
4.6 Intermediate representation 58
4.7 Arbiter: Block-level diagram 60
4.8 Arbiter: Top-level module 61
4.9 Scalable FIR-filter: Block-level diagram 63
4.10 Generation of circuits from BDDs 67
4.11 Redundancy due to simplification 69
5.1 Verification part of the design flow 76
5.2 Integration into the verification flow 78
5.3 Sketch of the property generation 79
5.4 Simulation trace for the shift-register 79
5.5 1-bit-shift-register 81
5.6 Runs resulting in a valid property for misex3 85
5.7 Time needed for property generation for misex3 85
5.8 Current verification methodology 88
5.9 Proposed methodology 90
5.10 Application of property deduction 91
5.11 The arbiter 94
5.12 Code of the arbiter 96
6.1 Fault diagnosis in the design flow 100
6.2 Basic simulation-based diagnosis 103
6.3 Example of a sensitized path 104
6.4 SAT-based diagnosis 105
6.5 Basic SAT-based diagnosis 105
6.6 Diagnosis based on set cover 107
6.7 Example: COV may not provide a correction 108
6.8 Example: Solution for k = 2 by BSAT but not by COV 109
6.9 BSAT vs. COV: Average distance 114
6.10 BSAT vs. COV: Number of solutions 115
6.11 Circuit corresponding to the instance I1 of MI 121
6.12 Algorithm to build the subset circuit 122
6.13 Greedy algorithm to choose counterexamples 125
6.14 Number of candidates 129
6.15 Time for diagnosis 130
6.16 Faulty arbiter circuit 135
6.17 Circuit with gate g2 as diagnosis (Ω = req + ack + X ack, Ψ = ack + X ack) 136
6.18 State elements considered for Ackermann constraints 138
6.19 Pseudocode of the static decision strategy 140
6.20 Source code link 142
6.21 State machine for branch prediction 143
6.22 Source code for bpb 144
6.23 am2910: Runtime vs. number of diagnosed components 147
6.24 gcd: Runtime vs. number of diagnosed components 147
LIST OF TABLES
2.1 Transformation of an AND-gate into a CNF formula 24
3.1 Index of node types (32-bit) 45
3.2 Heuristics to limit the size of the hybrid structure 46
3.3 Selection of expansion nodes 47
3.4 ESOP minimization 48
4.1 Arbiter: Synthesis results 62
4.2 FIR-filter: Synthesis results 63
4.3 ISCAS 89: Synthesis results 64
4.4 Benchmarks before and after optimization by SIS 70
4.5 Path-delay fault coverage of BDD circuits 71
4.6 Path-delay fault coverage of BDD circuits optimized by sifting 71
5.1 Sequential benchmarks, $t_{cyc}$ = 1,000,000 86
5.2 Sequential benchmarks, $t_{cyc}$ = 100,000 93
6.1 Comparison of the approaches 110
6.2 Run time of the basic approaches 112
6.3 Quality of the basic approaches 113
6.4 Circuit data 127
6.5 Results using two counterexamples 127
6.6 Results using three counterexamples 128
6.7 Results using four counterexamples 128
6.8 Diagnosis results for multiple counterexamples and Ackermann constraints 145
6.9 Run times for the different approaches (using four counterexamples) 146
PREFACE
The size of technically producible integrated circuits increases continuously, but the ability to design and verify these circuits does not keep up with this development. Therefore, today's design flow has to be improved to achieve higher productivity. In this book, the current design methodology and verification methodology are analyzed, a number of deficiencies are identified, and solutions are suggested. Improvements are proposed both for the methodology and for the underlying algorithms. An in-depth presentation of preliminary concepts makes the book self-contained. Based on this foundation, major design problems are targeted. In particular, a complete tool flow for Synthesis for Testability of SystemC descriptions is presented. The resulting circuits are completely testable, and test pattern generation is possible in polynomial time. Verification issues are covered in even more detail. A whole new paradigm for formal design verification is suggested, based upon design understanding, the automatic generation of properties, and powerful tool support for debugging failures. All these new techniques are empirically evaluated, and experimental results are provided. As a result, an enhanced design flow is created that provides more automation (i.e. better usability) and reduces the probability of introducing conceptual errors (i.e. higher robustness).
Acknowledgments
We would like to thank all members of the research group for computer architecture in Bremen for the helpful discussions and the great atmosphere during work and research. Furthermore, we would like to thank all our coauthors of the papers that make up an important part of this book: Roderick Bloem, Tim Cassens, Christian Genz, Daniel Große, Sebastian Kinder, Sean Safarpour, Stefan Staber, Andreas Veneris, and Tim Warode. Rüdiger Ebendt helped us with proofreading and with unifying the notation. We would like to thank Lisa Teuber for designing the cover page. Antje Luchs patiently helped to improve the presentation for nonexperts.
Görschwin Fey and Rolf Drechsler
Bremen, September 2007
Chapter 1 INTRODUCTION
Almost every appliance used in daily life has an integrated circuit as a control unit. This applies not only to a modern television or a washing machine but also to cars or airplanes, where safety-critical tasks are controlled by circuits. Such an integrated circuit, also called a "chip", may contain up to several hundred million gates. Moreover, according to Moore's Law, the number of elements integrated into a single chip doubles every 18 months. This causes an exponentially increasing size of the problem instances that have to be handled during circuit design. Techniques and tools for computer-aided design (CAD) are available to create such complex systems, but tool development often does not keep up with the progress in fabrication techniques. The result is the "design gap": the size of the circuits that can be produced increases faster than the productivity of the design process. One major issue is the robustness of the design tools. While a tool may produce an output of high quality within an acceptable run time for one design, this may not be the case for another design. Moreover, the performance of a tool cannot be predicted from the design itself. This behavior is undesirable during circuit design, but it is inherent to the problems solved by these tools. Many of these problems are computationally complex, often NP-complete, and, additionally, the size of the problem instances grows exponentially. For this reason, the underlying algorithms have to be improved continuously, i.e. their run time has to be reduced while the quality of the output is kept or even improved. A second reason for the design gap is the low usability of circuit design tools. Often, considerable expertise and long experience are needed, e.g. to adjust the large number of parameters or to interpret the output optimally.
By automating more tasks to help the designer and providing tools that are easy to use, these steps become easier and, as a result, the design productivity increases.
This book addresses both of these aspects: robustness and usability. For this purpose, the current design flow, in the following also called the "traditional" design flow, is considered as a whole. A number of hot spots are identified where an improvement of either the robustness or the usability of the tools can significantly improve the overall productivity. Solutions to these methodological weaknesses are proposed. This leads to a new, enhanced design flow based on the intensive use of formal methods. First, the traditional design flow is briefly reviewed and deficiencies are identified. Then, solutions for these deficiencies and the enhanced design flow are presented. This presentation is kept brief because the whole design flow is covered. A more detailed explanation of the problems and a motivation for the proposed solutions follow at the beginning of each chapter that addresses a particular problem. The major steps of the traditional design flow are shown in Figure 1.1 on Page 3. The design process itself is sketched in the left part of the figure, while the right part shows the verification procedures. Rounded boxes denote tasks, and angular boxes denote input data or output data of these tasks. Initially, a specification of the circuit is written, usually as a lengthy document in natural language. This textual specification is then manually coded in two formal languages. An executable system description in an ordinary software programming language (often C/C++) serves as an early system model that allows for the development of software and for simulation-based verification. Additionally, a synthesizable description in a Hardware Description Language (HDL) is necessary. Both descriptions are usually coded independently. This redundancy in the design flow significantly extends the design time and, even worse, may lead to inconsistencies between the different design descriptions.
Based on the HDL description, synthesis is carried out to obtain the circuit description for production, i.e. a gate level or transistor level representation. Simulation is applied to check the compliance of the system level description with the textual specification and with the synthesizable description of the system. A testbench is created manually to describe crucial scenarios that have to be considered during simulation. But the state space grows exponentially with the number of state elements. A design with only 100 state elements, for example, already has $2^{100}$ states, and today's circuits often have more than 100,000 state elements. Therefore, these dynamic verification approaches are inherently incomplete: due to time limits, neither all input scenarios nor all design states can be considered. Formal property checking overcomes this weakness. The industrial application of property checking is still at its beginning; the formal verification of a 2 million gate design for UMTS data transfer with respect to its textual specification was described in [WTSF04]. Formal equivalence checking is already state of the art to guarantee the correctness of subsequent synthesis steps if the synthesizable description of the design is available. Equivalence checking has already replaced simulation-based methods in many industrial design flows. But all these methods only help to detect the existence of design errors. The localization of design errors currently remains a time-consuming manual task. As a last step, Automatic Test Pattern Generation (ATPG) is applied to calculate input stimuli for the postproduction test. But testability issues are usually not considered during synthesis and, therefore, ATPG is difficult; the underlying problem is NP-complete.

Figure 1.1. Traditional design flow

In this book, several approaches are proposed to remove the deficiencies of the traditional design flow. By combining these techniques, a new enhanced design flow emerges. The enhanced flow boosts the productivity of circuit design and thereby reduces the design gap. Formal techniques are used extensively for this purpose, since they have already been shown to improve the productivity of individual steps in the traditional flow. One reason is the high computational power of these techniques compared to nonsymbolic techniques such as simulation-based approaches. As a starting point, the underlying algorithms for Boolean function manipulation are considered with respect to particular needs. Binary Decision Diagrams (BDDs) and Boolean Satisfiability (SAT) are the dominant engines in this area. Currently, efficiency, i.e. calculating a solution as fast as possible, is a major focus in the development of such algorithms. Increasing the robustness of the formal techniques is an important issue as well. This is achieved by combining concepts from BDDs and solvers for the SAT problem. The resulting integrated data structure allows trading BDD-like behavior for SAT-like behavior and, by this, exploiting the strengths of both domains. Additionally, the data structure can be used to investigate "more interesting" parts of the search space more thoroughly than others. Efficient Boolean function manipulation is the core of several techniques to improve the overall design flow. The enhanced design flow itself is shown in Figure 1.2 on Page 5. Bold lines around boxes indicate steps that are modified in comparison to the traditional design flow. As a first major improvement, the enhanced flow tightly couples the system level description and the synthesizable description.
Figure 1.2. Enhanced design flow

The two languages that are typically used, the software programming language and the HDL, are replaced by SystemC [LTG97, GLMS02] (see also http://www.systemc.org). SystemC is a description language that includes constructs to specify software and hardware at different levels of abstraction. As a result, the system level description can be refined directly into a synthesizable description within a single language. By this, the robustness of the design task is improved because the transformation of the system level model can be done more efficiently. The improved refinement step is complemented by synthesis for testability. The proposed technique produces circuits that are fully testable under several fault models. Here, a BDD representing the function of the circuit is used as a starting point. This functional representation is directly converted into a fully testable circuit. While ATPG is NP-complete in general, all faults in these circuits can be classified in polynomial time; a robust ATPG step is the result. The weak simulation-based techniques for design verification are replaced by state-of-the-art formal techniques, namely property checking. The slow manual creation of properties is aided by automatically generating properties from simulation traces. This enables a new design methodology at this step: properties are created interactively. The approach has a number of advantages. Automatically generated properties help to understand the simulation traces and, by this, the design itself. If the proof of these properties fails on the design, this also helps to identify gaps in the simulation traces. When testbenches are used for simulation, this bridges the traditional flow and the enhanced flow, which is of great importance for practical application. As a side effect, the formal properties verifying the synthesizable description of the system are created much faster when an interactive approach is used. By this, verification with respect to all input sequences and all states of a design can be done more easily, and the usability of the verification tools is raised.

An inconsistency between different design descriptions is usually indicated by counterexamples, no matter which technique (simulation, property checking, or equivalence checking) or which design step (design description or synthesis) is considered. Debugging this inconsistency, i.e. identifying the real error site in the description, is a time-consuming manual task. Here, techniques for automatic fault diagnosis drastically boost productivity. Efficient state-of-the-art techniques for fault diagnosis are compared, and a technique to improve the generation of counterexamples for diagnosis is presented. The extension of diagnosis methods to debugging errors that are detected by formal properties is also considered. In contrast to previous methods, no correct output response per counterexample has to be given in advance, and the diagnosis results are presented at the source code level. Automatically helping the designer to find design errors reduces the difficulty of interpreting results from formal verification tools and, by this, increases usability. Altogether, the proposed techniques and verification methodology establish the enhanced design flow. Only equivalence checking is not considered further in this book, because robust and easy-to-use tools for this task are already state of the art in industrial application.
Finally, the main improvements of the enhanced design flow over the traditional design flow can be summarized as follows:
- Integration of SAT and BDDs for robust Boolean function manipulation
- Tight coupling of system-level description and synthesizable description
- Fully testable circuits
- Automatic generation of properties from simulation traces
- Detection of gaps in simulation traces
- Automatic debugging support
- Presentation of diagnosis results at the source code level
All techniques that are proposed have been implemented and empirically evaluated. They have been developed to the extent of robust application on benchmark cases. Experimental results, a discussion of related work, and possible future extensions of the proposed techniques are presented in the respective sections.

Each chapter addresses a particular problem area. Due to the comprehensive coverage of the whole design flow, a more detailed explanation of the problem and a motivation of the proposed solution are given at the beginning of each chapter. There, the embedding into the overall flow is also shown. A summary, possible future extensions, and further related papers are given at the end of each chapter. This book is structured as follows. In the second chapter, the basic notations and definitions for the different concepts are given to keep the presentation self-contained. In Chapter 3, improvements of the underlying algorithms for Boolean reasoning are explained; namely, the integration of BDDs and SAT provers is investigated. Then, the synthesis step of the design flow is considered in Chapter 4, where the technique to create fully testable circuits from SystemC descriptions is introduced. This is done in two steps: a tool to parse and synthesize a SystemC description is presented, and the gate level circuit is then transformed into a fully testable circuit. Chapter 5 presents the techniques and methodology to improve the verification flow. First, the automatic generation of properties from traces is explained from a technical point of view, and its practical application to detect gaps in testbenches is proposed. Then, the transition towards a whole new verification methodology based on design understanding and interactive creation of formal properties is discussed. Techniques for automatic diagnosis are reviewed in Chapter 6. Simulation-based diagnosis and SAT-based diagnosis are compared in detail. Then, the problem of producing counterexamples that increase the diagnosis quality is examined from a theoretical and a practical point of view. Next, a technique to aid debugging for property checking is presented. Based on counterexamples, the error candidates are automatically calculated at the source code level.
In the last chapter, the contributions of this book are summarized and conclusions are presented.
Chapter 2 PRELIMINARIES
This chapter provides the definitions and notations necessary to keep the book self-contained. Since the complete design flow is covered, the topics range from Boolean reasoning and the underlying techniques to applications like formal verification and ATPG. Therefore, the presentation is kept brief. A large number of books are available for an in-depth discussion of each topic; references to some of them are given at the beginning of the respective sections.
2.1 Boolean Reasoning
In the following, the notations used for Boolean functions, Boolean expressions, binary decision diagrams, and Boolean satisfiability are briefly reviewed. A more detailed presentation can be found, e.g. in [HS96].
2.1.1 Boolean Functions
Notation 1. The set of Boolean values is given by $\mathbb{B} = \{0, 1\}$. A Boolean function $f$ is a mapping $f : \mathbb{B}^n \to \mathbb{B}$. In the following, $f$ is usually defined over $n$ variables $X = \{x_1, \ldots, x_n\}$; this is denoted by $f(x_1, \ldots, x_n)$. A multi-output Boolean function is a mapping $f : \mathbb{B}^n \to \mathbb{B}^m$. A Boolean function can be described in terms of a Boolean expression. A Boolean expression over a set $X = \{x_1, \ldots, x_n\}$ is formed over
- the variables,
- the unary operator $\overline{\cdot}$ (NOT),
- the binary operators $\cdot$ (AND), $+$ (OR), $\oplus$ (XOR), $\rightarrow$ (implication), $\leftrightarrow$ (equivalence),
- parentheses.
Given a Boolean function $f(x_1, \ldots, x_n)$, the positive cofactor $f_{x_i}$ and the negative cofactor $f_{\overline{x_i}}$ with respect to $x_i$ are defined as follows:
$$f_{x_i}(\ldots, x_{i-1}, x_{i+1}, \ldots) = f(\ldots, x_{i-1}, 1, x_{i+1}, \ldots)$$
$$f_{\overline{x_i}}(\ldots, x_{i-1}, x_{i+1}, \ldots) = f(\ldots, x_{i-1}, 0, x_{i+1}, \ldots)$$
The iterative cofactor $f_{l_{i_1} \ldots l_{i_j}}$, where $l_{i_k} \in \{x_{i_k}, \overline{x_{i_k}}\}$, is obtained by iteratively calculating the cofactors $f_{l_{i_1}}$, $(f_{l_{i_1}})_{l_{i_2}}$, up to $(\ldots((f_{l_{i_1}})_{l_{i_2}})\ldots)_{l_{i_j}}$.
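To make the cofactor definitions concrete, the following Python sketch models a Boolean function as a callable that receives a dict from variable index to 0/1. This encoding and the helper names (cofactor, iterative_cofactor, shannon_holds) are our own illustrative assumptions, not an interface from the book. The sketch also checks the Shannon decomposition that BDDs are built on.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Assignment = Dict[int, int]            # variable index i -> value of x_i
BoolFn = Callable[[Assignment], int]   # assumed encoding of f(x_1, ..., x_n)


def cofactor(f: BoolFn, i: int, value: int) -> BoolFn:
    """f_{x_i} for value == 1, f_{not x_i} for value == 0.

    The returned function overrides whatever the caller supplies for x_i,
    so it no longer depends on x_i, as in the definition.
    """
    return lambda a: f({**a, i: value})


def iterative_cofactor(f: BoolFn, literals: Iterable[Tuple[int, int]]) -> BoolFn:
    """Apply the cofactors for the literals (i, value) one after another."""
    for i, value in literals:
        f = cofactor(f, i, value)
    return f


def assignments(variables: List[int]) -> Iterator[Assignment]:
    """Enumerate all 0/1 assignments to the given variables."""
    for bits in product((0, 1), repeat=len(variables)):
        yield dict(zip(variables, bits))


def shannon_holds(f: BoolFn, i: int, variables: List[int]) -> bool:
    """Check f = x_i f_{x_i} + not(x_i) f_{not x_i} on every assignment."""
    pos, neg = cofactor(f, i, 1), cofactor(f, i, 0)
    return all(f(a) == ((a[i] & pos(a)) | ((1 - a[i]) & neg(a)))
               for a in assignments(variables))


# Example: the multiplexer f = x1 x2 + not(x1) x3
mux: BoolFn = lambda a: (a[1] & a[2]) | ((1 - a[1]) & a[3])
```

Here the positive cofactor of the multiplexer with respect to $x_1$ behaves like $x_2$, the negative cofactor behaves like $x_3$, and the iterative cofactor for the literals $x_1, \overline{x_2}$ is the constant 0. Enumerating all assignments is exponential, so this is only meant for checking tiny examples by hand.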
2.1.2 Binary Decision Diagrams
As is well known, a Boolean function $f : \mathbb{B}^n \to \mathbb{B}$ can be represented by a Binary Decision Diagram (BDD), which is a directed acyclic graph $G = (V, E)$ [Bry86]. The Shannon decomposition $f = x f_x + \overline{x} f_{\overline{x}}$ is carried out in each internal node with respect to a given variable $x$. The function represented by an internal node is determined recursively by its two children. Terminal nodes represent the constant functions. Output nodes represent functions that are considered externally, i.e. user functions. A BDD is called ordered if each variable is encountered at most once on each path from the root to a terminal node and if the variables are encountered in the same order on all such paths. A BDD is called reduced if it contains neither isomorphic subgraphs nor redundant nodes. Reduced and ordered BDDs are a canonical representation, since for each Boolean function the BDD is unique up to graph isomorphism [Bry86]. In the following, we refer to reduced and ordered BDDs as BDDs for brevity. By using Complemented Edges (CEs), the size of a BDD can be further reduced [BRB90]. In this book, both types of BDDs, with and without CEs, are considered. Formally, the order of the $n$ variables of a Boolean function can be given by a mapping $\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}$ from the variable index to a level in the graph $G$. The index $n+1$ is assigned to terminal nodes. A BDD with CEs has exactly one terminal node, denoted by 1. The function isTerminal(v) returns true if, and only if, v is a terminal node. Each internal node $v \in V$ has two successors, denoted by Then(v) and Else(v), and is labeled with an index $\mathrm{Index}(v) \in \{1, \ldots, n\}$. Alternatively, the function Label(v) returns the variable of a node, i.e. $\mathrm{Label}(v) = x_{\mathrm{Index}(v)}$. Due to the order $\pi$, the inequality $\pi(\mathrm{Index}(v)) < \min\{\pi(\mathrm{Index}(\mathrm{Then}(v))), \pi(\mathrm{Index}(\mathrm{Else}(v)))\}$ always holds, i.e. a node always resides above its children.
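The two reduction rules and the resulting canonicity can be illustrated with a small Python sketch. It is neither the book's implementation nor the API of a real BDD package such as CUDD; the class and method names (ROBDD, mk, build, size, is_ordered) are hypothetical, and complemented edges are deliberately not modeled. mk skips a node whose Then and Else children coincide (a redundant node) and consults a unique table so that isomorphic nodes are shared. build constructs the diagram by Shannon expansion along a given order, which takes exponential time and is only meant for tiny examples.

```python
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[int, int]            # variable index i -> value of x_i
BoolFn = Callable[[Assignment], int]   # assumed encoding of a Boolean function


class ROBDD:
    """Reduced ordered BDD without complemented edges (illustrative sketch).

    Node ids 0 and 1 are the terminals. Every internal node id maps to a
    triple (Index, Then, Else) that is stored exactly once in `unique`.
    """

    def __init__(self, order: List[int]) -> None:
        self.order = order                                    # variable indices, top level first
        self.level = {x: k for k, x in enumerate(order)}
        self.nodes: List[Tuple[int, int, int]] = [(0, 0, 0), (0, 0, 0)]  # terminal placeholders
        self.unique: Dict[Tuple[int, int, int], int] = {}

    def mk(self, index: int, then_c: int, else_c: int) -> int:
        if then_c == else_c:                                  # redundant node: both edges agree
            return then_c
        key = (index, then_c, else_c)
        if key not in self.unique:                            # isomorphic nodes are shared
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, f: BoolFn, k: int = 0, partial: Optional[Assignment] = None) -> int:
        """Shannon-expand f along the order (exponential; small examples only)."""
        partial = {} if partial is None else partial
        if k == len(self.order):
            return int(f(partial))                            # all variables fixed: terminal 0 or 1
        x = self.order[k]
        then_c = self.build(f, k + 1, {**partial, x: 1})      # positive cofactor f_x
        else_c = self.build(f, k + 1, {**partial, x: 0})      # negative cofactor f_{not x}
        return self.mk(x, then_c, else_c)

    def size(self, root: int) -> int:
        """Number of internal nodes reachable from root (terminals excluded)."""
        seen, stack = set(), [root]
        while stack:
            v = stack.pop()
            if v > 1 and v not in seen:
                seen.add(v)
                stack.extend(self.nodes[v][1:])               # push Then and Else children
        return len(seen)

    def is_ordered(self) -> bool:
        """Check that every node lies strictly above its internal children."""
        for x, t, e in self.nodes[2:]:
            for child in (t, e):
                if child > 1 and self.level[self.nodes[child][0]] <= self.level[x]:
                    return False
        return True


bdd = ROBDD([1, 2])
r1 = bdd.build(lambda a: a[1] ^ a[2])
r2 = bdd.build(lambda a: (a[1] | a[2]) & (1 - (a[1] & a[2])))
```

Building $x_1 \oplus x_2$ from two syntactically different expressions yields the same root id, which is canonicity in action. Because complemented edges are not modeled, this BDD has three internal nodes (one for $x_1$ and separate nodes for $x_2$ and its complement); with CEs, the two $x_2$ nodes would collapse into one.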
Output nodes $v \in V$ are labeled with $\mathrm{Index}(v) = 0$; Label(v) is undefined. They always reside on the topmost level 0. These nodes have exactly one successor Else(v). An edge $e = (v, \mathrm{Then}(v))$ is never a CE. For edges $e = (v, \mathrm{Else}(v))$, the attribute CE(e) is true if and only if e is a CE. By this, an output node v represents the Boolean function $f$ or $\overline{f}$, respectively, where $f$ is the Boolean function represented by Else(v). Output nodes are denoted by a function symbol in all figures. In the following, $G^{\pi}_{f}$ denotes a BDD representing the Boolean function $f$ with respect to the variable order $\pi$. If clear from the context, $\pi$ and $f$ are omitted. The size of a BDD refers to the number of nodes excluding terminal nodes and output nodes.

Example 1. Figure 2.1 shows the BDD for f = x1 x2 x3 + x1 x2 x4 + x1 x2 x3 x4 + x1 x2 x3 x4. Edges from a node w to Else(w) are dashed; edges to Then(w) are solid. A dot denotes a CE. The output node is denoted by the function symbol f. The BDD has a size of five.

Figure 2.1. Example for a BDD

The implementations handle BDDs with CEs. BDDs without CEs are considered in some examples to keep the presentation simple. They have two terminals 0 and 1 but no edge attributes. As a result, two different nodes are needed to represent a function and its complement. In the worst case, a BDD without CEs has twice the number of nodes of a BDD with CEs [BRB90]. The size of a BDD depends on the variable order.

Example 2. Bryant [Bry86] gave the function $b = x_1 x_{n+1} + x_2 x_{n+2} + \cdots + x_n x_{2n}$ and the two variable orders $\pi = (x_1, x_{n+1}, \ldots, x_n, x_{2n})$ and $\varphi = (x_1, x_2, \ldots, x_{2n-1}, x_{2n})$ as an example. Figures 2.2 and 2.3 show the BDDs for the variable orders $\pi$ and $\varphi$, respectively. The BDD of $b$ has a size of $O(n)$ when $\pi$ is used, but the size is $O(2^n)$ when $\varphi$ is used.

Figure 2.2. BDD $G^{\pi}_{b}$
Figure 2.3. BDD $G^{\varphi}_{b}$

The problem of deciding whether a given variable ordering can be improved is NP-complete [BW96]. Efficient heuristics have been proposed to find a good variable order for BDDs. For example, Rudell's sifting algorithm [Rud93] is quite fast, while techniques based on evolutionary algorithms usually yield better results at the cost of a higher run time [DBG96].
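The effect of the variable order in Example 2 can be reproduced numerically. The Python sketch below is our own illustration, not code from the book: robdd_size builds the reduced ordered BDD without complemented edges by Shannon expansion along a given order, using a unique table, and counts its internal nodes. The helper names (robdd_size, bryant, interleaved, natural) are hypothetical, and the exhaustive construction only works for small $n$.

```python
from typing import Callable, Dict, List, Tuple

Assignment = Dict[int, int]            # variable index i -> value of x_i
BoolFn = Callable[[Assignment], int]   # assumed encoding of a Boolean function


def robdd_size(f: BoolFn, order: List[int]) -> int:
    """Internal node count of the reduced ordered BDD of f (no complemented
    edges) for the given order. Exponential in len(order): small n only."""
    unique: Dict[Tuple[int, int, int], int] = {}

    def mk(x: int, then_c: int, else_c: int) -> int:
        if then_c == else_c:                         # redundant node
            return then_c
        if (x, then_c, else_c) not in unique:        # share isomorphic nodes
            unique[(x, then_c, else_c)] = len(unique) + 2   # ids 0, 1 are terminals
        return unique[(x, then_c, else_c)]

    def build(k: int, partial: Assignment) -> int:
        if k == len(order):
            return int(f(partial))
        x = order[k]
        return mk(x, build(k + 1, {**partial, x: 1}), build(k + 1, {**partial, x: 0}))

    build(0, {})
    # every node stored in `unique` is reachable from the root in this construction
    return len(unique)


def bryant(n: int) -> BoolFn:
    """b = x1 x_{n+1} + x2 x_{n+2} + ... + xn x_{2n}."""
    return lambda a: int(any(a[i] and a[n + i] for i in range(1, n + 1)))


def interleaved(n: int) -> List[int]:
    """pi = (x1, x_{n+1}, x2, x_{n+2}, ..., xn, x_{2n})."""
    return [v for i in range(1, n + 1) for v in (i, n + i)]


def natural(n: int) -> List[int]:
    """phi = (x1, x2, ..., x_{2n})."""
    return list(range(1, 2 * n + 1))


for n in range(1, 6):
    print(n, robdd_size(bryant(n), interleaved(n)), robdd_size(bryant(n), natural(n)))
# prints:
# 1 2 2
# 2 4 6
# 3 6 14
# 4 8 30
# 5 10 62
```

The interleaved order $\pi$ gives $2n$ nodes, while the order $\varphi$ gives $2^{n+1} - 2$ nodes, in line with the $O(n)$ versus $O(2^n)$ behavior stated in Example 2.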
2.1.2.1 Efficient Implementation
Using BDDs in practice is relatively easy since efficient BDD packages are available, e.g. CUDD [Som01a]. A BDD node v is stored as a triple (Index(v), Then(v), Else(v)), where Then(v) and Else(v) are pointers to the memory locations of the children of v. The least significant bits of these pointers are always zero in modern computers that address the memory word-wise. To save memory, the attribute CE(v, Else(v)) is stored in the least significant bit of the pointer to Else(v). A hash table, called the unique table, is used to store the tuples representing all nodes uniquely. An advantage of BDDs is the efficiency of Boolean operations [Bry86]. Consider a Boolean operation $\circ \in \{\cdot, +, \oplus, \leftrightarrow\}$. Given two functions $f$ and $g$, the result of $f \circ g$ is calculated as follows:
$$f \circ g = (x f_x + \overline{x} f_{\overline{x}}) \circ (x g_x + \overline{x} g_{\overline{x}}) = x (f_x \circ g_x) + \overline{x} (f_{\overline{x}} \circ g_{\overline{x}}) \tag{2.1}$$
Given the BDD nodes representing the functions f and g, this corresponds to the construction of a node to represent f ◦ g. This node is determined by recursively calculating the result of the operation on the children. By using the unique table, an existing node is reused if the function was already represented within the BDD package. Otherwise, a new node is created. This guarantees that only reduced BDDs are created and no additional reduction step is necessary. A second hash, the computed table, is used to efficiently carry out the recursive descent. The computed table is accessed via the operands and the operation as key. The value stores the result of the operation. Each time a result is calculated this is stored in the computed table. Therefore, each pair of nodes is only considered once with respect to a particular binary Boolean operation.
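The recursion of Equation (2.1), together with the unique table and the computed table, can be sketched in Python as follows. This is a simplified illustration under our own assumptions: there are no complemented edges, Python dicts stand in for the hash tables, and the names (BDDManager, apply, OPS) are hypothetical. Real packages such as CUDD are written in C and differ in many details.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Terminal-level semantics of the binary operations considered in the text
OPS: Dict[str, Callable[[int, int], int]] = {
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "equiv": lambda a, b: int(a == b),
}


class BDDManager:
    """Tiny BDD package sketch (no complemented edges).

    Ids 0 and 1 are the terminals; internal nodes are (Index, Then, Else)
    triples kept in a unique table. Results of apply() are cached in a
    computed table keyed by (operation, f, g).
    """

    def __init__(self, order: List[int]) -> None:
        self.order = order
        self.level = {x: k for k, x in enumerate(order)}
        self.nodes: List[Tuple[int, int, int]] = [(0, 0, 0), (0, 0, 0)]  # terminal placeholders
        self.unique: Dict[Tuple[int, int, int], int] = {}
        self.computed: Dict[Tuple[str, int, int], int] = {}

    def mk(self, x: int, then_c: int, else_c: int) -> int:
        if then_c == else_c:                      # keeps the result reduced
            return then_c
        key = (x, then_c, else_c)
        if key not in self.unique:                # reuse an existing node if possible
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, x: int) -> int:
        return self.mk(x, 1, 0)

    def _level(self, v: int) -> int:
        return len(self.order) if v < 2 else self.level[self.nodes[v][0]]

    def _cofactors(self, v: int, x: int) -> Tuple[int, int]:
        """(f_x, f_not_x) of node v, where x is at or above v's level."""
        if v < 2 or self.nodes[v][0] != x:
            return v, v                           # v does not depend on x
        return self.nodes[v][1], self.nodes[v][2]

    def apply(self, op: str, f: int, g: int) -> int:
        if f < 2 and g < 2:                       # both terminals: evaluate directly
            return OPS[op](f, g)
        key = (op, f, g)
        if key in self.computed:                  # each pair is processed once per operation
            return self.computed[key]
        x = self.order[min(self._level(f), self._level(g))]   # top variable
        f1, f0 = self._cofactors(f, x)
        g1, g0 = self._cofactors(g, x)
        # Equation (2.1): x (f_x o g_x) + not(x) (f_notx o g_notx)
        result = self.mk(x, self.apply(op, f1, g1), self.apply(op, f0, g0))
        self.computed[key] = result
        return result

    def evaluate(self, v: int, a: Dict[int, int]) -> int:
        """Follow Then/Else edges according to the assignment a."""
        while v > 1:
            x, then_c, else_c = self.nodes[v]
            v = then_c if a[x] else else_c
        return v


m = BDDManager([1, 2, 3])
x1, x2, x3 = (m.var(i) for i in (1, 2, 3))
f = m.apply("or", m.apply("and", x1, x2), x3)
g = m.apply("or", x3, m.apply("and", x2, x1))
```

Since mk always goes through the unique table, f and g end up as the same node, and no separate reduction pass is needed. The computed table is keyed on the operation name as well as the two operands, so results for different operations on the same pair of nodes are not confused.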
2.1.3 Boolean Satisfiability
Besides BDDs, solvers for Boolean Satisfiability (SAT) provide a powerful reasoning engine for Boolean problems. In CAD of circuits and systems, problems are frequently transformed into an instance of SAT. Then, a SAT solver is used to calculate a solution, and the SAT solution is transformed back into the original problem domain. In particular, SAT solvers are often used as the underlying engine for formal verification. In this work, SAT solvers are applied for diagnosis. Moreover, the concepts will be important when considering the underlying algorithms for Boolean function manipulation. The SAT problem and efficient algorithms to solve a given SAT instance are reviewed in this section.
Given a Boolean function f(x1, . . . , xn) in conjunctive normal form, the SAT problem is to find an assignment a = a1, . . . , an for x1, . . . , xn such that f(a1, . . . , an) = 1, or to prove that no such assignment exists. For the corresponding decision problem, only the question whether such an assignment a exists has to be answered. This was the first problem that was proven to be NP-complete [Coo71]. Despite this proven difficulty, algorithms for SAT solving have been proposed recently that efficiently solve many practical SAT instances.
2.1.3.1 SAT Solver
SAT solvers usually work on a database that represents the Boolean formula in Conjunctive Normal Form (CNF), also called product of sums. A CNF formula is a conjunction (product) of clauses, where each clause is a disjunction (sum) of literals. Finally, a literal is a variable or its complement. The objective during SAT solving is to find a satisfying assignment for the given Boolean formula or to prove that no such assignment exists. A CNF formula is satisfied if all clauses are satisfied. A clause is satisfied if at least one literal in the clause is satisfied. The literal x is satisfied if the value 1 is assigned to variable x. The literal $\overline{x}$ is satisfied if the value 0 is assigned to variable x. If there exists a satisfying assignment for a formula, the formula is said to be satisfiable; otherwise, the formula is unsatisfiable.

Example 3. The following Boolean formula is given in CNF:

$$f(x_1, x_2, x_3, x_4) = \underbrace{(x_1 + \overline{x}_3 + x_4)}_{w_1} \cdot \underbrace{(x_1 + \overline{x}_3 + \overline{x}_4)}_{w_2} \cdot \underbrace{(x_1 + x_3 + x_4)}_{w_3} \cdot \underbrace{(x_1 + x_3 + \overline{x}_4)}_{w_4} \cdot \underbrace{(\overline{x}_1 + \overline{x}_2 + x_3)}_{w_5}$$

This CNF formula has five clauses w1, . . . , w5. A satisfying assignment for the formula is given by x1 = 1, x2 = 1, x3 = 1 and x4 = 0. Therefore, this formula is satisfiable.

Modern SAT solvers are based on the DLL procedure that was first introduced in [DLL62] as an improvement upon [DP60]. The DLL procedure is often also referred to as DPLL. In principle, this algorithm explores the search space of all assignments by a backtrack search as shown in Figure 2.4. Iteratively, a decision is made by choosing a variable and a value for this variable according to a decision heuristic (Step 1). Then, implications due to this assignment are carried out (Step 2). When all clauses are satisfied, the problem is solved (Step 3). Otherwise, the current assignment may only be partial and therefore no conclusion is possible yet. In this case, further assignments are necessary (Step 4). If at least one clause cannot be satisfied under the current (partial) assignment, conflict analysis is carried out as will be explained below.
1. Decision: Choose an unassigned variable and assign a new value to the variable.
2. Boolean Constraint Propagation: Carry out implications resulting from the previous assignment.
3. Solution: If all clauses are satisfied, output the current variable assignment and return “satisfiable.”
4. If there is no unsatisfied clause due to the current assignment, proceed with Step 1.
5. Conflict analysis: If the current assignment leads to at least one unsatisfied clause without unassigned literals, carry out conflict analysis and add conflict clauses.
6. (Non-chronological) Backtracking: Undo the most recent decision where switching the variable could lead to a solution, undo all implications due to this assignment and switch the variable value. Go to Step 2.
7. Unsatisfiable: Return “unsatisfiable.”

Figure 2.4. DPLL procedure
Then, a new branch in the search tree is explored by switching the variable value (Step 6). When there is no decision to undo, the search space has been completely explored and the instance is unsatisfiable (Step 7).
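Steps 1 to 7 of Figure 2.4 can be illustrated by a deliberately simple recursive sketch. It covers only the basic DPLL procedure, i.e. chronological backtracking without conflict clauses. Literals are encoded as signed integers in the style of the DIMACS format (3 stands for x3, −3 for its complement); this encoding and all function names are assumptions of the sketch.

```python
from typing import Dict, List, Optional

Clause = List[int]          # 3 encodes x3, -3 encodes the complement of x3
Assignment = Dict[int, bool]


def lit_value(lit: int, a: Assignment) -> Optional[bool]:
    v = a.get(abs(lit))
    return None if v is None else (v if lit > 0 else not v)


def propagate(cnf: List[Clause], a: Assignment) -> bool:
    """Step 2 (BCP) on a; returns False if some clause has all literals false."""
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            vals = [lit_value(l, a) for l in clause]
            if True in vals:
                continue                      # clause already satisfied
            free = [l for l, v in zip(clause, vals) if v is None]
            if not free:
                return False                  # conflict
            if len(free) == 1:                # unit clause: imply last literal
                a[abs(free[0])] = free[0] > 0
                changed = True
    return True


def dpll(cnf: List[Clause], a: Optional[Assignment] = None) -> Optional[Assignment]:
    a = dict(a or {})
    if not propagate(cnf, a):
        return None                           # backtrack (chronologically)
    free_vars = sorted({abs(l) for c in cnf for l in c} - set(a))
    if not free_vars:
        return a                              # Step 3: all clauses satisfied
    x = free_vars[0]                          # trivial decision heuristic
    for choice in (False, True):              # Step 1, then switch the value
        result = dpll(cnf, {**a, x: choice})
        if result is not None:
            return result
    return None                               # Step 7 at the top level
```

Applied to the formula of Example 3, encoded as `[[1, -3, 4], [1, -3, -4], [1, 3, 4], [1, 3, -4], [-1, -2, 3]]`, the sketch returns an assignment with x1 = 1, since the clauses w1 to w4 cannot be satisfied together when x1 = 0.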
2.1.3.2 Advances in SAT
Only after some substantial improvements over the basic DPLL procedure in the recent past did SAT solvers become a powerful engine for solving real-world problems. In particular, these improvements were efficient Boolean Constraint Propagation (BCP), conflict analysis together with non-chronological backtracking, and sophisticated decision heuristics.

BCP carries out implications due to previous decisions. In order to satisfy a CNF formula, all clauses must be satisfied. Now, assume that under the current partial assignment all but one literal in a clause evaluate to 0 and the variable of the last literal is unassigned. Then, the value of this last variable can be implied in order to evaluate the clause to 1.
Example 4. Again, consider the CNF formula from Example 3. Assume the partial assignment x1 = 1 and x2 = 1. Then, due to clause $w_5 = \overline{x}_1 + \overline{x}_2 + x_3$, the assignment x3 = 1 can be implied.

BCP has to be carried out after each decision and, therefore, the efficiency of this procedure is crucial for the overall performance. In [MMZ+01] an efficient architecture for BCP was presented for the SAT solver Chaff (the source code of the implementation Zchaff can be downloaded from [Boo04]). The basic idea is to use the two literal watching scheme to efficiently detect where an implication may be possible. Two literals of each clause are watched. An implication may occur for a clause only if one of its watched literals evaluates to 0 upon a previous decision and the other watched literal is unassigned. If no implication occurs because the clause contains another unassigned literal, this literal is watched instead. For each literal, a watching list is stored to efficiently access those clauses where the particular literal is watched. Therefore, instead of always touching all clauses in the database, only those clauses that may cause an implication are considered.

Conflict analysis was first proposed in [MS96, MS99] for the SAT solver GRASP. In the traditional DPLL procedure, only the most recent decision was undone when a conflict, i.e. a clause that is unsatisfied under the current assignment, was detected. In contrast, a modern SAT solver analyzes such a conflict. During BCP, a conflict occurs if opposite values are implied for a single variable due to different clauses. Then, the decisions that were responsible for this conflict are detected. These decisions are the reason for the conflict. From this reason, a conflict clause is created to prevent the solver from reentering the same search space. As soon as all but one literal of the conflict clause are assigned, BCP takes over and implies the value of the remaining literal. As a result, the previously conflicting assignment is not considered again.
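Returning briefly to BCP, the two literal watching scheme can be sketched as follows. This is a simplified illustration in the spirit of Chaff, not its implementation: it assumes every clause has at least two literals, encodes literals as signed integers (3 for x3, −3 for its complement), propagates the consequences of one new assignment at a time, and does not support undoing assignments. The class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


class WatchedCNF:
    """Literals are signed integers; positions 0 and 1 of a clause are watched."""

    def __init__(self, clauses: List[List[int]]) -> None:
        self.clauses = [list(c) for c in clauses]        # needs len(c) >= 2
        self.watches: Dict[int, List[int]] = defaultdict(list)
        for ci, c in enumerate(self.clauses):
            for lit in c[:2]:
                self.watches[lit].append(ci)
        self.assign: Dict[int, bool] = {}

    def value(self, lit: int) -> Optional[bool]:
        v = self.assign.get(abs(lit))
        return None if v is None else v == (lit > 0)

    def assign_and_propagate(self, lit: int) -> Optional[List[int]]:
        """Sets lit to 1 and runs BCP; returns the implied literals, or None on a conflict."""
        implied: List[int] = []
        queue = deque([lit])
        while queue:
            l = queue.popleft()
            if self.value(l) is False:
                return None                              # opposite value implied
            if self.value(l) is True:
                continue
            self.assign[abs(l)] = l > 0
            if l != lit:
                implied.append(l)
            false_lit = -l
            # Only clauses watching the literal that just became 0 are visited.
            for ci in list(self.watches[false_lit]):
                c = self.clauses[ci]
                if c[0] == false_lit:
                    c[0], c[1] = c[1], c[0]              # keep the false watch at index 1
                if self.value(c[0]) is True:
                    continue                             # clause already satisfied
                for k in range(2, len(c)):
                    if self.value(c[k]) is not False:    # found a new literal to watch
                        c[1], c[k] = c[k], c[1]
                        self.watches[false_lit].remove(ci)
                        self.watches[c[1]].append(ci)
                        break
                else:                                    # no replacement: unit or conflict
                    if self.value(c[0]) is False:
                        return None
                    queue.append(c[0])
        return implied
```

With the clauses of Example 3, assigning x1 = 1 implies nothing, and the subsequent assignment x2 = 1 implies x3 = 1 via w5, as in Example 4; only the clause watching the literal that became 0 is inspected in each step.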
Finally, the SAT solver backtracks to the decision before the last decision that participated in the conflict. Switching the value of the last decision that led to the conflict is done by BCP due to the inserted conflict clause. So this value assignment becomes an implication instead of a decision; this is also called a conflict driven assertion.

Example 5. Again, consider the CNF formula from Example 3:

$$f(x_1, x_2, x_3, x_4) = (x_1 + \overline{x}_3 + x_4) \cdot (x_1 + \overline{x}_3 + \overline{x}_4) \cdot (x_1 + x_3 + x_4) \cdot (x_1 + x_3 + \overline{x}_4) \cdot (\overline{x}_1 + \overline{x}_2 + x_3)$$

Each time the SAT solver makes a decision, this decision is pushed onto the decision stack. Now, assume that the first decision at decision level L0 is the assignment x1 = 0. No implications follow from this decision. Then, the solver decides x2 = 0 at L1. Again, no implications follow. The solver decides x3 = 1 at L2. Now, according to clause w1 the assignment x4 = 1 is implied, but also, due to w2, the assignment x4 = 0 is implied. Therefore, a conflict with respect to variable x4 occurs. This situation is shown in Figure 2.5(a).

Figure 2.5. Decision stack: (a) Configuration 1, (b) Configuration 2, (c) Configuration 3

The decision stack is shown on the left-hand side. The solver tracks reasons for assignments using an implication graph (shown on the right-hand side). Each node represents an assignment. Decisions are represented by nodes without predecessors. Each implied assignment has the assignments that caused it as its predecessors. The edges are labeled by the clauses that cause an assignment. In the example, the decisions x1 = 0 and x3 = 1 cause the assignment x4 = 1 due to clause w1. Additionally, the same decisions cause the assignment x4 = 0 due to w2, and a conflict results. By traversing the graph backwards, the reason for the conflict, i.e. x1 = 0 and x3 = 1, can be determined. Now, it is known that this assignment must be avoided in order to satisfy the CNF formula. This information is stored by adding the conflict clause
$w_6 = (x_1 + \overline{x}_3)$ to the CNF formula. Thus, the nonsolution space is recognized earlier while searching; this is also called conflict based learning. The decision x3 = 1 is undone. Due to x1 = 0 and the conflict clause w6, the assignment x3 = 0 is implied, which is called a conflict driven assertion. The implication x3 = 0 triggers another conflict with respect to x4, as shown in Figure 2.5(b). The single reason for this conflict is the decision x1 = 0. So the conflict clause w7 = (x1) is added. Now, the solver backtracks above decision level L0. The decision x2 = 0 was not a reason for the conflict, so instead of undoing only the most recent decision, nonchronological backtracking occurs: the solver undoes all decisions up to and including the most recent decision that was involved in the conflict. Therefore, in the example, the decisions x2 = 0 and x1 = 0 are undone. Due to the conflict clause w7, the assignment x1 = 1 is implied independent of any decision, as shown in Figure 2.5(c). Then, the decision x2 = 1 is made at L0, and BCP implies x3 = 1 due to w5. For efficiency reasons, the SAT solver does not check whether all clauses are satisfied under this partial assignment but only detects conflicts. Finally, a satisfying assignment is found by deciding x4 = 0 at L1.

In summary, this example shows on an informal basis how a modern SAT solver carries out conflict analysis and uses conflict clauses “to remember” nonsolution spaces. A large number of added conflict clauses may result in memory problems. This is resolved by removing conflict clauses from time to time, which does not change the initial problem instance. A formal and more detailed presentation of the technique can be found in [MS99]. The algorithms to derive conflict clauses have been further improved, e.g. in [ZMMM01, ES04]. A result of this learning is a drastic speed-up of the solving process, in particular also for unsatisfiable formulas.

The last major improvement of SAT solvers results from sophisticated decision heuristics.
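Before turning to decision heuristics in detail, the derivation of the conflict clauses w6 and w7 in Example 5 can be sketched as a backward traversal of the implication graph. The sketch assumes a simple data layout (a map from each variable to its value and the clause that implied it, with None marking a decision) and, exactly as in the example, collects all responsible decisions; production solvers typically stop the traversal earlier, e.g. at a unique implication point as studied in [ZMMM01]. The names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# var -> (assigned value, reason clause); reason None marks a decision.
Trail = Dict[int, Tuple[bool, Optional[List[int]]]]


def conflict_clause(conflicting: List[int], trail: Trail) -> List[int]:
    """Traverses the implication graph backwards from a clause whose literals
    are all 0 and returns a clause that excludes the responsible decisions."""
    seen, decisions = set(), set()
    stack = [abs(l) for l in conflicting]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        _, reason = trail[v]
        if reason is None:
            decisions.add(v)                  # node without predecessors
        else:
            stack.extend(abs(l) for l in reason if abs(l) != v)
    # Decision x = 0 contributes literal x, decision x = 1 contributes x'.
    return sorted(((-v if trail[v][0] else v) for v in decisions), key=abs)
```

For configuration (a) the sketch yields [1, −3], i.e. w6, and for configuration (b) it yields [1], i.e. w7. Since x2 = 0 appears in neither result, the solver may jump back over decision level L1, which is the non-chronological backtracking described above.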
Basically, the SAT solver dynamically collects statistics about the occurrence of literals in clauses. A dynamic procedure is used to keep track of conflict clauses added during the search. An important observation is that locality is achieved by exploiting recently learned information, which helps to speed up the search. An example is the Variable State Independent Decaying Sum (VSIDS) strategy employed in [MMZ+01]. A counter exists for each literal to count the number of its occurrences in clauses. Each time a conflict clause is added, the counters are incremented accordingly. The values of these counters are regularly divided by two, which emphasizes the influence of more recently learned clauses. A large number of other heuristics has also been investigated, e.g. in [Mar99, GN02, JS05]. Another ingredient of modern SAT solvers is a powerful preprocessing step as proposed in [Dre04, EB05, JS05, EMS07]. The original CNF formula is usually a direct mapping of the problem onto a CNF representation. No optimizations are carried out, e.g. clauses with only one literal are frequently
contained in this original CNF formula, but these can be eliminated without changing the solution space. When preprocessing the CNF formula, optimizations are applied to make the representation more compact and to improve the performance of BCP. Due to these advances, SAT solvers have become the state of the art for solving a large range of problems in CAD, e.g. formal verification [BCCZ99, KPKG02], debugging or diagnosis [SVV04, ASV+ 05, FSVD06], and test pattern generation [SBSV96, SFD+ 05b].
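One preprocessing rule mentioned above, the elimination of clauses with only one literal, can be sketched as follows. The preprocessing techniques cited above go well beyond this; the sketch (hypothetical function name, signed-integer literals) only handles unit clauses and records the forced values, so that solutions of the simplified formula extend to solutions of the original one.

```python
from typing import Dict, List, Optional, Tuple


def eliminate_units(cnf: List[List[int]]) -> Optional[Tuple[List[List[int]], Dict[int, bool]]]:
    """Removes unit clauses until none is left.

    Returns the simplified CNF and the values forced by unit clauses,
    or None if an empty clause (i.e. unsatisfiability) is derived.
    """
    clauses = [list(c) for c in cnf]
    forced: Dict[int, bool] = {}
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, forced
        forced[abs(unit)] = unit > 0
        simplified = []
        for c in clauses:
            if unit in c:
                continue                            # clause is satisfied: drop it
            reduced = [l for l in c if l != -unit]  # literal is 0: drop it
            if not reduced:
                return None                         # empty clause
            simplified.append(reduced)
        clauses = simplified
```

For instance, the formula (x1) · ($\overline{x}_1$ + x2) · ($\overline{x}_2$ + x3 + x4) · ($\overline{x}_3$ + x1) shrinks to the single clause (x3 + x4) with x1 = x2 = 1 forced, which leaves less work for BCP during the actual search.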
2.2 Circuits
Circuits are considered throughout the design flow. Formal definitions for circuits often aim at a special purpose like synthesis [HS96] or ATPG [KS97]. In this book, a more general definition is used to also cope with different tasks, like verification, simulation, and diagnosis. After defining circuits for the sequential and the combinational case, the mapping of a BDD to a circuit is introduced. Finally, the transformation of a circuit into a CNF formula, which is necessary when applying a SAT solver, is explained.
2.2.1 Circuits and Traces
A circuit is usually composed of the elements of a set of basic gates. This set of gates is called a library. One example of such a library is the set of gates shown in Figure 2.6. These are the well-known gates that correspond to Boolean operators: AND, OR, XOR, and NOT. If necessary, it is straightforward to extend this library to other Boolean gates. In the following, the library usually consists of all Boolean functions with a single output. Where necessary, the library may be restricted to consider only a subset of all Boolean functions. The connections between gates are defined by an underlying graph structure. Additionally, for gates that represent nonsymmetric functions (e.g. multiplexors), a unique order for the inputs is given by ordering the predecessors of a gate.

Figure 2.6. Basic gates (AND, OR, XOR, NOT)

Definition 1. A sequential circuit is defined by C = (V, E, X, Y, S, N, F, P), where
- an acyclic directed graph G = (V, E) defines the connections,
- X = {x1, . . . , xn} ⊆ V is the set of primary inputs,
- Y = {y1, . . . , ym} ⊆ V is the set of primary outputs,
- S = {s1, . . . , sl} ⊆ V is the set of present state nodes,
- N = {n1, . . . , nl} ⊆ V is the set of next state nodes,
- F : V → (B∗ → B) associates a Boolean function fv = F(v) to a node v (projection functions of variables are assigned to input nodes and present state nodes), and
- P : (V \ (X ∪ S)) → (V \ (Y ∪ N))∗ associates with each node v an ordered tuple of predecessors P(v) = (w1, . . . , wp).

Thus, P(v) describes the input variables of fv. A gate of a circuit C = (V, E, X, Y, S, N, F, P) is a node g ∈ V. This is often denoted by g ∈ C. The size of a circuit C is denoted by |C| and is equal to the number of gates |V|. For convenience, the output signal of a gate g is often referred to as signal g. If a propositional variable is needed for gate g, this variable is also denoted by g. For any gate g, the Boolean function of g in terms of primary inputs and present state values is denoted by Fg. This function is retrieved by recursively substituting the variables in fg with the functions of the predecessors of g.

Definition 2. A controlling value at an input of a gate determines the value of the output independently of the values at the other inputs.

Example 6. The value 1 (0) is the controlling value for OR (AND), and the value 0 (1) is the non-controlling value for OR (AND). An XOR-gate does not have a controlling input value.

These notations can be extended to handle gates with multiple outputs and hierarchical circuits. The extension is straightforward and therefore omitted. All the practical implementations of the techniques presented in this book handle these cases when necessary.

Definition 3. A combinational circuit C = (V, E, X, Y, S, N, F, P) is a circuit without state elements, i.e. S = ∅ and N = ∅. A circuit with state elements may also be referred to as a sequential circuit. For brevity, a combinational circuit C = (V, E, X, Y, ∅, ∅, F, P) may be denoted by C = (V, E, X, Y, F, P).
The value of gate g at time step t is denoted by νg[t]. If the value is unknown or not important, this may be denoted by the values ‘U’ or ‘−’, respectively.
This may be particularly useful to describe a counterexample where the values of some signals are not important to excite a malfunction. In this case, νg[t] ∈ {0, 1, −, U}, but often νg[t] ∈ B is sufficient.

Definition 4. A simulation trace T of length tcyc for a circuit C is given by a tuple $(U, (u_0, \ldots, u_{t_{cyc}-1}))$, where
- U = (g1, . . . , gr) is a vector of r gates gj ∈ C, 1 ≤ j ≤ r, and
- ut = (νg1[t], . . . , νgr[t]) gives the values of these gates at time step t.

Example 7. Consider the waveforms in Figure 2.7(a) produced by the sequential circuit in Figure 2.8. For synchronously clocked circuits as studied in this book, the waveform can directly be mapped into the vector notation that is shown in Figure 2.7(b). Together with the vector U = (x2, x1, s1, s2, s3), this forms the simulation trace T = (U, (u0, . . . , u5)). Thus, a simulation trace directly corresponds to a waveform, e.g. given in the widely used Value Change Dump (VCD) format that is specified in IEEE Std 1364-1995.
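| Signal | u0 (t = 0) | u1 (t = 1) | u2 (t = 2) | u3 (t = 3) | u4 (t = 4) | u5 (t = 5) |
|---|---|---|---|---|---|---|
| x2 | 0 | 0 | 0 | 0 | 0 | 0 |
| x1 | 1 | 1 | 0 | 1 | 0 | 0 |
| s1 | 0 | 1 | 1 | 0 | 1 | 0 |
| s2 | 0 | 0 | 1 | 1 | 0 | 1 |
| s3 | 0 | 0 | 0 | 1 | 1 | 0 |

Figure 2.7. Simulation trace for the shift-register: (a) waveform, (b) vector representation of the waveforms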
Figure 2.8. 1-bit-shift-register
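The trace of Figure 2.7 can be reproduced by a small cycle-based simulation of the shift-register. The sketch assumes the reading of Figure 2.8 that is also described in Example 11 (x2 = 1 keeps the stored values, x2 = 0 shifts x1 into s1, s1 into s2 and s2 into s3) and the initial state s1 = s2 = s3 = 0 visible in u0; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A simulation trace (U, (u_0, ..., u_{tcyc-1})) as in Definition 4.
Trace = Tuple[Tuple[str, ...], List[Tuple[int, ...]]]


def mux(sel: int, d0: int, d1: int) -> int:
    return d1 if sel else d0


def simulate_shift_register(x1: Sequence[int], x2: Sequence[int]) -> Trace:
    U = ("x2", "x1", "s1", "s2", "s3")
    s1 = s2 = s3 = 0                                   # assumed initial state
    vectors = []
    for t in range(len(x1)):
        vectors.append((x2[t], x1[t], s1, s2, s3))     # u_t
        # Next state: x2 = 1 keeps the value, x2 = 0 shifts (0-input of each MUX).
        s1, s2, s3 = mux(x2[t], x1[t], s1), mux(x2[t], s1, s2), mux(x2[t], s2, s3)
    return U, vectors
```

Calling `simulate_shift_register([1, 1, 0, 1, 0, 0], [0] * 6)` yields exactly the vectors u0 to u5 of Figure 2.7(b), which can then be written out as a waveform.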
Figure 2.9. Multiplexor cell MUX
2.2.2 BDD Circuits
BDDs directly correspond to Boolean circuits composed of multiplexors, as explained in [Bec92]. Such circuits are called BDD circuits in this work. More precisely, BDD circuits are combinational logic circuits defined over a fixed library. The typical multiplexor cell is denoted as MUX, and it is defined as shown in Figure 2.9 by its standard AND-, OR-, NOT-based realization. The left input is called the control input, and the upper inputs are called data inputs (left data input = 0-input, right data input = 1-input). Results reported for BDD circuits in this book also transfer to different realizations, e.g. the realization of a multiplexor in Pass Transistor Logic (PTL).

The BDD circuit of a BDD is obtained by the following construction: Traverse the BDD in topological order and replace each internal node v in the BDD by a MUX cell. Connect the control input to the primary input Label(v), corresponding to the label of the BDD node. Then, connect the 1-input to the output of the multiplexor for Then(v), connect the 0-input to the multiplexor for Else(v), and insert an inverter if CE((v, Else(v))). Finally, substitute the output nodes by primary outputs and connect these outputs to the multiplexors of their successors; insert an inverter if the edge to the successor is complemented.

Example 8. Figure 2.10 shows an example for the transformation. The original BDD is shown in Figure 2.10(a). Note that the root node in this case is shown at the bottom and the terminal nodes at the top. The corresponding BDD circuit can be seen in Figure 2.10(b).

Remark 1. As has been suggested in previous work [Bec92, ADK93], the MUX cells connected to constant values can be simplified. But this reduction is not applied to the BDD circuits considered in this book unless stated otherwise. The reason is a degradation of the testability due to the optimization, as will be shown in Section 4.2.
Figure 2.10. Example for a BDD circuit: (a) BDD, (b) BDD circuit
More details on BDD circuits and their applications in the design flow can be found, e.g. in [DG02].
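The construction of a BDD circuit can be sketched as follows. The BDD encoding (a dictionary from node names to (Label(v), Then(v), Else(v), CE) tuples, with complement attributes only on else-edges and on output edges), the gate tuples, and all function names are assumptions of this illustration. The resulting netlist is checked by simulation against direct evaluation of the BDD.

```python
from typing import Dict, List, Tuple, Union

Ref = Union[str, int]                  # node name or terminal constant 0/1
BddNode = Tuple[str, Ref, Ref, bool]   # (Label(v), Then(v), Else(v), CE on else-edge)


def bdd_to_circuit(nodes: Dict[str, BddNode], order: List[str],
                   outputs: Dict[str, Tuple[str, bool]]) -> List[tuple]:
    """Returns ("mux", out, sel, d0, d1), ("not", out, in) and ("buf", out, in)
    cells; `order` must list every node after its children."""
    gates: List[tuple] = []
    for v in order:
        label, then, els, ce = nodes[v]
        d0: Ref = els
        if ce:                                     # complemented else-edge: inverter
            d0 = f"{v}_else_n"
            gates.append(("not", d0, els))
        gates.append(("mux", v, label, d0, then))  # control input = Label(v)
    for y, (root, complemented) in outputs.items():
        gates.append(("not" if complemented else "buf", y, root))
    return gates


def simulate(gates: List[tuple], inputs: Dict[str, int]) -> Dict[Ref, int]:
    val: Dict[Ref, int] = {0: 0, 1: 1, **inputs}
    for kind, out, *ins in gates:
        if kind == "mux":                          # sel ? d1 : d0
            sel, d0, d1 = ins
            val[out] = val[d1] if val[sel] else val[d0]
        elif kind == "not":
            val[out] = 1 - val[ins[0]]
        else:
            val[out] = val[ins[0]]
    return val


def eval_bdd(nodes: Dict[str, BddNode], u: Ref, inputs: Dict[str, int]) -> int:
    neg = False
    while not isinstance(u, int):
        label, then, els, ce = nodes[u]
        if inputs[label]:
            u = then
        else:
            u, neg = els, neg ^ ce
    return u ^ int(neg)
```

In a small example with nodes `{"b": ("x2", 1, 0, False), "r": ("x1", "b", "b", True)}`, node r represents the complement of x1 ⊕ x2 via a complemented else-edge, and a complemented output edge turns y into x1 ⊕ x2; each complemented edge contributes one inverter to the netlist, and no MUX cell is simplified, in line with Remark 1.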
2.2.3 Transformation into CNF
A SAT solver can be applied as a powerful black-box engine to solve a problem. In this case, transforming the problem instance into a SAT instance and the SAT solution into a solution for the original problem is crucial. In particular, the transformation of the circuit into a CNF formula is one step for multiple applications that can be implemented using a SAT prover as a core engine, e.g. ATPG, property checking, or debugging. Commonly, the Tseitin transformation [Tse68], which is defined for Boolean expressions, is used. For each subformula, a new propositional variable is introduced and constrained to be equivalent to the subformula. The application to circuits has been presented, e.g., in [Lar92]. The transformation of a single AND-gate into a set of clauses is shown in Table 2.1. The goal is to create a CNF formula that models an AND-gate, i.e. a CNF formula that is only satisfied for assignments that may occur for an AND-gate. For an AND-gate with two inputs x1 and x2, the output y must always be equal to x1 · x2. The truth-table for this CNF formula is shown in Table 2.1(a). From the truth-table, a CNF formula is generated by extracting one clause for each assignment where the formula evaluates to 0. These clauses are shown in Table 2.1(b). This CNF representation is not minimal and can therefore be reduced by two-level logic minimization, e.g. using the tool ESPRESSO that is included in SIS [SSL+92]. The clauses in Table 2.1(c) are the final result.
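Table 2.1. Transformation of an AND-gate into a CNF formula

(a) Truth-table

| x1 | x2 | y | y ↔ x1 · x2 |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

(b) Clauses

$$(x_1 + x_2 + \overline{y}) \cdot (x_1 + \overline{x}_2 + \overline{y}) \cdot (\overline{x}_1 + x_2 + \overline{y}) \cdot (\overline{x}_1 + \overline{x}_2 + y)$$

(c) Minimized

$$(x_1 + \overline{y}) \cdot (x_2 + \overline{y}) \cdot (\overline{x}_1 + \overline{x}_2 + y)$$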
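The extraction of Table 2.1(b) from Table 2.1(a) can be sketched generically: one clause is produced for every row where the constraint evaluates to 0, and that clause contains each variable in the polarity that is 0 under the row. Variables are numbered 1, 2, 3 for x1, x2, y, and literals are signed integers; both conventions and the function names are assumptions of this sketch. The minimization itself (done with ESPRESSO in the text) is not reproduced; instead, `equivalent` checks by enumeration that the minimized clauses of Table 2.1(c) describe the same function.

```python
from itertools import product
from typing import Callable, List, Sequence


def clauses_from_truth_table(n: int, f: Callable[..., int]) -> List[List[int]]:
    """One clause per assignment of variables 1..n for which f evaluates to 0."""
    clauses = []
    for a in product((0, 1), repeat=n):
        if not f(*a):
            # The clause is falsified exactly by this assignment.
            clauses.append([-(i + 1) if a[i] else i + 1 for i in range(n)])
    return clauses


def satisfies(clauses: List[List[int]], a: Sequence[int]) -> bool:
    return all(any((l > 0) == bool(a[abs(l) - 1]) for l in c) for c in clauses)


def equivalent(c1: List[List[int]], c2: List[List[int]], n: int) -> bool:
    return all(satisfies(c1, a) == satisfies(c2, a) for a in product((0, 1), repeat=n))
```

For the AND-gate, `clauses_from_truth_table(3, lambda x1, x2, y: int(y == (x1 & x2)))` returns the four clauses of Table 2.1(b) in the order of the truth-table rows.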
For a gate g, a propositional variable g is also used in the CNF formula when a circuit is considered in the following. This simplifies understanding and notation of CNF formulas that correspond to circuits. The Boolean expression ψg describes the constraints needed to model g. Now, the generation of the CNF formula for a complete circuit is straightforward. The Boolean expressions describing the gates are conjoined into one CNF formula. Clauses are generated for each gate according to its type. Given the circuit C = (V, E, X, Y, S, N, F, P), the Boolean expression to model the whole circuit is given by $\psi_C = \bigwedge_{g \in V} \psi_g$. If all subexpressions ψg are given in CNF representation, the overall expression is in CNF. The output variable of a gate and the input variables of its successors are identical and therefore reflect the connections between gates within the CNF formula. Note that this only models the circuit for one time step. Modeling the sequential behavior will be considered later.

Example 9. Consider the circuit shown in Figure 2.11. The OR-gate is described by the formula ψy = (y ↔ a + b). The primary input x1 is described by ψx1 = x1. As a result, the circuit is translated into the following CNF formula:

$$\underbrace{(x_1 + \overline{a}) \cdot (x_2 + \overline{a}) \cdot (\overline{x}_1 + \overline{x}_2 + a)}_{a \leftrightarrow x_1 \cdot x_2} \cdot \underbrace{(x_3 + b) \cdot (\overline{x}_3 + \overline{b})}_{b \leftrightarrow \overline{x}_3} \cdot \underbrace{(\overline{a} + y) \cdot (\overline{b} + y) \cdot (a + b + \overline{y})}_{y \leftrightarrow a + b}$$
Figure 2.11. Example for the conversion into CNF
An advantage of this transformation is the linear size complexity. Given a circuit where |C| is the sum of the numbers of inputs, outputs, and gates, the number of variables in the SAT instance is also |C| and the number of clauses is in O(|C|). A disadvantage is the loss of structural information. Only a set of clauses is given to the SAT solver. Information about predecessors and successors of a node is lost and is not used during the SAT search. But this information can be partially recovered for certain applications by introducing additional constraints into the SAT instance as proposed in [Sht01] for bounded model checking and in [SBSV96, SFD+ 05b] for test pattern generation.
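Generating ψC for a whole circuit can be sketched with per-gate clause templates that generalize the minimized AND clauses of Table 2.1(c) to k inputs. The netlist format, the numbering of signals as 1, 2, 3, ... in order of first appearance, and the function names are hypothetical. The brute-force enumeration in `models` is only there to confirm, for the circuit of Example 9, that the clause set is satisfied exactly by the consistent value assignments of the circuit, one per input pattern.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

Gate = Tuple[str, str, List[str]]          # (type, output signal, input signals)


def gate_clauses(kind: str, g: int, ins: List[int]) -> List[List[int]]:
    if kind == "and":      # g <-> i1 * ... * ik
        return [[i, -g] for i in ins] + [[g] + [-i for i in ins]]
    if kind == "or":       # g <-> i1 + ... + ik
        return [[-i, g] for i in ins] + [[-g] + list(ins)]
    if kind == "not":      # g <-> complement of i
        (i,) = ins
        return [[i, g], [-i, -g]]
    raise ValueError(f"unsupported gate type {kind}")


def circuit_to_cnf(gates: List[Gate]) -> Tuple[List[List[int]], Dict[str, int]]:
    var: Dict[str, int] = {}

    def v(name: str) -> int:               # one propositional variable per signal
        return var.setdefault(name, len(var) + 1)

    cnf: List[List[int]] = []
    for kind, out, ins in gates:
        cnf += gate_clauses(kind, v(out), [v(i) for i in ins])
    return cnf, var


def models(cnf: List[List[int]], n: int) -> Iterator[Tuple[int, ...]]:
    """Brute-force enumeration of satisfying assignments (tiny instances only)."""
    for bits in product((0, 1), repeat=n):
        if all(any((l > 0) == bool(bits[abs(l) - 1]) for l in c) for c in cnf):
            yield bits
```

For Example 9 this yields 8 clauses over 6 variables, and the number of clauses grows linearly with the number of gates, in line with the O(|C|) bound stated above.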
2.3 Formal Verification
Formal verification mainly covers two aspects of the design flow. The verification of the initial HDL description of the design is addressed by model checking. The correctness of subsequent synthesis steps is verified by equivalence checking. These two techniques are introduced in this section, starting with the simpler presentation of equivalence checking. The textbook [Kro99] gives a more comprehensive introduction and overview of techniques for formal verification.
2.3.1 Equivalence Checking
Formal equivalence checking determines whether the functions realized by two given circuits are identical. In the following, the equivalence checking problem for two combinational circuits is considered. Matching the primary inputs (outputs) of one circuit with those of the other circuit is a difficult problem in itself [MMM02]. But this is not the focus of this work and, therefore, the mapping is assumed to be given. A common approach to carry out the equivalence check is to create a miter circuit [Bra83].

Example 10. Given a circuit CE = (VE, EE, X, Y, FE) and its specification CS = (VS, ES, X, Y, FS) with X = (x1, x2, x3) and Y = (y1, y2), the miter circuit is built as shown in Figure 2.12. The output of the miter assumes the value 1 if, and only if, the current input assignment causes at least one pair of outputs to assume different values.

Figure 2.12. Miter circuit for equivalence checking

Such an input assignment is called a counterexample (see Definition 5 below). The two circuits are equivalent if no such input assignment exists. One possibility to solve the equivalence checking problem is to transform the miter circuit into a SAT instance and constrain the output to the value 1. The resulting SAT instance is unsatisfiable if, and only if, the two circuits are equivalent. The SAT instance can be satisfied if implementation and specification differ in at least one output value under the same input assignment. In this case, SAT solving returns a single counterexample. An all-solutions SAT solver [GSY04, LHS04] could be used if more than one counterexample is needed. Alternatively, BDDs could be used to calculate the counterexamples for each output symbolically. For output yi, all counterexamples are represented by:

$$F_{E,y_i} \oplus F_{S,y_i} \tag{2.2}$$
But this approach is limited due to the potentially large size of BDDs. Therefore, in practice, structural information is usually exploited to simplify the problem by merging identical subcircuits, and multiple engines are applied as proposed in [KPKG02]. For diagnosis and debugging, often a description of the implementation and one or more counterexamples are used. Formally, counterexamples are described as follows:

Definition 5. Let the circuit C be a faulty implementation of a specification. A counterexample T is a triple (T, g, ν), where
- T is a simulation trace of C,
- T causes an erroneous value at gate g, and
- ν is the correct value for gate g.

A test-set T is a set of counterexamples.
For combinational equivalence checking, the trace has a length of one time frame and is defined solely over primary inputs. The fault is always observed at a primary output. If the counterexample is calculated symbolically, the trace may contain don't-care values.
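The miter construction of Example 10 can be sketched on the functional level. Instead of a SAT solver, this illustration simply enumerates all input assignments, which is feasible only for very small circuits; circuits are modeled as Python functions from input bits to output tuples, and all names are hypothetical. The result is the complete list of input assignments for which the miter output is 1, i.e. what an all-solutions SAT solver would report for the miter instance.

```python
from itertools import product
from typing import Callable, List, Tuple

Circuit = Callable[..., Tuple[int, ...]]   # input bits -> tuple of output bits


def miter(impl: Circuit, spec: Circuit) -> Callable[..., int]:
    """Output 1 iff at least one pair of corresponding outputs differs."""
    def m(*x: int) -> int:
        return int(any(a ^ b for a, b in zip(impl(*x), spec(*x))))
    return m


def counterexamples(impl: Circuit, spec: Circuit, n: int) -> List[Tuple[int, ...]]:
    m = miter(impl, spec)                   # constrain the miter output to 1
    return [x for x in product((0, 1), repeat=n) if m(*x)]
```

With X = (x1, x2, x3) and Y = (y1, y2) as in Example 10, an implementation that uses an OR instead of an XOR for y1 is reported with the two input assignments where x1 = x2 = 1, while a correct XOR realization built from AND, OR and NOT yields no counterexample.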
2.3.2 Bounded Model Checking
Model checking (or property checking) [CGP99] is a technique to formally prove the validity of a given property on a model. The property is usually given in some temporal logic, and the model is often described in terms of a labeled transition system or a finite state machine. Here, the model is described by a circuit that directly corresponds to a finite state machine: The values of the flip-flops describe a state, the values of the primary inputs describe the input symbol, and the combinational logic describes the transition function. In this context, the atomic propositions for a particular state in a labeled transition system are given by the bits with value 1 of the state vector. Essentially, each formalism can be transformed into the others.

In this book, Bounded Model Checking (BMC) is considered [BCCZ99]. The property is always checked over a finite number of time frames. The advantage of this formulation is the direct correspondence to a SAT instance. The property language may describe properties over infinite intervals, like Linear Temporal Logic (LTL) [Pnu77]. In this case, an increasing number of time frames is considered incrementally until either a counterexample is found or the state space diameter is reached. On the other hand, the property language may restrict the length of the time interval. Solving a single SAT instance is then sufficient to prove or disprove the property, which drastically increases the efficiency. A finite window restricts the expressiveness of the property language, but usually circuits also respond to stimuli within a bounded time interval. Therefore, this type of property checking is quite efficient and successfully applied in practice [WTSF04].

The SAT instance for checking a temporal property over a finite interval is shown in Figure 2.13. The circuit is “unrolled” for a finite number of time frames, and a propositional formula corresponding to the property is attached to this unrolling. The property is constrained to evaluate to 0. Therefore, the SAT instance is satisfiable if, and only if, a counterexample exists that shows the invalidity of the property on the circuit. Otherwise, the property is valid.

Figure 2.13. SAT instance for BMC (time frames 0, 1, . . . , tcyc − 1; property constrained to 0)

In more detail, the SAT instance is created as follows: For each time frame, one copy of the circuit is created. State elements are converted to inputs and outputs. The next state outputs of time frame t are connected to the present state inputs of time frame t + 1. New variables are used for every copy of the circuit in the SAT instance.

Notation 2. In time frame t, the variable g[t] is used for gate g. The Boolean expression ψg[t] denotes the Boolean constraints for gate g at time t.

Remark 2. Normally, an indexed notation is used to denote variables or different Boolean expressions, while using an array notation to identify different time frames is not usual. But this notation has the advantage of separating the time reference from other indices (e.g. the number i of a particular input xi) and thereby improves readability. Moreover, in Chapter 5 Boolean expressions are derived from simulation traces. The chosen notation helps to understand the equations more easily.

Given the constraint ψg, the constraint ψg[t] is retrieved by substituting all variables with the corresponding variables at time frame t, e.g. g is substituted by g[t]. Then, the CNF formula to describe the unrolling of circuit C = (V, E, X, Y, S, N, F, P) for tcyc time frames is given by:

$$\psi_C^{t_{cyc}} = \bigwedge_{t=0}^{t_{cyc}-1} \bigwedge_{g \in V} \psi_g[t] \cdot \bigwedge_{t=0}^{t_{cyc}-2} \bigwedge_{i=1}^{l} \underbrace{\left(\overline{n_i[t]} + s_i[t+1]\right) \cdot \left(n_i[t] + \overline{s_i[t+1]}\right)}_{n_i[t] \leftrightarrow s_i[t+1]} \tag{2.3}$$

As a result, the behavior of the circuit over time is modeled. For a bounded finite interval, the temporal property directly corresponds to a propositional formula where the variables correspond to variables of gates at particular time frames. By attaching the property to the unrolled circuit, the relationship between signals is evaluated over time. In this book, properties may be given either
1. as an LTL safety property, or
2. as a propositional formula that refers to signals of the circuit at particular time frames.
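Equation (2.3) can be turned into clauses mechanically. The sketch below unrolls a netlist for tcyc time frames, names the variable of gate g in frame t as "g[t]" (Notation 2), and adds the two clauses for every link ni[t] ↔ si[t + 1]. The netlist format, the MUX and buffer clause templates, and the function names are assumptions of this illustration; the shift-register netlist used to exercise it follows the reading of Figure 2.8 in which x2 = 1 keeps and x2 = 0 shifts.

```python
from typing import Dict, List, Tuple

Gate = Tuple[str, str, List[str]]          # (type, output, inputs)


def gate_clauses(kind: str, g: int, ins: List[int]) -> List[List[int]]:
    if kind == "mux":                      # g <-> (sel ? d1 : d0), ins = (sel, d0, d1)
        s, d0, d1 = ins
        return [[s, -d0, g], [s, d0, -g], [-s, -d1, g], [-s, d1, -g]]
    if kind == "buf":                      # g <-> i
        (i,) = ins
        return [[-i, g], [i, -g]]
    raise ValueError(f"unsupported gate type {kind}")


def unroll(gates: List[Gate], state: List[Tuple[str, str]],
           tcyc: int) -> Tuple[List[List[int]], Dict[str, int]]:
    """Clauses of Equation (2.3); `state` lists the pairs (s_i, n_i)."""
    var: Dict[str, int] = {}

    def v(name: str, t: int) -> int:       # variable g[t] for gate g in frame t
        return var.setdefault(f"{name}[{t}]", len(var) + 1)

    cnf: List[List[int]] = []
    for t in range(tcyc):                  # psi_g[t] for every gate and frame
        for kind, out, ins in gates:
            cnf += gate_clauses(kind, v(out, t), [v(i, t) for i in ins])
    for t in range(tcyc - 1):              # n_i[t] <-> s_i[t+1]
        for s, n in state:
            cnf += [[-v(n, t), v(s, t + 1)], [v(n, t), -v(s, t + 1)]]
    return cnf, var
```

For the three-register shift-register unrolled for three time frames, this produces 42 gate clauses plus 12 linking clauses over 27 variables; any simulation run of the circuit satisfies them, and breaking a single link, e.g. changing s1[1], violates them.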
At first, suppose a partial specification of the system is given as an LTL formula. Besides well-known propositional operators, also temporal connectives are available in LTL. The meaning of the temporal operators is informally introduced in the following: X p means “p holds in the next time frame” G p means “p holds in all time frames” F p means “p eventually holds in some time frame” p U q means “p holds in all time frames until q holds” A safety property does not contain the operator F (and no other construct to express this operator, e.g. G p). Now, the LTL formula Ψ has to be checked for tcyc time steps. For this purt pose a propositional formula ψΨcyc representing the specification is constructed. For each subformula Ω of Ψ and for every time frame t a new propositional variable zΩ [t] is introduced. These variables are related to each other and to the variables used in the unrolling of the circuit as follows. For the temporal connectives, the well-known expansion rules [MP91] are applied which relate the truth value of a formula to the truth values of its subformulas in the same and the next time frame. For instance, G Ψ = Ψ · X G Ψ and F Ψ = Ψ + X F Ψ. The Boolean connectives used in LTL are trivially translated to the corresponding constructs relating the propositional variables. Finally, the truth value of the atomic proposition g at time frame t is equal to the value of the corresponding variable g[t] in the unrolling of the circuit. The final requirement is that the formula is not contradicted by the behavior of the circuit. That is, zΨ [0], the variable corresponding to the specification in time frame 0, is true. As a result, property checking can be done by solving the SAT problem for the following propositional formula: t
$$\psi_{BMC} = \overline{z_\Psi[0]} \cdot \psi_\Psi^{t_{cyc}} \cdot \psi_C^{t_{cyc}} \qquad (2.4)$$
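The expansion rules can be illustrated on a single concrete trace. The following C sketch is not the SAT encoding itself. Rather than introducing a propositional variable $z_\Omega[t]$ and constraining it, it evaluates the same recurrences backwards over a finite window of k time frames. The function names, and the boundary values assumed beyond the bound (true for G, false for F and U, with U read as strong until), are assumptions of this sketch and are not taken from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers: p and q hold the values of atomic propositions in
   time frames 0..k-1; z must have room for k+1 entries, where z[k] is the
   boundary value assumed beyond the bound. */

/* G p = p . X G p  (assumption: no violation after frame k-1) */
static void eval_G(const bool *p, int k, bool *z) {
    assert(k >= 0);
    z[k] = true;
    for (int t = k - 1; t >= 0; t--)
        z[t] = p[t] && z[t + 1];
}

/* F p = p + X F p  (assumption: p is not observed after frame k-1) */
static void eval_F(const bool *p, int k, bool *z) {
    assert(k >= 0);
    z[k] = false;
    for (int t = k - 1; t >= 0; t--)
        z[t] = p[t] || z[t + 1];
}

/* p U q = q + p . X (p U q)  (strong until; boundary as for F) */
static void eval_U(const bool *p, const bool *q, int k, bool *z) {
    assert(k >= 0);
    z[k] = false;
    for (int t = k - 1; t >= 0; t--)
        z[t] = q[t] || (p[t] && z[t + 1]);
}
```

For a safety property such as G p, a trace that ends with z[0] false is exactly the kind of counterexample that the SAT instance for (2.4) searches for symbolically.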
The formula $\psi_{BMC}$ is unsatisfiable if and only if no trace of the circuit C exists for which the specification Ψ does not hold. Put more simply, $\psi_{BMC}$ is unsatisfiable if and only if Ψ is valid on C, i.e. C is a model for Ψ.

Alternatively, a property may be given directly as a propositional formula. In this case, the property considers a fixed number of time frames. The length of the window of a propositional property is the largest time step referenced by any variable plus one (the first time step is numbered zero). The property is shifted to an arbitrary time step t by adding t to each time reference.

Figure 2.14. 1-bit-shift-register (inputs x1 and x2, state registers s1, s2, s3, output y1)

Example 11. Again, consider the circuit in Figure 2.14. This is a 1-bit-shift-register with three state registers, labeled by the names of the present state nodes s1, s2 and s3. The shift-register has two modes of operation: keeping the current value (x2 = 1) and shifting (x2 = 0). In the shifting mode, the value of input x1 is shifted into the register. After three clock cycles, the value is stored in register s3. This behavior is described by the property “If x2 is zero on three consecutive time steps, the value of x1 in the first time step equals y1 in the fourth time step”, which can be written as the formula

$$\overline{x_2[t]} \cdot \overline{x_2[t+1]} \cdot \overline{x_2[t+2]} \rightarrow (x_1[t] = s_3[t+3]) \qquad (2.5)$$
The length of the window for this property is 4. Similar notions of properties are used by industrial model checking tools, e.g. [BS01]. Having a window for the property is not a restriction in practice. Very often, the length of the window corresponds to the number of cycles needed for an operation in the design. For the shift-register, this is the number of cycles needed to bring an input value to the output. For a more sophisticated design like a processor, it can be the depth of the pipeline, i.e. the number of cycles needed to process an instruction.

Finally, counterexamples are also considered to carry out diagnosis for BMC. As for equivalence checking, a counterexample is represented by a triple (T, y, ν) as described by Definition 5. This counterexample may be given either with respect to the circuit or with respect to the SAT instance representing the BMC problem. With respect to the circuit, the counterexample is a simulation trace over time, but in general no single erroneous output of the circuit is responsible for the failure of a property. With respect to the SAT instance, the failure corresponds to $z_\Psi[0]$ becoming 0 instead of 1.
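On a model this small, property (2.5) can be checked by exhaustive simulation, which mirrors what the SAT instance does symbolically for one window. The sketch below assumes a behavioral model of Figure 2.14 in which x2 = 1 keeps all three registers and x2 = 0 shifts x1 into s1, s1 into s2 and s2 into s3. The struct and function names are hypothetical. The parameter cycles generalizes the window, so a deliberately wrong variant of the property that checks s3 too early can be seen to fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical behavioral model of the 1-bit-shift-register (Figure 2.14). */
typedef struct { bool s1, s2, s3; } state;

static state step(state s, bool x1, bool x2) {
    if (x2)                         /* x2 = 1: keep the current value */
        return s;
    state n = { x1, s.s1, s.s2 };   /* x2 = 0: shift x1 -> s1 -> s2 -> s3 */
    return n;
}

/* Counts the traces violating
     !x2[0] . ... . !x2[cycles-1]  ->  (x1[0] = s3[cycles])
   over all initial states and all input sequences. For cycles == 3 this
   is property (2.5) instantiated at t = 0. */
static int count_violations(int cycles) {
    assert(cycles >= 1 && cycles <= 8);
    int violations = 0;
    for (unsigned init = 0; init < 8; init++) {
        for (unsigned in = 0; in < (1u << (2 * cycles)); in++) {
            state s = { init & 1, (init >> 1) & 1, (init >> 2) & 1 };
            bool x1_first = in & 1;          /* x1 in the first time step */
            bool antecedent = true;
            for (int t = 0; t < cycles; t++) {
                bool x1 = (in >> (2 * t)) & 1;
                bool x2 = (in >> (2 * t + 1)) & 1;
                antecedent = antecedent && !x2;
                s = step(s, x1, x2);
            }
            /* s now holds the register contents in time step `cycles` */
            if (antecedent && s.s3 != x1_first)
                violations++;
        }
    }
    return violations;
}
```

Under this model, count_violations(3) returns 0. count_violations(2) reports violations, because after only two shifts s3 still holds an older value.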
2.4 Automatic Test Pattern Generation
This section provides the notions needed to introduce Automatic Test Pattern Generation (ATPG). First, circuits and fault models are presented. Then, the reduction of a sequential ATPG problem to a combinational problem is explained. Finally, classical ATPG algorithms working on the circuit structure are briefly reviewed. The presentation is kept brief; for further reading we refer to, e.g., [JG03].
2.4.1 Fault Models
After a chip has been produced, its functional correctness has to be checked. Without this check, an erroneous chip may be delivered to customers, which in turn may cause a malfunction of the final product. This, of course, is not acceptable. A large range of malfunctions is possible due to defects in the material, process variations during production, etc. Directly checking for all possible physical defects is not feasible, so an abstraction in terms of a fault model is typically introduced. The Stuck-At Fault Model (SAFM) [BF76] is well known and widely used in practice. In this fault model, a single line is assumed to be stuck at a fixed value instead of depending on the input values. When a line is stuck at the value 0, this is called a stuck-at-0 (SA0) fault. Analogously, if the line is stuck at the value 1, this is a stuck-at-1 (SA1) fault.

Example 12. Figure 2.15(a) repeats the circuit from Example 9. When an SA0 fault is introduced on line a, the faulty circuit in Figure 2.15(b) is created. The output of the AND-gate is disconnected and the upper input of the OR-gate constantly assumes the value 0.

Figure 2.15. Example for the SAFM: (a) correct circuit with inputs x1, x2, x3, internal lines a and b, and output y; (b) faulty circuit with line a stuck at 0

Besides the SAFM, a number of other fault models have been proposed. Examples are the cellular fault model [Fri73], where the function of a single gate is changed, and the bridging fault model [KP80], where two lines are assumed to settle to a single value. These fault models mainly cover static physical defects like opens or shorts. Dynamic effects are covered by delay fault models, for example, the Path-Delay Fault Model (PDFM) [Smi85]. In the PDFM, it is checked whether the propagation delays of all paths in a given circuit are less than the system clock interval. To detect a path delay fault, a pair of patterns (I1, I2) is required rather than a single pattern as in the SAFM. The initialization vector I1 is applied and all signals of the circuit are allowed to stabilize. Then, the propagation vector I2 is applied, and after the system clock interval the outputs of circuit C are checked.

Definition 6. A two-pattern test is called a robust test for a path delay fault (RPDF test) on a path if it detects that fault independently of all other delays in the circuit and of all other delay faults not located on this path.

An even stronger property can be required of PDF tests. Consider a robust test (I1, I2) that sets all off-path inputs to noncontrolling values on application of I1, where these values remain stable during application of I2, i.e. the values on the off-path inputs are not invalidated by hazards or races. Such a test is called a strong RPDF test. In the following, only such tests are used, but for simplicity they are called RPDF tests, too. For a detailed classification of PDFs see [PR90].
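The side-input condition of a strong RPDF test can be sketched as a simple check. The C code below is a deliberately simplified two-valued abstraction. It assumes that the values of the path input and of each off-path input under I1 and I2 are already known, and it treats “stable” as “equal under I1 and I2”. A real strong RPDF check must also exclude hazards and races between the two vectors, which would need a multi-valued logic and is not modeled here. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { GATE_AND, GATE_NAND, GATE_OR, GATE_NOR } gate_type;

/* Noncontrolling value: 1 for AND/NAND, 0 for OR/NOR. */
static int noncontrolling(gate_type g) {
    return (g == GATE_AND || g == GATE_NAND) ? 1 : 0;
}

/* An off-path (side) input of a gate on the tested path, with its value
   under the initialization vector I1 and the propagation vector I2. */
typedef struct {
    gate_type gate;
    int v1, v2;
} side_input;

/* Simplified strong-RPDF condition: the path input makes a transition
   between I1 and I2, and every off-path input holds the noncontrolling
   value of its gate under I1 and keeps it under I2. */
static bool strong_rpdf_ok(int launch_v1, int launch_v2,
                           const side_input *side, size_t n) {
    assert(side != NULL || n == 0);
    if (launch_v1 == launch_v2)
        return false;                 /* no transition is launched */
    for (size_t i = 0; i < n; i++) {
        int nc = noncontrolling(side[i].gate);
        if (side[i].v1 != nc || side[i].v2 != nc)
            return false;             /* side input controlling or unstable */
    }
    return true;
}
```

A side input that changes from 1 to 0 at an AND gate on the path violates the condition, even though the launch transition is present.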
2.4.2 Combinational ATPG
Automatic Test Pattern Generation (ATPG) is the task of calculating a set of test patterns for a given circuit with respect to a fault model. A test pattern for a particular fault is an assignment to the primary inputs of the circuit that leads to different output values depending on the presence of the fault. Calculating the Boolean difference of the faulty circuit and the fault-free circuit yields all test patterns for a particular fault. This construction is similar to the miter circuit [Bra83] used for combinational equivalence checking (see Section 2.3.1). In this sense, formal verification and ATPG are similar problems [AFK88].

Example 13. Again, consider the SA0 fault in the circuit in Figure 2.15. The input assignment x1 = 1, x2 = 1, x3 = 1 leads to the output value y = 1 for the correct circuit and to the output value y = 0 if the fault is present. Therefore, this input assignment is a test pattern for the fault a SA0. The construction to calculate the Boolean difference of the fault-free circuit and the faulty circuit is shown in Figure 2.16.

Figure 2.16. Boolean difference of the faulty circuit and the fault free circuit (the fault-free copy with lines a, b and output y and the faulty copy with lines a′ (stuck at 0), b′ and output y′ are combined in the block BD)

A similar approach can be used to calculate tests for the dynamic PDFM. In this case, the circuit is either unrolled for two time frames or a multi-valued logic is applied to model the value of a gate in two subsequent time frames. Additional constraints apply to gates along the path to be tested to force different values in the two time frames. As a result, two test patterns are calculated to test for a PDF. For a strong RPDF test, the side inputs to the path have to be set to noncontrolling values, and the absence of hazards has to be ensured by extra constraints.

Definition 7. A fault is testable when a test pattern exists for that fault. A fault is untestable when no test pattern exists for that fault.

Deciding whether a fault is testable is an NP-complete problem [IS75]. The aim is to classify all faults and to create a set of test patterns that contains at least one test pattern for each testable fault.

Generating test patterns for circuits that contain state elements like flip-flops is computationally more difficult, because the state elements cannot directly be set to a particular value. Instead, the behavior of the circuit over time has to be considered during ATPG. For example, the circuit can be unrolled similarly to BMC. In ATPG, this is frequently called the iterative logic array. Moreover, a number of tools have been proposed that directly address the sequential problem, e.g. HITEC [NP91] or the sequential SAT solver SATORI [IPC03]. In practice, however, the resulting model is often too complex to be handled by ATPG tools. To overcome this problem, full scan mode is usually applied by connecting all state elements into a scan chain [WA73, EW77]. In test mode, the scan chain combines all state elements into a shift-register. In normal operation mode, the state elements are driven by the ordinary logic in the circuit. As a result, the state elements can be considered as primary inputs and outputs for testing purposes, and a combinational problem results.
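For very small circuits, the Boolean difference can be evaluated by brute force, which makes Definition 7 concrete. The C sketch below uses a hypothetical circuit rather than the one of Figure 2.15, because the gate driving line b there cannot be recovered: a = x1·x2, b = x1·x2·x3, y = a + b. It compares the fault-free and the faulty circuit on all 2^3 input patterns and returns the set of test patterns as a bit mask. Enumeration is exponential in the number of inputs and serves only as an illustration; practical ATPG relies on the structural algorithms reviewed below or on SAT.

```c
#include <assert.h>

/* Lines of the hypothetical circuit y = a + b, a = x1 x2, b = x1 x2 x3. */
enum line { LINE_A, LINE_B, LINE_Y, NO_FAULT };

/* Evaluates the circuit for an input pattern (bit 2 = x1, bit 1 = x2,
   bit 0 = x3), optionally with one line stuck at stuck_val. */
static int eval(unsigned pattern, enum line fault, int stuck_val) {
    assert(pattern < 8);
    int x1 = (pattern >> 2) & 1, x2 = (pattern >> 1) & 1, x3 = pattern & 1;
    int a = x1 & x2;
    if (fault == LINE_A) a = stuck_val;
    int b = x1 & x2 & x3;
    if (fault == LINE_B) b = stuck_val;
    int y = a | b;
    if (fault == LINE_Y) y = stuck_val;
    return y;
}

/* Boolean difference by enumeration: bit p of the result is set iff
   pattern p yields different outputs for the fault-free and the faulty
   circuit, i.e. iff p is a test pattern for the fault. */
static unsigned test_patterns(enum line fault, int stuck_val) {
    unsigned tests = 0;
    for (unsigned p = 0; p < 8; p++)
        if (eval(p, NO_FAULT, 0) != eval(p, fault, stuck_val))
            tests |= 1u << p;
    return tests;
}
```

For line a SA0, the only test is x1 = 1, x2 = 1, x3 = 0 (pattern 6), since for x3 = 1 line b masks the fault. Line b SA0 yields the empty set because b = 1 implies a = 1. This fault is therefore untestable in the sense of Definition 7, which is a typical consequence of redundancy.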
2.4.3 Classical ATPG Algorithms
Classical algorithms for ATPG usually work directly on the circuit structure to solve the ATPG problem for a particular fault. Some of these algorithms are briefly reviewed in the following. For an in-depth discussion, the reader is referred to textbooks on ATPG, e.g. [JG03].
One of the first complete algorithms dedicated to ATPG for the SAFM was the D-algorithm proposed by Roth [Rot66]. Its basic ideas can be summarized as follows. A fault is observed due to differing values at a line in the faulty circuit and the fault-free circuit. Such a divergence is denoted by the value D or $\overline{D}$ to mark the difference 1/0 or 0/1, respectively. Instead of Boolean values, the set $\{0, 1, D, \overline{D}\}$ is used to evaluate gates and carry out implications. A gate that is not on a path between the fault and any output never has a D-value. A necessary condition for testability is the existence of a path from the fault location to an output where all intermediate gates either have a D-value or are not assigned yet. Such a path is called a potential D-chain. A gate is on a D-chain if it is on a path from the fault location to an output and all intermediate gates have a D-value.

On this basis, an ATPG algorithm can focus on justifying a D-value at the fault site and propagating this D-value to an output, as shown in Figure 2.17. The algorithm starts by injecting the D-value at the fault site. Then, this value has to be propagated towards the outputs. For example, to propagate the value D at one input through a 2-input AND-gate, the other input must have the noncontrolling value 1. After reaching an output, the search proceeds towards the inputs in the same manner to justify the D-value at the fault site. At some stages of the search, decisions are possible. For example, to produce a 0 at the output of an AND-gate, either one or both inputs can have the value 0. Such a decision may be wrong and may lead to a conflict later on. Due to reconvergence, as shown in Figure 2.17, conditions resulting from propagation may prevent justification. In this case, a backtrack search has to be applied.

Figure 2.17. Justification and propagation (fault site, justification towards the inputs, propagation towards the outputs, and a reconvergent path)

In summary, the D-algorithm is confronted with a search space of $O(2^{|C|})$ for a circuit with |C| signals including inputs, outputs and internal signals.

A number of improvements have been proposed for this basic procedure. PODEM [Goe81] branches only on the values of primary inputs. This reduces the search space for test pattern generation to $O(2^n)$ for a circuit with n primary inputs. As a disadvantage, however, time is wasted when all internal values are implied from an input pattern that finally does not detect the fault. Fan [FS83] improves upon this problem by branching on fanout stems as well. As a result, internal structures that cause a conflict when trying to generate the test pattern are detected earlier. The branching order and value assignments are determined by heuristics that rely on observability measures to predict a “good” variable assignment for justification or propagation, respectively. Moreover, the algorithm keeps track of a justification frontier moving towards the inputs and a propagation frontier moving towards the outputs. Therefore, Fan can make the “most important decision” first, based on a heuristic. The D-algorithm, in contrast, applies a more static order by propagating the fault first and justifying the assignments for preceding gates afterward.

Socrates [STS87] includes global static implications derived from the circuit structure. Based on particular structures in the circuit, indirect implications are possible. These are implications that do not follow directly from assignments at a single gate, but result from functional arguments taking several gates into account. These indirect implications are applied during the search to imply values earlier from partial assignments and thereby prevent wrong decisions. Hannibal [Kun93] further improves this idea. While Socrates only uses a predefined set of indirect implications, Hannibal learns from the circuit structure in a preprocessing step.
For this task, recursive learning [KP94] is applied. In principle, recursive learning is complete by itself but too time-consuming to be used as a stand-alone procedure. Therefore, learning is done in a preprocessing step. During this step, the effect of value assignments is calculated and the resulting implications are learned. These implications are stored for the following run of the search procedure. In Hannibal, the Fan algorithm was used to realize this search step. Even though several improvements have been proposed to increase the efficiency of ATPG algorithms, the worst-case complexity is still exponential. Synthesis for testability considers the ATPG problem already during synthesis and thereby creates circuits with good testability.
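The four-valued calculus of the D-algorithm can be sketched by encoding each value as the pair (value in the fault-free circuit, value in the faulty circuit), so that D = 1/0 and $\overline{D}$ = 0/1 as introduced above. With this encoding, gate evaluation is simply the Boolean operation applied to both components. The encoding is an assumption of this sketch rather than Roth's original formulation. The unassigned value needed for potential D-chains is omitted, and all names are hypothetical.

```c
#include <assert.h>

/* Bit 1: value in the fault-free circuit, bit 0: value in the faulty circuit. */
typedef enum { V0 = 0, V_DBAR = 1, V_D = 2, V1 = 3 } dval;

static dval d_make(int good, int faulty) {
    assert((good == 0 || good == 1) && (faulty == 0 || faulty == 1));
    return (dval)((good << 1) | faulty);
}

/* Gates act componentwise on the (good, faulty) pair. */
static dval d_and(dval a, dval b) { return (dval)(a & b); }
static dval d_or(dval a, dval b)  { return (dval)(a | b); }
static dval d_not(dval a)         { return (dval)(~a & 3); }
```

For example, d_and(V_D, V1) yields V_D, so the fault effect propagates because 1 is the noncontrolling value of AND. In contrast, d_and(V_D, V0) yields V0 and the effect is blocked, and d_and(V_D, V_DBAR) also yields V0.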
Chapter 3 ALGORITHMS AND DATA STRUCTURES
The technique proposed in this chapter cannot be attributed exclusively to a single step in the design flow. Instead, the underlying techniques for Boolean function manipulation are adapted to the needs of particular subsequent steps. Binary Decision Diagrams (BDDs) [Bry86] and solvers for the Boolean Satisfiability (SAT) problem [DP60, Coo71] are the state of the art for Boolean function manipulation. Both approaches have individual advantages. In the past, many researchers have proposed techniques to improve the efficiency of these algorithms, e.g. [MS96, BRB90, MMZ+01, Som01b, ES04]. A new data structure is presented that combines paradigms of SAT solvers and BDDs. Heuristics allow trading off BDD-like symbolic manipulation of Boolean functions against SAT-like search in the Boolean space. This can improve robustness or can be exploited to retrieve more detailed information about particular parts of the solution space. The approach was first presented in [DFK06]. The link of this technique to other steps within the design flow is outlined at the end of this chapter.
3.1 Combining SAT and BDD Provers
Besides BDDs, SAT provers are an efficient, and often more robust, technique to handle Boolean problems. Experimental studies have shown that the two techniques are orthogonal, i.e. there exist problems where BDDs work well while SAT solvers fail and vice versa. This trade-off can even be formally proven [GZ03]. BDDs and SAT provers are very different in nature. While BDDs compute all solutions in parallel, they require a large amount of memory. In contrast, SAT is very efficient regarding memory consumption but only gives a single solution. There are many applications where multiple solutions are needed (see, e.g. [HTFM03] or Section 6.2).

Motivated by these observations, many authors have tried to combine the best of the two approaches by applying SAT solvers and BDDs alternately or iteratively. Even though remarkable results have been obtained, none of these approaches has considered an integration of the two methods within a single data structure. In this section, the first hybrid approach that tightly combines BDDs and SAT is presented. Even though the overall principles of the two techniques are very different, there are also some similarities. In both concepts, starting from a Boolean description, the problem is decomposed by assigning a Boolean value to a variable, as already observed in [RDO02]. Building on this, the concept of expansion nodes is introduced. The given Boolean problem is initially represented by a single expansion node that is recursively expanded. If this is done in a strict Depth First Search (DFS) manner, the resulting algorithm is close to a SAT procedure. If, however, all operations are carried out symbolically, the algorithm computes a BDD. The relation between the two approaches is discussed in more detail later. Experimental results demonstrate the efficiency of the approach.

The section is structured as follows. Other approaches to extend Boolean proof techniques and the relation between SAT and BDDs are discussed in Section 3.1.1. The new hybrid approach is presented in Section 3.1.2. In Section 3.1.3, experimental results are given.
3.1.1 Proof Techniques
In the following, earlier work related to the hybrid approach is discussed. Different extensions have been suggested for both concepts, SAT provers and BDDs. Then, the relations between both concepts are briefly reviewed.
3.1.1.1 Extensions
Streaming BDDs have been proposed to reduce the memory requirements [Min02]. The idea is to represent a BDD as a bracketed sequence, which can be processed sequentially using limited memory. However, this can only be done by giving up canonicity. In the context of extensions of the classical BDD concept introduced by Bryant (see Section 2.1.2), some approaches have been presented that make use of different types of functional nodes. The approach in [RBKM91] keeps control of the memory needed for the BDD construction by projecting some parts of the graph to a new terminal node U (= unknown). Instead of completely calculating each subgraph, the calculation may be stopped at a given depth and the complete subtree is replaced by the terminal node U. As a result, exactness cannot be recovered afterward.
Nodes to represent the exclusive-or of the children have been introduced in [MS98]. The purpose of these nodes is to reduce the size of the BDD. Then, probabilistic methods are applied to find a satisfying assignment. Extended BDDs as proposed in [JPHS91] apply existential and universal quantification as edge attributes. By introducing a “structural variable” s, the equality $\exists s\, f = f_s + f_{\overline{s}}$ can be exploited to represent the Boolean operation f + g in terms of a node v. This can be seen as follows. Let v be a node and f and g be the Boolean functions represented by its children. Then, v represents the function $s f + \overline{s} g$. Now, assume an incoming edge has the attribute for existential quantification. The function represented by this edge is retrieved as follows:

$$\exists s\,(s f + \overline{s} g) = (s f + \overline{s} g)_s + (s f + \overline{s} g)_{\overline{s}} = f + g$$

where the first equality is the identity for existential quantification introduced above.
Similarly, universal quantification is used to represent f · g. These structural variables allow controlling the size of the extended BDD. Again, the problem is to find a satisfying assignment of the resulting extended BDD. The same principle was exploited in [HDB96]. By introducing extra nodes at the top level of two BDDs, a Boolean operation is represented. Then, these nodes are moved towards the terminals by exchanging adjacent variables [Rud93]. At the terminals, these nodes can be eliminated. In both cases, the use of new variables implies that a new level is introduced in the shared BDD structure. The approach was further extended in [AH97] to Boolean Expression Diagrams (BEDs), which introduce functional nodes that directly represent Boolean operations. Again, these nodes can be eliminated by swapping adjacent levels in the BED. If a BED is built from a description of a circuit, its size is similar to the circuit size. All of these approaches are presented as extensions of BDDs. The advantage of using SAT-like algorithms on such a structure has not been considered.

Another recent direction of research is efficient all-solution SAT solvers, which do not stop after reaching the first satisfying assignment but calculate all satisfying assignments, e.g. [LHS04]. A drawback of these approaches is the potentially large representation of all solutions, usually as cubes or as BDDs. In contrast, the hybrid approach targets applications where not all solutions but a set of good solutions is needed. Recently, several techniques have been proposed to combine BDDs and SAT solvers (see, e.g. [GYAG00, KPKG02, CNQ03, SFVD05]), but no real integration is done. Instead, the proof engines are started one after the other, or alternately. Such combinations have often achieved good experimental results, which demonstrates the potential of an integrated approach.
3.1.1.2 Relations
BDDs and SAT solvers are most frequently used as complete proof techniques and for the symbolic manipulation of Boolean functions. Both techniques have advantages and disadvantages. BDDs represent all solutions in parallel at the cost of large memory requirements. SAT solvers only provide a single solution, while their memory consumption is relatively low. In [RDO02], the relation between BDDs and SAT has been studied from a theoretical point of view. It has been proven that the BDD corresponds to a complete representation of the SAT backtrack tree if a fixed variable order is assumed. As a motivation for the next section, where the hybrid approach is described in more detail, an example is given to show the difference between SAT and BDDs. We will come back to this example later. Example 14. Consider the Boolean function f over four variables given by f
= (x1 + x2 + x3 )(x1 + x2 + x4 )(x1 + x2 + x4 ) (x1 + x2 + x3 )(x1 + x2 + x3 + x4 )
A sketch of the search tree if the function is processed by a SAT solver is shown in Figure 3.1(a). The corresponding BDD is given in Figure 3.1(b) for the variable order π = (x1 , x2 , x3 , x4 ). As can be seen, the SAT solver by construction only gives a single solution while the BDD represents all satisfying assignments in parallel at the cost of a larger number of nodes.
3.1.2 Hybrid Approach
In this section, the hybrid approach for BDD and SAT integration is presented. First, the overall idea is given. Then, the concept of expansion nodes is introduced followed by a discussion of expansion heuristics. Finally, comments on some issues related to an efficient implementation are provided.
3.1.2.1 Basic Idea
In the hybrid approach, processing starts with symbolic operations analogous to BDDs. For these operations, the basic operators for XOR and AND (see Section 2.1.2) have been modified. During the starting phase, the constructed graphs are simply BDDs. When BDDs are composed, however, a heuristic decides which parts of the solution space are explored. To guarantee the exactness of the algorithm, i.e. that no solution is missed, a node is introduced wherever exploration is deferred, so that the computation can be resumed there later. These nodes are called expansion nodes in the following. As a result, the hybrid approach stores all necessary information, resulting in a complete proof method.
Figure 3.1. Different approaches: (a) SAT search tree, (b) BDD, (c) sketch of the hybrid approach, (d) hybrid representation
A sketch of a configuration during the run is shown in Figure 3.1(c). In this case, the upper part is “SAT-like”, while the lower part is a complete symbolic representation as it occurs in BDDs. The expansion nodes are denoted by E. The decomposition nodes are labeled by variables, and these variables occur in the same order on all paths. In the following, such graphs, which allow a smooth transition between SAT and BDDs, are called hybrid structures.
Figure 3.2. Overview of the different node types: (a) terminal node, (b) decomposition node labeled by xi with successors low and high, (c) expansion node E labeled by an operation op with successors f and g
Remark 3. Several expansion nodes in a hybrid structure may represent the same function. This cannot be detected before completely expanding the node. Thus, a hybrid structure is not a canonical representation of Boolean functions.
3.1.2.2 Expansion Nodes
The hybrid approach makes use of three types of nodes (see Figure 3.2):
(a) Terminal nodes
(b) Decomposition nodes
(c) Expansion nodes
The first two can also be found in BDDs. Terminal nodes represent the constant functions 0 and 1. In decomposition nodes, the Shannon decomposition is carried out. Expansion nodes are labeled by a Boolean operation op and have two successors f and g that represent Boolean functions (which are also denoted by f and g for simplicity). The expansion node represents the function f op g.

Example 15. Consider again the function from Example 14 and Figures 3.1(a) and 3.1(b). A possible hybrid structure is shown in Figure 3.1(d). It results if the top variable is decomposed in only one direction, while an expansion node is placed on the other branch. As can be seen, this structure is more memory efficient: compared to the BDD, five instead of seven nodes are needed. At the same time, three solutions are represented, in contrast to the SAT approach, which only returns a single solution. This simple example demonstrates that the approach combines the two proof techniques SAT and BDD. A crucial question is where to place the expansion nodes. A heuristic for this purpose is presented in the next section.
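The semantics of the three node types can be made concrete by an evaluator that computes the value of a hybrid structure under a complete variable assignment. Terminals return their constant. Decomposition nodes follow the branch selected by their variable, which is the Shannon decomposition. Expansion nodes combine the values of f and g with the stored operation. The C sketch below is hypothetical and uses a plain struct instead of the CUDD-based layout of the implementation. Complemented edges are omitted, and only AND and XOR are supported as stored operations.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TERMINAL, DECOMP, EXPANSION } node_kind;
typedef enum { OP_AND, OP_XOR } node_op;

typedef struct hnode {
    node_kind kind;
    int value;                    /* TERMINAL: constant 0 or 1 */
    int var;                      /* DECOMP: index of the variable */
    node_op op;                   /* EXPANSION: stored operation */
    const struct hnode *hi, *lo;  /* DECOMP: high/low; EXPANSION: f/g */
} hnode;

/* Value of the function represented by v under the assignment x. */
static int eval(const hnode *v, const int *x) {
    assert(v != NULL);
    switch (v->kind) {
    case TERMINAL:
        return v->value;
    case DECOMP:
        return x[v->var] ? eval(v->hi, x) : eval(v->lo, x);
    case EXPANSION: {
        int f = eval(v->hi, x), g = eval(v->lo, x);
        return v->op == OP_AND ? (f & g) : (f ^ g);
    }
    }
    return -1;                    /* not reached for valid node kinds */
}
```

Expanding an expansion node, by contrast, replaces it symbolically by a structure for f op g. The evaluator only shows which function is represented, without carrying out any expansion.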
3.1.2.3 Expansion Heuristics
Inserting expansion nodes at suitable locations is crucial for the approach to work. If too many expansion nodes are inserted, no solutions can be read off the structure directly. Only structures without a path to a terminal are constructed, and expanding the partial trees takes most of the run time until a solution is computed.
Not inserting enough expansion nodes leads to a memory blow-up as known from BDDs. In a BDD-based approach, the final solutions are computed by composing intermediate BDDs. The same holds for the new approach. The following steps are necessary to retrieve solutions:
1. Build BDDs for basic functions without any expansion nodes.
2. Compose the basic functions and insert expansion nodes according to a predetermined heuristic.
3. Select expansion nodes to calculate the final solutions.
Which functions are considered as basic functions in Step 1 depends on the problem and the input format; e.g. projection functions and cubes were chosen in the experiments. Building BDDs for these basic functions is not necessary for the approach to work. However, having the basic functions completely represented improves the performance drastically by reducing the number of necessary expansions.

The following two heuristics to limit the size of the resulting hybrid structure in Step 2 have been evaluated:
(S1) A fast procedure is to directly limit the memory consumption. This limit can be checked efficiently. Once the limit is reached, no further decomposition nodes are created; instead, expansion nodes are generated. Consequently, the memory limit has to be increased by a user-defined value before an expansion is performed.
(S2) The second procedure is to limit the number of nodes in a subgraph to a certain threshold. Tracking this limit is computationally more expensive. But allowing more than n nodes in a subgraph guarantees that there is at least one path to a terminal node, i.e. for at least one assignment the function can directly be evaluated.

The selection of nodes to expand in Step 3 has been evaluated using two other heuristics:
(E1) Randomly.
(E2) Heuristically (using the algorithm in Figure 3.3): the hybrid structure is traversed in a depth-first manner until an expansion node is reached. This node is selected and then expanded by carrying out the stored operation. The same scheme is applied recursively if further selections are necessary.
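node *DFS(node *v) {
    node *tmp;
    if (isTerminal(v)) return NULL;  /* a terminal contains no expansion node */
    tmp = DFS(Then(v));              /* search the first successor */
    if (tmp) return tmp;
    if (isExpNode(v)) return v;      /* nothing found below: select v */
    tmp = DFS(Else(v));              /* otherwise search the second successor */
    return tmp;
}
Figure 3.3. Depth first traversal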
Here, (E2) also heuristically ensures a moderate growth of the memory requirements. Experimental studies showed that the combination of a hard limit on memory consumption (S1) with deterministic DFS (E2) gives the best results, i.e. small run times and a large number of solutions. From a more general point of view, this combination of heuristics leads to a SAT-like search tree in the upper part of the hybrid structure, which is enriched by a BDD-like lower part.

Remark 4. When heuristics (S1) and (E2) are used in combination, the search space is traversed similarly to “BDDs at SAT leaves” as introduced in [GYAG00, GYA+01]. But the proposed hybrid structure is more general in the sense that switching between SAT-like and BDD-like behavior is subject to heuristics.

Remark 5. Canonicity is also an issue during expansion. When a node is expanded, the result may be a function that is already represented by another node. The hybrid structure can be reduced at a computational cost linear in the number of nodes using an algorithm similar to [SW93]. In the implementation, no reduction was carried out to save run time.
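A runnable version of the depth first traversal in Figure 3.3 needs a minimal node type. In Figure 3.4, the children of a decomposition node and the operands of an expansion node share storage in a union. Following this, the sketch below assumes that Then and Else return the operands F and G when applied to an expansion node. The node layout and helper functions are hypothetical stand-ins and not CUDD's API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TERMINAL, DECOMP, EXPANSION } node_kind;

/* child[0]/child[1]: Then/Else of a decomposition node, or the operands
   F/G of an expansion node (assumed to share storage, as in Figure 3.4). */
typedef struct node {
    node_kind kind;
    struct node *child[2];
} node;

static int isTerminal(const node *v) { return v->kind == TERMINAL; }
static int isExpNode(const node *v)  { return v->kind == EXPANSION; }
static node *Then(node *v) { return v->child[0]; }
static node *Else(node *v) { return v->child[1]; }

/* Figure 3.3: returns the first expansion node found by a depth first
   traversal that explores the Then (or F) branch before v itself and
   the Else (or G) branch after it; NULL if there is none. */
static node *DFS(node *v) {
    node *tmp;
    assert(v != NULL);
    if (isTerminal(v)) return NULL;
    tmp = DFS(Then(v));
    if (tmp) return tmp;
    if (isExpNode(v)) return v;
    tmp = DFS(Else(v));
    return tmp;
}
```

Repeatedly selecting a node with DFS, expanding it and traversing again corresponds to heuristic (E2). The expansion step itself is not shown.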
3.1.2.4 Implementation
The technique described above has been integrated into the CUDD package [Som01a], from which the core data structures are taken. To store the expansion nodes, the node structure has been extended (see Line 8 in Figure 3.4). The structure for the new type is given in Lines 12–15. For an expansion node, the operation has to be stored as well. For reasons of efficiency, only two types of operations are stored: AND and XOR. Negation is realized by complemented edges (see Section 2.1.2). All other Boolean operators are mapped accordingly. The information is stored in the index of each node. The complete encoding is given in Table 3.1: three indices have a special meaning, while all remaining ones are used for decomposition variables.
1   struct node {
2     halfWord index;
3     halfWord ref;
4     node *next;
5     union {
6       terminal value;
7       children kids;
8       expNode func;
9     };
10  };
11
12  struct expNode {
13    node *F;
14    node *G;
15  };
Figure 3.4. Modified node structure
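The index encoding of Table 3.1 and the reduction of other operators to AND and XOR can be sketched as follows, assuming a 16-bit index field as implied by the largest index value 65535. The function names are hypothetical. The mapping of OR, NAND, NOR and XNOR onto a stored AND or XOR with complemented operands or result is one possible choice based on De Morgan's laws, not necessarily the one used in the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Special index values of Table 3.1 (16-bit index field). */
enum {
    MAX_DECOMP_INDEX = 65532,
    XOR_NODE_INDEX   = 65533,
    AND_NODE_INDEX   = 65534,
    TERMINAL_INDEX   = 65535
};

typedef enum { NT_DECOMP, NT_XOR, NT_AND, NT_TERMINAL } node_type;

static node_type type_of_index(uint16_t index) {
    switch (index) {
    case XOR_NODE_INDEX: return NT_XOR;
    case AND_NODE_INDEX: return NT_AND;
    case TERMINAL_INDEX: return NT_TERMINAL;
    default:
        assert(index <= MAX_DECOMP_INDEX);
        return NT_DECOMP;       /* index names a decomposition variable */
    }
}

typedef enum { BOP_AND, BOP_NAND, BOP_OR, BOP_NOR, BOP_XOR, BOP_XNOR } bool_op;

/* A stored operation plus complement flags on f, g and the result. */
typedef struct { node_type stored; int neg_f, neg_g, neg_out; } mapped_op;

static mapped_op map_op(bool_op o) {
    switch (o) {
    case BOP_AND:  return (mapped_op){ NT_AND, 0, 0, 0 };
    case BOP_NAND: return (mapped_op){ NT_AND, 0, 0, 1 };
    case BOP_OR:   return (mapped_op){ NT_AND, 1, 1, 1 };  /* f+g = !(!f . !g) */
    case BOP_NOR:  return (mapped_op){ NT_AND, 1, 1, 0 };  /* !(f+g) = !f . !g */
    case BOP_XOR:  return (mapped_op){ NT_XOR, 0, 0, 0 };
    case BOP_XNOR: return (mapped_op){ NT_XOR, 0, 0, 1 };
    }
    assert(0);
    return (mapped_op){ NT_AND, 0, 0, 0 };
}

/* Evaluates a mapped operation on the Boolean values f and g. */
static int apply_mapped(mapped_op m, int f, int g) {
    f ^= m.neg_f;
    g ^= m.neg_g;
    int r = (m.stored == NT_AND) ? (f & g) : (f ^ g);
    return r ^ m.neg_out;
}
```

In a diagram package, the three complement flags correspond to complemented edges, so only the two stored operations from Table 3.1 are needed.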
Table 3.1. Index of node types (32-bit)
Node type              Index
Decomposition nodes    0 - 65532
XOR-node               65533
AND-node               65534
Terminal node          65535

3.1.3 Experimental Results
In the following, the results of two types of experiments are analyzed. First, the well-known n-Queens problem is considered as an example of a combinatorial problem that has a large number of solutions and on whose large instances BDDs perform poorly. Second, the synthesis problem of minimizing ESOP representations is studied as an optimization problem that is known to be hard. All experiments have been carried out on an Intel Pentium 4 processor with 3 GHz and 1 GB of main memory running Linux.
3.1.3.1 n-Queens
The n-Queens problem is a well-known combinatorial problem. The objective is to place n queens on an n × n board such that no queen can be captured by another one. An example of a solution of the 5-Queens problem is shown in Figure 3.5. This game problem is encoded using $n^2$ binary input variables, each one deciding whether or not a queen is placed on the corresponding field of the chess board. Obviously, the constraints are to place one queen per row and column and at most one queen per diagonal.

Figure 3.5. Solution to the 5-Queens problem

Table 3.2. Heuristics to limit the size of the hybrid structure (run times in CPU seconds)
                BDD                  Limit for the size
                                     Memory (S1)            Subgraph (S2)
 n     #Sol.     Time        Time      Overhead      Time       Overhead
 6         4     0.00        0.00      –             0.01       –
 7        40     0.01        0.01      0.00 %        0.03       200.00 %
 8        92     0.05        0.06      20.00 %       0.18       260.00 %
 9       352     0.37        0.37      0.00 %        1.30       251.35 %
10       724     1.56        1.59      1.92 %        8.20       425.64 %
11      2680     7.81        7.82      0.13 %       62.39       698.84 %
12     14200    48.12       48.54      0.87 %      490.33       918.97 %
13     73712   352.11      353.21      0.31 %     4566.75      1196.97 %

In a first experiment, the heuristics to limit the size were considered. For all experiments, the limits were loose enough to retrieve all solutions. Thus, the overhead of the heuristics to limit the size can be measured directly in comparison to BDDs. Results are reported in Table 3.2, which gives the number of solutions for increasing values of n and the run times in CPU seconds for BDDs and the two heuristics introduced in Section 3.1.2.3. The resource requirements for BDDs increase rapidly, and no solutions could be retrieved for n > 13. The computational overhead of limiting the size of subgraphs using heuristic (S2) is also too large. Directly limiting the memory consumption according to heuristic (S1), however, introduces almost no overhead. The direct limit has been used to restrict the size in all remaining experiments. The performance of heuristics to select nodes for expansion has been investigated in the next experiment. Expansion was carried out until a total memory
47
Algorithms and Data Structures Table 3.3. Selection of expansion nodes n 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
#Var 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361 400 441
Randomly (E1) #Sol. Time 0 0.00 2 0.00 10 0.00 4 0.00 40 0.02 92 0.06 352 0.37 724 2.10 2680 16.54 14200 158.86 73712 2062.39 0 384.45 0 289.01 0 652.64 0 1366.25 0 693.13 0 529.37 0 1923.07 0 1957.39
DFS (E2) #Sol. Time 0 0.00 2 0.00 10 0.00 4 0.00 40 0.01 92 0.06 352 0.37 724 1.83 2680 10.30 14200 73.34 73712 578.54 56672 1836.93 33382 1669.50 20338 2555.35 5061 2055.97 204 2238.79 1428 3357.97 38 1592.94 111 1972.60
limit of 750 MB was reached. Due to the expansion of subfunctions, more than one solution can be contained in the final representation. The results are shown in Table 3.3. Up to n = 13 all solutions were obtained with both heuristics. Then, the random selection performs very poorly. When expanding the last node in a cascade of expansion nodes, new decomposition nodes are created. But the next expansion will often occur at an expansion node in a different subgraph. Thus, the previously created decomposition nodes cannot be utilized for the next step. In contrast, the deterministic DFS starts the next expansion where new decomposition nodes have been constructed previously. As a result, the new approach yields solutions up to n = 21 in a moderate amount of time.
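To make the n-Queens encoding concrete, the sketch below checks a complete assignment to the $n^2$ variables against the constraints (exactly one queen per row and per column, at most one per diagonal) and counts solutions by plain row-wise backtracking. It is a brute-force illustration only, not the BDD or hybrid-structure method evaluated above; the variable numbering `x[r * n + c]` and the names `satisfies_constraints`, `count_queens`, and `MAX_N` are assumptions of this sketch.

```c
#include <assert.h>

#define MAX_N 16 /* hypothetical upper bound for this sketch */

/* Variable x[r * n + c] is 1 iff a queen is placed on row r, column c. */

/* Check the constraints on a full assignment of the n^2 variables. */
int satisfies_constraints(const int *x, int n) {
    for (int r = 0; r < n; r++) {              /* exactly one per row */
        int cnt = 0;
        for (int c = 0; c < n; c++) cnt += x[r * n + c];
        if (cnt != 1) return 0;
    }
    for (int c = 0; c < n; c++) {              /* exactly one per column */
        int cnt = 0;
        for (int r = 0; r < n; r++) cnt += x[r * n + c];
        if (cnt != 1) return 0;
    }
    for (int d = 0; d < 2 * n - 1; d++) {      /* at most one per diagonal */
        int sum_diag = 0, diff_diag = 0;
        for (int r = 0; r < n; r++) {
            int c1 = d - r;                    /* fields with r + c == d */
            int c2 = r - d + n - 1;            /* fields with r - c == d - (n - 1) */
            if (c1 >= 0 && c1 < n) sum_diag += x[r * n + c1];
            if (c2 >= 0 && c2 < n) diff_diag += x[r * n + c2];
        }
        if (sum_diag > 1 || diff_diag > 1) return 0;
    }
    return 1;
}

/* Place one queen per row, skipping fields attacked by queens in earlier
 * rows; every complete placement is re-checked with the full constraints. */
static long place(int *x, int n, int r) {
    if (r == n) return satisfies_constraints(x, n);
    long count = 0;
    for (int c = 0; c < n; c++) {
        int ok = 1;
        for (int pr = 0; pr < r && ok; pr++)
            for (int pc = 0; pc < n; pc++)
                if (x[pr * n + pc] && (pc == c || pr - pc == r - c || pr + pc == r + c))
                    ok = 0;
        if (!ok) continue;
        x[r * n + c] = 1;
        count += place(x, n, r + 1);
        x[r * n + c] = 0;
    }
    return count;
}

/* Number of satisfying assignments of the n-Queens encoding. */
long count_queens(int n) {
    int x[MAX_N * MAX_N] = {0};
    assert(n >= 1 && n <= MAX_N);
    return place(x, n, 0);
}
```

Under these assumptions, `count_queens(6)`, `count_queens(7)`, and `count_queens(8)` return 4, 40, and 92, matching the solution counts in Tables 3.2 and 3.3. This approach enumerates solutions one at a time, so unlike a decision-diagram representation it shares no structure between them.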
3.1.3.2 ESOP Minimization
Compared to an SOP representation of a function, the ESOP representation can be exponentially smaller. Consider $\bigoplus_{i=1}^{n} x_i$ as an example: its ESOP consists of the n single-literal cubes $x_1, \dots, x_n$, whereas every SOP of this function needs $2^{n-1}$ cubes. However, most algorithms for ESOP minimization only apply local transformations to improve the representation, starting from an initial solution, e.g. [BS93, MP01]. In [PC90], the problem of computing an ESOP for a given Boolean function f over n variables has been formulated using the Helliwell equation. The Helliwell equation $H_f$ for function f has $3^n$ input variables, one for each cube over the n variables (each variable occurs positive, negative, or not at all in a cube). An input variable is 1 if, and only if, the corresponding cube is chosen for the ESOP of f. A satisfying
Table 3.4. ESOP minimization

       BDD                      Hybrid structure
       all solutions            ≥ 1 solution       ≥ 10^3 solutions    ≥ 10^6 solutions
 k       Time      #Nodes       Time   #Nodes      Time   #Nodes       Time    #Nodes
 4       0.55         628       0.50      568      0.53     1108       0.53      1108
 5       0.58        4075       0.53      638      0.60     4729       0.61      4729
10       1.75      420655       0.47      145      0.70    11597      51.28    155018
15       4.96     1428139       0.48      352      0.61    11634      10.17    172422
20      53.96     2444782       0.47      112      0.54     7459       1.13    177708
25    1945.01     3449866       0.48      490      0.52     5465       0.98    133396
30    9985.37     4441463       0.49      495      0.49     2618       0.66     48107
35   13900.22     5361182       0.52      544      0.51      878       0.75     21608
39   13913.44     5906441       0.44      217      0.45     1241       0.53      5910

Zchaff   1 sol.   10^3 sol.   10^6 sol.
         Time     Time        Time