Engineering Theories of Software Intensive Systems
NATO Science Series
A series presenting the results of scientific meetings supported under the NATO Science Programme. The Series is published by IOS Press, Amsterdam, and Springer (formerly Kluwer Academic Publishers) in conjunction with the NATO Public Diplomacy Division.
Sub-Series
I. Life and Behavioural Sciences (IOS Press)
II. Mathematics, Physics and Chemistry (Springer, formerly Kluwer Academic Publishers)
III. Computer and Systems Science (IOS Press)
IV. Earth and Environmental Sciences (Springer, formerly Kluwer Academic Publishers)
The NATO Science Series continues the series of books published formerly as the NATO ASI Series.
The NATO Science Programme offers support for collaboration in civil science between scientists of countries of the Euro-Atlantic Partnership Council. The types of scientific meeting generally supported are “Advanced Study Institutes” and “Advanced Research Workshops”, and the NATO Science Series collects together the results of these meetings. The meetings are co-organized by scientists from NATO countries and scientists from NATO's Partner countries – countries of the CIS and Central and Eastern Europe. Advanced Study Institutes are high-level tutorial courses offering in-depth study of the latest advances in a field. Advanced Research Workshops are expert meetings aimed at critical assessment of a field, and identification of directions for future action. As a consequence of the restructuring of the NATO Science Programme in 1999, the NATO Science Series was re-organized into the four sub-series noted above. Please consult the following web sites for information on previous volumes published in the Series.
http://www.nato.int/science
http://www.springeronline.com
http://www.iospress.nl
Series II: Mathematics, Physics and Chemistry – Vol. 195
Engineering Theories of Software Intensive Systems edited by
Manfred Broy Technische Universität München, Garching, Germany
Johannes Grünbauer Technische Universität München, Garching, Germany
David Harel The Weizmann Institute of Science, Rehovot, Israel and
Tony Hoare Microsoft Research, Cambridge, U.K.
Proceedings of the NATO Advanced Study Institute on Engineering Theories of Software Intensive Systems, Marktoberdorf, Germany, 3–15 August 2004.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10: 1-4020-3531-4 (PB)
ISBN-13: 978-1-4020-3531-9 (PB)
ISBN-10: 1-4020-3530-6 (HB)
ISBN-13: 978-1-4020-3530-2 (HB)
ISBN-10: 1-4020-3532-2 (e-book)
ISBN-13: 978-1-4020-3532-6 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springeronline.com
Printed on acid-free paper
All Rights Reserved
© 2005 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed in the Netherlands.
Contents

Preface vii

Part I: Architectures, Design and Interfaces

Incremental Software Construction with Refinement Diagrams
Ralph-Johan Back 3

Service-oriented Systems Engineering: Specification and Design of Services and Layered Architectures
Manfred Broy 47

Interface-based Design
Luca de Alfaro, Thomas A. Henzinger 83

The Dependent Delegate Dilemma
Bertrand Meyer 105

Part II: System and Program Verification, Model Checking and Theorem Proving

Formalizing Counterexample-driven Refinement with Weakest Preconditions
Thomas Ball 121

A Mechanically Checked Proof of a Comparator Sort Algorithm
J. Strother Moore, Bishop Brock 141

Keys in Formal Verification
Amir Pnueli 177

On the utility of canonical abstraction
Mooly Sagiv, Thomas W. Reps, Reinhard Wilhelm, Eran Yahav 215

Part III: Process Algebras and Experimental Calculi

Process Algebra: a Unifying Approach
Tony Hoare 257

Computation Orchestration
Jayadev Misra 285

A Tree Semantics of an Orchestration Language
Tony Hoare, Galen Menzel, Jayadev Misra 331

Part IV: Security, System Development and Special Aspects

Model Driven Security
David Basin, Jürgen Doser, Torsten Lodderstedt 353

Some Challenges for System Development: Reactive Animation, Smart Play-Out and Olfaction
David Harel 399
Preface
Today software systems are a pervasive factor in industry, science, commerce, and communication. In view of the wide distribution of software and the high dependency on its functioning and quality, our understanding of the foundations of software engineering is still weak. A lot of progress has been achieved, but much more has to be done to keep pace with the speed of innovations and new applications. The foundations of software technology lie in models that allow us to capture application domains and requirements, but also to understand the structure and working of software systems, such as software architectures and programs. These models have to be expressed in techniques of discrete mathematics, including logic and algebra. However, given the very specific needs of applications of software technology, formal methods have to serve the needs and the quality of advanced software engineering methods, especially taking into account security aspects in Information Technology. The lectures of the Marktoberdorf Summer School address these topics and teach state-of-the-art ideas on how to meet these challenges. This book is divided into four parts:
Part I: Architectures, Design and Interfaces. Constructing large software systems requires solutions to the central problem of managing their complexity. The larger a system becomes, the more difficult it is to extend the system and to adapt it to changing requirements. Ralph-Johan Back presents in his contribution how to handle these processes by stepwise feature introduction. Interfaces play a central role in the design of hard- and software systems. Two further articles take this into account: First, based on the Focus theory, Manfred Broy introduces a formal model for specifying and designing large systems using services and layered architectures. Second, Thomas A. Henzinger presents his work on interface-based design, where interface theories are axiomatized to cover certain requirements. Finally, Bertrand Meyer presents a promising solution to the “Dependent Delegate Dilemma”, based on a simple correctness rule.
Part II: System and Program Verification, Model Checking and Theorem Proving. Building reliable soft- and hardware is still a challenge today. Even though there are some areas where strong achievements have been made, e.g. in airplanes and air traffic control, most soft- and hardware is still error-prone. In this part, approaches to developing correct systems are shown. Thomas Ball presents a method for model-checking temporal safety properties of programs. J Strother Moore shows how to prove “little theorems” without addressing a complete system. Amir Pnueli presents a set of techniques for the verification of reactive infinite-state systems. And finally, Shmuel Sagiv presents a survey of a parametric abstract domain called canonical abstraction.

Part III: Process Algebras and Experimental Calculi. Mathematical foundations of software engineering help to describe and understand how software should behave. In this part, some experimental work is shown. The article of Tony Hoare presents a unifying theory of concurrency that combines the advantages of process algebras. Jayadev Misra shows in his contribution a novel approach for combining different web services, as well as general distributed transactions, to form complex services over the internet. In a joint paper, Tony Hoare, Galen Menzel and Jayadev Misra present a formal semantics of Jayadev Misra’s language “Orc”.

Part IV: Security, System Development and Special Aspects. Security is taking a leading role in software engineering. David Basin shows how to specify high-level system models along with their security properties and use tools to generate system architectures from the models automatically, including complete, configured security infrastructures. The article of David Harel revolves around topics that are seemingly peripheral to the classical notion of system development. Reactive animation is a method that allows enriching models of reactive systems with an animated, interactive and intuitive front end. A method called “smart play-out” helps to run a program by utilizing verification techniques and tools. With olfaction, a set-up for an odor communication and synthesis system is proposed.

The contributions in this volume emerged from lectures of the 25th International Summer School on Engineering Theories of Software Intensive Systems, held at Marktoberdorf from August 3 to August 15, 2004. More than 100 participants from 25 countries attended, including students, lecturers and staff. The summer school provided two weeks of learning, discussing and developing new ideas, and was a rewarding event both professionally and socially.
It is our pleasure to thank all lecturers, staff, and hosts in Marktoberdorf, and especially our secretaries Ingrid Luhn and Katharina Spies for their great and gentle support. Furthermore, we thank Wil Bruins from Kluwer Publishing and Katharina Spies for their help in editing this volume. We also thank Britta Liebscher, who did a great job in typesetting some articles in LaTeX. We thank the photographers Katharina Spies, Sonja Werner, David Harel and Dan Barak for always being in the right place at the right time. The Marktoberdorf Summer School was arranged as an Advanced Study Institute of the NATO Security Through Science Programme, with support from the town and county of Marktoberdorf and the Deutscher Akademischer Austauschdienst (DAAD). We thank all authorities involved.

THE EDITORS
Part I Architectures, Design and Interfaces
Ralph-Johan Back
Manfred Broy
Thomas A. Henzinger
Bertrand Meyer
INCREMENTAL SOFTWARE CONSTRUCTION WITH REFINEMENT DIAGRAMS

Ralph-Johan Back
Åbo Akademi University and Turku Centre for Computer Science, Turku, Finland
backrj@abo.fi
Abstract
We propose here a mathematical framework for incremental software construction and for controlled software evolution. The framework allows incremental changes of a software system to be described on a high architecture level, but still with mathematical precision, so that we can reason about the correctness of the changes. The framework introduces refinement diagrams as a visual way of presenting the architecture of large software systems. Refinement diagrams are based on lattice theory and allow reasoning about lattice elements to be carried out directly in terms of diagrams. A refinement diagram proof will be equivalent to a Hilbert-like proof in lattice theory. We use refinement calculus as the logic for reasoning about software systems. The calculus models software parts as elements in a lattice of predicate transformers. In this way, we can use refinement diagrams to reason about the properties of software systems. We show here how to apply refinement diagrams and refinement calculus to the incremental construction of large software systems. We concentrate on three topics: (i) modularization of software systems with component specifications and the role of information hiding in this approach, (ii) layered extension of software by adding new features one by one and the role of inheritance and dynamic binding in this approach, and (iii) evolution of software over time and the control of successive versions of software.
Keywords:
Incremental software construction, refinement calculus, stepwise feature introduction, refinement diagrams, layered software, software evolution, version control, class diagrams, lattices, diagrammatic reasoning, object oriented programming
1. Introduction
We are interested here in a logical framework that will support the construction of large, correct software systems in an incremental and layered fashion. This means that the software system is built in small increments, part by part, always checking that the correctness of the system is preserved by the extension.
4 sion. We will assume that the system has not been completely specified when we start building it. Rather, the requirements on the system are influenced by the system built thus far, and by changing expectations in the environment. Hence, the framework must also support an evolutionary approach to software construction: the system is never ready, but continuously evolving to meet different demands. At the same time as we are adding new increments to the software system, we are also accumulating design errors, which make further increments more and more difficult. Hence, the systematic extension of the software must be punctuated by frequent redesigns , which improve the software architecture and allow for further extensions. The following diagram shows in a very simple way the overall evolution of a software system: Environment needs new requirements
evaluate software
Software process Adding parts
Changing structure
Software construction is in continuous interaction with the environment that needs the software: the present system is checked for conformance with the environment's needs, and based on this, new requirements are given that influence the further development of the system. The system itself is built by alternating between adding new parts that implement required features and internal redesign of the software in order to meet new demands. This view of software evolution should be contrasted with an alternative view, where we start from a well-defined and complete specification (or set of requirements) for the software to be built, and then proceed to build it in a software project without changing the specifications any more. This view is not inconsistent with the above, because there is a difference of scale here. A project would typically add one or more features to an existing software system, by a sequence of increments. We do not want the specifications to change too much while implementing these new features. The project can also be a major restructuring of a system that has grown out of phase with the changing requirements. Thus, the sequence of successive software projects constitutes the software evolution. The important thing here is that while carrying out a software project, one should bear in mind that there will be other software projects
that continue from the point where this project ended, and that the software should therefore be structured so that it can evolve smoothly. The issues that need to be addressed when designing a framework for software evolution include the following:
- What is a suitable conceptual model for software and its evolution?
- What is a suitable software architecture to support evolving software?
- What is a good way to reason about the correctness of evolving software?
- What kind of software processes are needed to support software evolution?
- What kind of software tools do we need to manage software evolution?
We are here going to concentrate on the first three questions. We are presently also working on answers to the last two questions [Back, 2002, Anttila et al., 2002, Back et al., 2002], but this is beyond the scope of this paper.
Basic approach. We will use refinement calculus as the basic framework for software evolution. Originally, this calculus was proposed as a programming logic for stepwise program refinement [Back, 1980, Back, 1988, Morgan, 1990, Back and von Wright, 1998], but its applicability has been extended considerably over the years. It has been used for modeling different kinds of software, from distributed and parallel systems to asynchronous circuits to object oriented systems and UML diagrams (e.g., [Back et al., 1996, Back et al., 2000, Back et al., 1999b, Back and Sere, 1991]). Refinement calculus is based on looking at program statements as predicate transformers, as originally proposed by Dijkstra [Dijkstra, 1976, Dijkstra and Scholten, 1990]. In refinement calculus, we introduce a refinement ordering on predicate transformers. Intuitively, the refinement ordering models replacement: a statement S is refined by another statement T, denoted S ⊑ T, if S can be replaced by T in any program context without compromising program correctness. The predicate transformers form a complete lattice with refinement as the lattice ordering. The predicate transformer approach is, however, more versatile than just modelling program statements. Different domains that are central to software construction fit together into a hierarchy of lattices that are connected by pointwise extension [Back and von Wright, 1998]. Reasoning about software can be carried out in this hierarchy, at different levels of detail. In this hierarchy, we can fit simple reasoning about state functions and state relations, as well as more complex reasoning about classes and concurrent or interactive processes.
One of the basic methods for mastering the complexity of large engineering systems is to use diagrams and drawings. The same approach is also used for software, in particular when describing software architecture. UML, the unified modelling language, has become particularly popular in the last ten years. However, very often these software diagrams are rather informal, based on more or less intuitive software concepts. This should be contrasted with, e.g., construction drawings for buildings. Such drawings have a very precise meaning, so that the systems can be built from these without even consulting the people who have made the drawings. We want to achieve the same level of precision for software architecture diagrams. We will do this by providing a diagrammatic way of reasoning about the construction and correctness of software systems in the refinement calculus. Essentially, what we will provide is a diagrammatic way of reasoning about lattice elements, which we then apply to software construction. Because of the intended application area, we refer to the diagrams that we use for reasoning as refinement diagrams. However, the refinement diagrams can be used for reasoning about any kind of lattice elements, not only software constructs. Refinement diagrams provide a simple way of visualizing the architecture of a software system, and allow us to reason about the construction and correctness of software at the architecture scale. As we will show below, they are also very precise: a refinement diagram derivation is essentially isomorphic to a Hilbert-like proof in lattice theory. Usually, reasoning in a mathematical structure like a lattice means establishing some general properties for the structure in question (analyzing properties of the structure). This would be the mathematician's way of working. In software engineering, the focus is different. There we are interested in building a piece of software that satisfies specific requirements (constructing elements in the structure). The requirements have been laid down before the construction, or they become evident during the construction process. The mathematician's viewpoint is still needed, however, because we need to show that the construction does indeed satisfy the requirements (correctness). The term that describes the required lattice element can be very complex, so the construction needs to be carried out in an incremental manner. The property to be established for the term can also be quite complex, and is preferably checked during the construction of the term rather than after the construction. Thus, our logical framework must support incremental construction of large lattice terms, and allow for proofs of specific properties about these terms to be carried out during the construction. We will show below that refinement diagrams do satisfy these conditions. The paper is structured as follows. In the next section, we introduce lattices as an algebraic structure. We give the basic properties of lattices, and we introduce refinement diagrams as a way to describe lattice properties and to
reason about lattice elements. In Section 3, we introduce a diagrammatic proof method for refinement diagrams and show that a refinement diagram proof is equivalent to a Hilbert-like proof in lattice theory. Section 4 gives a brief overview of refinement calculus from a lattice theoretic point of view, to show how lattices can be used as the semantic basis for software systems. We introduce the refinement calculus hierarchy, to show that we may need to carry out reasoning in different lattices at the same time. In Section 5, we use refinement diagrams to analyze modular software construction, in particular the use of specifications and information hiding in building large systems. We consider situations where information hiding is advantageous and situations where information hiding should not be enforced. In Section 6, we look at software extension, and show how to use refinement diagrams to describe the extension of software with new features. The central technique to be used here is layering, which provides for the extension of existing software using inheritance and dynamic binding. Section 7 then finally ties together these threads into an overall view of software evolution with refinement diagrams, and proposes a software editor or version control system based on refinement diagrams.
Related and earlier work. An earlier version of refinement diagrams has been described in [Back, 1991]. The present version is influenced by UML class diagrams. The way we reason about refinement diagrams is obviously also influenced by category theory diagrams [Barr and Wells, 1990]. The new thing here is to use refinement calculus as the underlying logic for the diagrams, and to use the diagrams to reason about software architecture and correct refinement. The theory described here is intended to support the stepwise feature introduction method [Back, 2002] for constructing layered software, where each layer introduces only one new feature in the system. We are not in this paper going into ways of modelling more complicated software notions in the refinement calculus, like concurrent and interactive systems, or object oriented systems. Some references are to be found in [Back et al., 1999a, Back and Sere, 1991, Back et al., 2000].
2. Lattices and refinement diagrams
We will below define lattices and their basic properties (see e.g. [Birkhoff, 1961, Davey and Priestley, 1990] for an overview of lattice theory, or [Back and von Wright, 1998] for a shorter overview of lattices). At the same time, we will introduce refinement diagrams as a way of expressing the lattice properties in an intuitive and visual way.
Posets and categories. A partially ordered set (or poset) is a set A together with an ordering ⊑ that is reflexive, transitive and antisymmetric. This means that the following holds in any poset, for any elements a, b, c in the poset:

a ⊑ a (reflexivity)
a ⊑ b ∧ b ⊑ c ⇒ a ⊑ c (transitivity)
a ⊑ b ∧ b ⊑ a ⇒ a = b (antisymmetry)

We capture these properties in the following diagrams:

[Refinement diagrams for reflexivity, transitivity and antisymmetry.]
Here the first figure shows reflexivity, the second transitivity and the third antisymmetry. Each diagram describes a universally quantified implication: the solid arrows indicate assumed relationships, while the dashed arrows indicate implied relationships. The identifiers stand for arbitrary elements in the lattice. An arrow from b to a indicates that a ⊑ b holds (think of the arrow as a greater-than sign). We use a double arrow to indicate equality of lattice elements. As an example, consider the middle diagram. It says the following: if we know initially that a ⊑ b holds (the lower arrow) and that b ⊑ c holds (the upper arrow) for some arbitrary elements a, b, and c, then we may deduce that a ⊑ c also holds (the dashed arrow from c to a). We refer to these diagrams as refinement diagrams. This name indicates the intended use of the diagrams: the lattice elements are program parts, and the ordering corresponds to refinement between program parts. Intuitively, we can think of refinement as permission for replacement: a ⊑ b means that the part a can be replaced by the program part b (in any context). The poset can be generalized to a category, if we want the arrows to be labelled. The diagrammatic representation of algebraic entities is familiar from category theory. Here we will emphasize the use of this kind of diagrams for reasoning about software construction and software architecture in a refinement calculus framework.
Lattices. A poset is a lattice, if any two elements a and b in the lattice have a least upper bound (or join) a ⊔ b and a greatest lower bound (or meet) a ⊓ b. A lattice is thus characterized by the following properties:

a ⊑ a ⊔ b and b ⊑ a ⊔ b (join is upper bound)
a ⊑ c ∧ b ⊑ c ⇒ a ⊔ b ⊑ c (join is least upper bound)
a ⊓ b ⊑ a and a ⊓ b ⊑ b (meet is lower bound)
c ⊑ a ∧ c ⊑ b ⇒ c ⊑ a ⊓ b (meet is greatest lower bound)

These properties are illustrated by the following diagram.

[Refinement diagram showing a and b, their join a ⊔ b and meet a ⊓ b, and arbitrary upper and lower bounds c′ and c.]
Here the meet and join elements are shown as dashed, because their existence is inferred. A lattice is bounded, if there is a least element ⊥ and a greatest element ⊤ in the lattice. This means that for any element a in the lattice, we have that

⊥ ⊑ a ⊑ ⊤ (least and greatest element)

This property is shown in the following diagram:

[Refinement diagram showing an arbitrary element between the bottom element ⊥ and the top element ⊤.]
The top and bottom elements are dashed here, because their existence can be inferred from the boundedness property. A lattice is complete, if any set of elements in the lattice has a least upper bound and a greatest lower bound. Any finite set will have this property in a lattice, but in a complete lattice also infinite sets and the empty set have a least upper bound and a greatest lower bound. In particular, we have that ⊥ is then the least element of the whole lattice and the greatest element of the empty set, while ⊤ is the greatest element of the whole lattice and the least element of the empty set.
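The lattice notions introduced so far can be made concrete on a very small example. The following sketch is an editorial illustration and not part of the paper's development; the three-element universe is an arbitrary choice. It codes the powerset lattice of a finite set in Python, with inclusion as the ordering ⊑, union as join ⊔, intersection as meet ⊓, the empty set as ⊥ and the universe as ⊤.

```python
# Illustration only (not from the paper): the powerset of a small finite set,
# ordered by inclusion, is a complete bounded lattice.
UNIVERSE = frozenset({1, 2, 3})
BOTTOM = frozenset()          # bottom, the least element
TOP = UNIVERSE                # top, the greatest element

def leq(a, b):                # a is below b in the ordering
    return a <= b             # set inclusion

def join(a, b):               # least upper bound
    return a | b

def meet(a, b):               # greatest lower bound
    return a & b

def big_join(elems):          # least upper bound of an arbitrary collection
    result = BOTTOM
    for e in elems:
        result = result | e
    return result

# Completeness: every collection of elements has a least upper bound;
# the least upper bound of the empty collection is the bottom element.
assert big_join([]) == BOTTOM
assert big_join([frozenset({1}), frozenset({2, 3})]) == TOP
```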
Functions and terms on lattices. A function f : A → B from poset A to poset B is monotonic, if for any elements a, a′ ∈ A

a ⊑A a′ ⇒ f.a ⊑B f.a′

(We write function application using dot-notation: f.x stands for the more familiar f(x).) The partial ordering on A is indicated by ⊑A and the partial ordering on B by ⊑B. Monotonicity is expressed by the following diagram:

[Refinement diagram for monotonicity: the term f.X with X instantiated to a and to a′; an arrow from a′ to a induces an arrow from f.a′ to f.a.]
Note that the ordering on the left is in A and the ordering on the right is in B. We indicate the dependency of f.X on a by an open arrow from f.X to a. Here f.X is a lattice term that is constructed by applying the lattice function f to the variable X that ranges over lattice elements. A lattice term is in general constructed by applying lattice operations on lattice constants and lattice variables. We write a lattice term as t[X1, . . . , Xm] to indicate that it depends on the lattice variables X1, . . . , Xm. An example of a lattice term is, e.g., f.((⊥ ⊔ X1) ⊓ g.X2), where f and g are two operations on lattices. In general, a box in a refinement diagram denotes a lattice term. We show a term as a box with dependency arrows, each arrow labeled with a lattice variable. As an example, consider the diagram

[Diagram: the term t[X1, X2] with dependency arrows labelled X1 and X2 to the terms t1 and t2.]
This shows the term t[X1 , X2 ], where X1 is instantiated with the term t1 and X2 is instantiated with the term t2 . The middle box thus denotes the term t[t1 , t2 ]. The explicit indication of the lattice variables on the arrows can be omitted, if it is clear from the context what is meant by the arrows.
We can also show the same term box with the subterms as nested boxes (on the left below), or using aggregation as in UML (on the right):

[Diagram: the term t[X1, X2] shown with the subterms t1 and t2 as nested boxes, and alternatively using UML-style aggregation.]
The difference between this representation and the previous is that the subterms in the latter representations cannot be shared. In the previous representation, we can have two or more terms using the same subterm. In general, we assume that there is a collection of monotonic functions (operations) available on a given lattice. As the composition of monotonic functions is also monotonic, this means that we can build more complex monotonic functions out of these simpler functions using composition. In addition, we assume that there are constants that denote specific lattice elements (⊥ and ⊤ are two constants that always exist in any bounded lattice, but we usually need other constants as well). We say that a lattice term is monotonic, if it is built out of monotonic lattice functions.
Least fixpoints. In general, the dependencies between terms in a diagram may be circular. In that case, we need a more elaborate notion of what a box denotes in the diagram. For this purpose, we need to introduce the notion of fixed points of functions on lattices. A monotonic function f : A → A on a complete lattice A has a unique least fixed point, denoted µ.f ∈ A (here µ is the fixpoint operator that gives the least fixed point for any monotonic function f). The least fixed point has the following properties:

f.(µ.f) = µ.f (µ.f is a fixed point)
f.a ⊑ a ⇒ µ.f ⊑ a (least fixed point induction)

Similarly, there is also a unique greatest fixed point ν.f ∈ A, which satisfies the following conditions:

f.(ν.f) = ν.f (ν.f is a fixed point)
a ⊑ f.a ⇒ a ⊑ ν.f (greatest fixed point induction)

The least fixed point and the greatest fixed point are illustrated in the following diagram:
[Refinement diagrams characterizing the least fixed point µ.f and the greatest fixed point ν.f of a term f.X.]
We can use fixpoint induction to establish properties of fixpoints. Another proof technique is given later, in Section 5.2.
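As a small illustration of the fixpoint properties (an editorial sketch, made under the simplifying assumption of a finite lattice so that the iteration terminates; the particular function f is a hypothetical example), the least fixed point of a monotonic function can be computed by iterating the function from ⊥ until the value stabilizes:

```python
# Sketch: compute the least fixed point of a monotonic f on a finite lattice
# by iterating from the bottom element until a fixed point is reached.
def lfp(f, bottom):
    x = bottom
    while True:
        fx = f(x)
        if fx == x:
            return x
        x = fx

# Example on the powerset lattice of {1, 2, 3}: f always adds 1, and adds 2
# whenever 1 is already present; f is monotonic.
def f(s):
    out = set(s) | {1}
    if 1 in s:
        out |= {2}
    return frozenset(out)

mu_f = lfp(f, frozenset())
assert mu_f == frozenset({1, 2})        # the least fixed point
assert f(mu_f) == mu_f                  # it is indeed a fixed point
a = frozenset({1, 2, 3})
assert f(a) <= a and mu_f <= a          # f.a below a implies mu.f below a (induction)
```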
Pointwise extension. Consider the set of all monotonic functions from lattice A to lattice B. We denote this set by A →m B. The pointwise extension of the partial ordering on B to A →m B is defined by

f ⊑ f′ ≡ (∀a ∈ A · f.a ⊑B f′.a)

The pointwise extension of a (complete) lattice is also a (complete) lattice. A special case of pointwise extension is lattice product. Let A1 × · · · × Am be the cartesian product of lattices A1, . . . , Am. Then we define a lattice ordering on A1 × · · · × Am by

(a1, . . . , am) ⊑ (a′1, . . . , a′m) ≡ a1 ⊑ a′1 ∧ . . . ∧ am ⊑ a′m

This is a special case of the previous definition when each Ai denotes the same lattice B, and we choose A = {1, . . . , m}. As above, the product of a collection of (complete) lattices is a (complete) lattice. Consider now the set A →m A. The least and greatest fixpoint operators are functions of type µ, ν : (A →m A) → A. These operators are monotonic with respect to the lattice ordering, i.e., we have for any f, g : A →m A that

f ⊑ g ⇒ µ.f ⊑ µ.g
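A brief editorial sketch of pointwise extension (illustration only; the domain, functions and ordering below are arbitrary choices): the ordering of a lattice lifts to functions into that lattice and, as a special case, to tuples, component by component.

```python
# Sketch: pointwise extension of an ordering to functions and to tuples.
def leq_fun(f, g, domain, leq):
    """f below g in the pointwise extension iff f(a) below g(a) for every a."""
    return all(leq(f(a), g(a)) for a in domain)

def leq_tuple(xs, ys, leq):
    """(x1,...,xm) below (y1,...,ym) iff xi below yi for every i."""
    return all(leq(x, y) for x, y in zip(xs, ys))

subset = lambda a, b: a <= b                 # the powerset-lattice ordering
identity = lambda s: s
add_one = lambda s: s | {1}                  # adds the element 1 to its argument
domain = [frozenset(), frozenset({2})]

assert leq_fun(identity, add_one, domain, subset)    # identity below add_one pointwise
assert leq_tuple((frozenset(), frozenset({2})),
                 (frozenset({1}), frozenset({1, 2})), subset)
```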
Lattice homomorphisms. Consider a function h : A → B, where A and B are lattices. Then h is a lattice meet homomorphism if

h.(a ⊓ b) = h.a ⊓ h.b

In a similar way, we define a lattice join homomorphism, a bottom homomorphism, a top homomorphism and so on. The function is a complete join homomorphism if

h.(⊔A′) = ⊔(h.A′)

holds for any set A′ of lattice elements. We can define homomorphism also for other operations on lattices, in the same way as we defined it for meet and join.
Mutually recursive definitions. An environment is a tuple of monotonic lattice terms, E = (t1[X], . . . , tn[X]), where X = (X1, . . . , Xm) is a tuple of variables that range over lattice elements. Note that ti[X] has to use projection to access a specific variable in X. Projection is denoted by πim, so πim.X = Xi for i = 1, . . . , m. Consider the special case when m = n, i.e., E = (t1[X], . . . , tn[X]) and X = (X1, . . . , Xn). Then the function

Ê = (λX · E)

is a function of type An → An. This function is monotonic on the complete lattice (An, ⊑). Hence, this function has a least fixed point µE = µ.Ê. Thus, µE is the least solution to the equation X = E. We refer to µE as the (least) system defined by the environment E. We write µE = (µE1, . . . , µEn), i.e., µEi is the ith lattice element in the system µE. (Dually, we can define the greatest system νE defined by the environment.) A consequence of this is that we can use unfolding to determine the meaning of an environment:

µE = (t1[µE], . . . , tn[µE])

Intuitively, this means that the system defined by the environment is the (potentially infinite) unfolding of the environment. We will usually describe a system by an equation X = E. The system described is then the solution X = µE, where Xi = µEi now denotes the ith element of the system. The following figure illustrates an environment E (on the left) and the system µE (on the right) that is defined by the environment:
[Figure: an environment with terms t1–t4 and variables X1–X4 (left), and the corresponding system, in which the dependency arrows are bent back to the terms themselves (right).]
The system is here the solution to the equation

(X1, X2, X3, X4) = (t1[X1, X3], t2[X1, X2, X4], t3[X1, X3, X4], t4[X3, X4])

(we have here indicated the occurrence of the components of X in each term, rather than just X). We see how the dependency arrows are bent back to the terms themselves in the system, thus providing the infinite unfolding semantics for the system defined by the environment. We can define a lattice ordering on environments, by pointwise extension:

E ⊑ E′ ≡ Ê ⊑ Ê′

The monotonicity of the fixed point operator then gives us that

E ⊑ E′ ⇒ µE ⊑ µE′

This can be used to reason about refinement in a complex system. Assume that we have a term ti[X] in the system, and that we make some small change to this term, to get t′i[X]. If this change is such that ti[X] ⊑ t′i[X], then we may infer that E ⊑ E′ holds for the changed environment E′, and therefore that µ.E ⊑ µ.E′ holds for the new system that is determined by the change. This again means that µEi ⊑ µE′i holds. In other words, a refinement of one of the terms in the environment will result in a refinement of the meaning of the terms.
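The idea of a system defined by an environment, and the fact that refining one term refines the whole system, can be illustrated by a small editorial sketch (assuming finite lattices so that the iteration terminates; the two terms below are hypothetical examples):

```python
# Sketch: compute the least system of an environment E = (t1[X], ..., tn[X])
# by iterating the tuple function X -> (t1[X], ..., tn[X]) from (bottom, ..., bottom).
def least_system(env, bottom):
    """env is a list of term functions, each mapping the whole tuple X to one component."""
    xs = tuple(bottom for _ in env)
    while True:
        nxt = tuple(t(xs) for t in env)
        if nxt == xs:
            return xs
        xs = nxt

# Two mutually recursive terms over the powerset lattice of {0, 1, 2}:
#   X1 = {0} union X2,   X2 = X1 union {1}
t1 = lambda X: frozenset({0}) | X[1]
t2 = lambda X: X[0] | frozenset({1})
muE = least_system([t1, t2], frozenset())
assert muE == (frozenset({0, 1}), frozenset({0, 1}))

# Refining one term refines the whole system: replacing t2 by a larger term
# t2_refined (so t2[X] is below t2_refined[X] for all X) gives a larger system.
t2_refined = lambda X: X[0] | frozenset({1, 2})
muE_refined = least_system([t1, t2_refined], frozenset())
assert all(a <= b for a, b in zip(muE, muE_refined))   # componentwise refinement
```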
3. Diagrammatic reasoning
The diagrammatic notation that we have used to describe the lattice rules can be extended to a proof system for refinement diagrams. The basic idea of such a refinement diagram proof (or derivation) is the following:
1 The entities of the diagram are lattice terms (shown here as rectangles, but we can also use other graphical notation for terms), lattice ordering (shown as a refinement arrow), lattice equality (shown by a double arrow) and dependency relations (shown as usage arrows). Nested boxes are interpreted as a dependency of the enclosing box on the inner box.
2 We assume that the lattice terms in a diagram together form an environment E, where each occurrence of a lattice term t in the diagram corresponds to some element ti in the environment tuple. The same lattice term can occur in two or more places in the diagram, but each occurrence has its own index in the tuple.
3 The meaning of a term ti in the diagram (environment) E is the system element µEi.
4 An ordering indicated in the diagram holds between the meanings of the terms in the environment (not the terms themselves). The arrow goes from the larger to the smaller element.
5 We may associate names with the diagram boxes. These names can be understood as the variable names in X. A name associated with a dependency arrow must be one of the free variables in the term that is the source of the arrow. (The variables in the terms can be local names for terms, which are bound to the actual term by the dependency arrow.)
6 An entity in the diagram may be annotated. The annotation may provide additional information about the entity. For instance, for software components the refinement arrows would in general be annotated by an abstraction function (or relation) that is needed to determine the data refinement between the components.
The conventions (3) and (4) are very important. They can be explicated as follows. Assume that we have an environment E = (t1[X], . . . , tn[X]), where the indexes 1, . . . , n now identify the different occurrences of terms in the refinement diagram that describes E. Then a refinement arrow from box i to box j means that µEj ⊑ µEi holds. As already explained, we write µEi for πim.(µ.(λX · E[X])).
A refinement diagram proof (or derivation) is a refinement diagram where the ordering relations in the diagram are numbered by consecutive integers. This numbering shows the order in which the relations have been introduced in the diagram. With each number we associate a proof rule that justifies the introduction of this arrow, together with the possible side conditions that must hold for this inference to be valid. New entities may only be introduced into the diagram if justified by some proof rule. No entities may ever be removed from the diagram. The proof rules used in the diagram can be textual proof rules, or they can be diagrammatic ones, like the inference rules we have presented above. In the latter case, we can apply the proof rule if the solid part of the rule diagram matches the proof diagram; we may then add any or all of the dashed entities in the rule diagram to the proof diagram.
As an example, consider the refinement diagram proof in Figure 1.

[Figure 1. A refinement diagram proof: the terms x, f.(g.x) and the join x ⊔ y, with refinement arrows numbered 1–7.]

Note that the inferred rules remain dashed in the proof diagram, indicating which parts of the diagram had to be specifically assumed and which parts could be inferred by some inference rules. The numbering of the entities in the diagram means that there is an equivalent textual presentation of the proof as a Hilbert-like proof in the theory of lattices. In this textual proof, each step is numbered, and is either justified as an axiom, as an assumption, or as an inference that is drawn from some previous steps using some inference rule. As an example, this is the Hilbert-like proof that corresponds to the above derivation:
1. a ⊑ b (assumption)
2. b ⊑ c (assumption)
3. f.(g.a) ⊑ f.(g.b) (from 1 by monotonicity)
4. f.(g.b) ⊑ f.(g.c) (from 2 by monotonicity)
5. f.(g.a) ⊑ f.(g.c) (from 3 and 4 by transitivity)
6. f.(g.c) ⊑ c ⊔ f.(g.c) (least upper bound)
7. f.(g.a) ⊑ c ⊔ f.(g.c) (from 5 and 6 by transitivity)
Note that the Hilbert proof contains more detail, because it also justifies each step. We will assume that the justifications in the diagram proof are given in a separate document (or with hyperlinks), because writing out the justification in the diagram itself is likely to be messy.
Rather than considering diagrammatic reasoning as a logic for establishing truth (a proof system), we will usually apply it as a logic of construction. This means that we construct some complex environment step by step, by building up the diagram for this environment and all the time checking that the environment built so far satisfies our requirements to the extent that they can. The purpose of building up the environment is that we are interested in defining (constructing) some specific term with the help of the environment, the term of interest itself being part of the environment.
A diagrammatic proof (with justifications for the steps) can be shown in a rather convenient manner as an animated presentation of the diagrams. One then shows the diagram building up step by step, and carefully identifies the new entities that are placed in the diagram and also shows the justification for this.
Constructing such a presentation is rather simple using standard presentation software.
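As an editorial illustration of the correspondence between diagram proofs and Hilbert-like lattice proofs, the seven steps above can be replayed on a concrete lattice to check that each claimed ordering actually holds. The powerset lattice and the particular monotonic functions f and g below are arbitrary choices made for the sketch, not part of the paper's development.

```python
# Sketch: replay the seven-step Hilbert-like proof on the powerset lattice,
# with inclusion as the ordering and union as the join.
leq = lambda x, y: x <= y
join = lambda x, y: x | y
f = lambda s: s | frozenset({4})          # monotonic
g = lambda s: s & frozenset({1, 2, 3})    # monotonic

a, b, c = frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})

assert leq(a, b)                              # 1. assumption
assert leq(b, c)                              # 2. assumption
assert leq(f(g(a)), f(g(b)))                  # 3. from 1 by monotonicity
assert leq(f(g(b)), f(g(c)))                  # 4. from 2 by monotonicity
assert leq(f(g(a)), f(g(c)))                  # 5. from 3 and 4 by transitivity
assert leq(f(g(c)), join(c, f(g(c))))         # 6. join is an upper bound
assert leq(f(g(a)), join(c, f(g(c))))         # 7. from 5 and 6 by transitivity
```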
4. Lattice of program parts
A predicate is a property of a state. Hence, we can identify a predicate with a set of states. A predicate transformer maps predicates to predicates. A predicate transformer can be understood as the semantics of a program statement, an idea that was introduced by Dijkstra [Dijkstra, 1976]. For any programming language statement S, we define its meaning wp.S, which is a predicate transformer. This predicate transformer computes for any predicate (postcondition) q on the state space another predicate wp.S.q, which is the weakest precondition for statement S to terminate in a state satisfying q. A statement that can fail to terminate in any initial state then describes the predicate transformer abort (defined by abort.q = false for any q), while a statement that is guaranteed to always terminate and establish any postcondition describes the predicate transformer magic (defined by magic.q = true for any q). There cannot be any such statement in reality, because, among other things, it would also guarantee that the program terminates in a final state that satisfies false, which is impossible. Hence the name magic for this predicate transformer. It turns out that it is a good mathematical notion, even if it does not exist in reality. The refinement ordering S ⊑ T essentially says that any postcondition that S can establish can also be established by T. In this sense T is better than S (or at least as good as S). We define this relation by S ⊑ T ≡ (∀q · wp.S.q ⊆ wp.T.q) [Back, 1980, Back and von Wright, 1998]. It also means that any user of the statement S who is only interested in the functional properties of this statement should not notice any difference if S is replaced by T. The meet and join in the predicate transformer lattice also have very strong analogues in program behavior. The meet S ⊓ T is the demonic choice between executing S or executing T. One of these alternatives is chosen, but we (the person who is interested in the result of the program) have no influence on which alternative is chosen. The join S ⊔ T is the angelic choice between executing S or executing T. In this case, we can choose the alternative that suits our purpose better (and choose differently for different purposes). The refinement calculus interprets software systems as elements in a lattice (of, e.g., predicate transformers). A simpler lattice interpretation of program statements is to interpret these as relations on the state space. On the other hand, we can build more complicated models to capture, e.g., classes in object oriented systems, or interactive systems, real-time systems or concurrent systems. A common feature of these models is that they often can be seen
as having some lattice theoretic properties. In fact, one often has to force the semantics into a lattice framework, because most systems will allow one form of recursion or another, and the simplest way of modelling recursion is as fixpoints. However, fixed points require an underlying lattice structure (or something very close to a lattice, like a cpo), if one is to reason about the properties of the fixed points in any reasonable manner. We will therefore here postulate that a software system can be understood as the system defined by an environment on a lattice, as explained above. The terms can be seen as the collection of parts of the system. A part can depend on (or use) other parts. A part can be refined by another part, in the sense that any user of this part does not see the difference if that part is replaced with the refining part. We can think of parts as real physical entities, like machine parts or building parts or similar things. We can also think of parts as software components, like procedures, functions, expressions, classes, modules, libraries, or packages. The latter interpretation is the one that we are here primarily interested in.
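The predicate transformer lattice can also be illustrated with a small informal sketch (an editorial addition, not the paper's formal development; the four-state space and the hypothetical statement below are arbitrary choices). Predicates are coded as sets of states, a statement as a function from postconditions to weakest preconditions, and refinement as pointwise inclusion of preconditions.

```python
# Sketch: predicate transformers over a tiny state space.
from itertools import chain, combinations

STATES = frozenset(range(4))

def all_predicates(states=STATES):
    xs = list(states)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def refines(S, T):
    """S refined by T: every postcondition S establishes, T establishes as well."""
    return all(S(q) <= T(q) for q in all_predicates())

abort = lambda q: frozenset()                    # guarantees nothing
magic = lambda q: STATES                         # "establishes" every postcondition
def incr_mod4(q):                                # wp of a hypothetical statement x := (x + 1) mod 4
    return frozenset(s for s in STATES if (s + 1) % 4 in q)

demonic = lambda S, T: (lambda q: S(q) & T(q))   # meet: demonic choice
angelic = lambda S, T: (lambda q: S(q) | T(q))   # join: angelic choice

assert refines(abort, incr_mod4)                 # abort is refined by any statement
assert refines(incr_mod4, magic)                 # any statement is refined by magic
assert refines(demonic(incr_mod4, magic), angelic(incr_mod4, abort))
```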
The refinement calculus hierarchy. The refinement calculus provides a hierarchy of complete lattices that together allow one to reason about complex software (and also hardware) systems. The refinement calculus hierarchy is built on top of an arbitrary collection of state spaces Σ, Γ, etc. In addition, we assume a collection of agents Ω for contracts. The hierarchy and its properties are discussed in much more detail in [Back and von Wright, 1998]. Figure 2 shows the main lattices in the refinement calculus hierarchy.

[Figure 2. The refinement calculus hierarchy: truth values, state predicates, state relations, predicate transformers, contract statements.]

The basic lattices in the hierarchy are:

The truth value lattice Bool = {T, F}. The ordering is implication, i.e., b ⊑ b′ ≡ (b ⇒ b′). The smallest element is falsity F and the largest element is truth T. Meet is defined by a ⊓ b = a ∧ b and join is defined by a ⊔ b = a ∨ b.

The state predicate (or subset) lattice Pred(Σ) = Σ → Bool. The ordering is subset inclusion, p ⊑ q ≡ p ⊆ q. The smallest element is the universally false predicate false = ∅ and the largest element is the universally true predicate true = Σ. Meet is intersection, p ⊓ q = p ∩ q, and join is union, p ⊔ q = p ∪ q.

The state relation lattice Rel(Σ, Γ) = Σ → Pred(Γ). The ordering is relational inclusion, P ⊑ Q ≡ P ⊆ Q. The smallest element is the universally false (empty) relation False = ∅ and the largest relation is the true (universal) relation True = Σ × Γ. Meet is intersection of relations and join is union of relations.

The predicate transformer lattice Mtran(Σ, Γ) = Pred(Γ) →m Pred(Σ). The ordering is refinement, defined as stated above by S ⊑ T ≡ (∀q · S.q ⊆ T.q). This is the pointwise extension of the ordering of predicates. The least element is the predicate transformer abort = (λq · false) and the greatest element is the predicate transformer magic = (λq · true). Meet is the predicate transformer S ⊓ T = (λq · S.q ∩ T.q) and join is the predicate transformer S ⊔ T = (λq · S.q ∪ T.q).

The contracts lattice Cont(Ω, Σ, Γ) = Pred(Ω) → Mtran(Σ, Γ). Here the ordering is F ⊑ G ≡ (∀c · F.c ⊑ G.c). The least element is Abort = (λc · abort) and the greatest element is Magic = (λc · magic). Meet is defined by F ⊓ G = (λc · F.c ⊓ G.c) and join is defined by F ⊔ G = (λc · F.c ⊔ G.c). Contracts model systems where a number of independent agents with possibly conflicting goals participate in making decisions about how the system should behave.
The implication ordering of truth values forms the basis for the refinement calculus hierarchy. The other lattice orderings are defined by pointwise extension of lower orderings. Thus, subset inclusion is the pointwise extension of implication, and relation inclusion is the pointwise extension of subset inclusion. Refinement is also the pointwise extension of subset inclusion, to a different domain than relations. Finally, contract ordering is the pointwise extension of the refinement ordering. The refinement calculus hierarchy also contains other, more exotic lattices, which we will not describe in detail here. The refinement calculus hierarchy contains, in addition to the lattice operations, also other operations that are defined on these domains. In particular, we usually need some operation for sequential composition of relations and predicate transformers. In addition, there are a number of homomorphic embeddings between the lattices in the hierarchy. Reasoning with refinement diagrams in the refinement calculus hierarchy would typically involve reasoning on different levels in the hierarchy simultaneously. It is possible to show reasoning in different lattices in the same diagram. This can be very useful, but in the applications that we have here in mind, we do not use this facility.
5. Modularity and specifications
Let us start by considering software components with specifications. A specification is a description of the functional (and sometimes also non-functional) behavior of a software component that describes what the component does, but not how it does it. We will consider a specification and an implementation as both being parts (i.e., lattice terms) in a software system, and assume that there is a (possibly idealized) sense in which the specification can be executed, in the same way as an implementation can be executed. A specification S is satisfied by an implementation T, if S ⊑ T holds. Mathematically, this means that S ⊑ T holds in the refinement calculus. In practice, one would usually have data refinement [Back, 1980, Hoare, 1972, Gardiner and Morgan, 1993, Back and von Wright, 2000] between the components rather than simple algorithmic refinement, but we will skip the distinction here. Intuitively, this means that we are allowed to replace the specification S by the implementation T in any context. Let us consider the refinement relation in somewhat more detail. A specification S can be satisfied by more than one implementation, S ⊑ T1, S ⊑ T2, S ⊑ T3, and so on. For instance, S could be a standard for some component, and T1, T2, T3 could be different implementations of this standard which are provided by different vendors. An implementation can also satisfy more than one specification, S1 ⊑ T, S2 ⊑ T, S3 ⊑ T, etc. Then we often talk about multiple interfaces to
the same software component. For instance, a banking application may provide one interface for the bank customer and another interface for the bank clerk. It is also possible that an implementation T1 is seen as a specification of another implementation, in which case we require T1 ⊑ T2. For instance, T2 could be a more efficient implementation of T1, T2 could be an adaptation of T1 to a different platform, or T2 could be the object code of the source code component T1. It is also possible that we have refinement between specifications, S1 ⊑ S2. This could be the case when we are adding functionality to a specification or when we are adding constraints to the specification. The following diagram shows some of these possibilities. We have indicated specifications by rounded boxes to emphasize the intended use of these components. There is, however, no difference between specifications and implementations logically.

[Diagram: refinement relations among the specifications S0, S1, S2, T0, T1 and the implementations U1, U2, T2.]
The specification S0 is in this diagram implemented by both S1 and U1, and specification T0 is implemented by specification T1, which in turn is implemented by both U1 and T2. Thus U1 implements both S0 and T1. Implementation S1 is further implemented by S2. Specifications allow us to modularize software systems. If a component only knows about the specifications of other components, then the implementation of a used component can be changed at will, as long as the implementation still satisfies the original specification. This technique, known as information hiding, is a powerful technique for building loosely coupled systems that are easy to maintain. It allows us to build different parts of the system independently (e.g., by different people or at different times), as long as we do not change the specifications of the parts in the system.
Specifications are also important for verifying that a software system is correct. A specification S0 will usually be considerably more abstract and simpler to reason about than its implementation S1. Hence, if T is another component that depends on this component, T[S1] is likely to be a much more complex term than T[S0], and hence much more difficult to reason about.
5.1 Implementing a specification
Let us start with the following example task: We have a specification T0 of a part that we want to build, and we want to implement this with a part T1 that uses another part S1 . The following figure shows what we want to construct:
[Diagram: the specification T0, to be implemented by a part T1 that uses another part S1.]
The term T1 with the dependency arrow to S1 stands for the lattice element T1[S1]. We want to construct this so that it is a correct implementation of the specification T0, i.e., T0 ⊑ T1[S1] should hold. The following diagram shows the construction of this term as a diagrammatic proof:
[Refinement diagram proof: T0 is implemented by T1 using the specification S0, S0 is implemented by S1, and T1 is redirected to use S1; steps 1–4.]
The proof shows that we have used a specification S0 of S1 in order to make it easier to check the correctness of the constructed system. The refinement diagram shows all construction steps in a single diagram. The following figure shows the construction as a sequence of buildups leading up to the final diagram. The step is indicated in parentheses below each subfigure.
[Figure: the construction shown as a sequence of diagram buildups (0)–(4), leading up to the final refinement diagram.]
The last figure here shows the intermediate components and relations with dotted lines, to indicate that these have been used as stepping stones in order to reach the final system, but are not needed in the final system. Here we take the view that the components are constructed when they are needed. An alternative view is that the components already exist and are waiting to be used. The construction then only combines existing components in a suitable way and checks that they satisfy the correctness requirements. In the latter case, we could start with all these parts on the canvas (and maybe other parts as well), and the construction would then amount to connecting the right parts in the right way. The construction steps above can be explained as follows, assuming that initially only T0 is given:
1 We provide the specification of an auxiliary part S0 and an implementation T1[S0] of T0. We show that this is a correct implementation.
2 We then provide an implementation S1 of S0, and prove that this implementation satisfies the specification S0.
3 We redirect T1 to use the implementation S1 rather than the specification S0. This is a correct refinement of the previous version of T1, which used the specification S0 (by monotonicity).
4 Finally, we notice that we now have a correct implementation T1[S1] of the original specification T0 (by transitivity). The specification S0 and the previous version of T1 that used S0 are now obsolete, so we can forget about them.
The same derivation, now expressed as a Hilbert-like proof in the refinement calculus, is as follows:
1 T0 ⊑ T1[S0] (assumption or lemma)
2 S0 ⊑ S1 (assumption or lemma)
3 T1[S0] ⊑ T1[S1] (by monotonicity from 2)
4 T0 ⊑ T1[S1] (by transitivity from 1, 3)
We thus have three different ways of describing the same construction: a diagrammatic way based on refinement diagrams, a software process log where the successive development steps are described informally, and a formal proof in the refinement calculus. The first description is the most intuitive one and allows for a good overview of the architecture of the system that is constructed. The second description shows clearly what actions we need to take to carry out the construction. The third one provides a high-level formal proof of the correctness of the construction. We have in the proof added “assumption” or “lemma” as justification, to indicate that these steps may have been established in a different proof framework, and are here taken as lemmas or assumptions. This is a way in which we can separate program construction in the large, which is done using refinement diagrams, from program construction in the small, which is possibly carried out in other logical frameworks (or using refinement diagram reasoning on a lower level in the refinement calculus hierarchy).
Note. The above formalization assumes that the terms describing the software parts are always well formed and internally consistent. We can emphasize the construction of a well-formed and consistent software part by introducing a separate judgement for this, stating that a part S has the required property (e.g., that S is internally consistent). Then the diagrammatic proof and the corresponding Hilbert-like proof will have two kinds of judgements, well-formedness judgements and refinement judgements S ⊑ T, and at least twice as many steps. We choose the version above for our examples, because it leads to shorter and simpler proofs. In actual software construction, one may want to use the more detailed version, in order to better identify the two different kinds of steps involved: first constructing a part and then checking that it satisfies its requirements.
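The core of this derivation is only monotonicity of the using context plus transitivity of the ordering, which can be checked on a toy example. The sketch below is an editorial illustration; the concrete sets and the context T1[·] are hypothetical choices, with set inclusion standing in for refinement.

```python
# Sketch of the derivation  T0 below T1[S0],  S0 below S1  =>  T0 below T1[S1].
leq = lambda x, y: x <= y

T0 = frozenset({1})                  # specification of the part to be built
S0 = frozenset({2})                  # specification of the auxiliary part
S1 = frozenset({2, 3})               # an implementation of S0
T1 = lambda S: frozenset({1}) | S    # the context T1[X], monotonic in the used part

assert leq(T0, T1(S0))               # step 1: T0 below T1[S0]
assert leq(S0, S1)                   # step 2: S0 below S1
assert leq(T1(S0), T1(S1))           # step 3: by monotonicity of T1[.]
assert leq(T0, T1(S1))               # step 4: by transitivity, the goal
```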
5.2 A recursive implementation
The previous derivation assumed that the parts involved are non-recursive. The infinite unfolding of a non-recursive term is then equivalent to some finite unfolding of the term, so we can consider all terms to be finite. If we have recursive components, this is no longer true.
Consider first a system with a single recursive component. We assume that we have a specification S0 of a component, and we want to implement this with the component (µX · S1[X]). We can use the following induction principle to reason about properties of a fixed point. Assume that f is a monotonic function on a lattice, and that we have a monotonically increasing sequence x0 = ⊥ ⊑ x1 ⊑ x2 ⊑ . . . such that x = ⊔i≥0 xi. Assume further that xn+1 ⊑ f.xn holds for any n ≥ 0. We then have that

x ⊑ µ.f

This result is the basis for a proof rule for recursive procedures described in [Back and von Wright, 1998]. The proof of this is a rather simple exercise in lattice theory. We first show that xn ⊑ µ.f holds for any n ≥ 0. We prove this by induction. For n = 0, we have x0 = ⊥ ⊑ µ.f. Next, assume that xn ⊑ µ.f holds. We then have that

xn+1 ⊑ f.xn ⊑ f.(µ.f) = µ.f

From this it then follows that

⊔n≥0 xn ⊑ µ.f

as the limit is the least upper bound of the sequence. (Note: for the general case, we need to carry on the argument to transfinite induction.) This result would be used in the following way. The sequence x0 = ⊥ ⊑ x1 ⊑ x2 ⊑ . . . provides better and better approximations of the specification of the component, such that x = ⊔n≥0 xn is the complete specification of the component. We prove for an arbitrary approximate specification xn+1 that xn+1 ⊑ f.xn holds. In other words, the specification xn+1 can be replaced by the body of the component, which calls some specification lower down in the approximation hierarchy. This means that any sequence of unfoldings will eventually terminate. Ignoring the indexing of the specification components, the recursion rule has the following general form:

S0 ⊑ S1[S0] ⇒ S0 ⊑ (µX · S1[X])

The need for a termination argument when applying the rule is indicated in the diagram below with a star on the refinement arrow.
[Diagram: the recursion rule as a refinement diagram over S0 and S1; step 1 shows S0 ⊑ S1[S0], and the starred step 2* shows S0 ⊑ (µX · S1[X]) with a termination side condition.]
The star on step 2 indicates that this step has a side condition. The diagram shows the following textual Hilbert-like proof:
1 S0 ⊑ S1[S0]
2 S0 ⊑ (µX · S1[X]) (by recursion rule, provided termination is guaranteed)
Now consider systems with mutually recursive calls. Initially, we have two specifications, S0 and T0. Assume that we implement T0 with T1 that uses the specification S0 and S0 with S1 that uses the specification T0. We want to show that the system where these two statements call each other directly is a correct implementation of these specifications. The following is a diagrammatic proof of this:
[Diagram: refinement diagram for mutual recursion over the specifications T0, S0 and the implementations T1[S0], S1[T0]; step 1 introduces the implementations, and the starred step 2* applies the recursion rule with a termination side condition.]
The derivation steps are as follows:
1 (T0, S0) ⊑ (T1[S0], S1[T0])
2 (T0, S0) ⊑ (µX, Y · (T1[Y], S1[X])) (by recursion rule)
Note that we do not provide any tupling operation in refinement diagrams. Here tupling has to be inferred from the fact that two relations have the same step number. In practice, we can do the tuple proofs one component at a time, to make the derivation easier to follow, and to better record what really happens in the construction. This gives us the following derivation:
1 T0 ⊑ T1[S0]
2 S0 ⊑ S1[T0]
3 T0 ⊑ (µX, Y · (T1[Y], S1[X]))1 (by recursion rule, assuming termination)
4 S0 ⊑ (µX, Y · (T1[Y], S1[X]))2 (by recursion rule, assuming termination)
The ordering of steps 1 and 2, and of steps 3 and 4, does not matter here. In steps 3 and 4, we need to establish termination. For this purpose, we need to show that some termination function is decreased by each call. If termination is established, then we get refinement for free. Note the crucial role played by the specifications when we implement components using mutual recursion. The specification serves as the induction hypothesis in the construction (and the termination argument is needed to show that the induction is well-founded).
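To make the fixed point reasoning behind these recursion rules concrete, the following sketch (not from the paper; Python is used purely for illustration, and the example lattice is invented) computes µ.f for a monotonic function f on the powerset lattice of a small finite set by iterating from the bottom element, and checks that every approximation in the increasing chain stays below the least fixed point, as in the induction argument above.

    # Illustrative sketch: the fixed point construction on the powerset lattice
    # of a small finite set, ordered by inclusion; bottom is the empty set.

    def lfp(f):
        """Least fixed point of a monotonic f, by iteration from bottom."""
        x = frozenset()                   # bottom element of the lattice
        while True:
            nxt = f(x)
            if nxt == x:                  # a fixed point; by Kleene iteration it is the least one
                return x
            x = nxt

    # A monotonic function on subsets of {0,...,5}: always contains 0 and is
    # closed one step further under n -> n+1.
    def f(s):
        return frozenset({0}) | {n + 1 for n in s if n < 5} | s

    mu_f = lfp(f)
    print(sorted(mu_f))                   # [0, 1, 2, 3, 4, 5]

    # The approximations x0 = bottom, x_{n+1} = f(x_n) form an increasing chain,
    # and each approximation is below (a subset of) the least fixed point,
    # mirroring the induction x_n below mu.f used in the proof rule.
    x = frozenset()
    for _ in range(7):
        assert x <= mu_f                  # <= on frozensets is subset inclusion
        x = f(x)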
5.3 On duplication of terms and implicit inference
The derivation shown in the figures above can seem overly complex, because we are duplicating some entities, like T1, in these derivations. It would seem more economical to redirect the arrow in the derivation rather than duplicating the whole entity. For instance, we could consider representing this derivation as follows:
[Diagram: nodes T0, T1, S0 and S1 with step 1 (T0 refined by T1) and step 2 (S0 refined by S1); in step 3 the solid usage arrow from T1 is redirected from S0 to S1.]
This figure shows the third step as just a redirection of the solid arrow from T1 to S1 . Implicitly, this could be seen as stating that this redirection is ok, in the sense that all relations that held before are still valid. In particular, this would mean that T1 [S1 ] would still be an implementation of T0 (as indicated by step 4.). The advantage here is that the derivation becomes more compact, and the use of duplicates is avoided. There is also a considerable advantage in keeping the layout of the class diagram unchanged and just moving arrows around. The disadvantage is that the meaning of a box becomes ambiguous. Consider the same situation as above, but now include a user of T1 , say U1 .
[Diagram: the same situation extended with a user U1 of T1 (nodes U0, U1, T0, T1, S0, S1); step 3 again redirects the call of T1 from S0 to S1.]
Here a change in T1 (to use S1 rather than S0) will also mean that U1 is changed, from U1[T1[S0]] to U1[T1[S1]]. However, this change is difficult to notice here. As there are many such dependencies in any larger software system, avoiding duplication of terms opens up the door to hidden and possibly uncontrolled and unwanted changes in the software system. (On the other hand, as we will show below, redirection of calls can be a very simple and important mechanism for doing automatic changes in the dependency structure of software, when the changes are known to be admissible.) Compare this situation above to the same derivation with duplication:
[Diagram: the same derivation carried out with duplication; the new terms are introduced explicitly in steps 1-6, and both U1[T1[S0]] and U1[T1[S1]] appear as separate nodes.]
Here the new terms that are introduced are shown explicitly, in separate derivation steps. Both U1[T1[S0]] and U1[T1[S1]] occur in the same diagram. There is, however, still room for improvements in the presentation of the diagrams. The trick here is to avoid making inferences unless they are explicitly needed. For instance, we could be happy to present the diagram above in the following form:
[Diagram: a reduced version of the derivation, showing only step 1 (T0 refined by T1), step 2 (S0 refined by S1) and step 4 (U0 refined by U1), without the inferred terms and arrows.]
Here we have indicated the refinement of S0 by S1 , but we have not drawn the consequences. The inferred terms and arrows can be indicated later, if they are needed.
Alternatively, we could combine a number of inference steps into a single step, as in the following diagram:
[Diagram: a combined inference step; only the conclusion that U1[T1[S0]] is refined by U1[T1[S1]] is drawn, with the intermediate terms and arrows left implicit.]
Here we only show the desired conclusion, that U1[T1[S0]] is refined by U1[T1[S1]], and we leave the intermediate transitivity and monotonicity steps implicit, as these are easy to see by arrow chasing. The conclusion is that the duplication of terms is really needed, to avoid making the derivations ambiguous. On the other hand, one does not have to draw all the inference arrows and intermediate terms that are possible, but only those that are relevant for the final result. The problem with too much detail should be handled at a different level. The diagrammatic proof should be seen for what it is, as a proof, so it must be unambiguous and show all the information that is necessary to easily convince the observer that the stated facts do indeed hold. After the proof is done, one need only display the part of the diagram that is interesting for the present purpose. This is similar to a lemma that we have proved: we don't want to see the proof when using the lemma, but the proof should have been done, and it should be available for inspection later on, if we start doubting the lemma.
5.4 Improving parts by respecting information hiding
Above we showed how to implement a specification with the help of an auxiliary component, using information hiding for the implementation of the dependent component. Next we show how to further improve dependent components in a way that respects information hiding. Consider first the following situation. We have a statement S0 that is a specification for statement S1. We have a statement T0 that uses S0 (we ignore here the possible specification for T0). This means that T0 only knows of the specification S0 of the S component. We refine the T0 that uses S0 to T1 using the same S0. This means that T1 also only knows the specification S0 of the S component. By monotonicity, T1 using S1 is a correct refinement of T1 using S0. This means that the component T1 can freely use the implementation S1, but it is not able to use any special features of the implementation; it can only use those features that are specified in S0.
Figure 3 shows the derivation.
Figure 3. Implementation that respects information hiding [diagram: the desired system on the left and the diagram proof with steps 1-4 over T0, T1, S0 and S1 on the right]
The desired system is shown on the left. The diagram proof on the right shows the sequence of steps that lead to the construction of a correct implementation of T0[S0]. The implementation is T1[S1], whose correctness is established in step 4. The fact that we are using information hiding is shown in the left diagram, which gives a compressed view of the derivation. In this diagram, we show that the term T1[S0] occurs as an intermediate step in the derivation (using the dashed arrow). This means that T1 cannot have any information about the implementation, because it refers to the specification only. Note that we are here using T0[S0] as the specification of the final system, rather than a single specification statement like T0 in the previous case. This means, in particular, that we do not check whether the system T0 uses S0 in the correct way. If it does not, then it is possible that this error is propagated to the implementation T1. If we want to avoid this, we need to add a specification for T0 and prove that T0 satisfies this specification.
5.5 Improving parts without respecting information hiding
If we insist on information hiding in the refinement step above, then it follows that the implementation T1 of T0 cannot make use of the implementation S1. In many cases, it may be desirable that T1 does make use of the implementation, e.g., because of efficiency reasons, or because it needs direct access to the data representation in S1, or because it wants to utilize new functionality provided by S1. In this case, we would prefer to implement S0 and T0 together, and therefore break the information hiding principle. The derivation in Figure 4 does not respect information hiding. The fact that we are not respecting information hiding here is shown in the left diagram
Figure 4. Implementation that does not respect information hiding [diagram: the desired system on the left and the diagram proof with steps 1-4 on the right; T1 refers to S1 directly]
by the upward diagonal arrow, which shows that the term T0[S1] occurs as an intermediate term, rather than T1[S0] as in the previous derivation. The statement T1 does not refer to S0 but only to S1, so it can make use of facilities that only exist in the implementation. Note that the resulting diagrams are the same in both cases, but that the derivation shows the differences between the two diagrams. Thus, we can choose between enforcing and not enforcing information hiding in software construction, depending on what we want to do. In both these cases, the ordering of the steps 1 and 2 is not important. If step 1 comes before step 2, then we proceed bottom up, first defining the components that are used before using them. If step 2 comes before step 1, then we are proceeding top down, first defining the user of a component, before implementing the component. Both cases are equally acceptable. We can also think about the construction as proceeding by first placing some (or all) entities on the diagram, before starting to connect them by refinement arrows and construct additional entities. This would correspond to building the system using off-the-shelf components.
6. Extension
The previous discussion has centered around the use of specifications in software construction, as a way of modularizing software. Modularity is a central technique for allowing incremental software construction. There is, however, another important way in which software can be built incrementally, and that is by extending software by new features one by one. In [Back, 2002], this approach is discussed extensively, under the name stepwise feature introduction. We will here consider how one can model this kind of software extension in the refinement calculus framework with refinement diagrams.
We write S ⊕ T[base] for the component that we get by extending component S by component T. We can model extension by usage: S ⊕ T[base] = T[S], i.e., an extension is a component that uses another component. The extending component T refers to the extended component S by the name base (the word super is more common, but base seems more appropriate here). In addition, we will then require that S ⊑ T[S] holds, to guarantee that the extension preserves the functionality of the original component. Note that S[X] ⊕ T[X] = T[X, S[X]], i.e., both S and T can also be dependent on other parts in the environment. We will introduce a special arrow (with a hollow head) to indicate extension. Notice that, by the definition, an extension arrow is really two arrows, a usage arrow and a refinement arrow, as shown in the following diagram:
[Diagram: an extension arrow from S to T on the left is equivalent, on the right, to a usage arrow (T calls S under the name base) together with a refinement arrow from S to T[S].]
The usage arrow shows that the extending statement S1 can call the extended statement S0, which is called base inside S1. The (data) refinement arrow indicates that S0 ⊑ S1[S0], i.e., that the extension is a superposition refinement of the original statement (see, e.g., [Back and Sere, 1996]). The refinement is restricted, in the sense that the extended statement can introduce some new attributes, but cannot remove old attributes, and can also not change their intended interpretation (essentially, this means that the encoding function is a projection from the old and new attributes to the old attributes). As explained above, we write S0 ⊕ S1 for the statement S1[S0], where the super calls are replaced with the called statement. We will say that S0 is superposition refined by S1 when S0 ⊑ S0 ⊕ S1 holds.
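To give the extension operator an operational reading, the following sketch (an invented example, not taken from the paper) models S ⊕ T in an object-oriented style: the extension T holds the extended component under the name base, delegates the old functionality to it, and adds a new feature on top, so that the original behavior is preserved.

    # Illustrative sketch (invented example): extension by usage, S (+) T = T[S].
    # A Counter is extended with an undo feature; the extension calls the original
    # component through the name `base` and preserves its functionality.

    class Counter:                        # the extended component S
        def __init__(self):
            self.value = 0
        def inc(self):
            self.value += 1
            return self.value

    class UndoExtension:                  # the extending component T, parameterized by base
        def __init__(self, base):
            self.base = base              # T refers to S by the name `base`
            self.history = []
        def inc(self):                    # old feature: delegate to base
            self.history.append(self.base.value)
            return self.base.inc()
        def undo(self):                   # new feature added by the extension
            if self.history:
                self.base.value = self.history.pop()
            return self.base.value

    counter = UndoExtension(Counter())    # the composed component S (+) T
    counter.inc(); counter.inc()
    assert counter.undo() == 1            # new functionality
    assert counter.inc() == 2             # original functionality is preserved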
6.1 Extension, layering and binding
We have above discussed how to model software construction in terms of specifications and implementations of software modules that may depend on other modules (possibly in a recursive way). Let us now consider extension rather than implementation in this framework. Assume that we have built a basic system, consisting of a collection of parts (e.g., classes) that use each other. This system provides some basic functionality. Next, we want to extend the functionality of the system with some new feature. Often, it is not sufficient to just extend a single part; the new functionality may require the cooperation of two or more parts, where some of these parts are extensions of existing parts in the basic system. Essentially, we want
to build a new layer on top of the basic system layer, the new layer providing the added functionality. To make this more concrete, let us assume that we start off with a statement T0 that uses another statement S0. We then extend S0 with S1 that adds some new functionality to S0. Then we extend T0 with T1 which makes use of the added functionality in S1 to provide new functionality in T1. How do we model this with refinement diagrams? The simple solution is provided by the following diagram (the left diagram shows the derivation with the extension arrow, the right shows the same diagram expressed in terms of refinement and dependency only):
[Diagram: the basic layer T0, S0 extended with S1 (step 1) and T1 (step 2); on the left with extension arrows, on the right the same diagram expressed with usage and refinement arrows only.]
Here we start with the system consisting of T0[S0] and S0 (the basic layer). In step (1) we introduce a part S1 such that S0 ⊑ S0 ⊕ S1 (i.e., S0 is superposition refined by S1). In step (2) we then introduce a new part T1 such that T0[S0] is superposition refined by T1[S1], i.e., T0[S0] ⊑ T0[S0] ⊕ T1[S1]. This layering uses static binding for the extended parts. This means that in the extended system T0 ⊕ T1, the base part T0 continues to use the base part S0, even if there is an extension S0 ⊕ S1 of this part available. We can model dynamic binding by requiring that the extended version of the used part is used also in all extension layers. The following derivation achieves this effect:
[Diagram: the derivation for dynamic binding, with steps 1-4 over T0, S0, S1 and T1; the monotonicity and transitivity steps are drawn explicitly.]
The initial situation is the same as before, and step (1) is also the same as above. However, in step (2) we use monotonicity to derive that T0[S0] ⊑ T0[S0 ⊕ S1]. In step (3) we show that T0[S0 ⊕ S1] is superposition refined by T1[S0 ⊕ S1], i.e., T0[S0 ⊕ S1] ⊑ T0[S0 ⊕ S1] ⊕ T1[S0 ⊕ S1].
Finally, in step (4) we use transitivity to derive the required result,
T0[S0] ⊑ T0[S0 ⊕ S1] ⊕ T1[S0 ⊕ S1]
This says that the bottom layer is refined by the extension, where the binding is dynamic rather than static. The above shows that layer extension with dynamic binding requires some extra steps in the derivation, which complicates the derivation. Adding more layers makes the derivation even more cumbersome. The following diagram shows the derivation of three layers with dynamic binding:
[Diagram: the derivation of three layers T0, T1, T2 over S0, S1, S2 with dynamic binding; each extension layer is shown to refine the previous layer.]
We have here shown that each extension is a refinement of the previous layer, when dynamic binding is used in the layers. On the other hand, one can note that the extra steps that are required by dynamic binding can be inferred and do not have to be proved explicitly. This suggests that we could introduce a simpler notation where dynamic binding is implicit. For this purpose, we introduce layers as an additional device in the diagrams. A layer indicates a collection of extensions that are to be used together. Whenever a part is referenced at a lower level of extension, it is taken to mean the extension in the current layer (i.e., all calls are bound to extensions in the current layer). The layer is determined by the part that we consider initially. We indicate a layer with a dashed outline in the diagram. The following diagram shows the two layer system with dynamic binding, using layers on the left and without layers on the right (the same diagram as above):
[Diagram: the two layer system with dynamic binding; on the left using layers (indicated by dashed outlines), on the right the full derivation without layers.]
The three layer system with dynamic binding shows the advantage of the notation even more clearly (layers on the left, no layers on the right):
S2
T1
S1
T2
S2
T1 T1
T T0 T0
S0
T0
S1
T0 S0
The three layer system really defines three different systems. The basic system is started by using T0 as the main program. It provides some basic functionality, and makes use of S0 as an auxiliary part. The intermediate system is started using T1 and it makes use of the extension S1 of S0 (and is itself an extension of the system T0). All calls to S0 are redirected to S1. Finally, the most advanced system is started from T2 and makes use of the extension S2 of S1. This system adds even more functionality, and all calls to S0 or S1 are redirected to the extension S2. There are a few loose ends in this description of software layers. First, we assume that the layers in the system have a tree like structure, so that for each layer there is a unique previous (father) layer. A part may only reference a part in a preceding layer (which means either a previous layer or a layer that precedes the previous layer). For any used component, the most recent extension is used, i.e., the extension that is closest to the chosen layer in the layer hierarchy. This convention corresponds to the single inheritance principle in object oriented systems. Note that the layering construct allows a number of different extension hierarchies to co-exist at the same time. At the same time, it prevents extensions in different layers from being used at the same time. In many situations, this is exactly what we want. There are, however, also situations where we do not want this. For such situations, we may use both kinds of calls: dynamic calls that
are redirected by layering, and static calls that cannot be redirected. We would then need a differentiating notation for these two calls (we will ignore this issue here).
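The difference between static and dynamic binding of extended parts can be illustrated with a small lookup mechanism. The sketch below is only illustrative (all names are invented): under static binding a lower-layer part keeps its original collaborator, while under dynamic binding every reference is resolved to the most recent extension visible from the chosen layer.

    # Illustrative sketch (invented names): static vs. dynamic binding of layered parts.
    # Each layer maps part names to the version used in that layer; a dynamic lookup
    # walks from the chosen layer towards the base layer.

    def s0(): return "S0"
    def s1(): return "S1 extending " + s0()          # extension of S0

    def make_t0(resolve):
        # T0 uses the S part; `resolve` decides which version it actually gets.
        def t0(): return "T0 using " + resolve("S")()
        return t0

    layer0 = {"S": s0}
    layer1 = {"S": s1}                               # layer 1 extends the S part
    layers = [layer0, layer1]

    def resolve_static(name):
        return layer0[name]                          # static: always the base version

    def resolve_dynamic_from(level):
        def resolve(name):
            for layer in reversed(layers[:level + 1]):   # most recent extension first
                if name in layer:
                    return layer[name]
            raise KeyError(name)
        return resolve

    t0_static  = make_t0(resolve_static)
    t0_dynamic = make_t0(resolve_dynamic_from(1))    # system started from layer 1

    print(t0_static())    # T0 using S0                 (T0 keeps its original collaborator)
    print(t0_dynamic())   # T0 using S1 extending S0    (calls redirected to the extension)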
6.2 Layered specifications
Let us next consider the relationship between extension and implementation. Assume that we have a preliminary specification S0 which we have implemented by S1. Assume now that we want to extend the specification by some new features T0. This now gives us a new layered specification S0 ⊕ T0. We could implement this layered specification directly by a new implementation U1 (as shown below on the left), or we could implement the layered specification by an extension T1 of the original implementation S1, giving us S1 ⊕ T1 (as shown below on the right).
[Diagram: on the left, the layered specification S0 ⊕ T0 implemented directly by U1; on the right, the same specification implemented by extending S1 with T1, giving S1 ⊕ T1.]
Both approaches are perfectly valid. In the first approach, one decides that the presence of the new features in T0 requires also a change of the original implementation of S0, and that the new and old features are therefore better implemented anew, as U1. The second approach deems that the new features in T0 are such that they do not require a reimplementation of the features in S0, but that it is sufficient to just extend the implementation S1 with an implementation T1 for the new features of T0. Which approach one chooses depends on the situation. Very often the new features are rather orthogonal to the old features, so the second approach is feasible. Its advantage is that one can reuse the implementation of the S0 features, and only needs to check that the new features in T0 are correctly implemented and that their implementation does not compromise the implementation of the previous layer. This will usually save a lot of time and effort.
6.3 Extension, non-recursive implementation and usage
A big problem with the correctness proof of a layered system in Section 6.1 is that the proof is not local. Thus, when we add extension layers to the system, we are forced to prove refinement between larger and larger terms. For instance, we have to prove in step 3 that T0[S0 ⊕ S1] ⊑ T0[S0 ⊕ S1] ⊕ T1[S0 ⊕ S1]. If S0 and S1 are non-trivial statements, then this can require proofs involving
very large terms. If these statements in turn call other statements, the terms get even bigger. We therefore need to use more local reasoning and modularize the proof, in order to keep it of manageable complexity. The solution is again to introduce specifications for components. On the surface, this complicates the proof, because we have to come up with a whole new set of constructions, the specifications, and we have to establish many more properties. However, the terms in the propositions are now smaller, and calls only refer to specification statements. The derivation of the same end result as before, but now using specifications, is shown in the following diagram
[Diagram: the derivation of the layered system using specifications; the Layer 0 box contains T00, T10, S00 and S10 with steps 1-4, and the Layer 1 box contains T01, T11, S01 and S11 with steps 5-13.]
We write here Sjk for component S in implementation j and layer k, and similarly for component Tjk. Let us now consider the Hilbert proof for this diagram. First we derive an implementation of the original system specification T00 with T10 (using an auxiliary statement S10). All the terms used in this derivation are "small" in the sense that they consist of at most one main statement and then specifications of other statements (rather than implementations of other statements).
1. T00 ⊑ T10[S00]
2. S00 ⊑ S10
3. T10[S00] ⊑ T10[S10] (by monotonicity)
4. T00 ⊑ T10[S10] (transitivity)
Next, we derive the extension level:
5. S00 ⊑ S00 ⊕ S01
6. S10 ⊑ S10 ⊕ S11
7. S00 ⊕ S01 ⊑ S10 ⊕ S11
8. T00 ⊑ T00 ⊕ T01
9. T10[S00] ⊑ T10[S00 ⊕ S01] (monotonicity)
10. T10[S00 ⊕ S01] ⊑ T10[S00 ⊕ S01] ⊕ T11[S00 ⊕ S01]
11. T00 ⊕ T01 ⊑ T10[S00 ⊕ S01] ⊕ T11[S00 ⊕ S01]
12. T10[S00 ⊕ S01] ⊕ T11[S00 ⊕ S01] ⊑ T10[S10 ⊕ S11] ⊕ T11[S10 ⊕ S11] (monotonicity)
13. T00 ⊕ T01 ⊑ T10[S10 ⊕ S11] ⊕ T11[S10 ⊕ S11] (transitivity)
Note that step 9 is not shown in the diagram derivation. This is the step that is done automatically because of dynamic binding, as indicated by the layering of the diagram. This finishes the derivation of the second layer. In the end, we have found an implementation of the specification for the first two layers, and we have proved that this implementation is consistent and that it is a superposition refinement of the specification layers. (The consistency of the specification layers has also been established on the way.) The final system that we have derived is the following:
[Diagram: the final system; Level 0 contains T00, T10 and S10, and Level 1 contains T11 and S11.]
In fact, this describes two different systems, Layer 0 and Layer 1. Written out explicitly, these systems are as follows:
[Diagram: the Level 0 system (T00 implemented by T10, which uses S10) and the Level 1 system (additionally containing T11 and S11).]
6.4 Layered components
Extension adds a new dimension to software diagrams. Because the diagrams can become quite large, it is useful to be able to describe extensions in a more concise way than above. One possible way is suggested below: stacking extensions on top of each other. This is only a notational abbreviation; it does not change the underlying logic of the derivations.
[Diagram: extensions stacked on top of each other, S0, S1, S2 next to T0, T1, T2.]
We could extend this notational presentation, and show the implementations as boxes to the right of the original boxes. For instance, the result of the above derivation could then be compressed into the following figure:
[Diagram: the derived system shown as two layered components, a T component (specifications T00, T01 with implementations T10, T11) and an S component (implementations S10, S11), with the bottom layer of T using the bottom layer of S.]
This description is considerably more compact than the previous derivation. It also shows that the system essentially consists of two components, a T and an S component. The implementation and extension arrows in the diagram can be omitted, if the intention is clear from the context. The figure shows that the T component is internally constructed in two layers, and for both layers we have a specification and an implementation. Similarly, the S component is constructed in two layers. We refer to these components as layered components. In addition, this figure shows that the bottom layer of the T component uses the bottom layer of the S component. Thus we can allow a layered component to be used on different levels, with increasing functionality. The T component also provides a layered specification of the component, and shows explicitly that there are two levels on which the component can be used. The layering of the different components does not have to be the same. Moreover, the layering of specifications and implementations need not correspond directly. The following diagram shows some examples of this.
[Diagram: a T component with a layered specification (T00, T20, T30) and layered implementations, next to an independently layered S component with some private parts.]
Here the specification T00 is implemented by two layers in the implementation, whereas T20 and T30 are implemented at the same time by a single layer. The layering of the S component is here independent of the layering of the T component. Moreover, the implementation of extensions T20 and T30 has been further optimized by providing further implementations. In addition, some of the implementations use private components that are not visible to the outside. This example has been constructed to show all the different possibilities for using proximity of parts to indicate usage, implementation and extension. The abbreviated notation is convenient when we are describing simple structures, but is quite restrictive and can also be ambiguous. In more complex situations we will need general refinement diagrams. However, it is possible to combine general refinement diagram constructs with the more concise constructs described above.
7. Software evolution
Refinement diagrams provide us with a tool for managing the development of software systems in a way that ensures correctness of the system throughout its construction. This is done by checking the correctness of each arrow when it is introduced, rather than trying to establish correctness of all the arrows after the system has been built. The latter approach is also feasible, but will require much more work and reimplementations when the required relations between the parts do not hold. The diagrams that we have described above consist of nodes together with three kinds of arrows: dependency/usage arrows, refinement/replacement arrows and extension/inheritance arrows. We can consider these three kinds of arrows as representing three dimensions of software: usage between parts on the x-axis, extensions of parts on the y-axis and implementations of parts on the z-axis. This means that the software we have constructed can be seen as a three dimensional cube, as illustrated by Figure 5. In reality, a piece of software is not a cube, because the dimensions are not linear, but are more general relations. Hence, a component can use more than one other component and be used by more than one other component. The dependency between components can even be cyclic. Also, a component can have more than one implementation and can be implemented in more than one way. Finally, a layer may be extended in more than one way, and may be extending more than one other layer. The last situation corresponds to multiple inheritance. However, the software cube provides an intuitive understanding of three important dimensions of software.
7.1 Evolution over time
There is one more important dimension of software, the evolution of software over time. This is the fourth, t-dimension of software. The refinement
Figure 5. Software cube [diagram: a cube whose dimensions are dependency between components, implementations (replacement) and layers (extension)]
diagrams capture this dimension also, by numbering the inference steps. Each new inference step increases the (logical) time counter by one. This time dimension is then the same as the step number in a Hilbert-like proof system. The fact that the time steps correspond to proof steps helps maintain consistency of the construction: we will never refer to a step that has not been taken yet, in the same way as we cannot refer to a part that has not been constructed yet. We are, of course, also free to time stamp the derivation steps with real time, so that we can see the exact date and time when a specific step was taken. The time dimension means that the construction of software can be played back like a movie, showing how each step adds to the construction. However, the rules we have for manipulating refinement diagrams only permit addition of elements; we do not permit any elements to be removed from a refinement diagram. This means that over time the diagram will be filled with elements that are not needed anymore. Such elements can either be stepping stones in the derivation that have served their purpose, or they can be approaches that we have abandoned because we found a better way of doing things. One could argue that these constructs should be removed to simplify the diagram. On the other hand, we do not need to remove these elements from the diagram to get a simpler view; it is sufficient to only display those parts of the diagram that are presently of interest. The parts of the diagram that reflect the historic development but are not relevant now may still be needed later. A step in the derivation that can usually be ignored may have to be revisited, if we find an error in the proof, or if we are considering an alternative development that could be based on this version. An alternative approach that was abandoned may become relevant again, when we find ourselves in a blind alley with the
present approach. Also, keeping the trail of the software development may be useful for auditing purposes, or for certification purposes. Given any specific element in the diagram (say a main program or an application), we can focus on the part of the refinement diagram that is needed to understand how this specific node was constructed. This gives us a view of the development thread for this specific node. Within this thread, we may further focus on those parts that are explicitly required for the present purpose (e.g., executing the application), and ignore the steps that were used to derive the application. This will give us a compressed description of the application, as it is today, and would correspond to the usual UML class diagram. Finally, within this diagram, we may choose to study only certain parts of the diagram, and ignore the other parts. The fact that we have saved the whole derivation of the refinement diagram, in a timed sequence of derivation steps, also allows us to go back in history. We can choose any point in time (marked by a specific derivation step number) and choose to look at the situation as it was at that date, ignoring all later steps. The diagrams shown above present the software construction as an orderly sequence of additions to the diagram. In practice, it is often necessary and desirable to redesign the system, i.e., change the software architecture without necessarily changing the functionality of the system. This means that the refinement diagram is extended with new elements, and some of the old elements become obsolete. These obsolete elements are not, however, removed. They remain in the diagram, but are on paths that will be ignored in later construction phases.
7.2 Example of software evolution
The derivation in Figure 6 is an example of software evolution. It is a continuation of the derivation above. Starting from step 13 in the previous derivation, we continue the derivation as follows:
14. We decide that the implementation S10 ⊕ S11 is too inefficient or too complicated, and we want to improve it by implementing the S component directly without layering. For this purpose, we introduce a new class S21. We show that this new class is a correct implementation of S10 ⊕ S11.
15. Because of this, we are now allowed to deduce that T10 using S10 is refined by T10 using S21 instead (this requires two applications of monotonicity and one application of transitivity).
16. Next, we show that T11 using S21 is a correct extension of T10 using S21.
17. Finally, we show that T00 ⊕ T01 is correctly refined by T10[S21] ⊕ T11[S21].
Figure 6. Example of software evolution [diagram: the layered derivation of Section 6.3 extended with steps 14-17, in which the new implementation S21 replaces the layered component S10 ⊕ S11]
This is an example of software evolution. We started with a quite strict layered construction. Having managed to get this to work, we decided that we needed a more efficient version, so we refactored the system by reimplementing the layered component S10 ⊕ S11 as a non-layered component S21. The T components were changed to use the new component instead, and we showed that the changed T components still satisfied the layered specification. We did not change the layered structure of the T component. In the figure, we have left those parts of the system that can be safely ignored uncolored and dashed. These parts are not needed for executing the system, and maybe not even for documenting the system behavior. If, however, we want to recheck the derivation, e.g. because we have found an error somewhere, then we can go back and look at the earlier versions of the system and see how the parts there were built and used.
7.3 A version control system
The above should be sufficient to show that we can build a version control system based on refinement diagrams. The versioning system would essentially be a tool for creating, storing and inspecting refinement diagrams, and would provide sophisticated tools for viewing different parts of the software cube, as explained above. The refinement diagrams emphasise the structure of software, but do not properly discuss the information associated with the different structure elements.
Typically, we would associate program code with the components, as well as other information (e.g., protection). We would associate proofs with refinement and extension arrows, in addition to, e.g., abstraction relations with refinement arrows. We may also associate test sets (e.g., automatic unit tests) with the implementation and extension arrows. We could associate usage restrictions (e.g., method preconditions) with usage arrows (or with the methods themselves), and so on. All this information also needs to be provided and inspected. A simple setup is then to have one editor that constructs and browses the refinement diagram and have other specialized editors that are used to construct and inspect the information associated with the structure elements. For instance, we could have a source code editor for writing program text, we could have an interactive proof editor for checking correctness of refinement steps, we could have a unit test framework to execute the tests automatically, we would have a compiler for executing the software system, and so on. The refinement diagram as presented above essentially equates a refinement arrow with a true refinement proposition. In practice, one may want to make a difference between a refinement arrow that should be true (the intention) and one that has been shown to be true (the established fact). As proposed in [Back, 2002], we can decorate an intended refinement arrow (and extension arrow) with a question mark, and an established refinement arrow with an exclamation mark. The exclamation mark may further be qualified by the way in which the truth of the refinement has been established: by inspection, by testing, by a manual proof, or by a formal, possibly machine checked proof.
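A tool along these lines mainly needs to store the diagram itself, the status of each arrow, and the artifacts attached to nodes and arrows. The following data structure sketch is only illustrative (the field names are invented, not prescribed by the paper); it records intended arrows (the question mark), established arrows (the exclamation mark) together with how they were established, and supports viewing the diagram as it was at an earlier derivation step.

    # Illustrative sketch (invented names): a minimal store for a refinement diagram,
    # as it might underlie a version control tool.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Part:
        name: str
        code: Optional[str] = None            # program text associated with the component

    @dataclass
    class Arrow:
        kind: str                             # "usage", "refinement" or "extension"
        source: str
        target: str
        step: int                             # derivation step number (logical time)
        established: bool = False             # False = intended ("?"), True = shown ("!")
        justification: Optional[str] = None   # "inspection", "testing", "manual proof", "machine checked"

    @dataclass
    class Diagram:
        parts: dict = field(default_factory=dict)
        arrows: list = field(default_factory=list)

        def add_part(self, part):             # elements are only ever added, never removed
            self.parts[part.name] = part

        def add_arrow(self, arrow):
            self.arrows.append(arrow)

        def at_step(self, t):                 # view of the diagram as it was at logical time t
            view = Diagram(parts=dict(self.parts))
            view.arrows = [a for a in self.arrows if a.step <= t]
            return view

    d = Diagram()
    d.add_part(Part("T0")); d.add_part(Part("T1")); d.add_part(Part("S0"))
    d.add_arrow(Arrow("refinement", "T0", "T1", step=1, established=True, justification="manual proof"))
    d.add_arrow(Arrow("usage", "T1", "S0", step=1))
    d.add_arrow(Arrow("refinement", "S0", "S1", step=2))   # still only intended ("?")
    print(len(d.at_step(1).arrows))           # 2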
8. Conclusions
We have above shown how to extend the refinement calculus with a diagrammatic notation that allows large software systems to be constructed in a rigorous and (in our opinion) quite intuitive way. The refinement diagrams that we introduce for this purpose are essentially tools to reason about lattice elements, but can be used for software by interpreting software components as elements in lattices, as is done in refinement calculus. We have shown that the refinement diagram proofs are isomorphic to Hilbert-like proofs in a lattice theory. We have applied this framework to analyze a collection of important problems in software engineering. The importance of specifications has been highlighted, and we have shown the importance of specifications when deriving large systems. We have also discussed the rationale for the information hiding principle when constructing large software systems: this principle should be used when applicable, but there are situations when it should not be used. We have also shown how to formalize and reason about systems that are built by extension layers, where the layering is based on inheritance and
dynamic or static binding. Finally, we have described how the refinement diagram proofs provide a high-level view of the evolution of the software system, and that a version control system could be based on this kind of diagram.
Acknowledgments. A number of colleagues have been very helpful in discussing the issues described here. In particular, I want to thank Marcus Alanen, Johannes Eriksson, Luka Milovanov, Herman Norrgrann, Viorel Preoteasa, and Joakim von Wright.
Notes
1. A dependent product would be a generalization of both functions and products, for which this property would also hold. However, we try to avoid dependent products in our presentation, because they cannot be expressed in a convenient manner in simply typed higher order logic.
References
[Anttila et al., 2002] Anttila, H., Back, R.-J., Ketola, P., Konkka, K., Leskelä, J., and Rysä, E. (2002). Combining stepwise feature introduction with user-centric design. Technical Report 495, TUCS - Turku Centre for Computer Science, www.tucs.fi.
[Back, 1980] Back, R.-J. (1980). Correctness Preserving Program Refinements: Proof Theory and Applications, volume 131 of Mathematical Center Tracts. Mathematical Centre, Amsterdam.
[Back, 1988] Back, R.-J. (1988). A calculus of refinements for program derivations. Acta Informatica, 25:593–624.
[Back, 1991] Back, R.-J. (1991). Refinement diagrams. In Morris, J. M. and Shaw, R. C. F., editors, Proceedings of the 4th Refinement Workshop, Workshops in Computer Science, pages 125–137, Cambridge, England. Springer-Verlag.
[Back, 2002] Back, R.-J. (2002). Software construction by stepwise feature introduction. In Bert, D., Bowen, J., Henson, M., and Robinson, K., editors, ZB 2002: Formal Specification and Development in Z and B, proceedings of the 2nd International Conference of B and Z Users, LNCS, pages 162–183, Grenoble, France. Springer Verlag. Also appeared as TUCS Technical Report 496.
[Back et al., 1996] Back, R.-J., Martin, A., and Sere, K. (1996). Specifying the Caltech asynchronous microprocessor. Science of Computer Programming, 26:79–97.
[Back et al., 2000] Back, R.-J., Mikhajlov, L., and von Wright, J. (2000). Formal semantics of inheritance and object substitutability. Technical Report 337, TUCS - Turku Centre for Computer Science, Turku, Finland.
[Back et al., 1999a] Back, R.-J., Mikhajlova, A., and von Wright, J. (1999a). Reasoning about interactive systems. In Wing, J. M., Woodcock, J., and Davies, J., editors, Proc. of the World Conference on Formal Methods (FM'99), Toulouse, France, volume 1709 of Lecture Notes in Computer Science, pages 1460–1476. Springer-Verlag.
[Back et al., 2002] Back, R.-J., Milovanov, L., Porres, I., and Preoteasa, V. (2002). An experiment on extreme programming and stepwise feature introduction. Technical Report 451, TUCS - Turku Centre for Computer Science.
[Back et al., 1999b] Back, R.-J., Petre, L., and Porres-Paltor, I. (1999b). Analyzing UML use cases as contracts. In France, R. and Rumpe, B., editors, UML'99 - Second International Conference on the Unified Modeling Language: Beyond the Standard, number 1723 in Lecture Notes in Computer Science, pages 518–533. Springer-Verlag.
[Back and Sere, 1991] Back, R.-J. and Sere, K. (1991). Stepwise refinement of action systems. Structured Programming, 12:17–30.
[Back and Sere, 1996] Back, R.-J. and Sere, K. (1996). Superposition refinement of reactive systems. Formal Aspects of Computing, 8(3):324–346.
[Back and von Wright, 1998] Back, R.-J. and von Wright, J. (1998). Refinement Calculus: A Systematic Introduction. Springer-Verlag.
[Back and von Wright, 2000] Back, R.-J. and von Wright, J. (2000). Encoding, decoding and data refinement. Formal Aspects of Computing.
[Barr and Wells, 1990] Barr, M. and Wells, C. (1990). Category Theory for Computing Science. Prentice-Hall.
[Birkhoff, 1961] Birkhoff, G. (1961). Lattice Theory. American Mathematical Society, Providence.
[Davey and Priestley, 1990] Davey, B. A. and Priestley, H. A. (1990). Introduction to Lattices and Order. Cambridge University Press.
[Dijkstra, 1976] Dijkstra, E. W. (1976). A Discipline of Programming. Prentice-Hall International.
[Dijkstra and Scholten, 1990] Dijkstra, E. W. and Scholten, C. S. (1990). Predicate Calculus and Program Semantics. Springer-Verlag.
[Gardiner and Morgan, 1993] Gardiner, P. H. and Morgan, C. C. (1993). A single complete rule for data refinement. Formal Aspects of Computing, 5(4):367–383.
[Hoare, 1972] Hoare, C. A. R. (1972). Proofs of correctness of data representation. Acta Informatica, 1(4):271–281.
[Morgan, 1990] Morgan, C. C. (1990). Programming from Specifications. Prentice-Hall.
SERVICE-ORIENTED SYSTEMS ENGINEERING: SPECIFICATION AND DESIGN OF SERVICES AND LAYERED ARCHITECTURES
The JANUS Approach
Manfred Broy
Institut für Informatik, Technische Universität München, D-80290 München, Germany
[email protected]
Abstract
Based on the FOCUS theory of distributed systems (see [Broy and Stølen, 2001]) that are composed of interacting components we introduce a formal model of services and of layered architectures. In FOCUS a component is a total behavior. In contrast, a service is a partial behavior. A layer in a layered architecture is a component or a service with two service interfaces, an import and an export interface. A layered architecture is a stack of several layers. For this model of services and service layers we work out specification and design techniques for layers and layered architectures. We study the application of the notions of a service and service layer and its relation to object orientation. Finally we discuss more specific aspects of layered architectures such as refinement and error handling as well as layer models in telecommunication.
Keywords: Service Engineering, Assumption/Commitment Specifications, Software Architectures, Layered Architecture, Import/Export Specifications, Protocol-Stack
1. Motivation
Software development is today one of the most complex and powerful tasks in engineering. Modern software systems typically are embedded in technical or organizational processes and support those. They are deployed and distributed over large networks; they are dynamic, and accessed concurrently via a couple of independent user interfaces. They are based on software infrastructure such as operating systems and middleware providing the service of object request brokers. Large software systems are typically built in a modular fashion and structured into components. These components are grouped together in software architectures. Software architectures are typically structured in layers. It is well known that hierarchies of layered architectures provide useful structuring principles for software systems. These ideas go back to "structured programming" according to Dijkstra and to Parnas (see [Parnas, 1972]). The specification and modeling of software layers and their theory, however, are not sufficiently understood and formalized until today. The purpose of this paper is to present a comprehensive theory that captures the notions of services and those of layers and layered architectures in terms of services. It aims at a theoretical basis for a more practical engineering approach to services and the design of layered architectures in terms of services. The discussion of specific architectures or architectural patterns is not within the scope of this paper. In this paper we study semantic models of services and layered architectures. We introduce a mathematical model of layers. The purpose of this theory is to provide a basis for an engineering method for the design and specification of layered architectures. We are mainly interested in schemes of specifying and designing systems in terms of the assumption/commitment paradigm. This paradigm treats systems in a form where certain properties ("commitments") are only guaranteed provided certain assumptions are fulfilled by the system environment. We show two versions of the scheme, assumptions about the input streams of a system and assumptions about the existence of certain services. The paper is organized as follows: in section 2 we define the notion of a service and in section 3 layered architectures in terms of services. We introduce the notion of a service and that of a component. We give an abstract semantic model of software component interfaces and of service interfaces. We show how these notions relate to state machines. On this basis we define a model for layered architectures. According to this model we introduce and discuss specification techniques for layered architectures. Finally, in section 4, we study specific aspects of service layers and layered architectures such as refinement,
the extension of the layered architecture, and error handling as well as the application of the idea of layered architectures in telecommunication.
2. Components and Services
In this section we introduce the syntactic and semantic notion of a component interface and that of a service. Since services are partial functions, a suggestive way to describe them is by assumption/commitment specifications. Another way to describe services is by state machines. We show how the notion of a service is related to state machines. State machines are one way to describe services. We closely follow the FOCUS approach explained in all its details in [Broy and Stølen, 2001]. It provides a flexible modular notion of a component and of a service, too.
2.1 Interfaces, Components, and Services
In this subsection we define the concepts of a component, an interface, and a service. These three concepts are closely related. All three are based on the idea of a data stream as a model for communication and a behavior as a relation on data streams.
2.1.1 Streams. We introduce the notion of a component based on the idea of a data stream. Throughout this paper we work with only a few simple notations for data streams. Streams are used to represent histories of communications of data messages in a time frame. Given a set M, by M ∗ we denote the set of finite sequences of elements from M, by M ∞ the set of infinite sequences of elements of M that can easily be represented by functions N → M. By M ω we denote M ∗ ∪ M ∞ , called the set of finite and infinite (non-timed) streams. Given a message set M we define a timed stream by a function s : N → M∗ For each time t the sequence s(t) denotes the sequence of messages communicated at time t in the stream s. We use channels as identifiers for streams in systems. Let I be the set of input channels and O be the set of output channels. With every channel c in the channel set I ∪ O we associate a data type Type(c) indicating the type of messages sent along that channel. A data type is in our context simply a data set. Let C be a set of channels with types assigned by the function Type : C → TYPE
Here TYPE is a set of types τ ∈ TYPE, which are carrier sets of data elements. Let M be the universe of all messages. This means
M = ∪{τ : τ ∈ TYPE}
The concept of a stream is used to define the concept of a channel history. A channel history is given by the messages communicated over a channel.
Definition 1 (Channel history) Let C be a set of typed channels; a channel history is a mapping x : C → (N → M∗) such that x.c is a stream of type Type(c) for each c ∈ C. The set of channel histories for the channel set C is denoted both by H(C) and by C̄. 2
We use, in particular, the following notations for a timed stream s (let S be a set of messages, k ∈ N):
zˆs    concatenation of a sequence or stream z to a stream s,
S©s    sub-stream of s with only the elements in the set S,
S#s    number of elements in s that are elements in the set S,
s.k    k-th sequence in the stream s,
s ↓ k    prefix of the first k sequences in the timed stream s,
s ↑ k    stream s without the first k sequences,
s̄    finite or infinite (non-timed) stream that is the result of concatenating all sequences in s.
Note that s̄ defines a time abstraction for the timed stream s. Going from s to s̄ eliminates all timing information. Similarly we denote for a channel valuation x ∈ C̄ by x̄ its time abstraction, defined for each channel c ∈ C by taking x̄.c to be the time abstraction of the stream x.c. All the operators introduced above easily generalize to sets of streams and sets of valuations by element-wise application. Given two disjoint sets C and C′ of channels with C ∩ C′ = ∅ and histories z ∈ H(C) and z′ ∈ H(C′) we define the direct sum of the histories z and z′ by (z ⊕ z′) ∈ H(C ∪ C′). It is specified as follows:
(z ⊕ z′).c = z.c ⇐ c ∈ C
(z ⊕ z′).c = z′.c ⇐ c ∈ C′
The notion of a stream is essential for defining the behavior of components.
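As a concrete reading of these definitions, a finite prefix of a timed stream can be represented as a list of per-interval message sequences. The following sketch (illustrative only; Python is used for brevity and the representation is an assumption, not part of FOCUS) shows the prefix operator ↓, the time abstraction, and the direct sum of histories over disjoint channel sets.

    # Illustrative sketch: a finite prefix of a timed stream s : N -> M* represented
    # as a list of message sequences, one per time interval.

    s = [["a"], [], ["b", "c"], ["d"]]         # s.0 = <a>, s.1 = <>, s.2 = <b,c>, s.3 = <d>

    def prefix(s, k):                          # s "down-arrow" k: the first k sequences
        return s[:k]

    def time_abstraction(s):                   # s-bar: concatenate all sequences, dropping timing
        return [m for interval in s for m in interval]

    def direct_sum(x, y):                      # (x (+) y) for histories over disjoint channel sets
        return {**x, **y}

    print(prefix(s, 2))                        # [['a'], []]
    print(time_abstraction(s))                 # ['a', 'b', 'c', 'd']

    x = {"a": [["req"]]}                       # history over channel set {a}
    y = {"b": [["data"]]}                      # history over channel set {b}
    print(direct_sum(x, y))                    # history over {a, b}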
2.1.2 Components. Components have interfaces determined by their sets and types of channels. We describe the black box behavior of components by their interfaces.
An interface provides both a syntactic and semantic notion. We use the concept of a channel, a data type and a data stream to describe interfaces. The syntactic interfaces define a kind of type for a component. The semantic interfaces characterize the observable behavior of components.
Definition 2 (Syntactic interface) Let I be a set of typed input channels and O be the set of typed output channels. The pair (I, O) characterizes the syntactic interface of a component. By (I O) this syntactic interface is denoted. 2
A component is connected to its environment exclusively by its channels. The syntactic interface indicates which types of messages can be exchanged but it tells nothing particular about the interface behavior. For each interface (I O) we call (O I) the converse interface.
Definition 3 (Semantic interface of a component) A semantic component interface (behavior) with the syntactic interface (I O) is given by a function
F : Ī → ℘(Ō)
that fulfills the following timing property, which axiomatizes the time flow. By F.x we denote the set of output histories for the input stream x of the component described by F. The timing property reads as follows (let x, z ∈ Ī, y ∈ Ō, t ∈ N):
x ↓ t = z ↓ t  ⇒  {y ↓ t + 1 : y ∈ F(x)} = {y ↓ t + 1 : y ∈ F(z)}
Here x ↓ t denotes the stream that is the prefix of the stream x and contains the first t finite sequences. In other words, x ↓ t denotes the communication histories in the channel valuation x until time interval t. 2
y1 : T1 .. .
xn : Sn
f
.. . ym : Tm
Figure 1. Graphical Representation of a Component as a Data Flow Node with Input Channels x1 , . . . , xn and Output Channels y1 , . . . , ym and their Respective Types
As we will see in the following, the notion of causality is essential and has strong logical consequences. We give a first simple example of these consequences of the causality assumption. Let us consider the question whether we can have F.x = ∅ for a component with behavior F for some input history x. In this case, since x ↓ 0 = ⟨⟩ for all streams x, we get x ↓ 0 = z ↓ 0 for all streams z and by causality {y ↓ 1 : y ∈ F(x)} = {y ↓ 1 : y ∈ F(z)} = ∅ for all streams z. Therefore the result of the application of a strictly causal function is either empty for all its input histories or F is "total", in other words F.x ≠ ∅ for all x. In the first case we call the interface function paradoxical. In the latter case we call the interface function total.
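Strict causality can also be read operationally: what a component outputs up to time t + 1 may depend only on what it has received up to time t. The following sketch (an invented example) realizes this with a deterministic one-tick delay component on finite timed streams; the timing property holds because each output interval merely copies the previous input interval.

    # Illustrative sketch: a strictly causal (time-guarded) component on finite
    # timed streams; it forwards every message with a delay of one time interval.

    def delay(x):
        # Output at interval 0 is empty; output at interval t+1 equals input at interval t.
        return ([[]] + [list(interval) for interval in x[:-1]]) if x else []

    x = [["a"], ["b", "c"], []]
    z = [["a"], ["d"], []]                 # agrees with x on the first interval only

    print(delay(x))                        # [[], ['a'], ['b', 'c']]
    print(delay(z))                        # [[], ['a'], ['d']]

    # Timing property: since x and z agree up to time 1, the outputs agree up to time 2.
    t = 1
    assert delay(x)[:t + 1] == delay(z)[:t + 1]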
2.1.3 Services. A service has a syntactic interface just like a component. Its behavior, however, is "partial" in contrast to the totality of a component interface behavior. Partiality here means that a service is defined only for a subset of its input histories. This subset is called the service domain.
Definition 4 (Service interface) A service interface with the syntactic interface (I O) is given by a function
F : Ī → ℘(Ō)
that fulfills the timing property only for the input histories with nonempty output set (let x, z ∈ Ī, y ∈ Ō, t ∈ N):
F.x ≠ ∅ ≠ F.z ∧ x ↓ t = z ↓ t ⇒ {y ↓ t + 1 : y ∈ F(x)} = {y ↓ t + 1 : y ∈ F(z)}
The set Dom(F) = {x : F.x ≠ ∅} is called the service domain. The set Ran(F) = {y ∈ F.x : x ∈ Dom(F)} is called the service range. By F[I O] we denote the set of all service interfaces with input channels from the set I and output channels from the set O. By F we denote the set of all interfaces for arbitrary channel sets I and O. 2
In contrast to a component, where the causality requirement implies that for a component F either all output sets F.x are empty or none is empty, a service G may be a partial function in the sense that G.x = ∅ may hold for an arbitrary set of input histories x ∈ Ī. To get access to a service, in general, certain access conventions have to be valid. We speak of a service protocol. Input histories x that are not in the service domain do not fulfill the service access assumptions. This gives a clear view: a non-paradoxical component is a total behavior, while a service may be a partial behavior. In other words a non-paradoxical component is a total service. For a component there are nonempty sets of behaviors for every input history.
Figure 2. Service Interface [diagram: a service interface box with input channels I and output channels O]
A service is close to the idea of a use case in object oriented analysis. It can be seen as the formalization of this idea. A service provides a partial view onto a component.
Example 5 (Storage service) A storage service Stose stores messages of type Data received on channel a and returns them on its channel b upon request as indicated by the signal Req: Stose : H[{a}] → H[{b}] where Type(a) = Data ∪ {Req},
Type(b) = Data.
An informal description of the Stose service reads as follows: “If for every input d ∈ Data there is eventually a request message Req then d is eventually produced as output.” Formally Stose is specified explicitly by the following specification scheme:
Stose
in: a : Data ∪ {Req}
out: b : Data
{Req}#a = Data#b (input/output assertion)
∧ ∀ d ∈ Data : {d}#a = {d}#b (input/output assertion)
∧ ∀ t ∈ N : {d}#a ↓ t ≥ {d}#b ↓ t (causality assertion, could be kept implicit)
∧ ∀ t ∈ N : {Req}#a ↓ (t + 1) ≤ Data#a ↓ t (input restriction)
The formulas express that only data are returned that have been stored before, that data are only returned on request and finally all data in the output stream are requested. The last two clauses address, in particular, the causality. Data can only be produced as output if received before and if requests have been received. We get for the service domain the following characterization:
Dom(Stose) = {a : {Req}#a = Data#a ∧ ∀ t ∈ N : {Req}#a ↓ (t + 1) ≤ Data#a ↓ t}
Stose and also Dom(Stose) include both safety and liveness properties.
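As an illustration only (not from the original text), the following Python sketch checks the finite-time part of this domain characterization, the input restriction {Req}#(a ↓ (t + 1)) ≤ Data#(a ↓ t), on a finite prefix of the input history; the liveness part {Req}#a = Data#a concerns complete, infinite histories and is not checked here. Message encodings are our own.

    REQ = "Req"

    def count(history, pred):
        """Count the messages in a finite timed history that satisfy pred."""
        return sum(1 for interval in history for m in interval if pred(m))

    def respects_input_restriction(a):
        """Check {Req}#(a ↓ (t+1)) <= Data#(a ↓ t) for every t within the prefix."""
        for t in range(len(a)):
            requests = count(a[:t + 1], lambda m: m == REQ)
            data = count(a[:t], lambda m: m != REQ)
            if requests > data:
                return False
        return True

    print(respects_input_restriction([["d1"], [REQ], ["d2"], [REQ]]))  # True
    print(respects_input_restriction([[REQ], ["d1"]]))                 # False: request before any data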
The characterization of the service domain can be used in service specifications by formulating assumptions about the input histories.
2.1.4 Assumption/Commitment Specification of Services. There are many ways to specify components or services. All techniques for component specifications (see [Broy and Stølen, 2001]) can, in principle, also be used for services. Services can be specified by logical formulas defining the relation between input and output streams, by state machines, or by a set of message sequence diagrams specifying the dialogue between the service user and the service provider. In a service dialogue we observe the input and output history between the service provider and its environment. We assume that only special input is allowed in such a dialogue. In the following we discuss in detail an assertion technique for describing services. It explicitly addresses the partiality of the I/O-functions representing the behavior of services. Since a service is represented by a partial function, we put specific emphasis on characterizing its domain. We discuss two kinds of assertions, input assumptions and output commitments. Input assumptions address the question whether some input conforms to the service dialogue. Since the conformance of input histories to service dialogues may also depend on the previous output history, the input assumptions are predicates on two parameters, which may be surprising for some readers.
Let F ∈ F[I O] be a service and x ∈ H[I] an input history. If there exists an input history x′ ∈ H[I] such that for a time t ∈ N we have x′ ↓ t = x ↓ t and some y′ ∈ F.x′, but there does not exist an output history y ∈ F.x such that y ↓ t = y′ ↓ t, then we may conclude that x ↓ t is a proper input for the output y′ ↓ t, but something in x is not. We define for each time t ∈ N a predicate

At : H[I] × H[O] → B

by the formula

At(x, y) = ∃ x′ ∈ H[I], y′ ∈ F.x′ : x′ ↓ t = x ↓ t ∧ y′ ↓ t = y ↓ t

At(x, y) expresses that after input of x ↓ t that has caused output y ↓ t there still exists a complete behavior of F consistent with these observations. At is called the input assumption at time t. We easily prove for all t ∈ N:

At+1(x, y) ⇒ At(x, y)

In addition to At we define a predicate A : H[I] → B by the formula

A(x) = ∃ y ∈ H[O] : y ∈ F.x

A is called the input assumption. We easily prove for all t ∈ N:

A(x) ⇒ ∃ y ∈ H[O] : At(x, y)

This shows that in the logical sense of implication the predicate A is stronger than all the predicates At. Furthermore for each time t ∈ N we define a predicate

Gt : H[I] × H[O] → B

by the formula

Gt(x, y) = ∃ x′ ∈ H[I], y′ ∈ F.x′ : x′ ↓ t = x ↓ t ∧ y′ ↓ (t + 1) = y ↓ (t + 1)

Gt is called the output commitment at time t. We easily prove for all times t ∈ N:

Gt+1(x, y) ⇒ Gt(x, y)

and also

Gt(x, y) ⇒ At(x, y)

Finally we define the predicate G : H[I] × H[O] → B by the formula

G(x, y) = (y ∈ F.x)

G is called the output commitment. We easily prove for all times t ∈ N:

G(x, y) ⇒ Gt(x, y)

and

G(x, y) ⇒ A(x)

Often we are interested not in deriving the predicates G and A from a given behavior F, but rather in specifying F in terms of A and G. Then we specify At, Gt, A, and G by logical means and define F as follows. In this case we speak of an assumption/commitment specification.
Definition 6 (Assumption/Commitment Specifications) Given the predicates with the functionalities as indicated above, we specify the service function F as follows:

F.x = {y : A(x) ∧ G(x, y)}

and a component F′ by

F′.x = {y : (A(x) ⇒ G(x, y)) ∧ ∀ t ∈ N : At(x, y) ⇒ Gt(x, y)}

In both cases we speak of an assumption/commitment specification of the service F and the component F′, respectively.

F′ is a component. The definition of F′ has been carefully chosen in a way that makes sure that F′ is total and strictly causal. To demonstrate the technique of assumption/commitment specifications we start with a simple example.
Example 7 (StoseV by Assumption/Commitment) We specify a slight variation of Stose called StoseV. The specification for StoseV is rather straightforward. We first specify the data types: type StoIn = Data ∪ {Req} type StoOut = Data
The specification reads as follows:

StoseV
    in   a : StoIn
    out  b : StoOut
    ∀ d ∈ Data : {d}#b = {d}#a
    ∧ Data#b = {Req}#a
    ∧ ∀ t ∈ N : {d}#(b ↓ (t + 1)) ≤ {d}#(a ↓ t)

By logic we obtain the assumption (which does not depend on b)

StoseAs(a) = (Data#a = {Req}#a)

and the commitment

StoseCo(a, b) = ∀ d ∈ Data : {d}#b = {d}#a ∧ Data#b = {Req}#a ∧ ∀ t ∈ N : {d}#(b ↓ (t + 1)) ≤ {d}#(a ↓ t)

This fixes the assumption/commitment specification of StoseV. In contrast to Stose we weakened the assumption: here we can send requests at any time. The assumption requires that eventually there exists a data message for every request and vice versa. In an assumption/commitment specification the assumption A characterizes for which input histories x the set F.x is empty. More precisely, F.x = ∅ if ∀ y : ¬G(x, y). Since G(x, y) ⇒ A(x) holds, we can actually drop A(x) in the service specification; A is useful only to define the service domain.

Theorem 8 (Consistency of assumption/commitment specification)
Let all the definitions be as above. Then F′ is total and strictly causal.
Proof. For every input history x we can construct an output history y ∈ F′.x. We define y inductively by defining y.(k + 1) in terms of y.1, . . . , y.k as follows: y ↓ 0 is the empty history; given y ↓ k we construct y ↓ (k + 1) as follows:
If Ak(x, y) holds then there exists a sequence s = y.(k + 1) such that Gk(x, y) holds; if ¬Ak(x, y) holds then we can choose y.(k + 1) arbitrarily. This construction yields an output history y. We show that F′.x ≠ ∅. We consider three cases. (1) A(x) holds; then by definition there exists y ∈ F.x ⊆ F′.x. (2) ¬A(x) holds; we consider two subcases: (2a) Ak(x, y) and Gk(x, y) hold for all k; then y ∈ F′.x. (2b) ¬Ak(x, y) and Ak′(x, y) and Gk′(x, y) for all k′ < k; again by definition y ∈ F′.x. It remains to show the strict causality of F′: if x ↓ k = z ↓ k, then we can use the same construction as above to construct a history y for x and a history y′ for z. Making the same choices for y.1, . . . , y.(k + 1) and y′.1, . . . , y′.(k + 1) yields histories y and y′ with y ↓ (k + 1) = y′ ↓ (k + 1), y ∈ F′.x and y′ ∈ F′.z.
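To make Definition 6 concrete, here is a toy Python sketch (our own, finite and degenerate setting: "histories" are plain strings and the time-indexed predicates ignore their time argument) that builds the service F and the component F′ from given predicates A, G, At, Gt:

    def ac_service(A, G, inputs, outputs):
        """F.x = {y : A(x) and G(x, y)}"""
        return {x: {y for y in outputs if A(x) and G(x, y)} for x in inputs}

    def ac_component(A, G, At, Gt, inputs, outputs, horizon):
        """F'.x = {y : (A(x) => G(x, y)) and for all t: At(x, y) => Gt(x, y)}"""
        def admissible(x, y):
            if A(x) and not G(x, y):
                return False
            return all(Gt(t, x, y) for t in range(horizon) if At(t, x, y))
        return {x: {y for y in outputs if admissible(x, y)} for x in inputs}

    A = lambda x: x == "good"
    G = lambda x, y: x == "good" and y == "ok"
    At = lambda t, x, y: A(x)
    Gt = lambda t, x, y: y == "ok"

    print(ac_service(A, G, {"good", "bad"}, {"ok", "junk"}))
    # 'good' maps to {'ok'}, 'bad' to the empty set: partial, 'bad' is outside the domain
    print(ac_component(A, G, At, Gt, {"good", "bad"}, {"ok", "junk"}, horizon=3))
    # 'good' maps to {'ok'}, 'bad' to all outputs: total, arbitrary output outside the assumption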
Example 9 (Indexed Access) Assume we define a component for indexed access to data. We use the following two types:

Type InAcc = put(i : Index, d : Data) | get(i : Index) | del(i : Index)
Type OutAcc = out(d : Data) | Fail(i : Index) | ack(i : Index)

It is specified as follows (using the scheme of [Broy and Stølen, 2001]):

IndAcc
    in   x : InAcc
    out  y : OutAcc
    sel(σ0, x̄, ȳ)

Let σ be a mapping (denoting a "state")

σ : Index → Data ∪ {Fail}

where for all i ∈ Index: σ0(i) = Fail
We define sel as the weakest predicate that fulfills the following equations:

sel(σ, ⟨⟩, y) = (y = ⟨⟩)
sel(σ, x, ⟨⟩) = (x = ⟨⟩)
sel(σ, ⟨a⟩ˆx, ⟨b⟩ˆy) = ∃ σ′ : [ sel(σ′, x, y)
    ∧ ∀ i : Index, d : Data :
        (a = put(i, d) ⇒ ((b = Fail(i) ∧ σ′ = σ) ∨ (b = ack(i) ∧ σ′ = σ[i := d])))
        ∧ (a = get(i) ∧ σ[i] ≠ Fail ⇒ σ′ = σ ∧ (b = Fail(i) ∨ b = out(σ[i])))
        ∧ (a = del(i) ⇒ b = ack(i) ∧ σ′ = σ[i := Fail]) ]

where we specify
(σ[i := d])[j] = d       if i = j
(σ[i := d])[j] = σ[j]    otherwise
This specification expresses that the message get(i) must not be sent if σ[i] = Fail. This is part of the assumption. It is an interesting exercise to represent the specifying formula of the function sel by a table:

    a                            b            σ′
    put(i, d)                    Fail(i)      σ
                                 ack(i)       σ[i := d]
    get(i), where σ[i] ≠ Fail    Fail(i)      σ
                                 out(σ[i])    σ
    del(i)                       ack(i)       σ[i := Fail]
This way we get a very clearly structured specification.
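A small Python sketch of this table (ours; Fail is modelled by the absence of an index in the state mapping, and message encodings are our own) might look as follows:

    FAIL = "Fail"

    def step(sigma, msg):
        """One transition of the indexed-access component: returns the set of
        possible (output, successor state) pairs for a single input message."""
        kind, *args = msg
        frozen = lambda s: tuple(sorted(s.items()))   # hashable snapshot of a state
        if kind == "put":
            i, d = args
            return {((FAIL, i), frozen(sigma)),
                    (("ack", i), frozen({**sigma, i: d}))}
        if kind == "get":
            (i,) = args
            # get(i) with sigma[i] = Fail violates the input assumption
            assert i in sigma, "get on an undefined index is outside the service domain"
            return {((FAIL, i), frozen(sigma)),
                    (("out", sigma[i]), frozen(sigma))}
        if kind == "del":
            (i,) = args
            return {(("ack", i), frozen({j: d for j, d in sigma.items() if j != i}))}
        raise ValueError(f"unknown message {msg!r}")

    print(step({}, ("put", "i1", "v1")))
    print(step({"i1": "v1"}, ("get", "i1")))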
Which input is feasible at a certain time point for a service may depend on the previous output, forming the service reaction up to that time point. Given an input history x and an output history y, the function At(x, y) yields true if the input up to time point t is in conformance with the service dialogue, provided the service output history was y ↓ t. Only for a paradoxical service do we obtain A0(x, y) = false for all x and y; this expresses that no input fulfills the service assumption and the service domain is empty. The expression Gt(x, y)
yields true if the output y up to time point t is feasible according to the given service behavior. Finally, the proposition A(x) expresses that the input history x is a feasible input history for the service. Given a feasible input history x, the expression G(x, y) yields true if the output y is correct for input x according to the service specification. As we will show in the following, the notion of partiality and input assumptions is essential for services. Given a service we may turn it into a component by the chaos closure. In this component we allow arbitrary output for those input histories that do not fulfill the assumptions. We define the chaos closure of a service F as follows:

Fchaos.x = {y : (A(x) ⇒ G(x, y)) ∧ ∀ t ∈ N : At(x, y) ⇒ Gt(x, y)}

It turns a service into a component. Fchaos is a refinement of F. In fact it is the least refinement of the service F that is a component. According to its definition, Fchaos is always strongly causal. Note that a naive chaos completion by the formula

Fchaosnaive.x = {y : A(x) ⇒ G(x, y)}

would, in general, lead to a contradiction with the requirement of strict causality. From the chaos closure Fchaos we can reconstruct the service F only under the simple assumption that the formula

At(x, y) ⇒ (∀ y′ : y′ ↓ t = y ↓ t ⇒ Gt(x, y′))

is never a tautology for any input history x. In other words, in the service function F there is no chaotic behavior, which means that every input history x in the service domain actually restricts the output.
Example 10 (The unreliable storage service) A storage service Stose stores Data and returns them upon request. Now we study an unreliable version called Unstose:

Unstose : H[{a}] → H[{b}]

In contrast to the behavior of the service Stose, for Unstose a request for data output may fail:

Type(a) = Data ∪ {Req},
Type(b) = Data ∪ {Fail}
specified by:

Unstose
    in   a : Data ∪ {Req}
    out  b : Data ∪ {Fail}
    {Req}#a = (Data ∪ {Fail})#b
    ∧ ∀ d ∈ Data : {d}#a = {d}#b
    ∧ ∀ t ∈ N : {d}#(a ↓ t) ≥ {d}#(b ↓ (t + 1))
    ∧ ∀ t ∈ N : {Req}#(a ↓ (t + 1)) ≤ (Data#(a ↓ (t + 1))) + {Fail}#(b ↓ t)
The formulas express that only data are returned that have been stored, that data are only returned on request, and finally that all data are requested. However, requests may fail. The specification implicitly expresses fairness (liveness) properties both for the input and for the output histories. We get the assumptions (including the causality property)

At(x, y) ≡ ({Req}#(x.a ↓ (t + 1)) ≤ (Data#(x.a ↓ (t + 1))) + ({Fail}#(y.b ↓ t)))

A(x) ≡ ∃ y ∈ H[{b}] : {Req}#x.a = (Data ∪ {Fail})#y.b
    ∧ ∀ d ∈ Data : {d}#x.a = {d}#y.b
    ∧ ∀ t ∈ N : {Req}#(x.a ↓ (t + 1)) ≤ (Data#(x.a ↓ (t + 1))) + ({Fail}#(y.b ↓ t))

Unstose and also Dom(Unstose) comprise safety and liveness properties. In contrast to Stose, Unstose shows a more sophisticated property, since its input assumption depends on the output: if a request fails it has to be repeated eventually until it is finally successful.

For a consistent service we require a number of healthiness conditions for the specification of services, listed in the following:

there exists at least one feasible input history and a corresponding correct output history (Dom(F) ≠ ∅):
    ∃ x, y : A(x) ∧ G(x, y)

every finite feasible input history can be extended to an infinite feasible input history:
    At(x, y) ⇒ ∃ x′, y′ : x′ ↓ (t + 1) = x ↓ (t + 1) ∧ y′ ↓ (t + 1) = y ↓ (t + 1) ∧ G(x′, y′)

for every feasible input history there exists a correct output history:
    A(x) ⇒ ∃ y : G(x, y)

if there exists an output history y for some input history x, the assumption is fulfilled:
    G(x, y) ⇒ A(x)

If we construct the assertions A and G as described above from a consistent service function with a nonempty domain, all these conditions are valid. Note that the predicates A, G, At, and Gt are only of interest for the component specification, not for the service specification. They can be extracted from a given service specification.
2.1.5 Composition of Components and Services. In this subsection we study the composition of components. Services and components are composed by parallel composition with feedback along the lines of [Broy and Stølen, 2001].

Definition 11 (Composition of Components and Services) Given two service interfaces F1 ∈ F[I1 O1] and F2 ∈ F[I2 O2], we define a composition for the feedback channels C1 ⊆ O1 ∩ I2 and C2 ⊆ O2 ∩ I1 by

F1 [C1 ↔ C2] F2

The component F1 [C1 ↔ C2] F2 is defined as follows (where z ∈ H[I1 ∪ O1 ∪ I2 ∪ O2], x ∈ H[I] with I = (I1 \ C2) ∪ (I2 \ C1)):

(F1 [C1 ↔ C2] F2).x = {z | (O1 \ C1) ∪ (O2 \ C2) : x = z | I ∧ z | O1 ∈ F1(z | I1) ∧ z | O2 ∈ F2(z | I2)}

The channels in C1 ∪ C2 are called internal for the composed system F1 [C1 ↔ C2] F2. The idea of the composition of components and services as defined above is shown in Fig. 3.
Figure 3. Composition F1 [C1 ↔ C2] F2 of Services or Components
In a composed component F1 [C1 ↔ C2] F2 the channels in the channel sets C1 and C2 are used for internal communication. Parallel composition is associative for independent sets of internal channels: as long as (I ∪ O) ∩ (I′ ∪ O′) = ∅ holds, we have

(F1 [I ↔ O] F2) [I′ ↔ O′] F3 = F1 [I ↔ O] (F2 [I′ ↔ O′] F3)

The proof of this equation is straightforward. The set of services and the set of components form, together with composition, an algebra. The composition of components (strictly causal stream functions) yields components and the composition of services yields services. Composition is a partial function on the set of all components and the set of all services; it is only defined if the syntactic interfaces fit together.
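As an illustration only (not from the text), the following Python sketch composes two deterministic, strongly causal components with mutual feedback over a finite horizon; because output at interval t may depend only on input before t, the feedback channels can be computed interval by interval without a fixed-point search. All names are ours.

    def compose_feedback(step1, step2, x1, x2, horizon):
        """step1(x1_prefix, c2_prefix) yields component 1's messages on feedback
        channel c1 at the current interval; step2 analogously produces c2."""
        c1, c2 = [], []
        for t in range(horizon):
            # interval-t outputs are determined by inputs strictly before t
            c1.append(step1(x1[:t], c2[:t]))
            c2.append(step2(x2[:t], c1[:t]))
        return c1, c2

    def forwarder(ext, fb):
        """Emits at interval t the external messages received at interval t-1."""
        return list(ext[-1]) if ext else []

    def echo(ext, fb):
        """Emits at interval t the feedback messages received at interval t-1."""
        return list(fb[-1]) if fb else []

    c1, c2 = compose_feedback(forwarder, echo, [["m1"], ["m2"], []], [[], [], []], horizon=3)
    print(c1)  # [[], ['m1'], ['m2']]
    print(c2)  # [[], [], ['m1']]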
2.1.6 Refinement of Components and Services. An essential notion for components and services is that of refinement. Note that a component is a special case of a service (with either an empty or a total domain).

Definition 12 (Refinement of Components and Services) Given two service interfaces F1 ∈ F[I1 O1] and F2 ∈ F[I2 O2], where I1 ⊆ I2 and O1 ⊆ O2, we call F2 a refinement of F1 if for all input histories x ∈ Dom(F1)

{y | O1 : ∃ x′ : x = x′ | I1 ∧ y ∈ F2.x′} ⊆ F1.x

and

Dom(F1) ⊆ {x′ | I1 : x′ ∈ Dom(F2)}

Then we write

F1 ≈> F2

F2 is called a service refinement or a behavioral refinement of F1. This notion applies both for service and component specifications.

Note that this refinement notion is a slight generalization of the property refinement introduced in [Broy and Stølen, 2001], where we did not allow for the introduction of new channels in a refinement. The refinement relation defines a partial order on the set of services. One service may be the refinement of several quite unrelated services.
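A toy Python check of this refinement relation (ours, for the special case I1 = I2 and O1 = O2, so the channel projections disappear and Definition 12 reduces to a domain condition plus set inclusion) could read:

    def refines(f1, f2):
        """True if f2 is a behavioral refinement of f1 (same channel sets):
        Dom(f1) is contained in Dom(f2) and f2.x is a subset of f1.x on Dom(f1)."""
        for x, allowed in f1.items():
            if not allowed:            # x outside Dom(f1): nothing to require
                continue
            if not f2.get(x):          # domain condition violated
                return False
            if not f2[x] <= allowed:   # behavior condition violated
                return False
        return True

    f1 = {"x1": {"y1", "y2"}, "x2": set()}      # x2 lies outside Dom(f1)
    f2 = {"x1": {"y1"}, "x2": {"y3"}}           # fewer behaviors, larger domain
    print(refines(f1, f2))   # True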
2.2 State Machine Models of Components and Services
In this subsection we introduce a state machine model of components and services. In particular, we discuss the issue of partiality for these state machines, which has been indicated as being essential for services, and also has to be dealt with for state machines.
2.2.1 System States. One common way to model a system and its behavior is to describe it in terms of a state machine with its state space and its state transitions. Each state of a system consists of the states of its internal and external channels and the states of its components. This leads to a local and global state view of the system. We describe the data state of a state transition machine by a set of typed attributes V that can be seen as program variables. A data state is then modeled by a mapping

η : V → ⋃ v∈V type(v)

It is a valuation of the attributes in the set V by values of the corresponding type. By V̄ we denote the set of valuations of the attributes in V; by Σ we denote the set of all valuations. A system that is modeled by a state machine has a local data state. Each local data state is an element of the set V̄, the set of valuations of its attributes. In addition to the channel attributes (see below), we use the attributes of the local states of the systems to refer to the data and control states of a system.
2.2.2 State Machine Model: State Transitions. Often a component is described in a well-understandable and sufficiently abstract way by a state transition machine with input and output. A state transition is one step of a system execution leading from a given state to a new state. In each transition the state machine consumes a sequence of messages on each of its input channels and produces a sequence of messages on each of its output channels. By Σ we denote the set of all states. A state machine (∆, Λ) with input and output channels according to the syntactic interface (I O) is given by a set

Λ ⊆ Σ × (O → M*)

of pairs of initial states and initial output sequences, as well as a state transition function

∆ : (Σ × (I → M*)) → ℘(Σ × (O → M*))

For each state σ ∈ Σ and each valuation u : I → M* of the input channels in the set I by sequences of messages of the required types, we obtain with every pair (σ′, s) ∈ ∆(σ, u) a successor state σ′ and a valuation s : O → M* of the output channels consisting of the sequences produced by the state transition.
A state machine is called partial, if the set ∆(σ, u) is the empty set for certain (reachable) states σ and certain input sequences. By SM[I O] we denote the set of all state machines with input channels I and output channels O. By SM we denote the set of all state machines.
Example 13 (Description of a Service by a State Machine) As a basic example we consider a simple account service. Initially the account is closed and its balance is 0. An account can be opened, amounts can be added to it, it can be blocked (using a password psw that allows it to be freed again), freed again by supplying the password, and it can be closed. Fig. 4 gives the state machine by a state machine diagram. We work with a state machine with two attributes: the attribute a denotes a number which represents the amount stored in the account, and p represents a password. The state machine is highly partial. For instance, the first input has to be the message open; otherwise, the set of successor states is empty.
{a := 0}/−
open/ack{a := 0}
closed
block(psw ( )/ack{p := psw}
open
close/ack
Figure 4.
{p = psw}free } ( (psw )/rej
blocked
free(p ( )/ack
Example of a Service Description by a Partial State Machine
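A small Python rendering of this partial state machine (ours; the tuple (control, a, p) is our encoding of the diagram's control state and attributes) might be:

    def account_step(state, msg):
        """One transition: returns the set of (output, successor state) pairs;
        the empty set models partiality (no transition for this state/input)."""
        control, a, p = state
        kind, *args = msg
        if control == "closed" and kind == "open":
            return {("ack", ("open", 0, p))}
        if control == "open":
            if kind == "set":
                (n,) = args
                return {(("val", a + n), ("open", a + n, p))}
            if kind == "close":
                return {("ack", ("closed", a, p))}
            if kind == "block":
                (psw,) = args
                return {("ack", ("blocked", a, psw))}
        if control == "blocked" and kind == "free":
            (psw,) = args
            if psw == p:
                return {("ack", ("open", a, p))}
            return {("rej", ("blocked", a, p))}
        return set()   # partial: input not allowed in this state

    initial = ("closed", 0, None)
    print(account_step(initial, ("set", 10)))   # set(): outside the service domain
    print(account_step(initial, ("open",)))     # {('ack', ('open', 0, None))}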
Note that it is more difficult to give a state machine description for components with complex liveness properties as found for instance in the example of a storage service in the previous subsections. In the following section we show how to derive the canonical service interface function from a state machine.
3. Layers and Layered Architectures
In this section we introduce the notion of a service layer and that of a layered architecture based on the idea of a component interface and that of a service. Roughly speaking a layered software architecture is a family of components forming layers in a component hierarchy. Each layer defines an upper interface called the export interface and makes use of a lower interface called the import interface.
Figure 5. Example of a Refined Service Description by a Partial State Machine (as Figure 4, with the additional transition set(n)/rej in the control state blocked)

3.1 Service Layers
In this subsection we introduce the notion of a service layer. A service layer is a service with a syntactic interface structured into two (or more) complementary sub-interfaces. Of course, one might consider not only two but many separate sub-interfaces for components – however, considering two interfaces is enough to discuss most of the interesting issues of layers.
3.1.1 Service Users and Service Providers. In practical applications, services are often structured into service providers and service users. Formally, however, the service provider and the service user are communicating units of interaction. So what is the precise difference between a service provider F ∈ F[I O] and a service user G ∈ F[O I]? A service user G is, seen from the perspective of the service provider, in general highly nondeterministic. The service user G can, in general, use the service in many different ways. It only has to follow the rules of the service access protocol, making sure that the service input history that it issues is in the service domain. By using the service F according to the service user G we get two histories x and y related by the formula

y ∈ F.x

Thus the most general user G of the service F is obviously

G.y = {x : y ∈ F.x}

In other words, for each service output y the user may use all the service inputs that result in the output y for service F. A more specific user therefore is given by a refinement G′ of G. It may use F only in a restricted form, but it has to be able to accept all output histories generated by the service provider F.
Thus we require the following relationships between the domains and ranges of the service provider F and the service user G:

Ran(G) ⊆ Dom(F)
{y : ∃ x : y ∈ F.x ∧ x ∈ Ran(G)} ⊆ Dom(G)

The second formula means that the service user is prepared to handle every output of the service provider produced as reaction on input of G. Given a service interface F ∈ F[I O], a subset of the input channels I′ ⊆ I, and a subset of the output channels O′ ⊆ O, we can splice the service F into a service with the syntactic interface (I′ O′) as follows:
Definition 14 (Splicing) Let F ∈ F[I O], a subset of the input channels I′ ⊆ I, and a subset of the output channels O′ ⊆ O be given; we define a service function F′ ∈ F[I′ O′], called the splicing of F to the syntactic interface (I′ O′), by the specification

F′.x = {y | O′ : ∃ x′ : x = x′ | I′ ∧ y ∈ F.x′}

Splicing derives a sub-interface from a given service. It is an abstraction of F. We denote F′ in this case also by F † (I′ O′).

An easy proof shows that the behavior obtained by the splicing of a service F into a service F′ is strongly causal again, due to the causality of F, and thus F′ is again a service provided F is a service. Given the sub-interfaces I′ ⊆ I and O′ ⊆ O we may construct two slices of the service F, namely

F † (I′ O′)  as well as  F † ((I \ I′) (O \ O′))

Note, however, that we cannot reconstruct the service F, in general, from these two slices, since by slicing we lose the dependencies between the input histories of I′ and I \ I′ and the output histories of O \ O′ and O′, respectively. The splices provide only partial views of the behavior modeled by F.
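In a finite setting, splicing is just a projection of input and output histories onto channel subsets; the following Python sketch (ours, with histories encoded as tuples of (channel, stream) pairs) illustrates Definition 14:

    def restrict(history, channels):
        """z | C: keep only the named channels of a history (given as a dict)."""
        return tuple(sorted((c, history[c]) for c in channels))

    def splice(f, i_sub, o_sub):
        """F dagger (I' O'): project every input/output pair of a finite service table."""
        spliced = {}
        for x, ys in f.items():
            key = restrict(dict(x), i_sub)
            spliced.setdefault(key, set())
            spliced[key] |= {restrict(dict(y), o_sub) for y in ys}
        return spliced

    f = {(("a", "s1"), ("b", "t1")): {(("c", "u1"), ("d", "v1"))},
         (("a", "s1"), ("b", "t2")): {(("c", "u2"), ("d", "v2"))}}
    print(splice(f, ["a"], ["c"]))
    # {(('a', 's1'),): {(('c', 'u1'),), (('c', 'u2'),)}}  -- dependencies on b and d are lost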
3.1.2 Service Layers. A layer is a service with (at least) two syntactic interfaces as shown in Fig. 6. So far, the structuring provides only additional information but does not affect the idea of a service behavior.
Figure 6. Service Layer
Therefore all the notions introduced for services apply also for service layers.
Definition 15 (Service Layer) Given two syntactic service interfaces (I O) and (O′ I′), where we assume I ∩ O′ = ∅ and O ∩ I′ = ∅, the behavior of a service layer L is represented by a service interface

L ∈ F[I ∪ O′  O ∪ I′]

For the service layer the first syntactic service interface is called the syntactic upward interface and the second one is called the syntactic downward interface. The syntactic service layer interface is denoted by (I O/O′ I′).

We denote the set of layers by L[I O/O′ I′]. The idea of a service layer interface is well illustrated by Fig. 6. From a behavioral view a service layer is itself nothing but a service, with its syntactic interface divided into an upper and a lower part.
3.1.3 Composition of Service Layers. A service layer can be composed with a given service such that it provides an upper "export" service. Given a service interface F′ ∈ F[I′ O′], called the import service, and a service layer L ∈ L[I O/O′ I′], we define their composition by the term

L [I′ ↔ O′] F′

This term corresponds to the small system architecture shown in Fig. 7. We call the layered architecture correct with respect to the export service F ∈ F[I O] for a provided import service F′ if the following equation holds:

F = L [I′ ↔ O′] F′
Figure 7. Layered Architecture Formed of a Service and a Service Layer
The idea of the composition of two layers with services is graphically illustrated in Fig. 8.
Figure 8. Service Layer Composed of Two Service Layers
This is the parallel composition introduced before. But now we work with a structured view onto the two interfaces.
Figure 9. Layered Architecture
We may also compose two given service layers L ∈ L[I O/O′ I′] and L′ ∈ L[I′ O′/O′′ I′′] into the layer

L [I′ ↔ O′] L′

This term denotes a layer in L[I O/O′′ I′′]. The composition of two layers is illustrated in Fig. 8. If we iterate the idea of service layers, we get hierarchies of layers, also called layered architectures, as shown in Fig. 9. As Figure 7 shows, there are three services involved in a layer pattern for the service layer L:

The import service F′ ∈ F[I′ O′].

The export service F ∈ F[I O] that results from composing the layer with the import service: F = L [I′ ↔ O′] F′.

The downward service G ∈ F[O′ I′], which is obtained by projecting the layer to the import interface: G = L † (O′ I′).

The downward service G is the service "offered" by L to the downward layer; it uses the import service provided by F′. We assume that all inputs to the downward service are within its service domain.
The idea of a stack of layered architectures is illustrated in Fig. 9. It is characterized best by the family of export services Fj ∈ F[Ij Oj] for 0 ≤ j ≤ n. We get for each layer Lj+1 ∈ L[Ij+1 Oj+1/Oj Ij] the following properties:

The export service Fj+1 ∈ F[Ij+1 Oj+1] is given by Fj+1 = Lj+1 [Ij ↔ Oj] Fj.

Its import service is Fj ∈ F[Ij Oj].

The downward service Gj ∈ F[Oj Ij] is given by Gj = Lj+1 † (Oj Ij).

In the following we deal with the interaction between layers of layered architectures. In particular, we study the specification of service layers.
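The equation Fj+1 = Lj+1 [Ij ↔ Oj] Fj suggests a drastically simplified functional reading, in which a layer is a higher-order function that turns an import service into an export service and a layered architecture is a fold over the stack. The following Python sketch (ours; it abstracts streams away into single request/response calls, so it is only an analogy to the stream model) illustrates this:

    from functools import reduce

    def make_layer(encode, decode):
        """A protocol layer: encode requests for the layer below, decode its answers."""
        def layer(import_service):
            def export_service(request):
                return decode(import_service(encode(request)))
            return export_service
        return layer

    # Two toy layers: one frames the payload, one adds and strips a checksum field.
    frame = make_layer(lambda r: {"payload": r}, lambda a: a["payload"])
    checksum = make_layer(lambda r: {**r, "crc": hash(str(r))},
                          lambda a: {k: v for k, v in a.items() if k != "crc"})

    base_service = lambda msg: msg   # import service of the lowest layer: a plain echo

    # F1 = checksum(F0), F2 = frame(F1): fold the stack bottom-up.
    export = reduce(lambda service, layer: layer(service), [checksum, frame], base_service)
    print(export("hello"))           # 'hello' travels down and back up through both layers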
3.2 Specifying Service Layers
3.2.1 Characterizing Layers by their Import and Export Services. The idea of a layer is characterized best as follows: a service layer L ∈ L[I O/O′ I′] offers an export service F = L [I′ ↔ O′] F′, provided an adequate import service F′ ∈ F[I′ O′] is available. In general, a layer shows a sensible behavior only for a small set of import services F′. Therefore the idea of a layer is best communicated by the characterization and the specification of its required import and its provided export services. Note, however, that a layer L ∈ L[I O/O′ I′] is not uniquely characterized by a specification of its import and export service. In fact, given two services, an import service F′ ∈ F[I′ O′] and an export service F ∈ F[I O], there exist, in general, many layers L ∈ L[I O/O′ I′] such that the following equation holds:

F = L [I′ ↔ O′] F′

In the extreme, the layer L is never forced to actually make use of its import service. It may never send any messages to F′ but realize this service by itself internally. This freedom to use an import service or not changes for two- or multi-SAP layers (SAP = service access point) that support communication. We come back to this issue.
3.2.2 Matching Services. Figure 7 shows that there are three services involved in a layer specification pattern for the layer L ∈ L[I O/O′ I′]:

The import service F′ ∈ F[I′ O′].

The export service F ∈ F[I O] with F = L [I′ ↔ O′] F′.

The downward service G ∈ F[O′ I′] with G = L † (O′ I′).

If we compose two service interfaces, for instance when composing two layers as shown in Fig. 8, we have two syntactically corresponding services F′ ∈ F[I′ O′] and G ∈ F[O′ I′]. If we compose the two services, we get a set of interaction histories S ⊆ H(I′ ∪ O′) as follows:

S = {z ∈ H(I′ ∪ O′) : z | I′ ∈ G(z | O′) ∧ z | O′ ∈ F′(z | I′)}

We call the two services F′ and G matching if

S | O′ ⊆ Dom(G)  and  S | I′ ⊆ Dom(F′)

In other words, all output histories produced by the downward service G are required to be in the domain of the service F′, and all output histories produced by F′ are required to be in the domain of G. In fact, in general, not all input histories in the domain of F′ and of G actually occur in S. However, that either F′ or G produces output histories in S that are not in the domain of its corresponding service is seen as a design error. Note that there is a symmetry between the services F′ and G: we cannot actually say that the service F′ uses the service G or that the service G uses F′. This symmetry is broken in the case of import and export services, however, as follows. To explain this asymmetry we look again at the question whether there is a difference between offering a service for usage (which is the role of an export service) and the idea of using a service (which is the role of the downward service). In fact, if we introduce an asymmetry by stating that the downward service G uses the import service F′, we require the following conditions.

The downward service G uses the import service F′. Thus

Ran(G) ⊆ Dom(F′)      (∗)

is required. Vice versa, all the output produced by F′ on input from G is required to be in the domain of G:

{y ∈ F′.x : x ∈ Ran(G)} ⊆ Dom(G)

By this requirement we break the symmetry between the imported service and the downward service. We do not describe the downward service G but rather the import service.
As noted before, the requirement (∗) is, in general, stronger than needed. If G does not use its whole range within the domain of F′, due to the fact that F′ does not use the whole domain of G, then we can weaken the requirement Ran(G) ⊆ Dom(F′).
3.3 Export/Import Specifications of Layers
Typically not all input histories are suitable for accessing a service. Only those that are within the service domain and thus fulfill certain service assumptions lead to a well-controlled behavior. This suggests the usage of assumption/commitment specifications for services as introduced above. The specification of layers is based on the specification of services. A layer can be seen as a bridge between two services. In a layered architecture a layer exhibits several interfaces: the upward interface, also called the export service interface, and the downward interface, the converse of which is also called the import service interface. More precisely, the upward (export) interface is a function of the downward (import) interface and vice versa. From a methodological point of view we work according to the following idea: the upward service interface corresponds to the service interface specification, provided the downward service interface requirements are fulfilled. For the export and the import service interface we assume another form of an assumption/commitment specification. In particular, in such a specification we do not force a layer to actually make use of the import interface; it can make use of the interface but it does not need to. This is different for double layered architectures (see later). If we specify the interaction at the interface between two layers by an interaction interface, we obtain another form of a specification of a layered architecture. The interaction interface between two layers has to fulfill certain rules and show certain properties. These rules induce specifications for the upper and the lower level. Since a layer is, strictly speaking, a service with a more structured syntactic interface, the techniques of assumption/commitment specifications can immediately be transferred to this situation. Each interaction between adjacent layers is completely separated from the layer interactions above or below. This allows an independent specification and implementation. In other words, to understand the downward interface of a layer L we have only to study the service L † (O′ I′). We do not have
to take into account the rather complex service L † (I O). The relationship between the export service with syntactic interface (I O) and the downward service L † (O′ I′) is the responsibility of the layer. In a requirement specification of a layer we do not want to describe all behaviors of a layer, but only those that fit into the specific scheme of interactions, and thus we see the layer as a service. We are, in particular, interested in the specification of the behavioral relationship between the layer and its downward layer. There are three principal techniques to specify these aspects of a layer:

We specify the interaction interface S ⊆ H(I′ ∪ O′) between the layer and its downward service.

We specify the layer L ∈ L[I O/O′ I′] indirectly by specifying the export service F ∈ F[I O] and the import service F′ ∈ F[I′ O′] such that F ≈> L [I′ ↔ O′] F′ holds.

We specify the layer L ∈ L[I O/O′ I′] as a service FL ∈ F[I ∪ O′  O ∪ I′].

All three techniques work in principle and are related. However, the second one seems, from a methodological point of view, the most promising. In particular, to specify a layered architecture, we only have to specify for each layer the export service. An interesting and crucial question is the methodological difference we make between the two services associated with a layer, the export service and the downward service.
4. More on Layered Architectures
In this section we apply our approach of services and layered architectures to telecommunication applications. We deal with two classes of layered architectures. In telecommunication a sub-interface of a service, of a component, or of a system is called a service access point (SAP). Note that our layers so far had only one SAP (the export service).
4.1 Double Layered Architectures
In telecommunication layered architectures are also used. The ISO/OSI layered protocols provide typical examples. For them there are at least two (or actually many) service interfaces, for instance one of a sender and one of a receiver. We speak of double layered architectures.
4.1.1 Double SAP Services. The idea of a double service interface is well illustrated by Fig. 10. It shows a unit with two separated service interfaces (SAPs). Formally it is again a layer. But in contrast to layers, the two
service interfaces have the same role. There is no distinction between the two SAPs into an import and an export interface. Rather we have two simultaneous interfaces.
Figure 10. Double SAP Service
From a behavioral point of view a double service D ∈ L[I O/O′ I′] is formally a service layer, where its syntactic interface is divided into a left and a right part instead of an upper and a lower part. In contrast to layers, which are best specified by their import and their export services, we describe both SAPs by their offered services. So we describe the communication component by two export services or, to express how they relate, more precisely as one joint service.
4.1.2 Layers of Double Services. In fact, we can now associate a stack of layers with each of the service interfaces. This leads to a double layered architecture.
Figure 11. Doubled Layered Architecture and Service Layer
A service layer can be composed with a service to provide an upper service. Given two service layers L and L′, one for each SAP, and a double service D with matching syntactic interfaces, we construct a composed system, called a layered communication architecture:

L [I ↔ O] D [I′ ↔ O′] L′

This idea is illustrated in Fig. 11. As before for layers, we can iterate the layering for a double layered architecture as illustrated in Fig. 12. We obtain a diagram very similar to the ISO/OSI protocol hierarchy.
Figure 12. Doubled Layered Architecture and Service Layer
In principle, we can use all our techniques for describing and specifying these layers; there is no essential difference between the layers in layered architectures and those in communication architectures.
4.2 Layers as Refinement
We can see a layer also as a refinement. In this case the layer only refines the input and output histories of a service. This is explained in more detail in the following.
4.2.1 Refinement of Services. For each syntactic interface a special layer is the identity (or, more precisely, the identity modulo time delay). For each
syntactic interface of a layer where the syntactic interfaces of the export and the import services coincide we get an identity.
Definition 16 (Identity Layer) Given the syntactic service interface (I O), the syntactic service layer interface of the identity is denoted by (I O/I O); it is represented by a service interface in F[I ∪ O  O ∪ I]: Id(I ∪ O  I ∪ O) ∈ L[I O/O I] is the service with

Id(x ⊕ y) = {x ⊕ y}

A layer L is called an identity modulo time if, for all x ∈ H[I] and y ∈ H[O], L(x ⊕ y) equals {x ⊕ y} up to time shift.

For each service F ∈ F[I O] we get the equation

Id(I ∪ O  I ∪ O) [I ↔ O] F = F

and for any layer L ∈ L[I O/O′ I′] we get the equation

Id(I ∪ O  I ∪ O) [I ↔ O] L = L

These rules are quite straightforward. They can be generalized to identities modulo time. The more significant issue for identity is the definition of refinement pairs.
Definition 17 (Refinement Pairs) Two layers L ∈ L[I O/O′ I′] and L′ ∈ L[I′ O′/O I] are called a refinement pair for (I O/O′ I′) if

L [I′ ↔ O′] L′ = Id(I ∪ O  I ∪ O)

In this case both L and L′ only change the representation of their input and output histories, but let all the information through.

By the idea of a refinement pair we can easily describe what it means for a system layer to forward information. A component E ∈ F(C C) is called an equivalence relation if for all x, y, z ∈ H[C]:

x ∈ E.x
x ∈ E.y ⇒ y ∈ E.x
x ∈ E.y ∧ y ∈ E.z ⇒ x ∈ E.z

A component layer L is called faithful if there exists a layer L′ such that (from now on we simply write Id as long as the channels and their types are obvious)

L [I′ ↔ O′] L′ = Id
Given a layer L and a faithful layer L′ such that E = L [I′ ↔ O′] L′ is an equivalence, we call L forgetful if for some input history x the set E.x has more than one element.
Theorem 18 If L is faithful then L is not forgetful.

Proof. Assume L is faithful; then for every L′ that is faithful there exists L′′ with

L′′ [O′ ↔ I′] L′ = Id

and there exists L′′′ such that L [I′ ↔ O′] L′′′ = Id. Thus L [I′ ↔ O′] L′ = E implies

(E [O′ ↔ I′] L′′) [I′ ↔ O′] L′′′ = (L [I′ ↔ O′] (L′ [O′ ↔ I′] L′′)) [I′ ↔ O′] L′′′ = (L [I′ ↔ O′] Id) [I′ ↔ O′] L′′′ = L [I′ ↔ O′] L′′′ = Id

which implies that E is the identity.

Thus, given a layer L and a faithful layer L′ with

E = L [I′ ↔ O′] L′

we obtain the following observation. Being an equivalence relation, E gives a measure of the information forwarded by L: if two input histories x, x′ ∈ H[I ∪ O′] are equivalent, i.e. x′ ∈ E.x, then the information that they are different is not forwarded.
Theorem 19 Let L be a layer and let L′ and L′′ be faithful; if E = L [I′ ↔ O′] L′ and E′ = L [I′ ↔ O′] L′′ are equivalence relations, then E = E′.

This theorem shows that the equivalence relations provide a uniform notion of information propagation.
This idea also allows us to discuss “how deep” some information penetrates into an architecture since each information aspect can be measured as the delta between two histories and by the question how deep this delta is propagated into the layers of the architecture. In the same way we may ask at which layers new information is generated.
4.2.2 Service Equivalence and Abstract Service Refinement. Based on the concept of faithful layers we may define service equivalence and abstract service refinement. Two given services F and F′ are called faithfully equivalent if there is a faithful layer L such that F = L [I′ ↔ O′] F′. Here L can be viewed as a connector to service F′.
5. Summary and Outlook
Why did we present this quite theoretical setting of mathematical models of services, layers, layered architectures and relations between them? First of all, we want to show how rich and flexible the tool kit of mathematical models is and how far we are in integrating and relating them within the context of software design questions. In our case the usage of streams and stream processing functions is the reason for the remarkable flexibility of our model toolkit and the simplicity of the integration. Second, we are interested in a simple and basic model of a service and a layer, just strong and rich enough to capture all relevant notions of architectures and interfaces. Software development is a difficult and complex engineering task. It would be very surprising if such a task could be carried out properly without a proper theoretical framework. It would at the same time be quite surprising if a purely scientific theoretical framework would be enough and directly the right approach for the practical engineer. The result has to be, as we have argued, a combination of formal techniques and an appropriate theory on one side and intuitive notations based on diagrams on the other. Work is needed along those lines, including experiments and feedback from practical applications. But as our examples and experiments already show, a lot can be gained that way.
Acknowledgements
It is a pleasure to thank Ingolf Krüger, Andreas Rausch, Michael Meisinger, and Bernhard Rumpe for stimulating discussions and helpful remarks on draft versions of the manuscript.
References

[Baeten and Bergstra, 1992] Baeten, J. and Bergstra, J. (1992). Process algebras with signals and conditions. In Broy, M., editor, Programming and Mathematical Method, volume 88 of NATO ASI Series, Series F: Computer and System Sciences, pages 273–324. Springer.
[Berry and Gonthier, 1988] Berry, G. and Gonthier, G. (1988). The Esterel synchronous programming language: Design, semantics, implementation. Research Report 842, INRIA.
[Booch, 1991] Booch, G. (1991). Object Oriented Design with Applications. Benjamin Cummings, Redwood City, CA.
[Booch et al., 1997] Booch, G., Rumbaugh, J., and Jacobson, I. (1997). The unified modeling language for object-oriented development, version 1.0. Technical report, RATIONAL Software Cooperation.
[Broy, 1991] Broy, M. (1991). Towards a formal foundation of the specification and description language SDL. Formal Aspects of Computing, 3:21–57.
[Broy, 1997] Broy, M. (1997). Refinement of time. In Bertran, M. and Rus, T., editors, Transformation-Based Reactive System Development. ARTS'97, volume 1231 of Lecture Notes in Computer Science, pages 44–63. To appear in TCS.
[Broy, 1998] Broy, M. (1998). A functional rephrasing of the assumption/commitment specification style. Formal Methods in System Design, 13(1):87–119.
[Broy et al., 1993] Broy, M., Facchi, C., Hettler, R., Hußmann, H., Nazareth, D., Regensburger, F., Slotosch, O., and Stølen, K. (1993). The requirement and design specification language Spectrum. An informal introduction. Version 1.0. Part I/II. Technical Report TUM-I9311 / TUM-I9312, Technische Universität München, Institut für Informatik.
[Broy et al., 1997] Broy, M., Hofmann, C., Krüger, I., and Schmidt, M. (1997). A graphical description technique for communication in software architectures. Technical Report TUM-I9705, Technische Universität München, Institut für Informatik. URL: http://www4.informatik.tu-muenchen.de/reports/TUM-I9705. Also in: Joint 1997 Asia Pacific Software Engineering Conference and International Computer Science Conference (APSEC'97/ICSC'97).
[Broy and Krüger, 1998] Broy, M. and Krüger, I. (1998). Interaction interfaces – towards a scientific foundation of a methodological usage of message sequence charts. In Staples, J., Hinchey, M., and Liu, S., editors, Formal Engineering Methods, pages 2–15, Brisbane. IEEE Computer Society.
[Broy and Stølen, 2001] Broy, M. and Stølen, K. (2001). Specification and Development of Interactive Systems: Focus on Streams, Interfaces, and Refinement. Springer.
[Harel, 1987] Harel, D. (1987). Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8:231–274.
[Herzberg and Broy, ] Herzberg, D. and Broy, M. Modelling layered distributed communication systems. To appear.
[Hettler, 1994] Hettler, R. (1994). Zur Übersetzung von E/R-Schemata nach Spectrum. Technischer Bericht TUM-I9409, TU München.
[Hinkel, 1998] Hinkel, U. (1998). Formale, semantische Fundierung und eine darauf abgestützte Verifikationsmethode für SDL. Dissertation, Fakultät für Informatik, Technische Universität München.
[Hoare, 1985] Hoare, C. (1985). Communicating Sequential Processes. Prentice Hall.
[Hoare et al., 1981] Hoare, C., Brookes, S., and Roscoe, A. (1981). A theory of communicating sequential processes. Technical Monograph PRG-21, Oxford University Computing Laboratory, Programming Research Group, Oxford.
[Jacobsen, 1992] Jacobsen, I. (1992). Object-Oriented Software Engineering. Addison-Wesley, ACM Press.
[Kahn, 1974] Kahn, G. (1974). The semantics of a simple language for parallel processing. In Rosenfeld, J., editor, Information Processing 74. Proc. of the IFIP Congress 74, pages 471–475, Amsterdam. North Holland.
[Krüger et al., 1999] Krüger, I., Grosu, R., Scholz, P., and Broy, M. (1999). From MSCs to statecharts. In Proceedings of DIPES'98. Kluwer.
[Milner, 1980] Milner, R. (1980). A Calculus of Communicating Systems, volume 92 of Lecture Notes in Computer Science. Springer.
[MSC, 1993] MSC (1993). Criteria for the Use and Applicability of Formal Description Techniques. Recommendation Z.120, Message Sequence Chart (MSC). ITU-T (previously CCITT). 35 pages.
[MSC, 1995] MSC (1995). Recommendation Z.120, Annex B: Algebraic Semantics of Message Sequence Charts. ITU-Telecommunication Standardization Sector, Geneva, Switzerland.
[Müller and Scholz, 1997] Müller, O. and Scholz, P. (1997). Functional specification of real-time and hybrid systems. In HART'97, Proc. of the 1st Int. Workshop on Hybrid and Real-Time Systems, volume 1201 of Lecture Notes in Computer Science, pages 273–286.
[Park, 1980] Park, D. (1980). On the semantics of fair parallelism. In Bjørner, D., editor, Abstract Software Specification, volume 86 of Lecture Notes in Computer Science, pages 504–526. Springer.
[Park, 1983] Park, D. (1983). The "fairness" problem and nondeterministic computing networks. In Proc. 4th Foundations of Computer Science, volume 159 of Mathematical Centre Tracts, pages 133–161. Mathematisch Centrum Amsterdam.
[Parnas, 1972] Parnas, D. (1972). On the criteria to be used to decompose systems into modules. Comm. ACM, 15:1053–1058.
[Rumbaugh, 1991] Rumbaugh, J. (1991). Object-Oriented Modeling and Design. Prentice Hall, Englewood Cliffs, New Jersey.
[Rumpe, 1996] Rumpe, B. (1996). Formale Methodik des Entwurfs verteilter objektorientierter Systeme. Ph.D. thesis, Technische Universität München, Fakultät für Informatik. Published by Herbert Utz Verlag.
[SDL, 1988] SDL (1988). Specification and Description Language (SDL), Recommendation Z.100. CCITT. Technical Report.
[Selic et al., 1994] Selic, B., Gullekson, G., and Ward, P. (1994). Real-time Object-Oriented Modeling. Wiley, New York.
[Zave and Jackson, 1997] Zave, P. and Jackson, M. (1997). Four dark corners of requirements engineering. ACM Transactions on Software Engineering and Methodology.
INTERFACE-BASED DESIGN

Luca de Alfaro
UC Santa Cruz, California
[email protected]

Thomas A. Henzinger
EPFL, Switzerland, and UC Berkeley, California
tah@epfl.ch
Abstract
Surveying results from [5] and [6], we motivate and introduce the theory behind formalizing rich interfaces for software and hardware components. Rich interfaces specify the protocol aspects of component interaction. Their formalization, called interface automata, permits a compiler to check the compatibility of component interaction protocols. Interface automata support incremental design and independent implementability. Incremental design means that the compatibility checking of interfaces can proceed for partial system descriptions, without knowing the interfaces of all components. Independent implementability means that compatible interfaces can be refined separately, while still maintaining compatibility.
Keywords:
Software engineering, formal methods, component-based design.
Introduction

Interfaces play a central role in the component-based design of software and hardware systems. We say that two or more components are compatible if they work together properly. Good interface design is based on two principles. First, an interface should expose enough information about a component as to make it possible to predict if two or more components are compatible by looking only at their interfaces. Second, an interface should not expose more information about a component than is required by the first principle. The technical realization of these principles depends, of course, on what it means for two or more components to "work together properly." A simple interpretation is offered by typed programming languages: a component that implements a function and a component that calls the function are compatible
if the function definition and the function call agree on the number, order, and types of the parameters. We discuss richer notions of compatibility, which specify, in addition to type information, also protocol information about how a component must be used. For example, the interface of a file server with the two methods open-file and read-file may stipulate that the method read-file must not be called before the method open-file has been called. Symmetrically, the interface of a client specifies the possible behaviors of the client in terms of which orderings of open-file and read-file calls may occur during its execution. Given such server and client interfaces, a compiler can check statically if the server and the client fit together. Interfaces that expose protocol information about component interaction can be specified naturally in an automaton-based language [5]. In this article, we give a tutorial introduction to such interface automata.
Interface Languages

We begin by introducing two requirements on interface languages. An interface language should support incremental design and independent implementability. With each interface language we present, we will verify that both of these requirements are met.
Incremental design. A component is typically an open system, i.e., it has some free inputs, which are provided by other components. Incremental design is supported if we can check the compatibility of two or more component interfaces without specifying interfaces for all components, i.e., without closing the system. The unspecified component interfaces may later be added one by one, as long as throughout the process, the set of specified interfaces stays compatible. More precisely, the property of incremental design requires that if the interfaces in a set F (representing the complete, closed design) are compatible, then the interfaces in every subset G ⊆ F (representing a partial, open design) are compatible. This yields an existential interpretation of interface compatibility: the interfaces in an open set G of interfaces (i.e., a set with free inputs) are compatible if there exists an interface E (representing an environment that provides all free inputs to the interfaces in G) such that the interfaces in the closed set G ∪ {E} (without free inputs) are compatible.1 Incremental design suggests that we model compatibility as a symmetric binary relation ∼ between interfaces, and composition as a binary partial function || on interfaces. If two interfaces F and G are compatible, that is, F ∼ G, then F ||G is defined and denotes the resulting composite interface. Now the property of incremental design reads as follows: For all interfaces F , G, H, and I, if F ∼ G and H ∼ I and F ||G ∼ H||I, then F ∼ H and G ∼ I and F ||H ∼ G||I.
This property ensures that the compatible components of a system can be put together in any order.2
Independent implementability. Recall the first principle of interface design, namely, that the information contained in interfaces should suffice to check if two or more components are compatible. This principle can be formalized as follows: if F and G are compatible interfaces, and F′ is a component that conforms to interface F, and G′ is a component that conforms to interface G, then F′ and G′ are compatible components, and moreover, the composition F′||G′ of the two components conforms to the composite interface F||G. We call this the property of independent implementability, because it enables the outsourcing of the implementation of the interfaces F and G to two different vendors: as long as the vendors conform to the provided interfaces F and G, respectively, their products will fit together, even if the vendors do not communicate with each other. For simplicity, in this article we gloss over the differences between interfaces and components, and express both in the same language; that is, we consider components to be simply more detailed interfaces.3 For this purpose, we use a refinement preorder between interfaces: if F′ ⪯ F, then the interface F′ refines the interface F. An interface may be refined into an implementation in several steps. As the refinement relation is a preorder, it is transitive. The property of independent implementability reads as follows: For all interfaces F, F′, G, and G′, if F ∼ G and F′ ⪯ F and G′ ⪯ G, then F′ ∼ G′ and F′||G′ ⪯ F||G. This property ensures that compatible interfaces can always be refined separately.4
Assume/Guarantee Interfaces

We illustrate the properties of incremental design and independent implementability through a simple, stateless interface language called assume/guarantee (A/G, for short) interfaces [6]. Assume/guarantee interfaces have input and output variables. An A/G interface puts a constraint on the environment through a predicate φ^I on its input variables: the environment is expected to provide inputs that satisfy φ^I. In return, the interface communicates to the environment a constraint φ^O on its output variables: it vouches to provide only outputs that satisfy φ^O. In other words, the input assumption φ^I represents a precondition, and the output guarantee φ^O a postcondition.
Definition 1 An A/G interface F = ⟨X^I, X^O, φ^I, φ^O⟩ consists of

two disjoint sets X^I and X^O of input and output variables;
a satisfiable predicate φ^I over X^I, called the input assumption;
a satisfiable predicate φ^O over X^O, called the output guarantee.

Note that input assumptions, like output guarantees, are required to be satisfiable, not valid. An input assumption is satisfiable if it can be met by some environment. Hence, for every A/G interface there is a context in which it can be used. On the other hand, in general not all environments will satisfy the input assumption; that is, the interface puts a constraint on the environment.
Example 2 A division component with two inputs x and y, and an output z, might have an A/G interface with the input assumption y ≠ 0 and the output guarantee true (which is trivially satisfied by all output values). The input assumption y ≠ 0 ensures that the component is used only in contexts that provide non-zero divisors. In the following, when referring to the components of several interfaces, we use the interface name as subscript to identify ownership. For example, the input assumption of an interface F is denoted by φ^I_F.
Compatibility and composition.  We define the composition of A/G interfaces in several steps. First, two A/G interfaces are syntactically composable if their output variables are disjoint. In general, some outputs of one interface will provide inputs to the other interface, and some inputs will remain free in the composition. Second, two A/G interfaces F and G are semantically compatible if, whenever one interface provides inputs to the other interface, the output guarantee of the former implies the input assumption of the latter. Consider first the closed case, in which all inputs of F are outputs of G, and vice versa. Then F and G are compatible if the closed formula

    (∀X^O_F ∪ X^O_G)(φ^O_F ∧ φ^O_G ⇒ φ^I_F ∧ φ^I_G)        (ψ)
is true. In the open case, where some inputs of F and G are left free, the formula (ψ) has free input variables. As discussed above, to support incremental design, the two interfaces F and G are compatible if they can be used together in some context, i.e., if there is an environment that makes (ψ) true by providing helpful input values. Thus, in the open case, the A/G interfaces F and G are compatible if the formula (ψ) is satisfiable. Then the formula (ψ) is the input assumption of the composite interface F||G, because it encodes the weakest condition on the environment of F||G that makes F and G work together.

Definition 3  Two A/G interfaces F and G are composable if X^O_F ∩ X^O_G = ∅. Two A/G interfaces F and G are compatible, written F ∼ G, if they are composable and the formula

    (∀X^O_F ∪ X^O_G)(φ^O_F ∧ φ^O_G ⇒ φ^I_F ∧ φ^I_G)        (ψ)

is satisfiable. The composition F||G of two compatible A/G interfaces F and G is the A/G interface with

    X^I_{F||G} = (X^I_F ∪ X^I_G) \ X^O_{F||G};
    X^O_{F||G} = X^O_F ∪ X^O_G;
    φ^I_{F||G} = ψ;
    φ^O_{F||G} = φ^O_F ∧ φ^O_G.
Note that the compatibility relation ∼ is symmetric.
Example 4 Let F be the A/G interface without input variables, the single output variable x, and the output guarantee true. Let G be the A/G interface with the two input variables x and y, the input assumption x = 0 ⇒ y = 0, and no output variables. Then F and G are compatible, because the formula (∀x)(true ⇒ (x = 0 ⇒ y = 0)) simplifies to y = 0, which is satisfiable. Note that the predicate y = 0 expresses the weakest input assumption that the composite interface needs to make in order to ensure that the input assumption x = 0 ⇒ y = 0 of G is satisfied. This is because F makes no guarantees about x; in particular, it might provide outputs x that are 0, and it might provide outputs x that are different from 0. The composition F ||G has the input variable y, the input assumption y = 0, the output variable x, and the output guarantee true. The following theorem shows that the A/G interfaces support incremental design.
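To make the compatibility check concrete, the following Python sketch (illustrative only, not part of the original article) treats A/G interfaces whose variables are boolean and whose predicates are ordinary Python functions. All names are made up for the example; the check simply tests by enumeration whether (ψ) is satisfiable, i.e., whether some valuation of the free inputs makes the implication hold for all joint output valuations.

    from itertools import product as cartesian

    class AGInterface:
        """A/G interface over boolean variables; predicates map a
        valuation (dict: variable name -> bool) to True or False."""
        def __init__(self, inputs, outputs, assume, guarantee):
            self.inputs = frozenset(inputs)      # X^I
            self.outputs = frozenset(outputs)    # X^O
            self.assume = assume                 # phi^I
            self.guarantee = guarantee           # phi^O

    def valuations(variables):
        variables = sorted(variables)
        for bits in cartesian([False, True], repeat=len(variables)):
            yield dict(zip(variables, bits))

    def psi_holds(f, g, env):
        """For fixed values `env` of the free inputs, check that every joint
        output valuation satisfying both guarantees satisfies both assumptions."""
        for out in valuations(f.outputs | g.outputs):
            v = {**env, **out}
            if f.guarantee(v) and g.guarantee(v) and not (f.assume(v) and g.assume(v)):
                return False
        return True

    def compatible(f, g):
        if f.outputs & g.outputs:        # not composable
            return False
        free_inputs = (f.inputs | g.inputs) - (f.outputs | g.outputs)
        return any(psi_holds(f, g, env) for env in valuations(free_inputs))

    # Boolean analogue of Example 4: F only outputs x; G assumes "x implies y".
    F = AGInterface([], ['x'], lambda v: True, lambda v: True)
    G = AGInterface(['x', 'y'], [], lambda v: (not v['x']) or v['y'], lambda v: True)
    print(compatible(F, G))   # True: the environment can always choose y = True

With booleans this is exactly an existential-over-inputs, universal-over-outputs check; a practical implementation would hand the formula to a solver rather than enumerate valuations.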
Theorem 5  For all A/G interfaces F, G, H, and I, if F ∼ G and H ∼ I and F||G ∼ H||I, then F ∼ H and G ∼ I and F||H ∼ G||I.

Proof sketch. Note that from the premises of the theorem it follows that (1) the four sets X^O_F, X^O_G, X^O_H, and X^O_I are pairwise disjoint; and (2) the formula

    (∀X^O_F ∪ X^O_G ∪ X^O_H ∪ X^O_I)(φ^O_F ∧ φ^O_G ∧ φ^O_H ∧ φ^O_I ⇒ φ^I_F ∧ φ^I_G ∧ φ^I_H ∧ φ^I_I)

is satisfiable. To prove from this that, say, the formula

    (∀X^O_F ∪ X^O_H)(φ^O_F ∧ φ^O_H ⇒ φ^I_F ∧ φ^I_H)

is satisfiable, choose the values for the variables in X^I_{F||H} ∩ X^O_{G||I} so that φ^O_G ∧ φ^O_I is true. □
Refinement.  Besides composition, the second operation on interfaces is refinement. Refinement between A/G interfaces is, like subtyping for function types, defined in an input/output contravariant fashion: an implementation must accept all inputs that the specification accepts, and it may produce only outputs that the specification allows. Hence, to refine an A/G interface, the input assumption can be weakened, and the output guarantee can be strengthened.
Definition 6  An A/G interface F′ refines an A/G interface F, written F′ ⪯ F, if
1. X^I_{F′} ⊇ X^I_F and X^O_{F′} ⊆ X^O_F;
2. φ^I_F ⇒ φ^I_{F′} and φ^O_{F′} ⇒ φ^O_F.
Refinement between A/G interfaces is a preorder (i.e., reflexive and transitive). The following theorem shows that the A/G interfaces support independent implementability.
Theorem 7  For all A/G interfaces F, G, and F′, if F ∼ G and F′ ⪯ F, then F′ ∼ G and F′||G ⪯ F||G.

Proof sketch. From X^O_F ∩ X^O_G = ∅ and X^O_F ⊇ X^O_{F′}, it follows that X^O_{F′} ∩ X^O_G = ∅. Choose values for the input variables in X^I_{F||G} so that

    (∀X^O_F ∪ X^O_G)(φ^O_F ∧ φ^O_G ⇒ φ^I_F ∧ φ^I_G)

is true. From X^I_F ⊆ X^I_{F′} and X^O_F ⊇ X^O_{F′}, it follows that X^I_{F||G} ⊆ X^I_{F′||G}. Choose arbitrary values for all variables not in X^I_{F||G}. Then φ^O_{F′} ∧ φ^O_G ⇒ φ^I_{F′} ∧ φ^I_G follows from φ^O_{F′} ⇒ φ^O_F and φ^O_F ∧ φ^O_G ⇒ φ^I_F ∧ φ^I_G and φ^I_F ⇒ φ^I_{F′}. This proves that F′ ∼ G. The proof that F′||G ⪯ F||G is straightforward. □

Note that the contravariant definition of refinement is needed for Theorem 7 to hold, as input assumptions and output guarantees occur on two different sides of the implication in the formula (ψ). We have not fixed the types of variables, nor the theory in which input assumptions or output guarantees are written. Checking the compatibility of A/G interfaces, and checking refinement between A/G interfaces, requires a procedure that decides the satisfiability of universal formulas in that theory. For example, if all variables are boolean, then the input assumptions and output guarantees are quantifier-free boolean formulas. In this case, compatibility checking requires the evaluation of ∃∀ boolean formulas (namely, satisfiability checking of the universal formula (ψ)), and refinement checking requires the evaluation of ∀ boolean formulas (namely, validity checking of the two implications of Definition 6).
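Continuing the boolean sketch above (again illustrative, not from the original article), the refinement check of Definition 6 reduces to two brute-force validity checks; the direction of the variable-set inclusions follows the reconstruction of Definition 6 given here, and the code works with the AGInterface objects of the earlier sketch.

    from itertools import product as cartesian

    def valuations(variables):
        variables = sorted(variables)
        for bits in cartesian([False, True], repeat=len(variables)):
            yield dict(zip(variables, bits))

    def valid(pred, variables):
        """Brute-force validity of a boolean predicate over `variables`."""
        return all(pred(v) for v in valuations(variables))

    def refines(impl, spec):
        """Definition 6: the implementation may add inputs and drop outputs,
        must have a weaker input assumption and a stronger output guarantee."""
        if not (impl.inputs >= spec.inputs and impl.outputs <= spec.outputs):
            return False
        all_vars = impl.inputs | spec.inputs | impl.outputs | spec.outputs
        weaker = valid(lambda v: (not spec.assume(v)) or impl.assume(v), all_vars)
        stronger = valid(lambda v: (not impl.guarantee(v)) or spec.guarantee(v), all_vars)
        return weaker and stronger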
Automaton Interfaces

We now present the stateful interface language called interface automata [5]. An interface automaton is an edge-labeled digraph whose vertices represent interface states, whose edges represent interface transitions, and whose labels represent action names. The actions are partitioned into input, output, and internal actions. The internal actions are “hidden”: they cannot be observed by the environment. The syntax of interface automata is identical to the syntax of I/O automata [8], but composition will be defined differently.
Definition 8  An interface automaton F = ⟨Q, q^0, A^I, A^O, A^H, δ⟩ consists of a finite set Q of states; an initial state q^0 ∈ Q; three pairwise disjoint sets A^I, A^O, and A^H of input, output, and hidden actions; and a set δ ⊆ Q × A × Q of transitions, where A = A^I ∪ A^O ∪ A^H is the set of all actions. We require that the automaton F be input-deterministic, that is, for all states q, q′, q″ ∈ Q and all input actions a ∈ A^I, if (q, a, q′) ∈ δ and (q, a, q″) ∈ δ, then q′ = q″.

An action a ∈ A is enabled at a state q ∈ Q if there exists a state q′ ∈ Q such that (q, a, q′) ∈ δ. Given a state q ∈ Q, we write A^I(q) (resp. A^O(q); A^H(q)) for the set of input (resp. output; hidden) actions that are enabled at q. Unlike I/O automata, an interface automaton is not required to be input-enabled; that is, we do not require that A^I(q) = A^I for all states q ∈ Q. Rather, we use the set A^I(q) to specify the input actions that are accepted at state q; that is, an interface automaton encodes the assumption that, when F is in state q, the environment does not provide an input action that is not enabled at q.
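As a concrete data representation, the following Python sketch (illustrative only, not part of the original article; the tiny example automaton is hypothetical) stores an interface automaton of Definition 8 with its action sets kept separate, so that the input/output/hidden distinction is available for the product and compatibility constructions that follow.

    from collections import namedtuple

    InterfaceAutomaton = namedtuple(
        'InterfaceAutomaton',
        ['states', 'initial', 'inputs', 'outputs', 'hidden', 'transitions'])

    def enabled(automaton, state, kind):
        """Actions of the given kind ('inputs', 'outputs' or 'hidden')
        enabled at `state`, i.e. A^I(q), A^O(q) or A^H(q)."""
        allowed = getattr(automaton, kind)
        return {a for (q, a, r) in automaton.transitions if q == state and a in allowed}

    def input_deterministic(automaton):
        """Check the input-determinism requirement of Definition 8."""
        seen = {}
        for (q, a, r) in automaton.transitions:
            if a in automaton.inputs:
                if (q, a) in seen and seen[(q, a)] != r:
                    return False
                seen[(q, a)] = r
        return True

    # Hypothetical two-state component: accepts 'req' and answers 'resp'.
    Tiny = InterfaceAutomaton(
        states={0, 1}, initial=0,
        inputs={'req'}, outputs={'resp'}, hidden=set(),
        transitions={(0, 'req', 1), (1, 'resp', 0)})
    print(enabled(Tiny, 0, 'inputs'), input_deterministic(Tiny))  # {'req'} True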
Example 9  We model a software component that implements a message-transmission service. The component has a method “send” for sending messages. When this method is called, the component returns either “ok” or “fail.” To perform this service, the component relies on a communication channel that provides the method “trnsmt” for transmitting messages. The two possible return values are “ack,” which indicates a successful transmission, and “nack,” which indicates a failure. When the method “send” is called, the component tries to transmit the message, and if the first transmission fails, it tries again. If both transmissions fail, the component reports failure. The interface automaton modeling this component is called TryTwice and is shown in Figure 1.
Figure 1. The interface automaton TryTwice.

Figure 2. The interface automaton Client.
The interface automaton TryTwice has the three input actions “send,” “ack,” and “nack”; the three output actions “trnsmt,” “ok,” and “fail”; and no hidden actions. It has seven states, with state 0 being initial (marked by an arrow without source). On the transitions, we append to the name of the action label the symbol “?” (resp. “!”; “;”) to indicate that the transition is input (resp. output; hidden). Note how the automaton expresses in a straightforward manner the above informal description of the message-passing service. The input “send” is accepted only in state 0; that is, the component expects a client to send a second message only after it has received an “ok” or “fail” response. The interface automaton Client of Figure 2 shows a possible client of the message-passing service. It has the input actions “ok” and “fail,” the output action “send,” and again no hidden actions. This particular client expects messages to be sent successfully, and makes no provisions for handling failures: after calling the method “send,” it accepts the return value “ok,” but does not accept the return value “fail.” The expectation that the return value is always “ok” is an assumption by the component Client about its environment; that is, the component Client is designed to be used only with message-transmission services that cannot fail.

An interface automaton F is closed if it has no input and output actions; that is, if A^I = A^O = ∅. Closed interface automata do not interact with the environment. An execution of the interface automaton F is a finite alternating sequence q_0, a_0, q_1, a_1, ..., q_n of states and actions such that (q_i, a_i, q_{i+1}) ∈ δ for all 0 ≤ i < n. The execution is autonomous if all its actions are output or hidden actions; that is, if a_i ∈ A^O ∪ A^H for all 0 ≤ i < n. Autonomous executions do not depend on input actions. The execution is invisible if all its actions are hidden; that is, if a_i ∈ A^H for all 0 ≤ i < n. A state q′ ∈ Q is (autonomously; invisibly) reachable from a state q ∈ Q if there exists an (autonomous; invisible) execution whose first state is q, and whose last state is q′. The state q is reachable in F if q is reachable from the initial state q^0. In the definition of interface automata, it is not required that all states be reachable. However, one is generally not interested in states that are not reachable, and they can be removed.
Compatibility and composition.  We define the composition of two interface automata only if their actions are disjoint, except that an input action of one automaton may coincide with an output action of the other automaton.

Definition 10  Two interface automata F and G are composable if
1. A^H_F ∩ A_G = ∅ and A_F ∩ A^H_G = ∅;
2. A^I_F ∩ A^I_G = ∅;
3. A^O_F ∩ A^O_G = ∅.
For two interface automata F and G, we let shared(F, G) = A_F ∩ A_G be the set of common actions. If F and G are composable, then shared(F, G) = (A^I_F ∩ A^O_G) ∪ (A^O_F ∩ A^I_G). We define the composition of interface automata in stages, first defining the product automaton F ⊗ G. The two automata synchronize on the actions in shared(F, G), and asynchronously interleave all other actions. Shared actions become hidden in the product.
Definition 11  For two composable interface automata F and G, the product F ⊗ G is the interface automaton with

    Q_{F⊗G} = Q_F × Q_G;
    q^0_{F⊗G} = (q^0_F, q^0_G);
    A^I_{F⊗G} = (A^I_F ∪ A^I_G) \ shared(F, G);
    A^O_{F⊗G} = (A^O_F ∪ A^O_G) \ shared(F, G);
    A^H_{F⊗G} = A^H_F ∪ A^H_G ∪ shared(F, G);
    ((q, r), a, (q′, r′)) ∈ δ_{F⊗G} iff
        a ∉ shared(F, G) and (q, a, q′) ∈ δ_F and r = r′, or
        a ∉ shared(F, G) and q = q′ and (r, a, r′) ∈ δ_G, or
        a ∈ shared(F, G) and (q, a, q′) ∈ δ_F and (r, a, r′) ∈ δ_G.

Let δ^I_F = {(q, a, q′) ∈ δ_F | a ∈ A^I_F} denote the set of input transitions of an interface automaton F, and let δ^O_F and δ^H_F be defined similarly as the output and hidden transitions of F. Then, according to the definition of product automata, each input transition of F ⊗ G is an input transition of either F or G; that is, ((q, r), a, (q′, r′)) ∈ δ^I_{F⊗G} iff
    (q, a, q′) ∈ δ^I_F and a ∉ A^O_G and r = r′; or
    a ∉ A^O_F and q = q′ and (r, a, r′) ∈ δ^I_G.
Each output transition of F ⊗ G is an output transition of F or G; that is, ((q, r), a, (q′, r′)) ∈ δ^O_{F⊗G} iff
    (q, a, q′) ∈ δ^O_F and a ∉ A^I_G and r = r′; or
    a ∉ A^I_F and q = q′ and (r, a, r′) ∈ δ^O_G.
Each hidden transition of F ⊗ G is either an input transition of F that is an output transition of G, or vice versa, or it is a hidden transition of F or G; that is, ((q, r), a, (q′, r′)) ∈ δ^H_{F⊗G} iff
    (q, a, q′) ∈ δ^I_F and (r, a, r′) ∈ δ^O_G; or
    (q, a, q′) ∈ δ^O_F and (r, a, r′) ∈ δ^I_G; or
    (q, a, q′) ∈ δ^H_F and r = r′; or
    q = q′ and (r, a, r′) ∈ δ^H_G.

Figure 3. The product automaton TryTwice ⊗ Client.

Figure 4. The composite interface automaton TryTwice||Client.
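The product construction can be written down directly from Definition 11. The following Python sketch is illustrative only; it assumes the InterfaceAutomaton structure introduced after Definition 8.

    def shared_actions(f, g):
        """shared(F, G): the actions common to both automata."""
        return (f.inputs | f.outputs | f.hidden) & (g.inputs | g.outputs | g.hidden)

    def product(f, g):
        """The product F (x) G of Definition 11: shared actions synchronize
        and become hidden, all other actions interleave asynchronously."""
        sh = shared_actions(f, g)
        transitions = set()
        for q, a, q2 in f.transitions:
            if a in sh:
                transitions |= {((q, r), a, (q2, r2))
                                for r, b, r2 in g.transitions if b == a}
            else:
                transitions |= {((q, r), a, (q2, r)) for r in g.states}
        for r, a, r2 in g.transitions:
            if a not in sh:
                transitions |= {((q, r), a, (q, r2)) for q in f.states}
        return InterfaceAutomaton(
            states={(q, r) for q in f.states for r in g.states},
            initial=(f.initial, g.initial),
            inputs=(f.inputs | g.inputs) - sh,
            outputs=(f.outputs | g.outputs) - sh,
            hidden=f.hidden | g.hidden | sh,
            transitions=transitions)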
Example 12 The product TryTwice ⊗ Client of the interface automata TryTwice and Client from Figures 1 and 2 is shown in Figure 3. Each state of the product consists of a state of TryTwice together with a state of Client. Only the reachable states of the product automaton are shown. Each transition of the product is either a joint “send” transition, which represents the call of the method “send” by Client, or a joint “ok” transition, which represents the termination of the method “send” with return value “ok,” or a transition of TryTwice calling the method “trnsmt” of the (unspecified) communi-
cation channel, or a transition of TryTwice receiving the return value “ack” or “nack” from the channel. Consider the following sequence of events. The component Client calls the method “send”; then TryTwice calls twice the method “trnsmt” and receives twice the return value “nack,” indicating transmission failure. This sequence of events brings us to state 6 of the product automaton, which corresponds to state 6 of TryTwice and state 1 of Client. In state 6, the component TryTwice tries to report failure by returning “fail,” but not expecting failure, the component Client does not accept the return value “fail” in state 1. Hence the product state 6 has no outgoing edges; it is called an error state, because at product state 6, the component TryTwice violates the assumption made by the component Client about the inputs that Client receives. This example illustrates that, as interface automata are not necessarily input-enabled, in the product of two interface automata, one of the automata may produce an output action that is in the input alphabet of the other automaton, but is not accepted.
Definition 13  Given two composable interface automata F and G, a product state (q, r) ∈ Q_F × Q_G is an error state of the product automaton F ⊗ G if there exists an action a ∈ shared(F, G) such that either a ∈ A^O_F(q) and a ∉ A^I_G(r), or a ∉ A^I_F(q) and a ∈ A^O_G(r). We write error(F, G) for the set of error states of the product automaton F ⊗ G.

If the product F ⊗ G contains no reachable error states, then the two interface automata F and G satisfy each other’s input assumptions and are thus compatible. On the other hand, if F ⊗ G is closed and contains a reachable error state, then F and G are incompatible. The interesting case arises when F ⊗ G contains reachable error states, but is not closed. The fact that a state in error(F, G) is reachable does not necessarily indicate an incompatibility, because by providing appropriate inputs, the environment of F ⊗ G may be able to ensure that no error state is encountered in the product. We therefore define the set of incompatible states of F ⊗ G as those states from which no environment can prevent an error state from being entered. First, the error states of F ⊗ G are incompatible. Second, all states from which a sequence of output or hidden actions of F ⊗ G leads to an error state are also incompatible, because the product automaton may choose to traverse that sequence in every environment. On the other hand, if an error state is only reachable via an input action, then a helpful environment can choose not to provide that action, thus avoiding the error state.
Definition 14  Given two composable interface automata F and G, a product state (q, r) ∈ Q_F × Q_G is a compatible state of the product automaton F ⊗ G if there exists no error state (q′, r′) ∈ error(F, G) that is autonomously reachable from (q, r). Two interface automata F and G are compatible, written F ∼ G, if they are composable and the initial state of the product automaton F ⊗ G is compatible.

Note that the compatibility relation ∼ is symmetric. If two composable interface automata F and G are compatible, then there is an environment E such that (1) E is composable with F ⊗ G; (2) (F ⊗ G) ⊗ E is closed; (3) for all states ((q, r), s) ∈ (Q_F × Q_G) × Q_E that are reachable in (F ⊗ G) ⊗ E, we have (q, r) ∉ error(F, G) and ((q, r), s) ∉ error(F ⊗ G, E). The third condition ensures that (3a) E prevents the error states of F ⊗ G from being entered, and (3b) E accepts all outputs of F ⊗ G and does not provide inputs that are not accepted by F ⊗ G. An interface automaton that satisfies the conditions (1)–(3) is called a legal environment for F ⊗ G. The existence of a legal environment shows that two compatible interfaces can be used together in some context.

For compatible interface automata F and G, there is always a trivial legal environment, which provides no inputs to F ⊗ G. Formally, the empty closure of F and G is the interface automaton close(F, G) with

    Q_{close(F,G)} = {0};
    q^0_{close(F,G)} = 0;
    A^I_{close(F,G)} = A^O_{F⊗G};
    A^O_{close(F,G)} = A^I_{F⊗G};
    A^H_{close(F,G)} = ∅;
    δ_{close(F,G)} = {(0, a, 0) | a ∈ A^I_{close(F,G)}}.

The empty closure close(F, G) has a single state (arbitrarily named 0), which is its initial state. It accepts all output actions of F ⊗ G as inputs, but does not issue any outputs. All states that are reachable in (F ⊗ G) ⊗ close(F, G) are reachable solely by output and hidden actions of F ⊗ G. Thus, if the initial state of F ⊗ G is compatible, then no state that is reachable in (F ⊗ G) ⊗ close(F, G) corresponds to an error state of F ⊗ G. On the other hand, if the initial state of F ⊗ G is not compatible, then some error state of F ⊗ G is reachable in every environment that does not constrain the outputs of F ⊗ G, in particular, in (F ⊗ G) ⊗ close(F, G). Consequently, two interface automata F and G are compatible iff for all states (q, r) ∈ Q_F × Q_G such that ((q, r), 0) is reachable in (F ⊗ G) ⊗ close(F, G), we have (q, r) ∉ error(F, G).
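Continuing the Python sketch (again illustrative, and building on the InterfaceAutomaton structure, the enabled helper, and the product and shared_actions functions above), the incompatible states can be computed as a backward fixpoint from the error states over output and hidden transitions of the product; compatibility is then a membership test for the initial product state.

    def error_states(f, g, prod):
        """error(F, G): product states where one automaton can issue a shared
        action that the other does not currently accept as an input."""
        sh = shared_actions(f, g)
        errs = set()
        for (q, r) in prod.states:
            for a in sh:
                if a in enabled(f, q, 'outputs') and a not in enabled(g, r, 'inputs'):
                    errs.add((q, r))
                if a in enabled(g, r, 'outputs') and a not in enabled(f, q, 'inputs'):
                    errs.add((q, r))
        return errs

    def incompatible_states(f, g):
        """Product states from which an error state is autonomously reachable."""
        prod = product(f, g)
        bad = set(error_states(f, g, prod))
        autonomous = prod.outputs | prod.hidden
        changed = True
        while changed:
            changed = False
            for (p, a, p2) in prod.transitions:
                if a in autonomous and p2 in bad and p not in bad:
                    bad.add(p)
                    changed = True
        return bad

    def automata_compatible(f, g):
        """F ~ G (assuming F and G are composable): the initial product
        state must not be incompatible."""
        return (f.initial, g.initial) not in incompatible_states(f, g)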
The composition of two compatible interface automata is obtained by restricting the product of the two automata to the set of compatible states.
Definition 15  Given two compatible interface automata F and G, the composition F||G is the interface automaton that results from the product F ⊗ G by removing all transitions (q, a, q′) ∈ δ_{F⊗G} such that
1. q is a compatible state of the product F ⊗ G;
2. a ∈ A^I_{F⊗G} is an input action of the product;
3. q′ is not a compatible state of F ⊗ G.
Example 16  In the product automaton TryTwice ⊗ Client from Figure 3, state 6 is an error state, and thus not compatible. However, the product TryTwice ⊗ Client is not closed, because its environment —the communication channel— provides “ack” and “nack” inputs. The environment that provides input “ack” (or no input at all) at the product state 4 ensures that the error state 6 is not entered. Hence, the product states 0, 1, 2, 3, 4, and 5 are compatible. Since the initial state 0 of the product is compatible, the two interface automata TryTwice and Client are compatible. The result of removing from TryTwice ⊗ Client the input transition to the incompatible state 6 is the interface automaton TryTwice||Client shown in Figure 4. Note that restricting TryTwice ⊗ Client to its compatible states corresponds to imposing an assumption on the environment, namely, that calls to the method “trnsmt” never return twice in a row the value “nack.” Hence, when the two interface automata TryTwice and Client are composed, the assumption of Client that no failures occur is translated into the assumption of TryTwice||Client that no two consecutive transmissions fail. This illustrates how the composition of the interface automata TryTwice and Client propagates to the environment of TryTwice||Client the assumptions that are necessary for the correct interaction of TryTwice and Client.

The definition of composition removes only transitions, not states, from the product automaton. The removal of transitions, however, may render some states unreachable, which can then also be removed. In particular, as far as reachable states are concerned, the composition F||G results from the product F ⊗ G by removing all incompatible states; if the result is empty, then F and G are not compatible. In general, the removal of input transitions from a product automaton may render even some compatible states unreachable. Hence, the relevant states of the composite automaton F||G are those states of the product automaton F ⊗ G which remain reachable after all incompatible states are removed. Those states of the product automaton can be found in linear time, by forward and backward traversals of the underlying graph [5].
Thus the compatibility of two interface automata with m1 and m2 reachable transitions, respectively, can be checked, and their composition constructed, in time O(m1 · m2 ). The following theorem shows that interface automata support incremental design.
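A sketch of the composition operator of Definition 15, again building on the illustrative helpers above (product, incompatible_states): it keeps the product but drops input transitions that lead from a compatible state into an incompatible one, and an incremental loop composes several automata one at a time.

    def compose(f, g):
        """F||G (Definition 15), as a restriction of the product F (x) G."""
        prod = product(f, g)
        bad = incompatible_states(f, g)
        kept = {(p, a, p2) for (p, a, p2) in prod.transitions
                if not (p not in bad and a in prod.inputs and p2 in bad)}
        return prod._replace(transitions=kept)

    def compose_all(automata):
        """Incremental composition F1 || ... || Fk, adding one automaton at a
        time (a fuller implementation would also prune unreachable states)."""
        result = automata[0]
        for a in automata[1:]:
            result = compose(result, a)
        return result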
Theorem 17  For all interface automata F, G, H, and I, if F ∼ G and H ∼ I and F||G ∼ H||I, then F ∼ H and G ∼ I and F||H ∼ G||I.

Proof sketch. For composability, note that from the premises of the theorem it follows that A^H_i is disjoint from A_j for all j ≠ i, that all A^I_i’s are pairwise disjoint, and that all A^O_i’s are pairwise disjoint. Consider the product automaton F ⊗ G ⊗ H ⊗ I; the associativity of ⊗ implies that parentheses do not matter. Define a state (p_1, p_2, p_3, p_4) to be an error state of F ⊗ G ⊗ H ⊗ I if some pair (p_i, p_j) is an error state of the corresponding subproduct; e.g., if (p_1, p_3) is an error state of F ⊗ H. Define a state p to be an incompatible state of F ⊗ G ⊗ H ⊗ I if some error state of F ⊗ G ⊗ H ⊗ I is autonomously reachable from p, that is, reachable via hidden and output transitions. For ℓ ≥ 0, define a state p to be a rank-ℓ incompatible state if some error state is autonomously reachable from p in at most ℓ transitions. We show that under the premises of the theorem, the composition F||G||H||I is achieved, for any insertion of parentheses, by removing the incompatible states from the product F ⊗ G ⊗ H ⊗ I. The proof proceeds in two steps. First, we show that if some projection of a product state p = (p_1, p_2, p_3, p_4) is an incompatible state of the corresponding subproduct (say, F ⊗ G), then p is an incompatible state of the full product F ⊗ G ⊗ H ⊗ I. Second, we show that if p is an incompatible state of F ⊗ G ⊗ H ⊗ I, and some input transitions are removed by constructing the composition of any subproduct (say, F||H), then even in the product without the removed transitions, there remains an autonomous path from p to an error state.

(1) Consider a state p of the product F ⊗ G ⊗ H ⊗ I, and a projection p′ of p which is a rank-ℓ incompatible state of the corresponding subproduct. We show that p is a rank-ℓ′ incompatible state of F ⊗ G ⊗ H ⊗ I for some ℓ′ ≤ ℓ. Consider a shortest autonomous path from p′ to an error state in the subproduct. There are three cases. First, if p′ is an error state of the subproduct (rank 0), then p is an error state of the full product. Second, if the first transition of the error path from p′ in the subproduct corresponds to an output or hidden transition of the full product, then the rank of the successor state is ℓ−1 and the claim follows by induction. Third, if the first transition of the error path from p′ is an output transition of the subproduct which does not have a matching input transition in the full product, then p is an error state of the full product and has rank 0.

(2) Consider an incompatible state p of the product F ⊗ G ⊗ H ⊗ I. Suppose that some input transitions are removed by constructing the composition of a subproduct, and remove the corresponding transitions in the full product F ⊗ G ⊗ H ⊗ I. The only kind of transition that might be removed in this way is a hidden transition (q, a, r) of the product whose projection onto the subproduct is an input transition (q′, a, r′), which is matched in the full product by an output transition. Once (q, a, r) is removed, the input action a is no longer enabled at the state q′, because interface automata are input-deterministic. Hence in the full product, the state q is an error state. Therefore even after the removal of the transition (q, a, r) from the product F ⊗ G ⊗ H ⊗ I, there is an autonomous path from p to an error state, namely, to q. □

As a consequence of Theorem 17, we can check whether k > 0 interface automata F_1, ..., F_k are compatible by computing their composition F_1 || ··· || F_k incrementally, by adding one interface automaton at a time. The potential efficiency of the incremental product construction lies in the fact that product states can be pruned as soon as they become either incompatible, or unreachable through the pruning of incompatible states. Thus, in some cases the exponential explosion of states inherent in a product construction may be avoided.
Refinement. In the stateful input-enabled setting, refinement is usually defined as trace containment or simulation; this ensures that all output behaviors of the implementation are allowed by the specification. However, such definitions are not appropriate in a non-input-enabled setting, such as interface automata: if one were to require that the set of accepted inputs of the implementation is a subset of the inputs allowed by the specification, then the implementation would make stronger assumptions about the environment, and could not be used in all contexts in which the specification is used. Example 18 Consider the interface automaton OnceOrTwice of Figure 5. This automaton represents a component that provides two services: the first is the try-twice service “send” provided also by the automaton TryTwice of Figure 1; the second is a try-once-only service “once” designed for messages that are useless when stale. Clearly, we would like to define refinement so that OnceOrTwice is a refinement of TryTwice, because the component OnceOrTwice implements all services provided by the component TryTwice, and it is consistent with TryTwice in their implementation. Hence, in all contexts in which TryTwice is used, OnceOrTwice can be used instead. The language of OnceOrTwice, however, is not contained in the language of TryTwice; indeed, “once” is not even an action of TryTwice. Therefore, for interface automata we define refinement in a contravariant fashion: the implementation must accept more inputs, and provide fewer outputs, than the specification. For efficient checkability of refinement, we choose
Figure 5. The interface automaton OnceOrTwice.
a contravariant refinement relation in the spirit of simulation, rather than in the spirit of language containment. This leads to the definition of refinement as alternating simulation [1]. Roughly, an interface automaton F′ refines an interface automaton F if each input transition of F can be simulated by F′, and each output transition of F′ can be simulated by F. The precise definition must take into account the hidden transitions of F and F′.

The environment of an interface automaton F cannot see the hidden transitions of F. Consequently, if F is at a state q, and state r is invisibly reachable from q (by hidden actions only), then the environment cannot distinguish between q and r. Given a state q ∈ Q, let ε-closure(q) be the set of states that are invisibly reachable from q. The environment must be able to accept all output actions in the set obsA^O(q) = {a ∈ A^O | (∃r ∈ ε-closure(q))(a ∈ A^O(r))} of outputs that may follow after some sequence of hidden transitions from q. Conversely, the environment can safely issue all input actions in the set obsA^I(q) = {a ∈ A^I | (∀r ∈ ε-closure(q))(a ∈ A^I(r))} of inputs that are accepted after all sequences of hidden transitions from q. For an implementation state q′ to refine a specification state q we need to require that obsA^I(q) ⊆ obsA^I(q′) and obsA^O(q) ⊇ obsA^O(q′). Alternating simulation propagates this requirement from q and q′ to their successor states.
To define alternating simulation formally, we use the following notation. Given a state q ∈ Q and an action a ∈ A of an interface automaton, let post(q, a) = {r ∈ Q | (q, a, r) ∈ δ} be the set of a-successors of q.
Definition 19  Given two interface automata F and F′, a binary relation ⪯ ⊆ Q_{F′} × Q_F is an alternating simulation by F′ of F if q′ ⪯ q implies
1. for all input actions a ∈ A^I(q) and states r ∈ post(q, a), there is a state r′ ∈ post(q′, a) such that r′ ⪯ r;
2. for all output actions a ∈ A^O(q′) and states r′ ∈ post(q′, a), there is a state p ∈ ε-closure(q) and a state r ∈ post(p, a) such that r′ ⪯ r;
3. for all hidden actions a ∈ A^H(q′) and states r′ ∈ post(q′, a), there is a state r ∈ ε-closure(q) such that r′ ⪯ r.

Conditions (1) and (2) express the input/output duality between states q′ ⪯ q in the alternating simulation relation: every input transition from q must be matched by an input transition from q′, and every output transition from q′ must be matched by a sequence of zero or more hidden transitions from q followed by an output transition. Condition (3) stipulates that every hidden transition from q′ can be matched by a sequence of zero or more hidden transitions from q. In all three cases, matching requires that the alternating-simulation relation is propagated co-inductively. Since interface automata are input-deterministic, condition (1) can be rewritten as (1a) A^I(q) ⊆ A^I(q′) and (1b) for all input transitions (q, a, r) ∈ δ^I_F and (q′, a, r′) ∈ δ^I_{F′}, we have r′ ⪯ r. It can be checked that if q′ ⪯ q for some alternating simulation ⪯, then obsA^I(q) ⊆ obsA^I(q′) and obsA^O(q) ⊇ obsA^O(q′).
Definition 20  An interface automaton F′ refines an interface automaton F, written F′ ⪯ F, if
1. A^I_{F′} ⊇ A^I_F and A^O_{F′} ⊆ A^O_F;
2. there is an alternating simulation ⪯ by F′ of F such that q^0_{F′} ⪯ q^0_F.

Note that unlike in standard simulation, the “typing” condition (1) is contravariant on the input and output action sets. This captures a simple kind of subclassing: if F′ ⪯ F, then the implementation F′ is able to provide more services than the specification F, but it must be consistent with F on the shared services. Condition (2) relates the initial states of the two automata.
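Refinement can be decided by a standard greatest-fixpoint computation: start from all pairs of (implementation, specification) states and remove pairs that violate the conditions of Definition 19. The Python sketch below is illustrative only; it assumes the InterfaceAutomaton structure and the enabled helper from the earlier sketches, its action-set inclusions follow the reconstruction of Definition 20 given here, and it makes no attempt to match the complexity bound cited below.

    def epsilon_closure(auto, state):
        """States invisibly reachable from `state` (hidden transitions only)."""
        closure, frontier = {state}, [state]
        while frontier:
            q = frontier.pop()
            for (p, a, r) in auto.transitions:
                if p == q and a in auto.hidden and r not in closure:
                    closure.add(r)
                    frontier.append(r)
        return closure

    def post(auto, q, a):
        return {r for (p, b, r) in auto.transitions if p == q and b == a}

    def refines_automata(impl, spec):
        """F' refines F (Definition 20): typing condition plus an alternating
        simulation (Definition 19) relating the initial states."""
        if not (impl.inputs >= spec.inputs and impl.outputs <= spec.outputs):
            return False
        rel = {(qi, qs) for qi in impl.states for qs in spec.states}
        changed = True
        while changed:
            changed = False
            for (qi, qs) in set(rel):
                ok = True
                # (1) every input transition of the spec state is matched
                for a in enabled(spec, qs, 'inputs'):
                    for rs in post(spec, qs, a):
                        if not any((ri, rs) in rel for ri in post(impl, qi, a)):
                            ok = False
                # (2) every output of the impl is matched by the spec,
                #     possibly after hidden moves of the spec
                for a in enabled(impl, qi, 'outputs'):
                    for ri in post(impl, qi, a):
                        if not any((ri, rs) in rel
                                   for p in epsilon_closure(spec, qs)
                                   for rs in post(spec, p, a)):
                            ok = False
                # (3) every hidden move of the impl is matched by hidden
                #     moves of the spec
                for a in enabled(impl, qi, 'hidden'):
                    for ri in post(impl, qi, a):
                        if not any((ri, rs) in rel
                                   for rs in epsilon_closure(spec, qs)):
                            ok = False
                if not ok:
                    rel.discard((qi, qs))
                    changed = True
        return (impl.initial, spec.initial) in rel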
Example 21  In the example of Figures 1 and 5, there is an alternating simulation that relates state q′ of OnceOrTwice with state q of TryTwice for all q ∈ {0, 1, 2, 3, 4, 5, 6}. Hence OnceOrTwice refines TryTwice.
It can be shown that the refinement relation between interface automata is a preorder. Refinement can be checked in polynomial time. More precisely, if F′ has n_1 reachable states and m_1 reachable transitions, and F has n_2 reachable states and m_2 reachable transitions, then it can be checked in time O((m_1 + m_2) · (n_1 + n_2)) whether F′ ⪯ F [1]. The following theorem shows that interface automata support independent implementability: we can always replace an interface automaton F with a more refined version F′ such that F′ ⪯ F, provided that F and F′ are connected to the environment by the same inputs. The side condition is due to the fact that if the environment were to provide inputs for F′ that are not provided for F, then it would be possible that new incompatibilities arise in the processing of these inputs. For software components, independent implementability is a statement of subclass polymorphism: we can always substitute a subclass for a superclass, provided no new methods of the subclass are used.
Theorem 22  Consider three interface automata F, G, and F′ such that F′ and G are composable and shared(F′, G) ⊆ shared(F, G). If F ∼ G and F′ ⪯ F, then F′ ∼ G and F′||G ⪯ F||G.

Proof sketch. The typing conditions are straightforward to check. Note in particular that shared(F′, G) ⊆ shared(F, G) implies both A^H_{F′} ∩ A_G = ∅ and (A^I_{F′} \ A^I_F) ∩ A^O_G = ∅. To prove that F′ ∼ G under the premises of the theorem, we show that every autonomous path leading from the initial state to an error state of F′ ⊗ G can be matched, transition by transition, by an autonomous path leading from the initial state to an error state of F ⊗ G. The interesting case is that of an input transition of F′ in the product F′ ⊗ G, say on action a. Since the path is autonomous, the input action a of F′ must be an output action of G, and because shared(F′, G) ⊆ shared(F, G), the action a must also be an input action of F. If a is not enabled in F, then we have already hit an error state of F ⊗ G; otherwise, there are unique a-successors in both F and F′, and the path matching can continue. Finally, to prove that F′||G ⪯ F||G under the premises of the theorem, consider an alternating simulation ⪯ by F′ of F such that q^0_{F′} ⪯ q^0_F. Then an alternating simulation by F′||G of F||G can be defined as follows: let (p′, r′) ⪯ (p, r) iff (1) p′ ⪯ p, (2) r′ = r, and (3) (p′, r′) is not an error state of F′ ⊗ G. □

The property of independent implementability implies that refinement is compositional: in order to check if F′||G′ ⪯ F||G, it suffices to check both F′ ⪯ F and G′ ⪯ G. This observation allows the decomposition of refinement proofs. Decomposition is particularly important in the case of interface automata, where the efficiency of refinement checking depends on the number of states.
Discussion An interface automaton represents both assumptions about the environment, and guarantees about the specified component. The environment assumptions are twofold: (1) each output transition incorporates the assumption that the corresponding action is accepted by the environment as input; and (2) each input action that is not accepted at a state encodes the assumption that the environment does not provide that input. The component guarantees correspond to possible sequences and choices of input, output, and hidden actions, as usual. When two interface automata are composed, the composition operator || combines not only the component guarantees, as is the case in other component models, but also the environment assumptions. Whenever two interface automata F and G are compatible, there is a particularly simple legal environment, namely, the empty closure close(F, G). This points to a limitation of interface automata: while the environment assumption of an automaton can express which inputs may occur, it cannot express which inputs must occur. Thus, the environment that provides no inputs is always the best environment for showing compatibility. There are several ways of enriching interface automata to specify inputs that must occur, among them, synchronicity [3],[6], adding fairness [4], or adding real-time constraints [7]. In these cases, no generic best environment exists, and a legal environment must be derived as a winning strategy in a two-player game. Recall that two interfaces F and G are compatible iff the environment has a strategy to avoid incompatible states of the product F ⊗G. In this game, player-1 is the environment, which provides inputs to the product F ⊗ G, and player-2 is the “team” {F, G} of interfaces, which choose internal transitions and outputs of F ⊗ G. The game aspect of compatibility checking is illustrated by the following example of a stateful, synchronous extension of assume/guarantee interfaces [3].
Example 23  Suppose that F and G are two generalized A/G interfaces, which receive inputs and issue outputs in a sequence of rounds and may change, in each round, their input assumptions and output guarantees. The interface F has no inputs and the single output variable x; the interface G has the two input variables x and y, and no outputs. In the first round, the interface F either goes to state q_0 and outputs x = 0, or it goes to state q_1 and outputs x ≠ 0. Also in the first round, on input y = 0 the interface G goes to state r_0, and on input y ≠ 0 it goes to state r_1. In the second round, in state q_0 the interface F outputs x = 0, and in state q_1 it outputs x ≠ 0, after which it goes back to the initial state. Also in the second round, in state r_0 the interface G has the input assumption x = 0, and in state r_1 it has the input assumption x ≠ 0. After the second round, also G returns to its initial state and the process repeats ad infinitum.
Note that the state q_0 of interface F is compatible with the state r_0 of interface G, and q_1 is compatible with r_1, but q_0 is not compatible with r_1, and q_1 is not compatible with r_0. The environment provides the input y to the interface G in every round. The environment can avoid incompatibilities by copying, in each (odd) round, the value of x into y. In this way the environment can ensure that F and G are always in compatible states. Hence the two interfaces F and G are compatible. The helpful strategy of the environment can be synthesized as a winning strategy of the two-player game “environment” versus “interfaces.” In this simple example, it is a game with complete information, because at all times the environment, by observing the output x of F, can deduce the internal state of F.

In the presence of hidden transitions, interface languages must be designed carefully. This is because if the state of an interface is not visible to the environment, then a legal environment corresponds to a winning strategy in a game with partial information. The derivation of such strategies requires, in general, exponential time, involving a subset construction that considers all sets of possible interface states [9]. Any model with an exponential cost for binary composition, however, is unlikely to be practical. This is why we have focused, in this article, on the asynchronous case with hidden transitions, and elsewhere [3],[6],[7], on more general, synchronous and real-time interfaces but without hidden transitions. Another interesting direction is to investigate stronger but more efficient compatibility checks, which consider only restricted sets of strategies for the environment. Such checks would be conservative (i.e., sufficient but not necessary) yet might still achieve the desired properties of incremental design and independent implementability.

Rich, stateful interfaces as games have been developed further in [2],[4]. In the former article, multiple instances of a component, such as a recursive software module, may be active simultaneously. Compatibility checking for the corresponding interface language is based on solving push-down games. In the latter article, the notion of error state is generalized to handle resource constraints of a system: an error occurs if two or more components simultaneously access or overuse a constrained resource. Critical resources may include power, buffer capacity, or cost. While the basic set-up of the game “environment” versus “interfaces” remains the same, the objective function of the game changes and may include quantitative aspects, such as minimizing resource use.

Acknowledgments. We thank Mariëlle Stoelinga for pointing out errors in a previous version of this article. The research was supported in part by the ONR grant N00014-02-1-0671 and by the NSF grants CCR-0234690, CCR-9988172, and CCR-0225610.
Notes
1. It is important to emphasize the existential interpretation of the free inputs of interfaces, because this deviates from the standard, universal interpretation of free inputs in specifications [5]. While a specification of an open system is well-formed if it can be realized for all input values, an interface is well-formed if it is compatible with some environment. In other words, for interfaces, the environment is helpful, not adversarial.
2. We could formalize the property of incremental design, instead, as associativity of interface composition [5]. We chose our formalization because it does not require an explicit notion of equality or equivalence between interfaces. Implicitly, according to our formalization, two interfaces F and G are equivalent if they are compatible with the same interfaces, that is, if for all interfaces H, we have F ∼ H iff G ∼ H. It can be shown that if the property of incremental design holds, then for all interfaces F, G, and H, if F ∼ G and F||G ∼ H, then G ∼ H and F ∼ G||H, and the two interfaces (F||G)||H and F||(G||H) are equivalent in the specified sense.
3. A discussion about interfaces versus components can be found in [6].
4. The property of independent implementability is a compositionality property [6]. It should be noted that the “direction” of interface compositionality is top-down, from more abstract to more refined interfaces: if F ∼ G and F′ ⪯ F and G′ ⪯ G, then F′ ∼ G′. This is in contrast to the bottom-up compositionality of many other formalisms: if F′ ∼ G′ and F′ ⪯ F and G′ ⪯ G, then F ∼ G.
References
[1] R. Alur, T.A. Henzinger, O. Kupferman, and M. Vardi. Alternating refinement relations. In Proc. Concurrency Theory, Lecture Notes in Computer Science 1466, pages 163–178. Springer-Verlag, 1998.
[2] A. Chakrabarti, L. de Alfaro, T.A. Henzinger, M. Jurdziński, and F.Y.C. Mang. Interface compatibility checking for software modules. In Proc. Computer-Aided Verification, Lecture Notes in Computer Science 2404, pages 428–441. Springer-Verlag, 2002.
[3] A. Chakrabarti, L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. Synchronous and bidirectional component interfaces. In Proc. Computer-Aided Verification, Lecture Notes in Computer Science 2404, pages 414–427. Springer-Verlag, 2002.
[4] A. Chakrabarti, L. de Alfaro, T.A. Henzinger, and M. Stoelinga. Resource interfaces. In Proc. Embedded Software, Lecture Notes in Computer Science 2855, pages 117–133. Springer-Verlag, 2003.
[5] L. de Alfaro and T.A. Henzinger. Interface automata. In Proc. Foundations of Software Engineering, pages 109–120. ACM Press, 2001.
[6] L. de Alfaro and T.A. Henzinger. Interface theories for component-based design. In Proc. Embedded Software, Lecture Notes in Computer Science 2211, pages 148–165. Springer-Verlag, 2001.
[7] L. de Alfaro, T.A. Henzinger, and M. Stoelinga. Timed interfaces. In Proc. Embedded Software, Lecture Notes in Computer Science 2491, pages 108–122. Springer-Verlag, 2002.
[8] N.A. Lynch. Distributed Algorithms. Morgan-Kaufmann, 1996.
[9] J. Reif. The complexity of two-player games of incomplete information. J. Computer and System Sciences, 29:274–301, 1984.
THE DEPENDENT DELEGATE DILEMMA

Bertrand Meyer
ETH Zurich & Eiffel Software
http://se.inf.ethz.ch – http://www.eiffel.com
Abstract
A criticism of the object-oriented style of programming is that the notion of class invariant seems to collapse in non-trivial client-supplier relationships: a supplier (“Dependent Delegate”) called from within the execution of a routine, where the invariant is not required to hold, may call back into the originating object, which it then catches in an inconsistent state. This is one of the problems arising from the application of assertion-based semantics to a model of computation involving references and the resulting possibility of dynamic aliasing. This note suggests handling such cases by applying the basic non-object-oriented Hoare rule, instead of the version involving the invariant. It does not consider inheritance and dynamic binding.

1.  OVERVIEW
A key concept of object-oriented programming, essential for reasoning about classes and their instances, is the class invariant. A class invariant expresses a consistency property applicable to all instances of a class. For example a class PERSON with a query spouse returning a PERSON and a boolean query is_married may include the invariant clauses:

    is_married = (spouse /= Void)
    is_married implies (spouse.spouse = Current)

In words: a person is married if and only if “he” has a spouse, and in that case the spouse of that spouse is the person himself (the “Current” object as talked about in the class). Despite its name, the class invariant is, for all but trivial examples, not always satisfied; it only has to hold when the object is officially accessible to clients. During the execution of a routine of the class, the invariant may be temporarily violated. This is already clear from our example: any routine that affects spouse or is_married, for example a procedure marry (p: PERSON) that sets the spouse of the current person to p and is_married to True, will temporarily, in-between those two setting operations, falsify the invariant. This is considered acceptable since in such an intermediate state the object is not directly usable by the rest of the world – it is busy executing a routine, marry –, so it doesn’t matter that its state might be inconsistent. What counts is that the invariant will hold before and after the execution of calls such as Alice.marry (Bob), executed by clients of the class PERSON.

The Dilemma of interest for this note arises when such a client is also a supplier, direct or indirect. A typical scheme is for a routine r (which could be the marry of our example) to pass the current object to a supplier:

    r is
        do
            ... Instructions 1 ...
            some_supplier.some_work (Current)
            ... Instructions 2 ...
        end

This tells another object some_supplier (the “Dependent Delegate”) to do some work, for which it may need to access the current object, passed to it as an argument. As a consequence, part of some_work could be a call (a “Dependent Delegate Callback”) back into that same object:

    some_work (x: PERSON) is
        do
            ...
            x.some_operation
            ...
        end

In this execution, x happens to be the former “current object” that is now waiting for the execution of r to terminate. But then the call to some_operation, back into that object, catches it unawares: there is no guarantee that the object will satisfy the invariant at that point, since the Instructions 1 might have invalidated that invariant, as they are entitled to do – without violating the correctness requirement of the original class – provided the Instructions 2 reestablish it. This is the Dependent Delegate Dilemma: you hope to delegate a certain task to a supplier, but discover that the supplier (the delegate) is dependent on you, soon coming back with requests for your own help. Since you didn’t expect those requests – naïvely believing, like many a novice manager before you, that delegating a task means you can stop worrying about it and just wait for the delegate to come back with the work done – they may catch you in
a state that doesn’t satisfy the invariant (you may for example be dozing off between meetings with successive visitors). The ultimate cause behind the Dilemma is the role of references in the object-oriented model of computation and the resulting possibility of dynamic aliasing. In our example the delegate object can, through x, keep a reference to the original PERSON object.
Such dynamic aliasing is part of the flexibility provided by the use of references, but complicates assertion-based reasoning about program behavior. The next sections examine the Dilemma and suggest addressing it through proper application of Hoare-style specifications. It is important for this discussion to note the context in which the Dilemma may occur: The Dependent Delegate Dilemma arises when a supplier of a class is also – because it calls back one of its routines – a client of that class. The case of a class that is both a client and supplier of another, introducing a cycle in the client class, is known to be delicate. For example: It prohibits a client relationship of the “expanded client” form where every instance of A contains a subobject of B, rather than the usual “reference client” form where each instance of A contains a (possibly void) reference to an object of type B. Eiffel’s compile-time validity rules explicitly prohibit cycles in the expanded client relation [Meyer, 1992]. Cyclic client relationships make invariants more difficult to enforce. Class PERSON may have a feature residence: HOUSE and the invariant clause residence /= Void implies residence.resident = Current, where HOUSE has resident: PERSON. Even if all the routines of class PERSON preserve that invariant, a routine of class HOUSE can violate it by assigning to resident. Looking at one of the classes alone will not reveal the error. This Indelicate Delegate problem is closely related to the Dependent Delegate Dilemma. This issue is discussed in [Meyer, 1997] (11.14, “Class invariants and reference semantics”) with the informal suggestion of adding a symmetric invariant: here, in class HOUSE, resident /= Void implies resident.residence = Current.
2.  RULES FOR ROUTINE CALLS
A routine call stands for the execution of the corresponding routine body, with actual arguments if any substituted for the corresponding formals. This is captured by the traditional (non-O-O) Hoare rule for routines, which in a simplified form sufficient for this discussion we may write

    {P} body {Q}
    --------------        N RULE
    {P′} call {Q′}
where P and Q are assertions (precondition and postcondition), body is the body of a routine, call is a call to that routine, and priming (in P′ and Q′) stands for substitution of actual for formal arguments. The rule states that, after such substitution, we may infer a property of any call to a routine from the corresponding property of the routine’s body. We call this rule N RULE (N for non-object-oriented). N RULE applies to calls of the form

    some_routine (some_arguments)        [UNQUAL]
This doesn’t just include routine calls in a non-O-O language, but also, in an O-O language, calls of the UNQUAL form executed by another routine in the same class as some_routine, which simply calls some_routine on the same object on which it is currently executing. The correctness of such calls, said to be unqualified, falls under N RULE. In an object-oriented language, we also have qualified calls of the form

    some_object.some_routine (some_arguments)        [QUAL]
where some_routine must be exported to the appropriate class (a client) executing the qualified call. It is for such qualified calls that the class invariant intervenes, in the form of the modified rule

    {P ∧ INV} body {Q ∧ INV}
    -------------------------        I RULE
    {P′} call {Q′}
called “I RULE” because it involves the invariant INV of the class. The invariant helps us reason about the class: Being added to the precondition, it facilitates writing the routine by allowing us to assume that it always finds the object in a consistent state. Being added to the postcondition, it imposes on the routine the extra requirement of ensuring the invariant on exit.
For example, a class describing bank accounts may have an invariant clause stating balance = deposits.total – withdrawals.total: the current balance is consistent with the history of deposits and withdrawals. An exported routine that manipulates the account may assume this: it doesn’t have to worry about finding an inconsistent object. It must, however, worry about avoiding that same inconsistency on return. So if for example it modifies the list of deposits, it must update the balance accordingly. To complement I RULE there’s also a rule ensuring that, on creation, every object satisfies the invariant of its generating class. It reads

    {P ∧ Defaults} c_body {Q ∧ INV}
    --------------------------------        C RULE
    {P′} c_call {Q′}
and applies to a creation procedure (“constructor” in C++); Defaults denotes the result of default initializations. C RULE is to I RULE what the base step of an induction rule is to the induction step. It is not, however, essential to the present discussion. For both N RULE and I RULE the inferred property of routine calls – the consequent of the rule – is the same: {P’} call {Q’}. The invariant figures only, in I RULE, in the property of the routine body – the hypothesis that we must prove to allow the inference. This presents the invariant as an internal property of the class rather than one directly relevant to clients. Indeed, an invariant typically includes, along with official properties, corresponding to axioms of the corresponding abstract data type, a part known as the representation invariant [Hoare, 1972] which involves secret features of the class and hence should not be visible to clients. This discussion suggests a first definition of the correctness of a class:
Definition: Class Correctness (basic)
A class is correct if every routine r of the class satisfies the following properties:
1. N RULE if a routine of the class calls r unqualified.
2. I RULE if r is exported to at least one client.
3. C RULE if r is a creation procedure.
The three cases are not exclusive; a routine r that falls into more than one case must satisfy the associated clauses. In particular: In Eiffel, a procedure of the class may be available for normal calls x.p (a), where clause 2 applies, as well as for creation calls create x.p (a), which subject it to clause 3. (This is not the case in languages such as
C++, Java and C# where constructors are special mechanisms distinct from the features of the class.) More directly relevant to this discussion, r may be both called by other routines of the class in unqualified form and available for qualified calls by clients, subjecting it to clause 1 as well as 2. Clause 2 is stronger than needed since it would suffice to require that r satisfy I RULE if some client actually calls it qualified, as in x.p (a). But then we couldn’t check class correctness without knowing all the clients of a class; this would mean that the check is global, applying to an entire program (“system” in Eiffel terminology). As given, the rule is modular: enforceable at the level of individual classes. Clause 2 makes I RULE applicable to routines exported to “at least one client”. In some object-oriented languages a feature is either secret or public; then the rule will apply only to exported features. In others, the policy is more fine-grained. For example C++ has a notion of “friend” classes and C# allows export to the “family” of a class or to its assembly. In Eiffel, it is possible to export a feature to specific classes, as in

    class C
    feature
        “Declaration of features f1, ...”
    feature {A, B}
        “Declaration of features g1, ...”
    feature {A, B, C}
        “Declaration of features h1, ...”
    feature {NONE}
        “Declaration of features i1, ...”
    end

The specifications determine whether a call of the form x.f (...), where x is declared of type C and f is one of f1, g1 etc., is valid in a class CLIENT:
For f1, CLIENT can be any class (the feature is public).
For i1 the call is never valid (NONE is, by convention, the bottom of the inheritance graph).
For g1 the call is valid only in a class CLIENT that is A, B or one of its descendants. (If we export a feature to a class we should also export it to its descendants.)
Because the rule applies to all qualified calls, x.g1 (...) is not valid in class C itself, because the call uses the class as its own client. For the call to be valid,
we must export the feature to the class itself, as with h1 and of course f1. This policy distinguishes Eiffel from languages such as Java and C#, where a class may always use its own features. It follows from the principle that: A feature is always usable, within its own class, in unqualified calls. A qualified call, however, is only valid if it appears in a client to which the feature is exported. This means that if the client is the same as the supplier, it must export the feature to itself.

The rule that invariants (clause 2 of the rule) only matter for qualified calls also affects run-time assertion monitoring. Current Eiffel implementations do not yet support full proofs of correctness but offer optional run-time contract monitoring. With invariant monitoring turned on, invariant checks, on routine entry and exit, only take place for qualified calls. This means in particular (for example under Eiffel Software’s EiffelStudio compiler) that the calls

    f (...)
    Current.f (...)

although equivalent for a correct program, differ in the presence of invariant monitoring: the second will cause the invariant to be checked, the first won’t.
3.  THE DEPENDENT DELEGATE RULE
The purpose of the Class Correctness rule is to ensure that clients get the promise of the class routines’ contracts, based on the assumption that whenever an instance of the class is observable from the outside it will satisfy its invariant. We may informally picture the life of an object as follows [Meyer, 1997]:

    (Diagram: successive qualified calls create a.make (...), a.f (...), a.g (...), a.f (...); the object is in a state satisfying the invariant between calls, and may be in a state violating it while a routine call is in progress.)
112 During the execution of qualified calls (as in the mark appearing in the execution of g) the invariant may temporarily be violated; but it will hold before and after the execution of these calls. With the possibility of calls to “dependent delegates” the basic Class Correctness rule is no longer sufficient to ensure this property: a.g (Current) g (x) x.f (. . . )
As illustrated, g may call back into the original object which it finds in a state violating the invariant. If such a callback from a dependent delegate (a supplier that is also a client) occurs, it no longer suffices that the routine being called back, f in the figure, satisfy I RULE, since it is called outside of invariant-satisfying states. The callback is similar to an unqualified call as may be executed from within the class, which the basic Class Correctness rule addresses through N RULE (clause 1). It seems appropriate to address this case through the same formal device, yielding a new clause:
Definition: Class Correctness (extended)
In addition to the preceding clauses, r must satisfy:

    4. N RULE if a supplier of the class calls r qualified.

This extra requirement appears to take care of the Dependent Delegate Dilemma in the absence of dynamic binding, and is the contribution of the present note. The added condition can be checked locally for each class (in other words, it is "modular"): while knowing the clients of a class requires access to the entire system (program), analyzing any class already requires access to its suppliers (the classes used as the type of x in any calls of the form x.f (...) appearing in the class). Clause 4 arises when such a supplier (direct or indirect) is also a client, calling back into the object; checking it requires no more information than is needed anyway to analyze the class, independently of any specific system to which the class may belong.
4. AN EXAMPLE
To see how the Dependent Delegate Dilemma and the solution presented here work out in practice, let us develop the "marriage" example sketched earlier:

    class PERSON feature
        spouse: PERSON
                -- Spouse, if any
        is_married: BOOLEAN is
                -- Is this person married?
            do
                Result := (spouse /= Void)
            end
        ... Procedures such as marry (see below) ...
    invariant
        is_married = (spouse /= Void)
        is_married implies (spouse.spouse = Current)
    end

Since there is no explicit creation procedure, instances of this class will be created through the default creation mechanism create p (for p: PERSON), which initializes all booleans such as is_married to False and all references such as spouse to void, ensuring that the instance satisfies the invariant, as per C RULE. Here is a first attempt at a procedure to marry the current PERSON object to another (omitting preconditions not relevant to the discussion, such as p not being Current):

    marry1 (p: PERSON) is    -- [INCORRECT VERSION]
            -- Get married to p.
        require
            p /= Void
            not is_married
            not p.is_married
        do
            spouse := p
            is_married := True
            p.marry1 (Current)
        ensure
            is_married
            spouse = p
        end
This will not work, since the call to marry1 violates its precondition clause not p.is_married: for this call, p represents the current PERSON object, whose is_married attribute has just been set to True. Note that in this example the Dependent Delegate of class PERSON is PERSON itself. The discussion transposes directly to the example of people's residences and houses' residents, which involves two distinct classes. We might try reversing the order of instructions:

    marry2 (p: PERSON) is    -- [INCORRECT VERSION]
            -- Get married to p.
        require
            ... As for marry1 ...
        do
            p.marry2 (Current)
            spouse := p
            is_married := True
        ensure
            ... As for marry1 ...
        end
but the call causes infinite recursion. It seems inevitable to introduce a routine with fewer restrictions than marry, a plain "setter" procedure, which we call get_engaged:

    feature {PERSON} -- Implementation

        get_engaged (p: PERSON) is
                -- Set spouse to p and is_married to True.
                -- No precondition!
            do
                spouse := p
                is_married := True
            ensure
                spouse = p
                is_married
            end

Procedure get_engaged is useful only for PERSON's internal purposes; as a consequence it appears in a feature clause labeled feature {PERSON}, meaning that it is exported only to PERSON itself, allowing routines of the class to use calls such as p1.get_engaged (p2) for p1 and p2 of type PERSON. We may now write a correct version of marry:
    marry3 (p: PERSON) is
            -- Get married to p.
        require
            ... As for marry1 ...
        do
            get_engaged (p)
            p.get_engaged (Current)    -- Here the invariant doesn't hold!
        ensure
            ... As for marry1 ...
        end
The call p.get_engaged (Current) is executed in a state that does not satisfy the invariant, since is_married is now True but spouse.spouse is not Current (it is indeed the purpose of that call to set it to Current). This call, p.get_engaged (Current), is a dependent delegate callback on p. It does not actually cause a problem with the original Class Correctness rule, since at that stage p satisfies the invariant (its spouse is void and its is_married is false). To illustrate a potentially damaging callback, we replace get_engaged by two separate setter procedures:

    feature {PERSON} -- Implementation

        set_spouse (p: PERSON) is
                -- Set spouse to p.
                -- No precondition!
            do
                spouse := p
            ensure
                spouse = p
            end

        set_married is
                -- Set is_married to True.
                -- No precondition!
            do
                is_married := True
            ensure
                is_married
            end
Then we may write the marrying procedure as
    marry4 (p: PERSON) is
            -- Get married to p.
        require
            ... As for marry1 ...
        do
            set_married
            p.set_married
            set_spouse (p)
            p.set_spouse (Current)    -- Here the invariant doesn't hold for p
        ensure
            ... As for marry1 ...
        end
where the last call catches the object attached to p in a state that does not satisfy the invariant: its is_married is true but (assuming p had just been created and initialized to the defaults) its spouse is still void. With only the basic Class Correctness rule this would make marry4 incorrect; the extended rule, however, only requires set_spouse and set_married to satisfy N RULE, which they do since they trivially ensure their postconditions.
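For readers who want to experiment, here is a rough Python rendering of the marry4 scenario (illustrative only; Python has no built-in contract monitoring, so the invariant is written as an explicit function). It shows that checking the invariant of p at the moment of the callback p.set_spouse (Current) would raise a false alarm, even though the final state is consistent.

    class Person:
        def __init__(self):
            self.spouse = None
            self.is_married = False         # default creation satisfies the invariant

        def invariant(self):
            married_iff_spouse = self.is_married == (self.spouse is not None)
            symmetric = (not self.is_married) or \
                        (self.spouse is not None and self.spouse.spouse is self)
            return married_iff_spouse and symmetric

        # plain setters, no preconditions (their N RULE obligations are trivial)
        def set_spouse(self, p):
            self.spouse = p

        def set_married(self):
            self.is_married = True

        def marry4(self, p):
            self.set_married()
            p.set_married()
            self.set_spouse(p)
            # at this point p.invariant() is False: p.is_married is True, p.spouse is None
            assert not p.invariant()
            p.set_spouse(self)              # the dependent delegate callback repairs it
            assert self.invariant() and p.invariant()

    a, b = Person(), Person()
    a.marry4(b)                             # the final state satisfies both invariants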
5. IMPROVING RUN-TIME INVARIANT MONITORING
The extended Class Correctness rule appears to provide a basis for addressing the Dependent Delegate Dilemma, although this note does not address inheritance and dynamic binding, and does not provide a formal proof, which would require a mathematical model of O-O computation. A practical consequence for today's Design by Contract support systems, which enforce run-time contract monitoring rather than proofs, is that invariant monitoring should not apply to Dependent Delegate Callbacks. As noted, invariant monitoring applies only to qualified calls; a Dependent Delegate Callback is qualified (x.f) but, like an unqualified call f, it may catch the object in a state that does not satisfy the invariant, without signaling any actual mistake in the system. Eiffel Software's EiffelStudio implementation [Eiffel, 2004] checks the invariant in this case; so do (as far as I know) other Eiffel compilers. This policy should be corrected, as it may lead to false alarms. Such situations, although very rare, do occasionally occur in practice; programmers address them through calls to library routines that disable invariant monitoring before the offending callback and re-enable it after. Instead of forcing such ad hoc solutions on the programmer, compilers should take care of the problem by skipping the invariant check for dependent delegate callbacks.
Detecting that a qualified call is in fact a dependent delegate callback shouldn’t be hard for compilers; this is, as noted, a local check, not requiring any more information than already needed to analyze and compile a class.
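As one illustration only (a dynamic approximation of the local static check described above, not a description of any existing compiler), a run-time monitor could track which objects currently have a routine in progress and skip the invariant check when a qualified call targets such an object, since in a sequential setting that call can only be a callback:

    # Illustrative run-time policy: skip the invariant check when the target of a
    # qualified call is already executing one of its own routines (a callback).
    _in_progress = set()

    def monitored_call(target, routine, *args):
        callback = id(target) in _in_progress      # target already has a routine underway
        if not callback:
            assert target.invariant(), "invariant violated on entry"
            _in_progress.add(id(target))
        try:
            return routine(target, *args)
        finally:
            if not callback:
                _in_progress.discard(id(target))
                assert target.invariant(), "invariant violated on exit"

For the policy to take effect, the inner qualified calls of a routine must of course also be routed through the monitor (for example monitored_call(p, Person.set_spouse, a) rather than a direct call); a compiler-generated check, as advocated above, would do this systematically.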
6. ABOUT THE INDELICATE DELEGATE PROBLEM
An earlier part of the discussion mentioned the Indelicate Delegate problem, which causes a class invariant to be invalidated, through reference reassignment, beyond the control of the class itself. Although this note concentrates on the Dependent Delegate Dilemma, we may take a look at the relationship between the two issues. The Indelicate Delegate problem is indeed lurking in our marriage example. All the marry procedures so far assumed the precondition clauses not is_married and not p.is_married. Assume we remove these clauses, loosening the requirements to allow remarriage. Then for non-void p, q and r a client may execute the successive calls

    p.marry (q)
    q.marry (r)

The second call remarries q to r. If marry is written (for example as marry3 or marry4) to preserve the invariant in accordance with I RULE, it will ensure spouse.spouse = Current for both q and r. But neither implementation updates any property of p; indeed, spouse.spouse will, for p, end up with value r! Such bigamous behavior on the part of q leads to moral outrage, the full punishment of the law and (the real scandal for this discussion) a breach of software correctness. [Meyer, 1997], as already noted, suggested providing a symmetric invariant. This is clearly not the right approach here, since only one class is involved: the symmetric invariant is the same as the original, spouse.spouse = Current (under is_married). What we actually seem to need here is spouse.spouse.spouse = spouse. Future work will address the issue in a more general setting.
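A tiny Python sketch (reusing the illustrative Person class from the earlier sketch; not part of the paper's Eiffel code) makes the breach concrete: after the two calls, p's invariant no longer holds even though p was never touched.

    # Remarriage without the "not is_married" preconditions: q's new marriage
    # silently invalidates p's invariant (the Indelicate Delegate problem).
    def marry(a, b):                 # preserves the invariants of a and b only
        a.spouse, a.is_married = b, True
        b.spouse, b.is_married = a, True

    p, q, r = Person(), Person(), Person()
    marry(p, q)
    marry(q, r)                      # q remarries; nothing updates p
    assert q.invariant() and r.invariant()
    assert not p.invariant()         # p.spouse.spouse is r, not p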
7. SUMMARY AND CONCLUSION
This note has proposed a simple solution to the Dependent Delegate Dilemma, based on a simple correctness rule: requiring that any routine used in a dependent delegate callback satisfy, in addition to the object-oriented correctness property, the traditional routine rule. If this solution is right, it should be applied right away by run-time contract-monitoring options of current compilers.
Acknowledgments

This note derives from an email message to Peter Müller and Rustan Leino (whose comments and criticism I gratefully acknowledge) during a discussion in November 2003, originally triggered by comments by Tony Hoare at the WG 2.3 meeting in Monterey in January 2002 (where he appears to have suggested the solution described here, although I did not realize it then). It was further fueled by remarks from Manfred Broy in Marktoberdorf in August 2004. Peter Müller provided further corrections.
References

[Eiffel, 2004] Eiffel (1993–2004). EiffelStudio documentation. Eiffel Software. Online at eiffel.com.
[Hoare, 1971] Hoare, C. (1971). Procedures and parameters: An axiomatic approach. In Engeler, E., editor, Symposium on the Semantics of Programming Languages, volume 188 of Lecture Notes in Mathematics, pages 103–116. Springer-Verlag. Reprinted in C. A. R. Hoare and C. B. Jones (eds.), Essays in Computing Science, Prentice Hall International, 1989.
[Hoare, 1972] Hoare, C. (1972). Proofs of correctness of data representations. Acta Informatica, 1:271–281. Reprinted in C. A. R. Hoare and C. B. Jones (eds.), Essays in Computing Science, Prentice Hall International, 1989, pages 103–115.
[Leino and Müller, 2004] Leino, K. R. and Müller, P. (2004). Object invariants in dynamic contexts. In European Conference on Object-Oriented Programming, volume 3086 of LNCS, pages 491–516. Springer-Verlag.
[Meyer, 1992] Meyer, B. (1992). Eiffel: The Language. Prentice Hall. Second printing.
[Meyer, 1997] Meyer, B. (1997). Object-Oriented Software Construction. Prentice Hall, 2nd edition.
[Müller, 2002] Müller, P. (2002). Modular Specification and Verification of Object-Oriented Programs, volume 2262 of LNCS. Springer-Verlag.
Part II System and Program Verification, Model Checking and Theorem Proving
Thomas Ball
Amir Pnueli
J Strother Moore
Shmuel Sagiv
FORMALIZING COUNTEREXAMPLE-DRIVEN REFINEMENT WITH WEAKEST PRECONDITIONS

Thomas Ball
Microsoft Research
[email protected]

Abstract
To check a safety property of a program, it is sufficient to check the property on an abstraction that has more behaviors than the original program. If the safety property holds of the abstraction then it also holds of the original program. However, if the property does not hold of the abstraction along some trace t (a counterexample), it may or may not hold of the original program on trace t. If it can be proved that the property does hold of the original program on trace t, then it makes sense to refine the abstraction to eliminate the "spurious counterexample" t (rather than report a known false negative to the user). The SLAM tool developed at Microsoft Research implements such an automated abstraction-refinement process. In this paper, we reformulate this process for a tiny while language using the concepts of weakest preconditions, bounded model checking and Craig interpolants. This representation of SLAM simplifies and distills the concepts of counterexample-driven refinement in a form that should be suitable for teaching the process in a few lectures of a graduate-level course.
Keywords:
Hoare logic, weakest preconditions, predicate abstraction, abstract interpretation, inductive invariants, symbolic model checking, automatic theorem proving, Craig interpolants
1. Introduction
A classic problem in computer science is to automate, as fully as possible, the process of proving that a software program correctly implements a specification of its expected behavior. Over thirty years ago, Tony Hoare and Edsger Dijkstra laid the foundation for logically reasoning about programs [Hoare, 1969; Dijkstra, 1976]. A key idea of their approach (let us call it "programming with invariants") is to break the potentially complicated proof of a program's correctness into a finite set of small proofs. This proof technique requires the programmer to identify loop invariants that decompose the proof of a program containing loops into a set of proofs about loop-free program fragments.
Thirty years later, we do not find many programmers programming with invariants. We identify three main reasons for this:

Lack of Specifications: it is hard to develop formal specifications for correct program behavior, especially for complex programs. Specifications of full correctness can be complicated and error-ridden, just like programs.

Minimal Tool Support: there has been a lack of "push-button" verification tools that provide value to programmers. As a result, programmers must climb a steep learning curve and even become proficient in the underlying mechanisms of tools (such as automated deduction) to make progress. Programmers simply have not had the incentive to create formal specifications because tools have not provided enough value to them.

Large Annotation Burden: finally, although invariants are at the heart of every correct program, eliciting them from people remains difficult. Identification of loop invariants and procedure pre- and postconditions is necessary to make programming with invariants scale, and it is left up to the programmer to produce such annotations.

Despite these obstacles, automated support for program proving has steadily improved [Sagiv et al., 1999; Flanagan et al., 2002] and has now reached a level of maturity such that software tools capable of proof are appearing on programmers' desktops. In particular, at Microsoft Research, a tool called SLAM [Ball and Rajamani, 2000; Ball and Rajamani, 2001] that checks temporal safety properties of C programs has been developed and successfully deployed on Windows device drivers. Three factors have made a tool like SLAM possible:

Focus on Critical Safety Properties: The most important idea is to change the focus of correctness from proving the functional correctness of a program to proving that the program does not violate some critical safety property. In the domain of device drivers, such properties define what it means for a driver to be a good client of the Windows operating system (in which it is hosted). The properties state nothing about what the driver actually does; they merely state that the driver does nothing bad when it interacts with the operating system. Many of these properties are relatively simple, control-dominated properties without much dependence on program data. As a result, they are simpler to state and to check than functional correctness properties.

Advances in Algorithms and Horsepower: SLAM builds on and extends a variety of analysis technologies. Model checking [Clarke and Emerson, 1981; Queille and Sifakis, 1981] and, in particular, symbolic
model checking [Burch et al., 1992; McMillan, 1993] greatly advanced our ability to automatically analyze large finite-state systems. Predicate abstraction [Graf and Saidi, 1997] enables the automated construction of a finite-state system from an infinite-state system. SLAM uses automatic theorem provers [Nelson and Oppen, 1979; Detlefs et al., 2003], which have steadily advanced in their capabilities over the past decades, to create predicate (or Boolean) abstractions of C programs and to determine whether or not a trace in a C program is executable. Program analysis [Cousot and Cousot, 1978; Sharir and Pnueli, 1981; Knoop and Steffen, 1992; Reps et al., 1995; Das, 2000] has developed efficient techniques for analyzing programs with procedures and pointers. Last but not least, the tremendous computational capabilities of today's computers make more powerful analysis methods such as SLAM possible.

Invariant Inference: Given a complex program and a simple property to check of that program, SLAM can automatically find loop invariants and procedure preconditions/postconditions that are strong enough either to prove that the program satisfies the property or to point to a real error in the program. This frees the programmer from having to annotate their code with invariants. Invariants are discovered in a goal-directed manner, based on the property to be checked, as described below.

To check a safety property of a program, it is sufficient to check the property on an abstraction that has more behaviors than (or overapproximates the behaviors of) the original program. This is a basic concept of abstract interpretation [Cousot and Cousot, 1977]. If the safety property holds of the abstraction then it also holds of the original program. However, if the property does not hold of the abstraction on some trace t, it may or may not hold of the original program. If it does hold of the original program on trace t (that is, t does not correspond to a real violation), we say the analysis yields a "false negative". Too many false negatives degrade the usefulness of an analysis. In abstract interpretation, if there are too many false negatives then the analyst's task is to find a better abstraction that reduces the number of false negatives. The abstraction should also be effectively computable and result in a terminating analysis.

Rather than appealing to a human for assistance, the SLAM process uses false negatives (also called spurious counterexamples) to automatically refine the abstraction. Kurshan introduced this idea [Kurshan, 1994], which was extended and applied to finite-state systems by Clarke et al. [Clarke et al., 2000] and to software by Ball and Rajamani [Ball and Rajamani, 2000; Ball and Rajamani, 2001].

In SLAM, program abstractions are represented by predicate abstractions. Predicate abstraction is a parametric abstraction technique that constructs a finite-state abstraction of an infinite-state system S based on a set of predicates
(observations of the state space of S). If there are n predicates, then a state s of the original program (S) maps to a bit-vector of length n, with each bit having a value corresponding to the value of its corresponding predicate in state s. This finite-state abstraction is amenable to symbolic model checking techniques, using data structures such as binary decision diagrams [Bryant, 1986].

Suppose that the desired property does not hold of the predicate (Boolean) abstraction. In this case, the symbolic model checker produces an error trace t that demonstrates that the error state is reachable in the Boolean abstraction. SLAM uses automated theorem proving to decide whether trace t is a spurious or a feasible counterexample of the original program. The basic idea is to build a formula f(t) from the trace such that f(t) is satisfiable if and only if trace t is a (feasible) execution trace of the original program. An automatic theorem prover decides if f(t) is satisfiable or unsatisfiable. If it is satisfiable then the original program does not satisfy the safety property. Otherwise, SLAM adds new predicates to the Boolean abstraction so as to eliminate the spurious counterexample. That is, the Boolean variables introduced into the Boolean abstraction to track the state of the new predicates make the trace t unexecutable in the refined Boolean abstraction. SLAM then repeats the steps of symbolic model checking, path feasibility testing and predicate refinement, as needed. SLAM is a semi-algorithm because it is not guaranteed to terminate, although it can terminate with success (proving that the program satisfies the safety property) or failure (proving that the program does not satisfy the safety property).

The goal of this paper is to present the counterexample-driven refinement process for software in a declarative manner: we declare what the process does rather than how it does it. Here is what we will do:

    review Hoare logic and weakest preconditions for a tiny while language (Section 2);

    describe the predicate abstraction of a program with respect to a set of predicates and the symbolic model checking of the resulting Boolean abstraction in a single blow using weakest preconditions (Section 3);

    describe how the concept of a length-bounded weakest precondition serves to describe counterexamples in both the original program and the Boolean abstraction (Section 4);

    show how refinement predicates can be found through the use of Craig interpolants [McMillan, 2003; Henzinger et al., 2004] (Section 5);

    extend the while language and the process to procedures with call-by-value parameter passing (Section 6);
    extend the while language and the process to procedures with call-by-reference parameter passing (Section 7).

Along the way, we suggest certain exercises for the student to try (marked with "Exercise" and "End Exercise").

The process we present is not exactly what SLAM does. First, we have unified the steps of abstraction construction and symbolic reachability into a single step, which proceeds backwards from an (undesirable) goal state to determine if an initial state of the program can be found. In SLAM, the two steps are separate and the symbolic reachability analysis is a forward analysis. Second, SLAM computes a Cartesian approximation to predicate abstraction [Ball et al., 2001], while the process we describe here defines the most precise predicate abstraction. Third, SLAM abstracts a C program at each assignment statement. The process we describe here abstracts over much larger code segments, which yields a more precise abstraction. Fourth, inspired by work by McMillan [McMillan, 2003], we analyze a set of counterexamples at once; SLAM works with one counterexample at a time. Finally, inspired by the BLAST project [Henzinger et al., 2004], we use the theory of Craig interpolants [Craig, 1957] to describe the predicate refinement step.
2. A Small Language and Its Semantics
We consider a small while language containing structured control-flow constructs and integer variables, with the following syntax:

    S  →  skip  |  x := e  |  S ; S  |  if b then S else S  |  while b do S

We assume the usual operators over integer variables (addition, subtraction, multiplication and division). A predicate is either a relational comparison of integer-valued expressions or a Boolean combination of predicates (constructed using the Boolean connectives ∧, ∨ and ¬). A predicate p is said to be atomic if it contains no Boolean connectives (that is, it is a relational expression); otherwise, it is said to be compound.

A program state is a mapping of variables to integer values. A predicate Q symbolically denotes the set of program states for which Q is true. For example, the predicate (x < 5 ∨ x > 10) denotes the infinite set of states for which x is less than 5 or greater than 10.

We use the familiar Floyd-Hoare triples and weakest-precondition transformers to specify the semantics of a program. The notation {P} S {Q} denotes that if the predicate P is true before the execution of statement S then the predicate Q will be true if execution of statement S completes. (Note: this is a partial correctness guarantee, as it applies only to terminating computations of S.)
    {Q} skip {Q}                                              (SKIP)

    {Q[x/e]} x := e {Q}                                       (ASSIGNMENT)

    P ⇒ P'    {P'} S {Q'}    Q' ⇒ Q
    -------------------------------                           (CONSEQUENCE)
              {P} S {Q}

    {P} S1 {Q}    {Q} S2 {R}
    ------------------------                                  (SEQUENCE)
        {P} S1 ; S2 {R}

    {P ∧ B} S1 {Q}    {P ∧ ¬B} S2 {Q}
    ---------------------------------                         (CONDITIONAL)
       {P} if B then S1 else S2 {Q}

          {I ∧ B} S {I}
    ---------------------------                               (ITERATION)
    {I} while B do S {¬B ∧ I}

    Figure 1.  Hoare axioms and rules of inference for the while language.
    wp(skip, Q)                  =  Q
    wp(x := e, Q)                =  Q[x/e]
    wp(S1 ; S2, Q)               =  wp(S1, wp(S2, Q))
    wp(if b then S1 else S2, Q)  =  (b ⇒ wp(S1, Q)) ∧ (¬b ⇒ wp(S2, Q))
    wp(while b do S, Q)          =  νX.((b ⇒ wp(S, X)) ∧ (¬b ⇒ Q))

    Figure 2.  Weakest preconditions.
Figure 1 presents the standard axiom of assignment and inference rules of Hoare logic for the while programming language. The Iteration inference rule shows the main difficulty of “programming with invariants”: the invariant I appears out of nowhere. It is up to the programmer to come up with an appropriate loop invariant I. Given a predicate Q representing a set of goal states and a statement S, the weakest (liberal) precondition transformer defines the weakest predicate P such that {P } S {Q} holds. Figure 2 contains the weakest preconditions for the five statements in the example language. Of special importance is the transformer for the while loop, which requires the use of the greatest fixpoint
operator νX.φ(X):

    νX.φ(X)  =  ⋀_{i=1,...,∞} φ^i(true)

where φ^i(true) denotes the i-fold application of the function λX.φ(X) to the predicate true. The greatest fixpoint is expressible in the language of predicates (where implication among predicates defines the lattice ordering) but is not, in general, computable (if it were, then there would be no need for programmers to provide loop invariants).

The weakest precondition transformer satisfies a number of properties:

    wp(S, false) = false;

    (Q ⇒ R) ⇒ (wp(S, Q) ⇒ wp(S, R));

    (wp(S, Q) ∧ wp(S, R)) = wp(S, Q ∧ R).

Exercise. Prove these three properties of the wp transformer. End Exercise
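As a running aid for the exercises, the following executable Python sketch (not part of the paper) spells out the loop-free wp equations. The encoding is deliberately naive: statements are tagged tuples, while expressions, conditions and predicates are Python callables on a state (a dict from variable names to integers), so the substitution Q[x/e] becomes composition with the state update. Loops are omitted on purpose: the greatest fixpoint is exactly what is not computable.

    def wp(stmt, Q):
        tag = stmt[0]
        if tag == "skip":                       # wp(skip, Q) = Q
            return Q
        if tag == "assign":                     # ("assign", x, e): wp(x := e, Q) = Q[x/e]
            _, x, e = stmt
            return lambda s: Q({**s, x: e(s)})
        if tag == "seq":                        # ("seq", S1, S2): wp(S1, wp(S2, Q))
            _, S1, S2 = stmt
            return wp(S1, wp(S2, Q))
        if tag == "if":                         # (b => wp(S1, Q)) ∧ (¬b => wp(S2, Q))
            _, b, S1, S2 = stmt
            P1, P2 = wp(S1, Q), wp(S2, Q)
            return lambda s: P1(s) if b(s) else P2(s)
        raise ValueError("while loops need the fixpoint/abstraction machinery")

    # Example: wp(x := x + 1, x > 0) holds in a state with x = 0.
    prog = ("assign", "x", lambda s: s["x"] + 1)
    assert wp(prog, lambda s: s["x"] > 0)({"x": 0})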
3. Predicate Abstraction and Symbolic Reachability
We are given a program S and a predicate Q that represents a set of "bad" final states. We want to determine whether wp(S, Q) = false, in which case we say that program S is "safe" with respect to Q, as there can be no initial state from which execution of S yields a state satisfying Q. The problem, of course, is that wp(S, Q) is not computable in the presence of loops.

We will use predicate abstraction (with respect to a finite set of atomic predicates E) to define a wpE(S, Q) that is computable and is weaker than wp(S, Q) (that is, wp(S, Q) ⇒ wpE(S, Q)). Therefore, if wpE(S, Q) = false then wp(S, Q) = false. This gives us a sufficient (but not necessary) test for safety. We say that wpE(S, Q) is an overapproximation of wp(S, Q).

We formalize the notion of overapproximation of a compound predicate e by a set of atomic predicates E = {e1, ..., ek} with the exterior cover function ecE(e). The exterior cover function ecE(e) is the strongest predicate representable as a Boolean combination of the atomic predicates in E such that e implies ecE(e). We now make the definition of ecE(e) precise. A clause over E is a disjunction ci1 ∨ ... ∨ cik, where each cij ∈ {eij, ¬eij} for eij ∈ E (each predicate of E appears exactly once, either positively or negated). The function ecE(e) is the conjunction of all clauses c over E such that e ⇒ c. There are exactly 2^k clauses over E. A naive implementation of the ecE function enumerates all 2^k clauses c and invokes a theorem prover to decide the validity of e ⇒ c for each.
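A naive exterior cover can be prototyped directly from this definition. The following Python sketch (illustrative only; it assumes the z3-solver package) enumerates the 2^k clauses and keeps those implied by e, checking the validity of e ⇒ c by asking the solver whether e ∧ ¬c is unsatisfiable.

    from itertools import product
    from z3 import Int, And, Or, Not, Solver, unsat, BoolVal

    def exterior_cover(e, E):
        # Strongest Boolean combination of the predicates in E implied by e:
        # the conjunction of every clause (one literal per predicate) with e => c valid.
        kept = []
        for signs in product([True, False], repeat=len(E)):
            clause = Or([p if s else Not(p) for p, s in zip(E, signs)])
            solver = Solver()
            solver.add(e, Not(clause))        # e => clause is valid iff this is unsat
            if solver.check() == unsat:
                kept.append(clause)
        return And(kept) if kept else BoolVal(True)

    x = Int("x")
    E = [x < 5, x > 10]
    print(exterior_cover(x == 3, E))          # logically equivalent to (x < 5) ∧ ¬(x > 10)

The 2^k enumeration with one prover call per clause is exactly the naive cost noted above; practical tools settle for overapproximations of ecE.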
The exterior cover function has the following properties:

    ecE(e1 ∧ e2) ⇒ ecE(e1) ∧ ecE(e2);

    ecE(e1 ∨ e2) = ecE(e1) ∨ ecE(e2);

    ecE(¬e) ⇐ ¬ecE(e);

    (e1 ⇒ e2) ⇒ (ecE(e1) ⇒ ecE(e2)).

Exercise. Prove these four properties of the ecE function. End Exercise

An important property of the ecE function is that there are at most 2^(2^k) semantically distinct compound predicates that can be constructed over the set of atomic predicates E. That is, while the domain of the ecE function is infinite, the range of the ecE function is finite. In general, computing the most precise ecE is undecidable; thus, in practice, we compute an overapproximation to ecE.

Now, ecE(wp(S, Q)) is the optimal (but incomputable) covering of wp with respect to E. It is a covering because, by the definition of ecE, wp(S, Q) ⇒ ecE(wp(S, Q)); it is optimal also because of the definition of ecE. In order to define a computable wpE, we need to "push" ecE inside wp, but not too far. For example, we prefer not to abstract between statements S1 and S2 in a sequence because

    wp(S1, wp(S2, Q)) ⇒ wp(S1, ecE(wp(S2, Q)))

Following abstract interpretation, we wish to abstract only at loops. However,

    ecE(wp(while b do S, Q)) = ecE(νX.((b ⇒ wp(S, X)) ∧ (¬b ⇒ Q)))

is still incomputable because of the greatest fixpoint inside the argument to ecE. But because ecE(e1 ∧ e2) ⇒ ecE(e1) ∧ ecE(e2), it follows that

    ecE(νX.((b ⇒ wp(S, X)) ∧ (¬b ⇒ Q))) ⇒ νX.ecE((b ⇒ wp(S, X)) ∧ (¬b ⇒ Q))

That is, by pushing the ecE function inside the greatest fixpoint, we get a function that is weaker than the ecE function applied to the greatest fixpoint. (In fact, it is possible to gain even more precision by applying ecE only to wp(S, X); we leave it to the reader to show that this optimization is sound.) So, our desired wpE transformer is the same as wp for every construct except the problematic while loop, where we have:

    wpE(while b do S, Q) = νX.ecE((b ⇒ wp(S, X)) ∧ (¬b ⇒ Q))
Thus, wpE is computable (given an algorithm for approximating ecE). If wpE(S, Q) = false then we declare that wp(S, Q) = false, which means that S is safe with respect to the set of bad final states satisfying predicate Q. Referring back to the terminology of the Introduction, wpE(S, Q) is the result of the symbolic (backwards) reachability analysis of the Boolean abstraction. (We note that to be completely faithful to Boolean abstraction, we should check ecE(wpE(S, Q)) = false, which is in general weaker than wpE(S, Q) = false. However, as shown above, we need only apply ecE at while loops.)

Exercise. Determine how to efficiently implement wpE. As a hint, in the SLAM process, we introduce a Boolean variable bi for each predicate ei in E and convert the predicate ecE(e) into a propositional formula (Boolean expression) over the bi. We then can make use of binary decision diagrams [Bryant, 1986] to manipulate Boolean expressions. End Exercise
4. Path Feasibility Analysis
Suppose that wpE(S, Q) ≠ false. From this result, we do not know whether wp(S, Q) is false or not. In order to refine our knowledge, we proceed as follows. We introduce the concept of a "bounded weakest precondition" that will serve to unify the concepts of counterexample traces of the Boolean abstraction and of the original program. Figure 3 shows the bounded weakest precondition computation, specified as a recursive function. The basic idea is to compute the weakest precondition of statement S with respect to predicate Q for all transformations up to and including k steps (and possibly some more) from the end of S. Each skip or assignment statement constitutes a single step. The bwp transformer returns a pair of a predicate and the number of steps remaining after processing S.

We see that the definition of bwp first checks whether the step bound k has been exhausted. If so, bwp is defined to be "false". This stops the computation along paths of more than k steps (because wp(S, false) = false for all S). Otherwise, if statement S is an assignment or skip, the weakest precondition is computed as before and the step bound is decremented by one. Statement sequencing is not noteworthy. The processing of the if-then-else statement proceeds by first recursing on the statements S1 and S2, resulting in corresponding outputs (Q1, k1) and (Q2, k2). The output predicate is computed as before and the returned bound is the maximum of k1 and k2. This guarantees that all (complete) computations of length less than or equal to k steps will be contained in bwp (and perhaps some of longer length as well). Processing of the while statement simply unrolls the loop by peeling off one iteration (modeled by the statement "if b then S1 else skip") and recursing on the while statement. It is clear that bwp will terminate after at most k steps.
    let bwp(S, (Q, k)) =
      if k ≤ 0 then (false, 0)
      else case S of
        "skip"      → (Q, k − 1)
        "x := e"    → (Q[x/e], k − 1)
        "S1 ; S2"   → bwp(S1, bwp(S2, (Q, k)))
        "if b then S1 else S2" →
            let (P1, k1) = bwp(S1, (Q, k)) in
            let (P2, k2) = bwp(S2, (Q, k)) in
            (((b ⇒ P1) ∧ (¬b ⇒ P2)), max(k1, k2))
        "while b do S1" →
            let (Q', k') = bwp("if b then S1 else skip", (Q, k)) in
            bwp("while b do S1", (Q', k'))

    Figure 3.  Bounded weakest precondition.
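For experimentation, here is a rough Python transliteration of the recursion in Figure 3 (a sketch, not a verified implementation), reusing the tagged-tuple and state-function encoding from the wp sketch in Section 2; predicates are Boolean-valued functions on states, so Q[x/e] is again composition with the state update.

    FALSE = lambda s: False

    def bwp(stmt, Q, k):
        # Returns (predicate, remaining step budget), following Figure 3 case by case.
        if k <= 0:
            return FALSE, 0
        tag = stmt[0]
        if tag == "skip":
            return Q, k - 1
        if tag == "assign":                      # ("assign", x, e)
            _, x, e = stmt
            return (lambda s: Q({**s, x: e(s)})), k - 1
        if tag == "seq":                         # ("seq", S1, S2): process S2 first
            _, S1, S2 = stmt
            Q2, k2 = bwp(S2, Q, k)
            return bwp(S1, Q2, k2)
        if tag == "if":                          # ("if", b, S1, S2)
            _, b, S1, S2 = stmt
            P1, k1 = bwp(S1, Q, k)
            P2, k2 = bwp(S2, Q, k)
            return (lambda s: P1(s) if b(s) else P2(s)), max(k1, k2)
        if tag == "while":                       # ("while", b, S1): peel one iteration
            _, b, S1 = stmt
            Qp, kp = bwp(("if", b, S1, ("skip",)), Q, k)
            return bwp(stmt, Qp, kp)
        raise ValueError(tag)

The budget k guarantees termination: each peeled iteration consumes at least one step, and the cutoff returns false once the budget is exhausted.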
We define bwpE to be the same as bwp everywhere except at the while loop (as before), where we replace bwp(S, (Q', k')) by bwp(S, (ecE(Q'), k')).

Exercise. If wpE(S, Q) ≠ false then there is a smallest k such that bwpE(S, (Q, k)) ≠ false. Such a k can be found by various means during the computation of wpE. Generalize the wpE computation to produce such a k. End Exercise
5. Predicate Discovery
If bwp(S, (Q, k)) ≠ false then we have found that there is an initial state in wp(S, Q) from which the bad state Q can be reached. If, on the other hand, bwp(S, (Q, k)) = false then there is no counterexample of less than or equal to k steps. In this case, we wish to refine the set of predicates E by finding a set of predicates E' such that bwpE∪E'(S, (Q, k)) = false.

To explain the generation of predicates, let us first focus on a simple scenario. We have a program S = S1 ; S2, a post-condition Q and a set of predicates E such that:

    wp(S1, ecE(wp(S2, Q))) ≠ false, and
    wp(S1, wp(S2, Q)) = false.

Now, we wish to find an E' such that

    wp(S1, ecE∪E'(wp(S2, Q))) = false

One sufficient E' is simply the set of atomic predicates that occur in wp(S2, Q) (that is, E' = atoms(wp(S2, Q))). With such an E', the predicate wp(S2, Q) is expressible as a Boolean function over the predicates in E', and so is expressible by the exterior cover ecE∪E'.

While the set of predicates atoms(wp(S2, Q)) is a correct solution to the predicate refinement problem, it may be too strong for our purposes. That is, there may be many smaller or weaker sets E' (sets for which ecE∪atoms(wp(S2,Q))(e) ⇒ ecE∪E'(e) for all e) for which

    wp(S1, ecE∪E'(wp(S2, Q))) = false

Craig interpolants [Craig, 1957] are one way to find such E' [Henzinger et al., 2004]. Given predicates A and B such that A ∧ B = false, an interpolant Θ(A, B) satisfies the three following points:

    A ⇒ Θ(A, B),

    Θ(A, B) ∧ B = false,

    V(Θ(A, B)) ⊆ V(A) ∩ V(B)

That is, Θ(A, B) is weaker than A; the conjunction of Θ(A, B) and B is unsatisfiable (Θ(A, B) is not too weak); and all the variables in Θ(A, B) are common to both A and B.

To make use of interpolants in our setting, we decompose wp(S1, wp(S2, Q)) into A and B predicates as follows. Let X be the set of variables appearing in the program S1 ; S2 and let X' be primed versions of these variables (which do not appear in the program). Let eqX = ⋀_{x∈X} (x = x'), which denotes the set of states in which each x ∈ X has the same value as its primed version x'. We define A and B as follows:

    A = wp(S2, Q)[X/X'];    B = wp(S1, eqX)

That is, A is the weakest precondition of S2 with respect to Q, with all unprimed variables replaced by their primed versions, while B is the weakest precondition of S1 with respect to the predicate eqX. It is clear that:

    wp(S1, wp(S2, Q)) ⟺ (∃X'. A ∧ B)

Let EΘ = atoms(Θ(A, B)[X'/X]). From the definition of interpolants, it follows that wp(S1, ecEΘ(wp(S2, Q))) = false. Therefore, wp(S1, ecE∪EΘ(wp(S2, Q))) = false.
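The A/B decomposition is easy to reproduce on a toy instance. In the following Python sketch (assuming the z3-solver package; the program fragment and predicate are invented), S1 is x := y + 1 and wp(S2, Q) is taken to be x < y; the solver confirms that A ∧ B is unsatisfiable, so the abstract counterexample is spurious and an interpolant over the shared primed variables exists.

    from z3 import Ints, And, Solver, substitute, unsat

    x, y, xp, yp = Ints("x y xp yp")           # xp, yp stand for the primed x', y'

    A = substitute(x < y, (x, xp), (y, yp))    # wp(S2, Q)[X/X']  =  xp < yp
    eq_X = And(x == xp, y == yp)
    B = substitute(eq_X, (x, y + 1))           # wp(x := y + 1, eqX)  =  (y + 1 = xp) ∧ (y = yp)

    s = Solver()
    s.add(A, B)
    assert s.check() == unsat                  # the trace is spurious; an interpolant over the
                                               # shared xp, yp (for example xp < yp) yields the
                                               # refinement predicate x < y after unpriming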
    let interp S (m, A, B, k) =
      if k ≤ 0 then list (m, false, false, 0)
      else case S of
        "skip"   → list (m, A, B, k − 1)
        "x := e" → if m = InA then list (m, A[x/e], B, k − 1)
                   else list (m, A, B[x/e], k − 1)
        "S1 ; S2" → flatten (map (interp S1) (interp S2 (m, A, B, k)))
        "if b then S1 else S2" →
            let (InA, A1, B1, k1) :: tl1 = interp S1 (m, A, B, k) in
            let (InA, A2, B2, k2) :: tl2 = interp S2 (m, A, B, k) in
            let res0 = list (InA, ((b ⇒ A1) ∧ (¬b ⇒ A2)), false, max(k1, k2)) in
            let res1 = map (λ(m', A', B', k'). (m', A', b ∧ B', k')) tl1 in
            let res2 = map (λ(m', A', B', k'). (m', A', ¬b ∧ B', k')) tl2 in
            res0 @ res1 @ res2
        "while b do S1" →
            let res = interp "if b then S1 else skip" (m, A, B, k) in
            let (InA, A', B', k') :: tl = res in
            let res' = (InA, A', B', k') :: (InB, A'[X/X'], eqX, k') :: tl in
            flatten (map (interp "while b do S1") res')

    Figure 4.  Generation of (A, B) pairs for interpolation.
We now generalize the generation of interpolants to the while language. The basic idea is simple: at any point where the ecE function has been applied during bwpE, we need to generate an (A, B) interpolant pair. Since the bwpE function only applies ecE after peeling one iteration off a while loop, we need only interpolate at these places. Thus, we will generate at most k × n interpolant pairs, where n is the number of while loops in the program. The interpolation function (interp of Figure 4) follows the same basic structure as the bwp function, with a few important differences. First, there is a mode parameter m that indicates whether the function currently is constructing
the A predicate (m = InA) or the B predicate (m = InB). With each interpolant pair we also track the remaining number of steps k. Thus, the interp function takes two arguments: a statement S and a four-tuple (m, A, B, k). Second, the function returns a list of four-tuples of the same form (m, A, B, k). It returns a list because each iteration peeled from a while loop yields a new (A, B) pair.

In the case that k is less than or equal to zero, the interp function replaces A and B by false and returns list(m, false, false, 0) (the function list takes a tuple and returns a list containing that single tuple). Processing of a skip statement simply reduces the k bound. Processing of an assignment statement switches on the mode variable m to determine whether to apply the substitution [x/e] to A or to B. Processing of statement sequencing first recursively makes the call (interp S2 (m, A, B, k)), which results in a list of four-tuples. The curried function (interp S1) is then mapped over this list (via map), resulting in a list of lists of four-tuples, which is then flattened (via flatten) to yield a list of four-tuples.

We now skip to the processing of the while loop. The interp function maintains the invariant that (assuming it is initially called with m = InA) the output list begins with a tuple with mode InA and is followed by tuples with mode InB. A loop iteration is peeled (as before) and a call to interp yields the list res, which is then split into a head (InA, A', B', k') and a tail tl. A new list res' is defined that is the same as res except for the addition of the new tuple (InB, A'[X/X'], eqX, k'), which represents the new interpolant pair that we must consider for this loop iteration. Finally, this list is processed (as in statement sequencing, using map and flatten).

Finally, we come to the processing of the if-then-else statement. As before, the interp function recurses on the statements S1 and S2, yielding two lists. By construction, the first tuple of each list has mode InA. These tuples are combined together to make a new list res0 containing one tuple of mode InA, in the expected way (note that since the mode is InA, it is safe to substitute false in the B position). The list tl1 contains the InB tuples from the then branch. The interp function maps each tuple (m', A', B', k') in tl1 to (m', A', b ∧ B', k') in res1 to reflect the semantics of the conditional statement. The list res2 is created in a symmetric fashion. Finally, the concatenation of the lists res0, res1 and res2 is returned as the result.

Recall that we are given that, for some k, bwpE(S, (Q, k)) ≠ false and that bwp(S, (Q, k)) = false. To derive a new set of predicates we invoke (interp S (InA, Q, false, k)). The first tuple in the result list is discarded (as it has mode InA and does not represent an interpolant pair). The remaining tuples in the list represent (A, B) interpolant pairs. Further recall that each
Θ(A, B) that is an interpolant of (A, B) yields a set of refinement predicates EΘ = atoms(Θ(A, B)[X'/X]).

Exercise. Prove that the set of refinement predicates E' generated by interpolating the (A, B) pairs returned by (interp S (InA, Q, false, k)) has the property that bwpE∪E'(S, (Q, k)) = false. End Exercise
6. Procedures
We now add to the language procedures with call-by-value formal parameters (of type integer). As in the C language, all procedures are defined at the same lexical level (no nesting of procedures is allowed). To simplify the exposition, procedures do not have return statements. Any side effect of a procedure must be accomplished through assignment to a global variable (which can be used to simulate returning an integer result).

Exercise. Extend the technique to deal with return statements. End Exercise

We assume a set of global integer variables G, and we will find it useful to have primed (G') and temporary (GT) versions of the global variables. Given a procedure p, let Fp = {f1 ... fm} be the formal (integer) parameters of p and let Lp = {l1 ... ln} be the local (integer) variables of p. Now that we have procedures, it makes sense to talk about the "scope" of predicates. A predicate has "global" scope if it refers only to global variables and constants. A predicate has "local" scope (with respect to a procedure p) if it mentions a formal parameter or local variable of procedure p. Predicates that mention variables from the (local) scopes of different procedures are not allowed. Let Ep be the predicates with global scope together with the predicates that are locally scoped to procedure p.

We wish to compute a "summary" of the effect of a procedure p. A summary is simply a predicate containing both unprimed and primed versions of the global variables. Thus, a summary is a (transition) relation between pre-states and post-states. Let Sp be the body of procedure p (in which all formal parameters and local variables of p are in scope). The summary of p is

    ∆p = ∃Lp. wp(Sp, ⋀_{g∈G} (g = g'))
That is, the summary is the weakest precondition of the procedure body with respect to a predicate in which the values of variables g and g' are the same, for all global variables g in G and their primed counterparts. Finally, the local variables of p are eliminated, resulting in a predicate mentioning only global variables and their primed counterparts, as well as the formal parameters of procedure p.
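To illustrate how such a summary can be computed for a loop-free body, here is a small Python sketch using the Z3 library (assuming the z3-solver package; the procedure and variable names are made up). For a body consisting of the single assignment g := g + f there are no locals to eliminate, and the summary is obtained by substituting g + f for g in the post-state equality g = g'.

    from z3 import Ints, substitute

    g, g_prime, f = Ints("g g_prime f")        # g_prime plays the role of g'

    # wp(g := g + f, g = g') is (g = g')[g / g + f]
    post = (g == g_prime)
    summary = substitute(post, (g, g + f))     # yields g + f == g_prime

    print(summary)                             # relates the pre-state g, f to the post-state g'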
Procedure summaries make it possible to deal with recursive procedures, as is well known in program analysis [Sharir and Pnueli, 1981; Reps et al., 1995]. The weakest precondition of a call to procedure p from procedure q is

    wp(p(a1, ..., am), Q) = ∃GT. (∆p[G'/GT] ∧ Q[G/GT])[fi/ai]

The formula (∆p[G'/GT] ∧ Q[G/GT]) equates the pre-state of Q (G renamed to GT) with the post-state of ∆p (G' renamed to GT) in order to perform a join of ∆p with Q. Then, the formal parameters fi of procedure p are replaced by their corresponding actual arguments ai. Finally, the temporary globals GT are eliminated, resulting in a predicate that mentions only global variables (and their primed counterparts), as well as the local variables of procedure q.

If the program has no recursion then we can simply apply the weakest precondition calculation as given above. If it has recursion then, as with while loops, we must apply the exterior cover to achieve termination of symbolic backwards reachability. We redefine ∆p to use the exterior cover as follows:

    ∆p = ∃Lp. ecEp(wp(Sp, ⋀_{g∈G} (g = g')))
Of course, not every procedure body need be abstracted. By a depth-first search of the program's call graph we can identify the set of procedures that are targets of back edges; only these procedures' bodies need be abstracted.

Exercise. We have generalized the weakest precondition calculation to procedures with call-by-value parameter passing. Generalize the bwp and bwpE functions and the generation of refinement predicates via the interp function. End Exercise
7. Call-by-reference Parameter Passing
To complicate matters a bit more, we introduce a call-by-reference parameter passing mechanism to the language, which permits aliasing. So far, the type system of our language (such as it is) permits only the base type "integer". We now extend the type system to permit references to integers:

    τ  →  int  |  ref int

References are created via parameter passing of a local variable l (of type int) to a formal parameter f of a procedure with type declaration "out int". By definition, the type of f is ref int and f is initialized to be a reference to variable l. It is possible to copy a reference from caller to callee using the type declaration "ref int" for a formal parameter f. In this case, the actual argument corresponding to f must be a formal parameter of the caller with type ref int. Only formal parameters may have type ref int. If f is a formal parameter of type ref int then the expression ∗f denotes the contents of the location that
f references. Two formal parameters with reference type may be compared for equality.

In order to reason about references to variables we need to add the concept of an "address" of a variable. Let L be the set of all local variables in the program. Let every variable l in L be assigned a unique integer index ind(l). To distinguish integers that represent addresses from program integers, we introduce the constructor addr(i). The following axioms define the meaning of addr and the dereference operator ∗:

    ∀ l ∈ L :  ∗addr(ind(l)) = l                         (ADDR/DEREF)

    ∀ i, j :  i = j  ⟺  addr(i) = addr(j)                (ADDR EQ)
The first axiom states that the dereference of addr(i), where i is the index of local variable l, is equal to the value of variable l. The second axiom states that two addresses are equal if and only if their integer indices are equal.

We now need to define the weakest precondition for two new cases that arise with dereferences: procedure calls to procedures with out parameters, and assignment through a dereference of a formal parameter (∗f := e). The weakest precondition for a direct assignment to a variable (local or global), x := e, does not change.

We assume (without loss of generality) that the formal parameters of procedure p are (f1, ..., fm, g1, ..., gn), where the fi are not out parameters and the gj are out parameters. The weakest precondition of a call to procedure p from procedure q is

    wp(p(a1, ..., am, l1, ..., ln), Q) = ∃GT. (∆p[G'/GT] ∧ Q[G/GT])[fi/ai][gj/addr(ind(lj))]

The procedure call passes actual expressions (a1, ..., am) corresponding to the formal parameters (f1, ..., fm) and local variables (l1, ..., ln) of procedure q corresponding to the formal out parameters (g1, ..., gn). The only change to the wp transformer (compared to the call-by-value transformer) is to replace each out parameter gj by the address of the corresponding local lj in the call.

Let ∗f := e be an assignment statement in procedure p, where f is a formal parameter of type ref int and e is an integer expression. Let Y = {∗g1, ..., ∗gk} be the dereference expressions mentioned in expression e. Given a predicate Q, variables f and g of type ref int and an expression e of type int, we consider the effect of assigning e to ∗f on predicate Q under
potential aliasing between f and g. There are two cases to consider: either f and g reference the same location (are aliases), and hence the assignment of e to ∗f will cause the value of ∗g to become e; or they are not aliases, and the assignment to ∗f leaves ∗g unchanged. The following formula captures this choice:

    Q[f, e, g] = (f = g ∧ Q[∗g/e]) ∨ (f ≠ g ∧ Q)

We now define wp(∗f := e, Q) as follows:

    wp(∗f := e, Q) = Q[f, e, g1][f, e, g2] ... [f, e, gk]

This generalization of the assignment statement to consider potential aliases is due to Morris [Morris, 1982].

Exercise. We have generalized the weakest precondition calculation to procedures with call-by-reference parameter passing. Generalize the bwp and bwpE functions and the generation of refinement predicates via the interp function. End Exercise
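Referring back to the displayed case split, here is a small Python/Z3 sketch of one application of the rule (an assumption-laden model, not the paper's formalization: the contents ∗f and ∗g are represented by the separate solver variables deref_f and deref_g, which sidesteps the addr/ind machinery).

    from z3 import Ints, And, Or, substitute

    f, g, deref_g, e = Ints("f g deref_g e")   # deref_g stands for *g

    def alias_subst(Q, f, e, g, deref_g):
        # The displayed case split: Q[f, e, g] = (f = g ∧ Q[*g/e]) ∨ (f ≠ g ∧ Q)
        return Or(And(f == g, substitute(Q, (deref_g, e))),
                  And(f != g, Q))

    # wp(*f := e, Q) for a Q that mentions only *g: one application of the rule.
    Q = (deref_g > 0)
    print(alias_subst(Q, f, e, g, deref_g))
    # the result is (f = g ∧ e > 0) ∨ (f ≠ g ∧ *g > 0)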
8. Conclusions
We have presented a counterexample-driven refinement process for software using the concepts of weakest precondition, predicate abstraction, bounded model checking and Craig interpolants. This presentation drew from our experience with the SLAM project as well as recent results from other researchers working on abstraction/refinement techniques. We thank them all for their work and inspiration.

We have left a number of interesting problems open for the reader to solve in the exercises. Just in case these are not enough, here are some other problems to consider:

    consider how to make the process incremental, à la the BLAST project [Henzinger et al., 2002];

    our use of interpolants does not localize the scope of the refinement predicates to the syntax of the program as in [Henzinger et al., 2004]; consider how to do this to improve the efficiency of the process;

    consider how to generalize the type system we have given to permit references to references (as in the C language), and to allow the creation of references by mechanisms other than parameter passing (such as the C address-of operator);

    consider how to generalize the process for concurrent programs.
Acknowledgements

Thanks to Tony Hoare for suggesting that we formalize the counterexample-driven refinement process for a simple structured language using the weakest precondition transformer and for his comments on this paper. Thanks to Sriram K. Rajamani for the enjoyable years we spent together working on the SLAM project and for his comments on this paper. Thanks also to Mooly Sagiv and Orna Kupferman and her students for their comments on this work during my first and too brief visit to Israel.
References

Ball, T., Podelski, A., and Rajamani, S. K. (2001). Boolean and Cartesian abstractions for model checking C programs. In TACAS 01: Tools and Algorithms for Construction and Analysis of Systems, LNCS 2031, pages 268–283. Springer-Verlag.
Ball, T. and Rajamani, S. K. (2000). Boolean programs: A model and process for software analysis. Technical Report MSR-TR-2000-14, Microsoft Research.
Ball, T. and Rajamani, S. K. (2001). Automatically validating temporal safety properties of interfaces. In SPIN 01: SPIN Workshop, LNCS 2057, pages 103–122. Springer-Verlag.
Bryant, R. (1986). Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691.
Burch, J., Clarke, E., McMillan, K., Dill, D., and Hwang, L. (1992). Symbolic model checking: 10^20 states and beyond. Information and Computation, 98(2):142–170.
Clarke, E., Grumberg, O., Jha, S., Lu, Y., and Veith, H. (2000). Counterexample-guided abstraction refinement. In CAV 00: Computer Aided Verification, LNCS 1855, pages 154–169. Springer-Verlag.
Clarke, E. M. and Emerson, E. A. (1981). Synthesis of synchronization skeletons for branching time temporal logic. In Logic of Programs, LNCS 131, pages 52–71. Springer-Verlag.
Cousot, P. and Cousot, R. (1977). Abstract interpretation: a unified lattice model for the static analysis of programs by construction or approximation of fixpoints. In POPL 77: Principles of Programming Languages, pages 238–252. ACM.
Cousot, P. and Cousot, R. (1978). Static determination of dynamic properties of recursive procedures. In Neuhold, E., editor, Formal Descriptions of Programming Concepts (IFIP WG 2.2, St. Andrews, Canada, August 1977), pages 237–277. North-Holland.
Craig, W. (1957). Linear reasoning: A new form of the Herbrand-Gentzen theorem. J. Symbolic Logic, 22:250–268.
Das, M. (2000). Unification-based pointer analysis with directional assignments. In PLDI 00: Programming Language Design and Implementation, pages 35–46. ACM.
Detlefs, D., Nelson, G., and Saxe, J. B. (2003). Simplify: A theorem prover for program checking. Technical Report HPL-2003-148, HP Labs.
Dijkstra, E. (1976). A Discipline of Programming. Prentice-Hall.
Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B., and Stata, R. (2002). Extended static checking for Java. In PLDI 02: Programming Language Design and Implementation, pages 234–245. ACM.
Graf, S. and Saidi, H. (1997). Construction of abstract state graphs with PVS. In CAV 97: Computer-aided Verification, LNCS 1254, pages 72–83. Springer-Verlag.
Henzinger, T. A., Jhala, R., Majumdar, R., and McMillan, K. L. (2004). Abstractions from proofs. In POPL 04: Principles of Programming Languages, pages 232–244. ACM.
Henzinger, T. A., Jhala, R., Majumdar, R., and Sutre, G. (2002). Lazy abstraction. In POPL 02: Principles of Programming Languages, pages 58–70. ACM.
Hoare, C. A. R. (1969). An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–583.
Knoop, J. and Steffen, B. (1992). The interprocedural coincidence theorem. In CC 92: Compiler Construction, pages 125–140.
Kurshan, R. (1994). Computer-aided Verification of Coordinating Processes. Princeton University Press.
McMillan, K. (1993). Symbolic Model Checking: An Approach to the State-Explosion Problem. Kluwer Academic Publishers.
McMillan, K. (2003). Interpolation and SAT-based model checking. In CAV 03: Computer-Aided Verification, LNCS 2725, pages 1–13. Springer-Verlag.
Morris, J. M. (1982). A general axiom of assignment. In Theoretical Foundations of Programming Methodology, Lecture Notes of an International Summer School, pages 25–34. D. Reidel Publishing Company.
Nelson, G. and Oppen, D. C. (1979). Simplification by cooperating decision procedures. ACM Transactions on Programming Languages and Systems, 1(2):245–257.
Queille, J. and Sifakis, J. (1981). Specification and verification of concurrent systems in CESAR. In Proc. 5th International Symposium on Programming, volume 137 of Lecture Notes in Computer Science, pages 337–351. Springer-Verlag.
Reps, T., Horwitz, S., and Sagiv, M. (1995). Precise interprocedural dataflow analysis via graph reachability. In POPL 95: Principles of Programming Languages, pages 49–61. ACM.
Sagiv, M., Reps, T., and Wilhelm, R. (1999). Parametric shape analysis via 3-valued logic. In POPL 99: Principles of Programming Languages, pages 105–118. ACM.
Sharir, M. and Pnueli, A. (1981). Two approaches to interprocedural data flow analysis. In Program Flow Analysis: Theory and Applications, pages 189–233. Prentice-Hall.
A MECHANICALLY CHECKED PROOF OF A COMPARATOR SORT ALGORITHM
J Strother Moore
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712
[email protected]

Bishop Brock
IBM Austin Research Laboratory
11501 Burnet Road
Austin, TX 78756
[email protected]

Abstract

We describe a mechanically checked correctness proof for the comparator sort algorithm underlying a microcode program in a commercially designed digital signal processing chip. The abstract algorithm uses an unlimited number of systolic comparator modules to sort a stream of data. In addition to proving that the algorithm produces an ordered permutation of its input, we prove two theorems that are important to verifying the microcode implementation. These theorems describe how positive and negative "infinities" can be streamed into the array of comparators to achieve certain effects. Interesting generalizations are necessary in order to prove these theorems inductively. The mechanical proofs were carried out with the ACL2 theorem prover. We find these proofs both mathematically interesting and illustrative of the kind of mathematics that must be done to verify software.
Keywords:
digital signal processing, hardware verification, microcode verification, software verification, statistical filtering, theorem proving
1. An Author's Note
In August, 2004, I was invited to give six lectures at the Marktoberdorf Summer School. The title of my lectures was “Little (but Hard) Theorems about Big Systems.” Had the lectures actually been devoted to that subject, I fear many students would have only learned that some people can use mechanical theorem provers to prove such theorems. I wanted to teach them something more useful: how to manage formal proofs. Therefore, my lectures were largely drawn from my “How to Prove Theorems Formally” [Moore, 2004] which can be found on the ACL2 web page where it serves as an introduction to the subject for people wishing to learn ACL2. I devoted the last two lectures to quick tours of theorems of commercial interest. This contribution to the Summer School book explains one such theorem in more detail. It describes joint work with Bishop Brock. – J Strother Moore
2. Little Theorems about Big Systems
“ACL2” is the name of a functional programming language (based on Common Lisp), a first-order, quantifier-free mathematical logic, and a mechanical theorem prover. ACL2, which is sometimes called an “industrial strength version of the Boyer-Moore system,” is the product of Kaufmann and Moore, with many early design contributions by Boyer [Kaufmann et al., 2000b]. It has been used for a variety of important formal methods projects of industrial and commercial interest, including: verification that the register-transfer level description of the elementary floating point arithmetic circuitry for the AMD Athlon implements the IEEE floating point standard [Russinoff, 1998, Russinoff and Flatau, 2000]; similar work has been done for components of the AMD K5 [Moore et al., 1998], the IBM Power 4 [Sawada, 2002], and the AMD Opteron; verification that a micro-architectural model of a Motorola digital signal processor (DSP) implements a given microcode engine [Brock and Hunt, 1999] and verification that specific microcode extracted from the ROM implements certain DSP algorithms; verification that microcode for the Rockwell Collins AAMP7 implements a given security policy having to do with process separation [Greve and Wilding, 2003]; verification that the Java Virtual Machine (JVM) bytecode produced by the Sun compiler javac on certain simple Java classes implements the claimed functionality [Moore, 2003] and the verification of properties
of importance to the Sun bytecode verifier as described in JSR-139 for J2ME JVMs [Liu and Moore, 2003]; verification of the soundness and completeness of a Lisp implementation of a BDD package that achieves runtime speeds of about 60% those of the CUDD package (however, unlike CUDD, the verified package does not support dynamic variable reordering and is thus more limited in scope) [Sumners, 2000]; verification of the soundness of a Lisp program that checks the proofs produced by the Ivy theorem prover from Argonne National Labs; Ivy proofs may thus be generated by unverified code but confirmed to be proofs by a verified Lisp function [McCune and Shumsky, 2000]. Other applications are described in [Kaufmann et al., 2000a] and in the papers distributed as part of the annual ACL2 workshops, the proceedings of which may be found via the Workshops link on the ACL2 home page [Kaufmann and Moore, 2004]. As these examples demonstrate, it is possible to construct mechanically checked proofs of properties of great interest to industrial hardware and software designers. The properties proved are typically not characterizations of the correctness of the systems studied. For example, the proofs about the AMD microprocessors – the K5, the Athlon, and the Opteron – just deal with the IEEE compliance of certain floating point operations modeled at the register transfer level. The microprocessors contain many unverified components and even the verified ones could fail due to violations of their input conditions. Nevertheless, these theorems were proved for good reason: the designers were worried about their designs. Aspects of these designs are quite subtle or complicated and formal specification and mechanized proof offer virtually the only way to relieve the worries that something had been overlooked. In addition to being interesting, these theorems are hard to prove. That is a relative judgment of course. Compared to longstanding open problems, these theorems are all trivial. But by many measures each of these proofs is much more complicated than any proof ever encountered by most readers. For example, the IEEE compliance proof for the floating point division microcode for the AMD K5 (in 1995) required the formal statement and proof of approximately 1,200 lemmas. Subsequent AMD floating-point proofs are harder to measure because they build on libraries of lemmas that have been accumulating since 1995. The correspondence result between the Motorola DSP micro-architecture and its microcode engine involved intermediate formulas that, when printed, consumed 25 megabytes (approximately 5000 pages of densely packed text) per formula. And the proof involved hundreds of such formulas. The formal model of the CLDC-like JVM and bytecode verifier is almost 200 pages of densely packed text. The proof that a simple Java class,
which spawns an unbounded number of threads, produces a monotonic increase in the value of a certain shared counter generates about 19,000 subgoals and requires about 84 megabytes to print. In these senses, the theorems in which we are interested are little (but hard) theorems about big systems, or put another way, they are valuable and nontrivial theorems about parts of very complicated systems. How do we prove theorems like this? There is no mystery. We prove theorems like this the same way we prove simple theorems: by properly defining the concepts and carefully stating the theorem, by separating concerns, by decomposing the proof into more general lemmas, and by applying the same methodology recursively. But to do it on a grand scale takes more than the usual attention to detail and good taste. Minor misjudgments that are tolerable in small proofs are blown out of proportion in big ones. An unnecessary case split, an inelegant definition, or an insufficiently general concept can doom a big proof in a way that makes it very hard even to trace the problem back to its source. If you aim to produce big proofs, it pays to learn how to produce small ones well. Then learn how to get a mechanical theorem prover, such as ACL2 [Kaufmann et al., 2000b], HOL [Gordon and Melham, 1993], HOL Light [Harrison, 2000], Isabelle [Nipkow and Paulson, 1992], or PVS [Owre et al., 1992], to do the rote work for you, so that you can focus on the strategic issues. But in the rest of this paper we do not discuss how to use our theorem prover of choice, ACL2, but instead focus on one commercially interesting proof done with it: the verification of some key properties of a microcode program for a very complex commercial digital signal processor.
3. Informal Discussion of the Problem
It is often necessary to perform statistical filtering and peak location in digital spectra for communications signal processing. In this paper we consider an abstraction of the algorithm implemented on one such microprocessor, the Motorola CAP digital signal processor [Gilfeather et al., 1994]. The CAP was designed for use in a programmable radio. Features of the CAP design include separate program and data memories; 252 programmer-visible data and control registers; 6 independently addressable data and parameter memories; logical partitioning of the data memories into ‘source’ and ‘destination’ memories allowing data to be streamed from one to the other through the processing units; four multiplier-accumulators and a 6-adder array (discussed further below); a 64-bit instruction word, which in the arithmetic unit is further decoded into a 317-bit, low-level control word; no-overhead looping instructions; instructions that access as many as 10 different registers to determine the next instruction; instructions that can simultaneously modify
over 100 registers; and an instruction pipeline that contains many programmer-visible hazards. One of the major functional units of the CAP is the adder array, a collection of 20-bit adder/subtracters, each of which has 8 dedicated input registers and a dedicated path to a local memory. The CAP adder array was originally designed to support fast FFT computations, but the designers also included the data paths necessary to accelerate peak finding. Our main interest in the CAP was modeling the micro-architecture and the microcode engine it allegedly implemented, formalizing the notion of “pipeline hazard” and proving that if the relevant microcode contained no hazards then the micro-architecture implements the microcode engine. The ACL2 models were bit- and cycle-accurate, i.e., it was possible using the ACL2 model to determine the value of every user-visible bit in the microcode state on any cycle. The reader may thus get some impression of the complexity of the ACL2 models from the complexity of the CAP design. The CAP modeling and proof work was done primarily by Brock and reported in [Brock and Hunt, 1999]. We extracted microcode from the ROM of the CAP and verified the functional properties of some routines with respect to the ACL2 microcode interpreter model. One of the routines we verified was the peak-finding code. The so-called “5PEAK” program of the CAP [Brock and Hunt, 1999] uses the microprocessor’s adder array as a systolic comparator array as shown in Figure 1. The program streams data through the comparator array and finds the five largest data points and the five corresponding memory addresses.
[Figure 1. Abstract View of Comparator Array: candidate registers C1–C5 feed a chain of five min/max comparators; each stage keeps the larger value in its peak register Pn and passes the smaller value on to the next candidate register.]

In this
informal discussion we largely ignore memory addresses paired with each data point. Candidate data points enter the array via register C1 and move through the array, toward the right in the diagram. Maximum values (peaks) remain in the array in the Pn registers, and the minima are eventually discarded when they pass out of the last comparator. On each cycle the comparator array updates the registers as follows:

C1 = next data point,
Cn = min(Cn-1, Pn-1),   n > 1,
Pn = max(Cn, Pn).
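The update above reads all the old register values and writes all the new ones at once. The following is a minimal executable sketch of it in Common Lisp; it is not the authors' ACL2 model of the CAP, the names array-step, feed, +neg-inf+ and +pos-inf+ are ours, and ordinary fixnums stand in for the 20-bit hardware values.

(defconstant +neg-inf+ most-negative-fixnum)  ; stands in for "negative infinity"
(defconstant +pos-inf+ most-positive-fixnum)  ; stands in for "positive infinity"

(defun array-step (p c d)
  "One cycle of the 5-register comparator array.  P and C are 5-element lists
of peak and candidate registers; D is the next data point.  Returns the new P,
the new C, and the value that leaves the last comparator this cycle."
  (values (mapcar #'max p c)                              ; Pn <- max(Cn, Pn)
          (cons d (mapcar #'min (butlast p) (butlast c))) ; C1 <- d, Cn <- min(Cn-1, Pn-1)
          (min (fifth p) (fifth c))))

(defun feed (data p c)
  "Pump the points in DATA through the array, returning the final P and C."
  (if (null data)
      (values p c)
      (multiple-value-bind (p2 c2) (array-step p c (first data))
        (feed (rest data) p2 c2))))

For example, loading 2 into C1 and feeding the remaining nine points of the vector 2 9 3 5 4 1 8 7 10 6 used below reproduces the configuration shown later in this section:

(feed '(9 3 5 4 1 8 7 10 6)
      (make-list 5 :initial-element +neg-inf+)
      (cons 2 (make-list 4 :initial-element +neg-inf+)))
;; => (10 8 5 3 2), (6 9 7 4 1)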
Informally, the peak registers, Pi, maintain the maximum value that has passed by that point in the comparator array. How can we use this array to sort? Or, more particularly, to identify the n highest peaks in the stream of data? Using the comparator array to find the five maxima requires several steps. We explain the algorithm by example here. In the following we will represent the contents of the comparator array registers in the format shown below, with the contents of each peak register above the contents of the corresponding candidate register.

P1 P2 P3 P4 P5
C1 C2 C3 C4 C5

The next state of the comparator array on each cycle is
max(P1, C1)  max(P2, C2)  max(P3, C3)  max(P4, C4)  max(P5, C5)
d            min(P1, C1)  min(P2, C2)  min(P3, C3)  min(P4, C4)
where d represents the next data point. We will illustrate the peak search for the 10-element data vector
2 9 3 5 4 1 8 7 10 6.
Although this example uses small unsigned numbers for simplicity, the CAP implementation via the comparator array and the 5PEAK microcode will correctly search any vector of signed, 20-bit 2's complement data values, subject to a few obvious restrictions. The comparator array is initialized by setting C1 to the first (leftmost) element of the data vector, and setting every other register to −∞. In the fixed bit-width hardware realization of the comparator array on the CAP, the role of −∞ is played by the most negative number, −2^19.
−∞ −∞ −∞ −∞ −∞
 2 −∞ −∞ −∞ −∞
Carrying out the comparator array operations for 9 more steps results in the configuration

10 8 5 3 2
 6 9 7 4 1.

Up to this point the algorithm described here is essentially identical to the two VLSI sorting algorithms described in [Miranker et al., 1983, Curey et al., 1985]. These researchers offered special-purpose hardware designs with the same basic compare-exchange step described above. There is a key difference, however, in that the VLSI sorting proposals require reversing the direction of data flow to extract the sorted data. In these approaches the sorting machine is a stack that accepts data pushes in arbitrary order but pops data in sorted
order. For example, if after loading the sample vector we were to redefine the next-state function of our sorting array to be

Output = max(C1, P1),
Pn     = max(Cn+1, Pn+1),
Cn     = min(Cn, Pn),    n < 5,
P5     = −∞
and pump ten times, the original input vector would be popped to the output in descending order. As long as there are enough registers and comparators for the input data set size, a machine of this type can sort data as fast as it can be physically moved to and from the sorting array. Reference [Miranker et al., 1983] also describes ways to pipeline the use of these sorting machines to increase throughput. Although the CAP provided numerous data paths in the adder array, reversing the direction of data flow was not possible, and another solution to extracting the maxima had to be found. Among the many possibilities that were supported by the hardware, the most straightforward involved simply continuing to step the original compare-exchange algorithm and collecting the maxima as they are ejected out of the array. This was the algorithm ultimately encoded in CAP microcode. In the CAP algorithm, data input is completed by stepping the array one more cycle with a dummy input of +∞. In the fixed bit-width hardware realization on the CAP the role of +∞ is played by the most positive data value, 2^19 − 1.

10  9 7 4 2
+∞  6 8 5 3.

At this point register P1 holds the maximum value, yet the rest of the array is not yet ordered in any discernible way, except that the Pn registers satisfy a certain invariant given below. As we will show later, this invariant guarantees that if we ‘pump’ the array four times with +∞, then the maxima will collect at the end of the array in registers P3, C4, P4, C5, and P5.
+∞ +∞ 10 8 6
+∞ +∞ +∞ 9 7.
At this point the comparator array data registers C1, P1, C2, . . . , C5, P5 are ordered, and the array acts like a shift register as long as +∞ is pumped into C1. Pumping the array five times with +∞ forces the five maxima out of the comparator array in reverse order, where they can be collected and stored. To summarize, the systolic comparator array can be used to compute the five maxima of a data vector by the following steps:

– The first data point is loaded into C1, and the rest of the comparator array is initialized to −∞.
– The data vector is pumped into the array one point at a time, and a single +∞ is inserted to finish the data input.

– Pumping four times with +∞ causes the maxima to collect at the end of the array.

– Pumping five times with +∞ forces the maxima out of the array in reverse order, where they are collected and stored (see the executable sketch at the end of this section).

The algorithm above is implemented in microcode on the CAP. It is among several microcode programs for that processor that we have mechanically verified. As described briefly in [Brock et al., 1996] (and more completely in [Brock and Hunt, 1999]), we formalized the CAP in the ACL2 logic, sketched below. We then extracted the microcode for the 5PEAK program from the CAP ROM, obtaining a sequence of bit vectors, and used the ACL2 theorem prover to show that when the abstract CAP machine executes the extracted code on an appropriate initial state and for the appropriate number of cycles, the five highest peaks and their addresses are deposited into certain locations. We defined the “highest peaks and their addresses” by defining, for specification purposes only, a sort function in ACL2 which sorts such address/data pairs into descending order. The reader will see that this sorting function is exactly the stack-like sorting method of the VLSI implementations described above. In our 5PEAK specification we refer to the first five pairs in the ordering. The argument that the microcode is correct is quite subtle, in part because an arbitrary amount of data is streamed through and in part because the positive and negative infinities involved in the algorithm can be legitimate data values but are accompanied by bogus addresses; correctness depends on a certain “anti-stability” property of the comparator array. A wonderfully subtle generalization of a key lemma was necessary in order to produce a theorem that could be proved by mathematical induction. In this paper we discuss only the high-level algorithm sketched above and its correctness proof. We do not discuss the microcode itself. The list of definitions and theorems leading ACL2 to the proofs described here is available at http://www.cs.utexas.edu/users/moore/publications/csort/csort.lisp. One of the two appendices to this paper gives a “tour” through the input script. The other appendix shows the ACL2 proof output on one of the main lemmas.
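To tie the informal discussion together, the recipe of this section can be replayed executably with the array-step function and the infinity constants defined earlier. The function five-peaks and its name are ours; it is only a sketch of the recipe, not the CAP microcode.

(defun five-peaks (data)
  "Return the five largest elements of DATA, largest first (a sketch)."
  (let ((p (make-list 5 :initial-element +neg-inf+))
        (c (cons (first data) (make-list 4 :initial-element +neg-inf+)))
        (out nil))
    ;; Pump the remaining points, then one +inf to finish input, four more
    ;; +inf to collect the maxima, and five +inf to eject them.
    (dolist (d (append (rest data) (make-list 10 :initial-element +pos-inf+)))
      (multiple-value-bind (p2 c2 ejected) (array-step p c d)
        (setf p p2 c c2)
        (push ejected out)))
    (subseq out 0 5)))            ; the last five ejected values, largest first

;; (five-peaks '(2 9 3 5 4 1 8 7 10 6))  =>  (10 9 8 7 6)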
4. ACL2
Before we present this work in detail we briefly describe the ACL2 logic and theorem prover. ACL2 stands for “A Computational Logic for Applicative Common Lisp.” ACL2 is both a mathematical logic and system of mechanical tools which can
expression     meaning
endp (x)       true iff x is the empty list
cons (x, y)    the ordered pair < x, y >
car (x)        the left component of (the ordered pair) x
cdr (x)        the right component of x
cadr (x)       the left component of the right component of x
cddr (x)       the right component of the right component of x
zp (x)         x = 0 (or x is not a natural number)
len (x)        the number of elements in the list x

Table 1. The Meaning of Certain Expressions
be used to construct proofs in the logic. The logic formalizes a subset of Common Lisp. The ACL2 system is essentially a re-implemented extension, for applicative Common Lisp, of the so-called “Boyer-Moore theorem prover” Nqthm [Boyer and Moore, 1979, Boyer and Moore, 1997]. The ACL2 logic is a first-order, essentially quantifier-free logic of total recursive functions providing mathematical induction and two extension principles: one for recursive definition and one for “encapsulation.” The syntax of ACL2 is a subset of that of Common Lisp. However, we do not use Lisp syntax in this paper. The rules of inference are those of propositional calculus with equality together with instantiation and mathematical induction on the ordinals up to ε0 = ω^ω^ω^···. The axioms of ACL2 describe five primitive data types: the numbers (actually, the complex rationals), characters, strings, symbols, and ordered pairs or “conses”. Essentially all of the Common Lisp functions on the above data types are axiomatized or defined as functions or macros in ACL2. By “Common Lisp functions” here we mean the programs specified in [Steele, 1990] that are (i) applicative, (ii) not dependent on state, implicit parameters, or data types other than those in ACL2, and (iii) completely specified, unambiguously, in a host-independent manner. Approximately 170 such functions are axiomatized or defined. The functions used in Table 1 are particularly important here. Common Lisp functions are partial; they are not defined for all possible inputs. In ACL2 we complete the domains of the Common Lisp functions and provide a “guard mechanism” by which one can establish that the completion process does not affect the value of a given expression. See [Kaufmann and Moore, 1997]. The most important data structure we use in this paper is lists. The empty list is usually represented by the symbol nil. The non-empty list whose first element is x and whose remaining elements are those in the list y is represented by the ordered pair < x, y >. This ordered pair is the value of the expression cons (x, y).
Here is an example of a simple list processing function, namely, the function for concatenating two lists. In the syntax of Common Lisp we could write this as

(defun append (x y)
  (if (endp x)
      y
      (cons (car x) (append (cdr x) y))))

but we will here use the notation

Definition:
append (x, y) =
  if endp (x)
  then y
  else cons (car (x), append (cdr (x), y)) fi

The concatenation of the empty x to y yields y. The concatenation of a non-empty x to y is obtained by consing the first element of x, car (x), to the concatenation of the rest of x, cdr (x), to y. The logic sketched above is supported by a mechanical theorem prover, i.e., a computer program that takes input from a human user, typically in the form of definitions and conjectured theorems, and attempts to find proofs. The ACL2 theorem prover combines many proof techniques, including mathematical induction, simplification, and decision procedures for propositional calculus, equality, and linear arithmetic. Simplification is driven by the system’s data base of previously admitted definitions and previously proved theorems. For example, a function definition, like that of append above, might be used to expand a call of the function into an instance of the body; built-in rules about primitives are used to simplify the if structure, and previously proved lemmas are typically used as rewrite rules to rearrange terms. Many heuristics participate in the orchestration of the various proof techniques and rules. The user cannot interact directly with the theorem prover during a proof attempt. Instead, when a proof attempt fails, the user tries to help the theorem prover by posing new lemmas for it to prove first. Ideally, when these lemmas are proved and available in the system’s data base, the system can find some proofs it previously could not. The process by which the user inspects failed proof attempts and formulates the necessary intermediate lemmas is called “The Method” and is explained in [Kaufmann et al., 2000b]. The end result of a successful proof session is a sequence of definitions, theorems, and other kinds of hints, that configure the theorem prover so that it can prove the desired main conjecture. Such a script is called a “book” in ACL2 terminology. Many books, i.e., for
arithmetic, bit vector manipulation, list processing, etc., are available from the ACL2 home page [Kaufmann and Moore, 2004]. Books can be included in a session (and thus in other books) to configure the system. Readers interested in learning more about Common Lisp should consult [Steele, 1990]. Readers interested in the logical foundations of applicative Common Lisp as formalized in ACL2 should see [Kaufmann and Moore, 1997]. Readers interested in the ACL2 system should see [Kaufmann et al., 2000b] as well as the home page for ACL2 [Kaufmann and Moore, 2004], which contains the source code, 5 megabytes of hypertext documentation, a bibliography, and many applications.
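As a concrete illustration of the kind of event such a book might contain, here is one theorem in ACL2 syntax. It is only a sketch of the style; we do not claim this exact event appears in the authors' script, although facts of this shape (associativity of list concatenation) are the kind ACL2 typically proves automatically by induction.

(defthm append-is-associative
  (equal (append (append x y) z)
         (append x (append y z))))

Once admitted, such a theorem is stored as a rewrite rule and used by the simplifier to rearrange append expressions in later proofs.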
5. High-Level Specification
We typically approach the verification of a machine code program in two phases, by (a) characterizing what the machine code computes at a low level, and then (b) showing that the low-level behavior meets, or is somehow equivalent to, a higher-level specification. This describes, roughly speaking, how we approached the verification of the 5PEAK algorithm. In step (a) we proved that the microcode extracted from the CAP ROM, when executed by the ACL2 model of the CAP microcode engine, produces a state described in terms of the sorting function formalized below. In step (b) we proved that the sorting function produces an ordered permutation of its input. Because the sorting function defined below sorts an arbitrarily long list, part of step (a) was to show that certain microcode registers ultimately contained the first five elements of the output vector, i.e., the five maxima. We defined the abstract sorting algorithm in a way that made the correspondence proof relatively easy. In addition, in relating the abstract algorithm to the microcode it was necessary to prove several theorems about how the signed infinities are handled by the abstract algorithm. These theorems are especially interesting to prove. Therefore, this paper presents the abstract sorting algorithm and the key theorems about it. We focus on the hardest of these theorems to prove, namely the treatment of positive infinities. Despite the general nature of these theorems – e.g., the absence of bounds on the lengths of the vectors being sorted or the size of the data – the reader is reminded that these theorems play a direct role in the very practical problem of the 5PEAK microcode verification and are illustrative of the kind of general mathematics one must handle in code verification. The abstract sorting algorithm sorts lists of “records” with integer keys. The algorithm is inspired by the operation of a comparator array, except that it uses an unlimited number of comparators. The records are represented as cons pairs as constructed by cons (other, data), where the data field represents the
integer sort key, and the other field is arbitrary (but, in practice, contains the address from which the data was obtained). The basic systolic cycle of the general algorithm is captured by the function cstep.

Definition:
cstep (acc) =
  if endp (acc) then nil
  elseif endp (cdr (acc)) then acc
  else cons (max-pair (cadr (acc), car (acc)),
             cons (min-pair (cadr (acc), car (acc)),
                   cstep (cddr (acc)))) fi

where

Definition:
max-pair (pair1, pair2) =
  if data (pair1) ≤ data (pair2) then pair2 else pair1 fi

Definition:
min-pair (pair1, pair2) =
  if data (pair1) ≤ data (pair2) then pair1 else pair2 fi

The function cstep orders adjacent records in the accumulator acc pairwise, just as the comparator array orders Cn, Pn into Pn, Cn+1 on each cycle. Feeding the input vector into the unlimited-resource comparator array is modeled by the function cfeed.

Definition:
cfeed (lst, acc) =
  if endp (lst) then acc
  else cfeed (cdr (lst), cstep (cons (car (lst), acc))) fi
The function cfeed maintains an important invariant on the accumulator. If we number the elements of the accumulator acc as acc0, acc1, . . . , accn, where acc0 is the first element of the accumulator, then acci ≥ accj for i even and i < j. That is, the even-numbered elements dominate the elements to their right. Call this property Φ (acc). It is not difficult to prove that Φ is invariant under cfeed. That is, if an accumulator has property Φ and a list of records is fed into it with cfeed then the result satisfies Φ. Since Φ (nil) holds, we can create an accumulator satisfying Φ by feeding an arbitrary list of records into the empty accumulator. Furthermore, we can also prove that if a non-empty accumulator acc has property Φ, then the first element of acc is a maximal element and the result of applying cstep to cdr (acc) satisfies Φ. Thus, we can sort such an accumulator by ‘draining’ off the maxima while stepping the remainder.

Definition:
cdrain (n, acc) =
  if zp (n) then acc
  else cons (car (acc), cdrain (n − 1, cstep (cdr (acc)))) fi

The final sorting algorithm feeds the input data vector into an empty accumulator and then drains off the maxima.

Definition:
csort (lst) = cdrain (len (lst), cfeed (lst, nil))
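For readers who want to experiment, here is a direct Common Lisp transliteration of the definitions above. It is only a sketch: the authors' ACL2 book differs in details such as guards, and our reading of the record representation takes data to be cdr.

(defun data (pair) (cdr pair))           ; the integer sort key of a record

(defun max-pair (p1 p2) (if (<= (data p1) (data p2)) p2 p1))
(defun min-pair (p1 p2) (if (<= (data p1) (data p2)) p1 p2))

(defun cstep (acc)
  (cond ((endp acc) nil)
        ((endp (cdr acc)) acc)
        (t (cons (max-pair (cadr acc) (car acc))
                 (cons (min-pair (cadr acc) (car acc))
                       (cstep (cddr acc)))))))

(defun cfeed (lst acc)
  (if (endp lst)
      acc
      (cfeed (cdr lst) (cstep (cons (car lst) acc)))))

(defun cdrain (n acc)
  (if (zerop n)
      acc
      (cons (car acc) (cdrain (- n 1) (cstep (cdr acc))))))

(defun csort (lst)
  (cdrain (length lst) (cfeed lst nil)))

;; (csort '((a . 2) (b . 9) (c . 3) (d . 5) (e . 4)))
;; => ((B . 9) (D . 5) (E . 4) (C . 3) (A . 2))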
6. The Key Theorems

Given the foregoing claims about Φ it is not difficult to prove

Theorem: Ordered Permutation Property
The function csort returns an ordered (weakly descending) permutation of its input.
To relate these abstractions to the microcode, we had to develop two other interesting and crucial properties. First, observe that in the definition of csort above the cfeed operation is done with the initial accumulator nil. But in the code, the corresponding operation is done with the peak and candidate value registers initialized to the most negative CAP integer. To prove that the code implements csort (in the sense described) we had to prove

Theorem: Negative Infinity Property
Let lst be a list of records and min be one record, and suppose every element of lst dominates (i.e., has data greater than or equal to the data of) min. Let minlst be a list of n repetitions of min. Then cfeed (lst, minlst) is just append (cfeed (lst, nil), minlst).
This theorem tells us that if we initialize the comparator array to “negative infinities” as done on the CAP (i.e., to minlst where min is a record containing the most negative CAP integer) and then feed the input vector into it, the abstract result is the same as feeding the vector into an empty comparator array, as in our definition of csort, and then concatenating the “negative infinities” to the right. Since we are only interested in the first five elements, we can see that the negative infinities are irrelevant to the final answer if the input vector contains more than five elements. The second interesting property concerns the fact that our csort uses the function cdrain while the second phase of the microcode performs this step by feeding in “positive infinities.” We prove the following theorem to overcome this difference:

Theorem: Positive Infinity Property
Let acc be a list of records satisfying Φ. Let max be a record that dominates every element of acc. Finally, let maxlst be a list of n repetitions of max, where n is an integer, 0 ≤ n ≤ |acc|. Then cfeed (maxlst, acc) is append (maxlst, cdrain (n, acc)).
Note that the accumulator produced from nil by cfeed satisfies Φ and thus has the property required of acc in the theorem above. Furthermore, a list of n repetitions of the “positive infinity” record has the property required of maxlst above. The theorem thus tells us that when the second phase of the CAP code feeds positive infinities into the array the result is the same as concatenating positive infinities to the result of draining the comparators as specified in our definition of csort. Thus, at the conclusion of the second phase, the rightmost registers in the CAP array contain the answer computed by cdrain. We find this relationship between cfeed and cdrain to be both surprising and beautiful.
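Using the Common Lisp transliteration from the previous section, the property can be checked directly on small instances. This is a sanity check rather than a proof, and the particular records and the bound 1000 standing in for "positive infinity" are ours.

(let* ((acc    (cfeed '((a . 2) (b . 9) (c . 3) (d . 5)) nil)) ; satisfies Phi
       (maxrec '(inf . 1000))                                   ; dominates acc
       (n      3)                                               ; 0 <= n <= |acc|
       (maxlst (make-list n :initial-element maxrec)))
  (equal (cfeed maxlst acc)
         (append maxlst (cdrain n acc))))
;; => T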
7. Proof of the Positive Infinity Property
The two infinity properties are challenging to prove. We will briefly discuss our proof of the Positive Infinity Property. The problem is a familiar one to
anyone who has proved theorems by induction: the theorem must be generalized. This problem is a mathematical one and is independent of the particular mechanized logic or mechanical theorem prover employed. The theorem we wish to prove involves feeding a series of n max’s into acc. What happens when you do that? The max’s pile up (in reverse order) at the front and acc is stepped with cstep, except that odd/even parities of the elements of acc alternate because of the max’s being added to the front. We leave to the reader the problem of discovering what goes wrong with an attempt to prove the theorem directly by induction, but dealing with these changing parities is one of the problems. To prove the Positive Infinity Property we prove a stronger property by induction. We state the stronger property, Positive Infinity Property Generalized, below. But we motivate (and sketch the proof of) the property in the discussion below, where we explain how to strengthen the original property. The original property involves feeding a list of max’s into an accumulator acc. We will generalize the theorem by generalizing both the list of max’s and the accumulator. We start with the latter. From the discussion above it is clear that the general state of the accumulator is not one merely satisfying Φ but one containing a pile of max’s at the front and satisfying Φ. Thus, the accumulator should have the form append(s, acc′). At first it may appear sufficient to require that s be a list of max’s and that append(s, acc′) satisfy Φ, but we need to generalize further. In particular, we require that s be an ordered (weakly descending) list of records such that |s| is even and every element of s dominates every element of acc′. Note that under these conditions, if acc′ satisfies Φ then so does the concatenation of s and acc′. The facts that |s| is even and every element of s dominates every element of acc′ allow us to distribute Φ over the concatenation, e.g., append(s, acc′) has property Φ iff both s and acc′ have the property. Note also that if |s| is even, then cstep distributes over append also: the result of stepping the concatenation of s and acc′ is the concatenation of the results of stepping s and stepping acc′. Such observations are crucial and we use them implicitly below. So the general shape of the accumulator is append (s, acc′) where s and acc′ are as above. Instead of feeding in a list of max’s, we feed in an arbitrary list of records, lst, such that lst is ordered but weakly ascending, every element of lst dominates the elements of s and of acc′, and |lst| < |acc′|. To see why this version of the theorem is necessary, consider inductively proving a theorem involving the expression cfeed(lst, append(s, acc′)) where lst, s and acc′ have the properties required above.
In the induction step, lst is non-empty, i.e., is cons (mx, lst′). Consider what happens when we feed in the first element, mx, to the comparator. The function cfeed conses mx onto append(s, acc′), steps it, and recursively handles lst′. That is, the expression above becomes

cfeed (lst′, cstep (cons (mx, append (s, acc′))))

and we seek an induction hypothesis that will enable us to manipulate this expression further. But the inductive hypothesis will be of the form cfeed (lst′, append (σ, α)), for lst′ and some σ and α satisfying our general conditions on lst, s and acc′ above. Clearly, we must manipulate the cstep expression above, which we shall call ψ, into the append form. Because |lst| < |acc′| we can write acc′ as cons (a, acc′′). Thus,

ψ = cstep (cons (mx, append (s, acc′)))
  = cstep (cons (mx, append (s, cons (a, acc′′))))
  = cstep (append (cons (mx, append (s, cons (a, nil))), acc′′))

Because |s| is even, so is |cons(mx, append(s, cons(a, nil)))|. Thus, we can distribute cstep over the append to obtain

ψ = append (cstep (cons (mx, append (s, cons (a, nil)))), cstep (acc′′))

Since s is ordered, weakly descending, mx dominates everything in s and a is dominated by everything in s, the list cons (mx, append (s, cons (a, nil))) is ordered, weakly descending. Thus the first cstep expression above is a no-op.

ψ = append (cons (mx, append (s, cons (a, nil))), cstep (acc′′))

Hence, ψ is in the form append(σ, α), where

σ : cons (mx, append (s, cons (a, nil)))
α : cstep (acc′′)

A little thought will show that these values of σ and α satisfy the conditions on s and acc′ required by the theorem. In short, an inductive proof of the following general theorem is straightforward, given the fairly subtle relationships between the conditions illustrated above.

Theorem: Positive Infinity Property Generalized
Let acc′ be a list of records satisfying Φ. Let lst be a list of records such that
|lst| < |acc′| and suppose that lst is ordered weakly ascending. Let s be a list of records such that |s| is even and s is ordered weakly descending. Finally, suppose every element of lst dominates every element of s and of acc′ and that every element of s dominates every element of acc′. Then

cfeed (lst, append (s, acc′)) = append (reverse (lst), s, cdrain (|lst|, acc′)).
Note that our Positive Infinity Property follows from the one above, if we let s be nil, acc′ be acc, and lst be a list of n max’s.
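As with the ungeneralized property, the statement can be exercised on a small instance with the Common Lisp sketch from Section 5; the particular records, chosen to satisfy the side conditions, are ours.

(let* ((acc1 '((w . 9) (x . 4) (y . 6) (z . 2)))   ; satisfies Phi (plays acc')
       (s    '((q . 30) (r . 20)))                 ; even length, weakly descending
       (lst  '((i . 40) (j . 50) (k . 60))))       ; weakly ascending, |lst| < |acc1|
  (equal (cfeed lst (append s acc1))
         (append (reverse lst) s (cdrain (length lst) acc1))))
;; => T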
8. Conclusion
We have described the formalization of a sorting algorithm that underlies the “5PEAK” peak-finding microcode on the Motorola CAP digital signal processor. We have sketched the proofs of three main theorems: that the algorithm produces an ordered permutation of its input; that the hardware implementing it can be initialized by streaming in “negative infinities;” and that the final answer can be read out by streaming in “positive infinities.” The problem was formalized in the ACL2 logic and the proofs checked with the ACL2 theorem prover, under the direction of the authors. In addition, in work not discussed here, we extracted the 5PEAK microcode from the ROM of the CAP and proved that it implements this algorithm. While that aspect of the proof is not discussed here, it is reasonable to assume that the proof is very similar to the proofs in [Boyer and Yu, 1996], where the Berkeley C String Library is verified by compiling it with gcc -o and verifying the binary correct with respect to an Nqthm model of the Motorola 68020. The CAP proofs are also similar to the proofs in [Moore, 2003], where we show how to verify that particular JVM bytecode sequences implement recursively defined ACL2 functions. Key to these kinds of proofs is the library of lemmas that allow the operational semantics model to be used to do “symbolic evaluation” of programs. Such lemmas are discussed in [Boyer and Yu, 1996, Moore, 2003] in settings of industrial complexity; for a simple and pedagogically clear setting, see [Boyer and Moore, 1996]. In the CAP work, the semantics of the microcode language is given by an interpreter for the microcode, written in ACL2, and the extracted binary is just a list of 64-bit numbers. The lemmas for controlling the model are complex (because of the complexity of the model) but spiritually similar to those of [Boyer and Moore, 1996]. The 5PEAK theorem states that if the microcode engine is started on a state running the 5PEAK code, with data being streamed in from a certain region of memory, and the engine is allowed to run for a certain
number of microcycles, the interpreter produces a state in which the five peaks in the data have been written to certain registers. The number of microcycles is characterized as a function of the number of data items streamed through the array. The key lemma relates the execution of the code to the algorithm described here. The fact that the final registers contain the peaks follows easily from the theorems here. The most intellectually challenging part of the proof – after the creation of the microcode engine model and the lemmas necessary to control its symbolic evaluation – was the work described in this paper and, in particular, the generalizations and inductions appropriate for the positive and negative infinity properties. But this is no fluke; our experience is that routine software verification often requires creative and sophisticated generalization, abstraction, and invariant invention.
Appendix: A Tour of the ACL2 Script

In this Appendix we discuss the ACL2 script for reproducing these proofs. Proofs of all three of the key theorems noted here have been checked with the ACL2 theorem prover. Ninety-six ACL2 definitions and theorems are involved in our proof of these theorems. This includes the definitions necessary to define all of the concepts. The ACL2 input script or “book” is http://www.cs.utexas.edu/users/moore/publications/csort/csort.lisp. We give a brief sketch of the book here. The book is divided into six “chapters” and one “appendix.” Chapter 1 deals with elementary list processing and is independent of the specifics of the comparator sort problem. It defines functions for retrieving the first n elements of a list and for producing a list of n repetitions of an element, and it defines the predicate that determines whether one list is a permutation of another. The chapter then proves fundamental properties of these functions and of several primitive ACL2 functions, including that the concatenation function is associative, that the length of the concatenation of two lists is the sum of their lengths, that the first n elements of a list of length n is the list itself, and that the reverse of n repetitions of an element is just n repetitions of the element. The most important contribution of this chapter is that it establishes that the permutation predicate is an equivalence relation and that it is a congruence relation for certain Lisp primitives such as list membership, concatenation, and length. Here we will denote that a is a permutation of b by “a ≈ b”. When we say that the permutation predicate is a congruence relation for (the second argument of) list membership, we mean “a ≈ b → (x ∈ a) ↔ (x ∈ b)”. ACL2 supports congruence-based rewriting. When ACL2 rewrites an expression it does so in a context in which it is trying to maintain some given equivalence relation. Generally, at the top-level of a formula, it rewrites to maintain propositional equivalence. Because of the above congruence relation, when ACL2 rewrites an expression like “γ ∈ α” to maintain propositional equivalence (“↔”) it can rewrite α to maintain the permutation relation (“≈”). How does ACL2 rewrite maintaining “≈”? The answer is that it uses rewrite rules that use “≈” as their top-level predicate. For example, the theorem that reverse(x) ≈ x can be so used as a rewrite rule in the second argument of “∈”. Thus, “e ∈ reverse(x)” rewrites to “e ∈ x”. This rewrite rule about reverse (modulo ≈) is included in the first chapter of our book. In all, Chapter 1 contains about 40 definitions and theorems.
Chapter 2 deals with the idea of ordering lists of pairs by the “data” component. It defines the function “data” and the predicate “ordered” and also defines two other predicates. The first (“all-gte”) checks that one pair dominates all the pairs in a given list, in the sense that the pair’s data field is greater than or equal to that of each of the other pairs. The second (“all-all-gte”) checks that every pair in one list dominates all the pairs in another. These predicates are used in our formalizations of the two infinity properties. The chapter then lists about 20 theorems about these functions and predicates, including that permutation is a congruence relation for all-gte and all-all-gte, e.g., that a ≈ b → all-gte (p, a) ↔ all-gte (p, b), that the concatenation of two lists is ordered precisely when the two lists are ordered and all the elements of the first dominate those of the second, that a pair dominates the elements of the concatenation of two lists precisely when it dominates all the elements of each list, and that the list consisting of n repetitions of an element is ordered. A total of 25 events are in this chapter. Chapter 3 contains the six events defining min-pair, max-pair, cstep, cfeed, cdrain and csort. Chapter 4 establishes the basic properties of the above-mentioned functions, including that cstep, cfeed, and cdrain produce permutations of their arguments, the corollaries that the lengths of their outputs are suitably related to the lengths of their inputs, and that cstep has these three properties: cstep distributes over the concatenation of two lists if the first list has even length, cstep distributes over the concatenation of two lists if the second list is ordered and is dominated by the elements of the first list, and cstep is a no-op on ordered lists. Thirteen theorems are in this chapter. In Chapter 5 we are concerned with the invariant Φ. We define it and prove Φ(cdr(acc)) → Φ(cstep(acc)), and Φ(acc) → Φ(cfeed(lst, acc)). Two other lemmas are proved to help ACL2 to find the proofs of these two theorems. Finally, in Chapter 6 we prove the three theorems discussed in this paper. The theorem that csort produces an ordered permutation of its input is decomposed into two parts. The permutation part, csort(acc) ≈ acc, is trivial, given the work done in Chapter 4. The ordered part is ordered(csort(lst)) and is proved using the lemma: If n is a natural number such that n ≤ |acc| and acc has property Φ, then ordered(firstn(2 + n, cdrain(n, acc))). The Positive Infinity Property is proved using the lemma below. Suppose data (p1) ≥ data (p2). Suppose s is an ordered list of even length, p1 dominates every element of s, and every element of s dominates p2. Then cstep (cons (p1, append (s, cons (p2, acc)))) = cons (p1, append (s, cons (p2, cstep (acc)))). This lemma is the key simplification step in the proof discussed above of Positive Infinity Property Generalized, which is the next theorem proved in this chapter. It is necessary to tell ACL2
to use the particular induction scheme used in our discussion. The Positive Infinity Property is then proved by instantiation. The Negative Infinity Property relies on a similar, inductively proved generalization: Suppose acc is ordered. Suppose further that every element of lst dominates every element of acc and that every element of s dominates acc. Then cfeed (lst, append (s, acc)) = append (cfeed (lst, s), acc). The reader of the ACL2 book will note that we have recursively defined the concepts ordered and permutation. That is, ordered (x) is defined to scan down x by cdr and compare the data values of successive elements. Perm (x, y) scans down x by cdr and checks that each element, e, is a member of y and that the rest of x is, recursively, a permutation of the result of deleting one occurrence of e from y. Many readers find it more natural to define

ordered (x) ↔ ∀i, j : 0 ≤ i ≤ j < |x| → data (x[i]) ≥ data (x[j])

and

perm (x, y) ↔ ∀e : # (e, x) = # (e, y)

where # (e, x) determines how many times e occurs in x. In the appendix of the book we prove theorems weakly relating our recursive definitions to these quantified formulas and restating our main theorem about csort in these terms. The reader may wonder why we did not state the theorems in these terms in the first place. The reason has to do with ACL2’s lack of explicit quantification. Implicitly, all ACL2 formulas are universally quantified on the outside, but no quantifiers can be written. This allows some forms of (implicit) quantification, e.g., when one wishes to universally quantify in the conclusion of a theorem, but prohibits others, e.g., when one wishes to universally quantify in the hypothesis of a theorem. However, by defining predicates in closed form recursively, the ACL2 user can use the predicate without regard for its “logical parity” within the formula. To restate the main theorem as indicated above takes one definition (of “how many”) and five theorems (a technical lemma, the implication from ordered to the quantified form, the fact that permutation is a congruence for the how many function, the fact that csort preserves length, and the restated main theorem). To process all ninety-six of the definitions and theorems discussed in this paper takes ACL2 Version 2.8 about 3 seconds on a 2.40 GHz Intel Xeon.
Appendix: Proof Output

In this Appendix we present the proof of the Positive Infinity Property. We first show the definition of positive-infinity-hint. The function’s value is irrelevant, but the recursive scheme it uses illustrates the induction scheme required to prove the Positive Infinity Property Generalized. Each case in the recursive definition gives rise to a case in the induction scheme. Each recursive call in a given branch of the recursive definition gives rise to a corresponding inductive hypothesis (instance of the conjecture) in the corresponding case of the induction scheme. Any branch with no recursive call is a base case. After the definition of the hint function, we show the statement of the generalized property, including the user-supplied induction hint, and the initial part of the theorem prover’s proof output. The proof breaks into an induction step and a base case. The induction step simplifies slightly and then breaks into eight cases. We show the proofs of the first two cases as representatives of the whole. We then delete the remaining proof output to save space. Finally, we show the statement of the Positive Infinity Property and its proof, which is more or less immediate given the generalized lemma just proved.
The three commands – a definition and two theorems with hints – were typed by the user. All other material below was created by ACL2 Version 2.8 in response (except for the italicized comments indicating that proof output has been deleted). The verbose proof output is of interest for two reasons. First, it often helps the user, or others, understand how the theorem was proved. Second, when a proof attempt fails, it is crucial for the ACL2 user to read the failed proof attempt, following “The Method” of [Kaufmann et al., 2000b, Moore, 2004], and identify the lemmas or hints leading to a successful proof.

ACL2 >(defun positive-infinity-hint (lst s acc)
        (cond ((endp lst) (list s acc))
              (t (positive-infinity-hint (cdr lst)
                                         (cons (car lst)
                                               (append s (list (car acc))))
                                         (cstep (cdr acc))))))

The admission of POSITIVE-INFINITY-HINT is trivial, using the relation O< (which is known to be well-founded on the domain recognized by OP) and the measure (ACL2-COUNT LST). We observe that the type of POSITIVE-INFINITY-HINT is described by the theorem
(AND (CONSP (POSITIVE-INFINITY-HINT LST S ACC))
     (TRUE-LISTP (POSITIVE-INFINITY-HINT LST S ACC))).
We used primitive type reasoning.

Summary
Form:  ( DEFUN POSITIVE-INFINITY-HINT ...)
Rules: ((:FAKE-RUNE-FOR-TYPE-SET NIL))
Warnings:  None
Time:  0.00 seconds (prove: 0.00, print: 0.00, other: 0.00)
POSITIVE-INFINITY-HINT
ACL2 >(defthm positive-infinity-gen (implies (and (

From a complexity point of view, this is not surprising since Defn. 6.1 provides a polynomial algorithm and since the underlying logic is undecidable (thus deciding validity is impossible). This motivates the following ideal definition of the meaning of formulas in a given 3-valued logical structure:
Definition 6.4 Let F be a set of integrity formulas. Let S = ⟨U^S, ι^S⟩ be a 3-valued interpretation of the language of formulas over P. The supervaluational meaning of a closed formula ϕ, denoted by ⟨ϕ⟩^S_F, yields a truth value in {0, 1, 1/2} and is defined as follows:

⟨ϕ⟩^S_F = ⨅ { [[ϕ]]^{S′}_2([]) : S′ ∈ γ[F](S) }        (7)

where the meet ⨅ is taken in the information order; that is, the supervaluational value is a definite value exactly when ϕ evaluates to that same definite value in every concrete structure represented by S.
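To make the contrast with the compositional (Kleene) semantics concrete, here is a small Common Lisp sketch of the connectives and of the supervaluational combination of the 2-valued answers. The encoding of truth values as 0, 1/2, and 1 and the helper names are ours, not part of any particular analysis tool.

(defun k-and (a b) (min a b))   ; Kleene conjunction on {0, 1/2, 1}
(defun k-or  (a b) (max a b))   ; Kleene disjunction
(defun k-not (a)   (- 1 a))     ; Kleene negation

(defun superval (vals)
  "VALS are the 2-valued answers obtained in the concrete structures of
gamma[F](S); the supervaluational value is definite only if they all agree."
  (cond ((every (lambda (v) (eql v 1)) vals) 1)
        ((every (lambda (v) (eql v 0)) vals) 0)
        (t 1/2)))

;; (k-and 1 1/2)       => 1/2   ; the compositional value can be indefinite ...
;; (superval '(1 1 1)) => 1     ; ... while every concretization agrees on 1.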
Example 6.5 Consider the structure S from Figure 8. The closed formula ∃v1, v2 : x(v1) ∧ n(v1, v2) evaluates to 1/2 according to the compositional rules given in Defn. 6.1, since for the assignment [v1 ↦ u1, v2 ↦ u2], ι^S(x)(u1) = 1 and ι^S(n)(u1, u2) = 1/2, and for every other assignment the formula x(v1) ∧ n(v1, v2) evaluates to 0. In contrast, the supervaluational value of the formula evaluates to 1, since all nodes represented by u2 must have rn,x = 1 because of the integrity formula

∀v : rn,x(v) ⇐⇒ x(v) ∨ ∃v1 : x(v1) ∧ n+(v1, v)
Remark. Notice that Defn. 6.4 does not provide an algorithm to compute the supervaluational value of a given formula and that such an algorithm may not exist. In [Yorsh, 2003], it is shown that a theorem prover can be used to compute the supervaluational value, but the theorem prover need not terminate. Also, this definition can be formulated with respect to an arbitrary abstract domain. Finally, it is also possible to generalize this definition for formulas with free variables by evaluating the formula against all possible assignments.
6.3 Employing Semantic Reductions
We now describe an algorithm that approximates the supervaluational value of a given closed formula ϕ in a given 3-valued structure S. The algorithm operates in two phases: (i) it converts S into a set of 3-valued structures XS such that XS and S represent the same set of structures and the Kleene (compositional) value of ϕ is definite in all the structures in XS. This phase is called focus, since it brings the formula into focus. It can also be explained as a partial concretization since it yields 3-valued structures that are “concrete enough” to make the Kleene value of ϕ definite. (ii) it reduces the number of structures in XS by eliminating infeasible structures S′ ∈ XS, i.e., structures S′ such that γ[F](S′) = ∅. This phase is called coerce since it actually coerces the structure into one with more definite values, and in the degenerate case, eliminates infeasible structures. Usually the first phase is simple and the second phase can be expensive as it is global in nature. It is worthwhile noting that both of these phases are semantic-reduction operations (a concept originally introduced in [Cousot and Cousot, 1979]). That is, they convert a set of 3-valued structures into a more precise set of 3-valued structures that describe the same set of stores. The rest of this section explains these two phases.
Bringing Formulas Into Focus. In this section, we define an operation, called focus, that generates a set of structures on which all formulas of a given set F have definite values for all assignments. Unfortunately, focus potentially yields an infinite set of structures. [Lev-Ami, 2000] defines sufficient conditions under which the number of structures is finite, and it gives an algorithm to compute this set. In this paper, we only give a declarative specification of the desired properties of the focus operation.
Definition 6.6 Given a formula ϕ, a function op : 3-STRUCT[P] → 2^{3-STRUCT[P]} is a focus operation for ϕ if for every S ∈ 3-STRUCT[P], op(S) satisfies the following requirements:

• op(S) and S represent the same concrete structures, i.e., γ[F](S) = γ[F](op(S)).

• In each of the structures in op(S), ϕ has a definite value for every assignment, i.e., for every S′ ∈ op(S) and assignment Z, we have [[ϕ]]^{S′}_3(Z) ≠ 1/2.

In the above definition, Z maps the free variables of ϕ to individuals in structures S′ ∈ op(S). In particular, when ϕ has one designated free variable, v, Z maps v to an individual. As usual, when ϕ is a closed formula, the quantification over Z is superfluous.
Example 6.7 Figure 9 shows the result of focus_ϕ for the closed formula ϕ = ∃v1, v2 : x(v1) ∧ n(v1, v2) on the input structure shown in Figure 8. In S0, ϕ evaluates to 0. This structure is infeasible and will be removed by the coerce procedure described in the next section. In S1 and S2, ϕ evaluates to 1. Notice how these structures were constructed from the original structure by enumerating the cases in which ϕ evaluates to a definite value: S0 for definite value 0, S1 for definite value 1, when the summary node u2 is directly connected to u1 via an n field, and S2 for definite value 1, when the summary node u2 was bifurcated into two nodes: (i) u2.1, which is directly connected to u1 via an n field, and (ii) u2.0, which is not directly connected to u1 via an n field. After removing S0, we conclude that the value of ϕ is 1 in S1 and in S2, and thus in all the concrete structures represented by S. Remark. The structures in Figure 9 demonstrate the fact that focus is a “partial concretization” since it yields 3-valued structures which are “concrete enough” to make the Kleene value of ϕ definite. Each of these structures refines the original structure. Indeed, the utility of this operation is that it provides the ability to concretize the abstract value, usually without considering the entire (possibly infinite) set of concrete states.
Coerce. In the 2nd phase, we define the coerce operation which eliminates infeasible structures. The actual algorithm for coerce and the mathematical properties are described in [Lev-Ami, 2000, Sagiv et al., 2002]. This phase can be viewed as a simple theorem prover that applies sound but incomplete rules on an undecidable logic.
[Figure 9 shows the three structures produced by focus: S0, S1, and S2. Each contains the node labeled x, y, rn,x, rn,y and a summary region labeled rn,x, rn,y linked by n-edges; in S0 there is no n-edge from the x-node to the summary node, in S1 there is a definite n-edge from the x-node to the summary node, and in S2 the summary node is bifurcated into a node with a definite incoming n-edge and a summary node without one.]
Figure 9. The structures resulting from focusing on the formula ∃v1 , v2 : x(v1 ) ∧ n(v1 , v2 ) on the input structure shown in Figure 8.
We can, in many cases, sharpen some of the stored predicate values of 3-valued structures as stated in the following observation:
Observation 6.8 [The Sharpening Principle]. In any structure S, the value stored for ι^S(p)(u1, . . . , uk) should be at least as precise as the value of p's defining formula, ϕp, evaluated at u1, . . . , uk (i.e., [[ϕp]]^S_3([v1 ↦ u1, . . . , vk ↦ uk])). Furthermore, if ι^S(p)(u1, . . . , uk) has a definite value and ϕp evaluates to an incomparable definite value, then S is a 3-valued structure that does not represent any concrete structures at all. This observation motivates the subject of the remainder of this subsection, an investigation of compatibility constraints expressed in terms of a new connective, ‘⇒₁’.
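Before turning to the constraints themselves, the Sharpening Principle can be illustrated on individual truth values with a small Common Lisp sketch; the helper names and the :BOTTOM marker for infeasibility are ours.

(defun info-leq (a b)
  "A is no more definite than B in the information order (1/2 below 0 and 1)."
  (or (eql a 1/2) (eql a b)))

(defun sharpen (stored formula-value)
  "Return the more precise of the two values, or :BOTTOM when they are
incomparable definite values, i.e., the structure represents no concrete
structure at all."
  (cond ((info-leq stored formula-value) formula-value)
        ((info-leq formula-value stored) stored)
        (t :bottom)))

;; (sharpen 1/2 1) => 1        ; the stored value can be sharpened
;; (sharpen 0 1)   => :BOTTOM  ; incomparable definite values: infeasible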
Definition 6.9 A compatibility constraint is a term of the form ϕ1 =⇒₁ ϕ2, where ϕ1 is an arbitrary 3-valued formula, and ϕ2 is either an atomic formula or the negation of an atomic formula over distinct logical variables. We say that a 3-valued structure S and an assignment Z satisfy ϕ1 =⇒₁ ϕ2, denoted by S, Z |= ϕ1 =⇒₁ ϕ2, if whenever [[ϕ1]]^S_3(Z) = 1, we also have [[ϕ2]]^S_3(Z) = 1. (Note that if [[ϕ1]]^S_3(Z) equals 0 or 1/2, then S and Z satisfy ϕ1 =⇒₁ ϕ2, regardless of the value of [[ϕ2]]^S_3(Z).) We say that S satisfies ϕ1 =⇒₁ ϕ2, denoted by S |= ϕ1 =⇒₁ ϕ2, if for every Z we have S, Z |= ϕ1 =⇒₁ ϕ2. If Σ is a finite set of compatibility constraints, we write S |= Σ if S satisfies every constraint in Σ.
Compatibility constraints provide a way to express certain properties that are a consequence of the tight-embedding process, but that would not be expressible with formulas alone. For a 2-valued structure, =⇒₁ has the same meaning as implication. (That is, if S is a 2-valued structure, S, Z |= ϕ1 =⇒₁ ϕ2 iff S, Z |= ϕ1 =⇒ ϕ2.) However, for a 3-valued structure, =⇒₁ is stronger than implication: if ϕ1 evaluates to 1 and ϕ2 evaluates to 1/2, the constraint ϕ1 =⇒₁ ϕ2 is not satisfied. More precisely, suppose that
[[ϕ1]]^S_3(Z) = 1 and [[ϕ2]]^S_3(Z) = 1/2; then the implication ϕ1 =⇒ ϕ2 is satisfied (i.e., S, Z |= ϕ1 =⇒ ϕ2), but the constraint ϕ1 =⇒₁ ϕ2 is not satisfied (i.e., S, Z ⊭ ϕ1 =⇒₁ ϕ2).
In general, compatibility constraints are not expressible in Kleene’s logic (i.e., by means of a formula that simulates the connective =⇒₁). The reason is that formulas are monotonic in the information order, whereas =⇒₁ is non-monotonic in its right-hand-side argument. For instance, the constraint 1 =⇒₁ p is satisfied in the structure S = ⟨∅, [p ↦ 1]⟩; however, it is not satisfied in the less precise structure S′ = ⟨∅, [p ↦ 1/2]⟩, into which S can be embedded.
Thus, in 3-valued logic, compatibility constraints are in some sense “better” than first-order formulas. Fortunately, compatibility constraints can be generated automatically from first-order formulas that express integrity rules (see Section 5.1). The following definition supplies a way to convert formulas into constraints:
Definition 6.10 Let ϕ be a closed formula, and (where applicable below) let a be an atomic formula such that a contains no repetitions of logical variables. Then the constraint generated from ϕ, denoted by r(ϕ), is defined as follows:
r(ϕ) = ϕ1 =⇒₁ a      if ϕ ≡ ∀v1, . . . , vk : (ϕ1 =⇒ a)      (8)
r(ϕ) = ϕ1 =⇒₁ ¬a     if ϕ ≡ ∀v1, . . . , vk : (ϕ1 =⇒ ¬a)     (9)
r(ϕ) = ¬ϕ =⇒₁ 0      otherwise                                (10)
For a set of formulas F, we define r̂(F) to be the set of constraints generated from the formulas in F (i.e., {r(ϕ) | ϕ ∈ F}). The intuition behind (8) and (9) is that, for an atomic predicate, a tight embedding yields 1/2 only in cases in which a evaluates to 1 on one tuple of values for v1, . . . , vk, but evaluates to 0 on a different tuple of values. In this case, the left-hand side will evaluate to 1/2 as well. Rule (10) is included to enable an arbitrary formula to be converted to a constraint.
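As a small, hedged illustration of how a generated constraint is checked (none of this code comes from TVLA; the formulas are supplied as plain Python callables that return Kleene values), the sketch below tests S |= ϕ1 =⇒₁ ϕ2 by quantifying over all assignments and rejecting exactly those where the left-hand side is definitely 1 but the right-hand side is not:

from itertools import product
from fractions import Fraction

HALF = Fraction(1, 2)

def satisfies_constraint(universe, lhs, rhs, arity):
    # Check S |= lhs ==>_1 rhs: the constraint fails exactly for an
    # assignment where lhs is definitely 1 while rhs is not definitely 1
    # (so rhs equal to 0 or to 1/2 both violate it).
    for assignment in product(universe, repeat=arity):
        if lhs(assignment) == 1 and rhs(assignment) != 1:
            return False
    return True

# Ordinary Kleene implication would accept a 1/2 right-hand side;
# the constraint 1 ==>_1 p does not.
universe = ["u"]
always_true = lambda a: 1
p_half = lambda a: HALF
print(satisfies_constraint(universe, always_true, p_half, arity=1))   # False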
Example 6.11 For reachability, from (6), the following constraints are generated:
for each x ∈ PVar : x(v) ∨ ∃v1 : x(v1) ∧ n+(v1, v) =⇒₁ rn,x(v)      (11)
for each x ∈ PVar : ¬(x(v) ∨ ∃v1 : x(v1) ∧ n+(v1, v)) =⇒₁ ¬rn,x(v)      (12)
The constraint-generation rules defined in Defn. 6.10 generate interesting constraints only for certain specific syntactic forms, namely implications with
exactly one (possibly negated) predicate symbol on the right-hand side. Thus, when we generate compatibility constraints from compatibility formulas written as implications (cf. Tables 4 and 6), the set of constraints generated depends on the form in which the compatibility formulas are written. In particular, not all of the many equivalent forms possible for a given compatibility formula lead to useful constraints. For instance, r(∀v1, . . . , vk : (ϕ =⇒ a)) yields the (useful) constraint ϕ =⇒₁ a, but r(∀v1, . . . , vk : (¬ϕ ∨ a)) yields the (not useful) constraint ¬(¬ϕ ∨ a) =⇒₁ 0. This phenomenon can prevent an instantiation of the shape-analysis framework from having a suitable compatibility constraint at its disposal that would otherwise allow it to sharpen or discard a structure that arises during the analysis, and hence can lead to a shape-analysis algorithm that is more conservative than we would like. However, when compatibility formulas are written as “clauses” (see Defn. 6.12 below), the way around this difficulty is to augment the constraint-generation process to generate constraints for some of the logical consequences of each compatibility formula. The process of “generating some of the logical consequences for clauses” is formalized as follows:
Definition 6.12 For a formula ϕ, we define ϕ^1 ≡ ϕ and ϕ^0 ≡ ¬ϕ. We say that a formula ϕ of the form
∀ . . . : ⋁_{i=1}^{m} (ϕ_i)^{B_i},
where m > 1 and B_i ∈ {0, 1}, is a clause. We define the closure of ϕ, denoted by closure(ϕ), to be the following set of formulas:
closure(ϕ) =def { ∀ . . . : (∃v1, v2, . . . , vn : ⋀_{i=1, i≠j}^{m} ϕ_i^{1−B_i}) =⇒ ϕ_j^{B_j}  |  1 ≤ j ≤ m, vk ∈ FV(ϕ), vk ∉ FV(ϕ_j) }      (13)
For a formula ϕ that is not a clause, closure(ϕ) = {ϕ}. Finally, for a set of formulas F, we write closure(F) to denote the result of applying closure to every formula in F. It is easy to see that the formulas in closure(ϕ) are implied by ϕ.
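The closure construction of Defn. 6.12 is essentially syntactic: for each literal of a clause, negate and conjoin the remaining literals as the premise and keep the chosen literal as the conclusion. The following sketch (illustrative only; literals are plain strings, and the ∀/∃ bookkeeping over bound variables is deliberately elided) shows the core of the construction:

def closure(literals):
    # A clause is a list of (formula, polarity) pairs, i.e. the disjunction
    # of formula (polarity 1) or its negation (polarity 0).  For every
    # literal, emit one implication whose premise is the conjunction of
    # the remaining literals, each with its polarity flipped.
    def lit(f, b):
        return f if b == 1 else "not(%s)" % f

    implications = []
    for j, (f_j, b_j) in enumerate(literals):
        premise = " and ".join(
            lit(f_i, 1 - b_i)
            for i, (f_i, b_i) in enumerate(literals) if i != j
        )
        implications.append("(%s) ==> %s" % (premise, lit(f_j, b_j)))
    return implications

# Clause (18): forall v1, v2 : not x(v1) or not x(v2) or eq(v1,v2)
for imp in closure([("x(v1)", 0), ("x(v2)", 0), ("eq(v1,v2)", 1)]):
    print(imp)

Running it on clause (18) produces implications corresponding to formulas (2), (14), and (15), up to variable naming.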
Example 6.13 The formulas listed in Table 8 are the compatibility formulas closure(FList) generated via Defn. 6.12 when the two implication formulas in FList, (2) and (3), are expressed as the following clauses (i.e., by rewriting the implications as disjunctions and then applying De Morgan’s laws):
for each x ∈ PVar , ∀v1, v2 : ¬x(v1) ∨ ¬x(v2) ∨ eq(v1, v2)      (18)
∀v1, v2, v3 : ¬n(v3, v1) ∨ ¬n(v3, v2) ∨ eq(v1, v2)      (19)
From (18) and (19), Defn. 6.12 generates the final six compatibility formulas shown in Table 8.
Table 8. The compatibility formulas closure(FList) generated via Defn. 6.12 when the two implication formulas in FList, (2) and (3), are expressed as the clauses (18) and (19), respectively. (Note that the systematic application of Defn. 6.12 leads, in this case, to two pairs of formulas that differ only in the names of their bound variables: (14)/(15) and (16)/(17).)
for each x ∈ PVar , ∀v1, v2 : x(v1) ∧ x(v2) =⇒ eq(v1, v2)      (2)
for each x ∈ PVar , ∀v2 : (∃v1 : x(v1) ∧ ¬eq(v1, v2)) =⇒ ¬x(v2)      (14)
for each x ∈ PVar , ∀v1 : (∃v2 : x(v2) ∧ ¬eq(v1, v2)) =⇒ ¬x(v1)      (15)
∀v1, v2 : (∃v3 : n(v3, v1) ∧ n(v3, v2)) =⇒ eq(v1, v2)      (3)
∀v2, v3 : (∃v1 : n(v3, v1) ∧ ¬eq(v1, v2)) =⇒ ¬n(v3, v2)      (16)
∀v1, v3 : (∃v2 : n(v3, v2) ∧ ¬eq(v1, v2)) =⇒ ¬n(v3, v1)      (17)
By Defn. 6.10, these yield the following compatibility constraints:
for each x ∈ PVar , x(v1) ∧ x(v2) =⇒₁ eq(v1, v2)      (2)
for each x ∈ PVar , (∃v1 : x(v1) ∧ ¬eq(v1, v2)) =⇒₁ ¬x(v2)      (20)
for each x ∈ PVar , (∃v2 : x(v2) ∧ ¬eq(v1, v2)) =⇒₁ ¬x(v1)      (21)
(∃v3 : n(v3, v1) ∧ n(v3, v2)) =⇒₁ eq(v1, v2)      (3)
(∃v1 : n(v3, v1) ∧ ¬eq(v1, v2)) =⇒₁ ¬n(v3, v2)      (22)
(∃v2 : n(v3, v2) ∧ ¬eq(v1, v2)) =⇒₁ ¬n(v3, v1)      (23)
Similarly, after (4) is rewritten as the following clause
∀v, v1, v2 : ¬n(v1, v) ∨ ¬n(v2, v) ∨ eq(v1, v2) ∨ is(v)
we obtain the following compatibility constraints:
(∃v1, v2 : n(v1, v) ∧ n(v2, v) ∧ ¬eq(v1, v2)) =⇒₁ is(v)      (24)
(∃v1 : n(v1, v) ∧ ¬eq(v1, v2) ∧ ¬is(v)) =⇒₁ ¬n(v2, v)      (25)
(∃v2 : n(v2, v) ∧ ¬eq(v1, v2) ∧ ¬is(v)) =⇒₁ ¬n(v1, v)      (26)
(∃v : n(v1, v) ∧ n(v2, v) ∧ ¬is(v)) =⇒₁ eq(v1, v2)      (27)
Henceforth, we assume that closure has been applied to all sets of compatibility formulas.
Definition 6.14 (Compatible 3-Valued Structures). Given a set of compatibility formulas F, the set of compatible 3-valued logical structures 3-CSTRUCT[P, r̂(F)] ⊆ 3-STRUCT[P] is defined by: S ∈ 3-CSTRUCT[P, r̂(F)] iff S |= r̂(F).
The Coerce Operation. We are now ready to define how the coerce operation works.
Definition 6.15 The operation coerce_r̂(F) : 3-STRUCT[P] → 3-CSTRUCT[P, r̂(F)] ∪ {⊥} is defined as follows: coerce_r̂(F)(S) =def the maximal S′ such that S′ ⊑ S, U^{S′} = U^S, and S′ ∈ 3-CSTRUCT[P, r̂(F)], or ⊥ if no such S′ exists. (We will simply write coerce when r̂(F) is clear from the context.)
It is a fact that the maximal such structure S′ is unique (if it exists), which follows from the observation that compatible structures with the same universe of individuals are closed under the following join operation:
Definition 6.16 For every pair of structures S1, S2 ∈ 3-CSTRUCT[P, r̂(F)] such that U^{S1} = U^{S2} = U, the join of S1 and S2, denoted by S1 ⊔ S2, is defined as follows:
S1 ⊔ S2 =def ⟨U, λp.λu1, u2, . . . , um . ι^{S1}(p)(u1, u2, . . . , um) ⊔ ι^{S2}(p)(u1, u2, . . . , um)⟩.
Because coerce can result in at most one structure, its definition does not involve a set former, in contrast to focus, which can return a non-singleton set. The significance of this is that only focus can increase the number of structures that arise during shape analysis, whereas coerce cannot.
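A sketch of the join used in Defn. 6.16 (illustrative Python, assuming structures are dictionaries from predicate names to tables of Kleene values over a shared universe): the join is computed pointwise, and any disagreement between definite values yields 1/2.

from fractions import Fraction

HALF = Fraction(1, 2)

def kleene_join(a, b):
    # Join in the information order: agreeing definite values are kept,
    # any disagreement (or an already indefinite value) gives 1/2.
    return a if a == b else HALF

def join_structures(s1, s2):
    # Pointwise join of two structures over the same universe; each
    # structure maps a predicate name to a table of Kleene values.
    joined = {}
    for pred in s1:
        keys = set(s1[pred]) | set(s2[pred])
        joined[pred] = {k: kleene_join(s1[pred].get(k, HALF),
                                       s2[pred].get(k, HALF))
                        for k in keys}
    return joined

s1 = {"n": {("u1", "u2"): 1, ("u2", "u2"): 0}}
s2 = {"n": {("u1", "u2"): 1, ("u2", "u2"): 1}}
print(join_structures(s1, s2))   # n(u1,u2) stays 1, n(u2,u2) becomes 1/2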
Example 6.17 The application of coerce to the structures S0, S1, and S2 yields Sc,1 and Sc,2, shown in Figure 10.
Figure 10. The structures Sc,1 and Sc,2 resulting from coercing the input structures shown in Figure 9.
• The structure S0 is discarded, since there exists no structure that can be embedded into it and that satisfies constraint (12).
• The structure Sc,1 was obtained from S1 by removing incompatibilities as follows:
1. Consider the assignment [v ↦ u, v1 ↦ u1, v2 ↦ u2]. Because ι(n)(u1, u) = 1, u1 ≠ u2, and ι(is)(u) = 0, constraint (25) implies that ι(n)(u, u) must equal 0. Thus, in Sc,1 the (indefinite) n edge from u to u has been removed.
2. Consider the assignment [v1 ↦ u, v2 ↦ u2]. Because ι(y)(u) = 1, constraint (2) implies that [[eq(v1, v2)]]^{Sc,1}_3([v1 ↦ u2, v2 ↦ u2]) must equal 1. By Defn. 6.1, this means that ι^{Sc,1}(eq)(u2, u2) must equal 1. Thus, in Sc,1, u2 is no longer a summary node.
• The structure Sc,2 was obtained from S2 by removing incompatibilities as follows:
1. Consider the assignment [v ↦ u2.1, v1 ↦ u1, v2 ↦ u2.0]. Since ι(n)(u1, u2.1) = 1, u1 ≠ u2.0, and ι(is)(u2.1) = 0, constraint (25) implies that ι^{Sc,2}(n)(u2.0, u2.1) must equal 0. Thus, in Sc,2 the (indefinite) n edge from u2.0 to u2.1 has been removed.
2. Consider the assignment [v ↦ u2.1, v1 ↦ u1, v2 ↦ u2.1]. Because ι(n)(u1, u2.1) = 1, u1 ≠ u2.1, and ι(is)(u2.1) = 0, constraint (25) implies that ι^{Sc,2}(n)(u2.1, u2.1) must equal 0. Thus, in Sc,2 the (indefinite) self n edge from u2.1 to u2.1 has been removed.
3. Consider the assignment [v1 ↦ u2.1, v2 ↦ u2.1]. Because ι(y)(u2.1) = 1, constraint (2) implies that [[eq(v1, v2)]]^{Sc,2}_3([v1 ↦ u2.1, v2 ↦ u2.1]) = 1. By Defn. 6.1, this means that ι^{Sc,2}(eq)(u2.1, u2.1) must equal 1. Thus, in Sc,2, u2.1 is no longer a summary node.
Since in Sc,1 and in Sc,2 the Kleene value of ϕ is 1, we can safely conclude that the formula evaluates to 1 on the original structure shown in Figure 8.
6.4 Summary
In this section, we have shown three methods to extract information, expressed as an FOTC formula ϕ, from a 3-valued structure S: (i) Evaluate ϕ against S using the Kleene evaluation rules. This is the most efficient way, but the least precise: it can return 1/2 even when ϕ evaluates to 1 (respectively 0) in all the concrete structures represented by S. (ii) Evaluate ϕ against S using supervaluational semantics. This can be implemented with the help of theorem provers, but it need not terminate. (iii) Apply focus_ϕ to S, then apply coerce to all the resulting structures, and finally evaluate ϕ against the coerced structures. If ϕ evaluates to the same value in all these structures, return this value; otherwise return 1/2.
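The control flow of method (iii) can be sketched as follows (illustrative only; focus, coerce, and evaluate are assumed to be supplied by the analysis engine, and coerce is assumed to return None for an infeasible structure):

from fractions import Fraction

HALF = Fraction(1, 2)

def extract(structure, phi, focus, coerce, evaluate):
    # Method (iii): focus on phi, drop infeasible structures with coerce,
    # evaluate phi on the survivors, and return the common definite value
    # if there is one, or 1/2 otherwise.
    answers = set()
    for focused in focus(structure, phi):
        coerced = coerce(focused)
        if coerced is None:          # infeasible structure: discard
            continue
        answers.add(evaluate(coerced, phi))
    return answers.pop() if len(answers) == 1 else HALF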
Figure 11. The 3-valued structure resulting from applying the formulas that define the meaning of the statement y = y->n; to the 3-valued structure shown in Figure 8.
7. Abstract Interpretations of Statements
In the previous section, we showed three sound methods for extracting information, expressed as FOTC formulas, from 3-valued structures. In this section, we show that since the semantics of statements is also defined using formulas, we can adapt these methods to calculate the effect of program statements and conditions. This solves a problem that was open for a long time in shape analysis: how to calculate the effect of program statements in a conservative and reasonably precise way, and, in particular, how to perform strong updates.
The remainder of this section is organized as follows: Section 7.1 discusses the Kleene interpretation of statements; Section 7.2 discusses the best interpretation of statements [Cousot and Cousot, 1979]; finally, Section 7.3 describes a realistic solution using focus and coerce.
7.1 Kleene Conservative Interpretation of Statements
The simplest and most efficient way to interpret statements and conditions is to reevaluate, using the Kleene interpretation, the logical formulas that define the operational semantics. More specifically, whenever the precondition is potentially satisfied (i.e., has the value 1 or 1/2), the right-hand side of the assignment is evaluated and the assignment is interpreted. For more details the reader is referred to [Lev-Ami, 2000].
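For instance, the pointer update y = y->n can be interpreted by re-evaluating a predicate-update formula for y under the Kleene semantics; the formula used below, y′(v) = ∃v1 : y(v1) ∧ n(v1, v), is a common choice but is shown here only as an illustration, with conjunction as min and existential quantification as max over the values 0, 1/2, 1:

from fractions import Fraction

HALF = Fraction(1, 2)

def update_y(structure, universe):
    # Kleene evaluation of y'(v) = exists v1 : y(v1) and n(v1, v),
    # with conjunction as min and existential quantification as max.
    y, n = structure["y"], structure["n"]
    return {v: max(min(y.get(v1, 0), n.get((v1, v), 0)) for v1 in universe)
            for v in universe}

# u1 is pointed to by y; u is a summary node with indefinite n edges.
universe = ["u1", "u"]
S = {"y": {"u1": 1, "u": 0},
     "n": {("u1", "u"): HALF, ("u", "u"): HALF}}
print(update_y(S, universe))   # y(u1) becomes 0, y(u) becomes 1/2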
Example 7.1 Figure 11 exemplifies the Kleene evaluation of the formulas defining the statement y = y->n; in the insert program. The precondition formula evaluates to 1/2, and the system needs to issue a warning that the precondition may not be satisfied. Then, y(u2) is set to 1/2, since the formula eq(u2, u2) evaluates to 1/2. The formulas for updating reachability are shown in [Reps et al., 2003]. Intuitively, after this statement rn,y becomes false for u1, since u1 is no longer reachable from y.
Figure 12. The application of the best transformer of the statement y = y->n to the 3-valued structure shown in Figure 8. (The rows of the original figure, labelled conc., assign., and abs., show the concretized structures, the structures after the assignment, and the structures after abstraction.)
7.2 The Best Conservative Interpretation
Because canonical abstraction introduces a Galois connection, the best conservative (also called induced) interpretation of a statement st on a 3-valued structure S can be defined as suggested in [Cousot and Cousot, 1979], by applying the concrete semantics of st to every concrete structure in γ[F ](S) and then abstracting the resultant set of structures.
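A sketch of the induced transformer α ∘ post ∘ γ (illustrative only: γ(S) is in general an infinite set, which is precisely why the best transformer is not directly computable and is instead approximated, or computed symbolically with a theorem prover):

def best_transformer(abstract_value, gamma, post, alpha):
    # Induced ("best") transformer alpha . post . gamma, in the sense of
    # [Cousot and Cousot, 1979]: concretize, apply the concrete semantics
    # of the statement, and abstract the resulting set of structures.
    concrete = gamma(abstract_value)        # in general an infinite set
    return alpha([post(c) for c in concrete])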
Example 7.2 Figure 12 demonstrates the application of the best transformer of the statement y = y->n to the 3-valued structure shown in Figure 8. The structures represent lists of length 2 or more pointed to by x and y. After the assignment y = y->n;, there are two possible canonical 3-valued structures: a list with exactly two elements, in which y points to the second element, and lists of length 3 or more, in which the second element is pointed to by y.
Remark. As in the case of supervaluation, the best transformer for canonical abstraction can be implemented with the help of a theorem prover [Yorsh et al., 2004]. Indeed, the best transformer is analogous to supervaluation.
7.3 A Realistic Conservative Interpretation
The realistic solution implemented in TVLA is to apply focus to the precondition formulas, followed by coerce, and then to use normal Kleene evaluation.
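The composition can be sketched as follows (illustrative; the four operations and the statement object carrying its precondition formula are assumptions, standing in for what TVLA provides):

def realistic_transform(structure, statement, focus, coerce, kleene_update, blur):
    # Focus on the statement's precondition formula, discard infeasible
    # structures with coerce, apply the Kleene update, and re-abstract
    # (blur performs canonical abstraction).
    results = []
    for focused in focus(structure, statement.precondition):
        coerced = coerce(focused)
        if coerced is None:
            continue
        results.append(blur(kleene_update(coerced, statement)))
    return results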
Figure 13. The application of a realistic transformer of the statement y = y->n to the 3-valued structure shown in Figure 8. First, we focus on the formula ∃w1, w2 : t(w1) ∧ n(w1, w2). Then, we apply coerce. Then, the statement is interpreted using Kleene evaluation. Finally, canonical abstraction is applied. (The rows of the original figure show the structures after focus, coerce, the assignment, and abstraction.)
In fact, the TVLA user can control which of these operations is applied. For details we refer the reader to [Lev-Ami, 2000].
Example 7.3 Figure 13 demonstrates the application of the realistic transformer of the statement y = y->n to the 3-valued structure shown in Figure 8.
8. Applications and Extensions
This section describes several applications and extensions of the parametric logic-based analysis framework.
8.1 Interprocedural Analysis
[Rinetskey and Sagiv, 2001] handles procedures by explicitly representing stacks of activation records as linked lists, allowing rather precise analysis of
recursive procedures. [Jeannet et al., 2004] handles procedures by summarizing their behavior. [Rinetzky et al., 2005] presents a new concrete semantics for heap manipulating programs, which only passes “local” heaps to procedures. This semantics is extended in [Rinetzky et al., 2004] to perform more modular summarization by only representing reachable parts of the heap.
8.2 Concurrent Java Programs
[Yahav, 2001] presents a general framework for proving safety properties of concurrent Java programs with an unbounded number of objects and threads. In [Yahav and Sagiv, 2003] it is applied to verify partial correctness of concurrent-queue implementations.
8.3 Temporal Properties
[Yahav et al., 2003] proposes a general framework for proving temporal properties of programs by representing program traces as logical structures. A more efficient technique for proving local temporal properties is presented in [Shaham et al., 2003] and applied to compile-time garbage collection in Javacard programs.
8.4 Correctness of Sorting Implementations
In [Lev-Ami et al., 2000], TVLA is applied to analyze programs that sort linked lists. It is shown that the analysis is precise enough to discover that correct versions of bubble-sort and insertion-sort procedures do, in fact, produce correctly sorted lists as outputs, and that the invariant “is-sorted” is maintained by list-manipulation operations such as merge. In addition, it is shown that when the analysis is applied to erroneous versions of bubble-sort and insertion-sort procedures, it is able to discover, and sometimes even locate and diagnose, the error. In [Loginov et al., 2004], abstraction refinement is used to automatically derive abstractions that are successfully used to prove partial correctness of several sorting algorithms. The derived abstractions are used to prove that the algorithms possess additional properties, such as stability and antistability.
8.5 Conformance to API Specifications
[Ramalingam et al., 2002] shows how to verify that client programs using a library conform to the library’s API specifications. In particular, an analysis is provided for verifying the absence of concurrent-modification exceptions in Java programs that use Java collections and iterators. In [Yahav and Ramalingam, 2004], separation and heterogeneous abstraction are used to scale the verification algorithms and to allow verification of larger programs using libraries such as JDBC.
8.6 Computing Intersections of Shape Graphs
[Arnold, 2004] considers the problem of computing the intersection (meet) of heap abstractions, namely the common value of a set of abstract stores. This problem proves to have many applications in program analysis such as interpreting program conditions, refining abstract configurations, reasoning about procedures, and proving temporal properties of heap-manipulating programs, either via greatest fixed point approximation over trace semantics or in a staged manner over the collecting semantics. [Arnold, 2004] describes a constructive formulation of meet that is based on finding certain relations between abstract heap objects. The enumeration of those relations is reduced to finding constrained matchings over bipartite graphs.
8.7 Efficient Heap Abstractions and Representations
[Manevich et al., 2002] addresses the problem of space consumption in first-order state representations by describing and evaluating two new representation techniques for logical structures. One technique uses ordered binary decision diagrams (OBDDs); the other uses a variant of a functional map data structure. The results show that both the OBDD and functional implementations reduce space consumption in TVLA by a factor of 4 to 10 relative to the current TVLA state representation, without compromising analysis time. [Manevich et al., 2004] presents a new heap abstraction that works by merging shape descriptors according to a partial-isomorphism similarity criterion, resulting in a partially disjunctive abstraction. This abstraction usually provides superior performance compared to the powerset heap abstraction, practically without a significant loss of precision. [Manevich et al., 2005] provides a family of simple abstractions for potentially cyclic linked lists. In particular, it provides a relatively efficient predicate abstraction that allows verification of programs manipulating potentially cyclic linked lists.
8.8 Abstracting Numerical Values
In this paper, we ignore numerical values in programs, and thus our analysis can become imprecise for programs that perform numerical computations. [Gopan et al., 2004] presents a generic solution for combining abstractions of numeric and heap-allocated storage. This solution has been integrated into TVLA. Indeed, in [Gopan et al., 2005], a new abstraction of numeric values is presented which, like canonical abstraction, tracks correlations between aggregates and not just indices. For example, it can perform array kills that assign values to a whole array.
Figure 14. (a) Transition diagram for a stoplight; (b) transition diagram abstracted via the method of [Clarke et al., 2000] when green and yellow are mapped to go; (c) transition diagram abstracted via canonical abstraction, using red(v) as the only abstraction predicate.
9. The TVLA system
TVLA is a system for generating implementations of static analysis algorithms, successfully used for a wide range of applications. Several aspects contributed to the usefulness of the system:
Firm theoretical background. TVLA is based on the theoretical framework of [Sagiv et al., 2002], which provides a proof of soundness via the embedding theorem. This relieves users from having to prove the soundness of the analysis.
Powerful meta-language. The language of first-order logic with transitive closure is highly expressive. Users can specify different verification properties, and model the semantics of different programming languages and different programming paradigms.
Automation and flexibility. TVLA generates several ingredients that are essential for a precise static analysis. Users can tune the precision and control the cost of the generated algorithm.
Although TVLA is useful for solving different problems, it has certain limitations. The cost of the generated algorithm can be quite prohibitive, preventing analysis of large programs. Some of the costs can be reduced by better engineering certain components, and other costs can be reduced by developing more efficient abstract transformers. The problem of generating more precise algorithms deserves further research.
10. Related Work
Existential abstraction. Canonical abstraction is also related to the notion of existential abstraction used in [Clarke et al., 1994, Clarke et al., 2000]. However, canonical abstraction yields 3-valued predicates and distinguishes summary nodes from non-summary nodes, whereas existential abstraction yields 2-valued predicates and does not distinguish summary nodes from non-summary nodes. Figure 14 shows the transition diagram for a stoplight (an
example used in [Clarke et al., 2000]) abstracted via the method of [Clarke et al., 2000] (Figure 14(b)) and via canonical abstraction, using red(v) as the only abstraction predicate (Figure 14(c)). With existential abstraction, soundness is preserved by restricting attention to universal formulas (formulas in ACTL∗). With canonical abstraction, soundness is also preserved by switching logics, although in this case there is no syntactic restriction; we switch from 2-valued first-order logic to 3-valued first-order logic. An advantage of this approach is that if ϕ is any formula for a query about a concrete state, the same syntactic formula ϕ can be used to pose the same query about an abstract state.
One-sided versus two-sided answers. Most static-analysis algorithms provide 2-valued answers, but are one-sided: an answer is definite on one value and conservative on the other. That is, either 0 means 0 and 1 means “maybe”, or 1 means 1 and 0 means “maybe”. In contrast, by basing the abstract semantics on 3-valued logic, definite truth and definite falseness can both be tracked, with 1/2 capturing indefiniteness. (To determine whether a formula ϕ holds at P, it is evaluated in each of the structures that are collected at P. The answer is the join of these values.) This provides insight into the true nature of the one-sided approach. For instance, an analysis that is definite with respect to 1 is really a 3-valued analysis that conflates 0 and 1/2 (and uses 0 in place of 1/2). (It should be noted that with a two-sided analysis, the answers 0 and 1 are definite with respect to the concrete semantics as specified, which may itself overapproximate the behavior of the actual system being modeled.)
11. Conclusion
Reasoning about programs that manipulate the heap is one of the greatest challenges in the area of programming languages and systems. In previous work [Sagiv et al., 1998], we verified the soundness of a specific abstraction. This was a very difficult, tedious, and error-prone process. In this paper, we summarized our experience with developing generic heap abstractions, which relieve the specifier of the burden of proving the soundness of his or her approach.
Acknowledgements We are grateful for the contributions of J. Bauer, R. Biber, N. Dor, J. Field, D. Gopan, D. Goyal, N. Immerman, B. Jeannet, T. Lev-Ami, A. Loginov, R. Manevich, F. Nielson, H.R. Nielson, A. Rabinovich, G. Ramalingam, N. Rinetzky, R. Shaham, A. Varshavsky, and G. Yorsh.
Notes
1. The term “heap” refers to the collection of nodes in, and allocated from, the free-storage pool.
2. Without loss of generality, we exclude constant and function symbols. Constant symbols can be encoded via unary predicates, and n-ary functions via (n + 1)-ary predicates.
References [url, ] TVLA system. “http://www.cs.tau.ac.il/∼TVLA/”. [Andersen, 1993] Andersen, L. O. (1993). Binding-time analysis and the taming of C pointers. In Part. Eval. and Semantics-Based Prog. Manip., pages 47–58. [Arnold, 2004] Arnold, G. (2004). Combining heap analyses by intersecting abstractions. Master’s thesis, Tel Aviv University. [Ball et al., 2001] Ball, T., Majumdar, R., Millstein, T., and Rajamani, S. (2001). Automatic predicate abstraction of C programs. In Prog. Lang. Design and Impl., New York, NY. ACM Press. [Ball and Rajamani, 2001] Ball, T. and Rajamani, S. (2001). The SLAM toolkit. In Int. Conf. on Computer Aided Verif., volume 2102 of Lec. Notes in Comp. Sci., pages 260–264. [Bush et al., 2000] Bush, W., Pincus, J., and Sielaff, D. (2000). A static analyzer for finding dynamic programming errors. Software–Practice&Experience, 30:775–802. [Chase et al., 1990] Chase, D., Wegman, M., and Zadeck, F. (1990). Analysis of pointers and structures. In Prog. Lang. Design and Impl., pages 296–310, New York, NY. ACM Press. [Chen and Wagner, 2002] Chen, H. and Wagner, D. (2002). MOPS: An infrastructure for examining security properties of software. In Conf. on Comp. and Commun. Sec., pages 235– 244. [Cheng and Hwu, 2000] Cheng, B.-C. and Hwu, W. (2000). Modular interprocedural pointer analysis using access paths: Design, implementation, and evaluation. In Prog. Lang. Design and Impl., pages 57–69. [Clarke et al., 2000] Clarke, E., Grumberg, O., Jha, S., Lu, Y., and Veith, H. (2000). Counterexample-guided abstraction refinement. In Int. Conf. on Computer Aided Verif., pages 154–169. [Clarke et al., 1994] Clarke, E., Grumberg, O., and Long, D. (1994). Model checking and abstraction. Trans. on Prog. Lang. and Syst., 16(5):1512–1542. [Corbett et al., 2000] Corbett, J., Dwyer, M., Hatcliff, J., Laubach, S., Pasareanu, C., Robby, and Zheng, H. (2000). Bandera: Extracting finite-state models from Java source code. In Int. Conf. on Softw. Eng., pages 439–448. [Cousot and Cousot, 1977] Cousot, P. and Cousot, R. (1977). Abstract interpretation: A unified lattice model for static analysis of programs by construction of approximation of fixed points. In Princ. of Prog. Lang., pages 238–252. [Cousot and Cousot, 1979] Cousot, P. and Cousot, R. (1979). Systematic design of program analysis frameworks. In Princ. of Prog. Lang., pages 269–282. [Das, 2000] Das, M. (2000). Unification-based pointer analysis with directional assignments. In Prog. Lang. Design and Impl., pages 35–46. [Deutsch, 1994] Deutsch, A. (1994). Interprocedural may-alias analysis for pointers: Beyond k-limiting. In Prog. Lang. Design and Impl., pages 230–241, New York, NY. ACM Press. [Engler et al., 2000] Engler, D., Chelf, B., Chou, A., and Hallem, S. (2000). Checking system rules using system-specific, programmer-written compiler extensions. In Op. Syst. Design and Impl., pages 1–16.
[Fähndrich et al., 2000] Fähndrich, M., Rehof, J., and Das, M. (2000). Scalable contextsensitive flow analysis using instantiation constraints. In Prog. Lang. Design and Impl., pages 253–263. [Foster et al., 2000] Foster, J., Fähndrich, M., and Aiken, A. (2000). Polymorphic versus monomorphic flow-insensitive points-to analysis for C. In Static Analysis Symp., pages 175–198. [Ginsberg, 1988] Ginsberg, M. (1988). Multivalued logics: A uniform approach to inference in artificial intelligence. Comp. Intell., 4:265–316. [Gopan et al., 2004] Gopan, D., DiMaio, F., N.Dor, Reps, T., and Sagiv, M. (2004). Numeric domains with summarized dimensions. In Tools and Algs. for the Construct. and Anal. of Syst., pages 512–529. [Gopan et al., 2005] Gopan, D., Reps, T., and Sagiv, M. (2005). Numeric analysis of array operations. In Princ. of Prog. Lang. [Graf and Saïdi, 1997] Graf, S. and Saïdi, H. (1997). Construction of abstract state graphs with PVS. In Int. Conf. on Computer Aided Verif., volume 1254 of Lec. Notes in Comp. Sci., pages 72–83. [Havelund and Pressburger, 2000] Havelund, K. and Pressburger, T. (2000). Model checking Java programs using Java PathFinder. Softw. Tools for Tech. Transfer, 2(4). [Jeannet et al., 2004] Jeannet, B., Loginov, A., Reps, T., and Sagiv, M. (2004). A relational approach to interprocedural shape analysis. In Static Analysis Symp. Springer. [Jones and Muchnick, 1982] Jones, N. and Muchnick, S. (1982). A flexible approach to interprocedural data flow analysis and programs with recursive data structures. In Princ. of Prog. Lang., pages 66–74, New York, NY. ACM Press. [Landi and Ryder, 1991] Landi, W. and Ryder, B. (1991). Pointer induced aliasing: A problem classification. In Princ. of Prog. Lang., pages 93–103, New York, NY. ACM Press. [Lev-Ami, 2000] Lev-Ami, T. (2000). TVLA: A framework for Kleene based static analysis. Master’s thesis, Tel-Aviv University, Tel-Aviv, Israel. [Lev-Ami et al., 2000] Lev-Ami, T., Reps, T., Sagiv, M., and Wilhelm, R. (2000). Putting static analysis to work for verification: A case study. In Int. Symp. on Softw. Testing and Analysis, pages 26–38. [Lev-Ami and Sagiv, 2000] Lev-Ami, T. and Sagiv, M. (2000). TVLA: A system for implementing static analyses. In Static Analysis Symp., pages 280–301. [Loginov et al., 2004] Loginov, A., Reps, T., and Sagiv, M. (2004). Abstraction refinement for 3-valued-logic analysis. Submitted for publication. [Manevich et al., 2002] Manevich, R., Ramalingam, G., Field, J., Goyal, D., and Sagiv, M. (2002). Compactly representing first-order structures for static analysis. In Static Analysis Symp., pages 196–212. [Manevich et al., 2004] Manevich, R., Sagiv, M., Ramalingam, G., and Field, J. (2004). Partially disjunctive heap abstraction. In Proceedings of the 11th International Symposium, SAS 2004, volume 3148 of Lecture Notes in Computer Science, pages 265–279. Springer. [Manevich et al., 2005] Manevich, R., Yahav, E., Ramalingam, G., and Sagiv, M. (2005). Predicate abstraction and canonical abstraction for singly-linked lists. In Proceedings of the 6th International Conference on Verification, Model Checking and Abstract Interpretation, VMCAI 2005, Lecture Notes in Computer Science. Springer.
252 [M.Das et al., 2001] M.Das, Liblit, B., Fähndrich, M., and Rehof, J. (2001). Estimating the impact of scalable pointer analysis on optimization. In Static Analysis Symp., pages 260– 278. [Ramalingam et al., 2002] Ramalingam, G., Warshavsky, A., Field, J., Goyal, D., and Sagiv, M. (2002). Deriving specialized program analyses for certifying component-client conformance. In Prog. Lang. Design and Impl., pages 83–94. [Reps et al., 2003] Reps, T., Sagiv, M., and Loginov, A. (2003). Finite differencing of logical formulas for static analysis. In European Symp. On Programming, pages 380–398. [Rinetskey and Sagiv, 2001] Rinetskey, N. and Sagiv, M. (2001). Interprocedural shape analysis for recursive programs. In Wilhelm, R., editor, Comp. Construct., volume 2027 of LNCS, pages 133–149. Springer-Verlag. [Rinetzky et al., 2005] Rinetzky, N., Bauer, J., Reps, T., Sagiv, M., and Wilhelm, R. (2005). A semantics for procedure local heaps and its abstractions. In Princ. of Prog. Lang. [Rinetzky et al., 2004] Rinetzky, N., Sagiv, M., and Yahav, E. (2004). Computing procedure summaries exploiting heap-locality. Tech. Rep. XXX, Tel Aviv Uni. Available at “http://www.cs.tau.ac.il/∼ maon”. [Sagiv et al., 1998] Sagiv, M., Reps, T., and Wilhelm, R. (1998). Solving shape-analysis problems in languages with destructive updating. Trans. on Prog. Lang. and Syst., 20(1):1–50. [Sagiv et al., 2002] Sagiv, M., Reps, T., and Wilhelm, R. (2002). Parametric shape analysis via 3-valued logic. Trans. on Prog. Lang. and Syst., 24(3):217–298. [Shaham et al., 2003] Shaham, R., Yahav, E., Kolodner, E., and Sagiv, M. (2003). Establishing local temporal heap safety properties with applications to compile-time memory management. In Proc. of Static Analysis Symposium (SAS’03), volume 2694 of LNCS, pages 483–503. Springer. [Steensgaard, 1996] Steensgaard, B. (1996). Points-to analysis in almost-linear time. In Princ. of Prog. Lang., pages 32–41. [Wagner et al., 2000] Wagner, D., Foster, J., Brewer, E., and Aiken, A. (2000). A first step towards automated detection of buffer overrun vulnerabilities. In Network and Dist. Syst. Security. [Whaley and Lam, 2004] Whaley, J. and Lam, M. (2004). Cloning-based context-sensitive pointer alias analyses using binary decision diagrams. In Prog. Lang. Design and Impl. [Yahav, 2001] Yahav, E. (2001). Verifying safety properties of concurrent Java programs using 3-valued logic. In Princ. of Prog. Lang., pages 27–40. [Yahav and Ramalingam, 2004] Yahav, E. and Ramalingam, G. (2004). Verifying safety properties using separation and heterogeneous abstractions. In Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, pages 25– 34. ACM Press. [Yahav et al., 2003] Yahav, E., Reps, T., Sagiv, M., and Wilhelm, R. (2003). Verifying temporal heap properties specified via evolution logic. In Proc. of the 12th European Symposium on Programming, ESOP 2003, volume 2618 of LNCS. [Yahav and Sagiv, 2003] Yahav, E. and Sagiv, M. (2003). Automatically verifying concurrent queue algorithms. In Workshop on Software Model Checking. [Yorsh, 2003] Yorsh, G. (2003). Logical characterizations of heap abstractions. Master’s thesis, Tel Aviv University.
[Yorsh et al., 2004] Yorsh, G., Reps, T., and Sagiv, M. (2004). Symbolically computing mostprecise abstract operations for shape analysis. In Tools and Algs. for the Construct. and Anal. of Syst., pages 530–545. [Zhu and Calman, 2004] Zhu, J. and Calman, S. (2004). Symbolic pointer analysis revisited. In Prog. Lang. Design and Impl.
Part III Process Algebras and Experimental Calculi
Jayadev Misra and Tony Hoare
PROCESS ALGEBRA: A UNIFYING APPROACH
Tony Hoare
Microsoft Research, 7 J J Thomson Avenue, Cambridge CB3 0FB, UK
Abstract
Process algebra studies systems that act and react continuously with their environment. It models them by transition graphs, whose nodes represent their states, and whose edges are labelled with the names of events by which they interact with their environment. A trace of the behaviour of a process is recorded as a sequence of observable events in which the process engages. Refinement is defined as the inclusion of all traces of a more refined process in those of the process that it refines. A simulation is a relation that compares states as well as events; by definition, two processes that start in states related by a simulation, and which then engage in the same event, will end in states also related by the same simulation. A bisimulation is defined as a symmetric simulation, and similarity is defined as the weakest of all simulations. In classical automata theory, the transition graphs are deterministic: from a given node, there is at most one edge with a given label; as a result, trace refinement and similarity coincide in meaning. Research over many years has produced a wide variety of process algebras, distinguished by the manner in which they compare processes, usually by some form of simulation or by some form of refinement. This paper aims to unify the study of process algebras, by maintaining the identity between similarity and trace refinement, even for non-deterministic systems. Obviously, this unifying approach is entirely dependent on prior exploration of the diversity of theories that apply to the unbounded diversity of the real world. The aim of unification is to inspire and co-ordinate the exploration of yet further diversity; in no way does it detract from the value of such exploration.
1. Introduction
Process algebra is the branch of mathematics that has been developed to apply to systems which continuously act and react in response to stimuli from their environment. It is applied to natural systems such as living organisms and societies, and also to artificial systems such as networks of distributed computers. It is applied at many levels of abstraction, granularity and scale, from the entire collection of computers connected to the World Wide Web, through
258 multiple processes time-sharing in a single computer, right down to electronic signals passing between the hardware circuits from which a computer is made. We assume that this is sufficient motivation for the study of process algebra as a branch of Computer Science. With such a range of applications, it is not surprising that there is now a wide variety of process algebras developed to meet differing needs. Fortunately, the axiomatic techniques of modern algebra establish order among the variations, and assist in the selection or development of a theory to meet new needs. The approach taken in this paper emphasises the essential unity of the study of the subject. In particular, it crosses a historical divide between theories that were based on the foundation of Milner’s Calculus of Communicating Systems (CCS [Milner, 1989]) and those that owe their origin to the theory of Communicating Sequential Processes (CSP [Roscoe, 1998]). In CCS and its variants and successors, the standard method of comparing two processes is by simulation, defined as a relation that is preserved after every action of the pair of processes between which it holds. The relation is often required to be symmetric, and then it is called a bisimulation. Similarity is defined as the existence of a simulation between two given processes [Park, 1981]; it can be efficiently computed by automatic model checking, or proved manually by an elegant co-inductive technique. In CSP and its variants, the standard comparison method is refinement, which in its simplest form is defined as inclusion of the traces of the observed behaviour of a more refined process in those of the refining process. This is an intuitive notion of correctness (at least for safety properties), and it has been applied in the stepwise design and development of the implementation of a process, starting from its more abstractly expressed specification. Such reasoning exploits the expressive and deductive power of the mathematics of sets and sequences. The divergence between CCS and CSP is not accidental, but reflects a slight difference in the primary purposes for which the two calculi were designed. The purpose emphasised by CCS is to model and to analyse the behaviour of existing concurrent systems, including those which occur in nature and which are not implemented on a computer. The purpose emphasised by CSP is to formalise the specification of a concurrent system that is intended to be implemented as a computer program, and to verify that the implementation is correct, in the sense that it satisfies its specification. The only difference is one of emphasis: both CCS and CSP have been successfully used for both purposes. More detailed comparisons of these two calculi may be found in [van Glabbeek, 1997, Brookes, 1983]. This paper shows how to combine the particular advantages of similarity with those of refinement, simply by ensuring that they mean the same thing.
The next section introduces the standard theory of deterministic automata by means of an ‘after’ function, which maps each process and event onto the process that results after occurrence of that event. It defines the basic concepts of simulation and refinement, and proves they are the same. Non-determinism is introduced in Section three by means of a silent transition ‘τ ’, representing an internal choice of an implementation to move from the current state to a different one, without any external cause or externally observable effect. Such internal moves are committed, and cannot be reversed. The reflexive transitive closure of the internal transition is known as reduction, and we postulate that it is a simulation. A weak transition is defined as an observable event preceded and followed by reduction. Weak similarity is defined in terms of weak transitions, in the same way as before. Because reduction is a simulation, weak similarity is proved to be the same as that defined in terms of purely deterministic transitions. Trace refinement guarantees the safety of an implemented process against what is permitted by the specification that it refines. However, a process should also be adequately responsive to stimuli from its environment, and refinement of simple conventional traces does not guarantee responsiveness. Indeed the most refined process by this definition is the one that does nothing. In Section four, responsiveness is turned into a safety property by introduction of events known as ‘barbs’ [Milner, 1989, Milner and Sangiorgi, 1992, Phillips, 1987], which can be used to record the failure of a process to respond to possibilities of interaction offered by its environment. Barbs are treated as ordinary events, and are recorded at the end of ordinary traces. Barbed simulation is defined as an ordinary simulation that takes these events into account. As a result, barbed simulation is still the same as barbed trace refinement. The problem of divergence can be treated in a similar way. The paper is illustrated by a series of simple process calculi. The deterministic fragment of the calculus is reminiscent of Milner’s lock-step synchronous SCCS, the non-deterministic one is more like CCS, and the barbed calculus is based on a familiar version of CSP. The calculi are defined by recursive definitions of the ‘after’ function, and by definitions of the tau-successors of each syntactic term; these effectively provide a structured operational semantics [Plotkin, 1981] for the calculi. Not a single axiom or definition or theorem is lost or changed in moving from one of these calculi to the next.
2. Deterministic transition systems
A deterministic transition system is an edge-labelled graph in which all edges leading from the same node have distinct labels. The nodes of the graph stand for the possible states of a process, and the labels stand for observable events in which the process can engage together with its environment.
For a fixed transition system, the existence of an edge labelled e leading from node p to node q is stated by p −e→ q; this triple is known as a transition. Its meaning is that a process in state p can move to state q on occurrence of the event denoted by the label e. The corresponding abstract relation on nodes is denoted −e→, defined as the set {(p, q) | p −e→ q}. We will use the relational calculus to simplify the statement and proof of many of the theorems. The semi-colon will denote (forward) relational composition:
S ; T =def {(p, r) | ∃q. (p, q) ∈ S & (q, r) ∈ T}
Our proofs will rely on the fact that composition is associative and has the identity relation as its unit. Also, it distributes through arbitrary unions of relations. Because of determinism, we can define a function p/e (p after e), which maps a node p to the node at the other end of the edge labelled e. It describes the behaviour of process p after it has engaged in the event e.
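As a concrete illustration (not part of the paper; the transition table is a made-up example), a finite deterministic transition system can be represented as a map from (state, label) pairs to states, with a distinguished sentinel playing the role of the non-process node ∗:

STAR = object()    # the special non-process node *

# A deterministic transition system: at most one successor per label.
TRANSITIONS = {
    ("p0", "a"): "p1",
    ("p1", "b"): "p0",
}

def after(p, e):
    # p/e: the node reached from p along an e-labelled edge, or * if none.
    if p is STAR:
        return STAR
    return TRANSITIONS.get((p, e), STAR)

print(after("p0", "a"))            # 'p1'
print(after("p1", "a") is STAR)    # True: p1 has no a-labelled edge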
Definition of ( /e)
p/e =def q iff p −e→ q
To make this into a total function, it is convenient to introduce a special node ∗, which is not a process, but merely serves as a value for p/e in the case that p −e/→, where p −e/→ means that there is no edge from p which is labelled by e. It is also convenient to postulate that ∗ is an isolated node, and has no incoming or outgoing edges. This property effectively defines the purpose of ∗, and is formalised as follows:
∀ p, e. ∗ −e/→ p & p −e/→ ∗
Process Algebra: a Unifying Approach
261
Extended definition of ( /s) ∗/s p/ p/<e>s
=def ∗ =def p =def (p/e)/s
A trace of a process is defined to be the sequence of labels on the edges of some finite path of consecutive edges starting at p.
Definition of traces traces(p) =def {s | p/s = ∗} The non-process ∗ has no traces; the empty sequence is a trace of every process; and the non-empty traces of a process p are all sequences of the form <e>t , where t is a trace of p/e.
Theorem 2.1.1 Let labels be the set of all finite sequences of labels. The following properties hold. traces: nodes −→ labels traces(p) = {} iff p = ∗ ∈ traces(p) iff p = ∗ t ∈ traces(p/e) iff <e>t ∈ traces(p)
Theorem 2.1.2 The function traces( ) is uniquely defined by the four clauses of Theorem 2.1.1. Proof: The statement of Theorem 2.1.1 is effectively a definition of the traces function by primitive recursion on the length of the trace. A node q is said to refine p if every trace of q is also a trace of p. The ordering relation p ≥ q means that p is refined by q.
Definition of refinement p ≥ q =def traces(q) ⊆ traces(p) Refinement in a process calculus is used to model program correctness. Let spec be a specification, describing the intended behaviour of a process prog in terms of all the traces that it may give rise to. Being a specification, the description may take advantage of any mathematical concepts that apply to sets of sequences of events. But the actual process prog must be described in the restricted notations of a process calculus, or an implemented programming language that is based upon it. The semantics of the process calculus (as described in Section 2.3) determines exactly which traces are possible for prog. Now a proof that prog refines spec shows that no visible behaviour of prog
262 can ever fall outside the set of behaviours permitted by spec. It thereby serves as a proof of the correctness of prog with respect to spec. Refinement can also be used to justify optimisation of a program. Let opt be a better version of a program prog, for example, more efficient in resources of communication or computation, or more responsive to the needs of its users. Then the optimisation is valid just if opt refines prog. This is because every specification satisfied by prog will also be satisfied by opt, as stated by the following theorem.
Theorem 2.1.3 Refinement is reflexive and transitive, i.e., (reflexive) p≥p (transitive) p ≥ q & q ≥ r =⇒ = p≥r The preceding account of refinement takes safety as an adequate criterion of correctness; it supposes that a process that does less actions is always safer than one that does more. But obviously, failure to respond to the expected stimuli from the environment is also a serious error, one that in practice (all too frequently) manifests itself as deadlock or as livelock of a computer system. A definition of responsiveness states that a process that has more traces is more responsive than one that has less traces. A full specification of correctness should therefore also specify the desired lower bounds on the responsiveness of the system. The introduction of non-determinism in the next section will permit these lower bounds to be specified at the same time as the upper bound, in a single process specification, with the result that a single proof of refinement ensures both safety and responsiveness. In Section 2.3, we shall define a number of operators, both parallel and sequential, for constructing a complex process, say F (p, q), out of simpler components p and q. These operators can be applied equally well to specifications as to processes written in the calculus or an available programming language that implements it. Suppose we want F (p, q) to satisfy some overall specification F -spec, and decide to split the whole task into two sub-tasks: to write a program p and a program q to meet specifications p-spec and q-spec respectively; the intention is to combine them later by F . Before starting work (may be in parallel) on these two tasks, it would be a good idea to check that we have got their specifications right. This can be done in advance by proving that F (p-spec, q-spec) is a refinement of F -spec. Then, when p-spec has been correctly implemented by p (i.e., p-spec ≥ p) and similarly q-spec has been implemented by q, we can safely plug the implementations p and q into F , in place of their specifications. If such a procedure has been consistently and carefully followed throughout, we can have high confidence that the result F (p, q) is free from design errors (because F (p, q) satisfies F (p-spec, q-spec), which is already known to satisfy F -spec). Furthermore, correctness of the assem-
bly has actually been proved before implementation of the components. This method of engineering design is known as step-wise decomposition. But wait! The validity of the method is dependent on a basic property of the function F : it must respect the ordering relation of refinement between processes. More formally, it must be monotonic in all its arguments.
Definition of monotonicity A function F is monotonic wrt ≥ iff F(p, q, . . .) ≥ F(p′, q′, . . .) whenever p ≥ p′ & q ≥ q′, and so on. An example is provided by the only function on processes that we have defined so far, the after function /e.
Theorem 2.1.4 p ≥ q =⇒ p/e ≥ q/e
Proof: If q is ∗, traces(q/e) is empty, so the consequent is trivial. Otherwise let t be a trace of q/e. Then by Theorem 2.1.1, <e>t is a trace of q. By the antecedent of this theorem, it is also a trace of p. By Theorem 2.1.1 again, t is a trace of p/e.
The preceding theorem states that every transition respects the ordering ≥, in the sense that if two processes are related by ≥ before an event, they are still so related after it. An alternative formulation of the same theorem can be given in terms of the relation −e→.
Theorem 2.1.5 e
e
p ≥ q & q −→ r = =⇒ ∃p . p −→ p & p ≥ r e
Proof: Let p ≥ q and q −→ r. Then <e> is a trace of q, and therefore of p. So e p −→ p/e. By Theorem 2.1.4, p/e ≥ q/e. Because of determinism, q/e = r. So p/e ≥ r, and p/e can play the role of p in the statement of the theorem. An even neater formulation of the same theorem can be stated using the relational calculus, as we shall often do from now on. It is weak commutivity principle, which permits interchange of refinement with any e-transition, when they are composed sequentially.
Theorem 2.1.5 (alternative formulation) e
e
(≥ o9 −→) ⊆ (−→ o9 ≥) Here is another example of a relation satisfying the same weak commutivity principle. Let ≡ stand for equality of the trace sets of two processes, formally defined by mutual refinement.
264 Definition of trace equivalence p ≡ q =def p ≥ q & q ≥ p
Theorem 2.1.6 e
e
(≡ o9 −→) ⊆ (−→ o9 ≡) In standard automata theory, trace equivalence is taken as a sufficient condition for identity of two processes. We shall not do this, because in the next section we want to make distinctions between processes that have the same traces. In the final section, we will show that trace equivalence is after all an adequate definition of identity of processes.
2.2 Simulation The principle of weak commutivity introduced in the last two theorems of the previous section suggests that the following definition.
Definition of simulation A relation S between processes (i.e., excluding the non-process node ∗) is defined to be a simulation if the relational composition e e (S o9 −→) is contained in (−→ o9 S), for all labels e. A bisimulation is defined as a simulation that is symmetric. Examples of bisimulation include the empty relation, the identity relation and ≡, whereas ≥ is just a simulation; so is the relation {(p, q) | traces(q) = {} }.
Theorem 2.2.1 If S and T are simulations, so is their relational composition. Proof: The assumptions are: e e (A1) (S o9 −→) ⊆ (−→ o9 S) and (A2) (T (S
o 9
T)
o 9
e
−→
= ⊆ = ⊆ =
e
S o9 (T o9 −→) e S o9 (−→ o9 T ) e (S o9 −→ ) o9 T e (−→ o9 S) o9 T e −→ o9 (S o9 T )
o 9
e
e
−→) ⊆ (−→ o9 T ) by associativity of o9 by A2 and monotonicity of by associativity of o9 by A1 and monotonicity of by associativity of o9
Theorem 2.2.2 The union of any set of simulations is a simulation. The intersection of any non-empty set of simulations is a simulation. e
Proof: Relational composition distributes through union; and since −→ is a e partial function, (−→ o9 ) distributes through intersections of non-empty set of relations.
o 9
o 9
Process Algebra: a Unifying Approach
265
Theorem 2.2.3 If R is a simulation, so is its reflexive transitive closure R , defined as the union of the identity relation with R, (R o9 R), (R o9 R o9 R), etc. Proof: Follows from the previous two theorems. In CCS and related calculi, bisimulation is the basic relation used to reason about the correctness of programs and their optimisation. Correctness means that there exists a bisimulation between a specification and its program, or between a program and its optimised version. Remarkably, it is not necessary to specify exactly which relation has been used as the bisimulation: any bisimulation will do. It can be chosen to match the needs of each particular proof. Thus what every proof establishes is the bisimilarity of two processes, where bisimilarity (and its asymmetric version similarity) is defined as follows
Definition of similarity Similarity is defined as the set union of all simulations. Bisimilarity is the union of all bisimulations. Theorem 2.2.2 says that similarity is itself a simulation, in fact the largest of all simulations, and the same applies to bisimilarity. In summary, bisimilarity is the correctness relation established by all bisimulation proofs in CCS, and other process calculi which take bisimulation as the basis of reasoning about processes.
Theorem 2.2.4 Similarity is reflexive and transitive. Proof: Reflexive because identity is a simulation; transitive because similarity composed with itself is a simulation; and every simulation is contained in similarity.
Theorem 2.2.5   Every simulation is relationally contained in refinement.
Proof: By induction on the length of a trace. Let S be a simulation and let p S q. Let t be a trace of q. If t is of length 0, it is the empty sequence, which is a trace of every p. Otherwise, let t be <e>t′. Then for some q′, q −e→ q′ and t′ is a trace of q′. Since S is a simulation, there is a p′ such that (1) p −e→ p′ and p′ S q′. Since t′ is shorter than t, we can by induction assume that the traces of p′ include the traces of q′. Since t′ is a trace of q′, it is also a trace of p′. Since we proved above at (1) that p −e→ p′, <e>t′ is a trace of p.
We now state the theorem that is the goal of this whole Section.
Theorem 2.2.6   In a deterministic transition system, similarity and refinement are the same.
Proof: By Theorem 2.1.5, refinement is a simulation, and therefore contained in the largest simulation. Theorem 2.2.5 gives the reverse inclusion.
It is worthy of note that none of the proofs in this section, except that of Theorem 2.2.6 and the second claim of Theorem 2.2.2, relies on the determinacy of the underlying transition system.
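Similarity itself can be computed on a finite system by the usual greatest-fixed-point iteration: start from the universal relation and repeatedly remove pairs that violate the simulation condition. The sketch below is hypothetical, using the same representation as above; by Theorem 2.2.6, on a deterministic system the resulting relation coincides with trace refinement.

```python
from itertools import product

def similarity(nodes, step, labels):
    """Largest simulation on a finite deterministic transition system."""
    sim = set(product(nodes, nodes))        # start from the universal relation
    changed = True
    while changed:
        changed = False
        for (p, q) in list(sim):
            for e in labels:
                if (q, e) in step:
                    matched = (p, e) in step and (step[(p, e)], step[(q, e)]) in sim
                    if not matched:
                        sim.discard((p, q))  # p cannot simulate q
                        changed = True
                        break
    return sim
```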
2.3 Example: a synchronous calculus
The purpose of a process calculus is to define a particular transition system. It first defines the syntax for naming the nodes of the transition system, and then uses induction on the structure of the syntax to define which nodes are connected by transitions, and what the labels on the edges are. The calculus postulates that there is a node in the underlying transition graph that is named by each of the terms of the calculus, as constructed in accordance with its syntax; furthermore, each node of the graph has exactly one name. In this section we will present a synchronous deterministic calculus based loosely on Milner's SCCS. The primitives of its syntax and their intended meanings are:

STOP    never does anything
RUN     can do anything at any time

There are two monadic combinators, called prefixing and restriction; they both mean the same as in CCS:

f.p     does f first, and then behaves like p
p\f     can always do anything that p can do, except f

There are two parallel combinators:

(p |&| q)     can always do what both p and q can do at the same time
(p |or| q)    can do whatever either p or q can do, as long as possible
These parallel combinators are chosen for their simplicity and elegance. They do not correspond to any of the parallel combinators of either CCS or CSP. Note that the syntax must not contain a notation for ∗, which is not a process. For the same reason, a process calculus cannot include the after operator, because it sometimes gives the result ∗. However, the after operator remains useful in reasoning about the calculus at a more abstract level. The specification of the labelled edges of the underlying transition system is formalised as an inductive definition of the after operator, where the induction is over the structure of the term that names the parameter. The following table shows a simple way of doing this. For each node p, it tells how to compute
(recursively, where necessary) the name of the node at the other end of the edge labelled e. It is easy to check that the formal definition accords with the informal description given above to explain the meaning of each notation.
Definition of ( /e)

STOP/e = ∗
RUN/e  = RUN

(f.p)/e = p     if e = f
        = ∗     otherwise

(p |&| q)/e = (p/e) |&| (q/e)    if p −e→ & q −e→
            = ∗                  otherwise

(p |or| q)/e = (p/e) |or| (q/e)   if p −e→ & q −e→
             = (p/e)              if p −e→ & q −e/→
             = (q/e)              if p −e/→ & q −e→
             = ∗                  otherwise

(p\f)/e = (p/e)\f    if e ≠ f
        = ∗          otherwise
Note the unusual recursion in the first line of the rule for or-parallelism. It reveals that continued parallel computation of both operands is needed if the first event is possible for both of them. Or-parallelism is a deterministic version of a choice operator, as computed by the traditional determinisation procedure of automata theory. That is why the or-parallel operator defined here does not correspond to any of the choice operators in CSP or CCS, which avoid the inefficiency of the parallel computation by resorting to non-determinism. We will return to these more familiar choice operators in Section 3.3. In the standard deterministic model of CSP, the clauses of the following theorem are presented as a recursive definition of the operators of the calculus; and the clauses of the definition of ( /) can be proved from them. The equivalence of two different methods of definition is mildly encouraging in a mathematical theory, since it forestalls any controversy over which to choose as definitive.
Theorem 2.3.1
traces(STOP)       = {<>}
traces(RUN)        = labels∗  (all finite sequences of labels)
traces(f.p)        = {<>} ∪ { <f>t | t ∈ traces(p) }
traces(p |&| q)    = traces(p) ∩ traces(q)
traces(p |or| q)   = traces(p) ∪ traces(q)
traces(p\f)        = { t | t ∈ traces(p) & not(<f> in t) }
                     where s in t =def ∃u, v. u s v = t
Proof (of the last clause, for example): By induction on the length of a trace. The hypothesis is that traces with maximum length n on both sides of the assertion are identical. Base case: <> ∈ LHS holds because (p\f) ≠ ∗. Similarly, <> ∈ RHS because <> ∈ traces(p) and (obviously) <> does not contain f. For the inductive case, we reason as follows.
<e>t ∈ traces(p\f)
   iff t ∈ traces((p\f)/e)                               by definition of traces
   iff t ∈ (traces((p/e)\f) if f ≠ e else {})            by definition of ((p\f)/e)
   iff f ≠ e & t ∈ traces((p/e)\f)                       by definition of ∈
   iff f ≠ e & t ∈ traces(p/e) & not(<f> in t)           by induction hypothesis
   iff <e>t ∈ traces(p) & not(<f> in <e>t)               by definition of in
The traces of each of the constructions of the calculus can be calculated from the definition of traces( ) at the beginning of Section 2.1, together with the definition of /e given above.
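The definition of ( /e) and the trace equations translate directly into a recursive program. The sketch below is purely illustrative (the term representation and function names are invented here, and ∗ is rendered as None); it computes p/e for the synchronous calculus and enumerates the traces of bounded length.

```python
# Terms: ('stop',), ('run',), ('prefix', f, p), ('and', p, q),
#        ('or', p, q), ('restrict', p, f).  The non-process * is None.

def after(p, e):
    tag = p[0]
    if tag == 'stop':
        return None
    if tag == 'run':
        return p
    if tag == 'prefix':
        _, f, q = p
        return q if e == f else None
    if tag in ('and', 'or'):
        l, r = after(p[1], e), after(p[2], e)
        if l is not None and r is not None:
            return (tag, l, r)
        if tag == 'or':                 # or-parallelism: one side may continue alone
            return l if l is not None else r
        return None                     # and-parallelism needs both operands
    if tag == 'restrict':
        q = after(p[1], e) if e != p[2] else None
        return ('restrict', q, p[2]) if q is not None else None

def traces_upto(p, labels, n):
    """All traces of process p (not None) of length at most n, as tuples."""
    result = {()}
    if n > 0:
        for e in labels:
            q = after(p, e)
            if q is not None:
                result |= {(e,) + t for t in traces_upto(q, labels, n - 1)}
    return result
```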
Theorem 2.3.2 All the operators are monotonic with respect to refinement. Proof: Simple Boolean algebra of sets. In fact a great many equivalences between processes are immediate consequences of elementary equations between their sets of traces. For example, the following theorems match exactly the properties of the meet operator in a Boolean algebra.
Theorem 2.3.3
RUN |&| p        ≡ p                      RUN is the unit of |&|
STOP |&| p       ≡ STOP                   STOP is its zero
p |&| p          ≡ p                      |&| is idempotent
p |&| q          ≡ q |&| p                it commutes
(p |&| q) |&| r  ≡ p |&| (q |&| r)        it associates
Algebraic laws based on trace equivalence help to explain the way in which parallel processes interact with each other by synchronised participation in the
same events. Consider the process ((e.p) |or| (f.q)), which offers a choice between two initial events, either e or f , which we assume to be distinct. Suppose this is run in and-parallel with an environment (e.r); this process selects e as the next event to occur, rejecting the possibility of f . Then e must be the next event, and the subsequent behaviour of the system will involve p, and q will have no further effect. However, if the environment selects an event, say g, which is not offered by the or-parallel process, the result is deadlock, indicated by the process STOP. These facts are summarised algebraically in the following theorem.
Theorem 2.3.4
((e.p) |or| (f.q)) |&| (e.r)  ≡  e.(p |&| r)      if e ≠ f
((e.p) |or| (f.q)) |&| (g.r)  ≡  STOP             if g ≠ f & g ≠ e
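Laws such as these can be spot-checked on bounded traces with the sketch above; for instance, with e ≠ f (a hypothetical test, reusing traces_upto):

```python
labels = {'e', 'f', 'g'}
lhs = ('and', ('or', ('prefix', 'e', ('run',)), ('prefix', 'f', ('run',))),
              ('prefix', 'e', ('run',)))
rhs = ('prefix', 'e', ('and', ('run',), ('run',)))
assert traces_upto(lhs, labels, 3) == traces_upto(rhs, labels, 3)
```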
In addition to its collection of operators, a process calculus usually allows definition of the behaviour of a system by means of recursion. For example, a perpetually ticking clock may be defined by the recursive equation
clock = tick.clock
The right hand side of such an equation may include any of the operators of the calculus. The fact that all the operators are monotonic guarantees, by Tarski's theorem, that there exists a trace set for a process that satisfies the equation. In all reasonable cases, there will be only one such set. When the solution is non-unique, CCS specifies that the least solution is intended. Thus the trivial loop defined by the recursive equation
loop = loop
has STOP as its intended solution, rather than RUN. CSP makes the opposite choice. The reason is to make it as difficult as possible to prove correctness of a non-terminating recursion. In the remainder of this paper we will not give further attention to recursion.
In CCS and related calculi, it is usual to present the semantics in a structured operational style. For each nameable node, there are clauses which determine a name for each of its e-derivatives. These essentially determine the structure of the underlying graph.
Theorem 2.3.5
RUN      −e→ RUN
e.p      −e→ p
p |&| q  −e→ p′ |&| q′     if p −e→ p′ & q −e→ q′
p |or| q −e→ p′ |or| q′    if p −e→ p′ & q −e→ q′
p |or| q −e→ p′            if p −e→ p′ & q −e/→
p |or| q −e→ q′            if p −e/→ & q −e→ q′
p\f      −e→ p′\f          if e ≠ f & p −e→ p′
There is an implicit understanding that this is the entire set of rules for the transition system, and that if a transition cannot be derived from these rules, it does not exist. Thus the fact that STOP does not appear in any of the rules means that it is the source of no edges. The fact that there is only one transition given for e.p means that if (e.p) −f→ q, then f = e and q = p. In reasoning about the calculus, the rules must be strengthened to give both necessary and sufficient conditions for each transition. When this has been done, the operational definition of Theorem 2.3.5 can be used to justify the clauses of the definition of ( /e) given earlier in this section.
3. Non-deterministic transition systems
For a deterministic system, all the events that happen are observable, predictable and to some extent controllable (see Theorem 2.3.4) by the environment with which the system interacts. Non-determinism is introduced into the system when there are unobservable internal events that change the internal state of the system without the knowledge or control of the external environment. Let τ be such an internal event. (In fact, there can be many such events, but because they are indistinguishable, we follow convention in letting τ stand for them all). We define a non-deterministic system as a deterministic transition system, plus a set of additional τ -labelled edges between the nodes; these do not have to satisfy the requirement of determinacy: many τ -labelled edges can lead from the same node to many different nodes. Selection between them is non-deterministic, uncontrollable and unobservable by the external environment. In a program-controlled system, the τ event may model a period of internal computation of the program. In modern process algebras, these internal computations are specified by algebraic reduction rules, which permit the implementation to move the state of a process from one that matches the left hand side of the rule to one that matches the right hand side. In other cases, τ may stand for an internal communication between components of a system, observable only by those components; they are deliberately hidden from its outer environment, in order to present a simpler and more abstract interface.
Non-determinism arises because reduction does not have to satisfy the Church-Rosser property: whenever there is a possibility of two reductions to two different states, there may be no possibility of convergence back to the same state again afterwards. As a result, algebraic reduction involves a commitment that cannot later be withdrawn. That is what makes non-determinism problematic in the design and implementation of computer systems. Another reason is that a program can work perfectly while it is being tested, and still go wrong when put to serious use. Indeed, the solution to these problems is one of the primary motives for the study and application of process algebra to proofs of correctness of computer systems. In development of an abstract theory of correctness, we are not interested in the exact number of steps involved in a particular computation. In fact, if the theory is to be used for purposes of program optimisation, it is essential to abstract from questions of efficiency, in order that an optimised program can be proved equal to (or a refinement of) its un-optimised version. We are therefore not so much interested in the −τ→ relation by itself, but rather in its reflexive transitive closure, which we denote by an unlabelled arrow.
Definition of reduction
−→ =def (−τ→)∗
We are also interested in the states for which no further internal computation is possible until after the next externally visible event. In such a stable state, a process is idle, waiting for a response or a new stimulus from the environment. Stable states are defined in the same way as normal forms in algebraic reduction.
Definition of stability
p is stable =def p −τ/→
Occurrence of a τ -transition is intended to be totally unobservable from outside a process. In particular, we cannot tell exactly when the transition took place. In an unstable state, after occurrence of a visible event e we cannot tell whether a τ -transition preceded it or followed it, or maybe never occurred at all. This invisibility is expressed by postulating an algebraic law similar to the definition of a simulation.
Defining property of τ
(−τ→ ⨾ −e→)  ⊆  (−e→ ⨾ −→),   or equivalently, −→ is a simulation.
This condition can also be expressed:
p −τ→ q   ⟹   p/e −→ q/e
In Section 3.3 we will define the τ transitions of a simple process calculus. Care is needed to show that these definitions are consistent with the axiom. Axioms like this are often called healthiness conditions: the designer of a calculus must ensure that healthiness is preserved by all the operators. There are three reasons for accepting this as the defining property of a τ transition. First, it represents the invisibility of τ. Secondly, it permits a useful optimisation. If an implementation can detect that an event e will be possible after an internal computation, it may compile code to do e straight away, either avoiding the calculation at run time, or at least postponing it till after: efficiency and responsiveness may thereby be improved. Finally, the postulate achieves our primary goal of reconciliation of similarity with refinement. Since simulation implies refinement, the defining property of τ means that a non-deterministic choice can only reduce the traces of a process, never change or increase them. Thus the trace set defines the limits of what a process can do, and includes all possible results of any choices it may make internally.
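On a finite system the healthiness condition is again a relational inclusion, so it can be checked with the same machinery as the simulation test. A hypothetical sketch, assuming the τ edges are given as a set of pairs alongside the deterministic step map:

```python
def reduction_closure(nodes, tau_edges):
    """The reduction relation: reflexive-transitive closure of the tau edges."""
    red = {(p, p) for p in nodes} | set(tau_edges)
    changed = True
    while changed:
        changed = False
        for (p, q) in list(red):
            for (q2, r) in tau_edges:
                if q2 == q and (p, r) not in red:
                    red.add((p, r))
                    changed = True
    return red

def tau_is_healthy(nodes, step, labels, tau_edges):
    """Check  (-tau-> ; -e->)  ⊆  (-e-> ; -->)  for every label e."""
    red = reduction_closure(nodes, tau_edges)
    for (p, q) in tau_edges:
        for e in labels:
            if (q, e) in step:
                if (p, e) not in step or (step[(p, e)], step[(q, e)]) not in red:
                    return False
    return True
```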
3.1 Weak simulation
In the evolution of a non-deterministic process, internal invisible activity will alternate with externally visible events. Each external event will be preceded and followed by (zero or more) internal reductions. We therefore give the usual definition of the concept of a non-deterministic (or weak) transition.
Definition of weak transition
=e⇒ =def (−→ ⨾ −e→ ⨾ −→)
A weak simulation and weak similarity are defined in the same way as ordinary (strong) simulation and similarity, using the transition =e⇒ in place of −e→.
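In the same finite setting, the weak transition relation is a plain relational composition and can be computed from the reduction closure sketched above (again hypothetical):

```python
def weak_steps(nodes, step, labels, tau_edges):
    """All triples (p, e, q) with p ==e==> q, i.e. p --> . -e-> . --> q."""
    red = reduction_closure(nodes, tau_edges)
    weak = set()
    for (p, p1) in red:
        for e in labels:
            if (p1, e) in step:
                q1 = step[(p1, e)]
                weak |= {(p, e, q) for (q0, q) in red if q0 == q1}
    return weak
```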
The following theorem gives an alternative definition of =e⇒, and makes explicit some of its obvious properties.
Theorem 3.1.1
=e⇒  =  (−e→ ⨾ −→)  =  (−→ ⨾ =e⇒)  =  (=e⇒ ⨾ −→)
Proof: −→ is a reflexive and transitive simulation.
Theorem 3.1.2   If W is a weak simulation, then (−→ ⨾ W) is a simulation.
Proof:
(−→ ⨾ W) ⨾ −e→
   ⊆  −→ ⨾ W ⨾ −e→ ⨾ −→      −→ is reflexive
   =  −→ ⨾ W ⨾ =e⇒            by Theorem 3.1.1
   ⊆  −→ ⨾ =e⇒ ⨾ W            W is a weak simulation
   =  −e→ ⨾ −→ ⨾ W            by Theorem 3.1.1
Theorem 3.1.3   If S is a simulation, then (S ⨾ −→) is a weak simulation.
Proof: similar to the above.
Theorem 3.1.4   Weak similarity is the same as similarity.
Proof: Let W be the largest weak simulation. Because −→ is reflexive, W is contained in (−→ ⨾ W). From Theorem 3.1.2, this is a simulation, and so contained in the largest simulation. The reverse inclusion is proved using Theorem 3.1.3 in place of Theorem 3.1.2. Note that this theorem depends on the defining property for τ.
The introduction of non-determinism has not required any change in the definition of traces. So Theorem 3.1.4 achieves our goal of reconciling refinement with weak similarity in a non-deterministic setting.
3.2 Relationship with CCS
A traditional presentation of a non-deterministic transition system is as an edge-labelled graph which allows edges with the same source and label to point to two or more distinct nodes. We can construct such a traditional graph from our definition of a non-deterministic system, simply by using weak transitions =e⇒ to define its labelled edges, instead of the deterministic transitions −e→. The resulting graph will enjoy the following extra properties
(1) (−τ→ ⨾ =e⇒)  =  =e⇒  =  (=e⇒ ⨾ −τ→)
(2) p =e⇒ q   iff   p/e −→ q
(3) p =e⇒ q   ⟹   p =e⇒ p/e
In standard versions of CCS, the property (1) holds by definition of the weak transition; furthermore the after function can be defined in CCS in a way that satisfies the transition rule (2). The only missing property (3) is the one that states that there exists an edge labelled e between p and p/e. For example, consider the graph fragment.
[Graph fragment: a diagram of nodes connected by e- and τ-labelled edges, not reproduced here.]
In a transition graph satisfying property (3), there must be an edge labelled e from the top node to the middle node of this diagram. In the underlying transition graph for CCS, such an edge may be missing. Our calculus therefore cannot be applied to a reactive system in which the possible absence of such an edge is a significant feature; for study of such a system, CCS would be a better choice as a model. In general, our theory can be regarded as a sub-theory of CCS, in that it applies to the subset of CCS processes which happen to satisfy (3).
3.3 Example: an asynchronous calculus
The introduction of non-determinism by means of a τ-transition permits distinctions to be made between processes which have the same traces, but which have different reductions. For example the RUN process has all sequences of labels as its traces. So does the CHAOS process of CSP, which is intended to be the most non-deterministic of all processes. Its extreme non-determinism is indicated by the fact that it can unobservably change into any other process whatsoever
∀p. CHAOS −τ→ p
In this, it is distinguished from RUN, which has no τ-transitions. However, both processes satisfy similar recursive equations, explaining why RUN and CHAOS have the same traces.
RUN/e = RUN        CHAOS/e = CHAOS
Before proceeding, we must check that these definitions satisfy the healthiness condition for τ, which is done as follows
RUN −τ→ r    ⟹   RUN/e −→ r/e        the antecedent is always false
CHAOS −τ→ r  ⟹   CHAOS/e −→ r/e      the consequent is always true
In general, a process algebra can define the placing of the τ -labelled edges in a non-deterministic graph by means of a collection of clauses similar to those which defined the meaning of the after operator in Section 2.3. The postulates
define a set of τ -transitions which the transition system must include. The definition is completed by saying that these are all the τ -transitions that are included – there must be no more.
Definition of τ-transitions
STOP, RUN, and f.p have no τ-transitions
(p |&| q)   −τ→ (p′ |&| q)      if p −τ→ p′
(p |&| q)   −τ→ (p |&| q′)      if q −τ→ q′
(p |or| q)  −τ→ (p′ |or| q)     if p −τ→ p′
(p |or| q)  −τ→ (p |or| q′)     if q −τ→ q′
(p\f)       −τ→ (p′\f)          if p −τ→ p′
The first line forbids an implementation from making internal transitions in cases where they are not needed or wanted. The next four clauses allow reductions to be made independently for both operands of a parallel combinator. This is what permits an implementation to exploit concurrency in execution of the internal actions of concurrent parallel processes. Before proceeding further, we must prove that the above definition preserves the healthiness condition for τ. The pattern of the proof is given just for the case p\f.
Assume p\f −τ→ r; we need to prove (p\f)/e −→ r/e. Since the traces of p\f include the traces of r, it follows that p\f −e→, and so e cannot be f. The definition given above for τ-transitions has only one clause that could justify a τ-transition of p\f, so the assumption must match that clause; consequently for some p′, r = p′\f and p −τ→ p′. The definition of ( /e) shows that p′\f −e→ only if p −e→. Now p is syntactically simpler than p\f, so we may assume by induction that it satisfies the healthiness condition. So p/e −→ p′/e, i.e., there is a sequence of τ-transitions stretching between them. Applying the definition of ( \f) to each step of the sequence, we get (p/e)\f −→ (p′/e)\f. From the definition of ( /e), since e ≠ f, (p\f)/e = (p/e)\f −→ (p′/e)\f = (p′\f)/e.
As mentioned in Section 2.3, the implementation of or-parallelism has to be prepared to execute both its operands concurrently, for as long as the events that actually happen are possible for both of them. For practical reasons, most process algebras introduce a choice operator that does not require such parallel computation. In fact, CSP introduces two choice operators: internal choice (⊓), which is made by a process in a manner that cannot be observed or controlled by the environment; and external choice (□), which can be controlled by the environment (as described in Theorem 2.3.4), but only on the first step of their interaction. Although both operators have the same set of traces as |or|, they are distinguished by their τ-transitions, as we shall now describe.
For an internal choice, denoted by (p ⊓ q), the choice between the operands can be made internally by a τ-transition; so the following clause should be added to the defining properties of τ-transitions
(p ⊓ q) −τ→ p      &      (p ⊓ q) −τ→ q
The after function when applied to (p ⊓ q) obeys the same recursion as it does for (p |or| q)
(p ⊓ q)/e = (p/e) ⊓ (q/e)    if p −e→ & q −e→
          = (p/e)            if p −e→ & q −e/→
          = (q/e)            if p −e/→ & q −e→
          = ∗                otherwise
For external choice, the two operands can be reduced, even in parallel; but (as in CSP) on this first step it is not permitted to withdraw the external choice between the operands (the CCS + operator does allow such withdrawal). So the τ definition for □ is the same as for |or|.
(p □ q) −τ→ (p′ □ q)     if p −τ→ p′
(p □ q) −τ→ (p □ q′)     if q −τ→ q′
However, after the first visible event, the behaviour is the same for external as for internal choice
(p □ q)/e = (p ⊓ q)/e
The distinction between internal and external non-determinism is sufficient to solve the problem of non-deterministic deadlock. The process (p ⊓ STOP) has the same traces as p; but it is distinguished from p because it can independently and unobservably withdraw its capability to perform the actions of p. The withdrawal is justified by the τ-transition from (p ⊓ STOP) to STOP. In Section 2.3, the restriction operator \f was defined as in CCS, to conceal the event f by preventing it from happening altogether. CSP introduced a different hiding operator, which we will denote by \\f; it allows an f-transition to happen whenever it can, but only as an internally hidden event. This is formally expressed by the postulate
(p\\f) −τ→ (p′\\f)     if p =f⇒ p′ ∨ p −τ→ p′
A process (p\\f) for which all f-transitions are hidden obviously cannot engage in any external occurrence of the event f. But any other event possible for p may occur; or the operand can perform a hidden f-transition first. The choice to perform the f event rather than some other possible event is internally non-deterministic.
(p\\f)/e = ∗                              if e = f
         = ((p/e)\\f) ⊓ ((p/f)\\f)/e      otherwise
4. Barbed Transition Systems
So far we have devoted attention to graphs with labels on their edges; but why shouldn't nodes have labels too? They could be used to denote properties of the internal states of a process, independent of any actions they can perform. Let us introduce a set of node labels B (standing for 'barbs'), disjoint from the set of familiar edge labels, which we will denote by L. All labels e and f mentioned in earlier sections are assumed to be from the set L, which excludes barbs. The barbs cannot be explicitly mentioned in the syntax of our calculus (e.g., b.p is forbidden if b is a barb). Their attachment to the nodes of the graph is governed by specific axioms of the calculus that are known as healthiness conditions. We use the notation p −b→ # to state that node p has barb b. To enable us to continue to use the relational calculus, the symbol # (sharp) will be taken to denote another special node that is not a process; it is distinct from ∗ and has no outgoing edges. It will be helpful to draw a barb as an actual barb-labelled edge of the graph; such edges stick out from the nodes of the graph in the same way as the barbs stick out from barbed wire - this is the origin of the name. This construction ensures that the transition graph satisfies a healthiness condition.
Healthiness condition for barbs
p −e→ q   ⟹   (e is a barb ≡ q = #)
A barbed trace of a process p is defined in the same way as before, except that in addition to normal labels from L, it can also include barbs from B. As a result of the healthiness condition, a barbed trace can contain only a single barb, one which labels the node at the end of the trace. This barb represents the last observation that can be made of a process, and is often used to indicate the reason for termination. For example, a refusal in CSP can be regarded as a barb indicating that the process has ended in deadlock; and a divergence barb indicates the possibility of an infinite sequence of internal actions, often known as livelock. We will explain these phenomena more fully in Sections 4.2 and 4.3. A barbed simulation is defined as a simulation in which the range of quantification of the events ranges over B as well as L. In order to apply our theory to barbs, we require reduction to be a barbed simulation. This has the consequence that a state inherits all the barbs of its −→ descendants. Consequently, a barb on an unstable state describes only the possibility that the state may spontaneously (and unobservably) change to one that displays the property; but equally, as a result of non-determinism, it may not. It is only on a stable state that a barb denotes a property that can definitely be observed of that state.
These definitions have been carefully formulated to ensure that barbed simulation is the same as barbed trace refinement. In the remaining sections we shall describe some of the barbs introduced in standard models of CSP.
4.1 Refusals
A refusal 'ref(X)' in CSP is an observation of the state of a process which refuses to engage in any of the events in some set X (a subset of L), even though all of them are possible (and usually even desirable) in the current state of its environment. The concept of a refusal barb was introduced into CSP in order to model non-deterministic deadlock as a safety property rather than as a liveness property of a process, and so ensure that simple refinement reasoning can prove its absence. The intended meaning of a meaningful barb is often defined rather abstractly by means of a healthiness condition imposed on the underlying transition system. In the case of the refusals of CSP, the healthiness condition states (rather obviously) that a stable process can refuse a set if and only if the set contains none of the events which it can accept, and non-stable processes inherit the refusals of their descendants.
Healthiness condition for refusals
p −ref(X)→ #   iff   ∃q. q is stable & p −→ q & ∀x ∈ X. q −x/→
As a consequence, every stable state (and all its predecessors under −→) has an empty refusal set among its barbs. Furthermore, if a state has ref(X) among its barbs, then it also has ref(Y), for all subsets Y of X. However, if a state has no stable successors under reduction, then it has no refusals according to the above definition. We will return to this point in the next section. The healthiness condition enables us to deduce the refusals of all the processes expressible in the syntax of the calculus. It seems better to deduce the refusals from a general healthiness condition than to make them part of the definition of the operators, which would require a separate proof of the preservation of healthiness.
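For a finite graph the refusal barbs of a node can be read off directly from this condition: every stable state reachable by reduction contributes the complement of its initial events as a (maximal) refusal. A hypothetical sketch, reusing reduction_closure from above:

```python
def refusals(p, nodes, step, labels, tau_edges):
    """Maximal refusal sets of p; every subset of a refusal is also refused."""
    red = reduction_closure(nodes, tau_edges)
    tau_sources = {a for (a, _) in tau_edges}
    result = set()
    for (p0, q) in red:
        if p0 == p and q not in tau_sources:          # q is stable
            initials = {e for e in labels if (q, e) in step}
            result.add(frozenset(labels - initials))  # the maximal set q refuses
    return result
```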
Theorem 4.1.1
STOP −ref(X)→ #           for all sets X that exclude barbs
RUN −ref(X)→ #            iff X = {}
f.p −ref(X)→ #            iff f ∉ X
(p |or| q) −ref(X)→ #     iff p −ref(X)→ # & q −ref(X)→ #
(p |&| q) −ref(X)→ #      iff ∃Y, Z. X = Y ∪ Z & p −ref(Y)→ # & q −ref(Z)→ #
(p\f) −ref(X)→ #          iff p −ref(X−{f})→ #
The non-deterministic processes introduced in Section 3.3 have their refusals defined in the following theorem
Theorem 4.1.2
CHAOS −ref(X)→ #       for all sets X
(p ⊓ q) −ref(X)→ #     iff p −ref(X)→ # ∨ q −ref(X)→ #
(p □ q) −ref(X)→ #     iff p −ref(X)→ # & q −ref(X)→ #
(p\\f) −ref(X)→ #      iff p −ref(X∪{f})→ # ∨ (p/f) −ref(X)→ #
4.3 Divergences
In spite of the powerful law of inheritance from descendants, there can be nodes that have no refusal barbs at all. This can happen as a result of concealing an infinite sequence of f-transitions, for example in the process RUN. Because RUN can always do an f, RUN\\f can always make an internal transition
RUN\\f −τ→ RUN\\f
As a consequence, even after any number of f's, RUN\\f is never stable. Such a process has an empty set of refusals. What is worse, it has an infinite sequence of τ-transitions, and is therefore said to diverge. Divergence is often considered an undesirable feature of a concurrent system, because there is no way of controlling the amount of system resource that a divergent system may consume. In practice, divergence is a common mechanism of denial-of-service attacks on the World Wide Web, and one would like to prove its impossibility. On the other hand, in some circumstances, maybe the possibility of divergence is of no concern, for example when probabilistic reasoning proves that it is vanishingly unlikely. It is therefore desirable for a process calculus to provide some means of specifying whether a process is allowed to diverge; and that is the purpose of the 'div' barb; its introduction allows refinement as a means of proving its absence where it is not wanted.
A node has a div barb if it is the origin of a potentially infinite series of τ-transitions, as stated by the obvious healthiness condition.
Healthiness condition for div
p −div→ #   iff   p −τ→ p1 −τ→ p2 −τ→ . . . forever
The effect of this new barb on the constants and operators of our process calculus is given by the following theorem
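In a finite graph the div barb is decidable: a node diverges exactly when some node on a τ-cycle is reachable from it by τ-steps alone. A hypothetical sketch:

```python
def diverges(p, tau_edges):
    """True iff p is the origin of an infinite sequence of tau transitions."""
    succ = {}
    for (a, b) in tau_edges:
        succ.setdefault(a, set()).add(b)

    def tau_reachable(start):
        seen, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        return seen

    # p diverges iff some tau-reachable node can reach itself by >= 1 tau step
    for n in tau_reachable(p):
        if any(n in tau_reachable(m) for m in succ.get(n, ())):
            return True
    return False
```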
Theorem 4.3.1
CHAOS −div→ #
STOP, RUN and f.p do not have divergence barbs.
(p |&| q), (p |or| q), (p ⊓ q), (p □ q) and p\f have a divergence barb iff one (or both) of their operands has a divergence barb.
(p\\f) −div→ #   iff   ∃n. p/(f^n) −div→ #  ∨  ∀i. f^i ∈ traces(p)
In the divergence model of CSP, the occurrence of divergence is regarded as so undesirable that it is assumed that no specification will actually allow it, and that any process that allows even just the possibility of divergence is so bad that it is not worth differentiating from any other process that allows it. This very strict view of divergence is taken from Dijkstra's theory of programming, and it is consistent with a simple treatment of recursion as the largest fixed point of its defining equation. The view can be introduced into a process calculus by an additional healthiness condition.
Healthiness condition for CHAOS (in the divergence model of CSP)
p −div→ #   ⟹   p −τ→ CHAOS
A consequence of this definition is that even divergent processes will have at least one refusal barb. In the standard models of CSP, the semantics of a non-deterministic process is given in terms of its traces, its failures, and its divergences. Introduction of barbs enables us to model all of these as just ordinary barbed traces. A failure is just a trace with a refusal barb at the end. A divergence is just a trace with a divergence barb at the end. Thus failures/divergences refinement (FDR) can be considered as simple trace refinement. Theorem 2.2.6 continues to hold: trace refinement and similarity are the same.
5. Conclusion
In this paper we have started with the classical theory of deterministic automata, and the languages of traces which they generate. In the classical the-
ory, a particular automaton can be completely specified either by its transition graph, with equality implied by mutual similarity, or by equality of the language of traces generated by the labels on all the paths leading from a node. The two methods of defining a process are isomorphic, and so they are mathematically indistinguishable. Modern process algebras have extended the classical theory of automata, firstly by removing the restriction to finite state automata, and secondly by introduction of non-determinism. The second extension has led to a dichotomy in the study of process algebra, arising from selection of a refinement (as in CSP) or a simulation (as in CCS) as the basis of comparisons between processes. The central goal of this paper has been to re-establish the isomorphism between the two approaches, even in the presence of non-determinism. CCS provides a fixed collection of primitive processes and operators capable of modelling arbitrary concurrent systems, and it defines them in terms of a fixed set of primitive transitions. All other operators that are needed are expected to be definable in terms of the primitive set. Proofs about the calculus can therefore be made by induction on the structure of the syntax and on the number of operations involved in its execution. The concept of bisimulation gives a way of proving the most essential equations among terms. Bisimulation minimises the risk of obscuring distinctions between processes that may later be required in its potential range of applications. Bisimilarity has a number of excellent qualities. It is based directly on an operational semantics for the process calculus, which provides an abstract model for its implementation. This kind of semantics appeals to the operational intuition of programmers, which is especially useful when diagnosing errors by testing a program. Bisimulation admits simple and elegant proofs, using coinduction. And for particular programs, proofs can often be replaced by mechanical checking, because bisimulation is a direct description of an algorithm that can be used by an efficient model checker. Subtle variations in the definition of bisimilarity have offered wide scope for research, and new versions can fairly easily be introduced for new purposes. The standard variants are sufficiently powerful to reduce every (finite) process term to a normal form, thereby permitting powerful algebraic techniques to be used in reasoning. For processes containing recursion, a head normal form is available. CSP is based on the concept of a trace (possibly barbed) as a description of the observable behaviour of a concurrent interactive process. There is an extensible collection of operators defined in terms of the trace sets that they generate. Proofs about the calculus are conducted by standard mathematical theories of sets and sequences. Basic properties of the calculus are postulated by means of healthiness conditions, which must be preserved by each operator that is introduced into the syntax. Further barbs and healthiness conditions can be introduced to model properties of particular systems, but they may require re-
strictions on the use of any operator that does not preserve the condition. CSP pays particular attention to notions of correctness of implementation. In a specification, the traces may be described in any clear and convenient formalism; whereas implementations are expressed in the notations of the calculus, and intermediate designs can exploit a mixture of notations. Correctness is modelled by trace inclusion, so the calculus supports the standard engineering design strategies of stepwise refinement and stepwise decomposition. The intention is to make refinement and equality as comprehensive as possible, so that programs are easier to prove correct and to optimise. But care has been taken to describe and distinguish undesirable behaviours like deadlock and divergence, which can afflict any real distributed system. Of course, when bisimulation and refinement are reconciled, all their separate advantages can be combined and exploited whenever the occasion demands. Although in this paper only CCS and CSP have been considered in detail, it is hoped that the reconciliation can be extended to more modern process calculi. The secret of reconciliation of a simulation-based calculus with a trace-based one is to require the reduction relation to be a simulation. This and other healthiness conditions can be expressed as additional transition rules, suitable for inclusion at will in the operational semantics of any calculus that seeks reconciliation. The new transitions may be interpreted by an implementer of the calculus as permission to be kind to the user of the process, in the sense of giving the user more opportunities to avoid deadlock. The new transitions offer additional possibilities for resolving non-determinism at compile time; they validate more algebraic laws, so giving more opportunities for optimisation. But there is no compulsion to be kind – in fact, during system test, it is actually kinder for an implementation to expose all possibilities of error. That too is allowed by the theory.
Acknowledgements The goal of unification of theories of concurrency was first pursued in the Esprit Basic Research Action CONCUR. This was a Europe-wide research contract aiming at concurrence (in the sense of agreement) between various process algebras current at the time, particularly ACP [Bergstra and Klop, 1985], CCS and CSP. A renewed attempt to reconcile simulation with refinement was encouraged by the earlier success of power simulation [Gardiner, 2003]. The inspiration for the particular approach of this paper derives from a Workshop held in Microsoft Research Ltd., at Cambridge on 22-23 July 2002. Those who contributed to the discussion were Ernie Cohen, Cedric Fournet, Paul Gardiner, Andy Gordon, Robin Milner, Sriram Rajamani, Jakob Rehof, and Bill Roscoe.
Subsequently, He Jifeng and Jakob Rehof made essential simplifications and clarifications. The first draft of this paper was published by the Technical University at Munich as lecture notes for the Marktoberdorf Summer School in August 2004. This version was prepared with the kind assistance of Ali Abdallah and was presented at the CSP25 symposium at the South Bank University in July 2004. It was published by Springer [Abdallah et al., 2005] and is reproduced here by permission.
References
[Abdallah et al., 2005] Abdallah, A. E., Jones, C. B., and Sanders, J. W., editors (2005). Twenty-five Years of Communicating Sequential Processes, Lecture Notes in Computer Science. Springer-Verlag.
[Bergstra and Klop, 1985] Bergstra, J. and Klop, J. W. (1985). Algebra of communicating processes with abstraction. Theoretical Computer Science, 37(1):77–121.
[Brookes, 1983] Brookes, S. D. (1983). On the relationship of CCS and CSP. In Proceedings of the 10th Colloquium on Automata, Languages and Programming, volume 154 of Lecture Notes in Computer Science, pages 83–96. Springer-Verlag.
[Gardiner, 2003] Gardiner, P. (2003). Power simulation and its relation to traces and failures refinement. Theoretical Computer Science, 309(1):157–176.
[Milner, 1989] Milner, R. (1989). Communication and Concurrency. Prentice Hall.
[Milner and Sangiorgi, 1992] Milner, R. and Sangiorgi, D. (1992). Barbed bisimulation. In Kuich, W., editor, Proceedings of the 19th International Colloquium on Automata, Languages and Programming (ICALP '92), volume 623 of Lecture Notes in Computer Science, pages 685–695. Springer-Verlag.
[Park, 1981] Park, D. (1981). Concurrency and automata on infinite sequences. In Deussen, P., editor, Proceedings of the 5th GI Conference, volume 104 of Lecture Notes in Computer Science, pages 167–183. Springer-Verlag.
[Phillips, 1987] Phillips, I. (1987). Refusal testing. Theoretical Computer Science, 50(3):241–284.
[Plotkin, 1981] Plotkin, G. D. (1981). A structural approach to operational semantics. Technical Report DAIMI FN-19, Aarhus University.
[Roscoe, 1998] Roscoe, A. W. (1998). The Theory and Practice of Concurrency. Prentice Hall.
[van Glabbeek, 1997] van Glabbeek, R. (1997). Notes on the methodology of CCS and CSP. Theoretical Computer Science, 177(2):329–349.
COMPUTATION ORCHESTRATION
A Basis for Wide-Area Computing

Jayadev Misra
The University of Texas at Austin
Austin, Texas 78712, USA
Dedicated to the memory of Amit Garg, August 31, 1977 – April 5, 2003, who was a group member in this project.
Abstract
We explore the following quintessential problem: given a set of basic computing elements, how do we compose them to yield interesting computation patterns. Our goal is to study composition operators which apply across a broad spectrum of computing elements, from sequential programs to distributed transactions over computer networks. Our theory makes very few assumptions about the nature of the basic elements; in particular, we do not assume that an element's computation always terminates, or that it is deterministic. We develop a theory which provides useful guidance for application designs, from integration of sequential programs to coordination of distributed tasks. The primary application of interest for us is orchestration of web services over the internet, which we describe in detail in this paper.

1. Introduction
We explore the following quintessential problem: given a set of basic computing elements, how do we compose them to yield interesting computation patterns. Our goal is to study composition operators which apply across a broad spectrum of computing elements, from sequential programs to distributed transactions over computer networks. Our theory makes very few assumptions about the nature of the basic elements; in particular, we do not assume that an element's computation always terminates, or that it is deterministic. We develop a theory which provides useful guidance for application designs, from integration of sequential programs to coordination of distributed tasks.
We introduce site as a general term for the basic computing elements and Orc as the theory of orchestration of sites. We study distributed application design in general, with particular emphasis on orchestration of web services over the internet. A web service is a site. More generally, a distributed transaction, which can be regarded as an atomic step of a larger computation, is a site. We sketch some of the requirements for sites later in this section and in greater detail in section 2. Our composition operators are quite minimal, inspired by the operators of Kleene algebra (the regular expressions of language theory are one instance of Kleene algebra). We use alternation ( | ) for parallel composition and sequencing ( ≫ ) for sequential composition. Additionally, we propose a mechanism to introduce local variables into an expression, which permits us to implicitly specify computation order, selectively prune parallel threads, and transfer data across subcomputations. We show a variety of examples from web services and other domains to illustrate the power of the composition operators. A simple semantic definition is made possible by the simplicity of the operators. Our operators obey a number of axioms of Kleene algebra.
1.1 Wide-area computing
The computational pattern inherent in many wide-area applications is this: acquire data from one or more remote services, calculate with these data, and invoke yet other remote services with the results. Additionally, it is often required to invoke alternate services for the same computation to guard against service failure. It should be possible to repeatedly poll a service until it supplies results which meet certain desired criteria, or to ask a service to notify the user when it acquires the appropriate data. And it should be possible to download an application and invoke it locally, or have a service provide the results directly to another service on behalf of the user. We call the smooth integration of services orchestration. Orchestration requires a better understanding of the kinds of computations that can be performed efficiently over a wide-area network, where the delays associated with communication, unreliability and unavailability of servers, and competition for resources from multiple clients are dominant concerns. Consider the following example. A client contacts two airlines simultaneously for price quotes. He buys a ticket from either airline if its quoted price is no more than $300, the cheapest ticket if both quotes are above $300, and any ticket if the other airline does not provide a timely quote. The client should receive an indication if neither airline provides a timely quote. This is a typical wide-area computing problem, which is difficult to program using the traditional sequential programming constructs. We propose composition operators to express solutions to such problems succinctly.
1.2 An Overview of the Orchestration Theory
Starting an orchestration. We propose a simple extension to a sequential programming language to permit orchestration. Introduce an assignment statement of the form
z :∈ f
where z is a variable and f is the name of an orchestration expression (abbreviated to Orc expression, or, simply, expression).¹ Evaluation of f may entail a wide-area computation involving, possibly, multiple servers. The evaluation returns zero or more results, the first one of which (if there is one) is assigned to z. If the evaluation yields no result, the statement execution does not terminate. Additionally, the evaluation may initiate computations which have effects on other servers, and these effects may or may not be visible to the client. Next, we give a brief introduction to the structure of Orc expressions.

¹ The notation :∈ is due to Hoare. It neatly expresses, in analogy with the assignment operator :=, that the evaluation of the right side may yield a set of values, one of which is to be assigned to z.
Site. The simplest Orc expression is a site name. Evaluation of the expression calls the site like a procedure. Each call to a site elicits at most one response; it is possible that a site never responds to a call. A site call may also have parameters. Consider evaluation of the Orc expression M, where M is a news service. It may simply return the latest news page. Calling M with a parameter, as in M(d) where d is a date, downloads the news page for the specified date. Let Email(a, m) send message m to address a. Evaluation of the expression Email(a, m) sends the message, causing a permanent change in the state of the recipient's mailbox, and returns a signal to the client to denote completion of the operation. Let A be an airline flight-booking site. Evaluating the expression A returns the booking information and causes a state change in the airline database. A ticket is purchased only by making an explicit commitment later in the computation. A site could be a function (say, to convert an XML file to a bit stream for transmission), a method of an object (say, to gain access to a password-protected object; in this case, the password, or an encrypted form of it, would be a parameter of the call), a monitor [Hoare, 1974] procedure (such as read or write to a buffer, where the read responds only when the buffer is non-empty), or a web service (say, a stock quote service that delivers the latest quotes on selected stocks). A transaction [Gray and Reuter, 1993] can be a site that returns a result and tentatively changes the states of some servers. An orchestration may involve humans as sites. A program which coordinates the rescue efforts after an earthquake will have to accept inputs from the
288 medical staff, firemen and the police, and direct them by sending commands and information to their hand-held devices. Humans communicate with the orchestration by sending digital inputs (key presses) and receiving output suitable for human consumption (print, display or audio). A call to a site may not return a result if, for instance, the server or the communication link is down. This is treated as any other non-terminating computation. We show how time-outs can be used to alleviate this problem.
Composition Operators. The simplest Orc expressions, as we have seen, are site calls. We use composition operators on expressions to form longer expressions (and orchestrate their evaluations). Orc has three composition operators: (1) ≫ for sequential composition, (2) | for symmetric parallel composition, and (3) where for asymmetric parallel composition. Additionally, we structure an expression by allowing expression definitions, and using names of expressions in other expressions. Naming also allows recursive definitions of expressions, which is essential in any substantive application design. We give a brief summary of the composition operators; there is a detailed description in section 3.2. Evaluation of an expression produces a (possibly empty) stream of values. For Orc expressions f and g:

(Sequential Composition) To evaluate (f ≫ g), first start f and for each value produced by f start a new thread to evaluate g. Pass the value from f to this thread. Thus, there may be multiple threads for g executing simultaneously, one for each value produced by f. The output stream of (f ≫ g) consists of the values produced by the g-threads in time-order.

(Symmetric Parallel Composition) To evaluate (f | g), start f and g simultaneously as independent threads. The output stream consists of the values produced by both threads in time-order.

(Asymmetric Parallel Composition) Operators ≫ and | only create threads. In {f where x :∈ g}, expressions f and g are treated asymmetrically, and variable x may be defined by g and used by f. To evaluate the expression, start f and g simultaneously. When g produces its first value, assign it to x and terminate further computation of g. During evaluation of f, any site call which does not name x as a parameter may proceed, but site calls in which x is a parameter are deferred until x acquires a value. The output stream of {f where x :∈ g} is the stream of values produced by f under this evaluation strategy.
data from mirror sites or to compute a result by calling alternate services. And where allows creation of multiple threads, as well as pruning of the computation. A small example below illustrates many of these concepts.
Example. A machine is assembled from two parts, u and v. For each part there are two vendors, u1 and u2 for u, and v1 and v2 for v. It is required to compute a cost for the machine as follows. Solicit quotes for u from both vendors and accept the first quote received. Then solicit quotes for v, again accepting the first received quote. Compute the machine cost to be the sum of the part costs and produce this as the result. First, we define the expression ContactVendor, which has sites M and N (vendor names) as formal parameters. It returns the first quote received from either. We use a predefined site let, where let(x) returns the value of x.

ContactVendor(M, N) ∆ {let(x) where x :∈ (M | N)}

Next, define the expression to compute the cost of the machine by first acquiring the part costs. Below, site call add(u, v) returns u + v. (We use two assignments under one where clause, a syntactic convention. See page 294.) Note how the computation for v is started only after a value is assigned to u.

Cost ∆ {add(u, v) where u :∈ ContactVendor(u1, u2)
                        v :∈ let(u) ≫ ContactVendor(v1, v2)}

We start the computation by executing the following statement in the main (host language) program.

z :∈ Cost

This assigns the value produced by Cost to z. In this example, each expression produces at most one value. The computation does not terminate if an expression, such as ContactVendor(u1, u2), produces no value (because neither vendor responds).
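For comparison, the same orchestration pattern can be mimicked in a conventional concurrent language, at the cost of considerably more plumbing. The following sketch is purely illustrative (it is not Orc and not part of this paper's proposal; the vendor sites are stubbed with fixed delays and prices): it uses Python's asyncio to take the first quote from each pair of vendors and then add the two results, much as Cost does.

```python
import asyncio

# Illustrative stubs only: each "site" responds with a fixed quote after a delay.
async def vendor(name, delay, price):
    await asyncio.sleep(delay)
    return price

async def contact_vendor(m, n):
    """First response wins; the slower call is cancelled (like 'where x :∈ (M | N)')."""
    tasks = {asyncio.create_task(m), asyncio.create_task(n)}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

async def cost():
    u = await contact_vendor(vendor('u1', 0.2, 50), vendor('u2', 0.1, 60))
    # quotes for v are solicited only after u is known, as in the Orc definition
    v = await contact_vendor(vendor('v1', 0.3, 40), vendor('v2', 0.1, 45))
    return u + v

print(asyncio.run(cost()))   # here: 60 + 45 = 105
```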
1.3 Power of the Orc computation model
The proposed programming model is quite minimal. It has no inherent computational power; it has to rely on external sites for doing even arithmetic. However, this apparent limitation permits us to study orchestration in isolation and to combine sites of arbitrary complexity in a computation, without making any assumptions about their behavior. Our model includes no explicit constructs for time-out or thread synchronization and communication, features
which are common in thread-based languages. We show in section 4 how such constructs are easily implemented in Orc. As a special case, single-threaded computations (as in sequential computing) are also easy to code in Orc; they resemble programs written in a functional language like Haskell [Haskell Language Report, 1999]. We program arbitrary process-network-style computations by having threads correspond to processes, communicating through sites that implement channels. In many distributed applications, no value is returned from a computation because the computation never terminates by design. An Orc computation can start and spawn (a bounded or unbounded number of) threads: some may terminate on their own (thus, returning values), some are deliberately terminated (using a where clause, still returning a value), and others continue to run forever (without returning a value), though they affect the states of sites. This generality permits us to program a variety of thread-based applications using a small number of concepts.
Structure of the paper. The goal of this paper is to introduce the Orc programming model and to illustrate its application in diverse areas of programming. In a companion paper in this volume [Hoare et al., 2004], we propose a semantic model, and in forthcoming papers we describe an implementation of Orc and develop strategies to commit the computation of a specific thread (if the thread makes calls to transactions which need commitment). We discuss several issues related to sites in section 2. In particular, we state some assumptions we do not make about sites. We define a few sites which are fundamental to effective programming in Orc. We describe the syntax in section 3.1 and an operational semantics in section 3.2. Most programming is done by learning certain idioms. We develop a number of idioms in section 4, which show the programming strategy for sequential computing, time-out, and communication and synchronization among threads. Section 5 contains a few laws, describing equivalences over Orc expressions. We develop some longer examples in section 6. These are motivated by the intended application domain of Orc, web services orchestration. Treatment of a realistic example would take much longer, in time and (paper) space. But our examples illustrate that Orc provides succinct representation for a variety of distributed applications.
2. Sites

2.1 Properties of sites
Each terminal element in an Orc expression is a site call. A site call has the same form as a function call: the name of a site followed by an optional list of parameters. Therefore, the simplest Orc expression is the name of a site. A
parameter is a constant, variable, or a special symbol θ, which we explain in section 3.2.5 (page 297). In this paper, we do not specify exactly how a site is to be called; the kinds of communication protocols to be used and the servers on which the computations of a site take place are not relevant to our theory. It is possible to designate a site as being downloadable —as is the case with most Java applets— which causes a site call to result in a download and execution of the application on the client’s machine. More elaborate schemes for migration and execution may be specified for certain sites. In general, calling a site causes execution of the corresponding procedure at the appropriate servers. Such concerns are not addressed within Orc. A site is different in several ways from a mathematical function. First, a site call may have side-effects, changing the state of some object. Second, a site call may elicit no response, or produce different results with the same input at different times. In particular, a site may return no result for one call and a result for an identical call (with the same inputs) at a different time. This is because the server or the communication link may have failed during the former call. Third, the response delay of a site is unpredictable.
2.2 Types of results produced by sites
A site is called with values of certain types and it returns typed values. The internet already supports a number of esoteric data types, such as news pages, downloadable files, images, animation and video, url strings, email lists, order forms, etc. The result returned by a discovery service is of type site. We expect the variety of types to proliferate in the coming years. Many of these types will be XML document types[Extensible Markup Language (XML), 2001]; see Cardelli [Cardelli, ] for an interesting presentation on this and related topics. Even though it is a fascinating area, we will not pursue the question of how various types will be handled within a traditional sequential programming language. We merely assume that a result produced by a site can be assigned to a program variable. We introduce a type, called signal, which has exactly one value. Its purpose is to indicate the termination of some expression evaluation.
2.3 States changed by site calls
A site call can potentially affect the state of the external world in addition to returning a value to the client. The state changes could be one of the following: (1) no (discernible) state change, (2) a permanent state change, or (3) a tentative state change.
A site which is a function (in the strict mathematical sense) causes no state change. (Although its execution consumes resources, such aspects are not relevant to our work.) Similarly, a query on a database does not cause visible state change, though it may have the benign side-effect of rearranging the data for faster access in the future.
A call to an Email site causes a permanent state change in the mailbox of the intended recipient. This state change cannot be rolled back. Any rollback strategy is application dependent, say, by sending a cancellation message, which the recipient has to interpret appropriately.
A call to a site that implements a transaction will usually cause a tentative state change. Imagine booking an airline ticket through its web site or trading stocks online at a brokerage service. The tentative state changes are made permanent only by explicit commitment (i.e., the user confirms the purchase of the airline ticket or buys the stock). If the transaction is not confirmed in a timely manner, the state changes are rolled back.
A transaction can be regarded as an atomic instruction which is either executed completely or not at all. This permits us to build larger computational units by composing the atomic instructions in various ways. And a transaction has no permanent effect unless it is committed. This permits us to explore alternative computations, each computation being a series of transactions, and to commit to a specific computation (i.e., all transactions in it), only after observing the results of different computations. For example, a client may book tickets at different airlines, compare their prices and then confirm the cheapest one. In a forthcoming paper, we describe a protocol to select and commit an appropriate subset of transactions that are invoked during a computation.
2.4 Some Fundamental Sites
We define a few sites that are fundamental to effective programming in Orc.
  let(x, y, · · ·) returns a tuple consisting of the values of its arguments.
  Clock returns the current time at the server on which this site is implemented. The value is an integer.
  Atimer(t), where t is an integer and t ≥ Clock, returns a signal at time t.
  Rtimer(t), where t is an integer and t ≥ 0, returns a signal after exactly t time units.
  Signal returns a signal immediately. It is the same as Rtimer(0).
  For a boolean value b, if (b) returns a signal if b is true and it remains silent (i.e., does not respond) if b is false.
The names Atimer and Rtimer indicate absolute and relative values of their arguments. Note that
  Atimer(t) ≡ Rtimer(t − c)
  Rtimer(u) ≡ Atimer(u + c)
where c is the current value of Clock. The timer sites are used for computations involving time-outs. Time is measured locally by the server on which the client (and the timer) reside. Since the timer is a local site, the client experiences no network delay in calling the timer or receiving a response from it; this means that the signal from the timer can be delivered at exactly the right moment. With t = 0, Rtimer responds immediately.
3. Syntax and Semantics
We describe the syntax and operational semantics of Orc in this section. The notation, which we have outlined in section 1.1, is quite simple, and can be adapted easily for many sequential host languages.
3.1 Syntax
A computation is started from a host language program by executing an Orc statement
  z :∈ f (parameter-list)
where z is a variable of the host program, f is the name of a (defined) expression and parameter-list is a (possibly empty) comma-separated list of parameters. See section 3.2.11 for the semantics of Orc statements. The syntax of expression definition is
  exprDefinition ::= exprName(parameter-list) ∆ expr
We next describe expr, which denotes an Orc expression. In the following, f and g are Orc expressions, F is the name of an expression defined separately, and x is a variable.
  expr      ::= term | f ≫ g | f >tag> g | f where x :∈ g | F(parameter-list)
  term      ::= 0 | 1 | site(parameter-list)
  parameter ::= constant | variable | θ
  tag       ::= variable | θ
Important notational convention. We write f ≫ g for f >θ> g. In practice, f ≫ g is used heavily.
Here are some example expressions where M , N and R are sites, x and y are parameters, u and v are tags, and F is an expression of two arguments defined elsewhere.
  M,   N(x),   1 | M,   M | N(x),   M ≫ N(x),
  M ≫ {M ≫ 0 | {N(x) where x :∈ R | N(y)}},
  F(x, y) >u> N(u) >u> N(u)
Binding powers of the operators. Operators ≫ and >tag> are identical; the tag θ is implicit in the former case. The binding powers of the operators, in increasing order of precedence, are: ∆, where, :∈, | , ≫. So
  M ≫ 0 | N(x) where x :∈ R | N(u)
stands for
  {(M ≫ 0) | N(x) where x :∈ (R | N(u))}
Operator ≫ is right associative. So M >x> (N(x) | R) >y> S(x, y) is M >x> {(N(x) | R) >y> S(x, y)}.
Scopes of variable names. An expression has several kinds of variables: (1) global variables, which are the formal parameters in its definition, (2) local variables, which are the variables defined within where clauses, and (3) tags. Each of these may be named as a parameter in a site call. The scope rules determine how the free variables in an expression (which occur as parameters in site calls) bind with definitions of variables. A global variable can be named as a parameter anywhere in an expression. Variable x is local in {f where x :∈ g}; any free occurrence of x in f is bound to this variable. The scope of tag x in f >x> g is expression g. Because of the right associativity of ≫, g extends as far to the right as possible over ≫. We do allow the same tag name to occur more than once in a scope, as in
  M >x> N(x) >x> R(x).
In case of ambiguity in reference, as in referencing x in R(x), the tag with the smallest scope (i.e., the most recent one) is chosen. Because of right associativity,
  M >x> N(x) >x> R(x)  is  M >x> (N(x) >x> R(x)).
Therefore, there is no ambiguity in the reference to x in N(x). In R(x), the reference is to the second occurrence of >x>, because it is the one with the smallest scope.
Other notational conventions. We write a sequence of local variable definitions under one where clause, as follows. Expression {f where x :∈ g} where y :∈ h is also written as
  f where
    x :∈ g
    y :∈ h
or, {f where x :∈ g, y :∈ h}. In a group of local variable definitions, an expression can refer only to the variables defined below it. Note that f , g and h are all evaluated in parallel in the above expression. We use a variety of brackets, () {} , in writing expressions to make them easier to read. They are interchangeable. An expression of the form | i : 0 ≤ i ≤ 2 : Pi is an abbreviation for P0 | P1 | P2 . We omit the range of i when it is clear from the context.
3.2 Operational semantics
In this section, we describe the semantics of Orc in operational terms. A formal semantics appears in a companion paper in this volume[Hoare et al., 2004]. Evaluation of an expression (for a certain set of global variable values) yields a stream of values; additionally, the evaluation may assign values to certain tags and local variables. We describe the evaluation procedure for expressions based on their syntactic structures.
3.2.1 Term: Site call. The simplest expression is a site name without parameters. To evaluate the expression, call the site and the value returned by the site becomes the (only) value of the expression. For example, let CNN be a site that returns a newspage. Then the expression CNN simply returns the page as its value. A site call with parameters is strict; that is, the site is called only when all its parameters are defined. The parameters and return value of a site can be of any type (see section 2.2), including a site name, which can be called later in the Orc expression. For instance, let Email(address, message) call site Email with a recipient address and the message to be sent to him. The site call causes an email to be sent and a value to be returned (possibly, a signal) to indicate completion of the send operation. Let D(r) call a discovery service D with service requirement r; site D returns the name of a site which provides service r. 3.2.2 Term: Expression call. An expression call is syntactically similar to a site call, with the name of an expression replacing a site name. However, there are several semantic differences. First, a site call produces at most one value whereas an expression may produce many.
Second, calling an expression starts evaluation of a new instance of that expression; that is, f ≫ f refers to two different instances of f. A site call, typically, will not create new instances of the site, but will queue its callers and serve them in some order. Third, a site call is strict in that its actual parameter values are defined before the call. An expression call is non-strict; evaluation of an expression begins when it is called, even if some of its actual parameters are undefined. See sections 3.2.7 and 3.3.3 for elaboration.
3.2.3 Operator ≫ for sequential composition. Operator ≫ and its more general form >tag> allow sequencing of site calls. First, we describe ≫, the sequencing operator whose tag, θ, is implicit. Expression M ≫ N first calls M, and on receiving the response from M calls N. The value of the expression is the value returned by N. Site N can reference the value returned by M using a tag, as we describe below. As an example, Rtimer(1) ≫ Email(address, message) sends the email after a unit delay and returns a signal (the value from Email). Expression Rtimer(1) ≫ Rtimer(1) has the same effect as Rtimer(2). Operator ≫ is right associative. So
  Email(address1, message) ≫ Email(address2, message) ≫ Notify
sends two emails in sequence and then calls Notify. The examples we have shown so far each produce at most one value. In this case, ≫ has the same meaning as the sequencing operator in a conventional sequential language (like “;” in Java). For expression f ≫ g, where f and g are general Orc expressions, f produces a stream of values, and each value causes a fresh evaluation of g. The values produced by all instances of g, in time order, form the stream produced by f ≫ g. Note that during the evaluation of f ≫ g, threads for both f and g may be executing simultaneously. We elaborate on this below and after introducing more general Orc expressions.
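The stream reading of ≫ can also be sketched outside Orc. The following Python fragment is an illustration only: it runs the instances of g one after another, whereas Orc runs them concurrently and merges their outputs in time order; the names f and m are hypothetical stand-ins for Orc expressions and sites.

    # Sketch, not Orc: a sequential model of f >> g.
    # Every value published by f starts a fresh instance of g.
    def seq(f, g):
        for v in f():          # each value from f ...
            yield from g(v)    # ... spawns a new evaluation of g

    def f():
        yield from (0, 1, 2)   # f publishes three values

    def m(x):                  # plays the role of a site call M(x)
        yield x * x

    print(list(seq(f, m)))     # -> [0, 1, 4]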
3.2.4 Tag. In M ≫ N, we have merely specified an order of site calls without showing how N may reference the value produced by M. We write M >x> N, where x is a tag (a variable name), to assign a name to the value produced by M. Then N can reference this value as x. Expression
  M >x> (N(x) | R) >y> S(x, y)  is  M >x> {(N(x) | R) >y> S(x, y)},
from the right associativity of ≫. That is, the scope of a tag extends as far to the right as possible over a chain of ≫. For general Orc expressions f and g, f >x> g assigns name x to every value produced by f. Each value is referenced in a different thread (an instance of g)
as x. For example, suppose f produces three values, 0, 1 and 2. We show the computation of f >x> M(x) schematically in figure 1. Here, each path in the tree is an independent thread.

Figure 1.  Computation of f >x> M(x): each of the values 0, 1, 2 produced by f binds x in a separate instance of M(x).
A variable name may appear as a tag more than once in an expression, as in
  M >x> N(x) >x> R(x).
This is a bad programming practice, and it should be avoided. As explained under the scope rules, a tag reference is to the most recent matching tag name, i.e., the one with the smallest scope.
Associativity of ≫. We require right associativity of >x> only to establish the scope of tag x. It is easily shown that ≫ is fully associative; that is, (f ≫ g) ≫ h = f ≫ (g ≫ h). More generally,
  (f >x> g) >y> h = f >x> (g >y> h)
if h does not reference x.
3.2.5 Default tag: θ. If the tag is absent in a sequencing operator, as in M ≫ N, we take it to mean M >θ> N, where θ is called the default tag. Using θ is a convenient way to reference a value over a short scope without assigning it an explicit name, as in M ≫ N(θ). The default tag is redefined with every ≫. From the scope rules, in M ≫ N(θ) ≫ R(θ) the first occurrence of θ refers to the value produced by M and the latter to that of N.

3.2.6 Operator | for symmetric parallel composition. Using the sequencing operator, we can only create single-threaded computations. We introduce | to permit symmetric creation of multiple threads. Evaluation of (M | N) creates two parallel threads (one for M and one for N), and produces values returned by both threads in the order in which they are computed. Given that CNN and BBC are two sites that return newspages, CNN | BBC
may potentially return two newspages. (It may return zero, one or two values depending on how many sites respond.) In general, evaluation of f | g, where f and g are Orc expressions, creates two threads to compute f and g, which may, in turn, spawn more threads. The result from each thread is a stream of values. The result from f | g is the merge of these two streams in time order. If both threads produce values simultaneously, their merge order is arbitrary. Operator | is commutative and associative.
In traditional thread-based languages, f | g returns a single value, either the first value computed by one of the threads or a tuple of values combining the first one from each thread. The first strategy makes a commitment to the first value, discarding all other values. The second strategy, often called fork-join parallelism, requires both threads to deliver results before proceeding with further computation. In Orc, each value from either thread is treated independently in further computations. Therefore, (f | g) ≫ h creates multiple threads of h, one for each value from f | g. The two traditional computation styles can be expressed in Orc, as we will show later.
It is instructive to consider the expression (M | N) ≫ R. The evaluation starts by creating two threads to evaluate M and N. Suppose M returns a value first. Then R is called. If N returns a value next, R is called again. That is, each value from (M | N) spawns a thread for evaluating the remaining part of the expression. In (M | N) ≫ R(θ), the value that spawns R is referenced as θ in R. In (M | N) >x> R(x), this value is given a name, x.
Expressions M | M and M are different; the former makes two parallel calls to M, and the latter makes just one. The expression M ≫ (N | R) is different from M ≫ N | M ≫ R. In the first case, exactly one call is made to M, and N and R are called after M responds. In the second case two parallel calls are made to M, and N and R are called only after the corresponding calls respond. The difference is significant where M returns different values on each call, and N and R use those values. The two computations are depicted pictorially in figure 2.

Figure 2.  (a) M ≫ (N | R) and (b) M ≫ N | M ≫ R
Earlier, we wrote
  Email(address1, message) ≫ Email(address2, message) ≫ Notify
to send two emails and then call Notify. The emails may be sent in parallel using
  {Email(address1, message) | Email(address2, message)} ≫ Notify
However, Notify is called twice, once for each email. The operators ≫ and | can only create threads, not destroy them. Our next operator permits us to terminate parts of an expression evaluation selectively.
3.2.7 Operator where for asymmetric parallel composition. An expression with a where clause (henceforth, called a where expression) has the form {f where x :∈ g}. Expression f may name x as a parameter in some of its site calls. Evaluation of the where expression proceeds as follows. Evaluate f and g in parallel. When g returns a result, assign the value to x and terminate evaluation of g. During evaluation of f, any site call which does not name x as a parameter may proceed, but site calls in which x is a parameter are deferred until x acquires a value. The stream of values produced by f under this evaluation strategy is the stream produced by {f where x :∈ g}.
A useful application of where is in pruning the computation selectively, by destroying certain threads. Consider (M | N) >x> R(x), where each value produced by (M | N) creates an instance of R(x). To create just one thread for R(x), corresponding to the first value produced by (M | N), use
  {R(x) where x :∈ (M | N)}
We solve the notification problem from the previous section by using a where expression.
  {let(u, v) ≫ Notify
    where
      u :∈ Email(address1, message)
      v :∈ Email(address2, message)}
Expression calls are non-strict because the semantics of where expressions demand it. Consider {F(x) where x :∈ g}, where F is the name of an expression. The semantics of where require that we start the evaluation of F(x) and g simultaneously, i.e., before x has a value. An implementation has to pass x by reference (where the value of x will be stored when it is defined) to F.
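The operational effect of a where clause can also be sketched in a host language. In the Python fragment below (an illustration only; the functions g and f are hypothetical stand-ins, and real Orc would additionally terminate the remaining evaluation of g), a future plays the role of the local variable x: both sides start together, and only the calls of f that mention x are deferred until x is bound.

    # Sketch of {f where x :in g}: evaluate f and g in parallel; calls of f
    # that name x block until g delivers its first value.
    from concurrent.futures import ThreadPoolExecutor
    import time

    def g():
        time.sleep(0.1)
        return 42                    # only this first value of g is kept

    def f(x):
        print("call not involving x proceeds immediately")
        v = x.result()               # a call naming x waits for x to be bound
        print("call using x =", v)

    with ThreadPoolExecutor() as pool:
        x = pool.submit(g)           # start g
        pool.submit(f, x).result()   # start f in parallel with g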
3.2.8 Constant terms 0 and 1. There are two constant terms in Orc, 0 and 1. Treat each as a site. Site 0 never responds and 1 responds immediately with the value of θ. We have many uses for both constants, which we illustrate throughout the paper. Expression
  {Email(address1, message) ≫ 0 | Notify}
sends an email but never waits for its response and calls Notify immediately. A site like Email is called an asynchronous procedure in polyphonic C# [Benton et al., 2004]; no response is needed from it to proceed with the main computation.
Suppose we have to call sites M and N in order, and return the response from each. We cannot use M ≫ N because the response from M is not a value of the expression. And (M | N) does not call M and N in order. We can use M ≫ (1 | N), which starts two threads after receiving the response from M. One thread (expression 1) returns the value from M, and the other (expression N) calls N and returns its value.
Tag Elimination. We can eliminate all non-default tags using the following identity. We include tags in Orc only for convenience in programming.
  (f >x> g) = (f ≫ {g where x :∈ 1})
3.2.9 Defining Orc expressions. Essential to program structuring is the ability to write a long expression in terms of other expressions that are defined separately. In Orc, an expression is defined by its name, a list of parameters which serve as its global variables, and an expression which serves as its body. As an example, consider the definition
  Asynch(M, N) ∆ M ≫ 0 | N
which defines the name Asynch, specifies its formal parameters (sites M and N) and its body. Another expression may call it, for example, in Asynch(Email(address1, message), Notify). The effect of the call is to evaluate a new instance of the expression. The actual parameters may not all be defined, so they are passed by reference.
An expression evaluation produces a stream of values. In the following example, an expression may return up to two values. Sites P and Q manage the calendars of two different professors. Calling P(t), where t is a time, returns t if the corresponding professor can attend a meeting at t, and it is silent (i.e., returns no value) otherwise. Expression PmeetQ has two parameters, u and v, which are two possible meeting times, and it outputs those times (out of u and v) when both P and Q can meet. So, it may produce 0, 1 or 2 outputs.
  PmeetQ(u, v) ∆ {P(u) ≫ let(x) where x :∈ Q(u)} | {P(v) ≫ let(x) where x :∈ Q(v)}
3.2.10 Recursive definitions of expressions. Naming expressions has the additional benefit that we can use the name of an expression in its own
definition, getting a recursive definition. Below is an expression which emits a signal every time unit, starting immediately.
  Metronome ∆ Signal | Rtimer(1) ≫ Metronome
Parameters may appear in recursive calls in the usual fashion. Define a bounded metronome to generate n signals at unit intervals, starting immediately. We permit pattern matching over parameter values in the same style as Haskell [Haskell Language Report, 1999].
  BMetronome(0)     ∆ 0
  BMetronome(n + 1) ∆ Signal | Rtimer(1) ≫ BMetronome(n)
Site Query returns a value (different ones at different times) and Accept(x) returns x if x is acceptable. Produce all acceptable values by calling Query at unit intervals forever.
  RepeatQuery ∆ Metronome ≫ Query ≫ Accept(θ)
Or, produce all acceptable values by calling Query at unit intervals n times.
  RepeatQuery(n) ∆ BMetronome(n) ≫ Query ≫ Accept(θ)
Using only the basic composition operators, an expression can produce only a bounded number of values. As we see in Metronome, recursive definitions allow unbounded computations. Many more examples of the use of recursion appear throughout the paper.
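For intuition only, a Metronome-like stream can be written as a Python generator; the sleep call stands in for Rtimer(1) and is an approximation, since Orc's timer is exact and the recursion above is rendered here as a loop.

    # Sketch: streams resembling Metronome and BMetronome(n).
    import time

    def metronome(period=1.0):
        while True:
            yield "signal"        # Signal
            time.sleep(period)    # Rtimer(1), then the recursive call

    def bmetronome(n, period=1.0):
        for _ in range(n):
            yield "signal"
            time.sleep(period)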
3.2.11 Starting and ending a computation. A computation is started from a host language program by executing an Orc statement
  z :∈ f (parameter-list)
where z is a variable of the host program and f is the name of an expression, followed by a list of parameters. All parameters of f have values before its evaluation starts, unlike expression calls made during an evaluation. To execute this statement, start the evaluation of f with actual parameters substituted for the formal ones, assign the first value produced to variable z, and then terminate the evaluation. If f produces no value, the execution of the statement does not terminate. Thus, z :∈ RepeatQuery(10) assigns the first value (if any) returned by RepeatQuery to z.
In many distributed programming applications, f never produces a value though it has effects on the external world through its site calls. Several such examples appear in sections 4 and 6. In such cases, the Orc statement should be placed within a thread of the host language program with the expectation of non-termination.
3.3 Non-determinism and Referential Transparency
3.3.1 Angelic non-determinism. In evaluating (M | N) ≫ R, it is tempting to accept the first value computed for (M | N) and call R only with this input, a form of demonic choice. But we reject this strategy, because we would like to explore all possible computation paths denoted by the expression. That is, we employ angelic non-determinism. Therefore, we call R with all values returned by M and N. And R may respond after, say, N has returned its value, but fail to respond after M. One pleasing outcome of this evaluation strategy is that we have the identity demanded by the axioms of Kleene algebra (see section 5), (M | N) ≫ R = M ≫ R | N ≫ R, and, more generally, the following distributivity law over expressions f, g and h.
  (Right distributivity of ≫ over | )    (f | g) ≫ h = (f ≫ h | g ≫ h)
See section 4.10 for a solution to the eight queens problem which exploits angelic non-determinism.
3.3.2 Demonic non-determinism. In a functional programming language like Haskell [Haskell Language Report, 1999], the where operator provides a convenient mechanism for program structuring and efficient evaluation of expressions. It is not a necessity, because of referential transparency: a variable defined by a where clause can be eliminated from an expression by replacing its occurrence by its definition. In Orc, the where clause is essential to implement demonic non-determinism: to accept a single value of an Orc expression and discard the remaining ones. Therefore,
  let(x) ≫ M where x :∈ N | R
is not equivalent to
  (N | R) ≫ M
In x :∈ N | R, the first value computed for N | R is assigned to x, and subsequent values are discarded. However, (N | R) ≫ M forces evaluation of M for each value of N | R. The angelic form of programming allows us to explore all possible computation paths, and the demonic form permits a more efficient evaluation strategy, used when only some of the paths need to be explored.
3.3.3 Referential Transparency. Orc is referentially transparent: the name of an expression can be replaced by its body in any context to yield an equivalent expression. (We define equivalence of Orc expressions in the companion paper in this volume [Hoare et al., 2004]. Some equivalences are shown in section 5.) We show that referential transparency and the semantics of where expressions force us to implement non-strict evaluation of expressions. Consider the example of parallel-or (section 4.8), which we reproduce below.
  Parallel-or ∆ {if (x) | if (y) | or(x, y)
    where
      x :∈ M
      y :∈ N }
Define
  Por(u, v) ∆ {if (u) | if (v) | or(u, v)}
Under referential transparency, the expression
  {Por(x, y)
    where
      x :∈ M
      y :∈ N }
has the same semantics as Parallel-or. This requires us to call Por as soon as the evaluation of the where expression starts, i.e., before x and y are defined.
3.4 Small Examples
We give a number of small examples to familiarize the reader with the programming notation. Some fundamental programming idioms appear in the next section and a few longer examples appear in section 6.
Timing thread creations. Make four requests to site M, in intervals of one time unit each.
  M | Rtimer(1) ≫ M | Rtimer(2) ≫ M | Rtimer(3) ≫ M
If site M returns result v before t time units, set z to v; if after t (or never), set z to 0; if at t, set z to either.
  z :∈ M | Rtimer(t) ≫ let(0)
If the computation shown above is to be embedded as part of a larger expression evaluation, we write
  {let(z) where z :∈ M | Rtimer(t) ≫ let(0)}
Selective timing with threads. Receive N's response as soon as possible, but no earlier than 1 unit from now. Expression Rtimer(1) ≫ N delays calling N for a time unit and expression {N >x> Rtimer(1) ≫ let(x)} delays producing the response for a unit after it is received. What we want is to call N immediately but delay its response until a time unit has passed.
  DelayedN ∆ {Rtimer(1) ≫ let(u) where u :∈ N}
We can use this expression to give priority to M over N. Request M and N for values, but give priority to M by allowing its response to overtake N's response provided it arrives within the first time unit.
  x :∈ M | DelayedN
Flow rate calculation. Count the number of values produced by expression f in 10 time units. We use a local site count which implements a counter. The initial value of the counter is 0; calling count.inc increments the counter and returns a signal, and count.read returns the counter value. In this solution, the value returned by count.inc is explicitly ignored, because we are only interested in producing a single value after 10 time units.
  f ≫ count.inc ≫ 0 | Rtimer(10) ≫ count.read
This expression can be used to compare the rate at which two sources (say, expressions f and g) are producing values. We may then choose one source over another when both are producing the same stream of values. Flow rate computation is important in many applications. Cardelli and Davies [Cardelli and Davies, 1999] introduce a basic language construct to compute flow rates for bit streams.
Recursive definition with time-out. Call a list of sites and tally the number of responses received in a certain time interval. Below, tally(L) implements this specification, where L is a list of sites, m is a (fixed) argument for each site call, and the time interval is 10 units. This example illustrates the use of recursion over a list. We use the Haskell [Haskell Language Report, 1999] notation for lists, denoting an empty list by [ ], and a list with head x and tail xs by (x : xs).
  tally([ ])    ∆ let(0)
  tally(x : xs) ∆ {add(u, v)          — add(u, v) returns the sum of u and v
    where
      u :∈ x(m) ≫ let(1) | Rtimer(10) ≫ let(0)
      v :∈ tally(xs)}
4. Programming Idioms
Lexical conventions. Orc does not include any facility for doing primitive operations on data, such as arithmetic or predicate evaluation. We have to call specific sites to carry out such operations. For example, to add x and y we need to call add(x, y) which returns the sum. In our examples, we take the liberty of writing x + y as an arithmetic expression; it is easily converted to an Orc expression by a compiler. Similarly, we write expressions over booleans, lists and other data types.
4.1 Sequential computing
Orc is not intended as a replacement for sequential programming. Yet its constructs can be used to simulate control structures of sequential programming languages, as we show in this section.
Sequencing. The sequential program fragment (S; T) is (S ≫ T) in Orc. If S is an assignment statement x := e, the Orc code is (E >x> T), where Orc expression E returns the (single) value of e. This encoding also supports reassignments of variables.

Conditional execution. A typical if-then-else statement,
  if b then S else T
is coded in Orc as
  if (b) ≫ S | if (¬b) ≫ T
Note that of the two threads created here, only one can proceed to compute a value. As a specific example, the following expression returns the absolute value of its numerical argument.
  absolute(x) ∆ if (x ≥ 0) ≫ let(x) | if (x < 0) ≫ let(−x)
Iteration. A typical loop in an imperative program has the form
  while b do x := S(x)
where x may be a set of variables. We simulate this code fragment in Orc as follows; the value returned by the Orc expression is that of x.
  loop(x) ∆ if (b) ≫ S(x) ≫ loop(θ) | if (¬b) ≫ let(x)
Consider a typical program which starts with an initialization, followed by a loop and a terminating computation.
  x := x0; while b do x := S(x); return T(x)
This is equivalent to the Orc expression {loop(x0) ≫ T(θ)}.

4.2 Kleene Star and Primitive Recursion
In the theory of regular expressions, M* denotes the set of strings formed by concatenating zero or more M symbols. By analogy, we would like to define an expression, Mstar(x), which returns the stream of results
  x,  M(x),  M(x) ≫ M(θ),  M(x) ≫ M(θ) ≫ M(θ),  . . .
Our definition of this expression is
  Mstar(x) ∆ let(x) | M(x) ≫ Mstar(θ)
Closely related to Mstar(x) is Mplus(x), which returns the same stream as Mstar(x) except its very first value, i.e., the stream
  M(x),  M(x) ≫ M(θ),  M(x) ≫ M(θ) ≫ M(θ),  . . .
We define
  Mplus(x) ∆ M(x) ≫ (1 | Mplus(θ))
More general expressions which take M as a parameter are
  Star(M, x) ∆ let(x) | M(x) ≫ Star(M, θ)
  Plus(M, x) ∆ M(x) ≫ (1 | Plus(M, θ))
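The streams produced by Star and Plus have a direct counterpart as generators. The sketch below is not Orc: M is an ordinary function here and the elements are produced sequentially rather than concurrently, but it makes the intended streams explicit.

    # Sketch: the stream x, M(x), M(M(x)), ... of Star(M, x),
    # and Plus(M, x), which omits the initial x.
    import itertools

    def star(m, x):
        while True:
            yield x
            x = m(x)

    def plus(m, x):
        return star(m, m(x))

    print(list(itertools.islice(star(lambda v: v + 1, 0), 4)))  # [0, 1, 2, 3]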
Creating a stream of successive approximations. Consider a numerical analysis program which computes its final value by successive approximations from an initial value. It checks each produced value for a convergence criterion, and stops the computation once a convergent value is found (i.e., one that meets the convergence criterion). Let site Refine(x) return a refined approximation of x and converge?(x) return x if x is a convergent value; it is silent otherwise. We define expressions RefineStream(x), which returns a stream of successive approximations starting from x, and RefineConverge(x), which returns the substream of RefineStream(x) of convergent values.
  RefineStream(x)   ∆ Star(Refine, x)
  RefineConverge(x) ∆ RefineStream(x) ≫ converge?(θ)
More direct definitions are
  RefineStream(x)   ∆ let(x) | Refine(x) ≫ RefineStream(θ)
  RefineConverge(x) ∆ converge?(x) | Refine(x) ≫ RefineConverge(θ)
If it is required to stop the computation after the first convergent value is found, use the expression {let(z) where z :∈ RefineConverge(x)}.
4.3 Arbitration
A fundamental problem in concurrent computing is arbitration: to choose between two threads and let only one proceed. Arbitration is the essence of mutual exclusion. In process algebras like CCS and CSP, specific operators are included to allow arbitration; in very simple terms, α.P + β.Q is a process which behaves as process P if action α happens and as Q if β happens. In Orc terms, α and β correspond to sites Alpha and Beta, and P and Q are expressions. We have the expression Alpha ≫ P | Beta ≫ Q, though we wish to evaluate only one of the threads, P or Q, depending on which site, Alpha or Beta, responds first. (This is similar, though not identical, to the process algebra expression, where only one of α or β succeeds; here, we have to attempt both Alpha and Beta, and choose one when both succeed.) Below, boolean variable flag encodes which of Alpha and Beta responds first.
  if (flag) ≫ P | if (¬flag) ≫ Q
    where
      flag :∈ Alpha ≫ let(true) | Beta ≫ let(false)
If P and Q use the values from Alpha and Beta, modify the program:
  if (flag) ≫ let(x) ≫ P | if (¬flag) ≫ let(x) ≫ Q
    where
      (x, flag) :∈ Alpha ≫ let(θ, true) | Beta ≫ let(θ, false)
An important special case of arbitration involves time-out: run P if Alpha responds within 1 time unit, otherwise run Q. This amounts to encoding Beta as Rtimer(1). A more detailed treatment of time-out appears next. The Orc model permits more complex arbitration protocols, such as: execute one of P, Q and R, depending on how many sites out of Alpha, Beta and Gamma respond within 10 time units.
4.4 Time-out
To add a time-out to {z :∈ f}, write {z :∈ f | Rtimer(t) ≫ let(x)}, which either returns a result from f, or times out after t units and returns x. A typical paradigm is to return the value from site M only if it arrives before t units following the call; an indication is also returned of whether the value is from M or a time-out has occurred. The following expression returns a pair as its value: (x, true) if M returns x before the time-out, and (−, false) if there is a time-out, where − is irrelevant.
  let(z)
    where
      z :∈ M ≫ let(θ, true) | Rtimer(t) ≫ let(θ, false)
As a more involved example, call Refine repeatedly (starting with initial value x0) and return the last value (the most refined) before time t. Below, BestRefine(t, x) implements this specification. It returns x if the computation times out at t; otherwise it returns BestRefine(t, y), where y is the value returned by Refine before the time-out.
  BestRefine(t, x) ∆ if (b) ≫ BestRefine(t, y) | if (¬b) ≫ let(x)
    where
      (y, b) :∈ Refine(x) ≫ let(θ, true) | Atimer(t) ≫ let(θ, false)
We find it easier to have an absolute time t as parameter of BestRefine; it can be modified to a relative time easily. Define BestRefineRelative(h, x), where h is a relative time, as
  Clock ≫ BestRefine(θ + h, x)
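The basic time-out pattern (return M's value, with an indication, if it arrives within t units, and a default otherwise) can be sketched in a host language as follows. The Python code is an illustration under the assumption that the site is an ordinary blocking call; M here is a hypothetical stand-in.

    # Sketch of the time-out idiom: (value, True) if M answers within
    # t seconds, (None, False) otherwise.
    from concurrent.futures import ThreadPoolExecutor, TimeoutError
    import time

    def M():
        time.sleep(0.5)          # stands for an unpredictable response delay
        return "answer"

    def call_with_timeout(site, t):
        with ThreadPoolExecutor() as pool:
            fut = pool.submit(site)
            try:
                return (fut.result(timeout=t), True)
            except TimeoutError:
                return (None, False)

    print(call_with_timeout(M, 1.0))   # ('answer', True)
    print(call_with_timeout(M, 0.1))   # (None, False)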
4.5 Fork-join Parallelism
In concurrent programming, we often need to spawn two independent threads at a point in the computation, and resume the computation after both threads complete. Such an execution style is called fork-join parallelism. There is no special construct for fork-join in Orc, but it is easy to code such computations. The following code fragment calls sites M and N in parallel and returns their values as a tuple after they both complete their executions.
  {let(u, v)
    where
      u :∈ M
      v :∈ N }
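A host-language rendering of this fork-join fragment (illustrative only, with M and N as hypothetical blocking calls) is:

    # Sketch of fork-join: call M and N in parallel, return the pair (u, v)
    # only after both have responded.
    from concurrent.futures import ThreadPoolExecutor

    def M():
        return "value of M"

    def N():
        return "value of N"

    with ThreadPoolExecutor() as pool:
        fu, fv = pool.submit(M), pool.submit(N)   # fork
        u, v = fu.result(), fv.result()           # join: wait for both
    print((u, v))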
As a simple application of fork-join, consider refreshing a display device at unit time intervals. The display is drawn by calling site Draw with a triple: a given screen image, keyboard inputs and the mouse position. We use Metronome (see section 3.2.10, page 300) to generate a signal at every unit, then start independent threads to acquire the image, keyboard inputs and the mouse position, and on completion of all three threads, call Draw. We code this as
  Metronome ≫
  {let(i, k, m)
    where
      i :∈ Image
      k :∈ Keyboard
      m :∈ Mouse } ≫
  Draw(θ)
The implicit assumption in this code is that i, k and m are evaluated faster than the refresh rate of one time unit.
4.6 Synchronization
Synchronization of threads is fundamental in concurrent computing. There is no special machinery for synchronization in Orc; a where expression provides the necessary ingredients for programming synchronizations. Consider two threads M ≫ f and N ≫ g; we wish to execute them independently, but synchronize f and g by starting them only after both M and N have completed.
  {let(u, v) where u :∈ M, v :∈ N} ≫ (f | g)
If the values returned by M and N have to be passed on to f and g, respectively, we modify the expression to
  {let(u, v) where u :∈ M, v :∈ N} >(u, v)> (f | g)
Barrier synchronization. The form of synchronization we have shown is known in the literature as barrier synchronization. In the general case, each independent thread executes a sequence of phases. The (k + 1)th phase of a thread is begun only if all threads have completed their kth phases. A straightforward generalization of the given expression solves the barrier synchronization problem. Barrier synchronization is common in scientific computing. For example, Gauss-Seidel iteration proceeds in phases where the (k + 1)th approximations for all variables are computed from their kth approximations. In heat transfer computation over a grid, the temperature at point (i, j) at moment k + 1 is
the average temperature over its neighboring points at moment k. The computation proceeds until some convergence criterion is met (we assume that the boundary points have constant temperature). We give a sketch of heat transfer computation in Orc. Given the temperature matrix x for some moment, where xij is the temperature at grid point (i, j), Refine(x) produces matrix y, the temperature at the next moment. Site Next computes the temperature at a point from its and its neighbors' previous temperatures. Typically, it would return the average temperature of the neighboring points of (i, j) if (i, j) is not a boundary point, but it may implement more sophisticated strategies. For a boundary point, the neighboring temperatures are irrelevant and it returns the previous temperature.
  Refine(x) ∆ {let(y)
    where
      ∀i, j :: yij :∈ Next(xi,j, xi−1,j, xi+1,j, xi,j−1, xi,j+1) }
As we have shown in section 4.2, we can get a convergent value by using RefineConverge. Using this strategy, the heat transfer computation is run by z :∈ RefineConverge(I) where I is the initial temperature matrix.
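For comparison, the same barrier discipline (every thread finishes phase k before any thread starts phase k + 1) is what a barrier object provides in a thread-based host language. The Python sketch below is an analogy only, not a rendering of the Orc expression above.

    # Sketch: N workers proceed in phases; no worker starts phase k+1
    # until all have completed phase k.
    import threading

    N, PHASES = 3, 2
    barrier = threading.Barrier(N)

    def worker(i):
        for k in range(PHASES):
            print(f"worker {i} finished phase {k}")
            barrier.wait()            # barrier synchronization

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
    for t in threads: t.start()
    for t in threads: t.join()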
4.7 Interrupt
Consider an Orc expression which orchestrates the vacation planning for a family. It makes airline and hotel reservations by contacting several sites and choosing the most suitable ones according to the criteria set by the client. Suppose the client decides to cancel vacation plans while the Orc program is still executing. There is no mechanism for the client to interrupt the program because an Orc expression is evaluated like an arithmetic expression, not as a process which waits to receive messages. In this section, we show how an expression evaluation can be interrupted, and more importantly, how a different computation (such as roll back) can be initiated in case of interruption. This is important in many practical applications, such as B2B transactions, where clients of a company may interrupt its computations by specifying new requirements, and vendors may wish to renegotiate their promises about parts delivery. For the vacation planner, an interruption by the client may require it to cancel any reservations it may have made and terminate the computation.
We have already seen a form of interrupt: time-out. To allow for general interrupts, we set up sites Interrupt.set and Interrupt.get. An external agent calls Interrupt.set to interrupt the evaluation of an expression. And, Interrupt.get returns a signal only if Interrupt.set has been called earlier.
Note the similarity of Interrupt to a semaphore, where set and get are the V and P operations on the semaphore. If a call on site M can be interrupted, use
  let(z) where z :∈ M | Interrupt.get
where z acquires a value from M or Interrupt.get. Often we wish to determine if there has been an interrupt. Then we return a tuple whose first component is the value from M (if any) and the second component is a boolean to indicate whether there has been an interrupt.
  let(z)
    where
      (z, b) :∈ M ≫ let(θ, true) | Interrupt.get ≫ let(θ, false)
An easy generalization is to interrupt a stream. Below, expression callM calls M repeatedly until it is interrupted. It produces a stream of tuples: (x, true) for value x received from M and (−, false) for interrupt. It terminates the computation after receiving an interrupt.
  callM ∆ let(x, b) | if (b) ≫ callM
    where
      (x, b) :∈ M ≫ let(θ, true) | Interrupt.get ≫ let(θ, false)
It is easy to extend this solution to handle different types of interrupts, by waiting to receive from many possible interruption sites, and returning specific codes for each kind of interrupt. Typically, occurrence of an interrupt is followed by interrupt processing. An expression which processes the values from M and the interrupt differently is shown below.
  callM >(x, b)>
    { if (b)  ≫ “Normal processing with value x”
    | if (¬b) ≫ “Interrupt Processing” }

4.8 Non-strict Evaluation; Parallel-or
A classic problem in non-strict evaluation is Parallel-or: computation of x ∨ y over booleans x and y. The value of x ∨ y is true if either variable value is true; therefore, the expression evaluation may terminate even when one of the variable values is unknown. In this section, we state the problem in Orc terms, give a simple solution, and show examples of its use in web services orchestration. Suppose sites M and N return booleans. Compute the parallel-or of the two booleans, i.e., (in a non-strict fashion) return true as soon as either site returns true and false only if both sites return false. In the following solution, site or (x, y) returns x ∨ y.
  {if (x) | if (y) | or(x, y)
    where
      x :∈ M
      y :∈ N }
This solution may return up to three different values depending on how many of x and y are true. To return just one value, use
  {let(z)
    where
      z :∈ if (x) | if (y) | or(x, y)
      x :∈ M
      y :∈ N }
A generalization of this expression for a list of sites is as follows.
  Paror([ ])     ∆ let(false)
  Paror(u : us)  ∆ {let(z)
    where
      z :∈ if (x) | if (y) | or(x, y)
      x :∈ u
      y :∈ Paror(us)}
We can use the strategy of parallel-or to evaluate any function f of the form
  f(x, y) =  p(x)      if c(x)
             q(y)      if d(y)
             r(x, y)   otherwise
where x and y are received from different sites. Many search problems over partitioned databases have this structure.
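The non-strict behavior of parallel-or can be mimicked in a host language: return true as soon as either call yields true, and false only after both have answered. The sketch below assumes M and N are ordinary blocking calls (hypothetical stand-ins) and is not the Orc semantics itself.

    # Sketch of parallel-or over two boolean-returning calls.
    from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
    import time

    def M():
        time.sleep(0.2); return True

    def N():
        time.sleep(1.0); return False

    def parallel_or(m, n):
        pool = ThreadPoolExecutor()
        pending = {pool.submit(m), pool.submit(n)}
        result = False
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            if any(f.result() for f in done):
                result = True          # answer known without the other response
                break
        pool.shutdown(wait=False)      # do not wait for the slower call
        return result

    print(parallel_or(M, N))           # -> True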
Airline Booking. We show a typical orchestration example in which parallel-or plays a prominent role in one of the solutions. There are two airlines A and B, each of which returns a quote, i.e., the price of a ticket to a certain destination. We show several variations in choosing a quote. First, compute the cheapest quote. Below, Min is a site which returns the minimum of its arguments.
  {Min(x, y) where x :∈ A, y :∈ B}
Our next solution returns each quote that is below some threshold value c, and there is no response if neither quote is below c. Assume that site threshold
returns the value of its argument provided it is below c, and it is silent otherwise. Expression
  (A | B) ≫ threshold(θ)
returns each quote that is below the threshold. To obtain at most one such quote, we write
  {let(z) where z :∈ (A | B) ≫ threshold(θ)}
To return any quote if it is below c as soon as it is available, and otherwise return the minimum quote, we use the strategy of parallel-or.
  {threshold(x) | threshold(y) | Min(x, y)
    where
      x :∈ A
      y :∈ B}
4.9 Communicating Processes
Orchestration is closely tied to distributed computing. Traditional distributed computing is structured around a network of processes, where the processes communicate by participating in events, or reading and writing into common channels. Processes are usually long-lived entities. In many cases, we do not expect a distributed computation to terminate. Programming constructs of Orc, as we have seen, can implement essential distributed computing paradigms, such as arbitration, synchronization and interrupt. We argue that they are also well-suited for encoding process-based computations. As a small example, consider a light bulb which is controlled by two switches. Flipping either switch changes the state of the bulb, from off to on and on to off. This behavior is captured by
  Light ∆ {let(x) where x :∈ switch1 | switch2} ≫ ChangeBulbState ≫ Light
This expression never returns a value, but causes the light bulb to change state (through site call ChangeBulbState). Note that only one of the switch flips is recognized if both switches are flipped before the bulb state changes.
Channel. We introduce channels for communication among processes. It is not an Orc construct; each channel has to be implemented by sites outside Orc. We assume in our examples that channels are FIFO and unbounded, though
other kinds of channels (including rendezvous-based communications) are easily implemented through sites. Channel c has two methods, c.get and c.put, which are called as sites from an Orc expression. Calling c.put(m) adds item m to the end of the channel and returns a signal. Calling c.get returns the value at the head of c and removes it from c; if the channel is empty, c.get queues the caller until the channel becomes non-empty.
Fairness. We make no fairness assumption about the queuing discipline at a site such as c.get. Calls are handled in arbitrary order and some caller may never receive a value even though values are being constantly put in the channel. However, if c is non-empty, the channel sends a value to some caller of c.get, and this value is eventually received by the caller. Therefore, a call to c.get during an expression evaluation completes eventually if c is non-empty and this is the only caller.

Process. A process is an expression which, typically, names channels which are shared with other expressions. Shown below is a simple process which reads items from its input channel c, calls site Compute to do some computations with the item and then writes the result on output channel e. We add the input channel name as a subscript to the process name.
  Pc ∆ c.get >x> Compute(x) >y> e.put(y) ≫ Pc
This process produces no value, though it writes on channel e. To also output every value that is written on e, define
  Pc ∆ c.get >x> Compute(x) >y> (1 | e.put(y) ≫ Pc)
Consider a process which processes inputs from two input channels, c and d, independently.
  P ∆ Pc | Pd
There are two independent threads executing in parallel in P, one for input from c and the other from d. The following small example illustrates a dialog with a user process. The process reads input, which is assumed to be a positive integer, from a terminal (called tty), checks if the number is prime and outputs the result to the terminal. It repeats these steps as long as input is provided to it.
  Dialog ∆ tty.get >x> Prime?(x) >b> tty.put(b) ≫ Dialog
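The process Pc has a familiar host-language counterpart: a thread that repeatedly reads from an input queue, computes, and writes to an output queue. The sketch below is only an analogy (not Orc); queue.Queue plays the role of an unbounded FIFO channel, and compute is a hypothetical stand-in for site Compute.

    # Sketch: a P_c-like process; queues stand in for channels c and e.
    import queue, threading

    c, e = queue.Queue(), queue.Queue()

    def compute(x):
        return x * x                 # plays the role of site Compute

    def process():
        while True:
            x = c.get()              # c.get
            e.put(compute(x))        # e.put(Compute(x)), then repeat

    threading.Thread(target=process, daemon=True).start()
    for v in (1, 2, 3):
        c.put(v)
    print([e.get() for _ in range(3)])   # [1, 4, 9]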
Process Network. A process network is a parallel composition of processes. There is no logical difference between a process and a network; see, for example, process P, which is defined to be Pc | Pd. Let us build a process which reads from a set of channels ci, where i ranges over some set of indices, and outputs all the items read into channel e. That is, the process creates a fair merge of the input channels. The definition is a generalization of P, shown above, for multiple input channels, though with the Compute step eliminated.
  Multiplexori ∆ ci.get ≫ e.put(θ) ≫ Multiplexori
  Multiplexor  ∆ ( | i :: Multiplexori )
Mutual exclusion. Consider a set of processes, Qi, which share a resource, and access to the resource has to be exclusive. This is a mutual exclusion, or arbitration, problem and it is easily solved in Orc. Process Qi writes its own id i to channel ci to request the resource. We employ the Multiplexor, above, to read the values from all ci and write them to channel e. The arbiter reads a value i from e and calls site Granti to permit Qi to use the resource. After using the resource, Qi returns a signal as the response of site call Granti. Expression Mutex orchestrates mutual exclusion.
  Arbiter ∆ e.get >i> Granti ≫ Arbiter
  Mutex   ∆ Multiplexor | Arbiter
Note that the solution is starvation-free for each Qi , because its request will be read eventually from ci , put in channel e, read again from e and granted. This assumes that every process i releases the resource eventually by responding to Granti . The solution is easily modified to snatch the resource from an (unyielding) process after a time-out.
Synchronized Communications: Byzantine Protocol. We can combine many of the earlier idioms to code more involved process behavior. Consider, for example, the Byzantine agreement protocol [Lamport et al., 1982], which runs for a number of synchronized rounds. In each round, a process sends its own estimate (of the consensus value) to all processes, receives estimates from all processes (including itself), and computes a revised estimate, which is sent in the next round. The communications from process i to j use channel cij. We show the orchestration of the steps, though we omit (the crucial detail of) computing a new estimate, which we delegate to a site. The sending of estimate v by process i to all processes is coded by
  Sendi(v) ∆ 1 | ( | j :: cij.put(v) ≫ 0)
Evaluation of Sendi(v) appends v to all outgoing channels of i and returns a single value. The responses from cij.put(v) are ignored (by using 0).
Expression Readi encodes one round of message receipt by process i. Below, X is a vector of estimates and Xi is its ith component.
  Readi ∆ let(X) where (∀j :: Xj :∈ cji.get)
Process i computes a new estimate from X by calling Computei(X). A round at process i is a sequence of Sendi, Readi and Computei. Define Roundi(v, n) as n rounds of computation at process i starting with v as the initial estimate. The result of Roundi(v, n) is a single estimate.
  Roundi(v, 0) ∆ let(v)
  Roundi(v, n) ∆ Sendi(v) ≫ Readi >X> Computei(X) ≫ Roundi(θ, n − 1)
The entire algorithm is coded by Byz(V, n), where V is the vector of initial estimates and n is the number of rounds. Below, i ranges over process indices.
  Byz(V, n) ∆ { | i :: Roundi(Vi, n)}
Dining Philosophers. The dining philosophers is a quintessential problem of shared resource allocation. We give a solution in Orc which resembles a process-based solution given in Hoare [Hoare, 1984]. In this example, processes communicate using bounded buffers. There are N processes, called Philosophers, where the ith process is denoted by Pi. The philosophers are seated around a table where the right neighbor of Pi is Pi′ (henceforth, i′ is (i + 1) mod N). Every pair of neighbors share a fork. The fork to the left of Pi is Forki and to his right is Forki′. Philosopher i can eat only if it holds both its left and right forks. Assume that a philosopher's life cycle consists of repeating the following steps: acquire the two adjacent forks, eat, and release the forks. Because of the seating arrangement, neighboring philosophers are prevented from eating simultaneously. Each Forki is a channel which holds only signals (assume that Eat returns a signal on completion of eating). A channel holds at most one signal at any moment, when the corresponding fork has been released but not yet picked up. Initially, each channel holds a signal. A philosopher's life is depicted by
  Pi ∆ {let(x, y)
         where
           x :∈ Forki.get
           y :∈ Forki′.get }
       ≫ Eat ≫ Forki.put(θ) ≫ Forki′.put(θ) ≫ Pi
Represent the ensemble of philosophers by
  DP ∆ ( | i : 0 ≤ i < N : Pi )
Deadlock. It is well known that the given solution for dining philosophers has the potential for deadlock. To avoid deadlock, philosophers pick up their forks in a specific order: all except P0 pick up their left and then their right forks, and P0 picks up its right and then its left fork.
  P0 ∆ Fork1.get ≫ Fork0.get ≫ Eat ≫ Fork0.put(θ) ≫ Fork1.put(θ) ≫ P0
  Pi, 1 ≤ i < N, ∆ Forki.get ≫ Forki′.get ≫ Eat ≫ Forki′.put(θ) ≫ Forki.put(θ) ≫ Pi
Evaluation of an expression leads to deadlock when it spawns threads which wait for each other. Since the threads communicate only through sites, deadlock is avoided if each site call is guaranteed to return a result. Many distributed applications communicate with web services, like a stock quote service, which have this property; so deadlock avoidance is easily established. For other site calls, like c.get on channel c, there is no guarantee of receiving a result. But by judiciously using time-outs as alternatives of site calls in Orc expressions, we can ensure that a result is always delivered, and deadlock avoided.
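The deadlock-avoiding fork order has the same shape in a lock-based host language: every philosopher but one acquires left then right, and the remaining one acquires in the opposite order. The following Python sketch is an analogy only, with locks in place of the signal-carrying channels above.

    # Sketch: asymmetric acquisition order avoids deadlock.
    import threading

    N = 5
    forks = [threading.Lock() for _ in range(N)]

    def philosopher(i, rounds=3):
        left, right = forks[i], forks[(i + 1) % N]
        # P_0 reverses the order in which it picks up its forks
        first, second = (right, left) if i == 0 else (left, right)
        for _ in range(rounds):
            with first:
                with second:
                    pass             # Eat

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("all philosophers done")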
4.10 Backtrack Search
For problems which are traditionally implemented by backtracking, we exploit angelic non-determinism of Orc to express their solutions succinctly. The evaluation of the Orc expression will create multiple threads, which may be implemented by backtracking. Among the problems which are easily coded are parsing problems in language theory and combinatorial search. We show the solution to one well-known search problem below.
A classical backtracking problem: Eight queens. The eight queens problem is to place 8 queens on a chess board so that no queen can capture another. An elegant functional solution appears in Turner [Turner, 1986]. A placement of queens in the last i rows of the board, 0 ≤ i < 8, is called a configuration. A configuration is represented by a list of integers in the range 0 through 7, denoting the column in which the corresponding queen is placed. A configuration is valid if none of the queens in it can capture any other. Site call check(x : xs), where (x : xs) is a non-empty configuration and xs is valid, returns (x : xs) provided it is valid; if (x : xs) is not valid, it remains silent. We can implement check easily; determine if the queen at x can capture any of the queens represented by xs.
Expression extend(x, n), where x is a valid configuration, n is an integer, 1 ≤ n and |x| + n ≤ 8, produces all valid extensions of x by placing n additional queens. Expression extend(x, 1) starts a number of threads, to produce all valid one-queen extensions of x. And extend(x, n) is merely the n-fold application of extend(x, 1). The original problem is solved by calling extend([ ], 8), which yields all possible solutions.
  extend(x, 1) ∆ | i : 0 ≤ i < 8 : check(i : x)
  extend(x, n) ∆ extend(x, 1) ≫ extend(θ, n − 1)
To allow n = 0 in extend(x, n), add extend(x, 0) ∆ let(x).
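For comparison with the angelic reading of extend, an explicitly backtracking version of the same search can be sketched in Python; check and extend below mirror the Orc definitions, with the set of concurrent threads replaced by a list of configurations.

    # Sketch: eight queens by explicit backtracking; a configuration is a
    # list of columns, most recently placed queen first (as in the text).
    def check(conf):
        x, xs = conf[0], conf[1:]
        for d, y in enumerate(xs, start=1):
            if y == x or abs(y - x) == d:   # same column or same diagonal
                return None                  # "remains silent"
        return conf

    def extend(conf, n):
        if n == 0:
            return [conf]
        solutions = []
        for i in range(8):
            valid = check([i] + conf)
            if valid is not None:
                solutions.extend(extend(valid, n - 1))
        return solutions

    print(len(extend([], 8)))   # 92 solutions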
5. Laws about Orc Expressions
We list a number of laws about Orc expressions. These laws are also valid for regular expressions of language theory. Some Orc expressions can be regarded as regular expressions: an Orc term corresponds to a symbol in a regular expression, 0 and 1 correspond to the empty set and the set that contains the empty string, and | and ≫ mimic alternation and concatenation. There is no operator in Orc corresponding to * of regular expressions, which we simulate using recursion. Additionally, Orc includes the where operator, which has no correspondence in language theory. All Orc expressions, including where expressions, obey the laws given in this section. They can be proved using the formal semantics of Orc.
5.1 Kleene laws
Below, f, g and h are Orc expressions.
  (Zero and | )                          f | 0 = f
  (Commutativity of | )                  f | g = g | f
  (Associativity of | )                  (f | g) | h = f | (g | h)
  (Left zero of ≫)                       0 ≫ f = 0
  (Left unit of ≫)                       1 ≫ f = f
  (Right unit of ≫)                      f ≫ 1 = f
  (Associativity of ≫)                   (f ≫ g) ≫ h = f ≫ (g ≫ h)
  (Right distributivity of ≫ over | )    (f | g) ≫ h = (f ≫ h | g ≫ h)
Some of the axioms of Kleene algebra [Kozen, 1990] (of which regular expression theory is an instance) do not hold in Orc. First is the idempotence of | , f | f = f. Consider M and M | M. These are different in Orc, because we make two calls to M in M | M, and just one in M. Also, M may return two different results for the two calls made in M | M.
In Orc, we require right associativity of >x> to delineate the scope of tag x. If the scope issue is immaterial, we have full associativity. So
  (f ≫ g) ≫ h = f ≫ (g ≫ h)
  (f >x> g) >y> h = f >x> (g >y> h), if h does not reference x.
In Kleene algebra, 0 is both a right and a left zero. In Orc, it is only a left zero; that is, f ≫ 0 = 0 does not hold. Even though neither f ≫ 0 nor 0
produces a value, evaluation of f ≫ 0 may cause changes in the external world, but 0 has no such effect. Another axiom of Kleene algebra which does not hold in Orc is the left distributivity of ≫ over | ,
  f ≫ (g | h) = (f ≫ g) | (f ≫ h)
To see why, consider M ≫ (N | R). Here, M is called once and the value it returns is used in the evaluations of both N and R. In (M ≫ N | M ≫ R), evaluations of M ≫ N and M ≫ R are treated independently, M being called once for each subexpression. The left distributivity axiom holds if f is a function; in this case, it has no impact on the external world, and it always returns the same value.
5.2 Laws about where expressions
The laws below for where expressions apply when g does not reference x.

{f | g where x :∈ h} = {f where x :∈ h} | g
{f ≫ g where x :∈ h} = {f where x :∈ h} ≫ g

6. Longer Examples

6.1 Workflow coordination
In this section, we consider a typical workflow application, where a number of activities have to be coordinated by having them occur in a designated sequence. The problem, which appears in Choi et. al.[Choi et al., 2002], is to arrange a visit of a speaker. An office assistant contacts the speaker, proposing a set of possible dates for the visit. The speaker responds by choosing one of the dates. The assistant then contacts Hotel and Airline sites. He sends the hotel and airline information to the speaker who sends an acknowledgment. Only after receiving the acknowledgment, the assistant confirms both the hotel and the airline reservations. The assistant then reserves a room for the lecture, announces the lecture (by posting it at an appropriate web-site) and requests the audio-visual technician to check the equipment in the room prior to the lecture. In our solution, we employ the following sites. GetDate(p, s): contact speaker p with a list of possible dates s; the response is a single date from s. Hotel (d): contact several hotels for a 2-night stay, leaving on date d. The response is the name of the chosen hotel, its location, price for the room and the confirmation number. This site implements the preferences of the speaker and the organization.
Airline(d): similar to Hotel.
Ack(p, t): same as GetDate except tuple t is sent and only an acknowledgment is expected as a response.
Confirm(t): confirm reservation t (for a hotel or airline).
Room(d): reserve a room for one hour on date d. The response is the room number and the time of the day.
Announce(p, q): announce the lecture with speaker information (from p), and room and time (from q).
AV(q): contact the audio-visual technician with room and time (in q).
We have structured the solution as a sequence: (1) contact the speaker and acquire a date of visit, d, (2) make both hotel (h) and airline (a) reservations, (3) acquire the acknowledgment from the speaker for h and a, (4) confirm the hotel and the airline, (5) reserve a room (q), and (6) announce the visit and contact the audio-visual technician. The value produced by evaluating the expression is of no significance.

Visit(p, s) ∆ GetDate(p, s) >d>
              {let(h, a) where h :∈ Hotel(d), a :∈ Airline(d)} >(h, a)>
              Ack(p, (h, a)) ≫
              {let(x, y) where x :∈ Confirm(h), y :∈ Confirm(a)} ≫
              Room(d) >q>
              {let(x, y) where x :∈ Announce(p, q), y :∈ AV(q)}

The problem of arranging a visit is typically more elaborate than what has been shown: the speaker needs to be picked up at the airport and the hotel, lunches and dinners have to be arranged, and meetings with the appropriate individuals have to be scheduled. These additional tasks add no complexity, just bulk, to the solution. They would be coded as separate sites and orchestrated by the top-level solution. Also, we have not considered failure in this solution, which would be handled through time-outs and retries.
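For readers who prefer a runnable analogue, the same orchestration shape (sequencing plus two fork-joins) can be sketched with Python's asyncio; every site below is a made-up stub, and the correspondence to Orc is only structural.

    # A sketch of the Visit orchestration using asyncio; every site below is a
    # hypothetical stub standing in for a real web service.
    import asyncio

    async def get_date(speaker, dates):  return dates[0]
    async def hotel(d):                  return ("SomeHotel", d, 120, "H42")
    async def airline(d):                return ("SomeAirline", d, 300, "A17")
    async def ack(speaker, info):        return "ok"
    async def confirm(reservation):      return "confirmed"
    async def room(d):                   return ("5.126", "14:00")
    async def announce(speaker, q):      return "posted"
    async def av(q):                     return "scheduled"

    async def visit(speaker, dates):
        d = await get_date(speaker, dates)                 # GetDate(p, s) >d>
        h, a = await asyncio.gather(hotel(d), airline(d))  # fork-join on Hotel, Airline
        await ack(speaker, (h, a))                         # Ack(p, (h, a))
        await asyncio.gather(confirm(h), confirm(a))       # confirm both reservations
        q = await room(d)                                  # Room(d) >q>
        await asyncio.gather(announce(speaker, q), av(q))  # announce and book AV check

    asyncio.run(visit("Prof. X", ["2005-06-01", "2005-06-08"]))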
6.2 Orchestrating an auction
We consider an example of a typical web-based application, running an auction for an item. First, the item is advertised by calling site Adv, which posts its description and a minimum bid price at a web site. Bidders put their bids
on specific channels, and we use the Multiplexor from page 315 to merge all the bids into a single channel, c. We consider three variations on the auction strategy, Auctioni (v), 1 ≤ i ≤ 3. We start the auction by executing z :∈ Auctioni (V ) where 1 ≤ i ≤ 3 and V is the minimum acceptable bid.
A Non-terminating auction. Our first solution continually takes the next bid from channel c which exceeds the current (highest) bid and posts it at a web site by calling PostNext. Below, nextBid(v) returns the next bid from c exceeding v. (The site call if(x > v) returns a signal if x > v and remains silent otherwise.)

nextBid(v) ∆ c.get >x> { if(x > v) ≫ let(x) | if(x ≤ v) ≫ nextBid(v) }

Below, Bids(v) returns a stream of bids from c where the first bid exceeds v and successive bids are strictly increasing.

Bids(v) ∆ nextBid(v) ≫ (1 | Bids(θ))

The following strategy starts the auction by advertising the item, and posts successively higher bids at a web site. But the expression evaluation never terminates.

Auction1(v) ∆ Adv(v) ≫ Bids(v) ≫ PostNext(θ) ≫ 0
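A sequential Python sketch of nextBid and Bids over a fixed, made-up stream of bids may help; it captures only the filtering logic, not the concurrency or the non-termination of Auction1.

    # A sketch of nextBid/Bids over an in-memory stream of bids (values made up);
    # the real channel c would be fed concurrently by the Multiplexor.
    bids_on_channel = iter([10, 8, 12, 12, 15, 9, 20])

    def next_bid(v):
        """Return the next bid exceeding v, or None when the stream dries up
        (the Orc nextBid simply remains silent in that case)."""
        for x in bids_on_channel:
            if x > v:
                return x
        return None

    def bids(v):
        """Yield a strictly increasing stream of bids, the first exceeding v."""
        while (x := next_bid(v)) is not None:
            yield x
            v = x

    for b in bids(5):          # plays the role of PostNext in Auction1
        print("posting new highest bid:", b)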
A terminating auction. We modify the previous program so that the auction terminates if no higher bid arrives for h time units (say, h is an hour). The winning bid is then posted by calling PostFinal, and the goal variable is assigned the value of the winning bid. Expression Tbids(v), where v is a bid, returns a stream of pairs (x, flag), where x is a bid value, x ≥ v, and flag is boolean. If flag is true, then x exceeds its previous bid, and if false then x equals its previous bid, i.e., no higher bid has been received in an hour.

Tbids(v) ∆ let(x, flag) | if(flag) ≫ Tbids(x)
           where (x, flag) :∈ nextBid(v) ≫ let(θ, true) | Rtimer(h) ≫ let(v, false)
The full auction is given by

Auction2(v) ∆ Adv(v) ≫ Tbids(v) >(x, flag)>
              { if(flag) ≫ PostNext(x) ≫ 0
              | if(¬flag) ≫ PostFinal(x) ≫ let(x) }
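The race inside Tbids, between the arrival of a higher bid and the timer Rtimer(h), corresponds to a timeout in most host languages. The asyncio sketch below is illustrative only; bid_source, the one-second stand-in for h, and all other names are invented.

    # A sketch of the Tbids race between nextBid(v) and Rtimer(h) using asyncio.
    import asyncio

    async def bid_source(queue):
        for delay, bid in [(0.1, 10), (0.2, 15), (0.3, 18)]:
            await asyncio.sleep(delay)
            await queue.put(bid)

    async def next_bid(queue, v):
        while True:
            x = await queue.get()
            if x > v:
                return x

    async def auction(v, h=1.0):
        queue = asyncio.Queue()
        producer = asyncio.create_task(bid_source(queue))
        while True:
            try:
                # race: either a higher bid arrives, or the timer h wins
                v = await asyncio.wait_for(next_bid(queue, v), timeout=h)
                print("PostNext:", v)
            except asyncio.TimeoutError:
                print("PostFinal:", v)       # no higher bid within h: auction closes
                return v

    print("winning bid:", asyncio.run(auction(5)))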
Batch processing. Our previous solution posts every higher bid as it appears in channel c. It is reasonable to post higher bids only once each hour. So, we collect the best bid over an hour and post it. If this bid does not exceed the previous posting, i.e., no better bid has arrived in an hour, we close the auction, post the winning bid and return its value as the result. Analogous to nextBid(v), we define bestBid(t, v) where t is an absolute time and v is a bid. And bestBid(t, v) returns x, x ≥ v, where x is the best bid received up to t. If x = v then no better bid than v has been received up to t. The code for bestBid(t, v) (see BestRefine of section 4.4) can be understood as follows. First call nextBid(v). If it returns y before t then y > v, and bestBid(t, y) is the desired result. If nextBid(v) times out then return v.

bestBid(t, v) ∆ if(b) ≫ bestBid(t, y) | if(¬b) ≫ let(v)
                where (y, b) :∈ nextBid(v) ≫ let(θ, true) | Atimer(t) ≫ let(θ, false)
Analogous to Tbids(v), we define Hbids(v) to return a stream of pairs (x, flag), where x is the best bid received so far and flag is true iff x was received in the last hour. Expression Hbids calls bestBid every hour until it receives no better bid. Below, the value of flag is simply the boolean x ≠ v.

Hbids(v) ∆ clock ≫ bestBid(θ + h, v) >x> { let(x, x ≠ v) | if(x ≠ v) ≫ Hbids(x) }
The code of Auction3 is identical to that of Auction2 except that Tbids in the latter is replaced by Hbids.

Auction3(v) ∆ Adv(v) ≫ Hbids(v) >(x, flag)>
              { if(flag) ≫ PostNext(x) ≫ 0
              | if(¬flag) ≫ PostFinal(x) ≫ let(x) }

6.3 Arranging and monitoring a meeting
We write a program to arrange and monitor a meeting at (absolute) time T among a group of professors. First, send a message to all professors requesting the meeting. If N responses are received within 10 time units, then proceed with the meeting arrangement, otherwise cancel the meeting and inform all professors (not just those who have responded). To proceed with the meeting arrangement, reserve a room for time T . If room reservation succeeds, announce the meeting time and room to all professors . If room reservation fails, cancel the meeting and inform all. It is given that a room can be preempted (by the department chairman) until one hour (h units) before its scheduled time. No meeting is preempted more than once. If the room is preempted (before T − h), attempt to reserve another room. If it succeeds, inform all that the meeting has been moved to another room. If room reservation fails, inform all that the meeting is now cancelled. The value of the entire computation is a boolean, false if the meeting is cancelled, true otherwise. This value can be computed only at T − h or shortly thereafter.
Messages. The computation sends several kinds of messages to the professors, which we list below. A message includes certain parameters. msg1 (t): Please respond if you can attend a meeting at time t. msg2 (t): The meeting planned for time t is cancelled due to poor response. msg3 (t): The meeting planned for time t is cancelled because no room is available then. msg4 (t, r): A meeting is scheduled at time t in room r. msg5(t, r, s): The meeting scheduled at time t in room r moved to room s. msg6 (t, r): The meeting scheduled at time t is cancelled because it was preempted from room r and no room is available at t. Site Broadcasti (p), where 1 ≤ i ≤ 6 and p is a list of parameters, sends the ith message with parameters p to all professors, and returns a signal.
Specifications of the main components. Main components of the solution are expressions Arrange, Room and Monitor. Their specifications are as follows.
Arrange(t): Send message msg1(t) to the professors and count the number of responses received in 10 time units. If this number is at least N, return true; otherwise call Broadcast2(t) and return false.
Room(t): Reserve a room, r, for time t by calling the site RoomReserve(t). If this fails (r = 0), call Broadcast3(t). If room reservation succeeds (r ≠ 0), call Broadcast4(t, r). In all cases return value r.
Monitor(t, r): Site RoomCancel(r).get returns a signal if room r has been preempted. In case of preemption before time t, attempt to reserve a room, s. If reservation succeeds (s ≠ 0), call Broadcast5(t, r, s) and return true. If room reservation fails (s = 0), call Broadcast6(t, r) and return false.
The computation structure. The overall structure of the computation is

z :∈ MeetingMonitor(T)

MeetingMonitor(t) ∆ Arrange(t) >b>
    { if(¬b) ≫ let(false)
    | if(b) ≫ Room(t) >r> ( if(r = 0) ≫ let(false) | if(r ≠ 0) ≫ Monitor(t − h, r) ) }
Code of the main components. We give the code of the main components, Arrange, Room and Monitor. The code for Arrange uses tally from section 3.4, page 304. Message m in tally is msg1(t), and prof is a list of sites, one site for each professor. Expression Arrange sends a cancellation message if the number of responses, n, is below N. It returns the value of n ≥ N in all cases.

Arrange(t) ∆ tally(prof) >n> { if(n ≥ N) ≫ let(true) | if(n < N) ≫ Broadcast2(t) ≫ let(false) }

The code of Room(t) is straightforward from its specification.
Room(t) ∆ RoomReserve(t) >r> { if(r = 0) ≫ Broadcast3(t) | if(r ≠ 0) ≫ Broadcast4(t, r) } ≫ let(r)

The code of Monitor(t, r) is straightforward from its specification.

Monitor(t, r) ∆ Atimer(t) ≫ let(true)
    | { RoomCancel(r).get ≫ RoomReserve(t) >s>
        ( if(s ≠ 0) ≫ Broadcast5(t, r, s) ≫ let(true)
        | if(s = 0) ≫ Broadcast6(t, r) ≫ let(false) ) }
We have not included time-out in the calls to RoomReserve, but this is easily done.
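Purely as an illustration of the Monitor pattern, a race between an absolute timer and a preemption signal, here is a sketch in Python's asyncio; monitor, reserve_room, the printed broadcasts and the Event standing in for RoomCancel are all hypothetical stand-ins, not Orc.

    import asyncio

    async def monitor(deadline, room, preempted):
        # Race Atimer(deadline) against the preemption signal RoomCancel(room).get.
        timer = asyncio.create_task(asyncio.sleep(deadline))
        cancel = asyncio.create_task(preempted.wait())
        done, pending = await asyncio.wait({timer, cancel},
                                           return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        if timer in done:                                    # not preempted in time
            return True
        s = await reserve_room()                             # preempted: try to rebook
        if s is not None:
            print(f"meeting moved from {room} to {s}")       # plays Broadcast5
            return True
        print(f"meeting cancelled: preempted from {room}")   # plays Broadcast6
        return False

    async def reserve_room():                                # stub RoomReserve
        return "3.110"

    async def demo():
        preempted = asyncio.Event()
        asyncio.get_running_loop().call_later(0.1, preempted.set)   # chairman preempts
        print("meeting on?", await monitor(deadline=0.5, room="5.126",
                                           preempted=preempted))

    asyncio.run(demo())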
7. Concluding Remarks

7.1 Programming language design
The notation proposed in this paper provides a minimal language to express interesting multi-threaded computations. It is not intended as a serious programming language yet, because many language-related issues, from lexical to hierarchical structuring, have been ignored. We consider some below. A number of programming paradigms appear repeatedly in Orc programming. We have listed some of them as idioms in section 4. Some coding patterns are so frequent that special notation should be designed for them. We consider a few notational issues below.
Adding code and data to expressions. The absence of any arithmetic facility in an Orc expression is a nuisance (though not a disaster) when writing actual programs. To add x and y within an Orc expression we have to call the site add(x, y), where add implements the addition procedure. We have adopted the convention of writing x + y, which a preprocessor can translate to add(x, y). A number of sequential programming features, including conditional statements and some form of iteration, should be allowed within Orc. Also, the programming language should allow most data type manipulations, including array indexing, within Orc expressions, which can then be converted to site calls. And programmers may find it more pleasing to use longer names for the cryptic symbols ≫ and |.
Nested site calls. The current syntax requires that the parameters of site calls be variables. We do not allow expressions as parameters, because they produce streams of values, not just one. But M(N(x), R(y)), where M, N and R are sites, makes sense. It is {M(u, v) where u :∈ N(x), v :∈ R(y)}. There is no technical difficulty in allowing nested site calls. An expression like M(N(x), N(x)) poses semantic ambiguity. It is not clear if N should be called twice for the two arguments of M or just once, with the value being used for both arguments. These options can be coded, respectively, as
{M(u, v) where u :∈ N(x), v :∈ N(x)}
{M(u, u) where u :∈ N(x)}
We have to study a large number of examples to decide which of these should be picked as the default semantics. The other semantics will have to be coded explicitly.
Fork-Join Parallelism. It is common to call two sites, M and N, in parallel, name their values u and v, respectively, and continue computation only after both return their values. We would code this as

{ let(u, v) where u :∈ M, v :∈ N } >(u, v)>

A convenient notational alternative is

{u ← M || v ← N}
Using this notation, the screen-refresh program of section 4.5 (page 308) looks much cleaner.

Metronome ≫ {i ← Image || k ← Keyboard || m ← Mouse} ≫ Draw(i, k, m)
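In an asynchronous host language the proposed fork-join notation corresponds to a gather; the sketch below is only an analogy, with M and N as invented stubs.

    # Fork-join: start M and N in parallel, continue only after both respond.
    import asyncio

    async def M(): await asyncio.sleep(0.2); return "m-result"
    async def N(): await asyncio.sleep(0.1); return "n-result"

    async def fork_join():
        u, v = await asyncio.gather(M(), N())   # {u <- M || v <- N}
        return u, v

    print(asyncio.run(fork_join()))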
We can also remove a tag name which is never referenced. So

{M || v ← N}

is a shorthand for

{ let(u, v) ≫ let(v) where u :∈ M, v :∈ N } >v>

The workflow coordination example (section 6.1, page 319) now becomes much simpler.

Visit(p, s) ∆ d ← GetDate(p, s) ≫
              {h ← Hotel(d) || a ← Airline(d)} ≫
              Ack(p, (h, a)) ≫
              {Confirm(h) || Confirm(a)} ≫
              q ← Room(d) ≫
              {Announce(p, q) || AV(q)}
Hierarchical definitions. The current definition of expressions treats all sites named in it as external sites. In many cases, an expression calls sites which are completely local to it, in that no other expression can (or should) call those sites. For example, consider the expressions
F ∆ f ≫ c.put(θ) ≫ 0
G ∆ c.get ≫ M ≫ (1 | G)
E ∆ F | G
in which F is a producer that writes to channel c, G a consumer from c, and E the process network consisting of F and G. Here, channel c is local to E (so are the names F and G). The following proposal allows structuring both expressions and sites into hierarchies. An expression definition consists of: (1) its name and formal parameters, (2) definitions of local sites (such as c.put and c.get, which are written in the host language, not Orc), (3) definitions of local expressions (such as F and G), (4) the body of the expression. Remote sites can still be called from an expression; a remote site name is either hard-coded as a constant or passed as a parameter to an expression. Observe that having local expressions within an expression definition allows considerable information hiding.
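The producer-consumer network E can be mimicked with two asyncio tasks sharing a queue that is local to the composite, which is the point of the proposal; E, F, G and the squaring stand-in for M below are illustrative only.

    # A sketch of the process network E = F | G with channel c local to E.
    import asyncio

    async def E(values):
        c = asyncio.Queue()                     # channel local to E (and so are F, G)
        results = []

        async def F():                          # producer: f >> c.put(theta) >> 0
            for v in values:
                await c.put(v)

        async def G():                          # consumer: c.get >> M >> (1 | G)
            while True:
                v = await c.get()
                results.append(v * v)           # v * v stands in for the site call M
                c.task_done()

        consumer = asyncio.create_task(G())
        await F()                               # run the producer to completion
        await c.join()                          # wait until G has drained the channel
        consumer.cancel()
        return results

    print(asyncio.run(E([1, 2, 3, 4])))         # -> [1, 4, 9, 16]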
7.2 Related work
This work draws upon a number of areas of computer science; we give a very brief outline of a few selected pieces of the relevant literature. The work of the W3C group is of particular importance. The Semantic Web [Group, ], a standard for the representation of data on the World Wide Web,
328 is a collaborative effort led by W3C, which integrates a variety of applications using XML for syntax and URIs for naming. We expect that our model will be particularly suitable for processing metadata and making decisions based on their values. Monadic computations in functional programming languages, particularly Haskell [Haskell Language Report, 1999], have the same flavor as Orc computations. Monads allow a functional program to call external agents, like I/O devices, which behave as sites; see Elliott [Elliott, 2004] for some particularly interesting applications. Orc evaluates expressions as Haskell programs, though Orc uses eager evaluation strategy and Haskell uses lazy. We have modeled expression definition in Orc along the lines of function definition in Haskell. There are, however, major differences between the two approaches in that Orc explicitly supports multi-threaded computations. Consequently, a backtracking search problem like eight queens is solved very differently in Orc than it is in functional programming. Orc uses angelic nondeterminism and the functional solution in Turner [Turner, 1986] uses lazy evaluation. Process-network-style computations create and destroy threads, and have threads communicate, synchronize and interrupt each other. This seems impossible to program in a functional language (however, see Du Bois et. al. [Bois et al., ] for interesting work in this area in a functional domain). Orc supports these features, yet has a functional flavor and retains referential transparency. There is a huge amount of literature on the process network model of distributed computing. Our interest in this area derives from its formal semantics, and the possibility that Orc may be a viable alternative to some of these models. A recent work of considerable importance is Benton, Cardelli and Fournet [Benton et al., 2004]. It extends the C # programming language with new asynchronous concurrency abstractions based on the join calculus[Fournet and Gonthier, 1996]. The language is applicable both to multi-threaded applications running on a single machine and to the orchestration of asynchronous, event-based applications communicating over a wide area network. Process algebras, particularly CCS [Milner, 1989] and CSP [Hoare, 1984], have much in common with the philosophy of Orc. All three represent a multithreaded computation by an expression which has interesting algebraic properties. But unlike these process algebras, Orc permits integration of arbitrary components (sites) in a computation. This is both an advantage in that we can orchestrate heterogeneous components, and a disadvantage in that we are unable to decide equivalence of arbitrary Orc expressions, by using bisimulation, for example. Orc differs in a major way from process algebras in its basic operators and the evaluation procedure. We insist on angelic non-determinism; a commit-
ment to a value is made only when the statement execution terminates, or explicitly within a where expression. We also permit arbitrary sequential compositions of expressions, f ≫ g, which is not supported in CCS or CSP.
Transaction processing has a massive number of references; a comprehensive survey appears in Gray and Reuter [Gray and Reuter, 1993]. To the best of our knowledge, no one else has applied Kleene algebra to specify transaction orchestration. Our approach has the flavor of nested transactions (see Chapter 4 of [Gray and Reuter, 1993]) though there is considerable difference in semantics. Of special importance in our work are compensating transactions, and, particularly, their use in business process orchestration languages, like BPEL [web site on BPEL, a, web site on BPEL, b]. Butler and Hoare [Butler and Hoare, ] are developing a theoretical model and algebra for process interaction which includes compensating transactions.
Harel and his co-workers [Harel and Politi, 1998] have developed a very attractive visual notation, Statecharts, to encode computations of interacting processes. Their approach has met with considerable practical success. They have also developed a rigorous semantics of the visual notation.
Acknowledgment. I am extremely grateful to C.A.R. Hoare for extensive discussions and many key insights. Galen Menzel has given me very useful feedback by carrying out an implementation of Orc in Java. He has also contributed to the programming model, particularly concerning scope and binding rules, and has given extensive comments on several drafts of the paper. Elaine Rich has helped me focus on the big issues by being skeptical at the appropriate moments. Comments and suggestions from Luca Cardelli, Ankur Gupta, Gérard Huet, Amir Husain, Mathai Joseph, Jose Meseguer, Todd Smith, Reino Kurki-Suonio and Greg Plaxton have enriched the paper.
References

[Benton et al., 2004] Benton, N., Cardelli, L., and Fournet, C. (2004). Modern concurrency abstractions for C#. TOPLAS, 26(5):769–804.
[Bois et al., ] Bois, A. R. D., Pointon, R., Loidl, H.-W., and Trinder, P. Implementing declarative parallel bottom-avoiding choice. http://www.macs.hw.ac.uk/ trinder/papers/spec.ps.
[Butler and Hoare, ] Butler, M. and Hoare, C. Personal communication.
[Cardelli, ] Cardelli, L. Transitions in programming models (Microsoft Research European Faculty Summit '03). http://research.microsoft.com/Users/luca/Slides/2003-07-16n
[Cardelli and Davies, 1999] Cardelli, L. and Davies, R. (1999). Service combinators for web computing. IEEE Transactions on Software Engineering, 25(3):309–316.
[Choi et al., 2002] Choi, Y., Garg, A., Rai, S., Misra, J., and Vin, H. (2002). Orchestrating computations on the world-wide web. In B. Monien, R. F., editor, Parallel Processing: 8th International Euro-Par Conference, volume LNCS 2400, pages 1–20. Springer-Verlag Heidelberg.
[Elliott, 2004] Elliott, C. (2004). Programming graphics processors functionally. In Proceedings of the 2004 Haskell Workshop, Snowbird, Utah, USA.
[Extensible Markup Language (XML), 2001] Extensible Markup Language (XML) (2001). Main page for World Wide Web Consortium (W3C) XML activity and information. http://www.w3.org/XML/.
[Fournet and Gonthier, 1996] Fournet, C. and Gonthier, G. (1996). The reflexive chemical abstract machine and the join-calculus. In Proceedings of the POPL. ACM.
[Gray and Reuter, 1993] Gray, J. and Reuter, A. (1993). Transaction Processing: Concepts and Techniques. Morgan Kaufmann.
[Group, ] Group, W. Semantic web. http://www.w3.org/2001/sw/.
[Harel and Politi, 1998] Harel, D. and Politi, M. (1998). Modeling Reactive Systems with Statecharts. McGraw-Hill.
[Haskell Language Report, 1999] Haskell Language Report (1999). Haskell 98: A non-strict, purely functional language. Available at http://haskell.org/onlinereport.
[Hoare, 1974] Hoare, C. (1974). Monitors: an operating system structuring concept. Communications of the ACM, 17(10):549–557.
[Hoare, 1984] Hoare, C. (1984). Communicating Sequential Processes. Prentice Hall International.
[Hoare et al., 2004] Hoare, T., Menzel, G., and Misra, J. (2004). A tree semantics of an orchestration language. In Broy, M., editor, Proc. of the NATO Advanced Study Institute, Engineering Theories of Software Intensive Systems, NATO ASI Series, Marktoberdorf, Germany.
[Kozen, 1990] Kozen, D. (1990). On Kleene algebras and closed semirings. In Proceedings, Math. Found. of Comput. Sci., volume 452 of Lecture Notes in Computer Science, pages 26–47. Springer-Verlag.
[Lamport et al., 1982] Lamport, L., Shostak, R., and Pease, M. (1982). The Byzantine Generals Problem. TOPLAS, 4(3):382–401.
[Milner, 1989] Milner, R. (1989). Communication and Concurrency. International Series in Computer Science, C.A.R. Hoare, series editor. Prentice-Hall International.
[Turner, 1986] Turner, D. (1986). An overview of Miranda. ACM SIGPLAN Notices, 21:156–166.
[web site on BPEL, a] A web site on BPEL. Available for download at http://www-106.ibm.com/developerworks/webservices/library/ws-autobp.
[web site on BPEL, b] A web site on BPEL. Available for download at http://www-106.ibm.com/developerworks/webservices/library/ws-bpel.
A TREE SEMANTICS OF AN ORCHESTRATION LANGUAGE Tony Hoare, Microsoft Research Labs, Cambridge, U.K. Galen Menzel, University of Texas, Austin, Texas 78712, USA Jayadev Misra, University of Texas, Austin, Texas 78712, USA∗ email:
[email protected],
[email protected],
[email protected] Abstract
This paper presents a formal semantics of a language, called Orc, which is described in a companion paper [3] in this volume. There are many styles of presentation of programming language semantics. The more operational styles give more concrete guidance to the implementer on how a program should be executed. The more abstract styles are more helpful in proving the correctness of particular programs. The style adopted in this paper is neutral between implementer and programmer. Its main achievement is to permit simple proofs of familiar algebraic identities that hold between programs with different syntactic forms.

1. Introduction
This paper presents a formal semantics of a language, called Orc, which is described in a companion paper[3] in this volume. Orc is designed to orchestrate executions of independent entities, called sites. Orchestration means creating multiple threads of execution where each thread calls a sequence of sites. Orc permits creating new threads, passing data among threads and selectively pruning threads. A detailed discussion of Orc and its application in a variety of problem areas appears in the accompanying paper. The semantics of Orc proposed here represents all possible threads of execution by a labelled tree. The tree is completely defined by the syntactic structure of the program, independent of its actual execution. Two programs are considered equivalent if their representative trees are equal. We show how to use the tree as a basis for reasoning about program executions. A semantics for a programming language defines the intended meaning of all syntactically correct programs in the language. It serves as a contractual
∗ Work of this author is partially supported by the National Science Foundation grant CCR–0204323.
332 interface between a language’s implementers and its users. Each implementer must guarantee that every execution of every program satisfies the definition, and each programmer may rely on this guarantee when writing programs. There are many styles of presentation of programming language semantics. The operational styles give more concrete guidance to the implementer on how a program should be executed. The abstract styles are more helpful in proving the correctness of particular programs. The style adopted in this paper is neutral between implementer and programmer. Its main achievement is to permit simple proofs of familiar algebraic identities that hold between programs with different syntactic forms. The algebraic laws may help the implementer in designing automatic optimization strategies, and the programmer in developing efficient algorithms, while preserving correctness by construction. The presentation of our semantics proceeds in two stages. First, the text of the program is translated into an abstract tree form. A standard collection of algebraic laws is proved simply by graph isomorphism. This is done in section 3. A particular execution of the program can then be recorded by annotating the nodes of the tree with information about the values of the program variables and the times at which they are assigned. Because a program is non-deterministic, there are many ways to do this: we define the complete set of valid annotations by healthiness conditions, which can be checked independently at every node of the tree. This is done in section 4. The paper concludes with a summary of further research that may prove useful for both the implementation and use of the language.
2. Syntax and Semantics
We describe the syntax and operational semantics of Orc in this section. This material is condensed from the companion paper[3] in this volume. We include it here for completeness. We encourage the reader to consult the companion paper for more details and examples.
2.1 Syntax
A computation is started from a host language program by executing an Orc statement
z :∈ f([actual-parameter])
where z is a variable of the host program, called the goal variable, f is the name of a (defined) expression, called the goal expression, and [actual-parameter] is a (possibly empty) comma-separated list of actual parameters. The syntax of expression definition is
exprDefinition  ::= exprName([formal-parameter]) ∆ expr
formal-parameter ::= variable
Next, we define the syntactic unit expr which denotes an Orc expression. Below, f and g are Orc expressions, F is the name of an expression defined separately, and x is a variable.
expr             ::= term | f ≫ g | f | g | f where x :∈ g | F([actual-parameter])
term             ::= 0 | 1 | site([actual-parameter])
actual-parameter ::= constant | variable | θ

Binding powers of the operators. The binding powers of the operators in increasing order of precedence are: ∆, where, :∈, |, ≫.

Well-formed expressions. The free variables of an expression are defined as follows, where M is a site or an expression name and L is a list of its variable parameters.
free(0) = {}, free(1) = {}
free(M(L)) = {x | x ∈ L}
free(f op g) = free(f) ∪ free(g), where op is | or ≫
free(f where x :∈ g) = (free(f) − {x}) ∪ free(g)
Variable x is bound in f if it is named in f and is not free. The binding rule is: given (f where x :∈ g), any free occurrence of x in f is bound to its binding occurrence, the x defined just after where. We rename all bound variables in where expressions so that all variable names in an expression are distinct. Expression f is well-formed if all free variables of it are its formal parameters.
Tag elimination. The full syntax of Orc includes tags, which are variables used to pass values of one expression to another. For example, we write M >x> N(x), where x is a tag, to assign a name to the value produced by M, and to pass this value to N. We do not consider tags in this paper because they can be eliminated using the following identity.
(f >x> g) = (f ≫ {g where x :∈ 1})

2.2 Operational semantics

In this section, we describe the semantics of Orc in operational terms. Evaluation of an expression (for a certain set of global variable values) yields a stream of values, which may be empty, non-empty but finite, or infinite. Additionally, the evaluation may assign values to certain tags and local variables. We describe the evaluation procedure for expressions based on their syntactic structures.
In this section, we describe the semantics of Orc in operational terms. Evaluation of an expression (for a certain set of global variable values) yields a stream of values, which may be empty, non-empty but finite, or infinite. Additionally, the evaluation may assign values to certain tags and local variables. We describe the evaluation procedure for expressions based on their syntactic structures.
2.2.1 Site call. The simplest expression is a term representing a site call. To evaluate the expression, call the site with the appropriate parameter values. If the site responds, its return value is the (only) value of the expression. In this paper we do not ascribe any semantics to a site call. Its behavior is taken as entirely arbitrary, and therefore consistent with any semantics that may be ascribed later.
2.2.2 Operator ≫ for sequential composition. Operator ≫ allows sequencing of site calls and passing values between them. To explain the sequencing mechanism, we consider first M ≫ N where both operands are site calls. Evaluation of M ≫ N first calls M, and on receiving the response from M calls N. The value of the expression is the value returned by N. The value returned by M is referred to as θ; so in M ≫ R(θ), R is called with the value returned by M as its argument. Each application of sequencing reassigns the value of θ; so in M ≫ R(θ) ≫ S(θ), the first occurrence of θ refers to the value produced by M and the latter to the value produced by R. When an expression produces at most one value, ≫ has the same meaning as the sequencing operator in a conventional sequential language (like ";" in Java). For expression f ≫ g, where f and g are general Orc expressions, f produces a stream of values, and each value causes a fresh evaluation of g. The values produced by all instances of g in time-order is the stream produced by f ≫ g. Note that during the evaluation of f ≫ g, threads for both f and several instances of g may be executing simultaneously. We elaborate on this in section 2.2.3.
2.2.3 Operator | for symmetric parallel composition. Using the sequencing operator, we can only create single-threaded computations. We introduce | to permit symmetric creations of multiple threads. Evaluation of (M | N) creates two parallel threads (one for M and one for N), and produces a stream containing the values returned by both threads in the order in which they are computed. In general, evaluation of f | g, where f and g are Orc expressions, creates two threads to compute f and g, which may, in turn, spawn more threads. The evaluation produces a series of site calls, which are merged in time order. The result from each thread is a stream of values. The result from f | g is the merge of these two streams in time order. If both threads produce values simultaneously, their merge order is arbitrary. Treatment of this is postponed to section 4. It is instructive to consider the expression (M | N) ≫ R. The evaluation starts by creating two threads to evaluate M and N. Suppose M returns a value first. Then R is called. If N returns a value next, R is called again. That is, each value from (M | N) spawns a thread for evaluating the remaining part
of the expression. In (M | N) ≫ R(θ), the value that spawns the thread for computing R(θ) is referenced as θ. Expressions M | M and M are different; the former makes two parallel calls to M, and the latter makes just one. Therefore, M produces at most one value, whereas M | M may produce two (possibly identical) values. The expression M ≫ (N | R) is different from M ≫ N | M ≫ R. In the first case, exactly one call is made to M, and N and R are called after M responds. In the second case two parallel calls are made to M, and N and R are called only after the corresponding calls respond. The difference is significant where M returns different values on each call, and N and R use those values. The two computations are depicted pictorially in figure 1.
Figure 1. (a) M ≫ (N | R) and (b) M ≫ N | M ≫ R (drawings omitted).
2.2.4 Operator where for asymmetric parallel composition. An expression with a where clause (henceforth called a where expression), has the form {f where x :∈ g}. Expression f may name x as a parameter in some of its site calls. Evaluation of the where expression proceeds as follows. Evaluate f and g in parallel. When g returns its first result, assign the result to x and terminate evaluation of g. During evaluation of f , any site call which does not name x as a parameter may proceed, but site calls in which x is a parameter are deferred until x acquires a value. The stream of values produced by f under this evaluation strategy is the stream produced by {f where x :∈ g}. 2.2.5 Expression call. An expression call is like a function call; the body of the expression is substituted at the point of the call after assigning the appropriate values to the formal parameters. Unlike a function call, an expression returns a stream of values, not just one value. The values returned are non-deterministic, because the sites it calls may be non-deterministic. 2.2.6 Constant terms. There are two constant terms in Orc: 0 and 1. Treat each as a site. Site 0 never responds and 1 responds immediately with the value of θ.
2.2.7 Defining Orc expressions. In Orc, an expression is defined by its name, a list of parameters which serve as its global variables, and an expression which serves as its body. For example,
BM(0) ∆ 0
BM(n + 1) ∆ S | R ≫ BM(n)
defines the name BM, and specifies its formal parameter and body. Calling BM(2), for instance, starts evaluation of a new instance of BM with actual parameter 2, which produces a stream of values. The definition of BM is well-grounded so that for every n, n ≥ 0, BM(n) calls a finite number of sites and returns a finite number of values. Orc has expression definitions which are not well-grounded to allow for infinite computations. For example, a call to E where
E ∆ S | R ≫ E
may cause an unbounded number of site calls (and produce an unbounded number of values). However, every call of E occurs inside some context x :∈ . . . E . . ., which assigns to x only the first value produced by the expression. Further computations, i.e., all later site calls by E and the values it returns, can be truncated. Thus all computations of interest depend only on finite recursion depth and a finite tree.
2.2.8 Starting and ending a computation. A computation is started from a host language program by executing an Orc statement z :∈ f ([actual-parameter]) where z is a variable of the host program and f is the name of an expression, followed by a list of actual parameters. All actual parameters have values before f ’s evaluation starts. To execute this statement, start the evaluation of f with actual parameters substituted for the formal ones, assign the first value produced to variable z, and then terminate the evaluation of f . If f produces no value, the execution of the statement does not terminate.
3. A Semantic Model
We develop a semantic model of Orc which is defined in a manner entirely independent of the behaviors of the sites and the meanings of the site calls. As a result, two expressions that are equal in this semantics behave exactly alike when executed in the same environment, whatever that may be. The model is denotational. The denotation of an expression is a tree whose edges are labelled with site calls, and whose nodes are labelled with declarations of local variables (i.e., their names and the associated trees) and a natural
number called size. Paths denote threads of execution. A path ending at a node of size n produces the value associated with the node n times as results of expression evaluation. Two expressions are equal if both are well-formed and their denotation trees are equal. Equal expressions are interchangeable in every context. In this section, we look at the equality problem; in the next section, we show how to depict executions using denotation trees.
Informal Description of the Equality Theory. It is customary to regard two expressions equal if one can be replaced by the other within any expression. This suggests that equal expressions f and g produce the same external effect (i.e., call the same sites in the same order) and the same internal effect (i.e., produce the same values). Therefore, (M | N ) and (N | M ) are equal. In evaluating these expressions, we make the same site calls in both cases and produce the same values, no matter which sites respond. If only M responds, say, both expressions will produce the same value, the response received from M. We create the tree from an expression assuming that every site responds. Thus, two expressions are equal if they have the same tree. This notion of equality, properly refined, is appropriate even when some sites may not respond during an execution, because then both expressions will behave identically. For example, consider ((M | N ) R) and (M R | N R). If N does not respond and M does during an evaluation, both expressions will produce the value from the sequence M R (provided R responds). Moreover, they would have made identical site calls, to M and N simultaneously and to R after M responds. A value produced by an expression is derived from the response of a site call. So for the equality of f and g, we need only establish that f and g have identical tree structures where corresponding nodes have the same size. To see the need for the latter requirement, consider (M 0) and M . They both make the same site call, to M , though only the latter produces a value. And in M (1 | 1), the value received from M is produced twice as the result of expression evaluation, whereas in M the value is produced only once. The size of a node (in these cases, the respective terminal nodes) denotes the number of times the corresponding value is produced.
3.1 The denotation tree
3.1.1 Structure of the denotation tree. The denotation of an expression is a tree. The tree has at least one node (its root), and it may be infinite. Each edge of the tree is labelled with a site call of the form M (L), where L is a list of parameters. Each node has a set of declarations, where a declaration consists of a variable name and a denotation tree. The set of declarations may
338 be empty; otherwise, the variable names in the declarations at a node are distinct. Each node has a size, a natural number1 . Size n specifies that during an execution the value associated with this node (i.e., received from the site call ending at this node) appears n times as the result of expression evaluation.
Bound Variable Renaming. Declarations at a node correspond to introduction of local variables. The reference to variable x in an edge label, say M(x), is bound to the x which is declared at the closest ancestor of this edge. We may rename variable x, which appears in a declaration at a node, by y provided y is not the name of any variable in that declaration. Then, we replace x by y for all occurrences of x bound to this declaration. Renaming does not change any property of the tree.
3.1.2 Operations on denotation trees. We define three operations on denotation trees: join (∪), graft (++) and declare (◦). To compute P ∪ Q for trees P and Q, create a tree where P and Q share the root. The declarations at the root are the union of the declarations at the roots of P and Q; ensure distinct names in the declarations by renaming variables in P or Q. The size at the root is the sum of the sizes of both roots. To compute P ++ Q, at each node u of P which has size n, n > 0, attach n copies of Q, as follows. First, join n copies of Q as described above; call the result Qⁿ. Let q be the root of Q and qⁿ of Qⁿ. Node qⁿ has n distinctly named variables for each variable declared at q and its size is m × n, where m is the size of q. Next: (1) set the declarations at u to the union of the declarations at u and qⁿ, (2) set the size of u to that of qⁿ (i.e., m × n), and (3) make all children of qⁿ children of u. Note that a node of P whose size is 0 is unaffected by graft. To compute (x, Q) ◦ P, add the declaration (x, Q) to the declarations at the root of P; rename x if necessary to avoid name clash. In (x, Q) ◦ P, tree Q is a subordinate of tree P. Two trees are equal if they are identical in all respects as unordered trees after possible renamings. Specifically, equal trees have a 1-1 correspondence between their nodes and their edges so that (1) corresponding nodes have the same declarations (i.e., same variables and equal associated trees) and same size, and (2) corresponding edges have the same label and they are incident on nodes which correspond.
Simple facts about join, graft and declare. In the following, P, Q and R are trees, and c and d are declarations.
¹ In general, the size is an ordinal; see section 3.3.
1 ∪ is commutative and associative. 2 + + is associative. 3 (P ∪ Q) + + R = (P + + R) ∪ (Q + + R) 4 d ◦ (P ++ Q) = (d ◦ P ) + +Q 5 d ◦ (P ∪ Q) = (d ◦ P ) ∪ Q 6 c ◦ (d ◦ P ) = d ◦ (c ◦ P ) The commutativity and associativity of ∪ follow from its definition. The associativity of is also easy to see pictorially, but we sketch a proof. A copy of R in (P + + Q) + + R is equal to any copy of R in P + + (Q ++ R), because there has been no graft on (i.e., attachments to the nodes of) R in either tree. Next, we show that any copy of Q in one tree is equal to any in the other. The copies are identical because in both cases R has been grafted to Q, and grafting R to identical trees results in identical trees. To complete the proof, first note that in (P ++ Q) + + R and P + + (Q + + R) there is exactly one copy of tree P . We show that copies of P in both trees are identical; i.e., corresponding nodes u and v in both trees have: (1) equal numbers of copies of Q and R attached to them; so, their declarations are identical, (2) equal sizes, and (3) identical edges of P incident on them. The proof of part (3) is trivial, because the edges of P are unaffected by graft. To prove (1) and (2), let the size of u (and v) in P be m, and the sizes of the roots of Q and R be n and r, respectively. The number of copies of Q attached to u in (P + + Q) + + R is m, because it is the same as in (P + + Q). The number of copies of Q attached to v in P ++ (Q + + R) is again m because the size of v in P is m. And the number of copies of R attached to either is m × n. The sizes of both u and v, in (P + + Q) + + R and P + + (Q ++ R) respectively, are m × n × r. Proof of (P ∪ Q) + + R = (P + + R) ∪ (Q + + R) is direct from the tree construction. Note that P + + (Q ∪ R) = (P ++ Q) ∪ (P + + R). To see this, let P , Q and R have single variable edges labelled M , N and R respectively. + (Q ∪ R) is given in figure 1(a) and for (P + + Q) ∪ (P ++ R) The tree for P + in figure 1(b) (in page 335); they are different. The proof of d ◦ (P ++ Q) = (d ◦ P ) + + Q follows from the tree structure; declaration d appears at the root of tree P in both cases. And d ◦ (P ∪ Q) = (d ◦ P ) ∪ Q because in both cases declarations at the root are equal; d is added to the set of declarations of (P ∪ Q). We have c ◦ (d ◦ P ) = d ◦ (c ◦ P ) because the declarations form a set and “◦” adds a declaration to the set.
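The three tree operations can be prototyped directly. The sketch below is a toy Python model (Node, join, graft and declare are invented names); it ignores bound-variable renaming and assumes finite sizes, so it illustrates the definitions rather than implementing the semantics.

    # A toy model of denotation trees and the join / graft / declare operations.
    from dataclasses import dataclass, field
    from copy import deepcopy

    @dataclass
    class Node:
        size: int = 0
        decls: dict = field(default_factory=dict)   # variable name -> subordinate Node
        edges: list = field(default_factory=list)   # list of (site_label, child Node)

    def join(p, q):
        """P ∪ Q: share the root, add sizes, union declarations and children."""
        p, q = deepcopy(p), deepcopy(q)
        return Node(size=p.size + q.size,
                    decls={**p.decls, **q.decls},
                    edges=p.edges + q.edges)

    def graft(p, q):
        """P ++ Q: at every node of P with size n > 0, attach n joined copies of Q."""
        p = deepcopy(p)
        def visit(u):
            for _, child in u.edges:
                visit(child)
            n = u.size
            if n > 0:
                qn = deepcopy(q)
                for _ in range(n - 1):
                    qn = join(qn, q)
                u.decls.update(qn.decls)
                u.size = qn.size                    # m * n, where m is the size of q's root
                u.edges.extend(qn.edges)
        visit(p)
        return p

    def declare(x, q, p):
        """(x, Q) ◦ P: add the declaration (x, Q) at the root of P."""
        p = deepcopy(p)
        p.decls[x] = deepcopy(q)
        return p

    def site(label):
        """tree(M): one edge labelled M, root size 0, leaf size 1."""
        return Node(size=0, edges=[(label, Node(size=1))])

    one, zero = Node(size=1), Node(size=0)           # tree(1) and tree(0)

    print(graft(site("M"), site("N")))               # tree(M >> N)
    print(join(site("M"), site("N")))                # tree(M | N)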
3.1.3 Denotations of expressions. Write tree(f ) for the denotation tree of expression f . For 0, the tree has a single node (the root) whose size is 0. For 1, the tree has a single node (the root) whose size is 1. For a site call
of the form M(L), the tree consists of a single edge labelled with the term, the root has size 0 and the terminal node size 1. There are no declarations in any of these cases. The rest of the tree-construction rules follow.
tree(f ≫ g) = tree(f) ++ tree(g)
tree(f | g) = tree(f) ∪ tree(g)
tree(f where x :∈ g) = (x, tree(g)) ◦ tree(f)
For expression F where F ∆ f: tree(F) is the least fixed point of the equation tree(F) = tree(f). We have described the operations ++, ∪ and ◦ earlier. In the definition of tree(f where x :∈ g), tree(g) is a subordinate tree of tree(f). We treat least fixed point in more detail in section 3.3.

Figure 2. Trees for S(x), {N(y) where y :∈ c}, and (S(x) where x :∈ {N(y) where y :∈ c}) (drawings omitted).
Example. We construct the denotation of
(S(x) where x :∈ {N(y) where y :∈ c}) ≫ ({(1 | N(x)) ≫ R(y) where x :∈ A ≫ 0} where y :∈ B)
The construction involves all three operations, ++, ∪ and ◦. In the figures, the size of a node is enclosed within square brackets. We show a declaration by drawing a dashed edge to the root of the subordinate tree and labeling the edge with the variable name. For S(x), {N(y) where y :∈ c} and (S(x) where x :∈ {N(y) where y :∈ c}) the trees are shown in figure 2. The trees for 1, N(x) and (1 | N(x)) are in figure 3. The trees for R(y) and (1 | N(x)) ≫ R(y) are in figure 4. The tree for {(1 | N(x)) ≫ R(y) where x :∈ A ≫ 0} is in figure 5. The tree for ({(1 | N(x)) ≫ R(y) where x :∈ A ≫ 0} where y :∈ B) is in figure 6. The tree for the whole expression is in figure 7.
Figure 3. Trees for 1, N(x) and (1 | N(x)) (drawings omitted).
Figure 4. Trees for R(y) and (1 | N(x)) ≫ R(y) (drawings omitted).
Figure 5. Tree for {(1 | N(x)) ≫ R(y) where x :∈ A ≫ 0} (drawing omitted).

3.2 Laws obeyed by Orc expressions
Well-formed expressions f and g are equal if their trees are identical. (See section 2.1 for definition of well-formed expressions.) f = g iff tree(f ) = tree(g) . Equal expressions are interchangeable in any context. We list a number of laws about Orc expressions. The laws in section 3.2.1 are also valid for regular expressions of language theory (which is a Kleene algebra[1]). Orc expressions without where clauses can be regarded as regular expressions. An Orc term corresponds to a symbol in a regular expression: 0 and 1 correspond to the empty set and the set that contains the empty string, and | and correspond to alternation and concatenation. There is no operator in Orc corresponding to ∗ of regular expressions, which we simulate using recursion. The where operator of Orc has no counterpart in language theory.
Figure 6. Tree for ({(1 | N(x)) ≫ R(y) where x :∈ A ≫ 0} where y :∈ B) (drawing omitted).
Figure 7. The tree for the whole expression (drawing omitted).
3.2.1 Kleene laws. All Orc expressions, including where expressions, obey the laws given in this section. Below f, g and h are Orc expressions. In all the identities, one side is well-formed iff the other side is.

(Zero and | )                          f | 0 = f
(Commutativity of | )                  f | g = g | f
(Associativity of | )                  (f | g) | h = f | (g | h)
(Left zero of ≫ )                      0 ≫ f = 0
(Left unit of ≫ )                      1 ≫ f = f
(Right unit of ≫ )                     f ≫ 1 = f
(Associativity of ≫ )                  (f ≫ g) ≫ h = f ≫ (g ≫ h)
(Right distributivity of ≫ over | )    (f | g) ≫ h = (f ≫ h | g ≫ h)

(Zero and | ) (f | 0) = f: Join with 0 does not affect the size or the declarations at the root of f.
(Commutativity of | ) f | g = g | f: From the commutativity of join.
(Associativity of | ) (f | g) | h = f | (g | h): From the associativity of join.
(Left zero of ≫ ) 0 ≫ f = 0: The tree for 0 has only a node of size zero; so, grafting has no effect.
(Left unit of ≫ ) 1 ≫ f = f: The tree for 1 has only a root node of size one; so, grafting f produces f.
(Right unit of ≫ ) f ≫ 1 = f: Similar arguments as above.
(Associativity of ≫ ) (f ≫ g) ≫ h = f ≫ (g ≫ h): Operation graft is associative.
(Right distributivity of ≫ over | ) (f | g) ≫ h = (f ≫ h | g ≫ h): From the right distributivity of graft over join.
Some of the axioms of Kleene algebra do not hold in Orc. First is the idempotence of | , f | f = f. Expressions M | M and M are different because the corresponding trees have different sizes at the terminal nodes. In Kleene algebra, 0 is both a right and a left zero. In Orc, it is only a left zero; that is, f ≫ 0 = 0 does not hold: expression (M ≫ 0) differs from 0, because the corresponding trees are different. Another axiom of Kleene algebra is the left distributivity of ≫ over | : f ≫ (g | h) = (f ≫ g) | (f ≫ h). This does not hold in Orc because, as we have shown, ++ does not left distribute over ∪.
3.2.2 Laws for where expressions. The following laws for where expressions have no counterpart in Kleene algebra.

(Distributivity over ≫ )     {f ≫ g where x :∈ h} = {f where x :∈ h} ≫ g
(Distributivity over | )     {f | g where x :∈ h} = {f where x :∈ h} | g
(Distributivity over where ) {{f where x :∈ g} where y :∈ h} = {{f where y :∈ h} where x :∈ g}
On the need for well-formedness. For the laws given above, both sides of an identity must be checked syntactically for well-formedness. Unlike the laws in section 3.2.1, both sides may not be well-formed if only one side is. Consider
p = (M | N(x) where x :∈ g)
We show below that tree(p) is identical to both tree(q) and tree(r), where
q = (M where x :∈ g) | N(x)
r = (N(x) where x :∈ g) | M
Expression r is well-formed though q is not, because x in the term N(x) is not bound to any variable. So, p ≠ q though p = r.
Proofs. Use the following abbreviations.
P = tree(f), Q = tree(g), R = tree(h)
Q′ = (x, tree(g)), R′ = (x, tree(h)), R′′ = (y, tree(h))
The required proofs are
R′ ◦ (P ++ Q) = (R′ ◦ P) ++ Q
R′ ◦ (P ∪ Q) = (R′ ◦ P) ∪ Q
R′′ ◦ (Q′ ◦ P) = Q′ ◦ (R′′ ◦ P)
These results follow directly from the properties of declaration (◦); see section 3.1.2 (page 338).
3.3 Least fixed point
To construct the tree for an expression call we need to solve an equation. To handle expression calls with parameters, say, F(s) where F ∆ (λr.f), find the least fixed point of tree(F(s)) = tree((λr.f)s). To do this, replace all formal parameters by the actual parameter values and then construct the least denotation tree. As an example, consider the definition
BM(0) ∆ 0
BM(n + 1) ∆ S | R ≫ BM(n)
To construct tree(BM(2)), say, we solve the following equations, which results in the tree shown in figure 8.
tree(BM(2)) = tree(S | R ≫ BM(1))
tree(BM(1)) = tree(S | R ≫ BM(0))
tree(BM(0)) = tree(0)

Figure 8. Solution of the equations for BM(2) (drawing omitted).
The solution is more involved for the following definition where the number of equations is infinite.
E ∆ S | R ≫ E
The resulting tree is the least fixed point of this equation. It is obtained by a chain of approximations for E: start with the approximation 0 for E, and substitute each approximation for E into the equation to obtain a better approximation. We claim (though we do not prove in this paper) that the limit of this chain of approximations is the least fixed point of the given equation. The first few approximations are shown in figure 9. The sequence is the same as BM(0), BM(1), . . ., shown above.
Figure 9. Approximations for the least fixed point of E ∆ S | R ≫ E (drawings omitted).
The reader may show that both (E = E) and (E = E ≫ f) have 0 as their least fixed points. The least fixed point of (E = M ≫ E) is the tree which is a single infinite chain of edges labelled M in which every node has size zero. And for (E = 1 | E ≫ M), the least fixed point is a denotation of the infinite expression 1 | M | M ≫ M | M ≫ M ≫ M | . . . (which is not a valid Orc expression) whose terminals have size one.
The need for ordinal as size. Theoretically, we need ordinals to represent sizes in denotation trees. Consider the equation (E = 1 | E). Its tree has a single node (root) with size ω. And E ≫ E has a tree whose root has size ω². Similarly, E = ({M(x) where x :∈ g} | E) has an infinite number of declarations at its root. Such expressions are rare in practice, and they are unimplementable because their executions have to create an infinite number of threads simultaneously. The least fixed point of (E = M | E) is (M | M | · · ·) which is (!M) of Pi-calculus [2]. In this case, the tree has infinite degree at the root, but each terminal node has size one.
4. Healthiness conditions for executions
In this section, we augment the denotation tree with additional information to record the steps of an execution. An execution is a history of site calls (the actual parameter values passed to the sites and the times of the calls), the responses received from the sites (the values received and the times of receipt), and the assignments of values to the local variables. We record these steps by attaching a state to each node of the tree, as we explain below. A node u in the tree has an associated state u.state. A state is an assignment of values to variables; we write u.x for the value of x in u.state, provided x is
346 defined in that state. A value is a tuple (magnitude, time), where magnitude holds the actual value of the variable and time denotes the time at which the magnitude is computed; write u.x.time for the time component of x in u.x. The times associated with different variables in a state may be different, as we explain in section 4.4. The time at which node u is reached in a computation is u.θ.time.
Notation. For node u:
u.state:    the state of u
u.def:      the set of variables (including θ) defined in u.state
u.decl:     the set of variables declared at u
u.null:     is true iff u.def = {}
u.x:        the value of x in u.state
u.x.time:   the time component of x in u.x
For any execution, the states associated with the nodes satisfy certain healthiness conditions, which we specify in this section. Conversely, any set of states which satisfy the healthiness conditions is a possible execution. We give the healthiness conditions in three parts: (1) edge conditions, which specify for each edge (u, v) the relationship between u.state and v.state, (2) root conditions, which specify the states at the roots of the goal tree and all subordinate trees, and (3) the conditions for subordinate computations, which give the semantics of where expressions.
Convention. Assume that a tree and all its subordinates have been renamed so that a variable is declared in at most one node.
4.1 Edge conditions
For edge (u, v), whose label is M(L), v.state specifies the effect of site call M(L) in u.state.

¬v.null ⇒ ( ¬u.null
          ∧ θ ∈ v.def
          ∧ u.def = v.def − v.decl
          ∧ (∀x : x ∈ u.def ∧ x ≠ θ : u.x = v.x)
          ∧ (∀y : y ∈ L : y ∈ u.def ∧ v.θ.time ≥ u.y.time)
          ∧ v.θ.time ≥ u.θ.time )                              (edge condition)
We study the conjuncts in turn. The state of v is non-null only if u’s state is non-null (so a null state propagates down the tree). In every non-null state θ is defined (we will require in section 4.2 that θ be defined at each root). All variables defined in u.state are also defined in v.state; the additional variables
defined in v.state are declared at v. Values of the variables in u.state (other than θ) are the same in v.state. The site call is made only if all parameters of the call are defined in u.state. Next, we justify the conditions on v.θ.time in the last two conjuncts. The site call is made no earlier than u.y.time for any parameter y, for y ∈ L, because the value of y is not available before u.y.time. And the call is made no earlier than u.θ.time. The time of the corresponding response is v.θ.time which is at least the time of the call.
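Read as a predicate over the two attached states, the edge condition can be checked mechanically. The sketch below is only illustrative: it assumes a simple map-based representation of node states, and the classes Val and NodeState, as well as the string "theta" standing in for θ, are our own names rather than anything defined in the paper.

import java.util.*;

// Illustrative value and node-state representations; names are ours, not the paper's.
record Val(Object magnitude, long time) {}

class NodeState {
    final Map<String, Val> state = new HashMap<>(); // u.state, with "theta" standing in for θ
    final Set<String> decl = new HashSet<>();       // variables declared at this node
    Set<String> def() { return state.keySet(); }    // u.def
    boolean isNull() { return state.isEmpty(); }    // u.null
}

class EdgeCondition {
    // Checks the edge condition for an edge (u, v) labelled M(L), with L the actual parameters.
    static boolean holds(NodeState u, NodeState v, List<String> L) {
        if (v.isNull()) return true;                         // the condition is guarded by ¬v.null
        if (u.isNull()) return false;                        // ¬u.null
        Val vTheta = v.state.get("theta");
        if (vTheta == null) return false;                    // θ ∈ v.def
        Val uTheta = u.state.get("theta");                   // defensive: θ is defined in every non-null state
        if (uTheta == null) return false;
        Set<String> expected = new HashSet<>(v.def());
        expected.removeAll(v.decl);
        if (!u.def().equals(expected)) return false;         // u.def = v.def − v.decl
        for (String x : u.def())                             // variables other than θ keep their values
            if (!x.equals("theta") && !u.state.get(x).equals(v.state.get(x))) return false;
        for (String y : L) {                                 // parameters defined; call not before they are
            Val uy = u.state.get(y);
            if (uy == null || vTheta.time() < uy.time()) return false;
        }
        return vTheta.time() >= uTheta.time();               // response no earlier than u.θ.time
    }
}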
4.2 Root conditions
We specify the conditions on r.state where r is a root node of the goal or a subordinate tree. The only condition for the goal tree is

  r.def = {θ} and r.θ.time ≥ 0     (goal root condition)
The first part says that when an expression is called from a main (host language) program, all its formal parameters are replaced by actual parameter values; so they are not part of the root state. Only θ is defined and its associated time is non-negative. For subordinate trees, consider node u of tree P which has a declaration (x, Q), so Q is subordinate to P. For root q of Q,

  q.state = u.state     (subordinate root condition)
That is, the computations of q and u start simultaneously in the same state. The condition is surprising and, apparently, circular. This is because x may be defined in u.state, but it is certainly not available to any node in Q; the purpose of Q is to compute the value of x. Here, we exploit the fact that Q represents the denotation of a well-formed expression. Therefore, no edge in Q accesses q.x. The presence or absence of x in q.state (and in the states of all its descendants) is immaterial. Similar remarks apply for multiple declarations at node u; all variables in u.def appear in q.def , only some of which would be accessed by a well-formed expression.
4.3 Subordinate tree conditions
Let P be a denotation tree in which node u has a declaration (x, Q); therefore, Q is subordinate to P. The first healthiness condition states that if x is defined (at u), then there is a node of Q of positive size whose θ value is the same as that of x. Conversely, x is defined only if there is some such node of Q.
Subordinate assignment condition. (x ∈ u.def ) ≡ (∃q : q ∈ Q, size of q is positive, ¬q.null : q.θ = u.x)
The next condition states that all computations in Q and its subordinates cease once x is assigned a value. Define predicate cease(R, t), where R is a tree and t a time, which holds iff (1) r.θ.time at any node r of R is at most t, and (2) the condition applies recursively to all subordinate trees of R.
Subordinate termination condition.

  cease(R, t) ≜ (∀r : r ∈ R : r.θ.time ≤ t) ∧ (∀S : S subordinate of R : cease(S, t))

  (x ∈ u.def) ⇒ cease(Q, u.x.time)

The subordinate tree conditions apply to the goal tree as well; its computation may assign the θ value of a node of positive size to the goal variable, and then cease.
A small identity. Call expression g silent if all nodes in its tree have size zero. That is, g = g ≫ 0. We use the healthiness conditions to argue that (f where x :∈ g) has the same execution as (f | g) if g is silent (note that f | g may not be well-formed if f references x). In (f where x :∈ g) variable x is declared at the root u of tree(f). From the first condition for subordinate trees, x ∉ u.def. Therefore, x plays no role in the computation. The condition for ceasing the computation holds vacuously for tree(g). So, the execution of (f where x :∈ g) is identical to starting f and g simultaneously without any constraints on termination (i.e., as (f | g)).
4.4 Discussion
We show that variables defined in a state may have different associated times, which may also be different from the time associated with θ. To see this, consider (f where x :∈ g). In the operational semantics, the computations of f and g start in the same state, say s. If g returns a value, x is assigned the value and portions of f which are waiting for x may be resumed. This operational notion is captured within our semantics by creating a state s′, which is s augmented with the (magnitude, time) of x, and starting both f and g in s′. We consider the ramifications of this rule for both f and g. We have explained in section 4.2 (under subordinate root condition) that the apparent circularity of using s′ in place of s causes no semantic difficulty for g. A well-formed g does not access x; therefore, state assignments to its nodes are the same with s and s′ except that in the latter case each state includes x as a defined variable. We argue that s′ is the appropriate state for starting the computation of f, as well. Consider, for example, f = (M | N(x)). The evaluation of M proceeds without waiting for x, whereas N(x) has to wait for x. Therefore, M starts in
state s and N(x) in state s′. So, it may seem that the root of tree(f) should have two associated states, s and s′. Fortunately, it is sufficient to associate just s′ with the root, because since M does not access x its execution is identical in both s and s′, much like the way we argued for g in the previous paragraph. This argument applies for arbitrary f, as well.
5. Conclusion
The most serious omission from the semantics presented here has been a treatment of site calls and their interactions with the Orc program. A semantics for sites should ideally enable each independent site to be treated separately so that its combination with an Orc program yields the semantics of a more restricted Orc program: interactions with the site are entirely hidden, so that the result can be simply interfaced with the remaining sites. A second extension to the semantics presented could give more step-by-step guidance to the implementer on how to execute programs without risking deadlock or making unnecessary site calls. A third extension would provide more guidance to the programmer on how to establish correctness of programs and of sites. It is particularly important to help the designer of a site to discharge responsibility for nullifying the effect of all site calls except those involved in computing the final result delivered by the Orc program.
Related work. The proposed semantics is greatly influenced by denotations of regular expressions (i.e., Kleene algebra [1]). A regular expression is denoted by a tree labelled by symbols on its edges. Each path in the tree represents a string (the sequence of symbols along its edges) and two trees are identical if they have the same set of paths. Therefore, a regular expression can be denoted by a set of strings only. Orc expressions are also denoted by trees, though there are several differences: (1) left distributivity of ≫ over |, i.e., f ≫ (g | h) = (f ≫ g) | (f ≫ h), does not hold in Orc; so, a tree cannot be represented by the labels on its paths only (see Figure 1 on page 335), (2) the where clause has no counterpart in Kleene algebra; its introduction requires us to attach subordinate trees to the nodes of a tree, and (3) lack of idempotence in Orc forces us to distinguish between 1 and (1 | 1), say, by associating a size with each node.
References
[1] Dexter Kozen. On Kleene algebras and closed semirings. In Proceedings, Math. Found. of Comput. Sci., volume 452 of Lecture Notes in Computer Science, pages 26–47. Springer-Verlag, 1990.
[2] Robin Milner. Communicating and Mobile Systems: the π-Calculus. Cambridge University Press, May 1999.
[3] Jayadev Misra. Computation orchestration: A basis for wide-area computing. In Manfred Broy, editor, Proc. of the NATO Advanced Study Institute, Engineering Theories of Software Intensive Systems, NATO ASI Series, Marktoberdorf, Germany, 2004.
Part IV Security, System Development and Special Aspects
MODEL DRIVEN SECURITY
David Basin,¹ Jürgen Doser,¹ and Torsten Lodderstedt²
¹ ETH Zürich, Switzerland∗
² Interactive Objects Software GmbH, Germany
∗ This work has been partially supported by the Swiss “Federal Office for Education and Science” in the context of the EU-funded Integrated Project TrustCoM (IST-2002-2.3.1.9 Contract-No. 1945). The authors are responsible for the content of this publication.
Abstract
We present a new approach to building secure systems. In our approach, which we call Model Driven Security, designers specify system models along with their security requirements and use tools to automatically generate system architectures from the models, including complete, configured security infrastructures. Rather than fixing one particular modeling language for this process, we propose a general schema for constructing such languages that combines languages for modeling systems with languages for modeling security. We present several instances of this schema that combine (both syntactically and semantically) different UML modeling languages with a security modeling language for formalizing access control requirements. From models in the combined languages, we automatically generate security architectures for distributed applications, built from declarative and programmatic access control mechanisms. We have implemented this approach and report on a case-study with the resulting tool.

1. Introduction
Model building is standard practice in software engineering. The construction of models during requirements analysis and system design can improve the quality of the resulting systems by providing a foundation for early analysis and fault detection. The models also serve as specifications for the later development phases and, when the models are sufficiently formal, they can provide a basis for refinement down to code. Model building is also carried out in security modeling and policy specification. However, its integration into the overall development process is problematic and suffers from two gaps. First, security models and system design models are typically disjoint and expressed in different ways (e.g., security models as structured text versus graphical design models in languages like UML).
In general, the integration of system design models with security models is poorly understood and inadequately supported by modern software development processes and tools. Second, although security requirements and threats are often considered during the early development phases (requirements analysis), and security mechanisms are later employed in the final development phases (system integration and test), there is a gap in the middle. As a result, security is typically integrated into systems in a post-hoc manner, which degrades the security and maintainability of the resulting systems. In this paper, we take up the challenge of providing languages, methods, and tools for bridging these gaps. Our starting point is the concept of Model Driven Architecture (MDA) [Frankel, 2003], which has been proposed as a model-centric and generative approach to software development. Conceptually, the MDA approach has three parts: (1) developers create system models in high-level modeling languages like UML; (2) tools are used to perform automatic model transformation; and the result is (3) a target (system) architecture. Whereas the generation of simple kinds of code skeletons by CASE-tools is now standard (e.g., generating class hierarchies from class diagrams), Model Driven Architecture is more ambitious and aims at generating nontrivial kinds of system infrastructure from models. Our main contribution is to show how the Model Driven Architecture approach can be specialized to what we call Model Driven Security by extending its three parts: system design models are extended with security requirements and model transformation is extended to generate security infrastructure for the target system. The most difficult part of this specialization is the first, concerning the models themselves, and here we propose a general schema for combining languages for security modeling with those for design modeling. Our schema provides a recipe for language combination at the level of both syntax and semantics, for example providing sufficient conditions for the combination to be semantically well-defined. The main idea is to define security modeling languages that are general in that they leave open the nature of the protected resources, i.e., whether these resources are data, business objects, processes, states in a controller, etc. Such a security modeling language can then be combined with a system design modeling language by defining a dialect, which identifies elements of the design language as the protected resources of the security language. In this way, we can define families of languages that flexibly combine design modeling languages and security modeling languages, and are capable of formulating system designs along with their security requirements. To show the feasibility of this approach and to illustrate some of the design issues, we present several detailed examples. First, we specify a security modeling language for modeling access control requirements that generalizes Role-Based Access Control (RBAC) [Ferraiolo et al., 2001]. To support visual modeling, we embed this language within an extension of UML and hence
we call the result SecureUML. Afterwards, we give two examples of design modeling languages, one based on class diagrams and the other based on statecharts. We then combine each of these with SecureUML by defining dialects that identify particular elements of each design modeling language as protected SecureUML resources. In each case, we define model transformations for the combined modeling language by augmenting model transformations for the UML-based modeling languages with the additional functionality necessary for translating our security modeling constructs. The first dialect provides a language for modeling access control in a distributed object setting and we define a transformation function that produces security infrastructures for distributed systems conforming to the Enterprise JavaBeans (EJB) standard. The second dialect provides a language for modeling security requirements for controllers for multi-tier architectures and the transformation function generates access control infrastructures for web applications. As a proof of concept, within the MDA-tool ArcStyler [Hubert, 2001] we have built a prototypical generator that implements the above mentioned transformation functions for both dialects. We report on this, as well as on experience with our approach. Overall, we view the result as a large step towards integrating security engineering into a model-driven software development process. This bridges the gap between security analysis and the integration of security mechanisms into end systems. Moreover, it integrates security models with system design models and thus yields a new kind of model, security design models.
2. Background
We first introduce a design problem along with its security requirements that will serve as a running example throughout this paper. Afterwards, we introduce the modeling and technological foundations that we build upon: the Unified Modeling Language, Model Driven Architecture, Role-based Access Control, and several security architectures.
2.1 A design problem
As a running example, we will consider developing a simplified version of a system for administrating meetings. The system should maintain a list of users (we will ignore issues such as user administration) and records of meetings. A meeting has an owner, a list of participants, a time, and a place. Users may carry out standard operations on meetings such as creating, reading, editing, and deleting them. A user may also cancel a meeting, which deletes the meeting and notifies all participants by email.
Figure 1. Scheduler Application Class Diagram
As the paper proceeds, we will see how to formalize a design model for this system along with the following security policy.
1. All users can create new meetings and read all meeting entries.
2. Only the owner of a meeting may change meeting data and cancel or delete the meeting.
3. A supervisor can cancel any meeting.
2.2 The Unified Modeling Language
The Unified Modeling Language (UML) [Rumbaugh et al., 1998] is a widely used graphical language for modeling object-oriented systems. The language specification differentiates between abstract syntax and notation (also called concrete syntax). The abstract syntax defines the language primitives used to build models, whereas the notation defines the graphical representation of these primitives as icons, strings, or figures. UML supports the description of the structure and behavior of systems using different model element types and corresponding diagram types. In this paper, we focus on the model element types comprising class and statechart diagrams. The structural aspects of systems are defined using classes, e.g., as in Figure 1, which models the structure of our scheduling application. This model consists of three classes: Meeting, Person, and Room. A Meeting has attributes for storing the start date and the planned duration. The participants and the location of the meeting are specified using the association ends participants and location. The method notify notifies the participants of changes to the schedule. The method cancel cancels the meeting, which includes notifying the participants and canceling the room reservation. In contrast, state machines describe the behavior of a system or a class in terms of states and events that cause a transition between states. Figure 2 shows the statechart diagram for our scheduling application. In the state ListMeetings, a user can browse the scheduled meetings and can initiate (e.g., by clicking a button in a graphical user interface) the editing, creation, deletion, and cancellation of meetings.
Figure 2. Scheduler Application Statechart
An event of type edit causes a transition to the state EditMeeting, where the currently selected meeting (stored in ListMeetings) is edited. An event of type create causes a transition to the state CreateMeeting, where a new meeting is created from data entered by the user. An event of type delete in state ListMeetings triggers a transition that executes the action deleteMeeting, where the currently selected meeting is deleted from the database. Similarly, an event of type cancel causes the execution of cancelMeeting, which calls the method cancel on the selected meeting. UML also provides a specification language called OCL, the Object Constraint Language. OCL expressions are used to formalize invariants for classes, preconditions and postconditions for methods, and guards for enabling transitions in a state machine. As an example, we can add to the class Meeting in Figure 1 the following OCL constraint, stating that the participants of a meeting must always include the meeting’s owner.

context Meeting inv: self.participants->includes(self.owner)
2.3 Model Driven Architecture
Model Driven Architecture (MDA) has been proposed as an approach to specifying and developing applications where systems are represented as models and transformation functions are used to map between models as well as to automatically generate executable code [Frankel, 2003]. Of course, the fully automatic synthesis of complex systems from high-level descriptions is unobtainable in its full generality. We cannot, in general, automatically generate the functions implementing a specification of a system’s functional behavior, i.e., its “business logic”. But what is possible is to automate the generation of platform-specific support for different kinds of non-functional system concerns, such as support for persistence, logging, and the like, i.e., system aspects, in the aspect-oriented programming sense [Kiczales et al., 1997], that
cut across different system components. Our work shows that security, in particular access control, is one such aspect that can be automatically generated and that this brings with it many advantages. The starting point of MDA is the use of domain-specific languages to formalize models for different application domains or system aspects. In our work, we define modeling languages by directly formalizing their metamodels. As a metalanguage, we use the Meta-Object Facility (MOF), which is essentially a subset of UML that is well-suited for formalizing metamodels using standard object-oriented concepts like class and inheritance. MOF provides a more expressive formalism for defining modeling languages than other alternatives, e.g., the use of UML profiles or conventional definition techniques like the Backus-Naur Form (BNF). For example, in MOF, we can directly formalize relations between model primitives, which is one of the key ideas we use when combining modeling languages (see, for example, the discussion on subtyping in Section 5.1). MOF also offers advantages for building MDA tools. There is tool support for automatically creating repositories and maintaining metadata based on MOF, e.g. [Akehurst and Kent, 2002]. Moreover, by separating the abstract syntax of languages from their UML-based concrete syntax (defined by UML profiles), we can concisely define modeling languages and directly use UML CASE-tools for building models.
2.4 RBAC
Mathematically, access control expresses a relation AC between a set of Users and a set of Permissions: AC ⊆ Users × Permissions . User u is granted permission p if and only if (u, p) ∈ AC. Aside from the technical question of how to integrate this relation into systems so that granting permissions respects this relation, a major challenge concerns how to effectively represent this information since directly storing all the (u, p) pairs scales poorly. Moreover, this view is rather “flat” and does not support natural abstractions like sets of permissions. Role-Based Access Control, or RBAC, addresses both of the above limitations. The core idea of RBAC is to introduce a set of roles and to decompose the relation AC into two relations: user assignment UA and permission assignment PA, i.e., UA ⊆ Users × Roles,
PA ⊆ Roles × Permissions .
The access control relation is then simply the composition of these relations: AC = PA ◦ UA .
To further reduce the size of these relations and support additional abstraction, RBAC also has a notion of hierarchy on roles. Mathematically, this is a partial order ≥ on the set of roles, with the meaning that larger roles inherit permissions from all smaller roles. Formally, this means that the access control relation is now given by the equation AC = PA ◦ ≥ ◦ UA , where the role hierarchy relation ≥ is also part of the composition. To express the same access control relation without a role hierarchy, one must, for example, assign each user additional roles, i.e., a user is then not just assigned his original roles, but also all smaller roles. The introduction of a hierarchy, like the decomposition of relations, leads to a more expressive formalism in the sense that one can express access control relations more concisely. Role hierarchies also simplify the administration of access control since they provide a convenient and intuitive abstraction that can correspond to the actual organizational structure of companies. We have chosen RBAC as a foundation of our security modeling language because it is well-established and it is supported by many existing technology platforms, which simplifies the subsequent definition of the transformation functions. However, RBAC also has limitations. For example, it is difficult to formalize access control policies that depend on dynamic aspects of the system, like the date or the values of system or method parameters. We have extended RBAC with authorization constraints to overcome this limitation. Furthermore, although many technologies support RBAC, they differ in details, like the degree of support for role-hierarchies and the types of protected resources. As we will see later, our approach of generating architectures from models provides a means to overcome such limitations and differences in technologies.
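For concreteness, the composition AC = PA ◦ ≥ ◦ UA can be evaluated directly over explicit sets. The following sketch is illustrative only: the class, role, and permission names are made up, and a real system would rely on the platform mechanisms discussed in Section 2.5. It computes the reflexive-transitive closure of the role hierarchy and then checks the composed relation.

import java.util.*;

// A minimal, illustrative encoding of the RBAC relations UA, PA and the role hierarchy ≥.
final class Rbac {
    // UA ⊆ Users × Roles and PA ⊆ Roles × Permissions, stored as adjacency maps
    private final Map<String, Set<String>> userAssignment = new HashMap<>();
    private final Map<String, Set<String>> permissionAssignment = new HashMap<>();
    // role hierarchy: senior role -> the smaller roles it inherits from
    private final Map<String, Set<String>> juniors = new HashMap<>();

    void assignUser(String user, String role) {
        userAssignment.computeIfAbsent(user, k -> new HashSet<>()).add(role);
    }
    void grant(String role, String permission) {
        permissionAssignment.computeIfAbsent(role, k -> new HashSet<>()).add(permission);
    }
    void inherits(String senior, String junior) {
        juniors.computeIfAbsent(senior, k -> new HashSet<>()).add(junior);
    }

    // AC = PA ∘ ≥ ∘ UA: user u has permission p iff some assigned role,
    // or a role below it in the hierarchy, is granted p.
    boolean hasPermission(String user, String permission) {
        for (String role : userAssignment.getOrDefault(user, Set.of()))
            for (String r : closure(role))
                if (permissionAssignment.getOrDefault(r, Set.of()).contains(permission))
                    return true;
        return false;
    }

    private Set<String> closure(String role) {   // reflexive-transitive closure of ≥
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(role));
        while (!todo.isEmpty()) {
            String r = todo.pop();
            if (seen.add(r)) todo.addAll(juniors.getOrDefault(r, Set.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        Rbac rbac = new Rbac();
        rbac.grant("User", "Meeting.read");          // hypothetical permission names
        rbac.grant("Supervisor", "Meeting.cancel");
        rbac.inherits("Supervisor", "User");         // Supervisor ≥ User
        rbac.assignUser("Alice", "Supervisor");
        System.out.println(rbac.hasPermission("Alice", "Meeting.read")); // true, via the hierarchy
    }
}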
2.5 Security architectures
We make use of two different security architectures in this paper. We provide an overview of them here, focusing on their support for access control.
Enterprise JavaBeans. Enterprise JavaBeans (EJBs) is a component architecture standard [Monson-Haefel, 2001] for developing server-side components in Java. These components usually form the business logic of multi-tier applications and run on application servers. The standard specifies infrastructures for system-level aspects such as transactions, persistence, and security. To use these, an EJB developer declares properties for these aspects, which are managed by the application server. This configuration information is stored in deployment descriptors, which are XML documents that are installed together with the components.
The access control model of EJB is based on RBAC, where the protected resources are the methods accessed using the interfaces of an EJB. This provides a mechanism for declarative access control where the access control policy is configured in the deployment descriptors of an EJB component. The security subsystem of the EJB application server is then responsible for enforcing this policy on behalf of the components. The following example shows a permission definition that authorizes the role Supervisor to execute the method cancel on the component Meeting.
<method-permission>
  <role-name>Supervisor</role-name>
  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Remote</method-intf>
    <method-name>cancel</method-name>
    <method-params/>
  </method>
</method-permission>
As this example illustrates, permissions are defined at the level of individual methods. A method-permission element lists one or more roles using elements of type role-name and one or more EJB methods using elements of type method. An EJB method is identified by the name of its component (ejb-name), the interface it implements (method-intf), and the method signature (method-name and method-params). The listed roles are granted the right to execute the listed methods. EJB offers the additional possibility of enforcing access control within the business logic of components. This mechanism is called programmatic access control and is based on inserting Java assertions in the methods of the bean class. To support this, EJB provides interfaces for retrieving security relevant data of a caller, like his name or roles.
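As a hedged illustration of such programmatic checks, the fragment below uses the standard EJB 2.x-era API (javax.ejb.SessionBean, SessionContext.isCallerInRole, and getCallerPrincipal). The MeetingBean class, its owner field, and the ownership rule shown are hypothetical stand-ins for the scheduling example, not code from the paper, and a real component would typically be an entity bean with further callbacks.

import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

// Hypothetical, simplified bean implementation class for the scheduling example.
public class MeetingBean implements SessionBean {
    private SessionContext ctx;
    private String owner;   // name of the meeting's owner (illustrative)

    public void setSessionContext(SessionContext ctx) { this.ctx = ctx; }

    public void cancel() {
        // Programmatic check: supervisors may cancel any meeting,
        // other callers only meetings they own.
        boolean isSupervisor = ctx.isCallerInRole("Supervisor");
        boolean isOwner = ctx.getCallerPrincipal().getName().equals(owner);
        if (!(isSupervisor || isOwner)) {
            throw new SecurityException("caller may not cancel this meeting");
        }
        // ... notify participants and release the room reservation ...
    }

    // remaining SessionBean callbacks, left empty for brevity
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
}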
Java Servlets. The Java Servlet Specification [Hunter, 2001] specifies an execution environment for web components, called servlets. A servlet is basically a Java class running in a web server that processes HTTP requests and creates HTTP responses. Servlets can be used to dynamically create HTML pages or to control the processing of requests in large web applications. The execution environment, called the servlet container, supports both declarative and programmatic access control. For declarative access control, permissions are defined at the level of uniform resource locators (URLs) in XML deployment descriptors. Programmatic access control is used to determine the identity and the roles of a caller and to implement decisions within a servlet.
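A corresponding programmatic check on the web tier might look as follows. The servlet class, its URL mapping, and the response text are invented for illustration; only the standard HttpServletRequest methods getUserPrincipal and isUserInRole are assumed.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical controller servlet for the scheduling example.
public class CancelMeetingServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Programmatic access control: only supervisors may reach this action;
        // the container has already authenticated the caller.
        if (req.getUserPrincipal() == null || !req.isUserInRole("Supervisor")) {
            resp.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        // ... delegate to the business tier to cancel the selected meeting ...
        resp.getWriter().println("meeting cancelled");
    }
}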
Figure 3. Security Design Language Schema

3. Model Driven Security: an Overview
At the heart of Model Driven Security are security design models, which combine security and design requirements. Rather than presenting one particular modeling language for constructing these models, we propose a schema for building such languages in a modular way. The overall form of our schema is depicted in Figure 3. The schema is parameterized by three languages:
1. a security modeling language for expressing security policies;
2. a system design modeling language for constructing design models; and
3. a dialect, which provides a bridge by defining the connection points for integrating (1) with (2), e.g., model elements of (2) are classified as protected resources of (1).
This schema defines a family of security design languages. By different instantiations of the three parameters, we can build different languages, tailored for expressing different kinds of designs and security policies. To automate our approach to Model Driven Security, for each schema instance, we define transformation functions that map models to security infrastructures. This must be done on a case-by-case basis, but, like with compilers, the implementation is just a one-time cost and the result is a general tool. Below we discuss these aspects in more detail. However, due to space limitations, we will focus on one particular security modeling language, which we call SecureUML, that is based on an extension of Role-Based Access Control. We will present this language in detail, emphasizing the general metamodeling ideas behind it. We will later present two different system design modeling languages and different dialects.
3.1 Security modeling languages
A security modeling language is a formal language in that it has a welldefined syntax and semantics. As we intend these languages to be used for
creating intuitive, readable models (e.g., visual models, like in UML), they will also be employed with a notation. To distinguish these two kinds of syntax, and following UML (cf. Section 2.2), we call the underlying syntax the abstract syntax and the notation the concrete syntax. In general, the abstract syntax is defined formally, e.g., by a grammar, whereas the notation is defined informally. The translation between notation and abstract syntax is generally straightforward; we give examples in Section 4.2. Designing modeling languages is a creative and nontrivial task, in particular when it comes to their semantics and developing (semantics-preserving) transformation functions. However, it is not our expectation that each application developer must also be a language designer. This task will be done once and for all for a large class of applications by security and system architects. We will use SecureUML to illustrate that it is possible to design security modeling languages that are general, usable with different design modeling languages, and applicable to a wide scope of problems. The definition of a language’s abstract syntax will be based on MOF and the concrete syntax will be defined by a UML profile. In Section 4 we explain this in detail as well as the semantics of SecureUML and language combination. Note that the abstract syntax and semantics of SecureUML define a modeling language for access control policies that is independent of UML and which could be combined with design modeling languages different from those of UML. However, we do make a commitment to UML when defining notation, and our use of a UML profile to define a UML notation motivates the name SecureUML.
3.2 System design languages and dialects
In our approach, a system design modeling language is merged with a security modeling language by merging their vocabularies at the levels of notation and abstract syntax. But more is required: it must be possible to build expressions in the combined language that combine subexpressions from the different languages. That is, security policies expressed in the security modeling language must be able to make statements about system resources or attributes specified in the design modeling language. It is the role of the dialect to make this connection. We will show one way of doing this using subtyping (in the object-oriented sense) to classify constructs in one language as belonging to subtypes in the other. We will provide examples of such combinations in Section 5 and Section 7. These ideas are best understood on an example. Our security modeling language SecureUML provides a language for specifying access control policies for actions on protected resources. However, it leaves open what the protected resources are and which actions they offer to clients. These depend on the
primitives for constructing models in the system design modeling language. For example, in a component-oriented modeling language, the resources might be methods that can be executed. Alternatively, in a process-oriented language, the resources might be processes with actions reflecting the ability to activate, deactivate, terminate, or resume the processes. Or, if we are modeling file systems, the protected resources might correspond to files that can be read, written, or executed. The dialect specifies how the modeling primitives of SecureUML are integrated with the primitives of the design modeling language in a way that allows the direct annotation of model elements representing protected resources with access control information. Hence it provides the missing vocabulary to formulate security policies involving these resources by defining: the model element types of the system design modeling language that represent protected resources; the actions these resource types offer and hierarchies classifying these actions; and the default access control policy for actions where no explicit permission is defined (i.e., whether access is allowed or denied). We give examples of integrating SecureUML into different system modeling languages in Sections 5.1 and 7.1.
3.3 Model transformation
Given a language that is an instance of the schema in Figure 3, we must define a transformation function operating on models constructed in the language. As our focus in this paper is on security, we shall assume that the system design modeling language used is already equipped with a transformation function, consisting of transformation rules that define how model elements are transformed into code or system infrastructure. Our task then is to define how the additional modeling constructs, from the security modeling language, are translated into system constructs. Our aim here is neither to develop nor to generate new kinds of security architectures, but rather to capitalize on the existing security mechanisms of the target component architecture and generate appropriate instances of these mechanisms. Of course, for this to be successful, the modeling constructs in the security modeling language and their semantics should be designed with an eye open to the class of architectures and security mechanisms that will later be part of the target platforms. This requires care during the language design phase. We will illustrate the transformation process using SecureUML and its combination with two different design languages. In one case, we define a transformation function that translates component models into secure systems based
on the component platform EJB (Section 6). In the other case, our transformation function maps controller models into secure web applications based on the Java Servlet standard (Section 7).
Figure 4. SecureUML Metamodel
4. SecureUML
We now define the abstract syntax, concrete syntax, and semantics of SecureUML. While we will later give examples of how to combine SecureUML syntactically with different design modeling languages, we describe here the semantic foundations for this combination.
4.1 Abstract syntax
Figure 4 presents the metamodel that defines the abstract syntax of SecureUML. The left-hand part of the diagram basically formalizes an extension of RBAC, where we extend Users (defined in Section 2.4) by Groups and formalize the assignment of users and groups to roles by using their common supertype Subject. The right-hand part of the diagram factors permissions into the ability to carry out actions on resources. Permissions may be constrained to hold only in certain system states by authorization constraints. Additionally, we introduce hierarchies not only on roles (which is standard for RBAC), but also on actions. Let us now examine these types and associations in more detail. Subject is the base type of all users and groups in a system. It is an abstract type (type names in italic font in class diagrams represent abstract types), which means that it cannot be instantiated directly: each subject is either a user or a group. A User represents a system entity, like a person or a process, whereas a Group names a set of users and groups. Subjects are assigned to groups by the aggregation SubjectGroup, which represents an ordering relation over subjects. Subjects are assigned to roles by the association SubjectAssignment.
A Role represents a job and bundles all privileges needed to carry out the job. A Permission grants roles access to one or more actions, where the actions are assigned by the association ActionAssignment and the entitled roles are denoted by the association PermissionAssignment. Due to the cardinality constraints on these associations, a permission must be assigned to at least one role and action. Roles can be ordered hierarchically, which is denoted by the aggregation RoleHierarchy, with the intuition that the role at the part end of the association inherits all the privileges of the aggregate. An AuthorizationConstraint is a logical predicate that is attached to a permission by the association ConstraintAssignment and makes the permission’s validity a function of the system state. Consider a policy stating that an employee is allowed to withdraw money from a company account provided the amount is less than $5,000. Such a policy could be formalized by giving a permission to a role Employee for the method withdraw, restricted by an authorization constraint on the parameter amount of this method. Such constraints are given by OCL expressions, where the system model determines the vocabulary (classes and methods) that can be used, extended by the additional symbol caller, which represents the name of the user on whose behalf an action is performed. Resource is the base class of all model elements in the system modeling language that represent protected resources. The possible operations on these resources are represented by the class Action. Each resource offers one or more actions and each action belongs to exactly one resource, which is denoted by the composite aggregation ResourceAction. We differentiate between two categories of actions formalized by the action subtypes AtomicAction and CompositeAction. Atomic actions are low-level actions that can be directly mapped to actions of the target platform, e.g., the action execute of a method. In contrast, composite actions are high-level actions that may not have direct counterparts on the target platform. Composite actions are ordered in an ActionHierarchy. As we will see, the semantics of a permission defined on a composite action is that the right to perform the action implies the right to perform any one of the (transitively) contained subordinated actions. This semantics yields a simple basis for defining high-level actions. Suppose that a security policy grants a role the permission to “read” an entity. Using an action hierarchy, we can formalize this by stating that such a permission includes the permission to read the value of every entity attribute and to execute every side-effect free method of the entity. Action hierarchies also simplify the development of generation rules since it is sufficient to define these rules only for the atomic actions. Together, the types Resource and Action formalize a generic resource model that serves as a foundation for combining SecureUML with different system modeling languages. The concrete resource types, their actions, the action hierarchy, and the rules for deriving resources along an inheritance hierarchy are defined as part of a SecureUML dialect.
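To make the structure of Figure 4 easier to follow, the sketch below renders the core metamodel types as plain classes. It is a deliberately simplified, illustrative reading of the metamodel: the field names and types are our own choices, navigability is chosen for convenience, and cardinality constraints (for example, that a permission is assigned to at least one role and one action) are not enforced.

import java.util.*;

// Simplified, illustrative rendering of the SecureUML metamodel of Figure 4.
abstract class Subject { final List<Role> roles = new ArrayList<>(); }           // SubjectAssignment
class User extends Subject { String name; }
class Group extends Subject { final List<Subject> members = new ArrayList<>(); }  // SubjectGroup

class Role {
    String name;
    final List<Role> superRoles = new ArrayList<>();         // RoleHierarchy: privileges are inherited
    final List<Permission> permissions = new ArrayList<>();  // PermissionAssignment
}

class Permission {
    final List<Action> actions = new ArrayList<>();          // ActionAssignment
    String authorizationConstraint;                           // optional OCL text (ConstraintAssignment)
}

abstract class Action { Resource resource; String name; }     // ResourceAction
class AtomicAction extends Action {}                           // maps directly to the target platform
class CompositeAction extends Action {
    final List<Action> subordinatedActions = new ArrayList<>(); // ActionHierarchy
}

class Resource { final List<Action> actions = new ArrayList<>(); }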
UML metamodel type and stereotype → SecureUML metamodel type
Class «User» → User
Class «Group» → Group
Dependency «SubjectGroup» → SubjectGroup
Dependency «SubjectAssignment» → SubjectAssignment
Class «Role» → Role
Generalization between classes with stereotype «Role» → RoleHierarchy
AssociationClass «Permission» → Permission, PermissionAssignment, ActionAssignment, AuthorizationConstraint, and ConstraintAssignment

Table 1. Mapping Between SecureUML Concrete and Abstract Syntax
4.2 Concrete syntax
SecureUML’s concrete syntax is based on UML. To achieve this, we define a UML profile that formalizes the modeling notation of SecureUML using stereotypes and tagged values. In this section, we introduce the modeling notation and explain how models in concrete syntax are transformed into abstract syntax. Table 1 gives an overview of the mapping between elements of the SecureUML metamodel and UML types. Note that a permission, its associations to other elements, and its optional authorization constraint are represented by a single UML association class. Also note that the profile does not define an encoding for all SecureUML elements. For example, the notation for defining resources is left open and must be defined by the dialect. Also, no representation for subjects is given because Subject is an abstract type. We now illustrate the concrete syntax and the mapping to abstract syntax with the example given in Figure 5, which formalizes the second part of the security policy introduced in Section 2.1: only the owner of a meeting may change meeting data and cancel or delete the meeting. In the SecureUML profile, a role is represented by a UML class with the stereotype «Role» and an inheritance relationship between two roles is defined using a UML generalization relationship. The role referenced by the arrowhead of the generalization relationship is considered to be the superrole of the role referenced by the tail, and the subrole inherits all access rights of the superrole. In our example, we define the two roles User and Supervisor. Moreover, we define Supervisor as a subrole of User. Users are defined as UML classes with the stereotype «User». The assignment of a subject to a role is defined as a dependency with the stereotype «SubjectAssignment», where the role is associated with the arrowhead of the
dependency. In our example, we define the users Alice and Bob, and formalize that Alice is assigned to the role Supervisor, whereas Bob has the role User.¹
Figure 5. Example of the Concrete Syntax of SecureUML
The right-hand part of Figure 5 specifies a permission on a protected resource. Specifying this is only possible after having combined SecureUML with an appropriate design modeling language. The concrete syntax of SecureUML is generic in that every UML model element type can represent a protected resource. Examples are classes, attributes, and methods, as well as state machines and states. A SecureUML dialect specializes the base syntax by stipulating which elements of the system design language represent protected resources and defines the mapping between the UML representation of these elements and the resource types in the abstract syntax of the dialect. For this example, we employ a dialect (explained in Section 5.1) that formalizes that UML classes with the stereotype «Entity» are protected resources possessing the actions update and delete, i.e., the class Meeting is a protected resource. A permission, along with its relations to roles (PermissionAssignment) and actions (ActionAssignment), is defined in a single UML model element, namely an association class with the stereotype «Permission». The association class connects a role with a UML class representing a protected resource, which is designated as the root resource of the permission. The actions such a permission refers to may be actions on the root resource or on subresources of the root resource. In our example, the class Meeting is the root resource of the permission OwnerMeeting granted to the role User. Each attribute of the association class represents the assignment of an action to the permission (ActionAssignment), where the action is identified by the name of its resource and the action name. The action name is given as the attribute’s type, e.g. “update”. The resource name is stored in the tagged value identifier and references the root resource or one of its subresources. The format of the identifier depends on the type of the referenced resource and is determined by the stereotype of the attribute.
The stereotypes for action references and the naming conventions for identifiers are defined as part of the dialect. As a general rule, the resource identifier is always specified relative to the root resource. This prevents redundant information in the model and inconsistencies when the root resource’s name is changed. For example, the attribute start would be referenced by the string “start” and the root resource itself would be referenced by an empty string. Note that the name of the action reference attribute has only an illustrative meaning. We generally use names that provide information about the referenced resource. In our example, the attribute of type “update” with the stereotype «EntityAction» and the name “Meeting” denotes the action update on the class Meeting. As we will later see in Table 2, the permission to update an Entity also comprises the permission to execute any non-side-effect free method of the Entity, for example the method cancel() of the class Meeting. The second attribute in our example denotes the action delete on the class Meeting. Together, these two attributes specify the permission to update (which includes canceling) and delete a meeting. Each authorization constraint is stored as an OCL expression in the tagged value constraint of the permission that it constrains. To improve the readability of a model, we attach a text note with the constraint expression to the permission’s association class. In our example, the permission OwnerMeeting is constrained by the authorization constraint caller.name = self.owner.name, which restricts the permission to update and delete a meeting to the owner of the meeting.
4.3 Semantics
The General Idea. SecureUML formalizes access control decisions that depend on two kinds of information.
1. Declarative access control decisions that depend on static information, namely the assignments of users and permissions to roles, which we designate as a RBAC configuration.
2. Programmatic access control decisions that depend on dynamic information, namely the satisfaction of authorization constraints in the current system state.
While formalizing the semantics of RBAC configurations is straightforward, formalizing the satisfaction of authorization constraints in system states is not. This is mainly because what constitutes a system state is defined by the design modeling language, and not by SecureUML. Since the semantics of SecureUML depends on the set of states, we parameterize the SecureUML semantics by this set. Also, we have to define the semantics of RBAC configurations
in a way that supports its combination with the semantics of authorization constraints. The basic ideas are as follows. To formalize 1, declarative access control decisions, we represent a RBAC configuration as a first-order structure S_RBAC, and we define the semantics of declarative access control decisions by S_RBAC |= φ_RBAC(u, a), where φ_RBAC(u, a) formalizes that the user u is “in the right role” to perform the action a. To formalize 2, we represent system states st by (corresponding) first-order structures S_st, and authorization constraints as first-order formulas φ_ST^p(u) (independent of the state st). In accordance with the SecureUML metamodel, constraints are associated with permissions (not actions), and this formula formalizes under which condition the user u has the permission p. Whether this condition holds or not in the state st is then cast as the logical decision problem S_st |= φ_ST^p(u). To combine both RBAC configurations and authorization constraints, we combine the first-order structures S_st and S_RBAC, as well as the first-order formulas φ_ST^p(u) and φ_RBAC(u, a), and use this to formalize the semantics of individual access control decisions. Since the addition of access control changes the run-time behavior of a system, we must also define how the semantics of SecureUML changes the behavior specified by the design modeling language. To accomplish this, we require that the system behavior can be defined by a transition system and we interpret the addition of access control as restricting the system behavior by removing transitions from this transition system. In what follows, we formalize these ideas more precisely.
Declarative Access Control. To begin with, we define an order-sorted signature Σ_RBAC = (S_RBAC, ≤_RBAC, F_RBAC, P_RBAC), which defines the type of structures that specify role-based access control configurations.² Here S_RBAC is a set of sorts, ≤_RBAC is a partial order on S_RBAC, F_RBAC is a sorted set of function symbols, and P_RBAC is a sorted set of predicate symbols. In detail, we define

  S_RBAC = {Users, Subjects, Roles, Permissions, AtomicActions, Actions},
    where Users ≤_RBAC Subjects and AtomicActions ≤_RBAC Actions,
  F_RBAC = ∅,
  P_RBAC = { ≥_Subjects : Subjects × Subjects,   UA : Subjects × Roles,
             ≥_Roles : Roles × Roles,            PA : Roles × Permissions,
             ≥_Actions : Actions × Actions,      AA : Permissions × Actions }.

The subsort relation ≤_RBAC is used here to formalize that Users is a subsort of Subjects and AtomicActions is a subsort of Actions.
The predicate symbols UA, PA, and AA denote assignment relations, corresponding in the SecureUML metamodel to the associations SubjectAssignment, PermissionAssignment, and ActionAssignment respectively. The predicate symbols ≥_Subjects, ≥_Roles, and ≥_Actions denote hierarchies on the respective sets and correspond to the aggregation associations SubjectGroup, RoleHierarchy, and ActionHierarchy respectively. A SecureUML model defines a Σ_RBAC-structure S_RBAC in the obvious way: the sets Subjects, Users, Roles, Permissions, Actions, and AtomicActions each contain entries for every model element of the corresponding metamodel types Subject, User, Role, Permission, Action, and AtomicAction. Also, the relations UA, PA, and AA contain tuples for each instance of the corresponding association in the abstract syntax of SecureUML. Additionally, we define the partial orders ≥_Subjects, ≥_Roles, and ≥_Actions on the sets of subjects, roles, and actions respectively. ≥_Subjects is given by the reflexive closure of the aggregation association SubjectGroup in Figure 4 and formalizes that a group is larger than all its contained subjects. ≥_Roles is defined analogously based on the aggregation association RoleHierarchy on Role and we write subroles (roles with additional privileges) on the left (larger) side of the ≥-symbol. ≥_Actions is given by the reflexive closure of the composition hierarchy on actions, defined by the aggregation ActionHierarchy. We write a1 ≥_Actions a2 if a2 is a subordinated action of a1. These relations are partial orders because aggregations in UML are transitive and antisymmetric by definition. Note that compared to Figure 4, we have excluded the metamodel types Group, CompositeAction, Resource, and AuthorizationConstraint. Resource is excluded because the target of access control is the actions performed on resources, and not resources themselves. Group and CompositeAction are excluded because groups and composite actions are just subsets of subjects and actions respectively and do not play any further role in the semantics. AuthorizationConstraint is excluded because its semantics is not part of declarative access control, but rather part of programmatic access control. We define the formula φ_RBAC(u, a) with variables u of sort Users and a of sort Actions by

  φ_RBAC(u, a) = ∃s ∈ Subjects, r1, r2 ∈ Roles, p ∈ Permissions, a′ ∈ Actions.
                   s ≥_Subjects u ∧ UA(s, r1) ∧ r1 ≥_Roles r2 ∧ PA(r2, p) ∧ AA(p, a′) ∧ a′ ≥_Actions a,

or equivalently, by factoring out the permissions explicitly, as

  φ_RBAC(u, a) = ⋁_{p ∈ Permissions} ( φ_User(u, p) ∧ φ_Action(p, a) ),     (1)
where

  φ_User(u, p) = ∃s ∈ Subjects, r1, r2 ∈ Roles. s ≥_Subjects u ∧ UA(s, r1) ∧ r1 ≥_Roles r2 ∧ PA(r2, p)

states that the user u has the permission p, and

  φ_Action(p, a) = ∃a′ ∈ Actions. AA(p, a′) ∧ a′ ≥_Actions a

states that p is a permission for the action a. This is essentially a reformulation of the usual RBAC semantics (cf. Section 2.4). The reason for the factorization given by definition (1) will become clear when we combine this formula with programmatic access control formulas φ_ST^p(u). The declarative access control part of SecureUML is now defined by saying that a user u may perform an action a only if S_RBAC |= φ_RBAC(u, a) holds.
Programmatic Access Control. While declarative access control decisions can be made independently of the system model, we must explicitly incorporate the syntax and semantics of the design modeling language into SecureUML for programmatic access control. In order to be able to combine the semantics of SecureUML with the semantics of system design modeling languages, we make some assumptions about the nature of the latter, so that the semantic combination will be well-defined. To make programmatic access control decisions, we require that the system design model provides a vocabulary for talking about the structure of the system. More formally, we require that the system design model provides a sorted first-order signature Σ_ST = (S_ST, F_ST, P_ST). Typically, S_ST contains one sort for each class in the system model, F_ST contains a function symbol for each attribute and for each side-effect free method of the model, and P_ST contains predicate symbols for 1-to-many and many-to-many relations between classes. How exactly this signature is defined depends on the semantics of the system design modeling language. We do however require that S_ST contains a sort Users and that F_ST contains a constant symbol caller of sort Users, and a constant symbol self_C for each class C in the system model. This amounts to the requirement that the design modeling language provides some way of talking about who is accessing what, which is a minimal requirement for any reasonable notion of access control. For practical reasons, we also assume that F_ST contains a function symbol UserName, which maps users to a string representation of their names. How the symbols in Σ_ST are interpreted in S_st is again defined by the system design modeling language. Here we only require that the constant symbol self_C is interpreted by the currently accessed object, in case the currently accessed object is of the sort C, and that the constant symbol caller is interpreted by the user that initiated this access.
In this setting, the state of the system at a particular time defines a Σ_ST-structure S_st. Constraints on the system state S_st can be expressed as logical formulas φ_ST, whereby constraint satisfaction is just the question of whether S_st |= φ_ST holds.³
Combining Declarative and Programmatic Access Control. To formalize combined declarative and programmatic access control decisions, we combine the states S_st and S_RBAC into the composite structure

  S_AC = ⟨S_RBAC, S_st⟩

and combine the formulas φ_ST and φ_RBAC into a new formula φ_AC. The combined access control decision is then defined as the question of whether S_AC |= φ_AC holds. By ⟨S_RBAC, S_st⟩ we mean that S_AC is the structure that consists of the carrier sets, functions and predicates from both S_RBAC and S_st, where we identify the carrier sets of the sort Users, which belongs to both structures. As for φ_AC, observe that authorization constraints are not global constraints, but are attached to permissions (as can be seen in Figure 5) and hence are only relevant for the roles that have these permissions. We denote the authorization constraint that is attached to a permission p by φ_ST^p, and require that φ_ST^p is an expression in the first-order language defined by Σ_ST. In order to define the language for the combined formula φ_AC, we combine the signatures Σ_RBAC and Σ_ST by taking their componentwise union⁴, i.e., Σ_AC = (S_AC, ≤_AC, F_AC, P_AC), where S_AC = S_RBAC ∪ S_ST, ≤_AC = ≤_RBAC ∪ id_{S_ST}, F_AC = F_RBAC ∪ F_ST, and P_AC = P_RBAC ∪ P_ST. Here we assume that the signatures Σ_RBAC and Σ_ST are disjoint, with the exception of the sort Users, which belongs to both signatures. Observe that under this definition of Σ_AC, S_AC is a Σ_AC-structure. The combined access control semantics is now defined by the formula

  φ_AC(u, a) = ⋁_{p ∈ Permissions} ( φ_User(u, p) ∧ φ_Action(p, a) ∧ φ_ST^p(u) ),     (2)

stating that a user u must have a permission p for the action a according to the RBAC configuration and that the corresponding authorization constraint for this permission p must evaluate to true for the user u.
Behavioral Semantics of Access Control The preceding paragraphs defined how access control decisions are made in a system state. But what is interesting in the end is how the system behaves when an access control decision is made. In order to define this, we again make some minimal assumptions on the semantics of the design modeling language. Namely, we assume that the semantics of the system design modeling languages can be expressed as a Labeled Transition System (LTS) ∆ = (Q, A, δ). In this LTS, the set of nodes Q consists of ΣST -structures, the edges are labeled with elements from a set of
actions A that is a superset of AtomicActions, and δ ⊆ Q × A × Q is the transition relation. Note that we do not require that A = AtomicActions because the design modeling language may define state-changing actions (i.e., those with side-effects) that are not protected. The behavior of the system is defined by the paths (also called traces) in the LTS as is standard: a trace s0 --a0--> s1 --a1--> · · · defines a possible behavior if and only if (si, ai, si+1) ∈ δ, for 0 ≤ i. In this setting, adding access control to the system design corresponds to deleting traces from the LTS, i.e., when an action is not permitted then the transition must not be made, and when an action is permitted, the subsequent state must be the same as before adding access control. More formally, adding access control to a system description means transforming the LTS ∆ = (Q, A, δ) to an LTS ∆AC = (QAC, AAC, δAC) as follows: QAC is defined by combining system states with RBAC configurations, i.e., QAC = QRBAC × Q, where QRBAC denotes the universe of all finite ΣRBAC-structures. AAC is unchanged: AAC = A. δAC is defined by restricting δ to the permitted transitions:

  δAC = {(⟨qRBAC, q⟩, a, ⟨qRBAC, q′⟩) | (q, a, q′) ∈ δ ∧ (a ∈ AtomicActions → ⟨qRBAC, q⟩ |= φAC(caller, a))}

Note that this definition implies that the RBAC configuration does not change during system execution. We do not address issues like run-time user administration in this work. We will see concrete semantic combinations of SecureUML with different design modeling languages in Sections 5.3 (for ComponentUML) and in Section 7.3 (for ControllerUML).
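To make the restriction of δ to δAC more concrete, the following is a minimal sketch (not part of the original formalization) of how an implementation might prune a transition relation using an access control check. The types Transition and AccessDecision and their members are hypothetical names; the check permitted stands for ⟨qRBAC, q⟩ |= φAC(caller, a).

  import java.util.HashSet;
  import java.util.Set;

  class Transition {
      Object source;   // a system state paired with an RBAC configuration
      Object action;
      Object target;
      Transition(Object source, Object action, Object target) {
          this.source = source;
          this.action = action;
          this.target = target;
      }
  }

  interface AccessDecision {
      // stands for <q_RBAC, q> |= phi_AC(caller, a)
      boolean permitted(Object state, Object action);
  }

  class SecuredLts {
      // Keep a transition if its action is unprotected (not an atomic action),
      // or if the access control formula holds in its source state.
      static Set<Transition> restrict(Set<Transition> delta,
                                      Set<Object> atomicActions,
                                      AccessDecision decision) {
          Set<Transition> allowed = new HashSet<Transition>();
          for (Transition t : delta) {
              if (!atomicActions.contains(t.action) || decision.permitted(t.source, t.action)) {
                  allowed.add(t);
              }
          }
          return allowed;
      }
  }

As in the definition above, transitions labeled with unprotected actions pass through unchanged, and no new transitions are ever added; access control only removes behavior.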
5.
An Example Modeling Language: ComponentUML
In this section, we give an example of a system design language, which we call ComponentUML, and present its combination with SecureUML. We also show how to model security policies using the resulting security design modeling language and we illustrate its semantics using the example introduced in Section 2.1. ComponentUML is a simple language for modeling component-based systems. The metamodel for ComponentUML is shown in Figure 6. Elements of type Entity represent object types of a particular domain. An entity may have multiple methods and attributes, represented by elements of the types Method and Attribute respectively. Associations are used to specify relations between
entities. An association is built from an Association model element and every entity participating in an association is connected to the association by an AssociationEnd. ComponentUML uses a UML-based notation where entities are represented by UML classes with the stereotype «Entity». Every method, attribute, or association end owned by such a class is automatically considered to be a method, attribute, or association end of the entity, so no further stereotypes are necessary. Figure 7 shows the structural model of our scheduling application in the ComponentUML notation. Instead of classes, we now have the three entities Meeting, Person, and Room, each represented by a UML class with the stereotype «Entity».

Figure 6. ComponentUML Metamodel
5.1
Extending the abstract syntax
Merging the Syntax As the first step towards making ComponentUML security aware, we extend its abstract syntax with the vocabulary of SecureUML by integrating both metamodels, i.e., we merge the abstract syntax of both modeling languages. This is achieved by importing the SecureUML metamodel into the metamodel of ComponentUML. This extends ComponentUML with the SecureUML modeling constructs, e.g., Role and Permission. The use of packages and corresponding namespaces for defining these metamodels ensures that no conflicts arise during merging. Identifying Protected Resources Second, we identify the model elements of ComponentUML representing protected resources and formalize this as part of a SecureUML dialect. To do this, we must determine which model element we wish to control access to in the resulting systems. When doing this, we must account for what can ultimately be protected by the target platform. Suppose, for example, we decide to interpret entity attributes as protected resources and the target platform supports access control on methods only. This is possible, but it necessitates a transformation function that transforms each modeled attribute into a private attribute and generates (and enforces access to) access
methods for reading and changing the value of the attribute in the generated system. In our example, we identify the following model elements of ComponentUML as protected resources: Entity, Method, Attribute, and AssociationEnd. This identification is made by using inheritance to specify that these metatypes are subtypes of the SecureUML type Resource, as shown in Figure 8. In this way, the metatypes inherit all properties needed to define authorization policies. Additionally, we define in this figure several action classes as subtypes of the SecureUML class CompositeAction. The action composition hierarchy is then defined as part of each action’s type information, by way of OCL invariant constraints (see below) on the respective types.

Figure 7. Scheduling Application
Defining Resource Actions In the next step, we define the set of actions that is offered by every model element type representing a protected resource, i.e., we fix the domain of the metamodel association ResourceAction for each resource type of the dialect. Actions can be freely defined at every level of abstraction. One may choose just to leverage the actions that are present in the target security architecture, e.g., the action “execute” on methods. Alternatively one may define actions at a higher level of abstraction, e.g., “read” access to a component. This results in a richer, easier to use vocabulary since granting read or write access to an entity is more intuitive than giving someone the privilege to execute the methods getBalance, getOwner, and getId. High-level actions also lead to concise models. We usually define actions of both kinds and connect them using hierarchies. In the metamodel, the set of actions each resource type offers is defined by the named dependencies from the resource type to action classes, as shown in Figure 8. Each dependency represents one action of the referenced action type in the context of the resource type, where the dependency name determines the name of the action. For example, the metamodel in Figure 8 formalizes that an Attribute always possesses the action fullAccess of type AttributeFullAccess and the actions read and update of type AtomicAction.
Figure 8. SecureUML Dialect for ComponentUML Metamodel
Defining the Action Hierarchy As the final step in defining our SecureUML dialect, we define a hierarchy on actions. We do this by restricting the domain of the SecureUML association ActionHierarchy on each composite action type of the dialect by an OCL invariant. An overview of the composite actions of the SecureUML dialect for ComponentUML is given in Table 2. The approach we take is shown for the action class EntityFullAccess by the following OCL expression.

  context EntityFullAccess
  inv: subordinatedActions = resource.actions->select(name="create" or name="read"
                                                      or name="update" or name="delete")
This expression states that the composite action EntityFullAccess is larger (a “super-action”) in the action hierarchy than the actions create, read, update, and delete of the entity the action belongs to.
5.2
Extending the concrete syntax
In the previous section, we have seen how the abstract syntax of ComponentUML can be augmented with syntax for security modeling by combining it with the abstract syntax of SecureUML. We extend the concrete syntax of ComponentUML analogously by importing the SecureUML notation into ComponentUML. Afterwards, we define well-formedness rules on SecureUML primitives that restrict their use to those ComponentUML elements representing protected resources. For example, the scope of a permission, which is any UML class in the SecureUML notation (see Section 4.2), is restricted to UML classes with the stereotype «Entity». Finally, as shown in
Table 3, we define the action reference types for entities, attributes, methods, and association ends.

  composite action type      subordinated actions
  EntityFullAccess           create, read, update, and delete of the entity.
  EntityRead                 read for all attributes and association ends of the entity, and execute for all side-effect free methods of the entity.
  EntityUpdate               update for all attributes of the entity, update for all association ends of the entity, and execute for all non-side-effect free methods of the entity.
  AttributeFullAccess        read and update of the attribute.
  AssociationEndFullAccess   read and update of the association end.

Table 2. SecureUML Dialect Action Hierarchy

  stereotype             resource type    naming convention
  EntityAction           Entity           empty string
  MethodAction           Method           method signature
  AttributeAction        Attribute        attribute name
  AssociationEndAction   AssociationEnd   association end name

Table 3. Action Reference Types for ComponentUML
5.3
Extending the semantics
Our combination schema requires that we define the semantics of ComponentUML as a labeled transition system ∆ = (Q, A, δ) over a first-order signature ΣST. Intuitively, every entity defines a sort in the first-order signature, and every atomic action defined by the SecureUML dialect for ComponentUML (cf. Figure 8) defines an action in the labeled transition system. Side-effect free actions give rise to function and predicate symbols in the first-order signature. To make this more precise, given a model in the ComponentUML language, we define the signature ΣST = (SST, FST, PST) as follows: Each Entity e gives rise to a sort Se in SST. Additionally, SST contains the sorts Users, String, Int, Real, and Boolean:

  SST = {Se | e is an entity} ∪ {Users, String, Int, Real, Boolean} .

Each side-effect free entity method m (which is marked in UML by the tagged value “isQuery”, set to true) gives rise to a function symbol fm in FST of the corresponding type. Corresponding type here means, in particular, that we add the sort of the entity as an additional parameter, i.e., the “this-pointer” is passed as an additional argument.
Each entity attribute at gives rise to a function symbol getat in FST (the “get-method”) of type s → v, where s is the sort of the entity and v is the sort of the attribute’s type. Each association end ae with multiplicity {1} gives rise to a function symbol fae. Finally, we have a constant symbol caller of type Users and a function symbol UserName of type Users → String:

  FST = {fm | m is an entity method} ∪ {getat | at is an entity attribute} ∪ {selfe | e is an entity} ∪ {fae | ae is an association end with multiplicity {1}} ∪ {caller, UserName} .

Each association end ae with a multiplicity other than {1} gives rise to a binary predicate symbol Pae in PST of the type of the involved entities:

  PST = {Pae | ae is an association end with multiplicity ≠ {1}} .

We now define the labeled transition system ∆ = (Q, A, δ) by: Q is the universe of all possible system states, which is just the set of all first-order structures over the signature ΣST that consist of finitely many objects for each entity as well as for the sort Users, and where the interpretations of String, Int, Real, and Boolean are fixed to be the sets Strings, Z, R, and {true, false} respectively. The entity sorts consist of objects that can be thought of as tuples, containing an object identifier and fields for each attribute. The attribute fields contain the object identifier of the referenced object (in case this object is of an entity sort) or a value of one of the primitive types. The set of actions A is defined by (cf. Figure 8):

  A = EntityCreateActions ∪ EntityDeleteActions ∪ MethodActions ∪ AttributeReadActions ∪ AttributeUpdateActions ∪ AssociationEndReadActions ∪ AssociationEndAddActions ∪ AssociationEndRemoveActions ,

where, for example, AttributeUpdateActions is defined by:

  AttributeUpdateActions = ⋃_{at ∈ Attributes} {setat} × Qe × Qat .
Here, Qe and Qat denote the universes of all possible instances of the type of the attribute’s entity, and the type of the attribute respectively, e.g., the action (setat, e, v) ∈ AttributeUpdateActions denotes the action of setting the attribute at of the entity e to the value v. The other sets of actions are defined similarly. The transition relation δ ⊆ Q × A × Q defines the allowed transitions. The exact details of δ will depend on the intended semantics of the methods themselves. We will just give a few examples here to illustrate the main idea. For example, for a ∈ AttributeReadActions, (q, a, q′) ∈ δ if and only if q′ = q, i.e., reading an attribute’s value does not change the system state. In contrast, setting an attribute value should be reflected in the system state: for a = (setat, e, v) ∈ AttributeUpdateActions, (q, a, q′) ∈ δ implies q′ |= getat(e) = v. It is possible to complete this account and give a full semantics of ComponentUML, but this would take us too far afield. Any completion will meet the requirements put forth in Section 4.3 and have a well-defined behavioral semantics. Specifically, the transition system ∆AC = (QAC, AAC, δAC) for the combination is defined by adding ΣRBAC-structures to the system states in Q, extending δ to QAC × AAC × QAC, and removing forbidden transitions. Hence, δAC will only contain those transitions that are allowed according to the SecureUML semantics.
5.4
Modeling the authorization policy
We now use the combined language to formalize the security policy given in Section 2.1. We do this by adding permissions to the entity model of the scheduler application that formalize the three policy requirements. As these permissions associate roles with actions, we also employ the roles User and Supervisor, which we introduced in Section 4.2. The first requirement states that any user may create and read meeting data. We formalize this by the permission UserMeeting in Figure 9, which grants the role User the right to perform create and read actions on the entity Meeting. We formalize the second requirement with the permission OwnerMeeting, which states that a meeting may only be altered or deleted by its owner. This permission grants the role User the privilege to perform update and delete actions on a Meeting. Additionally, we restrict this permission with the authorization constraint caller.name = self.owner.name, which states that the name of a caller must be equal to the name of the owner of the meeting instance. Due to the definition of the action update (cf. Table 2), this permission must hold for every change of the value of the attributes or association ends of the meeting entity as well as for invocations of the methods notify or cancel.
Figure 9. Scheduler Example with Authorization Policy
Finally, we formalize the third requirement with the permission SupervisorCancel. This gives a supervisor the permission to cancel any meeting, i.e., the right to execute the methods cancel and notify.
5.5
Examples of access control decisions
We now illustrate the semantics by analyzing several access control decisions in the context of Figure 9. We assume that we have three users, Alice, Bob, and Jack, and that Bob is assigned the role User whereas Alice is assigned the role Supervisor. Here we assume that our dialect has the default behavior "access allowed" and we directly apply the semantics of SecureUML to the policy given in the previous section. The corresponding ΣRBAC-structure SRBAC is⁵

  Users = Subjects = {Alice, Bob, Jack}
  Roles = {User, Supervisor}
  Permissions = {OwnerMeeting, SupervisorCancel, . . . }
  AtomicActions = {Meeting::cancel.execute, . . . }
  Actions = AtomicActions ∪ {Meeting.update, . . . }
  UA = {(Bob, User), (Alice, Supervisor)}
  PA = {(User, OwnerMeeting), (Supervisor, SupervisorCancel), . . . }
  AA = {(SupervisorCancel, Meeting::cancel.execute), (OwnerMeeting, Meeting.update), . . . }
  ≥Roles = {(Supervisor, User), (Supervisor, Supervisor), (User, User)}
  ≥Actions = {(Meeting.update, Meeting::cancel.execute), . . . } ,

and the signature ΣST, derived from the system model, is

  S = {Meetings, Persons, Rooms} ∪ {String, Int, Real, Bool}
  F = {selfMeetings, . . . , MeetingOwner, PersonName}
  P = {MeetingLocation, MeetingParticipants, . . . } .

The constant symbol selfMeetings of sort Meetings denotes the currently accessed meeting. The function symbols

  MeetingOwner : Meetings → Persons
  PersonName : Persons → String

represent the association end owner of the entity type Meeting and the attribute name of a person. Now suppose that Alice wants to cancel a meeting entry owned by Jack. Suppose further that the system state is given by the first-order structure Sst over ΣST, where

  caller^Sst = Alice
  Meetings^Sst = {meetingJack}
  Persons^Sst = {alice, bob, jack}
  selfMeetings^Sst = meetingJack
  MeetingOwner^Sst = {(meetingJack, jack)}
  PersonName^Sst = {(alice, "Alice"), (bob, "Bob"), (jack, "Jack")}
  UserName^Sst = {(Alice, "Alice"), (Bob, "Bob"), (Jack, "Jack")} .
The formula that must be satisfied by the structure SAC = ⟨SRBAC, Sst⟩ in order to grant Alice access is built according to the definition (2), given in Section 4.3:

  φAC(u, a) = ⋁_{p ∈ Permissions} ( φUser(u, p) ∧ φAction(p, a) ∧ φpST(u) ) .
As can be seen in Figure 9, Alice has the permission SupervisorCancel for the action Meeting::cancel.execute. However, the method cancel() of the entity Meeting is a method with side-effects. Therefore, the composite action Meeting.update includes the action Meeting::cancel.execute. Because the role Supervisor inherits permissions from the role User, Alice also
has the permission OwnerMeeting for the action Meeting::cancel.execute. No other permissions for this action exist. Hence, the formula φUser(Alice, p) ∧ φAction(p, Meeting::cancel.execute) is only true for these permissions. The constraint expression

  caller.name = self.owner.name

on the permission OwnerMeeting is translated into the formula

  UserName(caller) = PersonName(MeetingOwner(selfMeetings)) ,

and the formula for the permission SupervisorCancel is true. For all other permissions p, the formula φUser(u, p) ∧ φAction(p, a) is false. Therefore the access decision is equivalent to

  SAC |= true ∨ UserName(caller) = PersonName(MeetingOwner(selfMeetings)) ,

which is satisfied. Alternatively, suppose that Bob tries to perform this action. The corresponding structure S′AC differs from SAC by the interpretation of the constant symbol caller, which now refers to "Bob". Bob only has the permission OwnerMeeting for this action. Hence,

  S′AC |= UserName(caller) = PersonName(MeetingOwner(selfMeetings))

is required for access. Since Jack (not Bob) is the owner of this meeting, this constraint is not satisfied and access is denied.
6.
Generating an EJB System
We now show how ComponentUML models can be transformed into executable EJB systems with configured access control infrastructures. First, we outline the basic generation rules for EJB systems. Afterwards, we present the rules for transforming SecureUML elements into EJB access control information. The generation of users, roles, and user assignments is straightforward in EJB: for each user, role, and user assignment, we generate a corresponding element in the deployment descriptor. We therefore omit these details and focus here on the parts of the infrastructure responsible for enforcing permissions and authorization constraints.
6.1
Basic generation rules for EJB
Generation rules are defined for entities, their attributes, methods, and association ends. The result of the transformation is a source code fragment in the concrete syntax of the EJB platform, either Java source code or XML deployment descriptors. An Entity is transformed to a complete EJB component of type entity bean with all necessary interfaces and an implementation class. Additionally, a factory method create for creating new component instances is generated. The component itself is defined by an entry in the deployment descriptor of type entity as shown by the following XML fragment.

  <entity>
    <ejb-name>Meeting</ejb-name>
    <local-home>scheduler.MeetingHome</local-home>
    <local>scheduler.Meeting</local>
    <ejb-class>scheduler.MeetingBean</ejb-class>
    ...
  </entity>
A Method is transformed to a method declaration in the component interface of the respective entity bean and a method stub in the corresponding bean implementation class. The following shows the stub for the method cancel of the entity Meeting.

  void cancel() { }
For each Attribute, access methods for reading and writing the attribute value are generated along with persistency information that is used by the application server to determine how to store this value in a database. The declarations of the access methods for the attribute duration of the entity Meeting are shown in the following Java code fragment.

  int getDuration();
  void setDuration(int duration);
Elements of type AssociationEnd are handled analogously to attributes. Access methods are generated for reading the collection of associated objects and for adding objects to, or deleting them from, the collection. Furthermore, persistency information for storing the association-end data in a database is generated. The following code fragment shows the declarations of the access methods for the association end participants of the entity Meeting.

  Collection getParticipants();
  void addParticipant(Participant participant);
  void removeParticipant(Participant participant);
6.2
Generating access control infrastructures
We define generation rules that translate a security design model into an EJB security infrastructure based on declarative and programmatic access control. Each permission is translated into an equivalent XML element of type method-permission, used in the deployment descriptor for the declarative access control of EJB. The resulting access control configuration enforces the static part of an access control policy, without considering the authorization constraints. Programmatic access control is used to enforce the authorization constraints. For each method that is restricted by at least one permission with an authorization constraint, an assertion is generated and placed at the start of the method body. Note that since the default behavior of both the SecureUML dialect for ComponentUML and the EJB access control monitor is “access allowed”, we need not consider actions without permissions during generation.
Generating Permissions As explained in Section 2.5, a method permission element names a set of roles and the set of EJB methods that the members of the roles may execute. Generating a method permission can therefore be split into two parts: generating a set of roles and assigning methods to them. Since EJB does not support role hierarchies, both the roles directly connected to permissions in the model, as well as their subroles, are needed for generation. First, the set of roles directly connected to a permission is determined using the association PermissionAssignment of the SecureUML metamodel. Then, for every role in this set, all of its subroles (under the transitive closure of the relation defined by the association RoleHierarchy) are added to the role set. Finally, for each role in the resulting set, one role-name element is generated. Applying this generation procedure to the permission OwnerMeeting in our example results in the following two role references.

  <role-name>User</role-name>
  <role-name>Supervisor</role-name>
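The following is a hedged sketch of the subrole computation just described; the Role interface and its subRoles() accessor are hypothetical stand-ins for the corresponding SecureUML metamodel elements, not the actual generator code.

  import java.util.HashSet;
  import java.util.LinkedList;
  import java.util.Set;

  interface Role {
      // the roles that inherit this role's permissions (its "subroles" in the sense used above)
      Set<Role> subRoles();
  }

  class RoleClosure {
      // Start with the roles directly assigned to a permission and add all
      // subroles under the transitive closure of the role hierarchy.
      static Set<Role> withSubroles(Set<Role> directlyAssigned) {
          Set<Role> result = new HashSet<Role>(directlyAssigned);
          LinkedList<Role> work = new LinkedList<Role>(directlyAssigned);
          while (!work.isEmpty()) {
              Role r = work.removeFirst();
              for (Role sub : r.subRoles()) {
                  if (result.add(sub)) {   // not visited yet
                      work.addLast(sub);
                  }
              }
          }
          return result;
      }
  }

Applied to the permission OwnerMeeting, which is assigned to the role User, this computation yields {User, Supervisor}, matching the two role references shown above.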
The set of method elements that is generated for each permission is computed similarly. First, for each permission, we determine the set of actions directly referenced by the permission using the association ActionAssignment. Then, for every action in this set, all of its subordinated actions (under the reflexive closure of the relation defined by the association ActionHierarchy) are added to the action set. Finally, for each atomic action in the resulting set, method elements for the corresponding EJB methods are generated. The correspondence between atomic actions and EJB methods is given in Table 4. Note that an atomic action may map to several EJB methods and therefore several method entries may need to be generated.
  rule #   resource type    action    EJB methods
  1        Entity           create    automatically generated factory methods
  2        Entity           delete    delete methods
  3        Method           execute   corresponding method
  4        Attribute        read      get-method of the attribute
  5        Attribute        update    set-method of the attribute
  6        AssociationEnd   read      get-method of the association end
  7        AssociationEnd   update    add- and remove-method of the association end

Table 4. Atomic Action to Method Mapping for EJB
We illustrate this process for the permission UserMeeting, which references the actions Meeting.create and Meeting.read. The resulting set of atomic actions for this permission is {Meeting.create, Meeting::start.read, Meeting::duration.read, Meeting::owner.read, Meeting::location.read, Meeting::participants.read} ,
where "::" is standard object-oriented notation, which is used here to reference the attributes and association ends of the entity Meeting. The action create of the entity Meeting remains in the set, whereas the action read is replaced by the corresponding actions for reading the attributes and the association ends of the entity Meeting. The mapping rules 1, 4, and 6 given in Table 4 are applied, which results in a set of six methods: the method create, the read-methods of the attributes start and duration, and the read-methods of the association ends owner, participants, and location. The XML code generated is as follows:

  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Local</method-intf>
    <method-name>create</method-name>
    <method-params/>
  </method>
  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Local</method-intf>
    <method-name>getStart</method-name>
    <method-params/>
  </method>
  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Local</method-intf>
    <method-name>getDuration</method-name>
    <method-params/>
  </method>
  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Local</method-intf>
    <method-name>getOwner</method-name>
    <method-params/>
  </method>
  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Local</method-intf>
    <method-name>getLocation</method-name>
    <method-params/>
  </method>
  <method>
    <ejb-name>Meeting</ejb-name>
    <method-intf>Local</method-intf>
    <method-name>getParticipants</method-name>
    <method-params/>
  </method>
Generating Assertions While the generation of an assertion for each OCL constraint is a simple matter, this task is complicated by the fact that a method may have multiple (alternative) permissions, associated with different constraints and roles, where the roles in turn may be associated with subroles. Below we describe how we account for this when generating assertions. First, given a method m, the atomic action a corresponding to the method is determined using Table 4. For example, the action corresponding to the EJB method Meeting::cancel is the action execute of the method cancel of the entity Meeting in the model. Then, using this action a, the set of permissions ActionPermissions(a) that affect the execution of the method m is determined as follows: a permission is included if it is assigned to a by the association ActionAssignment or one of the super-actions of a (under the reflexive closure of the relation defined by the association ActionHierarchy). Next, for each permission p in the resulting set ActionPermissions(a), the set PR(p) of roles assigned to p is determined, again taking into account the hierarchy on roles in the same way as in the previous section. Finally, based on this information, an assertion is generated of the form

  if (!( ⋁_{p ∈ ActionPermissions(a)} ( (⋁_{r ∈ PR(p)} UserRole(r)) ∧ Constraint(p) ) ))
      throw new AccessControlException("Access denied.");                         (3)
This scheme is similar to the definition of φAC(u, a) by Equation (2) in Section 4.3, as each permission represents an (alternative) authorization to execute an action. However, because the permission assignments and action assignments are known at compile time, this information is used to simplify the assertion. Instead of considering all permissions, we only consider permissions that refer to the action in question by calculating the set ActionPermissions(a). This has the effect that the equivalent of φAction(p, a) in Equation (2) can be omitted. Similarly, the equivalent of φUser is simplified by only considering roles that have one of these permissions, which is done by calculating the sets PR(p). If a constraint is assigned to a permission, it is evaluated afterwards. Access denial is signaled to the caller by throwing an exception. As an example, for the method Meeting::cancel, we generate the following assertion.

  if (!(ctxt.isCallerInRole("Supervisor")                      /* SupervisorCancel */
        || (ctxt.isCallerInRole("User") || ctxt.isCallerInRole("Supervisor"))
           && ctxt.getCallerPrincipal().getName().equals(getOwner().getName())))  /* OwnerMeeting */
      throw new AccessControlException("Access denied.");
Observe that the role assignment check UserRole(r) is translated into a Java expression of the form ctxt.isCallerInRole(...). The variable ctxt references an object that is used in EJB to communicate with the execution environment of a component. Here, the context object is used to check the role assignment of the current caller. Authorization constraints are translated to equivalent Java expressions. The symbol caller is translated to ctxt.getCallerPrincipal().getName(). Access to methods, attributes, and association ends respects the rules that are applied to generate the respective counterparts of these elements, given in Section 6.1. For example, access to the value of an attribute name is translated to a call of the corresponding read method getName. The OCL equality operator is translated to the Java method equals for objects or into Java’s equality operator for primitive types.
7.
ControllerUML
To demonstrate the general applicability of our approach, we now present a second design modeling language. This language, which we call ControllerUML, is based on state machines.⁶ We will show how ControllerUML can be integrated with SecureUML and used to model secure controllers for multitier applications, and how access control infrastructures can be generated from such controller models. A well-established pattern for developing multi-tier applications is the Model-View-Controller pattern [Krasner and Pope, 1988]. In this pattern, a controller is responsible for managing the control flow of the application and the data flow between the persistence tier (model) and the visualization tier (view). The behavior of the controller can be formalized by using event-driven state machines and the modeling language ControllerUML utilizes UML state machines for this purpose. The abstract syntax of ControllerUML is defined by the metamodel shown in Figure 10. Each Controller possesses a Statemachine that describes its behavior in terms of States, StateTransitions, Events, and StatemachineActions. A State may contain other states, formalized by the association StateHierarchy, and a transition between two states is defined by a StateTransition, which is triggered by the event referenced by the association end trigger. A state machine action specifies an executable statement that is performed on entities of the application model. ViewState and SubControllerState are subclasses of State. A ViewState is a state where the application interacts with humans by way of view elements like dialogs or input forms. The view elements generate events in response to user actions, e.g., clicking a mouse button, which are processed by the controller’s state machine. A SubControllerState references another controller using the association end controller. The referenced controller takes over the application’s control flow when the referencing SubControllerState is activated. This supports the modular specification of controllers.

Figure 10. Metamodel of ControllerUML

The notation of ControllerUML uses primitives from UML class diagrams and statecharts. An example of a ControllerUML model is shown in Figure 11. A Controller is represented by a UML class with the stereotype «Controller». The behavior of the controller is defined by a state machine that is associated with this class. States, transitions, events, and actions are represented by their counterparts in the UML metamodel. Transitions are labeled with a string, containing a triggering event and an action to be executed during state transition, separated by a slash. We use events to name transitions in our explanations. View states and subcontroller states are labeled by the stereotypes «ViewState» and «SubControllerState», respectively.

Figure 11. Controllers for the Scheduling Application

Figure 11 shows the design model for an interactive application that formalizes the scheduler workflow presented in Section 2.2. The controller class MainController is the top-level controller of the application and CreationController controls the creation of new meetings (details are omitted here to save space). The state machine of MainController is similar to that of Figure 2. Note that the selected meeting is stored in the attribute selectedMeeting of the controller object. Also, the reference from the subcontroller state CreateMeeting to the controller CreationController is not visible in the diagram. This information is stored in a tagged value of the subcontroller state.

Figure 12. Resource Model of ControllerUML
7.1
Extending the abstract syntax
There are various ways to introduce access control into a process-oriented modeling language like ControllerUML. For example, one can choose whether entering states or making transitions (or both) are protected. Each choice results in the definition of a different dialect for integrating ControllerUML with SecureUML. Here we shall proceed by focusing on the structural aspects of statecharts, which are described by the classes of the metamodel (Figure 10) and the relations between them. We identify the types Controller, State, and StatemachineAction as the resource types in our language since their execution or activation can be sensibly protected by checkpoints in the generated code. Figure 12 shows this identification and also defines the composite actions for the dialect and the assignment of actions to resource types. The resource type StatemachineAction offers the atomic action execute and a state has the actions activate and activateRecursive. The action activateRecursive on a state is composed of the actions activate on the state, execute on all state machine actions of the outgoing transitions of the state, and the actions activateRecursive on all substates of the state. The corresponding OCL definition is as follows:

  context StateActivateRecursive
  inv: subordinatedActions =
    resource.actions->select(name="activate")
    ->union(resource.outgoing->select(effect <> None)
            .effect.actions->select(name="execute"))
    ->union(resource.substates.actions->select(name="activateRecursive"))
This expression is built using the vocabulary defined by the ControllerUML metamodel shown in Figure 10 and the dialect definition given in Figure 12. The third line accesses the resource that the action belongs to (always a state) and selects the action with the name "activate". The next line queries all outgoing transitions on the state and selects those transitions with an assigned state machine action (association end effect). Afterwards, for each state machine action, its (SecureUML) action with the name "execute" is selected. The last line selects all actions with the name "activateRecursive" on all substates of the state to which the action of type StateActivateRecursive belongs. A controller possesses the actions activate and activateRecursive. The latter is a composite action that includes the action activate on the controller and the action activateRecursive for all of its states. Due to the definition of activateRecursive on states, this (transitively) includes all substates and all actions of the state machine.
7.2
Extending the notation
First, we merge the notation of both languages. Afterwards, we define well-formedness rules on SecureUML primitives that restrict which kinds of combined expressions are possible, i.e., we restrict how SecureUML primitives can refer to ControllerUML elements representing protected resources. For example, the scope of a permission is restricted to the UML classes with the stereotype «Controller». Finally, we define the action reference types for controllers, states, and state machine actions, as shown in Table 5.

  stereotype         resource type        naming convention
  ControllerAction   Controller           empty string
  StateAction        State                state name
  ActionAction       StatemachineAction   state name + "." + event name

Table 5. Action Reference Types for ControllerUML
7.3
Extending the semantics
We first define the semantics of ControllerUML in terms of a labeled transition system over a fixed first-order signature (cf. Section 4.3). Intuitively, every Controller defines a sort in the first-order signature and, in addition, we have a sort of states. Also, every atomic action defined in the SecureUML dialect as well as every state-transition in the ControllerUML model defines an action of the labeled transition system.
More precisely, given a model in the ControllerUML language, the corresponding signature ΣST = (SST, FST, PST) is defined as follows: Each Controller c gives rise to two sorts Cc and Sc in SST. Cc is the sort of the controller c, where the elements of sort Cc represent the instances of the controller c. Each user interacting with the system gives rise to such an instance. Sc is the sort of the states of the controller c, where each state of the state machine describing the behavior of the controller c gives rise to an element of sort Sc. Additionally, SST contains the sorts Users, String, Int, Real, and Boolean:

  SST = {Cc | c is a controller} ∪ {Sc | c is a controller} ∪ {Users, String, Int, Real, Boolean} .

Function symbols are defined similarly to ComponentUML. However, controllers in ControllerUML can only have attributes, but not methods. Therefore, each controller attribute at gives rise to a function symbol getat in FST (the “get-method”) of type s → v, where s is the sort of the controller, and v is the sort of the attribute’s type:

  FST = {getat | at is a controller attribute} ∪ {selfc | c is a controller} ∪ {caller, UserName} .

The initial and current states of a controller’s state machine are denoted by the implicit (in the sense that every controller will have them) controller attributes initialState and currentState of type Sc. The initial state of a controller denotes the state that is active when the state machine starts after the controller is created, and the current state denotes the currently active state. Whereas the attributes initialState and currentState are of type Sc, other controller attributes denote application-specific data attached to the controller and can have the types String, Int, Real, and Boolean. Additionally, it is possible to combine ControllerUML with a more data-oriented modeling language (like ComponentUML). Then one can use controller attributes with types provided by the data-modeling language. For example, in Figure 13 in the MainController, we refer to the entity Meeting of the ComponentUML model. Since there are no predicate symbols,

  PST = ∅ .

The transition system ∆ = (Q, A, δ) is defined as follows:
Q is the universe of all possible states, which is just the set of all first-order structures over the signature ΣST with finitely many elements for each controller sort as well as for the sort Users, where the interpretations of String, Int, Real, Boolean, and Sc are fixed to be the sets Strings, Z, R, {true, false}, and the set of states of the controller c respectively. The set of actions A is defined by:

  A = ControllerActivateActions ∪ StateActivateActions ∪ SMActionExecuteActions ∪ StateTransitions .

This means that all atomic actions (cf. Figure 12) as well as all state transitions are actions of the transition system. The transition relation δ ⊆ Q × A × Q defines the allowed transitions. For example, one requires that for each transition s1 --a--> s2 in the model there are corresponding tuples (sold, a, snew) in δ, where the current state of the controller (i.e., the attribute currentState) is s1 in sold and is s2 in snew. For the purposes of this paper, it does not matter which particular semantics is used, e.g., one of the many semantics for state-chart like languages ([von der Beeck, 1994] lists about 20 of them). Having defined the semantics of ControllerUML in this way, we combine it with the semantics of SecureUML as described in Section 4.3. That is, the new transition system ∆AC = (QAC, AAC, δAC) is defined by adding ΣRBAC-structures to the system states in Q, extending δ to QAC × AAC × QAC, and removing the forbidden transitions from the result. Hence, δAC will only contain those transitions that are allowed according to the SecureUML semantics.
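As an illustration only (this instance is not spelled out in the text), applying these rules to the MainController of Figure 11, whose attribute selectedMeeting has the entity type Meeting from the ComponentUML model, would contribute roughly the following symbols, among others:

  SST ⊇ {CMainController, SMainController, Users, String, Int, Real, Boolean}
  FST ⊇ {getselectedMeeting, getcurrentState, getinitialState, selfMainController, caller, UserName}

where getselectedMeeting has the result sort of the entity Meeting, and getcurrentState and getinitialState have the result sort SMainController.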
7.4
Formalizing the authorization policy
We now return to our scheduling application model and extend it with a formalization of the security policy given in Section 2.1. In doing so, we use the role model introduced in Section 4.2. As Figure 13 shows, we use two permissions to formalize the first requirement that all users are allowed to create and to read all meetings. The permission UserMain grants the role User the right to activate the controller MainController and the states ListMeetings and CreateMeeting. The permission UserCreation grants the role User the privilege to activate the CreationController including the right to activate all of its states and to execute all of its actions. The second requirement states that only the owner of a meeting entry is allowed to change or delete it. We formalize this by the permission OwnerMeeting, which grants the role User the right to execute the actions on the outgoing transitions delete and cancel of the state ListMeetings and the right to activate the state EditMeeting. This permission is restricted by the ownership constraint attached to it. Finally, only supervisors are allowed to cancel any meeting. Therefore, the permission SupervisorCancel grants this role the unrestricted right to execute the action cancelMeeting on the transition cancel.

Figure 13. Policy for Scheduling Application (the ownership constraint attached to OwnerMeeting is caller.name = self.selectedMeeting.owner.name)
7.5
Transformation to web applications
In this section, we describe a transformation function that constructs secure web applications from ControllerUML models. As a starting point, we assume the existence of a transformation function that translates UML classes and state machines into controller classes for web applications, which can be executed in a Java Servlet environment (see Section 2.5). We describe here how we extend such a function to generate security infrastructures from SecureUML models. The Java Servlet architecture supports RBAC; however, its URL-based authorization scheme only enforces access control when a request arrives from outside the web server. This is ill-suited for advanced web applications that are built from multiple servlets, with one acting as the central entry point to the application. This entry point servlet acts as a dispatcher in that it receives all requests and forwards them (depending on the application state) to the other servlets, which execute the business logic. The declarative authorization mechanism only provides protection for the dispatcher. To overcome this weakness, we generate access control infrastructures that exploit the programmatic access
control mechanism that servlets provide, where the role assignments of a user can be retrieved by any servlet. Our transformation function is an extension of an existing generator provided by the MDA-tool ArcStyler [Hubert, 2001], which converts UML classes and state machines into controller classes. Each controller is equipped with methods for activating the controller, performing state transitions, activating the states of the controller, and executing actions on transitions. We augment the existing transformation function by generation rules that operate on the abstract syntax of SecureUML and add Java assertions to the methods for process activation, state activation, and action execution of a controller class. First, the set ActionPermissions(a), which contains all permissions affecting the execution of an action, is determined as described in Section 6.2. Afterwards, an assertion is generated of the form:

  if (!( ⋁_{p ∈ ActionPermissions(a)} ( (⋁_{r ∈ PR(p)} UserRole(r)) ∧ Constraint(p) ) ))
      c.forward("/unauthorized.jsp");                                              (4)
The rule that generates this assertion has a structure similar to that of scheme (3) in Section 6.2, which is used to generate assertions in the stubs of EJB components. When access is denied, however, the request is now forwarded to an error page by the term c.forward("/unauthorized.jsp"), instead of throwing an exception. Additionally, the functions used to obtain security information differ between EJB and Java Servlets. For example, the following assertion is generated for the execution of the action cancel on the state ListMeetings.

  if (!(request.isUserInRole("Supervisor")                     /* SupervisorCancel */
        || (request.isUserInRole("User") || request.isUserInRole("Supervisor"))
           && getSelectedMeeting().getOwner().getName().equals(request.getRemoteUser())))
      c.forward("/unauthorized.jsp");
The role check is performed using the method isUserInRole() on the request object and each constraint is translated into a Java expression that accesses the attributes and side-effect free methods of the controller. The symbol caller is translated into a call to getRemoteUser() on the request object.
8.
Conclusion
8.1
Evaluation
We have evaluated the ideas presented in this paper in an extensive case study: the model-driven development of the J2EE “Pet Store” application. Pet Store is a prototypical e-commerce application designed to demonstrate the use of the J2EE platform. It features web front-ends for shopping, administration, and order processing. The application model consists of 30 components
and several front-end controllers. We have extended this model with an access control policy formalizing the principle of least privileges, where a user is given only those access rights that are necessary to perform a job. The modeled policy comprises six roles and 60 permissions, 15 of which are restricted by authorization constraints. The corresponding infrastructure is generated automatically and consists of roughly 5,000 lines of XML (overall application: 13,000) and 2,000 lines of Java source code (overall application: 20,000). This large expansion is due to the high abstraction level provided by the modeling language. For example, we can grant a role read access to an entity, whereas EJB only supports permissions for whole components or single methods. Therefore, a modeled permission to read the state of a component may require the generation of many method permissions, e.g., for the get-methods of all attributes. Clearly, this amount of information cannot be managed at the source code level. The low abstraction level provided by the access control mechanisms of today’s middleware platforms often forces developers to take shortcuts and make compromises when implementing access control. For example, roles are assigned full access privileges even where they only require read access. As our experience shows, Model Driven Security can not only help to ease the transition from security requirements to secure applications, it also plays an important role in helping system designers to formalize and meet exact application requirements.
8.2
Related work
Various extensions to the core RBAC model have been presented in the literature, e.g., [Jaeger, 1999; Ahn and Sandhu, 1999; Ahn and Sandhu, 2000; Ahn and Shin, 2001]. These use constraints on role assignments to express different kinds of high-level organizational policies, like separation of duty. In contrast, SecureUML extends RBAC with constraints on system states associated with a design model. Jürjens [Jürjens, 2001; Jürjens, 2002] proposed an approach to developing secure systems using an extension of UML called UMLsec. Using UMLsec, one can annotate UML models with formally specified security requirements, like confidentiality or secure information flow. In contrast, our work focuses on a semantic basis for annotating UML models given by class or statechart diagrams with access control policies, where the semantics provides a foundation for generating implementations and for analyzing these policies. Probably the most closely related work is the Ponder Specification Language [Damianou, 2002], which supports the formalization of authorization policies where rules specify which actions each subject can perform on given targets. As in our work, Ponder supports the organization of privileges in an RBAC-like way and allows rules to be restricted by conditions expressed in
396 a subset of OCL. Moreover, Ponder policies can be directly interpreted and enforced by a policy management platform. There are, however, important differences. The possible actions on targets are defined in Ponder by the target’s visible interface methods. Hence, the granularity of access control in Ponder is at the level of methods, whereas in our approach higher-level actions can be defined using action hierarchies. Moreover, Ponder’s authorization rules refer to a hierarchy of domains in which the subjects and targets of an application are stored. In contrast, our approach integrates the security modeling language with the design modeling language, providing a joint vocabulary for building combined models. In our view, the overall security of systems benefits by building such security design models, which tightly integrate security policies with system design models during system design, and using these as a basis for subsequent development.
8.3
Future work
There are a number of promising directions for future work. To begin with, the languages we have presented constitute representative examples of security and design modeling languages. There are many questions remaining on how to design such languages and how to specialize them for particular modeling domains. On the security modeling side, one could enrich SecureUML with primitives for modeling other security aspects, like digital signatures or auditing. On the design modeling side, one could explore other design modeling languages that support modeling different views of systems at different levels of abstraction. What is attractive here is that our use of dialects to join languages provides a way of decomposing language design so that these problems can be tackled independently. We believe that Model Driven Security has an important role to play not only in the design of systems but also in their analysis and certification. Our semantics provides basis for formally verifying the transformation of models to code. Moreover, since our models are formal, we can ask questions about them and get well-defined answers, as the examples given in Section 5.5 suggest. More complex kinds of analysis should be possible too, which we will investigate in future work. Ideas here include calculating a symbolic description of those system states where an action is allowed, model checking statechart diagrams that combine dynamic behavior specifications with security policies, and verifying refinement or consistency relationships between different models.
Notes 1. SecureUML supports users, groups, and their role assignment. This can be used, e.g., to analyze the security-related behavior of an application. In general, user administration will not be performed using UML models, but rather using administration tools provided by the target platform at deployment time. Note too that the reader should not confuse the «Role» User with the SecureUML type «User».
2. For an overview of order-sorted signatures and algebras, see [?].
3. Recall that authorization constraints are OCL formulas. A translation from OCL constraints to first-order formulas can be found in [?].
4. Note that we are here combining a many-sorted signature and an order-sorted signature. This is sensible because a many-sorted signature is trivially order-sorted.
5. We denote actions by the name of their resource and the name of the action type, separated by a dot.
6. To keep the account self-contained, we simplify state machines by omitting parallelism, actions on state entry and exit, and details on visualization elements.
References Ahn, G.-J. and Sandhu, R. S. (1999). The RSL99 language for role-based separation of duty constraints. In Proceedings of the 4th ACM Workshop on Role-based Access Control, pages 43–54. ACM Press. Ahn, G.-J. and Sandhu, R. S. (2000). Role-based authorization constraints specification. ACM Transactions on Information and System Security, 3(4):207–226. Ahn, G.-J. and Shin, M. E. (2001). Role-based authorization constraints specification using object constraint language. In 10th IEEE International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE 2001), pages 157–162. IEEE Computer Society. Akehurst, D. and Kent, S. (2002). A relational approach to defining transformations in a metamodel. In UML 2002 — The Unified Modeling Language. Model Engineering, Languages, Concepts, and Tools. 5th International Conference, Dresden, Germany, September/October 2002, Proceedings, volume 2460 of LNCS, pages 243–258. Springer Verlag. Damianou, N. (2002). A Policy Framework for Management of Distributed Systems. PhD thesis, Imperial College, University of London. Ferraiolo, D. F., Sandhu, R., Gavrila, S., Kuhn, D. R., and Chandramouli, R. (2001). Proposed NIST standard for role-based access control. ACM Transactions on Information and System Security (TISSEC), 4(3):224–274. Frankel, D. S. (2003). Model Driven ArchitectureTM : Applying MDATM to Enterprise Computing. John Wiley & Sons. Hubert, R. (2001). Convergent Architecture: Building Model Driven J2EE Systems with UML. John Wiley & Sons. Hunter, J. (2001). Java Servlet Programming, 2nd Edition. O’Reilly & Associates. Jaeger, T. (1999). On the increasing importance of constraints. In Proceedings of 4th ACM Workshop on Role-based Access Control, pages 33–42. ACM Press. Jürjens, J. (2001). Towards development of secure systems using UMLsec. In Hussmann, H., editor, Fundamental Approaches to Software Engineering (FASE/ETAPS 2001), number 2029 in LNCS, pages 187–200. Springer-Verlag. Jürjens, J. (2002). UMLsec: Extending UML for secure systems development. In Jézéquel, J.M., Hussmann, H., and Cook, S., editors, UML 2002 — The Unified Modeling Language, volume 2460 of LNCS, pages 412–425. Springer-Verlag. Kiczales, G., Lamping, J., Menhdhekar, A., Maeda, C., Lopes, C., Loingtier, J.-M., and Irwin, J. (1997). Aspect-oriented programming. In Ak¸s¸it, M. and Matsuoka, S., editors, Proceedings European Conference on Object-Oriented Programming, volume 1241, pages 220–242. Springer-Verlag. Krasner, G. E. and Pope, S. T. (1988). A cookbook for using the model-view controller user interface paradigm in smalltalk-80. Journal of Object Oriented Program., 1(3):26–49.
Monson-Haefel, R. (2001). Enterprise JavaBeans (3rd Edition). O’Reilly & Associates.
Rumbaugh, J., Jacobson, I., and Booch, G. (1998). The Unified Modeling Language Reference Manual. Addison-Wesley.
von der Beeck, M. (1994). A comparison of statechart variants. In Langmaack, H., de Roever, W.-P., and Vytopil, J., editors, Formal Techniques in Real-Time and Fault-Tolerant Systems, volume 863 of LNCS, pages 128–148. Springer-Verlag.
SOME CHALLENGES FOR SYSTEM DEVELOPMENT: REACTIVE ANIMATION, SMART PLAY-OUT AND OLFACTION

David Harel
The Weizmann Institute of Science, Rehovot, Israel 76100
[email protected]

Abstract
This series of three lectures at the 2004 Marktoberdorf Summer School revolved around three topics that are peripheral to the classical notion of system development, but are relevant to it. It was not a Marktoberdorf “course” in the usual sense of the word, and hence detailed definitions and mathematical statements were not given. Rather, the topics were introduced and motivated, the main results described and illustrated, and ideas for further work sketched. This short report contains brief summaries of these topics, followed by some pointers to published papers.

1. Reactive Animation
We present a method, called reactive animation (RA), to enrich models of reactive systems with an animated, interactive and intuitive front-end [6]. The method harnesses the available strength of state-of-the-art tools and languages for reactive system design (e.g., Statemate, Rhapsody, Rose-RT and the Play-Engine) and tools for animation (e.g., Flash, Director and Maya), and builds a link between the two kinds of tools. Thus, one can connect the sophisticated specification of a reactive system with the attractive, intuitive interfaces of today’s high-end animation tools, gaining the best of both worlds. The idea originated in needs arising during our efforts to simulate biological systems, and we have indeed used RA to describe the dynamics of T cell development in the thymus gland; see [4]. However, RA appears to be broadly applicable to the many kinds of complex systems whose front-end representation is dynamic and unpredictable, and requires more than standard GUI technology.

Reactive animation is not a new way to animate algorithms; it does not attempt to introduce a method for translating algorithmic complexity into abstract, arbitrary animation. Instead, it uses dynamic user interfaces and animates or dynamically redesigns them, so as to realistically represent the system and its behavior in operation. Furthermore, RA is not a methodology for using notions from reactive systems and visual languages to facilitate handling reactivity inside the animation tools themselves. Rather, RA is about linking the two efforts — reactive technology and animation technology — by bridging the power of tools for animation and the power of tools for reactive systems design and implementation.

Technically, RA is based on the observation that a system may be viewed as a combination of what it does and what it looks like. This leads to two separate but closely linked design paths — reactive behavior design and front-end design. Initially the two are non-overlapping, but we later connect them. Designers may start by preparing a visual description of the system and building the visual interface using their preferred animation tool. When this external view requires further dynamic enrichment by the sophistication of a reactive design, it may be connected to a reactive design tool. Alternatively, the design may start by constructing all or part of a reactive system model, which is later enriched by incorporating a high-quality animated front-end.

Schematically, to build an RA implementation, the system’s appearance is developed using the animation tool, and it includes the required animation components and the scripting instructions on how they combine at run time. Specification is then carried out using reactive system development tools that deal with the system’s architecture and run-time behavior. Through specification, the designer reaches a full running model of the system. Once the animation components and the reactive behavior are operational, they are linked through a specific communication channel. When this connection is fully established, the reactive model can be run so that it continuously sends the information needed by the animation, while at the same time attending to information coming from the animation. The user can then view and interact with the running model by way of the front-end animation, by way of the diagrammatic animation of the visual language used for reactivity, or by both.

We have placed several pre-recorded video clips of the T-cell reactive animation example in operation, including segments of the animation and the statecharts at work, and some examples of typical interaction with them. These can be viewed at: http://www.wisdom.weizmann.ac.il/∼dharel/ReactiveAnimation/
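To make the linking idea concrete, the following is a minimal, purely illustrative Python sketch of the kind of communication channel RA establishes between a reactive model and an animation front-end. It is not the Play-Engine/Flash implementation; the class name AnimationLink, the message format, and the toy model step are all assumptions made for this example only.

    # A minimal, illustrative sketch (not the Play-Engine/Flash implementation) of the
    # kind of link RA establishes: the reactive model and the animation front-end
    # exchange messages over a shared channel. All names and formats are hypothetical.
    import json
    import queue

    class AnimationLink:
        """Bidirectional channel between a reactive model and an animation front-end."""
        def __init__(self):
            self.to_animation = queue.Queue()    # updates the model pushes to the front-end
            self.from_animation = queue.Queue()  # user interactions flowing back to the model

        def send_to_animation(self, obj_id, update):
            self.to_animation.put(json.dumps({"object": obj_id, "update": update}))

        def poll_user_event(self):
            try:
                return json.loads(self.from_animation.get_nowait())
            except queue.Empty:
                return None

    def reactive_model_step(link, state):
        """One step of a toy reactive model: react to user input, then refresh the view."""
        event = link.poll_user_event()
        if event and event.get("action") == "select_cell":
            state["selected"] = event["object"]
        # continuously send the information the animation needs to redraw itself
        link.send_to_animation("cell_42", {"position": state["pos"],
                                           "highlight": state.get("selected") == "cell_42"})

    if __name__ == "__main__":
        link = AnimationLink()
        state = {"pos": (10, 20)}
        # the front-end (simulated here) reports a user interaction with an animated object
        link.from_animation.put(json.dumps({"action": "select_cell", "object": "cell_42"}))
        reactive_model_step(link, state)
        print(link.to_animation.get())  # the update message the animation tool would render

In an actual RA setup the two sides run in separate tools and processes, with the channel realized by, e.g., sockets or a middleware layer; the essential point is only the continuous two-way flow of information described above.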
2. Smart Play-Out
This work describes a method, which we have termed smart play-out [7], that utilizes verification techniques and tools, not to prove properties of a program, but to help run that program. It is applicable to many kinds of programming approaches that are declarative and nondeterministic in nature, and/or are
based on constraints or rules. Smart play-out was developed in the framework of scenario-based programming using LSCs [3], and has been implemented on the Play-Engine. It is described in [7] and in Chapter 18 of [9].

The idea of smart play-out is to formulate the play-out task as a verification problem, and to use a model-checking algorithm to find a “good” super-step (i.e., a chain reaction of system events that constitutes the reaction to an external event), if one exists. Thus, we use verification techniques to help run a program, rather than to prove properties thereof. The model-checking procedure is handed as input a transition system that is constructed from the universal charts in the LSC specification. (These are the charts that drive the execution in the naive play-out process too.) The transition relation is designed to allow progress of the active universal charts, but to prevent violations. The system is initialized to reflect the status of the execution just after the last external event occurred, including the current values of object properties, information on the universal charts that were activated as a result of the most recent external events, and the progress in all precharts. The model-checker is then given a property claiming that it is always the case that at least one of the universal charts is active. This is really the negation of what we want, since in order to falsify the property, the model-checker searches for a run in which eventually none of the universal charts is active. That is, all active universal charts complete successfully, so that by the definition of the transition relation no violations occurred in the process. Such a counter-example is the desired super-step. If the model-checker is able to verify the property, then no correct super-step exists; but if it is not able to, the counter-example is exactly what we seek. For more details see [7].

Smart play-out can also be instructed to try to satisfy an existential chart, which can be used to specify system tests. It automatically finds a trace (if there is one) that satisfies the existential chart without violating any universal charts in the process. This can be useful in understanding the possible behavior of a system and also in detecting problems, e.g., by asking whether there is some way for a certain scenario, which we believe cannot be realized by the system, to be satisfied. If smart play-out manages to satisfy the chart, it will execute the trace, thus providing evidence for the cause of the problem.

Since the appearance of [7], in which we reported on smart play-out as applied to a basic kernel version of LSCs (more or less the one appearing in [3]), we have gained experience in applying the method to several applications and case studies. These include a computerized system — a machine for manufacturing smart-cards — as well as a biological system — parts of the vulval development process of the C. elegans nematode worm [10]. We have also been working on extending smart play-out to cover a larger set of the LSC language features and to deal more efficiently with larger models. Specifically, in [8] we show how smart play-out extends to cover two key features of the rich version of LSCs described in [9], namely time and forbidden elements. The former is crucial for systems with time constraints and/or time-driven behavior, and the latter allows specifying invariants and contracts on behavior. Forbidden elements can also help reduce the state space that has to be considered by the model-checking procedure, thus enabling smart play-out to handle larger models.
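As a purely illustrative aside, the following small Python sketch mimics the core of the formulation above on a toy encoding: each active universal chart is reduced to a sequence of pending events, the “transition system” is explored by explicit search, and a run after which no universal chart remains active is returned as the super-step. The encoding, the violation rule, and the synchronization on shared events are simplifications invented for this example; they do not reproduce the actual translation used with the Play-Engine and its model checker.

    # Toy illustration (not the actual Play-Engine encoding) of the smart play-out idea:
    # search the transition system for a run after which no universal chart is active;
    # such a run, found as a "counterexample" to "always some chart is active", is a super-step.
    from collections import deque

    def successors(state):
        """Advance any one chart by its next pending event, dropping completed charts."""
        charts = list(state)
        for i, chart in enumerate(charts):
            event = chart[0]
            # taking the event now must not violate the ordering of any other chart
            if any(event in other and other[0] != event
                   for j, other in enumerate(charts) if j != i):
                continue
            new = []
            for j, other in enumerate(charts):
                # charts whose next event is the same event advance together (shared event)
                rest = other[1:] if (j == i or other[0] == event) else other
                if rest:
                    new.append(rest)
            yield event, frozenset(new)

    def find_super_step(initial_charts):
        """BFS for a chain of events after which no universal chart remains active."""
        start = frozenset(tuple(c) for c in initial_charts)
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, trace = frontier.popleft()
            if not state:            # "eventually no chart is active" -- the counterexample
                return trace         # this event chain is the desired super-step
            for event, nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, trace + [event]))
        return None                  # the property holds: no correct super-step exists

    # Two universal charts activated by the last external event, sharing event "b":
    print(find_super_step([["a", "b"], ["b", "c"]]))   # -> ['a', 'b', 'c']

In the real setting the search is performed symbolically by a model checker on the negated property, and the counter-example it returns plays the role of the trace computed here.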
3. Olfaction
This topic concerns a setup for an odor communication and synthesis system. Its different parts are described, and ways to realize them are outlined. Our approach enables an output device — the whiffer — to release an imitation of an odor read in by an input device — the sniffer — upon command. This is in complete analogy with the process of audio and video reproduction. The heart of the system is a novel mathematical/algorithmic scheme that makes the setup feasible; see [5]. We discuss in some detail our work in researching and developing some of the components that constitute the scheme, many of which have to do with the analysis of eNose space, including odor identification, mixing of odors, and mappings between eNose spaces; see, e.g., [1, 2, 11]. Future work involves a detailed investigation and analysis of human panel perception space.

What is so difficult about odor communication? Probably a combination of technological barriers and limited understanding of the relevant biology and psychophysics. Some of the major problems seem to be the following:

The underlying physics is complex. Vision and audition also involve complex physical phenomena, but photons and sound waves are well-defined physical objects that follow well-known equations of a simple basic nature. Specifically, in both cases sensory quality is related to well-known physics. On the other hand, the smell of an odorant is determined by the complex, and only partially understood, interactions between the ligand molecule and the olfactory receptor (OR) molecule.

The biological detection system is high-dimensional. The nose contains hundreds of different types of ORs, each of them interacting in different ways with different kinds of odorants. Thus, the dimensionality of the sense of smell is at least two orders of magnitude larger than that of vision, which can make do with only three types of color receptors.

Odor delivery technology is immature. While artificial generation of desired visual and auditory stimuli is done at high speed and with high quality, smells cannot be easily reproduced. Nowadays, the best that can be done is to interactively release extracts that were prepared in advance.
Electronic noses (eNoses) play a fundamental role in the scheme of [5] for odor communication. At the basis of the scheme lies the concept of an odor space, the collection of all possible response patterns of a nose, be it the human nose or an eNose. The odor communication system requires the realization of an algorithm — the mix-to-mimic (MTM) algorithm — which is as yet not fully realizable. One of its core components is an algorithmic means for mapping from the space of an eNose into the space of human perception, termed the psychophysical space. While eNose responses can be measured directly by applying chemical samples to them, the way to obtain measurements from the psychophysical space is to conduct large-scale human panel experiments. Due to the difficulty of carrying out these experiments and of developing MTM directly, the suggestion in [5] was to construct three sub-algorithms, each adding a further complication. Upon the completion of the third, the full MTM will be available.

As one example of the more specific topics we have worked on towards making the scheme of [5] feasible, consider odor space mapping. The second of the sub-algorithms mentioned above calls for the construction of a mapping between two different eNoses. In [11], we describe how we were able to construct a mapping between two eNoses that employ two very different sensor technologies, quartz microbalance and conducting polymers. The technology differences are important, as, in a way, they represent the differences between eNoses and the human nose.

By the way, mappings between eNoses are also important for purposes other than odor communication. Using such mappings, we would be able to integrate response patterns of different eNoses into a unified database. This would allow one to combine data from different types of eNoses, to maintain continuity when replacing sensor modules, and to overcome drift effects due to sensor aging.
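To give a flavor of what such a mapping involves computationally, here is a minimal sketch that treats eNose-to-eNose mapping as a regression problem: a linear map from the response space of one hypothetical eNose to that of another is fitted from paired measurements of the same odorants. The sensor counts, the synthetic data, and the choice of plain least squares are assumptions made for illustration only; the actual algorithm of [11] is not reproduced here.

    # Minimal sketch of eNose-to-eNose mapping as a regression problem; the actual
    # algorithm of [11] is not reproduced here, and all data below are synthetic.
    import numpy as np

    rng = np.random.default_rng(0)

    # Paired measurements of the same odorants on two hypothetical eNoses:
    # nose A has 8 sensors, nose B has 12 (different sensor technologies).
    n_samples, dim_a, dim_b = 200, 8, 12
    responses_a = rng.normal(size=(n_samples, dim_a))
    true_map = rng.normal(size=(dim_a, dim_b))
    responses_b = responses_a @ true_map + 0.05 * rng.normal(size=(n_samples, dim_b))

    # Fit a linear map A -> B by least squares (with an intercept column).
    X = np.hstack([responses_a, np.ones((n_samples, 1))])
    coeffs, *_ = np.linalg.lstsq(X, responses_b, rcond=None)

    def map_a_to_b(sample_a):
        """Predict the nose-B response pattern for a new nose-A measurement."""
        return np.append(sample_a, 1.0) @ coeffs

    new_odorant = rng.normal(size=dim_a)
    print(map_a_to_b(new_odorant).shape)   # (12,): a predicted response in B's odor space

The fitted map can then be applied to new nose-A measurements, which is roughly the kind of operation the second sub-algorithm of the MTM scheme calls for; in practice the relation between two sensor technologies need not be linear, and a richer model may be required.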
References

[1] L. Carmel, N. Sever, D. Lancet and D. Harel, “An E-Nose Algorithm for Identifying Chemicals and Determining their Concentration”, Sensors and Actuators B: Chemical 93 (2003), 76–82.
[2] L. Carmel, N. Sever and D. Harel, “On Predicting Response to Mixtures in QMB Sensors”, Sensors and Actuators B: Chemical, in press, 2004. (Also, Proc. 10th Int. Symp. on Olfaction and Electronic Nose (ISOEN’03), 2003, pp. 160–163.)
[3] W. Damm and D. Harel, “LSCs: Breathing Life into Message Sequence Charts”, Formal Methods in System Design 19:1 (2001), 45–80. (Preliminary version in Proc. 3rd IFIP Int. Conf. on Formal Methods for Open Object-Based Distributed Systems (P. Ciancarini, A. Fantechi and R. Gorrieri, eds.), Kluwer Academic Publishers, 1999, pp. 293–312.)
[4] S. Efroni, D. Harel and I.R. Cohen, “Towards Rigorous Comprehension of Biological Complexity: Modeling, Execution and Visualization of Thymic T Cell Maturation”, Genome Research 13 (2003), 2485–2484.
[5] D. Harel, L. Carmel and D. Lancet, “Towards an Odor Communication System”, Computational Biology and Chemistry 27 (2003), 121–133.
[6] D. Harel, S. Efroni and I.R. Cohen, “Reactive Animation”, Proc. 1st Int. Symposium on Formal Methods for Components and Objects (FMCO 2002), Lecture Notes in Computer Science, Vol. 2852, Springer-Verlag, 2003, pp. 136–153.
[7] D. Harel, H. Kugler, R. Marelly and A. Pnueli, “Smart Play-Out of Behavioral Requirements”, Proc. 4th Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD 2002), November 2002, pp. 378–398.
[8] D. Harel, H. Kugler and A. Pnueli, “Smart Play-Out Extended: Time and Forbidden Elements”, Proc. 4th Int. Conf. on Quality Software (QSIC’04), IEEE Computer Society Press, 2004, pp. 2–10.
[9] D. Harel and R. Marelly, Come, Let’s Play: Scenario-Based Programming Using LSCs and the Play-Engine, Springer-Verlag, 2003.
[10] N. Kam, D. Harel, H. Kugler, R. Marelly, A. Pnueli, E. Hubbard and M. Stern, “Formal Modeling of C. elegans Development: A Scenario-Based Approach”, Proc. Int. Workshop on Computational Methods in Systems Biology (CMSB 2003), Lecture Notes in Computer Science, Vol. 2602, Springer-Verlag, 2003, pp. 4–20. (Revised version in Modeling in Molecular Biology (G. Ciobanu and G. Rozenberg, eds.), Springer, Berlin, 2004, pp. 151–173.)
[11] O. Shaham, L. Carmel and D. Harel, “Mapping Between Electronic Noses”, Sensors and Actuators B: Chemical, in press, 2004. (Also, Proc. 10th Int. Symp. on Olfaction and Electronic Nose (ISOEN’03), 2003, pp. 92–95.)