Lecture Notes in Computer Science
6602
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison, UK Josef Kittler, UK Alfred Kobsa, USA John C. Mitchell, USA Oscar Nierstrasz, Switzerland Bernhard Steffen, Germany Demetri Terzopoulos, USA Gerhard Weikum, Germany
Takeo Kanade, USA Jon M. Kleinberg, USA Friedemann Mattern, Switzerland Moni Naor, Israel C. Pandu Rangan, India Madhu Sudan, USA Doug Tygar, USA
Advanced Research in Computing and Software Science
Subline of Lecture Notes in Computer Science

Subline Series Editors
Giorgio Ausiello, University of Rome ‘La Sapienza’, Italy
Vladimiro Sassone, University of Southampton, UK
Subline Advisory Board Susanne Albers, University of Freiburg, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Madhu Sudan, Microsoft Research, Cambridge, MA, USA Deng Xiaotie, City University of Hong Kong Jeannette M. Wing, Carnegie Mellon University, Pittsburgh, PA, USA
Gilles Barthe (Ed.)
Programming Languages and Systems 20th European Symposium on Programming, ESOP 2011 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2011 Saarbrücken, Germany, March 26–April 3, 2011 Proceedings
Volume Editor Gilles Barthe IMDEA Software Facultad de Informatica (UPM) Campus Montegancedo, 28660 Boadilla del Monte, Madrid, Spain E-mail:
[email protected] ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-19717-8 e-ISBN 978-3-642-19718-5 DOI 10.1007/978-3-642-19718-5 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011922331 CR Subject Classification (1998): D.2, F.3, C.2, D.3, H.4, D.1 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
ETAPS 2011 was the 14th instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference that was established in 1998 by combining a number of existing and new conferences. This year it comprised the usual five sister conferences (CC, ESOP, FASE, FOSSACS, TACAS), 16 satellite workshops (ACCAT, BYTECODE, COCV, DICE, FESCA, GaLoP, GT-VMT, HAS, IWIGP, LDTA, PLACES, QAPL, ROCKS, SVARM, TERMGRAPH, and WGT), one associated event (TOSCA), and seven invited lectures (excluding those specific to the satellite events).

The five main conferences received 463 submissions this year (including 26 tool demonstration papers), 130 of which were accepted (2 tool demos), giving an overall acceptance rate of 28%. Congratulations therefore to all the authors who made it to the final programme! I hope that most of the other authors will still have found a way of participating in this exciting event, and that you will all continue submitting to ETAPS and contributing to making it the best conference on software science and engineering.

The events that comprise ETAPS address various aspects of the system development process, including specification, design, implementation, analysis and improvement. The languages, methodologies and tools which support these activities are all well within its scope. Different blends of theory and practice are represented, with an inclination towards theory with a practical motivation on the one hand and soundly based practice on the other. Many of the issues involved in software design apply to systems in general, including hardware systems, and the emphasis on software is not intended to be exclusive.

ETAPS is a confederation in which each event retains its own identity, with a separate Programme Committee and proceedings. Its format is open-ended, allowing it to grow and evolve as time goes by.
Contributed talks and system demonstrations are in synchronised parallel sessions, with invited lectures in plenary sessions. Two of the invited lectures are reserved for ‘unifying’ talks on topics of interest to the whole range of ETAPS attendees. The aim of cramming all this activity into a single one-week meeting is to create a strong magnet for academic and industrial researchers working on topics within its scope, giving them the opportunity to learn about research in related areas, and thereby to foster new and existing links between work in areas that were formerly addressed in separate meetings.

ETAPS 2011 was organised by the Universität des Saarlandes in cooperation with:
European Association for Theoretical Computer Science (EATCS)
European Association for Programming Languages and Systems (EAPLS)
European Association of Software Science and Technology (EASST)
It also had support from the following sponsors, which we gratefully thank: DFG Deutsche Forschungsgemeinschaft; AbsInt Angewandte Informatik GmbH; Microsoft Research; Robert Bosch GmbH; IDS Scheer AG / Software AG; T-Systems Enterprise Services GmbH; IBM Research; gwSaar Gesellschaft für Wirtschaftsförderung Saar mbH; Springer-Verlag GmbH; and Elsevier B.V.

The organising team comprised:
General Chair: Reinhard Wilhelm
Organising Committee: Bernd Finkbeiner, Holger Hermanns (chair), Reinhard Wilhelm, Stefanie Haupert-Betz, Christa Schäfer
Satellite Events: Bernd Finkbeiner
Website: Hernán Baró Graf

Overall planning for ETAPS conferences is the responsibility of its Steering Committee, whose current membership is: Vladimiro Sassone (Southampton, Chair), Parosh Abdulla (Uppsala), Gilles Barthe (IMDEA-Software), Lars Birkedal (Copenhagen), Michael O’Boyle (Edinburgh), Giuseppe Castagna (CNRS Paris), Marsha Chechik (Toronto), Sophia Drossopoulou (Imperial College London), Bernd Finkbeiner (Saarbrücken), Cormac Flanagan (Santa Cruz), Dimitra Giannakopoulou (CMU/NASA Ames), Andrew D. Gordon (MSR Cambridge), Rajiv Gupta (UC Riverside), Chris Hankin (Imperial College London), Holger Hermanns (Saarbrücken), Mike Hinchey (Lero, the Irish Software Engineering Research Centre), Martin Hofmann (LMU Munich), Joost-Pieter Katoen (Aachen), Paul Klint (Amsterdam), Jens Knoop (Vienna), Barbara König (Duisburg), Shriram Krishnamurthi (Brown), Juan de Lara (Madrid), Kim Larsen (Aalborg), Rustan Leino (MSR Redmond), Gerald Luettgen (Bamberg), Rupak Majumdar (Los Angeles), Tiziana Margaria (Potsdam), Ugo Montanari (Pisa), Luke Ong (Oxford), Fernando Orejas (Barcelona), Catuscia Palamidessi (INRIA Paris), George Papadopoulos (Cyprus), David Rosenblum (UCL), Don Sannella (Edinburgh), João Saraiva (Minho), Helmut Seidl (TU Munich), Tarmo Uustalu (Tallinn), and Andrea Zisman (London).
I would like to express my sincere gratitude to all of these people and organisations, the Programme Committee Chairs and members of the ETAPS conferences, the organisers of the satellite events, the speakers themselves, the many reviewers, all the participants, and Springer for agreeing to publish the ETAPS proceedings in the ARCoSS subline. Finally, I would like to thank the Organising Chair of ETAPS 2011, Holger Hermanns, and his Organising Committee, for arranging for us to have ETAPS in the most beautiful surroundings of Saarbrücken.

January 2011
Vladimiro Sassone ETAPS SC Chair
Preface
This volume contains the papers presented at ESOP 2011, the 20th European Symposium on Programming, held March 30–April 1, 2011, in Saarbrücken, Germany. ESOP is an annual conference devoted to fundamental issues in the specification, design, analysis, and implementation of programming languages and systems. The Programme Committee (PC) invited papers on all aspects of programming language research, including: programming paradigms and styles, methods and tools to write and specify programs and languages, methods and tools for reasoning about programs, methods and tools for implementation, and concurrency and distribution.

Following previous editions, we maintained the page limit of 20 pages, and a rebuttal process of 72 hours during which the authors could respond to the reviews of their submission. This year, PC submissions were not allowed. We received 117 abstracts and, in the end, 93 full submissions; one submission was withdrawn. The remaining 92 submissions each received between 3 and 6 reviews (4 on average); eventually the PC selected 24 papers for publication. These proceedings consist of Andrew Appel’s invited paper and of the 24 selected papers.

I would like to thank the PC and the subreviewers for their dedicated work in the paper selection process, and all authors who submitted their work to the conference. I would also like to thank the 2011 Organizing Committee, chaired by Holger Hermanns, and the Steering Committee, chaired by Vladimiro Sassone, for coordinating the organization of ETAPS 2011. Finally, I would like to thank Andrei Voronkov, whose EasyChair system proved (once more) invaluable throughout the whole process.

January 2011
Gilles Barthe
Conference Organization
Programme Chair Gilles Barthe
Programme Committee Nick Benton Radhia Cousot Jean Goubault-Larrecq Radha Jagadeesan Viktor Kuncak Sorin Lerner Arnd Poetzsch-Heffter Shaz Qadeer Andrei Sabelfeld Tachio Terauchi Jan Vitek Stephanie Weirich
Cristiano Calcagno Sophia Drossopoulou Nicolas Halbwachs Gerwin Klein Julia Lawall Frank Piessens Francois Pottier Andrey Rybalchenko Peter Sewell Vasco T. Vasconcelos David Walker Kwangkeun Yi
External Reviewers Ahmed, Amal Amtoft, Torben Andronick, June Balakrishnan, Gogul Banerjee, Anindya Berdine, Josh Bertrane, Julien Bierman, Gavin Bouajjani, Ahmed Boyton, Andrew Casinghino, Chris Chadha, Rohit Chatzikokolakis, Konstantinos Chen, Liqian Choi, Wontai Chugh, Ravi Comon-Lundh, Hubert Corin, Ricardo Cotton, Scott Crespo, Juan Manuel
Alglave, Jade Ancona, Davide Askarov, Aslan Balland, Emilie Barnett, Michael Beringer, Lennart Besson, Frédéric Birgisson, Arnar Boulmé, Sylvain Carbone, Marco Castagna, Giuseppe Chakravarty, Manuel Chen, Juan Chlipala, Adam Chong, Stephen Cirstea, Horatiu Compagnoni, Adriana Cortier, Véronique Cremers, Cas D’Silva, Vijay
Daubignard, Marion Devriese, Dominique Distefano, Dino Dumas, Marlon Feller, Christoph Ferrara, Pietro Fournet, Cédric Gaillourdet, Jean-Marie Ganty, Pierre Geilmann, Kathrin Gordon, Andy Goubault, Eric Greenaway, David Gupta, Ashutosh Gvero, Tihomir Rydhof Hansen, Rene Heeren, Bastiaan Honda, Kohei Hurlin, Clément Jacobs, Swen Jeffrey, Alan Jung, Yungbum Kim, Hanjun Kim, Youil Kolanski, Rafal Koutavas, Vasileios Kunz, César König, Barbara Laviron, Vincent Lee, Wonchan Lopes, Nuno P. Maffei, Matteo Marinescu, Maria-Cristina Martel, Matthieu Mass, Damien Mauborgne, Laurent McKinna, James Michaelson, Greg Might, Matthew Monniaux, David Motika, Christian Nilsson, Henrik Nogueira, Pablo Oh, Hakjoo Owens, Scott Park, Sungwoo
Desmet, Lieven Dietl, Werner Dodds, Mike Fahndrich, Manuel Feret, Jerome Filliatre, Jean-Christophe Francalanza, Adrian Gammie, Peter Gay, Simon Ghica, Dan Gotsman, Alexey Gray, Kathryn E. Greenberg, Michael Gurov, Dilian Hammer, Christian Hedin, Daniel Hobor, Aquinas Hu, Zhenjiang Jacobs, Bart Jaskelioff, Mauro Jensen, Simon Kennedy, Andrew Kim, Ik-Soon Kinder, Johannes Kopp, Oliver Kremer, Steve Kurnia, Ilham Laud, Peeter Lee, Oukseh Leroy, Xavier Lozes, Etienne Malkis, Alexander Marron, Mark Martins, Francisco Matsuda, Kazutaka McCusker, Guy Meyer, Roland Michel, Patrick Miné, Antoine Mostrous, Dimitris Nguyen, Kim Noble, James Nordio, Martin Oliveira, Bruno Pandya, Paritosh Parkinson, Matthew
Petri, Gustavo Pichardie, David Pitcher, Corin Plump, Detlef Pratikakis, Polyvios Rajan, Hridesh Remy, Didier Reus, Bernhard Ringeissen, Christophe Russo, Alejandro Ryu, Sukyoung Samborski-Forlese, Julian Sands, David Santos, André L. Schmitt, Alan Sevcik, Jaroslav Shin, Jaeho Silva, Josep Skalka, Christian Sozeau, Matthieu Steffen, Martin Strickland, Stephen T. Summers, Alexander Svenningsson, Josef Thielecke, Hayo Tsukada, Takeshi Uustalu, Tarmo Van Horn, David Vanoverberghe, Dries Venet, Arnaud von Hanxleden, Reinhard Wadler, Philip Welsch, Yannick Winwood, Simon Yorgey, Brent Zanella Béguelin, Santiago
Phillips, Andrew Piskac, Ruzica Plsek, Ales Popeea, Corneliu Rafnsson, Willard Regis-Gianas, Yann Rensink, Arend Riba, Colin Rompf, Tiark Russo, Luis S, Ramesh Sanchez, Cesar Sankaranarayanan, Sriram Schaefer, Jan Seidl, Helmut Sewell, Thomas Siek, Jeremy Sjöberg, Vilhelm Smans, Jan Staton, Sam Strackx, Raoul Sumii, Eijiro Suter, Philippe Swierstra, Doaitse Thiemann, Peter Urzyczyn, Pawel van der Meyden, Ron van Staden, Stephan Vaughan, Jeff Vogels, Frederic Vouillon, Jérôme Wells, Joe Westbrook, Edwin Wrigstad, Tobias Yu, Minlan Zappa Nardelli, Francesco
Table of Contents
Verified Software Toolchain (Invited Talk) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Andrew W. Appel

Polymorphic Contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
João Filipe Belo, Michael Greenberg, Atsushi Igarashi, and Benjamin C. Pierce

Proving Isolation Properties for Software Transactional Memory . . . . . . . . . . . 38
Annette Bieniusa and Peter Thiemann

Typing Copyless Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Viviana Bono, Chiara Messa, and Luca Padovani

Measure Transformer Semantics for Bayesian Machine Learning . . . . . . . . . . . 77
Johannes Borgström, Andrew D. Gordon, Michael Greenberg, James Margetson, and Jurgen Van Gael

Transfer Function Synthesis without Quantifier Elimination . . . . . . . . . . . . . . . 97
Jörg Brauer and Andy King

Semantics of Concurrent Revisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Sebastian Burckhardt and Daan Leijen

Type-Based Access Control in Data-Centric Systems . . . . . . . . . . . . . . . . . . . . 136
Luís Caires, Jorge A. Pérez, João Costa Seco, Hugo Torres Vieira, and Lúcio Ferrão

Linear Absolute Value Relation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Liqian Chen, Antoine Miné, Ji Wang, and Patrick Cousot

Generalizing the Template Polyhedral Domain . . . . . . . . . . . . . . . . . . . . . . . . . 176
Michael A. Colón and Sriram Sankaranarayanan

Dataflow Analysis for Datarace-Free Programs . . . . . . . . . . . . . . . . . . . . . . . . . 196
Arnab De, Deepak D’Souza, and Rupesh Nasre

Compiling Information-Flow Security to Minimal Trusted Computing Bases . . . 216
Cédric Fournet and Jérémy Planul

Improving Strategies via SMT Solving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Thomas Martin Gawlitza and David Monniaux
Typing Local Control and State Using Flow Analysis . . . . . . . . . . . . . . . . . . . . 256
Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi

Barriers in Concurrent Separation Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
Aquinas Hobor and Cristian Gherghina

From Exponential to Polynomial-Time Security Typing via Principal Types . . . 297
Sebastian Hunt and David Sands

Secure the Clones: Static Enforcement of Policies for Secure Object Copying . . . 317
Thomas Jensen, Florent Kirchner, and David Pichardie

Biochemical Reaction Rules with Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 338
Mathias John, Cédric Lhoussaine, Joachim Niehren, and Cristian Versari

A Testing Theory for a Higher-Order Cryptographic Language (Extended Abstract) . . . 358
Vasileios Koutavas and Matthew Hennessy

A New Method for Dependent Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Trevor Jim and Yitzhak Mandelbaum

Static Analysis of Run-Time Errors in Embedded Critical Parallel C Programs . . . 398
Antoine Miné

Algorithmic Nominal Game Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Andrzej S. Murawski and Nikos Tzevelekos

The Relationship between Separation Logic and Implicit Dynamic Frames . . . 439
Matthew J. Parkinson and Alexander J. Summers

Precise Interprocedural Analysis in the Presence of Pointers to the Stack . . . 459
Pascal Sotin and Bertrand Jeannet

General Bindings and Alpha-Equivalence in Nominal Isabelle . . . . . . . . . . . . . 480
Christian Urban and Cezary Kaliszyk

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
Verified Software Toolchain (Invited Talk)

Andrew W. Appel
Princeton University
Abstract. The software toolchain includes static analyzers to check assertions about programs; optimizing compilers to translate programs to machine language; operating systems and libraries to supply context for programs. Our Verified Software Toolchain verifies with machine-checked proofs that the assertions claimed at the top of the toolchain really hold in the machine-language program, running in the operating-system context, on a weakly-consistent-shared-memory machine. Our verification approach is modular, in that proofs about operating systems or concurrency libraries are oblivious of the programming language or machine language, proofs about compilers are oblivious of the program logic used to verify static analyzers, and so on. The approach is scalable, in that each component is verified in the semantic idiom most natural for that component. Finally, the verification is foundational: the trusted base for proofs of observable properties of the machine-language program includes only the operational semantics of the machine language, not the source language, the compiler, the program logic, or any other part of the toolchain—even when these proofs are carried out by source-level static analyzers. In this paper I explain some semantic techniques for building a verified toolchain.
G. Barthe (Ed.): ESOP 2011, LNCS 6602, pp. 1–17, 2011.
© Springer-Verlag Berlin Heidelberg 2011

Consider a software toolchain comprising:
- a static analyzer or program verifier that uses program invariants to check assertions about the behavior of the source program;
- a compiler that translates the source-language program to a machine-language (or other object-language) program;
- a runtime system (or operating system, or concurrency library) that serves as the runtime context for external function calls of the machine-language program.

We want to construct a machine-checked proof, from the foundations of logic, that

    Any claims by the static analyzer about observations of the source-language program will also characterize the observations of the compiled program.

We may want to attach several different static analyzers to the same compiler, or choose among several compilers beneath the same analyzer, or substitute one operating system for another. The construction and verification of just one component, such as a compiler or a static analyzer, may be as large as one project team or research group can reasonably accomplish. For both of these reasons, the interfaces between
components—their specifications—deserve as much attention and effort as the components themselves. We specify the observable behavior of a concurrent program (or of a single thread of that program) as its input-output behavior, so that the statement “Program p matches specification S” can be expressed independently of the semantics of the programming language in which p is written, or of the machine language. Of course, those semantics will show up in the proof of the claim! Sections 10 and 11 explain this in more detail.

But observable “input-output” behaviors of individual shared-memory threads are not just the input and output of atomic tokens: a lock-release makes visible (all at once) a whole batch of newly observable memory, a lock-acquire absorbs a similar batch, and an operating-system call (read into memory buffer, write from memory buffer, alloc memory buffer) is also an operation on a set of memory locations. Section 1 explains.

The main technical idea of this approach is to define a thread-local semantics for a well-synchronized concurrent program and use this semantics to trick the correctness proof of the sequential compiler into proving its correctness on a concurrent program. That is: take a sequential program; characterize its observable behavior in a way that ignores intensional properties such as individual loads and stores, and focus instead on (externally visible) system calls or lock-acquires/releases that transfer a whole batch of memory locations; split into thread-local semantics of individual threads; carry them through the modified sequential proof of compiler correctness; and gather the resulting statements about observable interactions of individual threads into a statement about the observable behavior of the whole binary.
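The shape of these batched observations can be pictured concretely. Below is a minimal Python model (the names and types are illustrative, not the paper's actual Coq definitions): an observable event is a lock transfer or a system call carrying a whole batch of memory locations, and two programs "match" when their event traces coincide, with ordinary loads and stores never appearing in a trace.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Release:       # lock-release: publishes a whole batch at once
    lock: int
    batch: frozenset  # (address, value) pairs made newly observable

@dataclass(frozen=True)
class Acquire:       # lock-acquire: absorbs a batch from elsewhere
    lock: int
    batch: frozenset

@dataclass(frozen=True)
class Syscall:       # e.g. read/write/alloc acting on a memory buffer
    name: str
    batch: frozenset

def observations_agree(trace1, trace2):
    """Source and compiled programs 'match' when their observable event
    traces coincide; internal loads and stores contribute nothing."""
    return list(trace1) == list(trace2)
```

On this view a compiler may reorder or eliminate private stores freely: only the batches crossing a `Release`, `Acquire`, or `Syscall` boundary are compared.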
Because our very expressive program logic is most naturally proved sound with respect to an operational semantics with fancy features (permissions, predicates-in-the-heap) that don’t exist in a standard operational semantics (or real computer), we need to erase them at some appropriate point. But when to erase? We do some erasure (what we call the transition from decorated op. sem. to angelic op. sem.) before the compiler-correctness proof; other erasure (from angelic to erased) should be done much later.

We organize our proof, in Coq, as follows. (Items marked • are either completed or nearly so; items marked ◦ are in the early stages; items marked – are plausible but not even begun. My principal coauthors on this research, in rough chronological order, are Sandrine Blazy, Aquinas Hobor, Robert Dockins, Lennart Beringer, and Gordon Stewart.)

• We specify an expressive program logic for source-language programs;
◦ we instrument the static analyzer to emit witnesses in the form of invariants;
◦ we reimplement just the core of the static analyzer (the invariant checker, not the invariant inference engine) and prove it correct w.r.t. the program logic;
• we specify a decorated operational semantics for source-language programs;
• we prove the soundness of the program logic w.r.t. the decorated semantics;
• we specify an angelic operational semantics;
• we prove a correspondence between executions of the decorated and the angelic semantics;
• we prove the correctness of the optimizing compiler w.r.t. the angelic operational semantics of the source and machine languages;
– we prove the equivalence of executions in the angelic operational semantics in a weakly consistent memory model and in a sequentially consistent memory model;
• we specify an erased (i.e., quite conventional) operational semantics of the machine language;
• we prove a correspondence between executions of the angelic and erased semantics.

The composition of all these steps gives the desired theorem. Q.E.D.

The verified optimizing compiler is not by our research group; it is the CompCert compiler by Leroy et al. [24], with whom we have been collaborating since 2006 on adjusting the CompCert interfaces and specifications to make this connection possible. Since Leroy et al.’s correctness proof of CompCert in Coq constitutes one of the components of our modular verification, it seemed reasonable to do the rest of our verification in Coq. In addition, some of the logical techniques we use require a logic with dependent types, and cannot be expressed (for example) in HOL (higher-order logic, a.k.a. Church’s simple theory of types). We wanted to use a logic whose kernel theory (in this case, CiC) is trustworthy, machine-checkable, and well understood. Finally, we wanted a system with well-maintained software tools, mature libraries of tactics and theories, and a large user community. For all these reasons we use Coq for our Verified Software Toolchain (VST), and we have been happy with this choice.
1 Observables

Specifications of program behavior must be robust with respect to translation from source language to machine language. Although the states of source- and target-language programs may be expressed differently, we characterize the observable behaviors (or observations) of source- and target-language programs in the same language [15].

Our source language is a dialect of C with shared-memory concurrency using semaphores (locks). The C programs interact with the outside world by doing system calls—with communication via parameters as well as in memory shared with the operating system—and via lock synchronization (through shared memory). One thread of a shared-memory concurrent program interacts with other threads in much the same way that a client program interacts with an operating system. That is, the thread reads and writes memory to which the other threads (or operating system) have access; the thread synchronizes with the other threads (or OS) via special instructions (trap, compare-and-swap) or external function calls (read-from-file, semaphore-unlock). We want the notion of observable behavior to be sufficiently robust to characterize either kind of interaction.

For each system call, and at each lock acquire/release, some portion of the memory is observable. But on the other hand, some parts of memory should not be observable.¹ In private regions of memory, compilers should have the freedom to optimize loads and stores of sequential programs: to hoist loads/stores past each other, and past control operations, to eliminate redundant loads and stores, etc.—subject only to dataflow constraints.

¹ Of course, the operating system can observe any part of memory that it wants to. But the O.S. should not “care” about the thread’s local data, while it does “care” about the contents of memory pointed to by the argument of a write system call.
Of course, with a naive or ill-synchronized approach to shared memory, it is possible for one thread to see another’s loads and stores, and therefore the compiler might alter the output of a program by “optimizing” it. At least for the time being, we restrict our attention to well-synchronized programs—in which any thread’s write (read) access to data is done while holding a semaphore granting exclusive (shared) permission to that data. In such a regime, we can arrange that loads and stores are not observable events. This is an important design principle. “Volatile” variables with observable loads/stores can be treated in our framework, with each load or store transformed—by the front end—into a special synchronization operation. Leroy has recently instrumented the front end of CompCert to perform this transformation.

Extensional properties. This approach to program specification deliberately avoids intensional properties, i.e., those that characterize how the computation proceeds. This design decision frees the compiler to make very general program transformations. However, it also means that we cannot specify properties such as execution time.

Permissions and synchronization. When a thread does a system call or a lock synchronization, there is implicitly a set of addresses that may be “observed.” Our specification of observations makes this explicit as permissions. A thread that (at a given time) has write permission to a memory address can be sure that no other thread (or operating system) can observe it. Lock synchronizations and system calls can change a thread’s permissions in controlled and predictable ways. From the compiler’s point of view, synchronizations and system calls are instances of external function calls, which can change memory permissions in almost arbitrary ways. The permissions have no concrete runtime manifestation—since, of course, we are compiling to raw machine code on real machines.
Instead, they are a “fictional” artifact of static program invariants: a static permission-based proof about a program says something about its non-stuck execution in a permission-carrying operational semantics. Then, the permissions present in our decorated and angelic operational semantics disappear in the erased operational semantics. In fact, the difference between the decorated and angelic operational semantics is in how the permissions change at external function calls. The decorated operational semantics is annotated with sufficient program invariants to calculate the change in permissions; the angelic semantics is equipped with an “angel” that serves as an oracle for the change in permissions; see Section 7.
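The role of permissions and their eventual erasure can be sketched roughly as follows (the three-point lattice and all names here are illustrative assumptions; the real development lives in Coq over CompCert's memory model): a permission-carrying semantics gets stuck on an unauthorized store, while the erased semantics, like a real machine, simply performs it, so a proof of non-stuckness in the former transfers safety to the latter.

```python
from enum import IntEnum

class Perm(IntEnum):
    # a tiny permission lattice: NONE < READ < WRITE
    NONE = 0
    READ = 1
    WRITE = 2

def store(mem, perms, addr, val, *, erased=False):
    """One store step.  The decorated/angelic semantics consults the
    (purely fictional) permission map and is stuck without WRITE
    permission; the erased semantics never checks."""
    if not erased and perms.get(addr, Perm.NONE) < Perm.WRITE:
        raise RuntimeError(f"stuck: no write permission at address {addr}")
    new_mem = dict(mem)
    new_mem[addr] = val
    return new_mem
```

In this picture an external call (lock acquire/release, system call) would be the only place where `perms` changes; core steps leave it fixed.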
2 The Programming Language and Compiler

Our VST project was inspired in part by Leroy’s CompCert verified optimizing C compiler [24]. CompCert is a remarkable achievement, on the whole and in many particulars. For example, the CompCert memory model by Leroy and Blazy [25] supports storing of bytes, integers, floats, and relocatable pointers in a byte-addressable memory; is sufficiently abstract to support relocation and the kinds of block-layout adjustments that occur during compilation; and is sufficiently general that the same memory model can be used for the source-level semantics, the target-machine semantics, and at every
level in between. The tasteful design of the memory model is one reason for CompCert’s success. CompCert “merely” proves that, whenever it compiles a C program to assembly language, any safe execution in C corresponds to a safe execution in assembly with the same observable behavior. But where do safe C programs come from, and how can one characterize the observable behavior of those C programs? If those questions cannot be answered, then it might seem that CompCert is like a highly overengineered hammer without any nails. Our Verified Software Toolchain uses C minor [24] as a source language. C minor is the high-level intermediate representation used in CompCert for the C programming language. CompCert compiles the C programming language (or rather, a very substantial subset called C light [9]) through 7 intermediate languages into assembly language for the PowerPC processor; the third language in this chain of 9 is C minor. For each language in the chain, Leroy has specified the syntax and operational semantics in the Coq theorem prover. Each of the 8 compiler phases between these intermediate languages is proved (in Coq) to preserve observational equivalence between its input (a program in one intermediate language) and its output (a program in the next language). We chose to use C minor (instead of C light) as the target for VST, for two reasons: C minor is more friendly to Hoare-style reasoning as there are no side effects inside expressions;2 and C minor can be used as a target language from source languages such as ML or Java.
3 Oracle Semantics

Remarkable as CompCert is, its original specification was too weak to express the interaction of a thread with its context, whether that context is the operating system or other concurrent threads. We want to remedy that weakness without making fundamental changes to the CompCert correctness proof. We factor the operational semantics to separate core programming-language execution from execution steps in the operating-system/runtime-system/concurrency context, which we call the oracle for short. We do this because the same oracle will be applied to both the source- and machine-language programs, and because (conversely) different kinds of oracles will be applied to the same source- or machine-language program.

The CompCert C compiler [24] is proved correct with respect to a source-level operational semantics and a machine-level operational semantics. In early versions of CompCert, the source-level semantics was big-step and the machine-level semantics was small-step. Each semantics generated a trace: a finite or infinite sequence of observable events, that is, system calls with atomic arguments and return values. This specification posed obstacles for integration into our verified toolchain: the big-step semantics was inconvenient for reasoning about concurrent threads; the system calls did not permit shared memory between client and operating system (and thus did not permit the conventional model of system calls); the lack of shared memory meant that
² More precisely, because C minor is not a subset of C, we had some flexibility to negotiate with Leroy a few modifications to the specification of C minor: e.g., unlike C minor 2006, C minor 2010 has no side effects inside expressions.
A.W. Appel
a shared-memory concurrency model was difficult; the coarseness of memory-access permissions meant that concurrent separation logic was difficult; and so on. Therefore we worked with Xavier Leroy to adjust the specification of CompCert’s source-level operational semantics. Now it is a small-step semantics, with a lattice of permissions on each memory address. Instead of a trace model, in which the behavior of a program is characterized by a sequence of atomic values read or written, we characterize the behavior by the thread’s history of (shared-memory) interactions with the oracle.

Each state of the small-step semantics contains three components: oracle state Ω, memory m, and core state q. A core step is a small step of computation that is not an external function call; an oracle step is taken whenever the computation is at an external function call. Each core step affects only m and q, leaving Ω unchanged; each oracle step affects only Ω and m.

What is the oracle? If we are modeling the interaction of a sequential C program with its operating system, then the oracle is the operating system, and the external function calls are system calls. If we are modeling the interaction of a sequential C thread with all the other threads in a shared-memory concurrent computation, then the external function calls are lock-acquire and lock-release, and the oracle is all the other threads. An early version of this oracle model of concurrency is described by Hobor et al. [20,21]. The advantage of the oracle model is that, from the point of view of the C compiler and its correctness proof, issues of concurrency hardly intrude at all. Still, Hobor’s oracle model was only partly modular: it successfully hid the internal details of the oracle from the compiler-correctness proof, but it did not hide all the details of the programming language from the oracle.
This was a problem, because it was difficult to characterize the result of compiling a concurrent C program down to a concurrent machine-language program. Dockins [15] has developed a substantially more modular oracle model, which is the basis for the verified toolchain described here.
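The core/oracle factoring described above can be sketched as follows. This is an illustrative toy in Python, not VST's actual Coq definitions; the state components, the "store" core statement, and the shape of the oracle are all invented for the example. The point is only the discipline: core steps never touch Ω, oracle steps never touch q.

```python
# Sketch (hypothetical names): a small-step semantics whose state is
# factored into oracle state omega, memory m, and core state q.

def at_external(q):
    # the core is stopped at an external call, e.g. ("extcall", fname)
    return isinstance(q, tuple) and q[0] == "extcall"

def core_step(omega, m, q):
    # A core step affects only m and q; omega is left unchanged.
    assert not at_external(q)
    # toy core statement: ("store", addr, v, k) writes v at addr, continues as k
    op, addr, v, k = q
    m2 = dict(m)
    m2[addr] = v
    return omega, m2, k

def oracle_step(omega, m, q):
    # An oracle step affects only omega and m; q is left unchanged.
    # The oracle (OS or the other threads) performs one pending
    # shared-memory interaction, here modeled as a write.
    assert at_external(q)
    (addr, v), *rest = omega
    m2 = dict(m)
    m2[addr] = v
    return tuple(rest), m2, q

# one core step, then one oracle step at the external call
omega, m, q = ((7, 99),), {}, ("store", 3, 5, ("extcall", "acquire"))
omega, m, q = core_step(omega, m, q)
omega, m, q = oracle_step(omega, m, q)
```

Running the two steps leaves both the thread's own write and the oracle's write visible in the shared memory, while the oracle's pending-interaction list has been consumed.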
4 The Program Logic

Proofs of static analyses and program verifiers are (often) most convenient with respect to an axiomatic semantics (such as a Hoare logic) of the programming language; we call this the program logic. Proofs of compilers and optimizations are (often) most convenient using an operational semantics.³ Therefore for the source language we will need both an axiomatic and an operational semantics, as well as a machine-checked soundness proof that relates the two.

We intend to use the program logic as an intermediate step between a static analysis algorithm and an operational semantics. We would like to do this in a modular way, that is, prove several different static analyses correct w.r.t. the same program logic, then prove the program logic sound w.r.t. the operational semantics. Therefore we want the program logic to be as general and expressive as possible.

We start with a Hoare logic for C minor: the judgement Γ ⊢ {P} c {Q} means that command c has precondition P and postcondition Q. Somewhat more precisely, if c starts in a state satisfying P, then either it safely infinite-loops or it reaches a state
³ At the very least, the particular proof that we want to connect our system to (CompCert) is w.r.t. an operational semantics.
satisfying Q. However, C minor has two control-flow commands that can avoid a fall-through exit: exit n breaks out of n nested blocks (within a function), and return e evaluates e and returns its value from the current function call. Thus, Q is really a triple of postconditions: an ordinary assertion for fall-through; an assertion parameterized by n (equivalently, a list of assertions) for exit; and an assertion parameterized by the return value for return. To take a simple example (writing p →[n,π] v for an n-byte points-to assertion with permission π, and omitting π when the permission is full), the judgement

Γ ⊢ {∃x. (v ⇓ x) ∗ (x →[4,π] 6) ∗ (x+4 →[4] x)}  [v+4] :=₄ 0  {∃x. (v ⇓ x) ∗ (x →[4,π] 6) ∗ (x+4 →[4] 0)}
can be read as follows: before execution of the store statement, local variable v points to some address x; the current thread has partial permission π to read (but not necessarily write) a 4-byte integer (or pointer) at x; the current 4-byte contents of memory at x is 6; the thread has full permission to read or write a 4-byte integer (or pointer) at x+4; and the current contents at x+4 is x. Furthermore, by the rules of ∗ in separation logic, x is not aliased to x+4, though in this case we didn’t need separation logic to tell us this obvious fact! After the assignment, the situation is much the same except that the contents of x+4 is now 0 (a null pointer). The exit and return postconditions (not shown) are both false, since the store command neither exits nor returns.

The global context Γ maps addresses to their function specifications. The assertion f : {P}{Q} means f has precondition P and postcondition Q. Since C (and C minor) permits passing functions as pointers, in the program logic one can write ∃v. (f ⇓ v) ∗ (v : {P}{Q}), meaning that the name f evaluates to some pointer-value v, and v has precondition P, etc. Since a function can take a sequence of parameters and return an optional result, in fact P and Q are functions from value-list to assertion. The operators ∃, ∗, →, and : are all constructed as definitions in Coq from the underlying logic. We use a “shallow embedding,” but because of the almost-self-referential nature of some assertions (“f(x) is a function whose precondition claims that x is a function whose precondition...”), the construction can be rather intricate; see Section 6.

In a Hoare logic one often needs to relate certain dynamic values appearing in a precondition to the same values in a postcondition. For example, consider the function int f(int x){return x + 1;}. One wants to specify that this function returns a value strictly greater than its argument. We write a specification such as

∃v. f ⇓ v ∧ v : (∀x : int. {λa. a = x}{λr. r > x})

or, using a HOAS style for the “:” operator,

∃v. f ⇓ v ∧ v : [int]{P}{Q}  where P = λx.λa.(a = x) and Q = λx.λr.(r > x).

This notation permits the shared information x to be of any Coq type τ; in this example τ = int. The use of HOAS (higher-order abstract syntax) hints that we shallowly embed the assertions in the surrounding logical framework, in this case Coq, to take advantage of all the binding structure there. To reason about polymorphic functions or data abstraction, our Hoare logic permits universal and existential quantification ∀x : τ. P and ∃x : τ. P
where τ can be any Coq type, and P can contain free occurrences of x. Of course this is just Coq “notation” for the HOAS versions ∀[τ] P and ∃[τ] P, where P is a function from τ to assertion. It is particularly important that τ can be any Coq type, including quantified assertions. That is, the quantification is impredicative. Impredicativity is necessary for reasoning about data abstraction, where the abstract types in the interface of one module can themselves be implemented by abstract types from the interface of another module. To specify inductive datatypes, our program logic has an operator μ for recursively defined assertions: μx.P(x) satisfies the fixpoint equation μx.P(x) = P(μx.P(x)) (or in HOAS, μP = P(μP)), provided that P is a contractive function [6].

Ordinary Hoare logics have difficulty with pointers and arrays, especially with updates of fields and array slots. Therefore we use a separation logic, which is a Hoare logic based on the theory of bunched implications. In an ordinary Hoare logic, the assertion P ∧ Q means that P holds on the current state and so does Q. In separation logic, P ∗ Q means that the current state can be expressed as the disjoint union of two parts, and P holds on the first part while Q holds on the second part. The assertion emp holds on exactly the empty state, and true holds on any state. Thus the assertion (P ∗ Q) ∧ (R ∗ true) holds on any state s that can be broken into two pieces s₁ ⊕ s₂, where P holds on one and Q on the other, and such that R holds on some substate of s.

Our separation logic has a notion of partial ownership, or “permissions.” The assertion x →[π] y means that the current state has exactly one address x in it; the contents of this address is the value y; and the current state has nonempty permission π to read (or perhaps also to write) the value.
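The meaning of ∗ as "split the state into disjoint parts" can be made concrete in a toy model where states are finite heaps (Python dictionaries). This is only an illustration of the semantics of ∗, emp, and full-permission points-to; the exponential search over splittings is fine for a model but is not how a logic is implemented.

```python
# Toy model: an assertion is a predicate on heaps (dicts); P * Q holds
# on h iff h splits into disjoint h1, h2 with P(h1) and Q(h2).
from itertools import combinations

def splits(h):
    """Yield all ways to split heap h into two disjoint subheaps."""
    keys = list(h)
    for r in range(len(keys) + 1):
        for ks in combinations(keys, r):
            h1 = {k: h[k] for k in ks}
            h2 = {k: h[k] for k in keys if k not in ks}
            yield h1, h2

def star(P, Q):
    return lambda h: any(P(h1) and Q(h2) for h1, h2 in splits(h))

emp = lambda h: h == {}          # holds on exactly the empty heap

def maps_to(x, y):
    # full-permission points-to: the heap is exactly {x: y}
    return lambda h: h == {x: y}

h = {10: 8, 11: 0}
```

In this model, star(maps_to(10, 8), maps_to(11, 0)) holds on h, while star(maps_to(10, 8), maps_to(10, 8)) does not: the two conjuncts would have to own address 10 separately, which is exactly the anti-aliasing reading of ∗.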
Permissions form a lattice; some permissions give read access, stronger permissions give write access, and the “top” permission gives the capability to deallocate the address. In contrast to previous permission models [11,29], ours is finite, constructive, general, and modeled in Coq [16]. Partial ownerships and overlapping visibility mean that “disjoint union” is an oversimplification of the ⊕ operator; we use a separation algebra, following Calcagno et al., but with a more general treatment of units and identities [16].

To reason about concurrent programs, we use a concurrent separation logic (CSL). O’Hearn [28] demonstrated that separation logic naturally extends to concurrency, since it limits each thread to accessing only those objects for which it has (virtual) permission. Portions of the heap can have their virtual ownership transferred from one thread to another by synchronization on semaphores. We have generalized O’Hearn’s original CSL to account for dynamic creation of locks and threads [21]; Gotsman et al. made a similar generalization [18]. The assertion ℓ ⇝[π] R means that address ℓ is a lock (i.e., semaphore) with visibility (permission) π and resource invariant R. Permission π strictly below the top permission means that, most likely, some other thread(s) can see this lock as well; if only one thread could see ℓ, then ℓ would be difficult to use for synchronization! Any nonempty π gives permission to (attempt to) acquire the lock. Resource invariant R means that whenever a thread releases ℓ, it gives up a portion of memory satisfying the assertion R; whenever a thread acquires ℓ, it acquires a new portion of memory satisfying R. While ℓ is held, the thread can violate R until it is ready to release the lock.
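To illustrate what a permission model looks like, here is the classic fractional-permission model in the style of the prior work cited above [11,29]; VST's actual share model [16] is richer (finite, constructive, general), so treat this only as a sketch of the idea that shares join, any share permits reading, and only the full share permits writing.

```python
# Sketch of fractional permissions (Boyland-style, as in the prior
# models [11,29]; NOT VST's actual share model): a permission is a
# rational in (0, 1]; the full share 1 also permits writing.
from fractions import Fraction

FULL = Fraction(1)

def can_read(p):
    return Fraction(0) < p <= FULL

def can_write(p):
    return p == FULL          # exclusive: nobody else holds any share

def join(p1, p2):
    """Combine the shares two parties hold on the same address."""
    s = p1 + p2
    return s if s <= FULL else None   # over-full shares cannot coexist

half = Fraction(1, 2)
```

Two readers with half shares can coexist (their shares join to the full share), but a writer's full share joins with nothing, which is exactly why a write permission excludes all other access.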
The basic CSL triples for acquire/release are

{ℓ ⇝[π] R} acquire ℓ {R ∗ ℓ ⇝[π] R}        {R ∗ ℓ ⇝[π] R} release ℓ {ℓ ⇝[π] R}.
In summary: programs have pointers and we want to reason about aliasing, hence separation logic rather than just Hoare logic. Programs have higher-order functions, polymorphism, and data abstraction, hence we want impredicative quantification in the logic. Programs have concurrency, hence we want resource invariants. Finally, different front-end static analyses will make different demands on the program logic, so we want the program logic to be as expressive and general as we can possibly make it. We end up with an impredicative higher-order concurrent separation logic.
5 Instrumenting Static Analyses

Hoare-style judgements can be proved in an interactive theorem prover by applying the inference rules of the program logic. But there are many kinds of program properties that, with much less user effort, a static analyzer can “prove” automatically. We are interested in removing the quotation marks from the previous sentence.

One way to connect a static analyzer to a program logic is to have the analyzer produce derivation trees in the logic. But this may require a large change to an existing analyzer, or significantly affect the software design of an analyzer; and the derivation trees will be huge. Another way would be to prove the analyzer correct, but analyzers are typically large programs implemented in languages with difficult proof theories; sometimes they have major components such as SAT solvers or ILP solvers.

We propose to instrument the analyzer in a different way. A typical analyzer infers program invariants from assertions provided by the user. In a sense, the invariants are the induction hypotheses needed to verify the assertions. We propose that the analyzer should output the program liberally sprinkled with invariants. A much simpler program than the analyzer, the core of the analyzer, can check that the invariants before and after each program statement (or perhaps, program block) match up. We will implement the core of each analyzer in the Gallina functional programming language embedded in the Coq theorem prover. We will use Coq to prove the soundness of the core w.r.t. our program logic. Then we use Coq’s program extraction to obtain a Caml program that implements the core. The toolchain then starts with the full (untrusted, unverified) static analyzer, whose results are rechecked by the (verified) core analyzer.

My undergraduate student William Mansky conducted one successful experiment in this vein [26]: he reimplemented in Gallina the concurrent-separation-logic-based shape analysis by Gotsman et al.
[19], and proved it correct⁴ with respect to our program logic. He found that this shape analysis was simple enough that he could implement the entire analysis, including invariant inference, in Gallina, without needing an unverified front end.

⁴ Well, almost proved. The problem is that Gotsman et al. assume a concurrent separation logic with imprecise resource invariants, whereas our CSL (as is more standard [28]) requires precise resource invariants. The joy of discovering mismatches such as this is one of the rewards of doing end-to-end, top-to-bottom, machine-checked proofs such as the VST.
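The untrusted-analyzer/verified-core split described above can be sketched on a toy interval analysis for straight-line code. All names are illustrative and the analysis is deliberately trivial; the point is the architecture: the analyzer infers an invariant before and after each statement, and the much simpler core only checks that consecutive invariants match up, which is the part one would write in Gallina and verify in Coq.

```python
# Toy illustration (hypothetical names) of the untrusted-analyzer /
# verified-core split. Programs are lists of ("add", k) statements;
# an invariant is an interval (lo, hi) for the single variable x.

def untrusted_analyzer(prog, init):
    """Untrusted: infer an invariant before each statement and after
    the last one. May be arbitrarily clever (or buggy)."""
    invs = [init]
    for (op, k) in prog:
        lo, hi = invs[-1]
        invs.append((lo + k, hi + k))
    return invs

def core_check(prog, invs):
    """The 'core': merely CHECK that each step's post-invariant follows
    from its pre-invariant. This small checker is what gets verified."""
    if len(invs) != len(prog) + 1:
        return False
    for (op, k), (lo, hi), (lo2, hi2) in zip(prog, invs, invs[1:]):
        # after "add k", the value lies in [lo+k, hi+k]; the claimed
        # post-invariant must contain that interval
        if not (lo2 <= lo + k and hi + k <= hi2):
            return False
    return True

prog = [("add", 1), ("add", 2)]
invs = untrusted_analyzer(prog, (0, 0))
```

A correct analyzer's output passes the core check; a bogus invariant list is rejected, so soundness rests only on the core, never on the analyzer.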
6 Semantic Modeling of Assertions

We want to interpret an assertion P as a predicate on the state of the computation: more or less, s ⊨ P is interpreted as P(s). Some of our assertion forms characterize information about permissions, types, and predicates that are nowhere to be found in a “Curry-style” bare-bones operational semantics of a conventional target machine. Therefore we use more than one kind of operational semantics: a decorated semantics with extra information, an angelic semantics with partial information, and an erased semantics with even less information. We have erasure theorems: any nonstuck execution in the decorated semantics corresponds to a nonstuck execution in the angelic semantics with similar observables, and likewise from the angelic to the erased semantics. Assertions require the decorated semantics.

A judgement s ⊨ 10 →[π₁] 8 ∗ 11 →[π₂] 0 means that s (in the decorated and angelic semantics) must contain not only 8 at address 10 and 0 at address 11, but must also keep track of the fact that there is exactly π₁ permission at 10, and π₂ at 11.

The assertions ℓ ⇝[π] R (meaning that ℓ is a lock with resource invariant R and visibility π) and v : [τ]{P}{Q} (v is a function with precondition P and postcondition Q) are interesting in that one assertion characterizes the binding of an address (ℓ or v) to another assertion (R, P, Q). Somehow the decorated state s must contain predicates, which in turn are predicates over states. Achieving this, in connection with impredicative quantification and recursive assertions, is quite difficult; the semantic methods of the late 20th century were not up to the task. If we interpret assertions naively in set theory, we run into Cantor’s paradox. We solve this problem using the methods of Ahmed [4], as reformulated into the “Very Modal Model” [6] and “Indirection Theory” [22]. The recent formulation of Birkedal et al. [8] could also be used.
The basic trick is that each state s indicates a degree of approximation (a natural number kₛ). When s ⊨ ℓ ⇝[π] R, the assertion R can be applied only to states s′ that are strictly more approximate than s, that is, with kₛ′ < kₛ. When the computation takes a step s → s′, it must “age” the state by making sure that s′ is strictly more approximate than s, that is, kₛ′ < kₛ. In a decorated state an address can map to any of:

val[π] v        memory data byte v with permission π
lock[π] R       lock with resource invariant R
fun[π] [τ]P Q   function entry point with precondition P, postcondition Q
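The aging discipline can be illustrated concretely; this is a toy, not the actual construction of Indirection Theory [22], and all class and field names are invented. It shows the two rules that matter: a predicate stored in a state may only be applied to strictly more approximate states, and every step strictly decreases the approximation level.

```python
# Toy sketch of Indirection Theory's aging discipline (illustrative
# names; not the real construction [22]).

class Decorated:
    def __init__(self, k, store):
        self.k = k           # approximation level k_s
        self.store = store   # e.g. addr -> ("lock", R), R a predicate on states

    def apply_invariant(self, addr, other):
        """Apply the resource invariant stored at addr to state `other`:
        only legal at a strictly more approximate level."""
        kind, R = self.store[addr]
        assert other.k < self.k, "predicate applied at too-high level"
        return R(other)

def age(s):
    # a step s -> s' must make s' strictly more approximate: k' < k
    assert s.k > 0
    return Decorated(s.k - 1, s.store)

R = lambda st: True                       # a trivial resource invariant
s = Decorated(3, {100: ("lock", R)})
s2 = age(s)                               # one computation step: k drops 3 -> 2
```

Asking s's stored invariant about the aged state s2 is legal (k decreased), while asking it about s itself would violate the level discipline, which is precisely how the near-self-reference is kept well-founded.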
In the angelic semantics the predicates are removed, leaving only val[π] v, lock[π], fun[π]; and in the erased semantics the permissions are removed.
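The two erasure stages can be sketched as functions on a single decorated memory cell; the cell representation is invented for the example, but it follows the table above: angelic erasure deletes embedded predicates, full erasure additionally deletes permissions.

```python
# Sketch (illustrative representation): a decorated cell is
# ("val", pi, v) | ("lock", pi, R) | ("fun", pi, spec).

def to_angelic(cell):
    """Angelic erasure: drop the embedded predicates, keep permissions."""
    kind, pi, payload = cell
    if kind == "val":
        return (kind, pi, payload)    # data bytes keep their value
    return (kind, pi, None)           # R and (P, Q) specs are erased

def to_erased(cell):
    """Full erasure: additionally drop the permission annotations."""
    kind, pi, payload = to_angelic(cell)
    return (kind, payload) if kind == "val" else (kind,)

R = lambda st: True
```

So a lock cell survives angelic erasure as a lock with its permission but no invariant, and full erasure leaves only the bare fact that the address is a lock.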
7 Partial Erasure before Bisimulation

Our toolchain has, near the front end, a higher-order program logic; and near the back end, an optimizing compiler composed of successive phases. Each phase translates one language to another (or to itself with rewrites), and each phase comes equipped with a bisimulation proof that the observable behavior is unchanged.
The fact that program states s contain “hair” such as approximation indices and predicates poses some problems for the modularity of the top-to-bottom verification of the toolchain. To verify the soundness of the program logic w.r.t. the operational semantics, we need states s containing predicates. As explained in Section 6, Indirection Theory requires that predicates embedded in states be at specific levels of approximation, and these need to be “aged” (made more approximate) at each step of the computation. On the other hand, to verify the correctness of the optimizing compiler the predicates are unneeded, and the aging hampers the bisimulation proofs. For example, a compiler phase might change the number of steps executed (and thus the amount of aging). Thus we erase the predicates in moving from the decorated to the angelic semantics. But there are a few places where the predicates have an operational effect, and we replace these with an angelic oracle that supplies the missing information. All of our ageable predicates are within the Ω (oracle) component of the state, and thus this erasure has no effect on the m and q components.

Let address ℓ be a lock (semaphore) in concurrent separation logic, asserted as ℓ ⇝[π] R with resource invariant R. R must be a precise separation-logic predicate, meaning that R is satisfied by at most one subheap of any given memory. When a thread releases this lock, the (unique) subheap of the current heap satisfying R is moved from thread ownership to lock-pool ownership. The operational effect on fully erased memory is only that the semaphore goes from state 0 to state 1. The operational effect on decorated or angelic memory is that the permissions π of the thread (and of the lock pool) change.
The way the permission change Δπ is calculated in the decorated semantics is that R is fetched from the state and (classically, not constructively) evaluated.⁵ That is, in a decorated state, at each lock address, is stored the resource invariant R. Unlike decorated states, angelic states contain no predicates. Instead of calculating Δπ from R, the angel is an oracle that contains a list of all the Δπ effects that would have been (nonconstructively) computed from the deleted predicates. At each semaphore-release operation, the next Δπ is consumed from the angel. Dockins [15] proves that for each safe execution in the decorated semantics, there exists an angel that gives a safe execution in the angelic semantics with the same observable behavior.
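The two ways of obtaining Δπ at a semaphore release can be sketched side by side; the representation of permissions and of R's footprint is invented for the example, and the "decorated" computation stands in for the classical evaluation of R.

```python
# Sketch (illustrative): at a release, the decorated semantics computes
# the permission change dpi from the stored invariant R; the angelic
# semantics has no predicates and instead consumes the next dpi from
# the angel, a pre-supplied list of permission changes.

def release_decorated(perms, R_footprint):
    """dpi computed from R: the unique footprint satisfying R moves
    from thread ownership to lock-pool ownership."""
    dpi = {a: -perms[a] for a in R_footprint}
    return {a: p + dpi.get(a, 0) for a, p in perms.items()}, dpi

def release_angelic(perms, angel):
    """No predicates in the state: just consume the angel's next dpi."""
    dpi, *rest = angel
    return {a: p + dpi.get(a, 0) for a, p in perms.items()}, rest

perms = {10: 1, 11: 1}
dec_perms, dpi = release_decorated(perms, [10])   # R's footprint: address 10
ang_perms, angel_rest = release_angelic(perms, [dpi])
```

With the right angel (the list of Δπ values the decorated run would have computed), the two semantics agree on the resulting permissions, which is the shape of the erasure theorem quoted above.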
8 Bisimulation

After the static analyzer or a program verifier has proved something about the observable behavior of a source program in the decorated semantics, we partially erase to the angelic semantics. The compiler-correctness proof, showing that source-language and machine-language programs have identical observables, is done in the angelic semantics of the several intermediate languages.
⁵ Our own proofs (and those of CompCert) do not use classical axioms such as choice and excluded middle; we use only some axioms of extensionality. But classical axioms are consistent with CiC + extensionality, so the user of our program logic may choose to use them. Therefore we do not assume that the satisfaction of R must be constructive.
The original CompCert proofs were done by bisimulation, with respect to a simple notion of finite or infinite traces of atomic events, with no shared memory. Because the CompCert languages are deterministic, the bisimulations reduce to simpler simulation proofs. These proofs rely on a set of general lemmas about simulations and bisimulations in small-step semantics equipped with observable traces. We are able to reuse the existing CompCert proofs with very little modification, by ensuring that our oracle/observable interface is as compatible as possible with the original trace-based specification. From oracles and observables, Dockins [15] proves a set of simulation lemmas very similar to the original versions in CompCert. From these, Leroy can easily adapt the CompCert proofs.⁶

The fact that the operational semantics is angelic is almost invisible in the CompCert bisimulations, because the angel is consulted only at certain of the external function calls, and never during ordinary computation steps corresponding to instructions emitted by the compiler. If there had been predicates in the heap (i.e., the decorated semantics), the bisimulations would have been harder to prove (and would require more changes to the CompCert proofs), because the aging of predicates required by Indirection Theory would age the state by different amounts depending on how many instructions a compiler optimization deletes.

Even so, there have been a few changes to the CompCert specification to accommodate our verified toolchain. In addition to the new notion of observables, CompCert now has an address-by-address permission map. Leroy has already modified CompCert (and its proof) to accommodate these permissions (CompCert 1.7, released 03/2010).
The need for permissions in the (angelic) operational semantics is in response to a problem explained by Boehm: “Threads cannot be implemented as a library” [10]. He points out that hoisting loads and stores past acquire/release synchronizations can be an unsafe optimization in shared-memory concurrent programs. He writes, “Here we point out the important issues, and argue that they lie almost exclusively with the compiler and the language specification itself, not with the thread library or its specification.” Indeed, the permissions in our decorated and angelic semantics form such a specification, and CompCert is proved correct with respect to this specification.
9 Weak Memory Models

The angelic semantics (and its management of permissions) must be preserved down to the back end of the compiler, so that each optimization stage can correctly hoist loads/stores past other instructions. Only then can full erasure of permissions be done, into the erased semantics. At this point the program is still race-free, and the permissions in the angelic semantics allow a proof of this fact. This suggests that executions in a weakly consistent memory model should have the same observable behavior as sequentially consistent executions. It should be possible to prove this for a given memory model, and to add this proof to the bottom of the verified software toolchain. Recent results in the formal specification of weak memory models [31,12] should be helpful in this regard.
⁶ As of mid-2010, CompCert is not yet fully ported to the new model of observations.
10 Foundational Verification

Foundational verification is machine-checked formal verification in which we pay particular attention to the size and comprehensibility of the trusted base, which includes the specification of the theorem to be proved, the axiomatization of the logic, and the implementation of the proof checker. It is instructive to use proof-carrying code as a testbed for gedanken experiments about what constitutes the trusted base of a formal method for software. Then in the next section we can apply this methodology to the Verified Software Toolchain, and compare both the size of the trusted base and the strength of what can be proved.

Consider the claim that program P in the ML (or Java) language does not load or store addresses outside its heap. What is the trusted base for a proof of this claim of memory safety? Implicit in this claim is that P is compiled by an ML (or Java) compiler to machine language. A full statement of this theorem in mathematical logic would include: (1) the specification of the operational semantics and instruction encodings of the machine language, (2) the specification of memory safety as a property of executions in the machine language, and (3) the entire ML (or Java) compiler. A verification is a proof of this theorem for P, and naively it might seem that part of this proof is a proof of correctness of the ML (or Java) compiler. ML and Java are type-safe languages: a source-language program that type-checks cannot load or store outside the heap when executing in the source-language operational semantics. Consequently, the machine-language program should also be memory-safe, if the compiler is correct.

Proof-carrying code (PCC) [27] was introduced as a way to remove the compiler from the trusted base, without having to prove the compiler correct; machine-checked compiler-correctness proofs seemed impractical in 1997.
Instead of relying on a correct compiler, PCC instruments the (untrusted, possibly incorrect) compiler to translate the source-language types of program variables (in the source-language type system) to machine-language types of program registers (in some machine-language type system). Then one implements a program to type-check machine-language programs. Let P′ be the compilation of P to machine language; now the theorem is that P′ is memory-safe. The trusted base includes the machine-language type-checker, but not the compiler: that is, one trusts, or proves, that if P′ type-checks then it is memory-safe.

Proof-carrying code originally had three important limitations:
1. There is no guarantee that P′ has the same observable (input/output) behavior as P, because the compiler is free to produce any program as output so long as it type-checks in the machine-language type system;
2. For each kind of source-level safety property that one wishes to proof-carry, one must instrument the compiler to translate source-level annotations to checkable machine-level annotations;
3. The claim that any program that type-checks in the machine-language type system has the desired safety property when it executes was still proved only informally; that is, there was a LaTeX proof about an abstraction of a subset of the type-checker.

Foundational proof-carrying code [5] addresses this third point. We construct a formal specification of the machine-language syntax and semantics in a machine-checkable
logic, and a specification of memory safety in that same logic. Then the theorem is, program P′ is memory-safe; neither the compiler nor the machine-language type annotations need be trusted, because the type annotations are part of the proof of the theorem, and the proof need not be trusted because it is checked by machine.

The Princeton foundational proof-carrying code (FPCC) project, 1999–2005, demonstrated this approach for core Standard ML. We demonstrated that the trusted base could be reduced to less than 3000 lines of source code: about 800 for a proof checker, written in C and capable of checking proofs for any object logic representable in LF; a few lines for the representation of higher-order logic (HOL) in LF; and about 1500 lines to represent instruction encodings and instruction semantics of the Sparc processor [32]. We instrumented Standard ML of New Jersey to produce type annotations at each basic block [13] and we built a semantic soundness proof for the machine-level type system [1]. Other research groups also demonstrated FPCC for other compilers [17,14].

The late-20th-century limitation that inspired PCC and FPCC (that it is impractical to prove the correctness of an optimizing compiler) no longer applies. Several machine-checked compiler-correctness proofs have been demonstrated [23,24,30]. Proof-carrying code is no longer the state of the art.
11 What Is the Trusted Base?

Our main theorem is, claims by the static analyzer about the source-language program will characterize the observations of the compiled program. How is this statement to be represented in a machine-checkable logic? We may choose to use complex and sophisticated mathematical techniques to prove such a statement. The compiler may use sophisticated optimization algorithms, with correspondingly complex correctness proofs. We can tolerate such complexity in proofs, because proofs are machine-checked. But complexity in the statement of the theorem is undesirable, because some human, the eventual consumer of our proof, must be able to check whether we have proved the right thing.

A “claim” by a static analyzer relates a program p to the specification S of its observable behavior. That is, let A be the static analyzer, so that A(p, S) means that the analyzer claims that every observation o of source-language program p is in the set S. Let C be the compiler, so that C(p, p′) means that source program p translates to machine-language program p′. Let M be the operational semantics of the machine language, so that M(p′, o) means that program p′ can run with observation o. Then the main theorem is,

∀p, p′, S, o.  A(p, S) ∧ C(p, p′) → M(p′, o) → o ∈ S.

That is, “if the analyzer claims that program p matches specification S, and the compiler compiles p to p′, and p′ executes on M with observation o, then o is permitted by S.” The statement of this theorem is independent of the semantics of the source language!

If we expand all the definitions in this theorem, the “big” things are A, C, S, and M. The consumer of this theorem doesn’t need to understand A or C; he just needs to make sure that A and C are installed, bit-for-bit, in his computer. The definition of S is just the specification of the desired behavior of the program, and this will be as large or as concise as the user needs.
The semantics M is only moderately large, depending on the machine architecture; in the FPCC project the Sparc architecture was specified in about 1500 lines of higher-order logic [32].
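The abstract shape of this end-to-end statement can be written down in a proof assistant; the following Lean sketch uses only placeholders (Prog, MProg, Obs and the relations A, C, M are parameters, not VST's actual Coq definitions), and specifications S are modeled simply as predicates on observations.

```lean
-- Abstract shape of the main theorem; every name here is a placeholder.
variable (Prog MProg Obs : Type)
variable (A : Prog → (Obs → Prop) → Prop)  -- the analyzer's claim A(p, S)
variable (C : Prog → MProg → Prop)         -- the compiler relation C(p, p')
variable (M : MProg → Obs → Prop)          -- machine semantics M(p', o)

def MainTheoremStatement : Prop :=
  ∀ (p : Prog) (p' : MProg) (S : Obs → Prop) (o : Obs),
    A p S → C p p' → M p' o → S o
```

Note that nothing in this statement mentions the source language's semantics: as the text argues, it appears only inside the proof, not in the theorem.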
What is not in the statement of this theorem is the operational or axiomatic semantics of the source language, or the semantic techniques used in defining the program logic! All of these are contained within the proof of the theorem, but they do not form part of the trusted base needed to interpret what the theorem claims. It seems paradoxical that a claim about the behavior of a program can be independent of the semantics of the programming language. But it works because of the end-to-end connection between the program logic and the compiler. The theorem can be read as, “When you apply the compiler to the syntax of this program, the result is a machine-language program with a certain behavior.” That sentence can be uttered without reference to the particular source-language semantics.

What else do you need to trust? The analyzer A and compiler C are written in Gallina, the pure functional programming language embedded in the Coq theorem prover. Coq contains software s₁ to check proofs about Gallina programs, and software s₂ to translate Gallina to ML. The OCaml compiler s₃ translates ML to machine language; the machine language runs with the OCaml runtime system s₄. All of these are in the trusted base, in the sense that bugs in the sᵢ can render the proved theorem useless. FPCC had a much smaller trusted base, avoiding s₁–s₄.⁷ But at least these components s₁–s₄ are fixed for all proofs and projects, and well tested by the community; and perhaps some of the techniques used in the FPCC project, such as a tiny independent proof checker, could be applied here as well to remove the sᵢ from the trusted base.
12 Conclusion

Highly expressive program logics require different semantic methods than compiler-correctness proofs. Proofs about a sequential thread require different kinds of reasoning than proofs about operating systems and concurrency. A top-to-bottom verified toolchain requires all these kinds of reasoning, and thus we choose the right formalism for each component, and prove the relationships between the formalisms. In this way we achieve a system that is sufficiently modular that its major components can be built and proved by entirely separate research groups. In the process, more work goes into thinking about specifications and interfaces than into the individual components. This is not a bad thing, in the end. A formal proof may be “wrong” for either of two reasons: there may be a mistake in one or more steps of the proof, so that the “proof” fails to prove the stated theorem; or the statement of the theorem may not be what was intended. Using a proof assistant to mechanically check the proof prevents the first kind of problem, but not the second. This is why I find it so important to do a big top-to-bottom system and connect all the components together in the same metalogic. For example, our paper in LICS’02 [2] was a correct proof of a nice-looking theorem (semantic model of a type system with mutable references, polymorphism, and recursive types), but only in the application to the big Foundational Proof-Carrying Code project did we discover that it was the wrong theorem: the type system had predicative quantification, but we needed impredicative. We had to go back to the drawing board and figure out how to do impredicativity as well
⁷ For the explanation of why no compilers are in FPCC’s trusted base, see [7, §8.2]. Unfortunately that argument does not apply to the VST.
16
A.W. Appel
[3,4]. Footnote 4 describes one such incident in the VST project, and there have been many more along the way. Big “systems” projects have an important place in research on the formal semantics of software. Acknowledgments. Gilles Barthe, Lennart Beringer, Alexey Gotsman, Julia Lawall, Aleks Nanevski, Gordon Stewart, and Shaz Qadeer all gave me very helpful suggestions on how to organize and present this material. This research was supported in part by the Air Force Office of Scientific Research (grant FA9550-09-1-0138) and the National Science Foundation (grant CNS-0910448).
References

1. Ahmed, A., Appel, A.W., Richards, C.D., Swadi, K.N., Tan, G., Wang, D.C.: Semantic foundations for typed assembly languages. ACM Trans. Program. Lang. Syst. 32(3), 1–67 (2010)
2. Ahmed, A., Appel, A.W., Virga, R.: A stratified semantics of general references embeddable in higher-order logic. In: 17th Annual IEEE Symp. on Logic in Computer Science, pp. 75–86 (June 2002)
3. Ahmed, A., Appel, A.W., Virga, R.: An indexed model of impredicative polymorphism and mutable references (January 2003), http://www.cs.princeton.edu/~appel/papers/impred.pdf
4. Ahmed, A.J.: Semantics of Types for Mutable State. PhD thesis, Princeton University, Princeton, NJ, Tech Report TR-713-04 (November 2004)
5. Appel, A.W.: Foundational proof-carrying code. In: Symp. on Logic in Computer Science (LICS 2001), pp. 247–258. IEEE, Los Alamitos (2001)
6. Appel, A.W., Melliès, P.-A., Richards, C.D., Vouillon, J.: A very modal model of a modern, major, general type system. In: Proc. 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2007), pp. 109–122 (January 2007)
7. Appel, A.W., Michael, N.G., Stump, A., Virga, R.: A trustworthy proof checker. J. Automated Reasoning 31, 231–260 (2003)
8. Birkedal, L., Reus, B., Schwinghammer, J., Støvring, K., Thamsborg, J., Yang, H.: Step-indexed Kripke models over recursive worlds (2010) (submitted for publication)
9. Blazy, S., Dargaye, Z., Leroy, X.: Formal verification of a C compiler front-end. In: Symp. on Formal Methods, pp. 460–475 (2006)
10. Boehm, H.-J.: Threads cannot be implemented as a library. In: PLDI 2005: 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, pp. 261–268 (2005)
11. Bornat, R., Calcagno, C., O’Hearn, P., Parkinson, M.: Permission accounting in separation logic. In: POPL 2005, pp. 259–270 (2005)
12. Boudol, G., Petri, G.: Relaxed memory models: an operational approach. In: POPL 2009, pp. 392–403 (2009)
13. Chen, J., Wu, D., Appel, A.W., Fang, H.: A provably sound TAL for back-end optimization. In: PLDI 2003: Proc. 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 208–219 (June 2003)
14. Crary, K., Sarkar, S.: Foundational certified code in the Twelf metalogical framework. ACM Trans. Comput. Logic 9(3), 1–26 (2008)
15. Dockins, R., Appel, A.W.: Observational oracular semantics for compiler correctness and language metatheory (2011) (in preparation)
16. Dockins, R., Hobor, A., Appel, A.W.: A fresh look at separation algebras and share accounting. In: Hu, Z. (ed.) APLAS 2009. LNCS, vol. 5904, pp. 161–177. Springer, Heidelberg (2009)
17. Feng, X., Ni, Z., Shao, Z., Guo, Y.: An open framework for foundational proof-carrying code. In: Proc. 2007 ACM SIGPLAN International Workshop on Types in Language Design and Implementation (TLDI 2007), January 2007, pp. 67–78. ACM Press, New York (2007)
18. Gotsman, A., Berdine, J., Cook, B., Rinetzky, N., Sagiv, M.: Local reasoning for storable locks and threads. In: Shao, Z. (ed.) APLAS 2007. LNCS, vol. 4807, pp. 19–37. Springer, Heidelberg (2007)
19. Gotsman, A., Berdine, J., Cook, B., Sagiv, M.: Thread-modular shape analysis. In: PLDI 2007: 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (2007)
20. Hobor, A.: Oracle Semantics. PhD thesis, Princeton University (2008)
21. Hobor, A., Appel, A.W., Nardelli, F.Z.: Oracle semantics for concurrent separation logic. In: Gairing, M. (ed.) ESOP 2008. LNCS, vol. 4960, pp. 353–367. Springer, Heidelberg (2008)
22. Hobor, A., Dockins, R., Appel, A.W.: A theory of indirection via approximation. In: POPL 2010: Proc. 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 171–184 (January 2010)
23. Klein, G., Nipkow, T.: A machine-checked model for a Java-like language, virtual machine and compiler. ACM Trans. on Programming Languages and Systems 28, 619–695 (2006)
24. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In: POPL 2006, pp. 42–54 (2006)
25. Leroy, X., Blazy, S.: Formal verification of a C-like memory model and its uses for verifying program transformations. Journal of Automated Reasoning 41(1), 1–31 (2008)
26. Mansky, W.: Automating separation logic for Concurrent C minor. Undergraduate thesis (May 2008)
27. Necula, G.: Proof-carrying code. In: 24th ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pp. 106–119. ACM Press, New York (1997)
28. O’Hearn, P.W.: Resources, concurrency and local reasoning. Theoretical Computer Science 375(1), 271–307 (2007)
29. Parkinson, M.J.: Local Reasoning for Java. PhD thesis, University of Cambridge (2005)
30. Schirmer, N.: Verification of Sequential Imperative Programs in Isabelle/HOL. PhD thesis, Technische Universität München (2006)
31. Sewell, P., Sarkar, S., Owens, S., Nardelli, F.Z., Myreen, M.O.: x86-TSO: a rigorous and usable programmer’s model for x86 multiprocessors. Commun. ACM 53(7), 89–97 (2010)
32. Wu, D., Appel, A.W., Stump, A.: Foundational proof checkers with small witnesses. In: 5th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming (August 2003)
Polymorphic Contracts

João Filipe Belo¹, Michael Greenberg¹, Atsushi Igarashi², and Benjamin C. Pierce¹

¹ University of Pennsylvania   ² Kyoto University
Abstract. Manifest contracts track precise properties by refining types with predicates—e.g., {x :Int | x > 0} denotes the positive integers. Contracts and polymorphism make a natural combination: programmers can give strong contracts to abstract types, precisely stating pre- and post-conditions while hiding implementation details—for example, an abstract type of stacks might specify that the pop operation has input type {x :α Stack | not (empty x )}. We formalize this combination by defining FH , a polymorphic calculus with manifest contracts, and establishing fundamental properties including type soundness and relational parametricity. Our development relies on a significant technical improvement over earlier presentations of contracts: instead of introducing a denotational model to break a problematic circularity between typing, subtyping, and evaluation, we develop the metatheory of contracts in a completely syntactic fashion, omitting subtyping from the core system and recovering it post facto as a derived property. Keywords: contracts, refinement types, preconditions, postconditions, dynamic checking, parametric polymorphism, abstract datatypes, syntactic proof, logical relations, subtyping.
1 Introduction
Software contracts allow programmers to state precise properties—e.g., that a function takes a non-empty list to a positive integer—as concrete predicates written in the same language as the rest of the program; these predicates can be checked dynamically as the program executes or, more ambitiously, verified statically with the assistance of a theorem prover. Findler and Felleisen [5] introduced “higher-order contracts” for functional languages; these can take one of two forms: predicate contracts like {x:Int | x > 0}, which denotes the positive numbers, and function contracts like x:Int → {y:Int | y ≥ x}, which denotes functions over the integers that return numbers larger than their inputs. Greenberg, Pierce, and Weirich [7] contrast two different approaches to contracts: in the manifest approach, contracts are types—the type system itself makes contracts ‘manifest’; in the latent approach, contracts and types live in different worlds (indeed, there may be no types at all, as in PLT Racket’s contract system [1]). These two presentations lead to different ways of checking contracts. Latent systems run contracts with checks: for example, ⟨{x:Int | x > 0}⟩^l n
G. Barthe (Ed.): ESOP 2011, LNCS 6602, pp. 18–37, 2011. © Springer-Verlag Berlin Heidelberg 2011
Polymorphic Contracts
19
checks that n > 0. If the check succeeds, then the entire expression will just return n. If it fails, then the entire program will “blame” the label l, raising an uncatchable exception ⇑l, pronounced “blame l”. Manifest systems use casts, ⟨Int ⇒ {x:Int | x > 0}⟩^l, to convert values from one type to another (the left-hand side is the source type and the right-hand side is the target type). For predicate contracts, a cast will behave just like a check on the target type: applied to n, the cast either returns n or raises ⇑l. Checks and casts differ when it comes to function contracts. A function check (⟨T1 → T2⟩^l v) v′ will reduce to ⟨T2⟩^l (v (⟨T1⟩^l v′)), giving v the argument checked at the domain contract and checking that the result satisfies the codomain contract. A function cast (⟨T11 → T12 ⇒ T21 → T22⟩^l v) v′ will reduce to ⟨T12 ⇒ T22⟩^l (v (⟨T21 ⇒ T11⟩^l v′)), wrapping the argument v′ in a (contravariant) cast between the domain types and wrapping the result of the application in a (covariant) cast between the codomain types. The differences between checks and casts are discussed at length in [7]. Both presentations have their pros and cons: latent contracts are simpler to design and extend, while manifest contracts make a clearer connection between the static constraints captured by types and the dynamic checks performed by casts. In this work, we consider the manifest approach and endeavor to tame its principal drawback: the complexity of its metatheory. We summarize the issues here, comparing our work to previous approaches more thoroughly in Section 6. Subtyping is the main source of complexity in the most expressive manifest calculi—those which have dependent functions and allow arbitrary terms in refinements [7,10]. These calculi have subtyping for two reasons. First, subtyping helps preserve types when evaluating casts with predicate contracts: if ⟨Int ⇒ {x:Int | x > 0}⟩^l n −→∗ n, then we need to type n at {x:Int | x > 0}.
Subtyping gives it to us, allowing n to be typed at any predicate contract it satisfies. Second, subtyping can show the equivalence of types with different but related term substitutions. Consider the standard dependent-function application rule:

Γ ⊢ e1 : (x:T1 → T2)    Γ ⊢ e2 : T1
───────────────────────────────────
Γ ⊢ e1 e2 : T2[e2/x]

If e2 −→ e2′, how do T2[e2/x] and T2[e2′/x] relate? (An important question when proving preservation!) Subtyping shows that these types are really the same: the first type parallel reduces to the second, and it can be shown that parallel reduction between types implies mutual subtyping—that is, equivalence. Subtyping brings its own challenges, though. A naïve treatment of subtyping introduces a circularity in the definition of the type system. Existing systems break this circularity by defining judgements in a careful order: first the evaluation relation and the corresponding parallel reduction relation; then a denotational semantics based on the evaluation relation and subtyping based on the denotational semantics; and finally the syntactic type system. Making this carefully sequenced series of definitions hold together requires a long series of tedious lemmas relating evaluation and parallel reduction. The upshot is that existing manifest calculi have taken considerable effort to construct. We propose here a simpler approach to manifest calculi that greatly simplifies their definition and metatheory. Rather than using subtyping, we define a type
20
J.F. Belo et al.
conversion relation based on parallel reduction. This avoids the original circularity without resorting to denotational semantics. Indeed, we can use this type conversion to give a completely syntactic account of type soundness—with just a few easy lemmas relating evaluation and parallel reduction. Moreover, eliminating subtyping doesn’t fundamentally weaken our approach, since we can define a subtyping relation and prove its soundness post facto. We bring this new technique to bear on FH, a manifest calculus with parametric polymorphism. Researchers have already studied the dynamic enforcement of parametric polymorphism in languages that mix (conventional, un-refined) static and dynamic typing (see Section 6); here we study the static enforcement of parametric polymorphism in languages that go beyond conventional static types by adding refinement types and dependent function contracts. Concretely, we offer four main contributions:

1. We devise a simpler approach to manifest contract calculi and apply it to FH, proving type soundness using straightforward syntactic methods [19].
2. We offer the first operational semantics for general refinements, where refinements can apply to any type—not just base types.
3. We prove that FH is relationally parametric—establishing that contract checking does not interfere with this desirable property.
4. We define a post facto subtyping relation and prove that “upcasts” from subtypes to supertypes always succeed in FH, i.e., that subtyping is sound.

We begin with some examples in Section 2. We then describe FH and prove type soundness in Section 3. We prove parametricity in Section 4 and the upcast lemma in Section 5. We discuss related work in Section 6 and conclude with ideas for future work in Section 7.
2 Examples
Like other manifest calculi, FH checks contracts with casts: the cast ⟨T1 ⇒ T2⟩^l takes a value of type T1 (the source type) and ensures that it behaves (and is treated) like a T2 (the target type). The l superscript is a blame label, used to differentiate between different casts and identify the source of failures. How we check ⟨T1 ⇒ T2⟩^l v depends on the structure of T1 and T2. Checking predicate contracts with casts is easy: if v satisfies the predicate of the target type, the entire application goes to v; if not, then the program aborts, “raising” blame, written ⇑l. For example, ⟨Int ⇒ {x:Int | x > 0}⟩^l 5 −→∗ 5, since 5 > 0. But ⟨Int ⇒ {x:Int | x > 0}⟩^l 0 −→∗ ⇑l, since 0 > 0 evaluates to false. When checking predicate contracts, only the target type matters—the type system guarantees that whatever value we have is well typed at the source type. Checking function contracts is a little trickier: what should ⟨Int → Int ⇒ {x:Int | x > 0} → {y:Int | y > 5}⟩^l v do? We can’t just open up v and check whether it always returns positives. The solution is to decompose the cast into its parts:

⟨Int → Int ⇒ {x:Int | x > 0} → {y:Int | y > 5}⟩^l v −→
  λx:{x:Int | x > 0}. (⟨Int ⇒ {y:Int | y > 5}⟩^l (v (⟨{x:Int | x > 0} ⇒ Int⟩^l x)))
Note that the domain cast is contravariant, while the codomain is covariant: the context will be forced by the type system to provide a positive number, so we need to cast the input to an appropriate type for v. (In this example, the contravariant cast ⟨{x:Int | x > 0} ⇒ Int⟩^l will always succeed.) After v returns, we run the covariant codomain cast to ensure that v didn’t misbehave. So:

⟨Int → Int ⇒ {x:Int | x > 0} → {y:Int | y > 5}⟩^l (λx:Int. x) 6 −→∗ 6
⟨· · ·⟩^l (λx:Int. 0) 6 −→∗ ⇑l
⟨· · ·⟩^l (λx:Int. 0) (⟨Int ⇒ {x:Int | x > 0}⟩^l 0) −→∗ ⇑l

Note that we omitted the case where a cast function is applied to 0. It is an important property of our system that 0 doesn’t have type {x:Int | x > 0}! With these preliminaries out of the way, we can approach our work: a manifest calculus with polymorphism. The standard polymorphic encodings of existential and product types transfer over to FH without a problem. Indeed, our dependent functions allow us to go one step further and encode even dependent products such as (x:Int) × {y:α List | length y = x}, which represents lists paired with their lengths. Let’s look at an example combining contracts and polymorphism—an abstract datatype of natural numbers.

NAT : ∃α. (zero : α) × (succ : (α → α)) × (iszero : (α → Bool)) × (pred : {x:α | not (iszero x)} → α)

(We omit the implementation, a standard Church encoding.) The NAT interface hides our encoding of the naturals behind an existential type, but it also requires that pred is only ever applied to terms of type {x:α | not (iszero x)}. Assuming that iszero v −→∗ true iff v = zero, we can infer that pred is never given zero as an argument. Consider the following expression, where I is the interface we specified for NAT and we omit the term binding for brevity:

unpack NAT : ∃α. I as α,
in pred (⟨α ⇒ {x:α | not (iszero x)}⟩^l zero) : α
The application of pred directly to zero would not be well typed, since zero : α. On the other hand, the cast term is well typed, since we cast zero to the type we need. Naturally, this cast will ultimately raise ⇑l, because not (iszero zero) −→∗ false. The example so far imposes constraints only on the use of the abstract datatype, in particular on the use of pred. To have constraints imposed also on the implementation of the abstract data type, consider the extension of the interface with a subtraction operation, sub, and a “less than or equal” predicate, leq. We now have the interface:

I′ = I × (leq : α → α → Bool) × (sub : (x:α → {y:α | leq y x} → {z:α | leq z x}))

The sub function requires that its second argument isn’t greater than the first, and it promises to return a result that isn’t greater than the first argument. We get contracts in interfaces by putting casts in the implementations. For example, the contracts on pred and sub are imposed when we “pack up” NAT; we write nat for the implementation type:

pack nat, (zero, succ, iszero, pred, leq, sub) as ∃α. I′
Types and contexts
  T ::= B | α | x:T1 → T2 | ∀α.T | {x:T | e}
  Γ ::= ∅ | Γ, x:T | Γ, α
Terms
  e ::= x | k | op(e1, …, en) | λx:T. e | Λα.e | e1 e2 | e T | ⟨T1 ⇒ T2⟩^l | ⇑l | ⟨{x:T | e1}, e2, v⟩^l
  v ::= k | λx:T. e | Λα.e | ⟨T1 ⇒ T2⟩^l
  r ::= v | ⇑l
  E ::= [ ] e2 | v1 [ ] | [ ] T | ⟨{x:T | e}, [ ], v⟩^l | op(v1, …, vi−1, [ ], ei+1, …, en)

Fig. 1. Syntax for FH
where:

pred = ⟨nat → nat ⇒ {x:nat | not (iszero x)} → nat⟩^l pred′
sub = ⟨nat → nat → nat ⇒ x:nat → {y:nat | leq y x} → {z:nat | leq z x}⟩^l sub′

That is, the existential type dictates that we must pack up cast versions of our implementations, pred′ and sub′. Note, however, that the cast on pred will never actually check anything at runtime: if we unfold the domain contract contravariantly, we see that ⟨{x:nat | not (iszero x)} ⇒ nat⟩^l is a no-op. Instead, clients of NAT can only call pred with terms that are typed at {x:nat | not (iszero x)}, i.e., by checking that values are nonzero with a cast into pred’s input type. The story is the same for the contract on sub’s second argument—the contravariant cast won’t actually check anything. The codomain contract on sub, however, could fail if sub′ mis-implemented subtraction. We can sum up the situation for contracts in interfaces as follows: the positive parts of the interface type are checked and can raise blame—these parts are the responsibility of the implementation; the negative parts of the interface type are not checked by the implementation—clients must check these themselves before calling functions from the ADT. Distributing obligations in this way recalls Findler and Felleisen’s seminal idea of client and server blame [5].
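The blame flow in these examples can be made concrete with a small executable sketch. Plain Python stands in for FH here, and all the names (`Blame`, `cast_pred`, `cast_fun`, `pred_impl`) are illustrative rather than part of any formal development: predicate casts pass the value through or blame their label, function casts wrap contravariantly and covariantly, and the client-side cast into pred's input type is what blames when zero is passed.

```python
class Blame(Exception):
    def __init__(self, label):
        self.label = label

def cast_pred(pred, label):
    """A predicate-contract cast: return the value unchanged, or blame."""
    def run(v):
        if pred(v):
            return v
        raise Blame(label)
    return run

def cast_fun(dom_cast, cod_cast):
    """A function-contract cast: contravariant on the domain,
    covariant on the codomain."""
    def run(f):
        return lambda x: cod_cast(f(dom_cast(x)))
    return run

identity = lambda x: x

# <Int -> Int  =>  {x:Int | x > 0} -> {y:Int | y > 5}>^l
wrap = cast_fun(identity,                       # {x|x>0} => Int is a no-op
                cast_pred(lambda y: y > 5, "l"))
assert wrap(lambda x: x)(6) == 6                # codomain check passes
try:
    wrap(lambda x: 0)(6)                        # codomain check blames l
    assert False
except Blame as b:
    assert b.label == "l"

# The NAT interface: the packed pred's contravariant cast checks nothing,
# so clients must cast their argument into {x:nat | not (iszero x)}.
zero, succ, iszero = 0, (lambda n: n + 1), (lambda n: n == 0)
pred_impl = lambda n: n - 1                     # pred': assumes n nonzero
into_nonzero = cast_pred(lambda x: not iszero(x), "l2")

assert pred_impl(into_nonzero(succ(zero))) == zero
try:
    pred_impl(into_nonzero(zero))               # the client-side cast blames l2
    assert False
except Blame as b:
    assert b.label == "l2"
```

The sketch mirrors the obligation split described above: the server's codomain cast guards the implementation, while the client's cast into the input type guards the use.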
3 Defining FH
The syntax of FH is given in Figure 1. For unrefined types we have: base types B , which must include Bool; type variables α; dependent function types x :T1 → T2 where x is bound in T2 ; and universal types ∀α.T , where α is bound in T . Aside from dependency in function types, these are just the types of the standard polymorphic lambda calculus. As usual, we write T1 → T2 for x :T1 → T2 when x does not appear free in T2 . We also have predicate contracts, or refinement types, written {x :T | e}. Conceptually, {x :T | e} denotes values v of type T for which e[v /x ] reduces to true. For each B , we fix a set KB of the constants in that type; we require our typing rules for constants and our typing and evaluation rules for operations to respect this set. We also require that KBool = {true, false}. In the syntax of terms, the first line is standard for a call-by-value polymorphic language: variables, constants, several monomorphic first-order operations
op (i.e., destructors of one or more base-type arguments), term and type abstractions, and term and type applications. The second line offers the standard constructs of a manifest contract calculus [6,7,10], with a few alterations, discussed below. Casts are the distinguishing feature of manifest contract calculi. When applied to a value of type T1, the cast ⟨T1 ⇒ T2⟩^l ensures that its argument behaves—and is treated—like a value of type T2. When a cast detects a problem, it raises blame, a label-indexed uncatchable exception written ⇑l. The label l allows us to trace blame back to a specific cast. (While our labels here are drawn from an arbitrary set, in practice l will refer to a source-code location.) Finally, we use active checks ⟨{x:T | e1}, e2, v⟩^l to support a small-step semantics for checking casts into refinement types. In an active check, {x:T | e1} is the refinement being checked, e2 is the current state of checking, and v is the value being checked. The type in the first position of an active check isn’t necessary for the operational semantics, but we keep it around as a technical aid to type soundness. If checking succeeds, the check will return v; if checking fails, the check will blame its label, raising ⇑l. Active checks and blame are not intended to occur in source programs—they are runtime devices. (In a real programming language based on this calculus, casts will probably not appear explicitly either, but will be inserted by an elaboration phase. The details of this process are beyond the scope of the present work.) The values in FH are constants, term and type abstractions, and casts. We also define results, which are either values or blame. (Type soundness—a consequence of Theorems 2 and 3 below—will show that evaluation produces a result, but not necessarily a value.) In some earlier work [7,8], casts between function types applied to values were themselves considered values.
We make the other choice here: excluding applications from the possible syntactic forms of values simplifies our inversion lemmas. There are two notable features relative to existing manifest calculi: first, any type (even a refinement type) can be refined, not just base types (as in [6,7,8,10,12]); second, the third part of the active check form ⟨{x:T | e1}, e2, v⟩^l can be any value, not just a constant. Both of these changes are motivated by the introduction of polymorphism. In particular, to support refinement of type variables we must allow refinements of all types, since any type can be substituted in for a variable.

Operational Semantics

The call-by-value operational semantics in Figure 2 is given as a small-step relation, split into two sub-relations: one for reductions (⇝) and one for congruence and blame lifting (−→). The latter relation is standard. The E Reduce rule lifts reductions into −→; the E Compat rule turns −→ into a congruence over our evaluation contexts; and the E Blame rule lifts blame, treating it as an uncatchable exception. The reduction relation is more interesting. There are four different kinds of
Reduction rules  e1 ⇝ e2

op(v1, …, vn) ⇝ [[op]](v1, …, vn)   (E Op)
(λx:T1. e12) v2 ⇝ e12[v2/x]   (E Beta)
(Λα.e) T ⇝ e[T/α]   (E TBeta)
⟨T ⇒ T⟩^l v ⇝ v   (E Refl)
⟨x:T11 → T12 ⇒ x:T21 → T22⟩^l v ⇝ λx:T21. (⟨T12[⟨T21 ⇒ T11⟩^l x / x] ⇒ T22⟩^l (v (⟨T21 ⇒ T11⟩^l x)))
  when x:T11 → T12 ≠ x:T21 → T22   (E Fun)
⟨∀α.T1 ⇒ ∀α.T2⟩^l v ⇝ Λα.(⟨T1 ⇒ T2⟩^l (v α))
  when ∀α.T1 ≠ ∀α.T2   (E Forall)
⟨{x:T1 | e} ⇒ T2⟩^l v ⇝ ⟨T1 ⇒ T2⟩^l v
  when T2 ≠ {x:T1 | e} and T2 ≠ {y:{x:T1 | e} | e2}   (E Forget)
⟨T1 ⇒ {x:T2 | e}⟩^l v ⇝ ⟨T2 ⇒ {x:T2 | e}⟩^l (⟨T1 ⇒ T2⟩^l v)
  when T1 ≠ T2 and T1 ≠ {x:T′ | e′}   (E PreCheck)
⟨T ⇒ {x:T | e}⟩^l v ⇝ ⟨{x:T | e}, e[v/x], v⟩^l   (E Check)
⟨{x:T | e}, true, v⟩^l ⇝ v   (E OK)
⟨{x:T | e}, false, v⟩^l ⇝ ⇑l   (E Fail)

Evaluation rules  e1 −→ e2

e1 ⇝ e2
─────────  (E Reduce)
e1 −→ e2

e1 −→ e2
───────────────  (E Compat)
E[e1] −→ E[e2]

───────────────  (E Blame)
E[⇑l] −→ ⇑l

Fig. 2. Operational semantics
reductions: the standard lambda calculus reductions, structural cast reductions, cast staging reductions, and checking reductions. The E Beta and E TBeta rules should need no explanation—these are the standard call-by-value polymorphic lambda calculus reductions. The E Op rule uses a denotation function [[−]] to give meaning to our first-order operations. The E Refl, E Fun, and E Forall rules are structural cast reductions. E Refl eliminates a cast from a type to itself; intuitively, such a cast should always succeed anyway. (We discuss this rule more in Section 4.) When a cast between function types is applied to a value v, the E Fun rule produces a new lambda, wrapping v with a contravariant cast on the domain and a covariant cast on the codomain. The extra substitution in the left-hand side of the codomain cast may seem suspicious, but in fact the rule must be this way in order for type preservation to hold (see [7] for an explanation). The E Forall rule is similar to E Fun, generating a type abstraction with the necessary covariant cast. Side conditions on E Forall and E Fun ensure that these rules apply only when E Refl doesn’t. The E Forget, E PreCheck, and E Check rules are cast-staging reductions, breaking a complex cast down into a series of simpler casts and checks. All of these rules require that the left- and right-hand sides of the cast be
different—if they are the same, then E Refl applies. The E Forget rule strips a layer of refinement off the left-hand side; in addition to requiring that the left- and right-hand sides are different, the preconditions require that the right-hand side isn’t a refinement of the left-hand side. The E PreCheck rule breaks a cast into two parts: one that checks exactly one level of refinement and another that checks the remaining parts. We only apply this rule when the two sides of the cast are different and when the left-hand side isn’t a refinement. The E Check rule applies when the right-hand side refines the left-hand side; it takes the cast value and checks that it satisfies the right-hand side. (We don’t have to check the left-hand side, since that’s the type we’re casting from.) Before explaining how these rules interact in general, we offer a few examples. First, here is a reduction using E Check, E Compat, E Op, and E OK:

⟨Int ⇒ {x:Int | x ≥ 0}⟩^l 5 −→ ⟨{x:Int | x ≥ 0}, 5 ≥ 0, 5⟩^l −→ ⟨{x:Int | x ≥ 0}, true, 5⟩^l −→ 5

A failed check will work the same way until the last reduction, which will use E Fail rather than E OK:

⟨Int ⇒ {x:Int | x ≥ 0}⟩^l (−1) −→ ⟨{x:Int | x ≥ 0}, −1 ≥ 0, −1⟩^l −→ ⟨{x:Int | x ≥ 0}, false, −1⟩^l −→ ⇑l

Notice that the blame label comes from the cast that failed. Here is a similar reduction that needs some staging, using E Forget followed by the first reduction we gave:

⟨{x:Int | x = 5} ⇒ {x:Int | x ≥ 0}⟩^l 5 −→ ⟨Int ⇒ {x:Int | x ≥ 0}⟩^l 5 −→ ⟨{x:Int | x ≥ 0}, 5 ≥ 0, 5⟩^l −→∗ 5

There are two cases where we need to use E PreCheck. First, when multiple refinements are involved:

⟨Int ⇒ {x:{y:Int | y ≥ 0} | x = 5}⟩^l 5
 −→ ⟨{y:Int | y ≥ 0} ⇒ {x:{y:Int | y ≥ 0} | x = 5}⟩^l (⟨Int ⇒ {y:Int | y ≥ 0}⟩^l 5)
 −→∗ ⟨{y:Int | y ≥ 0} ⇒ {x:{y:Int | y ≥ 0} | x = 5}⟩^l 5
 −→ ⟨{x:{y:Int | y ≥ 0} | x = 5}, 5 = 5, 5⟩^l
 −→∗ 5

Second, when casting a function or universal type into a refinement of a different function or universal type.
⟨Bool → {x:Bool | x} ⇒ {f:Bool → Bool | f true = f false}⟩^l v −→
  ⟨Bool → Bool ⇒ {f:Bool → Bool | f true = f false}⟩^l (⟨Bool → {x:Bool | x} ⇒ Bool → Bool⟩^l v)

E Refl is necessary for simple cases, like ⟨Int ⇒ Int⟩^l 5 −→ 5. Hopefully, such a silly cast would never be written, but it could arise as a result of E Fun or E Forall. (We also need E Refl in our proof of parametricity; see Section 4.)
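The interaction of E Refl, E Forget, E PreCheck, and E Check on towers of refinements can be modeled with a tiny evaluator. This is an illustrative sketch, not the formal semantics: predicates are named so that types compare structurally, and function and universal casts are omitted.

```python
class Blame(Exception):
    def __init__(self, label):
        self.label = label

# Named predicates, so refinement types compare structurally
PREDS = {"ge0": lambda x: x >= 0, "eq5": lambda x: x == 5}

INT = ("Int",)

def ref(T, p):
    """The refinement type {x:T | PREDS[p](x)}."""
    return ("ref", T, p)

def cast(T1, T2, l, v):
    """Evaluate <T1 => T2>^l v by the staged cast rules."""
    if T1 == T2:                                         # E Refl
        return v
    if T1[0] == "ref" and not (T2[0] == "ref" and T2[1] == T1):
        return cast(T1[1], T2, l, v)                     # E Forget
    if T2[0] == "ref" and T1 != T2[1]:
        # E PreCheck: peel one refinement off the target
        return cast(T2[1], T2, l, cast(T1, T2[1], l, v))
    if T2[0] == "ref" and T1 == T2[1]:                   # E Check
        if PREDS[T2[2]](v):
            return v                                     # E OK
        raise Blame(l)                                   # E Fail
    raise TypeError("incompatible cast")

# <Int => {x:Int | x >= 0}>^l 5  -->*  5
assert cast(INT, ref(INT, "ge0"), "l", 5) == 5
# <{x:Int | x = 5} => {x:Int | x >= 0}>^l 5 forgets, then checks
assert cast(ref(INT, "eq5"), ref(INT, "ge0"), "l", 5) == 5
# Nested refinements trigger E PreCheck
assert cast(INT, ref(ref(INT, "ge0"), "eq5"), "l", 5) == 5
```

Tracing the last call reproduces the multiple-refinement example above: the outer refinement is peeled off, the inner check runs first, and each one-level check either passes the value through or blames l.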
Cast evaluation follows a regular schema:

Refl | (Forget∗ (Refl | (PreCheck∗ (Refl | Fun | Forall)? Check∗)))

Let’s consider the cast ⟨T1 ⇒ T2⟩^l v. To simplify the following discussion, we define unref(T) as T without any outer refinements (though refinements on, e.g., the domain of a function would be unaffected); we write unref^n(T) when we remove only the n outermost refinements:

unref(T) = unref(T′)   if T = {x:T′ | e}
unref(T) = T           otherwise

First, if T1 = T2, we can apply E Refl and be done with it. If that doesn’t work, we’ll reduce by E Forget until the left-hand side doesn’t have any refinements. (N.B. we may not have to make any of these reductions.) Either all of the refinements will be stripped away from the source type, or E Refl eventually applies and the entire cast disappears. Assuming E Refl doesn’t apply, we now have ⟨unref(T1) ⇒ T2⟩^l v. Next, we apply E PreCheck until the cast is completely decomposed into one-step casts, once for each refinement in T2:

⟨unref^1(T2) ⇒ T2⟩^l (⟨unref^2(T2) ⇒ unref^1(T2)⟩^l (… (⟨unref(T1) ⇒ unref(T2)⟩^l v) …))

As our next step, we apply whichever structural cast rule applies to ⟨unref(T1) ⇒ unref(T2)⟩^l v, one of E Refl, E Fun, or E Forall. Now all that remains are some number of refinement checks, which can be dispatched by the E Check rule (and other rules, of course, during the predicate checks themselves).

Static Typing

The type system comprises three mutually recursive judgments: context well formedness, type well formedness, and term well typing. The rules for contexts and types are unsurprising. The rules for terms are mostly standard. First, the T App rule is dependent, to account for dependent function types. The T Cast rule is standard for manifest calculi, allowing casts between compatibly structured well formed types. Compatibility of type structures is defined in Figure 4; in short, compatible types erase to identical simple type skeletons. Note that we assign casts a non-dependent function type.
The T Op rule uses the ty function to assign (possibly dependent) monomorphic first-order types to our operations; we require that ty(op) and [[op]] agree. Some of the typing rules—T Check, T Blame, T Exact, T Forget, and T Conv—are “runtime only”. We don’t expect to use these rules to type check source programs, but we need them to guarantee preservation. Note that the conclusions of these rules use a context Γ , but their premises don’t use Γ at all. Even though runtime terms and their typing rules should only ever occur in an empty context, the T App rule substitutes terms into types—so a runtime term could end up under a binder. We therefore allow the runtime typing rules
Polymorphic Contracts
Context well formedness ⊢ Γ

(WF Empty)      ⊢ ∅
(WF ExtendVar)  ⊢ Γ    Γ ⊢ T   ⟹   ⊢ Γ, x:T
(WF ExtendTVar) ⊢ Γ   ⟹   ⊢ Γ, α

Type well formedness Γ ⊢ T

(WF Base)    ⊢ Γ   ⟹   Γ ⊢ B
(WF TVar)    ⊢ Γ    α ∈ Γ   ⟹   Γ ⊢ α
(WF Fun)     Γ ⊢ T1    Γ, x:T1 ⊢ T2   ⟹   Γ ⊢ x:T1 → T2
(WF Forall)  Γ, α ⊢ T   ⟹   Γ ⊢ ∀α.T
(WF Refine)  Γ ⊢ T    Γ, x:T ⊢ e : Bool   ⟹   Γ ⊢ {x:T | e}

Term typing Γ ⊢ e : T

(T Var)    ⊢ Γ    x:T ∈ Γ   ⟹   Γ ⊢ x : T
(T Const)  ⊢ Γ   ⟹   Γ ⊢ k : ty(k)
(T Abs)    Γ, x:T1 ⊢ e12 : T2   ⟹   Γ ⊢ λx:T1. e12 : x:T1 → T2
(T App)    Γ ⊢ e1 : (x:T1 → T2)    Γ ⊢ e2 : T1   ⟹   Γ ⊢ e1 e2 : T2[e2/x]
(T Op)     ⊢ Γ    ty(op) = x1:T1 → ... → xn:Tn → T    Γ ⊢ ei[e1/x1, ..., ei−1/xi−1] : Ti[e1/x1, ..., ei−1/xi−1]   ⟹   Γ ⊢ op(e1, ..., en) : T[e1/x1, ..., en/xn]
(T TAbs)   Γ, α ⊢ e : T   ⟹   Γ ⊢ Λα.e : ∀α.T
(T TApp)   Γ ⊢ e1 : ∀α.T    Γ ⊢ T2   ⟹   Γ ⊢ e1 T2 : T[T2/α]
(T Cast)   Γ ⊢ T1    Γ ⊢ T2    T1 ∥ T2   ⟹   Γ ⊢ ⟨T1 ⇒ T2⟩l : T1 → T2
(T Blame)  ⊢ Γ    ∅ ⊢ T   ⟹   Γ ⊢ ⇑l : T
(T Check)  ∅ ⊢ {x:T | e1}    ∅ ⊢ v : T    ∅ ⊢ e2 : Bool    e1[v/x] −→∗ e2   ⟹   Γ ⊢ ⟨{x:T | e1}, e2, v⟩l : {x:T | e1}
(T Exact)  ∅ ⊢ v : T    ∅ ⊢ {x:T | e}    e[v/x] −→∗ true   ⟹   Γ ⊢ v : {x:T | e}
(T Forget) ∅ ⊢ v : {x:T | e}   ⟹   Γ ⊢ v : T
(T Conv)   ∅ ⊢ e : T    T ≡ T′    ∅ ⊢ T′   ⟹   Γ ⊢ e : T′

Fig. 3. Typing rules
Type compatibility T1 ∥ T2

(C Refl)     T ∥ T
(C Fun)      T11 ∥ T21    T12 ∥ T22   ⟹   x:T11 → T12 ∥ x:T21 → T22
(C RefineL)  T1 ∥ T2   ⟹   {x:T1 | e} ∥ T2
(C RefineR)  T1 ∥ T2   ⟹   T1 ∥ {x:T2 | e}
(C Forall)   T1 ∥ T2   ⟹   ∀α.T1 ∥ ∀α.T2

Fig. 4. Type compatibility
to apply in any well formed context, so long as the terms they type check are closed. The T Blame rule allows us to give any type to blame—this is necessary for preservation. The T Check rule types an active check, ⟨{x:T | e1}, e2, v⟩l. Such a term arises when a term like ⟨T ⇒ {x:T | e1}⟩l v reduces by E Check. The premises of the rule are all intuitive except for e1[v/x] −→∗ e2, which is necessary to avoid nonsensical terms like ⟨{x:T | x ≥ 0}, true, −1⟩l, where the wrong predicate gets checked. The T Exact rule allows us to retype a closed value v of type T at {x:T | e} if e[v/x] −→∗ true. This typing rule guarantees type preservation for E OK: ⟨{x:T | e1}, true, v⟩l −→ v. If the active check was well typed, then we know that e1[v/x] −→∗ true, so T Exact applies. Finally, the T Conv rule allows us to retype expressions at convertible types: if ∅ ⊢ e : T and T ≡ T′, then ∅ ⊢ e : T′ (or in any well formed context Γ). We define ≡ as the symmetric, transitive closure of call-by-value respecting parallel reduction, which we write ⇛. The T Conv rule is necessary to prove preservation in the case where e1 e2 −→ e1 e2′. Why? The first term is typed at T2[e2/x] (by T App), but reapplying T App types the second term at T2[e2′/x]. Conveniently, T2[e2/x] ⇛ T2[e2′/x], so the two are convertible if we take parallel reduction as our type conversion. Naturally, we have to take the transitive closure so we can string together conversion derivations. We take the symmetric closure, since it is easier for us to work with an equivalence. In previous work, subtyping is used instead of the ≡ relation; one of our contributions is the insight that subtyping—with its accompanying metatheoretical complications—is not an essential component of manifest calculi. We define type compatibility and a few metatheoretically useful operators in Figure 4.

Lemma 1 (Canonical forms). If ∅ ⊢ v : T, then:
1. If unref(T) = B then v = k ∈ KB for some k.
2. If unref(T) = x:T1 → T2 then v is (a) λx:T1′. e12 with T1 ≡ T1′, for some x, T1′ and e12, or (b) ⟨T1′ ⇒ T2′⟩l with T1′ ≡ T1 and T2′ ≡ T2, for some T1′, T2′, and l.
3. If unref(T) = ∀α.T′ then v is Λα.v′ for some v′.

Theorem 2 (Progress). If ∅ ⊢ e : T, then either e −→ e′ or e is a result.

Theorem 3 (Preservation). If ∅ ⊢ e : T and e −→ e′, then ∅ ⊢ e′ : T.
Closed terms r1 ∼ r2 : T; θ; δ and e1 ≃ e2 : T; θ; δ

k ∼ k : B; θ; δ ⟺ k ∈ KB
v1 ∼ v2 : α; θ; δ ⟺ ∃R, T1, T2. α ↦ (R, T1, T2) ∈ θ ∧ v1 R v2
v1 ∼ v2 : (x:T1 → T2); θ; δ ⟺ ∀v1′ ∼ v2′ : T1; θ; δ. v1 v1′ ≃ v2 v2′ : T2; θ; δ[v1′, v2′/x]
v1 ∼ v2 : ∀α.T; θ; δ ⟺ ∀R, T1, T2. v1 T1 ≃ v2 T2 : T; θ[α ↦ (R, T1, T2)]; δ
v1 ∼ v2 : {x:T | e}; θ; δ ⟺ v1 ∼ v2 : T; θ; δ ∧ θ1(δ1(e))[v1/x] −→∗ true ∧ θ2(δ2(e))[v2/x] −→∗ true
⇑l ∼ ⇑l : T; θ; δ

e1 ≃ e2 : T; θ; δ ⟺ ∃r1, r2. e1 −→∗ r1 ∧ e2 −→∗ r2 ∧ r1 ∼ r2 : T; θ; δ

Types T1 ≃ T2 : ∗; θ; δ

B ≃ B : ∗; θ; δ
α ≃ α : ∗; θ; δ
x:T11 → T12 ≃ x:T21 → T22 : ∗; θ; δ ⟺ T11 ≃ T21 : ∗; θ; δ ∧ ∀v1 ∼ v2 : T11; θ; δ. T12 ≃ T22 : ∗; θ; δ[v1, v2/x]
∀α.T1 ≃ ∀α.T2 : ∗; θ; δ ⟺ ∀R, T1′, T2′. T1 ≃ T2 : ∗; θ[α ↦ (R, T1′, T2′)]; δ
{x:T1 | e1} ≃ {x:T2 | e2} : ∗; θ; δ ⟺ T1 ≃ T2 : ∗; θ; δ ∧ ∀v1 ∼ v2 : T1; θ; δ. θ1(δ1(e1))[v1/x] ≃ θ2(δ2(e2))[v2/x] : Bool; θ; δ

Open terms and types Γ ⊢ θ; δ and Γ ⊢ e1 ≃ e2 : T and Γ ⊢ T1 ≃ T2 : ∗

Γ ⊢ θ; δ ⟺ (∀x:T ∈ Γ. θ1(δ1(x)) ≃ θ2(δ2(x)) : T; θ; δ) ∧ (∀α ∈ Γ. ∃R, T1, T2. α ↦ (R, T1, T2) ∈ θ)
Γ ⊢ e1 ≃ e2 : T ⟺ ∀Γ ⊢ θ; δ. θ1(δ1(e1)) ≃ θ2(δ2(e2)) : T; θ; δ
Γ ⊢ T1 ≃ T2 : ∗ ⟺ ∀Γ ⊢ θ; δ. T1 ≃ T2 : ∗; θ; δ

Fig. 5. The logical relation for parametricity
Given standard weakening, substitution, and inversion lemmas, the syntactic proof of type soundness is straightforward. It is easy to restrict FH to a simply typed calculus with a similar type soundness proof.
4 Parametricity
We prove relational parametricity for two reasons: (1) it gives us powerful reasoning techniques such as free theorems [17], and (2) it indicates that contracts don’t interfere with type abstraction. Our proof is standard: we define a (syntactic) logical relation where each type is interpreted as a relation on terms and the relation at type variables is given as a parameter. In the next section, we will define a subtype relation and show that an upcast—a cast whose source type is a subtype of the target type—is logically related to the identity function. Since our logical relation is an adequate congruence, it is contained in contextual equivalence. Therefore, upcasts are contextually equivalent to the identity and can be eliminated without changing the meaning of a program. We begin by defining two relations: r1 ∼ r2 : T ; θ; δ relates closed results, defined by induction on types; e1 e2 : T ; θ; δ relates closed expressions which
evaluate to results in the first relation. The definitions are shown in Figure 5.¹ Both relations have three indices: a type T, a substitution θ for type variables, and a substitution δ for term variables. A type substitution θ, which gives the interpretation of free type variables in T, maps a type variable to a triple (R, T1, T2) comprising a binary relation R on terms and two closed types T1 and T2. We require that R be closed under parallel reduction (the ⇛ relation). A term substitution δ maps variables to pairs of closed values. We write θi (i = 1, 2) for the substitution that maps a type variable α to Ti where θ(α) = (R, T1, T2). We define the projections δi similarly. With these definitions out of the way, the term relation is mostly straightforward. First, ⇑l is related to itself at every type. A base type B gives the identity relation on KB, the set of constants of type B. A type variable α simply uses the relation assumed in the substitution θ. Related functions map related arguments to related results. Type abstractions are related when their bodies are parametric in the interpretation of the type variable. Finally, two values are related at a refinement type when they are related at the underlying type and both satisfy the predicate; here, the predicate e gets closed by applying the substitutions. The ∼ relation on results is extended to the ≃ relation on closed terms in a straightforward manner: terms are related if and only if they both terminate at related results. We extend the ≃ relation to open terms, written Γ ⊢ e1 ≃ e2 : T, relating terms that are related when closed by any "Γ-respecting" pair of substitutions θ and δ (written Γ ⊢ θ; δ, also defined in Figure 5). To show that a (well-typed) cast is logically related to itself, we also need a relation on types T1 ≃ T2 : ∗; θ; δ; we define this relation in Figure 5. We use the logical relation on terms to handle the arguments of function types and refinement types.
Note that T1 and T2 are not necessarily closed; terms in refinement types, which should be related at Bool, are closed by applying substitutions. In the function/refinement type cases, the relation on a smaller type is universally quantified over logically related values. There are two choices of the type at which they should be related (for example, the second line of the function type case could change T11 to T21), but it does not really matter which we choose since they are related types. Here, we have chosen the type from the left-hand side; in our proof, we justify this choice by proving a "type exchange" lemma that lets us replace a type index T1 in the term relation by T2 when T1 ≃ T2 : ∗. Finally, we lift our type relation to open types: we write Γ ⊢ T1 ≃ T2 : ∗ when two types are equivalent for any Γ-respecting substitutions. It is worth discussing a few points peculiar to our formulation. First, we allow any relation on terms closed under parallel reduction to be used in θ; terms related at T need not be well typed at T. The standard formulation of a logical relation is well typed throughout, requiring that the relation R in every triple be well typed, only relating values of type T1 to values of type T2 (e.g., [14]). We have two motivations for leaving our relations untyped. First, functions of type
¹ To save space, we write ⇑l ∼ ⇑l : T; θ; δ separately instead of manually adding such a clause for each type.
x:T1 → T2 must map related values (v1 ∼ v2 : T1) to related results...but at what type? While T2[v1/x] and T2[v2/x] are related in our type logical relation, terms that are well typed at one type won't necessarily be well typed at the other. Second, we prove in Section 5 that upcasts have no effect: if T1

1, t = t1 ∗ (t2 ∗ · · · ∗ tn). In listings, we rely on syntactic abbreviations available in F#, such as layout conventions (to suppress in keywords) and writing tuples as M1, . . . , Mn without enclosing parentheses. Let a typing environment, Γ, be a list of the form ε, x1 : t1, . . . , xn : tn; we say Γ is well formed and write ⊢ Γ to mean that the variables xi are pairwise distinct. Let dom(Γ) = {x1, . . . , xn} if Γ = ε, x1 : t1, . . . , xn : tn. We sometimes use the notation x : t for Γ = ε, x1 : t1, . . . , xn : tn where x = x1, . . . , xn and t = t1, . . . , tn. The typing rules for this monomorphic first-order language are standard.
J. Borgström et al.
Representative Typing Rules for Fun Expressions: Γ ⊢ M : t

(FUN OPERATOR)  ⊗ : b1, b2 → b3    Γ ⊢ V1 : b1    Γ ⊢ V2 : b2   ⟹   Γ ⊢ V1 ⊗ V2 : b3
(FUN RANDOM)    D : (x1 : b1 ∗ · · · ∗ xn : bn) → b    Γ ⊢ V : (b1 ∗ · · · ∗ bn)   ⟹   Γ ⊢ random (D(V)) : b
(FUN OBSERVE)   Γ ⊢ V : b   ⟹   Γ ⊢ observe V : unit
3 Semantics as Measure Transformers

If we can only sample from discrete distributions, the semantics of Fun is straightforward. In our technical report, we formalize the sampling semantics of the previous section as a small-step operational semantics for the fragment of Fun where every random expression takes the form random (Bernoulli(c)) for some real c ∈ (0, 1). A reduction M →p M′ means that M reduces to M′ with non-zero probability p. We cannot give such a semantics to expressions that sample from continuous distributions, such as random (Gaussian(1, 1)), since the probability of any particular sample is zero. A further difficulty is the need to observe events with probability zero, a common situation in machine learning. For example, consider the naive Bayesian classifier, a common, simple probabilistic model. In the training phase, it is given objects together with their classes and the values of their pertinent features. Below, we show the training for a single feature: the weight of the object. The zero-probability events are weight measurements, assumed to be normally distributed around the class mean. The outcome of the training is the posterior weight distributions for the different classes.

Naive Bayesian Classifier, Single Feature Training:
let wPrior() = sample (Gaussian(0.5,1.0))
let Glass,Watch,Plate = wPrior(),wPrior(),wPrior()
let weight objClass objWeight =
    observe (objWeight−(sample (Gaussian(objClass,1.0))))
weight Glass .18; weight Glass .21
weight Watch .11; weight Watch .073
weight Plate .23; weight Plate .45
Watch,Glass,Plate

Above, the call weight Glass .18 modifies the distribution of the variable Glass. The example uses observe (x−y) to denote that the difference between the weights x and y is 0. The reason for not instead writing x=y is that conditioning on events of zero probability without specifying the random variable they are drawn from is not in general well-defined, cf. Borel's paradox [12].
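Since the weight observations above are Gaussian with known noise variance, the posterior for each class mean is available in closed form by standard Gaussian conjugacy. This sketch (function name ours, not from the paper) computes what training on the two Glass weights should yield, given the prior N(0.5, 1.0) and unit observation noise from the listing:

```python
# Conjugate update for a Gaussian mean with known noise variance:
# prior N(m0, v0), each observation drawn from N(theta, v).
def gaussian_posterior(m0, v0, v, observations):
    precision = 1.0 / v0 + len(observations) / v
    mean = (m0 / v0 + sum(observations) / v) / precision
    return mean, 1.0 / precision

# Glass was observed at weights 0.18 and 0.21:
glass_mean, glass_var = gaussian_posterior(0.5, 1.0, 1.0, [0.18, 0.21])
# posterior precision = 1 + 2 = 3, posterior mean = (0.5 + 0.39) / 3
```

This is the same posterior an exact inference backend should report for the Glass variable after the two calls to weight Glass.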
To avoid this issue, we instead observe the random variable x−y of type real, at the value 0. To give a formal semantics to such observations, as well as to mixtures of continuous and discrete distributions, we turn to measure theory, following standard sources [3]. Two basic concepts are measurable spaces and measures. A measurable space is a set of values equipped with a collection of measurable subsets; these measurable sets
Measure Transformer Semantics for Bayesian Machine Learning
generalize the events of discrete probability. A finite measure is a function that assigns a numeric size to each measurable set; measures generalize probability distributions.

3.1 Types as Measurable Spaces

We let Ω range over sets of possible outcomes; in our semantics Ω will range over B = {true, false}, Z, R, and finite Cartesian products of these sets. A σ-algebra over Ω is a set M ⊆ P(Ω) which (1) contains ∅ and Ω, and (2) is closed under complement and countable union and intersection. A measurable space is a pair (Ω, M) where M is a σ-algebra over Ω; the elements of M are called measurable sets. We use the notation σΩ(S), when S ⊆ P(Ω), for the smallest σ-algebra over Ω that is a superset of S; we may omit Ω when it is clear from context. If (Ω, M) and (Ω′, M′) are measurable spaces, then the function f : Ω → Ω′ is measurable if and only if for all A ∈ M′, f⁻¹(A) ∈ M, where the inverse image f⁻¹ : P(Ω′) → P(Ω) is given by f⁻¹(A) ≜ {ω ∈ Ω | f(ω) ∈ A}. We write f⁻¹(x) for f⁻¹({x}) when x ∈ Ω′. We give each first-order type t an interpretation as a measurable space T[[t]] ≜ (Vt, Mt) below. We write () for the unit value.

Semantics of Types as Measurable Spaces:
T[[unit]] = ({()}, {{()}, ∅})
T[[bool]] = (B, P(B))
T[[int]] = (Z, P(Z))
T[[real]] = (R, σR({[a, b] | a, b ∈ R}))
T[[t ∗ u]] = (Vt × Vu, σVt×Vu({m × n | m ∈ Mt, n ∈ Mu}))

The set σR({[a, b] | a, b ∈ R}) in the definition of T[[real]] is the Borel σ-algebra on the real line, which is the smallest σ-algebra containing all closed (and open) intervals. Below, we write f : t → u to denote that f : Vt → Vu is measurable, that is, that f⁻¹(B) ∈ Mt for all B ∈ Mu.

3.2 Finite Measures

A finite measure μ on a measurable space (Ω, M) is a function M → R+ that is countably additive, that is, if the sets A0, A1, . . . ∈ M are pairwise disjoint, then μ(∪i Ai) = ∑i μ(Ai). We write |μ| ≜ μ(Ω). Let M t be the set of finite measures on the measurable space T[[t]].
We make use of the following constructions on measures.
– Given a function f : t → u and a measure μ ∈ M t, there is a measure μf⁻¹ ∈ M u given by (μf⁻¹)(B) ≜ μ(f⁻¹(B)).
– Given a finite measure μ and a measurable set B, we let μ|B(A) ≜ μ(A ∩ B) be the restriction of μ to B.
– We can add two measures on the same set as (μ1 + μ2)(A) ≜ μ1(A) + μ2(A).
– The (independent) product (μ1 × μ2) of two measures is also definable, and satisfies (μ1 × μ2)(A × B) = μ1(A) · μ2(B). (Existence and uniqueness follow from the Hahn-Kolmogorov theorem.)
– Given a measure μ on the measurable space T[[t]], a measurable set A ∈ Mt and a function f : t → real, we write ∫A f dμ or equivalently ∫A f(x) dμ(x) for standard (Lebesgue) integration. This integration is always well-defined if μ is finite and f is non-negative and bounded from above.
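When the measure is finite and discrete, these constructions specialize to simple dictionary manipulations. A sketch in Python (representation and names ours):

```python
# Finite discrete measures as dicts {outcome: mass}.

def pushforward(mu, f):
    """(mu f^-1)(B) = mu(f^-1(B)) for a function f on outcomes."""
    out = {}
    for x, m in mu.items():
        out[f(x)] = out.get(f(x), 0.0) + m
    return out

def restrict(mu, B):
    """mu|_B(A) = mu(A ∩ B): keep only the mass inside B."""
    return {x: m for x, m in mu.items() if x in B}

def add(mu1, mu2):
    """(mu1 + mu2)(A) = mu1(A) + mu2(A)."""
    return {x: mu1.get(x, 0.0) + mu2.get(x, 0.0) for x in set(mu1) | set(mu2)}

def product(mu1, mu2):
    """(mu1 × mu2)(A × B) = mu1(A) · mu2(B), on pairs of outcomes."""
    return {(x, y): m1 * m2 for x, m1 in mu1.items() for y, m2 in mu2.items()}
```

In the discrete case the Hahn-Kolmogorov construction is unnecessary: the product is determined directly by its values on singletons.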
– Given a measure μ on a measurable space T[[t]], let a function μ̇ : t → real be a density for μ iff μ(A) = ∫A μ̇ dλ for all A ∈ M, where λ is the standard Lebesgue measure on T[[t]]. (We also use λ-notation for functions, but we trust any ambiguity is easily resolved.)

Standard Distributions. Given a closed well-typed Fun expression random (D(V)) of base type b, we define a corresponding finite measure μD(V) on the measurable space T[[b]]. In the discrete case, we first define probability masses D(V) c of single elements, and hence of singleton sets, and then define the measure μD(V) as a countable sum.

Masses D(V) c and Measures μD(V) for Discrete Probability Distributions:
Bernoulli(p) true ≜ p                              if 0 ≤ p ≤ 1, 0 otherwise
Bernoulli(p) false ≜ 1 − p                         if 0 ≤ p ≤ 1, 0 otherwise
Binomial(n, p) i ≜ C(n, i) p^i (1 − p)^(n−i)       if 0 ≤ p ≤ 1, 0 otherwise
DiscreteUniform(m) i ≜ 1/m                         if 0 ≤ i < m, 0 otherwise
Poisson(l) n ≜ e^(−l) l^n / n!                     if l, n ≥ 0, 0 otherwise
μD(V)(A) ≜ ∑i D(V) ci                              if A = ∪i {ci} for pairwise distinct ci
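The discrete masses can be transcribed directly; this Python sketch (function names ours) follows the table above, including the out-of-range clauses that send bad parameters to mass 0:

```python
import math

# Probability masses D(V) c for the discrete distributions in the table.
def bernoulli(p, b):
    if not 0 <= p <= 1:
        return 0.0
    return p if b else 1 - p

def binomial(n, p, i):
    # standard binomial mass C(n, i) p^i (1-p)^(n-i)
    if not 0 <= p <= 1 or not 0 <= i <= n:
        return 0.0
    return math.comb(n, i) * p**i * (1 - p)**(n - i)

def discrete_uniform(m, i):
    return 1.0 / m if 0 <= i < m else 0.0

def poisson(l, n):
    if l < 0 or n < 0:
        return 0.0
    return math.exp(-l) * l**n / math.factorial(n)
```

Summing any of these masses over an enumeration of the support recovers the countable-sum definition of μD(V).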
In the continuous case, we first define probability densities D(V) r at individual elements r, and then define the measure μD(V) as an integral. Below, we write G for the standard Gamma function, which on naturals n satisfies G(n) = (n − 1)!.

Densities D(V) r and Measures μD(V) for Continuous Probability Distributions:
Gaussian(m, v) r ≜ e^(−(r−m)²/2v) / √(2πv)                 if v > 0, 0 otherwise
Gamma(s, p) r ≜ r^(s−1) e^(−pr) p^s / G(s)                 if r, s, p > 0, 0 otherwise
Beta(a, b) r ≜ r^(a−1) (1 − r)^(b−1) G(a + b)/(G(a)G(b))   if a, b ≥ 0 and 0 ≤ r ≤ 1, 0 otherwise
μD(V)(A) ≜ ∫A D(V) dλ                                      where λ is the Lebesgue measure

The Dirac δ measure is defined on the measurable space T[[b]] for each base type b, and is given by δc(A) ≜ 1 if c ∈ A, 0 otherwise. We write δ for δ0.0. The notion of density can be generalized as follows, yielding an unnormalized counterpart to conditional probability. Given a measure μ on T[[t]] and a measurable function p : t → b, we consider the family of events p(x) = c where c ranges over Vb. We define μ̇[A||p = c] ∈ R (the μ-density at p = c of A) following [8], by:

Conditional Density: μ̇[A||p = c]
μ̇[A||p = c] ≜ lim i→∞ μ(A ∩ p⁻¹(Bi)) / ∫Bi 1 dλ

if the limit exists and is the same for all sequences {Bi} of closed sets converging regularly to c. Where defined, letting A ∈ Mt and B ∈ Mb, conditional density satisfies the equation

∫B μ̇[A||p = x] d(μp⁻¹)(x) = μ(A ∩ p⁻¹(B)).
In particular, we have μ̇[A||p = c] = 0 if b is discrete and μ(p⁻¹(c)) = 0. To show that our definition of conditional density generalizes the notion of density given above, we have that if μ has a continuous density μ̇ on some neighbourhood of p⁻¹(c), then

μ̇[A||p = c] = ∫A δc(p(x)) μ̇(x) dλ(x).
3.3 Measure Transformers

We will now recast some standard theorems of measure theory as a library of combinators that we will later use to give semantics to probabilistic languages. A measure transformer is a function from finite measures to finite measures. We let t ↝ u be the set of functions M t → M u. We use the following combinators on measure transformers in the formal semantics of our languages.

Measure Transformer Combinators:
pure ∈ (t → u) → (t ↝ u)
>>> ∈ (t1 ↝ t2) → (t2 ↝ t3) → (t1 ↝ t3)
choose ∈ (Vt → (t ↝ u)) → (t ↝ u)
extend ∈ (Vt → M u) → (t ↝ (t ∗ u))
observe ∈ (t → b) → (t ↝ t)

The definitions of these combinators occupy the remainder of this section. We recall that μ denotes a measure and A a measurable set, of appropriate types. To lift a pure measurable function to a measure transformer, we use the combinator pure ∈ (t → u) → (t ↝ u). Given f : t → u, we let pure f μ A ≜ μ(f⁻¹(A)), where μ is a measure on T[[t]] and A is a measurable set from T[[u]]. To sequentially compose two measure transformers we use standard function composition, defining >>> ∈ (t1 ↝ t2) → (t2 ↝ t3) → (t1 ↝ t3) as T >>> U ≜ U ∘ T. The combinator choose ∈ (Vt → (t ↝ u)) → (t ↝ u) makes a conditional choice between measure transformers, if its first argument is measurable and has finite range. Intuitively, choose K μ first splits Vt into the equivalence classes induced by K. For each equivalence class, we then run the corresponding measure transformer on μ restricted to the class. Finally, the resulting finite measures are added together, yielding a finite measure. We let choose K μ A ≜ ∑ T ∈ range(K) T(μ|K⁻¹(T))(A). In particular, if K is a binary choice mapping all elements of B to TB and all elements of the complement C of B to TC, we have choose K μ A = TB(μ|B)(A) + TC(μ|C)(A). (In fact, our only uses of choose in this paper are in the semantics of conditional expressions in Fun and conditional statements in Imp, and in each case the argument K to choose is a binary choice.)
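Specialized to finite discrete measures, the combinators admit a direct reading in code. This Python sketch represents measures as dicts from outcomes to masses; the uncurried signatures and all names are ours:

```python
# Measure-transformer combinators on finite discrete measures.

def pure(f):
    """pure f mu A = mu(f^-1(A)): push the measure forward through f."""
    def t(mu):
        out = {}
        for x, m in mu.items():
            out[f(x)] = out.get(f(x), 0.0) + m
        return out
    return t

def seq(t1, t2):
    """The >>> combinator: T >>> U = U . T."""
    return lambda mu: t2(t1(mu))

def choose(k):
    """Split mu by the transformer k assigns to each outcome, run each
    piece on its restriction of mu, and add the results together."""
    def t(mu):
        pieces = {}
        for x, m in mu.items():
            pieces.setdefault(k(x), {})[x] = m
        out = {}
        for trans, piece in pieces.items():
            for y, m in trans(piece).items():
                out[y] = out.get(y, 0.0) + m
        return out
    return t

def extend(m):
    """Pair each outcome x with a draw from the measure m(x)."""
    def t(mu):
        return {(x, y): mx * my
                for x, mx in mu.items()
                for y, my in m(x).items()}
    return t

def observe(p, zero=0):
    """Unnormalized conditioning on the event p(x) = zero."""
    return lambda mu: {x: m for x, m in mu.items() if p(x) == zero}
```

Note that observe merely filters mass without renormalizing, which is exactly the behaviour the theorem of Section 3.4 depends on.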
The combinator extend ∈ (Vt → M u) → (t ↝ (t ∗ u)) extends the domain of a measure using a function yielding measures. It is reminiscent of creating a dependent pair, since the distribution of the second component depends on the value of the first. For extend m to be defined, we require that for every A ∈ Mu, the function fA ≜ λx.m(x)(A) is measurable, non-negative and bounded from above. This will always be the case in our semantics for Fun, since we only use the standard distributions for m above. We let extend m μ A ≜ ∫Vt m(x)({y | (x, y) ∈ A}) dμ(x), where
we integrate over the first component (call it x) with respect to the measure μ , and the integrand is the measure m(x) of the set {y | (x, y) ∈ A} for each x. The combinator observe ∈ (t → b) → (t t) conditions a measure over T[[t]] on the event that an indicator function of type t → b is zero. Here observation is unnormalized conditioning of a measure on an event. We define: μ˙ [A||p = 0b ] if μ (p−1 (0b )) = 0 observe p μ A μ (A ∩ p−1 (0b )) otherwise As an example, if p : t → bool is a predicate on values of type t, we have observe p μ A = μ (A ∩ {x | p(x) = true}). In the continuous case, if Vt = R × Rk , p = λ (y, x).(y − c) and μ has density μ˙ then observe p μ A =
A
μ (y, x) d(δc × λ )(y, x) =
μ˙ (c, x) d λ (x).
{x|(c,x)∈A}
Notice that observe p μ A can be greater than μ(A), for which reason we cannot restrict ourselves to transformation of (sub-)probability measures.

3.4 Measure Transformer Semantics of Fun

In order to give a compositional denotational semantics of Fun programs, we give a semantics to open programs, later to be placed in some closing context. Since observations change the distributions of program variables, we may draw a parallel to while programs. In this setting, we can give a denotation to a program as a function from variable valuations to a return value and a variable valuation. Similarly, we give semantics to an open Fun term by mapping a measure over assignments to the term's free variables to a joint measure of the term's return value and assignments to its free variables. This choice is a generalization of the (discrete) semantics of pWHILE [2]. First, we define a data structure for an evaluation environment assigning values to variable names, and corresponding operations. Given an environment Γ = x1:t1, . . . , xn:tn, we let SΓ be the set of states, or finite maps s = {x1 → V1, . . . , xn → Vn} such that for all i = 1, . . . , n, ε ⊢ Vi : ti. We let T[[SΓ]] ≜ T[[t1 ∗ · · · ∗ tn]] be the measurable space of states in SΓ. We define dom(s) ≜ {x1, . . . , xn}. We define the following operators.

Auxiliary Operations on States and Pairs:
add x (s, V) ≜ s ∪ {x → V}     if ε ⊢ V : t and x ∉ dom(s); s otherwise
lookup x s ≜ s(x)              if x ∈ dom(s); () otherwise
drop X s ≜ {(x → V) ∈ s | x ∉ X}
fst((x, y)) ≜ x        snd((x, y)) ≜ y
We apply these combinators to give a semantics to Fun programs as measure transformers. We assume that all bound variables in a program are different from the free variables and each other. Below, V[[V ]] s gives the valuation of V in state s, and A[[M]] gives the measure transformer denoted by M.
Measure Transformer Semantics of Fun:
V[[x]] s ≜ lookup x s
V[[c]] s ≜ c
V[[(V1, V2)]] s ≜ (V[[V1]] s, V[[V2]] s)
A[[V]] ≜ pure λs.(s, V[[V]] s)
A[[V1 ⊗ V2]] ≜ pure λs.(s, (V[[V1]] s) ⊗ (V[[V2]] s))
A[[V.1]] ≜ pure λs.(s, fst(V[[V]] s))
A[[V.2]] ≜ pure λs.(s, snd(V[[V]] s))
A[[if V then M else N]] ≜ choose λs. if V[[V]] s then A[[M]] else A[[N]]
A[[random (D(V))]] ≜ extend λs. μD(V[[V]] s)
A[[observe V]] ≜ (observe λs. V[[V]] s) >>> pure λs.(s, ())
A[[let x = M in N]] ≜ A[[M]] >>> pure (add x) >>> A[[N]] >>> pure λ(s, y).((drop {x} s), y)

A value expression V returns the valuation of V in the current state, which is left unchanged. Similarly, binary operations and projections have a deterministic meaning given the current state. An if V expression runs the measure transformer given by the then branch on the states where V evaluates true, and the transformer given by the else branch on all other states, using the combinator choose. A primitive distribution random (D(V)) extends the state measure with a value drawn from the distribution D, with parameters V depending on the current state. An observation observe V modifies the current measure by restricting it to states where V is zero. It is implemented with the observe combinator, and it always returns the unit value. The expression let x = M in N intuitively first runs M and binds its return value to x using add. After running N, the binding is discarded using drop.

Lemma 1. If s : SΓ and Γ ⊢ V : t then V[[V]] s ∈ Vt.

Lemma 2. If Γ ⊢ M : t then A[[M]] ∈ SΓ ↝ (SΓ ∗ t).

The measure transformer semantics of Fun is hard to use directly, except in the case of discrete measures, where it can be implemented directly: a naive implementation of M SΓ is as a map assigning a probability to each possible variable valuation. If there are N variables, each sampled from a Bernoulli distribution, in the worst case there are 2^N paths to be explored in the computation, each of which corresponds to a variable valuation.
In this simple case, the measure transformer semantics of closed programs also coincides with the sampling semantics. We write PM[value = V | valid] for the probability that a run of M returns V given that all observations in the run succeed.

Theorem 1. Suppose ε ⊢ M : t for some M only using Bernoulli distributions. If μ = A[[M]] δ() and ε ⊢ V : t then PM[value = V | valid] = μ({V})/|μ|.

A consequence of the theorem is that our measure transformer semantics is a generalization of the sampling semantics for discrete probabilities. For this theorem to hold, it is critical that observe denotes unnormalized conditioning (filtering). Otherwise programs that perform observations inside the branches of conditional expressions would
have undesired semantics. As the following example shows, the two program fragments observe (x=y) and if x then observe (y=true) else observe (y=false) would have different measure transformer semantics although they have the same sampling semantics.

Simple Conditional Expression:
Mif ≜
  let x = sample (Bernoulli(0.5))
  let y = sample (Bernoulli(0.1))
  if x then observe (y=true) else observe (y=false)
  y
In the sampling semantics, the two valid runs are when x and y are both true (with probability 0.05), and both false (with probability 0.45), so we have P[true | valid] = 0.1 and P[false | valid] = 0.9. If, instead of the unnormalized definition observe p μ A = μ(A ∩ {x | p(x)}), we had either of the flawed definitions

observe p μ A = μ(A ∩ {x | p(x)}) / |μ|    or    observe p μ A = μ(A ∩ {x | p(x)}) / μ({x | p(x)})

then A[[Mif]] δ() {true} = A[[Mif]] δ() {false}, which would invalidate the theorem. Let M′ be Mif with observe (x = y) substituted for the conditional expression. With the actual or either of the flawed definitions of observe we have A[[M′]] δ() {true} = (A[[M′]] δ() {false})/9.
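The numbers above can be checked by brute-force enumeration of the sampling semantics of Mif; this Python sketch is ours:

```python
from itertools import product

# Exhaustive enumeration of M_if: x ~ Bernoulli(0.5), y ~ Bernoulli(0.1).
# A run is valid when the observation in the branch taken succeeds,
# i.e. exactly when x = y.
mass = {}
for x, y in product([True, False], repeat=2):
    p = 0.5 * (0.1 if y else 0.9)     # P(x) * P(y)
    if y == x:                        # the observation succeeds
        mass[y] = mass.get(y, 0.0) + p

total = sum(mass.values())            # 0.05 + 0.45 = 0.5
posterior = {y: m / total for y, m in mass.items()}
```

The computed posterior gives P[true | valid] = 0.1 and P[false | valid] = 0.9, matching the prose.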
4 Semantics as Factor Graphs A naive implementation of the measure transformer semantics of the previous section would work directly with measures of states, whose size could be exponential in the number of variables in scope. For large models, this becomes intractable. In this section, we instead give a semantics to Fun programs as factor graphs [18], whose size will be linear in the size of the program. We define this semantics in two steps. We first compile the Fun program into a program in the simple imperative language Imp, and then the Imp program itself has a straightforward semantics as a factor graph. Our semantics formalizes the way in which our implementation maps F# programs to Csoft programs, which are evaluated by Infer.NET by constructing suitable factor graphs. The implementation advantage of translating F# to Csoft, over simply generating factor graphs directly [22], is that the translation preserves the structure of the input model (including array processing in our full language), which can be exploited by the various inference algorithms supported by Infer.NET. 4.1 Imp: An Imperative Core Calculus Imp is an imperative language, based on the static single assignment (SSA) intermediate form. It is a sublanguage of Csoft, the input language of Infer.NET [25], and is intended to have a simple semantics as a factor graph. A composite statement C is a sequence of
statements, each of which either stores the result of a primitive operation in a location, observes the contents of a location to be zero, or branches on the value of a location. Imp shares the base types b with Fun, but has no tuples.

Syntax of Imp:
l, l′, . . .                      locations (variables) in global store
E, F ::= c | l | (l ⊗ l)          expression
I ::=                             statement
    l ← E                           assignment
    l ←s D(l1, . . . , ln)          random assignment
    observeb l                      observation
    if l thenΣ1 C1 elseΣ2 C2        conditional
C ::= nil | I | (C;C)             composite statement
When making an observation observeb, we make explicit the type b of the observed location. In the form if l thenΣ1 C1 elseΣ2 C2, the environments Σ1 and Σ2 declare the local variables assigned by the then branch and the else branch, respectively. These annotations simplify type checking and denotational semantics. The typing rules for Imp are standard. We consider Imp typing environments Σ to be a special case of Fun environments Γ, where variables (locations) always map to base types. The judgment Σ ⊢ C : Σ′ means that the composite statement C is well-typed in the initial environment Σ, yielding additional bindings Σ′.

Part of the Type System for Imp: Σ ⊢ C : Σ′

(IMP SEQ)      Σ ⊢ C1 : Σ′    (Σ, Σ′) ⊢ C2 : Σ″   ⟹   Σ ⊢ C1;C2 : (Σ′, Σ″)
(IMP NIL)      ⊢ Σ   ⟹   Σ ⊢ nil : ε
(IMP ASSIGN)   Σ ⊢ E : b    l ∉ dom(Σ)   ⟹   Σ ⊢ l ← E : (ε, l:b)
(IMP OBSERVE)  Σ ⊢ l : b   ⟹   Σ ⊢ observeb l : ε
(IMP IF)       Σ ⊢ l : bool    Σ ⊢ C1 : Σ1′    Σ ⊢ C2 : Σ2′    Σi′ = (Σi, Σ′) for i = 1, 2   ⟹   Σ ⊢ if l thenΣ1 C1 elseΣ2 C2 : Σ′
4.2 Translating from Fun to Imp

The translation from Fun to Imp is a mostly routine compilation of functional code to imperative code. The main point of interest is that Imp locations only hold values of base type, while Fun variables may hold tuples. We rely on patterns p and layouts ρ to track the Imp locations corresponding to Fun environments. The technical report has the detailed definition of the following notations.

Notations for the Translation from Fun to Imp:
p ::= l | () | (p, p)          pattern: group of Imp locations to represent a Fun value
ρ ::= (xi → pi) i∈1..n         layout: finite map from Fun variables to patterns
Σ ⊢ p : t                      in environment Σ, pattern p represents a Fun value of type t
Σ ⊢ ρ : Γ                      in environment Σ, layout ρ represents environment Γ
ρ ⊢ M ⇒ C, p                   given ρ, expression M translates to C and pattern p
J. Borgstr¨om et al.
4.3 Factor Graphs

A factor graph [18] represents a joint probability distribution of a set of random variables as a collection of multiplicative factors. Factor graphs are an effective means of stating conditional independence properties between variables, and enable efficient algebraic inference techniques [27,38] as well as sampling techniques [15, Chapter 12]. We use factor graphs with gates [26] for modelling if-then-else clauses; gates introduce second-order edges in the graph.

Factor Graphs:

G ::= new x : b in {e1, . . . , em}     Graph
x, y, z, . . .                          Nodes (random variables)
e ::=                                   Edge
    Equal(x, y)                           equality (x = y)
    Constantc(x)                          constant (x = c)
    Binop⊗(x, y, z)                       binary operator (x = y ⊗ z)
    SampleD(x, y1, . . . , yn)            sampling (x ∼ D(y1, . . . , yn))
    Gate(x, G1, G2)                       gate (if x then G1 else G2)
In a graph new x : b in {e1, . . . , em}, the variables xi are bound; graphs are identified up to consistent renaming of bound variables. We write {e1, . . . , em} for new ε in {e1, . . . , em}. We write fv(G) for the variables occurring free in G. Here is an example factor graph GE. (The corresponding Fun source code is listed in the technical report.)

Factor Graph for Epidemiology Example:

GE = {Constant0.01(pd),
      SampleB(has_disease, pd),
      Gate(has_disease,
           new pp : real in {Constant0.8(pp), SampleB(positive_result, pp)},
           new pn : real in {Constant0.096(pn), SampleB(positive_result, pn)}),
      Constanttrue(positive_result)}

A factor graph typically denotes a probability distribution. The probability (density) of an assignment of values to variables is equal to the product of all the factors, averaged over all assignments to local variables. Here, we give a slightly more general semantics of factor graphs as measure transformers; the input measure corresponds to a prior factor over all variables that it mentions. Below, we use the Iverson brackets, where [p] is 1 when p is true and 0 otherwise. We let δ(x = y) ≜ δ0(x − y) when x, y denote real numbers, and [x = y] otherwise.
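Since every variable in GE is Boolean, the distribution it denotes can be computed by brute-force enumeration of the factor product, following the Iverson-bracket reading above. The sketch below is our own illustration, not the paper's implementation:

```python
from itertools import product

# Factors of G_E: has_disease ~ Bernoulli(0.01);
# positive_result ~ Bernoulli(0.8) if has_disease else Bernoulli(0.096);
# observation: positive_result = true.

def weight(has_disease, positive_result):
    p_d = 0.01
    w = p_d if has_disease else 1.0 - p_d          # SampleB(has_disease, p_d)
    p = 0.8 if has_disease else 0.096              # Gate chooses p_p or p_n
    w *= p if positive_result else 1.0 - p         # SampleB(positive_result, .)
    w *= 1.0 if positive_result else 0.0           # Constant_true(positive_result)
    return w

# Sum the unnormalised weights over all assignments; normalising the
# weights consistent with has_disease = true yields the posterior.
total = sum(weight(d, r) for d, r in product([False, True], repeat=2))
posterior_disease = sum(weight(True, r) for r in [False, True]) / total
print(posterior_disease)  # ~0.0776: a positive test is still weak evidence
```

The posterior 0.008/0.10304 ≈ 0.078 is the standard base-rate result: because the disease is rare, most positive results are false positives.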
Semantics of Factor Graphs: J[[G]]ΣΣ' ∈ SΣ ⇝ SΣ,Σ'

J[[G]]ΣΣ' μ A ≜ ∫A (J[[G]] s) d(μ × λ)(s)

J[[new x : b in {e}]] s ≜ ∫∏i Vbi ∏j (J[[ej]] (s, x)) dλ(x)

J[[Equal(l, l')]] s ≜ δ(lookup l s = lookup l' s)
J[[Constantc(l)]] s ≜ δ(lookup l s = c)
J[[Binop⊗(l, w1, w2)]] s ≜ δ(lookup l s = lookup w1 s ⊗ lookup w2 s)
Measure Transformer Semantics for Bayesian Machine Learning
J[[SampleD(l, v1, . . . , vn)]] s ≜ μD(lookup v1 s, ..., lookup vn s)(lookup l s)
J[[Gate(v, G1, G2)]] s ≜ (J[[G1]] s)^[lookup v s] · (J[[G2]] s)^[¬lookup v s]
4.4 Factor Graph Semantics for Imp

An Imp statement has a straightforward semantics as a factor graph. Here, observation is defined by the value of the variable being the constant 0b.

Factor Graph Semantics of Imp: G = G[[C]]

G[[nil]] ≜ ∅
G[[C1;C2]] ≜ G[[C1]] ∪ G[[C2]]
G[[l ← c]] ≜ {Constantc(l)}
G[[l ← l']] ≜ {Equal(l, l')}
G[[l ← l1 ⊗ l2]] ≜ {Binop⊗(l, l1, l2)}
G[[l ←s− D(l1, . . . , ln)]] ≜ {SampleD(l, l1, . . . , ln)}
G[[observeb l]] ≜ {Constant0b(l)}
G[[if l thenΣ1 C1 elseΣ2 C2]] ≜ {Gate(l, new Σ1 in G[[C1]], new Σ2 in G[[C2]])}

The following theorem asserts that the semantics of Fun coincides with the semantics of Imp for compatible measures, which are defined as follows. If T : t ⇝ u is a measure transformer composed from the combinators of Section 3 and μ ∈ M t, we say that T is compatible with μ if every application of observe f to some μ' in the evaluation of T(μ) satisfies either that f is discrete or that μ' has a continuous density on some ε-neighbourhood of f⁻¹(0.0).

The statement of the theorem needs some additional notation. If Σ ⊢ p : t and s ∈ SΣ, we write p s for the reconstruction of an element of T[[t]] by looking up the locations of p in the state s. We define as follows operations lift and restrict to translate between states consisting of Fun variables (SΓ) and states consisting of Imp locations (SΣ), where flatten takes a mapping from patterns to values to a mapping from locations to base values.

lift ρ ≜ λs. flatten {ρ(x) → V[[x]] s | x ∈ dom(ρ)}
restrict ρ ≜ λs. {x → V[[ρ(x)]] s | x ∈ dom(ρ)}

Theorem 2. If Γ ⊢ M : t and Σ ⊢ ρ : Γ and ρ ⊢ M ⇒ C, p and measure μ ∈ M SΓ is compatible with A[[M]] then there exists Σ' such that Σ ⊢ C : Σ' and:

A[[M]] μ = (pure (lift ρ) >>> J[[G[[C]]]]ΣΣ' >>> pure (λs. (restrict ρ s, p s))) μ.

Proof. Via a direct measure transformer semantics for Imp. The proof is by induction on the typing judgments Γ ⊢ M : t and Σ ⊢ C : Σ'.
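The translation G[[C]] above is a one-pass, syntax-directed map over statements. A minimal sketch in Python (our own illustration; the tagged-tuple representation of statements and edges is an assumption, not prescribed by the paper):

```python
def graph(C):
    """Map an Imp statement (tagged tuple) to a list of factor-graph edges,
    following the equations for G[[C]]."""
    tag = C[0]
    if tag == "nil":
        return []
    if tag == "seq":                 # G[[C1;C2]] = G[[C1]] u G[[C2]]
        return graph(C[1]) + graph(C[2])
    if tag == "assign_const":        # l <- c
        return [("Constant", C[2], C[1])]
    if tag == "assign_loc":          # l <- l'
        return [("Equal", C[1], C[2])]
    if tag == "assign_binop":        # l <- l1 (x) l2
        return [("Binop", C[2], C[1], C[3], C[4])]
    if tag == "sample":              # l <-s- D(l1, ..., ln)
        return [("Sample", C[2], C[1], tuple(C[3]))]
    if tag == "observe":             # observe_b l: constant 0_b on l
        return [("Constant", ("zero", C[1]), C[2])]
    if tag == "if":                  # gate with branch-local variables
        l, s1, C1, s2, C2 = C[1:]
        return [("Gate", l, (tuple(s1), graph(C1)), (tuple(s2), graph(C2)))]
    raise ValueError(C)
```

For instance, translating the sequence "x ← 0.5; flip ←s− B(x)" yields a Constant edge followed by a Sample edge.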
5 Implementation Experience We implemented a compiler from Fun to Imp in F#. We wrote two backends for Imp: an exact inference algorithm based on a direct implementation of measure transformers for
discrete measures, and an approximating inference algorithm for continuous measures, using Infer.NET [25]. Translating Imp to Infer.NET is relatively straightforward, and amounts to a syntax-directed series of calls to Infer.NET's object-oriented API. We report statistics on a few of the examples we have implemented. The lines-of-code number includes F# code that loads and processes data from disk before loading it into Infer.NET. The times are based on an average of three runs. All of the runs are on a four-core machine with 4GB of RAM. The Naive Bayes program is the naive Bayesian classifier of the earlier examples. The Mixture model is another clustering/classification model. TrueSkill is a tournament ranking model, and adPredictor is a simplified version of a model to predict the likelihood that a display advertisement will be clicked. In the two long-running examples, time is spent mostly loading and processing data from disk and running inference in Infer.NET. TrueSkill spends the majority of its time (64%) in Infer.NET, performing inference. AdPredictor spends most of the time in pre-processing (58%), and only 40% in inference. The time spent in our compiler is negligible, never more than a few hundred milliseconds. Summary of our Basic Test Suite: Naive Bayes Mixture TrueSkill adPredictor
LOC Observations Variables Time 28 9 3 2 ·
d5'. Since the combined system is unsatisfiable, we conclude that the update for this mode-combination includes d1' = 2 · d5. The complete affine update consists of:

d1' = 2 · d5    d5' = 2 · d5 + d2
d2' = d2        d6' = 2 · d6 + d4
d3' = 2 · d6    d7' = 2 · d6 + d2
d4' = d4        d8' = 2 · d5 + d4
Observe that these linear symbolic update operations are optimal.
Transfer Function Synthesis without Quantifier Elimination
Reflections on octagonal transfer functions. Interestingly, Miné [22, Fig. 27] also discusses the relative precision of transfer functions, though where the base semantics is polyhedral rather than Boolean. Using his classification, the transfer functions derived using the synthesis techniques presented in Sects. 3.3 and 3.4 might be described as medium and exact.

3.5 Inferring Bounds for Octagons
For a final example, consider the following code block:

1: AND R0, 15;   2: AND R1, 15;   3: XOR R0, R1;   4: ADD R0, R1;
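Because the block is so small, the bounds derived below can be cross-checked by exhaustive enumeration of all initial register contents. This script is our own sanity check, assuming unsigned 8-bit wrap-around semantics for ADD (as in the AVR-like model used in the paper):

```python
# Execute the block  AND R0,15; AND R1,15; XOR R0,R1; ADD R0,R1
# for every pair of initial 8-bit register values, tracking output maxima.
max_r0 = max_r1 = max_sum = 0
for a in range(256):
    for b in range(256):
        r0, r1 = a & 15, b & 15        # AND R0,15; AND R1,15
        r0 ^= r1                       # XOR R0,R1
        r0 = (r0 + r1) & 0xFF          # ADD R0,R1 (wrap-around; never triggered here)
        max_r0 = max(max_r0, r0)
        max_r1 = max(max_r1, r1)
        max_sum = max(max_sum, r0 + r1)
print(max_r0, max_r1, max_sum)  # 30 15 45
```

The maxima 30 for r0, 15 for r1 and 45 for r0 + r1 match the octagonal bounds obtained by range refinement below.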
The operations AND and XOR are uni-modal; ADD is multi-modal but it only operates in exact mode for this block. For this single mode no affine relationship exists between the symbolic constants di that characterise the input octagon and those di' that characterise the output octagon. However, even in such cases, it can still be possible to find a d1' such that r0 ≤ d1' by applying range refinement. This gives d1' = 30. Repeating this tactic for the remaining symbolic output constants yields:

d1' = 30    d5' = 45
d2' = 15    d6' = 0
d3' = 0     d7' = 0
d4' = 0     d8' = 15

4 Evaluating Transfer Functions
Thus far, we have described how to derive transfer functions for intervals and octagons where the functions are systems of guards paired with affine updates, without reference to how they are evaluated. In our previous work [6], the application of a transfer function amounted to solving a series of integer linear programs (ILPs). To illustrate, suppose a transfer function consists of a single guard g and update u pair and let c denote a system of octagonal constraints on the input variables. A single inequality in the output system c', such as r0 + r1 ≤ d5', can be derived by maximising r0 + r1 subject to the linear system c ∧ g ∧ u. To construct c' in its entirety requires the solution of O(n²) ILPs, where n is the number of registers (or variables) in the block. Although steady progress has been made on deriving safe bounds for integer programs [26], a computationally more attractive solution would avoid ILPs altogether.

4.1 A Single Guard and Update Pair
Affine updates, as derived in Sect. 3.4, relate symbolic constants on the inequalities in the input octagon to those of the output octagon. These updates confer a different, simpler, evaluation model. To compute r0 + r1 ≤ d5' in c', it is sufficient to compute c ⊓ g [22], which is the octagon that describes the conjoined system c ∧ g. This can be computed in quadratic time when g is a single inequality and in cubic time otherwise [22]. The meet c ⊓ g then defines values for the
J. Brauer and A. King
symbolic constants di, though these values may include −∞ and ∞. The value of d5' is defined by its affine update, that is, as a weighted sum of the di values. If there is no affine update for d5', then its value defaults to ∞. If bounds have been inferred for output octagons (Sect. 3.5), then the di' can possibly be refined with a tighter bound. This evaluation mechanism thus replaces ILP with arithmetic that is both conceptually simple and computationally efficient. This is significant since transfer functions are themselves computed many times during fixpoint evaluation.

4.2 A System of Guard and Update Pairs
The above evaluation procedure needs to be applied for each guard g and update u pair for which c ⊓ g is satisfiable. Thus several output octagons may be derived for a single block. We do not prescribe how these octagons should be combined; a disjunctive representation is one possibility [13]. However, the simplest tactic is undoubtedly to apply the merge operation for octagons [22] (though this entails closing the output octagons).
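The evaluation model of Sects. 4.1 and 4.2 replaces ILP solving with plain arithmetic over the constants defined by c ⊓ g. The following sketch shows only the update-application step (our own illustration: the meet and merge are Miné's octagon operations and are not reimplemented here, and the numeric values of d are hypothetical):

```python
import math

def apply_update(d, update, outputs):
    """Apply an affine update: each output constant is a weighted sum of input
    constants d[i] (as defined by the meet c ⊓ g); output constants with no
    affine update default to +inf, i.e. unconstrained."""
    out = {}
    for name in outputs:
        terms = update.get(name)
        out[name] = math.inf if terms is None else sum(c * d[v] for c, v in terms)
    return out

# The affine update synthesised for the running example (Sect. 3.4):
update = {
    "d1": [(2, "d5")], "d2": [(1, "d2")], "d3": [(2, "d6")], "d4": [(1, "d4")],
    "d5": [(2, "d5"), (1, "d2")], "d6": [(2, "d6"), (1, "d4")],
    "d7": [(2, "d6"), (1, "d2")], "d8": [(2, "d5"), (1, "d4")],
}
d = {"d2": 3, "d4": 1, "d5": 5, "d6": 2}   # hypothetical values from c ⊓ g
print(apply_update(d, update, sorted(update)))
# {'d1': 10, 'd2': 3, 'd3': 4, 'd4': 1, 'd5': 13, 'd6': 5, 'd7': 7, 'd8': 11}
```

For a system of pairs, this step runs once per guard whose meet with c is satisfiable, and the resulting octagons are then merged.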
5 Experiments
We have implemented the techniques described in this paper in Java using the Sat4J solver [21], so as to integrate with our analysis framework for machine code [30], called [mc]square, which is also coded in Java. All experiments were performed on a MacBook Pro equipped with a 2.6 GHz dual-core processor and 4 GB of RAM, but only a single core was used in our experiments. To evaluate transfer function synthesis without quantifier elimination, Tab. 2 compares the results for intervals for different blocks of assembly code to those obtained using the technique described in [6]. Column #instr contains the number of instructions, whereas column #bits gives the bit-width. (The 8-bit and 32-bit versions of the AVR instruction sets are analogous.) Then, #affine presents the number of affine relations for each block. The columns runtime contain the runtime and the number of SAT instances. The overall runtime of the elimination-based algorithm [6] is given in column old (∞ is used for timeout, which is set to 30s). Transfer function synthesis for blocks of up to 10 instructions is evaluated, which is a typical size for microcontroller code. For blocks of this size, we have never observed more than 10 feasible mode combinations. Comparison. Using quantifier elimination, all instances could be solved in a reasonable amount of time for 8-bit instructions. However, only the small instances could be solved for 32 bits (and only then because the Boolean encodings for the instructions were minimised prior to the synthesis of the transfer functions). It is also important to appreciate that none of the timeouts was caused by the SAT solver; it was resolution that failed to produce results in reasonable time. By way of comparison, synthesising guards for different overflow modes requires the most runtime in our new approach, because the number of SAT
Table 2. Experimental results for synthesis of transfer functions

                                          runtime
block         #instr #affine #bits  guards / #SAT   affine / #SAT  overall     old
inc                1       2     8    0.2s / 40       0.1s / 5       0.3s     0.2s
                                32    0.5s / 136      0.2s / 5       1.0s    23.0s
inc+shift          2       3     8    0.3s / 60       0.1s / 8       0.4s     0.3s
                                32    0.8s / 216      0.2s / 8       1.0s        ∞
swap               3       1     8       —            0.1s / 3       0.1s     0.1s
                                32       —            0.1s / 3       0.1s     0.2s
inc+flip           4       2     8    0.2s / 40       0.2s / 5       0.4s     0.5s
                                32    0.9s / 216      0.3s / 5       1.2s        ∞
abs                5       3     8    2.5s / 216      0.3s / 8       2.8s     0.8s
                                32    6.5s / 792      0.3s / 8       6.8s        ∞
inc+abs            6       3     8    2.6s / 216      0.3s / 8       2.9s     1.4s
                                32    6.7s / 792      0.3s / 8       7.0s        ∞
sum+isign          7       5     8    4.1s / 360      0.2s / 18      4.3s     4.5s
                                32   10.7s / 1320     0.4s / 18     11.1s        ∞
exchange+abs      10       3     8    2.8s / 216      0.3s / 8       3.1s     9.5s
                                32    7.2s / 792      0.3s / 8       7.5s        ∞
instances to be solved grows linearly with the number of bits and quadratically with the number of variables (the number of octagonal inequalities is quadratic in the number of variables). Computing the affine updates consumes only a fraction of the overall time. In terms of precision, the results coincide with those previously generated [6]. The block for swap is interesting since it consists of three consecutive exclusive-or instructions, for which there is no coupling between different bits of the same register. The block is also unusual in that it is uni-modal with vacuous guards. These properties make it ideal for resolution. Even in this situation, the new technique scales better. In fact, the Boolean formulae that we present to the solver are almost trivial by modern standards; the main overhead comes from repeated SAT solving rather than from solving a single large instance. Sat4J does reuse clauses learnt in earlier SAT instances, though it does not permit clauses to be incrementally added and rescinded, which is useful when solving maximisation problems [6]. Thus the timings given above are very conservative; indeed, Sat4J was chosen to maintain the portability of [mc]square rather than for raw performance. Nevertheless, these timings compare very favourably with those required to compute transfer functions for intervals using BDDs [28], where in excess of 24 hours is required for a single 8-bit instruction.

Deriving octagonal transfer functions. The process of deriving octagonal transfer functions by lifting (Sect. 3.3) incurs an imperceptible overhead compared to computing the affine relations themselves; indeed, it is merely syntactic rewriting. The runtimes required for inferring affine inequalities by alternating range refinement and affine join (Sect. 3.4), however, are typically 3 or 4 times
slower than those of computing the guards; the number of symbolic constants on the output inequalities corresponds exactly to the number of input guards. (We refrain from giving exact times since this component has not been tuned.) Further optimisations. Since transfer functions are program dependent, one could first use a simple form of range analysis [5] to over-approximate the ranges a register can assume. These ranges can be encoded in the formulae, thereby pruning out some mode-combinations. For example, it is rarely the case that the absolute value function is actually applied to the smallest representable integer.
6 Related Work
Although the problem of constructing transfer functions has been recognised for over twenty years, for example, by Cousot and Halbwachs [12] for the polyhedral domain and by Granger [14] for linear congruences, automatic synthesis has only recently become a practical proposition due to the emergence of robust decision procedures [19,29] and quantifier elimination techniques [20,23,24]. Transfer functions [29] can always be found for domains that satisfy the finite ascending chain condition, provided one is prepared to pay the cost of calling a decision procedure repeatedly on each application of a transformer. This motivates applying a decision procedure in order to compute optimal transfer functions offline, prior to the actual analysis [6,19]. Our previous work [6] shows how bit-blasting and quantifier elimination can be applied to synthesise transformers for bit-vector programs. This work was inspired by that of Monniaux [23,24] on synthesising transfer functions for piecewise linear programs. Although his approach extends beyond octagons [32], it is unclear how to express some instructions (such as exclusive-or) in terms of linear constraints. Universal quantification, as used in both approaches, also appears in work on inferring linear template constraints by Gulwani et al. [15]. There, the authors apply Farkas' lemma in order to transform universal quantification into existential quantification, albeit at the cost of completeness since Farkas' lemma prevents integral reasoning. However, crucially, neither Monniaux nor Gulwani et al. provide a way to model integer overflow. By way of contrast, our approach explains how to systematically handle wrap-around arithmetic in the transfer function itself whilst sidestepping quantifier elimination. Transfer functions have been automatically synthesised for intervals using BDDs by applying interval subdivision [28].
If g : [0, 2⁸ − 1] → [0, 2⁸ − 1] is a unary operation on an unsigned byte, then its transformer f : D → D on D = {∅} ∪ {[ℓ, u] | 0 ≤ ℓ ≤ u < 2⁸} can be defined recursively. If ℓ = u then f([ℓ, u]) = g(ℓ), whereas if ℓ < u then f([ℓ, u]) = f([ℓ, m − 1]) ⊔ f([m, u]) where m = ⌊u/2ⁿ⌋2ⁿ and n = ⌈log₂(u − ℓ + 1)⌉ − 1. Binary operations can likewise be decomposed. The 8-bit inputs, ℓ and u, can be represented as 8-bit vectors, as can the 8-bit outputs, so as to represent f with a BDD. This permits caching to be applied when f is computed, which reduces the time needed to compute a best transformer to approximately one day for each 8-bit operation.
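The recursive decomposition can be transcribed directly; without the BDD representation and caching it still computes the best transformer, just without the sharing that makes [28] feasible at scale. A sketch (our own illustration; `(hi - lo).bit_length() - 1` computes the exponent n of the power-of-two split point):

```python
def best_transformer(g, lo, hi):
    """Best interval transformer for a unary byte operation g, by recursive
    subdivision: singletons are exact, larger intervals are split at a
    power-of-two boundary and the sub-results joined."""
    if lo == hi:
        v = g(lo)
        return (v, v)
    n = (hi - lo).bit_length() - 1       # exponent of the split granularity
    m = (hi >> n) << n                   # largest multiple of 2^n in (lo, hi]
    a = best_transformer(g, lo, m - 1)
    b = best_transformer(g, m, hi)
    return (min(a[0], b[0]), max(a[1], b[1]))   # interval join

# Example: bit-flip on a byte maps [10, 20] to [235, 245]
print(best_transformer(lambda x: x ^ 0xFF, 10, 20))  # (235, 245)
```

Since the recursion bottoms out at singletons, the result coincides with the exact minimum and maximum of g over the input interval; caching pays off because many subintervals recur across queries.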
The classical approach to handling overflow is to verify that overflows do not occur, using unbounded domains as implemented in the Astrée tool [11]. However, for the domain of polyhedra, it is also possible to revise the concretisation map to reflect the effect of truncation [31]. Another choice is to deploy congruence relations [14] where the modulus is a power of two [19,25]. Finally, bit-blasting has been combined with range inference elsewhere [5,8], though neither of these papers addresses relational abstraction or transfer function synthesis.
7 Concluding Discussion
Synopsis. This paper revisits the problem of synthesising transfer functions for programs whose semantics is defined over finite bit-vectors. The irony is that although Boolean formulae initially appear attractive for synthesis because of the simplicity of universal projection [6], their real strength is the fact that they are discrete. This permits octagonal inequalities to be inferred by repeated satisfiability testing, avoiding the need for quantifier elimination, and in particular the complexity of resolution. The force of this observation is that it extends transfer function synthesis to architectures whose word size exceeds 8 bits, strengthening the case for low-level code verification [4,30].

Future work. The problem of synthesising transfer functions is not dissimilar to that of inferring ranking functions for bit-vector programs [9]. The existence of a ranking function on a path π with a transition relation rπ(x, x') amounts to solving the formula ∃c : ∀x : ∀x' : rπ(x, x') → (p(c, x) < p(c, x')) where p(c, x) is a polynomial over the bit-vector x and c is a bit-vector of coefficients. However, if intermediate variables y are needed to express rπ(x, x'), p(c, x), p(c, x') or