Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
6154
Alessandro Aldini Marco Bernardo Alessandra Di Pierro Herbert Wiklicky (Eds.)
Formal Methods for Quantitative Aspects of Programming Languages 10th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2010 Bertinoro, Italy, June 21-26, 2010 Advanced Lectures
Volume Editors Alessandro Aldini Università di Urbino “Carlo Bo”, Dipartimento di Matematica, Fisica e Informatica Piazza della Repubblica 13, 61029 Urbino, Italy E-mail:
[email protected] Marco Bernardo Università di Urbino “Carlo Bo”, Dipartimento di Matematica, Fisica e Informatica Piazza della Repubblica 13, 61029 Urbino, Italy E-mail:
[email protected] Alessandra Di Pierro Università di Verona, Dipartimento di Informatica Ca’ Vignal 2, Strada le Grazie 15, 37134 Verona, Italy E-mail:
[email protected] Herbert Wiklicky Imperial College London, Department of Computing Huxley Building, 180 Queen’s Gate, London SW7 2BZ, UK E-mail:
[email protected]

Library of Congress Control Number: 2010928129
CR Subject Classification (1998): D.2.4, D.3.1, F.3-4, C.3
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN 0302-9743
ISBN-10 3-642-13677-X Springer Berlin Heidelberg New York
ISBN-13 978-3-642-13677-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
This volume presents the set of papers accompanying some of the lectures of the 10th International School on Formal Methods for the Design of Computer, Communication and Software Systems (SFM). This series of schools addresses the use of formal methods in computer science as a prominent approach to the rigorous design of the above-mentioned systems. The main aim of the SFM series is to offer a good spectrum of current research in foundations as well as applications of formal methods, which can be of help for graduate students and young researchers who intend to approach the field. SFM 2010 was devoted to formal methods for quantitative aspects of programming languages and covered several topics including probabilistic and timed models, model checking, static analysis, quantum computing, real-time and embedded systems, and security. This volume comprises four articles. The paper by Di Pierro, Hankin, and Wiklicky investigates the relation between the operational semantics of probabilistic programming languages and discrete-time Markov chains and presents a framework for probabilistic program analysis inspired by classical abstract interpretation. Broadbent, Fitzsimons, and Kashefi review the mathematical model underlying measurement-based quantum computation, a novel approach to quantum computation where measurement is the main driving force of computation instead of the unitary operations of the more traditional quantum circuit model. The paper by Malacaria and Heusser illustrates the information-theoretical basis of quantitative information flow by showing the relationship between lattices, partitions, and information-theoretical concepts, as well as their applicability to quantify leakage of confidential information in programs. Finally, Wolter and Reinecke discuss the trade-off between performance and security by formulating metrics that explicitly express the trade-off and by showing how to find system parameters that optimize those metrics.
We believe that this book offers a useful view of what has been done and what is going on worldwide in the field of formal methods for quantitative aspects of programming languages. We wish to thank all the speakers and all the participants for a lively and fruitful school. We also wish to thank the entire staff of the University Residential Center of Bertinoro for the organizational and administrative support. June 2010
Alessandro Aldini Marco Bernardo Alessandra Di Pierro Herbert Wiklicky
Table of Contents
Probabilistic Semantics and Program Analysis . . . . . . . . . . . . . . . . . . . . . . 1
  Alessandra Di Pierro, Chris Hankin, and Herbert Wiklicky

Measurement-Based and Universal Blind Quantum Computation . . . . . . . 43
  Anne Broadbent, Joseph Fitzsimons, and Elham Kashefi

Information Theory and Security: Quantitative Information Flow . . . . . . 87
  Pasquale Malacaria and Jonathan Heusser

Performance and Security Tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
  Katinka Wolter and Philipp Reinecke

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Probabilistic Semantics and Program Analysis

Alessandra Di Pierro¹, Chris Hankin², and Herbert Wiklicky²

¹ University of Verona, Ca' Vignal 2 - Strada le Grazie 15, 37134 Verona, Italy
[email protected]
² Imperial College London, 180 Queen's Gate, London SW7 2AZ, United Kingdom
{clh,herbert}@doc.ic.ac.uk
Abstract. The aims of these lecture notes are two-fold: (i) we investigate the relation between the operational semantics of probabilistic programming languages and Discrete Time Markov Chains (DTMCs), and (ii) we present a framework for probabilistic program analysis which is inspired by the classical Abstract Interpretation framework by Cousot & Cousot and which we introduced as Probabilistic Abstract Interpretation (PAI) in [1]. The link between programming languages and DTMCs is the construction of a so-called Linear Operator Semantics (LOS) in a syntax-directed or compositional way. The main element in this construction is the use of the tensor product to combine information about different aspects of a program. Although this inevitably results in a combinatorial explosion of the size of the semantics of a program, the PAI approach allows us to keep some control and to obtain reasonably sized abstract models.
1 Introduction
These lecture notes aim at establishing a formal link between the semantics of deterministic and probabilistic programming languages and Markov Chains. We will consider only discrete time models but, as we have shown in [2], it is possible to use similar constructions also to model continuous time systems. Our motivation is based on concrete systems rather than specifications of systems as we find them for example in the area of process algebras; we therefore eliminate any non-probabilistic or pure non-determinism. To a certain degree non-deterministic models can be simulated by using "unknown" probability variables rather than constants to express choice probabilities. However, this leads to slightly different outcomes as even "unknown" probabilities, for example, are able to express correlations between different choices. A further (didactic) restriction we will use throughout these notes is the finiteness of our state and configuration spaces. Although it is possible to develop a similar framework also for infinite spaces, this requires certain mathematical tools from Functional Analysis and Operator Theory (e.g. C∗ algebras, Hilbert and Banach spaces) which are beyond what a short introduction can provide. We will therefore consider only a finite-dimensional algebraic theory for which a basic knowledge of linear algebra is sufficient.

A. Aldini et al. (Eds.): SFM 2010, LNCS 6154, pp. 1–42, 2010. © Springer-Verlag Berlin Heidelberg 2010
In the following we will use a simple but intriguing example to illustrate our approach: Example 1 (Monty Hall). The origins of this example are legendary. Allegedly, it goes back to some TV show in which the contestant was given the chance to win a car or other prizes by picking the right door behind which the desired prize could be found. The game proceeds as follows: First the contestant is invited to pick one of three doors (behind one is the prize) but the door is not yet opened. Instead, the host – legendary Monty Hall – opens one of the other doors which is empty. After that the contestant is given a last chance to stick with his/her door or to switch to the other closed one. Note that the host (knowing where the prize is) always has at least one door he can open. The problem is whether it is better to stay stubborn or to switch the chosen door. Assuming that there is an equal chance for all doors to hide the prize, it is a favourite exercise in basic probability theory to demonstrate that it is better to switch to a new door. We will analyse this example using probabilistic techniques in program analysis – rather than more or less informal mathematical arguments. An extensive discussion of the problem can be found in [3], where it is also observed that a bias in hiding the car (e.g. because the architecture of the TV studio does not allow for enough room behind a door to put the prize there) changes the analysis dramatically. Note that it is pointless to investigate a non-deterministic version of the Monty Hall problem: If we are only interested in a possibilistic analysis then both strategies have exactly the same possible outcomes: The contestant might win or lose – everything is possible. As in many walks of life it is not what is possible that determines success, but the chances of achieving one's aim.
2 Mathematical Preliminaries
We assume that the reader of these lecture notes is well acquainted with basic ideas from linear algebra and probability theory. We will consider here only finite dimensional spaces and thus avoid a detailed consideration of infinite dimensional spaces, as in functional analysis, and general measure theoretic concepts. However, it is often possible to generalise the concepts to such an infinite dimensional setting and we may occasionally mention this or give hints in this direction. We need to introduce a few basic mathematical concepts – acquainted readers may skip immediately to Section 3. The aim of this section is to sketch the basic constructions and to provide some motivation and intuition for the mathematical framework we use. A more detailed discussion of the notions and concepts we need can be found in the appropriate textbooks on probability and linear algebra.
2.1 Vector Spaces
In all generality, the real vector space V(S, R) = V(S) over a set S is defined as the formal¹ linear combinations of elements in S, which we can also see as tuples of real numbers x_s indexed by elements in S:

V(S) = { ∑_{s∈S} x_s · s | x_s ∈ R } = { (x_s)_{s∈S} },
with the usual point-wise algebraic operations, i.e. scalar multiplication for λ ∈ R: λ · (x_s)_s = (λ · x_s)_s and vector addition (x_s)_s + (y_s)_s = (x_s + y_s)_s. We denote tuples like (x_s)_s or (y_s)_s as vectors x and y. We consider in the following only finite dimensional vector spaces, i.e. V(S) over finite sets S, as they possess a unique topological structure, see e.g. [4, 1.22]. By imposing additional constraints one could equip V(S) with an appropriate topological structure even for infinite sets S, e.g. by considering Banach or Hilbert spaces like ℓ¹(S), ℓ²(S), etc. (see for example [5]). The importance of vector spaces in the context of these notes comes from the fact that we can use them to represent probability distributions ρ, i.e. normalised functions which associate to elements in S some probability in the interval [0, 1]:

ρ : S → [0, 1]  s.t.  ∑_{s∈S} ρ(s) = 1.
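Distributions-as-vectors can be made concrete in a few lines of Python (a hand-rolled illustration, not code from the notes; the set S and all function names are our own):

```python
from fractions import Fraction

# Hypothetical finite set S: the three doors of the Monty Hall example.
S = ["door0", "door1", "door2"]

def uniform(S):
    """The uniform distribution on S as a coordinate tuple indexed by S."""
    n = len(S)
    return tuple(Fraction(1, n) for _ in S)

def is_distribution(x):
    """Check that a vector in V(S) lies in the subset Dist(S):
    all coordinates in [0, 1] and summing to 1."""
    return all(0 <= c <= 1 for c in x) and sum(x) == 1

rho = uniform(S)
print(rho, is_distribution(rho))

# Vector addition stays inside V(S), but the sum of two distributions is
# not itself a distribution: Dist(S) is a subset, not a subspace, of V(S).
sigma = tuple(a + b for a, b in zip(rho, rho))
print(is_distribution(sigma))  # False: coordinates sum to 2
```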
The set of all distributions Dist(S) on S is isomorphic to a sub-set (however, not a sub-space) of V(S). This helps to transfer the algebraic structures of V like, for example, the tensor product (see below) immediately into the context of distributions. The important class of structure preserving maps between vector spaces V and W are linear maps T : V → W which fulfil: T(λ · v) = λ · T(v) and T(v1 + v2) = T(v1) + T(v2). For linear maps T : V → V we usually use the term operator. Vectors in any vector space can be represented – as in the above definition of V(S) – as a linear combination of elements of a certain basis, or even simpler as a tuple, i.e. a row, of coordinates. Usually, we will use here the defining basis {s | s ∈ S} so that we do not need to consider the problem of base changes. As with vectors we can also represent linear maps in a standardised way as matrices. We will treat here the terms linear map and operator as synonymous with
¹ We allow for any – also infinite – linear combinations. For the related notion of a free vector space one allows only finite linear combinations.
matrix. The standard representation of a linear map T : V → W simply records the images of all basis vectors of the basis in V and collects them as the row vectors of a matrix. It is sufficient to just specify what happens to the (finitely many) basis vectors to completely determine T, as by linearity this can be extended to all (uncountably infinitely many) vectors in V. Given a (row) vector x = (x_s)_s and the matrix (T_{st})_{st}, with the first index indicating the row and the second the column of the matrix entry, representing a linear map T, we can implement the application of T to x as a matrix multiplication:

T(x) = x · T = (x_s)_s · (T_{st})_{st} = ( ∑_s x_s T_{st} )_t .
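The row-vector convention can be sketched directly in Python (a hand-rolled sketch, not code from the notes; `apply_linear` and the example matrix are illustrative names):

```python
def apply_linear(x, T):
    """Apply the linear map represented by matrix T to the row vector x:
    (x . T)_t = sum_s x_s * T[s][t]."""
    cols = len(T[0])
    return tuple(sum(x[s] * T[s][t] for s in range(len(x)))
                 for t in range(cols))

# A map on a 2-dimensional space, given by the images of the two basis
# vectors collected as the rows of T.
T = [[0.0, 1.0],
     [0.5, 0.5]]

# Applying T to the first basis vector just reads off the first row.
print(apply_linear((1.0, 0.0), T))  # (0.0, 1.0)
```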
2.2 Discrete Time Markov Chains
The standard and most popular model for probabilistic processes are Markov Chains. We assume a basic knowledge as presented for example in [6,7,8,9,10], to mention just a few of the many monographs on this topic. Markov chains have the important property that they are memory-less in the sense that the "next state" does not depend on anything else but the current state. Markov Chains come in two important versions: Discrete Time Markov Chains (DTMC) and Continuous Time Markov Chains (CTMC). We will deal here only with DTMCs, i.e. probabilistic descriptions of a system only at discrete time steps. This allows us to talk about the next state in the obvious way (for CTMCs this concept is a bit more complicated). The DTMCs we will use to model the semantics of a programming language will be based on finitely many states S.² For such a system a description at a given point in time is represented by a distribution over the finite state space S; we will refer to the elements in S also as classical states and to the elements in Dist(S) as probabilistic states. In general, we would need measures or vectors in Banach or Hilbert spaces to describe probabilistic states. Once we have an enumeration of states in S we can represent probabilistic states, i.e. distributions on S, as normalised tuples or simply as vectors in V(S). The fact that DTMCs are memory-less means that we only need to specify how the description of a system changes into the one at the next step, i.e. how to transform one probabilistic state d_t into the next one d_{t+1}. Intuitively, we need to describe how much of the probability of an s_i ∈ S is "distributed" to the other s_j in the next moment. Again, we can use matrices to do this. More precisely, we need to consider stochastic matrices M, where all rows must sum up to 1, i.e.

∑_t M_{st} = 1  for all s,
so that for a distribution represented by d the image d · M is again a (normalised) distribution. Note that we follow in these notes the convention of post-multiplying M and that vectors are implemented as row vectors.
² Unfortunately, the term "state" is used differently in probability theory and semantics: The (probabilistic) state space for the semantics we present is made up of so-called configurations, which are pairs of (semantical) states and statements.
We will consider here only homogeneous DTMCs, where the way the system changes does not itself change over time, i.e. d_0 is transformed into d_1 in the same way as d_t becomes d_{t+1} at any time t. The matrix M thus does not depend on t. In fact, we can define a DTMC as we use it here just by specifying its state space S and its generator matrix M, which has to be stochastic.
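A DTMC step under these conventions is one row-vector/matrix multiplication (an illustrative sketch; the two-state matrix `M` is our own toy example, with an absorbing state playing the role of the stop statement):

```python
def step(d, M):
    """One DTMC step: post-multiply the probabilistic state d (a row
    vector) by the generator matrix M, i.e. d_{t+1} = d_t . M."""
    return tuple(sum(d[s] * M[s][t] for s in range(len(d)))
                 for t in range(len(M[0])))

def is_stochastic(M):
    """Every row of a stochastic matrix sums to 1."""
    return all(abs(sum(row) - 1.0) < 1e-12 for row in M)

# A hypothetical two-state chain; state 2 is absorbing.
M = [[0.9, 0.1],
     [0.0, 1.0]]
assert is_stochastic(M)

d = (1.0, 0.0)
for _ in range(3):     # homogeneous: the same M is applied at every step
    d = step(d, M)
print(d)               # probability mass flows into the absorbing state
```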
2.3 Kronecker and Tensor Product
For the definition of our semantics we will use the tensor product construction. The tensor product U ⊗ V of two vector spaces U and V can be defined in a purely abstract way via the following universal property: For each bi-linear function f : U × V → W there exists a unique linear function f⊗ : U ⊗ V → W such that f(u, v) = f⊗(u ⊗ v), see e.g. [11, Ch 14]. In the case of infinite dimensional topological vector spaces one usually imposes additional requirements on the tensor product ensuring, for example, that the tensor product of two Hilbert spaces is again a Hilbert space, see e.g. [12, 2.6]. Product measures on the Cartesian product of measure spaces as characterised by Fubini's Theorem, see e.g. [13, 4.5], can also be seen as tensor products. For finite dimensional vector spaces we can realise U ⊗ V as the space of the tensor product of vectors in U and V. More concretely, we can construct the tensor product of two finite dimensional matrices or vectors – seen as 1 × n or n × 1 matrices – via the so-called Kronecker product: Given an n × m matrix A and a k × l matrix B, then A ⊗ B is the nk × ml matrix

A ⊗ B = ⎛ a_{1,1} B . . . a_{1,m} B ⎞
        ⎜    .     . . .     .     ⎟
        ⎝ a_{n,1} B . . . a_{n,m} B ⎠

i.e. each entry a_{i,j} of A is replaced by the block a_{i,j} · B.
For a d1-dimensional vector u and a d2-dimensional vector v we get a d1 · d2-dimensional vector u ⊗ v. The i-th entry in u ⊗ v is the product of the i1-th coordinate of u with the i2-th coordinate of v. The relation between the index i and the indices i1 and i2 is as follows:

i = (i1 − 1) · d2 + (i2 − 1) + 1
i1 = (i − 1) div d2 + 1
i2 = (i − 1) mod d2 + 1

Note that the concrete realisation of the tensor product via the Kronecker product is not base independent, i.e. if we use a different basis to represent A and B then it is non-trivial to see how the coordinates of A ⊗ B change. Thus many texts prefer the abstract definition of tensor products. However, our discussions will not involve base changes and we thus can work with Kronecker and tensor products as synonyms.
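The Kronecker product and the 1-based index relations above can be checked with a small pure-Python sketch (`kron` and `kron_vec` are our own helper names, not code from the notes):

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists:
    entry (i, j) of A (x) B is A[i // k][j // l] * B[i % k][j % l]."""
    k, l = len(B), len(B[0])
    return [[A[i // k][j // l] * B[i % k][j % l]
             for j in range(len(A[0]) * l)]
            for i in range(len(A) * k)]

def kron_vec(u, v):
    """Kronecker product of two (row) vectors."""
    return [a * b for a in u for b in v]

u, v = [1, 2], [10, 20, 30]
w = kron_vec(u, v)
d2 = len(v)

# Check the 1-based index relations from the text:
# i = (i1 - 1) * d2 + (i2 - 1) + 1.
for i in range(1, len(w) + 1):
    i1 = (i - 1) // d2 + 1
    i2 = (i - 1) % d2 + 1
    assert w[i - 1] == u[i1 - 1] * v[i2 - 1]
print(w)  # [10, 20, 30, 20, 40, 60]
```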
The binary tensor/Kronecker product can easily be generalised to an n-ary version which is associative but not commutative. Among the important algebraic properties of the tensor/Kronecker product (of matrices and vectors with matching dimensions) we have for example, see e.g. [11,12]:

(λA) ⊗ B = λ(A ⊗ B) = A ⊗ (λB)
(A1 + A2) ⊗ B = (A1 ⊗ B) + (A2 ⊗ B)
A ⊗ (B1 + B2) = (A ⊗ B1) + (A ⊗ B2)
(A1 ⊗ B1)(A2 ⊗ B2) = (A1 A2) ⊗ (B1 B2)

If we consider the tensor product of vector spaces V(X) and V(Y) over some (finite) sets X and Y then we get the following important isomorphism which relates the Cartesian product and the tensor product:

V(X × Y) = V(X) ⊗ V(Y)

This follows directly from the universal properties of the tensor product. In terms of distributions this provides a way to construct and understand the space of distributions over product spaces.
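The mixed-product property (A1 ⊗ B1)(A2 ⊗ B2) = (A1 A2) ⊗ (B1 B2) can be verified numerically on small matrices (a self-contained sketch with our own helpers and arbitrary example matrices):

```python
def matmul(A, B):
    """Plain matrix multiplication on nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker product on nested lists."""
    k, l = len(B), len(B[0])
    return [[A[i // k][j // l] * B[i % k][j % l]
             for j in range(len(A[0]) * l)] for i in range(len(A) * k)]

A1, A2 = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
B1, B2 = [[2, 0], [0, 2]], [[1, 1], [0, 1]]

lhs = matmul(kron(A1, B1), kron(A2, B2))
rhs = kron(matmul(A1, A2), matmul(B1, B2))
assert lhs == rhs
print("mixed-product property holds on this example")
```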
3 Probabilistic While
We now introduce a simple imperative language, pWhile, with constructs for probabilistic choice and random assignment, which is based on the well-known While language one can find for example in [14,15]. We will use this language to investigate static program analysis techniques based on its semantics. We first present the syntax and operational semantics (in an SOS style) of pWhile; then we develop a syntax-directed semantics which will immediately give the generator of the corresponding DTMC.
3.1 Syntax
The overall structure of a pWhile program is made up from a possibly empty declaration part D of variables and a single statement S which represents the actual program:

P ::= begin S end | var D begin S end

The declarations D of variables v associate to them a certain basic type, e.g. int, bool, or a simple value range r, which determines the possible values of the variable v. Each variable can have only one type, i.e. possible values are in the disjoint union of Z, representing integers, and B = {true, false} for booleans.

r ::= bool | int | { c1, . . ., cn } | { c1 .. cn }
D ::= v : r | v : r ; D
The syntax of statements S is as follows:

S ::= stop | skip | v := a | v ?= r | S1 ; S2
    | choose p1 : S1 or p2 : S2 ro
    | if b then S1 else S2 fi
    | while b do S od

We have in pWhile two types of "empty" statements, namely stop and the usual skip statement. We can use both as final statements in a program, but while skip represents actual termination, the meaning of stop is an infinite loop which replicates the terminal configuration forever – this is a behaviour we need in order to avoid "probability leaks" and to obtain proper DTMCs. The meanings of the assignment ":=", sequential composition ";", "if" and "while" are as usual – we only change the syntax slightly to allow for an easier implementation of a pWhile parser in ocaml. We have two additional probabilistic statements: a random assignment "?=" which assigns a random value to a variable using a uniform distribution over the possible values in the range r; and a probabilistic choice "choose", which executes either S1 or S2 with probabilities p1 and p2, respectively. Here p1 and p2 are constants and we assume without loss of generality that they are normalised, i.e. that p1 + p2 = 1; if this is not the case, we can also require that at compile time these values are normalised to p̃i = pi / (p1 + p2). It is obvious how to generalise the "choose" construct from a binary to an n-ary version. We will also use brackets, indentation and comment lines "#" to improve the readability of programs. Expressions e in pWhile are either boolean expressions b or arithmetic expressions a. Arithmetic expressions are of the form

a ::= n | v | a1 ⊙ a2

with n ∈ Z a constant, v a variable, and '⊙' representing one of the usual arithmetic operations like '+', '−', '×', '/' or '%' (representing the remainder of an integer division). The syntax of boolean expressions b is defined by

b ::= true | false | not b | b1 && b2 | b1 || b2 | a1 <> a2

The symbol '<>' denotes one of the standard comparison operators for arithmetic expressions, i.e. <, <=, ==, !=, >= or >.
3.2 Operational Semantics
The semantics of pWhile follows essentially the standard one for While as presented, e.g., in [15]. The only two differences concern (i) the probabilistic choice and (ii) random assignments. The structured operational semantics (SOS) is given as usual via a transition system on configurations ⟨S, σ⟩, i.e. pairs of statements and (classical) states. To allow for probabilistic choices we label these transitions with probabilities; except for the choose construct and the random assignment these probabilities will always be 1, as all other statements in pWhile are deterministic. A state σ ∈ State describes how variables in Var are associated to values in Value = Z + B (with '+' denoting the disjoint union). The value of a variable can be either an integer or a boolean constant, i.e.

State = Var → Z + B

The expressions a and b evaluate to values of type Z and B in the usual way. The value represented by an arithmetic expression can be computed by:

E(n)σ = n
E(v)σ = σ([[v]]σ)
E(a1 ⊙ a2)σ = E(a1)σ ⊙ E(a2)σ

The result is always an integer (i.e. E(a)σ ∈ Z). Boolean expressions are handled in a similar way; their semantics is given by an element in B = {true, false}:

E(true)σ = true
E(false)σ = false
E(not b)σ = ¬E(b)σ
E(b1 || b2)σ = E(b1)σ ∨ E(b2)σ
E(b1 && b2)σ = E(b1)σ ∧ E(b2)σ
E(a1 <> a2)σ = E(a1)σ <> E(a2)σ
If we denote by Expr the set of all expressions e then the evaluation function E(.) is a function from Expr × State into Z + B. Based on the functions [[.]] and E(.) the semantics of an assignment is given, for example, by:

⟨v := e, σ⟩ −→1 ⟨stop, σ[v ↦ E(e)σ]⟩.

The state σ stays unchanged except for the variable v. The value of this variable is changed so that it now contains the value represented by the expression e. The formal definition of the transition rules defining the operational semantics of pWhile in the SOS style is given in Table 1.
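Rules such as R2 (assignment) and R3 (random assignment) can be read as functions from a configuration to a set of probability-weighted successor configurations. A minimal Python sketch (our own encoding, not code from the notes; expression evaluation E(e)σ is elided and the already-evaluated value is passed in directly):

```python
from fractions import Fraction

# A classical state is a dict from variable names to values; a
# configuration pairs a statement with a state.

def assign(v, value, sigma):
    """R2: <v := e, sigma> -->_1 <stop, sigma[v -> E(e)sigma]>."""
    return [(Fraction(1), ("stop", {**sigma, v: value}))]

def random_assign(v, r, sigma):
    """R3: <v ?= r, sigma> -->_{1/|r|} <stop, sigma[v -> ri]>
    for each ri in the range r."""
    p = Fraction(1, len(r))
    return [(p, ("stop", {**sigma, v: ri})) for ri in r]

# Each successor of a random assignment carries probability 1/3.
for p, (stmt, sigma) in random_assign("d", [0, 1, 2], {"d": 0}):
    print(p, stmt, sigma)
```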
Table 1. The rules of the SOS semantics of pWhile

R0  ⟨skip, σ⟩ −→1 ⟨stop, σ⟩
R1  ⟨stop, σ⟩ −→1 ⟨stop, σ⟩
R2  ⟨v := e, σ⟩ −→1 ⟨stop, σ[v ↦ E(e)σ]⟩
R3  ⟨v ?= r, σ⟩ −→1/|r| ⟨stop, σ[v ↦ ri]⟩  for each ri ∈ r
R41 ⟨S1, σ⟩ −→p ⟨S1', σ'⟩  implies  ⟨S1; S2, σ⟩ −→p ⟨S1'; S2, σ'⟩
R42 ⟨S1, σ⟩ −→p ⟨stop, σ'⟩  implies  ⟨S1; S2, σ⟩ −→p ⟨S2, σ'⟩
R51 ⟨choose p1 : S1 or p2 : S2 ro, σ⟩ −→p1 ⟨S1, σ⟩
R52 ⟨choose p1 : S1 or p2 : S2 ro, σ⟩ −→p2 ⟨S2, σ⟩
R61 ⟨if b then S1 else S2 fi, σ⟩ −→1 ⟨S1, σ⟩  if E(b)σ = true
R62 ⟨if b then S1 else S2 fi, σ⟩ −→1 ⟨S2, σ⟩  if E(b)σ = false
R71 ⟨while b do S od, σ⟩ −→1 ⟨S; while b do S od, σ⟩  if E(b)σ = true
R72 ⟨while b do S od, σ⟩ −→1 ⟨stop, σ⟩  if E(b)σ = false

3.3 Examples
To illustrate the use of pWhile to formulate probabilistic programs we present two small examples which we will use throughout these lecture notes. Example 2 (Factorial). This example concerns the factorial of a natural number, i.e. n! = 1 · 2 · 3 · . . . · n (with 0! = 1). The two programs below compute the usual factorial n! and the "double factorial" 2 · n!.

var m : {0..2};
    n : {0..2};
begin
  m := 1;
  while (n>1) do
    m := m*n;
    n := n-1;
  od;
  stop; # looping
end

var m : {0..2};
    n : {0..2};
begin
  m := 2;
  while (n>1) do
    m := m*n;
    n := n-1;
  od;
  stop; # looping
end
Though these two programs are deterministic, we will still analyse them using probabilistic techniques. Example 3 (Monty Hall). Let us consider again Example 1 in Section 1. We can implement the two possible strategies of the contestant: Either to stick to his/her initial choice no matter what the show host is doing, or to switch doors once one of the empty doors has been opened. var d :{0,1,2}; g :{0,1,2}; o :{0,1,2};
var d :{0,1,2}; g :{0,1,2}; o :{0,1,2};
begin # Pick winning door d ?= {0,1,2}; # Pick guessed door g ?= {0,1,2}; # Open empty door o ?= {0,1,2}; while ((o == g) ||(o == d)) do o := (o+1)%3; od; # Stick with guess stop; # looping end
begin # Pick winning door d ?= {0,1,2}; # Pick guessed door g ?= {0,1,2}; # Open empty door o ?= {0,1,2}; while ((o == g) ||(o == d)) do o := (o+1)%3; od; # Switch guess g := (g+1)%3; while (g == o) do g := (g+1)%3; od; stop; # looping end
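The two strategies can also be checked empirically. The following is a hypothetical Python transcription of the two pWhile programs above (`play` and the sampling loop are our own names, not part of the notes); a Monte Carlo estimate reproduces the 1/3 vs. 2/3 analysis:

```python
import random

def play(switch, rng):
    """One round of the Monty Hall game, mirroring the pWhile programs:
    the opened door o is advanced until it is neither the guess nor the
    winning door; the switch strategy then moves g to the other closed
    door."""
    d = rng.randrange(3)          # winning door
    g = rng.randrange(3)          # guessed door
    o = rng.randrange(3)          # candidate door to open
    while o == g or o == d:
        o = (o + 1) % 3
    if switch:
        g = (g + 1) % 3
        while g == o:
            g = (g + 1) % 3
    return g == d                 # did the contestant win?

rng = random.Random(42)
n = 50_000
stick = sum(play(False, rng) for _ in range(n)) / n
switch = sum(play(True, rng) for _ in range(n)) / n
print(stick, switch)              # stick ~ 1/3, switch ~ 2/3
```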
3.4 Linear Operator Semantics
In order to study the semantic properties of a pWhile program we will investigate the stochastic process which corresponds to the program executions. More precisely, we will construct the generator of a Discrete Time Markov Chain (DTMC) which represents the operational semantics of the program in question. The generator matrix of the DTMC which we will construct for any given pWhile program defines a linear operator – thus we refer to it as a Linear Operator Semantics (LOS) – on a vector space based on the labelled blocks and classical states of the program in question. The SOS transition relation – and in particular its restriction to the reachable configurations of a given program – can be directly encoded in a linear operator (cf. [16]), i.e. a matrix T defined for all configurations ci, cj by

(T)_{ci,cj} = p  if ⟨Si, σi⟩ −→p ⟨Sj, σj⟩
(T)_{ci,cj} = 0  otherwise.
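This matrix encoding of the SOS transition relation can be sketched directly (a hand-rolled illustration; the configuration names and the example choose statement are our own):

```python
def transition_matrix(configs, transitions):
    """Encode an SOS transition relation as a matrix T with
    (T)_{ci,cj} = p if ci -->_p cj, and 0 otherwise."""
    index = {c: i for i, c in enumerate(configs)}
    n = len(configs)
    T = [[0.0] * n for _ in range(n)]
    for ci, p, cj in transitions:
        T[index[ci]][index[cj]] = p
    return T

# Hypothetical reachable configurations of a tiny choose statement.
configs = ["<choose, s>", "<S1, s>", "<S2, s>", "<stop, s>"]
transitions = [
    ("<choose, s>", 0.5, "<S1, s>"),
    ("<choose, s>", 0.5, "<S2, s>"),
    ("<S1, s>", 1.0, "<stop, s>"),
    ("<S2, s>", 1.0, "<stop, s>"),
    ("<stop, s>", 1.0, "<stop, s>"),   # rule R1 keeps stop looping
]
T = transition_matrix(configs, transitions)

# On the reachable configurations T is stochastic: no probability leaks.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in T)
```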
However, this approach is in fact only a matrix representation of the SOS semantics and requires the construction of all possible execution trees. This is in itself not compositional: if we already know the DTMC of a part of the program (e.g. a while loop) it is impossible, or at least extremely difficult, to describe the operational semantics of a program which contains this part. Instead we present here a different construction which has the advantage of being compositional and therefore provides a more suitable basis for the compositional analysis in Section 4.2. In order to be able to refer to particular program points in an unambiguous way we introduce a standard labelling (cf. [15])

S ::= [stop]ℓ | [skip]ℓ | [v := a]ℓ | [v ?= r]ℓ | S1 ; S2
    | [choose]ℓ p1 : S1 or p2 : S2 ro
    | if [b]ℓ then S1 else S2 fi
    | while [b]ℓ do S od

where ℓ is a label in Lab – typically just a unique number.

Classical and Probabilistic States. The probabilistic state of the computation is described via a probability measure over the space of (classical) states State = (Var → Z + B). In order to keep the mathematical treatment as simple as possible we will exploit the fact that Var is finite for any given program. We furthermore restrict the actual range of integer variables to a finite sub-set of Z. Although such a finite restriction is somewhat unsatisfactory from a purely theoretical point of view, it appears to be justified in the context of static program analysis (one could argue that any "real world" program has to be executed on a computer with certain memory limitations). As a result we can restrict our construction to probability distributions on State, i.e. Dist(State) ⊆ V(State), rather than referring to the more general notion of probability measures on states. While in discrete, i.e.
finite, probability spaces every measure can be defined via a distribution, the same does not hold any more for infinite state spaces, even for countable ones: it is, for example, impossible to define on the set of rationals in the interval [0, 1] a kind of "uniform distribution" which would correspond to the Lebesgue measure. As we consider only finitely many variables, v = |Var|, we can represent the space of all possible states Var → Z + B as the Cartesian product (Z + B)^v, i.e. for every variable vi ∈ Var we specify its associated value in (a separate copy of) Z + B. As the declarations of variables fix their types – in effect their possible range – we can exploit this information by presenting the state in a slightly more effective way:

State = Value_1 × Value_2 × . . . × Value_v
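Enumerating this finite state space is straightforward (an illustrative sketch; the variable ranges mimic the factorial example and are our own choice):

```python
from itertools import product

# Hypothetical declarations fixing each variable's finite range, as in
# State = Value_1 x Value_2 x ... x Value_v.
ranges = {"m": [0, 1, 2], "n": [0, 1, 2]}

# The Cartesian product of the ranges, in declaration order (m, n).
states = list(product(*ranges.values()))
index = {s: i for i, s in enumerate(states)}

# Each classical state gets a coordinate, so V(State) has dimension 9
# here, and a probabilistic state is a normalised 9-tuple.
print(len(states))
print(index[(2, 1)])
```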
with Value_i = Z or B. We will use the convention that, given v variables, we enumerate them according to the sequence in which they are declared in D. Probabilistic Control Flow. We base the compositional construction of our LOS semantics on a probabilistic version of the control flow [15] or abstract syntax [17] of pWhile programs. The flow F = flow(P) is a set of triples ⟨ℓi, pij, ℓj⟩ which record the fact that control passes with probability pij from block Bi to block Bj, where a block is of the form Bi = [. . .]ℓi. We assume label consistency, i.e. the labels on blocks are unique. We denote by Block(P) the set of all blocks and by Lab(P) the set of all labels in a program P. Except for the choose statement and the random assignment the probability pij is always equal to 1. For the if statement we indicate the control step into the then branch by underlining the target label; the same is the case for while statements. The formal definition of the control flow of a program following the presentation in [15] is based on two auxiliary operations

init : Stmt → Lab
final : Stmt → P(Lab)

which return the initial label and the final labels of a statement (whereas a sequence of statements has a single entry, it may have multiple exits, as for example in the conditional).

init([skip]ℓ) = ℓ
init([stop]ℓ) = ℓ
init([v := e]ℓ) = ℓ
init([v ?= e]ℓ) = ℓ
init(S1; S2) = init(S1)
init([choose]ℓ p1 : S1 or p2 : S2 ro) = ℓ
init(if [b]ℓ then S1 else S2 fi) = ℓ
init(while [b]ℓ do S od) = ℓ

and

final([skip]ℓ) = {ℓ}
final([stop]ℓ) = {ℓ}
final([v := e]ℓ) = {ℓ}
final([v ?= e]ℓ) = {ℓ}
final(S1; S2) = final(S2)
final([choose]ℓ p1 : S1 or p2 : S2 ro) = final(S1) ∪ final(S2)
final(if [b]ℓ then S1 else S2 fi) = final(S1) ∪ final(S2)
final(while [b]ℓ do S od) = {ℓ}
Probabilistic Semantics and Program Analysis
13
The probabilistic control flow F(S) = flow(S) is then defined via the function flow

flow : Stmt → P(Lab × [0, 1] × Lab)

which maps statements to sets of triples which represent the probabilistic control flow graph:

flow([skip]^ℓ) = ∅
flow([stop]^ℓ) = {(ℓ, 1, ℓ)}
flow([v := e]^ℓ) = ∅
flow([v ?= e]^ℓ) = ∅
flow(S1; S2) = flow(S1) ∪ flow(S2) ∪ {(ℓ, 1, init(S2)) | ℓ ∈ final(S1)}
flow([choose]^ℓ p1 : S1 or p2 : S2 ro) = flow(S1) ∪ flow(S2) ∪ {(ℓ, p1, init(S1)), (ℓ, p2, init(S2))}
flow(if [b]^ℓ then S1 else S2 fi) = flow(S1) ∪ flow(S2) ∪ {(ℓ, 1, init(S1)), (ℓ, 1, init(S2))}
flow(while [b]^ℓ do S od) = flow(S) ∪ {(ℓ, 1, init(S))} ∪ {(ℓ′, 1, ℓ) | ℓ′ ∈ final(S)}

Example 4. Consider the following labelled program P:
var m : {0..2};
    n : {0..2};
begin
  [m := 2]^1;
  while [(n>1)]^2 do
    [m := m*n]^3;
    [n := n-1]^4;
  od;
  [stop]^5;
end
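The functions init, final and flow translate directly into code. The following pure-Python sketch (the tuple-based representation of statements is our own choice, and the underlining of target labels is omitted) computes the flow of the example program P:

```python
# Statements as tuples: ('skip', l), ('stop', l), ('assign', l), ('rand', l),
# ('seq', S1, S2), ('choose', l, p1, S1, p2, S2), ('if', l, S1, S2), ('while', l, S).

def init(s):
    # A sequence starts where its first statement starts; every other
    # statement starts at its own label.
    return init(s[1]) if s[0] == 'seq' else s[1]

def final(s):
    kind = s[0]
    if kind == 'seq':
        return final(s[2])
    if kind == 'choose':
        return final(s[3]) | final(s[5])
    if kind == 'if':
        return final(s[2]) | final(s[3])
    return {s[1]}  # skip, stop, assignments and while exit at their own label

def flow(s):
    kind = s[0]
    if kind == 'stop':
        return {(s[1], 1, s[1])}          # stop loops with probability 1
    if kind == 'seq':
        return (flow(s[1]) | flow(s[2])
                | {(l, 1, init(s[2])) for l in final(s[1])})
    if kind == 'choose':
        return (flow(s[3]) | flow(s[5])
                | {(s[1], s[2], init(s[3])), (s[1], s[4], init(s[5]))})
    if kind == 'if':
        return (flow(s[2]) | flow(s[3])
                | {(s[1], 1, init(s[2])), (s[1], 1, init(s[3]))})
    if kind == 'while':
        return (flow(s[2]) | {(s[1], 1, init(s[2]))}
                | {(l, 1, s[1]) for l in final(s[2])})
    return set()  # skip and (random) assignments have empty flow

# [m := 2]^1; while [(n>1)]^2 do [m := m*n]^3; [n := n-1]^4 od; [stop]^5
P = ('seq', ('assign', 1),
     ('seq', ('while', 2, ('seq', ('assign', 3), ('assign', 4))),
      ('stop', 5)))

assert init(P) == 1 and final(P) == {5}
assert flow(P) == {(1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 1, 2), (2, 1, 5), (5, 1, 5)}
```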
The idea is now to analyse the properties of the states during the execution of the program rather than the actual or concrete values of the variables. To demonstrate this idea let us look at the parity of the variables, i.e. whether they are even or odd. The abstract property we are interested in is the description of the possible parities of the variables m and n: If we can guarantee that a variable is always ‘even’ when we reach a certain program point then we associate to it the abstract value or property even; if on the other hand we are certain that a variable is always ‘odd’, then we use odd as its abstract value. However, we also have to take care of the case when we are not sure about the parity of a variable: it could be sometimes even or sometimes odd. We use the value ⊤ to indicate this ambiguous situation. We can distinguish this situation from another kind of unknown value ⊥ we use to handle non-initialised variables which are neither even nor odd. This situation can be formalised using the notion of a lattice L, cf. [21]:

        ⊤
       / \
   even   odd
       \ /
        ⊥

which expresses the relation between abstract values as an order relation, e.g. ⊤ is more general than even and odd, i.e. if we know that a variable could be even and odd then the general statement which describes its (abstract) value is to say that its parity is ambiguous or ⊤. We can interpret this property lattice also as the power-set of {even, odd}, i.e. L = P({even, odd}), identifying ⊤ = {even, odd} and ⊥ = ∅ and ordered by inclusion "⊆".

We now consider the abstract execution of the "double factorial" program (on the left-hand side above). Two cases are possible: One where the guard in label 2 fails, and one where we enter the loop. The abstract values we can associate in these two cases (assuming that we start with unknown ⊤ rather than non-initialised ⊥ values) are:

1 : m → ⊤,    n → ⊤
2 : m → even, n → ⊤
3 :
4 :
5 : m → even, n → ⊤
1 : m → ⊤,    n → ⊤
2 : m → even, n → ⊤
3 : m → even, n → ⊤
4 : m → even, n → ⊤
5 : m → even, n → ⊤
We observe that the parity of n remains ambiguous throughout the execution of the program. However, whether or not the loop is executed, the parity of m will always be even when we reach the final label 5: If we omit the loop then the even value 2 we assigned to m is directly used; if we execute the loop, then m enters the loop at the first iteration with an even value and remains even despite the fact that in label 3 it is multiplied with an unknown n because we know that the
product of an even number with any number results again in an even number. In any subsequent iteration the same argument holds. Thus, whenever the loop terminates, we will always be certain that m is even when we reach label 5. The "double factorial" always produces an even result.

If we consider the program on the right-hand side, which implements the simple "factorial", then our arguments break down. The abstract executions in this case give us:

1 : m → ⊤,   n → ⊤
2 : m → odd, n → ⊤
3 :
4 :
5 : m → odd, n → ⊤
1 : m → ⊤,   n → ⊤
2 : m → odd, n → ⊤
3 : m → ⊤,   n → ⊤
4 : m → ⊤,   n → ⊤
5 : m → ⊤,   n → ⊤
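Both abstract executions can be reproduced mechanically. The following sketch (our own encoding: abstract values are the non-empty subsets of {even, odd}, with ⊤ = {even, odd}, and the loop is iterated to a fixpoint by joining the states at the loop head) analyses the parity of m for both initialisations:

```python
TOP = frozenset({'even', 'odd'})

def a_mul(a, b):
    # abstract multiplication: even * anything = even; odd * odd = odd
    return frozenset('even' if 'even' in (p, q) else 'odd' for p in a for q in b)

def a_dec(a):
    # abstract n - 1: flips the parity of every possible value
    return frozenset('odd' if p == 'even' else 'even' for p in a)

def analyse(m_init):
    """Parity of m and n at the loop head (and hence at label 5)."""
    m, n = frozenset({m_init}), TOP          # state after label 1, n unknown
    while True:
        m_body = a_mul(m, n)                 # label 3: m := m * n
        n_body = a_dec(n)                    # label 4: n := n - 1
        m_new, n_new = m | m_body, n | n_body  # join at the loop head
        if (m_new, n_new) == (m, n):
            return m, n                      # fixpoint reached
        m, n = m_new, n_new

assert analyse('even')[0] == frozenset({'even'})  # m := 2 ("double factorial")
assert analyse('odd')[0] == TOP                   # m := 1 (factorial): ambiguous
assert analyse('even')[1] == TOP                  # parity of n stays ambiguous
```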
If the loop is not executed we can guarantee that m is odd; but if we execute the loop then we have to multiply (in the first iteration) an odd m with an unknown n and we cannot guarantee any particular parity for m from then on. As a result the analysis will return ⊤ for the parity of m at label 5. The factorial indeed may give an odd value (for 0 and 1) but it is obvious that for "most" values of n it will be an even number. The classical analysis is conservative and unable to extract this information. The remainder of these notes aims at developing a framework which allows for a formal analysis which captures such a "probabilistic" intuition.

A detailed formal discussion of the relation between the concrete values of m and n as sub-sets of Z, i.e. as elements in the power-set P(Z) (which also forms a lattice in a canonical way via the sub-set relation) and their abstract values in L is beyond the scope of these notes. For our purposes, it is sufficient to say that there exists an abstraction function α between the concrete and abstract values of m and n and a formal way to define an abstract semantics describing our factorial programs in terms of these abstract values by constructing the "right" concretisation function γ.

In the standard theory of abstract interpretation, which was introduced by Cousot & Cousot 30 years ago [22,23], the correctness of an abstract semantics is guaranteed by ensuring that we have a pair of functions α and γ which form a Galois connection between two lattices C and D representing concrete and abstract properties.

Definition 1. Let C = (C, ≤C) and D = (D, ≤D) be two partially ordered sets (e.g. lattices). If there are two functions α : C → D and γ : D → C such that for all c ∈ C and all d ∈ D:

c ≤C γ(d) iff α(c) ≤D d,

then (C, α, γ, D) forms a Galois connection.

The intended meaning is that an abstract element d approximates a concrete one c if c ≤C γ(d) or equivalently (by adjunction) if α(c) ≤D d. Therefore,
the concrete value corresponding to an abstract denotation d is γ(d), while the adjunction guarantees that α(c) is the best possible approximation of c in D (because whenever d is a correct approximation of c, then α(c) ≤D d). An abstract function f# : D → D is a correct approximation of a concrete function f : C → C if

α ◦ f ≤D f# ◦ α.

If α and γ form a Galois connection then correctness is automatically guaranteed. The important case is when f describes the (concrete) semantics of a program. An easy way to define a correct abstract function (e.g. a semantics) f# is to induce it simply via f# = α ◦ f ◦ γ.

An alternative characterisation of a Galois connection is as follows:

Theorem 1. Let C = (C, ≤C) and D = (D, ≤D) be two partially ordered sets together with two functions α : C → D and γ : D → C. Then (C, α, γ, D) forms a Galois connection iff
1. α and γ are order-preserving,
2. α ◦ γ is reductive (i.e. for any d ∈ D, α ◦ γ(d) ≤D d),
3. γ ◦ α is extensive (i.e. for any c ∈ C, c ≤C γ ◦ α(c)).

A further important property of Galois connections guarantees that the approximation of a concrete semantics by means of two functions α and γ related by a Galois connection is not only safe but also conservative, in as far as repeating the abstraction or the concretisation gives the same results as a single application of these functions. Formally, this property is expressed by the following proposition: Let (C, α, γ, D) be a Galois connection; then α and γ are quasi-inverse, i.e. α ◦ γ ◦ α = α, and γ ◦ α ◦ γ = γ.

4.2
Probabilistic Abstract Interpretation
The general approach for constructing simplified versions of a concrete (collecting) semantics via abstract interpretation is based on order-theoretic and not on linear structures. One can define a number of orderings (lexicographic, etc.) as an additional structure on a given vector space, and then use this order to compute over- or under-approximations using classical Abstract Interpretation. Though such approximations will always be safe, they might also be quite unrealistic, addressing a worst case scenario rather than the average case [24]. Furthermore, there is no canonical order on a vector space (e.g. the lexicographic order depends on the base). In order to provide probabilistic estimates we have previously introduced, cf. [1,25], a quantitative version of the Cousot & Cousot framework, which we have called Probabilistic Abstract Interpretation (PAI). The PAI approach is based, as in the classical case, on a concrete and an abstract domain C and D – except that C and D are now vector spaces (or in general, Hilbert spaces) instead of lattices. We assume that the pair of abstraction and concretisation functions α : C → D and γ : D → C are again structure preserving, i.e. in our setting they are (bounded) linear maps represented by matrices A and G. Finally, we replace the notion of a Galois connection by the notion of a Moore-Penrose pseudo-inverse.
Definition 2. Let C and D be two finite dimensional vector spaces, and let A : C → D be a linear map between them. The linear map A† = G : D → C is the Moore-Penrose pseudo-inverse of A iff

A ◦ G = P_A and G ◦ A = P_G

where P_A and P_G denote orthogonal projections (i.e. P_A* = P_A = P_A^2 and P_G* = P_G = P_G^2, where .* denotes the adjoint [11, Ch 10]) onto the ranges of A and G.

Alternatively, if A is Moore-Penrose invertible, its Moore-Penrose pseudo-inverse A† satisfies the following:
(i) A A† A = A,
(ii) A† A A† = A†,
(iii) (A A†)* = A A†,
(iv) (A† A)* = A† A.

It is instructive to compare these equations with the classical setting. For example, if (α, γ) is a Galois connection we similarly have α ◦ γ ◦ α = α and γ ◦ α ◦ γ = γ. This allows us to construct the closest (i.e. least square, see for example [26,27]) approximation T# : D → D of the concrete semantics T : C → C as:

T# = G · T · A = A† · T · A,

i.e. the composition of concretisation, concrete semantics and abstraction. As our concrete semantics is constructed using tensor products it is important that the Moore-Penrose pseudo-inverse of a tensor product can easily be computed as follows [27, 2.1, Ex 3]:

(A1 ⊗ A2 ⊗ . . . ⊗ An)† = A1† ⊗ A2† ⊗ . . . ⊗ An†.

Example 7 (Parity). Let us consider as concrete and abstract domains C = V({−n, . . . , n}) and D = V({even, odd}). The abstraction operator A_p and its concretisation operator G_p = A_p† corresponding to a parity analysis are represented by the following (2n+1) × 2 and 2 × (2n+1) matrices (assuming w.l.o.g. that n is even), with .T denoting the matrix transpose, (A^T)_ij = (A)_ji:

      ( 1 0 )
      ( 0 1 )
A_p = ( 1 0 )     A_p† = ( 1/(n+1)   0    1/(n+1)   0    . . .  1/(n+1) )
      ( 0 1 )            (    0     1/n      0     1/n   . . .     0    )
      ( : : )
      ( 1 0 )

The concretisation operator A_p† represents uniform distributions over the n + 1 even numbers in the range −n, . . . , n (as the first row) and the n odd numbers in the same range (in the second row).
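The four defining equations are easy to check numerically. The following pure-Python sketch (the concrete instance n = 4 is our own choice) verifies the Penrose conditions for the parity abstraction A_p of Example 7 and the tensor-product property (A ⊗ A)† = A† ⊗ A†:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def kron(A, B):
    # Kronecker (tensor) product of two matrices
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def eq(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def penrose(A, G):
    """Check the four Moore-Penrose conditions for G = A-dagger."""
    AG, GA = matmul(A, G), matmul(G, A)
    return (eq(matmul(AG, A), A) and eq(matmul(GA, G), G)
            and eq(transpose(AG), AG) and eq(transpose(GA), GA))

# Parity abstraction on {-n, ..., n} with n = 4 (9 concrete values):
n = 4
A_p = [[1, 0] if v % 2 == 0 else [0, 1] for v in range(-n, n + 1)]
G_p = [[1 / (n + 1) if v % 2 == 0 else 0 for v in range(-n, n + 1)],
       [1 / n if v % 2 != 0 else 0 for v in range(-n, n + 1)]]

assert penrose(A_p, G_p)                        # G_p is indeed A_p-dagger
assert penrose(kron(A_p, A_p), kron(G_p, G_p))  # (A x A)-dagger = A-dagger x A-dagger
```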
Example 8 (Sign). With C = V({−n, . . . , 0, . . . , n}) and D = V({−, 0, +}) we can represent the usual sign abstraction by the following matrices:

      ( 1 0 0 )
      ( : : : )
      ( 1 0 0 )          ( 1/n . . . 1/n  0   0  . . .  0  )
A_s = ( 0 1 0 )   A_s† = (  0  . . .  0   1   0  . . .  0  )
      ( 0 0 1 )          (  0  . . .  0   0  1/n . . . 1/n )
      ( : : : )
      ( 0 0 1 )

Example 9 (Forget). We can also abstract away all details of the concrete semantics. Although this is in general a rather unusual abstraction it is quite useful in the context of a tensor product state and/or abstraction. Let the concrete domain be the vector space over any range, i.e. C = V({n, . . . , 0, . . . , m}), and the abstract domain a one dimensional space D = V({⋆}). Then the forgetful abstraction and concretisation can be defined by:

A_f^T = ( 1  1  1  . . .  1 )     A_f† = ( 1/(m−n+1)  1/(m−n+1)  . . .  1/(m−n+1) )

For any matrix M operating on C = V({n, . . . , 0, . . . , m}) the abstraction A_f† · M · A_f gives a one dimensional matrix, i.e. a single scalar μ. For stochastic matrices, such as our T generating the DTMC representing the concrete semantics, we have μ = 1. If we consider a tensor product of two matrices M ⊗ N, then the abstraction A_f ⊗ I extracts (essentially) N:

(A_f ⊗ I)† · (M ⊗ N) · (A_f ⊗ I) = (A_f† ⊗ I†) · (M ⊗ N) · (A_f ⊗ I)
                                 = (A_f† · M · A_f) ⊗ (I · N · I)
                                 = μ ⊗ N = μN.

4.3
Abstract LOS Semantics
The abstract semantics T#(P) of a program P is constructed exactly like the concrete one, except that we will use abstract tests and update operators. This is possible as abstractions and concretisations distribute over sums and tensor products. More precisely, we can construct T# for a program P as:

T#(P) = Σ_{(ℓi, p_ij, ℓj) ∈ F(P)} p_ij · T#(ℓi, ℓj)
where the transfer operator along a computational step from label ℓi to ℓj can be abstracted "locally": Abstracting each variable separately and using the concrete control flow we get the operator

A = (⊗_{i=1}^{v} A_i) ⊗ I = A_1 ⊗ A_2 ⊗ . . . ⊗ A_v ⊗ I.
Then the abstract transfer operator T#(ℓi, ℓj) can be defined as:

T#(ℓi, ℓj) = (A_1† N_i1 A_1) ⊗ (A_2† N_i2 A_2) ⊗ . . . ⊗ (A_v† N_iv A_v) ⊗ E(ℓi, ℓj).

This operator applies the (abstract) effect to each of the variables in the individual statement at ℓi and combines it with the concrete control flow. This follows directly from a short calculation:

T# = A† T A
   = A† (Σ_{i,j} p_ij · T(ℓi, ℓj)) A
   = Σ_{i,j} p_ij · (A† T(ℓi, ℓj) A)
   = Σ_{i,j} p_ij · ((⊗_k A_k) ⊗ I)† T(ℓi, ℓj) ((⊗_k A_k) ⊗ I)
   = Σ_{i,j} p_ij · ((⊗_k A_k†) ⊗ I†) ((⊗_k N_ik) ⊗ E(ℓi, ℓj)) ((⊗_k A_k) ⊗ I)
   = Σ_{i,j} p_ij · (⊗_k (A_k† N_ik A_k)) ⊗ E(ℓi, ℓj).
It is of course also possible to abstract the control flow, or to use abstractions which abstract several variables at the same time, e.g. by specifying the abstract state via the difference of two variables.

The dramatic reduction in size, i.e. dimensions, achieved via PAI, illustrated also by the examples in these notes, lets us hope that our approach could ultimately lead to scalable analyses, despite the fact that the concrete semantics is infeasibly large. As many people have observed, the use of tensor products or similar constructions in probabilistic models leads to a combinatorial explosion of the size of the formal model. However, the PAI approach allows us to keep some control and to obtain reasonably sized abstract models. Further work in the form of practical implementations and experiments is needed to decide whether this is indeed the case.

The LOS represents the SOS via the generator of a DTMC. It describes the stepwise evolution of the state of a computation and does not provide a fixed-point semantics. Therefore, neither in the concrete nor in the abstract case can we guarantee that lim_{n→∞} (T(P))^n or lim_{n→∞} (T(P)#)^n always exist. The analysis of a program P based on the abstract operator T(P)# is considerably simpler than by considering the concrete one, but still not entirely trivial. Various properties of T(P)# can be extracted by iterative methods (e.g. computing lim_{n→∞} (T(P)#)^n or some averages). As often in numerical computation, these methods will converge only for n → ∞ and any result obtained after only a finite number of steps will only be an approximation. However, one can study stopping criteria which guarantee a certain quality of this approximation. The development or adaptation of iterative methods and the formulation of appropriate stopping criteria might be seen as the numerical analogue to widening and narrowing techniques in the classical setting.
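As a tiny illustration of such an iterative method, the following sketch (the 2 × 2 absorbing chain is an invented toy example, not taken from the notes) iterates x_{k+1} = x_k T with a simple stopping criterion:

```python
def step(x, T):
    # One DTMC step: distributions are row vectors, x' = x T
    return [sum(x[i] * T[i][j] for i in range(len(x))) for j in range(len(T[0]))]

T = [[0.5, 0.5],    # from state 0: stay, or move to the absorbing state
     [0.0, 1.0]]    # state 1 is absorbing

x, steps = [1.0, 0.0], 0
while True:
    x_next = step(x, T)
    steps += 1
    if max(abs(a - b) for a, b in zip(x, x_next)) < 1e-12:  # stopping criterion
        break
    x = x_next

assert abs(x_next[1] - 1.0) < 1e-9   # all mass ends in the absorbing state
```

Only finitely many iterations are performed; the stopping criterion bounds how far the iterate can still be from the limit.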
4.4
Classical vs. Probabilistic Abstract Interpretation
Classical abstract interpretation and probabilistic abstract interpretation provide "approximations" for different mathematical structures, namely partial orders vs vector spaces. In order to illustrate and compare their features we therefore need a setting where the domain in question in some way naturally provides both structures. One such situation is the context of classical function interpolation or approximation. The set of real-valued functions on a real interval [a, b] obviously comes with a canonical partial order, namely the point-wise ordering, and at the same time is equipped with a vector space structure, where again addition and scalar multiplication are defined point-wise. Some care has to be taken in order to define an inner product – which we need to obtain a Hilbert space structure – e.g. one could consider only the square integrable functions L2([a, b]). In order to avoid mathematical (e.g. measure-theoretic) details we simplify the situation by just considering the step functions on the interval [a, b].

For a (closed) real interval [a, b] ⊆ R we call the set of subintervals [a_i, b_i] with i = 1, . . . , n the n-subdivision of [a, b] if ∪_{i=1}^n [a_i, b_i] = [a, b] and b_i − a_i = (b − a)/n for all i = 1, . . . , n. We assume that the sub-intervals are enumerated in the obvious way, i.e. a_i < b_i = a_{i+1} < b_{i+1} for all i, and in particular that a = a_1 and b_n = b.

Definition 3. The set of n-step functions T_n([a, b]) on [a, b] is the set of real-valued functions f : [a, b] → R such that f is constant on each subinterval (a_i, b_i) in the n-subdivision of [a, b].

We define a partial order on T_n([a, b]) in the obvious way for f, g ∈ T_n([a, b]):

f ⊑ g iff f((a_i + b_i)/2) ≤ g((a_i + b_i)/2), for all 1 ≤ i ≤ n,

i.e. iff the value of f (which we obtain by evaluating it on the mid-point of (a_i, b_i)) on all subintervals (a_i, b_i) is less or equal to the value of g. It is also obvious to see that T_n([a, b]) has a vector space structure isomorphic to R^n and thus is also provided with an inner product. More concretely, we define the vector space operations . · . : R × T_n([a, b]) → T_n([a, b]) and . + . : T_n([a, b]) × T_n([a, b]) → T_n([a, b]) pointwise as follows:

(α · f)(x) = αf(x)
(f + g)(x) = f(x) + g(x)

for all α ∈ R, f, g ∈ T_n([a, b]) and x ∈ [a, b]. The inner product is given by:

⟨f, g⟩ = Σ_{i=1}^n f((a_i + b_i)/2) · g((a_i + b_i)/2).
In this setting we can now apply and compare both the classical and the quantitative version of abstract interpretation, as in the following example.
Example 10. Let us consider a step function f in T16 (the concrete values of a and b don't really play a role in our setting), depicted as a bar chart over [a, b] with values between 0 and 10. We can also represent f by the vector in R^16:

(5, 5, 6, 7, 8, 4, 3, 2, 8, 6, 6, 7, 9, 8, 8, 7)
We then construct a series of abstractions which correspond to coarser and coarser sub-divisions of the interval [a, b], e.g. considering 8, 4, etc. subintervals instead of the original 16. These abstractions are from T16([a, b]) to T8([a, b]), T4([a, b]), etc. and can be represented by 16 × 8, 16 × 4, etc. matrices. For example, the abstraction which joins two sub-intervals, i.e. the abstraction α8 : T16([a, b]) → T8([a, b]), together with its Moore-Penrose pseudo-inverse is represented by:

     ( 1 0 0 0 0 0 0 0 )
     ( 1 0 0 0 0 0 0 0 )
     ( 0 1 0 0 0 0 0 0 )
     ( 0 1 0 0 0 0 0 0 )
     ( 0 0 1 0 0 0 0 0 )
     ( 0 0 1 0 0 0 0 0 )
     ( 0 0 0 1 0 0 0 0 )
A8 = ( 0 0 0 1 0 0 0 0 )
     ( 0 0 0 0 1 0 0 0 )
     ( 0 0 0 0 1 0 0 0 )
     ( 0 0 0 0 0 1 0 0 )
     ( 0 0 0 0 0 1 0 0 )
     ( 0 0 0 0 0 0 1 0 )
     ( 0 0 0 0 0 0 1 0 )
     ( 0 0 0 0 0 0 0 1 )
     ( 0 0 0 0 0 0 0 1 )

and G8 is the 8 × 16 matrix with (G8)_{k,2k−1} = (G8)_{k,2k} = 1/2 and all other entries 0:

     ( 1/2 1/2  0   0   0   0  . . .  0   0  )
G8 = (  0   0  1/2 1/2  0   0  . . .  0   0  )
     (  :                                 :  )
     (  0   0   0   0   0   0  . . . 1/2 1/2 )

With the help of A_j, j ∈ {1, 2, 4, 8}, we can easily compute the abstraction of f as f A_j, which, in order to compare it with the original f, we can then again
concretise using G, i.e. computing f A G. In a similar way we can also compute the over- and under-approximation of f in T_i based on the above pointwise ordering and its reverse ordering. The result of these abstractions is depicted geometrically in Figure 1. The individual diagrams in this figure depict the original, i.e. concrete, step function f ∈ T16 together with its approximations in T8, T4, etc. On the left hand side the PAI abstractions show how coarser and coarser interval subdivisions result in a series of approximations which try to interpolate the given function as closely as possible, sometimes below, sometimes above the concrete values. The diagrams on the right hand side depict the classical over- and under-approximations: In each case the function f is entirely below or above these approximations, i.e. we have safe but not necessarily close approximations. Additionally, one can see from these figures not only that the PAI interpolation is in general closer to the original function than the classical abstractions (in fact it is the closest possible) but also that the PAI interpolation is always between the classical over- and under-approximations.

The vector space framework also allows us to judge the quality of an abstraction or approximation via the Euclidean distance between the concrete and abstract version of a function. We can compute the least square error as ‖f − f A G‖.
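Since f A_j sums blocks of values and G_j spreads them back with equal weights, f A_j G_j simply replaces each block of f by its average, so the least square errors can be computed directly (a sketch; block sizes 2, 4, 8, 16 correspond to A8, A4, A2, A1):

```python
import math

f = [5, 5, 6, 7, 8, 4, 3, 2, 8, 6, 6, 7, 9, 8, 8, 7]

def abstract_concretise(f, block):
    # f A_j G_j: replace each block of `block` values by its average
    means = [sum(f[i:i + block]) / block for i in range(0, len(f), block)]
    return [m for m in means for _ in range(block)]

def error(f, block):
    g = abstract_concretise(f, block)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))

assert abs(error(f, 2) - 3.5355) < 1e-3    # A8
assert abs(error(f, 4) - 5.3151) < 1e-3    # A4
assert abs(error(f, 8) - 5.9896) < 1e-3    # A2
assert abs(error(f, 16) - 7.6444) < 1e-3   # A1
```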
In our case we get for example:

‖f − f A8 G8‖ = 3.5355
‖f − f A4 G4‖ = 5.3151
‖f − f A2 G2‖ = 5.9896
‖f − f A1 G1‖ = 7.6444

which illustrates, as expected, that the coarser our abstraction is, the larger is also the mistake or error.

4.5
Examples
We conclude by discussing in detail how probabilistic abstraction allows us to analyse the properties of programs. In the first example we are going to present, the aim is to reduce the size (dimension) of the concrete semantics so as to allow for an immediate understanding of the results of a computation. The second example will look more closely at the efficiency of an analysis, i.e. how PAI
[Fig. 1. Average, Over- and Under-Approximation. The diagrams show, for T8, T4, T2 and T1, the probabilistic abstract interpretation (left column) and the classical abstract interpretation (right column) of the step function f.]
can be deployed in order to beat the combinatorial explosion or the curse of dimensionality. Example 11 (Monty Hall). We have already investigated the LOS semantics of the Monty Hall program in Example 5. We still have to analyse whether it is Ht or Hw that implements the better strategy. In principle, we can do this using the concrete semantics we constructed above. However, it is rather cumbersome
to work with "relatively large" 162 × 162 or 243 × 243 matrices, even when they are sparse, i.e. contain almost only zeros (in fact only about 1.2% of the entries in Ht and 0.7% of the entries in Hw are non-zero). If we want to analyse the final states, i.e. which of the two programs has a better chance of getting the right door, we need to start with an initial configuration and then iterate T(H) until we reach a final configuration. For our programs it is sufficient to indicate that we start in label 1, while the state is irrelevant as we initialise all three variables at the beginning of the program; we could take – for example – a state with d = o = g = 0. The vector or distribution which describes this initial configuration is a 162 or 243 dimensional vector. We can describe it in a rather compact form as:

x0 = (1 0 0) ⊗ (1 0 0) ⊗ (1 0 0) ⊗ (1 0 0 . . . 0),

where the last factor is 6 or 9 dimensional, depending on whether we deal with Ht or Hw. This represents a point distribution on the 162 or 243 dimensional space of configurations. Assuming that our program terminates for all initial states, as is the case here, there exists a certain number of iterations t such that x0 T(H)^t = x0 T(H)^{t+1}, i.e. we will eventually reach a fixpoint which gives us a distribution over configurations. In general, as in our case here, this will not be just a point distribution. Again we get vectors of dimension 162 or 243, respectively. For Ht and Hw there are 12 configurations which have non-zero probability.
for Ht:               for Hw:
x12  = 0.074074       x18  = 0.11111
x18  = 0.037037       x27  = 0.11111
x36  = 0.11111        x54  = 0.037037
x48  = 0.11111        x72  = 0.074074
x72  = 0.11111        x108 = 0.074074
x78  = 0.037037       x117 = 0.11111
x90  = 0.074074       x135 = 0.11111
x96  = 0.11111        x144 = 0.037037
x120 = 0.11111        x180 = 0.037037
x132 = 0.11111        x198 = 0.074074
x150 = 0.074074       x225 = 0.11111
x156 = 0.037037       x234 = 0.11111

It is anything but easy to determine from this information which of the two strategies is more successful. In order to achieve this we will abstract away all unnecessary information. First, we ignore the syntactic information: If we are in the terminal state, then we have reached the final stop state, but even if this were not the case we only need to know whether in the final state we have guessed the right door, i.e. whether d==g or not. We thus also don't need to know the value of o as it ultimately is of no interest to us which door had been opened during the game. Therefore, we can use the forgetful abstraction
A_f to simplify the information contained in the terminal state. Regarding d and g we want to know everything, and thus use the trivial abstraction A = I, i.e. the identity. The result for Ht, with x the terminal configuration distribution, is:

x · (I ⊗ I ⊗ A_f ⊗ A_f) = (0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11)

and for Hw we get:

x · (I ⊗ I ⊗ A_f ⊗ A_f) = (0.22, 0.04, 0.07, 0.07, 0.22, 0.04, 0.04, 0.07, 0.22)

The nine coordinates of these vectors correspond to (d → 0, g → 0), (d → 0, g → 1), (d → 0, g → 2), (d → 1, g → 0), . . . , (d → 2, g → 2). This is in principle enough to conclude that Hw is the better strategy. However, we can go a step further and abstract not the values of d and g but their relation, i.e. whether they are equal or different. For this we need the abstraction:

     ( 1 0 )
     ( 0 1 )
     ( 0 1 )
     ( 0 1 )
Aw = ( 1 0 )
     ( 0 1 )
     ( 0 1 )
     ( 0 1 )
     ( 1 0 )

where the first column corresponds to a winning situation (i.e. d and g are equal), and the second to unequal d and g. With this we get for Ht:

x · (Aw ⊗ A_f ⊗ A_f) = (0.33333, 0.66667)

and for Hw
x · (Aw ⊗ A_f ⊗ A_f) = (0.66667, 0.33333)
It is now obvious that Ht has just a 1/3 chance of winning, while Hw has a probability of 2/3 of picking the winning door.
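The 1/3 vs 2/3 result is easy to cross-check by direct simulation, independently of the semantics (a sketch; the game encoding is ours, not the pWhile program):

```python
import random

def play(switch, rng):
    prize, guess = rng.randrange(3), rng.randrange(3)
    # the host opens a door which is neither the prize nor the guess
    opened = rng.choice([d for d in range(3) if d not in (prize, guess)])
    if switch:
        guess = next(d for d in range(3) if d not in (guess, opened))
    return guess == prize

rng = random.Random(42)
N = 100_000
stay   = sum(play(False, rng) for _ in range(N)) / N
switch = sum(play(True, rng) for _ in range(N)) / N

assert abs(stay - 1/3) < 0.015    # Ht: stick with the first guess
assert abs(switch - 2/3) < 0.015  # Hw: switch doors
```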
This example illustrates how abstraction can be used in order to obtain useful information from a large collection of data – so to say, how to use abstractions to do statistics. We did not utilise PAI here to simplify the semantics itself but only the final results. We will now consider this issue in our second running example.

Example 12 (Factorial). Classical abstraction allows us to determine the parity properties of the "double factorial" in Example 2. However, we cannot use it to justify our intuition that even the plain factorial itself almost always produces an even result. In order to do this, let us first consider the concrete semantics of our program using the following labelling:
var m : {0..2};
    n : {0..2};
begin
  [m := 1]^1;
  while [(n>1)]^2 do
    [m := m*n]^3;
    [n := n-1]^4;
  od;
  [stop]^5;
end

The flow F of this program is given as follows:

Flow(F) = {(1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 1, 2), (2, 1, 5), (5, 1, 5)}

The operator T(F) is then constructed as

T(F) = U(m ← 1) ⊗ E(1, 2) + P((n > 1)) ⊗ E(2, 3)
     + U(m ← (m * n)) ⊗ E(3, 4) + U(n ← (n - 1)) ⊗ E(4, 2)
     + P((n ≤ 1)) ⊗ E(2, 5) + I ⊗ E(5, 5)

and its abstract version, obtained by abstracting each update and test operator, as

T#(F) = U#(m ← 1) ⊗ E(1, 2) + P#((n > 1)) ⊗ E(2, 3)
      + U#(m ← (m * n)) ⊗ E(3, 4) + U#(n ← (n - 1)) ⊗ E(4, 2)
      + P#((n ≤ 1)) ⊗ E(2, 5) + I# ⊗ E(5, 5)

where the abstract operators are obtained as before via the Moore-Penrose pseudo-inverse, e.g. P#((n > 1)) = A† P((n > 1)) A, with the concrete test P((n > 1)) given by the diagonal 0/1 matrix which selects those values of n for which the test succeeds.
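Before abstracting, the probabilistic intuition itself can be checked by brute force: running the factorial program for every initial value of n in a larger range (we use {0..100} here purely for illustration; the declared range above is {0..2}) shows that the result is odd only for n ∈ {0, 1}:

```python
def factorial_is_even(n):
    # concrete execution of the labelled program with m := 1
    m = 1
    while n > 1:
        m = m * n
        n = n - 1
    return m % 2 == 0

evens = sum(factorial_is_even(n) for n in range(101))
assert evens == 99                      # only n = 0 and n = 1 give an odd m
assert abs(evens / 101 - 0.9802) < 1e-4 # "most" values of n yield an even result
```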
m(P′), where < is the lexicographic ordering on N^{e+1}. To clarify these definitions, consider the following example. Suppose P's command sequence is of the form EXZE; then e = 2, dE(P) = (1, 4), and m(P) = (1, 4, 3). For the command sequence EEX we get that e = 2, dE(P) = (2, 3) and m(P) = (2, 3, 2). Now, if one considers the rewrite EEX ⇒ EXZE, the measure of the left hand side is (2, 3, 2), while the measure of the right hand
62
A. Broadbent, J. Fitzsimons, and E. Kashefi
side, as said, is (1, 4, 3), and indeed (2, 3, 2) > (1, 4, 3). Intuitively the reason is clear: the Cs are being pushed to the left, thus decreasing the depths of the Es, and concomitantly the value of dE.

Let us now consider all cases, starting with an EC rewrite. Suppose the E command under rewrite has depth d and rank i in the order <E. Then all Es of smaller rank have the same depth in the right hand side, while E now has depth d − 1 and still rank i. So the right hand side has a strictly smaller measure. Note that when C = X, because of the creation of a Z (see the example above), the last element of m(P) may increase, and for the same reason all elements of index j > i in dE(P) may increase. This is why we are working with a lexicographic ordering.

Suppose now one does an MC rewrite; then dC(P) strictly decreases, since one correction is absorbed, while all E commands have equal or smaller depths. Again the measure strictly decreases. Next, suppose one does an EA rewrite, and the E command under rewrite has depth d and rank i. Then it has depth d − 1 in the right hand side, and all other E commands have invariant depths, since we forbade the case when A is itself an E. It follows that the measure strictly decreases. Finally, upon an AC rewrite, all E commands have invariant depth, except possibly one which has smaller depth in the case A = E, and dC(P) decreases strictly because we forbade the case where A = C. Again the claim follows.

So all rewrites decrease our ordinal measure, and therefore all sequences of rewrites are finite; and since the system is finitely branching (there are no more than n possible single step rewrites on a given sequence of length n), we get the statement of the theorem.

The next theorem establishes the important determinacy property and furthermore shows that the standard patterns have a certain canonical form which we call the NEMC form. The precise definition is:

Definition 13.
A pattern has a NEMC form if its commands occur in the order of Ns first, then Es, then Ms, and finally Cs. We will usually just say "EMC" form, since we can assume that all the auxiliary qubits are prepared in the |+⟩ state and we usually just elide these N commands.

Theorem 5 (Confluence). For all P, there exists a unique standard P′, such that P ⇒ P′, and P′ is in EMC form.

Proof. Since the rewriting system is terminating, confluence follows from local confluence by Newman's lemma, see, for example, [Bar84]. This means that whenever two rewrite rules can be applied to a term t yielding t1 and t2, one can rewrite both t1 and t2 to a common third term t3, possibly in many steps. Then the uniqueness of the standard form is an immediate consequence. In order to prove the local confluence we look for critical pairs, that is occurrences of three successive commands where two rules can be applied simultaneously. One finds that there are only five types of critical pairs; three of these involve the N command and are of the form NMC, NEC and NEM; and
Measurement-Based and Universal Blind Quantum Computation
63
the remaining two are E_{ij} M_k C_k with i, j and k all distinct, and E_{ij} M_k C_l with k and l distinct. In all cases local confluence is easily verified. Suppose now P does not satisfy the EMC form conditions. Then either there is a subsequence EA with A not of type E, or there is a subsequence AC with A not of type C. In the former case, E and A must operate on overlapping qubits, else one may apply a free commutation rule; and A may not be a C, since in that case one may apply an EC rewrite. The only remaining case is when A is of type M, overlapping E's qubits, but this is what condition (D1) forbids, and since (D1) is preserved under rewriting, this contradicts the assumption. The latter case is even simpler. We have shown that under rewriting any pattern can be put in EMC form, which is what we wanted. We actually proved more, namely that the standard form obtained is unique. However, one has to be a bit careful about the significance of this additional piece of information. Note first that uniqueness is obtained because we dropped the CC and EE free commutations, thus having a rigid notion of command sequence. One cannot put them back as rewrite rules, since they obviously ruin termination and uniqueness of standard forms. A reasonable thing to do would be to take this set of equations as generating an equivalence relation on command sequences, call it ≡, and hope to strengthen the results obtained so far by proving that all reachable standard forms are equivalent. But this is too naive a strategy, since E_{12} X_1 X_2 ≡ E_{12} X_2 X_1, and

E_{12} X_1^s X_2^t ⇒ X_1^s Z_2^s X_2^t Z_1^t E_{12} ≡ X_1^s Z_1^t Z_2^s X_2^t E_{12},

obtaining an expression which is not symmetric in 1 and 2. To conclude, one has to extend ≡ to include the additional equivalence X_1^s Z_1^t ≡ Z_1^t X_1^s, which fortunately is sound, since these two operators are equal up to a global phase. Thus all these expressions are equivalent in our semantics of patterns. We summarise this discussion as follows.

Definition 14.
We define an equivalence relation ≡ on patterns by taking all the rewrite rules as equations, adding the equation X_1^s Z_1^t ≡ Z_1^t X_1^s, and generating the smallest equivalence relation. With this definition we can state the following proposition.

Proposition 15. All patterns that are equivalent by ≡ are equal in the denotational semantics.

The relation ≡ preserves both the type (the (V, I, O) triple) and the underlying entanglement graph. So clearly semantic equality does not entail equality up to ≡. In fact, by composing teleportation patterns one obtains infinitely many patterns for the identity which are all different up to ≡. One may wonder whether two patterns with the same semantics, type and underlying entanglement graph are necessarily equal up to ≡. This is not true either. One has J(α)J(0)J(β) = J(α + β) = J(β)J(0)J(α) (where J(α) is defined in Section 5),
A. Broadbent, J. Fitzsimons, and E. Kashefi
and this readily provides a counter-example. We can now formally describe a simple standardisation algorithm.

Algorithm. Input: A pattern P on |V| = N qubits with command sequence A_M ⋯ A_1. Output: An equivalent pattern P′ in NEMC form.

1. Commute all the preparation commands (new qubits) to the right side.
2. Commute all the correction commands to the left side using the EC and MC rewrite rules.
3. Commute all the entanglement commands to the right side, after the preparation commands.

Note that since each qubit can be entangled with at most N − 1 other qubits, and can be measured or corrected only once, we have O(N²) entanglement commands and O(N) measurement commands. By the definiteness condition, no command acts on a qubit not yet prepared, hence the first step of the above algorithm relies only on trivial commutation rules; the same is true for the last step, as no entanglement command can act on a qubit that has already been measured. Both steps can be done in O(N²) time. The real complexity of the algorithm comes from the second step and the EX commutation rule. In the worst case, commuting an X correction to the left might create O(N²) other Z corrections, each of which has to be commuted to the left itself. Thus one can have at most O(N³) new corrections, each of which has to be commuted past O(N²) measurement or entanglement commands. Therefore the second step, and hence the algorithm, has a worst-case complexity of O(N⁵) time. We conclude this subsection by emphasising the importance of the EMC form. Since the entanglement can always be done first, we can always derive the entanglement resource needed for the whole computation right at the beginning. After that, only local operations will be performed. This separates the analysis of entanglement resource requirements from the classical control.
Furthermore, this makes it possible to extract the maximal parallelism for the execution of the pattern, since the necessary dependencies are explicitly expressed; see [BK09] for further discussion. The EMC form also provides us with tools to prove general theorems about patterns, such as the fact that they always compute CPTP maps, and the expressiveness theorems of [DKP07]. Finally, we present later the first MBQC protocol designed using the EMC form, which allows one to clearly distinguish between the quantum and classical aspects of a quantum computation.
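The core of step 2 of the algorithm, pushing corrections left past entanglement commands, can be sketched as follows. This is a minimal illustration under our own encoding, not the authors' implementation: a pattern is a Python list of commands in written order (leftmost applied last), signal exponents are elided, and only the EC rules E_{ij} X_i = X_i Z_j E_{ij} and E_{ij} Z_i = Z_i E_{ij}, plus free commutation on disjoint qubits, are applied.

```python
# Sketch of step 2 of the standardisation algorithm (not the authors' code).
# Commands: ('E', i, j), ('X', i), ('Z', i), ('M', i), ('N', i), in written
# order (leftmost command is applied last).
def push_corrections_left(cmds):
    cmds = list(cmds)
    changed = True
    while changed:
        changed = False
        for k in range(len(cmds) - 1):
            a, b = cmds[k], cmds[k + 1]
            if a[0] != 'E' or b[0] not in ('X', 'Z'):
                continue
            i, j, q = a[1], a[2], b[1]
            if q not in (i, j) or b[0] == 'Z':
                # Free commutation on disjoint qubits, or E_ij Z_q = Z_q E_ij.
                cmds[k], cmds[k + 1] = b, a
            else:
                # E_ij X_q = X_q Z_other E_ij: a new Z correction is created.
                other = j if q == i else i
                cmds[k:k + 2] = [b, ('Z', other), a]
            changed = True
            break
    return cmds
```

Each rewrite strictly decreases the number of E commands to the right of some correction, mirroring the termination measure discussed above.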
7
Determinism
An important aspect of MBQC is the way the inherent randomness of the measurement outcomes can be accounted for, so that the overall computation remains deterministic. This is accomplished by conditioning the basis of certain measurements upon the outcome of others, introducing a measurement order. We first introduce various notions of determinism. A pattern is said to be
deterministic if it realises a CPTP map that sends pure states to pure states. This is equivalent to saying that for a deterministic pattern the branch maps are proportional, that is to say, for all q ∈ H_I and all s1, s2 ∈ Z_2^n, A_{s1}(q) and A_{s2}(q) differ only up to a scalar. The class of deterministic patterns includes projections. A more restricted class contains all the unitary and unitary-embedding operators: a pattern is said to be strongly deterministic when its branch maps are equal up to a global phase, i.e. for all s1, s2 ∈ Z_2^n, A_{s1} = e^{iφ_{s1,s2}} A_{s2}. These are the patterns implementing quantum algorithms, and hence understanding their structural properties is of particular interest.

Proposition 16. If a pattern is strongly deterministic, then it realises a unitary embedding.

Proof. Define T to be the map realised by the pattern. We have T(ρ) = Σ_s A_s ρ A_s†. Since the pattern is strongly deterministic, all the branch maps are the same. Define A to be 2^{n/2} A_s; then A must be a unitary embedding, because A†A = I.

An important sub-class of deterministic patterns is robust under changes of the angles: a pattern is said to be uniformly deterministic if it is deterministic for all values of its measurement angles. In other words, a uniformly deterministic pattern defines a class of quantum operators that can be performed given the same initial entanglement resource. On the other hand, it is known that if we fix all measurement angles to be Pauli, the obtained operator is in the Clifford group [DKP07]. That means uniform determinism allows us to associate to a family of quantum operators a canonical pattern implementing a Clifford operator, a potentially valuable abstract reduction for the study of quantum operators. Finally, a pattern is said to be stepwise deterministic if it is deterministic after performing each single measurement together with all the corrections depending on the result of that measurement.
In other words, a pattern is stepwise deterministic if after each single measurement there exists a set of local corrections, depending only on the result of that measurement, to be performed on some or all of the non-measured qubits, that will make the two branches equal (up to a global phase). A variety of methods for constructing measurement patterns that guarantee determinism by construction have already been proposed [RBB03, HEB04, CLN05a]. We introduce a direct condition on graph states which guarantees a strong form of deterministic behaviour for a class of MBQC patterns defined over them [DK06]. Remarkably, this condition bears only on the geometric structure of the entangled graph states. Let us define an open graph state (G, I, O) to be a state associated with an undirected graph G together with two subsets of nodes I and O, called inputs and outputs. We write V for the set of nodes in G, I^c and O^c for the complements of I and O in V, N_G(i) for the set of neighbours of i in G, i ∼ j for (i, j) ∈ G, and E_G := ∏_{i∼j} E_{ij} for the global entanglement operator associated with G. Thus i ∼ j denotes that i is adjacent to j in G. N_{I^c} denotes the sequence of preparation commands ∏_{i∈I^c} N_i.
Definition 17. A flow (f, ≼) for an open graph state (G, I, O) consists of a map f : O^c → I^c and a partial order ≼ over V such that for all x ∈ O^c: (i) x ∼ f(x); (ii) x ≼ f(x); (iii) for all y ∼ f(x), x ≼ y.

As one can see, a flow consists of two structures: a function f over vertices and a matching partial order over vertices. In order to obtain a deterministic pattern for an open graph state with flow, dependent corrections will be defined based on the function f. The order of execution of the commands is given by the partial order induced by the flow. The matching properties between the function f and the partial order make the obtained pattern runnable. Figure 2 shows an open graph state together with a flow, where f is represented by arcs from O^c (measured qubits, black vertices) to I^c (prepared qubits, non-boxed vertices). The associated partial order is given by the labelled sets of vertices. The coarsest order for which (f, ≼) is a flow is called the dependency order induced by f, and its depth (4 in Figure 2) is called the flow depth. The existence of a causal flow is a sufficient condition for determinism. Before we can prove this, however, we need the following simple lemma, which describes an essential property of graph states.

Lemma 18. For any open graph (G, I, O) and any i ∈ I^c,

E_G N_{I^c} = X_i Z_{N_G(i)} E_G N_{I^c}.

Proof. The proof is based on equations 10 and 12 of the Measurement Calculus, together with the additional equation X_i N_i = N_i, which follows from the fact that N_i produces a qubit in the |+⟩ state, which is a fixed point of X.
Fig. 2. An open graph state with flow. The boxed vertices are the input qubits and the white vertices are the output qubits. All the non-output qubits, black vertices, are measured during the run of the pattern. The flow function is represented as arcs and the partial order on the vertices is given by the 4 partition sets.
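The three flow conditions of Definition 17 are easy to check mechanically. Below is a hypothetical checker (the function name and encoding are our own): the partial order ≼ is represented by integer depth labels, as in the partition sets of Figure 2, with depth[x] < depth[y] encoding x ≺ y.

```python
# Hypothetical checker for the flow conditions of Definition 17.
def is_flow(edges, inputs, outputs, f, depth):
    adj = {}
    for a, b in edges:                     # build the adjacency of G
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for x in adj:
        if x in outputs:                   # f is only defined on O^c
            continue
        fx = f[x]
        if fx in inputs:                   # f must map into I^c
            return False
        if fx not in adj[x]:               # (i)   x ~ f(x)
            return False
        if not depth[x] < depth[fx]:       # (ii)  x before f(x)
            return False
        for y in adj[fx]:                  # (iii) x before every y ~ f(x)
            if y != x and depth[x] > depth[y]:
                return False
    return True
```

For instance, on the line graph 1–2–3 with I = {1}, O = {3}, the map f(1) = 2, f(2) = 3 with depths 0, 1, 2 is a flow.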
E_G N_{I^c} = E_G X_i N_{I^c}
            = (∏_{(k,l)∈G, k≠i, l≠i} E_{k,l}) (∏_{j∈N_G(i)} E_{i,j}) X_i N_{I^c}
            = (∏_{(k,l)∈G, k≠i, l≠i} E_{k,l}) X_i (∏_{j∈N_G(i)} Z_j) (∏_{j∈N_G(i)} E_{i,j}) N_{I^c}
            = X_i (∏_{j∈N_G(i)} Z_j) E_G N_{I^c}
            = X_i Z_{N_G(i)} E_G N_{I^c}

The operator K_i := X_i (∏_{j∈N_G(i)} Z_j) is called the graph stabiliser [HEB04] at qubit i, and the above lemma proves K_i E_G N_{I^c} = E_G N_{I^c}. Note that this equation is slightly more general than the common graph stabiliser [HEB04], as it can be applied to open graph states whose input qubits are prepared in arbitrary states.

Theorem 6. Suppose the open graph state (G, I, O) has flow f. Then the pattern

P_{f,G,α} := ∏_{i∈O^c} ( X_{f(i)}^{s_i} (∏_{k∼f(i), k≠i} Z_k^{s_i}) M_i^{α_i} ) E_G N_{I^c},

where the product follows the dependency order ≼ of f, is uniformly and strongly deterministic, and realises the unitary embedding

U_{G,I,O,α} := 2^{|O^c|/2} (∏_{i∈O^c} ⟨+_{α_i}|_i) E_G N_{I^c}.
Proof. The proof is based on anachronical patterns, i.e. patterns which do not satisfy the D0 condition (see Section 3) saying that no command depends on an outcome not yet measured. Indeed, in the anachronical pattern M_i^α Z_i^{s_i}, the command Z_i^{s_i} depends on the outcome s_i whereas the qubit i is not yet measured. However, by relaxing the D0 condition, we have the following equation:

⟨+_α|_i = M_i^α Z_i^{s_i}

Indeed, if s_i = 0 the measurement realises the projection ⟨+_α|_i, and if s_i = 1 the measurement realises the projection ⟨−_α|_i = ⟨+_α|_i Z_i. Thus, any correction-free pattern ∏_{i∈O^c} M_i^{α_i} E_G N_{I^c} can be turned into an anachronical strongly deterministic pattern ∏_{i∈O^c} M_i^{α_i} Z_i^{s_i} E_G N_{I^c}, which realises U_{G,I,O,α}. The rest of the proof consists in transforming this anachronical pattern into a pattern which satisfies the D0 condition:

∏_{i∈O^c} M_i^{α_i} Z_i^{s_i} E_G N_{I^c} = ∏_{i∈O^c} M_i^{α_i} Z_i^{s_i} (∏_{j∈N_G(f(i))} Z_j^{s_i}) X_{f(i)}^{s_i} E_G N_{I^c}
                                         = ∏^{≺}_{i∈O^c} X_{f(i)}^{s_i} (∏_{j∈N_G(f(i))∖{i}} Z_j^{s_i}) M_i^{α_i} E_G N_{I^c}

Lemma 18 and condition (iii) of the causal flow are used in the previous equation for eliminating the command Z_i^{s_i} (since i ∈ N_G(f(i)), the inserted stabiliser contributes a second factor Z_i^{s_i}, which cancels the anachronical correction), whereas conditions (i) and (ii) ensure that the pattern satisfies the D0 condition. The intuition of the proof is that entanglement between two qubits i and j converts an anachronical Z correction at i, given in the term M_i^α Z_i^{s_i}, into a
pair of 'future' corrections: an X correction on qubit f(i) together with Z corrections on the other neighbours of f(i). The existence of a flow is only a sufficient condition for determinism, and it assigns to every single measured qubit a single, unique correcting vertex f(i). A natural generalisation is to consider a set of vertices as a correcting set, which leads to a full characterisation of determinism [BKMP07]. Having obtained a rigorous mathematical model underlying MBQC, we can now demonstrate how this model suggests new techniques for designing quantum protocols.
8
Universal Blind Quantum Computing
When the technology to build quantum computers becomes available, it is likely that initially it will only be accessible to a handful of centres around the world. Much like today’s rental system of supercomputers, users will probably be granted access to the computers in a limited way. How will a user interface with such a quantum computer? Here, we consider the scenario where a user is unwilling to reveal the computation that the remote computer is to perform, but still wishes to exploit this quantum resource. The solution is the Universal Blind Quantum Computing (UBQC) protocol [BFK09] that allows a client Alice (who does not have any quantum computational resources or quantum memory) to interact with a server Bob (who has a quantum computer) in order for Alice to obtain the outcome of her target computation such that privacy is preserved. This means that Bob learns nothing about Alice’s inputs, outputs, or desired computation. The privacy is perfect, does not rely on any computational assumptions, and holds no matter what actions a cheating Bob undertakes. Alice only needs to be able to prepare single qubits randomly chosen from a finite set and send them to the server, who has the balance of the required quantum computational resources. After this initial preparation, Alice and Bob use two-way classical communication which enables Alice to drive the computation by giving single-qubit measurement instructions to Bob, depending on previous measurement outcomes. Note that if Alice wanted to compute the solution to a classical problem in NP, she could efficiently verify the outcome. An interfering Bob is not so obviously detected in other cases. UBQC uses an authentication technique which performs this detection. The UBQC protocol is constructed using the unique feature of MBQC that separates the classical and quantum parts of a computation, leading to a generic scheme for blind computation of any circuit without requiring any quantum memory for Alice. 
This is fundamentally different from previously known classical or quantum schemes. UBQC can be viewed as a distributed version of an MBQC computation (where Alice prepares the individual qubits, Bob does the entanglement and measurements, and Alice computes the classical feedforward mechanism), on top of which randomness is added in order to obscure the computation from Bob’s point of view. This is the first time that a new functionality has been achieved thanks to MBQC (though other theoretical advances due to MBQC appear in [RHG06, MS08]). From a conceptual point of view, this shows that MBQC has tremendous potential for the development of new protocols, and
maybe even of algorithms. UBQC can be used for any quantum circuit and also works for quantum inputs or outputs. We now give some applications.

– Factoring. Factoring is a prime application of UBQC: by implementing Shor's factoring algorithm [Sho97] as a blind quantum computation, Alice can use Bob to help her factor a product of large primes which is associated with an RSA public key [RSA78]. Thanks to the properties of UBQC, Bob will not only be unable to determine Alice's input, but will be completely oblivious to the fact that he is helping her factor.
– BQP-complete problems. UBQC could be used to help Alice solve a BQP-complete problem, for instance approximating the Jones polynomial [AJL06]. There is no known classical method to efficiently verify the solution; this motivates the need for authentication of Bob's computation, even in the case that the output is classical.
– Processing quantum information. Alice may wish to use Bob as a remote device to manipulate quantum information. Consider the case where Alice is participating in a quantum protocol such as a quantum interactive proof. She can use UBQC to prepare a quantum state, to perform a measurement on a quantum system, or to process quantum inputs into quantum outputs.
– Quantum prover interactive proofs. UBQC can be used to accomplish an interactive proof for any language in BQP, with a quantum prover and a nearly-classical verifier, where the verifier requires only the power to generate random qubits chosen from a fixed set. Moreover, UBQC can be adapted to provide a two-prover interactive proof for any problem in BQP with a purely classical verifier. The modification requires that the provers share entanglement but otherwise be unable to communicate. Guided by the verifier, the first prover measures his part of the entanglement in order to create a shared resource between the verifier and the second prover.
The remainder of the interaction involves the verifier and the second prover, who essentially run the main protocol. In the classical world, Feigenbaum introduced the notion of computing with encrypted data [Fei86], according to which a function f is encryptable if Alice can easily transform an instance x into an instance x′, obtain f(x′) from Bob, and efficiently compute f(x) from f(x′), in such a way that Bob cannot infer x from x′. Following this, Abadi, Feigenbaum and Kilian [AFK89] gave an impossibility result: no NP-hard function can be computed with encrypted data (even probabilistically and with polynomial interaction), unless the polynomial hierarchy collapses at the third level. Ignoring the blindness requirement of UBQC yields an interactive proof with a BQP prover and a nearly-classical verifier. This scenario was first proposed in the work of [ABE10], using very different techniques based on authentication schemes. Their protocol can also be used for blind quantum computation. However, their scheme requires that Alice have quantum computational resources and memory to act on a constant-sized register. A related classical protocol for the scenario involving a P prover and a nearly-linear time verifier was given in [GKR08].
Returning to the cryptographic scenario, still in the model where the function is classical and public, Arrighi and Salvail [AS06] gave an approach using quantum resources. The idea of their protocol is that Alice gives Bob multiple quantum inputs, most of which are decoys. Bob applies the target function on all inputs, and then Alice verifies his behaviour on the decoys. There are two important points to make here. First, the protocol only works for a restricted set of classical functions called random verifiable: it must be possible for Alice to efficiently generate random input-output pairs. Second, the protocol does not prevent Bob from learning Alice's private input; it provides only cheat sensitivity. The case of a blind quantum computation was first considered by Childs [Chi05], based on the idea of encrypting input qubits with a quantum one-time pad [AMTW00, BR03]. At each step, Alice sends the encrypted qubits to Bob, who applies a known quantum gate (some gates requiring further interaction with Alice). Bob returns the quantum state, which Alice decrypts using her key. Cycling through a fixed set of universal gates ensures that Bob learns nothing about the circuit. The protocol requires fault-tolerant quantum memory and the ability to apply local Pauli operators at each step, and does not provide any method for the detection of malicious errors. The UBQC protocol [BFK09] is the first protocol for universal blind quantum computation where Alice has no quantum memory, that works for any quantum circuit, and that assumes only that Alice has a classical computer, augmented with the power to prepare single qubits randomly chosen from

{ (1/√2)(|0⟩ + e^{iθ}|1⟩) | θ = 0, π/4, 2π/4, . . . , 7π/4 }.

The required quantum and classical communication between Alice and Bob is linear in the size of Alice's desired quantum circuit. Interestingly, it is sufficient for our purposes to restrict Alice's classical feedforward computation to modulo-8 arithmetic!
Similar observations in a non-cryptographic context have been made in [AB09]. Except for an unavoidable leakage of the size of Alice’s data [AFK89], Alice’s privacy is perfect. We provide an authentication technique to detect an interfering Bob with overwhelming probability; this is optimal since there is always an exponentially small probability that Bob can guess a path that will make Alice accept. All previous protocols for blind quantum computation require technology for Alice that is today unavailable: Arrighi and Salvail’s protocol requires multiqubit preparations and measurements, Childs’ protocol requires fault-tolerant quantum memory and the ability to apply local Pauli operators at each step, while Aharonov, Ben-Or and Eban’s protocol requires a constant-sized quantum computer with memory. In sharp contrast to this, from Alice’s point of view, UBQC can be implemented with physical systems that are already available and well-developed. The required apparatus can be achieved by making only minor modifications to equipment used in the BB84 key exchange protocol [BB84].
9
The Brickwork States
The family of graph states called cluster states [RB01] is universal for MBQC; however, the method that allows arbitrary computation on the cluster state consists in first tailoring the state to the specific computation by performing some computational-basis measurements. If one were to use this principle, or any arbitrary graph states, for blind quantum computing, Alice would have to reveal information about the structure of the underlying graph. Instead, UBQC uses a new family of states called brickwork states (Figure 3), which are universal for X–Y plane measurements and thus do not require the initial computational-basis measurements. Other universal graph states that do not require initial computational-basis measurements have appeared in [CLN05b].

Definition 19. A brickwork state G_{n×m}, where m ≡ 1 or 5 (mod 8), is an entangled state of n × m qubits constructed as follows (see also Figure 3):

1. Prepare all qubits in state |+⟩ and assign to each qubit an index (i, j), i being a column (i ∈ [n]) and j being a row (j ∈ [m]).
2. For each row, apply the operator ∧Z on qubits (i, j) and (i, j + 1) where 1 ≤ j ≤ m − 1.
3. For each column j ≡ 3 (mod 8) and each odd row i, apply the operator ∧Z on qubits (i, j) and (i + 1, j) and also on qubits (i, j + 2) and (i + 1, j + 2).
4. For each column j ≡ 7 (mod 8) and each even row i, apply the operator ∧Z on qubits (i, j) and (i + 1, j) and also on qubits (i, j + 2) and (i + 1, j + 2).
Fig. 3. The brickwork state G_{n×m}. Qubits |ψ_{x,y}⟩ (x = 1, . . . , n, y = 1, . . . , m) are arranged according to layer x and row y, corresponding to the vertices in the above graph, and are originally in the state |+⟩ = (1/√2)|0⟩ + (1/√2)|1⟩. Controlled-Z gates are then performed between qubits which are joined by an edge.
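The edge set of Definition 19 can be sketched directly. Note that the definition's indexing is ambiguous (step 1 calls i a column, while steps 3 and 4 treat j as the column); the sketch below, whose function name is our own, follows steps 2–4 and treats i as the row index and j as the column index.

```python
# Sketch of the brickwork-state construction of Definition 19: returns the
# pairs of qubits on which the ctrl-Z operator is applied.
def brickwork_edges(n, m):
    assert m % 8 in (1, 5), "Definition 19 requires m = 1 or 5 (mod 8)"
    edges = []
    # Step 2: horizontal wires along each row.
    for i in range(1, n + 1):
        for j in range(1, m):
            edges.append(((i, j), (i, j + 1)))
    # Steps 3 and 4: the vertical links of the "bricks".
    for j in range(1, m + 1):
        if j % 8 == 3:
            rows = range(1, n, 2)        # odd rows i
        elif j % 8 == 7:
            rows = range(2, n, 2)        # even rows i
        else:
            continue
        for i in rows:
            edges.append(((i, j), (i + 1, j)))
            if j + 2 <= m:
                edges.append(((i, j + 2), (i + 1, j + 2)))
    return edges
```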
Theorem 7 (Universality). The brickwork state Gn×m is universal for quantum computation. Furthermore, we only require single-qubit measurements under the angles {0, ±π/4, ±π/2}, and measurements can be done layer-by-layer.
Proof. It is well known that the set U = {∧X, H, Z(π/4)} is a universal set of gates, where ∧X denotes the controlled-X operator; we will show how the brickwork state can be used to compute any gate in U. Recall the rotation transformations X(θ) = e^{iθX/2} and Z(θ) = e^{iθZ/2}. Consider the measurement pattern and underlying graph state given in Figure 4. The implicit required corrections are implemented according to the flow condition [DK06], which guarantees determinism and allows measurements to be performed layer-by-layer. The action of the measurement of the first three qubits on each wire is given by the rotations in the right-hand part of Figure 4 [BB06]. The circuit identity follows since ∧Z commutes with Z(α) and is self-inverse. By assigning specific values to the angles, we get the Hadamard gate (Figure 5), the Z(π/4) gate (Figure 6) and the identity (Figure 7). By symmetry, we can get H or Z(π/4) acting on logical qubit 2 instead of logical qubit 1. In Figure 8, we give a pattern and show, using circuit identities, that it implements a ∧X. The verification of the circuit identities is straightforward. Again by symmetry, we can reverse the control and target qubits. Note that having ∧Xs between any pair of neighbours is sufficient to implement ∧X between qubits that are further apart. We now show how we can tile the patterns given in Figures 4 through 8 (the underlying graph states are the same) to implement any circuit using U as a universal set of gates. In Figure 9, we show how a 4-qubit circuit with three gates, U1, U2 and U3 (each gate acting on at most two adjacent qubits), can be implemented on the brickwork state G_{9,4}. We have completed the top and bottom logical wires with a pattern that implements the identity. Generalising this technique, we get the family of brickwork states given in Figure 3 and Definition 19. Here we only consider approximate universality.
This allows us to restrict the angles of preparation and measurement to a finite set and hence simplify the
Fig. 4. Pattern with arbitrary rotations. Squares indicate output qubits.
Fig. 5. Implementation of a Hadamard gate
Fig. 6. Implementation of a Z(π/4) gate
Fig. 7. Implementation of the identity
Fig. 8. Implementation of a ∧X
Fig. 9. Tiling for a 4-qubit circuit with three gates
description of the protocol. However, one can easily extend UBQC to achieve exact universality as well, provided Alice can communicate real numbers to Bob.
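The two algebraic facts used in the proof of Theorem 7, namely that ∧Z is self-inverse and commutes with a Z-rotation on either qubit, can be checked numerically. The following is our own sketch (helper names are ours), using the Z(θ) = e^{iθZ/2} convention recalled above.

```python
import cmath

# Numeric check (sketch) of the circuit identities used in Theorem 7's proof.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron2(A, B):
    # Kronecker product of two 2x2 matrices.
    return [[A[i][j] * B[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
I2 = [[1, 0], [0, 1]]

def Zrot(theta):
    # Z(theta) = e^{i theta Z / 2} = diag(e^{i theta/2}, e^{-i theta/2}).
    return [[cmath.exp(1j * theta / 2), 0], [0, cmath.exp(-1j * theta / 2)]]
```

With these helpers, CZ·CZ equals the identity, and CZ·(Z(α)⊗I) equals (Z(α)⊗I)·CZ for any α.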
10
The UBQC Protocol
Suppose Alice has in mind a unitary operator U that is implemented with a pattern on a brickwork state Gn×m (Figure 3) with measurements given as multiples of π/4. This pattern could have been designed either directly in MBQC
or from a circuit construction. Each qubit |ψ_{x,y}⟩ ∈ G_{n×m} is indexed by a column x ∈ {1, . . . , n} and a row y ∈ {1, . . . , m}. Thus each qubit is assigned: a measurement angle φ_{x,y}, a set of X-dependencies D_{x,y} ⊆ [x−1] × [m], and a set of Z-dependencies D′_{x,y} ⊆ [x−1] × [m]. Here we assume that the dependency sets X_{x,y} and Z_{x,y} are obtained via the flow construction [DK06]. During the execution of the pattern, the actual measurement angle φ′_{x,y} is a modification of φ_{x,y} that depends on previous measurement outcomes in the following way: let s^X_{x,y} = ⊕_{i∈D_{x,y}} s_i be the parity of all measurement outcomes for qubits in X_{x,y}, and similarly let s^Z_{x,y} = ⊕_{i∈D′_{x,y}} s_i be the parity of all measurement outcomes for qubits in Z_{x,y}. Then

φ′_{x,y} = (−1)^{s^X_{x,y}} φ_{x,y} + s^Z_{x,y} π.

Protocol 1 implements a blind quantum computation for U. Note that we assume that Alice's input to the computation is built into U. In other words, Alice wishes to compute U|0⟩; her input is classical and the first layers of U may depend on it.
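The angle update above can be computed with integer arithmetic by storing angles as multiples of π/4, so that π is 4 and everything is taken modulo 8, matching the modulo-8 feedforward remark in the previous section. The helper below is a hypothetical sketch; its name and signature are our own.

```python
# Hypothetical helper for the adapted angle
#   phi' = (-1)^(s_X) * phi + s_Z * pi,
# with angles stored as integer multiples of pi/4 (arithmetic modulo 8).
def adapted_angle(phi, x_outcomes, z_outcomes):
    s_x = sum(x_outcomes) % 2   # parity over the X-dependency set D_{x,y}
    s_z = sum(z_outcomes) % 2   # parity over the Z-dependency set D'_{x,y}
    return ((-1) ** s_x * phi + 4 * s_z) % 8
```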
Protocol 1. Universal Blind Quantum Computation

1. Alice's preparation
   For each column x = 1, . . . , n
   For each row y = 1, . . . , m
   1.1 Alice prepares |ψ_{x,y}⟩ ∈_R {|+_{θ_{x,y}}⟩ = (1/√2)(|0⟩ + e^{iθ_{x,y}}|1⟩) | θ_{x,y} = 0, π/4, . . . , 7π/4} and sends the qubits to Bob.
2. Bob's preparation
   2.1 Bob creates an entangled state from all received qubits, according to their indices, by applying ∧Z gates between the qubits in order to create a brickwork state G_{n×m} (see Definition 19).
3. Interaction and measurement
   For each column x = 1, . . . , n
   For each row y = 1, . . . , m
   3.1 Alice computes φ′_{x,y}, where s^X_{0,y} = s^Z_{0,y} = 0.
   3.2 Alice chooses r_{x,y} ∈_R {0, 1} and computes δ_{x,y} = φ′_{x,y} + θ_{x,y} + π r_{x,y}.
   3.3 Alice transmits δ_{x,y} to Bob. Bob measures in the basis {|+_{δ_{x,y}}⟩, |−_{δ_{x,y}}⟩}.
   3.4 Bob transmits the result s_{x,y} ∈ {0, 1} to Alice.
   3.5 If r_{x,y} = 1, Alice flips s_{x,y}; otherwise she does nothing.
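Alice's classical side of steps 3.2 and 3.5 can be sketched in the same units of π/4 (the function names are ours). Here r is passed explicitly to keep the sketch deterministic, whereas in the protocol it is drawn uniformly at random from {0, 1}.

```python
# Sketch of Alice's classical computation in Protocol 1, with angles stored
# as integers in units of pi/4, so all arithmetic is modulo 8.
def alice_instruction(phi_prime, theta, r):
    # Step 3.2: delta = phi' + theta + pi * r  (pi is 4 in these units).
    return (phi_prime + theta + 4 * r) % 8

def alice_decode(s, r):
    # Step 3.5: flip the reported outcome when r = 1.
    return s ^ r
```

For any fixed φ′, as θ ranges over its eight possible values δ is uniformly distributed, which is the intuition behind the blindness proof of Theorem 9.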
The universality of Protocol 1 follows from the universality of the brickwork state for measurement-based quantum computing. Correctness refers to the fact that the outcome of the protocol is the same as the outcome if Alice had run the pattern herself. The fact that Protocol 1 correctly computes U|0⟩ follows from the commutativity of Alice's rotations and Bob's measurements in the rotated bases. This is formalised below.

Theorem 8 (Correctness). Assume Alice and Bob follow the steps of Protocol 1. Then the outcome is correct.

Proof. Firstly, since ∧Z commutes with Z-rotations, steps 1 and 2 do not change the underlying graph state; only the phase of each qubit is locally changed, and it
is as if Bob had done the Z-rotation after the ∧Z. Secondly, since a measurement in the {|+_φ⟩, |−_φ⟩} basis on a state |ψ⟩ is the same as a measurement in the {|+_{φ+θ}⟩, |−_{φ+θ}⟩} basis on Z(θ)|ψ⟩, and since δ = φ′ + θ + πr, if r = 0 Bob's measurement has the same effect as Alice's target measurement; if r = 1, all Alice needs to do is flip the outcome. We now define and prove the security of the protocol. Intuitively, we wish to prove that whatever Bob chooses to do (including arbitrary deviations from the protocol), his knowledge of Alice's quantum computation does not increase. Note, however, that Bob does learn the dimensions of the brickwork state, giving an upper bound on the size of Alice's computation. This is unavoidable: a simple adaptation of the proof of Theorem 2 from [AFK89] confirms this. We incorporate this notion of leakage in our definition of blindness. A quantum delegated computation protocol is a protocol by which Alice interacts quantum mechanically with Bob in order to obtain the result of a computation, U(x), where X = (Ũ, x) is Alice's input, with Ũ being a description of U.

Definition 20. Let P be a quantum delegated computation on input X and let L(X) be any function of the input. We say that a quantum delegated computation protocol is blind while leaking at most L(X) if, on Alice's input X, for any fixed Y = L(X), the following two hold when given Y:
1. The distribution of the classical information obtained by Bob in P is independent of X.
2. Given the distribution of classical information described in 1, the state of the quantum system obtained by Bob in P is fixed and independent of X.
Definition 20 captures the intuitive notion that Bob's view of the protocol should not depend on X (when given Y); since his view consists of classical and quantum information, this means that the distribution of the classical information should not depend on X (given Y) and that for any fixed choice of the classical information, the state of the quantum system should be uniquely determined and not depend on X (given Y). We are now ready to state and prove our main theorem. Recall that in Protocol 1, (n, m) is the dimension of the brickwork state.

Theorem 9 (Blindness). Protocol 1 is blind while leaking at most (n, m).

Proof. Let (n, m) (the dimension of the brickwork state) be given. Note that the universality of the brickwork state guarantees that Bob's creation of the graph state does not reveal anything about the underlying computation (except n and m). Alice's input consists of the angles φ = (φ_{x,y} | x ∈ [n], y ∈ [m]), with the actual measurement angles φ′ = (φ′_{x,y} | x ∈ [n], y ∈ [m])
A. Broadbent, J. Fitzsimons, and E. Kashefi
being a modification of φ that depends on previous measurement outcomes. Let the classical information that Bob gets during the protocol be δ = (δ_{x,y} | x ∈ [n], y ∈ [m]) and let A be the quantum system initially sent from Alice to Bob. To show independence of Bob's classical information, let θ′_{x,y} = θ_{x,y} + π r_{x,y} (for a uniformly random chosen θ_{x,y}) and θ′ = (θ′_{x,y} | x ∈ [n], y ∈ [m]). We have δ = φ′ + θ′, with θ′ being uniformly random (and independent of φ and φ′), which implies the independence of δ and φ. As for Bob's quantum information, first fix an arbitrary choice of δ. Because r_{x,y} is uniformly random, for each qubit of A, one of the following two has occurred:

1. r_{x,y} = 0, so δ_{x,y} = φ′_{x,y} + θ_{x,y} and |ψ_{x,y}⟩ = (1/√2)(|0⟩ + e^{i(δ_{x,y} − φ′_{x,y})}|1⟩);
2. r_{x,y} = 1, so δ_{x,y} = φ′_{x,y} + θ_{x,y} + π and |ψ_{x,y}⟩ = (1/√2)(|0⟩ − e^{i(δ_{x,y} − φ′_{x,y})}|1⟩).
Since δ is fixed, θ depends on φ′ (and thus on φ), but since r_{x,y} is independent of everything else, without knowledge of r_{x,y} (i.e. taking the partial trace of the system over Alice's secret), A consists of copies of the two-dimensional completely mixed state, which is fixed and independent of φ.

There are two malicious scenarios that are covered by Definition 20 and that we explicitly mention here. Suppose Bob has some prior knowledge, given as some a priori distribution on Alice's input X. Since Definition 20 applies to any distribution of X, we can simply apply it to the conditional distribution representing the distribution of X given Bob's a priori knowledge; we conclude that Bob does not learn any information on X beyond what he already knows, as well as what is leaked. The second scenario concerns a Bob whose goal is to find Alice's output. Definition 20 forbids this: learning information on the output would imply learning information on Alice's input.

Note that the protocol does not allow Alice to reveal to Bob whether or not she accepts the result of the computation, as this bit of information could be exploited by Bob to learn some information about the actual computation. In this scenario, Protocol 4 can be used instead.
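The core of the blindness argument, that without knowledge of r_{x,y} Bob holds the completely mixed state, can be checked directly: averaging the two equiprobable states from cases 1 and 2 above yields I/2 for any fixed value of δ_{x,y} − φ′_{x,y}. A small sketch (the angle value is an arbitrary illustration):

```python
import numpy as np

def ket(sign, alpha):
    # (|0> + sign * e^{i alpha}|1>)/sqrt(2)
    return np.array([1.0, sign * np.exp(1j * alpha)]) / np.sqrt(2)

alpha = 0.7  # delta - phi', fixed and known to Bob (hypothetical value)
rho = np.zeros((2, 2), dtype=complex)
for sign in (+1, -1):  # the two equiprobable cases r = 0 and r = 1
    v = ket(sign, alpha)
    rho += 0.5 * np.outer(v, v.conj())

# Tracing out Alice's secret bit r leaves the completely mixed state,
# independent of alpha (and hence of phi):
assert np.allclose(rho, np.eye(2) / 2)
```

The two branch states form an orthonormal basis, so their equal-weight mixture is I/2 regardless of the phase α.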
11
Quantum Inputs and Outputs
We can slightly modify Protocol 1 to deal with both quantum inputs and outputs. In the former case, no extra channel resources are required, while the latter case requires a quantum channel from Bob to Alice in order for him to return the output qubits. Alice will also need to be able to apply X and Z Pauli operators in order to undo the quantum one-time pad. Note that these protocols can be combined to obtain a protocol for quantum inputs and outputs. Consider the scenario where Alice's input is in the form of m physical qubits and she has no efficient classical description of the inputs to be able to incorporate
it into Protocol 1. In this case, she needs to be able to apply local Pauli-X and Pauli-Z operators to implement a full one-time pad over the input qubits. The first layer of measurements is adapted to undo the Pauli-X operation if necessary. By the quantum one-time pad, Theorem 8 and Theorem 9, this modified protocol, given in Protocol 2, is still correct and private. Here we assume that Alice already has the quantum inputs in her hands: unless she receives the inputs one by one, she requires some quantum memory for this initial step. She also needs to be able to apply the single-qubit gates as described above. Note that this is only asking slightly more than Alice choosing between four single-qubit gates, which would be the minimum required in any blind quantum computation protocol with quantum inputs.

Protocol 2. Universal Blind Quantum Computation with Quantum Inputs
1. Alice's input preparation
   For the input column (x = 0; y = 1, . . . , m) corresponding to Alice's input
   1.1 Alice applies Z_{0,y}(θ_{0,y}) for θ_{0,y} ∈_R {0, π/4, 2π/4, . . . , 7π/4}.
   1.2 Alice chooses i_{0,y} ∈_R {0, 1} and applies X_{0,y}^{i_{0,y}}. She sends the qubits to Bob.
2. Alice's auxiliary preparation
   For each column x = 1, . . . , n
   For each row y = 1, . . . , m
   2.1 Alice prepares |ψ_{x,y}⟩ ∈_R {|+_{θ_{x,y}}⟩ | θ_{x,y} = 0, π/4, 2π/4, . . . , 7π/4} and sends the qubits to Bob.
3. Bob's preparation
   3.1 Bob creates an entangled state from all received qubits, according to their indices, by applying ∧Z gates between the qubits in order to create a brickwork state G_{(n+1)×m}.
4. Interaction and measurement
   For each column x = 0, . . . , n
   For each row y = 1, . . . , m
   4.1 Alice computes φ′_{x,y}, with the special case φ′_{0,y} = (−1)^{i_{0,y}} φ_{0,y}.
   4.2 Alice chooses r_{x,y} ∈_R {0, 1} and computes δ_{x,y} = φ′_{x,y} + θ_{x,y} + π r_{x,y}.
   4.3 Alice transmits δ_{x,y} to Bob.
   4.4 Bob measures in the basis {|+_{δ_{x,y}}⟩, |−_{δ_{x,y}}⟩}.
   4.5 Bob transmits the result s_{x,y} ∈ {0, 1} to Alice.
   4.6 If r_{x,y} = 1 above, Alice flips s_{x,y}; otherwise she does nothing.
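Alice's side of the interaction phase (steps 4.1 to 4.6, shared by Protocols 1 and 2) is purely classical per-qubit bookkeeping. A minimal sketch, with names and structure of our own invention rather than from the protocol text:

```python
import math
import random

ANGLES = [k * math.pi / 4 for k in range(8)]  # {0, pi/4, 2pi/4, ..., 7pi/4}

class AliceQubit:
    """Alice's classical bookkeeping for one qubit of the brickwork state
    (a hypothetical helper; names are ours, not from the protocol)."""

    def __init__(self):
        self.theta = random.choice(ANGLES)  # hidden rotation of the prepared qubit
        self.r = random.randrange(2)        # hidden one-time pad of the outcome

    def delta(self, phi_prime):
        # Step 4.2: the only angle Bob ever sees,
        # delta = phi' + theta + pi*r (mod 2*pi)
        return (phi_prime + self.theta + math.pi * self.r) % (2 * math.pi)

    def outcome(self, s):
        # Step 4.6: flip Bob's reported result s if r = 1
        return s ^ self.r
```

Because theta and r never leave Alice's side, delta is uniformly distributed over the eight angles whatever phi_prime is, which is exactly the independence used in the proof of Theorem 9.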
Suppose Alice now requires a quantum output, for example in the case of blind quantum state preparation. In this scenario, instead of measuring the last layer of qubits, Bob returns them to Alice, who performs the final layer of Pauli corrections. The following theorem shows a privacy property on the quantum states that Bob manipulates.

Theorem 10. At every step of Protocol 1, Bob's quantum state is one-time padded.

Proof. During the execution of the protocol, the values of s^X and s^Z are unknown to Bob since they have been one-time padded using the random key r at each
layer. Due to the flow construction [DK06], each qubit (starting at the third column) receives independent Pauli operators, which act as the full quantum one-time pad over Bob's state. Since our initial state is |+⟩, and since the first layer performs a hidden Z-rotation, it follows that the qubits in the second layer are also completely encrypted during the computation. This result, together with Theorems 8 and 9, proves the correctness and privacy of Protocol 3, which deals with quantum outputs.
Protocol 3. Universal Blind Quantum Computation with Quantum Outputs
1. Alice's auxiliary preparation
   For each column x = 1, . . . , n − 1
   For each row y = 1, . . . , m
   1.1 Alice prepares |ψ_{x,y}⟩ ∈_R {|+_{θ_{x,y}}⟩ | θ_{x,y} = 0, π/4, 2π/4, . . . , 7π/4} and sends the qubits to Bob.
2. Alice's output preparation
   2.1 Alice prepares the last column of qubits |ψ_{n,y}⟩ = |+⟩ (y = 1, . . . , m) and sends the qubits to Bob.
3. Bob's preparation
   3.1 Bob creates an entangled state from all received qubits, according to their indices, by applying ∧Z gates between the qubits in order to create a brickwork state G_{n×m}.
4. Interaction and measurement
   For each column x = 1, . . . , n − 1
   For each row y = 1, . . . , m
   4.1 Alice computes φ′_{x,y}, where s^X_{0,y} = s^Z_{0,y} = 0 for the first column.
   4.2 Alice chooses r_{x,y} ∈_R {0, 1} and computes δ_{x,y} = φ′_{x,y} + θ_{x,y} + π r_{x,y}.
   4.3 Alice transmits δ_{x,y} to Bob.
   4.4 Bob measures in the basis {|+_{δ_{x,y}}⟩, |−_{δ_{x,y}}⟩}.
   4.5 Bob transmits the result s_{x,y} ∈ {0, 1} to Alice.
   4.6 If r_{x,y} = 1 above, Alice flips s_{x,y}; otherwise she does nothing.
5. Output correction
   5.1 Bob sends to Alice all qubits in the last layer.
   5.2 Alice performs the final Pauli corrections Z^{s^Z_{n,y}} X^{s^X_{n,y}}.
12
Authentication and Fault-Tolerance
We now focus on Alice’s ability to detect if Bob is not cooperating. There are two possible ways in which Bob can be uncooperative: he can refuse to perform the computation (this is immediately apparent to Alice), or he can actively interfere with the computation, while pretending to follow the protocol. It is this latter case that we focus on detecting. The authentication technique enables Alice to detect an interfering Bob with overwhelming probability (strictly speaking, either Bob’s interference is corrected and he is not detected, or his interference is detected with overwhelming probability). Note that this is the best that we can
hope for, since nothing prevents Bob from refusing to perform the computation. Bob could also be lucky and guess a path that Alice will accept. This happens with exponentially small probability, hence our technique is optimal.

In the case that Alice's computation has a classical output and that she does not require fault-tolerance, a simple protocol for blind quantum computing with authentication exists: execute Protocol 1 on a modification of Alice's target circuit: she adds N randomly placed trap wires that are randomly in state |0⟩ or |1⟩ (N is the number of qubits in the computation). If Bob interferes, either his interference has no effect on the classical output, or he will get caught with probability at least 1/2 (he gets caught if Alice finds that the output of at least one trap wire is incorrect). The protocol is repeated s times (the traps are randomly re-positioned each time); if Bob is not caught cheating, Alice accepts if all outputs are identical; otherwise she rejects. The probability of an incorrect output being accepted is at most 2^{−s}.

Protocol 4 is more general than this scheme since it works for quantum outputs and is fault-tolerant. If the above scheme is used for quantum inputs, they must be given to Alice as multiple copies. Similarly (but more realistically), if Protocol 4 is to be used on quantum inputs, these must already be given to Alice in an encoded form as in step 2 of Protocol 4 (because Alice has no quantum computational power). In the case of a quantum output, it will be given to Alice in a known encoded form, which she can pass on to a third party for verification. The theory of quantum error correction provides a natural mechanism for detecting unintended changes to a computation, whereas the theory of fault-tolerant computation provides a way to process information even using error-prone gates.
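The 2^{−s} bound for the simple repeated trap-wire scheme can be illustrated with a toy Monte Carlo estimate (the function and its parameters are our own illustration, not from the text): a Bob who interferes in every repetition survives each round with probability at most 1/2.

```python
import random

def undetected_interference_rate(s, trials=20000, p_catch=0.5):
    """Monte Carlo estimate (illustrative): fraction of runs in which Bob
    tampers in every one of the s repetitions yet never trips a trap,
    with per-round catch probability p_catch = 1/2."""
    survived = sum(
        all(random.random() > p_catch for _ in range(s))
        for _ in range(trials)
    )
    return survived / trials
```

For s = 3, the estimate concentrates around (1/2)^3 = 0.125, and it halves with each additional repetition.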
Unfortunately, error correction, even when combined with fault-tolerant gate constructions, is insufficient to detect malicious tampering if the error correction code is known. As evidenced by the quantum authentication protocol [BCG+02], error correction encodings can, however, be adapted for this purpose. Authenticated UBQC proceeds along the following lines. Alice chooses an n_C-qubit error correction code C with distance d_C. (The values of n_C and d_C are taken as security parameters.) If the original computation involves N logical qubits, the authenticated version involves N(n_C + 3n_T) qubits (with n_T = n_C): throughout the computation, each logical qubit is encoded with C, while the remaining 3N n_T qubits are used as traps to detect an interfering Bob. The trap qubits are prepared as a first step of the computation in eigenstates of the Pauli operators X, Y and Z, with an equal number of qubits in each state.

The protocol also involves fault-tolerant gates, for some of which it is necessary to have Bob periodically measure qubits [ZCC07]. In order to accomplish this, the blind computation protocol is extended by allowing Alice to instruct Bob to measure specific qubits within the brickwork state in the computational basis at regular intervals. These qubits are chosen at regular spatial intervals so that no information about the structure of the computation is revealed. It should be noted that in Protocol 4, we allow Alice to reveal to Bob whether or not she accepts the final result.
Protocol 4. Blind Quantum Computing with Authentication (classical input and output)
1. Alice chooses C, where C is some n_C-qubit error-correcting code with distance d_C. The security parameter is d_C.
2. In the circuit model, starting from the circuit for U, Alice converts the target circuit to a fault-tolerant circuit:
   2.1 Use the error-correcting code C. The encoding appears in the initial layers of the circuit.
   2.2 Perform all gates and measurements fault-tolerantly.
   2.3 Some computational basis measurements are required for the fault-tolerant implementation (for verification of ancillae and non-transversal gates). Each measurement is accomplished by making and measuring a pseudo-copy of the target qubit: a ∧X is performed from the target to an ancilla qubit initially set to |0⟩, which is then measured in the Z-basis.
   2.4 Ancilla qubit wires are evenly spaced through the circuit.
   2.5 The ancillae are re-used. All ancillae are measured at the same time, at regular intervals, after each fault-tolerant gate (some outputs may be meaningless).
3. Within each encoded qubit, permute all wires, keeping these permutations secret from Bob.
4. Within each encoded qubit, add 3n_T randomly interspersed trap wires, each trap being a random eigenstate of X, Y or Z (n_T of each). For security, we must have n_T ∝ n_C; for convenience, we choose n_T = n_C. The trap qubit wire (at this point) does not interact with the rest of the circuit. The wire is initially |0⟩, and then single-qubit gates are used to create the trap state. These single-qubit gates appear in the initial layers of the circuit.
5. Trap qubits are verified using the same ancillae as above: they are rotated into the computational basis, measured using the pseudo-copy technique above, and then returned to their initial basis.
6. Any fault-tolerant measurement is randomly interspersed with verification of 3n_T random trap wires. For this, identity gates are added as required.
7. For classical output, the trap wires are rotated as a last step, so that the following measurement in the computational basis is used for a final verification.
8. Convert the whole circuit above to a measurement-based computation on the brickwork state, with the addition of regular Z-basis measurements corresponding to the measurements on ancilla qubits above. Swap and identity gates are added as required, and trap qubits are left untouched.
9. Perform the blind quantum computation:
   9.1 Execute Protocol 1, to which we add that Alice periodically instructs Bob to measure in the Z-basis as indicated above.
   9.2 Alice uses the results of the trap qubit measurements to estimate the error rate; if it is below the threshold (see discussion in the main text), she accepts; otherwise she rejects.
UBQC can also be used in the scenario of non-malicious faults: because it already uses a fault-tolerant construction, the measurement of trap qubits in Protocol 4 allows for the estimation of the error rate (whether caused by the environment or by an adversary); if this error rate is below a certain threshold
(this threshold is chosen below the fault-tolerance threshold to take into account sampling errors), Alice accepts the computation. As long as this is below the fault-tolerance threshold, an adversary would still have to guess which qubits are part of the code and which are traps, so Theorem 13 also holds in the fault-tolerant version. The only difference is that the adversary can set off a few traps without being detected, but he must still be able to correctly guess which qubits are in the encoded qubit and which are traps. Increasing the security parameters will make up for the fact that Bob can set off a few traps without making the protocol abort. This yields a linear trade-off between the error rate and the security parameter.

Note that the brickwork state (Figure 3) can be extended to multiple dimensions, which may be useful for obtaining better fault-tolerance thresholds [Got00]. While the quantum Singleton bound [KL00] allows error correction codes for which d_C ∝ n_C, it may be more convenient to use the Toric Code [Kit97], for which d_C ∝ √n_C, as this represents a rather simple encoding while retaining a high ratio of d_C to n_C. For the special case of deterministic classical output, a classical repetition code is sufficient and preferable, as such an encoding maximises the ratio of d_C to n_C.

Theorem 11 (Fault Tolerance). Protocol 4 is fault-tolerant.

Proof. By construction, the circuit created in step 2.1 is fault-tolerant. Furthermore, the permutation of the circuit wires and insertion of trap qubits (steps 3 and 4) preserve the fault tolerance. This is due to the fact that qubits are permuted only within blocks of constant size. The fault-tolerant circuit given in step 2.1 can be written as a sequence of local gates and ∧X gates between neighbours. Clearly permutation does not affect the fidelity of local operations.
As qubits which are neighbours in the initial fault-tolerant circuit become separated by less than twice the number of qubits in a single block, the maximum number of nearest-neighbour ∧X gates required to implement a ∧X from the original circuit is in O(n_C + 3n_T) (the size of a block). (If required, the multi-dimensional analogue of the two-dimensional brickwork state can be used in order to substantially reduce this distance.) As this upper bound is constant for a given implementation, a lower bound for the fault-tolerance threshold can be obtained simply by scaling the threshold such that the error rate for this worst-case ∧X is never more than the threshold for the original circuit. Thus, while the threshold is reduced, it remains non-zero. Step 8 converts the fault-tolerant circuit to a measurement pattern; it is known that this transformation retains the fault-tolerance property [ND05, AL06]. Finally, in step 9, distributing the fault-tolerant measurement pattern between Alice and Bob does not disturb the fault tolerance, since the communication between them is only classical.

Theorem 12 (Blindness). Protocol 4 is blind while leaking at most (n, m).

Proof. Protocol 4 differs from Protocol 1 in the following two ways: Alice instructs Bob to perform regular Z-basis measurements, and she reveals whether she accepts or rejects the computation. It is known that Z measurements
change the underlying graph state into a new graph state [HEB04]. The Z measurements in the protocol are inserted at regular intervals, and their number is also independent of the underlying circuit computation. Therefore their action transforms the generic brickwork state into another generic resource that is still independent of Alice's input, and the blindness property is obtained via the same proof as Theorem 9. Finally, from Alice's decision to accept or reject, only information relating to the trap qubits is revealed to Bob, since Alice rejects if and only if the estimated error rate is too high. The trap qubits are uncorrelated with the underlying computation (in the circuit picture, they do not interact with the rest of the circuit) and hence they reveal no information about Alice's input.

In the following theorem, for simplicity, we consider the scenario with zero error rate; a proof for the full fault-tolerant version is similar.

Theorem 13 (Authentication). For the zero-error case of Protocol 4, if Bob interferes with an authenticated computation, then either he is detected except with exponentially small probability (in the security parameter), or his actions fail to alter the computation.

Proof. If Bob interferes with the computation, then in order for his actions to affect the outcome of the computation without being detected, he must perform a non-trivial operation (i.e. an operation other than the identity) on the subspace in which the logical qubits are encoded. Due to the fault-tolerant construction of Alice's computation (Theorem 11), Bob's operation must have weight at least d_C. Due to the discretisation of errors, we can treat Bob's action as introducing a Pauli error with some probability p. If a Pauli error acts non-trivially on a trap qubit, then the probability of this going undetected is 1/3. Pauli operators which remain within the code space must act on at least d_C qubits.
As Bob has no knowledge about the roles of qubits (Theorem 12), the probability of him acting on any given qubit is the same. As the probability of acting on a trap is at least 3n_T/(n_C + 3n_T), for each qubit upon which he acts non-trivially, the probability of Bob being detected is more than 2n_T/(n_C + 3n_T). Thus the probability of an M-qubit Pauli operator going undetected is below (1 − 2n_T/(n_C + 3n_T))^M. Since n_T = n_C and M ≥ d_C, the probability of Bob affecting the computation while going undetected is at most 2^{−d_C}.
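With the choice n_T = n_C, the per-qubit undetected probability works out to exactly 1/2, which gives the 2^{−d_C} bound; a few lines of arithmetic (with illustrative values for n_C and d_C of our own choosing) confirm this:

```python
# Illustrative security parameters (not prescribed by the protocol):
n_C = 7          # qubits per encoded logical qubit
n_T = n_C        # traps per block, n_T = n_C as chosen in Protocol 4
d_C = 3          # code distance: Bob must act on at least d_C qubits

# Per-qubit probability that a non-trivial Pauli action goes undetected:
p_miss = 1 - 2 * n_T / (n_C + 3 * n_T)
assert p_miss == 0.5  # exactly 1/2 whenever n_T = n_C

# Bound on a weight-d_C attack going undetected:
assert p_miss ** d_C == 2 ** -d_C
```

Raising d_C (and n_C with it) therefore drives the undetected-tampering probability down exponentially, which is the sense in which d_C is the security parameter.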
13
Entangled Servers
We will close with a discussion of UBQC in the context of multi-prover interactive proofs. As stated before, one can view UBQC as an interactive proof system where Alice acts as the verifier and Bob as the prover. An important open problem is to find an interactive proof for any problem in BQP with a BQP prover, but with a purely classical verifier. Protocol 4 makes progress towards finding a solution by providing an interactive proof for any language in BQP, with a quantum prover and a BPP verifier that also has the power to generate random
qubits chosen from a fixed set and send them to the prover. This perspective was first proposed by Aharonov, Ben-Or and Eban [ABE10]; however, their scheme demands a more powerful verifier. Protocol 5 is a solution to another closely related problem, namely the case of a purely classical verifier interacting with two non-communicating entangled provers. The idea is to adapt Protocol 1 so that one prover (that we now call a server) is used to prepare the random qubits that would have been generated by Alice in the original protocol, while the other server is used for universal blind quantum computation. Using the authenticated protocol (Protocol 4) between Alice and the second server, Alice will detect any cheating server: any cheating by Server 1 is equivalent to a deviation from the protocol by Server 2, which is detected in step 2 of the protocol (the proof is directly obtained from Theorem 13). On the other hand, since Server 2 has access to only half of each entangled state, from his point of view his sub-system remains in a completely mixed state independently of Server 1's actions, and the blindness of the protocol is obtained directly from Theorem 12. This protocol acts as an interactive proof system for BQP, as an authenticated blind computation guarantees the correctness of the result except with exponentially small probability. Thus UBQC provides a means for a completely classical party to interactively verify the correctness of any quantum computation. While the UBQC protocol was the first to demonstrate new functionality via measurement-based computation, we expect that this novel model will provide fertile ground for future research.
Protocol 5. Universal Blind Quantum Computation with Entangled Servers
Initially, Servers 1 and 2 share |Φ+_{x,y}⟩ = (1/√2)(|00⟩ + |11⟩) (x = 1, . . . , n; y = 1, . . . , m).
1. Alice's preparation with Server 1
   For each column x = 1, . . . , n
   For each row y = 1, . . . , m
   1.1 Alice chooses θ̃_{x,y} ∈_R {0, π/4, 2π/4, . . . , 7π/4} and sends it to Server 1, who measures his part of |Φ+_{x,y}⟩ in the basis {|+_{θ̃_{x,y}}⟩, |−_{θ̃_{x,y}}⟩}.
   1.2 Server 1 sends m_{x,y}, the outcome of his measurement, to Alice.
2. Alice's computation with Server 2
   2.1 Alice runs the authenticated blind quantum computing protocol (Protocol 4) with Server 2, taking θ_{x,y} = θ̃_{x,y} + m_{x,y}π.
Acknowledgments We would like to thank our collaborators and co-authors in the series of the papers that this chapter is based on: Vincent Danos and Prakash Panangaden.
References
[AB09] Anders, J., Browne, D.E.: Computational power of correlations. Physical Review Letters 102, 050502 (2009)
[ABE10] Aharonov, D., Ben-Or, M., Eban, E.: Interactive proofs for quantum computations. In: Proceedings of Innovations in Computer Science (ICS 2010), pp. 453–469 (2010)
[AFK89] Abadi, M., Feigenbaum, J., Kilian, J.: On hiding information from an oracle. Journal of Computer and System Sciences 39, 21–50 (1989)
[AJL06] Aharonov, D., Jones, V., Landau, Z.: A polynomial quantum algorithm for approximating the Jones polynomial. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC 2006), pp. 427–436 (2006)
[AL06] Aliferis, P., Leung, D.W.: Simple proof of fault tolerance in the graph-state model. Physical Review A 73 (2006)
[AMTW00] Ambainis, A., Mosca, M., Tapp, A., de Wolf, R.: Private quantum channels. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS 2000), pp. 547–553 (2000)
[AS06] Arrighi, P., Salvail, L.: Blind quantum computation. International Journal of Quantum Information 4, 883–898 (2006)
[Bar84] Barendregt, H.P.: The Lambda Calculus, Its Syntax and Semantics. Studies in Logic. North-Holland, Amsterdam (1984)
[BB84] Brassard, G., Bennett, C.H.: Public key distribution and coin tossing. In: Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing (1984)
[BB06] Browne, D.E., Briegel, H.J.: One-way quantum computation. In: Lectures on Quantum Information, pp. 359–380. Wiley-VCH, Berlin (2006)
[BCG+02] Barnum, H., Crépeau, C., Gottesman, D., Smith, A., Tapp, A.: Authentication of quantum messages. In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002), p. 449 (2002)
[BFK09] Broadbent, A., Fitzsimons, J., Kashefi, E.: Universal blind quantum computation. In: Proceedings of the 50th Annual Symposium on Foundations of Computer Science (FOCS 2009), pp. 517–527 (2009)
[BK09] Broadbent, A., Kashefi, E.: Parallelizing quantum circuits. Theoretical Computer Science, 2489–2510 (2009)
[BKMP07] Browne, D., Kashefi, E., Mhalla, M., Perdrix, S.: Generalized flow and determinism in measurement-based quantum computation. New Journal of Physics 9 (2007)
[BR03] Boykin, P.O., Roychowdhury, V.: Optimal encryption of quantum bits. Physical Review A 67, 042317 (2003)
[BV97] Bernstein, E., Vazirani, U.: Quantum complexity theory. SIAM Journal on Computing 26(5) (1997)
[Chi05] Childs, A.M.: Secure assisted quantum computation. Quantum Information and Computation 5, 456–466 (2005); initial version appeared online in 2001
[Cho75] Choi, M.D.: Completely positive linear maps on complex matrices. Linear Algebra and its Applications 10 (1975)
[CLN05a] Childs, A.M., Leung, D.W., Nielsen, M.A.: Unified derivations of measurement-based schemes for quantum computation. Physical Review A 71 (2005), quant-ph/0404132
[CLN05b] Childs, A.M., Leung, D.W., Nielsen, M.A.: Unified derivations of measurement-based schemes for quantum computation. Physical Review A 71, 032318 (2005)
[DAB03] Dür, W., Aschauer, H., Briegel, H.J.: Multiparticle entanglement purification for graph states. Physical Review Letters 91 (2003), quant-ph/0303087
[Deu85] Deutsch, D.: Quantum theory, the Church-Turing principle and the universal quantum computer. Proceedings of the Royal Society of London A 400 (1985)
[Deu89] Deutsch, D.: Quantum computational networks. Proceedings of the Royal Society of London A 425 (1989)
[DK06] Danos, V., Kashefi, E.: Determinism in the one-way model. Physical Review A 74, 052310 (2006)
[DKP05] Danos, V., Kashefi, E., Panangaden, P.: Parsimonious and robust realizations of unitary maps in the one-way model. Physical Review A 72 (2005)
[DKP07] Danos, V., Kashefi, E., Panangaden, P.: The measurement calculus. Journal of the ACM 54, 8 (2007)
[DS96] Dürr, C., Santha, M.: A decision procedure for unitary linear quantum cellular automata. In: Proceedings of FOCS 1996 – Symposium on Foundations of Computer Science. Springer, Heidelberg (1996), quant-ph/9604007
[Fei86] Feigenbaum, J.: Encrypting problem instances: Or... can you take advantage of someone without having to trust him? In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, pp. 477–488. Springer, Heidelberg (1986)
[GC99] Gottesman, D., Chuang, I.L.: Quantum teleportation is a universal computational primitive. Nature 402 (1999)
[GKR08] Goldwasser, S., Kalai, Y.T., Rothblum, G.N.: Delegating computation: interactive proofs for muggles. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC 2008), pp. 113–122 (2008)
[GOD+06] Greentree, A.D., Olivero, P., Draganski, M., Trajkov, E., Rabeau, J.R., Reichart, P., Gibson, B.C., Rubanov, S., Huntington, S.T., Jamieson, D.N., Prawer, S.: Critical components for diamond-based quantum coherent devices. Journal of Physics: Condensed Matter 18, 825–842 (2006)
[Got97] Gottesman, D.: Stabilizer codes and quantum error correction. PhD thesis, California Institute of Technology (1997)
[Got00] Gottesman, D.: Fault-tolerant quantum computation with local gates. Journal of Modern Optics 47, 333–345 (2000)
[HEB04] Hein, M., Eisert, J., Briegel, H.J.: Multi-party entanglement in graph states. Physical Review A 69 (2004)
[Kit97] Kitaev, A.Y.: Quantum computations: algorithms and error correction. Russian Mathematical Surveys 52, 1191–1249 (1997)
[KL00] Knill, E., Laflamme, R.: A theory of quantum error-correcting codes. Physical Review Letters 84, 2525 (2000)
[MS08] Markham, D., Sanders, B.C.: Graph states for quantum secret sharing. Physical Review A 78, 042309 (2008)
[NC00] Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
[ND05] Nielsen, M.A., Dawson, C.M.: Fault-tolerant quantum computation with cluster states. Physical Review A 71 (2005)
[Nie03] Nielsen, M.A.: Universal quantum computation using only projective measurement, quantum memory and preparation of the |0⟩ state. Physics Letters A 308 (2003)
[Per95] Peres, A.: Quantum Theory: Concepts and Methods. Kluwer Academic Publishers, Dordrecht (1995)
[Pre98] Preskill, J.: Fault-tolerant quantum computation. In: Lo, H.K., Popescu, S., Spiller, T.P. (eds.) Introduction to Quantum Computation and Information. World Scientific, Singapore (1998)
[RB01] Raussendorf, R., Briegel, H.J.: A one-way quantum computer. Physical Review Letters 86 (2001)
[RBB03] Raussendorf, R., Browne, D.E., Briegel, H.J.: Measurement-based quantum computation on cluster states. Physical Review A 68 (2003)
[RHG06] Raussendorf, R., Harrington, J., Goyal, K.: A fault-tolerant one-way quantum computer. Annals of Physics 321, 2242–2270 (2006)
[RSA78] Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21, 120–126 (1978)
[Sel04] Selinger, P.: Towards a quantum programming language. Mathematical Structures in Computer Science 14(4) (2004)
[Sel05] Selinger, P. (ed.): Proceedings of the 3rd International Workshop on Quantum Programming Languages. ENTCS (2005)
[Sho97] Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing 26, 1484–1509 (1997); first published in 1995
[SW04] Schumacher, B., Werner, R.F.: Reversible quantum cellular automata (2004), quant-ph/0405174
[Unr05] Unruh, D.: Quantum programs with classical output streams. In: Selinger [Sel05] (2005)
[vD96] van Dam, W.: Quantum cellular automata. Master's thesis, Computer Science, Nijmegen (1996)
[Wat95] Watrous, J.: On one-dimensional quantum cellular automata. In: Proceedings of FOCS 1995 – Symposium on Foundations of Computer Science. Springer, Heidelberg (1995)
[ZCC07] Zeng, B., Cross, A., Chuang, I.L.: Transversality versus universality for additive quantum codes (2007), arXiv:0706.1382 (quant-ph)
Information Theory and Security: Quantitative Information Flow

Pasquale Malacaria and Jonathan Heusser

School of Electronic Engineering and Computer Science, Queen Mary University of London
Abstract. We present the information theoretical basis of Quantitative Information Flow. We show the relationship between lattices, partitions and information theoretical concepts and their applicability to quantify leakage of confidential information in programs, including looping programs. We also report on recent works that use these ideas to build tools for the automatic quantitative analysis of programs. The applicability of this information theoretical framework to the wider context of network protocols and the use of Lagrange multipliers in this setting is also demonstrated.
1 Introduction
Computational systems (a general term) have two basic properties: first, they process information; second, they allow observations of this processing to be made. For example, a program will typically process its inputs and allow its output to be observed, for example on a screen. In a distributed system each unit processes information and allows some observations to be made by the environment or by other units, for example by message passing. In an election, voters cast their votes, officials count the votes and the public can observe the election result. The broad goal of our research is to develop theories and techniques to quantify the information leaked by components of a computational system through these observations. The information we are interested in quantifying is that coming from some designated sources; for example, in a system processing secret data we may want to quantify how much is leaked by the observations available to the general public. In an election we may want to quantify how much about the choice of individual voters is leaked by the result; consider for example the extreme case where a candidate gets all votes: then the whole secret is revealed, a total loss of anonymity. Several terms can be used for essentially the same concept: quantification of leakage, quantitative information flow, quantification of interference, quantitative analysis of dependencies. The basic idea for this quantitative analysis has been proposed in various forms by a number of researchers. Given components A and B we can say that

Interference from A to B can be measured by the number of states of A we can distinguish given observations over B

A. Aldini et al. (Eds.): SFM 2010, LNCS 6154, pp. 87–134, 2010.
© Springer-Verlag Berlin Heidelberg 2010
Although this is a somewhat crude model, it does capture the essence of quantitative information flow. The model is crude because, while the number of distinguishable states is the essential quantity, their probabilities are also to be taken into account: the measure can be refined by information-theoretical notions like entropy. To see why probabilities are essential, consider a simple PIN-checking program of a cash machine

if (pin==guess) x=access else x=deny

The leakage of information about the secret pin by observing the output value of x greatly depends on the probability of the pin having a particular value. If the possible values of a 4-digit pin are equally likely then the leakage will be very low, whereas if the pin is very likely to take the same value as guess then the leakage will be unacceptable. Hence probabilities are an important component of a quantitative information flow analysis.
1.1 Structure of This Work
We will start by relating observations over a system with sets of distinguishable states and partitions. It will be shown how random variables in our context can be seen as partitions, and that partitions over a set of states form a complete lattice. The concept of a measure on lattice points is then discussed and it will be argued that Shannon's entropy provides the best measure. This measure induces a pseudometric on the lattice points, and the derived equivalence classes (points of distance 0) can be seen, following Shannon, as the information tokens of the space. The second part of the work is devoted to applying this lattice-information theoretical framework to quantify leakage of sequential programs. Leakage of programs is defined in two steps: first interpret programs as random variables, that is, partitions in the lattice of information, and then measure this partition using information theory. Splitting the definition in two steps is useful in that it allows us to use a unique framework for several measures of leakage, like Shannon's and those based on guessability. We will then see in detail how to quantify leakage of loops, using both an analytic approach and the partition approach. These approaches will be shown to be equivalent. In the third part we will describe recent work towards the automation of these ideas. These ideas, their relation to current verification and abstract interpretation tools and techniques, and the challenges in the implementation are discussed. We conclude with a short review of more advanced techniques, like Lagrange multipliers, in the general setting of probabilistic systems.
2 Basics

2.1 Observations and the Lattice of Information
Information Theory aims to measure the amount of information of random variables or of some sort of processes (mainly stochastic processes): what the information is about is not a concern of the theory; the measure is based on the number of distinctions available in an information context. As an example, consider the information-wise very different processes "flipping a coin" and "presidential election between two candidates". While the first is a rather inconsequential process and the second may have important consequences, they are both contexts allowing for two choices, hence they both have an information measure of (at most) 1 bit. In a context where n choices are possible (a process with n outcomes) the information associated is measured in terms of the number of bits needed to encode those possible choices, so it is at most log2(n). We can see an observation over a context (or a process) as partial information, that is, an observation reveals some information about a context. As an analogy, a witness observing a crime may notice that the criminal was a tall male. He may not be able to identify the criminal, but his observation will split the population into two groups: the possible criminals (tall males) and the non-suspects (all non-tall males). Hence the concept of observation ties in nicely with that of information:

observation = partial information = sets of indistinguishable items

Notice that we can always define a minimal, least informative observation (nothing is distinguished) and a maximal, most informative observation (everything is distinguished). We will make an important determinacy assumption about observations, i.e. that an observation can be defined as a partition on the set of all possible states: a block in this partition is the set of states that are indistinguishable by that observation. This assumption is satisfied for example in the setting of sequential languages when we take as observations the program outputs.⋆
⋆ If we consider a more general probabilistic setting, like anonymity protocols, a more general framework should be considered [11] (see Section 8).
2.2 Partitions and Equivalence Relations as Lattice Points
Given a system with a set of possible states Σ, the set of all possible partitions over Σ is a complete lattice: the Lattice of Information (LoI) [24]. The order on partitions is given by refinement: a partition is above another if it is more informative, i.e. each block of the upper partition is included in a block of the lower one. An alternative view of the same structure is in terms of equivalence relations. Notice first that there is a simple translation between an equivalence relation and a partition: given an equivalence relation, define the partition whose blocks are its equivalence classes.
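This translation can be sketched concretely; a minimal illustration (the state encoding and function names are ours, not from the text):

```python
# Sketch: translating between an equivalence relation (a predicate on
# pairs of states) and the partition whose blocks are its classes.

def partition_from_relation(states, related):
    """Group states into blocks of mutually related elements."""
    blocks = []
    for s in states:
        for block in blocks:
            # transitivity of an equivalence relation lets us test one
            # representative per block
            if related(s, next(iter(block))):
                block.add(s)
                break
        else:
            blocks.append({s})
    return [frozenset(b) for b in blocks]

def relation_from_partition(blocks):
    """Recover the relation: two states are related iff in the same block."""
    return lambda a, b: any(a in blk and b in blk for blk in blocks)

# States are pairs (l, h); the relation distinguishes states by parity of h.
states = [(0, h) for h in range(4)]
same_parity = lambda s1, s2: s1[1] % 2 == s2[1] % 2
blocks = partition_from_relation(states, same_parity)   # two blocks
```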
Let us define the set LoI, which stands for the set of all possible equivalence relations on a set Σ. The ordering of LoI is now defined as

≈ ⊑ ∼ ⇐⇒ ∀σ1, σ2 (σ1 ∼ σ2 ⇒ σ1 ≈ σ2)    (1)
where ≈, ∼ ∈ LoI and σ1, σ2 ∈ Σ. Furthermore, the join and meet lattice operations are given by the intersection of relations and by the transitive closure of the union of relations, respectively. Thus, higher elements in the lattice can distinguish more states while lower elements can distinguish fewer. It easily follows from (1) that LoI is a complete lattice. We will assume this lattice to be finite; this is motivated by considering the information storable in program variables: the number of possible values of a secret variable is ≤ 2^k, where k is its number of bits. We give a typical example of how these equivalence relations can be used in an information flow setting. Let us assume the set of states Σ consists of tuples ⟨l, h⟩ where l is an observable, usually called low, variable and h is a confidential, usually called high, variable. One possible observer can be described by the equivalence relation

⟨l1, h1⟩ ≈ ⟨l2, h2⟩ ⇐⇒ l1 = l2

That is, this observer cannot distinguish two states whenever they agree on the low variable part. Clearly, a more powerful attacker is one who can distinguish any two states from one another, i.e.

⟨l1, h1⟩ ∼ ⟨l2, h2⟩ ⇐⇒ l1 = l2 ∧ h1 = h2

The ∼-observer gains more information than the ≈-observer by comparing states, therefore ≈ ⊑ ∼.
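The two observers above can be written down as partitions of the state space and the refinement order checked mechanically; a minimal sketch (state encoding and function names are ours):

```python
# Sketch: the ≈-observer (sees only l) vs the ∼-observer (sees l and h)
# as partitions of the state space, and the refinement order between them.

from itertools import product

states = list(product([0, 1], [0, 1, 2, 3]))           # pairs (l, h)

def partition_by(states, view):
    """Partition states by the value of an observation function."""
    blocks = {}
    for s in states:
        blocks.setdefault(view(s), set()).add(s)
    return {frozenset(b) for b in blocks.values()}

approx = partition_by(states, lambda s: s[0])          # ≈ : l only
sim    = partition_by(states, lambda s: s)             # ∼ : l and h

def refines(X, Y):
    """X refines Y (i.e. Y is below X): every block of X lies in a block of Y."""
    return all(any(x <= y for y in Y) for x in X)

# ∼ refines ≈, i.e. ≈ ⊑ ∼: the ∼-observer distinguishes strictly more.
assert refines(sim, approx) and not refines(approx, sim)
```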
2.3 Lattice of Information as a Lattice of Random Variables
A random variable (noted r.v.) is usually defined as a map X : D → R, where D is a finite set with a probability distribution and the real numbers R are the range of X. For each element d ∈ D, its probability will be denoted p(d). For every element x ∈ R we write p(X = x) (or often, in short, p(x)) to mean the probability that X takes on the value x, i.e. p(x) =def Σ_{d ∈ X⁻¹(x)} p(d). In other words, what we observe by X = x is that the input to X in D belongs to the set X⁻¹(x). From that perspective, X partitions the space D into sets which are indistinguishable to an observer who sees the value that X takes on¹. This can be stated relationally by taking the kernel of X, which defines the following equivalence relation ker(X):

d ker(X) d′ iff X(d) = X(d′)    (2)

¹ We define an event for the random variable to be a block in the partition.
Equivalently, we write X ≃ Y whenever the following holds:

X ≃ Y iff {X⁻¹(x) : x ∈ R} = {Y⁻¹(y) : y ∈ R}

and thus if X ≃ Y then H(X) = H(Y). This shows that each element of the lattice LoI can be seen as a random variable. Given two r.v. X, Y in LoI we define the joint random variable (X, Y) as their least upper bound in LoI, i.e. X ⊔ Y. It is easy to verify that X ⊔ Y is the partition obtained by all possible (nonempty) intersections of blocks of X with blocks of Y.
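The kernel construction and the join X ⊔ Y as block-wise intersection can be sketched in a few lines (an illustration; function names are ours):

```python
# Sketch: a random variable as a partition via its kernel, and the join
# X ⊔ Y computed as all nonempty intersections of blocks.

def kernel(domain, X):
    """Partition the domain into blocks on which X is constant."""
    blocks = {}
    for d in domain:
        blocks.setdefault(X(d), set()).add(d)
    return {frozenset(b) for b in blocks.values()}

def join(P, Q):
    """Least upper bound in LoI: pairwise intersections of blocks."""
    return {p & q for p in P for q in Q if p & q}

D = range(8)
X = kernel(D, lambda d: d % 2)      # parity: {evens, odds}
Y = kernel(D, lambda d: d < 4)      # halves: {0..3, 4..7}
XY = join(X, Y)                     # four blocks of two elements each
```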
2.4 Basic Concepts of Information Theory
This section contains a very short review of some basic definitions of Information Theory; additional background is readily available both in textbooks (the standard being the textbook by Cover and Thomas [17]) and on the web. Given a space of events with probabilities P = (p_i)_{i∈N} (N is a set of indices), the Shannon entropy is defined as

H(X) = −Σ_{i∈N} p_i log p_i    (3)
It is usually said that this number measures the average information content of the set of events: if there is an event with probability 1 then the entropy will be 0, and if the distribution is uniform, i.e. no event is more likely than any other, the entropy is maximal, i.e. log |N|. In the literature the terms information content and uncertainty are in this context often used interchangeably: both terms refer to the number of possible distinctions on the set of events in the sense we discussed before. The entropy of a r.v. X is just the entropy of its probability distribution, i.e.

−Σ_{x∈X} p(X = x) log p(X = x)
Given two random variables X and Y, the joint entropy H(X, Y) measures the uncertainty of the joint r.v. (X, Y). It is defined as

−Σ_{x∈X, y∈Y} p(X = x, Y = y) log p(X = x, Y = y)
Conditional entropy H(X|Y) measures the uncertainty about X given knowledge of Y. It is defined as H(X, Y) − H(Y). The higher H(X|Y) is, the lower the correlation between X and Y. It is easy to see that if X is a function of Y then H(X|Y) = 0 (there is no uncertainty about X knowing Y if X is a function of Y), and if X and Y are independent then H(X|Y) = H(X) (knowledge of Y does not change the uncertainty about X if they are independent). Mutual information I(X; Y) is a measure of how much information X and Y share. It can be defined as

I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
Thus the information shared between X and Y is the information of X (resp. Y) from which the uncertainty about X given Y (resp. Y given X) has been removed. This quantity measures the correlation between X and Y; for example, X and Y are independent iff I(X; Y) = 0. Mutual information is a measure of binary interaction. Conditional mutual information, a form of ternary interaction, will be used to quantify leakage. Conditional mutual information measures the correlation between two random variables conditioned on a third random variable; it is defined as

I(X; Y|Z) = H(X|Z) − H(X|Y, Z) = H(Y|Z) − H(Y|X, Z)
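The quantities reviewed in this section can be computed from a joint distribution in a few lines; a minimal sketch (the dictionary encoding of the joint distribution is ours):

```python
# Sketch: Shannon entropy, conditional entropy and mutual information
# computed from a joint distribution p(x, y) given as a dict.

from math import log2
from collections import defaultdict

def H(p):
    """Entropy of a distribution given as {outcome: probability}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(joint, i):
    m = defaultdict(float)
    for outcome, q in joint.items():
        m[outcome[i]] += q
    return dict(m)

# Joint distribution of (X, Y): X a fair bit, Y an exact copy of X.
joint = {(0, 0): 0.5, (1, 1): 0.5}
HX, HY, HXY = H(marginal(joint, 0)), H(marginal(joint, 1)), H(joint)

H_X_given_Y = HXY - HY          # H(X|Y) = H(X,Y) - H(Y): 0, Y determines X
I_XY = HX - H_X_given_Y         # I(X;Y) = H(X) - H(X|Y): the full 1 bit
```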
2.5 Measures on the Lattice of Information
Suppose we want to quantify the amount of information provided by a point in the lattice of information. We could for example associate to a partition P the measure |P| = "number of blocks in P". This measure would be 1 for the least informative partition and would reach its maximal value on the top partition. It is also true that if A ⊑ B then |A| ≤ |B|, so the measure reflects the order of the lattice. An important "additivity" property for measures is the inclusion-exclusion principle: roughly speaking, this principle says that things should not be counted twice. In terms of sets, the inclusion-exclusion principle says that the number of elements in a union of two sets is the sum of the numbers of elements of the two sets minus the number of elements in their intersection²; in our case the inclusion-exclusion principle reads

|A ⊔ B| = |A| + |B| − |A ⊓ B|

Unfortunately this property does not hold. As an example, take the two partitions A = {{1, 2}, {3, 4}} and B = {{1, 3}, {2, 4}}; then their join and meet are A ⊔ B = {{1}, {2}, {3}, {4}} and A ⊓ B = {{1, 2, 3, 4}}. The counting principle from above is in this case not satisfied:

|A ⊔ B| = 4 ≠ 3 = |A| + |B| − |A ⊓ B|

Another problem with the map | | is that when we consider LoI as a lattice of random variables the above measure may be too crude; in fact, all probabilities are disregarded by | |. To address this problem we introduce more abstract lattice-theoretic notions.
² The principle is universal; e.g. in propositional logic the truth value of A ∨ B is given by the truth value of A plus the truth value of B minus the truth value of A ∧ B.
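The counterexample above can be checked mechanically; a sketch with join and meet computed on partitions of {1, 2, 3, 4} (implementation details are ours):

```python
# Sketch: |A ⊔ B| = |A| + |B| - |A ⊓ B| fails on the lattice of partitions.

def join(P, Q):
    """Pairwise intersection of blocks (least upper bound in LoI)."""
    return {p & q for p in P for q in Q if p & q}

def meet(P, Q):
    """Greatest lower bound: merge overlapping blocks of P and Q until
    no two blocks intersect (transitive closure of the union)."""
    blocks = [set(b) for b in P | Q]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:
                    blocks[i] |= blocks.pop(j)
                    merged = True
                    break
            if merged:
                break
    return {frozenset(b) for b in blocks}

A = {frozenset({1, 2}), frozenset({3, 4})}
B = {frozenset({1, 3}), frozenset({2, 4})}
# |A ⊔ B| = 4  but  |A| + |B| - |A ⊓ B| = 2 + 2 - 1 = 3
assert len(join(A, B)) == 4 and len(meet(A, B)) == 1
```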
A valuation on LoI is a real valued map ν : LoI → R that satisfies the following properties:

ν(X ⊔ Y) = ν(X) + ν(Y) − ν(X ⊓ Y)    (4)
X ⊑ Y implies ν(X) ≤ ν(Y)    (5)

A join semivaluation is a weak valuation, i.e. a real valued map satisfying

ν(X ⊔ Y) ≤ ν(X) + ν(Y) − ν(X ⊓ Y)    (6)
X ⊑ Y implies ν(X) ≤ ν(Y)    (7)
for every element X and Y in a lattice [35]. The property (5) is order-preserving: a higher element in the lattice has a larger valuation than elements below it. The first property (6) is a weakened inclusion-exclusion principle.

Proposition 1. The map ν given by

ν(X ⊔ Y) = H(X, Y)    (8)

is a join semivaluation on LoI.

Proof: The tricky part is to prove that inequality (6) is satisfied. Since it is true that H(X, Y) = H(X) + H(Y) − I(X; Y), it will be enough to prove that

H(X ⊓ Y) ≤ I(X; Y)

This can be proved by noticing that:

1. H(X ⊓ Y) = I(X ⊓ Y; X): this is clear because I(X ⊓ Y; X) measures the information shared between X ⊓ Y and X, and because X ⊓ Y ⊑ X such a measure has to be H(X ⊓ Y);
2. I(X ⊓ Y; X) ≤ I(Y; X): this is clear because X ⊓ Y ⊑ Y, hence more information is shareable between Y and X than between X ⊓ Y and X.

Combining the two we have H(X ⊓ Y) = I(X ⊓ Y; X) ≤ I(Y; X).

An important result proved by Nakamura [35] gives a particular importance to Shannon entropy as a measure on LoI: he proved that the only probability-based join semivaluation on the lattice of information is Shannon's entropy. It is easy to show that a valuation itself is not definable on this lattice, thus Shannon's entropy is the best approximation to a probability-based valuation on it. Other measures can be used, which are however less mathematically appealing. We will also consider Min-Entropy, used recently by Smith in an information flow context [41], which seems like a good complementing measure. While
Shannon entropy intuitively results in an "averaging" measure over a probability distribution, the Min-Entropy H∞ takes a "worst-case" view: only the maximal value p(x) of a random variable X is considered:

H∞(X) = − log max_{x∈X} p(x)
where it is always the case that H∞(X) ≤ H(X).
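The contrast between the averaging Shannon entropy and the worst-case min-entropy can be seen numerically; a small sketch (the distributions are illustrative, not from the text):

```python
# Sketch: Shannon entropy H vs min-entropy H∞ on two distributions.

from math import log2

def shannon(p):
    return -sum(q * log2(q) for q in p if q > 0)

def min_entropy(p):
    return -log2(max(p))

uniform = [1 / 8] * 8
skewed  = [1 / 2] + [1 / 14] * 7

# On the uniform distribution the two measures agree (both 3 bits);
# on the skewed one H∞ only sees the most likely value (1 bit),
# while H still averages over the tail.
assert abs(shannon(uniform) - min_entropy(uniform)) < 1e-9
assert min_entropy(skewed) < shannon(skewed) <= shannon(uniform)
```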
2.6 Shannon’s “Lattice Theory of Information”
Shannon's original work on Information Theory [38] did not use the term Information but the term Communication. This was not a coincidence. In a little known note from 1953 [39] Shannon explained that, while the entropy H(X) is a reasonable measure of the amount of information contained in the random variable (or process) X, it can hardly be said that it represents the actual information of X: as we already observed, two random variables might have the same entropy and yet not, in general, the "same information" (flipping a coin, electing a US president). The point here is to agree on what we mean by "same information": consider two tables in a spreadsheet expressing distances between cities, one table measuring distances in kilometers and the other in miles. Then knowledge of one table reveals all information contained in the other table, by converting kilometers to miles and vice versa. In general, consider a random variable (or a stochastic process): a random variable could be described in many different possible ways, all those descriptions being reversible translations of one another, in the same way as a newspaper article could be translated into another language without losing any information; an information element is then an equivalence class of objects under invertible translations. More formally, given random variables X and Y we can define a distance

d(X, Y) = H(X|Y) + H(Y|X) = 2H(X, Y) − H(X) − H(Y)

We will see in a moment that d is a pseudometric, but let us first understand what it means for X, Y to have distance zero: first, they have the same entropy; in fact H(X|Y) + H(Y|X) = 0 implies H(X) = H(Y). Second, and more important, all information about X can be derived by knowing Y and vice versa, so X and Y contain the same information; and reciprocally, if they contain the same information then full knowledge of one completely describes the other, i.e. H(X|Y) = 0 = H(Y|X), and thus they have distance 0.⋆
⋆ For example, if X and Y are "flipping a coin" and "electing the US president" then knowing the next president will not help in determining the outcome of flipping a coin (i.e. H(X|Y) > 0) and, similarly, flipping a coin will not help in knowing who won the election, so H(Y|X) > 0.

We can state that two elements having distance 0 contain the same information, i.e. they are reversible translations of the same process.
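The spreadsheet example can be checked numerically: a table in kilometers and its mile conversion are at distance d = 0, while an independent variable is not. A sketch (encoding and city distances are ours):

```python
# Sketch: the entropy distance d(X, Y) = 2H(X,Y) - H(X) - H(Y) is zero
# exactly when each variable is a reversible translation of the other.

from math import log2
from collections import defaultdict

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def dist(joint):
    """d(X, Y) computed from a joint distribution {(x, y): prob}."""
    mx, my = defaultdict(float), defaultdict(float)
    for (x, y), q in joint.items():
        mx[x] += q
        my[y] += q
    return 2 * H(joint) - H(dict(mx)) - H(dict(my))

# X: distance in km (uniform over three cities); Y = X converted to miles.
kms = [120, 240, 480]
translated = {(km, round(km * 0.621, 1)): 1 / 3 for km in kms}
assert abs(dist(translated)) < 1e-9          # same information: d = 0

# X paired with an independent fair coin: d > 0, information differs.
independent = {(km, c): 1 / 6 for km in kms for c in (0, 1)}
assert dist(independent) > 0
```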
The argument can be pushed even further: knowing that d is a pseudometric implies that the relation

X ≡ Y ⇐⇒ d(X, Y) = 0

is an equivalence relation; we can then state that

An information element is an equivalence class [X]≡

This means that nothing outside [X]≡ contains the same information as X, and anything that contains the same information as X is inside [X]≡.

Theorem 1. d is a pseudometric (or a metric if we take the equivalence classes [X]≡).

Proof: d(X, X) = 0 is trivial and the symmetry of d is also trivial, hence the only non-trivial property to prove is the triangle inequality d(X, Z) ≤ d(X, Y) + d(Y, Z), i.e., by unfolding the definition,

H(X|Z) + H(Z|X) ≤ H(X|Y) + H(Y|X) + H(Y|Z) + H(Z|Y)

Let us prove one half (the other half is the same argument):

H(X|Z) ≤ H(X|Y) + H(Y|Z)

We have

H(X|Y) + H(Y|Z) ≥ H(X|Z)
⇔ H(X, Y) − H(Y) + H(Y, Z) − H(Z) ≥ H(X, Z) − H(Z)
⇔ H(X, Y) + H(Y, Z) ≥ H(X, Z) + H(Y)

We now show that by adding a nonnegative quantity to the right hand side we get the left hand side, hence proving the inequality:

H(X, Z) + H(Y) + H(Y|X, Z) + I(X; Z|Y)
= H(X, Z) + H(Y) + H(Y, X, Z) − H(X, Z) + I(X; Z|Y)
= H(Y) + H(Y, X, Z) + I(X; Z|Y)
= H(Y) + H(Y, X, Z) + H(X|Y) − H(X|Z, Y)
= H(Y) + H(Y, X, Z) + H(X|Y) − H(X, Z, Y) + H(Y, Z)
= H(Y) + H(X|Y) + H(Y, Z)
= H(Y) + H(X, Y) − H(Y) + H(Y, Z)
= H(X, Y) + H(Y, Z)

The quantity H(Y|X, Z) + I(X; Z|Y) ≥ 0 we added to prove the inequality can be found by using Venn diagrams, a powerful source of intuition when reasoning
96
P. Malacaria and J. Heusser
Fig. 1. Reasoning with Venn diagrams (three overlapping circles X, Y and Z, whose regions are labelled a–g)
in Information Theory. Figure 1 shows the r.v. X, Y, Z as the three main circles. H(X, Y) corresponds to the union of X, Y, i.e. the regions a + b + c + d + e + f; similarly H(Y, Z) is made up of the regions b + c + d + e + f + g. The right hand side of the inequality gives H(X, Z), corresponding to a + b + d + e + f + g, and H(Y), corresponding to b + c + d + f. By subtracting the right hand side from the left hand side we are left with the regions c and e: c is the region obtained by taking X, Z out of Y, i.e. Y − (X ∪ Z), which is the term H(Y|X, Z), and e is the intersection of X and Z minus Y, (X ∩ Z) − Y, which corresponds to I(X; Z|Y). As pointed out by Yeung [43], Venn diagram reasoning can be used by seeing entropy as a measure μ on the sets corresponding to random variables and then using the following interpretation:

1. μ(X ∪ Y) = H(X, Y)
2. μ(X − Y) = H(X|Y)
3. μ(X ∩ Y) = I(X; Y)
4. μ(X ∩ Y − Z) = I(X; Y|Z)
A related³ notion is an order on random variables defined as

X ≥d Y ⇔ H(Y|X) = 0

The intuition here is that X provides complete information about Y, or equivalently Y has less information than X, so Y is an abstraction of X (some information is forgotten).

³ Notice that d induces a metric, hence a topology. This topology can be completed by adding Cauchy convergent sequences, i.e. sequences such that lim_{n→∞, m→∞} d(X_n, X_m) = 0. We will ignore these completions.

Let us now relate this order with the lattice of information. We can show that when we consider the lattice of information as a lattice of random variables then
the order defined above is the same as the order in LoI, hence they define the same lattice.

Theorem 2. X ⊑ Y ⇔ X ≤d Y

To prove the result let us first define, given two partitions X and Y, the conditional partition Y|X = x (where x is a block in X) as the set of intersections of the blocks of Y with x; Y|X = x carries a probability distribution obtained by normalising the probabilities (the normalisation factor being p(x)). The notation Y|X = x is justified because H(Y|X = x) is then the usual information-theoretic entropy of the variable Y given the event X = x. Formally,

Y|X = x ≡ {y ∩ x | y ∈ Y}

and the probability distribution associated to Y|X = x is

{ p(y ∩ x)/p(x) | y ∈ Y }

Proof: Let us start with the direction X ⊒ Y ⇒ X ≥d Y. Suppose that X refines Y; then Y|X = x consists of at most one block (the block of which x is a subset). Therefore H(Y|X = x) = 0 for all x, and it follows that H(Y|X) = 0.

For the reverse implication, X ⊒ Y ⇐ X ≥d Y, suppose X does not refine Y; then there exists a block x in X whose elements intersect two blocks of Y. For such an x we have Y|X = x ≡ {y, y′, ...} and hence H(Y|X = x) > 0, so H(Y|X) > 0, which proves the result.
3 Measuring Leakage of Programs

3.1 Observations over Programs
An observation over a program P is an equivalence relation on the states of P. A particular equivalence class will be called an observable. Hence an observable is a set of states indistinguishable by an attacker making that observation. The above intuition can be formalized in terms of several program semantics. We will concentrate here on a specific observation: the output observation [25]. For this observation the random variable associated to a program P is the equivalence relation on any two states σ, σ′ from the universe of states Σ defined by

σ ≃ σ′ ⇐⇒ [[P]](σ) = [[P]](σ′)    (9)
where [[P]] represents the denotational semantics of P. Hence the equivalence relation amounts to "have the same observable output". We denote by Π(P) the interpretation of a program P in LoI as defined by the equivalence relation (9). According to denotational semantics, commands are considered as state transformers, informally maps which change the values of the variables in memory; similarly, language expressions are interpreted as maps from the memory to values. The relation Π(P) is nothing else than the kernel of the denotational semantics of P.
3.2 LoI Interpretation of Programs and Basic Properties
For the example programs used we refer to a simple imperative language with assignments, sequencing, conditionals and loops. Syntax and semantics for the language are standard, as in e.g. [47]. The expressions of the language are arithmetic expressions, with constants 0, 1, ..., and boolean expressions, with constants tt, ff. To see a concrete example, let P be the program

if h==0 then x=0 else x=1

where the variable h ranges over {0, 1, 2, 3}. The equivalence relation (i.e. partition) Π(P) associated to the above program is then

O = { {0}, {1, 2, 3} }

where the block {0} corresponds to the output x=0 and the block {1, 2, 3} to the output x=1.
O effectively partitions the domain of the variable h, where each disjoint subset represents an output. The partition reflects the idea of what a passive attacker can learn about secret inputs by backwards analysis of the program, from the outputs to the inputs. The quantitative evaluation of the partition O measures such knowledge gains of an attacker, depending solely on the partition of states and the probability distribution of the input. The next proposition shows how algebraic operations in LoI can be expressed using programs.

Proposition 2. Given programs P1, P2 there exists a program P12 such that Π(P12) = Π(P1) ⊔ Π(P2).

Given programs P1, P2, we define P12 = P1′; P2′ where the primed programs P1′, P2′ are P1, P2 with variables renamed so as to have disjoint variable sets. If the two programs are syntactically equivalent, then this construction results in self-composition [3]. For example, consider the two programs
P2 ≡ if (h == 1) x = 0 else x = 1
with their partitions Π(P1) = {{0}, {h ≠ 0}} and Π(P2) = {{1}, {h ≠ 1}}. The program P12 is the concatenation of the previous programs with variable renaming:
P12 ≡ h′ = h; if (h′ == 0) x′ = 0 else x′ = 1; h′′ = h; if (h′′ == 1) x′′ = 0 else x′′ = 1

The corresponding lattice element is the join, i.e. the block-wise intersection, of the partitions of the individual programs P1, P2:

Π(P12) = {{0}, {1}, {h ≠ 0, 1}} = {{0}, {h ≠ 0}} ⊔ {{1}, {h ≠ 1}}

The above result can be extended to the expressions of the language: we can associate to an expression e the program consisting of the assignment x = e and use Proposition 2 to compute the l.u.b. in LoI of a set of expressions.
3.3 Definition of Measuring Leakage
Let us take the following intuition:

The leakage of confidential information of a program is defined as the difference between an attacker's uncertainty about the secret before and after her available observations about the program.

For a Shannon-based measure, the above intuition can be expressed in terms of conditional mutual information. In fact, observing that the attacker's uncertainty about the secret before observations is H(h|l) and the attacker's uncertainty about the secret after observations is H(h|l, Π(P)), the definition of conditional mutual information lets us define leakage as

H(h|l) − H(h|l, Π(P)) = I(h; Π(P)|l)

We can now simplify the above definition as follows:

I(Π(P); h|l) = H(Π(P)|l) − H(Π(P)|l, h)
            =A H(Π(P)|l) − 0
            = H(Π(P)|l)
            =B H(Π(P))    (10)
where equality A holds because the program is deterministic and B holds when the program only depends on the high inputs, for example when all low variables are initialised in the code of the program. Thus, for such programs:

Leakage: the (Shannon-based) leakage of a program P is defined as the (Shannon) entropy of the partition Π(P).

Notice that the above definition can easily be adapted to other real valued maps on the lattice of information, providing possibly different definitions of leakage: Π(P) provides a very general representation that can be used as the basis for several quantitative measures like Shannon's entropy, Rényi entropies or guessability measures. We can relate the order in LoI and the amount of leakage by the following result:

Proposition 3. Let P1, P2 be two programs depending only on the high inputs. Then Π(P1) ⊑ Π(P2) iff for all probability distributions on states, H(Π(P1)) ≤ H(Π(P2)).
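For programs depending only on the high input, the leakage H(Π(P)) can be computed directly from the output partition. A sketch assuming a uniform prior on h; the program semantics is modelled by a plain Python function, and the fixed guess 1234 is an illustrative value:

```python
# Sketch: leakage as the Shannon entropy of the partition Π(P) induced
# by a program's outputs, assuming h uniform over its range.

from math import log2
from collections import defaultdict

def leakage(program, secrets):
    """H(Π(P)): entropy of the output partition under a uniform prior."""
    blocks = defaultdict(int)
    for h in secrets:
        blocks[program(h)] += 1
    n = len(secrets)
    return -sum((b / n) * log2(b / n) for b in blocks.values())

# P: if h == 0 then x = 0 else x = 1, with h in {0, 1, 2, 3}.
P = lambda h: 0 if h == 0 else 1
leak_P = leakage(P, range(4))        # H(1/4, 3/4) ≈ 0.811 bits

# A PIN check against one fixed guess leaks very little of a 4-digit pin.
pin_check = lambda pin: pin == 1234  # hypothetical fixed guess
leak_pin = leakage(pin_check, range(10000))   # ≈ 0.0015 bits
```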
4 Foundational Issues about Measuring Leakage
Let us revisit the idea that Shannon's entropy measures the information content of a random variable. Consider a horse race with four horses and the random variable W for "the winner is". W can take four values, value i standing for "the winner is the i-th horse". The information content of a random variable can also be interpreted as the minimum space needed to store and transmit the possible outcomes of the random variable.

1. Suppose the r.v. W takes one of its 4 possible values with probability 1, so the other values have probability 0. Then there is only one possible outcome for the variable, which is known: the value with probability 1. Hence no space is needed to store or transmit the information content of W, i.e. W has information content 0.
2. Suppose, at the other extreme, that all 4 values are equally likely. In that case the information content of W is 2, because with 2 bits it is possible to store 4 values.
3. If there were only two possible values and they were equally likely, then the information content of W would be 1, because with 1 bit it is possible to store 2 values.

Accordingly the entropy of W, H(W), takes the values 0, 2 and 1 respectively when W follows the distributions p1 = (0, 0, 0, 1) (first case), p2 = (1/4, 1/4, 1/4, 1/4) (second case) and p3 = (1/2, 1/2, 0, 0) (third case).
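The three values can be verified directly; a minimal sketch:

```python
# Sketch: entropy of the three distributions for the winner W.

from math import log2

def H(p):
    return -sum(q * log2(q) for q in p if q > 0)

p1, p2, p3 = [0, 0, 0, 1], [1 / 4] * 4, [1 / 2, 1 / 2, 0, 0]
assert [H(p1), H(p2), H(p3)] == [0.0, 2.0, 1.0]
```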
4.1 Guessability 1: Dictionary Attack
Let us now consider a different idea. Instead of measuring the information content of W we now measure its guessability G(W), i.e. the number of attempts that on average we need to guess the winner by choosing at each stage the most likely element not yet chosen. In security terms this method is called a dictionary attack.

1. Suppose the r.v. W takes one of its 4 possible values with probability 1, so the others have probability 0. Then there is only one possible outcome for the variable, which is known, so we need 0 guesses to guess the winner.
2. The other extreme assumes that all 4 values are equally likely. In that case with one guess we will guess the right horse 1/4 of the times, with 2 guesses 1/4 of the times, with 3 guesses 1/4 of the times and with 4 guesses 1/4 of the times, so on average we will need (1/4) + (2/4) + (3/4) + (4/4) = 2.5 guesses to guess the winner.
3. If there were only two possible values and they were equally likely, then we would need 1 guess half of the times and 2 guesses the other half, i.e. 1.5 guesses on average.
Information Theory and Security: Quantitative Information Flow
The general definition of guessability for a random variable whose distribution is written in decreasing order (x_i ≥ x_{i+1}) is G(W) = Σ_i i·x_i. In general, if there are n equally likely elements then

G(W) = Σ_{1≤i≤n} i·p(x_i) = (1/n) Σ_{1≤i≤n} i = (1/n) · n(n + 1)/2 = (n + 1)/2
whereas Shannon entropy gives H(W) = H(1/n, . . . , 1/n) = log2(n). For example, when n = 100, G(W) = 101/2 = 50.5 while H(W) = log2(100) = 6.6438. So there is a significant difference between the average number of guesses and the entropy; notice that the entropy is always lower than the guessability. So what does entropy really measure?
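The gap for n = 100 is easy to check numerically; a small sketch with my own helper functions (`guessability` assumes the distribution is already sorted in decreasing order):

```python
from math import log2

def guessability(p):
    """Expected number of guesses G(W); p must be sorted in decreasing order."""
    return sum(i * x for i, x in enumerate(p, start=1))

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

n = 100
uniform = [1 / n] * n
print(guessability(uniform))   # ≈ 50.5, i.e. (n + 1) / 2
print(entropy(uniform))        # ≈ 6.6439, i.e. log2(100)
```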
4.2 Guessability 2: The 20 Questions Game
In the 20 questions game a player thinks of an object and his opponent can ask yes/no questions, with the aim of guessing the object with the minimum number of questions; usually fewer than 20 are needed to succeed. Using a dictionary attack for asking questions is not a clever strategy, because it eliminates only one object at each round. A better strategy is to ask questions about sets of elements, i.e. whether the object is or is not an element of a set. If the set is chosen carefully, a large number of objects can be eliminated at each round. Assuming a uniform distribution, with 20 yes/no questions there are 2^20 = 1048576 possible items that can be identified. This strategy is played as follows:

1. split the universe of all possible items into two sets of equal size A, B. Then ask if the item is in set A.
2. if the answer is yes, set the universe to be the set A; if the answer is no, set the universe to be the set B. Go to step 1.

Suppose now we believe that the player has chosen some item with a higher probability than the other items. What is the best way to act? We could just ignore our belief, we could combine it by creating a set with probability 1/2 containing that item, or we could just guess that item. As an example, suppose we have 8 possible items with probabilities 1/4, 1/8, 5/48, 5/48, 5/48, 5/48, 5/48, 5/48. We would have the choices:

– ignore: define A = {1/4, 1/8, 5/48, 5/48}, B = {5/48, 5/48, 5/48, 5/48} and ask "is it in A (or B)?"
– brute force: guess the item with probability 1/4
– combine: set A = {1/4, 1/8, 5/48}, B = {5/48, 5/48, 5/48, 5/48, 5/48} and ask "is it in A?"
P. Malacaria and J. Heusser
Information theory tells us that the best strategy is to combine. This can be proven as follows: encode the universe with a Huffman code, then ask questions about the leftmost unknown bits of the code. The Huffman code of a set of events E is defined by building a binary tree as follows. Initialise the set P as the set of probabilities of the events in E and T as the empty set. Step: given P and a set of trees T, pick from P two elements a, b with the lowest probability (if several have the same lowest probability, randomly pick two of them). Add to T the new tree consisting of a new parent node c with children a and b. Add to P the element c, whose probability is the sum of the probabilities of a and b. The Huffman code of the previous example is built as follows:

1. join x3, x4 with probabilities 5/48, 5/48; get a new element y1 with probability 10/48 = 5/24
2. join x5, x6 with probabilities 5/48, 5/48; get a new element y2 with probability 10/48 = 5/24
3. join x7, x8 with probabilities 5/48, 5/48; get a new element y3 with probability 10/48 = 5/24
4. join x2, y1 with probabilities 1/8, 5/24; get a new element y4 with probability 3/24 + 5/24 = 8/24
5. . . .

This results in the following code:

x1 = 00, x2 = 010, x3 = 011, x4 = 1111, x5 = 100, x6 = 101, x7 = 110, x8 = 1110

Now the question about the leftmost unknown bit corresponds to partitioning the universe into A = {1/4, 1/8, 5/48}, B = {5/48, 5/48, 5/48, 5/48, 5/48} and asking if the object is in A. The average length of the words (we calculate this as the sum of all lengths weighted by their probabilities) is

1/4 · 2 + 1/8 · 3 + 2 · 5/48 · 4 + 4 · 5/48 · 3 = 2.95833333

We can see the word 00 as identifying the element x1 by the following sequence of questions/answers: "is the leftmost bit 0? Yes. Is the next leftmost bit 0? Yes. Then it is x1."
In general, by seeing a word as the sequence of questions/answers that has the encoded element as outcome, we can see the average length as the average length of the sequence of questions/answers needed to guess elements of that universe. Compare the average length above with the entropy of the same probability space:

H(1/4, 1/8, 5/48, 5/48, 5/48, 5/48, 5/48, 5/48) = 2.9143965
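The average length and the entropy can be reproduced with a short Huffman implementation (a sketch of mine; tie-breaking may produce a different, but equally optimal, code than the one printed above, so only codeword lengths matter here):

```python
import heapq
from fractions import Fraction
from math import log2

def huffman_lengths(probs):
    """Codeword length of each symbol under a Huffman code.

    Repeatedly merges the two least likely nodes; each node
    carries the indices of the symbols below it."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (prob, tiebreak, symbols)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        pa, _, sa = heapq.heappop(heap)
        pb, _, sb = heapq.heappop(heap)
        for s in sa + sb:      # every symbol under the merged node
            lengths[s] += 1    # gets one bit deeper
        heapq.heappush(heap, (pa + pb, counter, sa + sb))
        counter += 1
    return lengths

probs = [Fraction(1, 4), Fraction(1, 8)] + [Fraction(5, 48)] * 6
lengths = huffman_lengths(probs)
avg = float(sum(p * l for p, l in zip(probs, lengths)))
H = -sum(float(p) * log2(p) for p in probs)
print(avg)   # ≈ 2.9583, just above the entropy ≈ 2.9144
```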
This is not a coincidence: in fact Shannon's entropy measures the average length of the sequence of questions/answers in an optimal guessing strategy. Notice the tiny discrepancy of 0.04 between the Huffman code and the entropy. Entropy is a lower limit on coding and, although the Huffman algorithm comes pretty close to that limit, it remains above it. The important remark is that the Huffman algorithm is optimal, i.e. there is no other feasible code that performs better4. Suppose now we could find a more efficient strategy to play the 20 questions game. Then this could easily be turned into an algorithm that, given any finite probability space, would produce binary codes for the elements of the probability space with a shorter average length than the one given by Huffman coding. This would contradict the optimality of Huffman coding, so it is not possible.
4.3 Leakage and Guessability: Smith's Example
Let us see how this investigation of guessing strategies relates to a recent foundational debate on quantitative information flow [41]. Consider the two programs below and assume the secret h is an 8k-bit variable under a uniform distribution, i.e. H(h) = 8k.

1. if (h % 8 == 0) l=h else l=1

This program is a conditional statement, so its leakage can be computed as the leakage of the guard plus the weighted leakage of the branches, i.e.

H(p(h%8 == 0)) + p(h%8 == 0) H(l = h | h%8 == 0) + p(h%8 ≠ 0) H(l = 1 | h%8 ≠ 0)

that is

H(1/8, 7/8) + 1/8 H(l = h | h%8 == 0) + 7/8 H(l = 1 | h%8 ≠ 0)
= H(1/8, 7/8) + 1/8 log(2^{8k−3}) + 0 = 8k − 7k + 0.169

Smith computes the leakage using mutual information as I(h; l) = H(h) − H(h|l). We have already seen that the two definitions of leakage are equivalent; in fact H(h|l) is 7k − 0.169, so the leakage is H(h) − H(h|l) = 8k − (7k − 0.169) = 8k − 7k + 0.169.

2. l = h & 0^{7k−1} 1^{k+1}

This program copies the last k + 1 bits of h into l (the mask is 7k − 1 zeros followed by k + 1 ones), hence its leakage is k + 1. Alternatively, using mutual information, H(h|l) is 7k − 1, so the leakage is H(h) − H(h|l) = 8k − (7k − 1) = k + 1.

The two programs leak a similar amount.
4. Some minor improvements can be achieved in some contexts.
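As a sanity check (my own sketch, not part of the chapter), for k = 1, i.e. an 8-bit secret, the leakage of both programs can be computed exhaustively; under a uniform prior the leakage of a deterministic program equals the entropy of its output distribution:

```python
from collections import Counter
from math import log2

def leakage(program, bits=8):
    """Leakage under a uniform prior: H(l), the entropy of the outputs."""
    total = 2 ** bits
    outputs = Counter(program(h) for h in range(total))
    return -sum(c / total * log2(c / total) for c in outputs.values())

prog1 = lambda h: h if h % 8 == 0 else 1   # if (h % 8 == 0) l=h else l=1
prog2 = lambda h: h & 0b11                 # last k+1 = 2 bits, for k = 1

print(leakage(prog1))  # ≈ 1.169 = k + 0.169
print(leakage(prog2))  # 2.0 = k + 1
```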
Smith's point is that program 1 is a much bigger threat than program 2, because after running program 1 the attacker has one chance in 8 to guess the secret, whereas after running program 2 the probability of guessing the secret is much lower, at 1/2^{7k−1}. On this basis Smith proposes a measure (based on min-entropy [36]) according to which program 1 has a much bigger measure than program 2. So what is wrong with Shannon's entropy for these examples?
4.4 Meaning of Shannon's Measure
Smith's observation assumes an attacker who attempts a single guess of the secret after running the program just once. While this is often a reasonable assumption about the real world, this kind of attack (like the dictionary attacks we saw before) is not the most powerful guessing strategy, and hence it may underestimate the power of an attacker. Suppose the attacker has, after running the program, an optimal guessing strategy: he can play a 20 questions game using the outcome of the two programs. With program 2 from before, around k + 1 bits are leaked, i.e. 8k − (k + 1) = 7k − 1 bits are left, so we would need 7k − 1 questions using an optimal strategy to guess the secret. With program 1, in 1/8th of the cases the attacker will need 0 questions, whereas in 7/8 of the cases he will face a set of size 7 · 2^{8k−3} where the secret could be; we can approximate this by 2^3 · 2^{8k−3} = 2^{8k}, i.e. in 7/8 of the cases we need around 8k questions to guess the secret. The expected number of questions is around 1/8 · 0 + 7/8 · 8k = 7k. This argument justifies why Shannon's leakage of the two programs above is similar. Hence Shannon's measure indicates the threat level of a program when attacked by (in some respects) the most powerful attacker, and hence provides a good lower bound on the security threat of programs for most security scenarios. However, as shown by Smith's work, other measures may be more appropriate in particular contexts, and guessability in n tries or within a confidence interval is sometimes a better indication of the threat level of code.
5 Reasoning about Programs: Looping Constructs
The generality of the definition of leakage we gave in Section 3 may present a problem: it abstracts over all programming constructs, and so it does not say much about how to reason about the leakage of specific program constructs. In this section we introduce reasoning techniques for a very challenging program construct: loops. Looping constructs are one of the most challenging aspects of programming languages; most kinds of program analysis would be much simpler if it weren't for loops. The main complication of loops is that they introduce "circular" dependencies between program points. Circular dependencies, if taken literally, usually
result in poor analyses where either nothing or everything is leaked. Hence any useful analysis needs to provide general reasoning tools to cleverly break down this apparent circularity. We present two approaches to the analysis of loops. The first approach, not based on the lattice of information, follows [25,26] and provides an analysis based on the source of leakage and the number of iterations. The second approach, based on the lattice of information, interprets loops in terms of chains in the lattice of information and their leakage as the entropy of the least upper bound of the chain. The two approaches are shown to be equivalent.
5.1 Loops: Analytical Approach
A possible way to analyse the leakage of loops is an analysis of the possible sources of leakage. Both the guard and the body of a loop can be sources of leaks. In fact it has been shown [25] that those are two of the three components needed to provide a precise quantitative analysis. The three components are:

Guard: the information about the number of iterations of the loop
Body: the information about the output given knowledge of the number of iterations
Collisions: the information about the number of iterations given knowledge of the output

The idea is that the leakage of a looping program (written L(P)) is given by the information leaked by the guard, plus the information leaked by the body, minus the ambiguity given by the collisions. In terms of random variables this can be expressed as follows [26] (the random variables involved will be formally defined later on):

L(P) = H(NIterations(P)) + H(P | NIterations(P)) − H(NIterations(P) | P)
              (guard)             (body)                   (collisions)
Consider this example program

l=0; while(l < h) { if (h==2) l=3 else l++ }

and suppose h, l are two-bit variables with range {0, 1, 2, 3} and all values of h equally likely. Then the loop terminates in 0 iterations with probability 0.25 (i.e. only when h=0); it terminates in 1 iteration with probability 0.5 (i.e. when h=1 or h=2); it terminates in 2 iterations with probability 0; and it terminates in 3 iterations with probability 0.25 (i.e. only when h=3). Now we have the first ingredient of our formula:

H(NIterations(P)) = H(0.25, 0.5, 0.25)    (guard)
Considering the leakage in the body, we have that in the cases of zero and three iterations there is no uncertainty left about the secret (0 bits of information), while in the case of one iteration the body leaks the information that h=1 or h=2 (1 bit of information). This amounts to:

H(P | NIterations(P)) = 0.25 · 0 + 0.25 · 0 + 0.5 · 1    (body)
For the collisions, notice that the output l=3 can be the result of one or three iterations; hence the output l=3, happening with probability 0.5, generates 1 bit of uncertainty about the number of iterations (it could be one or three). This gives the last element of the leakage formula:

H(NIterations(P) | P) = 0.5 · 1    (collisions)
For this particular program the leakage is then

H(0.25, 0.5, 0.25) + (0.25 · 0 + 0.25 · 0 + 0.5 · 1) − 0.5 · 1 = H(0.25, 0.5, 0.25) = 1.5

That 1.5 is the correct amount leaked can be checked against intuition: an attacker observing the output of the program may observe l=0, in which case he knows that h=0; l=1, in which case he knows that h=1; or l=3, in which case he knows that h=2 or h=3. These three observations have probabilities 0.25, 0.25, 0.5 respectively, so the leakage given the observations is H(0.25, 0.25, 0.5) = 1.5. We are now going to make this argument formal, following [25].
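The intuition can be confirmed by brute force (a sketch of mine: enumerate the four secrets, run the loop, and take the entropy of the resulting output distribution):

```python
from collections import Counter
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def run(h):
    """The example loop: l=0; while(l < h) { if (h==2) l=3 else l++ }"""
    l = 0
    while l < h:
        l = 3 if h == 2 else l + 1
    return l

# uniform two-bit secret: h in {0, 1, 2, 3}
outputs = Counter(run(h) for h in range(4))
leakage = entropy([c / 4 for c in outputs.values()])
print(outputs)   # l=0 from h=0, l=1 from h=1, l=3 from h in {2, 3}
print(leakage)   # 1.5
```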
5.2 Loops as Disjoint Union of Functions
Given a looping program P ≡ while e M that depends only on a high input variable h, let us associate to P the following random variable: NItP is "the number of iterations the loop terminates in". The associated distribution p(NItP = n) is the sum of the probabilities of all values of h for which P terminates in n iterations:

p(NItP = n) = Σ { p(h = v) | P(v) terminates in n iterations }

We can then show that this analytical approach gives the same leakage as in Definition 10:

Proposition 4. H(Π(P)) = H(NItP) + H(Π(P) | NItP) − H(NItP | Π(P))    (11)

Proof: We use the information-theoretical equality H(Y) = H(X) + H(Y|X) − H(X|Y), which holds because, by definition of conditional entropy,
H(X) + H(Y|X) − H(X|Y)
= H(X) + H(Y, X) − H(X) − H(X|Y)
= H(X) + H(Y, X) − H(X) − H(X, Y) + H(Y)
= H(Y)

The result then follows by taking X = NItP, Y = Π(P).
This proposition states that the leakage of a looping program equals the uncertainty about the number of iterations the loop takes to terminate, plus the uncertainty about the output of the program knowing how many iterations it took to terminate, minus the uncertainty about the number of iterations knowing the output of the program. We interpret the elements of equation (11) as follows:

1. H(NItP) is the leakage of the guard
2. H(Π(P) | NItP) is the leakage of the body
3. H(NItP | Π(P)) is the measure of the collisions of the loop

A collision is an observable value that could be generated in different numbers of iterations of the loop. We can "approximate" the r.v. NItP by the r.v. NItP_n, "the number of iterations ≤ n the loop terminates in". The possible values of NItP_n are 0, . . . , n, where the last value n stands for "the loop terminates in n or more iterations". The probabilities associated to NItP_n, which approximate those of NItP, are defined by

p(NItP_n = m) = p(NItP = m)                      if m < n
p(NItP_n = n) = 1 − Σ { p(NItP = s) | s < n }
5.3 Basic Definitions
Definition 1. Define the leakage of a collision-free loop while e M up to n iterations by

W(e, M)_n = H(NItP_n) + H(Π(P) | NItP_n)

Proposition 5. ∀n ≥ 0, W(e, M)_n ≤ W(e, M)_{n+1}

Proof: The proof can be decomposed into showing that H(NItP_n) ≤ H(NItP_{n+1}), which is true because NItP_{n+1} refines the distribution NItP_n. To prove the other component of the inequality, i.e. H(Π(P) | NItP_n) ≤ H(Π(P) | NItP_{n+1}), consider the event e′ "the loop terminates in n + 1 iterations". Using the definition of conditional entropy we have then

H(Π(P) | NItP_n) = Σ_{NItP_n = e} p(e) H(Π(P) | NItP_n = e)
                 ≤ Σ_{NItP_n = e} p(e) H(Π(P) | NItP_n = e) + p(e′) H(Π(P) | e′)
                 = Σ_{NItP_{n+1} = e} p(e) H(Π(P) | NItP_{n+1} = e)
                 = H(Π(P) | NItP_{n+1})
Using Proposition 4 we can hence define the leakage of a loop as

lim_{n→∞} W(e, M)_n − H(NItP | Π(P))    (12)

which simplifies, when there are no collisions, to

lim_{n→∞} W(e, M)_n    (13)
Using this simplified definition we now formalize some important concepts. The rate of leakage is

lim_{n→∞, p(NItP_n = n) = 0} W(e, M)_n / n
Thus, in the case of terminating loops, the rate is the total leakage divided by the number of iterations. This is a rough measure of rate: for example, if the first iteration were to leak the whole secret and the following billion iterations nothing, the rate would still be one billionth of the secret size. However, as in our model the attacker can only observe the output, and not intermediate states of the program, the chosen definition of rate gives an indication of the timing behaviour of the channel in that context. A fundamental concept in information theory is channel capacity, i.e. the maximum amount of leakage over all possible input distributions:

max_p lim_{n→∞} W(e, M)_n    (14)
In our setting we will look for the distribution which maximizes leakage. Informally, such a distribution provides the setting for the most devastating attack: we will refer to it as the channel distribution. We will also use the term channel rate for the rate of leakage of the channel distribution; again this should be thought of as the average maximal amount of leakage per iteration. To define rate and channel capacity in the presence of collisions, the above definitions should be applied to the definition of leakage for loops with collisions. The previous definitions can be used to give a simple classification of the leakage behaviour of loops: for example, a bounded loop is one where, even if we were able to increase the size of the secret arbitrarily, we would not be able to increase the amount leaked arbitrarily. Similarly, we can define the rate of leakage as increasing (or decreasing, or constant) if increasing the size of the secret increases (or decreases, or keeps constant) the rate. Notice also that the rate of leakage is loosely related to timing behaviour: in loops with decreasing rate, if the size of the secret is doubled, each iteration will (on average) reveal less information than each iteration with the original size. We will discuss timing behaviour in one example shortly. In most cases a separation property of the definition of leakage for loops can be exploited. As shown, the definition neatly separates information flows in the
guard and body of a loop. If there is no leakage in the body – e.g. no high variable appears in the body of the loop – then (13) reduces to

lim_{n→∞} H(NItP_n)    (15)

On the other hand, if there is no indirect flow from the guard – e.g. e doesn't contain any variable affected by high variables – then (13) reduces to

lim_{n→∞} H(Π(P) | NItP_n)    (16)
5.4 Examples
Let us apply the previous theory to the analysis of two looping programs. Unless stated otherwise we assume a uniform distribution for all input random variables and that the high input is a k-bit variable with possible values 0, . . . , 2^k − 1 (i.e. no negative numbers).

An unbounded covert channel with decreasing rate. Consider the following simple loop with an increasing counter l:

l=0; while (l != h) { l=l+1 }

No high variable appears in the body of the loop, so there is no leakage in the body, i.e.

lim_{n→∞} H(Π(P) | NItP_n) = 0
Therefore we only need to study the behaviour of lim_{n→∞} H(NItP_n). The events associated to the random variable NItP_n are

(NItP_n = i) = (0 = h)                                 if i = 0
(NItP_n = i) = (0 ≠ h ∧ . . . ∧ i−1 ≠ h ∧ i = h)       if i > 0

hence every event is equally likely, i.e. p(NItP_n = i) = 1/2^k. The entropy over all possible guards is then

lim_{n→∞} H(NItP_n) = H(1/2^k, . . . , 1/2^k) = log(2^k) = k
As expected, all k bits of the variable are leaked by this loop, for every k; however, to reveal k bits, 2^k iterations are required. We conclude that this is an unbounded covert channel with decreasing rate k/2^k. To attach a concrete timing meaning to this rate, let t1, t2 be the time taken by the system to evaluate the
expression l != h and to execute the command l = l+1, respectively. Then the above program leaks k/2^k bits per t1 + t2 milliseconds. Notice that the uniform distribution maximizes leakage, i.e. it achieves channel capacity. Consider for example the following input distribution for a 3-bit variable:

p(0) = 7/8, p(1) = p(2) = · · · = p(7) = 1/56

In this case the attacker knows, before the run of the program, that 0 is much more likely than any other number to be the secret, so the amount of information revealed by running the program is below 3 bits (below capacity). In fact, we have

H(7/8, 1/56, . . . , 1/56) = 0.8944838

Notice that, whatever the distribution, the security of this program is 0 and its leakage ratio is 1.

A bounded covert channel with constant rate. The next example is a loop with a decreasing counter and a slightly different guard expression:

l=20; while (h < l) { l=l-1 }

Again, since the body of the loop does not contain any high variable, the body part of the leakage is 0:

lim_{n→∞} H(Π(P) | NItP_n) = 0
Thus we only need to study the leakage of the guard. After executing the program, l will be 20 if h ≥ 20 and will be h if 0 ≤ h < 20, i.e. h will be revealed if its value is in the interval 0 . . . 19. The events associated to NItPn are:
(NItP_n = i) = h < 20 − i ∧ h ≥ 20 − (i + 1), i.e. h = 20 − (i + 1)    if i > 0
(NItP_n = i) = h ≥ 20                                                   if i = 0

and

p(NItP_n = i) = (2^k − 20)/2^k    if i = 0
p(NItP_n = i) = 1/2^k             if 0 < i ≤ 20
p(NItP_n = i) = 0                 if i > 20
The leakage is then given by

lim_{n→∞} H(NItP_n) = H((2^k − 20)/2^k, 1/2^k, . . . , 1/2^k, 0, . . . , 0)
= −((2^k − 20)/2^k) log((2^k − 20)/2^k) − 20 (1/2^k) log(1/2^k)
This function is plotted in Figure 2 for k = 6, . . . , 16. The interesting element of the graph is how it shows that for k around 6 bits the program is unsafe (more than 2.2 bits of leakage), whereas for k from 14 upwards the program is safe (around 0 bits of leakage).
Fig. 2. Leakage in l=20; while (h < l) {l=l-1}
However, the uniform distribution is not the channel distribution. The capacity of this channel is 4.3923 and is achieved by the distribution where the only values with non-zero probability for h are those in the range {0 . . . 20}, uniformly distributed5. The channel distribution ignores values of h higher than 20, so the channel rate is constant: 4.3923/21 = 0.2091. We conclude that this is a bounded covert channel with constant rate.
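These numbers are easy to reproduce (a sketch of mine: the leakage is computed by enumerating the output partition under a uniform k-bit prior):

```python
from collections import Counter
from math import log2

def run(h):
    """The loop l=20; while (h < l) { l=l-1 }; returns the observable l."""
    l = 20
    while h < l:
        l -= 1
    return l

def leakage(bits):
    """Entropy of the output distribution for a uniform k-bit secret."""
    total = 2 ** bits
    counts = Counter(run(h) for h in range(total))
    return -sum(c / total * log2(c / total) for c in counts.values())

print(leakage(6))    # ≈ 2.25 bits: unsafe for small k
print(leakage(14))   # ≈ 0.02 bits: nearly safe for large k
print(log2(21))      # ≈ 4.3923: capacity, uniform prior on {0..20}
```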
6 Loops in the Lattice of Information
We are now going to show how the previous analysis of loops is naturally interpreted in the lattice of information. In informal terms, the key result is that the leakage of a loop is the semivaluation of the l.u.b. of a chain of points in the lattice of
5. We are ignoring the case k < 5, where the capacity is less than 4.3923.
information, where the chain is the interpretation of the different iterations of the loop. To understand the idea, let us consider again the program

l=0; while(l < h) { if (h==2) l=3 else l++ }

and study the partitions it generates. The loop terminating in 0 iterations reveals that h=0, i.e. the partition W0 = {{0}{1, 2, 3}}; termination in 1 iteration reveals h=1 if the output is 1 and h=2 if the output is 3, i.e. W1 = {{1}{2}{0, 3}}; the loop never terminates in 2 iterations, i.e. W2 = {{0, 1, 2, 3}}; and termination in 3 iterations reveals that h=3 given the output 3, i.e. W3 = {{3}{0, 1, 2}}. Let us define W≤n as ⊔_{n≥i≥0} Wi; we then have

W≤1 = W≤2 = W≤3 = {{0}{1}{2}{3}}

We also introduce an additional partition C to cater for the collisions in the loop: the collision partition is C = {{0}{1}{2, 3}}, because for h=2 the loop terminates with output 3 in 1 iteration and for h=3 it terminates with output 3 in 3 iterations. Hence

H((⊔_{n≥0} W≤n) ⊓ C) = H({{0}{1}{2, 3}})

Notice now that the analytic and lattice interpretations give the same result: assuming a uniform distribution we get

H(0.25, 0.5, 0.25) + 0.5 H(0.5, 0.5) − 0.5 H(0.5, 0.5)    (guard + body − collisions)
= 1.5 = H({{0}{1}{2, 3}})

The above is not a coincidence; using the lattice of information we can relate this analytic formula to the join semivaluation of lattice chains. We can interpret looping programs in the lattice of information as least upper bounds of increasing sequences; for some loops (those with collisions) this is not immediately true. We will show however that all loops can be interpreted as the meet of the l.u.b. of an increasing sequence and a point in the lattice representing the collisions.
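The partition computations above can be mechanized; a small sketch with my own helper functions, computing the l.u.b. as the common refinement of blocks and the meet by merging overlapping blocks:

```python
from itertools import product
from math import log2

def join(p, q):
    """l.u.b. in the lattice of information: the common refinement."""
    return {frozenset(a & b) for a, b in product(p, q) if a & b}

def meet(p, q):
    """g.l.b. in LoI: merge blocks of p and q that overlap (transitive closure)."""
    blocks = [set(b) for b in list(p) + list(q)]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:
                    blocks[i] |= blocks.pop(j)
                    merged = True
                    break
            if merged:
                break
    return {frozenset(b) for b in blocks}

def H(partition, n):
    """Semivaluation under a uniform distribution on n secrets."""
    return -sum(len(b) / n * log2(len(b) / n) for b in partition)

W = [{frozenset({0}), frozenset({1, 2, 3})},               # W0
     {frozenset({1}), frozenset({2}), frozenset({0, 3})},  # W1
     {frozenset({0, 1, 2, 3})},                            # W2
     {frozenset({3}), frozenset({0, 1, 2})}]               # W3
C = {frozenset({0}), frozenset({1}), frozenset({2, 3})}

lub = W[0]
for Wi in W[1:]:
    lub = join(lub, Wi)
print(meet(lub, C) == C)    # True: the meet is the collision partition
print(H(meet(lub, C), 4))   # 1.5, as in the analytic computation
```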
6.1 Algebraic Interpretation
Given a loop W, let Wn be the program W up to the n-th iteration. The random variable associated to Wn is hence a partition where only the outputs of W up to the n-th iteration are distinguished. Hence, Wn+1 will refine Wn by introducing additional blocks. As a simple example of a collision-free program consider the "linear search" program P below
Information Theory and Security: Quantitative Information Flow
113
l=0; while (l n, because Wi+1 destructively refines or "splits" a finite block of Wi into smaller equivalence classes.
6.2 Loops with Collisions
Let us look at the colliding program shown in Figure 3. It consists of two iterations, represented by the functions f1 and f2. The exact partition for this program is

P = {{a, a′}, {x, x′, y}, {c}}

The chain of partitions associated to the program is the following:

W1 = {{a, a′}, {x, x′}, {y, c}}
W2 = {{a, a′}, {x, x′, y}, {c}}
Fig. 3. Two iterations with one collision at b
We see that W2 extends the block containing x, x′ with y because all three of them have the same image b′. This reflects the idea of collisions, namely that two (or more) elements of the codomains of two different iteration functions, here f1 and f2, coincide. The result is that their inverse images are indistinguishable from one another and therefore end up in the same block, here {x, x′, y}. Then W2 is equal to P. However, because W2 extends a block of W1, this is no longer an ascending chain; indeed, by choosing a distribution that assigns probability 0 to c, we can see that H(W1) > H(W2), and therefore Theorem 3 fails in the presence of collisions. To address this problem we first introduce a trick to transform a sequence of partitions into an ascending chain: given a sequence of partitions (Wi)_{i≥0}, define the sequence (W≤i)_{i≥0} by

W≤i = ⊔_{j≤i} Wj

It is easy to see that (W≤i)_{i≥0} is an increasing chain. Define now the collision equivalence of a loop W as the reflexive and transitive closure of the relation σ C σ′ iff σ, σ′ generate the same output from different iterations. We are now ready to relate the leakage of arbitrary loops to semivaluations on LoI.

Theorem 4. The leakage of an arbitrary loop as in definition 12 is equivalent to semivaluating the meet of the least upper bound of its increasing chain W≤n and its collision partition C, i.e.

lim_{n→∞} W(e, M)_n − H(NItP | Π(P)) = H((⊔_{n≥0} W≤n) ⊓ C)
Proof: Notice first that increasing chains (x_n) with a maximal element in a lattice do distribute, i.e.

(⊔_{n≥0} x_n) ⊓ y = ⊔_{n≥0} (x_n ⊓ y)

Assuming distributivity, the argument is then easy to show:

(⊔_{n≥0} W≤n) ⊓ C = ⊔_{n≥0} (W≤n ⊓ C)
Notice now that (W≤n ⊓ C)_{n≥0} is a chain cofinal to the sequence (Wn)_{n≥0}, and so we can conclude that ⊔_{n≥0} (W≤n ⊓ C) is the partition whose semivaluation corresponds to W(e, M). Notice the generality of the lattice approach: we can replace Shannon entropy H with any real-valued map F from the lattice of information, obtaining a definition of leakage for loops as

F(⊔_{n≥0} (W≤n ⊓ C))
7 Automation
By now it is clear that a central ingredient in quantifying information flows in programs is the partitioning of the secret space into indistinguishable subsets, i.e. equivalence classes; one equivalence class contains all inputs which lead to the output described by that class. Terauchi and Aiken [44] provide the crucial insight for automatically quantifying information flows by stating that a program with secure information flows satisfies the 2-safety property. This means that insecure information flows in a program can be detected by observing two finite traces of the program which lead to a distinction in the outputs from related inputs. Figure 4 describes this situation, where each oval is an equivalence class and the four dots inside the top figure are elements of the secret space. Let us take the top partition as the initial partition of the secret and the bottom partition as the "output" partition generated by the program. Under this setup, the arrow to B from the first equivalence class represents a violation of the 2-safety property: two initially indistinguishable secret elements are now in the distinct equivalence classes A and B. Checking the initial partition for every such violation is equivalent to describing the "output" partition. Given that partition, the quantification is achieved simply by applying different entropy measures to it, as described in the previous sections. Thus, the question any automatic technique has to address in one way or another is how to find the "output" partition given a program and an initial secret partition (usually the ⊥ partition, with only one equivalence class). The next
Fig. 4. Distinction in class B as Non-Interference violation
sections describe different approaches to solving this problem, starting with a more thorough description of our own tool AQuA (which is partially inspired by the tool described in Section 7.2) and then reviewing other existing techniques.
7.1 SAT Solving and Model Counting
The computationally intensive task of AQuA is to automatically calculate the output partition given C program code. Given a program P, its partition is denoted Π(P), as defined in Section 3. Applying any measure to it, e.g. F(Π(P)), is cheap and easy in comparison to finding the partition (if the probability distribution is known). The idea behind the partition discovery is best explained using the recurring password example, with 4-bit variable width and secret input variable pwd:

if(pwd == 4) { return 1 } else { return 0 }

The first step of the method finds a representative input for each possible output; in our case AQuA could find the set {4, 5}, for outputs 1 and 0 respectively. This is accomplished using a SAT-based fixed point computation. The next step runs on that set of representative inputs: for each input in the set, the number of possible inputs leading to the same implicit, distinct output is counted. This step is accomplished using model counting. The next section describes these two steps in more detail.

Method. The method consists of two reachability analyses, which can be run either one after another or interleaved. The first analysis finds a set of inputs for which the original program produces distinct outputs; that set has the cardinality of the number of possible outputs of the program. The second analysis counts the set of all inputs which lead to the same output; it is run on all members of the set produced by the first analysis. Together, these two analyses discover the partition of the input space according to the program's outputs.
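On toy examples the two analyses can be mimicked by brute force (a sketch of the idea only — AQuA itself uses SAT solving and model counting rather than enumeration):

```python
from collections import defaultdict

def partition(program, bits):
    """Group all inputs by output: each group is one equivalence class of
    Pi(P); one representative per group plays the role of S_input."""
    classes = defaultdict(list)
    for h in range(2 ** bits):
        classes[program(h)].append(h)
    return classes

pwd_check = lambda pwd: 1 if pwd == 4 else 0
classes = partition(pwd_check, bits=4)

representatives = {out: cls[0] for out, cls in classes.items()}  # like S_input
sizes = {out: len(cls) for out, cls in classes.items()}          # model counts
print(representatives)   # one input per output, e.g. {0: 0, 1: 4}
print(sizes)             # {0: 15, 1: 1}: the partition has blocks of size 15 and 1
```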
Input: P≠
Output: S_input

  S_input ← ∅
  h ← random
  S_input ← S_input ∪ {h}
  while P≠(h) not unsat do
      (l′, h′) ← Run SAT solver on P≠(h)
      S_input ← S_input ∪ {h′}
      h ← h′
      P≠ ← P≠ ∧ l ≠ l′        (block the output just observed)
  end

Algorithm 1. Calculation of S_input using P≠
To a program P we associate two modified programs, P≠ and P=, representing the two reachability questions. The two programs are defined as follows:

P≠(i) ≡ h = i; P; P′; assert(l ≠ l′)
P=(i) ≡ h = i; P; P′; assert(l = l′)

The program P is self-composed [3,44], and either low-inequality or low-equality is asserted on the output variable and its copy. The argument i is the initialisation value for the input variable. The method works for any number of input variables, but we simplify it to a single variable. The programs P≠ and P= are unwound into propositional formulae and then translated into Conjunctive Normal Form (CNF) in a standard fashion. P≠ is solved by a number of SAT solver calls in a standard reachability algorithm (SAT-based fixed point calculation). Algorithm 1 describes this input discovery. In each iteration it discovers a new input h′ which does not lead to the same output as the previous input h. The new input h′ is added to the set S_input. The observed output l′ is added to the formula as a blocking clause, to avoid finding the same solution again in a later iteration. This process is repeated until P≠ is unsatisfiable, which signifies that the search for S_input elements is exhausted.

Given S_input (or a subset of it) as the result of Algorithm 1, we can use P= to count the sizes of the equivalence classes represented by S_input using model counting. This process is displayed in Algorithm 2 and is straightforward: the algorithm calculates the size of the equivalence class [h]_{P=} for every h in S_input by counting the satisfying models of P=(h). The output M of Algorithm 2 is the partition Π(P) of the original program P.

Proposition 8 (Correctness). The set S_input of Algorithm 1 contains a representative element for each possible equivalence class of Π(P). Algorithm 2 calculates {[s1]_{P=}, . . . , [sn]_{P=}} which, according to (9), is Π(P).

Implementation. The implementation builds on a toolchain of existing tools, together with some interfacing, language translations, and optimisations. See Figure 5 for an overview.
Input: P=, S_input
Output: M

M ← ∅
while S_input ≠ ∅ do
    choose h ∈ S_input
    #models ← run allSAT solver on P=(h)
    M ← M ∪ {#models}
    S_input ← S_input \ {h}
end

Algorithm 2. Model counting of equivalence classes in S_input
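Algorithms 1 and 2 can be mimicked on the password example with exhaustive enumeration standing in for the SAT and allSAT solvers (a sketch of the two-phase structure only, not of AQuA's CNF-based implementation):

```python
# Sketch of Algorithms 1 and 2: enumeration replaces the SAT solver
# (find an input with a not-yet-seen output) and the allSAT solver
# (count all inputs with the same output as a representative).
def program(pwd):
    return 1 if pwd == 4 else 0

INPUTS = range(16)  # 4-bit input variable

def find_representatives(prog):          # Algorithm 1
    s_input, seen = [], set()            # 'seen' plays the role of the blocking clauses
    for h in INPUTS:
        if prog(h) not in seen:
            seen.add(prog(h))
            s_input.append(h)
    return s_input

def count_classes(prog, s_input):        # Algorithm 2
    return {h: sum(1 for i in INPUTS if prog(i) == prog(h)) for h in s_input}

reps = find_representatives(program)     # one representative per output
M = count_classes(program, reps)
print(reps, sorted(M.values()))          # prints [0, 4] [1, 15]
```

The representatives differ from the {4, 5} of the running text only because the enumeration order is fixed; any element of a class is an equally valid representative.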
P. Malacaria and J. Heusser

Fig. 5. Translation steps: C code → CBMC constraints → optimisations → self-composition → language translation → Spear format → SAT / #SAT, producing P≠, S_input, P=, and the partition
AQuA has the following main features:
– runs on a subset of ANSI C without memory allocation and with integer secret variables
– needs no user interaction or code annotations, except command-line options
– supports non-linear arithmetic and integer overflows

AQuA works on the equational intermediate representation of the CBMC bounded model checker [15]. C code is translated by CBMC into a program of constraints, which in turn is optimised through standard program-analysis techniques into cleaned-up constraints.6 This program then gets self-composed, and the user-provided source and sink variables are automatically annotated. In a next step, the program is translated into the bit-vector arithmetic Spear format of the Spear theorem prover [1]. At this point, AQuA spawns the two instances, P= and P≠, from the input program P. Algorithms 1 and 2 are executed sequentially on those two program versions. However, depending on the application and the cost of the SAT queries, one could also choose to execute them interleaved, by first calculating one input with P≠ and then model counting that input's equivalence class.

For Algorithm 1, Spear SAT-solves P≠ directly and reports the satisfying model to the tool. The newly found inputs are stored until P≠ is reported to be unsat. For Algorithm 2, Spear bit-blasts P= down to CNF, which in turn is model counted by either RelSat [4] or C2D. C2D is only used if the user requests fast model counting through command-line options: while its counting is much faster than RelSat's on difficult problems, the CNF instances have to be transformed into a d-DNNF tree, which is very costly in memory. This is a trade-off between time and space. In most instances, RelSat is fast enough, except in cases with multiple constraints on more than two secret input variables. Once the partition Π(P) is calculated, the user can choose which measure to apply.

6 CBMC adds some constraints which distort the model counting.

Table 1. Performance examples. * 30 loop unrollings; † from [2]; counted with C2D. Machine: Linux, Intel Core 2 Duo 2 GHz.

Program      #h  range   Σh bits       P≠ Time  P≠+P= Time  Spear LOC
CRC8 1h.c     1  8 bit   8             17.36s   32.68s      370
CRC8 2h.c     2  8 bit   16            34.93s   1m18.74s    763
sum3.c†       3  0...9   9.96 (10^3)   0.19s    0.95s       16
sum10.c      10  0...5   25.84 (6^10)  1.59s    3m30.76s    51
nonlinear.c   1  16 bit  16            0.04s    13.46s      20
search30.c*   1  8 bit   8             0.84s    2.56s       186
auction.c†    3  20 bit  60            0.06s    16.90s      42

Loops. The first step of the program transformations treats loops in an unsound way, i.e. the user has to fix the number of loop unwindings. This is an inherent property of the choice of tools: CBMC is a bounded model checker, which bounds the number of iterations within which counterexamples can be found. While this is a real restriction in program verification – bugs can be missed this way – it is not as crucial for our quantification purposes. At some point Algorithm 1 detects an input which represents all inputs beyond the iteration bound. Using the principle of maximum entropy, this "sink state" can be used to always safely over-approximate the entropy.

Let us assume we analyse a binary search example with 15 unwindings of the loop and 8-bit variables. AQuA reports the partition

Partition: {241}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}: 256

where the numbers in brackets are the model counts. We have 15 singleton blocks and one sink block with a model count of the remaining 241 unprocessed inputs. When applying a measure, the 241 inputs can in turn be distributed into singleton blocks, which over-approximates (and in this case actually exactly finds) the leakage of the input program.

Proposition 9 (Sound loop leakage). Assume partition Π(P)_n is the result of n unwindings of P, and Π(P)_m of m unwindings of P, where m ≥ n. If every element of the "sink state" block b ∈ Π(P)_n is distributed into individual blocks, yielding the partition denoted Π̂(P)_n, then Π(P)_m ⊑ Π̂(P)_n. From Proposition 3 it follows that H(Π(P)_m) ≤ H(Π̂(P)_n).

Experiences. Table 1 provides a small benchmark to give an idea of what programs AQuA has been tested on.
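Proposition 9's over-approximation can be checked numerically on the binary-search partition above (a sketch assuming a uniform input distribution):

```python
from math import log2

# Shannon entropy of a partition, given as a list of block sizes,
# under a uniform distribution over the n inputs.
def H(blocks):
    n = sum(blocks)
    return -sum(b / n * log2(b / n) for b in blocks)

pi_n   = [241] + [1] * 15   # 15 unwindings: 15 singletons plus the sink block
pi_hat = [1] * 256          # sink block distributed into singleton blocks

# Refining the sink block can only increase the entropy, so pi_hat
# gives a safe over-approximation of the leakage.
assert H(pi_hat) >= H(pi_n)
print(round(H(pi_n), 3), H(pi_hat))   # prints 0.551 8.0
```

Here the refined partition reaches the full 8 bits, which for binary search over an 8-bit secret is indeed the exact leakage.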
The running times are split between Algorithm 1, which solves P≠, and the total run time of both algorithms; the table also gives the lines of code (LOC) of the program in Spear format.
The biggest example is a full CRC8 checksum implementation, where the input is two char variables (16 bit) and the program has over 700 LOC. The run time depends on the number of secrets and their ranges, and consequently on the cardinality of the partition. The programs are available from the second author's website.

7.2 Model Checking and Constraint Solving
Recently, Backes, Köpf, and Rybalchenko published an elegant method to calculate and quantify an equivalence relation given a C-like program [2]. Two algorithms are described to discover and quantify the required equivalence relation. The procedure Disco starts with an equivalence relation corresponding to the ⊥ element in the lattice of information, and iteratively refines the relation by discovering pairs of execution paths which lead to a distinction in the outputs. The corresponding high inputs of those two paths are then split into two different equivalence classes. This process is repeated until no more counterexamples are discovered. The procedure Quant calculates the sizes of the equivalence classes generated by the output of the previous procedure. The result can be normalised to a probability distribution, and any probabilistic measure can be applied to it.

Disco is implemented by turning information flow checking into a reachability problem, as shown by [44]. The program P is self-composed by creating a copy P' of the code with disjoint variable sets (indicated by the primes) and an added low-inequality check at the end of the newly created program, where R is the relation to be refined:

if (l == l' && (h,h') in R) {
  P(h,l);
  P'(h',l');
  if (l != l') error;
}

If the error state is reachable, this indicates that there exist two paths of the program P with related low and high inputs which produce distinguishable outputs l and l'. This is a violation of the non-interference property and thus a leak of information. The model checker Armc is applied to this reachability problem and outputs a path to the error label, if it is reachable. Besides the path, the model checker also returns a formula in linear arithmetic which characterises all initial states from which the error state is reachable. From this formula, the two related secrets h and h' can be extracted, which are then split into two different equivalence classes.
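Disco's counterexample-driven refinement can be sketched with exhaustive search standing in for the Armc reachability check (a simplification under our own naming; the paper works symbolically on path formulas, not by enumeration):

```python
from itertools import combinations

# Sketch of Disco: start from the coarsest equivalence relation (all
# secrets related, the bottom of the lattice of information) and split
# a class whenever two of its members produce distinct outputs.
# Exhaustive search replaces the Armc model-checking step.
def disco(prog, secrets):
    classes = [set(secrets)]
    while True:
        witness = None
        for c in classes:                      # look for a counterexample pair
            for h, h2 in combinations(c, 2):
                if prog(h) != prog(h2):        # non-interference violation
                    witness = (c, h)
                    break
            if witness:
                break
        if witness is None:
            return classes                     # no counterexamples left
        c, h = witness
        same = {x for x in c if prog(x) == prog(h)}
        classes.remove(c)
        classes += [same, c - same]            # split the class in two

password = lambda pwd: pwd == 4                # the password example again
sizes = sorted(len(c) for c in disco(password, range(16)))
print(sizes)   # prints [1, 15]
```

Each split strictly increases the number of classes, so the loop terminates once every class is output-homogeneous.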
Given the formula from the last step, Quant calculates the number and sizes of those equivalence classes using a combination of the Omega calculator and the Lattice Point Enumeration Tool. Omega calculates for each equivalence class a linear-arithmetic proposition in disjunctive normal form. The enumeration tool then solves these systems of linear inequalities for each class, which amounts to counting the number of elements in the equivalence class. The equivalence classes so generated can then be plugged into various entropy formulas. As an example, the paper shows, among others, a sum query over three secrets.

The precision and scalability of the tool depend entirely on the choice of underlying tools. The runtime depends on the number of execution paths of the program under analysis and the number of variables involved.

7.3 Abstract Interpretation
Mu and Clark use probabilistic semantics in an abstract interpretation framework to build an automatic analyser [34]. They borrow Kozen's semantics for probabilistic programs, which interprets programs as partial measurable functions on a measurable space; these semantics can be seen as a way to map an input probability distribution to an output probability distribution through the execution of the program under analysis. The entropy measure used is Shannon's entropy, extended to work on "incomplete" random variables, where the entropy is normalised to the coverage of the probability distribution.

To make their analysis tractable, they employ abstract interpretation as their abstraction technique. The interval abstract domain is used to partition the concrete measure space into blocks, and Monniaux's abstract probabilistic semantics are used to replace the previous concrete semantics. The abstraction over-estimates the leakage through uniformalization, which provides safe upper bounds on the leakage. The concrete space X is abstracted to a set of interval-based partitions for each program variable, together with a weighting factor αi, which is the sum of the probabilities of the interval value-range. The abstract domain is described by a Galois connection between the measure space X and its abstraction X#, with abstraction function α mapping X to the set of weighted interval-based partitions X# = {⟨αi, [Ei]⟩}.

The measures defined on the model are:

Pr{secure} = Pr{#secure > 0}
CPSM = Throughput(send) + Pr{secure}
lowCostRevenue = 2 · E[#processing IF #secure = 1] − E[#processing IF #insecure = 1]
highCostRevenue = 2 · E[#processing IF #secure = 1] − 5 · E[#processing IF #insecure = 1]
In order to illustrate the impact of the chosen revenue metrics on the results, we consider two revenue scenarios, a low-cost and a high-cost one. Comparing both will show how the choice of revenue metric, which is largely up to the modeller, affects the results. In the low-cost scenario every securely processed item gains twice the revenue an insecurely processed one loses. In the high-cost scenario the gain is the same, but the cost of processing a message in an insecure situation is 5.

5.4 Analysis
K. Wolter and P. Reinecke

Fig. 8. Throughput of the processing system and probability of being a secure system (curves: Pr{secure}, throughput, Pr{secure} + throughput)

The analysis aims at investigating the effects of different key lengths on the performance and security of the system. The key length has an impact on encryption
time as well as on the time needed to break the key. Therefore, with the key length the firing delay of transition encrypt as well as that of transition fail changes. We consider key length to be reflected in these firing times. As can be seen in Table 1, encryption with the shortest key is assumed to take 0.1 time units, and the time to break this key is assumed to be 12.5. We assume that encryption time increases in steps of length 0.1, while the time to break the key first doubles until it reaches 100 and then increases in linear steps of 500. As both parameters increase simultaneously, we use only encryption time in the plots.

First, we consider throughput, the probability of the system being secure, and CPSM, the combination of both. Figure 8 shows these measures for increasing key length (reflected by increasing encryption times). Allowing for the fact that the relation between the two varying parameters (encrypt and TSI) changes after the first three solution runs, the probability of the system being in the secure state increases almost linearly with the time between security incidents. This is as expected. Reasoning naively, the throughput should decrease linearly with increasing encryption time. This is not the case, for a simple reason: for very short encryption times the throughput is limited by the long delay (low firing rate) of transition generate. The interplay of the two transitions (generate and encrypt) is blurred by the effects of the inhibitor arc blocking encryption while the system is recovering from a security incident. This inhibitor arc injects side effects of the security model into the performance measure throughput. With short keys there are more security incidents, and therefore more time is spent in the recovery state; hence only short keys (and encryption times) show this effect in the throughput.
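The near-linear growth of Pr{secure} with the time between security incidents can be illustrated on an isolated two-state security model (a sketch: the recovery time of 2.0 is a hypothetical value of ours, not a parameter from the chapter):

```python
# Two-state CTMC sketch: secure --(rate 1/tsi)--> recovering,
# recovering --(rate 1/repair)--> secure. For a two-state chain the
# steady-state probability of the secure state is the ratio of the
# mean holding times.
def pr_secure(tsi, repair=2.0):
    return tsi / (tsi + repair)

for tsi in (12.5, 100.0, 6100.0):   # short, medium and long keys
    print(tsi, round(pr_secure(tsi), 4))
```

For large TSI the probability saturates towards 1, which matches the almost-linear rise seen in Figure 8 over the range of parameters considered.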
From Figure 8 we observe that the CPSM metric, the sum of throughput and the probability of the system being in the secure state, is a simple and straightforward measure of the performance-security tradeoff. It has a clear maximum, at encryption time 1.4 and TSI (time to security incident) 6100.

Consider now the revenue measures (Figure 9). Both revenue measures show very clear optimal parameter settings for the encryption time, at 1.9, and hence for the key length and expected time between security incidents (TSI). Note that the optimal encryption time lies just below the firing delay of the generate transition. For longer encryption times the generate delay is no longer the limiting factor, and a queue may build up in place queueing. For short encryption times many more messages are processed; therefore the difference between the two cost models is more pronounced. In the limit, for very long encryption time and extremely long time between security incidents, the total revenue decreases for both cost models, and both approach the same limit, zero.

Figure 10 shows the same metrics in a different presentation. The lowCostRevenue is the same as shown in Figure 9, and gain is its positive contribution. The difference between the two curves, shown more clearly in the zoomed plot on the right side of Figure 10, illustrates the security cost, which is higher the shorter the times between security incidents are.
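The two cost models differ only in the per-item cost of insecure processing; with hypothetical expected counts (ours, for illustration, not output of the model) the two revenues work out as:

```python
# Sketch of the two revenue scenarios: a securely processed item gains
# 2 units, an insecurely processed one costs 1 (low cost) or 5 (high
# cost). The expected counts below are made-up illustration values.
def revenue(e_proc_secure, e_proc_insecure, insecure_cost):
    return 2.0 * e_proc_secure - insecure_cost * e_proc_insecure

low = revenue(0.06, 0.01, insecure_cost=1)
high = revenue(0.06, 0.01, insecure_cost=5)
print(low > high)   # prints True: a higher insecure cost lowers revenue
```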
Performance and Security Tradeoff
Fig. 9. Revenue with two different cost models (curves: lowCostRevenue, highCostRevenue)
Fig. 10. Comparison of security cost and total revenue (zoomed in on the right)
5.5 The Modified Model
In order to more distinctly split the model into a performance and a security model we have removed the inhibitor arc blocking the encryption of messages during recovery. Henceforth, the performance and the security model are only intertwined by the metrics defined on them and by the simultaneous increase of encryption time and time between security incidents.
Fig. 11. Simplified Petri net model for combined performance and security analysis
Fig. 12. Throughput of the simplified processing system (curves: Pr{secure}, throughput, Pr{secure} + throughput)
We expect to see more clearly the characteristics of performance and security and how they conflict. The simplified model is shown in Figure 11. While the probability of being in the secure state is not affected by the change in the model, throughput clearly increases for short encryption times and remains almost the same for long encryption times. Without the inhibitor arc the performance of the system benefits from more processing time. The constant throughput for small encryption times is caused by the limitation through the generate transition. This holds true until the encryption time equals
the delay of the generate transition. As the encryption time increases further, it becomes decisive for the throughput, which then decreases with increasing encryption time. The probability of being in the secure state and the throughput are both monotonic functions (increasing and decreasing, respectively) of the encryption time and the time between security incidents. While neither of them has an optimum over the encryption time, the combination of both, e.g. our CPSM metric, has a clear maximum when the encryption time equals the delay of the generate transition.

Fig. 13. Revenue with two different cost models for the simplified processing model (curves: lowCostRevenue, highCostRevenue)
If encryption is not blocked during recovery from a security incident, the most insecure system, with short encryption times and a high probability of key breaking, achieves no revenue, because cost dominates gain in both the high-cost and the low-cost scenario. Indirectly, the system throughput influences the revenue, and the parameter set that achieves the highest throughput also obtains the highest revenue. In this model throughput in the insecure state is treated the same as throughput in the secure state. The higher throughput as compared to the earlier models comes at the expense of insecurely processed data, which is not considered in the measures.

Blocking the processing system during recovery is a wise choice, as it reduces the amount of wasted work considerably and therefore increases the revenue. This applies to both the low-cost and the high-cost scenario. The cost is most dominant while the encryption key is short and the system insecure. In Figure 13 the cost is the difference between both curves, which diminishes with increasing encryption time and key length.

Fig. 14. Comparison of security cost and total revenue (zoomed in on the right) for the simplified processing model (curves: lowCostRevenue, gain)

Figure 14 shows the relationship between gain and loss in the low-cost scenario by displaying gain and total revenue. The cost then is the distance between both curves. Obviously, the gain is smaller, as the expected number of items being processed is smaller when the inhibitor arc is removed. Cost also decreases, but, as seen above, the total revenue still decreases. If gain and loss are proportional to the number of items in the processing state, this result is as expected. When optimising revenue, the best encryption time is 0.13.
6 Analysis Issues
So far we have assumed that we can obtain solutions for the models we consider. However, the analysis of combined performance and security models suffers from the same numerical difficulties known from performability models. We will now discuss a few common problems.

First, one often encounters the problem that the size of the state space increases rapidly. This increases computation time and memory requirements and may render models unsolvable. Furthermore, as Table 3 and Figure 15 illustrate, a large state space may translate into inaccuracy of the solution. Table 3 lists the number of states the model has when increasing the capacity of the processing system. We have computed the probability of being in the secure state for different capacities in both models. Of course, this probability should be constant across all capacities. However, it turned out that the solutions suffered severely from numerical inaccuracy and differed significantly. We then found that we had used the default setting of TimeNET, which limits the number of iterations in the steady-state solver to 1000, and this was hardly ever enough. This illustrates that the size of the state space can have a drastic impact on the convergence properties of the solution algorithms, and consequently on the accuracy of the results.

Table 3. Size of the state space for different capacity N of the processing system

  n    number of states
  0          3
  1          9
  2         18
  3         30
  5         63
 10        198
 50      3 978
100     15 453
150     34 428

Fig. 15. Probability of being in the secure state (curves: security model only; with inhibitor, n = 150; without inhibitor, n = 1, 2, 3, 5, 10, 150)

Figure 16 shows the number of iterations needed in both models for the parameter set indicated by the encryption time. Only in very few parameter configurations do the solutions converge within 1000 iterations, and some even need up to 16 000 iterations. Interestingly, the parameter sets around encryption time 2 require the most iterations, while encryption time 2 itself, which is identical to the delay of the generate transition, needs far fewer iterations than parameters slightly higher and lower. Figure 16 also illustrates that the solution algorithm for the model without inhibitor arc requires, for all parameter configurations, many more iterations to
Fig. 16. Iterations needed to obtain 10−7 accuracy (curves: model with inhibitor, model without inhibitor)

Fig. 17. Deviation of the probability of being in the secure state (curves: with/without inhibitor, for maxIter 1000, 2000, 5000, 1 000 000)
converge than for the model with inhibitor arc. The consequences of poor convergence are shown in Figure 17, where the probability of being in the secure state is plotted for different limits on the number of iterations. It becomes clear that the solutions for the model with inhibitor are reasonably good even if the algorithm does not converge: the accuracy of the solution (which we do not show for all runs) is never worse than 10−5. The same holds for the model without inhibitor arc using at most 2000 iterations. Using only 1000 iterations, for high parameter values the precision of the solution goes down to 10−4, while the worst accuracy of the probability of being in the secure state occurs for intermediate parameter values. This illustrates that an accuracy of 10−7 is sometimes, but not always, necessary for reasonably precise results for the measures. Even worse, no rule exists for when high accuracy is essential.
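The sensitivity to an iteration limit can be reproduced on a toy steady-state computation (a sketch: power iteration on a small, made-up discrete-time chain, not the chapter's model or TimeNET's solver):

```python
# Power iteration for the steady-state vector of a 3-state DTMC,
# counting the iterations needed until successive vectors differ by
# less than a tolerance -- the quantity an iteration limit cuts off.
P = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.50, 0.00, 0.50]]

def steady_state(P, tol, max_iter=1_000_000):
    pi = [1.0 / len(P)] * len(P)
    for it in range(1, max_iter + 1):
        new = [sum(pi[i] * P[i][j] for i in range(len(P)))
               for j in range(len(P))]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new, it
        pi = new
    return pi, max_iter

pi4, it4 = steady_state(P, 1e-4)
pi7, it7 = steady_state(P, 1e-7)
print(it4, it7)   # the tighter tolerance needs more iterations
```

Because convergence is geometric in the subdominant eigenvalue of the chain, tightening the tolerance by three orders of magnitude roughly doubles the iteration count here; capping the iterations instead silently returns the less accurate vector.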
7 Conclusions
In this chapter we have investigated the relationship of performance and security in model-based evaluation. The approach we illustrated is based on the premise that there are significant similarities between security and dependability. In consequence, security may be evaluated using stochastic processes, in particular CTMCs, with stochastic Petri nets or stochastic process algebras as specification languages. The combination of security and performance poses interesting tradeoffs and inspires models similar to those combining performance and dependability, known as performability. Quantification of security has only recently attracted more attention: while some initial conceptual work was published decades ago, serious model-based evaluation of security mechanisms has appeared only recently. The tradeoff between performance and security has been investigated only for very specific scenarios, yet it is of high relevance especially in modern systems that are subject to requirements in both areas, performance and security.

In order to proceed to a more general treatment and understanding of the performance-security tradeoff we have proposed a rather simple model which distinctly consists of a security part and a performance part. We have shown how to formulate measures that include both performance and security aspects and that optimise the tradeoff between the two. While previously either the performance of security mechanisms or the security of a processing system has been investigated, we want to initiate more explicit treatment of both properties together. We have used our model to discuss typical issues of parametrisation, reward formulation, and analysis frequently encountered with models of this type. Many challenges and open problems remain that will hopefully be addressed in the future.
In particular, it is as yet unclear whether all existing security mechanisms can be traded for performance of the respective system, whether it will be possible to study realistic parameter sets, and whether combined measures exist for arbitrary systems.
References

[1] Marsan, M.A., Balbo, G., Conte, G., Donatelli, S.: Modelling with Generalized Stochastic Petri Nets. Series in Parallel Computing. John Wiley & Sons, Chichester (1995)
[2] Almasizadeh, J., Azgomi, M.A.: Intrusion process modeling for security quantification. In: International Conference on Availability, Reliability and Security, pp. 114–121. IEEE Computer Society, Los Alamitos (2009)
[3] Avižienis, A., Laprie, J.-C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing 1(1), 11–33 (2004)
[4] The Center for Internet Security: The CIS Security Metrics v1.0.0 (May 2009)
[5] Cho, J.-H., Chen, I.-R., Feng, P.-G.: Performance analysis of dynamic group communication systems with intrusion detection integrated with batch rekeying in mobile ad hoc networks. In: AINAW 2008: Proceedings of the 22nd International Conference on Advanced Information Networking and Applications – Workshops, Washington, DC, USA, pp. 644–649. IEEE Computer Society, Los Alamitos (2008)
[6] Deavours, D.D., Clark, G., Courtney, T., Daly, D., Derisavi, S., Doyle, J.M., Sanders, W.H., Webster, P.G.: The Möbius framework and its implementation. Transactions on Software Engineering 28(10), 956–969 (2002)
[7] Dingle, N.J., Harrison, P.G., Knottenbelt, W.J.: Hydra: Hypergraph-based distributed response-time analyzer. In: Arabnia, H.R., Mun, Y. (eds.) Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2003, Las Vegas, Nevada, USA, June 23-26, vol. 1, pp. 215–219. CSREA Press (2003)
[8] Dingle, N.J., Knottenbelt, W.J.: Automated customer-centric performance analysis of generalised stochastic Petri nets using tagged tokens. Electron. Notes Theor. Comput. Sci. 232, 75–88 (2009)
[9] Freiling, F.C.: Introduction to security metrics. In: Dependability Metrics, pp. 129–132 (2005)
[10] German, R.: Performance Analysis of Communication Systems with Non-Markovian Stochastic Petri Nets. John Wiley & Sons, Inc., Chichester (2000)
[11] Gilmore, S., Hillston, J.: The PEPA Workbench: A tool to support a process algebra-based approach to performance modelling. In: Haring, G., Kotsis, G. (eds.) TOOLS 1994. LNCS, vol. 794, pp. 353–368. Springer, Heidelberg (1994)
[12] Haverkort, B.R.: Performance of Computer Communication Systems: A Model-Based Approach. John Wiley & Sons, Chichester (1998)
[13] Hillston, J.: A Compositional Approach to Performance Modelling. Cambridge University Press, Cambridge (1994)
[14] Hillston, J.: A Compositional Approach to Performance Modelling (Distinguished Dissertations in Computer Science). Cambridge University Press, New York (2005)
[15] Jain, R.: The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modeling. Wiley, New York (1991)
[16] Jaquith, A.: Security Metrics: Replacing Fear, Uncertainty and Doubt. Addison-Wesley Professional, Reading (2007)
[17] Kitchenham, B., Pfleeger, S.L., Fenton, N.: Towards a framework for software measurement validation. IEEE Trans. Softw. Eng. 21(12), 929–944 (1995)
[18] Lamprecht, C., van Moorsel, A., Tomlinson, P., Thomas, N.: Investigating the efficiency of cryptographic algorithms in online transactions. International Journal of Simulation: Systems, Science & Technology 7(2), 63–75 (2006)
[19] Lindemann, C.: Performance Modelling with Deterministic and Stochastic Petri Nets. John Wiley & Sons, Chichester (1998)
[20] Littlewood, B., Brocklehurst, S., Fenton, N., Mellor, P., Page, S., Wright, D., Dobson, J., McDermid, J., Gollmann, D.: Towards operational measures of computer security. Journal of Computer Security 2, 211–229 (1993)
[21] Madan, B.B., Goseva-Popstojanova, K., Vaidyanathan, K., Trivedi, K.S.: Modeling and quantification of security attributes of software systems. In: DSN 2002: Proceedings of the 2002 International Conference on Dependable Systems and Networks, Washington, DC, USA, pp. 505–514. IEEE Computer Society Press, Los Alamitos (2002)
[22] Meyer, J.F.: On evaluating the performability of degradable computing systems. IEEE Transactions on Computers 29(8), 720–731 (1980)
[23] Meyer, J.F.: Performability modeling: Back to the future? In: Proceedings of the 8th International Workshop on Performability Modeling of Computer and Communication Systems, pp. 5–9. CTIT (2007)
[24] Miner, A.S.: Computing response time distributions using stochastic Petri nets and matrix diagrams. In: IEEE International Workshop on Petri Nets and Performance Models. IEEE Computer Society, Los Alamitos (2003)
[25] Mitrani, I.: Probabilistic Modelling. Cambridge University Press, New York (1998)
[26] Neuts, M.F.: Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Dover Publications, Inc., New York (1981)
[27] Nicol, D.M., Sanders, W.H., Trivedi, K.S.: Model-based evaluation: From dependability to security. IEEE Trans. Dependable Secur. Comput. 1(1), 48–65 (2004)
[28] Pattipati, K.R., Mallubhatla, R., Gopalakrishna, V., Viswanatham, N.: Markov-reward models and hyperbolic systems. In: Performability Modelling: Techniques and Tools, pp. 83–106. Wiley, Chichester (1998)
[29] Sahner, R.A., Trivedi, K.S., Puliafito, A.: Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package. Kluwer Academic Publishers, Dordrecht (1996)
[30] van Moorsel, A., Bondavalli, A., Pinter, G., Madeira, H., Majzik, I., Durães, J., Karlsson, J., Falai, L., Strigini, L., Vieira, M., Vadursi, M., Lollini, P., Esposito, R.: State of the art. Technical Report D2.1, Assessing, Measuring and Benchmarking Resilience (AMBER) (April 2008)
[31] Verendel, V.: Quantified security is a weak hypothesis: A critical survey of results and assumptions. In: NSPW 2009: Proceedings of the New Security Paradigms Workshop 2009, pp. 37–50. ACM, New York (2009)
[32] Wang, Y., Lin, C., Li, Q.-L.: Performance analysis of email systems under three types of attacks. Performance Evaluation (2010) (in press, corrected proof)
[33] Weyuker, E.J.: Evaluating software complexity measures. IEEE Trans. Softw. Eng. 14(9), 1357–1365 (1988)
[34] Zhao, Y., Thomas, N.: Efficient solutions of a PEPA model of a key distribution centre. Performance Evaluation (2009) (in press, corrected proof)
[35] Zimmermann, A., German, R., Freiheit, J., Hommel, G.: Petri net modelling and performability evaluation with TimeNET 3.0. In: Haverkort, B.R., Bohnenkamp, H.C., Smith, C.U. (eds.) TOOLS 2000. LNCS, vol. 1786, pp. 188–202. Springer, Heidelberg (2000)
Author Index

Broadbent, Anne 43
Di Pierro, Alessandra 1
Fitzsimons, Joseph 43
Hankin, Chris 1
Heusser, Jonathan 87
Kashefi, Elham 43
Malacaria, Pasquale 87
Reinecke, Philipp 135
Wiklicky, Herbert 1
Wolter, Katinka 135