Foundations for Programming Languages

Author: John C. Mitchell


! "# $ &'# &'# ' ) *+ ,) -. / / '! "1 , 2 ! ' #

% ! # ) # 4 3 4 ) 3 4 !# # 5 3 6,5

4 '! 5 */ "+ + 7 - 5 % $ ' ( 8!,# # */ '! ' ! # */ &) ! $ % "# ( + 8 ' ) 5 5

( 0 % ( % %

( ( ( ( (( (0 0 0 %

%

(

*/ ' 4 # % %

% &) 5, 9'

%% %( */ */ ! # !# + 7 ( ,# ( ) ) ( 4 (% & + 5 (( + 5 ( ,8 ' 5 # # : ! 5 */ ' 5 ! ; '# + '# % 5 '#

!

%

(

4 + 7 5 # 8 & '! 5 ) # 5 % &) * # % % & % # # 5 ' %% / 5 # %( * + # % #' ) # # ' )

7 + +' . ) ) '

# 5 +# 5 ) . ) ' M) * ) 5 *

#$!

E*)# F

EF - # 5 !# K 8 5 + 5, 8!,# # E F EF * # 5 ! # E F EF # 5 K ' # E %F EF ) !# + #7 E (F

EF

E*)# F

E ?F

If A is a finite set with n elements, then P(A) has 2^n elements (see Exercise 1.6.1). Returning to cartesian products, let A and B be sets with a ∈ A and b ∈ B. The singleton {a} and the set {a, b} with two elements are both in the powerset P(A ∪ B). Therefore, the ordered pair ⟨a, b⟩, represented as the set {{a}, {a, b}}, is in P(P(A ∪ B)), the powerset of the powerset. This finally gives us a way of defining the cartesian product using only set-theoretic operations that do not lead to paradoxes:

A × B = { p ∈ P(P(A ∪ B)) | p = ⟨a, b⟩ for some a ∈ A and b ∈ B }

In addition to ordered pairs, it is also possible to define ordered triples, or k-tuples for any positive integer k. If A1, ..., Ak are sets, then the cartesian product A1 × ... × Ak is the collection of all k-tuples ⟨a1, ..., ak⟩ with ai ∈ Ai.

1"1, V2

Eq?n n if

values

V a

V

false

true, Eq?n m

true then

M

else

N

M,

value

n,m distinct numerals false

if

then M

else

N

N

Subterm Rules

M+N boo!

Eq?MN

M' + N

n

Eq?M'N

Eq?nM M

(M, N>

M Proj1 M

Functions

(M',

N>

M

n

n a numeral

+ M'

n a numeral

Eq?nM'

M'

if M then N else Pairs

+

M' then N (V, N>

else

(V, N>

P

V a value

M' Proj1 M'

NWN'

V

a value

Eager or call-by-value reduction is defined in Table 2.5. Like lazy reduction, eager reduction is only intended to produce a numeral or boolean constant from a full (closed) program; it is not intended to produce normal forms or fully reduce open terms that may have free variables. This can be seen in the rules for addition: there is no eager reduction from a term of the form x + M, even though M could be further reduced. A significant restriction on eager PCF is that we only have fixed-point operators fix_σ for function types σ = σ1 → σ2. The intuitive reason is that since a recursively-defined natural number, boolean or pair would be fully evaluated before any function could be applied to it, any such expression would cause the program containing it to diverge. In more detail, reduction of closed terms only halts on lambda abstractions, pairs of values and constants.

The Language PCF

While lambda abstractions and pairs of values could contain occurrences of fix, analysis of a pair ⟨V1, V2⟩ with V1 and V2 values shows that any occurrence of fix must occur inside a lambda abstraction. Therefore, the only values that could involve recursion are functions. If we changed the system so that a pair ⟨M, N⟩ were a value, for any terms M and N, then it would make sense to have a fixed-point operator fix_σ for each product type σ = σ1 × σ2. It is easy to verify, by examination of Table 2.5, that for any M, there is at most one N with M →eager N. Since no values are reduced, →eager is a partial function on terms whose domain contains only non-values. The use of delay in fix reduction requires some explanation. Since eager reduction does not reduce under a lambda abstraction, a term of the form λx: σ. Mx will not be reduced. This explains why we call the mapping M ↦ λx: σ. Mx "delay." The reason that delay is used in fix reduction is to halt reduction of a recursive function definition until an argument is supplied. An example is given in Exercise 2.4.24; see also Exercise 2.4.25.

Example 2.4.20 Some of the characteristics of eager reduction are illustrated by reducing the term

(fix (λx: nat → nat. λy: nat. y)) ((λz: nat. z + 1) 2)

to a value. This is only a trivial fixed point, but the term does give us a chance to see the order of evaluation. The first step in determining which reduction to apply is to check the entire term. This has the form of an application MN, but neither the function M nor the argument N is a value. According to the rule at the bottom left of Table 2.5, we must eager-reduce the function M ≡ fix (λx: nat → nat. λy: nat. y). We apply the reduction axiom for fix, followed by β-reduction (since delay[...] is a value); this gives us a function value (lambda abstraction) as follows:

fix (λx: nat → nat. λy: nat. y)
    →eager (λx: nat → nat. λy: nat. y) (delay[fix (λx: nat → nat. λy: nat. y)])
    →eager λy: nat. y

Note that without delay[ ], the eager strategy would have continued fix reduction indefinitely. We have now reduced the original term MN to a term VN with function V ≡ λy: nat. y a value but argument N ≡ (λz: nat. z + 1) 2 not a value. According to the rule at the bottom right of Table 2.5, we must eager-reduce the function argument. Since λz: nat. z + 1 and 2 are both values, we can apply β-reduction, followed by reduction of the sum of two numerals:

(λy: nat. y) ((λz: nat. z + 1) 2)  →eager  (λy: nat. y) (2 + 1)  →eager  (λy: nat. y) 3

2.4 PCF Reduction and Symbolic Interpreters

This now gives us the application (λy: nat. y) 3 of one value to another, which can be reduced to 3.

Example 2.4.21

Divergence of eager evaluation can be seen in the term

let f(x: nat): nat = 3 in letrec g(x: nat): nat = g(x + 1) in

f(g 5)

from Example 2.4.6. Exercise 2.4.22 asks you to reduce this term. Since we can easily prove that this term is equal to 3, this term shows that the equational proof system of PCF is not sound for proving equivalence under eager evaluation. It is possible to develop an alternative proof system for eager equivalence, restricting β-conversion to the case where the function argument is a value, for example. However, due to restrictions on replacing subexpressions, the resulting system is more complicated than the equational system for PCF.

The deterministic call-by-value evaluator eval_v is defined from →eager in the usual way. Since the partial function →eager selects a reduction step if the term is not a value, eval_v may also be defined as follows.

eval_v(M) = M    if M is a value
eval_v(M) = N    if M →eager M' and eval_v(M') = N
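The definition of eval_v by repeated →eager steps can be sketched in a strict metalanguage. The following Python sketch implements one-step eager reduction for a small PCF fragment; the tuple-based term encoding and all function names are our own illustrative choices, not the book's syntax, and pairs/projections are omitted for brevity:

```python
# Sketch of one-step eager (call-by-value) reduction for a PCF fragment:
# numerals, +, Eq?, if-then-else, lambda, application and fix.

def is_value(t):
    return isinstance(t, (int, bool)) or t[0] == 'lam'

def subst(t, x, v):
    """Substitution [v/x]t for closed v, so no renaming is needed."""
    if isinstance(t, (int, bool)):
        return t
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'lam':
        _, y, body = t
        return t if y == x else ('lam', y, subst(body, x, v))
    return (tag,) + tuple(subst(s, x, v) for s in t[1:])

def step(t):
    """One eager reduction step of a non-value term."""
    tag = t[0]
    if tag == 'add':
        _, m, n = t
        if not is_value(m): return ('add', step(m), n)   # left subterm first
        if not is_value(n): return ('add', m, step(n))
        return m + n
    if tag == 'eq':
        _, m, n = t
        if not is_value(m): return ('eq', step(m), n)
        if not is_value(n): return ('eq', m, step(n))
        return m == n
    if tag == 'if':
        _, b, m, n = t
        if not is_value(b): return ('if', step(b), m, n)
        return m if b else n
    if tag == 'app':
        _, f, a = t
        if not is_value(f): return ('app', step(f), a)   # function first ...
        if not is_value(a): return ('app', f, step(a))   # ... then argument
        _, x, body = f
        return subst(body, x, a)
    if tag == 'fix':
        _, f = t
        if not is_value(f): return ('fix', step(f))
        # fix f -> f (delay[fix f]) with delay[N] = lam z. N z
        return ('app', f, ('lam', '_z', ('app', ('fix', f), ('var', '_z'))))
    raise ValueError(f'stuck term: {t!r}')

def eval_v(t):
    """eval_v(M) = M if M is a value; otherwise step and continue."""
    while not is_value(t):
        t = step(t)
    return t

# the term of Example 2.4.20: (fix (lam x. lam y. y)) ((lam z. z + 1) 2)
example = ('app',
           ('fix', ('lam', 'x', ('lam', 'y', ('var', 'y')))),
           ('app', ('lam', 'z', ('add', ('var', 'z'), 1)), 2))
print(eval_v(example))   # -> 3
```

Reducing the example term follows the order traced in Example 2.4.20: the fix axiom fires once, delay halts further unfolding, and only then is the argument reduced to the numeral 3.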

There seem to be two main reasons for implementing eager rather than left-most (or lazy) evaluation in practice. The first is that even for a purely functional language such as PCF (i.e., a language without assignment or other operations with side effects), the usual implementation of left-most reduction is less efficient. The reason for the inefficiency appears to be that when an argument such as f x is passed to a function g, it is necessary to pass a pointer to the code for f and to keep a record of the appropriate lexical environment. As a result, there is significantly more overhead to implementing function calls. It would be simpler just to call f with argument x immediately and then pass the resulting integer, for example, to g. The second reason that left-most reduction is not usually implemented has to do with side effects. As illustrated by the many tricks used in Algol 60, the combination of left-most evaluation and assignment is often confusing. In addition, in the presence of side effects, left-most evaluation does not coincide with nondeterministic or parallel evaluation. The reason is that the order in which assignments are made to a variable will generally affect the program output. We cannot expect different orders of evaluation to produce the same result. Since most languages in common use include assignment, many of the advantages of left-most or lazy evaluation are lost.


Exercise 2.4.22 Show, by performing eager reduction, that the eager interpreter does not halt on the program given in Example 2.4.21.

Exercise 2.4.23 Assuming appropriate reduction rules for the function symbols used in the factorial function fact of Section 2.2.5, show that fact 3 →→eager 6.

Exercise 2.4.24 An alternate eager reduction for fix might be

fix_σ (λx: σ. M) →eager [fix_σ (λx: σ. M)/x] M

Find a term λx: σ. M where eager reduction as defined in Table 2.5 would terminate, but the alternate form given by this rule would not. Explain why, for programs of the form letrec f(x: σ) = M in N with fix not occurring in M or N, it does not seem possible to distinguish between the two possible eager reductions for fix.

Exercise 2.4.25 Eager or call-by-value reduction may also be applied to untyped lambda calculus. With ordinary (left-most) reduction, an untyped fixed-point operator, Y, may be written Y ≡ λf. (λx. f(xx)) (λx. f(xx)). Under eager, or call-by-value reduction, the standard untyped fixed-point operator is written Z ≡ λf. (λx. f(delay[xx])) (λx. f(delay[xx])), where delay[U] ≡ λx. Ux for x not free in untyped term U. Show that for M ≡ λx. λy. y, the application YM lazy or left-most reduces to a term beginning with a lambda, and similarly for ZM using untyped eager reduction. What happens when we apply eager reduction to YM?
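Because Python evaluates arguments eagerly, it can play the role of untyped call-by-value lambda calculus in this exercise. A sketch (the names Y, Z and the factorial example are ours):

```python
# Under eager evaluation, the ordinary fixed-point operator Y loops
# forever, while Z, which wraps the self-application in
# delay[U] = lambda v: U(v), produces a function value.

Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Z works: define factorial as a fixed point.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))           # -> 120

# Y diverges: Python keeps expanding x(x) before f is ever called.
try:
    Y(lambda g: lambda y: y)
    diverged = False
except RecursionError:
    diverged = True
print(diverged)          # -> True
```

The RecursionError is Python's finite stand-in for the infinite eager reduction of YM; under lazy reduction the same term would stop at a lambda.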

2.5 PCF Programming Examples, Expressive Power and Limitations

2.5.1 Records and n-tuples

Records are a useful data type in Pascal and many other languages. We will see that record expressions may be translated into PCF using pairing and projection. Essentially, a record is an aggregate of one or more components, each with a different label. We can combine any expressions M1: σ1, ..., Mk: σk and form a record {ℓ1 = M1, ..., ℓk = Mk} whose ℓi component has the value of Mi. The type of this record may be written {ℓ1: σ1, ..., ℓk: σk}. We select the ℓi component of any record r of this type (0 < i ≤ k) using the "dot" notation r.ℓi. For example, the record {A = 3, B = true} with components labeled A and B has type {A: nat, B: bool}. The A component is selected by writing {A = 3, B = true}.A. In general, we have the equational axiom

{ℓ1 = M1, ..., ℓk = Mk}.ℓi = Mi        (record selection)

for component selection, which may be used as a reduction rule from left to right.


One convenient aspect of records is that component order does not matter (in the syntax just introduced and in most languages). We can access the A component of a record r without having to remember which component we listed first in defining r. In addition, we may choose mnemonic names for labels, as in

let person = {name: string, age: nat, married: bool, ...}

However, these are the only substantive differences between records and cartesian products; by choosing an ordering of components, we can translate records and record types into pairs and product types of PCF. For records with two components, such as {A: nat, B: bool}, we can translate directly into pairs by choosing one component to be first. However, for records with more than two components, we must translate into "nested" pairs. To simplify the translation of n-component records into PCF, we will define n-tuples as syntactic sugar. For any sequence of types σ1, ..., σn, we introduce the n-ary product notation

σ1 × σ2 × ... × σn  ≡  σ1 × (σ2 × (... × (σn−1 × σn) ...))

by associating to the right. (This is an arbitrary decision. We could just as easily associate n-ary products to the left.) To define elements of an n-ary product, we use the tupling

notation ⟨M1, ..., Mn⟩ ≡ ⟨M1, ⟨M2, ..., ⟨Mn−1, Mn⟩ ...⟩⟩ as syntactic sugar for a nested pair. It is easy to check that if M1: σ1, ..., Mn: σn, then ⟨M1, ..., Mn⟩: σ1 × ... × σn.

To retrieve the components of an n-tuple, we use the following notation for combinations of binary projection functions:

Proj_i^n ≡ Proj1 ∘ Proj2^(i−1)   for 1 ≤ i < n,        Proj_n^n ≡ Proj2^(n−1)

We leave it as an exercise to check that Proj_i^n ⟨M1, ..., Mn⟩ = Mi, justifying the use of this notation. A useful piece of meta-notation is to write σ^k for the k-ary product σ × ... × σ of k σ's.
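The right-nested encoding and the derived projections can be sketched concretely; tuple_n and proj_n are our illustrative names, with Python pairs standing in for PCF pairing:

```python
# Sketch of the n-tuple sugar: a right-nested pair and the derived
# projection Proj^n_i built from the binary projections.

def tuple_n(*ms):
    """<M1, ..., Mn> = <M1, <M2, ... <M(n-1), Mn>...>>, for n >= 2."""
    nested = ms[-1]
    for m in reversed(ms[:-1]):
        nested = (m, nested)
    return nested

def proj_n(i, n, pair):
    """Proj^n_i = Proj1 . Proj2^(i-1), or just Proj2^(n-1) when i = n."""
    for _ in range(i - 1):
        pair = pair[1]                       # Proj2
    return pair if i == n else pair[0]       # final Proj1 unless last component

t = tuple_n(10, 20, 30)          # the nested pair (10, (20, 30))
print(proj_n(2, 3, t))           # -> 20
```

Note how the last component needs no final Proj1, exactly as in the definition of Proj_n^n above.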


Using n-tuples, we can now translate records with more than two components into PCF quite easily. If we want to eliminate records of some type {ℓ1: σ1, ..., ℓk: σk}, we choose some ordering of the labels ℓ1, ..., ℓk and write each type expression and record expression using this order. (Any ordering will do, as long as we are consistent. For concreteness, we could use alpha-numeric lexicographical order.) We then translate expressions with records into PCF as follows.

{ℓ1: σ1, ..., ℓk: σk}  =  σ1 × ... × σk
{ℓ1 = M1, ..., ℓk = Mk}  =  ⟨M1, ..., Mk⟩

If an expression contains more than one type of records, we apply the same process to each type independently.

Example 2.5.1 We will translate the expression

let r: {A: int, B: bool} = {A = 3, B = true}
in if r.B then r.A else r.A + 1

into PCF. The first step is to number the record labels. Using the number 1 for A and 2 for B, the type {A: int, B: bool} becomes int × bool and a record {A = x, B = y}: {A: int, B: bool} becomes a pair ⟨x, y⟩: int × bool. Following the general procedure outlined above, we desugar the expression with records to the following expression with product types and pairing:

let r: int × bool = ⟨3, true⟩
in if Proj2 r then Proj1 r else (Proj1 r) + 1

Some slightly more complicated examples are given in the exercise.
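The translation of this example can be mimicked in any language with pairs. In the Python sketch below, encode, select and the label-ordering convention are our own illustrative names, not PCF syntax:

```python
# Sketch of the record translation: fix an ordering of the labels,
# encode a record as a right-nested pair, and compile field selection
# into a chain of binary projections.

def proj1(p): return p[0]
def proj2(p): return p[1]

def encode(record, labels):
    """Encode {l1 = v1, ..., lk = vk} as <v1, <v2, ... <v(k-1), vk>...>>."""
    vals = [record[l] for l in sorted(labels)]   # e.g. alphabetical order
    pair = vals[-1]
    for v in reversed(vals[:-1]):
        pair = (v, pair)
    return pair

def select(pair, label, labels):
    """Compile r.label into Proj1/Proj2 applications on the encoding."""
    order = sorted(labels)
    i = order.index(label)
    for _ in range(i):
        pair = proj2(pair)
    # the last component needs no final Proj1
    return pair if i == len(order) - 1 else proj1(pair)

r = encode({'A': 3, 'B': True}, ['A', 'B'])      # becomes the pair (3, True)
# if r.B then r.A else r.A + 1, as in Example 2.5.1
result = (select(r, 'A', ['A', 'B'])
          if select(r, 'B', ['A', 'B'])
          else select(r, 'A', ['A', 'B']) + 1)
print(result)    # -> 3
```

For a two-component record the encoding is simply a pair, matching the desugaring of Example 2.5.1; for three or more labels the pairs nest to the right.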

Exercise 2.5.2 Desugar the following expressions with records to expressions with product types and pairing.

(a) let r: {A: int, B: bool, C: int → int} = {A = 5, B = false, C = ...}
    in if r.B then r.A else (r.C)(r.A)

(b) let f(r: {A: int, C: bool}): {A: int, B: bool} = {A = r.A, B = r.C}
    in f {A = 3, C = true}

You may wish to eliminate one type of records first and then the other. If you do this, the intermediate term should be a well-typed expression with only one type of records.

2.5.2 Searching the Natural Numbers

One useful programming technique in PCF is to "search" the natural numbers, starting from 0. In recursive function theory, this method is called minimization, and written using the operator μ. Specifically, if p is a computable predicate on the natural numbers (which means a computable function from nat to bool), then μx[p x] is the least natural number n such that p n = true and p n' = false for all n' < n. A function f: N^k → N is called numeric. If C is a class of numeric functions, we say C is

• Closed under composition if, for every f1, ..., fℓ: N^k → N and g: N^ℓ → N from C, the class C also contains the function h defined by

    h(n1, ..., nk) = g(f1(n1, ..., nk), ..., fℓ(n1, ..., nk)).

• Closed under primitive recursion if, for every f: N^k → N and g: N^(k+2) → N from C, the class C contains the function h defined by

    h(0, n1, ..., nk) = f(n1, ..., nk)
    h(m + 1, n1, ..., nk) = g(h(m, n1, ..., nk), m, n1, ..., nk).

• Closed under minimization if, for every f: N^(k+1) → N from C such that ∀n1, ..., nk. ∃m. f(m, n1, ..., nk) = 0, the class C contains the function g defined by

    g(n1, ..., nk) = the least m such that f(m, n1, ..., nk) = 0.
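The three closure operations can be sketched directly in Python; the function names compose, prim_rec and minimize are ours, and minimize is total only on the functions the definition admits:

```python
# Illustrative sketch of the three closure operations on numeric functions.

def compose(g, *fs):
    """h(n1,...,nk) = g(f1(n1,...,nk), ..., fl(n1,...,nk))."""
    return lambda *ns: g(*(f(*ns) for f in fs))

def prim_rec(f, g):
    """h(0, ns) = f(ns);  h(m+1, ns) = g(h(m, ns), m, ns)."""
    def h(m, *ns):
        acc = f(*ns)
        for i in range(m):
            acc = g(acc, i, *ns)
        return acc
    return h

def minimize(f):
    """g(ns) = least m with f(m, ns) = 0 (total only if such m exists)."""
    def g(*ns):
        m = 0
        while f(m, *ns) != 0:
            m += 1
        return m
    return g

succ = lambda x: x + 1
# addition by primitive recursion: add(0, n) = n; add(m+1, n) = succ(add(m, n))
add = prim_rec(lambda n: n, lambda acc, i, n: succ(acc))
# subtraction by search: the least m with m + n = k
sub = minimize(lambda m, n, k: 0 if add(m, n) == k else 1)
print(add(3, 4), sub(4, 9))    # -> 7 5
```

As the minimization clause requires, sub only terminates when some m satisfies the search condition, mirroring the totality side-condition in the definition.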

The class of total recursive functions is the least class of numeric functions that contains the projection functions (n1, ..., nk) ↦ ni, the successor function λx: nat. x + 1, the constant zero function λx: nat. 0, and that is closed under composition, primitive recursion, and minimization. We can show that every total recursive function is definable in PCF, in the following precise sense. For any natural number n ∈ N, let us write ⌈n⌉ for the corresponding numeral of PCF. We say a numeric function f: N^k → N is PCF-definable, or simply definable, if there is a closed PCF expression M: nat^k → nat, where nat^k is the k-ary product type nat × ... × nat, such that

∀n1, ..., nk ∈ N. M ⟨⌈n1⌉, ..., ⌈nk⌉⟩ = ⌈f(n1, ..., nk)⌉.

Theorem 2.5.12 Every total recursive function is definable in PCF.

Proof To show that every total recursive function is definable in PCF, we must give the projection functions, the successor function, and the constant zero function, and


show closure under composition, primitive recursion, and minimization. Some of the work has already been done in Exercise 2.5.7 and Proposition 2.5.3. Following the definitions in Section 2.5.1, we let

λx: nat. x + 1   (successor),    λx: nat. 0   (zero),    λx: nat^k. Proj_i x   (1 ≤ i ≤ k, projections)

If f and g are numeric functions of k arguments, then cnvg f g will behave like g, except that an application to k arguments will only have a normal form (or "halt") when the application of f has a normal form. For any k > 0, the function cnvg may be written as follows:

λx: nat^k. if Eq? (f x) 0 then g x else g x.

For any x, if f x can be reduced to normal form then (whether this is equal to zero or not) the result is g x. However, if f x cannot be reduced to a normal form, then the conditional can never be eliminated and cnvg f g will not have a normal form. The main reason why this works is that we cannot reduce Eq? (f x) 0 to true or false without reducing f x to normal form. Using cnvg, we can define a composition of f and g that is defined only when f is defined. For example, if f, g: N → N are unary partial recursive functions represented by PCF terms M and N, we represent their composition by the term cnvg M (λx: nat. (N (M x))).
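In a strict metalanguage the same guarding trick can be sketched directly: forcing f x before returning makes the composite undefined wherever f is. Here cnvg is our Python rendering of the idea, not the PCF term itself:

```python
# Guarded application: evaluate f(x) first, as Eq? (f x) 0 does, so the
# composite diverges (or raises) whenever f does, then behave like g.

def cnvg(f, g):
    def guarded(x):
        _ = (f(x) == 0)     # force f x; the comparison result is unused
        return g(x)
    return guarded

half = cnvg(lambda x: x, lambda x: x // 2)
print(half(10))    # -> 5
```

When f is total, cnvg f g is just g; the guard matters only on arguments where f fails to produce a value.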


The generalization to the composition of partial functions f1, ..., fℓ: N^k → N and g: N^ℓ → N is straightforward and left to the reader. The remaining cases of the proof are treated similarly, using cnvg for appropriate k. The reader is encouraged to write out the argument for primitive recursion, and convince him- or herself that no modification is needed for minimization.

This theorem has some important corollaries. Both of the corollaries below follow from a well-known undecidability property of recursive functions. Specifically, there is no recursive, or algorithmic, method for deciding whether a recursive partial function is defined on some argument. This fact is often referred to as the recursive unsolvability of the Halting Problem. Exercise 2.5.18 describes a direct proof that the PCF halting problem is not solvable in PCF.

Corollary 2.5.15 There is no algorithm for deciding whether a given PCF expression has a normal form.

Proof If there were such an algorithm, then by Theorem 2.5.14, we could decide whether a given partial recursive function was defined on some argument, simply by writing the function applied to its argument in PCF.


Corollary 2.5.16 There is no algorithm for deciding equality between PCF expressions.

Proof If there were such an algorithm, then by Theorem 2.5.14, we could decide whether a given partial recursive function was defined on some argument. Specifically, given a description of a partial function f, we may produce a PCF term M representing the application f(n) by the straightforward algorithm outlined in the proof of Theorem 2.5.14, for any n. Then, using the supposed algorithm to determine whether if Eq? M M then 0 else 0 = 0, we could decide whether f(n) is defined.


Exercise 2.5.17 A primitive recursive function is a total recursive function that is defined without using minimization. The Kleene normal form theorem states that every partial recursive function f: N^k → N may be written in the form

f(n1, ..., nk) = p(the least n with t_k(i, n1, ..., nk, n) = 1)

where p and t_k are primitive recursive functions that are independent of f, and i depends on f. This is called the Kleene normal form of f, after Stephen C. Kleene, and the number i is called the index of partial recursive function f. The Kleene normal form theorem may be found in standard books on computability or recursion theory, such as [Rog67].

(a) Use the Kleene normal form theorem and Theorem 2.5.12 to show that every partial recursive function is definable in PCF.


(b) Using the results of Exercise 2.5.13, show that every partial function computed by a Turing machine is definable in PCF.

Exercise 2.5.18 It is easy to show the halting problem for PCF is not solvable in PCF. Specifically, the halting function on type σ, H_σ, is a function which, when applied to any PCF term M: σ, returns true if M has a normal form ("halts"), and false otherwise. In this exercise, you are asked to show that there is no PCF term H_bool defining the halting function on type bool, using a variant of the diagonalization argument from recursive function theory. Since the proof proceeds by contradiction, assume there is a PCF term H_bool: bool → bool defining the halting function.

(a) Show that using H_bool, we may write a PCF term G: bool → bool with the property that for any PCF term M: bool, the application GM has a normal form if M does not.

(b) Derive a contradiction by considering whether or not fix G has a normal form.

2.5.6 Nondefinability of Parallel Operations

We have seen in Sections 2.5.4 and 2.5.5 that the PCF-definable functions on numerals correspond exactly to the recursive functions and (in the exercises) the numeric functions computable by Turing machines. By Church's thesis, as discussed in Section 2.5.4, this suggests that all mechanically computable functions on the natural numbers are definable in PCF. However, as we shall see in this section, there are operations that are computable in a reasonable way, but not definable in PCF. The main theorem and the basic idea of Lemma 2.5.24 are due to Plotkin [Plo77]. A semantic proof of Theorem 2.5.19 appears as Example 8.6.3, using logical relations and the denotational semantics of PCF. The main example we consider in this section is called parallel-or. Before discussing this function, it is worth remembering that for a PCF term to represent a k-ary function on the natural numbers, we only consider the result of applying the PCF term to a tuple of numerals. It is not important how the PCF term behaves when applied to expressions that are not in normal form. The difference between the positive results on representing numeric functions in Sections 2.5.4 and 2.5.5 and the negative result in this section is that we now take into account the way a function evaluates its arguments. It is easiest to describe the parallel-or function algorithmically. Suppose we are given closed terms M, N: bool and wish to compute the logical disjunction, M ∨ N. We know that either M →→ true, M →→ false, or M has no normal form, and similarly for N. One way to compute M ∨ N is to first reduce both M and N to normal form, then return true if either is true and false otherwise. This algorithm computes what is called the sequential-or of M and N; it will only terminate if both M and N have a normal form. For parallel-or, we wish to return true if either M or N reduces to true, regardless of

The Language PCF

114

Table 2.6 Evaluation contexts for lazy PCF reduction.

EV[ ] ::= [ ]
      | EV[ ] + M  |  n + EV[ ]  for numeral n
      | Eq? EV[ ] M  |  Eq? n EV[ ]  for numeral n
      | if EV[ ] then N else P
      | Proj1 EV[ ]  |  Proj2 EV[ ]
      | EV[ ] M

whether the other term has a normal form. It is easy to see how to compute parallel-or in a computational setting with explicit parallelism. We begin reducing M and N in parallel. If either terminates with value true, we abort the other computation and return true. Otherwise we continue to reduce both, hoping to produce two normal forms. If both reduce to false, then we return false. To summarize, the parallel-or of M and N is true if either M or N reduces to true, false if both reduce to false, and nonterminating otherwise. The rest of this section will be devoted to proving that parallel-or is not definable in PCF, as stated in the following theorem.
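The dovetailing algorithm described above can be sketched by interleaving one reduction step of each argument at a time. In the following Python sketch, a "term" is modelled as a generator that yields None at each reduction step and finally yields its boolean normal form; all names are ours:

```python
# Parallel-or by dovetailing: step both computations in turn, return True
# as soon as either yields True, False once both have yielded False.

def por(gen_m, gen_n):
    done = {}
    gens = {'M': gen_m, 'N': gen_n}
    while gens:
        for name, g in list(gens.items()):
            try:
                v = next(g)
            except StopIteration:
                continue
            if v is True:          # either side true: abort the other
                return True
            if v is False:
                done[name] = False
                del gens[name]
        if len(done) == 2:         # both reduced to false
            return False
    return False

def converges(value, steps):
    for _ in range(steps):
        yield None                 # still reducing
    yield value                    # normal form reached

def diverges():
    while True:
        yield None                 # never produces a normal form

print(por(converges(True, 3), diverges()))             # -> True
print(por(converges(False, 2), converges(False, 5)))   # -> False
```

As the specification demands, por(diverges(), diverges()) runs forever; the interleaving only converts divergence of one argument into a defined answer when the other argument settles the result.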

Theorem 2.5.19 There is no PCF expression POR with the following behavior:

POR M N →→ true              if M →→ true or N →→ true
POR M N →→ false             if M →→ false and N →→ false
POR M N has no normal form   otherwise

for all closed boolean expressions M and N. We will prove Theorem 2.5.19 by analyzing the operational semantics of PCF. This gives us an opportunity to develop some general tools for operational reasoning. The first important decision is to choose a deterministic reduction strategy instead of reasoning about arbitrary reduction order. Since we will only be interested in reduction on terms of type nat or bool, we can use lazy reduction, defined in Section 2.4.3, instead of the more complicated left-most reduction strategy. A convenient way to analyze the reduction of subterms is to work with contexts. To identify a critical subterm, we define a special form of context called an evaluation context. While the analysis of parallel-or only requires evaluation contexts for boolean terms, we give the general definition for use in the exercises and in Section 5.4.2. The evaluation contexts of PCF are defined in Table 2.6. An example appears below, after two basic lemmas. To make the outline of the proof of Theorem 2.5.19 as clear as possible, we postpone the proofs of the lemmas to the end of

the section. There are two main properties of evaluation contexts. The first is that if M →lazy N, then M matches the form of some evaluation context and N is obtained by reducing the indicated subterm.


Lemma 2.5.20 If M →lazy N, then there is a unique evaluation context EV[ ] such that M ≡ EV[M'], the reduction M' → N' is an instance of one of the PCF reduction axioms, and N ≡ EV[N'].

Since we can determine the unique evaluation context mentioned in the lemma by pattern matching, evaluation contexts provide a complete characterization of lazy reduction. The second basic property of evaluation contexts is that the lazy reduction of EV[M] is the lazy reduction of M, when M has one. This is stated more precisely in the following lemma.

Lemma 2.5.21 If M →lazy M' then, for any evaluation context EV[ ], we have EV[M] →lazy EV[M'].

A subtle point is that when M is a lazy normal form, the lazy reduction of EV[M] may be a reduction that does not involve M.

Example 2.5.22 The left-most reduction of the term ((λx: nat. λy: nat. x + y) 7) 5 + (λx: nat. x) 3 is given in Example 2.4.11. As observed in Example 2.4.14, this is also the lazy reduction of this term. We illustrate the use of evaluation contexts by writing the evaluation context for each term in the reduction sequence. Since we underline the active redex, the evaluation context is exactly the part of the term that is not underlined.

((λx: nat. λy: nat. x + y) 7) 5 + (λx: nat. x) 3      ([ ] 5) + (λx: nat. x) 3
(λy: nat. 7 + y) 5 + (λx: nat. x) 3                   [ ] + (λx: nat. x) 3
(7 + 5) + (λx: nat. x) 3                              [ ] + (λx: nat. x) 3
12 + (λx: nat. x) 3                                   12 + [ ]
12 + 3                                                [ ]
15

If we look at the second evaluation context, EV[ ] ≡ [ ] + (λx: nat. x) 3, then it is easy to see that when M →lazy M', the lazy reduction of EV[M] will be EV[M'], as guaranteed by Lemma 2.5.21. However, if we insert a lazy normal form such as 2 into the context, then the resulting term, 2 + (λx: nat. x) 3, will have its lazy reduction completely to the right of the position marked by the placeholder [ ] in the context. Another case arises with the first context, EV[ ] ≡ ([ ] 5) + (λx: nat. x) 3. If we insert the lazy normal form M ≡ λy: nat. 7 + y, we obtain a term EV[M] whose lazy reduction involves a larger subterm than M. We note here that left-most reduction can also be characterized using evaluation contexts. To do so, we add six more forms to the definition of evaluation context, corresponding to the six rules that appear in Table 2.3 but not in Table 2.4. Lemma 2.5.20 extends


easily to left-most reduction, but Lemma 2.5.21 requires the additional hypothesis that M does not have the form λx: σ. M1 or ⟨M1, M2⟩. We use Lemma 2.5.21 in the analysis of parallel-or by showing that if the normal form of a compound term depends on the subterms M1, ..., Mk, as we expect the parallel-or of M1 and M2 to depend on M1 and M2, then we may reduce the term to the form EV[Mi], for some i independent of the form of M1, ..., Mk. The intuition behind this is that some number of reduction steps may be independent of the chosen subterms. But if any of these subterms has any effect on the result, then eventually we must come to some term where the next reduction depends on the form of some Mi. The "sequential" nature of PCF is that when we reach a term of the form EV[Mi], it follows from Lemma 2.5.21 that if Mi is not a lazy normal form, we continue to reduce Mi until this subterm does not lazy-reduce further. Since we are interested in analyzing the reduction steps applied to a function of several arguments, we will use contexts with more than one "hole" (place for inserting a term). A context C[·, ..., ·] with k holes is a syntactic expression with exactly the same form as a term, but containing zero or more occurrences of the placeholders [ ]1, ..., [ ]k, each assumed to have a fixed type within this expression. As for contexts with a single hole, we write C[M1, ..., Mk] for the result of replacing the ith placeholder [ ]i by Mi, without renaming bound variables. Our first lemma about reduction in arbitrary contexts is that the lazy reduction of a term of the form C[M1, ..., Mk] is either a reduction on C, independent of the terms M1, ..., Mk placed in the context, or a reduction that depends on the form of one of the terms M1, ..., Mk.

Lemma 2.5.23 Let C[·, ..., ·] be a context with k holes. Suppose that there exist closed terms N1, ..., Nk of the appropriate types such that C[N1, ..., Nk] is not in lazy normal form. Then C must have one of the following two properties:

(i) There is a context C'[·, ..., ·] such that for all closed terms M1, ..., Mk of the appropriate types, C[M1, ..., Mk] →lazy C'[M1, ..., Mk].

(ii) There is some i such that for all closed terms M1, ..., Mk of the appropriate types, there is an evaluation context EV[ ] with C[M1, ..., Mk] ≡ EV[Mi].

We use Lemma 2.5.23 to prove the following statement about sequences of lazy reductions.

Lemma 2.5.24 Let C[·, ..., ·] be a context with k holes, let M1, ..., Mk be closed terms of the appropriate types for C, and suppose C[M1, ..., Mk] is a program with normal form N. Then either C[M1', ..., Mk'] reduces to N for all closed M1', ..., Mk' of the appropriate types, or there is some integer i such that for all closed M1', ..., Mk' of the appropriate types there is an evaluation context EV[ ] with C[M1', ..., Mk'] →→lazy EV[Mi'].

An intuitive reading of this lemma is that some number of reduction steps of a term C[M1, ..., Mk] may be independent of all the Mi's. If this does not produce a normal form, then eventually the reduction of C[M1, ..., Mk] will reach a step that depends on the form of some Mi, with the number i depending only on C. If we replace lazy reduction with left-most reduction, then we may change lazy normal form to normal form in Lemma 2.5.23 and drop the assumption that C[M1, ..., Mk] is a program in Lemma 2.5.24. Using Lemmas 2.5.21 and 2.5.24, we now prove Theorem 2.5.19.

Proof of Theorem 2.5.19 Suppose we have a PCF expression POR defining parallel-or and consider the context C[·, ·] ≡ POR [ ]1 [ ]2. Since POR defines parallel-or, C[true, true] →→ true and C[false, false] →→ false. By Lemma 2.5.24, there are two possibilities for the lazy reduction of C[true, true] to true. The first implies that C[M1, M2] →→ true for all closed boolean terms M1 and M2. But this contradicts C[false, false] →→ false. Therefore, there is an integer i ∈ {1, 2} such that for all closed boolean terms M1, M2, there is an evaluation context EV[ ] with C[M1, M2] →→lazy EV[Mi]. But, by Lemma 2.5.21, this implies that if we apply POR to true and the term fix_bool (λx: bool. x) with no normal form, making fix_bool (λx: bool. x) the ith argument, we cannot reduce the resulting term to true. This contradicts the assumption that POR defines parallel-or.

The reader may reasonably ask whether the non-definability of parallel-or is a peculiarity of PCF or a basic property shared by sequential programming languages such as Pascal and Lisp. The main complicating factor in comparing PCF to these languages is the order of evaluation. Since a Pascal function call P(M, N) is compiled so that expressions M and N are evaluated before P is called, it is clear that no boolean function P could compute parallel-or. However, we can ask whether parallel-or is definable by a Pascal (or Lisp) context with two boolean positions. More precisely, consider a Pascal program context P[·, ·] with two "holes" for boolean expressions (or bodies of functions that return boolean values). An important part of the execution of P[M, N] is what happens when M or N may run forever. Therefore, it is sensible and nontrivial to ask whether parallel-or is definable by a Pascal or Lisp context. If we were to go to the effort of formalizing the execution of Pascal programs in the way that we have formalized PCF evaluation, it seems very likely that we could prove that parallel-or is not definable by any context. The reader may enjoy trying to demonstrate otherwise by constructing a Pascal context that defines parallel-or and seeing what obstacles are involved. In Lisp, it is possible to define parallel-or by defining your

The Language PCF

118

own eval function. However, the apparent need to define a non-standard the sequentiality of standard Lisp evaluation.

eval illustrates
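The obstacle can be made concrete outside PCF. The Python sketch below is ours, not from the text (the names por, const and loop are illustrative): closed boolean terms are modeled as step-wise computations (generators), and a scheduler that interleaves one step of each argument implements parallel-or. It is exactly this interleaving that a strategy forced to evaluate one argument to completion first, as in Pascal or standard Lisp, cannot express.

```python
def const(b):
    """A computation that immediately produces the boolean b."""
    return b
    yield  # never reached; its presence makes this a generator function

def loop():
    """A computation with no normal form: it never produces a value."""
    while True:
        yield

def por(m, n, max_steps=1000):
    """Parallel-or: run m and n one step at a time, in alternation.
    Returns True as soon as either argument yields True; returns False
    only when both have yielded False."""
    pending, results = {"m": m, "n": n}, {}
    for _ in range(max_steps):
        for name, gen in list(pending.items()):
            try:
                next(gen)                      # one evaluation step
            except StopIteration as done:
                del pending[name]
                results[name] = done.value
                if done.value is True:
                    return True                # one true argument suffices
        if len(results) == 2:
            return results["m"] or results["n"]
    raise TimeoutError("neither argument converged")
```

Note that por(const(True), loop()) terminates even though one argument runs forever, which is precisely the behavior the lemmas above show no single PCF term can have.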

Proof of Lemma 2.5.20 We use induction on the proof, in the system of Table 2.4, that M → N. The base case is that M → N is one of the reduction axioms (these are listed in Table 2.2). If this is so, then the lemma holds with EV[ ] ≡ [ ]. The induction steps for the inference rules in Table 2.4 are all similar and essentially straightforward. For example, if we have P + R → Q + R by the rule

P → Q
--------------------
P + R → Q + R

then by the induction hypothesis there is a unique evaluation context EV'[ ] such that P ≡ EV'[P'], P' → Q' is a reduction axiom, and Q ≡ EV'[Q']. The lemma follows by taking EV[ ] ≡ EV'[ ] + R. ∎
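The unique-decomposition property of Lemma 2.5.20 can be illustrated for a toy fragment with only numerals and addition. The code below is our own illustration, not part of the text: decompose splits a non-normal term into the unique evaluation context and the redex it focuses on, following the lazy rules of Table 2.4 (reduce the left operand of + until it is a numeral, then the right), and step performs one lazy reduction.

```python
def decompose(t):
    """Split a term into (context, redex), or return None for a normal form.
    Terms are ints (numerals) or triples ("+", a, b)."""
    if isinstance(t, int):
        return None                           # numerals are normal forms
    _, a, b = t
    if isinstance(a, int) and isinstance(b, int):
        return (lambda h: h), t               # the redex is the whole term
    if not isinstance(a, int):                # left operand is reduced first
        ctx, r = decompose(a)
        return (lambda h: ("+", ctx(h), b)), r
    ctx, r = decompose(b)                     # left is a numeral: reduce right
    return (lambda h: ("+", a, ctx(h))), r

def step(t):
    """One lazy reduction step: contract the redex chosen by decompose."""
    d = decompose(t)
    if d is None:
        return t
    ctx, (_, x, y) = d
    return ctx(x + y)
```

For example, ("+", ("+", 1, 2), ("+", 3, 4)) steps to ("+", 3, ("+", 3, 4)): the decomposition, and hence the next redex, is uniquely determined.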

Proof of Lemma 2.5.21 We prove the lemma by induction on the structure of evaluation contexts. In the base case, we have the evaluation context EV[ ] ≡ [ ]. Since EV[M] ≡ M and EV[M'] ≡ M', it is easy to see that under the hypotheses of the lemma, EV[M] → EV[M']. There are a number of induction cases, corresponding to the possible syntactic forms of EV[ ]. Since the analysis of each of these is similar, we consider three representative cases.

The case EV[ ] ≡ EV'[ ] + M1 is similar to most of the compound evaluation contexts. In this case, EV[M] ≡ EV'[M] + M1 and, by the induction hypothesis, EV'[M] → EV'[M']. Since EV'[M] is not syntactically a numeral, no axiom can apply to the entire term EV'[M] + M1. Therefore the only rule in Table 2.4 that applies is

EV'[M] → EV'[M']
------------------------------
EV'[M] + M1 → EV'[M'] + M1

This gives us EV[M] → EV[M']. The two other cases we will consider are projection and function application. A context EV[ ] ≡ Proj_i EV'[ ] is an evaluation context only if EV'[ ] does not have the syntactic form of a pair. Since M → M' implies that M is not a pair, we know that EV'[M] does not have the syntactic form of a pair. It follows that no reduction axiom applies to the entire term Proj_i EV'[M]. Therefore, by the induction hypothesis that EV'[M] → EV'[M'], we conclude that EV[M] → EV[M']. The last case we consider is EV[ ] ≡ EV'[ ] N. This is similar to the projection case. The context EV'[ ] cannot be a lambda abstraction, by the definition of evaluation contexts, and M is not a lambda abstraction since M → M'. It follows that EV'[M] does not have the syntactic form of a lambda abstraction. Since the (β) reduction axiom does not apply to the entire term EV'[M] N, we use the induction hypothesis EV'[M] → EV'[M'] and the definition of lazy reduction in Table 2.4 to conclude that EV[M] → EV[M']. ∎

Proof of Lemma 2.5.23 We prove the lemma by induction on the structure of contexts. If C[·, …, ·] is a single symbol, then it is either one of the placeholders [ ]_i, a variable, or a constant. It is easy to see that C has property (ii) if it is a placeholder and property (i) if it is some other symbol.

For a context C1[·, …, ·] + C2[·, …, ·], we consider two cases, each dividing into two subcases when we apply the induction hypothesis. If there exist terms M1, …, Mk with C1[M1, …, Mk] not in lazy normal form, then we apply the induction hypothesis to C1. If C1[M1, …, Mk] → C1'[M1, …, Mk] for all M1, …, Mk, then it is easy to see from the definition of lazy reduction that

C1[M1, …, Mk] + C2[M1, …, Mk] → C1'[M1, …, Mk] + C2[M1, …, Mk]

for all M1, …, Mk. On the other hand, if for every M1, …, Mk we can write C1[M1, …, Mk] ≡ EV[Mi], then

C1[M1, …, Mk] + C2[M1, …, Mk] ≡ EV'[Mi],   where EV'[ ] ≡ EV[ ] + C2[M1, …, Mk],

and we have an evaluation context for Mi. The second case is that C1[M1, …, Mk] is a numeral, and therefore M1, …, Mk do not occur in C1[M1, …, Mk]. In this case, we apply the induction hypothesis to C2 and reason as above. The induction steps for Eq? and if … then … else … are similar to the addition case. The induction steps for pairing and lambda abstraction are trivial, since these are lazy normal forms.

For a context Proj_i C[·, …, ·], we must consider the form of C. If C[·, …, ·] ≡ [ ]_j, then Proj_i C[·, …, ·] satisfies condition (ii) of the lemma, since this is an evaluation context. If C[·, …, ·] ≡ ⟨C1[·, …, ·], C2[·, …, ·]⟩, then Proj_i C[·, …, ·] satisfies condition (i). If C[·, …, ·] has some other form, then Proj_i C[M1, …, Mk] cannot be a (proj) redex. Therefore, we apply the induction hypothesis to C and reason as in the earlier cases.

The induction step for an application context C1[·, …, ·] C2[·, …, ·] is similar to the projection case, except that we will need to use the assumption that the terms we place in contexts are closed. Specifically, if C1[·, …, ·] ≡ λx: σ. C3[·, …, ·], then let C' be the context [C2[·, …, ·]/x]C3[·, …, ·]. For all closed M1, …, Mk we have

C1[M1, …, Mk] C2[M1, …, Mk] ≡ (λx: σ. C3[M1, …, Mk]) C2[M1, …, Mk] → C'[M1, …, Mk]

and the context satisfies condition (i) of the lemma. If C1[·, …, ·] ≡ [ ]_i, then, as in the projection case, the application context satisfies condition (ii) of the lemma. Finally, if C1[·, …, ·] is some context that is not of one of these two forms, we reason as in the addition case, applying the induction hypothesis to either C1 or, if C1[M1, …, Mk] is a lazy normal form for all M1, …, Mk, the context C2. This completes the proof. ∎

Proof of Lemma 2.5.24 Let C[·, …, ·] be a context with k holes and let M1, …, Mk be closed terms of the appropriate types such that C[M1, …, Mk] has a normal form N of observable type. Then C[M1, …, Mk] →→^n N, where n indicates the number of reduction steps. We prove the lemma by induction on n.

In the base case, C[M1, …, Mk] ≡ N is in normal form. There are two cases to consider. The degenerate one is that C[M1', …, Mk'] is in normal form for all closed terms M1', …, Mk' of the appropriate types. But since the only results are numerals and boolean constants, the terms M1', …, Mk' must not appear in N and the lemma easily follows. The second case is that C[M1', …, Mk'] is not in normal form for some M1', …, Mk'. Since C[M1, …, Mk] is in normal form, condition (i) of Lemma 2.5.23 cannot hold. Therefore C[M1, …, Mk] must have the form EV[Mi] required by the lemma.

In the induction step, with C[M1, …, Mk] →^(m+1) N, Lemma 2.5.23 gives us two cases. The simpler is that there is some i such that for all closed terms M of the appropriate types, C[…, M, …] has the form EV[M]. But then clearly C[M1, …, Mk] ≡ EV[Mi] and the lemma holds. The remaining case is that there is some context C'[·, …, ·] such that for all M1', …, Mk' of appropriate types, we have a reduction of the form

C[M1', …, Mk'] → C'[M1', …, Mk'],   and in particular   C[M1, …, Mk] → C'[M1, …, Mk] →^m N.

The lemma follows by the induction hypothesis for C'[·, …, ·]. This concludes the proof. ∎

Exercise 2.5.25 Show that Lemma 2.5.24 fails if we drop the assumption that C[M1, …, Mk] is a program.

Exercise 2.5.26 This exercise asks you to prove that parallel-conditional is not definable in PCF. Use Lemmas 2.5.21 and 2.5.24 to show that there is no PCF expression PIF_nat: bool → nat → nat → nat such that for all closed boolean expressions M: bool and N, P, Q: nat with Q in normal form,

PIF_nat M N P →→ Q   iff   M →→ true and N has normal form Q, or
                           M →→ false and P has normal form Q, or
                           N, P both have normal form Q.

(If we extend Lemmas 2.5.21 and 2.5.24 to left-most reduction, the same proof shows that PIF_σ is not definable for any type σ.)

2.5

PCF Programming Examples, Expressive Power and Limitations

121

Exercise 2.5.27 The operational equivalence relation on terms is defined in Section 2.3.5. Show that if M and N are either closed terms of type nat or closed terms of type bool, neither having a normal form, then M and N are operationally equivalent.

Exercise 2.5.28 A plausible statement that is stronger than Lemma 2.5.24 is this:

Let C[·, …, ·] be a context with k holes, let M1, …, Mk be closed terms of the appropriate types for C, and suppose C[M1, …, Mk] →→ N, where N does not reduce further by lazy reduction. Then either C[M1', …, Mk'] reduces to N for all closed M1', …, Mk' of the appropriate types, or there is some integer i such that for all closed M1', …, Mk' of the appropriate types there is an evaluation context EV[ ] with C[M1', …, Mk'] ≡ EV[Mi'].

The only difference is that we do not require N to be in normal form. Show that this statement is false by giving a counterexample. This may be done using a context C[ ] with only one placeholder.

Exercise 2.5.29 [Sto91a] This problem asks you to show that parallel-or, boolean parallel-conditional and natural-number parallel-conditional are all interdefinable. Parallel-or is defined in the statement of Theorem 2.5.19 and parallel-conditional is defined in Exercise 2.5.26. An intermediate step involves parallel-and, defined below.

(a) Show how to define POR from PIF_bool by writing a PCF expression containing PIF_bool.

(b) Show how to define PIF_bool from PIF_nat using an expression of the form

λx: bool. λy: bool. λz: bool. Eq? 1 (PIF_nat x M N).

(c) Parallel-and, PAND, has the following behavior:

PAND M N →→ true              if M →→ true and N →→ true,
PAND M N →→ false             if M →→ false or N →→ false,
PAND M N has no normal form   otherwise.

Show how to define PAND from POR by writing a PCF expression containing POR.

(d) Show how to define PIF_nat from POR using an expression of the form

λx: bool. λy: nat. λz: nat. search P

where P: nat → bool has x, y, z free and contains POR, PAND, and Eq?. The essential properties of search are summarized in Proposition 2.5.3 on page 99.

The Language PCF


2.6 Variations and Extensions of PCF

2.6.1 Summary of Extensions

This section briefly summarizes several important extensions of PCF. All are obtained by adding new types. The first extension is a very simple one, a type unit with only one element. The second extension is sum, or disjoint union, types. With unit and disjoint unions, we can define bool as the disjoint union of unit and unit. This makes the primitive type bool unnecessary. The next extension is recursive type definitions. With recursive type definitions, unit and sum types, we can define nat and its operations, making the primitive type nat also unnecessary. Other commonly-used types that have straightforward recursive definitions are stacks, lists and trees. Another use of recursive types is that we can define the fixed-point operator fix on any type. Thus with unit, sums and recursive types, we may define all of the type and term constants of PCF. The final language is a variant of PCF with lifted types, which give us a different view of nontermination. With lifted types, written in the form σ⊥, we can distinguish between natural-number expressions that necessarily have a normal form and those that may not. Specifically, natural-number terms that do not involve fix may have type nat, while terms that involve fix, and therefore may not have a normal form, have the lifted type nat⊥. Thus the type nat of PCF corresponds to the type nat⊥ in this language. The syntax of many common programs becomes more complicated, since the distinction between lifted and unlifted types forces us to include additional lifting operations in terms. However, the refinement achieved with lifted types has some theoretical advantages. One is that many more equational or reduction axioms become consistent with recursion. Another is that we may study different evaluation orders within a single framework. For this reason, explicit lifting seems useful in meta-languages for studying other programming languages.
All of the extensions summarized here may be combined with polymorphism, treated in Chapter 9, and other typing ideas in Chapters 10 and 11.

2.6.2 Unit and Sum Types

We add the one-element type unit to PCF, or any language based on typed lambda calculus, by adding unit to the type expressions and adding the constant *: unit to the syntax of

terms. The equational axiom for * is

M = * : unit        (unit)

for any term M: unit. Intuitively, this axiom says that every element of type unit is equal to *. This may be used as a reduction rule, read left to right. While unit may not seem very interesting, it is surprisingly useful in combination with sums and other forms of types.


It should be mentioned that the reduction rule for unit terms causes confluence to fail when combined with η-reduction [CD91, LS86]. This does not have an immediate consequence for PCF, since we do not use η-reduction.

Intuitively, the sum type σ + τ is the disjoint union of types σ and τ. The difference between a disjoint union and an ordinary set union is that with a disjoint union, we can tell which type any element comes from. This is particularly noticeable when the two types overlap or are identical. For example, if we take the set union of int and int, we just get int. In contrast, the disjoint union of int and int has two "copies" of int. Informally, we think of the sum type int + int as the collection of "tagged" integers, with each tag indicating whether the integer comes from the left or right int in the sum. The term forms associated with sums are injection and case functions. For any types σ and τ, the injection functions Inleft^{σ,τ} and Inright^{σ,τ} have types

Inleft^{σ,τ} : σ → σ + τ
Inright^{σ,τ} : τ → σ + τ

Intuitively, the injection functions map σ or τ to σ + τ by "tagging" elements. However, since the operations on tags are encapsulated in the injection and case functions, we never say exactly what the tags actually are. The Case function applies one of two functions to an element of a sum type. The choice between functions depends on which type of the sum the element comes from. For all types σ, τ and ρ, the case function Case^{σ,τ,ρ} has type

Case^{σ,τ,ρ} : (σ + τ) → (σ → ρ) → (τ → ρ) → ρ

Intuitively, Case^{σ,τ,ρ} x f g inspects the tag on x and applies f if x is from σ and g if x is from τ. The main equational axioms are

Case^{σ,τ,ρ} (Inleft^{σ,τ} x) f g = f x        (case)1
Case^{σ,τ,ρ} (Inright^{σ,τ} x) f g = g x       (case)2

Both of these give us reduction axioms, read left to right. There is also an extensionality axiom for sum types,

Case^{σ,τ,ρ} x (f ∘ Inleft^{σ,τ}) (f ∘ Inright^{σ,τ}) = f x,        (case)3

where f: (σ + τ) → ρ. Some consequences of this axiom are given in Exercise 2.6.3. Since the extensionality axiom leads to inconsistency when combined with fix, we do not include this equational axiom in the extension of PCF with sum types (see Exercise 2.6.4). We will drop the type superscripts from injection and case functions when the type is either irrelevant or determined by context.


As an illustration of unit and sum types, we will show how bool may be eliminated if unit and sum types are added to PCF. We define bool by

bool ≡ unit + unit

and boolean values true and false by

true ≡ Inleft *
false ≡ Inright *

The remaining basic boolean expression form is the conditional, if … then … else …. We may consider this syntactic sugar for Case, as follows:

if M then N else P ≡ Case M (K N) (K P)

where N, P: ρ and K is the lambda term λx: ρ. λy: unit. x that produces a constant function. To show that this works, we must check the two equational axioms for conditional:

if true then M else N = M,
if false then M else N = N.

The reader may easily verify that when we eliminate syntactic sugar, the first equational axiom for Case yields if true then M else N = (K M) * = M, and similarly for the second equation. Other uses of sum types include variants and enumeration types, considered in Exercises 2.6.1 and 2.6.2.
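The tagging intuition can be sketched in a few lines of Python (a deliberately untyped model of our own; the names inleft, case and cond are illustrative, not part of PCF): a pair carries the tag the text describes, case dispatches on it, and the conditional is literally Case applied to constant functions.

```python
def inleft(x):  return ("left", x)       # Inleft tags its argument "left"
def inright(y): return ("right", y)      # Inright tags its argument "right"

def case(s, f, g):
    """Case s f g: apply f or g according to the tag on s."""
    tag, v = s
    return f(v) if tag == "left" else g(v)

UNIT = None                              # models *, the sole element of unit
true, false = inleft(UNIT), inright(UNIT)   # bool as unit + unit

def cond(m, n, p):
    """if m then n else p, desugared to Case with constant functions."""
    return case(m, lambda _: n, lambda _: p)
```

For instance, cond(true, 1, 2) dispatches on the "left" tag and returns 1, mirroring the first conditional axiom.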

Exercise 2.6.1 A variant type is a form of labeled sum, bearing the same relationship to sum types as records do to product types. A variant type defining a sum of types σ1, …, σk is written [L1: σ1, …, Lk: σk], where L1, …, Lk are distinct syntactic labels. As for records, the order in which we write the label/type pairs does not matter. For example, [A: int, B: bool] is the same type as [B: bool, A: int]. Intuitively, an element of the variant type [L1: σ1, …, Lk: σk] is an element of σi for some 1 ≤ i ≤ k, tagged with the label Li.

2.6.3 Recursive Types

With recursive types, the natural numbers may be defined by nat ≡ μt. unit + t, with functions up: unit + nat → nat and dn: nat → unit + nat witnessing the two directions of this isomorphism. The numeral ⌈0⌉ is up (Inleft *). For n > 0, we may similarly define the numeral ⌈n⌉ by

⌈n⌉ ≡ up (Inright (up (Inright … (up (Inleft *)) …)))

with n occurrences of Inright and n + 1 applications of up. The successor function just applies up and Inright:

succ ≡ λx: nat. up(Inright x)

We can check that this has the right type by noting that if x: nat, then Inright x belongs to unit + nat. Therefore, up(Inright x): nat. The reader may verify that succ ⌈n⌉ →→ ⌈n + 1⌉ for any natural number n. The zero-test operation works by taking x: nat and applying dn to obtain an element of unit + nat, then using Case to see whether the result is an element of unit (i.e., equal to 0) or not:

zero? ≡ λx: nat. Case (dn x) (λy: unit. true) (λz: nat. false)


The reader may easily check that zero? ⌈0⌉ = true and zero? ⌈n⌉ = false for n > 0. The final operation we need is predecessor. Recall that although there is no predecessor of 0, it is conventional to take pred 0 = 0. The predecessor function is similar to zero?, since it works by taking x: nat and applying dn, then using Case to see whether the result is an element of unit (i.e., equal to 0) or not. If so, then the predecessor is 0; otherwise, the element of nat already obtained by applying dn is the predecessor.

pred ≡ λx: nat. Case (dn x) (λy: unit. 0) (λz: nat. z)
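The whole encoding can be mirrored in Python. This is an untyped sketch of our own (the small sum-type helpers are repeated so it is self-contained), so the isomorphism up/dn degenerates to the identity function:

```python
def inleft(x):  return ("left", x)
def inright(y): return ("right", y)
def case(s, f, g):
    tag, v = s
    return f(v) if tag == "left" else g(v)

up = dn = lambda x: x                    # nat ≅ unit + nat, trivially here

zero = up(inleft(None))                  # ⌈0⌉ ≡ up(Inleft *)
succ = lambda x: up(inright(x))          # succ ≡ λx. up(Inright x)

def numeral(n):
    """⌈n⌉: n nested Inrights around Inleft *."""
    m = zero
    for _ in range(n):
        m = succ(m)
    return m

is_zero = lambda x: case(dn(x), lambda _: True, lambda _: False)
pred    = lambda x: case(dn(x), lambda _: zero, lambda z: z)
```

As in the text, pred(succ(m)) returns m for any numeral m, and pred(zero) is zero.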

The reader may verify that for any x: nat we have pred(succ x) = x. (This is part of Exercise 2.6.5 below.)

A second illustration of the use of recursive types has a very different flavor. While most common uses of recursive types involve data structures, we may also write recursive definitions over any type using recursive types. More specifically, we may define a function fix_σ with the reduction property fix_σ f →→ f(fix_σ f) using only typed lambda calculus with recursive types. The main idea is to use "self-application," which is not possible without recursive types. More specifically, if M is any PCF term, the application M M cannot be well-typed. The reason is simply that if M: τ, then τ does not have enough symbols to have the form τ → τ'. However, with recursive types, (dn M)M may be well-typed, since the type of M could have the form μt. t → τ. We will define fix_σ using a term of the form dn M M. The main property we want for dn M M is that

dn M M →→ λf: σ → σ. f(dn M M f).

This reduction gives us a good clue about the form of M. Specifically, we can let M have the form

M ≡ up(λx: τ. λf: σ → σ. f(dn x x f))

if we can find an appropriate type τ. Since we want dn M M : (σ → σ) → σ, the expression dn x x in the body of M must have type (σ → σ) → σ. Therefore, τ must satisfy the isomorphism

τ ≅ τ → (σ → σ) → σ.

We can easily achieve this by letting τ ≡ μt. t → (σ → σ) → σ. We leave it to the reader to verify that if we define fix_σ ≡ dn M M or, more explicitly,

fix_σ ≡ (λx: τ. λf: σ → σ. f(dn x x f)) (up(λx: τ. λf: σ → σ. f(dn x x f)))

we have a well-typed expression with fix_σ →→ λf: σ → σ. f(fix_σ f). The fixed-point operator we have just derived is a typed version of the fixed-point operator Θ from untyped lambda calculus [Bar84]. As outlined in Exercise 2.6.7 below, any untyped lambda term may be written in a typed way using recursive types.
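In an untyped language the same self-application construction is available directly. The Python sketch below is ours, not from the text: the coercions up and dn vanish, and because Python is eager rather than lazy, the self-application inside the body must be eta-expanded (the inner lambda a: …) to delay it; that delay plays the role PCF's lazy evaluation plays in the derivation above.

```python
# m corresponds to up(λx: τ. λf: σ→σ. f(dn x x f)); fix = dn m m.
m = lambda x: lambda f: f(lambda a: x(x)(f)(a))
fix = m(m)                      # fix(f) unfolds to f(fix(f)), one call at a time

# Using fix to tie a recursive knot without any def-level recursion:
fact = fix(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
```

Here fact is an ordinary function obtained purely as a fixed point of its defining functional.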

Exercise 2.6.5 Using the definitions given in this section, show that we have the following reductions on natural number terms.

(a) For every natural number n, we have succ ⌈n⌉ →→ ⌈n + 1⌉.

(b) zero? ⌈0⌉ →→ true and zero? ⌈n⌉ →→ false for n > 0.

(c) If x is a natural number variable, then pred (succ x) →→ x.

Exercise 2.6.6 We may define the type of lists of natural numbers using the following recursive type:

list ≡ μt. unit + (nat × t)

The intuition behind this definition is that a list is either empty, which we represent using the element *: unit, or may be regarded as a pair consisting of a natural number and a list.

(a) Define the empty list nil: list and the function cons: nat × list → list that adds a natural number to a list.

(b) The function car returns the first element of a list and the function cdr returns the list following the first element. We can define versions of these functions that make sense when applied to the empty list by giving them the types

car : list → unit + nat
cdr : list → unit + list

and establishing the convention that when applied to the empty list, each returns Inleft *. Write terms defining car and cdr functions so that the following equations are provable:

car nil = Inleft *
car (cons x ℓ) = x
cdr nil = Inleft *
cdr (cons x ℓ) = ℓ

for x: nat and ℓ: list.

(c) Using recursion, it is possible to define "infinite" lists of type list. Show how to use fix to define, for any numeral n, a list ℓn such that car ℓn = n and cdr ℓn = ℓn+1.


Exercise 2.6.7 The terms of pure untyped lambda calculus (without constant symbols) are given by the grammar

U ::= x | U U | λx. U

The main equational properties of untyped lambda terms are untyped versions of (α), (β) and (η). We may define untyped lambda terms in typed lambda calculus with recursive types using the type μt. t → t. Intuitively, the reason this type works is that terms of type μt. t → t may be used as functions on this type, and conversely. This is exactly what we need to make sense of the untyped lambda calculus.

(a) Show that if we give every free or bound variable type μt. t → t, we may translate any untyped term into a typed lambda term of type μt. t → t by inserting up, dn, and type designations on lambda abstractions. Verify that untyped (α) follows from typed (α), untyped (β) follows from typed (β) and dn(up M) = M and, finally, untyped (η) follows from typed (η) and up(dn M) = M.

(b) Without recursive types, every typed lambda term without constants has a unique normal form. Show that by translating (λx. xx)(λx. xx) into typed lambda calculus, we can use recursive types to write terms with no normal form.

(c) A well-known term in untyped lambda calculus is the fixed-point operator

Y ≡ λf. (λx. f(xx))(λx. f(xx)).

Unlike Θ, this term is only an equational fixed-point operator, i.e., Yf = f(Yf) but Yf does not reduce to f(Yf). Use a variant of the translation given in part (a) to obtain a typed equational fixed-point operator from Y. Your typed term should have type (σ → σ) → σ. Show that although Yf does not reduce to f(Yf), Yf reduces to a term Mf such that Mf →→ f Mf.

Exercise 2.6.8 The programming language ML has a form of recursive type declaration, called datatype, that combines sums and type recursion. In general, a datatype declaration has the form

datatype t = L1 of σ1 | … | Lk of σk

where the syntactic labels L1, …, Lk are used in the same way as in variant types (labeled sums). Intuitively, this declaration defines a type t that is the sum of σ1, …, σk, with labels L1, …, Lk used as injection functions as in Exercise 2.6.1. If the declared type t occurs in σ1, …, σk, then this is a recursive declaration of type t. Note that the vertical bar | is part of the syntax of ML, not the meta-language we are using to describe ML. A similar exercise on ML datatype declarations, using algebraic specifications instead of recursive types and sums, appears in Chapter 3 (Exercise 3.6.4).

The type of binary trees, with natural numbers at the leaves, may be declared using datatype by

datatype tree = leaf of nat | node of tree × tree.

Using the notation for variants given in Exercise 2.6.1, this declares a type tree satisfying

tree ≅ [leaf: nat, node: tree × tree]

(a) Explain how to regard a datatype declaration as a recursive type expression of the form μt. […] using variants as in Exercise 2.6.1. Illustrate your general method on the type of trees above.

(b) A convenient feature of ML is the way that pattern matching may be used to define functions over declared datatypes. A function over trees may be declared using two "clauses," one for each form of tree. In our variant of ML syntax, a function over trees defined by pattern matching will have the form

letrec fun f(leaf(x: nat)) = M
     |     f(node(t1: tree, t2: tree)) = N
in P

where the variables x: nat and f: tree → σ are bound in M, the variables t1, t2: tree and f are bound in N, and f is bound in the declaration body P. The compiler checks to make sure there is one clause for each constructor of the datatype. For example, we may declare a function that sums up the values of all the leaves as follows.

letrec fun f(leaf(x: nat)) = x
     |     f(node(t1: tree, t2: tree)) = f(t1) + f(t2)

Explain how to regard a function definition with pattern matching as a function with Case and illustrate your general method on the function that sums up the values of all the leaves of a tree.
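The idea behind part (b) can be sketched concretely (in Python rather than ML, with our own names; this is an illustration of the technique, not a full answer to the exercise): the two constructors become tagged values, and the two pattern-matching clauses become the two branches of a case dispatch on the constructor tag.

```python
def leaf(n):      return ("leaf", n)          # leaf of nat
def node(t1, t2): return ("node", (t1, t2))   # node of tree × tree

def case_tree(t, on_leaf, on_node):
    """Case for the variant [leaf: nat, node: tree × tree]."""
    tag, v = t
    return on_leaf(v) if tag == "leaf" else on_node(v)

def sum_leaves(t):
    # letrec fun f(leaf x) = x | f(node(t1, t2)) = f(t1) + f(t2)
    return case_tree(t,
                     lambda x: x,
                     lambda p: sum_leaves(p[0]) + sum_leaves(p[1]))
```

The clause-per-constructor check done by the ML compiler corresponds to case_tree demanding one function argument per tag.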

2.6.4 Lifted Types

Lifted types may be used to distinguish possibly nonterminating computations from ones that are guaranteed to terminate. We can see how lifted types could have been used in PCF by recalling that every term that does not contain fix has a normal form. If we combine basic boolean and natural number expressions, pairs and functions, we therefore have a language in which every term has a normal form. Let us call this language PCF0. In Section 2.2.5, we completed the definition of PCF by adding fix to PCF0. In the process, we extended the set of terms of each type in a nontrivial way. For example, while every PCF0 term of type nat reduces to one of the numerals 0, 1, 2, …, PCF contains terms such as fix(λx: nat. x) of type nat that do not have any normal form and are not provably equal to any numeral. This means that if we want to associate a mathematical value with every PCF term, we cannot think of the values of type nat as simply the natural numbers. We must consider some superset of the natural numbers that contains values for all the terms without normal form. (Since it is very reasonable to give all nat terms without normal form the same value, we only need one new value, representing nontermination, in addition to the usual natural numbers.)

An alternative way of extending PCF0 with a fixed-point operator is to use lifted types. Although lifted types are sometimes syntactically cumbersome when it comes to writing common functions or programs, lifted types have some theoretical advantages, since they let us clearly identify the sources of nontermination, or "undefinedness," in expressions. One reason to use lifted types is that many equational axioms that are inconsistent with fix may be safely combined with lifted types. One example, which is related to the inconsistency of sum types and fix outlined in Exercise 2.6.4, is the pair of equations

Eq? x x = true,
Eq? x (n + x) = false        (n a numeral different from 0).

While these equations make good sense for the ordinary natural numbers, it is inconsistent to add them to PCF. To see this, the reader may substitute fix(λx: nat. 1 + x) for x in both equations and derive true = false. In contrast, it follows from the semantic construction in Section 5.2 that we may consistently add these two equations to the variant of PCF with lifted types (see Example 2.6.10). Another important reason to consider lifted types is that this provides an interesting and insightful way to incorporate alternative reduction orders into a single system. In particular, as we shall see below, both ordinary (nondeterministic or leftmost) PCF and Eager PCF (defined in Section 2.4.5) can be expressed in PCF with lifted types. This has an interesting consequence for equational reasoning about eager reduction. The equational theory of Eager PCF is not given in Section 2.4.5, since it is relatively subtle and has a different form from the equational theory of PCF. An advantage of lifted types is that we may essentially express eager reduction, and at the same time allow ordinary equational reasoning. In particular, equational axioms may be used to "optimize" a term without changing its normal form. This follows from confluence of PCF with lifted types [How92]. These advantages notwithstanding, the use of lifted types in programming-language analysis is a relatively recent development and some syntactic and semantic issues remain unresolved. Consequently, this section will be a high-level and somewhat informal overview.


The types of the language PCF⊥, called "PCF with lifted types," are given by the grammar for the types of PCF, plus the form σ⊥, which is read "σ lifted." Intuitively, the elements of σ⊥ are the same as the elements of σ, plus the possibility of "nontermination" or "divergence." Mathematically, we can interpret σ⊥ as the union of σ and one extra element, written ⊥σ, which serves as the value of any "divergent" term that is not equal to an element of type σ.

The terms of PCF⊥ include all of the general forms of PCF, extended to PCF⊥ types. Specifically, we have pairing, projection, lambda abstraction and application for all product and function types. Since we can use if … then … else … on all types in PCF, we extend this to all PCF⊥ types. More specifically, if M then N else P is well-formed whenever M: bool and N, P: σ, for any σ. However, the basic natural number and boolean functions of PCF (numerals, boolean constants, addition and equality test) keep the same types as in PCF. For example, 5 + x is a nat expression of PCF⊥ if x: nat; it is not an expression of type nat⊥. The equational and reduction axioms for all of these operations are the same as for PCF. The language PCF⊥ also has two operations associated with lifted types and a restricted fixed-point operator. Since both involve a class of types called the "pointed" types, we discuss these before giving the remaining basic operations of the language.

Intuitively, the pointed types of PCF⊥ are the types that contain terms which do not yield any meaningful computational value. More precisely, we can say that a closed term M: σ is noninformative if, for every context C[ ] of observable type, if C[M] has a normal form, then C[N] has the same normal form for every closed N: σ. (An example of a noninformative term is fix(λx: σ. x); see Lemma 2.5.24.) Formally, we say a type σ is pointed if either σ ≡ τ⊥, or σ ≡ σ1 × σ2 and both σ1 and σ2 are pointed, or σ ≡ σ1 → σ2 and σ2 is pointed. These are the types with noninformative terms. The reason τ⊥ has noninformative terms will become apparent from the type of the fixed-point operator. For a product σ1 × σ2 of pointed types, we obtain a noninformative term by combining noninformative terms of types σ1 and σ2. If M: σ2 is noninformative, and x: σ1 is not free in M, then λx: σ1. M is a noninformative term of type σ1 → σ2. The name "pointed," which is not very descriptive, comes from the fact that a pointed type has a distinguished element (or "point") representing the value of any noninformative term. The language PCF⊥ has a fixed-point constant for each pointed type:

fix_σ : (σ → σ) → σ        (σ pointed)

The equational axiom and reduction axiom for fix are as in PCF.


The first term form associated with lifted types is that if M: σ, then

⌊M⌋ : σ⊥        (⌊ ⌋ Intro)

Intuitively, since the elements of σ are included in σ⊥, the term ⌊M⌋ defines an element of σ, considered as an element of σ⊥. The second term form associated with lifting allows us to evaluate expressions of type σ⊥ to obtain expressions of type σ. Since some terms of type σ⊥ do not define elements of σ, this is formalized in a careful way. If M: σ⊥ and N: τ, with τ pointed and N possibly containing a variable x of type σ, then

let ⌊x: σ⌋ = M in N        (⌊ ⌋ Elim)

is a term of type τ. The reader familiar with ML may recognize this syntax as a form of "pattern-matching" let. Intuitively, the intent of let ⌊x: σ⌋ = M in N is that we "hope" that M is equal to ⌊M'⌋ for some M'. If so, then the value of this expression is the value of N when x has the value of M'. The main equational axiom for ⌊ ⌋ and let is

let ⌊x: σ⌋ = ⌊M⌋ in N = [M/x]N        (let ⌊ ⌋)

This is also used as a reduction axiom, read left to right. Intuitively, the value of M: σ⊥ is either ⊥σ or an element of σ. If M = ⊥σ, then let ⌊x: σ⌋ = M in N will have value ⊥τ. In operational terms, if M cannot be reduced to the form ⌊M'⌋, then let ⌊x: σ⌋ = M in N cannot be reduced to a useful form of type τ. If M is equal to some ⌊M'⌋, then the value of let ⌊x: σ⌋ = M in N is equal to [M'/x]N.

If we add a constant ⊥τ representing "undefinedness" or nontermination at each pointed type τ, we can also state some other equational properties:

let ⌊x: σ⌋ = ⊥(σ⊥) in N = ⊥τ

M ⌊x⌋ = N ⌊x⌋
----------------------   (x not free in M, N: σ⊥ → τ)
M = N

Intuitively, the first equation reflects the fact that if we cannot reduce M to the form ⌊M'⌋, then let ⌊x⌋ = M in N is "undefined." The inference rule is a form of reasoning by cases, based on the intuitive idea that any element of a lifted type σ⊥ is either the result of lifting an element of type σ or the value ⊥(σ⊥) representing "undefinedness" or nontermination at type σ. However, we will not use constants of the form ⊥τ or either of these properties in the remainder of this section. In general, they are not especially useful for proving equations between expressions unless we have a more powerful proof system (such as the one using fixed-point induction in Section 5.3) for proving that terms are equal to ⊥.
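One informal operational reading of lifting can be sketched in Python by modeling a value of type σ⊥ as a thunk, a suspended computation (the names lift, let_lift and bottom are ours, and thunks are only an approximation of the intended semantics):

```python
def lift(v):
    """⌊v⌋: a suspended computation that yields v when forced."""
    return lambda: v

def let_lift(m, body):
    """let ⌊x⌋ = m in body(x): force m, then continue with its value."""
    return body(m())

def bottom():
    """⊥: a computation that never yields a value (forcing it diverges)."""
    return bottom()
```

Under this reading the axiom (let ⌊ ⌋) becomes let_lift(lift(v), body) == body(v), while let_lift(bottom, body) never returns, matching the equation that sends ⊥ to ⊥.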

The Language PCF

136

We will illustrate the use of lifted types by translating both PCF and Eager PCF into PCF⊥. A number of interesting relationships between lifted types and sums and recursive types are beyond the scope of this section but may be found in [How92], for example. Before proceeding, we consider some simple examples.
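As a concrete and purely illustrative model of these rules, one can read a term of type σ⊥ as "a value of σ or divergence," with Python's None standing in for ⊥; the names lift and let_lifted below are inventions of this sketch, not notation from the text.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")
T = TypeVar("T")

def lift(v: S) -> Optional[S]:
    """[M]: inject a value of sort sigma into the lifted type."""
    return v

def let_lifted(m: Optional[S], body: Callable[[S], Optional[T]]) -> Optional[T]:
    """let [x] = m in body(x): defined only when m is defined (not None)."""
    return None if m is None else body(m)

# Axiom (let [ ]): let [x] = [M] in N  =  [M/x]N
assert let_lifted(lift(5), lambda x: lift(x + 1)) == 6

# Strictness equation: let [x] = bottom in N  =  bottom
assert let_lifted(None, lambda x: lift(x + 1)) is None
```

Note that None models nontermination only in the weak sense that it propagates through let_lifted; it does not capture the fact that genuine divergence is unobservable.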

Example 2.6.9 While there is only one natural way to write factorial in PCF, there are some choices with explicit lifting. This example shows two essentially equivalent ways of writing factorial. For simplicity, we assume we have natural number subtraction and multiplication operations so that if M, N: nat, then M − N: nat and M ∗ N: nat. Using these, we can write a factorial function as fact1 ≡ fix F1, where

    F1 ≡ λf: nat → nat⊥. λx: nat. if Eq? x 0 then [1] else let [y] = f(x − 1) in [x ∗ y]

In the definition here, the type of fact1 is nat → nat⊥. This is a pointed type, which is important since we only have fixed points at pointed types. This type for fact1 is also the type of the lambda-bound variable f in F1. Since f: nat → nat⊥, we can apply f to arguments of type nat. However, the result f(x − 1) of calling f will be an element of nat⊥, and therefore we must use let to extract the natural number value from the recursive call before multiplying by x. If we wish to apply fact1 to an argument M of type nat⊥, we do so by writing

    let [x] = M in fact1 x

An alternative definition of factorial is fact2 ≡ fix F2, where

    F2 ≡ λf: nat⊥ → nat⊥. λx: nat⊥. let [z] = x in
           if Eq? z 0 then [1] else let [y] = f[z − 1] in [z ∗ y]

Here, the type of factorial is different since the domain of the function is nat⊥ instead of nat. The consequence is that we could apply fact2 to a possibly nonterminating expression. However, since the first thing fact2 does with an argument x: nat⊥ is force it to some value z: nat, there is not much flexibility gained in this definition of factorial. In particular, as illustrated below, both functions calculate 3! in essentially the same way. The reason fact2 immediately forces x: nat⊥ to some natural number z: nat is that the equality test Eq? requires a nat argument, instead of a nat⊥. For the first definition of factorial, we can compute 3! by expanding out three recursive calls and simplifying as follows:

    (fix F1) 3
      →→ if Eq? 3 0 then [1] else let [y] = (fix F1)(3 − 1) in [3 ∗ y]
      →→ let [y] = (let [y′] = (fix F1)(1) in [2 ∗ y′]) in [3 ∗ y]
      →→ let [y] = (let [y′] = (let [y″] = (fix F1) 0 in [1 ∗ y″]) in [2 ∗ y′]) in [3 ∗ y]

Then, once we have reduced (fix F1) 0 →→ [1], we can set y″ to 1, then y′ to 1, then y to 2, then perform the final multiplication and produce the final value [6]. For the alternative definition, we have essentially the same expansion, beginning with [3] instead of 3 for typing reasons:

    (fix F2) [3]
      →→ let [z] = [3] in if Eq? z 0 then [1] else let [y] = (fix F2)[z − 1] in [z ∗ y]
      →→ let [y] = (let [y′] = (let [y″] = (fix F2) [0] in [1 ∗ y″]) in [2 ∗ y′]) in [3 ∗ y]

The only difference here is that in each recursive call, the argument is lifted to match the type of the function. However, the body of F2 immediately uses let to "unlift" the argument, so the net effect is the same as for fact1.
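Both definitions can be sketched as ordinary recursive functions under a None-for-⊥ reading; bind plays the role of let [y] = … in …, and fact2 is written in terms of fact1 for brevity. All names here are hypothetical, not from the text.

```python
from typing import Callable, Optional

def bind(m: Optional[int], body: Callable[[int], Optional[int]]) -> Optional[int]:
    """let [y] = m in body(y), with None modeling the bottom element."""
    return None if m is None else body(m)

def fact1(x: int) -> Optional[int]:
    """F1 : nat -> nat-bottom; the recursive call is unlifted before use."""
    if x == 0:
        return 1                                    # then-branch [1]
    return bind(fact1(x - 1), lambda y: x * y)      # let [y] = f(x-1) in [x*y]

def fact2(x: Optional[int]) -> Optional[int]:
    """F2 : nat-bottom -> nat-bottom; forces its argument first."""
    return bind(x, fact1)                           # let [z] = x in ...

assert fact1(3) == 6
assert fact2(3) == 6          # applying fact2 to the lifted argument [3]
assert fact2(None) is None    # an undefined argument yields an undefined result
```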

Example 2.6.10 As mentioned above, the equations

    Eq? x x = true,        Eq? x (n + x) = false    (n a numeral different from 0)

are consistent with PCF⊥ when x: nat is a natural number variable, but inconsistent when added to PCF. We will see how these equations could be used in PCF⊥ by considering the PCF expression

    letrec f(x: nat): nat = if Eq? x 0 then 1
                            else if Eq? f(x − 1) f(x − 1) then 2 else 3
    in f 3

If we apply left-most reduction to this expression, we end up with an expression of the form

    if Eq? ((fix F)(3 − 1)) ((fix F)(3 − 1)) then 2 else 3

which requires us to simplify two applications of the recursive function in order to determine that f(2) = f(2). We might like to apply some "optimization" which says that whenever we have a subexpression of the form Eq? M M we can simplify this to true. However, the inconsistency of the two equations above shows that this optimization cannot be combined with other reasonable optimizations of this form. We cannot eliminate the need to compute at least one recursive call, but we can eliminate the equality test using lifted types. A natural and routine translation of this expression into PCF⊥ is

    letrec f(x: nat): nat⊥ = if Eq? x 0 then [1]
                             else let [y] = f(x − 1) in if Eq? y y then [2] else [3]
    in f 3

In this form, we have a subexpression of the form Eq? y y where y: nat. With the equations above, we can replace this PCF⊥ expression by an equivalent one, obtaining the provably equivalent expression

    letrec f(x: nat): nat⊥ = if Eq? x 0 then [1]
                             else let [y] = f(x − 1) in [2]
    in f 3

The reason we cannot expect to simplify let [y] = f(x − 1) in [2] to [2] by any local optimization that does not analyze the entire declaration of f is that when f(x − 1) does not terminate (i.e., cannot be reduced to the form [M]), the two are not equivalent.
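The contrast drawn in this example can be replayed in a None-for-⊥ sketch: reflexivity of equality holds for values of sort nat that have already been extracted by let, but fails for possibly-undefined terms of type nat⊥. The function name eq_lifted is illustrative only.

```python
from typing import Optional

def eq_lifted(m: Optional[int], n: Optional[int]) -> Optional[bool]:
    """Eq? on lifted naturals: undefined when either argument is undefined."""
    if m is None or n is None:
        return None
    return m == n

# For a forced value y : nat, Eq? y y is reliably true ...
assert all(eq_lifted(y, y) is True for y in range(5))

# ... but for M : nat-bottom, Eq? M M need not equal true, so the
# "optimization" Eq? M M = true is unsound before M has been forced:
assert eq_lifted(None, None) is None
```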

Translation of PCF into PCF⊥

The translation of PCF into PCF⊥ has two parts. The first is a translation of types. We will map expressions of type σ from PCF to expressions of type σ̄ in PCF⊥, where σ̄ is the result of replacing nat by nat⊥ and bool by bool⊥. More specifically, we may define σ̄ by induction on types: nat translates to nat⊥, bool translates to bool⊥, the translation of σ1 × σ2 is σ̄1 × σ̄2, and the translation of σ1 → σ2 is σ̄1 → σ̄2.

It is easy to check that for any PCF type σ, the type σ̄ is pointed. This gives us a fixed-point operator at each type σ̄. The translation of compound PCF terms into PCF⊥ is essentially straightforward. The most interesting part of the translation lies in the basic natural number and boolean operations. If M: σ in PCF, we must find some suitable term M̄ of type σ̄ in PCF⊥. The translation of a numeral n is [n] and the translations of true and false are similarly [true] and [false]. It is easy to see these have the right types. For each of the operations Eq?, + and conditional, we will write a lambda term of the appropriate PCF⊥ type. Since Eq? compares two natural number terms and produces a boolean, we will write a PCF⊥ term Ēq? of type nat⊥ → nat⊥ → bool⊥. The term for equality test is

    Ēq? = λx: nat⊥. λy: nat⊥. let [x′] = x in let [y′] = y in [Eq? x′ y′]

An intuitive explanation of this function is that Ēq? takes two arguments of type nat⊥, evaluates them to obtain elements of type nat (if possible), and then compares the results for equality. If either argument does not reduce to the form [M] for M: nat, then the corresponding let cannot be reduced. But if both arguments do reduce to the form [M], then we are certain to be able to reduce both to numerals, and hence we will be able to obtain [true] or [false] by the reduction axiom for Eq?. The reader is encouraged to try a few examples in Exercise 2.6.14. The terms +̄ for addition and c̄ond for if … then … else … are similar in spirit to Ēq?:

    +̄ = λx: nat⊥. λy: nat⊥. let [x′] = x in let [y′] = y in [x′ + y′]

    c̄ond = λx: bool⊥. λy: σ̄. λz: σ̄. let [x′] = x in if x′ then y else z

In the translation of conditional, it is important to notice that σ̄ is pointed. As a result, the function body let [x′] = x in if x′ then y else z is well-typed. For pairing and lambda abstraction, we let the translation of ⟨M, N⟩ be ⟨M̄, N̄⟩ and the translation of λx: σ. M be λx: σ̄. M̄, and similarly for application, projections and fixed points. Some examples appear in Exercises 2.6.14 and 2.6.16. The main properties of this translation are

(i) If M: σ in PCF, then M̄: σ̄ in PCF⊥.

(ii) If M, N are syntactically distinct terms of PCF, then M̄, N̄ are syntactically distinct terms of PCF⊥.

(iii) If M → N in PCF, then M̄ →→ N̄ in PCF⊥. Conversely, if M is a normal form in PCF, then M̄ is a normal form in PCF⊥.

(iv) If M = N is provable in PCF, then M̄ = N̄ is provable in PCF⊥.

While there are a number of cases, it is essentially straightforward to verify these properties. It follows from properties (ii) and (iii) that if M, N are distinct normal forms of PCF, then M̄, N̄ are distinct normal forms of PCF⊥.
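The translated operations can be sketched the same way as before: None models ⊥, and the result type of the conditional is assumed to be lifted so that its bottom element is also None. The names plus_bar and cond_bar are inventions of this sketch.

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def plus_bar(x: Optional[int], y: Optional[int]) -> Optional[int]:
    """let [x'] = x in let [y'] = y in [x' + y']: force both, then add."""
    if x is None or y is None:
        return None
    return x + y

def cond_bar(b: Optional[bool], then_: Optional[T], else_: Optional[T]) -> Optional[T]:
    """let [x'] = x in if x' then y else z: only the boolean is forced."""
    if b is None:
        return None          # bottom at the (pointed) result type
    return then_ if b else else_

assert plus_bar(2, 3) == 5
assert plus_bar(None, 3) is None      # an undefined summand poisons the sum
assert cond_bar(True, 1, 2) == 1
assert cond_bar(None, 1, 2) is None   # an undefined test poisons the branch
```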

Translation of Eager PCF into PCF⊥

Like the translation of PCF into PCF⊥, the translation of Eager PCF into PCF⊥ has two parts. We begin with the translation of types. For any type σ of Eager PCF, we define the associated type σ̄. Intuitively, σ̄ is the type of Eager PCF values (in the technical sense of Section 2.4.5) of type σ. Since any Eager PCF term will either reduce to a value, or diverge under eager reduction, an Eager PCF term of type σ will be translated to a PCF⊥ term of type σ̄⊥. The translation is defined by induction on the structure of type expressions: nat translates to nat, bool translates to bool, the translation of σ → τ is σ̄ → τ̄⊥, and the translation of σ × τ is σ̄ × τ̄.

The Eager PCF values of type nat or bool correspond to PCF⊥ terms of type nat or bool. An Eager PCF function value of type σ → τ has the form λx: σ. M, where for any argument value V, the body [V/x]M may either reduce to a value of type τ or diverge under eager reduction. Therefore, in PCF⊥, the values of type σ → τ are functions from σ̄ to τ̄⊥. Finally, the Eager PCF values of type σ × τ are pairs of values, and therefore have type σ̄ × τ̄ in PCF⊥. We translate a term M: σ of Eager PCF with free variables x1: σ1, …, xk: σk to a term M̄: σ̄⊥ of PCF⊥ with free variables x1: σ̄1, …, xk: σ̄k as follows:

    x                    ↦ [x]
    n                    ↦ [n]
    true                 ↦ [true]
    false                ↦ [false]
    M + N                ↦ let [x] = M̄ in let [y] = N̄ in [x + y]
    Eq? M N              ↦ let [x] = M̄ in let [y] = N̄ in [Eq? x y]
    if M then N else P   ↦ let [x] = M̄ in if x then N̄ else P̄
    M N                  ↦ let [f] = M̄ in let [x] = N̄ in f x
    λx: σ. M             ↦ [λx: σ̄. M̄]
    fix (on type σ → τ)  ↦ [fix (λf: D. λg: (σ̄ → τ̄⊥) → (σ̄ → τ̄⊥)⊥.
                                   g(λx: σ̄. let [h] = f g in h x))]

where D abbreviates the pointed type ((σ̄ → τ̄⊥) → (σ̄ → τ̄⊥)⊥) → (σ̄ → τ̄⊥)⊥.

The translation of the fixed-point operator needs some explanation. As discussed in Section 2.4.5, Eager PCF only has fixed-point operators on function types. The translation f̄ix ≡ [fix F] of the fixed-point operator on type σ → τ, with

    F ≡ λf: ((σ̄ → τ̄⊥) → (σ̄ → τ̄⊥)⊥) → (σ̄ → τ̄⊥)⊥.
          λg: (σ̄ → τ̄⊥) → (σ̄ → τ̄⊥)⊥. g(λx: σ̄. let [h] = f g in h x)

is recursively defined: unwinding the fixed point once shows that it satisfies

    fix F = λg: (σ̄ → τ̄⊥) → (σ̄ → τ̄⊥)⊥. g(λx: σ̄. let [h] = fix F g in h x)

This is essentially the Eager PCF property "fix = λg: (…). g(delay[fix g])" from Section 2.4.5, where delay[M] ≡ λx: (…). M x, using the fact that

    let [f] = M in let [y] = [x] in f y  =  let [f] = M in f x

by axiom (let [ ]).

In the next two examples, we discuss the way that eager reduction is preserved by this translation. A general difference between the eager reduction strategy given in Section 2.4.5 and reduction in PCF⊥ is that eager reduction is deterministic. In particular, eager reduction of an application M N specifies reduction of M to a value (e.g., λx: (…). M′) before reducing N to a value. However, from the point of view of accurately representing the termination behavior and values of any expression, the order of evaluation between M and N is not significant. When an application M N of Eager PCF is translated into PCF⊥, it will be necessary to reduce both sides to the form [⋯], intuitively corresponding to the translation of an Eager PCF value, before performing β-reduction. However, PCF⊥ allows either the function M̄ or the argument N̄ to be reduced first.
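The clause for application, written M • N in the next example, can be sketched as a function that forces both sides before applying, with None modeling failure to reach a value. The names are illustrative only.

```python
from typing import Callable, Optional

def eager_app(m: Optional[Callable[[int], Optional[int]]],
              n: Optional[int]) -> Optional[int]:
    """M . N  =  let [f] = M in let [x] = N in f x."""
    if m is None or n is None:       # either side fails to reach a value
        return None
    return m(n)                      # f x already has a lifted result type

# (lam z: nat. z + 1) translates to a lifted function value, modeled here
# as a Python closure returning a (modeled) lifted result:
succ_bar: Callable[[int], Optional[int]] = lambda z: z + 1

assert eager_app(succ_bar, 2) == 3
assert eager_app(None, 2) is None          # divergent function position
assert eager_app(succ_bar, None) is None   # divergent argument position
```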

Example 2.6.11 We may apply this translation to the term considered in Example 2.4.20. Since this will help preserve the structure of the expression, we use the syntactic sugar

    M • N ≡ let [f] = M in let [x] = N in f x

for eager application. For readability, we will also use the simplification

    let [x] = [u] in (let [y] = [v] in [x + y])  =  [u + v]

for any variables or constants u and v, since this consequence of our equational axioms does not change the behavior of the resulting term. Applying the translation to the term

    (fix (λx: nat → nat. λy: nat. y)) ((λz: nat. z + 1) 2)

gives us

    (f̄ix • [λx: nat → nat⊥. [λy: nat. [y]]]) • ([λz: nat. [z + 1]] • [2])

A useful general observation is that

    [M] • [N] →→ M N

but that we cannot simplify a form M′ • N′ unless M′ has the form [M] and N′ has the form [N]. The need to produce "lifted values" (or, more precisely, the lifting of terms that reduce to values) is important in expressing eager reduction with lifted types. We can see that there are two possible reductions of the term above, one involving f̄ix and the other

    ([λz: nat. [z + 1]] • [2]) →→ [2 + 1]

The reader may enjoy leftmost reducing the entire expression and seeing how the steps correspond to the eager reduction steps carried out in Example 2.4.20.

Example 2.6.12 We can see how divergence is preserved by considering the term

    let f(x: nat): nat = 3 in letrec g(x: nat): nat = g(x + 1) in f(g 5)

from Example 2.4.21. After replacing let's by lambda abstraction and application, the translation of this term is

    [λf: nat → nat⊥. [λg: nat → nat⊥. [f] • (g 5)]] • [λx: nat. [3]]
        • (f̄ix • [λg: nat → nat⊥. [λx: nat. g(x + 1)]])

where for simplicity we have used [g] • [5] = g 5 and [g] • [x + 1] = g(x + 1). Recall from Examples 2.4.6 and 2.4.21 that if we apply leftmost reduction to the original term, we obtain 3, while eager reduction does not produce a normal form. We can see how the translation of Eager PCF into PCF⊥ preserves the absence of normal forms by leftmost-reducing the result of this translation. Since the first argument and the main function have the form [⋯], we can apply β-reduction as described in Example 2.6.11. This gives us

    [λg: nat → nat⊥. [λx: nat. [3]] • (g 5)] • (f̄ix • [G])

where G ≡ λg: nat → nat⊥. [λx: nat. g(x + 1)]. Our task is to reduce the argument (f̄ix • [G]) to the form [⋯] before performing the leftmost β-reduction. We do this using the fact that f̄ix ≡ [fix F] and

    fix F →→ λg: (nat → nat⊥) → (nat → nat⊥)⊥. g(λx: nat. let [h] = fix F g in h x)

where

    F ≡ λf: ((nat → nat⊥) → (nat → nat⊥)⊥) → (nat → nat⊥)⊥.
          λg: (nat → nat⊥) → (nat → nat⊥)⊥. g(λx: nat. let [h] = f g in h x)

However, we will see in the process that the application to 5 resulting from β-reduction cannot be reduced to the form [⋯], keeping us from ever reaching the normal form [3] for the entire term. The expansion of fix above gives us

    fix F G →→ G (λx: nat. let [h] = fix F G in h x)
            →→ [λx: nat. ((λx: nat. let [h] = fix F G in h x)(x + 1))]
            →→ [λx: nat. let [h] = fix F G in h(x + 1)]

When the function inside the [ ] is applied to 5, the result must be reduced to a form [⋯] to complete the reduction of the entire term. However, this leads us to again reduce fix F G to this form and apply the result to 6. Since this process may be continued indefinitely, the entire term does not have a normal form. The main properties of the translation from Eager PCF into PCF⊥ are:

(i) If M: σ in Eager PCF, with free variables x1: σ1, …, xk: σk, then M̄: σ̄⊥ in PCF⊥ with free variables x1: σ̄1, …, xk: σ̄k.

(ii) If M → N in Eager PCF, then M̄ →→ N̄ in PCF⊥.

(iii) If M is a closed term of Eager PCF of type nat or bool and c is a numeral or true or false, then M →→ c in Eager PCF iff M̄ →→ [c] in PCF⊥.

(iv) If we let =eager be operational equivalence of Eager PCF terms, defined in the same way as in Section 2.3.5 but using eager evaluation, and let =PCF⊥ be operational equivalence in PCF⊥, then for closed terms M, N we have M =eager N iff M̄ =PCF⊥ N̄.

It is largely straightforward to verify properties (i)–(iii). Property (iv), however, involves construction of fully abstract models satisfying the properties described in Section 5.4.2. A proof has been developed by R. Viswanathan.

Exercise 2.6.13 Suppose that instead of having fix_σ for every pointed σ, we only have fixed-point operators for types of the form τ⊥. Show that it is still possible to write a term with no normal form of each pointed type.

Exercise 2.6.14 Reduce the application of Ēq? to the following pairs of terms. You should continue until you obtain [true] or [false] or it is apparent that one of the let's can never be reduced.

(a) [3 + 2] and [5].

(b) … and [(λx: nat. x + 1) 3].

(c) … and [19].

Exercise 2.6.15 Carry out the reduction of the term f̄ix • [λx: nat → nat⊥. [λy: nat. [y]]] from Example 2.6.11.

Exercise 2.6.16 The fibonacci function may be written in PCF as fib ≡ fix F, where

    F ≡ λf: nat → nat. λx: nat. if (Eq? x 0) or (Eq? x 1) then 1 else f(x − 1) + f(x − 2)

We assume for simplicity that we have natural number subtraction and a boolean or function.

(a) Translate the definition from (ordinary) PCF into PCF⊥ and describe the reduction of fib 3 in approximately the same detail as Example 2.6.9. (Subtraction and or should be translated into PCF⊥ following the pattern given for + and Eq?. You may simplify [M] −̄ [N] to [M − N], or perform other simplifications that are similar to the ones used in Example 2.6.11.)

(b) Same as part (a), but use the translation from Eager PCF into PCF⊥.

Universal Algebra and Algebraic Data Types

3.1 Introduction

The language PCF may be viewed as the sum of three parts: pure typed lambda calculus with function and product types; the natural numbers and booleans; and fixed-point operators. If we replace the natural numbers and booleans by some other basic types, such as characters and strings, we obtain a language with a similar type structure (functions and pairs), but oriented towards computation on different kinds of data. If we wish to make this change to PCF, we must decide on a way to name basic characters and strings in terms, the way we have named natural numbers by 0, 1, 2, …, and choose basic functions for manipulating characters and strings, in place of +, Eq? and if … then … else …. Then, we must write a set of equational axioms that characterize these functions. This would define the terms of the language and give us an axiomatic semantics. Once we have the syntax of expressions and axiomatic semantics, we can proceed to select reduction axioms, write programs, and/or investigate denotational semantics.

An algebraic datatype consists of one or more sets of values, such as natural numbers, booleans, characters or strings, together with a collection of functions on these sets. A fundamental restriction on algebraic datatypes is that none of the functions may have function parameters; this is what "algebraic" means. An expository issue is that in this chapter, we use the standard terminology of universal algebra for the sets associated with an algebra. Apart from maintaining consistency with the literature on algebra, the main reason for doing so is to distinguish algebraic datatypes, which consist of sets and functions, from the sets alone. Therefore, basic "type" symbols such as nat, bool, char and string are called sorts when they are used in algebraic expressions. In algebraic datatype theory, the distinction between type and sort is that a type comes equipped with specific operations, while a sort does not.

This distinction is actually consistent with the terminology of Chapter 2, since function types and product types come equipped with specific operations, namely, lambda abstraction, application, pairing and projection. Universal algebra, also called equational logic, is a general mathematical framework that may be used to define and study algebraic datatypes. In universal algebra, the axiomatic semantics of an algebraic datatype is given by a set of equations between terms. The denotational semantics of an algebraic datatype involves structures called algebras, which consist of a collection of sets, one for each sort, and a collection of functions, one for each function symbol used in terms. The operational semantics of algebraic terms is given by directing algebraic equations. Traditionally, reduction axioms for algebraic terms are called rewrite rules. Some examples of datatypes that may be defined and studied using universal algebra are natural numbers, booleans, lists, finite sets, multisets, stacks, queues, and trees, each with various possible functions.


In this chapter, we will study universal algebra and its use in defining the kind of datatypes that commonly occur in programming. While the presentation of PCF in Chapter 2 centered on axiomatic and operational semantics, this chapter will be primarily concerned with the connection between axiomatic semantics (equations) and denotational semantics (algebras), with a separate section on operational semantics at the end of the chapter. In studying algebra, we will cover several topics that are common to most logical systems. The main topics of the chapter are:

• Algebraic terms and their interpretation in multi-sorted algebras,

• Equational (algebraic) specifications and the equational proof system,

• Soundness and completeness of the equational proof system (equivalence of axiomatic and denotational semantics),

• Homomorphisms and initiality,

• Introduction to the algebraic theory of datatypes,

• Rewrite rules (operational semantics) derived from algebraic specifications.

The first four topics provide a brief introduction to the mathematical system of universal algebra. The subsequent discussion of the algebraic theory of datatypes points out some of the differences between traditional concerns in mathematics and the use of algebraic datatypes in programming. In the final section of this chapter, we consider reduction on algebraic terms. This has two applications: the first is to analyze properties of equational specifications and the second is to model computation on algebraic terms. A pedagogical reason to study universal algebra before proceeding with the denotational semantics of typed lambda calculus is that many technical concepts appear here in a simpler and more accessible form. There are many additional topics in the algebraic theory of datatypes. The most important omitted topics are: hierarchical specifications, parameterized specifications, refinement of one specification into another, and correctness of implementations. While we discuss problems that arise when a function application produces an error, we do not consider some of the more sophisticated approaches to errors such as algebras with partial functions or order-sorted algebras. The reader interested in more detailed investigation may consult [EM85, Wir90].

3.2 Preview of Algebraic Specification

The algebraic approach to data abstraction involves specifying the behavior of each datatype by writing a set of equational axioms. A signature, which is a way of defining the syntax of terms, combined with a set of equational axioms, is called an algebraic specification. If a program using a set of specified datatypes is designed so that the correctness of the program depends only on the algebraic specification, then the primary concern of the datatype implementor is to satisfy the specification. In this way, the specification serves as a contract between the user and the implementor of a datatype; neither needs to worry about additional details of the other's program. This methodology is not specific to equational specifications, but equations are the simplest specification language that is expressive enough to be used for this purpose. The reader interested in specification and software development may wish to consult [BJ82, GHM78, LG86, Mor90, Spi88], for example.

When we view a set of equations as a specification, we may regard an algebra, which consists of a set of values for each sort and a function for each function symbol, as the mathematical abstraction of an implementation. In general, a specification may be satisfied by many different algebras. In program development, it is often useful to identify a "standard" implementation of a specification. One advantage is that it is easier to think about an equational specification if we choose a concrete, "typical" implementation. We can also use a standard implementation to be more specific about which implementations we consider acceptable. In datatype theory and in practical tools using algebraic datatypes, the so-called initial algebra (defined in this chapter) is often taken as the standard implementation. The main reasons to consider initiality are that it gives us a specific implementation that can be realized automatically, and that the initial algebra has some useful properties that cannot be expressed by equational axioms. In particular, there is a useful form of induction associated with initial algebras.

Initial algebras are defined in Section 3.5.2, while their use in datatype theory is discussed in Section 3.6. In addition to initial algebras, so-called final algebras have also been proposed as standard implementations of an algebraic specification [Wan79]. Other computer scientists prefer to consider any "generated" algebra without extra elements. A short discussion of these alternatives appears in Section 3.6.2 and its exercises. For further information, the reader may consult [Wir90] and the references cited there.

One limitation of the algebraic approach to datatypes is that some of the operations we use in programming languages are type specific in some arguments, and type-independent in others. For example, the PCF conditional if … then … else … requires a boolean as its first argument, but the next two arguments may have any type (as long as both have the same type). For this reason, a discussion of the algebra of PCF nat and bool values does not tell the whole story of PCF conditional expressions. Another limitation of the algebraic approach is that certain types (such as function types) have properties that are not expressible by equations alone. (Specifically, we need a quantifier to axiomatize

extensional equality of functions.) For this reason, it does not seem accurate to view all types algebraically. However, it does seem profitable to separate languages like PCF into an algebraic part, comprising basic datatypes such as nat, bool, string, and so on, and a "higher-order" or lambda calculus part, comprising ways of building more complex types from basic ones. Some general references for further reading are [Grä68], a reference book on universal algebra, and [EM85, Wir90], which cover algebraic datatypes and specifications. Some historically important research articles are [Hoa72, Par72], on the relationships between datatypes and programming style, and [LZ74, GHM78, GTW78] on algebraic datatypes and specifications.

3.3 Algebras, Signatures and Terms

3.3.1 Algebras

An algebra consists of one or more sets, called carriers, together with a collection of distinguished elements and first-order (or algebraic) functions over the carriers of the algebra. An example is the algebra

    A_N = ⟨N, 0, 1, +, ∗⟩

with carrier the set N of natural numbers, distinguished elements 0, 1 ∈ N, and functions +, ∗: N × N → N. The difference between a "distinguished" element of a carrier and any other element is that we have a name for each distinguished element in the language of algebraic terms. As an economy measure, we will consider distinguished elements like 0, 1 ∈ N as "zero-ary" functions, or "functions of zero arguments." This is not a deep idea at all, just a convenient way to treat distinguished elements and functions uniformly. An example with more than one carrier is the algebra

    A_pcf = ⟨N, B, 0, 1, …, +, true, false, Eq?, …⟩

where N is the set of natural numbers, B the set of booleans, 0, 1, … are the natural numbers, + is the addition function, and so on. These are the mathematical values that we usually think of when we write basic numeric and boolean expressions of PCF. In studying algebras, we will make the correspondence between the syntactic expressions of PCF and the semantic values of this algebra precise.

3.3.2 Syntax of Algebraic Terms

The syntax of algebraic terms depends on the basic symbols and their types. This information is collected together in what is called a signature. In algebra, as mentioned earlier, it is traditional to call the basic type symbols sorts. A signature Σ = ⟨S, F⟩ consists of

• A set S whose elements are called sorts,

• A collection F of pairs ⟨f, s1 × … × sk → s⟩, with s1, …, sk, s ∈ S and no f occurring in two distinct pairs.

In the definition of signature, a symbol f occurring in F is called a typed function symbol (or function symbol for short), and an expression s1 × … × sk → s a first-order (or algebraic) function type over S. We usually write f: τ instead of ⟨f, τ⟩ ∈ F. A sort is a name for a carrier of an algebra, and a function symbol f: s1 × … × sk → s is a name for a k-ary function. We allow k = 0, so that a "function symbol" f: s may be a name for an element of the carrier for sort s. A symbol f: s is called a constant symbol, and f: s1 × … × sk → s with k ≥ 1 is called a proper function symbol. The restriction that we cannot have f: τ and f: τ′ for distinct τ and τ′ means that each constant or function symbol has a unique sort or type.

Example 3.3.1 A simple signature for writing natural number expressions is Σ_N = ⟨S, F⟩, where S = {nat} contains only one sort and F provides function symbols 0: nat, 1: nat, +: nat × nat → nat and ∗: nat × nat → nat. A convenient syntax for describing signatures is the following tabular format.

    sorts: nat
    fctns: 0, 1: nat
           +, ∗: nat × nat → nat

In this notation, we often write several function symbols of the same type on one line. This saves space and usually makes the list easier to read.
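The tabular format translates directly into data; the dictionary encoding below is one arbitrary way to represent the signature Σ_N, not something fixed by the text.

```python
# A signature: a set of sorts, and for each function symbol the list of
# argument sorts and the result sort (constants have an empty argument list).
sig_nat = {
    "sorts": {"nat"},
    "fctns": {
        "0": ([], "nat"),
        "1": ([], "nat"),
        "+": (["nat", "nat"], "nat"),
        "*": (["nat", "nat"], "nat"),
    },
}

# Every sort mentioned by a function symbol must be a declared sort, and
# each symbol is listed once, so it has a unique type, as the definition requires.
for args, result in sig_nat["fctns"].values():
    assert result in sig_nat["sorts"]
    assert all(s in sig_nat["sorts"] for s in args)
```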

The purpose of a signature is to allow us to write algebraic terms. We assume we have some infinite set V of symbols we call variables, and assume that these are different from all the symbols we will ever consider using as constants, function symbols or sort symbols. Since it is not meaningful to write a variable without specifying its sort, we will always list the sorts of variables. A sort assignment is a finite set

    Γ = {x1: s1, …, xk: sk}

of ordered variable: sort pairs, with no variable given more than one sort. Given a signature Σ = ⟨S, F⟩ and sort assignment Γ using sorts from S, we define the set Terms_s(Σ, Γ) of algebraic terms of sort s over signature Σ and variables Γ as follows:

    x ∈ Terms_s(Σ, Γ) if x: s ∈ Γ,
    f M1 … Mk ∈ Terms_s(Σ, Γ) if f: s1 × … × sk → s and Mi ∈ Terms_si(Σ, Γ) for i = 1, …, k.

In the special case k = 0, the second clause should be interpreted as

    f ∈ Terms_s(Σ, Γ) if f: s.

For example, 0 ∈ Terms_nat(Σ_N, ∅) and +01 is a term in Terms_nat(Σ_N, ∅), since + is a binary numeric function symbol of Σ_N. It is also easy to see that +01 ∈ Terms_nat(Σ_N, Γ) for any sort assignment Γ. To match the syntax of PCF, we may treat 0 + 1 as syntactic sugar for +01. When there is only one sort, it is common to assume that all variables have that sort. In this case, we can simplify sort assignments to lists of variables. We write Var(M) for the set of variables that appear in M. In algebra, the substitution [N/x]M of N for all occurrences of x in M is accomplished by straightforward replacement of each occurrence of x by N, since there are no bound variables in algebraic terms. It is easy to see that if we replace a variable by a term of the right sort, we obtain a well-formed term of the same sort we started with. This is stated more precisely below. In the following lemma, we use the notation Γ, x: s for the sort assignment

    Γ, x: s = Γ ∪ {x: s}

where x does not occur in Γ.

Lemma 3.3.2 If M ∈ Terms_s(Σ, Γ, x: s′) and N ∈ Terms_s′(Σ, Γ), then [N/x]M ∈ Terms_s(Σ, Γ). A related property is given in Exercise 3.3.4.

Proof The proof is by induction on the structure of terms in Terms_s(Σ, Γ, x: s′). Since this is the first inductive proof in the chapter, we will give most of the details. Subsequent inductive proofs will give only the main points. Since every term is either a variable or the application of a function symbol to one or more arguments, induction on algebraic terms has one base case (variables) and one induction step (function symbols). In the base case, we must prove the lemma directly. In the induction step, we assume that the lemma holds for all of the subterms. This assumption is called the inductive hypothesis.

3.3

Algebras, Signatures and Terms

151

For the base case we consider a variable y E Termss(E, F, x: s'). If y is different from x, then [N/xjy is just y. In this case, we must have y: s E F, and hence y E Terms5(E, F). If our term is the variable x itself, then the result [N/xjx of substitution is the term N, which belongs to Termss(E, F) by similar assumption. This proves the base case of the induction. For the induction step, we assume that [N/xjM1 E Termssi(>, F), for < i , F, x: s'), the function symbol must have type sj x ... x 5k s. It follows that f[N/xjMi F), which finishes the induction step and proves the lemma. E . [N/X]Mk 1


Example 3.3.3 We may write natural number and stack expressions using the signature Σ = ⟨S, F⟩ with S = {nat, stack} and F containing the function symbols listed below.

sorts: nat, stack
fctns: 0, 1, 2, . . . : nat
       +, * : nat × nat → nat
       empty : stack
       push : nat × stack → stack
       pop : stack → stack
       top : stack → nat

The terms of this signature include the expression push 2 (push 1 (push 0 empty)), which would name a stack with elements 0, 1 and 2 in the standard interpretation of this signature. Using a stack variable, we can also write an expression push 1 (push 0 s) ∈ Terms_stack(Σ, {s: stack}), which defines the stack obtained by pushing 0 and 1 onto s.
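Membership in Terms_s(Σ, Γ) is decidable by a single recursive pass over a term. The dictionary encoding of the stack signature below is an illustrative assumption of ours, not the book's notation.

```python
# The stack signature of Example 3.3.3: each function symbol maps a tuple of
# argument sorts to a result sort; constants have an empty argument list.
SIG = {
    '0': ((), 'nat'), '1': ((), 'nat'), '2': ((), 'nat'),
    '+': (('nat', 'nat'), 'nat'), '*': (('nat', 'nat'), 'nat'),
    'empty': ((), 'stack'),
    'push':  (('nat', 'stack'), 'stack'),
    'pop':   (('stack',), 'stack'),
    'top':   (('stack',), 'nat'),
}

def sort_of(term, ctx):
    """Return the sort s with term in Terms_s(SIG, ctx), or raise ValueError."""
    if isinstance(term, str):
        if term in ctx:                      # a variable declared in ctx
            return ctx[term]
        if term in SIG and SIG[term][0] == ():
            return SIG[term][1]              # a constant symbol
        raise ValueError(f'unknown symbol {term}')
    f, *args = term
    arg_sorts, result = SIG[f]
    if tuple(sort_of(a, ctx) for a in args) != arg_sorts:
        raise ValueError(f'ill-sorted application of {f}')
    return result

# push 1 (push 0 s) is a stack expression under the assignment {s: stack}.
print(sort_of(('push', '1', ('push', '0', 's')), {'s': 'stack'}))  # → stack
```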


Exercise 3.3.4 Use induction on terms to show that if M ∈ Terms_s(Σ, Γ) and Γ′ is a sort assignment containing Γ, then M ∈ Terms_s(Σ, Γ′).

3.3.3

Algebras and the Interpretation of Terms

Algebras are mathematical structures that provide meaning, or denotational semantics, for algebraic terms. To make the mapping from symbols to algebras explicit, we will use a form of algebra that gives a value to every sort and function symbol of some signature. If Σ is a signature, then a Σ-algebra A consists of

• Exactly one carrier A^s for each sort symbol s ∈ S,

Universal Algebra and Algebraic Data Types


• An interpretation map I assigning a function I(f): A^s_1 × · · · × A^s_k → A^s to each proper function symbol f: s_1 × · · · × s_k → s ∈ F, and an element I(f) ∈ A^s to each constant symbol f: s ∈ F.

If Σ has only one sort, then we say that a Σ-algebra is single-sorted; otherwise, we say an algebra is many-sorted or multi-sorted. If A = ⟨{A^s}_{s ∈ S}, I⟩ is a Σ-algebra, and f is a function symbol of Σ, it is often convenient to write f^A for I(f). In the case that there are finitely many or countably many function symbols, it is also common to list the functions, as in the example algebras N and A_pcf given earlier. Combining these two conventions, we may write out the Σ_nat-algebra N as

N = ⟨N, 0, 1, +, *⟩.

This gives us all the information we need to interpret terms over Σ_nat, since the interpretation map is given by the way the functions of the algebra are named. It is a relatively simple matter to say what the meaning of a term M ∈ Terms(Σ, Γ) is in any Σ-algebra. The only technical machinery that might not be familiar is the use of variable assignments, or environments. An environment for A is a mapping η: V → ∪_s A^s from variables to elements of the carriers of A. The reason that we need an environment is that a term M may have a variable x in it, and we need to give x a value before we can say what M means. If someone asks you, "What is the value of x + 1?" it is difficult to answer without choosing a value for x. Of course, an environment must map each variable to a value of the appropriate carrier. We say an environment η satisfies Γ if η(x) ∈ A^s for each x: s ∈ Γ. Given an environment η for A that satisfies Γ, we define the meaning A[[M]]η of any M ∈ Terms(Σ, Γ) in A as follows:

A[[x]]η = η(x)
A[[f M_1 . . . M_k]]η = f^A(A[[M_1]]η, . . . , A[[M_k]]η)

In the special case that f: s is a constant symbol, we have A[[f]]η = f^A, since there are no function arguments. It is common to omit A and write [[M]]η if the algebra A is clear from context. If M has no variables, then [[M]]η does not depend on η. In this case, we may omit the environment and write [[M]] for the meaning of M in algebra A.

Example 3.3.5 The signature Σ_nat of Example 3.3.1 has a single sort, nat, and function symbols 0, 1, +, *. We may interpret terms over this signature in the Σ_nat-algebra N given


in Section 3.3.1. Let η be an environment with η(x) = 0. The meaning of x + 1 in N is determined as follows. (Remember that x + 1 is syntactic sugar for +x1.)

N[[x + 1]]η = +^N(N[[x]]η, N[[1]]η) = +^N(η(x), 1^N) = +^N(0, 1) = 1

Example 3.3.6 The signature Σ = ⟨S, F⟩ of Example 3.3.3 has sorts nat and stack, and function symbols 0, 1, 2, . . . , +, *, empty, push, pop and top. One algebra A_stk for this signature interprets the natural numbers in the standard way and interprets stacks as sequences of numbers. To describe the functions on stacks, let us write n::s for the result of adding natural number n onto the front of sequence s ∈ N*. Using this notation, we have

A_stk = ⟨N, N*, 0^A, 1^A, 2^A, . . . , +^A, *^A, empty^A, push^A, pop^A, top^A⟩

with stack functions determined as follows:

empty^A = ε, the empty sequence
push^A(n, s) = n::s
pop^A(n::s) = s
pop^A(ε) = ε
top^A(n::s) = n
top^A(ε) = 0

In words, the constant symbol empty is interpreted as empty^A, the empty sequence of natural numbers, and the function symbol push is interpreted as push^A, the function that adds a natural number onto the front of a sequence. The function pop^A removes the first element of a sequence if the sequence is nonempty, as described by the first equation for pop^A. Since pop(empty) is a well-formed algebraic term, this must be given some interpretation in the algebra. Since we do not have "error" elements in this algebra, we let pop^A(ε) be the empty sequence. A similar situation occurs with top, which is intended to return the top element of a stack. When the stack is empty, we simply return 0, which is an arbitrary decision.


We can work out the meanings of terms using the environment η mapping variable x: nat to 3 and variable s: stack to the sequence (2, 1) whose first element is 2 and second element is 1.

[[pop(push x s)]]η = pop^A(push^A([[x]]η, [[s]]η))
                   = pop^A(push^A(3, (2, 1)))
                   = pop^A((3, 2, 1))
                   = (2, 1)
                   = [[s]]η

[[top(push x s)]]η = top^A(push^A([[x]]η, [[s]]η))
                   = top^A(push^A(3, (2, 1)))
                   = top^A((3, 2, 1))
                   = 3
                   = [[x]]η

It is easy to see from these examples that for any environment η mapping x to a natural number and s to a stack, we have

[[pop(push x s)]]η = [[s]]η
[[top(push x s)]]η = [[x]]η.
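The meaning function of this section is directly executable. The sketch below encodes the algebra A_stk of Example 3.3.6 with Python tuples standing in for sequences; the term and environment encodings are our own illustrative choices.

```python
# The algebra A_stk of Example 3.3.6: numbers interpreted as themselves,
# stacks as Python tuples with the top element first.
FUNS = {
    '0': lambda: 0, '1': lambda: 1, '2': lambda: 2, '3': lambda: 3,
    '+': lambda m, n: m + n, '*': lambda m, n: m * n,
    'empty': lambda: (),
    'push': lambda n, s: (n,) + s,
    'pop': lambda s: s[1:] if s else (),  # pop of the empty stack is empty
    'top': lambda s: s[0] if s else 0,    # top of the empty stack is 0
}

def meaning(term, env):
    """A[[term]]env: a variable denotes env[term]; an application f M1...Mk
    denotes f_A applied to the meanings of the arguments."""
    if isinstance(term, str):
        return env[term] if term in env else FUNS[term]()
    f, *args = term
    return FUNS[f](*(meaning(a, env) for a in args))

env = {'x': 3, 's': (2, 1)}
print(meaning(('pop', ('push', 'x', 's')), env))  # → (2, 1), i.e. [[s]]env
print(meaning(('top', ('push', 'x', 's')), env))  # → 3, i.e. [[x]]env
```

The two printed values reproduce the calculations above: [[pop(push x s)]]η = [[s]]η and [[top(push x s)]]η = [[x]]η.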

One important property of the meaning function is that the meaning of a term M ∈ Terms_s(Σ, Γ) is an element of the appropriate carrier A^s, as stated in the lemma below.

This lemma should be regarded as a justification of our definitions, since if it failed, we would change the definitions to make it correct.

Lemma 3.3.7 Let A be a Σ-algebra, M ∈ Terms_s(Σ, Γ), and η be an environment satisfying Γ. Then A[[M]]η ∈ A^s.

Proof The proof is by induction on the structure of terms in Terms_s(Σ, Γ). The base case is to show that for any variable x ∈ Terms_s(Σ, Γ) and environment η satisfying Γ, we have [[x]]η ∈ A^s. From the definition of Terms_s(Σ, Γ), we can see that x: s ∈ Γ. By the definition of meaning, we know that [[x]]η = η(x). But since η satisfies Γ, it follows that η(x) ∈ A^s. This proves the base case of the induction. For the induction step, we assume that [[M_i]]η ∈ A^s_i, for 1 ≤ i ≤ k. Since f^A is a function from A^s_1 × · · · × A^s_k to A^s, it follows that [[f M_1 . . . M_k]]η = f^A([[M_1]]η, . . . , [[M_k]]η) ∈ A^s, which proves the lemma. ∎

M ≡ M_0 →→ M_1 ←← M_2 →→ M_3 · · · M_k ≡ N

and similarly

P ≡ P_0 →→ P_1 ←← P_2 →→ P_3 · · · P_ℓ ≡ Q.

Since →→ is reflexive, we may assume without loss of generality that k = ℓ. Therefore, by Lemma 3.7.10, we have

[P/x]M ≡ [P_0/x]M_0 →→ [P_1/x]M_1 ←← [P_2/x]M_2 →→ [P_3/x]M_3 · · · [P_k/x]M_k ≡ [Q/x]N.

This proves the lemma. ∎

Lemma 3.7.12 If →R is confluent, then M ↔ N [Γ] if and only if M →→ ∘ ←← N.

Proof Suppose M ↔ N [Γ]. Consider any sequence of terms M_1, . . . , M_k such that

M →→ M_1 ←← M_2 →→ M_3 · · · M_k ←← N.

By reflexivity and transitivity of →→, we may always express M ↔ N [Γ] in this form, with the first sequence of reductions proceeding from M →→ M_1, and the next in the opposite direction. If k ≤ 2, then the lemma holds. Otherwise, we will show how to shorten the sequence of alternations. By confluence, there is some term P with M_1 →→ P ←← M_3. By transitivity of →→, this gives us a shorter conversion sequence

M →→ P ←← M_4 →→ · · · M_k ←← N.

By induction on k, we conclude that M →→ ∘ ←← N. ∎

Theorem 3.7.13 For any confluent rewrite system R, we have E_R ⊢ M = N [Γ] if and only if M →→ ∘ ←← N.


Proof By Lemma 3.7.11, we know E_R ⊢ M = N [Γ] if and only if M ↔ N [Γ]. By Lemma 3.7.12, confluence implies M ↔ N [Γ] if and only if M →→ ∘ ←← N. ∎

Corollary 3.7.14 If →R is confluent and there exist two distinct normal forms of sort s, then E_R is consistent.

3.7.3

Termination

There are many approaches to proving termination of rewrite systems. Several methods involve assigning some kind of "weights" to terms. For example, suppose we assign a natural number weight w_M to each term M and show that whenever M → N, we have w_M > w_N. It follows that there is no infinite sequence M_0 → M_1 → M_2 → · · · of one-step reductions, since there is no infinite decreasing sequence of natural numbers. While this approach may seem straightforward, it is often very difficult to devise a weighting function that does the job. In fact, there is no algorithmic test to determine whether a rewrite system is terminating. Nonetheless, there are reasonable methods that work in many simple cases. Rather than survey the extensive literature on termination proofs, we will just take a quick look at a simple method based on algebras with ordered carriers. More information may be found in the survey articles [DJ90, HO80, Klo87] and references cited there.

The method we will consider in this section uses binary relations that prohibit infinite sequences. Before considering well-founded relations, also used in Section 1.8.3, we review some notational conventions. If ≺ is a binary relation on a set A, we will write ⪯ for the union of ≺ and equality, and ≻ for the complement of ⪯. In other words, x ⪯ y if x ≺ y or x = y, and x ≻ y iff ¬(x ⪯ y). Note that in using the symbol ≺ as the name of a relation, we do not necessarily assume that ≺ is transitive or has any other properties of an ordering. However, since most of the examples we will consider involve ordering, the choice of symbol provides helpful intuition. A binary relation ≺ on a set A is well-founded if there is no infinite decreasing sequence, a_0 ≻ a_1 ≻ a_2 ≻ · · ·, of elements of A. The importance of well-founded relations is that if we can establish a correspondence between terms and elements of a set with a well-founded relation, in such a way that every reduction step decreases the associated element, then no infinite reduction sequence is possible. We call a Σ-algebra A, together with a well-founded relation ≺ on each carrier, a well-founded algebra if every function f^A of the algebra is strictly monotonic in each argument.

Lemma 3.7.15 Let R be a rewrite system and let A be a well-founded algebra such that A[[L]]η ≻ A[[R]]η for every environment η, for each rule L → R in R. Then R is terminating.

Proof The proof uses the substitution lemma and other facts about the interpretation of terms. The only basic fact we must prove directly is that for any term M ∈ Terms(Σ, Γ, x: s) containing the variable x, and any environment η, if a ≻ b in A^s, then

[[M]]η[x ↦ a] ≻ [[M]]η[x ↦ b].

This is an easy induction on M that is left to the reader. To prove the lemma, it suffices to show that if M → N, then for any environment η we have [[M]]η ≻ [[N]]η, since no well-founded carrier has any infinite sequence of "decreasing" elements. In general, a single reduction step has the form [SL/x]M → [SR/x]M, where x occurs exactly once in M. Therefore, we must show that for any environment η satisfying the appropriate sort assignment,

[[[SL/x]M]]η ≻ [[[SR/x]M]]η.

Let η be any environment and let a = [[SL]]η and b = [[SR]]η. By the substitution lemma, we have a = [[L]]η′ and b = [[R]]η′ for the environment η′ determined by η and the substitution S. Therefore, by the hypothesis of the lemma, a ≻ b. It follows (by the easy induction mentioned in the first paragraph of the proof) that

[[[SL/x]M]]η ≻ [[[SR/x]M]]η.

The lemma follows by the substitution lemma. ∎

It is important to realize that when we prove termination of a rewrite system by interpreting terms in a well-founded algebra, the algebra we use does not satisfy the equations associated with the rewrite system. The reason is that reducing a term must "decrease" its value in the well-founded algebra. However, rewriting does not change the interpretation of a term in any algebra satisfying the equations associated with the rewrite system. An essentially trivial termination example is the rewrite system for stacks, obtained by orienting the two equations in Table 3.1 from left to right. We can easily see that this system terminates since each rewrite rule reduces the number of symbols in the term. This argument may be carried out using a well-founded algebra whose carriers are the natural numbers. To do so, we interpret each function symbol as a numeric function that adds one to the sum of its arguments. Since the meaning of a term is then found by adding up the number of function symbols, it is easy to check that each rewrite rule reduces the value of a term. It is easy to see that when the conditions of Lemma 3.7.15 are satisfied, and each carrier is the natural numbers, the numeric meaning of a term is an upper-bound on the longest sequence of reduction steps from the term. Therefore, if the number of rewrite steps is a fast-growing function of the length of a term, it will be necessary to interpret at least one function symbol as a numeric function that grows faster than a polynomial. The following example uses an exponential function to show that rewriting to disjunctive normal form always terminates.
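The symbol-counting argument for the stack system can be checked mechanically. The two rules below are our reading of the Table 3.1 equations oriented left to right (the table itself is not reproduced in this excerpt), and the term encoding is our own.

```python
# Interpret every function symbol as "one plus the sum of its arguments":
# the weight of a term is then its number of symbol occurrences, and each
# stack rule strictly decreases it, so reduction must terminate.

def weight(term, env):
    if isinstance(term, str):
        return env.get(term, 1)          # constants weigh 1; variables use env
    f, *args = term
    return 1 + sum(weight(a, env) for a in args)

# The stack equations oriented left to right (our reading of Table 3.1):
rules = [(('pop', ('push', 'n', 's')), 's'),
         (('top', ('push', 'n', 's')), 'n')]

env = {'n': 1, 's': 5}                   # any positive variable weights work
for lhs, rhs in rules:
    assert weight(lhs, env) > weight(rhs, env)
print('each rule strictly decreases the symbol count')
```

Since the left-hand side of each rule contains its right-hand side plus extra symbols, the decrease holds for every assignment of positive weights to the variables.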

Example 3.7.16 (Attributed to Filman [DJ90].) Consider the following rewrite system on formulas of propositional logic. As usual, we write the unary function ¬: bool → bool as a prefix operator, and binary functions ∧, ∨: bool × bool → bool as infix operations.

¬¬x → x
¬(x ∨ y) → ¬x ∧ ¬y
¬(x ∧ y) → ¬x ∨ ¬y
x ∧ (y ∨ z) → (x ∧ y) ∨ (x ∧ z)
(y ∨ z) ∧ x → (y ∧ x) ∨ (z ∧ x)

This system transforms a formula into disjunctive normal form. We can prove that this rewrite system is terminating using a well-founded algebra of natural numbers. More specifically, let A = ⟨A^bool, ¬^A, or^A, and^A⟩ be the algebra with

3.7 Rewrite Systems

carrier A^bool = {n ∈ N | n > 1}, the natural numbers greater than 1, and

¬^A(x) = 2^x
or^A(x, y) = x + y + 1
and^A(x, y) = x * y

It is easy to see that each of these functions is strictly monotonic on the natural numbers (using the usual ordering). Therefore, A is a well-founded algebra. To show that the rewrite rules are terminating, we must check that for natural number values of the free variables, the left-hand side of each rule defines a larger number than the right. For the first rule, this is obvious since for any natural number x > 1, we have 2^(2^x) > x. For the second rule, we have 2^(x+y+1) = 2 · 2^x · 2^y > 2^x · 2^y for any x, y > 1. The remaining rules rely on similar routine calculations. ∎
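The "routine calculations" for the remaining rules can be spot-checked over a range of carrier values; the functions below are the interpretation of Example 3.7.16, with ¬ as exponentiation.

```python
# The interpretation of Example 3.7.16 on naturals greater than 1:
# not is exponentiation, or adds one to the sum, and multiplies.
NOT = lambda x: 2 ** x
OR = lambda x, y: x + y + 1
AND = lambda x, y: x * y

# Every rule must strictly decrease this measure for all carrier values
# of its variables; we spot-check a range.
vals = range(2, 8)
for x in vals:
    assert NOT(NOT(x)) > x                          # ~~x -> x
    for y in vals:
        assert NOT(OR(x, y)) > AND(NOT(x), NOT(y))  # ~(x|y) -> ~x & ~y
        assert NOT(AND(x, y)) > OR(NOT(x), NOT(y))  # ~(x&y) -> ~x | ~y
        for z in vals:
            assert AND(x, OR(y, z)) > OR(AND(x, y), AND(x, z))
            assert AND(OR(y, z), x) > OR(AND(y, x), AND(z, x))
print('all five DNF rules decrease the measure')
```

The distribution rules decrease the measure because x · (y + z + 1) = xy + xz + x exceeds xy + xz + 1 whenever x > 1, which is exactly why the carrier excludes 0 and 1.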

Example 3.7.17 In this example, we will show termination of a rewrite system derived from the integer multiset axioms given in Table 3.2. For simplicity, we will omit the rules for conditionals on multisets and booleans, since these do not interact with any of the other rules. Specifically, let us consider the signature with sorts mset, nat, bool; constants for natural numbers, the empty multiset, and true, false; and functions + and Eq? on natural numbers, insert and mem? on multisets, and the conditional for natural numbers. The rewrite rules are

0 + 0 → 0, 0 + 1 → 1, . . .
Eq? x x → true
Eq? 0 1 → false, Eq? 0 2 → false, . . .
mem? x empty → 0
mem? x (insert y m) → if Eq? x y then (mem? x m) + 1 else mem? x m
if true then x else y → x
if false then x else y → y

We will interpret this signature in an algebra A whose carriers are subsets of the natural numbers. We must choose strictly monotonic functions for each function symbol so that the meaning of the left-hand side of each rule is guaranteed to be a larger natural number than the meaning of the right-hand side. For most of the functions, this may be done in a straightforward manner. However, since the second membership rule does not decrease the number of


symbols in a term, we will have to give more consideration to the numeric interpretation of mem?. An intuitive way to assign integers to terms is to give an upper bound on the longest possible sequence of reductions. For example, the longest sequence from a term of the form if M_1 then M_2 else M_3 reduces all three subterms as much as possible, then (assuming M_1 reduces to true or false) reduces the conditional. Therefore, a reasonable numeric interpretation for cond is

1 + x + y + z.

Following the same line of reasoning, we are led to the straightforward interpretations of the other functions, except mem?.

+^A x y = 1 + x + y
Eq?^A x y = 1 + x + y
insert^A y m = 1 + y + m

For technical reasons that will become apparent, we will interpret each sort as the set

N − {0, 1}

rather than using all natural numbers. Therefore, we interpret each constant symbol as 2. For mem?, we must choose a function so that the left-hand side of each mem? rule will give a larger number than the right-hand side. The first mem? rule only requires that the function always have a value greater than 2. From the second rule, we obtain the condition

mem?^A x (insert^A y m) > 5 + x + y + 2 * (mem?^A x m).

We can surmise by inspection that the value of mem?^A x m should depend exponentially on its second argument, since mem?^A x (insert^A y m) must be more than twice mem?^A x m, for any y ≥ 2. We can satisfy the numeric condition by taking

mem?^A x m = x * 2^m.

With this interpretation, it is straightforward to verify the conditions of Lemma 3.7.15 and conclude that the rewrite rules for multisets are terminating. The only nontrivial case is mem?. A simple way to see that

x * 2^(y+m+1) − (5 + x + y + x * 2^(m+1)) > 0

is to verify the inequality for x = y = m = 2 and then check the derivative with respect to each variable. ∎

Exercise 3.7.18 Consider the following rewrite system on formulas of propositional logic.

¬¬x → x
¬(x ∨ y) → ¬x ∧ ¬y
¬(x ∧ y) → ¬x ∨ ¬y
x ∨ (y ∧ z) → (x ∨ y) ∧ (x ∨ z)
(y ∧ z) ∨ x → (y ∨ x) ∧ (z ∨ x)

This is similar to the system in Example 3.7.16, except that these rules transform a formula into conjunctive normal form instead of disjunctive normal form. Show that this rewrite system is terminating by defining and using a well-founded algebra whose carrier is some subset of the natural numbers. (This is easy if you understand Example 3.7.16.)

Exercise 3.7.19 Show that the following rewrite rules are terminating, using a well-founded algebra.

(a) 0 + x → x
    (Sx) + y → S(x + y)

(b) The rules of part (a) in combination with

    0 * x → 0
    (Sx) * y → (x * y) + y

(c) The rules of parts (a) and (b) in combination with

    fact 0 → 1
    fact (Sx) → (Sx) * (fact x)

Termination for a more complicated version of this example is proved in [Les92] using a lexicographic order.

Exercise 3.7.20 An algebraic specification for set, nat and bool is given in Table 3.4. We may derive a rewrite system from the equational axioms by orienting them from left to right.

(a) Show that this rewrite system is terminating by interpreting each of the function symbols as a function over a subset of the natural numbers.

(b) We may extend the set datatype with a cardinality function defined by the following axioms.


card empty = 0
card(insert x s) = cond (ismem? x s) (card s) ((card s) + 1)

Show that this rewrite system is terminating by interpreting each of the function symbols as a function over a subset of the natural numbers.

3.7.4

Critical Pairs

We will consider two classes of confluent rewrite systems, one necessarily terminating and the other not. In the first class, discussed in Section 3.7.5, the rules do not interact in any significant way. Consequently, we have confluence regardless of termination. In the second class, discussed in Section 3.7.6, we use termination to analyze the interaction between rules. An important idea in both cases is the definition of critical pair, which is a pair of terms representing an interaction between rewrite rules. Another general notion is local confluence, which is a weak version of confluence. Since local confluence may be used to motivate the definition of critical pair, we begin by defining local confluence. A relation → is locally confluent if N ← M → P implies N →→ ∘ ←← P. This is strictly weaker than confluence, since we only assume N →→ ∘ ←← P when M rewrites to N or P by a single step. However, as shown in Section 3.7.6, local confluence is equivalent to ordinary confluence for terminating rewrite systems.

Example 3.7.21 A simple example of a rewrite system that is locally confluent but not confluent is

a → b, b → a, a → a_0, b → b_0

Both a and b can be reduced to a_0 and b_0. But since neither a_0 nor b_0 can be reduced to the other, confluence fails. However, a straightforward case analysis shows that the system is locally confluent. A pictorial representation of this system appears in Figure 3.1. Another rewrite system that is locally confluent but not confluent, and which has no loops, appears in Example 3.7.26. ∎
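The case analysis for Example 3.7.21 is small enough to enumerate; the dictionary encoding of the abstract reduction system is our own.

```python
# The four rules of Example 3.7.21 as an abstract reduction system.
step = {'a': {'b', 'a0'}, 'b': {'a', 'b0'}, 'a0': set(), 'b0': set()}

def normal_forms(t):
    """All normal forms reachable from t (the state space here is finite)."""
    seen, todo, nfs = set(), [t], set()
    while todo:
        u = todo.pop()
        if u in seen:
            continue
        seen.add(u)
        if step[u]:
            todo.extend(step[u])
        else:
            nfs.add(u)
    return nfs

# Two distinct normal forms are reachable from a, so confluence fails even
# though every single-step peak N <- M -> P can be rejoined.
print(sorted(normal_forms('a')))  # → ['a0', 'b0']
```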


In the rest of this section, we will consider the problem of determining whether a rewrite system is locally confluent. For rewrite systems with a finite number of rules, we may demonstrate local confluence by examining a finite number of cases called "critical pairs." Pictorial representations of terms and reductions, illustrating the three important cases, are given in Figures 3.2, 3.3 and 3.4. (Similar pictures appear in [DJ90, Hue80, Klo87].) Suppose we have some finite set R of rewrite rules and a term M that can be reduced in two different ways. This means that there are rewrite rules L → R and L′ → R′ and

Figure 3.1 A locally confluent but non-confluent reduction.

substitutions S and S′ such that the subterm of M at one position, m̄, has the form SL and the subterm at another position, n̄, has the form S′L′. It is not necessary that L → R and L′ → R′ be different rules. However, if we rename variables so that L and L′ have no variables in common, then we may assume that S and S′ are the same substitution. There are several possible relationships between m̄ and n̄. In some cases, we can see that there is essentially no interaction between the rules. But in one case, the "critical pair" case, there is no a priori guarantee of local confluence. The simplest case, illustrated in Figure 3.2, is if neither SL nor SL′ is a subterm of the other. This happens when neither m̄ nor n̄ is a subsequence of the other. In this case, it does not matter which reduction we do first. We may always perform the second reduction immediately thereafter, producing the same result by either pair of reductions. For example, if M has the form f SL SL′, the two reductions produce f SR SL′ and f SL SR′, respectively. But then each reduces to f SR SR′ by a single reduction. The second possibility is that one subterm may contain the other. In this case, both reductions only affect the larger subterm. For the sake of notational simplicity, we will assume that the larger subterm is all of M. Therefore, we assume m̄ is the empty sequence, so M ≡ SL, and SL′ is a subterm of SL at position n̄. There are two subcases. A "trivial" way for SL′ to be a subterm of SL is when S substitutes a term containing SL′ for some variable x in L. In other words, if we divide SL into symbol occurrences that come from L and symbol occurrences that are introduced by S, the subterm at position n̄ only contains symbols introduced by S. This is pictured in Figure 3.3. In this case, reducing SL to SR either eliminates the subterm SL′, if R does not contain x, or gives us a term containing one or more occurrences of SL′. Since we can reduce each occurrence of SL′ to SR′, we have local confluence in this case.
The remaining case is the one we must consider in more detail: SL′ is at position n̄ of SL, and the subterm of L at n̄ is not a variable. Clearly, this may only happen if the principal (first) function symbol of L′ occurs in L, as the principal function symbol of the subterm of L at n̄.
Universal Algebra and Algebraic Data Types

218

/ Figure 3.2 Disjoint reductions

with rewrite rule, we obtain SR. The second rewrite rule gives us the term our chosen occurrence of SL' replaced by SR'. This is pictured in Figure 3.4. Although o we may have SR [SR7ñi]SL "accidentally," there is no reason to expect this in general.

Example 3.7.22 Disjoint (Figure 3.2) and trivially overlapping (Figure 3.3) reductions are possible in every rewrite system. The following two rewrite rules for lists also allow conflicting reductions (Figure 3.4): —± £ I

I

£

We obtain a disjoint reduction by writing a term that contains the left-hand sides of both rules. For example,

3.7

Rewrite Systems

219

/ Figure 3.3 Trivial overlap

condB(cdr(cons x £)) (cons(car £') (cdr £')) may be reduced using either rewrite rule. The two subterms that may be reduced independently are underlined. A term with overlap only at a substitution instance of a variable (Figure 3.3) is

cdr(cons x(cons(car £')(cdr £'))) which is obtained by substituting the left hand side of the second rule for the variable £ in the first. Notice that if we reduce the larger subterm, the underlined smaller subterm remains intact. Finally, we obtain a nontrivial overlap by replacing x and £ in the first rule by care and

cdr(cons(car £')(cdr £'))

Universal Algebra and Algebraic Data Types

220

/

N

Figure 3.4 Critical pair

If we apply the first rule to the entire term, the cons in the smaller subterm is removed by the rule. If we apply the second rule to the smaller subterm, cons is removed and the first rule may no longer be applied. However, it is easy to see that the rules are locally confluent, since in both cases we obtain the same term, £. A system where local confluence fails due to overlapping rules appears in Example 3.7.1. Specifically, the term (x + 0) + y may be reduced to either x + y or x + (0 + y), but neither of these terms can be reduced further. Associativity (as in the third rule of Example 3.7.1) and commutativity are difficult to treat by rewrite rules. Techniques for rewriting with an associative and commutative equivalence relation are surveyed in [DJ9O], for example.

.

It might appear that for two (not necessarily distinct) rules L R and L' —* R', we must consider infinitely many substitutions S in looking for nontrivial overlaps. However, for each term L, position in L, and term L', if any substitution S gives SL' there is a simplest such substitution. More specifically, we may choose the minimal (most general) substitution S under the ordering S 5' if there exist 5" with 5' = S" o S. Using unification, we may compute a most general substitution for each L' and non-variable subterm or determine that none exists. For many pairs of rewrite rules, this is a straightforward hand calculation. Intuitively, a critical pair is the result of reducing a term with overlapping redexes, obtained with a substitution that is as simple as possible. An overlap is a triple (SL, SL', ñ'z) such that SL' occurs at position of SL and is not a variable. If S is the sim-

3.7

Rewrite Systems

221

plest substitution such that (SL, SL', n1) is an overlap, then the pair (SR, is called a critical pair. In Example 3.7.22 the two nontrivial overlaps give us critical pairs S(x + y) + z), since the simplest possible substitutions were £) and (Sx + (y +

used. A useful fact to remember is that a critical pair may occur only between rules L —f R of L' occurs in L. Moreover, if and L' —± R' when the principal function symbol is applied to non-variable terms in both L' and the relevant occurrence in L, then the principal function symbols of corresponding arguments must be identical. For example, if we have a rule f(gxy) —f R', then this may lead to a critical pair only if there is a rule R such that either fz or a term of the form f(gPQ) occurs in L. A subterm of the L form f(h...) does not give us a critical pair, since there is no substitution making (h...) syntactically identical to any term with principal function symbol g. In general, it is important to consider overlaps involving two uses of the same rule. For example, the rewrite system with the single rule f(fx) —f a is not locally confluent since we have an overlap (f(f(fx)), f(fx), 1) leading to a critical pair (fa, a). (The term f(f(fx)) can be reduced to both fa and a.) It is also important to consider the case where two left-hand sides are identical, since a rewrite system may have two rules, L —± R and L —f R', that have the same left-hand side. (In this case, (R, R') is a critical pair.) Since any rule L —± R results in overlap (L, L, E), where E is the empty sequence, any rule results in a critical pair (R, R) of identical terms. We call a critical pair of identical terms, resulting from two applications of the a single rule L —f R to L, a trivial critical

f

f

pair Proposition 3.7.23

[KB7O] A rewrite system N for every critical pair (M, N).

is locally confluent

if M

o

If a finite rewrite system is terminating, then this lemma gives us an algorithmic procedure for deciding whether 1Z is locally confluent. We will consider this in Section 3.7.6.

Proof The implication from left to right is straightforward. The proof in the other direction follows the line of reasoning used to motivate the definition of critical pair. More specifically, suppose M may be reduced to two subterms, N and P, using L —± R and L' —f R'. Then there are three possible relationships between the relevant subterms SL and SR of M. If these subterms are disjoint or overlap trivially, then as argued in this P. The remaining case is that N and P contain substitusection, we have N —*± o tion instances of a critical pair. Since the symbols not involved in the reductions M —± N o P, we may simand M —± P will not play any role in determining whether N of 1Z with plify notation by assuming that there is some critical pair (SR,

Universal Algebra and Algebraic Data Types

222

However, by the hypothesis of the lemma, we have Therefore, since each reduction step applies to any substitution o P. instance of the term involved (see Lemmas 3.7.10 and 3.7.11), we have N

S'(SR) and P

N SR

o

.

Using the signature of Example 3.7.1, we may also consider the rewrite

Exercise 3.7.24 system

O+x (—x)+x (x

+ y) +

—±0 z

x

+

(y

+

z)

You may replace infix addition by a standard algebraic (prefix) function symbol if this makes it easier for you. (a) Find all critical pairs of this system.

(b) For each critical pair (SR,

determine whether SR

—** o

Exercise 3.7.25

A rewrite system for formulas of propositional logic is given in Example 3.7.16. You may translate into standard algebraic terms if this makes it easier for you. (a) Find all critical pairs of this system.

(b) For each critical pair (SR,

3.7.5

determine whether SR

—** o

[SR'/ñz]SL.

Left-linear Non-overlapping Rewrite Systems

A rewrite system is non-overlapping if it has no nontrivial critical pairs. In other words, the only way to form a critical pair is by using a rule L —÷ R to reduce L to R twice, obtaining the pair (R, R). Since trivial critical pairs do not effect confluence, we might expect every non-overlapping rewrite system to be confluent. However, this is not true; a counterexampIe is given in Example 3.7.26 below. Intuitively, the problem is that since rewrite rules allow us to replace terms of one form with terms of another, we may have rules that only overlap after some initial rewriting. However, if we restrict the left-hand sides of rules, then the absence of critical pairs ensures confluence. The standard terminology is that a rule L —k R is left-linear if no variable occurs twice in L. A set of rules is left-linear if every rule in the set is left-linear. The main result in this section is that left-linear, non-overlapping rewrite systems are confluent. The earliest proofs of this fact are probably proofs for combinatory logic (see Example 3.7.31) that generalize to arbitrary left-linear and non-overlapping systems [Klo87]. The general approach using parallel reduction is due to Huet [Hue8O].

3.7

Rewrite Systems

223

Example 3.7.26 The following rewrite system for finite and "infinite" numbers has no critical pairs (and is therefore locally confluent) but is not confluent.

-±Sc,c Eq?xx Eq?x (Sx) Intuitively, the two Eq? rules would be fine, except that the "infinite number" oc is its own successor. With oc, we have

Eq? oc

true

Eq?c'cc'c—÷ Eq?c'c(Scx)-± false so one term has two distinct normal forms, true and false. The intuitive reason why confluence fails in this situation is fairly subtle. The only possible critical pair in this system would arise from a substitution R such that R(Eq?x x) R(Eq? x (Sx)). Any such substitution R would have to be of the form R = [M/xj, where M SM. But since no term M is syntactically equal to its successor, SM, this is impossible. However, the term oc is reducible to its own successor. Therefore, the rules have the same effect as an overlap at substitution instances Eq? 00 00 and Eq? oc (Sx). Another, way of saying this is that since oc —* we have foo/xJ(Eq?xx) —* fcx/xJ (Eq?x (Sx)), and the "overlap after reduction" causes confluence to fail. We will analyze left-linear, non-overlapping rewrite systems using parallel reduction, which was considered in Section 2.4.4 for PCF. We define the parallel reduction relation, for a set of algebraic rewrite rules by

[SL_1/n_1, ..., SL_k/n_k]M  ⇒_R  [SR_1/n_1, ..., SR_k/n_k]M

where n_1, ..., n_k are disjoint positions in M (i.e., none of these sequences of natural numbers is a subsequence of any other) and L_1 → R_1, ..., L_k → R_k are rewrite rules in R that need not be distinct (and which have variables renamed if necessary). An alternate characterization of ⇒_R is that this is the least relation on terms that contains →_R and is closed under the rule

M_1 ⇒ N_1   ...   M_k ⇒ N_k
------------------------------------
C[M_1, ..., M_k] ⇒ C[N_1, ..., N_k]

for any context C[...] with places for k terms to be inserted. This was the definition used for parallel PCF reduction in Section 2.4.4.
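The non-confluence in Example 3.7.26 can be observed concretely with a few lines of code. The sketch below is not from the book; the tuple encoding of terms is ad hoc. It enumerates all single rewrite steps from a term and confirms that Eq? ∞ ∞ reaches the two distinct normal forms true and false:

```python
# Terms as nested tuples: INF is the constant ∞, ("S", t) is successor,
# ("Eq?", t1, t2) is the equality test; "true"/"false" are results.
INF = ("inf",)

def step(t):
    """All terms reachable from t in one rewrite step."""
    out = []
    if t == INF:                                   # rule: ∞ -> S ∞
        out.append(("S", INF))
    if isinstance(t, tuple) and t[0] == "Eq?":
        if t[1] == t[2]:                           # rule: Eq? x x -> true
            out.append("true")
        if t[2] == ("S", t[1]):                    # rule: Eq? x (S x) -> false
            out.append("false")
    if isinstance(t, tuple):                       # rewrite inside subterms
        for i in range(1, len(t)):
            for s in step(t[i]):
                out.append(t[:i] + (s,) + t[i + 1:])
    return out

t0 = ("Eq?", INF, INF)
assert "true" in step(t0)                          # one step to true
t1 = ("Eq?", INF, ("S", INF))
assert t1 in step(t0) and "false" in step(t1)      # two steps to false
assert step("true") == [] and step("false") == []  # distinct normal forms
```

Starting from Eq? ∞ ∞, reducing at the root yields true, while first rewriting the right-hand occurrence of ∞ leads to false; both are normal forms, so confluence fails even though the system has no critical pairs.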

Universal Algebra and Algebraic Data Types

Some useful notation is to write M ⇒ⁿ N if M ⇒ N by a reduction sequence of length n, and M ⇒^{<n} N if there is a reduction sequence of length less than n. The reason we use parallel reduction to analyze left-linear, non-overlapping rewrite systems is that parallel reduction has a stronger confluence property that is both easier to prove directly than confluence of sequential reduction and interesting in its own right. A relation → on terms is strongly confluent if, whenever M → N and M → P, there is a term Q with N → Q and P → Q. The reader who checks other sources should know that an alternate, slightly weaker definition of strong confluence is sometimes used. We now show that ⇒_R is strongly confluent. This is called the "parallel moves lemma." After proving this lemma, we will show that strong confluence of ⇒_R implies confluence of →_R.

Lemma 3.7.27 [Hue80] If R is left-linear and non-overlapping, then ⇒_R is strongly confluent.

The lemma proved in [Hue80] is actually stronger, since it applies to systems with the property that for every critical pair (M, N), we have M ⇒ N or N ⇒ M. However, the left-linear and non-overlapping case is easier to prove and will be sufficient for our purposes.

Proof Suppose that M ⇒_R N and M ⇒_R P. We must show that there is a term Q with N ⇒_R Q and P ⇒_R Q. For notational simplicity, we will drop the subscript R in the rest of the proof. The reduction M ⇒ N results from the replacement of subterms SL_1, ..., SL_k at positions n_1, ..., n_k by SR_1, ..., SR_k and, similarly, the reduction M ⇒ P involves replacement of terms at positions p_1, ..., p_l. By hypothesis, the positions n_1, ..., n_k must be disjoint, and similarly for p_1, ..., p_l. There may be some overlap between some of the n_i's and p_j's, but the kinds of overlaps are restricted by the disjointness of the two sets of positions. Specifically, if some n_i is a subsequence of one or more p_j's, then none of these p_j can be a subsequence of any n_i. Therefore, the only kind of overlap we must consider is a subterm SL of M that is reduced by a single rule L → R in one reduction, but has one or more disjoint subterms reduced in the other. Since there are no critical pairs, we have only the trivial case that n_i = p_j, with the same rule applied in both cases, and the nontrivial case with all of the affected subterms of SL arising from substitution for variables in L. For notational simplicity, we assume in the rest of this proof that we do not have n_i = p_j for any i and j. The modifications required to treat n_i = p_j are completely straightforward, since N and P must be identical at this position. To analyze this situation systematically, we reorder the sequences of disjoint positions so that for some r ≤ k and s ≤ l, the initial subsequences n_1, ..., n_r and p_1, ..., p_s


have two properties. First, none has any proper subsequence among the complete lists n_1, ..., n_k and p_1, ..., p_l. Second, every sequence in either complete list must have some proper subsequence among the initial subsequences n_1, ..., n_r and p_1, ..., p_s. If we let t = r + s and let M_1, ..., M_t be the subterms M/n_1, ..., M/n_r, M/p_1, ..., M/p_s, then every M_i is reduced in producing N or P, and all of the reductions occur in M_1, ..., M_t. Therefore, we may write M, N and P in the form

M ≡ C[M_1, ..., M_t]
N ≡ C[N_1, ..., N_t]
P ≡ C[P_1, ..., P_t]

where, for 1 ≤ i ≤ t, we have M_i ⇒ N_i and M_i ⇒ P_i.

Lemma If ⇒_R is strongly confluent, then →_R and ⇒_R are confluent.

Proof It is not hard to see that the reflexive and transitive closures of →_R and ⇒_R are the same relation on terms. (This is part (a) of Exercise 2.4.19 in Section 2.4.4.) Therefore, it suffices to show that if ⇒_R is strongly confluent, then ⇒_R is confluent. For notational simplicity, we will drop the subscript R in the rest of the proof. Recall that we write M ⇒ⁿ N if there is a reduction sequence from M to N of length n, and M ⇒^{<n} N if the length is less than n. Suppose ⇒ is strongly confluent. We first show by induction on n that if M ⇒ⁿ N and M ⇒ P, then there is some term Q with N ⇒ Q and P ⇒ⁿ Q.

The completion procedure begins by testing whether L > R holds, in some well-founded algebra, for each rewrite rule L → R. If so, then the rewrite system is terminating and we may proceed to test for local confluence. This may be done algorithmically by computing critical pairs (M, N) and searching the possible reductions from M and N to see whether M and N reduce to a common term. Since the rewrite system is terminating, there are only finitely many possible reductions to consider. If all critical pairs converge, then the algorithm terminates with the result that the rewrite system is confluent and terminating. If M and N do not reduce to a common term, for some critical pair (M, N), then by definition of critical pair, there is some term that reduces to both M and N. Therefore, M and N may be proved equal by equational reasoning and it makes logical sense to add M → N or N → M as a rewrite rule. If M > N or N > M, then the completion procedure adds the appropriate rule and continues the test for local confluence. However, if M and N


are incomparable (neither M > N nor N > M), there is no guarantee that either rule preserves termination, and so the procedure terminates inconclusively. This procedure may not terminate, as illustrated in the following example, based on [Bel86].

Example 3.7.34 In this example, we show that the completion procedure may continue indefinitely, neither producing a confluent system after adding any finite number of rules, nor reaching a state where there is no rule to add. Let us consider two rewrite rules, called H for homomorphism and A for associativity.

(H)  fx + fy → f(x + y)
(A)  (x + y) + z → x + (y + z)

We can see that these are terminating by constructing a well-founded algebra, A. As the single carrier, we may use the natural numbers greater than 2. To make (A) decreasing, we

must choose some non-associative interpretation for +. The simplest familiar arithmetic operation is exponentiation. We may use

+^A(x, y) = y^x,    f^A(x) = x^2.

An easy calculation shows that each rule is decreasing in this algebra. The only critical pair is the result of unifying fx + fy with the subterm (x + y) of the left-hand side of (A). Reducing the term (fx + fy) + z in two ways gives us the critical pair

(f(x + y) + z, fx + (fy + z)), with weights

f^A(x +^A y) +^A z = z^((y^x)^2) = z^(y^(2x))
f^A(x) +^A (f^A(y) +^A z) = (z^(y^2))^(x^2) = z^((xy)^2)

Comparing the numeric values of these expressions for x, y, z ≥ 3, we can see that termination is preserved if we add the rule rewriting the first term to the second. This gives us a rewrite rule that we may recognize as an instance of the general pattern

(A)_n   f^n(x + y) + z → f^n(x) + (f^n(y) + z),

where (A) is the rule (A)_0. The general pattern, if we continue the completion procedure, is that by reducing the term f^(n-1)(fx + fy) + z by (H) and by (A)_(n-1), for n ≥ 1, we obtain a critical pair of the form


(f^n(x + y) + z, f^n(x) + (f^n(y) + z)), with weights

(f^A)^n(x +^A y) +^A z = z^(y^(2^n x))
(f^A)^n(x) +^A ((f^A)^n(y) +^A z) = z^((xy)^(2^n))

This leads us to add the rewrite rule (A)_n. For this reason, the completion procedure will continue indefinitely.

.

There are a number of issues involved in mechanizing the completion procedure that we will not go into. A successful example of completion, carried out by hand, is given in the next section.
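The termination claims in Example 3.7.34 are easy to spot-check numerically. The sketch below is not part of the text; it simply evaluates the interpretations +^A(x, y) = y^x and f^A(x) = x^2 on small elements of the carrier and checks that each rule, and the orientation chosen for the first critical pair, is decreasing:

```python
# Interpretations from Example 3.7.34: carrier = naturals >= 3,
# +^A(x, y) = y**x (exponentiation), f^A(x) = x**2 (squaring).
def plus(x, y):
    return y ** x

def f(x):
    return x ** 2

for x in range(3, 5):
    for y in range(3, 5):
        for z in range(3, 5):
            # (H)  f x + f y -> f(x + y) is decreasing
            assert plus(f(x), f(y)) > f(plus(x, y))
            # (A)  (x + y) + z -> x + (y + z) is decreasing
            assert plus(plus(x, y), z) > plus(x, plus(y, z))
            # the first critical pair is oriented left to right
            assert plus(f(plus(x, y)), z) > plus(f(x), plus(f(y), z))
```

The same check, applied to each (A)_n in turn, confirms that every rule added by the completion procedure preserves termination even though the procedure never halts.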

Exercise 3.7.35 Consider the infinite rewrite system consisting of rule (H) of Example 3.7.34, along with each rule of the form (A)_n given in that example. Prove that this infinite system is confluent and terminating, or give a counterexample.

Exercise 3.7.36

The rewrite system with the single rule

f(g(fx)) → g(fx)

is terminating, since this rule reduces the number of function symbols in a term. Carry out several steps of the completion procedure. For each critical pair, add the rewrite rule that reduces the number of function symbols. What is the general form of rewrite rule that is added?

3.7.7 Applications to Algebraic Datatypes

There are several ways that rewrite rules may be used in the design, analysis and application of algebraic specifications. If equational axioms may be oriented (or completed) to form a confluent rewrite system, then this gives us a useful way to determine whether two terms are provably equal. In particular, a confluent rewrite system allows us to show that two terms are not provably equal by showing that they have distinct normal forms. A potential use of rewrite rules in practice is that rewriting may be used as a prototype implementation of a specified datatype. While this may be useful for simple debugging of specifications, reduction is seldom efficient enough for realistic programming. Finally, since our understanding of initial algebras is largely syntactic (based on provability), rewrite systems are very useful in the analysis and understanding of initial algebras. We will illustrate two uses of rewrite systems using the algebraic specifications of lists with error elements. The first is that we may discover the inconsistency of our first specification by testing for local confluence. The second is that we may characterize the initial algebra for the revised specification by completing directed equations to a confluent rewrite system.

In Tables 3.5 and 3.6 of Section 3.6.3, we considered a naive treatment of error values that turned out to be inconsistent. Let us consider the rewrite system obtained by reading the equations from left to right. It is easy to see that these rules are terminating, since every rule reduces the length of an expression. Before checking the entire system for confluence, let us check Table 3.5 and Table 3.6 individually for critical pairs. The only pair of rules that give us a critical pair are the rules

cons x error_l = error_l
cons error_a l = error_l

of Table 3.6. Since the left-hand sides are unified by replacing x and l by error values, we have the critical pair (error_l, error_l). However, this critical pair obviously does not cause local confluence to fail. It follows that, taken separately, the rules determined from Table 3.5 or Table 3.6 are confluent and terminating. Hence by Corollary 3.7.14, the equations in each table are consistent. Now let us examine rules from Table 3.6 that might overlap with a rule from Table 3.5. The first candidate is cons error_a l → error_l, since a rule car(cons x l) → x appears in Table 3.5. We have a nontrivial overlap since the term car(cons error_a l) is a substitution instance of one left-hand side and also has a subterm matching the left-hand side of another rule. Applying one reduction, we obtain error_a, while the other gives us car error_l. Thus our first non-trivial critical pair is (error_a, car error_l). However, this critical pair is benign, since car error_l → error_a. The next critical pair is obtained by unifying the left-hand side of cons error_a l → error_l with a subterm of the next rule, cdr(cons x l) → l. Substituting error_a for x produces a term cdr(cons error_a l), which reduces either to the list variable l or cdr error_l. Since the latter only reduces to error_l, we see that confluence fails. In addition, we have found a single expression that reduces both to a list variable l, and a list constant error_l. Therefore, by Lemma 3.7.11 the equational specification given in Tables 3.5 and 3.6 proves the equation error_l = l [l: list]. This shows that all lists must be equal in any algebra satisfying the specification. From this point, it is not hard to see that similar equations are provable for the other sorts. Thus we conclude that the specification is inconsistent. The revised specification of lists with error elements is given in Table 3.7. We obtain a rewrite system by reading each equation from left to right.
Before checking the critical pairs, let us establish termination. This will give us a basis for orienting additional rules that arise during the completion procedure. We can show termination of this rewrite system using a well-founded algebra A with


each carrier the set of positive natural numbers (the numbers greater than 0). We interpret each of the constants as 1 and the functions as follows. This interpretation was developed by estimating the maximum number of reduction steps that could be applied to a term of each form, as described in Example 3.7.17.

cons^A x t = x + t
car^A t = 2*t + 5
cdr^A t = 2*t + 5
isempty?^A t = t + 8
cond^A u x y = u + x + y + 1
OK^A x = x + 8

It is straightforward to check that L > R for each of the rewrite rules derived from the

specification. There are several critical pairs, all involving the rules for cons x error and cons error t. The first overlap we will check is

car(cons error t).

If we reduce the outer term, we get

cond (OK error) (cond (OK t) error error) error

while the inner reduction gives us car error. This is a nontrivial critical pair. However, both terms reduce to error, so there is no problem here. The next overlap is

car(cons x error),

which gives us the similar-looking critical pair

(cond (OK x) (cond (OK error) x error) error,  car error).

The second term clearly reduces to error, but the closest we can come with the first term (by reducing the inner conditional) is

cond (OK x) error error.

This gives us a candidate for completion. In the well-founded algebra A, it is easy to see that


cond (OK x) error error > error

so we will preserve termination by adding the rewrite rule

cond (OK x) error error → error.

Since the left-hand and right-hand sides of this rule are already provably equal from this specification, adding this rule does not change the conversion relation between terms. The critical pairs for cdr and isempty? are similar. For both functions, the first critical pair converges easily and the second converges using the new rule added for the car case. The final critical pairs are for OK applied to a cons. For OK(cons error t), we get false in both cases. However, for

OK(cons x error)

we must add the rewrite rule

cond (OK x) false false → false.

Thus two new rules give us a confluent rewrite system. The initial algebra for a specification consists of the equivalence classes of ground (variable-free) terms, modulo provable equality. Since the confluent and terminating rewrite system for the list specification allows us to test provable equality by reduction, we may use the rewrite system to understand the initial algebra. Specifically, since every equivalence class has exactly one normal form, the initial algebra is isomorphic to the algebra whose carriers are the ground normal forms of each sort. Therefore, the atoms in the initial algebra are a, b, c, d, error_a, the booleans in the initial algebra are true, false, error_b, and the lists all have the form nil, error_l, or cons A L, where neither A nor L contains error. Thus this specification correctly adds exactly one error element to each carrier and eliminates undesirable lists that contain error elements. The reader interested in a general theorem about initial algebras with error elements may consult [GTW78, Wir90].
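The claim that the two added rules preserve termination can be spot-checked against the interpretation given earlier. The sketch below is not from the book; it evaluates cond^A and OK^A on positive naturals, with every constant (including error and false) interpreted as 1:

```python
# cond^A u x y = u + x + y + 1 and OK^A x = x + 8, from the
# well-founded algebra used for the revised list specification.
def cond(u, x, y):
    return u + x + y + 1

def OK(x):
    return x + 8

error = 1   # constants are interpreted as 1
false = 1

for x in range(1, 100):
    assert cond(OK(x), error, error) > error   # cond (OK x) error error -> error
    assert cond(OK(x), false, false) > false   # cond (OK x) false false -> false
```

Since cond^A(OK^A(x), 1, 1) = x + 11 > 1 for every positive x, both added rules are decreasing and completion terminates with a confluent system.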

Exercise 3.7.37 Exercise 3.6.7 asks you to write an algebraic specification of queues with error elements, in the same style as the specification of lists in Table 3.7. Complete this specification to a confluent set of rewrite rules and use these rules to characterize the initial algebra.

Exercise 3.7.38 Exercise 3.6.8 asks you to write an algebraic specification of trees with error elements, in the same style as the specification of lists in Table 3.7. Complete this specification to a confluent set of rewrite rules and use these rules to characterize the initial algebra.

Simply-typed Lambda Calculus

4.1

Introduction

This chapter presents the pure simply-typed lambda calculus. This system, which is extended with natural numbers, booleans and fixed-point operators to produce PCF, is the core of all of the calculi studied in this book, except for the simpler system of universal algebra used in Chapter 3. The main topics of this chapter are:

• Presentation of context-sensitive syntax using typing axioms and inference rules.

• The equational proof system (axiomatic semantics) and reduction system (operational semantics), alone and in combination with additional equational axioms or rewrite rules.

• Henkin models (denotational semantics) of typed lambda calculus and discussion of general soundness and completeness theorems.

Since Chapter 2 describes the connections between typed lambda calculus and programming in some detail, this chapter is more concerned with mathematical properties of the system. We discuss three examples of models, set-theoretic models, domain-theoretic models using partial orders, and models based on traditional recursive function theory, but leave full development of domain-theoretic and recursion-theoretic models to Chapter 5. The proofs of some of the theorems stated or used in this chapter are postponed to Chapter 8. Although versions of typed lambda calculus with product, sum, and other types described in Section 2.6 are useful for programming, the pure typed lambda calculus with only function types is often used in this chapter to illustrate the main concepts.

Church developed lambda calculus in the 1930's as part of a formal logic and as a formalism for defining computable functions [Chu36, Chu41]. The original logic, which was untyped, was intended to serve a foundational role in the formalization of mathematics. The reason that lambda calculus is relevant to logic is that logical quantifiers (such as universal, ∀, and existential, ∃, quantifiers) are binding operators and their usual proof rules involve substitution. Using lambda abstraction, logical quantifiers may be treated as constants, in much the same way that recursive function definitions may be treated using fixed-point constants (see Example 4.4.7 and associated exercises). However, a paradox was discovered in logic based on untyped lambda calculus [KR35]. This led to the development of typed lambda calculus, as part of a consistent, typed higher-order logic [Hen50]. In 1936, Kleene proved that the natural number functions definable in untyped lambda calculus are exactly the recursive functions. Turing, in 1937, then proved that the λ-definable numeric functions are exactly the functions computable by Turing machines


[Tur37]. Together, these results, which are analogous to properties of PCF considered in Sections 2.5.4 and 2.5.5, establish connections between lambda calculus and other models of computation. After the development of electronic computers, untyped lambda calculus clearly influenced the development of Lisp in the late 1950's [McC60]. (See [McC78, Sto91b] for many interesting historical comments.) General connections between lambda notation and other programming languages were elaborated throughout the 1960's by various researchers [BG66, Lan63, Lan65, Lan66], leading to the view that the semantics of programming languages could be given using extensions of lambda calculus [Str66, MS76, Sto77]. Although untyped lambda calculus provides a simpler execution model, a modern view is that typed lambda calculus leads to a more insightful analysis of programming languages. Using typed lambda calculi with type structures that correspond to types in programming languages, we may faithfully model the power and limitations of typed programming languages.

Like many-sorted algebra and first-order logic, typed lambda calculus may be defined using any collection of type constants (corresponding to sorts in algebra) and term constants (corresponding to function symbols). Typical type constants in programming languages are numeric types and booleans, as in PCF, as well as characters, strings, and other sorts that appear in the algebraic datatypes of Chapter 3. In typed lambda calculus, the term constants may have any type. Typical constants in programming languages include 3, + and if ... then ... else ..., as in PCF. As in algebra, the set of type constants and term constants are given by a signature. The axiom system (axiomatic semantics), reduction rules (operational semantics) and model theory (denotational semantics) are all formulated in a way that is applicable to any signature.

Exercise 4.1.1 In the untyped logical system Church proposed in 1932, Russell's paradox (see Section 1.6) results in the untyped lambda term

(λx. not(xx))(λx. not(xx))

This may be explained as follows. If we identify sets with predicates, and assume that x is a set, then the formula xx is true if x is an element of x, and not(xx) is true if x is not an element of x. Therefore the predicate (λx. not(xx)) defines the set of all sets that are not members of themselves. Russell's paradox asks whether this set is a member of itself, resulting in the application above. Show that if this term is equal to false, then it is also equal to true, and conversely. You may assume that not true = false and not false = true. (Church did not overlook this problem. He attempted to avoid inconsistency by not giving either truth value to this paradoxical term.)
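One reduction step of this term can be checked mechanically. The sketch below is an illustration, not part of the exercise; the tuple encoding of lambda terms is ad hoc. It performs a single beta step and confirms that the term rewrites to not applied to itself, which is the source of the paradox:

```python
# Lambda terms as nested tuples: ("lam", var, body), ("app", f, a),
# ("not", t), or a variable name (a string).
OMEGA_HALF = ("lam", "x", ("not", ("app", "x", "x")))
RUSSELL = ("app", OMEGA_HALF, OMEGA_HALF)

def subst(t, var, val):
    """Capture-naive substitution [val/var]t (enough for this closed term)."""
    if t == var:
        return val
    if isinstance(t, tuple):
        if t[0] == "lam":
            return t if t[1] == var else ("lam", t[1], subst(t[2], var, val))
        return (t[0],) + tuple(subst(s, var, val) for s in t[1:])
    return t

def beta(t):
    """Contract the top-level redex, if any."""
    if isinstance(t, tuple) and t[0] == "app" \
            and isinstance(t[1], tuple) and t[1][0] == "lam":
        return subst(t[1][2], t[1][1], t[2])
    return t

# One step: RUSSELL -> not(RUSSELL), so the term is provably equal to
# its own negation in the equational theory.
assert beta(RUSSELL) == ("not", RUSSELL)
```

Since the term equals its own negation, assuming it equal to either truth value forces it to equal the other, which is exactly what the exercise asks you to show.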

4.2 Types

4.2.1 Syntax

We will use the phrase "simply-typed lambda calculus" to refer to any version of typed lambda calculus whose types do not contain type variables. The standard type forms that occur in simply-typed lambda calculus are function types, cartesian products, sums (disjoint unions) and initial ("null" or "empty") and terminal ("trivial" or "one-element") types. We do not consider recursive types, which were introduced in Section 2.6.3 and are considered again in Section 7.4, a form of simple type. (Recursive types contain type variables but a type variable alone is not a type, so this could be considered a borderline case.) Since lambda calculus would not be lambda calculus without lambda, all versions of simply-typed lambda calculus include function types. The type expressions of full simply-typed lambda calculus are given by the grammar

σ ::= b | null | unit | σ + σ | σ × σ | σ → σ

where b may be any type constant, null is the initial (empty) type and, otherwise, terminal (one-element), sum, product and function types are as in Chapter 2. (Type connectives × and → are used throughout Chapter 2; unit and + appear in Section 2.6.) Recall that when parentheses are omitted, we follow the convention that → associates to the right. In addition, we give × higher precedence than →, and → higher precedence than +, so that

a + b × c → d  is read as  a + ((b × c) → d).
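These precedence and associativity conventions can be made concrete with a small parser. The sketch below is an illustration only (ASCII `*` stands for ×, `->` for →); it builds type expressions as nested tuples, giving × the highest precedence, then → (right-associative), then +:

```python
import re

def parse_type(s):
    toks = re.findall(r"->|[+*()]|\w+", s)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat():
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    def atom():
        if peek() == "(":
            eat()
            t = sum_()
            eat()          # closing ")"
            return t
        return eat()       # a type constant such as "a" or "unit"

    def prod():            # x binds tightest
        t = atom()
        while peek() == "*":
            eat()
            t = ("x", t, atom())
        return t

    def arrow():           # -> associates to the right
        t = prod()
        if peek() == "->":
            eat()
            t = ("->", t, arrow())
        return t

    def sum_():            # + binds loosest
        t = arrow()
        while peek() == "+":
            eat()
            t = ("+", t, arrow())
        return t

    return sum_()

# a + b x c -> d parses as a + ((b x c) -> d)
assert parse_type("a + b * c -> d") == ("+", "a", ("->", ("x", "b", "c"), "d"))
```

The right-recursive call in `arrow` is what makes → associate to the right, so `a -> b -> c` parses as `a -> (b -> c)`.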

To keep all the possibilities straight, we will name versions of typed lambda calculus by their types. The simplest language, called simply-typed lambda calculus with function types, will be indicated by the symbol λ→. With sums, products and functions, we have λ+,×,→, and so on. A λ→ type expression is a type expression containing only → and type constants. A λ×,→ type expression is a type expression containing only ×, → and type constants, and similarly for other fragments of simply-typed lambda calculus.

4.2.2 Interpretation of Types

There are two general frameworks for describing the denotational semantics of typed lambda calculus, Henkin models and cartesian closed categories. While cartesian closed categories are more general, they are also more abstract and therefore more difficult to understand at first. Therefore, we will concentrate on Henkin models in this chapter and postpone category theory to Chapter 7. In a Henkin model, each type expression is interpreted as a set, the set of values of that type. The most important aspect of a Henkin model is the interpretation of function types, since the interpretation of other types is generally determined by the function types.


Intuitively, σ → τ is the type, or collection, of all functions from σ to τ. However, there are important reasons not to interpret σ → τ as all set-theoretic functions from σ to τ. The easiest to state is that not all set-theoretic functions have fixed points. In the denotational semantics of any typed lambda calculus with a fixed-point constant, the type σ → σ must contain all functions definable in the language, but must not contain any functions that do not have fixed points. There are many reasonable interpretations of function types. A general intuitive guide is to think of σ → τ as containing all functions from σ to τ that have some general and desirable property, such as computability or continuity. To emphasize from the outset that many views are possible, we will briefly summarize three representative interpretations: classical set-theoretic functions, recursive functions coded by Gödel numbers, and continuous functions on complete partial orders (domains). Each will be explained fully later.

In the full set-theoretic interpretation, a type constant b may denote any set A^b, and the function type σ → τ denotes the set A^{σ→τ} of all functions from A^σ to A^τ.

Note that A^{σ→τ} is determined by the sets A^σ and A^τ. There are several forms of recursion-theoretic interpretation, each involving sets with some kind of enumeration function or relation. We will consider a representative framework based on partial enumeration functions. A modest set (A, e_A) is a set A together with a surjective partial function e_A: N → A from the natural numbers to A. Intuitively, the natural number n is the "code" for e_A(n) ∈ A. Since e_A is partial, e_A(n) may not be defined, in which case n is not a code for any element of A. The importance of having codes for every element of A is that we use coding to define the recursive functions on A. Specifically, f: A → B is recursive if there is a recursive map on natural numbers taking any code for a ∈ A to some code for f(a) ∈ B.

In the recursive interpretation, a type constant b may denote any modest set (A^b, e_b), and the function type σ → τ denotes the modest set A^{σ→τ} of all total recursive functions from A^σ to A^τ.

Since every total recursive function from A^σ to A^τ has a Gödel number, we can use any standard numbering of recursive functions to enumerate the elements of the function type A^{σ→τ}. Thus, if we begin with modest sets for each type constant, we can associate a modest set with every functional type. The continuous or domain-theoretic interpretation uses complete partial orders. The primary motivations for this interpretation are to give semantics to recursively defined

4.3 Terms

functions and recursive type definitions. A complete partial order, or CPO, is a partially ordered set (D, ≤) in which every directed subset has a least upper bound.

We will now show that the meaning of a well-typed term does not depend on the typing derivation we use. This is the intent of the following two lemmas. The first requires some special notation for typing derivations. In general, a typing derivation is a tree, with each leaf an instance of a typing axiom, and each internal node (non-leaf) an instance of an inference rule. In the special case that an inference rule has only one premise, uses of this rule result in a linear sequence of proof steps. If we have a derivation ending in a sequence of uses of (add var), this derivation has the form Δ, Δ', where Δ is generally a tree, and Δ' is a sequence of instances of (add var) concatenated onto the end of Δ. We use this notation to state that (add var) does not affect the meanings of terms.

Lemma 4.5.7 Suppose Δ is a derivation of Γ ▷ M: σ and Δ, Δ' is a derivation of Γ' ▷ M: σ such that only rule (add var) appears in Δ'. Then for any η |= Γ', we have

⟦Γ ▷ M: σ⟧η = ⟦Γ' ▷ M: σ⟧η

where the meanings are taken with respect to derivations Δ and Δ, Δ', respectively.

The proof is an easy induction on typing derivations, left to the reader. We can now show that the meanings of "compatible" typings of a term are equal.


Lemma 4.5.8 Suppose Δ and Δ' are derivations of typings Γ ▷ M: σ and Γ' ▷ M: σ, respectively, and that Γ and Γ' give the same type to every x free in M. Then

⟦Γ ▷ M: σ⟧η = ⟦Γ' ▷ M: σ⟧η'

if η and η' agree on the free variables of M, where the meanings are defined using Δ and Δ', respectively.

The proof, which we leave as an exercise, is by induction on the structure of terms. It follows that the meaning of any well-typed term is independent of the typing derivation. This lemma allows us to regard an equation Γ ▷ M = N: σ as an equation between M and N, rather than derivations of typings Γ ▷ M: σ and Γ ▷ N: σ. Lemma 4.5.8 is a simple example of a coherence theorem. In general, a coherence problem arises when we interpret syntactic expressions using some extra information that is not uniquely determined by the expressions themselves. In the case at hand, we are giving meaning to terms, not typing derivations. However, we define the meaning function using extra information provided by typing derivations. Therefore, we must show that the meaning of a term depends only on the term itself, and not the typing derivation. The final lemma about the environment model condition is that it does not depend on the set of term constants in the signature.

Lemma 4.5.9 Let A be an environment model for terms over signature Σ and let Σ' be a signature containing Σ (i.e., containing all the type and term constants of Σ). If we expand A to Σ' by interpreting the additional term constants as any elements of the appropriate types, then we obtain an environment model for terms over Σ'.

Proof This lemma is easily proved using the fact that every constant is equal to some variable in some environment. More specifically, if we want to know that a term Γ ▷ M: σ with constants from Σ' has a meaning in some environment η, we begin by replacing the constants with fresh variables. Then, we choose some environment η' that is identical to η on the free variables of M, giving the new variables the values of the constants they replace. If A is an environment model, then the new term must have a meaning in environment η', and so it is easy to show that the original term must have a meaning in η.

Exercise 4.5.10 Recall that in a full set-theoretic function hierarchy, A^{σ→τ} contains all set-theoretic functions from A^σ to A^τ. Show, by induction on typing derivations, that for any full set-theoretic hierarchy A, typed term Γ ▷ M: σ and any environment η |= Γ, the meaning A⟦Γ ▷ M: σ⟧η is well-defined and an element of A^σ.

Exercise 4.5.11 Recall that in a full set-theoretic function hierarchy, A^{σ→τ} contains all set-theoretic functions from A^σ to A^τ. If there are no term constants, then this frame is

4.5 Henkin Models, Soundness and Completeness

simply a family of sets A = {A^σ} indexed by types. This problem is concerned with the full set-theoretic hierarchy for the signature with one base type b and no term constants, determined by A^b = {0, 1}. It is easy to see that there are two elements of A^b, four elements of A^{b→b} and 4^4 = 256 elements of type A^{(b→b)→(b→b)}.

(a) Calculate the meaning of the term λf: b → b. λx: b. f(fx) in this model using the inductive definition of meaning.

(b) Express your answer to part (a) by naming the four elements of A^{b→b} and stating the result of applying the function defined by λf: b → b. λx: b. f(fx) to each element.

(c) Recall that every pure typed lambda term has a unique normal form. It follows from the soundness theorem for Henkin models that the meaning of any term in any model is the same as the meaning of its normal form. Use these facts about normal forms to determine how many elements of each of the types A^b, A^{b→b} and A^{(b→b)→(b→b)} are the meanings of closed typed lambda terms without term constants.
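Parts (a) and (b) can be checked by brute force, since the model is finite. The sketch below is not part of the exercise; it represents each element of A^{b→b} as the tuple (f(0), f(1)) and applies the term λf: b → b. λx: b. f(fx) to each of the four elements:

```python
from itertools import product

Ab = (0, 1)                               # A^b = {0, 1}
Ab_to_b = list(product(Ab, repeat=2))     # the four elements of A^(b->b)

def twice(f):
    """The function denoted by λf: b→b. λx: b. f(f x), applied to f."""
    return (f[f[0]], f[f[1]])             # x maps to f(f(x)) for x = 0, 1

results = {f: twice(f) for f in Ab_to_b}
assert results[(0, 0)] == (0, 0)          # constant 0 maps to itself
assert results[(1, 1)] == (1, 1)          # constant 1 maps to itself
assert results[(0, 1)] == (0, 1)          # identity maps to itself
assert results[(1, 0)] == (0, 1)          # negation maps to the identity
```

Note that the only element not mapped to itself is the "negation" function (1, 0), since composing it with itself gives the identity.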

Exercise 4.5.12 [Fri75] This exercise is also concerned with the full set-theoretic hierarchy, A, for one type constant, b. We assume that A^b has at least two elements, so that the model is nontrivial. We define a form of higher-order logic with formulas

φ ::= (M = N) | ¬φ | φ ∧ φ | ∀x: σ. φ

where in writing these formulas we assume that M and N have the same type (in some typing context) and that, in the final case, φ is well-formed under the assumption that x is a variable of type σ. Each formula may be interpreted in A in the obvious way. We will use the abbreviation M ↔ N for (M ⊃ N) ∧ (N ⊃ M), and assume abbreviations for other connectives and quantifiers as in Example 4.4.7. We will also use the abbreviations T ≡ λx: b. λy: b. x, F ≡ λx: b. λy: b. y and, for M: σ and N: τ, ⟨M, N⟩ ≡ λx: σ → τ → b. x M N. Show that for every φ there exist terms M and N such that A ⊨ φ ↔ (M = N), by the steps given below.

(a) Show that A ⊨ ∀x: σ. (M = N) iff A ⊨ (λx: σ. M) = (λx: σ. N).

(b) Show that for any M, N, P, Q of appropriate types, A ⊨ (⟨M, N⟩ = ⟨P, Q⟩) ↔ (M = P ∧ N = Q).

(c) We say φ is existential if φ has the form ∃x: σ. (M = N). Show that if φ is existential, then there is an existential ψ with the same free variables such that A ⊨ ψ ↔ ¬φ.

(d) Show that if φ and ψ are existential, then there is an existential θ with FV(θ) ⊆ FV(φ) ∪ FV(ψ) such that A ⊨ θ ↔ (φ ∧ ψ).

Simply-typed Lambda Calculus

288

(e) Show that if φ is existential, there is an existential ψ with the same free variables as ∃x: σ. φ such that A ⊨ ψ ↔ ∃x: σ. φ.

(f) Prove the main claim, using the argument that for any φ there is an existential formula, with the same free variables as φ, equivalent to ¬φ.

4.5.4

Type and Equational Soundness

Since there are two proof systems, one for proving typing assertions and one for equations, there are two forms of soundness for λ→ and other typed lambda calculi. The first, type soundness, is stated precisely in the following lemma.

Lemma 4.5.13 (Type Soundness) Let A be any Henkin model for signature Σ, Γ ▷ M: σ a provable typing assertion, and η an environment for A with η ⊨ Γ. Then A⟦Γ ▷ M: σ⟧η ∈ A^σ.

This has the usual form for a soundness property: if a typing assertion is provable, then it holds semantically. The proof is a straightforward induction on typing derivations, which is omitted. With a little thought, we may interpret Lemma 4.5.13 as saying that well-typed terms do not contain type errors. In a general framework for analyzing type errors, we would expect to identify certain function applications as erroneous. For example, in the signature for PCF, we would say that an application of + to two arguments would produce a type error if either of the arguments is not a natural number. We will use this example error to show how Lemma 4.5.13 rules out errors. Suppose we have a signature that gives addition the type +: nat → nat → nat and a Henkin model A that interprets + as a binary function on A^nat. We consider any application of + to arguments that are not elements of A^nat an error. It is easy to see that the only way to derive a typing for an application + M N is to give M and N type nat. By Lemma 4.5.13, if we can prove typing assertions Γ ▷ M: nat and Γ ▷ N: nat, then in any Henkin model the semantic meaning of M and N will be an element of the set of semantic values for natural numbers. Therefore, the meaning of a syntactically well-typed application + M N will be determined by applying the addition function to two elements of A^nat. It follows that no type error arises in the semantic interpretation of any well-typed term in this Henkin model. To conclude that no type errors arise in a specific operational semantics, we may prove an equivalence between the denotational and operational semantics, or give a separate operational proof. The standard statement of type soundness for an operational semantics given by reduction is Lemma 4.4.16. Before proving equational soundness, we state some important facts about variables and
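The argument above can be made concrete with a toy interpreter; the term representation and function names below are invented for illustration. The type checker rejects any application of + to non-nat arguments, so evaluation of a well-typed term never reaches the run-time error branch:

```python
# Terms: ("num", n), ("bool", b), ("add", M, N), ("if", B, M, N).
def typecheck(t):
    tag = t[0]
    if tag == "num":
        return "nat"
    if tag == "bool":
        return "bool"
    if tag == "add":
        if typecheck(t[1]) == "nat" and typecheck(t[2]) == "nat":
            return "nat"                      # the only typing rule for +
        raise TypeError("+ applied to non-nat arguments")
    if tag == "if":
        if typecheck(t[1]) == "bool" and typecheck(t[2]) == typecheck(t[3]):
            return typecheck(t[2])
        raise TypeError("ill-typed conditional")

def evaluate(t):
    tag = t[0]
    if tag in ("num", "bool"):
        return t[1]
    if tag == "add":
        m, n = evaluate(t[1]), evaluate(t[2])
        if type(m) is not int or type(n) is not int:
            raise RuntimeError("run-time type error")  # unreachable for typed terms
        return m + n
    if tag == "if":
        return evaluate(t[2]) if evaluate(t[1]) else evaluate(t[3])

good = ("add", ("num", 2), ("if", ("bool", True), ("num", 3), ("num", 4)))
typecheck(good)          # accepted, so evaluation cannot raise the error above
print(evaluate(good))    # 5
```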


the meanings of terms. The most important of these is the substitution lemma, which is analogous to the substitution lemma for algebra in Chapter 3. A simple, preliminary fact is that the meaning of Γ ▷ M: σ in environment η can only depend on η(y) if y is free in M.

Lemma 4.5.14 [Free Variables] Suppose η1, η2 ⊨ Γ are environments for A such that η1(x) = η2(x) for every x ∈ FV(M). Then A⟦Γ ▷ M: σ⟧η1 = A⟦Γ ▷ M: σ⟧η2.

The proof is a straightforward inductive argument, following the pattern of the proof of Lemma 4.3.6. Since substitution involves renaming of bound variables, we will prove that the names of bound variables do not affect the meaning of a term. To make the inductive proof go through, we need a stronger inductive hypothesis that includes renaming free variables and setting the environment appropriately.

Lemma 4.5.15 Let Γ = {x1: σ1, ..., xk: σk} and suppose Γ ▷ M: σ is a well-typed term. Let Γ' = {y1: σ1, ..., yk: σk} and let N be any term that is α-equivalent to [y1, ..., yk / x1, ..., xk]M. If η'(yi) = η(xi), for 1 ≤ i ≤ k, then A⟦Γ ▷ M: σ⟧η = A⟦Γ' ▷ N: σ⟧η'.

A⟦Γ ▷ x: σ⟧η = η(x)

A⟦Γ ▷ M N: τ⟧η = App^{σ,τ} (A⟦Γ ▷ M: σ → τ⟧η) (A⟦Γ ▷ N: σ⟧η)

A⟦Γ ▷ ⟨M, N⟩: σ × τ⟧η = the unique p ∈ A^{σ×τ} such that Proj1 p = A⟦Γ ▷ M: σ⟧η and Proj2 p = A⟦Γ ▷ N: τ⟧η

Recall that an applicative structure satisfies the environment model condition if the standard inductive clauses define a total meaning function A⟦·⟧· on terms Γ ▷ M: σ and environments η ⊨ Γ. In other words, for every well-typed term Γ ▷ M: σ and every environment η ⊨ Γ, the meaning A⟦Γ ▷ M: σ⟧η ∈ A^σ must exist as defined above.


Combinatory Model Condition As for λ→, the condition that every term has a meaning is equivalent to the existence of combinators, where each combinator is characterized by an equational axiom. To avoid any confusion, the constants for combinators and their types are listed below.

Zero^σ : null → σ
∗ : unit
Inleft^{σ,τ} : σ → (σ + τ)
Inright^{σ,τ} : τ → (σ + τ)
Proj1^{σ,τ} : (σ × τ) → σ
Proj2^{σ,τ} : (σ × τ) → τ

The equational axioms for these constants are the equational axioms of λ^{null,unit,+,×,→}. It is straightforward to extend Theorem 4.5.33 and the associated lemmas in Section 4.5.7 to λ^{null,unit,+,×,→}. Most of the work lies in defining pseudo-abstraction and verifying its properties. However, all of this has been done in Section 4.5.7.

Exercise 4.5.36 Prove Lemma 4.5.3 for λ^{×,→}. Show that if A is a Henkin model, then A is isomorphic to a superset of

Models of Typed Lambda Calculus

5.1

Introduction

This chapter develops two classes of models for typed lambda calculus, one based on partially-ordered structures called domains, and the other based on traditional recursive function theory (or Turing machine computation). The second class of models is introduced using partial recursive functions on the natural numbers, with the main ideas later generalized to a class of structures called partial combinatory algebras. The main topics of this chapter are:

• Domain-theoretic models of typed lambda calculus with fixed-point operators, based on complete partial orders (CPOs).

• Fixed-point induction, a proof method for reasoning about recursive definitions.

• Computational adequacy and full abstraction theorems, relating the operational semantics of PCF (and variants) to denotational semantics over domains.

• Recursion-theoretic models of typed lambda calculus without fixed-point operators, using a class of structures called modest sets.

• Relationship between modest sets and partial equivalence relations over untyped computation structures called partial combinatory algebras.

• Fixed-point operators in recursive function models.

In the sections on domain-theoretic models, we are primarily concerned with the simplest class of domains, called complete partial orders, or CPOs for short. The main motivation for studying domains is that they provide a class of models with fixed-point operators and methods for interpreting recursive type expressions. Our focus in this chapter is on fixed points of functions, with recursive types considered in Section 7.4. Since fixed-point operators are essential for recursion, and recursion is central to computation, domains are widely studied in the literature on programming language mathematics. A current trend in the semantics of computation is towards structures that are related to traditional recursive function theory. One reason for this trend is that recursive function models give a pleasant and straightforward interpretation of polymorphism and subtyping (studied in Chapters 9 and 10, respectively). Another reason for developing recursion-theoretic models is the close tie with constructive logic and other theories of computation. In particular, complexity theory and traditional studies of computability are based on recursive functions characterized by Turing machines or other computation models. It remains to be seen, from further research, how central the ideas from domain theory are for models based on recursive function theory. Some indication of the relationships is discussed in Section 5.6.4.


Both classes of structures are presented primarily as λ→ models, leaving the interpretation of sums (disjoint union) and other types to the exercises. The proofs of some of the theorems stated or used in this chapter are postponed to Chapter 8.

5.2 Domain-theoretic Models and Fixed Points

5.2.1 Recursive Definitions and Fixed Point Operators

Before defining complete partial orders and investigating their properties, we motivate the central properties of domains using an intuitive discussion of recursive definitions. Recall from Section 2.2.5 that if we wish to add a recursive definition form

letrec f: σ = M in N

to typed lambda calculus, it suffices to add a fixed-point operator fix_σ that returns a fixed point of any function from σ to σ. Therefore, we study a class of models for λ→ and extensions, in which functions of the appropriate type have fixed points. A specific example will be a model (denotational semantics) of PCF. We will use properties of fix reduction to motivate the semantic interpretation of fix. To emphasize a few points about (fix) reduction, we will review the factorial example from Section 2.2.5. In this example, we assume reduction rules for conditional, equality test, subtraction and multiplication. The factorial function may be written fact ≡ fix F, where F is the expression

λf: nat → nat. λy: nat. if Eq? y 0 then 1 else y * f(y - 1)

Since the type is clear from context, we will drop the subscript from fix. To compute fact n, we expand the definition and use reduction to obtain the following.

fact n = (fix F) n
       = F (fix F) n
       = (λy: nat. if Eq? y 0 then 1 else y * (fix F)(y - 1)) n
       = if Eq? n 0 then 1 else n * (fix F)(n - 1)

When n = 0, we can use the reduction axiom for conditional to simplify fact 0 to 1. For n > 0, we can simplify the test to obtain n * (fix F)(n - 1), and continue as above. For any numeral n, we will eventually reduce fact n to the numeral for n!. An alternative approach to understanding fact is to consider the finite expansions of


fix F. To make this as intuitive as possible, let us temporarily think of nat → nat as a collection of partial functions on the natural numbers, represented by sets of ordered pairs. Using a constant diverge for the "nowhere defined" function (the empty set of ordered pairs), we let the "zero-th expansion" fix^[0] F = diverge and define

fix^[n+1] F = F(fix^[n] F)

Intuitively, fix^[n] F describes the recursive function computed using at most n evaluations of the body of F. Or, put another way, fix^[n] F is the best we could do with a machine having such limited memory that allocating space for more than n function calls would overflow the run-time stack. For example, we can see that (fix^[2] F) 0 = 1 and (fix^[2] F) 1 = 1, but (fix^[2] F) n is undefined for n ≥ 2. Viewed as sets of ordered pairs, the finite expansions of fix F are linearly ordered by set-theoretic containment. Specifically, fix^[0] F = ∅ is the least element in this ordering, and each fix^[n+1] F contains fix^[i] F for all i ≤ n. This reflects the fact that if we are allowed more recursive calls, we may compute factorial for larger natural numbers. In addition, since every terminating computation involving factorial uses only a finite number of recursive calls, it makes intuitive computational sense to let fact = ⋃_n (fix^[n] F); this gives us the standard factorial function. However, a priori, there is no reason to believe that for arbitrary F, ⋃_n (fix^[n] F) is a fixed point of F. This may be guaranteed by imposing relatively natural conditions on F (or the basic functions used to define F). Since any fixed point of F must contain the functions defined by finite expansions of F, ⋃_n (fix^[n] F) will be the least fixed point of F (when we order partial functions by set containment).

In domain-theoretic models of typed lambda calculus, types denote partially-ordered sets of values called domains. Although it is possible to develop domains using partial functions, we will follow the more standard approach of total functions. However, we will alter the interpretation of nat so that nat → nat "contains" the partial functions on the ordinary natural numbers in a straightforward way. This will allow us to define an ordering on total functions that reflects the set-theoretic containment ordering of partial functions. In addition, all functions in domain-theoretic models will be continuous, in a certain sense. This will imply that least fixed-points can be characterized as least upper bounds of countable sets like {fix^[n] F | n ≥ 0}.

A specific problem related to recursion is the interpretation of terms that define nonterminating computations. Since recursion allows us to write expressions that have no normal form, we must give meaning to expressions that do not seem to define any standard value. For example, it is impossible to simplify the PCF expression

letrec f: nat → nat = λx: nat. f(x + 1) in f 3

to a numeral, even though the type of this expression is nat. It is sensible to give this term type nat, since it is clear from the typing rules that if any term of this form were to have a normal form, the result would be a natural number. Rather than say that the value of this expression is undefined, which would make the meaning function on models a partial function, we extend the domain of "natural numbers" to include an additional value ⊥_nat to represent nonterminating computations of type nat. This gives us a way to represent partial functions as total ones, since we may view any partial numeric function as a function into the domain of natural numbers with ⊥_nat added. The ordering of a domain is intended to characterize what might be called "information content" or "degree of definedness." Since a nonterminating computation is less informative than any terminating computation, ⊥_nat will be the least element in the ordering on the domain of natural numbers. We order nat → nat point-wise, which gives rise to an ordering that strongly resembles the containment ordering on partial functions. For example, since the constant function λx: nat. ⊥_nat produces the least element from any argument, it will be the least element of nat → nat. Functions that are defined on some arguments, and intuitively "undefined" elsewhere, will be greater than the least element of the domain, but less than functions that are defined on more arguments. By requiring that every function be continuous with respect to the ordering, we may interpret fix as the least fixed-point functional. For continuous F, the least fixed point will be the least upper bound of all finite expansions fix^[n] F.
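The finite expansions fix^[n] F are easy to tabulate if partial functions are represented as Python dictionaries; this is only an illustrative sketch, and `expand` and `F_body` are invented names. Each pass applies the body of F wherever the previous expansion is defined, so the dictionaries grow as a chain ordered by containment.

```python
def F_body(f, y):
    """One unfolding of the factorial functional; f is a dict (partial function)."""
    if y == 0:
        return 1
    if y - 1 in f:
        return y * f[y - 1]
    return None                       # recursive call undefined: diverge

def expand(n, max_arg=10):
    """fix^[n] F as a finite set of ordered pairs, for arguments up to max_arg."""
    f = {}                            # fix^[0] F = diverge (empty partial function)
    for _ in range(n):
        g = {}
        for y in range(max_arg + 1):
            v = F_body(f, y)
            if v is not None:
                g[y] = v
        f = g
    return f

print(expand(1))                      # {0: 1}
print(expand(3))                      # {0: 1, 1: 1, 2: 2}
assert set(expand(2)) <= set(expand(3))   # the expansions form a chain
```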

F. 5.2.2

Complete Partial Orders, Lifting and Cartesian Products

Many families of ordered structures with the general properties described above are called domains. We will focus on the complete partial orders, since most families of domains are obtained from these by imposing additional conditions. In this section, we define complete partial orders and investigate two operations on complete partial orders, lifting and cartesian product. Lifting is useful for building domains with least elements, an important step in finding fixed points, and for relating total and partial functions. Cartesian products of CPOs are used to interpret product types. The definition of complete partial order requires subsidiary definitions of partial order and directed set. A partial order (D, ≤_D) is a set D together with a reflexive, antisymmetric and transitive relation ≤_D on D.

Exercise 5.2.17 If f: D1 × ... × Dn → E, then there are two ways that we might think of f being continuous. The first is that we could fix n - 1 arguments, and ask whether the resulting unary function is continuous from Di to E. The second is that since the product D1 × ... × Dn of CPOs is a CPO, we can apply the general definition of continuous function from one CPO to another. Show that these two coincide, i.e., a function f: D1 × ... × Dn → E is continuous from the product CPO D1 × ... × Dn to E iff each unary function obtained by holding n - 1 arguments fixed is continuous.

Exercise 5.2.18

Prove the four parts of Lemma 5.2.11 as parts (a), (b), (c) and (d) of this

exercise.

Exercise 5.2.19 Show that the parallel-or function described in Section 2.5.6 is continuous. More specifically, show that there is a continuous function por: B⊥ → B⊥ → B⊥ satisfying

por true x = true
por x true = true
por false false = false

by defining the function completely and proving that it is continuous. You may use the names for functions in B⊥ → B⊥ given in Example 5.2.9.
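One way to get a feel for this exercise is to tabulate por over the lifted booleans, with Python's None standing in for ⊥. Since B⊥ is finite, continuity reduces to monotonicity, which can be checked exhaustively. This sketch is a numerical check, not a solution to the exercise; the names are invented.

```python
BOT = None                      # bottom element of the lifted booleans
B_lift = [BOT, True, False]

def leq(a, b):
    """Ordering on the flat CPO B_bot: BOT is below everything else."""
    return a is BOT or a == b

def por(x, y):
    """Parallel-or: true if either argument is true, even if the other is BOT."""
    if x is True or y is True:
        return True
    if x is False and y is False:
        return False
    return BOT

assert por(True, BOT) is True and por(BOT, True) is True
assert por(False, False) is False
assert por(BOT, False) is BOT

# Monotonicity: x <= x' and y <= y' implies por(x, y) <= por(x', y').
for x in B_lift:
    for y in B_lift:
        for x2 in B_lift:
            for y2 in B_lift:
                if leq(x, x2) and leq(y, y2):
                    assert leq(por(x, y), por(x2, y2))
```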


5.2.4

Fixed Points and the Full Continuous Hierarchy

Our current motivation for studying domain-theoretic models is to construct Henkin models of typed lambda calculi with fixed-point operators. The type frame A, called the full continuous hierarchy over CPOs (A^{b1}, ≤^{b1}), ..., (A^{bk}, ≤^{bk}), is defined by taking A^{b1}, ..., A^{bk} as base types, and

A^{σ×τ} = A^σ × A^τ, ordered coordinate-wise by ≤^{σ×τ}, and
A^{σ→τ} = all continuous f: (A^σ, ≤^σ) → (A^τ, ≤^τ), ordered point-wise by ≤^{σ→τ}.

By Lemmas 5.2.2 and 5.2.10, each (A^σ, ≤^σ) is a CPO.

Lemma 5.2.20 Let D be a pointed CPO and let f: D → D be continuous. Then f has a least fixed point fix(f) = ⋁{f^n(⊥) | n ≥ 0}. In addition, the map fix is continuous.

Proof Let f: D → D be continuous. Since f is monotonic, and ⊥ ≤ f(⊥), we can see that f^n(⊥) ≤ f^{n+1}(⊥). It follows that {f^n(⊥) | n ≥ 0} is linearly ordered and therefore directed. Let a be the least upper bound a = ⋁{f^n(⊥) | n ≥ 0}. We must show that a is a fixed point of f. By continuity of f,

f(a) = f(⋁{f^n(⊥) | n ≥ 0}) = ⋁{f^{n+1}(⊥) | n ≥ 0}.

But since {f^n(⊥)} and {f^{n+1}(⊥)} have the same least upper bound, we have f(a) = a. To see that a is the least fixed point of f, suppose b = f(b) is any fixed point. Since ⊥ ≤ b, we have f(⊥) ≤ f(b), and similarly f^n(⊥) ≤ f^n(b) for any n ≥ 0. But since b is a fixed point of f, f^n(b) = b. Therefore b is an upper bound for the set {f^n(⊥) | n ≥ 0}. Since the least upper bound of this set is a, it follows that a ≤ b.
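When the chain f^n(⊥) stabilizes after finitely many steps, the construction in this proof is directly executable. A sketch on the powerset CPO (P(N), ⊆), using invented data; the cap at 64 keeps the iteration finite.

```python
def lfp(F, bottom=frozenset()):
    """Least fixed point by Kleene iteration: iterate F from the least
    element until F^n(bottom) = F^(n+1)(bottom), assuming it stabilizes."""
    a = bottom
    while True:
        b = F(a)
        if b == a:
            return a
        a = b

# F(A) = A0 union {f(n) | n in A}, with f(n) = 2n, capped to stay finite.
A0 = frozenset({2})
F = lambda A: A0 | frozenset(2 * n for n in A if 2 * n <= 64)

print(sorted(lfp(F)))   # [2, 4, 8, 16, 32, 64]
```

The result is both a fixed point of F and contained in every other fixed point, matching the lemma.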


In the remainder of this section, we show that the full continuous hierarchy is a Henkin model. The following two lemmas are essential to the inductive proof of the environment model condition. The second is a continuous analogue of the s-m-n theorem for recursive functions.

Lemma 5.2.23 If f: C → (D → E) and g: C → D are continuous, then so is the function App f g given by

(App f g) c = (f c)(g c)   for all c ∈ C.

In addition, the map App is continuous.

Lemma 5.2.24 If f: C × D → E is continuous, then there is a unique continuous function (Curry f): C → (D → E) such that for all c ∈ C and d ∈ D, we have

(Curry f) c d = f(c, d).

In addition, the map Curry is continuous.
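Read purely set-theoretically, ignoring the order structure, App and Curry are ordinary higher-order functions, as this sketch shows; the element names below are invented examples.

```python
def App(f, g):
    """(App f g) c = (f c)(g c): apply f's result at c to g's result at c."""
    return lambda c: f(c)(g(c))

def Curry(f):
    """Turn f: C x D -> E into Curry(f): C -> (D -> E), so Curry(f)(c)(d) = f(c, d)."""
    return lambda c: lambda d: f(c, d)

add = lambda c: lambda d: c + d        # an element of C -> (D -> E)
double = lambda c: 2 * c               # an element of C -> D
print(App(add, double)(5))             # (5) + (2*5) = 15

mul = Curry(lambda c, d: c * d)
print(mul(3)(4))                       # 12
```

The lemmas add the order-theoretic content: both constructions preserve continuity, which is what the inductive proof of the environment model condition needs.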


Proof of Lemma 5.2.23 The monotonicity of App f g and of App itself is left to the reader. We first show that App f g is continuous. If S ⊆ C is directed, then we can make the following calculation:

⋁{(App f g) c | c ∈ S} = ⋁{(f c)(g c) | c ∈ S}
  = (⋁{f c | c ∈ S})(⋁{g c | c ∈ S})   by Lemma 5.2.11c
  = (f(⋁S))(g(⋁S))   by continuity of f, g
  = (App f g)(⋁S)

To show that App itself is continuous, suppose S1 ⊆ (C → D → E) and S2 ⊆ (C → D) are directed. We can check continuity by this calculation:

⋁{App f g | f ∈ S1, g ∈ S2} = c ↦ ⋁{(f c)(g c) | f ∈ S1, g ∈ S2}   by Lemma 5.2.10
  = c ↦ (⋁{f c | f ∈ S1})(⋁{g c | g ∈ S2})   by Lemma 5.2.11c
  = c ↦ ((⋁S1) c)((⋁S2) c)   by Lemma 5.2.11c
  = App (⋁S1) (⋁S2)

Proof of Lemma 5.2.24 As in the proof of Lemma 5.2.23, we leave monotonicity to the reader. To show that Curry f is continuous, we assume that S ⊆ C is directed. Using similar reasoning as in the proof of Lemma 5.2.23, we have the following continuity calculation:

⋁{(Curry f) c | c ∈ S} = ⋁{d ↦ f(c, d) | c ∈ S}
  = d ↦ ⋁{f(c, d) | c ∈ S}   by Lemma 5.2.10
  = d ↦ f(⋁S, d)   by Lemma 5.2.11
  = (Curry f)(⋁S)

To show that Curry itself is continuous, we assume that S ⊆ (C × D → E) is directed and make a similar calculation:

⋁{Curry f | f ∈ S} = c ↦ d ↦ ⋁{f(c, d) | f ∈ S}   by Lemma 5.2.10
  = c ↦ (d ↦ (⋁S)(c, d))   by Lemma 5.2.11
  = Curry (⋁S)

This proves the lemma.

We now show that the full continuous hierarchy is a Henkin model.

Lemma 5.2.25 The full continuous hierarchy over any collection of CPOs is a Henkin model. In Chapter 8, using logical relations, we will prove that if all base types are infinite, then the proof system of λ^{×,→} is complete for the equations that hold in the full continuous model.

Proof It is easy to see that the full continuous hierarchy for λ^{×,→} is an extensional applicative structure. We prove that the environment model condition is satisfied using an induction hypothesis that is slightly stronger than the theorem. Specifically, in addition to showing that the meaning of every term exists, we show that the meaning is a continuous function of the values assigned to free variables by the environment. We use the assumption that meanings are continuous to show that a function defined by lambda abstraction is continuous. We say A⟦Γ ▷ M: τ⟧ is continuous if, for every x: σ ∈ Γ and η ⊨ Γ, the map d ↦ A⟦Γ ▷ M: τ⟧η[x ↦ d] is a continuous function from A^σ to A^τ. By Exercise 5.2.17, this is the same as saying A⟦Γ ▷ M: τ⟧ is continuous as an n-ary function A^{σ1} × ... × A^{σn} → A^τ, where Γ = {x1: σ1, ..., xn: σn}. Using Lemmas 5.2.24 and 5.2.23, the inductive proof is straightforward. It is easy to see that the meaning A⟦x1: σ1, ..., xn: σn ▷ xi: σi⟧ is continuous, since identity and constant functions are continuous, and similarly for the meaning of a constant symbol. For application, we use Lemma 5.2.23, and for lambda abstraction, Lemma 5.2.24.

We have the following theorem, as a consequence of Lemmas 5.2.20 and 5.2.25.

Theorem 5.2.26 The full continuous hierarchy A over any collection of CPOs is a Henkin model with the property that whenever A^σ is pointed, there is a least fixed point operator fix^σ ∈ A^{(σ→σ)→σ}.


Recall that if we choose pointed CPOs for the type constants, then every A^σ will be pointed, and we will have least-fixed-point operators at all types. We consider a specific case, a CPO model for PCF, in the next section. The extension of Theorem 5.2.26 to sums is described in Exercise 5.2.33.

Exercise 5.2.27 Find a CPO D without least element and a continuous function f: D → D such that f does not have a least fixed point (i.e., the set of fixed points of f does not have a least element).

Exercise 5.2.28 Calculate the least fixed point of the following continuous function f from B⊥ → B⊥ to B⊥ → B⊥, using the identity given in Lemma 5.2.20:

f = λg: B⊥ → B⊥. { ⊥ ↦ ∧{g(true), g(false)}, true ↦ g(true), false ↦ g(false) }

where ∧{g(true), g(false)} is the greatest lower bound of g(true) and g(false). You may want to look at Example 5.2.9 to get a concrete picture of the CPO B⊥ → B⊥.

Exercise 5.2.29 Let S be some set with at least two elements and let S∞ be the set of finite and infinite sequences of elements of S, as in Exercise 5.2.4. This is a CPO with the prefix order. For each of the following functions on S∞, show that the function is continuous and calculate its least fixed point using the identity given in Lemma 5.2.20. We assume a, b ∈ S and write the concatenation of sequences s and s' as s · s'.

(a) f(s) = ab · s, the concatenation of ab and the sequence s.
(b) f(s) = abab · s', where s' is derived from s by replacing every a by b.
(c) f(s) = a · s', where s' is derived from s by replacing every a by ab.

Exercise 5.2.30 Recall from Example 5.2.21 that the collection P(N) of all sets of natural numbers forms a CPO, ordered by set inclusion. This exercise asks you to show that various functions on P(N) are continuous, and to calculate their least fixed points using the identity given in Lemma 5.2.20.

(a) Show that if f: N → N is any function and A0 ⊆ N is any subset, then the function F: P(N) → P(N) defined by

F(A) = A0 ∪ {f(n) | n ∈ A}

is continuous.


(b) Find the least fixed points of the functions determined by the following data, according to the pattern given in part (a).
(i) A0 = {2}, f(n) = 2n
(ii) A0 = {1}, f(n) = the least prime p > n

(c) Show that if f is any function from finite subsets of N to N and A0 ⊆ N is any subset, then the function F: P(N) → P(N) defined by

F(A) = A0 ∪ {f(A') | A' ⊆ A finite}

is continuous.

(d) Find the least fixed points of the functions determined by the following data, according to the pattern given in part (c).
(i) A0 = {2, 4, 6}, f(A) =
(ii) A0 = {3, 5, 7}, f(A) = Π_{n∈A} n

Exercise 5.2.31 If D is a partial order and f: D → D, then a pre-fixed-point of f is an element x ∈ D with f(x) ≤ x.

(a) Show that if D is a CPO (not necessarily pointed) and f: D → D is continuous, then the set of pre-fixed-points of f is a CPO, under the same order as D.

(b) Show that if D is a pointed CPO and f: D → D is continuous, then the set of pre-fixed-points of f is a pointed CPO (under the same order as D) whose least element is the least fixed-point of f.

Exercise 5.2.32 Lemma 5.2.25, showing that the CPOs form a Henkin model, is proved using the environment model condition. An alternative is to use the combinatory model condition and show that the full continuous hierarchy has combinators. Prove that for all types ρ, σ, τ, the combinators K_{σ,τ} and S_{ρ,σ,τ} are continuous.

Exercise 5.2.33

This exercise is about adding sum types to the full continuous hierarchy.


These soundness results show that the denotational semantics given by A_PCF has the minimal properties required of any denotational semantics for PCF. Specifically, if we can prove M = N, or reduce one term to the other, then these two terms will have the same meaning in A_PCF. It follows that if two terms have different meanings, then we cannot prove them equal or reduce one to the other. Therefore, we can use A_PCF to reason about unprovability or the non-existence of reductions. Some additional connections between A_PCF and the operational semantics of PCF are discussed in Sections 5.4.1 and 5.4.2. A weak form of completeness called computational adequacy, proved in Section 5.4.1, allows us to use A_PCF to reason about provability and the existence of reductions. As remarked at


the beginning of this section, it is impossible to have full equational completeness for A_PCF. In the rest of this section and the exercises, we work out the meaning of some expressions in A_PCF and make some comments about the difference between the least fixed point of a function and other fixed points.

Example 5.2.37 Recall that the factorial function may be written fact ≡ fix F, where F is the expression

λf: nat → nat. λy: nat. if Eq? y 0 then 1 else y * f(y - 1).

We will work out the meaning of this closed term assuming that * and - denote multiplication and subtraction functions which are strict extensions of the standard ones. In other words, we assume that subtraction and multiplication have their standard meaning on non-bottom elements of N⊥, but have value ⊥ if either of their arguments is ⊥. Following the definition of fix in a CPO, we can see that the meaning of factorial is the least upper bound of a directed set. Specifically, A_PCF⟦fact⟧ is the least upper bound ⋁{F^n(⊥) | n ≥ 0}, where F = A_PCF⟦F⟧. We will work out the first few cases of F^n(⊥) to understand what this means. Since all of the lambda terms involved are closed, we will dispense with the formality of writing A_PCF⟦·⟧ and use lambda terms as names for elements of the appropriate domains.
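Strict extension is mechanical to model, with None standing in for ⊥_nat; this is an illustrative sketch, and the helper name `strict` is invented.

```python
BOT = None  # stands in for the bottom element of the lifted natural numbers

def strict(op):
    """Strict extension of a binary operation: result is BOT
    whenever either argument is BOT."""
    def ext(m, n):
        if m is BOT or n is BOT:
            return BOT
        return op(m, n)
    return ext

times = strict(lambda m, n: m * n)
minus = strict(lambda m, n: m - n)

assert times(3, 4) == 12
assert times(3, BOT) is BOT and minus(BOT, 1) is BOT
```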

F^0(⊥) = ⊥_{nat→nat}

F^1(⊥) = λy: nat. if Eq? y 0 then 1 else y * (F^0(⊥))(y - 1)
       = λy: nat. if Eq? y 0 then 1 else y * ⊥_{nat→nat}(y - 1)
       = λy: nat. if Eq? y 0 then 1 else ⊥_nat

F^2(⊥) = λy: nat. if Eq? y 0 then 1 else y * (F^1(⊥))(y - 1)
       = λy: nat. if Eq? y 0 then 1 else y * (λx: nat. if Eq? x 0 then 1 else ⊥_nat)(y - 1)
       = λy: nat. if Eq? y 0 then 1 else y * (if Eq? (y - 1) 0 then 1 else ⊥_nat)

Continuing in this way, we can see that F^n(⊥) is the function which computes y! whenever 0 ≤ y ≤ n - 1, and has value ⊥_nat on all other non-bottom arguments.

The full recursive hierarchy for Σ over modest sets (A^{b1}, e_{b1}), (A^{b2}, e_{b2}), ... is an applicative structure defined by

A^{σ×τ} = A^σ × A^τ, enumerated by e_{σ×τ}, and
A^{σ→τ} = all recursive f: (A^σ, e_σ) → (A^τ, e_τ), enumerated by e_{σ→τ},

with App^{σ,τ} f x = f(x), Proj1 ⟨x, y⟩ = x and Proj2 ⟨x, y⟩ = y.

The function Const, choosing Const(c) ∈ A^σ for each c: σ ∈ Σ, may be defined arbitrarily. Note that, technically speaking, an applicative structure is given by a collection of sets, rather than a collection of modest sets. In other words, while partial enumerations are used to define function types, they are not part of the resulting structure. This technical point emphasizes that the meaning of a pure lambda term does not depend on the way any modest set is enumerated. However, when we consider a specific recursive hierarchy, we will assume that a partial enumeration function is given for each type.

Theorem 5.5.10

Every full recursive hierarchy is a Henkin model.

Proof It is easy to see that the full recursive hierarchy for λ→ is an extensional applicative structure. We will prove that the environment model condition is satisfied, using a slightly stronger induction hypothesis than the theorem. Specifically, we will show that the meaning of every term exists and is computable from a finite portion of the environment. We use the assumption that meanings are computable to show that every function defined by lambda abstraction is computable. To state the induction hypothesis precisely, we assume that all variables are numbered in some standard way. To simplify the notation a bit, we will assume that the variables used in any type assignment are numbered consecutively. We say that a sequence ⟨n1, ..., nk⟩ of natural numbers codes environment η on Γ = {x1: σ1, ..., xk: σk} if e_{σi}(ni) = η(xi) for 1 ≤ i ≤ k.

Exercise 6.5.4 Suppose M1, ..., Mk are val terms with x1, ..., xk not occurring in any Mi. Let NE be the conjunction of all inequalities xi ≠ xj for i ≠ j. Intuitively, NE says that all of the location variables are different. Show that for any first-order formula G not containing cont and possibly containing value variables y1, ..., yk, the formula

NE ∧ [M1, ..., Mk / y1, ..., yk]G {x1 := M1; ... ; xk := Mk} [cont x1, ..., cont xk / y1, ..., yk]G

is provable using the assignment axioms, sequencing rule, and rule of consequence.

Exercise 6.5.5 Find a first-order formula F, store s, and environment η mapping each location variable to a distinct location, with the property that the assertion F {x := y} [(cont x)/y]F is not satisfied by s and η.

Exercise 6.5.6

In most logics, substitution is a sound inference rule. In algebra, we have the explicit substitution rule (subst) given in Chapter 3. In lambda calculus, we can achieve the same effect using lambda abstraction and application, as shown in Lemma 4.4.2. In first-order logic, if we can prove φ, then we can prove ∀x. φ by universal generalization and use this to prove [M/x]φ for any term M. However, Floyd-Hoare logic does not have

Before-after Assertions about While Programs

6.5

421

a corresponding substitution property. Prove this by finding a valid partial correctness assertion F {P} G and a substitution [M/x] such that [M/x]F {[M/x]P} [M/x]G is not valid. (Hint: You may use a simple assertion, with P a single assignment, that is used as an example in this section.)

Exercise 6.5.7 This exercise uses the same partial correctness assertions as Exercise 6.5.1, which asks you to use the denotational semantics of while programs to explain why each assertion is valid. Now prove each assertion using the proof rules given in this section. When programs refer to operations on natural numbers, you may assume that these operations are interpreted in the standard way. You do not have to prove any first-order formulas, but be sure the ones you use are true in the intended interpretation.

(a) x ≠ y {x := cont y; x := cont x + 1} cont x > cont y

(b) x ≠ y {if cont x > cont y then skip else swap} cont x ≥ cont y, where swap is the sequence of assignments z := cont x; x := cont y; y := cont z

(c) true {while ¬Eq? (cont x) 0 do x := (cont x) - 1 od} cont x = 0

(d) F {m := 0; n := 100; while cont m < cont n do if Eq? f(cont m) 0 then n := cont m else m := cont m + 1 od} Eq? f(cont m) 0, where F is the formula (∃x: nat)[0 ≤ x ∧ x ≤ 100 ∧ Eq? f(x) 0]

G(a) ,LG(f)

F(f),L b

G(a),

>

v(b)

G(b)

Categories and Recursive Types

456

We can read this commutative diagram as saying that ν translates F's image F(f): F(a) → F(b) of an arrow f: a → b from C into G's image G(f): G(a) → G(b) of the same arrow. A vague but helpful elaboration of this intuition involves thinking of a functor F: C → D as providing a "view" of the category C inside D. To simplify the picture, let us assume that F, G: C → D are functors that are both injective (one-to-one) on objects and arrows. We may think of drawing F by drawing its image (a subcategory of D), with each object and arrow labeled by its pre-image in C. We may draw G similarly. This gives us two subcategories of D labeled in two different ways. A natural transformation from F to G is a "translation" from one picture to another, given by a family of arrows in D. Specifically, for each object a of C, a natural transformation selects an arrow from F(a) to G(a). The conditions of the definition guarantee that this collection of arrows respects the structure of "F's picture of C." In other words, since functors F and G must preserve identities and composition, a natural transformation ν: F → G may be regarded as a translation of any diagram in the image of F. For any pair of categories, C and D, we have the functor category D^C whose objects are functors from C to D, with natural transformations as arrows. The identity natural transformation just maps each object a to the identity arrow id_a. The standard composition of natural transformations μ: F → G and ν: G → H, for functors F, G, H: C → D, is to compose μ(a): F(a) → G(a) with ν(a): G(a) → H(a). This composition is sometimes called "vertical" composition of natural transformations, to distinguish it from another "horizontal" form of composition. We will not use horizontal composition in this book; the definition and examples may be found in standard texts [BW90, Mac71]. A worthwhile exercise on functor categories is 7.3.11, the Yoneda embedding.

Exercise 7.2.8 An object a of category C is initial if, for every object b in C, there is a unique morphism from a to b. An object a of C is terminal if, for every b, there is a unique morphism from b to a.
(a) Show that if a and a' are both initial objects of C, then they are isomorphic.
(b) Show that a is initial in C iff a is terminal in C^op.
(c) Show that if a and a' are both terminal objects of C, then they are isomorphic.
(d) Show that if a is initial and b is terminal in C, and there is an arrow from b to a, then a and b are isomorphic. Show that in this case, all objects of the category are isomorphic.

Exercise 7.2.9 Let C and D be CPOs. Show that if we regard both as categories, as described in Example 7.2.6, the product category C × D is the same as the product CPO, defined in Section 5.2. Show that the functor category D^C is also a partial order. How is it similar to the function CPO C → D and how does it differ?

7.2 Cartesian Closed Categories

Exercise 7.2.10 This exercise asks you to verify properties of the categorical definitions of products and coproducts. Suppose that a, b and c are objects of some category C.
(a) Show that if [a × b] and [a × b]' are products of a and b, then these are isomorphic.
(b) Show that the product of two CPOs is the categorical product (i.e., satisfies the definition given in this section).
(c) Show that in any category with products and a terminal object, unit, there is an isomorphism a × unit ≅ a for each object a.
(d) A quick way to define sums, or coproducts, is to say that d is a coproduct of a and b if d is a product of a and b in C^op. This is an instance of categorical duality, which involves defining dual notions by applying a definition to the opposite category. Write a direct definition of coproduct along the lines of the definition of product in this section.
(e) Show that the sum CPO is the categorical coproduct and describe the categorical coproduct for the category Cpo⊥ of pointed CPOs and strict functions. (Hint: See Exercise 5.2.33.)

Exercise 7.2.11 Show that if [a → b] and [a → b]' are function objects for a and b, then these are isomorphic. Check that function CPOs and modest sets are categorical function objects.

Exercise 7.2.12 Product and function space functors are defined for Cpo in Example 7.2.7. However, these definitions may be generalized to any category with products and function objects. Show that for any such category, these maps are indeed functors by checking that identities and composition are preserved. You may wish to write this out for Cpo first, then look to see what properties of Cpo you have used.

Exercise 7.2.13 Let P = ⟨P, ≤⟩ be a partial order and let F, G: P → Set be two functors into the category of sets.

(a) Show that if a ≤ b and F(b) is empty, then F(a) must also be the empty set.

(b) Show that if ν is a natural transformation from F to G, and G(a) is empty, then F(a) must also be the empty set.

(c) Let O: P → Set be the functor mapping each a ∈ P to a one-element set, {*}. By composition, each natural transformation ν: F → G gives us a way of mapping each natural transformation μ: O → F to a natural transformation ν ∘ μ: O → G. Show, however, that there exist F and G such that there are many natural transformations from F to G, but no natural transformations from O to F or O to G.

Exercise 7.2.14 An equalizer of two arrows f, g: a → b (a parallel pair) consists of an object c and an arrow h: c → a with f ∘ h = g ∘ h, such that for any h': c' → a with f ∘ h' = g ∘ h', there is a unique arrow c' → c forming a commuting triangle with h and h'. In Set, we may let the equalizer of f, g: a → b be the subset c = {x ∈ a | f(x) = g(x)} of a, together with the function c → a mapping each element of c to the same element in a. (This function is called the inclusion of c into a.)
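The Set construction just described is easy to sketch in Python; the particular set and functions below are made-up examples, not taken from the text.

```python
# Equalizer in Set: for f, g: a -> b, take c = {x in a | f(x) = g(x)}
# together with the inclusion of c into a.

def equalizer(a, f, g):
    c = [x for x in a if f(x) == g(x)]    # the subset c of a
    incl = lambda x: x                    # inclusion c -> a
    return c, incl

a = range(-3, 4)
f = lambda x: x * x
g = lambda x: x + 2        # f(x) = g(x) exactly when x*x = x + 2

c, incl = equalizer(a, f, g)
assert c == [-1, 2]        # the roots of x^2 - x - 2 in a
# f . incl = g . incl on all of c:
assert all(f(incl(x)) == g(incl(x)) for x in c)
```

Any h': c' → a that equalizes f and g lands inside c, so it factors through the inclusion, which is the universal property.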

(a) Show that Cpo has equalizers but Cppo does not. To do this, show that if f, g: A → B are continuous functions on CPOs, then {x ∈ A | f(x) = g(x)} is a CPO (when ordered as in A), but not necessarily pointed even when A and B are pointed.

(b) Show that the category Mod of modest sets and total recursive functions has equalizers.

(c) Show that for any category C, the functor category Set^C has equalizers. If F, G: C → Set are functors (objects of Set^C), with natural transformations μ, ν: F → G, then consider the functor H: C → Set with H(A) = {x ∈ F(A) | μ_A(x) = ν_A(x)} and the inclusion natural transformation ι: H → F mapping each object A of C to the inclusion of

H(A) into F(A). 7.2.3

Definition of Cartesian Closed Category

In this section, we define cartesian closed categories (CCC's), which are the categorical analog to Henkin models. A category theorist might simply say that a cartesian closed category is a category with specified terminal object, products and exponentials (function objects). These three parts may be defined in several ways, differing primarily in the amount of category theory that they require. The word "specified" in this definition means that a CCC is a structure consisting of a category and a specific choice of terminal object, products and function objects, much the way a partial order is a set together with a specific order relation. As a result, there may be more than one way to view a category as a CCC. However, as shown in Exercises 7.2.8, 7.2.10 and 7.2.11, the choice of each part is determined up to isomorphism. Since the concept of CCC is central to the study of categories and lambda calculus, we will give three definitions. Each definition has three parts, requiring terminal objects, products and function objects (respectively), but the way these three requirements are expressed will differ. The first definition is an approximation to a definition using adjunctions. Although adjunctions are among the most useful concepts in category theory, they involve slightly more category theory than will be needed in the rest of this book. Therefore, we consider adjunctions only in Exercise 7.2.25. The second definition is purely equational, while the third uses diagrams from Section 7.2.2 in a manner more common in the category-theoretic literature. It is a useful exercise, for the serious student, to work out the equivalence of these definitions in detail (see Exercises 7.2.21 and 7.2.25).

First definition of CCC A cartesian closed category (CCC) is a category with a specified object unit and specified binary maps × and → on objects such that for all objects a, b and c,

• Hom(a, unit) has exactly one element,

• Hom(a, b × c) ≅ Hom(a, b) × Hom(a, c),

• Hom(a × b, c) ≅ Hom(a, b → c),

where

the first "×" in the second line is ordinary cartesian product of sets, and the isomorphisms must be "natural" in a, b and c. The naturality conditions, which we will not spell out in detail here, may be made precise using the concept of adjunction. (See Exercise 7.2.25, which defines comma category and adjunction in general.) Intuitively, the naturality conditions ensure that these isomorphisms preserve composition in the manner described below. The intuitive idea behind the definition of CCC is that we have a one-element type, unit, and given any two types, we also have their cartesian product and the collection of functions from one to the other. Each of these properties is stated axiomatically, by referring to collections of arrows. Since objects are not really sets, we do not say "unit has only one element" directly. However, the condition that Hom(a, unit) has only one element is equivalent in many categories. For example, in the category of sets, if there is only one map from B to A, for all B, then A must have only one element. In a similar way, the axiom that Hom(a, b × c) is isomorphic to the set of pairs of arrows Hom(a, b) × Hom(a, c) says that b × c is the collection of pairs of elements from b and c. The final axiom, Hom(a × b, c) ≅ Hom(a, b → c), says that the object b → c is essentially the collection of all arrows from b to c. A special case of the axiom which illustrates this point is Hom(unit × b, c) ≅ Hom(unit, b → c). Since unit is intended to be a one-element collection, it should not be surprising that b ≅ unit × b in every CCC. Hence Hom(unit × b, c) is isomorphic to Hom(b, c). From this, the reader may see that the hom-set Hom(b, c) is isomorphic to Hom(unit, b → c). Since any set A is in one-to-one correspondence with the collection of functions from a one-element set into A, the isomorphism Hom(b, c) ≅ Hom(unit, b → c) is a way of saying that the object b → c is a representation of Hom(b, c) inside the category.
The first definition is not a complete definition since it does not state the additional "naturality conditions" precisely. Rather than spell these all out, we consider an example,


motivating a slightly different set of operations that make it easier to define CCC's by a set of equations. The isomorphism involving products, in the definition above, requires a "pairing function"

⟨·, ·⟩: Hom(a, b) × Hom(a, c) → Hom(a, b × c),

and a corresponding inverse function from Hom(a, b × c) to pairs of arrows, for all objects a, b and c. If f ∈ Hom(a, b × c), then we will write π1 f and π2 f for the elements of Hom(a, b) and Hom(a, c) given by this isomorphism. One of the "naturality conditions" is that when f ∘ g ∈ Hom(a, b × c), we must have

    πi(f ∘ g) = (πi f) ∘ g.

This gives us a way of replacing πi, which is a map from one hom-set to another, by an arrow Proji of the category. Specifically, if we let Proji = πi(id_{b×c}), then we have

    Proji ∘ f = πi(id_{b×c} ∘ f) = πi f.

We may also derive an analogous arrow App: (b → c) × b → c, by applying the isomorphism from Hom(a, b → c) to Hom(a × b, c) to the identity on b → c. Using Proj1, Proj2 and App, we may write an equational definition of CCC directly, without any additional naturality conditions.

Second definition of CCC A cartesian closed category is a category with the following extra structure:

• An object unit with unique arrow One_a: a → unit, for each object a,

• A binary object map × such that, for all objects a, b and c, there is a specified function ⟨·, ·⟩: Hom(a, b) × Hom(a, c) → Hom(a, b × c) and specified arrows Proj1: b × c → b and Proj2: b × c → c satisfying

    Proj1 ∘ ⟨f1, f2⟩ = f1,  Proj2 ∘ ⟨f1, f2⟩ = f2  and  ⟨Proj1 ∘ g, Proj2 ∘ g⟩ = g

for all objects a, b, c and arrows f1: a → b, f2: a → c, and g: a → b × c,

• A binary object map → such that, for all objects a, b and c, there is a specified function Curry: Hom(a × b, c) → Hom(a, b → c) and arrow App: (b → c) × b → c satisfying

    App ∘ ⟨Curry(h) ∘ Proj1, Proj2⟩ = h  and  Curry(App ∘ ⟨k ∘ Proj1, Proj2⟩) = k

for all objects a, b, c and arrows h: a × b → c and k: a → (b → c).

If we write f × g for the arrow ⟨f ∘ Proj1, g ∘ Proj2⟩, then we can write the last two equations of the definition above as

    App ∘ (Curry(h) × id) = h  and  Curry(App ∘ (k × id)) = k

for all arrows h and k as described above. If we view categories as two-sorted algebras, then we may write the second definition (above) using equational axioms over the expanded signature with function symbols One, ×, ⟨·, ·⟩, and so on. In the third definition (below), we define CCC's using diagrams instead of equations. In doing so, we will also state that certain arrows are unique; this eliminates the need for some of the equations that appear in the second definition, as the reader may verify in Exercise 7.2.21.

Third definition of CCC A cartesian closed category is a category with the following extra structure:

• An object unit with unique arrow One_a: a → unit for each object a,

• A binary object map × such that, for all objects a, b and c, there is a specified function ⟨·, ·⟩_abc: Hom(a, b) × Hom(a, c) → Hom(a, b × c) and specified arrows Proj1: b × c → b and Proj2: b × c → c, such that for every f1: a → b and f2: a → c, the arrow ⟨f1, f2⟩: a → b × c is the unique g satisfying Proj1 ∘ g = f1 and Proj2 ∘ g = f2,

• A binary object map → such that, for all objects a, b and c, there is a specified function Curry: Hom(a × b, c) → Hom(a, b → c) and arrow App: (b → c) × b → c, such that for every h: a × b → c, the arrow Curry(h): a → (b → c) is the unique k satisfying App ∘ (k × id) = h.


Example 7.2.15 The categories Set of sets, Mod of modest sets, and Cpo of CPOs are all cartesian closed, with unit, a × b and a → b all interpreted as in the Henkin models based on sets (the full classical hierarchy), modest sets and CPOs.

Example 7.2.16 For any typed lambda calculus signature Σ and Σ-Henkin model A, the category C_A whose objects are the type expressions over Σ and whose arrows from σ to τ are the elements of A^{σ→τ} is cartesian closed. The terminal object of C_A is the type unit, the product of objects σ and τ is σ × τ, and the function object for σ and τ is σ → τ. Writing elements of the model as constants (i.e., using the signature in which there is a constant for each element of the model), we let

    One_σ = [[λx: σ. *]]

    App^{σ,τ} = [[λp: (σ → τ) × σ. (Proj1 p)(Proj2 p)]]

and similarly for the pairing ⟨f, g⟩ and the projections. Exercise 7.2.23 asks you to verify that the equations given in the second definition of CCC are satisfied.

.

Example 7.2.17 If C is any small category (i.e., the collections of objects and arrows are sets), then the functor category Set^C is cartesian closed. Since we discuss the special case where C is a partial order, in connection with Kripke lambda models in Section 7.3, we will not explain the cartesian closed structure of Set^C here. However, it is worth mentioning that there exist cartesian closed categories of functors that are not equivalent to any category obtained by regarding a Henkin model as a CCC.

Example 7.2.18

The category Cat of small categories and functors is cartesian closed (but not a small category). The terminal object of Cat is the category with one object and one arrow (which is the identity on this object). The product in Cat is the product operation on categories, and the function object C → D for categories C and D is the functor category D^C.

.

Example 7.2.19 For any signature Σ and set E of equations between Σ-terms, we may construct a CCC C_Σ(E) called the term category generated by E. The objects of C_Σ(E) are the types over Σ and the arrows are equivalence classes of terms,


modulo provable equality from E. To eliminate problems having to do with the names of free variables, it is simplest to choose one variable of each type, and define the arrows from σ to τ using terms over the chosen free variable of type σ. We will write [z: σ ▷ M: τ]_E for the arrow of C_Σ(E) given by the term z: σ ▷ M: τ, modulo E. In more detail, [z: σ ▷ M: τ]_E is the set of typed terms z: σ ▷ N: τ with M = N provable from E.

The definition of the unique arrow One_σ: σ → unit, and the operations ⟨·, ·⟩, Proji, App and Curry, are straightforward. For example, One, ⟨·, ·⟩ and Curry are defined by

    One_σ = [x: σ ▷ *: unit]

    ⟨[x: σ ▷ M: ρ], [x: σ ▷ N: τ]⟩ = [x: σ ▷ ⟨M, N⟩: ρ × τ]

    Curry^{σ,τ,ρ}([z: σ × τ ▷ M: ρ]) = [x: σ ▷ λy: τ. [⟨x, y⟩/z]M: τ → ρ]

Composition in the term category is defined by substitution, as follows:

    [x: σ ▷ M: τ]_E ; [y: τ ▷ N: ρ]_E = [x: σ ▷ [M/y]N: ρ]_E

We sometimes omit the signature, writing C(E) instead of C_Σ(E), when Σ is either irrelevant or clear from context. Exercise 7.2.24 asks you to verify that this category is cartesian closed. The term category is used to prove completeness in Exercise 7.2.39.

.

An interesting fact about cartesian closed categories that is relevant to lambda calculus with fixed-point operators is that any CCC with coproducts (sums; see Exercise 7.2.10) and a fixed-point operator fix_a: (a → a) → a for every object a is a preorder. This is the categorical version of the statement that if we add sum types to PCF, as described in Section 2.6.2, then we can prove any two terms of the same type equal. The proof for CCC's is essentially the same as for PCF with sums, given in Exercise 2.6.4. We can see how each of the usual categories of CPOs and continuous functions skirts this problem:

Cpo

The category Cpo of CPOs and continuous functions is cartesian closed and has coproducts (Exercise 7.2.10). But if the CPO D is not pointed, then a continuous function f: D → D may not have a fixed point.

Cpo⊥ The category Cpo⊥ of pointed CPOs and strict continuous functions has coproducts (Exercise 7.2.10) and fixed-point operators on every object. However, this category is not cartesian closed.


Cppo The category Cppo of complete, pointed partial orders and (arbitrary) continuous functions is cartesian closed and has fixed point operators on every object but does not have coproducts.
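The fixed-point construction behind these claims (Lemma 5.2.20) — the least upper bound of the chain ⊥ ⊑ f(⊥) ⊑ f(f(⊥)) ⊑ … — can be imitated in Python on a flat pointed CPO, with None standing for ⊥. This is only an illustrative sketch: the iteration bound and equality test stand in for taking a least upper bound, which works here because chains in a flat CPO stabilize.

```python
def fix(f, bottom=None, steps=100):
    # Iterate bottom, f(bottom), f(f(bottom)), ... until the chain is stable.
    x = bottom
    for _ in range(steps):
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt
    return x

# On the flat CPO of naturals lifted with bottom (None), a constant
# function is continuous, and its least fixed point is that constant:
assert fix(lambda x: 42) == 42
# The identity is continuous and already fixes bottom itself:
assert fix(lambda x: x) is None
```

On a CPO without a least element there is no starting point for this chain, which is exactly why Cpo, lacking pointedness, escapes the preorder collapse described above.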

Each of these claims may be verified by process of elimination. For example, we know by Lemma 5.2.20 that Cpo⊥ has fixed points, and Exercise 7.2.10 shows that Cpo⊥ has coproducts. Therefore, Cpo⊥ cannot be cartesian closed. For Cppo, we again have fixed points by Lemma 5.2.20, and can see that Cppo is cartesian closed since the terminal object (one-element CPO) of Cpo is pointed, and product and function space constructions preserve pointedness. The collection of cartesian closed categories forms a category. More precisely, the category CCC is the category whose objects are cartesian closed categories and whose morphisms are functors preserving cartesian closed structure. The morphisms of CCC are called cartesian closed functors, sometimes abbreviated to cc-functors. Another name for these functors that appears in the literature is representation of cartesian closed categories. Since unit, ×, → and the associated maps are specified as part of each CCC, we require that cc-functors preserve these. In more detail, if C and D are CCC's, then a functor F: C → D is a cc-functor if F(unit) = unit, F(a × b) = F(a) × F(b), and similarly for function spaces. In addition, we require F(One_a) = One_{F(a)}, F(Proji) = Proji, F(App) = App, and the operations ⟨·, ·⟩ and Curry must be preserved. For Curry, for example, this means that F(Curry(g)) = Curry(F(g)). In the remainder of this section, we prove two useful identities that hold in all cartesian closed categories. They are

    ⟨f, g⟩ ∘ h = ⟨f ∘ h, g ∘ h⟩

    Curry(f) ∘ h = Curry(f ∘ (h × id))

where f: c → a, g: c → b and h: d → c in the first equation, and f: a × b → c and h: d → a in the second. These equations, which are part of the "naturality conditions" associated with the first definition, may be proved using either the equations given in the second definition or the diagrams given in the third definition. To give an example of reasoning with pictures, we will prove these using diagrams. The first equation may be proved using the diagram


[Diagram: the defining diagram for ⟨f, g⟩, with an added arrow h: d → c at the top; the composites f ∘ h: d → a and g ∘ h: d → b appear alongside ⟨f, g⟩ ∘ h: d → a × b and the projections out of a × b.]

which is obtained from the "defining diagram" for ⟨f, g⟩ by adding an arrow h into c at the top of the picture and drawing both of the compositions with h that appear in the equation. Ignoring the two arrows for f and g, we have a triangle of the form that appears in the third definition of CCC, and in the definition of product object from the previous section. The conditions on this diagram are that for every pair of outside arrows, in this case f ∘ h and g ∘ h, there is a unique arrow ⟨f ∘ h, g ∘ h⟩ from d into a × b that commutes with the projection functions. But since the arrow ⟨f, g⟩ ∘ h, appearing as a path from d to a × b in this picture, also commutes with f ∘ h, g ∘ h and the projection functions, it must be that


    ⟨f, g⟩ ∘ h = ⟨f ∘ h, g ∘ h⟩.

The second equation is proved by similar reasoning, using the diagram

[Diagram: h × id_a: d × a → c × a at the top, Curry(f) × id_a: c × a → (a → b) × a, and App: (a → b) × a → b, with the composite f ∘ (h × id_a): d × a → b as the hypotenuse.]

obtained by assuming an arrow h: d → c and putting h × id_a at the top of the "defining diagram" for function objects. Here the composition of h × id with Curry(f) × id forms the hypotenuse of the new (larger) right triangle. The defining condition for function objects says that given f ∘ (h × id) and App, the unique arrow k: d → (a → b) such that k × id forms a commuting (larger) triangle is k = Curry(f ∘ (h × id)). But since


    (Curry(f) × id) ∘ (h × id) = (Curry(f) ∘ h) × id

also forms a commuting triangle, we must have Curry(f) ∘ h = Curry(f ∘ (h × id)) (see Exercise 7.2.20).
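Both identities can also be tested pointwise in Set, where the categorical operations are ordinary pairing, currying, and products of functions. The arrows f, g, h below are arbitrary illustrative choices.

```python
compose = lambda f, g: (lambda x: f(g(x)))              # f . g
pair    = lambda f, g: (lambda x: (f(x), g(x)))         # <f, g>
cross   = lambda f, g: (lambda p: (f(p[0]), g(p[1])))   # f x g
curry   = lambda f: (lambda x: (lambda y: f((x, y))))   # Curry(f)
ident   = lambda x: x

# First identity: <f, g> . h = <f . h, g . h>, with f, g: c -> _, h: d -> c.
f1, g1, h1 = (lambda c: c + 1), (lambda c: c * 2), (lambda d: d - 3)
for d in range(4):
    assert pair(f1, g1)(h1(d)) == pair(compose(f1, h1), compose(g1, h1))(d)

# Second identity: Curry(f) . h = Curry(f . (h x id)),
# with f: a x b -> c and h: d -> a.
f = lambda p: p[0] * 10 + p[1]
h = lambda d: d + 1
for d in range(3):
    for b in range(3):
        assert curry(f)(h(d))(b) == curry(compose(f, cross(h, ident)))(d)(b)
```

Unwinding the definitions shows both sides of each identity compute the same value at every point, which is what the diagram proofs establish once and for all, in any CCC.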

Exercise 7.2.20 Show that the following implications hold in every CCC, assuming each arrow has an appropriate domain and codomain.
(a) If f ∘ Proj1 = g ∘ Proj1, then f = g. (And similarly if f ∘ Proj2 = g ∘ Proj2.)
(b) If (f × id) ∘ (g × id) = (h × id), then f ∘ g = h. (Hint: Begin by writing k × id as ⟨k ∘ Proj1, Proj2⟩.)

Exercise 7.2.21 Show that the second and third definitions of CCC are equivalent and that either of these implies the isomorphisms given in the first definition of CCC.

Exercise 7.2.22 The product of two CCC's, in CCC, is the ordinary product of two categories. Show that CCC has function objects.

Exercise 7.2.23 The category C_A derived from a Henkin model A is described in Examples 7.2.5 and 7.2.16. Show that for any Henkin model A, the category C_A is cartesian closed, using the second definition of CCC.

Exercise 7.2.24 Show that for any signature Σ and theory E between Σ-terms, the category C_Σ(E) generated by E is cartesian closed. This category is described in Example 7.2.19. First write out the conditions you must check, such as if f: σ → unit in C_Σ(E) then f = One_σ. Then verify each condition.

Exercise 7.2.25 Adjoint situations occur commonly in mathematics and may be used to define terminal object, cartesian product, and function object succinctly. While there are several ways to define adjoint situations, the simplest and easiest to remember uses the comma category construction. If A, B and C are categories, with functors F: A → B and G: C → B, then the comma category (F ↓ G) is defined as follows. (The original notation was (F, G) instead of (F ↓ G).) The objects of (F ↓ G) are triples (a, h, c) where a is an object of A, c is an object of C, and h: F(a) → G(c) is a morphism of B. A morphism (a, h, c) → (a', h', c') in (F ↓ G) is a pair of morphisms (f, g) with f: a → a' in A and


g: c → c' in C such that

    F(a) --F(f)--> F(a')
      |               |
      h               h'
      ↓               ↓
    G(c) --G(g)--> G(c')

commutes (h' ∘ F(f) = G(g) ∘ h).

A useful convention is to use the name of a category as the name of the identity functor on it. For example, we will use A as the name of the identity functor A: A → A. Using comma categories and the notation above, an adjoint situation consists of two categories, A and B, two functors F: A → B and U: B → A, and a functor Q: (F ↓ B) → (A ↓ U) with inverse Q⁻¹, such that the triangle formed by Q and the projections into A × B commutes, where the arrows into the product category A × B are the projections mapping an object (a, h, b) of either comma category to (a, b) and a morphism (f, g) to (f, g). The functors into A × B force Q to map an object (a, h, b) of (F ↓ B) to an object (a, j, b) with the same first and last components, and similarly for Q⁻¹. When such an adjoint situation exists, F is called the left adjoint of U and U the right adjoint of F. The names "F" and "U" come from the traditional examples in algebra where F(a) produces the free algebra over a, and U(b) returns the underlying set of the algebra b.

(a) The terminal category T has one object, *, and one morphism, id_*. A category C has a terminal object iff the functor C → T (there is only one possible functor) has a right adjoint U. Show that Q: (C ↓ U) → (C ↓ T), with its inverse, gives an isomorphism of hom-sets T(*, *) ≅ C(c, U(*)) for every object c of C. Define a terminal object of C and morphisms One from U and the functors involved in this adjoint situation, and show that they have the properties required in either the second or third definition of CCC.


(b) A category C has products iff there is a right adjoint U to the diagonal functor Δ: C → C × C mapping object c to (c, c) and, similarly, morphism f to (f, f). Show how such an adjoint situation gives an isomorphism of hom-sets C(a, b) × C(a, c) ≅ C(a, b × c) for any objects a, b and c of C. Define pairing, projection morphisms (as in the second and third definitions of CCC) and the product functor C × C → C from the functors involved in this adjoint situation and show that they have the required properties.

(c) Assuming a category C has products, which we will write in the usual way, C has function spaces iff, for every object b of C, there is a right adjoint to the functor F_b: C → C with F_b(a) = a × b. Show how such an adjoint situation gives an isomorphism of hom-sets C(a × b, c) ≅ C(a, b → c) for any objects a and c of C. Define currying and application morphisms (as in the second and third definitions of CCC) and the function space functor C → C mapping c to b → c from the functors involved in this adjoint situation and show that they have the required properties. Note that the problem only asks for a covariant functor mapping c to b → c, not a full contra/covariant functor mapping b and c to b → c.

7.2.4 Soundness and the Interpretation of Terms

We may interpret λ^{unit,×,→} terms over any signature in any CCC, once we have chosen an object of the category for each type constant and a morphism for each term constant. More specifically, given some λ^{unit,×,→} signature Σ, and an object for each type constant, we use unit and the object maps × and → to interpret each type expression as some object of the category. Writing [[σ]] for the object named by type expression σ, we must also choose, for each term constant c: σ of Σ, an arrow unit → [[σ]]. We will call a CCC, together with interpretations for type constants and term constants from Σ, a Σ-CCC. This is similar to the terminology we use for algebra in Chapter 3, where we refer to an algebra with an interpretation for symbols from algebraic signature Σ as a Σ-algebra. Although the interpretation of type expressions as objects of a CCC may be clear from the brief description in the preceding paragraph, we give a full inductive definition to eliminate any confusion. Given a λ^{unit,×,→} signature Σ and CCC C, we define the interpretation of type expression σ over Σ in C as follows:

    C[[unit]] = unit,
    C[[b]] = the object given as part of the Σ-CCC C, for each type constant b,
    C[[σ × τ]] = C[[σ]] × C[[τ]],
    C[[σ → τ]] = C[[σ]] → C[[τ]].

As with other interpretation functions, we often omit the name of the category, writing [[σ]]


instead of C[[σ]] when the category C is clear from context. It is also sometimes convenient to omit the double brackets and use type expressions as names for objects of the category. If Γ ▷ M: σ is a typed λ^{unit,×,→} term, then the interpretation of this term in C will be an arrow from the interpretation of the types of its free variables to the semantic type of the term, [[σ]]. This is analogous to the Henkin-model meaning of a term as a map from environments, which give the values of the free variables, to the semantic type of the term. One way of making "the type of the free variables" precise is to write the term with only one free variable, using cartesian products. More precisely, if Γ = {x1: σ1, ..., xk: σk}, we may transform a term Γ ▷ M: σ into an essentially equivalent term y: (σ1 × ... × σk) ▷ M': σ, where M' is obtained from M by replacing each free variable by the appropriate combination of projection functions applied to y. This approach has been used to some extent in the literature. Instead of following it here, we will give a direct interpretation of arbitrary terms that has exactly the same result. We formalize "the type of the free variables" of a term by interpreting each typing context as an object. The interpretation C[[Γ]] of typing context Γ is defined by induction on the length of the context:

    C[[∅]] = unit,
    C[[Γ, x: σ]] = C[[Γ]] × C[[σ]].

The interpretation of a term is then defined by induction on typing derivations:

    C[[Γ ▷ c: σ]] = C(c) ∘ One_{C[[Γ]]}, where C(c): unit → C[[σ]] is given as part of the Σ-CCC,
    C[[Γ, x: σ ▷ x: σ]] = Proj2: C[[Γ]] × C[[σ]] → C[[σ]],
    C[[Γ ▷ *: unit]] = One_{C[[Γ]]},
    C[[Γ ▷ ⟨M, N⟩: σ × τ]] = ⟨C[[Γ ▷ M: σ]], C[[Γ ▷ N: τ]]⟩,
    C[[Γ ▷ Proj1 M: σ]] = Proj1 ∘ C[[Γ ▷ M: σ × τ]],
    C[[Γ ▷ Proj2 M: τ]] = Proj2 ∘ C[[Γ ▷ M: σ × τ]],
    C[[Γ ▷ MN: τ]] = App ∘ ⟨C[[Γ ▷ M: σ → τ]], C[[Γ ▷ N: σ]]⟩,
    C[[Γ ▷ λx: σ. M: σ → τ]] = Curry(C[[Γ, x: σ ▷ M: τ]]),
    C[[Γ1, x: σ, Γ2 ▷ M: τ]] = C[[Γ1, Γ2 ▷ M: τ]] ∘ Proj_f (add var),

where f is the ⟨n, n + 1⟩-function such that (Γ1, x: σ, Γ2)f = Γ1, Γ2, and Proj_f: C[[Γ1, x: σ, Γ2]] → C[[Γ1, Γ2]] is the corresponding combination of projection functions.

This interpretation of terms should seem at least superficially natural, since each term is interpreted using the part of the CCC that is intended for this purpose. For example, we have a terminal object so that we can interpret *, and the morphisms associated with the terminal object are indeed used to interpret *. Similarly, Proj1, Proj2 and ⟨·, ·⟩ are associated with product objects, and these are used directly to interpret term forms associated with product types. While one might expect the interpretation of a variable x: σ ▷ x: σ to be the identity morphism, a morphism unit × σ → σ is used instead, since the interpretation of context x: σ is unit × σ. Perhaps the most interesting aspect of this interpretation of terms is that lambda abstraction is interpreted as Currying. This makes intuitive sense if


we remember that we form Γ ▷ λx: σ. M from Γ, x: σ ▷ M. If we view M as a function of Γ, x: σ, the domain of this function is C[[Γ]] × C[[σ]]. When we change x from a free to a lambda-bound variable, we obtain a term whose meaning depends only on Γ and, for any values of the free variables, determines a function from [[σ]] to the type of M. Thus the passage from Γ, x: σ ▷ M to Γ ▷ λx: σ. M is exactly Currying. The clause for (add var) is discussed below. It is a worthwhile exercise to check that for every typable term, we have C[[Γ ▷ M: σ]] ∈ Hom(C[[Γ]], C[[σ]]).

In words, the meaning of a term F M: a in a cartesian closed category is a morphism from I[FIfl to f[a]j. We have already seen, in Examples 7.2.5 and 7.2.16, that every Henkin model A determines a cartesian closed category CA. It is shown in Exercise 7.2.38 that a in the cartesian closed category CA, dethe meaning of a term Xl: ai xk: termined according to the definition above, corresponds to the meaning of the closed term unit, Xl: ai xk: ak). M in the Henkin model A. Unlike the Henkin model interpretation of (add var), the CCC interpretation of this typing rule is nontrivial. The reason is that this rule changes the context, and therefore changes the "type" of the meaning of the term. Intuitively, since a new variable added to a context must not appear free in the term, the meaning of F1, x: a, F2 M: r is obtained by composing the meaning of F1, F2 M: r with a combination of projection functions that "throw away" the value of free variable x and keep the values of the variables listed in F1, F2. A special case of the clause above (see Exercise 7.2.34) is

    C⟦Γ, x: σ ▷ M : τ⟧ = C⟦Γ ▷ M : τ⟧ ∘ Proj1

We give some intuition for the interpretation of (add var), using this special case, in the following example.

Example 7.2.26  As stated in Lemma 7.2.29 below, the meaning of a constant satisfies the condition

    C⟦Γ ▷ c : τ⟧ = ⟦c⟧ ∘ One_⟦Γ⟧

where ⟦c⟧ : unit → ⟦τ⟧ is given as part of the Σ-CCC. Intuitively, since the meaning of a constant does not depend on the values of any variables, we compose the map ⟦c⟧ : unit → ⟦τ⟧ with the constant function One_⟦Γ⟧ : ⟦Γ⟧ → unit, which does not depend on the value of type ⟦Γ⟧. We will work out a typical example with Γ = {x: σ, y: ρ}. The only way to prove the typing assertion Γ ▷ c : τ, using our typing rules, is to begin with ∅ ▷ c : τ and use (add var) twice. Let us assume we derive x: σ ▷ c : τ first, and then x: σ, y: ρ ▷ c : τ. The meaning of the term in a CCC is

    C⟦x: σ, y: ρ ▷ c : τ⟧ = C⟦x: σ ▷ c : τ⟧ ∘ Proj1
                          = (C⟦∅ ▷ c : τ⟧ ∘ Proj1) ∘ Proj1
                          = (⟦c⟧ ∘ Proj1) ∘ Proj1

By the associativity of composition, we can write this as ⟦c⟧ ∘ h, where

    h = Proj1 ∘ Proj1 = One_{(unit × ⟦σ⟧) × ⟦ρ⟧} : (unit × ⟦σ⟧) × ⟦ρ⟧ → unit,

since the definition of CCC guarantees that there is only one arrow from (unit × ⟦σ⟧) × ⟦ρ⟧ to unit.

Example 7.2.27  The term λx: σ → τ. λy: σ. xy is one of the simplest that requires all but the typing rule for constants. We will work out its meaning and see that it is equal to that of the lambda term for the identity. Working out the meaning of each subterm, we have the following calculation:

    C⟦∅ ▷ λx: σ → τ. λy: σ. xy : (σ → τ) → σ → τ⟧
        = Curry(C⟦x: σ → τ ▷ λy: σ. xy : σ → τ⟧)
        = Curry(Curry(C⟦x: σ → τ, y: σ ▷ xy : τ⟧))
        = Curry(Curry(App ∘ ⟨C⟦x: σ → τ, y: σ ▷ x : σ → τ⟧, C⟦x: σ → τ, y: σ ▷ y : σ⟧⟩))

where

    C⟦x: σ → τ, y: σ ▷ x : σ → τ⟧ = Proj2 ∘ Proj1
    C⟦x: σ → τ, y: σ ▷ y : σ⟧ = Proj2

This gives us the final result

    Curry(Curry(App ∘ ⟨Proj2 ∘ Proj1, Proj2⟩))

If we recognize that sequences of projection functions are used to give the meaning of a variable, as a function of the tuple of variables in the typing context, then we can begin to recognize the form of the term from its interpretation. Specifically, from the way the meaning of the term is written, we can see that the term must have the form λ-abstraction (λ-abstraction (application (variable, variable))).
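The calculation in Example 7.2.27 can be replayed concretely in the CCC of sets, where morphisms are ordinary functions. The sketch below is our own illustration (the Python encoding, including the use of the one-element tuple () for the empty context, is an assumption, not the book's notation): it implements One-style pairing, projections, Curry and App as in the second definition of CCC, and checks pointwise that the meaning of λx: σ → τ. λy: σ. xy agrees with Curry(Proj2), the meaning of the identity term computed below.

```python
# A minimal sketch of categorical combinators interpreted in Set:
# morphisms are Python functions; names follow the text's
# <.,.>, Curry, App, Proj1, Proj2.

def pair(f, g):            # <f, g> : c -> a x b
    return lambda c: (f(c), g(c))

def proj1(p): return p[0]  # Proj1 : a x b -> a
def proj2(p): return p[1]  # Proj2 : a x b -> b

def curry(f):              # Curry(f) : a -> (b -> c), for f : a x b -> c
    return lambda a: (lambda b: f((a, b)))

def app(p):                # App : (b -> c) x b -> c
    return p[0](p[1])

def compose(f, g):         # f o g
    return lambda x: f(g(x))

# Meaning of  |> \x: s->t. \y: s. xy  from Example 7.2.27:
# Curry(Curry(App o <Proj2 o Proj1, Proj2>)); the empty context is
# represented here by the one-element tuple ().
m1 = curry(curry(compose(app, pair(compose(proj2, proj1), proj2))))

# Meaning of  |> \x: s->t. x  :  Curry(Proj2).
m2 = curry(proj2)

# The two meanings agree pointwise (soundness of eta-conversion):
f = lambda n: n + 1
for n in range(5):
    assert m1(())(f)(n) == m2(())(f)(n) == n + 1
```

Function equality cannot be tested directly in Python, so the check is pointwise on sample arguments; in the categorical setting the two morphisms are literally equal, by the η equation discussed below.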


Intuitively, the direct correspondence between the interpretation of a term, expressed using the basic operations given as part of a CCC, and the lambda calculus syntax demonstrates the close connection between the two systems. It is interesting to note that, unlike the Henkin model meaning of a term, the variable names never appear in the calculation. This makes it easy to see that α-conversion does not change the meaning of a term. We can compare the term above with ∅ ▷ λx: σ → τ. x, whose meaning in a CCC is directly calculated as

    C⟦∅ ▷ λx: σ → τ. x : (σ → τ) → (σ → τ)⟧ = Curry(C⟦x: σ → τ ▷ x : σ → τ⟧)
                                             = Curry(Proj2)

where Proj2 : unit × ⟦σ → τ⟧ → ⟦σ → τ⟧.

These two meanings are in fact the same morphism, which may be easily verified using the equations given in the second definition of CCC. Specifically, this is an immediate consequence of the equation

    Curry(App ∘ ⟨k ∘ Proj1, Proj2⟩) = k

which may be regarded as stating the soundness of η-conversion in a CCC. (The other equation associated with function objects in the second definition of CCC similarly ensures soundness of β-conversion.) It is worth noting that in defining the meaning of a term in a CCC, we do not need an extra condition, such as the environment model condition for Henkin models, stating that every typed term must have a meaning. In essence, the definition of CCC is similar to the combinatory form of the Henkin model definition. A CCC is required to have a set of operations which, together, allow us to interpret each term. In fact, we may regard the basic operations One, ⟨·,·⟩, Curry and App as an alternate set of combinators

for typed lambda calculus, with the translation of lambda terms into combinators given by the meaning function above. The use of these categorical combinators is explored in [Cur86] and the references given there. As shown in Exercise 7.2.38, every Henkin model can be regarded as a CCC, with the two meaning functions giving the same interpretation of terms. Therefore, CCCs may be regarded as a generalization of Henkin models. (Further discussion appears in Section 7.2.5.) Since we have generalized the semantics of lambda terms, we must reprove the basic facts about the meanings of terms, such as the substitution lemma and the fact that the meaning of a term depends only on its free variables. Many of these are routine, often using essentially the same arguments as we used for Henkin models in Chapter 4. An important property that is not routinely proved by essentially the same argument is the coherence of the meaning function for CCCs. Specifically, we must show that for any term Γ ▷ M: σ, the meanings we obtain using different proofs of the typing assertion Γ ▷ M: σ are all identical. The reason why this is more complicated for CCCs than for Henkin models is that we have used relatively complicated sequences of projection functions to correctly account for the dependence of a variable on the typing context.

Example 7.2.28  We can see the importance of coherence by working out the interpretation of y: σ ▷ (λx: σ. x)y : σ and comparing it to the β-equivalent term y: σ ▷ y : σ. For the latter,

    C⟦y: σ ▷ y : σ⟧ = Proj2 : unit × ⟦σ⟧ → ⟦σ⟧

There are two ways we might calculate the meaning of y: σ ▷ λx: σ. x. The first is to assume that this term is typed by adding y: σ to the closed term ∅ ▷ λx: σ. x with no free variables. The second involves adding the variable y to the context before doing the lambda abstraction. We work both of these out below.

    C⟦y: σ ▷ λx: σ. x⟧ = C⟦∅ ▷ λx: σ. x⟧ ∘ Proj1
                       = Curry(Proj2) ∘ Proj1

    C⟦y: σ ▷ λx: σ. x⟧ = Curry(C⟦y: σ, x: σ ▷ x : σ⟧)
                       = Curry(Proj2)

If we use the first, then the interpretation of the original term is written

    C⟦y: σ ▷ (λx: σ. x)y⟧ = App ∘ ⟨Curry(Proj2) ∘ Proj1, Proj2⟩

which is equal to Proj2 by the general equation App ∘ ⟨Curry(h) ∘ Proj1, Proj2⟩ = h given in the second definition of CCC. Thus this part of the definition of CCC corresponds directly to the soundness of β-conversion. The reader may verify in Exercise 7.2.35 that the second interpretation of the subterm y: σ ▷ λx: σ. x is equal to the first.

Rather than prove coherence directly, we will prove simultaneously that the meaning of a term does not depend on which typing derivation we use, and that permuting or dropping unnecessary variables from the typing context changes the meaning in a predictable way. By characterizing the effect of permutation, we will be able to treat typing contexts in equations as unordered. This allows us to continue to use unordered typing contexts in the equational proof system. In addition, we will use properties of permuting or dropping variables in the coherence proof.

Lemma 7.2.29  Let C be a Σ-CCC. The meaning function C⟦·⟧ on typed terms over signature Σ has the following properties:

(i) For any m, n-function f and ordered type assignment Γ = x1: σ1, ..., xn: σn of length n such that Γ ▷ M: σ is well-typed and Γf contains all free variables of M, we have

    C⟦Γ ▷ M : σ⟧ = C⟦Γf ▷ M : σ⟧ ∘ Proj_f

where Proj_f : ⟦Γ⟧ → ⟦Γf⟧ is the tuple of projections, determined by f, selecting from ⟦Γ⟧ the value of each variable listed in Γf.

(ii) The meaning function satisfies the defining clauses

    C⟦x1: σ1, ..., xn: σn ▷ xn : σn⟧ = Proj2
    C⟦Γ ▷ c : τ⟧ = ⟦c⟧ ∘ One_⟦Γ⟧
    C⟦Γ ▷ MN : τ⟧ = App ∘ ⟨C⟦Γ ▷ M : σ → τ⟧, C⟦Γ ▷ N : σ⟧⟩
    C⟦Γ ▷ λx: σ. M : σ → τ⟧ = Curry(C⟦Γ, x: σ ▷ M : τ⟧)
    C⟦Γ ▷ ⟨M, N⟩ : σ × τ⟧ = ⟨C⟦Γ ▷ M : σ⟧, C⟦Γ ▷ N : τ⟧⟩
    C⟦Γ ▷ Proj1 M : σ⟧ = Proj1 ∘ C⟦Γ ▷ M : σ × τ⟧
    C⟦Γ ▷ Proj2 M : τ⟧ = Proj2 ∘ C⟦Γ ▷ M : σ × τ⟧

where, in the clause for lambda abstraction, the context may be replaced by Γf for every f such that Γf contains FV(M) but not x.

Proof  We prove conditions (i) and (ii) simultaneously, by a single induction on the structure of terms (not their typing derivations). The general pattern is to prove (ii) first, using the induction hypothesis (i) for simpler terms, then check that (i) holds for the term itself. For a variable, x1: σ1, ..., xn: σn ▷ xi : σi, the only typing derivations begin with axiom (var), followed by n − i uses of the (add var) typing rule. We prove (ii) by a subinduction on the number of times (add var) is used. The base case is

    C⟦x1: σ1, ..., xi: σi ▷ xi : σi⟧ = Proj2

which satisfies the condition of the lemma. For the induction step, suppose x1: σ1, ..., xn: σn ▷ xi : σi is derived by adding the variable xn. Then we have

    C⟦x1: σ1, ..., xn: σn ▷ xi : σi⟧
        = C⟦x1: σ1, ..., x(n−1): σ(n−1) ▷ xi : σi⟧ ∘ Proj1

where Proj1 : (unit × ⟦σ1⟧ × ... × ⟦σ(n−1)⟧) × ⟦σn⟧ → unit × ⟦σ1⟧ × ... × ⟦σ(n−1)⟧, and f is the (n − 1), n-function mapping each j to itself.

7.3  Kripke Lambda Models and Functor Categories

Therefore, we say a Kripke applicative structure A is extensional if, for all f, g ∈ A^{σ→τ}_w,

    f = g  whenever  ∀w′ ≥ w. ∀a ∈ A^σ_{w′}. App_{w′} (i^{σ→τ}_{w ≤ w′} f) a = App_{w′} (i^{σ→τ}_{w ≤ w′} g) a

This can also be stated using the standard interpretation of predicate logic in Kripke structures. However, since this requires extra machinery, we will not go into predicate logic here. The main point that may be of interest to the general reader is that the statement above is exactly the result of interpreting the usual formula for extensionality (not mentioning possible worlds at all) in a Kripke model. There are two ways to identify the extensional applicative structures that have enough elements to be models. As with ordinary applicative structures, the environment and combinatory model definitions are equivalent. Since the combinatory condition is simpler, we will use it here. For all types σ, τ and ρ, we need global elements K_{σ,τ} of type σ → τ → σ and S_{ρ,σ,τ} of type (ρ → σ → τ) → (ρ → σ) → ρ → τ. The equational condition on K_{σ,τ} is that for a ∈ A^σ_w and b ∈ A^τ_w we must have K a b = a. The condition for S is derived from the equational axiom in Section 4.5.7 similarly. We define a Kripke lambda model to be a Kripke applicative structure A that is extensional and has combinators. As mentioned in the last section, the interpretation of each type in a Kripke model is a functor from a partial order of possible worlds into Set. The full Kripke lambda model over any given functors F1, ..., Fk : P → Set is the full subcategory of Set^P whose objects are all functors definable from F1, ..., Fk using →, and whose arrows are all natural transformations between these functors. Since we did not define function objects in Set^P in Example 7.2.17, we do so after the following example.

Example 7.3.2  This example describes part of a Kripke lambda model and shows that A^{σ→τ}_w may be nonempty even if A^σ_w and A^τ_w are both empty. Let P = {0, 1, 2} with 0 ≤ 1 ≤ 2. We are interested in the Kripke lambda model A with a single type constant a, interpreted as the functor F : P → Set with F(0) = ∅, F(1) = {c1, c2} and F(2) = {c}. (This means that A^a_i = F(i) for i ∈ {0, 1, 2}; each transition function is uniquely determined.) A natural transformation ν : F → F must satisfy the diagram

    {c}       --ν(2)-->  {c}
     ↑                    ↑
    {c1, c2}  --ν(1)-->  {c1, c2}
     ↑                    ↑
     ∅        --ν(0)-->   ∅

Since ν(0) and ν(2) are uniquely determined, and any map from {c1, c2} to {c1, c2} will do for ν(1), there are four natural transformations from F to F. Let us call these ν, ν′, ν″ and ν‴.
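The count of four can be checked mechanically. The sketch below is our own illustration (the encoding of the poset, the functor and naturality as Python dictionaries is an assumption): it enumerates all candidate families of maps and keeps those satisfying the naturality squares for the chain 0 ≤ 1 ≤ 2.

```python
from itertools import product

# The functor F from Example 7.3.2 on the poset {0, 1, 2} with 0 <= 1 <= 2:
F = {0: [], 1: ["c1", "c2"], 2: ["c"]}

# Transition functions F(w) -> F(w+1); both are uniquely determined:
# F(0) -> F(1) is the empty map, F(1) -> F(2) is the constant map to c.
step = {0: {}, 1: {"c1": "c", "c2": "c"}}

def all_maps(xs, ys):
    """All functions from list xs to list ys, as dicts."""
    if not xs:
        return [{}]
    return [dict(zip(xs, img)) for img in product(ys, repeat=len(xs))]

def is_natural(nu):
    # Naturality: for each w, step_w(nu(w)(x)) == nu(w+1)(step_w(x)).
    return all(step[w][nu[w][x]] == nu[w + 1][step[w][x]]
               for w in (0, 1) for x in F[w])

candidates = [{0: v0, 1: v1, 2: v2}
              for v0 in all_maps(F[0], F[0])
              for v1 in all_maps(F[1], F[1])
              for v2 in all_maps(F[2], F[2])]
nat = [nu for nu in candidates if is_natural(nu)]
print(len(nat))  # 4: nu(0) and nu(2) are forced, nu(1) ranges over all four maps
```

The enumeration confirms that only the component at world 1 is unconstrained, exactly as argued above.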

The function type a → a should represent all natural transformations from F to F. However, by extensionality, we may only have as many elements of A^{a→a}_w as there are distinct behaviors on the possible worlds that are ≥ w. In particular, A^{a→a}_2 may only have one element, since there is only one possible mapping from F(2) to F(2), and there are no other worlds ≥ 2. Let us write ν|w for the result of restricting ν to worlds ≥ w. In other words, ν|w is the function with domain {w′ | w′ ≥ w} such that (ν|w)(w′) = ν(w′). Using this notation, we may define A^{a→a} by

    A^{a→a}_2 = {ν|2}
    A^{a→a}_1 = {ν|1, ν′|1, ν″|1, ν‴|1}
    A^{a→a}_0 = {ν, ν′, ν″, ν‴}

Note, however, that ν|2 = ν′|2 = ν″|2 = ν‴|2, since there is only one function from F(2) to F(2). Application is given by

    App_w ν x = ν(w)(x),

which is the general definition for any full Kripke lambda model in which the elements of function type are natural transformations. The reader is encouraged to verify that this is an extensional interpretation of a → a, with four elements of A^{a→a}_0, four elements of A^{a→a}_1 and one element of A^{a→a}_2.

As mentioned in Example 7.2.17, the category Set^C of functors from C into Set is cartesian closed, for any small category C. Any functor F with each F(c) a one-element set will be a terminal object of Set^C. The product of functors F, G : C → Set is given by

    (F × G)(c) = F(c) × G(c),

with (F × G)(f) the arrow taking a pair to its images under F(f) and G(f). The function object (F → G) : C → Set may be defined using the Hom functor, with Hom(a, _) : C → Set taking object c to the set Hom(a, c) and acting on morphisms by composition. Using Hom, the functor F → G is defined as follows:

    (F → G)(c) = the set of all natural transformations from Hom(c, _) × F to G

    (F → G)(f : a → b) = the map taking a natural transformation ν : Hom(a, _) × F → G
                         to μ = (F → G)(f)(ν) : Hom(b, _) × F → G

according to the diagram below, for each object c of C:

    Hom(b, c) × F(c)  --μ(c)-->  G(c)
        |                          |
    (_ ∘ f) × id                  id
        ↓                          ↓
    Hom(a, c) × F(c)  --ν(c)-->  G(c)

where the function (_ ∘ f) : Hom(b, c) → Hom(a, c) maps h to h ∘ f. In the special case of Set^P, where P is a partial order, each of the hom-sets has at most one element. This allows us to simplify the definition of F → G considerably. If w is an object of the partial order P, it will be convenient to write P_{≥w} for the objects that are ≥ w in the partial order. If F : P → Set, then we similarly write F_{≥w} for the functor from P_{≥w} into Set obtained by restricting F to P_{≥w}. For F, G : P → Set, we can simplify the definition above as follows.

    (F → G)(w) = the set of all natural transformations from F_{≥w} to G_{≥w}

    (F → G)(w ≤ w′) = the map taking a natural transformation ν : F_{≥w} → G_{≥w}
                      to ν|w′ = (F → G)(w ≤ w′)(ν) : F_{≥w′} → G_{≥w′}

according to the diagram below, for each object w″ ≥ w′ ≥ w in P:

    F(w″) --ν(w″)--> G(w″)
      ↑                ↑
    F(w′) --ν(w′)--> G(w′)

In other words, (F → G)(w ≤ w′)(ν) is identical to ν, restricted to the subcategory P_{≥w′}. Put in the notation of Kripke lambda models, if types σ and τ are interpreted as functors

F, G : P → Set by A^σ_w = F(w) and A^τ_w = G(w), then the interpretation of σ → τ in the full Kripke lambda model is the functor F → G with

    A^{σ→τ}_w = the set of all natural transformations from F_{≥w} to G_{≥w}

and with the transition function from A^{σ→τ}_w to A^{σ→τ}_{w′} the map restricting ν : F_{≥w} → G_{≥w} to ν|w′.

Exercise 7.3.3  Describe A^{(a→a)→(a→a)} in the full Kripke model of Example 7.3.2.

7.3.5  Environments and Meanings of Terms

An environment η for a Kripke applicative structure A is a partial mapping from variables and worlds to elements of A such that

    if ηxw ∈ A^σ_w and w′ ≥ w, then ηxw′ = i^σ_{w ≤ w′}(ηxw)    (env)

Intuitively, an environment η maps a variable x to a "partial element" that may exist (or be defined) at some worlds, but not necessarily all worlds. Since a type may be empty at one world and then nonempty later, we need to have environments such that ηxw is undefined at some w, and then "becomes" defined at a later w′ ≥ w. We will return to this point after defining the meanings of terms. If η is an environment and a ∈ A^σ_w, we write η[x ↦ a] for the environment identical to η on variables other than x, and with

    (η[x ↦ a])xw′ = i^σ_{w ≤ w′}(a)

for all w′ ≥ w. We take (η[x ↦ a])xw′ to be undefined if w′ is not greater than or equal to w. Note that the definition of η[x ↦ a] actually depends on w if the A^σ_w are not disjoint. If η is an environment for applicative structure A, and Γ is a type assignment, we say η satisfies Γ at w, written η, w ⊨ Γ, if ηxw ∈ A^σ_w for every x: σ ∈ Γ. Note that if η, w ⊨ Γ and w′ ≥ w, then η, w′ ⊨ Γ. For any Kripke model A, world w and environment η with η, w ⊨ Γ, we define the meaning A⟦Γ ▷ M: σ⟧ηw of term Γ ▷ M: σ in environment η at world w by induction on typing derivations, as for standard Henkin models. For notational simplicity, we write ⟦·⟧ in place of A⟦·⟧. Since the clause for (add var) is trivial, it is omitted.

    ⟦Γ ▷ x : σ⟧ηw = ηxw

    ⟦Γ ▷ MN : τ⟧ηw = App_w (⟦Γ ▷ M : σ → τ⟧ηw) (⟦Γ ▷ N : σ⟧ηw)

    ⟦Γ ▷ λx: σ. M : σ → τ⟧ηw = the unique d ∈ A^{σ→τ}_w such that for all w′ ≥ w and a ∈ A^σ_{w′},

        App_{w′} (i^{σ→τ}_{w ≤ w′} d) a = ⟦Γ, x: σ ▷ M : τ⟧η[x ↦ a]w′

Combinators and extensionality guarantee that in the Γ ▷ λx: σ. M : σ → τ case, d exists and is unique. This is proved as in the standard Henkin setting, using translation into combinators as in Section 4.5.7 for existence, and extensionality for uniqueness. We say an equation Γ ▷ M = N : σ holds at w and η, written

    η, w ⊨ (Γ ▷ M = N : σ)

if, whenever η, w ⊨ Γ, we have ⟦Γ ▷ M : σ⟧ηw = ⟦Γ ▷ N : σ⟧ηw. A model A satisfies Γ ▷ M = N : σ, written A ⊨ Γ ▷ M = N : σ, if every η and w of A satisfy the equation.

Example 7.3.4  As an application of Kripke models, we will show how to construct a counter-model to an implication that holds over Henkin models. Let E1 and E2 be the equations

    E1:  λx: a. f P1 = λx: a. f P2

and

    E2:  f P1 = f P2,

where f : (a → a → a) → b, P1 ≡ λx: a. λy: a. x and P2 ≡ λx: a. λy: a. y. Exercise 4.5.24 shows how rule (nonempty) can be used to prove E2 from E1, and Exercise 4.5.29 how to prove E2 from E1 using rules (empty I) and (empty E). Since E1 does not semantically imply E2 over Kripke models, it follows that we cannot prove E2 from E1 without using one of these extensions to the proof system. Furthermore, neither (nonempty) nor (empty I) and (empty E) are sound for Kripke lambda models. We must construct a Kripke lambda model A for type constants a and b and term constant f that satisfies E1 but not E2. If there is a global element of type a, then we could apply both functions in E1 to this element and obtain E2. If A^a is empty at all possible worlds, then the argument given in Exercise 4.5.29 shows that E1 implies E2. Therefore, we must let A^a be empty at some world and nonempty at another. Since it suffices to have two possible worlds, we let W = {0, 1} with 0 < 1 and take A^a_0 = ∅. Since we need P1 and P2 to be different, we let A^a_1 have two distinct elements, c1 and c2. Equation E1 will hold at world 0, since A^a_0 is empty. To make f P1 = f P2 hold at world 1, we let A^b_1 have only one element, say d. Without committing to A^b_0, we can see that E1 must hold at both worlds: the two terms of type a → b are equal at world 0 since A^a_0 = ∅, and equal at world 1 since A^b_1 has only one element. We let A^b_0 have two elements, d1 and d2, so that E2 may fail at world 0. Our counter-model is the full Kripke lambda model over the two functors A^a and A^b. As described in Example 7.3.2 and Exercise 7.3.3, the types involved in E2 are nonempty at world 0. If we let f map the meaning of P1 to d1 and the meaning of P2 to d2, then the equation E2 fails at world 0. Since an equation holds in A only if it holds at all possible worlds of A, we can see that A ⊨ E1 but A ⊭ E2.
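The crucial feature of this counter-model is that a function type can be inhabited at a world where its domain type is empty. The sketch below is our own finite check (the dictionary encoding of the two-world model is an assumption): it enumerates the natural transformations interpreting a → b over the worlds {0, 1} of Example 7.3.4 and confirms that this set is nonempty at world 0 even though A^a_0 = ∅.

```python
from itertools import product

# Worlds 0 <= 1; the two base functors from Example 7.3.4 (our encoding):
A_a = {0: [], 1: ["c1", "c2"]}     # type constant a: empty at world 0
A_b = {0: ["d1", "d2"], 1: ["d"]}  # type constant b

# Transition maps A(0) -> A(1): empty for a; for b, both d1 and d2
# must go to the only element d.
step_a = {}
step_b = {"d1": "d", "d2": "d"}

def all_maps(xs, ys):
    if not xs:
        return [{}]
    return [dict(zip(xs, img)) for img in product(ys, repeat=len(xs))]

def nat_trans(F, G, step_F, step_G):
    """Natural transformations F -> G over the two-world poset 0 <= 1."""
    return [(v0, v1)
            for v0, v1 in product(all_maps(F[0], G[0]), all_maps(F[1], G[1]))
            if all(step_G[v0[x]] == v1[step_F[x]] for x in F[0])]

# Elements of type a -> b at world 0 are natural transformations from
# A_a to A_b over all worlds >= 0:
fs = nat_trans(A_a, A_b, step_a, step_b)
print(len(fs))  # 1: nonempty, although A_a is empty at world 0
```

Full interpretations of the higher types (a → a → a) → b would require iterating this construction, but this already exhibits the phenomenon that blocks the argument from E1 to E2.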

Exercise 7.3.5  We may consider classical applicative structures as a special case of Kripke applicative structures, using any partial order. To be specific, let A = ⟨{A^σ}, {App^{σ,τ}}⟩ be a classical applicative structure, i.e., {A^σ} is a collection of sets indexed by types and {App^{σ,τ}} is a collection of application functions. We define the Kripke structure

    A_W = ⟨W, {A^σ_w}, {App^{σ,τ}_w}, {i^σ_{w ≤ w′}}, Const⟩

by taking sets A^σ_w = A^σ, application functions App^{σ,τ}_w = App^{σ,τ}, and each transition function i^σ_{w ≤ w′} to be the identity. In categorical terms, A_W could be called an applicative structure of constant presheaves. Show that if A is a classical lambda model then A_W is a Kripke lambda model, by checking extensionality and using induction on terms to prove that the meaning of a term M in A_W is the same as the meaning of M in A, in any environment.

7.3.6  Soundness and Completeness

We have soundness and completeness theorems for Kripke lambda models.

Lemma 7.3.6 (Soundness)  Let S be a set of well-typed equations. If S ⊢ Γ ▷ M = N : σ, then every model satisfying S also satisfies Γ ▷ M = N : σ.

A direct proof is given as Exercise 7.3.8. An alternate proof may be obtained using the correspondence between Kripke models and CCCs discussed in Section 7.3.7.

Theorem 7.3.7 (Completeness for Kripke Models)  For every lambda theory S, there is a Kripke lambda model satisfying precisely the equations belonging to S.

Proof  The completeness theorem is proved by constructing a term model

    A = ⟨W, {A^σ_Γ}, {App^{σ,τ}_Γ}, {i^σ_{Γ ⊆ Γ′}}, Const⟩

of the following form.

•  W is the poset of finite type assignments Γ, ordered by inclusion. In what follows, we will write Γ for an arbitrary element of W.

•  A^σ_Γ is the set of all [Γ ▷ M : σ], where Γ ▷ M : σ is well-typed, and the equivalence class of Γ ▷ M : σ with respect to S is

       [Γ ▷ M : σ] = {Γ ▷ N : σ | S ⊢ Γ ▷ M = N : σ}

•  i^σ_{Γ ⊆ Γ′}([Γ ▷ M : σ]) = [Γ′ ▷ M : σ] for Γ ⊆ Γ′

It is easy to check that the definition makes sense, and that we have global elements K and S at all appropriate types. For example,

    K_{σ,τ} Γ = [Γ ▷ λx: σ. λy: τ. x]

Since the required properties of K and S are easily verified, the term applicative structure is a Kripke combinatory algebra. The proof of extensionality illustrates the difference between Kripke models and ordinary Henkin models. Suppose that [Γ ▷ M : σ → τ] and [Γ ▷ N : σ → τ] have the same functional behavior, i.e., for all Γ′ ⊇ Γ and Γ′ ▷ P : σ, we have

    [Γ′ ▷ MP : τ] = [Γ′ ▷ NP : τ]

Then, in particular, for Γ′ ≡ Γ, x: σ with x not in Γ, we have

    [Γ, x: σ ▷ Mx : τ] = [Γ, x: σ ▷ Nx : τ]

and so, by rule (ξ) and axiom (η), we have

    [Γ ▷ M : σ → τ] = [Γ ▷ N : σ → τ]

Thus A is a Kripke lambda model. This argument may be compared with the step in the proof of Lemma 8.3.1 showing that if E ⊢ Γ ▷ MN = M′N′ : τ for every N, N′ with E ⊢ Γ ▷ N = N′ : σ, then E ⊢ Γ ▷ M = M′ : σ → τ, using the assumption that H provides infinitely many variables of each type. It remains to show that A satisfies precisely the equations belonging to S. We begin by relating the interpretation of a term to its equivalence class. If Γ is any type assignment, we may define an environment η by

    ηxΓ′ = [Γ′ ▷ x : σ]   if x: σ ∈ Γ ⊆ Γ′
    ηxΓ′ undefined        otherwise

A straightforward induction on terms shows that for any Γ″ ⊇ Γ, we have

    ⟦Γ″ ▷ M : σ⟧ηΓ″ = [Γ″ ▷ M : σ]

In particular, whenever A satisfies an equation Γ ▷ M = N : σ, we have η, Γ ⊨ Γ by construction of η, and so

    [Γ ▷ M : σ] = ⟦Γ ▷ M : σ⟧ηΓ = ⟦Γ ▷ N : σ⟧ηΓ = [Γ ▷ N : σ]

Since this reasoning applies to every Γ, every equation satisfied by A must be provable from S. While it is possible to show that A satisfies every equation in S directly, by similar reasoning, certain complications may be avoided by restricting our attention to closed terms. There is no loss of generality in doing so, since it is easy to prove the closed equation

    ∅ ▷ λx1: σ1. ... λxk: σk. M = λx1: σ1. ... λxk: σk. N

from the equation

    x1: σ1, ..., xk: σk ▷ M = N : τ

between open terms, and vice versa. For any closed equation ∅ ▷ M = N : τ ∈ S, we have

    S ⊢ Γ ▷ M = N : τ

for any Γ, by rule (add var). Therefore, for every world Γ of A, the two equivalence classes [Γ ▷ M : τ] and [Γ ▷ N : τ] will be identical. Since the meaning of ∅ ▷ M : τ in any environment η at world Γ will be [Γ ▷ M : τ], and similarly for ∅ ▷ N : τ, it follows that A satisfies ∅ ▷ M = N : τ. This proves the theorem.

Exercise 7.3.8  Prove Lemma 7.3.6 by induction on equational proofs.

7.3.7  Kripke Lambda Models as Cartesian Closed Categories

In this section, we will see that any Kripke model A with products and a terminal type determines a cartesian closed category C_A. As one would hope, the categorical interpretation of a term in C_A coincides with the meaning of the term in the Kripke model A. It is easy to extend the definitions of Kripke applicative structure and Kripke lambda model to include cartesian product types σ × τ and a terminal type unit. The main ideas are the same as for Henkin models, outlined in Section 4.5.9. Specifically, a unit, ×, → applicative structure is just like a → applicative structure, but with additional operations at each possible world w. An extensional structure, and therefore model, must satisfy the conditions that A^unit must have exactly one element at each world, and

    ∀p, q ∈ A^{σ×τ}_w.  Proj1 p = Proj1 q  ∧  Proj2 p = Proj2 q  ⊃  p = q

at each world. The interpretations of combinators * and Pair, as described in Section 4.5.9, give a name to the unique global element of A^unit and force there to be a pairing

function. To construct a CCC C_A from a unit, ×, → Kripke model A, we regard a partially ordered

7.4  Domain Models of Recursive Types

Exercise 7.4.1  In Set, let F be the functor F(t) = unit + t, where the sum (disjoint union) is given by A + B = ({0} × A) ∪ ({1} × B). Show that the least fixed point of F is isomorphic to the natural numbers. What operation on unit + t corresponds to successor? Is + a coproduct in Set?

7.4.2  Diagrams, Cones and Limits

Our first step is to generalize the definition of least upper bound from partial orders to categories. This requires some subsidiary definitions. In keeping with standard categorical terminology, we consider the generalization of greatest lower bound before least upper bound.

A diagram D in a category C is a directed graph whose vertices are labeled by objects of C and whose arcs are labeled by morphisms, such that if the edge e is labeled by an arrow from a to b, then the labels on the endpoints of e must be a and b. Some examples of diagrams are the squares and triangles that were drawn in earlier sections. If a vertex of D is labeled by object a, we will often ambiguously refer to both the vertex and its label as a, when no confusion seems likely to result, and similarly for morphisms and arcs. If D is a diagram in category C, then D^op is the diagram in the opposite category C^op obtained by reversing the direction of each of the arcs (arrows). As explained in Section 7.2.2, a diagram commutes if, for every pair of vertices, all compositions given by a path from the first vertex to the second are equal morphisms of the category.

The reason for defining diagrams formally is to define limits; the definition of limit, in turn, requires the definition of cone. A cone (c, {f_d : c → d}_{d in D}) over a commuting diagram D consists of an object c together with a morphism f_d : c → d for each vertex d of D, such that the diagram obtained by adding c and all f_d : c → d to D commutes. A morphism of cones, (c, {f_d : c → d}_{d in D}) → (c′, {g_d : c′ → d}_{d in D}), both over the same diagram D, is a morphism h : c → c′ such that the entire diagram combining h, both cones, and D commutes. Since we assume each cone commutes, it suffices to check that for each d in D, the triangle asserting f_d = g_d ∘ h commutes. This is illustrated in Figure 7.1. It is easy to check that cones and their morphisms form a category. A limit of a diagram D is a terminal cone over D. In other words, a limit is a cone (c, {f_d : c → d}) such that for any cone (c′, {g_d : c′ → d}) over the same diagram, there is a unique morphism of cones from c′ to c.
The dual notion is colimit: a colimit of D is a limit of the diagram D^op in the opposite category C^op. It follows from Exercise 7.2.8, showing that initial and terminal objects are unique up to isomorphism, that any two limits or colimits of a diagram are isomorphic in the category of cones or cocones. It is easy to see that if h : c → c′ is an isomorphism of cones or cocones, it is also an isomorphism of c and c′ in the underlying category.
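The limit notion can be checked concretely in Set for the simplest case, a discrete two-object diagram; this anticipates Example 7.4.2 and Exercise 7.4.4. The sketch below is our own illustration (the finite sets and the sample cone are arbitrary choices): a cone over {d1, d2} is any set c with two maps, and the cartesian product with its projections is terminal among such cones.

```python
# A cone over the discrete diagram {d1, d2} in Set is a set c with maps
# p1 : c -> d1 and p2 : c -> d2 (no commutativity to check: no arcs).
d1, d2 = [0, 1], ["a", "b", "c"]

# Claimed limit: the cartesian product with the two projections.
prod = [(x, y) for x in d1 for y in d2]
proj1 = lambda p: p[0]
proj2 = lambda p: p[1]

# Any other cone (c, f1, f2) factors uniquely through prod via the
# mediating morphism h = <f1, f2>, satisfying proj_i o h = f_i.
c = ["u", "v"]
f1 = {"u": 0, "v": 1}
f2 = {"u": "c", "v": "a"}
h = {z: (f1[z], f2[z]) for z in c}
assert all(proj1(h[z]) == f1[z] and proj2(h[z]) == f2[z] for z in c)
assert all(h[z] in prod for z in c)
```

Uniqueness of h holds because any map satisfying the two projection equations must send z to the pair (f1(z), f2(z)).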

Example 7.4.2  The limit of a discrete diagram D = {d1, d2}, with two objects and no arrows, is the product d1 × d2, with projection functions Proj_i : d1 × d2 → d_i. (See Exercise 7.4.4.)

Example 7.4.3  Let C be a partial order, regarded as a category in the usual way (Example 7.2.6), and let D be a set of objects from C. We may regard D as a diagram by including all of the "less-than-or-equal-to" arrows between elements of D. Since all diagrams in C commute, a cone over D is any lower bound of D, and the limit of D is the greatest lower

Figure 7.1  Morphism of cones.

bound of D. We can also describe least upper bounds as limits using the opposite category: the least upper bound of D in C is the limit of D^op in C^op.

Example 7.4.3 shows that least upper bounds are a special case of colimits. The way we will obtain recursively defined objects is by generalizing the least fixed-point construction to obtain colimits of certain diagrams. Since we will therefore have more occasion to work with colimits than limits, it is useful to have a direct definition of colimits using cocones instead of opposite categories. A cocone (c, {f_d : d → c}_{d in D}) over diagram D consists of an object c together with a morphism f_d : d → c for each vertex d of D, such that the diagram obtained by adding c and all f_d : d → c to D commutes. A morphism of cocones, (c, {f_d : d → c}) → (c′, {g_d : d → c′}), is a morphism h : c → c′ such that the entire diagram combining h, both cocones, and D commutes. If Δ is a diagram, a common notation is to write μ : c → Δ for a cone μ = (c, {μ_d : c → d}_{d in D}) and μ : Δ → c for a cocone μ = (c, {μ_d : d → c}_{d in D}), since these highlight the directions of the arrows involved.

Exercise 7.4.4  Show that the limit of a discrete diagram D = {d1, d2}, as described in Example 7.4.2, if it exists, is the product d1 × d2.

Exercise 7.4.5  If f, g : a → b, then an equalizer of f and g is a limit of the diagram

consisting of the two parallel arrows f, g : a → b. Show that an equalizer is determined by an arrow h : c → a and that, in Set, the image of h is {x ∈ a | f(x) = g(x)}.
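In Set this limit can be computed directly. The sketch below is our own finite illustration (the particular functions f and g are arbitrary choices): it builds the equalizer of two functions as the subset of the domain where they agree, with h the inclusion, and checks the factorization property for a sample cone.

```python
# Equalizer of f, g : A -> B in Set (finite case): the subset of A on
# which f and g agree, with h the inclusion map into A.
A = range(10)
f = lambda x: x * x % 6
g = lambda x: x % 6

eq = [x for x in A if f(x) == g(x)]   # the equalizer object
h = lambda x: x                       # inclusion eq -> A

# Universal property: any z : C -> A with f(z(c)) == g(z(c)) for all c
# factors uniquely through eq.
C = ["p", "q"]
z = {"p": 0, "q": 6}                  # a sample cone: f and g agree at 0 and 6
assert all(f(v) == g(v) for v in z.values())
factor = {c: z[c] for c in C}         # the unique factoring map C -> eq
assert all(h(factor[c]) == z[c] for c in C)
assert all(v in eq for v in factor.values())
print(eq)                             # -> [0, 1, 3, 4, 6, 7, 9]
```

The factoring map is unique because h is injective: any map into eq composing with h to give z must agree with z itself.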

7.4.3  F-algebras

We will interpret a recursively defined type as a "least fixed point" of an appropriate functor. In this section, we show that the least fixed point of a functor F is the initial object in the category of F-algebras, which are also defined below. We will show in the next section that an initial F-algebra is also the colimit of an appropriate diagram. The ordinary algebras (as in Chapter 3) for an algebraic signature may be characterized as F-algebras in the category of sets. To illustrate this concretely, let us take the signature Σ with one sort, s, and three function symbols,

    f1 : s × s → s,   f2 : s × s × s → s,   f3 : s.

Recall that a Σ-algebra A consists of a set, A, together with functions f1^A, f2^A and f3^A of the appropriate type. If we use a one-element set unit, products and sums (disjoint unions), we can reformulate any Σ-algebra as a set, A, together with a single function

    f^A : (A × A) + (A × A × A) + unit → A

since f1^A, f2^A and f3^A can all be recovered from f^A using injection functions. Since any composition of products, sums and constant functors yields a functor, we may define a functor

    F_Σ(s) = (s × s) + (s × s × s) + unit,

and say that a Σ-algebra is a pair (A, f) consisting of a set A and a function f : F_Σ(A) → A. It is easy to formulate a general definition of functor F_Σ for any algebraic signature Σ. This allows us to define the class of algebras with a given signature using categorical concepts. If C is a category and F : C → C is a functor, then an F-algebra is a pair (a, f) consisting of an object a and a morphism f : F(a) → a in C. This is equivalent to the definition in Chapter 3. More precisely, for any algebraic signature Σ, the F_Σ-algebras in Set are exactly the Σ-algebras of universal algebra. We can also use the definition for categories other than Set, which is one of the advantages of using categorical terminology. The curious reader may wonder why the definition of F-algebra requires that F be

7.4

Domain Models of Recursive Types

513

of just a function from objects to objects. After all, in establishing a correspondence with the definition of s-algebra, we only use the object part of FE. The answer to this question is that, using functors, we may give a straightforward definition of algebraic homomorphism, as verified in Exercise 7.4.7. Specifically, a morphism of Falgebras from (a, f) to (b, g) is a morphism h a b satisfying the commuting diagram a functor, instead

:

F(a)

F(h)

F(b)

a

b

>

h

Since F-algebras and F-algebra morphisms form a category, we write h: (a, f) (b, g) to indicate that h is a morphism of F-algebras. An F-algebra (a, f) is a JLxed point of F if F(a) a is an isomorphism. The category of fixed points of F form a subcategory of the category of F-algebras, using Falgebra morphisms as morphisms between fixed points. In the special case that C is a CPO and a functor is a monotonic function f: C C, given definition of fixed point just is equivalent definition in 4, the to the Chapter by antisymmetry. An f-algebra, in the case that C is a CPO, is called a pre-fixed-point of f, as described in Exercise 5.2.3 1. The following lemma identifies fixed points of a functor F within the category of Falgebras and shows that an initial F-algebra is a fixed point of F. For the special case of a CPO as a category, this shows that the least pre-fixed-point is the least fixed point, which is part (b) of Exercise 5.2.3 1. Since F-algebra morphisms are a generalization of standard algebraic homomorphisms (when some category other than Set is used), initial F-algebras are also a generalization of the the initial algebras described in Chapter 3. In particular, the discussion of initiality in Sections 3.5.2 and 3.6.2 provides some useful intuition for F-algebras in arbitrary categories.

f:

Categories and Recursive Types

Lemma 7.4.6 If (a, f) is an initial F-algebra, then (a, f) is an initial fixed point of F.

Proof Suppose (a, f) is initial. We show (a, f) is a fixed point using the diagram

    F(F(a)) --F(f)--> F(a)
       |F(f)            |f
       v                v
      F(a) -----f-----> a

which shows that f is an F-algebra morphism from (F(a), F(f)) to (a, f). Since (a, f) is initial, we also have an F-algebra morphism g: (a, f) → (F(a), F(f)) in the opposite direction. Since the only morphism from an initial object to itself (in any category) is the identity, it is easy to see that f ∘ g = id_a. Since F is a functor, it follows that F(f) ∘ F(g) = id_{F(a)}. The remaining identity is proved by adding g and F(g) horizontally and vertically to the square above. Since the result commutes, and F(f) ∘ F(g) = id_{F(a)} as noted above, we may conclude g ∘ f = id_{F(a)}. This shows (a, f) is a fixed point. Since the fixed points form a subcategory of the category of F-algebras, it follows that (a, f) is initial in the category of fixed points of F. ∎
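The content of Lemma 7.4.6 can be animated in code. The sketch below, with encodings of our own choosing, fixes the functor F(A) = unit + A: `Fix` plays the role of the carrier of an initial F-algebra, `cata` is the unique algebra morphism, and the inverse required by the fixed-point property is itself obtained by folding with `fmap(Fix, -)`, just as the proof obtains g by initiality.

```python
# F-algebras for the functor F(A) = unit + A.
# An element of F(X) is either None (the unit summand) or a pair ("s", x).

def fmap(h, t):
    """Functor action on morphisms: apply h under F."""
    return None if t is None else ("s", h(t[1]))

class Fix:
    """Carrier of the initial F-algebra; the constructor is In : F(Fix) -> Fix."""
    def __init__(self, t):
        self.t = t

def cata(alg, x):
    """The unique F-algebra morphism from (Fix, In) to (a, alg)."""
    return alg(fmap(lambda y: cata(alg, y), x.t))

def out(x):
    """Lambek-style inverse of In: the fold whose algebra is fmap(In)."""
    return cata(lambda t: fmap(Fix, t), x)

def to_int(x):
    return cata(lambda t: 0 if t is None else t[1] + 1, x)

two = Fix(("s", Fix(("s", Fix(None)))))
print(to_int(two))             # 2
print(to_int(Fix(out(two))))   # 2: In composed with out is the identity
```

Round-tripping a value through `out` and the constructor leaves it unchanged, reflecting the two identities established in the proof.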


Exercise 7.4.7 Using the example signature Σ and functor F_Σ above, verify that morphisms of F_Σ-algebras (in Set) are precisely ordinary homomorphisms of Σ-algebras.

Exercise 7.4.8 Let F: Cpo → Cpo be the lifting functor with F(A) = A⊥ and, for f: A → B, F(f): A⊥ → B⊥. Describe the initial F-algebra and explain, for any other F-algebra, the unique F-algebra morphism.

Exercise 7.4.9 In Cpo or Set, let F be the functor F(A) = unit + A.

(a) Describe the initial F-algebra (A, f) and explain, for any other F-algebra (B, g), the unique F-algebra morphism h_{B,g}: A → B.

(b) If we think of the unique h_{B,g}: A → B as a function of g, for each B, we obtain a function h_B: ((unit + B) → B) → (A → B). What standard operation from recursive function theory does this provide? (Hint: A function (unit + B) → B is equivalent to a pair B × (B → B); see Exercise 2.5.7.)

(c) Why is the initial F-algebra the same in Cpo and Set?
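For the functor of Exercise 7.4.9, the initial algebra is the natural numbers, and the unique morphism to an algebra (B, g) is iteration, since g amounts to a pair of a base element and a step function (the hint's B × (B → B)). A sketch, conflating the initial algebra with Python's built-in integers:

```python
# Natural numbers as the initial algebra of F(A) = unit + A.

def fold_nat(z, s, n):
    """The unique F-algebra morphism into (B, g), where the algebra
    g : (unit + B) -> B amounts to a base element z : B and a step
    function s : B -> B. Unwinding the definition gives iteration."""
    acc = z
    for _ in range(n):
        acc = s(acc)
    return acc

# Iteration recovers familiar arithmetic on the initial algebra.
def add(m, n):
    return fold_nat(m, lambda k: k + 1, n)

def double(n):
    return fold_nat(0, lambda k: k + 2, n)

print(add(2, 1))    # 3
print(double(5))    # 10
```

The names `fold_nat`, `add`, and `double` are our own; the point is only that the universal property of the initial algebra specializes to the iteration scheme of recursive function theory.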

7.4.4

ω-Chains and Initial F-algebras

As discussed earlier, we are interested in finding least fixed points of functors. The main lemma in this section shows that under certain completeness and continuity assumptions, colimits of the appropriate diagrams are also initial F-algebras and therefore, by Lemma 7.4.6, initial (or least) fixed points. Although we used directed sets to define complete partial orders, we will use chains instead for categories. (Exercise 5.2.6 compares chains and directed sets for CPOs.) An ω-chain is a diagram Δ of the form

    Δ = d0 --f0--> d1 --f1--> d2 --f2--> ···

written in symbols as Δ = ⟨d_i, f_i⟩_{i≥0}. In any such ω-chain, there is a unique composition of arrows leading from d_i to d_j when i ≤ j. To be precise, and to give this a name, we define the morphism f_{i,j}: d_i → d_j by

    f_{i,j} = f_{j-1} ∘ ··· ∘ f_i

For an ω-chain Δ as indicated above, we write Δ⁻ for the ω-chain obtained by dropping the first object and arrow,

    Δ⁻ = d1 --f1--> d2 --f2--> d3 --f3--> ···

and, if F: C → C is a functor on the category containing Δ, we also define the diagram F(Δ) by

    F(Δ) = F(d0) --F(f0)--> F(d1) --F(f1)--> F(d2) --F(f2)--> ···

If μ = ⟨c, {μ_i: d_i → c}_{i≥0}⟩ is a cocone of Δ, we similarly write μ⁻ for the cocone ⟨c, {μ_i: d_i → c}_{i≥1}⟩ of Δ⁻ and F(μ) for the cocone ⟨F(c), {F(μ_i): F(d_i) → F(c)}_{i≥0}⟩ of F(Δ).

A familiar example of an ω-chain arises in a CPO, regarded as a category C with a morphism a → b iff a ≤ b.

It follows from Exercise 7.4.16 that if (e, p) and (e', p') are embedding-projection pairs with e = e', then p = p', and conversely. Therefore, if f is an embedding, we may unambiguously write f^prj for the corresponding projection. It is also convenient to extend this notation to diagrams, writing Δ^prj, for example, for the diagram obtained by replacing every morphism f in a diagram Δ of embeddings with the associated projection f^prj. Exercise 7.4.18 shows that if A and B are CPOs with an embedding-projection pair (e, p): A → B and A is pointed, then B must be pointed and both e and p must be strict.
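The defining conditions on an embedding-projection pair can be checked exhaustively on the two finite CPOs of Example 7.4.14 below ({0, 1} and {0, 1, 2}, ordered linearly). The particular e and p are our own choice of illustration: p is a left inverse of e, and e ∘ p stays below the identity.

```python
# The two finite CPOs of Example 7.4.14: {0, 1} and {0, 1, 2}, linearly ordered.
A = [0, 1]
B = [0, 1, 2]

def e(x):          # e : {0,1} -> {0,1,2}, a candidate embedding
    return 0 if x == 0 else 2

def p(y):          # p : {0,1,2} -> {0,1}, the corresponding projection
    return 0 if y <= 1 else 1

# p is a left inverse of e: p . e = id on A.
print(all(p(e(x)) == x for x in A))                            # True
# e . p is below the identity pointwise on B.
print(all(e(p(y)) <= y for y in B))                            # True
# Both maps are monotonic on these linear orders.
print(all(e(x) <= e(x2) for x in A for x2 in A if x <= x2))    # True
print(all(p(y) <= p(y2) for y in B for y2 in B if y <= y2))    # True
```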

Example 7.4.14 If A and B are CPOs, then e: A → B and p: B → A form an embedding-projection pair only if p ∘ e = id. This equation, which is often read "p is a left inverse of e," may be satisfied only if e is injective (one-to-one). Since we only consider continuous functions in the category Cpo, an embedding in Cpo must be an injective, continuous function with a left inverse. Using the CPOs {0, 1} and {0, 1, 2}, with 0 < 1 < 2,

    A⟦Γ ▷ x: σ⟧η = η(x)

    A⟦Γ ▷ λx:σ.M: σ → τ⟧η = A⟦Γ ▷ λy:σ.[y/x]M: σ → τ⟧η    whenever y is not in Γ

    A⟦Γ ▷ M: τ⟧η1 = A⟦Γ ▷ M: τ⟧η2    whenever η1(x) = η2(x) for all x ∈ FV(M)

Two important examples of acceptable meaning functions are meaning functions for models and substitution on applicative structures of terms. These and other examples appear following the next definition. The last three conditions of the definition above are not strictly necessary for the Basic Lemma. However, they are natural properties that hold of all the examples we consider.

Logical Relations

In addition to acceptable meaning functions, we will need some assumptions about the behavior of a logical relation on lambda abstractions. A logical relation over models is necessarily closed under lambda abstraction, as a consequence of the way that lambda abstraction and application interact. However, in an arbitrary applicative structure, abstraction and application may not be "inverses," and so we need an additional assumption to prove the Basic Lemma. If R ⊆ A × B is a logical relation and A⟦ ⟧, B⟦ ⟧ are acceptable meaning functions, we say R is admissible for A⟦ ⟧ and B⟦ ⟧ if, for all related environments η_a, η_b ⊨ Γ and terms Γ, x: σ ▷ M: τ and Γ, x: σ ▷ N: τ,

    ∀a, b. R^σ(a, b) implies R^τ(A⟦Γ, x: σ ▷ M: τ⟧η_a[x ↦ a], B⟦Γ, x: σ ▷ N: τ⟧η_b[x ↦ b])

implies

    ∀a, b. R^σ(a, b) implies R^τ(App (A⟦Γ ▷ λx:σ.M: σ → τ⟧η_a) a, App (B⟦Γ ▷ λx:σ.N: σ → τ⟧η_b) b)

The definitions of acceptable meaning function and admissible logical relation are technical and motivated primarily by their use in applications of the Basic Lemma. To give some

intuition, several examples are listed below.

Example 8.2.6 If A is a model, then the ordinary meaning function A⟦ ⟧ is acceptable. It is clear that if A and B are models, then any logical relation R ⊆ A × B is admissible, since in a model the meaning of Γ ▷ λx:σ.M: σ → τ applied to a is the meaning of Γ, x: σ ▷ M: τ with x bound to a, and similarly for N.

Example 8.2.7 Let T be an applicative structure of terms M such that Γ ▷ M: σ for some Γ ⊆ H, as in Example 4.5.1. An environment for T is a mapping from variables to terms, which we may regard as a substitution. Let us write ηM for the result of substituting terms for free variables in M, and define a meaning function on T by

    T⟦Γ ▷ M: σ⟧η = ηM.

It is a straightforward but worthwhile exercise to verify that T⟦ ⟧ is an acceptable meaning function. We will see in Section 8.3.1 that every lambda theory E gives us an admissible logical relation R_E ⊆ T × T. In short, R_E relates any pair of terms that are provably equal from E. The reason why R_E is admissible is that we can prove Γ ▷ λx:σ.M = λx:σ.N: σ → τ immediately from Γ, x: σ ▷ M = N: τ.
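The meaning-by-substitution function of Example 8.2.7 can be sketched for a toy term syntax. Terms are tuples, an environment is a dictionary read as a substitution, and the substitution is deliberately naive: it assumes bound variables are distinct from everything in the environment, which suffices for the illustration (all names here are our own).

```python
# T[[M]]eta = eta M: apply the substitution eta to the free variables of M.
# Naive sketch: assumes binders avoid the domain and range of eta.

def meaning(eta, m):
    tag = m[0]
    if tag == "var":
        return eta.get(m[1], m)
    if tag == "app":
        return ("app", meaning(eta, m[1]), meaning(eta, m[2]))
    if tag == "lam":
        # the binder shadows the substitution
        body_env = {x: t for x, t in eta.items() if x != m[1]}
        return ("lam", m[1], meaning(body_env, m[2]))
    raise ValueError("unknown term")

eta = {"y": ("lam", "z", ("var", "z"))}
# (x y)[y := \z.z]  gives  x (\z.z)
print(meaning(eta, ("app", ("var", "x"), ("var", "y"))))
# \y.y is unchanged: the bound y shadows the substitution.
print(meaning(eta, ("lam", "y", ("var", "y"))))
```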

8.2

Logical Relations over Applicative Structures

541

Example 8.2.8 If A is a typed combinatory algebra, then A is an applicative structure with an acceptable meaning function. The standard acceptable meaning function for A is obtained by translation into combinators. More specifically, as described in Section 4.5.7, we may translate a typed term Γ ▷ M: σ into a combinatory term Γ ▷ CL(M): σ that contains constants K and S of various types but does not involve lambda abstraction. Since any such combinatory term has a meaning in A, we can take the meaning of Γ ▷ CL(M): σ as the meaning of Γ ▷ M: σ. It is easy to verify that this is an acceptable meaning function. However, recall that since a combinatory algebra need not be extensional, β-equivalent terms may have different meanings. It is shown in Exercise 8.2.16 that every logical relation over combinatory algebras is admissible.

Example 8.2.9 A special case of Example 8.2.8 that deserves to be singled out is that from any partial combinatory algebra D, as defined in Section 5.6.2, we may construct a typed combinatory algebra A^D. In this structure, the interpretation of each typed term will be an element of the untyped partial combinatory algebra. We shall see in Sections 8.2.4 and 8.2.5 that the symmetric and transitive logical relations over A^D correspond exactly to partial equivalence relation models over D. Using acceptable meaning functions and admissible relations, we have the following general form of the Basic Lemma.

Lemma 8.2.10 (Basic Lemma, general version) Let A and B be applicative structures with acceptable meaning functions A⟦ ⟧ and B⟦ ⟧, and let R ⊆ A × B be an admissible logical relation. Suppose η_a and η_b are related environments satisfying the type assignment Γ. Then

    R^σ(A⟦Γ ▷ M: σ⟧η_a, B⟦Γ ▷ M: σ⟧η_b)

for every typed term Γ ▷ M: σ.

The proof is similar to the proof of the Basic Lemma for models, since the acceptability of A⟦ ⟧ and B⟦ ⟧ and the admissibility of R are just what we need to carry out the proof. We will go through the proof to see how this works.

Proof We use induction on terms. For a constant c, we assume R(Const_A(c), Const_B(c)) as part of the definition of logical relation, so the lemma follows from the definition of acceptable meaning function. For a variable Γ ▷ x: σ, we have R^σ(η_a(x), η_b(x)) by assumption. Since A⟦ ⟧ is acceptable, A⟦Γ ▷ x: σ⟧η_a = η_a(x), and similarly for B⟦ ⟧. Therefore R^σ(A⟦Γ ▷ x: σ⟧η_a, B⟦Γ ▷ x: σ⟧η_b). For an application Γ ▷ MN: τ, we assume inductively that R^{σ→τ}(A⟦Γ ▷ M: σ → τ⟧η_a, B⟦Γ ▷ M: σ → τ⟧η_b) and R^σ(A⟦Γ ▷ N: σ⟧η_a, B⟦Γ ▷ N: σ⟧η_b). Since R is logical,


it follows that

    R^τ(App_A (A⟦Γ ▷ M: σ → τ⟧η_a) (A⟦Γ ▷ N: σ⟧η_a), App_B (B⟦Γ ▷ M: σ → τ⟧η_b) (B⟦Γ ▷ N: σ⟧η_b)).

Since the meaning functions are acceptable, it follows that A⟦Γ ▷ MN: τ⟧η_a and B⟦Γ ▷ MN: τ⟧η_b are related.

For an abstraction Γ ▷ λx:σ.M: σ → τ, we want to show that A⟦Γ ▷ λx:σ.M: σ → τ⟧η_a and B⟦Γ ▷ λx:σ.M: σ → τ⟧η_b are related for related environments η_a and η_b. If η_a and η_b are related, then whenever R^σ(a, b), the modified environments η_a[x ↦ a] and η_b[x ↦ b] are related. By the inductive hypothesis, we have

    ∀a, b. R^σ(a, b) implies R^τ(A⟦Γ, x: σ ▷ M: τ⟧η_a[x ↦ a], B⟦Γ, x: σ ▷ M: τ⟧η_b[x ↦ b]).

Since R is admissible, it follows that

    ∀a, b. R^σ(a, b) implies R^τ(App (A⟦Γ ▷ λx:σ.M: σ → τ⟧η_a) a, App (B⟦Γ ▷ λx:σ.M: σ → τ⟧η_b) b).

But by definition of logical relation, this implies that A⟦Γ ▷ λx:σ.M: σ → τ⟧η_a and B⟦Γ ▷ λx:σ.M: σ → τ⟧η_b are related. ∎
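How a logical relation propagates to function types can be checked exhaustively over a two-element base type. Below, R at the base type is the graph of negation, and R at b → b relates f to g just when related arguments go to related results; identity and negation are related to themselves, while a constant function is not. This is an illustrative sketch with names of our own choosing, in the spirit of the hereditary permutations of Exercise 8.2.14.

```python
# A logical relation over a two-element base type, checked exhaustively.

BOOLS = [False, True]

def r_base(x, y):
    """R at the base type: the graph of negation (a permutation)."""
    return y == (not x)

def r_fun(f, g):
    """Lifting to b -> b: related arguments go to related results."""
    return all(r_base(f(x), g(y)) for x in BOOLS for y in BOOLS if r_base(x, y))

ident = lambda x: x
neg = lambda x: not x

print(r_fun(ident, ident))                        # True
print(r_fun(neg, neg))                            # True
print(r_fun(lambda x: False, lambda x: False))    # False: constants fail
```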

The following lemma is useful for establishing that logical relations over term applicative structures are admissible. For simplicity, we state the lemma for logical predicates only.

Lemma 8.2.11 Let T be an applicative structure of terms with meaning function defined by substitution, as in Example 8.2.7. Let P ⊆ T be a logical predicate. If

    ∀N ∈ T^σ. P^b(([N/x]M)N_1 ··· N_k)  implies  ∀N ∈ T^σ. P^b((λx:σ.M)N N_1 ··· N_k)

for any type constant b, term λx:σ.M in T^{σ→τ}, and terms N_1 ∈ T^{τ_1}, ..., N_k ∈ T^{τ_k}, then P is admissible.

This lemma may be extended to other types using elimination contexts in place of sequences of applications, as described in Section 8.3.4.

Proof For a term applicative structure T, the definition of admissible simplifies to

    ∀N ∈ T^σ. P^τ([L_1, ..., L_t, N/x_1, ..., x_t, x]M)  implies  ∀N ∈ T^σ. P^τ(([L_1, ..., L_t/x_1, ..., x_t](λx:σ.M)) N).

Since x is a bound variable in the statement of the lemma, we are free to choose x so as not to appear in any of the L_i. Therefore, it suffices to show that

    ∀N ∈ T^σ. P^τ([N/x]M)  implies  ∀N ∈ T^σ. P^τ((λx:σ.M)N)

for arbitrary M, x and τ. Since this follows immediately from condition

    ∀N ∈ T^σ. P^τ(([N/x]M)N_1 ··· N_k)  implies  ∀N ∈ T^σ. P^τ((λx:σ.M)N N_1 ··· N_k),    (*)

it suffices to show (*) by induction on τ. The base case is easy, since this is the hypothesis of the lemma. For the inductive step, we assume (*) for types τ_1 and τ_2 and assume

    ∀N ∈ T^σ. P^{τ_1 → τ_2}(([N/x]M)N_1 ··· N_k).

We must show that [N/x]M may be replaced by (λx:σ.M)N. Let Q ∈ P^{τ_1}. By assumption and the fact that P is logical, we have

    ∀N ∈ T^σ. P^{τ_2}(([N/x]M)N_1 ··· N_k Q).

By the inductive hypothesis, it follows that

    ∀N ∈ T^σ. P^{τ_2}((λx:σ.M)N N_1 ··· N_k Q).

Since this holds for all Q ∈ P^{τ_1}, we may conclude that

    ∀N ∈ T^σ. P^{τ_1 → τ_2}((λx:σ.M)N N_1 ··· N_k).

This completes the inductive step and the proof of the lemma. ∎

Our definition of "admissible" is similar to the definition given in [Sta85b], but slightly weaker. Using our definition we may prove strong normalization directly by a construction that does not seem possible in Statman's framework [Sta85b, Example 4].

Exercise 8.2.12 Let A be a Henkin model. We say that a ∈ A^σ is lambda definable from a_1 ∈ A^{σ_1}, ..., a_k ∈ A^{σ_k} if there is a term {x_1: σ_1, ..., x_k: σ_k} ▷ M: σ with a = A⟦M⟧η in the environment η with η(x_i) = a_i. Use the Basic Lemma (for models) to show that if R ⊆ A × A is a logical relation, a ∈ A^σ is lambda definable from a_1 ∈ A^{σ_1}, ..., a_k ∈ A^{σ_k}, and R^{σ_i}(a_i, a_i) for each i, then R^σ(a, a).

Exercise 8.2.13 Complete Example 8.2.7 by showing that T⟦Γ ▷ M: σ⟧η = ηM is an acceptable meaning function for T.

Exercise 8.2.14 Let A be a lambda model for some signature without constants. A logical relation R ⊆ A × A that is a permutation at each type is called a hereditary permutation (see Exercise 8.2.3). We say that an element a ∈ A^σ is permutation-invariant if R(a, a) for every hereditary permutation R ⊆ A × A. Show that for every closed term M, its meaning A⟦M⟧ in A is permutation invariant.

Exercise 8.2.15 Let A be the full classical hierarchy over a signature with a single type constant b, with A^b = {0, 1}.

(a) Find some function f ∈ A^σ that is permutation invariant (see Exercise 8.2.14), but not the meaning of a closed lambda term. You may assume that every term has a unique normal form.

(b) Show that if f ∈ A^σ has the property that R(f, f) for every logical relation R ⊆ A × A, then f is the meaning of a closed lambda term. (This is not true for all types, but that is tricky to prove.)

Exercise 8.2.16 Prove the Basic Lemma for combinatory algebras. More specifically, show that if R ⊆ A × B is a logical relation over combinatory algebras A and B, and η_a and η_b are related environments with η_a, η_b ⊨ Γ, then

    R(A⟦Γ ▷ M: σ⟧η_a, B⟦Γ ▷ M: σ⟧η_b)

for every typed combinatory term Γ ▷ M: σ. (Do not consider it part of the definition of logical relation that R(K, K) and R(S, S) for combinators K and S of each type.)

8.2.3

Partial Functions and Theories of Models

A logical relation R ⊆ A × B is called a logical partial function if each R^σ is a partial function from A^σ to B^σ, i.e.,

    R^σ(a, b_1) and R^σ(a, b_2) implies b_1 = b_2.

Logical partial functions are related to the theories of typed lambda models by the following lemma.

Lemma 8.2.17 If R ⊆ A × B is a logical partial function from model A to model B, then Th(A) ⊆ Th(B).

Proof Without loss of generality, we will only consider equations between closed terms. Let R ⊆ A × B be a partial function and suppose A⟦M⟧ = A⟦N⟧ for closed M, N. Writing R(·) as a function, we have

    R(A⟦M⟧) = B⟦M⟧ and R(A⟦N⟧) = B⟦N⟧

by the Basic Lemma. Thus B⟦M⟧ = B⟦N⟧. ∎


By Corollary 4.5.23 (completeness for the pure theory of βη-conversion) we know there is a model A satisfying Γ ▷ M = N: σ iff ⊢ Γ ▷ M = N: σ. Since the theory of this model is contained in the theory of every other model, we may show that a model B has the same theory by constructing a logical partial function R ⊆ B × A. This technique was first used by Friedman to show that βη-conversion is complete for the full set-theoretic type hierarchy over any infinite ground types [Fri75, Sta85a].

Corollary 8.2.18 Let A be a model of the pure theory of βη-conversion. If R ⊆ B × A is a logical partial function and B is a model, then Th(B) = Th(A).

We use Friedman's technique, and extensions, in Section 8.4 to show completeness for full set-theoretic, recursive and continuous models. Since logical partial functions are structure-preserving mappings, it is tempting to think of logical partial functions as the natural generalization of homomorphisms. This often provides useful intuition. However, the composition of two logical partial functions need not be a logical relation. It is worth noting that for a fixed model A and environment η, the meaning function A⟦ ⟧η from T to A is not necessarily a logical relation. In contrast, meaning functions in algebra are always homomorphisms.

Exercise 8.2.19 Find logical partial functions R ⊆ A × B and S ⊆ B × C whose composition S ∘ R = {S^σ ∘ R^σ} ⊆ A × C is not a logical relation. (Hint: Try C = A.)

8.2.4

Logical Partial Equivalence Relations

Another useful class of logical relations are the logical partial equivalence relations (logical pers), which have many of the properties of congruence relations. If we regard typed applicative structures as multi-sorted algebras, we may see that logical partial equivalence relations are somewhat more than congruence relations. Specifically, any logical per is an equivalence relation closed under application, as is a congruence, but must also be closed under lambda abstraction. Before looking at the definition of logical partial equivalence relation, it is worth recalling the definition of partial equivalence relation from Section 5.6.1. The first of two equivalent definitions is that a partial equivalence relation on a set A is an equivalence relation on some subset of A, i.e., a pair (A', R) with A' ⊆ A and R ⊆ A' × A' an equivalence relation. However, because R determines A' = {a | R(a, a)} uniquely, and R is an equivalence relation on A' iff R is symmetric and transitive on A, we may give a simpler definition. A logical partial equivalence relation on A is a logical relation R ⊆ A × A such that each R^σ is symmetric and transitive. We say a partial equivalence relation is total if each R^σ is reflexive, i.e., R^σ(a, a) for every a ∈ A^σ. From the following lemma, we can see that there are many examples of partial equivalence relations.
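The first definition above can be animated on a small finite carrier. The relation below is our own example: it is symmetric and transitive on {0, ..., 7} but reflexive only on the even numbers, so its domain {a | R(a, a)} is a proper subset, which the per partitions into equivalence classes.

```python
# A sample partial equivalence relation on {0..7}: symmetric and
# transitive, but reflexive only on the even numbers.

def r(m, n):
    return m % 2 == 0 and n % 2 == 0 and m % 4 == n % 4

UNIV = range(8)

# The domain of a per: the elements related to themselves.
dom = [a for a in UNIV if r(a, a)]

# Restricted to its domain, the per is an equivalence relation;
# compute its classes.
classes = []
for a in dom:
    cls = [b for b in dom if r(a, b)]
    if cls not in classes:
        classes.append(cls)

print(dom)       # [0, 2, 4, 6]
print(classes)   # [[0, 4], [2, 6]]
```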

Lemma 8.2.20 Let R ⊆ A × A be a logical relation. If R^b is symmetric and transitive for each type constant b, then every R^σ is symmetric and transitive.

The proof is given as Exercise 8.2.21. Intuitively, this lemma shows that being a partial equivalence relation is a "hereditary" property; it is inherited at higher types. In contrast, being a total equivalence relation is not. Specifically, there exist logical relations which are total equivalence relations for each type constant, but not total equivalence relations at every type. This is one reason for concentrating on partial instead of total equivalence relations.

Exercise 8.2.21 Prove Lemma 8.2.20 by induction on type expressions.

Exercise 8.2.22 Find a Henkin model A and a logical relation R ⊆ A × A such that R^b is reflexive, symmetric and transitive for each type constant b, but some R^σ is not reflexive. (Hint: Almost any model will do.)

Exercise 8.2.23 Let Σ be a signature with type constants B = {b_0, b_1, ...} and (for simplicity) no term constants. Let D be a partial combinatory algebra and let A^D be a combinatory algebra for signature Σ constructed using subsets of D as in Section 5.6.2. Let R ⊆ A^D × A^D be any logical partial equivalence relation. Show that for all types σ and τ, the product per R^σ × R^τ over D is identical to the relation R^{σ×τ}, and similarly for R^{σ→τ}. (Hint: There is something to check here, since the definition of R^{σ×τ} depends on A^{σ×τ} but, regarding R^σ and R^τ as pers over D, the definition of R^σ × R^τ does not.)

8.2.5

Quotients and Extensionality

If R is a logical partial equivalence relation over an applicative structure A, then we may form a quotient structure A/R of equivalence classes from A. This is actually a "partial" quotient, since some elements of A may not have equivalence classes with respect to R. However, to simplify terminology, we will just call A/R a "quotient." One important fact about the structure A/R is that it is always extensional. Another is that under certain reasonable assumptions, A/R will be a model with the meaning of each term in A/R equal to the equivalence class of its meaning in A.

Let R ⊆ A × A be a logical partial equivalence relation on the applicative structure A = ⟨{A^σ}, {App^{σ,τ}}, Const⟩. The quotient applicative structure

    A/R = ⟨{A^σ/R}, {App^{σ,τ}/R}, Const/R⟩

is defined as follows. If R^σ(a, a), then the equivalence class [a]_R of a with respect to R is defined by

    [a]_R = {a' | R^σ(a, a')}

Note that a ∈ [a]_R. We let A^σ/R = {[a]_R | R^σ(a, a)} be the collection of all such equivalence classes and define application on equivalence classes by

    [a] [b] = [App^{σ,τ} a b]

The map Const/R interprets each constant c as the equivalence class [Const(c)]. This completes the definition of A/R. It is easy to see that application is well-defined: if R^{σ→τ}(a, a') and R^σ(b, b'), then since R is logical, we have R^τ(App^{σ,τ} a b, App^{σ,τ} a' b').

Lemma 8.2.24 If R ⊆ A × A is a logical partial equivalence relation, then A/R is an extensional applicative structure.

Proof Let [a_1], [a_2] ∈ A^{σ→τ}/R and suppose that for every [b] ∈ A^σ/R, we have

    App/R [a_1] [b] = App/R [a_2] [b].

This means that for every b_1, b_2 ∈ A^σ, we have R^σ(b_1, b_2) implies R^τ(App a_1 b_1, App a_2 b_2). Since R is logical, this gives us R^{σ→τ}(a_1, a_2), which means that [a_1] = [a_2]. Thus A/R is extensional. ∎

A special case of this lemma is the extensional collapse, A^E, of an applicative structure A; this is also called the Gandy hull of A, and appears to have been discovered independently by Zucker [Tro73] and Gandy [Gan56]. The extensional collapse A^E is defined as follows. Given A, there is at most one logical relation R with

    R^b(a_1, a_2)  iff  a_1 = a_2 ∈ A^b

for each type constant b. (Such an R is guaranteed to exist when the type of each constant is either a type constant or first-order function type.) If there is a logical relation R which is the identity relation for all type constants, then we define A^E = A/R.

Corollary 8.2.25 (Extensional Collapse) Let A be an applicative structure for signature Σ. If there is a logical relation R that is the identity relation on each type constant, then the extensional collapse A^E is an extensional applicative structure for Σ.

In some cases, A/R will not only be extensional, but also a Henkin model. This will be true if A is a model to begin with. In addition, since A/R is always extensional, we would expect to have a model whenever A has "enough functions," and β-equivalent terms are equivalent modulo R. If A⟦ ⟧ is an acceptable meaning function, then this will guarantee that A has enough functions. When we have an acceptable meaning function, we can also express the following condition involving β-equivalence. We say R satisfies (β) with respect to A⟦ ⟧ if, for every term Γ, x: σ ▷ M: τ and environment η with η ⊨ Γ, x: σ, we have

    R^τ(App (A⟦Γ ▷ λx:σ.M: σ → τ⟧η) (η(x)), A⟦Γ, x: σ ▷ M: τ⟧η)

It follows from properties of acceptable meaning functions that if R satisfies (β), then R relates β-equivalent terms (Exercise 8.2.28) and R is admissible (Exercise 8.2.29). In addition, if A is a combinatory algebra with the standard meaning function described in Example 8.2.8, then by Exercise 8.2.16 and the equational properties of combinators, every R over A satisfies (β). To describe the meanings of terms in A/R, we will associate an environment η/R for A/R with each environment η for A with R^Γ(η, η). We define

    (η/R)(x) = [η(x)]

and note that η/R ⊨ Γ if η ⊨ Γ. Using these definitions, we can give useful conditions which imply that A/R is a model, and characterize the meanings of terms in A/R.

Lemma 8.2.26 (Quotient Models) Let R ⊆ A × A be a logical partial equivalence relation over an applicative structure A with acceptable meaning function A⟦ ⟧, and suppose that R satisfies (β). Then A/R is a model such that for any A-environment η ⊨ Γ with R^Γ(η, η) we have

    (A/R)⟦Γ ▷ M: σ⟧(η/R) = [A⟦Γ ▷ M: σ⟧η]

In other words, under the hypotheses of the lemma, the meaning of a term Γ ▷ M: σ in A/R is the equivalence class of the meaning of Γ ▷ M: σ in A, modulo R. Lemma 8.2.26 is similar to the "Characterization Theorem" of [Mit86c], which seems to be the first use of this idea (see Theorem 9.3.46).

Proof We use induction on the structure of terms. By Exercise 8.2.29, R is admissible. It follows from the general version of the Basic Lemma that the equivalence class [A⟦Γ ▷ M: σ⟧η] of the interpretation of any term in A is nonempty. This is essential to the proof. For any variable Γ ▷ x: σ, it is clear that

    (A/R)⟦Γ ▷ x: σ⟧(η/R) = (η/R)(x) = [A⟦Γ ▷ x: σ⟧η]

by definition of η/R and acceptability of A⟦ ⟧. The case for constants is similarly straightforward. The application case is also easy, using the definition of application in A/R. For the abstraction case, recall that

    (A/R)⟦Γ ▷ λx:σ.M: σ → τ⟧(η/R) = f

where, by definition, f satisfies the equation

    (App/R) f [a] = (A/R)⟦Γ, x: σ ▷ M: τ⟧(η/R)[x ↦ [a]]

for all [a] ∈ A^σ/R^σ. We show that [A⟦Γ ▷ λx:σ.M: σ → τ⟧η] satisfies the equation for f, and hence the meaning of Γ ▷ λx:σ.M: σ → τ exists in A/R. Since A/R is extensional by Lemma 8.2.24, it follows that A/R is a Henkin model. For any [a] ∈ A^σ/R^σ, we have

    (App/R) [A⟦Γ ▷ λx:σ.M: σ → τ⟧η] [a] = [App (A⟦Γ ▷ λx:σ.M: σ → τ⟧η) a]

by definition of application in A/R. Using the definition of acceptable meaning function, it is easy to see that

    A⟦Γ ▷ λx:σ.M: σ → τ⟧η = A⟦Γ, x: σ ▷ λx:σ.M: σ → τ⟧η[x ↦ a]

since x is not free in λx:σ.M. Since a = (η[x ↦ a])(x), it follows from the definition of acceptable meaning function and the assumption that R satisfies (β) that

    [App (A⟦Γ ▷ λx:σ.M: σ → τ⟧η) a] = [A⟦Γ, x: σ ▷ M: τ⟧η[x ↦ a]]

By the induction hypothesis, it follows that [A⟦Γ ▷ λx:σ.M: σ → τ⟧η] satisfies the equation for f. This proves the lemma. ∎

A straightforward corollary, with application to per models, is that the quotient of any combinatory algebra by a logical partial equivalence relation is a Henkin model.

Corollary 8.2.27 Let A be a combinatory algebra and R ⊆ A × A a logical partial equivalence relation. Then A/R is a Henkin model, with the meaning of a term Γ ▷ M: σ the equivalence class

    (A/R)⟦Γ ▷ M: σ⟧(η/R) = [A⟦Γ ▷ CL(M): σ⟧η]

of the interpretation of the combinatory term CL(M) in A.

The proof is simply to apply Lemma 8.2.26 to combinatory algebras, with the acceptable meaning function described in Example 8.2.8, noting that by Exercise 8.2.16 and the equational properties of combinators, every relation over A satisfies (β). Corollary 8.2.27 implies that any hierarchy of partial equivalence relations over a partial combinatory algebra (see Sections 5.6.1 and 5.6.2) forms a Henkin model. More specifically, for any partial combinatory algebra D, any hierarchy of pers over D forms a logical partial equivalence relation over a typed (total) combinatory algebra A^D, as described in Exercise 8.2.23. Therefore, we may show that the pers over D form a Henkin model by showing that the quotient A^D/R of a combinatory algebra by a logical partial equivalence relation R ⊆ A^D × A^D is a Henkin model. This is immediate from Corollary 8.2.27. In addition, Corollary 8.2.27 shows that the interpretation of a lambda term is the equivalence class of its translation into combinators. Since the combinators of A^D are exactly the untyped combinators of D, according to the proof of Lemma 5.6.8, it follows that the meaning of a typed term in A^D/R is the equivalence class of the meaning of its translation into untyped combinators in D.

Exercise 8.2.28 Show that if R satisfies (β) with respect to A⟦ ⟧ then, for every pair of terms Γ, x: σ ▷ M: τ and Γ ▷ N: σ, and environment η ⊨ Γ with R^Γ(η, η), we have

    R^τ(A⟦Γ ▷ (λx:σ.M)N: τ⟧η, A⟦Γ ▷ [N/x]M: τ⟧η)

Exercise 8.2.29 Show that if R satisfies (β) with respect to A⟦ ⟧, then R is admissible.

Exercise 8.2.30 Let A be an applicative structure and let {x_1: σ_1, ..., x_k: σ_k} ▷ M: τ be a term without lambda abstraction. An element a of A represents M if A ⊨ {x_1: σ_1, ..., x_k: σ_k} ▷ a x_1 ··· x_k = M: τ, using a as a constant denoting a. Note that since a x_1 ··· x_k and M contain only applications, this equation makes sense in any applicative structure. Show that if R ⊆ A × A is a logical partial equivalence relation and a represents Γ ▷ M: σ in A, then [a] represents Γ ▷ M: σ in A/R.

8.3

Proof-Theoretic Results

8.3.1

Completeness for Henkin Models

In universal algebra, we may prove equational completeness (assuming no sort is empty) by showing that any theory E is a congruence relation on the term algebra T, and noting that T/E satisfies an equation M = N iff M = N ∈ E. We may give similar completeness proofs for typed lambda calculus, using logical equivalence relations in place of congruence relations. We will illustrate the main ideas by proving completeness for models without empty types. A similar proof is given for Henkin models with possibly empty types in Exercise 8.3.3. It is also possible to present a completeness theorem for Kripke models (Section 7.3) in this form, using a Kripke version of logical relations.

Throughout this section, we let T be the applicative structure of terms typed using an infinite type assignment H, with meaning function T⟦ ⟧ defined by substitution. (These are defined in Example 8.2.7.) We assume H provides infinitely many variables of each type, so each T^σ is infinite. If E is any set of equations, we define the relation R_E over T × T by taking

    R^σ_E(M, N)  iff  E ⊢ Γ ▷ M = N: σ for some finite Γ ⊆ H.

Note that since E ⊢ x: σ ▷ x = x: σ for each x: σ ∈ H, no R^σ_E is empty.

Lemma 8.3.1 For any theory E, the relation R_E ⊆ T × T is a logical equivalence relation.

Proof It is easy to see that R_E is an equivalence relation. To show that R_E is logical, suppose that R^{σ→τ}(M, M') and R^σ(N, N'). Then E ⊢ Γ_1 ▷ M = M': σ → τ and E ⊢ Γ_2 ▷ N = N': σ, and therefore E ⊢ Γ ▷ MN = M'N': τ for a suitable finite Γ ⊆ H. Therefore R^τ(MN, M'N'). It remains to show that if R^τ(MN, M'N') for every N, N' with E ⊢ Γ ▷ N = N': σ, then E ⊢ Γ ▷ M = M': σ → τ. This is where we use the assumption that H provides infinitely many variables of each type. Let x be a variable that is not free in M or M' and with x: σ ∈ H. Since E ⊢ Γ, x: σ ▷ x = x: σ, we have E ⊢ Γ, x: σ ▷ Mx = M'x: τ with Γ, x: σ ⊆ H. By (ξ), we have

    E ⊢ Γ ▷ λx:σ.Mx = λx:σ.M'x: σ → τ.

We now apply (η) to both sides of this equation to derive E ⊢ Γ ▷ M = M': σ → τ. By reflexivity, R_E respects constants. This proves the lemma. ∎

From Lemma 8.2.26, and the straightforward observation that any R_E satisfies (β), we can form a quotient model from any theory. This allows us to prove "least model" completeness for Henkin models without empty types.

Theorem 8.3.2 (Henkin completeness without empty types) Let E be any theory closed under (nonempty), and let A = T/R_E. Then A is a Henkin model without empty types satisfying precisely the equations belonging to E.

Proof By Lemma 8.3.1, the theory E determines a logical equivalence relation R_E over T. Since R_E clearly satisfies (β), the relation is admissible by Exercise 8.2.29. Let us write R for R_E and A for T/R_E. By Lemma 8.2.26, A is a model with A⟦Γ ▷ M: σ⟧(η/R) = [ηM] for every T-environment (substitution) η satisfying Γ. It remains to show that A satisfies precisely the equations belonging to E. Without loss of generality, we may restrict our attention to closed terms. The justification for this is that it is easy to prove the closed equation

    ∅ ▷ λx_1:σ_1 ... λx_k:σ_k. M = λx_1:σ_1 ... λx_k:σ_k. N: σ_1 → ··· → σ_k → τ

from the equation

    {x_1: σ_1, ..., x_k: σ_k} ▷ M = N: τ

between open terms, and vice versa. Let η_0 be the identity substitution. Then for any closed equation ∅ ▷ M = N: τ ∈ E, since η_0 ⊨ ∅, we have

    A⟦∅ ▷ M: τ⟧(η_0/R) = [M] = [N] = A⟦∅ ▷ N: τ⟧(η_0/R).

This shows that A satisfies every equation in E. Conversely, if A satisfies ∅ ▷ M = N: τ, then A certainly must do so at environment η_0/R. This implies [M] = [N], which means E ⊢ Γ ▷ M = N: τ for some Γ ⊆ H. Since M and N are closed, we have ∅ ▷ M = N: τ ∈ E by rule (nonempty). This proves the theorem. ∎

Exercise 8.3.3 A quotient construction, like the one in this section for models without empty types, may also be used to prove deductive completeness of the extended proof system with proof rules (empty I) and (empty E) for models that may have empty types. (This system is described in Sections 4.5.5 and 4.5.6.) This exercise begins the model construction and asks you to complete the proof. To prove deductive completeness, we assume that Γ0 ▷ M0 = N0: σ0 is not provable from some set E of typed equations and construct a model A with A ⊨ E, but A ⊭ Γ0 ▷ M0 = N0: σ0. The main new idea in this proof is to choose an infinite set of typed variables carefully, so that some types may be empty. We decide which types to make empty by constructing an infinite set H of typing hypotheses of the form x: σ and emptiness assertions of the form empty(σ) such that

(a) Γ0 ⊆ H,

(b) for every type σ, we either have empty(σ) ∈ H or we have x: σ ∈ H for infinitely many variables x (but not both),

(c) for every finite Γ ⊆ H containing all the free variables of M0 and N0, the equation Γ ▷ M0 = N0: σ0 is not provable from E.

The set H is constructed in infinitely many stages, starting from H0 = Γ0. If t0, t1, t2, ... is an enumeration of all type expressions, then H_{i+1} is determined from H_i by looking at equations of the form Γ, x: t_i ▷ M0 = N0: σ0, where Γ ⊆ H_i is finite and x is not in H_i and not free in M0 or N0. If

E ⊢ Γ, x: t_i ▷ M0 = N0: σ0

for any finite Γ ⊆ H_i with the above equation well-formed, then we let H_{i+1} = H_i ∪ {empty(t_i)}. Otherwise, we take

H_{i+1} = H_i ∪ {x: t_i | infinitely many fresh x}.

8.3 Proof-Theoretic Results

Let H be the union of all the H_i. Given H, the model A is defined as in this section. As described in Example 4.5.1, T is an applicative structure for any H. However, while Lemma 8.3.1 does not require (nonempty), the proof does assume that H provides infinitely many variables of each type.

(a) Show that properties (i) and (ii) hold for H.

(b) Show that the quotient is a logical equivalence relation, re-proving only the parts of Lemma 8.3.1 that assume H provides infinitely many variables of each type. (Hint: You will need to use proof rule (empty I).)

(c) Show that A does not satisfy Γ0 ▷ M0 = N0: σ0. (This is where you need to use rule (empty E). The proof that A satisfies E proceeds as before, so there is no need for you to check this.)

8.3.2 Normalization

Logical relations may also be used to prove various properties of reduction on typed terms. In this section, we show that every typed λ→ term is strongly normalizing. More precisely, we show that for every typed term Γ ▷ M: σ, we have SN(M), where SN is defined by

SN(M) iff there is no infinite sequence of β, η-reductions from M.
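To make the definition of SN concrete, here is a small sketch (my own illustration, not from the text) that represents λ-terms as nested tuples and tests strong normalization by exhaustively exploring all β-reduction sequences. On simply typed terms the search always terminates, which is exactly what Theorem 8.3.6 below guarantees; the `fuel` bound and the naive substitution (which assumes bound and free variable names are distinct) are simplifying assumptions.

```python
# Terms: ('var', name), ('lam', name, body), ('app', fun, arg).

def subst(term, x, n):
    """Naive substitution [n/x]term; assumes bound names avoid capture."""
    tag = term[0]
    if tag == 'var':
        return n if term[1] == x else term
    if tag == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, n))
    _, f, a = term
    return ('app', subst(f, x, n), subst(a, x, n))

def one_step(term):
    """All terms reachable from `term` by a single beta-reduction step."""
    tag = term[0]
    out = []
    if tag == 'app':
        f, a = term[1], term[2]
        if f[0] == 'lam':                       # (lam x. M) N  ->  [N/x]M
            out.append(subst(f[2], f[1], a))
        out += [('app', f2, a) for f2 in one_step(f)]
        out += [('app', f, a2) for a2 in one_step(a)]
    elif tag == 'lam':
        out += [('lam', term[1], b2) for b2 in one_step(term[2])]
    return out

def strongly_normalizing(term, fuel=1000):
    """True if every reduction sequence from `term` terminates.
    `fuel` bounds the search depth so non-SN terms fail loudly."""
    if fuel <= 0:
        raise RecursionError('search bound exceeded')
    return all(strongly_normalizing(t, fuel - 1) for t in one_step(term))

identity = ('lam', 'x', ('var', 'x'))
sn_example = strongly_normalizing(('app', identity, identity))
```

The typed identity applied to itself is strongly normalizing, while the untyped self-application ('app', Δ, Δ) with Δ = λx. xx would exhaust the fuel — the term rejected by the typing rules is precisely the one with an infinite reduction sequence.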

While the proof works for terms with term constants, we state the results without term constants to avoid confusion with more general results in later sections. We generalize the argument to other reduction properties and product types in the next section, and consider reductions associated with constants such as + and fix in Section 8.3.4. The normalization proof for λ→ has three main parts:

1. Define a logical predicate P on the applicative structure of terms.

2. Show that P^σ(M) implies SN(M).

3. Show that P is admissible, so that we may use the Basic Lemma to conclude that P^σ(M) holds for every well-typed term Γ ▷ M: σ.

Since the Basic Lemma is proved by induction on terms, the logical relation P is a kind of "induction hypothesis" for proving that all terms are strongly normalizing. To make the outline of the proof as clear as possible, we will give the main definitions and results first, postponing the proofs to the end of the section. Let T be an applicative structure of terms, as in Examples 4.5.1 and 8.2.7, where H provides infinitely many variables of each type. We define the logical predicate (unary logical relation) P ⊆ T by taking P^b to be all strongly normalizing terms, for each type


constant b, and extending to function types by

P^{σ→τ}(M) iff for every N ∈ T^σ, P^σ(N) implies P^τ(MN).

It is easy to see that this predicate is logical. (If we have a signature with term constants, then it follows from Lemma 8.3.4 below that P is logical, as long as only β-reduction is used in the definition of SN.) The next step is to show that P contains only strongly normalizing terms. This is a relatively easy induction on types, as long as we take the correct induction hypothesis. The hypothesis we use is the conjunction of (i) and (ii) of the following lemma.

Lemma 8.3.4 For each type σ, the predicate P^σ satisfies the following two conditions.

(i) If xM1 ... Mk ∈ T^σ and SN(M1), ..., SN(Mk), then P^σ(xM1 ... Mk).

(ii) If P^σ(M) then SN(M).

The main reason condition (i) is included in the lemma is to guarantee that each P^σ is non-empty, since this is used in the proof of condition (ii). Using Lemma 8.2.11, we may show that the logical predicate P is admissible.

Lemma 8.3.5 The logical predicate P ⊆ T is admissible.

From the two lemmas above, and the Basic Lemma for logical relations, we have the following theorem.

Theorem 8.3.6 (Strong Normalization) For every well-typed term Γ ▷ M: σ, we have SN(M).

Proof of Theorem 8.3.6 Let Γ ▷ M: σ be a typed term and consider the applicative structure T defined using variables from H with Γ ⊆ H. Let η0 be the environment for T with η0(x) = x for all x. Then since η0 is acceptable and the relation P is admissible by Lemma 8.3.5, we have P^σ(T[[Γ ▷ M: σ]]η0), or, equivalently, P^σ(M). Therefore, by Lemma 8.3.4, we conclude SN(M).

Proof of Lemma 8.3.4 We prove both conditions simultaneously, using induction on types. We will only consider variables in the proof of (i), since the constant and variable cases are identical. For a type constant b, condition (i) is straightforward, since SN(xM1 ... Mk) whenever SN(Mi) for each i. Condition (ii) follows trivially from the definition of P. So, we move on to the induction step. For condition (i) at type σ → τ, suppose xM1 ... Mk ∈ T^{σ→τ} and SN(Mi) for each i. Let N ∈ T^σ with P^σ(N). By the induction hypothesis (ii) for type σ, we have SN(N). Therefore, by the induction hypothesis (i) for type τ, we have P^τ(xM1 ... MkN). Since this reasoning applies to every N ∈ T^σ with P^σ(N), we conclude P^{σ→τ}(xM1 ... Mk). We now consider condition (ii) for type σ → τ. Suppose M ∈ T^{σ→τ} with P^{σ→τ}(M), and let x ∈ T^σ be a variable. By the induction hypothesis (i) we have P^σ(x), and so P^τ(Mx). Applying the induction hypothesis (ii), we have SN(Mx). However, this implies SN(M), since any infinite sequence of reductions from M would give us an infinite sequence of reductions from Mx.

Proof of Lemma 8.3.5 This lemma is proved using Lemma 8.2.11, which gives us a straightforward test for each type constant. To show that P is admissible, it suffices to show that for each type constant b, term N and term ([N/x]M)N1 ... Nk of type b,

SN(N) and SN(([N/x]M)N1 ... Nk) imply SN((λx:σ.M)NN1 ... Nk).

With this in mind, we assume SN(N) and SN(([N/x]M)N1 ... Nk) and consider the possible reductions from (λx:σ.M)NN1 ... Nk. It is clear that there can be no infinite sequences of reductions solely within N, because SN(N). Since ([N/x]M)N1 ... Nk is strongly normalizing, we may conclude SN(M), and SN(N1), ..., SN(Nk), since otherwise we could produce an infinite reduction sequence from ([N/x]M)N1 ... Nk by reducing one of its subterms. Therefore, if there is any infinite reduction sequence with β-reduction occurring at the left, the reduction sequence must have an initial segment of the form

(λx:σ.M)NN1 ... Nk →→ (λx:σ.M')N'N1' ... Nk' → ([N'/x]M')N1' ... Nk'

where M →→ M', N →→ N' and Ni →→ Ni' for 1 ≤ i ≤ k. But then clearly we may reduce ([N/x]M)N1 ... Nk →→ ([N'/x]M')N1' ... Nk' as well, giving us an infinite sequence of reductions from ([N/x]M)N1 ... Nk. The other possibility is an infinite reduction sequence with η-reduction occurring at the left, which would have the form

(λx:σ.M)NN1 ... Nk →→ (λx:σ.M'x)N'N1' ... Nk' → M'N'N1' ... Nk'

with x not free in M'. But since the last reduction shown could also have been carried out using β-reduction, we have already covered this possibility above. Since no infinite reduction sequence is possible, the lemma is proved.


Exercise 8.3.7 We can show weak normalization of β-reduction directly, using an argument devised by A. M. Turing sometime between 1936 and 1942 [Gan80]. For this exercise, the degree of a β-redex (λx:σ.M)N is the length of the type σ → τ of λx:σ.M when written out as a string of symbols. The measure of a term M is the pair of natural numbers (i, j) such that the redex of highest degree in M has degree i, and there are exactly j redexes of degree i in M. These pairs can be ordered lexicographically, with (i, j) < (i', j') if either i < i', or i = i' and j < j'.
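The measure in this exercise can be illustrated with a short computation (my own sketch, not part of the exercise): types are modeled as nested pairs, a redex's degree is the symbol count of its type, and Python's tuple comparison supplies exactly the lexicographic order used here.

```python
# Types: 'b' for a type constant, ('->', s, t) for a function type.

def type_size(t):
    """Degree of a redex = number of symbols in the type of its lambda."""
    if t == 'b':
        return 1
    return type_size(t[1]) + type_size(t[2]) + 1   # count the arrow too

def measure(redex_types):
    """Measure (i, j) of a term, given the types of its redexes:
    i = highest degree present, j = number of redexes of that degree."""
    if not redex_types:
        return (0, 0)
    degrees = [type_size(t) for t in redex_types]
    i = max(degrees)
    return (i, degrees.count(i))

# Contracting a redex of highest degree may duplicate redexes of lower
# degree, but the lexicographic measure still strictly decreases:
before = measure([('->', ('->', 'b', 'b'), 'b'), ('->', 'b', 'b')])
after  = measure([('->', 'b', 'b'), ('->', 'b', 'b'), ('->', 'b', 'b')])
```

Here `before` is (5, 1) and `after` is (3, 3): the count of redexes went up, but the pair still decreased in the lexicographic order, which is the heart of Turing's argument.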

Lemma 8.3.21 For the logical predicate P defined from strong normalization of pcf+-reduction, we have P(+), P(Eq?), P(if · then · else ·) and, for each type σ and natural number n, P(fix^n).

Proof For addition, we assume P^nat(M) and P^nat(N) and show P^nat(M + N). It suffices to show that M + N is strongly normalizing. However, this follows immediately, since M and N are both strongly normalizing and the reduction axioms for + only produce a numeral from two numerals. The proof for the equality test, Eq?, is essentially the same. For conditional, we assume P^bool(B), P^σ(M), P^σ(N) and demonstrate P^σ(if B then M else N). By Lemma 8.3.20, it suffices to show that E[if B then M else N] is strongly normalizing. However, this is an easy case analysis, according to whether B →→ true, B →→ false, or neither, using (ii) of Lemma 8.3.20. The remaining case is a labeled fixed-point constant. For this, we use induction on the label. The base case, for fix^0, is easy since there is no associated reduction. For fix^{n+1}, we assume P(fix^n) and let E[ ] be any elimination context with P(E[ ]). We prove the lemma by showing that E[fix^{n+1}] is strongly normalizing. If there is an infinite reduction sequence from this term, it must begin with steps of the form

E[fix^{n+1}] →→ E'[fix^{n+1}] → E'[λf: σ → σ. f(fix^n f)]

However, by the induction hypothesis and (ii) of Lemma 8.3.20, we have P(fix^n) and P(E'[ ]), which implies that an infinite reduction sequence of this form is impossible. This completes the induction step and proves the lemma.

Since Lemma 8.3.21 is easily extended to pcf+,η-reduction, we have the following corollary of Theorem 8.3.19.

Theorem 8.3.22 Both pcf+ and pcf+,η-reduction are strongly normalizing on labeled

PCF terms. We also wish to show that both forms of labeled PCF reduction are confluent. A general theorem that could be applied, if we did not have conditional at all types, is given in [BT88, BTG89] (see Exercise 8.3.29). The general theorem is that if typed lambda calculus is extended with a confluent algebraic rewrite system, the result is confluent. The proof of this is nontrivial, but is substantially simplified if we assume that R is left-linear. That is in fact the case we are interested in, since left-linearity is essential to Proposition 8.3.18 anyway. An adaptation of this proof to typed lambda calculus with labeled fixed-point operators has been carried out in [HM90, How92]. For the purposes of proving our desired results as simply as possible, however, it suffices to prove weak confluence, since confluence then follows by Proposition 3.7.32. To emphasize the fact that the argument does not depend heavily on the exact natural number and boolean operations of PCF, we show that adding β, proj, lab+ reduction to any weakly confluent reduction system R preserves weak confluence, as long as there are no symbols in common between R and the reduction axioms of β, proj, lab+. This is essentially an instance of orthogonal rewrite systems, as they are called in the literature on algebraic rewrite systems [Toy87], except that application is shared. A related confluence property of untyped lambda calculus is Mitschke's δ-reduction theorem [Bar84], which shows that under certain syntactic conditions on untyped R, the combined reduction is confluent.
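The role of Proposition 3.7.32 (Newman's lemma) can be seen in miniature with a brute-force check (my own toy example, not from the text): for a terminating system, local confluence can be tested by comparing normal forms of one-step reducts. The string-rewriting system below is an assumption chosen purely for illustration.

```python
# Toy rewrite system on strings; terminating because each rule either
# shortens the string or moves a 'b' rightward past an 'a'.
RULES = [('aa', 'a'), ('ba', 'ab')]

def one_step(s):
    """All strings reachable from s in one rewrite step (any position)."""
    out = []
    for lhs, rhs in RULES:
        i = s.find(lhs)
        while i != -1:
            out.append(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def normal_form(s):
    """Normalize by repeatedly taking the first available step; this
    terminates because the system is terminating."""
    while True:
        nxt = one_step(s)
        if not nxt:
            return s
        s = nxt[0]

def locally_confluent_on(s):
    """If s -> t1 and s -> t2, do t1 and t2 rejoin?  For a terminating
    system it suffices to compare normal forms (Newman's lemma)."""
    reducts = one_step(s)
    return all(normal_form(t1) == normal_form(t2)
               for t1 in reducts for t2 in reducts)
```

For instance, 'baba' has two distinct one-step reducts, but both normalize to 'abb', exactly the diamond-closing property that weak confluence plus termination buys.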

Lemma 8.3.23 Let R be a set of reduction axioms of the form L → R, where L and R are lambda terms of the same type and L does not contain an application xM of a variable to an argument, or any of the symbols λ, fix or fix^n. If R is weakly confluent, then R, β, proj, lab+ reduction is weakly confluent.

Proof The proof is a case analysis on pairs of redexes. We show two cases, involving β-reduction and reduction from R. The significance of the assumptions on the form of L is


that these guarantee that if a subterm of a substitution instance [M1, ..., Mk/x1, ..., xk]L of some left-hand side contains a β, proj, lab+ redex, then this redex must be entirely within one of M1, ..., Mk. The lemma would also hold with any other hypotheses guaranteeing this property. The first case is an R-redex inside a β-redex. This gives a term of the form

(λy:σ.(...[M1, ..., Mk/x1, ..., xk]L...))N

reducing to either of the terms

(...[N/y][M1, ..., Mk/x1, ..., xk]L...)

(λy:σ.(...[M1, ..., Mk/x1, ..., xk]R...))N

the first by β-reduction and the second by an axiom L → R. It is easy to see that both reduce in one step to

(...[N/y][M1, ..., Mk/x1, ..., xk]R...)

The other possible interaction between β-reduction and R begins with a term of the form

(...[M1, ..., Mk/x1, ..., xk]L...)

where one of M1, ..., Mk contains a β-redex. If Mi → Mi', then this term reduces to either of the two terms

(...[M1, ..., Mi', ..., Mk/x1, ..., xk]L...)

(...[M1, ..., Mi, ..., Mk/x1, ..., xk]R...)

in one step. We can easily reduce the first to

(...[M1, ..., Mi', ..., Mk/x1, ..., xk]R...)

by one step. Since we can also reach this term from the second by some number of reductions, one for each occurrence of xi in R, local confluence holds. The cases for other reductions in a substitution instance of the left-hand side of an axiom are very similar.

By Lemma 3.7.32, it follows that pcf+ and pcf+,η-reduction are confluent. We may use Proposition 8.3.18 to derive confluence of pcf-reduction from confluence of pcf+. This gives us the following theorem.

Theorem 8.3.24 The reductions pcf+ and pcf+,η are confluent on labeled PCF terms, and pcf-reduction is confluent on ordinary (unlabeled) PCF terms.


Completeness of Leftmost Reduction

The final result in this section is that if R is a set of "left-normal" rules (defined below), and R, β, proj, lab+ reduction is confluent and terminating, then the leftmost reduction strategy defined in Section 2.4.3 is complete for finding R, β, proj, fix-normal forms. Since the PCF rules are left-normal, it follows from Proposition 2.4.15 that lazy reduction is complete for finding normal forms of PCF programs. A reduction axiom is left-normal if all the variables in the left-hand side of the rule appear to the right of all the term constants. For example, all of the PCF rules for natural numbers and booleans are left-normal. If we permute the arguments of conditional, however, putting the boolean argument at the end, then we would have reduction axioms

cond x y true → x,    cond x y false → y.

These are not left-normal, since x and y appear to the left of the constants true and false. Intuitively, if we have left-normal rules, then we may safely evaluate the arguments of a function from left to right. In many cases, it is possible to replace a set of rules with essentially equivalent left-normal ones by either permuting arguments or introducing auxiliary function symbols (see Exercise 8.3.28). However, this is not possible for inherently "nonsequential" functions such as parallel-or. We will prove the completeness of leftmost reduction using strong normalization and confluence of positive labeled reduction, and the correspondence between labeled and unlabeled reduction. Since labeled fix reduction may terminate with fix^0 where unlabeled fix reduction could continue, our main lemma is that any fix^0 occurring to the left of the leftmost redex will remain after reducing the leftmost redex.
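The leftmost strategy itself is easy to state operationally. The sketch below (my own, covering pure β-reduction only, without the PCF constants or fix) contracts the leftmost-outermost redex at each step; the classic payoff is that it finds the normal form of (K I) Ω even though Ω diverges, whereas a strategy that evaluated arguments first would loop.

```python
def subst(term, x, n):
    """Naive substitution [n/x]term; assumes bound names avoid capture."""
    tag = term[0]
    if tag == 'var':
        return n if term[1] == x else term
    if tag == 'lam':
        _, y, body = term
        return term if y == x else ('lam', y, subst(body, x, n))
    _, f, a = term
    return ('app', subst(f, x, n), subst(a, x, n))

def leftmost_step(term):
    """Contract the leftmost-outermost beta-redex; None if none exists."""
    tag = term[0]
    if tag == 'app':
        f, a = term[1], term[2]
        if f[0] == 'lam':                       # outermost redex first
            return subst(f[2], f[1], a)
        s = leftmost_step(f)
        if s is not None:
            return ('app', s, a)
        s = leftmost_step(a)
        if s is not None:
            return ('app', f, s)
        return None
    if tag == 'lam':
        s = leftmost_step(term[2])
        return None if s is None else ('lam', term[1], s)
    return None

def normalize(term, fuel=1000):
    """Iterate leftmost steps until a normal form is reached."""
    while fuel > 0:
        nxt = leftmost_step(term)
        if nxt is None:
            return term
        term, fuel = nxt, fuel - 1
    raise RecursionError('no normal form found within fuel')

K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
I = ('lam', 'x', ('var', 'x'))
delta = ('lam', 'z', ('app', ('var', 'z'), ('var', 'z')))
omega = ('app', delta, delta)               # diverges under any strategy
result = normalize(('app', ('app', K, I), omega))   # leftmost discards omega
```

Two leftmost steps reach the normal form I: the diverging argument is thrown away by K before it is ever reduced.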

Lemma 8.3.25 Let R be a set of left-normal reduction axioms with no left-hand side containing fix^0. If M → N by the leftmost reduction step, and fix^0 occurs to the left of the leftmost redex of M, then fix^0 must also occur in N, to the left of the leftmost redex if N is not in normal form.

Proof Suppose M ≡ C[L] → C[R] ≡ N by contracting the leftmost R, β, proj, fix redex, and assume fix^0 occurs in C[ ] to the left of [ ]. If N ≡ C[R] is in normal form, then fix^0 occurs in N, since fix^0 occurs in C[ ]. We therefore assume N is not in normal form and show that fix^0 occurs to the left of the leftmost redex in N. Suppose, for the sake of deriving a contradiction, that fix^0 occurs within, or to the right of, the leftmost redex of N. Since a term beginning with fix^0 is not a redex, the leftmost redex of N must begin with some symbol to the left of fix^0, which is to the left of R when we write N as C[R]. The proof proceeds by considering each possible form of redex. Suppose the leftmost redex in N is Proj_i⟨N1, N2⟩, with fix^0 and therefore R occurring


to the right of Proj_i. Since fix^0 must occur between Proj_i and R, R cannot be ⟨N1, N2⟩, and so a redex of the form Proj_i⟨_, _⟩ must occur in M to the left of L. This contradicts the assumption that L is the leftmost redex in M. Similar reasoning applies to a β-redex; the fix case is trivial. It remains to consider a redex SL', with S some substitution and L' → R' in R. We assume fix^0, and therefore R, occur to the right of the first symbol of SL'. Since fix^0 is not in L', by hypothesis, and the rule is left-normal, all symbols to the right of fix^0, including R if it occurs within L', must be the result of substituting terms for variables in L'. It follows that we have a redex in M to the left of L, again a contradiction. This proves the lemma.

Theorem 8.3.26 Suppose R, β, proj, lab+ reduction is confluent and terminating, with R left-linear and left-normal. If M →→ N by R, β, proj, fix-reduction and N is a normal form, then there is a reduction from M to N that contracts the leftmost redex at each step.

Proof If M →→ N then, by Lemma 8.3.16, there exist labelings M* and N* of these terms such that M* →→ N* by R, β, proj, lab+ reduction. Since R, β, proj, lab+-reduction is confluent and terminating, we may reduce M* to N* by reducing the leftmost redex at each step. We show that the leftmost R, β, proj, lab+ reduction of M* to N* is also the leftmost R, β, proj, fix-reduction of M to N (when labels are removed). It is easy to see that this is the case if no term in the reduction sequence has fix^0 to the left of the leftmost redex, since this is the only term that would be an unlabeled redex without being a labeled one. Therefore, we assume that the final k steps of the reduction have the form

M*_k → M*_{k-1} → ... → M*_0 ≡ N*

where fix^0 occurs to the left of the leftmost redex in M*_k. But by Lemma 8.3.25 and induction on k, fix^0 must also be present in N*, which contradicts the fact that N is an R, β, proj, fix-normal form. It follows from Lemma 8.3.16 that by erasing labels we obtain a leftmost reduction from M to N.

A related theorem shows that leftmost reduction is normalizing for the untyped lambda calculus extended with any left-normal, linear and non-overlapping term rewriting system. Our proof is simpler, since we assume termination, and since type restrictions make fix the only source of potential nontermination. Since the rules of PCF satisfy all the conditions above, Theorem 8.3.26 implies the completeness of leftmost reduction for finding PCF normal forms. This was stated as Proposition 2.4.12 in Section 2.4.3.

Exercise 8.3.27 Gödel's system T of primitive recursive functionals is λ→ over type constant nat and term constants


0 : nat

succ : nat → nat

R_σ for each type σ,

with the following reduction axioms:

R_σ M N 0 → M

R_σ M N (succ x) → N x (R_σ M N x)

The primitive recursion operator R_σ at type σ is essentially the same as prim_σ of Exercise 2.5.7, with cartesian products eliminated by currying. Extend the proof in this section to show that Gödel's T is strongly normalizing. You need not repeat proof steps that are given in the text; just point out where additional arguments are needed and carry them out. Historically, Gödel's T was used to prove the consistency of first-order Peano arithmetic [Göd58], by a reduction also described in [HS86, Tro73] and mentioned briefly in [Bar84, Appendix A.2]. It follows from this reduction that the proof of strong normalization of T cannot be formalized in Peano arithmetic. Given this, it is particularly surprising that the proof for T involves only the same elementary reasoning used for λ→.
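The equational behaviour of the recursor can be sketched in a few lines (my own illustration, not part of the exercise). Numerals are modeled as Python ints and M, N as Python functions, so this captures the equations R M N 0 = M and R M N (succ x) = N x (R M N x), not the syntactic reduction relation itself.

```python
def R(m, n, k):
    """Primitive recursion: R m n 0 = m, R m n (succ x) = n(x, R m n x)."""
    acc = m
    for x in range(k):          # unfold succ-by-succ, exactly as the axioms do
        acc = n(x, acc)
    return acc

# Addition and multiplication defined by primitive recursion, in the
# curried style of Exercise 2.5.7 with products eliminated:
def add(a, b):
    return R(a, lambda x, r: r + 1, b)      # a + succ(x) = succ(a + x)

def mul(a, b):
    return R(0, lambda x, r: add(r, a), b)  # a * succ(x) = a * x + a
```

Since the loop performs exactly k unfoldings, termination of this model is immediate; the content of the exercise is that termination also holds for the syntactic reduction on open higher-type terms.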

Exercise 8.3.28 Consider the following set of algebraic reduction axioms:

f true true y → false

f true false y → true

f false x true → false

f false x false → true

This system is not left-normal. Show that by adding a function symbol not with axioms

not true → false

not false → true

we may replace the four reduction axioms by two left-normal ones, in a way that gives exactly the same truth-table for f.
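As a quick sanity check on this exercise (my own, and relying on the truth table as reconstructed above), the two candidate left-normal rules f true x y → not x and f false x y → not y can be compared against the four original axioms over all boolean inputs:

```python
def f_four_rules(b, x, y):
    """The four (non-left-normal) axioms, read off case by case."""
    if b and x:
        return False     # f true  true  y -> false
    if b and not x:
        return True      # f true  false y -> true
    if (not b) and y:
        return False     # f false x true  -> false
    return True          # f false x false -> true

def f_left_normal(b, x, y):
    """Two left-normal rules using the auxiliary symbol not:
       f true x y -> not x,  f false x y -> not y."""
    return (not x) if b else (not y)

table_agrees = all(
    f_four_rules(b, x, y) == f_left_normal(b, x, y)
    for b in (True, False) for x in (True, False) for y in (True, False))
```

The two rules are left-normal because in each left-hand side the variables x, y appear to the right of the constants f and true/false, and the eight-row comparison confirms the truth tables coincide.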

Exercise 8.3.29 This exercise is concerned with adding algebraic rewrite rules to typed lambda calculus. Recall from Section 4.3.2 that if Σ = (S, F) is an algebraic signature, we may construct a similar λ→ signature Σ' = (B, F') by taking the sorts of Σ as base types, and letting each algebraic function have the corresponding curried type. This allows us to view algebraic terms as a special case of typed lambda terms. Let R be an algebraic rewrite system over Σ. Thinking of R as a set of rewrite rules for λ→ terms over Σ', it makes sense to ask whether the reduction system consisting of R together with the usual reduction rules (β)red and (η)red of typed lambda calculus is confluent. This turns out to be quite tricky in general, but it gets much simpler if we drop (η)red and just consider R plus β-reduction (modulo α-conversion of terms). We call the combination of R and β-reduction R, β-reduction. The problem is to prove that if R is confluent (as an algebraic rewrite system), then R, β-reduction is confluent on typed lambda terms over Σ'. (This also holds with proj-reduction added [HM90, How92].) Some useful notation is to write →R for reduction using rules from R, →R,β similarly for R, β-reduction, and βnf(M) for the normal form of M under β-reduction. Note that βnf(M) may still be reducible using rules from R. You may assume the following facts, for all Γ ▷ M: σ and Γ ▷ N: σ.

1. If R is confluent on algebraic terms over signature Σ, then →R is confluent on typed lambda terms over Σ'.

2. If M is a β-normal form and M →R N, then N is a β-normal form.

3. If M →R,β N, then βnf(M) reduces to βnf(N) by →R.

The problem has two parts.

(a) Using facts 1–3 above, show that if R is confluent (as an algebraic rewrite system), then R, β-reduction is confluent on typed lambda terms over Σ'. (This problem has a four- or five-sentence solution, using a picture involving →R,β and ordinary β-reduction.)

(b) Show that (a) fails if we add fix-reduction. Specifically, find a confluent algebraic rewrite system R such that R, β, fix-reduction on typed lambda terms over signature Σ ∪ {fix_σ: (σ → σ) → σ} is not confluent. (Hint: try to find some algebraic equations that become inconsistent when fixed points are added.)

8.4 Partial Surjections and Specific Models

In Sections 8.4.1–8.4.3, we prove completeness of β, η-conversion for three specific models: the full classical, recursive and continuous hierarchies. These theorems may be contrasted with the completeness theorems of Section 8.3.1, which use term models to show that provability coincides with semantic implication. The completeness theorems here show that the specific theory of β, η-conversion captures semantic equality between pure typed lambda terms in three specific, non-syntactic models. One way of understanding each of these theorems is to consider a specific model as characterizing a specific form of "distinguishability." Suppose we have two terms,


M, N: (b → b) → b, for example. These will be equal in the full classical hierarchy if the functions defined by M and N produce equal values of type b for all possible function arguments. On the other hand, in a recursive model A where A^{b→b} only contains recursive functions, these terms will be equal if the functions they define produce equal values when applied to all recursive functions. Since not all set-theoretic functions are recursive, there may be some equations (at this type) that hold in the recursive model but not in the full set-theoretic hierarchy. Surprisingly, the completeness theorems show that these two forms of semantic equivalence are the same, at least for pure typed lambda terms without constants. The third case, the full continuous hierarchy, is also interesting for its connection with PCF extended with parallel-or. Specifically, we know from Theorem 5.4.14 that two terms of PCF + por are equal in the continuous hierarchy over nat and bool iff they are operationally equivalent. In other words, two lambda terms are equal in the continuous model iff they are indistinguishable by PCF program contexts. Therefore, our completeness theorem for the full continuous hierarchy shows that β, η-conversion captures PCF operational equivalence for pure typed lambda terms without the term constants of PCF. A technical point is that this only holds for terms whose types involve only nat and →; with bool, the theorem does not apply, since we only prove completeness of β, η-conversion for the full continuous hierarchy over infinite CPOs. A more general theorem, which we do not consider, is Statman's so-called 1-section theorem [Rie94, Sta85a, Sta82]. Briefly, Statman's theorem gives a sufficient condition on a Henkin model A implying that two terms have the same meaning in A iff the terms are β, η-convertible. It is called the "1-section theorem" since the condition only involves the first-order functions of A.
Although the 1-section theorem implies all three of the completeness theorems here, the proof is more complex and combinatorial than the logical relations argument used here, and the generalization to other lambda calculi is less clear. We will work with type frames instead of arbitrary applicative structures. Recall that a type frame is an extensional applicative structure A such that the elements of A^{σ→τ} are functions from A^σ to A^τ and App f x = f(x). We therefore write fx in place of App f x.

8.4.1 Partial Surjections and the Full Classical Hierarchy

It is tempting to think of any type frame B as a "subframe" of the full classical hierarchy P with the same interpretation of type constants. This seems quite natural, since at every type σ → τ, the set B^{σ→τ} is a subset of the set of all functions from B^σ to B^τ. However, we do not have B^{σ→τ} ⊆ P^{σ→τ} in general. This is because B^σ may be a proper subset of P^σ if σ is a functional type, and so the domain of a function in B^{σ→τ} is different from the domain of a function in P^{σ→τ}. However, the intuitive idea that P "contains" B is reflected in a


certain kind of logical relation between P and B. Specifically, a logical relation R ⊆ P × B is called a partial surjection if each R^σ is a partial function and, for each b ∈ B^σ, there is some a ∈ P^σ with R^σ(a, b). In this section, we prove that if we begin with partial surjections R^b ⊆ P^b × B^b for each type constant b of some frame B, then the induced logical relation from the full classical hierarchy P to B is a partial surjection. By choosing B isomorphic to a βη-term model, we may conclude that any full classical hierarchy with each P^b large enough has the same theory as B. Consequently, an equation Γ ▷ M = N: σ holds in the full classical hierarchy over any infinite types iff M and N are βη-convertible. This was first proved in [Fri75].
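The way a base-type partial surjection propagates to function types can be checked exhaustively on finite carriers. The sketch below (my own finite sanity check, with toy carriers chosen as an assumption) builds the induced relation at type b → b from a base relation and verifies that it is again a partial surjection, in miniature anticipation of Lemma 8.4.1.

```python
from itertools import product

A_b = (0, 1, 2)          # carrier of the "full" model at the base type
B_b = (0, 1)             # carrier of the second frame at the base type
R_b = {(0, 0), (1, 1)}   # partial surjection: 2 is outside the domain

# Functions on finite carriers, represented as tuples of their values.
A_fun = list(product(A_b, repeat=len(A_b)))   # all functions A_b -> A_b
B_fun = list(product(B_b, repeat=len(B_b)))   # all functions B_b -> B_b

def related(a, b):
    """Induced logical relation at type b -> b:
    R(a, b) iff for all (a', b') in R_b, (a(a'), b(b')) in R_b."""
    return all((a[x], b[y]) in R_b for (x, y) in R_b)

pairs = [(a, b) for a in A_fun for b in B_fun if related(a, b)]

# Partial function: every a is related to at most one b.
is_partial = all(
    len({b for a2, b in pairs if a2 == a}) <= 1 for a in A_fun)

# Surjective: every b in B_fun is related to some a.
is_surjective = all(any(b2 == b for _, b2 in pairs) for b in B_fun)
```

Both properties hold here: functions in A whose behaviour on the domain of R_b tracks some b are related to exactly that b, and every b is tracked by some a because the larger carrier supplies enough functions — the same two facts the proof of Lemma 8.4.1 establishes at all types.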

Lemma 8.4.1 (Surjections for Classical Models) Let R ⊆ P × B be a logical relation over a full classical hierarchy P and type frame B. If R^b ⊆ P^b × B^b is a partial surjection for each type constant b, then R is a partial surjection.

Proof We assume R^σ and R^τ are partial surjections and show that R^{σ→τ} is also a partial surjection. Recall that

R^{σ→τ} = {(a, b) | ∀a' ∈ P^σ. ∀b' ∈ B^σ. R^σ(a', b') implies R^τ(aa', bb')}.

We first show that R^{σ→τ} is onto, by letting b ∈ B^{σ→τ} and finding a ∈ P^{σ→τ} related to b. Writing R^σ and R^τ as functions, a can be any function satisfying

a a' ∈ (R^τ)^{-1}(b(R^σ a'))  for all a' ∈ Dom(R^σ).

Since P^{σ→τ} is the full type hierarchy, there is at least one such a ∈ P^{σ→τ}. To see that R^{σ→τ} is a partial function on its domain, we suppose that R^{σ→τ}(a, b1) and R^{σ→τ}(a, b2). Since R^σ is onto, we may let f: B^σ → P^σ be any "choice function" satisfying

R^σ(f b', b')  for all b' ∈ B^σ.

Then for every b' ∈ B^σ we have R^τ(a(f b'), b_i b') for i = 1, 2. But since R^τ is a partial function, it follows that b1 b' = b2 b' for every b' ∈ B^σ. Thus, by extensionality of B, we have b1 = b2. This proves the lemma.

It is now an easy matter to prove the completeness of β, η-conversion for the full classical hierarchy over any infinite sets.

Theorem 8.4.2 (Completeness for Classical Models) Let P be the full classical hierarchy over any infinite sets. Then for any pure lambda terms Γ ▷ M, N: σ without constants, we have P ⊨ Γ ▷ M = N: σ iff M and N are β, η-convertible.


Proof Let B be isomorphic to a Henkin model of equivalence classes of terms, as in Theorem 8.3.2, where E is the pure theory of β, η-conversion. Let R ⊆ P × B be a logical relation with each R^b a partial surjection. We know this is possible since each P^b is infinite and each B^b is countably infinite. By Lemma 8.4.1, R is a partial surjection. Consequently, by Lemma 8.2.17, Th(P) ⊆ Th(B). But since every theory contains Th(B), it follows that

Th(P) = Th(B).

8.4.2 Full Recursive Hierarchy

In this subsection, we prove a surjection theorem for modest sets. Just as any frame B with an arbitrary collection of functions is the partial surjective image of a full classical hierarchy, we can show that any frame B containing only recursive functions is the partial surjective image of a full recursive hierarchy. Since we may view the term model as a recursive model, the surjection lemma for recursive models also implies that β, η-conversion is complete for recursive models with an infinite interpretation of each type constant. In order to carry out the inductive proof of the surjection lemma, we need to be able to show that for every function in a given recursive frame B, we can find some related recursive function in the full recursive hierarchy A. The way we will do this is to establish that, for every type, we have functions that "recursively demonstrate" that the logical relation R is a partial surjection. Following the terminology of Kleene realizability [Kle71], we say that a function f_σ realizes the fact that R^σ is surjective by computing some f_σ b with R^σ(f_σ b, b) from any b ∈ B^σ. Similarly, a recursive function g_σ realizes the fact that R^σ is a partial function by computing the unique g_σ a = b with R^σ(a, b) from any a in the domain of R^σ. We adapt the classical proof to modest sets by showing that the inductive hypothesis is realized by recursive functions. The assumption we will need for the interpretation of type constants is a form of "recursive embedding." If (A, e_A) and (B, e_B) are modest sets, then we say (f, g) is a recursive embedding of (B, e_B) into (A, e_A) if f and g are recursive functions

f: (B, e_B) → (A, e_A)  and  g: (A, e_A) → (B, e_B)

whose composition g ∘ f = id_B is the identity function on B. We incorporate such functions into a form of logical relation with the following definition. Let A and B be type frames with each type interpreted as a modest set. We say a family {R^σ, f_σ, g_σ} of relations and functions is a recursive partial surjection from A to B if R ⊆ A × B is a logical partial surjection, and

f_σ: (B^σ, e_B) → (A^σ, e_A),    g_σ: (A^σ, e_A) → (B^σ, e_B)


are recursive functions, with f_σ realizing that R^σ is surjective and g_σ realizing that R^σ is a partial function.

Lemma 8.4.3 (Surjections for Recursive Models) Let A, B be frames of recursive functions and modest sets, for a signature without term constants, with A a full recursive hierarchy. If B^b is recursively embedded in A^b for each type constant b, then there is a recursive partial surjection {R^σ, f_σ, g_σ} from A to B.

Proof For each type constant b, we define (R^b, f_b, g_b) from the recursive embedding (f, g) of B^b into A^b by taking f_b = f, g_b = g, and R^b(x, y) iff x = f y (so that g_b x = y whenever R^b(x, y)). It is easy to check that (R^b, f_b, g_b) has the required properties. We define the logical relation R ⊆ A × B by extending to higher types inductively:

R^{σ→τ}(a, b) iff for all a' ∈ A^σ and b' ∈ B^σ, R^σ(a', b') implies R^τ(aa', bb').

For the inductive step of the proof, we have three conditions to verify, namely:

(i) R^{σ→τ} is a surjective partial function;

(ii) there is a recursive f_{σ→τ}: (B^{σ→τ}, e_B) → (A^{σ→τ}, e_A) with R^{σ→τ}(f_{σ→τ} b, b) for all b ∈ B^{σ→τ};

(iii) there is a recursive g_{σ→τ}: (A^{σ→τ}, e_A) → (B^{σ→τ}, e_B) with g_{σ→τ} a = b whenever R^{σ→τ}(a, b).

Since (i) follows from (ii) and (iii), we check only (ii) and (iii), beginning with (ii). Given b ∈ B^{σ→τ}, we must find some recursive function a ∈ A^{σ→τ} with R^{σ→τ}(a, b). Writing R^σ and R^τ as functions, a must have the property

a a' ∈ (R^τ)^{-1}(b(R^σ a'))  for all a' ∈ Dom(R^σ).

It is easy to see that this condition is satisfied by taking a = f_τ ∘ b ∘ g_σ, since g_σ computes the function R^σ and f_τ chooses an element of (R^τ)^{-1}(b(R^σ a')) whenever there is one. We can also compute a as a function of b, by composing b with recursive functions. For condition (iii), we assume a ∈ A^{σ→τ} with R^{σ→τ}(a, b) for some b, so that we need to show how to compute b uniquely from a. It suffices to show how to compute the function value b b1 for every b1 ∈ B^σ, and that the value b b1 is uniquely determined by a and b1. If b1 ∈ B^σ, then R^σ(f_σ b1, b1), and so we have R^τ(a(f_σ b1), b b1). Therefore g_τ(a(f_σ b1)) = b b1, and this equation determines the value b b1 uniquely. Thus we take g_{σ→τ} a = g_τ ∘ a ∘ f_σ, which, as we have just seen, gives the correct result for every a ∈ Dom(R^{σ→τ}).

Logical Relations

The completeness theorem is now proved exactly as for the full classical hierarchy, with Lemma 8.4.3 in place of Lemma 8.4.1, and with the term model viewed as a frame of recursive functions and modest sets, as explained in Examples 5.5.3, 5.5.4 and Exercise 5.5.13, instead of as a frame of classical functions.

Theorem 8.4.4 (Completeness for Recursive Models) Let A be the full recursive hierarchy over any infinite modest sets ⟨A^b, e_b⟩, with the natural numbers recursively embedded in each A^b. Then for any pure lambda terms Γ ▷ M, N: σ without constants, we have A ⊨ Γ ▷ M = N: σ iff M and N are βη-convertible.

8.4.3 Full Continuous Hierarchy

We may also prove a completeness theorem for models consisting of all continuous functions on some family of complete partial orders. While similar in outline to the completeness proofs of the last two sections, the technical arguments are more involved. The proof is from [Plo82]. The main lemma in each of the preceding sections is the surjection lemma. In each case, the main step in its proof is, given b ∈ B^{σ→τ}, finding some a ∈ A^{σ→τ} satisfying

a a₁ ∈ (R^τ)⁻¹(b(R^σ a₁))

for all a₁ ∈ Dom(R^σ).

With the full classical hierarchy, we use the fact that all functions belong to A^{σ→τ}. This allows us to choose one a satisfying the conditions. In the case of recursive functions on modest sets, we built up recursive functions between A and B by induction on type, and used these to define a. In this section, we will build up similar continuous functions between A and the CPO obtained by lifting the set of equivalence classes of terms. Since there is no available method for regarding each type of the term structure as a "usefully" ordered CPO, the proof will involve relatively detailed reasoning about terms and free variables. The analogue of "recursive embedding" is "continuous retract."

The objects of the scone of C are triples (S, f, A), with S a set, A an object of C, and f: S → |A| a function, where |A| denotes the set of morphisms unit → A. A morphism (t, x): (S, f, A) → (S', f', A') consists of a function t: S → S' and a morphism x: A → A' of C satisfying f' ∘ t = |x| ∘ f.

The reader may easily check that (|unit|, id, unit) is a terminal object in the scone. There is a canonical "forgetful" functor π from the scone to C which takes (S, f, A) to A and (t, x) to x. Category theorists will recognize the scone as a special case of a comma category, namely (id_Set ↓ |−|), where id_Set is the identity functor on Set (see Exercise 7.2.25). It is common to overload notation and use the symbol Set for the identity functor from sets to sets (and similarly for other categories). With this convention, we may write the scone as (Set ↓ |−|). A remarkable feature of sconing in general is that it preserves almost any additional categorical structure that C might have. Since we are interested in cartesian closed structure, we show that sconing preserves terminal object, products and function objects (exponentials).

Lemma 8.6.14 If C is a cartesian closed category, then the scone of C is cartesian closed and the canonical functor π from the scone to C is a cartesian closed functor.

Proof Let X = (S, f, A) and Y = (S', f', A') be objects of the scone. Then the object

X × Y = (S × S', f × f', A × A'),

where (f × f')(s, s') = (f(s), f'(s')), is a product of X and Y. A function object of X and Y is given by

X → Y = (M, h, A → A'),

where M is the set of morphisms X → Y in the scone and h((t, x)): unit → (A → A') is the morphism in C obtained from x: A → A' by currying. We leave it to the reader, in Exercise 8.6.17, to check that π is a cartesian closed functor.

8.6 Generalizations of Logical Relations

In order to make a connection with logical predicates or relations, we will be interested in a subcategory of the scone which we call the subscone of C. The objects of the subscone are the triples (S, f, A) with f: S ↪ |A| an inclusion of sets. It is possible to let f: S ↪ |A| be any one-to-one map. But since we are really more interested in the map |x|: |A| → |A'| in the diagram above than in t: S → S', for reasons that will become clear, we will assume that if (S, f, A) is an object of the subscone, then S ⊆ |A| is an ordinary set-theoretic subset and f: S → |A| is the identity map sending elements of S into the larger set |A|. The morphisms (S, f, A) → (S', f', A') of the subscone are the same as for the scone, so the subscone is a full subcategory of the scone. If (t, x) is a morphism from (S, f, A) to (S', f', A') in the subscone, then the set function t: S → S' is uniquely determined by x and S. Specifically, t is the restriction of |x|: |A| → |A'| to the subset S ⊆ |A|.

The subscone inherits the cartesian closed structure of the scone, up to isomorphism. More precisely, the subscone is a subcategory of the scone that is also cartesian closed. However, given two objects X and Y from the subscone, the function object X → Y = (M, h, A → A') given in the proof of Lemma 8.6.14 is not an object of the subscone, since M is not a subset of |A → A'|. However, the function object for X and Y in the subscone is isomorphic to the function object in the scone. Instead of describing function objects in the subscone directly for arbitrary C, we will take this up in the illustrative special case that C is a product of two CCCs.

Let us suppose we have a product category C = A × B, where both categories A and B have a terminal object. Recall that the objects of A × B are ordered pairs of objects from A and B, and the morphisms (A, B) → (A', B') are pairs (x, y), where x: A → A' is a morphism in A and y: B → B' is a morphism in B. It is easy to see that when A and B are cartesian closed, so is A × B, with coordinate-wise cartesian closed structure. For example, a terminal object of A × B is (unit, unit). It is easy to see that |(A, B)|_C = |A|_A × |B|_B, where the cartesian product of |A|_A and |B|_B is taken in sets. We will often omit the subscripts from the various |·|'s, since these are generally clear from context.
The objects of the scone of A × B may be described as tuples (S, f, g, A, B), with S a set, A an object in A, B an object in B, and f: S → |A| and g: S → |B| functions that form a so-called span:

|A| <--f-- S --g--> |B|

The morphisms (S, f, g, A, B) → (S', f', g', A', B') in the scone of A × B may be described as tuples (t, x, y), with x: A → A' a morphism in A, y: B → B' a morphism in B, and t: S → S' a function such that:


f' ∘ t = |x| ∘ f   and   g' ∘ t = |y| ∘ g.

Let us now look at the subscone of A × B. Working through the definition, we may regard the objects of this subcategory as tuples (S, f, g, A, B), where f and g provide an inclusion S ↪ |A| × |B|. Since S, f and g determine a subset of |A| × |B|, as before, an object (S, f, g, A, B) may be expressed more simply as a triple (S, A, B) with S an ordinary binary relation on |A| × |B|. For any morphism (t, x, y) between objects of the subscone, the set map t is again uniquely determined by the A and B maps x and y and the relation S. Specifically, t is the restriction of the product map |x| × |y| to the subset S. This allows us to write morphisms simply as pairs, instead of triples. It is easy to see that (x, y) forms a morphism from (S, A, B) to (S', A', B') exactly if, for all morphisms a: unit → A in A and b: unit → B in B, we have

a S b implies (x ∘ a) S' (y ∘ b),

where we use relational notation a S b to indicate that S relates a to b, and similarly for S'. With some additional notation, we can express the definition of morphism in the subscone of A × B as a form of "commuting square." To begin with, let us write S: |A| —×→ |B| to indicate that S is a binary relation on |A| × |B|. The definition of morphism now has an appealing diagrammatic presentation as a pair (x, y) satisfying

  |A| --|x|--> |A'|
   S|    ⊆     |S'
  |B| --|y|--> |B'|

where "⊆" in this diagram means that the relation |y| ∘ S is included in the relation S' ∘ |x|.

It is instructive to write out the function objects (exponentials) in the cartesian closed subscone of A × B. The function object (S, A, B) → (S', A', B') may be written (R, A → A', B → B'), where R: |A → A'| —×→ |B → B'| is the binary relation such that for all morphisms e: unit → (A → A') in A and d: unit → (B → B') in B, we have

e R d   iff   a S b implies App ∘ ⟨e, a⟩ S' App ∘ ⟨d, b⟩

for all a: unit → A in A, b: unit → B in B. We may also express products in a similar fashion. As in the general case, the subscone is a CCC and a full subcategory of the scone, and the restriction of the forgetful functor to the subscone is also a cartesian closed functor. The main properties of the subscone of A × B are summarized in the following proposition, whose proof is left as Exercise 8.6.18.

Proposition 8.6.15 Let C = A × B be the cartesian product of two CCCs. Then the subscone of C is a cartesian closed category, with the canonical functor from the subscone to C a cartesian closed functor.

Products and function objects in the subscone are given by

(S, A, B) × (S', A', B') = (S × S', A × A', B × B'),
(S, A, B) → (S', A', B') = (S → S', A → A', B → B'),

where the relations S × S' and S → S' have the following characterizations:

a (S × S') b   iff   (Proj₁ ∘ a) S (Proj₁ ∘ b) and (Proj₂ ∘ a) S' (Proj₂ ∘ b)
e (S → S') d   iff   ∀a: unit → A. ∀b: unit → B. a S b implies App ∘ ⟨e, a⟩ S' App ∘ ⟨d, b⟩.

At this point, it may be instructive to look back at the definition of logical relations in Section 8.2.1 and see the direct similarity between the descriptions of products and function objects in this proposition and the definition of logical relations over an applicative structure or Henkin model. The difference is that here we have generalized the definition to any cartesian closed category, forming relations in Set after applying the functor |−|.
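On finite carriers, the two relational clauses in Proposition 8.6.15 can be computed directly. The following sketch is our own illustration (hypothetical names, not from the text): a relation S on |A| × |B| is represented as a boolean-valued function, and the product and function-space relations are built from it exactly as in the proposition.

```typescript
type Rel<A, B> = (a: A, b: B) => boolean;

// Product relation: (a, a2) (S x S2) (b, b2) iff a S b and a2 S2 b2.
function prodRel<A, B, A2, B2>(s: Rel<A, B>, s2: Rel<A2, B2>): Rel<[A, A2], [B, B2]> {
  return ([a, a2], [b, b2]) => s(a, b) && s2(a2, b2);
}

// Function-space relation over *finite* carriers domA, domB:
// e (S -> S2) d iff for all a, b with a S b, we have e(a) S2 d(b).
function funRel<A, B, A2, B2>(
  domA: A[], domB: B[], s: Rel<A, B>, s2: Rel<A2, B2>
): Rel<(a: A) => A2, (b: B) => B2> {
  return (e, d) =>
    domA.every((a) => domB.every((b) => !s(a, b) || s2(e(a), d(b))));
}

// Toy model: relate the numbers 0..3 to parity strings.
const nums = [0, 1, 2, 3];
const pars = ["even", "odd"];
const parity: Rel<number, string> = (n, p) => (n % 2 === 0) === (p === "even");

// Successor on numbers is related to "flip" on parities, but not to the identity.
const succ = (n: number) => n + 1;
const flip = (p: string) => (p === "even" ? "odd" : "even");
const related = funRel(nums, pars, parity, parity)(succ, flip);       // true
const unrelated = funRel(nums, pars, parity, parity)(succ, (p) => p); // false
const pairRel = prodRel(parity, parity); // relates (0, 2) to ("even", "even")
```

This is precisely the Basic-Lemma pattern: related arguments are taken to related results, checked here by brute force over the finite carriers.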

Exercise 8.6.16 Show that for any category C with products and terminal object, the functor |−| from C to Set preserves products up to isomorphism, i.e., |A × B| ≅ |A| × |B| for all objects A and B of C.

Exercise 8.6.17 Show that for any CCC C, the map π from the scone to C defined in this section is a functor that preserves terminal object, products and function objects. Begin by showing that π is a functor, then check that π(X × Y) = πX × πY for all objects X = (S, f, A) and Y = (S', f', A') of the scone, and similarly for function objects and the terminal object.

Exercise 8.6.18 Prove Proposition 8.6.15.

(a) Show that (S, A, B) × (S', A', B') = (S × S', A × A', B × B') is a product in the subscone.

(b) Show that (S, A, B) → (S', A', B') = (S → S', A → A', B → B') is a function object in the subscone.

(c) Show that the forgetful functor from the subscone to C is a cartesian closed functor.

Exercise 8.6.19 Show that if C is a well-pointed category, then so is the subscone of C.

8.6.4 Comparison with Logical Relations

In this section, we describe the connection between logical relations and subscones, for the special case of cartesian closed categories determined from Henkin models, and show how the Basic Lemma may be derived from simple categorical properties of the subscone.

Since a logical predicate P ⊆ A is a family of predicates indexed by the types over the signature of A, we can think of a logical predicate as a mapping from types over some signature to subsets of some structure. We will see that a mapping of this form is determined by the categorical interpretation of lambda terms in the subscone of a CCC. In discussing the interpretation of terms in cartesian closed categories in Section 7.2.4, we used a structure called a Σ-CCC, where Σ may be any signature. The difference between a CCC and a Σ-CCC is that a Σ-CCC is equipped with a specific choice of object A^b for each type constant b of Σ and a morphism unit → A for each term constant of Σ. Since a Σ-CCC associates an object or arrow with each symbol in Σ, we can think of a Σ-CCC as a cartesian closed category A together with a mapping Σ → A from the signature into the category.

For any signature Σ and set E of equations between Σ-terms, there is a CCC C_Σ(E) called the term category generated by E, as described in Example 7.2.19. There is a canonical mapping Σ → C_Σ(E) presenting C_Σ(E) as a Σ-CCC, namely, the function mapping each type constant to itself and each term constant to its equivalence class, modulo E. As shown in Proposition 7.2.46, there is a unique cartesian closed functor from C_Σ(∅) to any other Σ-CCC D. (The categorical way of saying this is that C_Σ(∅) is the free cartesian closed category over signature Σ.) If we draw in the mappings from Σ, then such a cartesian closed functor forms a commuting triangle


since the interpretation of a type or term constant is determined by the way that D is regarded as a Σ-CCC. As explained in Section 7.2.6, the cartesian closed functor from C_Σ(∅) into a Σ-CCC D is the categorical equivalent of the meaning function associated with a Henkin model.

It is really only a matter of unraveling the definitions, and using Proposition 8.6.15, to see that the logical predicates over a Henkin model A for signature Σ correspond to ways of regarding the subscone of the category C_A determined by A as a Σ-CCC. More specifically, let C = C_A be the cartesian closed category determined by Henkin model A, as explained in Section 7.2.5, regarded as a Σ-CCC in the way that preserves the meaning of terms. By Theorem 7.2.41, C is a well-pointed cartesian closed category, with the morphisms unit → σ exactly the elements of A^σ from the Henkin model A. If we consider a cartesian closed functor F from the term category into the subscone of C, the object map of this functor designates an object F(σ) = (P^σ, σ) with P^σ ⊆ |σ| for each type σ. But since |σ| = A^σ, a meaning function into the subscone chooses a subset of the Henkin model at each type. It is easy to see from Proposition 8.6.15 that the family of predicates determined by the object map of such a cartesian closed functor is in fact a logical predicate. This gives us the following proposition.

Proposition 8.6.20 Let C = C_A be the well-pointed Σ-CCC determined by the Henkin model A for signature Σ. Then a logical predicate on A is exactly the object part of a cartesian closed functor from the term category C_Σ(∅) over Σ into the subscone of C.

Using product categories, Proposition 8.6.20 gives us a similar characterization of logical relations over any number of models. For example, if C = C_A × C_B is determined from Henkin models A and B, then C and its subscone are again well-pointed cartesian closed categories (see Exercise 8.6.19) and a meaning function into the subscone is a logical relation over Henkin models A and B. Using the correspondence given in Proposition 8.6.20, we can see that the Basic Lemma for logical relations is a consequence of the commuting diagram given in the following proposition.


signature and let C be a cartesian closed catLet be a Composing with the —÷ C presenting C as a egory. Consider any mapping —÷ corresponding map —÷ C and, by the natural obtain a C, we canonical functor C (0) into the embedding of symbols into the term category, we have a similar map —÷ there exist unique cartesian closed functors term category. Given these three maps from from CE (0) into C and C commuting with the canonical functor C —÷ C as follows:

Proposition 8.6.21

0 C

Proof Let functions from to objects of the CCCs C, C and CE (0) be given as in the statement of the proposition. It is a trivial consequence of the definitions that the triangle involving C and C commutes. By Proposition 7.2.46, the cartesian closed functors CE(ø) —÷ C and CE(ø) —÷ C are uniquely determined. This implies that the two C triangles with vertices C commute. Finally, since the canonical functor C —÷ C is a cartesian closed functor (see Proposition 8.6.15), the two uniquely • determined cartesian closed functors from CE (0) must commute with this functor. It is again a matter of unraveling definitions to see how the Basic Lemma follows from this diagram. Let us consider the binary case with A and B given by Henkin models A and B of simply typed lambda calculus. Then both C = A x B and C are well-pointed cartesian closed categories. The cartesian closed functor CE (0) —÷ C maps a typed lambda term a M: r to the pair of morphisms giving the meaning of M in A and, respectively, the meaning of M in B. The cartesian closed functor CE(O) —÷ C maps a typed lambda term x CBI[a]1I = x x: a M: r to a morphism from some subset S ç to a subAt Bt, x CBIIr]1I = x where CAI[a]1 is the object of A named type expressetS' c sion a, and similarly for C81[a]1 and so on. As noted in Section 8.6.3, such a morphism in the subscone is a product map x g with f: CAI[a]1 —÷ C4t]j and g: C81[a]1 —÷ C81[r]1 which maps a pair (u, v) with u S v to a pair (f(u), g(v)) with f(u) 5' g(v). Because the two cartesian closed functors of CE(O) commute with the functor C —÷ C, must be the meaning = AI[x: a M: r]J of the term M in A, regarded as a function of the free


variable x, and similarly g = B[[x: σ ▷ M: τ]]. Because the term x: σ ▷ M: τ was arbitrary, and σ may be a cartesian product of any types, we have shown that, given logically related interpretations of the free variables, the meaning of any term in A is logically related to the meaning of this term in B. This is precisely the Basic Lemma for logical relations over Henkin models.

8.6.5

General Case and Applications to Specific Categories

The definitions of the scone and subscone of a category C rely on the functor from C into Set. However, since relatively few properties of Set are used, we can generalize the construction and associated theorems to other functors and categories. We begin by stating the basic properties of the "generalized scone" and "subscone," then continue by deriving Kripke logical relations and directed-complete logical relations (from Section 8.6.2) as special cases.

Lemma 8.6.14 holds in a more general setting, with Set replaced by any cartesian closed category C' with equalizers and the functor |−|: C → Set replaced by any functor F: C → C' that preserves products up to isomorphism, by essentially the same proof. This takes us from the specific comma category (Set ↓ |−|) to the more general case (C' ↓ F).

Proposition 8.6.22 Let C and C' be cartesian closed categories and assume C' has equalizers. Let F: C → C' be a functor that preserves finite products up to isomorphism. Then the comma category (C' ↓ F) is a cartesian closed category and the canonical functor π: (C' ↓ F) → C is a cartesian closed functor.

The next step is to identify generalizations of the subscone. The proof of Proposition 8.6.21 actually establishes a general version where, instead of the subscone of C, we may consider any CCC D that has a cartesian closed functor into C. We will be interested in situations where D is a subcategory of a comma category (C' ↓ F), such that the restriction to D of the canonical cartesian closed functor (C' ↓ F) → C is a cartesian closed functor.

Proposition 8.6.23 Let C and C' be cartesian closed categories and assume C' has equalizers. Let F: C → C' be a functor that preserves products, and let D be a sub-cartesian-closed category of the comma category (C' ↓ F). Let Σ be a λ^{×,→} signature and consider any mapping Σ → D presenting D as a Σ-CCC. Composing with the canonical functor D → C, we obtain a corresponding map Σ → C and, by the natural embedding of symbols into the term category, we have a similar map Σ → C_Σ(∅). Given these three maps from Σ, there exist unique cartesian closed functors from C_Σ(∅) into C and D commuting with the canonical functor D → C as follows:

C_Σ(∅) → D → C, with the composite equal to the unique functor C_Σ(∅) → C.

Sconing with Kripke Models If P is a partially ordered set, then we may consider P as a category whose objects are the elements of P, with morphisms indicating when one element is less than another, as described in Example 7.2.6. As shown in Section 7.3.7, each Kripke lambda model with P as the set of "possible worlds" determines a cartesian closed category C that is a subcategory of the functor category Set^P. The inclusion F of C into the category Set^P preserves products, since the interpretation of a product type in a Kripke lambda model is the same as cartesian product in Set^P. By Exercise 7.2.14, Set^P has equalizers, and we can apply Propositions 8.6.22 and 8.6.23.

For the appropriate analog of the subscone, we take the subcategory with an inclusion at each "possible world." More specifically, let D be the full subcategory of the comma category (Set^P ↓ F) that consists of objects (S, f, A) with S(p) ↪ F(A)(p) an inclusion for each p in P. By assuming these are ordinary subsets, we may omit f when referring to objects of D. It may be shown that D is a CCC (Exercise 8.6.24). In describing function objects (exponentials) in D, we may use natural transformations F(app_{A,A'}): F(A → A') × F(A) → F(A') because the functor F preserves binary products. For each object C in C and each p ≤ p' in P, we also use the transition function F(C)(p ≤ p'): F(C)(p) → F(C)(p') given by the action of the functor F(C): P → Set on the order of P. Specifically, (S, A) → (S', A') may be given as (P, A → A'), where A → A' is taken in C and P: P → Set is the functor such that the set P(p) consists of all t in F(A → A')(p) for which, for all p' ≥ p and all a in F(A)(p'), if a belongs to S(p'), then the element F(app)(F(A → A')(p ≤ p')(t), a) in F(A')(p') belongs to S'(p'). In other words, the object parts of the unique cartesian closed functor from the term category into D in this case amount to the Kripke logical predicates on the Kripke lambda model that corresponds to F.


In the binary case, we let A and B be cartesian closed categories and let F: A → Set^P and G: B → Set^P be finite-product-preserving functors given by Kripke lambda models A and B, each with P as the poset of possible worlds. Let H: A × B → Set^P be the functor given by H(A, B) = F(A) × G(B). Continuing as above, with C = A × B and with H instead of F, we can see that the object parts of the cartesian closed functors into D are exactly the Kripke logical relations on Kripke lambda models A and B.

Sconing with CPOs Since Cpo is a cartesian closed category with equalizers (by Exercise 7.2.14), we may also form a generalized scone using functors into Cpo. This is like the case for Kripke models, except that we have the added complication of treating pointed CPOs properly; our relations must preserve the bottom element in order to preserve fixed points, as illustrated in Section 8.6.2. It might seem reasonable to work with the category Cppo of pointed CPOs instead, since relations taken in Cppo would have to have least elements. However, the category Cppo does not have equalizers (by Exercise 7.2.14). Moreover, we may wish to construct logical relations which are closed under limits at all types, but which are only pointed where needed to preserve fixed points. A general approach would be to use lifted types, and extend the lifting functor from Cpo to the appropriate comma category. This would give "lifted relations" on lifted types, which would necessarily preserve least elements. A less general alternative, which we will describe below, is to simply work with subcategories where predicates or relations are pointed when needed. This is relatively ad hoc, but is technically simpler and (hopefully) illustrates the main ideas.

Since we are interested in preserving fixed-point operators, we let Σ be a signature that may include fixed-point constants at one or more types, and let E_fix be the set of equations of the form fix_σ = λf: σ → σ. f(fix_σ f) for each type σ that has a fixed-point operator in Σ. The category C_Σ(E_fix) is the term category of simply typed lambda calculus with one or more fixed-point constants, each with its associated equational axiom. By Proposition 7.2.46, there is a unique cartesian closed functor from C_Σ(E_fix) to any Σ-CCC satisfying the fixed-point equations. In particular, there is a unique such functor into any Σ-CCC of CPOs. If we use the forgetful functor |−|: Cpo → Set, then the scone (Set ↓ |−|) allows us to form logical predicates or relations over CPOs.
However, these relations will not be directed complete. Instead, we let I be the identity functor on Cpo and consider the comma category (Cpo ↓ I). This could also be written (Cpo ↓ Cpo), since we are using Cpo as a convenient shorthand for the identity functor on Cpo. To form the subscone, we consider the full subcategory D with objects of the form (P, f, A), where f is an inclusion P ↪ A. In other words, P is a sub-CPO of A, and therefore closed under limits of directed sets. By generalizing Proposition 8.6.15, we can see that D provides logical predicates over CPO models that are closed


under limits. However, there is no reason for predicates taken in Cpo to preserve least elements: we have objects (P, f, A) where A is pointed and P is not. This is not a problem in general, but if we wish to have a predicate preserve a fixed-point operator on pointed A, we will need P to be a pointed subset of A. In order to preserve fixed points, we consider a further subcategory. Assume we have a map Σ → Cpo identifying certain CPOs on which we wish to have fixed-point operators. For this to make sense, we assume that if σ is a type over Σ with a constant fix_σ, then our mapping from types to CPOs will guarantee that the interpretation of σ is a pointed CPO. In this case, we are interested in subcategories of D whose objects are of the form (P, A), where if A is the interpretation of a type with a fixed-point constant, then the inclusion P ↪ A is strict. In particular, P must have a least element. The CCC structure of this subcategory remains the same, but in D we also get fixed-point operators where required, and the forgetful functor D → Cpo preserves them. The object parts of the unique cartesian closed functor from C_Σ(E_fix) into D are exactly the inclusive logical predicates described in Section 8.6.2, pointed where required by the choice of fixed-point constants. In the binary case, we let D be the full subcategory of (Cpo ↓ H), where H: Cpo × Cpo → Cpo is given by H(A, B) = A × B, whose objects are of the form (S, A, B), with S a sub-CPO of A × B, and force pointed relations as required.

Exercise 8.6.24 Suppose P is a partially ordered set and C is a Kripke lambda model with P as the set of "possible worlds." Let F: C → Set^P be the inclusion of C into the functor category Set^P. Show that the full subcategory D of the comma category (Set^P ↓ F) consisting of objects (S, f, A) with S(p) ↪ F(A)(p) an inclusion for each p in P is cartesian closed.
Polymorphism and Modularity

9.1 Introduction

9.1.1 Overview

This chapter presents type-theoretic concepts underlying polymorphism and data abstraction in programming languages. These concepts are relevant to language constructs such as Ada packages and generics [US80], C++ templates [ES90], and polymorphism, abstract types, and modules in ML [MTH90, MT91, Ull94] and related languages such as Miranda [Tur85] and Haskell [HF92]. As a rule, the typed lambda calculi we use to formalize and study polymorphism and data abstraction are more flexible than their implemented programming language counterparts. In some cases, the implemented languages are restricted due to valid implementation or efficiency considerations. However, there are many ideas that have successfully progressed from lambda calculus models into implemented programming languages. For example, it appears from early CLU design notes that the original discovery that polymorphic functions could be type-checked at compile time came about through Reynolds' work on polymorphic lambda calculus [Rey74b]. Another example is the Standard ML module system, which was heavily influenced by a type-theoretic perspective [Mac85]. The main topics of this chapter are:

• Syntax of polymorphic type systems, including predicative, impredicative and type: type versions.

• Predicative polymorphic lambda calculus, with discussion of its relationship to other systems, semantic models, reduction and polymorphic declarations.

• Survey of impredicative polymorphic lambda calculus, focusing on expressiveness, strong normalization and construction of semantic models. This section includes discussion of representation of natural numbers and other data by pure terms, a characterization of theories by contexts, including the maximum consistent theory, parametricity, and formal category theory.

• Data abstraction and existential types.

• General products and sums and their relation to module systems for Standard ML and related languages.

Most of the topics are covered in a less technical manner than earlier chapters, with emphasis on the intuitive nature and expressiveness of various languages. An exception is the more technical study of semantics and reduction properties of impredicative polymorphism.


9.1.2

Types as Function Arguments

In the simply-typed lambda calculus and programming languages with similar typing constraints, static types serve a number of purposes. For example, the type soundness theorem (Lemma 4.5.13) shows that every expression of type σ → τ determines a function from type σ to τ, in every Henkin model. Since every term has meaning in every Henkin model, it follows that no typed term contains a function of type nat → nat, say, applied to an argument of type bool. This may be regarded as a precise way of saying that simply-typed lambda calculus does not allow type errors. Simply-typed lambda calculus also enjoys a degree of representation independence, as discussed in Section 8.5. However, the typing constraints of simply-typed lambda calculus have some obvious drawbacks: there are many useful and computationally meaningful expressions that are not well-typed. In this chapter, we consider static type systems that may be used to support more flexible styles of typed programming, generally without sacrificing the type security of more restricted languages based on simply-typed lambda calculus.

Some of the inconvenience of simple type structure may be illustrated by considering sorting functions in a language like C or Pascal. The reader familiar with a variety of sorting algorithms will know that most may be explained without referring to the kind of data to be sorted. We typically assume that we are given an array of pointers to records, and that each record has an associated key. It does not matter what type of value the key is, as long as we have a "comparison" function that will tell which of two keys should come first in the sorted list. Most sorting algorithms simply compare keys using the comparison function and swap records by resetting pointers accordingly. Since this form of sorting algorithm does not depend on the type of the data, we should be able to write a function Sort that sorts any type of records, given a comparison function as an actual parameter. However, in a language such as Pascal, every Sort function must have a type of the form

Sort: (t × t → bool) × Array[ptr t] → Array[ptr t]

for some fixed type t. (This is the type of a function which, given a binary relation on t and an array of pointers to t, returns an array of pointers to t.) This forces us to write a separate procedure for each type of data, even though there is nothing inherent in the sorting algorithm to require this. From a programmer's point of view, there are several obvious disadvantages. The first is that if we wish to build a program library, we must anticipate in advance the types of data we wish to sort. This is unreasonably restrictive. Second, even if we are only sorting a few types of data, it is a bothersome chore to duplicate a procedure declaration several times. Not only does this consume programming time, but having several copies of a procedure may increase debugging and maintenance


time. It is far more convenient to use a language that allows one procedure to be applied to many types of data.

One family of solutions to this problem may be illustrated using function composition, which is slightly simpler than sorting. The function

compose_{nat,nat,nat} = λf: nat → nat. λg: nat → nat. λx: nat. f(gx)

composes two natural number functions. Higher-order functions of type nat → (nat → nat) and (nat → nat) → nat may be composed using the function

λf: (nat → nat) → nat. λg: nat → (nat → nat). λx: nat. f(gx)
It is easy to see that these two composition functions differ only in the types given to the formal parameters. Since these functions compute their results in precisely the same way, we might expect both to be "instances" of some general composition function. A polymorphic function is a function that may be applied to many types of arguments. (This is not a completely precise definition, in part because "many" is not specified exactly, but it does give the right general idea.) We may extend to include polymorphic functions by extending lambda abstraction to type variables. Using type variables r, s and t, we may write a "generic" composition function

  compose_{r,s,t} = λf: s → t. λg: r → s. λx: r. f(gx).

It should be clear that compose_{r,s,t} denotes a composition function of type (s → t) → (r → s) → (r → t), for any semantic values of type variables r, s and t. In particular, when all three type variables are interpreted as the type of natural numbers, the meaning of compose_{r,s,t} is the same as compose_{nat,nat,nat}. By lambda abstracting r, s and t, we may write a function that produces compose_{nat,nat,nat} when applied to type arguments nat, nat and nat. To distinguish type variables from ordinary variables, we will use the symbol T for some collection of types. As we shall see shortly, there are several possible

interpretations for T. Since these are more easily considered after presenting abstraction and application for type variables, we will try to think of T as some collection of types that at least includes all the types defined so far, and perhaps other types not yet specified. There is no need to think of T itself as a "type," in the sense of "element of T." In particular, we do not need T to include itself as a member. Using lambda abstraction over types, which we will call type abstraction, we may define a polymorphic composition function compose

  compose = λr: T. λs: T. λt: T. compose_{r,s,t}.
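The generic composition function and its instantiation at specific types can be sketched in TypeScript, whose generics provide a form of parametric polymorphism. The names below (`compose`, `succ`, `double`) are our own illustration, with `number` standing in for nat:

```typescript
// A generic composition function: one algorithm at every type.
// The type parameters R, S, T play the role of the bound type
// variables r, s, t in the generic compose above.
const compose =
  <R, S, T>(f: (s: S) => T, g: (r: R) => S) =>
  (x: R): T =>
    f(g(x));

const succ = (n: number) => n + 1;
const double = (n: number) => 2 * n;

// Explicit "type application" at number (in the role of nat),
// analogous to applying compose to type arguments nat, nat, nat.
const h = compose<number, number, number>(succ, double);

console.log(h(5)); // succ(double(5)) = 11
```

In most uses the type arguments are inferred rather than written, but supplying them explicitly, as here, mirrors type application in the calculus.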


This function requires three type parameters before it may be applied to an ordinary function. We will refer to the application of a polymorphic function to a type parameter as type application. Type applications may be simplified using a version of β-reduction, with the type argument substituted in place of the bound type variable. If we want to apply compose to two numeric functions f, g: nat → nat, then we first apply compose to type arguments nat, nat and nat. Using β-reduction for type application, this gives us

  compose nat nat nat = (λr: T. λs: T. λt: T. λf: s → t. λg: r → s. λx: r. f(gx)) nat nat nat
                      = λf: nat → nat. λg: nat → nat. λx: nat. f(gx).

In other words, compose nat nat nat = compose_{nat,nat,nat}. By applying compose to other type arguments, we may obtain composition functions of other types. In a typed function calculus with type abstraction and type application, we must give types to polymorphic functions. A polymorphic function that illustrates most of the typing issues is the polymorphic identity

  Id = λt: T. λx: t. x.

It should be clear from the syntax that the domain of Id is T. The range is somewhat more difficult to describe. Two common notations for the type of Id are Πt: T. t → t and ∀t: T. t → t. In these expressions, Π and ∀ bind t. The ∀ notation allows informal readings such as, "the type of functions which, for all t: T, give us a map from t to t." The Π notation is closer to standard mathematical usage, since a product Π_{i∈I} A_i over an infinite index set I consists of all functions f: I → ∪_{i∈I} A_i such that f(i) ∈ A_i. Since Id t: t → t for every t: T, the polymorphic identity has the correct functional behavior to belong to the infinite product Πt: T. t → t, and we may write Id: Πt: T. t → t. We will use Π notation in the first polymorphic lambda calculus we consider and ∀ in a later system, primarily as a way of distinguishing the two systems. Just as we can determine the type of an ordinary function application fx from the types of f and x, we can compute the type of a type application from the type of the polymorphic function and the type argument. For the type application Id τ, for example, the type of Id is Πt: T. t → t. The type of Id τ is obtained by substituting τ for the bound variable t in t → t. While this substitution may seem similar to some kind of β-reduction (with Π in place of λ), it is important to notice that this substitution is done only on the type of Id, not the function itself. In other words, we can type-check applications of polymorphic functions without evaluating the applications. In a programming language based on this form of polymorphism, this type operation may be carried out by a compile-time type checker (except in the case of type: type polymorphism described below, where it is difficult to separate compile-time from run-time).
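The polymorphic identity and its type applications can be sketched in TypeScript; this is a hedged illustration (the names `id` and `idNum` are ours), where the generic signature plays the role of Πt: T. t → t:

```typescript
// The polymorphic identity Id = λt: T. λx: t. x. The generic
// parameter T is the type abstraction; each use instantiates it.
const id = <T>(x: T): T => x;

// Instantiating the identity at number yields an identity of type
// number -> number; the substitution happens only in the type,
// not in the code, just as type application is checked without
// evaluating the term.
const idNum: (x: number) => number = id;

console.log(id<string>("ok")); // "ok"
console.log(idNum(42));        // 42
```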


Having introduced types for polymorphic functions, we must decide how these types fit into the type system. At the point we introduced T, we had only defined non-polymorphic types. With the addition of polymorphic types, there are three natural choices for T. We may choose to let T contain only the simple types defined using →, and perhaps + and/or ×, and some collection of type constants. This leads us to a semantically straightforward calculus of what is called predicative polymorphism. The word "predicative" refers to the fact that T is introduced only after we have defined all of the members of T. In other words, the collection T does not contain any types that rely on T for their definition. An influential predicative calculus is Martin-Löf's constructive type theory [Mar73, Mar82, Mar84], which provides the basis for the Nuprl proof development system. A second form of polymorphism is obtained by letting T also contain all the polymorphic types (such as Πt: T. t → t), but not considering T itself a type. This T is "impredicative," since we assume T is closed under type definitions that refer to the entire collection T. The resulting language, formulated independently by Girard [Gir71, Gir72] and Reynolds [Rey74b], is variously called the second-order lambda calculus, System F, or the calculus of impredicative polymorphism. (It is also referred to as the polymorphic lambda calculus, but we will consider this phrase ambiguous.) The final alternative is to consider T a type, and let T contain all types, including itself. This may seem strange, since assumptions like "there is a set of all sets" are well known to lead to foundational problems (see Section 1.6). However, from a computational point of view, it is not immediately clear that there is anything wrong with introducing a "type of all types." Some simple differences between these three kinds of polymorphism may be illustrated by considering uses of Id, since the domain of Id is T.
If we limit T to the collection of simple types, we may only apply Id to a non-polymorphic type such as nat or (nat → nat) → (nat → bool). If we are only interested in polymorphic functions as a means for defining functions with simple types, then this seems sufficient. A technical property of this restricted language is that any model of the simply-typed lambda calculus may be extended to a model of predicative polymorphism. In particular, there are classical set-theoretic models of predicative polymorphism. A second advantage of predicative polymorphism is the way that modularity constructs may be integrated into the type system, as discussed in Section 9.5. Predicativity may also be useful in the development of type inference algorithms. These algorithms allow a programmer to obtain the full benefit of a static type system without having to explicitly declare the type of each variable in a program. The type inference algorithm used in the programming language ML is discussed in Chapter 11. If we let T be the collection of all types, then the polymorphic identity may be applied to any type, including its own. By β-reduction on types, the result of this application is an identity function


  Id (Πt: T. t → t) = λx: (Πt: T. t → t). x

which we may apply to Id itself. This gives a form of "self-application" reminiscent of Russell's paradox (Exercise 4.1.1). In the impredicative calculus, it is impossible to interpret every polymorphic lambda term as a set-theoretic function and every type as the set of all functions it describes. The reason is that the interpretation of Id would have to be a function whose domain contains a set (the meaning of Πt: T. t → t) which contains Id. The reader may find it amusing to draw a Venn diagram of this situation. If we think of a function as a set of ordered pairs (as described in Section 1.6.2), then we begin with a circle for the set Id of ordered pairs Id = {(τ, λx: τ. x) | τ ∈ T}. But one of the elements of Id is the pair ((Πt: T. t → t), λx: (Πt: T. t → t). x), allowing the function to be applied to its own type. One of the components of this ordered pair is a set which contains Id, giving us a picture with circularly nested sets. This line of reasoning may be used to show that a naive interpretation of the impredicative second-order calculus conflicts with basic axioms of set theory. A more subtle argument, discussed in Section 9.12, shows that it is not even possible to have a semantic model with classical set-theoretic interpretation of all function types. The semantics of impredicative polymorphism has been a popular research topic in recent years, and several appealing models have been found. In the final form of polymorphism, we let T be a type which contains all types, including itself. This design is often referred to as type: type. One effect of type: type is to allow polymorphic functions such as Id to act as functions from types to types, as well as functions from types to elements of types. For example, the application Id T is considered meaningful, since T: T. By β-reduction on types, we have Id T = λx: T. x, and so (Id T): T → T is a function from types to types. We may also write more complicated functions from T to T and use these in type expressions.
While languages based on the other two choices for T are strongly normalizing (unless we add fixed-point operators) and have efficiently decidable typing rules, strong normalization fails for most languages with type: type. In fact, it may be shown that the set of provable equations in a minimal language with type: type is undecidable ([MR86]; see also [Mar73, Coq86, How87]). Since arbitrarily complex expressions may be incorporated into type expressions, this makes the set of well-typed terms undecidable. Historically, a calculus with a type of all types was proposed as a foundation for constructive mathematics by Martin-Löf, until an inconsistency was found by Girard in the early 1970's [Mar73]. While compile-time type checking may be impossible for a language with a type of all types, we will see in Section 9.3.5 that there are insightful and non-trivial semantic models for polymorphism with a type of all types. In Sections 9.1.3 and 9.1.4, we summarize some extensions of polymorphic type systems. In Section 9.2, we return to the predicative polymorphic calculus and give a precise definition of the syntax, equational proof rules and semantics of this typed lambda calculus. We also show how impredicative and type: type polymorphism may be derived by adding rules to the predicative type system. All of the systems described in this chapter use what is called parametric polymorphism. This term, coined by Strachey [Str67], refers to the fact that every polymorphic function uses "essentially the same algorithm" at each type. For example, the composition function given earlier in this section composes two functions f and g in essentially the same way, regardless of the types of f and g. Specifically, compose r s t f g x is always computed by first applying g to x and then applying f to the result. This may be contrasted with what Strachey termed ad hoc polymorphism. An ad hoc polymorphic function might, for example, test the value of a type argument and then branch in some way according to the form of this type. If we added a test for type equality, we could write the following form of "composition" function:

  ad_hoc_compose = λr: T. λs: T. λt: T. λf: s → t. λg: r → s. λx: r.
                     if Eq? s t then f(f(gx)) else f(gx)
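The contrast between parametric and ad hoc behavior can be sketched in TypeScript. Since TypeScript erases types at run time, the sketch passes explicit runtime tags in place of the type-equality test Eq?; all names here are our own illustration, and f and g are restricted to a single type A so that the doubled application makes sense in both branches:

```typescript
// Parametric: the same algorithm regardless of the type arguments.
const compose =
  <R, S, T>(f: (s: S) => T, g: (r: R) => S) =>
  (x: R): T =>
    f(g(x));

// Ad hoc, in Strachey's sense: the result depends on a test of the
// type arguments. Runtime tags s and t stand in for the type
// parameters and for the test Eq? s t.
const adHocCompose =
  (s: string, t: string) =>
  <A>(f: (a: A) => A, g: (a: A) => A) =>
  (x: A): A =>
    s === t ? f(f(g(x))) : f(g(x));

const succ = (n: number) => n + 1;
console.log(compose(succ, succ)(0));                     // always f(g 0) = 2
console.log(adHocCompose("nat", "nat")(succ, succ)(0));  // f(f(g 0)) = 3
console.log(adHocCompose("nat", "bool")(succ, succ)(0)); // f(g 0)   = 2
```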

The function ad_hoc_compose clearly composes f and g in a manner that depends on the type of f. This kind of function is considered briefly in Section 9.3.2. It appears that ad hoc polymorphism is consistent with predicative polymorphism, as explored in [HM95a]. However, ad hoc operators destroy some of the desirable properties of the impredicative polymorphic calculus. Some research papers on parametricity, which has been studied extensively if not conclusively, are [BFSS89, FGSS88, Rey83, MR92, Wad89].

9.1.3 General Products and Sums

There are a number of variations and extensions of the polymorphic type systems described in this chapter. A historically important series of typed lambda calculi are the Automath languages, used in a project to formalize mathematics [DB80]. Some conceptual descendants of the Automath languages are Martin-Löf's intuitionistic type theory [Mar73, Mar82, Mar84] and the closely related Nuprl system for program verification and formal mathematics [C+86]. Two important constructs in these predicative type systems are the "general" product and sum types, written using Π and Σ, respectively. Intuitively, general sums and products may be viewed as straightforward set-theoretic constructions. If A is an expression defining some collection (either a type or collection of types, for example), and B is an expression with free variable x which defines a collection for each x in A, then Σx: A. B and Πx: A. B are called the sum and product of the family B over the index set A, respectively. In set-theoretic terms, the product Πx: A. B is the cartesian product of the family of sets {B(x) | x ∈ A}. The elements of this product are functions f such that f(a) ∈ [a/x]B for each a ∈ A. The sum Σx: A. B is the disjoint union of the family {B(x) | x ∈ A}. Its members are ordered pairs (a, b) with a ∈ A and b ∈ [a/x]B. Since the elements of sum types are pairs, general sums have projection functions first and second for first and second components. A polymorphic type Πt: T. σ or ∀t: T. σ is a special case of general products. Another use of general products is in so-called "first-order dependent products." An example from programming arises in connection with arrays of various lengths. For simplicity, let us consider the type constructor vector which, when applied to an integer n, denotes the type of integer arrays indexed from 1 to n. If we have a function zero_vector that, given an integer n, returns the zero-vector (the array with value 0 in each location) of length n, then we can write the type of this function as a general product type

  zero_vector: Πn: int. vector n

In words, the type Πn: int. vector n is the type of functions which, on argument n, return an element of the type vector n. An interesting and sometimes perplexing impredicative language with general products is the Calculus of Constructions [CH88], which extends

the impredicative polymorphic calculus. Existential types, described in Section 9.4, provide a type-theoretic account of abstract data types. These types are similar to general sums, but restricted in a significant way. Intuitively, in a program with an abstract data type of stacks, for example, stacks would be implemented using some type (such as records consisting of an array and an array index) for representing stacks, and functions for manipulating stacks. Thus an implementation of stacks is a pair consisting of a type and a tuple of elements of a related type (in this case, a tuple of functions). If σ is a type expression describing the signature of the tuple of functions associated with stacks, then the type of all stack implementations is the existential type ∃t: T. σ. This differs from the general sum type Σt: T. σ in the following way. If (a, b): Σx: A. B, then first(a, b) is an expression for an element of A with first(a, b) = a. However, if (t = τ, M: σ): ∃t: T. σ, then we do not want any program to be able to access the first component, τ, since this is the representation of an abstract type. This restriction follows from the goals of data abstraction, but may seem unnecessarily restrictive in other contexts, such as for ML-style modules. We discuss existential types in more detail in Section 9.4.
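The flavor of this restriction can be sketched in TypeScript, where a closure hides the representation type much as an existential type would; the Counter interface and makeCounter below are our illustration, not notation from the text:

```typescript
// An abstract type of counters, in the spirit of ∃t. σ: clients see
// only the signature of operations, never the representation type.
interface Counter {
  inc(): void;
  get(): number;
}

// The "witness" type t is the closed-over number; no client can
// project it out, matching the restriction that the first component
// of an existential package is inaccessible.
function makeCounter(): Counter {
  let rep = 0; // hidden representation
  return {
    inc() { rep += 1; },
    get() { return rep; },
  };
}

const c = makeCounter();
c.inc();
c.inc();
console.log(c.get()); // 2
```

Because only the interface escapes, the representation could be changed (say, to a string of tally marks) without affecting any client, which is exactly the goal of data abstraction.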

Exercise 9.1.1 Show that if x does not occur free in B, then Πx: A. B is the set of functions from A to B and Σx: A. B is the set of ordered pairs A × B.

9.1.4 Types as Specifications

The more expressive type systems mentioned in Section 9.1.3 all serve a dual purpose, following what is called the formulas-as-types correspondence, or sometimes the Curry-Howard isomorphism. This general correspondence between typed lambda calculi and constructive logics allows us to regard lambda terms (or programs) as proofs and types as specifications. In simple type systems without variables or binding operators in types, the logic of types is just propositional logic. With polymorphism and data abstraction, however, we may develop type systems of quantified logics, capable of specifying more subtle properties of programs. This approach is used in the Nuprl system for program verification and other conceptual descendants of the Automath languages [DB80]. Some related systems for checking proofs via type checking are the Edinburgh Logical Framework [HHP87] and PX [HN88], which is based on Feferman's logical system [Fef79]. Section 4.3.4 contains a basic explanation of natural deduction and the formulas-as-types correspondence for propositional logic. The main idea in extending this correspondence to universal quantification is that a constructive proof of ∀x: A. φ must be some kind of construction which, given any a ∈ A, produces a proof of [a/x]φ. If we identify proofs of the formula φ with elements of a type that we also write φ, then proofs of ∀x: A. φ must be functions which, given any a ∈ A, produce an element of [a/x]φ. Therefore, the constructive proofs of ∀x: A. φ are precisely the elements of the general product type Πx: A. φ. This may be seen in the proof rules for universal quantification. The natural deduction rules for universal quantifiers, found in [Pra65], for example, are universal generalization and instantiation.

       φ
  ──────────      (x: A not free in any assumption)
   ∀x: A. φ

   ∀x: A. φ      a: A
  ────────────────────
        [a/x]φ

(The condition in parentheses is a side condition for the first rule.) The first rule gives us a typing rule for lambda abstraction. Specifically, if we have a term M giving a proof of φ, we can lambda abstract a free variable x of type A in M to obtain a proof of ∀x: A. φ. In lambda calculus terminology, the side condition is that x may not appear in the type of any free variable of M. This does not arise in simply-typed lambda calculus, since term variables cannot appear in type expressions. However, in polymorphic calculi, this becomes an important restriction on lambda abstraction of type variables. In summary, the logical rule of universal generalization leads to the (Π Intro) rule

  Γ, x: A ⊳ M: σ
  ─────────────────────────      (x not free in Γ)      (Π Intro)
  Γ ⊳ λx: A. M: Πx: A. σ

explained in more detail for A = U1, a "universe of types," in Section 9.2.1. The second proof rule, universal instantiation, corresponds to function application. More specifically, if a proof of ∀x: A. φ is a function of type Πx: A. φ, and a: A, then we may obtain a proof of [a/x]φ by applying the function to argument a. This leads to the (Π Elim) rule


  Γ ⊳ M: Πx: A. σ      Γ ⊳ N: A
  ───────────────────────────────      (Π Elim)
  Γ ⊳ MN: [N/x]σ

explained for A = U1 in Section 9.2.1. The situation is similar for existential quantification and general sums. The logical rules of existential generalization and instantiation are

  [a/x]φ      a: A
  ─────────────────
      ∃x: A. φ

  ∃x: A. φ      [y/x]φ ⊢ ψ
  ─────────────────────────      (y not free in ψ or any other assumption)
            ψ

In the first rule, we prove ∃x: A. φ by showing that φ holds for some specific a: A. In the second rule, the second hypothesis is meant to indicate a proof of ψ from an assumption of a formula of the form [y/x]φ, with some "fresh" variable y used to stand for some arbitrary element satisfying the property φ. In other words, the formula ∃x: A. φ says that "φ holds for some x." We can use this to prove another formula ψ by choosing some arbitrary name y for "some x." This rule discharges the assumption [y/x]φ, which means that although the proof of ψ above the line depends on this hypothesis, the proof of ψ resulting from the use of this rule does not depend on this assumption. (See Section 4.3.4 for a discussion of how natural deduction rules may introduce or discharge assumptions.) The term formation rules associated with general sums and existential quantification are given in Section 9.4. In logic, there is a fundamental distinction between first-order logic and second- and higher-order logics. Since a general product Πx: A. B provides universal quantification over A, we can obtain first-order, second-order or higher-order type systems by specifying the kind of type A that may appear in a product Πx: A. B. The first-order dependent products of the last subsection usually correspond to first-order quantification, since we are quantifying over elements of a type. Polymorphism arises from products of the form Πx: T. B, where T is some collection of types, providing universal quantification over propositions. Similarly, a general sum Σx: A. B provides existential quantification over A, while the general sums Σt: T. B associated with data abstraction provide existential quantification over propositions. With general products and sums of all kinds, we can develop a logic with universal and existential quantification over each type, as well as quantification over collections of types.
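A few small TypeScript terms illustrate the reading of programs as proofs; the names below are our own, and each type is read as the formula it proves:

```typescript
// Formulas-as-types: a well-typed term is a proof of the formula
// read off from its type.

// Proof of ∀t. t → t: the polymorphic identity.
const allImpliesSelf = <T>(x: T): T => x;

// Proof of A → (B → A): the K combinator, i.e. weakening.
const weaken = <A, B>(a: A) => (_b: B): A => a;

// Universal instantiation as function application: from a "proof"
// f of A → B and a witness a: A, obtain a proof of B.
const modusPonens = <A, B>(f: (a: A) => B, a: A): B => f(a);

console.log(allImpliesSelf(7));                    // 7
console.log(weaken<number, string>(1)("ignored")); // 1
console.log(modusPonens((n: number) => n + 1, 4)); // 5
```

Of course, these types only express propositional and quantified tautologies; expressing formulas such as equality requires the additional atomic types discussed next.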
In a predicative system, quantification over propositions is restricted to certain subcollections, while in an impredicative system, we can quantify over the class of all propositions. Another aspect of formulas-as-types is that in most type systems, common atomic formulas such as equality do not appear as basic types. Therefore, in order to develop type systems that allow program specification and verification, we must also introduce types that correspond to atomic formulas. To give some idea of how atomic formulas may be added, we will briefly summarize a simple encoding of equality between terms. Since types represent propositions, we need a type representing the proposition that two terms are equal, together with a way of writing terms of the resulting "equality type." For any terms M, N: A, we could introduce a type of the form Eq_A(M, N) to represent the atomic formula M = N. The terms of type Eq_A(M, N) would correspond to the ways of establishing that M = N. Since equality is reflexive, we might have a term formation rule so that for any M: A, there is a term refl_A(M): Eq_A(M, M). In other words, for every M: A, there is a "proof" refl_A(M) of the formula Eq_A(M, M). To account for symmetry, we would have another term formation rule such that for any M: Eq_A(N, P), representing a proof that N = P, there is also a term sym_A(M): Eq_A(P, N) representing a proof that P = N. In addition, we would expect term forms for transitivity and possibly other properties of equality such as substitutivity of equivalents. In a language like this, we could write a term of type Eq_A(M, N) whenever M and N are provably equal terms of type A. To see how we may regard types as "program specifications" and well-typed terms as "verified programs," we will give a concrete example. Suppose prime(x) is a type corresponding to the assertion that x: nat is prime, and divides(x, y) is a type "saying" that x divides y. These would not necessarily be atomic types of the language, but could be complicated type expressions with free natural number variables x and y. Then we would expect a term of type

  Πx: nat. (x > 1 → Σy: nat. (prime(y) × divides(y, x)))

to define a function which, given any natural number x, takes a proof of x > 1 and returns a pair (N, M) with N: nat and M proving that N is a prime number dividing x. Based on this general idea of types as logical assertions, Automath, Nuprl and the Calculus of Constructions have all been proposed as systems for verifying programs.

9.2 Predicative Polymorphic Calculus

9.2.1 Syntax of Types and Terms

In this section, we define the predicative polymorphic lambda calculus. By itself, the pure calculus is not a very significant extension of simply-typed lambda calculus, since the only operations we may perform on an expression of polymorphic type are type abstraction and application. Since we do not have lambda abstraction over term variables with polymorphic types, we cannot pass polymorphic functions as arguments or model polymorphic function declarations. However, the calculus illustrates some of the essential features common to all languages with type abstraction, and forms the core fragment of several useful languages. For example, by adding a declaration form for polymorphic functions, we obtain a language resembling a core fragment of the programming language ML, as explained in Section 9.2.5. With additional typing rules that eliminate the distinctions imposed by predicativity, we obtain the Girard–Reynolds calculus of impredicative polymorphism, explored in Section 9.3. With one more typing assumption, we obtain type: type. It is straightforward to extend any of these systems with other simple types such as null, unit, + and ×, using the term forms and equational rules in Chapter 4. The types of the calculus fall into two classes, corresponding to the function types and the polymorphic types constructed using Π. Following the terminology that originated with Martin-Löf's type theory, we will call these classes universes. The universe of function types over some collection of given types (such as nat and bool) will be written U1 and the universe of polymorphic types U2. If we were to extend the language with cartesian product types, we would put the cartesian product of two U1 types in U1, and the cartesian product of two U2 types (if desired) in U2. To give the universes simple names, we sometimes call U1 and U2 the "small" and "large" types, respectively. Although we will later simplify the syntax by introducing some meta-linguistic conventions, we will begin by giving explicit syntax rules for both type expressions and terms.
The reasons for giving a detailed treatment of syntax are: (i) this makes membership in each type and universe as clear as possible, (ii) this gives an introduction to the general method used to define the syntax of more general type systems, and (iii) this makes it easier to see precisely how impredicative and type: type polymorphism are derivable from predicative polymorphism by imposing additional relationships between U1 and U2. In a general type system, the syntax of each class of expressions is given by a proof system for assertions that are commonly called "syntactic judgements." Like the type derivation system for simply-typed lambda calculus in Chapter 4, the proof system for syntactic judgements may be understood as an axiomatic presentation of a type checker for the language. In our general presentation, we will use judgements of the form Γ ⊳ A: B, where Γ is a context (like the sort assignments of Chapter 3 and type assignments of Chapter 4) indicating the type or universe of each variable, A is the expression whose type or universe is being asserted, and B is either U1, U2 or a type expression. The judgement Γ ⊳ A: B is read, "in context Γ, A is a well-formed expression defining an element of type or universe B." A useful property of the syntax rules is that if Γ ⊳ A: B is derivable, then all of the free variables of A and B must appear in Γ. A context is an ordered sequence

  Γ = v1: A1, ..., vk: Ak

assigning types or universes to variables v1, ..., vk. In order to make sense, no vi may occur twice in this sequence, and each Ai must be provably well-formed assuming only v1: A1, ..., vi−1: Ai−1. We will put this formally using axioms and inference rules for judgements of the form "Γ context," which say that "Γ is a well-formed context." The order of assertions in a context is meant to reflect the order of declarations in an expression, including declarations of the types of formal parameters. For example, in showing that a term such as

  λt: U1. λx: t → t. λy: t. xy

is well-typed, we would show that the function body, xy, is well-typed in the context t: U1, x: t → t, y: t. In this context, it is important that t: U1 precedes x: t → t and y: t, since we need to declare that t is a type variable before we write the type expressions t → t and t. The context would make sense with x and y declared in either order, but only one order is useful for deriving a type for this term.

The context axiom

  ∅ context      (empty context)

says that the empty sequence is a well-formed context. We can add an assertion x: A to a context if A is either a universe or a well-formed type expression. Although it would be entirely reasonable to have U2 variables in an extension of the system, we will only use U1 variables, for reasons that are outlined below. We therefore have one rule for adding U1 variables and one for adding variables with types to contexts.

  Γ context
  ──────────────────      (t not in Γ)      (U1 context)
  Γ, t: U1 context

  Γ ⊳ σ: U1
  ──────────────────      (U1 type context)
  Γ, x: σ context
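The context-formation rules can be read directly as a checking algorithm. The following TypeScript sketch is our own encoding (the Ty and Entry representations and the function names are assumptions, not notation from the text), restricted to U1 types built from constants, variables and →:

```typescript
// U1 type expressions: type constants, type variables, arrow types.
type Ty =
  | { tag: "const"; name: string }
  | { tag: "var"; name: string }
  | { tag: "arrow"; dom: Ty; cod: Ty };

// A context entry declares a U1 type variable (t: U1) or a term
// variable with a U1 type (x: σ).
type Entry =
  | { name: string; kind: "U1" }
  | { name: string; kind: "type"; ty: Ty };

// Γ ⊳ σ: U1 — σ is well formed if its type variables are declared
// in Γ (the (var), (cst U1) and (→ U1) rules).
function isU1Type(ctx: Entry[], ty: Ty): boolean {
  if (ty.tag === "const") return true;
  if (ty.tag === "var") return ctx.some(e => e.kind === "U1" && e.name === ty.name);
  return isU1Type(ctx, ty.dom) && isU1Type(ctx, ty.cod);
}

// "Γ context" — (empty context), (U1 context), (U1 type context):
// no variable is declared twice, and each declared type is well
// formed in the preceding context.
function isContext(ctx: Entry[]): boolean {
  for (let i = 0; i < ctx.length; i++) {
    const prefix = ctx.slice(0, i);
    if (prefix.some(e => e.name === ctx[i].name)) return false;
    const e = ctx[i];
    if (e.kind === "type" && !isU1Type(prefix, e.ty)) return false;
  }
  return true;
}

// t: U1, x: t → t, y: t — the context from the earlier example.
const t: Ty = { tag: "var", name: "t" };
const good: Entry[] = [
  { name: "t", kind: "U1" },
  { name: "x", kind: "type", ty: { tag: "arrow", dom: t, cod: t } },
  { name: "y", kind: "type", ty: t },
];
console.log(isContext(good));                // true
console.log(isContext([...good].reverse())); // false: t used before declared
```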

The second rule (for adding a variable ranging over a specific type) illustrates a general convention in the formal presentation of type systems: we omit conditions that will be guaranteed by simple properties of the syntax rules. In this case, we do not explicitly assume "Γ context" in the hypothesis of this rule, since it will be a property of the syntax rules that we can only derive Γ ⊳ σ: U1 when Γ is a well-formed context. Along with forming contexts, there are two other general rules that apply to a variety of type systems. The first is the rule for making use of assumptions in contexts.


  Γ, x: A context
  ─────────────────      (var)
  Γ, x: A ⊳ x: A

This is essentially the converse of the rules for adding an assertion x: A to a context. The rule is useful both when A is U1 and when A is a type belonging to U1 or U2. We also have a general rule for adding hypotheses to a syntactic judgement.

  Γ ⊳ A: B      Γ, x: C context
  ───────────────────────────────      (add var)
  Γ, x: C ⊳ A: B

This is needed to derive a judgement such as x: A, y: B ⊳ x: A, where the variable on the right of the ⊳ is not the last variable in the context. We give these rules the same names, (var) and (add var), as the axiom and inference rule in the definition of simply-typed lambda calculus, since these are the natural generalizations. A signature consists of a set of type constants (which we restrict to U1) and a set of term constants. Each term constant must be assigned a closed type expression (without free variables) over the type constants of the signature. We now give syntax rules for U1 and U2 type expressions. The first universe U1 is essentially the predicative T of the last section. The U1 type expressions are given by the variable rule (var), applied to U1 variables, the axiom

  Γ ⊳ b: U1      (cst U1)

for each type constant b of the signature, and the inference rule

  Γ ⊳ σ: U1      Γ ⊳ τ: U1
  ──────────────────────────      (→ U1)
  Γ ⊳ σ → τ: U1

giving us function type expressions. The second universe U2 contains the types of polymorphic functions.

  Γ ⊳ τ: U1
  ───────────      (U1 ⊆ U2)
  Γ ⊳ τ: U2

  Γ, t: U1 ⊳ σ: U2
  ──────────────────────      (Π U2)
  Γ ⊳ (Πt: U1. σ): U2

Since the (U1 ⊆ U2) rule makes every type of the first universe a type of the second, we describe the relationship between universes by writing U1 ⊆ U2.

Example 9.2.1 A simple example of a typing derivation is the proof that Πt: U1. t → t is a well-formed U2 type expression.

  ∅ context                          by axiom (empty context)
  t: U1 context                      by (U1 context)
  t: U1 ⊳ t: U1                      (var)
  t: U1 ⊳ t → t: U1                  (→ U1)
  t: U1 ⊳ t → t: U2                  (U1 ⊆ U2)
  ∅ ⊳ (Πt: U1. t → t): U2            (Π U2)

The reader may wish to stop and work out the similar proof that Πs: U1. Πt: U1. s → t is a well-formed U2 type expression. The only tricky point in this proof is using (add var) to derive s: U1, t: U1 ⊳ s: U1 from s: U1 ⊳ s: U1. Although all these proof rules for syntax may seem like much ado about nothing, it is important to be completely clear about which expressions are well-formed and which are not. In addition, with a little practice, syntax proofs become second nature and require very little thought.

While we have expressions for U2 types, we do not have variables ranging over U2. The reason is that since we do not have binding operators for U2 variables, it seems impossible to use U2 variables for any significant purpose. If we were to add variables and lambda abstraction over U2, this would lead us to types of the form Πt: U2. σ, which would belong to a third universe U3. Continuing in this way, we might also add universes U4, U5, U6 and so on. Several systems of this form may be found in the literature [Mar73, Mar82, Mar84], but we do not investigate these in this book. The terms of the calculus may be summarized by defining the set of "pre-terms," or expressions which look like terms but are not guaranteed to satisfy all the typing constraints. The unchecked pre-terms are given by the grammar

  M ::= x | λx: τ. M | M M | λt: U1. M | M τ

where in each case r must be a Ui type. Every term of must have one of the forms given by this grammar, but not all expressions generated by the grammar are well-typed terms. The precise typing rules for terms include axiom (var) for variables and rule (add var) for adding hypotheses about additional variables to judgements. We also have an axiom (cst) for each term constant of the signature. For lambda abstraction and application, we have reformulated as follows to include type variables rules (—k Intro) and (—± Elim) as in explicit assumptions. and universe

Polymorphism and Modularity

    Γ ▷ τ : U1    Γ, x: τ ▷ M : τ'
    -------------------------------  (→ Intro)
    Γ ▷ (λx: τ. M) : τ → τ'

    Γ ▷ M : τ → τ'    Γ ▷ N : τ
    ----------------------------  (→ Elim)
    Γ ▷ M N : τ'

In rule (→ Elim), it is not necessary to specify that τ: U1, since τ → τ' will only be well formed when both τ and τ' are in U1. Since we have repeated all the typing rules from λ→, it is easy to check that any λ→ term, perhaps containing type variables in place of type constants, is a λ→,Π term. Type abstractions and applications are formed according to the following rules:

    Γ, t: U1 ▷ M : σ
    --------------------------------  (Π Intro)
    Γ ▷ (λt: U1. M) : Πt: U1. σ

    Γ ▷ M : Πt: U1. σ    Γ ▷ τ : U1
    --------------------------------  (Π Elim)
    Γ ▷ M τ : [τ/t]σ

In (Π Elim), the type [τ/t]σ will belong to U1 if σ: U1, and belong only to U2 otherwise. The final rule is a type equality rule

    Γ ▷ M : σ    σ = σ'
    --------------------  (type eq)
    Γ ▷ M : σ'

which allows us to replace the type of a term by an equal type. In pure λ→,Π, the only equations between types will be consequences of α-conversion. However, in extensions of the language, there may be more complicated type equations, and so we give the general rule here. We say M is a λ→,Π term with type σ in context Γ if Γ ▷ M : σ is derivable from the axioms and typing rules.

Example 9.2.2  Let Σ be a signature with type constant nat and term constant 3: nat. We will show that (λt: U1. λx: t. x) nat 3 has type nat. This is a relatively simple term, but the typing derivation involves many of the proof rules. Since the term begins with lambda bindings for t: U1 and x: t, we begin by showing "t: U1, x: t context."

    ∅ context                     by axiom (empty context)
    t: U1 context                 by (U1 context)
    t: U1 ▷ t : U1                (var)
    t: U1, x: t context           (U1 type context)


We next observe that the judgement

    t: U1, x: t ▷ t : U1

follows from the last two lines above by (add var), while

    t: U1, x: t ▷ x : t

follows from t: U1, x: t context by (var). The remaining steps use the term-formation rules. For simplicity, we omit the judgement nat: U1, used below as the second hypothesis for rules (→ Intro) and (Π Elim).

    t: U1 ▷ λx: t. x : t → t                          (→ Intro), from above
    ∅ ▷ (λt: U1. λx: t. x) : Πt: U1. t → t            (Π Intro)
    ∅ ▷ (λt: U1. λx: t. x) nat : nat → nat            (Π Elim)
    ∅ ▷ 3 : nat                                       axiom (cst)
    ∅ ▷ (λt: U1. λx: t. x) nat 3 : nat                (→ Elim)    ∎
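The typing rules used in Example 9.2.2 are algorithmic enough to implement directly. The following Python sketch is our own illustration, not from the text: pre-terms are encoded as nested tuples (an invented encoding), and `typecheck` applies (var), (cst), (→ Intro), (→ Elim), (Π Intro) and (Π Elim) to reproduce the derivation above.

```python
# Illustrative checker for lambda->,Pi typing judgements.
# Term encoding (our own convention, not the book's):
#   ('var', x)            variable            ('cst', c, ty)   constant of type ty
#   ('lam', x, ty, M)     lambda x: ty. M     ('app', M, N)    application
#   ('tlam', t, M)        lambda t: U1. M     ('tapp', M, ty)  type application
# Types: ('const', b), ('tvar', t), ('arrow', t1, t2), ('pi', t, s)

def subst_ty(ty, t, rep):
    """[rep/t]ty: substitute rep for type variable t in ty."""
    tag = ty[0]
    if tag == 'tvar':
        return rep if ty[1] == t else ty
    if tag == 'const':
        return ty
    if tag == 'arrow':
        return ('arrow', subst_ty(ty[1], t, rep), subst_ty(ty[2], t, rep))
    if tag == 'pi':
        # assume bound type variables are distinctly named (no capture handling)
        return ty if ty[1] == t else ('pi', ty[1], subst_ty(ty[2], t, rep))
    raise ValueError(ty)

def typecheck(m, ctx):
    """Return the type of pre-term m in context ctx, or raise TypeError."""
    tag = m[0]
    if tag == 'var':                        # (var)
        return ctx[m[1]]
    if tag == 'cst':                        # (cst): constants carry their type
        return m[2]
    if tag == 'lam':                        # (-> Intro)
        _, x, tau, body = m
        return ('arrow', tau, typecheck(body, {**ctx, x: tau}))
    if tag == 'app':                        # (-> Elim)
        f, a = typecheck(m[1], ctx), typecheck(m[2], ctx)
        if f[0] == 'arrow' and f[1] == a:
            return f[2]
        raise TypeError('bad application')
    if tag == 'tlam':                       # (Pi Intro)
        return ('pi', m[1], typecheck(m[2], ctx))
    if tag == 'tapp':                       # (Pi Elim)
        f = typecheck(m[1], ctx)
        if f[0] == 'pi':
            return subst_ty(f[2], f[1], m[2])
        raise TypeError('bad type application')
    raise ValueError(m)

# Example 9.2.2: (lambda t: U1. lambda x: t. x) nat 3  has type nat
nat = ('const', 'nat')
I = ('tlam', 't', ('lam', 'x', ('tvar', 't'), ('var', 'x')))
print(typecheck(('app', ('tapp', I, nat), ('cst', '3', nat)), {}))
# -> ('const', 'nat')
```

The intermediate step of the derivation is also visible: `typecheck(('tapp', I, nat), {})` yields the type nat → nat, just as (Π Elim) produces it.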

One potentially puzzling design decision is the containment U1 ⊆ U2. The main reason for this rule is that it simplifies both the use of the language and a number of technical details in its presentation. For example, by putting every τ: U1 into U2 as well, we can write a single Π-formation rule, instead of giving two separate cases for types starting in U1 or U2. An important part of this design decision is that U1 ⊆ U2 places no additional semantic constraints on the language. More specifically, if we remove the (U1 ⊆ U2) rule from the language definition, we are left with a system in which every U1 type is represented as a U2 type. This allows us to faithfully translate λ→,Π into the weaker version without U1 ⊆ U2. This is made more precise in Exercises 9.2.5 and 9.2.11, the second of which uses the equational axiom system. Another possible relationship that we do not impose is U1: U2, which would mean that "the universe of small types is itself a large type." Since λ→,Π does not have any significant operations on arbitrary U2 types, there does not seem to be any reason to take U1: U2. However, in predicative extensions of λ→,Π, it would be a reasonable language design to put U1: U2. In an impredicative system with U1 = U2, the further assumption that U1: U2 would lead to type: type, which is often considered undesirable. However, we will see in Section 9.5.4 that with the Σ type-binding operator, used for writing ML "signatures," we will essentially be forced to have U1 as a member of U2.

Exercise 9.2.3  An alternate (var) rule is

    Γ, x: σ, Γ' context
    ---------------------  (var')
    Γ, x: σ, Γ' ▷ x : σ

Show that this is a derivable rule, given (var), (add var) and the other rules listed in this section.

Exercise 9.2.4  Sketch derivations of the following typing judgements.

(a) ∅ ▷ λs: U1. λt: U1. λx: s → t. λy: s. x y : Πs: U1. Πt: U1. (s → t) → s → t
(b) s: U1, x: s, …
(c) s: U1, …

Exercise 9.2.5  This exercise asks you to show that for the purpose of typing terms, we have U1 ⊆ U2 even if we reformulate the system without the (U1 ⊆ U2) rule. If we drop the (U1 ⊆ U2) rule, then we need two (Π U2) rules, one for a U1 type and one for a U2 type:

    Γ, t: U1 ▷ τ : U1
    ------------------------  (Π U2)1
    Γ ▷ Πt: U1. τ : U2

    Γ, t: U1 ▷ σ : U2
    ------------------------  (Π U2)2
    Γ ▷ Πt: U1. σ : U2

(These could be written as a single rule using a meta-variable i representing either 1 or 2.) We should also rewrite the (Π Intro) rule as two rules:

    Γ, t: U1 ▷ M : τ    Γ, t: U1 ▷ τ : U1
    --------------------------------------  (Π Intro)1
    Γ ▷ (λt: U1. M) : Πt: U1. τ

    Γ, t: U1 ▷ M : σ    Γ, t: U1 ▷ σ : U2
    --------------------------------------  (Π Intro)2
    Γ ▷ (λt: U1. M) : Πt: U1. σ

Show that with these changes to (Π U2) and (Π Intro), any typing assertion Γ ▷ M : σ that can be derived using (U1 ⊆ U2) can also be derived without (U1 ⊆ U2).

9.2.2  Comparison with Other Forms of Polymorphism

Both the impredicative typed lambda calculus and the "type: type" calculus may be viewed as special cases of the predicative polymorphic calculus. The first arises by imposing the "universe equation" U1 = U2, and the second by imposing both the equation U1 = U2 and the condition U1: U2. This way of relating the three calculi is particularly useful when it comes to semantics, since it allows us to use the same form of structure as models for all three languages. We conclude this section with a simplified syntax for λ→,Π that eliminates the need for type variables in contexts. This simplified syntax may also be used for the impredicative calculus, but breaks down with either type: type or the general product and sum operations added to λ→,Π in Section 9.5.


Impredicative Calculus
Since we already have U1 ⊆ U2 in λ→,Π, we may obtain U1 = U2 by adding a syntax rule that forces the reverse inclusion, U2 ⊆ U1. The following rule accomplishes this.

    Γ ▷ σ : U2
    ------------  (U2 ⊆ U1)
    Γ ▷ σ : U1

Example 9.2.6  A simple example which illustrates the effect of this rule is the term (I (Πt. t → t)) I, where I ≡ λt: U1. λx: t. x is the polymorphic identity. We will prove the syntactic judgement

    ∅ ▷ (I (Πt. t → t)) I : Πt. t → t.

As illustrated in Example 9.2.2, we can derive the typing judgement ∅ ▷ I : (Πt. t → t) using the typing rules of λ→,Π, with (Πt. t → t): U2. Using the impredicative rule (U2 ⊆ U1), we may now derive (Πt. t → t): U1. This allows us to apply the polymorphic identity to the type Πt. t → t. According to the (Π Elim) typing rule, the type application I (Πt. t → t) has type [(Πt. t → t)/t](t → t), which is (Πt. t → t) → (Πt. t → t). Notice that this would not be a well-formed type expression in predicative λ→,Π, but it is a type expression of both universes (since they are the same) when we have (U2 ⊆ U1). Finally, by rule (→ Elim), the application (I (Πt. t → t)) I has type Πt. t → t. ∎

Type : Type
The second extension of predicative polymorphism is to add both U2 = U1 and U1: U2. We have already seen that the syntax rule (U2 ⊆ U1) is sufficient to achieve U2 = U1. To make U1 an element of U2, we may add the typing axiom

    Γ ▷ U1 : U2    (U1 : U2)

Since there is no reason to distinguish between U1 and U2, we will often simply write U for either universe. There is some lack of precision in doing this, but the alternative involves lots of arbitrary decisions of when to write U1 and when to write U2. There is much more complexity to the syntax of this language than initially meets the eye. In particular, since the distinction between types and terms that have a type is blurred, the equational proof rules may enter into typing derivations in relatively complicated ways. Some hint of this is given in the following example.

Example 9.2.7  An example which illustrates the use of the type: type rule is the term

    (λf: U → (U → U). λs: U. λt: U. λx: (f s t). x) (λs: U. λt: U. s) nat bool 3

where we assume nat, bool: U1 and 3: nat. Intuitively, the first part of this term is an identity function whose type depends on three parameters, the type function f: U → (U → U) and two type arguments s, t: U. This is applied to the type function λs: U. λt: U. s, two type arguments, and a natural number argument. The most interesting part of the type derivation is the way that equational reasoning is used. We begin by typing the left-most function. The type derivation for this subterm begins by showing that U1 → U1 → U1 is a type, introducing a variable f, and adding variables s and t to the context.

    ∅ ▷ U1 : U2                                           by axiom (U1: U2)
    ∅ ▷ U1 : U1                                           (U2 ⊆ U1)
    ∅ ▷ U1 → U1 : U1                                      (→ U1)
    ∅ ▷ U1 → U1 → U1 : U1                                 (→ U1)
    f: U1 → U1 → U1 context                               (U1 type context)
    f: U1 → U1 → U1 ▷ f : U1 → U1 → U1                    (var)
    f: U1 → U1 → U1, s: U context                         (U1 context)
    f: U1 → U1 → U1, s: U ▷ f : U1 → U1 → U1              (add var)
    f: U1 → U1 → U1, s: U, t: U context                   (U1 context)
    f: U1 → U1 → U1, s: U, t: U ▷ f : U1 → U1 → U1        (add var)

We may derive s: U1 and t: U1 in the same context similarly, and use (→ Elim) twice to show

    f: U1 → U1 → U1, s: U, t: U ▷ f s t : U1

It is then straightforward to form the identity function of this type, and lambda abstract the three variables in the context to obtain

    ∅ ▷ λf: U1 → U1 → U1. λs: U. λt: U. λx: f s t. x :
            Πf: U1 → U1 → U1. Πs: U1. Πt: U1. (f s t) → (f s t)

The interesting step is now that after applying to arguments (λs: U1. λt: U1. s): U1 → U1 → U1, nat: U1 and bool: U1, we obtain

    ∅ ▷ (λf: U1 → U1 → U1. λs: U. λt: U. λx: f s t. x)(λs: U1. λt: U1. s) nat bool :
            ((λs: U1. λt: U1. s) nat bool) → ((λs: U1. λt: U1. s) nat bool)

We must use equational reasoning to simplify both occurrences of (λs: U1. λt: U1. s) nat


bool in the type of this application to nat. This is done using β-reduction to obtain

    ∅ ▷ (λs: U1. λt: U1. s) nat bool = nat : U1

followed by the type equality rule (type eq). Once we have the resulting typing assertion,

    ∅ ▷ (λf: U1 → U1 → U1. λs: U. λt: U. λx: f s t. x)(λs: U1. λt: U1. s) nat bool : nat → nat

it is clear that we may apply this function to 3: nat. The important thing to notice is that the type function λs: U1. λt: U1. s could have been replaced by a much more complicated expression, and that type checking requires the evaluation of a type function applied to type arguments. ∎

Example 9.2.7 shows how equational reasoning enters into type: type typing derivations. Although we will not go into the proof, it can be shown by a rather subtle argument [Mar73, MR86] that with type: type, we may write very complex functions of type U1 → U1 → U1, for example. This allows us to construct functions like the one given in Example 9.2.7 where evaluating the actual function is very simple, but deciding whether it is well-typed is very complicated. Using essentially this idea, it may be shown that the set of well-typed terms with type: type is in fact undecidable [Mar73, MR86]. This definitely rules out an efficient compile-time type checking algorithm for any programming language based on the type: type calculus.

Simplified Syntax of λ→,Π
A simplifying syntactic convention is to use two sorts of variables, distinguishing term variables x, y, z, … from type variables r, s, t, … by using different letters of the alphabet. In the simplified presentation of λ→,Π, we assume that all type variables denote U1 types. A second convention is to use τ, τ', τ1, … for U1 types and σ, σ', σ1, … for U2 types. This allows us to give the syntax of U1 and U2 type expressions by the following grammar:

    τ ::= t | b | τ → τ
    σ ::= τ | Πt. σ

When we simplify the presentation of type expressions, we no longer need to include type variables in contexts, and we can drop the ordering on assumptions in contexts. The typing rules for λ→,Π terms may now be written using contexts of the form

    Γ = {x1: σ1, …, xk: σk}

consisting of pairs x: σ that each associate a type with a distinct variable. The rules for λ→,Π

terms remain essentially the same as above, except that we may drop the explicit assumptions involving universes, since these follow from our notational conventions. To be absolutely clear, we give the entire simplified rules below.

    Γ ▷ x : σ        (var)    if x: σ ∈ Γ
    Γ ▷ c : σ        (cst)    if c is a term constant of type σ

    Γ, x: τ ▷ M : τ'
    ----------------------------  (→ Intro)
    Γ ▷ (λx: τ. M) : τ → τ'

    Γ ▷ M : τ → τ'    Γ ▷ N : τ
    ----------------------------  (→ Elim)
    Γ ▷ M N : τ'

    Γ ▷ M : σ
    --------------------  (Π Intro)    (t not free in Γ)
    Γ ▷ λt. M : Πt. σ

    Γ ▷ M : Πt. σ
    --------------------  (Π Elim)
    Γ ▷ M τ : [τ/t]σ

The restriction "t not free in Γ" in (Π Intro) is derived from the ordering on contexts in the detailed presentation of syntax. Specifically, if we use the detailed presentation of syntax, then a term may contain a free type variable t only if the context contains t: U1. Since (Π Intro) for ordered contexts only applies when t: U1 occurs last in the context, we are guaranteed that t does not occur elsewhere in the context. In order for the simplified syntax to be accurate, we therefore need the side condition in (Π Intro) above. This side condition prevents meaningless terms such as {x: t} ▷ λt. x, in which it is not clear whether t is free or bound (see Exercise 9.2.10).

Exercise 9.2.8  Use induction on typing derivations to show that if Γ ▷ M : σ is derivable, then so is [τ/t]Γ ▷ [τ/t]M : [τ/t]σ.

9.2.3  Equational Proof System and Reduction

We will write λ→,Π terms using the simplified syntax presented at the end of the last section. Like other typed equations, λ→,Π equations have the form Γ ▷ M = N : σ, where M and N are terms of type σ in context Γ. The equational inference system for λ→,Π is an extension of the λ→ proof system, with additional axioms for type abstraction and application.

    Γ ▷ λt. M = λs. [s/t]M : Πt. σ        (α)Π    (s not free in M)

    Γ ▷ (λt. M) τ = [τ/t]M : [τ/t]σ       (β)Π

    Γ ▷ λt. M t = M : Πt. σ               (η)Π    (t not free in M)

Reduction rules (β)Π and (η)Π are defined by directing these axioms from left to right. The proof system includes a reflexivity axiom and symmetry and transitivity rules to make provable equality an equivalence relation. Additional inference rules make provable equality a congruence with respect to all of the term-formation rules. For ordinary lambda abstraction and application, the appropriate rules are given in Chapter 4. For type abstraction and application we have the following congruence rules.

    Γ ▷ M = N : σ
    ----------------------------
    Γ ▷ λt. M = λt. N : Πt. σ

    Γ ▷ M = N : Πt. σ
    ----------------------------
    Γ ▷ M τ = N τ : [τ/t]σ

We consider type expressions that differ only in the names of bound variables to be equivalent. As a result, we do not use an explicit rule for renaming bound type variables in types that occur in equations or terms. As for other forms of lambda calculus, we can orient the equational axioms from left to right to obtain a reduction system. In more detail, we say that term M reduces to N, and write M → N, if N may be obtained by applying any of the reduction rules (β), (η), (β)Π, (η)Π to some subterm of M. For example, we have the reduction sequence

    (λt. λx: t. x) τ y  →  (λx: τ. x) y  →  y,

the first step by (β)Π reduction and the second by ordinary (β) reduction. It is generally convenient to refer to both ordinary (β) and (β)Π as β-reduction, and similarly for η-reduction. Most of the basic theorems about reduction in simply-typed lambda calculus generalize to λ→,Π. For example, we may prove confluence and strong normalization for βη-reduction on polymorphic terms. (See Exercise 9.2.12 below for an easy proof of strong normalization; confluence follows from an easy check for local confluence, using Lemma 3.7.32.) An important property of λ→,Π reduction, subject reduction, is that reduction preserves the type of a term.
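The two-step reduction sequence above can be carried out mechanically. The following Python sketch is our own illustration (an invented tuple encoding of terms; it assumes all bound variables are distinctly named, so naive substitution is capture-free), implementing ordinary (β) and (β)Π as one-step rewrites.

```python
# Terms (our own encoding): ('var', x), ('lam', x, ty, M), ('app', M, N),
# ('tlam', t, M), ('tapp', M, ty).  Types: type variables are bare strings,
# compound types are tuples; we assume type-variable names never clash with
# constructor tags like 'arrow'.

def subst(m, x, n):
    """[n/x]m: substitute term n for term variable x in m."""
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'app':
        return ('app', subst(m[1], x, n), subst(m[2], x, n))
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], m[2], subst(m[3], x, n))
    if tag == 'tlam':
        return ('tlam', m[1], subst(m[2], x, n))
    if tag == 'tapp':
        return ('tapp', subst(m[1], x, n), m[2])
    return m

def tsubst_ty(ty, t, rep):
    """[rep/t]ty on type expressions."""
    if ty == t:
        return rep
    if isinstance(ty, tuple):
        return tuple(tsubst_ty(c, t, rep) for c in ty)
    return ty

def tsubst(m, t, rep):
    """[rep/t]m: substitute a type for type variable t in a term."""
    tag = m[0]
    if tag == 'lam':
        return ('lam', m[1], tsubst_ty(m[2], t, rep), tsubst(m[3], t, rep))
    if tag == 'app':
        return ('app', tsubst(m[1], t, rep), tsubst(m[2], t, rep))
    if tag == 'tlam':
        return m if m[1] == t else ('tlam', m[1], tsubst(m[2], t, rep))
    if tag == 'tapp':
        return ('tapp', tsubst(m[1], t, rep), tsubst_ty(m[2], t, rep))
    return m

def step(m):
    """One leftmost reduction step, or None if m is in normal form."""
    if m[0] == 'app' and m[1][0] == 'lam':      # (beta)
        return subst(m[1][3], m[1][1], m[2])
    if m[0] == 'tapp' and m[1][0] == 'tlam':    # (beta)_Pi
        return tsubst(m[1][2], m[1][1], m[2])
    for i in range(1, len(m)):                  # otherwise reduce a subterm
        if isinstance(m[i], tuple):
            s = step(m[i])
            if s is not None:
                return m[:i] + (s,) + m[i + 1:]
    return None

def normalize(m):
    while True:
        n = step(m)
        if n is None:
            return m
        m = n

# (lambda t. lambda x: t. x) r y  ->  (lambda x: r. x) y  ->  y
term = ('app', ('tapp', ('tlam', 't', ('lam', 'x', 't', ('var', 'x'))), 'r'),
        ('var', 'y'))
print(normalize(term))   # -> ('var', 'y')
```

Each call to `step` performs exactly one of the two reductions shown in the text; strong normalization guarantees that the `normalize` loop terminates on well-typed terms.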

Proposition 9.2.9 (Subject Reduction)  Suppose Γ ▷ M : σ is a well-typed term and M →→ N by any number of reduction steps. Then Γ ▷ N : σ, i.e., N is a well-typed term of the same type.

The proof is similar to the proof of Lemma 4.4.16. The main lemmas are preservation of typing under term substitution (as in Lemma 4.3.6) and an analogous property of type substitution given as Exercise 9.2.8.

Exercise 9.2.10  Show that if the side condition "t not free in Γ" is dropped from rule (Π Intro), the Subject Reduction property fails. (Hint: Consider a term of the form (λt. M) τ where t is not free in M.)

Exercise 9.2.11  Exercise 9.2.5 shows that for the purpose of typing terms, the rule (U1 ⊆ U2) is unnecessary, provided we reformulate (Π U2) and (Π Intro) appropriately. This exercise asks you to show that under the same modifications of (Π U2) and (Π Intro), we will essentially have U1 ⊆ U2, i.e., for every U1 type there is an "essentially equivalent" U2 type.

Let τ: U1 be any type from the first universe, and let t be a variable that is not free in τ. Let unit be any U1 type that has a closed term ∗: unit. For any term M, let tM be a type variable that does not occur free in M. The mappings i[·] and j[·] on terms are defined by i[M] ≡ λtM. M and j[M] ≡ M unit.

(a) Show that Γ ▷ i[M] : Πt: U1. τ whenever Γ ▷ M : τ.
(b) Show that Γ ▷ j[M] : τ whenever Γ ▷ M : Πt: U1. τ.
(c) …
(d) Using i[·] and j[·], sketch a translation from λ→,Π with U1 ⊆ U2 into an equivalent expression that is typed without using U1 ⊆ U2. Explain briefly why this translation preserves equality and the structure of terms.

Exercise 9.2.12  Use the strong normalization property of the simply typed lambda calculus to prove that λ→,Π reduction is strongly normalizing. It may be useful to show first that if a term M has a subterm of the form x τ, then x must be a free variable of M.

9.2.4  Models of Predicative Polymorphism

It is basically straightforward to extend the definitions of applicative structure and Henkin model to λ→,Π, and to prove analogous type soundness, equational soundness and equational completeness theorems. In this section, we give a short presentation of λ→,Π applicative structures and Henkin models, leaving it to the reader to prove soundness and completeness theorems if desired. Since impredicative and type: type polymorphism are derivable from predicative polymorphism by imposing additional relationships between universes, models of impredicative and type: type calculi are special cases of the more general predicative structures defined below. A category-theoretic framework for polymorphism may be developed using hyperdoctrines, relatively cartesian closed categories

[Tay87], or locally cartesian closed categories [See84]. However, none of the connections between these categories and polymorphic calculi are as straightforward as the categorical equivalence between λ→ and CCCs described in Chapter 7. An interesting choice in giving semantics to λ→,Π lies in the interpretation of the containment U1 ⊆ U2. While it seems syntactically simpler to view every element of U1 as an element of U2, there may be some semantic advantages to interpreting U1 ⊆ U2 as meaning that U1 may be embedded in U2. With appropriate assumptions about the inclusion mapping from U1 to U2, this seems entirely workable, and leads to a more flexible model definition than literal set-theoretic interpretation of U1 ⊆ U2. For simplicity, however, we will interpret U1 ⊆ U2 semantically as literal set-theoretic containment. This assumption is convenient when we come to impredicative and type: type calculi, since in both cases we will assume U1 = U2 anyway.

Henkin Models for Polymorphic Calculi
Since λ→,Π has two collections of types, U1 and U2, with U1 ⊆ U2, a model A will have two sets U1^A and U2^A with U1^A ⊆ U2^A. For each element a ∈ U2^A, we will also have a set Dom^a of elements of type a. This also gives us a set Dom^a for each a ∈ U1^A. In addition, we need some way to interpret type expressions with → and Π as elements of U1^A or U2^A, and some machinery to interpret function application. The following model definition is in the same spirit as the Henkin models for the impredicative calculus developed in [BMM90]. A λ→,Π applicative structure A is a tuple

    A = ⟨U, dom, {App^{a,b}}, {App^f}, I⟩,

where

•  U = ⟨U1^A, U2^A, →^A, Π^A, I_type⟩ specifies sets U1^A ⊆ U2^A, a set [U1^A → U2^A] of functions from U1^A to U2^A, a binary operation →^A on U1^A, a map Π^A from [U1^A → U2^A] to U2^A, and a map I_type from type constants to U1^A.

•  dom = {Dom^a | a ∈ U2^A} is a collection of sets indexed by types of the structure.

•  {App^{a,b}}, {App^f} is a collection of application maps, with one App^{a,b} for every pair of types a, b ∈ U1^A from the first universe, and one App^f for every function f ∈ [U1^A → U2^A] mapping the first universe to the second. Each App^{a,b} must be a function

        App^{a,b} : Dom^{a →^A b} → (Dom^a → Dom^b)

from Dom^{a →^A b} to functions from Dom^a to Dom^b. Similarly, each App^f must be a function

        App^f : Dom^{Π^A f} → Π_{a ∈ U1^A} Dom^{f(a)}

from Dom^{Π^A f} into the cartesian product Π_{a ∈ U1^A} Dom^{f(a)}.

•  I assigns a value to each term constant symbol, with I(c) ∈ Dom^{[[τ]]} if c is a constant of type τ. Here [[τ]] is the meaning of type expression τ, defined below. Recall that the type of a constant must not contain any free type variables.

This concludes the definition. An applicative structure is extensional if every App^{a,b} and every App^f is one-to-one. As in Chapter 4, we define Henkin models by placing an additional constraint on extensional applicative structures. Specifically, we give inductive definitions of the meanings of type expressions and terms below. If, for every type expression, term and environment, these clauses define a total meaning function, then we say that the structure is a Henkin model.

If A is a λ→,Π frame, then an A-environment is a mapping

    η : Variables → U2^A ∪ ⋃_{a ∈ U2^A} Dom^a

such that for every type variable t, we have η(t) ∈ U1^A. The meaning [[σ]]η of a type expression σ in environment η is defined inductively as follows:

    [[t]]η          = η(t)
    [[b]]η          = I_type(b)
    [[τ → τ']]η     = [[τ]]η →^A [[τ']]η
    [[Πt: U1. σ]]η  = Π^A(λa ∈ U1^A. [[σ]]η[a/t])

11 case, recall that is a map from to For a frame to be a model, of flA must contain, for each type expression a and environment the domain [Uj" ij, the function L[a]lij[a/t] obtained by treating a as a function oft. E If F is a context, then ij satisfies F, written ij F, if for every x: a E E F. The meaning of a term F M: a in environment i= F is defined by induction as follows:

In the

MN: r]lij = where a

M: r' E[r']lij and b

ElF

=

N:


    [[Γ ▷ λx: τ. M : τ → τ']]η = the unique f ∈ Dom^{a →^A b} such that
            App^{a,b} f d = [[Γ, x: τ ▷ M : τ']]η[d/x] for all d ∈ Dom^a,
            where a = [[τ]]η and b = [[τ']]η

    [[Γ ▷ M τ : [τ/t]σ]]η = App^f ([[Γ ▷ M : Πt. σ]]η) ([[τ]]η),
            where f(a) = [[σ]]η[a/t] for all a ∈ U1^A

    [[Γ ▷ λt: U1. M : Πt. σ]]η = the unique g ∈ Dom^{Π^A f} such that
            App^f g a = [[Γ ▷ M : σ]]η[a/t] for all a ∈ U1^A,
            where f(a) = [[σ]]η[a/t] for all a ∈ U1^A

An extensional applicative structure is a Henkin model if [[Γ ▷ M : σ]]η exists, as defined above, for every well-typed term Γ ▷ M : σ and every η ⊨ Γ. An impredicative applicative structure or Henkin model is one with U1^A = U2^A. A type: type applicative structure or Henkin model is one with U1^A = U2^A = Dom^a for some specified a ∈ U1^A.

Soundness and Completeness
It is easy to show that in every model, the meaning of each term has the correct semantic type.

Lemma 9.2.13 (Type Soundness)  Let A be a λ→,Π model, Γ ▷ M : σ a well-typed term, and η ⊨ Γ an environment. Then [[Γ ▷ M : σ]]η ∈ Dom^{[[σ]]η}.

A straightforward induction shows that the equational proof system is sound, and term model constructions may be used to show that the equational proof system is complete (when a (nonempty) rule as in Chapter 4 is added) for models that do not have empty types. Additional equational completeness theorems may be proved for models that may have empty types [MMMS87], and for Kripke-style models.
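To make the model definition concrete, here is a deliberately tiny set-theoretic instance sketched in Python (our own toy construction; the universe, type names, and domains are invented for illustration). It builds Dom^{a→b} as the set of all functions between finite domains, builds the dependent product interpreting Πt. t → t, and checks a finite instance of Lemma 9.2.13: the meaning of the polymorphic identity lies in the interpretation of its type.

```python
from itertools import product

# A toy two-element first universe: U1 type names and their finite domains.
U1 = {'bool': ('ff', 'tt'), 'color': ('red', 'green', 'blue')}

def dom(ty):
    """Dom^a for a U1 type name or an arrow type ('arrow', a, b)."""
    if isinstance(ty, str):
        return list(U1[ty])
    a, b = dom(ty[1]), dom(ty[2])
    # all set-theoretic functions from Dom^a to Dom^b, as dicts
    return [dict(zip(a, image)) for image in product(b, repeat=len(a))]

def dom_pi(f):
    """Dom^{Pi f}: the cartesian product of Dom^{f(a)} over all a in U1."""
    names = list(U1)
    return [dict(zip(names, choice))
            for choice in product(*(dom(f(a)) for a in names))]

# With f(a) = a -> a, Dom^{Pi f} interprets the type  Pi t: U1. t -> t.
id_type = dom_pi(lambda a: ('arrow', a, a))

# Meaning of  lambda t: U1. lambda x: t. x : at each a, the identity on Dom^a.
poly_id = {a: {d: d for d in U1[a]} for a in U1}
print(poly_id in id_type)   # -> True
```

Note that this full set-theoretic dependent product also contains "ad hoc" elements, such as a function that swaps the booleans at type bool but is the identity at type color; a Henkin model need not contain only the meanings of terms.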

Predicative Model Construction
We will see additional examples of models in Section 9.3, which is concerned with impredicative and type: type models. One difference between the predicative, impredicative and type: type languages is that λ→,Π has classical set-theoretic models, while the other languages do not (see Section 9.3.2). In fact, any model of the simply-typed lambda calculus may be extended to a model of λ→,Π by the following simple set-theoretic construction.

If we begin with some model A = ⟨U1, dom, {App^{a,b}}, I⟩ for the U1 types and λ→ terms, we can extend this to a model for all of λ→,Π using standard set-theoretic cartesian products. For any ordinal α, we define the set [U2]^α as follows:

    [U2]^0      = U1
    [U2]^{β+1}  = [U2]^β ∪ { Π_{a ∈ U1} f(a) | f : U1 → [U2]^β }
    [U2]^α      = ⋃_{β<α} [U2]^β        for limit ordinal α

Note that for any α, the set [U2]^{α+1} contains all cartesian products indexed by functions from

    [U1 → U2]^α = ⋃_{β≤α} (U1 → [U2]^β).

The least limit ordinal ω actually gives us a model. The reason for this is that every U2 type expression σ of λ→,Π is of the form σ ≡ Πt1: U1 … Πtk: U1. τ for some τ: U1. It is easy to show that any type with k occurrences of Π has a meaning in [U2]^k. Consequently, every type expression has a meaning in [U2]^ω. This is proved precisely in the lemma below. To shorten the statement of the lemma, we let A^ω be the structure obtained from a λ→ model A by taking

    U2 = ⋃_k [U2]^k    and    [U1 → U2] = ⋃_k (U1 → [U2]^k).

9.5  General Products, Sums and Program Modules

  val x_coord : point -> real
  val y_coord : point -> real
  val move_p  : point * real * real -> point
end;

signature Circle =
  sig
    include Point
    type circle
    val mk_circle : point * real -> circle
    val center    : circle -> point
    val radius    : circle -> real
    val move_c    : circle * real * real -> circle
  end;

signature Rect =
  sig
    include Point
    type rect
    (* make rectangle from lower left, upper right corners *)
    val mk_rect : point * point -> rect
    val lleft   : rect -> point
    val uright  : rect -> point
    val move_r  : rect * real * real -> rect
  end;

signature Geom =
  sig
    include Circle
    (* include Rect, except Point, by hand *)
    type rect
    val mk_rect : point * point -> rect
    val lleft   : rect -> point
    val uright  : rect -> point
    val move_r  : rect * real * real -> rect
    (* Bounding box of circle *)
    val bbox    : circle -> rect
  end;

For Point, Circle and Rect, we give structures explicitly.

structure pt : Point =
  struct
    type point = real*real
    fun mk_point(x,y) = (x,y)
    fun x_coord(x,y) = x
    fun y_coord(x,y) = y
    fun move_p((x,y):point,dx,dy) = (x+dx, y+dy)
  end;

structure cr : Circle =
  struct
    open pt
    type circle = point*real
    fun mk_circle(x,y) = (x,y)
    fun center(x,y) = x
    fun radius(x,y) = y
    fun move_c(((x,y),r):circle,dx,dy) = ((x+dx, y+dy),r)
  end;

structure rc : Rect =
  struct
    open pt
    type rect = point * point
    fun mk_rect(x,y) = (x,y)
    fun lleft(x,y) = x
    fun uright(x,y) = y
    fun move_r(((x1,y1),(x2,y2)):rect,dx,dy) = ((x1+dx,y1+dy),(x2+dx,y2+dy))
  end;

The geometry structure is constructed using a functor that takes Circle and Rect structures as arguments. The signatures Circle and Rect both name a type called point. These two must be implemented in the same way, which is stated explicitly in the "sharing" constraint on the formal parameters. Although we will not include sharing constraints in our type-theoretic account of the SML module system, it is generally possible to eliminate uses of sharing (such as the one here) by adding additional parameters to structures, in this case turning circle and rectangle structures into functors that both require a point structure. However, as pointed out in [Mac86], this may be cumbersome for large programs.

functor geom(
    structure c : Circle
    structure r : Rect
      sharing type r.point = c.point) : Geom =
  struct
    type point = c.point
    val mk_point = c.mk_point
    val x_coord = c.x_coord
    val y_coord = c.y_coord
    val move_p = c.move_p

    type circle = c.circle
    val mk_circle = c.mk_circle
    val center = c.center
    val radius = c.radius
    val move_c = c.move_c

    type rect = r.rect
    val mk_rect = r.mk_rect
    val lleft = r.lleft
    val uright = r.uright
    val move_r = r.move_r

    (* Bounding box of circle *)
    fun bbox(c) =
      let val x = x_coord(center(c))
          and y = y_coord(center(c))
          and r = radius(c)
      in
        mk_rect(mk_point(x-r,y-r), mk_point(x+r,y+r))
      end
  end;
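The type-theoretic reading developed in this chapter treats structures as elements of (dependent) record types and functors as functions between them. That reading can be simulated in miniature in Python (our own encoding, not part of the SML example: dicts play the role of structures and functions on dicts the role of functors). It also illustrates how the sharing constraint can be eliminated by an extra parameter, as suggested above: circle and rectangle structures become functors applied to a single point structure.

```python
def point_struct():
    """A 'structure' matching signature Point, as a record of operations."""
    return {
        'mk_point': lambda x, y: (x, y),
        'x_coord':  lambda p: p[0],
        'y_coord':  lambda p: p[1],
        'move_p':   lambda p, dx, dy: (p[0] + dx, p[1] + dy),
    }

def circle_functor(pt):                      # 'functor': Point -> Circle
    return dict(pt,
        mk_circle=lambda c, r: (c, r),
        center=lambda cr: cr[0],
        radius=lambda cr: cr[1],
        move_c=lambda cr, dx, dy: (pt['move_p'](cr[0], dx, dy), cr[1]))

def rect_functor(pt):                        # 'functor': Point -> Rect
    return dict(pt,
        mk_rect=lambda ll, ur: (ll, ur),
        lleft=lambda rc: rc[0],
        uright=lambda rc: rc[1],
        move_r=lambda rc, dx, dy: (pt['move_p'](rc[0], dx, dy),
                                   pt['move_p'](rc[1], dx, dy)))

def geom(c, r):                              # 'functor': Circle * Rect -> Geom
    def bbox(circ):
        x = c['x_coord'](c['center'](circ))
        y = c['y_coord'](c['center'](circ))
        rad = c['radius'](circ)
        return r['mk_rect'](c['mk_point'](x - rad, y - rad),
                            c['mk_point'](x + rad, y + rad))
    return dict(c,
                **{k: r[k] for k in ('mk_rect', 'lleft', 'uright', 'move_r')},
                bbox=bbox)

# Sharing is guaranteed by construction: both functors receive the same pt.
pt = point_struct()
g = geom(circle_functor(pt), rect_functor(pt))
print(g['bbox'](g['mk_circle'](g['mk_point'](1.0, 2.0), 1.0)))
# -> ((0.0, 1.0), (2.0, 3.0))
```

Because both `circle_functor` and `rect_functor` are applied to the same point structure, the two `point` representations agree automatically, which is precisely the effect the SML `sharing` constraint enforces by declaration.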

9.5.2  Predicative Calculus with Products and Sums

Syntax
In this section we extend λ→,Π to a function calculus λ→,Π,Σ by adding general products and general sums (from Section 9.1.3). While general sums are closely related to ML structures, and general cartesian products seem necessary to capture dependently-typed functors, the language λ→,Π,Σ is somewhat more general than Standard ML. For example, while an ML structure may contain polymorphic functions, there is no direct way to define a polymorphic structure (i.e., a structure that is parametric in a type) in the implicitly-typed programming language. This is simply because there is no provision for explicit binding of type variables. However, polymorphic structures can be "simulated" in ML by using a functor whose parameter is a structure containing only a type binding. By virtue of the uniformity of the λ→,Π,Σ language definition, there will be no restriction on the types of things that can be made polymorphic. For similar reasons, λ→,Π,Σ will have expressions corresponding to higher-order functors and functor signatures. Neither of these were included in the original design of the ML module system [Mac85], but both were added after the type-theoretic analysis given here was developed [Mac86, HM93].

Unfortunately, general products and sums complicate the formalization of λ→,Π,Σ considerably. Since a structure may appear in a type expression, for example, it is no longer possible to describe the well-formed type expressions in isolation from the elements of those types. This also makes the well-formed contexts difficult to define. Therefore, we cannot use the simplified presentation of λ→,Π given in Section 9.2.2; we must revert to the general presentation of Section 9.2.1 that uses inference rules for determining the well-formed contexts, types and terms. The un-checked pre-terms of λ→,Π,Σ are given by the grammar

    M ::= U1 | U2 | x | b | M → M | Πx: M. M | Σx: M. M
        | c | λx: M. M | M M
        | ⟨x: M = M, M: M⟩ | first(M) | second(M)

Intuitively, the first row of this grammar gives the form of type expressions, the second row expressions from λ→,Π, and the third row the expression forms associated with general sums. However, these three classes of expressions are interdependent; the precise definition of each class is given by the axioms and inference rules below. We will use M, N, P and σ and τ for pre-terms, using σ and τ when the term is intended to be a type. As in our presentation of existential types, we use a syntactic form ⟨x: σ = M, N: σ'⟩, with the variable x bound in σ'. In addition, x is bound in the second component N, so that the "dependent pair" ⟨x: σ=M, N: σ'⟩ is equal to ⟨x: σ=M, [M/x]N: σ'⟩, assuming x is not


free in N. Unlike existential types, general sums have unrestricted first and second projection functions that allow us to access either component of a pair directly. In this respect, pairs of general sum type are closer to the ordinary pairs associated with cartesian product types. Although it is meaningful to extend λ→,Π,Σ with a polymorphic let declaration, let is definable in λ→,Π,Σ using abstraction over polymorphic types in U2. (See Exercise 9.5.6.) The typing rules for λ→,Π,Σ are an extension of the rules for λ→,Π given in Section 9.2.1. In addition to general sums, λ→,Π,Σ allows lambda abstraction over U2 types, resulting in expressions with Π-types. The associated syntax is characterized by the following three rules, the first giving the form of Π-types in λ→,Π,Σ and the second and third the associated term forms.

    Γ, x: σ ▷ σ' : U2
    ------------------------  (Π U2)
    Γ ▷ (Πx: σ. σ') : U2

    Γ, x: σ ▷ M : σ'
    ----------------------------  (Π Intro)
    Γ ▷ (λx: σ. M) : Πx: σ. σ'

    Γ ▷ M : Πx: σ. σ'    Γ ▷ N : σ
    -------------------------------  (Π Elim)
    Γ ▷ M N : [N/x]σ'

It is an easy exercise (Exercise 9.5.3) to show that the weaker (Π U2) rule of Section 9.2.1 is subsumed by the one given here. The four rules below give the conditions for forming Σ-types and the associated term forms.

    Γ, x: σ ▷ σ' : U2
    ------------------------  (Σ U2)
    Γ ▷ (Σx: σ. σ') : U2

    Γ ▷ M : σ    Γ ▷ [M/x]N : [M/x]σ'
    -----------------------------------  (Σ Intro)
    Γ ▷ ⟨x: σ=M, N: σ'⟩ : Σx: σ. σ'

    Γ ▷ M : Σx: σ. σ'
    ---------------------  (Σ Elim 1)
    Γ ▷ first(M) : σ

    Γ ▷ M : Σx: σ. σ'
    ------------------------------------  (Σ Elim 2)
    Γ ▷ second(M) : [first(M)/x]σ'

These rules require the equational theory below. A simple typing example that does not require equational reasoning is the derivation of

    x: (Σt: U1. t), y: first x → first x, z: first x ▷ y z : first x

With a "structure" variable x: Σt: U1. t in the context, first x is a U1 type expression, and so we may form the U1 type first x → first x. The rest of the typing derivation is routine. The following example shows how equational reasoning may be needed in typing derivations. The general form of type equality rule is the same as in Section 9.2.1. However, the rules for deriving equations between types give us a much richer form of equality than the simple syntactic equality (with renaming of bound variables) used in λ→ and λ→,Π:

Example 9.5.2 We can see how equational reasoning is used in type derivations by introducing a "type-transparent" form of let declaration,

letΣ x: σ=M in N   ≡   second⟨x: σ=M, N⟩

The main idea is that in a "dependent pair" ⟨x: σ=M, N⟩, occurrences of x in N are bound to M. Moreover, this is incorporated into the (Σ Intro) rule so that N is typed with each occurrence of x replaced by M. The typing properties of this declaration form are illustrated by the expression

letΣ x: (Σt: U1. t) = ⟨t: U1=int, 3: t⟩ in (λz: int. z)(second x)

which expands to

second⟨x: (Σt: U1. t)=⟨t: U1=int, 3: t⟩, (λz: int. z)(second x)⟩

It is relatively easy to show that the expression ⟨t: U1=int, 3: t⟩ bound to x has type Σt: U1. t, since we assume int: U1 and 3: int are given. The next step is to show that the larger pair

⟨x: (Σt: U1. t)=⟨t: U1=int, 3: t⟩, (λz: int. z)(second x)⟩

has type Σx: (Σt: U1. t). int. Pattern matching against the (Σ Intro) rule, we can see that it suffices to show

[M/x](λz: int. z)(second x) : [M/x]int,   where M ≡ ⟨t: U1=int, 3: t⟩.

By (Σ Elim 2), the type of second M is

second⟨t: U1=int, 3: t⟩ : first⟨t: U1=int, 3: t⟩

Using the equational axiom (Σ first) below, we may simplify first⟨t: U1=int, 3: t⟩ to int. This equational step is critical since there is no other way to conclude that the type of second M is int.

Equations and Reduction

The equational proof system for λΠ,Σ contains the axioms and rules described in Section 9.2.3 (with each axiom or rule scheme interpreted as applying to all terms), plus the rules listed below.


The equational axioms for pairing and projection are similar to the ones for cartesian products in earlier chapters, except for the way the first component is substituted into the second in the (Σ second) rule.

Γ ⊢ first⟨x: σ=M, N: σ'⟩ = M : σ                        (Σ first)

Γ ⊢ second⟨x: σ=M, N: σ'⟩ = [M/x]N : [M/x]σ'            (Σ second)

Γ ⊢ ⟨x: σ=first(M), second(M): σ'⟩ = M : Σx: σ. σ'      (Σ sp)

Note that since σ or σ' may be U1, axioms (Σ first) and (Σ second) may yield equations between types. Since type equality is no longer simply syntactic equality, we need the congruence rules for type equality below. Since all types are elements of U2, we may use the general reflexivity axiom and symmetry and transitivity rules mentioned in Section 9.2.3 for equations between types (see Exercise 9.5.4).

Γ ⊢ σ1 = σ1' : U2    Γ, x: σ1 ⊢ σ2 = σ2' : U2
---------------------------------------------- (Π Cong)
Γ ⊢ Πx: σ1. σ2 = Πx: σ1'. σ2' : U2

Γ ⊢ σ1 = σ1' : U2    Γ, x: σ1 ⊢ σ2 = σ2' : U2
---------------------------------------------- (Σ Cong)
Γ ⊢ Σx: σ1. σ2 = Σx: σ1'. σ2' : U2

The congruence rules for terms are essentially routine, except that when a term form includes types, we must account for type equations explicitly. To avoid confusion, we list the rules for terms of Π and Σ type here.

Γ ⊢ σ = σ'' : U2    Γ, x: σ ⊢ M = M' : σ'
------------------------------------------
Γ ⊢ λx: σ. M = λx: σ''. M' : Πx: σ. σ'

Γ ⊢ σ = σ'' : U2    Γ ⊢ M = M' : σ    Γ ⊢ N = N' : [M/x]σ'
------------------------------------------------------------
Γ ⊢ ⟨x: σ=M, N: σ'⟩ = ⟨x: σ''=M', N': σ'⟩ : Σx: σ. σ'

Γ ⊢ M = M' : Σx: σ. σ'
------------------------------
Γ ⊢ first(M) = first(M') : σ


Γ ⊢ M = M' : Σx: σ. σ'
---------------------------------------------
Γ ⊢ second(M) = second(M') : [first(M)/x]σ'

Since contexts are more complicated than in the simplified presentation of earlier systems, the (add var) rule should be written

Γ ⊢ M = N : σ    Γ, x: A context
--------------------------------- (add var)
Γ, x: A ⊢ M = N : σ

so that only well-formed contexts are used.

If we direct the equational axioms from left to right, we obtain a reduction system of the form familiar from other systems of lambda calculus. Strong normalization for λΠ,Σ may be proved using a translation into Martin-Löf's 1973 system [Mar73]. It follows that equality for λΠ,Σ is decidable. It is an interesting open problem to develop a theory of logical relations for full λΠ,Σ, a task that is complicated by the presence of general Σ and Π types.

Exercise 9.5.3 Show that the (Π U2) rule in Section 9.2.1 is subsumed by the (Π U2) rule given in this section, in combination with the other rules of λΠ,Σ.

Exercise 9.5.4 A general reflexivity axiom that applies to types as well as terms that have a U1 or U2 type is

Γ ⊢ σ = σ : τ    (refl)

assuming that σ may be any well-formed expression of λΠ,Σ. Write general symmetry and transitivity rules in the same style and write out a proof of the equation

s: U1, t: U1 ⊢ second⟨r: U1=s, t: U1⟩ = t : U1

between U1 type expressions.

Exercise 9.5.5 Show that if Γ ⊢ M : Σx: σ. σ' is derivable, then the expression ⟨x: σ=first(M), second(M): σ'⟩ used in the (Σ sp) axiom has type Σx: σ. σ'.

Exercise 9.5.6 Show that if we consider let x: σ = M in N as syntactic sugar for (λx: σ. N)M, then the typing rule for let in Section 9.2.5 is derivable in λΠ,Σ.

Exercise 9.5.7 In some of the congruence rules, it may not be evident that the terms proved equal have the same syntactic type. A representative case is the type congruence rule for lambda abstraction, which assumes that if Γ, x: σ ⊢ M: σ' and Γ ⊢ σ = σ'': U2 are derivable, then so is Γ ⊢ λx: σ''. M: Πx: σ. σ'. Prove this by showing that for any M, if Γ ⊢ σ = σ'': U2 and Γ, x: σ ⊢ M: σ' are derivable, then so is Γ, x: σ'' ⊢ M: σ'.


9.5.3

Representing Modules with Products and Sums

General sums allow us to write expressions for structures and signatures, provided we regard environments as tuples whose components are accessed by projection functions. For example, the structure

struct type t = int  val x: t = 3  end

may be viewed as the pair ⟨t: U1=int, 3: t⟩. In λΠ,Σ, the components t and x are retrieved by projection functions, so that S.x is regarded as an abbreviation for second(S). With general sums we can represent the signature

sig type t val x: t end

as the type Σt: U1. t, which has the pair ⟨t: U1=int, 3: int⟩ as a member. The representation of structures by unlabeled tuples is adequate in the sense that it is a simple syntactic translation to replace qualified names by expressions involving projection functions. A limitation that may interest ML experts is that this way of viewing structures does not provide any natural interpretation of ML's open, which imports declarations from the given structure into the current scope. Since general products allow us to type functions from any collection to any other collection, we can write functors as elements of product types. For example, the functor F from Section 9.5.1 is defined by the expression

λS: (Σt: U1. t). ⟨s: U1=first(S) × first(S), ⟨second(S), second(S)⟩: s⟩,

which has type

ΠS: (Σt: U1. t). (Σs: U1. s).

An important aspect of Standard ML is that signature and functor declarations may only occur at "top level," which means they cannot be embedded in other constructs, and structures may only be declared at top level or inside other structures. Furthermore, recursive declarations are not allowed. Consequently, it is possible to treat signature, structure, and functor declarations by simple macro expansion. An alternative, illustrated in Example 9.5.8 below, is to treat declarations using the type-transparent letΣ bindings of Example 9.5.2, which are based on general sums. The first step in translating an ML program into λΠ,Σ is to replace all occurrences of signature, structure and functor identifiers with the corresponding λΠ,Σ expressions. Then, type expressions may be simplified using the type equality rule of Section 9.2.1, as required. With the exception of "generativity," which we do not consider, this process models elaboration during the ML type checking phase fairly accurately.
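The view of a structure as a pair accessed by projections can be sketched concretely. The following Python fragment is an illustration only; the names first, second and the tuple encoding are our own, standing in for the λΠ,Σ projections. It models the structure struct type t=int val x:t=3 end as the pair ⟨t: U1=int, 3: t⟩:

```python
def first(pair):
    """Project the type component of a dependent pair."""
    return pair[0]

def second(pair):
    """Project the value component of a dependent pair."""
    return pair[1]

# struct type t = int  val x: t = 3  end, viewed as <t: U1 = int, 3: t>
S = (int, 3)

assert first(S) is int                    # the type component t
assert second(S) == 3                     # the value component, i.e. S.x
assert isinstance(second(S), first(S))    # 3 : first(S), which simplifies to int
```

The qualified name S.x of the ML program becomes the projection second(S), so translating away structures is a purely syntactic matter in this encoding.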


Example 9.5.8 We illustrate the general connection between Standard ML and λΠ,Σ using the example program

structure S =
struct
  type t = int
  val x : t = 7
end;

S.x + 3

The λΠ,Σ expression for the structure bound to S is

⟨t: U1=int, 7: t⟩ : Σt: U1. t

If we treat SML structure declarations by macro expansion, then this program is equivalent to

second S + 3   ≡   second⟨t: U1=int, 7: t⟩ + 3

We may type this program by observing that the type of second S is first⟨t: U1=int, 7: t⟩, which may be simplified to int. Since structures in Standard ML are transparent, in the sense that the components of a structure are fully visible, structure declarations may be represented in λΠ,Σ using the transparent letΣ bindings of Example 9.5.2. In this view, the program above is treated as the term

letΣ S: Σt: U1. t = ⟨t: U1=int, 7: t⟩ in second S + 3

which has a syntactic structure closer to the original program. As explained in Example 9.5.2, the typing rules for general sums allow us to use the fact that first(S) is equivalent to int. As mentioned earlier in passing, the calculus λΠ,Σ is more general than Standard ML in two apparent respects. The first is that since there is no need to require that subexpressions of types be closed, we are able to write explicitly-typed functors with nontrivial dependent types. The second is that due to the uniformity of the language, we have a form of higher-order functors.

9.5.4

Predicativity and the Relationship between Universes

Although each of the constructs of λΠ,Σ corresponds to a particular part of Standard ML, the lambda calculus allows arbitrary combinations of these constructs, and straightforward extensions like higher-order functor expressions. While generalizing in certain


ways that seem syntactically and semantically natural, λΠ,Σ maintains the ML distinction between monomorphic and polymorphic types by keeping U1 and U2 distinct. The restrictions imposed by universes are essential to the completeness of type inference (discussed in Chapter 11) and have the technical advantage of leading to far simpler semantic model constructions, for example. However, it may seem reasonable to generalize ML polymorphism by lifting the universe restrictions (to obtain an impredicative calculus), or to alter the design decisions U1 ⊆ U2 and U1: U2. Exercises 9.2.5 and 9.2.11 show that the decision to take U1 ⊆ U2 is only a syntactic convenience, with no semantic consequence. In this section we show that the decision to take U1: U2 is essentially forced by the other constructs of λΠ,Σ, and that in the presence of structures and functors, the universe restrictions are essential if we wish to avoid a type of all types.

Strong Sums and U1: U2

In the predicative calculus of Section 9.2 we have U1 ⊆ U2, but not U1: U2. However, when we added general product and sum types, we also made the assumption that U1: U2. The reasons for this are similar to the reasons for taking U1 ⊆ U2: it makes the syntax more flexible, simplifies the technical presentation, and does not involve any unnecessary semantic assumptions. A precise statement is spelled out in the following proposition.

Proposition 9.5.9 In any fragment of λΠ,Σ which is closed under the term formation rules for types of the form Σt: U1. τ, there are contexts i[ ] and j[ ], where unit may be any U1 type with closed term *: unit, satisfying the following conditions:

1. If Γ ⊢ τ : U1, then Γ ⊢ i[τ] : (Σt: U1. unit).
2. If Γ ⊢ M : (Σt: U1. unit), then Γ ⊢ j[M] : U1.
3. If Γ ⊢ τ : U1, then Γ ⊢ j[i[τ]] = τ : U1.

In other words, given the hypotheses above, since (Σt: U1. unit): U2, we may assume U1: U2 without loss of generality.

In words, the proposition says that in any fragment of λΠ,Σ with sums over U1 (and some U1 type unit containing a closed term *), we can represent U1 by the type Σt: U1. unit. Therefore, even if we drop U1: U2 from the language definition, we are left with a representation of U1 inside U2. For this reason, we might as well simplify matters and take U1: U2.
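A toy model of the coding in Proposition 9.5.9 may make the representation concrete. In this Python sketch, ordinary values stand in for types and terms, and the names i, j and STAR are our own:

```python
STAR = ()   # stands in for the closed term *: unit

def i(tau):
    """i[tau] : Sigma t: U1. unit -- pair a U1 type with the unit element."""
    return (tau, STAR)

def j(M):
    """j[M] : U1 -- recover the type component by first projection."""
    return M[0]

# Decoding after coding recovers the original type, so every U1 type is
# represented inside the type of pairs Sigma t: U1. unit.
assert j(i(int)) is int
assert j(i(str)) is str
```

The point of the sketch is only the round trip: the pair type plays the role of a "type of all U1 types," which is why strong sums over U1 already force something like U1: U2.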


Impredicativity and "type : type"

In the calculus λΠ,Σ, as in the programming language SML, polymorphic functions are not actually applicable to arguments of all types. For example, the identity function, written id(x) = x in ML and λt: U1. λx: t. x in λΠ,Σ, has polymorphic type, but it can only be applied to elements of types from the first universe. One way to eliminate this restriction is to eliminate the distinction between U1 and U2. If we replace U1 and U2 by a single universe in the definition of the predicative polymorphic calculus, then we obtain the impredicative polymorphic calculus. However, if we make the full λΠ,Σ calculus impredicative by eliminating the distinction between U1 and U2, we obtain a language with a type of all types. Specifically, since we have general products and U1: U2, it is quite easy to see that if we let U1 = U2, then the type: type calculus of Section 9.2.2 becomes a sublanguage of λΠ,Σ.

Lemma 9.5.10 Any fragment of λΠ,Σ with U1: U2 and U1 = U2, closed under the type and term formation rules associated with general products, is capable of expressing all terms of the type: type calculus of Section 9.2.2.

The proof is a straightforward examination of the typing rules of the type: type calculus of Section 9.2.2. By Proposition 9.5.9, we know that sums over U1 give us U1: U2. This proves the following theorem.

Theorem 9.5.11 The type: type calculus of Section 9.2.2 may be interpreted in any fragment of λΠ,Σ without universe distinctions which is closed under general products, and sums over U1 of the form Σt: U1. τ.

Intuitively, this says that any language without universe distinctions that has general products (ML functors) and general sums restricted to U1 (ML structures with type and value components, but not necessarily structure components) also contains the language with a type of all types. Since there are a number of questionable properties of a type of all types, such as nontermination without explicit recursion and undecidable type checking, relaxing the universe restrictions of λΠ,Σ would alter the language dramatically.

Trade-off between Weak and Strong Sums

Theorem 9.5.11 may be regarded as a "trade-off theorem" in programming language design, when combined with known properties of impredicative polymorphism. The trade-off implied by Theorem 9.5.11 is between impredicative polymorphism and the kind of Σ types used to represent ML structures in λΠ,Σ. Generally speaking, impredicative polymorphism is more flexible than predicative polymorphism, and Σ types allow us to type more terms than the existential types associated with data abstraction.


Either impredicative polymorphism with the "weaker" existential types, or predicative polymorphism with the "stronger" general sum types of λΠ,Σ, seems reasonable. By the normalization theorem for the impredicative calculus, we know that impredicative polymorphism with existential types is strongly normalizing (normalization with existential types can be derived by encoding ∃t.σ as ∀r[∀t(σ → r) → r]). As noted in Section 9.5.2, a translation into Martin-Löf's 1973 system [Mar73] shows that λΠ,Σ with predicative polymorphism and "strong" sums is also strongly normalizing. However, by Theorem 9.5.11, we know that if we combine strong sums with impredicative polymorphism by taking U1 = U2, the most natural way of achieving this end, then we must admit a type of all types, causing strong normalization to fail. In short, assuming we wish to avoid type: type and non-normalizing recursion-free terms, we have a trade-off between impredicative polymorphism and strong sums. In formulating λΠ,Σ, it is apparent that there are actually several ways to combine impredicative polymorphism with strong sums. An alternative is that instead of adding impredicative polymorphism by equating the two universes, we may add a form of impredicative polymorphism by adding a new type binding operator with the formation rule

Γ, t: U1 ⊢ τ : U1
-------------------------
Γ ⊢ (∀t: U1. τ) : U1

Intuitively, this rule says that if τ is a U1 type, then we will also have the polymorphic type ∀t: U1. τ in U1. The term formation rules for this sort of polymorphic type would allow us to apply any polymorphic function of type ∀t: U1. τ to any type in U1, including a polymorphic type of the form ∀s: U1. σ. However, we would still have strong sums like Σt: U1. τ in U2 instead of U1. The normalization theorem for this calculus follows from that of the theory of constructions with strong sums at the level of types [Coq86] by considering U1 to be prop, and U2 to be type0.

Subtyping and Related Concepts

10.1

Introduction

This chapter is concerned with subtyping and language concepts that make subtyping useful in programming. In brief, subtyping is a relation on types which implies that values of one type are substitutable for values of another. Some language constructs associated with subtyping are records, objects, and various forms of polymorphism that depend on the subtype relation. The main topics of the chapter are:

• Simply-typed lambda calculus with records and subtyping.
• Equational theories and semantic models.
• Subtyping for recursive types and the recursive-record model of objects.
• Forms of polymorphism appropriate for languages with subtyping.

While the main interest in subtyping stems from the popularity and usefulness of object-oriented programming, many of the basic ideas are illustrated more easily using the simpler example of record structures. Many of the issues discussed in this chapter are topics of current research; some material may be subsumed in coming years. A general reference with research papers representing the current state of the art is [GM94]; other collections representing more pragmatic issues are [SW87, YT87]. An early research/survey paper that proposed a number of influential typing ideas is [CW85]. Subtyping appears in a variety of programming languages. An early form of subtyping appears in the Fortran treatment of "mixed mode" arithmetic: arithmetic expressions may be written using combinations of integer and real (floating point) expressions, with integers converted to real numbers as needed. The conversion of integers to reals has some of the properties that are typical of subtyping. In particular, we generally think of the mathematical integers as a subset of the real numbers. However, conversion in programs involves changing the representation of a number, which is not typical of subtyping with records or objects. Fortran mixed mode arithmetic also goes beyond basic subtyping in two ways. The first is that Fortran provides implicit conversion from reals to integers by truncation, which is different since this operation changes the value of the number that is represented. Fortran also provides overloaded operations, such as +: int × int → int and +: real × real → real. However, if we overlook these two extensions of subtyping, we have a simple example of integers as a subtype of reals that provides some intuition for the general properties of subtyping. Although the evaluation of expressions that involve overloading and truncation is sometimes subtle, the Fortran treatment of arithmetic expressions was generally successful and has been adopted in many later languages.
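The promotion half of mixed mode arithmetic can be sketched as follows. This is a hypothetical illustration of the int-to-real conversion only; Fortran's overloading and truncating assignment are omitted, as in the discussion above:

```python
def mixed_add(x, y):
    """Add two numbers, promoting int to float when the modes differ,
    in the style of Fortran mixed mode arithmetic."""
    if isinstance(x, int) and isinstance(y, float):
        x = float(x)          # implicit conversion: int viewed as a subtype of real
    elif isinstance(x, float) and isinstance(y, int):
        y = float(y)
    return x + y

assert mixed_add(2, 3) == 5                  # int + int stays integral
assert mixed_add(2, 0.5) == 2.5              # the int operand is promoted to real
assert isinstance(mixed_add(2, 0.5), float)  # the mixed result has the real type
```

Note that the conversion changes the representation of the number but not its mathematical value, which is exactly the property that makes it resemble subtyping.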


Another example of subtyping appears in Pascal subranges or the closely related range constraints of Ada (see [Hor84], for example). As discussed again below, the Pascal subrange [1..10] containing the integers between 1 and 10 is a subtype of the integers. If x is a variable of type [1..10], and y of type integer, then we can assign y the value of x since every integer between 1 and 10 is an integer. More powerful examples of subtyping appear in typed object-oriented languages such as Eiffel [Mey92b] and C++ [ES90]. In these languages, a class of objects, which may be regarded for the moment as a form of type, is placed in a subtype hierarchy. An object of a subtype may be used in place of one of any supertype, since the subtype relation guarantees that all required operations are implemented. Moreover, although the representation of objects of one type may differ from the representation of objects of a subtype, the representations are generally compatible in a way that eliminates the need for conversion from one to another. We may gain some perspective on subtyping by comparing subtyping with type equality in programming languages. In most typed programming languages, some form of type equality is used in type checking. A basic principle of type equality is that when two types are equal, every expression with one type also has the other. With subtyping, we replace this principle with the so-called "subsumption" property of subtyping:

If A is a subtype of B, then every expression with type A also has type B. This is not quite the definition of subtyping since it depends on what we mean by having a type, which in turn may depend on subtyping. However, this is a useful condition for recognizing subtyping when you see it. The main idea is that subtyping is a substitution property: if A is a subtype of B, then we may substitute A's for B's. In principle, if we may meaningfully substitute A's for B's, then it would make sense to consider A a subtype of B. A simple illustrative example may be given using record types. Consider two record types R and S, where records of type R have an integer a component and a boolean b component, and records of type S have only an integer a component. In the type system of Section 10.3, R will be a subtype of S. One way to see why this makes sense is to think about the legal operations on records of each type. Since every r: R has a and b components, the expressions r.a and r.b are allowed. However, since records of type S have only a components, the only basic operation on a record s of type S is to select the a component by writing s.a. Comparing these two types, we can see that every operation that is guaranteed to make sense on elements of type S is also guaranteed to make sense on elements of R. Therefore, in any expression involving a record of type S, we could safely use a record of type R instead, without type error.
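The subsumption test for record types of this kind can be sketched as a structural check. The following Python fragment is a simplified illustration, covering width subtyping only, with record types given as dictionaries from field names to type names; the function name is our own:

```python
def is_subtype(a, b):
    """Width subtyping on record types: a <: b iff a has every field of b,
    at the same type (depth subtyping is ignored in this sketch)."""
    return all(field in a and a[field] == ty for field, ty in b.items())

R = {"a": "int", "b": "bool"}   # records with an integer a and a boolean b
S = {"a": "int"}                # records with only an integer a

assert is_subtype(R, S)         # R <: S: an R may be used wherever an S is expected
assert not is_subtype(S, R)     # but not conversely: s.b would be a type error
```

The check mirrors the operational reading in the text: every selection guaranteed to make sense on an S (here, only s.a) is also guaranteed to make sense on an R.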


We will use the notation and logical relations of Section 5.6, including

Rnat = {⟨n, n⟩ | n ∈ N},   Rbool = {⟨0, 0⟩, ⟨1, 1⟩},


= {⟨n, m⟩ | ∀k ∈ N. n k R m k}. Note that these are all ...

For some k > 0, we can write Γ = Γ0, y1: α1, ..., yk: αk such that the derivation concludes with an application of (→ Elim), followed by k uses of (add var). By the inductive hypothesis,

11.2 Type Inference for λ→ with Type Variables

Γ1 ⊢ M': τ = PT(U) and Γ2 ⊢ N': ρ = PT(V) are most general typings for U and V. This means that there exist substitutions T1 and T2 such that

T1Γ1 ⊆ Γ0,  M = T1M'  and  T1τ = σ' → σ,
T2Γ2 ⊆ Γ0,  N = T2N'  and  T2ρ = σ'.

Since the type variables in PT(V) are renamed to be distinct from any type variables in PT(U), the substitutions T1 and T2 may be chosen so that any type variable changed by one is not affected by the other. Anticipating the need for a substitution that behaves properly on the fresh variable t introduced in the algorithm, we let T be the substitution with

T s = T1 s  if s appears in PT(U),
T s = T2 s  if s appears in PT(V),
T t = σ.

Since the type assignments Γ1 and Γ2 must contain exactly the term variables that occur free in U and V, and TΓi ⊆ Γ0 for i = 1, 2, the substitution T must unify {α = β | x: α ∈ Γ1 and x: β ∈ Γ2}. In addition, since Tτ = σ' → σ = Tρ → Tt, the substitution T unifies τ = ρ → t. It follows that there is a most general unifier

S = Unify({α = β | x: α ∈ Γ1, x: β ∈ Γ2} ∪ {τ = ρ → t})

and the call to PT(UV) must succeed. Since S is a most general unifier for these equations, there is a substitution R with T = R ∘ S. It is easy to see that Γ ⊢ MN: σ is an instance of PT(UV) by substitution R. The final case is a derivable typing assertion Γ ⊢ λx: σ'. M: σ' → σ with Erase(λx: σ'. M) = λx.U. For some k > 0, we can write Γ = Γ0, y1: α1, ..., yk: αk such that the derivation concludes with

Γ0, x: σ' ⊢ M: σ
------------------------------
Γ0 ⊢ λx: σ'. M: σ' → σ

by (→ Intro), followed by k uses of (add var). By the inductive hypothesis,

Γ1 ⊢ M': ρ = PT(U)

is a most general typing for U, which means that there is a substitution T with

TΓ1 ⊆ Γ0,  M = TM'  and  Tρ = σ.


If x: τ ∈ Γ1 for some τ, then Tτ must be σ', and Γ ⊢ λx: σ'. M: σ' → σ is an instance of

Γ1 − {x: τ} ⊢ λx: τ. M': τ → ρ

by substitution T. If x does not occur in Γ1, then we may further assume without loss of generality that Ts = σ', where s is the fresh type variable introduced in Algorithm PT. In this case, it is easy to verify that Γ ⊢ λx: σ'. M: σ' → σ must be an instance of

Γ1 ⊢ λx: s. M': s → ρ

by substitution T. This proves the theorem.

Exercise 11.2.14 Execute the principal typing algorithm "by hand" on the following terms.

(a) K ≡ λx. λy. x
(b) S ≡ λx. λy. λz. (x z)(y z)
(c) λf. λx. f(f x)
(d) S K K, where K and S are given in parts (a) and (b).
(e) λf. (λx. f(x x))(λx. f(x x)). This is the untyped "fixed-point combinator" Y. It has the property that for any untyped lambda term U, Y U = U(Y U).

11.2.4

Implicit Typing

Historically, many type inference problems have been formulated using proof rules for untyped terms. These proof systems are often referred to as systems for implicit typing, since the type of a term is not explicitly given by the syntax of terms. In this context, type inference is essentially the decision problem for an axiomatic theory: given a formula asserting the type of an untyped term, determine whether the formula is provable. In this section, we show how type inference may be presented using "implicit" typing rules that correspond, via Erase, to the standard "explicit" typing rules of λ→. The following proof system may be used to derive typing assertions of the form Γ ⊢ U: τ, where U is any untyped lambda term, τ is a type expression, and Γ is a type assignment.

∅ ⊢ c : τ,   c a constant of type τ without type variables    (cst)

Γ, x: τ ⊢ x : τ    (var)

Γ, x: τ ⊢ U : ρ
---------------------- (abs)
Γ ⊢ (λx. U) : τ → ρ

Γ ⊢ U : τ → ρ    Γ ⊢ V : τ
----------------------------- (app)
Γ ⊢ U V : ρ


These rules are often called the Curry typing rules. Using C as an abbreviation for Curry, we will write Γ ⊢C U: τ if this assertion is provable using the axiom and rules above. There is a recipe for obtaining the proof system above from the typing rules of λ→: we simply apply Erase to all of the terms appearing in the antecedent and consequent of every rule. Given this characterization of the inference rules, the following lemma should not be too surprising.

Lemma 11.2.15 If Γ ⊢ M: τ is a well-typed term of λ→, then Γ ⊢C Erase(M): τ. Conversely, if Γ ⊢C U: τ, then there is a typed term Γ ⊢ M: τ of λ→ with Erase(M) = U.

The proof (below) is a straightforward induction on typing derivations. It may be surprising that this lemma fails for certain (much more complicated) type systems, as shown in [Sal89].

Proof If Γ ⊢ M: τ is a well-typed term of λ→, then it is easy to show Γ ⊢C Erase(M): τ by induction on the typing derivation of Γ ⊢ M: τ. For the converse, assume we have a proof of Γ ⊢C U: τ. We must construct a term M with Γ ⊢ M: τ and Erase(M) = U, by induction on the proof of Γ ⊢C U: τ. Since this is essentially straightforward, we simply illustrate the ideas involved using the lambda-abstraction case. Suppose Γ ⊢C λx.U: τ → τ' follows from Γ, x: τ ⊢C U: τ' and we have a well-typed term Γ, x: τ ⊢ M: τ' with Erase(M) = U. Then clearly Γ ⊢ λx: τ.M: τ → τ' and Erase(λx: τ.M) = λx.U.

In the proof of Lemma 11.2.15, it is worth noting that in constructing a well-typed term from a typable untyped term, the term we construct depends on the proof of the untyped judgement. In particular, if there are two proofs of Γ ⊢C U: τ, we might obtain two different explicitly typed terms with erasure U from these two proofs. When we work with typing assertions about untyped terms, some terminology and definitions change slightly. We say untyped term U has typing Γ ⊢ U: τ if this typing assertion is derivable using the Curry rules of Section 11.2.4 or, equivalently, if there is a well-typed term Γ ⊢ M: τ with Erase(M) = U. We say Γ' ⊢ U: τ' is an instance of Γ ⊢ U: τ if there is a substitution S with SΓ ⊆ Γ' and Sτ = τ'. A typing Γ ⊢ U: τ is principal for U if it is a typing for U and every other typing for U is an instance of Γ ⊢ U: τ. It is easy to use Theorems 11.2.12 and 11.2.13 to show that any typable untyped term has a principal typing, and that this may be computed using algorithm PT. It is possible to study the Curry typing rules, and similar systems, by interpreting them semantically. As in other logical systems, a semantic model may make it easy to see that certain typing assertions are not provable. To interpret Γ ⊢ U: τ as an assertion about U,


we need an untyped lambda model to make sense of U and additional machinery to interpret type expressions as subsets of the lambda model. One straightforward interpretation of type expressions is to map type variables to arbitrary subsets and interpret τ → ρ as the set of all elements of the lambda model which map elements of τ to elements of ρ (via application). This is the so-called simple semantics of types, also discussed in Section 10.4.3 in connection with subtyping. It is easy to prove soundness of the rules in this framework, showing that if Γ ⊢C U: τ, then the meaning of U belongs to the collection designated by τ. With an additional inference rule

Γ ⊢ U : τ    U = V
--------------------
Γ ⊢ V : τ

for giving equal untyped terms the same types, we also have semantic completeness. Proofs of this are given in [BCDC83, Hin83]. Some generalizations and related studies may be found in [BCDC83, CDCV80, CZ86, CDCZ87, MPS86, Mit88], for example. It is easy to show using a semantic model that the only terms of type t → t, for example, are those equal to λx.x. However, since semantically equal untyped terms must have the same types in any model, no decidable type system can be semantically complete, by Exercise 11.2.2.

11.2.5

Equivalence of Typing and Unification

While algorithm PT uses unification, the connection between type inference and unification is stronger than the text of algorithm PT might suggest. A general theme in this section is that type inference and unification are algorithmically equivalent, in the sense that every type inference problem generates a unification problem, and every unification problem arises in type inference for some term. We begin this section with an alternative type inference algorithm whose complexity is easier to analyze than PT's. This algorithm is essentially a reduction (in the sense of recursion theory or complexity theory) from typing to unification. When combined with the linear-time unification algorithm given in [PW78], this reduction gives a linear-time algorithm for typing. The remainder of this section develops the converse reduction, from unification to type inference, with some consequences of independent interest noted along the way. To establish intuition, we give some examples of terms with large types, showing that in the worst case a term may have a type whose string representation is exponential in the length of the term. Next, we show that if τ is the type of some closed term, then τ must be the principal type of some term. This fact, which was originally proved by Hindley for combinatory terms [Hin69], is used in our reduction from unification to type


Table 11.3
Algorithm reducing λ→ (Curry) typing to unification.

TE(c, t) = {t = τ}, where τ is the variable-free type of c
TE(x, t) = {t_x = t}, where t_x is the type variable for x
TE(UV, t) = TE(U, r) ∪ TE(V, s) ∪ {r = s → t}, where r, s are fresh type variables
TE(λx.U, t) = TE(U, s) ∪ {t = t_x → s}, where s is a fresh type variable

inference. The reduction from first-order unification to type inference and the examples of terms with large types provide useful background for analyzing ML typing in Section 11.3.5. We will work with typing assertions about untyped terms, as in Section 11.2.4. Given an untyped term, algorithm TE in Table 11.3 produces a set of equations between type expressions. These equations are solvable by unification if the given untyped term has a typing. The input to TE must be an untyped term with all bound variables distinct, and no variable occurring both free and bound. The reason for this restriction, which is easily satisfied by renaming bound variables, is that it allows us to simplify the association between term variables and type variables. Specifically, since each term variable is used in only one way, we can choose a specific type variable t_x for each term variable x. For any untyped term U, let Γ_U be the type assignment

{x: t_x | x free in U}

If U is an untyped term, t is a type variable, and E = TE(U, t), then every typing for U is an instance of SΓ_U ⊢ U: St for some substitution S unifying E. In particular, the principal typing for U is given by the most general unifier for E. This is illustrated by example below.

Example 11.2.16 We can see how algorithm TE works using the example term S ≡ λx.λy.λz.(xz)(yz). Note first that all bound variables are distinct. Tracing the calls to TE, we have

TE(λx.λy.λz.(xz)(yz), r)
  = TE(λy.λz.(xz)(yz), s) ∪ {r = t_x → s}
  = TE(λz.(xz)(yz), t) ∪ {s = t_y → t, r = t_x → s}
  = TE((xz)(yz), u) ∪ {t = t_z → u, s = t_y → t, r = t_x → s}
  = TE(xz, v) ∪ TE(yz, w) ∪ {v = w → u, t = t_z → u, s = t_y → t, r = t_x → s}
  = TE(x, v1) ∪ TE(z, v2) ∪ {v1 = v2 → v}
    ∪ TE(y, w1) ∪ TE(z, w2) ∪ {w1 = w2 → w}
    ∪ {v = w → u, t = t_z → u, s = t_y → t, r = t_x → s}

Eliminating v1, v2, w1 and w2, since each must be equal to the type variable of some term variable, we have the unification problem

{ t_x = t_z → v,  t_y = t_z → w,  v = w → u,  t = t_z → u,  s = t_y → t,  r = t_x → s }
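The left-to-right elimination can be mechanized. In the sketch below (our own encoding; each equation in this particular set already has a lone variable on the left, so plain substitution suffices and no general unification is needed), expanding r reproduces the principal type of S:

```python
# Types are strings (type variables) or ('->', domain, codomain).
eqs = {
    't_x': ('->', 't_z', 'v'),
    't_y': ('->', 't_z', 'w'),
    'v':   ('->', 'w', 'u'),
    't':   ('->', 't_z', 'u'),
    's':   ('->', 't_y', 't'),
    'r':   ('->', 't_x', 's'),
}

def expand(ty, env):
    # Repeatedly substitute solved variables; the equations here are
    # acyclic, so this terminates (a real unifier would do an occurs check).
    if isinstance(ty, str):
        return expand(env[ty], env) if ty in env else ty
    return ('->', expand(ty[1], env), expand(ty[2], env))

def show(ty):
    # Print with the usual right-associated arrow convention.
    if isinstance(ty, str):
        return ty
    dom = show(ty[1])
    if isinstance(ty[1], tuple):
        dom = '(' + dom + ')'
    return dom + ' -> ' + show(ty[2])

print(show(expand('r', eqs)))
# prints: (t_z -> w -> u) -> (t_z -> w) -> t_z -> u
```

The printed result is exactly the familiar principal type of S, with t_z, w, u in place of the usual a, b, c.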

We can solve this set of equations by repeatedly using substitution to eliminate variables, working from left to right. If we stop with the equation for r, we find that the type of S is

r = (t_z → w → u) → (t_z → w) → t_z → u

Since the main ideas are similar to the correctness proof for PT in Section 11.2.3, we will not prove correctness of Algorithm TE. (A correctness proof for this algorithm may be found in [Wan87].) In analyzing the complexity, we assume the input term is given by a parse tree (Section 1.7) so that the length of UV is the length of U plus the length of V plus some constant factor (for the symbols indicating that this is an application of one term to another). Given this, it is an easy induction on terms to show that the running time of TE(U, t) on parsed term U is linear in the length of U. If the output of TE is presented using the dag representation of type expressions discussed in Section 11.2.2 and Exercise 11.2.6, then we may apply the linear-time unification algorithm of [PW78]. This gives a linear-time typing algorithm. The output of the linear-time typing algorithm is a typing in which all of the type expressions are again represented by directed acyclic graphs. This is necessary, since the examples we consider below show that representing the types as strings could require exponential time. (As shown in Exercise 11.2.6, a dag of size n may represent a type expression of length 2^n.) In the following theorem, we use the standard "big-oh" notation from algorithm analysis. Specifically, we say a function f is O(g(n)) if there is some positive real number c > 0 such that f(n) ≤ c·g(n) for all sufficiently large n. We similarly write f(n) = g(O(h(n))) if there is a positive real number c > 0 with f(n) ≤ g(c·h(n)) for all sufficiently large n. In particular, a function f is 2^O(n) if there is some positive real number c > 0 with f(n) ≤ 2^(c·n) for all sufficiently large n.


Theorem 11.2.17 Given an untyped lambda term U of length n with all bound variables distinct, there is a linear-time algorithm which computes a dag representing the principal typing of U if it exists, and fails otherwise. If it exists, the principal typing of U has length at most 2^O(n) and dag size O(n).

Before proving the converse of Theorem 11.2.17, we develop some intuition for the difficulty of type inference by looking at some example terms with large types. Product types and pairing and projection functions are not essential, but do make it easier to construct expressions with specific principal types. An alternative to adding product types and associated operations is to introduce syntactic sugar that is equivalent for the purpose of typing. To be specific, we will adopt the abbreviation

⟨U1, ..., Uk⟩ ≡ λz. z U1 ... Uk

where z is a fresh variable not occurring in any of the Ui. This abbreviation is based on a common encoding of sequences in untyped lambda calculus that does not quite work for typed lambda calculus (see Exercise 11.2.24). It is easy to verify that if each Ui has principal type σi, then the principal type of the sequence is

⟨U1, ..., Uk⟩ : (σ1 → ... → σk → t) → t

where t is a fresh type variable not occurring in any of the σi. We will write σ1 × ... × σk as an abbreviation for any type expression (σ1 → ... → σk → t) → t with t not occurring in any σi.

Example 11.2.18 The closed term

P ≡ λx.⟨x, x⟩

has principal type s → (s × s). If we apply P to an expression M with principal type σ, then the application PM will be typed by unifying s with σ, resulting in the typing PM: σ × σ. Thus applying P to a typable expression doubles the length of its principal type. ∎
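The doubling is easy to measure, and it also illustrates why the dag representation matters. The sketch below (our own encoding: a product type is a tuple whose two components are the *same* shared object) iterates the effect of P on a type, comparing the printed size, which doubles at each step, with the size of the shared dag, which grows by one node per step:

```python
def printed_size(t):
    """Size of the type written out as a string (no sharing)."""
    if isinstance(t, str):
        return 1
    return 1 + printed_size(t[1]) + printed_size(t[2])

def dag_size(t, seen=None):
    """Number of distinct nodes when identical subterms are shared."""
    if seen is None:
        seen = set()
    if id(t) in seen:
        return 0
    seen.add(id(t))
    if isinstance(t, str):
        return 1
    return 1 + dag_size(t[1], seen) + dag_size(t[2], seen)

t = 's'
for n in range(1, 6):
    t = ('x', t, t)        # one application of P: σ becomes σ × σ, components shared
    print(n, printed_size(t), dag_size(t))
```

After n steps the printed size is 2^(n+1) − 1 while the dag has only n + 1 nodes, which is the gap Exercise 11.2.6 and Theorem 11.2.17 rely on.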

By iterating the application of P from Example 11.2.18 to any typable expression M, it is easy to prove the following lemma.

Proposition 11.2.19 For arbitrarily large n, there exist closed lambda terms of length n whose principal types have length 2^Ω(n).

Another useful observation about principal types involves terms of the form

V ≡ λw. K w ⟨U1, ..., Uk⟩

where K is the untyped term λx.λy.x. If w does not occur free in any of the Ui's, then V is typable, with the same principal type, t → t, as the I combinator. However, a term of form V might be untypable if the type constraints introduced by the Ui's cannot be satisfied. This gives us a way of imposing constraints on the types of terms, as illustrated in the following example.

Example 11.2.20 The term

EQ ≡ λu.λv.λw. K w (λz.⟨zu, zv⟩)

has principal type t → t → s → s. Therefore EQ U V is typable iff the principal types of U and V are unifiable.

An interesting and useful property of Curry typing is that if Γ ▷ U: τ is a typing for some Γ, τ not containing type constants, then Γ ▷ V: τ is a principal typing for some term V [Hin69]. Our proof uses the following lemma.

Lemma 11.2.21 If τ is any type expression without type constants and t1, ..., tk is a list containing all the type variables in τ, then there is a closed untyped lambda term with principal type t1 → ... → tk → τ → s → s for some (arbitrary) type variable s not among t1, ..., tk.

Proof The proof is by induction on the structure of τ, with t1, ..., tk fixed. If τ is one of the type variables ti, then t1 → ... → tk → ti → s → s is a principal type of the term

λx1. ... λxk. λy. λz. K z (EQ xi y)

where K ≡ λx.λy.x and EQ is the term given in Example 11.2.20.

If τ has the form τ1 → τ2, then by the inductive hypothesis we assume we have untyped lambda terms U1 and U2 with principal types t1 → ... → tk → τ1 → s → s and t1 → ... → tk → τ2 → s → s, respectively. It is not hard to see that the untyped term

λx1. ... λxk. λy. λz. K z (λu1. λu2. λw. w (U1 x1 ... xk u1) (U2 x1 ... xk u2) (EQ (y u1) u2))

has principal typing t1 → ... → tk → (τ1 → τ2) → s → s. This proves the lemma. ∎

For simplicity, we state and prove the following theorem for closed terms. The more general statement for open terms follows as a corollary. This is left to the reader as Exercise 11.2.25.

Theorem 11.2.22 If U is a closed untyped lambda term with type τ, not necessarily principal, and τ does not contain type constants, then there is a closed V with principal type τ and V ↠ U.

Proof By Lemma 11.2.21, there is a closed term W with principal type t1 → ... → tk → τ → s → s, where t1, ..., tk lists all the type variables in τ. Let V be the term (λy. K y W') U, where W' is the term

λx1. ... λxk. λz. z (W x1 ... xk y).

Note that y occurs free in W'. The occurrence of y in W' forces y to have type τ. Therefore, the principal type of λy. K y W' is τ → τ. Since U has type τ by hypothesis, the principal type of the entire term must be τ. ∎

Recall from Proposition 4.3.13 that there is a closed term of type τ iff τ is a valid formula of intuitionistic propositional logic, with → read as implication. This means that ∅ ▷ V: τ is a typing for some closed term V iff τ is a valid formula of intuitionistic propositional logic. It follows from the PSPACE complexity of propositional intuitionistic logic [Sta79] that it is PSPACE-complete to determine whether a given pair Γ, τ is a typing for some (untyped) lambda term. From Theorem 11.2.22 and Proposition 4.3.13, it is relatively easy to prove that for any type expressions τ and τ', there is a closed term that is typable iff τ and τ' are unifiable.

Theorem 11.2.23 Let τ and τ' be type expressions without type constants. There is a closed, untyped lambda term that is typable iff τ and τ' are unifiable.

Proof Read as an implication, τ → τ' → s → s says that τ and τ' imply true. This is an intuitionistically valid implication. By Proposition 4.3.13, there is a term with this type, and by Theorem 11.2.22 there is a term U which has τ → τ' → s → s as a principal type. Therefore, λx. U x x is typable iff τ and τ' are unifiable. ∎

It follows from the P-completeness of unification [DKM84] that deciding typability is complete for polynomial time with respect to log-space reduction. This means that unless certain conjectures about parallel computation and polynomial time fail (specifically, unless the class NC contains all of P), it is unlikely that a parallel algorithm for typing could be made to run any faster in the asymptotic worst case than the best sequential algorithm. As shown in Exercise 11.2.26, Theorem 11.2.23 may be proved directly from Lemma 11.2.21. However, the proof using Proposition 4.3.13 seems easier to remember. An alternate direct proof is given in [Tys88].

Exercise 11.2.24 Consider ⟨U, V⟩ ≡ λx. xUV : (σ → τ → t) → t for U: σ and V: τ. Write untyped terms P1 and P2 such that ⟨U, V⟩P1 = U and ⟨U, V⟩P2 = V. Show that typing ⟨U, V⟩P1: σ and ⟨U, V⟩P2: τ require different typings for ⟨U, V⟩.

Exercise 11.2.25 Use Theorem 11.2.22 to show that if Γ ▷ U: τ is a typing for some open term U, then there is a term V ↠ U with principal typing Γ ▷ V: τ.

Exercise 11.2.26 Prove Theorem 11.2.23 directly from Lemma 11.2.21, without using Proposition 4.3.13.

11.3 Type Inference with Polymorphic Declarations

11.3.1 ML Type Inference and Polymorphic Variables

In this section, we consider type inference for the language λ→,Π,let from Section 9.2.5. Recall that this language is the extension of λ→ with type abstraction, type application, and polymorphic let declarations. The type expressions of λ→,Π,let may contain the binding operator Π, giving us polymorphic types of a second universe U2. The typing contexts may include both U1 and U2 type expressions, and the type of a constant symbol may be any U2 type expression without free type variables (i.e., all type variables must be bound by Π). Since this language exhibits the kind of polymorphism provided by the programming language ML, we may call the resulting type inference problem ML type inference. The main feature of λ→,Π,let that we use in type inference is that all polymorphic types have the form Πt1....Πtk.τ with τ a simple non-polymorphic type. As a result, the general techniques used for λ→,Π,let type inference apply to extensions of the language with cartesian product types, recursion operators, and other features that do not change the way that type binding occurs in type expressions. If we extend the language to include function types of the form Πt.τ → Πt'.τ', however, then the type inference problem appears to be harder. With full impredicative typing, type inference becomes algorithmically unsolvable [Wel94].

The ML type inference problem is defined precisely below by extending the Erase function of Section 11.1 to type abstraction, type application and let as follows:

Erase(λt.M) = Erase(M)
Erase(Mτ) = Erase(M)
Erase(let x: σ = M in N) = let x = Erase(M) in Erase(N)
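The extended Erase is a one-clause-per-constructor traversal. A sketch over a tuple-encoded explicitly typed syntax (the constructors and the example term are illustrative, not the book's notation):

```python
def erase(m):
    """Strip all type information from an explicitly typed term."""
    k = m[0]
    if k == 'var':
        return m
    if k == 'lam':                  # λx:τ. M — drop the type annotation on x
        return ('lam', m[1], erase(m[3]))
    if k == 'app':
        return ('app', erase(m[1]), erase(m[2]))
    if k == 'tlam':                 # Erase(λt.M) = Erase(M)
        return erase(m[2])
    if k == 'tapp':                 # Erase(Mτ) = Erase(M)
        return erase(m[1])
    if k == 'let':                  # Erase(let x: σ = M in N) = let x = Erase(M) in Erase(N)
        return ('let', m[1], erase(m[3]), erase(m[4]))
    raise ValueError(m)

# let id: Πt. t→t = λt. λx:t. x in (id nat) true   erases to   let id = λx.x in id true
M = ('let', 'id', 'sigma',
     ('tlam', 't', ('lam', 'x', 't', ('var', 'x'))),
     ('app', ('tapp', ('var', 'id'), 'nat'), ('var', 'true')))
print(erase(M))
```

Note that the output is an untyped term with an untyped let, exactly the input form the ML type inference problem below starts from.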


Given any pre-term, Erase produces a term of an untyped lambda calculus extended with untyped let declarations of the form let x = U in V. Note that this is not the same as the Erase function in Example 11.1.2. A property of λ→,Π,let that complicates type inference for terms with free variables is that there are two distinct type universes. This means that there are two kinds of types that may be assigned to free variables. If we attempt to infer ML typings "bottom-up," as in the typing algorithm PT of Section 11.2.3, then there is no way to tell whether we should assume that a term variable has a U1 type or a U2 type. Furthermore, if we wanted to choose a "most general" U2 type, it is not clear how to do this without extending the type language with U2 type variables. However, this problem does not arise for closed terms. The reason is that each variable in a closed term must either be lambda bound or let bound. The lambda-bound variables must have U1 types, while the type of each let-bound variable may be determined from the declaration. For example, in let I = λx.x in II, the type of the let-bound variable I can be determined by typing the term λx.x; there is no need to find an arbitrary type for I that makes II well-typed. More generally, we can type an expression of the form let x = U in V by first finding the principal typing for U, then using instances of this type for each occurrence of x in V. Since this eliminates the need to work with U2 types, we can infer types for closed λ, let-terms using essentially the same unification-based approach as for λ→-terms. We consider three algorithms for ML type inference. Since the first two involve substitution on terms, it is easier to describe these algorithms if we do not try to compute the corresponding explicitly-typed term.
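The contrast between let-bound and lambda-bound variables can be made concrete in a miniature inference engine. The sketch below is a simplified model under assumptions of our own (tuple-encoded terms, arrow types as pairs, and a substitution that assumes the let-bound term is closed): it types let x = U in V by typing U and then typing [U/x]V, exactly the substitution strategy just described, so each copy of U may receive a different type.

```python
import itertools

class Var:
    """A unification variable; ref points to its binding, if any."""
    _ids = itertools.count()
    def __init__(self):
        self.id = next(Var._ids)
        self.ref = None

def prune(t):
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def occurs(v, t):
    t = prune(t)
    return t is v or (isinstance(t, tuple) and (occurs(v, t[0]) or occurs(v, t[1])))

def unify(a, b):
    a, b = prune(a), prune(b)
    if a is b:
        return
    if isinstance(a, Var):
        if occurs(a, b):
            raise TypeError('occurs check: infinite type')
        a.ref = b
    elif isinstance(b, Var):
        unify(b, a)
    else:                         # both arrow types, encoded as (dom, cod) pairs
        unify(a[0], b[0])
        unify(a[1], b[1])

def subst(t, x, u):
    """[u/x]t on untyped terms; assumes u is closed, so no capture."""
    k = t[0]
    if k == 'var':
        return u if t[1] == x else t
    if k == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, u))
    if k == 'app':
        return ('app', subst(t[1], x, u), subst(t[2], x, u))
    if k == 'let':
        body = t[3] if t[1] == x else subst(t[3], x, u)
        return ('let', t[1], subst(t[2], x, u), body)

def infer(t, env):
    k = t[0]
    if k == 'var':
        return env[t[1]]
    if k == 'lam':
        v = Var()
        return (v, infer(t[2], {**env, t[1]: v}))
    if k == 'app':
        res = Var()
        unify(infer(t[1], env), (infer(t[2], env), res))
        return res
    if k == 'let':                # type U, then type [U/x]V (rule (let)1 below)
        infer(t[2], env)
        return infer(subst(t[3], t[1], t[2]), env)

ID = ('lam', 'x', ('var', 'x'))
SELF = ('app', ('var', 'i'), ('var', 'i'))
print(prune(infer(('let', 'i', ID, SELF), {})))   # let i = λx.x in i i: typable
try:
    infer(('lam', 'i', SELF), {})                 # λi. i i: fails the occurs check
except TypeError as e:
    print('untypable:', e)
```

The let form succeeds because each copy of λx.x in the expansion gets its own fresh variables, while the lambda-bound version must give both occurrences of i the same simple type and fails.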
This leads us to consider the following ML type inference problem:

ML Type Inference Problem: Given a closed, untyped λ-term V, possibly containing untyped let's and term constants with closed U2 types, find a U1 type τ such that there exists a well-typed λ→,Π,let term ∅ ▷ M: τ with Erase(M) = V.

It is possible to modify each of the algorithms we develop so that the explicitly-typed term ∅ ▷ M: τ is actually computed. It is easy to see that there is no loss of generality in only considering U1 types for terms, since ∅ ▷ M: Πt1....Πtk.τ is well-typed iff ∅ ▷ M t1 ... tk: τ is.

11.3.2 Two Sets of Implicit Typing Rules

We consider two sets of implicit typing rules and prove that they are equivalent for deriving non-polymorphic typings. The first set matches the typing rules of λ→,Π,let, while the second gives us a useful correspondence between ML type inference and λ→ type inference. The first set of rules is obtained by erasing types from the antecedent and consequent of


each typing rule of λ→,Π,let. This gives us rules (cst), (var), (abs), (app) and (add hyp) of Section 11.2.4, together with the following rules for type abstraction, type application and let. The names (gen) and (inst) are short for "generalization" and "instantiation."

Γ ▷ V: σ
------------------  t not free in Γ    (gen)
Γ ▷ V: Πt.σ

Γ ▷ V: Πt.σ
------------------                     (inst)
Γ ▷ V: [τ/t]σ

Γ ▷ U: σ,   Γ, x: σ ▷ V: σ'
---------------------------            (let)2
Γ ▷ (let x = U in V): σ'

Since we often want let-bound variables to have polymorphic types, this let rule permits σ to be a U2 type, hence the subscript "2" in the name (let)2. We will write ⊢ML2 Γ ▷ V: σ if the typing assertion Γ ▷ V: σ is provable using (cst), (var), (abs), (app), (add hyp), (gen), (inst) and (let)2. With these rules, we have the straightforward analog of Lemma 11.2.15 for λ→,Π,let.

Lemma 11.3.1 If Γ ▷ M: σ is a well-typed term of λ→,Π,let, then ⊢ML2 Γ ▷ Erase(M): σ. Conversely, if ⊢ML2 Γ ▷ V: σ, then there is a typed term Γ ▷ M: σ of λ→,Π,let with Erase(M) = V.

Proof The proof follows the same induction as the proof of Lemma 11.2.15, but with additional cases for (gen), (inst) and (let)2. Each of these is straightforward, and essentially similar to the cases considered in the proof of Lemma 11.2.15. ∎

As a step towards an algorithm that solves the ML type inference problem without using polymorphic U2 types, we reformulate the type inference rules so that no polymorphic types are required. Essentially, the alternative rule for let uses substitution on terms in place of the kind of substitution on types that results from rules (gen) and (inst). More specifically, let x = U in V has precisely the same ML typings as [U/x]V, provided Γ ▷ U: ρ for some non-polymorphic type ρ. This suggests the following rule:

Γ ▷ U: ρ,   Γ ▷ [U/x]V: τ
-------------------------    (let)1
Γ ▷ let x = U in V: τ

If we ignore the hypothesis about U, then this rule allows us to type let x = U in V by typing the result [U/x]V of let reduction. It is not hard to show that if x occurs in V, and Γ ▷ [U/x]V: τ is an ML typing, then we must have Γ ▷ U: σ for some σ. Therefore, the assumption about U only prevents us from typing let x = U in V when x does not occur in V and U is untypable.


A comparison of the two let rules appears in Example 11.3.2 below. Since we wish to eliminate the typing rules that involve polymorphic types, we reformulate the rule for constants to incorporate (inst).

Γ ▷ c: [τ1/t1, ..., τk/tk]τ,  if Πt1....Πtk.τ is the type of c    (cst)1

We will write ⊢ML1 Γ ▷ U: τ if the typing assertion Γ ▷ U: τ is provable from (var), (abs), (app) and (add hyp) of Section 11.2.4, together with (cst)1 and (let)1. It follows from the proof that the two inference systems are equivalent that with (cst)1 and (let)1, we do not need (inst) or (gen).

Example 11.3.2 Consider the term

let y = λx.λf. fx in if z then y true λx.5 else y 3 λx.7

where we assume that z: bool, true: bool and 3: nat as usual. Using rules common to ML1 and ML2, we may derive the typing assertion

∅ ▷ λx.λf. fx : s → (s → s') → s'

In the ML2 system, we can then generalize s and s' by rule (gen) to obtain

∅ ▷ λx.λf. fx : Πs.Πs'. s → (s → s') → s'

and use (inst) to derive

y: Πs.Πs'. s → (s → s') → s'  ▷  y: nat → (nat → nat) → nat

and similarly for bool. This allows us to prove

y: Πs.Πs'. s → (s → s') → s'  ▷  (if z then y true λx.5 else y 3 λx.7): nat

which makes it possible to type the original term by the (let)2 rule. We can also derive

∅ ▷ (let y = λx.λf. fx in if z then y true λx.5 else y 3 λx.7): nat

using (let)1. Working backwards, it suffices to show

∅ ▷ if z then (λx.λf. fx) true λx.5 else (λx.λf. fx) 3 λx.7 : nat

which can be done by showing that both (λx.λf. fx) true λx.5 and (λx.λf. fx) 3 λx.7 have type nat. These typings can be shown in the λ→-fragment of ML1 in the usual way. The difference between the ML1 and ML2 derivations is that in ML1, we type each occurrence of λx.λf. fx in a different way, while in ML2 we give one polymorphic typing for this term, and instantiate the bound type variables differently for each occurrence. ∎

Since ML1 typings use the same form of typing assertions as Curry typing, we may adopt the definition of instance given in Section 11.2.4. It is not hard to see that Lemma 11.2.3 extends immediately to ML1 typing.

Lemma 11.3.3 Let Γ' ▷ U: τ' be an instance of the typing Γ ▷ U: τ. If ⊢ML1 Γ ▷ U: τ, then ⊢ML1 Γ' ▷ U: τ'.

Proof This proof by induction on typing derivations is identical to the proof of Lemma 11.2.3 except for an additional case for (let)1. ∎

We also have a similar property for ML2 typings, where we define instance exactly as before. Specifically, Γ' ▷ U: τ' is an instance of Γ ▷ U: τ if there is a substitution S with SΓ ⊆ Γ' and Sτ = τ'. As usual, a substitution only affects free type variables, not those bound by Π.

Lemma 11.3.4 Let Γ' ▷ U: τ' be an instance of the typing Γ ▷ U: τ, where Γ and τ may contain polymorphic types. If ⊢ML2 Γ ▷ U: τ, then ⊢ML2 Γ' ▷ U: τ'.

This lemma will be used in proving that ML1 and ML2 are equivalent. The proof is again by induction on typing derivations. We can prove that the two inference systems are equivalent for typing judgements without polymorphic types.

Theorem 11.3.5 Let Γ be a type assignment with no polymorphic types, and let τ be any nonpolymorphic type. Then ⊢ML2 Γ ▷ U: τ iff ⊢ML1 Γ ▷ U: τ.

Although the details are involved, the main idea behind this equivalence is relatively simple. We give a short sketch here and postpone the full proof to Section 11.3.4. The proof of Theorem 11.3.5 has two parts, the first showing that ML1 is as powerful as ML2 and the second showing the converse. The first part is easier. To show that ML1 is as powerful as ML2, we first show that adding the ML2 rules (gen) and (inst) would not change the set of ML1-provable typings. The essential reason is that if Γ ▷ U: τ' is proved from Γ ▷ U: τ by some sequence of steps involving only (gen) and (inst), then Γ ▷ U: τ' is an instance of Γ ▷ U: τ. But since any instance of an ML1 typing is already ML1-provable, rules (gen) and (inst) are unnecessary. The second step in showing ML1 is as powerful as ML2 is to show that (let)2 can only be used to derive typing assertions that could already be proved using ML1 rules. If we prove a typing by (let)2,

Γ ▷ U: Πt1....Πtk.τ,   Γ, x: Πt1....Πtk.τ ▷ V: τ'
-------------------------------------------------
Γ ▷ let x = U in V: τ'

then every occurrence of x in V is given a substitution instance of τ by applying some combination of (inst) and (gen) steps. Therefore, we could prove Γ ▷ [U/x]V: τ' using the ML1 rules. This is the outline of the proof that ML1 is as powerful as ML2. The converse, however, requires principal typing. The main problem is to show that if x occurs in V and Γ ▷ [U/x]V: τ' is ML1-provable, then there is some type τ such that Γ ▷ U: Πt1....Πtk.τ and Γ, x: Πt1....Πtk.τ ▷ V: τ' are both ML2-provable. The only reasonable choice for τ is the principal type for U. If we set up the inductive proof of equivalence properly, then it suffices to show that ML1 has principal types. Therefore, we proceed with the typing algorithms based on ML1. These give us the existence of principal typings as a corollary, allowing us to prove the equivalence of ML1 and ML2 in Section 11.3.4. Theorem 11.3.5 gives us the following correspondence between ⊢ML1 and the explicitly typed language λ→,Π,let.

Corollary 11.3.6 Let τ be any U1 type of λ→,Π,let and Γ any type assignment containing only U1 types. If Γ ▷ M: τ is a well-typed term of λ→,Π,let, then ⊢ML1 Γ ▷ Erase(M): τ. Conversely, if ⊢ML1 Γ ▷ V: τ, then there is a typed term Γ ▷ M: τ of λ→,Π,let with Erase(M) = V.

Proof This follows immediately from Lemma 11.3.1 and Theorem 11.3.5. ∎

11.3.3 Type Inference Algorithms

We consider three algorithms for computing principal ML1 typings of untyped λ, let-terms. The first eliminates let's and applies algorithm PT of Section 11.2.3. This takes exponential time in the worst case, since eliminating let's may lead to an exponential increase in the size of the term. However, as we shall see in Section 11.3.5, exponential time is the asymptotic lower bound on the running time of any correct typing algorithm. The second algorithm is based on the same idea, but is formulated as an extension to algorithm PT. This serves as an intermediate step towards the third algorithm, which uses a separate "environment" to store the types of let-bound variables, eliminating the need for substitution on terms. This algorithm more closely resembles the algorithms actually used in ML and related compilers. Although the third algorithm has the same asymptotic worst-case running time as the first, it has a simple recursive form and may be substantially more efficient in practice since it does not perform any operations on terms. All three algorithms are explained and proved correct for ML1 typing. It follows by Theorem 11.3.5 that the algorithms are also correct, and produce principal typings, for ML2 typing.


First Algorithm The main idea behind the first algorithm is that when x occurs in V, the term let x = U in V has exactly the same typings as [U/x] V. We can treat the special case where x does not occur in V by the following general transformation on terms.

Lemma 11.3.7 An untyped λ, let-term of the form let x = U in V has precisely the same ML1 typings as let x = U in (λy.V)x, provided y is not free in V, and precisely the same ML1 typings as the result [U/x]((λy.V)x) of let-reducing the latter term.

The proof is straightforward, by inspection of the ML1 inference rules. A second induction shows that we may repeat this transformation anywhere inside a term, any number of times.

Lemma 11.3.8 Let W1 be any untyped λ, let-term and let W2 be the result of repeatedly replacing any subterm of the form let x = U in V by either let x = U in (λy.V)x or [U/x]((λy.V)x). Then W1 and W2 have precisely the same ML1 typings.

Lemma 11.3.8 gives a transformation of untyped λ, let-terms that preserves typability. Since this transformation can be used to eliminate all let's from a term, we may compute ML1 typings using the algorithm PT for λ→ typings. Given an untyped λ, let-term, algorithm PTL1 eliminates all let's according to Lemma 11.3.8 and applies algorithm PT to the resulting untyped lambda term. If we write let-expand(U) for the result of removing all let's according to Lemma 11.3.8, then we may define algorithm PTL1 as follows:

PTL1(U) = let Γ ▷ M: τ = PT(let-expand(U)) in Γ ▷ U: τ

Termination follows from Proposition 9.2.15. Specifically, we can compute the let-expansion of a term in two phases. We begin by replacing each subterm of the form let x = U in V by let x = U in (λy.V)x, where y is not free in V. This is clearly a terminating process. Then, by Proposition 9.2.15, we can let-reduce each let x = U in (λy.V)x to [U/x]((λy.V)x). The theorems below follow immediately from Lemmas 11.3.3 and 11.3.8 and Theorems 11.2.12 and 11.2.13.
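The let-expansion of Lemmas 11.3.7 and 11.3.8 is a small recursive transformation, and running it makes the exponential size increase from Section 11.3.5 directly observable. A sketch (our own encoding; fresh-variable handling simplified, and the capture-naive substitution relies on the nested-pair example having suitably distinct variables):

```python
import itertools

fresh = (f"y{i}" for i in itertools.count())

def subst(t, x, u):
    """[u/x]t; capture-naive, safe here because u has no free variables
    that are bound inside t."""
    k = t[0]
    if k == 'var':
        return u if t[1] == x else t
    if k == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, u))
    if k == 'app':
        return ('app', subst(t[1], x, u), subst(t[2], x, u))
    body = t[3] if t[1] == x else subst(t[3], x, u)
    return ('let', t[1], subst(t[2], x, u), body)

def let_expand(t):
    k = t[0]
    if k == 'var':
        return t
    if k == 'lam':
        return ('lam', t[1], let_expand(t[2]))
    if k == 'app':
        return ('app', let_expand(t[1]), let_expand(t[2]))
    # let x = U in V  becomes  [U/x]((λy.V)x) = (λy.[U/x]V) U,  y fresh
    u, v = let_expand(t[2]), let_expand(t[3])
    return ('app', ('lam', next(fresh), subst(v, t[1], u)), u)

def size(t):
    if t[0] == 'var':
        return 1
    if t[0] == 'lam':
        return 1 + size(t[2])
    if t[0] == 'app':
        return 1 + size(t[1]) + size(t[2])
    return 1 + size(t[2]) + size(t[3])

def pair(u, v):                       # ⟨u, v⟩ ≡ λz. z u v
    return ('lam', 'z', ('app', ('app', ('var', 'z'), u), v))

def nested(n):
    """let x1 = ⟨x0,x0⟩ in let x2 = ⟨x1,x1⟩ in ... in xn."""
    t = ('var', f"x{n}")
    for i in range(n, 0, -1):
        xi = ('var', f"x{i-1}")
        t = ('let', f"x{i}", pair(xi, xi), t)
    return t

for n in range(1, 6):
    print(n, size(nested(n)), size(let_expand(nested(n))))
```

The original terms grow linearly in n while their expansions grow exponentially, which is why PTL1 takes exponential time in the worst case even though PT itself is linear.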

Theorem 11.3.9 If PTL1(U) = Γ ▷ U: τ, then every instance of Γ ▷ U: τ is ML1-provable.

Theorem 11.3.10 Suppose Γ ▷ U: τ is an ML1 typing for U. Then PTL1(U) succeeds and produces a typing assertion with Γ ▷ U: τ as an instance.

Corollary 11.3.11 Let U be an untyped λ, let-term. If U has an ML1 typing, then U has a principal ML1 typing.

Second Algorithm We can reformulate algorithm PTL1 as an extension of PT rather than a two-phase algorithm with a preprocessing step. If we read typing rule (let)1 from the bottom up, this rule suggests that we extend algorithm PT to let by adding the following clause:

PT(let x = U in V) = let Γ ▷ U: τ = PT(U) in PT([U/x]V)

We write PTL2 for the algorithm given by this extension to PT. It is intuitively clear from the (let)1 rule that this algorithm is correct. We state and prove this precisely using let-expansion of terms. For any terms U and V, we write PTLi(U) ≃ PTLj(V) if either both PTLi(U) and PTLj(V) fail, or PTLi(U) = Γ1 ▷ U: τ1 and PTLj(V) = Γ2 ▷ V: τ2 both succeed with Γ1 = Γ2 and τ1 = τ2.

Lemma 11.3.12 For any untyped λ, let-term U, we have PTL2(U) ≃ PTL2(let-expand(U)).

Proof The proof uses induction on an ordering of untyped terms by "degree." Recall from Proposition 9.2.15 that there are no untyped λ, let-terms with an infinite sequence of let reductions. If V is an untyped λ, let-term, then the degree of V is the pair dg(V) = (ℓ, m) of natural numbers with ℓ the length of the longest let-reduction sequence from V and m = |V| the length of V. We order degrees lexicographically, so that (ℓ, m) < (ℓ', m') if either ℓ < ℓ', or ℓ = ℓ' and m < m'.

Γ, x: (Πt1...Πtk.τ) ▷ x: τ'' from this same series of proof steps may be used to obtain Γ, x: (Πt1...Πtk.τ) ▷ x: Πt1...Πtk.τ, and conversely. The remaining cases are essentially alike. We will illustrate the argument using the let case, since this is slightly more involved than the others. By Lemma 11.3.20, we need only consider proofs that end with rule (let)2. Therefore, we have ⊢ML2 Γ ▷ [U/x](let y = V1 in V2): τ'' iff ⊢ML2 Γ ▷ (let y = [U/x]V1 in [U/x]V2): τ'' iff by the last proof rule both ⊢ML2 Γ ▷ [U/x]V1: σ' and ⊢ML2 Γ, y: σ' ▷ [U/x]V2: τ''. Since τ'' is not a polymorphic type, the inductive hypothesis implies that the second condition is equivalent to ⊢ML2 Γ, x: σ, y: σ' ▷ V2: τ'', where σ = (Πt1...Πtk.τ). To apply the inductive hypothesis to Γ ▷ [U/x]V1: σ', we note that σ' must have the form σ' = Πs1...Πsl.τ'. We assume without loss of generality that s1, ..., sl do not occur free in Γ and remove the quantifiers by rule (inst). Using the inductive hypothesis on the typing assertion obtained in this way, we observe:

⊢ML2 Γ ▷ [U/x]V1: Πs1...Πsl.τ'  iff  ⊢ML2 Γ ▷ [U/x]V1: τ'
                                 iff  ⊢ML2 Γ, x: σ ▷ V1: τ'
                                 iff  ⊢ML2 Γ, x: σ ▷ V1: Πs1...Πsl.τ'

Proof of Theorem 11.3.5 The proof uses induction on an ordering of untyped terms by degree, as described in the proof of Lemma 11.3.12. Specifically, the degree of V is the pair dg(V) = (ℓ, m) of natural numbers with ℓ the length of the longest let-reduction sequence from V and m = |V| the length of V. We show that the two systems are equivalent for terms of each syntactic form. The cases for variable, application and abstraction are essentially straightforward, using Lemma 11.3.20. To give an example, we will prove the theorem for abstractions. By Lemma 11.3.20, we have ⊢ML2 Γ ▷ λx.U: τ → τ' iff ⊢ML2 Γ, x: τ ▷ U: τ', where by Lemma 11.3.19 we assume that x does not occur in Γ. By the inductive hypothesis, ⊢ML2 Γ, x: τ ▷ U: τ' iff ⊢ML1 Γ, x: τ ▷ U: τ'. By inspection of the ML1 rules, we can see that since x does not occur in Γ, ⊢ML1 Γ, x: τ ▷ U: τ' iff ⊢ML1 Γ ▷ λx.U: τ → τ'. Since the primary difference between the two systems is in the let rules, this is the main case of the proof. We prove each direction of the equivalence separately. If ⊢ML1 Γ ▷ let x = U in V: τ, then the derivation concludes with the rule

Γ ▷ U: τ',   Γ ▷ [U/x]V: τ
--------------------------
Γ ▷ let x = U in V: τ


There are two cases to consider. If x does not occur free in V, then by the inductive hypothesis, we know that both hypotheses of this proof rule are ML2-provable. From Γ ▷ V: τ we obtain Γ, x: τ' ▷ V: τ and so we complete the ML2 proof by

Γ ▷ U: τ',   Γ, x: τ' ▷ V: τ
----------------------------
Γ ▷ let x = U in V: τ

using rule (let)2. The more difficult case is when x occurs free in V. Since we have shown that ML1 has principal typings, in Corollary 11.3.11, we may assume by Lemma 11.3.21 that Γ ▷ U: τ' is the Γ-principal typing of U. This means that every ML1-provable typing assertion of the form Γ ▷ U: τ'' has τ'' = Sτ' for some substitution S which only affects type variables t1, ..., tk that are free in τ' but not in Γ. The Γ-principality of Γ ▷ U: τ' carries over to ML2 by the induction hypothesis. By the inductive hypotheses, we also have ⊢ML2 Γ ▷ U: τ' and ⊢ML2 Γ ▷ [U/x]V: τ. By rule (gen), we have ⊢ML2 Γ ▷ U: Πt1...Πtk.τ' and by Lemma 11.3.22 ⊢ML2 Γ, x: (Πt1...Πtk.τ') ▷ V: τ. This allows us to apply rule (let)2, concluding this direction of the proof. For the converse, we assume ⊢ML2 Γ ▷ let x = U in V: τ. By Lemma 11.3.20, we may assume that the proof ends with the proof step

Γ ▷ U: σ,   Γ, x: σ ▷ V: τ
--------------------------    (let)2
Γ ▷ let x = U in V: τ

where σ = Πt1...Πtk.τ' for some sequence t1, ..., tk of type variables and nonpolymorphic type τ'. Without loss of generality, we may assume that t1, ..., tk do not appear free in Γ. By (inst), we have Γ ▷ U: τ', and hence this assertion is ML1-provable by the induction hypothesis. There is no loss of generality in assuming that Γ ▷ U: τ' is a Γ-principal typing for U. This does not affect the provability of Γ, x: σ ▷ V: τ, since we may use (inst) on the variable x to obtain any substitution instance of τ' in the typing for V. This gives us ⊢ML2 Γ ▷ [U/x]V: τ by Lemma 11.3.22 and therefore ⊢ML1 Γ ▷ [U/x]V: τ by the inductive hypothesis. We now use rule (let)1 to finish the proof. ∎

11.3.5 Complexity of ML Type Inference

In this section, we show that the recognition problem for ML typing is exponential-time complete. In other words, any algorithm that decides whether untyped λ, let-terms are ML-typable requires exponential time in the worst case. All of the algorithms given in Section 11.3.3 run in at most exponential time, and are therefore optimal in principle. This stands in contrast to the complexity of Curry type inference, which may be decided in linear time, as shown in Theorem 11.2.17.


In discussing size and running time, we use standard "big-oh" notation as in Section 11.2.5, writing f(n) = g(O(h(n))) if there is a positive real number c > 0 with f(n) ≤ g(c·h(n)) for all sufficiently large n. For lower bounds, we write f(n) = g(Ω(h(n))) if there is a positive real number c > 0 with f(n) ≥ g(c·h(n)) for infinitely many n. A more detailed analysis of any of the ML typing algorithms shows that the running time is exponential in the let-depth of a term, and otherwise linear. (To be precise, the running time on input U is exponential in the let-depth of U, where let-depth, defined in Section 9.2.5, is the depth of nesting of let-declarations.) In the worst case, the let-depth of U may be proportional to the length of U. However, for many practical kinds of programs, the depth of non-trivial nesting appears bounded. There are two main proofs of the exponential-time lower bound for ML typing. One is a direct encoding of exponential-time Turing machines as typing problems. The other uses a variant of unification called semi-unification. The first style of proof was developed and refined in a series of papers by several closely-communicating authors [KM89, Mai90, KMM91, HM94]. The semi-unification proof of [KTU90b] is related to a larger investigation of polymorphic type inference carried out in a separate series of papers [KTU88, KTU89, KTU90a]. One intriguing aspect of the algorithmic lower bound for ML typing is that it does not seem to affect the practical use of the ML type inference algorithm. The lower bound shows that for arbitrarily large n, there exist terms of length n that require a number of steps exponential in n to decide whether they are typable. Like any other algorithmic lower bound, the lower bound for ML type inference relies on the construction of an infinite family of "hard instances" of the typing problem. However, the empirical evidence is that these hard instances do not often arise in practice.
In spite of twenty years of ML programming, the lower bounds were a complete surprise to most ML programmers. An interesting direction for future research would be to characterize the form of most "practical" ML programs in a way that explains the apparent efficiency of the algorithm in practice. We do have some explanation, in the way that the complexity depends on the let-depth of an expression, but there is likely to be more to the story. A few example terms with large types will give us some useful intuition for the difficulties of ML typing. Before proceeding, it may be helpful to recall that in algorithm PTL, type variables in the types of let-bound expressions are renamed at each occurrence. For example, consider a term of the form let x = M in N with M closed. The first step in typing this term is to compute the principal type of M. Then, for each occurrence of x in N, the algorithm uses a copy of this principal type of M with all type variables renamed to be different from those used in every other type. (Something similar but slightly more complicated is done when M is not closed.) Because of variable renaming, the number of

f

f

11.3

Type Inference with Polymorphic Declarations

807

type variables in the principal type of a term may be exponential in the length of the term. The following example illustrating this exponential growth is due to Mitchell Wand and, independently, Peter Buneman.

Example 11.3.23

Consider the expression

let x = M in ⟨x, x⟩

where M is closed with principal type σ. The principal type of this expression is σ′ × σ″, where σ′ and σ″ are copies of σ with type variables renamed differently in each case. Unlike the expression (λx.⟨x, x⟩)M in Example 11.2.18, not only is the type twice as long as σ, but the type has twice as many type variables. For this reason, even the dag representation (see Exercise 11.2.6) for the type of the expression with let is twice as large as the dag representation for the type of M. We can iterate this doubling of type variables using terms of the following form:

let x0 = M in
let x1 = ⟨x0, x0⟩ in
    ...
let xn = ⟨xn-1, xn-1⟩ in xn

It is not hard to check that for each n, the principal type of xn has 2^n type variables. Consequently, the dag representation of this type will have at least 2^n nodes.
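The doubling of type variables can be checked mechanically. The following Python sketch is an illustration outside the book's ML notation; the representation of types as strings and tuples is invented for this purpose. It mimics the per-occurrence renaming performed by algorithm PTL and counts the distinct variables in the principal type of xn.

```python
import itertools

_fresh = itertools.count()

def fresh_copy(ty, renaming=None):
    # Copy a type, renaming every type variable consistently to a fresh one,
    # as algorithm PTL does for each occurrence of a let-bound variable.
    if renaming is None:
        renaming = {}
    if isinstance(ty, str):                  # a type variable
        if ty not in renaming:
            renaming[ty] = "t%d" % next(_fresh)
        return renaming[ty]
    return tuple(fresh_copy(s, renaming) for s in ty)   # a product type

def type_vars(ty):
    if isinstance(ty, str):
        return {ty}
    return set().union(*(type_vars(s) for s in ty))

def principal_type(n):
    # x0 = M with principal type a lone variable; each let pairs two
    # freshly renamed copies of the previous principal type.
    ty = "t"
    for _ in range(n):
        ty = (fresh_copy(ty), fresh_copy(ty))
    return ty

print(len(type_vars(principal_type(5))))   # 32, i.e. 2^5
```

Each of the n pairing steps renames the two copies independently, so the variable count doubles at every let, exactly the growth the example describes.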

Although the dag representation of the principal type for the expression in Example 11.3.23 has exponential size, other types for this expression have smaller dag representations. In particular, consider the instance obtained by applying a substitution which replaces all type variables with a single variable t. Since all subexpressions of the resulting type share the same type variable, this produces a typing with linear size dag representation. In the following example, we construct expressions such that the principal types have doubly-exponential size when written out as strings, and the dag representations of every typing must have at least exponential size.

Example 11.3.24  Recall that the expression P ≡ λx.⟨x, x⟩ from Example 11.2.18 doubles the length of the principal type of its argument. Consequently, the n-fold composition of P (with itself) increases the length of the principal type by a factor of 2^n. Using nested lets, we can define the 2^n-fold composition of P using an expression of length O(n). This gives us an expression whose principal type has doubly-exponential length and exponential dag size. Since the dag must contain an exponential-length path, any substitution instance


of the principal type also has exponential dag size. Such an expression is defined as follows:

let x0 = P in
let x1 = λy. x0(x0 y) in
    ...
let xn = λy. xn-1(xn-1 y) in xn Q

where for simplicity we may let Q be λz.z. To write the principal type of this expression simply, let us use the notation σ[k] for the k-fold product of σ's, defined inductively by σ[1] ≡ σ and σ[2k] ≡ σ[k] × σ[k]. By examining the expression and tracing the behavior of PTL, we see that x0 has principal type t → (t × t) and, for each k > 0, the principal type of xk is t → t[2^(2^k)], which has length more than 2^(2^k). Consequently, the principal type of the entire expression is (t → t)[2^(2^n)], which has length more than 2^(2^n). Since a dag representation can reduce this by at most one exponential (see Exercise 11.2.6), the dag size of the principal type is at least 2^n.
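The gap between string size and dag size in this example can be measured concretely. In the Python sketch below (an illustration only; sharing via object identity stands in for the dag representation of Exercise 11.2.6), the 2^4 = 16 typing steps performed by x4 produce a type whose printed form has 2^17 − 1 symbols but whose shared dag has only 17 nodes.

```python
def apply_P_step(ty):
    # Typing one application of P = \x.<x, x> turns ty into ty * ty;
    # both components are the same object, so the dag shares them.
    return (ty, ty)

def string_size(ty):
    # number of symbols when the type is written out as a string
    if isinstance(ty, str):
        return 1
    return 1 + string_size(ty[0]) + string_size(ty[1])

def dag_size(ty):
    # shared identical subterms are counted once, as in a dag
    seen = set()
    def walk(t):
        if id(t) in seen:
            return 0
        seen.add(id(t))
        if isinstance(t, str):
            return 1
        return 1 + walk(t[0]) + walk(t[1])
    return walk(ty)

ty = 't'
for _ in range(16):          # 2^4 applications of P, as performed by x4
    ty = apply_P_step(ty)
print(string_size(ty), dag_size(ty))   # 131071 17
```

With 2^n applications of P, the string form has about 2^(2^n) symbols while the dag has about 2^n nodes, matching the one-exponential saving the text describes.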

Proposition 11.3.25  For arbitrarily large n, there exist closed expressions of length O(n) whose principal types have length 2^(2^Ω(n)), dag size 2^Ω(n) and 2^Ω(n) distinct type variables. Furthermore, every instance of the principal type must have dag size 2^Ω(n).

Proof  For any n ≥ 1, we may construct an expression as in Example 11.3.23 whose principal type has exponentially many type variables, and an expression as in Example 11.3.24 satisfying the remaining conditions. Pairing the two expressions gives a single expression that proves the proposition.

The exponential-time lower bound for ML typing is proved using an adaptation of Example 11.3.24. By Lemma 11.3.7, let x = M in N has the same types as [M/x]N when x occurs in N. It follows that the principal type of the expression in Example 11.3.24 is the same as the type of the term obtained by let-reduction,

P^(2^n) Q

Therefore, we may compute the type of the expression by working across the term P^(2^n) Q from right to left. This process begins with the principal type of Q and, for each of the occurrences of P, finds a type that makes the application well-formed. Since there are 2^n occurrences of P, we unify the type giving the domain of P a total of 2^n times, even though the term has length O(n). We may see the general pattern a little more clearly if we introduce some notation. Let ρ0 be the principal type of Q and σ → τ the principal type of P. Then we type P^(2^n) Q by computing a sequence of types ρ1, ρ2, ..., with ρi+1 determined from ρi and σ → τ by letting S = Unify(ρi, σ) and ρi+1 = Sτ. We may think of the type σ → τ as an
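The iteration ρi+1 = Sτ can be watched with a toy unifier. The Python sketch below is a minimal illustration with an invented term representation (only variables and pairs are handled, and the occurs check is omitted); it is not the book's algorithm PTL. The "instruction" σ → τ comes from P ≡ λx.⟨x, x⟩, so σ is a variable t and τ is t × t, and each iteration doubles the size of the current type even though the term only gains one occurrence of P.

```python
def walk(t, s):
    # follow variable bindings in substitution s
    while t[0] == 'var' and t[1] in s:
        t = s[t[1]]
    return t

def unify(a, b, s):
    # minimal first-order unification (occurs check omitted for brevity)
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if a[0] == 'var':
        s[a[1]] = b
    elif b[0] == 'var':
        s[b[1]] = a
    elif a[0] == b[0] == 'pair':
        unify(a[1], b[1], s)
        unify(a[2], b[2], s)
    else:
        raise TypeError('clash')
    return s

def apply_subst(t, s):
    t = walk(t, s)
    if t[0] == 'pair':
        return ('pair', apply_subst(t[1], s), apply_subst(t[2], s))
    return t

def size(t):
    return 1 if t[0] == 'var' else 1 + size(t[1]) + size(t[2])

rho = ('var', 'r')                 # principal type of Q, simplified to a variable
for i in range(4):
    t = ('var', 't%d' % i)         # fresh copy of the instruction sigma -> tau
    sigma, tau = t, ('pair', t, t)
    s = unify(rho, sigma, {})
    rho = apply_subst(tau, s)
print(size(rho))                   # 31: the type now has 2^4 = 16 leaves
```

Four iterations already produce a type with 16 leaves; 2^n iterations, driven by a term of length O(n), produce the doubly-exponential types of Example 11.3.24.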


"instruction" for computing Pi+ i from p1 by unification, with this instruction executed times in the typing of In the lower bound proof, we replace the "initial" term Q and the "instruction" P in by terms whose types code parts of a Turing machine computation. More specifically, suppose P0 codes the initial configuration of a Turing machine in some way, and the type a —÷ r has the property that if Sa codes the i-th configuration in some computation, then St codes the (i + 1)-st configuration. Then computed as above will be the configuration of the Turing machine after computation steps. Consequently, we can represent an exponential-time computation by an appropriate adaptation of provided we have some way to code an initial configuration and the one-step function of the Turing machine. Given the results in Section 11 which show that we can represent any unification problem, combined with the idea in [DKM84] that boolean circuit evaluation can be represented by unification, it should not be difficult to imagine that we can encode Turing machine computation in this way. Note that the number of steps we can encode depends on the let-depth of the term. This leads us to the following theorem. Any algorithm that decides whether any given untyped A, let-term is ML typable requires time on input U, where is the length of U and

Theorem 11.3.26

£d(U) is its let-depth. We first look at an outline of the proof, then fill in the details.

Proof Sketch  Let T be any deterministic, single-tape Turing machine and let a be an input string of length n. We assume without loss of generality that T has exactly one accepting state and that once T enters this state, no subsequent computation step leads to a different state. This assumption, which may be justified by standard manipulation of Turing machines [HU79], makes it easy to see whether T accepts its input. There are also some minor details associated with the way we represent an infinite tape by a type expression. This leads us to modify the standard Turing machine model slightly. Specifically, we represent the tape by a list of symbols to the left of the input head and a list of symbols to the right of the input head. The list of symbols to the left of a Turing machine tape head is always finite. On the right, there will always be a finite list of symbols after which all symbols will be blank. Therefore, we can also represent the infinite tape to the right of the head by a finite list. The only complication comes when executing a move to the right, onto a blank tape square that is not in our finite list. To handle this case, we assume that the Turing machine is started with a special end marker symbol, $, at the end of the input. We also assume that the Turing machine moves the end marker to the right as needed, so that it always marks the right-most tape position scanned in the computation. To make this possible, we augment the standard "left" and "right" moves of the Turing


machine with a special "insert" move that simultaneously writes a symbol on the tape and moves all symbols currently to the right of the tape head one additional tape square to the right. This allows the machine to insert as many blanks (or other symbols) on the tape as needed, without running over the end marker. An essentially equivalent approach would be to work with 2-stack machines instead of Turing machines. However, since Turing machines are more familiar, we will stick to Turing machines. or fewer steps. We will construct a term that is ML-typable if T accepts input a in A configuration of T will be represented by a triple (q, L, R), where q is the state of the machine, L is the list of symbols appearing to the left of the tape head, and R is the list of symbols appearing to the right of the tape head. Let mit be an untyped lambda term whose principal type represents the initial configuration of T on input a. (This configuration is E, a$), where E is the empty string, qi right of the tape head, terminated by starting state of and the input a sits to the is the T the end marker $.) Let Next be an untyped lambda term whose principal type represents the transition function of T. Specifically, the principal type of Next must have the form a —k r, with the property that if Sa represents a configuration of T, for some substitution 5, then Sr represents the configuration of T after one move of the Turing machine. We choose two distinct types representing boolean values "true" and "false" and assume there are untyped lambda terms with these principal types. Let Accept be an untyped lambda term whose principal type has the form a —k f3 with represents "true," the property that if Sa represents an accepting configuration, then while if Sa represents a non-accepting configuration, Sf3 represents false. These lambda terms are constructed so that if T accepts input a in or fewer steps, then the type of Run represents "true," and "false" otherwise. 
Choosing "true" and "false" appropriately, we may apply Run to arguments so that the resulting term is typable only if the type represents "true." This allows us to decide whether T accepts input a by deciding whether a term involving Run has an ML type. Since we can write Run as a term of length linear in a and the description of T, using the pattern of from Example 11.3.24, it requires exponential time to decide whether a given term is ML-typable. In the remainder of the section, we describe the construction of untyped lambda terms mit, Next and Accept for a Turing machine T satisfying the assumptions given in the proof sketch for Theorem 11.3.26. An important insight in [Hen9Oa, HM94] is that these terms can be motivated by considering a particular coding of Turing machines by lambda terms, then showing that the computation is also coded by the types of these terms. A consequence of this approach is that the lower bound may be extended to impredicative systems that only give types to terms with normal forms.


We begin with terms for booleans,

tt ≡ λx.λy.x : t → s → t
ff ≡ λx.λy.y : t → s → s

and consider the types as well as the terms as representing "true" and "false." This coding for booleans is a special case of a general idea. Specifically, if we wish to code elements of some finite set of size k, we can use the curried projection function

π_i^k ≡ λx1. ... λxk. xi : t1 → t2 → ... → tk → ti

or its type to represent the ith element of the set. We therefore represent the states q1, ..., qn ∈ Q of T by π_1^n, ..., π_n^n, the tape symbols c1, ..., cm ∈ C by π_1^m, ..., π_m^m, and the tape head directions left, right and insert by π_1^3, π_2^3 and π_3^3. (The end marker $ ∈ C is one of the tape symbols.) An insert move is our special move that allows the machine to insert a tape symbol in front of the end marker in a single move. We may represent a tuple ⟨V1, ..., Vk⟩ by the function λx. x V1 ... Vk and a list [V1, ..., Vk] by the nested sequence of pairs

[V1, ..., Vk] ≡ ⟨V1, ⟨V2, ... ⟨Vk, nil⟩ ...⟩⟩    where nil ≡ λz.z.
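Since these codings are ordinary higher-order functions, they can be tried out directly. The Python transliteration below exercises only the untyped behavior (Python is untyped, so none of the ML typing subtleties appear); it shows the pairing, list and projection-based selection at work.

```python
# curried projections: tt and ff are pi_1_2 and pi_2_2
tt = lambda x: lambda y: x
ff = lambda x: lambda y: y

# tuples as functions awaiting a selector; lists as nested pairs
pair = lambda x: lambda y: (lambda w: w(x)(y))
nil = lambda z: z
cons = pair

# match (x, y) = U in V  is coded as  U (lambda x: lambda y: V)
fst = lambda p: p(lambda x: lambda y: x)
snd = lambda p: p(lambda x: lambda y: y)

lst = cons(1)(cons(2)(nil))      # the list [1, 2]
print(tt("true")("false"))       # true
print(fst(lst), fst(snd(lst)))   # 1 2
```

Applying a tuple to a projection selects a component, which is exactly the mechanism match will package up below.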

Exercise 11.3.29 illustrates some typing difficulties associated with projection functions. We circumvent these problems using the following syntactic form

match (x, y) = U in V ≡ U(λx.λy.V)

and similarly for match (x, y, z) = U in V. For untyped terms, match (x, y) = U in V is equivalent to the pattern-matching form let ⟨x, y⟩ = U in V as in Section 2.2.6. However, the typing properties are different, as explained in Exercise 11.3.29. Using these general coding ideas, we can represent the initial Turing machine configuration of T on input a by a triple

Init ≡ ⟨q1, nil, [a1, ..., an, $]⟩

where we assume q1 is the start state of T, nil represents the empty list of symbols to the left of the tape and [a1, ..., an, $] represents the list of input symbols to the right of the


tape, followed by the end marker symbol $; i.e., if the ith symbol of input a is the jth tape symbol, then ai ≡ π_j^m. The transition function of the Turing machine is represented by

Next ≡ λconfig. match (q, Left, Right) = config in
               match (l, L) = Left in
               match (r, R) = Right in
               match (q', c', d') = delta q r in
               d' MoveL MoveR Insert

where

MoveL ≡ ⟨q', L, ⟨l, ⟨c', R⟩⟩⟩
MoveR ≡ ⟨q', ⟨c', ⟨l, L⟩⟩, R⟩
Insert ≡ ⟨q', ⟨l, L⟩, ⟨c', ⟨r, R⟩⟩⟩

where configurations MoveL, MoveR and Insert correspond to moving the tape head left, right or inserting a new tape symbol, and delta is a function returning the next state, tape symbol, and direction for the tape head move. In words, Next assumes its argument is a triple ⟨q, ⟨l, L⟩, ⟨r, R⟩⟩ representing a configuration of T. The function delta is applied to the state q and symbol r under the tape head, returning a tuple ⟨q', c', d'⟩ containing the next state, tape symbol to be written and direction for the tape head to move. Since the next direction d' will be either π_1^3, π_2^3 or π_3^3, we can apply d' to terms that represent the next configuration if the machine moves left, the next configuration if the machine moves right, and the next configuration if the machine inserts a symbol. The function delta depends on the transition function of T. It is straightforward, although potentially tedious, to write this function out completely. If there are three states and two tape symbols, for example, then the state will be represented by a projection function on triples and the tape symbol by a projection function on pairs. In this case, delta could be written in the form

delta ≡ λq.λr. q ⟨t11, t12⟩ ⟨t21, t22⟩ ⟨t31, t32⟩ r

where each tij ≡ ⟨q, c, d⟩ represents the move from state qi and tape symbol cj.

It is not hard to verify that Next works properly. What is slightly more surprising is that we can read the computation of T from the types of Init and Next. More specifically,


consider the types of tt and ff as representing "true" and "false," and similarly for the types of projection functions. Abusing notation slightly, let us write ⟨σ1, ..., σk⟩ for the principal type of ⟨M1, ..., Mk⟩, assuming each Mi has principal type σi, and similarly

[σ1, ..., σk] ≡ ⟨σ1, ⟨σ2, ... ⟨σk, t → t⟩ ...⟩⟩

for the principal type of [M1, ..., Mk], assuming t does not occur in σ1, ..., σk.

Then the type of Init also codes the initial configuration of T and the type of Next codes the transition function of T. This gives us the following lemma.

Lemma 11.3.27  Let C : σ be a term and its principal type, both representing a configuration of Turing machine T, and let C', σ' similarly represent the configuration after one computation step of T. Then (Next C) →→ C' and Next C : σ'.

This essentially completes the proof of Theorem 11.3.26. If Next^k Init gives us the machine configuration after k steps, it is easy to use match or application to projection functions to obtain the machine state after k steps. This lets us write the term Accept as required in the proof sketch. Exercise 11.3.28 asks you to find terms so that Run is typable if the Turing machine accepts. An interesting way to "check" the proof is to convert the lambda terms Init and Next into ML syntax and use the ML type checker. A simple, illustrative example appears in Exercise 11.3.30.

Exercise 11.3.28  Suppose untyped λ, let-term Run has type s → t → s if a Turing machine T accepts and s → t → t otherwise. Write a term using Run that is ML-typable if the Turing machine accepts.

Exercise 11.3.29  We may define untyped projection functions by

fst ≡ λp. p(λx.λy.x)        snd ≡ λp. p(λx.λy.y)

It is easy to check that for the coding of pairs by ⟨U, V⟩ ≡ λz.zUV, we have fst⟨U, V⟩ = U and snd⟨U, V⟩ = V by untyped reduction. However, there is a typing problem with these projection functions in ML. This may be illustrated by considering an alternate definition

match (x, y) = U in V ≡ (λx.λy.V)(fst U)(snd U)

(a) Show that for both versions of match, we have match (x, y) = ⟨U1, U2⟩ in V = (λx.λy.V) U1 U2 in untyped lambda calculus.


(b) Show that the type of (λz. match (x, y) = z in ⟨x, y⟩) ⟨tt, ff⟩ depends on which definition of match we use. Why is the first definition given in this section better than the alternative given in this exercise?

Exercise 11.3.30  The following Standard ML code gives the representation of a simple Turing machine with three states, q1, q2, q3 ∈ Q and two tape symbols, c1, c2 ∈ C. For simplicity, the special "insert" move is omitted. The machine moves right until the second occurrence of c2, then moves back and forth indefinitely over the second occurrence of c2. Type this program in and "run" the simulation for enough steps to verify that the machine enters state q3 when passing over the second c2. Then modify the machine so that instead of leaving the input tape alone, it changes all of the symbols to c1, up to and including the second c2, before looping back and forth over the square containing the second c2. Test your encoding by computing enough steps to verify that you have made the modification correctly.

fun pi_1_2(x_1)(x_2) = x_1;
val tt = pi_1_2;
fun pi_2_2(x_1)(x_2) = x_2;
val ff = pi_2_2;
fun pi_1_3(x_1)(x_2)(x_3) = x_1;
fun pi_2_3(x_1)(x_2)(x_3) = x_2;
fun pi_3_3(x_1)(x_2)(x_3) = x_3;
val q1 = pi_1_3;
val q2 = pi_2_3;
val q3 = pi_3_3;
val c1 = pi_1_2;
val c2 = pi_2_2;
val left = pi_1_2;
val right = pi_2_2;
fun pair(x)(y) = fn w => w x y;
fun triple(x)(y)(z) = fn w => w x y z;
fun nil'(x) = x;
val cons = pair;
val t1 = triple q1 c1 right;  (* move from state q1 on symbol c1 *)
val t2 = triple q2 c2 right;  (* move from state q1 on symbol c2 *)
val t3 = triple q2 c1 right;  (* move from state q2 on symbol c1 *)
val t4 = triple q3 c2 left;   (* move from state q2 on symbol c2 *)
val t5 = triple q2 c1 right;  (* move from state q3 on symbol c1 *)
val t6 = triple q2 c1 left;   (* move from state q3 on symbol c2 *)
fun delta(q)(r) = q (pair t1 t2) (pair t3 t4) (pair t5 t6) r;
val a = cons c1 (cons c2 (cons c1 (cons c2 (cons c1 nil'))));
val init = triple q1 (cons c1 nil') a;  (* one padding symbol keeps the left list nonempty *)
fun next(config) =
    config (fn q => fn Left => fn Right =>
      Left (fn l => fn L =>
        Right (fn r => fn R =>
          (delta q r) (fn q' => fn c' => fn d' =>
            d' (triple q' L (cons l (cons c' R)))
               (triple q' (cons c' (cons l L)) R)))));

" " " $ "

" &''(&)* &+ #+*

, # "-

#"

4*

5 "" 4"

&/&012(3* &++*

! "

#

%

" .""

6 " !

#" "

&3/301)')()37 &+* 4+&

4

,# "8 "

4+3

# 4

! 8 # "1

4+3#

# 4

4+2

4=

+'

" &*3(&& &++&

%

" 9 """

" &('3 ,9: &++3

%

! 8 # "1 "9 """ %

" '+()'* ,9: 64, 7+ &++3

# 4 ; "# % !"#

4 A" + / )- =9C &+'

+*

+'

"

! " # $

7/&J'01&+('3* &++* A 4 " %

+3

=5 " ! "# "8 " 9 9

23

" )&()'7

A &++'

+*

A 4

+&

" " ! " 9

# 9

2/&017(&)3 &++* + +

* C

"9E" /&++*0 '&)('7) !"# ) 4" F 4 4 "

&++&

+3

" "

)

# 9

1 >" "

+

3/'01&'7('* &++3

A " A : - %

# " " ! " "1

&'1)()7 &+) : B9 4# # 9 " % " '(+* &+ F B "

4,+& : B9 4$ 4 , 8 % " .

+)/&01&7'(''& &++&

+

: B9 A C B K %

+3

= F "8" # " 9

24 5 0 *

"

&)7(&2+ &++ , 64, )7 E+*

4 E

4

4" # =9C

4 3

4

"

= C % &++* % )-

&+

" " ! % ) * # 6+ 7 " 2&( A &+3 , 64, &7) 4 # % ! +

" '&(37 , 64, '3'

4

&+ 4 +

@ 4

" " ! "8 " # $

%

" &3(&7 &++ , 64, )7' 4>+&

= 9 4 > 4" K !

#

%

# " F "8

2338

" '+&()*' &++&

, 64, 2&* 4>4:*

4 >B 94 : = " " %

""

# "

(9 , ! * ! +

" 2)2(2* ="" &+* 4>4L7

4 >B 94 L " !" "

4>

4>A

+

4

A > "" A "F 6 " 9)

,49)& >54 """ " 4 &+

4

A > A "F 6 " 9) "" %

1

" '*'('&' &++ ! 69C " &+2

4@2

C 4 @"

4+'

= 9 4

4 ! "#" 9-

4E+

4C

#

7'12(&& &+7

'122(+& &++'

+3

4$ 4 E"- > 9 " ! "

&/'01&')(&7 &++

4$ C

" ! ""

#

7/'J)01+2(&'* &+ 4C+3

4C+3#

4$ C C#

9 "

+

3/&0177( &++3 = 9 4 C

+ 3/&01&&)(&&2 &++3

= 1

# " """

G . !

M

4)'

4

" ! " " ! ! !

))1)3() &+)'

, F " : )) " )+(3 ! " 4)

4 " 8 # # ! #

21)32())

&+) 43* 43&

4 ! ! " ! " 4

! -

! 212( &+3*

= D8 ="" &+3& &+)

# D8" " % # % 4

,+3

47

4+ 4$

4=+'

4

, A

, 8 ." ! ," @ F "#

&*+13(2 &++3 = 8" * , 64, 2' ,# &++& " 72*(77* , 4- , "" "" ! . "" ! 8 71&'+(&37 &+7 E 4- ) => "" F D8" &++ 4$ "" ! H" . % " ''7(') A &+ 4 ="

6F ! " ! . "1

@%N9 "

+/'01&7&('&* &++' 4 A "" , " ! 8" )+137'( @%N9

4)

3' &+) 4

= 9 4

! * 0 *

=9

A E ," 6F G- &+ " 72

4

= E

; " " #" "

- &7/30137&(2'' &+2

4 L !

" '&('' A &+

" %

(9 , !

* ! + " 27+(*7 ="" &+* 5 E >-" " ! 8 ! " 6 > "8 ! %

&/0132)(327 &+72 >7 >A+*

> 3

> '

>7+ 5 2 5 7' 5"7&

5 E >-"

)

=9C

&+7

6 >"FB A 9= A F """ % A 8 F

* . 9 " '3)()'* 69C 4 >F- =

-" A 4

&1)2(2* &+3

; "$

( !'

" &++*

!

> " = "" ! ! " % 3

" '*7('&' &+' A > ; " " ! 123(2* &+7+ C 5 + ! / 2 ,9: &+2 C 5 ="" &+7' G 5"8 4 # " ! "" ! ' &*/)01'37()* &+7&

5"7' 5"7 5,+* @!7+

! ' &&/301)7(3)7 &+7' G 5"8 C O8 " ! ' &2/013'(23 &+7

"9E" &++*

5 " ," , @! 4"8 " ! !" """ % 0 5:; " G 5"8 4 # !" ! "

&2+(''3 " &+7+ 69C @,,

= @ A 9G , 8 = A , # "

%

,

" '73('7+ A

&+ @C +3

@+

@" @ C"

A 4 # " ! # " " B

% 6! 97 &1)()7 &++3 = 8" * &++) '()

4 @ * + 69C " &++

@ 7

@;)

E @ "" " % A ,F B

" &+()' , &+7

, @ > 8 ;H> .""8"" ! " " ""

)*/&01&2&(&2 &+)

@

= @ = =5 &+

@72

C @

5$ #F ! "

% = -

0

"

''()7 ,9: &+72 @ ,+'

@,7

@,+* #& 2 *

5." =5H" # +/'01'&&(''7 &++' = 8" * %555 &++* )3()23 = @ , 8 , " "" ! # " % " )&2()&+ A &+7 = @ , 8 * # 69C &++* > ## - ( 5 > &+& ; ; . ! ." ( ! '&1 #"

'=;8,

"

E" &+) I ,+*

E7

I B

> - !

4 > , ,

: 6" &+ , "

% A 8 F

* . 9 " ))(73 69C

( !'

" &++*

A A E 5 E # "

8 3 =9C &+7 4

, 0 % ="" "" ! #" " % G

+'

4 # &++' 77

A :

#" " 8 ! ""

'*/01)+(3*3 &+77 C

+'

= C - C "-

$ % % '7/201,

&++' C * C 3

C 3

C2*

= C "

%-

> : 6" &+*

( !'

* , 1 " 3+7(*3 " &+3 4 C " ! " % ) * # 6+ 7 " &27(&7 &+3 C- 4 "" ! " ! &2/'01&(+& A > C > % > ## @ "

&+2* C)

C- "

C+*

@ C

C+*#

C@+'

9

! "

F # ! !

+ 2'1)')()33 &+)

!1 J " # "

DD94,9+*9&3 D8 ! D &++*

, E &++* = C - A @ " C "- $ % % '7/201,

C""

&++' CC=7

C @ C"

= - ! F- !

" &+3('*3 A &+7

" %

C+

A C

9" ! # #

&31'+(* &++ C)

A C "" !

# "

''1&(&7

&+) C*

C +*

C +)

C +3

A C # " " ."

'1'+()&* &+*

CF A 4 ; . " " ! =4@ % # +

" '+()* &++* C A 4 ; " ! ,

&2/'01'&&('2' &++) 5 8" " " Q 5"" ! R 2@

* &+ '(3 @ C C " . ! ! ! 9

C +2

C +2#

C 7

? ' $ #

+

3/3013)2(37 &++3

#

C " 4 " " " "" % 4 ; 5$ " F "1 "8 % -

, - ! " ,% %

C + C 7' C +2

% ="" &+

="" &+*

+

8" 8" "

4,9&&& &+*

&'/&*0127(2* &++ 4 C =! ! "" ! " " &1'7&('& &+7'

C - ! => ""

4 C . # "" !

, ! D8" &++2 C3 CF7)

CF*

CF7

CF+'

+

4 , ="" &+3 E CF C B # ! " % - " 323(3& , 6 )33 &+7) E CF ! "9 "9" ! " % (9 , ! * !# + " 37+(3+* ="" &+* > A CF # 8 ! H" . % " '*2('&3 A &+7 CF +1 1

=> 5 CFB

"" , ! D8" &++' C,

A C A = ,

CD7+

! !

, , ." & 4 # D8" ="" &+

A 5 C! A > D

*

"9E" &+7+ C*

C 4K "1 #" " " F """

'7/3017+7('& &+*

C 7

A 5 C

" B ! $ " " !

#

'/&'01)&()7* &+7 C O8 " % 9

" C '

A 5

&2('& 69C C

A 5 C

"

" &+'

"

3* &+

8 ! 4H" ""1 @! G " L""/60 A &+ A 72

A ! ! F " ! F 8 ! # %

! " '&)('&+ ,

4 I 64, )7 &+72

A77

= A"

)

, @

="" &+77

" "

21+7(&'& &+) 7*

> 5 = . , F # " 8"

! ! !

# % A

" ')('+7 = ="" 5 "! 6 G

&+7* A)2

, 4 "" A "" ! !

"

)1)*(

) &+)2 ) 32

)' '1)3*()2) &+) ; ! " # ! &*1&*+(

, 4 # # "8"" , 4 &'3 &+32

7&

*

, 4 B # 1 "8 "8 % !

# " +2(&&' , 6 ))7 &+7& A E ! => "" D8" ! D &+* =# " "

4 &'+

9 )'1&3)(&' &+7

7

A E F1

+

= 4

-" A 4

= 4

-" C " A 4 D " %

+&

' 77

=

%

" &*2(&&2 &++

24

* ( ! " 333(37 % ="" &++& 4 = A " " ! # " 2'/)01)*()') &+' - 5 " >" % ( !' " ')()& 69C

7

" &+77

E >

=9C

,!F ,"

&+7 2+

"

% ! "" # " ! "8 ! " ! "

C

- " &*&(&' 69C

%

"

&+2+ 2

, -

+

,

"" ! "

%

+ -

" +'(&)* /= 4 $ ;.! &+)0 69C "

&+2 +*

D

> B A " ! " % A 8 F

* . 9 " 7+(3* 69C

( !'

" &++*

A ! A = DBB ." ! F O8 ""9 %

2@

" 2(+ &+

D+

! A = DBB B #

%

4

"$"

" " !

A

&++ D+*

! A = DBB %

# ! "9 #

.9 %

&*&S &2(&2 &+2 2

7*

= A . 7** C

%" "

! )*1') &+2 C

"

+1&27(& &+

" # 9 "

#" ! B # ! F " " "

%

,

9B %C

" ''7(')3 &+7* F+

"+'

" " " " % * (#

" &3)(&32 , 6 +' &++

@ E F8 >

= "

! F """ # "

8 "

. 6 7

# )' 6" 4 , " '&() ,9

: ,# &++' P 872

A 9A P 8

# !

4 I

% C 9

D !

*

9

9

9 "

#

9 "

%

! " &37(&2 ,

64, )7 &+72

"-8 A

! / )-

% =""

&+ 77

+'

,+)

"" "Æ ! ." !

+ " &( &+77

" %

2;

= > A 4 "" ! ! F "#" %

23

" '+)()*3 A &++'

" , , 88 !

# " %

"

(&3 &++) I #72

C I # 5#

"9

! " ! "

729 " %" " &+72 ,

A #- = A ,

(# 4 # D8"

="" 4 # D &+ ,,77

"-8 , -" 4 , O #" ""

'*123(27 &+77

L73

"-8 , L

" = F #" "

% +12*(2+

&+73

7&

2

,

&' 8 2 ! $ 1 #

,9: &+7&

> T

" ! ,

'/'0 &+2

8" = &+3 4 , " @

+*

7*

> T

)2 "

=

D" " ."" "

" '77(' &+ C " >

%

# " ! " .

2D %

2:

" )'(3*& A &++* 8

" "8 !"

: 6" 4 = &+3

24

" ''('3& A &++ E " ! # " < 2'/&017(&'' A "

" ! "

%

&+'

!"#

+'

% )* " " D# " &++'

+'#

C

A 4

B,

=9C &+

=9C &++'

C "" ! %

" '(3 A &+

2@

77

@

#" " !

#

3/&01&('' &+77

7

2

! "

,

&71)3()72 &+7

'/'0 &+2 ' "

8" = &+3 4 , " @

2#

,

'/'0 &+2 ' "

8" = &+3 4 , " @

#

=

=

#

A 4

#" B # ! " 8 / #" 0

A 4

A 4

9! " " " !

! 2&/)012&(2' &+

" #" %

" ')('7 A &+ .""" % &+

2D

+

" )*()&+ "

+ +

*

F 8""

C

"9E" /&++*0 &+2('&'

A 4

+ +

*

= !

7/'J)01'&&('3+ &+

C

"9E" /&++*0 &2)(&+3 ; $8 ! " " % : !"B / , (

+&

A 4

+&#

A 4

" )*2())* ="" &++& ! F " "#"

+

&/)01'32('

&++&

2

A 4

, 9

"

%

"

''2(') A &+2 ,9: 64, &+)

+&

,7

5 -9" " ! # " 2&1++(&'3 &++& = 8" * &+7 " )*)()&3

A 4

A 4

5 , 5 "

#

2E

" '2)('' A &+7 F 8"" + + #

* C

"9E" /&++*0 '7)('3 A 4 A ;H> B # " " ! 9 " % ! F " )'()' ! &+ A > - 8 )7 ! $ 1 ,9: " %

;

7

&+7

7)

A C " " "" % &

" &'*(&'3 &+7)

/ =(C &++* ! 9C

+*

4

"73

G 6 "8 -"

"

&+73

=

> = - #" " 8 ." " #

&*/)0137*(2*' &+ = 8" 2 "" TH"

D8" " 4 &++* ; '

@ A ; "

#

=> ""

, " D8" , " 6 G &+' ; 2

@ A ; "

# " ! " # - "

" 23)(27) 4 # D8 ="" &+2

%

!

;+'

= E ;HC >

, " !

8 # "

% = @ =

8 &77 ! % " '&7(') 4 # D8" A" =" "

="" 4 # 5 &++' ;+)

= 7'

= 7

= E ;HC >

8 # "

" &7&(&3 &++)

%

= " ; # " " """ "

2/&'01&*2)(&*2 ># &+7' 4 = " , - ! + 4 # D8 ="" &+7

=7

, = A" C

=+*

+

=(

&+7

E =

) G!

=> "" 4 # &++*

8 # "

D8" ! 5 # > ! 4 , 4,9'9+& 54,9@4,9+&9&7& =+*#

=+& =7

=+

=++

= 7)

" )()77 &++* = 9 % ="" &++& =" = " " "9 "8 %

" &'()+ &+7 , 64, ') =" 698 F " H # "#" ! " % " (&) &++ =" 4 % ( !' * . . # ;.! D8" ="" &++< E = 5O8 " " " %

> = - # 9 # D8" ! 5 # ,

= 72

= 77

!

"

/ 0 ,%9 93

% &+7)

> = -

4

9#9 9#98

> = -

4@ " "

# "

&1&'2(&2+ &+72

21'')('22 &+77 = *

= &

> = - # # ! % (9 , ! * ! + " ))()7) ="" &+* > = - " " "

>% % @69&+

" D8" 4 , > &+& = '

> = - 6" "" ! !

" D# " "

&+' = 2

> = - >

" " F

!" " 4 , % ,

, , ! &+2

% )

= 2

> = FB

=E7

, = " 6 E

73

A 4 " ; #F " " %

73#

$" E-"

,- &+2

&1&2(&7 &+7

0 *

" &3&(&2 &+73 ,9: 64, A 4 " F " ! " % 0

" 3*(3'2 &+73 ,9: 64, &+

&

A 4 " "" ! % -- 8 : "

" )32()7' " &+& %@%= 69C )

3

+3 #2

A 4 " " #" " %

5;D " 2&)(2') 69C

" &+)

A 4 " = " " "9 % ) * # 6+ 7 " &32(&2 &+3 , 64, &7) A - , H" &9"

F9C

&+7

"9 ! " "" "

, )

&+

> , , ( 8 4D4C %,E% ;ECG " &++

2/)012''(27 &+7 > , , " ! # " % (9 , !

* ! + " 3*)(32* ="" &+* > , > " "

"

> , , " ! " " 4 ,

+21))(3 &+3

;.! &+*

" " "

, 4 " " ! 9

2'1++(++ &+7 ,

, ,#

" # "$

!" 8

!

# "

!

"9E" &++

"

% = @

* ) 2332 8 &77 ! % " '2('+ = A" =" "

4 # D8" ="" &++' ,6 +

, - E 6 A A " ; 9" $

0 ! * . ="" &+3 A 5 , ) , #

% ="" &+77 , + !

= A , ,

E ," 6F G- &+ ,+&

,

% # !

" !

7+1)27()2 &++& ,+&#

C , K ! " " ( %,= % : !"B

/ , ( " 3*+(3' ="" &++& 4 , F " ! " " % , + )

" &+(''* 69C &+

,

,7

4 ,

@

, , , ,E7

"

"

6" %

4 = 4 " &+7

"9E" &+ ,8 = E " ) !"#

,"

%

="" &+7 7

E E %" '&' &+7

! ! " !

! )'1&+(

7

=

- ) * 1 => "" 9

" > 4 # D8" &+7 &

+& !

>

=9C

% &+&

> @9 " " ! " " % " '*(''3 &+ , 64, '3* >

=9C % &++&

! => "" 5 # D8" &+ 8 # " 5 # D8" # ! @ " ! 4 ,

!+*

54,(@4,((23

! ! ! !"

+/&01&()3

&++* 7

7)

G ; 49"" ! " ! F """

)31&'(&3) &+7 , " -

,

6 )33 &+7) )7 2

! '1&2)(&) &+)7 > 1 9" ! F " % + 5 +

* % " &(& &+2

4 #

# #

, 64, '*& "

A "B-FB 4 . ! !

# " "H" ""

D8" ! E " F &+ D

+3

D, *

A > D

= C &++3

D, > ! >!"

=; **9***9

**)239 &+* 8>*

> 8 >

=> ""

D8 5 8

&+* E 7

4 E "F #F " ! ,H"

2/)013(2'& &+7 " ! !V % +

" )37()2+ &++

E @ # " " ."" &+1'7(33

" E +

E 7+

= E

&+7+ E 7

E " ! ! !

+ &*1&&2(

&'' &+7 E +3

A E

" # - " 9

E+) E+*

E,C77

G7

# 9 " $8

" &7(&2 &++3 E"-

% ="" &++)

E" # " % A 8 F ( !' # * . 9 " 72(7 69C " &++* A E " E , 4 C #" "" " = 1 7/012(+ &+77 GB F - " !"#

% =""

# %

&+7

%#$ *

%+

+

,

Æ

!"# $ #%#% "#& $ " " #% % " " " '# # "#& $

%#


If a set A has n > 0 elements, then its powerset P(A) has 2^n elements (see Exercise 1.6.1). Returning to cartesian products, let A and B be sets with a ∈ A and b ∈ B. The singleton {a} and the set {a, b} with two elements are both in the powerset P(A ∪ B). Therefore, the ordered pair ⟨a, b⟩ ≡ {{a}, {a, b}} is in P(P(A ∪ B)), the powerset of the powerset. This finally gives us a way of defining the cartesian product using only set-theoretic operations that do not lead to paradoxes:

A × B = { c ∈ P(P(A ∪ B)) | c ≡ {{a}, {a, b}} for some a ∈ A and b ∈ B }

In addition to ordered pairs, it is also possible to define ordered triples, or k-tuples for any positive integer k. If A1, ..., Ak are sets, then the cartesian product A1 × · · · × Ak is the collection of all k-tuples ⟨a1, ..., ak⟩ with ai ∈ Ai for 1 ≤ i ≤ k.

Table 2.5 Eager (call-by-value) PCF reduction. Values are the numerals, true, false, lambda abstractions, and pairs ⟨V1, V2⟩ of values.

Reduction Axioms

(λx: σ. M) V →_eager [V/x]M,    V a value
Proj1 ⟨V1, V2⟩ →_eager V1,  Proj2 ⟨V1, V2⟩ →_eager V2,    V1, V2 values
n + m →_eager p,    p the sum of numerals n and m
Eq? n n →_eager true,    n a numeral
Eq? n m →_eager false,    n, m distinct numerals
if true then M else N →_eager M
if false then M else N →_eager N
fix M →_eager M (delay[fix M]),    where delay[N] ≡ λx: σ. N x

Subterm Rules

If M →_eager M', then:
  M + N →_eager M' + N        n + M →_eager n + M',  n a numeral
  Eq? M N →_eager Eq? M' N    Eq? n M →_eager Eq? n M',  n a numeral
  if M then N else P →_eager if M' then N else P
  ⟨M, N⟩ →_eager ⟨M', N⟩      Proj_i M →_eager Proj_i M'
  M N →_eager M' N

If N →_eager N', then:
  ⟨V, N⟩ →_eager ⟨V, N'⟩,    V a value
  V N →_eager V N',    V a value

eager or call-by-value reduction is defined in Table 2.5. Like lazy reduction, eager reduction is only intended to produce a numeral or boolean constant from a full (closed) program; it is not intended to produce normal forms or fully reduce open terms that may have free variables. This can be seen in the rules for addition: there is no eager reduction from a term of the form x + M, even though M could be further reduced. A significant restriction on eager PCF is that we only have a fixed-point operator fix_σ for function types σ = σ1 → σ2. The intuitive reason is that since a recursively-defined natural number, boolean or pair would be fully evaluated before any function could be applied to it, any such expression would cause the program containing it to diverge. In more detail, reduction of closed terms only halts on lambda abstractions, pairs of values and constants.

The Language PCF

94

While lambda abstractions and pairs of values could contain occurrences of fix, analysis of a pair ⟨V1, V2⟩ with V1 and V2 values shows that any occurrence of fix must occur inside a lambda abstraction. Therefore, the only values that could involve recursion are functions. If we changed the system so that a pair ⟨M, N⟩ were a value, for any terms M and N, then it would make sense to have a fixed-point operator fix_σ for each product type σ = σ1 × σ2. It is easy to verify, by examination of Table 2.5, that for any M, there is at most one N with M →_eager N. Since no values are reduced, →_eager is a partial function on terms whose domain contains only non-values. The use of delay in fix reduction requires some explanation. Since eager reduction does not reduce under a lambda abstraction, a term of the form delay[M] ≡ λx: σ. M x will not be reduced. This explains why we call the mapping M ↦ λx: σ. M x "delay." The reason that delay is used in fix reduction is to halt reduction of a recursive function definition until an argument is supplied. An example is given in Exercise 2.4.24; see also Exercise 2.4.25.

Example 2.4.20

Some of the characteristics of eager reduction are illustrated by reducing the term

(fix (λx: nat → nat. λy: nat. y)) ((λz: nat. z + 1) 2)

to a value. This is only a trivial fixed point, but the term does give us a chance to see the order of evaluation. The first step in determining which reduction to apply is to check the entire term. This has the form of an application MN, but neither the function M nor the argument N is a value. According to the rule at the bottom left of Table 2.5, we must eager-reduce the function M ≡ fix (λx: nat → nat. λy: nat. y). We apply the reduction axiom for fix; since delay[...] is a value, a following β-reduction gives us a function value (lambda abstraction) as follows:

fix (λx: nat → nat. λy: nat. y)
  →_eager (λx: nat → nat. λy: nat. y) (delay[fix (λx: nat → nat. λy: nat. y)])
  →_eager λy: nat. y

Note that without delay[...] the eager strategy would have continued fix reduction indefinitely. We have now reduced the original term MN to a term VN with function V ≡ λy: nat. y a value but argument N ≡ (λz: nat. z + 1) 2 not a value. According to the rule at the bottom right of Table 2.5, we must eager-reduce the function argument. Since λz: nat. z + 1 and 2 are both values, we can apply β-reduction, followed by reduction of the sum of two numerals:

2.4

PCF Reduction and Symbolic Interpreters

95

This now gives us the application (λy: nat. y) 3 of one value to another, which can be reduced to 3.

Example 2.4.21

Divergence of eager evaluation can be seen in the term

let f(x: nat): nat = 3 in letrec g(x: nat): nat = g(x + 1) in f(g 5)

from Example 2.4.6. Exercise 2.4.22 asks you to reduce this term. Since we can easily prove that this term is equal to 3, this term shows that the equational proof system of PCF is not sound for proving equivalence under eager evaluation. It is possible to develop an alternative proof system for eager equivalence, restricting β-conversion to the case where the function argument is a value, for example. However, due to restrictions on replacing subexpressions, the resulting system is more complicated than the equational system for PCF.

The deterministic call-by-value evaluator eval_V is defined from →_eager in the usual way. Since the partial function →_eager selects a reduction step only if the term is not a value, eval_V may also be defined as follows.

eval_V(M) = M    if M is a value
eval_V(M) = N    if M →_eager M' and eval_V(M') = N
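The one-step relation →_eager and the evaluator eval_V can be sketched concretely for a tiny fragment of PCF: numerals, addition, and pairs. The Python encoding below is entirely ours, an illustrative sketch rather than part of PCF: plain integers stand for numerals and tagged tuples for compound terms.

```python
# A minimal sketch (our encoding) of eager reduction and eval_V for a
# tiny PCF fragment: numerals, addition, and pairs.

def is_value(t):
    # Values: numerals, and pairs of values.
    if isinstance(t, int):
        return True
    if t[0] == 'pair':
        return is_value(t[1]) and is_value(t[2])
    return False

def step(t):
    # One eager reduction step; defined only on non-values.
    if is_value(t):
        raise ValueError('values do not reduce')
    tag = t[0]
    if tag == 'add':
        m, n = t[1], t[2]
        if isinstance(m, int) and isinstance(n, int):
            return m + n                  # reduce a sum of two numerals
        if isinstance(m, int):
            return ('add', m, step(n))    # n + M -> n + M'
        return ('add', step(m), n)        # M + N -> M' + N
    if tag == 'pair':
        m, n = t[1], t[2]
        if is_value(m):
            return ('pair', m, step(n))   # <V, N> -> <V, N'>
        return ('pair', step(m), n)       # <M, N> -> <M', N>
    raise ValueError('stuck term')

def eval_v(t):
    # eval_V(M) = M if M is a value; otherwise take a step and continue.
    while not is_value(t):
        t = step(t)
    return t
```

For example, `eval_v(('add', ('add', 1, 2), 4))` reduces the left summand first, exactly as the subterm rules dictate.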

There seem to be two main reasons for implementing eager rather than left-most (or lazy) evaluation in practice. The first is that even for a purely functional language such as PCF (i.e., a language without assignment or other operations with side effects), the usual implementation of left-most reduction is less efficient. The reason for the inefficiency appears to be that when an argument such as f x is passed to a function g, it is necessary to pass a pointer to the code for f and to keep a record of the appropriate lexical environment. As a result, there is significantly more overhead to implementing function calls. It would be simpler just to call f with argument x immediately and then pass the resulting integer, for example, to g. The second reason that left-most reduction is not usually implemented has to do with side effects. As illustrated by the many tricks used in Algol 60, the combination of left-most evaluation and assignment is often confusing. In addition, in the presence of side effects, left-most evaluation does not coincide with nondeterministic or parallel evaluation. The reason is that the order in which assignments are made to a variable will generally affect the program output. We cannot expect different orders of evaluation to produce the same result. Since most languages in common use include assignment, many of the advantages of left-most or lazy evaluation are lost.
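The overhead described above can be made concrete in any eager language. In this illustrative Python sketch (all names ours), call-by-value passes an already-computed integer, while a call-by-name discipline must package the unevaluated argument as a zero-argument closure ("thunk") carrying its environment, re-running the argument at each use.

```python
# An illustrative sketch (ours, not PCF) of the overhead described above.
# Under call-by-value, the argument f(3) is computed once; under
# call-by-name, g receives a thunk and f runs at every use.

calls = 0

def f(x):
    global calls
    calls += 1          # count how many times the argument is evaluated
    return x * 10

def g_by_value(n):
    return n + n        # n is an already-computed integer

def g_by_name(thunk):
    return thunk() + thunk()   # each use re-evaluates the argument

r1 = g_by_value(f(3))   # call-by-value: f runs once
calls_cbv = calls

calls = 0
r2 = g_by_name(lambda: f(3))   # call-by-name: f runs at each use
calls_cbn = calls
```

Both calls return 60, but the by-name version evaluates the argument twice, and in general once per use.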


Exercise 2.4.22 Show, by performing eager reduction, that the eager interpreter does not halt on the program given in Example 2.4.21.

Exercise 2.4.23 Assuming appropriate reduction rules for the function symbols used in the factorial function fact of Section 2.2.5, show that fact 3 ↠ 6.

Exercise 2.4.24 An alternate eager reduction for fix might be

fix_σ (λx: σ. M) →_eager [fix_σ (λx: σ. M)/x]M

Find a term λx: σ. M where eager reduction as defined in Table 2.5 would terminate, but the alternate form given by this rule would not. Explain why, for programs of the form letrec f(x: σ) = M in N with fix not occurring in M or N, it does not seem possible to distinguish between the two possible eager reductions for fix.

Exercise 2.4.25 Eager or call-by-value reduction may also be applied to untyped lambda calculus. With ordinary (left-most) reduction, an untyped fixed-point operator, Y, may be written Y ≡ λf. (λx. f(x x)) (λx. f(x x)). Under eager, or call-by-value reduction, the standard untyped fixed-point operator is written Z ≡ λf. (λx. f(delay[x x])) (λx. f(delay[x x])), where delay[U] ≡ λz. U z for z not free in untyped term U. Show that for M ≡ λx. λy. y, the application Y M lazy or left-most reduces to a term beginning with a lambda, and similarly for Z M using untyped eager reduction. What happens when we apply eager reduction to Y M?
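The contrast in Exercise 2.4.25 can be explored directly in any eager language. The Python sketch below (our own transcription) defines the call-by-value fixed-point combinator Z; the ordinary Y combinator, transcribed the same way, evaluates the self-application x x immediately and never terminates under Python's eager evaluation, surfacing as a RecursionError.

```python
# Exercise 2.4.25 transcribed into (eager) Python, as an illustrative
# sketch. The call-by-value fixed-point combinator
#   Z = λf. (λx. f(delay[x x])) (λx. f(delay[x x])),  delay[U] = λz. U z,
# works under eager evaluation because x x is hidden under a lambda:
Z = lambda f: (lambda x: f(lambda z: x(x)(z)))(lambda x: f(lambda z: x(x)(z)))

# By contrast, Y = λf. (λx. f(x x)) (λx. f(x x)) evaluates x x
# immediately and diverges under eager evaluation:
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

try:
    Y(lambda rec: lambda n: 1)   # diverges under eager evaluation
    diverged = False
except RecursionError:
    diverged = True
```

Here `fact` is a genuine recursive factorial built without any named recursion, while the attempt to use Y overflows the stack, just as the exercise predicts.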

2.5 PCF Programming Examples, Expressive Power and Limitations

2.5.1 Records and n-tuples

Records are a useful data type in Pascal and many other languages. We will see that record expressions may be translated into PCF using pairing and projection. Essentially, a record is an aggregate of one or more components, each with a different label. We can combine any expressions M1: σ1, ..., Mk: σk and form a record {ℓ1 = M1, ..., ℓk = Mk} whose ℓi component has the value of Mi. The type of this record may be written {ℓ1: σ1, ..., ℓk: σk}. We select the ℓi component of any record r of this type (1 ≤ i ≤ k) using the "dot" notation r.ℓi. For example, the record {A = 3, B = true} with components labeled A and B has type {A: nat, B: bool}. The A component is selected by writing {A = 3, B = true}.A. In general, we have the equational axiom

{ℓ1 = M1, ..., ℓk = Mk}.ℓi = Mi    (record selection)

for component selection, which may be used as a reduction rule from left to right.


One convenient aspect of records is that component order does not matter (in the syntax just introduced and in most languages). We can access the A component of a record r without having to remember which component we listed first in defining r. In addition, we may choose mnemonic names for labels, as in

let person = {name: string, age: nat, married: bool, ...}

However, these are the only substantive differences between records and cartesian products; by choosing an ordering of components, we can translate records and record types into pairs and product types of PCF. For records with two components, such as {A: nat, B: bool}, we can translate directly into pairs by choosing one component to be first. However, for records with more than two components, we must translate into "nested" pairs. To simplify the translation of n-component records into PCF, we will define n-tuples as syntactic sugar. For any sequence of types σ1, ..., σn, we introduce the n-ary product notation

σ1 × · · · × σn ≡ σ1 × (σ2 × · · · × (σn−1 × σn) · · ·)

by associating to the right. (This is an arbitrary decision. We could just as easily associate n-ary products to the left.) To define elements of an n-ary product, we use the tupling

notation ⟨M1, ..., Mn⟩ ≡ ⟨M1, ⟨M2, ..., ⟨Mn−1, Mn⟩ · · ·⟩⟩ as syntactic sugar for a nested pair. It is easy to check that if M1: σ1, ..., Mn: σn, then ⟨M1, ..., Mn⟩: σ1 × · · · × σn. To retrieve the components of an n-tuple, we use the following notation for combinations of binary projection functions:

Proj_i^n ≡ Proj1 ∘ Proj2 ∘ · · · ∘ Proj2    (i − 1 applications of Proj2, for 1 ≤ i < n)
Proj_n^n ≡ Proj2 ∘ · · · ∘ Proj2    (n − 1 applications of Proj2)

We leave it as an exercise to check that Proj_i^n ⟨M1, ..., Mn⟩ = Mi, justifying the use of this notation. A useful piece of meta-notation is to write σ^k for the k-ary product σ × · · · × σ of k σ's.
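The nested-pair encoding of n-tuples and the projections Proj_i^n can be checked concretely. In this sketch (our own helper names), Python 2-tuples play the role of PCF pairs.

```python
# Encode <M1, ..., Mn> as the right-nested pair <M1, <M2, ..., <Mn-1, Mn>...>>
# and Proj_i^n as (i-1) applications of Proj2 followed, for i < n, by Proj1.

def tuple_n(*ms):
    # Build a right-associated nest of binary pairs.
    if len(ms) == 1:
        return ms[0]
    return (ms[0], tuple_n(*ms[1:]))

def proj_n(n, i, t):
    # Select the ith component (1-based) of an n-tuple.
    for _ in range(i - 1):
        t = t[1]                 # Proj2, applied i-1 times
    return t[0] if i < n else t  # final Proj1, except for the last slot

t = tuple_n(10, 20, 30, 40)      # the 4-tuple <10, 20, 30, 40>
```

Here `t` is literally `(10, (20, (30, 40)))`, and `proj_n(4, i, t)` recovers each component, matching the equation Proj_i^n ⟨M1, ..., Mn⟩ = Mi.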


Using n-tuples, we can now translate records with more than two components into PCF quite easily. If we want to eliminate records of some type {ℓ1: σ1, ..., ℓk: σk}, we choose some ordering of the labels ℓ1, ..., ℓk and write each type expression and record expression using this order. (Any ordering will do, as long as we are consistent. For concreteness, we could use alpha-numeric lexicographical order.) We then translate expressions with records into PCF as follows.

{ℓ1: σ1, ..., ℓk: σk} = σ1 × · · · × σk
{ℓ1 = M1, ..., ℓk = Mk} = ⟨M1, ..., Mk⟩
r.ℓi = Proj_i^k r

If an expression contains more than one type of records, we apply the same process to each type independently.

Example 2.5.1 We will translate the expression

let r: {A: int, B: bool} = {A = 3, B = true} in if r.B then r.A else r.A + 1

into PCF. The first step is to number the record labels. Using the number 1 for A and 2 for B, the type {A: int, B: bool} becomes int × bool and a record {A = x, B = y}: {A: int, B: bool} becomes a pair ⟨x, y⟩: int × bool. Following the general procedure outlined above, we desugar the expression with records to the following expression with product types and pairing:

let r: int × bool = ⟨3, true⟩ in if Proj2 r then Proj1 r else (Proj1 r) + 1

Some slightly more complicated examples are given in the exercise.
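The desugaring procedure of Example 2.5.1 is mechanical enough to script. In this sketch (our own encoding), Python dicts stand for source-language records, labels are ordered alphabetically, and selection compiles to a projection chain over nested pairs.

```python
def desugar_record(rec):
    # {l1 = M1, ..., lk = Mk} becomes the right-nested pair <M1, ..., Mk>,
    # with components ordered alphabetically by label.
    vals = [rec[label] for label in sorted(rec)]
    out = vals[-1]
    for v in reversed(vals[:-1]):
        out = (v, out)
    return out

def select(pair, labels, label):
    # r.l becomes Proj_i^k r: (i-1) Proj2 steps, then Proj1 unless last.
    order = sorted(labels)
    i = order.index(label)
    for _ in range(i):
        pair = pair[1]
    return pair[0] if i < len(order) - 1 else pair

r = desugar_record({'A': 3, 'B': True})   # {A = 3, B = true} -> <3, true>
```

Selecting the B component of `r` walks one Proj2 step, just as r.B desugars to Proj2 r in the example above.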

Exercise 2.5.2 Desugar the following expressions with records to expressions with product types and pairing.

(a) let r: {A: int, B: bool, C: int → int} = {A = 5, B = false, C = ...} in if r.B then r.A else (r.C)(r.A)

(b) let f(r: {A: int, C: bool}): {A: int, B: bool} = {A = r.A, B = r.C} in f {A = 3, C = true}

You may wish to eliminate one type of records first and then the other. If you do this, the intermediate term should be a well-typed expression with only one type of records.

2.5.2 Searching the Natural Numbers

One useful programming technique in PCF is to "search" the natural numbers, starting from 0. In recursive function theory, this method is called minimization, and written using the operator μ. Specifically, if p is a computable predicate on the natural numbers (which means a computable function from nat to bool), then μx[p x] is the least natural number n such that p n ↠ true and p n' ↠ false for all n' < n. We say a function f is numeric if f: N^k → N for some k > 0. If C is a class of numeric functions, we say C is

• Closed under composition if, for every f1, ..., fℓ: N^k → N and g: N^ℓ → N from C, the class C also contains the function h defined by

  h(n1, ..., nk) = g(f1(n1, ..., nk), ..., fℓ(n1, ..., nk)),

• Closed under primitive recursion if, for every f: N^k → N and g: N^(k+2) → N from C, the class C contains the function h defined by

  h(0, n1, ..., nk) = f(n1, ..., nk),
  h(m + 1, n1, ..., nk) = g(h(m, n1, ..., nk), m, n1, ..., nk),

• Closed under minimization if, for every f: N^(k+1) → N from C such that ∀n1, ..., nk. ∃m. f(m, n1, ..., nk) = 0, the class C contains the function g defined by

  g(n1, ..., nk) = the least m such that f(m, n1, ..., nk) = 0.
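Minimization is just an unbounded upward search from 0, which the following sketch (our own names) makes literal; like the μ operator, it terminates only when the totality side condition above holds.

```python
def minimize(f):
    # g(n1, ..., nk) = least m such that f(m, n1, ..., nk) = 0.
    # Loops forever when no such m exists, mirroring the mu operator.
    def g(*ns):
        m = 0
        while f(m, *ns) != 0:
            m += 1
        return m
    return g

# Example: integer square root as a minimization. f(m, n) is 0 exactly
# when (m + 1)^2 > n, so g(n) is the least m with (m + 1)^2 > n.
isqrt = minimize(lambda m, n: 0 if (m + 1) ** 2 > n else 1)
```

Since (m + 1)^2 eventually exceeds any n, the side condition holds and the search always halts for this f.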

The class of total recursive functions is the least class of numeric functions that contains the projection functions (n1, ..., nk) ↦ ni, the successor function λx: nat. x + 1, the constant zero function λx: nat. 0, and that is closed under composition, primitive recursion, and minimization. We can show that every total recursive function is definable in PCF, in the following precise sense. For any natural number n ∈ N, let us write ⌜n⌝ for the corresponding numeral of PCF. We say a numeric function f: N^k → N is PCF-definable, or simply definable, if there is a closed PCF expression M: nat^k → nat, where nat^k is the k-ary product type nat × · · · × nat, such that

∀n1, ..., nk ∈ N. M ⟨⌜n1⌝, ..., ⌜nk⌝⟩ = ⌜f(n1, ..., nk)⌝.

Theorem 2.5.12 Every total recursive function is definable in PCF.

Proof To show that every total recursive function is definable in PCF, we must give the projection functions, the successor function, and the constant zero function, and


show closure under composition, primitive recursion, and minimization. Some of the work has already been done in Exercise 2.5.7 and Proposition 2.5.3. Following the definitions in Section 2.5.1, we let the basic functions be λx: nat. x + 1 (successor), λx: nat. 0 (constant zero), and λx: nat^k. Proj_i x for 1 ≤ i ≤ k (projections). If f and g are numeric functions of k arguments, then cnvg f g will behave like g, except that an application to k arguments will only have a normal form (or "halt") when the application of f has a normal form. For any k > 0, the function cnvg may be written as follows:

cnvg f g ≡ λx: nat^k. if Eq? (f x) 0 then g x else g x

For any x, if f x can be reduced to normal form, then (whether this is equal to zero or not) the result is g x. However, if f x cannot be reduced to a normal form, then the conditional can never be eliminated and cnvg f g x will not have a normal form. The main reason why this works is that we cannot reduce Eq? (f x) 0 to true or false without reducing f x to normal form. Using cnvg, we can define a composition of f and g that is defined only when f is defined. For example, if f, g: N → N are unary partial recursive functions represented by PCF terms M and N, we represent their composition by the term cnvg M (λx: nat. (N (M x))).


The generalization to the composition of partial functions f1, ..., fℓ: N^k → N and g: N^ℓ → N is straightforward and left to the reader, using cnvg at the appropriate k. The remaining cases of the proof are treated similarly. The reader is encouraged to write out the argument for primitive recursion, and convince him- or herself that no modification is needed for minimization.

This theorem has some important corollaries. Both of the corollaries below follow from a well-known undecidability property of recursive functions. Specifically, there is no recursive, or algorithmic, method for deciding whether a recursive partial function is defined on some argument. This fact is often referred to as the recursive unsolvability of the Halting Problem. Exercise 2.5.18 describes a direct proof that the PCF halting problem is not solvable in PCF.

Corollary 2.5.15

There is no algorithm for deciding whether a given PCF expression has

a normal form.

Proof If there were such an algorithm, then by Theorem 2.5.14, we could decide whether a given partial recursive function was defined on some argument, simply by writing the function applied to its argument in PCF.


Corollary 2.5.16 There is no algorithm for deciding equality between PCF expressions.

Proof If there were such an algorithm, then by Theorem 2.5.14, we could decide whether a given partial recursive function was defined on some argument. Specifically, given a description of a partial function f, we may produce a PCF term M representing the application f(n) by the straightforward algorithm outlined in the proof of Theorem 2.5.14, for any n. Then, using the supposed algorithm to determine whether if Eq? M M then 0 else 0 = 0, we could decide whether f(n) is defined.


Exercise 2.5.17 A primitive recursive function is a total recursive function that is defined without using minimization. The Kleene normal form theorem states that every partial recursive function f: N^k → N may be written in the form

f(n1, ..., nk) = p(the least n with t_k(i, n1, ..., nk, n) = 1)

where p and t_k are primitive recursive functions that are independent of f, and i depends on f. This is called the Kleene normal form of f, after Stephen C. Kleene, and the number i is called the index of partial recursive function f. The Kleene normal form theorem may be found in standard books on computability or recursion theory, such as [Rog67].

(a) Use the Kleene normal form theorem and Theorem 2.5.12 to show that every partial recursive function is definable in PCF.


(b) Using the results of Exercise 2.5.13, show that every partial function computed by a Turing machine is definable in PCF.

Exercise 2.5.18

It is easy to show that the halting problem for PCF is not solvable in PCF. Specifically, the halting function on type σ, H_σ, is a function which, when applied to any PCF term M: σ, returns true if M has a normal form ("halts"), and false otherwise. In this exercise, you are asked to show that there is no PCF term H_bool defining the halting function on type bool, using a variant of the diagonalization argument from recursive function theory. Since the proof proceeds by contradiction, assume there is a PCF term H_bool: bool → bool defining the halting function. (a) Show that using H_bool, we may write a PCF term G: bool → bool with the property that for any PCF term M: bool, the application G M has a normal form if M does not.

(b) Derive a contradiction by considering whether or not fix G has a normal form.

2.5.6 Nondefinability of Parallel Operations

We have seen in Sections 2.5.4 and 2.5.5 that the PCF-definable functions on numerals correspond exactly to the recursive functions and (in the exercises) the numeric functions computable by Turing machines. By Church's thesis, as discussed in Section 2.5.4, this suggests that all mechanically computable functions on the natural numbers are definable in PCF. However, as we shall see in this section, there are operations that are computable in a reasonable way, but not definable in PCF. The main theorem and the basic idea of Lemma 2.5.24 are due to Plotkin [Plo77]. A semantic proof of Theorem 2.5.19 appears as Example 8.6.3, using logical relations and the denotational semantics of PCF. The main example we consider in this section is called parallel-or. Before discussing this function, it is worth remembering that for a PCF term to represent a k-ary function on the natural numbers, we only consider the result of applying the PCF term to a tuple of numerals. It is not important how the PCF term behaves when applied to expressions that are not in normal form. The difference between the positive results on representing numeric functions in Sections 2.5.4 and 2.5.5 and the negative result in this section is that we now take into account the way a function evaluates its arguments. It is easiest to describe the parallel-or function algorithmically. Suppose we are given closed terms M, N: bool and wish to compute the logical disjunction, M ∨ N. We know that either M ↠ true, M ↠ false, or M has no normal form, and similarly for N. One way to compute M ∨ N is to first reduce both M and N to normal form, then return true if either is true and false otherwise. This algorithm computes what is called the sequential-or of M and N; it will only terminate if both M and N have a normal form. For parallel-or, we wish to return true if either M or N reduces to true, regardless of


Table 2.6 Evaluation contexts for lazy PCF reduction.

EV[ ] ::= [ ] | EV[ ] + M | n + EV[ ] for numeral n
       | Eq? EV[ ] M | Eq? n EV[ ] for numeral n
       | if EV[ ] then N else P
       | Proj_i EV[ ]
       | EV[ ] M

whether the other term has a normal form. It is easy to see how to compute parallel-or in a computational setting with explicit parallelism. We begin reducing M and N in parallel. If either terminates with value true, we abort the other computation and return true. Otherwise we continue to reduce both, hoping to produce two normal forms. If both reduce to false, then we return false. To summarize, the parallel-or of M and N is true if either M or N reduces to true, false if both reduce to false, and nonterminating otherwise. The rest of this section will be devoted to proving that parallel-or is not definable in PCF, as stated in the following theorem.
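In a language with explicit concurrency, the parallel algorithm just described takes only a few lines. This Python sketch (all names ours) runs the two argument thunks in daemon threads: it returns true as soon as either yields true, returns false only after both yield false, and blocks forever otherwise, which is exactly the behavior no PCF term can have.

```python
import queue
import threading

def parallel_or(thunk_m, thunk_n):
    # Evaluate both arguments concurrently; return True the moment either
    # produces True, and False only after both have produced False.
    results = queue.Queue()
    for thunk in (thunk_m, thunk_n):
        threading.Thread(target=lambda t=thunk: results.put(t()),
                         daemon=True).start()
    if results.get():        # blocks until one argument terminates
        return True          # the other computation is simply abandoned
    return results.get()     # both answers are needed to return False

def diverge():
    # Stands in for a term with no normal form.
    while True:
        pass
```

Note that `parallel_or(diverge, lambda: True)` returns True even though one argument runs forever; a sequential strategy that evaluated the first argument to completion could never do this.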

Theorem 2.5.19 There is no PCF expression POR with the following behavior:

POR M N ↠ true    if M ↠ true or N ↠ true
POR M N ↠ false    if M ↠ false and N ↠ false
POR M N has no normal form    otherwise

for all closed boolean expressions M and N. We will prove Theorem 2.5.19 by analyzing the operational semantics of PCF. This gives us an opportunity to develop some general tools for operational reasoning. The first important decision is to choose a deterministic reduction strategy instead of reasoning about arbitrary reduction order. Since we will only be interested in reduction on terms of type nat or bool, we can use lazy reduction, defined in Section 2.4.3, instead of the more complicated left-most reduction strategy. A convenient way to analyze the reduction of subterms is to work with contexts. To identify a critical subterm, we define a special form of context called an evaluation context. While the analysis of parallel-or only requires evaluation contexts for boolean terms, we give the general definition for use in the exercises and in Section 5.4.2. The evaluation contexts of PCF are defined in Table 2.6. An example appears below, after two basic lemmas. To make the outline of the proof of Theorem 2.5.19 as clear as possible, we postpone the proofs of the lemmas to the end of the section. There are two main properties of evaluation contexts. The first is that if M →_lazy N, then M matches the form of some evaluation context and N is obtained by reducing the indicated term.


Lemma 2.5.20 If M →_lazy N, then there is a unique evaluation context EV[ ] such that M ≡ EV[M'], the reduction M' → N' is an instance of one of the PCF reduction axioms, and N ≡ EV[N'].

Since we can determine the unique evaluation context mentioned in the lemma by pattern matching, evaluation contexts provide a complete characterization of lazy reduction. The second basic property of evaluation contexts is that the lazy reduction of EV[M] is given by the lazy reduction of M, when M has one. This is stated more precisely in the following lemma.

Lemma 2.5.21 If M →_lazy M' then, for any evaluation context EV[ ], we have EV[M] →_lazy EV[M'].

A subtle point is that when M is a lazy normal form, the lazy reduction of EV[M] may be a reduction that does not involve M.
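For the addition fragment of PCF, the unique-decomposition property of Lemma 2.5.20 can be made concrete. In this sketch (our own encoding, as before: integers are numerals and ('add', M, N) is M + N), decompose splits any non-value term into an evaluation context, represented as a Python function, and the redex sitting in its hole.

```python
def decompose(t):
    # Return (EV, redex) with EV(redex) equal to t, where the redex is a
    # sum of two numerals and EV is built from the context grammar
    # EV[ ] ::= [ ] | EV[ ] + M | n + EV[ ] for numeral n.
    m, n = t[1], t[2]
    if isinstance(m, int) and isinstance(n, int):
        return (lambda hole: hole), t                 # EV[ ] = [ ]
    if isinstance(m, int):
        ev, r = decompose(n)                          # n + EV'[ ]
        return (lambda hole: ('add', m, ev(hole))), r
    ev, r = decompose(m)                              # EV'[ ] + n
    return (lambda hole: ('add', ev(hole), n)), r

term = ('add', ('add', 1, 2), ('add', 3, 4))
ev, redex = decompose(term)   # redex is the left-most sum of numerals
```

Filling the hole with the redex rebuilds the original term, and filling it with the redex's contractum yields the next term in the lazy reduction sequence.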

Example 2.5.22 The left-most reduction of the term ((λx: nat. λy: nat. x + y) 7) 5 + (λx: nat. x) 3 is given in Example 2.4.11. As observed in Example 2.4.14, this is also the lazy reduction of this term. We illustrate the use of evaluation contexts by writing the evaluation context for each term in the reduction sequence; the evaluation context is exactly the part of the term that is not the active redex.

((λx: nat. λy: nat. x + y) 7) 5 + (λx: nat. x) 3    EV[ ] ≡ ([ ] 5) + (λx: nat. x) 3
(λy: nat. 7 + y) 5 + (λx: nat. x) 3                 EV[ ] ≡ [ ] + (λx: nat. x) 3
(7 + 5) + (λx: nat. x) 3                            EV[ ] ≡ [ ] + (λx: nat. x) 3
12 + (λx: nat. x) 3                                 EV[ ] ≡ 12 + [ ]
12 + 3                                              EV[ ] ≡ [ ]
15

If we look at the second evaluation context, EV[ ] ≡ [ ] + (λx: nat. x) 3, then it is easy to see that when M →_lazy M', the lazy reduction of EV[M] will be EV[M] →_lazy EV[M'], as guaranteed by Lemma 2.5.21. However, if we insert a lazy normal form such as 2 into the context, then the resulting term, 2 + (λx: nat. x) 3, will have its lazy reduction completely to the right of the position marked by the placeholder [ ] in the context. Another case arises with the first context, EV[ ] ≡ ([ ] 5) + (λx: nat. x) 3. If we insert the lazy normal form M ≡ λy: nat. 7 + y, we obtain a term EV[M] whose lazy reduction involves a larger subterm than M. We note here that left-most reduction can also be characterized using evaluation contexts. To do so, we add six more forms to the definition of evaluation context, corresponding to the six rules that appear in Table 2.3 but not in Table 2.4. Lemma 2.5.20 extends

The Language PCF

116

easily to left-most reduction, but Lemma 2.5.21 requires the additional hypothesis that M does not have the form λx: σ. M1 or ⟨M1, M2⟩. We use Lemma 2.5.21 in the analysis of parallel-or by showing that if the normal form of a compound term depends on the subterms M1, ..., Mk, as we expect the parallel-or of M1 and M2 to depend on M1 and M2, then we may reduce the term to the form EV[Mi], for some i independent of the form of M1, ..., Mk. The intuition behind this is that some number of reduction steps may be independent of the chosen subterms. But if any of these subterms has any effect on the result, then eventually we must come to some term where the next reduction depends on the form of some Mi. The "sequential" nature of PCF is that when we reach a term of the form EV[Mi], it follows from Lemma 2.5.21 that if Mi is not a lazy normal form, we continue to reduce Mi until this subterm does not lazy-reduce further. Since we are interested in analyzing the reduction steps applied to a function of several arguments, we will use contexts with more than one "hole" (place for inserting a term). A context C[·, ..., ·] with k holes is a syntactic expression with exactly the same form as a term, but containing zero or more occurrences of the placeholders [ ]1, ..., [ ]k, each assumed to have a fixed type within this expression. As for contexts with a single hole, we write C[M1, ..., Mk] for the result of replacing the ith placeholder [ ]i by Mi, without renaming bound variables. Our first lemma about reduction in arbitrary contexts is that the lazy reduction of a term of the form C[M1, ..., Mk] is either a reduction on C, independent of the terms M1, ..., Mk placed in the context, or a reduction that depends on the form of one of the terms M1, ..., Mk.

Lemma 2.5.23 Let C[·, ..., ·] be a context with k holes. Suppose that there exist closed terms N1, ..., Nk of the appropriate types such that C[N1, ..., Nk] is not in lazy normal form. Then C must have one of the following two properties:

(i) There is a context C'[·, ..., ·] such that for all closed terms M1, ..., Mk of the appropriate types, C[M1, ..., Mk] →_lazy C'[M1, ..., Mk].

(ii) There is some i such that for all closed terms M1, ..., Mk of the appropriate types, there is an evaluation context EV[ ] with C[M1, ..., Mk] ≡ EV[Mi].

We use Lemma 2.5.23 to prove the following statement about sequences of lazy reductions.

Lemma 2.5.24 Let C[·, ..., ·] be a context with k holes, let M1, ..., Mk be closed terms of the appropriate types for C, and suppose C[M1, ..., Mk] is a program with normal form N. Then either C[M1', ..., Mk'] reduces to N for all closed M1', ..., Mk' of the appropriate types, or there is some integer i such that for all closed M1', ..., Mk' of the appropriate types there is an evaluation context EV[ ] with C[M1', ..., Mk'] ↠ EV[Mi'].

An intuitive reading of this lemma is that some number of reduction steps of a term C[M1, ..., Mk] may be independent of all the Mi. If this does not produce a normal form, then eventually the reduction of C[M1, ..., Mk] will reach a step that depends on the form of some Mi, with the number i depending only on C. If we replace lazy reduction with left-most reduction, then we may change lazy normal form to normal form in Lemma 2.5.23 and drop the assumption that C[M1, ..., Mk] is a program in Lemma 2.5.24. Using Lemmas 2.5.21 and 2.5.24, we now prove Theorem 2.5.19.

Proof of Theorem 2.5.19 Suppose we have a PCF expression POR defining parallel-or and consider the context C[·, ·] ≡ POR [ ]1 [ ]2. Since POR defines parallel-or, C[true, true] ↠ true and C[false, false] ↠ false. By Lemma 2.5.24, there are two possibilities for the lazy reduction of C[true, true] to true. The first implies that C[M1, M2] ↠ true for all closed boolean terms M1 and M2. But this contradicts C[false, false] ↠ false. Therefore, there is an integer i ∈ {1, 2} such that for all closed boolean terms M1, M2, there is an evaluation context EV[ ] with C[M1, M2] ↠ EV[Mi]. But, by Lemma 2.5.21, this implies that if we apply POR to true and the term fix_bool (λx: bool. x) with no normal form, with fix_bool (λx: bool. x) as the ith argument, we cannot reduce the resulting term to true. This contradicts the assumption that POR defines parallel-or.

The reader may reasonably ask whether the non-definability of parallel-or is a peculiarity of PCF or a basic property shared by sequential programming languages such as Pascal and Lisp. The main complicating factor in comparing PCF to these languages is the order of evaluation. Since a Pascal function call P (M , N) is compiled so that expressions M and N are evaluated before P is called, it is clear that no boolean function P could compute parallel-or. However, we can ask whether parallel-or is definable by a Pascal (or Lisp) context with two boolean positions. More precisely, consider a Pascal program context P[., .] with two "holes" for boolean expressions (or bodies of functions that return boolean values). An important part of the execution of P[M, NI is what happens when M or N may run forever. Therefore, it is sensible and nontrivial to ask whether parallel-or is definable by a Pascal or Lisp context. If we were to go to the effort of formalizing the execution of Pascal programs in the way that we have formalized PCF evaluation, it seems very likely that we could prove that parallel-or is not definable by any context. The reader may enjoy trying to demonstrate otherwise by constructing a Pascal context that defines parallel-or and seeing what obstacles are involved. In Lisp, it is possible to define parallel-or by defining your

The Language PCF

own eval function. However, the apparent need to define a non-standard eval illustrates the sequentiality of standard Lisp evaluation.
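The obstacle described above can be made concrete with a small sketch (ours, not the book's). If each boolean argument is modeled as a step-by-step computation, a sequential strategy that fully evaluates one argument first diverges whenever that argument does; dovetailing the two computations one step at a time is what parallel-or requires. All names below are illustrative assumptions.

```python
# A "computation" is a Python generator that yields None while running and
# finally yields ('done', value).  This models possibly-divergent arguments.

def value(b):
    # A computation that terminates immediately with boolean b.
    yield ('done', b)

def diverge():
    # A computation that runs forever.
    while True:
        yield None

def parallel_or(m, n, max_steps=1000):
    # Dovetail: advance m and n one step at a time.  Return True as soon as
    # either finishes with True; return False only when both finish False.
    results = [None, None]
    comps = [m, n]
    for _ in range(max_steps):
        for i in (0, 1):
            if results[i] is None:
                step = next(comps[i])
                if step is not None:
                    results[i] = step[1]
                    if results[i]:
                        return True
        if results[0] is False and results[1] is False:
            return False
    raise TimeoutError('no answer within step bound')

# A sequential strategy that fully evaluates the first argument before
# looking at the second would loop forever on (diverge(), value(True)).
```

Note that `parallel_or(diverge(), value(True))` still answers True, which is exactly the behavior a left-to-right evaluator cannot produce.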

Proof of Lemma 2.5.20  We use induction on the proof, in the system of Table 2.4, that M → N. The base case is that M → N is one of the reduction axioms (these are listed in Table 2.2). If this is so, then the lemma holds with EV[ ] ≡ [ ]. The induction steps for the inference rules in Table 2.4 are all similar and essentially straightforward. For example, if we have P + R → Q + R by the rule

P → Q
———————————
P + R → Q + R

then by the induction hypothesis there is a unique evaluation context EV'[ ] such that P ≡ EV'[P'], where P' → Q' is a reduction axiom and Q ≡ EV'[Q']. The lemma follows by taking EV[ ] ≡ EV'[ ] + R. ∎

Proof of Lemma 2.5.21  We prove the lemma by induction on the structure of evaluation contexts. In the base case, we have an evaluation context EV[ ] ≡ [ ]. Since EV[M] ≡ M and EV[M'] ≡ M', it is easy to see that under the hypotheses of the lemma, EV[M] → EV[M']. There are a number of induction cases, corresponding to the possible syntactic forms of EV[ ]. Since the analysis of each of these is similar, we consider three representative cases.

The case EV[ ] ≡ EV'[ ] + M1 is similar to most of the compound evaluation contexts. In this case, EV[M] ≡ EV'[M] + M1 and, by the induction hypothesis, EV'[M] → EV'[M']. Since EV'[M] is not syntactically a numeral, no axiom can apply to the entire term EV'[M] + M1. Therefore the only rule in Table 2.4 that applies is

EV'[M] → EV'[M']
—————————————————————————
EV'[M] + M1 → EV'[M'] + M1

This gives us EV[M] → EV[M'].

The two other cases we will consider are projection and function application. A context EV[ ] ≡ Proj_i EV'[ ] is an evaluation context only if EV'[ ] does not have the syntactic form of a pair. Since M → M' implies that M is not a pair, we know that EV'[M] does not have the syntactic form of a pair. It follows that no reduction axiom applies to the entire term Proj_i EV'[M]. Therefore, by the induction hypothesis that EV'[M] → EV'[M'], we conclude that EV[M] → EV[M'].

The last case we consider is EV[ ] ≡ EV'[ ] N. This is similar to the projection case. The context EV'[ ] cannot be a lambda abstraction, by the definition of evaluation contexts, and M is not a lambda abstraction since M → M'. It follows that EV'[M] does not have the syntactic form of a lambda abstraction. Since the (β) reduction axiom does not apply

2.5 PCF Programming Examples, Expressive Power and Limitations

to the entire term EV'[M] N, we use the induction hypothesis EV'[M] → EV'[M'] and the definition of lazy reduction in Table 2.4 to conclude that EV[M] → EV[M']. ∎

Proof of Lemma 2.5.23  We prove the lemma by induction on the structure of contexts. If C[·, ..., ·] is a single symbol, then it is either one of the placeholders [·]_i, a variable or a constant. It is easy to see that C has property (ii) if it is a placeholder and property (i) if it is some other symbol.

For a context C1[·, ..., ·] + C2[·, ..., ·], we consider two cases, each dividing into two subcases when we apply the induction hypothesis. If there exist terms M1, ..., Mk with C1[M1, ..., Mk] not in lazy normal form, then we apply the induction hypothesis to C1. If C1[M1, ..., Mk] → C1'[M1, ..., Mk] for all M1, ..., Mk, then it is easy to see from the definition of lazy reduction that

C1[M1, ..., Mk] + C2[M1, ..., Mk] → C1'[M1, ..., Mk] + C2[M1, ..., Mk]

for all M1, ..., Mk. On the other hand, if for every M1, ..., Mk we can write C1[M1, ..., Mk] ≡ EV[Mi], then

EV[ ] + C2[M1, ..., Mk]

is an evaluation context for Mi. The second case is that C1[M1, ..., Mk] is a numeral and therefore M1, ..., Mk do not occur in C1[M1, ..., Mk]. In this case, we apply the induction hypothesis to C2 and reason as above.

The induction steps for Eq? and if ... then ... else ... are similar to the addition case. The induction steps for pairing and lambda abstraction are trivial, since these are lazy normal forms.

For a context Proj_i C[·, ..., ·], we must consider the form of C. If C[·, ..., ·] ≡ [·]_j, then Proj_i C[·, ..., ·] satisfies condition (ii) of the lemma since this is an evaluation context. If C[·, ..., ·] ≡ ⟨C1[·, ..., ·], C2[·, ..., ·]⟩, then Proj_i C[·, ..., ·] satisfies condition (i). If C[·, ..., ·] has some other form, then Proj_i C[M1, ..., Mk] cannot be a (proj) redex. Therefore, we apply the induction hypothesis to C and reason as in the earlier cases.

The induction step for an application context C1[·, ..., ·] C2[·, ..., ·] is similar to the projection case, except that we will need to use the assumption that the terms we place in contexts are closed. Specifically, if C1[·, ..., ·] ≡ λx: σ. C3[·, ..., ·], then let C' be the context [C2[·, ..., ·]/x] C3[·, ..., ·]. For all closed M1, ..., Mk we have

(λx: σ. C3[M1, ..., Mk]) C2[M1, ..., Mk] → C'[M1, ..., Mk]

and the context satisfies condition (i) of the lemma. If C1[·, ..., ·] ≡ [·]_i, then as in the projection case, the application context satisfies condition (ii) of the lemma. Finally, if C1[·, ..., ·] is some context that is not of one of these two forms, we reason as in the addition case, applying the induction hypothesis to either C1 or, if C1[M1, ..., Mk] is a lazy normal form for all M1, ..., Mk, the context C2. This completes the proof. ∎

Proof of Lemma 2.5.24  Let C[·, ..., ·] be a context with k holes and let M1, ..., Mk be closed terms of the appropriate types such that C[M1, ..., Mk] has normal form N of observable type. Then C[M1, ..., Mk] →ⁿ N, where n indicates the number of reduction steps. We prove the lemma by induction on n.

In the base case, C[M1, ..., Mk] ≡ N is in normal form. There are two cases to consider. The degenerate one is that C[M1', ..., Mk'] is in normal form for all closed terms of the appropriate types. But since the only results are numerals and boolean constants, M1', ..., Mk' must not appear in N and the lemma easily follows. The second case is that C[M1', ..., Mk'] is not in normal form for some M1', ..., Mk'. Since C[M1, ..., Mk] is in normal form, condition (i) of Lemma 2.5.23 cannot hold. Therefore condition (ii) of Lemma 2.5.23 holds and the lemma follows.

In the induction step, with C[M1, ..., Mk] →^{m+1} N, Lemma 2.5.23 gives us two cases. The simpler is that there is some i such that for all closed terms M of the appropriate type, C[M1, ..., M, ..., Mk] has the form EV[M]. But then clearly C[M1, ..., Mk] ≡ EV[Mi] and the lemma holds. The remaining case is that there is some context C'[·, ..., ·] such that for all M1', ..., Mk' of appropriate types, we have a reduction of the form

C[M1', ..., Mk'] → C'[M1', ..., Mk'] →^m N.

The lemma follows by the induction hypothesis for C'[·, ..., ·]. This concludes the proof. ∎

Exercise 2.5.25  Show that Lemma 2.5.24 fails if we drop the assumption that C[M1, ..., Mk] is a program.

Exercise 2.5.26  This exercise asks you to prove that parallel-conditional is not definable in PCF. Use Lemmas 2.5.21 and 2.5.24 to show that there is no PCF expression PIF_nat : bool → nat → nat → nat such that for all closed boolean expressions M: bool and N, P, Q: nat with Q in normal form,

PIF_nat M N P ↠ Q    if M ↠ true and N has normal form Q, or
                     M ↠ false and P has normal form Q, or
                     N, P both have normal form Q.

(If we extend Lemmas 2.5.21 and 2.5.24 to left-most reduction, the same proof shows that PIF_σ is not definable for any type σ.)


Exercise 2.5.27  The operational equivalence relation ≈ on terms is defined in Section 2.3.5. Show that if M and N are either closed terms of type nat or closed terms of type bool, neither having a normal form, then M ≈ N.

Exercise 2.5.28  A plausible statement that is stronger than Lemma 2.5.24 is this:

Let C[·, ..., ·] be a context with k holes, let M1, ..., Mk be closed terms of the appropriate types for C and suppose C[M1, ..., Mk] ↠ N, where N does not further reduce by lazy reduction. Then either C[M1', ..., Mk'] reduces to N for all closed M1', ..., Mk' of the appropriate types, or there is some integer i such that for all closed M1', ..., Mk' of the appropriate types there is an evaluation context EV[ ] with C[M1', ..., Mk'] ↠ EV[Mi'].

The only difference is that we do not require N to be in normal form. Show that this statement is false by giving a counterexample. This may be done using a context C[ ] with only one placeholder.

Exercise 2.5.29 [Sto91]  This problem asks you to show that parallel-or, boolean parallel-conditional and natural-number parallel-conditional are all interdefinable. Parallel-or is defined in the statement of Theorem 2.5.19 and parallel-conditional is defined in Exercise 2.5.26. An intermediate step involves parallel-and, defined below.

(a) Show how to define POR from PIF_bool by writing a PCF expression containing PIF_bool.

(b) Show how to define PIF_bool from PIF_nat using an expression of the form

λx: bool. λy: bool. λz: bool. Eq? 1 (PIF_nat x M N).

(c) Parallel-and, PAND, has the following behavior:

PAND M N ↠ true              if M ↠ true and N ↠ true,
PAND M N ↠ false             if M ↠ false or N ↠ false,
PAND M N has no normal form  otherwise.

Show how to define PAND from POR by writing a PCF expression containing POR.

(d) Show how to define PIF_nat from POR using an expression of the form

λx: bool. λy: nat. λz: nat. search P

where P: nat → bool has x, y, z free and contains POR, PAND, and Eq?. The essential properties of search are summarized in Proposition 2.5.3 on page 99.


2.6 Variations and Extensions of PCF

2.6.1 Summary of Extensions

This section briefly summarizes several important extensions of PCF. All are obtained by adding new types. The first extension is a very simple one, a type unit with only one element. The second extension is sum, or disjoint union, types. With unit and disjoint unions, we can define bool as the disjoint union of unit and unit. This makes the primitive type bool unnecessary. The next extension is recursive type definitions. With recursive type definitions, unit and sum types, we can define nat and its operations, making the primitive type nat also unnecessary. Other commonly-used types that have straightforward recursive definitions are stacks, lists and trees. Another use of recursive types is that we can define the fixed-point operator fix on any type. Thus with unit, sums and recursive types, we may define all of the type and term constants of PCF.

The final language is a variant of PCF with lifted types, which give us a different view of nontermination. With lifted types, written in the form σ⊥, we can distinguish between natural-number expressions that necessarily have a normal form and those that may not. Specifically, natural-number terms that do not involve fix may have type nat, while terms that involve fix, and therefore may not have a normal form, have the lifted type nat⊥. Thus the type nat of PCF corresponds to the type nat⊥ in this language. The syntax of many common programs becomes more complicated, since the distinction between lifted and unlifted types forces us to include additional lifting operations in terms. However, the refinement achieved with lifted types has some theoretical advantages. One is that many more equational or reduction axioms become consistent with recursion. Another is that we may study different evaluation orders within a single framework. For this reason, explicit lifting seems useful in meta-languages for studying other programming languages.

All of the extensions summarized here may be combined with polymorphism, treated in Chapter 9, and with other typing ideas in Chapters 10 and 11.

2.6.2 Unit and Sum Types

We add the one-element type unit to PCF, or any language based on typed lambda calculus, by adding unit to the type expressions and adding the constant *: unit to the syntax of terms. The equational axiom for * is

M = * : unit    (unit)

for any term M: unit. Intuitively, this axiom says that every element of type unit is equal to *. This may be used as a reduction rule, read left to right. While unit may not seem very interesting, it is surprisingly useful in combination with sums and other forms of types.


It should be mentioned that the reduction rule for unit terms causes confluence to fail when combined with η-reduction [CD91, LS86]. This does not have an immediate consequence for PCF, since we do not use η-reduction.

Intuitively, the sum type σ + τ is the disjoint union of types σ and τ. The difference between a disjoint union and an ordinary set union is that with a disjoint union, we can tell which type any element comes from. This is particularly noticeable when the two types overlap or are identical. For example, if we take the set union of int and int, we just get int. In contrast, the disjoint union of int and int has two "copies" of int. Informally, we think of the sum type int + int as the collection of "tagged" integers, with each tag indicating whether the integer comes from the left or right int in the sum.

The term forms associated with sums are injection and case functions. For any types σ and τ, the injection functions Inleft^{σ,τ} and Inright^{σ,τ} have types

Inleft^{σ,τ} : σ → σ + τ
Inright^{σ,τ} : τ → σ + τ

Intuitively, the injection functions map σ or τ to σ + τ by "tagging" elements. However, since the operations on tags are encapsulated in the injection and case functions, we never say exactly what the tags actually are. The Case function applies one of two functions to an element of a sum type. The choice between functions depends on which type of the sum the element comes from. For all types σ, τ and ρ, the case function Case^{σ,τ,ρ} has type

Case^{σ,τ,ρ} : (σ + τ) → (σ → ρ) → (τ → ρ) → ρ

Intuitively, Case^{σ,τ,ρ} x f g inspects the tag on x and applies f if x is from σ and g if x is from τ. The main equational axioms are

Case^{σ,τ,ρ} (Inleft^{σ,τ} x) f g = f x     (case)₁
Case^{σ,τ,ρ} (Inright^{σ,τ} x) f g = g x    (case)₂

Both of these give us reduction axioms, read left to right. There is also an extensionality axiom for sum types,

Case^{σ,τ,ρ} x (f ∘ Inleft^{σ,τ}) (f ∘ Inright^{σ,τ}) = f x,    (case)₃

where f: (σ + τ) → ρ. Some consequences of this axiom are given in Exercise 2.6.3. Since the extensionality axiom leads to inconsistency when combined with fix, we do not include this equational axiom in the extension of PCF with sum types (see Exercise 2.6.4). We will drop the type superscripts from injection and case functions when the type is either irrelevant or determined by context.
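The injection and Case operations can be sketched in Python as tag-and-dispatch functions (our rendering, not the book's; the concrete tags are an implementation detail the typed calculus deliberately hides):

```python
# Hypothetical Python rendering of the sum-type operations.
# Inleft/Inright attach a tag; Case dispatches on it without exposing it.

def inleft(x):
    return ('left', x)

def inright(x):
    return ('right', x)

def case(s, f, g):
    # (case)_1: case(inleft(x), f, g) == f(x)
    # (case)_2: case(inright(x), f, g) == g(x)
    tag, v = s
    return f(v) if tag == 'left' else g(v)
```

For example, `case(inleft(3), lambda v: v + 1, lambda v: v - 1)` applies the left branch, matching axiom (case)₁.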


As an illustration of unit and sum types, we will show how bool may be eliminated if unit and sum types are added to PCF. We define bool by

bool ≡ unit + unit

and boolean values true and false by

true ≡ Inleft *
false ≡ Inright *

The remaining basic boolean expression is the conditional, if ... then ... else .... We may consider this syntactic sugar for Case, as follows:

if M then N else P ≡ Case M (K N) (K P)

where N, P: ρ and K ≡ λx: ρ. λy: unit. x is the lambda term that produces a constant function. To show that this works, we must check the two equational axioms for conditional:

if true then M else N = M,
if false then M else N = N.

The reader may easily verify that when we eliminate syntactic sugar, the first equational axiom for Case yields if true then M else N = (K M) * = M, and similarly for the second equation. Other uses of sum types include variants and enumeration types, considered in Exercises 2.6.1 and 2.6.2.
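The desugaring above can be checked mechanically with a small Python sketch (ours, not the book's; note that Python evaluates both branches eagerly, so this checks only the equations, not PCF's lazy evaluation order):

```python
# bool as unit + unit, with the conditional desugared to Case applied to
# constant functions, following the text.

UNIT = '*'                      # the unique element of type unit

def inleft(x):  return ('left', x)
def inright(x): return ('right', x)

def case(s, f, g):
    tag, v = s
    return f(v) if tag == 'left' else g(v)

true  = inleft(UNIT)            # true  ==  Inleft *
false = inright(UNIT)           # false ==  Inright *

def K(x):
    # K x is the constant function \y: unit. x used in the desugaring.
    return lambda y: x

def if_then_else(m, n, p):
    # if M then N else P  ==  Case M (K N) (K P)
    return case(m, K(n), K(p))
```

Running `if_then_else(true, 1, 2)` exercises the first conditional axiom: Case (Inleft *) (K 1) (K 2) = (K 1) * = 1.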

Exercise 2.6.1  A variant type is a form of labeled sum, bearing the same relationship to sum types as records do to product types. A variant type defining a sum of types σ1, ..., σk is written [L1: σ1, ..., Lk: σk], where L1, ..., Lk are distinct syntactic labels. As for records, the order in which we write the label/type pairs does not matter. For example, [A: int, B: bool] is the same type as [B: bool, A: int]. Intuitively, an element of the variant type [L1: σ1, ..., Lk: σk] is an element of one of σ1, ..., σk, tagged with the corresponding label.

2.6.3 Recursive Types

Taking nat ≡ μt. unit + t, with up: (unit + nat) → nat and dn: nat → (unit + nat) witnessing the isomorphism, the numeral ⌈0⌉ may be defined as up(Inleft *). For n > 0, we may similarly define the numeral ⌈n⌉ by

⌈n⌉ ≡ up (Inright up (Inright ... up (Inleft *) ... ))

with n occurrences of Inright and n + 1 applications of up. The successor function just applies up and Inright:

succ ≡ λx: nat. up(Inright x)

We can check that this has the right type by noting that if x: nat, then Inright x belongs to unit + nat. Therefore, up(Inright x): nat. The reader may verify that succ ⌈n⌉ ↠ ⌈n + 1⌉ for any natural number n. The zero-test operation works by taking x: nat and applying dn to obtain an element of unit + nat, then using Case to see whether the result is an element of unit (i.e., equal to 0) or not.

zero? ≡ λx: nat. Case (dn x) (λy: unit. true) (λz: nat. false)

2.6

Variations and Extensions of PCF

129

The reader may easily check that zero? ⌈0⌉ = true and zero? ⌈n⌉ = false for n > 0. The final operation we need is predecessor. Recall that although there is no predecessor of 0, it is conventional to take pred 0 = 0. The predecessor function is similar to zero? since it works by taking x: nat and applying dn, then using Case to see whether the result is an element of unit (i.e., equal to 0) or not. If so, then the predecessor is 0; otherwise, the element of nat already obtained by applying dn is the predecessor.

pred ≡ λx: nat. Case (dn x) (λy: unit. ⌈0⌉) (λz: nat. z)
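The whole encoding of nat as μt. unit + t can be mirrored in Python, with up and dn rendered as trivial tag wrappers witnessing the isomorphism (a sketch under our own naming, not the book's):

```python
# nat = (mu t. unit + t):  up : unit + nat -> nat,  dn : nat -> unit + nat.

UNIT = '*'

def inleft(x):  return ('left', x)
def inright(x): return ('right', x)

def case(s, f, g):
    tag, v = s
    return f(v) if tag == 'left' else g(v)

def up(x): return ('up', x)
def dn(x): return x[1]

zero = up(inleft(UNIT))                 # the numeral 0

def succ(x):
    # succ = \x: nat. up(Inright x)
    return up(inright(x))

def is_zero(x):
    # zero? = \x: nat. Case (dn x) (\y: unit. true) (\z: nat. false)
    return case(dn(x), lambda y: True, lambda z: False)

def pred(x):
    # pred 0 = 0 by convention; otherwise the Inright component of dn x.
    return case(dn(x), lambda y: zero, lambda z: z)
```

With these definitions, `pred(succ(n)) == n` holds structurally for any encoded numeral, matching the equation the text asks the reader to verify.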

The reader may verify that for any x: nat we have pred(succ x) = x. (This is part of Exercise 2.6.5 below.)

A second illustration of the use of recursive types has a very different flavor. While most common uses of recursive types involve data structures, we may also write recursive definitions over any type using recursive types. More specifically, we may define a function fix_σ with the reduction property fix_σ f ↠ f(fix_σ f) using only typed lambda calculus with recursive types. The main idea is to use "self-application," which is not possible without recursive types. More specifically, if M is any PCF term, the application M M cannot be well-typed. The reason is simply that if M: τ, then τ does not have enough symbols to have the form τ → τ'. However, with recursive types, (dn M)M may be well-typed, since the type of M could have the form μt. t → τ'. We will define fix_σ using a term of the form dn M M. The main property we want for dn M M is that

dn M M ↠ λf: σ → σ. f(dn M M f).

This reduction gives us a good clue about the form of M. Specifically, we can let M have the form

M ≡ up(λx: τ. λf: σ → σ. f(dn x x f))

if we can find an appropriate type τ. Since we want dn M M: (σ → σ) → σ, the expression dn x x in the body of M must have type (σ → σ) → σ. Therefore, τ must satisfy the isomorphism

τ ≅ τ → (σ → σ) → σ.

We can easily achieve this by letting τ ≡ μt. t → (σ → σ) → σ. We leave it to the reader to verify that if we define fix_σ ≡ dn M M, or more simply

fix_σ ≡ (λx: τ. λf: σ → σ. f(dn x x f)) up(λx: τ. λf: σ → σ. f(dn x x f))

we have a well-typed expression with fix_σ ↠ λf: σ → σ. f(fix_σ f). The fixed-point operator we have just derived is a typed version of the fixed-point operator Θ from untyped lambda calculus [Bar84]. As outlined in Exercise 2.6.7 below, any untyped lambda term may be written in a typed way using recursive types.
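The self-application trick behind fix can be sketched directly in (untyped) Python, where up and dn become identities; the η-expansion in the body is our addition so the definition also terminates under Python's eager evaluation order:

```python
# fix via self-application:  M = up(\x. \f. f(dn x x f));  fix f = dn M M f.
# In untyped Python, up/dn are identities and M is just m below.

def fix(f):
    # The inner (lambda a: ...) delays the recursive unfolding; without it,
    # an eager evaluator would unfold x(x) forever before calling f.
    m = lambda x: lambda g: g(lambda a: x(x)(g)(a))
    return m(m)(f)

# Example: factorial defined without explicit recursion.
fact = fix(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```

Here `fact(5)` unfolds exactly as fix_σ F would in PCF, one recursive call at a time.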

Exercise 2.6.5  Using the definitions given in this section, show that we have the following reductions on natural number terms.

(a) For every natural number n, we have succ ⌈n⌉ ↠ ⌈n + 1⌉.

(b) zero? ⌈0⌉ ↠ true and zero? ⌈n⌉ ↠ false for n > 0.

(c) If x is a natural number variable, then pred (succ x) ↠ x.

Exercise 2.6.6  We may define the type of lists of natural numbers using the following recursive type:

list ≡ μt. unit + (nat × t)

The intuition behind this definition is that a list is either empty, which we represent using the element *: unit, or may be regarded as a pair consisting of a natural number and a list.

(a) Define the empty list nil: list and the function cons: nat × list → list that adds a natural number to a list.

(b) The function car returns the first element of a list and the function cdr returns the list following the first element. We can define versions of these functions that make sense when applied to the empty list by giving them the types

car : list → unit + nat
cdr : list → unit + list

and establishing the convention that when applied to the empty list, each returns Inleft *. Write terms defining car and cdr functions so that the following equations are provable:

car nil = Inleft *            cdr nil = Inleft *
car (cons x ℓ) = Inright x    cdr (cons x ℓ) = Inright ℓ

for x: nat and ℓ: list.

(c) Using recursion, it is possible to define "infinite" lists of type list. Show how to use fix to define a list ℓ_n for any numeral ⌈n⌉, such that car ℓ_n = Inright ⌈n⌉ and cdr ℓ_n = Inright ℓ_{n+1}.


Exercise 2.6.7  The terms of pure untyped lambda calculus (without constant symbols) are given by the grammar

U ::= x | U U | λx. U

The main equational properties of untyped lambda terms are untyped versions of (α), (β) and (η). We may define untyped lambda terms in typed lambda calculus with recursive types using the type μt. t → t. Intuitively, the reason this type works is that terms of this type may be used as functions on this type, and conversely. This is exactly what we need to make sense of the untyped lambda calculus.

(a) Show that if we give every free or bound variable type μt. t → t, we may translate any untyped term into a typed lambda term of type μt. t → t by inserting up, dn, and type designations on lambda abstractions. Verify that untyped (α) follows from typed (α), untyped (β) follows from dn(up M) = M and typed (β) and, finally, untyped (η) follows from typed (η) and up(dn M) = M.

(b) Without recursive types, every typed lambda term without constants has a unique normal form. Show that by translating (λx. xx)(λx. xx) into typed lambda calculus, we can use recursive types to write terms with no normal form.

(c) A well-known term in untyped lambda calculus is the fixed-point operator

Y ≡ λf. (λx. f(xx))(λx. f(xx)).

Unlike Θ, this term is only an equational fixed-point operator, i.e., Yf = f(Yf) but Yf does not reduce to f(Yf). Use a variant of the translation given in part (a) to obtain a typed equational fixed-point operator from Y. Your typed term should have type (σ → σ) → σ. Show that although Yf does not reduce to f(Yf), Yf reduces to a term M_f such that M_f ↠ f M_f.
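The distinction in part (c) between Y and a variant that actually reduces has a direct analogue in any eager language: the naive Y loops, while its η-expanded variant (commonly called Z) works. A Python sketch, with all names ours:

```python
# Z = \f. (\x. f(\a. x(x)(a))) (\x. f(\a. x(x)(a)))
# The eta-expansion (\a. x(x)(a)) delays x(x), so Z terminates under eager
# evaluation; the naive Y = \f.(\x. f(x x))(\x. f(x x)) would evaluate
# x(x) immediately and recurse forever before ever calling f.

def Z(f):
    w = lambda x: f(lambda a: x(x)(a))
    return w(w)

# Example: a recursive function obtained without explicit recursion.
double_all = Z(lambda rec: lambda xs: [] if not xs else [2 * xs[0]] + rec(xs[1:]))
```

This mirrors the exercise's point: Y itself only satisfies the fixed-point equation, while a slightly rearranged term reduces step by step to its own unfolding.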

Exercise 2.6.8  The programming language ML has a form of recursive type declaration, called datatype, that combines sums and type recursion. In general, a datatype declaration has the form

datatype t = L1 of σ1 | ... | Lk of σk

where the syntactic labels L1, ..., Lk are used in the same way as in variant types (labeled sums). Intuitively, this declaration defines a type t that is the sum of σ1, ..., σk, with labels L1, ..., Lk used as injection functions as in Exercise 2.6.1. If the declared type t occurs in some σi, then this is a recursive declaration of type t. Note that the vertical bar | is part of the syntax of ML, not the meta-language we are using to describe ML. A similar exercise on ML datatype declarations, using algebraic specifications instead of recursive types and sums, appears in Chapter 3 (Exercise 3.6.4).

The type of binary trees, with natural numbers at the leaves, may be declared using datatype by

datatype tree = leaf of nat | node of tree × tree.

Using the notation for variants given in Exercise 2.6.1, this declares a type tree satisfying

tree ≅ [leaf: nat, node: tree × tree]

(a) Explain how to regard a datatype declaration as a recursive type expression

of the form μt. [...] using variants as in Exercise 2.6.1. Illustrate your general method on the type of trees above.

(b) A convenient feature of ML is the way that pattern matching may be used to define functions over declared datatypes. A function over trees may be declared using two "clauses," one for each form of tree. In our variant of ML syntax, a function over trees defined by pattern matching will have the form

letrec fun f(leaf(x: nat)) = M
         | f(node(t1: tree, t2: tree)) = N
in P

where variables x: nat and f: tree → σ are bound in M, variables t1, t2: tree and f are bound in N, and f is bound in the declaration body P. The compiler checks to make sure there is one clause for each constructor of the datatype. For example, we may declare a function that sums up the values of all the leaves as follows.

letrec fun f(leaf(x: nat)) = x
         | f(node(t1: tree, t2: tree)) = f(t1) + f(t2)

Explain how to regard a function definition with pattern matching as a function with Case and illustrate your general method on the function that sums up the values of all the leaves of a tree.

2.6.4 Lifted Types
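The compilation of pattern-matching clauses into Case that Exercise 2.6.8(b) asks for can be sketched in Python (our rendering of the tree variant, not ML itself):

```python
# tree as the labeled sum [leaf: nat, node: tree x tree].

def leaf(n):      return ('leaf', n)
def node(t1, t2): return ('node', (t1, t2))

def tree_case(t, on_leaf, on_node):
    # Case for the variant: dispatch on the constructor label.
    tag, v = t
    return on_leaf(v) if tag == 'leaf' else on_node(v[0], v[1])

def sum_leaves(t):
    # The two pattern-matching clauses become the two branches of Case:
    #   f(leaf(x))       = x
    #   f(node(t1, t2))  = f(t1) + f(t2)
    return tree_case(t,
                     lambda x: x,
                     lambda t1, t2: sum_leaves(t1) + sum_leaves(t2))
```

Each clause of the ML definition corresponds to one functional argument of `tree_case`, which is the general method the exercise has in mind.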

Lifted types may be used to distinguish possibly nonterminating computations from ones that are guaranteed to terminate. We can see how lifted types could have been used in PCF by recalling that every term that does not contain fix has a normal form. If we combine basic boolean and natural number expressions, pairs and functions, we therefore have a language in which every term has a normal form. Let us call this language PCF0. In Section 2.2.5, we completed the definition of PCF by adding fix to PCF0. In the process, we extended the set of terms of each type in a nontrivial way. For example, while every PCF0 term of type nat reduces to one of the numerals 0, 1, 2, ..., PCF contains terms such as fix(λx: nat. x) of type nat that do not have any normal form and are not provably equal to any numeral. This means that if we want to associate a mathematical value with every PCF term, we cannot think of the values of type nat as simply the natural numbers. We must consider some superset of the natural numbers that contains values for all the terms without normal form. (Since it is very reasonable to give all nat terms without normal form the same value, we only need one new value, representing nontermination, in addition to the usual natural numbers.)

An alternative way of extending PCF0 with a fixed-point operator is to use lifted types. Although lifted types are sometimes syntactically cumbersome when it comes to writing common functions or programs, lifted types have some theoretical advantages since they let us clearly identify the sources of nontermination, or "undefinedness," in expressions. One reason to use lifted types is that many equational axioms that are inconsistent with fix may be safely combined with lifted types. One example, which is related to the inconsistency of sum types and fix outlined in Exercise 2.6.4, is the pair of equations

Eq? x x = true,
Eq? x (⌈n⌉ + x) = false,    for any numeral ⌈n⌉ different from 0.

While these equations make good sense for the ordinary natural numbers, it is inconsistent to add them to PCF. To see this, the reader may substitute fix(λx: nat. 1 + x) for x in both equations and derive true = false. In contrast, it follows from the semantic construction in Section 5.2 that we may consistently add these two equations to the variant of PCF with lifted types (see Example 2.6.10).

Another important reason to consider lifted types is that this provides an interesting and insightful way to incorporate alternative reduction orders into a single system. In particular, as we shall see below, both ordinary (nondeterministic or leftmost) PCF and Eager PCF (defined in Section 2.4.5) can be expressed in PCF with lifted types. This has an interesting consequence for equational reasoning about eager reduction. The equational theory of Eager PCF is not given in Section 2.4.5 since it is relatively subtle and has a different form from the equational theory of PCF. An advantage of lifted types is that we may essentially express eager reduction, and at the same time allow ordinary equational reasoning. In particular, equational axioms may be used to "optimize" a term without changing its normal form. This follows from confluence of PCF with lifted types [How92]. These advantages notwithstanding, the use of lifted types in programming language analysis is a relatively recent development and some syntactic and semantic issues remain unresolved. Consequently, this section will be a high-level and somewhat informal overview.


The types of the language PCF⊥, called "PCF with lifted types," are given by the grammar

σ ::= nat | bool | σ × σ | σ → σ | σ⊥

i.e., the types of PCF, plus the form σ⊥, which is read "σ lifted." Intuitively, the elements of σ⊥ are the same as the elements of σ, plus the possibility of "nontermination" or "divergence." Mathematically, we can interpret σ⊥ as the union of σ and one extra element, written ⊥σ, which serves as the value of any "divergent" term that is not equal to an element of type σ.

The terms of PCF⊥ include all of the general forms of PCF, extended to PCF⊥ types. Specifically, we have pairing, projection, lambda abstraction and application for all product and function types. Since we can use if ... then ... else ... on all types in PCF, we extend this to all PCF⊥ types. More specifically, if M then N else P is well-formed whenever M: bool and N, P: σ, for any σ. However, the basic natural number and boolean functions of PCF (numerals, boolean constants, addition and equality test) keep the same types as in PCF. For example, 5 + x is a nat expression of PCF⊥ if x: nat; it is not an expression of type nat⊥. The equational and reduction axioms for all of these operations are the same as for PCF.

The language PCF⊥ also has two operations associated with lifted types and a restricted fixed-point operator. Since both involve a class of types called the "pointed" types, we discuss these before giving the remaining basic operations of the language. Intuitively, the pointed types of PCF⊥ are the types that contain terms which do not yield any meaningful computational value. More precisely, we can say that a closed term M: σ is noninformative if, for every context C[ ] of observable type, if C[M] has a normal form, then C[N] has the same normal form for every closed N: σ. (An example noninformative term is fix(λx: σ. x); see Lemma 2.5.24.) Formally, we say a type σ is pointed if either σ ≡ τ⊥, or σ ≡ σ1 × σ2 and both σ1, σ2 are pointed, or σ ≡ σ1 → σ2 and σ2 is pointed. The reason τ⊥ has noninformative terms will become apparent from the type of the fixed-point operator. For a product σ1 × σ2, we obtain a noninformative term by combining noninformative terms of types σ1 and σ2. If M: σ2 is noninformative, and x: σ1 is not free in M, then λx: σ1. M is a noninformative term of type σ1 → σ2. The name "pointed," which is not very descriptive, comes from the fact that a pointed type has a distinguished element (or "point") representing the value of any noninformative term. These are the types at which recursion makes sense, and the language PCF⊥ has a fixed-point constant for each pointed type:

fix_σ : (σ → σ) → σ,    σ pointed

The equational axiom and reduction axiom for fix are as in PCF.


The first term form associated with lifted types is lifting: if M: σ, then

⌊M⌋ : σ⊥    (⌊ ⌋ Intro)

Intuitively, since the elements of σ are included in σ⊥, the term ⌊M⌋ defines an element of σ, considered as an element of σ⊥. The second term form associated with lifting allows us to evaluate expressions of type σ⊥ to obtain expressions of type σ. Since some terms of type σ⊥ do not define elements of σ, this is formalized in a careful way. If M: σ⊥ and N: τ, with τ pointed and N possibly containing a variable x of type σ, then

let ⌊x: σ⌋ = M in N    (⌊ ⌋ Elim)

is a term of type τ. The reader familiar with ML may recognize this syntax as a form of "pattern-matching" let. Intuitively, the intent of let ⌊x: σ⌋ = M in N is that we "hope" that M is equal to ⌊M'⌋ for some M'. If so, then the value of this expression is the value of N when x has the value of M'. The main equational axiom for ⌊·⌋ and let is

let ⌊x: σ⌋ = ⌊M⌋ in N = [M/x]N    (let ⌊ ⌋)

This is also used as a reduction axiom, read left to right. Intuitively, the value of M: σ⊥ is either ⊥ or an element of σ. If M = ⊥, then let ⌊x: σ⌋ = M in N will have value ⊥. In operational terms, if M cannot be reduced to the form ⌊M'⌋, then let ⌊x: σ⌋ = M in N cannot be reduced to a useful form of type τ. If M is equal to some ⌊M'⌋, then the value of let ⌊x: σ⌋ = M in N is equal to [M'/x]N.

If we add a constant ⊥_τ representing "undefinedness" or nontermination at each pointed type τ, we can also state some other equational properties:

let ⌊x⌋ = ⊥_{σ⊥} in M = ⊥_τ

M ⌊x⌋ = N ⌊x⌋
—————————————    x not free in M, N : σ⊥ → τ
M = N

Intuitively, the first equation reflects the fact that if we cannot reduce M to the form ⌊M'⌋, then let ⌊x⌋ = M in N is "undefined." The inference rule is a form of reasoning by cases, based on the intuitive idea that any element of a lifted type σ⊥ is either the result of lifting an element of type σ or the value ⊥_σ representing "undefinedness" or nontermination at type σ. However, we will not use constants of the form ⊥_τ or either of these properties in the remainder of this section. In general, they are not especially useful for proving equations between expressions unless we have a more powerful proof system (such as the one using fixed-point induction in Section 5.3) for proving that terms are equal to ⊥.
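A lifted type behaves much like an option type in which the "missing" case stands for nontermination. The sketch below (our modeling choice, not the book's) renders σ⊥ as either a tagged value or a bottom marker, with let-elimination propagating bottom:

```python
# sigma_bot modeled with None playing bottom and ('lifted', v) playing |v|.

BOTTOM = None

def lift(v):
    # |M| : sigma_bot        -- the (|_| Intro) form
    return ('lifted', v)

def let_lifted(m, n):
    # let |x| = M in N : if M is bottom the whole term is bottom
    # (the "undefined" equation); otherwise substitute the lifted
    # value into N, matching the axiom (let |_|).
    if m is BOTTOM:
        return BOTTOM
    return n(m[1])
```

For example, `let_lifted(lift(3), lambda x: lift(x + 1))` reduces to `lift(4)` by (let ⌊ ⌋), while binding BOTTOM yields BOTTOM without ever running the body.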

The Language PCF


We will illustrate the use of lifted types by translating both PCF and Eager PCF into PCF⊥. A number of interesting relationships between lifted types and sums and recursive types are beyond the scope of this section but may be found in [How92], for example. Before proceeding, we consider some simple examples.

Example 2.6.9 While there is only one natural way to write factorial in PCF, there are some choices with explicit lifting. This example shows two essentially equivalent ways of writing factorial. For simplicity, we assume we have natural number subtraction and multiplication operations so that if M, N: nat, then M − N: nat and M * N: nat. Using these, we can write a factorial function as fact₁ ≡ fix F₁, where

F₁ ≡ λf: nat → nat⊥. λx: nat. if Eq? x 0 then ⌊1⌋ else let ⌊y⌋ = f(x − 1) in ⌊x * y⌋

In the definition here, the type of fact₁ is nat → nat⊥. This is a pointed type, which is important since we only have fixed points at pointed types. This type for fact₁ is also the type of the lambda-bound variable f in F₁. Since f: nat → nat⊥, we can apply f to arguments of type nat. However, the result f(x − 1) of calling f will be an element of nat⊥, and therefore we must use let to extract the natural number value from the recursive call before multiplying by x. If we wish to apply fact₁ to an argument M of type nat⊥, we do so by writing

let ⌊x⌋ = M in fact₁ x

An alternative definition of factorial is fact₂ ≡ fix F₂, where

F₂ ≡ λf: nat⊥ → nat⊥. λx: nat⊥. let ⌊z⌋ = x in if Eq? z 0 then ⌊1⌋ else let ⌊y⌋ = f⌊z − 1⌋ in ⌊z * y⌋

Here, the type of factorial is different since the domain of the function is nat⊥ instead of nat. The consequence is that we could apply fact₂ to a possibly nonterminating expression. However, since the first thing fact₂ does with an argument x: nat⊥ is force it to some value z: nat, there is not much flexibility gained in this definition of factorial. In particular, as illustrated below, both functions calculate 3! in essentially the same way. The reason fact₂ immediately forces x: nat⊥ to some natural number z: nat is that the equality test Eq? requires a nat argument, instead of a nat⊥. For the first definition of factorial, we can compute 3! by expanding out three recursive calls and simplifying as follows:

(fix F₁) 3

2.6 Variations and Extensions of PCF

  → if Eq? 3 0 then ⌊1⌋ else let ⌊y⌋ = (fix F₁)(3 − 1) in ⌊3 * y⌋
  ↠ let ⌊y⌋ = (fix F₁) 2 in ⌊3 * y⌋
  ↠ let ⌊y⌋ = (let ⌊y'⌋ = (fix F₁) 1 in ⌊2 * y'⌋) in ⌊3 * y⌋
  ↠ let ⌊y⌋ = (let ⌊y'⌋ = (let ⌊y''⌋ = (fix F₁) 0 in ⌊1 * y''⌋) in ⌊2 * y'⌋) in ⌊3 * y⌋

Then, once we have reduced (fix F₁) 0 to ⌊1⌋, we can set y'' to 1, then y' to 1, then y to 2, then perform the final multiplication and produce the final value ⌊6⌋. For the alternative definition, we have essentially the same expansion, beginning with ⌊3⌋ instead of 3 for typing reasons:

(fix F₂) ⌊3⌋

  ↠ let ⌊z⌋ = ⌊3⌋ in if Eq? z 0 then ⌊1⌋ else let ⌊y⌋ = (fix F₂)⌊z − 1⌋ in ⌊z * y⌋
  ↠ let ⌊y⌋ = (let ⌊y'⌋ = (let ⌊y''⌋ = (fix F₂)⌊0⌋ in ⌊1 * y''⌋) in ⌊2 * y'⌋) in ⌊3 * y⌋

The only difference here is that in each recursive call, the argument is lifted to match the type of the function. However, the body of F₂ immediately uses let to "unlift" the argument, so the net effect is the same as for fact₁.
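Under an illustrative Python encoding of nat⊥ as "None or int" (None standing for the bottom element, an int v standing for ⌊v⌋; the encoding and names are ours), the two factorial definitions can be sketched as follows:

```python
# Illustrative Python sketch of fact1 and fact2 from Example 2.6.9.

def let_lifted(m, body):
    return None if m is None else body(m)

def fact1(x):
    """fact1 : nat -> nat_bot.  The recursive call yields a lifted value,
    so it must be unlifted with let before multiplying by x."""
    if x == 0:
        return 1                                        # ⌊1⌋
    return let_lifted(fact1(x - 1), lambda y: x * y)    # let ⌊y⌋ = f(x-1) in ⌊x*y⌋

def fact2(x_lifted):
    """fact2 : nat_bot -> nat_bot.  First forces its argument to a value z."""
    return let_lifted(x_lifted, lambda z:
        1 if z == 0 else let_lifted(fact2(z - 1), lambda y: z * y))

assert fact1(3) == 6
assert fact2(3) == 6          # applying fact2 to ⌊3⌋
assert fact2(None) is None    # applying fact2 to a "nonterminating" argument
```

The last assertion shows the only behavioral difference: fact2 can be applied to a possibly nonterminating argument, in which case the result is bottom.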

Example 2.6.10 As mentioned above, the equations

Eq? x x = true,    Eq? x (n + x) = false    (n a numeral different from 0)

are consistent with PCF⊥, when x: nat is a natural number variable, but inconsistent when added to PCF. We will see how these equations could be used in PCF⊥ by considering the PCF expression

letrec f(x:nat):nat = if Eq? x 0 then 1
                      else if Eq? f(x − 1) f(x − 1) then 2 else 3
in f 3


If we apply left-most reduction to this expression, we end up with an expression of the form

if Eq? ((fix F)(3 − 1)) ((fix F)(3 − 1)) then 2 else 3

which requires us to simplify two applications of the recursive function in order to determine that f(2) = f(2). We might like to apply some "optimization" which says that whenever we have a subexpression of the form Eq? M M we can simplify this to true. However, the inconsistency of the two equations above shows that this optimization cannot be combined with other reasonable optimizations of this form. We cannot eliminate the need to compute at least one recursive call, but we can eliminate the equality test using lifted types. A natural and routine translation of this expression into PCF⊥ is

letrec f(x:nat):nat⊥ = if Eq? x 0 then ⌊1⌋
                       else let ⌊y⌋ = f(x − 1) in if Eq? y y then ⌊2⌋ else ⌊3⌋
in f 3

In this form, we have a subexpression of the form Eq? y y where y: nat. With the equations above, we can replace this PCF⊥ expression by an equivalent one, obtaining the provably equivalent expression

letrec f(x:nat):nat⊥ = if Eq? x 0 then ⌊1⌋
                       else let ⌊y⌋ = f(x − 1) in ⌊2⌋
in f 3

The reason we cannot expect to simplify let ⌊y⌋ = f(x − 1) in ⌊2⌋ to ⌊2⌋ by any local optimization that does not analyze the entire declaration of f is that when f(x − 1) does not terminate (i.e., cannot be reduced to the form ⌊M⌋), the two are not equivalent.
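The effect of this optimization can be sketched in the same illustrative None-for-bottom Python encoding (all names ours). The point is that once y is a genuine natural number, Eq? y y may be replaced by true, while the surrounding let must stay:

```python
# Illustrative sketch of the optimization in Example 2.6.10.

def let_lifted(m, body):
    return None if m is None else body(m)

def f_original(x):
    if x == 0:
        return 1
    return let_lifted(f_original(x - 1),
                      lambda y: 2 if y == y else 3)      # if Eq? y y then 2 else 3

def f_optimized(x):
    if x == 0:
        return 1
    return let_lifted(f_optimized(x - 1), lambda y: 2)   # Eq? y y replaced by true

assert f_original(3) == f_optimized(3) == 2
# Dropping the let as well would be wrong: if the recursive call were
# bottom, the whole expression must be bottom, not ⌊2⌋.
assert let_lifted(None, lambda y: 2) is None
```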

Translation of PCF into PCF⊥

The translation of PCF into PCF⊥ has two parts. The first is a translation of types. We will map expressions of type σ from PCF to expressions of type σ̄ in PCF⊥, where σ̄ is the result of replacing nat by nat⊥ and bool by bool⊥. More specifically, we may define σ̄ by induction on types:

bool‾ = bool⊥
nat‾ = nat⊥
(σ₁ → σ₂)‾ = σ̄₁ → σ̄₂
(σ₁ × σ₂)‾ = σ̄₁ × σ̄₂


It is easy to check that for any PCF type σ, the type σ̄ is pointed. This gives us a fixed-point operator of each type σ̄. The translation of compound PCF terms into PCF⊥ is essentially straightforward. The most interesting part of the translation lies in the basic natural number and boolean operations. If M: σ in PCF, we must find some suitable term M̄ of type σ̄ in PCF⊥. The translation of a numeral n is ⌊n⌋ and the translations of true and false are similarly ⌊true⌋ and ⌊false⌋. It is easy to see these have the right types. For each of the operations Eq?, + and conditional, we will write a lambda term of the appropriate PCF⊥ type. Since Eq? compares two natural number terms and produces a boolean, we will write a PCF⊥ term Eq?‾ of type nat⊥ → nat⊥ → bool⊥. The term for equality test is

Eq?‾ = λx: nat⊥. λy: nat⊥. let ⌊x'⌋ = x in let ⌊y'⌋ = y in ⌊Eq? x' y'⌋

An intuitive explanation of this function is that Eq?‾ takes two arguments of type nat⊥, evaluates them to obtain elements of type nat (if possible), and then compares the results for equality. If either argument does not reduce to the form ⌊M⌋ for M: nat, then the corresponding let cannot be reduced. But if both arguments do reduce to the form ⌊M⌋, then we are certain to be able to reduce both to numerals, and hence we will be able to obtain ⌊true⌋ or ⌊false⌋ by the reduction axiom for Eq?. The reader is encouraged to try a few examples in Exercise 2.6.14. The terms +‾ for addition and cond‾ for if ... then ... else ... are similar in spirit to Eq?‾:

+‾ = λx: nat⊥. λy: nat⊥. let ⌊x'⌋ = x in let ⌊y'⌋ = y in ⌊x' + y'⌋

cond‾ = λx: bool⊥. λy: σ̄. λz: σ̄. let ⌊x'⌋ = x in if x' then y else z

In the translation of conditional, it is important to notice that σ̄ is pointed. As a result, the function body let ⌊x'⌋ = x in if x' then y else z is well-typed. For pairing and lambda abstraction, we let

(M, N)‾ = (M̄, N̄)
(λx:σ. M)‾ = λx: σ̄. M̄

and similarly for application, projections and fixed points. Some examples appear in Exercises 2.6.14 and 2.6.16. The main properties of this translation are:

(i) If M: σ in PCF, then M̄: σ̄ in PCF⊥.


(ii) If M, N are syntactically distinct terms of PCF, then M̄, N̄ are syntactically distinct terms of PCF⊥.

(iii) If M → N in PCF, then M̄ → N̄ in PCF⊥. Conversely, if M is a normal form in PCF, then M̄ is a normal form in PCF⊥.

(iv) If M = N is provable in PCF, then M̄ = N̄ is provable in PCF⊥.

While there are a number of cases, it is essentially straightforward to verify these properties. It follows from properties (ii) and (iii) that if M, N are distinct normal forms of PCF, then M̄, N̄ are distinct normal forms of PCF⊥.
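For intuition, the lifted operation terms Eq?‾, +‾ and cond‾ used in this translation behave like the following Python combinators, in an illustrative encoding where None stands for bottom and any other value v stands for ⌊v⌋ (the encoding and all names are ours):

```python
# Illustrative Python rendering of the lifted operations used by the
# translation of PCF into PCF-bot.  None models bottom; v models ⌊v⌋.

def let_lifted(m, body):
    return None if m is None else body(m)

def eq_lifted(x, y):
    """Eq?-bar: evaluate both arguments to nats, then compare."""
    return let_lifted(x, lambda a: let_lifted(y, lambda b: a == b))

def plus_lifted(x, y):
    """+-bar: evaluate both arguments to nats, then add."""
    return let_lifted(x, lambda a: let_lifted(y, lambda b: a + b))

def cond_lifted(b, y, z):
    """cond-bar: strict in the boolean test only, not in the branches."""
    return let_lifted(b, lambda t: y if t else z)

assert eq_lifted(3, 3) is True
assert eq_lifted(3, 4) is False
assert plus_lifted(2, None) is None       # a bottom argument poisons the sum
assert cond_lifted(True, 1, None) == 1    # the untaken branch may be bottom
```

Note how the conditional is strict only in its test, which mirrors the fact that cond‾ forces its boolean argument but passes the branches through unevaluated.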

Translation of Eager PCF into PCF⊥

Like the translation of PCF into PCF⊥, the translation of Eager PCF into PCF⊥ has two parts. We begin with the translation of types. For any type σ of Eager PCF, we define the associated type σ̄. Intuitively, σ̄ is the type of Eager PCF values (in the technical sense of Section 2.4.5) of type σ. Since any Eager PCF term will either reduce to a value, or diverge under eager reduction, an Eager PCF term of type σ will be translated to a PCF⊥ term of type σ̄⊥. After defining σ̄ by induction on the structure of type expressions, we give an informal explanation.

nat‾ = nat
bool‾ = bool
(σ → τ)‾ = σ̄ → τ̄⊥
(σ × τ)‾ = σ̄ × τ̄

The Eager PCF values of type nat or bool correspond to PCF⊥ terms of type nat or bool. An Eager PCF function value of type σ → τ has the form λx: σ. M, where for any argument V, the body [V/x]M may either reduce to a value of type τ or diverge under eager reduction. Therefore, in PCF⊥, the values of type σ → τ are functions from σ̄ to τ̄⊥. Finally, the Eager PCF values of type σ × τ are pairs of values, and therefore have type σ̄ × τ̄ in PCF⊥. We translate a term M: σ of Eager PCF with free variables x₁: σ₁, ..., xₖ: σₖ to a term M̄: σ̄⊥ of PCF⊥ with free variables x₁: σ̄₁, ..., xₖ: σ̄ₖ as follows:

x̄ = ⌊x⌋
n̄ = ⌊n⌋
true‾ = ⌊true⌋
false‾ = ⌊false⌋
(M + N)‾ = let ⌊x⌋ = M̄ in let ⌊y⌋ = N̄ in ⌊x + y⌋
(Eq? M N)‾ = let ⌊x⌋ = M̄ in let ⌊y⌋ = N̄ in ⌊Eq? x y⌋
(if M then N else P)‾ = let ⌊x⌋ = M̄ in if x then N̄ else P̄
(M N)‾ = let ⌊f⌋ = M̄ in let ⌊x⌋ = N̄ in f x
(λx:σ. M)‾ = ⌊λx: σ̄. M̄⌋
fix‾ = ⌊fix (λf: ((σ̄ → τ̄⊥) → (σ̄ → τ̄⊥)) → (σ̄ → τ̄⊥). λg: (σ̄ → τ̄⊥) → (σ̄ → τ̄⊥). g(λx: σ̄. let ⌊h⌋ = f g in h x))⌋

The translation of the fixed-point operator needs some explanation. As discussed in Section 2.4.5, Eager PCF only has fixed-point operators on function types. The translation of the fixed-point operator on type σ → τ is recursively defined to satisfy

fix‾ = ⌊λg: (σ̄ → τ̄⊥) → (σ̄ → τ̄⊥). g(λx: σ̄. let ⌊h⌋ = fix F g in h x)⌋

where fix F abbreviates the unlifted term inside ⌊ ⌋ in the definition of fix‾ above. This is essentially the Eager PCF property fix = λg: (...). g(delay[fix g]) from Section 2.4.5, with delay[M] ≡ λx: (...). M x, using the fact that

let ⌊f⌋ = M in let ⌊y⌋ = ⌊x⌋ in f y = let ⌊f⌋ = M in f x

by axiom (let ⌊ ⌋). In the next two examples, we discuss the way that eager reduction is preserved by this translation. A general difference between the eager reduction strategy given in Section 2.4.5 and reduction in PCF⊥ is that eager reduction is deterministic. In particular, eager reduction of an application MN specifies reduction of M to a value (e.g., λx: (...). M') before reducing N to a value. However, from the point of view of accurately representing the termination behavior and values of any expression, the order of evaluation between M and N is not significant. When an application MN of Eager PCF is translated into PCF⊥, it will be necessary to reduce both to the form ⌊...⌋, intuitively corresponding to the translation of an Eager PCF value, before performing β-reduction. However, PCF⊥ allows either the function M or the argument N to be reduced first.
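In the same None-for-bottom Python sketch, the translation of application, let ⌊f⌋ = M̄ in let ⌊x⌋ = N̄ in f x, forces both the function and the argument to lifted-value form before the underlying application can fire (the encoding and names are ours):

```python
# Illustrative Python model of the eager-application translation:
# both the function part and the argument part must produce a lifted
# value (here: a non-None result) before the application fires.

def let_lifted(m, body):
    return None if m is None else body(m)

def eager_apply(m, n):
    """let ⌊f⌋ = m in let ⌊x⌋ = n in f x"""
    return let_lifted(m, lambda f: let_lifted(n, lambda x: f(x)))

succ_bar = lambda x: x + 1   # stands in for the translation of λx:nat. x+1

assert eager_apply(succ_bar, 2) == 3
assert eager_apply(succ_bar, None) is None  # divergent argument: no value
assert eager_apply(None, 2) is None         # divergent function: no value
```

Either None poisons the result, which is the point made in the text: the order in which the two parts are forced does not affect the outcome.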

Example 2.6.11 We may apply this translation to the term considered in Example 2.4.20. Since this will help preserve the structure of the expression, we use the syntactic sugar

M • N ≡ let ⌊f⌋ = M in let ⌊x⌋ = N in f x


for eager application. For readability, we will also use the simplification

let ⌊x⌋ = ⌊u⌋ in (let ⌊y⌋ = ⌊v⌋ in ⌊x + y⌋) = ⌊u + v⌋

for any variables or constants u and v, since this consequence of our equational axioms does not change the behavior of the resulting term. Applying the translation to the term

(fix (λx: nat → nat. λy: nat. y)) ((λz: nat. z + 1) 2)

gives us

(fix‾ • ⌊λx: nat → nat⊥. ⌊λy: nat. ⌊y⌋⌋⌋) • (⌊λz: nat. ⌊z + 1⌋⌋ • ⌊2⌋)

A useful general observation is that

⌊M⌋ • ⌊N⌋ ↠ M N

but that we cannot simplify a form M' • N' unless M' has the form ⌊M⌋ and N' has the form ⌊N⌋. The need to produce "lifted values" (or, more precisely, the lifting of terms that reduce to values) is important in expressing eager reduction with lifted types. We can see that there are two possible reductions of the term above, one involving fix‾ and the other (⌊λz: nat. ⌊z + 1⌋⌋ • ⌊2⌋) ↠ ⌊2 + 1⌋. The reader may enjoy leftmost reducing the entire expression and seeing how the steps correspond to the eager reduction steps carried out in Example 2.4.20.

Example 2.6.12 We can see how divergence is preserved by considering the term

let f(x:nat):nat = 3 in letrec g(x:nat):nat = g(x + 1) in f(g 5)

from Example 2.4.21. After replacing let's by lambda abstraction and application, the translation of this term is

⌊λf: nat → nat⊥. ⌊λg: nat → nat⊥. ⌊f⌋ • (g 5)⌋⌋ • ⌊λx: nat. ⌊3⌋⌋ • (fix‾ • ⌊λg: nat → nat⊥. ⌊λx: nat. g(x + 1)⌋⌋)

where for simplicity we have used ⌊g⌋ • ⌊5⌋ = g 5 and ⌊g⌋ • ⌊x + 1⌋ = g(x + 1). Recall from Examples 2.4.6 and 2.4.21 that if we apply leftmost reduction to the original term, we obtain 3, while eager reduction does not produce a normal form. We can see how the translation of Eager PCF into PCF⊥ preserves the absence of normal forms by leftmost-reducing the result of this translation. Since the first argument and the main function have the form ⌊...⌋, we can apply β-reduction as described in Example 2.6.11. This gives us


⌊λg: nat → nat⊥. ⌊λx: nat. ⌊3⌋⌋ • (g 5)⌋ • (fix‾ • ⌊G⌋)

where G ≡ λg: nat → nat⊥. ⌊λx: nat. g(x + 1)⌋. Our task is to reduce the argument (fix‾ • ⌊G⌋) to the form ⌊...⌋ before performing the leftmost β-reduction. We do this using the fact that

fix‾ ≡ ⌊fix F⌋ ↠ ⌊λg: (nat → nat⊥) → (nat → nat⊥). g(λx: nat. let ⌊h⌋ = fix F g in h x)⌋

where

F ≡ λf: ((nat → nat⊥) → (nat → nat⊥)) → (nat → nat⊥). λg: (nat → nat⊥) → (nat → nat⊥). g(λx: nat. let ⌊h⌋ = f g in h x)

However, we will see in the process that the application to 5 resulting from β-reduction cannot be reduced to the form ⌊...⌋, keeping us from ever reaching the normal form ⌊3⌋ for the entire term. The expansion of fix‾ above gives us

fix F G
  ↠ G(λx: nat. let ⌊h⌋ = fix F G in h x)
  ↠ ⌊λx: nat. ((λx: nat. let ⌊h⌋ = fix F G in h x)(x + 1))⌋
  ↠ ⌊λx: nat. let ⌊h⌋ = fix F G in h(x + 1)⌋

When the function inside the ⌊ ⌋ is applied to 5, the result must be reduced to a form ⌊...⌋ to complete the reduction of the entire term. However, this leads us to again reduce fix F G to this form and apply the result to 6. Since this process may be continued indefinitely, the entire term does not have a normal form. The main properties of the translation from Eager PCF into PCF⊥ are:
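The divergence argument of this example can be mimicked in the None-for-bottom Python sketch by giving the nonterminating function g a step bound, after which it returns the bottom token (everything here is our illustrative encoding, not the text's formal system):

```python
# f ignores its argument, but the eager translation still forces the
# argument first; a "nonterminating" g therefore makes the whole term
# bottom.  We approximate nontermination with a fuel counter.

def let_lifted(m, body):
    return None if m is None else body(m)

def g(x, fuel):
    """letrec g(x:nat):nat = g(x+1), cut off after `fuel` steps."""
    if fuel == 0:
        return None          # stands for nontermination
    return g(x + 1, fuel - 1)

def f_bar(x_lifted):
    """Eager translation of f(x:nat) = 3: strict in its argument."""
    return let_lifted(x_lifted, lambda _: 3)

assert g(5, 500) is None            # g never reaches a value
assert f_bar(g(5, 500)) is None     # eager: f(g 5) is bottom
assert f_bar(3) == 3                # with a terminating argument, f gives 3
```

By contrast, a non-strict f that simply ignored its argument token would return 3, which corresponds to the leftmost PCF reduction of the original term.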

(i) If M: σ in Eager PCF, with free variables x₁: σ₁, ..., xₖ: σₖ, then M̄: σ̄⊥ in PCF⊥ with free variables x₁: σ̄₁, ..., xₖ: σ̄ₖ.

(ii) If M → N in Eager PCF, then M̄ ↠ N̄ in PCF⊥.

(iii) If M is a closed term of Eager PCF of type nat or bool and c is a numeral or true or false, then M ↠ c in Eager PCF iff M̄ ↠ ⌊c⌋ in PCF⊥.

(iv) If we let =eager be operational equivalence of Eager PCF terms, defined in the same way as in Section 2.3.5 but using eager evaluation, and let =PCF⊥ be operational equivalence in PCF⊥, then for closed terms M, N we have M =eager N iff M̄ =PCF⊥ N̄.

It is largely straightforward to verify properties (i)—(iii). Property (iv), however, involves


construction of fully abstract models satisfying the properties described in Section 5.4.2. A proof has been developed by R. Viswanathan.

Exercise 2.6.13 Suppose that instead of having fix_σ for every pointed σ, we only have fixed-point operators for types of the form τ⊥. Show that it is still possible to write a term with no normal form of each pointed type.

Exercise 2.6.14 Reduce the application of Eq?‾ to the following pairs of terms. You should continue until you obtain ⌊true⌋ or ⌊false⌋ or it is apparent that one of the let's can never be reduced.

(a) ⌊3 + 2⌋ and ⌊5⌋.

(b) … and ⌊(λx: nat. x + 1) 3⌋.

(c) … and ⌊19⌋.

Exercise 2.6.15 Carry out the reduction of the term fix‾ • ⌊λx: nat → nat⊥. ⌊λy: nat. ⌊y⌋⌋⌋ from Example 2.6.11.

Exercise 2.6.16 The fibonacci function may be written in PCF as fib ≡ fix F, where

F ≡ λf: nat → nat. λx: nat. if (Eq? x 0) or (Eq? x 1) then 1 else f(x − 1) + f(x − 2)

We assume for simplicity that we have natural number subtraction and a boolean or function.

(a) Translate the definition from (ordinary) PCF into PCF⊥ and describe the reduction of fib 3 in approximately the same detail as Example 2.6.9. (Subtraction and or should be translated into PCF⊥ following the pattern given for + and Eq?. You may simplify ⌊M⌋ −‾ ⌊N⌋ to ⌊M − N⌋, or perform other simplifications that are similar to the ones used in Example 2.6.11.)

(b) Same as part (a), but use the translation from Eager PCF into PCF⊥.

Universal Algebra and Algebraic Data Types

3.1 Introduction

The language PCF may be viewed as the sum of three parts: pure typed lambda calculus with function and product types; the natural numbers and booleans; and fixed-point operators. If we replace the natural numbers and booleans by some other basic types, such as characters and strings, we obtain a language with a similar type structure (functions and pairs), but oriented towards computation on different kinds of data. If we wish to make this change to PCF, we must decide on a way to name basic characters and strings in terms, the way we have named natural numbers by 0, 1, 2, ..., and choose basic functions for manipulating characters and strings, in place of +, Eq? and if ... then ... else .... Then, we must write a set of equational axioms that characterize these functions. This would define the terms of the language and give us an axiomatic semantics. Once we have the syntax of expressions and axiomatic semantics, we can proceed to select reduction axioms, write programs, and/or investigate denotational semantics. An algebraic datatype consists of one or more sets of values, such as natural numbers, booleans, characters or strings, together with a collection of functions on these sets. A fundamental restriction on algebraic datatypes is that none of the functions may have function parameters; this is what "algebraic" means. An expository issue is that in this chapter, we use the standard terminology of universal algebra for the sets associated with an algebra. Apart from maintaining consistency with the literature on algebra, the main reason for doing so is to distinguish algebraic datatypes, which consist of sets and functions, from the sets alone. Therefore, basic "type" symbols such as nat, bool, char and string are called sorts when they are used in algebraic expressions. In algebraic datatype theory, the distinction between type and sort is that a type comes equipped with specific operations, while a sort does not.
This distinction is actually consistent with the terminology of Chapter 2, since function types and product types come equipped with specific operations, namely, lambda abstraction, application, pairing and projection. Universal algebra, also called equational logic, is a general mathematical framework that may be used to define and study algebraic datatypes. In universal algebra, the axiomatic semantics of an algebraic datatype is given by a set of equations between terms. The denotational semantics of an algebraic datatype involves structures called algebras, which consist of a collection of sets, one for each sort, and a collection of functions, one for each function symbol used in terms. The operational semantics of algebraic terms are given by directing algebraic equations. Traditionally, reduction axioms for algebraic terms are called rewrite rules. Some examples of datatypes that may be defined and studied using universal algebra are natural numbers, booleans, lists, finite sets, multisets, stacks, queues, and trees, each with various possible functions.


In this chapter, we will study universal algebra and its use in defining the kind of datatypes that commonly occur in programming. While the presentation of PCF in Chapter 2 centered on axiomatic and operational semantics, this chapter will be primarily concerned with the connection between axiomatic semantics (equations) and denotational semantics (algebras), with a separate section on operational semantics at the end of the chapter. In studying algebra, we will cover several topics that are common to most logical systems. The main topics of the chapter are:

• Algebraic terms and their interpretation in multi-sorted algebras,

• Equational (algebraic) specifications and the equational proof system,

• Soundness and completeness of the equational proof system (equivalence of axiomatic and denotational semantics),

• Homomorphisms and initiality,

• Introduction to the algebraic theory of datatypes,

• Rewrite rules (operational semantics) derived from algebraic specifications.

The first four topics provide a brief introduction to the mathematical system of universal algebra. The subsequent discussion of the algebraic theory of datatypes points out some of the differences between traditional concerns in mathematics and the use of algebraic datatypes in programming. In the final section of this chapter, we consider reduction on algebraic terms. This has two applications. The first is to analyze properties of equational specifications and the second is to model computation on algebraic terms. A pedagogical reason to study universal algebra before proceeding with the denotational semantics of typed lambda calculus is that many technical concepts appear here in a simpler and more accessible form. There are many additional topics in the algebraic theory of datatypes. The most important omitted topics are: hierarchical specifications, parameterized specifications, refinement of one specification into another, and correctness of implementations. While we discuss problems that arise when a function application produces an error, we do not consider some of the more sophisticated approaches to errors such as algebras with partial functions or order-sorted algebras. The reader interested in more detailed investigation may consult [EM85, Wir90].

3.2 Preview of Algebraic Specification

The algebraic approach to data abstraction involves specifying the behavior of each datatype by writing a set of equational axioms. A signature, which is a way of defining the syntax of terms, combined with a set of equational axioms, is called an algebraic specification. If a program using a set of specified datatypes is designed so that the correctness of the program depends only on the algebraic specification, then the primary concern of the datatype implementor is to satisfy the specification. In this way, the specification serves as a contract between the user and the implementor of a datatype; neither needs to worry about additional details of the other's program. This methodology is not specific to equational specifications, but equations are the simplest specification language that is expressive enough to be used for this purpose. The reader interested in specification and software development may wish to consult [BJ82, GHM78, LG86, Mor90, Spi88], for example. When we view a set of equations as a specification, we may regard an algebra, which consists of a set of values for each sort and a function for each function symbol, as the mathematical abstraction of an implementation. In general, a specification may be satisfied by many different algebras. In program development, it is often useful to identify a "standard" implementation of a specification. One advantage is that it is easier to think about an equational specification if we choose a concrete, "typical" implementation. We can also use a standard implementation to be more specific about which implementations we consider acceptable. In datatype theory and in practical tools using algebraic datatypes, the so-called initial algebra (defined in this chapter) is often taken as the standard implementation. The main reasons to consider initiality are that it gives us a specific implementation that can be realized automatically and the initial algebra has some useful properties that cannot be expressed by equational axioms. In particular, there is a useful form of induction associated with initial algebras.
Initial algebras are defined in Section 3.5.2, while their use in datatype theory is discussed in Section 3.6. In addition to initial algebras, so-called final algebras have also been proposed as standard implementations of an algebraic specification [Wan79]. Other computer scientists prefer to consider any "generated" algebra without extra elements. A short discussion of these alternatives appears in Section 3.6.2 and its exercises. For further information, the reader may consult [Wir90] and the references cited there. One limitation of the algebraic approach to datatypes is that some of the operations we use in programming languages are type specific in some arguments, and type-independent in others. For example, the PCF conditional if ... then ... else ... requires a boolean as its first argument, but the next two arguments may have any type (as long as both have the same type). For this reason, a discussion of the algebra of PCF nat and bool values does not tell the whole story of PCF conditional expressions. Another limitation of the algebraic approach is that certain types (such as function types) have properties that are not expressible by equations alone. (Specifically, we need a quantifier to axiomatize


extensional equality of functions.) For this reason, it does not seem accurate to view a/l types algebraically. However, it does seem profitable to separate languages like PCF into an algebraic part, comprising basic datatypes such as nat, boo!, string, and so on, and a "higher-order" or lambda calculus part, comprising ways of building more complex types from basic ones. Some general references for further reading are [Grä68], a reference book on universal algebra, and [EM85, Wir9O], which cover algebraic datatypes and specifications. Some historically important research articles are [Hoa72, Par72], on the relationships between datatypes and programming style, and [LZ74, GHM78, GTW78] on algebraic datatypes and specifications. 3.3

Algebras, Signatures and Terms Algebras

3.3.1

An algebra consists of one or more sets, called carriers, together with a collection of distinguished elements and first-order (or algebraic) functions

over the carriers of the algebra. An example is the algebra

i,+, *), with carrier the set of natural numbers, distinguished elements 0, 1 E and functions +, * x Al. The difference between a "distinguished" element of a carrier and any other element is that we have a name for each distinguished element in the language of algebraic terms. As an economy measure, we will consider distinguished elements like 0, 1 E as "zero-ary" functions, or "functions of zero arguments." This is not a deep idea at all, just a convenient way to treat distinguished elements and functions uniformly. An example with more than one carrier is the algebra -4pcf

=

(IV,

8,0,

1

+, true,false, Eq9

where is the set of natural numbers, 8 the set of booleans, 0, 1,... are the natural numbers, + is the addition function, and so on. These are the mathematical values that we usually think of when we write basic numeric and boolean expressions of PCF. In studying algebras, we will make the correspondence between the syntactic expressions of PCF and the semantic values of this algebra precise.

3.3.2 Syntax of Algebraic Terms

The syntax of algebraic terms depends on the basic symbols and their types. This information is collected together in what is called a signature. In algebra, as mentioned earlier, it is traditional to call the basic type symbols sorts. A signature Σ = (S, F) consists of

• A set S whose elements are called sorts,

• A collection F of pairs (f, s₁ × ... × sₖ → s), with s₁, ..., sₖ, s ∈ S and no f occurring in two distinct pairs.

In the definition of signature, a symbol f occurring in F is called a typed function symbol (or function symbol for short), and an expression s₁ × ... × sₖ → s a first-order (or algebraic) function type over S. We usually write f: τ instead of (f, τ) ∈ F. A sort is a name for a carrier of an algebra, and a function symbol f: s₁ × ... × sₖ → s is a name for a k-ary function. We allow k = 0, so that a "function symbol" f: s may be a name for an element of the carrier for sort s. A symbol f: s is called a constant symbol, and f: s₁ × ... × sₖ → s with k ≥ 1 is called a proper function symbol. The restriction that we cannot have f: τ and f: τ' for distinct τ and τ' means that each constant or function symbol has a unique sort or type.

Example 3.3.1 A simple signature for writing natural number expressions is Σ_N = (S, F), where S = {nat} contains only one sort and F provides function symbols 0: nat, 1: nat, +: nat × nat → nat and *: nat × nat → nat. A convenient syntax for describing signatures is the following tabular format.

sorts: nat
fctns: 0, 1: nat
       +, *: nat × nat → nat

In this notation, we often write several function symbols of the same type on one line. This saves space and usually makes the list easier to read.
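To make the definitions concrete, here is one possible Python rendering of this signature, together with a sort-checker for the terms it generates. The encoding is ours: variables are strings, and an application of a function symbol f to arguments is a tuple whose head is f (so constants are one-element tuples):

```python
# Signature of Example 3.3.1: function symbol -> (argument sorts, result sort).
SIG_NAT = {
    "0": ((), "nat"),
    "1": ((), "nat"),
    "+": (("nat", "nat"), "nat"),
    "*": (("nat", "nat"), "nat"),
}

def sort_of(term, sig, sort_assignment):
    """Return the sort of a well-formed term over sig and sort_assignment."""
    if isinstance(term, str):                 # a variable
        return sort_assignment[term]
    f, args = term[0], term[1:]               # f M1 ... Mk
    arg_sorts, result_sort = sig[f]
    assert len(args) == len(arg_sorts)        # arity must match
    for a, s in zip(args, arg_sorts):
        assert sort_of(a, sig, sort_assignment) == s
    return result_sort

# +01 is a term of sort nat over the empty sort assignment
assert sort_of(("+", ("0",), ("1",)), SIG_NAT, {}) == "nat"
# +0x is a term of sort nat under the sort assignment {x: nat}
assert sort_of(("+", ("0",), "x"), SIG_NAT, {"x": "nat"}) == "nat"
```

The two assertions correspond to the two term-formation clauses given below: the variable case uses the sort assignment, and the application case checks each argument against the declared argument sorts.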

The purpose of a signature is to allow us to write algebraic terms. We assume we have some infinite set V of symbols we call variables, and assume that these are different from all the symbols we will ever consider using as constants, function symbols or sort symbols. Since it is not meaningful to write a variable without specifying its sort, we will always list the sorts of variables. A sort assignment is a finite set

Γ = {x₁: s₁, ..., xₖ: sₖ}


of ordered variable: sort pairs, with no variable given more than one sort. Given a signature Σ = (S, F) and sort assignment Γ using sorts from S, we define the set Terms_s(Σ, Γ) of algebraic terms of sort s over signature Σ and variables Γ as follows:

x ∈ Terms_s(Σ, Γ) if x: s ∈ Γ,

f M₁ ... Mₖ ∈ Terms_s(Σ, Γ) if f: s₁ × ... × sₖ → s and Mᵢ ∈ Terms_{sᵢ}(Σ, Γ) for i = 1, ..., k.

In the special case k = 0, the second clause should be interpreted as

f ∈ Terms_s(Σ, Γ) if f: s.

For example, 0 ∈ Terms_nat(Σ_N, ∅) and +01 is a term in Terms_nat(Σ_N, ∅), since + is a binary numeric function symbol of Σ_N. It is also easy to see that +01 ∈ Terms_nat(Σ_N, Γ) for any sort assignment Γ. To match the syntax of PCF, we may treat 0 + 1 as syntactic sugar for +01. When there is only one sort, it is common to assume that all variables have that sort. In this case, we can simplify sort assignments to lists of variables. We write Var(M) for the set of variables that appear in M. In algebra, the substitution [N/x]M of N for all occurrences of x in M is accomplished by straightforward replacement of each occurrence of x by N, since there are no bound variables in algebraic terms. It is easy to see that if we replace a variable by a term of the right sort, we obtain a well-formed term of the same sort we started with. This is stated more precisely below. In the following lemma, we use the notation Γ, x: s' for the sort assignment

Γ, x: s' = Γ ∪ {x: s'}

where x does not occur in Γ.

Lemma 3.3.2 If M ∈ Terms_s(Σ, Γ, x: s') and N ∈ Terms_{s'}(Σ, Γ), then [N/x]M ∈ Terms_s(Σ, Γ). A related property is given in Exercise 3.3.4.

Proof The proof is by induction on the structure of terms in Terms_s(Σ, Γ, x: s'). Since this is the first inductive proof in the chapter, we will give most of the details. Subsequent inductive proofs will give only the main points. Since every term is either a variable or the application of a function symbol to one or more arguments, induction on algebraic terms has one base case (variables) and one induction step (function symbols). In the base case, we must prove the lemma directly. In the induction step, we assume that the lemma holds for all of the subterms. This assumption is called the inductive hypothesis.


For the base case we consider a variable y E Termss(E, F, x: s'). If y is different from x, then [N/xjy is just y. In this case, we must have y: s E F, and hence y E Terms5(E, F). If our term is the variable x itself, then the result [N/xjx of substitution is the term N, which belongs to Termss(E, F) by similar assumption. This proves the base case of the induction. For the induction step, we assume that [N/xjM1 E Termssi(>, F), for < i , F, x: s'), the function symbol must have type sj x ... x 5k s. It follows that f[N/xjMi F), which finishes the induction step and proves the lemma. E . [N/X]Mk 1

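The substitution of Lemma 3.3.2 can be sketched in code. This is an illustration only, not the book's notation: terms are modeled as Python values, where a string is a variable, a one-element tuple like ('0',) is a constant symbol, and ('f', M1, ..., Mk) is an application. Since algebraic terms have no bound variables, [N/x]M is plain structural replacement.

```python
# Sketch of algebraic substitution [N/x]M (Lemma 3.3.2).
# Representation (an assumption of this sketch): variables are strings,
# constants are 1-tuples such as ('0',), applications are (f, M1, ..., Mk).

def substitute(term, x, n):
    """Return [n/x]term: replace every occurrence of variable x by term n."""
    if isinstance(term, str):                 # base case: a variable
        return n if term == x else term
    f, *args = term                           # induction step: f M1 ... Mk
    return (f, *(substitute(a, x, n) for a in args))

# Over the numeric signature: substituting 0 for x in  + x 1  gives  + 0 1.
print(substitute(('+', 'x', ('1',)), 'x', ('0',)))
```

The recursion mirrors the inductive proof exactly: one case for variables, one case for function symbols.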

Example 3.3.3 We may write natural number and stack expressions using the signature Σ = (S, F) with S = {nat, stack} and F containing the function symbols listed below.

sorts: nat, stack

fctns: 0, 1, 2, ... : nat
       +, * : nat × nat → nat
       empty : stack
       push : nat × stack → stack
       pop : stack → stack
       top : stack → nat

The terms of this signature include the expression push 2 (push 1 (push 0 empty)), which would name a stack with elements 0, 1 and 2 in the standard interpretation of this signature. Using a stack variable, we can also write an expression push 1 (push 0 s) ∈ Terms_stack(Σ, {s: stack}), which defines the stack obtained by pushing 0 and 1 onto s. ∎


Exercise 3.3.4 Use induction on terms to show that if M ∈ Terms_s(Σ, Γ) and Γ' is a sort assignment containing Γ, then M ∈ Terms_s(Σ, Γ').

3.3.3 Algebras and the Interpretation of Terms

Algebras are mathematical structures that provide meaning, or denotational semantics, for algebraic terms. To make the mapping from symbols to algebras explicit, we will use a form of algebra that gives a value to every sort and function symbol of some signature. If Σ is a signature, then a Σ-algebra A consists of

• Exactly one carrier A^s for each sort symbol s ∈ S,

Universal Algebra and Algebraic Data Types 152

• An interpretation map I assigning a function I(f): A^{s_1} × ... × A^{s_k} → A^s to each proper function symbol f: s_1 × ... × s_k → s ∈ F and an element I(f) ∈ A^s to each constant symbol f: s ∈ F.

If Σ has only one sort, then we say that a Σ-algebra is single-sorted; otherwise, we say an algebra is many-sorted or multi-sorted. If A = ⟨{A^s}_{s∈S}, I⟩ is a Σ-algebra, and f is a function symbol of Σ, it is often convenient to write f^A for I(f). In the case that there are finitely many or countably many function symbols, it is also common to list the functions, as in the example algebras given earlier. Combining these two conventions, we may write out the Σ_N-algebra N as

N = ⟨N, 0^N, 1^N, +^N, *^N⟩

This gives us all the information we need to interpret terms over Σ, since the interpretation map is given by the way the functions of the algebra are named. It is a relatively simple matter to say what the meaning of a term M ∈ Terms(Σ, Γ) is in any Σ-algebra. The only technical machinery that might not be familiar is the use of variable assignments, or environments. An environment η for A is a mapping η: V → ∪_s A^s from variables to elements of the carriers of A. The reason that we need an environment is that a term M may have a variable x in it, and we need to give x a value before we can say what M means. If someone asks you, "What is the value of x + 1?" it is difficult to answer without choosing a value for x. Of course, an environment must map each variable to a value of the appropriate carrier. We say an environment η satisfies Γ if η(x) ∈ A^s for each x: s ∈ Γ. Given an environment η for A that satisfies Γ, we define the meaning A⟦M⟧η of any M ∈ Terms(Σ, Γ) in A as follows:

A⟦x⟧η = η(x)
A⟦f M_1 ... M_k⟧η = f^A(A⟦M_1⟧η, ..., A⟦M_k⟧η)

In the special case that f: s is a constant symbol, we have A⟦f⟧η = f^A, since there are no function arguments. It is common to omit A and write ⟦M⟧η if the algebra A is clear from context. If M has no variables, then ⟦M⟧η does not depend on η. In this case, we may omit the environment and write ⟦M⟧ for the meaning of M in algebra A.

Example 3.3.5 The signature of Example 3.3.1 has a single sort, nat, and function symbols 0, 1, +, *. We may interpret terms over this signature in the Σ_N-algebra N given in Section 3.3.1. Let η be an environment with η(x) = 0. The meaning of x + 1 in N is determined as follows. (Remember that x + 1 is syntactic sugar for +x1.)

N⟦x + 1⟧η = +^N(N⟦x⟧η, N⟦1⟧η)
          = +^N(η(x), 1^N)
          = +^N(0, 1)
          = 1 ∎
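The inductive definition of the meaning function can be sketched directly. This is an illustrative encoding, not the book's: an algebra is a dictionary mapping each symbol to a Python value (for constants) or function, an environment maps variables to carrier elements, and the term representation is the one used earlier (strings for variables, tuples for constants and applications).

```python
# Sketch of the meaning function A[[M]]eta for the one-sorted numeric
# signature. The dictionary N_ALGEBRA plays the role of the algebra N;
# the names and encoding are assumptions of this sketch.

N_ALGEBRA = {
    '0': 0, '1': 1,
    '+': lambda a, b: a + b,
    '*': lambda a, b: a * b,
}

def meaning(term, algebra, eta):
    if isinstance(term, str):              # variable: look up in environment
        return eta[term]
    f, *args = term
    if not args:                           # constant symbol: no arguments
        return algebra[f]
    return algebra[f](*(meaning(a, algebra, eta) for a in args))

# Example 3.3.5: with eta(x) = 0, the meaning of x + 1 (i.e. +x1) is 1.
print(meaning(('+', 'x', ('1',)), N_ALGEBRA, {'x': 0}))  # 1
```

The two clauses of `meaning` correspond exactly to the two defining equations: the variable case reads the environment, and the application case applies the interpreted function symbol to the meanings of the subterms.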

Example 3.3.6 The signature Σ = (S, F) of Example 3.3.3 has sorts nat and stack, and function symbols 0, 1, 2, ..., +, *, empty, push, pop and top. One algebra A_stk for this signature interprets the natural numbers in the standard way and interprets stacks as sequences of numbers. To describe the functions on stacks, let us write n::s for the result of adding natural number n onto the front of sequence s ∈ N*. Using this notation, we have

A_stk = ⟨N, N*, 0^A, 1^A, 2^A, ..., +^A, *^A, empty^A, push^A, pop^A, top^A⟩

with stack functions determined as follows:

empty^A = ε, the empty sequence
push^A(n, s) = n::s
pop^A(n::s) = s
pop^A(ε) = ε
top^A(n::s) = n
top^A(ε) = 0

In words, the constant symbol empty is interpreted as empty^A, the empty sequence of natural numbers, and the function symbol push is interpreted as push^A, the function that adds a natural number onto the front of a sequence. The function pop^A removes the first element of a sequence if the sequence is nonempty, as described by the first equation for pop^A. Since pop(empty) is a well-formed algebraic term, this must be given some interpretation in the algebra. Since we do not have "error" elements in this algebra, we let pop^A(ε) be the empty sequence. A similar situation occurs with top, which is intended to return the top element of a stack. When the stack is empty, we simply return 0, which is an arbitrary decision.


We can work out the meanings of terms using the environment η mapping variable x: nat to 3 and variable s: stack to the sequence (2, 1) whose first element is 2 and second element is 1.

⟦pop(push x s)⟧η = pop^A(push^A(⟦x⟧η, ⟦s⟧η))
                 = pop^A(push^A(3, (2, 1)))
                 = pop^A((3, 2, 1))
                 = (2, 1) = ⟦s⟧η

⟦top(push x s)⟧η = top^A(push^A(⟦x⟧η, ⟦s⟧η))
                 = top^A(push^A(3, (2, 1)))
                 = top^A((3, 2, 1))
                 = 3 = ⟦x⟧η

It is easy to see from these examples that for any environment η mapping x to a natural number and s to a stack, we have

⟦pop(push x s)⟧η = ⟦s⟧η and ⟦top(push x s)⟧η = ⟦x⟧η. ∎
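The stack algebra A_stk can be sketched concretely. In this illustration (an encoding chosen for the sketch, not the book's), stacks are Python tuples with the front of the sequence first, and the "don't care" cases pop(empty) and top(empty) are given the arbitrary values chosen in the text: the empty sequence and 0.

```python
# Sketch of the stack algebra A_stk with stacks as tuples (front first).

empty = ()
push = lambda n, s: (n,) + s          # push^A(n, s) = n::s
pop  = lambda s: s[1:] if s else ()   # pop^A(eps) = eps, the arbitrary choice
top  = lambda s: s[0] if s else 0     # top^A(eps) = 0, the arbitrary choice

# With eta(x) = 3 and eta(s) = (2, 1):
x, s = 3, (2, 1)
print(pop(push(x, s)))   # (2, 1) -- [[pop(push x s)]]eta = [[s]]eta
print(top(push(x, s)))   # 3      -- [[top(push x s)]]eta = [[x]]eta
```

The two printed results are exactly the two equations verified in the example above.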

One important property of the meaning function is that the meaning of a term M ∈ Terms_s(Σ, Γ) is an element of the appropriate carrier A^s, as stated in the lemma below.

This lemma should be regarded as a justification of our definitions, since if it failed, we would change the definitions to make it correct.

Lemma 3.3.7 Let A be a Σ-algebra, M ∈ Terms_s(Σ, Γ), and η be an environment satisfying Γ. Then A⟦M⟧η ∈ A^s.

Proof The proof is by induction on the structure of terms in Terms_s(Σ, Γ). The base case is to show that for any variable x ∈ Terms_s(Σ, Γ) and environment η satisfying Γ, we have ⟦x⟧η ∈ A^s. From the definition of Terms_s(Σ, Γ), we can see that x: s ∈ Γ. By the definition of meaning, we know that ⟦x⟧η = η(x). But since η satisfies Γ, it follows that η(x) ∈ A^s. This proves the base case of the induction. For the induction step, we assume that ⟦M_i⟧η ∈ A^{s_i}, for 1 ≤ i ≤ k.

M ↠ M₀ ↞ M₁ ↠ M₂ ↞ M₃ ... Mₖ ↠ N

and similarly

P ↠ P₀ ↞ P₁ ↠ P₂ ↞ P₃ ... Pₗ ↠ Q.

Since ↠ is reflexive, we may assume without loss of generality that k = ℓ. Therefore, by Lemma 3.7.10, we have

[P/x]M ↠ [P₀/x]M₀ ↞ [P₁/x]M₁ ↠ ... [Pₖ/x]Mₖ ↠ [Q/x]N.

This proves the lemma. ∎

Lemma 3.7.12 If → is confluent, then M = N [Γ] if and only if M ↠ ∘ ↞ N.

Proof Suppose M = N [Γ]. Consider any sequence of terms M₁, ..., Mₖ such that

M ↠ M₁ ↞ M₂ ↠ M₃ ... Mₖ ↞ N.

By reflexivity and transitivity of ↠, we may always express M = N [Γ] in this form, with the first sequence of reductions proceeding from M ↠ M₁, and the next in the opposite direction. If k ≤ 2, then the lemma holds. Otherwise, we will show how to shorten the sequence of alternations. By confluence, there is some term P with M₁ ↠ P ↞ M₃. By transitivity of ↠, this gives us a shorter conversion sequence

M ↠ P ↞ M₄ ↠ ... Mₖ ↞ N.

By induction on k, we conclude that M ↠ ∘ ↞ N. ∎

Theorem 3.7.13 For any confluent rewrite system R, we have E_R ⊢ M = N [Γ] if and only if M ↠ ∘ ↞ N.


Proof By Lemma 3.7.11, we know E_R ⊢ M = N [Γ] if and only if M = N [Γ]. By Lemma 3.7.12, confluence implies M = N [Γ] if and only if M ↠ ∘ ↞ N. ∎

Corollary 3.7.14 If R is confluent and there exist two distinct normal forms of sort s, then E_R is consistent.

3.7.3 Termination

There are many approaches to proving termination of rewrite systems. Several methods involve assigning some kind of "weights" to terms. For example, suppose we assign a natural number weight w_M to each term M and show that whenever M → N, we have w_M > w_N. It follows that there is no infinite sequence M₀ → M₁ → M₂ → ... of one-step reductions, since there is no infinite decreasing sequence of natural numbers. While this approach may seem straightforward, it is often very difficult to devise a weighting function that does the job. In fact, there is no algorithmic test to determine whether a rewrite system is terminating. Nonetheless, there are reasonable methods that work in many simple cases. Rather than survey the extensive literature on termination proofs, we will just take a quick look at a simple method based on algebras with ordered carriers. More information may be found in the survey articles [DJ90, HO80, Klo87] and references cited there.

The method we will consider in this section uses binary relations that prohibit infinite sequences. Before considering well-founded relations, also used in Section 1.8.3, we review some notational conventions. If ≺ is a binary relation on a set A, we will write ≼ for the union of ≺ and equality, and ≻ for the complement of ≼. In other words, x ≼ y if x ≺ y or x = y, and x ≻ y iff ¬(x ≼ y). Note that in using the symbol ≺ as the name of a relation, we do not necessarily assume that ≺ is transitive or has any other properties of an ordering. However, since most of the examples we will consider involve orderings, the choice of symbol provides helpful intuition. A binary relation ≺ on a set A is well-founded if there is no infinite decreasing sequence, a₀ ≻ a₁ ≻ a₂ ≻ ..., of elements of A. The importance of well-founded relations is that if we can establish a correspondence between terms and elements of a set with a well-founded relation, in such a way that each reduction step decreases the associated element, then no infinite reduction sequence is possible. A well-founded algebra is an algebra in which each carrier has a well-founded relation and each function is strictly monotonic in every argument.

Lemma 3.7.15 Let A be a well-founded Σ-algebra and R a Σ-rewrite system such that ⟦L⟧η ≻ ⟦R⟧η for each rule L → R in R and every environment η. Then R is terminating.

Proof The proof uses the substitution lemma and other facts about the interpretation of terms. The only basic fact we must prove directly is that for any term M ∈ Terms(Σ, (Γ, x: s)) containing the variable x, and any environment η, if a ≻ b in A^s, then

⟦M⟧η[x ↦ a] ≻ ⟦M⟧η[x ↦ b].

This is an easy induction on M that is left to the reader.

To prove the lemma, it suffices to show that if M → N, then for any environment η we have ⟦M⟧η ≻ ⟦N⟧η, since no well-founded carrier has any infinite sequence of "decreasing" elements. In general, a single reduction step has the form [SL/x]M → [SR/x]M, where x occurs exactly once in M. Therefore, we must show that for any environment η satisfying the appropriate sort assignment,

⟦[SL/x]M⟧η ≻ ⟦[SR/x]M⟧η.

Let η be any environment and let a = ⟦SL⟧η and b = ⟦SR⟧η. By the substitution lemma, we have a = ⟦L⟧η' and b = ⟦R⟧η' for some environment η' determined by η and the substitution S. Therefore, by the hypothesis of the lemma, a ≻ b. It follows (by the easy induction mentioned in the first paragraph of the proof) that ⟦[SL/x]M⟧η ≻ ⟦[SR/x]M⟧η. The lemma follows by the substitution lemma. ∎

It is important to realize that when we prove termination of a rewrite system by interpreting terms in a well-founded algebra, the algebra we use does not satisfy the equations associated with the rewrite system. The reason is that reducing a term must "decrease" its value in the well-founded algebra. However, rewriting does not change the interpretation of a term in any algebra satisfying the equations associated with the rewrite system. An essentially trivial termination example is the rewrite system for stacks, obtained by orienting the two equations in Table 3.1 from left to right. We can easily see that this system terminates since each rewrite rule reduces the number of symbols in the term. This argument may be carried out using a well-founded algebra whose carriers are the natural numbers. To do so, we interpret each function symbol as a numeric function that adds one to the sum of its arguments. Since the meaning of a term is then found by adding up the number of function symbols, it is easy to check that each rewrite rule reduces the value of a term. It is easy to see that when the conditions of Lemma 3.7.15 are satisfied, and each carrier is the natural numbers, the numeric meaning of a term is an upper bound on the length of the longest sequence of reduction steps from the term. Therefore, if the number of rewrite steps is a fast-growing function of the length of a term, it will be necessary to interpret at least one function symbol as a numeric function that grows faster than a polynomial. The following example uses an exponential function to show that rewriting to disjunctive normal form always terminates.
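The symbol-counting argument for the stack system can be checked mechanically. This sketch assumes the two oriented stack rules are pop(push x s) → s and top(push x s) → x, and models variables by sample positive numeric values; the value of a term under the "one plus the sum of the arguments" interpretation is then its symbol count.

```python
# Sketch: each function symbol is interpreted as 1 + sum of its arguments,
# so the value of a term counts its symbols. Each stack rule should make
# that value strictly smaller for any values of the variables.

def weight(term):
    if isinstance(term, int):        # a numeric value standing in for a variable
        return term
    f, *args = term                  # every function symbol adds one
    return 1 + sum(weight(a) for a in args)

for x in (1, 5):
    for s in (1, 7):
        assert weight(('pop', ('push', x, s))) > weight(s)   # pop(push x s) -> s
        assert weight(('top', ('push', x, s))) > weight(x)   # top(push x s) -> x
print("both stack rules decrease the symbol count")
```

Because the functions are strictly monotonic, checking the inequality for the rule schemas suffices, exactly as Lemma 3.7.15 requires.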

Example 3.7.16 (Attributed to Filman [DJ90].) Consider the following rewrite system on formulas of propositional logic. As usual, we write the unary function ¬: bool → bool as a prefix operator, and binary functions ∧, ∨: bool × bool → bool as infix operations.

¬¬x → x
¬(x ∨ y) → (¬x) ∧ (¬y)
¬(x ∧ y) → (¬x) ∨ (¬y)
x ∧ (y ∨ z) → (x ∧ y) ∨ (x ∧ z)
(y ∨ z) ∧ x → (y ∧ x) ∨ (z ∧ x)

This system transforms a formula into disjunctive normal form. We can prove that this rewrite system is terminating using a well-founded algebra of natural numbers. More specifically, let A = ⟨A^bool, or^A, and^A, not^A⟩ be the algebra with carrier A^bool = {n ∈ N | n > 1}, the natural numbers greater than 1, and

not^A(x) = 2^x
or^A(x, y) = x + y + 1
and^A(x, y) = x * y

It is easy to see that each of these functions is strictly monotonic on the natural numbers (using the usual ordering). Therefore, A is a well-founded algebra. To show that the rewrite rules are terminating, we must check that for natural number values of the free variables, the left-hand side of each rule defines a larger number than the right-hand side. For the first rule, this is obvious since for any natural number x > 1, we have 2^(2^x) > x. For the second rule, we have 2^(x+y+1) = 2 * 2^x * 2^y > 2^x * 2^y for any x, y > 1. The remaining rules rely on similar routine calculations. ∎
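The "routine calculations" for the remaining rules can be spot-checked numerically. The sketch below samples variable values greater than 1 and verifies that every rule of Example 3.7.16 strictly decreases the interpreted value; a finite sample is not a proof, but it catches any error in the chosen interpretation.

```python
# Numeric check of the well-founded interpretation in Example 3.7.16:
# not(x) = 2**x, or(x, y) = x + y + 1, and(x, y) = x * y over naturals > 1.

NOT = lambda x: 2 ** x
OR  = lambda x, y: x + y + 1
AND = lambda x, y: x * y

for x in range(2, 6):
    for y in range(2, 6):
        for z in range(2, 6):
            assert NOT(NOT(x)) > x                        # ~~x -> x
            assert NOT(OR(x, y)) > AND(NOT(x), NOT(y))    # ~(x v y) -> ~x ^ ~y
            assert NOT(AND(x, y)) > OR(NOT(x), NOT(y))    # ~(x ^ y) -> ~x v ~y
            assert AND(x, OR(y, z)) > OR(AND(x, y), AND(x, z))
            assert AND(OR(y, z), x) > OR(AND(y, x), AND(z, x))
print("all five rules decrease on the sample")
```

Restricting the carrier to numbers greater than 1 matters here: the distribution rules decrease the value by exactly x − 1, which is positive only when x > 1.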

Example 3.7.17 In this example, we will show termination of a rewrite system derived from the integer multiset axioms given in Table 3.2. For simplicity, we will omit the rules for conditionals on multisets and booleans, since these do not interact with any of the other rules. Specifically, let us consider the signature with sorts mset, nat, bool, constants for natural numbers, the empty multiset and true, false, and functions + and Eq? on natural numbers, insert and mem? on multisets, and the conditional for natural numbers. The rewrite rules are

0 + 0 → 0, 0 + 1 → 1, ...
Eq? x x → true
Eq? 0 1 → false, Eq? 0 2 → false, ...
mem? x empty → 0
mem? x (insert y m) → if Eq? x y then (mem? x m) + 1 else mem? x m
if true then x else y → x
if false then x else y → y

We will interpret this signature in an algebra A. We must choose strictly monotonic functions for each function symbol so that the meaning of the left-hand side of each rule is guaranteed to be a larger natural number than the meaning of the right-hand side. For most of the functions, this may be done in a straightforward manner. However, since the second membership rule does not decrease the number of symbols in a term, we will have to give more consideration to the numeric interpretation of mem?. An intuitive way to assign integers to terms is to give an upper bound on the longest possible sequence of reductions. For example, the longest sequence from a term of the form (if M₁ then M₂ else M₃) reduces all three subterms as much as possible, then (assuming M₁ reduces to true or false) reduces the conditional. Therefore, a reasonable numeric interpretation for cond is

cond^A(x, y, z) = 1 + x + y + z.

Following the same line of reasoning, we are led to the straightforward interpretations of the other functions, except mem?.

+^A(x, y) = 1 + x + y
Eq?^A(x, y) = 1 + x + y
insert^A(y, m) = 1 + y + m

For technical reasons that will become apparent, we will interpret each sort as N − {0, 1}, the natural numbers greater than 1, rather than using all natural numbers. Therefore, we interpret each constant symbol as 2. For mem?, we must choose a function so that the left-hand side of each mem? rule will give a larger number than the right-hand side. The first mem? rule only requires that the function always have a value greater than 2. From the second rule, we obtain the condition

mem?^A(x, 1 + y + m) > 5 + x + y + 2 * mem?^A(x, m).

We can surmise by inspection that the value of mem?^A(x, m) should depend exponentially on its second argument, since mem?^A(x, 1 + y + m) must be more than twice mem?^A(x, m), for any y ≥ 2. We can satisfy the numeric condition by taking

mem?^A(x, m) = x * 2^m.

With this interpretation, it is straightforward to verify the conditions of Lemma 3.7.15 and conclude that the rewrite rules for multisets are terminating. The only nontrivial case is mem?. A simple way to see that

x * 2^(1+y+m) − (5 + x + y + x * 2^(m+1)) > 0

is to verify the inequality for x = y = m = 2 and then check the derivative with respect to each variable. ∎
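The key inequality can also be checked by brute force over a sample of carrier values. The sketch below takes the interpretations chosen above (insert(y, m) = 1 + y + m and mem?(x, m) = x·2^m) and verifies the decrease condition for the second mem? rule on all values 2 ≤ x, y, m < 12.

```python
# Brute-force check of the termination condition for the mem? rule:
#   mem?(x, insert(y, m))  >  5 + x + y + 2 * mem?(x, m)
# with mem?(x, m) = x * 2**m and insert(y, m) = 1 + y + m.

mem = lambda x, m: x * 2 ** m

for x in range(2, 12):
    for y in range(2, 12):
        for m in range(2, 12):
            assert mem(x, 1 + y + m) > 5 + x + y + 2 * mem(x, m)
print("inequality holds on the sample")
```

Since the difference x·2^(1+y+m) − (5 + x + y + x·2^(m+1)) is increasing in each variable, checking small values and monotonicity covers the whole carrier, as the text notes.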

Exercise 3.7.18 Consider the following rewrite system on formulas of propositional logic.

¬¬x → x
¬(x ∨ y) → (¬x) ∧ (¬y)
¬(x ∧ y) → (¬x) ∨ (¬y)
x ∨ (y ∧ z) → (x ∨ y) ∧ (x ∨ z)
(y ∧ z) ∨ x → (y ∨ x) ∧ (z ∨ x)

This is similar to the system in Example 3.7.16, except that these rules transform a formula into conjunctive normal form instead of disjunctive normal form. Show that this rewrite system is terminating by defining and using a well-founded algebra whose carrier is some subset of the natural numbers. (This is easy if you understand Example 3.7.16.)

Exercise 3.7.19 Show that the following rewrite rules are terminating, using a well-founded algebra.

(a) 0 + x → x
    (Sx) + y → S(x + y)

(b) The rules of part (a) in combination with

    0 * x → 0
    (Sx) * y → (x * y) + y

(c) The rules of parts (a) and (b) in combination with

    fact 0 → 1
    fact(Sx) → (Sx) * (fact x)

Termination for a more complicated version of this example is proved in [Les92] using a lexicographic order.

Exercise 3.7.20 An algebraic specification for set, nat and bool is given in Table 3.4. We may derive a rewrite system from the equational axioms by orienting them from left to right.

(a) Show that this rewrite system is terminating by interpreting each of the function symbols as a function over a subset of the natural numbers.

(b) We may extend the set datatype with a cardinality function defined by the following axioms.

card empty = 0
card(insert x s) = cond (ismem? x s)(card s)((card s) + 1)

Show that this rewrite system is terminating by interpreting each of the function symbols as a function over a subset of the natural numbers.

3.7.4 Critical Pairs

We will consider two classes of confluent rewrite systems, one necessarily terminating and the other not. In the first class, discussed in Section 3.7.5, the rules do not interact in any significant way. Consequently, we have confluence regardless of termination. In the second class, discussed in Section 3.7.6, we use termination to analyze the interaction between rules. An important idea in both cases is the definition of critical pair, which is a pair of terms representing an interaction between rewrite rules. Another general notion is local confluence, which is a weak version of confluence. Since local confluence may be used to motivate the definition of critical pair, we begin by defining local confluence.

A relation → is locally confluent if N ← M → P implies N ↠ ∘ ↞ P. This is strictly weaker than confluence since we only assume N ↠ ∘ ↞ P when M rewrites to N or P by a single step. However, as shown in Section 3.7.6, local confluence is equivalent to ordinary confluence for terminating rewrite systems.

Example 3.7.21 A simple example of a rewrite system that is locally confluent but not confluent is

a → b, b → a, a → a₀, b → b₀

Both a and b can be reduced to a₀ and b₀. But since neither a₀ nor b₀ can be reduced to the other, confluence fails. However, a straightforward case analysis shows that the system is locally confluent. A pictorial representation of this system appears in Figure 3.1. Another rewrite system that is locally confluent but not confluent, and which has no loops, appears in Example 3.7.26. ∎
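The failure of confluence in Example 3.7.21 can be confirmed by exhaustive search, since the system has only four terms. This sketch (an illustrative encoding, not the book's) computes everything reachable from a and collects the normal forms.

```python
# Reachability check for Example 3.7.21: a -> b, b -> a, a -> a0, b -> b0.
# The system is locally confluent but not confluent: a0 and b0 are two
# distinct normal forms reachable from the same term.

RULES = {'a': {'b', 'a0'}, 'b': {'a', 'b0'}}

def reachable(t):
    seen, todo = {t}, [t]
    while todo:
        u = todo.pop()
        for v in RULES.get(u, ()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

normal_forms = {t for t in reachable('a') if t not in RULES}
print(sorted(normal_forms))  # ['a0', 'b0']
```

Two distinct normal forms for one starting term is exactly what confluence rules out, so this system cannot be confluent.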

In the rest of this section, we will consider the problem of determining whether a rewrite system is locally confluent. For rewrite systems with a finite number of rules, we may demonstrate local confluence by examining a finite number of cases called "critical pairs." Pictorial representations of terms and reductions, illustrating the three important cases, are given in Figures 3.2, 3.3 and 3.4. (Similar pictures appear in [DJ90, Hue80, Klo87].) Suppose we have some finite set R of rewrite rules and a term M that can be reduced in two different ways. This means that there are rewrite rules L → R and L' → R' and


Figure 3.1 A locally confluent but non-confluent reduction.

substitutions S and S' such that the subterm of M at one position, m̃, has the form SL and the subterm at another position, ñ, has the form S'L'. It is not necessary that L → R and L' → R' be different rules. However, if we rename variables so that L and L' have no variables in common, then we may assume that S and S' are the same substitution. There are several possible relationships between m̃ and ñ. In some cases, we can see that there is essentially no interaction between the rules. But in one case, the "critical pair" case, there is no a priori guarantee of local confluence. The simplest case, illustrated in Figure 3.2, is if neither SL nor SL' is a subterm of the other. This happens when neither m̃ nor ñ is a subsequence of the other. In this case, it does not matter which reduction we do first. We may always perform the second reduction immediately thereafter, producing the same result by either pair of reductions. For example, if M has the form f(SL)(SL'), the two reductions produce f(SR)(SL') and f(SL)(SR'), respectively. But then each reduces to f(SR)(SR') by a single reduction. The second possibility is that one subterm may contain the other. In this case, both reductions only affect the larger subterm. For the sake of notational simplicity, we will assume that the larger subterm is all of M. Therefore, we assume m̃ is the empty sequence, so M ≡ SL, and SL' is a subterm of SL at position ñ. There are two subcases. A "trivial" way for SL' to be a subterm of SL is when S substitutes a term containing SL' for some variable x in L. In other words, if we divide SL into symbol occurrences that come from L and symbol occurrences that are introduced by S, the subterm at position ñ only contains symbols introduced by S. This is pictured in Figure 3.3. In this case, reducing SL to SR either eliminates the subterm SL', if R does not contain x, or gives us a term containing one or more occurrences of SL'. Since we can reduce each occurrence of SL' to SR', we have local confluence in this case.
Figure 3.2 Disjoint reductions

The remaining case is the one we must consider in more detail: SL' is at position ñ of SL, and the subterm of L at ñ is not a variable. Clearly, this may only happen if the principal (first) function symbol of L' occurs in L, as the principal function symbol of the subterm at ñ. If we apply the first rewrite rule, we obtain SR. The second rewrite rule gives us the term [SR'/ñ]SL, with our chosen occurrence of SL' replaced by SR'. This is pictured in Figure 3.4. Although we may have SR ↠ ∘ ↞ [SR'/ñ]SL "accidentally," there is no reason to expect this in general.

Example 3.7.22 Disjoint (Figure 3.2) and trivially overlapping (Figure 3.3) reductions are possible in every rewrite system. The following two rewrite rules for lists also allow conflicting reductions (Figure 3.4):

cdr(cons x ℓ) → ℓ
cons(car ℓ)(cdr ℓ) → ℓ

We obtain a disjoint reduction by writing a term that contains the left-hand sides of both rules. For example,


Figure 3.3 Trivial overlap

cond B (cdr(cons x ℓ)) (cons(car ℓ')(cdr ℓ')) may be reduced using either rewrite rule. The two subterms that may be reduced independently are underlined. A term with overlap only at a substitution instance of a variable (Figure 3.3) is

cdr(cons x (cons(car ℓ')(cdr ℓ')))

which is obtained by substituting the left-hand side of the second rule for the variable ℓ in the first. Notice that if we reduce the larger subterm, the underlined smaller subterm remains intact. Finally, we obtain a nontrivial overlap by replacing x and ℓ in the first rule by car ℓ' and cdr ℓ':

cdr(cons(car ℓ')(cdr ℓ'))


Figure 3.4 Critical pair

If we apply the first rule to the entire term, the cons in the smaller subterm is removed by the rule. If we apply the second rule to the smaller subterm, cons is removed and the first rule may no longer be applied. However, it is easy to see that the rules are locally confluent, since in both cases we obtain the same term, cdr ℓ'. A system where local confluence fails due to overlapping rules appears in Example 3.7.1. Specifically, the term (x + 0) + y may be reduced to either x + y or x + (0 + y), but neither of these terms can be reduced further. Associativity (as in the third rule of Example 3.7.1) and commutativity are difficult to treat by rewrite rules. Techniques for rewriting with an associative and commutative equivalence relation are surveyed in [DJ90], for example. ∎

It might appear that for two (not necessarily distinct) rules L → R and L' → R', we must consider infinitely many substitutions S in looking for nontrivial overlaps. However, for each term L, position ñ in L, and term L', if any substitution S gives SL' at position ñ of SL, there is a simplest such substitution. More specifically, we may choose the minimal (most general) substitution S under the ordering S ≤ S' if there exists S'' with S' = S'' ∘ S. Using unification, we may compute a most general substitution for each L' and non-variable subterm of L, or determine that none exists. For many pairs of rewrite rules, this is a straightforward hand calculation. Intuitively, a critical pair is the result of reducing a term with overlapping redexes, obtained with a substitution that is as simple as possible. An overlap is a triple (SL, SL', m̃) such that SL' occurs at position m̃ of SL and the subterm of L at m̃ is not a variable. If S is the simplest substitution such that (SL, SL', m̃) is an overlap, then the pair (SR, [SR'/m̃]SL) is called a critical pair. In Example 3.7.22, the nontrivial overlap gives us the critical pair (cdr ℓ', cdr ℓ'), since the simplest possible substitution was used.

A useful fact to remember is that a critical pair may occur only between rules L → R and L' → R' when the principal function symbol f of L' occurs in L. Moreover, if f is applied to non-variable terms in both L' and the relevant occurrence in L, then the principal function symbols of corresponding arguments must be identical. For example, if we have a rule f(gxy) → R', then this may lead to a critical pair only if there is a rule L → R such that either fz or a term of the form f(gPQ) occurs in L. A subterm of the form f(h...) does not give us a critical pair, since there is no substitution making (h...) syntactically identical to any term with principal function symbol g. In general, it is important to consider overlaps involving two uses of the same rule. For example, the rewrite system with the single rule f(fx) → a is not locally confluent since we have an overlap (f(f(fx)), f(fx), 1) leading to a critical pair (fa, a). (The term f(f(fx)) can be reduced to both fa and a.) It is also important to consider the case where two left-hand sides are identical, since a rewrite system may have two rules, L → R and L → R', that have the same left-hand side. (In this case, (R, R') is a critical pair.) Since any rule L → R results in the overlap (L, L, ε), where ε is the empty sequence, any rule results in a critical pair (R, R) of identical terms. We call a critical pair of identical terms, resulting from two applications of a single rule L → R to L, a trivial critical pair.

Proposition 3.7.23 [KB70] A rewrite system R is locally confluent if and only if M ↠ ∘ ↞ N for every critical pair (M, N).
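The failure of local confluence for the single rule f(fx) → a can be verified by enumerating one-step reducts. This sketch uses the nested-tuple term encoding assumed in earlier snippets, not the book's notation.

```python
# Sketch of the overlap for the rule f(f(x)) -> a: reducing f(f(f(x))) at
# the root gives a, while reducing the inner redex f(f(x)) gives f(a).
# Both are normal forms, so the critical pair (f(a), a) is not joinable.

def reducts(t):
    """All one-step reducts of t under the rule f(f(x)) -> a."""
    out = []
    if isinstance(t, tuple) and t[0] == 'f':
        inner = t[1]
        if isinstance(inner, tuple) and inner[0] == 'f':
            out.append('a')                               # rewrite at the root
        out.extend(('f', r) for r in reducts(inner))      # rewrite inside
    return out

term = ('f', ('f', ('f', 'x')))
print(reducts(term))   # ['a', ('f', 'a')]
```

Since `reducts('a')` and `reducts(('f', 'a'))` are both empty, the two reducts are distinct normal forms, confirming the critical pair (fa, a) cannot be joined.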

If a finite rewrite system R is terminating, then this proposition gives us an algorithmic procedure for deciding whether R is locally confluent. We will consider this in Section 3.7.6.

Proof The implication from left to right is straightforward. The proof in the other direction follows the line of reasoning used to motivate the definition of critical pair. More specifically, suppose M may be reduced to two terms, N and P, using L → R and L' → R'. Then there are three possible relationships between the relevant subterms SL and SL' of M. If these subterms are disjoint or overlap trivially, then, as argued in this section, we have N ↠ ∘ ↞ P. The remaining case is that N and P contain substitution instances of a critical pair. Since the symbols not involved in the reductions M → N and M → P will not play any role in determining whether N ↠ ∘ ↞ P, we may simplify notation by assuming that there is some critical pair (SR, [SR'/m̃]SL) of R with N ≡ S'(SR) and P ≡ S'([SR'/m̃]SL). However, by the hypothesis of the proposition, we have SR ↠ ∘ ↞ [SR'/m̃]SL. Therefore, since each reduction step applies to any substitution instance of the term involved (see Lemmas 3.7.10 and 3.7.11), we have N ↠ ∘ ↞ P. ∎

Exercise 3.7.24 Using the signature of Example 3.7.1, we may also consider the rewrite system

0 + x → x
(−x) + x → 0
(x + y) + z → x + (y + z)

You may replace infix addition by a standard algebraic (prefix) function symbol if this makes it easier for you.

(a) Find all critical pairs of this system.

(b) For each critical pair (SR, [SR'/m̃]SL), determine whether SR ↠ ∘ ↞ [SR'/m̃]SL.

Exercise 3.7.25 A rewrite system for formulas of propositional logic is given in Example 3.7.16. You may translate into standard algebraic terms if this makes it easier for you.

(a) Find all critical pairs of this system.

(b) For each critical pair (SR, [SR'/m̃]SL), determine whether SR ↠ ∘ ↞ [SR'/m̃]SL.

3.7.5 Left-linear Non-overlapping Rewrite Systems

A rewrite system is non-overlapping if it has no nontrivial critical pairs. In other words, the only way to form a critical pair is by using a rule L → R to reduce L to R twice, obtaining the pair (R, R). Since trivial critical pairs do not affect confluence, we might expect every non-overlapping rewrite system to be confluent. However, this is not true; a counterexample is given in Example 3.7.26 below. Intuitively, the problem is that since rewrite rules allow us to replace terms of one form with terms of another, we may have rules that only overlap after some initial rewriting. However, if we restrict the left-hand sides of rules, then the absence of critical pairs ensures confluence. The standard terminology is that a rule L → R is left-linear if no variable occurs twice in L. A set of rules is left-linear if every rule in the set is left-linear. The main result in this section is that left-linear, non-overlapping rewrite systems are confluent. The earliest proofs of this fact are probably proofs for combinatory logic (see Example 3.7.31) that generalize to arbitrary left-linear and non-overlapping systems [Klo87]. The general approach using parallel reduction is due to Huet [Hue80].


Example 3.7.26 The following rewrite system for finite and "infinite" numbers has no critical pairs (and is therefore locally confluent) but is not confluent.

∞ → S∞
Eq? x x → true
Eq? x (Sx) → false

Intuitively, the two Eq? rules would be fine, except that the "infinite number" ∞ is its own successor. With ∞, we have

Eq? ∞ ∞ → true
Eq? ∞ ∞ → Eq? ∞ (S∞) → false

so one term has two distinct normal forms, true and false. The intuitive reason why confluence fails in this situation is fairly subtle. The only possible critical pair in this system would arise from a substitution R such that R(Eq? x x) ≡ R(Eq? x (Sx)). Any such substitution R would have to be of the form R = [M/x], where M ≡ SM. But since no term M is syntactically equal to its successor, SM, this is impossible. However, the term ∞ is reducible to its own successor. Therefore, the rules have the same effect as an overlap at substitution instances Eq? ∞ ∞ and Eq? ∞ (S∞). Another way of saying this is that since ∞ → S∞, we have [∞/x](Eq? x x) ↠ [∞/x](Eq? x (Sx)), and the "overlap after reduction" causes confluence to fail. ∎

We will analyze left-linear, non-overlapping rewrite systems using parallel reduction, which was considered in Section 2.4.4 for PCF. We define the parallel reduction relation ⇉ for a set R of algebraic rewrite rules by

[SL1/n1, ..., SLk/nk]M →→_R [SR1/n1, ..., SRk/nk]M

where n1, ..., nk are disjoint positions in M (i.e., none of these sequences of natural numbers is a subsequence of any other) and L1 → R1, ..., Lk → Rk are rewrite rules in R that need not be distinct (and which have variables renamed if necessary). An alternate characterization of →→_R is that it is the least relation on terms that contains →_R and is closed under the rule

M1 →→ N1   ⋯   Mk →→ Nk
C[M1, ..., Mk] →→ C[N1, ..., Nk]

for any context C[...] with places for k terms to be inserted. This was the definition used for parallel PCF reduction in Section 2.4.4.
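The failure of confluence in Example 3.7.26 can be replayed mechanically. The following Python sketch (terms encoded as nested tuples, an encoding assumed here rather than taken from the text) applies the three rules by hand to Eq? ∞ ∞ and reaches the two distinct normal forms true and false.

```python
# Terms are nested tuples; INF stands for the "infinite number" with
# rule inf -> S inf. This encoding is an assumption of this sketch.
INF = ("inf",)
TRUE, FALSE = ("true",), ("false",)

def eq_refl(t):
    # Rule: Eq? x x -> true, applicable when both arguments are identical.
    if t[0] == "Eq?" and t[1] == t[2]:
        return TRUE

def eq_succ(t):
    # Rule: Eq? x (S x) -> false.
    if t[0] == "Eq?" and t[2] == ("S", t[1]):
        return FALSE

def unfold(t):
    # Rule: inf -> S inf, applied here to the second argument of Eq?.
    if t[0] == "Eq?" and t[2] == INF:
        return ("Eq?", t[1], ("S", INF))

term = ("Eq?", INF, INF)
one_way = eq_refl(term)            # Eq? inf inf -> true
other_way = eq_succ(unfold(term))  # Eq? inf inf -> Eq? inf (S inf) -> false
assert one_way == TRUE and other_way == FALSE
```

Both results are normal forms, so the system cannot be confluent, even though no two left-hand sides overlap syntactically.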

Universal Algebra and Algebraic Data Types

224

Some useful notation is to write M →→^n N if M →→ N by a reduction sequence of length n, and M →→^{<n} N if there is a reduction sequence of length less than n. The reason we use parallel reduction to analyze left-linear, non-overlapping rewrite systems is that parallel reduction has a stronger confluence property that is both easier to prove directly than confluence of sequential reduction and interesting in its own right. A relation → on terms is strongly confluent if, whenever M → N and M → P, there is a term Q with N → Q and P → Q. (The reader who checks other sources should know that an alternate, slightly weaker definition of strong confluence is sometimes used.) We now show that →→_R is strongly confluent; this is called the "parallel moves lemma." After proving this lemma, we will show that strong confluence of →→_R implies confluence of →_R.

Lemma 3.7.27 [Hue80] If R is left-linear and non-overlapping, then →→_R is strongly confluent.

The lemma proved in [Hue80] is actually stronger, since it applies to systems with the property that for every critical pair (M, N), we have M →→ N or N →→ M. However, the left-linear and non-overlapping case is easier to prove and will be sufficient for our purposes.

Proof Suppose that M →→ N and M →→ P. We must show that there is a term Q with N →→ Q and P →→ Q. For notational simplicity, we will drop the subscript R in the rest of the proof. The reduction M →→ N results from the replacement of subterms SL1, ..., SLk at positions n1, ..., nk by SR1, ..., SRk and, similarly, the reduction M →→ P involves replacement of terms at positions p1, ..., pℓ. By hypothesis, the positions n1, ..., nk must be disjoint, and similarly for p1, ..., pℓ. There may be some overlap between some of the ni's and pj's, but the kinds of overlaps are restricted by the disjointness of the two sets of positions. Specifically, if some ni is a subsequence of one or more pj's, then none of these pj's can be a subsequence of any ni. Therefore, the only kind of overlap we must consider is a subterm SL of M that is reduced by a single rule L → R in one reduction, but has one or more disjoint subterms reduced in the other. Since there are no critical pairs, we have only the trivial case that ni = pj, with the same rule applied in both cases, and the nontrivial case with all of the affected subterms of SL arising from substitution for variables in L. For notational simplicity, we assume in the rest of this proof that we do not have ni = pj for any i and j. The modifications required to treat ni = pj are completely straightforward, since N and P must be identical at this position. To analyze this situation systematically, we reorder the sequences of disjoint positions so that for some r ≤ k and s ≤ ℓ, the initial subsequences n1, ..., nr and p1, ..., ps have two properties. First, none has any proper subsequence among the complete lists n1, ..., nk and p1, ..., pℓ. Second, every sequence in either complete list must have some proper subsequence among the initial subsequences n1, ..., nr and p1, ..., ps. If we let t = r + s and let M1, ..., Mt be the subterms of M at these positions, then every Mi is reduced in producing N or P, and all of the reductions occur within the Mi. Therefore, we may write M, N and P in the form

M ≡ C[M1, ..., Mt]
N ≡ C[N1, ..., Nt]
P ≡ C[P1, ..., Pt]

where, for 1 ≤ i ≤ t, we have Mi →→ Ni and Mi →→ Pi.

Lemma If →→_R is strongly confluent, then →_R and →→_R are confluent.

Proof It is not hard to see that the reflexive and transitive closures of →_R and →→_R are the same relation on terms. (This is part (a) of Exercise 2.4.19 in Section 2.4.4.) Therefore, it suffices to show that if →→_R is strongly confluent, then →→_R is confluent. For notational simplicity, we will drop the subscript R in the rest of the proof. Recall that we write M →→^n N if there is a reduction sequence from M to N of length n, and M →→^{<n} N if the length is less than n. Suppose →→ is strongly confluent. We first show by induction on n that if M →→^n N and M →→ P, then there is some term Q with N →→ Q and P →→^n Q.

The completion procedure begins by testing whether L > R holds, in some well-founded algebra, for each rewrite rule L → R. If so, then the rewrite system is terminating and we may proceed to test for local confluence. This may be done algorithmically by computing critical pairs (M, N) and searching the possible reductions from M and N to see whether they reduce to a common term. Since the rewrite system is terminating, there are only finitely many possible reductions to consider. If all critical pairs converge, then the algorithm terminates with the result that the rewrite system is confluent and terminating. If some critical pair (M, N) does not converge, then, by definition of critical pair, there is some term that reduces to both M and N. Therefore, M and N may be proved equal by equational reasoning, and it makes logical sense to add M → N or N → M as a rewrite rule. If M > N or N > M, then the completion procedure adds the appropriate rule and continues the test for local confluence. However, if M and N


are incomparable (neither M > N nor N > M), there is no guarantee that either rule preserves termination, and so the procedure terminates inconclusively. This procedure may not terminate, as illustrated in the following example, based on [Bel86].
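The critical-pair computation at the heart of the completion procedure can be sketched in a few dozen lines. The following Python code is an illustrative sketch, not an implementation from the text: terms are tuples whose head is a function symbol, variables are strings, and critical pairs are found by unifying one renamed left-hand side with each non-variable subterm of another.

```python
def is_var(t):
    return isinstance(t, str)

def subst(t, s):
    """Apply substitution s (a dict), following chains of bindings."""
    if is_var(t):
        return subst(s[t], s) if t in s else t
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def occurs(v, t):
    return t == v if is_var(t) else any(occurs(v, a) for a in t[1:])

def unify(a, b):
    """Most general unifier of a and b as a dict, or None."""
    s, stack = {}, [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = subst(x, s), subst(y, s)
        if x == y:
            continue
        if is_var(x) or is_var(y):
            v, t = (x, y) if is_var(x) else (y, x)
            if occurs(v, t):
                return None
            s[v] = t
        elif x[0] == y[0] and len(x) == len(y):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None
    return s

def positions(t, pos=()):
    """All non-variable subterm positions of t."""
    if not is_var(t):
        yield pos, t
        for i, a in enumerate(t[1:], 1):
            yield from positions(a, pos + (i,))

def put(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (put(t[i], pos[1:], new),) + t[i + 1:]

def rename(t, tag):
    return t + tag if is_var(t) else (t[0],) + tuple(rename(a, tag) for a in t[1:])

def critical_pairs(rules):
    pairs = []
    for i, r1 in enumerate(rules):
        for j, r2 in enumerate(rules):
            l1, rhs1 = (rename(x, "_1") for x in r1)
            l2, rhs2 = (rename(x, "_2") for x in r2)
            for pos, sub in positions(l2):
                if i == j and pos == ():
                    continue  # skip the trivial self-overlap at the root
                s = unify(sub, l1)
                if s is not None:
                    pairs.append((subst(rhs2, s), subst(put(l2, pos, rhs1), s)))
    return pairs

# Rules (H) and (A) from Example 3.7.34 below:
#   fx + fy -> f(x + y)   and   (x + y) + z -> x + (y + z)
H = (("+", ("f", "x"), ("f", "y")), ("f", ("+", "x", "y")))
A = (("+", ("+", "x", "y"), "z"), ("+", "x", ("+", "y", "z")))
cps = critical_pairs([H, A])
# One pair comes from overlapping (H) into (A); it is, up to order,
# (fx + (fy + z), f(x + y) + z). Associativity also overlaps with
# itself below the root; that pair converges.
```

Run against the two rules of Example 3.7.34, the sketch finds exactly the overlap analyzed in the text, plus the convergent self-overlap of associativity.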

Example 3.7.34 In this example, we show that the completion procedure may continue indefinitely, neither producing a confluent system after adding any finite number of rules, nor reaching a state where there is no rule to add. Let us consider two rewrite rules, called (H) for homomorphism and (A) for associativity:

(H)  fx + fy → f(x + y)
(A)  (x + y) + z → x + (y + z)

We can see that these are terminating by constructing a well-founded algebra, A. As the single carrier, we may use the natural numbers greater than 2. To make (A) decreasing, we must choose some non-associative interpretation for +. The simplest familiar arithmetic operation is exponentiation. We may use

+^A(x, y) = y^x,
f^A(x) = x^2.

An easy calculation shows that each rule is decreasing in this algebra. The only critical pair is the result of unifying fx + fy with the subterm (x + y) of the left-hand side of (A). Reducing the term (fx + fy) + z in two ways gives us the critical pair

(f(x + y) + z, fx + (fy + z)), with weights

f^A(x +^A y) +^A z = z^(y^(2x)),
f^A x +^A (f^A y +^A z) = (z^(y^2))^(x^2) = z^(x^2 y^2).
Comparing the numeric values of these expressions for x, y, z ≥ 3, we can see that termination is preserved if we add the rule rewriting the first term to the second. This gives us a rewrite rule that we may recognize as an instance of the general pattern

(A)_n  f^n(x + y) + z → f^n x + (f^n y + z),

where (A) is (A)_0. The general pattern, if we continue the completion procedure, is that by reducing the term f^{n-1}(fx + fy) + z by (H) and (A)_{n-1}, for n ≥ 1, we obtain a critical pair of the form


(f^n(x + y) + z, f^n x + (f^n y + z)), with weights

(f^A)^n(x +^A y) +^A z = z^(y^(2^n x)),
(f^A)^n x +^A ((f^A)^n y +^A z) = z^(x^(2^n) y^(2^n)).

This leads us to add the rewrite rule (A)_n. For this reason, the completion procedure will

continue indefinitely.
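The well-founded interpretation used in this example can be checked by direct computation. The following Python sketch verifies, for a few sample values in the carrier, that (H) and (A) are decreasing and that the first component of the critical pair outweighs the second (the sampling range is our choice, not the text's).

```python
# Interpretation from Example 3.7.34: +^A(x, y) = y^x and f^A(x) = x^2,
# over the natural numbers greater than 2.
def plusA(x, y):
    return y ** x

def fA(x):
    return x * x

for x in range(3, 5):
    for y in range(3, 5):
        # (H):  fx + fy -> f(x + y)
        assert plusA(fA(x), fA(y)) > fA(plusA(x, y))
        for z in range(3, 5):
            # (A):  (x + y) + z -> x + (y + z)
            assert plusA(plusA(x, y), z) > plusA(x, plusA(y, z))
            # Critical pair:  f(x+y) + z  vs  fx + (fy + z); the first
            # weight is larger, so the added rule (A)_1 is decreasing.
            assert plusA(fA(plusA(x, y)), z) > plusA(fA(x), plusA(fA(y), z))
```

Of course, finitely many samples do not prove the inequalities; they hold for all carrier values greater than 2, as the exponent comparisons in the text show.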


There are a number of issues involved in mechanizing the completion procedure that we will not go into. A successful example of completion, carried out by hand, is given in the next section.

Exercise 3.7.35 Consider the infinite rewrite system consisting of rule (H) of Example 3.7.34, along with each rule of the form (A)_n given in that example. Prove that this infinite system is confluent and terminating, or give a counterexample.

Exercise 3.7.36 The rewrite system with the single rule

f(g(fx)) → g(fx)

is terminating, since this rule reduces the number of function symbols in a term. Carry out several steps of the completion procedure. For each critical pair, add the rewrite rule that reduces the number of function symbols. What is the general form of rewrite rule that is

added?

3.7.7 Applications to Algebraic Datatypes

There are several ways that rewrite rules may be used in the design, analysis and application of algebraic specifications. If equational axioms may be oriented (or completed) to form a confluent rewrite system, then this gives us a useful way to determine whether two terms are provably equal. In particular, a confluent rewrite system allows us to show that two terms are not provably equal by showing that they have distinct normal forms. A potential use of rewrite rules in practice is that rewriting may be used as a prototype implementation of a specified datatype. While this may be useful for simple debugging of specifications, reduction is seldom efficient enough for realistic programming. Finally, since our understanding of initial algebras is largely syntactic (based on provability), rewrite systems are very useful in the analysis and understanding of initial algebras. We will illustrate two uses of rewrite systems using the algebraic specifications of lists with error elements. The first is that we may discover the inconsistency of our first specification by testing for local confluence. The second is that we may characterize the initial algebra for the revised specification by completing directed equations to a confluent rewrite system.

In Tables 3.5 and 3.6 of Section 3.6.3, we considered a naive treatment of error values that turned out to be inconsistent. Let us consider the rewrite system obtained by reading the equations from left to right. It is easy to see that these rules are terminating, since every rule reduces the length of an expression. Before checking the entire system for confluence, let us check Table 3.5 and Table 3.6 individually for critical pairs. The only pair of rules that give us a critical pair are the rules

cons x error_l = error_l
cons error_a l = error_l

of Table 3.6. Since the left-hand sides are unified by replacing x and l by error values, we have the critical pair (error_l, error_l). However, this critical pair obviously does not cause local confluence to fail. It follows that, taken separately, the rules determined from Table 3.5 or Table 3.6 are confluent and terminating. Hence by Corollary 3.7.14, the equations in each table are consistent.

Now let us examine rules from Table 3.6 that might overlap with a rule from Table 3.5. The first candidate is cons error_a l → error_l, since a rule car(cons x l) → x containing cons appears in Table 3.5. We have a nontrivial overlap, since the term car(cons error_a l) is a substitution instance of one left-hand side and also has a subterm matching the left-hand side of another rule. Applying one reduction, we obtain error_a, while the other gives us car error_l. Thus our first nontrivial critical pair is (error_a, car error_l). However, this critical pair is benign, since car error_l → error_a.

The next critical pair is obtained by unifying the left-hand side of cons error_a l → error_l with a subterm of the next rule, cdr(cons x l) → l. Substituting error_a for x produces a term cdr(cons error_a l), which reduces either to the list variable l or to cdr error_l. Since the latter only reduces to error_l, we see that confluence fails. In addition, we have found a single expression that reduces both to a list variable l and to a list constant error_l. Therefore, error_l ↔ l [l: list], and so by Lemma 3.7.11 the equational specification given in Tables 3.5 and 3.6 proves the equation error_l = l [l: list]. This shows that all lists must be equal in any algebra satisfying the specification. From this point, it is not hard to see that similar equations are provable for the other sorts. Thus we conclude that the specification is inconsistent.

The revised specification of lists with error elements is given in Table 3.7. We obtain a rewrite system by reading each equation from left to right. Before checking the critical pairs, let us establish termination. This will give us a basis for orienting additional rules that arise during the completion procedure. We can show termination of this rewrite system using a well-founded algebra A with


each carrier the set of positive natural numbers (the numbers greater than 0). We interpret each of the constants as 1 and the functions as follows. This interpretation was developed by estimating the maximum number of reduction steps that could be applied to a term of each form, as described in Example 3.7.17.

cons^A x t = x + t
car^A t = 2 * t + 5
cdr^A t = 2 * t + 5
isempty?^A t = t + 8
cond^A u x y = u + x + y + 1
OK^A x = x + 8

It is straightforward to check that L > R for each of the rewrite rules derived from the specification. There are several critical pairs, all involving the rules for cons x error and cons error t. The first overlap we will check is

car(cons error t).

If we reduce the outer term, we get cond (OK error) (cond (OK t) error error) error, while the inner reduction gives us car error. This is a nontrivial critical pair. However, both terms reduce to error, so there is no problem here. The next overlap is

car(cons x error),

which gives us the similar-looking critical pair

(cond (OK x) (cond (OK error) x error) error, car error).

The second term clearly reduces to error, but the closest we can come with the first term (by reducing the inner conditional) is

cond (OK x) error error.

This gives us a candidate for completion. In the well-founded algebra A, it is easy to see that


cond (OK x) error error > error,

so we will preserve termination by adding the rewrite rule

cond (OK x) error error → error.

Since the left-hand and right-hand sides of this rule are already provably equal from this specification, adding this rule does not change the conversion relation between terms. The critical pairs for cdr and isempty? are similar. For both functions, the first critical pair converges easily and the second converges using the new rule added for the car case. The final critical pairs are for OK applied to a cons. For OK(cons error t), we get false in both cases. However, for

OK(cons x error)

we must add the rewrite rule

cond (OK x) false false → false.

Thus two new rules give us a confluent rewrite system. The initial algebra for a specification consists of the equivalence classes of ground (variable-free) terms, modulo provable equality. Since the confluent and terminating rewrite system for the list specification allows us to test provable equality by reduction, we may use the rewrite system to understand the initial algebra. Specifically, since every equivalence class has exactly one normal form, the initial algebra is isomorphic to the algebra whose carriers are the ground normal forms of each sort. Therefore, the atoms in the initial algebra are a, b, c, d and error_a; the booleans in the initial algebra are true, false and error_b; and the lists all have the form nil, error_l, or cons A L, where neither A nor L contains an error element. Thus this specification correctly adds exactly one error element to each carrier and eliminates undesirable lists that contain error elements. The reader interested in a general theorem about initial algebras with error elements may consult [GTW78, Wir90].

Exercise 3.7.37 Exercise 3.6.7 asks you to write an algebraic specification of queues with error elements, in the same style as the specification of lists in Table 3.7. Complete this specification to a confluent set of rewrite rules and use these rules to characterize the initial algebra.

Exercise 3.7.38 Exercise 3.6.8 asks you to write an algebraic specification of trees with error elements, in the same style as the specification of lists in Table 3.7. Complete this specification to a confluent set of rewrite rules and use these rules to characterize the initial algebra.

Simply-typed Lambda Calculus

4.1 Introduction

This chapter presents the pure simply-typed lambda calculus. This system, which is extended with natural numbers, booleans and fixed-point operators to produce PCF, is the core of all of the calculi studied in this book, except for the simpler system of universal algebra used in Chapter 3. The main topics of this chapter are:

• Presentation of context-sensitive syntax using typing axioms and inference rules.

• The equational proof system (axiomatic semantics) and reduction system (operational semantics), alone and in combination with additional equational axioms or rewrite rules.

• Henkin models (denotational semantics) of typed lambda calculus and discussion of general soundness and completeness theorems.

Since Chapter 2 describes the connections between typed lambda calculus and programming in some detail, this chapter is more concerned with mathematical properties of the system. We discuss three examples of models, set-theoretic models, domain-theoretic models using partial orders, and models based on traditional recursive function theory, but leave full development of domain-theoretic and recursion-theoretic models to Chapter 5. The proofs of some of the theorems stated or used in this chapter are postponed to Chapter 8. Although versions of typed lambda calculus with product, sum, and other types described in Section 2.6 are useful for programming, the pure typed lambda calculus with only function types is often used in this chapter to illustrate the main concepts.

Church developed lambda calculus in the 1930's as part of a formal logic and as a formalism for defining computable functions [Chu36, Chu41]. The original logic, which was untyped, was intended to serve a foundational role in the formalization of mathematics. The reason that lambda calculus is relevant to logic is that logical quantifiers (such as universal, ∀, and existential, ∃, quantifiers) are binding operators and their usual proof rules involve substitution. Using lambda abstraction, logical quantifiers may be treated as constants, in much the same way that recursive function definitions may be treated using fixed-point constants (see Example 4.4.7 and associated exercises). However, a paradox was discovered in the logic based on untyped lambda calculus [KR35]. This led to the development of typed lambda calculus, as part of a consistent, typed higher-order logic [Hen50]. In 1936, Kleene proved that the natural-number functions definable in untyped lambda calculus are exactly the recursive functions. Turing, in 1937, then proved that the λ-definable numeric functions are exactly the functions computable by Turing machines


[Tur37]. Together, these results, which are analogous to properties of PCF considered in Sections 2.5.4 and 2.5.5, establish connections between lambda calculus and other models of computation. After the development of electronic computers, untyped lambda calculus clearly influenced the development of Lisp in the late 1950's [McC60]. (See [McC78, Sto91b] for many interesting historical comments.) General connections between lambda notation and other programming languages were elaborated throughout the 1960's by various researchers [BG66, Lan63, Lan65, Lan66], leading to the view that the semantics of programming languages could be given using extensions of lambda calculus [Str66, MS76, Sto77]. Although untyped lambda calculus provides a simpler execution model, a modern view is that typed lambda calculus leads to a more insightful analysis of programming languages. Using typed lambda calculi with type structures that correspond to types in programming languages, we may faithfully model the power and limitations of typed programming languages.

Like many-sorted algebra and first-order logic, typed lambda calculus may be defined using any collection of type constants (corresponding to sorts in algebra) and term constants (corresponding to function symbols). Typical type constants in programming languages are numeric types and booleans, as in PCF, as well as characters, strings, and other sorts that appear in the algebraic datatypes of Chapter 3. In typed lambda calculus, the term constants may have any type. Typical term constants in programming languages include 3, + and if ... then ... else ..., as in PCF. As in algebra, the sets of type constants and term constants are given by a signature. The axiom system (axiomatic semantics), reduction rules (operational semantics) and model theory (denotational semantics) are all formulated in a way that is applicable to any signature.

Exercise 4.1.1 In the untyped logical system Church proposed in 1932, Russell's paradox (see Section 1.6) results in the untyped lambda term

(λx. not(xx))(λx. not(xx))

This may be explained as follows. If we identify sets with predicates, and assume that x is a set, then the formula xx is true if x is an element of x, and not(xx) is true if x is not an element of x. Therefore the predicate (λx. not(xx)) defines the set of all sets that are not members of themselves. Russell's paradox asks whether this set is a member of itself, resulting in the application above. Show that if this term is equal to false, then it is also equal to true, and conversely. You may assume that not true = false and not false = true. (Church did not overlook this problem. He attempted to avoid inconsistency by not giving either truth value to this paradoxical term.)

4.2 Types

4.2.1 Syntax

We will use the phrase "simply-typed lambda calculus" to refer to any version of typed lambda calculus whose types do not contain type variables. The standard type forms that occur in simply-typed lambda calculus are function types, cartesian products, sums (disjoint unions) and initial ("null" or "empty") and terminal ("trivial" or "one-element") types. We do not consider recursive types, which were introduced in Section 2.6.3 and are considered again in Section 7.4, a form of simple type. (Recursive types contain type variables but a type variable alone is not a type, so this could be considered a borderline case.) Since lambda calculus would not be lambda calculus without lambda, all versions of simply-typed lambda calculus include function types. The type expressions of full simply-typed lambda calculus are given by the grammar

σ ::= b | null | unit | σ + σ | σ × σ | σ → σ

where b may be any type constant, null is the initial (empty) type and, otherwise, terminal (one-element), sum, product and function types are as in Chapter 2. (Type connectives × and → are used throughout Chapter 2; unit and + appear in Section 2.6.) Recall that when parentheses are omitted, we follow the convention that → associates to the right. In addition, we give × higher precedence than →, and → higher precedence than +, so that

a + b × c → d is read as a + ((b × c) → d).
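The precedence and associativity conventions just stated can be made concrete with a small pretty-printer. The following Python sketch (our own encoding: types as tuples, ASCII `*` for × and `->` for →) inserts parentheses exactly where the conventions require them.

```python
# Precedence levels: 0 = sum, 1 = arrow, 2 = product, 3 = atom.
# '*' binds tighter than '->', and '->' tighter than '+';
# '->' associates to the right (for '+' we arbitrarily pick the right).
def show(t, prec=0):
    if isinstance(t, str):          # a type constant such as 'b'
        return t
    tag = t[0]
    if tag == "+":
        s, level = f"{show(t[1], 1)} + {show(t[2], 0)}", 0
    elif tag == "->":
        s, level = f"{show(t[1], 2)} -> {show(t[2], 1)}", 1
    else:                           # tag == "*"
        s, level = f"{show(t[1], 3)} * {show(t[2], 2)}", 2
    return f"({s})" if level < prec else s

# a + ((b * c) -> d) prints without parentheses, per the conventions:
assert show(("+", "a", ("->", ("*", "b", "c"), "d"))) == "a + b * c -> d"
# right association of the arrow:
assert show(("->", "a", ("->", "b", "c"))) == "a -> b -> c"
assert show(("->", ("->", "a", "b"), "c")) == "(a -> b) -> c"
```

The level assigned to each connective, compared against the precedence of the context, is what decides whether parentheses appear.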

To keep all the possibilities straight, we will name versions of typed lambda calculus by their types. The simplest language, called simply-typed lambda calculus with function types, will be indicated by the symbol λ→. With sums, products and functions, we have λ+,×,→, and so on. A λ→ type expression is a type expression containing only → and type constants. A λ×,→ type expression is a type expression containing only ×, → and type constants, and similarly for other fragments of simply-typed lambda calculus.

4.2.2 Interpretation of Types

There are two general frameworks for describing the denotational semantics of typed lambda calculus, Henkin models and cartesian closed categories. While cartesian closed categories are more general, they are also more abstract and therefore more difficult to understand at first. Therefore, we will concentrate on Henkin models in this chapter and postpone category theory to Chapter 7. In a Henkin model, each type expression is interpreted as a set, the set of values of that type. The most important aspect of a Henkin model is the interpretation of function types, since the interpretation of other types is generally determined by the function types.


Intuitively, σ → τ is the type, or collection, of all functions from σ to τ. However, there are important reasons not to interpret σ → τ as all set-theoretic functions from σ to τ. The easiest to state is that not all set-theoretic functions have fixed points. In the denotational semantics of any typed lambda calculus with a fixed-point constant, the type σ → σ must contain all functions definable in the language, but must not contain any functions that do not have fixed points. There are many reasonable interpretations of function types. A general intuitive guide is to think of σ → τ as containing all functions from σ to τ that have some general and desirable property, such as computability or continuity. To emphasize from the outset that many views are possible, we will briefly summarize three representative interpretations: classical set-theoretic functions, recursive functions coded by Gödel numbers, and continuous functions on complete partial orders (domains). Each will be explained fully later.

In the full set-theoretic interpretation, a type constant b may denote any set A^b, and the function type σ → τ denotes the set A^(σ→τ) of all functions from A^σ to A^τ.

Note that A^(σ→τ) is determined by the sets A^σ and A^τ. There are several forms of recursion-theoretic interpretation, each involving sets with some kind of enumeration function or relation. We will consider a representative framework based on partial enumeration functions. A modest set (A, e_A) is a set A together with a surjective partial function e_A: N → A from the natural numbers to A. Intuitively, the natural number n is the "code" for e_A(n) ∈ A. Since e_A is partial, e_A(n) may not be defined, in which case n is not a code for any element of A. The importance of having codes for every element of A is that we use coding to define the recursive functions on A. Specifically, f: A → B is recursive if there is a recursive map on natural numbers taking any code for a ∈ A to some code for f(a) ∈ B.

In the recursive interpretation, a type constant b may denote any modest set (A^b, e_b), and the function type σ → τ denotes the modest set A^(σ→τ) of all total recursive functions from A^σ to A^τ.
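The coding discipline behind modest sets can be illustrated with a toy example (the particular set and enumeration below are our own, chosen only to show the tracking condition, with a total enumeration for simplicity).

```python
# A toy modest set: A = the even naturals, enumerated by e(n) = 2n.
def e(n):
    return 2 * n            # total and surjective onto the even naturals

# f: A -> A given by f(a) = a + 2 is recursive in the modest-set sense,
# because the map on codes n |-> n + 1 tracks it: e(n + 1) = f(e(n)).
def f(a):
    return a + 2

def f_on_codes(n):
    return n + 1

assert all(e(f_on_codes(n)) == f(e(n)) for n in range(1000))
```

The commuting condition checked by the final line, a code for a is taken to a code for f(a), is exactly what "recursive map on natural numbers" requires in the definition above.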

Since every total recursive function from A^σ to A^τ has a Gödel number, we can use any standard numbering of recursive functions to enumerate the elements of the function type A^(σ→τ). Thus, if we begin with modest sets for each type constant, we can associate a modest set with every functional type. The continuous or domain-theoretic interpretation uses complete partial orders. The primary motivations for this interpretation are to give semantics to recursively defined


functions and recursive type definitions. A complete partial order, or CPO, (D, ≤_D) is a set D together with a partial order ≤_D; the domain-theoretic interpretation is developed fully in Chapter 5.

We will now show that the meaning of a well-typed term does not depend on the typing derivation we use. This is the intent of the following two lemmas. The first requires some special notation for typing derivations. In general, a typing derivation is a tree, with each leaf an instance of a typing axiom, and each internal node (non-leaf) an instance of an inference rule. In the special case that an inference rule has only one premise, uses of this rule result in a linear sequence of proof steps. If we have a derivation ending in a sequence of uses of (add var), this derivation has the form Δ, Δ′, where Δ is generally a tree, and Δ′ is a sequence of instances of (add var) concatenated onto the end of Δ. We use this notation to state that (add var) does not affect the meanings of terms.

Lemma 4.5.7 Suppose Δ is a derivation of Γ ▷ M: σ and Δ, Δ′ is a derivation of Γ′ ▷ M: σ such that only rule (add var) appears in Δ′. Then for any η ⊨ Γ′, we have

A⟦Γ ▷ M: σ⟧η = A⟦Γ′ ▷ M: σ⟧η,

where the meanings are taken with respect to derivations Δ and Δ, Δ′.

The proof is an easy induction on typing derivations, left to the reader. We can now show that the meanings of "compatible" typings of a term are equal.


Lemma 4.5.8 Suppose Δ and Δ′ are derivations of typings Γ ▷ M: σ and Γ′ ▷ M: σ, respectively, and that Γ and Γ′ give the same type to every x free in M. Then

A⟦Γ ▷ M: σ⟧η = A⟦Γ′ ▷ M: σ⟧η′

if η ⊨ Γ and η′ ⊨ Γ′ agree on the free variables of M, where the meanings are defined using Δ and Δ′, respectively.

The proof, which we leave as an exercise, is by induction on the structure of terms. It follows that the meaning of any well-typed term is independent of the typing derivation. This lemma allows us to regard an equation Γ ▷ M = N: σ as an equation between M and N, rather than between derivations of typings Γ ▷ M: σ and Γ ▷ N: σ. Lemma 4.5.8 is a simple example of a coherence theorem. In general, a coherence problem arises when we interpret syntactic expressions using some extra information that is not uniquely determined by the expressions themselves. In the case at hand, we are giving meaning to terms, not typing derivations. However, we define the meaning function using extra information provided by typing derivations. Therefore, we must show that the meaning of a term depends only on the term itself, and not the typing derivation. The final lemma about the environment model condition is that it does not depend on the set of term constants in the signature.

Lemma 4.5.9 Let A be an environment model for terms over signature Σ and let Σ′ be a signature containing Σ (i.e., containing all the type and term constants of Σ). If we expand A to Σ′ by interpreting the additional term constants as any elements of the appropriate types, then we obtain an environment model for terms over Σ′.

Proof This lemma is easily proved using the fact that every constant is equal to some variable in some environment. More specifically, if we want to know that a term Γ ▷ M: σ with constants from Σ′ has a meaning in some environment η, we begin by replacing the constants with fresh variables. Then, we choose some environment η′ that is identical to η on the free variables of M, giving the new variables the values of the constants they replace. If A is an environment model, then the new term must have a meaning in environment η′, and so it is easy to show that the original term must have a meaning in η.

Exercise 4.5.10 Recall that in a full set-theoretic function hierarchy, A^(σ→τ) contains all set-theoretic functions from A^σ to A^τ. Show, by induction on typing derivations, that for any full set-theoretic hierarchy A, typed term Γ ▷ M: σ and any environment η ⊨ Γ, the meaning A⟦Γ ▷ M: σ⟧η is well-defined and an element of A^σ.

Exercise 4.5.11 Recall that in a full set-theoretic function hierarchy, A^(σ→τ) contains all set-theoretic functions from A^σ to A^τ. If there are no term constants, then this frame is simply a family of sets A = {A^σ} indexed by types. This problem is concerned with the full set-theoretic hierarchy for the signature with one base type b and no term constants, determined by A^b = {0, 1}. It is easy to see that there are two elements of A^b, four elements of A^(b→b), and 4^4 = 256 elements of type (b → b) → (b → b).

(a) Calculate the meaning of the term λf: b → b. λx: b. f(fx) in this model using the inductive definition of meaning.

(b) Express your answer to part (a) by naming the four elements of A^(b→b) and stating the result of applying the function defined by λf: b → b. λx: b. f(fx) to each element.

(c) Recall that every pure typed lambda term has a unique normal form. It follows from the soundness theorem for Henkin models that the meaning of any term in any model is the same as the meaning of its normal form. Use these facts about normal forms to determine how many elements of each of the types A^b, A^(b→b) and A^((b→b)→(b→b)) are the meanings of closed typed lambda terms without term constants.
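Parts (a) and (b) can be explored by brute force. The following Python sketch (functions represented as dicts, an encoding assumed here rather than taken from the text) builds the four elements of A^(b→b) and applies the meaning of λf: b → b. λx: b. f(fx) to each.

```python
from itertools import product

Ab = [0, 1]                               # A^b = {0, 1}
# A^(b->b): all four functions from {0,1} to {0,1}, as dicts.
Abb = [dict(zip(Ab, vals)) for vals in product(Ab, repeat=2)]

def twice(f):
    # the meaning of  \f: b->b. \x: b. f(f x)  applied to f
    return {x: f[f[x]] for x in Ab}

ident = {0: 0, 1: 1}
neg = {0: 1, 1: 0}
assert len(Ab) == 2 and len(Abb) == 4 and 4 ** 4 == 256
# The two constant functions and the identity are fixed by twice,
# while negation is sent to the identity:
assert twice(ident) == ident and twice(neg) == ident
assert twice({0: 0, 1: 0}) == {0: 0, 1: 0}
assert twice({0: 1, 1: 1}) == {0: 1, 1: 1}
```

The final assertions give one way of stating the answer to part (b): composing each function with itself fixes the identity and the two constant functions, and collapses negation to the identity.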

Exercise 4.5.12 [Fri75] This exercise is also concerned with the full set-theoretic hierarchy, A, for one type constant, b. We assume that A^b has at least two elements, so that the model is nontrivial. We define a form of higher-order logic with formulas

φ ::= (M = N) | φ ∧ φ | ¬φ | ∀x: σ. φ

where in writing these formulas we assume that M and N have the same type (in some typing context) and that, in the final case, φ is well-formed under the assumption that x is a variable of type σ. Each formula may be interpreted in A in the obvious way. We will use the abbreviation M ↔ N for (M ⊃ N) ∧ (N ⊃ M), and assume abbreviations for other connectives and the quantifier ∃ as in Example 4.4.7. We will also use the abbreviations T ≡ λx: b. λy: b. x, F ≡ λx: b. λy: b. y and, for M: σ and N: τ, ⟨M, N⟩ ≡ λx: σ → τ → b. xMN. Show that for every formula φ there exist terms M and N such that A ⊨ φ ↔ (M = N), by the steps given below.

(a) Show that A ⊨ (∀x: σ. M = N) ↔ (λx: σ. M = λx: σ. N).

(b) Show that, for any M, N, P, Q of appropriate types, A ⊨ (⟨M, N⟩ = ⟨P, Q⟩) ↔ (M = P ∧ N = Q).

(c) We say φ is existential if φ has the form ∃x: σ. (M = N). Show that if φ is existential, then there is an existential ψ with the same free variables such that A ⊨ ψ ↔ ¬φ.

(d) Show that if φ and ψ are existential, then there is an existential θ with FV(θ) = FV(φ) ∪ FV(ψ) such that A ⊨ θ ↔ (φ ∧ ψ).

(e) Show that if φ is existential, there is an existential ψ such that A ⊨ ψ ↔ ∀x: σ. φ.

(f) Prove the main claim, using the argument that for any φ, there is an existential formula with the same free variables as φ equivalent to ¬φ.

4.5.4 Type and Equational Soundness

Since there are two proof systems, one for proving typing assertions and one for equations, there are two forms of soundness for the simply-typed lambda calculus and other typed lambda calculi. The first, type soundness, is stated precisely in the following lemma.

Lemma 4.5.13 (Type Soundness) Let A be any Henkin model for signature Σ, Γ ▷ M: σ a provable typing assertion, and η ⊨ Γ an environment for A. Then A⟦Γ ▷ M: σ⟧η ∈ A^σ.

This has the usual form for a soundness property: if a typing assertion is provable, then it holds semantically. The proof is a straightforward induction on typing derivations, which is omitted. With a little thought, we may interpret Lemma 4.5.13 as saying that well-typed terms do not contain type errors. In a general framework for analyzing type errors, we would expect to identify certain function applications as erroneous. For example, in the signature for PCF, we would say that an application of + to two arguments would produce a type error if either of the arguments is not a natural number. We will use this example error to show how Lemma 4.5.13 rules out errors. Suppose we have a signature that gives addition the type +: nat → nat → nat and a Henkin model A that interprets + as a binary function on A^nat. We consider any application of + to arguments that are not elements of A^nat an error. It is easy to see that the only way to derive a typing for an application + M N is to give M and N type nat. By Lemma 4.5.13, if we can prove typing assertions Γ ▷ M: nat and Γ ▷ N: nat, then in any Henkin model the semantic meaning of M and N will be an element of the set A^nat of semantic values for natural numbers. Therefore, the meaning of a syntactically well-typed application + M N will be determined by applying the addition function to two elements of A^nat. It follows that no type error arises in the semantic interpretation of any well-typed term in this Henkin model. To conclude that no type errors arise in a specific operational semantics, we may prove an equivalence between the denotational and operational semantics, or give a separate operational proof. The standard statement of type soundness for an operational semantics given by reduction is Lemma 4.4.16. Before proving equational soundness, we state some important facts about variables and


the meanings of terms. The most important of these is the substitution lemma, which is analogous to the substitution lemma for algebra in Chapter 3. A simple, preliminary fact is that the meaning of Γ ▷ M: σ in environment η can only depend on η(y) if y is free in M.

Lemma 4.5.14 (Free Variables) Suppose η₁, η₂ ⊨ Γ are environments for A such that η₁(x) = η₂(x) for every x ∈ FV(M). Then A⟦Γ ▷ M: σ⟧η₁ = A⟦Γ ▷ M: σ⟧η₂.

The proof is a straightforward inductive argument, following the pattern of the proof of Lemma 4.3.6. Since substitution involves renaming of bound variables, we will prove that the names of bound variables do not affect the meaning of a term. To make the inductive proof go through, we need a stronger inductive hypothesis that includes renaming free variables and setting the environment appropriately.

Lemma 4.5.15 Let Γ = {x₁:σ₁, ..., x_k:σ_k} and suppose Γ ▷ M: σ is a well-typed term. Let Γ' = {y₁:σ₁, ..., y_k:σ_k} and let N be any term that is α-equivalent to [y₁, ..., y_k / x₁, ..., x_k]M. If η'(y_i) = η(x_i), for 1 ≤ i ≤ k, then A⟦Γ ▷ M: σ⟧η = A⟦Γ' ▷ N: σ⟧η'.

For reference, the standard inductive clauses defining the meaning function are as follows:

A⟦Γ ▷ x: σ⟧η = η(x)

A⟦Γ ▷ M P: τ⟧η = App^{σ,τ} (A⟦Γ ▷ M: σ→τ⟧η) (A⟦Γ ▷ P: σ⟧η)

A⟦Γ ▷ λx:σ. M: σ→τ⟧η = the unique p ∈ A^{σ→τ} such that App p d = A⟦Γ, x:σ ▷ M: τ⟧η[x ↦ d] for all d ∈ A^σ

A⟦Γ ▷ ⟨M, P⟩: σ×τ⟧η = the unique p ∈ A^{σ×τ} such that Proj₁ p = A⟦Γ ▷ M: σ⟧η and Proj₂ p = A⟦Γ ▷ P: τ⟧η

Recall that an applicative structure satisfies the environment model condition if the standard inductive clauses define a total meaning function ⟦·⟧· on terms Γ ▷ M: σ and environments η ⊨ Γ. In other words, for every well-typed term Γ ▷ M: σ and every environment η ⊨ Γ, the meaning A⟦Γ ▷ M: σ⟧η ∈ A^σ must exist as defined above.


Combinatory Model Condition

As for λ→, the condition that every term has a meaning is equivalent to the existence of combinators, where each combinator is characterized by an equational axiom. To avoid any confusion, the constants for combinators and their types are listed below.

Zero^σ : null → σ
* : unit
Inleft^{σ,τ} : σ → (σ + τ)
Inright^{σ,τ} : τ → (σ + τ)
Case^{σ,τ,ρ} : (σ → ρ) → (τ → ρ) → (σ + τ) → ρ
Pair^{σ,τ} : σ → τ → (σ × τ)
Proj₁^{σ,τ} : (σ × τ) → σ
Proj₂^{σ,τ} : (σ × τ) → τ

The equational axioms for these constants are the equational axioms of λ^{null,unit,+,×,→}. It is straightforward to extend Theorem 4.5.33 and the associated lemmas in Section 4.5.7 to λ^{null,unit,+,×,→}. Most of the work lies in defining pseudo-abstraction and verifying its properties. However, all of this has been done in Section 4.5.7.

Exercise 4.5.36 Prove Lemma 4.5.3 for λ^{null,unit,+,×,→}. Show that if A is a Henkin model, then A^{σ×τ} is isomorphic to a subset of A^σ × A^τ.

Models of Typed Lambda Calculus

5.1 Introduction

This chapter develops two classes of models for typed lambda calculus, one based on partially-ordered structures called domains, and the other based on traditional recursive function theory (or Turing machine computation). The second class of models is introduced using partial recursive functions on the natural numbers, with the main ideas later generalized to a class of structures called partial combinatory algebras. The main topics of this chapter are:

• Domain-theoretic models of typed lambda calculus with fixed-point operators based on complete partial orders (CPOs).

• Fixed-point induction, a proof method for reasoning about recursive definitions.

• Computational adequacy and full abstraction theorems, relating the operational semantics of PCF (and variants) to denotational semantics over domains.

• Recursion-theoretic models of typed lambda calculus without fixed-point operators, using a class of structures called modest sets.

• Relationship between modest sets and partial equivalence relations over untyped computation structures called partial combinatory algebras.

• Fixed-point operators in recursive function models.

In the sections on domain-theoretic models, we are primarily concerned with the simplest class of domains, called complete partial orders, or CPOs for short. The main motivation for studying domains is that they provide a class of models with fixed-point operators and methods for interpreting recursive type expressions. Our focus in this chapter is on fixed points of functions, with recursive types considered in Section 7.4. Since fixed-point operators are essential for recursion, and recursion is central to computation, domains are widely studied in the literature on programming language mathematics. A current trend in the semantics of computation is towards structures that are related to traditional recursive function theory. One reason for this trend is that recursive function models give a pleasant and straightforward interpretation of polymorphism and subtyping (studied in Chapters 9 and 10, respectively). Another reason for developing recursion-theoretic models is the close tie with constructive logic and other theories of computation. In particular, complexity theory and traditional studies of computability are based on recursive functions characterized by Turing machines or other computation models. It remains to be seen, from further research, how central the ideas from domain theory are for models based on recursive function theory. Some indication of the relationships is discussed in Section 5.6.4.


Both classes of structures are presented primarily as λ^{×,→} models, leaving the interpretation of sums (disjoint union) and other types to the exercises. The proofs of some of the theorems stated or used in this chapter are postponed to Chapter 8.

5.2 Domain-theoretic Models and Fixed Points

5.2.1 Recursive Definitions and Fixed-Point Operators

Before defining complete partial orders and investigating their properties, we motivate the central properties of domains using an intuitive discussion of recursive definitions. Recall from Section 2.2.5 that if we wish to add a recursive definition form

letrec f: σ = M in N

to typed lambda calculus, it suffices to add a fixed-point operator fix_σ that returns a fixed point of any function from σ to σ. Therefore, we study a class of models for λ→, and extensions, in which functions of the appropriate type have fixed points. A specific example will be a model (denotational semantics) of PCF. We will use properties of fix reduction to motivate the semantic interpretation of fix. To emphasize a few points about (fix) reduction, we will review the factorial example from Section 2.2.5. In this example, we assume reduction rules for conditional, equality test, subtraction and multiplication. The factorial function may be written fact ≡ fix F, where F is the expression

F ≡ λf: nat → nat. λy: nat. if Eq? y 0 then 1 else y * f(y − 1)

Since the type is clear from context, we will drop the subscript from fix. To compute fact n, we expand the definition, and use reduction to obtain the following.

fact n = (fix F) n = F (fix F) n
       = (λy: nat. if Eq? y 0 then 1 else y * ((fix F)(y − 1))) n
       = if Eq? n 0 then 1 else n * ((fix F)(n − 1))

When n = 0, we can use the reduction axiom for conditional to simplify fact 0 to 1. For n > 0, we can simplify the test to obtain n * ((fix F)(n − 1)), and continue as above. For any numeral n, we will eventually reduce fact n to the numeral for n!. An alternative approach to understanding fact is to consider the finite expansions of


fix F. To make this as intuitive as possible, let us temporarily think of nat → nat as a collection of partial functions on the natural numbers, represented by sets of ordered pairs. Using a constant diverge for the "nowhere defined" function (the empty set of ordered pairs), we let the "zero-th expansion" fix^[0] F = diverge and define

fix^[n+1] F = F (fix^[n] F)

Intuitively, fix^[n] F describes the recursive function computed using at most n evaluations of the body of F. Or, put another way, fix^[n] F is the best we could do with a machine having such limited memory that allocating space for more than n function calls would overflow the run-time stack. For example, we can see that (fix^[1] F) 0 = 1 and (fix^[2] F) 1 = 1, but (fix^[2] F) n is undefined for n ≥ 2. Viewed as sets of ordered pairs, the finite expansions of fix F are linearly ordered by set-theoretic containment. Specifically, fix^[0] F = ∅ is the least element in this ordering, and fix^[n+1] F = (fix^[n] F) ∪ {⟨n, n!⟩} properly contains fix^[i] F for i ≤ n. This reflects the fact that if we are allowed more recursive calls, we may compute factorial for larger natural numbers. In addition, since every terminating computation involving factorial uses only a finite number of recursive calls, it makes intuitive computational sense to let fact = ∪_{n≥0} (fix^[n] F); this gives us the standard factorial function. However, a priori, there is no reason to believe that for arbitrary F, ∪_{n≥0} (fix^[n] F) is a fixed point of F. This may be guaranteed by imposing relatively natural conditions on F (or the basic functions used to define F). Since any fixed point of F must contain the functions defined by finite expansions of F, ∪_{n≥0} (fix^[n] F) will be the least fixed point of F (when we order partial functions by set containment). In domain-theoretic models of typed lambda calculus, types denote partially-ordered sets of values called domains. Although it is possible to develop domains using partial functions, we will follow the more standard approach of total functions. However, we will alter the interpretation of nat so that nat → nat "contains" the partial functions on the ordinary natural numbers in a straightforward way. This will allow us to define an ordering on total functions that reflects the set-theoretic containment ordering of partial functions. In addition, all functions in domain-theoretic models will be continuous, in a certain sense.
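The finite expansions can be computed literally as sets of ordered pairs. In the sketch below (our own Python illustration, with partial functions as dictionaries restricted to a finite window of arguments), the chain fix^[n] F grows by one pair at each step:

```python
def F(f):
    """One unfolding of the factorial body: given a partial function f
    (a dict of ordered pairs), return the partial function
    y -> if y = 0 then 1 else y * f(y - 1), defined only where f is."""
    g = {0: 1}
    for y in range(1, 20):             # a finite window of arguments
        if y - 1 in f:
            g[y] = y * f[y - 1]
    return g

diverge = {}                           # fix[0] F: the empty set of pairs
expansions = [diverge]
for _ in range(5):
    expansions.append(F(expansions[-1]))

# The chain is linearly ordered by set containment:
for small, big in zip(expansions, expansions[1:]):
    assert set(small.items()) <= set(big.items())

print(expansions[3])                   # {0: 1, 1: 1, 2: 2}
```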
This will imply that least fixed points can be characterized as least upper bounds of countable sets like {fix^[n] F | n ≥ 0}. A specific problem related to recursion is the interpretation of terms that define nonterminating computations. Since recursion allows us to write expressions that have no normal form, we must give meaning to expressions that do not seem to define any standard value. For example, it is impossible to simplify the PCF expression

letrec f: nat → nat = λx: nat. f(x + 1) in f 3


to a numeral, even though the type of this expression is nat. It is sensible to give this term type nat, since it is clear from the typing rules that if any term of this form were to have a normal form, the result would be a natural number. Rather than say that the value of this expression is undefined, which would make the meaning function on models a partial function, we extend the domain of "natural numbers" to include an additional value ⊥_nat to represent nonterminating computations of type nat. This gives us a way to represent partial functions as total ones, since we may view any partial numeric function as a function into the domain of natural numbers with ⊥_nat added. The ordering of a domain is intended to characterize what might be called "information content" or "degree of definedness." Since a nonterminating computation is less informative than any terminating computation, ⊥_nat will be the least element in the ordering on the domain of natural numbers. We order nat → nat point-wise, which gives rise to an ordering that strongly resembles the containment ordering on partial functions. For example, since the constant function λx: nat. ⊥_nat produces the least element from any argument, it will be the least element of nat → nat. Functions that are defined on some arguments, and intuitively "undefined" elsewhere, will be greater than the least element of the domain, but less than functions that are defined on more arguments. By requiring that every function be continuous with respect to the ordering, we may interpret fix as the least fixed-point functional. For continuous F, the least fixed point will be the least upper bound of all finite expansions fix^[n] F.

5.2.2 Complete Partial Orders, Lifting and Cartesian Products

Many families of ordered structures with the general properties described above are called domains. We will focus on the complete partial orders, since most families of domains are obtained from these by imposing additional conditions. In this section, we define complete partial orders and investigate two operations on complete partial orders, lifting and cartesian product. Lifting is useful for building domains with least elements, an important step in finding fixed points, and for relating total and partial functions. Cartesian products of CPOs are used to interpret product types. The definition of complete partial order requires subsidiary definitions of partial order and directed set. A partial order (D, ≤) is a set D together with a reflexive, transitive and antisymmetric binary relation ≤ on D.
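For a concrete miniature of these definitions, the sketch below (our own Python encoding, with None standing for ⊥) builds the lifted booleans as a pointed partial order and the point-wise order on functions over them:

```python
BOT = None            # our encoding of the bottom element of B_bot
B_lift = [BOT, True, False]

def le(x, y):
    """The information ordering on B_bot: bottom below everything,
    distinct proper values incomparable."""
    return x == BOT or x == y

def le_fun(f, g):
    """Point-wise order on functions B_bot -> B_bot (as dicts)."""
    return all(le(f[x], g[x]) for x in B_lift)

bottom_fun = {x: BOT for x in B_lift}           # Ax. bottom
ident = {BOT: BOT, True: True, False: False}    # a strict identity

assert all(le(BOT, x) for x in B_lift)          # BOT is least in B_bot
assert le_fun(bottom_fun, ident)                # Ax.bottom is below ident
assert not le_fun(ident, bottom_fun)
```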

Exercise 5.2.17 If f: D₁ × ... × Dₙ → E, then there are two ways that we might think of f being continuous. The first is that we could fix n − 1 arguments, and ask whether the resulting unary function is continuous from Dᵢ to E. The second is that since the product D₁ × ... × Dₙ of CPOs is a CPO, we can apply the general definition of continuous function from one CPO to another. Show that these two coincide, i.e., a function f: D₁ × ... × Dₙ → E is continuous from the product CPO D₁ × ... × Dₙ to E iff each unary function obtained by holding n − 1 arguments fixed is continuous.

Exercise 5.2.18

Prove the four parts of Lemma 5.2.11 as parts (a), (b), (c) and (d) of this

exercise.

Exercise 5.2.19 Show that the parallel-or function described in Section 2.5.6 is continuous. More specifically, show that there is a continuous function por: B_⊥ → B_⊥ → B_⊥ satisfying

por true x = true
por x true = true
por false false = false

by defining the function completely and proving that it is continuous. You may use the names for functions in B_⊥ → B_⊥ given in Example 5.2.9.
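As a quick sanity check (not the requested proof), the following sketch fixes one candidate definition of por on the lifted booleans, with ⊥ represented by None as our own encoding, and verifies monotonicity by brute force; continuity then follows because the domain is finite:

```python
BOT = None
B = [BOT, True, False]

def por(x, y):
    """A candidate parallel-or: true if either argument is true,
    false only when both are false, bottom otherwise."""
    if x is True or y is True:
        return True
    if x is False and y is False:
        return False
    return BOT

def le(a, b):
    return a == BOT or a == b

# Monotone in each argument separately.
for x1 in B:
    for x2 in B:
        if le(x1, x2):
            for y in B:
                assert le(por(x1, y), por(x2, y))
                assert le(por(y, x1), por(y, x2))
```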


5.2.4 Fixed Points and the Full Continuous Hierarchy

Our current motivation for studying domain-theoretic models is to construct Henkin models of typed lambda calculi with fixed-point operators. The type frame A, called the full continuous hierarchy over CPOs (A^{b₀}, ≤^{b₀}), ..., (A^{b_k}, ≤^{b_k}), is defined by taking A^{b₀}, ..., A^{b_k} as base types, and

A^{σ×τ} = A^σ × A^τ, ordered coordinate-wise by ≤^{σ×τ}, and

A^{σ→τ} = all continuous f: (A^σ, ≤^σ) → (A^τ, ≤^τ), ordered point-wise by ≤^{σ→τ}.

By Lemmas 5.2.2 and 5.2.10, each (A^σ, ≤^σ) is a CPO.

Lemma 5.2.20 Let D be a pointed CPO and let f: D → D be continuous. Then f has a least fixed point, given by fix_D(f) = ⋁{fⁿ(⊥) | n ≥ 0}. In addition, the map fix_D is continuous.

Proof Let f: D → D be continuous. Since f is monotonic, and ⊥ ≤ f(⊥), we can see that fⁿ(⊥) ≤ f^{n+1}(⊥). It follows that {fⁿ(⊥) | n ≥ 0} is linearly ordered and therefore directed. Let a be the least upper bound a = ⋁{fⁿ(⊥) | n ≥ 0}. We must show that a is a fixed point of f. By continuity of f,

f(a) = f(⋁{fⁿ(⊥) | n ≥ 0}) = ⋁{f^{n+1}(⊥) | n ≥ 0}.

But since {fⁿ(⊥)} and {f^{n+1}(⊥)} have the same least upper bound, we have f(a) = a. To see that a is the least fixed point of f, suppose b = f(b) is any fixed point. Since ⊥ ≤ b, we have f(⊥) ≤ f(b) and similarly fⁿ(⊥) ≤ fⁿ(b) for any n ≥ 0. But since b is a fixed point of f, fⁿ(b) = b. Therefore b is an upper bound for the set {fⁿ(⊥) | n ≥ 0}. Since the least upper bound of this set is a, it follows that a ≤ b.
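On domains with no infinite ascending chains, the identity of the proof can be run directly: iterate f from ⊥ and stop when the chain stabilizes. The sketch below (our own Python illustration) does this for a small powerset CPO ordered by inclusion:

```python
def lfp(f, bottom, le):
    """Least fixed point by the identity of Lemma 5.2.20: iterate f from
    bottom until f^n(bottom) stops growing.  This terminates only on
    domains with no infinite ascending chains, as here."""
    x = bottom
    while True:
        fx = f(x)
        assert le(x, fx)     # the approximations form an increasing chain
        if fx == x:
            return x
        x = fx

# Example: subsets of {0,...,9} ordered by inclusion, bottom = empty set.
f = lambda A: A | {0} | {n + 1 for n in A if n < 9}
a = lfp(f, frozenset(), lambda A, B: A <= B)
print(sorted(a))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```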

In the remainder of this section, we show that the full continuous hierarchy is a Henkin model. The following two lemmas are essential to the inductive proof of the environment model condition. The second is a continuous analogue of the s-m-n theorem for recursive functions.

Lemma 5.2.23 If f: C → (D → E) and g: C → D are continuous, then so is the function App f g given by

(App f g) c = (f c)(g c)   for all c ∈ C.

In addition, the map App is continuous.

Lemma 5.2.24 If f: C × D → E is continuous, then there is a unique continuous function (Curry f): C → (D → E) such that for all c ∈ C and d ∈ D, we have

(Curry f) c d = f(c, d).

In addition, the map Curry is continuous.


Proof of Lemma 5.2.23 The monotonicity of App f g and of App itself is left to the reader. We first show that App f g is continuous. If S ⊆ C is directed, then we can make the following calculation:

⋁{(App f g) c | c ∈ S} = ⋁{(f c)(g c) | c ∈ S}
    = (⋁{f c | c ∈ S})(⋁{g c | c ∈ S})    by Lemma 5.2.11c
    = (f(⋁S))(g(⋁S))    by continuity of f, g
    = (App f g)(⋁S)

To show that App itself is continuous, suppose S₁ ⊆ C → (D → E) and S₂ ⊆ C → D are directed. We can check continuity by this calculation:

⋁{App f g | f ∈ S₁, g ∈ S₂} = c ↦ ⋁{(f c)(g c) | f ∈ S₁, g ∈ S₂}    by Lemma 5.2.10
    = c ↦ (⋁{f c | f ∈ S₁})(⋁{g c | g ∈ S₂})    by Lemma 5.2.11c
    = c ↦ ((⋁S₁) c)((⋁S₂) c)    by Lemma 5.2.11c
    = App (⋁S₁) (⋁S₂)

Proof of Lemma 5.2.24 As in the proof of Lemma 5.2.23, we leave monotonicity to the reader. To show that Curry f is continuous, we assume that S ⊆ C is directed. Using similar reasoning as in the proof of Lemma 5.2.23, we have the following continuity calculation.

⋁{(Curry f) c | c ∈ S} = ⋁{d ↦ f(c, d) | c ∈ S}
    = d ↦ ⋁{f(c, d) | c ∈ S}    by Lemma 5.2.10
    = d ↦ f(⋁S, d)    by Lemma 5.2.11
    = (Curry f)(⋁S)

To show that Curry itself is continuous, we assume that S ⊆ C × D → E is directed and make a similar calculation:

⋁{Curry f | f ∈ S} = ⋁{c ↦ d ↦ f(c, d) | f ∈ S}
    = c ↦ ⋁{d ↦ f(c, d) | f ∈ S}    by Lemma 5.2.10
    = c ↦ (d ↦ ⋁{f(c, d) | f ∈ S})    by Lemma 5.2.10
    = c ↦ (d ↦ (⋁S)(c, d))    by Lemma 5.2.11
    = Curry (⋁S)

This proves the lemma. We now show that the full continuous hierarchy is a Henkin model.
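The maps App and Curry have direct functional-programming counterparts. The following sketch (an illustration in Python, ignoring the order-theoretic side) checks the defining equations (App f g) c = (f c)(g c) and (Curry f) c d = f(c, d):

```python
def App(f, g):
    """(App f g) c = (f c)(g c), as in Lemma 5.2.23."""
    return lambda c: f(c)(g(c))

def Curry(f):
    """(Curry f) c d = f(c, d), as in Lemma 5.2.24."""
    return lambda c: lambda d: f((c, d))

add = lambda p: p[0] + p[1]            # a map C x D -> E
curried_add = Curry(add)
assert curried_add(3)(4) == add((3, 4)) == 7

# App composed with Curry yields the diagonal of the original function:
dup = App(Curry(add), lambda c: c)
assert dup(5) == add((5, 5)) == 10
```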

Lemma 5.2.25 The full continuous hierarchy over any collection of CPOs is a Henkin model. In Chapter 8, using logical relations, we will prove that if all base types are infinite, then the proof system of λ^{×,→} is complete for equations that hold in the full continuous model.

Proof It is easy to see that the full continuous hierarchy for λ^{×,→} is an extensional applicative structure. We prove that the environment model condition is satisfied using an induction hypothesis that is slightly stronger than the theorem. Specifically, in addition to showing that the meaning of every term exists, we show that the meaning is a continuous function of the values assigned to free variables by the environment. We use the assumption that meanings are continuous to show that a function defined by lambda abstraction is continuous. We say ⟦Γ ▷ M: τ⟧ is continuous if, for every x: σ ∈ Γ and η ⊨ Γ, the map

d ↦ ⟦Γ ▷ M: τ⟧η[x ↦ d]

is a continuous function from A^σ to A^τ. By Exercise 5.2.17, this is the same as saying ⟦Γ ▷ M: τ⟧ is continuous as an n-ary function A^{σ₁} × ... × A^{σₙ} → A^τ, where Γ = {x₁:σ₁, ..., xₙ:σₙ}. Using Lemmas 5.2.24 and 5.2.23, the inductive proof is straightforward. It is easy to see that

⟦x₁:σ₁, ..., xₙ:σₙ ▷ xᵢ: σᵢ⟧η = η(xᵢ)

is continuous, since identity and constant functions are continuous, and similarly for the meaning of a constant symbol. For application, we use Lemma 5.2.23, and for lambda abstraction, Lemma 5.2.24.

We have the following theorem, as a consequence of Lemmas 5.2.20 and 5.2.25.

Theorem 5.2.26 The full continuous hierarchy A over any collection of CPOs is a Henkin model with the property that whenever A^σ is pointed, there is a least fixed point operator fix_σ ∈ A^{(σ→σ)→σ}.


Recall that if each A^{bᵢ} is pointed, then every A^σ is pointed. Thus if we choose pointed CPOs for the type constants, we will have least-fixed-point operators at all types. We consider a specific case, a CPO model for PCF, in the next section. The extension of Theorem 5.2.26 to sums is described in Exercise 5.2.33.

Exercise 5.2.27 Find a CPO D without least element and a continuous function f: D → D such that f does not have a least fixed point (i.e., the set of fixed points of f does not have a least element).

Exercise 5.2.28 Calculate the least fixed point of the following continuous function f: (B_⊥ → B_⊥) → (B_⊥ → B_⊥), using the identity given in Lemma 5.2.20.

f = λg.  ⊥ ↦ ∧{g(true), g(false)}
         true ↦ g(true)
         false ↦ g(false)

where ∧{g(true), g(false)} is the greatest lower bound of g(true) and g(false). You may want to look at Example 5.2.9 to get a concrete picture of the CPO B_⊥ → B_⊥.

Exercise 5.2.29 Let S be some set with at least two elements and let S^∞ be the set of finite and infinite sequences of elements of S, as in Exercise 5.2.4. This is a CPO with the prefix order. For each of the following functions on S^∞, show that the function is continuous and calculate its least fixed point using the identity given in Lemma 5.2.20. We assume a, b ∈ S and write the concatenation of sequences s and s' as s·s'.

(a) f(s) = ab·s, the concatenation of ab and sequence s.

(b) f(s) = abab·s', where s' is derived from s by replacing every a by b.

(c) f(s) = a·s', where s' is derived from s by replacing every a by ab.
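The identity of Lemma 5.2.20 is easy to watch for part (a). The sketch below (our own Python illustration, with finite sequences as strings) computes the approximations fⁿ(ε), whose least upper bound is the infinite sequence ababab...:

```python
def f(s):
    """Part (a): prepend 'ab' to the (finite) sequence s."""
    return "ab" + s

# Approximations f^n(empty) to the least fixed point.
approx = [""]
for _ in range(5):
    approx.append(f(approx[-1]))

# Each approximation is a prefix of the next, so the chain is directed;
# its least upper bound is the infinite sequence ababab...
for small, big in zip(approx, approx[1:]):
    assert big.startswith(small)

print(approx[3])     # 'ababab'
```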

Exercise 5.2.30 Recall from Example 5.2.21 that the collection P(N) of all sets of natural numbers forms a CPO, ordered by set inclusion. This exercise asks you to show that various functions on P(N) are continuous and to calculate their least fixed points using the identity given in Lemma 5.2.20.

(a) Show that if f: N → N is any function and A₀ ⊆ N is any subset, then the function F: P(N) → P(N) defined by

F(A) = A₀ ∪ {f(n) | n ∈ A}

is continuous.


(b) Find the least fixed points of the functions determined by the following data according to the pattern given in part (a).

(i) A₀ = {2}, f(n) = 2n
(ii) A₀ = {1}, f(n) = the least prime p > n

(c) Show that if f is any function from finite subsets of N to N and A₀ ⊆ N is any subset, then the function F: P(N) → P(N) defined by

F(A) = A₀ ∪ {f(A') | A' ⊆ A finite}

is continuous.

(d) Find the least fixed points of the functions determined by the following data according to the pattern given in part (c).

(i) A₀ = {2, 4, 6}, f(A) = Σ_{n∈A} n
(ii) A₀ = {3, 5, 7}, f(A) = Π_{n∈A} n
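The pattern of part (a) can be explored mechanically. The sketch below (our own illustration, not a solution to the exercise) iterates F from the empty set, with an artificial numeric bound so the iteration stops; within the bound, the data A₀ = {2}, f(n) = 2n closes up to the powers of two:

```python
def F(A, A0, f, bound=1000):
    """The pattern of part (a): F(A) = A0 u { f(n) | n in A }, cut off
    at a finite bound so the iteration below terminates."""
    return A0 | {f(n) for n in A if f(n) <= bound}

def lfp_approx(A0, f, bound=1000):
    """Iterate F from the bottom element (the empty set) to a fixed point."""
    A = frozenset()
    while True:
        B = frozenset(F(A, A0, f, bound))
        if B == A:
            return A
        A = B

# Data (b)(i): A0 = {2}, f(n) = 2n.
powers = lfp_approx(frozenset({2}), lambda n: 2 * n)
print(sorted(powers))      # [2, 4, 8, 16, 32, 64, 128, 256, 512]
```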

Exercise 5.2.31 If D is a partial order and f: D → D, then a pre-fixed-point of f is an element x ∈ D with f(x) ≤ x.

(a) Show that if D is a CPO (not necessarily pointed) and f: D → D is continuous, then the set of pre-fixed-points of f is a CPO, under the same order as D.

(b) Show that if D is a pointed CPO and f: D → D is continuous, then the set of pre-fixed-points of f is a pointed CPO (under the same order as D) whose least element is the least fixed point of f.

Exercise 5.2.32 Lemma 5.2.25, showing that the CPOs form a Henkin model, is proved using the environment model condition. An alternative is to use the combinatory model condition and show that the full continuous hierarchy has combinators. Prove that for all types ρ, σ, τ, the combinators K_{σ,τ} and S_{ρ,σ,τ} are continuous.

Exercise 5.2.33 This exercise is about adding sum types to the full continuous hierarchy.

These soundness results show that the denotational semantics given by A_PCF has the minimal properties required of any denotational semantics for PCF. Specifically, if we can prove M = N, or reduce one term to the other, then these two terms will have the same meaning in A_PCF. It follows that if two terms have different meanings, then we cannot prove them equal or reduce one to the other. Therefore, we can use A_PCF to reason about unprovability or the non-existence of reductions. Some additional connections between A_PCF and the operational semantics of PCF are discussed in Sections 5.4.1 and 5.4.2. A weak form of completeness called computational adequacy, proved in Section 5.4.1, allows us to use A_PCF to reason about provability and the existence of reductions. As remarked at


the beginning of this section, it is impossible to have full equational completeness for A_PCF. In the rest of this section and the exercises, we work out the meaning of some expressions in A_PCF and make some comments about the difference between the least fixed point of a function and other fixed points.

Example 5.2.37 Recall that the factorial function may be written fact ≡ fix F, where F is the expression

F ≡ λf: nat → nat. λy: nat. if Eq? y 0 then 1 else y * f(y − 1).

We will work out the meaning of this closed term assuming that * and − denote multiplication and subtraction functions which are strict extensions of the standard ones. In other words, we assume that subtraction and multiplication have their standard meaning on non-bottom elements of N_⊥, but have value ⊥ if either of their arguments is ⊥. Following the definition of fix in a CPO, we can see that the meaning of factorial is the least upper bound of a directed set. Specifically, A_PCF⟦fact⟧ is the least upper bound of {Fⁿ(⊥) | n ≥ 0}, where F = A_PCF⟦F⟧. We will work out the first few cases of Fⁿ(⊥) to understand what this means. Since all of the lambda terms involved are closed, we will dispense with the formality of writing A_PCF⟦·⟧ and use lambda terms as names for elements of the appropriate domains.

F⁰(⊥) = ⊥_{nat→nat}

F¹(⊥) = λy: nat. if Eq? y 0 then 1 else y * ((F⁰(⊥))(y − 1))
      = λy: nat. if Eq? y 0 then 1 else y * ((⊥_{nat→nat})(y − 1))
      = λy: nat. if Eq? y 0 then 1 else ⊥_nat

F²(⊥) = λy: nat. if Eq? y 0 then 1 else y * ((F¹(⊥))(y − 1))
      = λy: nat. if Eq? y 0 then 1 else y * ((λx: nat. if Eq? x 0 then 1 else ⊥_nat)(y − 1))
      = λy: nat. if Eq? y 0 then 1 else y * (if Eq? (y − 1) 0 then 1 else ⊥_nat)

Continuing in this way, we can see that Fⁿ(⊥) is the function that computes y! whenever 0 ≤ y < n, and is ⊥_nat otherwise.
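The approximations Fⁿ(⊥) can be animated directly, representing ⊥ by None (our own encoding) and making multiplication strict as assumed above:

```python
BOT = None   # bottom of the lifted naturals

def F(f):
    """The functional F: y -> if Eq? y 0 then 1 else y * f(y-1),
    with strict multiplication: y * BOT = BOT."""
    def g(y):
        if y == 0:
            return 1
        r = f(y - 1)
        return BOT if r is BOT else y * r
    return g

# approx(n) is F^n(bottom), where bottom is the everywhere-BOT function.
approx = lambda n: F(approx(n - 1)) if n > 0 else (lambda y: BOT)

# F^n(bottom) computes y! for 0 <= y < n and is bottom elsewhere.
assert approx(1)(0) == 1 and approx(1)(1) is BOT
assert approx(3)(2) == 2 and approx(3)(3) is BOT
print(approx(5)(4))       # 24
```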

Let ⟨A^{b₀}, e_{b₀}⟩, ..., ⟨A^{b_k}, e_{b_k}⟩ be modest sets. The full recursive hierarchy for Σ over these modest sets is the applicative structure A with

A^{σ×τ} = A^σ × A^τ, enumerated by e_{σ×τ}, and

A^{σ→τ} = all recursive f: ⟨A^σ, e_σ⟩ → ⟨A^τ, e_τ⟩, with

App^{σ,τ} f x = f(x),   Proj₁ ⟨x, y⟩ = x,   Proj₂ ⟨x, y⟩ = y.

The function Const, choosing Const(c) ∈ A^σ for each c: σ ∈ C, may be chosen arbitrarily. Note that technically speaking, an applicative structure is given by a collection of sets, rather than a collection of modest sets. In other words, while partial enumerations are used to define function types, they are not part of the resulting structure. This technical point emphasizes that the meaning of a pure lambda term does not depend on the way any modest set is enumerated. However, when we consider a specific recursive hierarchy, we will assume that a partial enumeration function is given for each type.

Theorem 5.5.10 Every full recursive hierarchy is a Henkin model.

Proof It is easy to see that the full recursive hierarchy for λ^{×,→} is an extensional applicative structure. We will prove that the environment model condition is satisfied, using a slightly stronger induction hypothesis than the theorem. Specifically, we will show that the meaning of every term exists and is computable from a finite portion of the environment. We use the assumption that meanings are computable to show that every function defined by lambda abstraction is computable. To state the induction hypothesis precisely, we assume that all variables are numbered in some standard way. To simplify the notation a bit, we will assume that the variables used in any type assignment are numbered consecutively. We say that a sequence ⟨n₁, ..., n_k⟩ of natural numbers codes environment η on Γ = {x₁:σ₁, ..., x_k:σ_k} if nᵢ is a code for η(xᵢ), for 1 ≤ i ≤ k.

Exercise 6.5.4 Let M₁, ..., M_k be val terms with x₁, ..., x_k not occurring in them, and let NE be the conjunction of all inequalities xᵢ ≠ xⱼ for i ≠ j. Intuitively, NE says that all of the location variables are different. Show that for any first-order formula G not containing cont and possibly containing value variables y₁, ..., y_k, the formula

NE ∧ [M₁, ..., M_k / y₁, ..., y_k]G {x₁ := M₁; ... ; x_k := M_k} [cont x₁, ..., cont x_k / y₁, ..., y_k]G

is provable using the assignment axioms, sequencing rule, and rule of consequence.
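The assignment axiom can be checked semantically on finite test stores. In the sketch below (our own Python illustration, with stores as dictionaries and cont x read as store lookup), the instance cont y + 1 > 0 {x := cont y + 1} cont x > 0 is verified by executing the assignment:

```python
def assign(store, x, M):
    """Execute the assignment x := M, where M maps a store to a value."""
    s = dict(store)
    s[x] = M(store)
    return s

# Partial-correctness check of the assignment axiom instance
#     cont y + 1 > 0 { x := cont y + 1 } cont x > 0
# The precondition is the postcondition with 'cont y + 1' substituted
# for 'cont x'.
M = lambda s: s['y'] + 1
pre = lambda s: s['y'] + 1 > 0          # [M / cont x](cont x > 0)
post = lambda s: s['x'] > 0

for y0 in range(-3, 4):                 # a few sample stores
    s = {'x': 0, 'y': y0}
    if pre(s):
        assert post(assign(s, 'x', M))
```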

Exercise 6.5.5 Find a first-order formula F, store s and environment η mapping each location variable to a distinct location with the property that the assertion F {x := y} [(cont x)/y]F is not satisfied by s and η.

Exercise 6.5.6 In most logics, substitution is a sound inference rule. In algebra, we have the explicit substitution rule (subst) given in Chapter 3. In lambda calculus, we can achieve the same effect using lambda abstraction and application, as shown in Lemma 4.4.2. In first-order logic, if we can prove φ, then we can prove ∀x.φ by universal generalization and use this to prove [M/x]φ for any term M. However, Floyd-Hoare logic does not have


a corresponding substitution property. Prove this by finding a valid partial correctness assertion F {P} G and a substitution [M/x] such that [M/x]F {[M/x]P} [M/x]G is not valid. (Hint: You may use a simple assertion, with P a single assignment, that is used as an example in this section.)

Exercise 6.5.7 This exercise uses the same partial correctness assertions as Exercise 6.5.1, which asks you to use the denotational semantics of while programs to explain why each assertion is valid. Now prove each assertion using the proof rules given in this section. When programs refer to operations on natural numbers, you may assume that these operations are interpreted in the standard way. You do not have to prove any firstorder formulas, but be sure the ones you use are true in the intended interpretation.

(a) x ≠ y {x := cont y; x := cont x + 1} cont x > cont y

(b) x ≠ z ∧ y ≠ z {if cont x ≥ cont y then skip else swap} cont x ≥ cont y, where swap is the sequence of assignments z := cont x; x := cont y; y := cont z

(c) true {while cont x > 0 do x := (cont x) − 1 od} cont x = 0

(d) F {m := 0; n := 100; while cont m < cont n do if Eq? f(cont m) 0 then n := cont m else m := (cont m) + 1 od} Eq? f(cont m) 0, where F is the formula (∃x: nat)[0 ≤ x ∧ x ≤ 100 ∧ Eq? f(x) 0]

        F(a) --F(f)--> F(b)
         |              |
     ν(a)|              |ν(b)
         v              v
        G(a) --G(f)--> G(b)


We can read this commutative diagram as saying that ν translates F's image F(f): F(a) → F(b) of the arrow f: a → b from C into G's image G(f): G(a) → G(b) of the same arrow. A vague but helpful elaboration of this intuition involves thinking of a functor as providing a "view" of a category. To simplify the picture, let us assume that F, G: C → D are functors that are both injective (one-to-one) on objects and arrows. We may think of drawing F by drawing its image (a subcategory of D), with each object and arrow labeled by its pre-image in C. We may draw G similarly. This gives us two subcategories of D labeled in two different ways. A natural transformation from F to G is a "translation" from one picture to the other, given by a family of arrows in D. Specifically, for each object a of C, a natural transformation selects an arrow from F(a) to G(a). The conditions of the

definition guarantee that this collection of arrows respects the structure of "F's picture of C." In other words, since the functors F and G must preserve identities and composition, a natural transformation ν: F → G may be regarded as a translation of any diagram in the image of F. For any pair of categories, C and D, we have the functor category D^C, whose objects are functors from C to D, with natural transformations as arrows. The identity natural transformation just maps each object a to the identity arrow id_{F(a)}. The standard composition of natural transformations μ: F → G and ν: G → H, for functors F, G, H: C → D, is to compose μ(a): F(a) → G(a) with ν(a): G(a) → H(a). This composition is sometimes called "vertical" composition of natural transformations, to distinguish it from another "horizontal" form of composition. We will not use horizontal composition in this book; the definition and examples may be found in standard texts [BW90, Mac71]. A worthwhile exercise on functor categories is 7.3.11, the Yoneda embedding.
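To make the naturality condition concrete, here is a small Python sketch (not from the text): F is the list functor, G is a "maybe" functor (a zero- or one-element list), and the head-of-list map is a natural transformation between them. The names F_arrow, G_arrow and nu are illustrative only.

```python
# Sketch: the naturality square, checked pointwise in the category of
# (small) sets.  F is the list functor, G the "maybe" functor, and nu
# the natural transformation sending a list to its head, if any.

def F_arrow(f):          # F(f): F(a) -> F(b), the list functor on arrows
    return lambda xs: [f(x) for x in xs]

def G_arrow(f):          # G(f): G(a) -> G(b), the maybe functor on arrows
    return lambda m: [f(x) for x in m]   # m is [] or [x]

def nu(xs):              # nu(a): F(a) -> G(a), "head if nonempty"
    return xs[:1]

# Naturality: G(f) . nu(a) == nu(b) . F(f), tested on sample elements.
f = lambda n: n + 1
for xs in ([], [3], [1, 2, 3]):
    assert G_arrow(f)(nu(xs)) == nu(F_arrow(f)(xs))
print("naturality square commutes on samples")
```

The check is pointwise because arrows in Set are ordinary functions; for infinite sets one would argue the same equation symbolically.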

Exercise 7.2.8 An object a of category C is initial if, for every object b in C, there is a unique morphism from a to b. An object a of C is terminal if, for every b, there is a unique morphism from b to a. (a) Show that if a and a' are both initial objects of C, then they are isomorphic. (b) Show that a is initial in C iff a is terminal in C^op. (c) Show that if a and a' are both terminal objects of C, then they are isomorphic. (d) Show that if a is initial and b is terminal in C, and there is an arrow from b to a, then a and b are isomorphic. Show that in this case, all objects of the category are isomorphic.

Exercise 7.2.9 Let C and D be CPOs. Show that if we regard both as categories, as described in Example 7.2.6, the product category C × D is the same as the product CPO, defined in Section 5.2. Show that the functor category D^C is also a partial order. How is it similar to the function CPO C → D and how does it differ?

7.2 Cartesian Closed Categories

Exercise 7.2.10 This exercise asks you to verify properties of the categorical definitions of products and coproducts. Suppose that a, b and c are objects of some category C. (a) Show that if [a × b] and [a × b]' are products of a and b, then these are isomorphic. (b) Show that the product of two CPOs is the categorical product (i.e., satisfies the definition given in this section). (c) Show that in any category with products and a terminal object, unit, there is an isomorphism a × unit ≅ a for each object a. (d) A quick way to define sums, or coproducts, is to say that d is a coproduct of a and b if d is a product of a and b in C^op. This is an instance of categorical duality, which involves defining dual notions by applying a definition to the opposite category. Write a direct definition of coproduct along the lines of the definition of product in this section. (e) Show that the sum CPO is the categorical coproduct and describe the categorical coproduct for the category Cpo⊥ of pointed CPOs and strict functions. (Hint: See Exercise 5.2.33.)

Exercise 7.2.11 Show that if [a → b] and [a → b]' are function objects for a and b, then these are isomorphic. Check that function CPOs and modest sets are categorical function objects.

Exercise 7.2.12 Product and function space functors are defined for Cpo in Example 7.2.7. However, these definitions may be generalized to any category with products and function objects. Show that for any such category, these maps are indeed functors by checking that identities and composition are preserved. You may wish to write this out for Cpo first, then look to see what properties of Cpo you have used.

Exercise 7.2.13 Let P = (P, ≤) be a partial order and let F, G: P → Set be two functors into the category of sets.

(a) Show that if a ≤ b and F(b) is empty, then F(a) must also be the empty set.

(b) Show that if ν is a natural transformation from F to G, and G(a) is empty, then F(a) must also be the empty set.

(c) Let O: P → Set be the functor mapping each a ∈ P to a one-element set, {*}. By composition, each natural transformation ν: F → G from F to G gives us a way of mapping each natural transformation O → F to a natural transformation O → G. Show, however, that there exist F and G such that there are many natural transformations from F to G, but no natural transformations from O to F or O to G.

Exercise 7.2.14 An equalizer of two arrows

    f, g: a → b

consists of an object c and an arrow h: c → a with f ∘ h = g ∘ h, such that for any h': c' → a with f ∘ h' = g ∘ h', there is a unique arrow c' → c forming a commuting triangle with h and h'. In Set, we may let the equalizer of f, g: a → b be the subset c = {x ∈ a | f(x) = g(x)} of a, together with the function c → a mapping each element of c to the same element in a. (This function is called the inclusion of c into a.)
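The Set construction can be sketched in Python (an illustration, not part of the text; equalizer and incl are hypothetical helper names):

```python
# Sketch: the equalizer of two functions f, g: a -> b in Set, computed
# as the subset {x in a | f(x) == g(x)} together with its inclusion.

def equalizer(a, f, g):
    """Return (c, incl) where c = {x in a | f(x) == g(x)}."""
    c = {x for x in a if f(x) == g(x)}
    incl = lambda x: x          # inclusion of c into a
    return c, incl

a = {0, 1, 2, 3, 4, 5}
f = lambda x: x % 3
g = lambda x: x % 2
c, incl = equalizer(a, f, g)
print(sorted(c))                # elements where x % 3 == x % 2, i.e. [0, 1]

# The equalizing property f . incl == g . incl holds on all of c:
assert all(f(incl(x)) == g(incl(x)) for x in c)
```

Any other h': c' → a equalizing f and g must land inside c, which gives the unique factoring arrow c' → c.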

(a) Show that Cpo has equalizers but Cppo does not. To do this, show that if f, g: A → B are continuous functions on CPOs, then {x ∈ A | f(x) = g(x)} is a CPO (when ordered as in A), but not necessarily pointed, even when A and B are pointed.

(b) Show that the category Mod of modest sets and total recursive functions has equalizers.

(c) Show that for any category C, the functor category Set^C has equalizers. If F, G: C → Set are functors (objects of Set^C), with natural transformations μ, ν: F → G, then consider the functor H: C → Set with H(A) = {x ∈ F(A) | μ_A(x) = ν_A(x)} and the inclusion natural transformation ι: H → F mapping each object A of C to the inclusion of H(A) into F(A).

7.2.3 Definition of Cartesian Closed Category

In this section, we define cartesian closed categories (CCC's), which are the categorical analog to Henkin models. A category theorist might simply say that a cartesian closed category is a category with specified terminal object, products and exponentials (function objects). These three parts may be defined in several ways, differing primarily in the amount of category theory that they require. The word "specified" in this definition means that a CCC is a structure consisting of a category and a specific choice of terminal object, products and function objects, much the way a partial order is a set together with a specific order relation. As a result, there may be more than one way to view a category as a CCC. However, as shown in Exercises 7.2.8, 7.2.10 and 7.2.11, the choice of each part is determined up to isomorphism. Since the concept of CCC is central to the study of categories and lambda calculus, we will give three definitions. Each definition has three parts, requiring terminal objects, products and function objects (respectively), but the way these three requirements are expressed will differ. The first definition is an approximation to a definition using adjunctions. Although adjunctions are among the most useful concepts in category theory, they involve slightly more category theory than will be needed in the rest of this book. Therefore, we consider adjunctions only in Exercise 7.2.25. The second definition is purely equational, while the third uses diagrams from Section 7.2.2 in a manner more common in the category-theoretic literature. It is a useful exercise, for the serious student, to work out the equivalence of these definitions in detail (see Exercises 7.2.21 and 7.2.25).

First definition of CCC A cartesian closed category (CCC) is a category with a specified object unit and specified binary maps × and → on objects such that for all objects a, b and c,

•	Hom(a, unit) has exactly one element,

•	Hom(a, b × c) ≅ Hom(a, b) × Hom(a, c),

•	Hom(a × b, c) ≅ Hom(a, b → c),

where the first "×" in the second line is ordinary cartesian product of sets, and the isomorphisms must be "natural" in a, b and c. The naturality conditions, which we will not spell out in detail here, may be made precise using the concept of adjunction. (See Exercise 7.2.25, which defines comma category and adjunction in general.) Intuitively, the naturality conditions ensure that these isomorphisms preserve composition in the manner described below. The intuitive idea behind the definition of CCC is that we have a one-element type, unit, and given any two types, we also have their cartesian product and the collection of functions from one to the other. Each of these properties is stated axiomatically, by referring to collections of arrows. Since objects are not really sets, we do not say "unit has only one element" directly. However, the condition that Hom(a, unit) has exactly one element is equivalent in many categories. For example, in the category of sets, if there is only one map from B to A, for all B, then A must have only one element. In a similar way, the axiom that Hom(a, b × c) is isomorphic to the set of pairs of arrows Hom(a, b) × Hom(a, c) says that b × c is the collection of pairs of elements from b and c. The final axiom, Hom(a × b, c) ≅ Hom(a, b → c), says that the object b → c is essentially the collection of all arrows from b to c. A special case of the axiom which illustrates this point is Hom(unit × b, c) ≅ Hom(unit, b → c). Since unit is intended to be a one-element collection, it should not be surprising that b ≅ unit × b in every CCC. Hence Hom(unit × b, c) is isomorphic to Hom(b, c). From this, the reader may see that the hom-set Hom(b, c) is isomorphic to Hom(unit, b → c). Since any set A is in one-to-one correspondence with the collection of functions from a one-element set into A, the isomorphism Hom(b, c) ≅ Hom(unit, b → c) is a way of saying that the object b → c is a representation of Hom(b, c) inside the category.
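For finite sets, the final axiom is witnessed by currying, and the isomorphism can be checked directly. A Python sketch (not from the text; curry and uncurry are illustrative helpers):

```python
# Sketch: the isomorphism Hom(a x b, c) ~= Hom(a, b -> c) for finite
# sets, witnessed by curry and uncurry.  We verify it is a bijection by
# round-tripping and comparing functions pointwise.

from itertools import product

a, b, c = [0, 1], ['x', 'y'], [False, True]

def curry(f):              # f: a x b -> c  gives  curry(f): a -> (b -> c)
    return lambda x: (lambda y: f((x, y)))

def uncurry(g):            # the inverse direction
    return lambda p: g(p[0])(p[1])

f = lambda p: (p[0] == 1) and (p[1] == 'x')   # an arrow in Hom(a x b, c)

# Round trip: uncurry(curry(f)) agrees with f on every point of a x b.
assert all(uncurry(curry(f))(p) == f(p) for p in product(a, b))

# And curry(uncurry(g)) agrees with g for a sample g: a -> (b -> c).
g = lambda x: (lambda y: x == 0)
assert all(curry(uncurry(g))(x)(y) == g(x)(y) for x in a for y in b)
print("Hom(a x b, c) ~= Hom(a, b -> c) verified pointwise")
```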
The first definition is not a complete definition, since it does not state the additional "naturality conditions" precisely. Rather than spell these all out, we consider an example, motivating a slightly different set of operations that make it easier to define CCC's by a set of equations. The isomorphism involving products, in the definition above, requires a "pairing function"

⟨·, ·⟩: Hom(a, b) × Hom(a, c) → Hom(a, b × c),

and a corresponding inverse function from Hom(a, b × c) to pairs of arrows, for all objects a, b and c. If f ∈ Hom(a, b × c), then we will write π1 f and π2 f for the elements of Hom(a, b) and Hom(a, c) given by this isomorphism. One of the "naturality conditions" is that when f ∘ g ∈ Hom(a, b × c), we must have

πi(f ∘ g) = (πi f) ∘ g.

This gives us a way of replacing πi, which is a map from one hom-set to another, by an arrow of the category. Specifically, if we let Proji = πi(id_{b×c}), then we have

Proji ∘ f = πi(id_{b×c} ∘ f) = πi f.

We may also derive an analogous arrow, App: (b → c) × b → c, by applying the isomorphism from Hom(a, b → c) to Hom(a × b, c) to the identity on b → c. Using Proj1, Proj2 and App, we may write an equational definition of CCC directly, without any additional naturality conditions.

Second definition of CCC A cartesian closed category is a category with the following extra structure:

•	An object unit with a unique arrow One_a: a → unit, for each object a,

•	A binary object map × such that, for all objects a, b and c, there is a specified function ⟨·, ·⟩: Hom(a, b) × Hom(a, c) → Hom(a, b × c) and specified arrows Proj1: b × c → b and Proj2: b × c → c satisfying

Proj1 ∘ ⟨f1, f2⟩ = f1,   Proj2 ∘ ⟨f1, f2⟩ = f2,   and   ⟨Proj1 ∘ g, Proj2 ∘ g⟩ = g

for all objects a, b, c and arrows f1: a → b, f2: a → c, and g: a → b × c,

•	A binary object map → such that, for all objects a, b and c, there is a specified function Curry: Hom(a × b, c) → Hom(a, b → c) and arrow App: (b → c) × b → c satisfying

App ∘ ⟨Curry(h) ∘ Proj1, Proj2⟩ = h   and   Curry(App ∘ ⟨k ∘ Proj1, Proj2⟩) = k

for all objects a, b, c and arrows h: a × b → c and k: a → (b → c).

If we write f × g for the arrow ⟨f ∘ Proj1, g ∘ Proj2⟩, then we can write the last two equations of the definition above as

App ∘ (Curry(h) × id) = h   and   Curry(App ∘ (k × id)) = k

for all arrows h and k as described above. If we view categories as two-sorted algebras, then we may write the second definition (above) using equational axioms over the expanded signature with function symbols One, ×, ⟨·, ·⟩, Proj1, Proj2, →, Curry, App, and so on. In the third definition (below), we define CCC's using diagrams instead of equations. In doing so, we will also state that certain arrows are unique; this eliminates the need for some of the equations that appear in the second definition, as the reader may verify in Exercise 7.2.21.
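In the category of sets, the equations of the second definition can be checked pointwise on small finite sets. A Python sketch (not from the text; all helper names are illustrative):

```python
# Sketch: checking the equations of the second definition of CCC
# pointwise in Set, on small finite sets.

from itertools import product

A, B, C = [0, 1], ['p', 'q'], [10, 20]

pair  = lambda f1, f2: (lambda x: (f1(x), f2(x)))    # <f1, f2>
proj1 = lambda p: p[0]
proj2 = lambda p: p[1]
curry = lambda h: (lambda x: (lambda y: h((x, y))))  # Curry(h)
app   = lambda p: p[0](p[1])                         # App((g, y)) = g(y)

f1 = lambda x: B[x]                                  # f1: A -> B
f2 = lambda x: C[x]                                  # f2: A -> C
h  = lambda p: 10 if p[1] == 'p' else 20             # h: A x B -> C

# Proj1 . <f1, f2> = f1  and  Proj2 . <f1, f2> = f2, pointwise over A:
assert all(proj1(pair(f1, f2)(x)) == f1(x) for x in A)
assert all(proj2(pair(f1, f2)(x)) == f2(x) for x in A)

# App . <Curry(h) . Proj1, Proj2> = h, checked on A x B:
lhs = lambda p: app((curry(h)(proj1(p)), proj2(p)))
assert all(lhs(p) == h(p) for p in product(A, B))
print("CCC equations hold pointwise in Set")
```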

Third definition of CCC A cartesian closed category is a category with the following extra structure:

•	An object unit with a unique arrow One_a: a → unit for each object a,

•	A binary object map × such that, for all objects a, b and c, there is a specified function ⟨·, ·⟩_{abc}: Hom(a, b) × Hom(a, c) → Hom(a, b × c) and specified arrows Proj1: b × c → b and Proj2: b × c → c, such that for every f1: a → b and f2: a → c, the arrow ⟨f1, f2⟩: a → b × c is the unique g satisfying Proj1 ∘ g = f1 and Proj2 ∘ g = f2,

•	A binary object map → such that, for all objects a, b and c, there is a specified function Curry: Hom(a × b, c) → Hom(a, b → c) and arrow App: (b → c) × b → c, such that for every h: a × b → c, the arrow Curry(h): a → (b → c) is the unique k satisfying App ∘ (k × id) = h.

Example 7.2.15 The categories Set of sets, Mod of modest sets, and Cpo of CPOs are all cartesian closed, with unit, a × b and a → b all interpreted as in the Henkin models based on sets (the full classical hierarchy), modest sets and CPOs.

Example 7.2.16 For any typed lambda calculus signature Σ and Σ-Henkin model A, the category C_A whose objects are the type expressions over Σ, and whose arrows from σ to τ are the elements of A^{σ→τ}, is cartesian closed. The terminal object of C_A is the type unit, the product of objects σ and τ is σ × τ, and the function object for σ and τ is σ → τ. Writing elements of the model as constants (i.e., using the signature in which there is a constant for each element of the model), we let

One_σ = ⟦λx: σ. *⟧,   ⟨f, g⟩ = ⟦λx: σ. ⟨f x, g x⟩⟧,   App = ⟦λp: (σ → τ) × σ. (Proj1 p)(Proj2 p)⟧.

Exercise 7.2.23 asks you to verify that the equations given in the second definition of CCC are satisfied.

.

Example 7.2.17 If C is any small category (i.e., the collections of objects and arrows are sets), then the functor category Set^C is cartesian closed. Since we discuss the special case where C is a poset, in connection with Kripke lambda models in Section 7.3, we will not explain the cartesian closed structure of Set^C here. However, it is worth mentioning that there exist cartesian closed categories of functors that are not equivalent to any category obtained by regarding a Henkin model as a CCC.

Example 7.2.18 The category Cat of small categories and functors is cartesian closed (but not a small category). The terminal object of Cat is the category with one object and one arrow (which is the identity on this object). The product in Cat is the product operation on categories, and the function object C → D for categories C and D is the functor category D^C.

.

Example 7.2.19 For any signature Σ and set E of equations between Σ-terms, we may construct a CCC C_Σ(E) called the term category generated by E. The objects of C_Σ(E) are the types over Σ and the arrows are equivalence classes of terms,


modulo provable equality from E. To eliminate problems having to do with the names of free variables, it is simplest to choose one variable of each type, and define the arrows from σ to τ using terms over the chosen free variable of type σ. We will write [z: σ ▷ M: τ]_E for the arrow of C_Σ(E) given by the term z: σ ▷ M: τ, modulo E. In more detail, [z: σ ▷ M: τ]_E is the set of typed terms provably equal to z: σ ▷ M: τ from E.

The definitions of the unique arrow One_σ: σ → unit and the operations ⟨·, ·⟩, App and Curry are straightforward. For example, One, ⟨·, ·⟩ and Curry are defined by

One_σ = [x: σ ▷ *: unit]

⟨[x: σ ▷ M: ρ], [x: σ ▷ N: τ]⟩ = [x: σ ▷ ⟨M, N⟩: ρ × τ]

Curry([z: σ × τ ▷ M: ρ]) = [x: σ ▷ λy: τ. [⟨x, y⟩/z]M: τ → ρ]

Composition in the term category is defined by substitution, as follows:

[x: σ ▷ M: τ]_E ; [y: τ ▷ N: ρ]_E = [x: σ ▷ [M/y]N: ρ]_E

We sometimes omit the signature, writing C(E) instead of C_Σ(E), when Σ is either irrelevant or clear from context. Exercise 7.2.24 asks you to verify that this category is cartesian closed. The term category is used to prove completeness in Exercise 7.2.39.
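Composition-as-substitution can be sketched directly (Python, not from the text; the tuple term representation and subst are illustrative, and variable capture is ignored for simplicity):

```python
# Sketch: composition in the term category as substitution.  Terms are
# nested tuples; subst(m, y, n) substitutes term m for variable y in n,
# which is all we need for [x > M] ; [y > N] = [x > [M/y]N].

def subst(m, y, n):
    """Substitute term m for variable ('var', y) inside term n."""
    tag = n[0]
    if tag == 'var':
        return m if n[1] == y else n
    if tag == 'app':
        return ('app', subst(m, y, n[1]), subst(m, y, n[2]))
    if tag == 'lam':                 # assume the bound var differs from y
        return ('lam', n[1], subst(m, y, n[2]))
    return n                         # constants

# [x > f x] ; [y > g y]  =  [x > g (f x)]
M = ('app', ('var', 'f'), ('var', 'x'))      # arrow  x: s > f x : t
N = ('app', ('var', 'g'), ('var', 'y'))      # arrow  y: t > g y : r
composed = subst(M, 'y', N)
assert composed == ('app', ('var', 'g'), ('app', ('var', 'f'), ('var', 'x')))
print("composition by substitution:", composed)
```

Associativity of this composition follows from the substitution lemma for terms, which is one reason the term category is well defined.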

.

An interesting fact about cartesian closed categories, relevant to lambda calculus with fixed-point operators, is that any CCC with coproducts (sums; see Exercise 7.2.10) and a fixed-point operator fix_a: (a → a) → a for every object a is a preorder. This is the categorical version of the statement that if we add sum types to PCF, as described in Section 2.6.2, then we can prove any two terms of the same type equal. The proof for CCC's is essentially the same as for PCF with sums, given in Exercise 2.6.4. We can see how each of the usual categories of CPOs and continuous functions skirts this problem:

Cpo The category Cpo of CPOs and continuous functions is cartesian closed and has coproducts (Exercise 7.2.10). But if the CPO D is not pointed, then a continuous function f: D → D may not have a fixed point.

Cpo⊥ The category Cpo⊥ of pointed CPOs and strict continuous functions has coproducts (Exercise 7.2.10) and fixed-point operators on every object. However, this category is not cartesian closed.

Categories and Recursive Types

464

Cppo The category Cppo of complete, pointed partial orders and (arbitrary) continuous functions is cartesian closed and has fixed point operators on every object but does not have coproducts.
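The role of pointedness can be illustrated by computing least fixed points by iteration from the least element. A Python sketch (not from the text; fix and the flat-domain encoding are illustrative):

```python
# Sketch: least fixed points by iteration from bottom on a (finite)
# pointed CPO.  The CPO here is the flat domain {bottom, 0, 1, ..., 9};
# for a monotone f, iterating from bottom reaches the least fixed point.

BOT = None                      # bottom element of the flat domain

def fix(f, bottom=BOT, max_iter=100):
    """Iterate f from bottom until the chain stabilizes."""
    x = bottom
    for _ in range(max_iter):
        fx = f(x)
        if fx == x:
            return x
        x = fx
    raise ValueError("no fixed point reached")

# A constant function is continuous; its least fixed point is its value.
assert fix(lambda x: 3) == 3

# The identity fixes every element; iteration from bottom returns the
# least one, namely bottom itself.  Without a bottom element to start
# from, this construction has nowhere to begin.
assert fix(lambda x: x) is BOT
print("least fixed points:", fix(lambda x: 3), fix(lambda x: x))
```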

Each of these claims may be verified by process of elimination. For example, we know by Lemma 5.2.20 that Cpo⊥ has fixed points, and Exercise 7.2.10 shows that Cpo⊥ has coproducts. Therefore, Cpo⊥ cannot be cartesian closed. For Cppo, we again have fixed points by Lemma 5.2.20, and can see that Cppo is cartesian closed since the terminal object (one-element CPO) of Cpo is pointed, and the product and function space constructions preserve pointedness. The collection of cartesian closed categories forms a category. More precisely, the category CCC is the category whose objects are cartesian closed categories and whose morphisms are functors preserving cartesian closed structure. The morphisms of CCC are called cartesian closed functors, sometimes abbreviated to cc-functors. Another name for these functors that appears in the literature is representation of cartesian closed categories. Since unit, ×, → and associated maps are specified as part of each CCC, we require that cc-functors preserve these. In more detail, if C and D are CCC's, then a functor F: C → D is a cc-functor if F(unit) = unit, F(a × b) = F(a) × F(b), and similarly for function spaces. In addition, we require F(One_a) = One_{F(a)}, F(Proj1) = Proj1, F(Proj2) = Proj2, and the operations ⟨·, ·⟩ and Curry must be preserved. For Curry, for example, this means that F(Curry(g)) = Curry(F(g)).

In the remainder of this section, we prove two useful identities that hold in all cartesian closed categories. They are

⟨f, g⟩ ∘ h = ⟨f ∘ h, g ∘ h⟩

Curry(f) ∘ h = Curry(f ∘ (h × id))

where f: c → a, g: c → b and h: d → c in the first equation, and f: c × a → b and h: d → c in the second. These equations, which are part of the "naturality conditions" associated with the first definition, may be proved using either the equations given in the second definition or the diagrams given in the third definition. To give an example of reasoning with pictures, we will prove these using diagrams. The first equation may be proved using the diagram


d ──h──▶ c ──⟨f, g⟩──▶ a × b   (with f ∘ h: d → a, g ∘ h: d → b, and Proj1, Proj2 out of a × b)

which is obtained from the "defining diagram" for ⟨f, g⟩ by adding an arrow h into c at the top of the picture and drawing both of the compositions with h that appear in the equation. Ignoring the two arrows f ∘ h and g ∘ h, we have a triangle of the form that appears in the third definition of CCC, and in the definition of product object from the previous section. The conditions on this diagram are that for every pair of outside arrows, in this case f ∘ h and g ∘ h, there is a unique arrow, ⟨f ∘ h, g ∘ h⟩, from d into a × b that commutes with the projection functions. But since the arrow ⟨f, g⟩ ∘ h, appearing as a path from d to a × b in this picture, also commutes with f ∘ h, g ∘ h and the projection functions, it must be that

⟨f, g⟩ ∘ h = ⟨f ∘ h, g ∘ h⟩.

The second equation is proved by similar reasoning, using the diagram

d × a ──h × id_a──▶ c × a ──Curry(f) × id_a──▶ (a → b) × a ──App──▶ b

(with f: c × a → b as the hypotenuse, satisfying App ∘ (Curry(f) × id_a) = f),


obtained by assuming an arrow h: d → c and putting h × id_a at the top of the "defining diagram" for function objects. Here the composition of h × id with Curry(f) × id forms the hypotenuse of the new (larger) right triangle. The defining condition for function objects says that given f ∘ (h × id) and App, the unique arrow k: d → (a → b) such that k × id forms a commuting (larger) triangle is k = Curry(f ∘ (h × id)). But since

(Curry(f) × id) ∘ (h × id) = (Curry(f) ∘ h) × id

also forms a commuting triangle, we must have Curry(f) ∘ h = Curry(f ∘ (h × id)) (see Exercise 7.2.20).
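Both identities can also be checked pointwise in Set. A Python sketch (not from the text; helper names are illustrative):

```python
# Sketch: checking  <f,g> . h = <f.h, g.h>  and
# Curry(f) . h = Curry(f . (h x id))  pointwise in Set.

pair    = lambda f, g: (lambda x: (f(x), g(x)))
curry   = lambda f: (lambda c: (lambda a: f((c, a))))
compose = lambda f, g: (lambda x: f(g(x)))

D, C, A = [0, 1], [10, 20], ['u', 'v']

h  = lambda d: C[d]                   # h: D -> C
f1 = lambda c: c + 1                  # f1: C -> ...
g1 = lambda c: c * 2                  # g1: C -> ...

# First identity, pointwise over D:
lhs = compose(pair(f1, g1), h)
rhs = pair(compose(f1, h), compose(g1, h))
assert all(lhs(d) == rhs(d) for d in D)

f      = lambda p: (p[0], p[1])       # f: C x A -> B  (here B = C x A)
h_x_id = lambda p: (h(p[0]), p[1])    # h x id: D x A -> C x A

# Second identity, pointwise over D x A:
lhs2 = compose(curry(f), h)
rhs2 = curry(compose(f, h_x_id))
assert all(lhs2(d)(a) == rhs2(d)(a) for d in D for a in A)
print("both CCC identities verified pointwise")
```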

Exercise 7.2.20 Show that the following implications hold in every CCC, assuming each arrow has an appropriate domain and codomain.

(a) If f ∘ Proj1 = g ∘ Proj1, then f = g. (And similarly if f ∘ Proj2 = g ∘ Proj2.)

(b) If (f × id) ∘ (g × id) = (h × id), then f ∘ g = h. (Hint: Begin by writing k × id as ⟨k ∘ Proj1, Proj2⟩.)

Exercise 7.2.21 Show that the second and third definitions of CCC are equivalent, and that either of these implies the isomorphisms given in the first definition of CCC.

Exercise 7.2.22 The product of two CCC's, in CCC, is the ordinary product of two categories. Show that CCC has function objects.

Exercise 7.2.23 The category C_A derived from a Henkin model A is described in Examples 7.2.5 and 7.2.16. Show that for any Henkin model A, the category C_A is cartesian closed, using the second definition of CCC.

Exercise 7.2.24 Show that for any signature Σ and theory E between Σ-terms, the category C_Σ(E) generated by E is cartesian closed. This category is described in Example 7.2.19. First write out the conditions you must check, such as: if f: σ → unit in C_Σ(E), then f = One_σ. Then verify each condition.

Exercise 7.2.25 Adjoint situations occur commonly in mathematics and may be used to define terminal object, cartesian product, and function object succinctly. While there are several ways to define adjoint situations, the simplest and easiest to remember uses the comma category construction. If A, B and C are categories, with functors F: A → B and G: C → B, then the comma category (F ↓ G) is defined as follows. (The original notation was (F, G) instead of (F ↓ G).) The objects of (F ↓ G) are triples (a, h, c) where a is an object of A, c is an object of C, and h: F(a) → G(c) is a morphism of B. A morphism (a, h, c) → (a', h', c') in (F ↓ G) is a pair of morphisms (f, g) with f: a → a' in A and


g: c → c' in C such that the following square commutes:

F(a) ──F(f)──▶ F(a')
  │                 │
  h                 h'
  ▼                 ▼
G(c) ──G(g)──▶ G(c')

A useful convention is to use the name of a category as the name of the identity functor on it. For example, we will use A as the name of the identity functor A: A → A. Using comma categories and the notation above, an adjoint situation consists of two categories, A and B, two functors F: A → B and U: B → A, and a functor Q: (F ↓ B) → (A ↓ U) with inverse Q⁻¹ such that

(F ↓ B) ──Q──▶ (A ↓ U)

commutes with the projections to A × B, where the arrows into the product category A × B are the projections mapping an object (a, h, b) of either comma category to (a, b) and a morphism (f, g) to (f, g). The functors into A × B force Q to map an object (a, h, b) of (F ↓ B) to an object (a, j, b) with the same first and last components, and similarly for Q⁻¹. When such an adjoint situation exists, F is called the left adjoint of U and U the right adjoint of F. The names "F" and "U" come from the traditional examples in algebra, where F(a) produces the free algebra over a, and U(b) returns the underlying set of the algebra b.

(a) The terminal category T has one object, *, and one morphism, id_*. A category C has a terminal object iff the functor C → T (there is only one possible functor) has a right adjoint U. Show that Q with its inverse gives an isomorphism of hom-sets T(*, *) ≅ C(c, U(*)) for every object c of C. Define a terminal object of C and the morphisms One from the functors involved in this adjoint situation, and show that they have the properties required in either the second or third definition of CCC.


(b) A category C has products if there is a right adjoint U to the diagonal functor Δ: C → C × C mapping object c to (c, c) and, similarly, morphism f to (f, f). Show how such an adjoint situation gives an isomorphism of hom-sets C(a, b) × C(a, c) ≅ C(a, b × c) for any objects a, b and c of C. Define pairing, projection morphisms (as in the second and third definitions of CCC) and the product functor C × C → C from the functors involved in this adjoint situation, and show that they have the required properties.

(c) Assuming a category C has products, which we will write in the usual way, C has function spaces if, for every object b of C, there is a right adjoint to the functor F_b: C → C with F_b(a) = a × b. Show how such an adjoint situation gives an isomorphism of hom-sets C(a × b, c) ≅ C(a, b → c) for any objects a and c of C. Define currying and application morphisms (as in the second and third definitions of CCC) and the function space functor C → C mapping c to b → c from the functors involved in this adjoint situation, and show that they have the required properties. Note that the problem only asks for a covariant functor mapping c to b → c, not a full contra/covariant functor mapping b and c to b → c.

7.2.4 Soundness and the Interpretation of Terms

We may interpret λ^{unit,×,→} terms over any signature in any CCC, once we have chosen an object of the category for each type constant and a morphism for each term constant. More specifically, given some λ^{unit,×,→} signature Σ, and an object for each type constant, we use unit and the object maps × and → to interpret each type expression as some object of the category. Writing ⟦σ⟧ for the object named by type expression σ, we must also choose, for each term constant c: σ of Σ, an arrow unit → ⟦σ⟧. We will call a CCC, together with interpretations for type constants and term constants from Σ, a Σ-CCC. This is similar to the terminology we use for algebra in Chapter 3, where we refer to an algebra with an interpretation for symbols from algebraic signature Σ as a Σ-algebra. Although the interpretation of type expressions as objects of a CCC may be clear from the brief description in the preceding paragraph, we give a full inductive definition to eliminate any confusion. Given a λ^{unit,×,→} signature Σ and Σ-CCC C, we define the interpretation of type expression σ over Σ in CCC C as follows:

C⟦unit⟧ = unit,
C⟦b⟧ = the object given as part of the Σ-CCC, for each type constant b,
C⟦σ × τ⟧ = C⟦σ⟧ × C⟦τ⟧,
C⟦σ → τ⟧ = C⟦σ⟧ → C⟦τ⟧.

As with other interpretation functions, we often omit the name of the category, writing ⟦σ⟧


instead of C⟦σ⟧ when the category C is clear from context. It is also sometimes convenient to omit the double brackets and use type expressions as names for objects of the category. If Γ ▷ M: σ is a typed λ^{unit,×,→} term, then the interpretation of this term in C will be an arrow from the interpretation of the types of its free variables to the semantic type of the term, ⟦σ⟧. This is analogous to the Henkin-model meaning of a term as a map from environments, which give the values of the free variables, to the semantic type of the term. One way of making "the type of the free variables" precise is to write the term with only one free variable, using cartesian products. More precisely, if Γ = {x1: σ1, ..., xk: σk}, we may transform a term Γ ▷ M: σ into an essentially equivalent term y: (σ1 × ... × σk) ▷ M': σ, where M' is obtained from M by replacing each free variable by the appropriate combination of projection functions applied to y. This approach has been used to some extent in the literature. Instead of following it here, we will give a direct interpretation of arbitrary terms that has exactly the same result. We formalize "the type of the free variables" of a term by interpreting each typing context as an object. The interpretation C⟦Γ⟧ of typing context Γ is defined by induction on the length of the context:

C⟦∅⟧ = unit,
C⟦Γ, x: σ⟧ = C⟦Γ⟧ × C⟦σ⟧.

C⟦Γ ▷ c: σ⟧ = f ∘ One_{C⟦Γ⟧}, where f: unit → C⟦σ⟧ is given as part of the Σ-CCC,

C⟦Γ, x: σ ▷ x: σ⟧ = Proj2: C⟦Γ⟧ × C⟦σ⟧ → C⟦σ⟧,

C⟦Γ ▷ *: unit⟧ = One_{C⟦Γ⟧},

C⟦Γ ▷ ⟨M, N⟩: σ × τ⟧ = ⟨C⟦Γ ▷ M: σ⟧, C⟦Γ ▷ N: τ⟧⟩,

C⟦Γ ▷ Proj1 M: σ⟧ = Proj1 ∘ C⟦Γ ▷ M: σ × τ⟧,

C⟦Γ ▷ Proj2 M: τ⟧ = Proj2 ∘ C⟦Γ ▷ M: σ × τ⟧,

C⟦Γ ▷ MN: τ⟧ = App ∘ ⟨C⟦Γ ▷ M: σ → τ⟧, C⟦Γ ▷ N: σ⟧⟩,

C⟦Γ ▷ λx: σ. M: σ → τ⟧ = Curry(C⟦Γ, x: σ ▷ M: τ⟧),

C⟦Γ1, x: σ, Γ2 ▷ M: τ⟧ = C⟦Γ1, Γ2 ▷ M: τ⟧ ∘ Π^f, where f is the ⟨n, n + 1⟩-function such that (Γ1, x: σ, Γ2) ∘ f = Γ1, Γ2, and Π^f is the corresponding combination of projection functions (add var).

This interpretation of terms should seem at least superficially natural, since each term is interpreted using the part of the CCC that is intended for this purpose. For example, we have a terminal object so that we can interpret *, and the morphisms associated with a terminal object are indeed used to interpret *. Similarly, Proj1, Proj2 and ⟨·, ·⟩ are associated with product objects, and these are used directly to interpret term forms associated with product types. While one might expect the interpretation of a variable x: σ ▷ x: σ to be the identity morphism, a morphism unit × σ → σ is used instead, since the interpretation of context x: σ is unit × σ. Perhaps the most interesting aspect of this interpretation of terms is that lambda abstraction is interpreted as Currying. This makes intuitive sense if


a. M from F, x: a M. If we view M as a function we remember that we form F x hail. When we change x from a free to of F, x: a, the domain of this function is and, for obtain a term whose meaning depends only on a lambda-bound variable, we any values for the free variables, determines a function from f[a]I to the type of M. Thus a. M is exactly Currying. The clause for (add var) the passage from F, x: a M to F is discussed below. It is a worthwhile exercise to check that for every typable term, we have :

In words, the meaning of a term Γ ▷ M: σ in a cartesian closed category is a morphism from ⟦Γ⟧ to ⟦σ⟧. We have already seen, in Examples 7.2.5 and 7.2.16, that every Henkin model A determines a cartesian closed category C_A. It is shown in Exercise 7.2.38 that the meaning of a term x1: σ1, ..., xk: σk ▷ M: σ in the cartesian closed category C_A, determined according to the definition above, corresponds to the meaning of the closed term ▷ λx1: σ1 ... λxk: σk. M in the Henkin model A. Unlike the Henkin model interpretation of (add var), the CCC interpretation of this typing rule is nontrivial. The reason is that this rule changes the context, and therefore changes the "type" of the meaning of the term. Intuitively, since a new variable added to a context must not appear free in the term, the meaning of Γ1, x: σ, Γ2 ▷ M: τ is obtained by composing the meaning of Γ1, Γ2 ▷ M: τ with a combination of projection functions that "throw away" the value of the free variable x and keep the values of the variables listed in Γ1, Γ2. A special case of the clause above (see Exercise 7.2.34) is
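The clauses can be exercised concretely in Set, where the context object for a single variable x: σ is unit × σ. A Python sketch (not from the text; the encodings are illustrative):

```python
# Sketch: CCC interpretation of a few term forms in Set.  A one-variable
# context x: sigma denotes unit x sigma; the variable is Proj2 and
# lambda abstraction is Curry, exactly as in the clauses above.

UNIT = ('*',)                    # the element of the one-element set

proj2 = lambda p: p[1]
curry = lambda f: (lambda g: (lambda a: f((g, a))))
app   = lambda p: p[0](p[1])
pair  = lambda f, g: (lambda e: (f(e), g(e)))

# Interpret  x: int |- x : int  as  Proj2 : unit x int -> int
var_x = proj2
assert var_x((UNIT, 5)) == 5

# Interpret  |- (lambda x: int. x)  as  Curry(Proj2), applied to the
# empty environment (the element of unit):
identity = curry(proj2)(UNIT)
assert identity(7) == 7

# Interpret an application (lambda x. x) 3 via App . <fn, arg>:
three   = lambda env: 3
fn      = lambda env: curry(proj2)(env)
meaning = lambda env: app(pair(fn, three)(env))
assert meaning(UNIT) == 3
print("interpretation of (lambda x. x) 3 =", meaning(UNIT))
```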

Cur, x: a

M:

rJJ

= CIIIF M:

rJJ

We give some intuition for the interpretation

a

of (add var), using this special case, in the

following example.

7.2 Cartesian Closed Categories

Example 7.2.26 As stated in Lemma 7.2.29 below, the meaning of a constant satisfies the condition

C⟦Γ ▷ c : τ⟧ = C⟦∅ ▷ c : τ⟧ ∘ One_⟦Γ⟧

where C⟦∅ ▷ c : τ⟧ : unit → ⟦τ⟧ is given as part of the Σ-CCC. Intuitively, since the meaning of a constant does not depend on the values of any variables, we compose the map unit → ⟦τ⟧ with the unique map One_⟦Γ⟧ : ⟦Γ⟧ → unit, which does not depend on the value of type ⟦Γ⟧. We will work out a typical example with Γ = {x: σ, y: ρ}. The only way to prove the typing assertion Γ ▷ c: τ, using our typing rules, is to begin with ∅ ▷ c: τ and use (add var) twice. Let us assume we derive x: σ ▷ c: τ first, and then x: σ, y: ρ ▷ c: τ. The meaning of the term in a CCC is

C⟦x: σ, y: ρ ▷ c : τ⟧ = C⟦x: σ ▷ c : τ⟧ ∘ Proj₁
  = (C⟦∅ ▷ c : τ⟧ ∘ Proj₁) ∘ Proj₁

By the associativity of composition, we can write this as C⟦∅ ▷ c : τ⟧ ∘ f, where

f = Proj₁ ∘ Proj₁ = One : (unit × ⟦σ⟧) × ⟦ρ⟧ → unit

since the definition of CCC guarantees that there is only one arrow from (unit × ⟦σ⟧) × ⟦ρ⟧ to unit.

Example 7.2.27 The term λx: σ → τ. λy: σ. xy is one of the simplest that requires all but the typing rule for constants. We will work out its meaning and see that it is equal to the lambda term for the identity. Working out the meaning of each subterm, we have the following calculation:

C⟦∅ ▷ λx: σ → τ. λy: σ. xy : (σ → τ) → σ → τ⟧
  = Curry(C⟦x: σ → τ ▷ λy: σ. xy : σ → τ⟧)
  = Curry(Curry(C⟦x: σ → τ, y: σ ▷ xy : τ⟧))
  = Curry(Curry(App ∘ ⟨C⟦x: σ → τ, y: σ ▷ x⟧, C⟦x: σ → τ, y: σ ▷ y⟧⟩))

where

C⟦x: σ → τ, y: σ ▷ x : σ → τ⟧ = Proj₂ ∘ Proj₁ : (unit × ⟦σ → τ⟧) × ⟦σ⟧ → ⟦σ → τ⟧
C⟦x: σ → τ, y: σ ▷ y : σ⟧ = Proj₂ : (unit × ⟦σ → τ⟧) × ⟦σ⟧ → ⟦σ⟧

This gives us the final result

Curry(Curry(App ∘ ⟨Proj₂ ∘ Proj₁, Proj₂⟩))

If we recognize that sequences of projection functions are used to give the meaning of a variable, as a function of the tuple of variables in the typing context, then we can begin to recognize the form of the term from its interpretation. Specifically, from the way the meaning of the term is written, we can see that the term must have the form λ-abstraction (λ-abstraction (application (variable, variable))).


Intuitively, the direct correspondence between the interpretation of a term, expressed using the basic operations given as part of a CCC, and the lambda calculus syntax demonstrates the close connection between the two systems. It is interesting to note that, unlike the Henkin model meaning of a term, the variable names never appear in the calculation. This makes it easy to see that α-conversion does not change the meaning of a term. We can compare the term above with ∅ ▷ λx: σ → τ. x, whose meaning in a CCC is directly calculated as

C⟦∅ ▷ λx: σ → τ. x : (σ → τ) → (σ → τ)⟧ = Curry(C⟦x: σ → τ ▷ x : σ → τ⟧)
  = Curry(Proj₂)

where Proj₂ : unit × ⟦σ → τ⟧ → ⟦σ → τ⟧. These two meanings are in fact the same morphism, which may be easily verified using the equations given in the second definition of CCC. Specifically, this is an immediate consequence of the equation

Curry(App ∘ ⟨k ∘ Proj₁, Proj₂⟩) = k

which may be regarded as stating the soundness of η-conversion in a CCC. (The other equation associated with function objects in the second definition of CCC similarly ensures soundness of β-conversion.) It is worth noting that in defining the meaning of a term in a CCC, we do not need an extra condition, such as the environment model condition for Henkin models, stating that every typed term must have a meaning. In essence, the definition of CCC is similar to the combinatory form of the Henkin model definition. A CCC is required to have a set of operations which, together, allow us to interpret each term. In fact, we may regard the basic operations One, ⟨·, ·⟩, Curry, and App as an alternate set of combinators for typed lambda calculus, with the translation of lambda terms into combinators given by the meaning function above. The use of these categorical combinators is explored in [Cur86] and the references given there. As shown in Exercise 7.2.38, every Henkin model can be regarded as a CCC, with the two meaning functions giving the same interpretation of terms. Therefore, CCCs may be regarded as a generalization of Henkin models. (Further discussion appears in Section 7.2.5.) Since we have generalized the semantics of lambda terms, we must reprove the basic facts about the meanings of terms, such as the substitution lemma and the fact that the meaning of a term depends only on its free variables. Many of these are routine, often using essentially the same arguments as we used for Henkin models in Chapter 4. An important


property that is not routinely proved by essentially the same argument is the coherence of the meaning function for CCCs. Specifically, we must show that for any term Γ ▷ M: σ, the meanings we obtain using different proofs of the typing assertion Γ ▷ M: σ are all identical. The reason why this is more complicated for CCCs than for Henkin models is that we have used relatively complicated sequences of projection functions to correctly account for the dependence of a variable on the typing context.
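The categorical combinators just introduced can be prototyped directly, reading morphisms as ordinary functions. The sketch below (my own illustration; the names mirror the text but the encoding is an assumption) spot-checks the η-equation Curry(App ∘ ⟨k ∘ Proj₁, Proj₂⟩) = k, and checks that the two interpretations of the identity computed above agree on sample arguments.

```python
# Sketch of the categorical combinators as Python functions.
def pair(f, g):              # <f, g> : a -> b x c
    return lambda a: (f(a), g(a))

proj1 = lambda p: p[0]       # Proj1 : b x c -> b
proj2 = lambda p: p[1]       # Proj2 : b x c -> c

def curry(f):                # Curry : (a x b -> c) -> (a -> (b -> c))
    return lambda a: lambda b: f((a, b))

app = lambda p: p[0](p[1])   # App : (b -> c) x b -> c

def compose(g, f):           # g o f
    return lambda x: g(f(x))

# The eta equation: Curry(App o <k o Proj1, Proj2>) behaves like k.
k = lambda a: lambda b: a + b
lhs = curry(compose(app, pair(compose(k, proj1), proj2)))
for a in range(3):
    for b in range(3):
        assert lhs(a)(b) == k(a)(b)

# The two meanings of the identity term agree:
# Curry(Curry(App o <Proj2 o Proj1, Proj2>))  vs  Curry(Proj2).
two_level = curry(curry(compose(app, pair(compose(proj2, proj1), proj2))))
ident = curry(proj2)
f = lambda n: n * n
assert two_level(())(f)(5) == ident(())(f)(5) == 25
```

Here () plays the role of the unique element of unit, so both interpretations, applied to the empty context, send f to itself.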

Example 7.2.28 We can see the importance of coherence by working out the interpretation of y: σ ▷ (λx: σ. x)y : σ and comparing it to the β-equivalent term y: σ ▷ y : σ. For the latter,

C⟦y: σ ▷ y : σ⟧ = Proj₂ : unit × ⟦σ⟧ → ⟦σ⟧

There are two ways we might calculate the meaning of y: σ ▷ λx: σ. x. The first is to assume that this term is typed by adding y: σ to the closed term ∅ ▷ λx: σ. x with no free variables. The second involves adding the variable y to the context before doing the lambda abstraction. We work both of these out below.

C⟦y: σ ▷ λx: σ. x⟧ = C⟦∅ ▷ λx: σ. x⟧ ∘ Proj₁
  = Curry(Proj₂) ∘ Proj₁   where Proj₁ : unit × ⟦σ⟧ → unit

C⟦y: σ ▷ λx: σ. x⟧ = Curry(C⟦y: σ, x: σ ▷ x⟧)
  = Curry(Proj₂)   where Proj₂ : (unit × ⟦σ⟧) × ⟦σ⟧ → ⟦σ⟧

If we use the first, then the interpretation of the original term is written

C⟦y: σ ▷ (λx: σ. x)y⟧ = App ∘ ⟨Curry(Proj₂) ∘ Proj₁, Proj₂⟩

which is equal to Proj₂ by the general equation App ∘ ⟨Curry(h) ∘ Proj₁, Proj₂⟩ = h given in the second definition of CCC. Thus this part of the definition of CCC corresponds directly to the soundness of β-conversion. The reader may verify that the second interpretation of the subterm y: σ ▷ λx: σ. x is equal to the first in Exercise 7.2.35.

Rather than prove coherence directly, we will prove simultaneously that the meaning of a term does not depend on which typing derivation we use and that permuting or dropping unnecessary variables from the typing context changes the meaning in a predictable way. By characterizing the effect of permutation, we will be able to treat typing contexts in


equations as unordered. This allows us to continue to use unordered typing contexts in the equational proof system. In addition, we will use properties of permuting or dropping variables in the coherence proof.

Lemma 7.2.29 Let C be a Σ-CCC. The meaning function C⟦·⟧ on typed terms over signature Σ has the following properties:

(i) For any m, n-function f and ordered type assignment Γ = x₁: σ₁, …, xₙ: σₙ of length n such that Γ ▷ M: σ is well-typed and Γf contains all free variables of M, we have

C⟦Γ ▷ M: σ⟧ = C⟦Γf ▷ M: σ⟧ ∘ P_f

where P_f : ⟦Γ⟧ → ⟦Γf⟧ is the combination of projection morphisms determined by f.

(ii) The meaning function satisfies the equations

C⟦x₁: σ₁ ▷ x₁ : σ₁⟧ = Proj₂ : unit × ⟦σ₁⟧ → ⟦σ₁⟧
C⟦Γ ▷ c : τ⟧ = C⟦∅ ▷ c : τ⟧ ∘ One_⟦Γ⟧
C⟦Γ ▷ ⟨M, N⟩ : ρ × τ⟧ = ⟨C⟦Γ ▷ M : ρ⟧, C⟦Γ ▷ N : τ⟧⟩
C⟦Γ ▷ MN : τ⟧ = App ∘ ⟨C⟦Γ ▷ M : σ → τ⟧, C⟦Γ ▷ N : σ⟧⟩
C⟦Γ ▷ λx: σ. M : σ → τ⟧ = Curry(C⟦Γ, x: σ ▷ M : τ⟧)

independently of the typing derivation used, where in (i) we consider every f such that Γf contains FV(M) but not x.

Proof We prove conditions (i) and (ii) simultaneously, by a single induction on the structure of terms (not their typing derivations). The general pattern is to prove (ii) first, using the induction hypothesis (i) for simpler terms, then check that (i) holds for the term itself. For a variable, x₁: σ₁, …, xₙ: σₙ ▷ xᵢ : σᵢ, the only typing derivations begin with axiom (var), followed by n − 1 uses of the (add var) typing rule. We prove (ii) by a subinduction on the number of times (add var) is used. The base case is

C⟦xᵢ: σᵢ ▷ xᵢ : σᵢ⟧ = Proj₂ : unit × ⟦σᵢ⟧ → ⟦σᵢ⟧

which satisfies the condition of the lemma. For the induction step, suppose x₁: σ₁, …, xₙ: σₙ ▷ xᵢ : σᵢ is derived by adding the last variable xₙ. Then we have

C⟦x₁: σ₁, …, xₙ: σₙ ▷ xᵢ : σᵢ⟧ = C⟦x₁: σ₁, …, xₙ₋₁: σₙ₋₁ ▷ xᵢ : σᵢ⟧ ∘ Proj₁

where Proj₁ : (unit × ⟦σ₁⟧ × … × ⟦σₙ₋₁⟧) × ⟦σₙ⟧ → unit × ⟦σ₁⟧ × … × ⟦σₙ₋₁⟧. This is the combination of projections required by (i), for the (n − 1), n-function f mapping each j ≤ n − 1 to itself.

Therefore, we say a Kripke applicative structure A is extensional if, for all f, g ∈ A^{σ→τ}_w,

f = g whenever ∀w' ≥ w. ∀a ∈ A^σ_{w'}. f·a = g·a

where f·a denotes application of (the transition of) f to a at world w'. This can also be stated using the standard interpretation of predicate logic in Kripke structures. However, since this requires extra machinery, we will not go into predicate logic here. The main point that may be of interest to the general reader is that the statement above is exactly the result of interpreting the usual formula for extensionality (not mentioning possible worlds at all) in a Kripke model. There are two ways to identify the extensional applicative structures that have enough elements to be models. As with ordinary applicative structures, the environment and combinatory model definitions are equivalent. Since the combinatory condition is simpler, we will use it here. For all types σ, τ and ρ, we need global elements K_{σ,τ} of type σ → τ → σ and S_{σ,τ,ρ} of type (ρ → σ → τ) → (ρ → σ) → ρ → τ. The equational condition on K_{σ,τ} is that for a ∈ A^σ_w and b ∈ A^τ_w, we must have (K·a)·b = a. The condition for S is derived from the equational axiom in Section 4.5.7 similarly. We define a Kripke lambda model to be a Kripke applicative structure that is extensional and has combinators. As mentioned in the last section, the interpretation of each type in a Kripke model is a functor from a partial order of possible worlds into Set. The full Kripke model over any given functors F₁, …, Fₖ : P → Set is the full subcategory of Set^P whose objects are all functors definable from F₁, …, Fₖ using → and whose arrows are all natural transformations between these functors. Since we did not define function objects in Set^P in Example 7.2.17, we do so after the following example.

Example 7.3.2 This example describes part of a Kripke lambda model and shows that A^{σ→σ}_w may be nonempty at a world w even when A^σ_w is empty. Let P = {0, 1, 2} with 0 ≤ 1 ≤ 2. We are interested in the Kripke lambda model A with a single type constant σ interpreted as the functor F : P → Set with F(0) = ∅, F(1) = {c₁, c₂} and F(2) = {c}. (This means that A^σ_i = F(i) for i ∈ {0, 1, 2}; each transition function is uniquely determined.)

7.3 Kripke Lambda Models and Functor Categories

A natural transformation ν : F → F must supply a component at each world,

ν(2) : {c} → {c}
ν(1) : {c₁, c₂} → {c₁, c₂}
ν(0) : ∅ → ∅

such that each square formed with the transition functions of F commutes. Since ν(0) and ν(2) are uniquely determined, and any map from {c₁, c₂} to {c₁, c₂} will do for ν(1), there are four natural transformations from F to F. Let us call these ν, ν′, ν″ and ν‴.
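The counting here can be checked mechanically. The sketch below (my own illustration, not from the text) enumerates all families ν(i): F(i) → F(i) commuting with the transition functions, and also counts the restrictions ν|w that are used just below to define the function type at each world.

```python
from itertools import product

# Worlds 0 <= 1 <= 2; F(0) = {}, F(1) = {c1, c2}, F(2) = {c}.
F = {0: [], 1: ["c1", "c2"], 2: ["c"]}

# Transition functions F(w) -> F(w+1); both are forced.
step = {0: {}, 1: {"c1": "c", "c2": "c"}}

def all_maps(xs, ys):
    """All functions from list xs to list ys, as dicts."""
    return [dict(zip(xs, img)) for img in product(ys, repeat=len(xs))]

def natural(v):
    """Naturality: step o v(w) = v(w+1) o step, for w = 0, 1."""
    return all(step[w][v[w][x]] == v[w + 1][step[w][x]]
               for w in (0, 1) for x in F[w])

candidates = [dict(zip((0, 1, 2), vs))
              for vs in product(all_maps(F[0], F[0]),
                                all_maps(F[1], F[1]),
                                all_maps(F[2], F[2]))]
nats = [v for v in candidates if natural(v)]
assert len(nats) == 4   # the four transformations nu, nu', nu'', nu'''

# Restrictions nu|w determine the function type at each world.
def restrict(v, w):
    return tuple(sorted((u, tuple(sorted(v[u].items()))) for u in v if u >= w))

sizes = [len({restrict(v, w) for v in nats}) for w in (0, 1, 2)]
assert sizes == [4, 4, 1]
```

The sizes [4, 4, 1] anticipate the definition of A^{σ→σ} worked out next: four elements at worlds 0 and 1, but only one at world 2.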

The function type A^{σ→σ} should represent all natural transformations from F to F. However, by extensionality, we may only have as many elements of A^{σ→σ}_w as there are natural transformations when we consider only the possible worlds that are ≥ w. In particular, A^{σ→σ}_2 may only have one element, since there is only one possible mapping from {c} to {c} and there are no other worlds ≥ 2. Let us write ν|w for the result of restricting ν to worlds ≥ w. In other words, ν|w is the function with domain {w' | w' ≥ w} such that (ν|w)(w') = ν(w'). Using this notation, we may define A^{σ→σ} by

A^{σ→σ}_2 = {ν|2}
A^{σ→σ}_1 = {ν|1, ν′|1, ν″|1, ν‴|1}
A^{σ→σ}_0 = {ν, ν′, ν″, ν‴}

Note, however, that ν|2 = ν′|2 = ν″|2 = ν‴|2, since there is only one function from {c} to {c}. Application is given by

ν·x = ν(w)(x)   for x ∈ A^σ_w

which is the general definition for any full Kripke lambda model where the elements of function type are natural transformations. The reader is encouraged to verify that this is an extensional interpretation of σ → σ, with four elements of A^{σ→σ}_0, four elements of A^{σ→σ}_1 and one element of A^{σ→σ}_2. As mentioned in Example 7.2.17, the category Set^C of functors from C into Set is cartesian closed, for any small category C. Any functor F with each F(c) a one-element set will be a terminal object of Set^C. The product of functors F, G : C → Set is given by

(F x G)(c) = F(c) x G(c),


with the arrow map of F × G taking an arrow f to F(f) × G(f). The function object (F → G) : C → Set may be defined using the Hom functor, with Hom(a, _) : C → Set taking object c to the set Hom(a, c) and acting on morphisms by composition. Using Hom, the functor F → G is defined as follows:

(F → G)(c) = the set of all natural transformations from Hom(c, _) × F to G
(F → G)(f : a → b) = the map taking a natural transformation ν : Hom(a, _) × F → G to μ = (F → G)(f)(ν) : Hom(b, _) × F → G

according to the diagram below, for each object c of C:

Hom(b, c) × F(c) --μ(c)--> G(c)
  (_ ∘ f) × id |              | id
Hom(a, c) × F(c) --ν(c)--> G(c)

where the function (_ ∘ f) : Hom(b, c) → Hom(a, c) maps h to h ∘ f.

In the special case of Set^P, where P is a partial order, each of the hom-sets has at most one element. This allows us to simplify the definition of F → G considerably. If w is an object of the partial order P, it will be convenient to write P_{≥w} for the partial order of objects that are ≥ w in P. If F : P → Set, then we similarly write F|_{≥w} for the functor from P_{≥w} into Set obtained by restricting F to P_{≥w}. For f : w → w' in P, we can simplify the definition above as follows:

(F → G)(w) = the set of all natural transformations from F|_{≥w} to G|_{≥w}
(F → G)(f : w → w') = the map taking a natural transformation ν : F|_{≥w} → G|_{≥w} to μ = (F → G)(f)(ν) : F|_{≥w'} → G|_{≥w'}

where μ(w'') = ν(w'') for each object w'' ≥ w' in P. In other words, (F → G)(f)(ν) is identical to ν, restricted to the subcategory P_{≥w'}. Put in the notation of Kripke lambda models, if types σ and τ are interpreted as functors


F, G : P → Set by A^σ_w = F(w) and A^τ_w = G(w), then the interpretation of σ → τ in the full Kripke lambda model is the functor A^{σ→τ} = F → G with

A^{σ→τ}_w = the set of all natural transformations from F|_{≥w} to G|_{≥w}

and with transition functions given by restriction, taking ν : F|_{≥w} → G|_{≥w} to ν|_{≥w'} for w' ≥ w.

Exercise 7.3.3 Describe A^{…} in the full Kripke model of Example 7.3.2.

7.3.5 Environments and Meanings of Terms

An environment η for a Kripke applicative structure A is a partial mapping from variables and worlds to elements of A such that

(env) If ηxw ∈ A^σ_w and w' ≥ w, then ηxw' is the image of ηxw under the transition function from A^σ_w to A^σ_{w'}.

Intuitively, an environment η maps a variable x to a "partial element" that may exist (or be defined) at some worlds, but not necessarily all worlds. Since a type may be empty at one world and then nonempty later, we need to have environments such that ηxw is undefined at some w, and then "becomes" defined at a later w' ≥ w. We will return to this point after defining the meanings of terms. If η is an environment and a ∈ A^σ_w, we write η[x ↦ a] for the environment identical to η on variables other than x, and with

(η[x ↦ a])xw' = the image of a under the transition function from A^σ_w to A^σ_{w'}

for all w' ≥ w. We take (η[x ↦ a])xw' to be undefined if w' is not greater than or equal to w. Note that the definition of η[x ↦ a] actually depends on w if the A^σ_w are not disjoint. If η is an environment for applicative structure A, and Γ is a type assignment, we say η satisfies Γ at w, written η, w ⊨ Γ, if ηxw ∈ A^σ_w for every x: σ ∈ Γ. Note that if η, w ⊨ Γ and w' ≥ w, then η, w' ⊨ Γ. For any Kripke model A, world w and environment η with η, w ⊨ Γ, we define the meaning A⟦Γ ▷ M: σ⟧ηw of term Γ ▷ M: σ in environment η at world w by induction on typing derivations, as for standard Henkin models. For notational simplicity, we write ⟦…⟧… in place of A⟦…⟧…. Since the clause for (add var) is trivial, it is omitted.


⟦Γ ▷ x : σ⟧ηw = ηxw
⟦Γ ▷ c : σ⟧ηw = Const(c)_w
⟦Γ ▷ MN : τ⟧ηw = App_w (⟦Γ ▷ M : σ → τ⟧ηw) (⟦Γ ▷ N : σ⟧ηw)
⟦Γ ▷ λx: σ. M : σ → τ⟧ηw = the unique d ∈ A^{σ→τ}_w such that, for all w' ≥ w and a ∈ A^σ_{w'},
    App_{w'} (d|w') a = ⟦Γ, x: σ ▷ M : τ⟧(η[x ↦ a])w'

where d|w' denotes the transition of d to world w'. Combinators and extensionality guarantee that in the Γ ▷ λx: σ. M : σ → τ case, d exists and is unique. This is proved as in the standard Henkin setting, using translation into combinators as in Section 4.5.7 for existence, and extensionality for uniqueness. We say an equation Γ ▷ M = N: σ holds at η and w, written

η, w ⊨ (Γ ▷ M = N: σ)

if, whenever η, w ⊨ Γ, we have ⟦Γ ▷ M : σ⟧ηw = ⟦Γ ▷ N : σ⟧ηw. A model A satisfies Γ ▷ M = N: σ, written A ⊨ Γ ▷ M = N: σ, if every η and w of A satisfy the equation.

Example 7.3.4 As an application of Kripke models, we will show how to construct a counter-model to an implication that holds over Henkin models. Let E₁ and E₂ be the equations

E₁: λx: a. f P₁ = λx: a. f P₂
E₂: f P₁ = f P₂

where f : (a → a → a) → b, P₁ ≡ λx: a. λy: a. x and P₂ ≡ λx: a. λy: a. y. Exercise 4.5.24 shows how rule (nonempty) can be used to prove E₂ from E₁, and Exercise 4.5.29 how to prove E₂ from E₁ using rules (empty I) and (empty E). Since E₁ does not semantically imply E₂ over Kripke models, it follows that we cannot prove E₂ from E₁ without using one of these extensions to the proof system. Furthermore, neither (nonempty) nor (empty I) and (empty E) are sound for Kripke lambda models. We must construct a Kripke lambda model A for type constants a and b and term constant f that satisfies E₁ but not E₂. If there is a global element of type a, then we could apply both functions in E₁ to this element and obtain E₂. If A^a is empty at all possible worlds, then the argument given in Exercise 4.5.29 shows that E₁ implies E₂. Therefore, we must let A^a be empty at some world and nonempty at another. Since it suffices to have two possible worlds, we let W = {0, 1} with 0 < 1 and take A^a_0 = ∅. Since we need P₁ and P₂ to be different, we let A^a_1 have two distinct elements, c₁ and c₂. To make f P₁ and f P₂ equal at world 1, we let A^b_1 have only one element, say d. Without committing to A^b_0, we can see that E₁ must hold at both worlds: the two terms of type a → b are equal at world 0 since A^a_0 = ∅, and equal at world 1 since A^b_1 has only one element. We let A^b_0 have two elements, d₁ and d₂, so that E₂ may fail at world 0. Our counter-model is the full Kripke lambda model over the two functors A^a and A^b. As described in Example 7.3.2 and Exercise 7.3.3, the relevant function types are nonempty at world 0. If we let f map ⟦P₁⟧ to d₁ and ⟦P₂⟧ to d₂ there, then the equation E₂ fails at world 0. Since E₂ holds in A only if it holds at all possible worlds of A, we can see that A ⊨ E₁ but A ⊭ E₂.
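The cardinality facts behind this counter-model can be spot-checked in a few lines. The sketch below is my own illustration (it does not build the full model, only the two base functors and the world-1 behavior of P₁ and P₂).

```python
# Worlds 0 < 1. Type a: empty, then {c1, c2}; type b: {d1, d2}, then {d}.
Aa = {0: [], 1: ["c1", "c2"]}
Ab = {0: ["d1", "d2"], 1: ["d"]}

# At world 1, P1 = \x.\y.x and P2 = \x.\y.y act as:
P1 = lambda x: lambda y: x
P2 = lambda x: lambda y: y

# P1 and P2 disagree at world 1, so their meanings are distinct elements
# of type a -> a -> a even at world 0 (an element at world 0 is a family
# of components over all worlds >= 0), letting f separate them there.
assert any(P1(x)(y) != P2(x)(y) for x in Aa[1] for y in Aa[1])

# Yet \x:a. f P1 = \x:a. f P2: at world 0 there is nothing of type a to
# apply them to, and at world 1 all maps into A^b agree.
assert Aa[0] == [] and len(Ab[1]) == 1
```

The two assertions correspond exactly to the two reasons given in the text for E₁ holding while E₂ fails at world 0.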

Exercise 7.3.5 We may consider classical applicative structures as a special case of Kripke applicative structures, using any partial order. To be specific, let A = ({A^σ}, {App^{στ}}, Const) be a classical applicative structure, i.e., {A^σ} is a collection of sets indexed by types and {App^{στ}} is a collection of application functions. We define the Kripke structure

A_W = (W, {A^σ_w}, {App^{στ}_w}, Const)

by taking sets A^σ_w = A^σ, application functions App^{στ}_w = App^{στ}, and each transition function to be the identity. In categorical terms, A_W could be called an applicative structure of constant presheaves. Show that if A is a classical lambda model then A_W is a Kripke lambda model, by checking extensionality and using induction on terms to prove that the meaning of a term M in A_W is the same as the meaning of M in A, in any environment.

7.3.6 Soundness and Completeness

We have soundness and completeness theorems for Kripke lambda models.

Lemma 7.3.6 (Soundness) Let S be a set of well-typed equations. If S ⊢ Γ ▷ M = N: σ, then every model satisfying S also satisfies Γ ▷ M = N: σ.

A direct proof is given as Exercise 7.3.8. An alternate proof may be obtained using the correspondence between Kripke models and CCCs discussed in Section 7.3.7.

Theorem 7.3.7 (Completeness for Kripke Models) For every lambda theory S, there is a Kripke lambda model satisfying precisely the equations belonging to S.

Proof The completeness theorem is proved by constructing a term model


A = (W, {A^σ_Γ}, {App^{στ}_Γ}, Const)

of the following form.

• W is the poset of finite type assignments Γ, ordered by inclusion. In what follows, we will write Γ for an arbitrary element of W.

• A^σ_Γ is the set of all [Γ ▷ M: σ], where Γ ▷ M: σ is well-typed and [Γ ▷ M: σ] is the equivalence class of Γ ▷ M: σ with respect to S.

• The transition function maps [Γ ▷ M: σ] to [Γ' ▷ M: σ] for Γ ⊆ Γ'.

It is easy to check that the definition makes sense, and that we have global elements K and S at all appropriate types. For example,

K_{σ,τ,Γ} = [Γ ▷ λx: σ. λy: τ. x]

Since the required properties of K and S are easily verified, the term applicative structure is a Kripke combinatory algebra. The proof of extensionality illustrates the difference between Kripke models and ordinary Henkin models. Suppose that [Γ ▷ M: σ → τ] and [Γ ▷ N: σ → τ] have the same functional behavior, i.e., for all Γ' ⊇ Γ and Γ' ▷ P: σ, we have

[Γ' ▷ MP: τ] = [Γ' ▷ NP: τ]

Then, in particular, for Γ' ≡ Γ, x: σ with x not in Γ, we have

[Γ, x: σ ▷ Mx: τ] = [Γ, x: σ ▷ Nx: τ]

and so, by rule (ξ) and axiom (η), we have

[Γ ▷ M: σ → τ] = [Γ ▷ N: σ → τ]

Thus A is a Kripke lambda model. This argument may be compared with the step in the proof of Lemma 8.3.1 showing that if E ⊢ Γ ▷ MN = M'N': τ for every N, N' with E ⊢ Γ ▷ N = N': σ, then E ⊢ Γ ▷ M = M': σ → τ, using the assumption that there are infinitely many variables of each type. It remains to show that A satisfies precisely the equations belonging to S. We begin by relating the interpretation of a term to its equivalence class. If Γ is any type assignment, we may define an environment η_Γ by

η_Γ x Γ' = [Γ' ▷ x: σ]   if x: σ ∈ Γ ⊆ Γ'
η_Γ x Γ' undefined       otherwise

A straightforward induction on terms shows that for any Γ'' ⊇ Γ, we have

⟦Γ ▷ M: σ⟧η_Γ Γ'' = [Γ'' ▷ M: σ]

In particular, whenever A satisfies an equation Γ ▷ M = N: σ, we have η_Γ, Γ ⊨ Γ by construction of η_Γ, and so

[Γ ▷ M: σ] = [Γ ▷ N: σ]

Since this reasoning applies to every Γ, every equation satisfied by A must be provable from S. While it is possible to show that A satisfies every equation in S directly, by similar reasoning, certain complications may be avoided by restricting our attention to closed terms. There is no loss of generality in doing so, since it is easy to prove the closed equation

∅ ▷ λx₁: σ₁…λxₖ: σₖ. M = λx₁: σ₁…λxₖ: σₖ. N : σ₁ → … → σₖ → τ

from the equation

x₁: σ₁, …, xₖ: σₖ ▷ M = N : τ

between open terms, and vice versa. For any closed equation ∅ ▷ M = N: τ ∈ S, we have

S ⊢ Γ ▷ M = N: τ

for any Γ, by rule (add var). Therefore, for every world Γ of A, the two equivalence classes [Γ ▷ M: τ] and [Γ ▷ N: τ] will be identical. Since the meaning of ∅ ▷ M: τ in any environment η at world Γ will be [Γ ▷ M: τ], and similarly for ∅ ▷ N: τ, it follows that A satisfies ∅ ▷ M = N: τ. This proves the theorem.

Exercise 7.3.8 Prove Lemma 7.3.6 by induction on equational proofs.

7.3.7 Kripke Lambda Models as Cartesian Closed Categories

In this section, we will see that any Kripke model A with products and a terminal type determines a cartesian closed category C_A. As one would hope, the categorical interpretation of a term in C_A coincides with the meaning of the term in the Kripke model A. It is easy to extend the definitions of Kripke applicative structure and Kripke lambda model to include cartesian product types σ × τ and a terminal type unit. The main ideas are the same as for Henkin models, outlined in Section 4.5.9. Specifically, a λ^{unit,×,→} applicative structure is just like a λ^→ applicative structure, but with additional projection and pairing operations at each possible world w. An extensional structure, and therefore a model, must satisfy the conditions that A^{unit}_w must have exactly one element at each world, and

∀p, q ∈ A^{σ×τ}_w. if Proj₁ p = Proj₁ q and Proj₂ p = Proj₂ q then p = q

at each world. The interpretations of the combinators * and Pair, as described in Section 4.5.9, give a name to the unique global element of A^{unit} and force there to be a pairing

function. To construct a CCC C_A from a λ^{unit,×,→} Kripke model A, we regard a partially-ordered …

Exercise 7.4.1 … W = ({0} … ). Show that the least fixed point of F is isomorphic to the natural numbers. What operation on unit + t corresponds to successor? Is W a coproduct in Set?

7.4.2 Diagrams, Cones and Limits

Our first step is to generalize the definition of least upper bound from partial orders to categories. This requires some subsidiary definitions. In keeping with standard categorical


terminology, we consider the generalization of greatest lower bound before least upper bound. A diagram D in a category C is a directed graph whose vertices are labeled by objects of C and whose arcs are labeled by morphisms of C, such that if an edge e is labeled by an arrow f from a to b, then the labels on the endpoints of e must be a and b. Some examples of diagrams are the squares and triangles that were drawn in earlier sections. If a vertex of D is labeled by object a, we will often ambiguously refer to both the vertex and its label as a, when no confusion seems likely to result, and similarly for morphisms and arcs. If D is a diagram in category C, then D^op is the diagram in the opposite category C^op obtained by reversing the direction of each of the arcs (arrows). As explained in Section 7.2.2, a diagram commutes if, for every pair of vertices, all compositions given by a path from the first vertex to the second are equal morphisms of the category. The reason for defining diagrams formally is to define limits; the definition of limit, in turn, requires the definition of cone. A cone (c, {f_d: c → d}_{d in D}) over a commuting diagram D consists of an object c together with a morphism f_d: c → d for each vertex d of D, such that the diagram obtained by adding c and all f_d: c → d to D commutes. A morphism of cones, (c, {f_d: c → d}_{d in D}) → (c', {g_d: c' → d}_{d in D}), both over the same diagram D, is a morphism h: c → c' such that the entire diagram combining h, both cones, and D commutes. Since we assume each cone commutes, it suffices to check that for each d in D, the triangle asserting f_d = g_d ∘ h commutes. This is illustrated in Figure 7.1. It is easy to check that cones and their morphisms form a category. A limit of a diagram D is a terminal cone over D. In other words, a limit is a cone (c, {f_d: c → d}) such that for any cone (c', {g_d: c' → d}) over the same diagram, there is a unique morphism of cones from c' to c.
The dual notion is colimit: a colimit of D is a limit of the diagram D^op in the opposite category C^op. It follows from Exercise 7.2.8, showing that initial and terminal objects are unique up to isomorphism, that any two limits or colimits of a diagram are isomorphic in the category of cones or cocones. It is easy to see that if h: c → c' is an isomorphism of cones or cocones, it is also an isomorphism of c and c' in the underlying category.
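As a concrete check of these definitions (my own sketch, not from the text): in finite Set, the product of two sets together with its projections is a terminal cone over the two-object discrete diagram, and any other cone factors through it by a uniquely determined mediating map.

```python
# A cone over the discrete diagram {A, B} in Set is an apex C with
# maps f : C -> A and g : C -> B. The product cone (A x B, proj1, proj2)
# is terminal: each cone factors through it via c |-> (f(c), g(c)).
A, B, C = [0, 1], ["u", "v", "w"], range(4)
f = lambda c: c % 2            # one leg of a sample cone with apex C
g = lambda c: "uvw"[c % 3]     # the other leg

mediate = lambda c: (f(c), g(c))   # the unique cone morphism C -> A x B
proj1 = lambda p: p[0]
proj2 = lambda p: p[1]

# The triangles commute: proj_i o mediate recovers the legs f and g.
assert all(proj1(mediate(c)) == f(c) and proj2(mediate(c)) == g(c) for c in C)
```

Uniqueness is immediate here: any h with proj₁ ∘ h = f and proj₂ ∘ h = g must send c to exactly (f(c), g(c)).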

7.4 Domain Models of Recursive Types

Example 7.4.2 The limit of a discrete diagram D = {d₁, d₂}, with two objects and no arrows, is the product d₁ × d₂, with projection functions Proj_i: d₁ × d₂ → d_i. (See Exercise 7.4.4.)

Example 7.4.3 Let C be a partial order, regarded as a category in the usual way (Example 7.2.6), and let D be a set of objects from C. We may regard D as a diagram by including all of the "less-than-or-equal-to" arrows between elements of D. Since all diagrams in C commute, a cone over D is any lower bound of D, and the limit of D is the greatest lower bound of D. We can also describe least upper bounds as limits using the opposite category. Specifically, the least upper bound of D is the limit of D in C^op.

Figure 7.1 Morphism of cones.
Example 7.4.3 shows that least upper bounds are a special case of colimits. The way we will obtain recursively-defined objects is by generalizing the least fixed-point construction to obtain colimits of certain diagrams. Since we will therefore have more occasion to work with colimits than limits, it is useful to have a direct definition of colimits using cocones instead of opposite categories. A cocone (c, {fd: d c)d D) over diagram D c for each vertex d of D such consists of an object c together with a morphism fd: d that the diagram obtained by adding c and all fd: d —± c to D commutes. A morphism of (C', {gd: d c')), is a morphism h c c' such that the cocones, (c, {fd: d c}) entire diagram combining h, both cocones, and D commutes. c c A for a cone = (c, If A is a diagram, a common notation is to write = (c, {lLd: d —± c)d in D), since these highlight the A c for a cocone d)d in D) and directions of the arrows involved. :

Exercise 7.4.4 Show that the limit of a discrete diagram D = Example 7.4.2, if it exists, is the product di x d2. Exercise 7.4.5

If f, g: a

—±

b, then an

equalizer of

{d1, d2), as

described in

f and g is a limit of the diagram

Categories and Recursive Types

512

f b.

a

g

Show that an equalizer is determined by an arrow h: c

—±

a

and that, in Set, the image of h

is(xEa f(x)=g(x)). I
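For finite sets the equalizer of Exercise 7.4.5 can be computed directly. The sketch below (my own illustration) takes c = {x ∈ a | f(x) = g(x)} with h the inclusion, and checks the defining property f ∘ h = g ∘ h.

```python
# Equalizer of f, g : a -> b in Set (finite case): the subset of a
# on which f and g agree, with h the inclusion map into a.
a = range(10)
f = lambda x: x * x % 6
g = lambda x: x % 6

c = [x for x in a if f(x) == g(x)]   # the equalizing subobject
h = lambda x: x                      # inclusion c -> a

# The defining equation: f o h = g o h on all of c.
assert all(f(h(x)) == g(h(x)) for x in c)
```

Any other arrow k: d → a with f ∘ k = g ∘ k necessarily lands inside this subset, which is the universal property the exercise asks the reader to verify.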

7.4.3 F-algebras

We will interpret a recursively-defined type as a "least fixed point" of an appropriate functor. In this section, we show that the least fixed point of a functor F is the initial object in the category of F-algebras, which are defined below. We will show in the next section that an initial F-algebra is also the colimit of an appropriate diagram. The ordinary algebras (as in Chapter 3) for an algebraic signature may be characterized as F-algebras in the category of sets. To illustrate this concretely, let us take the signature Σ with one sort, s, and three function symbols,

f₁: s × s → s,  f₂: s × s × s → s,  f₃: s

Recall that a Σ-algebra A consists of a set, A, together with functions f₁^A, f₂^A and f₃^A of the appropriate types. If we use a one-element set unit, products and sums (disjoint unions), we can reformulate any Σ-algebra as a set, A, together with a single function

f^A: (A × A) + (A × A × A) + unit → A

since f₁^A, f₂^A and f₃^A can all be recovered from f^A using injection functions. Since any composition of products, sums and constant functors yields a functor, we may define a functor

F_Σ(s) = (s × s) + (s × s × s) + unit

and say that a Σ-algebra is a pair (A, f) consisting of a set A and a function f: F_Σ(A) → A. It is easy to formulate a general definition of functor F_Σ for any algebraic signature Σ. This allows us to define the class of algebras with a given signature using categorical concepts. If C is a category and F: C → C is a functor, then an F-algebra is a pair (a, f) consisting of an object a and a morphism f: F(a) → a in C. This is equivalent to the definition in Chapter 3. More precisely, for any algebraic signature Σ, the F_Σ-algebras in Set are exactly the Σ-algebras of universal algebra. We can also use the definition for categories other than Set, which is one of the advantages of using categorical terminology. The curious reader may wonder why the definition of F-algebra requires that F be a functor, instead of just a function from objects to objects. After all, in establishing a correspondence with the definition of Σ-algebra, we only use the object part of F_Σ. The answer to this question is that, using functors, we may give a straightforward definition of algebraic homomorphism, as verified in Exercise 7.4.7. Specifically, a morphism of F-algebras from (a, f) to (b, g) is a morphism h: a → b satisfying the commuting diagram

F(a) --F(h)--> F(b)
  f |             | g
  a  ----h---->  b

Since F-algebras and F-algebra morphisms form a category, we write h: (a, f) → (b, g) to indicate that h is a morphism of F-algebras. An F-algebra (a, f) is a fixed point of F if f: F(a) → a is an isomorphism. The fixed points of F form a subcategory of the category of F-algebras, using F-algebra morphisms as morphisms between fixed points. In the special case that C is a CPO and the functor is a monotonic function f: C → C, the definition of fixed point just given is equivalent to the earlier definition, by antisymmetry. An f-algebra, in the case that C is a CPO, is called a pre-fixed-point of f, as described in Exercise 5.2.31. The following lemma identifies fixed points of a functor F within the category of F-algebras and shows that an initial F-algebra is a fixed point of F. For the special case of a CPO as a category, this shows that the least pre-fixed-point is the least fixed point, which is part (b) of Exercise 5.2.31. Since F-algebra morphisms are a generalization of standard algebraic homomorphisms (when some category other than Set is used), initial F-algebras are also a generalization of the initial algebras described in Chapter 3. In particular, the discussion of initiality in Sections 3.5.2 and 3.6.2 provides some useful intuition for F-algebras in arbitrary categories.
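As an illustration (my own, not from the book), the signature functor F_Σ(s) = (s × s) + (s × s × s) + unit and the homomorphism square can be rendered concretely: an F_Σ-algebra is one function on a tagged sum, and a map h is a homomorphism exactly when the square commutes on every element of F(a).

```python
# Elements of F_Sigma(X) as tagged values:
#   ("f1", x, y) in X x X, ("f2", x, y, z) in X x X x X, ("f3",) in unit.

def F_on_arrow(h):
    """The arrow part of F_Sigma: apply h componentwise, keep the tag."""
    return lambda v: (v[0],) + tuple(h(x) for x in v[1:])

# Two F_Sigma-algebras on the integers.
def alg_a(v):
    if v[0] == "f1": return v[1] + v[2]
    if v[0] == "f2": return v[1] + v[2] + v[3]
    return 0                          # f3, the constant

def alg_b(v):
    if v[0] == "f1": return v[1] * v[2]
    if v[0] == "f2": return v[1] * v[2] * v[3]
    return 1

# h(n) = 2 ** n is an F-algebra morphism (int, alg_a) -> (int, alg_b):
# it turns sums into products and the constant 0 into 1.
h = lambda n: 2 ** n
for v in [("f1", 2, 3), ("f2", 1, 2, 3), ("f3",)]:
    assert h(alg_a(v)) == alg_b(F_on_arrow(h)(v))   # the square commutes
```

The loop checks h ∘ f = g ∘ F(h) pointwise, which is precisely the commuting square defining a morphism of F-algebras.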

Lemma 7.4.6  If (a, f) is an initial F-algebra, then (a, f) is an initial fixed point of F.

Proof  Suppose (a, f) is initial. We show (a, f) is a fixed point using the diagram

    F(F(a)) --F(f)--> F(a)
       |               |
     F(f)              f
       v               v
      F(a) ----f----> a

which shows that f is an F-algebra morphism from (F(a), F(f)) to (a, f). Since (a, f) is initial, we also have an F-algebra morphism g: (a, f) → (F(a), F(f)) in the opposite


direction. Since the only morphism from an initial object to itself (in any category) is the identity, it is easy to see that f ∘ g = id_a. Since F is a functor, it follows that F(f) ∘ F(g) = id_{F(a)}. The remaining identity is proved by adjoining g and F(g) horizontally and vertically to the square above. Since the result commutes, and F(f) ∘ F(g) = id_{F(a)} as noted above, we may conclude g ∘ f = id_{F(a)}. This shows (a, f) is a fixed point. Since the fixed points form a subcategory of the category of F-algebras, it follows that (a, f) is initial in the category of fixed points of F. ∎

Exercise 7.4.7  Using the example signature Σ and functor F_Σ above, verify that morphisms of F_Σ-algebras (in Set) are precisely ordinary homomorphisms of Σ-algebras.

Exercise 7.4.8  Let F: Cpo → Cpo be the lifting functor with F(A) = A⊥ and, for f: A → B, F(f): A⊥ → B⊥ the lifted function. Describe the initial F-algebra and explain, for any other F-algebra, the unique F-algebra morphism.

Exercise 7.4.9  In Cpo or Set, let F be the functor F(A) = unit + A.

(a) Describe the initial F-algebra (A, f) and explain, for any other F-algebra (B, g), the unique F-algebra morphism h_{B,g}: A → B.

(b) If we think of the unique h_{B,g}: A → B as a function of g, for each B, we obtain a function h_B: ((unit + B) → B) → (A → B). What standard operation from recursive function theory does this provide? (Hint: A function (unit + B) → B is equivalent to a pair B × (B → B); see Exercise 2.5.7.)

(c) Why is the initial F-algebra the same in Cpo and Set?
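A concrete rendering of this exercise (a Haskell sketch, with Either () modeling unit + ·; the names are ours): the initial algebra for F(A) = unit + A is the natural numbers, and the unique morphism h_{B,g} is an iterator, which suggests the standard operation part (b) asks about.

```haskell
-- Initial algebra of F(A) = unit + A: the natural numbers, whose
-- structure map is  either (const Z) S :: Either () Nat -> Nat.
data Nat = Z | S Nat

-- The unique F-algebra morphism h_{B,g} into any algebra g :: Either () b -> b.
foldNat :: (Either () b -> b) -> Nat -> b
foldNat g Z     = g (Left ())
foldNat g (S n) = g (Right (foldNat g n))

-- Viewing g through the hint's isomorphism ((unit + B) -> B) ~ B x (B -> B),
-- the morphism becomes iteration: n |-> s^n b0.
iter :: b -> (b -> b) -> Nat -> b
iter b0 s = foldNat (either (const b0) s)
```

For example, iter 0 (+1) converts a Nat to the corresponding Int by applying the successor of Int the right number of times.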

7.4.4

ω-Chains and Initial F-algebras

As discussed earlier, we are interested in finding least fixed points of functors. The main lemma in this section shows that under certain completeness and continuity assumptions, colimits of the appropriate diagrams are also initial F-algebras and therefore, by Lemma 7.4.6, initial (or least) fixed points. Although we used directed sets to define complete partial orders, we will use chains instead for categories. (Exercise 5.2.6 compares chains and directed sets for CPOs.) An ω-chain is a diagram Δ of the form

    Δ = d_0 --f_0--> d_1 --f_1--> d_2 --f_2--> ···

written in symbols as Δ = ⟨d_i, f_i⟩_{i≥0}. In any such ω-chain, there is a unique composition of arrows leading from d_i to d_j when i ≤ j. To be precise, and to give this a name, we define the morphism

    f_{ij} = f_{j-1} ∘ ··· ∘ f_i : d_i → d_j.

For an ω-chain Δ as indicated above, we write Δ⁻ for the ω-chain obtained by dropping the first object and arrow,

    Δ⁻ = d_1 --f_1--> d_2 --f_2--> d_3 --f_3--> ···

and, if F: C → C is a functor on the category containing Δ, we also define the diagram F(Δ) by

    F(Δ) = F(d_0) --F(f_0)--> F(d_1) --F(f_1)--> F(d_2) --F(f_2)--> ···

If μ = (c, {μ_i: d_i → c}_{i≥0}) is a cocone of Δ, we similarly write μ⁻ for the cocone (c, {μ_i: d_i → c}_{i≥1}) of Δ⁻ and F(μ) for the cocone (F(c), {F(μ_i): F(d_i) → F(c)}_{i≥0}) of F(Δ).

A familiar example of an ω-chain arises in a CPO, regarded as a category with a morphism a → b iff a ≤ b.

If e ≤ e' for embeddings e and e', then p' ≤ p for the corresponding projections (see Exercise 7.4.16). It follows that if (e, p) and (e', p') are embedding-projection pairs with e = e', then p = p', and conversely. Therefore, if f is an embedding we may unambiguously write f^prj for the corresponding projection. It is also convenient to extend this notation to diagrams, writing Δ^prj, for example, for the diagram obtained by replacing every morphism f in a diagram Δ in C^E with the associated projection f^prj. Exercise 7.4.18 shows that if A and B are CPOs with an embedding-projection pair (e, p): A → B and A is pointed, then B must be pointed and both e and p must be strict.

Example 7.4.14  If A and B are CPOs, then e: A → B and p: B → A form an embedding-projection pair only if p ∘ e = id. This equation, which is often read "p is a left inverse of e," may be satisfied only if e is injective (one-to-one). Since we only consider continuous functions in the category Cpo, an embedding in Cpo must be an injective, continuous function with a left inverse. An example may be constructed using the CPOs {0, 1} and {0, 1, 2}, with 0 < 1, …

An acceptable meaning function satisfies, among other conditions,

    A[[Γ ▷ x: σ]]η = η(x),

    A[[Γ ▷ λx: σ. M: σ → τ]]η = A[[Γ ▷ λy: σ. [y/x]M: σ → τ]]η   for y not in Γ,

    A[[Γ ▷ M: τ]]η = A[[Γ ▷ M: τ]]η'   whenever η(x) = η'(x) for all x ∈ FV(M).

Two important examples of acceptable meaning functions are meaning functions for models and substitution on applicative structures of terms. These and other examples appear following the next definition. The last three conditions of the definition above are not strictly necessary for the Basic Lemma. However, they are natural properties that hold of all the examples we consider.

In addition to acceptable meaning functions, we will need some assumptions about the behavior of a logical relation on lambda abstractions. A logical relation over models is necessarily closed under lambda abstraction, as a consequence of the way that lambda abstraction and application interact. However, in an arbitrary applicative structure, abstraction and application may not be "inverses," and so we need an additional assumption to prove the Basic Lemma. If R ⊆ A × B is a logical relation and A[[·]], B[[·]] are acceptable meaning functions, we say R is admissible for A[[·]] and B[[·]] if, for all related environments η_a, η_b ⊨ Γ and terms Γ, x: σ ▷ M: τ and Γ, x: σ ▷ N: τ,

    ∀a, b. R^σ(a, b) ⊃ R^τ(A[[Γ, x: σ ▷ M: τ]]η_a[x ↦ a], B[[Γ, x: σ ▷ N: τ]]η_b[x ↦ b])

implies

    ∀a, b. R^σ(a, b) ⊃ R^τ(App (A[[Γ ▷ λx: σ. M: σ → τ]]η_a) a, App (B[[Γ ▷ λx: σ. N: σ → τ]]η_b) b).

The definitions of acceptable meaning function and admissible logical relation are technical and motivated primarily by their use in applications of the Basic Lemma. To give some intuition, several examples are listed below.

Example 8.2.6  If A is a model, then the ordinary meaning function A[[·]] is acceptable. It is clear that if A and B are models, then any logical relation R ⊆ A × B is admissible, since then

    App (A[[Γ ▷ λx: σ. M: σ → τ]]η_a) a = A[[Γ, x: σ ▷ M: τ]]η_a[x ↦ a],

and similarly for B and Γ, x: σ ▷ N: τ.

Example 8.2.7  Let T be the applicative structure of terms M such that Γ ▷ M: σ for some Γ ⊆ H, as in Example 4.5.1. An environment for T is a mapping from variables to terms, which we may regard as a substitution. Let us write ηM for the result of substituting terms for free variables in M, and define a meaning function on T by T[[Γ ▷ M: σ]]η = ηM.

It is a straightforward but worthwhile exercise to verify that T[[·]] is an acceptable meaning function. We will see in Section 8.3.1 that every lambda theory E gives us an admissible logical relation R_E ⊆ T × T. In short, R_E relates any pair of terms that are provably equal from E. The reason why R_E is admissible is that we can prove Γ ▷ λx: σ. M = λx: σ. N: σ → τ immediately from Γ, x: σ ▷ M = N: τ.
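The substitution-based meaning function of Example 8.2.7 can be sketched concretely. In the toy Haskell below (the term type and all names are ours, and for simplicity the environment is assumed to map variables to closed terms, so substitution cannot capture):

```haskell
-- A toy term language with named variables.
data Tm = V String | Ap Tm Tm | Lm String Tm
  deriving (Eq, Show)

-- An environment, i.e. a substitution, given as a finite map.
type Env = [(String, Tm)]

-- T[[M]]eta: apply the substitution eta to M.  Assuming the terms in the
-- environment are closed, dropping the bound variable avoids capture.
meaning :: Env -> Tm -> Tm
meaning env (V x)    = maybe (V x) id (lookup x env)
meaning env (Ap m n) = Ap (meaning env m) (meaning env n)
meaning env (Lm x m) = Lm x (meaning [p | p@(y, _) <- env, y /= x] m)
```

Note that meaning is compositional and depends only on the environment's values at free variables, which is the shape of the acceptability conditions above.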

8.2 Logical Relations over Applicative Structures

Example 8.2.8  If A is a typed combinatory algebra, then A is an applicative structure with an acceptable meaning function. The standard acceptable meaning function for A is obtained by translation into combinators. More specifically, as described in Section 4.5.7, we may translate a typed term Γ ▷ M: σ into a combinatory term Γ ▷ cL(M): σ that contains constants K and S of various types but does not involve lambda abstraction. Since any such combinatory term has a meaning in A, we can take the meaning of Γ ▷ cL(M): σ as the meaning of Γ ▷ M: σ. It is easy to verify that this is an acceptable meaning function. However, recall that since a combinatory algebra need not be extensional, βη-equivalent terms may have different meanings. It is shown in Exercise 8.2.16 that every logical relation over combinatory algebras is admissible.

Example 8.2.9  A special case of Example 8.2.8 that deserves to be singled out is that from any partial combinatory algebra D, as defined in Section 5.6.2, we may construct a typed combinatory algebra. In this structure, the interpretation of each typed term will be an element of the untyped partial combinatory algebra. We shall see in Sections 8.2.4 and 8.2.5 that the symmetric and transitive logical relations over this structure correspond exactly to partial equivalence relation models over D.

Using acceptable meaning functions and admissible relations, we have the following general form of the Basic Lemma.

Lemma 8.2.10 (Basic Lemma, general version)  Let A and B be applicative structures with acceptable meaning functions A[[·]] and B[[·]], and let R ⊆ A × B be an admissible logical relation. Suppose η_a and η_b are related environments satisfying the type assignment Γ. Then

    R^σ(A[[Γ ▷ M: σ]]η_a, B[[Γ ▷ M: σ]]η_b)

for every typed term Γ ▷ M: σ.

The proof is similar to the proof of the Basic Lemma for models, since the acceptability of A[[·]] and B[[·]] and the admissibility of R are just what we need to carry out the proof. We will go through the proof to see how this works.

Proof  We use induction on terms. For a constant c, we assume R(Const_A(c), Const_B(c)) as part of the definition of logical relation, so the lemma follows from the definition of acceptable meaning function. For a variable Γ ▷ x: σ, we have R^σ(η_a(x), η_b(x)) by assumption. Since A[[·]] is acceptable, A[[Γ ▷ x: σ]]η_a = η_a(x), and similarly for B[[·]]. Therefore R^σ(A[[Γ ▷ x: σ]]η_a, B[[Γ ▷ x: σ]]η_b). For an application Γ ▷ MN: τ, we assume inductively that R^{σ→τ}(A[[Γ ▷ M: σ → τ]]η_a, B[[Γ ▷ M: σ → τ]]η_b) and R^σ(A[[Γ ▷ N: σ]]η_a, B[[Γ ▷ N: σ]]η_b). Since R is logical,


it follows that

    R^τ(App A[[Γ ▷ M: σ → τ]]η_a A[[Γ ▷ N: σ]]η_a, App B[[Γ ▷ M: σ → τ]]η_b B[[Γ ▷ N: σ]]η_b).

Since the meaning functions are acceptable, it follows that A[[Γ ▷ MN: τ]]η_a and B[[Γ ▷ MN: τ]]η_b are related.

For an abstraction Γ ▷ λx: σ. M: σ → τ, we want to show that A[[Γ ▷ λx: σ. M: σ → τ]]η_a and B[[Γ ▷ λx: σ. M: σ → τ]]η_b are related. If η_a and η_b are related environments, then whenever R^σ(a, b), the modified environments η_a[x ↦ a] and η_b[x ↦ b] are related. By the inductive hypothesis, for related environments we have

    ∀a, b. R^σ(a, b) ⊃ R^τ(A[[Γ, x: σ ▷ M: τ]]η_a[x ↦ a], B[[Γ, x: σ ▷ M: τ]]η_b[x ↦ b]).

Since R is admissible, it follows that

    ∀a, b. R^σ(a, b) ⊃ R^τ(App (A[[Γ ▷ λx: σ. M: σ → τ]]η_a) a, App (B[[Γ ▷ λx: σ. M: σ → τ]]η_b) b).

But by definition of logical relation, this implies that A[[Γ ▷ λx: σ. M: σ → τ]]η_a and B[[Γ ▷ λx: σ. M: σ → τ]]η_b are related. ∎

The following lemma is useful for establishing that logical relations over term applicative structures are admissible. For simplicity, we state the lemma for logical predicates only.

Lemma 8.2.11  Let T be an applicative structure of terms with meaning function defined by substitution, as in Example 8.2.7. Let P ⊆ T be a logical predicate. If

    P^b(([N/x]M)N_1 ··· N_k) implies P^b((λx: σ. M)N N_1 ··· N_k)

for any type constant b, term λx: σ. M, and terms N, N_1, …, N_k of appropriate types, then P is admissible.

This lemma may be extended by using elimination contexts in place of sequences of applications, as described in Section 8.3.4.

Proof  For a term applicative structure T, the definition of admissible simplifies to:

    ∀N ∈ P^σ. P^τ([L_1/x_1, …, L_t/x_t, N/x]M) implies ∀N ∈ P^σ. P^τ((λx: σ. [L_1/x_1, …, L_t/x_t]M)N).


Since x is a bound variable in the statement of the lemma, we are free to choose x so as not to appear in any of the L_i. Therefore, it suffices to show that

    ∀N ∈ P^σ. P^τ([N/x]M) implies ∀N ∈ P^σ. P^τ((λx: σ. M)N)   (*)

for arbitrary M, x and types σ, τ. Since the hypothesis of the lemma strengthens this to terms applied to sequences of arguments,

    ∀N ∈ P^σ. P^τ(([N/x]M)N_1 ··· N_k) implies ∀N ∈ P^σ. P^τ((λx: σ. M)N N_1 ··· N_k),   (*)

it suffices to show (*) by induction on τ. The base case is easy, since this is the hypothesis of the lemma. For the inductive step, we assume (*) for types τ_1 and τ_2 and assume

    ∀N ∈ P^σ. P^{τ_1 → τ_2}(([N/x]M)N_1 ··· N_k).

We must show that [N/x]M may be replaced by (λx: σ. M)N. Let Q ∈ P^{τ_1}. By assumption and the fact that P is logical, we have

    ∀N ∈ P^σ. P^{τ_2}(([N/x]M)N_1 ··· N_k Q).

By the inductive hypothesis, it follows that

    ∀N ∈ P^σ. P^{τ_2}((λx: σ. M)N N_1 ··· N_k Q).

Since this holds for all Q ∈ P^{τ_1}, we may conclude that

    ∀N ∈ P^σ. P^{τ_1 → τ_2}((λx: σ. M)N N_1 ··· N_k).

This completes the inductive step and the proof of the lemma. ∎

Our definition of "admissible" is similar to the definition given in [Sta85b], but slightly weaker. Using our definition we may prove strong normalization directly by a construction that does not seem possible in Statman's framework [Sta85b, Example 4].

Exercise 8.2.12  Let A be a Henkin model. We say that a ∈ A^σ is lambda definable from a_1 ∈ A^{σ_1}, …, a_k ∈ A^{σ_k} if there is a term {x_1: σ_1, …, x_k: σ_k} ▷ M: σ with a = A[[M]]η in the environment η with η(x_i) = a_i. Use the Basic Lemma (for models) to show that if R ⊆ A × A is a logical relation, a ∈ A^σ is lambda definable from a_1, …, a_k, and R^{σ_i}(a_i, a_i) for each i, then R^σ(a, a).

Exercise 8.2.13  Complete Example 8.2.7 by showing that T[[Γ ▷ M: σ]]η = ηM is an acceptable meaning function for T.

Exercise 8.2.14  Let A be a lambda model for some signature without constants. A logical relation R ⊆ A × A is called a hereditary permutation if R^b is a permutation at each type constant b (see Exercise 8.2.3). We say that an element a ∈ A^σ is permutation-invariant if R^σ(a, a) for every hereditary permutation R ⊆ A × A. Show that for every closed term M, its meaning A[[M]] in A is permutation-invariant.

Exercise 8.2.15  Let A be the full classical hierarchy over a signature with a single type constant b, with A^b = {0, 1}.

(a) Find some function f that is permutation-invariant (see Exercise 8.2.14), but not the meaning of a closed lambda term. You may assume that every term has a unique normal form.

(b) Show that if f has the property that R(f, f) for every logical relation R ⊆ A × A, then f is the meaning of a closed lambda term. (This is not true for all types, but that is tricky to prove.)

Exercise 8.2.16  Prove the Basic Lemma for combinatory algebras. More specifically, show that if R ⊆ A × B is a logical relation over combinatory algebras A and B, and η_a and η_b are related environments with η_a, η_b ⊨ Γ, then

    R^σ(A[[Γ ▷ M: σ]]η_a, B[[Γ ▷ M: σ]]η_b)

for every typed combinatory term Γ ▷ M: σ. (Do not consider it part of the definition of logical relation that R(K, K) and R(S, S) for combinators K and S of each type.)

8.2.3

Partial Functions and Theories of Models

A logical relation R ⊆ A × B is called a logical partial function if each R^σ is a partial function from A^σ to B^σ, i.e.,

    R^σ(a, b_1) and R^σ(a, b_2) implies b_1 = b_2.

Logical partial functions are related to the theories of typed lambda models by the following lemma.

Lemma 8.2.17  If R ⊆ A × B is a logical partial function from model A to model B, then Th(A) ⊆ Th(B).

Proof  Without loss of generality, we will only consider equations between closed terms. Let R ⊆ A × B be a partial function and suppose A[[M]] = A[[N]] for closed M, N. Writing R(·) as a function, we have

    R(A[[M]]) = B[[M]] and R(A[[N]]) = B[[N]]

by the Basic Lemma. Thus B[[M]] = B[[N]]. ∎


By Corollary 4.5.23 (completeness for the pure theory of βη-conversion) we know there is a model A satisfying Γ ▷ M = N: σ iff ⊢ Γ ▷ M = N: σ. Since the theory of this model is contained in the theory of every other model, we may show that a model B has the same theory by constructing a logical partial function R ⊆ B × A. This technique was first used by Friedman to show that βη-conversion is complete for the full set-theoretic type hierarchy over any infinite ground types [Fri75, Sta85a].

Corollary 8.2.18  Let A be a model of the pure theory of βη-conversion. If R ⊆ B × A is a logical partial function and B is a model, then Th(B) = Th(A).

We use Friedman's technique, and extensions, in Section 8.4 to show completeness for full set-theoretic, recursive and continuous models. Since logical partial functions are structure-preserving mappings, it is tempting to think of logical partial functions as the natural generalization of homomorphisms. This often provides useful intuition. However, the composition of two logical partial functions need not be a logical relation. It is worth noting that for a fixed model A and environment η, the meaning function A[[·]]η from T to A is not necessarily a logical relation. In contrast, meaning functions in algebra are always homomorphisms.

Exercise 8.2.19  Find logical partial functions R ⊆ A × B and S ⊆ B × C whose composition S ∘ R ⊆ A × C, defined by (S ∘ R)^σ = S^σ ∘ R^σ, is not a logical relation. (Hint: Try C = A.)

8.2.4

Logical Partial Equivalence Relations

Another useful class of logical relations is the logical partial equivalence relations (logical pers), which have many of the properties of congruence relations. If we regard typed applicative structures as multi-sorted algebras, we may see that logical partial equivalence relations are somewhat more than congruence relations. Specifically, any logical per is an equivalence relation closed under application, as is a congruence, but it must also be closed under lambda abstraction. Before looking at the definition of logical partial equivalence relation, it is worth recalling the definition of partial equivalence relation from Section 5.6.1. The first of two equivalent definitions is that a partial equivalence relation on a set A is an equivalence relation on some subset of A, i.e., a pair (A', R) with A' ⊆ A and R ⊆ A' × A' an equivalence relation. However, because R determines A' = {a | R(a, a)} uniquely, and R is an equivalence relation on A' iff R is symmetric and transitive on A, we may give a simpler definition. A logical partial equivalence relation on A is a logical relation R ⊆ A × A such that each R^σ is symmetric and transitive. We say a partial equivalence relation is total if each R^σ is reflexive, i.e., R^σ(a, a) for every a ∈ A^σ. From the following lemma, we can see that there are many examples of partial equivalence relations.

Lemma 8.2.20  Let R ⊆ A × A be a logical relation. If R^b is symmetric and transitive for each type constant b, then every R^σ is symmetric and transitive.

The proof is given as Exercise 8.2.21. Intuitively, this lemma shows that being a partial equivalence relation is a "hereditary" property; it is inherited at higher types. In contrast, being a total equivalence relation is not. Specifically, there exist logical relations which are total equivalence relations at each type constant, but not total equivalence relations at every type. This is one reason for concentrating on partial instead of total equivalence relations.
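A tiny concrete example may help (the relation below is our own toy instance, not from the text): a symmetric, transitive relation on the integers that fails reflexivity at 1, so 1 lies outside the domain and simply has no equivalence class.

```haskell
import Data.List (nub, sort)

-- A partial equivalence relation on Int, given as a finite set of pairs.
-- It is symmetric and transitive, but 1 is not related to itself.
per :: [(Int, Int)]
per = [(0, 0), (0, 2), (2, 0), (2, 2), (3, 3)]

-- Domain of the per: the elements related to themselves.
dom :: [(Int, Int)] -> [Int]
dom r = nub [a | (a, b) <- r, a == b]

-- Equivalence class of a; empty when a is outside the domain.
eqClass :: [(Int, Int)] -> Int -> [Int]
eqClass r a = sort (nub [b | (a', b) <- r, a' == a])
```

Here eqClass per 0 is [0,2] while eqClass per 1 is empty, mirroring the "partial quotient" of Section 8.2.5 below, in which elements outside the domain contribute no equivalence class.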

Exercise 8.2.21  Prove Lemma 8.2.20 by induction on type expressions.

Exercise 8.2.22  Find a Henkin model A and a logical relation R ⊆ A × A such that R^b is reflexive, symmetric and transitive for each type constant b, but some R^σ is not reflexive. (Hint: Almost any model will do.)

Exercise 8.2.23  Let Σ be a signature with type constants B = {b_0, b_1, …} and (for simplicity) no term constants. Let D be a partial combinatory algebra and let A be a combinatory algebra for signature Σ constructed using subsets of D as in Section 5.6.2. Let R ⊆ A × A be any logical partial equivalence relation. Show that for all types σ and τ, the product per R^σ × R^τ over D is identical to the relation R^{σ×τ}, and similarly for function types. (Hint: There is something to check here, since the definition of R^{σ×τ} depends on A^{σ×τ} but, regarding R^σ and R^τ as pers over D, the definition of R^σ × R^τ does not.)

8.2.5

Quotients and Extensionality

If R is a logical partial equivalence relation over applicative structure A, then we may form a quotient structure A/R of equivalence classes from A. This is actually a "partial" quotient, since some elements of A may not have equivalence classes with respect to R. However, to simplify terminology, we will just call A/R a "quotient." One important fact about the structure A/R is that it is always extensional. Another is that, under certain reasonable assumptions, A/R will be a model with the meaning of each term in A/R equal to the equivalence class of its meaning in A. Let R ⊆ A × A be a logical partial equivalence relation on the applicative structure A = ⟨{A^σ}, {App^{σ,τ}}, Const⟩. The quotient applicative structure

    A/R = ⟨{A^σ/R}, {App^{σ,τ}/R}, Const/R⟩

is defined as follows. If R^σ(a, a), then the equivalence class [a]_R of a with respect to R is defined by

    [a]_R = {a' | R^σ(a, a')}.

Note that a ∈ [a]_R. We let A^σ/R = {[a]_R | R^σ(a, a)} be the collection of all such equivalence classes and define application on equivalence classes by

    [a] [b] = [App^{σ,τ} a b].

The map Const/R interprets each constant c as the equivalence class [Const(c)]_R. This completes the definition of A/R. It is easy to see that application is well-defined: if R^{σ→τ}(a, a') and R^σ(b, b'), then since R is logical, we have R^τ(App^{σ,τ} a b, App^{σ,τ} a' b').

Lemma 8.2.24  If R ⊆ A × A is a logical partial equivalence relation, then A/R is an extensional applicative structure.

Proof  Let [a_1], [a_2] ∈ A^{σ→τ}/R and suppose that for every [b] ∈ A^σ/R, we have

    App/R [a_1] [b] = App/R [a_2] [b].

This means that for every b_1, b_2 ∈ A^σ, we have R^σ(b_1, b_2) ⊃ R^τ(App a_1 b_1, App a_2 b_2). Since R is logical, this gives us R^{σ→τ}(a_1, a_2), which means that [a_1] = [a_2]. Thus A/R is extensional. ∎

A special case of this lemma is the extensional collapse, A^E, of an applicative structure A; this is also called the Gandy hull of A, and appears to have been discovered independently by Zucker [Tro73] and Gandy [Gan56]. The extensional collapse A^E is defined as follows. Given A, there is at most one logical relation R with

    R^b(a_1, a_2) iff a_1 = a_2 ∈ A^b

for each type constant b. (Such an R is guaranteed to exist when the type of each constant is either a type constant or first-order function type.) If there is a logical relation R that is the identity relation for all type constants, then we define A^E = A/R.

Corollary 8.2.25 (Extensional Collapse)  Let A be an applicative structure for signature Σ. If there is a logical relation R that is the identity relation on each type constant, then the extensional collapse A^E is an extensional applicative structure for Σ.

In some cases, A/R will not only be extensional, but also a Henkin model. This will be true if A is a model to begin with. In addition, since A/R is always extensional, we would expect to have a model whenever A has "enough functions," and β-equivalent terms are equivalent modulo R. If A[[·]] is an acceptable meaning function, then this will guarantee that A has enough functions. When we have an acceptable meaning function, we can also express the following condition involving β-equivalence. We say R satisfies (β) with respect to A[[·]] if, for every term Γ, x: σ ▷ M: τ and environment η with R^Γ(η, η), we have

    ∀a. R^σ(a, a) ⊃ R^τ(App (A[[Γ ▷ λx: σ. M: σ → τ]]η) a, A[[Γ, x: σ ▷ M: τ]]η[x ↦ a]).

It follows from properties of acceptable meaning functions that if R satisfies (β), then R relates β-equivalent terms (Exercise 8.2.28) and R is admissible (Exercise 8.2.29). In addition, if A is a combinatory algebra with standard meaning function as described in Example 8.2.8, then by Exercise 8.2.16 and the equational properties of combinators, every R over A satisfies (β). To describe the meanings of terms in A/R, we will associate an environment [η] for A/R with each environment η for A with R^Γ(η, η). We define [η] by

    [η](x) = [η(x)]

and note that [η] ⊨ Γ if η ⊨ Γ. Using these definitions, we can give useful conditions which imply that A/R is a model, and characterize the meanings of terms in A/R.

Lemma 8.2.26 (Quotient Models)  Let R ⊆ A × A be a logical partial equivalence relation over applicative structure A with acceptable meaning function A[[·]], and suppose that R satisfies (β). Then A/R is a model such that for any A-environment η ⊨ Γ with R^Γ(η, η) we have

    (A/R)[[Γ ▷ M: σ]][η] = [A[[Γ ▷ M: σ]]η].

In other words, under the hypotheses of the lemma, the meaning of a term Γ ▷ M: σ in A/R is the equivalence class of the meaning of Γ ▷ M: σ in A, modulo R. Lemma 8.2.26 is similar to the "Characterization Theorem" of [Mit86c], which seems to be the first use of this idea (see Theorem 9.3.46).

Proof  We use induction on the structure of terms. By Exercise 8.2.29, R is admissible. It follows from the general version of the Basic Lemma that the equivalence class [A[[Γ ▷ M: σ]]η] of the interpretation of any term in A is nonempty. This is essential to the proof. For any variable Γ ▷ x: σ, it is clear that

    (A/R)[[Γ ▷ x: σ]][η] = [η](x) = [A[[Γ ▷ x: σ]]η]

by definition of [η] and acceptability of A[[·]]. The case for constants is similarly straightforward. The application case is also easy, using the definition of application in A/R. For the abstraction case, recall that (A/R)[[Γ ▷ λx: σ. M: σ → τ]][η] = f, where, by definition, f satisfies the equation

    (App/R) f [a] = (A/R)[[Γ, x: σ ▷ M: τ]][η][x ↦ [a]]

for all [a] ∈ A^σ/R^σ. We show that [A[[Γ ▷ λx: σ. M: σ → τ]]η] satisfies the equation for f, and hence the meaning of Γ ▷ λx: σ. M: σ → τ exists in A/R. Since A/R is extensional by Lemma 8.2.24, it follows that A/R is a Henkin model. For any [a] ∈ A^σ/R^σ, we have

    (App/R) [A[[Γ ▷ λx: σ. M: σ → τ]]η] [a] = [App (A[[Γ ▷ λx: σ. M: σ → τ]]η) a]

by definition of application in A/R. Using the definition of acceptable meaning function, it is easy to see that

    A[[Γ ▷ λx: σ. M: σ → τ]]η = A[[Γ, x: σ ▷ λx: σ. M: σ → τ]]η[x ↦ a]

since x is not free in λx: σ. M. Since a = A[[Γ, x: σ ▷ x: σ]]η[x ↦ a], it follows from the definition of acceptable that

    App (A[[Γ ▷ λx: σ. M: σ → τ]]η) a = App (A[[Γ, x: σ ▷ λx: σ. M: σ → τ]]η[x ↦ a]) (A[[Γ, x: σ ▷ x: σ]]η[x ↦ a]).

Therefore, using the assumption that R satisfies (β), we may conclude that

    [App (A[[Γ ▷ λx: σ. M: σ → τ]]η) a] = [A[[Γ, x: σ ▷ M: τ]]η[x ↦ a]].

By the induction hypothesis, it follows that [A[[Γ ▷ λx: σ. M: σ → τ]]η] satisfies the equation for f. This proves the lemma. ∎

A straightforward corollary, with application to per models, is that the quotient of any combinatory algebra by a logical partial equivalence relation is a Henkin model.

Corollary 8.2.27  Let A be a combinatory algebra and R ⊆ A × A a logical partial equivalence relation. Then A/R is a Henkin model, with the meaning of term Γ ▷ M: σ the equivalence class

    (A/R)[[Γ ▷ M: σ]][η] = [A[[Γ ▷ cL(M): σ]]η]

of the interpretation of the combinatory term cL(M) in A.

The proof is simply to apply Lemma 8.2.26 to combinatory algebras, with the acceptable meaning function described in Example 8.2.8, noting that by Exercise 8.2.16 and the equational properties of combinators, every relation over A satisfies (β). Corollary 8.2.27 implies that any hierarchy of partial equivalence relations over a partial combinatory algebra (see Sections 5.6.1 and 5.6.2) forms a Henkin model. More specifically, for any partial combinatory algebra D, any hierarchy of pers over D forms a logical partial equivalence relation over a typed (total) combinatory algebra, as described in Exercise 8.2.23. Therefore, we may show that the pers over D form a Henkin model by showing that the quotient of a combinatory algebra by a logical partial equivalence relation R ⊆ A × A is a Henkin model. This is immediate from Corollary 8.2.27. In addition, Corollary 8.2.27 shows that the interpretation of a lambda term is the equivalence class of its translation into combinators. Since the combinators of A are exactly the untyped combinators of D, according to the proof of Lemma 5.6.8, it follows that the meaning of a typed term is the equivalence class of the meaning of its translation into untyped combinators in D.
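The translation cL into combinators can be sketched by the classical bracket-abstraction algorithm. The Haskell below is our own simplified, untyped rendering: it adds an explicit combinator I for readability (a translation using only K and S would write SKK instead), and a small reducer lets us check that translated terms compute as expected.

```haskell
-- Lambda terms and combinatory terms.
data L = LV String | LA L L | LL String L
data C = S | K | I | CV String | CA C C
  deriving (Eq, Show)

occurs :: String -> C -> Bool
occurs x (CV y)   = x == y
occurs x (CA m n) = occurs x m || occurs x n
occurs _ _        = False

-- Bracket abstraction: bracket x t behaves like \x. t.
bracket :: String -> C -> C
bracket x t | not (occurs x t) = CA K t
bracket _ (CV _)               = I   -- the variable must be x here
bracket x (CA m n)             = CA (CA S (bracket x m)) (bracket x n)
bracket _ t                    = CA K t  -- unreachable: atoms never contain x

-- cL: compile a lambda term to a combinator term, eliminating abstraction.
cL :: L -> C
cL (LV x)   = CV x
cL (LA m n) = CA (cL m) (cL n)
cL (LL x m) = bracket x (cL m)

-- One leftmost reduction step for S, K, I, and normalization.
step :: C -> Maybe C
step (CA I a)               = Just a
step (CA (CA K a) _)        = Just a
step (CA (CA (CA S f) g) a) = Just (CA (CA f a) (CA g a))
step (CA m n) = case step m of
                  Just m' -> Just (CA m' n)
                  Nothing -> fmap (CA m) (step n)
step _ = Nothing

norm :: C -> C
norm t = maybe t norm (step t)
```

For example, cL translates the K-behaved term λx. λy. x to S (K K) I, which applied to two arguments reduces back to the first one.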

Exercise 8.2.28  Show that if R satisfies (β) with respect to A[[·]], then, for every pair of terms Γ, x: σ ▷ M: τ and Γ ▷ N: σ, and environment η ⊨ Γ with R^Γ(η, η), we have

    R^τ(A[[Γ ▷ (λx: σ. M)N: τ]]η, A[[Γ ▷ [N/x]M: τ]]η).

Exercise 8.2.29  Show that if R satisfies (β) with respect to A[[·]], then R is admissible.

Exercise 8.2.30  Let A be an applicative structure and let {x_1: σ_1, …, x_k: σ_k} ▷ M: τ be a term without lambda abstraction. An element a ∈ A^τ represents M if A ⊨ {x_1: σ_1, …, x_k: σ_k} ▷ a = M: τ, using a as a constant denoting a. Note that since M contains only applications, this equation makes sense in any applicative structure. Show that if R ⊆ A × A is a logical partial equivalence relation and a represents Γ ▷ M: τ in A, then [a] represents Γ ▷ M: τ in A/R.

8.3 Proof-Theoretic Results

8.3.1 Completeness for Henkin Models

In universal algebra, we may prove equational completeness (assuming no sort is empty) by showing that any theory E is a congruence relation on the term algebra T, and noting that T/E satisfies an equation M = N iff M = N ∈ E. We may give similar completeness proofs for typed lambda calculus, using logical equivalence relations in place of congruence relations. We will illustrate the main ideas by proving completeness for models without empty types. A similar proof is given for Henkin models with possibly empty types in Exercise 8.3.3. It is also possible to present a completeness theorem for Kripke models (Section 7.3) in this form, using a Kripke version of logical relations. Throughout this section, we let T be the applicative structure of terms typed using infinite type assignment H, with meaning function T[[·]] defined by substitution. (These are defined in Example 8.2.7.) We assume H provides infinitely many variables of each type, so each T^σ is infinite. If E is any set of equations, we define the relation R_E over T × T by taking


    R^σ(M, N) iff E ⊢ Γ ▷ M = N: σ for some Γ ⊆ H.

Note that since E ⊢ {x: σ} ▷ x = x: σ for each x: σ ∈ H, no R^σ is empty.

Lemma 8.3.1  For any theory E, the relation R_E ⊆ T × T is a logical equivalence relation.

Proof  It is easy to see that R_E is an equivalence relation. To show that R_E is logical, suppose that R^{σ→τ}(M, M') and R^σ(N, N'). Then E ⊢ Γ_1 ▷ M = M': σ → τ and E ⊢ Γ_2 ▷ N = N': σ, and therefore E ⊢ Γ_1 ∪ Γ_2 ▷ MN = M'N': τ. Therefore R^τ(MN, M'N'). It remains to show that if E ⊢ Γ ▷ MN = M'N': τ for every N, N' with E ⊢ Γ ▷ N = N': σ, then E ⊢ Γ ▷ M = M': σ → τ. This is where we use the assumption that H provides infinitely many variables of each type. Let x be a variable that is not free in M or M' and with x: σ ∈ H. Since E ⊢ Γ, x: σ ▷ x = x: σ, we have E ⊢ Γ, x: σ ▷ Mx = M'x: τ with Γ, x: σ ⊆ H. By (ξ), we have

    E ⊢ Γ ▷ λx: σ. Mx = λx: σ. M'x: σ → τ.

We now apply (η) to both sides of this equation to derive E ⊢ Γ ▷ M = M': σ → τ. By reflexivity, R_E respects constants. This proves the lemma. ∎

From Lemma 8.2.26, and the straightforward observation that any R_E satisfies (β), we can form a quotient model from any theory. This allows us to prove "least model" completeness for Henkin models without empty types.

Theorem 8.3.2 (Henkin completeness without empty types)  Let E be any theory closed under rule (nonempty), and let A = T/R_E. Then A is a Henkin model without empty types satisfying precisely the equations belonging to E.

Proof  By Lemma 8.3.1, the theory E determines a logical equivalence relation R_E over T. Since R_E clearly satisfies (β), the relation is admissible by Exercise 8.2.29. Let us write R for R_E and A for T/R. By Lemma 8.2.26, A is a model with A[[Γ ▷ M: σ]][η] = [ηM] for every T-environment (substitution) η satisfying Γ. It remains to show that A satisfies precisely the equations belonging to E. Without loss of generality, we may restrict our attention to closed terms. The justification for this is that it is easy to prove the closed equation

    ∅ ▷ λx_1: σ_1 … λx_k: σ_k. M = λx_1: σ_1 … λx_k: σ_k. N: σ_1 → ⋯ → σ_k → τ

from the equation

    {x_1: σ_1, …, x_k: σ_k} ▷ M = N: τ


between open terms, and vice versa. Let η_0 be the identity substitution. Then for any closed equation ∅ ▷ M = N: τ ∈ E, since R^∅(η_0, η_0), we have

    A[[∅ ▷ M: τ]] = [M] = [N] = A[[∅ ▷ N: τ]].

This shows that A satisfies every equation in E. Conversely, if A satisfies ∅ ▷ M = N: τ, then A certainly must do so at environment [η_0]. This implies [M] = [N], which means E ⊢ Γ ▷ M = N: τ for some Γ ⊆ H. Since M and N are closed, we have ∅ ▷ M = N: τ ∈ E by rule (nonempty). This proves the theorem. ∎

Exercise 8.3.3  A quotient construction, like the one in this section for models without empty types, may also be used to prove deductive completeness of the extended proof system with proof rules (empty I) and (empty E) for models that may have empty types. (This system is described in Sections 4.5.5 and 4.5.6.) This exercise begins the model construction and asks you to complete the proof. To prove deductive completeness, we assume that Γ_0 ▷ M_0 = N_0: σ_0 is not provable from some set E of typed equations and construct a model A = T/R_E with A ⊨ E, but A ⊭ Γ_0 ▷ M_0 = N_0: σ_0. The main new idea in this proof is to choose an infinite set of typed variables carefully so that some types may be empty. We decide which types to make empty by constructing an infinite set H of typing hypotheses of the form x: σ and emptiness assertions of the form empty(σ) such that

(i) Γ_0 ⊆ H,

(ii) for every type σ, we either have empty(σ) ∈ H or we have x: σ ∈ H for infinitely many variables x (but not both),

(iii) for every finite Γ ⊆ H containing all the free variables of M_0 and N_0, the equation Γ ▷ M_0 = N_0: σ_0 is not provable from E.

The set H is constructed in infinitely many stages, starting from H_0 = Γ_0. If t_0, t_1, t_2, … is an enumeration of all type expressions, then H_{i+1} is determined from H_i by looking at equations of the form Γ, x: t_i ▷ M_0 = N_0: σ_0, where Γ ⊆ H_i and x is not in H_i and not free in M_0 or N_0. If

    E ⊢ Γ, x: t_i ▷ M_0 = N_0: σ_0

for any finite Γ ⊆ H_i with the above equation well-formed, then we let

    H_{i+1} = H_i ∪ {empty(t_i)}.

Otherwise, we take

    H_{i+1} = H_i ∪ {x: t_i | infinitely many fresh x}.


Let H be the union of all the H_i. Given H, the model A = T/≡_E is defined as in this section. As described in Example 4.5.1, T is an applicative structure for any choice of variable set H. However, while Lemma 8.3.1 does not require (nonempty), the proof does assume that H provides infinitely many variables of each type.

(a) Show that properties (a)–(c) above hold for H.

(b) Show that ≡_E is a logical equivalence relation, re-proving only the parts of Lemma 8.3.1 that assume H provides infinitely many variables of each type. (Hint: You will need to use proof rule (empty I).)

(c) Show that A does not satisfy Γ0 ▷ M0 = N0: σ0. (This is where you need to use rule (empty E). The proof that A satisfies E proceeds as before, so there is no need for you to check this.)

8.3.2 Normalization

Logical relations may also be used to prove various properties of reduction on typed terms. In this section, we show that every λ→ term is strongly normalizing. More precisely, we show that for every typed term Γ ▷ M: σ, we have SN(M), where SN is defined by

SN(M) iff there is no infinite sequence of β, η-reductions from M.

While the proof works for terms with term constants, we state the results without term constants to avoid confusion with more general results in later sections. We generalize the argument to other reduction properties and product types in the next section, and consider reductions associated with constants such as + and fix in Section 8.3.4. The normalization proof for λ→ has three main parts:

1. Define a logical predicate P on the applicative structure of terms.

2. Show that P^σ(M) implies SN(M).

3. Show that P is admissible, so that we may use the Basic Lemma to conclude that P^σ(M) holds for every well-typed term Γ ▷ M: σ.

Since the Basic Lemma is proved by induction on terms, the logical relation P is a kind of "induction hypothesis" for proving that all terms are strongly normalizing. To make the outline of the proof as clear as possible, we will give the main definition and results first, postponing the proofs to the end of the section. Let T be an applicative structure of terms, as in Examples 4.5.1 and 8.2.7, where H provides infinitely many variables of each type. We define the logical predicate (unary logical relation) P ⊆ T by taking P^b to be the set of all strongly normalizing terms, for each type


constant b, and extending to function types by

P^{σ→τ}(M)  iff  ∀N ∈ T^σ. P^σ(N) implies P^τ(M N).

It is easy to see that this predicate is logical. (If we have a signature with term constants, then it follows from Lemma 8.3.4 below that P is logical, as long as only β, η-reduction is used in the definition of SN.) The next step is to show that P contains only strongly normalizing terms. This is a relatively easy induction on types, as long as we take the correct induction hypothesis. The hypothesis we use is the conjunction of (i) and (ii) of the following lemma.

Lemma 8.3.4 For each type σ, the predicate P^σ satisfies the following two conditions.

(i) If x M1 ... Mk ∈ T^σ and SN(M1), ..., SN(Mk), then P^σ(x M1 ... Mk).

(ii) If P^σ(M) then SN(M).

The main reason condition (i) is included in the lemma is to guarantee that each P^σ is non-empty, since this is used in the proof of condition (ii). Using Lemma 8.2.11, we may show that the logical predicate P is admissible.
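Since SN(M) just says that the tree of reduction sequences from M is finite, it can be checked mechanically on small terms by exhaustive search. The sketch below is a hypothetical illustration, not code from the text: terms are tagged tuples, substitution assumes all bound variable names are distinct, and a fuel bound stands in for genuine divergence detection.

```python
# Hypothetical sketch: beta-reduction on tuple-encoded lambda terms and a
# brute-force strong-normalization check.  Representation (our own):
#   ('var', name) | ('lam', name, body) | ('app', fun, arg)

def subst(term, name, value):
    """Substitute `value` for `name`; capture-avoidance is elided by
    assuming distinct bound-variable names (a simplifying assumption)."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        return ('lam', term[1], subst(term[2], name, value))
    return ('app', subst(term[1], name, value), subst(term[2], name, value))

def one_step(term):
    """Yield every term reachable by contracting a single beta-redex."""
    kind = term[0]
    if kind == 'app':
        f, a = term[1], term[2]
        if f[0] == 'lam':                       # (lam x. body) a -> [a/x]body
            yield subst(f[2], f[1], a)
        for f2 in one_step(f):
            yield ('app', f2, a)
        for a2 in one_step(a):
            yield ('app', f, a2)
    elif kind == 'lam':
        for b2 in one_step(term[2]):
            yield ('lam', term[1], b2)

def strongly_normalizing(term, fuel=10_000):
    """Exhaustively explore reduction sequences; True if all terminate
    within `fuel` steps (Python cannot detect genuine divergence)."""
    stack = [term]
    while stack:
        if fuel <= 0:
            return False
        fuel -= 1
        stack.extend(one_step(stack.pop()))
    return True

identity = ('lam', 'y', ('var', 'y'))
assert strongly_normalizing(('app', ('lam', 'x', ('var', 'x')), identity))
omega_half = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
assert not strongly_normalizing(('app', omega_half, omega_half))
```

The simply typable term (λx.x)(λy.y) passes the search, while the untypable Ω = (λx.x x)(λx.x x) exhausts its fuel, reflecting the fact that strong normalization fails outside the typed setting.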

Lemma 8.3.5 The logical predicate P ⊆ T is admissible.

From the two lemmas above, and the Basic Lemma for logical relations, we have the following theorem.

Theorem 8.3.6 (Strong Normalization) For every well-typed term Γ ▷ M: σ, we have SN(M).

Proof of Theorem 8.3.6 Let Γ ▷ M: σ be a typed term and consider the applicative structure T defined using variables from H, with Γ ⊆ H. Let η0 be the environment for T with η0(x) = x for all x. Then since η0 is acceptable and the relation P is admissible, by Lemma 8.3.5 we have P^σ(T⟦Γ ▷ M: σ⟧η0), or, equivalently, P^σ(M). Therefore, by Lemma 8.3.4, we conclude SN(M).

Proof of Lemma 8.3.4 We prove both conditions simultaneously using induction on types. We will only consider variables in the proof of (i), since the constant and variable cases are identical. For a type constant b, condition (i) is straightforward, since SN(x M1 ... Mk) whenever SN(Mi) for each i. Condition (ii) follows trivially from the definition of P. So, we move on to the induction step. For condition (i), suppose x M1 ... Mk ∈ T^{σ→τ} and for each i we have SN(Mi). Let N ∈ T^σ with P^σ(N). By the induction hypothesis (ii) for type σ, we have SN(N).


Therefore, by the induction hypothesis (i) for type τ, we have P^τ(x M1 ... Mk N). Since this reasoning applies to every N ∈ P^σ, we conclude P^{σ→τ}(x M1 ... Mk). We now consider condition (ii) for type σ → τ. Suppose M ∈ T^{σ→τ} with P^{σ→τ}(M) and x ∈ T^σ. By the induction hypothesis (i) we have P^σ(x) and so P^τ(M x). Applying the induction hypothesis (ii) we have SN(M x). However, this implies SN(M), since any infinite sequence of reductions from M would give us an infinite sequence of reductions from M x.

Proof of Lemma 8.3.5 This lemma is proved using Lemma 8.2.11, which gives us a straightforward test for each type constant. To show that P is admissible, it suffices to show that for each type constant b, term N and term ([N/x]M)N1 ... Nk of type b,

SN(N) and SN(([N/x]M)N1 ... Nk) imply SN((λx:σ.M) N N1 ... Nk).

With this in mind, we assume SN(N) and SN(([N/x]M)N1 ... Nk) and consider the possible reductions from (λx:σ.M) N N1 ... Nk. It is clear that there can be no infinite sequences of reductions solely within N, because SN(N). Since ([N/x]M)N1 ... Nk is strongly normalizing, we may conclude SN(M) and SN(N1), ..., SN(Nk), since otherwise we could produce an infinite reduction sequence from ([N/x]M)N1 ... Nk by reducing one of its subterms. Therefore, if there is any infinite reduction sequence with β-reduction occurring at the left, the reduction sequence must have an initial segment of the form

(λx:σ.M) N N1 ... Nk  →→  (λx:σ.M') N' N1' ... Nk'  →  ([N'/x]M') N1' ... Nk'

for M →→ M', N →→ N' and Ni →→ Ni', 1 ≤ i ≤ k. But then clearly we may reduce ([N/x]M)N1 ... Nk →→ ([N'/x]M') N1' ... Nk' as well, giving us an infinite sequence of reductions from ([N/x]M)N1 ... Nk. The other possibility is an infinite reduction sequence with η-reduction occurring at the left, which would have the form

(λx:σ.M) N N1 ... Nk  →→  (λx:σ.M' x) N' N1' ... Nk'  →  M' N' N1' ... Nk'

with x not free in M'. But since the last reduction shown could also have been carried out using β-reduction, we have already covered this possibility above. Since no infinite reduction sequence is possible, the lemma is proved.

Exercise 8.3.7 We can show weak normalization of β-reduction directly, using an argument devised by A. M. Turing sometime between 1936 and 1942 [Gan80]. For this exercise, the degree of a β-redex (λx:σ.M)N is the length of the type σ → τ of λx:σ.M when written out as a string of symbols. The measure of a term M is the pair of natural numbers (i, j) such that the redex of highest degree in M has degree i, and there are exactly j redexes of degree i in M. These pairs may be ordered lexicographically, with (i, j) < (i', j') iff i < i', or i = i' and j < j'. Show that, as long as we always contract an innermost redex of maximal degree, each reduction step strictly decreases the measure of the term, and conclude that every typed term may be reduced to β-normal form.
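Turing's measure can be illustrated concretely. The helper below is a hypothetical sketch (the exercise works with actual typed terms; here a term is abstracted to the multiset of its redex degrees) showing how Python's tuple comparison implements exactly the lexicographic order in which the measure decreases.

```python
# Hypothetical illustration of Turing's measure, not code from the book.
def measure(redex_degrees):
    """Measure (i, j) of a term, given the multiset of its redex degrees:
    i is the highest degree present, j the number of redexes of degree i."""
    if not redex_degrees:
        return (0, 0)          # a normal form has the least measure
    i = max(redex_degrees)
    return (i, redex_degrees.count(i))

# Contracting an innermost redex of highest degree removes one degree-5
# redex and can only create redexes of strictly smaller degree, so the
# measure drops lexicographically -- Python compares tuples that way.
before = measure([5, 5, 3])      # (5, 2)
after = measure([5, 3, 3, 3])    # (5, 1): one degree-5 redex gone
assert after < before
```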

Lemma 8.3.21 For the logical predicate P defined from strong normalization of pcf+-reduction, we have P(+), P(Eq?), P(if · then · else ·), and for each type σ and natural number n, P(fix_σ^n).

Proof For addition, we assume P^nat(M) and P^nat(N) and show P^nat(M + N). It suffices to show that M + N is strongly normalizing. However, this follows immediately, since M and N are both strongly normalizing and the reduction axioms for + only produce a numeral from two numerals. The proof for the equality test, Eq?, is essentially the same. For conditional, we assume P^bool(B), P^σ(M), P^σ(N) and demonstrate P^σ(if B then M else N). By Lemma 8.3.20, it suffices to show that E[if B then M else N] is strongly normalizing. However, this is an easy case analysis, according to whether B →→ true, B →→ false, or neither, using (ii) of Lemma 8.3.20. The remaining case is a labeled fixed-point constant. For this, we use induction on the label. The base case, fix^0, is easy since there is no associated reduction. For fix^{n+1}, we assume P(fix^n) and let E[ ] be any elimination context with P(E[ ]). We prove the lemma by showing that E[fix^{n+1}] is strongly normalizing. If there is an infinite reduction sequence from this term, it must begin with steps of the form


E[fix_σ^{n+1}]  →→  E'[fix_σ^{n+1}]  →  E'[λf: σ→σ. f (fix_σ^n f)]  →→  ...

However, by the induction hypothesis and (ii) of Lemma 8.3.20, we have P(fix_σ^n) and P(E'[ ]), which implies that an infinite reduction sequence of this form is impossible. This completes the induction step and proves the lemma.

Since Lemma 8.3.21 is easily extended to pcf+,η-reduction, we have the following corollary of Theorem 8.3.19.

Theorem 8.3.22 Both pcf+ and pcf+,η-reduction are strongly normalizing on labeled PCF terms.

We also wish to show that both forms of labeled PCF reduction are confluent. A general theorem that could be applied, if we did not have conditional at all types, is given in [BT88, BTG89] (see Exercise 8.3.29). The general theorem is that if typed lambda calculus is extended with a confluent, algebraic rewrite system R, the result is confluent. The proof of this is nontrivial, but is substantially simplified if we assume that R is left-linear. That is in fact the case we are interested in, since left-linearity is essential to Proposition 8.3.18 anyway. An adaptation of this proof to typed lambda calculus with labeled fixed-point operators has been carried out in [HM90, How92]. For the purposes of proving our desired results as simply as possible, however, it suffices to prove weak confluence, since confluence then follows by Proposition 3.7.32. To emphasize the fact that the argument does not depend heavily on the exact natural number and boolean operations of PCF, we show that adding β, proj, lab+ reduction to any weakly confluent reduction system R preserves weak confluence, as long as there are no symbols in common between R and the reduction axioms of β, proj, lab+. This is essentially an instance of orthogonal rewrite systems, as they are called in the literature on algebraic rewrite systems [Toy87], except that application is shared. A related confluence property of untyped lambda calculus is Mitschke's δ-reduction theorem [Bar84], which shows that under certain syntactic conditions on an untyped R, the combination of R with β-reduction is confluent.

Lemma 8.3.23 Let R be a set of reduction axioms of the form L → R, where L and R are lambda terms of the same type and L does not contain an application x M of a variable to an argument, or any of the symbols λ, fix or any fix^n. If R is weakly confluent, then R, β, proj, lab+ reduction is weakly confluent.

Proof The proof is a case analysis on pairs of redexes. We show two cases involving β-reduction and reduction from R. The significance of the assumptions on the form of L is


that these guarantee that if a subterm of a substitution instance [M1, ..., Mk/x1, ..., xk]L of some left-hand side contains a β, proj, lab+ redex, then this redex must be entirely within one of M1, ..., Mk. The lemma would also hold with any other hypotheses guaranteeing this property. The first case is an R-redex inside a β-redex. This gives a term of the form

(λy:σ.(...[M1, ..., Mk/x1, ..., xk]L...))N

reducing to either of the terms

(...[N/y][M1, ..., Mk/x1, ..., xk]L...)

(λy:σ.(...[M1, ..., Mk/x1, ..., xk]R...))N

the first by β-reduction and the second by an axiom L → R. It is easy to see that both reduce in one step to

(...[N/y][M1, ..., Mk/x1, ..., xk]R...)

The other possible interaction between β-reduction and R begins with a term of the form

(...[M1, ..., Mk/x1, ..., xk]L...)

where one of M1, ..., Mk contains a redex. If Mi → Mi', then this term reduces to either of the two terms

(...[M1, ..., Mi', ..., Mk/x1, ..., xk]L...)

(...[M1, ..., Mk/x1, ..., xk]R...)

in one step. We can easily reduce the first to

(...[M1, ..., Mi', ..., Mk/x1, ..., xk]R...)

by one step. Since we can also reach this term from the second by some number of Mi → Mi' reductions, one for each occurrence of xi in R, local confluence holds. The cases for other reductions in a substitution instance of the left-hand side of an axiom are very similar.

By Proposition 3.7.32, it follows that pcf+ and pcf+,η-reduction are confluent. We may use Proposition 8.3.18 to derive confluence of pcf-reduction from confluence of pcf+. This gives us the following theorem.

Theorem 8.3.24 The reductions pcf+ and pcf+,η are confluent on labeled PCF terms, and pcf-reduction is confluent on ordinary (unlabeled) PCF terms.


Completeness of Leftmost Reduction

The final result in this section is that if R is a set of "left-normal" rules (defined below), and R, β, proj, lab+ reduction is confluent and terminating, then the leftmost reduction strategy defined in Section 2.4.3 is complete for finding R, β, proj, fix-normal forms. Since the PCF rules are left-normal, it follows from Proposition 2.4.15 that lazy reduction is complete for finding normal forms of PCF programs. A reduction axiom is left-normal if all the variables in the left-hand side of the rule appear to the right of all the term constants. For example, all of the PCF rules for natural numbers and booleans are left-normal. If we permute the arguments of conditional, however, putting the boolean argument at the end, then we would have reduction axioms

cond x y true  →  x,
cond x y false  →  y.

These are not left-normal, since x and y appear to the left of the constants true and false. Intuitively, if we have left-normal rules, then we may safely evaluate the arguments of a function from left to right. In many cases, it is possible to replace a set of rules with essentially equivalent left-normal ones, by either permuting arguments or introducing auxiliary function symbols (see Exercise 8.3.28). However, this is not possible for inherently "nonsequential" functions such as parallel-or. We will prove the completeness of leftmost reduction using strong normalization and confluence of positive labeled reduction, and the correspondence between labeled and unlabeled reduction. Since labeled fix reduction may terminate with fix^0 where unlabeled fix reduction could continue, our main lemma is that any fix^0 occurring to the left of the leftmost redex will remain after reducing the leftmost redex.
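The role of argument order can be seen in any eager language. In the hypothetical sketch below, thunks stand in for unevaluated arguments: with the boolean test first, left-to-right evaluation never enters the untaken branch, which is exactly what left-normality guarantees for leftmost reduction.

```python
# Hypothetical sketch (not from the book): conditional with the
# constant-discriminating argument first, so branches stay as thunks.
def cond(test, then_thunk, else_thunk):
    """Left-normal style: examine the boolean first, then force exactly
    one branch; the other branch is never evaluated."""
    return then_thunk() if test else else_thunk()

def diverge():
    """Stands in for a nonterminating computation."""
    raise RuntimeError("simulated nontermination")

# Left-to-right evaluation never touches the diverging branch:
assert cond(True, lambda: 1, diverge) == 1
assert cond(False, diverge, lambda: 2) == 2
```

Had the test come last (as in the non-left-normal `cond x y true` rules above), a left-to-right strategy would be forced into a possibly diverging argument before seeing the boolean.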

Lemma 8.3.25 Let R be a set of left-normal reduction axioms with no left-hand side containing fix^0. If M → N by the leftmost reduction step, and fix^0 occurs to the left of the leftmost redex of M, then fix^0 must also occur in N, to the left of the leftmost redex if N is not in normal form.

Proof Suppose M ≡ C[L] → C[R] ≡ N by contracting the leftmost R, β, proj, fix redex, and assume fix^0 occurs in C[ ] to the left of [ ]. If N ≡ C[R] is in normal form, then fix^0 occurs in N since fix^0 occurs in C[ ]. We therefore assume N is not in normal form and show that fix^0 occurs to the left of the leftmost redex in N. Suppose, for the sake of deriving a contradiction, that fix^0 occurs within, or to the right of, the leftmost redex of N. Since a term beginning with fix^0 is not a redex, the leftmost redex of N must begin with some symbol to the left of fix^0, which is to the left of R when we write N as C[R]. The proof proceeds by considering each possible form of redex. Suppose the leftmost redex in N is Proj_i⟨N1, N2⟩, with fix^0, and therefore R, occurring


to the right of Proj_i. Since fix^0 must occur between Proj_i and R, R cannot be ⟨N1, N2⟩, and so a redex of the form Proj_i⟨_, _⟩ must occur in M to the left of L. This contradicts the assumption that L is the leftmost redex in M. Similar reasoning applies to a β-redex; the fix case is trivial. It remains to consider a redex SL', with S some substitution and L' → R' in R. We assume fix^0, and therefore R, occur to the right of the first symbol of SL'. Since fix^0 is not in L', by hypothesis, and the rule is left-normal, all symbols to the right of fix^0, including R if it occurs within SL', must be the result of substituting terms for variables in L'. It follows that we have a redex in M to the left of L, again a contradiction. This proves the lemma.

Theorem 8.3.26 Suppose R, β, proj, lab+ reduction is confluent and terminating, with R left-linear and left-normal. If M →→ N and N is a R, β, proj, fix-normal form, then there is a reduction from M to N that contracts the leftmost redex at each step.

Proof If M →→ N then, by Lemma 8.3.16, there exist labelings M* and N* of these terms such that M* →→ N*. Since R, β, proj, lab+-reduction is confluent and terminating, we may reduce M* to N* by reducing the leftmost redex at each step. We show that the leftmost R, β, proj, lab+-reduction of M* to N* is also the leftmost R, β, proj, fix-reduction of M to N (when labels are removed). It is easy to see that this is the case if no term in the reduction sequence has fix^0 to the left of the leftmost redex, since this is the only term that would be an unlabeled redex without being a labeled one. Therefore, we assume that the final k steps of the reduction have the form

M_k → M_{k-1} → ... → M_0 ≡ N*

where fix^0 occurs to the left of the leftmost redex in M_k. But by Lemma 8.3.25 and induction on k, fix^0 must also be present in N*, which contradicts the fact that N is a R, β, proj, fix-normal form. It follows from Lemma 8.3.16 that by erasing labels we obtain a leftmost reduction from M to N.

A related theorem shows that leftmost reduction is normalizing for the untyped lambda calculus extended with any left-normal, linear and non-overlapping term rewriting system. Our proof is simpler since we assume termination, and since type restrictions make fix the only source of potential nontermination. Since the rules of PCF satisfy all the conditions above, Theorem 8.3.26 implies the completeness of leftmost reduction for finding PCF normal forms. This was stated as Proposition 2.4.12 in Section 2.4.3.

Exercise 8.3.27 Gödel's system T of primitive recursive functionals is λ→ over type constant nat and term constants


0: nat
succ: nat → nat
R_σ  for each type σ,

with the following reduction axioms

R_σ M N 0  →  M
R_σ M N (succ x)  →  N x (R_σ M N x)

The primitive recursion operator R_σ of type σ is essentially the same as prim of Exercise 2.5.7, with cartesian products eliminated by currying. Extend the proof in this section to show that Gödel's T is strongly normalizing. You need not repeat proof steps that are given in the text. Just point out where additional arguments are needed and carry them out. Historically, Gödel's T was used to prove the consistency of first-order Peano arithmetic [Göd58], by a reduction also described in [HS86, Tro73] and mentioned briefly in [Bar84, Appendix A.2]. It follows from this reduction that the proof of strong normalization of T cannot be formalized in Peano arithmetic. Given this, it is particularly surprising that the proof for T involves only the same elementary reasoning used for λ→.
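The two reduction axioms for R_σ are an executable form of primitive recursion. The following sketch is hypothetical (it models nat by Python integers rather than numerals built from 0 and succ) and simply unfolds the axioms iteratively:

```python
# Hypothetical executable reading of the recursor, not code from the book:
#   R M N 0        -> M
#   R M N (succ x) -> N x (R M N x)
def R(m, n, k):
    """Primitive recursion: m is the base case, n the step function
    receiving the predecessor x and the recursive result."""
    acc = m
    for x in range(k):          # unfold R M N (succ x) -> N x (R M N x)
        acc = n(x, acc)
    return acc

# Addition and factorial as instances of primitive recursion:
add = lambda a, b: R(a, lambda x, r: r + 1, b)
fact = lambda k: R(1, lambda x, r: (x + 1) * r, k)
assert add(3, 4) == 7
assert fact(5) == 120
```

Currying `R` to the type σ → (nat → σ → σ) → nat → σ recovers the usual typed constant.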

Exercise 8.3.28 Consider the following set of algebraic reduction axioms.

f true true y  →  true
f true false y  →  false
f false x true  →  true
f false x false  →  false

This system is not left-normal. Show that by adding a function symbol not with axioms

not true  →  false
not false  →  true

we may replace the four reduction axioms by two left-normal ones in a way that gives exactly the same truth table for f.
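One way to approach the exercise is to compare truth tables mechanically. In the sketch below (hypothetical; the concrete rules and the not-based candidate replacement are our reconstruction, since the scanned axioms are garbled), the double negation forces the selected argument down to a boolean constant, so the candidate left-normal rules agree with the original four on all boolean inputs:

```python
# Hypothetical truth-table checker for Exercise 8.3.28 (rules reconstructed).
from itertools import product

def f_original(b1, b2, b3):
    """The four non-left-normal rules, read off the (reconstructed) axioms."""
    if b1 is True and b2 is True:   return True
    if b1 is True and b2 is False:  return False
    if b1 is False and b3 is True:  return True
    if b1 is False and b3 is False: return False

def not_(b):
    return False if b is True else True

def f_replacement(b1, b2, b3):
    """Two left-normal candidate rules:
         f true x y  -> not (not x)
         f false x y -> not (not y)
       The double negation forces the chosen argument to a constant."""
    return not_(not_(b2)) if b1 else not_(not_(b3))

for b1, b2, b3 in product([True, False], repeat=3):
    assert f_original(b1, b2, b3) == f_replacement(b1, b2, b3)
```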

Exercise 8.3.29 This exercise is concerned with adding algebraic rewrite rules to typed lambda calculus. Recall from Section 4.3.2 that if Σ = (S, F) is an algebraic signature, we may construct a similar λ→ signature Σ' = (B, F') by taking the sorts of Σ as base types, and letting each algebraic function have the corresponding curried type. This allows us to view algebraic terms as a special case of typed lambda terms. Let R be an algebraic rewrite system over Σ. Thinking of R as a set of rewrite rules for λ→ terms over Σ', it makes sense to ask whether the reduction system consisting of R together with the usual reduction rules (β)red and (η)red of typed lambda calculus is confluent. This turns out to be quite tricky in general, but it gets much simpler if we drop (η)red and just consider R plus β-reduction (modulo α-conversion of terms). We call the combination of R and β-reduction R, β-reduction. The problem is to prove that if R is confluent (as an algebraic rewrite system), then R, β-reduction is confluent on typed lambda terms over Σ'. (This also holds with proj-reduction added [HM90, How92].) Some useful notation is to write →_R for reduction using rules from R, →_{R,β} similarly for R, β-reduction, and βnf(M) for the normal form of M under β-reduction. Note that βnf(M) may still be reducible using rules from R. You may assume the following facts, for all Γ ▷ M: σ and Γ ▷ N: σ.

1. If R is confluent on algebraic terms over signature Σ, then →_R is confluent on typed lambda terms over Σ'.

2. If M is a β-normal form and M →_R N, then N is a β-normal form.

3. If M →_{R,β} N, then βnf(M) →→_R βnf(N).

The problem has two parts.

(a) Using 1–3 above, show that if R is confluent (as an algebraic rewrite system), then R, β-reduction is confluent on typed lambda terms over Σ'. (This problem has a 4- or 5-sentence solution, using a picture involving →_R and ordinary β-reduction.)

(b) Show that (a) fails if we add fix-reduction. Specifically, find a confluent algebraic rewrite system R such that R, β, fix-reduction on typed lambda terms over signature Σ' ∪ {fix_σ: (σ → σ) → σ} is not confluent. (Hint: try to find some algebraic equations that become inconsistent when fixed points are added.)
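A classic shape for part (b), along the lines of the hint (this concrete system is our assumption, not the book's example): the rules D x x → 0 and D x (S x) → 1 are confluent as an algebraic system, but the non-left-linear first rule interacts badly with a fixed point F satisfying F → S F, so D F F reaches two distinct normal forms. The sketch simulates the two diverging contractions:

```python
# Hypothetical demonstration that fix breaks confluence of a confluent,
# non-left-linear algebraic system.  Terms are nested tuples; 'F' stands
# for fix(lam z. S z), whose only reduction is the unfolding F -> S F.
F = 'F'

def unfold(t):
    """One fix-unfolding step: F -> S F."""
    return ('S', F) if t == F else t

def try_rules(t):
    """Apply the two algebraic rules at the root, if one matches:
         D x x     -> 0
         D x (S x) -> 1"""
    if isinstance(t, tuple) and t[0] == 'D':
        _, a, b = t
        if a == b:
            return '0'
        if b == ('S', a):
            return '1'
    return None

start = ('D', F, F)
path1 = try_rules(start)                    # contract D F F directly
path2 = try_rules(('D', F, unfold(F)))      # first unfold the second F
assert path1 == '0' and path2 == '1'        # two distinct normal forms
```

Without fix, no finite algebraic term can satisfy both left-hand sides at once, so the algebraic system has no critical pairs and is confluent; the fixed point is what makes the two patterns overlap.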

8.4 Partial Surjections and Specific Models

In Sections 8.4.1–8.4.3, we prove completeness of β, η-conversion for three specific models: the full classical, recursive and continuous hierarchies. These theorems may be contrasted with the completeness theorems of Section 8.3.1, which use term models to show that provability coincides with semantic implication. The completeness theorems here show that the specific theory of β, η-conversion captures semantic equality between pure typed lambda terms in three specific, non-syntactic models. One way of understanding each of these theorems is to consider a specific model as characterizing a specific form of "distinguishability." Suppose we have two terms,


M, N: (b → b) → b, for example. These will be equal in the full classical hierarchy if the functions defined by M and N produce equal values of type b for all possible function arguments. On the other hand, in a recursive model A where A^{b→b} only contains recursive functions, these terms will be equal if the functions they define produce equal values when applied to all recursive functions. Since not all set-theoretic functions are recursive, there may be some equations (at this type) that hold in the recursive model but not in the full set-theoretic hierarchy. Surprisingly, the completeness theorems show that these two forms of semantic equivalence are the same, at least for pure typed lambda terms without constants. The third case, the full continuous hierarchy, is also interesting for its connection with PCF extended with parallel-or. Specifically, we know from Theorem 5.4.14 that two terms of PCF + por are equal in the continuous hierarchy over nat and bool iff they are operationally equivalent. In other words, two lambda terms are equal in the continuous model iff they are indistinguishable by PCF program contexts. Therefore, our completeness theorem for the full continuous hierarchy shows that β, η-conversion captures PCF operational equivalence for pure typed lambda terms without the term constants of PCF. A technical point is that this only holds for terms whose types involve only nat and →; with bool, the theorem does not apply, since we only prove completeness of β, η-conversion for the full continuous hierarchy over infinite CPOs. A more general theorem, which we do not consider, is Statman's so-called 1-section theorem [Rie94, Sta85a, Sta82]. Briefly, Statman's theorem gives a sufficient condition on a Henkin model A implying that two terms have the same meaning in A iff the terms are β, η-convertible. It is called the "1-section theorem" since the condition only involves the first-order functions of A.

Although the 1-section theorem implies all three of the completeness theorems here, its proof is more complex and combinatorial than the logical relations argument used here, and the generalization to other lambda calculi is less clear. We will work with type frames instead of arbitrary applicative structures. Recall that a type frame is an extensional applicative structure A such that the elements of A^{σ→τ} are functions from A^σ to A^τ and App f x = f(x). We therefore write f x in place of App f x.

8.4.1 Partial Surjections and the Full Classical Hierarchy

It is tempting to think of any type frame B as a "subframe" of the full classical hierarchy P with the same interpretation of type constants. This seems quite natural, since at every type σ → τ, the set B^{σ→τ} is a subset of the set of all functions from B^σ to B^τ. However, we do not have B^{σ→τ} ⊆ P^{σ→τ} in general. This is because B^σ may be a proper subset of P^σ if σ is a functional type, and so the domain of a function in B^{σ→τ} is different from the domain of a function in P^{σ→τ}. However, the intuitive idea that P "contains" B is reflected in a


certain kind of logical relation between P and B. Specifically, a logical relation R ⊆ P × B is called a partial surjection if each R^σ is a partial function and surjective, i.e., for each b ∈ B^σ there is some a ∈ P^σ with R^σ(a, b). In this section, we prove that if we begin with partial surjections R^b: P^b → B^b for each type constant b of some frame B, then the induced logical relation from the full classical hierarchy P to B is a partial surjection. By choosing B isomorphic to a βη-term model, we may conclude that any full classical hierarchy with each P^b large enough has the same theory as B. Consequently, an equation Γ ▷ M = N: σ holds in the full classical hierarchy over any infinite types iff M and N are β, η-convertible. This was first proved in [Fri75].

Lemma 8.4.1 (Surjections for Classical Models) Let R ⊆ P × B be a logical relation over a full classical hierarchy P and type frame B. If R^b ⊆ P^b × B^b is a partial surjection for each type constant b, then every R^σ is a partial surjection.

Proof We assume R^σ and R^τ are partial surjections and show that R^{σ→τ} is also a partial surjection. Recall that

R^{σ→τ} = {(a, b) | ∀(a', b') ∈ R^σ. R^τ(a a', b b')}.

We first show that R^{σ→τ} is onto, by letting b ∈ B^{σ→τ} and finding a ∈ P^{σ→τ} related to b. Writing R^σ and R^τ as functions, a can be any function satisfying

a a1 ∈ (R^τ)^{-1}(b(R^σ a1))  for all a1 ∈ Dom(R^σ).

Since P is the full type hierarchy, there is at least one such a ∈ P^{σ→τ}. To see that R^{σ→τ} is a partial function on its domain, we suppose that R^{σ→τ}(a, b1) and R^{σ→τ}(a, b2). Let f: B^σ → P^σ be any "choice function" satisfying

R^σ(f b', b')  for all b' ∈ B^σ,

which is possible since R^σ is onto. Then for every b' ∈ B^σ we have R^τ(a(f b'), b_i b') for i = 1, 2. But since R^τ is a partial function, it follows that b1 b' = b2 b' for every b' ∈ B^σ. Thus, by extensionality of B, we have b1 = b2. This proves the lemma.

It is now an easy matter to prove the completeness of β, η-conversion for the full classical hierarchy over any infinite sets.


Proof Let B be isomorphic to a Henkin model of equivalence classes of terms, as in Theorem 8.3.2, where E is the pure theory of β, η-conversion. Let R ⊆ P × B be a logical relation with each R^b a partial surjection. We know this is possible since each P^b is infinite and each B^b is countably infinite. By Lemma 8.4.1, R is a partial surjection. Consequently, by Lemma 8.2.17, Th(P) ⊆ Th(B). But since every theory contains Th(B), it follows that Th(P) = Th(B).

8.4.2 Full Recursive Hierarchy

In this subsection, we prove a surjection theorem for modest sets. Just as any frame with an arbitrary collection of functions is the partial surjective image of a full classical hierarchy, we can show that any frame containing only recursive functions is the partial surjective image of a full recursive hierarchy. Since we may view the term model as a recursive model, the surjection lemma for recursive models also implies that β, η-conversion is complete for recursive models with infinite interpretations of each type constant. In order to carry out the inductive proof of the surjection lemma, we need to be able to show that for every function in a given recursive frame B, we can find some related recursive function in the full recursive hierarchy A. The way we will do this is to establish that, for every type, we have functions that "recursively demonstrate" that the logical relation R is a partial surjection. Following the terminology of Kleene realizability [Kle71], we say that a recursive function f_σ realizes the fact that R^σ is surjective if it computes, for each b ∈ B^σ, an element f_σ b with R^σ(f_σ b, b). Similarly, a recursive function g_σ realizes the fact that R^σ is a partial function if it computes the unique g_σ a = b with R^σ(a, b), for any a in the domain of R^σ. We adapt the classical proof to modest sets by showing that the inductive hypothesis is realized by recursive functions. The assumption we will need for the interpretation of type constants is a form of "recursive embedding." If (A, e_A) and (B, e_B) are modest sets, then we say (f, g) is a recursive embedding of (B, e_B) into (A, e_A) if f and g are recursive functions

f: (B, e_B) → (A, e_A)   and   g: (A, e_A) → (B, e_B)

whose composition g ∘ f = id_B is the identity function on B. We incorporate such functions into a form of logical relation with the following definition. Let A and B be type frames with each type interpreted as a modest set. We say a family {R^σ, f_σ, g_σ} of relations and functions is a recursive partial surjection from A to B if R ⊆ A × B is a logical partial surjection, and

f_σ: (B^σ, e_B^σ) → (A^σ, e_A^σ),   g_σ: (A^σ, e_A^σ) → (B^σ, e_B^σ)

are recursive functions, with f_σ realizing that R^σ is surjective and g_σ realizing that R^σ is a partial function.

Lemma 8.4.3 (Surjections for Recursive Models)

Let A, B be frames of recursive functions and modest sets, for a signature without term constants, with A a full recursive hierarchy. If B^b is recursively embedded in A^b for each type constant b, then there is a recursive partial surjection {R^σ, f_σ, g_σ} from A to B.

Proof For each type constant b, we define (R^b, f_b, g_b) from the recursive embedding (f, g) of B^b into A^b by taking f_b = f, g_b = g, and R^b(x, y) iff g x = y. It is easy to check that (R^b, f_b, g_b) has the required properties. We define the logical relation R ⊆ A × B by extending to higher types inductively:

R^{σ→τ}(a, b)  iff  ∀(a', b') ∈ R^σ. R^τ(a a', b b').

For the inductive step of the proof, we have three conditions to verify, namely

(i) R^{σ→τ} is a surjective partial function,

(ii) there is a recursive f_{σ→τ}: (B^{σ→τ}, e^{σ→τ}) → (A^{σ→τ}, e^{σ→τ}) with R^{σ→τ}(f_{σ→τ} b, b) for all b ∈ B^{σ→τ},

(iii) there is a recursive g_{σ→τ}: (A^{σ→τ}, e^{σ→τ}) → (B^{σ→τ}, e^{σ→τ}) with R^{σ→τ}(a, g_{σ→τ} a) for all a ∈ Dom(R^{σ→τ}).

Since (i) follows from (ii) and (iii), we check only (ii) and (iii), beginning with (ii). Given b ∈ B^{σ→τ}, we must find some recursive function a ∈ A^{σ→τ} with R^{σ→τ}(a, b). Writing R^σ and R^τ as functions, a must have the property

a a1 ∈ (R^τ)^{-1}(b(R^σ a1))  for all a1 ∈ Dom(R^σ).

It is easy to see that this condition is satisfied by taking a = f_τ ∘ b ∘ g_σ, since g_σ computes the function R^σ and f_τ chooses an element of (R^τ)^{-1}(b(R^σ a1)) whenever there is one. We can also compute a as a function of b by composing with recursive functions. For condition (iii), we assume a ∈ Dom(R^{σ→τ}), so that R^{σ→τ}(a, b) for some b. We need to show how to compute b uniquely from a. It suffices to show how to compute the function value b b1 for every b1 ∈ B^σ, and that the value b b1 is uniquely determined by a and b1. If b1 ∈ B^σ, then R^σ(f_σ b1, b1) and so we have R^τ(a(f_σ b1), b b1). Therefore g_τ(a(f_σ b1)) = b b1, and this equation determines the value b b1 uniquely. Thus we take g_{σ→τ} a = g_τ ∘ a ∘ f_σ, which, as we have just seen, gives the correct result for every a ∈ Dom(R^{σ→τ}).

The completeness theorem is now proved exactly as for the full classical hierarchy, with Lemma 8.4.3 in place of Lemma 8.4.1, and with the term model viewed as a frame of recursive functions and modest sets, as explained in Examples 5.5.3 and 5.5.4 and Exercise 5.5.13, instead of as a frame of classical functions.

Theorem 8.4.4 (Completeness for Recursive Models)  Let A be the full recursive type hierarchy over any infinite modest sets (A^b, e_b), with the natural numbers recursively embedded in each A^b. Then for any pure lambda terms Γ ⊢ M, N: σ without constants, we have A ⊨ Γ ⊢ M = N: σ iff M and N are βη-equivalent.

8.4.3

Full Continuous Hierarchy

We may also prove a completeness theorem for models consisting of all continuous functions on some family of complete partial orders. While similar in outline to the completeness proofs of the last two sections, the technical arguments are more involved. The proof is from [Plo82]. The main lemma in each of the preceding sections is the surjection lemma. In each case, the main step in its proof is, given b ∈ B^{σ→τ}, finding some a ∈ A^{σ→τ} satisfying

a a₁ ∈ (R^τ)^{-1}(b(R^σ a₁))  for all a₁ ∈ Dom(R^σ).

With the full classical hierarchy, we use the fact that all functions belong to A^{σ→τ}; this allows us to choose one a satisfying the conditions. In the case of recursive functions on modest sets, we built up recursive functions between A and B by induction on type, and used these to define a. In this section, we will build up similar continuous functions between A and the CPO obtained by lifting the set of equivalence classes of terms. Since there is no available method for regarding each type of the term structure as a "usefully" ordered CPO, the proof will involve relatively detailed reasoning about terms and free variables. The analogue of "recursive embedding" is "continuous retract."


The reader may easily check that (|unit|, id, unit) is a terminal object in C̃. There is a canonical "forgetful" functor π: C̃ → C which takes (S, f, A) to A and (t, x) to x. Category theorists will recognize the scone as a special case of a comma category, namely C̃ = (id ↓ |·|), where id is the identity functor on Set (see Exercise 7.2.25). It is common to overload notation and use the symbol Set for the identity functor from sets to sets (and similarly for other categories). With this convention, we may write C̃ = (Set ↓ |·|). A remarkable feature of sconing in general is that it preserves almost any additional categorical structure that C might have. Since we are interested in cartesian closed structure, we show that sconing preserves terminal object, products and function objects (exponentials).

Lemma 8.6.14  If C is a cartesian closed category, then C̃ is cartesian closed and the canonical functor π: C̃ → C is a cartesian closed functor.

Proof  Let X = (S, f, A) and Y = (S', f', A') be objects of C̃. Then the object

X × Y = (S × S', f × f', A × A'),  where (f × f')(s, s') = (f(s), f'(s')),

is a product of X and Y. A function object of X and Y is given by

X → Y = (M, h, A → A'),

where M is the set of morphisms X → Y in C̃ and h((t, x)): unit → (A → A') is the morphism in C obtained from x: A → A' by currying, i.e., h((t, x)) = Curry x. We leave it to the reader, in Exercise 8.6.17, to check that π: C̃ → C is a cartesian closed functor. ∎

In order to make a connection with logical predicates or relations, we will be interested in a subcategory C̄ of the scone which we call the subscone of C. The objects of the subscone C̄ are the triples (S, f, A) with f: S ↪ |A| an inclusion of sets. It is possible to let f: S → |A| be any one-to-one map. But since we are really more interested in the map |x|: |A| → |A'| in the diagram above than in t: S → S', for reasons that will become clear, we will assume that if (S, f, A) is an object of C̄, then S ⊆ |A| is an ordinary set-theoretic subset and f: S → |A| is the identity map sending elements of S into the larger set |A|. The morphisms (S, f, A) → (S', f', A') of C̄ are the same as for C̃, so C̄ is a full subcategory of C̃. If (t, x) is a morphism from (S, f, A) to (S', f', A') in C̄, then the set function t: S → S' is uniquely determined by x and S. Specifically, t is the restriction of |x|: |A| → |A'| to the subset S ⊆ |A|.

The subcategory C̄ inherits the cartesian closed structure of C̃, up to isomorphism. More precisely, C̄ is a subcategory of C̃ that is also cartesian closed. However, given two objects X and Y from C̄, the function object X → Y = (M, h, A → A') given in the proof of Lemma 8.6.14 is not an object of C̄, since M is not a subset of |A → A'|. The function object for X and Y in C̄ is, however, isomorphic to the function object in C̃. Instead of describing function objects in C̄ directly for arbitrary C, we will take this up in the illustrative special case that C is a product of two CCCs.

Let us suppose we have a product category C = A × B, where both categories A and B have a terminal object. Recall that the objects of A × B are ordered pairs of objects from A and B, and the morphisms (A, B) → (A', B') are pairs (x, y), where x: A → A' is a morphism in A and y: B → B' is a morphism in B. It is easy to see that when A and B are cartesian closed, so is A × B, with coordinate-wise cartesian closed structure. For example, a terminal object of A × B is (unit, unit). It is easy to see that |(A, B)|_C = |A|_A × |B|_B, where the cartesian product of |A|_A and |B|_B is taken in sets. We will often omit the subscripts from the various |·|'s, since these are generally clear from context.
The objects of the scone of A × B may be described as tuples (S, f, g, A, B), with S a set, A an object in A, B an object in B, and f and g functions that form a so-called span:

|A| ←f— S —g→ |B|

The morphisms (S, f, g, A, B) → (S', f', g', A', B') in the scone of A × B may be described as tuples (t, x, y), with x: A → A' a morphism in A, y: B → B' a morphism in B, and t: S → S' a function such that |x| ∘ f = f' ∘ t and |y| ∘ g = g' ∘ t.

Let us now look at the subscone of A × B. Working through the definition, we may regard the objects of this subcategory as tuples (S, f, g, A, B), where f and g provide an inclusion S ↪ |A| × |B|. Since S, f and g determine a subset of |A| × |B|, as before, an object (S, f, g, A, B) may be expressed more simply as a triple (S, A, B) with S an ordinary binary relation on |A| × |B|. For any morphism (t, x, y) between objects of the subscone, the set map t is again uniquely determined by the A and B maps x and y and the relation S. Specifically, t is the restriction of the product map |x| × |y| to the subset S. This allows us to write morphisms simply as pairs, instead of triples. It is easy to see that (x, y) forms a morphism from (S, A, B) to (S', A', B') exactly if, for all morphisms a: unit → A in A and b: unit → B in B, we have

a S b implies (x ∘ a) S' (y ∘ b),

where we use the relational notation a S b to indicate that S relates a to b, and similarly for S'. With some additional notation, we can express the definition of morphism in the subscone as a form of "commuting square." To begin with, let us write S: |A| ⇸ |B| to indicate that S is a binary relation on |A| × |B|. The definition of morphism now has an appealing diagrammatic presentation as a pair (x, y) satisfying

|A| --|x|--> |A'|
 S |     ⊆     | S'
|B| --|y|--> |B'|

where "⊆" in this diagram means that the relation |y| ∘ S is included in the relation S' ∘ |x|. It is instructive to write out the function objects (exponentials) in the subscone of A × B. The function object (S, A, B) → (S', A', B') may be written (R, A → A', B → B'), where R: |A → A'| ⇸ |B → B'| is the binary relation such that for all morphisms e: unit → (A → A') in A and d: unit → (B → B') in B, we have

e R d  iff  a S b implies App ∘ ⟨e, a⟩ S' App ∘ ⟨d, b⟩

for all a: unit → A in A and b: unit → B in B. We may also express products in a similar fashion. As in the general case, the subscone is a CCC and a full subcategory of the scone, and the restriction of the forgetful functor to the subscone is also a cartesian closed functor. The main properties of the subscone of A × B are summarized in the following proposition, whose proof is left as Exercise 8.6.18.

Proposition 8.6.15  Let C = A × B be the cartesian product of two CCCs. Then the subscone of C is a cartesian closed category, with the canonical functor into C a cartesian closed functor. Products and function objects in the subscone are given by

(S, A, B) × (S', A', B') = (S × S', A × A', B × B'),
(S, A, B) → (S', A', B') = (S → S', A → A', B → B'),

where the relations S × S' and S → S' have the following characterizations:

a (S × S') b  iff  (Proj₁ ∘ a) S (Proj₁ ∘ b) and (Proj₂ ∘ a) S' (Proj₂ ∘ b),
e (S → S') d  iff  ∀a: unit → A. ∀b: unit → B. a S b implies App ∘ ⟨e, a⟩ S' App ∘ ⟨d, b⟩.

At this point, it may be instructive to look back at the definition of logical relations in Section 8.2.1 and see the direct similarity between the descriptions of products and function objects in this proposition and the definition of logical relations over an applicative structure or Henkin model. The difference is that here we have generalized the definition to any cartesian closed category, forming relations in Set after applying the functor |·|.

Exercise 8.6.16  Show that for any category C with products and terminal object, the functor |·| from C to Set preserves products up to isomorphism, i.e., |A × B| ≅ |A| × |B| for all objects A and B of C.

Exercise 8.6.17  Show that for any CCC C, the map π: C̃ → C defined in this section is a functor that preserves terminal object, products and function objects. Begin by showing that π is a functor, then check that π(X × Y) = πX × πY for all objects X = (S, f, A) and Y = (S', f', A') of C̃, and similarly for function objects and the terminal object.

Exercise 8.6.18  Prove Proposition 8.6.15.

(a) Show that (S, A, B) × (S', A', B') = (S × S', A × A', B × B') is a product in the subscone.

(b) Show that (S, A, B) → (S', A', B') = (S → S', A → A', B → B') is a function object in the subscone.

(c) Show that the canonical functor from the subscone to C is a cartesian closed functor.

Exercise 8.6.19  Show that if C is a well-pointed category, then so is its subscone.

8.6.4

Comparison with Logical Relations

In this section, we describe the connection between logical relations and subscones, for the special case of cartesian closed categories determined from Henkin models, and show how the Basic Lemma may be derived from simple categorical properties of the subscone. Since a logical predicate P ⊆ A is a family of predicates P^σ indexed by the types σ over the signature of A, we can think of a logical predicate as a mapping from types over some signature to subsets of some structure. We will see that a mapping of this form is determined by the categorical interpretation of lambda terms in the subscone of a CCC. In discussing the interpretation of terms in cartesian closed categories in Section 7.2.4, we used a structure called a Σ-CCC, where Σ may be any signature. The difference between a CCC and a Σ-CCC is that a Σ-CCC is equipped with a specific choice of object A_b for each type constant b of Σ and a morphism unit → A for each term constant of Σ. Since a Σ-CCC associates an object or arrow with each symbol in Σ, we can think of a Σ-CCC as a cartesian closed category A together with a mapping Σ → A from the signature into the category. For any signature Σ and set ℰ of equations between Σ-terms, there is a CCC C_Σ(ℰ) called the term category generated by Σ and ℰ, as described in Example 7.2.19. There is a canonical mapping Σ → C_Σ(ℰ) presenting C_Σ(ℰ) as a Σ-CCC, namely, the function mapping each type constant to itself and each term constant to its equivalence class, modulo ℰ. As shown in Proposition 7.2.46, there is a unique cartesian closed functor from C_Σ(∅) to any other Σ-CCC D. (The categorical name for this property is that C_Σ(∅) is the free cartesian closed category over signature Σ.) If we draw in the mappings from Σ, then such a cartesian closed functor forms a commuting triangle,

since the interpretation of a type or term constant is determined by the way that D is regarded as a Σ-CCC. As explained in Section 7.2.6, the cartesian closed functor from C_Σ(∅) into a Σ-CCC D is the categorical equivalent of the meaning function associated with a Henkin model. It is really only a matter of unraveling the definitions, and using Proposition 8.6.15, to see that the logical predicates over a Henkin model A for signature Σ correspond to ways of regarding the subscone of the category C_A determined by A as a Σ-CCC. More specifically, let C = C_A be the cartesian closed category determined by Henkin model A, as explained in Section 7.2.5, regarded as a Σ-CCC in the way that preserves the meaning of terms. By Theorem 7.2.41, C is a well-pointed cartesian closed category, with the morphisms unit → σ exactly the elements of A^σ from the Henkin model A. If we consider a cartesian closed functor F from the term category into the subscone, the object map of this functor designates an object F(σ) = (P^σ, σ) with P^σ ⊆ |σ| for each type σ. But since |σ| = A^σ, a meaning function into the subscone chooses a subset of the Henkin model at each type. It is easy to see from Proposition 8.6.15 that the family of predicates determined by the object map of a cartesian closed functor into the subscone is in fact a logical relation. This gives us the following proposition.

Proposition 8.6.20  Let C = C_A be the well-pointed Σ-CCC determined by the Henkin model A for signature Σ. Then a logical predicate on A is exactly the object part of a cartesian closed functor from the term category C_Σ(∅) over Σ into the subscone of C.

Using product categories, Proposition 8.6.20 gives us a similar characterization of logical relations over any number of models. For example, if C = C_A × C_B is determined from Henkin models A and B, then C and its subscone are again well-pointed cartesian closed categories (see Exercise 8.6.19), and a meaning function into the subscone is a logical relation over Henkin models A and B. Using the correspondence given in Proposition 8.6.20, we can see that the Basic Lemma for logical relations is a consequence of the commuting diagram given in the following proposition.

Proposition 8.6.21  Let Σ be a signature and let C be a cartesian closed category. Consider any mapping Σ → C̄ presenting the subscone C̄ as a Σ-CCC. Composing with the canonical functor C̄ → C, we obtain a corresponding map Σ → C and, by the natural embedding of symbols into the term category, we have a similar map Σ → C_Σ(∅) into the term category. Given these three maps from Σ, there exist unique cartesian closed functors from C_Σ(∅) into C̄ and C commuting with the canonical functor C̄ → C.

Proof  Let functions from Σ to objects of the CCCs C̄, C and C_Σ(∅) be given as in the statement of the proposition. It is a trivial consequence of the definitions that the triangle involving C̄ and C commutes. By Proposition 7.2.46, the cartesian closed functors C_Σ(∅) → C̄ and C_Σ(∅) → C are uniquely determined. This implies that the two triangles with vertex C_Σ(∅) commute. Finally, since the canonical functor C̄ → C is a cartesian closed functor (see Proposition 8.6.15), the two uniquely determined cartesian closed functors from C_Σ(∅) must commute with this functor. ∎

It is again a matter of unraveling definitions to see how the Basic Lemma follows from this diagram. Let us consider the binary case with A and B given by Henkin models A and B of simply typed lambda calculus. Then both C = C_A × C_B and its subscone C̄ are well-pointed cartesian closed categories. The cartesian closed functor C_Σ(∅) → C maps a typed lambda term x: σ ⊢ M: τ to the pair of morphisms giving the meaning of M in A and, respectively, the meaning of M in B. The cartesian closed functor C_Σ(∅) → C̄ maps a typed lambda term x: σ ⊢ M: τ to a morphism from some subset S ⊆ C_A⟦σ⟧ × C_B⟦σ⟧ = A^σ × B^σ to a subset S' ⊆ C_A⟦τ⟧ × C_B⟦τ⟧ = A^τ × B^τ, where C_A⟦σ⟧ is the object of C_A named by type expression σ, and similarly for C_B⟦σ⟧ and so on. As noted in Section 8.6.3, such a morphism in the subscone is a product map f × g, with f: C_A⟦σ⟧ → C_A⟦τ⟧ and g: C_B⟦σ⟧ → C_B⟦τ⟧, which maps a pair (u, v) with u S v to a pair (f(u), g(v)) with f(u) S' g(v). Because the two cartesian closed functors from C_Σ(∅) commute with the functor C̄ → C, f must be the meaning f = A⟦x: σ ⊢ M: τ⟧ of the term M in A, regarded as a function of the free


variable x, and similarly g = B⟦x: σ ⊢ M: τ⟧. Because the term x: σ ⊢ M: τ was arbitrary, and σ may be a cartesian product of any types, we have shown that, given logically related interpretations of the free variables, the meaning of any term in A is logically related to the meaning of this term in B. This is precisely the Basic Lemma for logical relations over Henkin models.

8.6.5

General Case and Applications to Specific Categories

The definitions of the scone and subscone of a category C rely on the functor |·| from C into Set. However, since relatively few properties of Set are used, we can generalize the construction and associated theorems to other functors and categories. We begin by stating the basic properties of the "generalized scone" and "subscone," then continue by deriving Kripke logical relations and directed-complete logical relations (from Section 8.6.2) as special cases. Lemma 8.6.14 holds in a more general setting, with Set replaced by any cartesian closed category C' with equalizers and the functor |·|: C → Set replaced by any functor F: C → C' that preserves products up to isomorphism, by essentially the same proof. This takes us from the specific comma category C̃ = (Set ↓ |·|) to the more general case of the form (C' ↓ F).


Proposition 8.6.22

Let C and C' be cartesian closed categories and assume C' has equalizers. Let F: C → C' be a functor that preserves finite products up to isomorphism. Then the comma category (C' ↓ F) is a cartesian closed category and the canonical functor π: (C' ↓ F) → C is a cartesian closed functor.

The next step is to identify generalizations of the subscone. The proof of Proposition 8.6.21 actually establishes the general version where, instead of the subscone, we may consider any CCC D that has a cartesian closed functor into C. We will be interested in situations where D is a subcategory of a comma category (C' ↓ F), such that the restriction of the canonical cartesian closed functor π: (C' ↓ F) → C to D is a cartesian closed functor.

Proposition 8.6.23  Let C and C' be cartesian closed categories and assume C' has equalizers. Let F: C → C' be a functor that preserves products, and let D be a sub-cartesian-closed category of the comma category (C' ↓ F). Let Σ be a λ^{×,→} signature and consider any mapping Σ → D presenting D as a Σ-CCC. Composing with the canonical functor D → C, we obtain a corresponding map Σ → C and, by the natural embedding of symbols into the term category, we have a similar map Σ → C_Σ(∅) into the term category. Given these three maps from Σ, there exist unique cartesian closed functors from C_Σ(∅) into C and D commuting with the canonical functor D → C.

Sconing with Kripke Models  If P is a partially ordered set, then we may consider P as a category whose objects are the elements of P, with morphisms indicating when one element is less than another, as described in Example 7.2.6. As shown in Section 7.3.7, each Kripke lambda model with P as the set of "possible worlds" determines a cartesian closed category C that is a subcategory of the functor category Set^P. The inclusion F of C into the functor category Set^P preserves products, since the interpretation of a product type in a Kripke lambda model is the same as cartesian product in Set^P. By Exercise 7.2.14, Set^P has equalizers, and we can apply Propositions 8.6.22 and 8.6.23. For the appropriate analog of the subscone, we take the subcategory with an inclusion at each "possible world." More specifically, let D be the full subcategory of the comma category (Set^P ↓ F) that consists of objects (S, f, A) with f(p): S(p) ↪ F(A)(p) an inclusion for each p in P. By assuming these are ordinary subsets, we may omit f when referring to objects of D. It may be shown that D is a CCC (Exercise 8.6.24). In describing function objects (exponentials) in D, we may use natural transformations F(app_{A,A'}): F(A → A') × F(A) → F(A'), because the functor F preserves binary products. For each object C in C and each p ≤ p' in P, we also use the transition function F(C)(p ≤ p'): F(C)(p) → F(C)(p') given by the action of the functor F(C): P → Set on the order of P. Specifically, (S, A) → (S', A') may be given as (Q, A → A'), where A → A' is taken in C and Q: P → Set is the functor such that the set Q(p) consists of all t in F(A → A')(p) for which, for all p' ≥ p and all a in F(A)(p'), if a belongs to S(p'), then the element F(app)(F(A → A')(p ≤ p')(t), a) in F(A')(p') belongs to S'(p'). In other words, the object parts of the unique cartesian closed functor from the term category into D in this case amount to the Kripke logical predicates on the Kripke lambda model that corresponds to F.
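A finite sketch (our own, with invented worlds and predicates) of the function-type clause just described: t satisfies the predicate at world p exactly if, at every world p' ≥ p, t sends every element satisfying S to an element satisfying S'. Here the transitions are identity maps, and the base predicate grows at later worlds, as monotonicity requires:

```python
# Worlds {0, 1} with 0 <= 1; transitions carry elements unchanged.
# S gives the (monotone) base predicate at each world.

WORLDS = [0, 1]

S = {0: {0}, 1: {0, 1}}        # base predicate, growing at world 1

def fun_pred(S, S2):
    """t satisfies (S -> S2) at world p iff for every world q >= p and
    every a satisfying S at q, t(a) satisfies S2 at q."""
    def holds(t, p):
        return all(t(a) in S2[q]
                   for q in WORLDS if p <= q
                   for a in S[q])
    return holds

P = fun_pred(S, S)
print(P(lambda n: n, 0))       # identity: True at every world
print(P(lambda n: n + 1, 0))   # successor: False, 0 escapes S at world 0
```

Note that satisfaction at world 0 quantifies over world 1 as well, which is exactly the "for all p' ≥ p" clause of Kripke logical predicates.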


In the binary case, we let A and B be cartesian closed categories and let F: A → Set^P and G: B → Set^P be finite-product-preserving functors given by Kripke lambda models A and B, each with P as the poset of possible worlds. Let H: A × B → Set^P be the functor given by H(A, B) = F(A) × G(B). Continuing as above, with C = A × B and with H instead of F, we can see that the object parts of the cartesian closed functors into D are exactly the Kripke logical relations on Kripke lambda models A and B.

Sconing with CPOs  Since Cpo is a cartesian closed category with equalizers (by Exercise 7.2.14), we may also form a generalized scone using functors into Cpo. This is like the case for Kripke models, except that we have the added complication of treating pointed CPOs properly; our relations must preserve the bottom element in order to preserve fixed points, as illustrated in Section 8.6.2. It might seem reasonable to work with the category Cppo of pointed CPOs instead, since relations taken in Cppo would have to have least elements. However, the category Cppo does not have equalizers (by Exercise 7.2.14). Moreover, we may wish to construct logical relations which are closed under limits at all types, but which are only pointed where needed to preserve fixed points. A general approach would be to use lifted types, and extend the lifting functor from Cpo to the appropriate comma category. This would give "lifted relations" on lifted types, which would necessarily preserve least elements. A less general alternative, which we describe below, is simply to work with subcategories where predicates or relations are pointed when needed. This is relatively ad hoc, but is technically simpler and (hopefully) illustrates the main ideas.

Since we are interested in preserving fixed-point operators, we let Σ be a signature that may include fixed-point constants at one or more types, and let ℰ_fix be the set of equations of the form fix_σ = λf: σ → σ. f(fix_σ f) for each type σ that has a fixed-point operator in Σ. The category C_Σ(ℰ_fix) is the term category of simply typed lambda calculus with one or more fixed-point constants, each with its associated equational axiom. By Proposition 7.2.46, there is a unique cartesian closed functor from C_Σ(ℰ_fix) to any Σ-CCC satisfying the fixed-point equations. In particular, there is a unique such functor into any Σ-CCC of CPOs. If we use the functor |·|: Cpo → Set, then the scone (Set ↓ |·|) allows us to form logical predicates or relations over CPOs. However, these relations will not be directed complete. Instead, we let I be the identity functor on Cpo and consider the comma category (Cpo ↓ I). This could also be written (Cpo ↓ Cpo), since we are using Cpo as a convenient shorthand for the identity functor on Cpo. To form the subscone, we consider the full subcategory D with objects of the form (P, f, A), where f is an inclusion P ↪ A. In other words, P is a sub-CPO of A, and therefore closed under limits of directed sets. By generalizing Proposition 8.6.15, we can see that D provides logical predicates over CPO models that are closed


under limits. However, there is no reason for predicates taken in Cpo to preserve least elements; we have objects (P, f, A) where A is pointed and P is not. This is not a problem in general, but if we wish to have a predicate preserve a fixed-point operator on pointed A, we will need P to be a pointed subset of A. In order to preserve fixed points, we consider a further subcategory. Assume we have a map Σ → Cpo identifying certain CPOs on which we wish to have fixed-point operators. For this to make sense, we assume that if σ is a type over Σ with a constant fix_σ, then our mapping from types to CPOs will guarantee that the interpretation of σ is a pointed CPO. In this case, we are interested in subcategories of D whose objects are of the form (P, A), where if A is the interpretation of a type with a fixed-point constant, then the inclusion P ↪ A is strict. In particular, P must have a least element. The CCC structure of this subcategory remains the same, but in D we also get fixed-point operators where required, and the forgetful functor D → Cpo preserves them. The object parts of the unique cartesian closed functor from C_Σ(ℰ_fix) into D are exactly the inclusive logical predicates described in Section 8.6.2, pointed where required by the choice of fixed-point constants. In the binary case, we let D be the full subcategory of (Cpo × Cpo ↓ Cpo × Cpo) whose objects are of the form (S, A, B), where S is a sub-CPO of A × B, and we force pointed relations as required.

Exercise 8.6.24  Suppose P is a partially ordered set and C is a Kripke lambda model with P as the set of "possible worlds." Let F be the inclusion of C into the functor category Set^P. Show that the full subcategory D of the comma category (Set^P ↓ F) consisting of objects (S, f, A) with S(p) ↪ F(A)(p) an inclusion for each p in P is cartesian closed.
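The role of pointedness and chain closure in the preceding discussion can be illustrated with Kleene's construction fix f = ⊔ₙ fⁿ(⊥): if a predicate holds at ⊥, is preserved by f, and is closed under limits of chains, then it holds at the fixed point. A small Python sketch of our own (flat domain, with None encoding ⊥):

```python
# Kleene iteration on a flat domain: bottom is None, and iteration
# stops as soon as a fixed point is reached.  On a flat domain every
# chain is finite, so "closure under limits" is automatic here.

def fix(f, bottom=None, steps=100):
    x = bottom
    for _ in range(steps):
        nxt = f(x)
        if nxt == x:          # reached the fixed point
            return x
        x = nxt
    return x

# f is monotone: f(bottom) = 0, f(n) = n for defined n.
f = lambda x: 0 if x is None else x

# P holds at bottom and is preserved by f, so it holds at fix f.
P = lambda x: x is None or x == 0

assert P(None) and P(f(None)) and P(fix(f))
print(fix(f))    # 0
```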

Polymorphism and Modularity

9.1  Introduction

9.1.1  Overview

This chapter presents type-theoretic concepts underlying polymorphism and data abstraction in programming languages. These concepts are relevant to language constructs such as Ada packages and generics [US80], C++ templates [ES90], and polymorphism, abstract types, and modules in ML [MTH90, MT91, Ull94] and related languages such as Miranda [Tur85] and Haskell [HF92]. As a rule, the typed lambda calculi we use to formalize and study polymorphism and data abstraction are more flexible than their implemented programming language counterparts. In some cases, the implemented languages are restricted due to valid implementation or efficiency considerations. However, there are many ideas that have successfully progressed from lambda calculus models into implemented programming languages. For example, it appears from early Clu design notes that the original discovery that polymorphic functions could be typechecked at compile time came about through Reynolds' work on polymorphic lambda calculus [Rey74b]. Another example is the Standard ML module system, which was heavily influenced by a type-theoretic perspective [Mac85]. The main topics of this chapter are:

• Syntax of polymorphic type systems, including predicative, impredicative and type: type versions.

• Predicative polymorphic lambda calculus, with discussion of its relationship to other systems, semantic models, reduction and polymorphic declarations.

• Survey of impredicative polymorphic lambda calculus, focusing on expressiveness, strong normalization and construction of semantic models. This section includes discussion of the representation of natural numbers and other data by pure terms, a characterization of theories by contexts, including the maximum consistent theory, parametricity, and formal category theory.

• Data abstraction and existential types.

• General products and sums and their relation to module systems for Standard ML and related languages.
Most of the topics are covered in a less technical manner than earlier chapters, with emphasis on the intuitive nature and expressiveness of various languages. An exception is the more technical study of semantics and reduction properties of impredicative polymorphism.

9.1.2  Types as Function Arguments
In the simply-typed lambda calculus and programming languages with similar typing constraints, static types serve a number of purposes. For example, the type soundness theorem (Lemma 4.5.13) shows that every expression of type σ → τ determines a function from type σ to τ, in every Henkin model. Since every term has meaning in every Henkin model, it follows that no typed term contains a function of type nat → nat, say, applied to an argument of type bool. This may be regarded as a precise way of saying that simply-typed lambda calculus does not allow type errors. Simply-typed lambda calculus also enjoys a degree of representation independence, as discussed in Section 8.5. However, the typing constraints of simply-typed lambda calculus have some obvious drawbacks: there are many useful and computationally meaningful expressions that are not well-typed. In this chapter, we consider static type systems that may be used to support more flexible styles of typed programming, generally without sacrificing the type security of more restricted languages based on simply-typed lambda calculus. Some of the inconvenience of simple type structure may be illustrated by considering sorting functions in a language like C or Pascal. The reader familiar with a variety of sorting algorithms will know that most may be explained without referring to the kind of data to be sorted. We typically assume that we are given an array of pointers to records, and that each record has an associated key. It does not matter what type of value the key is, as long as we have a "comparison" function that will tell which of two keys should come first in the sorted list. Most sorting algorithms simply compare keys using the comparison function and swap records by resetting pointers accordingly. Since this form of sorting algorithm does not depend on the type of the data, we should be able to write a function Sort that sorts any type of records, given a comparison function as an actual parameter.
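For illustration (ours, not the book's; ML-style polymorphism rendered with Python's typing module), a single sort routine parameterized by a comparison function works for any element type:

```python
# One generic sort for any element type T, given a comparison leq.
from typing import Callable, List, TypeVar

T = TypeVar("T")

def sort(leq: Callable[[T, T], bool], items: List[T]) -> List[T]:
    """Insertion sort that inspects elements only through leq."""
    out: List[T] = []
    for x in items:
        i = 0
        while i < len(out) and leq(out[i], x):
            i += 1
        out.insert(i, x)
    return out

print(sort(lambda a, b: a <= b, [3, 1, 2]))               # [1, 2, 3]
print(sort(lambda a, b: len(a) <= len(b), ["aaa", "b"]))  # ['b', 'aaa']
```

The same code sorts numbers and strings; only the comparison function changes, which is exactly the flexibility the monomorphic Pascal type below rules out.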
However, in a language such as Pascal, every Sort function must have a type of the form

Sort: (t × t → bool) × Array[ptr t] → Array[ptr t]

for some fixed type t. (This is the type of a function which, given a binary relation on t and an array of pointers to t, returns an array of pointers to t.) This forces us to write a separate procedure for each type of data, even though there is nothing inherent in the sorting algorithm to require this. From a programmer's point of view, there are several obvious disadvantages. The first is that if we wish to build a program library, we must anticipate in advance the types of data we wish to sort. This is unreasonably restrictive. Second, even if we are only sorting a few types of data, it is a bothersome chore to duplicate a procedure declaration several times. Not only does this consume programming time, but having several copies of a procedure may increase debugging and maintenance time. It is far more convenient to use a language that allows one procedure to be applied to many types of data.

One family of solutions to this problem may be illustrated using function composition, which is slightly simpler than sorting. The function

compose_{nat,nat,nat} ≡ λf: nat → nat. λg: nat → nat. λx: nat. f(gx)

composes two natural number functions. Higher-order functions of types nat → (nat → nat) and (nat → nat) → nat may be composed using the function

λf: (nat → nat) → nat. λg: nat → (nat → nat). λx: nat. f(gx).

It is easy to see that these two composition functions differ only in the types given to the formal parameters. Since these functions compute their results in precisely the same way, we might expect both to be "instances" of some general composition function. A polymorphic function is a function that may be applied to many types of arguments. (This is not a completely precise definition, in part because "many" is not specified exactly, but it does give the right general idea.) We may extend the simply-typed lambda calculus to include polymorphic functions by extending lambda abstraction to type variables. Using type variables r, s and t, we may write a "generic" composition function

composerst

).f:s

—±

t.).g:r

—÷

s.Ax:r.f(gx).

It should be clear that compose_{r,s,t} denotes a composition function of type (s → t) → (r → s) → (r → t), for any semantic values of type variables r, s and t. In particular, when all three type variables are interpreted as the type of natural numbers, the meaning of compose_{r,s,t} is the same as compose_{nat,nat,nat}. By lambda abstracting r, s and t, we may write a function that produces compose_{nat,nat,nat} when applied to type arguments nat, nat and nat. To distinguish type variables from ordinary variables, we will use the symbol T for some collection of types. As we shall see shortly, there are several possible

interpretations for T. Since these are more easily considered after presenting abstraction and application for type variables, we will try to think of T as some collection of types that at least includes all the types defined so far, and perhaps other types not yet specified. There is no need to think of T itself as a "type," in the sense of an "element of T." In particular, we do not need T to include itself as a member. Using lambda abstraction over types, which we will call type abstraction, we may define a polymorphic composition function compose

compose ≝ λr: T. λs: T. λt: T. compose_{r,s,t}.


This function requires three type parameters before it may be applied to an ordinary function. We will refer to the application of a polymorphic function to a type parameter as type application. Type applications may be simplified using a version of β-reduction, with the type argument substituted in place of the bound type variable. If we want to apply compose to two numeric functions f, g: nat → nat, then we first apply compose to type arguments nat, nat and nat. Using β-reduction for type application, this gives us

compose nat nat nat = (λr: T. λs: T. λt: T. λf: s → t. λg: r → s. λx: r. f(g x)) nat nat nat
                    = λf: nat → nat. λg: nat → nat. λx: nat. f(g x).
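The instantiation step can be mirrored in a language with generics. The following TypeScript sketch (names such as `compose`, `succ` and `double` are illustrative, not from the text) plays the role of the generic composition function and its instantiation at nat:

```typescript
// Generic composition: the analogue of compose = λr: T. λs: T. λt: T. compose_{r,s,t}.
// The type parameters R, S, T play the role of the bound type variables r, s, t.
const compose = <R, S, T>(f: (s: S) => T, g: (r: R) => S) =>
  (x: R): T => f(g(x));

const succ = (n: number) => n + 1;
const double = (n: number) => 2 * n;

// "Type application": supplying number for all three parameters yields
// the analogue of compose_{nat,nat,nat}.
const seven = compose<number, number, number>(succ, double)(3);
console.log(seven); // succ(double(3)) = 7
```

In practice TypeScript infers the type arguments; writing them out explicitly makes the parallel with type application visible.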

In other words, compose nat nat nat = compose_{nat,nat,nat}. By applying compose to other type arguments, we may obtain composition functions of other types. In a typed function calculus with type abstraction and type application, we must give types to polymorphic functions. A polymorphic function that illustrates most of the typing issues is the polymorphic identity

Id ≝ λt: T. λx: t. x.

It should be clear from the syntax that the domain of Id is T. The range is somewhat more difficult to describe. Two common notations for the type of Id are Πt: T. t → t and ∀t: T. t → t. In these expressions, Π and ∀ bind t. The ∀ notation allows informal readings such as, "the type of functions which, for all t: T, give us a map from t to t." The Π notation is closer to standard mathematical usage, since a product Π_{i∈I} A_i over an infinite index set I consists of all functions f: I → ∪_{i∈I} A_i such that f(i) ∈ A_i. Since Id t: t → t for every t: T, the polymorphic identity has the correct functional behavior to belong to the infinite product Πt: T. t → t, and we may write Id: Πt: T. t → t. We will use Π notation in the first polymorphic lambda calculus we consider and ∀ in a later system, primarily as a way of distinguishing the two systems. Just as we can determine the type of an ordinary function application fx from the types of f and x, we can compute the type of a type application from the type of the polymorphic function and the type argument. For the type application Id t, for example, the type of Id is Πt: T. t → t. The type of Id t is obtained by substituting t for the bound variable t in t → t. While this substitution may seem similar to some kind of β-reduction (with Π in place of λ), it is important to notice that this substitution is done only on the type of Id, not the function itself. In other words, we can type-check applications of polymorphic functions without evaluating the applications. In a programming language based on this form of polymorphism, this type operation may be carried out by a compile-time type checker (except in the case of type: type polymorphism described below, where it is difficult to separate compile-time from run-time).
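The point that type application is checked without evaluating anything can be seen in TypeScript, where generic instantiation happens entirely at compile time and types are erased before the program runs. A hedged sketch (`id` here is the analogue of Id, not the text's notation):

```typescript
// The polymorphic identity, the analogue of Id ≝ λt: T. λx: t. x.
const id = <T>(x: T): T => x;

// The checker computes the type of an instantiation by substituting the
// type argument for T in T -> T; the body of id is never evaluated.
const idNat: (x: number) => number = id;   // T instantiated at number
const idStr: (x: string) => string = id;   // T instantiated at string

console.log(idNat(3));     // 3
console.log(idStr("abc")); // "abc"
```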


Having introduced types for polymorphic functions, we must decide how these types fit into the type system. At the point we introduced T, we had only defined non-polymorphic types. With the addition of polymorphic types, there are three natural choices for T. We may choose to let T contain only the simple types defined using →, + and/or ×, say, and some collection of type constants. This leads us to a semantically straightforward calculus of what is called predicative polymorphism. The word "predicative" refers to the fact that T is introduced only after we have defined all of the members of T. In other words, the collection T does not contain any types that rely on T for their definition. An influential predicative calculus is Martin-Löf's constructive type theory [Mar73, Mar82, Mar84], which provides the basis for the Nuprl proof development system. A second form of polymorphism is obtained by letting T also contain all the polymorphic types (such as Πt: T. t → t), but not considering T itself a type. This T is "impredicative," since we assume T is closed under type definitions that refer to the entire collection T. The resulting language, formulated independently by Girard [Gir71, Gir72] and Reynolds [Rey74b], is variously called the second-order lambda calculus, System F, or the calculus of impredicative polymorphism. (It is also referred to as the polymorphic lambda calculus, but we will consider this phrase ambiguous.) The final alternative is to consider T a type, and let T contain all types, including itself. This may seem strange, since assumptions like "there is a set of all sets" are well-known to lead to foundational problems (see Section 1.6). However, from a computational point of view, it is not immediately clear that there is anything wrong with introducing a "type of all types." Some simple differences between these three kinds of polymorphism may be illustrated by considering uses of Id, since the domain of Id is T.
If we limit T to the collection of simple types, we may only apply Id to a non-polymorphic type such as nat or (nat → nat) → (nat → bool). If we are only interested in polymorphic functions as a means for defining functions with simple types, then this seems sufficient. A technical property of this restricted language is that any model of the simply-typed lambda calculus may be extended to a model of predicative polymorphism. In particular, there are classical set-theoretic models of predicative polymorphism. A second advantage of predicative polymorphism is the way that modularity constructs may be integrated into the type system, as discussed in Section 9.5. Predicativity may also be useful in the development of type inference algorithms. These algorithms allow a programmer to obtain the full benefit of a static type system without having to explicitly declare the type of each variable in a program. The type inference algorithm used in the programming language ML is discussed in Chapter 11. If we let T be the collection of all types, then the polymorphic identity may be applied to any type, including its own. By β-reduction on types, the result of this application is an identity function


Id (Πt: T. t → t) = λx: (Πt: T. t → t). x

which we may apply to Id itself. This gives a form of "self-application" reminiscent of Russell's paradox (Exercise 4.1.1). In the impredicative calculus, it is impossible to interpret every polymorphic lambda term as a set-theoretic function and every type as the set of all functions it describes. The reason is that the interpretation of Id would have to be a function whose domain contains a set (the meaning of Πt: T. t → t) which contains Id. The reader may find it amusing to draw a Venn diagram of this situation. If we think of a function as a set of ordered pairs (as described in Section 1.6.2), then we begin with a circle for the set Id of ordered pairs Id = {(τ, λx: τ. x) | τ ∈ T}. But one of the elements of Id is the pair ((Πt: T. t → t), λx: (Πt: T. t → t). x), allowing the function to be applied to its own type. One of the components of this ordered pair is a set which contains Id, giving us a picture with circularly nested sets. This line of reasoning may be used to show that a naive interpretation of the impredicative second-order calculus conflicts with basic axioms of set theory. A more subtle argument, discussed in Section 9.12, shows that it is not even possible to have a semantic model with classical set-theoretic interpretation of all function types. The semantics of impredicative polymorphism has been a popular research topic in recent years, and several appealing models have been found. In the final form of polymorphism, we let T be a type which contains all types, including itself. This design is often referred to as type: type. One effect of type: type is to allow polymorphic functions such as Id to act as functions from types to types, as well as functions from types to elements of types. For example, the application Id T is considered meaningful, since T: T. By β-reduction on types, we have Id T = λx: T. x and so (Id T): T → T is a function from types to types. We may also write more complicated functions from T to T and use these in type expressions.
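The self-application Id (Πt: T. t → t) can even be written down in TypeScript, whose type system permits this impredicative-looking instantiation; since types are erased before evaluation, no runtime circularity arises. A hedged sketch (`PolyId` is a name chosen here, not from the text):

```typescript
// The type Πt: T. t → t, written as a generic function type.
type PolyId = <T>(x: T) => T;

const id: PolyId = x => x;

// Id applied to its own type: instantiating T at PolyId lets id accept itself.
const idAgain: PolyId = id<PolyId>(id);

console.log(idAgain(42)); // 42
```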
While languages based on the other two choices for T are strongly normalizing (unless we add fixed-point operators) and have efficiently decidable typing rules, strong normalization fails for most languages with type: type. In fact, it may be shown that the set of provable equations in a minimal language with type: type is undecidable ([MR86]; see also [Mar73, Coq86, How87]). Since arbitrarily complex expressions may be incorporated into type expressions, this makes the set of well-typed terms undecidable. Historically, a calculus with a type of all types was proposed as a foundation for constructive mathematics by Martin-Löf, until an inconsistency was found by Girard in the early 1970's [Mar73]. While compile-time type checking may be impossible for a language with a type of all types, we will see in Section 9.3.5 that there are insightful and non-trivial semantic models for polymorphism with a type of all types. In Sections 9.1.3 and 9.1.4, we summarize some extensions of polymorphic type systems. In Section 9.2, we return to the predicative polymorphic calculus and give a precise definition of the syntax, equational proof rules and semantics of this typed lambda calculus. We also show how impredicative and type: type polymorphism may be derived by adding rules to the predicative type system.

All of the systems described in this chapter use what is called parametric polymorphism. This term, coined by Strachey [Str67], refers to the fact that every polymorphic function uses "essentially the same algorithm" at each type. For example, the composition function given earlier in this section composes two functions f and g in essentially the same way, regardless of the types of f and g. Specifically, compose_{r,s,t} f g x is always computed by first applying g to x and then applying f to the result. This may be contrasted with what Strachey termed ad hoc polymorphism. An ad hoc polymorphic function might, for example, test the value of a type argument and then branch in some way according to the form of this type. If we added a test for type equality, we could write the following form of "composition" function:

ad_hoc_compose ≝ λr: T. λs: T. λt: T. λf: s → t. λg: r → s. λx: r. if Eq? s t then f(f(g x)) else f(g x)

This function clearly composes f and g in a manner that depends on the type of f. This kind of function is considered briefly in Section 9.3.2. It appears that ad hoc polymorphism is consistent with predicative polymorphism, as explored in [HM95a]. However, ad hoc operators destroy some of the desirable properties of the impredicative polymorphic calculus. Some research papers on parametricity, which has been studied extensively if not conclusively, are [BFSS89, FGSS88, Rey83, MR92, Wad89].

9.1.3 General Products and Sums

There are a number of variations and extensions of the polymorphic type systems described in this chapter. An historically important series of typed lambda calculi are the Automath languages, used in a project to formalize mathematics [DB80]. Some conceptual descendants of the Automath languages are Martin-Löf's intuitionistic type theory [Mar73, Mar82, Mar84] and the closely related Nuprl system for program verification and formal mathematics [C+86]. Two important constructs in these predicative type systems are the "general" product and sum types, written using Π and Σ, respectively. Intuitively, general sums and products may be viewed as straightforward set-theoretic constructions. If A is an expression defining some collection (either a type or collection of types, for example), and B is an expression with free variable x which defines a collection for each x in A, then Σx: A.B and Πx: A.B are called the sum and product of the family B over the index set A, respectively. In set-theoretic terms, the product Πx: A.B is the Cartesian product of the family of sets {B(x) | x ∈ A}. The elements of this product are functions f such that f(a) ∈ [a/x]B for each a ∈ A. The sum Σx: A.B is the disjoint union of the family {B(x) | x ∈ A}. Its members are ordered pairs (a, b) with a ∈ A and b ∈ [a/x]B. Since the elements of sum types are pairs, general sums have projection functions first and second for first and second components. A polymorphic type Πt: T.σ or ∀t: T.σ is a special case of general products. Another use of general products is in so-called "first-order dependent products." An example from programming arises in connection with arrays of various lengths. For simplicity, let us consider the type constructor vector which, when applied to an integer n, denotes the type of integer arrays indexed from 1 to n. If we have a function zero_vector that, given an integer n, returns the zero-vector (the array with value 0 in each location) of length n, then we can write the type of this function as a general product type

zero_vector: Πn: int. vector n

In words, the type Πn: int. vector n is the type of functions which, on argument n, return an element of the type vector n. An interesting and sometimes perplexing impredicative language with general products is the Calculus of Constructions [CH88], which extends

the impredicative polymorphic calculus. Existential types, described in Section 9.4, provide a type-theoretic account of abstract data types. These types are similar to general sums, but restricted in a significant way. Intuitively, in a program with an abstract data type of stacks, for example, stacks would be implemented using some type (such as records consisting of an array and an array index) for representing stacks, and functions for manipulating stacks. Thus an implementation of stacks is a pair consisting of a type and a tuple of elements of a related type (in this case, a tuple of functions). If σ is a type expression describing the signature of the tuple of functions associated with stacks, then the type of all stack implementations is the existential type ∃t: T.σ. This differs from the general sum type Σt: T.σ in the following way. If (a, b): Σx: A.B, then first(a, b) is an expression for an element of A with first(a, b) = a. However, if ⟨t = τ, M: σ⟩: ∃t: T.σ, then we do not want any program to be able to access the first component, τ, since this is the representation of an abstract type. This restriction follows from the goals of data abstraction, but may seem unnecessarily restrictive in other contexts, such as for ML-style modules. We discuss existential types in more detail in Section 9.4.
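The restriction that clients cannot name the hidden representation type is what closures and module boundaries enforce in everyday programming. A hedged TypeScript sketch of a stack abstraction (`IntStack` and `makeStack` are illustrative names, not from the text):

```typescript
// An implementation of stacks is a pair ⟨representation type, operations⟩.
// The interface exposes only the operations; the representation plays the
// role of the hidden first component of an existential package.
interface IntStack {
  push(n: number): void;
  pop(): number | undefined;
}

// One witness: here the hidden representation type is number[], but no
// client of IntStack can observe or depend on that choice.
function makeStack(): IntStack {
  const rep: number[] = [];
  return {
    push: n => { rep.push(n); },
    pop: () => rep.pop(),
  };
}

const s = makeStack();
s.push(1);
s.push(2);
console.log(s.pop()); // 2
```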

Exercise 9.1.1 Show that if x does not occur free in B, then Πx: A.B is the set of functions from A to B and Σx: A.B is the set of ordered pairs A × B.

9.1.4 Types as Specifications

The more expressive type systems mentioned in Section 9.1.3 all serve a dual purpose, following what is called the formulas-as-types correspondence, or sometimes the Curry-Howard isomorphism. This general correspondence between typed lambda calculi and constructive logics allows us to regard lambda terms (or programs) as proofs and types as specifications. In simple type systems without variables or binding operators in types, the logic of types is just propositional logic. With polymorphism and data abstraction, however, we may develop type systems of quantified logics, capable of specifying more subtle properties of programs. This approach is used in the Nuprl system for program verification and other conceptual descendants of the Automath languages [DB80]. Some related systems for checking proofs via type checking are the Edinburgh Logical Framework [HHP87] and PX [HN88], which is based on Feferman's logical system [Fef79]. Section 4.3.4 contains a basic explanation of natural deduction and the formulas-as-types correspondence for propositional logic. The main idea in extending this correspondence to universal quantification is that a constructive proof of ∀x: A.φ must be some kind of construction which, given any a ∈ A, produces a proof of [a/x]φ. If we identify proofs of the formula φ with elements of a type that we also write φ, then proofs of ∀x: A.φ must be functions which, given any a ∈ A, produce an element of [a/x]φ. Therefore, the constructive proofs of ∀x: A.φ are precisely the elements of the general product type Πx: A.φ. This may be seen in the proof rules for universal quantification. The natural deduction rules for universal quantifiers, found in [Pra65], for example, are universal generalization and instantiation.

    φ
─────────   (x: A not free in any assumption)
 ∀x: A.φ

 ∀x: A.φ      a: A
───────────────────
     [a/x]φ

(The condition in parentheses is a side condition for the first rule.) The first rule gives us a typing rule for lambda abstraction. Specifically, if we have a term M giving a proof of φ, we can lambda abstract a free variable x of type A in M to obtain a proof of ∀x: A.φ. In lambda calculus terminology, the side condition is that x may not appear in the type of any free variable of M. This does not arise in simply-typed lambda calculus since term variables cannot appear in type expressions. However, in polymorphic calculi, this becomes an important restriction on lambda abstraction of type variables. In summary, the logical rule of universal generalization leads to the (Π Intro) rule

Γ, x: A ▷ M: σ
─────────────────────────   (x not free in Γ)      (Π Intro)
Γ ▷ λx: A. M: Πx: A.σ

explained in more detail for A = U1, a "universe of types," in Section 9.2.1. The second proof rule, universal instantiation, corresponds to function application. More specifically, if a proof of ∀x: A.φ is a function of type Πx: A.φ, and a: A, then we may obtain a proof of [a/x]φ by applying the function to argument a. This leads to the (Π Elim) rule


Γ ▷ M: Πx: A.σ      Γ ▷ N: A
──────────────────────────────   (Π Elim)
Γ ▷ MN: [N/x]σ

explained for A = U1 in Section 9.2.1. The situation is similar for existential quantification and general sums. The logical rules of existential generalization and instantiation are [y/x]4)

[a/x]4)a:A 3x: A.4)

i/c

other assumption

In the first rule, we prove ∃x: A.φ by showing that φ holds for some specific a: A. In the second rule, the second hypothesis is meant to indicate a proof of ψ from an assumption of a formula of the form [y/x]φ, with some "fresh" variable y used to stand for some arbitrary element satisfying the property φ. In other words, the formula ∃x: A.φ says that "φ holds for some x." We can use this to prove another formula ψ by choosing some arbitrary name y for "some x." This rule discharges the assumption [y/x]φ, which means that although the proof of ψ above the line depends on this hypothesis, the proof of ψ resulting from the use of this rule does not depend on this assumption. (See Section 4.3.4 for a discussion of how natural deduction rules may introduce or discharge assumptions.) The term formation rules associated with general sums and existential quantification are given in Section 9.4. In logic, there is a fundamental distinction between first-order logic and second- and higher-order logics. Since a general product Πx: A.B provides universal quantification over A, we can obtain first-order, second-order or higher-order type systems by specifying the kind of type A that may appear in a product Πx: A.B. The first-order dependent products of the last subsection usually correspond to first-order quantification, since we are quantifying over elements of a type. Polymorphism arises from products of the form Πx: T.B, where T is some collection of types, providing universal quantification over propositions. Similarly, a general sum Σx: A.B provides existential quantification over A, while the general sums Σt: T.B associated with data abstraction provide existential quantification over propositions. With general products and sums of all kinds, we can develop logic with universal and existential quantification over each type, as well as quantification over collections of types.
In a predicative system, quantification over propositions is restricted to certain subcollections, while in an impredicative system, we can quantify over the class of all propositions. Another aspect of formulas-as-types is that in most type systems, common atomic formulas such as equality do not appear as basic types. Therefore, in order to develop type systems that allow program specification and verification, we must also introduce types that correspond to atomic formulas. To give some idea of how atomic formulas may be added, we will briefly summarize a simple encoding of equality between terms. Since types represent propositions, we need a type representing the proposition that two terms are equal, together with a way of writing terms of the resulting "equality type." For any terms M, N: A, we could introduce a type of the form Eq_A(M, N) to represent the atomic formula M = N. The terms of type Eq_A(M, N) would correspond to the ways of establishing that M = N. Since equality is reflexive, we might have a term formation rule so that for any M: A, there is a term refl_A(M): Eq_A(M, M). In other words, for every M: A, there is a "proof" refl_A(M) of the formula Eq_A(M, M). To account for symmetry, we would have another term formation rule such that for any M: Eq_A(N, P), representing a proof that N = P, there is also a term sym_A(M): Eq_A(P, N) representing a proof that P = N. In addition, we would expect term forms for transitivity and possibly other properties of equality such as substitutivity of equivalents. In a language like this, we could write a term of type Eq_A(M, N) whenever M and N are provably equal terms of type A. To see how we may regard types as "program specifications" and well-typed terms as "verified programs," we will give a concrete example. Suppose prime(x) is a type corresponding to the assertion that x: nat is prime, and divides(x, y) is a type "saying" that x divides y. These would not necessarily be atomic types of the language, but could be complicated type expressions with free natural number variables x and y. Then we would expect a term of type

Πx: nat. ((x > 1) → Σy: nat. (prime(y) × divides(y, x)))

to define a function which, given any natural number x, takes a proof of x > 1 and returns a pair (N, M) with N: nat and M proving that N is a prime number dividing x. Based on this general idea of types as logical assertions, Automath, Nuprl and the Calculus of Constructions have all been proposed as systems for verifying programs.
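The term forms refl and sym sketched above correspond directly to definitions one can make in a modern proof assistant. The following Lean 4 fragment is a hedged illustration (the names Eq', symm and trans are chosen here, not taken from the text):

```lean
-- A type Eq' A M N representing the atomic formula M = N, with
-- reflexivity as its only introduction rule.
inductive Eq' (A : Type) : A → A → Type where
  | refl (M : A) : Eq' A M M

-- Symmetry: from a term of type Eq' A N P, a term of type Eq' A P N.
def symm {A : Type} {N P : A} : Eq' A N P → Eq' A P N
  | Eq'.refl M => Eq'.refl M

-- Transitivity is definable in the same style.
def trans {A : Type} {M N P : A} : Eq' A M N → Eq' A N P → Eq' A M P
  | Eq'.refl _, h => h
```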

9.2 Predicative Polymorphic Calculus

9.2.1 Syntax of Types and Terms

In this section, we define the predicative polymorphic lambda calculus λ^{→,Π}. By itself, pure λ^{→,Π} is not a very significant extension of the simply-typed lambda calculus, since the only operations we may perform on an expression of polymorphic type are type abstraction and application. Since we do not have lambda abstraction over term variables with polymorphic types, we cannot pass polymorphic functions as arguments or model polymorphic function declarations. However, λ^{→,Π} illustrates some of the essential features common to all languages with type abstraction and forms the core fragment of several useful languages. For example, by adding a declaration form for polymorphic functions, we obtain a language resembling a core fragment of the programming language ML, as explained in Section 9.2.5. With additional typing rules that eliminate the distinctions imposed by predicativity, we obtain the Girard–Reynolds calculus of impredicative polymorphism, explored in Section 9.3. With one more typing assumption, we obtain type: type. It is straightforward to extend any of these systems with other simple types such as null, unit, + and ×, using the term forms and equational rules in Chapter 4.

The types of λ^{→,Π} fall into two classes, corresponding to the simple types and the polymorphic types constructed using Π. Following the terminology that originated with Martin-Löf's type theory, we will call these classes universes. The universe of function types over some collection of given types (such as nat and bool) will be written U1 and the universe of polymorphic types U2. If we were to extend the language with cartesian product types, we would put the cartesian product of two U1 types in U1, and the cartesian product of two U2 types (if desired) in U2. To give the universes simple names, we sometimes call U1 and U2 the "small" and "large" types, respectively. Although we will later simplify the syntax of λ^{→,Π} by introducing some meta-linguistic conventions, we will begin by giving explicit syntax rules for both type expressions and terms.
The reasons for giving a detailed treatment of syntax are: (i) this makes membership in each type and universe as clear as possible, (ii) this gives an introduction to the general method used to define the syntax of more general type systems, and (iii) this makes it easier to see precisely how impredicative and type: type polymorphism are derivable from predicative polymorphism by imposing additional relationships between U1 and U2. In a general type system, the syntax of each class of expressions is given by a proof system for assertions that are commonly called "syntactic judgements." Like the type derivation system for simply-typed lambda calculus in Chapter 4, the proof system for syntactic judgements may be understood as an axiomatic presentation of a type checker for the language. In our general presentation of λ^{→,Π}, we will use judgements of the form Γ ▷ A: B, where Γ is a context (like the sort assignments of Chapter 3 and type assignments of Chapter 4) indicating the type or universe of each variable, A is the expression whose type or universe is being asserted, and B is either U1, U2 or a type expression. The judgement Γ ▷ A: B is read, "in context Γ, A is a well-formed expression defining an element of type or universe B." A useful property of the syntax rules is that if Γ ▷ A: B is derivable, then all of the free variables of A and B must appear in Γ. A context is an ordered sequence

Γ = v1: A1, ..., vk: Ak

assigning types or universes to variables v1, ..., vk. In order to make sense, no vi may occur twice in this sequence, and each Ai must be provably well-formed assuming only v1: A1, ..., v_{i-1}: A_{i-1}. We will put this formally using axioms and inference rules for judgements of the form "Γ context," which say that "Γ is a well-formed context." The order of assertions in a context is meant to reflect the order of declarations in an expression, including declarations of the types of formal parameters. For example, in showing that a term such as λt: U1. λx: t → t. λy: t. xy is well-typed, we would show that the function body, xy, is well-typed in the context t: U1, x: t → t, y: t. In this context, it is important that t: U1 precedes x: t → t and y: t, since we need to declare that t is a type variable before we write the type expressions t → t and t. The context would make sense with x and y declared in either order, but only one order is useful for deriving a type for this term.

The context axiom

∅ context      (empty context)

says that the empty sequence is a well-formed context. We can add an assertion x: A to a context if A is either a universe or a well-formed type expression. Although it would be entirely reasonable to have U2 variables in an extension of λ^{→,Π}, we will only use U1 variables, for reasons that are outlined below. We therefore have one rule for adding U1 variables and one for adding variables with types to contexts.

Γ context
──────────────────    (t not in Γ)      (U1 context)
Γ, t: U1 context

Γ ▷ σ: U1
──────────────────    (U1 type context)
Γ, x: σ context

The second rule (for adding a variable ranging over a specific type) illustrates a general convention in the formal presentation of type systems: we omit conditions that will be guaranteed by simple properties of the syntax rules. In this case, we do not explicitly assume "Γ context" in the hypothesis of this rule since it will be a property of the syntax rules that we can only derive Γ ▷ σ: U1 when Γ is a well-formed context. Along with forming contexts, there are two other general rules that apply to a variety of type systems. The first is the rule for making use of assumptions in contexts.

Γ, x: A context
────────────────    (var)
Γ, x: A ▷ x: A

This is essentially the converse of the rules for adding an assertion x: A to a context. The rule is useful both when A is U1 and when A is a type belonging to U1 or U2. We also have a general rule for adding hypotheses to a syntactic judgement.

Γ ▷ A: B      Γ, x: C context
───────────────────────────────    (add var)
Γ, x: C ▷ A: B

This is needed to derive a judgement such as x: A, y: B ▷ x: A, where the variable on the right of the ▷ is not the last variable in the context. We give these rules the same names, (var) and (add var), as the axiom and inference rule in the definition of simply-typed lambda calculus, since these are the natural generalizations. A λ^{→,Π} signature consists of a set of type constants (which we restrict to U1) and a set of term constants. Each term constant must be assigned a closed type expression (without free variables) over the type constants of the signature. We now give syntax rules for U1 and U2 type expressions. The first universe U1 is essentially the predicative T of the last section. The U1 type expressions are given by the variable rule (var), applied to U1 variables, the axiom

Γ ▷ b: U1      (cst U1)

for each type constant b of the signature, and the inference rule

Γ ▷ τ: U1      Γ ▷ τ′: U1
───────────────────────────    (→ U1)
Γ ▷ τ → τ′: U1

giving us function type expressions. The second universe U2 contains the U1 types and the types of polymorphic functions.

Γ ▷ τ: U1
───────────    (U1 ⊆ U2)
Γ ▷ τ: U2

Γ, t: U1 ▷ σ: U2
───────────────────────    (Π U2)
Γ ▷ (Πt: U1.σ): U2

Since the (U1 ⊆ U2) rule makes every type of the first universe a type of the second, we describe the relationship between universes by writing U1 ⊆ U2.

Example 9.2.1 A simple example of a typing derivation is the proof that Πt: U1. t → t is a well-formed U2 type expression.

∅ context                          by axiom (empty context)
t: U1 context                      by (U1 context)
t: U1 ▷ t: U1                      by (var)
t: U1 ▷ t → t: U1                  by (→ U1)
t: U1 ▷ t → t: U2                  by (U1 ⊆ U2)
∅ ▷ (Πt: U1. t → t): U2            by (Π U2)

The reader may wish to stop and work out the similar proof that Πs: U1. Πt: U1. s → t is a well-formed U2 type expression. The only tricky point in this proof is using (add var) to derive s: U1, t: U1 ▷ s: U1 from s: U1 ▷ s: U1. Although all these proof rules for syntax may seem like much ado about nothing, it is important to be completely clear about which expressions are well-formed and which are not. In addition, with a little practice, syntax proofs become second nature and require very little thought.

While we have expressions for U2 types, we do not have variables ranging over U2. The reason is that since we do not have binding operators for U2 variables, it seems impossible to use U2 variables for any significant purpose. If we were to add variables and lambda abstraction over U2, this would lead us to types of the form fIt: U2. a, which would belong to a third universe U3. Continuing in this way, we might also add universes U4, U5, U6 and so on. Several systems of this form may be found in the literature Mar73, Mar82, Mar84], but we do not investigate these in this book. The terms of may be summarized by defining the set of "pre—terms," or expressions which look like terms but are not guaranteed to satisfy all the typing constraints. The un—checked pre—terms of are given by the grammar M

M ::= x | c | λx: τ.M | M M | λt: U1.M | M τ

where in each case τ must be a U1 type. Every term must have one of the forms given by this grammar, but not all expressions generated by the grammar are well-typed terms. The precise typing rules for terms include axiom (var) for variables and rule (add var) for adding hypotheses about additional variables to judgements. We also have an axiom (cst) for each term constant of the signature. For lambda abstraction and application, we have reformulated rules (→ Intro) and (→ Elim) as follows, to include type variables and explicit universe assumptions.

Polymorphism and Modularity

622

Γ, x: τ ▷ M: τ′    Γ ▷ τ: U1
-----------------------------  (→ Intro)
Γ ▷ λx: τ.M : τ → τ′

Γ ▷ M: τ → τ′    Γ ▷ N: τ
---------------------------  (→ Elim)
Γ ▷ M N : τ′

In rule (→ Elim), it is not necessary to specify that τ: U1, since τ → τ′ will only be well formed when both τ and τ′ are in U1. Since we have repeated all the typing rules of the simply-typed lambda calculus, it is easy to check that any simply-typed term, perhaps containing type variables in place of type constants, is a term of the polymorphic calculus. Type abstractions and applications are formed according to the following rules:

Γ, t: U1 ▷ M: σ
--------------------------  (Π Intro)
Γ ▷ λt: U1. M : Πt: U1.σ

Γ ▷ M: Πt: U1.σ    Γ ▷ τ: U1
------------------------------  (Π Elim)
Γ ▷ M τ : [τ/t]σ

In (Π Elim), the type [τ/t]σ will belong to U1 if σ: U1, and belong only to U2 otherwise. The final rule is a type equality rule

Γ ▷ M: σ    σ = σ′
--------------------  (type eq)
Γ ▷ M: σ′

which allows us to replace the type of a term by an equal type. In the pure calculus, the only equations between types will be consequences of α-conversion. However, in extensions of the language, there may be more complicated type equations, and so we give the general rule here. We say M is a term with type σ in context Γ if Γ ▷ M: σ is derivable from the axioms and typing rules.

Example 9.2.2 Let Σ be a signature with type constant nat and term constant 3: nat. We will show that (λt: U1. λx: t. x) nat 3 has type nat. This is a relatively simple term, but the typing derivation involves many of the proof rules. Since the term begins with lambda bindings for t: U1, x: t, we begin by showing "t: U1, x: t context."

∅ context                    by axiom (empty context)
t: U1 context                by (U1 context)
t: U1 ▷ t: U1                (var)
t: U1, x: t context          (U1 type context)

We next observe that the judgement

t: U1, x: t ▷ t: U1

follows from the last two lines above by (add var), while

t: U1, x: t ▷ x: t

follows from t: U1, x: t context by (var). The remaining steps use the term-formation rules. For simplicity, we omit the judgement nat: U1, used below as the second hypothesis for rules (→ Intro) and (Π Elim).

t: U1 ▷ λx: t. x : t → t                            from above, (→ Intro)
∅ ▷ λt: U1. λx: t. x : Πt: U1. t → t                (Π Intro)
∅ ▷ (λt: U1. λx: t. x) nat : nat → nat              (Π Elim)
∅ ▷ 3 : nat                                         axiom (cst)
∅ ▷ (λt: U1. λx: t. x) nat 3 : nat                  (→ Elim)
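The rules used in Example 9.2.2 are mechanical enough to implement. The Python sketch below is ours, not the book's: it represents terms and types as small dataclasses, compares types by plain syntactic equality (which suffices here, since no (type eq) step is needed), and checks exactly the judgement derived above.

```python
# Toy checker for the predicative polymorphic calculus (a sketch; types
# are compared syntactically, and universe checks are omitted).
from dataclasses import dataclass

@dataclass(frozen=True)
class TConst: name: str                  # base type constant such as nat
@dataclass(frozen=True)
class TVar: name: str                    # type variable t: U1
@dataclass(frozen=True)
class Arrow: dom: object; cod: object    # tau -> tau'
@dataclass(frozen=True)
class Pi: var: str; body: object         # Pi t: U1. sigma

def tsubst(tau, t, sigma):
    """[tau/t]sigma, naive (no capture arises in these examples)."""
    if isinstance(sigma, TVar):
        return tau if sigma.name == t else sigma
    if isinstance(sigma, Arrow):
        return Arrow(tsubst(tau, t, sigma.dom), tsubst(tau, t, sigma.cod))
    if isinstance(sigma, Pi) and sigma.var != t:
        return Pi(sigma.var, tsubst(tau, t, sigma.body))
    return sigma

@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Const: name: str
@dataclass(frozen=True)
class Lam: var: str; ty: object; body: object    # lambda x: tau. M
@dataclass(frozen=True)
class App: fn: object; arg: object
@dataclass(frozen=True)
class TLam: var: str; body: object               # lambda t: U1. M
@dataclass(frozen=True)
class TApp: fn: object; ty: object               # M tau

SIG = {'3': TConst('nat')}                       # term constants

def typeof(ctx, m):
    if isinstance(m, Var):   return ctx[m.name]                  # (var)
    if isinstance(m, Const): return SIG[m.name]                  # (cst)
    if isinstance(m, Lam):                                       # (-> Intro)
        return Arrow(m.ty, typeof({**ctx, m.var: m.ty}, m.body))
    if isinstance(m, App):                                       # (-> Elim)
        f = typeof(ctx, m.fn)
        assert isinstance(f, Arrow) and typeof(ctx, m.arg) == f.dom
        return f.cod
    if isinstance(m, TLam):                                      # (Pi Intro)
        return Pi(m.var, typeof(ctx, m.body))
    if isinstance(m, TApp):                                      # (Pi Elim)
        f = typeof(ctx, m.fn)
        assert isinstance(f, Pi)
        return tsubst(m.ty, f.var, f.body)
    raise TypeError(m)

# (lambda t: U1. lambda x: t. x) nat 3 : nat, as in Example 9.2.2
poly_id = TLam('t', Lam('x', TVar('t'), Var('x')))
print(typeof({}, App(TApp(poly_id, TConst('nat')), Const('3'))))
# -> TConst(name='nat')
```

Running the checker on the polymorphic identity alone yields the Π type Πt. t → t, mirroring the (Π Intro) step of the example.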

One potentially puzzling design decision is the containment U1 ⊆ U2. The main reason for this rule is that it simplifies both the use of the language and a number of technical details in its presentation. For example, by putting every τ: U1 into U2 as well, we can write a single Π-formation rule, instead of giving two separate cases for types starting in U1 or U2. An important part of this design decision is that U1 ⊆ U2 places no additional semantic constraints on the language. More specifically, if we remove the (U1 ⊆ U2) rule from the language definition, we are left with a system in which every U1 type is represented as a U2 type. This allows us to faithfully translate the calculus into the weaker version without U1 ⊆ U2. This is made more precise in Exercises 9.2.5 and 9.2.11, the latter using the equational axiom system. Another possible relationship that we do not impose is U1: U2, which would mean that "the universe of small types is itself a large type." Since the calculus does not have any significant operations on arbitrary U2 types, there does not seem to be any reason to take U1: U2. However, in predicative extensions of the language, it would be a reasonable design to put U1: U2. In an impredicative system with U1 = U2, the further assumption that U1: U2 would lead to type: type, which is often considered undesirable. However, we will see in Section 9.5.4 that with the Σ type-binding operator, used for writing ML "signatures," we will essentially be forced to have U1 as a member of U2.

Exercise 9.2.3 An alternate (var) rule is

Γ, x: σ, Γ′ context
---------------------  (var′)
Γ, x: σ, Γ′ ▷ x: σ

Show that this is a derivable rule, given (var), (add var) and the other rules listed in this section.

Exercise 9.2.4 Sketch derivations of the following typing judgements.

(a) ∅ ▷ λs: U1. λt: U1. λx: s → t. λy: s. x y : Πs: U1. Πt: U1. (s → t) → s → t
(b) s: U1, x: s, …
(c) s: U1, …

Exercise 9.2.5 This exercise asks you to show that for the purpose of typing terms, we have U1 ⊆ U2 even if we reformulate the system without the (U1 ⊆ U2) rule. If we drop the (U1 ⊆ U2) rule, then we need two (Π U2) rules, one for a U1 type and one for a U2 type:

Γ, t: U1 ▷ τ: U1
-------------------  (Π U2)1
Γ ▷ Πt: U1.τ : U2

Γ, t: U1 ▷ σ: U2
-------------------  (Π U2)2
Γ ▷ Πt: U1.σ : U2

(These could be written as a single rule using a meta-variable Ui representing either U1 or U2.) We should also rewrite the (Π Intro) rule as two rules:

Γ, t: U1 ▷ M: τ    Γ, t: U1 ▷ τ: U1
-------------------------------------  (Π Intro)1
Γ ▷ λt: U1.M : Πt: U1.τ

Γ, t: U1 ▷ M: σ    Γ, t: U1 ▷ σ: U2
-------------------------------------  (Π Intro)2
Γ ▷ λt: U1.M : Πt: U1.σ

Show that with these changes to (Π U2) and (Π Intro), any typing assertion Γ ▷ M: σ that can be derived using (U1 ⊆ U2) can also be derived without (U1 ⊆ U2).

9.2.2 Comparison with Other Forms of Polymorphism

Both the impredicative typed lambda calculus and the "type: type" calculus may be viewed as special cases of the predicative polymorphic calculus. The first arises by imposing the "universe equation" U1 = U2 and the second by imposing both the equation U1 = U2 and the condition U1: U2. This way of relating the three calculi is particularly useful when it comes to semantics, since it allows us to use the same form of structure as models for all three languages. We conclude this section with a simplified syntax that eliminates the need for type variables in contexts. This simplified syntax may also be used for the impredicative calculus, but breaks down with either type: type or the general product and sum operations added in Section 9.5.

Impredicative Calculus

Since we already have U1 ⊆ U2, we may obtain U1 = U2 by adding a syntax rule that forces the reverse inclusion, U2 ⊆ U1. The following rule accomplishes this.

Γ ▷ σ: U2
-----------  (U2 ⊆ U1)
Γ ▷ σ: U1

Example 9.2.6 A simple example which illustrates the effect of this rule is the term (I (Πt.t → t)) I, where I ≡ λt: U1. λx: t. x is the polymorphic identity. We will prove the syntactic judgement

∅ ▷ (I (Πt.t → t)) I : Πt.t → t.

As illustrated in Example 9.2.2, we can derive the typing judgement ∅ ▷ I: (Πt.t → t) using the typing rules of the predicative calculus, with (Πt.t → t): U2. Using the impredicative rule (U2 ⊆ U1), we may now derive (Πt.t → t): U1. This allows us to apply the polymorphic identity to the type Πt.t → t. According to the (Π Elim) typing rule, the type application I (Πt.t → t) has type [(Πt.t → t)/t](t → t), which is (Πt.t → t) → (Πt.t → t). Notice that this would not be a well-formed type expression in the predicative calculus, but it is a type expression of both universes (since they are the same) when we have (U2 ⊆ U1). Finally, by rule (→ Elim), the application (I (Πt.t → t)) I has type Πt.t → t.

Type : Type

The second extension of predicative polymorphism is to add both U1 = U2 and U1: U2. We have already seen that the syntax rule (U2 ⊆ U1) is sufficient to achieve U1 = U2. To make U1 an element of U2, we may add the typing axiom

Γ ▷ U1 : U2        (U1: U2)

Since there is no reason to distinguish between U1 and U2, we will often simply write U for either universe. There is some lack of precision in doing this, but the alternative involves lots of arbitrary decisions of when to write U1 and when to write U2. There is much more complexity to the syntax of this language than initially meets the eye. In particular, since the distinction between types and terms that have a type is blurred, the equational proof rules may enter into typing derivations in relatively complicated ways. Some hint of this is given in the following example.

Example 9.2.7 An example which illustrates the use of the type: type rule is the term

(λf: U → (U → U). λs: U. λt: U. λx: (f s t). x) (λs: U. λt: U. s) nat bool 3


where we assume nat, bool: U1 and 3: nat. Intuitively, the first part of this term is an identity function whose type depends on three parameters, the type function f: U → (U → U) and two type arguments s, t: U. This is applied to the type function λs: U. λt: U. s, two type arguments, and a natural number argument. The most interesting part of the type derivation is the way that equational reasoning is used. We begin by typing the left-most function. The type derivation for this subterm begins by showing that U1 → U1 → U1 is a type, introducing a variable f and adding variables s and t to the context.

∅ ▷ U1: U2                                        by axiom (U1: U2)
∅ ▷ U1: U1                                        (U2 ⊆ U1)
∅ ▷ U1 → U1: U1                                   (→ U1)
∅ ▷ U1 → U1 → U1: U1                              (→ U1)
f: U1 → U1 → U1 context                           (U1 type context)
f: U1 → U1 → U1 ▷ f: U1 → U1 → U1                 (var)
f: U1 → U1 → U1, s: U context                     (U1 context)
f: U1 → U1 → U1, s: U ▷ f: U1 → U1 → U1           (add var)
f: U1 → U1 → U1, s: U, t: U context               (U1 context)
f: U1 → U1 → U1, s: U, t: U ▷ f: U1 → U1 → U1     (add var)

We may derive s: U1 and t: U1 in the same context similarly, and use (→ Elim) twice to show

f: U1 → U1 → U1, s: U, t: U ▷ f s t : U1

It is then straightforward to form the identity function of this type, and lambda abstract the three variables in the context to obtain

∅ ▷ λf: U1 → U1 → U1. λs: U. λt: U. λx: f s t. x : Πf: U1 → U1 → U1. Πs: U1. Πt: U1. (f s t) → (f s t)

The interesting step comes after applying this function to the arguments (λs: U1. λt: U1. s): U1 → U1 → U1, nat: U1 and bool: U1 to obtain

∅ ▷ (λf: U1 → U1 → U1. λs: U1. λt: U1. λx: f s t. x)(λs: U1. λt: U1. s) nat bool : ((λs: U1. λt: U1. s) nat bool) → ((λs: U1. λt: U1. s) nat bool)

We must use equational reasoning to simplify both occurrences of (λs: U1. λt: U1. s) nat bool in the type of this application to nat. This is done using β-reduction to obtain

∅ ▷ (λs: U1. λt: U1. s) nat bool = nat : U1

followed by the type equality rule (type eq). Once we have the resulting typing assertion,

∅ ▷ (λf: U1 → U1 → U1. λs: U1. λt: U1. λx: f s t. x)(λs: U1. λt: U1. s) nat bool : nat → nat

it is clear that we may apply this function to 3: nat. The important thing to notice is that the type function λs: U1. λt: U1. s could have been replaced by a much more complicated expression, and that type checking requires the evaluation of a type function applied to type arguments.

Example 9.2.7 shows how equational reasoning enters into type: type typing derivations. Although we will not go into the proof, it can be shown by a rather subtle argument [Mar73, MR86] that with type: type, we may write very complex functions of type U1 → U1 → U1, for example. This allows us to construct functions like the one given in Example 9.2.7, where evaluating the actual function is very simple, but deciding whether it is well-typed is very complicated. Using essentially this idea, it may be shown that the set of well-typed terms with type: type is in fact undecidable [Mar73, MR86]. This definitely rules out an efficient compile-time type-checking algorithm for any programming language based on the type: type calculus.
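The point about evaluating type functions can be made concrete. With type: type, the type of a term may contain a redex such as (λs: U1. λt: U1. s) nat bool, so a checker must normalize type-level applications before comparing types. A minimal sketch, using our own tuple encoding of type expressions:

```python
# Normalizing type-level beta-redexes, as required in Example 9.2.7.
# Types: ('var', t) | ('const', b) | ('lam', t, body) | ('app', f, a).
# Substitution is naive: no capture arises in these examples.
def subst(body, var, val):
    tag = body[0]
    if tag == 'var':
        return val if body[1] == var else body
    if tag == 'lam':
        t, b = body[1], body[2]
        return body if t == var else ('lam', t, subst(b, var, val))
    if tag == 'app':
        return ('app', subst(body[1], var, val), subst(body[2], var, val))
    return body                          # constants

def normalize(ty):
    if ty[0] == 'app':
        f, a = normalize(ty[1]), normalize(ty[2])
        if f[0] == 'lam':                # type-level beta step
            return normalize(subst(f[2], f[1], a))
        return ('app', f, a)
    if ty[0] == 'lam':
        return ('lam', ty[1], normalize(ty[2]))
    return ty

# (lambda s: U1. lambda t: U1. s) nat bool  =  nat
K = ('lam', 's', ('lam', 't', ('var', 's')))
redex = ('app', ('app', K, ('const', 'nat')), ('const', 'bool'))
print(normalize(redex))                  # -> ('const', 'nat')
```

With arbitrarily complex type functions in place of K, this normalization step is exactly where type checking under type: type becomes arbitrarily hard.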

Simplified Syntax

A simplifying syntactic convention is to use two sorts of variables, distinguishing term variables x, y, z, ... from type variables r, s, t, ... by using different letters of the alphabet. In the simplified presentation, we assume that all type variables denote U1 types. A second convention is to use τ, τ′, τ1, ... for U1 types and σ, σ′, σ1, ... for U2 types. This allows us to give the syntax of U1 and U2 type expressions by the following grammar:

τ ::= t | b | τ → τ
σ ::= τ | Πt.σ

When we simplify the presentation of type expressions, we no longer need to include type variables in contexts, and we can drop the ordering on assumptions in contexts. The typing rules for terms may now be written using contexts of the form

Γ = {x1: σ1, ..., xk: σk}

consisting of pairs x: σ that each associate a type with a distinct variable. The rules for terms remain essentially the same as above, except that we may drop the explicit assumptions involving universes, since these follow from our notational conventions. To be absolutely clear, we give the entire simplified rules below.

Γ, x: σ ▷ x: σ                                    (var)

Γ ▷ c: σ      (c: σ a term constant)              (cst)

Γ, x: τ ▷ M: τ′
------------------------                          (→ Intro)
Γ ▷ (λx: τ.M) : τ → τ′

Γ ▷ M: τ → τ′    Γ ▷ N: τ
---------------------------                       (→ Elim)
Γ ▷ MN : τ′

Γ ▷ M: σ
------------------                                (Π Intro)  (t not free in Γ)
Γ ▷ λt.M : Πt.σ

Γ ▷ M: Πt.σ
------------------                                (Π Elim)
Γ ▷ Mτ : [τ/t]σ

The restriction "t not free in Γ" in (Π Intro) is derived from the ordering on contexts in the detailed presentation of syntax. Specifically, if we use the detailed presentation of syntax, then a term may contain a free type variable t only if the context contains t: U1. Since (Π Intro) for ordered contexts only applies when t: U1 occurs last in the context, we are guaranteed that t does not occur elsewhere in the context. In order for the simplified syntax to be accurate, we therefore need the side condition in (Π Intro) above. This side condition prevents meaningless terms such as {x: t} ▷ λt.x, in which it is not clear whether t is free or bound (see Exercise 9.2.10).

Exercise 9.2.8 Use induction on typing derivations to show that if Γ ▷ M: σ is derivable, then so is [τ/t]Γ ▷ [τ/t]M : [τ/t]σ.
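The substitution [τ/t]σ in this exercise must avoid capturing the bound variable of a Π type. A sketch in our own representation, renaming the bound variable whenever it occurs free in the substituted type:

```python
# Capture-avoiding type substitution [tau/t]sigma for the simplified
# syntax tau ::= t | b | tau -> tau,  sigma ::= tau | Pi t. sigma.
# Types: ('var', t) | ('const', b) | ('arrow', s, t) | ('pi', t, body).
import itertools

def free_vars(ty):
    tag = ty[0]
    if tag == 'var':   return {ty[1]}
    if tag == 'const': return set()
    if tag == 'arrow': return free_vars(ty[1]) | free_vars(ty[2])
    return free_vars(ty[2]) - {ty[1]}              # pi

def subst(tau, t, sigma):
    tag = sigma[0]
    if tag == 'var':   return tau if sigma[1] == t else sigma
    if tag == 'const': return sigma
    if tag == 'arrow':
        return ('arrow', subst(tau, t, sigma[1]), subst(tau, t, sigma[2]))
    bound, body = sigma[1], sigma[2]               # pi
    if bound == t:
        return sigma                               # t is shadowed
    if bound in free_vars(tau):                    # rename to avoid capture
        fresh = next(v for v in (f"{bound}{i}" for i in itertools.count())
                     if v not in free_vars(tau) | free_vars(body))
        body, bound = subst(('var', fresh), bound, body), fresh
    return ('pi', bound, subst(tau, t, body))

# [ (s -> s) / r ] (Pi s. r -> s): the bound s must be renamed so the
# substituted occurrences of s are not captured.
print(subst(('arrow', ('var', 's'), ('var', 's')), 'r',
            ('pi', 's', ('arrow', ('var', 'r'), ('var', 's')))))
```

Without the renaming step, the free s of the substituted type would be captured by the Π binder, which is exactly what the side conditions in the text rule out.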

9.2.3 Equational Proof System and Reduction

We will write terms using the simplified syntax presented at the end of the last section. Like other typed equations, equations have the form Γ ▷ M = N: σ, where M and N are terms of type σ in context Γ. The equational inference system is an extension of the simply-typed proof system, with additional axioms for type abstraction and application.

Γ ▷ λt.M = λs.[s/t]M : Πt.σ        (s not free in M)        (α)Π

Γ ▷ (λt.M)τ = [τ/t]M : [τ/t]σ                               (β)Π

Γ ▷ λt.(M t) = M : Πt.σ            (t not free in M)        (η)Π

Reduction rules are defined from (β)Π and (η)Π by directing these axioms from left to right. The proof system includes a reflexivity axiom and symmetry and transitivity rules to make provable equality an equivalence relation. Additional inference rules make provable equality a congruence with respect to all of the term-formation rules. For ordinary lambda abstraction and application, the appropriate rules are given in Chapter 4. For type abstraction and application we have the following congruence rules.

Γ ▷ M = N: σ
--------------------------
Γ ▷ λt.M = λt.N : Πt.σ

Γ ▷ M = N: Πt.σ
--------------------------
Γ ▷ Mτ = Nτ : [τ/t]σ

We consider type expressions that only differ in the names of bound variables to be equivalent. As a result, we do not use an explicit rule for renaming bound type variables in types that occur in equations or terms. As for other forms of lambda calculus, we can orient the equational axioms from left to right to obtain a reduction system. In more detail, we say that term M reduces to N, and write M → N, if N may be obtained by applying any of the reduction rules (β), (η), (β)Π, (η)Π to some subterm of M. For example, we have the reduction sequence

(λt.λx: t.x) τ y → (λx: τ.x) y → y,

the first step by (β)Π reduction and the second by ordinary (β) reduction. It is generally convenient to refer to both ordinary (β) and (β)Π as β, and similarly for η reduction. Most of the basic theorems about reduction in simply-typed lambda calculus generalize to the polymorphic calculus. For example, we may prove confluence and strong normalization for βη reduction on polymorphic terms. (See Exercise 9.2.12 below for an easy proof of strong normalization; confluence follows from an easy check for local confluence, using Lemma 3.7.32.) An important property of βη reduction, subject reduction, is that reduction preserves the type of a term.
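The two-step sequence above can be replayed mechanically. The following sketch (our own tuple encoding) implements ordinary (β) and type-level (β)Π as a single leftmost step function:

```python
# One-step reduction with ordinary (beta) and type-level (beta)_Pi,
# replaying (lambda t. lambda x: t. x) tau y -> (lambda x: tau. x) y -> y.
# Terms: ('var',x) | ('lam',x,ty,body) | ('app',m,n)
#      | ('tlam',t,body) | ('tapp',m,ty); type annotations: ('tvar',t).
def subst_term(m, v, n):
    tag = m[0]
    if tag == 'var':  return n if m[1] == v else m
    if tag == 'lam':
        return m if m[1] == v else ('lam', m[1], m[2], subst_term(m[3], v, n))
    if tag == 'tlam': return ('tlam', m[1], subst_term(m[2], v, n))
    if tag == 'app':  return ('app', subst_term(m[1], v, n), subst_term(m[2], v, n))
    if tag == 'tapp': return ('tapp', subst_term(m[1], v, n), m[2])
    return m

def subst_ty(m, t, tau):
    """Naive [tau/t] on annotations: handles annotations that are exactly
    the variable t, which is all this example needs."""
    tag = m[0]
    if tag == 'lam':
        ty = tau if m[2] == ('tvar', t) else m[2]
        return ('lam', m[1], ty, subst_ty(m[3], t, tau))
    if tag == 'tlam':
        return m if m[1] == t else ('tlam', m[1], subst_ty(m[2], t, tau))
    if tag == 'app':  return ('app', subst_ty(m[1], t, tau), subst_ty(m[2], t, tau))
    if tag == 'tapp':
        ty = tau if m[2] == ('tvar', t) else m[2]
        return ('tapp', subst_ty(m[1], t, tau), ty)
    return m

def step(m):
    """Leftmost reduction step, or None if no (beta)/(beta)_Pi redex."""
    if m[0] == 'app' and m[1][0] == 'lam':      # ordinary (beta)
        return subst_term(m[1][3], m[1][1], m[2])
    if m[0] == 'tapp' and m[1][0] == 'tlam':    # (beta)_Pi
        return subst_ty(m[1][2], m[1][1], m[2])
    if m[0] in ('app', 'tapp'):
        left = step(m[1])
        return None if left is None else (m[0], left) + tuple(m[2:])
    return None

poly_id = ('tlam', 't', ('lam', 'x', ('tvar', 't'), ('var', 'x')))
trace = [('app', ('tapp', poly_id, ('tvar', 'tau')), ('var', 'y'))]
while (nxt := step(trace[-1])) is not None:
    trace.append(nxt)
print(trace[-1])          # -> ('var', 'y')
```

The trace has exactly three terms, matching the (β)Π step followed by the ordinary (β) step in the text.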

Proposition 9.2.9 (Subject Reduction) Suppose Γ ▷ M: σ is a well-typed term and M →→ N by any number of reduction steps. Then Γ ▷ N: σ, i.e., N is a well-typed term of the same type.

The proof is similar to the proof of Lemma 4.4.16. The main lemmas are preservation of typing under term substitution (as in Lemma 4.3.6) and an analogous property of type substitution given as Exercise 9.2.8.

Exercise 9.2.10 Show that if the side condition "t not free in Γ" is dropped from rule (Π Intro), the Subject Reduction property fails. (Hint: Consider a term of the form (λt.M)τ where t is not free in M.)

Exercise 9.2.11 Exercise 9.2.5 shows that for the purpose of typing terms, the rule (U1 ⊆ U2) is unnecessary, provided we reformulate (Π U2) and (Π Intro) appropriately. This exercise asks you to show that under the same modifications of (Π U2) and (Π Intro), we will essentially have U1 ⊆ U2, i.e., for every U1 type there is an "essentially equivalent" U2 type.

Let τ: U1 be any type from the first universe, and let t be a variable that is not free in τ. Let unit be any U1 type that has a closed term *: unit. For any term M, let t_M be a type variable that does not occur free in M. The mappings i[ ] and j[ ] on terms are defined by i[M] ≡ λt_M: U1. M and j[M] ≡ M unit.

(a) Show that Γ ▷ i[M] : Πt_M: U1.τ whenever Γ ▷ M : τ.
(b) Show that Γ ▷ j[M] : τ whenever Γ ▷ M : Πt: U1.τ.
(c)
(d) Using i[ ] and j[ ], sketch a translation from the calculus with U1 ⊆ U2 into an equivalent expression that is typed without using U1 ⊆ U2. Explain briefly why this translation preserves equality and the structure of terms.
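The wrappers i[ ] and j[ ] are purely syntactic, and a (β)Π step undoes their composition. A small sketch in our own encoding, with unit assumed as in the exercise:

```python
# The maps of Exercise 9.2.11: i[M] = lambda t_M: U1. M (t_M fresh)
# and j[M] = M unit; j[i[M]] beta_Pi-reduces back to M.
def i(m, fresh='t0'):
    """Vacuous type abstraction; fresh must not occur free in m."""
    return ('tlam', fresh, m)

def j(m):
    """Type application to the assumed U1 constant unit."""
    return ('tapp', m, ('tconst', 'unit'))

def beta_pi(m):
    """One (beta)_Pi step; the bound type variable is vacuous here,
    so no substitution into the body is needed."""
    assert m[0] == 'tapp' and m[1][0] == 'tlam'
    return m[1][2]

x = ('var', 'x')
print(beta_pi(j(i(x))))   # -> ('var', 'x')
```

This is the syntactic core of part (d): wrapping every U1-typed subterm with i[ ] and unwrapping with j[ ] changes types from τ to Πt_M: U1.τ without disturbing the term structure.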

Exercise 9.2.12 Use the strong normalization property of the simply typed lambda calculus to prove that βη reduction in the polymorphic calculus is strongly normalizing. It may be useful to show first that if a term M has a subterm of the form xτ, then x must be a free variable of M.

9.2.4 Models of Predicative Polymorphism

It is basically straightforward to extend the definitions of applicative structure and Henkin model to the predicative calculus, and to prove analogous type soundness, equational soundness and equational completeness theorems. In this section, we give a short presentation of applicative structures and Henkin models, leaving it to the reader to prove soundness and completeness theorems if desired. Since impredicative and type: type polymorphism are derivable from predicative polymorphism by imposing additional relationships between universes, models of impredicative and type: type calculi are special cases of the more general predicative structures defined below. A category-theoretic framework for polymorphism may be developed using hyperdoctrines, relatively cartesian closed categories [Tay87], or locally cartesian closed categories [See84]. However, none of the connections between these categories and polymorphic calculi are as straightforward as the categorical equivalence between the simply-typed lambda calculus and CCCs described in Chapter 7. An interesting choice in giving semantics lies in the interpretation of the containment U1 ⊆ U2. While it seems syntactically simpler to view every element of U1 as an element of U2, there may be some semantic advantages to interpreting U1 ⊆ U2 as meaning that U1 may be embedded in U2. With appropriate assumptions about the inclusion mapping from U1 to U2, this seems entirely workable, and leads to a more flexible model definition than a literal set-theoretic interpretation of U1 ⊆ U2. For simplicity, however, we will interpret U1 ⊆ U2 semantically as literal set-theoretic containment. This assumption is convenient when we come to impredicative and type: type calculi, since in both cases we will assume U1 = U2 anyway.

Henkin Models for Polymorphic Calculi

Since the predicative calculus has two collections of types, U1 and U2, with U1 ⊆ U2, a model A will have two sets U1^A and U2^A with U1^A ⊆ U2^A. For each element a ∈ U2^A, we will also have a set Dom^a of elements of type a. This also gives us a set Dom^a for each a ∈ U1^A. In addition, we need some way to interpret type expressions with → and Π as elements of U1^A or U2^A, and some machinery to interpret function application. The following model definition is in the same spirit as the Henkin models for the impredicative calculus developed in [BMM90]. An applicative structure A is a tuple

A = ⟨U, dom, {App^{a,b}}, {App^f}, I⟩,

where

• U = ⟨U1^A, U2^A, →^A, [U1^A → U2^A], Π^A, I_type⟩ specifies sets U1^A ⊆ U2^A, a binary operation →^A on U1^A, a set [U1^A → U2^A] of functions from U1^A to U2^A, a map Π^A from [U1^A → U2^A] to U2^A, and a map I_type from type constants to U1^A.

• dom = {Dom^a | a ∈ U2^A} is a collection of sets indexed by types of the structure.

• {App^{a,b}} ∪ {App^f} is a collection of application maps, with one

  App^{a,b} : Dom^{a →^A b} → (Dom^a → Dom^b)

  for every pair of types a, b ∈ U1^A from the first universe, mapping elements of Dom^{a →^A b} to functions from Dom^a to Dom^b. Similarly, each App^f, for f a function mapping the first universe to the second, must be a function

  App^f : Dom^{Π^A f} → Π_{a ∈ U1^A} Dom^{f(a)}

  from Dom^{Π^A f} into the cartesian product Π_{a ∈ U1^A} Dom^{f(a)}.

• I assigns a value to each constant symbol, with I(c) ∈ Dom^{⟦τ⟧} if c is a constant of type τ. Here ⟦τ⟧ is the meaning of type expression τ, defined below. Recall that the type of a constant must not contain any free type variables.

This concludes the definition. An applicative structure is extensional if every App^{a,b} and App^f is one-to-one. As in Chapter 4, we define Henkin models by placing an additional constraint on extensional applicative structures. Specifically, we give inductive definitions of the meanings of type expressions and terms below. If, for all type expressions, terms and environments, these clauses define a total meaning function, then we say that the structure is a Henkin model.

If A is a frame, then an A-environment is a mapping

η : Variables → U1^A ∪ ⋃_{a ∈ U2^A} Dom^a

such that for every type variable t, we have η(t) ∈ U1^A. The meaning ⟦σ⟧η of a type expression σ in environment η is defined inductively as follows:

⟦t⟧η = η(t)
⟦b⟧η = I_type(b)
⟦τ → τ′⟧η = ⟦τ⟧η →^A ⟦τ′⟧η
⟦Πt.σ⟧η = Π^A(λa ∈ U1^A. ⟦σ⟧η[a/t])

In the Π case, recall that Π^A is a map from [U1^A → U2^A] to U2^A. For a frame to be a model, the domain [U1^A → U2^A] of Π^A must contain, for each type expression σ and environment η, the function λa ∈ U1^A. ⟦σ⟧η[a/t] obtained by treating σ as a function of t. If Γ is a context, then η satisfies Γ, written η ⊨ Γ, if η(x) ∈ Dom^{⟦σ⟧η} for every x: σ ∈ Γ. The meaning of a term Γ ▷ M: σ in environment η ⊨ Γ is defined by induction as follows:

⟦Γ ▷ x: σ⟧η = η(x)

⟦Γ ▷ MN: τ′⟧η = App^{a,b} (⟦Γ ▷ M: τ → τ′⟧η) (⟦Γ ▷ N: τ⟧η), where a = ⟦τ⟧η and b = ⟦τ′⟧η

⟦Γ ▷ λx: τ.M : τ → τ′⟧η = the unique f ∈ Dom^{a →^A b} with App^{a,b} f d = ⟦Γ, x: τ ▷ M: τ′⟧η[d/x] for all d ∈ Dom^a, where a = ⟦τ⟧η and b = ⟦τ′⟧η

⟦Γ ▷ Mτ : [τ/t]σ⟧η = App^f (⟦Γ ▷ M: Πt.σ⟧η) (⟦τ⟧η), where f(a) = ⟦σ⟧η[a/t] for all a ∈ U1^A

⟦Γ ▷ λt: U1.M : Πt.σ⟧η = the unique g ∈ Dom^{Π^A f} with App^f g a = ⟦Γ, t: U1 ▷ M: σ⟧η[a/t] for all a ∈ U1^A, where f(a) = ⟦σ⟧η[a/t] for all a ∈ U1^A

An extensional applicative structure is a Henkin model if ⟦Γ ▷ M: σ⟧η exists, as defined above, for every well-typed term Γ ▷ M: σ and every η ⊨ Γ. An impredicative applicative structure or Henkin model is one with U1^A = U2^A. A type: type applicative structure or Henkin model is one with U1^A = U2^A and U1^A = Dom^a for some specified a ∈ U1^A.

Soundness and Completeness

It is easy to show that in every model, the meaning of each term has the correct semantic type.

Lemma 9.2.13 (Type Soundness) Let A be a model, Γ ▷ M: σ a well-typed term, and η ⊨ Γ an environment. Then ⟦Γ ▷ M: σ⟧η ∈ Dom^{⟦σ⟧η}.

A straightforward induction shows that the equational proof system is sound, and term model constructions may be used to show that the equational proof system is complete (when a (nonempty) rule as in Chapter 4 is added) for models that do not have empty types. Additional equational completeness theorems may be proved for models that may have empty types [MMMS87], and for Kripke-style models.

Predicative Model Construction

We will see additional examples of models in Section 9.3, which is concerned with impredicative and type: type models. One difference between the predicative, impredicative and type: type languages is that the predicative calculus has classical set-theoretic models, while the other languages do not (see Section 9.3.2). In fact, any model of the simply-typed lambda calculus may be extended to a model of the predicative calculus by the following simple set-theoretic construction.

If we begin with some model A = ⟨U1, dom, {App^{a,b}}, I⟩ for the U1 types and terms, we can extend this to a model for the full calculus using standard set-theoretic cartesian product. For any ordinal α, we define the set [U2]_α as follows:

[U2]_0     = U1
[U2]_{β+1} = [U2]_β ∪ { Π_{a ∈ U1} f(a) | f : U1 → [U2]_β }
[U2]_α     = ⋃_{β<α} [U2]_β        for limit ordinal α

Note that for any α, the set [U2]_{α+1} contains all cartesian products indexed by functions from U1 to [U2]_α.

The least limit ordinal ω actually gives us a model, with U2 = [U2]_ω = ⋃_k [U2]_k. The reason for this is that every U2 type expression σ is of the form σ ≡ Πt1: U1. ... Πtk: U1. τ for some τ: U1. It is easy to show that any type with k occurrences of Π has a meaning in [U2]_k. Consequently, every type expression has a meaning in [U2]_ω. This is proved precisely in the lemma below. To shorten the statement of the lemma, we let A_k be the structure obtained from a model A by taking U2 = [U2]_k and [U1 → U2] = …
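The products used to build [U2]_{β+1} can be pictured with a small finite stand-in for U1 (all names illustrative). An element of Π_{a ∈ U1} f(a) is a choice function picking, for each a ∈ U1, an element of f(a); with two U1 types whose domains have 2 and 3 elements and f the identity, there are 2 · 3 = 6 such functions.

```python
# Finite sketch of the product Pi_{a in U1} f(a) used for [U2]_{k+1}.
from itertools import product

# Two U1 types with their (finite) domains of elements.
dom = {'A': ['a1', 'a2'], 'B': ['b1', 'b2', 'b3']}
U1 = list(dom)

def pi(f):
    """All choice functions g with g(a) in f(a) for each a in U1."""
    return [dict(zip(U1, choice))
            for choice in product(*(f(a) for a in U1))]

identity = lambda a: dom[a]          # f : U1 -> [U2]_0, here the identity
elements = pi(identity)
print(len(elements))                 # -> 6
print(elements[0])                   # a choice function, e.g. one element per type
```

Each further level [U2]_{k+1} just closes under this product operation once more, which is why k nested occurrences of Π land in [U2]_k.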

9.5 General Products, Sums and Program Modules

  val x_coord : point -> real
  val y_coord : point -> real
  val move_p  : point * real * real -> point
end;

signature Circle =
sig
  include Point
  type circle
  val mk_circle : point * real -> circle
  val center    : circle -> point
  val radius    : circle -> real
  val move_c    : circle * real * real -> circle
end;

signature Rect =
sig
  include Point
  type rect
  (* make rectangle from lower left, upper right corners *)
  val mk_rect : point * point -> rect
  val lleft   : rect -> point
  val uright  : rect -> point
  val move_r  : rect * real * real -> rect
end;

signature Geom =
sig
  include Circle
  (* include Rect, except Point, by hand *)
  type rect
  val mk_rect : point * point -> rect
  val lleft   : rect -> point
  val uright  : rect -> point
  val move_r  : rect * real * real -> rect
  (* Bounding box of circle *)
  val bbox : circle -> rect
end;


For Point, Circle and Rect, we give structures explicitly.

structure pt : Point =
struct
  type point = real*real
  fun mk_point(x,y) = (x,y)
  fun x_coord(x,y) = x
  fun y_coord(x,y) = y
  fun move_p((x,y):point,dx,dy) = (x+dx, y+dy)
end;

structure cr : Circle =
struct
  open pt
  type circle = point*real
  fun mk_circle(x,y) = (x,y)
  fun center(x,y) = x
  fun radius(x,y) = y
  fun move_c(((x,y),r):circle,dx,dy) = ((x+dx, y+dy),r)
end;

structure rc : Rect =
struct
  open pt
  type rect = point * point
  fun mk_rect(x,y) = (x,y)
  fun lleft(x,y) = x
  fun uright(x,y) = y
  fun move_r(((x1,y1),(x2,y2)):rect,dx,dy) = ((x1+dx,y1+dy),(x2+dx,y2+dy))
end;

The geometry structure is constructed using a functor that takes Circle and Rect structures as arguments. The signatures Circle and Rect both name a type called point. These two must be implemented in the same way, which is stated explicitly in the "sharing" constraint on the formal parameters. Although we will not include sharing constraints in our type-theoretic account of the SML module system, it is generally possible to eliminate uses of sharing (such as the one here) by adding additional parameters to structures, in this case turning circle and rectangle structures into functors that both require a point structure. However, as pointed out in [Mac86], this may be cumbersome for large programs.

functor geom(
    structure c : Circle
    structure r : Rect
    sharing type r.point = c.point) : Geom =
struct
  type point = c.point
  val mk_point = c.mk_point
  val x_coord = c.x_coord
  val y_coord = c.y_coord
  val move_p = c.move_p

  type circle = c.circle
  val mk_circle = c.mk_circle
  val center = c.center
  val radius = c.radius
  val move_c = c.move_c

  type rect = r.rect
  val mk_rect = r.mk_rect
  val lleft = r.lleft
  val uright = r.uright
  val move_r = r.move_r

  (* Bounding box of circle *)
  fun bbox(c) =
    let val x = x_coord(center(c))
        and y = y_coord(center(c))
        and r = radius(c)
    in
      mk_rect(mk_point(x-r,y-r), mk_point(x+r,y+r))
    end
end;
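The same organization can be mimicked dynamically in Python, reading a structure as a dictionary of operations and a functor as a function from structures to structures. This is only an analogy: Python checks nothing statically, and the sharing constraint becomes the caller's obligation. It does show how geom merges its two arguments and how bbox is computed; the names follow the SML example.

```python
# Structures as dicts, functors as functions: a loose Python analogue
# of the SML geometry example (no static signature checking).
pt = {
    'mk_point': lambda x, y: (x, y),
    'x_coord':  lambda p: p[0],
    'y_coord':  lambda p: p[1],
    'move_p':   lambda p, dx, dy: (p[0] + dx, p[1] + dy),
}

cr = dict(pt, **{
    'mk_circle': lambda p, r: (p, r),
    'center':    lambda c: c[0],
    'radius':    lambda c: c[1],
    'move_c':    lambda c, dx, dy: ((c[0][0] + dx, c[0][1] + dy), c[1]),
})

rc = dict(pt, **{
    'mk_rect': lambda ll, ur: (ll, ur),
    'lleft':   lambda r: r[0],
    'uright':  lambda r: r[1],
    'move_r':  lambda r, dx, dy: ((r[0][0] + dx, r[0][1] + dy),
                                  (r[1][0] + dx, r[1][1] + dy)),
})

def geom(c, r):
    """Functor: combine a Circle structure and a Rect structure.
    The 'sharing' constraint (both built over the same pt) is the
    caller's obligation here."""
    def bbox(circle):
        x = c['x_coord'](c['center'](circle))
        y = c['y_coord'](c['center'](circle))
        rad = c['radius'](circle)
        return r['mk_rect'](pt['mk_point'](x - rad, y - rad),
                            pt['mk_point'](x + rad, y + rad))
    return dict(c, **{k: r[k] for k in ('mk_rect', 'lleft', 'uright', 'move_r')},
                bbox=bbox)

g = geom(cr, rc)
circle = g['mk_circle'](g['mk_point'](1.0, 2.0), 3.0)
print(g['bbox'](circle))   # -> ((-2.0, -1.0), (4.0, 5.0))
```

Since Python resolves everything at run time, nothing prevents passing a rect structure built over a different point representation, which is exactly the error the SML sharing constraint rules out statically.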

9.5.2 Predicative Calculus with Products and Sums

Syntax

In this section we extend the predicative function calculus by adding general sums and general products (from Section 9.1.3). While general sums are closely related to structures, and general cartesian products seem necessary to capture dependently-typed functors, the resulting language is somewhat more general than Standard ML. For example, while an ML structure may contain polymorphic functions, there is no direct way to define a polymorphic structure (i.e., a structure that is parametric in a type) in the implicitly-typed programming language. This is simply because there is no provision for explicit binding of type variables. However, polymorphic structures can be "simulated" in ML by a functor whose parameter is a structure containing only a type binding. By virtue of the uniformity of the language definition, there will be no restriction on the types of things that can be made polymorphic. For similar reasons, the extended calculus will have expressions corresponding to higher-order functors and functor signatures. Neither of these were included in the original design of the ML module system [Mac85], but both were added after the type-theoretic analysis given here was developed [Mac86, HM93].

Unfortunately, general products and sums complicate the formalization considerably. Since a structure may appear in a type expression, for example, it is no longer possible to describe the well-formed type expressions in isolation from the elements of those types. This also makes the well-formed contexts difficult to define. Therefore, we cannot use the simplified presentation given in Section 9.2.2; we must revert to the general presentation of Section 9.2.1 that uses inference rules for determining the well-formed contexts, types and terms. The un-checked pre-terms are given by the grammar

M ::= U1 | U2 | b | M → M | Πx: M.M | Σx: M.M
    | x | c | λx: M.M | M M
    | ⟨x: M=M, M: M⟩ | first(M) | second(M)

Intuitively, the first row of this grammar gives the form of type expressions, the second row the expression forms of the underlying function calculus, and the third row the expression forms associated with general sums. However, these three classes of expressions are interdependent; the precise definition of each class is given by the axioms and inference rules below. We will use M, N, P, σ and τ for pre-terms, using σ and τ when the term is intended to be a type. As in our presentation of existential types, we use a syntactic form ⟨x: σ=M, N: σ′⟩, with the variable x bound in σ′. In addition, x is bound in the second component N, so that the "dependent pair" ⟨x: σ=M, N: σ′⟩ is equal to ⟨x: σ=M, [M/x]N: σ′⟩, assuming x is not free in N. Unlike existential types, general sums have unrestricted first and second projection functions that allow us to access either component of a pair directly. In this respect, pairs of general sum type are closer to the ordinary pairs associated with cartesian product types. Although it is meaningful to extend the language with a polymorphic let declaration, let is definable using abstraction over polymorphic types in U2. (See Exercise 9.5.6.) The typing rules are an extension of the rules given in Section 9.2.1. In addition to general sums, the extended calculus allows lambda abstraction over U2 types, resulting in expressions with Π-types. The associated syntax is characterized by the following three rules, the first giving the form of Π-types and the second and third the associated term forms.

Γ, x: σ ▷ σ′: U2
--------------------  (Π U2)
Γ ▷ Πx: σ.σ′ : U2

Γ, x: σ ▷ M: σ′
------------------------  (Π Intro)
Γ ▷ λx: σ.M : Πx: σ.σ′

Γ ▷ M: Πx: σ.σ′    Γ ▷ N: σ
-----------------------------  (Π Elim)
Γ ▷ M N : [N/x]σ′

It is an easy exercise (Exercise 9.5.3) to show that the weaker (Π U2) rule in Section 9.2.1 is subsumed by the one given here. The four rules below give the conditions for forming Σ-types and the associated term forms.

Γ, x: σ ▷ σ′: U2
--------------------  (Σ U2)
Γ ▷ Σx: σ.σ′ : U2

Γ ▷ M: σ    Γ ▷ [M/x]N : [M/x]σ′
----------------------------------  (Σ Intro)
Γ ▷ ⟨x: σ=M, N: σ′⟩ : Σx: σ.σ′

Γ ▷ M: Σx: σ.σ′
------------------  (Σ Elim 1)
Γ ▷ first(M) : σ

Γ ▷ M: Σx: σ.σ′
----------------------------------  (Σ Elim 2)
Γ ▷ second(M) : [first(M)/x]σ′

These rules require the equational theory below. A simple typing example that does not require equational reasoning is the derivation of

x: (Σt: U1.t), y: first x → first x, z: first x ▷ y z : first x


With a "structure" variable x: (Σt: U1.t) in the context, first x is a U1 type expression, and so we may form the U1 type first x → first x. The rest of the typing derivation is routine. The following example shows how equational reasoning may be needed in typing derivations. The general form of the type equality rule is the same as in Section 9.2.1. However, the rules for deriving equations between types give us a much richer form of equality than the simple syntactic equality (with renaming of bound variables) used in the earlier calculi:

Example 9.5.2  We can see how equational reasoning is used in type derivations by introducing a "type-transparent" form of let declaration,

letΣ x: σ=M in N   ≡   second⟨x: σ=M, N⟩

The main idea is that in a "dependent pair" ⟨x: σ=M, N⟩, occurrences of x in N are bound to M. Moreover, this is incorporated into the (Σ Intro) rule so that N is typed with each occurrence of x replaced by M. The typing properties of this declaration form are illustrated by the expression

letΣ x: (Σt: U1.t) = ⟨t: U1=int, 3: t⟩ in (λz: int. z)(second x)
  ≡ second⟨x: (Σt: U1.t)=⟨t: U1=int, 3: t⟩, (λz: int. z)(second x)⟩

It is relatively easy to show that the expression ⟨t: U1=int, 3: t⟩ bound to x has type Σt: U1.t, since we assume int: U1 and 3: int are given. The next step is to show that the larger pair

⟨x: (Σt: U1.t)=⟨t: U1=int, 3: t⟩, (λz: int. z)(second x)⟩

has type Σx: (Σt: U1.t).int. Pattern matching against the (Σ Intro) rule, we can see that it suffices to show

[M/x]((λz: int. z)(second x)) : [M/x]int

where M ≡ ⟨t: U1=int, 3: t⟩. By (Σ Elim 2), the type of second M is

second⟨t: U1=int, 3: t⟩ : first⟨t: U1=int, 3: t⟩

Using the equational axiom (Σ first) below, we may simplify first⟨t: U1=int, 3: t⟩ to int. This equational step is critical since there is no other way to conclude that the type of second M is int.

Equations and Reduction  The equational proof system for λΠ,Σ contains the axioms and rules for λ→,∀ (with each axiom or rule scheme interpreted as applying to all λΠ,Σ terms), described in Section 9.2.3, plus the rules listed below.


The equational axioms for pairing and projection are similar to the ones for cartesian products in earlier chapters, except for the way the first component is substituted into the second in the (Σ second) rule.

Γ ⊢ first⟨x: σ=M, N: σ'⟩ = M : σ                     (Σ first)

Γ ⊢ second⟨x: σ=M, N: σ'⟩ = [M/x]N : [M/x]σ'         (Σ second)

Γ ⊢ ⟨x: σ=first(M), second(M): σ'⟩ = M : Σx: σ.σ'    (Σ sp)

Note that since σ or σ' may be U1, axioms (Σ first) and (Σ second) may yield equations between types. Since type equality is no longer simply syntactic equality, we need the congruence rules for type equality below. Since all types are elements of U2, we may use the general reflexivity axiom and symmetry and transitivity rules mentioned in Section 9.2.3 for equations between types (see Exercise 9.5.4).
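Read left to right, the (Σ first) and (Σ second) axioms act as reduction rules on pair terms. The following Python sketch is our own illustration of that reading; the tuple encoding of terms is an assumption for illustration, not part of the calculus definition.

```python
# Dependent pairs <x: sigma = M, N> are modeled as ("pair", x, M, N);
# terms are nested tuples whose first element is a tag.

def subst(term, var, value):
    """Replace occurrences of the variable `var` in `term` by `value`."""
    if term == ("var", var):
        return value
    if isinstance(term, tuple):
        return tuple(subst(part, var, value) if isinstance(part, tuple)
                     else part for part in term)
    return term

def reduce_proj(term):
    """One step of (Sigma first) / (Sigma second), directed left to right."""
    if term[0] == "first" and term[1][0] == "pair":
        return term[1][2]                     # first<x=M, N>  ->  M
    if term[0] == "second" and term[1][0] == "pair":
        _, x, m, n = term[1]
        return subst(n, x, m)                 # second<x=M, N> ->  [M/x]N
    return term

pair = ("pair", "t", ("var", "int"), ("num", 3))   # <t: U1 = int, 3: t>
assert reduce_proj(("first", pair)) == ("var", "int")
assert reduce_proj(("second", pair)) == ("num", 3)

# When x occurs in the second component, the substitution is visible:
pair2 = ("pair", "t", ("var", "int"), ("var", "t"))
assert reduce_proj(("second", pair2)) == ("var", "int")
```

The last assertion mirrors the point of Example 9.5.2: projecting the second component substitutes the first component for the bound variable.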

Γ ⊢ σ1 = σ1' : U2    Γ, x: σ1 ⊢ σ2 = σ2' : U2
---------------------------------------------  (Π Cong)
Γ ⊢ Πx: σ1.σ2 = Πx: σ1'.σ2' : U2

Γ ⊢ σ1 = σ1' : U2    Γ, x: σ1 ⊢ σ2 = σ2' : U2
---------------------------------------------  (Σ Cong)
Γ ⊢ Σx: σ1.σ2 = Σx: σ1'.σ2' : U2

The congruence rules for terms are essentially routine, except that when a term form includes types, we must account for type equations explicitly. To avoid confusion, we list the rules for terms of Π and Σ type here.

Γ ⊢ σ = σ'' : U2    Γ, x: σ ⊢ M = M' : σ'
-----------------------------------------
Γ ⊢ λx: σ.M = λx: σ''.M' : Πx: σ.σ'

Γ ⊢ σ = σ'' : U2    Γ, x: σ ⊢ σ' = σ''' : U2    Γ ⊢ M = M' : σ    Γ ⊢ N = N' : [M/x]σ'
---------------------------------------------------------------------------------------
Γ ⊢ ⟨x: σ=M, N: σ'⟩ = ⟨x: σ''=M', N': σ'''⟩ : Σx: σ.σ'

Γ ⊢ M = M' : Σx: σ.σ'
------------------------------
Γ ⊢ first(M) = first(M') : σ


Γ ⊢ M = M' : Σx: σ.σ'
--------------------------------------------
Γ ⊢ second(M) = second(M') : [first(M)/x]σ'

Since contexts are more complicated than in the simplified presentation of λ→,∀, the (add var) rule should be written

Γ ⊢ M = N : σ    Γ, x: A context
--------------------------------  (add var)
Γ, x: A ⊢ M = N : σ

so that only well-formed contexts are used.

If we direct the equational axioms from left to right, we obtain a reduction system of the form familiar from other systems of lambda calculus. Strong normalization for λΠ,Σ may be proved using a translation into Martin-Löf's 1973 system [Mar73]. It follows that equality of λΠ,Σ terms is decidable. It is an interesting open problem to develop a theory of logical relations for full λΠ,Σ, a task that is complicated by the presence of general Σ and Π types.

Exercise 9.5.3  Show that the (Π U2) rule in Section 9.2.1 is subsumed by the (Π U2) rule given in this section, in combination with the other rules of λΠ,Σ.

Exercise 9.5.4  A general reflexivity axiom that applies to types as well as terms that have a U1 or U2 type is

Γ ⊢ σ = σ : τ    (refl)

assuming that σ may be any well-formed expression of λΠ,Σ. Write general symmetry and transitivity rules in the same style and write out a proof of the equation

s: U1, t: U1 ⊢ second⟨r: U1 = s, t: U1⟩ = t : U1

between U1 type expressions.

Exercise 9.5.5  Show that if Γ ⊢ M: Σx: σ.σ' is derivable, then the expression ⟨x: σ=first M, second M: σ'⟩ used in the (Σ sp) axiom has type Σx: σ.σ'.

Exercise 9.5.6  Show that if we consider let x: σ = M in N as syntactic sugar for (λx: σ. N)M, then the typing rule for let in Section 9.2.5 is derivable in λΠ,Σ.

Exercise 9.5.7  In some of the congruence rules, it may not be evident that the terms proved equal have the same syntactic type. A representative case is the type congruence rule for lambda abstraction, which assumes that if Γ, x: σ ⊢ M: σ' and Γ ⊢ σ = σ'': U2 are derivable, then so is Γ ⊢ λx: σ''. M: Πx: σ.σ'. Prove this by showing that for any M, if Γ ⊢ σ = σ'': U2 and Γ, x: σ ⊢ M: σ' are derivable, then so is Γ, x: σ'' ⊢ M: σ'.

9.5.3

Representing Modules with Products and Sums

General sums allow us to write expressions for structures and signatures, provided we regard environments as tuples whose components are accessed by projection functions. For example, the structure

struct
  type t = int
  val x : t = 3
end

may be viewed as the pair ⟨t: U1=int, 3: t⟩. In λΠ,Σ the components t and x are retrieved by projection functions, so that S.x is regarded as an abbreviation for second(S). With general sums we can represent the signature

sig type t val x : t end

as the type Σt: U1.t, which has the pair ⟨t: U1=int, 3: int⟩ as a member. The representation of structures by unlabeled tuples is adequate in the sense that it is a simple syntactic translation to replace qualified names by expressions involving projection functions. A limitation that may interest ML experts is that this way of viewing structures does not provide any natural interpretation of ML's open, which imports declarations from the given structure into the current scope. Since general products allow us to type functions from any collection to any other collection, we can write functors as elements of product types. For example, the functor F from Section 9.5.1 is defined by the expression

λS: (Σt: U1.t). ⟨s: U1 = first(S) × first(S), ⟨second(S), second(S)⟩: s⟩,

which has type

ΠS: (Σt: U1.t). (Σs: U1.s).

An important aspect of Standard ML is that signature and functor declarations may only occur at "top level," which means they cannot be embedded in other constructs, and structures may only be declared inside other structures. Furthermore, recursive declarations are not allowed. Consequently, it is possible to treat signature, structure, and functor declarations by simple macro expansion. An alternative, illustrated in Example 9.5.8 below, is to treat declarations using the type-transparent letΣ bindings of Example 9.5.2, which are based on general sums. The first step in translating an ML program into λΠ,Σ is to replace all occurrences of signature, structure and functor identifiers with the corresponding λΠ,Σ expressions. Then, type expressions may be simplified using the type equality rule of Section 9.2.1, as required. With the exception of "generativity," which we do not consider, this process models elaboration during the ML type checking phase fairly accurately.

Polymorphism and Modularity

698

Example 9.5.8  We illustrate the general connection between Standard ML and λΠ,Σ using the example program

structure S =
  struct
    type t = int
    val x : t = 7
  end;
S.x + 3

The λΠ,Σ expression for the structure bound to S is

⟨t: U1=int, 7: t⟩ : Σt: U1.t

If we treat SML structure declarations by macro expansion, then this program is equivalent to

second S + 3  ≡  second⟨t: U1=int, 7: t⟩ + 3

We may type this program by observing that the type of second S is first⟨t: U1=int, 7: t⟩, which may be simplified to int. Since structures in Standard ML are transparent, in the sense that the components of a structure are fully visible, structure declarations may be represented using the transparent letΣ bindings of Example 9.5.2. In this view, the program above is treated as the term

letΣ S: Σt: U1.t = ⟨t: U1=int, 7: t⟩ in second S + 3

which has a syntactic structure closer to the original program. As explained in Example 9.5.2, the typing rules for general sums allow us to use the fact that first(S) is equivalent to int. As mentioned earlier in passing, the λΠ,Σ calculus is more general than Standard ML in two apparent respects. The first is that since there is no need to require that subexpressions of types be closed, we are able to write explicitly-typed functors with nontrivial dependent types. The second is that due to the uniformity of the language, we have a form of higher-order functors.
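The macro-expansion reading of the example can be mimicked in any language where types are first-class values. The following Python sketch is our own encoding, not part of λΠ,Σ: a structure is a dependent pair whose first component is a type and whose second component is a value of that type.

```python
# A structure is modeled as a dependent pair: its first component is a
# type (an ordinary Python value), its second a value of that type.
S = (int, 7)                       # <t: U1 = int, 7: t>

def first(pair):
    return pair[0]

def second(pair):
    return pair[1]

# The qualified name S.x abbreviates second(S), so the program
# "S.x + 3" macro-expands to:
result = second(S) + 3

assert first(S) is int             # first<t = int, 7> simplifies to int
assert isinstance(second(S), first(S))
assert result == 10
```

The point of the sketch is that typing second(S) requires knowing first(S), exactly the equational step used in the typing derivation above.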

9.5.4

Predicativity and the Relationship between Universes

Although each of the constructs of λΠ,Σ corresponds to a particular part of Standard ML, the lambda calculus allows arbitrary combinations of these constructs, and straightforward extensions like higher-order functor expressions. While generalizing λ→,∀ in certain


ways that seem syntactically and semantically natural, λΠ,Σ maintains the ML distinction between monomorphic and polymorphic types by keeping U1 and U2 distinct. The restrictions imposed by universes are essential to the completeness of type inference (discussed in Chapter 11) and have the technical advantage of leading to far simpler semantic model constructions, for example. However, it may seem reasonable to generalize ML polymorphism by lifting the universe restrictions (to obtain an impredicative calculus), or alter the design decisions U1 ⊆ U2 and U1: U2. Exercises 9.2.5 and 9.2.11 show that the decision to take U1 ⊆ U2 is only a syntactic convenience, with no semantic consequence. In this section we show that the decision to take U1: U2 is essentially forced by the other constructs of λΠ,Σ, and that in the presence of structures and functors, the universe restrictions are essential if we wish to avoid a type of all types.

Strong Sums and U1: U2

In λ→,∀ we have U1 ⊆ U2, but not U1: U2. However, when we added general product and sum types, we also made the assumption that U1: U2. The reasons for this are similar to the reasons for taking U1 ⊆ U2: it makes the syntax more flexible, simplifies the technical presentation, and does not involve any unnecessary semantic assumptions. A precise statement is spelled out in the following proposition.

Proposition 9.5.9  In any fragment of λΠ,Σ which is closed under the term formation rules for types of the form Σt: U1.τ, where unit may be any U1 type with closed term *: unit, there are contexts i[·] and j[·] satisfying the following conditions.

1. If Γ ⊢ τ : U1, then Γ ⊢ i[τ] : (Σt: U1.unit).
2. If Γ ⊢ M : (Σt: U1.unit), then Γ ⊢ j[M] : U1.
3. If Γ ⊢ τ : U1, then Γ ⊢ j[i[τ]] = τ : U1.

In other words, given the hypotheses above, since (Σt: U1.unit): U2, we may assume U1: U2 without loss of generality.

In words, the proposition says that in any fragment of λΠ,Σ with sums over U1 (and some U1 type unit containing a closed term *), we can represent U1 by the type Σt: U1.unit. Therefore, even if we drop U1: U2 from the language definition, we are left with a representation of U1 inside U2. For this reason, we might as well simplify matters and take U1: U2.
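The content of the proposition is that i[·] is pairing with * and j[·] is first projection, so the round trip recovers every U1 type. A toy Python rendering, with our own representation of types as first-class values:

```python
STAR = ()            # the closed term *: unit, modeled as the empty tuple

def i(tau):
    """i[tau]: inject a U1 type into Sigma t: U1. unit as <t = tau, *>."""
    return (tau, STAR)

def j(pair):
    """j[M]: recover the type component of a member of Sigma t: U1. unit."""
    return pair[0]

# Round trip: every U1 type is recovered unchanged, so Sigma t: U1. unit
# serves as a copy of U1 living inside U2.
for tau in (int, bool, str):
    assert j(i(tau)) is tau
```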


Impredicativity and "type: type"

In the λ→,∀ calculus, as in the programming language SML, polymorphic functions are not actually applicable to arguments of all types. For example, the identity function, written id(x) = x in ML and λt. λx: t. x in λ→,∀, has polymorphic type, but it can only be applied to elements of types from the first universe. One way to eliminate this restriction is to eliminate the distinction between U1 and U2. If we replace U1 and U2 by a single universe in the definition of λ→,∀, then we obtain the impredicative polymorphic calculus. However, if we make the full λΠ,Σ calculus impredicative by eliminating the distinction between U1 and U2, we obtain a language with a type of all types. Specifically, since we have general products and U1: U2, it is quite easy to see that if we let U1 = U2, then the type: type calculus of Section 9.2.2 becomes a sublanguage of λΠ,Σ.

Lemma 9.5.10  Any fragment of λΠ,Σ with U1: U2, U1 = U2, and closed under the type and term formation rules associated with general products is capable of expressing all terms of the type: type calculus of Section 9.2.2.

The proof is a straightforward examination of the typing rules of the type: type calculus of Section 9.2.2. By Proposition 9.5.9, we know that sums over U1 give us U1: U2. This proves the following theorem.

Theorem 9.5.11  The type: type calculus of Section 9.2.2 may be interpreted in any fragment of λΠ,Σ without universe distinctions which is closed under general products, and sums over U1 of the form Σt: U1.τ.

Intuitively, this says that any language without universe distinctions that has general products (ML functors) and general sums restricted to U1 (ML structures with type and value but not necessarily structure components) also contains the language with a type of all types. Since there are a number of questionable properties of a type of all types, such as nontermination without explicit recursion and undecidable type checking, relaxing the universe restrictions of λΠ,Σ would alter the language dramatically.

Trade-off between Weak and Strong Sums

Theorem 9.5.11 may be regarded as a "trade-off theorem" in programming language design, when combined with known properties of impredicative polymorphism. The trade-off implied by Theorem 9.5.11 is between impredicative polymorphism and the kind of Σ types used to represent ML structures in λΠ,Σ. Generally speaking, impredicative polymorphism is more flexible than predicative polymorphism, and Σ types allow us to type more terms than the existential types associated with data abstraction.


Either impredicative polymorphism with the "weaker" existential types, or predicative polymorphism with the "stronger" general sum types of λΠ,Σ, seems reasonable. By the normalization theorem for the impredicative calculus, we know that impredicative polymorphism with existential types is strongly normalizing (normalization with existential types can be derived by encoding ∃t.σ as ∀r[∀t(σ → r) → r]). As noted in Section 9.5.2, a translation into Martin-Löf's 1973 system [Mar73] shows that λΠ,Σ with predicative polymorphism and "strong" sums is also strongly normalizing. However, by Theorem 9.5.11, we know that if we combine strong sums with impredicative polymorphism by taking U1 = U2, the most natural way of achieving this end, then we must admit a type of all types, causing strong normalization to fail. In short, assuming we wish to avoid type: type and non-normalizing recursion-free terms, we have a trade-off between impredicative polymorphism and strong sums. In formulating λΠ,Σ, it is apparent that there are actually several ways to combine impredicative polymorphism with strong sums. An alternative is that instead of adding impredicative polymorphism by equating the two universes, we may add a form of impredicative polymorphism by adding a new type binding operator with the formation rule

Γ, t: U1 ⊢ τ : U1
-------------------  (∀ U1)
Γ ⊢ ∀t: U1.τ : U1

Intuitively, this rule says that if τ is a U1 type, then we will also have the polymorphic type ∀t: U1.τ in U1. The term formation rules for this sort of polymorphic type would allow us to apply any polymorphic function of type ∀t: U1.τ to any type in U1, including a polymorphic type of the form ∀s: U1.σ. However, we would still have strong sums like Σt: U1.τ in U2 instead of U1. The normalization theorem for this calculus follows from that of the theory of constructions with strong sums at the level of types [Coq86] by considering U1 to be prop, and U2 to be type0.

Subtyping and Related Concepts

10.1 Introduction

This chapter is concerned with subtyping and language concepts that make subtyping useful in programming. In brief, subtyping is a relation on types which implies that values of one type are substitutable for values of another. Some language constructs associated with subtyping are records, objects, and various forms of polymorphism that depend on the subtype relation. The main topics of the chapter are:

• Simply-typed lambda calculus with records and subtyping.

• Equational theories and semantic models.

• Subtyping for recursive types and the recursive-record model of objects.

• Forms of polymorphism appropriate for languages with subtyping.

While the main interest in subtyping stems from the popularity and usefulness of object-oriented programming, many of the basic ideas are illustrated more easily using the simpler example of record structures. Many of the issues discussed in this chapter are topics of current research; some material may be subsumed in coming years. A general reference with research papers representing the current state of the art is [GM94]; other collections representing more pragmatic issues are [SW87, YT87]. An early research/survey paper that proposed a number of influential typing ideas is [CW85]. Subtyping appears in a variety of programming languages. An early form of subtyping appears in the Fortran treatment of "mixed mode" arithmetic: arithmetic expressions may be written using combinations of integer and real (floating point) expressions, with integers converted to real numbers as needed. The conversion of integers to reals has some of the properties that are typical of subtyping. In particular, we generally think of the mathematical integers as a subset of the real numbers. However, conversion in programs involves changing the representation of a number, which is not typical of subtyping with records or objects. Fortran mixed mode arithmetic also goes beyond basic subtyping in two ways. The first is that Fortran provides implicit conversion from reals to integers by truncation, which is different since this operation changes the value of the number that is represented. Fortran also provides overloaded operations, such as +: int × int → int and +: real × real → real. However, if we overlook these two extensions of subtyping, we have a simple example of integers as a subtype of reals that provides some intuition for the general properties of subtyping. Although the evaluation of expressions that involve overloading and truncation is sometimes subtle, the Fortran treatment of arithmetic expressions was generally successful and has been adopted in many later languages.
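The distinction drawn here between value-preserving conversion and value-changing truncation can be observed in most languages with mixed-mode arithmetic; a quick illustration in Python (not Fortran syntax, just the analogous behavior):

```python
x = 3                    # integer
y = x + 0.5              # mixed mode: x is implicitly converted to 3.0
assert y == 3.5          # the conversion preserves the value of x

# Truncation, by contrast, changes the value that is represented,
# which is why real-to-integer conversion is not ordinary subtyping.
assert int(3.9) == 3
```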


Another example of subtyping appears in Pascal subranges or the closely related range constraints of Ada (see [Hor84], for example). As discussed again below, the Pascal subrange [1..10] containing the integers between 1 and 10 is a subtype of the integers. If x is a variable of type [1..10], and y of type integer, then we can assign y the value of x since every integer between 1 and 10 is an integer. More powerful examples of subtyping appear in typed object-oriented languages such as Eiffel [Mey92b] and C++ [ES90]. In these languages, a class of objects, which may be regarded for the moment as a form of type, is placed in a subtype hierarchy. An object of a subtype may be used in place of one of any supertype, since the subtype relation guarantees that all required operations are implemented. Moreover, although the representation of objects of one type may differ from the representation of objects of a subtype, the representations are generally compatible in a way that eliminates the need for conversion from one to another. We may gain some perspective on subtyping by comparing subtyping with type equality in programming languages. In most typed programming languages, some form of type equality is used in type checking. A basic principle of type equality is that when two types are equal, every expression with one type also has the other. With subtyping, we replace this principle with the so-called "subsumption" property of subtyping:

If A is a subtype of B, then every expression with type A also has type B.

This is not quite the definition of subtyping, since it depends on what we mean by having a type, which may in turn depend on subtyping. However, this is a useful condition for recognizing subtyping when you see it. The main idea is that subtyping is a substitution property: if A is a subtype of B, then we may substitute A's for B's. In principle, if we may meaningfully substitute A's for B's, then it would make sense to consider A a subtype of B. A simple illustrative example may be given using record types. Consider two record types R and S, where records of type R have an integer a component and a boolean b component, and records of type S have only an integer a component. In the type system of Section 10.3, R will be a subtype of S. One way to see why this makes sense is to think about the legal operations on records of each type. Since every r: R has a and b components, expressions r.a and r.b are allowed. However, since records of type S have only a components, the only basic operation on a record s of type S is to select the a component by writing s.a. Comparing these two types, we can see that every operation that is guaranteed to make sense on elements of type S is also guaranteed to make sense on elements of R. Therefore, in any expression involving a record of type S, we could safely use a record of type R instead, without type error.
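Operationally, this "more fields gives a subtype" reading of record subtyping is a simple containment check on field sets. A Python sketch of the check, where the dictionary encoding of record types is our own (and depth subtyping on field types is deliberately omitted):

```python
def is_subtype(a, b):
    """Width subtyping on record types: A <: B iff A has every field of B,
    at the same type (no depth subtyping, to keep the sketch minimal)."""
    return all(name in a and a[name] == ty for name, ty in b.items())

R = {"a": int, "b": bool}    # records with an integer a and a boolean b
S = {"a": int}               # records with only an integer a

assert is_subtype(R, S)      # every R may be used where an S is expected
assert not is_subtype(S, R)  # an S lacks the b component
```

The two assertions state the subsumption property for this example: any operation legal on S (selecting the a component) is also legal on R, but not conversely.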


We will use the notation and relations of Section 5.6, including

R_nat = {⟨n, n⟩ | n ∈ N},    R_bool = {⟨0, 0⟩, ⟨1, 1⟩},


R_nat→nat = {⟨n, m⟩ | ∀k ∈ N. n·k R_nat m·k}. Note that these are all partial equivalence relations.

For some k > 0, we can write Γ = Γ0, y1: α1, ..., yk: αk such that the derivation concludes with

Γ0 ⊢ M: σ' → σ    Γ0 ⊢ N: σ'
----------------------------
Γ0 ⊢ MN: σ

by (→ Elim), followed by k uses of (add var). By the inductive hypothesis,

11.2 Type Inference for λ→ with Type Variables

Γ1 ⊢ M': τ and Γ2 ⊢ N': ρ are most general typings for U and V. This means that there exist substitutions T1 and T2 such that

T1Γ1 ⊆ Γ0,  M = T1M'  and  T1τ = σ' → σ,
T2Γ2 ⊆ Γ0,  N = T2N'  and  T2ρ = σ'.

Since the type variables in PT(V) are renamed to be distinct from any type variables in PT(U), the substitutions T1 and T2 may be chosen so that any type variable changed by one is not affected by the other. Anticipating the need for a substitution that behaves properly on the fresh variable t introduced in the algorithm, we let T be the substitution with

Ts = T1s if s appears in PT(U),
Ts = T2s if s appears in PT(V),
Tt = σ.

Since the type assignments Γ1 and Γ2 must contain exactly the term variables that occur free in U and V, and TΓi ⊆ Γ0 for i = 1, 2, the substitution T must unify {α = β | x: α ∈ Γ1 and x: β ∈ Γ2}. In addition, since Tτ = σ' → σ = Tρ → Tt, the substitution T unifies τ = ρ → t. It follows that there is a most general unifier

S = Unify({α = β | x: α ∈ Γ1 and x: β ∈ Γ2} ∪ {τ = ρ → t})

and the call to PT(UV) must succeed. Since S is a most general unifier for these equations, there is a substitution R with T = R ∘ S. It is easy to see that Γ ⊢ MN: σ is an instance of PT(UV) by substitution R.

The final case is a derivable typing assertion Γ ⊢ λx: σ'. M: σ' → σ with Erase(λx: σ'. M) = λx.U. For some k > 0, we can write Γ = Γ0, y1: α1, ..., yk: αk such that the derivation concludes with

Γ0, x: σ' ⊢ M: σ
-----------------------
Γ0 ⊢ λx: σ'. M: σ' → σ

by (→ Intro), followed by k uses of (add var). By the inductive hypothesis, Γ1 ⊢ M': ρ = PT(U) is a most general typing for U, which means that there is a substitution T with

TΓ1 ⊆ Γ0,  M = TM'  and  Tρ = σ.

Type Inference

If x: τ ∈ Γ1 for some τ, then Tτ must be σ', and Γ0 ⊢ λx: σ'. M: σ' → σ is an instance of

Γ1 − {x: τ} ⊢ λx: τ. M': τ → ρ

by substitution T. If x does not occur in Γ1, then we may further assume without loss of generality that Ts = σ', where s is the fresh type variable introduced in Algorithm PT. In this case, it is easy to verify that Γ0 ⊢ λx: σ'. M: σ' → σ must be an instance of

Γ1 ⊢ λx: s. M': s → ρ

by substitution T. This proves the theorem. ∎

Exercise 11.2.14  Execute the principal typing algorithm "by hand" on the following terms.

(a) K ≡ λx. λy. x.
(b) S ≡ λx. λy. λz. (xz)(yz).
(c) λf. λx. f(fx).
(d) SKK, where K and S are given in parts (a) and (b).
(e) λf. (λx. f(xx))(λx. f(xx)). This is the untyped "fixed-point combinator" Y. It has the property that for any untyped lambda term U, YU = U(YU).

11.2.4 Implicit Typing

Historically, many type inference problems have been formulated using proof rules for untyped terms. These proof systems are often referred to as systems for implicit typing, since the type of a term is not explicitly given by the syntax of terms. In this context, type inference is essentially the decision problem for an axiomatic theory: given a formula asserting the type of an untyped term, determine whether the formula is provable. In this section, we show how type inference may be presented using "implicit" typing rules that correspond, via Erase, to the standard "explicit" typing rules of λ→. The following proof system may be used to derive typing assertions of the form Γ ⊢ U: τ, where U is any untyped lambda term, τ is a type expression, and Γ is a type assignment.

Γ ⊢ c: τ,  c a constant of type τ without type variables    (cst)

Γ, x: τ ⊢ x: τ    (var)

Γ, x: τ ⊢ U: ρ
-------------------  (abs)
Γ ⊢ (λx.U): τ → ρ

Γ ⊢ U: τ → ρ    Γ ⊢ V: τ
------------------------  (app)
Γ ⊢ UV: ρ


These rules are often called the Curry typing rules. Using C as an abbreviation for Curry, we will write Γ ⊢C U: τ if this assertion is provable using the axiom and rules above. There is a recipe for obtaining the proof system above from the definition of λ→: we simply apply Erase to all of the terms appearing in the antecedent and consequent of every rule. Given this characterization of the inference rules, the following lemma should not be too surprising.
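The Erase map used in this recipe is a simple structural recursion on terms. A minimal Python sketch, with our own tuple encoding of typed and untyped terms:

```python
def erase(term):
    """Erase(x) = x; Erase(M N) = Erase(M) Erase(N);
    Erase(lambda x: sigma. M) = lambda x. Erase(M)."""
    if term[0] == "var":
        return term
    if term[0] == "app":
        return ("app", erase(term[1]), erase(term[2]))
    _, x, _sigma, body = term           # typed abstraction ("lam", x, sigma, M)
    return ("lam", x, erase(body))      # the type annotation is dropped

# lambda x: t. f x   erases to   lambda x. f x
typed = ("lam", "x", "t", ("app", ("var", "f"), ("var", "x")))
assert erase(typed) == ("lam", "x", ("app", ("var", "f"), ("var", "x")))
```

Note that Erase is not injective: abstractions differing only in their annotations erase to the same untyped term, which is why a Curry derivation may correspond to several explicitly typed terms.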

Lemma 11.2.15  If Γ ⊢ M: τ is a well-typed term of λ→, then Γ ⊢C Erase(M): τ. Conversely, if Γ ⊢C U: τ, then there is a typed term Γ ⊢ M: τ of λ→ with Erase(M) = U.

The proof (below) is a straightforward induction on typing derivations. It may be surprising that this lemma fails for certain (much more complicated) type systems, as shown in [Sal89].

Proof  If Γ ⊢ M: τ is a well-typed term of λ→, then it is easy to show Γ ⊢C Erase(M): τ by induction on the typing derivation of Γ ⊢ M: τ. For the converse, assume we have a proof of Γ ⊢C U: τ. We must construct a term M with Γ ⊢ M: τ and Erase(M) = U, by induction on the proof of Γ ⊢C U: τ. Since this is essentially straightforward, we simply illustrate the ideas involved using the lambda-abstraction case. Suppose Γ ⊢C λx.U: τ → τ' follows from Γ, x: τ ⊢C U: τ' and we have a well-typed term Γ, x: τ ⊢ M: τ' with Erase(M) = U. Then clearly Γ ⊢ λx: τ.M: τ → τ' and Erase(λx: τ.M) = λx.U. ∎

In the proof of Lemma 11.2.15, it is worth noting that in constructing a well-typed term from a typable untyped term, the term we construct depends on the proof of the untyped judgement. In particular, if there are two proofs of Γ ⊢C U: τ, we might obtain two different explicitly typed terms with erasure U from these two proofs. When we work with typing assertions about untyped terms, some terminology and definitions change slightly. We say untyped term U has typing Γ ⊢ U: τ if this typing assertion is derivable using the Curry rules of Section 11.2.4 or, equivalently, if there is a well-typed term Γ ⊢ M: τ with Erase(M) = U. We say Γ' ⊢ U: τ' is an instance of Γ ⊢ U: τ if there is a substitution S with SΓ ⊆ Γ' and Sτ = τ'. A typing Γ ⊢ U: τ is principal for U if it is a typing for U and every other typing for U is an instance of Γ ⊢ U: τ. It is easy to use Theorems 11.2.12 and 11.2.13 to show that any typable untyped term has a principal typing, and that this may be computed using algorithm PT. It is possible to study the Curry typing rules, and similar systems, by interpreting them semantically. As in other logical systems, a semantic model may make it easy to see that certain typing assertions are not provable. To interpret Γ ⊢ U: τ as an assertion about U,


we need an untyped lambda model to make sense of U and additional machinery to interpret type expressions as subsets of the lambda model. One straightforward interpretation of type expressions is to map type variables to arbitrary subsets and interpret τ → ρ as the set of all elements of the lambda model which map elements of τ to elements of ρ (via application). This is the so-called simple semantics of types, also discussed in Section 10.4.3 in connection with subtyping. It is easy to prove soundness of the rules in this framework, showing that if Γ ⊢C U: τ, then the meaning of U belongs to the collection designated by τ. With an additional inference rule

Γ ⊢ U: τ    U = V
-----------------  (eq)
Γ ⊢ V: τ

for giving equal untyped terms the same types, we also have semantic completeness. Proofs of this are given in [BCDC83, Hin83]. Some generalizations and related studies may be found in [BCDC83, CDCV80, CZ86, CDCZ87, MPS86, Mit88], for example. It is easy to show using a semantic model that the only terms of type t → t, for example, are those equal to λx. x. However, since semantically equal untyped terms must have the same types in any model, no decidable type system can be semantically complete, by Exercise 11.2.2.

11.2.5 Equivalence of Typing and Unification

While algorithm PT uses unification, the connection between type inference and unification is stronger than the text of algorithm PT might suggest. A general theme in this section is that type inference and unification are algorithmically equivalent, in the sense that every type inference problem generates a unification problem, and every unification problem arises in type inference for some term. We begin this section with an alternative type inference algorithm whose complexity is easier to analyze than PT. This algorithm is essentially a reduction (in the sense of recursion theory or complexity theory) from typing to unification. When combined with the linear-time unification algorithm given in [PW78], this reduction gives a linear-time algorithm for typing. The remainder of this section develops the converse reduction, from unification to type inference, with some consequences of independent interest noted along the way. To establish intuition, we give some examples of terms with large types, showing that in the worst case a term may have a type whose string representation is exponential in the length of the term. Next, we show that if τ is the type of some closed term, then τ must be the principal type of some term. This fact, which was originally proved by Hindley for combinatory terms [Hin69], is used in our reduction from unification to type


Table 11.3
Algorithm reducing λ→ (Curry) typing to unification.

TE(c, t) = {t = τ}, where τ is the variable-free type of c
TE(x, t) = {t = t_x}, where t_x is the type variable for x
TE(UV, t) = TE(U, r) ∪ TE(V, s) ∪ {r = s → t}, where r, s are fresh type variables
TE(λx.U, t) = TE(U, s) ∪ {t = t_x → s}, where s is a fresh type variable

inference. The reduction from first-order unification to type inference and the examples of terms with large types provide useful background for analyzing ML typing in Section 11.3.5. We will work with typing assertions about untyped terms, as in Section 11.2.4. Given an untyped term, algorithm TE in Table 11.3 produces a set of equations between type expressions. These equations are solvable by unification if the given untyped term has a typing. The input to TE must be an untyped term with all bound variables distinct, and no variable occurring both free and bound. The reason for this restriction, which is easily satisfied by renaming bound variables, is that it allows us to simplify the association between term variables and type variables. Specifically, since each term variable is used in only one way, we can choose a specific type variable t_x for each term variable x. For any untyped term U, let Γ_U be the type assignment

Γ_U = {x: t_x | x free in U}

If U is an untyped term, t is a type variable, and E = TE(U, t), then every typing for U is an instance of Γ_U ⊢ U: t under some substitution S unifying E. In particular, the principal typing for U is given by the most general unifier for E. This is illustrated by example below.

Example 11.2.16  We can see how algorithm TE works using the example term S ≡ λx. λy. λz. (xz)(yz). Note first that all bound variables are distinct. Tracing the calls to TE, we have

TE(λx. λy. λz. (xz)(yz), r)
  = TE(λy. λz. (xz)(yz), s) ∪ {r = t_x → s}
  = TE(λz. (xz)(yz), t) ∪ {s = t_y → t, r = t_x → s}
  = TE((xz)(yz), u) ∪ {t = t_z → u, s = t_y → t, r = t_x → s}
  = TE(x, v1) ∪ TE(z, v2) ∪ {v1 = v2 → v}
      ∪ TE(y, w1) ∪ TE(z, w2) ∪ {w1 = w2 → w}
      ∪ {v = w → u, t = t_z → u, s = t_y → t, r = t_x → s}

Eliminating v1, v2, w1, and w2, since each must be equal to the type of some term variable, we have the unification problem

{t_x = t_z → v, t_y = t_z → w, v = w → u, t = t_z → u, s = t_y → t, r = t_x → s}

We can solve this set of equations by repeatedly using substitution to eliminate variables, working from left to right. If we stop with the equation for r, we find that the type of S is

r

—

(w

w)

u))

u)
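The trace above can be mechanized. The following Python sketch (the term and type representations, and all names, are ours, not the book's) implements the equation generation of Table 11.3 together with textbook unification, rather than the linear-time algorithm of [PW78], and recovers the principal type of S:

```python
from itertools import count

fresh = count()

def new_var():
    return "t%d" % next(fresh)

# Types: a string is a type variable; ("->", a, b) is a function type.
# Terms: ("var", x), ("app", U, V), ("lam", x, U), all bound variables distinct.

def TE(term, t, env):
    """Generate the equation set of Table 11.3 for `term` at type variable t."""
    tag = term[0]
    if tag == "var":
        return [(t, env[term[1]])]                    # t = t_x
    if tag == "app":
        r, s = new_var(), new_var()
        return (TE(term[1], r, env) + TE(term[2], s, env)
                + [(r, ("->", s, t))])                # r = s -> t
    s = new_var()                                     # tag == "lam"
    env[term[1]] = new_var()                          # t_x for the bound variable
    return TE(term[2], s, env) + [(t, ("->", env[term[1]], s))]

def unify(eqs):
    """Textbook first-order unification with occurs check."""
    subst = {}
    def find(ty):
        while isinstance(ty, str) and ty in subst:
            ty = subst[ty]
        return ty
    def occurs(v, ty):
        ty = find(ty)
        if isinstance(ty, str):
            return v == ty
        return occurs(v, ty[1]) or occurs(v, ty[2])
    def uni(a, b):
        a, b = find(a), find(b)
        if a == b:
            return
        if isinstance(a, str):
            assert not occurs(a, b), "occurs check failed: no typing"
            subst[a] = b
        elif isinstance(b, str):
            uni(b, a)
        else:
            uni(a[1], b[1])
            uni(a[2], b[2])
    for lhs, rhs in eqs:
        uni(lhs, rhs)
    def expand(ty):
        ty = find(ty)
        if isinstance(ty, str):
            return ty
        return ("->", expand(ty[1]), expand(ty[2]))
    return expand

# S = λx.λy.λz.(x z)(y z)
S = ("lam", "x", ("lam", "y", ("lam", "z",
        ("app", ("app", ("var", "x"), ("var", "z")),
                ("app", ("var", "y"), ("var", "z"))))))

r = new_var()
principal = unify(TE(S, r, {}))(r)
# principal is (a -> b -> c) -> (a -> b) -> a -> c, up to renaming of variables
```

Applying the most general unifier to the variable r yields exactly the principal type computed by hand in Example 11.2.16.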

Since the main ideas are similar to the correctness proof for PT in Section 11.2.3, we will not prove correctness of Algorithm TE. (A correctness proof for this algorithm may be found in [Wan87].) In analyzing the complexity, we assume the input term is given by a parse tree (Section 1.7), so that the length of U V is the length of U plus the length of V plus some constant factor (for the symbols indicating that this is an application of one term to another). Given this, it is an easy induction on terms to show that the running time of TE(U, t) on parsed term U is linear in the length of U. If the output of TE is presented using the dag representation of type expressions discussed in Section 11.2.2 and Exercise 11.2.6, then we may apply the linear-time unification algorithm of [PW78]. This gives a linear-time typing algorithm. The output of the linear-time typing algorithm is a typing in which all of the type expressions are again represented by directed acyclic graphs. This is necessary, since the examples we consider below show that representing the types as strings could require exponential time. (As shown in Exercise 11.2.6, a dag of size n may represent a type expression of length 2^n.) In the following theorem, we use the standard "big-oh" notation from algorithm analysis. Specifically, we say a function f is O(g(n)) if there is some positive real number c > 0 such that f(n) ≤ c·g(n) for all sufficiently large n. We similarly write f(n) = g(O(h(n))) if there is a positive real number c > 0 with f(n) ≤ g(c·h(n)) for all sufficiently large n. In particular, a function f is 2^{O(n)} if there is some positive real number c > 0 with f(n) ≤ 2^{cn} for all sufficiently large n.


Theorem 11.2.17 Given an untyped lambda term U of length n with all bound variables distinct, there is a linear-time algorithm which computes a dag representing the principal typing of U if it exists, and fails otherwise. If it exists, the principal typing of U has length at most 2^{O(n)} and dag size O(n).

Before proving the converse of Theorem 11.2.17, we develop some intuition for the difficulty of type inference by looking at some example terms with large types. Product types and pairing and projection functions are not essential, but do make it easier to construct expressions with specific principal types. An alternative to adding product types and associated operations is to introduce syntactic sugar that is equivalent for the purpose of typing. To be specific, we will adopt the abbreviation

    (U1, ..., Uk) ≡ λz. z U1 ... Uk

where z is a fresh variable not occurring in any of the U_i. This abbreviation is based on a common encoding of sequences in untyped lambda calculus that does not quite work for typed lambda calculus (see Exercise 11.2.24). It is easy to verify that if each U_i has principal type σ_i, then the principal type of the sequence is

    (U1, ..., Uk): (σ1 → ⋯ → σk → t) → t

where t is a fresh type variable not occurring in any of the σ_i. We will write σ1 × ⋯ × σk as an abbreviation for any type expression (σ1 → ⋯ → σk → t) → t with t not occurring in any σ_i.

Example 11.2.18 The closed term

    P ≡ λx.(x, x)

has principal type s → (s × s). If we apply P to an expression M with principal type σ, then the application P M will be typed by unifying s with σ, resulting in the typing P M: σ × σ. Thus applying P to a typable expression doubles the length of its principal type. ∎
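The doubling only affects the string (tree) form of a type. A small Python sketch (our own representation, with ("pair", ·, ·) standing for ×) shows that under the dag representation of Exercise 11.2.6, the two components of each σ × σ can be shared, so the dag grows by one node per application of P while the tree form doubles:

```python
# Iterate P M : σ × σ, reusing one object for both components,
# as a dag representation of types would.
sigma = "s"
for _ in range(10):
    sigma = ("pair", sigma, sigma)   # both components are the SAME object

def tree_size(ty):
    """Size of the type written out as a string (tree) expression."""
    if isinstance(ty, str):
        return 1
    return 1 + tree_size(ty[1]) + tree_size(ty[2])

def dag_size(ty, seen=None):
    """Nodes counted once under sharing."""
    if seen is None:
        seen = set()
    if id(ty) in seen:
        return 0
    seen.add(id(ty))
    if isinstance(ty, str):
        return 1
    return 1 + dag_size(ty[1], seen) + dag_size(ty[2], seen)

# tree_size(sigma) == 2**11 - 1, but dag_size(sigma) == 11
```

After n applications the tree has 2^{n+1} − 1 nodes while the dag has only n + 1, which is why the linear-time algorithm of Theorem 11.2.17 must output dags rather than strings.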

By iterating the application of P from Example 11.2.18 to any typable expression M, it is easy to prove the following lemma.

Proposition 11.2.19 For arbitrarily large n, there exist closed lambda terms of length n whose principal types have length 2^{Ω(n)}.

Another useful observation about principal types involves terms of the form

    V ≡ λw. K w (U1, ..., Uk)

where K is the untyped term λx.λy.x. If w does not occur free in any of the U_i's, then, whenever V is typable, it has the same principal type, t → t, as the I combinator. However, a term of form V might be untypable if the type constraints introduced by the U_i's cannot be satisfied. This gives us a way of imposing constraints on the types of terms, as illustrated in the following example.

Example 11.2.20 The term

    EQ ≡ λu.λv.λw. K w (λz. K (z u)(z v))

has principal type t → t → s → s. Therefore EQ U V is typable iff the principal types of U and V are unifiable. ∎

An interesting and useful property of Curry typing is that if Γ ▷ U: τ is a typing with Γ, τ not containing type constants, then Γ ▷ V: τ is a principal typing for some term V [Hin69]. Our proof uses the following lemma.

Lemma 11.2.21 If τ is any type expression without type constants and t1, ..., tk is a list containing all the type variables in τ, then there is a closed untyped lambda term with principal type t1 → ⋯ → tk → τ → s → s, for some (arbitrary) type variable s not among t1, ..., tk.

Proof The proof is by induction on the structure of τ, with t1, ..., tk fixed. If τ is one of the type variables t_i, then t1 → ⋯ → tk → t_i → s → s is a principal type of the term

    λx1. ⋯ λxk. λy. λz. K z (EQ x_i y)

where K ≡ λx.λy.x and EQ is the term given in Example 11.2.20.

If τ has the form τ1 → τ2, then by the inductive hypothesis we may assume we have untyped lambda terms U1 and U2 with principal types t1 → ⋯ → tk → τ1 → s → s and t1 → ⋯ → tk → τ2 → s → s, respectively. It is not hard to see that the untyped term

    λx1. ⋯ λxk. λy. λz. K z (λu1. λu2. λw. w (U1 x1 ⋯ xk u1) (U2 x1 ⋯ xk u2) (EQ (y u1) u2))

has principal typing t1 → ⋯ → tk → (τ1 → τ2) → s → s. This proves the lemma. ∎

For simplicity, we state and prove the following theorem for closed terms. The more general statement for open terms follows as a corollary. This is left to the reader as Exercise 11.2.25.

Theorem 11.2.22 If U is a closed untyped lambda term with type τ, not necessarily principal, and τ does not contain type constants, then there is a closed term V with principal type τ and V ↠ U.

Proof By Lemma 11.2.21, there is a closed term W with principal type t1 → ⋯ → tk → τ → s → s, where t1, ..., tk lists all the type variables in τ. Let V be the term (λy. K y W′) U, where W′ is the term

    λx1. ⋯ λxk. λz. z (W x1 ⋯ xk y).

Note that y occurs free in W′. The occurrence of y in W′ forces y to have type τ. Therefore, the principal type of λy. K y W′ is τ → τ. Since U has type τ by hypothesis, the principal type of the entire term must be τ. ∎

Recall from Proposition 4.3.13 that there is a closed term of type τ iff τ is a valid formula of intuitionistic propositional logic, with → read as implication. This means that ∅, τ is a typing for some closed term iff τ is a valid formula of intuitionistic propositional logic. It follows from the PSPACE-completeness of propositional intuitionistic logic [Sta79] that it is PSPACE-complete to determine whether a given pair Γ, τ is a typing for some (untyped) lambda term. From Theorem 11.2.22 and Proposition 4.3.13, it is relatively easy to prove that for any type expressions τ and τ′, there is a closed term that is typable iff τ and τ′ are unifiable.

Theorem 11.2.23 Let τ and τ′ be type expressions without type constants. There is a closed, untyped lambda term that is typable iff τ and τ′ are unifiable.

Proof Read as an implication, τ → τ′ → s → s says that τ and τ′ imply true. This is an intuitionistically valid implication. By Proposition 4.3.13, there is a term with this type, and by Theorem 11.2.22 there is a term U which has τ → τ′ → s → s as a principal type. Therefore, λx. U x x is typable iff τ and τ′ are unifiable. ∎

It follows from the P-completeness of unification [DKM84] that deciding typability is complete for polynomial time with respect to log-space reduction. This means that unless certain conjectures about parallel computation and polynomial time fail (specifically, unless the class NC contains all of P), a parallel algorithm for typing cannot be made to run asymptotically faster in the worst case than the best sequential algorithm. As shown in Exercise 11.2.26, Theorem 11.2.23 may be proved directly from Lemma 11.2.21. However, the proof using Proposition 4.3.13 seems easier to remember. An alternate direct proof is given in [Tys88].

Exercise 11.2.24 Consider (U, V) ≡ λx. x U V, with (U, V): (σ → τ → t) → t for U: σ and V: τ. Write untyped terms P1 and P2 such that (U, V) P1 = U and (U, V) P2 = V. Show that typing (U, V) P1: σ and (U, V) P2: τ requires different typings for (U, V).

Exercise 11.2.25 Use Theorem 11.2.22 to show that if Γ ▷ U: τ is a typing for some open term U, then there is a term V ↠ U with principal typing Γ ▷ V: τ.

Exercise 11.2.26 Prove Theorem 11.2.23 directly from Lemma 11.2.21, without using Proposition 4.3.13.

11.3 Type Inference with Polymorphic Declarations

11.3.1 ML Type Inference and Polymorphic Variables

In this section, we consider type inference for the language λ→,Π,let from Section 9.2.5. Recall that this language is the extension of λ→ with type abstraction, type application, and polymorphic let declarations. The type expressions of λ→,Π,let may contain the binding operator Π, giving us polymorphic types of a second universe U2. The typing contexts may include both U1 and U2 type expressions, and the type of a constant symbol may be any U2 type expression without free type variables (i.e., all type variables must be bound by Π). Since this language exhibits the kind of polymorphism provided by the programming language ML, we may call the resulting type inference problem ML type inference. The main feature of λ→,Π,let that we use in type inference is that all polymorphic types have the form Πt1.⋯Πtk.τ with τ a simple non-polymorphic type. As a result, the general techniques used for type inference apply to extensions of the language with cartesian product types, recursion operators, and other features that do not change the way that type binding occurs in type expressions. If we extend the language to include function types of the form Πt.τ → Πt′.τ′, however, then the type inference problem appears to be harder. With full impredicative typing, type inference becomes algorithmically unsolvable [Wel94].

The ML type inference problem is defined precisely below by extending the Erase function of Section 11.1 to type abstraction, type application and let as follows:

    Erase(λt.M) = Erase(M)
    Erase(M τ) = Erase(M)
    Erase(let x: σ = M in N) = let x = Erase(M) in Erase(N)


Given any pre-term, Erase produces a term of an untyped lambda calculus extended with untyped let declarations of the form let x = U in V. Note that this is not the same as the Erase function in Example 11.1.2.

A property of λ→,Π,let that complicates type inference for terms with free variables is that there are two distinct type universes. This means that there are two kinds of types that may be assigned to free variables. If we attempt to infer ML typings "bottom-up," as in the typing algorithm PT of Section 11.2.3, then there is no way to tell whether we should assume that a term variable has a U1 type or a U2 type. Furthermore, if we wanted to choose a "most general" U2 type, it is not clear how to do this without extending the type language with U2 type variables. However, this problem does not arise for closed terms. The reason is that each variable in a closed term must either be lambda bound or let bound. The lambda-bound variables must have U1 types, while the type of each let-bound variable may be determined from the declaration. For example, in let I = λx.x in I I, the type of the let-bound variable I can be determined by typing the term λx.x; there is no need to find an arbitrary type for I that makes I I well-typed. More generally, we can type an expression of the form let x = U in V by first finding the principal typing for U, then using instances of this type for each occurrence of x in V. Since this eliminates the need to work with U2 types, we can infer types for closed λ,let-terms using essentially the same unification-based approach as for λ-terms.

We consider three algorithms for ML type inference. Since the first two involve substitution on terms, it is easier to describe these algorithms if we do not try to compute the corresponding explicitly-typed term.
This leads us to consider the following ML type inference problem:

ML Type Inference Problem: Given a closed, untyped λ,let-term V, possibly containing untyped let's and term constants with closed U2 types, find a U1 type τ such that there exists a well-typed λ→,Π,let term ∅ ▷ M: τ with Erase(M) = V.

It is possible to modify each of the algorithms we develop so that the explicitly-typed term ∅ ▷ M: τ is actually computed. It is easy to see that there is no loss of generality in only considering U1 types for terms, since ∅ ▷ M: Πt1.⋯Πtk.τ is well-typed iff ∅ ▷ M t1 ⋯ tk: τ is.

11.3.2 Two Sets of Implicit Typing Rules

We consider two sets of implicit typing rules and prove that they are equivalent for deriving non-polymorphic typings. The first set matches the typing rules of λ→,Π,let, while the second gives us a useful correspondence between ML type inference and λ→ type inference. The first set of rules is obtained by erasing types from the antecedent and consequent of each typing rule of λ→,Π,let. This gives us rules (cst), (var), (abs), (app) and (add hyp) of Section 11.2.4, together with the following rules for type abstraction, type application and let. The names (gen) and (inst) are short for "generalization" and "instantiation."

    Γ ▷ V: σ
    —————————————   t not free in Γ      (gen)
    Γ ▷ V: Πt.σ

    Γ ▷ V: Πt.σ
    —————————————                        (inst)
    Γ ▷ V: [τ/t]σ

    Γ ▷ U: σ    Γ, x: σ ▷ V: σ′
    ————————————————————————————         (let)2
    Γ ▷ (let x = U in V): σ′

Since we often want let-bound variables to have polymorphic types, this let rule allows σ to be a U2 type, hence the subscript "2" in the name (let)2. We will write ⊢ML2 Γ ▷ V: σ if the typing assertion Γ ▷ V: σ is provable using (cst), (var), (abs), (app), (add hyp), (gen), (inst) and (let)2. With these rules, we have the straightforward analog of Lemma 11.2.15 for λ→,Π,let.

Lemma 11.3.1 If Γ ▷ M: σ is a well-typed term of λ→,Π,let, then ⊢ML2 Γ ▷ Erase(M): σ. Conversely, if ⊢ML2 Γ ▷ V: σ, then there is a typed term Γ ▷ M: σ of λ→,Π,let with Erase(M) = V.

Proof The proof follows the same induction as the proof of Lemma 11.2.15, but with additional cases for (gen), (inst) and (let)2. Each of these is straightforward, and essentially similar to the cases considered in the proof of Lemma 11.2.15. ∎

As a step towards an algorithm that solves the ML type inference problem without using polymorphic U2 types, we reformulate the type inference rules so that no polymorphic types are required. Essentially, the alternative rule for let uses substitution on terms in place of the kind of substitution on types that results from rules (gen) and (inst). More specifically, let x = U in V has precisely the same ML typings as [U/x]V, provided Γ ▷ U: ρ for some non-polymorphic type ρ. This suggests the following rule:

    Γ ▷ U: τ′    Γ ▷ [U/x]V: τ
    ———————————————————————————          (let)1
    Γ ▷ let x = U in V: τ

If we ignore the hypothesis about U, then this rule allows us to type let x = U in V by typing the result [U/x]V of let reduction. It is not hard to show that if x occurs in V, and Γ ▷ [U/x]V: τ is an ML typing, then we must have Γ ▷ U: σ for some σ. Therefore, the assumption about U only prevents us from typing let x = U in V when x does not occur in V and U is untypable.


A comparison of the two rules appears in Example 11.3.2 below. Since we wish to eliminate the typing rules that involve polymorphic types, we reformulate the rule for constants to incorporate (inst).

    Γ ▷ c: [τ1/t1, ..., τk/tk]τ,   if Πt1.⋯Πtk.τ is the type of c      (cst)1

We will write ⊢ML1 Γ ▷ U: τ if the typing assertion Γ ▷ U: τ is provable from (var), (abs), (app) and (add hyp) of Section 11.2.4, together with (cst)1 and (let)1. It follows from the proof that the two inference systems are equivalent that, with (cst)1 and (let)1, we do not need (inst) or (gen).

Example 11.3.2 Consider the term

    let y = λx. λf. f x in if z then y true (λx. 5) else y 3 (λx. 7)

where we assume that z: bool, true: bool and 3: nat as usual. Using rules common to ML1 and ML2, we may derive the typing assertion

    ∅ ▷ λx. λf. f x : s → (s → s′) → s′

In the ML2 system, we can then generalize s and s′ by rule (gen) to obtain

    ∅ ▷ λx. λf. f x : Πs. Πs′. s → (s → s′) → s′

and use (inst) to derive

    y: Πs. Πs′. s → (s → s′) → s′ ▷ y : nat → (nat → nat) → nat

and similarly for bool. This allows us to prove

    y: Πs. Πs′. s → (s → s′) → s′, z: bool ▷ (if z then y true (λx. 5) else y 3 (λx. 7)) : nat

which makes it possible to type the original term by the (let)2 rule. We can also derive

    z: bool ▷ (let y = λx. λf. f x in if z then y true (λx. 5) else y 3 (λx. 7)) : nat

using (let)1. Working backwards, it suffices to show

    z: bool ▷ (if z then (λx. λf. f x) true (λx. 5) else (λx. λf. f x) 3 (λx. 7)) : nat

which can be done by showing that both (λx. λf. f x) true (λx. 5) and (λx. λf. f x) 3 (λx. 7) have type nat. These typings can be shown in the λ→-fragment of ML1 in the usual way. The difference between the ML1 and ML2 derivations is that in ML1, we type each occurrence of λx. λf. f x in a different way, while in ML2 we give one polymorphic typing for this term, and instantiate the bound type variables differently for each occurrence. ∎

Since ML1 typings use the same form of typing assertions as Curry typing, we may adopt the definition of instance given in Section 11.2.4. It is not hard to see that Lemma 11.2.3 extends immediately to the following.

Lemma 11.3.3 Let Γ′ ▷ U: τ′ be an instance of the typing Γ ▷ U: τ. If ⊢ML1 Γ ▷ U: τ, then ⊢ML1 Γ′ ▷ U: τ′.

Proof This proof by induction on typing derivations is identical to the proof of Lemma 11.2.3, except for an additional case for (let)1. ∎

We also have a similar property for ML2 typings, where we define instance exactly as before. Specifically, Γ′ ▷ U: τ′ is an instance of Γ ▷ U: τ if there is a substitution S with SΓ ⊆ Γ′ and Sτ = τ′. As usual, a substitution only affects free type variables, not those bound by Π.

Lemma 11.3.4 Let Γ′ ▷ U: τ′ be an instance of the typing Γ ▷ U: τ, where Γ and τ may contain polymorphic types. If ⊢ML2 Γ ▷ U: τ, then ⊢ML2 Γ′ ▷ U: τ′.

This lemma will be used in proving that ML1 and ML2 are equivalent. The proof is again by induction on typing derivations. We can prove that the two inference systems are equivalent for typing judgements without polymorphic types.

Theorem 11.3.5 Let Γ be a type assignment with no polymorphic types, and let τ be any non-polymorphic type. Then ⊢ML2 Γ ▷ U: τ iff ⊢ML1 Γ ▷ U: τ.

Although the details are involved, the main idea behind this equivalence is relatively simple. We give a short sketch here and postpone the full proof to Section 11.3.4. The proof of Theorem 11.3.5 has two parts, the first showing that ML1 is as powerful as ML2 and the second showing the converse. The first part is easier. To show that ML1 is as powerful as ML2, we first show that adding the ML2 rules (gen) and (inst) would not change the set of ML1-provable typings. The essential reason is that if Γ ▷ U: τ′ is proved from Γ ▷ U: τ by some sequence of steps involving only (gen) and (inst), then Γ ▷ U: τ′ is an instance of Γ ▷ U: τ. But since any instance of an ML1 typing is already ML1-provable, rules (gen) and (inst) are unnecessary. The second step in showing ML1 is as powerful as ML2 is to show that (let)2 can only be used to derive typing assertions that could already be proved using ML1 rules. If we prove a typing by (let)2,

    Γ ▷ U: Πt1.⋯Πtk.τ    Γ, x: Πt1.⋯Πtk.τ ▷ V: τ′
    ———————————————————————————————————————————————      (let)2
    Γ ▷ let x = U in V: τ′

then every occurrence of x in V is given a substitution instance of τ by applying some combination of (inst) and (gen) steps. Therefore, we could prove Γ ▷ [U/x]V: τ′ using the ML1 rules. This is the outline of the proof that ML1 is as powerful as ML2. The converse, however, requires principal typing. The main problem is to show that if x occurs in V and Γ ▷ [U/x]V: τ′ is ML1-provable, then there is some type τ such that Γ ▷ U: Πt1.⋯Πtk.τ and Γ, x: Πt1.⋯Πtk.τ ▷ V: τ′ are both ML2-provable. The only reasonable choice for τ is the principal type for U. If we set up the inductive proof of equivalence properly, then it suffices to show that ML1 has principal types. Therefore, we proceed with the typing algorithms based on ML1. These give us the existence of principal typings as a corollary, allowing us to prove the equivalence of ML1 and ML2 in Section 11.3.4. Theorem 11.3.5 gives us the following correspondence between ⊢ML1 and the explicitly typed language λ→,Π,let.

Corollary 11.3.6 Let τ be any U1 type of λ→,Π,let and Γ any type assignment containing only U1 types. If Γ ▷ M: τ is a well-typed term of λ→,Π,let, then ⊢ML1 Γ ▷ Erase(M): τ. Conversely, if ⊢ML1 Γ ▷ V: τ, then there is a typed term Γ ▷ M: τ of λ→,Π,let with Erase(M) = V.

Proof This follows immediately from Lemma 11.3.1 and Theorem 11.3.5. ∎

11.3.3 Type Inference Algorithms

We consider three algorithms for computing principal ML1 typings of untyped λ,let-terms. The first eliminates let's and applies algorithm PT of Section 11.2.3. This takes exponential time in the worst case, since eliminating let's may lead to an exponential increase in the size of the term. However, as we shall see in Section 11.3.5, exponential time is the asymptotic lower bound on the running time of any correct typing algorithm. The second algorithm is based on the same idea, but is formulated as an extension to algorithm PT. This serves as an intermediate step towards the third algorithm, which uses a separate "environment" to store the types of let-bound variables, eliminating the need for substitution on terms. This algorithm more closely resembles the algorithms actually used in ML and related compilers. Although the third algorithm has the same asymptotic worst-case running time as the first, it has a simple recursive form and may be substantially more efficient in practice, since it does not perform any operations on terms. All three algorithms are explained and proved correct for ML1 typing. It follows by Theorem 11.3.5 that the algorithms are also correct for ML2 typing, and produce principal typings for ML2 typing.


First Algorithm The main idea behind the first algorithm is that when x occurs in V, the term let x = U in V has exactly the same typings as [U/x] V. We can treat the special case where x does not occur in V by the following general transformation on terms.

Lemma 11.3.7 An untyped λ,let-term of the form let x = U in V has precisely the same ML1 typings as let x = U in (λy.V)x, provided y is not free in V, and precisely the same ML1 typings as the result [U/x]((λy.V)x) of let-reducing the latter term.

The proof is straightforward, by inspection of the ML1 inference rules. A second induction shows that we may repeat this transformation anywhere inside a term, any number of times.

Lemma 11.3.8 Let W1 be any untyped λ,let-term and let W2 be the result of repeatedly replacing any subterm of the form let x = U in V by either let x = U in (λy.V)x or [U/x]((λy.V)x). Then W1 and W2 have precisely the same ML1 typings.

Lemma 11.3.8 gives a transformation of untyped λ,let-terms that preserves typability. Since this transformation can be used to eliminate all let's from a term, we may compute ML1 typings using the algorithm PT for λ→ typings. Given an untyped λ,let-term, algorithm PTL1 eliminates all let's according to Lemma 11.3.8 and applies algorithm PT to the resulting untyped lambda term. If we write let-expand(U) for the result of removing all let's according to Lemma 11.3.8, then we may define algorithm PTL1 as follows:

    PTL1(U) =  let  Γ ▷ M: τ = PT(let-expand(U))
               in   Γ ▷ U: τ

Termination follows from Proposition 9.2.15. Specifically, we can compute the let-expansion of a term in two phases. We begin by replacing each subterm of the form let x = U in V by let x = U in (λy.V)x, where y is not free in V. This is clearly a terminating process. Then, by Proposition 9.2.15, we can let-reduce each let x = U in (λy.V)x to [U/x]((λy.V)x). The theorems below follow immediately from Lemmas 11.3.3 and 11.3.8 and Theorems 11.2.12 and 11.2.13.
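The two phases of let-expansion can be sketched over a small term representation (constructors and helper names are ours; a production version would also need to freshen the bound variables duplicated by substitution before handing the result to an algorithm such as TE):

```python
from itertools import count

fresh = count()

# Terms: ("var", x), ("app", U, V), ("lam", x, U), ("let", x, U, V).
# We assume all bound variables are distinct and distinct from free
# variables, so substitution needs no capture-avoidance here.

def subst(term, x, repl):
    """[repl/x]term."""
    tag = term[0]
    if tag == "var":
        return repl if term[1] == x else term
    if tag == "app":
        return ("app", subst(term[1], x, repl), subst(term[2], x, repl))
    if tag == "lam":
        return ("lam", term[1], subst(term[2], x, repl))
    _, y, U, V = term
    return ("let", y, subst(U, x, repl), subst(V, x, repl))

def let_expand(term):
    """Replace let x = U in V by [U/x]((λy.V) x) = (λy.[U/x]V) U.
    Phase 1 (wrapping V as (λy.V) x) keeps a copy of U in the result
    even when x is unused, so an untypable U still blocks typing."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "app":
        return ("app", let_expand(term[1]), let_expand(term[2]))
    if tag == "lam":
        return ("lam", term[1], let_expand(term[2]))
    _, x, U, V = term
    U, V = let_expand(U), let_expand(V)
    y = "y%d" % next(fresh)              # y not free in V
    return ("app", ("lam", y, subst(V, x, U)), U)

# let i = λx.x in i i  becomes  (λy0. (λx.x)(λx.x)) (λx.x)
I = ("lam", "x", ("var", "x"))
expanded = let_expand(("let", "i", I, ("app", ("var", "i"), ("var", "i"))))
```

Because U is copied once per occurrence of x, nested lets can make the expanded term exponentially larger than the input, which is the source of the worst-case running time of PTL1.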

Theorem 11.3.9 If PTL1(U) = Γ ▷ U: τ, then every instance of Γ ▷ U: τ is ML1-provable.

Theorem 11.3.10 Suppose Γ ▷ U: τ is an ML1 typing for U. Then PTL1(U) succeeds and produces a typing assertion with Γ ▷ U: τ as an instance.

Corollary 11.3.11 Let U be an untyped λ,let-term. If U has an ML1 typing, then U has a principal ML1 typing.

Second Algorithm We can reformulate algorithm PTL1 as an extension of PT, rather than as a two-phase algorithm with a preprocessing step. If we read typing rule (let)1 from the bottom up, this rule suggests that we extend algorithm PT to let by adding the following clause:

    PT(let x = U in V) =  let  Γ ▷ U: τ = PT(U)
                          in   PT([U/x]V)

We write PTL2 for the algorithm given by this extension to PT. It is intuitively clear from the (let)1 rule that this algorithm is correct. We state and prove this precisely using let-expansion of terms. For any terms U and V, we write PTLi(U) ≅ PTLj(V) if either both PTLi(U) and PTLj(V) fail, or PTLi(U) = Γ1 ▷ U: τ1 and PTLj(V) = Γ2 ▷ V: τ2 both succeed with Γ1 = Γ2 and τ1 = τ2.

Lemma 11.3.12 For any untyped λ,let-term U, we have PTL2(U) ≅ PTL2(let-expand(U)).

Proof The proof uses induction on an ordering of untyped terms by "degree." Recall from Proposition 9.2.15 that there are no untyped λ,let-terms with an infinite sequence of let reductions. If V is an untyped λ,let-term, then the degree of V is the pair dg(V) = (ℓ, m) of natural numbers with ℓ the length of the longest let-reduction sequence from V and m = |V| the length of V. We order degrees lexicographically, so that (ℓ, m) < (ℓ′, m′) if either ℓ < ℓ′, or ℓ = ℓ′ and m < m′.

A proof of Γ, x: (Πt1.⋯Πtk.τ) ▷ x: τ from this same series of proof steps may be used to obtain Γ, x: (Πt1.⋯Πtk.τ) ▷ x: Πt1.⋯Πtk.τ, and conversely. The remaining cases are essentially alike. We will illustrate the argument using the let case, since this is slightly more involved than the others. By Lemma 11.3.20, we need only consider proofs that end with rule (let)2. Therefore, we have ⊢ML2 Γ ▷ [U/x](let y = V1 in V2): τ″ iff ⊢ML2 Γ ▷ (let y = [U/x]V1 in [U/x]V2): τ″ iff, by the last proof rule, both ⊢ML2 Γ ▷ [U/x]V1: σ′ and ⊢ML2 Γ, y: σ′ ▷ [U/x]V2: τ″. Since τ″ is not a polymorphic type, the inductive hypothesis implies that the second condition is equivalent to ⊢ML2 Γ, x: σ, y: σ′ ▷ V2: τ″, where σ = Πt1.⋯Πtk.τ. To apply the inductive hypothesis to Γ ▷ [U/x]V1: σ′, we note that σ′ must have the form σ′ = Πs1.⋯Πsl.τ′. We assume without loss of generality that s1, ..., sl do not occur free in Γ and remove the quantifiers by rule (inst). Using the inductive hypothesis on the typing assertion obtained in this way, we observe:

    ⊢ML2 Γ ▷ [U/x]V1: Πs1.⋯Πsl.τ′
        iff ⊢ML2 Γ ▷ [U/x]V1: τ′
        iff ⊢ML2 Γ, x: σ ▷ V1: τ′
        iff ⊢ML2 Γ, x: σ ▷ V1: Πs1.⋯Πsl.τ′

Proof of Theorem 11.3.5 The proof uses induction on an ordering of untyped terms by degree, as described in the proof of Lemma 11.3.12. Specifically, the degree of V is the pair dg(V) = (ℓ, m) of natural numbers with ℓ the length of the longest let-reduction sequence from V and m = |V| the length of V. We show that the two systems are equivalent for terms of each syntactic form. The cases for variable, application and abstraction are essentially straightforward, using Lemma 11.3.20. To give an example, we will prove the theorem for abstractions. By Lemma 11.3.20, we have ⊢ML2 Γ ▷ λx.U: τ → τ′ iff ⊢ML2 Γ, x: τ ▷ U: τ′, where by Lemma 11.3.19 we assume that x does not occur in Γ. By the inductive hypothesis, ⊢ML2 Γ, x: τ ▷ U: τ′ iff ⊢ML1 Γ, x: τ ▷ U: τ′. By inspection of the ML1 rules, we can see that since x does not occur in Γ, ⊢ML1 Γ, x: τ ▷ U: τ′ iff ⊢ML1 Γ ▷ λx.U: τ → τ′. Since the primary difference between the two systems is in the let rules, this is the main case of the proof. We prove each direction of the equivalence separately. If ⊢ML1 Γ ▷ let x = U in V: τ, then the derivation concludes with the rule

    Γ ▷ U: τ′    Γ ▷ [U/x]V: τ
    ———————————————————————————          (let)1
    Γ ▷ let x = U in V: τ


There are two cases to consider. If x does not occur free in V, then by the inductive hypothesis, we know that both hypotheses of this proof rule are ML2-provable. Since [U/x]V = V in this case, from Γ ▷ V: τ we obtain Γ, x: τ′ ▷ V: τ, and so we complete the ML2 proof by

    Γ ▷ U: τ′    Γ, x: τ′ ▷ V: τ
    ——————————————————————————————       (let)2
    Γ ▷ let x = U in V: τ

using rule (let)2. The more difficult case is when x occurs free in V. Since we have shown, in Corollary 11.3.11, that ML1 has principal typings, we may assume by Lemma 11.3.21 that Γ ▷ U: τ′ is the Γ-principal typing of U. This means that every ML1-provable typing assertion of the form Γ ▷ U: τ″ has τ″ = Sτ′ for some substitution S which only affects type variables t1, ..., tk that are free in τ′ but not in Γ. The Γ-principality of Γ ▷ U: τ′ carries over to ML2 by the induction hypothesis. By the inductive hypotheses, we also have ⊢ML2 Γ ▷ U: τ′ and ⊢ML2 Γ ▷ [U/x]V: τ. By rule (gen), we have ⊢ML2 Γ ▷ U: Πt1.⋯Πtk.τ′ and by Lemma 11.3.22 ⊢ML2 Γ, x: (Πt1.⋯Πtk.τ′) ▷ V: τ. This allows us to apply rule (let)2, concluding this direction of the proof. For the converse, we assume ⊢ML2 Γ ▷ let x = U in V: τ. By Lemma 11.3.20, we may assume that the proof ends with the proof step

    Γ ▷ U: σ    Γ, x: σ ▷ V: τ
    ———————————————————————————          (let)2
    Γ ▷ let x = U in V: τ

where σ = Πt1.⋯Πtk.τ′ for some sequence t1, ..., tk of type variables and nonpolymorphic type τ′. Without loss of generality, we may assume that t1, ..., tk do not appear free in Γ. By (inst), we have ⊢ML2 Γ ▷ U: τ′, and hence this assertion is ML1-provable by the induction hypothesis. There is no loss of generality in assuming that Γ ▷ U: τ′ is a Γ-principal typing for U. This does not affect the provability of Γ, x: σ ▷ V: τ, since we may use (inst) on the variable x to obtain any substitution instance of τ′ in the typing for V. This gives us ⊢ML2 Γ ▷ [U/x]V: τ by Lemma 11.3.22 and therefore ⊢ML1 Γ ▷ [U/x]V: τ by the inductive hypothesis. We now use rule (let)1 to finish the proof. ∎

11.3.5 Complexity of ML Type Inference

In this section, we show that the recognition problem for ML typing is exponential-time complete. In other words, any algorithm that decides whether untyped λ,let-terms are ML-typable requires exponential time in the worst case. All of the algorithms given in Section 11.3.3 run in at most exponential time, and are therefore optimal in principle. This stands in contrast to the complexity of Curry type inference, which may be decided in linear time, as shown in Theorem 11.2.17.


In discussing size and running time, we use standard "big-oh" notation as in Section 11.2.5, writing f(n) = g(O(h(n)) if there is a positive real number c > 0 with f(n) g(ch(n)) for all sufficiently large n. For lower bounds, we write f(n) g(Q(h(n)) if there is a positive real number c > 0 with f(n) g(ch(n)) for infinitely many n. A more detailed analysis of any of the ML typing algorithms shows that the running time is exponential in the let-depth of a term, and otherwise linear. (To be precise, the is where U running time on input U is let-depth a term, of defined in Section 9.2.5, is the depth of nesting of let-declarations. In the worst case, the let-depth of U may be proportional to the length of U. However, for many practical kinds of programs, the depth of non-trivial nesting appears bounded. There are two main proofs of the exponential-time lower bound for ML typing. One is a direct encoding of exponential-time Turing machines as typing problems. The other uses a variant of unification called semi-unification. The first style of proof was developed and refined in a series of papers by several closely-communicating authors [KM89, Mai90, KMM91, HM941. The semi-unification proof of [KTEJ90bj is related to a larger investigation of polymorphic type inference carried out in a separate series of papers [KTEJ88, KTU89, KTU90a]. One intriguing aspect of the algorithmic lower bound for ML typing is that it does not seem to effect the practical use of the ML type inference algorithm. The lower bound shows that for arbitrarily large n, there exist terms of length n that require on the order of steps to decide whether they are typable. Like any other algorithmic lower bound, the lower bound for ML type inference relies on the construction of an infinite family of "hard instances" of the typing problem. However, the empirical evidence is that these hard instances do not often arise in practice. 
In spite of twenty years of ML programming, the lower bounds were a complete surprise to most ML programmers. An interesting direction for future research would be to characterize the form of most "practical" ML programs in a way that explains the apparent efficiency of the algorithm in practice. We do have some explanation, in the way that the complexity depends on the let-depth of an expression, but there is likely to be more to the story.

A few example terms with large types will give us some useful intuition for the difficulties of ML typing. Before proceeding, it may be helpful to recall that in algorithm PTL, type variables in the types of let-bound expressions are renamed at each occurrence. For example, consider a term of the form let x = M in N with M closed. The first step in typing this term is to compute the principal type of M. Then, for each occurrence of x in N, the algorithm uses a copy of this principal type of M with all type variables renamed to be different from those used in every other type. (Something similar but slightly more complicated is done when M is not closed.) Because of variable renaming, the number of type variables in the principal type of a term may be exponential in the length of the term. The following example illustrating this exponential growth is due to Mitchell Wand and, independently, Peter Buneman.

Example 11.3.23

Consider the expression

let x = M in ⟨x, x⟩

where M is closed with principal type σ. The principal type of this expression is σ′ × σ″, where σ′ and σ″ are copies of σ with type variables renamed differently in each case. Unlike the expression (λx.⟨x, x⟩)M in Example 11.2.18, not only is the type twice as long as σ, but the type has twice as many type variables. For this reason, even the dag representation (see Exercise 11.2.6) for the type of the expression with let is twice as large as the dag representation for the type of M. We can iterate this doubling of type variables using terms of the following form:

let x0 = M in
let x1 = ⟨x0, x0⟩ in
  ...
let xn = ⟨xn−1, xn−1⟩ in xn

It is not hard to check that for each n, the principal type of this expression contains 2^n renamed copies of σ. Consequently, the dag representation of this type will have 2^n times as many nodes as the dag representation of σ, and the type will have 2^n times as many type variables.
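The doubling of type variables can be checked mechanically. The following Python sketch is not from the text; representing types as nested tuples of integer-numbered variables is an illustrative assumption. It simulates the renaming step of algorithm PTL for the nested lets above and counts distinct type variables at each level.

```python
from itertools import count

fresh = count()

def rename(t, mapping):
    """Copy a type, giving each type variable a fresh name (PTL's renaming step)."""
    if isinstance(t, int):                       # a type variable
        if t not in mapping:
            mapping[t] = next(fresh)
        return mapping[t]
    return tuple(rename(s, mapping) for s in t)  # a product type

def pair_of_copies(t):
    """Principal type of <x, x> for let-bound x : t -- each occurrence renamed."""
    return (rename(t, {}), rename(t, {}))

def type_vars(t):
    return {t} if isinstance(t, int) else set().union(*(type_vars(s) for s in t))

t = next(fresh)            # principal type of M: a single type variable
for n in range(1, 6):      # let x0 = M in let x1 = <x0, x0> in ...
    t = pair_of_copies(t)
    print(n, len(type_vars(t)))   # 2, 4, 8, 16, 32 -- doubles at each level
```

Each level of let-nesting doubles the count, so n levels of let produce 2^n distinct type variables from a single variable in the type of M.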

Although the dag representation of the principal type for the expression in Example 11.3.23 has exponential size, other types for this expression have smaller dag representations. In particular, consider the instance obtained by applying a substitution which replaces all type variables with a single variable t. Since all subexpressions of the resulting type share the same type variable, this produces a typing with linear size dag representation. In the following example, we construct expressions such that the principal types have doubly-exponential size when written out as strings, and the dag representations of every typing must have at least exponential size.

Example 11.3.24  Recall that the expression P ≡ λx.⟨x, x⟩ from Example 11.2.18 doubles the length of the principal type of its argument. Consequently, the n-fold composition of P (with itself) increases the length of the principal type by a factor of 2^n. Using nested lets, we can define the 2^n-fold composition of P using an expression of length O(n). This gives us an expression whose principal type has doubly-exponential length and exponential dag size. Since the dag must contain an exponential-length path, any substitution instance of the principal type also has exponential dag size. Such an expression is defined as follows:

let x0 = P in
let x1 = λy. x0(x0 y) in
  ...
let xn = λy. xn−1(xn−1 y) in
xn Q

where for simplicity we may let Q be λz. z. To write the principal type of this expression simply, let us use the notation τ[k] for the k-ary product, defined inductively by τ[1] ≡ τ and τ[k+1] ≡ τ × τ[k]. By examining the expression and tracing the behavior of PTL, we see that x0 has principal type t → t[2], and for each k > 0 the principal type of xk is t → t[2^{2^k}]. Consequently, the principal type of the entire expression is (t → t)[2^{2^n}], which has length at least 2^{2^n}. Since a dag representation can reduce this by at most one exponential (see Exercise 11.2.6), the dag size of the principal type is at least 2^n.
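The growth rates in this example are easy to confirm numerically. The sketch below is illustrative, not from the text: it applies the doubling step of P thirty-two times (the effect of x5 ≡ P^{2^5}), building each pair from two references to the same object, so the string form of the type explodes while the shared dag stays small.

```python
t = "t"
string_leaves = 1
for _ in range(32):      # thirty-two applications of P = λx.<x, x>  (x5 = P^{2^5})
    t = (t, t)           # both components reference the SAME object: the dag is shared
    string_leaves *= 2

def dag_size(u, seen):
    """Count distinct nodes in the shared (dag) representation."""
    if id(u) in seen:
        return 0
    seen.add(id(u))
    return 1 if u == "t" else 1 + dag_size(u[0], seen) + dag_size(u[1], seen)

n_dag = dag_size(t, set())
print(string_leaves)     # 4294967296 = 2**32 occurrences of t in the written-out type
print(n_dag)             # 33 nodes: one per doubling, plus the single leaf
```

With 2^n doublings expressed through n nested lets, the string length is on the order of 2^{2^n} while the dag size is only about 2^n, matching the one-exponential gap noted in the example.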

Proposition 11.3.25  For arbitrarily large n, there exist closed expressions of length O(n) whose principal types have doubly-exponential length, exponential dag size and exponentially many distinct type variables. Furthermore, every instance of the principal type must have exponential dag size.

Proof  For any n ≥ 1, we may construct an expression Wa as in Example 11.3.23 whose principal type has exponentially many type variables, and an expression Wb as in Example 11.3.24 satisfying the remaining conditions. The expression ⟨Wa, Wb⟩ proves the proposition.

The exponential-time lower bound for ML typing is proved using an adaptation of Example 11.3.24. By Lemma 11.3.7, let x = M in N has the same types as [M/x]N when x occurs in N. It follows that the principal type of the expression in Example 11.3.24 is the same as the type of the term obtained by let-reduction,

P^{2^n} Q

Therefore, we may compute the type of P^{2^n} Q by working across the term from right to left. This process begins with the principal type of Q and, for each of the occurrences of P, finds a type that makes the application well-formed. Since there are 2^n occurrences of P, we unify the type giving the domain of P a total of 2^n times, even though the term has length O(n). We may see the general pattern a little more clearly if we introduce some notation. Let ρ0 be the principal type of Q and σ → τ the principal type of P. Then we type P^{2^n} Q by computing a sequence of types ρ1, ρ2, ..., with ρ_{i+1} determined from ρ_i and σ → τ by letting S = Unify(ρ_i, σ) and ρ_{i+1} = Sτ. We may think of the type σ → τ as an "instruction" for computing ρ_{i+1} from ρ_i by unification, with this instruction executed 2^n times in the typing of P^{2^n} Q.

In the lower bound proof, we replace the "initial" term Q and the "instruction" P by terms whose types code parts of a Turing machine computation. More specifically, suppose ρ0 codes the initial configuration of a Turing machine in some way, and the type σ → τ has the property that if Sσ codes the i-th configuration in some computation, then Sτ codes the (i + 1)-st configuration. Then ρ_{2^n}, computed as above, will represent the configuration of the Turing machine after 2^n computation steps. Consequently, we can represent an exponential-time computation by an appropriate adaptation of P^{2^n} Q, provided we have some way to code an initial configuration and the one-step function of the Turing machine. Given the results in Section 11.2, which show that we can represent any unification problem, combined with the idea in [DKM84] that boolean circuit evaluation can be represented by unification, it should not be difficult to imagine that we can encode Turing machine computation in this way. Note that the number of steps we can encode depends on the let-depth of the term. This leads us to the following theorem.

Theorem 11.3.26  Any algorithm that decides whether any given untyped λ, let-term U is ML typable requires 2^{Ω(ℓd(U))} time on input U, where ℓd(U) is the let-depth of U.

We first look at an outline of the proof, then fill in the details.
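The iteration ρ_{i+1} = (Unify(ρ_i, σ))τ can be tried concretely. The Python sketch below is an illustration under simplifying assumptions: a minimal first-order unifier with no occurs-check, and a bare type variable standing in for the principal type of Q. It replays the "instruction" σ → τ of P five times and watches the type double at every step.

```python
import itertools

def walk(t, subst):
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    """Minimal first-order unification (no occurs-check, for brevity)."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str):          # a type variable: bind it
        subst[t1] = t2
        return subst
    if isinstance(t2, str):
        subst[t2] = t1
        return subst
    if t1[0] != t2[0]:
        raise TypeError("clash")
    for a, b in zip(t1[1:], t2[1:]):
        subst = unify(a, b, subst)
    return subst

def resolve(t, subst):
    t = walk(t, subst)
    return t if isinstance(t, str) else (t[0],) + tuple(resolve(a, subst) for a in t[1:])

def size(t):
    return 1 if isinstance(t, str) else 1 + sum(size(a) for a in t[1:])

fresh = itertools.count()
rho = "a"                          # stand-in for the principal type of Q
for i in range(5):
    s = f"s{next(fresh)}"          # each occurrence of P renamed: sigma = s, tau = s * s
    sigma, tau = s, ("*", s, s)
    S = unify(rho, sigma, {})
    rho = resolve(tau, S)          # rho_{i+1} = S tau
    print(i, size(rho))            # 3, 7, 15, 31, 63 -- doubling at every "instruction"
```

After 2^n such steps the type has size on the order of 2^{2^n} as a string, which is why the dag representation is essential to the exponential (rather than doubly-exponential) upper bound.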

Proof Sketch  Let T be any deterministic, single-tape Turing machine and let a be an input string of length n. We assume without loss of generality that T has exactly one accepting state and that once T enters this state, no subsequent computation step leads to a different state. This assumption, which may be justified by standard manipulation of Turing machines [HU79], makes it easy to see whether T accepts its input.

There are also some minor details associated with the way we represent an infinite tape by a type expression. This leads us to modify the standard Turing machine model slightly. Specifically, we represent the tape by a list of symbols to the left of the input head and a list of symbols to the right of the input head. The list of symbols to the left of a Turing machine tape head is always finite. On the right, there will always be a finite list of symbols after which all symbols will be blank. Therefore, we can also represent the infinite tape to the right of the head by a finite list. The only complication comes when executing a move to the right, onto a blank tape square that is not in our finite list. To handle this case, we assume that the Turing machine is started with a special end marker symbol, $, at the end of the input. We also assume that the Turing machine moves the end marker to the right as needed, so that it always marks the right-most tape position scanned in the computation. To make this possible, we augment the standard "left" and "right" moves of the Turing machine with a special "insert" move that simultaneously writes a symbol on the tape and moves all symbols currently to the right of the tape head one additional tape square to the right. This allows the machine to insert as many blanks (or other symbols) on the tape as needed, without running over the end marker. An essentially equivalent approach would be to work with 2-stack machines instead of Turing machines. However, since Turing machines are more familiar, we will stick to Turing machines.

We will construct a term that is ML-typable if T accepts input a in 2^n or fewer steps. A configuration of T will be represented by a triple ⟨q, L, R⟩, where q is the state of the machine, L is the list of symbols appearing to the left of the tape head, and R is the list of symbols appearing to the right of the tape head. Let Init be an untyped lambda term whose principal type represents the initial configuration of T on input a. (This configuration is (q1, ε, a$), where ε is the empty string, q1 is the starting state of T, and the input a sits to the right of the tape head, terminated by the end marker $.) Let Next be an untyped lambda term whose principal type represents the transition function of T. Specifically, the principal type of Next must have the form σ → τ, with the property that if Sσ represents a configuration of T, for some substitution S, then Sτ represents the configuration of T after one move of the Turing machine. We choose two distinct types representing boolean values "true" and "false" and assume there are untyped lambda terms with these principal types. Let Accept be an untyped lambda term whose principal type has the form σ → β with the property that if Sσ represents an accepting configuration, then Sβ represents "true," while if Sσ represents a non-accepting configuration, Sβ represents "false." These lambda terms are combined, following the pattern of Example 11.3.24, into a term Run constructed so that if T accepts input a in 2^n or fewer steps, then the type of Run represents "true," and "false" otherwise.
Choosing "true" and "false" appropriately, we may apply Run to arguments so that the resulting term is typable only if the type represents "true." This allows us to decide whether T accepts input a by deciding whether a term involving Run has an ML type. Since we can write Run as a term of length linear in a and the description of T, using the pattern of P^{2^n} Q from Example 11.3.24, it requires exponential time to decide whether a given term is ML-typable. In the remainder of the section, we describe the construction of untyped lambda terms Init, Next and Accept for a Turing machine T satisfying the assumptions given in the proof sketch for Theorem 11.3.26. An important insight in [Hen90a, HM94] is that these terms can be motivated by considering a particular coding of Turing machines by lambda terms, then showing that the computation is also coded by the types of these terms. A consequence of this approach is that the lower bound may be extended to impredicative systems that only give types to terms with normal forms.

Type Inference with Polymorphic Declarations

11.3

811

We begin with terms for booleans,

tt ≡ λx.λy.x : t → s → t
ff ≡ λx.λy.y : t → s → s

and consider the types as well as the terms as representing "true" and "false." This coding for booleans is a special case of a general idea. Specifically, if we wish to code elements of some finite set of size k, we can use the Curried projection function

π_i^k ≡ λx1. λx2. ... λxk. xi : t1 → t2 → ... → tk → ti

or its type to represent the ith element of the set. We therefore represent the states q1, ..., qn ∈ Q of T by π_1^n, ..., π_n^n, the tape symbols c1, ..., cm ∈ C by π_1^m, ..., π_m^m, and the tape head directions left, right and insert by π_1^3, π_2^3 and π_3^3. (The end marker $ ∈ C is one of the tape symbols.) An insert move is our special move that allows the machine to insert a tape symbol in front of the end marker in a single move. We may represent a tuple ⟨V1, V2, ..., Vk⟩ by the function λx. xV1 ... Vk and a list [V1, ..., Vk] by the nested sequence of pairs

[V1, ..., Vk] ≡ ⟨V1, ⟨V2, ... ⟨Vk, nil⟩ ... ⟩⟩

where nil ≡ λz.z.
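These lambda codings behave the same way in any higher-order language. The Python sketch below is illustrative only: it builds the Curried projections π_i^k, tuples as λx. x V1 ... Vk, and nil-terminated lists, and shows that applying a tuple to a projection, or to any curried function, extracts its components.

```python
def proj(i, k):
    """pi_i^k = λx1. ... λxk. xi as a curried Python function (1-indexed)."""
    def take(collected):
        if len(collected) == k:
            return collected[i - 1]
        return lambda x: take(collected + [x])
    return take([])

tt = proj(1, 2)                   # λx.λy.x
ff = proj(2, 2)                   # λx.λy.y

def tup(*vs):
    """<V1, ..., Vk> = λx. x V1 ... Vk."""
    def apply_to(x):
        for v in vs:
            x = x(v)
        return x
    return apply_to

nil = lambda z: z
def make_list(vs):                # [V1, ..., Vk] = <V1, <V2, ... <Vk, nil> ... >>
    r = nil
    for v in reversed(vs):
        r = tup(v, r)
    return r

p = tup("head", "tail")
print(p(tt))                           # head -- a pair applied to a projection
print(p(ff))                           # tail
print(p(lambda x: lambda y: (x, y)))   # ('head', 'tail') -- both components at once
```

The last line is the pattern used below to destructure tuples: applying a tuple to a curried function binds all of its components simultaneously.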

Exercise 11.3.29 illustrates some typing difficulties associated with projection functions. We circumvent these problems using the following syntactic form

match (x, y) = U in V ≡ U(λx.λy.V)

and similarly for match (x, y, z) = U in V. For untyped terms, match (x, y) = U in V is equivalent to the pattern-matching form let (x, y) = U in V as in Section 2.2.6. However, the typing properties are different, as explained in Exercise 11.3.29.

Using these general coding ideas, we can represent the initial Turing machine configuration of T on input a by a triple

Init ≡ ⟨q1, nil, [a1, a2, ..., $]⟩

where we assume q1 is the start state of T, nil represents the empty list of symbols to the left of the tape, and [a1, a2, ..., $] represents the list of input symbols to the right of the tape head, followed by the end marker symbol $; i.e., if the ith symbol of input a is the jth tape symbol, then ai ≡ π_j^m. The transition function of the Turing machine is represented by

Next ≡ λconfig.  match (q, Left, Right) = config in
                 match (t, L) = Left in
                 match (r, R) = Right in
                 match (q′, c′, d′) = delta q r in
                     d′ MoveL MoveR Insert

MoveL ≡ ⟨q′, L, ⟨t, ⟨c′, R⟩⟩⟩
MoveR ≡ ⟨q′, ⟨c′, ⟨t, L⟩⟩, R⟩
Insert ≡ ⟨q′, ⟨c′, ⟨t, L⟩⟩, ⟨r, R⟩⟩

where configurations MoveL, MoveR and Insert correspond to moving the tape head left, right or inserting a new tape symbol, and delta is a function returning the next state, tape symbol, and direction for the tape head move. In words, Next assumes its argument is a triple ⟨q, ⟨t, L⟩, ⟨r, R⟩⟩ representing a configuration of T. The function delta is applied to the state q and symbol r under the tape head, returning a tuple ⟨q′, c′, d′⟩ containing the next state, tape symbol to be written and direction for the tape head to move. Since the next direction d′ will be either π_1^3, π_2^3 or π_3^3, we can apply d′ to terms that represent the next configuration if the machine moves left, the next configuration if the machine moves right, and the next configuration if the machine inserts a symbol.

The function delta depends on the transition function of T. It is straightforward, although potentially tedious, to write this function out completely. If there are three states and two tape symbols, for example, then the state will be represented by a projection function on triples and the tape symbol by a projection function on pairs. In this case, delta could be written in the form

delta ≡ ⟨⟨t11, t12⟩, ⟨t21, t22⟩, ⟨t31, t32⟩⟩

where tij ≡ ⟨q′, c′, d′⟩ represents the move from state qi and tape symbol cj.
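A direct, non-lambda-encoded version of Next makes the three moves easy to check. In this Python sketch (an illustration; the dictionary delta and the tuple representation are assumptions of the sketch), a configuration is a triple (q, Left, Right) with Left = (t, L) and Right = (r, R), r being the symbol under the head.

```python
# Configurations are triples (q, Left, Right); Left = (t, L), Right = (r, R),
# with r the symbol under the head. delta maps (state, symbol) to (q', c', d').
NIL = ()

def next_config(config, delta):
    q, left, right = config
    t, L = left
    r, R = right
    q2, c2, d = delta[q, r]
    if d == "left":
        return (q2, L, (t, (c2, R)))            # MoveL
    if d == "right":
        return (q2, (c2, left), R)              # MoveR: push c' onto the left list
    return (q2, (c2, left), right)              # Insert: right part keeps (r, R)

delta = {("q1", "c1"): ("q1", "c1", "right")}
config = ("q1", ("c1", NIL), ("c1", ("c1", NIL)))
print(next_config(config, delta))
# ('q1', ('c1', ('c1', ())), ('c1', ()))
```

The written symbol slides onto the left list on a right move, and the old left-top becomes the new under-head symbol on a left move, exactly as in MoveR and MoveL above.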

It is not hard to verify that Next works properly. What is slightly more surprising is that we can read the computation of T from the types of Init and Next. More specifically, consider the types of tt and ff as representing "true" and "false", and similarly for the types of projection functions. Abusing notation slightly, let us write ⟨σ1, ..., σk⟩ for the principal type of the tuple ⟨V1, ..., Vk⟩, assuming each Vi has principal type σi, and similarly [σ1, ..., σk] for the principal type of the list [V1, ..., Vk] (a nested pair type ending in the type t → t of nil, where t does not occur in σ1, ..., σk). Then the type of Init also codes the initial configuration of T, and the type of Next codes the transition function of T. This gives us the following lemma.

Lemma 11.3.27  Let C : σ be a term and its principal type, both representing a configuration of Turing machine T, and let C′ : σ′ similarly represent the configuration after one computation step of T. Then Next C →→ C′ and Next C : σ′.

This essentially completes the proof of Theorem 11.3.26. If Next^k Init gives us the machine configuration after k steps, it is easy to use match or application to projection functions to obtain the machine state after k steps. This lets us write the term Accept as required in the proof sketch. Exercise 11.3.28 asks you to find terms so that Run is typable if the Turing machine accepts. An interesting way to "check" the proof is to convert the lambda terms Init and Next into ML syntax and use the ML type checker. A simple, illustrative example appears in Exercise 11.3.30.

Exercise 11.3.28  Suppose untyped λ, let-term Run has type s → t → s if a Turing machine T accepts and s → t → t otherwise. Write a term using Run that is ML-typable if the Turing machine accepts.

Exercise 11.3.29  We may define untyped projection functions by

fst ≡ λp. p(λx.λy.x)
snd ≡ λp. p(λx.λy.y)

It is easy to check that for the coding of pairs by ⟨U, V⟩ ≡ λz. zUV, we have fst⟨U, V⟩ = U and snd⟨U, V⟩ = V by untyped reduction. However, there is a typing problem with these projection functions in ML. This may be illustrated by considering an alternate definition

match (x, y) = U in V ≡ (λx.λy.V)(fst U)(snd U)

(a) Show that for both versions of match, we have match (x, y) = ⟨U1, U2⟩ in V = (λx.λy.V)U1U2 in untyped lambda calculus.

(b) Show that the type of (λz. match (x, y) = z in ⟨x, y⟩)⟨tt, ff⟩ depends on which definition of match we use. Why is the first definition given in this section better than the alternative given in this exercise?

Exercise 11.3.30  The following Standard ML code gives the representation of a simple Turing machine with three states, q1, q2, q3 ∈ Q and two tape symbols, c1, c2 ∈ C. For simplicity, the special "insert" move is omitted. The machine moves right until the second occurrence of c2, then moves back and forth indefinitely over the second occurrence of c2. Type this program in and "run" the simulation for enough steps to verify that the machine enters state q3 when passing over the second c2. Then modify the machine so that instead of leaving the input tape alone, it changes all of the symbols to c1, up to and including the second c2, before looping back and forth over the square containing the second c2. Test your encoding by computing enough steps to verify that you have made the modification correctly.

fun pi_1_2 (x_1) (x_2) = x_1;
fun pi_2_2 (x_1) (x_2) = x_2;
val tt = pi_1_2;
val ff = pi_2_2;
fun pi_1_3 (x_1) (x_2) (x_3) = x_1;
fun pi_2_3 (x_1) (x_2) (x_3) = x_2;
fun pi_3_3 (x_1) (x_2) (x_3) = x_3;
val q1 = pi_1_3;
val q2 = pi_2_3;
val q3 = pi_3_3;
val c1 = pi_1_2;
val c2 = pi_2_2;
val left = pi_1_2;
val right = pi_2_2;
fun pair (x) (y) = fn w => w x y;
fun triple (x) (y) (z) = fn w => w x y z;
fun nil' (x) = x;
val cons = pair;
val t1 = triple q1 c1 right;   (* move from state q1 on symbol c1 *)
val t2 = triple q2 c2 right;   (* move from state q1 on symbol c2 *)
val t3 = triple q2 c1 right;   (* move from state q2 on symbol c1 *)
val t4 = triple q3 c2 left;    (* move from state q2 on symbol c2 *)
val t5 = triple q2 c1 left;    (* move from state q3 on symbol c1 *)
val t6 = triple q2 c2 right;   (* move from state q3 on symbol c2 *)
val delta = triple (pair t1 t2) (pair t3 t4) (pair t5 t6);
val a = cons c1 (cons c2 (cons c1 (cons c2 (cons c1 nil'))));
val init = triple q1 (cons c1 nil') a;
fun next (config) =
    config (fn q => fn Left => fn Right =>
    Left   (fn l => fn L =>
    Right  (fn r => fn R =>
    (delta q r) (fn q' => fn c' => fn d' =>
        d' (triple q' L (cons l (cons c' R)))
           (triple q' (cons c' (cons l L)) R)
           (triple q' (cons c' (cons l L)) (cons r R))))));
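The exercise machine can be cross-checked with a short Python simulation. The transition table below mirrors the values t1 through t6 in the SML code; treat it as an illustrative reconstruction, not as part of the text. The run asserts that the machine enters q3 while passing over the second c2.

```python
# Transition table mirroring t1..t6: move right to the second c2, then
# oscillate near it, entering q3 each time the head passes that square.
delta = {
    ("q1", "c1"): ("q1", "c1", "right"),
    ("q1", "c2"): ("q2", "c2", "right"),
    ("q2", "c1"): ("q2", "c1", "right"),
    ("q2", "c2"): ("q3", "c2", "left"),
    ("q3", "c1"): ("q2", "c1", "left"),
    ("q3", "c2"): ("q2", "c2", "right"),
}

def step(state, left, right):
    """left[0] is the symbol just left of the head; right[0] is under the head."""
    q2, c2, d = delta[state, right[0]]
    if d == "right":
        return q2, [c2] + left, right[1:]
    # moving left: the old left-top and the written symbol slide into the right list
    return q2, left[1:], [left[0], c2] + right[1:]

state, left, right = "q1", ["c1"], ["c1", "c2", "c1", "c2", "c1"]
seen_q3 = False
for _ in range(20):
    state, left, right = step(state, left, right)
    if state == "q3":
        seen_q3 = True        # entered q3 while handling the second c2
print(seen_q3)   # True
```

The same trace can be reproduced in SML by iterating next on init and inspecting the state component with a projection, which is exactly what the exercise asks you to do.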
