OXFORD LOGIC GUIDES

1. Jane Bridge: Beginning model theory: the completeness theorem and some consequences
2. Michael Dummett: Elements of intuitionism (1st edition)
3. A. S. Troelstra: Choice sequences: a chapter of intuitionistic mathematics
4. J. L. Bell: Boolean-valued models and independence proofs in set theory (1st edition)
5. Krister Segerberg: Classical propositional operators: an exercise in the foundation of logic
6. G. C. Smith: The Boole-De Morgan correspondence 1842-1864
7. Alec Fisher: Formal number theory and computability: a work book
8. Anand Pillay: An introduction to stability theory
9. H. E. Rose: Subrecursion: functions and hierarchies
10. Michael Hallett: Cantorian set theory and limitation of size
11. R. Mansfield and G. Weitkamp: Recursive aspects of descriptive set theory
12. J. L. Bell: Boolean-valued models and independence proofs in set theory (2nd edition)
13. Melvin Fitting: Computability theory: semantics and logic programming
14. J. L. Bell: Toposes and local set theories: an introduction
15. R. Kaye: Models of Peano arithmetic
16. J. Chapman and F. Rowbottom: Relative category theory and geometric morphisms: a logical approach
17. Stewart Shapiro: Foundations without foundationalism
18. John P. Cleave: A study of logics
19. R. M. Smullyan: Gödel's incompleteness theorems
20. T. E. Forster: Set theory with a universal set: exploring an untyped universe
21. C. McLarty: Elementary categories, elementary toposes
22. R. M. Smullyan: Recursion theory for metamathematics
23. Peter Clote and Jan Krajíček: Arithmetic, proof theory, and computational complexity
24. A. Tarski: Introduction to logic and to the methodology of deductive sciences
25. G. Malinowski: Many valued logics
26. Alexandre Borovik and Ali Nesin: Groups of finite Morley rank
27. R. M. Smullyan: Diagonalization and self-reference
28. Dov M. Gabbay, Ian Hodkinson, and Mark Reynolds: Temporal logic: mathematical foundations and computational aspects: Volume 1
29. Saharon Shelah: Cardinal arithmetic
30. Erik Sandewall: Features and fluents: Volume I: a systematic approach to the representation of knowledge about dynamical systems
31. T. E. Forster: Set theory with a universal set: exploring an untyped universe (2nd edition)
32. Anand Pillay: Geometric stability theory
33. Dov M. Gabbay: Labelled deductive systems
34. Raymond M. Smullyan and Melvin Fitting: Set theory and the continuum problem
35. Alexander Chagrov and Michael Zakharyaschev: Modal logic
36. G. Sambin and J. Smith: Twenty-five years of Martin-Löf constructive type theory
37. María Manzano: Model theory
38. Dov M. Gabbay: Fibring logics
39. Michael Dummett: Elements of intuitionism (2nd edition)
40. D. M. Gabbay, M. A. Reynolds, and M. Finger: Temporal logic: mathematical foundations and computational aspects: Volume 2
41. J. M. Dunn and G. Hardegree: Algebraic methods in philosophical logic
Algebraic Methods in Philosophical Logic J. MICHAEL DUNN and GARY M. HARDEGREE
CLARENDON PRESS • OXFORD 2001
OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York Athens Auckland Bangkok Bogota Buenos Aires Calcutta Cape Town Chennai Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Paris Sao Paulo Singapore Taipei Tokyo Toronto Warsaw with associated companies in Berlin Ibadan

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© J. M. Dunn and G. M. Hardegree, 2001

The moral rights of the authors have been asserted

Database right Oxford University Press (maker)

First published 2001

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer

A catalogue record for this title is available from the British Library.

Library of Congress Cataloging in Publication Data
Dunn, J. Michael, 1941-
Algebraic methods in philosophical logic / J. Michael Dunn and Gary Hardegree.
p. cm. (Oxford logic guides; 41)
Includes bibliographical references and index.
1. Algebraic logic. I. Hardegree, Gary. II. Title. III. Series.
QA10.D85 2001 511.3'24-dc21 2001021287

ISBN 0 19 853192 3 (Hbk)

Typeset by the authors in LaTeX

Printed in Great Britain on acid-free paper by T. J. International Ltd, Padstow, Cornwall
Dedicated to our loved, and loving spouses,
Sally Dunn and Kate Dorfman, who have been with us even longer than this book
PREFACE

This book has been in process over many years. Someone once said "This letter would not be so long if I had more time," and we have somewhat the dual thought regarding this book. The book was begun by JMD in the late 1960s in the form of handouts for an algebraic logic seminar, taught first at Wayne State University and then at Indiana University. Chapters 1 through 3, 8, 10, and 11 date in their essentials from that period. GMH joined the project after taking the seminar in the middle 1970s, but it did not really take off until GMH visited Indiana University in 1982. The bulk of the collaborative writing was done in the academic year 1984-85, especially during the spring semester when JMD visited the University of Massachusetts (Amherst) to work with GMH. JMD wishes to thank the American Council of Learned Societies for a fellowship during his sabbatical year 1984-85. Most of Chapters 4 through 7 were written jointly during that period. Little did we know then that this would be a "book for the new millennium." We wish to thank Dana Scott for his encouragement at that stage, but also for his critical help in converting our initial work, written in a then popular word processor, to LaTeX. We also thank his then assistants, John Aronis and Stacy Quackenbush, for their skillful and patient work on the conversion and formatting. Then, for various reasons, the project essentially came to a stop after our joint work of 1984-85. But JMD resumed it, preparing a series of draft manuscripts for seminars. GMH is the principal author of Chapter 9, and JMD is the principal author of the remaining chapters. It is impossible to recall all of the students who provided lists of typos or suggestions, but we especially want to thank Gerry Allwein, Alexandru Baltag, Axel Barcelo, Gordon Beavers, Norm Danner, Eric Hammer, Timothy Herron, Yu-Houng Houng, Albert Layland, Julia Lawall, Jay Mersch, Ed Mares, Michael O'Connor and Yuko Murakami.
JMD has had a series of excellent research assistants who have been helpful in copy editing and aiding with the LaTeX aspects. Monica Holland systematized the formatting and the handling of the files for the book, Andre Chapuis did most of the diagrams, Chrysafis Hartonas helped particularly with the content of Chapter 13, Steve Crowley helped add some of the last sections, and Katalin Bimbó did an outstanding job in perfecting and polishing the book. She also essentially wrote Section 8.13 and provided significant help with Section 8.3. Kata truly deserves the credit for making this a completed object rather than an incomplete process. We owe all of these our thanks. We owe thanks to Allen Hazen and Piero D'Altan, who have provided extensive comments, suggesting a range of improvements, from corrections of typos and technical points to stylistic suggestions. We also thank Yaroslav Shramko and Tatsutoshi Tatenaka for corrections. We wish to thank Greg Pavelcak and Katalin Bimbó for preparing the index. We thank Robert K. Meyer for providing a critical counter-example (cf. Section
6.9), and also for his useful interactions over the years with JMD. The "gaggle theory" in our book is a generalization of the semantics for relevance logic that he developed with Richard Routley in the early 1970s. The authors owe intellectual debts especially to G. D. Birkhoff, M. H. Stone, B. Jónsson and A. Tarski. Their work on universal algebra and representation theory permeates the present work. JMD also wants to thank his teacher N. D. Belnap for first stimulating his interest in algebraic methods in logic, and to also acknowledge the influence of P. Halmos' book, Algebraic Logic (1962). We wish to thank Richard Leigh, the copy editor for Oxford University Press, and Lisa Blake, the development editor, for their keen eyes and friendly and professional manner. We owe many last-minute "catches" to them. Despite the efforts of all of us, there are undoubtedly still typos and maybe more serious errors, for which the authors take full responsibility. Someone (Aelius Donatus) also said "Pereant, inquit, qui ante nos nostra dixerunt" (Confound those who have voiced our thoughts before us). As the book was written over a considerable period of time, thoughts which were once original with us (or at least we thought they were) have undoubtedly been expressed by others. While we have tried to reference these wherever we could, we may have missed some, and we apologize to any such authors in advance. We wish to thank the following journals and publishers for permissions. Detailed bibliographic information appears in the references at the end of this volume under the headings given below. Section numbers indicate where in this volume some version of the cited material can be found. Springer-Verlag: Dunn (1991), 12.1-12.9, 12.16. Clarendon Press: Dunn (1993a), 3.17. W. de Gruyter: Dunn (1995a), 12.10-12.15; Dunn (1993b), 3.13. Zeitschrift für Mathematische Logik und Grundlagen der Mathematik: Dunn and Meyer (1971), 11.10.
We wish to thank Indiana University and the University of Massachusetts for support for our research. In particular, JMD wishes to thank Morton Lowengrub, Dean of the College of Arts and Sciences, for his support over the years. We thank our spouses, Sarah J. ("Sally") Dunn and Katherine ("Kate") Dorfman, for their love and support. Obviously this book tries to represent a reasonable portion of the intersection of algebraic logic and philosophical logic, but still contains only a fraction of the results. Scholars who know our previous publications may find it surprising how little is devoted to relevance logic and quantum logic. We knew (between us) too much about these subjects to fit them between two covers. Another notable omission is the algebraic treatment of first-order logic, where perhaps we know too little. There are at least three main treatments for classical logic: cylindric algebras (Henkin, Tarski and Monk (1971)), polyadic algebras (Halmos (1962)), and complete lattices (Rasiowa and Sikorski (1963)), and at a rough calculation to do justice to them all we would have to multiply the length of the present book by three. We suspect that the reader applauds our decision. An overriding theme of the book is that standard algebraic-type results, e.g., representation theorems, translate into standard logic-type results, e.g., completeness theorems. A subsidiary theme, stemming from JMD's research, is to identify a class of
algebras most generally appropriate for the study of logics (both classical and non-classical), and this leads to the introduction of gaggles, distributoids, and partial gaggles and tonoids. Another important subtheme is that logic is fundamentally information based. Its main elements are propositions, which can be understood as sets of information states. This book is suitable as a textbook for graduate and even advanced undergraduate courses, while at the same time hopefully of interest to researchers. In terms of the book's target audience, we briefly considered indicating this by expanding its title to "Algebraic Methods in Philosophical Logic for Computer and Information Scientists, and maybe Linguists." We rejected this as too nakedly a marketing ploy. But the serious point behind this joke title is that we do believe that the book has results of interest to mathematicians, philosophers, computer and information scientists, and maybe linguists.

J.M.D.
G.M.H.
CONTENTS

1 Introduction

2 Universal Algebra
2.1 Introduction
2.2 Relational and Operational Structures (Algebras)
2.3 Subrelational Structures and Subalgebras
2.4 Intersection, Generators, and Induction from Generators
2.5 Homomorphisms and Isomorphisms
2.6 Congruence Relations and Quotient Algebras
2.7 Direct Products
2.8 Subdirect Products and the Fundamental Theorem of Universal Algebra
2.9 Word Algebras and Interpretations
2.10 Varieties and Equational Definability
2.11 Equational Theories
2.12 Examples of Free Algebras
2.13 Freedom and Typicality
2.14 The Existence of Free Algebras; Freedom in Varieties and Subdirect Classes
2.15 Birkhoff's Varieties Theorem
2.16 Quasi-varieties
2.17 Logic and Algebra: Algebraic Statements of Soundness and Completeness

3 Order, Lattices, and Boolean Algebras
3.1 Introduction
3.2 Partially Ordered Sets
3.3 Strict Orderings
3.4 Covering and Hasse Diagrams
3.5 Infima and Suprema
3.6 Lattices
3.7 The Lattice of Congruences
3.8 Lattices as Algebras
3.9 Ordered Algebras
3.10 Tonoids
3.11 Tonoid Varieties
3.12 Classical Complementation
3.13 Non-Classical Complementation
3.14 Classical Distribution
3.15 Non-Classical Distribution
3.16 Classical Implication
3.17 Non-Classical Implication
3.18 Filters and Ideals

4 Syntax
4.1 Introduction
4.2 The Algebra of Strings
4.3 The Algebra of Sentences
4.4 Languages as Abstract Structures: Categorial Grammar
4.5 Substitution Viewed Algebraically (Endomorphisms)
4.6 Effectivity
4.7 Enumerating Strings and Sentences

5 Semantics
5.1 Introduction
5.2 Categorial Semantics
5.3 Algebraic Semantics for Sentential Languages
5.4 Truth-Value Semantics
5.5 Possible Worlds Semantics
5.6 Logical Matrices and Logical Atlases
5.7 Interpretations and Valuations
5.8 Interpreted and Evaluationally Constrained Languages
5.9 Substitutions, Interpretations, and Valuations
5.10 Valuation Spaces
5.11 Valuations and Logic
5.12 Equivalence
5.13 Compactness
5.14 The Three-Fold Way

6 Logic
6.1 Motivational Background
6.2 The Varieties of Logical Experience
6.3 What Is (a) Logic?
6.4 Logics and Valuations
6.5 Binary Consequence in the Context of Pre-ordered Sets
6.6 Asymmetric Consequence and Valuations (Completeness)
6.7 Asymmetric Consequence in the Context of Pre-ordered Groupoids
6.8 Symmetric Consequence and Valuations (Completeness and Absoluteness)
6.9 Symmetric Consequence in the Context of Hemi-distributoids
6.10 Structural (Formal) Consequence
6.11 Lindenbaum Matrices and Compositional Semantics for Assertional Formal Logics
6.12 Lindenbaum Atlas and Compositional Semantics for Formal Asymmetric Consequence Logics
6.13 Scott Atlas and Compositional Semantics for Formal Symmetric Consequence Logics
6.14 Co-consequence as a Congruence
6.15 Formal Presentations of Logics (Axiomatizations)
6.16 Effectiveness and Logic

7 Matrices and Atlases
7.1 Matrices
7.1.1 Background
7.1.2 Łukasiewicz matrices/submatrices, isomorphisms
7.1.3 Gödel matrices/more submatrices
7.1.4 Sugihara matrices/homomorphisms
7.1.5 Direct products
7.1.6 Tautology preservation
7.1.7 Infinite matrices
7.1.8 Interpretation
7.2 Relations Among Matrices: Submatrices, Homomorphic Images, and Direct Products
7.3 Proto-preservation Theorems
7.4 Preservation Theorems
7.5 Varieties Theorem Analogs for Matrices
7.5.1 Unary assertional logics
7.5.2 Asymmetric consequence logics
7.5.3 Symmetric consequence logics
7.6 Congruences and Quotient Matrices
7.7 The Structure of Congruences
7.8 The Cancellation Property
7.9 Normal Matrices
7.10 Normal Atlases
7.11 Normal Characteristic Matrices for Consequence Logics
7.12 Matrices and Algebras
7.13 When Is a Logic "Algebraizable"?

8 Representation Theorems
8.1 Partially Ordered Sets with Implication(s)
8.1.1 Partially ordered sets
8.1.2 Implication structures
8.2 Semi-lattices
8.3 Lattices
8.4 Finite Distributive Lattices
8.5 The Problem of a General Representation for Distributive Lattices
8.6 Stone's Representation Theorem for Distributive Lattices
8.7 Boolean Algebras
8.8 Filters and Homomorphisms
8.9 Maximal Filters and Prime Filters
8.10 Stone's Representation Theorem for Boolean Algebras
8.11 Maximal Filters and Two-Valued Homomorphisms
8.12 Distributive Lattices with Operators
8.13 Lattices with Operators

9 Classical Propositional Logic
9.1 Preliminary Notions
9.2 The Equivalence of (Unital) Boolean Logic and Frege Logic
9.3 Symmetrical Entailment
9.4 Compactness Theorems for Classical Propositional Logic
9.5 A Third Logic
9.6 Axiomatic Calculi for Classical Propositional Logic
9.7 Primitive Vocabulary and Definitional Completeness
9.8 The Calculus BC
9.9 The Calculus D(BC)
9.10 Asymmetrical Sequent Calculus for Classical Propositional Logic
9.11 Fragments of Classical Propositional Logic
9.12 The Implicative Fragment of Classical Propositional Logic: Semi-Boolean Algebras
9.13 Axiomatizing the Implicative Fragment of Classical Propositional Logic
9.14 The Positive Fragment of Classical Propositional Logic

10 Modal Logic and Closure Algebras
10.1 Modal Logics
10.2 Boolean Algebras with a Normal Unitary Operator
10.3 Free Boolean Algebras with a Normal Unitary Operator and Modal Logic
10.4 The Kripke Semantics for Modal Logic
10.5 Completeness
10.6 Topological Representation of Closure Algebras
10.7 The Absolute Semantics for S5
10.8 Henle Matrices
10.9 Alternation Property for S4 and Compactness
10.10 Algebraic Decision Procedures for Modal Logic
10.11 S5 and Pretabularity

11 Intuitionistic Logic and Heyting Algebras
11.1 Intuitionistic Logic
11.2 Implicative Lattices
11.3 Heyting Algebras
11.4 Representation of Heyting Algebras using Quasi-ordered Sets
11.5 Topological Representation of Heyting Algebras
11.6 Embedding Heyting Algebras into Closure Algebras
11.7 Translation of H into S4
11.8 Alternation Property for H
11.9 Algebraic Decision Procedures for Intuitionistic Logic
11.10 LC and Pretabularity

12 Gaggles: General Galois Logics
12.1 Introduction
12.2 Residuation and Galois Connections
12.3 Definitions of Distributoid and Tonoid
12.4 Representation of Distributoids
12.5 Partially Ordered Residuated Groupoids
12.6 Definition of a Gaggle
12.7 Representation of Gaggles
12.8 Modifications for Distributoids and Gaggles with Identities and Constants
12.9 Applications
12.10 Monadic Modal Operators
12.11 Dyadic Modal Operators
12.12 Identity Elements
12.13 Representation of Positive Binary Gaggles
12.14 Implication
12.14.1 Implication in relevance logic
12.14.2 Implication in intuitionistic logic
12.14.3 Modal logic
12.15 Negation
12.15.1 The gaggle treatment of negation
12.15.2 Negation in intuitionistic logic
12.15.3 Negation in relevance logic
12.15.4 Negation in classical logic
12.16 Future Directions

13 Representations and Duality
13.1 Representations and Duality
13.2 Some Topology
13.3 Duality for Boolean Algebras
13.4 Duality for Distributive Lattices
13.5 Extensions of Stone's and Priestley's Results

References

Index
1 INTRODUCTION
The reader who is completely new to algebraic logic may find this the hardest chapter in the book, since it uses concepts that may not be adequately explained. Such a reader is advised to skim this chapter at first reading and then to read relevant parts again as appropriate concepts are mastered. In this chapter we shall recall some of the high points in the development of algebraic logic, our aim being to provide a framework of established results with which our subsequent treatment of the algebra of various logics may be compared. Although we shall chiefly be discussing the algebra of the classical propositional calculus, this discussion is intended to have a certain generality. We mean to emphasize the essential features of the relation of the classical propositional calculus to Boolean algebra, remarking from time to time what is special to this relation and what is generalizable to the algebra of other propositional calculi. It should be mentioned that we here restrict ourselves to the algebra of propositional logics, despite the fact that profound results concerning the algebra of the classical predicate calculus have been obtained by Tarski, Halmos, and others. It should also be mentioned that we are not here concerned with setting down the history of algebraic logic, and that, much as in a historical novel, historical figures will be brought in mainly for the sake of dramatic emphasis. About the middle of the nineteenth century, the two fields of abstract algebra and symbolic logic came into being. Although algebra and logic had been around for some time, abstract algebra and symbolic logic were essentially new developments. Both these fields owe their origins to the insight that formal systems may be investigated without explicit recourse to their intended interpretations. 
This insight led George Boole, in his Mathematical Analysis of Logic (1847), to formulate at one and the same time perhaps the first example of a non-numerical algebra and the first example of a symbolic logic. He observed that the operation of conjoining two propositions had certain affinities with the operation of multiplying two numbers. Boole tended also to identify propositions with classes of times, or cases, in which they are true (cf. Dipert, 1978); the conjunction of propositions thus corresponded to the intersection of classes. Besides the operation of conjunction on propositions, there are also the operations of negation (-) and disjunction (∨). Historically, Boole and his followers tended to favor exclusive disjunction (either a or b, but not both, is true), which they denoted by a + b, but modern definitions of a Boolean algebra (cf. Chapter 3) tend to feature inclusive disjunction (a and/or b are/is true), which is denoted by a ∨ b. The "material conditional" a ⊃ b can be defined as -a ∨ b. A special element 1 can be defined as a ∨ -a, and a relation of "implication" can be defined so that a ≤ b iff (a ⊃
b) = 1. (If the reader has the natural tendency to want to reverse the inequality sign on the grounds that if a implies b, a ought to be the stronger proposition, think of Boole's identification of propositions with sets of cases in which they are true. Then "a implies b" means that every case in which a is true is a case in which b is true, i.e., a ⊆ b.) Boole's algebra of logic is thus at the same time an algebra of classes, but we shall ignore this aspect of Boole's algebra in the present discussion. He saw that by letting letters like a and b stand for propositions, just as they stand for numbers in ordinary algebra, and by letting juxtaposition of letters stand for the operation of conjunction, just as it stands for multiplication in ordinary algebra, these affinities could be brought to the fore. Thus, for example, ab = ba is a law of this algebra of logic just as it is a law of ordinary algebra of numbers. At the same time, the algebra of logic has certain differences from the algebra of numbers since, for example, aa = a. The differences are just as important as the similarities, for whereas the similarities suggested a truly symbolic logic, like the "symbolic arithmetic" that comprises ordinary algebra, the differences suggested that algebraic methods could be extended far beyond the ordinary algebra of numbers. Oddly enough, despite the fact that Boole's algebra was thus connected with the origins of both abstract algebra and symbolic logic, the two fields developed for some time thereafter in comparative isolation from one another. On the one hand, the notion of a Boolean algebra was perfected by Jevons (1864), Schröder (1890-1905), Huntington (1904), and others (until it reached the modern conception used in this book), and developed as a part of the growing field of abstract algebra.
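The laws just quoted can be spot-checked mechanically. The following sketch is our illustration (not the book's): it encodes the two-element Boolean algebra {0, 1} in Python, with names of our own choosing, and verifies ab = ba, aa = a, a ∨ -a = 1, and the reading of a ≤ b as "every case where a holds is a case where b holds."

```python
# A minimal sketch (our example): the two-element Boolean algebra {0, 1}.

def neg(a):        # complement, written -a in the text
    return 1 - a

def conj(a, b):    # conjunction, written ab; behaves like multiplication
    return a * b

def disj(a, b):    # inclusive disjunction a v b
    return max(a, b)

def cond(a, b):    # material conditional a ⊃ b, defined as -a v b
    return disj(neg(a), b)

def leq(a, b):     # "implication" ordering: a <= b iff (a ⊃ b) = 1
    return cond(a, b) == 1

for a in (0, 1):
    assert conj(a, a) == a           # aa = a, unlike ordinary arithmetic
    assert disj(a, neg(a)) == 1      # a v -a = 1
    for b in (0, 1):
        assert conj(a, b) == conj(b, a)               # ab = ba
        assert leq(a, b) == (not (a == 1 and b == 0))  # fails only when a=1, b=0
```

The last assertion records the inclusion reading: a ≤ b fails exactly when a is true in a case where b is not.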
On the other hand, the notion of a symbolic logic was developed along subtly different lines from Boole's original algebraic formulation, starting with Frege (1879) and receiving its classic statement in Whitehead and Russell's Principia Mathematica (1910). The divergence of the two fields was partly a matter of attitude. Thus Boole, following in the tradition of Leibniz, wanted to study the mathematics of logic, whereas the aim of Frege, Whitehead, and Russell was to study the logic of mathematics.

((a_i^1)_{i∈I}, ..., (a_i^n)_{i∈I}) ∈ R_j^× iff for every i ∈ I, (a_i^1, ..., a_i^n) ∈ R_j^i.

(1) C = ⨯(A_i).

(2) For every j ∈ J, O_j^×[(a_i^1)_{i∈I}, ..., (a_i^n)_{i∈I}] = (O_j^i[a_i^1, ..., a_i^n])_{i∈I}.
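Clause (2) can be illustrated concretely. The sketch below is our own (not from the book): it forms the direct product of two copies of the set {0, 1} under addition mod 2 and computes the product operation coordinate by coordinate.

```python
# Illustrative sketch: a direct product with operations computed
# componentwise, as in clause (2). Each factor is {0, 1} with + mod 2.

I = [0, 1]                       # index set: two factors

def plus_mod2(a, b):             # the operation of each factor algebra
    return (a + b) % 2

def product_plus(x, y):
    # x and y are elements of the direct product, i.e. I-indexed tuples;
    # the product operation applies the factor operation at each index i
    return tuple(plus_mod2(x[i], y[i]) for i in I)

x, y = (0, 1), (1, 1)
assert product_plus(x, y) == (1, 0)   # (0+1, 1+1) computed mod 2 at each place
```

The same scheme works for any family of similar algebras: the i-th coordinate of the result depends only on the i-th coordinates of the arguments.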
There is an important special case of a direct product, called a direct power, which is equivalent (when specialized to an algebra) to what is more commonly called a function space. Let us concentrate for the moment on algebras. Suppose that the family (A_i) of algebras comprehends just one algebra, in the sense that A_i = A_j for all indices i, j; in this case, we have "I-many" repetitions of A. Then the direct product of the family (A_i) is called the I-direct power of A, much as the product of a number x with itself n times is called the nth power of x. Now, the Cartesian product of a set A with itself "I times" (where I is any set whatsoever) is simply the set of all functions from I into A. Given this set A^I of functions from I into A, and given that A is the carrier set of algebra A, we can form the associated function space, which is simply the algebra founded in the natural way on A^I. In particular, the operations acting on the functions are defined according to the corresponding operations on the elements of A. Thus, for example, in a numerical function space, the sum of two functions f and g, herein denoted [f + g], is defined according to the following rule:
(f+) [f + g](x) = f(x) + g(x).
In other words, to compute what [f + g] yields when applied to an element x, one applies f to x and g to x, and adds the results. More generally, one defines each operation O on the function space according to the following scheme:

(fO) [O(f_1, ..., f_n)](x) = O[f_1(x), ..., f_n(x)].
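The scheme (fO) is exactly the "pointwise lifting" familiar from programming. As a sketch under our own naming (not the book's notation), the following lifts any n-ary operation on A to an n-ary operation on the function space A^I:

```python
# A minimal sketch of rules (f+) and (fO): operations on functions are
# defined pointwise from operations on their values.

def lift(op):
    """Lift an n-ary operation on A to an n-ary operation on A^I."""
    def lifted(*fs):
        # the lifted operation returns a new function from I into A
        return lambda x: op(*(f(x) for f in fs))
    return lifted

# Numerical example of (f+): the pointwise sum [f + g](x) = f(x) + g(x)
f = lambda x: x * x
g = lambda x: 2 * x
add = lift(lambda a, b: a + b)
h = add(f, g)

assert h(3) == f(3) + g(3)   # 9 + 6 = 15
```

The design point is that `lift` knows nothing about the particular operation: any operation on elements induces one on functions, which is why the function space is an algebra of the same similarity type.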
By way of concluding this section, we introduce a subsidiary notion, which is employed in subsequent sections. Given any direct product ∏(A_i), and given any element k in the indexing set I, one can define the associated projection map, p_k, which maps ⨯(A_i) to A_k, according to the following rule:

(PROJ) p_k((a_i)_{i∈I}) = a_k.

In other words, p_k simply selects the kth component of each sequence in ⨯(A_i). For example, in the case of the 2-direct power of the real numbers, the projection of any point in the plane is simply the coordinate of that point along the appropriate axis.
Theorem 2.7.5 Each projection p_k is a morphism from ∏(A_i) onto A_k; in other words, each A_k is a morphic image of the direct product ∏(A_i).

Exercise 2.7.6 Prove the above theorem.

Exercise 2.7.7 Show that in the context of relational structures p_k is not necessarily strongly faithful, i.e., not necessarily a homomorphism. Show that in the context of operational structures p_k is a homomorphism.

2.8
Subdirect Products and the Fundamental Theorem of Universal Algebra
As we have seen in the previous section, we can, so to speak, multiply algebras to obtain other algebras, just as we can multiply natural numbers to obtain other natural
numbers. Recall that a natural number is called prime if it has no non-trivial factors. According to the prime factorization theorem, sometimes called the fundamental theorem of arithmetic, every number is identical to a product of prime numbers; for example, 10 = 2 x 5, and 30 = 2 x 3 x 5. Birkhoff, the father of universal algebra, proved the corresponding fundamental theorem of universal algebra (1944), which says (very roughly!) that every algebra is isomorphic to a product of prime algebras. In this section, we describe the content of this theorem. Let us start with the notion of prime algebra. Intuitively, an algebra is prime if it is not a non-trivial product of other algebras. What is a trivial product? For any similarity type of algebras, there is the trivial algebra of that type, which consists of exactly one element. Now, it is easy to show that the direct product of an algebra A with the trivial algebra is isomorphic to A. (This may be done as an exercise.) We regard such a product as trivial. In general, a product is trivial if at least one of the associated projection maps is an isomorphism, and it is non-trivial if it is not trivial (what else!). Having the notion of prime algebra under our belt, let us consider the content of the algebraic prime factorization theorem. One is naturally inclined to hope that this theorem says something like the following: (??) Every algebra is isomorphic to a direct product of prime algebras.
As nice as this may sound, it proves to be unfruitful. Consider cardinality. Every Cartesian, and hence every direct, product of finite algebras is either finite or uncountable; no such direct product is denumerable. On the other hand, many algebras are denumerable, which is a consequence of the famous Löwenheim-Skolem theorem from logic, which depends on the finitary (inductive) nature of the operations. We are accordingly forced either to accept every denumerable algebra as prime, or to devise a different notion of algebraic multiplication. Birkhoff opted for the latter choice, developing what are known as subdirect products. One way to look at a subdirect product is that it is a subalgebra S of a direct product ∏(A_i) that still preserves the relationship of Theorem 2.7.5: each component algebra A_i is a homomorphic image of S. The following is our official definition.
Definition 2.8.1 Let (A_i) be a family of similar algebras, and let B be a subalgebra of the corresponding direct product ∏(A_i). Then B is said to be a subdirect product of (A_i) if it satisfies the following condition: (1) for each index k, A_k is a homomorphic image of B under the projection map p_k (restricted to B).

B is said to be a trivial subdirect product of (A_i) if there is a k such that B is isomorphic to A_k.
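Condition (1) says each projection, restricted to B, is still onto the corresponding factor. The sketch below is our own illustration on bare sets (it ignores the requirement that B be a subalgebra, which depends on the operations; all names are ours):

```python
# Illustrative sketch of Definition 2.8.1 for bare carrier sets: B is
# "subdirect" when every projection of B is onto the matching factor.

def projection(B, k):
    """The image of B under the kth projection map p_k."""
    return {b[k] for b in B}

def is_subdirect(B, factors):
    return all(projection(B, k) == set(A) for k, A in enumerate(factors))

A0, A1 = [0, 1], [0, 1, 2]

# A proper subset of the full product A0 x A1 that still hits every
# element of each factor under projection:
B = {(0, 0), (1, 1), (1, 2)}
assert is_subdirect(B, [A0, A1])

# A subset that misses elements of a factor is not subdirect:
assert not is_subdirect({(0, 0)}, [A0, A1])
```

The example shows why subdirect products are more flexible than full direct products: B has only three elements, far fewer than the six in A0 x A1.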
Adopting this as our notion of product for the purposes of the prime factorization theorem, we now define prime algebra as follows (in the literature, the standard terminology is the mouthful "subdirectly irreducible"):
Definition 2.8.2 An algebra B is said to be prime, or subdirectly irreducible, if for any family F of similar algebras, B is a subdirect product of F only if B is a trivial subdirect product of F.
With these definitions, we can now state Birkhoff's result.

Theorem 2.8.3 (Birkhoff's prime factorization theorem) Every algebra is isomorphic to a subdirect product of prime algebras.

The proof of Birkhoff's prime factorization theorem is somewhat involved, and proceeds by a series of lemmas. We shall need Zorn's lemma, a well-known equivalent of the Axiom of Choice; but first we recall some terminology necessary for its statement. Where E is a family of sets, C is a chain of E iff (1) C is a subfamily of E and (2) for every pair of sets X, Y ∈ C, either X ⊆ Y or Y ⊆ X. The union over a family of sets C is that set ∪C containing all and only members of members of C. A set P is maximal in a family of sets E iff no member of E is a proper superset of P. Not all families of sets have maximal members, but Zorn's lemma states:
Lemma 2.8.4 (Zorn's lemma) If E is a non-empty family of sets, and if the union ∪C over every non-empty chain C of E is itself a member of E, then E has some maximal member P.
SUBDIRECT PRODUCTS
Corollary 2.8.6 Let A be a non-trivial algebra and let (θi) be an indexed family of all the congruences θi on A for which there is a pair of distinct elements a, b such that θi is maximal with respect to not relating a and b. Then ∩(θi) = E, the equality relation on A.¹

Lemma 2.8.7 If ∩(θi) = E, where each θi is a congruence on A, then A is isomorphic to a subdirect product of (A/θi).

Proof Suppose ∩(θi) = E, where each θi is a congruence relation on A. We argue that A is isomorphic to a subdirect product of (A/θi). For each A/θi let hi be the canonical homomorphism of A onto A/θi. Define the desired isomorphism h so that for a ∈ A, h(a) = ⟨hi(a)⟩. Note that our supposition amounts to this: if a ≠ b then for some θi we have not aθib, which means hi(a) ≠ hi(b) and so h(a) ≠ h(b). So h is one-one. That h preserves operations follows trivially from the componentwise definition of the operations on the direct product and from the fact that each hi, being a homomorphism, preserves the operations as they are computed at the ith component. All that remains then is to show that the homomorphic image of A under h, h*(A), is a subdirect product of the A/θi's. It is obviously a subalgebra of the direct product ∏(A/θi) because of the fact that h is an isomorphism. So we have finally to argue that each ith projection of h*(A) is onto A/θi. Consider [a]θi ∈ A/θi, and consider h(a) = ⟨hi(a)⟩. Since hi(a) = [a]θi, clearly [h(a)]i = hi(a) = [a]θi. So [a]θi appears in the ith place of some member of h*(A), as was to be proven. □
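Lemma 2.8.7 can be made concrete with a small example of our own (not from the text): on the additive algebra Z6, the congruences "mod 2" and "mod 3" intersect in the equality relation, so the map a ↦ ⟨a mod 2, a mod 3⟩ embeds Z6 as a subdirect product of Z2 and Z3. A quick Python check:

```python
# Congruences on the algebra (Z6, +): theta(k) relates a, b when a ≡ b (mod k).
Z6 = range(6)

def theta(k):
    return {(a, b) for a in Z6 for b in Z6 if a % k == b % k}

equality = {(a, a) for a in Z6}

# The intersection of theta(2) and theta(3) is the equality relation E ...
assert theta(2) & theta(3) == equality

# ... so h(a) = (a mod 2, a mod 3), the map h of Lemma 2.8.7, is one-one.
h = {a: (a % 2, a % 3) for a in Z6}
assert len(set(h.values())) == 6   # injective

# h preserves the operation +, computed componentwise in the product.
for a in Z6:
    for b in Z6:
        s = (a + b) % 6
        assert h[s] == ((h[a][0] + h[b][0]) % 2, (h[a][1] + h[b][1]) % 3)

# Each projection of the image is onto Z2 and Z3: a subdirect product.
assert {v[0] for v in h.values()} == {0, 1}
assert {v[1] for v in h.values()} == {0, 1, 2}
print("Z6 is a subdirect product of Z2 and Z3")
```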
We now develop a series of lemmas and corollaries.

Lemma 2.8.5 For a, b ∈ A, if a ≠ b then there exists a congruence θa,b on A which is maximal among all those congruences θ satisfying the condition: it is not the case that aθb.

Proof For each pair of elements a, b of A with a ≠ b, let Ca,b be the set of all congruence relations θ on A such that not aθb. Ca,b is non-empty since E ∈ Ca,b. Regard each congruence as a relation in the set-theoretical sense, i.e., a set of ordered pairs. Our reason for emphasizing that congruences are sets is so we can talk of such things as one congruence being included in another and the union of a set of congruences. We need to be able to talk this way so as to apply Zorn's lemma to find a maximal member of Ca,b. Consider, then, any chain C of Ca,b. Clearly ∪C is a reflexive and symmetric relation. ∪C is also transitive. For suppose (x, y) ∈ ∪C and (y, z) ∈ ∪C. Then there is some θ ∈ C such that (x, y) ∈ θ and some π ∈ C such that (y, z) ∈ π. But either θ ⊆ π or π ⊆ θ. Whichever is the case, both (x, y) and (y, z) are members of one of π or θ. Since both π and θ are transitive, (x, z) is a member of a member of C, i.e., (x, z) ∈ ∪C. Verification of the Replacement Property is similarly argued (but note that it depends upon the operations of the algebra being finitary). We have thus shown that ∪C is a congruence, and it is obvious that ∪C does not relate a and b since no member of C does. Thus we have in hand all the hypotheses of Zorn's lemma and conclude that Ca,b has a maximal member; let us call it θa,b. □
Lemma 2.8.8 If A is subdirectly reducible, then ∩(θi) = E, where (θi) is the set of all non-trivial congruences on A.

Proof Suppose A is subdirectly reducible and yet ∩(θi) ≠ E. Then for some a, b ∈ A, a ≠ b and yet aθib for all i ∈ I. Let A be isomorphic to S, a subalgebra of ∏Ai, where S is the subdirect product that shows A subdirectly reducible. Thus no Ai is isomorphic to A. Each pi restricted to S is a homomorphism of S (and hence indirectly of A) onto Ai, and hence determines a congruence θi. Letting the isomorphism of A onto S be h, h(a) ≠ h(b), and hence for some i, [h(a)]i ≠ [h(b)]i. Thus pi(h(a)) ≠ pi(h(b)) and so not aθib. □
Corollary 2.8.9 If A is simple, A is subdirectly irreducible.

Proof If A has at least two distinct elements a and b, then there is no non-trivial congruence θi that distinguishes a from b. So the consequent of Lemma 2.8.8 is false, and so by contraposition, A is subdirectly irreducible. In the degenerate case when A is a trivial, one-element algebra, it is easy to see that A can be represented by a subdirect product of algebras only when A is isomorphic to each of the algebras. □

¹This last is just a fancy way of saying that any pair a, b of distinct elements can be distinguished by a congruence θi, i.e., for some θi it is not the case that aθib. Incidentally, if (θi) is empty, which is never needed in our use of it, we understand ∩(θi) to be the universal relation on A, A × A.
Sublemma 2.8.10 Let θ be a congruence on A. Let π be a congruence on A/θ, and define a(θπ)b to mean [a]θ π [b]θ. Then θπ, so defined, is a congruence on A. The proof of the sublemma is left as an exercise.
Lemma 2.8.11 Let θa,b be as in Lemma 2.8.5. Then A/θa,b is subdirectly irreducible.

Proof Case 1. A/θa,b is simple. By Corollary 2.8.9, A/θa,b is subdirectly irreducible.

Case 2. A/θa,b is not simple. Let (πi) be the set of all non-trivial congruences on A/θa,b. By Sublemma 2.8.10 it will follow that [a]θa,b πi [b]θa,b for all πi. To see this, first notice that for arbitrary x, y ∈ A, if xθa,b y then [x]θa,b = [y]θa,b, and so [x]θa,b πi [y]θa,b, i.e., x(θa,bπi)y. Thus θa,b ⊆ (θa,bπi). But further it follows from the non-triviality of πi that θa,b ≠ (θa,bπi); otherwise πi = E. But since θa,b was maximal with respect to not making a and b congruent, we have [a]θa,b πi [b]θa,b as promised. But then ∩(πi) ≠ E, and so by Lemma 2.8.8, A/θa,b cannot be subdirectly reducible. □
We now at last prove Birkhoff's Theorem 2.8.3 proper. Let A be an arbitrary algebra. We may suppose A is non-trivial, for if it has only one element, then it is obviously simple, and hence by Corollary 2.8.9, A is itself subdirectly irreducible. Plugging Corollary 2.8.6 into Lemma 2.8.7, we conclude A is isomorphic to a subdirect product of an indexed family of algebras (A/θi), where each θi is as in Lemma 2.8.5. But by Lemma 2.8.11, each A/θi is itself subdirectly irreducible.

The following is a sometimes useful summary of an important part of the reasoning used in showing Birkhoff's prime factorization theorem. It can of course be rephrased with congruences in place of homomorphisms, and quotient algebras in place of homomorphic images.
Theorem 2.8.12 Let K be a similarity class of algebras and let A (not necessarily in K) be a similar algebra. Suppose for each a, b ∈ A with a ≠ b there exists an algebra A′ ∈ K and a homomorphism h of A onto A′ such that h(a) ≠ h(b). Let H be the class of all such homomorphisms. Then A is isomorphic to a subdirect product of the direct product ∏h∈H h(A).
We define f(a) = ⟨h(a)⟩h∈H. We start by showing that f is one-one and preserves operations. The first is obvious, for if a ≠ b, then there exists h such that h(a) ≠ h(b), and so ⟨h(a)⟩h∈H and ⟨h(b)⟩h∈H differ at the "hth place." As for preserving operations, let us illustrate this with an arbitrary binary operation * (the extension to operations of all degrees being then clear). The following calculation depends upon the fact that both h and the angle brackets can be moved "inside," which is just a visual way of expressing that h is a homomorphism and operations in the direct product are defined componentwise.

f(a * b) = ⟨h(a * b)⟩h∈H = ⟨h(a) * h(b)⟩h∈H = ⟨h(a)⟩h∈H * ⟨h(b)⟩h∈H = f(a) * f(b).
All that remains is to show that the image of A under f is a subdirect product. We know that it is a subalgebra of the direct product, and so all that really remains is to show that the projections are all onto their appropriate algebras in K. The hth projection is the function that assigns to each ⟨h(a)⟩h∈H the element h(a) in the algebra A′ ∈ K. Since h is onto, this means that for every a′ ∈ A′ there is some a ∈ A such that h(a) = a′. Thus every element a′ ∈ A′ shows up in the "hth place" as a component of ⟨h(a)⟩h∈H. □
2.9 Word Algebras and Interpretations
In many of the remaining sections of this chapter, we employ concepts borrowed from the metatheory of first-order logic. So we take this opportunity to describe very briefly some of these concepts, leaving the detailed presentation until a later chapter. One of the things that connects algebra and metalogic is the notion of word algebra, which is crucial both to universal algebra and to algebraic logic. At least two examples of word algebras arise in logic: the algebra of terms of a first-order language, and the algebra of sentences of a zero-order language. In the construction of the class of terms of a first-order language, one begins with a class V of variable symbols, a class O of operation symbols (where O and V are disjoint), and a function d that assigns a natural number to each operation symbol. For each operation symbol S, d(S) is the syntactic degree of S, which pertains to the rules of term formation, given below. Given the class SYM (= V ∪ O) of symbols, one constructs the associated class of all finite sequences of symbols, and using a set of syntactic rules, one identifies those sequences that are well-formed; in the case of first-order languages, these are called terms. This process applies equally to the formation of sentences in a zero-order language, as in sentential logic. In this case, the variable symbols correspond to sentential variables, the operation symbols correspond to connectives, or sentential operators, and the well-formed strings correspond to sentences. Both first-order terms and zero-order sentences are concrete instances of the abstract algebraic concept of words. The terminology is natural: if the symbols correspond to letters of the alphabet, then the well-formed sequences of letters correspond to words. Before formally defining algebraic words, we describe a subsidiary notion, that of an operational syntax.
Definition 2.9.1 An operational syntax is a system (V, (Oi), (di)), where V is any non-empty set, (Oi) is any non-repeating non-empty family disjoint from V, and (di) is any family (with the same indexing set) of natural numbers.
Definition 2.9.2 Let SYN = (V, O, d) be an operational syntax, as defined above. Then the set of symbols of SYN is V ∪ O, and the set of strings of SYN is the set of all finite sequences of symbols of SYN. The set of words on SYN is a subset of the strings of SYN, denoted W(SYN), inductively defined as follows:
(1) Any sequence consisting solely of a variable symbol is a word (i.e., is an element of W(SYN)).
(2) If w1, ..., wk are words, and if Oi is a k-place operation symbol (i.e., di = k), then the sequence obtained by juxtaposing Oi, w1, ..., wk, in that order, is also a word.
(3) Nothing else is a word.
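Rules (1)-(3) can be turned into a recognizer for words. In this illustrative sketch (the particular syntax with symbols P, N and the function names are our own, not the book's), a string parses as a word exactly when one complete word is read and no symbols remain:

```python
# A small operational syntax, chosen for illustration:
# variables V = {x, y, z}; operation symbols with syntactic degrees d.
V = {"x", "y", "z"}
d = {"P": 2, "N": 1}   # P binary, N unary

def parse(symbols, i=0):
    """Try to read one word starting at position i; return the position
    just past it, or None if no word starts there (rules (1)-(2))."""
    if i >= len(symbols):
        return None
    s = symbols[i]
    if s in V:                 # rule (1): a variable alone is a word
        return i + 1
    if s in d:                 # rule (2): O followed by d(O) words
        j = i + 1
        for _ in range(d[s]):
            j = parse(symbols, j)
            if j is None:
                return None
        return j
    return None

def is_word(symbols):
    return parse(symbols) == len(symbols)

print(is_word(list("PxNy")))   # True:  P applied to x and Ny
print(is_word(list("PxN")))    # False: N lacks its argument
```

The recursion mirrors the inductive definition: each operation symbol consumes exactly its syntactic degree's worth of subsequent words.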
Given the set of words on a syntax SYN, we can construct an associated algebra, called the algebra of words on SYN. Specifically, for each operation symbol O, of syntactic degree n, we define a corresponding operation O*, of algebraic degree n, in the natural way, as follows: (0)
O*(w1, ..., wn) = Ow1 ··· wn.
The right-hand expression denotes the string of symbols that results by juxtaposing O with w1, ..., wn, in that order. In other words, to compute the result of applying the algebraic operation O* to a sequence of words, one first juxtaposes those words, and then prefixes the corresponding operation symbol O. Thus, the algebraic operation O* serves as the mathematical portrayal of the syntactic action of prefixing the operation symbol O in front of an appropriate number of strings. As noted earlier, the algebraic notion of word encompasses both the terms of a first-order language and the sentences of a zero-order language. Accordingly, the notion of word algebra has two concrete instances: the algebra of terms of a first-order language, and the algebra of sentences of a zero-order language. In the remainder of this section, we concentrate on algebras of terms. Every system of words has a type, given by the family (di), and hence every algebra of words also has a type, likewise given by (di). We can accordingly consider homomorphisms from a given algebra of words into similar algebras. In the case of the algebra of terms of a first-order language, the homomorphisms correspond to the interpretations of model theory, which we now briefly describe. Consider a pure functional first-order language L, that is, a first-order language whose only predicate is identity. An interpretation structure for L is simply an algebra A similar to the algebra of terms of L. An assignment is any function that assigns an element of A to every variable. On the other hand, an interpretation is a (but not just any) function that assigns an element of A to every term; in order for a function to qualify as an interpretation, it must respect the correspondence between the syntactic operations, on the one hand, and the algebraic operations, on the other.
For example, consider a language, L, with only one (two-place) operation symbol P, and consider an interpretation structure (i.e., algebra), A, consisting of a single two-place operation, +, which in this example is intended to be the meaning of the operation symbol P. To say that a function I respects the correspondence between the symbol P and its intended meaning, addition, is simply to say the following: for any terms s and t, if I(s) = x and I(t) = y, then I(Pst) = x + y; equivalently, I(Pst) = I(s) + I(t). But recall that the operation P* on the algebra of terms is defined so that P*(s, t) = Pst, so we can rewrite this condition thus: I[P*(s, t)] = I(s) + I(t). This, of course, is simply the condition that I is a homomorphism from the algebra of terms into the algebra A.
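The condition I(Pst) = I(s) + I(t) determines I once the assignment to the variables is fixed. The following Python sketch of the text's example is ours (the prefix-term evaluator and the names ops and interpret are illustrative assumptions, not the book's notation):

```python
# A language with one binary operation symbol P, interpreted in the
# algebra (integers, +). The interpretation I is the homomorphic
# extension of an assignment to the variables.
ops = {"P": (2, lambda a, b: a + b)}   # symbol -> (degree, meaning)

def interpret(term, assignment, i=0):
    """Evaluate the prefix term starting at position i; return (value, next)."""
    s = term[i]
    if s in ops:
        degree, fn = ops[s]
        args, j = [], i + 1
        for _ in range(degree):
            v, j = interpret(term, assignment, j)
            args.append(v)
        return fn(*args), j          # I(Pst) = I(s) + I(t)
    return assignment[s], i + 1      # a variable gets its assigned value

I = lambda t: interpret(list(t), {"x": 2, "y": 3})[0]
print(I("Pxy"))    # 5:  I(Pxy) = I(x) + I(y)
print(I("PPxyx"))  # 7:  (2 + 3) + 2
```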
More generally, given any system of terms of a pure functional first-order language, an interpretation of that language is a homomorphism from the associated algebra of terms into a similar algebra. Every interpretation I gives rise to a unique assignment, which is simply the restriction of I to the class V of variables. The other direction also holds, although it is not trivial: every assignment function on A can be extended to a unique interpretation. The proof of this important, though unglamorous, result depends upon proving that the terms of a first-order language (more generally, the words of an operational syntax) decompose uniquely (cf. Section 4.3). For first-order languages this result was first proven by Church (1956).

Lemma 2.9.3 (Isolation lemma) Let τ = τ(x1, ..., xn) be a term containing no variables other than those displayed. Let I and I′ be two interpretations that agree on each of x1, ..., xn. Then I[τ(x1, ..., xn)] = I′[τ(x1, ..., xn)].

Proof (By induction on the complexity of τ(x1, ..., xn).)

(i) Base case. Let τ(x1, ..., xn) be atomic. Then it is of the form Oix1 ... xn, and

I[Oix1 ... xn] = Oi(Ix1, ..., Ixn) = Oi(I′x1, ..., I′xn) = I′[Oix1 ... xn].
(ii) Inductive case. We suppose the lemma true for terms τ1, ..., τn and show that the lemma is preserved under the construction of τ = Oi(τ1, ..., τn). We first observe that each of the terms τj can be relabelled as τj(x1, ..., xn). The requirement on this last notation was only that all variables of the term are included among x1, ..., xn, and so we can just pool all of the variables occurring in τ(x1, ..., xn), thus having a uniform list of variables for each ingredient term in τ. By inductive hypothesis, for 1 ≤ j ≤ n:

I[τj(x1, ..., xn)] = I′[τj(x1, ..., xn)],

and so:

I[Oi(τ1, ..., τn)] = Oi(Iτ1, ..., Iτn) = Oi(I′τ1, ..., I′τn) = I′[Oi(τ1, ..., τn)]. □
We introduce some quite standard notations which implicitly rely on the fact that in evaluating a formula we have no need to look outside that formula (as the isolation lemma tells us). We write τ(x1, ..., xn) to mean a term containing no variables other than x1, ..., xn. When a1, ..., an are elements of an algebra A, it is natural to use a substitution notation τ(a1, ..., an) to indicate that we are thereby restricting the interpretations of the term to those where each displayed variable xi has been assigned the element ai. Note that we are not necessarily assuming that each ai is denoted by a constant in our language (though adding a distinct constant for each distinct element would be another way to go). The notation τ(a1, ..., an) is neither syntactic fish nor semantic fowl, but a mixture of both. We shall also from time to time employ a related notation I(a1, ..., an/x1, ..., xn) to mean an interpretation that is exactly like the interpretation I except that it assigns ai to xi (1 ≤ i ≤ n). Another notation that is useful is τA(a1, ..., an) = [I(a1, ..., an/x1, ..., xn)](τ). Note that this is a semantic notion in
the sense that it computes an element in the algebra using the functions matching the operation symbols.

2.10 Varieties and Equational Definability
The symbolic machinery of first-order logic includes a two-place predicate symbol for identity, or equality, herein denoted E (though we shall quickly come also to use the standard =, both for the object language and metalanguage, context differentiating them). Thus, the atomic formulas of a first-order language include in their ranks formulas involving this predicate; these are called, quite naturally, equations. Standard model theory treats E as a logical predicate, which is to say it assigns to E a fixed interpretation, in particular, the relation of (numerical) identity. What this means may be described as follows. As remarked in the previous section, in the special case of a pure functional language L, an interpretation is simply a homomorphism from the algebra of terms of L into a similar algebra. Standard model theory also involves the concept of satisfaction, which may be regarded as a relation between interpretations and formulas. For the purposes of the remaining sections of this chapter, we need only be concerned with pure functional languages, and we need only be concerned with equations. For this special class of formulas, satisfaction is simple to define: I satisfies Est if and only if I(s) = I(t). We can then derivatively talk of satisfaction by an assignment, meaning satisfaction by the interpretation that it determines. Given the concept of satisfaction for equations, we can define a number of derivative notions. To say that an algebra A (of the appropriate type) satisfies Est is to say that every interpretation into A satisfies Est, and to say that a class K of similar algebras satisfies a class Q of equations is to say that every algebra in K satisfies every equation in Q. We also say that A is a model of Q when A satisfies Q. With these pieces of terminology, we can now define equational class.
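For a finite algebra, satisfaction of an equation can be checked by brute force over assignments; by the isolation lemma, only the variables actually occurring in the equation matter. A sketch under our own choice of algebra (the three-element groupoid below is hypothetical, not from the text):

```python
from itertools import product

# Hypothetical finite algebra on {0, 1, 2} with one binary operation:
# x * y = (x + y) mod 3.
A = [0, 1, 2]
star = lambda x, y: (x + y) % 3

# A satisfies an equation iff every assignment to its variables does.
def satisfies(eq, variables):
    return all(eq(dict(zip(variables, vals)))
               for vals in product(A, repeat=len(variables)))

commutative = lambda v: star(v["x"], v["y"]) == star(v["y"], v["x"])
idempotent  = lambda v: star(v["x"], v["x"]) == v["x"]

print(satisfies(commutative, ["x", "y"]))  # True
print(satisfies(idempotent,  ["x"]))       # False: 1 * 1 = 2
```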
Definition 2.10.1 A class K of similar algebras is said to be an equational class, or to be equationally definable, if there is a set Q of equations such that K is precisely the class of all models of Q.

Since it refers to linguistic entities (terms, formulas) in addition to algebraic entities, the notion of equational class is model-theoretic, and not purely algebraic. In his Varieties Theorem, Birkhoff (1935) showed that the notion of equational class is coextensive with a purely algebraic notion, namely, the notion of variety, which is defined as follows.

Definition 2.10.2 A class K of similar algebras is said to be a variety if it satisfies the following conditions:

(S) If A ∈ K, and B is a subalgebra of A, then B ∈ K.
(H) If A ∈ K, and B is a homomorphic image of A, then B ∈ K.
(P) If Ai ∈ K, for all i ∈ I, then ∏(Ai) ∈ K.
In other words, a variety is a class of similar algebras that is closed under the formation of (S) subalgebras, (H) homomorphic images, and (P) direct products. Note that, in virtue of (S) and (P), every variety is automatically closed under the formation of subdirect products, and in virtue of the homomorphism theorem, every variety is closed under the formation of quotient algebras. Birkhoff's varieties theorem may be stated as follows.

Theorem 2.10.3 (The varieties theorem) (1) Every equational class is a variety. (2) Every variety is equationally definable.

In the remainder of this section, we prove the first half of this theorem, leaving the second (harder!) half until we develop additional machinery.
Proof. We verify each of the conditions (S), (H), and (P) in Definition 2.10.2.

(S) Proving the contrapositive, if B is a subalgebra of A and fails to satisfy s = t, then there is some interpretation I of the terms in B so that I(s) ≠ I(t). But I is a fortiori an interpretation in A.

(H) Again contrapositively, if B is a homomorphic image of A and fails to satisfy s = t, then there is an interpretation I in B such that I(s) ≠ I(t). Let h be the given homomorphism. Define the interpretation I′ in A so that, for every variable x, I′(x) ∈ h⁻¹(I(x)). Note that I(x) may in general have many "pre-images" in A, and which one you choose is absolutely arbitrary; hence technically you have to use the Axiom of Choice. It may be proven by an easy induction on the length of terms that for every term u, simple or complex, I′(u) ∈ h⁻¹(I(u)). It follows that I′(s) ≠ I′(t), for otherwise h(I′(s)) = h(I′(t)) and thus I(s) = I(t), contrary to our assumption.

(P) Let ∏(Ai) be a direct product of algebras each of which satisfies s = t, but suppose it itself does not. Let I be an interpretation in ∏(Ai) such that I(s) ≠ I(t). Letting I(s) = ⟨ai⟩ and I(t) = ⟨bi⟩, then for some i ∈ I, ai ≠ bi. Let pi be the ith projection homomorphism. Define Ii(u) = pi(I(u)) on all terms u. This is verified to be an interpretation by an induction on terms, and obviously Ii(s) = ai ≠ bi = Ii(t). □

Exercise 2.10.4 The reader can provide any missing details in the above proof, e.g., the inductions.

Exercise 2.10.5 For each of conditions (S), (H), and (P) the reader can give an example of a kind of postulate, satisfaction of which would not be preserved by the corresponding way of forming new algebras from old. (Hint: For (S) think of existential sentences, for (H) think of non-identities, and for (P) think of disjunctions.)
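Condition (P) of the proof can be watched in miniature: build a direct product with componentwise operations and confirm that an equation holding in each factor holds in the product, since a violation in the product would project to a violation in some factor. The two factors below are our own toy examples, not from the text:

```python
from itertools import product as cartesian

# Two commutative groupoids, chosen for illustration.
A1 = [0, 1]
op1 = lambda x, y: (x + y) % 2
A2 = [0, 1, 2]
op2 = lambda x, y: (x * y) % 3

# Their direct product: carrier is the Cartesian product, and the
# operation is computed componentwise, as in condition (P).
P = list(cartesian(A1, A2))
op = lambda a, b: (op1(a[0], b[0]), op2(a[1], b[1]))

# Both factors satisfy x * y = y * x, and so does the product.
assert all(op1(x, y) == op1(y, x) for x in A1 for y in A1)
assert all(op2(x, y) == op2(y, x) for x in A2 for y in A2)
assert all(op(a, b) == op(b, a) for a in P for b in P)
print("commutativity is preserved by the direct product")
```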
2.11 Equational Theories

The question arises about how to formalize the notion of an equational class definable by a set of equations Q. This may seem a strange question, since the cheap way is to take as the set of axioms

(3) x > y iff y < x; (4) x ≥ y iff x > y or x = y.

3.4 Covering and Hasse Diagrams
Unlike many mathematical structures, partially ordered sets (at least the finite ones) can be graphically depicted in a natural way. The technique is that of Hasse diagrams, which is based on the notion of covering. Before formally defining the notion of cover, we discuss a number of subsidiary notions, including a general technique for constructing pre-orderings and strict orderings.
Definition 3.4.1 Let R be any relation. Then the transitive closure of R, R⁺, is defined to be the smallest transitive relation that includes R. Alternatively, R⁺ is defined so that xR⁺y iff at least one of the following conditions obtains:

(1) xRy, or
(2) there is a finite sequence of elements c1, c2, ..., cn such that xRc1, c1Rc2, ..., cnRy.

One can show that R⁺, so defined, always exists. First define the field of R, fld(R) = {x : ∃y(xRy or yRx)}. Then take the intersection of all the transitive relations on the field of R that include R, this latter set being non-empty since it contains at least fld(R) × fld(R).

Definition 3.4.2 Let R be any relation. Then the transitive reflexive closure of R, herein denoted R*, is defined to be the smallest transitive and reflexive relation that includes R. Alternatively, R* is defined so that xR*y iff at least one of the following conditions obtains:

(1) xRy, or
(2) x = y, or
(3) there is a finite sequence of elements c1, c2, ..., cn such that xRc1, c1Rc2, ..., cnRy.

Definition 3.4.3 Let A be any set, and let R be any relation on A. The pre-ordering generated by R, denoted R*, is defined to be the transitive reflexive closure of R.

Just as the transitive closure of a relation always exists, so does the pre-ordering generated by a relation. Thus, every relation gives rise to an associated ordering/pre-ordering. One is naturally led to ask what added conditions ensure that the resulting relation is a partial/strict ordering. This leads to the following definition.

Definition 3.4.4 Let R be any relation. Then R is said to be regular if it satisfies the following infinite series of conditions:

(r1) If aRb, then a ≠ b.
(r2) If aRb, and bRc, then a ≠ c.
(r3) If aRb, and bRc, and cRd, then a ≠ d, etc.

Alternatively stated, R is regular if its transitive closure is irreflexive. (The reader may show this as an exercise.) Intuitively, regular relations admit no closed loops² (e.g., aRb & bRa; aRb & bRc & cRa; etc.). Familiar examples of regular relations include the membership relation of set theory (in virtue of the axiom of regularity), and the relation of strict inclusion. With the notion of regularity, one can prove the following theorem.

Theorem 3.4.5 Let R be any regular relation. Then the transitive closure of R is a strict ordering, and the transitive reflexive closure of R is a partial ordering.

Exercise 3.4.6 Prove Theorem 3.4.5.

²But they do not exclude infinite chains, unlike well-founded relations.
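For finite relations these definitions are directly computable. In this sketch (function names ours), the transitive closure is obtained by adding composites until a fixed point is reached, and regularity is tested via the irreflexivity of the closure, as in the alternative characterization above:

```python
def transitive_closure(R):
    """Smallest transitive relation including R (cf. Definition 3.4.1),
    computed by repeatedly adding composites until nothing changes."""
    closure = set(R)
    while True:
        new = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new <= closure:
            return closure
        closure |= new

def is_regular(R):
    """R is regular iff its transitive closure is irreflexive."""
    return all(x != y for (x, y) in transitive_closure(R))

covering = {(1, 2), (2, 3)}           # 1 below 2 below 3
print(is_regular(covering))           # True: generates a strict ordering
print(is_regular({(1, 2), (2, 1)}))   # False: a closed loop
```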
ORDER, LATTICES, AND BOOLEAN ALGEBRAS
Before continuing, we introduce one further notion, intransitivity. This is the polar opposite of transitivity; it is considerably stronger than mere non-transitivity. In particular, by an intransitive relation, we mean a relation R satisfying the following condition:
(IT) If aRb and bRc, then not (aRc).

Now, every regular relation, even if it is intransitive, generates a partial ordering. Every partial ordering is generated by a regular relation (let the generating relation be the counterpart strict ordering). A more interesting question is whether every partial ordering is generated by an intransitive regular relation. In general, the answer is negative (see Exercise 3.4.9). However, if we restrict our attention to finite posets, then the answer is affirmative. In order to see how this works, we present the following definition.
Definition 3.4.7 Let (A,:s;) be a poset, and let a, b be elements of A. Then b is said to cover a if the following conditions are satisfied: (1) a < b.
(2) There is no x such that a < x and x < b.
In other words, b covers a if and only if b is above a in the ordering, and furthermore, no element lies between them in the ordering. For example, in the numerical ordering of the natural numbers, 2 covers 1. On the other hand, in the numerical ordering of the rational numbers the covers relation is uninstantiated; no rational number covers any rational number, since the set of rationals is dense (there is a rational number between any two distinct rational numbers). One can show that the covering relation is regular and intransitive, so we can consider the partial (strict) ordering generated by the covering relation. In the case of a finite partially (strictly) ordered set, but not in general, the partial (strict) ordering generated by the covering relation is the original partial (strict) ordering.
Exercise 3.4.8 Verify the claims in the preceding paragraph. Exercise 3.4.9 Show that the usual partial ordering on the rational numbers is not generated by an intransitive regular relation. With the notion of cover, we can describe precisely what a Hasse diagram is. A Hasse diagram is a graphical depiction of the covering of a partially (strictly) ordered set. The representational convention is straightforward: one uses points (or other tokens, like name tokens) to represent the elements, and one connects two points to indicate that the corresponding covering relation holds; in particular, in order to indicate that a covers b, one connects "a" and "b" in such a way that "a" is north of "b" in the diagram. One then reads off the diagram by noting that the strict ordering is the transitive closure, and the partial ordering is the transitive reflexive closure, of the covering relation. Figure 3.1 contains some examples of Hasse diagrams. The first diagram depicts the poset consisting of integers 1, 2, 3, ordered by the usual numerical ordering. The second depicts the poset consisting of the subsets of a two-element set {a, b}, ordered by set inclusion. The third depicts the poset consisting of all divisors of 12, ordered by the integral division relation discussed earlier.
FIG. 3.1. Examples of Hasse diagrams

As noted in the previous section, ordering is a double-sided notion in at least two senses. This "double duality" is reflected in the method of Hasse diagrams. First, a Hasse diagram is impartial between the partial ordering and the strict ordering it depicts; whether we regard a Hasse diagram as portraying one rather than the other depends only upon whether we take the relation depicted to be reflexive or irreflexive. Thus, the first principle of duality between strict and partial orderings is graphically represented by the impartiality of Hasse diagrams. The second form of duality (which is literally "duality" in the accepted mathematical jargon) has an equally concrete graphical representation. Specifically, the (second) principle of duality amounts to the principle that every Hasse diagram can be turned upside down, and the result is a Hasse diagram, not of the original ordering, but of its converse (dual).
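As a sketch of diagram (H3) in code (variable names ours), the covering relation for the divisors of 12 can be computed straight from Definition 3.4.7:

```python
# The divisors of 12 ordered by integral division; b covers a when
# a < b and no x lies strictly between them (Definition 3.4.7).
divisors = [n for n in range(1, 13) if 12 % n == 0]   # [1, 2, 3, 4, 6, 12]
lt = lambda a, b: a != b and b % a == 0               # strict ordering a < b

covers = [(a, b) for a in divisors for b in divisors
          if lt(a, b) and not any(lt(a, x) and lt(x, b) for x in divisors)]
print(covers)
# [(1, 2), (1, 3), (2, 4), (2, 6), (3, 6), (4, 12), (6, 12)]
```

These seven pairs are exactly the edges one would draw in the Hasse diagram; the strict ordering is recovered as their transitive closure.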
3.5 Infima and Suprema

Thus far in this chapter, we have discussed only one logical concept, implication, which we have treated as a two-place relation among sentences (propositions). Logic seeks to characterize valid reasoning, and so the concept of implication is central to logic. However, the fundamental strategy employed by formal logic is to analyze language in terms of its syntactic structure, and, in part at least, this involves analysis in terms of a few privileged logical connectives, most notably, "and," "or," "not," and "if-then." We are accordingly interested in the mathematical description of these concepts, and their relation to implication, as we have described it above. In the next few sections, we concentrate on "and" and "or"; later in the chapter, we concentrate on "not" and "if-then." Although "and" and "or" are probably best understood as anadic connectives in English ("anadic" means having no fixed degree), in formal logic they are typically treated as dyadic connectives. In lattice theory, they can be treated either way, but let us start with the dyadic treatment. What are the general properties of conjunction? Let us suppose that "and" corresponds to a propositional operation, so whenever x and y are
propositions, so is x-and-y. Without saying too much about the nature of propositions and propositional conjunction, we can at least say the following:

(C1) x-and-y implies x; x-and-y implies y.
(C2) If w implies x, and w implies y, then w implies x-and-y.

Note the exact formal parallel in set theory:

(S1) X ∩ Y ⊆ X; X ∩ Y ⊆ Y.
(S2) If W ⊆ X and W ⊆ Y, then W ⊆ X ∩ Y.

Now, just as we can talk about the intersection of a collection of sets, we can talk about the conjunction of a collection X of propositions. This leads to the natural generalization of (C1) and (C2):

(C1*) The conjunction of X implies each x in X.
(C2*) If w implies every x in X, then w implies the conjunction of X.

These also have exact parallels in set theory, the statements of which we leave as an exercise. Next, we describe similar principles for disjunction, which is dual to conjunction and parallel to set union:

(D1) x implies x-or-y; y implies x-or-y.
(D2) If x implies z, and y implies z, then x-or-y implies z.
(D1*) Each x in X implies the disjunction of X.
(D2*) If every x in X implies y, then the disjunction of X implies y.

Definition 3.5.1 Let (A, ≤) be a poset, let S be a subset of A, and let a be an element of A. Then a is said to be an upper bound of S if the following condition is satisfied:

(ub) For all s in S, a ≥ s.

In other words, a is an element of A that is larger than or equal to every element of S. Notice that, in principle, a set S may have any number of upper bounds, including none. The set of upper bounds of S is denoted ub(S). The dual notion of lower bound is defined in a natural way.

Definition 3.5.2 Let (A, ≤) be a poset, let S be a subset of A, and let a be an element of A. Then a is said to be a lower bound of S if the following condition is satisfied:

(lb) For all s in S, a ≤ s.

In other words, a is an element of A that is smaller than or equal to every element of S. As with upper bounds, a set can have any number of lower bounds. The set of lower bounds of S is denoted lb(S).

When we set S = A in the above definitions, we obtain two useful specializations. First of all, A does not have just any number of upper bounds (lower bounds); it has exactly one, or it has none. For suppose that p, q are both upper bounds of A. Then p ≥ x for all x in A, and q ≥ x for all x in A; but p, q are in A, so in particular, p ≥ q and q ≥ p, whence p = q by anti-symmetry. One shows in a similar manner that lower bounds are unique. Applying the concept of lower and upper bound to the entire poset yields the following definitions.

Definition 3.5.3 A poset is said to be upper bounded (or bounded above) if it has at least one (and hence exactly one) upper bound.

Definition 3.5.4 A poset is said to be lower bounded (or bounded below) if it has at least one (and hence exactly one) lower bound.

Definition 3.5.5 A poset is said to be bounded if it is both upper bounded and lower bounded.

It is customary to use the symbol "1" to refer to the upper bound of a poset (supposing it exists), and to use the symbol "0" to refer to the lower bound of a poset (supposing it exists). Thus, in particular, in a bounded poset, every element lies between the zero element, 0, and the unit element, 1. As we saw above, a poset has at most one upper bound and at most one lower bound. This generalizes to all the subsets of a given poset, since every subset of a poset is a poset in its own right. (Bear in mind that a poset is a relational structure, not an algebra.) This leads to the notions of least element and greatest element, which are defined as follows.
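In a finite poset these definitions can be checked by brute force. A minimal Python sketch (the helper names ub and lb are ours, mirroring the text's ub(S) and lb(S); the example poset is the divisors of 12 under divisibility):

```python
# A finite poset given as a set of elements plus the order as a predicate.

def ub(A, leq, S):
    """Set of upper bounds of S within A: all a with s <= a for every s in S."""
    return {a for a in A if all(leq(s, a) for s in S)}

def lb(A, leq, S):
    """Set of lower bounds of S within A: all a with a <= s for every s in S."""
    return {a for a in A if all(leq(a, s) for s in S)}

# Example: divisors of 12 ordered by divisibility.
A = {1, 2, 3, 4, 6, 12}
leq = lambda x, y: y % x == 0

print(ub(A, leq, {2, 3}))   # upper bounds of {2, 3}: {6, 12}
print(lb(A, leq, {4, 6}))   # lower bounds of {4, 6}: {1, 2}
print(ub(A, leq, A), lb(A, leq, A))   # the poset is bounded: {12} {1}
```

Taking S = A in the last line recovers the specialization in the text: the whole poset has exactly one upper bound (12) and one lower bound (1), so it is bounded.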
Definition 3.5.6 Let (A, ≤) be a poset, and let S be any subset of A. The greatest element of S is defined to be the unique element of S, denoted g(S), which (if it exists) satisfies the following conditions:

(1) g(S) ∈ S.
(2) For all s in S, g(S) ≥ s.

In other words, g(S) is an element of S that is also an upper bound of S. If there is any such element, then it is unique. On the other hand, not every subset S has a greatest element, which is to say that the term "g(S)" need not refer to anything. A succinct mathematical formulation of this idea is that S ∩ ub(S) is either empty or has exactly one element. A weaker notion is that of a maximal element of S: an element m of S such that there is no x ∈ S with the property that x > m. Clearly g(S) is maximal, but not necessarily vice versa. The dual notion of least element is defined in the obvious dual way.
Definition 3.5.7 Let (A, ≤) be a poset, and let S be any subset of A. The least element of S is defined to be the unique element of S, denoted l(S), which (if it exists) satisfies the following conditions:

(1) l(S) ∈ S.
(2) For all s in S, l(S) ≤ s.

In other words, l(S) is an element of S that is also a lower bound of S. Once again, l(S) need not exist, but if it does, it is unique. Mathematically speaking, S ∩ lb(S) is either empty or contains exactly one element. Again, minimal element can be defined dually.
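The same brute-force style computes greatest, least, and maximal elements; g, l, and maximal below are hypothetical helpers named after the text's g(S) and l(S), again using divisibility as the order:

```python
# Greatest and least elements of S (Definitions 3.5.6-3.5.7), plus maximal
# elements. leq is the partial order predicate.

def g(S, leq):
    """Greatest element of S, or None if S has none."""
    for a in S:
        if all(leq(s, a) for s in S):
            return a
    return None

def l(S, leq):
    """Least element of S, or None if S has none."""
    for a in S:
        if all(leq(a, s) for s in S):
            return a
    return None

def maximal(S, leq):
    """Maximal elements of S: no strictly larger element in S."""
    return {m for m in S if not any(leq(m, x) and x != m for x in S)}

leq = lambda x, y: y % x == 0              # divisibility again
print(g({1, 2, 4}, leq))                   # 4: a chain has a greatest element
print(g({2, 3}, leq), maximal({2, 3}, leq))  # None {2, 3}
```

The last line illustrates the remark in the text: {2, 3} has no greatest element, yet both 2 and 3 are maximal.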
Combining the notions of least and upper bound yields the notion of least upper bound, and combining the notions of greatest and lower bound yields the notion of greatest lower bound. The idea is quite simple. The set ub(S) of upper bounds of a set S may or may not be empty; if it is not empty, then ub(S) may or may not have a least element. If ub(S) does have a least element (necessarily unique), then that element is called the least upper bound of S, and is denoted lub(S). In a completely parallel manner, the set lb(S) of lower bounds of S may or may not be empty, and if it is not empty, then it may or may not have a greatest element. But if lb(S) does have a greatest element, then that element is called the greatest lower bound of S, and is denoted glb(S). In spite of the perfectly compositional character of the expressions "least upper bound" and "greatest lower bound," a certain amount of confusion seems to surround these ideas, probably because they involve, so to speak, going in two directions at the same time, so that it may not be clear where one is when the process is completed. For this reason, and for reasons of succinctness, alternative terminology is often adopted. Specifically, the greatest lower bound of S is often called the infimum of S (which is clearly below S), and the least upper bound is often called the supremum of S (which is clearly above S). We will use these terms pretty much interchangeably, although we lean toward the less verbose "infimum" and "supremum."

Exercise 3.5.8 Given a bounded poset, show that lub(∅) = 0, and glb(∅) = 1. (Recall 0 and 1 are the least and greatest elements, respectively.)
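Least upper bounds and greatest lower bounds are then one more search; lub and glb below are our names, and the last line illustrates Exercise 3.5.8 on the divisors-of-12 poset (a bounded poset whose 0-element is 1 and whose 1-element is 12):

```python
# lub(S) = least element of ub(S); glb(S) = greatest element of lb(S).

def lub(A, leq, S):
    bounds = {a for a in A if all(leq(s, a) for s in S)}   # ub(S)
    for b in bounds:
        if all(leq(b, c) for c in bounds):                 # least of ub(S)
            return b
    return None

def glb(A, leq, S):
    bounds = {a for a in A if all(leq(a, s) for s in S)}   # lb(S)
    for b in bounds:
        if all(leq(c, b) for c in bounds):                 # greatest of lb(S)
            return b
    return None

A = {1, 2, 3, 4, 6, 12}
leq = lambda x, y: y % x == 0
print(lub(A, leq, {2, 3}), glb(A, leq, {4, 6}))  # 6 2
# Exercise 3.5.8: lub of the empty set is the 0-element, glb the 1-element.
print(lub(A, leq, set()), glb(A, leq, set()))    # 1 12
```

For the empty set every element is vacuously an upper bound, so lub(∅) is the least element of the whole poset, exactly as the exercise claims.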
Proposition 3.5.9 Let (A, ≤) be a poset, and let (A′, ≤′) be a "subposet" in the sense that A′ ⊆ A and ≤′ is just ≤ restricted to A′. Let S ⊆ A′ have an infimum i in (A, ≤). If i ∈ A′, then i is also an infimum of S in (A′, ≤′) (and similarly for suprema).

Proof It should be reasonably intuitive that if an element i of A′ is the greatest lower bound of S computed in A, then it remains a greatest lower bound when attention is restricted to the subposet A′. For i is still a lower bound of S, and any lower bound of S belonging to A′ is in particular a lower bound of S in A, and hence lies below i. □

Corollary 3.5.10 Let (A′, ≤′) be a subposet of (A, ≤) as in Proposition 3.5.9. If every S ⊆ A has an infimum (supremum) in A, and the infima (suprema) of subsets of A′ always belong to A′, then every S′ ⊆ A′ has an infimum (supremum) in (A′, ≤′), namely the same one.

We conclude this section by discussing an important example of a poset where infima and suprema always exist. Such posets are called lattices and will be more formally introduced in the next section. Let ℘(X) be the "power set" of the set X, i.e., the set of all subsets of X. Then it is easy to see that this forms a lattice in the following way.

Proposition 3.5.11 Given Y, Z ∈ ℘(X), the infimum of {Y, Z} is the intersection of Y and Z, Y ∩ Z = {x : x ∈ Y and x ∈ Z}. The supremum of {Y, Z} is the union of Y and Z, Y ∪ Z = {x : x ∈ Y or x ∈ Z}.

Proof Proposition 3.5.11 is actually a special case of Proposition 3.5.12 to be found below. □
Proposition 3.5.11 can be generalized from binary intersections and unions to arbitrary intersections and unions:

∩C = {x : ∀Y ∈ C, x ∈ Y},    ∪C = {x : ∃Y ∈ C, x ∈ Y}.

The following actually includes Proposition 3.5.11 as a special case, since we can define Y ∩ Z = ∩{Y, Z} and Y ∪ Z = ∪{Y, Z}.

Proposition 3.5.12 Given a non-empty C ⊆ ℘(X), the infimum of C is the intersection of C, ∩C, and the supremum of C is the union of C, ∪C.

Proof We show first that ∩C is a lower bound of the sets in C, i.e., ∀Y ∈ C, ∩C ⊆ Y. But this follows by the definition of ∩C. We next must show that ∩C is the greatest among lower bounds of the sets in C. Let B be another lower bound. We must show that B ⊆ ∩C. Since B is a lower bound, ∀Y ∈ C, B ⊆ Y. This means that any member of B is also a member of every Y ∈ C. But this means that B ⊆ ∩C. The claim concerning the supremum is proven symmetrically, as the reader may confirm. □

Corollary 3.5.13 Let C be any collection of sets closed under arbitrary intersections and unions. Then C forms a complete lattice, with inclusion as the partial order, intersection the infimum, and union the supremum.

Proof This follows from Proposition 3.5.12 using Proposition 3.5.9. □
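Proposition 3.5.12 can be checked concretely on a small power set; this sketch (helper names ours) searches ℘({0, 1, 2}) for greatest lower and least upper bounds under inclusion and compares them with intersection and union:

```python
from itertools import chain, combinations

# The power set of {0, 1, 2}, as frozensets; <= on frozensets is inclusion.
X = {0, 1, 2}
power = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]

def glb(C):
    lower = [a for a in power if all(a <= y for y in C)]      # lb(C)
    return next(a for a in lower if all(b <= a for b in lower))

def lub(C):
    upper = [a for a in power if all(y <= a for y in C)]      # ub(C)
    return next(a for a in upper if all(a <= b for b in upper))

C = [frozenset({0, 1}), frozenset({1, 2})]
print(glb(C) == frozenset({1}))        # infimum is the intersection: True
print(lub(C) == frozenset({0, 1, 2}))  # supremum is the union: True
```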
As a special case we have:

Corollary 3.5.14 Let C be any collection of sets closed under binary intersections and unions. Then C forms a lattice, with binary intersection and union as infimum and supremum.

Especially in older literature, a collection C satisfying the conditions of the last corollary is called a "ring of sets," and one satisfying the conditions of the first corollary is called a "complete ring of sets." Even though not every infimum is an intersection, and not every supremum is a union, nonetheless the notions of infimum and supremum are natural generalizations of the notions of (infinitary) intersection and union. In the next section, we examine notions that are the abstract counterparts of finite intersection and union.

3.6 Lattices
As noted in the previous section, the infimum (supremum) of a subset S of a poset P is the greatest (least) element of P that is below (above) every element of S. As defined, the notions of infimum and supremum apply to all subsets, both finite and infinite. In the present section, we discuss these notions as they apply specifically to finite non-empty subsets of P. As we saw in the previous section, such structures are called "lattices." Some standard references include: Balbes and Dwinger (1974), Gericke (1963), Rutherford (1965), and Szász (1963). As noted below, infima (suprema) of finite non-empty sets reduce to infima (suprema) of doubleton sets, so we begin with these. A doubleton set is a set expressible (abstractly) by something of the form {s, t}, where s, t are terms; accordingly, in spite of its name, a doubleton may in fact have only one element (if s = t); in any case, a doubleton has at least one element, and at most two elements.
When the set S is a doubleton {a, b}, the infimum of S is denoted a ∧ b, and the supremum is denoted a ∨ b. This is in complete analogy with set theory, where the infimum of a pair A, B of sets is denoted A ∩ B, and the supremum of A, B is denoted A ∪ B. It is, furthermore, customary to call a ∧ b the meet of a and b, and to call a ∨ b the join of a and b; thus, we may read "a ∧ b" as "a meet b" and "a ∨ b" as "a join b". Sometimes, the infimum (supremum) of an infinite set S may be called the meet (join) of S. We shall, however, reserve these terms for the finite case. Indeed, we tend to reserve the terms "meet" and "join" for the specifically algebraic characterization of lattices (see Section 3.8), whereas we use the terms "infimum" and "supremum" for the characterization in terms of partially ordered sets. As noted in the previous section, the infimum (supremum) of a set S need not exist, irrespective of cardinality. Issues concerning the existence of infima and suprema lead to the following series of definitions.
Definition 3.6.1 Let P = (A, ≤) be a poset. Then P is said to be a meet-semi-lattice (MSL) if every pair a, b of elements of A has an infimum (meet) in A.

Definition 3.6.2 Let P = (A, ≤) be a poset. Then P is said to be a join-semi-lattice (JSL) if every pair a, b of elements of A has a supremum (join) in A.

Definition 3.6.3 Let P = (A, ≤) be a poset. Then P is said to be a lattice if P is both an MSL and a JSL.

The following can be shown (see below) to be an equivalent definition.

Definition 3.6.4 Let P = (A, ≤) be a poset. Then P is said to be a lattice if every non-empty finite subset S of A has both an infimum and a supremum in A.

By taking the latter as our official definition, we are naturally led to propose further, stronger notions, the following being the most commonly used.

Definition 3.6.5 Let P = (A, ≤) be a poset. Then P is said to be a sigma-complete lattice if every non-empty countable subset S of A has both an infimum and a supremum in A.

Definition 3.6.6 Let P = (A, ≤) be a poset. Then P is said to be a complete lattice if every non-empty subset S of A has both an infimum and a supremum in A.

Thus, as a simple matter of definition, every complete lattice is a sigma-complete lattice, and every sigma-complete lattice is a lattice. The following is a list of theorems, some obvious, some less obvious. In each case, a more or less developed suggestion is appended by which it is hoped that the reader can provide a proof.

Theorem 3.6.7 Every finite lattice is complete, and hence sigma-complete.
Hint: Every subset of a finite set is finite.

Theorem 3.6.8 Not every lattice is sigma-complete, and hence not every lattice is complete.
Hint: Consider the set of integers ordered in the usual way.

Theorem 3.6.9 Not every sigma-complete lattice is complete.
Hint: Consider the set of countable subsets of an uncountable set (e.g., the real numbers).

Theorem 3.6.10 Every complete lattice is bounded.
Hint: Consider the infimum and supremum of the whole set.

Theorem 3.6.11 Not every sigma-complete lattice is bounded.
Hint: Consider the set of countable subsets of an uncountable set.

Theorem 3.6.12 Not every lattice is bounded.
Hint: Consider the set of integers.

Theorem 3.6.13 Let P = (A, ≤) be a poset. Suppose that every doubleton {a, b} in A has an infimum in A. Then P is an MSL.
Hint: Proof by induction, where the induction formula is "every subset of A of size n has an infimum".

Theorem 3.6.14 Let P = (A, ≤) be a poset. Suppose that every doubleton {a, b} in A has a supremum in A. Then P is a JSL.
Hint: Proof by induction.

Theorem 3.6.15 The dual of an MSL is a JSL; conversely, the dual of a JSL is an MSL.
Hint: Show that a is the infimum (supremum) of S in P iff a is the supremum (infimum) of S in P^op, where P^op is (A, ≥) iff P is (A, ≤).

Theorem 3.6.16 The dual of a lattice (a sigma-complete lattice, a complete lattice) is also a lattice (a sigma-complete lattice, a complete lattice).
Hint: See preceding hint.

Theorem 3.6.17 Not every JSL is a lattice.
Hint: Consider the Hasse diagram in Figure 3.2.

FIG. 3.2. Theorem 3.6.17 (the element a sits above the two incomparable elements b and c)
Theorem 3.6.18 Let P = (A, ≤) be a poset. Suppose that every subset S of A (including the empty set) has a supremum in A. Then P is a complete lattice.
Hint: First notice that the supremum of the empty set must be the least element of A. Define the infimum of a set to be the supremum of its lower bounds.
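The induction behind Theorems 3.6.13 and 3.6.14 amounts to folding the binary meet (join) across a finite set. A sketch, using positive integers under divisibility, where the binary meet is gcd and the binary join is lcm (our choice of example):

```python
from functools import reduce
from math import gcd

def meet_all(S):
    """Infimum of a finite non-empty set, by folding the binary meet (gcd)."""
    return reduce(gcd, S)

def join_all(S):
    """Supremum of a finite non-empty set, by folding the binary join (lcm)."""
    return reduce(lambda a, b: a * b // gcd(a, b), S)

print(meet_all({4, 6, 12}))   # 2
print(join_all({4, 6, 12}))   # 12
```

Associativity and commutativity of the binary operation are what make the fold order-independent, which is exactly why the inductive step in the hint goes through.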
3.7 The Lattice of Congruences
Given a collection Y of binary relations on a set X, by definition they form a complete lattice iff for any Z ⊆ Y both the greatest lower bound ⋀Z and the least upper bound ⋁Z are members of Y (these bounds taken relative to ⊆ on Y). Under favorable conditions ⋀ is just intersection.

Lemma 3.7.1 If ∩Z ∈ Y, then ⋀Z = ∩Z.

Proof This follows from Proposition 3.5.12 together with Proposition 3.5.9. □
This makes it easy to "compute" the meet. Many natural collections Y of binary relations satisfy the antecedent of the lemma for all Z ⊆ Y. For example, the class of all equivalence relations, or the class of all congruences, is closed under arbitrary intersections. The details are provided through the following:

Lemma 3.7.2 The set E(A) of all equivalence relations on some algebra A is closed under arbitrary (including infinite) intersections. The same holds for the set C(A) of all congruences on A.

Exercise 3.7.3 Prove the above lemma.

Computing ⋁Z is more difficult. It is very rare that this is the union, since if there are any closure conditions on the classes of relations (e.g., an equivalence relation requires symmetry and transitivity), the least upper bound is larger than the mere union. We must expand the union using the closure conditions. The symmetric and transitive closure of ∪Z is the intersection of all sets Z′ ⊇ ∪Z such that if (a, b) ∈ Z′ then (b, a) ∈ Z′, and if (a, b) ∈ Z′ and (b, c) ∈ Z′, then (a, c) ∈ Z′.
Lemma 3.7.4 Let C(A) be the set of all congruences on the algebra A, and let Z ⊆ C(A). Then ⋁Z is the symmetric and transitive closure of ∪Z.

Proof The symmetric and transitive closure of ∪Z is clearly the smallest equivalence relation including all of the relations in Z, being reflexive since each of the relations in Z is reflexive. The replacement property is perhaps not quite so obvious. Suppose (a, b) ∈ ⋁Z. If a and b are "directly congruent" in the sense that for some congruence θ ∈ Z, (a, b) ∈ θ, then obviously the replacement property holds. Otherwise a and b are "indirectly congruent" in the sense that there exist c₁, …, cₙ ∈ A and θ₀, θ₁, …, θₙ ∈ Z such that:

a θ₀ c₁ & c₁ θ₁ c₂ & … & cₙ₋₁ θₙ₋₁ cₙ & cₙ θₙ b.

Since each θᵢ is a congruence, the replacement can now take place a step at a time (for a unary operation f, say):

f(a) θ₀ f(c₁) & f(c₁) θ₁ f(c₂) & … & f(cₙ₋₁) θₙ₋₁ f(cₙ) & f(cₙ) θₙ f(b).

And so ⟨f(a), f(b)⟩ ∈ ⋁Z, as required. □
Note that the lemma can be strengthened to require just the transitive closure, i.e., the intersection of all sets Z′ ⊇ ∪Z such that if (a, b) ∈ Z′ and (b, c) ∈ Z′, then (a, c) ∈ Z′:

Corollary 3.7.5 The set C(A) of all congruences on the algebra A forms a complete lattice, where for Z ⊆ C(A), (a) ⋀Z = ∩Z, and (b) ⋁Z is the transitive closure of ∪Z.
Proof The corollary is an immediate consequence of the following: □

Fact 3.7.6 If Z is a set of symmetric relations, then the transitive closure of its union, TransCl(∪Z), is also symmetric.

Proof If (a, b) ∈ TransCl(∪Z), then it is easy to see that there exist x₁, …, xₖ, xₖ₊₁, …, xₙ and relations ρ₁, …, ρₙ₊₁ ∈ Z such that

a ρ₁ x₁ & … & xₖ ρₖ₊₁ xₖ₊₁ & … & xₙ ρₙ₊₁ b.

But each of ρ₁, …, ρₙ₊₁ is symmetric, and so we have

b ρₙ₊₁ xₙ & … & xₖ₊₁ ρₖ₊₁ xₖ & … & x₁ ρ₁ a,

i.e., (b, a) ∈ TransCl(∪Z). □

Since the class of congruences C(A) forms a complete lattice, it is clear that it must have a smallest and a largest congruence. It is easy to see that the smallest congruence is =_A (the identity relation restricted to A). The largest congruence is of course the universal relation A × A.
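Corollary 3.7.5 can be illustrated in miniature for equivalence relations: the join of two of them is the transitive closure of their union, and, per Fact 3.7.6, symmetry survives the closure. A sketch (helper names ours):

```python
def trans_cl(R):
    """Transitive closure of a set of pairs, by repeated composition."""
    R = set(R)
    while True:
        extra = {(a, d) for (a, b) in R for (c, d) in R if b == c} - R
        if not extra:
            return R
        R |= extra

def eq_rel(blocks):
    """Equivalence relation (as a set of pairs) from a partition."""
    return {(a, b) for B in blocks for a in B for b in B}

th1 = eq_rel([{1, 2}, {3}, {4}])
th2 = eq_rel([{1}, {2, 3}, {4}])
join = trans_cl(th1 | th2)              # 1, 2, 3 fuse into one block
print((1, 3) in join, (1, 4) in join)   # True False
```

Note that (1, 3) enters only through the closure (via (1, 2) and (2, 3)), which is why the mere union would not be the join; and (3, 1) is also present, confirming that no separate symmetric step was needed.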
3.8 Lattices as Algebras
Thus far, we have characterized lattices as special sorts of partially ordered sets, which are relational structures. However, as mentioned at the beginning of this chapter, lattices can also be characterized as algebras, which is the focus of this section. We start by noting that every semi-lattice induces an associated simple binary algebra. In the case of a meet-semi-lattice, the binary operation, called meet, is defined as one would expect:

(M) a ∧ b = inf{a, b}.

Similarly, in the case of a join-semi-lattice, the binary operation, called join, is also defined as one would expect:

(J) a ∨ b = sup{a, b}.

In an MSL (JSL), the infimum (supremum) of any pair of elements exists, so these operations are well-defined. Next, we note that if the poset in question is in fact a lattice, then there is an associated algebra of type (2,2), where the operations, called meet and join, are defined by (M) and (J), respectively.
Finally, in this connection, we note that if the poset in question is bounded above, we can add a zero-place operation, denoted 1, to the associated algebra, and if the poset is bounded below, we can add a zero-place operation, denoted 0, to the associated algebra. Thus, for example, a bounded lattice gives rise to an algebra of type (2,2,0,0).

Every lattice gives rise to an associated algebra. What about the other direction: what sort of algebra gives rise to a lattice? We begin by answering a smaller question, concerning the algebraic description of semi-lattices, which is the topic of the following definition.

Definition 3.8.1 Let A be an algebra of type (2), where the sole operation is *. Then A is said to be a semi-lattice algebra if it satisfies the following equations:

(s1) a * (b * c) = (a * b) * c (associativity);
(s2) a * b = b * a (commutativity);
(s3) a * a = a (idempotence).

One might naturally be interested in the relation between semi-lattice algebras, on the one hand, and MSLs and JSLs, on the other. One's curiosity is satisfied, it is hoped, by the four theorems that follow. The reader can easily provide the proofs.

Theorem 3.8.2 Let (A, ≤) be an MSL. Define a binary operation * so that a * b = inf{a, b}. Then the resulting structure (A, *) is a semi-lattice algebra.

Theorem 3.8.3 Let (A, ≤) be a JSL. Define a binary operation * so that a * b = sup{a, b}. Then the resulting structure (A, *) is a semi-lattice algebra.

These theorems assert that every MSL (JSL) generates an associated semi-lattice algebra. More interestingly perhaps, every semi-lattice algebra generates both an MSL and a JSL, which is formally stated in the following theorems.

Theorem 3.8.4 Let (A, *) be a semi-lattice algebra. Define a binary relation ≤ as follows:

(O1) a ≤ b iff a * b = a.

Then the resulting relational structure (A, ≤) is an MSL, where inf{a, b} = a * b.

Theorem 3.8.5 Let (A, *) be a semi-lattice algebra. Define a binary relation ≤ as follows:

(O2) a ≤ b iff a * b = b.

Then the resulting relational structure (A, ≤) is a JSL, where sup{a, b} = a * b.

Exercise 3.8.6 Prove the four preceding theorems.

Notice that the ordering relation defined by (O1) is the converse of the order relation defined by (O2). Thus, the MSL and JSL mentioned in the previous theorems are duals of one another. We sketch the proof that ≤, as defined by (O1), is transitive. Suppose that a ≤ b and b ≤ c; we show that a ≤ c. By (O1), a * b = a and b * c = b, so by substitution of b * c for b in a * b = a we have a * (b * c) = a, which by associativity yields (a * b) * c = a; but a * b = a, so we have a * c = a, which by (O1) means that a ≤ c, which was to be shown.

Thus we see that semi-lattices can be characterized by a set of equations, specifically (s1)-(s3), which is to say that semi-lattices form a variety. Next, we consider whether lattices can be equationally characterized. First, every lattice is both an MSL and a JSL, so at a minimum, we need two copies of (s1)-(s3), one for meet, one for join:

(L1) a ∧ (b ∧ c) = (a ∧ b) ∧ c;
(L2) a ∧ b = b ∧ a;
(L3) a ∧ a = a;
(L4) a ∨ (b ∨ c) = (a ∨ b) ∨ c;
(L5) a ∨ b = b ∨ a;
(L6) a ∨ a = a.

But a lattice is not merely a pair of semi-lattices. In addition, the semi-lattices are linked by a common partial order relation, so that a ≤ b iff a ∧ b = a, and a ≤ b iff a ∨ b = b. So our set of equations must ensure that a ∧ b = a iff a ∨ b = b. We could simply add this biconditional as an axiom, but it is not an equation, and we are looking for an equational characterization. The customary equations are the following two:

(L7) a ∧ (a ∨ b) = a;
(L8) a ∨ (a ∧ b) = a.

With these equations listed, we present the following formal definition and attendant theorems.

Definition 3.8.7 Let A be an algebra of type (2,2). Then A is said to be a lattice algebra if it satisfies equations (L1)-(L8).

Theorem 3.8.8 Let (A, ≤) be a lattice. Define two binary operations, ∧ and ∨, so that a ∧ b = inf{a, b}, and a ∨ b = sup{a, b}. Then the resulting algebra (A, ∧, ∨) is a lattice algebra.

Theorem 3.8.9 Let (A, ∧, ∨) be a lattice algebra. Define a relation ≤ so that a ≤ b ⇔ a ∧ b = a. Then the relational structure (A, ≤) is a lattice, and in particular, for every pair a, b in A, inf{a, b} = a ∧ b, and sup{a, b} = a ∨ b.

Exercise 3.8.10 The proofs of these theorems are left as exercises.
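Theorem 3.8.4 can be sanity-checked on a concrete semi-lattice algebra; here we take * to be min on a small set of integers (our choice of example), verify (s1)-(s3), recover ≤ via (O1), and confirm that a * b is the infimum of {a, b}:

```python
# A semi-lattice algebra: * is min on {1, 2, 3, 4}.
A = {1, 2, 3, 4}
star = min

# (s1)-(s3) hold for min:
assert all(star(a, star(b, c)) == star(star(a, b), c)
           for a in A for b in A for c in A)                 # associativity
assert all(star(a, b) == star(b, a) for a in A for b in A)   # commutativity
assert all(star(a, a) == a for a in A)                       # idempotence

leq = lambda a, b: star(a, b) == a   # (O1)

# a * b is a lower bound of {a, b} and lies above every lower bound:
for a in A:
    for b in A:
        m = star(a, b)
        assert leq(m, a) and leq(m, b)
        assert all(leq(w, m) for w in A if leq(w, a) and leq(w, b))

print(leq(2, 3), leq(3, 2))   # True False: (O1) recovers the usual order
```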
As a further detail, we note that bounded lattices (semi-lattices) can be algebraically characterized using the following equations:

(B1) 0 ∧ a = 0;
(B2) 1 ∧ a = a;
(B3) 0 ∨ a = a;
(B4) 1 ∨ a = 1.

In particular, (B1) gives us an MSL bounded below, (B2) gives us an MSL bounded above, (B3) gives us a JSL bounded below, and (B4) gives us a JSL bounded above. (These claims may be verified as exercises.) Thus, for example, by adding (B1)-(B4)
to (L1)-(L8), we obtain a set of equations for bounded lattices. (This too is left as an exercise.)

Having seen that lattice algebras are coextensive with lattices regarded as relational structures, we adopt the standard convention of using the term "lattice" ambiguously, to refer both to lattices as posets and to lattices as algebras. Alternatively, we can be regarded as using the term "lattice" to refer to mixed structures that have the lattice operations as well as an explicit partial order relation. This practice will seldom, if ever, cause any difficulty.

3.9 Ordered Algebras
Various people have defined ordered algebras as structures (A, ≤, (oᵢ)ᵢ∈I) where ≤ is a partial order on A that is isotonic in each argument place of each of the operations oᵢ:

a ≤ b ⇒ oᵢ(x₁, …, a, …, xₙ) ≤ oᵢ(x₁, …, b, …, xₙ).
Bloom (1976) and Fuchs (1963) are two classic sources regarding ordered algebras. Birkhoff (1948) is an early discussion, focusing on the case where the partial order forms a lattice, and contains (fn 1, p. 200) historical information about seminal work by Dedekind, Ward, Dilworth, and Certaine. Birkhoff (1967) has more information about using the general partial order. Wechler (1992) is an excellent recent source.

Example 3.9.1 Given any algebra A = (A, (oᵢ)ᵢ∈I), it can be considered an ordered algebra (A, =_A, (oᵢ)ᵢ∈I), where "≤" is just =_A (the identity relation restricted to A). This is called the discrete ordered algebra.
Thus the class of commutative semi-groups is not truly equational." We should surely respond that the axioms regarding equality are special, and are assumed as the background. It is then still an interesting question which algebraic structures can be axiomatized by adding simply equations to these background axioms. We can take the same attitude towards ordered algebras, and say that the following implicational axioms regarding ≤ are assumed as coming for free, along with reflexivity (a ≤ a), as part of the background:

a ≤ b & b ≤ a ⇒ a = b
a ≤ b & b ≤ c ⇒ a ≤ c
a ≤ b ⇒ a ∘ x ≤ b ∘ x
a ≤ b ⇒ x ∘ a ≤ x ∘ b.
With this understanding, the class of lattices is "inequationally definable":

Exercise 3.9.5 Show that a lattice may be defined as an ordered algebra (L, ≤, ∧, ∨) satisfying the inequations a ∧ b ≤ a, a ∧ b ≤ b, a ≤ a ∧ a, a ≤ a ∨ b, b ≤ a ∨ b, a ∨ a ≤ a.

One can formalize the inequational logic (IL) with the following axiom and rules:

x ≤ x;    from x ≤ y and y ≤ z, infer x ≤ z;    from x ≤ y and y ≤ x, infer x = y.
→: (−, +)    ◊: (+)    □: (+)    ¬: (−)

Example 3.10.3 Let (A, ≤, ∘, →, ←) be a residuated partially ordered groupoid (cf. Section 3.17). Then the tonic types are as follows:

∘: (+, +)    →: (−, +)    ←: (+, −).

Remark 3.10.4 Residuated partially ordered groupoids are commonly studied as ordered algebras (see Fuchs 1963), but this is really a kind of a "cheat" if the residuals are indeed "first-class citizens."

Example 3.10.5 Any ordered algebra is trivially a tonoid (with the distribution type for each operation being a string of plus signs).⁴

⁴There is a further requirement that they preserve or co-preserve (invert) some bound, but we omit this here as simply complicating the discussion.
Example 3.10.8 The set Q⁺ of positive rational numbers, with multiplication (a × b) and division (a/b), is a tonoid. The relation a ≤ b is understood as "a (integrally) divides b without remainder," i.e., there exists a natural number n such that a × n = b. Multiplication has tonic type (+, +) and division has tonic type (+, −): isotonic in the dividend and antitonic in the divisor.
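Example 3.10.8 can be spot-checked on concrete rationals; the sketch below encodes the divisibility order and tests the tonic behavior of multiplication and division on a few values (chosen arbitrarily):

```python
# a <= b iff b/a is a positive integer (a divides b integrally).
from fractions import Fraction as F

def leq(a, b):
    q = b / a
    return q.denominator == 1 and q.numerator >= 1

a, a2, b, b2, x = F(2), F(6), F(3), F(12), F(5)
assert leq(a, a2) and leq(b, b2)                    # 2 | 6 and 3 | 12
assert leq(a * x, a2 * x) and leq(x * b, x * b2)    # x: tonic type (+, +)
assert leq(a / b2, a / b)                           # / antitonic in divisor
assert leq(a / b, a2 / b)                           # / isotonic in dividend
print("tonicity checks passed")
```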
One can formalize the tonoid inequational logic (TIL) with the same axioms and rules as for IL, with the obvious exception that the rule expressing isotonicity is replaced with the following rule (given that the tonic type of oᵢ is (σ₁, …, σₘ, …, σₙ), and that ≤± is ≤ if σₘ = +, and ≥ if σₘ = −):

from xₘ ≤± yₘ, infer oᵢ(x₁, …, xₘ, …, xₙ) ≤ oᵢ(x₁, …, yₘ, …, xₙ).

We leave to the reader the straightforward task of proving that the set of axioms and rules for TIL is sound and complete. As we have seen from the examples above, tonoids arise very naturally in looking at various logical connectives, particularly negation and implication.

Definition 3.10.9 An implication tonoid is a tonoid (A, ≤, →), where → is a binary operation on A whose tonic type is (−, +).
Implication tonoids are a start on the road to axiomatizing various fragments of various substructural logics, including the implicational fragment of the relevance logic R, by adding various inequations, as we shall show in Section 3.17. In addition to inequations, we can form quasi-inequations: inferences that have one or more inequations as premises, and an inequation as the conclusion, for example:

(RfP) a ≤ b → c ⇒ b ≤ a → c    (rule-form permutation),
which does not hold for all implication tonoids, since it is false in the implication tonoid described by the following table (assign a, b, c the values 1, ½, ½, respectively).

T:
→   | 1    ½    0
1   | 1+   0    0
½   | 1+   1+   0
0   | 1+   1+   1+
Note that the plus indicates when a ≤ b holds, so this table does double duty, showing implication both as a "metalinguistic" relation and as an "object language" operation. Note that it is easy to visually check whether a table defines an implication tonoid.
Implications are isotonic in their consequent positions, so the values must decrease (more precisely, not increase) as one moves along a row from left to right, since the consequents decrease. Implications are antitonic in their antecedent positions, so the values must increase (more precisely, not decrease) as one moves along a column from top to bottom, since the antecedents decrease. The reader can easily see that 0 ≤ ½ ≤ 1, and so we have a linear order. The motivation of this definition is that a true implication is one where we have a ≤ b, and a false implication is one where we have a ≰ b, and we "skip the middle person" by taking truth to be the top element 1, and falsity to be the bottom element 0. This is a kind of strict, or necessary, conditional, of the sort associated with the modal logic S5. We noted above (see Example 3.9.9) that quasi-inequations are not preserved under homomorphism.
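The table T and the failure of (RfP) can be checked mechanically; this sketch encodes the three-element tonoid, performs the "visual" row/column monotonicity check, and exhibits the counterexample with a = 1 and b = c = ½:

```python
# The three-element implication tonoid: 0 <= 1/2 <= 1, with
# a -> b = 1 if a <= b, else 0 (an S5-like strict conditional).
from fractions import Fraction

HALF = Fraction(1, 2)
rank = {0: 0, HALF: 1, 1: 2}
leq = lambda a, b: rank[a] <= rank[b]
imp = lambda a, b: 1 if leq(a, b) else 0

# Visual check: antitonic in the antecedent, isotonic in the consequent.
for y in [1, HALF, 0]:
    for big, small in [(1, HALF), (HALF, 0)]:      # big >= small
        assert leq(imp(big, y), imp(small, y))     # antecedent shrinks, value grows
        assert leq(imp(y, small), imp(y, big))     # consequent grows, value grows

# Rule-form permutation fails: a <= b -> c, but not b <= a -> c.
a, b, c = 1, HALF, HALF
print(leq(a, imp(b, c)), leq(b, imp(a, c)))   # True False
```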
However, notice that permutation is really an instance of contraposing in the sense of Section 12.6. Moreover, (RfP) assumes that the two operations which are contrapositives of each other are the same. This is a rather strong assumption, although it is not rare in real examples. The simplest such example is negation. Recall from Section 3.13 that (m1), which is a quasi-inequation stating contraposition for a single negation, implies (m2) and (m3), another form of contraposition and half of the double negation law. We generalize these observations in the following fact, stating that contrapositive quasi-inequations are preserved under (tonoid) homomorphism.
Problem 3.10.10 Prove a "varieties theorem" for "quasi-inequationally definable" tonoids (those that can be axiomatized by implications from a finite number of inequations to an inequation) similar to Theorem 3.11.1 for inequationally definable tonoids, below. We conjecture that one somehow modifies the conditions for inequationally definable tonoids by replacing preservation under homomorphic ordered images with some more subtle requirement about preservation under images.
One particularly interesting quasi-inequation is rule-form permutation (RfP). This is interesting because it distinguishes those substructural logics which potentially have two implications (distinct left and right residuals, e.g., the Lambek calculus) from those that have just one single implication (the residuals collapse to each other), e.g., linear logic, relevance logic, BCK logic, intuitionistic logic. It is interesting to note that in the context of tonoids, this is in fact inequationally axiomatizable using the single inequation:

x ≤ (x → y) → y   (assertion).
The following derivation shows that assertion implies rule-form permutation:

1. x ≤ y → z (assumption)
2. (y → z) → z ≤ x → z (1, suffixing)
3. y ≤ (y → z) → z (assertion)
4. y ≤ x → z (3, 2, transitivity).
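The derivation can be sanity-checked on a concrete finite model. The sketch below uses the Gödel implication on a three-element chain, which is one familiar residuated example; the model and names are ours, not the book's.

```python
# Check assertion and rule-form permutation for the Goedel implication
# x -> y = (1 if x <= y else y) on the chain 0 < 1/2 < 1.
from fractions import Fraction
from itertools import product

chain = [Fraction(0), Fraction(1, 2), Fraction(1)]

def imp(x, y):
    """Goedel implication on the chain."""
    return Fraction(1) if x <= y else y

# Assertion: x <= (x -> y) -> y, for all x, y.
assertion = all(x <= imp(imp(x, y), y) for x, y in product(chain, chain))

# Rule-form permutation: if x <= y -> z, then y <= x -> z.
rfp = all(y <= imp(x, z)
          for x, y, z in product(chain, repeat=3)
          if x <= imp(y, z))

print(assertion, rfp)  # -> True True
```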
Conversely, assertion follows from x → y ≤ x → y by rule-form permutation, so assertion and rule-form permutation are in fact equivalent in the context of tonoids.⁵ This phenomenon is quite general for tonoids; (RfP) is a special, though undoubtedly interesting, case. To analyze the problem, first notice that despite the fact that tonoids might have antitonic operations, the notion of homomorphism is the same as for ordered algebras. The reason for this is that the antitonicity does not appear in any inequation per se. The problem with the non-preservation of quasi-inequations in general is that they are conditionals, and nothing excludes the conditional being vacuously true by the antecedent being false. Thus, fine-tuning the notion of homomorphism cannot solve the puzzle.

⁵We owe this observation and the following generalization of it to Katalin Bimbó.
Lemma 3.10.11 Let oⱼ be an n-ary operation which is order-reversing in the ith place. Then the quasi-inequation

(cp) x ≤ oⱼ(..., yᵢ, ...) ⇒ yᵢ ≤ oⱼ(..., x, ...)
is preserved under homomorphism.
Proof First we prove for any such operation the following:

(1) oⱼ(..., yᵢ, ...) ≤ oⱼ(..., yᵢ, ...) (refl. of ≤);
(2) yᵢ ≤ oⱼ(..., oⱼ(..., yᵢ, ...), ...) ((1), by contraposition).
Here (2) is the "law of intuitionistic double negation." (Another way to put this is that the operation satisfies the principle of "extensionality," taking the application of the operation twice as a closure operation. This is quite plausible since oⱼ is order-reversing in its ith place, i.e., forms with itself a Galois connection in the ith place.) Using this we show that if a tonoid I satisfies the above quasi-inequation, then so does tonoid J, where J is a homomorphic image (under h) of I:

1. hx ≤ oⱼ(..., hyᵢ, ...) (assumption, in J)
2. oⱼ(..., oⱼ(..., hyᵢ, ...), ...) ≤ oⱼ(..., hx, ...) (1, by ton. rule, in J)
3. yᵢ ≤ oⱼ(..., oⱼ(..., yᵢ, ...), ...) (by (2), in I)
4. hyᵢ ≤ oⱼ(..., oⱼ(..., hyᵢ, ...), ...) (3, h ord. hom., in J)
5. hyᵢ ≤ oⱼ(..., hx, ...) (2, 4, by transitivity of ≤, in J).
This concludes the proof. □
Turning back to the "axiomatizability" view, this lemma can be taken as demonstrating that in a tonoid "extensionality," that is, (2), is equivalent to "contraposition," that is, (cp) (where "contraposition" is taken with respect to the ith place of an operation antitonic at this place). Note that (2) is a direct analog of "assertion," just as (cp) is an analog of the "rule form of permutation." The first part of the proof of the lemma directly derived "extensionality" from "contraposition." The second part of the proof showed the converse. The proof crucially uses the fact that the structure is a tonoid, and that the operation is order-reversing in its ith place.

Remark 3.10.12 We emphasize that rule-form permutation is inequationally axiomatizable by assertion only because we are requiring suffixing as part of the fundamental framework of a tonoid. It is easy to see that if we do not require this, then no set of inequations is equivalent to rule-form permutation. The argument is that the set of inequations would be preserved under homomorphism, whereas rule-form permutation is not. The proof goes by modifying Example 3.9.9 as follows. Define an operation on A: x → y = x. It is easy to see that rule-form permutation (and prefixing and suffixing) hold for ≤₁ (and hence so would the inequations), but that permutation (and suffixing) fail for ≤₂ (though the supposedly equivalent inequations would be preserved). Permutation, after all, states that x ≤ y → z (= y) implies that y ≤ x → z (= x). Setting x = 0 and y = 1 (z can be either) and using the fact that 0 ≤₂ 1, we obtain 1 ≤₂ 0 (which is clearly false).
It is interesting to note that inequational definability may depend upon the presence of "helper" connectives. Thus suppose we add the fusion connective ∘ (cf. Section 3.17) to an implication tonoid, forming a right-residuated partially ordered groupoid, subject to the residuation condition:

(Res) x ∘ y ≤ z iff y ≤ x → z.
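(Res) can be tried out on a small model. The sketch below takes fusion to be min on a three-element chain, whose right residual is the Gödel implication; the model and names are our illustration, not the book's official example.

```python
# A small model of (Res): fusion = min on the chain 0 < 1/2 < 1,
# with right residual x -> z = (1 if x <= z else z).
from fractions import Fraction
from itertools import product

chain = [Fraction(0), Fraction(1, 2), Fraction(1)]

def fuse(x, y):          # fusion: x o y
    return min(x, y)

def imp(x, z):           # right residual of fusion
    return Fraction(1) if x <= z else z

# (Res): x o y <= z  iff  y <= x -> z.
res = all((fuse(x, y) <= z) == (y <= imp(x, z))
          for x, y, z in product(chain, repeat=3))

# The pair of inequations of Exercise 3.10.13, equivalent to (Res):
pair = all(fuse(a, imp(a, b)) <= b and a <= imp(b, fuse(b, a))
           for a, b in product(chain, chain))

print(res, pair)  # -> True True
```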
Then we can state the rule-form permutation (RfP) as:

x ∘ y ≤ y ∘ x.
The reader may check this fact by verifying the following chain of equivalences:

a ≤ b → c iff b ∘ a ≤ c iff a ∘ b ≤ c iff b ≤ a → c.

The discerning reader may worry that the residuation condition is not itself an inequation. But:

Exercise 3.10.13 Prove that the residuation condition (Res) is equivalent to the following pair of inequations:

a ∘ (a → b) ≤ b;
a ≤ b → (b ∘ a).

Things get more interesting yet, because in Section 8.1.2 we prove that every implication tonoid is embeddable in a right-residuated partially ordered groupoid.

The question naturally arises as to whether there is an analog to Birkhoff's varieties theorem for tonoids. We answer this positively in the next section. In Chapter 12 we shall see that tonoids have nice representations.

3.11 Tonoid Varieties

We state the following, and then say something about the notions it uses.

Theorem 3.11.1 A similarity class K of tonoids is inequationally definable iff K is closed under subtonoid, tonoid homomorphic image, and tonoid direct product.

We shall call a class K of tonoids "similar" just when their algebraic parts are similar as algebras (having the same number of operations of the same degree), and in addition, corresponding operations have the same tonic type. The subsequent discussion is based on Bloom's (1976) similar theorem for ordered algebras, and it turns out that the proof of that theorem can be readily extended to tonoids. Then the theorem for ordered algebras turns out just to be a special case (when all of the tonic types are positive in each position). The notions of subtonoid, tonoid homomorphic image, and tonoid direct product are defined precisely the same way as their corresponding notions for ordered algebras.

We prove the theorem by first recognizing that the Galois connection that was established in Section 2.15 for algebras and equations extends to tonoids and inequations. We write Iᵗ to stand for the tonoids that satisfy the inequations I, and Kⁱ to stand for the set of inequations that are valid in K.

Fact 3.11.2 The maps above form a Galois connection between the power set of the set of all inequations (in a language appropriate to K) and the power set of the set of tonoids. Let K₁ and K₂ be similarity classes of tonoids (of the same type), and let I₁ and I₂ be sets of inequations (in the same terms).

In particular, K ⊆ (Kⁱ)ᵗ. We can then show Theorem 3.11.1 if we show the converse, (Kⁱ)ᵗ ⊆ K (under the closure hypotheses of the theorem). For then we have (Kⁱ)ᵗ = K, and so the class K is axiomatized by the set of inequations Kⁱ. We first prove a necessary theorem about the existence of free tonoids.
Theorem 3.11.3 Let K be a non-degenerate class of similar tonoids, which is closed under tonoid subalgebras and tonoid direct products. Then for any cardinal n, a free K-tonoid F_K(n) exists.
Proof Pick a set V of n variables. Form the word algebra W on V of the same similarity type as the algebras in K. Given a class K of tonoids, we define a quasi-congruence on W as follows:

(qc) w ⊑_K w' iff for every interpretation I of W in an algebra A ∈ K, I(w) ≤ I(w').

G = {[x]_≈K : x ∈ V} generates W/≈_K, and (because of non-degeneracy) if x ≠ y then [x]_≈K ≠ [y]_≈K. These are free generators. Let f be any mapping of G into an arbitrary A ∈ K. Define an interpretation I so that:

(1) I(x) = f([x]_≈K);
(2) h([w]_≈K) = I(w).

We know as in the proof of Theorem 2.14.6 that h is an algebraic homomorphism. We need to show that it also preserves order. Thus suppose that [w₁]_≈K ≤ [w₂]_≈K. Then w₁ ⊑_K w₂. We need to show that h([w₁]_≈K) ≤ h([w₂]_≈K), i.e., I(w₁) ≤ I(w₂). But this is just what (qc) above provides. □

We have shown that W/⊑_K is free in K, but we have not shown that it is a member of K. Instead we show that some isomorphic copy is a member of K. As a special case we define the quasi-congruence relative to a single interpretation:

w ⊑_I w' iff I(w) ≤ I(w').
Lemma 3.11.4 Let A = (A, =_A, (oᵢ)ᵢ∈I) and A' = (A', ≤_A', (o'ᵢ)ᵢ∈I) be two similar tonoids (note that the first is a discrete algebra). Let h be an algebraic homomorphism from A_alg = (A, (oᵢ)ᵢ∈I) onto A'_alg = (A', (o'ᵢ)ᵢ∈I). Then h is also a tonoid homomorphism from A = (A, =_A, (oᵢ)ᵢ∈I) onto A' = (A', ≤_A', (o'ᵢ)ᵢ∈I).

Proof If a =_A b, then a = b, and so h(a) = h(b). Because of the reflexivity of the partial order, then h(a) ≤_A' h(b). □
Proposition 3.11.5 The discrete word algebras are universally free in the class of tonoids.
Proof Let K be a class of similar algebras, and let A ∈ K. First form the word algebra F_K(n) of the same similarity type, where n is the cardinality of A. We know from Chapter 2 that F_K(n) is universally free in K considered as a class of algebras. This means that every mapping f of the generators into A can be extended to an algebraic homomorphism h of F_K(n) onto A. By Lemma 3.11.4, h is also a tonoid homomorphism, as is required for freedom in a class of ordered algebras. □

Let K be an abstract class of tonoids, closed under tonoid subalgebra and tonoid direct product. We form a word algebra W on a set of variables V at least as big as A. Let I₀ map V onto A. Outfitting W with identity (the discrete partial order) makes it a tonoid. By Proposition 3.11.5 this tonoid is universally free, and so I₀ can be extended to an interpretation:

(1) I : W → A.
Since K is a subdirect class, K has free K-tonoids of any cardinality of generators. Let the cardinality of V be n, and consider the free tonoid F_K(n). For the sake of simplification we assume that F_K(n) has been constructed as a quotient tonoid on W. We then have the canonical interpretation:

(2) [ ]_K : W → F_K(n).

It is easy to show that ⊑_K ⊆ ⊑_I. We next define a new mapping (I/K) from F_K(n) onto A:

(I/K)([w]_K) = I(w).

That (I/K) is a homomorphism follows from:

Lemma 3.11.6 Let A be a tonoid with two quasi-congruences ⊑₁ ⊆ ⊑₂. Let ≈₁ and ≈₂ be the corresponding congruences. Then A/⊑₂ is an ordered homomorphic image of A/⊑₁ under the mapping h([a]₁) = [a]₂.

Proof The reader should consult the corresponding lemma for "unordered" algebras (Lemma 2.15.2) for h preserving operations and being onto. We show here that h preserves order, i.e., if [a]₁ ≤₁ [b]₁, then [a]₂ ≤₂ [b]₂. The antecedent means that a ⊑₁ b and the consequent means that a ⊑₂ b, and so the required implication is just the hypothesis that ⊑₁ ⊆ ⊑₂. □

Corollary 3.11.7 (Theorem 3.9.6, Bloom 1976) A class K of ordered algebras is inequationally definable iff K is closed under ordered subalgebra, ordered homomorphic image, and ordered direct product.

3.12 Classical Complementation

As we have seen, the logical notions of conjunction and disjunction have lattice-theoretic counterparts, the notions of meet and join. In a similar way, the logical notion of negation has a lattice-theoretic counterpart, the notion of complementation. We must hasten to add, however, that whereas the mathematical traits of meet and join characterize all implementations of logical conjunction and disjunction, there is no corresponding root notion of algebraic negation; rather, there is a whole group of notions falling under the general heading of complementation.

In the present section, we describe the classical notion of complementation, leaving alternative conceptions to the succeeding section. We begin with the traditional definition of complementation in lattice theory.

Definition 3.12.1 Let L be a bounded lattice with bounds 0 and 1, and let a and b be elements of L. Then a and b are said to be complements of each other (in symbols, aCb) if they satisfy the following conditions:

(c1) a ∧ b = 0;
(c2) a ∨ b = 1.

By way of illustrating this concept, consider the Hasse diagrams in Figure 3.4. In particular, consider in each case the element b. In (1), b has no complements (there is no x such that xCb). On the other hand, in (2) b has exactly one complement, a, and in (3) b has exactly two complements, a and c (which are also complements of each other).

FIG. 3.4. Illustrations of complementation

Consideration of these three examples leads to the following definition.

Definition 3.12.2 Let L be a bounded lattice. Then L is said to be complemented if every element in L has at least one complement in L, and L is said to be uniquely complemented if every element in L has exactly one complement in L.

In the Hasse diagrams in Figure 3.4, (1) is not complemented (since b lacks a complement), (2) is uniquely complemented, and (3) is complemented but not uniquely
complemented (since a, b, c all have each other as complements). Perhaps the most common example of a uniquely complemented lattice is the lattice of all subsets of a set U, where the partial ordering is set inclusion. In this case, the (unique) complement of a subset X of U is the complement of X relative to U (i.e., the set-theoretic U - X). If one has a uniquely complemented lattice, then one can define a unary complementation operation, c, so that c(a) is the unique complement of a. More generally, given simply a complemented lattice, one can define a complementation operation for each choice function that selects exactly one element from each set {x : xC a} of complements. In other words, for each way of choosing one particular complement for each element, one obtains a distinct complementation operation. Among complementation operations in general, there are special ones which are described in the following definitions.
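The complement relation xCb is easy to compute by brute force in a finite lattice. Below is a minimal Python sketch for the power-set example just mentioned; helper names are our own.

```python
# Complements in the power set of {1, 2, 3} ordered by inclusion
# (meet = intersection, join = union, bottom = empty set, top = U).
from itertools import combinations

U = frozenset({1, 2, 3})
elements = [frozenset(s) for r in range(len(U) + 1)
            for s in combinations(U, r)]

def complements(b):
    """All x with x meet b = 0 and x join b = 1."""
    return [x for x in elements if not (x & b) and (x | b) == U]

# Every element of a power-set lattice has exactly one complement:
unique = all(len(complements(b)) == 1 for b in elements)
print(unique)                                   # -> True
print(sorted(complements(frozenset({1}))[0]))   # -> [2, 3]
```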
Definition 3.12.3 Let L be a complemented lattice, and let n be any function from L into L. Then n is said to be an orthocomplementation on L if, for all x, y in L:

(n1) n(x) ∧ x = 0;
(n2) n(x) ∨ x = 1;
(n3) n[n(x)] = x;
(n4) if x ≤ y, then n(y) ≤ n(x).
If we read n as a negation function, then the intuitive content of (n1)-(n4) goes as follows. First of all, (n3) is simply double negation, and (n4) is simply a form of contraposition which says that if x implies y then the negation of y implies the negation of x. Then (n1) and (n2) together say that n(x) is a complement of x. Recall that 0 implies every proposition, and 1 is implied by every proposition. Thus, (n1) says that the conjunction of x with its negation implies every proposition, and (n2) says that the disjunction of x with its negation is implied by every proposition. With the notion of orthocomplementation, we can define a special class of algebras as follows.
Definition 3.12.4 Let A = (A, ∧, ∨, 0, 1, n) be an algebra of type (2, 2, 0, 0, 1). Then A is said to be an orthocomplemented lattice (or simply an ortholattice) if (1) (A, ∧, ∨, 0, 1) is a complemented lattice, and (2) n is an orthocomplementation on (A, ∧, ∨, 0, 1).
A common example of an ortholattice consists of the power set of any set U, where the orthocomplementation operation is the standard set-complement operation. Figure 3.5 contains Hasse diagrams of ortholattices. Here, x⁻ denotes n(x); a further convention is that 0⁻ = 1, 1⁻ = 0, x⁻⁻ = x. One can show that the orthocomplementation functions indicated in (1) and (2) are the only ones admitted by those structures. In the case of (1), the lattice is uniquely complemented. In the case of (2), the lattice is not uniquely complemented, so it admits many complementation operations; nevertheless, the lattice in (2) admits only one orthocomplementation operation. In the case of (3), there are three distinct orthocomplementation functions, one of which has been depicted; all the others are isomorphic to this one.
FIG. 3.5. Hasse diagrams of ortholattices
Since orthocomplementation is defined in terms of the partial order relation, it is not obvious whether ortholattices can be equationally characterized. However, as it turns out, they can, the relevant equations being given as follows:

(o1) a ∨ a⁻ = 1;
(o2) a ∧ a⁻ = 0;
(o3) (a⁻)⁻ = a;
(o4) (a ∧ b)⁻ = a⁻ ∨ b⁻;
(o5) (a ∨ b)⁻ = a⁻ ∧ b⁻.
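Equations (o1)-(o5) can be machine-checked for the standard set complement on a small power set. The sketch below, with names of our choosing, exercises all five equations.

```python
# Verify (o1)-(o5) for the set complement on the power set of {1, 2, 3}
# (0 = empty set, 1 = U, meet = intersection, join = union).
from itertools import combinations, product

U = frozenset({1, 2, 3})
elements = [frozenset(s) for r in range(len(U) + 1)
            for s in combinations(U, r)]

def comp(a):                  # candidate orthocomplementation
    return U - a

o1 = all((a | comp(a)) == U for a in elements)
o2 = all(not (a & comp(a)) for a in elements)
o3 = all(comp(comp(a)) == a for a in elements)
o4 = all(comp(a & b) == (comp(a) | comp(b))
         for a, b in product(elements, elements))
o5 = all(comp(a | b) == (comp(a) & comp(b))
         for a, b in product(elements, elements))

print(o1, o2, o3, o4, o5)  # -> True True True True True
```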
Exercise 3.12.5 It is left as an exercise to show that every ortholattice satisfies these equations, and every lattice satisfying these equations is an ortholattice.

We have seen that orthocomplementation constitutes a mathematical characterization of negation, treating it as a unary operation on propositions. This characterization of negation involves the following principles:

(1) Double negation: not(not(x)) = x.
(2) Contraposition: if x implies y, then not(y) implies not(x).
(3) Contradiction: x and not(x) implies everything.
(4) Tautology: everything implies x or not(x).
These particular principles are not universal features of all logics that have been proposed. On the one hand, classical logic, supervaluational logic, and quantum logic
espouse all four principles. On the other hand, multi-valued logic, intuitionistic logic, and relevance logic dispute one or more of these principles. For this reason, it is useful to present alternative conceptions of complementation. An important thing to notice about orthocomplemented lattices is that since they are not required to be uniquely complemented, the definition of the orthocomplement function can be somewhat arbitrary. For example, in Figure 3.4(3), the orthocomplement of a could be any one of the nodes labeled a⁻, b⁻, or b. This is not true if the underlying lattice is "distributive," as we shall see in Section 3.14, since then complementation is unique.

3.13 Non-Classical Complementation
As noted in the previous section, orthocomplementation provides a mathematical representation of classical negation. Since classical negation has features that have been disputed by alternative logics, in the present section we discuss a general concept of complementation that subsumes classical negation as well as a number of well-known alternatives. The notion of "non-classical complementation" might seem at first glance to be a contradiction in terms. Indeed, it has been customary in the literature to call an element c the "complement" of an element b only when they satisfy the "classical" conditions (c1) and (c2) of the previous section. This has led to a proliferation of pejorative terms for weaker notions arising in connection with various non-classical logics; for example, Rasiowa (1974) uses such terms as "pseudo-complementation" and "quasi-complementation." We hope to reverse this trend before the term "quasi-pseudo-complementation" appears in the literature. Unfortunately, there is not much time; Rasiowa (1974) already refers to "quasi-pseudo-Boolean algebras"! In particular, we propose to use the term "complementation" as a generic term for any unary operation on a lattice (or partially ordered set) that satisfies a certain minimal condition common to most well-known logics. Note, however, that we shall continue to use the expression "complemented lattice" in the traditional sense, to mean a lattice in which every element has a complement in the sense of (c1) and (c2) of the previous section. The following is our official definition of (generic) complementation.
Definition 3.13.1 Let P be a partially ordered set, and let x ↦ -x be a unary operation on P. Then x ↦ -x is said to be a complementation (operation) on P if the following condition is satisfied:

(m1) If a ≤ -b, then b ≤ -a.

Notice that, since (m1) implies its own converse, we could just as easily replace (m1) by the associated biconditional. Also notice that the poset P can, as a special but important case, be a bounded lattice. The minimal condition (m1) corresponds roughly to the natural assumption that if proposition a is inconsistent with proposition b, then conversely b is inconsistent with a (see below, however).

Exercise 3.13.2 Show that (m1) is equivalent to the following pair of conditions:

(m2) If a ≤ b then -b ≤ -a.
(m3) a ≤ - - a.

Condition (m2) corresponds to the logical principle of contraposition. We shall call a unary operation satisfying (m2) a subminimal complementation. Condition (m3) corresponds to the weak half of the logical principle of double negation. The remaining (strong) half of double negation (viz., - - a ≤ a) does not follow from the minimal principle(s) (m1)-(m3), as can be seen by examining various examples below. What does follow (and what may be verified as an exercise) is the following principle of triple negation:

(m4) - - -a = -a.

Before discussing the various specific versions of non-classical complementation, we discuss a way of looking at the above minimal principles of complementation (negation). Recall that traditional logic distinguishes between contradictories (literal negations) and contraries. An example of a contrary of "snow is white" is "snow is black"; they are contrary precisely because they cannot both be true. However, "snow is black" is not the weakest proposition contrary to (inconsistent with) "snow is white." For it does not merely deny that snow is white; rather, it goes on to say specifically what other color snow has. The sentence "snow is white" has many contraries, including "snow is red," "snow is green," "snow is puce," etc. To deny that snow is white is to say that snow has some other color, which might be understood as the (infinite) disjunction "snow is red, or snow is green, or snow is puce, or ...." Thus, the negation of a proposition is an infinite disjunction of all of its contraries.

Assuming that disjunction behaves like the join operation of lattice theory, another way of expressing the above is to say that the negation of a proposition b is the weakest proposition inconsistent with b. Somewhat more formally, the negation of b is the least upper bound (relative to the implication partial ordering) of all the propositions inconsistent with (contrary to) b. Now, of course, the indicated least upper bound may not exist. Under special circumstances, its existence is ensured; for example, its existence is ensured whenever the set of propositions forms a complete lattice with respect to the implication relation. Given the relation I (where "xIy" is read "x is inconsistent with y"), the least upper bound property can be expressed quite concisely by the following principle:

(n1) x ≤ -a iff xIa.

This simply says that -a is implied by all and only propositions that are inconsistent with a. Ignoring for the moment the natural assumption that the relation I is symmetric, we can write a corresponding principle, pertaining to a second negation operation, as follows:

(n2) x ≤ ~a iff aIx.
Starting from (n1) and (n2), it is easy to show the following, which looks very much like (m1):

(g1) a ≤ -b iff b ≤ ~a.

Indeed, to obtain (m1) as a special case, all we have to do is postulate that the relation I is symmetric, in which case we can show that -a = ~a. A pair of functions (-, ~) satisfying (g1) is called a Galois connection. More will be said about Galois connections in due course; at the moment, we simply remark (and ask the reader to prove) that one can show that Galois connections have the following properties reminiscent of negation:

(g2) If a ≤ b, then -b ≤ -a.
(g3) If a ≤ b, then ~b ≤ ~a.
(g4) a ≤ -~a.
(g5) a ≤ ~-a.

Exercise 3.13.3 Show that (g1) is equivalent to (g2) through (g5).

Now, returning to complementation operations, we remark that every orthocomplementation operation is an example of a complementation operation. On the other hand, there are alternative non-classical complementation operations which are intended to model the negation operators of the various non-classical logics. The classical principles of negation can be formulated (somewhat redundantly) lattice-theoretically as follows:

(p1) a ≤ - - a (weak double negation).
(p2) If a ≤ b, then -b ≤ -a (contraposition).
(p3) - - a = a (strong double negation).
(p4) a ∧ -a = 0 (contradiction).
(p5) a ∨ -a = 1 (tautology).

The first two principles are unopposed, and accordingly constitute the bare-bones notion of complementation, as we conceive it. On the other hand, the remaining three principles have been disputed by various non-classical logical systems. For example, Heyting's (1930) intuitionistic logic system H rejects (p3) and (p5), although it accepts the non-minimal principle (p4). The minimal logic system of Johansson (1936) goes one step further and rejects (p4) as well, accepting only the minimal principles (p1) and (p2). On the other hand, the relevance logic systems E and R of Anderson and Belnap (1975) reject both (p4) and (p5), but accept (p3). Interestingly, the multi-valued logic systems of Łukasiewicz (1910, 1913) agree precisely with relevance logic concerning the principles of negation, although for very different philosophical reasons.

In light of the different accounts of logical negation, we propose correspondingly different accounts of complementation, which are formally defined as follows. We offer in Figure 3.6 a few examples, using Hasse diagrams. In each diagram, complements are indicated parenthetically, except for 0 and 1, in which case the reader is to assume that -1 = 0 and -0 = 1, unless explicitly indicated otherwise.

FIG. 3.6. Examples of various sorts of complementation
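The Galois-connection properties (g1)-(g5) above can be checked by brute force for the classical "polarity" construction built from an arbitrary (not necessarily symmetric) relation playing the role of I. The concrete set and relation below are our own small example.

```python
# Polarity maps over subsets of S, built from a relation R (the "I" of
# the text):  -Y = {x : R(x, y) for all y in Y},
#             ~X = {y : R(x, y) for all x in X}.
from itertools import combinations, product

S = (0, 1, 2)
R = {(0, 1), (0, 2), (1, 2), (2, 2)}   # deliberately non-symmetric

subsets = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(S, r)]

def neg(Y):   # the "-" of the text
    return frozenset(x for x in S if all((x, y) in R for y in Y))

def til(X):   # the "~" of the text
    return frozenset(y for y in S if all((x, y) in R for x in X))

# (g1): X <= -Y iff Y <= ~X  (<= is subset inclusion here).
g1 = all((X <= neg(Y)) == (Y <= til(X))
         for X, Y in product(subsets, subsets))
# (g2)/(g3): both maps are antitone; (g4)/(g5): X <= -~X and X <= ~-X.
g2 = all(neg(B) <= neg(A) for A, B in product(subsets, subsets) if A <= B)
g3 = all(til(B) <= til(A) for A, B in product(subsets, subsets) if A <= B)
g4 = all(X <= neg(til(X)) for X in subsets)
g5 = all(X <= til(neg(X)) for X in subsets)

print(g1, g2, g3, g4, g5)  # -> True True True True True
```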
Definition 3.13.4 Let L be a lattice with 0, and let x ↦ -x be a complementation on L. Then x ↦ -x is said to be a Heyting complementation on L if it additionally satisfies the following condition:

(p4) a ∧ -a = 0.
Definition 3.13.5 Let L be a lattice, and let x ↦ -x be a complementation on L. Then x ↦ -x is said to be a De Morgan complementation on L if it additionally satisfies the following condition:

(p3) - - a = a.

Notice that De Morgan complementation is so called because it satisfies the De Morgan laws:

(dM1) -(a ∨ b) = -a ∧ -b;
(dM2) -(a ∧ b) = -a ∨ -b.

(This may be verified as an exercise.) Indeed, (p3) and either (dM1) or (dM2) provide an equational characterization of De Morgan complementation. (This may be shown as an exercise.) For the sake of completeness, we repeat the definition of orthocomplementation here.

FIG. 3.7. Logical relationships among four sorts of complementation (minimal, Heyting, De Morgan, orthocomplement)
Definition 3.13.6 Let L be a lattice with 0 and 1, and let x ↦ -x be a complementation on L. Then x ↦ -x is said to be an orthocomplementation on L if it additionally satisfies the following conditions:

(p3) - - a = a;
(p4) a ∧ -a = 0;
(p5) a ∨ -a = 1.
The various kinds of complementation operations are more rigorously associated with the various kinds of logics in the chapters devoted to those particular logics. The purpose of the current section is primarily to give basic definitions, and to show a little of the lay of the land. In Figure 3.6, H3 and H4, which are Heyting lattices, illustrate Heyting complementation. dM3 and dM4, which are De Morgan lattices, illustrate De Morgan complementation. B4, which is a Boolean lattice, illustrates orthocomplementation. Finally, M4 illustrates minimal complementation, in the sense that the complementation operation satisfies no complementation principle beyond the minimal principles. We have now described four sorts of complementation: orthocomplementation, Heyting complementation, De Morgan complementation, and general (minimal) complementation. The logical relationships among these are depicted in the Hasse diagram in Figure 3.7, where the strict ordering relation is "is a species of."

Exercise 3.13.7 As a final exercise for this section, the reader should verify the asymmetry of the above relation. In particular, the reader should show that not every minimal complement is a De Morgan complement (Heyting complement), and not every De Morgan complement (Heyting complement) is an orthocomplement. This may be done by reference to the examples above.
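One way to see the hierarchy concretely is to test which of (p1)-(p5) a given unary operation satisfies. The sketch below classifies the intuitionistic pseudo-complement on a three-element chain; this example is ours and is not one of the book's diagrams.

```python
# Which of (p1)-(p5) does the pseudo-complement on the chain
# 0 < 1/2 < 1 satisfy?  Here -0 = 1, -(1/2) = 0, -1 = 0,
# with meet = min and join = max on the chain.
from fractions import Fraction
from itertools import product

zero, half, one = Fraction(0), Fraction(1, 2), Fraction(1)
chain = [zero, half, one]
neg = {zero: one, half: zero, one: zero}

p1 = all(a <= neg[neg[a]] for a in chain)            # weak double negation
p2 = all(neg[b] <= neg[a]                            # contraposition
         for a, b in product(chain, chain) if a <= b)
p3 = all(neg[neg[a]] == a for a in chain)            # strong double negation
p4 = all(min(a, neg[a]) == zero for a in chain)      # contradiction
p5 = all(max(a, neg[a]) == one for a in chain)       # tautology

# -> True True False True False: a Heyting complementation,
# but neither De Morgan nor ortho.
print(p1, p2, p3, p4, p5)
```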
3.14 Classical Distribution

Recall that an inclusion poset is a poset (A, ≤) in which A is a collection of sets and ≤ is set inclusion. As noted in Section 3.2, inclusion posets are completely typical posets in the sense that every poset is isomorphic to an inclusion poset. A natural extension of this notion is that of a lattice of sets, which is formally defined as follows.

Definition 3.14.1 Let P be an inclusion poset. Then P is said to be a lattice of sets if P is closed under intersection and union, which is to say that for all X, Y:

(1) if X, Y ∈ P, then X ∩ Y ∈ P;
(2) if X, Y ∈ P, then X ∪ Y ∈ P.

An alternative piece of terminology in this connection is "ring of sets." Note carefully that a lattice of sets is not merely an inclusion poset that happens also to be a lattice; we will call the latter an inclusion lattice. In order for an inclusion lattice to be a lattice of sets, the meet of two sets must be their intersection, and the join of two sets must be their union. The Hasse diagrams in Figure 3.8 illustrate the difference. Both of these are inclusion posets that happen to be lattices. However, IL1 is not a lattice of sets, because in IL1 join does not correspond to union; for example, {a} ∨ {b} = {a, b, c} ≠ {a} ∪ {b}. By contrast, IL2 is a lattice of sets.

Now, every poset is isomorphic to an inclusion poset, and every lattice is isomorphic to an inclusion lattice. On the other hand, not every lattice is isomorphic to a lattice of sets. In support of this claim, we state and prove the following theorem.

FIG. 3.8. Distinction between inclusion lattices and lattices of sets
Theorem 3.14.2 Let L be a lattice that is isomorphic to a lattice of sets. Then for all a, b, c in L, a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).

Proof Suppose L* is a lattice of sets, and suppose that h is an isomorphism from L into L*. Then h(x ∧ y) = h(x) ∩ h(y), and h(x ∨ y) = h(x) ∪ h(y), for all x, y in L. So, setting A = h(a), B = h(b), and C = h(c), we have h[a ∧ (b ∨ c)] = A ∩ (B ∪ C), and h[(a ∧ b) ∨ (a ∧ c)] = (A ∩ B) ∪ (A ∩ C). By set theory, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), so h[a ∧ (b ∨ c)] = h[(a ∧ b) ∨ (a ∧ c)]. But h is one-one, since it is an isomorphism, so this implies that a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c). □
The sort of reasoning employed in the above proof can be generalized to demonstrate that any lattice that is isomorphic to a lattice of sets satisfies the following equations:

(d1) a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c);
(d2) a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c);
(d3) (a ∧ b) ∨ (a ∧ c) ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) ∧ (b ∨ c).
These three equations are known as the distributive laws; the first two are the common forms, and are dual to one another; the third one is the "self-dual" form. Consideration of these equations leads naturally to the following definition. Definition 3.14.3 Let L be a lattice (not necessarily bounded). Then L is said to be a distributive lattice if it satisfies (dl )-( d3). Now, one can demonstrate that a lattice satisfies all three equations-(dl), (d2), (d3 )-if it satisfies anyone of them. Exercise 3.14.4 Prove this claim. On the other hand, these formulas are not entirely lattice-theoretically equivalent, a fact that is demonstrated in the next section. Before continuing, we observe that one half of each distributive law is satisfied by every lattice. This is formally presented in the following theorem. Theorem 3.14.5 Let L be a lattice. Thenfor all a, b, c in L, (1) (a 1\ b) V (a 1\ c) ~ a 1\ (b V c); (2) a V (b 1\ c) ~ (a V b) 1\ (a V c);
(3) (a ∧ b) ∨ (a ∧ c) ∨ (b ∧ c) ≤ (a ∨ b) ∧ (a ∨ c) ∧ (b ∨ c).
CLASSICAL DISTRIBUTION
ORDER, LATTICES, AND BOOLEAN ALGEBRAS
FIG. 3.9. Non-distributive lattices

Whereas the inequations of Theorem 3.14.5 are true of every lattice, their converses are not. In other words, not every lattice is distributive; the lattices in Figure 3.9, for example, are non-distributive. Indeed, ND1 and ND2 are completely typical examples, as explained in the following theorem.

Theorem 3.14.8 Let L be a lattice. Then L is distributive if and only if it has no sublattice that is isomorphic to ND1 or to ND2.
Proof. The easy direction is from left to right. If a lattice contains a sublattice isomorphic to ND1 or ND2, then it is easy to check that a ∧ (b ∨ c) ≰ (a ∧ b) ∨ c, assuming the labelling as in Figure 3.9. For the other direction, assume that a ∧ (b ∨ c) ≰ (a ∧ b) ∨ c for some a, b and c in L. There are five cases to check. If a ≤ b, or b ≤ a, or a and b are incomparable and a ≤ c, then in fact a ∧ (b ∨ c) ≤ (a ∧ b) ∨ c, contrary to our assumption. (The verification of these cases is left as an exercise.) The two remaining cases are when a and b are incomparable but c ≤ a, and when a and b, and also a and c, are incomparable. The first gives rise to a sublattice isomorphic to ND2; the other allows one to construct a sublattice isomorphic to ND1. We sketch the reasoning for ND2, leaving the other construction as an exercise. Since c ≤ a, a ∧ c = c, and a ∨ c = a. The diagram on the left in Figure 3.10 illustrates what is known so far. Notice that a ∧ c ≤ a ∧ (b ∨ c), and (a ∧ b) ∨ c ≤ a ∨ c, by isotonicity, which gives the diagram on the right. If any two of the five elements a ∧ (b ∨ c), b ∨ c, b, a ∧ b and (a ∧ b) ∨ c were identified, then a ∧ (b ∨ c) ≤ (a ∧ b) ∨ c would follow; thus, these five elements form a sublattice isomorphic to ND2. □
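The easy direction can be confirmed by brute force. The following Python sketch (not from the text; the finite-lattice encoding via an order relation is my own) builds ND2 (the pentagon, with 0 < a < b < 1 and c incomparable to a and b) and ND1 (the diamond) and checks that each contains a triple violating (d1):

```python
from itertools import product

def lattice_ops(elems, leq):
    """Meet/join computed as greatest lower / least upper bound."""
    def meet(a, b):
        lower = [z for z in elems if leq(z, a) and leq(z, b)]
        return next(z for z in lower if all(leq(w, z) for w in lower))
    def join(a, b):
        upper = [z for z in elems if leq(a, z) and leq(b, z)]
        return next(z for z in upper if all(leq(z, w) for w in upper))
    return meet, join

def distributive(elems, leq):
    meet, join = lattice_ops(elems, leq)
    return all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
               for x, y, z in product(elems, repeat=3))

elems = ["0", "a", "b", "c", "1"]
# ND2 (pentagon): 0 < a < b < 1 on one side, c on the other.
nd2_up = {"0": set(elems), "a": {"a", "b", "1"}, "b": {"b", "1"},
          "c": {"c", "1"}, "1": {"1"}}
# ND1 (diamond): a, b, c pairwise incomparable between 0 and 1.
nd1_up = {"0": set(elems), "a": {"a", "1"}, "b": {"b", "1"},
          "c": {"c", "1"}, "1": {"1"}}

assert not distributive(elems, lambda x, y: y in nd2_up[x])
assert not distributive(elems, lambda x, y: y in nd1_up[x])
print("ND1 and ND2 are non-distributive")
```

The `up` dictionaries record, for each element, the set of elements above it; meets and joins are then recovered directly from the order, as in the definition of a lattice.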
Exercise 3.14.6 Prove Theorem 3.14.5.
It follows from (1) that (d1) can be simplified to:
(d1′) a ∧ (b ∨ c) ≤ (a ∧ b) ∨ (a ∧ c).
This may, in turn, be simplified to:
(d1″) a ∧ (b ∨ c) ≤ (a ∧ b) ∨ c.

Exercise 3.14.7 Prove that (d1′) and (d1″) are equivalent.
FIG. 3.10. Illustration of the proof of Theorem 3.14.8
In this context, by "isomorphic," we mean that the function preserves meet and join, but need not preserve bounds (especially if the lattice has no bounds). Of course, as a special case of the above theorem, if the lattice has a sublattice exactly like ND1 or ND2, including the bounds, then it is not distributive. This fact is intimately connected to the following theorem.
Theorem 3.14.9 In a distributive lattice, every element has at most one complement; i.e., if aCx and aCy then x = y.
Corollary 3.14.10 Every complemented distributive lattice is uniquely complemented.

A direct method of demonstrating this theorem proceeds along the following lines: suppose a is an element with complements c and c′; show c = c′ by applying the distributive law together with the principle that x ∧ y = x iff x ∨ y = y iff x ≤ y. The details are left as an exercise. An alternative method of demonstrating this theorem involves appealing to the previous theorem as well as showing that every instance of ambiguous complementation yields either a sublattice isomorphic to ND1 or a sublattice isomorphic to ND2. This is also left as an exercise.
FIG. 3.11. Complemented distributive lattices CD1, CD2, CD3
Theorem 3.14.11 In a complemented distributive lattice, complementation is unique.

Proof Suppose that we have two complements of x: −x and ∼x. Then −x ∧ x = 0 = ∼x ∧ x, and −x ∨ x = 1 = ∼x ∨ x. By the cancellation property of distributive lattices (Theorem 3.14.17), ∼x = −x. □
Definition 3.14.12 A Boolean algebra (sometimes called a Boolean lattice) is a complemented distributive lattice.

By the previous corollary, complementation is in fact unique, and it is customary to denote it by −x or sometimes x̄. We shall learn more about Boolean algebras in Section 8.7, but for now let us content ourselves with two examples.

Example 3.14.13 Consider the power set of some set U: ℘(U) = {X : X ⊆ U}. This is readily seen to be a Boolean algebra, with the lattice order just ⊆, glb just ∩, lub just ∪ (all of these restricted to ℘(U)), and −X = {a ∈ U : a ∉ X}.

Example 3.14.14 The Lindenbaum algebra of classical propositional calculus can be shown to be a Boolean algebra. Showing this depends somewhat on the particular formulation, and may require some "axiom chopping." But it is very easy if we take the classical propositional calculus to be defined as truth-table tautologies.

Some distributive lattices are complemented, and others are not. Figure 3.11 contains examples of distributive lattices that are complemented; Figure 3.12 contains examples of distributive lattices that are not complemented. The distributivity of CD1 and NCD1 is an instance of a more general theorem.

Theorem 3.14.15 Every linearly ordered set is a distributive lattice.

Proof We note the following. In a linearly ordered set, for all b, c, either b ≤ c or c ≤ b. In general, if b ≤ c, then b ∧ c = b and b ∨ c = c. Now let us consider a ∧ (b ∨ c) ≤ (a ∧ b) ∨ (a ∧ c). If b ≤ c then a ∧ (b ∨ c) = a ∧ c ≤ (a ∧ b) ∨ (a ∧ c). The case where c ≤ b is similar. □
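For Example 3.14.13, the uniqueness of complements asserted by Corollary 3.14.10 can be confirmed concretely. This Python sketch (not from the text; the brute-force search is my own illustration) checks, for each subset X of a three-element U, that U \ X is a complement and that no other element of ℘(U) is:

```python
from itertools import combinations

U = frozenset({1, 2, 3})
PU = [frozenset(s) for r in range(len(U) + 1)
      for s in combinations(U, r)]

# Complement in the power-set algebra: -X = U \ X.  Check the complement
# laws, then check uniqueness by searching for any other complement of X.
for X in PU:
    comp = U - X
    assert X & comp == frozenset() and X | comp == U
    others = [Y for Y in PU
              if X & Y == frozenset() and X | Y == U]
    assert len(others) == 1 and others[0] == comp

print("every X in the power set of U has exactly one complement")
```

Here 0 is the empty set and 1 is U itself, matching the bounds of the lattice order ⊆.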
FIG. 3.12. Non-complemented distributive lattices NCD1, NCD2, NCD3

On the other hand, the distributivity of the remaining lattices above follows from the above theorem along with the fact that distributive lattices are equationally defined and hence form a variety. The class of distributive lattices is accordingly closed under the formation of homomorphic images, direct products, and subalgebras. CD2 is isomorphic to the 2-direct power of CD1; CD3 is isomorphic to the 3-direct power of CD1. NCD2 is isomorphic to the direct product of CD1 and NCD1, and NCD3 is a sublattice of NCD2, which happens to be a subdirect product of CD1 and NCD1. Note also that NCD1 is a subdirect product of CD1 with itself. We conclude this section by describing a general class of concrete distributive lattices.

Theorem 3.14.16 Let n be a natural number, let D(n) be the set of divisors of n, and let ≤ be the relation of integral divisibility among elements of D(n). Then (D(n), ≤) is a distributive lattice. In particular, a ∧ b is the greatest common divisor of a and b, and a ∨ b is the least common multiple of a and b.
Proof Let us denote the two operations by gcd and lcm. It is easy to verify that both gcd and lcm are idempotent, commutative, and associative; furthermore, gcd(a, lcm(a, b)) = a and lcm(a, gcd(a, b)) = a. (This is left as an exercise.) To show that the lattice is distributive, one has to show gcd(a, lcm(b, c)) ≤ lcm(gcd(a, b), c). Suppose otherwise: there is a prime power pⁿ such that pⁿ | gcd(a, lcm(b, c)) but pⁿ ∤ lcm(gcd(a, b), c). From the first assumption, pⁿ | a and pⁿ | lcm(b, c), so pⁿ | a and (pⁿ | b or pⁿ | c). From the second assumption, pⁿ ∤ gcd(a, b) and pⁿ ∤ c. Hence pⁿ | b, and so pⁿ | a and pⁿ | b, which leads to contradiction, since then pⁿ | gcd(a, b). □
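The divisor lattice of Theorem 3.14.16 is easy to exercise mechanically. A quick Python sketch (not from the text; the choice n = 360 is my own), checking the absorption laws and full distributivity with meet = gcd and join = lcm:

```python
from math import gcd
from itertools import product

def lcm(a, b):
    return a * b // gcd(a, b)

n = 360
D = [d for d in range(1, n + 1) if n % d == 0]

# In (D(n), |): meet = gcd, join = lcm.  Check absorption, then (d1).
for a, b in product(D, repeat=2):
    assert gcd(a, lcm(a, b)) == a and lcm(a, gcd(a, b)) == a
for a, b, c in product(D, repeat=3):
    assert gcd(a, lcm(b, c)) == lcm(gcd(a, b), gcd(a, c))

print(f"D({n}) with gcd/lcm is a distributive lattice ({len(D)} divisors)")
```

The prime-power argument in the proof explains why this always works: the exponent of each prime in gcd is a minimum and in lcm a maximum, and min distributes over max.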
Distributive lattices have the following useful property.

Theorem 3.14.17 Let L be a distributive lattice. Suppose that there are elements a, x, y such that x ∧ a = y ∧ a and x ∨ a = y ∨ a. Then x = y.

Proof x = x ∨ (x ∧ a) = x ∨ (y ∧ a) = (x ∨ y) ∧ (x ∨ a) = (x ∨ y) ∧ (y ∨ a) = (y ∨ x) ∧ (y ∨ a) = y ∨ (x ∧ a) = y ∨ (y ∧ a) = y. □

FIG. 3.13. ND2

3.15 Non-Classical Distribution
The classical principles of distribution are espoused by virtually every proposed logical system. Nevertheless, there are exceptions, in particular the various non-distributive logics inspired (principally) by quantum theory. Inasmuch as classical distribution is not a universal logical principle, it is worthwhile for us to examine briefly various proposed weakenings of the classical principles of distribution. We begin by defining the notion of a distributive triple, which is a natural generalization of the notion of a distributive lattice.

Definition 3.15.1 Let L be a lattice, and let {a, b, c} be an unordered triple of elements of L. Then {a, b, c} is said to be a distributive triple if it satisfies the following equations, for every assignment of the elements a, b, c to the variables x, y, z:
(d1) x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z);
(d2) x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z).
We note the following immediate theorem.

Theorem 3.15.2 A lattice L is distributive if and only if every triple of elements of L is a distributive triple.

As mentioned in the previous section, a lattice satisfies (d1) if and only if it satisfies (d2), yet (d1) and (d2) are not entirely lattice-theoretically equivalent. This amounts to the fact that a particular assignment of lattice elements to variables can satisfy one without satisfying the other. For example, consider the lattice ND2 in Figure 3.13. In this particular case:
a ∧ (b ∨ c) = a ∧ 1 = a; (a ∧ b) ∨ (a ∧ c) = a ∨ 0 = a;
a ∨ (b ∧ c) = a ∨ 0 = a; (a ∨ b) ∧ (a ∨ c) = b ∧ 1 = b.
Thus, as the reader can verify, the assignment of a to x, b to y, and c to z satisfies (d1) but not (d2). Note that if we interchange a with b in ND2 the same assignment satisfies (d2) but not (d1). As mentioned in the previous section, ND1 and ND2 together are completely typical non-distributive lattices, in the sense that a lattice is non-distributive if and only if it has a sublattice isomorphic to ND1 or ND2. Thus, classical distributivity corresponds to the absence of sublattices like ND1 and ND2. This way of looking at it suggests an obvious way to generalize classical distribution. Classical distribution rules out both ND1 and ND2. One way to generalize classical distribution involves ruling out only ND1, and another way involves ruling out only ND2. The first generalization does not correspond to any well-known class of lattices, so we will not pursue it any further. The second generalization, however, does correspond to a frequently investigated class of lattices, namely modular lattices, which are formally defined as follows.

Definition 3.15.3 Let L be a lattice. Then L is said to be modular if it satisfies the following equation:
(m1) x ∧ (y ∨ (x ∧ z)) = (x ∧ y) ∨ (x ∧ z).
Notice first of all that (m1) is a consequence of (the universal closure of) (d1), obtained simply by substituting 'x ∧ z' for 'z' and noting that x ∧ (x ∧ z) = x ∧ z. On the other hand, (d1) is not a consequence of (m1). This is seen by noting that M1 (= ND1) satisfies (m1) but not (d1). (This may be shown as an exercise.) Note that M1 in Figure 3.14 is the smallest modular lattice that is not distributive.
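The claim that M1 satisfies (m1) but not (d1) can be checked exhaustively. A Python sketch (not from the text; the order-relation encoding of the diamond is my own):

```python
from itertools import product

# The diamond M1 (= ND1): 0 below three incomparable atoms a, b, c below 1.
elems = ["0", "a", "b", "c", "1"]
up = {"0": set(elems), "a": {"a", "1"}, "b": {"b", "1"},
      "c": {"c", "1"}, "1": {"1"}}
leq = lambda x, y: y in up[x]

def meet(a, b):
    lower = [z for z in elems if leq(z, a) and leq(z, b)]
    return next(z for z in lower if all(leq(w, z) for w in lower))

def join(a, b):
    upper = [z for z in elems if leq(a, z) and leq(b, z)]
    return next(z for z in upper if all(leq(z, w) for w in upper))

# (m1) holds for every assignment of elements to x, y, z ...
m1 = all(meet(x, join(y, meet(x, z))) == join(meet(x, y), meet(x, z))
         for x, y, z in product(elems, repeat=3))
# ... but (d1) fails for some assignment (e.g. x=a, y=b, z=c).
d1 = all(meet(x, join(y, z)) == join(meet(x, y), meet(x, z))
         for x, y, z in product(elems, repeat=3))

assert m1 and not d1
print("M1 is modular but not distributive")
```

This is exactly the separation claimed in the text: modularity is strictly weaker than distributivity.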
FIG. 3.15. Illustration of the proof of Theorem 3.15.5
FIG. 3.14. M1

Next we observe that, just as there are three distribution equations, there are three modularity equations. In addition to (m1), there is also (m2), which is the dual of (m1), and (m3), which is its own dual:
(m2) x ∨ (y ∧ (x ∨ z)) = (x ∨ y) ∧ (x ∨ z);
(m3) (x ∧ y) ∨ ((x ∨ y) ∧ z) = (x ∨ y) ∧ ((x ∧ y) ∨ z).
One can show that a lattice satisfies all three equations, (m1), (m2), (m3), if and only if it satisfies any one of them. On the other hand, these equations are not lattice-theoretically equivalent. (The reader may wish to prove this as an exercise.)
The following series of theorems connects the notion of modularity with the notion of distributive triple, and with the idea that modularity excludes sublattices isomorphic to ND2.

Theorem 3.15.4 Let L be a modular lattice, let a, b, c be elements of L, and suppose that a ≤ b. Then {a, b, c} is a distributive triple.

Proof To prove the claim, notice that half of each equality (d1), (d2) holds in every lattice. Notice also that simultaneously substituting y for z and z for y in (d1) or (d2) returns (d1) and (d2). Thus, it is sufficient to derive six inequations. The two steps which use the fact that the lattice is modular are explicitly indicated; the rest follows from a ≤ b or general lattice properties.
(i) a ∧ (b ∨ c) ≤ a = a ∨ (a ∧ c) = (a ∧ b) ∨ (a ∧ c).
(ii) a ∨ (c ∧ b) = a ∨ (c ∧ (a ∨ b)), and by (m2) a ∨ (c ∧ (a ∨ b)) = (a ∨ c) ∧ (a ∨ b).
(iii) b ∧ (a ∨ c) = b ∧ (c ∨ (b ∧ a)), and by (m1) b ∧ (c ∨ (b ∧ a)) = (b ∧ a) ∨ (b ∧ c).
(iv) b ∨ (a ∧ c) ≥ b = b ∧ (b ∨ c) = (b ∨ a) ∧ (b ∨ c).
(v) c ∧ (a ∨ b) = c ∧ b ≤ (c ∧ a) ∨ (c ∧ b).
(vi) c ∨ (a ∧ b) = c ∨ a ≥ (c ∨ a) ∧ (c ∨ b). □

Theorem 3.15.5 Let L be a lattice having the property that every triple {a, b, c} such that a ≤ b is a distributive triple. Then L is a modular lattice.

Proof The proof consists of showing that all triples satisfy (m1). There are two main cases to consider: first, when there are two elements related in the triple; and second, when all three elements are incomparable.
(1) Assume that a ≤ b. (i) a ∧ c ≤ a ≤ b, so b ∨ (a ∧ c) = b, and further a ∧ (b ∨ (a ∧ c)) = a ∧ b = a. On the other hand, a ∧ b = a since a ≤ b, and a ∨ (a ∧ c) = a. Thus, a ∧ (b ∨ (a ∧ c)) = a = (a ∧ b) ∨ (a ∧ c). The other cases are very similar, and they are summarized by the following equations. (ii) a ∧ (c ∨ (a ∧ b)) = a ∧ (c ∨ a) = a = (a ∧ c) ∨ (a ∧ b). (iii) b ∧ (a ∨ (b ∧ c)) = a ∨ (b ∧ c) = (b ∧ a) ∨ (b ∧ c), since a ∨ (b ∧ c) ≤ b. (iv) b ∧ (c ∨ (b ∧ a)) = b ∧ (c ∨ a) = (b ∧ c) ∨ a = (b ∧ c) ∨ (b ∧ a), using the hypothesis that {a, b, c} is a distributive triple. (Figure 3.15(1) illustrates this case when no additional assumptions concerning c are made.)
(2) Assume that a ∥ b, b ∥ c, and c ∥ a. Let x = a ∧ (b ∨ (a ∧ c)) and y = (a ∧ b) ∨ (a ∧ c) denote the left- and the right-hand side of (m1). The following inequations hold due to the isotonicity properties of meet and join: x ≤ a, y ≤ a, x ≤ b ∨ (a ∧ c), y ≤ b ∨ (a ∧ c), x ≥ a ∧ b, and y ≥ a ∧ b. Since in any lattice the inequation x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z) is true, x ≥ y. We want to show that not only y ≤ x, but also x ≤ y. Since y ≤ x, x and y form a distributive triple with any element; take b. Then (d1): x ∧ (y ∨ b) = (x ∧ y) ∨ (x ∧ b). Now, y ∨ b = (a ∧ b) ∨ (a ∧ c) ∨ b = b ∨ (a ∧ c), and further, x ∧ (y ∨ b) = a ∧ (b ∨ (a ∧ c)) ∧ (b ∨ (a ∧ c)) = a ∧ (b ∨ (a ∧ c)) = x. On the other hand, x ∧ b = a ∧ (b ∨ (a ∧ c)) ∧ b = a ∧ b, and so (x ∧ y) ∨ (x ∧ b) = y ∨ (a ∧ b) = (a ∧ b) ∨ (a ∧ c) ∨ (a ∧ b) = (a ∧ b) ∨ (a ∧ c) = y. That is, x = y. Incomparability is a symmetric relation; thus, the proof is complete. (Figure 3.15(2) illustrates this case, showing x and y not yet identified.) □

Theorem 3.15.6 A lattice L is modular iff it contains no sublattice isomorphic to ND2.

Exercise 3.15.7 Prove this theorem. The tedious part of the proof is similar to the proof of Theorem 3.14.8. (Hint: Construct the free modular lattice over three generators.)
Thus, in a modular lattice, although not every triple need be distributive, every triple in which one element is below another is distributive. Modular lattices are the best-known generalization of distributive lattices. Nonetheless, they are not sufficiently general for the purposes of quantum logic. We must accordingly consider further weakenings of the distributive laws. Now, just as we can generalize distributivity by defining the notion of a distributive triple, we can generalize modularity by defining the notion of a modular pair as follows.
Definition 3.15.8 Let L be a lattice, and let (b, c) be an ordered pair of elements of L. Then (b, c) is said to be a modular pair, written M(b, c), if the following condition obtains for every a in L:
(mp) If a ≤ c, then a ∨ (b ∧ c) = (a ∨ b) ∧ c.
The following theorem states the expected relation between modularity and modular pairs. (The proof of the theorem is left as an exercise.)
Theorem 3.15.9 A lattice L is modular iff every pair of elements of L is modular.
Note that the modularity relation M is not symmetric. For example, ND2 (see Figure 3.13) provides a counter-example: M(b, c) but not M(c, b). This leads to a fairly well-known generalization of modularity, known as semi-modularity, which may be (but usually is not) defined as follows.
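The asymmetry of M in ND2 can be verified directly from condition (mp). A Python sketch (not from the text; the encoding of ND2, with 0 < a < b < 1 and c incomparable to a and b, is my own):

```python
# ND2 (pentagon) with 0 < a < b < 1 and c incomparable to a, b.
elems = ["0", "a", "b", "c", "1"]
up = {"0": set(elems), "a": {"a", "b", "1"}, "b": {"b", "1"},
      "c": {"c", "1"}, "1": {"1"}}
leq = lambda x, y: y in up[x]

def meet(x, y):
    lower = [z for z in elems if leq(z, x) and leq(z, y)]
    return next(z for z in lower if all(leq(w, z) for w in lower))

def join(x, y):
    upper = [z for z in elems if leq(x, z) and leq(y, z)]
    return next(z for z in upper if all(leq(z, w) for w in upper))

def modular_pair(b, c):
    """(mp): for every a <= c, a v (b ^ c) = (a v b) ^ c."""
    return all(join(a, meet(b, c)) == meet(join(a, b), c)
               for a in elems if leq(a, c))

assert modular_pair("b", "c") and not modular_pair("c", "b")
print("in ND2: M(b, c) holds but M(c, b) fails")
```

The failing witness for M(c, b) is a itself: a ∨ (c ∧ b) = a, while (a ∨ c) ∧ b = b.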
FIG. 3.16. Weakly modular, semi-modular lattice
Definition 3.15.10 Let L be a lattice. Then L is said to be semi-modular if it satisfies the following condition:
(sm) If M(a, b), then M(b, a).

Semi-modular lattices are also called symmetric lattices, since those are the lattices in which the M relation is symmetric. The principle of modularity may be understood as declaring that every pair of elements is modular. This suggests a general scheme for generalizing modularity: rather than declaring every pair to be modular, one declares only certain special pairs to be modular. Under this scheme, we consider two examples, weak modularity and orthomodularity, the former being defined as follows.

Definition 3.15.11 Let L be a lattice with lower bound 0. Then L is said to be weakly modular if it satisfies the following condition:
(wm) If a ∧ b = 0, then M(a, b).

In other words, in a weakly modular lattice, a pair (a, b) is modular provided a ∧ b = 0. Figure 3.16 contains an example of a lattice that is weakly modular and semi-modular, but not modular. Whereas weak modularity applies exclusively to lower bounded lattices, orthomodularity applies exclusively to ortholattices. The most revealing definition of orthomodularity uses the notion of orthogonality on an ortholattice, which is defined as follows, after which the official definition of orthomodular lattices is given.

Definition 3.15.12 Let L be an ortholattice, and let a, b be elements of L. Then a and b are said to be orthogonal, written a ⊥ b, if a ≤ b⁻.

Definition 3.15.13 Let L be an ortholattice. Then L is said to be an orthomodular lattice if it satisfies the following condition:
(om) If a ⊥ b, then M(a, b).

In other words, in an orthomodular lattice, every orthogonal pair is a modular pair; hence its name.
The following theorems state the logical relation between orthomodularity and weak modularity.

Theorem 3.15.14 Every weakly modular ortholattice is an orthomodular lattice.

Theorem 3.15.15 Not every orthomodular lattice is weakly modular.

The first theorem follows from the fact that a ∧ b = 0 if a ⊥ b. The second theorem may be seen by examining the lattice in Figure 3.17; this lattice is orthomodular but not weakly modular (neither is it semi-modular nor modular). OM1 is the smallest ortholattice that is orthomodular but not modular.
Next, we note that, whereas orthomodular lattices form a variety, weakly modular lattices and semi-modular lattices do not. Concerning the former, we observe that adding either of the following (dual) equations to the equations for ortholattices serves to characterize orthomodular lattices:
(OM1) a ∧ (a⁻ ∨ (a ∧ b)) = a ∧ b;
(OM2) a ∨ (a⁻ ∧ (a ∨ b)) = a ∨ b.
Concerning the latter, we appeal to Birkhoff's varieties theorem (see Chapter 2), which states that every variety is closed under the formation of subalgebras. In particular, we
note that, whereas WM1 is both weakly modular and semi-modular, it has a sublattice that is neither: specifically, the lattice in Figure 3.18.

FIG. 3.17. Orthomodular (but not weakly modular) lattice OM1
FIG. 3.18. Sublattice of WM1

We conclude this section by noting that the lattices that are traditionally investigated in quantum logic are lattices of closed subspaces of separable infinite-dimensional complex Hilbert spaces. These lattices are orthomodular and semi-modular, but they are not weakly modular, and hence they are not modular. The condition of semi-modularity is not equational, so it is customarily ignored in purely logical investigations of quantum logic, which tend to concentrate on orthomodular lattices.

3.16 Classical Implication

The logical concepts discussed so far in the mathematical theory of propositions have included implication, conjunction, disjunction, and negation. The astute reader has no doubt noticed the asymmetry between the concept of implication, on the one hand, and the concepts of conjunction, disjunction, and negation, on the other. Whereas implication has been treated as a binary relation on the set of propositions, the remaining concepts have been treated as binary operations on the set of propositions. In one concrete presentation of the theory of propositions, propositions are treated as sets of possible worlds. In this representation, implication corresponds to set inclusion, whereas conjunction (disjunction, negation) corresponds to intersection (union, set complement). Letting ‖x‖ denote the set of worlds in which proposition x is true, we can write the following pairs of expressions in the theory of propositions:
(E1) p implies q, ‖p‖ ⊆ ‖q‖;
(E2) p and q, ‖p‖ ∩ ‖q‖;
(E3) p or q, ‖p‖ ∪ ‖q‖.
Another analogy is worth remarking at this point. Recall the notion of a division lattice, which is the set of all divisors of a given number n, together with the relation "x divides y." In light of the theory of division, we can append the following to the above three pairs of expressions, thus obtaining three triples of analogous expressions:
(e1) x divides y; (e2) x plus y; (e3) x times y.
Notice the crucial grammatical difference between these various expressions. Whereas (e1) is a sentence (more specifically, an open formula) of the language of division theory, (e2) and (e3) are not sentences, but are rather (open) terms. Just as the fundamental predicate in the theory of propositions is "implies," the fundamental predicate in the theory of division is "divides." On the other hand, the theory of division can be enriched to include an additional concept of division, namely, the familiar one from grammar school, which can be written using either of the following pair of expressions:
(e4) x divided by y;
x divided into y.
Whereas "x divides y" is a formula, "x divided by y" and "x divided into y" are terms. Thus, we have both a division relation (a partial order relation), and a division operation. Many concepts are paired in this way. For example, whereas "is a mother" is a predicate (unary relation), "the mother of" is a term operator (unary operation). What about the concept of implication? The concept of implication is expressed in English in two ways-in the formal mode of speech (the metalanguage), and in the material mode of speech (the object language), as illustrated in the following sentences: (1) "Grass is green" implies "grass is colored."
(2) If grass is green, then grass is colored.
In (1), the sentences "grass is green" and "grass is colored" are mentioned, which is to say that the sentences are the topic (subject) of sentence (1). Thus, the overall grammatical form of (1) is [subject-verb-object]. On the other hand, in (2), the sentences "grass is green" and "grass is colored" are not mentioned, but rather used; they are parts of sentence (2), but they are not the topic of (2). The overall grammatical form of (2) is [sentence-connective-sentence].
From the vantage point of the mathematical theory of propositions, whereas the formal version of implication corresponds to the partial order relation ≤ of lattice theory, the material version of implication corresponds to any of several two-place operations on the lattice of propositions, depending on which particular analysis of material implication one opts for. For example, classical truth-functional logic opts for the simplest, and least interesting, analysis of material implication, according to which p → q is identical to ∼p ∨ q. This particular operation has all the properties that one expects of an implication operation, but unfortunately it also has a number of properties that make it unsatisfactory as a representation of material implication. Dissatisfaction with the shortcomings of the classical material implication has led to the investigation of a large variety of alternative material implications, including the strict implications of C. I. Lewis (1918), the relevant implications of Anderson and Belnap (1975), and the counterfactual implications of D. Lewis (1973) and Stalnaker (1968). In this chapter, we have identified the minimum properties of formal implication (it is a pre-ordering), the minimum properties of conjunction and disjunction (they are respectively greatest lower bound and least upper bound with respect to implication), and the minimum properties of negation (it is a generic complementation operation).
With this in mind, we are naturally led to ask the following fundamental question.
(Q) What are the (minimum) properties of a lattice operation, in virtue of which it is deemed an implication operation?
The first requirement, of course, is that the operation in question must be a two-place operation, since the corresponding English connective ("if ... then ...") is a two-place connective, and also since the intended formal counterpart (≤) is a two-place relation. This alone cannot be sufficient, however, unless we are willing to countenance both conjunction and disjunction as legitimate implication operations. So what other requirements need be satisfied? First of all, it seems plausible to require of any candidate operation that it be related to the implication relation ≤ in such a way that if a proposition p implies a proposition q, then the proposition p → q is universally true (true in every world), and conversely. Stated lattice-theoretically, we have the following condition:
(c1) p ≤ q iff p → q = 1.
Notice that (c1) immediately eliminates both conjunction and disjunction as candidates for implicationhood. (This may be verified as an exercise.) On the other hand, (c1) is nonetheless quite liberal. Examples of binary operations satisfying (c1) are easy to
FIG. 3.19. A four-element lattice B4
construct; for example, let L be any bounded lattice with two distinct elements 0 and 1. Define x → y as follows: whenever x ≤ y, set x → y equal to 1; otherwise, set x → y equal to anything other than 1. For a concrete example, consider the lattice B4 in Figure 3.19. The three matrices below define three different implication operations, all satisfying (c1), on B4.
[Three 4 × 4 implication tables on B4 = {0, a, b, 1}, each satisfying (c1); the matrix entries are not recoverable in this copy.]
Exercise 3.16.1 There are 4¹⁶ binary operations on B4. Calculate the number of the operations that satisfy condition (c1).

Needless to say, when we look at larger lattices, the number of distinct operations satisfying (c1) becomes combinatorially staggering. Fortunately, (c1) is not the only condition one might plausibly require an operation to satisfy in order to count as a material implication. The next plausible requirement that comes to mind is the law of modus ponens. In its sentential guise, modus ponens sanctions the inference from the sentences S and if-S-then-T to the sentence T. In its propositional guise, the principle of modus ponens may be stated as follows (see Section 3.18 on filters and ideals):
(c2) p ∧ (p → q) ≤ q.
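The count asked for in Exercise 3.16.1 can be obtained by a direct combinatorial argument, sketched here in Python (not from the text; the counting approach is my own): (c1) forces x → y = 1 exactly on the cells where x ≤ y, and leaves 3 choices (any value other than 1) for each remaining cell.

```python
# B4: 0 < a, b < 1 with a, b incomparable.
elems = ["0", "a", "b", "1"]
up = {"0": set(elems), "a": {"a", "1"}, "b": {"b", "1"}, "1": {"1"}}
leq = lambda x, y: y in up[x]

# (c1) forces x -> y = 1 exactly when x <= y; every other table cell
# may take any of the 3 values different from 1.
forced = sum(1 for x in elems for y in elems if leq(x, y))
free = len(elems) ** 2 - forced
count = 3 ** free
print(forced, free, count)   # 9 forced cells, 7 free cells, 3^7 = 2187
```

So of the 4¹⁶ binary operations on B4, exactly 3⁷ = 2187 satisfy (c1), which illustrates how liberal (c1) is even on a four-element lattice.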
In one concrete representation, propositions are sets of worlds, implication is set inclusion, and meet is set intersection. Accordingly, in this context, (c2) can be rewritten as follows:
(c2*) p ∩ (p → q) ⊆ q.
Now, the latter formula is equivalent set-theoretically to each of the following, where −p is the complement relative to the "set" of all possible worlds:
(c3*) −q ∩ (p → q) ⊆ −p;
(c4*) p ∩ −q ⊆ −(p → q).
Translating these into lattice-theoretic formulas yields the following, which are not lattice-theoretically equivalent to (c2):
(c3) −q ∧ (p → q) ≤ −p;
(c4) p ∧ −q ≤ −(p → q).
Conditions (c1)-(c4) are collectively referred to as the minimal implicative conditions: every implication operation should satisfy all four conditions on any lattice with complementation. With few exceptions, no material implication that has been proposed violates any of these principles. One apparent exception is the system of Fitch (1952), which has no negative implication introduction rule, and so seems to violate (c4). The many-valued logics of Łukasiewicz violate (c2). The question is whether there are any other conditions that are satisfied by every implication operation. Without answering this question definitively, we simply examine a few candidate conditions, and show that each one is rejected by at least one extant material implication, and accordingly cannot be regarded as minimal. Let us start by considering a very powerful principle, the law of importation-exportation, which may be stated lattice-theoretically as follows:
(c5) p ∧ q ≤ r iff p ≤ q → r.
This condition is quite strong. To begin with, it entails both (c1) and (c2). What is perhaps more surprising is the following theorem.

Theorem 3.16.2 Let L be a lattice, and let → be any two-place operation on L satisfying condition (c5). Then L is distributive.

Proof It suffices to show (x ∨ y) ∧ z ≤ (x ∧ z) ∨ (y ∧ z). Let r = (x ∧ z) ∨ (y ∧ z). Now, clearly both x ∧ z ≤ r and y ∧ z ≤ r, so by (c5) x ≤ z → r and y ≤ z → r, so x ∨ y ≤ z → r, so by (c5) (x ∨ y) ∧ z ≤ r. □

In other words, a lattice admits an operation satisfying (c5) only if it is distributive. On the other hand, (c5) cannot be considered a minimal implicative condition, for although it is satisfied by the classical material implication and the material implication of intuitionistic logic, it is not satisfied by the various strict implications of modal logic, nor is it satisfied by the various counterfactual implication connectives. Another candidate condition of implicationhood is the law of transitivity, stated lattice-theoretically as follows:
(c6) (p → q) ∧ (q → r) ≤ p → r.
This amounts to the claim that if (p → q) and (q → r) are both true, then (p → r) must also be true. As plausible as (c6) seems, it has the following immediate consequence, the law of weakening:
(c7) q → r ≤ (p ∧ q) → r.
However, (c7), and hence (c6), cannot be considered as a minimal implicative criterion, since in particular it is not satisfied by counterfactual implications. In order to see this, consider the following argument:
(A1) If I were to drop this glass, then it would break; therefore, if I were to drop this glass, and it were shatterproof, then it would break.
Another condition one might consider is the law of contraposition, which may be stated lattice-theoretically as follows:
(c8) p → q = −q → −p.
This condition is true of the classical material implication, and it is true of the strict implications of modal logic, but it is not true of the material implication of intuitionistic logic. Accordingly, (c8) cannot count as a minimal implicative criterion.

3.17 Non-Classical Implication

A partially ordered groupoid is a structure (S, ≤, ∘), where ≤ is a partial order on S, and ∘ is a binary operation on S that is isotonic in each of its positions. When the partial order is a lattice ordering, we speak of a lattice-ordered groupoid when ∘ distributes over ∨ from both directions (in which case isotonicity becomes redundant). The binary operation →L is a left residual iff it satisfies:
(lr) a ∘ b ≤ c iff a ≤ b →L c.
A right residual satisfies:
(rr) a ∘ b ≤ c iff b ≤ a →R c.
We often follow Pratt (1991) in denoting the right residual by the unsubscripted →, and the left residual by ←, noting that the order of the arguments reverses, so z ← y = y →L z. It is easy to see that left and right residuals are uniquely defined by the above properties.

Definition 3.17.1 A residuated groupoid is a structure (S, ≤, ∘, ←, →) where (S, ≤, ∘) is a partially ordered groupoid and →, ← are, respectively, right and left residuals.

Note a similarity between residuals and implication. Thus, thinking of ∘ as a premise grouping operation (we call it "fusion" following the relevance logic literature; the term originates with R. K. Meyer) and thinking of ≤ as deducibility, the law of the right residual is just the deduction theorem and its converse. One could say the same about left residuals, but notice that right and left residuals differ in whether it is the formula to the left of ∘ or to the right of ∘ that is "exported" from the premise side to the conclusion side. We have the following easily derivable facts:
(1) a ∘ (a → b) ≤ b, (b ← a) ∘ a ≤ b (modus ponens).
(2) a ≤ b → (b ∘ a), a ≤ (a ∘ b) ← b (fusing).
(3) Let Γ(b) be any product a1 ∘ … ∘ b ∘ … ∘ an (parentheses ad lib). Then if a ≤ b and Γ(b) ≤ c, then Γ(a) ≤ c (cut).
It is easy to see that one may replace the law of the right residual equivalently with the first halves of (1) and (2), and similarly with the left residual and the second halves. One may also replace transitivity and isotonicity with (3).
110
ORDER, LATTICES, AND BOOLEAN ALGEBRAS
NON-CLASSICAL IMPLICATION
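The residuation law lends itself to brute-force checking on a small example. The sketch below is our own illustration (the helper names fuse, arrow, and leq are not from the text): in a Boolean lattice one may take fusion to be meet, and then the residual is just classical material implication.

```python
from itertools import chain, combinations

# Four-element Boolean lattice: all subsets of {0, 1}, ordered by inclusion.
U = frozenset({0, 1})
elements = [frozenset(s) for s in
            chain.from_iterable(combinations(sorted(U), r)
                                for r in range(len(U) + 1))]

def fuse(a, b):          # fusion taken to be meet (intersection)
    return a & b

def arrow(a, c):         # the residual: here, material implication (-a or c)
    return (U - a) | c

def leq(a, b):           # the lattice order: set inclusion
    return a <= b

# (rr): a o b <= c  iff  b <= a -> c, for all a, b, c.
rr = all(leq(fuse(a, b), c) == leq(b, arrow(a, c))
         for a in elements for b in elements for c in elements)

# Derived facts (1) modus ponens and (2) fusing.
mp = all(leq(fuse(a, arrow(a, b)), b) for a in elements for b in elements)
fs = all(leq(a, arrow(b, fuse(b, a))) for a in elements for b in elements)

print(rr, mp, fs)   # True True True
```

Because this fusion is commutative, the right and left residuals coincide here; a non-commutative groupoid would separate →R from ←L.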
Isotonicity of ∘ and transitivity yield that residuals are antitonic in their first arguments and isotonic in their second arguments, as is stated in the following.

Fact 3.17.2 If a ≤ b, then b → c ≤ a → c (rule suffixing) and c → a ≤ c → b (rule prefixing), and the same for the left residual ←.

Proof We do only the proofs for the right residual, the others being analogous:

1. a ≤ b (hypothesis)
2. a ∘ (b → c) ≤ b ∘ (b → c) (isotonicity)
3. b ∘ (b → c) ≤ c (modus ponens)
4. a ∘ (b → c) ≤ c (2, 3, transitivity)
5. b → c ≤ a → c (4, right residual);

1. a ≤ b (hypothesis)
2. c ∘ (c → a) ≤ a (modus ponens)
3. c ∘ (c → a) ≤ b (1, 2, transitivity)
4. c → a ≤ c → b (3, right residual). □

It is customary to assume various familiar requirements on ∘, e.g., associativity, commutation, idempotence, which are even likely to be taken for granted when premises are collected together into sets. In fact, these (or, in the absence of associativity, slightly generalized versions) all correspond to various implicational axioms. Thus, consider half of associativity (let us call it "right associativity"⁶):

(ra) a ∘ (b ∘ c) ≤ (a ∘ b) ∘ c.

⁶The reason for the word "right" is to keep mnemonic linkage to the right residual. We would not quarrel with anyone who said that it was more natural to call this "left associativity," but below we will use this name for the dual of the above inequation, to keep linkage with the left residual.

Fact 3.17.3 It yields the following property of the right residual, familiar from implicational logics: a → b ≤ (c → a) → (c → b) (prefixing).

Proof
1. c ∘ (c → a) ≤ a (modus ponens)
2. a ∘ (a → b) ≤ b (modus ponens)
3. [c ∘ (c → a)] ∘ (a → b) ≤ b (1, 2, cut)
4. c ∘ [(c → a) ∘ (a → b)] ≤ b (3, right associativity)
5. (c → a) ∘ (a → b) ≤ c → b (4, right residual)
6. (a → b) ≤ (c → a) → (c → b) (5, right residual). □

Before going on let us note the following obvious consequence of prefixing (apply the law of the right residual twice):

c ∘ [(c → a) ∘ (a → b)] ≤ b (imported prefixing).

Fact 3.17.4 We can also go the other way and derive right associativity from prefixing.

Proof
1. b ≤ a → (a ∘ b) (fusing)
2. c ≤ (a ∘ b) → [(a ∘ b) ∘ c] (fusing)
3. a ∘ (b ∘ c) ≤ a ∘ {[a → (a ∘ b)] ∘ [(a ∘ b) → ((a ∘ b) ∘ c)]} (1, 2, isotonicity)
4. a ∘ {[a → (a ∘ b)] ∘ [(a ∘ b) → ((a ∘ b) ∘ c)]} ≤ (a ∘ b) ∘ c (imported prefixing)
5. a ∘ (b ∘ c) ≤ (a ∘ b) ∘ c (3, 4, transitivity). □

It is easy to argue symmetrically that "left associativity,"

(la) (a ∘ b) ∘ c ≤ a ∘ (b ∘ c),

is equivalent to prefixing for the left residual. Let us next consider the relation between the "sequent form" of permutation,

(pm) a → (b → c) ≤ b → (a → c),

and (contextual) right commutation,

(rc) a ∘ (b ∘ c) ≤ b ∘ (a ∘ c).

Fact 3.17.5 In this context (pm) is equivalent to (rc).

Proof First we show that permutation follows from right commutation:

1. a → (b → c) ≤ a → (b → c) (reflexivity)
2. a ∘ [a → (b → c)] ≤ b → c (1, right residual)
3. b ∘ {a ∘ [a → (b → c)]} ≤ c (2, right residual)
4. a ∘ {b ∘ [a → (b → c)]} ≤ c (3, right commutation)
5. b ∘ [a → (b → c)] ≤ a → c (4, right residual)
6. a → (b → c) ≤ b → (a → c) (5, right residual).

Now going the other way we derive right commutation from permutation:

1. b ∘ (a ∘ c) ≤ b ∘ (a ∘ c) (reflexivity)
2. a ∘ c ≤ b → [b ∘ (a ∘ c)] (1, right residual)
3. c ≤ a → {b → [b ∘ (a ∘ c)]} (2, right residual)
4. c ≤ b → {a → [b ∘ (a ∘ c)]} (3, permutation)
5. b ∘ c ≤ a → [b ∘ (a ∘ c)] (4, right residual)
6. a ∘ (b ∘ c) ≤ b ∘ (a ∘ c) (5, right residual). □

Let us remark that it is clear that in the presence of associativity, right commutation can be replaced with simple commutation, a ∘ b ≤ b ∘ a, and even in the absence of associativity, the "rule form" of permutation, a ≤ b → c implies b ≤ a → c, can be shown equivalent to simple commutation. Also, of course, given a right identity element (a ∘ e = a), the "sequent form" of permutation, (pm), can be shown equivalent to the "theorem form" of permutation,

e ≤ [a → (b → c)] → [b → (a → c)].

It makes ideas simpler to assume the presence of such an identity element (and a left one too), as well as associativity. But we shall not so assume unless we explicitly indicate. Another familiar implicational law is contraction,

a → (a → b) ≤ a → b.

If we had associativity and a right (left) identity, contraction for the right (left) residual would just amount to square-increasingness, a ≤ a ∘ a. But working in their absence, we must consider the more general forms

a ∘ b ≤ a ∘ (a ∘ b) (right square-increasingness),
a ∘ b ≤ (a ∘ b) ∘ b (left square-increasingness).

Fact 3.17.6 Contraction for the right residual is equivalent to right square-increasingness. The corresponding property for the left residual and left square-increasingness follows by a symmetric argument.

Proof Let us first show that contraction follows from right square-increasingness:

1. a ∘ [a → (a → b)] ≤ a → b (modus ponens)
2. a ∘ {a ∘ [a → (a → b)]} ≤ b (1, right residual)
3. a ∘ [a → (a → b)] ≤ a ∘ {a ∘ [a → (a → b)]} (right square-increasingness)
4. a ∘ [a → (a → b)] ≤ b (2, 3, transitivity)
5. a → (a → b) ≤ a → b (4, right residual).

The converse goes as follows:

1. a ∘ (a ∘ b) ≤ a ∘ (a ∘ b) (reflexivity)
2. a ∘ b ≤ a → [a ∘ (a ∘ b)] (1, right residual)
3. b ≤ a → {a → [a ∘ (a ∘ b)]} (2, right residual)
4. a → {a → [a ∘ (a ∘ b)]} ≤ a → [a ∘ (a ∘ b)] (contraction)
5. b ≤ a → [a ∘ (a ∘ b)] (3, 4, transitivity)
6. a ∘ b ≤ a ∘ (a ∘ b) (5, right residual). □

Besides the rules that correspond to thinking of premises as collected together into sets, there is one more rule that is often taken for granted, namely, dilution or thinning (sometimes called "monotonicity"; it is the absence of this rule that delineates so-called "non-monotonic logics"). Dilution is the rule that says it never hurts to add more premises. Algebraically it amounts to saying that a ∘ b ≤ b ("right lower bound"), and corresponds to the positive paradox for the right residual,

a ≤ b → a.

The proof that these are equivalent is an immediate application of (rr), the law of the right residual. We are really examining the case of "thinning on the right." There is also "thinning on the left," which algebraically amounts to saying that b ∘ a ≤ b ("left lower bound"), and which corresponds to the positive paradox for the left residual,

a ≤ a ← b.

The various relationships we have discovered between principles of the right residual and principles for fusion all have their obvious duals for the left residual. Besides residuation giving familiar properties of either left or right arrow, it also gives "almost familiar" properties relating the two, at least if the reader squints so as not to be able to distinguish them (here we will use subscripts). Thus both of the following are easy to derive:

(pa) a ≤ (a →L b) →R b, a ≤ (a →R b) →L b (pseudo-assertion);
(rpp) if a ≤ b →L c, then b ≤ a →R c, and if a ≤ b →R c, then b ≤ a →L c (rule pseudo-permutation).

It is easy to see that pseudo-assertion and pseudo-permutation are equivalent to each other. Thus, for example, the first variety of pseudo-assertion follows easily from the first variety of pseudo-permutation as follows:

1. a →L b ≤ a →L b (reflexivity)
2. a ≤ (a →L b) →R b (1, rule pseudo-permutation).

Of course, when ∘ is commutative, the two arrows are identical and hence we obtain ordinary assertion and rule permutation, familiar from relevance logic (cf. Anderson and Belnap 1975). Indeed, the commutativity of ∘, pseudo-permutation for →R, pseudo-permutation for →L, ordinary assertion, and the rule form of permutation are all equivalent to each other.

By putting various conditions on fusion, we obtain algebras corresponding to various systems of implication in the logical literature (in Figure 3.20 one is supposed to keep every condition below and add the new condition). Thus with associativity alone one obtains the Lambek calculus. If one adds commutativity, one obtains linear logic (Girard 1990). From here one has two natural choices, adding either square-increasingness to obtain relevant implication (Anderson and Belnap 1975), or the postulate that fusion produces lower bounds (a ∘ b ≤ a) to get BCK implication (Ono and Komori 1985). Finally, one obtains the properties of intuitionistic implication by collecting all these properties together. These relationships are summarized in Figure 3.20, where conditions below are always preserved in adding new properties above.

Remark 3.17.7 The Lambek calculus was first formulated as a Gentzen system, and there are two versions depending on whether one allows an empty left-hand side or not. Algebraically this corresponds to whether one assumes the existence of an identity or not. The same point arises with the other systems, and amounts to whether we are interested purely in the implication relations (a ≤ b), or want theorems as well (e ≤ c). If all theorems were of the form a → b, this might well be thought to be a distinction
without a difference, but even when the only operations are ∘ and →, one still gets things like e ≤ (a → a) ∘ (b → b).

FIG. 3.20. Implicational fragments and algebraic properties. (The figure stacks the systems, each keeping all the conditions of those below it: the Lambek calculus (associativity) at the bottom; linear implication (commutation); then relevant implication (square-increasingness) and BCK implication (lower bound) as alternatives; and intuitionistic implication at the top.)

Remark 3.17.8 There is a subtlety implicit in the relationship of the algebraic systems to their parent logics that goes beyond the usual routine of seeing the logics as their "Lindenbaum algebras" (identifying provably equivalent sentences). The problem is that almost all of the logics mentioned above had connectives other than fusion and implication in their original formulations, and some of them did not have fusion as an explicit connective. The Lambek calculus is a notable exception, since it had no other connectives than the two arrows, and although it was formulated as a Gentzen system, there is not too much involved in reading comma as fusion and adding associativity to "defuse" the binary nature of the fusion operation. The story with the other systems is more complicated, and it would take us too far afield here to try to work out in detail that when t is a term containing only →, we can derive e ≤ t in the algebraic system just when the formula t is a theorem in the logical system (for simplicity using the same symbols for sentential variables in the logic as for variables in the algebra). But by browsing through the literature one can find pure implicational fragments of all of the above logics; try Anderson and Belnap (1975) as a start: careful reading will give the clues needed for the other systems. The only question is then how to add fusion conservatively.

3.18 Filters and Ideals

Recall that, given any algebra A, and given any subset S of A, we can form a subalgebra of A based on S, so long as S is closed under the operations of A. Since a lattice may be regarded as an algebra, we may apply this idea to lattices, as follows.

Definition 3.18.1 Let L be any lattice, and let S be any non-empty subset of L. Then S is said to form a sublattice of L if it satisfies the following conditions:

(S1) If a ∈ S and b ∈ S, then a ∧ b ∈ S.
(S2) If a ∈ S and b ∈ S, then a ∨ b ∈ S.

Notice that conditions (S1) and (S2) simply correspond to the closure of S under the formation of meets and joins. Among sublattices in general, two kinds are especially important, called filters and ideals. We begin with the formal definition of filters.

Definition 3.18.2 Let L be any lattice, and let F be any non-empty subset of L. Then F is said to be a filter on L if it satisfies the following conditions:

(F1) If a ∈ F and b ∈ F, then a ∧ b ∈ F.
(F2) If a ∈ F and a ≤ b, then b ∈ F.

In nearly every definition, there is the problem of dealing with degenerate cases, and the definition of filters is no exception. For example, the empty set satisfies (F1) and (F2) vacuously, yet we have (by fiat) excluded it from counting as a filter. Similarly, the whole lattice L satisfies these conditions, so the dual question is whether to count L as a filter. Here, we are somewhat ambivalent; we often want to exclude the whole set L, while at other times it is more convenient to count L as a filter. Our solution is officially to allow L as a filter, and to introduce the further notion of a proper filter, which is defined simply to be any filter distinct from L. On the other hand, a standard "conversational implication" throughout this book will be that by "filter" we mean proper filter. Occasionally we shall backslide on these conventions when it is convenient to do so, but we shall always tell the reader when we are doing so and why.

From the viewpoint of logic, a filter corresponds to a collection of propositions that is closed under implication (F2) and the formation of conjunctions (F1). There are two ways of thinking about this. On the one hand, we may think of a filter as a theory, which may be regarded as a logically closed collection of claims. In particular, if a theory T claims p, and T claims q, then T claims the conjunction of p and q; and if T claims p, and p logically implies q, then T claims q. On the other hand, a filter may be thought of as a (partial) possible world, which may be regarded as a closed collection of propositions (namely, the propositions that obtain in that world). In particular, if a proposition p obtains in world w, and proposition q obtains in w, then the conjunction of p and q obtains in w; and if p obtains in w, and p implies q, then q obtains in w. (See below, however.)
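Conditions (F1) and (F2) can be tested by brute force on a finite lattice. The following sketch is our own illustration (the helper names are hypothetical), using the lattice of divisors of 12 ordered by divisibility, where meet is gcd.

```python
from math import gcd

L = [1, 2, 3, 4, 6, 12]            # divisors of 12, ordered by divisibility

def leq(a, b):                      # a <= b  iff  a divides b
    return b % a == 0

def is_filter(F):
    F = set(F)
    f1 = all(gcd(a, b) in F for a in F for b in F)            # (F1): meets
    f2 = all(b in F for a in F for b in L if leq(a, b))       # (F2): upward
    return bool(F) and f1 and f2

print(is_filter({4, 12}))   # True: the principal filter [4)
print(is_filter({2, 12}))   # False: 2 <= 4 but 4 is missing (violates (F2))
```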
There are several alternative ways of characterizing filters that are helpful. For example, (F2) may be replaced by either of the following conditions (which are equivalent in light of the commutativity of the join operation):

(F2') If a ∈ F, then a ∨ b ∈ F.
(F2+) If a ∈ F, or b ∈ F, then a ∨ b ∈ F.

The interchangeability of (F2) with (F2'), or with (F2+), is a consequence of two simple lattice-theoretic facts: a ≤ a ∨ b (and of course b ≤ a ∨ b); and a ≤ b iff b = a ∨ b. Filters can also be characterized by a single condition, which is a strengthening of (F1):

(F1+) a ∈ F and b ∈ F iff a ∧ b ∈ F.

The equivalence of (F1+) and (F1) & (F2) is based on the following lattice-theoretic facts: a ∧ b ≤ a, b; a ≤ b iff a ∧ b = a. Combining these observations, we obtain a useful (although redundant) characterization of a filter as a set satisfying conditions (F1+) and (F2+), collected as follows:

(F1+) a ∈ F and b ∈ F iff a ∧ b ∈ F.
(F2+) If a ∈ F or b ∈ F, then a ∨ b ∈ F.

From the logical point of view, (F1+) and (F2+) say that a filter corresponds to a set (of propositions) that behaves exactly like a classical truth set (possible world) with respect to conjunction, and halfway like a classical truth set with respect to disjunction. What is missing, which would make a filter exactly like a classical truth set, is the converse of (F2+). Appending the missing half of (F2+) yields the important notion of a prime filter, which is formally defined as follows.

Definition 3.18.3 Let L be any lattice, and let P be any non-empty subset of L. Then P is said to be a prime filter on L if it satisfies the following conditions:

(P1) a ∧ b ∈ P iff both a ∈ P and b ∈ P.
(P2) a ∨ b ∈ P iff either a ∈ P or b ∈ P.
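The two prime-filter conditions are likewise easy to check exhaustively. A sketch of ours on the lattice of divisors of 12 (meet is gcd, join is lcm; the helper names are hypothetical):

```python
from math import gcd

L = [1, 2, 3, 4, 6, 12]                 # divisors of 12 under divisibility

def join(a, b):                          # join = least common multiple
    return a * b // gcd(a, b)

def is_prime_filter(P):
    P = set(P)
    p1 = all((gcd(a, b) in P) == (a in P and b in P) for a in L for b in L)
    p2 = all((join(a, b) in P) == (a in P or b in P) for a in L for b in L)
    return bool(P) and p1 and p2

print(is_prime_filter({4, 12}))   # True
print(is_prime_filter({12}))      # False: 3 v 4 = 12 is in P, yet neither 3 nor 4 is
```

Note that {12} is still a filter (the principal filter [12)); it simply fails (P2).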
Thus, in a prime filter (prime theory, prime world), the conjunction of two propositions is true if and only if both of the propositions are true, and the disjunction of two propositions is true if and only if at least one of the propositions is true.

Having described filters, we now discuss ideals, which are exactly dual to filters. Recall that the dual of a lattice is obtained by taking the converse of the partial order relation. Now, filters on a lattice L correspond exactly to ideals on the dual of L. More formally, we define ideals as follows.

Definition 3.18.4 Let L be any lattice, and let I be any non-empty subset of L. Then I is said to be an ideal on L if it satisfies the following conditions:

(I1) If a ∈ I and b ∈ I, then a ∨ b ∈ I.
(I2) If a ∈ I and a ≥ b, then b ∈ I.

Since ideals are dual to filters, all of the various characterizations of filters can be straightforwardly dualized (switching ∨ and ∧, and ≤ and ≥) to yield corresponding characterizations of ideals. In particular, dualizing the definition of prime filter yields the definition of prime ideal. Similarly, just as a filter can be thought of as a theory (i.e., a logically closed collection of claims), an ideal can be thought of as a counter-theory (i.e., a logically closed collection of disclaimers), and a prime ideal can be thought of as a "false ideal." Not only are prime filters and prime ideals dual concepts, they are also complementary, in the sense that the set-theoretic complement of any prime filter is a prime ideal, and vice versa. (The reader may wish to verify this as an exercise.)

For historical as well as structural reasons, algebraists have favored ideals, whereas logicians have found filters more congenial. However, lattices are self-dual, so there can be no ultimate reason to prefer filters to ideals, or ideals to filters; after all, a filter on a lattice is just an ideal on the dual lattice, and vice versa. To emphasize this, some authors refer to filters as "dual ideals." But we are writing as logicians, and have a natural preference for truth over falsity; accordingly, we concentrate our attention on filters throughout this book, although we make occasional use of ideals (e.g., see below). Given our bias, and given the duality of filters and ideals, we shall not usually provide separate definitions of properties for ideals, being content to define a property for filters and letting the reader dualize as needed.

Next, we note a very important property of filters.

Theorem 3.18.5 Let L be a lattice, and let K be any non-empty collection of filters on L. Then ∩K is also a filter on L.

Corollary 3.18.6 Let L be a lattice, and let S be any subset of L. Then there is a filter P on L satisfying the following:

(s1) S ⊆ P.
(s2) For any filter F on L, if S ⊆ F, then P ⊆ F.

In other words, for any lattice L and for any subset S of L, there is a smallest filter on L that includes S. In particular, the smallest filter on L that includes S is the intersection of the set {F : F is a filter on L, and S ⊆ F}. (This is left as an exercise.) This justifies the following definitions.

Definition 3.18.7 Let L be a lattice, and let S be any subset of L. Then the filter generated by S, denoted [S), is defined to be the smallest filter on L that includes S.

Definition 3.18.8 Let L be a lattice, and let a be any element of L. Then the principal filter generated by a, denoted [a), is defined to be the smallest filter on L containing a; i.e., [a) = [{a}).

Definition 3.18.9 Let L be any lattice, and let P be any non-empty subset of L. Then P is said to be a principal filter on L if P = [a) for some a in L.

Given the above definitions, the following theorems may be verified.

Theorem 3.18.10 Let L be a lattice, and let X be any subset of L. Then

[X) = {x ∈ L : for some a₁, …, aₙ in X, a₁ ∧ … ∧ aₙ ≤ x}.
Proof We first observe that F = {y : ∃x₁, …, xₙ ∈ X such that x₁ ∧ … ∧ xₙ ≤ y} is a filter. Suppose x₁, …, xₙ ∈ X and x₁ ∧ … ∧ xₙ ≤ a, and also suppose y₁, …, yₘ ∈ X and y₁ ∧ … ∧ yₘ ≤ b. Then x₁, …, xₙ, y₁, …, yₘ ∈ X and x₁ ∧ … ∧ xₙ ∧ y₁ ∧ … ∧ yₘ ≤ a ∧ b. Next suppose x₁, …, xₙ ∈ X and x₁ ∧ … ∧ xₙ ≤ a and a ≤ b. Then x₁ ∧ … ∧ xₙ ≤ b. We next suppose that G is a filter such that X ⊆ G. Clearly F ⊆ G, for y ∈ F implies ∃x₁, …, xₙ ∈ X such that x₁ ∧ … ∧ xₙ ≤ y. But since X ⊆ G, x₁, …, xₙ ∈ G, and since G is a filter, x₁ ∧ … ∧ xₙ ∈ G, and again since G is a filter (and x₁ ∧ … ∧ xₙ ≤ y), y ∈ G. So since F ⊆ G for any filter G such that G ⊇ X, F ⊆ ∩{G : G is a filter and G ⊇ X}. And since we showed above that F is itself a filter, and since obviously F ⊇ X, clearly [X) ⊆ F, and hence F = [X). □

Theorem 3.18.11 Let L be a lattice, and let a be any element of L. Then [a) = {x ∈ L : a ≤ x}.

Proof Since [a) = {y : ∃x₁, …, xₙ ∈ {a}, x₁ ∧ … ∧ xₙ ≤ y}, and since a = a ∧ a = a ∧ a ∧ a, etc., [a) = {y : a ≤ y}. □

Theorem 3.18.12 If G and H are filters, then [G ∪ H) = {z : ∃x ∈ G, ∃y ∈ H such that x ∧ y ≤ z}. And if G is a filter, [G ∪ {a}) = [G ∪ [a)) = {z : ∃x ∈ G such that x ∧ a ≤ z}. [G ∪ {a}) is often denoted by [G, a).

Proof By Theorem 3.18.10, z ∈ [G ∪ H) iff ∃z₁, …, zₙ ∈ G ∪ H such that z₁ ∧ … ∧ zₙ ≤ z. But this is true iff either

(i) ∃x₁, …, xᵢ ∈ G, ∃y₁, …, yₖ ∈ H such that i + k = n and x₁ ∧ … ∧ xᵢ ∧ y₁ ∧ … ∧ yₖ ≤ z, or
(ii) ∃x₁, …, xₙ ∈ G such that x₁ ∧ … ∧ xₙ ≤ z, or
(iii) ∃y₁, …, yₙ ∈ H such that y₁ ∧ … ∧ yₙ ≤ z.

But (i) is true iff ∃x ∈ G (namely, x₁ ∧ … ∧ xᵢ), ∃y ∈ H (namely, y₁ ∧ … ∧ yₖ) such that x ∧ y ≤ z (since both G and H are filters). And (ii) is true iff ∃x ∈ G (namely, x₁ ∧ … ∧ xₙ), ∃y ∈ H (any y ∈ H) such that x ∧ y ≤ z (since G is a filter). And similarly, (iii) is true iff ∃x ∈ G (any x ∈ G), ∃y ∈ H (namely, y₁ ∧ … ∧ yₙ) such that x ∧ y ≤ z.

As for the second part, if G is a filter, clearly [G ∪ {a}) = [G ∪ [a)), since any x ≥ a must be in [G ∪ {a}). We now show that [G ∪ [a)) = {z : ∃x ∈ G such that x ∧ a ≤ z}. By the above, [G ∪ [a)) = {z : ∃x ∈ G, ∃y ≥ a such that x ∧ y ≤ z}. Clearly then {z : ∃x ∈ G such that x ∧ a ≤ z} ⊆ [G ∪ [a)). Suppose conversely that ∃x ∈ G, ∃y ≥ a such that x ∧ y ≤ z. Then x ∧ a ≤ z. □

The notions of the ideal generated by a subset A, and of a principal ideal, are defined in a dual manner, which is left as an exercise. There are two stronger notions, complete filter and completely prime filter, which are especially appropriate to complete lattices, but which are also useful occasionally in more general lattices. These are defined as follows.

Definition 3.18.13 Let L be any lattice, and let F be any non-empty subset of L. Then F is said to be a complete filter on L if it satisfies the following conditions:

(CF1) If A ⊆ F, and inf(A) exists, then inf(A) ∈ F.
(CF2) If a ∈ F and a ≤ b, then b ∈ F.

Definition 3.18.14 Let L be any lattice, and let P be any complete filter on L. Then P is said to be a completely prime filter on L if it satisfies the following additional condition:

(CP) For A ⊆ L, if sup(A) exists, and sup(A) ∈ P, then a ∈ P for some a ∈ A.

Whereas an ordinary filter is closed under binary (and hence finite) conjunction, a complete filter is closed under arbitrary (and hence infinite) conjunction. The following theorems connect these ideas to earlier ones.
Theorem 3.18.15 Every principal filter on a lattice is complete, and conversely, every complete filter on a lattice is principal.

Theorem 3.18.16 Not every principal filter is completely prime.

In examining the former, notice that in a complete filter F, the infimum of every subset of F must be an element of F, so in particular inf(F) must be in F; but if inf(F) ∈ F, then F has a least element, viz., inf(F). In examining the latter, consider the lattice of all rational numbers, ordered in the usual way, and consider the set P = {r : 0 ≤ r} of all non-negative rationals, which is clearly a principal filter. Now, although P contains the supremum of the set N of negative rationals (i.e., 0), it contains no negative rational, and accordingly is not completely prime.

In addition to filters and ideals, we have occasional use for a pair of weaker notions, especially in connection with general partially ordered sets: the notions of positive cone and negative cone, which are dual to each other. The former notion is defined as follows.

Definition 3.18.17 Let P be a partially ordered set, and let C be any subset of P. Then C is said to be a positive cone on P if it satisfies the following:

(PC) If x ∈ C, and x ≤ y, then y ∈ C.

Notice that (PC) is just (F2), from the definition of a filter. Next, we note that the intersection of any collection of positive cones on a poset P is itself a positive cone on P; this fact justifies the following definition.

Definition 3.18.18 Let P be a partially ordered set, and let S be any subset of P. Then the positive cone generated by S, denoted [S), is defined to be the smallest positive cone on P that includes S.

Definition 3.18.19 Let P be a partially ordered set, and let a be any element of P. Then the principal positive cone generated by a, denoted [a), is defined to be the smallest positive cone on P that contains a; i.e., [a) = [{a}). A set S is called a principal positive cone on P if S = [a) for some a in P.
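That the positive cone generated by a set need not be a filter can be seen computationally: upward closure alone does not secure closure under meets. A sketch of ours on the divisor lattice of 12 (the helper names cone and filt are hypothetical):

```python
from math import gcd

L = [1, 2, 3, 4, 6, 12]                 # divisors of 12 under divisibility

def leq(a, b):
    return b % a == 0

def cone(S):
    # positive cone generated by S: just the upward closure of S
    return {x for x in L if any(leq(s, x) for s in S)}

def filt(S):
    # filter generated by S: close S under meets (gcd), then go upward
    meets = set(S)
    while True:
        new = {gcd(a, b) for a in meets for b in meets} - meets
        if not new:
            break
        meets |= new
    return {x for x in L if any(leq(m, x) for m in meets)}

print(sorted(cone({4, 6})))   # [4, 6, 12]
print(sorted(filt({4, 6})))   # [2, 4, 6, 12]; the meet gcd(4, 6) = 2 is forced in
```

For a singleton the two constructions agree, which is the harmlessness of the "[a)" ambiguity noted in the text.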
Remark 3.18.20 Strictly speaking, the [S), [a) notation is ambiguous between the filters and the positive cones generated by S and a respectively. It turns out, though, that the principal filter generated by a is always the set {x : a ≤ x}, i.e., the positive cone generated by a, so the ambiguity of "[a)" is harmless. This is not so with "[S)", since the positive cone generated by S need not be closed under meet, whereas of course the filter must be.

Theorem 3.18.21 Let P be a partially ordered set, and let S be any subset of P. Then the following obtain:

(1) [S) = {x ∈ P : for some s ∈ S, s ≤ x};
(2) [a) = {x ∈ P : a ≤ x}.

The notions of a negative cone, the negative cone generated by A (denoted (A]), and the principal negative cone generated by a, denoted (a], are defined dually. The following theorems are relevant to the historical origin of the term "ideal" in lattice theory.

Theorem 3.18.22 Let L be a linearly ordered set, and let C be a positive cone on L. Then C is in fact a filter on L.

Theorem 3.18.23 Let L be a linearly ordered set, and let C be a negative cone on L. Then C is in fact an ideal on L.

Recall Dedekind's (1872) construction of the real numbers from the rationals using "cuts." Now, a Dedekind lower cut is simply a lower cone (and hence ideal) on the lattice of rational numbers; dually, a Dedekind upper cut is simply a positive cone (and hence filter). What Dedekind did was to identify real numbers with cuts (ideals) in such a way that rational numbers are identified with principal ideals, and irrational numbers are identified with ideals that are not principal. One way of looking at Dedekind's construction is that the rationals are completed by adding certain "ideal" objects which can only be approximated by the rationals, but are otherwise not really there; hence the expression "ideal" in reference to these set-theoretic constructions, which Dedekind used to make sense of Kummer's concept of "ideal number," which had arisen in connection with certain rings of numbers (the algebraic integers). The terminology was carried over, as a special case, to Boolean lattices (which may be viewed as special kinds of rings) and subsequently generalized to lattices as a whole.

The Dedekind construction of the reals from the rationals may be viewed as embedding the (non-complete) lattice of rationals into the complete lattice of ideals. This is actually a special case of two more general theorems, stated as follows, but not proved until Chapter 8.

Theorem 3.18.24 Every partially ordered set P can be embedded into the partially ordered set of negative cones on P, where the partial order relation is set inclusion.

Theorem 3.18.25 Every lattice L can be embedded into the lattice of ideals on L, where the partial order relation is set inclusion.
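The embedding of Theorem 3.18.25 can be sampled on a finite lattice: send each element a to its principal ideal (a]. The sketch below is our own illustration on the divisor lattice of 12 (helper names hypothetical); it checks that the map both preserves and reflects the order.

```python
L = [1, 2, 3, 4, 6, 12]                 # divisors of 12 under divisibility

def leq(a, b):                           # a <= b  iff  a divides b
    return b % a == 0

def down(a):
    # the principal ideal (a]: everything below a
    return frozenset(x for x in L if leq(x, a))

# a -> (a] is an order embedding: a <= b  iff  (a] is a subset of (b]
embedding = all(leq(a, b) == (down(a) <= down(b)) for a in L for b in L)
print(embedding)   # True
```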
In addition to prime filters, already defined, there is another special, and important, kind of filter, defined in what follows. Recall that, in a partially ordered set P, a maximal element of a subset S of P is any element m satisfying the following conditions:

(m1) m ∈ S.
(m2) For all x ∈ S, m ≤ x only if m = x.

In other words, a maximal element of S is any element of S that is not below any other element of S. Now, the collection F of filters of a given lattice L forms a partially ordered set, where inclusion is the partial order relation, and the collection P of proper filters of L is a subset of F. We can accordingly talk of maximal elements of P relative to this partial ordering. This yields the notion of a maximal filter, which is formally defined as follows.

Definition 3.18.26 Let L be a lattice, and let F be a subset of L. Then F is said to be a maximal filter on L if the following conditions are met:

(m1) F is a proper filter on L.
(m2) For any proper filter F′ on L, F ⊆ F′ only if F = F′.

In other words, a maximal filter on L is any proper filter on L that is not included in any other proper filter on L. Note that the qualification "proper" is crucial, since every filter on L is included in the non-proper filter L. Logically interpreted, a maximal filter corresponds to a maximal theory, or a maximal possible world. A maximal theory is one that claims as much as it can claim, short of claiming everything (which would be inconsistent). Similarly, a maximal world is one that cannot be enriched without producing the "absurd world" (i.e., one in which every proposition is true). The first theorem concerning maximal filters is given as follows.

Theorem 3.18.27 Let L be a lattice, and let F be any filter on L. Then there is a maximal filter F⁺ on L such that F ⊆ F⁺.

In other words, every (proper) filter on a lattice is included in a maximal filter. The proof of this very important theorem is postponed until Chapter 13. Before continuing, we note that another popular term appearing in the literature for maximal filter is "ultrafilter." We, however, stick to the less flashy name "maximal filter." The following very important theorems state the relation between maximal filters and prime filters in the special case of distributive lattices.

Theorem 3.18.28 In a distributive lattice, every maximal filter is prime, although not every prime filter is maximal.

Proof Consider a maximal filter F and suppose that a ∨ b ∈ F, but that neither a ∈ F nor b ∈ F. Then consider [F ∪ {a}) and [F ∪ {b}). Both of these must equal the whole lattice L, i.e., for any x ∈ L, there is some element f₁ ∈ F such that f₁ ∧ a ≤ x and there
ORDER, LATTICES, AND BOOLEAN ALGEBRAS
122
r
FILTERS AND IDEALS
123
Theorem 3.18.32 In a Boolean lattice, a filter is maximal if and only if it is complete (and proper), and it is maximal if and only if it is prime (and proper). We have already shown that, in any distributive lattice, every maximal filter is prime. So all we need to show is that, in a Boolean lattice, (1) every prime filter is complete, and (2) every complete filter is maximal.
o FIG. 3.21. Lattice with a non-maximal prime filter is some element 12 E F such that 12 1\ b::; x. But then (using (FI)) f = (fI 1\ h) E F, and clearly f 1\ (a V b) = (f 1\ a) V (f 1\ b) ::; x. But since both f and a V b are in F, then (again using (FI)) f 1\ (a Vb) E F, and so (by (F2)) for arbitrary x, x E F, contradicting our assumption that F was proper. In order to see that not every prime filter is maximal, consider the lattice in Figure 3.21. This lattice is distributive, and whereas {l} is a prime filter, it is not maximal, since it is included in {a, I}, which is a proper filter distinct from {I}. 0 Although prime filters and maximal filters do not coincide in the general class of distributive lattices, they do coincide in the special subclass of complemented distributive lattices, i.e., Boolean lattices. Before showing this, we define some notions appropriate to the more general category of lattices with complementation operations (recall Sections 3.12 and 3.13). Definition 3.18.29 Let L be a lattice, let x f-+ -x be any complementation operation on L, and let F be any filter on L. Then F is said to be consistent (with respect to x f-+ -x) if the following condition obtains: (c) If x
E
F, then -xli F.
Definition 3.18.30 Let L be a lattice, let x f-+ -x be any complementation operation on L, and let F be any filter on L. Then F is said to be complete (with respect to x f-+ -x) if the following condition obtains: (c) If x Ii F, then -x E F. Logically interpreted, consistency says that if a proposition p is true, its negation -p is not true, whereas completeness says that if p is not true, then -pis true. Having presented the general notion, we concentrate on Boolean lattices. We begin with the following theorem. Theorem 3.18.31 In a Boolean lattice, every proper filter is consistent, and every consistent filter is proper. Proof There is only one improper filter on L, namely, L itself; so if a filter is improper, it contains every element, and hence is inconsistent. Going the other direction simply uses the fact that a 1\ -a = 0, so if a E F and -a E F, then a 1\ -a E F, so 0 E F, but o ::; a, for every a, so a E F for every a. 0
Proof (l) Let F be a prime filter of a Boolean algebra. Since F is non-empty, there is some x E F. But x::; 1 = a V -a. So by (F2), a V -a E F, and so by primeness, a E F or -a E F, for an arbitrary element a picked as you like. So F is complement complete. (2) Suppose that F is complete. If F is not maximal then there must exist some other proper filter G that properly includes F, and then there must be some element a E G such that a Ii F. But since F is complement complete, it must then be the case that -a E F ~ G, i.e., both a, -a E G. But then G is inconsistent, and hence improper, contrary to our earlier assumption. 0
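The prime/maximal distinction can be checked mechanically on small lattices. The sketch below (all names are ours, not the text's) enumerates proper filters on the three-element chain of Figure 3.21, where the top filter is prime but not maximal, and on a four-element Boolean lattice, where primeness and maximality coincide, as Theorem 3.18.32 asserts.

```python
from itertools import combinations

def proper_filters(elems, meet, leq):
    """All proper filters: non-empty, upward-closed (F2), meet-closed (F1)."""
    out = []
    for r in range(1, len(elems)):          # proper: never the whole lattice
        for cand in combinations(elems, r):
            F = set(cand)
            up = all(y in F for x in F for y in elems if leq(x, y))
            mt = all(meet(x, y) in F for x in F for y in F)
            if up and mt:
                out.append(frozenset(F))
    return out

def is_prime(F, elems, join):
    # a ∨ b ∈ F implies a ∈ F or b ∈ F
    return all(x in F or y in F
               for x in elems for y in elems if join(x, y) in F)

def is_maximal(F, filters):
    return not any(F < G for G in filters)

# Three-element chain 0 < a < 1 (Figure 3.21), encoded as 0 < 1 < 2.
chain = [0, 1, 2]
cf = proper_filters(chain, min, lambda x, y: x <= y)
top = frozenset({2})                         # the filter {1} of the figure
assert is_prime(top, chain, max)             # prime ...
assert not is_maximal(top, cf)               # ... but not maximal

# Four-element Boolean lattice: subsets of {1, 2} under inclusion.
elems = [frozenset(s) for s in ([], [1], [2], [1, 2])]
bf = proper_filters(elems, lambda x, y: x & y, lambda x, y: x <= y)
for F in bf:
    # In a Boolean lattice, prime and maximal coincide (Theorem 3.18.32).
    assert is_prime(F, elems, lambda x, y: x | y) == is_maximal(F, bf)
```

The chain's only proper filters are {1} and {a, 1}, which is why {1} fails to be maximal; in the Boolean case the two maximal filters are exactly the two prime ones.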
By the work of the exercise above, and by the fact that maximal filters (ideals) coincide with prime filters (ideals) in Boolean algebras, we know that in a Boolean algebra the set-theoretical complement of a maximal filter is a maximal ideal. But this is not always the case for an arbitrary lattice. (Look again at the three-element lattice above.) Having discussed (maximal) filters and ideals separately, we conclude this section by mentioning what we think is the more fundamental notion, namely, the idea of a filter-ideal pair. We introduce two notions: that of a maximal filter-ideal pair and that of a principal filter-ideal pair.

Definition 3.18.33 Let L be a lattice, and let F and I be subsets of L. Then the ordered pair (F, I) is said to be a filter-ideal pair on L if F is a filter, and I is an ideal, on L.

Definition 3.18.34 Let L be a lattice, and let (F, I) be a filter-ideal pair on L. Then (F, I) is said to be disjoint if F ∩ I = ∅, overlapping if F ∩ I ≠ ∅, and exhaustive if F ∪ I = L.

Definition 3.18.35 Let L be a lattice, and let P be a collection of filter-ideal pairs on L. Define a binary relation ≤ on P so that (F, I) ≤ (G, J) iff F ⊆ G and I ⊆ J.

Fact 3.18.36 The relation ≤ defined above is a partial order relation on P.

Definition 3.18.37 A filter-ideal pair on L is said to be a maximal filter-ideal pair if it is a maximal element of P₁ with respect to the ordering ≤, where P₁ is the collection of all disjoint filter-ideal pairs.

Definition 3.18.38 A filter-ideal pair on L is said to be a principal filter-ideal pair if it is a minimal element of P₂ with respect to the ordering ≤, where P₂ is the collection of all overlapping filter-ideal pairs.

In other words, a maximal filter-ideal pair is a disjoint filter-ideal pair that does not bear the relation ≤ to any other disjoint filter-ideal pair; a principal filter-ideal pair is an
overlapping filter-ideal pair that does not bear the relation ≤ to any other overlapping filter-ideal pair. We shall show in Chapter 13 (see Lemma 13.4.4) that every disjoint filter-ideal pair can be extended to a maximal filter-ideal pair. We also show in Section 8.13 that every overlapping filter-ideal pair can be shrunk to a principal filter-ideal pair.

The notion of filter-ideal pair puts the concepts of truth and falsity on equal terms. In particular, a filter-ideal pair corresponds to a theory, not merely in the sense of a collection of claims, but more specifically in the sense of a collection of claims together with a corresponding collection of disclaimers. Thus, under this construal, every theory claims certain propositions, denies others, and is indifferent with regard to still others. Notice carefully the difference between being a disclaimer and failing to be a claim: with respect to certain propositions, a given theory may simply have nothing to say. For example, a theory of celestial mechanics may say nothing about what wines are good with lobster.

We conclude this section by noting that, in the special case of distributive lattices, the notion of maximal filter-ideal pair reduces to the earlier concepts of maximal filter (ideal) and prime filter (ideal). We cannot show a similar result for principal filter-ideal pairs; however, as will become clear in Chapter 13, they have other nice properties which render them interesting in the context of representation.
Theorem 3.18.39 Let L be a distributive lattice, and let (F, I) be a maximal filter-ideal pair on L. Then F is a prime filter on L, and I is a prime ideal on L.

Corollary 3.18.40 Let L be a Boolean lattice, and let (F, I) be a maximal filter-ideal pair on L. Then F is a maximal filter on L, and I is a maximal ideal on L.

The corollary uses the fact that, in a Boolean lattice, maximal filters (ideals) are prime, and conversely. The proof of the theorem is left as an exercise.
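Definitions 3.18.33–3.18.38 can likewise be explored by brute force. The sketch below (our own illustrative code) enumerates the disjoint filter-ideal pairs on a four-element Boolean lattice, picks out the maximal ones under the componentwise ordering of Definition 3.18.35, and checks that each maximal pair is exhaustive with the ideal being the set-theoretic complement of the filter, in line with Corollary 3.18.40.

```python
from itertools import combinations

# Subsets of {1, 2} ordered by inclusion: a four-element Boolean lattice.
elems = [frozenset(s) for s in ([], [1], [2], [1, 2])]

def closed_sets(op, cmp):
    """Non-empty subsets closed under op (meet/join) and cmp (up/down)."""
    out = []
    for r in range(1, len(elems) + 1):
        for cand in combinations(elems, r):
            S = set(cand)
            if all(y in S for x in S for y in elems if cmp(x, y)) and \
               all(op(x, y) in S for x in S for y in S):
                out.append(frozenset(S))
    return out

filters = closed_sets(lambda x, y: x & y, lambda x, y: x <= y)  # upward-closed
ideals  = closed_sets(lambda x, y: x | y, lambda x, y: y <= x)  # downward-closed

disjoint = [(F, I) for F in filters for I in ideals if not (F & I)]
# (F, I) <= (G, J) iff F ⊆ G and I ⊆ J (Definition 3.18.35).
maximal = [(F, I) for F, I in disjoint
           if not any(F <= G and I <= J and (F, I) != (G, J)
                      for G, J in disjoint)]

for F, I in maximal:
    # Each maximal disjoint pair exhausts L, with I the complement of F.
    assert F | I == set(elems) and not (F & I)
```

On this lattice the two maximal pairs are exactly ({{1}, ⊤}, {∅, {2}}) and its mirror image, matching the two maximal (= prime) filters found earlier.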
4
SYNTAX

4.1 Introduction
It is customary to think of sentences concretely as utterances stretched out linearly in time, or, even more commonly, as inscriptions stretched out linearly in space, but this very sentence is a counter-example to such over-simplicity (because of the need for line breaks). Such examples (and even the previous sentence, when intuitions are sufficiently trained) lend themselves nicely to the construction in most elementary logic texts of sentences as strings of symbols, where, when push comes to shove, these are given the standard set-theoretical rendering as finite sequences. But there is no reason to think that sequences are the most felicitous choice of "data structure" in which to code hieroglyphs or ideograms of various types. It could be that the placement of a pictorial element over or under, to the left or the right of another, might have linguistic significance. Nonetheless, there seems nothing wrong with thinking that the pictographic elements of a language are irrelevant from some suitably cold intellectual point of view, and we shall, for the time being, adopt the useful fiction of the logic texts that a sentence is indeed a string of symbols, understood in the standard set-theoretical way as a finite sequence, i.e., a function from some proper initial segment of the natural numbers. For ease of exposition we shall not countenance the null string ⟨⟩ (the function defined on the empty set), but we shall eventually get around to discussing it in an exercise.
4.2 The Algebra of Strings

Let us call any finite, non-null sequence of symbols chosen from some given set A a string (in A), and let us call A an alphabet and the members of A symbols. Many authors talk of "expressions" instead of strings, but this neologism leads to the eventual need to distinguish those "expressions" which are well-formed (i.e., grammatical) from those that are not, with the resultant barbarism "well-formed expression." We denote the set of all such sequences as S. There is a natural operation on finite sequences, namely juxtaposition:

⟨s₀, …, sₘ⟩ ⌢ ⟨t₀, …, tₙ⟩ = ⟨s₀, …, sₘ, t₀, …, tₙ⟩.

Juxtaposition can be pictured as joining two strings side by side, and is a natural operation on S that allows us to regard it as an algebra. Thus the algebra of strings in the alphabet A is the structure S = (S, ⌢). It is easy to see that it is generated from the singletons of its alphabet, and that it has the following property (associativity):

x ⌢ (y ⌢ z) = (x ⌢ y) ⌢ z.
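The algebra of strings just described can be sketched directly with Python tuples standing in for finite sequences (an illustrative model, not the text's own notation): juxtaposition is tuple concatenation, the singletons are the atomic generators, and associativity and pseudo-trichotomy can be observed on concrete cases.

```python
# Strings over an alphabet modeled as non-empty tuples; juxtaposition is
# tuple concatenation.
def juxt(s, t):
    return s + t

s, t, u = ('a', 'b'), ('c',), ('a', 'c')

# The semi-group law: juxtaposition is associative.
assert juxt(juxt(s, t), u) == juxt(s, juxt(t, u))

# Atomic generation: the singletons ('a',), ('b',), ... generate every
# string, and no singleton is the juxtaposition of two (non-null) strings.
assert juxt(('a',), juxt(('b',), ('c',))) == ('a', 'b', 'c')

# Pseudo-trichotomy: if x + y == a + b, then x == a, or x is a proper
# initial segment of a, or a is a proper initial segment of x.
x, y = ('a',), ('b', 'c')
a, b = ('a', 'b'), ('c',)
assert juxt(x, y) == juxt(a, b) and a[:len(x)] == x   # here x < a
```

Nothing here proves the general laws, of course; it merely exhibits the structure that Exercise 4.2.1 and the properties below describe abstractly.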
An algebra satisfying this property is called a semi-group. It turns out that in a certain sense this by itself captures all the typical properties of an algebra of strings. Thus we have the following.

Exercise 4.2.1 Prove that all algebras of strings are associative.

Theorem 4.2.2 Up to isomorphism, free semi-groups and algebras of strings are the same.

We shall prove this result in two halves (Subtheorems 4.2.3 and 4.2.7).

Subtheorem 4.2.3 Every algebra of strings is a free semi-group.

It would be possible to prove this directly. Thus if f is a mapping of the set A of symbols into a semi-group S = (S, +), one can define h(⟨s₀, …, sₖ⟩) = f(s₀) + ⋯ + f(sₖ), and it is easy to see that h is then a homomorphism. However, we shall proceed somewhat more abstractly, collecting some needed properties for the antecedent of a lemma, because these properties are interesting in their own right.

(1) (Pseudo-trichotomy.) Define x < a to mean that ∃m(x ⌢ m = a). Then if x ⌢ y = a ⌢ b, either
(i) x = a and y = b, or
(ii) x < a, or
(iii) a < x.

(2) (Atomic generation.) For every algebra of strings there exists a class G of atomic generators, i.e., no element a in G is of the form x ⌢ y.

Note that the positive integers have these properties, with + as ⌢ and G = {1}. Indeed the integers satisfy the stronger law of trichotomy (x = a or x < a or a < x), which helps motivate our choice of name above. Thus it will turn out that the positive integers form the free semi-group with one free generator, S(1). But more important for our purposes is that every algebra of strings has properties (1) and (2). We leave it for the reader to prove this in the following exercise.

Exercise 4.2.4 Show that an algebra of strings S in an alphabet A is atomically generated (with the singletons of the elements of A as the generators), and that it satisfies pseudo-trichotomy.

Before stating our lemma, we shall state and prove the following sublemma, which deals with semi-groups that are not necessarily algebras of strings.

Sublemma 4.2.5 Let S = (S, ⌢) be a semi-group satisfying properties (1) and (2) above. Then it also satisfies

(3) (Left-Cancellation.) If x ⌢ y = x ⌢ b, then y = b.

Proof The proof is by induction on generators. The base case is when x is a generator. Plugging the antecedent of (3) into pseudo-trichotomy, we have either y = b (as desired), or else x < x, i.e., that x = x ⌢ m for some m ∈ S, which violates the atomicity of x. For the inductive step, we assume that for x = x₁ ⌢ x₂, x₁ and x₂ each satisfy left-cancellation (no matter what the right-hand term is). Then, assuming the hypothesis of (3),

(x₁ ⌢ x₂) ⌢ y = (x₁ ⌢ x₂) ⌢ b,

and by associativity we may regroup so as to obtain

x₁ ⌢ (x₂ ⌢ y) = x₁ ⌢ (x₂ ⌢ b).

We may now use left-cancellation, first for x₁ and then for x₂, so as to obtain y = b, as desired. □

We are now in a position to deal with the lemma that will give us Subtheorem 4.2.3.

Lemma 4.2.6 Let S = (S, ⌢) be an atomically generated semi-group satisfying pseudo-trichotomy. Then S is a free semi-group.

Proof Let G be the set of atomic generators, and let f be any mapping of these into the carrier set of some given semi-group with + as its operation. Define h inductively so that

(1) for s ∈ G, h(s) = f(s), and
(2) h(x ⌢ y) = h(x) + h(y).

The only way that this definition could go wrong would be if the above clauses somehow conflicted either with each other, or with themselves, so as to assign different values to some given element. The first kind of conflict is clearly impossible, for no atom s can be of the form x ⌢ y. The second kind of conflict is clearly impossible in the case of clause (1) (since f is a function, and hence single-valued), and associativity will come into play in showing that it is also impossible in the case of clause (2).

In somewhat more detail, the proof will proceed by induction on generators, showing that h is "well-defined" (gives a single value when computed according to clauses (1) and (2)). As we said above, clause (1) clearly determines a unique value for h on the generators. For the sake of having a sufficiently strong inductive hypothesis, we shall prove not merely that h is well-defined on each element e, but also that h is well-defined on all "substrings," i.e., on all elements x, y such that e = x ⌢ y.

Thus suppose that we have a string x ⌢ y = a ⌢ b. We shall show that h must assign the left-hand side the same value that it assigns the right by way of the calculations of clause (2). We know from pseudo-trichotomy that unless x = a and y = b (in which case, invoking the inductive hypothesis, we are clearly OK), then either x < a or a < x. The two cases being symmetric, we shall treat only the first case. If x is "a proper initial segment" of a, this means that a = x ⌢ m (for some "middle chunk" m), and so

(3) a ⌢ b = (x ⌢ m) ⌢ b.
But then by the associativity of ⌢, it may be seen that

(4) x ⌢ y = x ⌢ (m ⌢ b).

Since by inductive hypothesis we may assume that h is well-defined on "substrings" of a and b, we have by way of the computations of clause (2) that

(5) h(a ⌢ b) = (hx + hm) + hb.

But from (4), using left-cancellation (guaranteed to us by the sublemma), we have that

(6) y = m ⌢ b,

i.e., that m and b are "substrings" of y. This means that again we are justified in applying the computations of clause (2) to obtain

(7) h(x ⌢ y) = hx + (hm + hb).

But then associativity of the semi-group operation + gives us the desired

(8) h(x ⌢ y) = h(a ⌢ b). □

Subtheorem 4.2.3, of course, follows from this lemma and Exercises 4.2.1 and 4.2.4. We still have to prove the other half of Theorem 4.2.2. We do this by proving Subtheorem 4.2.7, the converse of Subtheorem 4.2.3.

Subtheorem 4.2.7 Let S be a free semi-group. Then S is isomorphic to an algebra of strings.

Proof Let us assume that S = (S, +) is a free semi-group with free generators G. We shall show that S is isomorphic to an algebra of strings. Pick A as a set in one-one correspondence f with G (it might as well be G itself). Let S(A) be the algebra of strings in the alphabet A. We know from Subtheorem 4.2.3 that S(A) is itself a free semi-group with free generators A, and we know from a result of Section 2.14 that any two free semi-groups with the same cardinality of free generators are isomorphic. □

Remark 4.2.8 Combining Lemma 4.2.6 and Subtheorem 4.2.7, we obtain a kind of representation theorem for atomically generated semi-groups that satisfy pseudo-trichotomy; that is, we show that structures satisfying a rather abstract description can all be thought of concretely as sets of strings operated on by concatenation.

Exercise 4.2.9 The proof alluded to in the above remark is rather indirect, detouring through talk about free algebras, etc. Give instead a "direct" proof that every atomically generated semi-group satisfying pseudo-trichotomy is isomorphic to an algebra of strings. (Hint: Show that every element in such an algebra can be "factored" into atomic elements in at least one way, and in at most one way, i.e., prove a suitable "unique factorization theorem.")

Exercise 4.2.10 In our description of the algebra of strings we have dropped the null string (the empty sequence ⟨⟩) from consideration. We have done this for reasons of simplicity in exposition, but many authors allow it. "Your mission, should you choose to accept it," is to put it back in, and prove analogs to all of the above results. The appropriate algebraic structure is a monoid (M, +, 0), where (M, +) is a semi-group and 0 is a distinguished element satisfying

(Id) x + 0 = 0 + x = x (identity).

Besides the tedium of keeping track of 0 and ⟨⟩, which as "null entities" are a bit hard to always see, there is the further conceptual problem of how to treat "distinguished" elements. Our suggestion is that "0" be viewed as a nullary, or, if that is too much, a constant unary operation, always giving the value 0. This way it need not be counted among the generators in the definition of a free monoid.
Exercise 4.2.11 There is often more than one fruitful way to abstract a concrete structure. Thus, instead of thinking of strings as constructed by way of concatenation, we can think of them as all constructed from the null string at root, by the operation of extending a sequence by adding one more component at its end. Thus a multiple successor algebra is a structure (N, 0, (σᵢ)ᵢ∈I), where 0 ∈ N, each σᵢ is a unary operation on N, and where no special postulates are required. A multiple successor arithmetic (due to Hermes 1938) is a multiple successor algebra in which, for all i ∈ I,

(1) for all x ∈ N, σᵢx ≠ 0;
(2) if σᵢx = σᵢy, then x = y.
Show that (up to isomorphism) free multiple successor algebras and multiple successor algebras of strings are the same. Show further that every multiple successor arithmetic is isomorphic to a multiple successor algebra of strings.

We can give examples of syntactic structures that satisfy the postulates on the algebras corresponding to the Lambek calculus in a couple of its forms.¹

Example 4.2.12 (Associative Lambek calculus of strings). Consider the algebra of strings S = (S, ⌢) in the alphabet A, i.e., the set of all strings of symbols from A. This includes the empty string ⟨⟩. The operation ⌢ of concatenation is an associative operation, and ⟨⟩ is the identity element. Concatenation is a kind of "addition" of strings, and might be denoted by +. We define a kind of "subtraction" as follows: x ⇀ y is the result of deleting the string x from the beginning of the string y. There clearly is the symmetric operation of deleting the string x from the end of the string y. We denote this as y ↼ x. (Note that in each case, the "harpoon" points to the string from which the other string is being deleted.) An alternative metaphor, which does not seem as natural, is to view concatenation as multiplication ×, and x \ y and y / x as quotients. A metaphor which has closer connections to logic is the following. We view concatenation as a kind of "fusion of premises" ∘, and we view the deletion operations as kinds of implication, writing x → y and y ← x. Note that no matter what the metaphor, we use symbols that "point" so as to distinguish between the dual residuals. Older literature did not do this, instead using unmemorable notations such as x/y, x\y, x//y, x : y, x :: y to make distinctions.

¹ By simply dropping the empty string (pair) one can obtain forms which correspond in the Gentzen system to not allowing empty left-hand sides.
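The "subtraction" operations of Example 4.2.12 can be sketched on tuple-strings as partial functions (our illustrative names `ldel` and `rdel`; the example itself fixes no implementation): delete a given string from the beginning, or from the end, of another.

```python
def concat(x, y):
    return x + y

def ldel(x, y):
    """x deleted from the beginning of y (None when x is not a prefix)."""
    return y[len(x):] if y[:len(x)] == x else None

def rdel(y, x):
    """x deleted from the end of y (None when x is not a suffix)."""
    if not x:
        return y
    return y[:len(y) - len(x)] if y[-len(x):] == x else None

p, q = ('i', 'f'), ('t', 'h', 'e', 'n')
w = concat(p, q)
assert ldel(p, w) == q                 # p "subtracted" from p + q leaves q
assert rdel(w, q) == p                 # q deleted from the end leaves p
assert concat(p, ldel(p, w)) == w      # deletion undoes concatenation
assert ldel(q, w) is None              # q is not a prefix of w
```

Since the order on S in Exercise 4.2.13 is just identity, the residuation laws collapse to these equations; in richer ordered settings the same two operations become genuine left and right residuals of concatenation.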
Exercise 4.2.13 Consider the algebra of strings S = (S, ⌢) in the alphabet A. Let =_S be the identity relation restricted to the set S, which is of course a partial order on S. Show that (S, =_S, ⌢, ⇀, ↼, ⟨⟩) is a residuated monoid.

Example 4.2.14 (Non-associative Lambek calculus of pairs). This is similar to the example above, except the fundamental operation is not concatenation but rather "pairing": x, y ↦ (x, y). S is now the set that results from closing A under repeated applications of the pairing operation. The "subtraction operations" now delete either the first or the second components. The empty operation that pairs nothing with itself is denoted by ().
Exercise 4.2.15 Let S be as in the above example. Prove that this is a residuated groupoid with identity.

4.3 The Algebra of Sentences
Let us look at the various ways that the string (p → q) may be composed by concatenation. Here we adopt the customary informal practice of denoting a sequence by listing its members. Thus (p → q) is our "nickname" for the more formally designated ⟨(, p, →, q, )⟩.
Perhaps we should make one more comment about our practices. Following Curry (1963), we never display the object language, and so, for example, '→' is not the conditional sign; it is rather the name of the conditional sign (the conditional sign itself could be a shoe, a ship, or a piece of sealing wax).

Returning to the various ways of generating (p → q), these include first generating ⟨(, p, →, q⟩, and then sticking a right parenthesis on the end (this corresponds to the multiple successor arithmetic way of looking at things). But an equally valid mode of generation is to first generate ⟨(, p⟩ and then concatenate it with ⟨→, q, )⟩. We leave to the reader the task of writing all the various combinations, but one thing should be clear: none of them corresponds to the intuitions that we all have from logic that (p → q) is generated from p and q.

In logic texts, the usual inductive definition of sentences for sentential logic says that sentences are generated from sentences, that is, from other (well-formed) expressions, and not, as in the examples above, from nonsensical strings. Thus the typical definition from logic texts starts out by postulating a certain (denumerable) class of atomic sentences (p, q, etc.), and then says things like:

(→) if φ and ψ are sentences, then (φ → ψ) is a sentence.

(Of course, typically there would be additional connectives besides →, but this will do for our present purposes.) There are different ways of understanding clause (→). One quite common way is to regard sentences as a special subclass of strings, and so (→) is interpreted as saying that if two strings φ and ψ are sentences, then so is the string ⟨(⟩ ⌢ φ ⌢ ⟨→⟩ ⌢ ψ ⌢ ⟨)⟩.
Atomic sentences are then reinterpreted so that, strictly speaking, they are singletons of the given atomic elements p, q, etc. This rather concrete way of interpreting (→) would require that if we were to use Polish notation, where we write Cpq instead of (p → q) in order to avoid the need for parentheses, the clause would have to be redrawn:

(C) if φ and ψ are sentences, then Cφψ is a sentence.

Another, more fruitful approach to the interpretation of clause (→) is to regard (φ → ψ) as denoting some way of composing the sentences φ and ψ so as to form their "conditional," but to be non-committal as to the particular syntactical details. The conditional may be formed by the normal infix notation (as the "icon" (φ → ψ) suggests), but it might be formed by the Polish prefix notation, or the so-called reverse Polish suffix notation popularized in Hewlett-Packard advertisements, or even, as in English, by a mixture of prefix and infix notation ("if ___, then ___"). In this more abstract, algebraic approach, there is not even the need to think that we are dealing with sequences; this point of view nicely accommodates two-dimensional ideographs and tonal languages.

This leads to a distinctive way of regarding the composition of sentences (quite different from the juxtapositional way). We thus regard sentences as forming an algebra, where sentences are composed from other sentences by various syntactic operations, e.g., that of the conditional. In general, of course, there are many more such operations (negation, conjunction, and disjunction, to name the most familiar ones). Thus we can view an algebra of sentences S as a structure (S, (Oᵢ)ᵢ∈I), where the operations Oᵢ correspond to the various ways of composing sentences from each other. But this is overly general and does not get at the idea that there are certain atomic sentences which serve as the starting points, the generators for the others.
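The abstract view of clause (→) can be sketched in a few lines: sentences are terms of an algebra generated by atoms under a conditional-forming operation, while infix and Polish notation are merely two renderings of the same term. All names below (`Atom`, `Cond`, `infix`, `polish`) are our illustrative choices, not the text's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Cond:
    # The notation-neutral operation for clause (→).
    ante: object
    cons: object

def infix(s):
    """Render a sentence-term in the usual infix notation."""
    return s.name if isinstance(s, Atom) else f"({infix(s.ante)} -> {infix(s.cons)})"

def polish(s):
    """Render the same term in Polish prefix notation, clause (C)."""
    return s.name if isinstance(s, Atom) else "C" + polish(s.ante) + polish(s.cons)

p, q = Atom("p"), Atom("q")
s = Cond(p, q)
assert infix(s) == "(p -> q)"          # one concrete notation ...
assert polish(s) == "Cpq"              # ... another, for the same term

# Frozen dataclasses compare structurally, so terms behave like an algebra
# with unique decomposition: equal conditionals have equal components.
assert Cond(p, q) == Cond(Atom("p"), Atom("q"))
assert Cond(p, q) != Cond(q, p)
```

The point of the sketch is that (p → q) is generated from p and q by one application of `Cond`, exactly the intuition the concatenation analysis failed to capture.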
We could throw into the structure, then, a certain set A of atomic sentences as the generators of the algebra, but we would still be missing an important feature of the situation, namely uniqueness of composition; no conditional is a conjunction, etc., and if two conjunctions are identical, then their component conjuncts are identical, etc. This is in a way one of the most notoriously difficult of the theorems in Church's (1956) classic Introduction to Mathematical Logic (at least out of all proportion to the seeming obviousness of its content).

Definition 4.3.1 Let A = (A, (Oᵢ)ᵢ∈I). We say that the algebra A has the property of unique decomposition iff

(1) there exists a set G of atomic generators of A, in the sense that no element s of G is of the form Oᵢ(a₁, …, aₘ);
(2) if Oᵢ(a₁, …, aₘ) = Oⱼ(b₁, …, bₙ), then
(i) i = j,
(ii) m = n, and
(iii) for each k ≤ m, aₖ = bₖ.
In "English," every element can be factored into a composition from the generators in only one way. Unique decomposition comes to take on algebraic clothing in the following result.

Theorem 4.3.2 Let A = (A, (Oᵢ)ᵢ∈I) be an algebra with the property of unique decomposition. Then A is universally free, i.e., free in its similarity class.

Proof Let f be a mapping of G into any algebra of the same similarity type. It is reasonably intuitive that f can be extended to a homomorphism h by the following inductive definition:

(1) For s ∈ G, h(s) = f(s).
(2) h[Oᵢ(a₁, …, aₘ)] = Oᵢ(h(a₁), …, h(aₘ)).

Clearly h so defined preserves the operations. The only thing that could conceivably go wrong would be that the clauses should somehow not determine a unique value h(x) for some element x of A. We prove that this does not happen by induction on generators. When x is an atom, clause (1) applies (and clause (2) clearly does not), and since f is a (single-valued) function, clearly it assigns x a unique value. When x is composite, it is of the form Oᵢ(a₁, …, aₘ). The only way that clause (2) could fail to have h assign a unique value to it would be if the same element also had some other form, say Oⱼ(b₁, …, bₙ). But this is precisely what unique decomposition says is impossible, and so the proof is complete. □

Not only is an algebra with unique decomposition universally free, but the converse is true as well, as we shall investigate next. It turns out that this is easiest to show by looking at certain concrete examples of universally free algebras. It was claimed in Chapter 2 that universally free algebras exist for every cardinality of generators, and examples were provided by looking .

Having the notions of a sentential language and algebra of propositions (appropriate) for a sentential language, we next define interpretations, which provide the crucial link between sentences and propositions.

Definition 5.3.3 Let L be a sentential language, let A(L) be the associated algebra of sentences of L, and let P be an algebra (appropriate) for L. Then an interpretation of L in P is any homomorphism from A(L) into P. To say that a function I is a homomorphism from A(L) to P is to say that the following condition is satisfied for every connective c in L:

(H) I[O_c(p₁, …, pₙ)] = D_c[I(p₁), …, I(pₙ)].
Before continuing, it is important to notice that there are three objects associated with each connective c: (1) the connective itself, which is a symbol (or string of symbols) together with a grammatical role in the language; (2) the operation O_c on the algebra of sentences, which is a mathematical function that represents the grammatical role of c; (3) the operation D_c on the algebra of propositions, which is the propositional counterpart of c. In particular, whereas c is not a mathematical (set-theoretic) object, both O_c and D_c are mathematical objects.

The basic idea is that an interpretation I is a function that assigns a proposition to every sentence of L. But not just any assignment of propositions to sentences will do; in order to be an interpretation, a function must be a homomorphism. This amounts to the claim that the proposition assigned to any compound sentence is algebraically composed out of the propositions assigned to the constituent sentences. In particular, condition (H) states that the proposition assigned to a compound sentence, formed using the connective c, is a function of the propositions respectively assigned to the constituent sentences, where in particular the function is the propositional counterpart of the syntactic connective c.

Exercise 5.3.4 Show that requiring that every interpretation is a homomorphism is just the algebraic way of expressing the principle of categorial compositionality in the special case of sentential languages. (Hint: Let the interpretation of a connective c, I(c), be the algebraic operation (i.e., the propositional connective) D_c.)

In order to illustrate the idea that an interpretation is a homomorphism, let us consider conjunction. Furthermore, let us suppose that the propositions form a lattice, in which case propositional conjunction is the meet operation. In this case, we have the following, as a special case of condition (H):
(H&) I(φ & ψ) = I(φ) ∧ I(ψ).

This can be read in a straightforward manner: the proposition designated by the syntactic conjunction of two sentences φ and ψ is the propositional conjunction of the propositions designated by φ and ψ, respectively. An alternate way of writing (H&), which might be easier to understand, goes as follows:

(H&*) If φ designates p, and ψ designates q, then φ & ψ designates p ∧ q.

Note carefully that & is not the conjunction connective per se; rather, it is a mathematical object, in particular, the algebraic operation associated with syntactic conjunction. Also note that we write the mathematical operation symbol '&' in infix notation irrespective of how syntactic conjunction is in fact concretely implemented. For example, in Polish formatted languages, the conjunction of two sentences φ and ψ is obtained by prefixing K in front of φ in front of ψ; however, in other languages, conjunction is implemented differently. In any case, whatever actual concrete form the syntactic conjunction of φ and ψ takes, we denote it in the same way, by the expression φ & ψ, which we may read as "the conjunction of φ and ψ," whatever that may in fact be in the particular language under scrutiny.

In Sections 5.4–5.6, we consider various ways of implementing compositional semantics for sentential languages, starting with the simplest method.

5.4 Truth-Value Semantics
Having discussed general algebraic semantics in Section 5.2, and having simplified our discussion to sentential languages in Section 5.3, in the present section we consider a further (rather extreme) simplification of the general theory. Specifically, we consider algebras that are most fruitfully understood as consisting, not of propositions per se, but rather of truth values. No one seems to know or care what truth values "really" are. We know there are at least two of them: Frege (1892) calls these "the true" and "the false." We reify these as the numbers 1 and 0 and often refer to them using the letters 't' and 'f'. We begin with a general definition.
Definition 5.4.1 An algebra of truth values is any algebra (V, F), where V is a non-empty set of truth values and F is a family of functions on V.

If the elements of the algebra are truth values, then of course the operations on such an algebra are functions that take truth values and yield truth values; in other words, the operations are what are customarily called truth functions. Another way to say this, perhaps, is that if propositions turn out to be truth values, then propositional connectives correspondingly turn out to be truth functions.

We begin with a famous example, with which everyone is familiar: classical truth tables. First of all, we all know about addition and multiplication tables, without necessarily knowing that these tables specify an algebra of natural numbers. Similarly, we all know about truth tables, without necessarily knowing that these tables specify an algebra of truth values. We call this algebra the Frege algebra, which is officially defined as follows.
Definition 5.4.2 The Frege algebra is the algebra (V, F), where V consists of only two elements, 1 ("the true") and 0 ("the false"), and where F consists of the familiar truth functions, formally defined as follows:

(F1) x ∧ y = xy;
(F2) x ∨ y = x + y + xy;
(F3) −x = 1 + x;
(F4) x ⇒ y = 1 + x + xy;
(F5) x ⇔ y = 1 + x + y.
Here, the variables x and y range over truth values (0 and 1), and the connective-like symbols refer to truth functions. Juxtaposition indicates ordinary numerical multiplication, and + indicates modulo-2 addition, which is defined so that 0 + 1 = 1 + 0 = 1 and 0 + 0 = 1 + 1 = 0.

Exercise 5.4.3 (F1)–(F5) constitute a succinct presentation of the five classical truth functions, based on addition and multiplication on the two-element ring. Verify that the functions specified in (F1)–(F5) do in fact correspond exactly to the familiar truth functions of classical sentential logic. For example, show that the conjunction of t and t is t; i.e., 1 ∧ 1 = 1, i.e., 1 × 1 = 1.

Since the Frege algebra is an algebra of truth values, any interpretation of a sentential language L into this algebra is by default an assignment of truth values to sentences of L. Furthermore, since an interpretation must satisfy the requirement of compositionality (the homomorphism requirement), it must satisfy the following:

(I1) I(φ & ψ) = I(φ) ∧ I(ψ);
(I2) I(φ ∨ ψ) = I(φ) ∨ I(ψ);
(I3) I(¬φ) = −I(φ);
(I4) I(φ → ψ) = I(φ) ⇒ I(ψ);
(I5) I(φ ↔ ψ) = I(φ) ⇔ I(ψ).
As before, the connective-like symbols on the syntactic side do not refer to the actual syntacti~ connecti~e.s, but rather to their mathematical representations; so, for example, ¢ -+ lfIIS the condItIOnal sentence formed from ¢ and lfI, however that is in fact accomplished in the particular language under scrutiny. Exercise 5.4.4 Verify that (l1)-(1S), in conjunction with (Fl)-(FS), yield the usual classical restrictions that apply to the assignment oftruth values to compound sentences. ~or example, if ¢ and lfI are both interpreted as "the true," then their conjunction is also mterpreted as "the true." Exercise 5.4.5 Show that the Frege algebra can be regarded as a two-element Boolean algebra (lattice), supplemented by a conditional operation (defined so that x =? y = -x V y), and a biconditional operation (defined so that x¢;> y = (x /\ y) V (-x /\ -y)). S~~w, f~r e~ample, that Frege conjunction is the same as Boolean meet, and that Frege dISjUnctIOn IS the same as Boolean join.
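The check requested in Exercise 5.4.3 can be carried out mechanically. The following is a minimal sketch in Python; the function names (conj, disj, neg, cond, bicond) are ours, while the defining equations are (F1)-(F5) from Definition 5.4.2.

```python
# The Frege algebra (Definition 5.4.2): truth values {0, 1}, connectives
# defined via multiplication and modulo-2 addition on the two-element ring.

def conj(x, y):          # (F1) x ∧ y = xy
    return x * y

def disj(x, y):          # (F2) x ∨ y = x + y + xy  (addition mod 2)
    return (x + y + x * y) % 2

def neg(x):              # (F3) -x = 1 + x
    return (1 + x) % 2

def cond(x, y):          # (F4) x ⇒ y = 1 + x + xy
    return (1 + x + x * y) % 2

def bicond(x, y):        # (F5) x ⇔ y = 1 + x + y
    return (1 + x + y) % 2

# Exercise 5.4.3: check against the familiar classical truth tables,
# here stated in the lattice style of Exercise 5.4.5 (min, max, order).
for x in (0, 1):
    for y in (0, 1):
        assert conj(x, y) == min(x, y)
        assert disj(x, y) == max(x, y)
        assert cond(x, y) == max(1 - x, y)       # x ⇒ y = -x ∨ y
        assert bicond(x, y) == (1 if x == y else 0)
    assert neg(x) == 1 - x
```

The assertions compare each ring-theoretic equation with the lattice-style presentation used in Exercise 5.4.5, so the block verifies both exercises at once.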
Having discussed two-valued algebras, we now consider a natural generalization, which is obtained by enlarging the number of truth values from two to three or more (including possibly infinitely many). In this way we arrive at multi-valued logic, which traces back to Łukasiewicz (1910, 1913). The precise philosophical significance of the additional non-standard truth values is unclear. On the other hand, mathematically speaking, multi-valued (MV) algebras are just like two-valued algebras, only bigger! Bigger indeed: as one adds more and more intermediate truth values, the number of mathematically possible truth functions becomes staggeringly large. The following is a simple (though hardly unique) example of a whole family of MV algebras. In each particular example, 0 corresponds to "the false," 1 corresponds to "the true," and the fractions between 0 and 1 correspond to intermediate truth values.
Definition 5.4.6 An MV algebra is an algebra (V, F) in which V consists of all the fractions 0/n, 1/n, ..., n/n for some fixed n, and in which the operations in F are defined as follows:
(O1) x ∧ y = min(x, y);
(O2) x ∨ y = max(x, y);
(O3) −x = 1 − x;
(O4) x ⇒ y = min(1 − x + y, 1);
(O5) x ⇔ y = (x ⇒ y) ∧ (y ⇒ x).
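Definition 5.4.6 can likewise be animated by a small sketch. Here n = 4 is an arbitrary choice, and the function names are ours; the defining equations are (O1)-(O5).

```python
# An MV algebra (Definition 5.4.6) with n = 4, so V = {0/4, 1/4, 2/4, 3/4, 4/4}.
from fractions import Fraction

n = 4
V = [Fraction(k, n) for k in range(n + 1)]

def conj(x, y):   # (O1) x ∧ y = min(x, y)
    return min(x, y)

def disj(x, y):   # (O2) x ∨ y = max(x, y)
    return max(x, y)

def neg(x):       # (O3) -x = 1 - x
    return 1 - x

def cond(x, y):   # (O4) x ⇒ y = min(1 - x + y, 1)
    return min(1 - x + y, Fraction(1))

def bicond(x, y): # (O5) x ⇔ y = (x ⇒ y) ∧ (y ⇒ x)
    return conj(cond(x, y), cond(y, x))

# a sample intermediate value: 3/4 ⇒ 1/4 = min(1 - 3/4 + 1/4, 1) = 1/2
assert cond(Fraction(3, 4), Fraction(1, 4)) == Fraction(1, 2)

# Exercise 5.4.7: restricted to {0, 1}, these agree with the Frege algebra.
for x in (Fraction(0), Fraction(1)):
    for y in (Fraction(0), Fraction(1)):
        assert cond(x, y) == max(1 - x, y)
        assert bicond(x, y) == (1 if x == y else 0)
```

The final loop is the content of Exercise 5.4.7: on the subalgebra {0, 1}, the MV operations collapse to the classical ones.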
Exercise 5.4.7 Show that the Frege algebra is a special case of an MV algebra, in which V is just {0, 1}.

As with the Frege algebra, it is customary and natural to interpret the elements of an MV algebra as truth values. The difference, of course, is that a (non-trivial) MV algebra has non-classical intermediate truth values. Multi-valued logic was originally motivated by the problem of future contingent statements. Unfortunately, as it turns out, multi-valued logic does not provide a satisfactory solution to this problem, primarily because multi-valued logic is fundamentally truth-functional. Notwithstanding its failure to solve the philosophical problem for which it was originally invented, multi-valued logic has grown into a thriving mathematical discipline with many (non-philosophical) applications. Nevertheless, we do not deal further with multi-valued logics as such, although we certainly deal with semantic algebras containing more than two elements. This is the topic of Sections 5.5 and 5.6.
5.5 Possible Worlds Semantics
In the Frege algebra, there are exactly two "propositions," 1 and 0, which are identified with the truth values "the true" and "the false." In other words, in the Frege algebra, to say that a proposition is (adjectively) true is precisely to say that it is (identical to) "the true."
The Frege algebra is a special case of the more general class of truth-value algebras, which include various MV algebras. In every such algebra, the propositions are simply truth values, and propositional connectives are truth functions. Accordingly, only truth-functional connectives can be interpreted within a truth-value algebra, be it the Frege algebra or an MV algebra. This approach to formal semantics works very well for truth-functional logic, including classical sentential logic and the various multi-valued logics, but it does not work for logics that are not truth-functional, including quantifier logic and modal logic. A more general approach, which we formally present in the next section, distinguishes between propositions and truth values, in analogy to Frege's (1892) distinction between sense and reference. According to this approach, every sentence has a direct interpretation, which is a proposition; every proposition is, in turn, either true or false (adjectively), so every sentence also has an indirect interpretation, which is a truth value. However, before proceeding to the more general approach, we consider one more method of implementing algebraic compositionality, namely, the method of possible worlds. According to this method, an interpretation function does not assign a truth value simpliciter to each sentence; rather, it assigns a truth value with respect to each possible world. One then gives truth conditions for complex sentences in a systematic manner, analogous to the truth conditions for classical truth-functional logic. The following illustrate this approach, where v(φ, w) is the truth value of φ at world w:
(v1) v(φ & ψ, w) = t iff v(φ, w) = t and v(ψ, w) = t;
(v2) v(φ ∨ ψ, w) = t iff v(φ, w) = t and/or v(ψ, w) = t;
(v3) v(∼φ, w) = t iff v(φ, w) = f.
All the connectives defined in (v1)-(v3) are truth-functional.
If we confine ourselves to these connectives, we have an intensional semantics for a language that has no syntactic means of articulating the various intensional distinctions that arise in the semantics. The failure of the syntax to adequately reflect the semantics prompts any syntactically oriented logician to introduce further, non-truth-functional, sentential operators, namely, modal operators. The most celebrated modal operators are "necessarily ..." and "possibly ...," which are customarily symbolized by □ and ◇. One characterization of their truth conditions, which traces back to Leibniz, is given as follows:
(v4) v(□φ, w) = t iff v(φ, w′) = t for every possible world w′;
(v5) v(◇φ, w) = t iff v(φ, w′) = t for some possible world w′.
The above truth conditions correspond to absolute modal logic. One obtains weaker modal logics by adding an accessibility relation R to the truth conditions, an idea that traces to Kripke (1963a). Thus, in the Kripke approach, one posits a non-empty set W of possible worlds together with a binary relation R on W. One then characterizes interpretations as follows:
(v4*) v(□φ, w) = t iff v(φ, w′) = t for every w′ such that wRw′;
(v5*) v(◇φ, w) = t iff v(φ, w′) = t for some w′ such that wRw′.
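As an illustration of (v4*) and (v5*), here is a toy evaluator in Python. The frame (W, R) and the valuation of the atom p are invented for the example, and the nested-tuple encoding of formulas is our own convention, not anything in the text.

```python
# Kripke truth conditions (v4*), (v5*) over a hand-built three-world frame.

W = {'w1', 'w2', 'w3'}
R = {('w1', 'w2'), ('w1', 'w3'), ('w2', 'w2')}   # accessibility relation

val = {'p': {'w2', 'w3'}}   # the atom p is true exactly at w2 and w3

def true_at(formula, w):
    op = formula[0]
    if op == 'atom':
        return w in val[formula[1]]
    if op == 'box':    # (v4*): □φ true at w iff φ true at every w' with wRw'
        return all(true_at(formula[1], w2) for (w1, w2) in R if w1 == w)
    if op == 'dia':    # (v5*): ◇φ true at w iff φ true at some w' with wRw'
        return any(true_at(formula[1], w2) for (w1, w2) in R if w1 == w)
    raise ValueError(op)

p = ('atom', 'p')
assert true_at(('box', p), 'w1')      # p holds at both R-successors of w1
assert not true_at(('dia', p), 'w3')  # w3 has no successors, so ◇p fails there
```

Note that □p is vacuously true at the dead-end world w3, the familiar quirk of the relational truth condition.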
Depending on what properties one ascribes to the accessibility relation R (e.g., reflexivity, symmetry, transitivity, etc.), one obtains various well-known modal systems (e.g., T, S4, B, S5). In addition to the customary presentation of the semantics of modal logic, due to Kripke (1959, 1963a, 1963b, 1965), there is an alternative presentation that renders possible worlds semantics completely compatible with algebraic semantics. Toward this end, we define two special sorts of propositional algebras (cf. Lemmon 1966). The first is more general and is based on Kripke (1963a, 1963b). The second is based in effect on Kripke (1959).

Definition 5.5.1 Let W be a non-empty set (of possible worlds), and let R be any binary relation on W (the accessibility relation). Then the Kripke algebra on (W, R) is the algebra KA(W, R) = (P, F) defined as follows: P is the set ℘(W) of all subsets of W; F is a family of operations, defined by:
(K1) p ∧ q = p ∩ q;
(K2) p ∨ q = p ∪ q;
(K3) −p = W − p;
(K4) p ⇒ q = (W − p) ∪ q;
(K5) p ⇔ q = (p ⇒ q) ∩ (q ⇒ p);
(K6) □p = {x : for all y, if xRy then y ∈ p};
(K7) ◇p = {x : for some y, xRy and y ∈ p}.
Definition 5.5.2 Let W be a non-empty set (of possible worlds). Then the Leibniz algebra on W is the algebra LA(W), defined to be identical to the Kripke algebra KA(W, R), where R is the universal relation on W.

Here, the variables x and y range over elements of W (i.e., worlds), and p and q range over elements of P (i.e., sets of worlds). Also, the symbols that look like connectives refer to the operations on the Kripke (Leibniz) algebra.

Exercise 5.5.3 Show that in the Leibniz algebra LA(W), conditions (K6) and (K7) reduce to the following:
(L6) □p = W if p = W, and ∅ otherwise;
(L7) ◇p = ∅ if p = ∅, and W otherwise.

In a Leibniz or Kripke algebra, a proposition is a "UCLA proposition."¹ In other words, a proposition is simply identified with the set of worlds in which it is true. This means, of course, that distinct propositions cannot be true in precisely the same worlds. Since UCLA propositions are sets (of worlds), propositional connectives are set-theoretic operations.

¹ JMD first heard the term "UCLA proposition" from Alan Ross Anderson sometime during the mid-1960s. We do not know if it originates with Anderson, but it was of some currency then and reflects the contributions made by Carnap, Montague, and Kaplan (all at the University of California at Los Angeles) to the semantics of modal logic.

For example, the negation of a proposition (set of worlds) is
its set-theoretic complement (relative to W). Similarly, the conjunction of two propositions is simply their intersection, and the disjunction of two propositions is simply their union. Besides the simple set operations, there are also somewhat more complicated set operations, which are associated with the modal operators. For example, condition (K6) states that a world w is in □p iff every world accessible from w is in p. On the other hand, the Leibnizian condition (L6) states that w is in □p iff every world is in p. In other words, if p is true in every world, then its necessitation □p is true in every world, but if p is false in at least one world, then □p is false in every world. This reflects the fact that the Leibniz algebra corresponds to absolute modal logic. Now, back to algebraic semantics. First of all, in a Kripke or Leibniz algebra, a proposition is just a set of worlds, and a proposition p is true at a world w iff w ∈ p. An algebraic interpretation (homomorphism) I assigns a proposition to every sentence. Accordingly, a sentence φ is true, according to I, at a world w iff I(φ) is true at w, which is to say that w ∈ I(φ). In other words, we have a straightforward correspondence between the customary interpretations of modal logic and algebraic interpretations. This is formally described as follows, where v is a customary interpretation and I is the corresponding algebraic interpretation:
(c1) v(φ, w) = t iff w ∈ I(φ).
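The set-theoretic reading of the modal operators in Definition 5.5.1, and the Leibnizian collapse of Exercise 5.5.3, can be verified by brute force for a small W. The following Python sketch uses our own function names; the defining clauses are (K3), (K6), and (K7).

```python
# A Kripke algebra KA(W, R) on a three-world set: propositions are subsets
# of W, and the modal operators are the set operations (K6), (K7).

W = frozenset({0, 1, 2})

def neg(p):      # (K3) -p = W - p
    return W - p

def box(p, R):   # (K6) □p = {x : for all y, xRy implies y ∈ p}
    return frozenset(x for x in W
                     if all(y in p for (a, y) in R if a == x))

def dia(p, R):   # (K7) ◇p = {x : for some y, xRy and y ∈ p}
    return frozenset(x for x in W
                     if any(y in p for (a, y) in R if a == x))

universal = {(x, y) for x in W for y in W}   # Leibniz algebra: R = W × W

# Exercise 5.5.3: with universal R, (K6)/(K7) collapse to (L6)/(L7),
# checked over all eight subsets of W.
for bits in range(8):
    p = frozenset(x for x in W if bits >> x & 1)
    assert box(p, universal) == (W if p == W else frozenset())
    assert dia(p, universal) == (frozenset() if p == frozenset() else W)
```

With a non-universal R the same functions compute the relational operators of Definition 5.5.1, so the one sketch covers both algebras.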
in S, and every subset Γ of S, Γ entails φ in L1 iff Γ entails φ in L2.
(3) L1 and L2 are weakly equivalent iff for every sentence φ in S, φ is valid in L1 iff φ is valid in L2.
In other words, two languages are weakly equivalent if they agree concerning what sentences are valid, they are strongly equivalent if they agree concerning what single-conclusion arguments are valid, and they are strictly equivalent if they agree concerning what multi-conclusion arguments are valid.
EQUIVALENCE
Exercise 5.12.3 We have defined only three forms of equivalence. There are others that can be defined, which respectively pertain to contra-validity, unsatisfiability, unfalsifiability, and simple entailment. Provide these additional definitions.

Since the validity of formulas is a special case of the validity of single-conclusion arguments, which in turn is a special case of the validity of multi-conclusion arguments, we have a natural ordering of the above forms of equivalence, given in the following.

Theorem 5.12.4
(1) If two ECSLs are strictly equivalent, then they are also strongly equivalent; the converse does not hold.
(2) If two ECSLs are strongly equivalent, then they are also weakly equivalent; the converse does not hold.

Exercise 5.12.5 Prove the above theorem.

Next, we present an important theorem, which states that the strict equivalence relation among ECSLs is in fact the identity relation.

Theorem 5.12.6 Two (similar) ECSLs L1 and L2 are strictly equivalent if and only if L1 = L2.

Proof The "if" direction is trivial, so we consider the "only if" direction. We proceed contrapositively. Suppose that L1 ≠ L2, in which case V1 ≠ V2 (since L1 and L2 are similar). We wish to show that L1 and L2 are not strictly equivalent, which is to say that there are sets Γ and Δ such that Γ entails Δ in L1 but not in L2, or the other way around. Since V1 ≠ V2, there is a v in one but not the other. Without loss of generality, we may assume that there is some v in V1 but not in V2. Consider Tv = {φ : v(φ) = t} and Fv = {φ : v(φ) = f}. Clearly, Tv does not entail Fv in L1, since there is a valuation in V1 that satisfies Tv but falsifies Fv, namely v itself. On the other hand, Tv does entail Fv in L2. For suppose to the contrary; then there is a valuation v′ in V2 that satisfies Tv and falsifies Fv. In this case, v′ assigns t to every formula in Tv and f to every formula in Fv, just like v! Functions are extensional objects, so v′ must be the same as v, but this contradicts our earlier assumption that v is not in V2. □

Having discussed general evaluationally constrained languages, we now focus our attention on evaluationally constrained (sentential) languages that arise from underlying interpreted languages. This provides corresponding definitions of equivalence for matrices, medleys, and atlases. Recall that every matrix M appropriate to a given sentential language L gives rise to an associated set V(M) of valuations, and hence gives rise to a naturally associated evaluationally constrained (sentential) language. This is formally defined as follows.

Definition 5.12.7 Let L be a sentential language, where S is the associated algebra of sentences of L, and let M be a matrix appropriate for L. Then the associated evaluationally constrained (sentential) language is the system (S, V(M)), where V(M) are the valuations induced by M.

Exercise 5.12.8 The same can be said about medleys and atlases. Provide the corresponding definitions.

Since every matrix appropriate to a language gives rise to an associated evaluationally constrained language, we can use the various equivalence relations on ECSLs to induce corresponding equivalence relations on matrices. This is formally defined as follows.

Definition 5.12.9 Let L be a sentential language, where S is the associated algebra of sentences of L. Let M1 and M2 be logical matrices appropriate for L, and let L1 = (S, V(M1)) and L2 = (S, V(M2)) be the associated evaluationally constrained languages.
(1) M1 and M2 are strictly equivalent if L1 and L2 are strictly equivalent.
(2) M1 and M2 are strongly equivalent if L1 and L2 are strongly equivalent.
(3) M1 and M2 are weakly equivalent if L1 and L2 are weakly equivalent.
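For a toy "language" with finitely many sentences, the equivalence relations can be checked by brute force. The sketch below is ours throughout: valuations are dicts, and the particular valuation sets V1, V2 are invented to illustrate the contrapositive argument in the proof of Theorem 5.12.6 (distinct valuation sets are strictly distinguishable).

```python
# Multi-conclusion entailment and strict equivalence over a three-sentence
# toy language; a "valuation" is a dict from sentences to {'t', 'f'}.
from itertools import product

S = ['a', 'b', 'c']

def entails(V, gamma, delta):
    # Γ entails Δ iff no admissible valuation makes all of Γ true
    # while making all of Δ false (multi-conclusion sense).
    return not any(all(v[g] == 't' for g in gamma) and
                   all(v[d] == 'f' for d in delta)
                   for v in V)

def strictly_equivalent(V1, V2):
    subsets = [[s for s in S if i >> S.index(s) & 1] for i in range(8)]
    return all(entails(V1, g, d) == entails(V2, g, d)
               for g in subsets for d in subsets)

all_vals = [dict(zip(S, bits)) for bits in product('tf', repeat=3)]
V1 = all_vals[:4]
V2 = all_vals[1:5]

assert strictly_equivalent(V1, V1)      # identity implies strict equivalence
assert not strictly_equivalent(V1, V2)  # distinct valuation sets differ on
                                        # some Γ / Δ, as in the 5.12.6 proof
```

The failing pair found by the search is exactly of the form Tv / Fv used in the proof: the valuation in V1 but not V2 satisfies its own truth set while falsifying its own falsity set.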
Exercise 5.12.10 Provide the corresponding definitions for medleys and atlases.

An alternative term for strict equivalence is "logical indiscernibility"; the appropriateness of this term pertains to the fact that two matrices that are strictly equivalent agree on all logical questions, at least all questions that can be answered exclusively by reference to valuations. This is because strictly equivalent matrices give rise to the very same class of valuations. So, from the standpoint of notions defined purely in terms of valuations, strictly equivalent matrices are indistinguishable, although of course they may be metaphysically quite different. This idea is more fully developed in Chapter 6. In order to illustrate these definitions, we offer a variety of examples, deferring detailed discussion, however, until our chapter on matrix and atlas theory. All the examples are standard Boolean matrices; they differ solely in what subsets are counted as designated. As before, xs are designated, os are undesignated. We give these examples without proof. The reader will be invited in Chapter 7, using relevant notions of homomorphic image and submatrix, to supply the proofs.

Example 1: The logical matrices in Figure 5.6, the first of which is the Frege matrix, are all strictly (and hence strongly, and hence weakly) equivalent.

FIG. 5.6. Example 1

Example 2: The two logical matrices in Figure 5.7 are strictly equivalent to each other. On the other hand, whereas they are not strictly equivalent to any of the matrices from Example 1, they are strongly, and hence weakly, equivalent to all of them.

Example 3: The matrices in Figure 5.8 are weakly equivalent, but they are not strongly, and hence they are not strictly, equivalent.

We now turn to a major theorem, which says that every medley is strictly equivalent to an atlas. In other words, for logical purposes, what can be done with a multitude of propositional algebras can equally well be done with a single propositional algebra, although it may be very big.

Theorem 5.12.11 Let M be a medley of logical matrices. Then there exists a logical atlas A strictly equivalent to M, in the sense that V(M) = V(A).
FIG. 5.7. Example 2
FIG. 5.8. Example 3

Proof Index M as (Mi)i∈I, or (Mi) for short, where each Mi = (Ai, Di). Define A = (P, (Dj)) as follows: P is the algebraic direct product of (Ai), denoted ΠiAi. For each j ∈ I, Dj = {(ai) : aj ∈ Dj}. In other words, a sequence (ai) is designated in the jth designated set Dj of the atlas A iff the jth component of (ai) is designated in the jth matrix Mj of the original medley. Claim: A and M are logically equivalent; i.e., V(A) = V(M). It suffices to show that every v in V(A) is also in V(M), and conversely every v in V(M) is also in V(A). (1) Assume v ∈ V(A). Then v is induced by I with respect to some Dj, where I is a homomorphism from S into ΠiAi. By definition, v(φ) = t iff I(φ) ∈ Dj (from A). The projection function πj is a homomorphism from ΠiAi onto Aj, and so πj ∘ I is a homomorphism too, from S into Aj. However, I(φ) ∈ Dj (from A) iff πj ∘ I(φ) ∈ Dj (from M). Thus, πj ∘ I induces the same v, hence v ∈ V(M). (2) Assume v ∈ V(M). Then v is induced by I in some Mj = (Aj, Dj): v(φ) = t iff I(φ) ∈ Dj (in M). Define I′ from S into ΠiAi so that if I(φ) = aj then I′(φ) = (aj, ..., aj, ..., aj), a tuple every component of which is aj. It is easy to see that I′ induces a valuation v on the atlas such that v(φ) = t iff I′(φ) ∈ Dj (in A) iff I(φ) ∈ Dj (in M). Thus, v ∈ V(A). □
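The product construction in the proof can be sketched concretely for a two-matrix medley. Everything here is illustrative: the medley consists of two invented two-element Boolean matrices, and only the carrier and the designated sets D_j are modelled (sentences and homomorphisms are left out).

```python
# The direct-product step of Theorem 5.12.11: from a medley of matrices
# (A_i, D_i), form Π_i A_i and the family of designated sets D_j, where a
# tuple is in D_j iff its j-th component is designated in matrix j.
from itertools import product

# medley: (carrier, designated subset) pairs, indexed by j = 0, 1
medley = [({0, 1}, {1}),
          ({0, 1}, {0, 1})]

carrier = list(product(*[A for (A, D) in medley]))   # Π_i A_i

D = [{a for a in carrier if a[j] in medley[j][1]}
     for j in range((len(medley)))]

assert len(carrier) == 4            # 2 × 2 product algebra
assert D[0] == {(1, 0), (1, 1)}     # first component designated in matrix 0
assert D[1] == set(carrier)         # matrix 1 designates everything
```

The operations of the product algebra (applied componentwise) are omitted; the sketch only exhibits how each designated set of the atlas is read off one coordinate of the medley.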
5.13 Compactness
A key concept in formal semantics and logic is the concept of compactness, which is borrowed from general (point-set) topology. Compactness in logic is intimately tied to a related notion, finitary entailment, which we begin by briefly discussing. The characterization of entailment (logical consequence) presented in Section 5.11 is semantic, not deductive. Specifically, according to the semantic construal of entailment, to say that φ is a logical consequence of Γ is simply to say that φ is true whenever every member of Γ is true; it is, in particular, not to say that φ can be deduced from Γ in a formal deductive system. Of course, logicians are generally not content to present a semantic account of entailment and leave it at that. They generally prefer to present a formal deductive (axiomatic) account as well. Deferring a detailed discussion of axiomatics to a later chapter, in the present chapter we simply observe a very important feature of the axiomatic account of entailment. The fundamental notion of axiomatics is the notion of a proof (or derivation), which is customarily defined to be a finite sequence of formulas subject to specified conditions. Furthermore, to say that φ can be deduced from Γ is to say that there is a proof (i.e., a finite sequence) using formulas of Γ that yields φ. But, because of its finite character, a proof of φ from Γ can use only finitely many formulas in Γ; accordingly, any proof of φ from Γ is in fact a proof of φ from a finite subset of Γ. This can be summarized in the following principle.
Principle 5.13.1 (The compactness of deductive entailment) A formula φ can be deduced from a set Γ of formulas only if φ can be deduced from a finite subset Γ′ of Γ.

This remarkable feature of deductive systems naturally leads one to query whether semantic systems of entailment have a corresponding property, summarized as follows.

Principle 5.13.2 (The compactness of semantic entailment) A formula φ is (semantically) entailed by a set Γ of formulas only if φ is (semantically) entailed by a finite subset Γ′ of Γ.

Alas, there is nothing about the formal semantic definition of entailment that ensures the truth of this principle. The most famous example of the failure of compactness is in (second-order) number theory. Consider the following (infinite) entailment of second-order number theory:

(E) {F(0), F(1), F(2), F(3), ...} ⊨ ∀xF(x).

Whereas this expresses a valid entailment in second-order number theory, its validity depends essentially on the infinitude of the set of premises. In particular, there is no finite subset of premises that entails the conclusion. Of course, (E) does not hold in first-order number theory, for precisely the reason that classical first-order logic is compact! This is in fact the basis of much "mischief" in modern mathematics. We now formally present the concept of compactness as it relates to evaluationally constrained languages. Afterwards, we discuss how semantic compactness can be seen
to be a special case of topological compactness, in virtue of which the term "compact" is fully justified. As it turns out, there are actually four distinct notions of compactness in general formal semantics, although under commonly occurring conditions (see below) they all coincide. We refer to these forms of compactness respectively as U-compactness, I-compactness, E-compactness, and S-compactness, which are defined as follows.
Definition 5.13.3 Let L = (S, V) be an evaluationally constrained language. Then L is said to be U-compact if for any subset Γ of S, Γ is unfalsifiable only if there is a finite subset Γ′ of Γ that is unfalsifiable.

In brief, every unfalsifiable set has a finite unfalsifiable subset, or contrapositively stated, if every finite subset of a set Γ is falsifiable, then Γ is also falsifiable. The "U" in "U-compact" refers to the term "union," the relevance of which is explained below.

Definition 5.13.4 Let L = (S, V) be an evaluationally constrained language. Then L is said to be I-compact if for any subset Γ of S, Γ is unsatisfiable only if there is a finite subset Γ′ of Γ that is unsatisfiable.

In brief, every unsatisfiable set has a finite unsatisfiable subset, or contrapositively stated, if every finite subset of a set Γ is satisfiable, then Γ is also satisfiable. The "I" in "I-compact" refers to the term "intersection," the relevance of which is explained below.

Definition 5.13.5 Let L = (S, V) be an evaluationally constrained language. Then L is said to be E-compact if for any subset Γ of S, a formula φ is entailed by Γ only if there is a finite subset Γ′ of Γ that entails φ.

The "E" in "E-compact" refers to the term "entailment," which is self-explanatory. E-compactness is the notion referred to at the beginning of the section.

Definition 5.13.6 Let L = (S, V) be an evaluationally constrained language. Then L is said to be S-compact if for any subsets Γ, Δ of S, Γ entails Δ only if there are finite subsets Γ′ of Γ and Δ′ of Δ such that Γ′ entails Δ′.

S-compactness is the natural generalization of E-compactness that applies to symmetric entailment; hence the name. Since unfalsifiability, unsatisfiability, and ordinary entailment are special cases of symmetric entailment, one might expect the corresponding notions of compactness to be special cases of S-compactness. This is indeed the case.
Theorem 5.13.7 Let L be an evaluationally constrained language. If L is S-compact, then L is also U-compact, I-compact, and E-compact.

Exercise 5.13.8 Prove the above theorem.

The above theorem can be read as saying that S-compactness implies U-compactness, I-compactness, and E-compactness. The following theorem is to be understood in relation to this reading.
FIG. 5.9. Four forms of compactness

Theorem 5.13.9
(1) U-compactness does not imply S-compactness, I-compactness, or E-compactness.
(2) I-compactness does not imply S-compactness, U-compactness, or E-compactness.
(3) E-compactness does not imply S-compactness, U-compactness, or I-compactness.

Exercise 5.13.10 Prove the above theorem. (Hint: See van Fraassen (1971), where all three notions are discussed. Indeed, our own discussion of compactness owes much to van Fraassen.)

Thus, the four forms of compactness can be diagrammed as in Figure 5.9. Although the four forms of compactness are in general distinct, under special but common circumstances they all coincide. We state the relevant definitions, after which we state the theorem.

Definition 5.13.11 Let (S, V) be an evaluationally constrained language, and let φ be a sentence in S. A sentence ψ is said to be an exclusion negation of φ if for every v in V, v(ψ) = t iff v(φ) = f.
In other words, an exclusion negation of a sentence φ is any sentence whose truth value is always opposite to φ's truth value. Notice that an exclusion negation of φ need not be recognizable as such by its syntactic form; it can only be recognized by its semantic content, as characterized by the class V of admissible valuations.

Definition 5.13.12 An evaluationally constrained language (S, V) is said to be closed under exclusion negation if every sentence in S has an exclusion negation in S.

Theorem 5.13.13 Let L = (S, V) be an evaluationally constrained language that is closed under exclusion negation. If L is U-compact or I-compact or E-compact, then it is S-compact.

Corollary 5.13.14 Suppose L is closed under exclusion negation.
(1) If L is U-compact, then L is both I-compact and E-compact.
(2) If L is I-compact, then L is both U-compact and E-compact.
(3) If L is E-compact, then L is both U-compact and I-compact.
Exercise 5.13.15 Prove the above theorem. (Hint: See van Fraassen (1971).)
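Definition 5.13.11 is semantic rather than syntactic, and for a finite toy case one can search for exclusion negations directly. The "language" S and the admissible valuations V below are invented for illustration; the sentence named np is rigged to flip p in every valuation.

```python
# Hunting for exclusion negations (Definition 5.13.11) by scanning the
# class V of admissible valuations of a three-sentence toy language.

S = ['p', 'np', 'q']
V = [{'p': 't', 'np': 'f', 'q': 't'},
     {'p': 'f', 'np': 't', 'q': 't'},
     {'p': 'f', 'np': 't', 'q': 'f'}]

def exclusion_negations(phi):
    # ψ is an exclusion negation of φ iff every admissible v gives
    # ψ and φ opposite truth values
    return [psi for psi in S
            if all((v[psi] == 't') == (v[phi] == 'f') for v in V)]

assert exclusion_negations('p') == ['np']   # np flips p in every valuation
assert exclusion_negations('q') == []       # nothing in S always flips q
```

Since q has no exclusion negation in S, this little language is not closed under exclusion negation, so Theorem 5.13.13 would not apply to it.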
Of course, in classical logic, every sentence φ has an exclusion negation, being the syntactically produced negation ∼φ. Accordingly, in classical logic, all forms of compactness collapse into a single form of compactness. Having discussed the various forms of semantic compactness, which are the same in classical logic but not in general, we now discuss topological compactness, after which we show that the former is a species of the latter. We begin with the definition of a topological space.

Definition 5.13.16 Let S be a non-empty set, and let O be a non-empty collection of subsets of S. Then O is said to be a topology on S precisely if the following conditions are satisfied:
(t1) ∅ ∈ O.
(t2) S ∈ O.
(t3) If X ∈ O and Y ∈ O, then X ∩ Y ∈ O.
(t4) If C ⊆ O, then ∪C ∈ O.
Treating the elements of 0 as open sets, (t1 )-( t4) can be read as saying that 0 and S are open sets, ~hat the intersection of any finite collection of open sets is itself an open set, that the umon of any collection of open sets is itself an open set. Dually, treating the complements of elements of 0 as closed sets, (tl)-(t4) can be read as saying the dual; namely, 0 and S are closed sets; the intersection of any collection of closed sets is itself a closed set; the union of any finite collection of closed sets is itself a closed set. We next tum to the customary topological definition of compactness. Definition 5.13.19 Let (S, 0) be a topological space, and let C be any collection of subsets of S. Then C is said to be a cover if U C = S, and C is said to be an open cover if additionally every element of C is an open set; i.e., C ~ O. Definition 5.13.20 Let (S, 0) be a topological space, and let C be any cover, and let ~ C. Then C' is said to be a subcover of C if c' is also a covel; and C' is said to be a finite subcover if it is additionally finite. C'
Definition 5.13.21 A topological space (S, 0) is said to be compact if every open cover has afinite subcove!: In other words, ifC ~ 0, and U C = S, then there is afinite subset c' of C such that U C' = S.
We now tum to the question of how semantic compactness and topological compactness are related. In order to do this, we first discuss how one can convert an evaluationally constrained language into a quasi-topological object, namely a valuation space, which was defined in Section 5.10, together with the notion of an elementmy class.
Now, the collection of elementary classes on L need not form a topology on V; i.e., a valuation space need not be a topological space. On the other hand, the elementary classes can be used to construct a topology on V. This is a special case of a general theorem, stated as follows.

Theorem 5.13.22 Let S be a non-empty set, and let C be any collection of subsets of S. Let int(C) = {∩X : X ⊆ C, and X is finite}; let T(C) = {∪D : D ⊆ int(C)}. Then T(C) is a topology on S.

Exercise 5.13.23 Prove the above theorem. (Hint: Note that ∩∅ = S, and ∪∅ = ∅.)

In other words, to construct a topology from an arbitrary collection C of subsets of S, first one forms all the finite intersections of elements of C, and then one takes these sets and forms arbitrary unions. In this manner, one can construct a topological space from any valuation space. But before we deal with that construction, we discuss the compactness of (S, T(C)).

Definition 5.13.24 Let S be a non-empty set, and let C be any collection of subsets of S. Then C is said to have the finite union property if for any subset D of C, if ∪D = S, then there is a finite subset D′ of D such that ∪D′ = S.

In other words, the finite union property is simply the compactness property applied to an arbitrary collection C of subsets of a set S, irrespective of whether C forms a topology on S.

Theorem 5.13.25 Let S be a non-empty set, let C be any collection of subsets of S, and let (S, T(C)) be the topological space on S induced by C. Then (S, T(C)) is compact iff C has the finite union property.

Exercise 5.13.26 Prove the above theorem. (Hint: One half ("only if") is trivial; the other half ("if") is proved by extensive appeal to various properties of infinite union.)

The dual of the finite union property is the finite intersection property, which is related to I-compactness, and which is defined as follows.

Definition 5.13.27 Let S be a non-empty set, and let C be any collection of subsets of S. Then C is said to have the finite intersection property if for any subset D of C, if ∩D = ∅, then there is a finite subset D′ of D such that ∩D′ = ∅.
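The two-step construction in Theorem 5.13.22 (finite intersections, then arbitrary unions) can be replayed on a small finite example. The sets S and C are invented; since everything is finite, "arbitrary unions" reduce to unions of subfamilies.

```python
# Generating the topology T(C) of Theorem 5.13.22 on a three-point set.
from itertools import chain, combinations

S = frozenset({1, 2, 3})
C = [frozenset({1, 2}), frozenset({2, 3})]

# int(C): all finite intersections (the empty intersection is S itself)
ints = {S} | {frozenset.intersection(*sub)
              for r in range(1, len(C) + 1)
              for sub in combinations(C, r)}

# T(C): all unions of subfamilies of int(C) (the empty union is ∅)
ints = list(ints)
T = {frozenset(chain.from_iterable(sub))
     for r in range(len(ints) + 1)
     for sub in combinations(ints, r)}

# verify the topology axioms (t1)-(t4), with (t4) checked pairwise,
# which suffices in the finite case
assert frozenset() in T and S in T               # (t1), (t2)
assert all(a & b in T for a in T for b in T)     # (t3)
assert all(a | b in T for a in T for b in T)     # (t4)
```

For this C the result is the five-element topology {∅, {2}, {1, 2}, {2, 3}, S}, with {2} arising as the intersection of the two generators.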
We now return to evaluationally constrained languages and valuation spaces. First, two simple theorems.

Theorem 5.13.28 An evaluationally constrained language is U-compact iff the associated valuation space has the finite union property.

Theorem 5.13.29 An evaluationally constrained language is I-compact iff the associated valuation space has the finite intersection property.

Exercise 5.13.30 Prove the above theorems.

Next, we define the topology associated with an evaluationally constrained language.
Definition 5.13.31 Let (S, V) be an evaluationally constrained language, and let (V, {V(φ) : φ ∈ S}) be the associated valuation space. The topological space induced by (S, V) is the topological space (V, T({V(φ) : φ ∈ S})).

We conclude this section with the theorem linking semantic and topological compactness.

Theorem 5.13.32 An evaluationally constrained language L is U-compact iff the topological space induced by L is compact.

Exercise 5.13.33 Prove the above theorem.

5.14 The Three-Fold Way
The following remarks should help clarify the role of matrices and atlases in the definition of consequence, as well as the notion(s) of (quasi-, partially) interpreted language. Elementary logic books usually make one of three suggestions regarding the nature of logical validity: (1) It is a matter of "logical form"; all arguments of the same form are valid. (2) It is a matter of "logical necessity"; in every possible world in which the premises are true, the conclusion is also true. (3) All of the above. The usual definition of validity using models fudges the distinction between (1) and (2), since a model may variously be viewed as an interpretation or as a possible world. Although books rarely distinguish the first from the second, they are clearly different. Consider the argument:

Snow is white or grass is green. Therefore, grass is green.

The first criterion has one changing the meaning of the atomic constituents, and assessing the actual world for truth and falsity. This change of meaning is in practice usually accomplished by a "translation" that substitutes other constituents of the appropriate grammatical type. Thus in the case in point, one can substitute the sentence "grass is purple" for "grass is green," obtaining the following argument "of the same form," in which the premise is actually true but the conclusion false:

Snow is white or grass is purple. Therefore, grass is purple.

The second test has one performing thought experiments about "science fiction" worlds in which grass is purple, in which case the premise is true but the conclusion is not. The third test has one doing whichever comes quickest to mind, and maybe even a combination of the two.

To be somewhat more formal, let us suppose that we have an atlas A, and adopt the useful fiction that A is the set of all propositions, or at least all the propositions within some realm of discourse. Let us suppose further that there is some particular interpretation I₀ that assigns to p the proposition that snow is white, and assigns to q the proposition that grass is green. Now, consider the argument p ∨ q ⊢ q.
Criterion (1) amounts to fixing on a particular designated subset Dᵢ, e.g., that designated subset D₀ which contains the propositions true in the actual world, and then considering all of the various interpretations I, e.g., an I₁ that continues to assign the proposition that snow is white to p but assigns the proposition that grass is purple to q. In fact, as far as criterion (1) is concerned, one really does not need an atlas, but could get by with a matrix instead, since one only looks at a single designated subset. Thus in effect we have a locally evaluated interpretationally constrained language.

Criterion (2) uses the other designated subsets, but only a single interpretation, say again I₀. This is in effect to consider an interpreted language. One considers then another designated subset D₁, say the one that still contains the proposition that snow is white, and hence the proposition that snow is white or grass is green, but which does not contain the proposition that grass is green (containing instead, say, the proposition that grass is purple).

Criterion (3) allows both the interpretation and the designated subset to change, and this time the needed apparatus is a globally evaluated interpretationally constrained language. Thus one might reject the validity of the argument above by changing both the interpretation and the designated subset.²

Incidentally, criterion (1) has a syntactic rendering. In changing the meaning, one can do it by considering all sentences of the same form. This may or may not give the same result as changing the propositions, depending upon the expressive resources of the language. To be more formal, we would say that the argument φ ⊢ ψ is valid iff for every substitution σ, if I₀(σ(φ)) ∈ D₀, then I₀(σ(ψ)) ∈ D₀. Let us consider a language for classical sentential logic without negation and which has only the two atomic sentences p and q (the example can trivially be extended to accommodate more). Let us assume further that I₀(p) is the true proposition that snow is white, and that I₀(q) is the true proposition that grass is green. Then p ∨ q ⊢ q would end up as valid.

All this relates to Quine's famous characterization of a logical truth. In Quine (1961, pp. 22-23), logical truth is characterized as "a statement which is true and remains true under all reinterpretations of its components other than the logical particles." This sounds like criterion (1), specialized to the case of unary assertion, and read in a semantical tone of voice. However, on other occasions Quine has said the same thing in a more syntactical tone, talking of substitutions for components other than the logical particles, as in Philosophy of Logic: "a logical truth is a truth that cannot be turned false by substituting for lexicon. When for its lexical elements we substitute any other strings belonging to the same grammatical categories, the resulting sentence is true" (Quine 1986, p. 58).

² This is useful from a pedagogical point of view in that changing the interpretation does not always produce premises which are literally true and a conclusion that is literally false, but rather more likely premises that are "almost true" and a conclusion that is "almost false." So one still has to tell some little story about the world to get things to turn out right. In the example above, one says things like: let's suppose that snow never gets splattered with mud, etc., and that grass never gets sprayed with purple paint or whatever.

A natural question arises as to whether and when the three criteria agree with each other. This question is complicated by the fact that in assessing the validity of an argument, one should be free to quantify over all atlases (matrices), but to start with let us
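The substitutional form of criterion (1) can be checked by brute force for the toy language above. The sketch below is our own (all names hypothetical, not from the text), and restricts substitutions to atoms for atoms; it confirms that with only true atoms and no negation the test cannot fail, while adding a false atom (read r as "grass is purple") refutes p ∨ q ⊢ q:

```python
from itertools import product

def evaluate(formula, world):
    """Evaluate a formula (nested tuple; atoms are strings) in a world
    assigning truth values to atoms. Connectives: 'or', 'and'."""
    if isinstance(formula, str):
        return world[formula]
    op, left, right = formula
    lv, rv = evaluate(left, world), evaluate(right, world)
    return (lv or rv) if op == "or" else (lv and rv)

def substitute(formula, sigma):
    """Apply a substitution sigma (atom -> atom) throughout a formula."""
    if isinstance(formula, str):
        return sigma.get(formula, formula)
    op, left, right = formula
    return (op, substitute(left, sigma), substitute(right, sigma))

def form_valid(premise, conclusion, atoms, world):
    """Criterion (1), depth-0 fragment: the argument survives every
    substitution of atoms for p and q, assessed in the actual world."""
    return all(
        not evaluate(substitute(premise, {"p": a, "q": b}), world)
        or evaluate(substitute(conclusion, {"p": a, "q": b}), world)
        for a, b in product(atoms, repeat=2)
    )

premise, conclusion = ("or", "p", "q"), "q"

actual = {"p": True, "q": True}                    # snow is white, grass is green
assert form_valid(premise, conclusion, ["p", "q"], actual)

richer = {"p": True, "q": True, "r": False}        # r: "grass is purple"
assert not form_valid(premise, conclusion, ["p", "q", "r"], richer)
```

The counterexample found in the richer language is exactly the one in the text: substituting r for q yields a true premise (p ∨ r) and a false conclusion (r).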
fix on a single atlas. Then clearly criterion (3) implies the other two. We leave as an open problem the investigation of other relationships among the three criteria, both abstractly and in more concrete circumstances (say, for the case of classical logic, where the atlases in question are all Boolean algebras, and where the designated sets Dᵢ are all the maximal filters).
THE VARIETIES OF LOGICAL EXPERIENCE
6 LOGIC

6.1 Motivational Background
We take the view that "consequence" (what follows from what) is the central business of logic. Strangely, this theme, clear from the time of Aristotle's syllogistics, has been obscured in modern times, where the emphasis has often been on the laws of logic, where these laws are taken not as patterns of inference (relations between statements) but rather as logical truths (statements themselves). Actually it seems that Aristotle himself was at least partly responsible for starting this perverse view of logic, with his so-called three laws of thought (non-contradiction, excluded middle, and identity), but we lay the major blame on Frege, and the logistic tradition from Peano, Whitehead and Russell, Hilbert, Quine, and others. This tradition views logic along a model adapted from Euclid's geometry, wherein certain logical truths are taken as axioms, and others are then deduced from these by way of a rule or two (paradigmatically, modus ponens).

Along the way there were some divergent streams, in particular the tradition of natural deduction developed by Jaśkowski (1934) and Gentzen (1934-35), and promulgated in a variety of beginning logic texts by Quine (1950), Copi (1954), Fitch (1952), Kalish and Montague (1964), Lemmon (1965), and Suppes (1957), to name some of the most influential. That this was indeed an innovative view of logic when measured against the axiomatic approach can be seen in a series of papers by Popper (e.g., 1947) concerning what he viewed as "logic without foundations."

The view of logic as centered around consequence has been a major thrust of postwar Polish logic, building on earlier work by Tarski on the "consequence operator." In particular, a paper by Łoś and Suszko (1957) laid the framework for much later Polish work. Our discussion here will also utilize much of this framework, although we will not take the trouble to always tie specific ideas to the literature.
There is one more influence that we must acknowledge, and it too started with Gentzen (1934-35). It is well known that in developing his "Sequenzenkalkül" for classical logic, he found need for "multiple conclusions." Thus he needed to extend the idea of a "singular sequent" Γ ⊢ φ (a set of premises Γ implies the sentence φ) to "multiple sequents" Γ ⊢ Δ (a set of premises Γ implies a set of conclusions Δ). This last is understood semantically as "every way in which all the premises are true is a way in which some of the conclusions are true." Alternatively, it can be explained informally that the premises are understood conjunctively, whereas the conclusions are understood disjunctively. There is a kind of naive symmetry about this that we shall make more precise below, but in the meantime we shall dub the relation we shall be discussing symmetric consequence. Now there seems to us nothing to have been written
in the sky that says logic should focus on arguments with multiple conclusions. Indeed, in the work of Gentzen, the use of multiple conclusions appears to be more or less a technical device to accommodate classical logic's symmetric attitude towards truth and falsity (and this seems true of the subsequent work of Kneale (1956) and Carnap (1943, p. 151) regarding what the latter dubbed "involution"). But more recently the work of Scott (1973) and Shoesmith and Smiley (1978) has shown the theoretical utility of considering multiple conclusion arguments in a more general setting, and we shall build on their work below.
6.2 The Varieties of Logical Experience

We shall here make a quick sketch of various ways of presenting logical systems:

(1) unary assertional systems, ⊢ φ;
(2) binary implicational systems, φ ⊢ ψ;
(3) asymmetric consequence systems, Γ ⊢ φ;
(4) symmetric consequence systems, Γ ⊢ Δ.
Since it is understood here that the sets of sentences Γ, Δ can be empty or of course singletons, (1) is a special case of (3), and (3) can be viewed as a special case of (4). Also in the same sense, (2) is a special case of (3) and (4). There are clearly other variants on these notions that will occur to the reader, e.g.,

(5) unary refutational systems, φ ⊢ ("φ is refutable"),

or versions of (3) and (4) where the sets Γ, Δ are required to be finite, etc. Consider a symmetric consequence Γ ⊢ Δ. Either of Γ or Δ can be required to be finite, and having made that choice, one can further choose to restrict Γ or Δ to have a specific number of sentences. 1 or 0 are popular choices, though Aristotle would have restricted Γ to 2 and Δ to 1 for the syllogistic inferences (but both Γ and Δ to 1 for the immediate inferences). Sometimes the specific number is a maximum, as with Gentzen's (1934-35) treatment of intuitionistic logic, where Δ can have either 1 or 0 members. But for simplicity, let us restrict our attention to the two choices of requiring finiteness, or not requiring finiteness, and then go on to supplement the first choice with two specific exact numbers, 1 or 0. This gives us 2 × 2 = 4 choices for each of Γ and Δ, or then 4 × 4 = 16 choices for Γ ⊢ Δ.

Logicians do not always bother to formally distinguish all of these variations because in "real life," logics tend to have the following properties: (a) compactness (and dilution), and (b) the presence of connectives that indicate structural features. By "compactness" we mean the property that if Γ ⊢ Δ is valid, then so is some Γ₀ ⊢ Δ₀, where Γ₀ is a finite subset of Γ and Δ₀ is a finite subset of Δ. By "dilution" we mean the converse (but where Γ₀ and Δ₀ do not necessarily have to be finite). Clearly, then, given (a), the restriction to finite sets is otiose.
By (b) we mean to refer to the phenomenon that allows one to replace

φ₁, ..., φₘ ⊢ ψ₁, ..., ψₙ

with first

φ₁ ∧ ... ∧ φₘ ⊢ ψ₁ ∨ ... ∨ ψₙ,

and then

⊢ φ₁ ∧ ... ∧ φₘ → ψ₁ ∨ ... ∨ ψₙ,

or that allows one to replace φ ⊢ with ⊢ ¬φ.
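For classical two-valued semantics, the collapse described in (b) can be verified by truth tables on a small example. The sketch below is our own illustration (all names hypothetical), checking that a multiple sequent holds iff the corresponding single conditional is a tautology:

```python
from itertools import product

atoms = ["p", "q", "r"]

def worlds():
    """All classical truth-value assignments to the atoms."""
    return ({a: v for a, v in zip(atoms, vals)}
            for vals in product([True, False], repeat=len(atoms)))

def sequent_holds(premises, conclusions):
    """Γ ⊢ Δ: every world making all premises true makes some conclusion true."""
    return all(any(c(w) for c in conclusions)
               for w in worlds()
               if all(prem(w) for prem in premises))

def tautology(f):
    return all(f(w) for w in worlds())

p, q, r = (lambda w: w["p"]), (lambda w: w["q"]), (lambda w: w["r"])

# p, q ⊢ q, r  agrees with  ⊢ (p ∧ q) → (q ∨ r): both hold.
assert sequent_holds([p, q], [q, r]) == tautology(
    lambda w: not (p(w) and q(w)) or (q(w) or r(w)))

# p ⊢ q  agrees with  ⊢ p → q: both fail.
assert sequent_holds([p], [q]) == tautology(lambda w: (not p(w)) or q(w))
assert not sequent_holds([p], [q])
```

The point of the book's caveat, of course, is that this agreement depends on the presence of ∧, ∨, and →; in a fragment lacking one of these connectives the replacement is simply unavailable.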
But in general there is no reason to think that (1)-(5), or the various variants that the reader may produce given our few hints, are equivalent. Certainly it is not hard to imagine (b) failing for real-life examples (think, for example, of studying fragments of even classical logic where various crucial connectives are missing). But there are other examples of full non-classical logics. There are not many real-life examples where (a) fails (though quantum logic and supervaluational classical logic certainly count, at least as usually considered). Compactness typically fails for second-order logics with standard models, and these are beyond the scope of this book. Actually there is a way of looking at the consequence relation of relevance logic so that dilution fails. But one need not look at the consequence relation in just this way, and so this is only a slight caveat.

We shall focus in this book on just the four varieties of presentations of logics that lead off this section, and indeed largely concentrate on unary assertional systems, asymmetric consequence relations, and symmetric consequence relations, even though binary implicational systems are perhaps the presentation that most fits the idea of thinking of logics as ordered algebras. We consider one last way of presenting a logic which is of even closer affinity to algebraic approaches to logic, namely

(6) equivalential systems, φ ⊣⊢ ψ.

Here φ ⊣⊢ ψ is to be understood as saying that ψ is a consequence of φ, and vice versa. This way of thinking of logic is not quite on all fours with the others; at least it is true that one cannot think of (6) as a special case of (4) in just the same way that one can with the others. But clearly (6) is not altogether unrelated to (2), and given the emphasis in algebraic studies on identity (equational classes), it is not too surprising that equivalence should raise its head.

Before leaving the topic of the various ways of presenting logical systems, we make a terminological point or two. For uniformity, in the sequel we can always assume that each variety is a special case of the symmetric consequence (with empty left-hand side, etc., as needed), but we shall not always bother to explicitly respect this point. We shall also in the sequel identify a system with its consequence relation, and speak of the two interchangeably.
6.3 What Is (a) Logic?
We do not in this section presume to decide issues between classical logic and its competitors (intuitionistic logic, etc.) as to which is really logic. We just want to lay down certain natural requirements on any system that is even in the running. We shall do this for at least the main different varieties discussed in Section 6.2.

In this section we shall presuppose a universe S of statements. The word "statement" has a somewhat checkered philosophical past, sometimes being used by writers to refer to sentences, sometimes to propositions, sometimes to more subtle things like declarative speech acts, etc. We here take advantage of these ambiguities to appropriate it for a somewhat abstract technical use, allowing it to be any of these things and anything else as well. Quite likely the reader will think of S as a denumerable set of sentences of English or some other natural language, but it is our intention that the elements of S may be any items at all, including ships and shoes and sealing wax, natural numbers or even real numbers. The important thing to emphasize about the elements of S is that at the present stage of investigation, we are considering them as having no internal structure whatsoever. This does not mean that they in fact have no internal structure; they may in fact be the sentences of English with their internal grammatical structure. What it does mean is that any internal structure which they do have is disregarded at the present level of abstraction.

Let us start with the familiar unary systems. Then a natural way to think of a logic L is that it is just a subset of S (the theorems). We shall write ⊢_L φ for φ ∈ L. There are various other things that one might build into a logic, perhaps something about its having axioms and rules of inference. But we take these things as having to do with the particular syntactical presentation of the proof theory of a logic, and so do not want to take these as part of the abstract notion of a logic itself. If the reader has some reservations, thinking that still there ought to be some more structure representing all the valid rules of inference, this is just a reason for the reader to prefer consequence systems of one variety or the other.

Before talking about these, we shall first pause to discuss binary systems. There are really two kinds: binary implicational systems and binary equivalential systems. In both cases a logic L is understood as a set of pairs of statements. The difference between the implicational systems and the equivalential systems is brought out by further requirements. Of course we require of both kinds of system:

reflexivity, (φ, φ) ∈ L;
transitivity, (φ, ψ) ∈ L and (ψ, χ) ∈ L only if (φ, χ) ∈ L.

But we require further of an equivalential system:

symmetry, (φ, ψ) ∈ L only if (ψ, φ) ∈ L.

Turning now to consequence systems, for the asymmetric versions, a logic will be understood to again be a set of pairs, but this time the first component of each pair is a set of statements and the second component still just a statement. (For symmetric consequence the second component too will be a set of statements.)
For an (asymmetric) consequence system we further require generalized forms of reflexivity and transitivity first set down by Gentzen (1934-35) (as is customary, we shall write "Γ, Δ" in place of the more formal "Γ ∪ Δ", and "Γ, φ" in place of "Γ ∪ {φ}"). Actually we shall need the property that Gentzen called "cut" in a stronger form, so we shall first define an (asymmetric) pre-consequence relation as a relation ⊢ between sets of statements and statements satisfying the following two properties:

overlap, if φ ∈ Γ, then Γ ⊢_L φ;
cut, if Γ ⊢_L φ and Γ, φ ⊢_L ψ, then Γ ⊢_L ψ.

We require further (with Gentzen):

dilution, if Γ ⊢_L φ then Γ, Δ ⊢_L φ.

There is a strengthening of cut that we require in addition for a full-fledged consequence relation:

infinitary cut, if Γ ⊢_L φ for all φ ∈ Δ, and Γ, Δ ⊢_L ψ, then Γ ⊢_L ψ.

Clearly infinitary cut includes plain cut as a special case; and they are equivalent when the consequence relation is compact, but not in general.

Exercise 6.3.1 Show the above claims. (Hint: To show the non-equivalence give an example of a pre-consequence relation that lacks infinitary cut, i.e., is not a consequence relation.)
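For a finite set of statements, the properties of overlap, dilution, and plain cut can be checked exhaustively. The following sketch is our own encoding (all names hypothetical): an asymmetric relation is represented as a set of (premise-set, conclusion) pairs:

```python
from itertools import combinations

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def has_overlap(rel, S):
    # overlap: φ ∈ Γ implies Γ ⊢ φ
    return all((g, a) in rel for g in subsets(S) for a in g)

def has_dilution(rel, S):
    # dilution: Γ ⊢ φ implies Γ, Δ ⊢ φ
    return all((g | d, a) in rel for (g, a) in rel for d in subsets(S))

def has_cut(rel, S):
    # plain cut: Γ ⊢ φ and Γ, φ ⊢ ψ imply Γ ⊢ ψ
    return all((g, psi) in rel
               for (g, phi) in rel
               for (g2, psi) in rel
               if g2 == g | {phi})

S = {"p", "q"}

# The smallest relation over S with overlap; it also satisfies dilution and cut.
rel = {(g, a) for g in subsets(S) for a in g}
assert has_overlap(rel, S) and has_dilution(rel, S) and has_cut(rel, S)

# Adding ∅ ⊢ p and p ⊢ q without adding ∅ ⊢ q breaks cut.
rel3 = rel | {(frozenset(), "p"), (frozenset({"p"}), "q")}
assert has_overlap(rel3, S) and not has_cut(rel3, S)
```

Infinitary cut is not exhibited here: over a finite S it coincides with plain cut plus compactness, so a genuine counterexample separating the two requires an infinite statement set, as the hint to Exercise 6.3.1 indicates.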
There is another way of treating consequence, namely as an operation on a set of statements Γ producing yet another set of statements Cn(Γ) (the set of "consequences of Γ"). This was the idea of Tarski and it has become the standard view of logic of the Polish School. Clearly it seems largely a matter of style as to whether one writes Γ ⊢ φ or φ ∈ Cn(Γ), as we pin down in the following.

Exercise 6.3.2 Show that the properties of consequence as a relation listed above (including infinitary cut) are implied by the following properties of the consequence operation:

(i) Γ ⊆ Cn(Γ);
(ii) Cn(Cn(Γ)) = Cn(Γ);
(iii) if Γ ⊆ Δ then Cn(Γ) ⊆ Cn(Δ).

Show conversely that those properties of the consequence relation, with infinitary cut, imply properties (i)-(iii) of the consequence operation.

For symmetric pre-consequence, the properties of overlap, etc. must be slightly generalized. Thus we have (we begin here to omit the subscript L on ⊢_L as understood in context):

overlap, if Γ ∩ Δ ≠ ∅, then Γ ⊢ Δ;
cut, if Γ ⊢ φ, Δ and Γ, φ ⊢ Δ, then Γ ⊢ Δ;
dilution, if Γ ⊢ Δ then Σ, Γ ⊢ Δ, Θ.

For full-fledged symmetric consequence we must again strengthen cut in some appropriate infinitary way. To this end we define the global cut property for symmetric consequence, but first we define a quasi-partition.

Definition 6.3.3 Given any set of statements Σ, let us define a quasi-partition of Σ to be a pair of disjoint sets Σ₁, Σ₂ such that Σ = Σ₁ ∪ Σ₂ (the reason why this is called a quasi-partition is that we allow one of Σ₁ or Σ₂ to be empty).

Definition 6.3.4 We say that ⊢ has the global cut property iff given any set of statements Σ, whenever not (Γ ⊢ Δ) then there exists a quasi-partition Σ₁, Σ₂ of Σ such that not (Σ₁, Γ ⊢ Δ, Σ₂).

The global cut property for symmetric consequence clearly implies the cut property, for, proceeding contrapositively, if not (Γ ⊢ Δ), then choosing Σ = {φ}, we have either not (Γ, φ ⊢ Δ), or not (Γ ⊢ φ, Δ). It can also be shown to imply the infinitary cut property, even in its stronger symmetric form:

symmetric infinitary cut, if Γ ⊢ φ, Θ (for all φ ∈ Δ), and Γ, Δ ⊢ Θ, then Γ ⊢ Θ.

Theorem 6.3.5 Let ⊢ be a symmetric consequence relation. Then ⊢ satisfies symmetric infinitary cut.

Proof Proceeding by indirect proof, let us suppose the hypotheses of symmetric infinitary cut, and yet suppose that not (Γ ⊢ Θ). The global cut property tells us that we must be able to divide up Δ into Δ₁ and Δ₂ so that not (Δ₁, Γ ⊢ Θ, Δ₂). Clearly Δ₂ must be empty, for if some φ ∈ Δ₂, then Δ₁, Γ ⊢ Θ, Δ₂ by virtue of dilution applied to the given hypothesis that Γ ⊢ φ, Θ. So all of Δ must end up on the left-hand side, that is, we have not (Δ, Γ ⊢ Θ). But this is impossible, since its opposite is just the given hypothesis that Γ, Δ ⊢ Θ. □

Note that for the case that Σ = S, the global cut property guarantees that if not (Γ ⊢ Δ), then there is a partition of S into two halves T, F such that Γ ⊆ T, Δ ⊆ F, and not (T ⊢ F) (indeed this "special case" is equivalent to the global cut property).

Exercise 6.3.6 Prove that the global cut property is equivalent to the "special case" when Σ = S (the set of all sentences).

We shall not make any further moves to justify these "Gentzen properties" as desirable features of a "logic," but we hope that they at least strike the reader as natural. Their fruitfulness will be clear from the results of the next section.

6.4 Logics and Valuations
By a valuation of a set of statements S is meant an assignment of the values t or f to the elements of S. This is just an extension of our usage in the previous chapter so as to allow the arguments of the valuation to be statements, which might be sentences (as required in the previous chapter), but might be propositions or something else entirely. It is convenient and customary to think of the assignment, let us call it v, as a function defined on all the elements of S. This amounts to saying that each statement has a truth value, and no statement has more than one truth value, although for many purposes (reflecting attempts to model the imperfections of natural languages) these restrictions may seem over-idealistic. A valuation clearly partitions the statements in S into two halves, which we denote as Tᵥ and Fᵥ.

Recall from the previous chapter the notion of a semi-interpreted language as a pair (S, V), where V is some set of valuations of the algebra S of sentences of the language L, which we call the admissible valuations. For a while at least, it is unimportant that the sentences have any internal structure or that they are sentences at all, so we shall replace L with a set of "statements" S (recall this is just any non-empty set), and talk of a semi-interpreted semi-language (the reader is assured that we will not employ this barbarism often). Note that every semi-interpreted semi-language has a natural symmetric consequence relation:

Γ ⊢ Δ iff for every v ∈ V, if v assigns t to every member of Γ, then v assigns t to some member of Δ.
We write ⊢(V) for this relation (also in accord with the usual conventions about functional notation, we sometimes write ⊢ᵥ).
Exercise 6.4.1 Show that a class of valuations of a set of statements S always gives rise to a symmetric consequence relation on the set S.

Not only does a class of valuations determine a symmetric consequence system, but in a similar fashion it also determines an asymmetric consequence system, a binary implicational system, a unary assertional system, a left-sided unary refutational system, an equivalential system, etc. (All of the explicitly listed systems, save the equivalential, are just special cases of the symmetric consequence obtained by requiring the right-hand set to be a singleton, both left- and right-hand sets to be singletons, the left-hand set to be empty and the right-hand set to be a singleton, etc.) Thus to consider explicitly just one more case that interests us, a unary assertional system can be defined so that ⊢ᵥ φ iff v(φ) = t for every valuation v ∈ V.

The important thing about a consequence relation of any kind is not only that a class of valuations determines the consequence relation, but also that the converse is true. Where ⊢ is a symmetric consequence relation, we shall say that a valuation v respects ⊢ if there is no pair of sets of statements Γ, Δ such that Γ ⊢ Δ, and yet v(Γ) = t and v(Δ) = f. (We write v(Γ) = t to mean that v(γ) = t for all γ ∈ Γ, and we similarly write v(Δ) = f to mean that v(δ) = f for all δ ∈ Δ.) Analogously, when ⊢ is an asymmetric consequence relation, respect amounts to there being no set of statements Γ and statement φ such that Γ ⊢ φ, while v(Γ) = t and v(φ) = f. And when ⊢ is unary assertional consequence, respect just amounts to there being no statement φ such that ⊢ φ and yet v(φ) = f. (We leave it to the interested reader to figure out the appropriate extensions to other varieties of logical presentation.) Given a consequence relation of any kind, we define V(⊢) = {v : v is a valuation respecting ⊢}. (We also write this as V⊢.)
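The relation ⊢_V can be computed directly when S and V are finite; a minimal sketch (our names, not the text's), with valuations represented as dictionaries from statements to booleans:

```python
def entails(V, gamma, delta):
    """Γ ⊢_V Δ: every admissible valuation making all of Γ true
    makes at least one member of Δ true."""
    return all(any(v[d] for d in delta)
               for v in V
               if all(v[g] for g in gamma))

# Hypothetical example: S = {p, q}, with all four classical valuations admissible.
V = [{"p": a, "q": b} for a in (True, False) for b in (True, False)]

assert entails(V, {"p"}, {"p", "q"})       # overlap, diluted on the right
assert entails(V, {"p", "q"}, {"q"})
assert not entails(V, {"p"}, {"q"})        # p does not entail q
assert not entails(V, set(), {"p", "q"})   # the all-false valuation refutes it
```

Restricting `delta` to singletons recovers the asymmetric relation, and taking `gamma` empty with a singleton `delta` recovers the unary assertional case, exactly as in the text.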
We speak of a class of valuations V being sound with respect to a consequence relation ⊢ (of any kind) just when every v ∈ V respects ⊢, i.e., ⊢ ⊆ ⊢ᵥ. And we speak of a consequence relation ⊢ being complete with respect to a class of valuations V just when conversely ⊢ᵥ ⊆ ⊢. Intuitively, this amounts to ⊢ being strong enough to capture all of the inferences (of the appropriate kind: unary, asymmetric, symmetric, etc.) valid in V. Soundness and completeness then just mean that ⊢ = ⊢ᵥ.

Naturally, then, the question arises as to when a logical system is both sound and complete with respect to some class of valuations. This can be expressed informally as the question as to when a logic has a semantics. The next two sections will address this question for asymmetric and symmetric consequence systems, respectively. We shall here and now dispose of the question for unary assertional systems, since this is such an easy and special case. Thus given a logic L and its unary consequence relation ⊢, it is easy to show that L is always sound and complete with respect to a singleton class of valuations. Thus simply define V = {v}, where v is the "characteristic" function for ⊢, i.e., v assigns t to just those sentences ψ for which ⊢ ψ. Not only is L sound and complete with respect to {v}, but clearly then also with respect to V⊢ (the set of valuations respecting ⊢, of which v is the one that "does all the work").

Theorem 6.4.2 (Completeness for unary assertional consequence) Let ⊢ be a unary assertional consequence relation. Then ⊢ is sound and complete with respect to a class of valuations V = {v}, where v is the characteristic function for ⊢. Thus ⊢ is also sound and complete with respect to V⊢.

We shall not find things so easy in dealing with asymmetric and symmetric consequence (these are progressively more difficult). We shall find that we need to place a few natural conditions on ⊢ so as to get any result at all, and we shall find that we cannot get by with singletons. We shall practice first on the "toy" case of binary consequence. The reader will find that several main themes will be introduced in this context.
6.5 Binary Consequence in the Context of Pre-ordered Sets
Let us fix throughout this section a set S, the elements of which we think of as "statements" (sentences or propositions). There are two ways to think of determining a binary implication relation on S. The first, which we shall call the direct way, is to simply postulate a binary relation of implication ≤ on S. It is natural to give ≤ the properties of a pre-ordering, i.e., reflexivity and transitivity. Thus by an (abstract) binary implication relation we shall simply mean a pre-order ≤. Anti-symmetry (the only ingredient missing from a partial ordering) also springs to mind. But surely when the elements of S are sentences there is no reason to think that two elements which co-imply one another must be identical. Thus consider p and p ∧ p, the latter of which clearly contains a conjunction sign that the former does not, though Martin and Meyer (1982) show that an interesting "minimal" system of pure implicational logic proposed by Belnap has precisely the coincidence of logical equivalence and identity. Even when the elements are propositions, the situation is somewhat dicey, since in the philosophical literature there are some conceptions of propositions that allow for finer-grained distinctions than mere logical equivalence.

There is another (indirect) way to induce a binary implication relation on the set S. Let us suppose that we have a set V of valuations, i.e., mappings of S into {t, f}. We can define the relation a ≤ᵥ b iff for every v ∈ V, if v(a) = t then v(b) = t. Valuations are just characteristic functions picking out subsets of S, so alternatively we could start with a set J of subsets of S (we call T ∈ J a truth set), and define a ≤ⱼ b iff for every T ∈ J, if a ∈ T then b ∈ T. Given a valuation v, we shall denote its truth set ({a : v(a) = t}) by Tᵥ, and given a truth set T, we shall denote the valuation which assigns t to its members (and f to its non-members) by v_T.

The direct way of specifying an implication relation is most familiar in a proof-theoretic context, and the indirect way is certainly reminiscent of a more model-theoretic (semantical) context. The question then naturally arises as to whether and when these two ways agree. This is the subject we now address. We begin somewhat abstractly. Let J be a collection of subsets of S, and R be a binary relation on S, characterized by the property that (1) aRb iff for every T ∈ J, if a ∈ T then b ∈ T.
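The indirect, truth-set way of inducing ≤_J can be made concrete; the sketch below is our own (the truth sets are hypothetical), and also exhibits the failure of anti-symmetry discussed above:

```python
def leq_J(J, a, b):
    """a ≤_J b iff every truth set containing a also contains b."""
    return all(b in T for T in J if a in T)

# Hypothetical truth sets over S = {p, q, p∧p}: the statements p and p∧p
# stand or fall together in every truth set, though they are distinct.
J = [frozenset({"p", "p∧p"}),
     frozenset({"p", "p∧p", "q"}),
     frozenset()]

# Co-implication without identity: ≤_J is a pre-order, not a partial order.
assert leq_J(J, "p", "p∧p") and leq_J(J, "p∧p", "p")
assert "p" != "p∧p"

# The pre-order is not trivial: q ≤_J p (every truth set with q has p),
# but not conversely.
assert leq_J(J, "q", "p")
assert not leq_J(J, "p", "q")
```

Reflexivity and transitivity of `leq_J` hold automatically from the definition, which is the abstract point of characterizing the relation R by property (1).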
(2) a 0 (b + c) ::; b + (a 0 c); (3) (b + c) 0 a ::; b + (c 0 a); (4) (b + c) 0 a ::; (b 0 a) + c.
Remark 6.8.7 The properties (iii') and (iv') guarantee that there is a one-one correspondence between symmetric consequence relations and sets of valuations ((iv') conesponds to completeness and (iii') conesponds to absoluteness). Indeed, it is easy to see that the map V (I-) is a "dual" isomorphism between the lattice of symmetric consequence relations and the lattice of sets of valuations (intersection canied into union, and vice versa). This means that the lattice of symmetric consequence relations has an exceptionally simple structure, namely, that of the lattice of all subsets of a set, or in more abstract terms, that of a complete atomic Boolean algebra. This is in sharp contrast to the situation of asymmetric consequence relations. See W6jcicki (1988) for more on the lattice of asymmetric consequence relations.
The idea of a hemi-distributoid related to consequence is that besides providing an operation generalizing conjunction (for the left), another dual operation generalizing disjunction (for the right), and the various "hemi-distributive laws," (1)-(4) postulated above, give the effect of the cut property (this is unfortunately not quite true as we shall ultimately see below). Given a set a we shall define the set of fissions of a (in symbols Fis(a)) in precisely the same way as we defined the fusions of a set, but using + instead of 0 (and at this stage we do not provide for empty fissions or fusions). Then given a hemi-distributoid we can define explicit symmetric consequence between two sets r, a ~ S as follows:
Historical Remark 6.8.8 Absoluteness is the appropriate analog for a logic of categoricity for a theory. In view of both the naturalness and importance of the property we call "absoluteness," we find puzzling the casualness with which it has been previously dealt with in the literature. Scott (1973) emphasized the importance of symmetric consequence, and also of looking at a semantics as a class of valuations. Indeed, that paper contains the proof of the completeness theorem for compact symmetric logics, and yet Scott does not state its dual, the absoluteness theorem. Regarding Shoesmith and Smiley (1978), we tend to agree with a remark made by Shoesmith, in correspondence, that our asking whether they clearly state absoluteness is "like asking if the New Testament teaches the doctrine of the Trinity-it is written on every page, whether it is explicit or not." But the one explicit reference supplied (p. 73) mentions the property in an offhand way (in the context of establishing that "the single-conclusion part of a calculus is never sufficient to determine the remainder of it"). It is said that "for whereas each multiple-conclusion calculus is characterized by a unique set of partitions, the presence or absence of each pair (T, U) being dictated by the falsity or truth of T ⊢ U, several sets of partitions can characterize the same single-conclusion calculus" (a "partition" is just a division of sentences into two sets, the true T, and the false U, and so functions as a valuation). Not only does absoluteness not appear as a numbered theorem, but the justification provided in the quotation seems hardly to count as a proof. Talking as it does of partitions (T, U), it seems to overlook the more typical consequences Γ ⊢ Δ where the pair (Γ, Δ) does not necessarily constitute a partition. Incidentally, in closing these historical remarks, we observe that viewing absoluteness as the dual of completeness in the context of Galois connections seems to be novel with us.
In analyzing what properties are needed for this to give a symmetric consequence relation, it is clear once more that overlap, dilution (and compactness) are simple consequences of the definition. Everything then falls on the symmetric cut property. This requires that if Γ ≤ {x} ∪ Δ and Γ ∪ {x} ≤ Δ, then Γ ≤ Δ. The most general case then would look something like this:
6.9 Symmetric Consequence in the Context of Hemi-distributoids
By a hemi-distributoid we mean a structure (S, ≤, ∘, +), where each of (S, ≤, ∘) and (S, ≤, +) is a pre-ordered groupoid, and where ∘ distributes over + as follows:
(1) a ∘ (b + c) ≤ (a ∘ b) + c;
(5) Γ ≤ Δ iff for some γ ∈ Fus(Γ), δ ∈ Fis(Δ), γ ≤ δ.
(6) If γ₁ ≤ δ₁(x) and γ₂(x) ≤ δ₂, then γ₂(γ₁) ≤ δ₁(δ₂) ((symmetric) algebraic cut).
Here it is understood that the γ-terms are members of Fus(Γ) and the δ-terms are members of Fis(Δ), and that the displayed substitutions may be made for as many occurrences of x as one likes (as long as at least one is made on each side). The symmetric algebraic cut property is intimately connected with the hemi-distributive laws (1)-(4). We shall derive the hemi-distributive law (2) from it by way of example, leaving the others as exercises. It is a matter of inspection to see that (7) below is a special case of (6). (We underline the substituted positions to aid the eye.)
(7) If b + x ≤ b + x̲ and a ∘ x̲ ≤ a ∘ x, then a ∘ (b + x) ≤ b + (a ∘ x).
We would love to proceed to show conversely that symmetric algebraic cut holds in general of hemi-distributoids, but we are unable to do this, as the following shows.
Exercise 6.9.1 R. K. Meyer (personal communication) has supplied us with the following hemi-distributoid which provides a counter-example not only to symmetric algebraic cut, but to symmetric cut in general. Consider the set {1, ½, 0} with the usual ordering, and let fusion and fission be defined by the tables in Figure 6.1. Show that this is a hemi-distributoid, and that whereas {x + x} ≤ {x} and {x} ≤ {x ∘ x}, still {x + x} ≰ {x ∘ x}. (Hint: for this last, assign x the value ½, and note that all fusions of the left-hand side will take the value 1 and that all fissions of the right-hand side will take the value 0.)
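Meyer's counter-example can also be checked mechanically. In the sketch below (our own encoding) we use the operations x ∘ y = max(0, x + y − 1) and x + y = min(1, x + y), which reproduce the tables of Figure 6.1 (cf. Remark 6.9.2); the code verifies the hemi-distributive laws (1)-(4) and, at the hint's assignment x = ½, the failure of symmetric cut.

```python
from fractions import Fraction
from itertools import product

H = Fraction(1, 2)
ELEMS = [0, H, 1]

def fus(x, y):                     # o : fusion, as in Figure 6.1
    return max(0, x + y - 1)

def fis(x, y):                     # + : fission, as in Figure 6.1
    return min(1, x + y)

def clos(vals, op):
    """All values taken by fusions (resp. fissions) built from the given values."""
    vals = set(vals)
    while True:
        new = {op(a, b) for a in vals for b in vals} - vals
        if not new:
            return vals
        vals |= new

def set_leq(gamma, delta):
    """Definition (5): some fusion of gamma <= some fission of delta."""
    return any(g <= d for g in clos(gamma, fus) for d in clos(delta, fis))

# The hemi-distributive laws (1)-(4) hold, so this is a hemi-distributoid:
for a, b, c in product(ELEMS, repeat=3):
    assert fus(a, fis(b, c)) <= fis(fus(a, b), c)        # (1)
    assert fus(a, fis(b, c)) <= fis(b, fus(a, c))        # (2)
    assert fus(fis(b, c), a) <= fis(b, fus(c, a))        # (3)
    assert fus(fis(b, c), a) <= fis(fus(b, a), c)        # (4)

# Yet, with x assigned 1/2, symmetric cut fails:
x = H
assert set_leq({fis(x, x)}, {x})                 # {x + x} <= {x}
assert set_leq({x}, {fus(x, x)})                 # {x}     <= {x o x}
assert not set_leq({fis(x, x)}, {fus(x, x)})     # {x + x} </= {x o x}
```

Together with dilution, symmetric cut (with cut element x) would license passing from the first two entailments to the third, so the three assertions jointly witness its failure.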
Remark 6.9.2 The reader may recognize these tables as being just like those for conjunction and disjunction in the Łukasiewicz (1920) three-valued logic, excepting the center entry in each case. Actually, where ∼ and → are the Łukasiewicz operations for negation and implication, x ∘ y = ∼(x → ∼y) and x + y = ∼x → y.
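The remark's identities are easy to confirm by computation. A small sketch (our encoding of the Łukasiewicz operations on {0, ½, 1}, with ∼x = 1 − x and x → y = min(1, 1 − x + y)):

```python
from fractions import Fraction

H = Fraction(1, 2)

def neg(x):                     # Lukasiewicz negation
    return 1 - x

def arrow(x, y):                # Lukasiewicz implication
    return min(1, 1 - x + y)

def fus(x, y):                  # x o y = ~(x -> ~y)
    return neg(arrow(x, neg(y)))

def fis(x, y):                  # x + y = ~x -> y
    return arrow(neg(x), y)

# The centre entries differ from min (conjunction) and max (disjunction):
assert fus(H, H) == 0 and min(H, H) == H
assert fis(H, H) == 1 and max(H, H) == H
# Everywhere else the tables agree with min and max, as the remark says:
for x in (0, H, 1):
    for y in (0, H, 1):
        if (x, y) != (H, H):
            assert fus(x, y) == min(x, y) and fis(x, y) == max(x, y)
```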
∘ | 1   ½   0
--+-----------
1 | 1   ½   0
½ | ½   0   0
0 | 0   0   0

+ | 1   ½   0
--+-----------
1 | 1   1   1
½ | 1   1   ½
0 | 1   ½   0
FIG. 6.1. Counter-example to symmetric and symmetric algebraic cut

Since we thus cannot show symmetric algebraic cut in general, we instead show it for the case where but one occurrence of x is replaced, which we display as follows (the square brackets indicating a specific occurrence of x; this can be understood as shorthand for an infinite number of specific laws using algebraic expressions where we can actually talk about specific occurrences of x, as in (7) above):
(8) If γ₁ ≤ δ₁[x] and γ₂[x] ≤ δ₂, then γ₂[γ₁] ≤ δ₁[δ₂] (single algebraic cut).
We need the following kind of general hemi-distributive law. (If it could be proven allowing substitution in more than one occurrence, we would have been able to prove algebraic cut in general.)
Lemma 6.9.3 Let γ[x] be a fusion, and let δ[x] be a fission. Then γ[δ[x]] ≤ δ[γ[x]].
In proving the lemma, we find it convenient to have at hand what is in effect a special case (Sublemma 6.9.4), which we leave to the reader to prove by induction on the complexity of the construction of δ[x]. (The inductive proof of the main lemma can be structured so that this is not necessary, but it is more confusing.)
Sublemma 6.9.4 Let δ[x] be a fission. Then a ∘ δ[x] ≤ δ[a ∘ x] and δ[x] ∘ a ≤ δ[x ∘ a].
monotonicity, so we can assume that both are complex, say γ[x] = γ₁ ∘ γ₂[x] and δ[x] = δ₁ + δ₂[x].
There are actually three more cases, depending on whether the distinguished occurrence of x is on the left or right of the fusion and fission (a different one of the hemi-distributions (1)-(4) coming to the rescue each time). We shall explicitly work only on the case above. Thus:

γ[δ[x]] = γ₁ ∘ (γ₂[δ₁ + δ₂[x]])
        ≤ γ₁ ∘ (δ₁ + δ₂[γ₂[x]])      (inductive hypothesis)
        ≤ δ₁ + (γ₁ ∘ δ₂[γ₂[x]])      (hemi-distribution (2))
        ≤ δ₁ + δ₂[γ₁ ∘ γ₂[x]]        (by sublemma)
        = δ[γ[x]].  □
It is striking that the problem in generalizing the above all comes down in the end to the tangle of multiple occurrences of the "cut element" x. One way to cut this Gordian knot is just to assume some suitable general forms of idempotence for ∘ and +. It of course crosses the mind that we should have square-increasingness for ∘ and its dual for +:
(11) a + a ≤ a (sum-decreasingness).
But these are not suitably general (unless one adds as well the commutativity and associativity of the hemi-distributoid operations). Given γ, γ′ ∈ Fus(Γ), we shall say that γ′ is a reduct of γ (in symbols γ′ ≺ γ) intuitively when γ′ is just the same as γ except for containing fewer occurrences of one or more members of Γ. We shall say that γ′ is a complete reduct of γ when it is a reduct of γ containing at most one occurrence of each member of Γ (this is the same as saying that it is minimal with respect to ≺).
has the deduction theorem property that ⊢ φ ⊃ ψ iff φ ⊢ ψ. This allows us, in a unary assertional formulation of classical logic, "throwing the ladder away," to simply define the relation φ ⊢ ψ as ⊢ φ ⊃ ψ. This move is common enough for even non-classical logics that we wish to abstract it. Thus we say of a sentence B(p, q) containing the atomic sentences p and q that it indicates implication iff
(i) ⊢ B(φ, φ) (reflexivity);
(ii) if ⊢ B(φ, ψ) and ⊢ B(ψ, χ), then ⊢ B(φ, χ) (transitivity);
(iii) if ⊢ φ and ⊢ B(φ, ψ), then ⊢ ψ (modus ponens).
It is then natural to abbreviate B(φ, ψ) as φ → ψ, and to define φ ⊢ ψ as ⊢ φ → ψ. Then of course φ ⊣⊢ ψ is defined as the conjunction φ ⊢ ψ and ψ ⊢ φ. Finally, we say that the unary logic (indirectly) supports congruence if there is some sentence B(p, q) that indicates implication, and such that ⊣⊢ satisfies (rep).
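For classical logic the conditions (i)-(iii) can be checked truth-functionally. A sketch (our own encoding of formulas as nested tuples, with ⊢ read as "is a tautology" and B(p, q) taken to be p → q); (ii) is a closure property, so we only exhibit it on an instance:

```python
from itertools import product

def atoms(f):
    return {f} if isinstance(f, str) else atoms(f[1]) | atoms(f[2])

def ev(f, v):
    if isinstance(f, str):
        return v[f]
    _, a, b = f                      # only "->" occurs in this sketch
    return (not ev(a, v)) or ev(b, v)

def tautology(f):
    names = sorted(atoms(f))
    return all(ev(f, dict(zip(names, bits)))
               for bits in product([False, True], repeat=len(names)))

def B(x, y):                         # the indicating sentence: implication
    return ("->", x, y)

phi = ("->", "q", "q")
chi = ("->", "p", ("->", "q", "p"))

# (i) reflexivity:
assert tautology(B("p", "p")) and tautology(B(phi, phi))

# (ii) transitivity, on an instance:
a, b = "p", ("->", "q", "p")
c = ("->", "r", b)
assert tautology(B(a, b)) and tautology(B(b, c)) and tautology(B(a, c))

# (iii) modus ponens: from |- phi and |- B(phi, chi), we get |- chi:
assert tautology(phi) and tautology(B(phi, chi)) and tautology(chi)
```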
Remark 6.14.4 The role of (i) and (ii) above is clear enough in relation to obtaining a congruence, but what is the role of (iii)? Besides being natural on logical grounds, its algebraic role is to ensure that the congruence respects the predicate ⊢ ("is a theorem"). Note that in the case of direct support, the infinitary cut property takes care of not only this, but also the analogous problem of respecting asymmetric and symmetric consequence. Although we have motivated the notion of indirect support of congruence by way of unary assertional logics, there is nothing wrong with speaking of a consequence logic (either asymmetric or symmetric) as indirectly supporting congruence. (i)-(iii) above would just be understood as having the empty set of premises. Most normally we would then expect that if a consequence logic indirectly supported congruence, then it would directly support it (and vice versa). This of course would be ensured if the logic had the deduction theorem property.
But there is at least one real-life example (relevance logic) where, under a "first guess" of what it would be like as a consequence logic, it indirectly but not directly supports congruence. In this case we would presumably want to require of indirect support of congruence that it satisfy replacement not just with respect to the theoremhood predicate (what (iii) enforces), but also, when working with consequence versions, that it allow replacement of equivalent sentences in the premise or conclusion sets (as we said, infinitary cut ensures this in the case of direct support). But we will not now worry any more about this rather obscure matter.
6.15 Formal Presentations of Logics (Axiomatizations)
We assume that everyone has the basic idea of an axiomatization. One is presented with certain sentences that count as axioms, and one is given certain rules that allow one to construct proofs of theorems from the axioms. Axiomatizations traditionally have a certain epistemological purpose. The axioms are supposed to be obviously true, and the rules are supposed to obviously carry truths to truths, and so one can discover, by the method of proof, further truths that may not be at all so obvious. The paradigm of this is of course Euclidean geometry. There were epistemological defects in Euclidean geometry. One of these (the lack of "obviousness" of the Axiom of Parallels) will not detain us, but the other is of more immediate logical interest. It had to do with the question as to what rules were allowed in deducing the theorems. Seemingly one was allowed to use logically correct inferences, whatever those were, and also maybe certain constructions. At the beginning of the twentieth century a number of mathematicians, Hilbert chief amongst them, worked at removing the mystery about just what rules were to count. The constructions were uncovered and made explicit in the form of further axioms (e.g., Pasch's law), so that logic alone was needed to derive the theorems. Also the logic itself was axiomatized. Further, the whole enterprise was made "formal" in the sense that what was an axiom or the correct application of a rule was made a question of syntactic form alone. Frege had already done this for the logical part in 1879, but given Hilbert's emphasis upon the importance of presenting logic axiomatically (and perhaps his greater influence), axiomatizations of logic of the most familiar kind are called Hilbert-style axiomatizations.¹
Given a language L, we shall call the following a Hilbert-style presentation: it is a structure P = (S, A, R) where S is the set of all sentences of L, A ⊆ S is called the set of axioms, and R is a set of rules, i.e., each member of R is an n-place relation on S for some natural number n > 1. Note that if we allow one-place relations (sets), the role of A can be subsumed under R, but given the historical importance of axioms we shall continue to give them a special place in the structure. Given an (n + 1)-place rule R, we say that a sentence ψ (the conclusion) follows from (the premises) φ₁, ..., φₙ by way of the rule iff R(φ₁, ..., φₙ, ψ).
¹Actually, there is a welcome tendency in the computer science literature on complexity to call such axiomatizations "Frege systems," but we retain here the less accurate but more traditional nomenclature.
Given a Hilbert-style presentation P, we can define a proof as a finite sequence of sentences φ₁, ..., φᵢ, ..., φₙ such that each member φᵢ is either (i) an axiom or (ii) follows from preceding items by one of the rules. A theorem is then simply the last item in a proof (we write ⊢P φ to mean that φ is a theorem of P). Now without addressing the question of just what the "correct" axioms and rules should be, there are still two further kinds of requirements that we might make from our abstract perspective. First, given the traditional epistemological purpose, we should require that both the axioms and rules be "effective," i.e., one ought to be able to mechanically determine whether or not a sentence is an axiom, and whether or not a given sentence follows from some finite number of other sentences by an application of a rule. The way this is traditionally achieved (in modern times) is to require that the set of axioms be recursive, and to require that there be only a finite number of recursive rules. The precise meaning of recursive need not detain us here, but intuitively it means simply that one could write a program to determine whether a sentence is an axiom, or whether a finite sequence of sentences was an instance of a given rule. Incidentally, it is customary to apply the term "axiomatization" only when the presentation is recursive in the sense above. This is one reason that we have chosen to speak of "presentations," although we may allow ourselves to use the more familiar term "axiomatization" as a synonym (reserving then the explicit "recursive axiomatization" for the stronger notion). A second requirement that might occur to one (it is not totally unrelated to the first, as we shall see) is that the axioms and rules be given "schematically." What this means informally is that, for example, certain sentences are presented as "typical" axioms with the understanding that all substitution instances are also axioms.
Then, in addition, certain "typical" instances of applications of rules would be presented, with the understanding that all substitution instances would be allowed. Often when axioms and rules are schematically presented, one invokes a new style of sentence variable, e.g., a "schematic variable" P if one thinks of its being added to the object language, and says things like "all instances of the schematic axiom (P ∨ ∼P) are to count as axioms." Or in a more modern style, one uses P as a "metalinguistic variable," and says things like "for all sentences P, (P ∨ ∼P) is an axiom." But there is nothing wrong with avoiding the need for these variables by using the atomic sentence p itself, saying "all substitution instances of (p ∨ ∼p) are axioms." Whether one explicitly uses axiom (and rule) schemes of either kind is a matter of personal choice, but what is important is that axioms and rules be closed under substitution. As argued before, this is what formal logic is all about. Moreover, from an epistemological point of view, schematic presentations are most important since they can allow, in the best of circumstances anyway, the number of axiom and rule schemes to be finite, the determination of their further instances being entirely a matter of "inspection" (very computable syntax). This is one way to be clear that the axioms and rules are effective. More technically, then, we shall say of a Hilbert-style presentation that it is schematic if its axioms and rules are schematic, i.e., the axioms A can all be obtained as "substitution instances" from some subset Ā of them (each φ ∈ A is of the form σ(ψ) for
some endomorphism σ of L applied to some "axiom scheme" ψ ∈ Ā). Analogously, we say of a rule R that it is schematic if there is some "rule scheme" (φ₁, ..., φₙ, ψ) ∈ R so that every member of R is of the form (σφ₁, ..., σφₙ, σψ) for some endomorphism σ of L. We shall say of a Hilbert-style presentation that it is finite iff there are a finite number of schematic rules and the axioms A can all be obtained as substitution instances from some finite subset Ā of axiom schemes as detailed above. Clearly each Hilbert-style presentation P gives rise to a unary assertional system Λ (the set of theorems). Also it is easy to show by induction on the length of proofs that if P is schematic, then Λ is formal (closed under substitution). What about the converse? Given a unary assertional system Λ (recall this is just a set of sentences), is there necessarily a Hilbert-style presentation of it? A positive answer is disappointingly easy: just let A = Λ. The set R can be empty, since there is now no need for rules (every theorem having a one-step proof). Also clearly if Λ is formal, then A can be schematic (let Ā = A). Surely we had something else in mind? Maybe an effective presentation, or even a finite presentation? To investigate this closely would take us on a detour through recursion theory, which is outside the scope of this book. Let it be said, though, that there is no reason at all to think that even a recursive formal unary assertional logic should be finitely axiomatizable (though it is trivially "recursively axiomatizable"). Hilbert-style presentations are obviously geared towards unary assertional logics, but in many "real-life" cases they can be used to "simulate" other varieties of logic.
Thus, for example, one can introduce the concept of a deduction of the sentence φ from hypotheses Γ as a finite sequence whose last member is φ, each member of which is either (i) a member of Γ, or (ii) an axiom, or (iii) a consequence of previous items by a rule. Obviously, this is the same as the definition of proof except for the first clause, which in effect grants the status of "temporary axiom" to the members of Γ. This definition is only appropriate in "real-life" cases where the rules are all "truth-preserving" (like modus ponens and unlike substitution, generalization, necessitation, and other mere "validity-preserving" rules). One can then define Γ ⊢P φ to mean that there is a deduction of φ from Γ.
Theorem 6.15.1 The relation Γ ⊢P φ is a compact asymmetric consequence relation. Moreover, if P is schematic, then ⊢P is formal.
Proof Compactness is an immediate consequence of the definition of a deduction as a finite sequence. Overlap follows from clause (i) of the definition. Dilution follows from the fact that clause (i) merely allows members of the hypothesis set to be used; it does not require that all be used. Because of compactness, it suffices to show just cut (as opposed to infinitary cut). This we do by way of the following lemma. □
Lemma 6.15.2 Let φ₁, ..., φₘ be a deduction of φ from Γ, and let ψ₁, ..., ψₙ be a deduction of ψ from Γ ∪ {φ}. Then their concatenation φ₁, ..., φₘ, ψ₁, ..., ψₙ is a deduction of ψ from Γ.
Proof Let us relabel the concatenation of the two deductions as χ₁, ..., χₘ, χₘ₊₁, ..., χₘ₊ₙ. Clearly for i such that 1 ≤ i ≤ m, the sequence χ₁, ..., χᵢ is a deduction of
χᵢ from Γ. From m + 1 on, each χᵢ is either a member of Γ, or an axiom, or a consequence of preceding items, with the possible exception of an item which is justified as being the sentence φ. But this sentence can be justified in just the same way that χₘ (= φ) was. □
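The definitions of proof and deduction, and the concatenation argument of Lemma 6.15.2, can be exercised on a toy presentation. In the sketch below the encoding, and the choice of the implicational axiom schemes K and S with modus ponens as the only rule, are ours for illustration (the section does not fix any particular axioms):

```python
def imp(a, b):
    return ("->", a, b)

def is_imp(f):
    return isinstance(f, tuple) and f[0] == "->"

def is_axiom(f):
    """Instance of K: A->(B->A), or S: (A->(B->C))->((A->B)->(A->C))."""
    if not is_imp(f):
        return False
    a, b = f[1], f[2]
    if is_imp(b) and b[2] == a:                       # K
        return True
    if (is_imp(a) and is_imp(a[2]) and is_imp(b)      # S
            and is_imp(b[1]) and is_imp(b[2])
            and b[1] == imp(a[1], a[2][1])
            and b[2] == imp(a[1], a[2][2])):
        return True
    return False

def is_deduction(seq, gamma):
    """Each item is a hypothesis, an axiom, or follows by modus ponens."""
    for i, f in enumerate(seq):
        if f in gamma or is_axiom(f):
            continue
        if not any(seq[j] == imp(seq[k], f)
                   for j in range(i) for k in range(i)):
            return False
    return True

p, q = "p", "q"

# A deduction of q from Gamma = {p, p -> q}:
d1 = [p, imp(p, q), q]
assert is_deduction(d1, {p, imp(p, q)})

# A proof (deduction from no hypotheses) of p -> p:
kk = imp(p, imp(imp(p, p), p))                        # K instance
ss = imp(kk, imp(imp(p, imp(p, p)), imp(p, p)))       # S instance
d2 = [ss, kk, imp(imp(p, imp(p, p)), imp(p, p)),
      imp(p, imp(p, p)), imp(p, p)]
assert is_deduction(d2, set())

# Lemma 6.15.2: concatenating a deduction of phi from Gamma with a
# deduction of psi from Gamma u {phi} gives a deduction of psi from Gamma.
d3 = [q, imp(q, imp(p, q)), imp(p, q)]    # p -> q from {q}, via K and MP
assert is_deduction(d3, {q})
assert is_deduction(d1 + d3, {p, imp(p, q)})
```

In the concatenated sequence the repeated occurrence of q is no longer a hypothesis, but it is justified by modus ponens from the earlier items, exactly as in the proof of the lemma.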
There is another way that a Hilbert-style presentation can simulate asymmetric consequence: it can contain a binary "implication" connective → and a binary "conjunction" connective ∘. (These connectives are subject to certain natural conditions that we tease out below.) Let us define for sentences φ and ψ, φ ≤P ψ iff ⊢P φ → ψ. Clearly we would hope that this is a binary implication relation, i.e., that ≤P is a pre-ordering. We can then go on to define for a set of sentences Γ, Γ ≤P φ iff there exist sentences φ₁, ..., φₙ ∈ Γ so that the sentence (φ₁ ∘ ... ∘ φₙ) → φ is a theorem of P. Here we understand "the conjunction" φ₁ ∘ ... ∘ φₙ to have parentheses distributed across it such as to make the result a well-formed sentence. Thus we should talk of "a conjunction" of φ₁, ..., φₙ, it being understood that a single sentence φ₁ all by itself counts as "a conjunction." Finally, we want it understood that we do not have in mind an "empty conjunction," i.e., we do not allow the fact that φ is a theorem of P to mean that ⊢P Γ → φ. With all of these understandings we show that the relation Γ ⊢P φ defined as ⊢P Γ → φ is an asymmetric consequence relation. Incidentally, we shall label this relation explicit consequence. It should be clear to the reader that the above notion can be made precise using the notion of Fus(Γ) as in Section 6.7. We can define Γ ≤P φ to mean that for some γ ∈ Fus(Γ), γ ≤P φ. If we can somehow connect P to pre-ordered groupoids, then we can take advantage of what we established there about consequence relations on such things. Thus we shall say that a Hilbert-style presentation P on an underlying language L induces a pre-ordered groupoid if (L, ≤P, ∘) is a pre-ordered groupoid, i.e., all of the following hold:
(1) ⊢P φ → φ;
(2) ⊢P φ → ψ, ⊢P ψ → χ ⇒ ⊢P φ → χ;
(3) ⊢P φ → ψ ⇒ ⊢P (χ ∘ φ) → (χ ∘ ψ);
(4) ⊢P φ → ψ ⇒ ⊢P (φ ∘ χ) → (ψ ∘ χ).
If ∘ satisfies in addition the principles below, we shall speak of P inducing a pre-semi-lattice (in which case we shall usually refer to the semi-lattice operation as ∧ rather than ∘):
(5) ⊢P (φ ∘ ψ) → φ;
(6) ⊢P (φ ∘ ψ) → ψ;
(7) ⊢P χ → φ, ⊢P χ → ψ ⇒ ⊢P χ → (φ ∘ ψ).
Exercise 6.15.3 Show that (3) and (4) are redundant in the list (1)-(7) above.
The following is now immediate from the results of Section 6.7.
Theorem 6.15.4 Let P be a Hilbert-style presentation, with the underlying language having connectives → and ∘ (primitive or defined) inducing a pre-ordered groupoid
(i.e., satisfying principles (1)-(4)). Then the relation of explicit consequence ⊢P Γ → φ is an asymmetric consequence relation.
Now we shall examine elliptical consequence, but relativizing it not just to any arbitrary set of sentences (as in Section 6.7), but rather to the theorems of P. To mark this fixing of parameters, we shall call this particular elliptical consequence implicit consequence. Let T be the set of theorems of P. We shall define Γ ≤PI φ to mean that Γ ∪ T ≤P φ. Also we shall define the mixed consequence Γ ≤PM φ to mean that either (i) Γ ≤P φ or (ii) ∅ ≤PI φ.
Corollary 6.15.5 The relation of implicit consequence defined above is a compact asymmetric consequence relation.
The proof is clear from Corollary 6.7.3, implicit consequence being just a special case of elliptical consequence. Of somewhat greater interest in Hilbert-style settings is the stop-gap consequence relation, which we define as follows: Γ ≤Psg φ iff either (i) Γ ≤P φ or (ii) ⊢P φ ("φ is either an explicit consequence of Γ or else φ is a theorem"). We call this "stop-gap" both because we are hard pressed for a permanent name and because of the pun of providing for consequences from the empty set of premises. Stop-gap consequence is quite natural in the context of a Hilbert-style presentation, since one is tempted to say in one breath both that the theorems "follow from" no premises, and that they follow from all premises. It turns out that under a very natural assumption, stop-gap consequence turns out to be the same as mixed consequence, and we know from Section 6.7 that this last, under another quite general assumption, is the same as implicit consequence.
Corollary 6.15.6 Let P be a Hilbert-style presentation that induces a pre-ordered groupoid. Then if it has as a rule
(8) ⊢P φ, ⊢P φ → ψ ⇒ ⊢P ψ (modus ponens),
then stop-gap consequence is identical to what in Section 6.7 we called mixed consequence, and hence, when the theorems are upper identities, to implicit consequence (which is of course a compact asymmetric consequence relation). Further, under all of the assumptions above, all these consequence relations agree with explicit consequence when the premise sets are non-empty.
Proof We first observe that
(9) ⊢P φ iff there is some theorem t such that t ≤P φ.
For the left-to-right assertion the t in question can be φ itself. Going from right to left, we simply invoke (8). Thus stop-gap consequence is just a species of mixed consequence. The rest follows from results of Section 6.7. □
The results above apply quite readily to a pre-semi-lattice.
Corollary 6.15.7 Let P be a Hilbert-style presentation, but let it induce a pre-semi-lattice. Then the relation of explicit consequence is a compact asymmetric consequence
relation. Further, implicit consequence is also a compact asymmetric consequence relation. Finally, on the additional assumptions that the theorems are top elements and that P satisfies (8), stop-gap and implicit consequence coincide (also with explicit consequence when the premise sets are non-empty).
Proof Immediate from the fact that a pre-semi-lattice is a pre-ordered groupoid, applying Corollary 6.7.7 (note that when ∘ is ∧, a top element is the same as an upper identity). □
Can we simulate symmetric consequence in a Hilbert-style presentation? The use of deduction all by itself does not take us very far, since deductions have single conclusions. But given the presence of a "disjunction" connective +, we can define Γ ⊢P Δ to mean that there exist ψ₁, ..., ψₙ ∈ Δ so that ψ₁ + ... + ψₙ is deducible from Γ.
Exercise 6.15.8 Tease out the properties of + that are natural to assume in order that the above definition yield a symmetric consequence relation.
We shall examine in some detail the alternative "object-language" definition of Γ ⊢P Δ as ⊢P Γ → Δ, meaning by this that there exist γ₁, ..., γₘ ∈ Γ, δ₁, ..., δₙ ∈ Δ such that ⊢P (γ₁ ∘ ... ∘ γₘ) → (δ₁ + ... + δₙ). This we shall call explicit symmetric consequence. Again it is understood that →, ∘, and + are binary connectives, and that parentheses are to be distributed ad lib so as to make the above well-formed. The more precise way of talking is to invoke the framework of Section 6.9, and require that there exist γ ∈ Fus(Γ), δ ∈ Fis(Δ), such that ⊢P γ → δ. In analyzing what properties are needed for this to give a symmetric consequence relation, we again revert to the framework of Section 6.9. Thus let us jump in straight away and assume that P induces a sufficiently idempotent hemi-distributoid, i.e., that the structure (L, ≤P, ∘, +) is a sufficiently idempotent hemi-distributoid (where L is the language underlying P).
Theorem 6.15.9 Let P be a Hilbert-style presentation, with the underlying language having connectives →, ∘, + (primitive or defined) inducing a distributoid. Then the relation Γ ≤P Δ of explicit consequence is a symmetric consequence relation.
Corollary 6.15.10 Let P be a Hilbert-style presentation inducing a distributive pre-lattice. Then the relation of explicit consequence Γ ≤P Δ is a compact symmetric consequence relation.
The proof is immediate from Theorem 6.9.19. Let us now briefly examine implicit consequence.
It should be clear by now how this allows for empty left-hand sides, but how can we deal with empty right-hand sides? Although an empty left-hand side has a natural interpretation in Hilbert-style systems (⊢ φ just means "φ is a theorem"), the problem is that there is no natural interpretation of its dual φ ⊢ in a Hilbert-style "assertional" system at any rate (in a "refutational" system it would just mean "φ is refutable"). If, however, we only had a negation connective ∼ "indicating" refutation, we could simulate refutation in an ordinary Hilbert-style (assertional) system by ⊢ ∼φ.
The idea in general is that we can set T to be the set of theorems, and F to be the set of "counter-theorems," i.e., those sentences φ such that ∼φ is a theorem. Then following the ideas of Section 6.9 we can define all three species of implicit consequence: left, right, and symmetric implicit (LE, RE, and SE). We can also define the corresponding versions of mixed consequence. Thus, for example, symmetric implicit is defined so that Γ ≤PSE Δ iff Γ ∪ T ≤P F ∪ Δ. And the corresponding symmetric mixed consequence is defined so that Γ ≤PSM Δ iff either (i) Γ ≤P Δ, or (ii) ∅ ≤PSE Δ, or (iii) Γ ≤PSE ∅ (left implicit is defined by deleting F, and its mixed version by deleting clause (iii), and right implicit is defined by deleting T, and its mixed version by deleting (ii)). When the theorems are upper identities, we know (from the results of Section 6.9) that left implicit and its mixed version agree, and when the counter-theorems are lower identities, we know the same for right implicit and its mixed version (and when both the theorems and counter-theorems serve as appropriate identities, we know that symmetric implicit consequence is the same as symmetric mixed consequence). To say that the counter-theorems serve as the appropriate lower identities is just to say
(10) ⊢P ∼φ ⇒ ⊢P (φ + ψ) → ψ.
We are now in a position to define symmetric stop-gap consequence, saying Γ ≤PSsg Δ iff either (i) Δ is an explicit consequence of Γ, or (ii) ⊢P δ for some fission δ of members of Δ, or (iii) ⊢P ∼γ for some fusion γ of members of Γ. Left stop-gap consequence omits clause (iii), while right stop-gap consequence omits clause (ii). What is needed to turn stop-gap consequence into an honest implicit consequence? Just as with the asymmetric case, we need (8) to equate being a theorem with following from some theorem. What is new is that we need the principle below to equate being a counter-theorem with having as a consequence some counter-theorem:
(11) ⊢P φ → ψ, ⊢P ∼ψ ⇒ ⊢P ∼φ (modus tollens).
Corollary 6.15.11 Let P be a Hilbert-style presentation that induces a distributoid. Then if (8) and (11) hold, then stop-gap consequence is identical to what in Section 6.9 we called mixed consequence, and hence, when also the theorems are upper identities and the counter-theorems lower identities, to implicit consequence (which is, of course, a compact symmetric consequence relation).
Corollary 6.15.12 Let P be a Hilbert-style presentation with connectives (primitive or defined) →, ∧, ∨, ∼ that induces a pre-distributive lattice. Then the relations of explicit and implicit consequence are both compact symmetric consequence relations. Let P also satisfy (8) and (11). If the theorems are all top elements and the counter-theorems are all bottom elements, then stop-gap and implicit symmetric consequence are the same, and agree also with explicit consequence when neither side (premises or conclusions) is empty.
Remark 6.15.13 Another, and in some ways more elegant, approach to the characterization of implicit consequence is to add a constant true sentence t, and interpret ⊢ φ as t ⊢ φ. For symmetry, we should add a constant false sentence f for interpreting empty right-hand sides. This interprets φ ⊢ as ⊢ φ → f, and, given the Johansson (1936) definition of ∼φ as φ → f, amounts to what we did above. Of course, one has to give these constant sentences some special properties. It turns out that making them respectively upper and lower identities suffices to ensure that the three species of implicit consequence agree with their mixed counterparts. What is then needed to forge linkage with the three stop-gap consequence relations is
(12) ⊢ φ ⇔ ⊢ t → φ.
Note, incidentally, that if we were working in the context of a system that allowed refutation as well as proof, we would need to add
(13) φ is refutable ⇔ ⊢ φ → f,
thus restoring what might seem to be some missing duality (in a Hilbert-style system there is no way of stating the dual of (12)).

Exercise 6.15.14 Reprove the various results above concerning "simulated" consequence, with the new definitions of the various species of implicit consequence using t and f. Of course the assumption is that P satisfies (12), in addition to the various hypotheses of the theorems above.

Remark 6.15.15 It might be thought strange that a single assertional logic (with the proper connectives) always gives rise to (at least) four different symmetric consequence relations. Which one is the real consequence relation of the logic? Given t with the properties above, the Hilbert-style system can always be recaptured by the definition of ⊢ φ as t ⊢ φ, no matter which of the four consequence relations we choose to work with. (The proof of this is left as an exercise for the reader.) And yet it is clear from even the case of classical logic that the four consequence relations are distinct. Thus explicit consequence allows no consequences "from or to" the empty set by definition, and both left and right implicit consequence disallow the empty set on one side or the other. It is only symmetric implicit consequence that gives full status to the empty set as both a legitimate set of premises and a legitimate set of conclusions. And yet classical logic has valid arguments with the empty set of premises,

(14) ∅ ⊢ {p, ¬p},

and also with the empty set of conclusions,

(15) {p, ¬p} ⊢ ∅.

So this suggests that, for classical logic at least, symmetric implicit consequence is the "correct" consequence relation.

Exercise 6.15.16 We know, by the absoluteness of symmetric consequence, that every consequence relation is characterized by a unique semantics (class of valuations). This means that for any usual Hilbert-style axiomatization of classical logic, since there are four consequence relations, there are four different semantics. Thus corresponding to explicit consequence is the class of valuations that consists of the usual truth-functional valuations plus the valuation that makes every sentence true and the valuation that makes every sentence false (this is because we must be able to falsify both (14) and (15)). Left implicit consequence requires only the addition of the valuation that makes every sentence true, and right implicit consequence requires of course only the addition of the valuation that makes every sentence false. Every reader should verify that with these minor changes the semantics is able to falsify, as appropriate, (14) and/or (15). The reader knowing something about how to prove completeness for some standard axiomatization of classical propositional calculus should go on to prove that, with these additions, the semantics does characterize the appropriate consequence relations.
6.16 Effectiveness and Logic
To simplify things, we shall suppose that the logic is given an effective Hilbert-style presentation as discussed in Section 6.15, i.e., the set of axioms is decidable and there are finitely many decidable rules (viewing each n-ary rule as a set of n-tuples of sentences). A proof is a finite sequence of sentences, each of which is either an axiom or follows from previous members of the sequence using one of the rules. Hence a proof is a certain sequence of sequences, and can be understood as a single string by separating its component sentence strings by a new symbol used as a delimiter (think of it as a "space"). Theorem 6.16.1 The set of proofs of an effective Hilbert-style presentation of a logic is decidable.
Proof In turn, check each item of the sequence to see whether it is an axiom (this is given as a decidable question) or whether it follows from preceding members by one of the rules (again a decidable question). If the answer is "yes" for each element, then the sequence is a proof, and otherwise it is not. □

Corollary 6.16.2 The set of theorems is effectively enumerable.
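The checking procedure in the proof can be sketched directly. The following is a minimal illustration, not the book's formalism: sentences are plain strings, and `is_axiom` together with a single rule (modus ponens, read off the string syntax) stand in for the decidable axiom set and the finitely many decidable rules that the theorem assumes.

```python
def is_proof(seq, is_axiom, rules):
    """Decide whether seq is a proof: each member must be an axiom
    or follow from earlier members by one of the (decidable) rules."""
    for i, sentence in enumerate(seq):
        earlier = seq[:i]
        if is_axiom(sentence) or any(rule(earlier, sentence) for rule in rules):
            continue
        return False
    return True

# Toy instance: the axioms are 'p' and 'p->q'; the only rule is
# modus ponens, detected syntactically on strings.
def is_axiom(s):
    return s in {'p', 'p->q'}

def modus_ponens(earlier, conclusion):
    # conclusion follows if some 'A' and 'A->conclusion' both occur earlier
    return any((a + '->' + conclusion) in earlier for a in earlier)

assert is_proof(['p', 'p->q', 'q'], is_axiom, [modus_ponens])
assert not is_proof(['q'], is_axiom, [modus_ponens])
```

Since both tests inside the loop are decidable and the sequence is finite, the whole check terminates, which is all Theorem 6.16.1 asserts.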
Proof Construct a machine that strips the last line off a proof. Effectively enumerate the proofs, and as you do, feed them into the "stripping machine"; the result will be an enumeration of the theorems. □

Remark 6.16.3 It is well known that the set of theorems of classical propositional logic is decidable (being just the set of two-valued tautologies; cf. Chapter 9). Various non-classical propositional logics are also decidable (e.g., intuitionistic logic and many standard modal logics; cf. Chapters 10 and 11). But not all propositional logics, even those that are finitely axiomatizable, are decidable. For many years, all examples were more or less artificial. But recently A. Urquhart has shown the systems R and E of relevance logic to be undecidable (cf. Anderson et al. 1992). And Lincoln et al. (1992) have shown linear logic undecidable. It is clear that any logic that has a finite characteristic matrix is decidable. But Harrop (1958) showed (cf. also Harrop 1965) that if a logic has a finite presentation (finitely many schematic axioms and rules), then the following weaker property suffices:
Definition 6.16.4 Let L be a unary assertional logic with a finite Hilbert-style presentation. Then L has the finite model property iff for every non-theorem φ, there is a finite matrix M in which all axioms of L are valid and in which all of the rules of L preserve designation, but in which φ is not valid.

Theorem 6.16.5 (Harrop 1958) Let L be a unary assertional logic with a finite Hilbert-style presentation. Then if L has the finite model property, then L is decidable.
Proof We know from Corollary 6.16.2 that the theorems are effectively enumerable. Finite models are also effectively enumerable, as we can see from a "brute force" argument. The actual nature of the elements of a matrix makes no difference (since isomorphic matrices verify the same sentences), so a matrix with n elements can always be thought of as composed of the elements {1, 2, ..., n}. Using these elements, one can thus construct (up to isomorphism) all of the two-element matrices, the three-element matrices, etc., and as one does so check to see whether the axioms are valid in the matrices and the rules preserve designation. (Here is where it is important that there be only finitely many axioms and rules; the theorems could still be enumerated even were the axioms an infinite set, as long as it is decidable.) So to decide the theoremhood of a candidate sentence φ, start one machine enumerating the theorems, and start another machine enumerating the models of the logic. As each model is enumerated, pass control over to a third machine that checks whether it is also a model of φ. At some point, either φ will show up in the enumeration of the theorems, or else, if it is a non-theorem, it will be rejected by one of the finite models being enumerated. So at some point, one will have a "yes or no" answer to the question of whether φ is a theorem.² □

Remark 6.16.6 A matrix of the sort used in the above theorem, where the axioms are valid and the rules preserve designation, is called a "strong" matrix for the logic L. A matrix where the axioms are valid but where the rules merely preserve validity is called a "weak" matrix for L. Obviously a strong matrix for a logic is also a weak matrix for that logic, but a weak matrix for a logic may not necessarily be a strong matrix for that logic.

However, Harrop (1958, 1965) showed that if there is a finite weak matrix for a logic in which a certain sentence is not valid, then there is also a finite strong matrix for that logic which also invalidates that sentence. This shows the equivalence of what might be called the weak and the strong finite model properties. The idea of the proof is to take a non-theorem φ and consider the algebra of sentences generated just from its atomic sentences. The Lindenbaum matrix restricting the logic to that language is of course infinite. But one can use the given finite matrix to establish a natural congruence on the algebra of sentences, counting two sentences to be equivalent just when for every interpretation of the atomic sentences the sentences evaluate to the same element in the matrix. It is clear that the quotient algebra is finite, since for each positive integer n there are only finitely many n-ary operations on a finite set. We turn this quotient algebra into the required strong matrix by counting an equivalence class [ψ] as designated iff the sentence ψ is a tautology. We leave the details of the proof to the reader.

² Of course, one will not know in advance how many turns of the cranks of the two enumeration machines will be needed, but this is not essential to the notion of decidability. In privileged circumstances, one might have a "speed function" that would tell one how long one must wait for an answer.
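The dovetailing of the two enumeration machines in the proof above has a simple computational shape. The sketch below abstracts it; the number-theoretic "theorems" and "refuters" are toy stand-ins (any effectively enumerable theorem set, together with a sequence of refuting models that jointly reject every non-theorem, would do).

```python
from itertools import count

def decide(target, enum_theorems, enum_refuters):
    """Harrop-style decision procedure: alternately advance an
    enumeration of the theorems and an enumeration of finite
    "refuting" models. With the finite model property, one of
    the two enumerations must eventually pronounce on the target."""
    theorems = enum_theorems()
    refuters = enum_refuters()
    while True:
        if next(theorems) == target:
            return True     # the target turned up as a theorem
        if next(refuters)(target):
            return False    # some finite model rejects the target

# Toy stand-in: the "theorems" are the even numbers; the k-th
# "model" rejects any odd number up to k.
def enum_theorems():
    n = 0
    while True:
        yield n
        n += 2

def enum_refuters():
    for k in count():
        yield lambda n, k=k: n % 2 == 1 and n <= k

assert decide(4, enum_theorems, enum_refuters) is True
assert decide(7, enum_theorems, enum_refuters) is False
```

As the footnote remarks, nothing here tells us in advance how many turns of the crank are needed; termination on every input is what makes this a decision procedure rather than a mere semi-decision procedure.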
7 MATRICES AND ATLASES

7.1 Matrices
We introduced the notion of matrices in Chapter 5. Before introducing abstract notions, such as matrix product and matrix homomorphisms, we devote this section to an overview of several logically important matrices. This also gives us a chance to familiarize the reader with the abstract notions, which will be formally defined later, on concrete examples. After outlining some basics, we present the Łukasiewicz (and Kleene) matrices, the Gödel matrices and the Sugihara matrices. We furnish examples of submatrices, matrix isomorphism and homomorphism, product and preservation of tautologies. We conclude this section with a quick look at infinite generalizations and different philosophical interpretations.

7.1.1 Background

We defined the notion of a logical matrix in Definition 5.6.1. Formally, it is just an algebra with a non-empty proper subset of "designated" elements. The notion was formally introduced by Łukasiewicz and Tarski (1930) (translated into English in Tarski (1956)), and the study of matrices has been one of the characteristic features of the so-called "Polish School" in logic.¹ Many results concerning matrices are to be found in Polish journals such as Studia Logica. The classic treatise on matrices is Łoś (1944). The reader may wish to consult Wójcicki (1988) for many interesting results concerning matrices which go beyond the scope of our discussion here. The aim of the discussion in this section is to develop some relatively informal intuitions concerning matrices, with very few proofs and some simple exercises. We encourage the reader to think of the elements of the underlying algebra of a matrix as "propositions," but a popular alternative is to think of them as "truth values." Of course, since every right-thinking person knows that there are only two real truth values (which we have designated as t and f), this would lead to a severely limited number of matrices. Indeed, except for the question of which operations to favor, this would basically lead to only variations on the classical truth tables. A natural response, originating with Łukasiewicz (1920, 1930), is to allow degrees of truth. Thus for three values we might have: true, false, neither.
¹ English translations of some of the important early works of Polish logicians may be found in McCall (1967).
These values can be understood in various ways.² Łukasiewicz himself was motivated by Aristotle's famous problem of future contingents, and thought that most statements about the future were not determined to be either true or false at present. Thus true for Łukasiewicz means "presently determined to be true," and false means "presently determined to be false," whereas neither means "none of the above." Besides bringing metaphysical issues about determinism into the picture, Łukasiewicz further complicates things by bringing in the notion of tensed truth values. There are other logicians, such as Kleene, who have seen the need for three "truth values" even in the context of arithmetic, where presumably there are no issues of causal determination or tense. Kleene saw that in certain formalisms for recursive functions, it was not effectively decidable whether applying a function f at an argument n terminated, i.e., whether the value f(n) exists or not. Thus Kleene devised a system of truth tables much like Łukasiewicz's for sentences involving terms that might refer to partial recursive functions. There are also logicians³ who have wanted to work with four "truth values," adding the value both to the values true, false, neither (cf. Anderson et al. 1992). See Dunn (1999) for a systematic exploration of various options in the use of these values. One thing to realize is that, so far in our description, there has really been no need to ontologize the extra "truth values." Not to have a value, or to have two values, need not be taken as additional values. (One is reminded of Lewis Carroll's "Nobody passed me on the road.") It is rather that a valuation can go wrong in two different ways: it can fail to be total or it can fail to be uniquely defined. A convenient way to mathematize this intuition is to take a valuation to be a function to the power set of the original two truth values {t, f}.

7.1.2 Łukasiewicz matrices/submatrices, isomorphisms
But what happens if we want more than four values? Let us return to Łukasiewicz, who suggested having systems with 3, 4, 5, ... values. Indeed, for any positive integer n one can have the n + 1 values

{n/n, (n−1)/n, ..., (n−(n−1))/n, 0/n}, i.e., {1, (n−1)/n, ..., 1/n, 0}.

One might interpret, say, {1, 2/3, 1/3, 0} as true, mostly true, mostly false, false. Łukasiewicz extended this to infinitely many values, taking either the rational or the real numbers as values. A set of values does not by itself make up an algebra, let alone a matrix. Two items need to be added: an indexed set of operations and a designated subset of the values. Łukasiewicz defined the operations as follows: x ∧ y = min(x, y), x ∨ y = max(x, y), ¬x = 1 − x, and x → y = min(1 − x + y, 1). Łukasiewicz took the set of designated values to be just {1}. By Łm we shall mean the finite Łukasiewicz matrix with m values. The reader can easily compute that the matrix Ł3 is presented by the following "generalized truth tables" (* indicates the designated value):

² Malinowski (1993, Section 2.4) is a historically sensitive discussion of the philosophical difficulties raised by the various interpretations.
³ Dunn (1976) pioneered this.
¬
1*  0
½   ½
0   1

→   1   ½   0
1*  1   ½   0
½   1   1   ½
0   1   1   1

∧   1   ½   0
1*  1   ½   0
½   ½   ½   0
0   0   0   0

∨   1   ½   0
1*  1   1   1
½   1   ½   ½
0   1   ½   0
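These operations are easy to compute exactly. A minimal sketch, using rationals so that no rounding intrudes (the function name `lukasiewicz` is ours, not the book's):

```python
from fractions import Fraction

def lukasiewicz(m):
    """The m-valued Łukasiewicz matrix Łm on {0, 1/(m-1), ..., 1},
    with 1 the only designated value."""
    one = Fraction(1)
    values = [Fraction(i, m - 1) for i in range(m)]
    ops = {
        'not': lambda x: one - x,          # ¬x = 1 - x
        'and': min,                        # x ∧ y = min(x, y)
        'or': max,                         # x ∨ y = max(x, y)
        'imp': lambda x, y: min(one - x + y, one),  # x → y
    }
    return values, ops, {one}

values, ops, designated = lukasiewicz(3)
half = Fraction(1, 2)
assert ops['not'](half) == half       # negation's fixed point in Ł3
assert ops['imp'](half, half) == 1    # ½ → ½ = 1
assert ops['imp'](1, half) == half    # 1 → ½ = ½
assert ops['imp'](0, 0) == 1          # the bottom row of → is all 1s
```

Running the sketch over all pairs of values reproduces the generalized truth tables just displayed.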
Ł2 is the two-valued Boolean algebra 2 = {1, 0} with 1 as the only designated element. The operations are defined by the usual 2-valued truth tables:

¬
1*  0
0   1

→   1   0
1*  1   0
0   1   1

∧   1   0
1*  1   0
0   0   0

∨   1   0
1*  1   1
0   1   0
While the characterization of the elements as numbers in the unit interval [0, 1] is mathematically convenient, particularly in terms of characterizing negation and implication, it is by no means essential. The various finite Łukasiewicz matrices could just as well be characterized as having as elements {0, 1}, {0, 1, 2}, {0, 1, 2, 3}, ..., or say {−1, +1}, {−1, 0, +1}, {−2, −1, +1, +2}, .... The point of course is that only the cardinality of the set of elements matters. On the first characterization the three-valued Łukasiewicz tables can be presented as follows:

¬
2*  0
1   1
0   2

→   2   1   0
2*  2   1   0
1   2   2   1
0   2   2   2

∧   2   1   0
2*  2   1   0
1   1   1   0
0   0   0   0

∨   2   1   0
2*  2   2   2
1   2   1   1
0   2   1   0
The two presentations of Ł3 illustrate the idea of two matrices being isomorphic, which means that there is an isomorphism h in the algebraic sense between the two underlying algebras such that an element a is designated in the one matrix iff h(a) is designated in the other. What this amounts to visually is that the tables characterizing the operations for one matrix are a simple "relabeling" of the tables for the other (including where * appears).

We next illustrate the notion of "submatrix." Consider the following table for Łukasiewicz's three-valued implication, with lines drawn through the row and column headed by ½ (parenthesized entries are the ones to be deleted):

→     1    (½)   0
1*    1    (½)   0
(½)  (1)   (1)  (½)
0     1    (1)   1

The reader can determine by inspection that the two-valued truth table for the classical material conditional can be found "hidden" in the four corners of the tables for Ł3. We also of course need to consider the tables for ¬, ∧, ∨ before we can conclude that "Ł2 is a submatrix of Ł3," a task we leave to the reader.
In the next section we shall be more formal about the definition, but for now it suffices to say that a matrix M is a submatrix of a matrix M′ iff the algebraic part of M is a subalgebra of the algebraic part of M′, and the designated elements of M are those also designated in M′. We want to take the occasion to make the abstraction visually concrete. Given a table defining an operation, let us call the leftmost column and the uppermost row of values the border, and let us call the remaining values the interior. The values on the border are inputs, and those in the interior are outputs. To begin with, let us consider matrices with only one operation: →. In visual terms, to say that M is a submatrix of M′ means that the table for M results from deleting one or more rows and matching columns in the following way. One selects elements k₁, ..., kₙ that one wants to delete. One deletes the rows headed by k₁, ..., kₙ, as well as the columns headed by k₁, ..., kₙ. (In the case pictured above there is just k₁ = ½.) After this operation, two conditions must be met, which are easy to check visually: (1) the interior of the table must not contain a value that does not occur on the border; and (2) at least one designated value must remain on the border. Extending this to matrices with more than one operation is straightforward. For all of the tables displaying its operations, one must be able to do deletions as described above of the "same" rows and columns (i.e., those headed by the same values).
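The two visual conditions can be checked mechanically. A minimal sketch for a one-operation matrix, with the table as a nested dictionary (the names are ours; ½ is written 0.5):

```python
def is_submatrix_after_deleting(table, designated, deleted):
    """Check the two visual conditions for deleting a set of border
    values from a one-operation matrix: (1) the surviving interior
    uses only surviving border values; (2) some designated value
    survives on the border."""
    survivors = {x for x in table if x not in deleted}
    closed = all(table[x][y] in survivors
                 for x in survivors for y in survivors)
    has_designated = any(d in survivors for d in designated)
    return closed and has_designated

# Ł3 implication on values 1, ½ (written 0.5), 0; designated {1}.
imp3 = {
    1:   {1: 1, 0.5: 0.5, 0: 0},
    0.5: {1: 1, 0.5: 1,   0: 0.5},
    0:   {1: 1, 0.5: 1,   0: 1},
}
assert is_submatrix_after_deleting(imp3, {1}, {0.5})    # Ł2 hides in Ł3
assert not is_submatrix_after_deleting(imp3, {1}, {1})  # no designated value left
```

Deleting ½ leaves the four classical corners, exactly as the inspection argument above says; deleting 1 fails condition (2).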
Exercise 7.1.1 (Lindenbaum) Show that Ł2 is a submatrix of each matrix Łn, for n ≥ 2.

The reader should not hastily draw the conclusion that Łn is always a submatrix of Łn+1. It is easy to see that Ł3 is not a submatrix of Ł4, since negation has a fixed point (½) in Ł3 but not in Ł4.

Exercise 7.1.2 (Lindenbaum) This generalizes the previous exercise. Show that if m divides n (without remainder) then Łm+1 is isomorphic to a submatrix of Łn+1. (Hint: If m divides n, then ∃k(km = n). The map h(x) = kx is the desired isomorphism.)

Various interpretations can be put on the values of a Łukasiewicz matrix. Łukasiewicz himself favored some kind of interpretation where the values are taken to be probabilities. This does not fit the interpretation he gives to the connectives. For example, a conjunction is given the least value of either of the two conjuncts, whereas the probability calculus would instead take their product (and then only when the two conjuncts are independent). If one looks at only negation, conjunction, and disjunction, one gets Kleene's "strong" three-valued matrix K3. Kleene interprets the intermediate value as indicating "undefined," and he throws away the Łukasiewicz implication, providing in its place φ ⊃ ψ, which can be regarded as just an abbreviation for ¬φ ∨ ψ. Note that ½ ⊃ ½ = ½ and is undesignated in K3, whereas Łukasiewicz set ½ → ½ = 1. This means that there are no tautologies in Kleene's strong three-valued matrix. Note that there are nonetheless valid consequences, the simplest example being φ ⊨_K3 φ, i.e., every interpretation in K3 that gives φ the designated value 1 also assigns φ the value 1. Kleene also provided a "weak" three-valued logic which differs from the strong one only when one of the input values is the intermediate "undefined" value. The idea, put
in computer science jargon, is "garbage in, garbage out," or in American folk wisdom "one rotten apple spoils the barrel." Put quickly, any computation involving only 0 and 1 is done as with the classical truth tables, but any calculation that involves ½ as an input yields ½ as its output. This goes back to Bochvar's logic of meaningfulness. Readers wishing to learn more about the Kleene and Bochvar interpretations can consult Urquhart (1985).
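The contrast between the strong and the weak (Bochvar) connectives can be put in two lines. A toy sketch, with the intermediate "undefined" value written 0.5 (the function names are ours):

```python
U = 0.5  # the intermediate "undefined" value

def strong_and(x, y):
    # Kleene's strong conjunction: min, so a false conjunct settles it
    return min(x, y)

def weak_and(x, y):
    # Bochvar/weak conjunction: garbage in, garbage out
    return U if U in (x, y) else min(x, y)

assert strong_and(0, U) == 0   # strong: falsity of one conjunct decides
assert weak_and(0, U) == U     # weak: any undefined input infects the output
assert strong_and(1, U) == U   # the two agree when the output is undetermined
```

The same one-line guard turns any classical table into its weak counterpart, which is why the weak logic differs from the strong one only on inputs involving ½.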
7.1.3 Gödel matrices/more submatrices

The operations above are obviously not the only operations definable on three values. Gödel (1933) defined a sequence of finite matrices G1, G2, G3, ... which satisfy the theorems of the intuitionistic propositional calculus (cf. Section 11.10). The n-valued matrix Gn has as its set of values {0, 1, ..., n − 1}. Gödel let 0 be the only designated value, but we follow the usual custom and reverse the order relation, so that n − 1 is the designated value. (Actually, it is more elegant to replace n − 1 with ω, since otherwise the designated value differs from matrix to matrix.) Operations are then defined as follows: a ∧ b = min(a, b), a ∨ b = max(a, b) (just as for Łukasiewicz), but a → b = n − 1 if a ≤ b, and otherwise a → b = b. Finally, ¬a can be defined as a → 0, which means for a ≠ 0, ¬a = 0, but for a = 0, ¬a = n − 1. G1 is a degenerate one-element matrix whose only element is designated. G2 is the classical two-valued matrix. G3 is characterized by the following three-valued tables (we put a ◁ where they differ from Łukasiewicz):

¬
2*  0
1   0◁
0   2

→   2   1   0
2*  2   1   0
1   2   2   0◁
0   2   2   2

∧   2   1   0
2*  2   1   0
1   1   1   0
0   0   0   0

∨   2   1   0
2*  2   2   2
1   2   1   1
0   2   1   0

Exercise 7.1.3 Show that each Gödel matrix Gn (n ≥ 2) is isomorphic to a submatrix of Gn+1. (Hint: Define h so that for 0 ≤ m < n − 1, h(m) = m, and h(n − 1) = n.)

Remark 7.1.4 The map above is almost the identity map, and so each Gn (n ≥ 1) is almost a submatrix of Gn+1. We would get precisely a submatrix if we changed the characterization of a Gödel matrix so that the top element was always the same, say ω.

7.1.4 Sugihara matrices/homomorphisms

Another interesting sequence of finite matrices, called the Sugihara matrices, satisfies the relevance logic R (and in fact R. K. Meyer has shown that they are jointly characteristic of the system RM (R-Mingle); cf. Anderson and Belnap 1975). The values of Sn are the natural numbers {0, 1, ..., n − 1}. We define a ∧ b and a ∨ b as minimum and maximum, just as for Łukasiewicz and Gödel, and we define ¬a as the "mirror image" of a, i.e., ¬a = (n − 1) − a (just as for Łukasiewicz). Then a → b = ¬a ∨ b if a ≤ b, and otherwise a → b = ¬a ∧ b. Unlike either Łukasiewicz or Gödel, we can have more than one designated value, taking a as designated whenever ¬a ≤ a. S2 is of course just the usual two-valued matrix and has only one designated value. The three- and four-valued tables are as follows (again ◁ indicates differences from Łukasiewicz):

¬
2*  0
1*  1
0   2

→   2   1   0
2*  2   0◁  0
1*  2   1◁  0◁
0   2   2   2

∧   2   1   0
2*  2   1   0
1*  1   1   0
0   0   0   0

∨   2   1   0
2*  2   2   2
1*  2   1   1
0   2   1   0

The four-valued matrix for negation is obvious: ¬3* ↦ 0, ¬2* ↦ 1, ¬1 ↦ 2, ¬0 ↦ 3. The matrices for the binary connectives are as follows:

→   3   2   1   0
3*  3   0◁  0◁  0
2*  3   2◁  1◁  0◁
1   3   2◁  2◁  0◁
0   3   3   3   3

∧   3   2   1   0
3*  3   2   1   0
2*  2   2   1   0
1   1   1   1   0
0   0   0   0   0

∨   3   2   1   0
3*  3   3   3   3
2*  3   2   2   2
1   3   2   1   1
0   3   2   1   0

Note that S3 is not a submatrix of S4, since ¬ has a fixed point (¬1 = 1) in S3 but not in S4. Nonetheless, there is an interesting relationship between the two, which we shall express by saying that S3 is isomorphic to a weak homomorphic image of S4. Visually this means we can "box" together 2 and 1, squinting so to speak so that they blur together, and the result can be seen to be isomorphic to the implication table for S3:

→   3   [2  1]  0            →       3   {2,1}  0             →   2*  1*  0
3*  3  [0   0]  0    box     3*      3     0    0    relabel  2*  2   0   0
2*  3  [2   1]  0   and blur {2,1}*  3   {2,1}  0    ------>  1*  2   1   0
1   3  [2   2]  0   ------>  0       3     3    3             0   2   2   2
0   3  [3   3]  3

Note that designation is preserved from left to right, but not conversely, since {2, 1} is designated in the target, and yet 1 is not designated in the source. We will speak of a strong homomorphism when designation is preserved in both directions.
Exercise 7.1.5 Show that S2 is isomorphic to a submatrix of S3.

Exercise 7.1.6 (Dunn 1970) Show that when n is odd, Sn is a homomorphic image of Sn+1, and when n is even, Sn is isomorphic to a submatrix of Sn+1.

Remark 7.1.7 We have used initial segments of the natural numbers for the Sugihara matrices, so as to facilitate comparison with the Łukasiewicz and Gödel matrices. But each has its most natural and/or normal presentation. We have already seen that for the Łukasiewicz matrices fractions between 0 and 1 are the most natural (with 1 designated). The Gödel matrices are, perhaps, best understood as an initial segment of the natural numbers, with ω added on top as the designated element. For the Sugihara matrices it is natural to use the integers −n, −n + 1, ..., −1, (0), +1, ..., n − 1, n, the parentheses around 0 indicating that it may be absent. Now we can define ¬a as −a. Then a → b = max(−a, b) if a ≤ b, and otherwise a → b = min(−a, b). And a is designated whenever 0 ≤ a.
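The Sugihara definitions, and the "boxing" homomorphism from S4 onto S3, can be verified mechanically. A sketch (the function name and the map `h`, which sends both 2 and 1 of S4 to 1 of S3, are ours):

```python
def sugihara(n):
    """Sugihara matrix Sn on values {0, ..., n-1}: ¬a = (n-1) - a,
    a → b = ¬a ∨ b if a <= b else ¬a ∧ b, and a designated iff ¬a <= a."""
    neg = lambda a: (n - 1) - a
    imp = lambda a, b: max(neg(a), b) if a <= b else min(neg(a), b)
    designated = {a for a in range(n) if neg(a) <= a}
    return neg, imp, designated

neg3, imp3, D3 = sugihara(3)
neg4, imp4, D4 = sugihara(4)

assert D3 == {1, 2} and D4 == {2, 3}   # more than one designated value
assert neg3(1) == 1 and neg4(1) == 2   # ¬ has a fixed point only in S3

# Boxing 2 and 1 of S4 together gives S3 (up to relabeling):
h = {0: 0, 1: 1, 2: 1, 3: 2}
assert all(h[imp4(a, b)] == imp3(h[a], h[b])
           for a in range(4) for b in range(4))
assert all(h[a] in D3 for a in D4)     # designation preserved left to right
assert 1 not in D4 and h[1] in D3      # ... but not conversely: weak, not strong
```

The last two assertions are exactly the observation closing Section 7.1.4: the blurring map is a weak but not a strong homomorphism.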
7.1.5 Direct products

We illustrate one last construction involving matrices. Suppose we want to take two matrices, say Ł3 and S3, and "glue them together." This is accomplished using the "direct product" construction, which is obtained by first taking the direct product of the underlying algebras, and then taking an element (a1, a2) as designated just when both a1 and a2 are designated. In the table below we have taken the liberty of striking out the second components, and boxing together those pairs that have the same first component. Writing – for a struck-out second component, and collapsing each box of three pairs that share a first component into one cell, the 9 × 9 implication table for the product reduces to the Ł3 implication table on first components (within the box [1, –], the pairs (1, 2) and (1, 1) are designated, but (1, 0) is not):

→       [1, –]  [½, –]  [0, –]
[1, –]  [1, –]  [½, –]  [0, –]
[½, –]  [1, –]  [1, –]  [½, –]
[0, –]  [1, –]  [1, –]  [1, –]

It is clear from this that Ł3 is a homomorphic image of Π(Ł3, S3).
Exercise 7.1.8 Through similar "visual reasoning," show that S3 is also a homomorphic image of the direct product. (Hint: First rearrange the order of the values in the borders so as to allow the order of the second component to dominate.)

7.1.6 Tautology preservation

Given a matrix M, let Taut(M) (the "tautologies" of M) be the set of sentences that take designated values for every interpretation in the underlying algebra. Given a unary assertional logic L, a matrix M is said to be characteristic for L whenever for all sentences φ, φ is a theorem of L iff φ ∈ Taut(M). Given an interpretation I in M, let v_I be the valuation where v_I(p) = t iff I(p) ∈ D. The set of admissible valuations according to M, V_M, is then defined as the set of all such v_I. Where K is a class of (similar) matrices, we can then define the set of admissible valuations according to K as ⋃{V_M : M ∈ K}. Using these sets of admissible valuations, we can define validity in M in various senses. These definitions make explicit notions that were implicit in the previous chapter and the previous section. First we define (unary assertional) validity ("tautologyhood") in a matrix M as v(φ) = t for every v ∈ V_M, i.e., I(φ) ∈ D for every interpretation in M. This is customarily denoted by some notation such as ⊨_M φ. This can be extended to a class K of (similar) matrices in the obvious way, requiring that ⊨_M φ for every M ∈ K, which is the same as requiring that v(φ) = t for each v in the set V_K of admissible valuations according to K. We write ⊨_K φ.
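For a finite matrix these validity definitions are directly executable. A brute-force sketch, with formulas encoded as nested tuples and Ł3 as the example matrix (the encoding is ours, not the book's):

```python
from itertools import product as cartesian

def tautology(formula, values, ops, designated, atoms):
    """Check |=_M formula: every assignment of matrix values to the
    atoms sends the formula to a designated value."""
    def ev(f, env):
        if isinstance(f, str):        # an atomic sentence
            return env[f]
        op, *args = f
        return ops[op](*(ev(a, env) for a in args))
    return all(ev(formula, dict(zip(atoms, vs))) in designated
               for vs in cartesian(values, repeat=len(atoms)))

# Ł3 on {0, ½, 1} (½ written 0.5), with 1 the only designated value.
values = [0, 0.5, 1]
ops = {'or': max, 'and': min,
       'not': lambda x: 1 - x,
       'imp': lambda x, y: min(1 - x + y, 1)}

# Excluded middle fails in Ł3 (take p = ½), but self-implication holds.
assert not tautology(('or', 'p', ('not', 'p')), values, ops, {1}, ['p'])
assert tautology(('imp', 'p', 'p'), values, ops, {1}, ['p'])
```

Consequence checking is the same loop with a premise test added, exactly as in the definitions that follow.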
Asymmetric matrix consequence is defined in a corresponding manner. Γ ⊨_M φ iff for every v ∈ V_M, if v(γ) = t for every γ ∈ Γ, then v(φ) = t, i.e., iff for every interpretation I in M, if I(γ) ∈ D for all γ ∈ Γ, then I(φ) ∈ D. Again this can be extended to a class of matrices K: Γ ⊨_K φ iff for every v ∈ V_K, if v(γ) = t for all γ ∈ Γ, then v(φ) = t, i.e., iff for every M ∈ K and interpretation I in M, if I(γ) ∈ D for all γ ∈ Γ, then I(φ) ∈ D.

Symmetric matrix consequence is defined analogously. Γ ⊨_M Δ iff for every v ∈ V_M, if v(γ) = t for every γ ∈ Γ, then v(δ) = t for some δ ∈ Δ, i.e., iff for every interpretation I in M, if I(γ) ∈ D for all γ ∈ Γ, then I(δ) ∈ D for some δ ∈ Δ. Again this can be extended to a class of matrices K: Γ ⊨_K Δ iff for every v ∈ V_K, if v(γ) = t for all γ ∈ Γ, then v(δ) = t for some δ ∈ Δ, i.e., iff for every M ∈ K and interpretation I in M, if I(γ) ∈ D for all γ ∈ Γ, then I(δ) ∈ D for some δ ∈ Δ.

It is easy now to see, for example, that unary assertional validity in a matrix is preserved under strong submatrix. Thus, suppose that M′ is a strong submatrix of M and ⊨_M φ. Consider an arbitrary interpretation ι′ in M′. By Lemma 7.3.1, ι′ is also an interpretation in M, and so ι′(φ) ∈ D; since D ∩ M′ = D′, it follows that ι′(φ) ∈ D′. Thus ⊨_M′ φ. We shall leave to the reader the routine derivation of other preservation theorems from their corresponding "proto-lemmas." The results are summarized in the following table. A check of course means that validity is preserved, and a cross means it is not. Some entries are immediate consequences of others; for instance, since unary validity is a special case of asymmetric consequence, and asymmetric of symmetric, a cross in the unary column forces crosses in the asymmetric and symmetric columns. Note that there are only negative results in nine places, and because of the immediate consequences just noted, these boil down to just four, which are labeled ✗1 through ✗4 in order. We shall address these below.
                           Unary   Asymmetric   Symmetric
Submatrix         Weak      ✗1        ✗            ✗
                  Strong    ✓         ✓            ✓
Hom. image        Weak      ✓         ✗2           ✗
                  Strong    ✓         ✓            ✓
Inverse hom.      Weak      ✗3        ✗            ✗
image             Strong    ✓         ✓            ✓
Direct product    Weak      ✓         ✓            ✗4
                  Strong    ✓         ✓            ✓

Counter-example 1. Unary assertional validity is not preserved under weak submatrices. Suppose that M′ is a weak submatrix of M and (contrapositively) that ⊭_M′ φ. Suppose that ι′ is an interpretation in M′ such that ι′(φ) ∉ D′. In terms of the relationship of the designated sets, all that we know is D′ ⊆ D. For a strong submatrix we would have D ∩ M′ = D′, and could argue then that ι′(φ) ∉ D, as required. But this breaks down in the first step when we only have the inclusion the one way.

We can produce an actual counter-example by letting M′ be the three-valued Łukasiewicz matrix and letting M be a "designation expansion," i.e., the same except that we extend the set of designated values to be {1, ½} (in effect, "designated" means "non-false"). It may seem odd that M′ and M are the same on the algebraic component, but an algebra surely counts as a subalgebra of itself. Consider the sentence p ∨ ¬p, and assign p the value ½. The interpretation of p ∨ ¬p is itself then ½, and thus p ∨ ¬p is rejected in M′ by this interpretation. But this interpretation no longer rejects p ∨ ¬p in M, and indeed no interpretation does, since assigning 1 or 0 gives p ∨ ¬p the value 1, and assigning ½ gives it the value ½, which is designated in M.

Counter-example 2. Asymmetric consequence is not preserved under weak homomorphic images. We in fact show this for singleton left- and right-hand sides. Let us see where the argument for preservation breaks down. Suppose h is a matrix homomorphism of M onto M′, and that φ ⊭_M′ ψ. Then there is an interpretation ι′ in M′ such that ι′(φ) ∈ D′ and ι′(ψ) ∉ D′. One can then argue that if one constructs an assignment ι₀(p) ∈ h⁻¹(ι′(p)) for each atomic sentence p, the resulting interpretation ι in M will have the property that ι(χ) ∈ h⁻¹(ι′(χ)) for all sentences χ. So far, so good. But so far we have just been arguing the algebraic situation; we have not yet turned our attention to the question of the designated values. Consider now ι(φ) and ι(ψ). We know that ι(ψ) is not designated, for if ι(ψ) ∈ D, then since h preserves designation, h(ι(ψ)) = ι′(ψ) ∈ D′, but this is false. But there is no way to show that ι(φ) is designated. We would try to argue that if ι(φ) ∉ D, then h(ι(φ)) = ι′(φ) ∉ D′. But this assumes that h preserves non-designation. Obviously what is required is that h be a strong homomorphism.

We now produce an actual counter-example, again by fussing with the three-valued logic. For this purpose we add a new unary connective ∇ with the table:

∇
1   ½
½   ½
0   ½
On a Bochvar reading of the value ~ as "garbage" this can be interpreted as "anything in, garbage out." We again use the trick of letting M and M' agree on their algebraic component (identity is an isomorphism and hence a homomorphism), and let this algebra have simp.ly the operation \/ defined by the above table. Thus M = M' = {I, ~, O}. Again we SImply expand the designated set (but in the converse direction). Thus D = {I} and D' = {I, ~ }. It is clear that \/p f!:M' q since we can assign q the value 0, and no matter what value is assigned to p, \/ p will take the designated value ~. But in M this value is not designated, and so there is no invalidating assignment. Incidentally, the reader may be bothered by the "artificiality" of the operation \/, which is just the constant ~ function. Such a reader may take the three-valued Lukasiewicz matrix for -+ but make ~ the only designated value. Call this M. M' is the same except that we make the value 1 designated as well. Clearly p -+ p I=M q since p -+ p
always takes the undesignated value 1, and yet p → p ⊭_M' q, since in M' 1 is designated and q can be assigned the undesignated value 0.

Counter-example 3. Unary assertional consequence is not preserved under weak inverse homomorphic images. This is a simple reinterpretation of the situation of counter-example 2. M has {1} as its set of designated elements, and M' has {1, ½}. M' is of course a weak homomorphic image of M under the identity homomorphism, and p ∨ ¬p is valid in M' and yet not in M.

Counter-example 4. Symmetric consequence is not preserved under direct products. Indeed, we can show this for an empty left-hand side. Consider the direct product 2² of the two-element Boolean algebra 2, with 1 as its only designated value. Its elements are {(1,1), (1,0), (0,1), (0,0)}, and the only designated value is (1,1). The consequence ⊢ p, ¬p is clearly valid in 2, since ¬p takes the opposite value from p, assuring that one of p, ¬p always takes the value 1. But ⊢ p, ¬p is not valid in 2², since if p is assigned (1,0), then ¬p is interpreted as (0,1), and so neither ends up designated.

Exercise 7.4.1 Complete all of the reasoning needed to justify the various checks and crosses in the table above.

Exercise 7.4.2 Prove the claims about the matrices in the three examples in Section 5.12. (Hint: First relate the notions of strict, strong, and weak equivalence of matrices to the notions at the beginning of this section.)
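The finite matrices in counter-examples 1, 2, and 4 are small enough to check mechanically. The sketch below is ours, not the text's (all function and variable names are our own choices); it encodes the three-valued tables and the two-element Boolean matrix and verifies each claim by brute force over assignments:

```python
from itertools import product

VALUES = [1, 0.5, 0]          # three-valued carrier {1, 1/2, 0}

def neg(a):                   # Lukasiewicz negation
    return 1 - a

def lor(a, b):                # disjunction = max
    return max(a, b)

def nabla(a):                 # "anything in, garbage out"
    return 0.5

# Counter-example 1: p v ~p is valid in M (designated {1, 1/2}) but
# rejected in the weak submatrix M' (designated {1}) at p = 1/2.
assert all(lor(p, neg(p)) in {1, 0.5} for p in VALUES)
assert any(lor(p, neg(p)) not in {1} for p in VALUES)

# Counter-example 2: with D' = {1, 1/2}, nabla(p) does not entail q
# (take q = 0); with D = {1}, nabla(p) |- q holds vacuously, since
# the value 1/2 is never designated in M.
assert any(nabla(p) in {1, 0.5} and q not in {1, 0.5}
           for p, q in product(VALUES, repeat=2))
assert all(nabla(p) not in {1} for p in VALUES)

# Counter-example 4: in the Boolean matrix 2, every assignment
# designates one of p, ~p; in 2 x 2 (only (1,1) designated), the
# assignment p = (1,0) designates neither p nor ~p.
assert all(p == 1 or neg(p) == 1 for p in (0, 1))
p2 = (1, 0)
neg_p2 = (1 - p2[0], 1 - p2[1])
assert p2 != (1, 1) and neg_p2 != (1, 1)
```

Each assertion mirrors one step of the corresponding counter-example in the text.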
7.5
Varieties Theorem Analogs for Matrices
In Chapter 2 we stated Birkhoff's Theorem 2.10.3 and presented its proof. Recall that this theorem links a proof-theoretical characterization of a class of algebras (equationally definable) with a model-theoretic characterization of the class (closure under subalgebras, homomorphic images, and direct products). This gives a purely algebraic answer to the question: when is a class of algebras axiomatizable? Obviously the same type of question can be asked of classes of matrices. We can ask when a class of matrices is characteristic for a "logic." This question is actually several questions, depending on what one takes a "logic" to be. As we saw in Section 6.2, there are various alternatives. We shall present answers to this question for unary assertional, asymmetric, and symmetric consequence logics. The first was shown by Blok and Pigozzi (1986), and the latter two by Czelakowski (1980a, 1983). Czelakowski (1980b) and Blok and Pigozzi (1992) are also relevant. We omit proofs, which can be found in the works cited. A couple of preservation theorems not proven in Section 7.4 are given as exercises.
7.5.1
MATRICES AND ATLASES
Unary assertional logics

We turn first to the question of characterizing the "varieties" of matrices for unary assertional logic. We first provide a needed definition:

Definition 7.5.1 A matrix M' = (A', D') is a designation extension of a matrix M = (A, D) iff (1) A' = A and (2) D ⊆ D'.

Note that designation extension is related to our notion of weak submatrix, except that the two underlying algebras are required to be the same.⁷

Theorem 7.5.2 (Blok and Pigozzi, 1986) A class of matrices K is the class of all matrices satisfying a unary assertional logic iff K is closed under (weak) direct products, strong submatrices, inverse strong homomorphic images, designation expansions, and strong homomorphic images. Applying these operations finitely often, and in a certain order, suffices, as is summarized in obvious symbols: Matr(⊢) = H_st D H_st⁻¹ S_st P(Matr(⊢)).

We know from Section 7.4 that unary assertional consequence is closed under inverse strong homomorphic images (expansion) and strong homomorphic images (contraction). We also know from Section 7.4 that, since it is closed under weak submatrices, it is closed under designation expansions. Designation extension plays a role for matrices similar to that of homomorphic image in equational systems.

Exercise 7.5.3 Show that every weak homomorphic image of a matrix M can be obtained as a strong homomorphic image of a designation extension of M.

Blok and Pigozzi actually state their theorem using notions of "contraction" and "expansion," which are equivalent to strong homomorphic and inverse strong homomorphic images:

Definition 7.5.4 (Blok and Pigozzi, 1986) A matrix M = (A, D) is an expansion of a matrix M' = (A', D') iff there exists an onto algebraic homomorphism h : A → A' such that h⁻¹(D') = D. Conversely, M' is said to be a contraction of M.

Blok and Pigozzi define M' to be a relative of M iff M' can be obtained from M by a finite number of contractions and expansions. This defines an equivalence relation, and, as Blok and Pigozzi observe, it plays a role in the model theory of unary assertional logics similar to that played by isomorphism in equational systems. Blok and Pigozzi show that every relative of M can be obtained as an expansion of a contraction of M.

Exercise 7.5.5 Show that if M is a relative of M', then ⊢_M = ⊢_M'.

7.5.2
Asymmetric consequence logics

We turn now to the case of an asymmetric consequence relation. It turns out that in characterizing the closure conditions on a class of matrices we have to work with a notion more complicated than direct product, namely an m-reduced product of matrices. We first introduce the simpler notion of a reduced product of matrices.

⁷ Blok and Pigozzi actually speak of "filter extensions," but they do not necessarily mean a filter in the lattice sense. They mean any designated set of elements that satisfies the axioms and inference rules of some given underlying logic. (In a lattice this would most naturally form a filter, hence the terminology.) In fact they do not need this for their proof of the varieties theorem analog, and so, for the sake of both simplification and comparison to other results, we do not here impose any such requirement on the designated elements. This is why we prefer the term "designation expansion."
Definition 7.5.6 Let I be a set of indices, and let F be a filter of sets in the power set ℘(I). The product of the matrices M_i reduced by F (in symbols ∏_F M_i) is defined as the quotient matrix of the ordinary direct product ∏_{i∈I} M_i under the congruence relation ≡_F induced by F as follows: ⟨a_i⟩_{i∈I} ≡_F ⟨b_i⟩_{i∈I} iff {i : a_i = b_i} ∈ F. An element [⟨a_i⟩_{i∈I}] is designated iff {i : a_i ∈ D_i} ∈ F.

Remark 7.5.7 This is to say that we first form the reduced product of the underlying algebras (as in Section 2.16) and then characterize designation. Note that just as the congruence relation can be understood as saying that ⟨a_i⟩ and ⟨b_i⟩ are "almost everywhere" identical, designation of [⟨a_i⟩] can be understood as saying that ⟨a_i⟩ is "almost everywhere" designated in the underlying direct product. There are two examples of special interest. First, when F is just the power set of I, we obtain the ordinary direct product. Second, when F is a maximal filter, we obtain what is called an ultraproduct (F is usually called an ultrafilter in this context).

Definition 7.5.8 An ultraproduct of a similarity class of matrices is a product of these matrices reduced by a maximal filter.

We also have use for the following abstraction:

Definition 7.5.9 F is said to be an m-filter if, besides being closed under upward inclusion and finite intersections, it is also closed under intersections of any collection of its members of cardinality less than m. An m-reduced product of matrices is just a product of matrices reduced by an m-filter.

We can now state the theorem of Czelakowski, subject to certain technical conditions on the cardinal m, which we shall subsequently explain. Any reader who wants or needs to skip over these technical considerations should go immediately to the corollary, which applies to most "real-life" logics.
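For a finite index set, Definition 7.5.6 can be prototyped directly; every filter on a finite power set is principal, so "almost everywhere" just means "on every coordinate in the generating set." The helper names below are our own, not the text's:

```python
from itertools import combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

I = frozenset({0, 1, 2})
G = frozenset({0, 1})                 # generator of a principal filter
F = [S for S in powerset(I) if G <= S]

def equiv(a, b):
    # tuples are congruent iff they agree on a set of coordinates in F
    return frozenset(i for i in I if a[i] == b[i]) in F

def designated(a, D):
    # a class is designated iff its members are designated "almost
    # everywhere", i.e., on a set of coordinates belonging to F
    return frozenset(i for i in I if a[i] in D[i]) in F

D = ({1}, {1}, {1})                   # three copies of the matrix 2
assert equiv((1, 0, 0), (1, 0, 1))    # differ only at coordinate 2
assert designated((1, 1, 0), D)       # designated on {0, 1}, in F
assert not designated((1, 0, 1), D)   # designated only on {0, 2}
```

With F the whole power set, `equiv` collapses to coordinatewise identity and we recover the ordinary direct product, as Remark 7.5.7 notes.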
Theorem 7.5.10 (Czelakowski, 1980a) A class of matrices K is the class of all matrices satisfying an asymmetric consequence relation ⊢ iff K is closed under (strong) submatrices, strong homomorphic images, inverse strong homomorphic images, and m-reduced products, where m is an infinite regular cardinal weakly bounded by the cardinality of ⊢ and the successor of the cardinal of the set of sentences in the underlying language. In obvious symbols: Matr(⊢) = H_st H_st⁻¹ S_st P_Rm(Matr(⊢)).
We now turn to the technical conditions on m. First we explain the notion of the cardinality of an asymmetric consequence relation. It is the least infinite cardinal n such that the consequences of any set of sentences Γ can be obtained by taking the union of the consequences of its subsets of cardinality strictly smaller than n. The cardinality of ⊢ is always less than or equal to the successor cardinal of the set of sentences in the underlying language. Observe that when the language is denumerable and ⊢ is compact, its cardinality is ℵ₀.
A regular cardinal m is one that "cannot be surpassed from below," i.e., given a family of cardinals {m_i}_{i∈I} such that each member m_i < m and such that the family itself has cardinality less than m, then Σ_{i∈I} m_i < m. This is a fairly technical notion, but let us note that the first infinite cardinal ℵ₀ is regular. This last observation, together with the observation of the preceding paragraph, leads to the following, much less technical version of the theorem for the ordinary case where the underlying language is denumerable.

Corollary 7.5.11 (Czelakowski, 1980a) Let ⊢ be a compact asymmetric consequence relation on a denumerable sentential language. Then the theorem above can be simplified by replacing m-reduced products with reduced products. The following suffices: Matr(⊢) = H_st H_st⁻¹ S_st P_R(Matr(⊢)).
Proof The observations already noted show that ℵ₀ satisfies the technical conditions, so we simply add that an ℵ₀-reduced product is simply a reduced product, since any filter is closed under finite intersections. □

Exercise 7.5.12 Show that an asymmetric consequence relation is preserved under reduced products.
7.5.3
Symmetric consequence logics
Czelakowski extended his theorem to apply to symmetric consequence relations. The conditions are similar, except that closure under strong homomorphic images is dropped, and "ultraproduct" is substituted for "m-reduced product," which means the technical condition disappears. This gives a prettier statement:

Theorem 7.5.13 (Czelakowski, 1983) A class of matrices K is the class of all matrices satisfying a symmetric consequence relation ⊢ iff K is closed under ultraproducts, (strong) submatrices, strong matrix homomorphisms, and inverse strong homomorphic images. The following suffices: Matr(⊢) = H_st H_st⁻¹ S_st P_U(Matr(⊢)).
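Ultrafilters are easy to inspect on a finite index set, where every ultrafilter is principal and generated by a singleton (so a finite ultraproduct collapses to one of its factors). A quick check of the filter and maximality conditions behind Definition 7.5.8, with helper names of our own choosing:

```python
from itertools import combinations

def powerset(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def is_filter(F, I):
    F = set(F)
    if not F or frozenset() in F:
        return False
    if any(S & T not in F for S in F for T in F):
        return False                     # closed under intersections
    return all(T in F                    # closed upward
               for S in F for T in powerset(I) if S <= T)

def is_ultrafilter(F, I):
    # maximal filter: exactly one of S, I - S belongs, for every S
    return is_filter(F, I) and all((S in F) != ((I - S) in F)
                                   for S in powerset(I))

I = frozenset({0, 1, 2})
U = {S for S in powerset(I) if 1 in S}        # principal over {1}
F = {S for S in powerset(I) if {0, 1} <= S}   # principal over {0, 1}
assert is_ultrafilter(U, I)
assert is_filter(F, I) and not is_ultrafilter(F, I)
```

The second filter fails maximality because neither {1} nor its complement {0, 2} belongs to it.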
Exercise 7.5.14 Show that any symmetric consequence relation is closed under ultraproducts.

We close this section by simply raising the question of how to extend the results above to a larger class of similar results. There are various ways of presenting logics which we have not considered in this section, for example, unary refutational systems (φ ⊢). Not only are there a lot of ways of presenting logics, but there are also a lot of closure conditions on classes of matrices floating around, and it would be interesting to explore which combinations of them correspond to which "varieties" of logics (thus completing the pun set up by the title of Section 6.2: "The Varieties of Logical Experience").
7.6
Congruences and Quotient Matrices
We recall from Section 2.6 that a congruence on an algebra is just an equivalence relation ≡ that respects the operations of the algebra. In defining a congruence on a matrix
M we obviously want it to be a congruence on the underlying algebra alg(M), but the question is how much the relation should respect designation. The natural requirement is:

(1) if a ∈ D and a ≡ b, then b ∈ D.
Because of the symmetry of ≡, this is equivalent to requiring that if a ≡ b, then either both a and b are designated or else both a and b are undesignated. We shall call a congruence satisfying (1) a strong congruence, and one that is merely a congruence on the underlying algebra a weak congruence. We shall single out neither for the epithet congruence simpliciter, the problem being that we feel a certain tension: the strong congruence is certainly the one with the greater claim to naturalness, and yet (as we shall see) it has an affinity with the so-called strong homomorphism. Given a congruence ≡ of either type, we can define a quotient matrix M/≡ as follows. The algebraic part of M/≡ will be the quotient algebra alg(M)/≡ as defined in Section 2.6, i.e., its elements will be the equivalence classes [a] determined by a ∈ M, and operations are defined on them by way of representatives. So, for example, taking a binary operation:

(2) [a] * [b] = [a * b].
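Requirement (1) and the representative-wise definition (2) can be prototyped on a finite matrix. In this sketch (all names are our own), a congruence is presented as a partition, and a quotient class is designated when some member is; for a strong congruence this coincides with all members being designated:

```python
def is_strong_congruence(classes, ops, D):
    cls = {a: C for C in classes for a in C}
    # requirement (1): no class mixes designated with undesignated
    if any(0 < len(C & D) < len(C) for C in classes):
        return False
    # respect for the (unary) operations: congruent arguments must
    # yield congruent values
    return all(len({cls[f(a)] for a in C}) == 1
               for f in ops for C in classes)

def quotient(classes, ops, D):
    cls = {a: C for C in classes for a in C}
    # (2): define each operation on classes by way of representatives
    qops = [lambda C, f=f: cls[f(next(iter(C)))] for f in ops]
    qD = {C for C in classes if any(a in D for a in C)}
    return qops, qD

neg = lambda a: 1 - a
# identifying 1 with 0 respects negation but violates (1) when D = {1}
bad = [frozenset({1, 0}), frozenset({0.5})]
assert not is_strong_congruence(bad, [neg], {1})
# the identity partition is always a strong congruence
assert is_strong_congruence([frozenset({v}) for v in (1, 0.5, 0)],
                            [neg], {1})
# as a *weak* congruence, `bad` still yields a quotient matrix, in
# which the class {1, 0} is designated because one member (1) is
qops, qD = quotient(bad, [neg], {1})
assert frozenset({1, 0}) in qD
assert qops[0](frozenset({1, 0})) == frozenset({1, 0})
```

The last assertion checks that the lifted negation is well defined on the merged class, since negation maps that class into itself.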
The important question is which equivalence classes we are to count as designated. For a weak quotient matrix, its set of designated elements, D/≡, will consist of those cliques [a] such that there is some a' ∈ [a] (i.e., a' ≡ a) such that a' ∈ D. (This is in effect a special case of (Q) of Section 2.6, when R is a "unary relation.") For a strong quotient matrix we shall define the set of designated elements exactly the same way, but the difference between the weak and strong notions comes out in our requiring for a strong quotient matrix that the equivalence relation ≡ be strong. It just turns out (because of (1)) that a clique [a] is designated iff all a' ∈ [a] (i.e., all a' ≡ a) are such that a' ∈ D. How can we describe the notion of congruence with some intuitive logical vocabulary? Well, one way to think of congruence is as provable equivalence in some theory, and since theories should be closed under provable equivalence, this quickly motivates requirement (1) above. The story can go on to give some intuitive force to the notion of a quotient matrix. Many times, propositions that are non-equivalent in some "inner theory" are said to be equivalent in some "outer theory" (proper extension). Thus, for example, propositions that are not logically equivalent might be said to be mathematically equivalent. And sentences that are not mathematically equivalent might be said to be physically (in physics) equivalent. And sentences that are not physically equivalent might be said to be biologically equivalent, etc. And at the very beginning of the chain, we can have sentences that are not identical (say p and p ∧ p) even though they are logically equivalent. Now if one wants, one can just leave the situation at this level of description. But if one is interested in reifying what it is that equivalent propositions have in common (what makes them the "same"), one can go on to say that they express the same proposition "really," i.e., in some more
elaborate theory. Thus it is a familiar quasi-nominalistic move to identify propositions with classes of logically equivalent sentences, and it is certainly not too funny a way of talking to say that two logically distinct propositions express mathematically the same proposition, etc. The partitioning into equivalence classes to form quotient matrices can be understood to be just a convenient set-theoretical device to reify what it is that equivalent propositions (or sentences) have in common. With some story like the above, how could we ever motivate some requirement weaker than (1)? Possibly the designated set could be the theorems of the "inner theory." Maybe D is some set of mathematical theorems (or truths) and ≡ is physical equivalence. Assuming some standard reductionist imagery, we could imagine that a proposition a is equivalent to some mathematical fact a', and on that account the physical proposition [a] ought to be designated. It is clear that every strong homomorphism determines a strong congruence. Thus, count two elements as congruent when they are carried into the same element. We know from Section 2.6 that this is a congruence on the algebraic part of the matrix. But since a strong homomorphism can never carry a designated element and an undesignated element to the same value, it is clear that this congruence also respects designation in the sense of (1) above, and that we then have a strong congruence. Conversely, every strong congruence determines a strong homomorphism. As we know, once again from Section 2.6, the canonical homomorphism, which carries an element a into the class [a] of the quotient matrix, is an algebraic homomorphism. It is also clear that the canonical homomorphism respects designation, since if a is designated, then [a] is designated by virtue of our decision on how to designate cliques in the quotient.
And if a is undesignated, then, by (1), all elements congruent to a are undesignated, and thus each element of [a] is undesignated and so [a] is undesignated. Playing with these facts, one can establish the following theorem.

Theorem 7.6.1 (Strong homomorphism theorem for matrices) Every strong homomorphic image of M is isomorphic to a strong quotient matrix of M.

Exercise 7.6.2 Give a detailed proof of the above.

What, then, of weak homomorphisms? Clearly weak congruences determine weak homomorphisms, since the canonical homomorphism again carries a designated element a to the designated clique [a]. (Designation of the clique requires only that one member be designated.) But it is not necessarily true that weak homomorphisms determine even weak congruences. The problem is that a weak homomorphism can carry an undesignated element a to a designated element a', even though no designated element is carried to a'. An obvious fix to this problem is to restrict our attention to homomorphisms that are (minimally) faithful in the sense of Section 2.5 (thinking of "designation" as a unary relation). What this means is that in the problem case described above, a' must also have some designated element b carried to it. Incidentally, notice that the canonical homomorphism onto a weak quotient matrix is always faithful, since [a] is made designated only when there is some member b that is designated. But then b is the desired designated pre-image. The following is easily proven from the discussion above.
Theorem 7.6.3 (Weak homomorphism theorem for matrices) Every weak faithful homomorphic image of M is isomorphic to a weak quotient matrix of M.

Exercise 7.6.4 Fill in the details.

Consider the set C_st(M) of all strong congruences on a matrix M = (A, D). As with algebras, the smallest congruence is just the identity relation restricted to A. But this time the largest "natural" congruence cannot in general be the universal relation A × A (unlike the case with algebras), for if A has at least two elements and D ≠ A, then the universal relation would identify a designated element with a non-designated element. Still, we have:

Theorem 7.6.5 The set C_st(M) of all strong congruences on a matrix M = (A, D) forms a complete lattice.

Proof We leave this to the reader. In virtue of Corollary 3.7.5, all that needs to be checked is that the intersection of relations θ "compatible" with D (if aθb, then a ∈ D only if b ∈ D)⁸ is also a relation compatible with D, and similarly with the transitive closure of the union of relations compatible with D. □

Blok and Pigozzi answer the question as to how to characterize the largest strong congruence on a matrix. As they point out, their solution stems from Leibniz's principle of the identity of indiscernibles. It is better rephrased in this context as congruence of indiscernibles, because the idea is to define two elements to be congruent just when they are extrinsically indiscernible in terms of their roles in the matrix. The Leibniz congruence has to do with indiscernibility by way of predicates (by which is meant a relation). There are just two natural atomic predicate symbols in the first-order language used to describe a matrix: one is a unary predicate for membership in the designated set, and the other is the binary predicate for identity. This discussion assumes that we have only the first, which we denote by D[x].
Definition 7.6.6 An n-ary predicate (relation) P is first-order definable over a matrix M = (A, D) iff there is a formula Φ(x₁, …, xₙ, y₁, …, y_k) of first-order logic containing only the predicate D and function symbols corresponding to the various operations of A, and there are elements c₁, …, c_k ∈ A, such that for all a₁, …, aₙ ∈ A, P(a₁, …, aₙ) holds iff Φ(a₁, …, aₙ, c₁, …, c_k) is true in M.
Definition 7.6.7 Let A be an algebra and let D ⊆ A. The Leibniz congruence on A over D is defined by

Ω_A D = {(a, b) : P(a) iff P(b), for every definable predicate P}.
Exercise 7.6.8 Prove that Ω_A D is a strong congruence.

Remark 7.6.9 Note that the first-order formula Φ(x₁, …, xₙ, y₁, …, y_k) in Definition 7.6.6 can have all of the usual connectives and quantifiers in it, as well as various occurrences of the predicate D. We signal this by using the capital letter Φ, reserving the lower case φ, as has been our practice, for sentences (really terms) in the sentential language appropriate to M. There is a fussy point to be made about this. Given a sentence of the sentential language, we have been writing it as φ(p₁, …, pₙ, q₁, …, q_k), and we have been writing the corresponding first-order term as φ(x₁, …, xₙ, y₁, …, y_k). As the reader can easily see, the two terms are "isomorphic" except for the choice of the symbols (variables and operation symbols). For convenience, let us assume that the terms are written in the same language, with "x₁" and "p₁" just being two names in our metalanguage for the same symbol, and similarly with the other matching symbols. Let us consider just atomic formulas, i.e., those first-order formulas of the form D[φ(x₁, …, xₙ, y₁, …, y_k)]. Remember that such a formula says that the value of the term φ(x₁, …, xₙ, y₁, …, y_k) belongs to the designated set D.
Definition 7.6.10 We shall say that an n-ary predicate (relation) P is atomically definable over a matrix M = (A, D) iff there is an atomic formula D[φ(x₁, …, xₙ, y₁, …, y_k)] of first-order logic containing only the displayed occurrence of the predicate D and function symbols corresponding to the various operations of A, and there are elements c₁, …, c_k ∈ A, such that for all a₁, …, aₙ ∈ A, P(a₁, …, aₙ) holds iff φ^A(a₁, …, aₙ, c₁, …, c_k) ∈ D.
Definition 7.6.11 Let A be an algebra and let D ⊆ A. The atomic Leibniz congruence on A over D is defined by

Ω*_A D = {(a, b) : P(a) iff P(b), for every atomically definable predicate P}.
It turns out that the restriction to atomic formulas in Definition 7.6.11 makes no difference. The predicates in Definition 7.6.7 can without loss be restricted to those definable by atomic formulas, i.e., formulas of the form D[φ(x₁, y₁, …, y_k)].
Lemma 7.6.12 Ω_A D = Ω*_A D.
Proof This is an immediate consequence of the fact that atomic replacement is equivalent to complex replacement (cf. Theorem 2.6.5). □
The function Ω_A(D) = Ω_A D is defined on all subsets of A and is called the Leibniz operator on A.
Blok and Pigozzi use Lemma 7.6.12 to prove the following characterization of the Leibniz congruence in terms of other strong congruences:
⁸ Note that when θ is a congruence it is symmetric, and so this is the same as requiring "two-way respect": if aθb, then a ∈ D iff b ∈ D.
Theorem 7.6.13 For any matrix M = (A, D), Ω_A D is the largest strong congruence in the lattice of strong congruences on M.
Proof Let θ be any strong congruence on M, and assume (a, b) ∈ θ. Let φ(p₁, q₁, …, q_k) be any sentence. Since θ is a congruence, we have for c₁, …, c_k ∈ A,

φ^A(a, c₁, …, c_k) θ φ^A(b, c₁, …, c_k).

Since θ is a strong congruence, this means:

φ^A(a, c₁, …, c_k) ∈ D iff φ^A(b, c₁, …, c_k) ∈ D.

Hence (using Lemma 7.6.12) we have (a, b) ∈ Ω_A D. □
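On a finite matrix, the atomic Leibniz congruence can be computed by brute force, closing the identity map under the operations with parameters filled in all possible ways. The sketch below is ours and simplifies to translations in which the variable occurs only once, which suffices for these small examples; all names are our own:

```python
def leibniz(carrier, unary_ops, binary_ops, D):
    carrier = list(carrier)
    ident = tuple(carrier)            # a unary map stored as its graph
    polys, frontier = {ident}, {ident}
    while frontier:
        new = set()
        for g in frontier:
            for f in unary_ops:
                new.add(tuple(f(v) for v in g))
            for f in binary_ops:
                for c in carrier:     # a parameter on either side
                    new.add(tuple(f(v, c) for v in g))
                    new.add(tuple(f(c, v) for v in g))
        frontier = new - polys
        polys |= frontier
    idx = {a: i for i, a in enumerate(carrier)}
    # a ~ b iff no translation separates them with respect to D
    return {(a, b) for a in carrier for b in carrier
            if all((g[idx[a]] in D) == (g[idx[b]] in D) for g in polys)}

# With only the constant-garbage connective and D = {1}, the values
# 1/2 and 0 are indiscernible, but 1 is separated from both.
nabla = lambda a: 0.5
cong = leibniz([1, 0.5, 0], [nabla], [], {1})
assert (0.5, 0) in cong and (1, 0) not in cong

# Adding Lukasiewicz negation and disjunction collapses nothing:
cong2 = leibniz([1, 0.5, 0], [lambda a: 1 - a], [max], {1})
assert cong2 == {(a, a) for a in [1, 0.5, 0]}
```

By Theorem 7.6.13, the relation computed here is also the largest strong congruence on each of these matrices.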
Let us consider the special case of a Leibniz congruence on an algebra of sentences. We know from Section 6.12 that theories correspond to the designated subsets, so we can write Ω_S T to obtain a certain congruence on the algebra of sentences (the subscript is often omitted), namely the largest congruence that is compatible with T. Dividing the set S of sentences by this congruence gives the quotient algebra S/ΩT, and then dividing out the theory T as well gives us the matrix (S/ΩT, T/ΩT). We need a name for this matrix. Note that this is neither the Lindenbaum algebra (because it is a matrix, and because ΩT is generated from "above" rather than from below), nor is it the Lindenbaum matrix (because the elements of the Lindenbaum matrix are sentences, not equivalence classes of sentences). We shall call it the Blok-Pigozzi matrix (determined by T).
Theorem 7.6.14 φ ΩT ψ iff for all interpretations ι in the Blok-Pigozzi matrix (S/ΩT, T/ΩT), ι(φ) = ι(ψ).
Proof The direction from right to left is a kind of completeness. Instantiate ι to be the canonical interpretation ι(χ) = [χ]_ΩT. Then assuming ι(φ) = ι(ψ), we have [φ]_ΩT = [ψ]_ΩT, and hence φ ΩT ψ.

The direction from left to right is a kind of soundness result. Let ι(φ) = [φ']_ΩT and ι(ψ) = [ψ']_ΩT. We first observe that we can choose φ' and ψ' to be substitution instances of φ and ψ. The reason is that we can consider just the interpretation of the atomic sentences ι(p₁) = [χ₁]_ΩT, …, ι(p_i) = [χ_i]_ΩT, …, and pick a sentence p'_i from each equivalence class, being careful to pick the same sentence for identical equivalence classes. This induces a substitution σ with σ(p_i) = p'_i, and in general σ(χ) = χ(p₁/p'₁, …, pₙ/p'ₙ), where p₁, …, pₙ are all the atomic sentences occurring in χ. It is easy to prove by induction that ι(χ) = [χ(p₁/p'₁, …, pₙ/p'ₙ)]_ΩT = [σχ]_ΩT. Now we can complete the proof of soundness. Since φ ΩT ψ, we have φ' ΩT ψ', and this means [φ']_ΩT = [ψ']_ΩT, i.e., ι(φ) = ι(ψ). □
7.7
The Structure of Congruences
Whether we are talking of weak or strong congruences, they can be regarded as sets of ordered pairs, and ordered by set inclusion. Then ≡₁ ⊆ ≡₂ means intuitively that ≡₁ is a "stronger" (stricter) relation than ≡₂. Somewhat in the face of English usage, the "stronger" relation is the "smaller" of the two, the idea being that fewer pairs satisfy the stronger relation.
The inclusion relation clearly has the following properties, for arbitrary sets x, y, and z:

(1) x ⊆ x (reflexivity);
(2) x ⊆ y and y ⊆ x imply x = y (antisymmetry);
(3) x ⊆ y and y ⊆ z imply x ⊆ z (transitivity).
Any relation with these properties is called a partial order. Partial orders, and some related notions, were introduced more fully in Chapter 3, but we shall review pertinent facts about them as needed. Let ℰ(M) be the set of all equivalence relations on (the carrier set of) M, and similarly let C_w(M) and C_st(M) be, respectively, the sets of all weak and strong congruences on M. It is easy to see that given any non-empty subset E of ℰ(M), the intersection ∩E is also an equivalence relation on M. It is also easy to see that the relation a (∩E) b holds just when aθb for all θ ∈ E. Then it is straightforward that ∩E is reflexive, since each θ ∈ E is reflexive, and similarly for symmetry and transitivity. If the members of E happen to be either all weak or all strong congruences, it is similarly easy to see that ∩E will inherit the respective property. Thus ℰ(M), C_w(M), and C_st(M) are all such that they are closed under non-vacuous intersections.

Whenever U is a set with a partial order ≤, given any subset S, we can ask whether S has a greatest lower bound (glb), i.e., whether there is an element ⋀S which is such that:

(4) ∀x ∈ S, ⋀S ≤ x (lower bound);
(5) given any element u such that ∀x ∈ S, u ≤ x (i.e., given any lower bound u of S), u ≤ ⋀S (greatest lower bound).

It is easy to see that for non-empty sets S, ∩S has just these properties. We also have the dual notion of the least upper bound (lub) of S, which is an element ⋁S satisfying:

(6) ∀x ∈ S, x ≤ ⋁S (upper bound);
(7) given any element u such that ∀x ∈ S, x ≤ u (i.e., given any upper bound u of S), ⋁S ≤ u (least upper bound).
u (i.e., given any upper bound u of S,
It is easy to see that £(M) (and also Cw(M) and Cst(M» always contains the glb of any subset and that this is just intersection. We now address the question of whether £(M) (and also Cv(M) and Cst(M» always contains all the lubs of its subsets. In answering this question, it is important to note that the union of a bunch of equivalence relations is rarely itself an equivalence relation. This is because we may have a =] band b =2 c, and yet have no equivalence relation in the bunch so that a = c. The obvious answer to this problem is to take the transitive closure, i.e., the smallest transitive relation that includes the union. What this amounts to in practical tenns is that we shall say that a =E b iff there is some sequence of elements (possibly null) X], ... ,Xi,Xi+], ... ,XJc, and of equivalence relations 0], ... , Oi, Oi+], ... , Ok+] E E, such that (8) aO]x], ... ,XiOi+]Xi+], ... ,XkOk+]b.
It is then easy to see that ≡_E is an equivalence relation. It is equally easy to verify that if each θ ∈ E respects D, then ≡_E respects D (respect for the operations or for designation just transmits itself across the chain (8)). Thus it is clear that each of ℰ(M), C_w(M), and C_st(M) is closed with respect to the operation ≡_E on non-empty subsets E. It is clear that ≡_E is the lub of E, ⋁E, in any of ℰ(M), C_w(M), or C_st(M). So far we have been talking about taking glbs and lubs of non-empty sets. What happens when E = ∅? Then, among the equivalence relations ℰ(M), the lub ⋁∅ must be included in every upper bound of ∅; but every equivalence relation is an upper bound of ∅, and so ⋁∅ must be included in every equivalence relation on M. This is just the identity relation (restricted to M), since each equivalence relation must be reflexive. Similar considerations give the same conclusion for either C_w(M) or C_st(M), the point being that identity clearly respects both operations and designation (indeed, presumably indiscernibility in all respects). Identity (restricted to the elements of the matrix) is of course the strictest equivalence or congruence, in any sensible sense of equivalence or congruence, and as such is at the very bottom of any of ℰ(M), C_w(M), or C_st(M). But what of the largest element? We can quickly see that in the cases of ℰ(M) and C_w(M) the largest element is just the universal relation on M, M × M (the relation that holds between any two elements of M). But this relation obviously does not respect designation (assuming that the matrix has at least one designated and one undesignated element), and so does not count as a strong congruence. Is there, then, a weakest strong congruence?
The answer is clearly yes, since we know that every non-empty subset of Cst(M) has a lub, and so in particular, if we take the lub ∨Cst(M) of the whole set of strong congruences, we obtain our desired weakest strong congruence. Let us denote it by μ. It should now be clear what happens when we take the glb of ∅. In the environment of each of E(M), Cw(M), and Cst(M) we obtain their top element. In E(M) and Cw(M) this is the universal relation, whereas in Cst(M) it is something stronger. A partially ordered set that contains glbs and lubs for all of its subsets is called a complete lattice. We summarize all of the above discussion in the following theorem.
Theorem 7.7.1 Given a matrix M, its set of equivalence relations E(M), its set of weak congruences Cw(M), and its set of strong congruences Cst(M) are all complete lattices (with ⊆ the partial order). In each case, for non-empty subsets, the glb is intersection and the lub is transitive closure. In each case the identity relation (restricted to M) is the bottom element (and the lub of ∅). In the case of E(M) and Cw(M) the top element is the universal relation (restricted to M), M × M.

Remark 7.7.2 Clearly Cst(M) ⊆ Cw(M) ⊆ E(M). The identity map thus gives a kind of embedding of the complete lattice of weak congruences into the complete lattice of equivalence relations, and likewise of the complete lattice of strong congruences into that of weak congruences. Since intersections and transitive closures of non-empty sets depend only on the elements of the sets, and not on other elements in the environment (unlike the general case of glbs and lubs), we have that these embeddings preserve glbs and lubs of non-empty sets. Clearly they also preserve ∨∅. And ∧∅ is preserved by the first embedding (of Cw(M) into E(M)), but not by the second (of Cst(M) into Cw(M)), since the top elements of Cst(M) and Cw(M) differ.

The reader may have some sense of mystery as to just what the top element "looks like" in the case of Cst(M). Let us introduce the notation I(a/p) to indicate the "semantic substitution of the element a for the atomic sentence p in the interpretation I," i.e., the interpretation just like I except (perhaps) for assigning a to p.

Theorem 7.7.3 (Czelakowski 1980a) Given a matrix M, let L be a language (with infinitely many generators) appropriate to M. Let μ be the relation on M defined so that aμb iff for all sentences φ of L, all atomic sentences p of L, and all interpretations I, I(a/p)(φ) is designated iff I(b/p)(φ) is designated. Then μ is the weakest strong congruence on M.

Exercise 7.7.4 Prove the above theorem.
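For a finite matrix, the weakest strong congruence μ can be computed by partition refinement, splitting the designated/undesignated blocks until the operations can no longer separate the members of any block (much as in automaton minimization). The sketch below is our own illustration, not the book's construction, and the four-element matrix (a "duplicated" two-valued matrix) is made up for the example.

```python
# Partition refinement: start from the split {D, M - D} and refine until
# the partition is a congruence; the result is the weakest strong congruence.

def weakest_strong_congruence(elems, designated, ops):
    """ops: dict name -> (arity, fn). Returns mu as a set of pairs."""
    block = {a: (a in designated) for a in elems}  # the initial D / non-D split
    while True:
        def signature(a):
            sig = [block[a]]
            for name, (arity, fn) in sorted(ops.items()):
                if arity == 1:
                    sig.append((name, block[fn(a)]))
                else:  # arity 2: probe a in each argument position, against every b
                    for b in elems:
                        sig.append((name, b, block[fn(a, b)], block[fn(b, a)]))
            return tuple(sig)
        sigs = {a: signature(a) for a in elems}
        if len(set(sigs.values())) == len(set(block.values())):
            break  # no further splitting: the partition is stable
        block = sigs
    return {(a, b) for a in elems for b in elems if block[a] == block[b]}

# A made-up matrix in which f1/f2 and t1/t2 behave identically: negation
# swaps the pairs, and conjunction only looks at designation.
D = {"t1", "t2"}
neg = {"f1": "t1", "f2": "t2", "t1": "f1", "t2": "f2"}
ops = {"neg": (1, lambda a: neg[a]),
       "conj": (2, lambda a, b: "t1" if a in D and b in D else "f1")}
mu = weakest_strong_congruence(["f1", "f2", "t1", "t2"], D, ops)
```

Here μ identifies f1 with f2 and t1 with t2, as no context distinguishes them, matching the "indistinguishability in all contexts" reading of Theorem 7.7.3.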
7.8 The Cancellation Property
Recall from Section 6.11 that a unary assertional logic always has a characteristic matrix, namely its Lindenbaum matrix. We saw in Sections 6.12 and 6.13 that formal asymmetric and symmetric consequence logics also always have a characteristic semantics that can be defined in terms of a given set of "propositions," but our proof had us looking at the Lindenbaum atlas (with many different designated subsets) rather than just at the Lindenbaum matrix (with its single designated subset). In this section we discuss whether, and under what circumstances, we are forced to an atlas instead of just a matrix. We prove a theorem due to Shoesmith and Smiley (1978) giving a necessary and sufficient condition for a broad class of symmetric logics to have a characteristic matrix. There is an analogous (and simpler) result of Shoesmith and Smiley (1971), proven for asymmetric logics, that we shall examine after we look at the symmetric version.

Before stating the theorem, we need to explain a key notion called cancellation. Following Shoesmith and Smiley, we say of two sentences φ and ψ that they are disconnected if they share no atomic sentences, and we shall say of two sets of sentences Γ and Δ that they are disconnected if for each φ ∈ Γ and each ψ ∈ Δ, φ and ψ are disconnected. Finally, we shall say of a family of sets of sentences (Γi)i∈I that it is disconnected if for each j, k ∈ I with j ≠ k, Γj is disconnected from Γk. We then say of a symmetric logic that it has the cancellation property if and only if whenever (Γi ∪ Δi)i∈I is a disconnected family of sets of sentences such that ∪Γi ⊢ ∪Δi, then for some i, Γi ⊢ Δi. The cancellation property is a quite natural condition for a logic.
The quick intuitive idea is that there can be no real logical interaction between formulas in sets Γj and Δk with different indices, since the formulas share no content (except for degenerate cases such as when some sentences of Γj are contradictory, or some sentences of Δj are valid), so that all of the "action" can be "localized" at some pair (Γi, Δi) which
presumably share content (or else we are back in one of the degenerate cases mentioned above, in which case the consequence can be degenerately localized). It is easy to prove the following lemma.
Lemma 7.8.1 If a symmetric logic has a characteristic matrix, then it has the cancellation property.

Proof Let L be a symmetric logic characterized by the matrix M, and let (Γi ∪ Δi)i∈I be a disconnected family. Suppose for each i, not (Γi ⊢ Δi). Then for each i there is some interpretation Ii that assigns a designated value to each sentence in Γi and yet assigns an undesignated value to each sentence in Δi. Since the value assigned to a sentence depends only on the interpretations of the atomic sentences that occur in it, the interpretations Ii can be combined into a single interpretation I so that I(φ) = Ii(φ) for φ ∈ Γi ∪ Δi. (If φ is not in any Γi ∪ Δi, define I on the atomic sentences in φ arbitrarily.) It is clear that I invalidates ∪Γi ⊢ ∪Δi, as desired. □

It turns out that under a suitable hypothesis about "stability" (the definition of which will be provided in the course of the proof) the converse of Lemma 7.8.1 also holds.
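The two devices at work in the proof of Lemma 7.8.1, disconnection and the merging of interpretations with disjoint atomic bases, can be sketched as follows. The tuple encoding of sentences is an illustrative assumption, not the book's notation.

```python
# Sentences are modelled as nested tuples: ('atom', p), ('not', s),
# ('and', s, t). Disconnected sets share no atoms, so their partial
# valuations have disjoint domains and can be merged into one.

def atoms(s):
    """The set of atomic sentences occurring in a sentence."""
    if s[0] == 'atom':
        return {s[1]}
    return set().union(*(atoms(part) for part in s[1:]))

def disconnected(gamma, delta):
    """Two sets of sentences are disconnected iff they share no atoms."""
    g = set().union(*(atoms(s) for s in gamma))
    d = set().union(*(atoms(s) for s in delta))
    return not (g & d)

def merge(valuations):
    """Merge valuations with pairwise disjoint domains into one valuation."""
    combined = {}
    for v in valuations:
        assert not (combined.keys() & v.keys()), "domains must be disjoint"
        combined.update(v)
    return combined
```

The merged valuation agrees with each Ii on its own atoms, which is exactly what the proof needs.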
Theorem 7.8.2 (Shoesmith and Smiley) For a stable symmetric (formal) logic, a necessary and sufficient condition for it to have a characteristic matrix is for it to have the cancellation property.

Proof Necessity is of course provided by Lemma 7.8.1. Recall that a symmetric logic is just a formal symmetric consequence relation ⊢. We shall now sketch a strategy for proving sufficiency (somewhat different from that of Shoesmith and Smiley), and in the process uncover the needed definition of "stability." Let us collect together all of the pairs (Γi, Δi) such that not (Γi ⊢ Δi). The key idea is, for each index i, to make a copy Si of the set of sentences S (for later convenience when invoking substitutions, we let one of these be S itself). We then let the elements of the matrix be the union of all these sets, and (as a first approximation) let the set of designated elements D be the union of the copies of the Γi s. We draw an appropriate picture (for the simple case of two pairs) in Figure 7.9.
[FIG. 7.9. "Before" and "after": the copies S1, S2 carrying Γ'1, Γ'2 and Δ'1, Δ'2, and the partition of S′ into D′ and its complement.]
Before proceeding, we must correct the first approximation. In the final construction D will not simply be the union of the sets Γ'i, but shall be a somewhat larger set. We shall have to show, reverting to the picture above, that Δ'1 ∪ Δ'2 is not a consequence of Γ'1 ∪ Γ'2, and then invoke the global cut property to partition S′ (the union of all the copies; in the picture, S1 ∪ S2) into the desired D and its complement (as the horizontal dotted line suggests). But consequence was defined only on the original set of sentences S, and so we must extend the definition to the set of copies S′. We do this as follows. The first thought is to define a new relation on S′ so that Γ′ ⊢′ Δ′ iff there is a substitution σ so that Γ′ = σ(Γ), Δ′ = σ(Δ), and Γ ⊢ Δ. The problem with this is that it is pretty plain that ⊢′ does not satisfy dilution (the problem is that if we try to dilute before we substitute, the new items then become subject to substitution whether we want them to or not). So we just build into the definition of ⊢′ that it is the closure of ⊢ under substitution and dilution, i.e., Γ′ ⊢′ Δ′ iff there exist Γ and Δ (subsets of S) and a substitution σ (defined on S′) so that Γ′ ⊇ σ(Γ), Δ′ ⊇ σ(Δ), and Γ ⊢ Δ. It is still not necessary that ⊢′ so amended be a consequence relation, but it might be. Clearly it satisfies overlap, but there is still the question of global cut. This brings us to the promised definition: Shoesmith and Smiley call a symmetric consequence logic stable when it has the property that ⊢′ is always a symmetric consequence relation (for an arbitrary extension of the original language by new atomic sentences). We state some simple relationships between ⊢′ and ⊢.
Fact 7.8.3 If Γ and Δ are sets of sentences of S, and Γ′ and Δ′ are the respective results of applying some one-one substitution σ (into S′), then Γ ⊢ Δ iff Γ′ ⊢′ Δ′.

Fact 7.8.4 ⊢′ has the cancellation property (given that ⊢ has).

Proof We leave the straightforward proof of Fact 7.8.3 to the reader. For Fact 7.8.4, we suppose that (Γ'i ∪ Δ'i)i is a disconnected family of sets of sentences of S′ and also that ∪Γ'i ⊢′ ∪Δ'i. By the definition of ⊢′, there exist sets Γ and Δ of sentences of S so that

σ(Γ) ⊆ ∪Γ'i, σ(Δ) ⊆ ∪Δ'i, and Γ ⊢ Δ.
Let Γi = {φ : φ ∈ Γ and σ(φ) ∈ Γ'i}, and let Δi be defined analogously. Then (Γi ∪ Δi)i is a disconnected family since (Γ'i ∪ Δ'i)i is. Note that ∪Γi = Γ and ∪Δi = Δ, and so we can apply cancellation (to Γ ⊢ Δ) to obtain Γi ⊢ Δi (for some i). And so (again by the definition of ⊢′) Γ'i ⊢′ Δ'i, as required. □

Returning now to the proof of Theorem 7.8.2 and reverting to the picture above, we must show that it is not the case that Γ'1 ∪ Γ'2 ⊢′ Δ'1 ∪ Δ'2. Suppose to the contrary that

Γ'1, Γ'2 ⊢′ Δ'1, Δ'2.

Since (Γ'i ∪ Δ'i)i is clearly a disconnected family, this means that by the cancellation property (and Fact 7.8.4, which entitles us to apply it), we must have either Γ'1 ⊢′ Δ'1 or Γ'2 ⊢′ Δ'2. But Fact 7.8.3 entitles us to remove the primes, obtaining
Γ1 ⊢ Δ1 or Γ2 ⊢ Δ2.
But this is contrary to our original choice of the pairs (Γi, Δi) such that not (Γi ⊢ Δi). The argument above, although carried out for the simple case of two pairs, is completely general, and so we know that it is not the case that ∪Γ'i ⊢′ ∪Δ'i. The penultimate step of the construction is then to invoke the global cut property so as to partition S′ into two sets, D′ and −D′, so that ∪Γ'i ⊆ D′, ∪Δ'i ⊆ −D′, and not (D′ ⊢′ −D′). The final stage of the construction is to consider the algebra of sentences S′ defined on the sentences S′, and outfit it with the designated set D′ so as to obtain "the Shoesmith-Smiley matrix" (S′, D′). (Note that despite the definite article, it is not unique, depending as it does on a choice of D′.) Interpretations in the matrix are just substitutions, so soundness follows as with the constructions from the previous chapter of the Lindenbaum matrix and the Scott atlas. (We leave it to the reader to check that ⊢′ is formal, given that ⊢ is.) And completeness is guaranteed by the construction, for if there are sets of sentences Γi and Δi such that not (Γi ⊢ Δi), then we know that there is a substitution σ so that σ(Γi) ⊆ D′ and σ(Δi) ⊆ −D′ (σ just assigns to each sentence its ith copy). □
Remark 7.8.5 Note that the construction above leads to a larger cardinality than does the construction of the Lindenbaum matrix. Since one copy Si has to be made for each pair of sets (Γi, Δi), it is reasonably clear that even when S is denumerable, in general one will be constructing "continuum many" copies of a denumerable set, and so the union will be non-countable (indeed, of the power of the continuum).

The following is an easy application of Theorem 7.8.2.

Corollary 7.8.6 (Shoesmith and Smiley) For a compact symmetric logic, a necessary and sufficient condition for it to have a characteristic matrix is for it to have the cancellation property.

Proof It clearly suffices to show that a compact symmetric logic is stable, and this boils down to showing that ⊢′ (as defined in the proof above) satisfies the cut property. We leave the details to the reader. □
We now briefly discuss the case of an asymmetric consequence relation. First, the cancellation property specialized to the asymmetric case comes down to the following:

(ACP) If Γ, ∪Γi ⊢ φ, the family (Γi) is disconnected, and Γ and {φ} are both disconnected from each Γi, then Γ ⊢ φ unless some Γi is "absolutely inconsistent" in the sense that Γi ⊢ ψ for all sentences ψ.

Thus consider the family that has in it the pair (Γ, {φ}) as well as the pairs (Γi, ∅). To say that this family is disconnected is precisely to give the hypotheses about disconnection above in (ACP). And to say that the union of the first components has as a consequence the union of the second components is precisely to give the hypothesis that Γ, ∪Γi ⊢ φ. So applying the symmetric version of the cancellation property, we obtain that either Γ ⊢ φ or else some Γi ⊢ ∅. But this last is the same as Γi ⊢ Δ for all
sets of sentences Δ (dilution). The closest we can come to saying this for the case of an asymmetric logic is that Γi ⊢ ψ for all sentences ψ, but this is just the definition of absolute inconsistency. We can now state and prove the analog of Theorem 7.8.2 for asymmetric logics.

Theorem 7.8.7 (Shoesmith and Smiley) Given a stable asymmetric (formal) logic, a necessary and sufficient condition for it to have a characteristic matrix is that it have the property (ACP).

Proof Of course an asymmetric logic is just a formal asymmetric consequence relation ⊢, and to say that ⊢ is stable is just to analogize the definition for a symmetric consequence relation. This requires that if we extend the original set of sentences S over which ⊢ is defined by arbitrary new atomic sentences (obtaining a new set of sentences S′ ⊇ S), and define Γ′ ⊢′ φ′ iff there exist Γ and φ such that Γ ⊆ S and φ ∈ S, and a substitution σ (defined on S′) so that Γ′ ⊇ σ(Γ), φ′ = σ(φ), and Γ ⊢ φ, then the relation ⊢′ is an asymmetric consequence relation. The rest of the proof is entirely analogous to that of Theorem 7.8.2, except that the set D can be just the closure under ⊢′ of all the sets Γ'i. □
Formal asymmetric consequence relations differ markedly from their symmetric cousins, as shown by the following somewhat surprising, but nonetheless trivial, fact (due to Shoesmith and Smiley).

Fact 7.8.8 All formal asymmetric consequence relations (compact or otherwise) defined on countable languages are stable.

Proof Let ⊢ be a formal asymmetric consequence relation. Let ⊢′ be defined as in the definition of "stability." We must show that ⊢′ is a formal asymmetric consequence relation. As with symmetric consequence, the properties of overlap, closure under substitution, and dilution are easy. We concentrate our attention then on the infinitary cut. Let us suppose then that Γ ⊢′ δ for each δ ∈ Δ, and that Δ, Γ ⊢′ ψ. By the definition of ⊢′, there exists a countable set of sentences Δ′ ⊆ Δ so that Δ′, Γ ⊢′ ψ. (A substitution performed on a countable set leaves a countable set.) Now for each δ′ ∈ Δ′, there exists (similarly) a countable set Γ′ ⊆ Γ so that Γ′ ⊢′ δ′. Considering all the sentences in Δ′ and in the sets Γ′, we are considering a countable union of countable sets, which set theory tells us is again countable. It is clear that only a countable number of atomic sentences occur as generators of these countably many sentences, and so we can find a substitution σ that rewrites these in a one-one fashion as atomic sentences from the originally given countable language, and we can apply the infinitary cut there using ⊢. Reversing the substitution (and possibly diluting) gives us the desired infinitary cut for ⊢′. □
Corollary 7.8.9 For a countable asymmetric logic, a necessary and sufficient condition for it to have a characteristic matrix is for it to have the property (ACP).
7.9 Normal Matrices
A matrix is normal (in the sense of Church 1956) if the set of designated elements forms a "truth set" with respect to the classical logical operations, i.e.,

(i) −a ∈ D iff a ∉ D,
(ii) a ∧ b ∈ D iff a ∈ D and b ∈ D.
Clearly this definition presupposes a particular choice of the primitive logical connectives, and would have to be appropriately modified to provide for disjunction, the material conditional, the Sheffer stroke, or whatever. But the particular choice above is convenient from our point of view (because of the association with the lattice notation), and of course is well known for the fact that all of the other classical truth-functional operations can be defined from these. In particular, we shall assume the definitions φ ∨ ψ = ~(~φ & ~ψ) and φ ⊃ ψ = ~(φ & ~ψ). A normal matrix is one that can be viewed as semantically "OK" in the sense that its elements (regarded as propositions) can be divided up into "the true" and "the false" in such a way as to respect the classical logical operations. Algebraically, a normal matrix is such that if we just consider the operations ∧, ∨, and − (that is, if we consider its "reduct" (M, D, ∧, ∨, −)), then there exists a strong homomorphism of it into the two-element Boolean algebra 2 (recall that a strong homomorphism will have to carry all elements of D into 1, and all other elements to 0, and so in this instance is unique). Another way of looking at the situation is that if we "divide out" the reduct by counting two elements "equivalent" if either they are both designated or both undesignated, then this relation is in fact a congruence (on the reduct), and of course the quotient algebra determined by this congruence is just 2. In this section we prove a result stated in Kripke (1965) giving necessary and sufficient conditions for a unary assertional logic to have a normal characteristic matrix. The proof that we give is based on a reconstruction worked out with Nuel Belnap and Peter Woodruff many years ago. Before stating the theorem, we define some requisite notions.
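Church's normality conditions are directly checkable for a finite matrix. The sketch below is illustrative (the function name and the example matrices are our own): the two-element Boolean matrix is normal, while a three-element Kleene-style matrix with middle value 1/2 fails condition (i), since the negation of 1/2 is 1/2, which is undesignated even though 1/2 itself is also undesignated.

```python
# Checking that the designated elements form a "truth set" for the
# complement and meet operations of a finite matrix.

from itertools import product

def is_normal(elems, designated, neg, conj):
    """neg: unary operation; conj: binary operation (Python callables)."""
    cond_i = all((neg(a) in designated) == (a not in designated)
                 for a in elems)
    cond_ii = all((conj(a, b) in designated) ==
                  (a in designated and b in designated)
                  for a, b in product(elems, repeat=2))
    return cond_i and cond_ii

# Two-element Boolean matrix: negation 1 - x, conjunction min, D = {1}.
boolean_normal = is_normal([0, 1], {1}, lambda a: 1 - a, min)

# Three-element Kleene-style matrix with the same operations and D = {1}.
kleene_normal = is_normal([0, 0.5, 1], {1}, lambda a: 1 - a, min)
```

The middle value is exactly what blocks the strong homomorphism into the two-element Boolean algebra described in the text.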
We presuppose some antecedent knowledge of the notion of a classical tautology (a sentence φ is a tautology iff every interpretation of it in the two-element Boolean algebra assigns it the value 1). A set of sentences Γ tautologically implies a set of sentences Δ (in symbols, Γ ⊨ Δ) iff there exist γ1, ..., γm ∈ Γ and δ1, ..., δn ∈ Δ such that the sentence (γ1 & ... & γm) ⊃ (δ1 ∨ ... ∨ δn) is a tautology.
Here we presuppose the customary definition of φ ⊃ ψ as ~φ ∨ ψ; parentheses in the conjunction and disjunction are to be associated to the left, and either or both of the conjunction and disjunction may be missing (when the γs are missing the consequent must be a tautology, and when the δs are missing the antecedent must be such that its negation is a tautology). As a special case, when Δ = {φ} we have that Γ tautologically implies the sentence φ iff there exist sentences γ1, ..., γn ∈ Γ such that (γ1 & ... & γn) ⊃ φ is a tautology, or, when Γ is empty, φ all by itself is a tautology.
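For finite Γ and Δ, tautological implication can be checked by brute force over two-valued valuations; this is equivalent to the definition above, since Γ ⊨ Δ holds iff no valuation makes all of Γ true and every member of Δ false. The tuple encoding of sentences is an illustrative assumption.

```python
# Brute-force tautological implication for finite sets of sentences
# encoded as ('atom', p), ('not', s), ('and', s, t).

from itertools import product

def atoms(s):
    if s[0] == 'atom':
        return {s[1]}
    return set().union(*(atoms(p) for p in s[1:]))

def value(s, v):
    """Evaluate a sentence in the two-element Boolean algebra."""
    tag = s[0]
    if tag == 'atom':
        return v[s[1]]
    if tag == 'not':
        return not value(s[1], v)
    if tag == 'and':
        return value(s[1], v) and value(s[2], v)
    raise ValueError(tag)

def taut_implies(gamma, delta):
    """Gamma tautologically implies Delta (both finite lists)."""
    ps = sorted(set().union(*(atoms(s) for s in gamma + delta)))
    for bits in product([False, True], repeat=len(ps)):
        v = dict(zip(ps, bits))
        if all(value(g, v) for g in gamma) and not any(value(d, v) for d in delta):
            return False  # found a countermodel
    return True
```

For example, with p ⊃ q encoded as ~(p & ~q), the set {p, p ⊃ q} tautologically implies {q}, while {p} alone does not.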
Recall also that it is part of our notion of a logic that its theorems are closed under substitution. A logic L is said to be consistent if f is not a theorem of L, where we define f to be φ & ~φ for some fixed sentence φ. A set of sentences Γ is said to be tautologically consistent if f is not tautologically implied by Γ. We need the notion of a sentence φ′ being an alphabetic variant of a sentence φ, which means that there is a one-one substitution σ that assigns atomic sentences to atomic sentences and such that σ(φ) = φ′.
A logic is said to be complete in the sense of Halldén (see Halldén 1951) iff whenever φ ∨ ψ is a theorem and φ and ψ are disconnected from each other (share no atomic sentences), then either φ is a theorem or ψ is a theorem. Halldén completeness is intimately related to the cancellation property, and indeed is equivalent to it, given quite usual assumptions. We shall show that Halldén completeness implies the symmetric cancellation property, under natural assumptions which we shall develop in the course of the proof. Let us suppose the hypothesis of the symmetric cancellation property, that ∪Γi ⊢ ∪Δi. If ⊢ is compact, then there are finitely many finite subsets Γ'j and Δ'j (of Γj and Δj respectively) so that ∪Γ'j ⊢ ∪Δ'j. Letting γj be the conjunction of all the sentences in Γ'j, and similarly with δj, if ⊢ supports the Ketonen rules of "conjunction on the left" and "disjunction on the right," we have
γ1 & ... & γj & ... & γn ⊢ δ1 ∨ ... ∨ δj ∨ ... ∨ δn,
and if ⊢ has the deduction theorem property (the Gentzen rule of "conditional on the right"), we have

⊢ (γ1 & ... & γj & ... & γn) ⊃ (δ1 ∨ ... ∨ δj ∨ ... ∨ δn).
Assuming for the moment that we have closure under tautological implication, upon rearranging terms we obtain

⊢ (~γ1 ∨ δ1) ∨ ... ∨ (~γj ∨ δj) ∨ ... ∨ (~γn ∨ δn).
The other hypothesis of the cancellation property tells us that each disjunct is disconnected from every other disjunct in the above. So by repeated applications of Halldén completeness we will obtain, for some index j,

⊢ ~γj ∨ δj,

or upon rewriting,

⊢ γj ⊃ δj.
If we have the natural properties that amount to the converse of the deduction theorem property, and the converse of the Ketonen rules, we can conclude

Γ'j ⊢ Δ'j,

and hence (by dilution) Γj ⊢ Δj, as desired for the cancellation property.
We have thus seen how, on very usual assumptions, the cancellation property (at least the symmetric version) falls out of Halldén completeness. It turns out, as the reader can easily verify, that, except for compactness, all of the assumptions above can be justified on the basis of one general assumption, to wit, that the relation ⊢ includes the relation of tautological implication. Further, as the reader can verify, on this same assumption Halldén completeness follows from the cancellation property. Further, the reader can verify that this equivalence holds, on the same assumption, for the asymmetric case. We record these facts in the following theorem.
Theorem 7.9.1 Let ⊢ be a compact symmetric (or asymmetric) consequence relation including the relation of tautological implication (in symbols, ⊨ ⊆ ⊢). Then ⊢ has the cancellation property iff ⊢ satisfies Halldén completeness, i.e., whenever ⊢ φ ∨ ψ, then if φ and ψ are disconnected from each other, either ⊢ φ or ⊢ ψ.

Having investigated Halldén completeness, we return to the conditions of Kripke for normality:

Theorem 7.9.2 (Kripke) Let L be a unary assertional logic. Then L has a normal characteristic matrix iff all of the following conditions hold:
(i) All truth-functional tautologies (in &, ∨, and ~) are theorems of L.
(ii) If φ and φ ⊃ ψ (= ~φ ∨ ψ) are theorems of L, then ψ is a theorem of L.
(iii) L is consistent.
(iv) L is complete in the sense of Halldén.

Proof Let ⊨ represent tautological implication. Then it is easy to check that this relation has the deduction theorem property and its converse, i.e.,

Γ, φ ⊨ ψ iff Γ ⊨ φ ⊃ ψ.

We start with the set L of theorems of L as a basis, and inductively build up a set of sentences T that behaves as a "truth set," i.e.,
(i) ~φ ∈ T iff φ ∉ T,
(ii) φ & ψ ∈ T iff φ ∈ T and ψ ∈ T.
The desired matrix will be of the "parasitic" variety, i.e., its elements will be sentences and T will be the designated set. We "enumerate" the set of sentences S: φ1, φ2, ..., φα+1, .... In the usual case where S is denumerable, the indices will be just the positive integers; in the more general case the indices will be ordinals, and in particular, for convenience, will always be successor ordinals (we skip over the limit ordinals). We define T0 = L. At each successor stage α + 1, if φα+1 is not a theorem of L, we define Tα+1 to be the result of adding φα+1 to Tα if the result is tautologically consistent, and at the same time adding ~φ′α+1 (where φ′α+1 is the first alphabetic variant of φα+1 in the enumeration disconnected from all sentences already in Tα+1 − L). If the result of adding φα+1 is not tautologically consistent, or φα+1 is a theorem of L (in which case it got in at stage 0), we let Tα+1 be just Tα. At limit stages (of course there will be none in the usual case when S is denumerable) we just take unions of all the sets introduced at earlier stages. And finally, we define T to be just the union of all of the stages (T will be a set since we begin with a set of sentences).

The construction defined above is just an interesting variation on the usual Henkin-style proof of the completeness of classical propositional logic.

Lemma 7.9.3 At each stage α in the construction defined above, the set Tα is tautologically consistent. Hence the set T, defined as the union of all stages, must be tautologically consistent.

Proof T0 is tautologically consistent, since it is just L, which we shall now show is not only consistent (as is given) but also tautologically consistent. It is easy to see that L is closed under tautological implication (because of conditions (i) and (ii)). Thus L is tautologically consistent if it is consistent, and of course this is just condition (iii). Let us next consider the successor case Tα+1, and suppose, contrary to what we want to show, that it is not tautologically consistent. This means that

Tα+1 ⊨ f.

Since (by inductive hypothesis) Tα is tautologically consistent, a formula φ (and a negated alphabetic variant ~φ′) must have been added at stage α + 1, and so the inconsistency must be laid at their feet. This means

Tα, φ, ~φ′ ⊨ f.
By the finitary definition of tautological implication and the deduction theorem property, we know that there exist sentences γ1, ..., γn ∈ Tα so that

(γ1 & ... & γn) ⊃ (φ ⊃ (~φ′ ⊃ f)) is a tautology.
Some of the γi s come from L and others were introduced at later stages. Let us denote the conjunction of the first of these by λ and the conjunction of the latter by τ. Then (rearranging terms and using the fact that ~ψ ⊃ f is truth-functionally equivalent to ψ)

λ ⊃ (τ ⊃ (φ ⊃ φ′)) is a tautology,
and hence, by condition (i), a theorem of L. Clearly λ is a theorem of L (since, as observed above, L is closed under tautological implication). So by condition (ii),

τ ⊃ (φ ⊃ φ′) is a theorem of L.
Hence (by closure under tautological implication),

(~τ ∨ ~φ) ∨ φ′ is a theorem of L.
By the construction, the sentence φ′ is disconnected from ~τ ∨ ~φ. By condition (iv) (the Halldén completeness property), at least one of ~τ ∨ ~φ or φ′ is a theorem of L. If it is φ′, then since L is closed under substitution, φ is also a theorem of L. But in that case φ was not added at stage α + 1, contrary to our assumption above that it was this addition (along with ~φ′) that led to the inconsistency. So then τ ⊃ ~φ must be a theorem of L. This contradicts the supposition that φ could be added with tautological consistency to Tα.
At limit stages Tβ, any tautological inconsistency (since it comes from only a finite number of sentences) must already exist at some previous stage Tα, contrary to inductive hypothesis. □
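A finite, toy version of the staged construction in the proof above can be sketched as follows. This is an illustration only: it runs through a finite enumeration, and it omits the transfinite stages and the negated-alphabetic-variant step; the tuple encoding of sentences is the same assumption used earlier.

```python
# Extend a base set through an enumeration, adding each sentence exactly
# when the result remains tautologically consistent.

from itertools import product

def atoms(s):
    if s[0] == 'atom':
        return {s[1]}
    return set().union(*(atoms(p) for p in s[1:]))

def value(s, v):
    tag = s[0]
    if tag == 'atom':
        return v[s[1]]
    if tag == 'not':
        return not value(s[1], v)
    return value(s[1], v) and value(s[2], v)  # tag == 'and'

def consistent(sentences):
    """Tautological consistency: some valuation makes everything true."""
    ps = sorted(set().union(*(atoms(s) for s in sentences)))
    return any(all(value(s, dict(zip(ps, bits))) for s in sentences)
               for bits in product([False, True], repeat=len(ps)))

def extend(base, enumeration):
    """The successor stages: add a sentence when consistency is preserved."""
    t = list(base)
    for s in enumeration:
        if consistent(t + [s]):
            t.append(s)
    return t
```

Running the extension on the enumeration p, ~p, q from an empty base admits p and q but skips ~p, mirroring how each stage Tα+1 either absorbs φα+1 or leaves Tα untouched.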
Lemma 7.9.4 T is a truth set.

Proof We first observe as a sublemma that if T ⊨ ψ, then ψ ∈ T. The point is that ψ is some sentence φα+1 in the enumeration, and when its turn came up in the construction of T, it would at that point have been added, since it could in no way interfere with the tautological consistency of T, being already a tautological consequence. We next observe that we do not have both ψ, ~ψ ∈ T, for otherwise T is not tautologically consistent. We next show that we have at least one of ψ, ~ψ ∈ T. If neither is in T, then we consider T ∪ {ψ} and T ∪ {~ψ}. In this case, each of ψ and ~ψ must have led to tautological inconsistency when an attempt was made to add them to the construction when their turn came up. So we have

T, ψ ⊨ f and T, ~ψ ⊨ f.

By obvious moves licensed by tautological implication, we then have

T ⊨ f,

contradicting that T is tautologically consistent. We have thus established

(i) ~ψ ∈ T iff ψ ∉ T.
But it is easy to see as well that (ii) lfI/\ X
E T
iff lfI
E T
and X
E T,
using the sublemma that T is closed under tautological implication together with the obvious facts that ψ & χ ⊨ ψ, ψ & χ ⊨ χ, and ψ, χ ⊨ ψ & χ.

Turning back now to completing the main lines of the proof, we finally have to verify that "the Kripke matrix" (S, T) is characteristic for L. It should be clear that if φ is a theorem of L, then, since L is closed under substitution, any substitution instance σ(φ) will be a member of T. And since interpretations are just substitutions, this establishes soundness, since φ will then always be assigned a designated value in every interpretation. Turning now to completeness, if φ is not a theorem of L, then either φ was never added in the construction of T, or, if it was, at the same time a substitution instance ~φ′ of ~φ was added. Since T is consistent, this means that φ′ is not in T. In either case it is then possible to find a substitution instance of φ that is not designated, i.e., an interpretation that falsifies φ. □
7.10 Normal Atlases
Generalizing the idea of a normal matrix from the previous section, an atlas (A, (Di)) will be said to be normal if for every index i, the matrix (A, Di) is normal. Talking informally, this means that the atlas can be understood as a collection of possible worlds
realized as sets of true propositions. Since we know that it is atlases that are key to the characterization of logics understood as consequence relations (of either the asymmetric or symmetric stripe), it is natural to raise the question as to when such logics have a characteristic normal atlas. Let us first observe that if an asymmetric logic tolerates inconsistency, in the sense that there exist sentences φ, ~φ, and ψ so that it is not the case that φ, ~φ ⊢ ψ, then it is clearly impossible for it to have a normal characteristic atlas. For to falsify such an implication we would need to designate both an element a and also −a (realizing an "impossible world"). For a symmetric logic, we would talk about the existence of sentences φ, ~φ and a set of sentences Γ such that it is not the case that φ, ~φ ⊢ Γ, and we would also want to bring to attention the dual situation when it is not the case that Γ ⊢ φ, ~φ (which would require the realization of an "incomplete world"). One quick way of ruling out such situations is to require that the consequence relation ⊢ include tautological implication, i.e., that ⊨ ⊆ ⊢.
Theorem 7.10.1 Any symmetric, or compact asymmetric, logic ⊢ has a characteristic normal atlas iff it satisfies the following conditions:
(i) The consequence relation includes the relation of tautological consequence (in symbols, ⊨ ⊆ ⊢).
(ii) Γ ⊢ φ ⊃ ψ, Δ iff Γ, φ ⊢ ψ, Δ (Δ is of course empty for an asymmetric logic).
Remark 7.10.2 The reader may desire a comparison of the above conditions with those for the corresponding theorem for unary assertional logics of Kripke in Section 7.9. Condition (i) above is the straightforward generalization of condition (i) of the Kripke theorem. Also, it is easy to see (because of the cut property) that (ii) above implies (ii') if r I- ¢, fl and r I- ¢ :J lfI, fl, then r I- lfI, fl, which is the obvious generalization of the corresponding condition (ii) for unary assertionallogics given in the statement of the theorem by Kripke in Section 7.9. Looking at the converse direction, it is easy to see (using dilution and cut) that (ii') implies half of (ii) (the direction from left to right), but we believe that the other direction does not follow. An informal argument for this goes as follows. Notice that (ii') does not affect the left-hand side of 1-. Since neither dilution nor overlap can move formulas from the left to the right, these rules cannot lead from r, ¢ I- lfI, fl to the premises of (ii'), nor to r I- ¢ :J lfI, fl. The only hope would be to cut ¢, but dilution and overlap cannot produce an appropriate premise, since r I- fl is not given. The theorem of Kriplce had two additional conditions: (iii) consistency; and (iv) Hallden completeness. We have no need for a generalization of consistency, (say) that I- 0 (or 0 I- f) does not hold, because in the construction below we are always trying to find a counter-example for some consequence. So we are given a set r and a sentence ¢ such that it is not the case that r I- ¢, and from this we can argue that r is a consistent set and then use it as the basis of our construction of a truth set. Thus we can drop (iii) as a condition. This construction is even easier than in the Kripke theorem, because we do not have to find counter-examples for all non-theses in relation to the same set of designated values. This means that we can drop the condition of Hallden completeness.
NORMAL ATLASES
MATRICES AND ATLASES
Proof The proof differs for (1) the asymmetric and (2) the symmetric case. (1) The proof for the compact asymmetric consequence relation is much like that for the unary assertional case (cf. Remark 7.10.2). Again, necessity is straightforward and is left to the reader as an exercise. For sufficiency, let us suppose that not (Γ ⊢ φ). We are going to inductively expand Γ ∪ {¬φ} to a truth set. Thus we set T₀ = Γ ∪ {¬φ}. We next verify that T₀ is consistent. If not, then Γ ∪ {¬φ} ⊢ f, and so (by (ii)), Γ ⊢ ¬φ ⊃ f. But ¬φ ⊃ f ⊨ φ, and so by (i) and the cut property we have Γ ⊢ φ, contrary to our hypothesis. We then enumerate the set of sentences S: φ₁, φ₂, …, φα+1, …. In the usual case where S is denumerable, the indices will be just the positive integers; in the more general case the indices will be ordinals, and in particular, for convenience, will always be successor ordinals. (We skip over the limit ordinals.) Starting from T₀ as defined above, at each successor stage α + 1 we define Tα+1 to be the result of adding φα+1 to Tα if the result is consistent, and closing the result under ⊢. Otherwise Tα+1 is defined to be just Tα. At limit stages (of course, there will be none in the usual case when S is denumerable) we just take unions of all the sets introduced at earlier stages. And finally, we define T to be just the union of all of the stages.
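The stage-by-stage construction can be sketched computationally. The following is our own finite, classical illustration (formulas encoded as tuples, consistency checked by truth tables, and closure under ⊢ omitted for brevity); it is not part of the text's proof:

```python
from itertools import product

# Illustrative sketch (ours) of the successor-stage construction for a
# finite classical language: formulas are tuples such as ('atom', 'p'),
# ('not', f), ('imp', f, g).

ATOMS = ['p', 'q']

def holds(f, v):
    """Evaluate formula f under valuation v (a dict atom -> bool)."""
    if f[0] == 'atom':
        return v[f[1]]
    if f[0] == 'not':
        return not holds(f[1], v)
    if f[0] == 'imp':
        return (not holds(f[1], v)) or holds(f[2], v)

def consistent(gamma):
    """A set is consistent iff some valuation verifies all its members."""
    return any(all(holds(f, dict(zip(ATOMS, bits))) for f in gamma)
               for bits in product([False, True], repeat=len(ATOMS)))

def extend(gamma, enumeration):
    """Add each formula in turn whenever doing so preserves consistency."""
    t = set(gamma)
    for f in enumeration:
        if consistent(t | {f}):
            t.add(f)
    return t

p, q = ('atom', 'p'), ('atom', 'q')
T = extend({('imp', p, q), ('not', q)},
           [p, ('not', p), q, ('not', q), ('imp', p, q)])
# T now decides p and q: ('not', p) and ('not', q) are in, p and q are out.
```

The point mirrors Lemma 7.10.4 below: each successful addition preserves consistency, so the final set decides every enumerated sentence without ever becoming inconsistent.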
Lemma 7.10.3 T is closed under ⊢. Proof This follows trivially from the fact that each successor stage is closed under ⊢, given that ⊢ is compact. That each successor stage is closed under ⊢ is easily seen to be the case, for if T ⊢ ψ, then when ψ's turn came up in the enumeration, it would have been added, since adding it could not interfere with T's consistency, it being already a consequence of T. (The reader can add the details, using dilution and cut.) □
Lemma 7.10.4 At each stage α in the construction defined above, the set Tα is consistent. Proof We first give the proof for the case of a compact asymmetric logic. That T₀ is consistent was shown above. Let us suppose that some successor stage Tα+1 is inconsistent. The reader can easily see that Tα then must have been inconsistent, contrary to the inductive hypothesis. At limit stages Tβ, any inconsistency (since, by compactness, it comes from only a finite number of sentences) must already exist at some previous stage Tα, contrary to the inductive hypothesis. □
Corollary 7.10.5 T (defined as the union of all the stages in the construction defined above) is consistent. Proof Similar to that for limit stages in Lemma 7.10.4. □
Lemma 7.10.6 T is a truth set. Proof We first observe that we do not have ψ, ¬ψ ∈ T, for otherwise T is not consistent (this uses the fact that ψ, ¬ψ ⊨ f, and (i)). We next show that we have at least one of ψ, ¬ψ ∈ T. If neither is in T, then we consider T ∪ {ψ} and T ∪ {¬ψ}. Each of ψ and ¬ψ must have led to inconsistency when an attempt was made to add it at the appropriate stage in the construction. So we have (using dilution)

T, ψ ⊢ f, and T, ¬ψ ⊢ f.

By obvious moves (using (ii), (i) and the facts that ψ ⊃ f ⊨ ¬ψ and ¬ψ ⊃ f ⊨ ψ), we have

T ⊢ ¬ψ, and T ⊢ ψ.

But (using (ii)) then T ⊢ ¬ψ ⊃ (ψ ⊃ f), and so (using (ii′) twice), we have T ⊢ f, contradicting that T is consistent. We have thus established

(i) ¬ψ ∈ T iff ψ ∉ T.

But it is easy to see as well that

(ii) ψ ∧ χ ∈ T iff ψ ∈ T and χ ∈ T,

using the lemma that T is closed under ⊢, together with the obvious facts (derived using (i)) that ψ ∧ χ ⊢ ψ, ψ ∧ χ ⊢ χ, and ψ, χ ⊢ ψ ∧ χ. □
using the lemma that T is closed under f-, together with the obvious facts (derived using (i)) that I(f /\ X f- I(f, I(f /\ X F X, and I(f, X f- I(f /\ X. 0 We finally construct the desired "Kripke atlas" by so constructing a truth set Ti for each pair (r, p) such that it is not the case that r f- p, and by letting the desired atlas be (S, For soundness, if 1::J.. f- I(f, then 0-*(1::J..) f- o-(I(f) (for any substitution 0-), and thus if 0-*(1::J..) ~ h then o-(I(f) E Ti (since each T; is closed under f-). This means that any substitution, i.e., interpretation, that designates all members of 1::J.. also designates I(f, and so we have soundness. Turning now to completeness, if not (r f- p), then it is easy to see from the construction that there exists a truth set Ti such that r ~ Ti, and yet P is not in Ti. The identity substitution (interpretation) thus designates all members of r in some designated set of the Kripke atlas, but fails to designate p. (This completes the proof of Theorem 7.10.1 for case (1).) (2) The proof for a symmeUic consequence relation is a bit easier, since the global cut property allows us to drop the hypothesis of compactness, and the construction of a truth set T does not involve "Lindenbaumizing." (Again, we leave the necessity part of the proof to the reader.) If it is not the case that r f- 1::J.., then we can simply invoke the global cut property to obtain a partition of the sentences into two sets of sentences (T, F) such that it is not the case that T f- F. It is easy to verify the following hold:
m».
(0) Not (T ⊢ F).
(1) For each sentence φ, φ ∈ T or φ ∈ F (exhaustiveness).
(2) For no sentence φ, φ ∈ T and φ ∈ F (exclusiveness).
(3a) For no sentence φ, φ ∈ T and ¬φ ∈ T.
(3b) For no sentence φ, φ ∈ F and ¬φ ∈ F.
(4a) For each sentence φ, φ ∈ T or ¬φ ∈ T.
(4b) For each sentence φ, φ ∈ F or ¬φ ∈ F.
(5a) φ ∧ ψ ∈ T iff φ ∈ T and ψ ∈ T.
(5b) φ ∨ ψ ∈ F iff φ ∈ F and ψ ∈ F.
(0)-(2) restate the global cut property. As for (3a), if φ, ¬φ ∈ T, then since φ, ¬φ ⊨ Δ for any Δ, by (i) we would have (using dilution) that T ⊢ F, contrary to (0). ((3b) follows similarly.) (4a) follows from the fact that a sentence has to end up on one side of the partition (T, F), or the other. If φ ends up in T, fine; if it ends up in F, then by (3b) we know that ¬φ cannot also be in F, and so ¬φ ∈ T as needed. ((4b) is argued dually.) As for (5a), let us suppose that φ ∧ ψ ∈ T, but that, say, φ ∉ T. Then by exhaustiveness φ ∈ F, and since φ ∧ ψ ⊨ φ, we have by (i) (and dilution) that T ⊢ F, contrary to (0). As for the converse, if both φ, ψ ∈ T, and yet φ ∧ ψ ∉ T, then (again by exhaustiveness) φ ∧ ψ ∈ F. And since φ, ψ ⊨ φ ∧ ψ, we would have (using (i) and dilution) T ⊢ F, contrary to (0). (Again (5b) is argued dually.) Using (0)-(5) it is easy to see that T is a truth set. Further, for each pair (Γᵢ, Δᵢ) such that not (Γᵢ ⊢ Δᵢ) there exists such a pair (Tᵢ, Fᵢ), and so we can use the "parasitical" atlas defined on the algebra of sentences S, (S, (Tᵢ)), as a normal atlas in which every non-consequence can be falsified (completeness). It is also clear that no correct consequence is falsified this way, for if Π ⊢ Θ, and yet this consequence could be falsified, then there would have to exist a pair (Tᵢ, Fᵢ) and a substitution σ so that σ*(Π) ⊆ Tᵢ and σ*(Θ) ⊆ Fᵢ. But because ⊢ is formal, we know that σ*(Π) ⊢ σ*(Θ), and so by dilution we would have Tᵢ ⊢ Fᵢ, contrary to specifications. □
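Conditions (3a)-(5b) can be checked concretely when T and F come from a single classical valuation (true sentences in T, the rest in F). The sketch below is our own illustration, not part of the proof:

```python
from itertools import product

# Check (our illustration) of the partition conditions: negation splits T
# and F, conjunction is in T iff both conjuncts are, and disjunction is in
# F iff both disjuncts are.

ATOMS = ['p', 'q']

def holds(f, v):
    if f[0] == 'atom':
        return v[f[1]]
    if f[0] == 'not':
        return not holds(f[1], v)
    if f[0] == 'and':
        return holds(f[1], v) and holds(f[2], v)
    if f[0] == 'or':
        return holds(f[1], v) or holds(f[2], v)

atoms = [('atom', a) for a in ATOMS]
sentences = (atoms + [('not', f) for f in atoms]
             + [('and', f, g) for f in atoms for g in atoms]
             + [('or', f, g) for f in atoms for g in atoms])

v = {'p': True, 'q': False}
T = {f for f in sentences if holds(f, v)}
F = {f for f in sentences if not holds(f, v)}

for f, g in product(atoms, repeat=2):
    assert (f in T) != (('not', f) in T)                # (3a)/(4a)
    assert (('and', f, g) in T) == (f in T and g in T)  # (5a)
    assert (('or', f, g) in F) == (f in F and g in F)   # (5b)
```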
Corollary 7.10.7 (Symmetric case) Any compact symmetric pre-consequence relation has a normal characteristic atlas. Proof Immediate from the second half of the proof of Theorem 7.10.1, and from Lemma 6.8.4 showing that a compact symmetric pre-consequence relation is a symmetric consequence relation. □
7.11 Normal Characteristic Matrices for Consequence Logics
In this section we in effect combine the results of the previous three sections. The background for all of this work is the fact that every unary assertional logic has a characteristic matrix (the Lindenbaum matrix). In Section 7.8 we examined the corresponding property for consequence logics, providing necessary and sufficient conditions (due to Shoesmith and Smiley) for a consequence logic (of either the asymmetric or symmetric flavor) having a characteristic matrix. In Section 7.9 we returned to unary assertional logics, and raised the question of a stronger, normal, characteristic matrix, again providing necessary and sufficient conditions (due to Kripke) for a unary assertional logic to have a normal characteristic matrix. In Section 7.10 we came back to consequence logics, raising the normality issue, but this time in the context of atlases instead of matrices
(just as every unary assertional logic has a characteristic matrix, so every consequence logic has a characteristic atlas; the interest then extends to the normal ones in each case). Now in the present section we attack the question of when a consequence logic has a normal characteristic matrix, combining all of the issues of strength and generality at once.
Theorem 7.11.1 For a stable symmetric logic ⊢ the following constitute necessary and sufficient conditions for it to have a normal characteristic matrix: (i) ⊢ has the cancellation property; (ii) ⊨ ⊆ ⊢ (⊨ is tautological implication); (iii) Γ ⊢ φ ⊃ ψ, Δ iff Γ, φ ⊢ ψ, Δ.
Proof Before we begin, let us recall the observation made in Section 7.9 in discussing the conditions of the theorem of Kripke. There it was remarked that in the presence of condition (ii) above, Halldén completeness and the cancellation property amount to the same thing, so we can freely substitute them below in our reasoning. Necessity is straightforward and is left to the reader. As for sufficiency, we begin as with the theorem of Shoesmith and Smiley: we collect together all the pairs (Γᵢ, Δᵢ) such that it is not the case that Γᵢ ⊢ Δᵢ, and then for each such pair make a disjoint copy Sᵢ of the set of sentences S (we let one of these be S itself). We then union these together to get a new set of sentences S′, and define a new consequence relation ⊢′ on the subsets of S′, making ⊢′ the closure of ⊢ under substitution and dilution. That ⊢′ is a symmetric consequence relation is just the content of the hypothesis of "stability." We then verify as before that not (⋃Γᵢ ⊢′ ⋃Δᵢ), and then invoke the global cut property so as to partition S′ into two sets T and F so that ⋃Γᵢ ⊆ T and ⋃Δᵢ ⊆ F. We now have only to verify that T is a truth set, and this verification proceeds exactly like the verification for the symmetric case of the corresponding theorem for atlases of Section 7.10. □ We can also prove the following results, by techniques familiar by now (and left to the reader).
Corollary 7.11.2 A compact symmetric pre-consequence relation has a normal characteristic matrix under precisely the same set of necessary and sufficient conditions as in Theorem 7.11.1.

Theorem 7.11.3 Let ⊢ be a compact asymmetric logic. Then the following are necessary and sufficient conditions for ⊢ to have a normal characteristic matrix: (i) ⊢ has the asymmetric cancellation property; (ii) ⊨ ⊆ ⊢ (⊨ is asymmetric tautological implication); (iii) Γ ⊢ φ ⊃ ψ iff Γ, φ ⊢ ψ.
7.12 Matrices and Algebras
Although a matrix is an algebra, it is not just an algebra, because of the need of singling out a designated subset. This is unfortunate in that it means in general that we cannot just carry over results from universal algebra⁹ and apply them to matrices without thought (although often close analogs can be obtained). But in this section we shall discuss how it is that often "in real life" matrices can be viewed (without loss) as just algebras. Often a matrix M will have an "implication" operation →, so that if we define an "implication" relation a ≤ b iff a → b ∈ D, it turns out that ≤ is a partial ordering. If in addition D is a (positive) cone under this partial order (a ∈ D and a ≤ b only if b ∈ D), we shall call the matrix standard. There is a slightly weaker notion that is also of interest. If ≤ turns out only to be a pre-order (not necessarily anti-symmetric), then, given one more condition which we shall next describe, we shall call the matrix pre-standard. Thus define a ≡ b iff both a ≤ b and b ≤ a. The additional requirement is that ≡ must be a congruence (when ≤ is anti-symmetric, ≡ is just identity and we have no need for this requirement). Note that ≡ is a strong congruence given the requirement above that D be a cone. It is easy to see that the quotient matrix of a pre-standard matrix is itself standard. Now for standard matrices a trick for throwing away D is to find some distinguished element e so that D = {x ∈ M : e ≤ x}. The element e can be understood as just a nullary operation added to the underlying algebra. Unfortunately, this does not by itself suffice to let us consider M as just an algebra, since we still have the partial order with which to contend. But when there is a semi-lattice operation ∧ so that a ≤ b holds just when a ∧ b = a, then the reduction of a matrix to a plain algebra can be complete. (Clearly it would also suffice to have the dual semi-lattice operation ∨ with a ≤ b iff a ∨ b = b.) This allows us to give an entirely equational characterization of D as the set of elements satisfying the equation e ∧ x = e.
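To make the equational characterization of D concrete, here is a miniature check (our own illustration, not from the text): a three-element chain with meet as the semi-lattice operation, where the cone above e and the solution set of e ∧ x = e coincide.

```python
# A completely standard matrix in miniature (our illustration): the
# three-element chain 0 < e < 1, encoded as 0 < 1 < 2 with meet = min and
# distinguished element e = 1.  The designated cone {x : e <= x} coincides
# with the solution set of the equation e ∧ x = e.

chain = [0, 1, 2]
e = 1

def meet(x, y):
    return min(x, y)

def leq(x, y):
    return meet(x, y) == x      # a <= b iff a ∧ b = a

D_cone = {x for x in chain if leq(e, x)}
D_eqn = {x for x in chain if meet(e, x) == e}
assert D_cone == D_eqn == {1, 2}
```

With both the order and D recovered from ∧ and e, nothing in the matrix remains beyond plain algebraic structure, which is the point of the reduction.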
So let us call a standard matrix with an operation that interacts in the required way with ≤ (and hence ultimately with →) a completely standard matrix. Another trick is to find an equation of the form s(a) = t(a) that holds of exactly the designated elements D. Blok and Pigozzi (1989) observe that this can be done for the relevance logic R using the fact that a is designated iff a → a ≤ a. And, of course, this last can be restated equationally as (a → a) ∧ a = a. There are a few things that need to be checked about the coincidence of matrix notions and algebraic notions for completely standard matrices. Since equations are preserved under algebraic homomorphisms, it is easy to see that a (weak) matrix homomorphism is just an ordinary algebraic homomorphism. Also, it is easy to see that a submatrix is just a subalgebra. First notice that if A′ is a subalgebra of A, then (the nullary operation) e′ has to be identical to e. But then the cone of elements in A′ determined by e′ is just the same as the cone of elements in A determined by e, i.e., D′ = A′ ∩ D, as required. It is left as an exercise for the reader to verify that direct (and subdirect) products of matrices are just direct (and subdirect) products in the algebraic sense. (The verification falls back on the componentwise definitions of designation and ≤, once it is noted that the distinguished element of the direct product is just the indexed set of the distinguished elements of the component algebras.)

⁹Matrices can be regarded as many-sorted algebras in a natural way, and we could then appeal to results of many-sorted universal algebra. We have thought it best to leave the latter subject outside the scope of this monograph, however.

The above ideas are perhaps most familiar in the context of classical logic and Boolean algebras, where the distinguished element e can be picked as the greatest element 1, and so D = {1}. The particular choice of e depends, of course, on the posit that all logical truths co-imply one another (an assumption common to a number of logics other than classical logic, e.g., the usual modal and intuitionistic logics). It turns out that the general idea is of much wider utility, and can be applied even to the usual relevance logics (which certainly have no such posit), although the logics have to be extended conservatively with a constant sentence t with the property that it is a theorem that implies all theorems.

7.13 When is a Logic "Algebraizable"?
We have been examining logics using the tools of algebra. It is natural to ask: how far does this methodology extend? We here give some rough answers to this question. Our answers are rough because we feel that a more precise answer may actually get in the way of further research. We follow Chairman Mao in wanting a hundred flowers to bloom. An algebra is simply a set with some operations. A logic is a set of sentences with some kind of consequence relation. For the sake of clarity we shall focus on asymmetric consequence. The applications to symmetric consequence and unary assertional systems are often left to the reader. We start by recalling some results of the previous chapter. For a unary assertional logic we showed that its Lindenbaum matrix characterizes its assertions, whereas for an asymmetric consequence logic its Lindenbaum atlas characterizes its consequences, and for a symmetric consequence logic it is the corresponding Scott atlas that does the trick. An atlas is just an algebra with many designated sets, and it is easy to see that an atlas (A, (Dᵢ)ᵢ∈I) can be "unfolded" into an equivalent indexed set of matrices ((A, Dᵢ))ᵢ∈I (equivalent in the sense of validating the same consequences, whether they be unary, asymmetric, or symmetric). When a collection of matrices all have the same underlying algebra, this is sometimes called a bundle. Summarizing, we have shown in Chapter 5 that formal logics, whether they be unary, asymmetric, or symmetric, all have a "matrix semantics" in the following sense.
Theorem 7.13.1 For every unary assertional logic L there is a class of matrices M such that for every sentence φ: ⊢_L φ iff ⊨_M φ. For every asymmetric consequence logic L there is a class of matrices M such that for every set of sentences Γ and sentence φ: Γ ⊢_L φ iff Γ ⊨_M φ. For every symmetric consequence logic L there is a class of matrices M such that for all sets of sentences Γ and Δ: Γ ⊢_L Δ iff Γ ⊨_M Δ.
A matrix is an algebra with a designated subset, so this result shows that algebras figure centrally in the semantics of logic. On the other hand a matrix is more than an algebra because of its designated subset D. In addressing which logics are "very algebraizable" we must address the question when D can be defined "algebraically."
Czelakowski (1981) defined an algebraic semantics for a logic to be a class of matrices characterizing the consequence relation of the logic, with each matrix having just one designated element d. Without the restriction to just one designated element, we shall call this a matrix semantics. Let us introduce a predicate T(φ) which intuitively means "φ is true." This means that we can add a constant a_d to the language of the logic, with the interpretation rule that I(a_d) = d. We can now define T(φ) as φ = a_d. The idea is that an inference licensed in classical logic, e.g., {p, p → q} ⊢ q, can be translated into a statement about Boolean algebras, namely, if x = 1 and x → y = 1, then y = 1. Note that the relettering is unnecessary if we use the same variables and operation symbols in the language of the algebra that we use in the language of the logic, and we shall adopt this simplifying convention for this discussion. Having a single designated element works fine for many logics, including classical and intuitionistic logic, where any theorem is implied by any sentence whatsoever, and so all theorems are logically equivalent. For these logics we can then look at the equivalence class of theorems as the greatest element 1 in the natural partial order defined by [φ] ≤ [ψ] iff ⊢ φ → ψ. For logics where a theorem is not implied by any sentence whatsoever, e.g., relevance logic and some other substructural logics, we have to resort to another device, briefly described in the previous section. The device introduced in Dunn (1966) (cf. Anderson and Belnap 1975, Section 28.2) was to add to the language of the logic R a constant t conceived of as the conjunction of all theorems. Anderson and Belnap had shown that t can be added conservatively to R. In the Lindenbaum algebra [t] ≤ [φ] is then a way of saying that φ is a theorem, and this can be abstracted out by having an "identity" element e and defining T(a) as e ≤ a.
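The Boolean translation of modus ponens just described can be verified by brute force over a small Boolean algebra. The encoding below (subsets of a two-element set, with 1 as the top element) is our own illustration:

```python
from itertools import product

# The modus ponens quasi-equation checked over the four-element Boolean
# algebra of subsets of {0, 1} (our illustration): whenever x = 1 and
# x -> y = 1, also y = 1, where x -> y is complement(x) ∪ y.

TOP = frozenset({0, 1})
elements = [frozenset(s) for s in ([], [0], [1], [0, 1])]

def imp(x, y):
    return (TOP - x) | y        # Boolean implication as ¬x ∨ y

for x, y in product(elements, repeat=2):
    if x == TOP and imp(x, y) == TOP:
        assert y == TOP         # the quasi-equation holds
```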
The idea is that the "true" elements can be viewed as forming a principal cone, and it is just an "accident" that with, say, Boolean algebras this cone is degenerate and contains only 1. Cf. Remark 6.15.13. Note incidentally that e ≤ a can be rephrased as an equation:

e ∧ a = e.

But this does involve adding a constant to the language. A less contrived answer is to say that it is when we can find a set of equations of the form s(x) = t(x) (containing only the variable x) which holds precisely of the elements of D. Blok and Pigozzi (1989) call these defining equations. They introduce this generalization in their definition of what it is for a logic to have an algebraic semantics.¹⁰ It roughly amounts to saying that the logic has a matrix semantics, and the designated set D of each matrix can be uniformly characterized by equations. This is not quite right, since they also require that the set of defining equations be finite. This turns out to be no problem since they only consider logics that satisfy compactness, so an infinite set of premises Γ can always be traded for a finite subset Γ′. It would be interesting to investigate logics without this restriction to compactness.

¹⁰Though they also implicitly introduce the requirement that the set of premises Γ in Γ ⊢ φ is always finite. This obviously does not hurt for a system that satisfies compactness, but otherwise seems arbitrary.

Definition 7.13.2 (Blok and Pigozzi 1989) An asymmetric consequence logic L has an algebraic semantics iff there exist a class of algebras K and a finite set of equations with a single variable p, s₁(p) = t₁(p), …, sₙ(p) = tₙ(p), such that for all i (1 ≤ i ≤ n):

Γ ⊢_L φ iff {s₁(ψ/p) = t₁(ψ/p), …, sₙ(ψ/p) = tₙ(ψ/p) : ψ ∈ Γ} ⊨_K sᵢ(φ/p) = tᵢ(φ/p).

The definition of when a unary assertional logic has an algebraic semantics is just the special case of this when Γ is empty. We leave to the reader to state the obvious generalization for a symmetric consequence logic. Blok and Pigozzi go on to give as their criterion for when an asymmetric consequence logic is "algebraizable":

Definition 7.13.3 An asymmetric consequence logic is algebraizable iff there is a quasi-equationally definable class of algebras K such that K is an algebraic semantics for the logic.

Their criterion for a unary assertional logic would presumably be essentially the same, except that the class is required to be equationally definable.

Definition 7.13.4 A unary assertional logic is algebraizable iff there is an equationally definable class of algebras K such that K is an algebraic semantics for the logic.

Remark 7.13.5 Just what would be the "Blok-and-Pigozziesque" criterion for algebraizability of a symmetric consequence logic? The criterion for symmetric consequence would be of the same form as their criterion for an asymmetric consequence logic, but K would be weaker than quasi-equationally definable. It would instead have to do with definability by "symmetric quasi-equations," i.e., formulas of the form: a conjunction of equations implies a disjunction of equations.

Blok and Pigozzi show that theoremhood in the relevance logic R can be characterized without use of t, using the equivalence

(φ ∧ (φ → φ)) ↔ (φ → φ),
which is fundamentally based on the R-theorem

φ → [(φ → φ) → φ] (Demodalizer),

whose converse also holds. This means that the theorems can be characterized as those sentences φ such that

(φ → φ) → φ

is a theorem. This may seem to be a circular definition, since "theorem" occurs in both the definiens and the definiendum, but the previous formula can be rephrased algebraically as

(a → a) ≤ a,

and obviously this can be reexpressed as the identity that Blok and Pigozzi require:

a ∧ (a → a) = (a → a).
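As a concrete check (our own illustration): in a Boolean algebra a → a is the top element 1, so the identity a ∧ (a → a) = (a → a) holds of a exactly when a = 1, i.e., it defines the degenerate designated set {1}.

```python
from itertools import chain, combinations

# The defining equation a ∧ (a -> a) = (a -> a), evaluated over the Boolean
# algebra of subsets of {0, 1, 2} (our illustration).  Since a -> a is the
# top element, the equation reduces to a = 1.

TOP = frozenset({0, 1, 2})
elements = [frozenset(s) for s in
            chain.from_iterable(combinations(sorted(TOP), r) for r in range(4))]

def imp(x, y):
    return (TOP - x) | y        # Boolean implication as ¬x ∨ y

designated = [a for a in elements if a & imp(a, a) == imp(a, a)]
assert designated == [TOP]
```

In an algebra for R the equation is not degenerate in this way, which is exactly why it earns its keep there.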
Whichever is chosen, R has a (characteristic) algebraic semantics. Blok and Pigozzi go on to show that the relevance logic E has no algebraic semantics. As the name "demodalizer" suggests, E lacks it since E is a modal logic in addition to being a relevance logic. In E the necessity operator □φ can be defined as (φ → φ) → φ, and so (the demodalizer) has the effect of making mere truths necessary. Blok and Pigozzi also show that the implicational fragment of R (R→) has no algebraic semantics. This relies on the fact that R→ lacks conjunction (and disjunction, since a ∧ (a → a) = a → a can be dualized to a ∨ (a → a) = a). It is well worth noting that this depends upon the lack of an appropriate equation, since the inequality a → a ≤ a would suit the bill admirably if inequations were allowed. Its surrogate e ∧ a = e works just as well for the extension of R with a logical constant t. We do not quarrel with the work of Blok and Pigozzi. Indeed, we praise it. Their criterion of algebraizability has a certain "philosophical" naturalness, and generalizes a large class of motivating logics. Also they can prove a number of interesting theorems, one of which we state. Recall that Ω(T) is the Leibniz congruence determined by a theory T on the underlying algebra of formulas of some given logic. Blok and Pigozzi establish the following interesting relationship between algebraizability and the Leibniz operator.

Theorem 7.13.6 (Blok and Pigozzi 1989) An asymmetric consequence logic L is algebraizable iff for all theories T and S, if T ⊆ S then Ω(T) ⊆ Ω(S).

But we think the restriction to equalities is too restrictive, and we propose another.

Definition 7.13.7 An asymmetric consequence logic L is partially algebraizable iff there is a quasi-inequationally definable class of tonoids K such that K is a (sound and complete) semantics for the logic.
Their criterion for a unary consequence logic is essentially the same, except that the class is required to be inequationally definable. Definition 7.13.8 A unary assertional logic is partially algebraizable iff there is an inequationally definable class of tonoids K such that K is a (sound and complete) semantics for the logic.
8 REPRESENTATION THEOREMS

8.1 Partially Ordered Sets with Implication(s)

8.1.1 Partially ordered sets
We annunciated a theme in Remark 6.5.3 which we want to reemphasize here. Propositions can be understood as sets of "possible worlds" or, as we now stress, more generally as sets of "information states." The latter is more general in that there can be states of information that are inconsistent, incomplete, or both, and so do not correspond to possible worlds. An "information frame" will always consist of at least a set U whose elements are regarded as "states of information." It may have additional features, for example a binary relation ⊑ on U thought of as an "information order." α ⊑ β is to be read as "β contains at least the information α." This order is to be understood "qualitatively" and not "quantitatively" and is thus to be contrasted with Shannon's information theory. The information order arises quite naturally in a number of places, but we will content ourselves with assigning its origins to the Kripke-Grzegorczyk semantics for intuitionistic logic (cf. Chapter 11). Other features that an information frame might possess include accessibility relations and/or operations combining pieces of information. The most familiar example of the first is Rαβ (β is possible relative to α), which comes from the Kripke semantics for modal logic (cf. Chapter 10). As a less familiar example of the first we give Rαβγ, understood as something like "α and β are compatible from the standpoint of γ," and as an example of the second we have something like "the combination α • β of α and β." These can sometimes be parsed in terms of each other, e.g., Rαβγ can be understood as α • β ⊑ γ. These last examples arise from the semantics of relevance logic (and more generally substructural logics), as developed by Routley, Meyer, Fine, and Urquhart. See Dunn (1986) or Anderson et al. (1992) for details and history. Routley and Meyer (1972, 1973) are the key references, along with Meyer and Routley (1972), which is even more important in the context of algebraic logic.
We refer generally to sets whose elements are regarded as states as "UCLA propositions." As the reader saw by working through Exercise 6.5.2, binary consequence, understood "mathematically" as a partial order between propositions, can be fully represented as inclusion between sets, and the latter can be understood "philosophically" as consequence between UCLA propositions. We run through the proof of this in slow motion, since grasping its essence is of major importance. Let P = (P, ≤) be a poset. We think of P as a set of propositions and ≤ as (binary) consequence. These "propositions" are conceived of abstractly. They could be anything;
we know nothing about their internal structure. Recall that a cone C is a subset of P which is closed upward under ≤, i.e., if x ∈ C and x ≤ y, then y ∈ C. Let C be the set of all cones of P. A cone is a kind of primitive "theory" (at least it is closed under binary consequence) and as such can be regarded as an information state. So it is natural to interpret an "abstract" proposition a ∈ P as the set of information states, i.e., theories (cones), that contain it. Thus, we define

(1) h(a) = {C ∈ C : a ∈ C}.

This is the desired representation function. We need to show that it preserves ≤ (interpreting it as ⊆ on C):

(2) a ≤ b iff h(a) ⊆ h(b), i.e.,
(3) a ≤ b iff ∀C ∈ C, a ∈ C implies b ∈ C.

The left-to-right half follows immediately from the definition of a cone. The right-to-left half follows by instantiating C to be the cone determined by a, i.e., the smallest cone containing a. This involves a small "existence" proof since we have to show that there is indeed such a cone. In this case it is easy, since it is explicitly constructed as [a) = {x : a ≤ x}. As we shall see, there are other cases, as with the representation of distributive lattices and Boolean algebras, when we have to go through some maximalization argument, but here all we need is a cone and it is easy to show that [a) is a cone. Thus suppose that x ∈ [a) and x ≤ y. The first conjunct just means a ≤ x and so by transitivity with the second conjunct we obtain a ≤ y, i.e., y ∈ [a) as needed. But what we have from (3) is:

(4) a ∈ [a) implies b ∈ [a).

Since a ∈ [a) follows from the reflexivity of ≤, we have b ∈ [a) by the argument above, i.e., a ≤ b as desired. We still have to show that h is one-one, but this follows easily from anti-symmetry and (2). Thus if h(a) = h(b), then h(a) ⊆ h(b) and h(b) ⊆ h(a). But then by (2) a ≤ b and b ≤ a, and so by anti-symmetry a = b. This concludes the proof. Before we pass on, let us observe that sets of cones of the form h(a) have a special property. This is confusing as to type level, but not only are they sets of cones, they themselves are cones. Thus if C ∈ h(a), i.e., a ∈ C, and C ⊆ C′, then a ∈ C′, i.e., C′ ∈ h(a). This is to say that we can require UCLA propositions to be not just any sets of states, but restrict them to sets of states closed upward under the information order, i.e., if α ∈ p and α ⊑ β, then β ∈ p. This is sometimes useful, and plays a role, for example, in the Kripke-Grzegorczyk semantics for intuitionistic logic (cf. Section 11.4).

8.1.2 Implication structures

In the previous section we regarded implication as a relation. We now turn to examining structures with an implication operation. While we take the point of view that useful properties of implication can be sorted out by relating them to properties of fusing premises, it must be acknowledged that fusion is (regrettably) still a relatively arcane notion. Accordingly it is interesting to wonder just what are the minimal properties of implication that are needed to allow it to be the residual of some fusion operator. Let us arbitrarily take implication to be the right residual; it will turn out, as we shall see, that there is no way of distinguishing the left residual from the right residual in the absence of fusion. So we assume that we have a poset with a binary operation (S, ≤, →). The only properties of → that we assume are that it is antitonic in its first position (antecedent) and isotonic in its second position (consequent), i.e., rule suffixing and rule prefixing. We shall call such a structure an implicational poset, and as we saw in Section 3.10 it is an example of a more general structure, discussed in Dunn (1993a), called a tonoid.
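Before turning to the representation theorems for implication, the Section 8.1.1 construction can be checked mechanically on a small example; the four-element "diamond" poset below is our own illustration:

```python
from itertools import chain, combinations

# The cone representation computed for the diamond 0 < a, b < 1 (our
# illustration): h(x) is the set of cones containing x, and x <= y holds
# exactly when h(x) ⊆ h(y).

P = ['0', 'a', 'b', '1']
strict = {('0', 'a'), ('0', 'b'), ('0', '1'), ('a', '1'), ('b', '1')}

def leq(x, y):
    return x == y or (x, y) in strict

def is_cone(C):
    """A cone is upward closed: x in C and x <= y imply y in C."""
    return all(y in C for x in C for y in P if leq(x, y))

cones = [frozenset(s) for s in
         chain.from_iterable(combinations(P, r) for r in range(len(P) + 1))
         if is_cone(s)]

def h(x):
    return {C for C in cones if x in C}

for x in P:
    for y in P:
        assert leq(x, y) == (h(x) <= h(y))
```

Note that the incomparable pair a, b is separated exactly as in the proof: the principal cone [b) lies in h(b) but not in h(a), and vice versa.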
Representation of Implicational Posets

We shall now prove the following, which illustrates a more general result for tonoids.

Theorem 8.1.1 Every implicational poset (S, ≤, →) can be embedded in a right residuated partially ordered (p.o.) groupoid (S', ≤, ∘, →).
Proof We prove this by way of a representation result. Let U be a non-empty set, and let R be a ternary relation on U. We call the structure (U, R) a ternary frame. We define the following operations on subsets of U:

A ∘ B = {x : ∃α ∈ A, ∃β ∈ B, Rαβx};
A → B = {x : ∀α, ∀β, if Rαxβ & α ∈ A, then β ∈ B};
B ← A = {x : ∀α, ∀β, if Rxαβ & α ∈ A, then β ∈ B}.

It is easy to verify that (℘(U), ∘, →, ←) is a residuated p.o. groupoid. Indeed, any subcollection S' closed under the operations ∘, →, ← is a residuated p.o. groupoid. We call these concrete residuated p.o. groupoids. □
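The residuation laws A ∘ B ⊆ C iff B ⊆ A → C iff A ⊆ C ← B can be verified by brute force over any small ternary frame. A minimal Python sketch (the helper names `fuse`, `arrow`, `coarrow` are our own, and the random frame is an arbitrary choice):

```python
from itertools import combinations
import random

# A random ternary frame (U, R); the operations follow the text's definitions.
random.seed(0)
U = (0, 1, 2)
R = {(a, b, x) for a in U for b in U for x in U if random.random() < 0.4}

def fuse(A, B):
    """A o B = {x : exists alpha in A, beta in B with R(alpha, beta, x)}."""
    return frozenset(x for x in U if any((a, b, x) in R for a in A for b in B))

def arrow(A, B):
    """A -> B = {x : R(alpha, x, beta) and alpha in A imply beta in B}."""
    return frozenset(x for x in U
                     if all(b in B for a in A for b in U if (a, x, b) in R))

def coarrow(B, A):
    """B <- A = {x : R(x, alpha, beta) and alpha in A imply beta in B}."""
    return frozenset(x for x in U
                     if all(b in B for a in A for b in U if (x, a, b) in R))

subsets = [frozenset(s) for r in range(len(U) + 1) for s in combinations(U, r)]

# Residuation: A o B <= C iff B <= A -> C iff A <= C <- B, for every A, B, C.
for A in subsets:
    for B in subsets:
        for C in subsets:
            assert (fuse(A, B) <= C) == (B <= arrow(A, C)) == (A <= coarrow(C, B))
```

The check passes for every ternary relation R, which is the content of the claim that ℘(U) always forms a residuated p.o. groupoid under these operations.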
We can now state and prove the following lemma.

Lemma 8.1.2 Every implicational poset (S, ≤, →) can be embedded in a concrete residuated p.o. groupoid.

Proof Let U be the set of all cones C on S. Define the relation R as

RC1C2C3 iff ∀x, y, if x ∈ C1 and x → y ∈ C2 then y ∈ C3.

Using R we can now define the operation → on subsets of U as in the concrete residuated poset above. Next define the map h(a) = {C : C is a cone and a ∈ C}. It is easy to see that h is one-one, since if a ≠ b, then a ≤ b or b ≤ a or a‖b (a and b are unrelated). Without loss of generality we may choose a ≤ b and a‖b to be the cases considered. (1) If a ≤ b then the principal cone [b) = {x : b ≤ x} is not in h(a), but certainly [b) ∈ h(b) as well as [a) ∈ h(b). (2) If a‖b, then neither [a) ∈ h(b) nor [b) ∈ h(a). Next we show that h preserves →, i.e., C ∈ h(a → b) iff C ∈ h(a) → h(b). To facilitate this we first translate the left- and right-hand sides via their definitions: C ∈ h(a → b) iff a → b ∈ C; C ∈ h(a) → h(b) iff ∀C1, C2, RC1CC2 & a ∈ C1 implies b ∈ C2.
REPRESENTATION THEOREMS
It only remains to show then that a → b ∈ C iff ∀C1, C2, RC1CC2 & a ∈ C1 implies b ∈ C2.
But the left-to-right half is immediate, given the canonical definition of R, which simply says that if an implication is in C and its antecedent is in C1 then its consequent is in C2. For the right-to-left half, we proceed contrapositively, assuming that a → b ∉ C. We then show that ∃C1, C2, RC1CC2 & a ∈ C1 yet b ∉ C2. We simply let C1 = [a). To obtain C2, we first consider the principal dual cone determined by b, (b] = {x : x ≤ b}. We then set C2 = U − (b] (it is easy to check that this is a cone). It is clear that a ∈ C1 and b ∉ C2. What needs argument is that RC1CC2. Recalling the canonical definition of R, this amounts to assuming that x ∈ C1, x → y ∈ C and showing y ∈ C2. For reductio let us suppose then that y ∉ C2. Our hypotheses that x ∈ C1 and y ∉ C2 become a ≤ x and y ≤ b. Using the rule forms of prefixing and suffixing we can easily derive x → y ≤ a → b (y ≤ b implies x → y ≤ x → b; a ≤ x implies x → b ≤ a → b; and apply transitivity). Then our hypothesis that x → y ∈ C gives that a → b ∈ C, contrary to our initial supposition. □

Representation of Double Implicational Posets

Let us now consider the case where we have both "residuals" but no fusion operation to connect them. To this end we will define a double implicational poset to be a structure (S, ≤, →, ←), where both (S, ≤, →) and (S, ≤, ←) are implicational posets, and the arrows interact by "pseudo-assertion":

(1) a ≤ (b ← a) → b;
(2) a ≤ b ← (a → b).
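Both pseudo-assertion postulates hold automatically in the concrete residuated p.o. groupoids of Theorem 8.1.1, which is one way to see that they are the right axioms to demand. A brute-force sketch over a random ternary frame (hypothetical helper names, arbitrary frame):

```python
from itertools import combinations
import random

# Check pseudo-assertion (1) A <= (B <- A) -> B and (2) A <= B <- (A -> B)
# in the concrete operations built from a random ternary frame.
random.seed(1)
U = (0, 1, 2)
R = {(a, b, x) for a in U for b in U for x in U if random.random() < 0.5}

def arrow(A, B):
    """A -> B = {x : R(alpha, x, beta) and alpha in A imply beta in B}."""
    return frozenset(x for x in U
                     if all(b in B for a in A for b in U if (a, x, b) in R))

def coarrow(B, A):
    """B <- A = {x : R(x, alpha, beta) and alpha in A imply beta in B}."""
    return frozenset(x for x in U
                     if all(b in B for a in A for b in U if (x, a, b) in R))

subsets = [frozenset(s) for r in range(len(U) + 1) for s in combinations(U, r)]
for A in subsets:
    for B in subsets:
        assert A <= arrow(coarrow(B, A), B)   # pseudo-assertion (1)
        assert A <= coarrow(B, arrow(A, B))   # pseudo-assertion (2)
```

As with residuation, the assertions pass for every choice of R: unwinding the definitions shows each is a pure quantifier shuffle.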
The following shows that we have again correctly axiomatized (in this case both) residuals without residuation. Theorem 8.1.3 Every double implicational poset can be embedded in a concrete residuated p.o. groupoid.
Proof From the theorem above we know that every implicational poset is so embeddable, using the canonical relation R defined above, but let us now subscript it:
By a symmetric argument every implicational poset is also so embeddable using the canonical relation:
PARTIALLY ORDERED SETS WITH IMPLICATION(S)
that R-+CI C2C3 says '< 1) V (ia >< 0) EM.
Since M is prime, this means that (ia >< 1) ∈ M or (ia >< 0) ∈ M, i.e., ia ≈ 1 or ia ≈ 0. This means that for x ∈ A/≈, ix = 1 or ix = 0. Now suppose that x ≠ 1. Then by Proposition 10.2.8, ix ≠ 1. Thus, ix = 0 is the only alternative remaining. □
Theorem 10.8.4 Every S5-algebra is isomorphic to a subdirect product of Henle algebras.

Proof This is immediate from Lemma 10.8.3 and Theorem 2.8.12. □
Remark 10.8.5 There are at least two other methods of proving the above theorem. The first is to show that only Henle matrices are irreducible and then apply Birkhoff's prime factorization theorem. If A is irreducible and not a Henle matrix, there is an element of the form ca ≠ 0, 1. We define congruences ≈μ and ≈ν just as for Lemma 8.11.12, but putting ca in place of a. The only thing then that needs doing is to show that these respect the modal operator, say c. If x ≈μ y, then x ∧ ca = y ∧ ca & −x ∧ ca = −y ∧ ca. From the first we can conclude c(x ∧ ca) = c(y ∧ ca), and from this we can obtain cx ∧ ca = cy ∧ ca, using Proposition 10.2.13. This is just half of showing that cx ≈μ cy. We must also show that −cx ∧ ca = −cy ∧ ca. We have that −x ∧ ca = −y ∧ ca, and so i(−x ∧ ca) = i(−y ∧ ca). But again using Proposition 10.2.13 we obtain i−x ∧ ca = i−y ∧ ca, and hence −cx ∧ ca = −cy ∧ ca as needed. All that really remains to be shown is that ≈ν respects c, where the remaining two modal distribution principles from Proposition 10.2.13 play their role. The rest of the proof is as for Lemma 8.6.6. Another way to prove the embedding theorem is given implicitly by the following exercise.

Exercise 10.8.6 We know from Exercise 10.6.8 (cf. also Section 10.7) that every S5-algebra A is representable using a frame (U, ≡), where ≡ is an equivalence relation on U. We also know that U is partitioned by ≡ into a number of disjoint equivalence classes {[w] : w ∈ U}. Look at each power set ℘([w]) as a field of sets. Show that it becomes a Henle matrix, given the definition, for X ⊆ [w], C(X) = {β : ∃α ∈ X, α ≡ β}. It is obvious that if X is empty then C(X) = ∅. Show that when X is not empty, then C(X) = [w]. Go on to show that the S5-algebra A is isomorphic to a subdirect product of these Henle matrices.
(This proof is somewhat analogous to showing implication 3 in Exercise 8.11.11 (Figure 8.15), and also has antecedents in the informal proof given above of the relative and absolute semantics for S5.)
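The closure operation of Exercise 10.8.6 can be checked directly: on the power set of a single equivalence block it is empty on the empty set and the whole block otherwise, and it then satisfies all the closure-algebra laws together with the characteristic S5 condition. A small sketch (the block {0, 1, 2} is an arbitrary example):

```python
from itertools import combinations

# The exercise's closure on the power set of one block [w].
block = frozenset({0, 1, 2})

def c(X):
    """C(X) is empty when X is empty, and the whole block otherwise."""
    return frozenset(block) if X else frozenset()

def i(X):
    """Dual interior operation: i(X) = -c(-X)."""
    return block - c(block - X)

subsets = [frozenset(s) for r in range(len(block) + 1)
           for s in combinations(block, r)]

for X in subsets:
    assert X <= c(X)                            # increase
    assert c(c(X)) == c(X)                      # idempotence
    assert c(block - c(X)) == block - c(X)      # S5: complement of closed is closed
    assert i(c(X)) == c(X)                      # closed elements are open
    for Y in subsets:
        assert c(X | Y) == c(X) | c(Y)          # additivity
assert c(frozenset()) == frozenset()            # normality
```

This is exactly the Henle-matrix behavior: c collapses every non-zero element to 1, so the only closed (and open) elements are 0 and 1.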
10.9 Alternation Property for S4 and Compactness
A topological space (U', C') is a subspace of a topological space (U, C) iff (1) U' ⊆ U and (2) C' is just C restricted to U'. (The latter is equivalent to the more usual requirement that each open set O' of the first topology is the intersection O ∩ U' for some open set O of the second topology.) A topological space U is compact iff for every indexed family of open subsets {O_i}_{i∈I}, whenever ⋃_{i∈I} O_i = U then there exists some finite J ⊆ I such that ⋃_{j∈J} O_j = U. This is easily seen to be equivalent to the dual condition that for every indexed family of closed subsets {C_i}_{i∈I}, if for every finite J ⊆ I, ⋂_{j∈J} C_j ≠ ∅, then ⋂_{i∈I} C_i ≠ ∅. The antecedent of this conditional, i.e., that ⋂_{j∈J} C_j is non-empty for all finite J, is often called the "finite intersection property," and the consequent, i.e., that ⋂_{i∈I} C_i is non-empty, is often called the "intersection property." Thus compactness
MODAL LOGIC AND CLOSURE ALGEBRAS
may be neatly stated as requiring that every family of closed sets which has the finite intersection property also has the intersection property. We shall be interested in a somewhat stronger property, to wit: a topological space U is strongly compact iff for every indexed family of open subsets {O_i}_{i∈I}, whenever ⋃_{i∈I} O_i = U, then there exists some i ∈ I such that O_i = U. This is again easily seen to be equivalent to the dual condition that for every indexed family of closed subsets {C_i}_{i∈I}, if for every i ∈ I, C_i ≠ ∅, then ⋂_{i∈I} C_i ≠ ∅. We want now to prove the one-point compactification theorem.

Theorem 10.9.1 Every topological space (U, C) is a subspace of a strongly compact topological space (U+, C+) such that U+ − U is a one-element set and is closed.

Proof Let a0 be anything not in U, and let U+ = U ∪ {a0}. Define C+(X) = C(X) ∪ {a0} (think of a0 as "close" to all points). Clearly, by the definition of C+, (U, C) is a subspace of (U+, C+). □

Let us say that a closure algebra (B, ∧, ∨, −, c) is finitely compact iff whenever ia ∨ ib = 1 then ia = 1 or ib = 1 (or dually, ca ≠ 0 and cb ≠ 0 imply ca ∧ cb ≠ 0). This terminology is frankly invented and is the vestigial form of strong compactness that remains when one is working with closure algebras that are not necessarily complete. Thus, it is easily seen to be equivalent to the condition that for every finite indexed family of open elements {o_i}_{i∈I}, if ⋁_{i∈I} o_i = 1, then there is an i ∈ I such that o_i = 1.

Theorem 10.9.2 Every closure algebra (B, ∧, ∨, −, c) is a subalgebra of a finitely compact closure algebra.

Proof By the representation theorem for closure algebras we know that B is isomorphic to a topological field of sets of some topological space (U, C). By the one-point compactification theorem, we know that (U, C) is a subspace of a strongly compact topological space (U+, C+). □

Theorem 10.9.3 (Alternation property for S4) If ⊢_S4 □φ ∨ □ψ, then ⊢_S4 □φ or ⊢_S4 □ψ.

Proof Consider the Lindenbaum algebra LA(S4). By Theorem 10.9.2 it can be embedded in a finitely compact closure algebra. Let us assume the hypothesis ⊢ □φ ∨ □ψ. Then i[φ] ∨ i[ψ] = 1 in LA(S4), and so h(i[φ] ∨ i[ψ]) = ih([φ]) ∨ ih([ψ]) = 1 as well, where h is the embedding. But then by the definition of finite compactness, ih([φ]) = 1 or ih([ψ]) = 1, i.e., h(i[φ]) = 1 or h(i[ψ]) = 1. So i[φ] = 1 or i[ψ] = 1 back in LA(S4), i.e., ⊢_S4 □φ or ⊢_S4 □ψ. □

10.10 Algebraic Decision Procedures for Modal Logic

We shall present a proof of McKinsey's (1941) theorem that S4 is decidable, which shows that S4 has the finite model property. We shall show the same for K and T. Decidability will then follow using Harrop's theorem (Theorem 6.16.5). The rough idea is that if, for example, ⊬_K ψ, then ψ is falsifiable in a K-algebra under the canonical valuation [ψ]. The K-algebra is "almost" a finitely generated Boolean algebra (where [p1], ..., [pk] are the generators, and p1, ..., pk are all the subformulas of ψ). This finitely generated Boolean algebra B is finite (because of normal forms) but may not be closed under c[p] = [◊p]. We define a new operation c' on B' that agrees with the old c when it exists in B' (all the theorems are about this). Thus we can now falsify ψ in a finite K-algebra. Indeed ⊢_K ψ iff ψ is valid in all K-algebras of size ≤ 2^{2^n}, where n is the number of subformulas of ψ. For other modal logics, e.g., K, T, and S4, we can obtain similar results. The details are to be found in the following theorems.

Theorem 10.10.1 (McKinsey 1941) Let (B, ∧, ∨, −, c) be a K-algebra and let (B', ∧', ∨', −') be an infinitely distributive complete Boolean subalgebra of B, i.e.,

(i) (B', ∧', ∨', −') is a complete Boolean algebra;
(ii) (B', ∧', ∨', −') is a subalgebra of (B, ∧, ∨, −), where ∧' and ∨' are defined for a, b ∈ B' as a ∧' b = ⋀'{a, b} and a ∨' b = ⋁'{a, b};
(iii) B' is infinitely distributive, that is, a ∨' ⋀'X = ⋀'{a ∨' x : x ∈ X} for a ∈ B', X ⊆ B'; and
(iv) c1 ∈ B'.

Then there exists a unary operation c' on B' such that (B', ∧', ∨', −', c') is a K-algebra, and whenever a, ca ∈ B', then c'a = ca.

Proof Before we begin, we remark that the principal application of the theorem (as in Corollaries 10.10.2 and 10.10.3) is to the case when B' is finite, in which case conditions (i)–(iii) collapse to the simple requirement that B' be a Boolean subalgebra of B. The reader may want to keep this case in mind as he goes through the proof to "fix ideas." Also, to simplify notation, we shall drop the primes from the symbols denoting the operations of the subalgebra, with the exception of the "new" operation c', which we wish to focus attention on. Now to begin the proof, define for a ∈ B',

(1) c'(a) = ⋀{cx : cx ∈ B' and a ≤ x},

which is non-empty due to (iv). We first verify that c'a = ca for a, ca ∈ B'. Thus, suppose a, ca ∈ B'; then since a ≤ a, ca is itself a component in the meet that makes up c'a. So clearly c'a ≤ ca. But ca ≤ c'a as well, since we can argue that ca ≤ any component cx in the meet displayed above. For this, suppose cx ∈ B' and a ≤ x. Then by monotonicity of c, ca ≤ cx, which is what is wanted. We will now occupy ourselves with showing that B' is a K-algebra. We need to show

(2) c'0 = 0 and
(3) c'(a ∨ b) = c'a ∨ c'b.
It is trivial that (2) holds given (1). Thus, c0 = 0 ∈ B', and so c'0 = c0 = 0. The calculations for (3) are somewhat complex. We first obtain a useful form of the right-hand side of (3). Thus,

(4) c'a ∨ c'b = ⋀{cx : cx ∈ B' & a ≤ x} ∨ ⋀{cy : cy ∈ B' & b ≤ y}.

But by infinite distribution, this equals

(5) ⋀{⋀{cx : cx ∈ B' & a ≤ x} ∨ cy : cy ∈ B' & b ≤ y}.

And by a further infinite distribution, this equals

(6) ⋀{⋀{cx ∨ cy : cx ∈ B' & a ≤ x} : cy ∈ B' & b ≤ y}.

But by "infinite association," this equals

(7) ⋀{cx ∨ cy : cx, cy ∈ B' & a ≤ x & b ≤ y}.

These computations are straightforward but confusing (because of the nested braces, etc.). Recalling that our principal application of the theorem will be to the case where B' is finite, we invite the reader to work through all our computations with c'a = cx1 ∨ ... ∨ cxn. Now going on to show (3), it obviously suffices to show both

(8) c'(a ∨ b) ≤ c'a ∨ c'b and
(9) c'a ∨ c'b ≤ c'(a ∨ b).

To prove (8), given the definition of c'(a ∨ b) and (7), it suffices to show

(10) ⋀{cz : cz ∈ B' & a ∨ b ≤ z} ≤ cx ∨ cy

upon the assumption of

(11) cx, cy ∈ B' & a ≤ x & b ≤ y.

This we do if we show cx ∨ cy to be a component of the meet that constitutes the left-hand side of (10). The trick is to show that c(x ∨ y) is such a cz. Since c is additive, c(x ∨ y) = cx ∨ cy, which is in B' by (11). It only remains to show then that a ∨ b ≤ x ∨ y, but this follows by lattice properties from (11). To prove (9), given (7) and the definition of c'(a ∨ b), it suffices to show

(12) ⋀{cx ∨ cy : cx, cy ∈ B' & a ≤ x & b ≤ y} ≤ cz

upon the assumption of

(13) cz ∈ B' & a ∨ b ≤ z.

By moves familiar by now, it suffices to show that such a cz is in fact a component of the meet on the left-hand side of (12). The trick is to note that cz = cz ∨ cz. By (13), cz ∈ B' and a ≤ z and b ≤ z, which is all that is needed to put cz in the set being meeted. □

Corollary 10.10.2 Let B and B' be as in the theorem except that condition (iv) is dropped and in its place it is required that B be a T-algebra. Then there exists a unary operation c' on B' such that (B', ∧', ∨', −', c') is a T-algebra and whenever a, ca ∈ B', then c'a = ca.

Proof Condition (iv) can be dropped since it is trivially satisfied: in every T-algebra c1 = 1, and 1 ∈ B'. Since a T-algebra is a K-algebra, we need then only check the definition of c' used in the proof of the theorem to see that it satisfies the additional postulate for a T-algebra, namely,

(1) a ≤ c'a.

Recalling the definition of c'a as the meet of cx's satisfying a certain condition, it suffices to assume

(2) cx ∈ B' and a ≤ x

and to show

(3) a ≤ cx.

But (3) clearly follows from the second conjunct of (2) and x ≤ cx (which is the T-algebra postulate (3c)). □

Corollary 10.10.3 Let B and B' be as in Corollary 10.10.2, except that B is required to be an S4-algebra. Then there exists a unary operation c' on B' such that (B', ∧', ∨', −', c') is an S4-algebra and whenever a, ca ∈ B', then c'a = ca.

Proof In virtue of the proof of Corollary 10.10.2, we need only check that the c' as defined in the proof of the theorem satisfies the characteristic S4-postulate:

(1) c'c'a ≤ c'a.

It obviously suffices to suppose that

(2) cx ∈ B' and a ≤ x

and to show

(3) c'c'a ≤ cx, i.e.,

(4) ⋀{cy : cy ∈ B' & c'a ≤ y} ≤ cx.

To show (4) it obviously suffices to show that cx is a component of the left-hand meet, i.e., to find some y such that cx = cy and cy ∈ B' and c'a ≤ y. Since ccx = cx (the (4c) condition of an S4-algebra), cx is a possible choice for y (x itself is a very poor choice for y) provided only that the other two conditions on y are met. But c(cx) = cx ∈ B' (from the S4-condition (4c) and (2)). So it remains to show only

(5) c'a ≤ cx.

But this is immediate from (2) and the definition of c'. □
Remark 10.10.4 For certain modal systems the definition of c' can be varied somewhat. Thus, for example, for S4-algebras (closure algebras) one may use the definition
c'_S4(a) = ⋀'{x : x ∈ B' and cx = x & a ≤ x}.

This definition accords with the familiar topological intuition that the closure of a set is the intersection of all its closed supersets (x is "closed" if cx = x). For other modal systems the definition of c' must be varied somewhat. Shukla (1970) gives an ingenious variation needed for some of the weaker modal systems in the neighborhood of S1.

Exercise 10.10.5 Show for an S4-algebra that c'_S4(a) = c'(a).
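McKinsey's construction can be traced on a tiny closure algebra. The sketch below uses a hypothetical four-point Alexandrov space (not an example from the text); `c_prime` and `c_s4` are our names for definition (1) and the variant of Remark 10.10.4. It checks that c' agrees with c whenever a and ca both lie in B', that c' satisfies the T and S4 postulates and additivity, and that the two definitions coincide, as Exercise 10.10.5 asserts:

```python
from itertools import combinations

# Toy S4 closure algebra: power set of {0,1,2,3} with the Alexandrov closure
# of the preorder 0 <= 1 <= 2, with 3 isolated (an arbitrary choice).
U = frozenset({0, 1, 2, 3})
below = {0: {0}, 1: {0, 1}, 2: {0, 1, 2}, 3: {3}}

def c(X):
    """Closure: downward closure of X under the preorder."""
    return frozenset().union(*(below[x] for x in X)) if X else frozenset()

B = [frozenset(s) for r in range(5) for s in combinations(U, r)]
A = frozenset({1})
Bp = {frozenset(), A, U - A, U}   # finite Boolean subalgebra generated by A

def meet(xs):
    out = U
    for x in xs:
        out = out & x
    return out

def c_prime(a):
    """Definition (1): the meet of {cx : cx in B' and a <= x}."""
    return meet(c(x) for x in B if c(x) in Bp and a <= x)

def c_s4(a):
    """Remark 10.10.4: the meet of the closed elements of B' above a."""
    return meet(x for x in Bp if c(x) == x and a <= x)

for a in Bp:
    assert a <= c_prime(a)                      # T postulate
    assert c_prime(c_prime(a)) == c_prime(a)    # S4 postulate
    assert c_s4(a) == c_prime(a)                # Exercise 10.10.5
    if c(a) in Bp:
        assert c_prime(a) == c(a)               # agreement with the old c
for a in Bp:
    for b in Bp:
        assert c_prime(a | b) == c_prime(a) | c_prime(b)   # additivity
```

Note that c({1}) = {0, 1} falls outside B', so c' genuinely differs from c on B', yet all the postulates survive.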
Exercise 10.10.6 For certain systems stronger than K the definition of c' must be varied somewhat. Thus consider the logic D = K + ◊(φ ∨ ¬φ). (◊(φ ∨ ¬φ) may be replaced by □φ ⊃ ◊φ, which has been thought particularly appropriate for a deontic logic with □ read as "it is obligatory that" and ◊ read as "it is permissible that.") The condition on the accessibility relation is that every world have a world possible relative to it. (a) Show that the theorem can be proven for D-algebras (K-algebras with the added postulate that c1 = 1) with the definition

c'_D(a) = ⋀{cx : cx ∈ B' & a ≤ cx}.

(b) Does c'_D(a) = c'(a) in a D-algebra?

Exercise 10.10.7 Can some analog of the theorem be proved for B and S5? It would seem that the definition of c' must be varied. (Strangely, there seems to be no discussion of this problem in the literature.)
Theorem 10.10.8 The modal logics K, T, and S4 all have the finite model property.

Proof We suppose that the logic in question is K. The proofs for T and S4 follow by obvious modifications. Suppose then that ⊬_K φ. Then form the Lindenbaum algebra of K, which is a K-algebra. Let ψ1, ..., ψn be all of the subsentences of φ. Consider the Boolean algebra B' generated by [ψ1], ..., [ψn]. (For D we also need to add [◊(ψ1 ∨ ¬ψ1)] to the generators.) We will show that B' is finite. Using De Morgan's laws and double negation, every element can be put into "meet-normal" form (x_{1,1} ∨ ... ∨ x_{1,m1}) ∧ ... ∧ (x_{n,1} ∨ ... ∨ x_{n,mn}), where the x_{ij} are the generators or their complements. Drive complement signs inside meets or joins (switching meets with joins), removing double complements as they arise, so complement signs flank generators. Then use distribution to drive meets in past joins. Because of associativity, commutativity, and idempotence this form can be defined to be unique (up to an ordering of the generators). So clearly B' is finite, since it is easy to see that there are at most 2^{2^{2n}} such forms. Indeed, a little fiddling so as to not allow joins in which both a generator and its complement occur reduces this bound to 2^{2^n}.²

There is not yet a closure operation defined on B'. We cannot simply use c[ψ] = [◊ψ], since the result might not be defined when ◊ψ is not a subsentence of φ (or even a conjunction of disjunctions of negations of such). And we cannot simply expand B', closing it under c, since the result need not be finite (as it was when we did the corresponding construction with meet, join, and complement). But we do know by Theorem 10.10.1 that a new closure operation c' can be defined on B' that agrees with the original closure operation c when a and ca are both in B'. This turns out to be good enough, and (B', ∧, ∨, −, c') will be our desired finite model.

Let us consider the interpretation ι(p) = [p], for each atomic sentence p which is a subsentence of φ, and otherwise ι(p) is arbitrarily defined. This is very much like the canonical valuation in the Lindenbaum algebra except that when one comes to compute v_ι(◊ψ) = c'[ψ], the result cannot be guaranteed to be [◊ψ]. But it can be when ◊ψ is a subsentence of φ, and this is good enough. Thus one can prove the following by an easy induction on sentences:

Lemma 10.10.9 Let v_ι be as defined above. Then if ψ is a subsentence of φ, v_ι(ψ) = [ψ].

The finite model property now follows directly. Since we have been assuming that ⊬_K φ, then [φ] ≠ 1, and so v_ι(φ) = [φ] is undesignated. The corresponding results for T and S4 follow using Corollaries 10.10.2 and 10.10.3 in place of Theorem 10.10.1. □

Corollary 10.10.10 The set of theorems for each of the modal logics K, T, and S4 is decidable.

Proof From Theorem 10.10.8 using Harrop's theorem (Theorem 6.16.5). □

² Alternatively, the reader can use the fact that Boolean terms can be identified when they define the same operations in 2, and note that there are only 2^{2^n} such n-place functions.
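The finiteness claim can be illustrated by brute force: closing n generators under meet, join, and complement inside a power set never yields more than 2^(2^n) elements. A sketch (the universe and generators below are arbitrary choices; with these three "bit-pattern" generators the bound is attained exactly):

```python
from itertools import combinations

U = frozenset(range(8))

def generated_subalgebra(gens):
    """Close a set of generators under meet, join, and complement."""
    algebra = set(gens) | {frozenset(), U}
    changed = True
    while changed:
        changed = False
        for a in list(algebra):
            for b in list(algebra):
                for x in (a & b, a | b, U - a):
                    if x not in algebra:
                        algebra.add(x)
                        changed = True
    return algebra

# Each point of U has a distinct membership pattern in these generators,
# so the generated subalgebra is the full power set: 2^(2^3) = 256 elements.
gens = [frozenset({0, 1, 2, 3}), frozenset({0, 1, 4, 5}), frozenset({0, 2, 4, 6})]
n = len(gens)
B = generated_subalgebra(gens)
assert len(B) == 2 ** (2 ** n)
```

This is the algebraic counterpart of the meet-normal-form count: the atoms of the generated subalgebra are the non-empty meets of generators and their complements, of which there are at most 2^n.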
10.11
S5 and Pretabularity
The reader may have noticed that in the previous section we did not bother to show that S5 is decidable. This is because S5 has the finite model property in a very strong sense which we shall call "uniform." It is not just that each non-theorem φ can be refuted in a finite model, but the size of the model can be calculated in a very simple manner from the size of φ, indeed from the number of distinct atomic sentences occurring in φ.

Theorem 10.11.1 (Uniform finite model property for S5) If there are exactly n atomic sentences in a sentence φ, then φ is a theorem of S5 iff φ is valid in the Henle matrix H_n.

Corollary 10.11.2 (Soundness and completeness for S5) A sentence φ is a theorem of S5 iff for all positive integers n, φ is valid in H_n.

We shall eventually get around to proving the theorem. The corollary clearly follows from the fact (Exercise 10.8.1) that a Henle matrix is an S5-matrix (provided Theorem 10.11.1 holds). We will prove the even more surprising theorem (Theorem 10.11.7 below) due to Scroggs (1951) that all modal logics extending S5 have a finite characteristic matrix, namely, one of the finite Henle matrices. This, together with the fact that S5 itself has no finite characteristic matrix (Theorem 10.11.9 below), establishes that S5 is "pretabular" in the sense that while it does not itself have a finite characteristic matrix, every modal extension of it does.

Lemma 10.11.3 Let X be a modal extension of S5. If φ is not a theorem of X then φ is not valid in some Henle matrix H.
Proof We leave to the reader the verification that Henle matrices satisfy the axioms and rules of S5. Consider the Lindenbaum algebra LA(X) formed using the equivalence relation of provable material equivalences (ψ ≡ χ iff (ψ ⊃ χ) ∧ (χ ⊃ ψ) is a theorem of X), or alternatively, of provable strict equivalences. This is an S5-algebra in which [φ] ≠ 1, since φ is not a theorem of X. We can now apply Lemma 10.8.3 to say that there is a Henle matrix H and a homomorphism h from LA(X) onto H so that h([φ]) ≠ h(1) = 1. Under the interpretation ι(p) = h([p]) for atomic p, then, φ takes the undesignated value h([φ]) ≠ 1. □

Notice that there is no reason to think that H is finite, so we work towards replacing the Henle matrix H with some finite Henle matrix H_n. The following will be of use in that task:

Proposition 10.11.4 For natural numbers i and j, i ≤ j iff H_i is a subalgebra of H_j.

Proof The "if" part is obvious on size considerations. For the "only if" part we show that H_n is a subalgebra of H_{n+1}, and then the proposition follows by an obvious induction. We know from the similar Theorem 8.11.14 that the Boolean part of H_n is isomorphic to a subalgebra of the Boolean part of H_{n+1} under the mapping h(a1, ..., an) = (a1, ..., an, an). It remains to show that

h(c(a1, ..., an)) = c(h(a1, ..., an)).

(Note that we cannot simply apply the general law of universal algebra stated in Exercise 8.11.15 because (unlike the Boolean operations) c "thinks globally" and is not computed componentwise. Rather c(a1, ..., an) = (1, ..., 1) if some component ai = 1, and c(a1, ..., an) = (0, ..., 0) if every component ai = 0. The key trick is that some component of (a1, ..., an) is 1 iff some component of its "expansion" (a1, ..., an, an) is 1, and similarly every component of (a1, ..., an) is 0 iff every component of its expansion (a1, ..., an, an) is 0. Note that either h(c(a1, ..., an)) is all 1s, or else it is all 0s. But h(c(a1, ..., an)) is all 1s iff c(a1, ..., an) is all 1s iff (a1, ..., an) contains a 1 iff (a1, ..., an, an) contains a 1 iff c((a1, ..., an, an)) is all 1s. And h(c(a1, ..., an)) is all 0s iff c(a1, ..., an) is all 0s iff (a1, ..., an) is all 0s iff (a1, ..., an, an) is all 0s iff c((a1, ..., an, an)) is all 0s. In either case, h(c(a1, ..., an)) = c(h(a1, ..., an)).) □

Theorem 10.11.5 Let X be a modal extension of S5. If φ is not a theorem of X then φ is not valid in some finite Henle matrix H_n (where n is the number of atomic sentences with occurrences in φ).

Proof We know from the previous lemma that any non-theorem φ of X is rejectable in some Henle matrix H. We observe that the Henle submatrix H' of H generated by the elements assigned to atomic sentences in φ is finite and no larger than 2^n. This is because of normal forms for Boolean algebras, together with the fact that in a Henle matrix the modal operators do not lead to new elements. Because of Proposition 10.8.2 we can take H' to be of the form 2^k, with k ≤ n. Because of Proposition 10.11.4 this means that 2^k is a subalgebra of 2^n. Since validity is preserved under subalgebras, this means that φ is rejected in 2^n. □

Proposition 10.11.6 Setting X to be S5 in the above theorem gives the "completeness" half of Theorem 10.11.1, the discussion of which began the present section. The "soundness" half (that each H_n validates all of the theorems of S5) is given by Exercise 10.8.1.

Theorem 10.11.7 (Scroggs 1951) Let X be a modal extension of S5. Some finite Henle matrix H_n is characteristic for X in the sense that φ is a theorem of X iff φ is valid in H_n.

Proof Let I be the set of indices such that H_i validates all of the theorems of X. Either I is infinite or finite. If infinite, then because of Proposition 10.11.4 we can take I to be all of ℤ+, and so (by Theorem 10.11.5) X = S5. If finite, then (assuming non-emptiness) there is a largest index k. Because of Proposition 10.11.4 and Theorem 10.11.5, it is easy to see that H_k is the desired characteristic matrix. □

Remark 10.11.8 Note that in the above proof, if I = ∅ then X is the inconsistent extension in which all sentences are theorems.

Theorem 10.11.9 S5 has no finite characteristic matrix.

Before we present the proof proper we discuss some background. Using ≡ for material equivalence, the following is a well-known tautology of classical logic:

(φ ≡ ψ) ∨ (φ ≡ χ) ∨ (ψ ≡ χ).

The reason is that with only two truth values one ends up having to always assign the same truth value to two of the three sentences φ, ψ, χ. This can be generalized to Henle matrices. The finitizing equality ε_n says of n + 1 variables that some two of them stand for the same proposition. By the corresponding finitizing sentence Σ_n we mean a sentence in the modal language of the following form, with φ >< ψ (strict equivalence) defined as necessary material equivalence (□[(¬φ ∨ ψ) ∧ (¬ψ ∨ φ)]):

(p1 >< p2) ∨ (p1 >< p3) ∨ ... ∨ (p1 >< p_{n+1}) ∨ (p2 >< p3) ∨ ... ∨ (p2 >< p_{n+1}) ∨ ... ∨ (pn >< p_{n+1}).

Proof Suppose that some finite matrix M is characteristic for S5 and that M has n elements. Just on size considerations it will thus validate the finitizing sentence Σ_n, contradicting that M is characteristic for S5 (Σ_n is not a theorem of S5; cf. Exercise 10.11.10). □
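Two of the combinatorial facts above are easy to machine-check: the "key trick" that the expansion map h commutes with the global operation c of a Henle matrix (Proposition 10.11.4), and the pigeonhole argument behind Theorem 10.11.9. A sketch (modeling H_n as bit tuples is our own encoding, not the text's):

```python
from itertools import product

# Henle matrix modeled on bit n-tuples: c is "global" -- all 1s if any
# component is 1, all 0s otherwise, so it is not computed componentwise.
def c(v):
    return tuple(1 for _ in v) if any(v) else tuple(0 for _ in v)

def h(v):
    """The embedding of Proposition 10.11.4: repeat the last component."""
    return v + (v[-1],)

n = 4
for v in product((0, 1), repeat=n):
    assert h(c(v)) == c(h(v))   # h commutes with c

# Pigeonhole behind Theorem 10.11.9: any assignment of m + 1 variables into
# an m-element matrix repeats a value, so some disjunct of the finitizing
# sentence Sigma_m is designated.
m = 3
for assignment in product(range(m), repeat=m + 1):
    assert any(assignment[i] == assignment[j]
               for i in range(m) for j in range(i + 1, m + 1))
```

The first loop verifies the case analysis in the text exactly: the expansion duplicates the last coordinate, so it contains a 1 (respectively, is all 0s) just when the original tuple does.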
Exercise 10.11.10 Show that any S5 matrix with more than n elements refutes Σ_n. (Use the facts that (1) (χ >< χ) is designated, and (2) a disjunction is designated if any disjunct is designated.) We have the following corollary, which shows that modal extensions of S5 have particularly simple axiomatizations.

Corollary 10.11.11 Every proper modal extension of S5 can be axiomatized by adding one of the finitizing sentences Σ_n to the axioms of S5.

Proof Let X be a proper modal extension, by which we mean that S5 ⊂ X. We know that X has a characteristic Henle matrix H_n. It is easy to see that adding the finitizing sentence Σ_{2^n} to the axioms of S5 (call the resulting system S5_n) forces H_n to be also the characteristic matrix of S5_n. Hence X = S5_n. □

It turns out that "S5 can be approximated from above" by its finitary counterparts S5_n (defined as in the proof just above):
Corollary 10.11.12 The extensions of S5 line up as a chain as follows: S5_0 ⊃ S5_1 ⊃ ... ⊃ S5_n ⊃ S5_{n+1} ⊃ ... ⊃ S5.

Proof The weak version of these inclusions (replace ⊃ with ⊇) follows from Proposition 10.11.4 and the fact that submatrices preserve validity. All that remains is to show that the inclusions are proper by showing distinctness of the displayed systems. That no S5_n = S5 amounts to Theorem 10.11.9. The finitizing sentence Σ_{2^n} is a sentence valid in S5_n but not valid in S5_{n+1}. □
Corollary 10.11.13 S5 = ⋂_{n∈ω} S5_n.
Classical logic can be viewed as the limiting case of modal logic. Viewed in truth-functional terms, □ is simply the identity function, and can be read in English as "it is true that." If one considers the two-element Boolean algebra as a Henle matrix, one obtains the same result. We shall say that a modal logic "collapses to classical logic" when □φ >< φ is a theorem. A logic is said to be Post complete if every proper normal extension of it is Post inconsistent in the sense that every sentence is a theorem. Sometimes writers call these notions absolute completeness and absolute inconsistency, and we sometimes stray into this way of talking. Note that in logics where any contradiction implies every sentence, Post consistency (absolute consistency) and ordinary consistency (not both φ and ¬φ are theorems) coincide. This last is sometimes called "negation consistency" for emphasis.

Corollary 10.11.14 The only consistent and Post complete modal extension of S5 collapses to classical logic.

Proof The proof can be more or less read off of the approximation of S5 given by Corollary 10.11.12. First note that S5_1 is consistent, for there is a sentence provable in S5_0 which is not provable in S5_1. Further, S5_1 is Post complete, since its only extension is S5_0, which can be easily seen to be the absolutely inconsistent modal logic. And
for n > 1, S5_n is not Post complete, since it can always be properly extended to the consistent system S5_{n−1}. □

Remark 10.11.15 The above results can be given a "purely algebraic" formulation. For example, the fundamental Theorem 10.11.7 can be restated to say that if any equations are added to those axiomatizing S5-algebras, then the resulting set of equations axiomatizes some finite Henle matrix H_n.

Remark 10.11.16 An amazing fact is that S5 is one of exactly five pretabular modal extensions of S4, as was shown independently by Maksimova (1975), and Esakia and Meskhi (1977).
11 INTUITIONISTIC LOGIC AND HEYTING ALGEBRAS

11.1 Intuitionistic Logic

We here present a Hilbert-style formalism for the sentential calculus H due to Heyting (1930). We assume an infinite set of atomic sentences, binary connectives →, ∧, ∨ for implication, conjunction, and disjunction respectively, and a unary connective ¬ for negation. The axioms consist of all sentences of the following forms:

(H0) φ → φ;
(H1) φ → (ψ → φ);
(H2) (φ → (ψ → χ)) → ((φ → ψ) → (φ → χ));
(H3) (φ ∧ ψ) → φ;
(H4) (φ ∧ ψ) → ψ;
(H5) (φ → ψ) → ((φ → χ) → (φ → (ψ ∧ χ)));
(H6) φ → (φ ∨ ψ);
(H7) ψ → (φ ∨ ψ);
(H8) (φ → ψ) → ((χ → ψ) → ((φ ∨ χ) → ψ));
(H9) (φ → ¬χ) → (χ → ¬φ);
(H10) φ → (¬φ → ψ).

As sole rule we take modus ponens:

(MP) If φ and φ → ψ, then ψ.

We now make a few remarks of an axiom-chopping sort. Axiom (H0) is redundant. (Show this as an exercise if you have never done so before.) Axioms (H1) and (H2) completely characterize the "pure implicational fragment" (those theorems whose only connective is →). Axioms (H1) through (H5) completely characterize the "implication-conjunction fragment" (those theorems whose connectives are only → and ∧). And axioms (H1) through (H8) characterize so-called "positive logic" (those theorems that are negation-free). We do not prove these "separation results," but we mention them for the sake of calling attention to the importance of the various fragments.

It turns out that if one has a primitive constant false proposition f, one can define ¬φ = φ → f (in the style of Johansson 1936), thus dispensing with the need for a primitive negation connective. One must, though, add the axiom scheme

(H11) f → φ

to get the effect of (H10). But (H9) follows even without this addition, and so (H1) through (H9) amount to the axioms of what Johansson called "minimal logic" (positive logic supplemented with Johansson's definition of ¬ but no special axioms about f).

It is worth remarking that (H5) is given in an exported form, but it can be given in an imported form

(H5′) ((φ → ψ) ∧ (φ → χ)) → (φ → (ψ ∧ χ))

at the cost of adding either the axiom

(H12) φ → (ψ → (φ ∧ ψ))

or the adjunction rule:

(ADJ) If φ and ψ, then φ ∧ ψ.

One can also replace (H8) with its imported form

(H8′) ((φ → ψ) ∧ (χ → ψ)) → ((φ ∨ χ) → ψ).

The point of this is to make the axioms more obviously give ∧ and ∨ the properties of lattice meet and join. Incidentally, from (H1) and (H2), one can prove the following principle of transitivity:

(H13) (φ → ψ) → ((ψ → χ) → (φ → χ)).

This, with (H0), shows that ⊢_H φ → ψ establishes a pre-order, indeed a pre-lattice. We shall see that it is distributive.

11.2 Implicative Lattices

We shall call a structure (L, ∧, ∨, ⇒) an implicative lattice if (L, ∧, ∨) is a lattice and for all a, b, x ∈ L,

(*) x ∧ a ≤ b iff x ≤ a ⇒ b.
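In a finite distributive lattice, the residuation condition (*) determines ⇒ uniquely: a ⇒ b is the join of all x with x ∧ a ≤ b. A minimal sketch of this (our own illustration, not from the text) uses the divisors of 12 under gcd/lcm, which form a distributive lattice ordered by divisibility:

```python
from math import gcd
from functools import reduce

# Divisors of 12 ordered by divisibility form a distributive lattice:
# meet = gcd, join = lcm, least element 1, greatest element 12.
L = [1, 2, 3, 4, 6, 12]
meet = gcd
def join(a, b):
    return a * b // gcd(a, b)
def leq(a, b):          # lattice order: a ≤ b iff a divides b
    return b % a == 0

def implies(a, b):
    # a ⇒ b: the join of all x with x ∧ a ≤ b
    candidates = [x for x in L if leq(meet(x, a), b)]
    return reduce(join, candidates)

# Verify the residuation law (*): x ∧ a ≤ b iff x ≤ a ⇒ b
assert all(leq(meet(x, a), b) == leq(x, implies(a, b))
           for x in L for a in L for b in L)
print(implies(4, 3))   # → 3: the largest divisor x of 12 with gcd(x, 4) | 3
```

Distributivity is what guarantees that the join of the candidates is itself a candidate, exactly as in the proof of Theorem 11.2.3 below.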
(The reader may want to verify, as an exercise, that this condition implies that ⇒ is antitonic in the first position and monotonic in the second.) The motive for the name is obvious once it is remarked that the postulate (*) from left to right amounts to the "deduction theorem," and from right to left amounts to modus ponens. Nonetheless, it should be remarked that not all "implications" discussed in the literature satisfy the deduction theorem as it is usually understood. Thus, in particular, the strict implication of C. I. Lewis (1918), the counterfactual implications of Stalnaker and Thomason (1970) and D. Lewis (1973), the Sasaki implication of quantum logic, and the relevant implication of Anderson and Belnap (1975) all reject the deduction theorem in the form

(**) Γ, φ ⊢ ψ only if Γ ⊢ φ → ψ,
where f- stands for ordinary deducibility. The quick reason for this rejection is that by setting paradox of implication
r
to be {If!}, one obtains the
(***) If!f-cjJ--lf!,
which is anathema for strict, counterfactual, and relevant implication. But the deduction theorem in its form (**) is central to the classical and intuitionistic systems, particularly to the intuitionistic system wherein all the pure implicational theorems may be deduced from (**) and its converse (modus ponens). Accordingly, (*) is central to the algebraic
treatment of the intuitionistic system. Note that (*) is just a special case of residuation (see, in particular, Sections 3.10, 3.16, and 3.17), which does allow for more general ways of combining premises than simply the ordinary conjunction in (*). The careful reader may have noticed that our definition of an implicative lattice does not explicitly postulate distributivity. This is because to do so would be redundant, as the following shows.

Theorem 11.2.1 (Skolem–Birkhoff) Every implicative lattice is distributive.
Proof By lattice properties, we have

(1) b ∧ a ≤ (a ∧ b) ∨ (a ∧ c),
(2) c ∧ a ≤ (a ∧ b) ∨ (a ∧ c).

From these we obtain by "exportation," using (*),

(3) b ≤ a ⇒ [(a ∧ b) ∨ (a ∧ c)],
(4) c ≤ a ⇒ [(a ∧ b) ∨ (a ∧ c)].

And from (3), (4) we obtain by lattice properties

(5) b ∨ c ≤ a ⇒ [(a ∧ b) ∨ (a ∧ c)].

But from (5), we obtain by "importation," using (*) (with commutation),

(6) a ∧ (b ∨ c) ≤ (a ∧ b) ∨ (a ∧ c),

which inequality suffices for distribution (cf. Chapter 2). □

Remark 11.2.2 The above theorem has been thought important by many writers (including Birkhoff) for showing that "quantum logic" can have no decent implication, since distribution fails. In light of our remarks above about the dangers in the nomenclature "implicative lattice" in the light of many "non-exporting" implications, this moral must be regarded as questionable. Indeed, many recent workers on quantum logic have questioned this once orthodox moral; see especially Hardegree (1975), who relates a quantum implication to the Stalnaker conditional.

Theorem 11.2.3 Let (L, ∧, ∨) be a complete lattice which is infinitely distributive, i.e., a ∧ ⋁X = ⋁{a ∧ x : x ∈ X}. Then there is a unique operation ⇒ on L such that (L, ∧, ∨, ⇒) is an implicative lattice (with a ∧ b = ⋀{a, b}, a ∨ b = ⋁{a, b}).

Proof Define a ⇒ b = ⋁{x : x ∧ a ≤ b}. Then (*) from left to right is all but immediate, since if x ∧ a ≤ b, then x is in fact a component of the "infinite" join which is a ⇒ b. From right to left is slightly harder. Thus, assume x ≤ a ⇒ b. Then x ∧ a ≤ (a ⇒ b) ∧ a. So it suffices to show that

a ∧ (a ⇒ b) ≤ b,

i.e., modus ponens holds. By virtue of the definition of a ⇒ b as "infinite join" and "infinite distribution" it then suffices to show that

⋁{a ∧ x : x ∧ a ≤ b} ≤ b,

but this is transparent since each component a ∧ x is postulated to be less than or equal to b. Finally, (*) trivially uniquely characterizes ⇒. □

Corollary 11.2.4 Let (L, ∧, ∨) be a finite distributive lattice. Then there is a unique implicative lattice (L, ∧, ∨, ⇒).

11.3 Heyting Algebras

A Heyting algebra (sometimes called a Heyting lattice) is an implicative lattice (L, ∧, ∨, ⇒, 0) with least element 0. The Heyting complement, also called pseudo-complement, can be defined as

(1) ¬a = a ⇒ 0,

and can be easily seen to have the properties ascribed to it in Chapter 3.

11.4 Representation of Heyting Algebras using Quasi-ordered Sets

Let (U, ⊑) be a quasi-ordered set, i.e., ⊑ is a reflexive and transitive relation on U. By a proposition p we mean a subset of U "closed upward," i.e., for α, β ∈ U, if α ∈ p and α ⊑ β, then β ∈ p. By a full Heyting algebra of propositions we mean a structure (A, ∧, ∨, ⇒, 0), where A is the set of all propositions on some quasi-ordered set (U, ⊑), ∧ and ∨ are intersection and union, p ⇒ q = {α ∈ U : for all β ∈ U such that α ⊑ β, β ∉ p or β ∈ q}, and 0 is the empty set. By a Heyting algebra of propositions we mean a subalgebra of a full one. We shall call (U, ⊑) an evidential frame. The idea comes from Kripke (1965). An item α ∈ U is thought of as an "evidential state," and α ⊑ β means that the information at α is contained in that of β. The requirement that p be closed upward corresponds to the idea that if p is established by a piece of information, it is also established by any piece of information that extends it. "Established" is intended in a very strong sense then, i.e., it means "proven." Such an assumption might well be given up for some weaker sense, such as "shown to be highly probable." So-called "non-monotonic logics" might well reject this assumption.
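The full Heyting algebra of propositions on a small frame can be computed directly. Here is a minimal sketch (our own example, not from the text), using the three-element frame with α ⊑ β and α ⊑ γ, β and γ incomparable; the propositions are the upward-closed sets:

```python
from itertools import chain, combinations

U = {'a', 'b', 'c'}
# Quasi-order (here in fact a partial order): reflexive, plus a ⊑ b and a ⊑ c.
sqleq = {(x, x) for x in U} | {('a', 'b'), ('a', 'c')}

def upclosed(p):
    return all(y in p for x in p for (x2, y) in sqleq if x2 == x)

# All propositions: the upward-closed subsets of U.
props = [frozenset(s) for s in chain.from_iterable(
             combinations(sorted(U), n) for n in range(len(U) + 1))
         if upclosed(frozenset(s))]

def implies(p, q):
    # p ⇒ q = {α : for all β with α ⊑ β, β ∉ p or β ∈ q}
    return frozenset(a for a in U
                     if all(b not in p or b in q
                            for (a2, b) in sqleq if a2 == a))

def neg(p):                      # ¬p = p ⇒ 0
    return implies(p, frozenset())

# Residuation: p ∩ r ⊆ q iff r ⊆ p ⇒ q, for all propositions
assert all(((p & r) <= q) == (r <= implies(p, q))
           for p in props for q in props for r in props)

# Non-Boolean behavior: p ∨ ¬p need not be all of U
p = frozenset({'b'})
print(sorted(p | neg(p)))        # → ['b', 'c']: excluded middle fails at 'a'
```

The failure of p ∨ ¬p to exhaust U at the root state α is exactly the intuitionistic rejection of excluded middle, read in the evidential terms above.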
Theorem 11.4.1 A full Heyting algebra of propositions is a Heyting algebra.

Corollary 11.4.2 A Heyting algebra of propositions is a Heyting algebra.

Proof Despite appearances, neither the theorem nor the corollary is of the form "an unmarried male is unmarried." We chose the name "full Heyting algebra of propositions" in anticipation of the theorem, but calling it a "Heyting algebra" does not make it a Heyting algebra. Such patterns of nomenclature are ubiquitous in logic and mathematics. The corollary follows by virtue of the fact that Heyting algebras are equationally definable, and hence, by Birkhoff's varieties theorem, are closed under subalgebras. As for the theorem proper, we first need to verify that the set of propositions A is closed under all of the operations. Because of our requirement that propositions be closed upward, A need not be the power set of U, and so this is not immediate. But if α ∈ p ∩ q and α ⊑ β, then since α ∈ p and α ∈ q and since both p and q are closed upward, we have β ∈ p, q, i.e., β ∈ p ∩ q. The case for ∨ is argued similarly. As for ⇒,
suppose α ∈ p ⇒ q and α ⊑ β. To show that β ∈ p ⇒ q we must argue that for arbitrary γ such that β ⊑ γ, γ ∉ p or γ ∈ q. The trick is that since α ⊑ β and β ⊑ γ, it follows that α ⊑ γ, and it was required for α ∈ p ⇒ q that for all γ such that α ⊑ γ, γ ∉ p or γ ∈ q. Finally, note that the empty set is vacuously closed upward. In verifying postulates, the only thing that is not obvious is that ⇒ is a relative pseudo-complement. Suppose p ∩ q ⊆ r. We show p ⊆ q ⇒ r. Let α ∈ p. To show α ∈ q ⇒ r it suffices to consider arbitrary β such that α ⊑ β and show β ∉ q or β ∈ r. If β ∈ q, then, since p is closed upward and α ∈ p, β ∈ p as well, and so β ∈ p ∩ q ⊆ r, i.e., β ∈ r. Conversely, suppose p ⊆ q ⇒ r and α ∈ p ∩ q, i.e., α ∈ p and α ∈ q. Then, obviously, α ∈ q ⇒ r, and since α ⊑ α, by the definition of q ⇒ r it follows that α ∉ q or α ∈ r. But since α ∈ q, α ∈ r, and thus p ∩ q ⊆ r. □
Theorem 11.4.3 Every Heyting algebra is isomorphic to a Heyting algebra of propositions.

Proof Let A = (A, ∧, ∨, ⇒, 0) be an arbitrary Heyting algebra. Let U be the set of prime proper filters on A, and for α, β ∈ U define α ⊑ β iff α ⊆ β. We shall embed A into the full Heyting algebra of propositions on (U, ⊑), using the mapping h(a) = {α ∈ U : a ∈ α}. By Stone's representation for distributive lattices, we know h preserves ∧ and ∨ and is one-one. Also, since it is evident that the only filter containing 0 is the whole lattice, h(0) = 0. We need only argue then that h(a) is a proposition, i.e., is closed upward, and that h preserves ⇒. As to the first, if α ∈ h(a), i.e., a ∈ α, and α ⊆ β ∈ U, then a ∈ β, i.e., β ∈ h(a). As to h preserving ⇒, suppose α ∈ h(a ⇒ b), i.e., a ⇒ b ∈ α ∈ U, and suppose α ⊑ β ∈ U. If we can show β ∉ h(a) or β ∈ h(b), i.e., a ∉ β or b ∈ β, we will have shown α ∈ h(a) ⇒ h(b). Since α ⊑ β means α ⊆ β, we have a ⇒ b ∈ β. So if a ∈ β, then a ∧ (a ⇒ b) ∈ β and, since a ∧ (a ⇒ b) ≤ b, b ∈ β. Arguing the other direction, we assume contrapositively that α ∉ h(a ⇒ b), i.e., α ∈ U but a ⇒ b ∉ α. We argue that the latter means b ∉ [α, a), the filter generated by α together with a. Recall that if b ∈ [α, a), then ∃c ∈ α so c ∧ a ≤ b. But then c ≤ a ⇒ b and so a ⇒ b ∈ α. Since b ∉ [α, a), [α, a) may be extended to a prime filter β with b ∉ β. So since α ⊑ β and a ∈ β yet b ∉ β, it follows that α ∉ h(a) ⇒ h(b). □

We are next going to provide a topological representation for Heyting algebras.

11.5 Topological Representation of Heyting Algebras

Given a topological space (X, C), we can construct a Heyting algebra which we shall call the full Heyting algebra of open sets of the space. Let it be (O, ∩, ∪, ⇒, ∅), where O is the set of all open sets of X and where for A, B ∈ O, A ⇒ B = I((X − A) ∪ B). That O is closed under ∩ and ∪ follows from the well-known topological fact asked for in Exercise 10.6.3. That O is closed under ⇒ follows from the clause (I3) of Section 10.6 together with our definition of an open set as one that equals its own interior. That ∅ is open follows also from that definition, but using (I2) (from the same section). That (O, ∩, ∪) is a distributive lattice with least element ∅ is obvious. We do need, however, to verify that ⇒ satisfies the residual law:

A ∩ Y ⊆ B iff Y ⊆ A ⇒ B.

That A ∩ Y ⊆ B iff Y ⊆ (X − A) ∪ B may be easily verified. But Y ⊆ (X − A) ∪ B implies I(Y) ⊆ I((X − A) ∪ B). This follows from the more general fact that Y ⊆ Z implies I(Y) ⊆ I(Z), which fact comes rather trivially from (I4). But, since Y is open, Y = I(Y), so Y ⊆ A ⇒ B. Arguing the converse direction, suppose Y ⊆ A ⇒ B, i.e., Y ⊆ I((X − A) ∪ B). Then by (I2), Y ⊆ (X − A) ∪ B, and so by the "iff" that starts off this paragraph, A ∩ Y ⊆ B. This gives the following.

Theorem 11.5.1 A full Heyting algebra of open sets is a Heyting algebra.

By a Heyting algebra of open sets we shall mean a subalgebra of a full Heyting algebra of open sets. We then have the following corollary, which follows in the same way the corollary to Theorem 11.4.1 followed.

Corollary 11.5.2 A Heyting algebra of open sets is a Heyting algebra.

Theorem 11.5.3 Every Heyting algebra is isomorphic to a Heyting algebra of open sets.

Proof Instead of giving a direct proof of this theorem, we shall obtain it as a special case of the representation in terms of "propositions" given in Theorem 11.4.3, exploiting the connection we have discovered between quasi-ordered sets and quasi-metrics in the previous chapter. The idea is to take some arbitrary quasi-ordered set (K, ⊑) and then to consider the topological space (K, C) determined by the quasi-metric d which is the characteristic function of ⋢ on K × K. We shall show that the full Heyting algebra of propositions on (K, ⊑) is the same as the full Heyting algebra of open sets of (K, C).

Before proceeding we observe that for Y ⊆ K, I(Y) = K − C(K − Y). So for x ∈ K, x ∈ I(Y) iff x ∉ C(K − Y) iff ∃r ∈ ℝ⁺, ∀y ∈ K − Y, d(x, y) ≥ r. But since d is two-valued, this is true iff ∀y ∈ K − Y, d(x, y) = 1, i.e. (since d is the characteristic function for ⋢), iff ∀y ∈ K(y ∉ Y ⇒ x ⋢ y), i.e. (by contraposition), iff ∀y ∈ K(x ⊑ y ⇒ y ∈ Y). The latter gives us a workable characterization of the members x of I(Y).

We first argue that the "propositions," i.e., the subsets of K that are closed upward, are precisely the open sets. Recall that a subset Y of K is a proposition iff

(1) ∀x, y ∈ K(x ∈ Y & x ⊑ y ⇒ y ∈ Y),

and a subset Y of K is open iff

(2) Y ⊆ I(Y).

But, by our characterization of I at the end of the last paragraph, (2) is equivalent to

(3) ∀x(x ∈ Y ⇒ ∀y ∈ K(x ⊑ y ⇒ y ∈ Y)).

But (1) and (3) are essentially just stylistic variants of one another and are trivially equivalent. Since ∧, ∨, and 0 are ∩, ∪, and ∅ in both Heyting algebras of propositions and Heyting algebras of open sets, it remains only to argue that p ⇒ q = I((K − p) ∪ q), where ⇒ is the implication operation defined on propositions. By definition, x ∈ p ⇒ q iff ∀y ∈ K, x ⊑ y implies y ∉ p or y ∈ q. But by our characterization of I, x ∈ I((K − p) ∪ q) iff ∀y ∈ K, x ⊑ y implies y ∈ (K − p) ∪ q, i.e., y ∉ p or y ∈ q. So x ∈ p ⇒ q iff x ∈ I((K − p) ∪ q). □
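The coincidence exploited in the proof above — upward-closed sets are exactly the opens of the induced space — can be checked concretely. A minimal sketch (our own example, with the interior computed straight from the order, as in the characterization of I(Y) above):

```python
from itertools import chain, combinations

K = ['a', 'b', 'c']
# Quasi-order: reflexive closure of the chain a ⊑ b ⊑ c.
sqleq = {(x, x) for x in K} | {('a', 'b'), ('b', 'c'), ('a', 'c')}

def interior(Y):
    # x ∈ I(Y) iff every y with x ⊑ y lies in Y (the characterization above)
    return frozenset(x for x in K
                     if all(y in Y for (x2, y) in sqleq if x2 == x))

subsets = [frozenset(s) for s in chain.from_iterable(
               combinations(K, n) for n in range(len(K) + 1))]

upsets = [Y for Y in subsets
          if all(y in Y for x in Y for (x2, y) in sqleq if x2 == x)]
opens = [Y for Y in subsets if Y <= interior(Y)]

# Propositions (upward-closed sets) are exactly the open sets:
assert set(upsets) == set(opens)

def implies(p, q):
    # p ⇒ q computed topologically: I((K − p) ∪ q)
    return interior((frozenset(K) - p) | q)

print(sorted(implies(frozenset({'b', 'c'}), frozenset({'c'}))))  # → ['c']
```

This is just the finite (Alexandrov) case of the space constructed in the proof, where every intersection of opens is again open.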
11.6 Embedding Heyting Algebras into Closure Algebras

We have already established that a Heyting algebra of open sets is indeed a Heyting algebra in Theorem 11.5.1 and its Corollary 11.5.2. This result can be cast a little more generally. Given a closure algebra B, define an element a of B to be open iff ia = a (where i is defined in terms of the given closure operator by ia = −c−a). For a given closure algebra, we define a Heyting algebra of open elements as an abstract version of the notion of a Heyting algebra of open sets. Thus, where (B, ∧, ∨, −, c) is a closure algebra, the Heyting algebra of open elements of B is a structure (A, ∧, ∨, ⇒, 0) where A is the set of open elements of B, ∧ and ∨ are the corresponding operations of the closure algebra restricted to A, a ⇒ b = i(−a ∨ b), and 0 is the least element of B. We leave it as an exercise for the reader to verify that A is closed under the operations and that ⇒ is indeed a relative pseudo-complement (the proof being precisely analogous to the corresponding proof for Heyting algebras of open sets).

Theorem 11.6.1 The Heyting algebra of open elements of a closure algebra is a Heyting algebra.

We can also recast our Theorem 11.5.3 that represented Heyting algebras as Heyting algebras of open sets more abstractly as follows.

Theorem 11.6.2 Every Heyting algebra is isomorphic to a Heyting algebra of open elements in some closure algebra.

Notice that no proof is needed since (unlike the situation with Theorem 11.6.1) Theorem 11.6.2 is actually a weakening of the original, more concrete theorem.
11.7 Translation of H into S4 McKinsey and Tarski (1948) demonstrated that in a certain sense the intuitionist sentential calculus H may be translated into the modal logic S4. We define the translation * inductively:
(1) p* = □p;
(2) (¬φ)* = □−φ*;
(3) (φ ∧ ψ)* = φ* ∧ ψ*;
(4) (φ ∨ ψ)* = φ* ∨ ψ*;
(5) (φ → ψ)* = □(φ* ⊃ ψ*).

Theorem 11.7.1 For each sentence φ of H, ⊢_H φ iff ⊢_S4 φ*.

Before we prove the theorem we state the following, which may be established by a straightforward induction on formulas.

Lemma 11.7.2 Let (B, ∧, ∨, −, c) be a closure algebra and let (A, ∧, ∨, ⇒, 0) be the Heyting algebra of open elements of B. Let ι be an interpretation of S4-formulas in B and let ι′ be an interpretation of H-formulas in A which is such that for all atomic sentences p, ι′(p) = ι(p). Then for all H-formulas φ, ι′(φ) = ι(φ*).

Proof Basically the reader should be able to gestalt the lemma by reason of the fact that the definitions of the operations in a Heyting algebra of open elements and the "definitions" of the connectives of H given by the translation * parallel one another, but technically we need an induction on the length of the formula φ. For the base case, where φ is an elementary sentence p, we note that ι′(p) is an open element of B; hence ι′(p) = iι′(p) = iι(p) = ι(p*). We leave the trivial cases when φ is a conjunction or disjunction to the reader and jump to the case when φ is ψ → χ. ι′(ψ → χ) = ι′(ψ) ⇒ ι′(χ), and, by definition of ⇒, this equals i(ι′(ψ) ⊃ ι′(χ)). Then, by inductive hypothesis, this is i(ι(ψ*) ⊃ ι(χ*)), and further ι(□(ψ* ⊃ χ*)) = ι((ψ → χ)*). The case of negation is handled similarly, but is complicated slightly by the fact that we did not take the pseudo-complement operation ¬ as primitive in Heyting algebras but instead defined ¬a = a ⇒ 0. We argue that ι′(¬φ) = ¬ι′(φ) = ι′(φ) ⇒ 0 = i(ι′(φ) ⊃ 0) = i(−ι(φ*)) = iι(−φ*) = ι(□−φ*) = ι((¬φ)*). □

Now we turn to the proof of the preceding theorem.

Proof From left to right is rather trivial. It may be proven relatively mechanically by induction on the length of proof of φ in H, producing for each axiom φ of H a proof of φ* in S4, and observing that if φ*, □(φ* ⊃ χ*) are theorems of S4, then so is χ*. Alternatively, we can take our Theorem 11.6.1 as establishing the faithfulness of the translation from left to right, for it really just amounts to a statement of that faithfulness in algebraic language. Thus, if ⊢_H φ, then by the generalized soundness theorem for H we have that φ is valid in all Heyting algebras. Hence, by Theorem 11.6.1, φ is valid particularly in the class of all Heyting algebras of open elements in closure algebras. Hence, by Lemma 11.7.2, φ* is valid in all closure algebras and thus, by the generalized completeness theorem for S4, we have ⊢_S4 φ*.

From right to left we proceed contrapositively, showing that if not ⊢_H φ then not ⊢_S4 φ*. Supposing not ⊢_H φ, we have by the general completeness theorem for H that there is a Heyting algebra (A, ∧, ∨, ⇒, 0) and an interpretation ι of H sentences in A such that ι(φ) ≠ 1. By our Theorem 11.6.2 we know that A is isomorphic to a Heyting algebra of open elements in some closure algebra. Let the closure algebra be (B, ∧, ∨, −, c) and let the Heyting algebra of open elements be (B′, ∧, ∨, ⇒, 0). It is clear that φ can be rejected in an isomorphic image of A just as well as in A, so there is an interpretation ι′ in B′ so that ι′(φ) ≠ 1. We now define an interpretation ι″ of S4 sentences in B so that ι″(φ*) ≠ 1. Set ι″(p) = ι′(p) for each elementary sentence p. We know by the previous lemma that ι′(φ) = ι″(φ*). Thus since ι′(φ) ≠ 1, ι″(φ*) ≠ 1. Hence, by the general soundness theorem for S4, not ⊢_S4 φ*. □
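The clauses (1)–(5) of the translation * are a straightforward structural recursion, which can be sketched as follows (the formula representation as nested tuples is our own device, not the text's):

```python
# Gödel–McKinsey–Tarski translation of H-formulas into S4.
# Formulas are nested tuples: ('atom', p), ('not', A), ('and', A, B),
# ('or', A, B), ('imp', A, B); on the S4 side we also use ('box', A).

def translate(f):
    tag = f[0]
    if tag == 'atom':                       # p* = □p
        return ('box', f)
    if tag == 'not':                        # (¬φ)* = □−φ*
        return ('box', ('not', translate(f[1])))
    if tag in ('and', 'or'):                # (φ ∧ ψ)* = φ* ∧ ψ*, same for ∨
        return (tag, translate(f[1]), translate(f[2]))
    if tag == 'imp':                        # (φ → ψ)* = □(φ* ⊃ ψ*)
        return ('box', ('imp', translate(f[1]), translate(f[2])))
    raise ValueError(tag)

p = ('atom', 'p')
# ¬p translates to □¬□p:
print(translate(('not', p)))   # → ('box', ('not', ('box', ('atom', 'p'))))
```

Note that boxes are prefixed exactly at atoms, negations, and implications — the clauses whose S4 values must be open elements in the algebraic reading of Lemma 11.7.2.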
11.8 Alternation Property for H

Theorem 11.8.1 If ⊢_H φ ∨ ψ, then ⊢_H φ or ⊢_H ψ.
Proof We could give a direct proof analogous to the proof given of the alternation property for S4, but instead we shall actually use the alternation property for S4 by way of the Gödel–McKinsey–Tarski translation of H into S4. We shall argue as follows. Suppose ⊢_H φ ∨ ψ. Then, by the translation, ⊢_S4 φ* ∨ ψ*. But then, by the lemma that we will prove below, ⊢_S4 □φ* ∨ □ψ*. Then, by the alternation property for S4, ⊢_S4 □φ* or ⊢_S4 □ψ*. So by the translation, ⊢_H φ or ⊢_H ψ, as desired. □

So we only need the following lemma.

Lemma 11.8.2 Let φ* be the Gödel–McKinsey–Tarski translation of a sentence φ of H. Then ⊢_S4 φ* ↔ □φ*.

Proof We induct on the complexity of φ.

(1) Base case. φ = p, where p is a sentential variable. Then p* = □p, and obviously ⊢_S4 □p ↔ □□p by the Axiom of Necessity and the characteristic S4 axiom.

(2) φ = ¬ψ. Then φ* = □−ψ*, and the reasoning proceeds as in the base case.

(3) φ = ψ ∧ χ. Then (ψ ∧ χ)* = ψ* ∧ χ*. By inductive hypothesis, ⊢_S4 ψ* ↔ □ψ* and ⊢_S4 χ* ↔ □χ*. So by the replacement theorem for S4, ⊢_S4 (ψ* ∧ χ*) ↔ (□ψ* ∧ □χ*). But since it is a well-known fact that ⊢_S4 □(α ∧ β) ↔ (□α ∧ □β) (distribution of necessity over conjunction), we have ⊢_S4 (ψ* ∧ χ*) ↔ □(ψ* ∧ χ*), as desired.

(4) φ = ψ ∨ χ. The proof proceeds as in case (3), but, of course, cannot appeal to simple distribution of necessity over disjunction, since that is a well-known modal fallacy. Instead the trick is to proceed from the step ⊢_S4 ψ* ∨ χ* ↔ □ψ* ∨ □χ* by the S4 axiom (with Axiom of Necessity) to ⊢_S4 ψ* ∨ χ* ↔ □□ψ* ∨ □□χ*. Then use ⊢_S4 □□α ∨ □□β ↔ □(□α ∨ □β) to obtain ⊢_S4 ψ* ∨ χ* ↔ □(□ψ* ∨ □χ*). Inductive hypothesis then strips the inner necessity signs off, giving ⊢_S4 ψ* ∨ χ* ↔ □(ψ* ∨ χ*), as desired.

(5) φ = ψ → χ. Then φ* = □(ψ* ⊃ χ*), and the proof proceeds as in the base case. □
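The modal facts used in cases (3) and (4) — □ distributes over ∧ but not over ∨ — can be spot-checked semantically in a small reflexive–transitive (S4) Kripke frame; the two-point frame below is our own example:

```python
from itertools import product

W = [0, 1]
R = {(0, 0), (0, 1), (1, 1)}          # reflexive and transitive: an S4 frame

def box(V):
    # □V = worlds all of whose R-successors lie in V
    return {w for w in W if all(v in V for (w2, v) in R if w2 == w)}

sets = [set(), {0}, {1}, {0, 1}]

# □ distributes over ∧ ...
assert all(box(A & B) == box(A) & box(B) for A, B in product(sets, repeat=2))

# ... but distribution over ∨ is a modal fallacy:
A, B = {0}, {1}
print(box(A | B), box(A) | box(B))    # → {0, 1} vs {1}
```

World 0 is the counterexample: it necessitates A ∨ B without necessitating either disjunct.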
11.9 Algebraic Decision Procedures for Intuitionistic Logic
It is possible to show that the theorems of H are decidable by using the translation of H into S4 and the fact that S4 has the finite model property. Indeed, by fussing with this it is possible to show that H itself has the finite model property, but instead we shall sketch a more direct proof. The reader is advised to compare the presentation with that of the corresponding theorems for S4 of Section 10.10, since we shall be more brief here.

Theorem 11.9.1 Let (L, ∧, ∨, ⇒, 0) be a Heyting lattice and let (L′, ∧, ∨, 0) be a complete infinitely distributive sublattice of L (with the same lower bound 0). Then there exists a binary operation ⇒′ on L′ such that (L′, ∧, ∨, ⇒′, 0) is a Heyting algebra and when a, b, a ⇒ b ∈ L′, then a ⇒′ b = a ⇒ b.

Proof We use the same symbols to denote the meet and join operations in the original lattice and in the sublattice. We define
(1) a ⇒′ b = ⋁{x ∈ L′ : x ∧ a ≤ b}.

We know from the proof of Theorem 11.2.3 (which uses infinite distributivity) that

(2) a ⇒ b = ⋁{x ∈ L : x ∧ a ≤ b},

and so clearly a ⇒′ b ≤ a ⇒ b. We show that the inequality holds as well in the other direction when a, b, a ⇒ b ∈ L′. It suffices to recall (again from the proof of Theorem 11.2.3) that (a ⇒ b) ∧ a ≤ b (modus ponens). Since a ⇒ b ∈ L′, then a ⇒ b is actually one of the components of the join that defines a ⇒′ b; pictorially, a ⇒′ b is … ∨ (a ⇒ b) ∨ …. But then clearly a ⇒ b ≤ a ⇒′ b. □
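Theorem 11.9.1 can be watched in action on a small example: take the divisor lattice of 60 (distributive, with meet gcd and join lcm) and a sublattice L′ of it, and compute ⇒′ by joining only over L′. The particular lattice and names below are our own illustration:

```python
from math import gcd
from functools import reduce

def lcm(a, b):
    return a * b // gcd(a, b)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

L = divisors(60)            # lattice: meet = gcd, join = lcm, order = divisibility

def imp(a, b, carrier):
    # relative pseudo-complement computed inside `carrier`:
    # join of all x in carrier with x ∧ a ≤ b
    return reduce(lcm, [x for x in carrier if b % gcd(x, a) == 0])

# A sublattice L' of L: closed under gcd and lcm, same lower bound 1.
Lp = [1, 2, 3, 6, 60]
assert all(gcd(a, b) in Lp and lcm(a, b) in Lp for a in Lp for b in Lp)

# When a, b, and a ⇒ b all lie in L', the two implications agree:
for a in Lp:
    for b in Lp:
        if imp(a, b, L) in Lp:
            assert imp(a, b, Lp) == imp(a, b, L)

print(imp(2, 3, L), imp(2, 3, Lp))  # → 15 3: they differ, since 15 = 2 ⇒ 3 is not in L'
```

This is exactly the point of the theorem: ⇒′ need not equal ⇒ everywhere, but it does wherever ⇒ already lands inside L′.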
Remark 11.9.2 When L′ is finite the conditions above collapse to the condition that L′ is a sublattice of L (with the same lower bound).

Theorem 11.9.3 The intuitionistic propositional calculus H has the finite model property.

Proof Suppose that ⊬_H φ. Then form the Lindenbaum algebra of H, which is a Heyting lattice L. Let ψ₁, …, ψₙ be all of the subsentences of φ. Consider the sublattice L′ generated by [ψ₁], …, [ψₙ], [f]. We will show that L′ is finite. Using distribution, every element can be put into "meet-normal" form (x_{1,1} ∨ … ∨ x_{1,m_1}) ∧ … ∧ (x_{n,1} ∨ … ∨ x_{n,m_n}), where the x_{i,j} are the generators. Because of associativity, commutativity, and idempotence this form can be defined to be unique (up to an ordering of the generators). So clearly the sublattice L′ is finite, since it is easy to see that there are at most 2^(2^(n+1)) such forms.

There is not yet an implication operation defined on L′. We cannot simply define [ψ] ⇒ [χ] = [ψ → χ], since the result might not be defined when ψ → χ is not a subsentence of φ (or even a conjunction of disjunctions of such, even throwing in f). And we cannot simply expand L′, closing it under ⇒, since the result need not be finite (as it was when we did the corresponding construction with meet and join). But we do know by Theorem 11.9.1 that a new implication operation ⇒′ can be defined on L′ that agrees with the original implication operation ⇒ when a, b, and a ⇒ b are all in L′. This turns out to be good enough, and (L′, ∧, ∨, ⇒′, [f]) will be our desired finite model.

Let us consider the interpretation ι(p) = [p] for each atomic sentence p which is a subsentence of φ; otherwise ι(p) is arbitrarily defined. This is very much like the canonical valuation in the Lindenbaum algebra except that when one comes to compute v_ι(ψ₁ → ψ₂) = [ψ₁] ⇒′ [ψ₂], the result cannot be guaranteed to be [ψ₁ → ψ₂]. But it can be when ψ₁ → ψ₂ is a subsentence of φ, and this is good enough. Thus one can prove the following by an easy induction on sentences:

Lemma 11.9.4 Given v_ι as defined above, if ψ is a subsentence of φ, then v_ι(ψ) = [ψ].

The finite model property now follows directly. Since we have been assuming that ⊬_H φ, then [φ] ≠ 1, and v_ι(φ) = [φ] is undesignated. □
11.10 LC and Pretabularity
Dummett (1959) presented the sentential calculus LC, which is obtained from the intuitionist sentential calculus H by the addition of all sentences of the form

(1) (φ → ψ) ∨ (ψ → φ).

Dummett then gave a completeness proof for LC with respect to the sequence of matrices that Gödel (1933) used in showing that H has no finite characteristic matrix. Dummett proved that although LC too has no finite characteristic matrix, still each (n + 2)-valued Gödel matrix is characteristic for those LC sentences containing but n distinct sentential variables. Ulrich (1970) proved that every extension of LC that is closed under substitution and modus ponens (we call these normal extensions) has the finite model property. In this section we report results of Dunn and Meyer (1971), giving an alternative proof of Dummett's completeness theorem by algebraic means, but more importantly strengthening Ulrich's result by showing that every normal extension of LC has a finite characteristic matrix. Similar results have been obtained for S5 by Scroggs (1951) (presented in Section 10.11) and for RM by Dunn (1970). The proofs we give are exactly parallel to those of Dunn (1970). Maksimova (1972) has set these results about LC into the context of a very pleasant general result, to wit that there are only three normal extensions of the intuitionistic sentential logic H that have the property of pretabularity (that all their normal extensions have a finite characteristic matrix).

Where X is an extension of LC (perhaps LC itself), by an X-algebra we mean a pseudo-Boolean algebra in which all of the theorems of X are valid. In pseudo-Boolean algebras generally, ¬a = a ⇒ 0. So in considering LC-algebras, we need only concern ourselves with ∧, ∨, ⇒, and 0. Certain LC-algebras are especially important.
By G∞ we mean that algebra whose elements are the negative integers and 0 together with −ω (where −ω is the least element), and whose operations are defined as follows:

(i) a ∧ b = min(a, b);
(ii) a ∨ b = max(a, b); and
(iii) a ⇒ b = 0 (the greatest element) if a ≤ b, and b if a > b.

By Gₙ we shall mean that subalgebra of G∞ whose elements are the negative integers −n to −1 inclusive, together with 0 and −ω. We take G₀ to consist of just −ω and 0. Incidentally, our definitions of these matrices are dualized from those of Gödel (1933) and other references, where conjunction is interpreted as max(a, b), etc. Generalizing, by a Gödel algebra we shall mean any algebra whose elements form a chain with least and greatest elements, and whose operations are defined in an analogous way. All Gödel algebras are LC-algebras.

Where A is a pseudo-Boolean algebra and F is a filter of A, we define the quotient algebra A/F. The elements of A/F are the equivalence classes [a], where [a] is the set of all elements b of A such that a ⇒ b, b ⇒ a ∈ F.
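The finite Gödel algebras are easy to compute with. A minimal sketch (our own encoding: we model G₂, writing −ω as a negative-infinity sentinel) that checks the LC axiom (1) and the theorem of Exercise 11.10.3:

```python
import itertools

# G₂: elements −ω < −2 < −1 < 0, with −ω modeled as float('-inf').
NEG_OMEGA = float('-inf')
G2 = [NEG_OMEGA, -2, -1, 0]
TOP = 0                                       # 0 is the greatest element

def meet(a, b): return min(a, b)
def join(a, b): return max(a, b)
def imp(a, b):  return TOP if a <= b else b   # Gödel implication (dualized form)

# LC axiom (1): (a ⇒ b) ∨ (b ⇒ a) always takes the greatest element.
assert all(join(imp(a, b), imp(b, a)) == TOP
           for a, b in itertools.product(G2, repeat=2))

# The LC theorem of Exercise 11.10.3: (a ⇒ b) ∨ ((a ⇒ b) ⇒ b) = top.
assert all(join(imp(a, b), imp(imp(a, b), b)) == TOP
           for a, b in itertools.product(G2, repeat=2))

# Excluded middle fails: ¬a = a ⇒ −ω, and a ∨ ¬a < 0 for intermediate a.
neg = lambda a: imp(a, NEG_OMEGA)
print(join(-1, neg(-1)))   # → -1, not 0
```

The same loops run unchanged over any Gₙ, which is one way to spot-check Dummett's claim that the (n + 2)-valued matrix decides sentences in n variables.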
Theorem 11.10.1 If X is an extension of LC, A is an X-algebra, and F is a filter of A, then A/F is an X-algebra and is a homomorphic image of A under the natural homomorphism h(a) = [a].

Proof A/F is a pseudo-Boolean algebra and is a homomorphic image of A. We need then only observe that every theorem of X is valid in A/F. Since A/F is a homomorphic image of A and every theorem of X is valid in A, this follows. □
Theorem 11.10.2 Let X, A, and F be as in Theorem 11.10.1, but let F be prime, i.e., a ∨ b ∈ F only if a ∈ F or b ∈ F. Then A/F is a Gödel algebra.

Proof That A/F is a chain is immediate given (1) and the primeness of F. Further, A/F must have least and greatest elements since every pseudo-Boolean algebra does. We need then only check that the operations are defined as on a Gödel algebra. This is obvious for ∧ and ∨, and the following theorem of LC, proved in Dummett (1959), ensures that ⇒ is all right:

(a ⇒ b) ∨ ((a ⇒ b) ⇒ b).

Thus since F is prime, either a ⇒ b ∈ F or (a ⇒ b) ⇒ b ∈ F. But a ⇒ b ∈ F iff [a] ≤ [b]. So if [a] ≤ [b], then a ⇒ b ∈ F. But in general for a pseudo-Boolean algebra, x ∈ F only if [x] is the greatest element in the quotient algebra. So [a] ⇒ [b] = [a ⇒ b], which is the greatest element of A/F, as it should be. On the other hand, if [a] ≰ [b], then a ⇒ b ∉ F, and then (a ⇒ b) ⇒ b ∈ F. So then [a ⇒ b] ≤ [b]. And in any pseudo-Boolean algebra, since b ⇒ (a ⇒ b) is the greatest element and hence in F, [b] ≤ [a ⇒ b]. So [a] ⇒ [b] = [a ⇒ b] = [b], as it should be. □

Exercise 11.10.3 Prove that (φ → ψ) ∨ ((φ → ψ) → ψ) is a theorem of LC.

Theorem 11.10.4 Let X and A be as in Theorem 11.10.1, and let a ∈ A be such that a ≠ 1. Then there is a homomorphism h of A onto a Gödel algebra which is an X-algebra, such that h(a) ≠ 1.

Proof Immediate from Theorems 11.10.1 and 11.10.2 once we invoke Stone's prime filter separation theorem. □
We remark that it easily follows from Theorem 11.10.4, by a familiar construction used by Stone, that every LC-algebra is isomorphic to a subdirect product of Gödel algebras. Since the only Gödel algebra which is a Boolean algebra (excluding the degenerate one-element algebra) is G₀, this result may be regarded as a generalization of Stone's embedding theorem for Boolean algebras.

Theorem 11.10.5 Consider the sequence of Gödel algebras G₀, G₁, G₂, …. If a sentence φ is valid in Gᵢ, then φ is valid in Gⱼ for all j ≤ i.

Proof This is immediate since each Gⱼ is a subalgebra of Gᵢ. □
Where X is a sentential calculus and A is a set of atomic sentences, let X/A be that sentential calculus like X except that its sentences contain no atomic sentences other than those in A. The following theorem is then obvious.
INTUITIONISTIC LOGIC
Theorem 11.10.6 If X is a normal extension of LC, then A(X/A) is an X-algebra, and in fact is characteristic for X/A, since any non-theorem may be falsified under the canonical valuation that sends every sentence φ to ‖φ‖.

The hard part of Dummett's completeness result for LC is showing that if a sentence φ is not a theorem, then there is some Gödel algebra Gₙ such that φ is not valid in Gₙ. This is contained in the following theorem, though generalized to arbitrary normal extensions of LC.

Theorem 11.10.7 Let X be a normal extension of LC. Then if a sentence φ is not a theorem of X, then there is some Gödel algebra Gₙ such that Gₙ is an X-algebra and φ is not valid in Gₙ.

Proof. It follows quickly from Theorems 11.10.6 and 11.10.4. Thus if φ is not a theorem of X, then by Theorem 11.10.6, φ is falsifiable in the X-algebra A(X/A), where A is the set of atomic sentences occurring in φ. But since ‖φ‖ ≠ ‖ψ → ψ‖, the greatest element, then by Theorem 11.10.4, there is a homomorphism h of A(X/A) onto a Gödel algebra G such that G is an X-algebra and h(‖φ‖) ≠ 1. We may then falsify φ in G by the interpretation I(φ) = h(‖φ‖). Note that G is finitely generated since it is the homomorphic image of A(X/A), which itself is finitely generated by the elements ‖p‖ such that p ∈ A. Thus G is finitely generated by the elements h(‖p‖) such that p ∈ A. It is obvious that every finitely generated Gödel algebra is finite, and it is further obvious that every finite Gödel algebra containing at least two elements is isomorphic to some Gₙ. Thus G is isomorphic to some Gₙ, which completes the theorem. □
We now turn to the proof of our principal result.

Theorem 11.10.8 Every consistent proper normal extension of LC has a finite characteristic matrix, namely, some Gödel algebra Gₙ.

Proof. The reasoning mimics that of Scroggs (1951). Let I be the set of indices of those Gödel algebras Gₙ that are X-algebras, where X is the given consistent proper normal extension of LC. By Theorem 11.10.7, since X is consistent, I is non-empty. If I contains infinitely many indices, then I contains every index because of Theorem 11.10.5. But then it follows from Dummett's completeness result that X is identical to LC, contrary to the assumption that X is a proper extension. So I contains only finitely many indices, and then by Theorem 11.10.5, there must be some index i such that I contains exactly those indices less than or equal to i. By construction, Gᵢ is an X-algebra. Now suppose that a sentence φ is not a theorem of X. Then by Theorem 11.10.7, φ is not valid in some X-algebra Gₖ, and by our choice of i, k ≤ i. But then by Theorem 11.10.5, φ is not valid in Gᵢ. So Gᵢ is the desired finite characteristic matrix. □
We remark that Theorem 11.10.8 has as a corollary that every proper normal extension of LC may be axiomatized by adding as an axiom to LC one of the sentences
LC AND PRETABULARITY
Gödel (1933) used in showing that H has no finite characteristic matrix, and that from this it easily follows that the only consistent and complete normal extension of LC is the classical sentential logic. (Compare the proof of similar corollaries at the end of Section 10.11.) It should also be remarked that Thomas (1962) contains another interesting way of axiomatizing all of the Gödel matrices Gₙ, in which each of them is axiomatized by the addition of some appropriate pure implicational sentence as an axiom to LC. We finally allude to the fact that strong completeness results for LC, along the lines of the strong completeness results for RM in Dunn (1970), are readily obtainable from Theorem 11.10.7.
RESIDUATION AND GALOIS CONNECTIONS
12 GAGGLES: GENERAL GALOIS LOGICS

12.1 Introduction
The aim of this chapter is to provide a uniform semantical approach to a variety of non-classical logics, including intuitionistic logic and modal logic, so as to recover the representation theorems of Chapters 10 and 11 as special cases. The strategy is to adopt the basic framework of the Kripke-style semantics for modal and intuitionistic logic (cf. Chapters 10 and 11), using accessibility relations to give truth conditions for the connectives. We generalize this in line with Jonsson and Tarski (1951, 1952) so that in general an n-place connective will be interpreted using an (n+1)-place accessibility relation (cf. Section 8.12). Besides the Kripke semantics for modal logic, there are motivating precedents with the Routley and Meyer (1973) semantics for relevant implication and the Goldblatt (1974) semantics for orthonegation. The problem with the Jonsson-Tarski result is that while it shows how Boolean algebras with n-place "operators" can be realized using (n+1)-place relations, the context is more restrictive than one would like. For example, the underlying structure must be a Boolean algebra, and the "operators" must distribute over Boolean disjunction in each of their places. We have already shown in Section 8.12 that Boolean algebras can be replaced with distributive lattices. But in this chapter we shall examine structures that we call "distributoids," which relax the constraints of Jonsson and Tarski. Distributoids are not the full abstraction we are seeking, because there need be no interaction between the various operators. We have noticed that many important logical principles can be seen as involving relationships between pairs of logical operators that may be seen under the algebraic abstractions of residuation and Galois connections.¹ We shall abstract these relationships into an algebraic structure called a "gaggle."
Incidentally, we owe the name "gaggle" to Paul Eisenberg (a historian of philosophy, not a logician), who supplied it at our request for a name like a "group," but which suggested a certain amount of complexity and disorder. It is a euphonious accident that "gaggle" is the pronunciation of the acronym for "general galois logics."² The general approach here is algebraic; thus we will think of a logic in terms of its "Lindenbaum algebra," formed by dividing the sentences into classes of provable equivalents, defining operators on these equivalence classes by means of the connectives applied to representatives. We shall represent the algebras in a way pioneered by Stone
(1936, 1937), and extended by Jonsson and Tarski, so that elements are mapped into sets (thought of as "propositions," or sets of states where the sentences are true). This gives completeness results for the various logics. See Chapter 1 for a discussion of the general relation between representation results and completeness theorems. In their original incarnation (Dunn 1991), gaggles were required to have underlying distributive lattices, so that meet ("and") is represented as intersection, and join ("or") as union. Canonically, then, states are prime filters. However, this condition can be weakened so that the underlying structure is just a partial order (as with the Lambek calculus). Then states can just be principal cones and the complements of principal dual cones, and one does not need Zorn's lemma. In certain cases (as with orthologic and at least the non-exponential part of linear logic), where the logic is a meet-semilattice in any case, and join can be defined from meet using a negation that is a lattice involution, the methods can be extended so both meet and join are given reasonable interpretations. At the end of this chapter we shall give some applications. We should caution that in most cases the general representation theorem leads to something other than the usual semantics known in the literature. Thus, for example, the usual semantics for intuitionistic implication (cf. Chapter 11) uses a two-place accessibility relation, whereas the gaggle approach yields a three-place accessibility relation. It then becomes necessary to examine the details of the general representation, applying specific algebraic properties of the logic in question, to see that the usual result falls out as a special case after "fiddling with the representation."
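The Stone-style representation just described can be watched in miniature. The following sketch is our own toy encoding, not the book's construction: it computes the prime filters of the four-element diamond lattice and checks that the map a ↦ {F : a ∈ F} carries meet to intersection and join to union, injectively.

```python
from itertools import combinations

# The four-element diamond lattice 0 < a, b < 1 (a, b incomparable), coded
# as bit pairs so that meet and join are componentwise AND and OR.
elems = [(0, 0), (1, 0), (0, 1), (1, 1)]
meet = lambda x, y: (x[0] & y[0], x[1] & y[1])
join = lambda x, y: (x[0] | y[0], x[1] | y[1])

def is_prime_filter(F):
    F = set(F)
    if not F or len(F) == len(elems):          # non-empty and proper
        return False
    upward = all(z in F for x in F for z in elems if meet(x, z) == x)
    meets  = all(meet(x, y) in F for x in F for y in F)
    prime  = all(x in F or y in F
                 for x in elems for y in elems if join(x, y) in F)
    return upward and meets and prime

subsets = [c for r in range(len(elems) + 1) for c in combinations(elems, r)]
primes = [frozenset(F) for F in subsets if is_prime_filter(F)]
h = lambda a: frozenset(F for F in primes if a in F)   # the Stone-style map

# Meet goes to intersection, join to union, and h is one-to-one.
for x in elems:
    for y in elems:
        assert h(meet(x, y)) == h(x) & h(y)
        assert h(join(x, y)) == h(x) | h(y)
assert len({h(x) for x in elems}) == len(elems)
```

Here the diamond has exactly two prime filters (the upsets of its two atoms), and these play the role of the "states" mentioned in the text.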
A toy example of this is given, showing how the usual Stone representation for Boolean algebras (where Boolean complement becomes set complement) can be obtained from the gaggle representation (where Boolean complement is represented using a two-place accessibility relation).

12.2 Residuation and Galois Connections
Consider two posets A = (A, ≤) and B = (B, ≤′) with functions f : A → B and g : B → A.

The pair (f, g) is called residuated iff

(rp) fa ≤′ b iff a ≤ gb.

The pair (f, g) is called a Galois connection iff

(gc) b ≤′ fa iff a ≤ gb.

A dual Galois connection is a pair (f, g) where

(dgc) fa ≤′ b iff gb ≤ a.

A dual residuated pair is a pair (f, g) where

(drp) b ≤′ fa iff gb ≤ a.
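Two elementary instances of these definitions, with toy maps of our own choosing: doubling and integer halving form a residuated pair on (ℤ, ≤), and set complement paired with itself forms a Galois connection on (P(U), ⊆).

```python
from itertools import combinations

# Residuated pair on the integers: f(x) = 2x, g(y) = y // 2 (floor division).
# (rp): f(a) <= b  iff  a <= g(b).
f = lambda x: 2 * x
g = lambda y: y // 2
for a in range(-10, 11):
    for b in range(-10, 11):
        assert (f(a) <= b) == (a <= g(b))

# Galois connection on (P(U), subset): complement paired with itself.
# (gc): B <= comp(A)  iff  A <= comp(B)  (both say that A and B are disjoint).
U = set(range(5))
comp = lambda A: U - A
subs = [set(c) for r in range(len(U) + 1) for c in combinations(U, r)]
for A in subs:
    for B in subs:
        assert (B <= comp(A)) == (A <= comp(B))
```

Note how the second example is order-reversing on both sides, exactly the feature that distinguishes (gc) from (rp).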
¹There has been an anticipation of this in Sections 3.10, 3.17, and 8.1, but where the underlying order structure was only a partial order and not a (distributive) lattice.
²We do not ourselves endorse the alternative pronunciation "giggle."
Remark 12.2.1 We have already defined a Galois connection in Section 3.13 for the special case where A = B. The definitions above differ from one another only in the
direction of an inequality here and there. Thus turning around the left inequality in the definition of a residuated pair gives a Galois connection, and turning around the right inequality gives a dual Galois connection. If both the left and right inequalities are turned around, of course a residuated pair becomes a dual residuated pair, and similarly for a Galois connection and its dual. Incidentally, observe that a dual residuated pair (f, g) is just a residuated pair (g, f), and so for the most part we shall not bother to look separately at dual residuated pairs. The moral is that as long as the two posets A and B are treated as distinct, one is of course free to turn the inequalities around, since the converse of a partial ordering is again a partial ordering. As someone in Australia can testify, one person's "up" is another person's "down." But these are abstractly all the same, and yet can be distinguished if we assume that A and B are the same, as we shall do henceforth. Our next theorem is easy to prove.
Theorem 12.2.2
(1) For a residuated pair, the following is an equivalent definition:
(a) both f and g are monotonic, and fgx ≤ x, x ≤ gfx.
Moreover if the poset is a lattice, then
(b) f(x ∨ y) = f(x) ∨ f(y) and (c) g(x ∧ y) = g(x) ∧ g(y).
(2) For a Galois connection, the following is an equivalent definition:
(a) both f and g are antitonic, and x ≤ fgx, x ≤ gfx.
Moreover if the poset is a lattice, then
(b) f(x ∨ y) = f(x) ∧ f(y) and (c) g(x ∨ y) = g(x) ∧ g(y).
(3) For a dual Galois connection, the following is an equivalent definition:
(a) both f and g are antitonic, and fgx ≤ x, gfx ≤ x.
Moreover if the poset is a lattice, then
(b) f(x ∧ y) = f(x) ∨ f(y) and (c) g(x ∧ y) = g(x) ∨ g(y).
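Clause (2) can be spot-checked on a power-set lattice, using the polarity maps ⊥A = {x : ∀a ∈ A, xRa} and A⊥ = {x : ∀a ∈ A, aRx} induced by a relation R (cf. Example 12.2.5 below). A sketch of ours, with an arbitrary toy relation:

```python
from itertools import combinations

U = range(4)
R = {(0, 1), (1, 2), (2, 2), (0, 3), (3, 1)}       # an arbitrary toy relation

perp_l = lambda A: frozenset(x for x in U if all((x, a) in R for a in A))
perp_r = lambda A: frozenset(x for x in U if all((a, x) in R for a in A))

subs = [frozenset(c) for r in range(5) for c in combinations(U, r)]
for A in subs:
    # (2a): x <= fgx and x <= gfx
    assert A <= perp_l(perp_r(A)) and A <= perp_r(perp_l(A))
    for B in subs:
        if A <= B:                       # (2a): both maps are antitonic
            assert perp_l(B) <= perp_l(A) and perp_r(B) <= perp_r(A)
        # (2b)/(2c): join (union) is carried to meet (intersection)
        assert perp_l(A | B) == perp_l(A) & perp_l(B)
        assert perp_r(A | B) == perp_r(A) & perp_r(B)
```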
Terminological note. The definition of a Galois connection was first introduced by Birkhoff (1940), and was defined in terms of condition (2a) above. (Birkhoff attributes to J. Schmidt the equivalence stated as (gc), which constitutes our definition.) Galois connections were extensively studied by Ore (1944) and Everett (1944). The notion of a Galois connection of course abstracts out the correspondence between the subfields of a given separable field extension and the subgroups of the Galois group of transformations that leave that given subfield unmoved. Note the relation F₁ ⊆ F₂ iff H(F₂) ⊆ H(F₁), which gives a clear motivation for turning the partial order around on the right-hand side of a Galois connection. There is an issue about this, as we shall see. The notion of residuation has most often been discussed in the case of binary operations, in the context of "residuated partially ordered groupoids" (see below), but the
definition of a residuated pair is a natural extension of that concept to the unary case (with the further understanding that the function need not be an operation taking values in its domain). Our unary form of a residuated pair can be found (after some decoding) in Blyth and Janowitz (1972), where it is explicitly contrasted with a Galois connection (pp. 18-19). It is also somewhat confusingly to be found in Gierz et al. (1980), where it is called a "Galois connection," with the ironic remark "notice that we have to keep the order straight." Grätzer (1979, p. 51) also follows this usage in an exercise. These usages reinforce the moral above about the essential abstract equivalence of these notions. Finally, let us note that in the language of category theory, a residuated pair can be understood as a pair of "adjoint functors" (cf. MacLane 1971). We believe that the theorem in MacLane "Galois connections are adjoint pairs" (p. 93) has been the driving force in identifying what we are distinguishing as Galois connections and residuated pairs (cf. also Lambek 1981), though MacLane himself is explicit about the fact that Galois connections are antitonic, and that one must take the "opposite" of the right-hand category (the dual of the poset) in order to obtain the desired result. Given a binary relation R on a set U (a frame), it is easy to construct examples of residuated pairs and Galois connections (also their duals) defined on subsets of U as follows.
Example 12.2.3 Residuated pair (◇ᵗA ⊆ B ⇔ A ⊆ □B):
◇ᵗA = {x : ∃a(aRx & a ∈ A)},
□A = {x : ∀a(xRa ⇒ a ∈ A)}.

Example 12.2.4 Dual residuated pair (◇A ⊆ B ⇔ A ⊆ □ᵗB):
◇A = {x : ∃a(xRa & a ∈ A)},
□ᵗA = {x : ∀a(aRx ⇒ a ∈ A)}.

Example 12.2.5 Galois connection (A ⊆ B⊥ ⇔ B ⊆ ⊥A):
⊥A = {x : ∀a(a ∈ A ⇒ xRa)},
A⊥ = {x : ∀a(a ∈ A ⇒ aRx)}.

Example 12.2.6 Dual Galois connection (there is no customary notation; we use ? for "possibly false": ?A ⊆ B ⇔ ?ᵗB ⊆ A):
?A = {x : ∃a(xRa & a ∉ A)},
?ᵗA = {x : ∃a(aRx & a ∉ A)}.
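All four example laws can be verified mechanically on a small frame. A sketch of ours, with an arbitrary toy relation and our own ASCII names for the operators:

```python
from itertools import combinations

U = range(4)
R = {(0, 1), (1, 2), (2, 0), (2, 3)}   # arbitrary toy accessibility relation

dia_t  = lambda A: {x for x in U if any((a, x) in R for a in A)}
box    = lambda A: {x for x in U if all(b in A for b in U if (x, b) in R)}
dia    = lambda A: {x for x in U if any((x, a) in R for a in A)}
box_t  = lambda A: {x for x in U if all(b in A for b in U if (b, x) in R)}
perp_l = lambda A: {x for x in U if all((x, a) in R for a in A)}
perp_r = lambda A: {x for x in U if all((a, x) in R for a in A)}
q      = lambda A: {x for x in U if any((x, a) in R for a in U if a not in A)}
q_t    = lambda A: {x for x in U if any((a, x) in R for a in U if a not in A)}

subs = [set(c) for r in range(5) for c in combinations(U, r)]
for A in subs:
    for B in subs:
        assert (dia_t(A) <= B) == (A <= box(B))      # residuated pair
        assert (dia(A) <= B) == (A <= box_t(B))      # dual residuated pair
        assert (A <= perp_r(B)) == (B <= perp_l(A))  # Galois connection
        assert (q(A) <= B) == (q_t(B) <= A)          # dual Galois connection
```

The check is independent of the particular R chosen, which is the point of calling these examples "canonical."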
All of the above examples, except for the last, have occurred prominently in the literature, but not "on a single page." Thus, for example, thinking of U as a set of "possible worlds," or "states," and thus A and B as "propositions," □A and ◇A are just the usual definitions of the necessity and possibility operators in the Kripke semantics for modal logic. The relation R is called an "accessibility" relation, and αRβ is read as "β is possible relative to α." Of course □ᵗA and ◇ᵗA are just the duals ("backward" possibility and necessity). It is not common to have these together with their forward versions in the same modal logic, but this does happen in temporal logic, where αRβ is read "β is an alternative future to α." Then □A becomes the usual tense operator GA for
"it will always be the case," ◇A is FA for "it will (sometimes) be the case," and □ᵗA and ◇ᵗA become the past tense versions HA and PA, respectively. Note also that in standard modal logic, when the accessibility relation is symmetric (as for the logics B and S5), the "backward" operators are indistinguishable from the "forward" operators, and we can thus strike out the 'ᵗ' in the residuated pair and dual residuated pair laws above. The interesting thing is that these laws always hold if we distinguish backwards from forwards. Finally, we discuss Galois connections. It is interesting to note that these definitions can be found in Birkhoff (1940, 1948, 1967) under the heading of "polarities," and Everett (1944) showed that all Galois connections defined on power sets can be obtained from polarities. (Our results concerning gaggles will obtain, as a very special case, the more general result for Galois connections defined on distributive lattices, or, by trivial modification, Galois connections defined on posets.) In interpreting R it is best to think of it as an "inaccessibility relation," or better as "incompatibility." It is customary to denote this relation as ⊥, and in Birkhoff (1940, p. 125) it is connected to Hilbert spaces as the "orthogonal" or "perp" relation. Goldblatt (1974) gives a completeness theorem for ortholattices (but not orthomodular lattices) in effect using A⊥ as the definition of negation. In many cases, including Goldblatt's, it is natural to require that the relation be symmetric, in which case A⊥ = ⊥A. But this is not forced. It is easy to establish from our representation results below that the above examples are "canonical," i.e., all residuated pairs, Galois connections, and their duals are isomorphic to those defined as above on a collection of subsets of some frame.
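The collapse of backward into forward operators under symmetry is easy to confirm on a small frame (a sketch of ours; the relation is an arbitrary toy example symmetrized by hand):

```python
from itertools import combinations

U = range(4)
base = {(0, 1), (1, 2), (2, 3), (3, 3)}
R = base | {(b, a) for (a, b) in base}      # symmetric closure of base

dia   = lambda A: {x for x in U if any((x, a) in R for a in A)}
dia_t = lambda A: {x for x in U if any((a, x) in R for a in A)}
box   = lambda A: {x for x in U if all(b in A for b in U if (x, b) in R)}
box_t = lambda A: {x for x in U if all(b in A for b in U if (b, x) in R)}

# With R symmetric, the backward operators coincide with the forward ones,
# so the dagger can be struck out, as in the modal logics B and S5.
for A in [set(c) for r in range(5) for c in combinations(U, r)]:
    assert dia(A) == dia_t(A) and box(A) == box_t(A)
```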
Our representations assume that the poset is a distributive lattice, since we want representations that carry meet and join into intersection and union, respectively. But if one does not care about that, one can have an arbitrary poset, and it is left as an exercise for the reader to see how to rework our results so that they apply to this "more general" case. Section 8.1 is a good model.
12.3 Definitions of Distributoid and Tonoid
As a first approximation, a distributoid is a structure D = (A, ∧, ∨, (Oᵢ)ᵢ∈I), where (A, ∧, ∨) is a distributive lattice, and each f ∈ (Oᵢ)ᵢ∈I is a (finitary) operation on A that "distributes" in each of its places over at least one of ∧ and ∨, leaving the lattice operation unchanged or switching it with its dual. Note that it is easy to see that for each f ∈ (Oᵢ)ᵢ∈I, f is in each of its argument positions either isotonic or antitonic: