Non-Standard Logics for Automated Reasoning
This book is the result of a project that was funded by the Commission of the European Communities' COST (Committee on Science and Technology) initiative.
Non-Standard Logics for Automated Reasoning Edited by
PHILIPPE SMETS IRIDIA, Universite Libre de Bruxelles, Belgium
ABE MAMDANI Department of Electrical Engineering, Queen Mary College, London, UK
DIDIER DUBOIS Laboratoire Langages et Systemes Informatiques, Universite Paul Sabatier, Toulouse, France
HENRI PRADE Laboratoire Langages et Systemes Informatiques, Universite Paul Sabatier, Toulouse, France
1988
ACADEMIC PRESS Harcourt Brace Jovanovich, Publishers London San Diego New York Boston Sydney Tokyo Toronto
ACADEMIC PRESS LIMITED 24/28 Oval Road, London NW1 7DX
United States Edition published by ACADEMIC PRESS INC. San Diego, CA 92101 Copyright © 1988 by ACADEMIC PRESS LIMITED Chapter 4 "Autoepistemic Logic" © by R. C. Moore Chapter 8 "Probabilistic Logic" © by G. Paass "Discussion of Smets" © by G. Paass Chapter 9 "Belief Functions" © by P. Smets
All Rights Reserved No part of this book may be reproduced in any form by photostat, microfilm, or any other means, without written permission from the publishers
British Library Cataloguing in Publication Data Non-standard logics for automated reasoning. 1. Logic, Symbolic and mathematical I. Smets, P.H. 511.3 BC135 ISBN 0-12-649520-3
Filmset by Eta Services (Typesetters) Ltd, Beccles, Suffolk Printed in Great Britain by St Edmundsbury Press Ltd, Bury St Edmunds, Suffolk
Participants
Philippe Besnard, IRISA, Domaine Universitaire, Campus Beaulieu, Avenue du General Leclerc, 35042 Rennes Cedex, France
John Bigham, Department of Electrical and Electronic Engineering, Queen Mary College, Mile End, London E1 4NS, England
John A. Campbell, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, England
Eugene Chouraqui,* Groupe Representation et Traitement des Connaissances, Centre National de la Recherche Scientifique, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 09, France
Michael R. B. Clarke, Department of Computer Science, Queen Mary College, Mile End Road, London E1 4NS, England
Anthony G. Cohn, Department of Computer Science, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, England
Marie-Odile Cordier, LRI, Batiment 490, Universite Paris-Sud, 91405 Orsay Cedex, France
Didier Dubois, Laboratoire Langages et Systemes Informatiques, Universite Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex, France
Jean Fargues, Centre Scientifique IBM France, 36 Avenue Raymond Poincare, F-75116 Paris, France
Luis Farinas del Cerro, Langages et Systemes Informatiques, Universite Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex, France
Christine Froidevaux, UA 410, CNRS, Laboratoire de Recherche en Informatique, Universite de Paris-Sud, 91405 Orsay Cedex, France
Richard A. Frost, Department of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, Ontario N9B 3P4, Canada
D. M. Gabbay,* Department of Computing, Imperial College of Science and Technology, University of London, 180 Queen's Gate, London SW7 2AZ, England
Paul Gochet, Seminaire de Logique et d'Epistemologie, Universite d'Etat de Liege, 32 Place du XX Aout, 4000 Liege, Belgium
Eric Gregoire, Unite d'Informatique, Universite Catholique de Louvain, Place Sainte Barbe 2, B1348 Louvain-La-Neuve, Belgium
Andreas Herzig,* Laboratoire Langages et Systemes Informatiques, Universite Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex, France

*Contributor but not participant
Preface
been realized. The editors, who were also the coordinators of the project, gratefully acknowledge this support. The content of this book was discussed and polished at workshops held in September 1986 and June 1987. The workshops took place over long weekends (each amounting to three and a half days of meetings and discussions), the first in Cordes and the second in Rodez, two pleasant locations near Toulouse, France. The format of these meetings is worth a brief mention here. Formal sessions took place from 9 a.m. to 12 p.m. and from 4 p.m. to 7 p.m. each day. This allowed considerable time in the afternoon and the evenings for informal discussions. For the first workshop, a few of the participants had been nominated by the project coordinators to present draft papers. These papers were read and discussed at great length. The coordinators also nominated discussants to provide the critiques for each presentation, but voluntary discussions were also encouraged. The revised papers, their critiques and rejoinders were circulated to all participants for comment and were later discussed at the second workshop, which essentially performed the function of editing the text for the book. As editors and project coordinators, we were heartened to witness the intense intellectual activity that occurred at the workshops, both during the formal sessions and outside them. At the end of the introduction to the book we have included three appendices. The first two provide short tutorials on classical logic and modal logics, and the third gives a brief introduction to the existing literature on the logical aspects of probability theory. These tutorials and the bibliography included in the appendices provide useful reference material for the reader. The critiques and rejoinders accompanying each paper are there to further enhance the intellectual flavour of each presentation and accurately reflect the discussions that took place.
The editors hope that a discerning reader will be able to gather from these discussions the care that needs to be exercised in applying each logic and the open questions that need further research. A graduate student will find the text useful both in selecting a topic for research and also in directing attention to the sort of questions that need to be researched when working in non-standard logics. The organization of this project necessitated the distribution of a large amount of printed material from each author to the discussants and to other participants, throughout the two years of the project. In this we were helped greatly by Fabienne Gerard and Lucy Ineson, and we wish to acknowledge their efforts here.

Philippe Smets, Abe Mamdani, Didier Dubois, Henri Prade
Contents

Preface ix

Introduction (E. H. Mamdani, John Bigham and Flash Sheridan) 1
  Appendix A: Classical Logic - An Introduction (Anthony G. Cohn) 8
  Appendix B: Modal Logic - A Brief Tutorial (Bruno Marchal) 15
  Appendix C: Logic and Probability (Didier Dubois and Henri Prade) 23

1  On Game-Theoretic Interactions with First-Order Knowledge Bases (Peter Jackson) 27
   Discussants: P. Gochet 54; L. Farinas del Cerro 56; R. C. Moore 56; Reply 59

2  An Automated Modal Logic of Elementary Changes (Luis Farinas del Cerro and Andreas Herzig) 63
   Discussants: J. Fargues 76; F. Veltman 78; Reply 79

3  Formal Expression of Time in a Knowledge Base (Eugene Chouraqui) 81
   Discussants: C. Testemale 98; L. Farinas del Cerro 99; J. A. Campbell 100; Reply 101

4  Autoepistemic Logic (Robert C. Moore) 105
   Discussants: J. Fargues 127; P. Jackson 129; Ph. Smets 130; E. Gregoire 132; Reply 133

5  The Preferential-Models Approach to Non-Monotonic Logics (Philippe Besnard and Pierre Siegel) 137
   Discussants: C. Froidevaux 156; F. Sheridan 159; Reply 160

6  An Intuitionistic Basis for Non-Monotonic Reasoning (Michael R. B. Clarke and D. M. Gabbay) 163
   Discussant: J. A. Campbell 175; Reply 177

7  Inheritance in Semantic Networks and Default Logic (Christine Froidevaux and Daniel Kayser) 179
   Discussants: D. Dubois and H. Prade 206; Ph. Smets 208; D. Dubois and R. Valette 209; Reply 211

8  Probabilistic Logic (Gerhard Paass) 213
   Discussants: F. Veltman 244; D. Dubois and H. Prade 246; Ph. Smets 248; Reply 249

9  Belief Functions (Philippe Smets) 253
   Discussants: M. R. B. Clarke 277; G. Paass 279; D. Dubois and H. Prade 280; Reply 282

10 An Introduction to Possibilistic and Fuzzy Logics (Didier Dubois and Henri Prade) 287
   Discussants: M.-O. Cordier 315; P. Gochet 318; F. Sheridan 320; Reply 321

Index 327
Introduction
In trying to automate reasoning one should first attempt to articulate not how humans do reason but rather how they might most fruitfully reason. Logic is thus concerned with the normative aspects of reasoning. "Automated reasoning" in the title of this book refers to the automation of a formal system of logic. The term logic in the title refers to such a formal system. About 150 years ago, George Boole expounded his Laws of Thought. This suggested that logical human reasoning was subject to mathematical laws. (It is interesting to note that this book's full title is An Investigation of the Laws of Thought on which are Founded the Mathematical Theories of Logic and Probabilities; we shall have more to say about probability later.) His work was the starting point of formal systems of logic. The motivations for automating reasoning and for studying a formal system of logic are different. The point of automating reasoning is to apply it to reasoning about a range of practical problems. In contrast, one may undertake to study a formal system for philosophical reasons or for scholastic reasons, i.e. for the sake of the beauty of the system itself. For example, while studying a formal system, one attempts to ensure that the system adheres to certain well-understood metalogical requirements (for the definition of this and the following terms see Appendix A on classical logic), such as the consistency, soundness and completeness of a system. Now a system that is to be automated should adhere to the requirement of soundness; that is, if a statement is provable from a given set of premises using the system's formalisms, then that statement should also logically follow from the premises. However, the requirement of completeness, that everything that logically follows from a set of premises should be provable within the system, can quite conceivably be relaxed for an automated practical system of reasoning.
None of the logics presented here explicitly consider incompleteness resulting from limited computational resources, although in practical systems for automated reasoning that is one factor that needs to be taken into account. The system of logic that is not non-standard is called classical logic. It is the formal system of logic that has been most extensively studied since George Boole. However, several non-standard systems were also investigated before the advent of the digital computer and the need for automating reasoning. The latter has simply accelerated the study of non-standard systems. There is
controversy concerning non-standard logics among philosophers and logicians. This controversy has also spilled over to scientists who wish to apply logics by automating them, but the criteria for the debate ought to be (but often are not) different in these two settings. The work on non-standard systems continues apace and this book is aimed at satisfying the need for an explanation of such systems. Moreover, these explanations are accompanied by an exposition of the arguments that surround each of the systems. At the end of this introduction we have included three appendices that the reader may find particularly useful. Appendix A is a short tutorial on classical logic. Many of the non-standard logics are either extensions or deviants of classical logic. Therefore, even in a book on non-standard logic, classical logic remains a benchmark. This appendix also includes a short discussion of a particular deviant of classical logic: namely, intuitionistic logic. One way to extend classical logic is to extend its vocabulary with modal operators. Depending on the assumed properties of these operators, we obtain a large class of modal logics. The semantics of modal logics are explained in terms of the "many possible worlds" first articulated by Kripke. Appendix B is a short tutorial on modal logics and Kripke semantics. A significant body of background work in logic and probability theory (recall the motivation of George Boole as expressed in the full title of his book) is largely ignored by the artificial-intelligence community, who use probability theory in their applications. Appendix C is a very brief introduction to the literature that exists on the logical aspects of probability, which we feel ought to be given due recognition.
WHY NON-STANDARD LOGICS?

It is impossible to give a short summary of the controversy surrounding non-standard logics. This book does not pretend to adopt an impartial stance, merely an intellectually open one. The claim that they are all wrong or unnecessary, and usually both, may be ignored in automating reasoning, where ideas of abstract truth and economy in primitive elements may be of less concern than questions of more convenient expressive power and, perhaps, also of the economy of computational effort. Moreover, non-standard systems are conceived and put forward in order to capture particular modes of reasoning (e.g. reasoning about beliefs, probabilistic reasoning or default reasoning) rather than to simply investigate the rules of logical reasoning
per se. The first question that should occur to someone presented with an alternative to classical logic is, "Why bother?" Classical logic is widely understood and agreed upon; this is not to say that it is either perfect or
invariably useful. The purpose of this book is, broadly, not to argue the latter, but to show that for many purposes some alternative to classical logic works better. There are, of course, many arguments that classical logic is flawed, and these arguments are the historical origin of many of the logics used here; but the arguments against classical logic do not themselves belong here. We are showing how one can use various non-standard logics, not that one must. It is interesting first of all to give a rough classification of the logics discussed in this book; any such classification is open to debate. Most of them have modified their classical paradigm for greater expressive power. It is not that classical theory comes up with the wrong answers to questions, but that certain questions cannot be expressed in it easily, naturally, or computationally efficiently. The most obvious distinction is numeric versus non-numeric; Paass, Smets, and Dubois and Prade in this book all use numeric logics; the rest, non-numeric. The second distinction we take from Susan Haack's Philosophy of Logics (Cambridge University Press, 1978). Some logics are extended; they can talk about things that classical logic cannot by extending the basic vocabulary of logic. Other logics are restricted or deviant: some, at least, of the things they say use roughly the same vocabulary as classical logic, but they make some of the theorems false where classical logic would have them true. A logic can be both deviant and extended. Farinas and Herzig, Chouraqui, and Moore have all extended classical logic by adding modal operators that apply to sentences; any sentence of their systems without these operators is true if and only if it is true in classical logic. Similarly, Smets, and one part of Dubois and Prade's chapter (on logics of uncertainty), can be viewed as extensions of classical probability theory.
Whereas classical probability theory assigns a sentence a single number, both of these assign a sentence two numbers. Another sort of extension of any theory is to start reasoning about the assertions of the theory. This is called the theory's meta-theory (see Appendix A). Siegel and Besnard use model theory, part of classical logic's meta-theory; this also is an extension. Paass, on the other hand, reasons about the assertions of probability theory; so his theory is an extension of probability theory. The other part of Dubois and Prade's chapter, on the logics of vagueness, is a deviation from classical logic; both the law of the excluded middle and the law of contradiction hold in classical logic, but in general neither of them holds in fuzzy logic. Another deviant logic is intuitionistic logic, presented by Clarke and Gabbay; this denies only the law of the excluded middle. Jackson's chapter and that of Froidevaux and Kayser are both quite different from this. Jackson uses classical logic; he just has a different
semantics (see Appendix A) for it. Froidevaux and Kayser use neither logic nor probability theory; in order to do much the same job as default logic (an extension of classical logic) they use inheritance trees. Below, we discuss each of the non-standard logics in the order in which they appear in this book. The earlier chapters concern logics that rely on only symbolic (hence non-numeric) reasoning. The last three chapters all deal with logics based on numeric formalisms.
SYMBOLIC REASONING

The paradigm for non-numeric reasoning has been classical logic, in which classical theorem-proving programs make automated reasoning a (limited) reality. Ad hoc methods exist, but their logical foundations need further investigation. When standard logic is modified, the extensions and modifications must be given a clear interpretation. In non-numeric non-standard logic, "non-monotonic" is the most prominent buzz-word, but not the central issue; non-monotonicity is merely a property of some of the logics. We must sometimes change our minds; this, essentially, is non-monotonicity. Upon learning more, we realize that something that we thought was true is not. So any logic to deal with practical matters is likely to be non-monotonic, but this will be only one of its properties. (For a more technical definition of "non-monotonic", see Appendix A.)

Game-theoretic semantics

Jackson does not use a non-standard logic at all; he uses a non-standard semantics for classical logic, due to Hintikka, in order to provide a model of a computer's attempt to maintain a consistent knowledge base. It is conservative when accepting information updates but permissive and helpful when answering queries. Ordinary, truth-functional semantics is essentially non-pragmatic, whereas Hintikka's game-theoretic semantics is pragmatic in that it incorporates the notions of verification and refutation. This allows the user to give domain-dependent heuristics that (although they can lead to errors) may be able to answer queries more efficiently.

Modal logics

Classical logic is timeless. If the proposition that Gottlob Frege had a beard on 1 January 1886 was once true, it will always be true. But this proposition
has no logical connection with his having a beard on the next day. If we are going to reason about things changing in time, we will have to do better than this; temporal logics apply modal logics to reasoning about time. In a temporal logic we can express that at a given time Frege changed from a bearded state to a beardless one. This we cannot express directly in classical logic. Chouraqui's chapter starts with a survey of the field and concludes with his own theory. Another thing that classical logic cannot express directly is contrary-to-fact conditionals (alias counterfactuals), e.g. sentences like "If Galileo had been American he would have taught at Harvard." This statement presents more difficulty than at first appears. One may believe it, but one may also believe that if Galileo had been American he would have been an uneducated farmer; this contradicts our first counterfactual. The difficulty comes in deciding what other things to change when we change the antecedent from the truth (Galileo was Italian) to something else (that he be American). What else do we change? His education? His native language? No-one has found a solution that works well in all cases; Farinas and Herzig present a solution which will work in the case where all facts are mutually independent.

Reasoning with incomplete knowledge

Moore, Siegel and Besnard, and Froidevaux and Kayser each deal with a different form of reasoning, not just from what we know, but also from realizing what we do not know. The traditional example is as follows: if we know Tweety is a bird, then we assume he can fly if we do not know the contrary. This requires us to draw a conclusion based not just on our knowledge of the world, but on our knowledge of our own ignorance. This is a meta-theoretic extension of classical logic's expressive power; classical logic cannot talk about ignorance. Reasoning from ignorance has an odd effect, namely non-monotonicity.
If further information leads us to believe that Tweety is a penguin then we can no longer conclude that he can fly. More information leads to one less conclusion. Moore presents autoepistemic logic, a non-monotonic logic for modelling the set of beliefs that an ideally rational agent, reflecting on what he knows and does not know, would hold. He shows that such theories (i.e. sets of beliefs) have Kripke (possible worlds) models (see Appendix B). This also enables us to demonstrate the soundness and completeness of such a theory with respect to a set of premises. Moore presents a procedure for determining whether, given some data, such a rational agent might conclude a given formula. For many cases the algorithm requires exponential time for an initial calculation for a data set; afterwards answers to queries can be computed quickly.
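The Tweety pattern itself is easy to sketch procedurally. The fragment below is only an invented toy (the predicate names and the blocking condition are ours); it is not Moore's autoepistemic procedure or any other formalism from this book, merely an illustration of how learning more can retract a conclusion:

```python
# Toy sketch of non-monotonic default reasoning (invented example):
# conclude "Tweety flies" from "Tweety is a bird" unless something
# we also know blocks the default, e.g. "Tweety is a penguin".

def can_fly(knowledge):
    """Default rule: birds fly, unless known to be exceptional."""
    return "bird" in knowledge and "penguin" not in knowledge

beliefs = {"bird"}
assert can_fly(beliefs)        # knowing only "bird", we conclude flight

beliefs.add("penguin")         # learning more...
assert not can_fly(beliefs)    # ...retracts the earlier conclusion
```

More information yields one less conclusion, which is exactly the non-monotonic behaviour described above.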
A different approach to the Tweety problem is not to use classical logic as a paradigm, but to use inheritance hierarchies instead. Froidevaux and Kayser discuss type hierarchies (e.g. Tweety is a bird; "bird" is a type of animal; "animal" is a type of living thing), and properties inherited (perhaps with exceptions) through them. They also discuss their relation to default logics; default logics perhaps have a firmer logical foundation, but type hierarchies are computationally simpler. They show how to translate some type hierarchies into default inferences. A particularly simple example of reasoning from ignorance is the following: if a database does not say that Flight 913 stops in Chicago, we assume it does not stop in Chicago, even though we have not been told this; this is the Closed-World Assumption. Another sort of non-monotonic inference that until now was considered quite distinct is this: if we do not know why a light does not light, we prefer the explanation that it is the electrical socket that has failed rather than the light, the hair drier, and the refrigerator. This is the idea behind Circumscription: select the solution that minimizes the number of things acting abnormally. Siegel and Besnard use classical logic (more specifically, model theory) to provide a neat presentation of both the Closed-World Assumption and Circumscription in one framework. It is not only easier to see the relation between the two concepts as the authors present them, but we believe that it is the easiest available introduction to Circumscription, and the easiest way to understand the consequences of the Closed-World Assumption. The more technical parts of this chapter will be beyond the beginner.

Intuitionism
Clarke and Gabbay first discuss the nature of non-monotonic inference; then they compare intuitionistic logic with classical logic. They claim the intuitionistic notion of truth (that some things may not yet be true, but might become so, and that there is not necessarily a single way things could turn out) is more realistic, and that this realism makes it well suited for non-monotonic reasoning. They present a non-monotonic inference rule and discuss automating it, and examine its connection with classical non-monotonic theories such as Moore's.
NUMERIC FORMALISMS
Numerical techniques have been extensively used in automated reasoning (e.g. expert systems), sometimes without a clear understanding of what the numbers mean. The chapters in this book attempt to clarify this issue. The
most respected counterpart to the role of classical logic in Artificial Intelligence has always been probability theory. There are alternatives to classical logic that use real numbers, usually from zero to one, inclusive; they are not probability theory, but for them probability theory has always been a strong paradigm, to mimic, to disagree with, or to extend. One extension of any theory is to start reasoning about the assertions of the theory. This is called the theory's meta-theory. In probability theory we can deduce the probability of one event from some other probabilities. To get these probabilities, one must ask someone; his answers may be erroneous, uncertain, or even inconsistent. Paass's probability logic, reasoning about these probabilities (hence meta-theoretically), attempts to turn them into a consistent, and we hope more accurate, set of probability distributions. In the case of insufficient knowledge, it may only be possible to derive upper and lower bounds on probabilities. Probabilistic logic is concerned with defining probability distributions over logical propositions. Paass's chapter examines the methods and the underlying assumptions necessary for evaluating these in inference networks, and comments on the relationship between this and other formalisms of uncertain reasoning. In practical situations, the available pieces of information often have different reliability; this must be taken into account for evaluation. The quantitative theory of belief that Smets presents differs in two ways from probability theory: he wants to say something different, and he wants to say more. On the one hand, Smets argues, Shafer theory is a theory of subjective belief, as opposed to objective but unknown probabilities. On the other hand, Shafer can say things about belief that probability theory cannot say about probability. 
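That extra expressiveness can be sketched numerically. The fragment below is an invented illustration of a Shafer-style mass function and the belief and plausibility it induces; it is a toy, not the formalism of Smets's chapter:

```python
# Toy sketch of Shafer-style belief: a mass function assigns weight to
# SETS of possibilities rather than to single outcomes.

# All weight goes to {P, Q}: we know "P or Q" and nothing more.
mass = {frozenset({"P", "Q"}): 1.0}

def belief(hypothesis, mass):
    """Bel(H): sum of the masses of all focal sets contained in H."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def plausibility(hypothesis, mass):
    """Pl(H): sum of the masses of focal sets that intersect H."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)

# Unlike the forced 0.5/0.5 probability assignment, neither P nor Q
# individually receives any belief, yet each remains fully plausible.
assert belief(frozenset({"P"}), mass) == 0.0
assert plausibility(frozenset({"P"}), mass) == 1.0
assert belief(frozenset({"P", "Q"}), mass) == 1.0
```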
For instance, in probability theory, if we know that either P or Q will definitely happen, but we don't know the probabilities of P and of Q, we must nonetheless give them numerical values: we must (in the usual interpretation of probability) say that P and Q each has a probability of 0.5. In Shafer theory, we merely give all of our subjective weight of belief to the set {P, Q}, which is different from giving a weight of 0.5 to P and to Q. It is more difficult to say whether this theoretical advantage brings a corresponding advantage in guiding action.

In contrast with the above two, fuzzy logic is not an attempt to say more than probability theory, but to get by with less information. In cases where probability theory would say one needs more information to answer a question, fuzzy logic comes up with an answer (possibly imprecise). Its adherents claim that varieties of fuzzy logic are axiomatizations of vagueness and of uncertainty (as distinct from probability); these claims are disputed. While it may be debatable what fuzzy logics are logics of, it is clear that they have been useful in application. Dubois and Prade characterize the difference between fuzzy logics of vagueness and logics of uncertainty. The logics of uncertainty are uncontroversially not truth-functional; they argue that the logic of vagueness can be truth-functional under perfect information. The basic notion is that of the degree of truth of a vague proposition, which will be a number between zero and one. The ease (and the controversy) come from traditional fuzzy logic's truth-functionality. This gives a very (perhaps excessively) simple nature to the logical connectives. The degree of truth of a disjunction, for instance, is merely the maximum of the degrees of the disjuncts: if P is 0.7 true and Q is 0.6 true then P v Q is 0.7 true (in the most common variety of truth-functional fuzzy logic; in another it is 1.0). They show that many classical inferences have analogues in possibility logic and discuss the relationships with Shafer theory, probability theory, and modal logic, suggesting that possibility logic offers a graded theory of the modal notions of possibility and necessity.

E. H. MAMDANI, JOHN BIGHAM and FLASH SHERIDAN
Department of Electrical Engineering, Queen Mary College, London, UK

APPENDIX A: CLASSICAL LOGIC - AN INTRODUCTION

In this appendix a very brief overview of classical logic is presented for those readers who would like a short revision of the main concepts taken for granted in some of the other chapters. A logic is a formal system for representing knowledge unambiguously and reasoning with it. The first requirement therefore is an unambiguous syntax that defines the strings in the language, called well-formed formulae (wffs). In order to know what a wff means, a semantics must also be given so that it is possible to say when any given formula is true and when it is false. Finally, we shall want to define a proof system or calculus that will allow chains of reasoning to be constructed in order to represent a given argument (i.e. the derivation of a conclusion from a set of premises). Perhaps the most studied system of logic, and certainly the most prevalent historically in Computer Science and Artificial Intelligence, is First-Order Predicate Logic (FOL). There are many concrete syntaxes in the literature, but the differences are unimportant. A typical formulation is as follows. The alphabet of the language is the union of the following sets:

(i) P: a non-empty set of predicate symbols. We shall use strings composed entirely of capital Roman letters as elements of P.
(ii) F: a non-empty set of function symbols. We shall use strings composed entirely of lower-case Roman letters or numerals as elements of F.
(iii) V: a non-empty set of variables. We shall use lower-case italic letters as elements of V.
(iv) A set of Boolean connectives. For simplicity here, we shall restrict ourselves just to {¬, →}.
(v) {∀, ∃}: two quantifiers, the universal quantifier ∀ and the existential quantifier ∃ (read "for all" and "there exists").
(vi) {⊤, ⊥}: the two logical constants, true and false respectively.
(vii) {(, ), ,}: three punctuation symbols (the parentheses and the comma).
Associated with every α ∈ (P ∪ F) is a non-negative integer called its rank. We shall write Pⁿ and Fⁿ to denote the sets of predicate and function symbols of rank n respectively. Rank-zero function symbols are often called constants and rank-zero predicate symbols are often called propositions (or propositional variables).

γ is a term iff γ ∈ V, or γ = β(α₁, ..., αₙ) where β ∈ Fⁿ and αᵢ is a term for all 1 ≤ i ≤ n.

The set of wffs is the smallest set of strings such that:

(i) A is a wff if A = β(α₁, ..., αₙ) and β ∈ Pⁿ and αᵢ is a term for all 1 ≤ i ≤ n;
(ii) A is a wff if A = ¬B or A = (B → C), where B and C are wffs;
(iii) A is a wff if A = ⊤ or if A = ⊥;
(iv) A is a wff if A = ∀αB or if A = ∃αB, where α ∈ V and B is a wff.

If α ∈ (F⁰ ∪ P⁰) then we can unambiguously write α rather than α(), since F ∩ V is empty. Thus, for example, if Q ∈ P², g ∈ F¹ and b ∈ F⁰ then Q(g(b), x) and ∀x∃y(Q(x, y) → Q(y, x)) are both wffs. Notice that, for example, ∀x∀yQ(y(b), x) is not a wff; in first-order logic one may not quantify over functions (or over predicates). Logics that admit such strings as formulae are higher-order logics. If no predicate symbols other than rank-zero ones are used and the only possible wffs are of types (i), (ii) or (iii) above then we get a sublanguage, which is usually known as propositional logic. Sometimes additional Boolean connectives other than → and ¬ are included. This can either be done by definitions or by including them in the language properly and giving semantic definitions, axiom schemas and rules of inference in addition to those given below. Commonly ∧ (conjunction) and ∨ (disjunction) are included and can be defined thus:
1\
fiB)=: I(.W'-+ 1f1B)
(.W' v 84)
=(--, s1
-+
84)
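Formation rules like these can be prototyped directly. The sketch below uses an invented tuple encoding of terms and wffs, with the example symbols Q ∈ P², g ∈ F¹ and b ∈ F⁰ from the text; the encoding and symbol tables are ours, not the book's:

```python
# Recursive recogniser for the term/wff grammar above, using an invented
# encoding: ('var', x) and ('app', f, args) for terms; ('pred', P, args),
# ('not', A), ('imp', A, B), ('forall', v, A), ('exists', v, A), and the
# constants 'T', 'F' for wffs.  The rank tables play the role of Pⁿ, Fⁿ.

PRED_RANK = {"Q": 2}           # Q ∈ P²
FUN_RANK = {"g": 1, "b": 0}    # g ∈ F¹, b ∈ F⁰ (a constant)

def is_term(t):
    if t[0] == "var":
        return True
    if t[0] == "app":
        _, f, args = t
        return FUN_RANK.get(f) == len(args) and all(map(is_term, args))
    return False

def is_wff(w):
    if w in ("T", "F"):                       # the constants ⊤ and ⊥
        return True
    tag = w[0]
    if tag == "pred":                         # β(α₁, ..., αₙ), β ∈ Pⁿ
        _, p, args = w
        return PRED_RANK.get(p) == len(args) and all(map(is_term, args))
    if tag == "not":
        return is_wff(w[1])
    if tag == "imp":
        return is_wff(w[1]) and is_wff(w[2])
    if tag in ("forall", "exists"):           # may only bind a variable
        return w[1][0] == "var" and is_wff(w[2])
    return False

# Q(g(b), x) is a wff; a variable applied as a function is not a term,
# mirroring the remark that one may not quantify over functions.
gb = ("app", "g", [("app", "b", [])])
assert is_wff(("pred", "Q", [gb, ("var", "x")]))
```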
E. H. Mamdani et al.
Similarly, ⊥ and ⊤ are often dispensed with, and either ∀ or ∃ can be omitted, since they are interdefinable:

∀α𝒜 ≡ ¬∃α¬𝒜
Two important concepts are those of free and bound variables. An occurrence of a variable α ∈ V in a wff 𝒜 is bound in the wffs ∀α𝒜 and ∃α𝒜. It is free if there is an occurrence that is not bound. Thus, for example, in (R(x, y) → ∃x∀zR(x, z)), z is bound, y is free, and x has both free and bound occurrences. If a wff has no free variables then it is said to be closed. So far all we have is a set of wffs; we have said nothing about what a formula means. The semantics of a wff of FOL can be defined by composing the meanings of the subexpressions. Logical symbols such as the quantifiers and the Boolean connectives have fixed meanings, but the non-logical symbols (i.e. F and P) do not. Whenever we write a formula, we usually have an intended meaning or interpretation for the non-logical symbols and an intended universe of discourse (i.e. the set of things over which we want to quantify) in mind. Formally, an interpretation (or model) is a tuple

[…]

P(A → B) = P(B | A) is at all reasonable; but Lewis (1976) proved that no probability function meeting this identity can assign positive probability to three or more pairwise incompatible statements. Discussions about the meaning of Stalnaker's conditionals versus Adams' point of view appear in Harper et al. (1981) and especially in the paper by Gibbard (1981). It seems that Stalnaker's approach naturally expresses subjunctive conditionals ("if men had wings they would fly"), while conditional probability is closer to indicative conditionals ("if John is not the murderer then it is Peter"). A new approach to conditionals has been proposed by Goodman and Nguyen (1987), which may turn out to be a significant step towards making logical conditionals and conditional probabilities compatible.
Strangely enough, most of the artificial-intelligence literature in probabilistic reasoning has focused on algorithmic issues (how to efficiently process a Bayesian network, or a set of probability-valued formulae), but not on semantics or representational issues; in particular the controversy concerning conditionals, which really belongs to a debate between probability and logic, has not yet been widely considered by artificial-intelligence researchers, although the debate between logic and probability as a proper tool for implementing reasoning processes is particularly lively at present, as seen by several papers by Cheeseman and others (see references in Paass' contribution to this volume). DIDIER DUBOIS and HENRI PRADE
LSI, Universite Paul Sabatier, Toulouse, France
References

Adams, E. W. (1975). The Logic of Conditionals. Reidel, Dordrecht.
Adams, E. W. and Levine, H. P. (1975). On the uncertainties transmitted from premises to conclusions in deductive inferences. Synthese 30, 429-460.
Boole, G. (1854). An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities. Macmillan, London. (Reprinted by Dover, New York, 1958.)
Carnap, R. (1950). Logical Foundations of Probability. Routledge & Kegan Paul, London.
Fenstad, J. E. (1967). Representations of probabilities defined on first order languages. Sets, Models and Recursion Theory (ed. J. N. Crossley), pp. 156-172. North-Holland, Amsterdam.
Harper, W. L., Stalnaker, R. and Pearce, G. (eds) (1981). Ifs: Conditionals, Belief, Decision, Chance and Time. Reidel, Dordrecht.
Keisler, H. J. (1985). Probability quantifiers. Perspectives in Mathematical Logic (ed. J. Barwise and S. Feferman), pp. 509-556. Springer-Verlag, Berlin.
Lewis, D. (1976). Probabilities of conditionals and conditional probabilities. Phil. Rev. 85, 297-315. Also in Harper et al. (1981, pp. 129-150).
Łoś, J. (1963). Semantic representations of the probability of formulas in formalized theories. Studia Logica 14, 183-194.
Reichenbach, H. (1949). The Theory of Probability. University of California Press, Berkeley and Los Angeles.
Stalnaker, R. (1968). A theory of conditionals. Studies in Logical Theory (ed. N. Rescher). Blackwell, Oxford. Also in Harper et al. (1981, pp. 41-55).
1 On Game-Theoretic Interactions with First-Order Knowledge Bases

PETER JACKSON
Department of Artificial Intelligence, Edinburgh University, UK
Abstract

Suggestions that knowledge-representation languages should be based on logical languages appear to be predicated upon the assumption that one might thereby be able to ensure the correctness, completeness and consistency of the knowledge so represented. The question then arises as to how the knowledge engineer, in developing a particular knowledge base, can use methods associated with automatic theorem-proving to derive any of these benefits. This chapter describes one approach to this problem, in which the interaction between a knowledge engineer and a knowledge base under development is regarded as a language game. A game-theoretic interpretation of first-order logic is used to design an inference engine that helps the knowledge engineer to explore the consequences of adding new knowledge. It provides both rules governing the conditions under which a new proposition can be embedded in a model, and interaction strategies that the inference engine can adopt in response to assertions and questions. A modal extension to this logic is then presented, and it is shown that a suitable doxastic system can be obtained from the game rules by restricting the accessibility relation. Finally, this more expressive language is used to specify domain-dependent game-playing heuristics.
1 INTRODUCTION
This research was undertaken with a view to exploring the role that logic might play in the incremental development of knowledge bases. The central idea was to try and use logic as a tool for both the specification of knowledge bases and the investigation of their epistemological properties. Knowledge-based programs are just as hard to specify and verify as more conventional pieces of software, and almost any advance in this area would be worthwhile. A first-order language, under a suitable interpretation, might provide a basis for constructing a program that would help knowledge engineers to explore and experiment with knowledge bases as an integral part of the process of creating them. This is because the notion of consistency is well defined for such languages, and an expert or knowledge engineer can also use logic as a query language to monitor the inferential behaviour of such a system.
[…]

MIN wins Gk[P₁, …, Pₙ ←] if and only if the set of propositions {[P₁, …, Pₙ ←]} ∪ k is unsatisfiable; otherwise MIN loses. MIN and MAX can be generalized to games on any clause in L of the general form [P₁, …, Pₙ ← Q₁, …, Qₘ], where n, m > 0, in the following way: MIN wins Gk[P₁, …, Pₙ ← Q₁, …, Qₘ] if and only if MAX wins Gk[← Q₁, …, Qₘ] with substitutions s and MIN wins Gk[P₁, …, Pₙ ←]s, where [ … ]s denotes the result of applying s to [ … ].

If we allow that MAX wins its subgame when the conditions are empty, and MIN wins its subgame when the conclusions are empty, we can generalize the above to cases when m, n ≥ 0. Let the empty conclusion be called reject and the empty condition be called accept. Thus MAX wins Gk[reject ← P₁, …, Pₙ] if MAX wins Gk[← P₁, …, Pₙ], and MIN wins Gk[P₁, …, Pₙ ← accept] if MIN wins Gk[P₁, …, Pₙ ←]. This allows for the evaluation of facts and constraints in the obvious way. A win for MAX is always a loss for MIN, and vice versa. This is because there is no concept of a draw at the level of MAX and MIN. If MAX fails to generate the empty clause during Gk[← Q₁, …, Qₘ] then MAX has
lost and MIN has won, while if MIN fails to generate the empty clause during Gk[P₁, …, Pₙ ←] then MIN has lost and MAX has won. Both players have equal and total access to k. We are therefore dealing with a two-person, zero-sum, perfect-information game. MAX and MIN can be considered as functions from both the set of sentences of L and the set of all model sets, 2^M, to the set {win, lose}. Their semantics is as follows. MAX winning Gk[← P₁, …, Pₙ] denotes that the attempt to embed the proposition in the model set K has failed, while MIN winning Gk[P₁, …, Pₙ ←] denotes a similar failure. In other words,
MAX wins Gk[← P₁, …, Pₙ] iff [← P₁, …, Pₙ]m ∩ K = { }
MIN wins Gk[P₁, …, Pₙ ←] iff [P₁, …, Pₙ ←]m ∩ K = { }
where Pm denotes the model set associated with a proposition, i.e. all those states of affairs that constitute a model of P.

When does the interpreter use which strategy? As in earlier work, it seems reasonable to assume that the interpreter should be critical of assertions presented to it, especially if they are proposed with a view to incorporating them into the knowledge base. Adopting the MIN strategy will accomplish this, and act as a guard against inconsistency. The user plays MAX, since he intends that the knowledge-base interpreter accept the assertion, and will want to know any grounds for rejection. On the other hand, the interpreter should be helpful when presented with a query, in that it should look for values for which the query can be satisfied. Adopting the MAX strategy will accomplish this, and return such values if they are to be found. The user will then be in the MIN role, in that he may adopt a critical attitude towards solutions, and demand alternatives.

Nothing has been said so far about the interface language that the knowledge engineer uses to make assertions and pose questions. Let this language be the set of sentence types that can be formed by prefixing a mood symbol to any formula of the first-order calculus, expressed not in the clausal form but in an unambiguous variety of the full notation, which makes clear the scope of quantifiers and the precedence of connectives. Having the full notation will facilitate the introduction of modal operators in Section 3, by resolving any ambiguities that might result with respect to the relative scopes of modal operators and quantifiers. A meta-interpreter for the interface language is described in Section 2.3. Mood symbols denote the mood of the sentence in which a token of some formula occurs. Let > denote the declarative mood and ? denote the interrogative mood.
The assertion > P will be considered as an illocutionary act on the part of the user, with the desired perlocutionary effect that the interpreter accept P, i.e. incorporate it into the knowledge base, while the
query ?P will be considered as a request that the interpreter inform the user whether or not the knowledge base logically implies P. Any expression understood by the knowledge-base interpreter will be a token of one of these sentence types. We are then interested in the patterns of acceptance and rejection exhibited by the interpreter to these sentence tokens, and not in the truth or falsity of either the sentence types or their propositional contents. The effect of a sentence token upon the behaviour of the system will be a function of both the mood of the sentence and the valuation of its propositional content, the latter being the clausal form of the wff occurring in the sentence. It is in the interests of clarity to keep these two aspects of the system (propositional evaluation and perlocutionary effect) quite separate, apart from this functional relationship. The former is the function of the interpreter; the latter will be assigned to a separate entity, called the "executive". The executive handles such things as input-output and manipulations of the knowledge base; in short, all that is "side effect", as opposed to evaluation. Since we are not interested in the details of how the executive is implemented, let us characterize it by a function E:

E(>, [P₁, …, Pₙ ← Q₁, …, Qₘ], k) =
  k if MIN wins Gk[P₁, …, Pₙ ← Q₁, …, Qₘ]
  k ∪ {[P₁, …, Pₙ ← Q₁, …, Qₘ]} otherwise
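To make the games concrete, here is a toy propositional rendering of the executive (the clause encoding, the names and the brute-force satisfiability test are our own illustrative assumptions, not the chapter's program; the query branch anticipates the four responses defined in Section 2.2):

```python
from itertools import product

# Toy encoding: a clause [P1,...,Pn <- Q1,...,Qm] is a pair (heads, bodies)
# of tuples of proposition names; a knowledge base k is a set of clauses.

def atoms(clauses):
    return sorted({a for hs, bs in clauses for a in hs + bs})

def satisfiable(clauses):
    """Brute force: some valuation makes every clause true, where a clause
    (heads <- bodies) is true iff some head is true or some body is false."""
    props = atoms(clauses)
    for vals in product([False, True], repeat=len(props)):
        v = dict(zip(props, vals))
        if all(any(v[h] for h in hs) or not all(v[b] for b in bs)
               for hs, bs in clauses):
            return True
    return False

def min_wins(k, clause):
    """MIN wins the game on a clause iff adding it to k is unsatisfiable."""
    return not satisfiable(k | {clause})

def E(mood, clause, k):
    """'>' assertion: keep k if MIN refutes the clause, else add it.
    '?' query on a goal clause ((), goals): play both games."""
    if mood == ">":
        return k if min_wins(k, clause) else k | {clause}
    _, goals = clause
    provable = min_wins(k, ((), tuple(goals)))   # MAX wins Gk[<- goals]
    refutable = min_wins(k, (tuple(goals), ()))  # MIN wins Gk[goals <-]
    return {(True, False): "yes", (False, True): "no",
            (False, False): "unknown", (True, True): "inconsistent"}[
        (provable, refutable)]
```

For example, with k containing only the fact p ←, the query ?p is answered "yes", ?q "unknown", and an assertion contradicting p is rejected, leaving k unchanged.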
2.2 Queries and quantifiers
At first glance, it might appear that queries of the general form ?[P₁, …, Pₙ ← Q₁, …, Qₘ] could be answered using this mechanism, with E answering "yes" if MAX wins Gk[P₁, …, Pₙ ← Q₁, …, Qₘ], and "no" otherwise. However, a little thought will show that this is not the case. A reply of "yes" would simply mean "yes, the proposition can be embedded in the model set", or "yes, some model in the model set is a model of the proposition", and not "yes, every model in the model set is a model of the proposition", which is what is required. Let us begin by looking at "PROLOG-style" conjunctions of goals of the form ?[G₁, …, Gₙ], with the interpretation "Does k logically imply G₁ & … & Gₙ (for some values of the variables in the Gᵢ)?" To answer this question in the affirmative, it is sufficient to show that MAX wins Gk[← G₁, …, Gₙ], i.e. to show that {[← G₁, …, Gₙ]} ∪ k does not have a model. However, under such circumstances, a negative answer would be ambiguous between "it is not the case that k ⊢ G₁ & … & Gₙ" and "k ⊢ ∼(G₁ & … & Gₙ)". The confounding of "not provable" and "demonstrably not" implicit in the naive treatment is unacceptable in the present context.
One way of dealing with this problem would be to respond to the query ?P by attempting to show that k logically implied either P, ∼P, neither or both, by playing two games: Gk[← P] and Gk[P ←]. MAX wins Gk[← P] if and only if k ⊢ P, while MIN wins Gk[P ←] if and only if k ⊢ ∼P. The logic of propositional contents then remains two-valued, in that either MIN or MAX wins. Rather than there being three or four "truth values", e.g. true, false, unknown and inconsistent, there are four possible responses that our executive E can make, depending on how the games turn out:
E(?, [P₁, …, Pₙ], k) =
  yes if MAX wins Gk[← P₁, …, Pₙ] and MAX wins Gk[P₁, …, Pₙ ←]
  no if MIN wins Gk[← P₁, …, Pₙ] and MIN wins Gk[P₁, …, Pₙ ←]
  unknown if MIN wins Gk[← P₁, …, Pₙ] and MAX wins Gk[P₁, …, Pₙ ←]
  inconsistent if MAX wins Gk[← P₁, …, Pₙ] and MIN wins Gk[P₁, …, Pₙ ←]

Having handled the simple case of conjunctions of goals containing only existentially quantified variables, it is not hard to see how a meta-interpreter could be written that would take arbitrary formulae of the first-order calculus and construct the corresponding clausal queries and assertions required by the games specified in the function E. For the connectives, the following axiom schemata suffice:
NR: [← P, ∼P]
NL: [P, ∼P ←]
vP: [(P ∨ Q) ← P]
vQ: [(P ∨ Q) ← Q]
vE: [P, Q ← (P ∨ Q)]
&E: [(P & Q) ← P, Q]
&P: [P ← (P & Q)]
&Q: [Q ← (P & Q)]
CP: [(P ⊃ Q), P ←]
CQ: [(P ⊃ Q) ← Q]
CE: [Q ← (P ⊃ Q), P]
Such rules are similar to those for manipulating semantic tableaux, which are related, in their turn, to the sequent calculus.
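Each schema above is a logical axiom, so every propositional instance should be valid. The following check (our own illustration, reading a clause [H₁, …, Hₖ ← B₁, …, Bₘ] as the disjunction H₁ ∨ … ∨ Hₖ ∨ ∼B₁ ∨ … ∨ ∼Bₘ) confirms this by truth tables:

```python
from itertools import product

def ev(f, v):
    """Evaluate a formula (a string atom or an operator tuple) under v."""
    if isinstance(f, str):
        return v[f]
    op, *args = f
    if op == "~": return not ev(args[0], v)
    if op == "v": return ev(args[0], v) or ev(args[1], v)
    if op == "&": return ev(args[0], v) and ev(args[1], v)
    if op == ">": return (not ev(args[0], v)) or ev(args[1], v)  # P > Q is P ⊃ Q

def tautology(heads, bodies):
    """The clause is true under every valuation of its atoms P and Q."""
    for p, q in product([False, True], repeat=2):
        v = {"P": p, "Q": q}
        if not (any(ev(h, v) for h in heads)
                or any(not ev(b, v) for b in bodies)):
            return False
    return True

SCHEMATA = {
    "NR": ((), ("P", ("~", "P"))),
    "NL": (("P", ("~", "P")), ()),
    "vP": ((("v", "P", "Q"),), ("P",)),
    "vQ": ((("v", "P", "Q"),), ("Q",)),
    "vE": (("P", "Q"), (("v", "P", "Q"),)),
    "&E": ((("&", "P", "Q"),), ("P", "Q")),
    "&P": (("P",), (("&", "P", "Q"),)),
    "&Q": (("Q",), (("&", "P", "Q"),)),
    "CP": (((">", "P", "Q"), "P"), ()),
    "CQ": (((">", "P", "Q"),), ("Q",)),
    "CE": (("Q",), ((">", "P", "Q"), "P")),
}
```

All eleven schemata pass the check, while an arbitrary non-valid clause such as [P ←] does not.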
Handling the quantifiers is slightly more complicated:
Emax: MAX wins [← (∃x)(…x…)] if MAX wins [← (…y…)]
Umax: MAX wins [← (∀x)(…x…)] if MAX wins [← (…f…)]
Emin: MIN wins [(∃x)(…x…) ←] if MIN wins [(…f…) ←]
Umin: MIN wins [(∀x)(…x…) ←] if MIN wins [(…y…) ←]
where x is the bound variable of quantification, y is a new variable uniformly substituted for x in the matrix, and f is a Skolem function of all the free variables in (…x…). Processing arbitrary queries of the form ?P, where P is a wff of the interface language, now reduces to playing the games Gk[← P] and Gk[P ←], as specified in the executive function E. The meta-interpreter will see to it that the correct clausal games are played. The main advantage is that the logic of propositions is still two-valued, since one and only one of MAX and MIN will win propositional games. Queries are important, because they give the expert and the knowledge engineer the ability to check that the representation of knowledge permits the drawing of just those inferences that they would sanction, and allows them to exercise their right to accept or reject the answers that they receive. This is consonant with the aim of providing a more symmetric interaction between the knowledge base and the knowledge engineer. It is also in harmony with the somewhat pragmatic view of truth and consistency espoused in this chapter.
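The substitution machinery these rules require can be sketched as follows. The encoding (nested tuples, lower-case strings for variables, fresh "sk" function symbols for Skolem terms) is an assumption of ours; the eliminate function implements the assertion-side rules Emin and Umin, with no capture checking:

```python
import itertools

_counter = itertools.count(1)

def free_vars(f):
    """Free variables of a formula or term; variables are lower-case strings,
    predicate/function symbols occupy the head position of a tuple."""
    if isinstance(f, str):
        return {f} if f.islower() else set()
    op, *rest = f
    if op in ("forall", "exists"):
        x, body = rest
        return free_vars(body) - {x}
    return set().union(set(), *map(free_vars, rest))

def subst(f, x, t):
    """Uniformly substitute term t for free occurrences of variable x."""
    if isinstance(f, str):
        return t if f == x else f
    op, *rest = f
    if op in ("forall", "exists"):
        y, body = rest
        return f if y == x else (op, y, subst(body, x, t))
    return (op, *(subst(a, x, t) for a in rest))

def eliminate(f):
    """Strip the outermost quantifier of an asserted formula: an existential
    bound variable becomes a Skolem function of the free variables (Emin),
    a universal one becomes a new variable (Umin)."""
    op, x, body = f
    if op == "exists":
        t = ("sk%d" % next(_counter),) + tuple(sorted(free_vars(f)))
        return subst(body, x, t)
    return subst(body, x, "y%d" % next(_counter))
```

Eliminating the quantifiers of ∀x∃y Q(x, y) in assertion position yields Q(y1, sk2(y1)), the Skolem term recording its dependence on the universally introduced variable.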
3 MODAL EXTENSIONS TO THE META-INTERPRETER
The incremental approach to knowledge-base construction described herein requires that a knowledge-base interpreter reason about its own beliefs. It therefore seems natural to extend the interface language and the underlying theorem prover to accommodate doxastic operators. The idea is that introducing modal operators to the meta-language will help us to reason about the underlying model set directly, as a heuristic device in the generation of object-level proofs. Section 3.1 outlines a game-theoretic interpretation of first-order modal logic that is suitable for all normal systems where the accessibility relation is serial and the Barcan formula holds. Section 3.2 examines a particular doxastic logic, KD45, and considers how one might be able to construct autodoxastic theories incrementally and on demand, using a modified version of this proof procedure. Section 3.3 illustrates how a language extended in
this direction might facilitate the specification of domain-specific game-playing heuristics.
3.1 Game rules for modal logics
The correspondence between modal treatments of doxastic logic and the underlying semantics of model sets derives from Hintikka (1962). The models in a model set can be viewed as the set of possible worlds W accessible from an agent's representation of the real world, i.e. conceivable in terms of what he believes in the real world. The valuation V assigned to a proposition P, V(P) ∈ 2^W, is now no longer truth or falsity but the set of worlds in which P is true as far as the agent knows. However, it is not the case that all worlds are accessible from (i.e. conceivable in) all other worlds. A modal model is therefore a triple (W, R, V), where R

[…]

M3: [enough-fuel(x) ← …, squadron-leader(y, s), M(enough-fuel(y))]
Thus any member of a squadron is assumed to have enough fuel if it is consistent to assume that his squadron leader has enough fuel. Rules like M3 could be used to define "prototypical individuals", which can be substituted for variables of quantification as quantifiers are eliminated. This is more knowledge-rich than the domain-free method of substituting arbitrary constants or standardized variables, and could be used to encode heuristic information of a common-sense nature, as in the case shown above. It is also more reliable, and easier to modify, than the heuristics described in earlier work. In addition to making the specification of game-playing strategies easier, the presence of modal operators obviously enriches the representation language. Thus one can distinguish between hostile aircraft, aircraft known to be hostile, aircraft not known to be hostile, aircraft known not to be hostile, aircraft not known not to be hostile, and so on. This allows the writing of heuristic rules which rely on such distinctions; for example, in times of war, unidentified intruders on airspace might be taken as hostile in the absence of contradictory information, whereas in peacetime we might insist that there be positive reasons for believing that an intruder is hostile before an attack is authorized:

M4: [attack(x) ← penetrate(x, airspace), status(war), M(hostile(x))]
M5: [attack(x) ← penetrate(x, airspace), status(peace), L(hostile(x))]
Clearly there are still decisions to be made concerning the control regime in the meta-interpreter. The above assumes that clauses in the meta-interpreter are hand-ordered for effect, rather as clauses are ordered in PROLOG programs, with proper axioms listed before logical axioms. The same is not true, however, of clauses in the object-level knowledge base, which form a set. The idea is to use control knowledge to eliminate backtracking in the meta-interpreter as far as possible, with proper axioms pre-empting the application of logical axioms, unless the user initiates backtracking by rejecting a reply. The rationale behind this whole approach is that implicit control knowledge (such as that represented by clause ordering) should be confined to the meta-theory, and should not contaminate the object-level theory of the domain (cf. Clancey and Bock, 1986, Section 1). If this principle is observed then the object-level knowledge base remains capable of being run under different control regimes. For example, the knowledge engineer might like to experiment with different orderings of default rules.
Also on the subject of control, it might make sense to use some form of limited inference when evaluating modalized literals, such as restricting the evaluation of the P in MP or LP to unit resolution at the object level, thereby trading completeness for further efficiency. Consider Moore's example, "If I had a brother, then I would know about it", along the lines of
[L(brother(x, Peter)) ← brother(x, Peter)]
The spirit of this rule is that knowledge of any brothers I may have is immediate, rather than being available via repeated application of modus ponens.
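For the propositional case, this consistency-based reading of M and L can be sketched by enumerating the models of k (an illustrative toy of ours, not the implemented theorem-prover; clauses are (heads, bodies) pairs of proposition tuples):

```python
from itertools import product

def models(props, k):
    """Yield every valuation of props satisfying all clauses (heads, bodies):
    a clause holds iff some head is true or some body member is false."""
    for vals in product([False, True], repeat=len(props)):
        v = dict(zip(props, vals))
        if all(any(v[h] for h in hs) or not all(v[b] for b in bs)
               for hs, bs in k):
            yield v

def M(p, props, k):
    """M(p): p is consistent with k, i.e. holds in some model of k."""
    return any(v[p] for v in models(props, k))

def L(p, props, k):
    """L(p): p is believed, i.e. holds in every model of k."""
    return all(v[p] for v in models(props, k))
```

With k containing only the fact intruder, M(hostile) succeeds while L(hostile) fails, so a rule like M4 could fire in wartime but M5 would not in peacetime.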
4 SUMMARY AND STATUS OF THE RESEARCH
This work represents a fresh attempt to provide a game-theoretic foundation for constructive interactions between a user and a knowledge base. It represents an improvement on previous attempts in that:

(1) using the clausal form of logic as the underlying representation allows the employment of well-established theorem-proving techniques for which soundness and completeness results exist;
(2) the clean separation between the evaluation of propositional contents and the perlocutionary effect of sentence tokens allows the retention of a two-valued logic;
(3) the knowledge engineer retains the full expressive power of a first-order language for interacting with the knowledge base, and can examine tableau proofs that are relatively easy to understand;
(4) the specification of game-playing strategies and heuristics is made much easier by the introduction of modal operators to a meta-interpreter written in the representation language itself.
4.1 Status of the present program
The modal proof method described in Jackson and Reichgelt (1987) has recently been implemented, and we propose to construct a meta-interpreter that deploys game-playing strategies of the kind described in Section 3. The theorem-prover allows the user to specify what properties of R can be used by the program in the resolution of indexed literals; the list of such properties is simply the value of a global variable. Thus it is an entirely trivial matter to
replace the reflexivity property by seriality to derive the doxastic logic KD45. Autodoxastic reasoning will be implemented by allowing the meta-interpreter to call the object-level interpreter to perform introspection as illustrated in the game of Section 3.2. This will require some modification to the theorem-prover, since these games are not identical with those in Section 3.1.
4.2 Conclusions and relation to other work
It is worth stressing that game-theoretic semantics is not, in and of itself, a non-standard logic. Rather it is a non-standard interpretation of logic that, unlike many other interpretations, appears to acknowledge from the outset the finitude of resources available for constructing proofs, and demands that game-playing strategies be supplied. As such, it can be applied to either standard or non-standard logics, as this chapter has tried to demonstrate. In particular, having a game-theoretic semantics does not mean that a logical language is necessarily non-monotonic. In a non-monotonic logic, new axioms can invalidate old axioms, and a theory can contain inconsistent subtheories. This is because inferences drawn on the basis of incomplete information may have to be retracted in the light of fresh evidence. In the game-theoretic approach to knowledge-base construction outlined above, it is not generally the case that the interpreter allows a theory to be extended on the basis of default rules. Rather, in the face of an assertion of the form >P, it adopts a certain strategy in trying to refute P, and accepts P if this attempt fails. Rules like [LP ← P] serve to restrict this process and provide short cuts to discovering inconsistency. Similarly, in the face of a query of the form ?P, a particular strategy is adopted in trying to construct an answer, but, even if P is concluded on the basis of a default rule such as [P ← MP], this does not mean that P will be added to the knowledge base. All it means is that the interpreter is allowed to go on affirming P in response to queries until such time as k ⊢ ∼P. Meta-rules containing L and M (of the kind exhibited in the last section) perform one of two basic functions. Rules of the general form
[P₁, …, L(F(…)), …, Pₙ ← Q₁, …, Qₘ]

serve to "restrict" the predicate F in a particular context, while rules of the form

[P₁, …, F(…), …, Pₙ ← Q₁, …, M(F(…)), …, Qₘ]

serve to "relax" the predicate F in a particular context. However, restriction rules do not have a non-monotonic interpretation in
the system described above. A non-monotonic interpretation presumably allows a predicate to be "opened" again, once it has been "closed". To return to Moore's "brother" example, non-monotonicity tolerates a scene in which one really can have a long-lost brother. Relaxation rules specify short-cut strategies for drawing inferences on demand, in order to answer questions on the basis of the available evidence. New evidence may cause the default rule to return a different reply to a future query on the basis of an extended theory. This seems to be another advantage of separating the executive from the interpreter, as described in Section 2. A comparison with Reiter's (1980) system is useful here. The examples of default rules of the form Q : MP / P given in his paper are relaxation rules, according to the present terminology, since they can be rewritten in the form [P ← MP, Q], while rules of the form Q : M∼P / ∼P are restriction rules, since they can be rewritten in the form [LP ← P, Q]. The main difference between the two systems is the way in which such rules are used. The present system uses relaxation rules to answer queries and restriction rules to monitor assertions, whereas Reiter's system uses both kinds of rule to extend a theory. In the case of relaxation rules, success in using them never extends the theory, while success in using restriction rules always prevents the theory from being extended. Thus the modal approach described herein relegates such rules to the meta-level of a theorem-prover for a standard logic (albeit with a non-standard semantics), while Reiter actually constructs a non-standard logic of defaults. In summary, game-theoretic semantics can be seen as a tool for building logical languages and interpretation schemes, rather than specifying a particular language or scheme.
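The correspondence with Reiter's normal defaults can be made mechanical. The sketch below (our own, with an assumed tuple encoding: ("~", p) for negation and clauses as (heads, bodies) pairs) rewrites Q : MP / P as a relaxation rule and Q : M∼P / ∼P as a restriction rule:

```python
def rewrite_default(q, just, cons):
    """Rewrite a normal Reiter default  q : M just / cons  (just == cons).
    A negated consequent ("~", p) yields the restriction rule [L p <- p, q];
    otherwise the default yields the relaxation rule [p <- M p, q]."""
    assert just == cons, "only normal defaults are considered here"
    if isinstance(cons, tuple) and cons[0] == "~":
        p = cons[1]
        return ((("L", p),), (p, q))        # restriction: [L p <- p, q]
    return ((cons,), (("M", cons), q))      # relaxation: [p <- M p, q]
```

Used on Moore's brother default, Q : M∼brother / ∼brother would come back as the restriction rule [L(brother) ← brother, Q], matching the clause form exhibited earlier.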
The aim of this chapter has been to demonstrate its usefulness in the context of knowledge-base management, and explore some of the different ways in which game-theoretic interpreters can be implemented for different logics. A proof method for modal extensions to the predicate calculus has been outlined in terms of game rules, and examples have been given as to how one might use this more expressive language to specify autodoxastic strategies for the heuristic control of language games.
ACKNOWLEDGMENTS

I should like to thank Han Reichgelt for many useful discussions on the subject of modal logic, and for comments on earlier drafts. I am also grateful to Frank van Harmelen, George Kiss, Marco Colombetti and Jean-Philippe Solvay for their helpful comments on papers and programs associated with
this work over the last two years. Special thanks are due to Paul Gochet, for his detailed criticisms of an earlier version of this chapter.
BIBLIOGRAPHY

Austin, J. L. (1962). How To Do Things With Words. Harvard University Press, Cambridge, Mass. (Austin's classic lectures on performatives, etc.)
Chellas, B. F. (1980). Modal Logic: An Introduction. Cambridge University Press. (A thorough introduction to the main systems and results of modal propositional logic.)
Clancey, W. J. and Bock, C. (1986). Representing control knowledge as abstract tasks and metarules. Memo KSL 85-16, Knowledge Systems Laboratory, Stanford University. (An interesting account of both the potential and the perils of representing control knowledge in a logical language.)
Davis, R. (1976). Applications of meta-level knowledge to the construction, maintenance and use of large knowledge bases. Knowledge Based Systems in Artificial Intelligence (ed. R. Davis and D. Lenat). McGraw-Hill, New York.
Ellis, B. (1979). Rational Belief Systems. Blackwell, Oxford. (An intriguing and controversial discussion of the principles of rationality and their relation to logic.)
Hintikka, J. (1955). Form and content in quantification theory. Acta Phil. Fennica 8, 7-55. (This paper contains a lucid exposition of model sets.)
Hintikka, J. (1962). Knowledge and Belief. Cornell University Press, Ithaca, NY. (A model-theoretic approach to knowledge and belief.)
Hintikka, J. (1973). Logic, Language Games and Information. Oxford University Press. (The most accessible text on game-theoretic semantics.)
Hintikka, J. (1983). The Game of Language. Reidel, Dordrecht. (A compendium of papers on game-theoretic semantics. This is not a very good place to start. However, it does contain some interesting applications.)
Jackson, P. (1987a). A representation language based on a game-theoretic interpretation of logic. PhD thesis, Computer Based Learning Unit, University of Leeds. (A more exhaustive (and exhausting) account of the game-theoretic approach to knowledge-base maintenance described in this paper.)
Jackson, P. (1987b). Towards an architecture for advice-giving systems. Current Issues in Expert Systems (ed. P. Dufour and A. van Lamsweerde). Academic Press, London. (This paper is essentially Chapters 3 and 4 of Jackson (1987a), as far as game-theoretic semantics is concerned.)
Jackson, P. and Reichgelt, H. (1987). A general proof method for first-order modal logics. Proc. 10th Int. Joint Conf. on Artificial Intelligence, Milan, pp. 942-944. Morgan Kaufmann, Los Altos, California. (A paper that gives the details of the indexical proof method mentioned in Section 3. Alas, there is a small mistake in the unification algorithm presented therein (which I have corrected here). In accordance with Murphy's Law, this was discovered the day after the paper was sent off!)
Levesque, H. (1984). Foundations of a functional approach to knowledge representation. Artificial Intelligence 23, 155-212. (This paper is an attempt to examine some formal properties of knowledge bases at a functional level of abstraction that rises above implementation detail.)
Lewis, D. (1972). General semantics. Semantics of Natural Language (ed. D. Davidson and G. Harman), pp. 169-218. Reidel, Dordrecht. (A wide-ranging paper that touches upon indexical semantics and the representation of performatives, amongst other things.)
Moore, R. C. (1985). Semantical considerations on nonmonotonic logic. Artificial Intelligence 25, 75-94. (A significant paper that sheds light on some of the problems associated with earlier formulations of non-monotonic reasoning.)
Nguyen, T. A., Perkins, W. A., Laffey, T. J. and Pecora, D. (1985). Checking an expert system knowledge base for consistency and completeness. Proc. 9th Int. Joint Conf. on Artificial Intelligence, Los Angeles, pp. 375-378. Morgan Kaufmann, Los Altos, California.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence 13, 81-132. (A thorough account of default logic that provides many interesting results, develops a complete proof theory for a class of common defaults, and shows how this could be implemented in conjunction with a resolution theorem-prover.)
Searle, J. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press. (Contains an analysis of illocutionary acts, such as assertions, questions, requests, congratulations and warnings.)
Suwa, M., Scott, A. C. and Shortliffe, E. H. (1982). An approach to verifying completeness and consistency in a rule-based expert system. The AI Magazine, Fall, pp. 367-374.
DISCUSSION

Paul Gochet: The author states that "the clean separation between the evaluation of propositional contents and the perlocutionary effect of sentence tokens allows the retention of a two-valued logic". I should like to stress the importance and originality of this result by setting it off against the opinion that has prevailed up to now. The shared view on the topic has been forged by two fascinating papers due to the logician N. D. Belnap (Belnap, 1976/7, 1977). Belnap introduces four values, defines three connectives in terms of the latter, and supplies a method for extending the mapping of all atomic formulae into the four values to a mapping of all the formulae (molecular as well as atomic ones) into these truth values. Belnap defines the epistemic states in which a computer can find itself in much the same way as Hintikka. A collection of Carnapian state descriptions is his way of representing incomplete information. This is congenial to Jackson's use of model sets to represent incomplete knowledge. At a later stage of the second paper, Belnap takes advantage of Dana Scott's theory of approximation lattices and points out that his four truth values make up an approximation lattice for which it is possible to define meet and join, and he states that "... the value of a sentence in an epistemic state is to be determined by taking the meet of all its values in the separate states ..." (Belnap, 1977). Thanks to Belnap's sophisticated apparatus, it is possible to compute the determinate truth value of P ∨ Q out of the indeterminate value of P and the indeterminate value of Q, a computation that clearly results in an increase in the amount of information. The following example will make that clear. Suppose that the computer is given incomplete information with respect to P and also incomplete information with respect to Q. In one model set, P is ascribed the truth value True and Q the truth value Neither True nor False.
In the other model set, P is ascribed the truth value Neither True nor False and Q is ascribed the truth value True. Hence, when confronted with the queries ?P and ?Q, the computer cannot give an answer. Belnap says: "we should mark A with True in E (the model set) if it is marked True in
1 Game-Theoretic Interactions
all set-ups (in all models) in E, and mark it with False if it is marked False in all set-ups in E" (Belnap, 1977). Jackson adopts the same policy: he makes it a condition for a proposition being true that "every model in the model set is a model of the proposition." I have set the stage. Let me now examine what happens if the computer is asked ?(P ∨ Q). Applying Belnap's insight and taking the meet, the computer can answer "Yes". The question arises as to whether Belnap's insight can be captured within the framework of two-valued epistemic logic supplied by Jackson. N. Tennant (1987) has recently inquired into the novelty and independence of game-theoretic semantics from Tarski's standard semantics. A brief comparison with Jackson's exploitation of game theory can help see the originality of the latter's work. Several features distinguish Hintikka's game-theoretic semantics from Tarski's semantics. The most apparent is the fact that Hintikka's recursion clauses operate from the outside in, instead of operating from the inside out, like Tarski's. This feature, which should appeal to the computer scientist and the linguist, is not, however, the most important. For Tennant, the originality of game-theoretic semantics lies in its providing us with a rigorous account of the activities of seeking and finding. That point had already been made by Hintikka. What is new in Tennant's account is the claim that these two activities are misplaced when their purported objects are "constructively inaccessible existents" (Tennant, 1987, p. 175). In other words, game-theoretic semantics delivers a full account of seeking and finding if and only if we impose upon it an additional constraint, i.e. if we require that the strategies in our game be effective in the broad sense. This requirement, however, has a negative side. We have to pay something for it: it may happen that neither player possesses an effective strategy. By catering for this possibility, we lose bivalence.
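Belnap's policy as Gochet summarizes it (four-valued evaluation inside each epistemic state, then the approximation-lattice meet across states) can be made concrete in a few lines. The sketch below is my own rendering, not Belnap's formalism: values are represented as subsets of {T, F}, with the empty set for Neither and {T, F} for Both, and the meet is taken as set intersection.

```python
# Four-valued (Belnap) truth values as subsets of {'T', 'F'}:
# {} -> Neither, {'T'} -> True, {'F'} -> False, {'T','F'} -> Both.
NEITHER, TRUE, FALSE = frozenset(), frozenset({'T'}), frozenset({'F'})

def or4(x, y):
    """Belnap disjunction: told-true if either disjunct is told-true;
    told-false only if both disjuncts are told-false."""
    v = set()
    if 'T' in x or 'T' in y:
        v.add('T')
    if 'F' in x and 'F' in y:
        v.add('F')
    return frozenset(v)

def meet(values):
    """Meet in the approximation lattice: keep only what every
    epistemic state agrees on (set intersection)."""
    out = frozenset({'T', 'F'})
    for v in values:
        out &= v
    return out

# Gochet's example: two model sets with incomplete information.
state1 = {'P': TRUE,    'Q': NEITHER}
state2 = {'P': NEITHER, 'Q': TRUE}
states = [state1, state2]

# The queries ?P and ?Q get no answer ...
assert meet(s['P'] for s in states) == NEITHER
assert meet(s['Q'] for s in states) == NEITHER
# ... yet ?(P v Q) is answered "Yes": the meet is determinate.
assert meet(or4(s['P'], s['Q']) for s in states) == TRUE
```

Run on the example, ?P and ?Q come out indeterminate while ?(P ∨ Q) comes out True, which is exactly the gain in information Gochet describes.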
This helps us to appreciate the attractiveness of Jackson's account. That account succeeds in capturing the specificity of the activities of seeking and finding without sacrificing bivalence. Jackson achieves this result by putting to a new use the Austinian distinction between locution and perlocution. Jackson's logic of propositions remains two-valued: either MAX or MIN will win propositional games. On the other hand, the case of an unsuccessful query, which occurs when the knowledge base is incomplete or inconsistent, is not ignored but accounted for at another level, i.e. at the level of the definition of the function EXECUTIVE. The latter function ranges over four values (Yes, No, Unknown, Inconsistent), which are not truth values of the proposition (locutionary act) but which are possible responses to be seen as effects (perlocutionary act) triggered by the propositional component of the question (illocutionary act). Jackson fully clarifies the interplay between quantifiers and epistemic operators. This is, unquestionably, a major achievement. When a problem has been solved, as is the case here, the question to raise is "What should come next?" I guess that at this stage it would be appropriate to come to grips with the logic of identity and the logic of causality, and to see how a first-order epistemic logic could be broadened to cover identity and causality. As Ch. Cherniak (1986) says, "We must distinguish between an agent's merely acting as if he had a particular belief and his actually having that belief. Although the logically deviant creature will act appropriately for the putative belief that c = d and a = b, as well as for the belief that a = b and c = d, only the latter will be its reason for these actions; that is, it will only use the belief that a = b and c = d as a basis for selecting desirable actions." The trouble is that the attempts to extend
resolution methods to the logic of identity have not been very successful up to now. I should like to thank Dr M. McRobbie for a useful conversation on this topic.

L. Farinas del Cerro: Usually the notion of derivability in logic is based on a set of axioms and rules; another method has been proposed by Hintikka and Lorenzen. Hintikka uses the notion of a two-player game, as proposed by Jackson, and Lorenzen (1959) proposes the notion of a two-person dialogue. These two proposals are very close; however, the philosophical motivations are very different. A logic in the Hintikka and Lorenzen approach is defined by a set of rules steering the game or the dialogue. The rules are certainly conventions; hence the main problem is what conventions are the most natural. Jackson gives conventions that define classical logic. In Lorenzen's work it appears that conventions for the intuitionistic calculus look more natural than conventions for the classical calculus. The same idea can be found in Gabbay's work, where the classical calculus is obtained by adding a new rule to the intuitionistic rules (Gabbay, 1985). An important point in Jackson's work is the possibility of reaching modal logics in the context of the game approach. However, it doesn't seem clear to me how the axioms of a particular modal system are captured in his approach, because the properties of the accessibility relation don't appear explicitly. Lorenzen defines dialogue rules able to formalize some modal logics. I think that this will be a natural way to extend Jackson's work.
Robert C. Moore: It is not immediately obvious on reading Peter Jackson's paper exactly what he is trying to achieve. At a superficial level, things are more or less clear. The problem being addressed is how to support a knowledge engineer building up a knowledge base about a domain, by providing him with tools for checking the consistency of what he has entered into the knowledge base and for checking whether desired conclusions are entailed by what he has entered into the knowledge base. What is not clear is precisely what a game-theoretic approach has to offer over more conventional solutions. The conventional way to approach the manifest problem would be to use a resource-bounded theorem-prover. When an assertion is presented to the knowledge base for acceptance, the theorem-prover is run to prove the negation of the new assertion. If the theorem-prover succeeds in proving the negation then the assertion is inconsistent with the knowledge base and is not accepted. If the theorem-prover runs out of ways to prove the negation without finding a proof then the original assertion has been shown to be consistent with the knowledge base (provided that the theorem-prover is logically complete) and the assertion is accepted. If the theorem-prover runs out of resources while trying to prove the negation then the original assertion is provisionally accepted, although it has not been proved to be consistent. Since the consistency of complex theories is usually undecidable, however, this will normally be the most we can hope for. Checking entailment is just the dual of this process. To see whether a desired conclusion follows from the knowledge base, simply run the theorem-prover until the desired conclusion has been proved, or the theorem-prover runs out of ways to try to prove the conclusion, or the theorem-prover runs out of resources. If the conclusion is proved then all is well.
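The conventional procedure Moore describes can be written down directly. Below is a toy propositional rendering (the function names and the truth-table "prover" are mine; a real system would use a resolution prover with a genuine resource bound): an assertion is accepted unless it is provably inconsistent with the knowledge base, and entailment checking is the dual.

```python
from itertools import product

def satisfiable(formulas, atoms, budget=1000):
    """Search for a model of all formulas over the given atoms.
    Returns 'sat', 'unsat', or 'out-of-resources'."""
    tried = 0
    for bits in product([False, True], repeat=len(atoms)):
        if tried >= budget:
            return 'out-of-resources'
        tried += 1
        v = dict(zip(atoms, bits))
        if all(f(v) for f in formulas):
            return 'sat'
    return 'unsat'          # exhausted every candidate model

def accept_assertion(kb, new, atoms):
    """Moore's update check: accept `new` unless it is provably
    inconsistent with the knowledge base."""
    r = satisfiable(kb + [new], atoms)
    if r == 'unsat':
        return 'rejected: inconsistent'
    return 'accepted' + (' (provisionally)' if r == 'out-of-resources' else '')

def entails(kb, goal, atoms):
    """kb entails goal iff kb plus the negation of goal has no model."""
    r = satisfiable(kb + [lambda v, g=goal: not g(v)], atoms)
    return {'unsat': 'follows', 'sat': 'does not follow'}.get(r, 'unknown')

kb = [lambda v: v['A'], lambda v: (not v['A']) or v['B']]   # {A, A -> B}
assert accept_assertion(kb, lambda v: not v['B'], ['A', 'B']) \
       == 'rejected: inconsistent'
assert entails(kb, lambda v: v['B'], ['A', 'B']) == 'follows'
```

The three-way outcome (proved, refuted by exhaustion, or out of resources) is exactly the trichotomy the discussion turns on.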
If the theorem-prover runs out of ways to prove the conclusion then it definitely does not follow from what is already in the knowledge base, and the knowledge base must be strengthened if the user wants the conclusion to follow. If the
theorem-prover runs out of resources then the provisional assumption would be that the conclusion does not follow. Again, because of the limits of decidability, this is usually the best one can hope for if the conclusion really does not follow. This is such an obvious approach to take that it would hardly be worth writing a paper about it. What we should like Jackson to tell us is why a game-theoretic approach might be better than this. I suspect that the answer Jackson would give might be something like this: general-purpose theorem proving is well known to be computationally very expensive; some would say computationally infeasible. The kind of logical games that Jackson discusses, however, if taken seriously as games, might be played by the computer much faster than the time needed to get reasonable results by running a general-purpose theorem-prover, even with a resource bound. The game associated with a particular formula in Hintikka's approach has at most as many moves in it as the maximum depth of the formula, so if the computation needed to do move selection can be reasonably bounded, one could hope to obtain quick procedures for checking consistency and entailment. I see two difficulties with this answer: one conceptual and the other practical. The conceptual difficulty is that the kind of games Hintikka introduced were semantic games, whereas what is needed for the kind of enterprise Jackson is suggesting are proof-theoretic games. Hintikka's games are concerned with truth in a model, while Jackson is concerned with consistency and entailment with respect to a knowledge base, i.e. a theory. Hintikka's games are relatively straightforward because the structure of models is so much simpler than the structure of theories. For instance, the rule for P ∨ Q is simply that MAX chooses P or chooses Q, since for P ∨ Q to be true, P must be true or Q must be true.
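Over a complete model, Hintikka's semantic game is indeed this straightforward: a formula is true exactly when MAX has a winning strategy, with MAX moving at disjunctions, MIN at conjunctions, and negation swapping the roles. A minimal propositional sketch (my own encoding, not Jackson's rules):

```python
# Formulas: ('atom', name) | ('not', f) | ('or', f, g) | ('and', f, g)
def max_wins(formula, model, role='MAX'):
    """True iff the original MAX player has a winning strategy for the
    semantic game on `formula` in `model` (a set of true atoms)."""
    tag = formula[0]
    if tag == 'atom':
        # MAX wins an atomic game iff the atom is true (roles unswapped).
        return (formula[1] in model) == (role == 'MAX')
    if tag == 'not':                       # negation swaps the roles
        return max_wins(formula[1], model,
                        'MIN' if role == 'MAX' else 'MAX')
    _, f, g = formula
    if (tag == 'or') == (role == 'MAX'):   # the mover picks a disjunct
        return max_wins(f, model, role) or max_wins(g, model, role)
    return max_wins(f, model, role) and max_wins(g, model, role)

model = {'P'}                              # P true, Q false
assert max_wins(('or', ('atom', 'P'), ('atom', 'Q')), model)
assert not max_wins(('and', ('atom', 'P'), ('atom', 'Q')), model)
assert max_wins(('not', ('atom', 'Q')), model)
```

The recursion visits one branch per move, so a played game is linear in formula depth; it is establishing the *existence* of a winning strategy (the `or`/`and` over branches) that costs the full tree search Moore discusses below.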
Jackson gives more complicated rules, apparently to cope with such facts as that P ∨ Q could be entailed by a knowledge base without P being entailed or Q being entailed. The rationale for the particular rules he gives is not very clear, though, and it is difficult to see whether the total set of rules he gives is complete or, if not, what the limits of the rules are. One is left with the question of whether by moving from the domain of semantics to the domain of proof theory one has lost the intuitive motivation and simplicity that made the game-theoretic approach attractive in the first place. The practical question is whether there really could be game-playing strategies that would give useful results and still be computationally tractable, even if the conceptual shift from semantics to proof theory is carried off successfully. After all, Hintikka's methods depend on proving that one player has a winning strategy in a particular game. That generally requires an exhaustive search of the game tree, which in the case of theories over infinite domains is impossible. Jackson's discussion seems to suggest that, rather than proving that a player has a winning strategy, we simply play the game between the system and the user and assume that whichever wins must have had a winning strategy. But does this really make sense? Suppose the user wants to "prove" that a certain universal statement follows from the knowledge base, by playing a game with the knowledge base. Suppose they play Hintikka's game with the user taking the MAX role and the system taking the MIN role. The system would have to choose an instance of the universal statement that it "thinks" is a counter-example. Suppose it makes such a choice and the game continues with MAX eventually winning. Can we conclude that the statement actually does follow from the knowledge base? That depends, as Jackson himself points out, on whether the system plays the game perfectly.
But to play the game perfectly, the system would have to have proved that there was no other better choice of possible
counter-example. It is not clear that this would be any easier than simply proving that the original formula was false. On the other hand, if the system is not a perfect player, then it is hard to see why we should draw any connection between the outcome of a particular game and there being a winning strategy for the game. Jackson's answer to this problem is to propose the use of heuristics to guide the choice of moves in the game. And just as domain-independent heuristics for guiding a conventional theorem-prover seldom work very well, Jackson tried a variety of domain-independent heuristic game-playing strategies and found that "none of them is very satisfactory". Jackson goes on to suggest the use of domain-specific heuristics, but none of the examples he gives seems to depend particularly on the game-theoretic framework he assumes. This brings us back, then, to a refinement of the question we posed initially: what, if anything, makes the game-theoretic framework more attractive than conventional approaches as a basis for reasoning systems that are guided by domain-specific knowledge?

Reply to Gochet: (F.∨) is weaker than disjunction elimination, because it is not the case that k = {P ∨ Q} ⊢ P ∨ Q. MAX does not have a winning strategy in G(P ∨ Q), given k and the extra game rules given above, although, given k = {P} or k = {Q}, MAX does have a winning strategy. Thus, to win an argument on P ∨ Q, MAX must show which of P and Q is true. However, the epistemic extension solves this problem. We can represent the situation "we believe that either P or Q is true" by L(P ∨ Q), i.e. every model in K, the model set of k, contains either P or Q. In the indexical notation, we have [|P|w:0, |Q|w:0 ←] ∈ k. The corresponding query can be put to the theorem-prover as [← L(P ∨ Q)], and the proof is as follows:
1. ← L(P ∨ Q)              A
2. ← |P ∨ Q|t:0            L_MAX, 1
3. ← |P|t:0                ∨P, 2
4. ← |Q|t:0                ∨Q, 2
5. |P|w:0, |Q|w:0 ←        A ∈ k
6. ←                       {t/w}, 3, 4, 5
R need have no special properties for this proof to go through; for example, it would go through in the system K, where R is not reflexive, symmetric or transitive. I am grateful to Professor Gochet for drawing my attention to Tennant's new book. My impression after reading the relevant chapter is that the author is correct in interpreting Hintikka's games of seeking and finding as constructive, and in pointing out that problems concerning decidability, infinite domains and incomplete information make the discovery of effective strategies a dubious enterprise. This confirms my own experience from a computational point of view; such problems motivated the separation of propositional valuation and perlocutionary force in the present work, and led me to characterize the games in terms of acceptability rather than truth. These difficulties also led to the use of meta-level inference for strategy selection. The strategies I propose are really heuristics, i.e. they are definitely weaker than those envisaged by Hintikka. Bear in mind that what Hintikka means by a strategy is "a rule that tells the player in question what to do in each conceivable situation that can come up in a game" (my italics). Thus it is clear that the present formulation takes me some distance from my roots in Hintikka's work. This is perhaps not too surprising; Hintikka was not concerned with the computational problems of game-theoretic semantics viewed as a technique for automated reasoning.
Reply to Farinas del Cerro: The main advantage of the modal proof method outlined is that the issue of accessibility is dealt with by a special unification algorithm. This is more general than having to state the requisite axioms, and more efficient than a first-order axiomatization in which one needs to have recursive (and combinatorially explosive) rules defining the accessibility relation. If one places no additional constraints upon the unification of two indexed literals, then one has a system in which the accessibility relation is one of equivalence, i.e. one has S5. However, by manipulating the properties of the accessibility relation, one can weaken this logic to any normal modal system with a Kripke semantics. (Recent work has removed the restrictions concerning the Barcan formula and the seriality requirement.) I should rather not be drawn into a discussion of classical versus intuitionistic logic, for which I am ill-equipped!

Reply to Moore: If one attempts to program a solution to the problem of knowledge base updates, one soon finds out that the potential for combinatorial explosion in checking the consistency of new assertions is very great indeed. Applying resource bounds to a conventional resolution theorem-prover is both arbitrary and ineffective within reasonable limits. Complete methods for the full clausal form, such as ancestry-filtered form refutation, are combinatorially explosive, even if combined with heuristics such as unit preference, merging, subsumption and tautology elimination. Suppose that one's knowledge base, k, already contains some facts and rules, a number of which share predicates with the new rule r that one wishes to add. Let the average branching factor at each stage of the proof be b, and let s be the size of the rule, i.e. the number of literals it contains. In the happy event that a contradiction can be derived from k ∪ {r} by unit resolution alone, this can take b^s steps in the worst case.
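Taking the worst-case estimates in this reply as b^s for unit resolution and b^(s+(s-2)n) when n non-unit applications of size-s rules are needed (my reading of the exponents), the blow-up is easy to tabulate for illustrative values:

```python
# Worst-case resolution effort: b**s for unit resolution alone;
# b**(s + (s - 2) * n) after n non-unit applications of size-s rules.
b, s = 3, 5                     # illustrative branching factor and rule size
print('unit resolution only:', b ** s)              # 243
for n in (1, 2, 3):
    print(f'{n} non-unit step(s):', b ** (s + (s - 2) * n))
    # n = 1 -> 6561; n = 2 -> 177147; n = 3 -> 4782969
```

Even with these small numbers, the dependence of the exponent on n dominates everything else, which is the point being made.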
However, one may need to perform non-unit resolution, which will introduce more goals that ultimately need to be resolved away. If n such rule applications are required involving rules also of size s, then the worst case lengthens to b^(s+(s-2)n) steps. Needless to say, the factor of n in the exponent is not good news. Thus Moore is correct when he supposes that the interest in game-theoretic approaches derives from the fact that conventional methods are computationally infeasible. Both domain-free and domain-specific heuristics will effect improvements on the worst case, as Moore rightly observes. The question is how to do this in a principled and structured way. Anyone who has looked inside the Boyer-Moore theorem-prover knows that the addition of heuristics requires careful organization if program behaviour is to be understandable, and the effect of adding new heuristics is to be predictable. Transparency is after all an important issue in expert systems. The game-theoretic approach has the advantage that the finiteness of resources is acknowledged at the outset, and one is more or less compelled to supply game-playing strategies that serve as heuristics. As Moore notes, the lure is that the length of the game is linear in the maximum depth of the formula, if the players have good heuristics and a level of risk can be tolerated. Strategies are organized around the connectives and quantifiers, and I had hoped that this structure would serve as a framework for modular and incremental heuristic control. In retrospect, this may seem rather naive, but then hindsight is an exact science. The main problem is that although Hintikka's games on connectives and quantifiers purport to be about seeking and finding, it is hard to find general-purpose strategies for actually playing them (see my reply to Gochet). Now let us address Moore's conceptual and practical difficulties. Although Hintikka's games are essentially semantic, the moment that one
abandons the simplifying assumption of a complete interpretation, one is forced to do real seeking and finding, and so, in the context of knowledge-base management, proof-theoretic games are required. The conceptual link between the two is provided by Hintikka's notion of a model set, which can be viewed both as a theory embodied in a set of sentences and as a partial interpretation that gives rise to more than one model. (Tennant's "background model" serves a similar purpose.) In my earlier work, the semantic games are the G games, such as (G.∨) and (G.&), while the proof-theoretic games are the F games, such as (F.∨) and (F.&). The former are as they are simply by virtue of the semantics of the connectives and operators; the latter say how to proceed with the proof with respect to a given theory once the atomic case has been reached. (F.∨) and (F.∃) are not complete with respect to the standard semantics of classical logic because they insist that a proof be constructive (see my reply to Gochet). The practical question of whether or not one can derive useful game-playing strategies is precisely that addressed by this paper and in previous work. In Jackson (1987b) I show that one can only derive useful domain-independent heuristics for a restricted class of games: those involving formulae which are essentially production rules, consisting of simple conjunctions of conditions and conclusions, with variables ranging over a small, finite domain. Even here, one was forced to tolerate a certain level of risk when making irrevocable choices, since the game outcomes win or lose are not identical with the normal valuations of truth and falsity, as Moore points out. The program described was intended as an aid to a knowledge engineer, not as a theorem-prover. As stated in the earlier paper, it was assumed that the user would play a series of (possibly iterative and recursive) games with the system until he was satisfied with the outcome (cf.
Tennant on game-theoretic arguments). Thus the user could reject partial proofs that he did not find convincing, or ask supplementary questions. The basic idea was to make proof sketches continuously available to the user, and allow him to flesh them out as far as he deemed necessary, rather than imposing arbitrary bounds on the prover, or letting it run overnight. The whole point of the present paper is that domain-free methods are "knowledge-poor". What is really needed is a mechanism for representing and reasoning with domain-dependent heuristics, for example by introducing "prototypical individuals" when playing quantification games. Section 3 presents a way of doing this that avoids the risks of unsoundness in the F and G rules. Although one can still structure heuristics around the connectives and quantifiers, one can also structure them around individual predicates. The amount of context that one supplies in a heuristic serves to limit its application to a particular level of control. Allowing the interpreter to reason about its own beliefs in the application of heuristics has a number of advantages. The main one is that autodoxastic reasoning is not defeasible, as Moore himself has pointed out. In other words, the winner of an autodoxastic game has a winning strategy, since he has total access to his own beliefs (even if his beliefs are incomplete with respect to the world). This is rather better than the situation in my earlier work, where a player could win or lose "unfairly" because either he or his opponent made a bad game decision. The primacy of the implicit heuristic component made it very hard to prove any interesting results about game-theoretic derivability, as I think Moore implies. Such is not the case here.
The object-level theorem-prover is sound and complete with respect to the standard semantics of classical logic, while explicit heuristics in the modal meta-interpreter involve patterns of reasoning that are simply indexical with respect to the current contents of the knowledge base. I have also explained that, although the
responses of the system show evidence of non-monotonic reasoning, theory extension remains monotonic. Non-monotonic theory extension is beyond the scope of this chapter. The main outcome of this research is the specification of a modal meta-interpreter that can reason about control by a genuine process of introspection, but that does not require the construction of an autodoxastic theory. The basic theorem-prover, described in Jackson and Reichgelt (1987), has since been implemented. The proof method employed can be shown to be sound and complete, and experiments conducted so far suggest that it is also computationally tractable. Moore concludes with the question: how much of this is actually due to game-theoretic semantics? I feel that many facets of Hintikka's work (e.g. game-theoretic semantics, model sets, and the possible-worlds approach to knowledge and belief) have contributed to this research. Game-theoretic semantics gave me a conceptual framework for thinking about heuristic control, even though it did not provide any straightforward solution. I also arrived at the modal proof method by the game-theoretic route described here, although in Jackson and Reichgelt we "compile out" the game theory by providing an axiomatization. My conclusion remains that game-theoretic semantics is an interesting way of looking at logic, but I should not like to claim that it is the only way to do meta-level inference.
Additional references

Belnap, N. D. (1976/7). How a computer should think. Contemporary Aspects of Philosophy (ed. G. Ryle), pp. 30-69. Oriel Press, Stocksfield.
Belnap, N. D. (1977). A useful four-valued logic. Modern Uses of Multiple-Valued Logic (ed. J. M. Dunn and G. Epstein), p. 19. Reidel, Dordrecht.
Cherniak, Ch. (1986). Minimal Rationality. MIT Press, Cambridge, Mass.
Gabbay, D. (1985). N-Prolog: an extension of Prolog with hypothetical implication. J. Logic Programming 2, 251-283.
Lorenzen, P. (1959). Ein dialogisches Konstruktivitätskriterium. Infinitistic Methods: Proc. Symp. on Foundations of Mathematics, Warsaw.
Tennant, N. (1987). Anti-Realism and Logic. Clarendon Press, Oxford.
2
An Automated Modal Logic of Elementary Changes

LUIS FARINAS DEL CERRO and ANDREAS HERZIG
Langages et Systemes Informatiques, Universite Paul Sabatier, Toulouse, France
Abstract

In this paper we define a modal logic for reasoning about changes of belief. We consider a particular kind of change, dealing with atomic facts, for which a simple formalization can be found. We stress how updates in databases can be represented in the same framework.
1 INTRODUCTION
To change one's set of beliefs is a fairly common activity, and it is certainly an essential occurrence in every evolving computer system. This is why an understanding of changes in belief is of great importance in areas such as philosophy, logic, databases and artificial intelligence. In the area of databases, with regard to updates, two paradigms have been proposed. One we call temporal. In this case the updates are interpreted as a way of obtaining a new state of the database. Modifications of the database produce a sequence of states, ordered temporally; with each update is then associated a date. Consequently the update is something global to the database, i.e. the update concerns the complete database. Examples of this approach can be found in the work of Castilho et al. (1982) and Cholvy (1986). In the other approach, which we call hypothetical, the new database after the update is not constructed explicitly; rather, the reasoning that we can do on the database is modified, under the hypothesis of the update. In other words, in this approach the mechanism of reasoning about updates is local to the database, i.e. it involves a fragment of the database, and the updates are not organized in time. Examples of this approach can be found in Gabbay (1982) and Warren (1986). These two dichotomous approaches can be found in many studies of "changes". In general, in the theory of changes, three kinds of modification can be found.
(1) Expansions: a piece of information is added without any modification of the database; consequently inconsistencies can be introduced in this way.

(2) Revisions: a piece of information is introduced, but requires modifications of the database; for example, any piece of information inconsistent with the added piece must be eliminated.

(3) Contractions: a piece of information is withdrawn (retracted) from the database.
When complete databases are considered (any piece of information that does not appear in the database is false), the retraction of a piece of information means the addition of its negation to the database. Consequently, contraction and revision are the same type of modification, and revision is the only operation needed for updates. In any case, the following question regarding these two different ways of formalizing updates must be answered: what kind of information in the database must be brought to the new state after an update? The classical answer to this question is: the new state after an update is the state built from the state before with "minimal changes", and which is consistent with the update. As is well known in the logic of changes, when "minimal changes" interfere with the definition of a formal system, attention must be paid to the definition of the "preservation criterion". In other words, what formulae can be brought from one state to another? For example, we consider the database defined by the following set of formulae: {A, A → B}. We define the following update: add ¬B to the database. Suppose the "preservation criterion" is defined as: "If a formula F belongs to the database and is consistent with the update then F must appear in the new database". Then we obtain a trivial database, because both A and A → B verify the "preservation criterion", and the new database will be {A, A → B, ¬B}, which is inconsistent. Therefore a system that includes "minimal changes" and a "preservation criterion" stated in such general terms can be inconsistent. Thus a balance must be found between "minimality" and "preservation". In the present chapter we have chosen a particular preservation criterion, and we shall consider restricted types of changes: changes that involve only units of information. In this case a solution can be found.
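The trivialization in the example can be checked mechanically. In the sketch below (helper names are mine; formulae are encoded as Python predicates over truth assignments), the naive preservation criterion keeps every formula that is individually consistent with the update, and the resulting database turns out to have no model:

```python
from itertools import product

def models(formulas, atoms):
    """All truth assignments satisfying every formula."""
    return [dict(zip(atoms, bits))
            for bits in product([False, True], repeat=len(atoms))
            if all(f(dict(zip(atoms, bits))) for f in formulas)]

def naive_update(db, update, atoms):
    """The flawed preservation criterion: keep every old formula
    that is individually consistent with the update."""
    kept = [f for f in db if models([f, update], atoms)]
    return kept + [update]

A      = lambda v: v['A']
A_to_B = lambda v: (not v['A']) or v['B']     # A -> B
not_B  = lambda v: not v['B']

new_db = naive_update([A, A_to_B], not_B, ['A', 'B'])
# Each formula survived individually, yet the whole set is inconsistent:
assert len(new_db) == 3
assert models(new_db, ['A', 'B']) == []
```

A is consistent with ¬B on its own, and so is A → B, so the criterion retains both; only jointly do they contradict the update.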
We present a nonclassical logic, which we call ASSUME, for reasoning about changes and particularly about updates in complete databases. In Section 2 we present the language, in Section 3 we give the semantics and in Section 4 the axiomatics. The completeness and proof theory of the logic are treated in Section 5. In Section 6 modifications to databases are formalized in the framework of our logic. Finally we give some connections between our logic and related work.
An Automated Modal Logic of Elementary Changes

2
THE LANGUAGE OF THE LOGIC "ASSUME"
Expressions of the language of ASSUME are built from the symbols of the following pairwise disjoint sets:

VAR: propositional variables p, q, ...
LIT: propositional variables or negations of propositional variables L, M, N, ...
¬, ∧, ∨, →, ↔: the classical propositional operations of negation, conjunction, disjunction, implication and equivalence respectively
ASSUME [ ]: a modal operator
( ): brackets

The set of formulae FOR is the least set satisfying the following conditions:

VAR ⊆ FOR
A, B ∈ FOR implies ¬A, A ∧ B, A ∨ B, A → B, A ↔ B ∈ FOR
A ∈ FOR and L ∈ LIT implies ASSUME[L]A ∈ FOR
3
THE SEMANTICS OF THE LANGUAGE OF THE LOGIC "ASSUME"

To define the meaning of a formula, we fix a set of propositional variables w ⊆ VAR, usually called a "world", from which the notion of satisfiability of a formula A (denoted by w sat A) is defined as follows:
w sat p iff p ∈ w, for p ∈ VAR
w sat ¬A iff not (w sat A)
w sat A ∨ B iff w sat A or w sat B
w sat A ∧ B iff w sat A and w sat B
w sat A → B iff not (w sat A) or w sat B
w sat A ↔ B iff w sat A → B and w sat B → A
w sat ASSUME[p]A iff w ∪ {p} sat A, for p ∈ VAR
w sat ASSUME[¬p]A iff w − {p} sat A, for p ∈ VAR

Then ASSUME[L]A is true in a state w if A and L are true in a new state obtained from w with minimal changes.
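The clauses above lend themselves to a direct executable reading. The following sketch is ours, not the chapter's: formulae are encoded as nested Python tuples, a world is a frozenset of the true propositional variables, and `sat` mirrors the satisfaction relation, including the two ASSUME clauses.

```python
# A small sketch (not from the chapter) of the "sat" relation for ASSUME.
# Formulae are tuples: ("var", p), ("not", A), ("and", A, B), ("or", A, B),
# ("imp", A, B), ("iff", A, B), ("assume", L, A), where the literal L is
# ("var", p) or ("not", ("var", p)).

def sat(w, f):
    op = f[0]
    if op == "var":
        return f[1] in w
    if op == "not":
        return not sat(w, f[1])
    if op == "and":
        return sat(w, f[1]) and sat(w, f[2])
    if op == "or":
        return sat(w, f[1]) or sat(w, f[2])
    if op == "imp":
        return (not sat(w, f[1])) or sat(w, f[2])
    if op == "iff":
        return sat(w, ("imp", f[1], f[2])) and sat(w, ("imp", f[2], f[1]))
    if op == "assume":
        L, A = f[1], f[2]
        if L[0] == "var":          # w sat ASSUME[p]A  iff  w ∪ {p} sat A
            return sat(w | {L[1]}, A)
        p = L[1][1]                # L is the negative literal ¬p
        return sat(w - {p}, A)     # w sat ASSUME[¬p]A iff  w − {p} sat A
    raise ValueError(op)

# The chapter's example below: w = {p, q}; ASSUME[¬p]q holds since q ∈ w − {p}.
w = frozenset({"p", "q"})
f = ("assume", ("not", ("var", "p")), ("var", "q"))
print(sat(w, f))   # True
```

The tuple encoding is only one possible choice; any representation that distinguishes a literal from a general formula in the ASSUME slot would do.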
L. Farinas del Cerro and A. Herzig
A set of propositional variables w ⊆ VAR satisfies a formula A if "w sat A". And A is valid, denoted by ⊨ A, if A is satisfied by every set w ⊆ VAR. Let us give the following example. Let w = {p, q} be a set of propositional variables. Then the formula ASSUME[¬p]q is true in w, "w sat ASSUME[¬p]q", because "w − {p} sat q" (i.e. q ∈ w − {p}). Graphically, this can be represented as a transition from the state w to the state w − {p}.
4
AXIOMATIZATION
In this section we present a deductive system for the language of ASSUME. Let L, L1, L2, ... denote arbitrary literals and let A and B denote formulae. We define M[L1, ..., Ln] (n ≥ 1) as the modality ASSUME[L1] ... ASSUME[Ln], and we write L # L' if L and L' possess different propositional variables. We admit the following schemata of axioms and inference rules.

A1: all tautologies of propositional logic
A2: ASSUME[L]L
A3: ASSUME[L](A → B) ↔ (ASSUME[L]A → ASSUME[L]B)
A4: ASSUME[L]¬A ↔ ¬ASSUME[L]A
A5: M[L1, ..., Ln−1]Ln ↔ ASSUME[L]M[L1, ..., Ln−1]Ln if L # Ln

R1 (modus ponens): from A and A → B infer B
R2 (presupposition): from A infer ASSUME[L]A
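Because a state is a finite set of variables and updates are deterministic, the semantic validity of these axioms can be checked by brute-force enumeration over a small VAR. The sketch below is ours, not the chapter's; the string encoding `"p"`/`"-p"` for literals and the helper names are assumptions. It verifies A2 and the n = 1 instance of A5 (an assumption about one variable preserves literals about the others).

```python
# Brute-force semantic check (illustrative only) of axioms A2 and A5 (n = 1)
# over all states built from a two-variable vocabulary.
from itertools import combinations

VAR = ["p", "q"]
LITS = ["p", "-p", "q", "-q"]

def states(vs):
    return [frozenset(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]

def var(lit):                 # propositional variable of a literal
    return lit.lstrip("-")

def holds(w, lit):            # w sat L for a literal L
    return (var(lit) in w) != lit.startswith("-")

def update(w, lit):           # the unique state reached by assuming lit
    return w - {var(lit)} if lit.startswith("-") else w | {lit}

# A2: ASSUME[L]L — the assumed literal holds in the updated state.
assert all(holds(update(w, L), L) for w in states(VAR) for L in LITS)

# A5 with n = 1: L1 ↔ ASSUME[L]L1 whenever L # L1, i.e. an assumption about
# one variable leaves literals about every other variable untouched.
assert all(holds(w, L1) == holds(update(w, L), L1)
           for w in states(VAR) for L in LITS for L1 in LITS
           if var(L) != var(L1))
print("A2 and A5 (n = 1) hold in all states over", VAR)
```

Determinism (axiom A4) is built into the sketch: `update` returns exactly one successor state.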
The axiom A4 expresses the determinism of changes; in other words, the expression "it is false that for every state under the presupposition L, A will be true" is equivalent to "under presupposition L, A is false". The axiom A5 expresses the "preservation criterion": if a formula is true in a state and this formula is independent of the change L, then this formula will be true again in the state where L is assumed. Our ASSUME operator can be considered as a non-monotonic operator. If we suppose that A is deducible, and if we assume L, then A will not be
A is evaluated in a state w as follows:

w sat L ⇒ A iff for every w' such that w ⊆ w', not (w' sat L) or w' sat A
In contrast, if we use this implication to represent the updates, we may obtain paradoxes, because L ⇒ (¬L ⇒ A) is a theorem of the intuitionistic calculus. In this calculus worlds are interpreted as states of knowledge. Thus the updates represented by implications express expansions. In the same framework, a logic between intuitionistic logic and ASSUME logic can be defined, avoiding some paradoxes of the intuitionistic implication and without the strict minimal-change constraint of ASSUME logic. Its satisfiability relation is

w sat ASSUME[p]A iff for every w' such that w ∪ {p} ⊆ w', w' sat A
w sat ASSUME[¬p]A iff for every w' such that p ∉ w' and w − {p} ⊆ w', w' sat A

Although in this chapter the formal properties of this logic are not given, we can see that the axioms A4 and A5 are not valid. Consequently, this logic does not collapse into the propositional calculus as previously. In analogy with the intuitionistic calculus, in the new logic worlds can be interpreted as states of belief. For example, to evaluate the formula ASSUME[¬p]A in a state w we should evaluate A in every state w' such that w − {p} ⊆ w'. Thus the updates represented by the ASSUME operator can express revisions.
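Over a finite vocabulary this weaker relation is also executable, by enumerating the supersets w'. The sketch below is ours (the three-variable vocabulary and helper names are assumptions); it also exhibits the claimed failure of A4 in this semantics.

```python
# Sketch (ours, not from the chapter) of the weaker superset semantics,
# with worlds as frozensets over a fixed finite vocabulary VAR.
from itertools import combinations

VAR = {"p", "q", "r"}

def supersets(base, forbidden=frozenset()):
    # all w' with base ⊆ w' ⊆ VAR and w' disjoint from `forbidden`
    extra = sorted(VAR - base - forbidden)
    return [frozenset(base) | frozenset(c)
            for r in range(len(extra) + 1) for c in combinations(extra, r)]

def assume_pos(w, p, A):   # w sat ASSUME[p]A  iff A holds in every w' ⊇ w ∪ {p}
    return all(A(w2) for w2 in supersets(w | {p}))

def assume_neg(w, p, A):   # w sat ASSUME[¬p]A iff A holds in every w' ⊇ w − {p} with p ∉ w'
    return all(A(w2) for w2 in supersets(w - {p}, forbidden={p}))

q_true = lambda w2: "q" in w2
r_true = lambda w2: "r" in w2
w = frozenset({"p", "q"})
assert assume_neg(w, "p", q_true)      # q survives retracting p in every extension
# A4 fails here: ¬ASSUME[p]r holds at w, yet ASSUME[p]¬r does not.
assert not assume_pos(w, "p", r_true)
assert not assume_pos(w, "p", lambda w2: "r" not in w2)
```

The universal quantification over extensions is what breaks the determinism axiom A4, as stated in the text.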
8
RELATION TO OTHER RESEARCH
If we consider the expression "after adding A we have B" or "assuming A we have B" as a particular implication, our approach strengthens the classical implication A → B in order to avoid some of its paradoxes. Thus, as we have seen above, A → (¬A → B) is not valid from our point of view. This joins the standpoint of relevant logics as given in Orlowska and Weingartner (1986), where the classical calculus is filtered by the following criterion: a classical implication A → B will be valid only if A is "relevant" for B. On the basis of a relevance relation between predicates, Orlowska and Weingartner require that for each predicate in the consequent there exists one in the antecedent that is relevant for it. Thus A → (¬A → A) is valid, and A → (¬A → B) and A → (B → A) are valid only if A is relevant for B. Our approach has a particular understanding of their relevance relation between formulae. For us, A is relevant for B if A and B satisfy the following criterion of deductive independence: for every C deduced from A and B, C can be deduced from either A or B alone. Another work connected with our approach is that of Doyle (1979) on truth maintenance systems (TMS). In our case the structure of the semantic network is very simple, because it is represented by a complete propositional theory; on the other hand, in contrast with TMS, if a certain belief is out of the network this entails that its negation is in. But, given the structure of our states and the properties of our logic, we cannot produce states where a certain belief and its negation both appear. In TMS, if this is the case then a revision of the network (nodes and their justifications in TMS terminology) is necessary to re-establish consistency.
9
CONCLUSION
In this chapter a non-classical logic that allows us to formalize updates in databases has been presented. The main property of the logic is that the real modifications to the database are made only when necessary, i.e. the new state obtained by modification is not built completely, because we use the ASSUME properties. Certainly the changes that can be expressed by our formalism are very restricted; one step that must be made is to extend this kind of change to any formula, i.e. in ASSUME[A]B, A can be a formula, possibly containing ASSUME operators. For this kind of extension many problems appear; in particular, the changes are non-deterministic, hence for an update many successor states are possible. To obtain these states, a consistency algorithm must be used; in this case, it will be impossible to use a simple syntactic criterion as for our logic. Therefore investigations must be continued.

ACKNOWLEDGMENTS
We should like to thank Laurence Cholvy, Robert Demolombe, Jean Fargues, Solomon Passy and Philippe Smets for their many useful observations and criticisms.

BIBLIOGRAPHY

Alchourron, C., Gärdenfors, P. and Makinson, D. (1985). On the logic of theory change. J. Symbolic Logic 50, 510-530. (A complete and very interesting study of modifications in theories, which marries the Alchourron and Makinson approach with the Gärdenfors approach.)
Castilho, J. M., Casanova, M. A. and Furtado, A. L. (1982). A temporal framework for data base specifications. Proc. Very Large Data Bases, pp. 280-292. (One of the first papers considering an evolutive database as a Kripke model in the framework of temporal logic.)
Cholvy, L. (1986). Updates semantics under domain closure assumption. Report ONERA-CERT, Toulouse.
Doyle, J. (1979). A truth maintenance system. Artificial Intelligence 12, 231-272.
Gabbay, D. (1982). N-Prolog: an extension of Prolog with hypothetical implication II. J. Logic Programming 2, 251-283.
Gärdenfors, P. (1984). Epistemic importance of minimal changes in belief. Austral. J. Phil. 61, 136-157. (A very original approach to changes in belief.)
Orlowska, E. and Weingartner, P. (1986). Semantic considerations on relevance. ICS PAS Report 582, Warsaw.
Robinson, J. (1965). A machine-oriented logic based on the resolution principle. J. ACM 12, 23-41.
Segerberg, K. (1986). On the logic of small changes in theories. Auckland Philosophical Papers, Auckland University. (A very clear paper about the relations between theory changes and conditional logics.)
Stalnaker, R. (1968). A theory of conditionals. American Philosophical Quarterly Monograph Series 2, 98-112. (A classical paper in conditional logics.)
Veltman, F. (1985). Logics for conditionals. Dissertation, University of Amsterdam.
Warren, D. S. (1986). Database updates in pure Prolog. Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, pp. 244-253.
DISCUSSION Jean Fargues: In this discussion, we abbreviate the modality ASSUME by using ASS, for convenience. This chapter presents an interesting way to envision consistency maintenance in the case of updates or revisions of a set of formulae. The logic of
elementary change is a modal logic that allows us to express, in the language of the logic itself, the fact that q can be derived after revising the current set of formulae by adding p. Thus the modal formula ASS[p]q is true in T if and only if q may be derived from a new set of formulae T' obtained from T by adding p. From this definition, it becomes possible to reason about (elementary) changes to a theory T without physically adding or retracting formulae in T. This comes from the Kripke semantics used to define this logic: if T is the current theory we consider, i.e. the current world, then a new revised theory T' will be obtained as a successor world after a revision, this revision being expressed by the fact that some ASS[p]q formula is satisfied in T. Thus, although reflecting some non-monotonic behaviour of particular logical theories, the modal logic of elementary change remains monotonic, well founded and complete, as the authors have proved in the chapter. A first criticism of the logic of elementary changes could be that these changes are elementary. In fact, the language of this logic has a very strong restriction, namely that p must be a literal in any modal formula ASS[p]q. Another restriction comes from the fact that the modality ASSUME is a kind of successor modality (as in linear temporal logic, for example) and from the determinism of the logic of elementary changes. An argument against this criticism is that we do not know how to express the axiomatics of a logic of non-elementary changes that can deal with the addition or revision of arbitrary formulae, and therefore we cannot prove the completeness of such a logic. For example, the logic discussed here cannot be extended to formulae ASS[p ∨ q]r without introducing non-determinism and completeness problems. Perhaps some other chapter in this book can provide some intuition on the way of considering a general logic of arbitrary change.
I think that the most advanced survey on the logic of change has been made by David Makinson (1985), following Peter Gärdenfors, Isaac Levi and Robert Stalnaker. I shall refer to that work later in the discussion. Another criticism concerns the use of the logic of elementary change for the database-update problem. A reader familiar with some real database application could have some difficulties in filling the gap between the practical database-management system he uses and the theoretical concerns of this paper. For example, a simple thing that we might like to express on a database is "we must take care to add p whenever we add q by update", or "we must take care to retract p whenever we retract q by update". When I read this chapter, I had problems in expressing such conditions in the Farinas del Cerro and Herzig logic. Thus I should appreciate it if the authors could propose to me a formulation for such constraints, by considering a particular theory including specific additional axioms. An idea could be to write formulae in the form ASS[p]ASS[¬q]false to express the fact that each database state does not contain both p and ¬q. But ASS[L]true is a theorem, so that ¬ASS[p]ASS[¬q]true (equivalent to the preceding formula) is not a valid formula. Thus I do not see very well how to use the logic of elementary change to express integrity constraints in a database. A more fundamental problem concerns possible extensions to the logic of elementary change. To give a better understanding of the general problem of defining a logic of change, we must recall an alternative way of handling the logic of change, following the good survey given by Makinson (1985). We recall some notations and definitions.
We write TH(A) = {x : A ⊢ x}, ⊢ being the deducibility relation of the logic we consider. We shall say that a set T of formulae is a theory if it is closed under TH, i.e. if T = TH(T), and we suppose here that the TH operator is such that

1. A ⊆ TH(A) for all sets of formulae A
2. TH(A) = TH(TH(A)) [idempotency]
3. A ⊆ B implies TH(A) ⊆ TH(B) [monotonicity]
4. p ∈ TH(A) implies there exists a finite A' ⊆ A such that p ∈ TH(A') [compactness]
Thus we suppose that ⟨A, ⊢⟩ is a logical structure, following the traditional definition of Gentzen and Tarski. We write A − p for an arbitrary maximal consistent subset of A that does not imply p (several such sets may exist, of course). Gärdenfors, Levi and Makinson introduced a general set of meta-axioms defining the properties that a logic of change should have. To do this, Makinson used the "+" operator, defined by

A + x = TH((A − ¬x) ∪ {x})
This definition is called the Levi identity. A + x appears as being the revised theory obtained from A by adding x, in the same way as Farinas del Cerro and Herzig do in this chapter, the consistency of the theory being preserved. In fact, the definition of ASSUME may be reformulated as

ASS[p]q ∈ T iff q ∈ T + p

or by the equivalent definition

ASS[p]q ∈ T iff q ∈ TH((T − ¬p) ∪ {p})

Thus I should like to know whether the logic of elementary change that the authors introduce may be considered as a particular case of the more general formalism introduced by Makinson and the other authors I have mentioned. In this case, it should be possible to extend the formalism of the modal logic of elementary change in order to allow for more complex formulae within the scope of the ASSUME modality.

Frank Veltman: Even elementary changes are complicated, much more complicated than Luis Farinas del Cerro and Andreas Herzig seem to think. The only databases that could work well under the kind of changes that their theory takes into account are databases in which all units of information are mutually independent: it must be possible to alter the truth value of one atomic sentence and leave the truth values of all other atomic sentences as they stand, without this leading to incoherencies. I wonder if there are any complete databases that can harmlessly be altered in such a manner. Perhaps there are, but none of them is comparable to the kind of databases human beings have in their heads. The theory presented by Farinas del Cerro and Herzig does not teach us very much about changes in belief states, not even about the elementary ones. Let me illustrate these remarks by an example. In Holland the traffic lights have three bulbs: red, yellow and green. At all times, exactly one of these bulbs is shining; so the light is always either red, or yellow, or green, but never both red and yellow, or both yellow and green, let alone both red and green.
Now, let L be some Dutch traffic light and suppose our database contains the information that L was red at time t. If this database is anything near to complete then it will also contain the information that L was not green at time t and that L was not yellow at time t. It is not difficult to
imagine a situation where it would be natural to assume that L had been green at time t instead of red: many Dutchmen have at least once in their lives thought this:

If the light had been green, I would not have to pay this fine
However, if we update the database according to the rules offered by Farinas del Cerro and Herzig, we end up with a database that not only says that L was green at time t, but in addition maintains that L was red at time t. And Dutch law does not say anything decisive about fines having to be paid or not in cases where the light happens to be both red and green. By changing the example a bit, we can illustrate another shortcoming of the theory of Farinas del Cerro and Herzig. Apparently, the authors think that an elementary assumption will always bring you from one complete database to another complete database. Again, this may be true for complete databases in which the basic information pieces are mutually independent, but it certainly does not hold generally. If it did then none of us would have any difficulty in choosing between the following alternatives:

If the light had not been red, it would have been green
If the light had not been red, it would have been yellow
But we all find it hard to make a choice here, don't we?
Reply: The main comment by Fargues and Veltman concerns the dependence of units of information. Since the only formulae that are true in every state of the model are the valid formulae, we cannot formalize this kind of dependence in our logic. Certainly, as mentioned in Section 7, it will be necessary to extend this logic in order to capture the possibility of expressing integrity constraints, which are a way of representing the dependence of units of information. Nevertheless, our aim was to define a simple modal logic that we hope is tutorial, rather than to give a system able to resolve all the classical problems in databases. The example proposed by Veltman is an illustration of the dependence problem. It suggests the following question: what is a unit of information? If we consider that the colour of the traffic light is a unit then his remark is true, but this statement may not necessarily hold. Concerning Fargues' last comment, in which he asks "how are conditional logics related to theory change?", an answer is provided by P. Gärdenfors and K. Segerberg as follows: no family of conditional expressions of the form CONDITIONAL[A]B can be defined in the object language satisfying the very general postulates for theory changes of Alchourron, Gärdenfors and Makinson.
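Fargues' reformulation of ASSUME via the Levi identity, A + x = TH((A − ¬x) ∪ {x}), can be made concrete on a toy scale. The sketch below is ours, not Makinson's: TH is handled semantically by truth-table entailment over two variables, and `contract` picks one maximal non-implying subset (several may exist, as the discussion notes).

```python
# A toy sketch (ours, not from the discussion) of the Levi identity
# A + x = TH((A − ¬x) ∪ {x}) over two propositional variables.
# Formulae are predicates on valuations; TH is handled via `entails`.
from itertools import product, combinations

ATOMS = ("p", "q")
WORLDS = [dict(zip(ATOMS, bits)) for bits in product((True, False), repeat=len(ATOMS))]

def entails(theory, f):
    # A ⊢ f iff every valuation satisfying all of A satisfies f
    return all(f(w) for w in WORLDS if all(g(w) for g in theory))

def contract(theory, f):
    # A − f: one maximal subset of A that does not imply f
    for k in range(len(theory), -1, -1):
        for sub in combinations(theory, k):
            if not entails(list(sub), f):
                return list(sub)
    return []

def revise(theory, x, not_x):
    # Levi identity; we return a generating set of A + x, not its full closure
    return contract(theory, not_x) + [x]

p = lambda w: w["p"]
p_imp_q = lambda w: (not w["p"]) or w["q"]
q = lambda w: w["q"]
not_q = lambda w: not w["q"]

T2 = revise([p, p_imp_q], not_q, q)      # revise {p, p → q} by ¬q
assert entails(T2, not_q)                # ¬q now follows
assert not entails(T2, lambda w: False)  # and the result is consistent
```

Which maximal subset `contract` returns depends on enumeration order, mirroring the fact that contraction is not uniquely determined.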
Additional reference

Makinson, D. (1985). How to give it up: a survey of some formal aspects of the logic of theory change. Synthese 62, 347-363.
3
Formal Expression of Time in a Knowledge Base

EUGENE CHOURAQUI
Groupe Representation et Traitement des Connaissances, Centre National de la Recherche Scientifique, Marseille, France

Abstract
A justification is given for the desirability of using logics that take explicit account of time, especially in artificial-intelligence applications. Alternative models of the structure of time are outlined, together with their corresponding axiomatizations. Details of the use of temporal logics are illustrated by a case study of a particular new time-dependent logic, called TLΔ. This logic is based on an infinite linear model of time, and contains the novelty that "immediate" past and future times are distinguished from the more general "past" and "future" by the use of special operators.
1
INTRODUCTION
Time plays an essential role in modelling reasoning where the information involved evolves or is subject to change, these changes being generally expressed by means of rules. The knowledge domains where time is involved are numerous and varied. They concern not only the real world (medicine, robot motions, automatic plant controllers, simulation of digital circuits, etc.) but symbolic universes as well (natural-language understanding, knowledge representation, man-machine communication, theory of programming, planning, etc.). In that framework, the problems to solve are so complex that, for a long time, work in artificial intelligence ignored, with a few exceptions, the study of temporal reasoning. However, because of the importance of time in many applications and despite its low level of computer modelling, researchers in artificial intelligence have gradually moved towards a new direction in research on time representation and processing, which is now being actively pursued. The corresponding works can be divided into two categories: on the one hand those based on computer models specifically elaborated from the object domains and/or the problems that are to be solved, and on the other hand those based on temporal logics and involving the techniques of automatic theorem-proving. We consider
these two approaches in the following two sections; the rest of the chapter describes a particular temporal logic and its application to a symbolic system of knowledge representation, the ARCHES system.
2
COMPUTER MODELS OF TIME
An examination of the work done using the first approach reveals the diversity of computer models of time. These models have specific formal features depending on the cognitive purposes for which they have been developed. Hence the relations they have in common are mainly determined by the most general and intuitive properties characterizing time. Thus we can divide them into three categories: (i) models expressing sequences of or changes between states; (ii) models representing a nonlinear time; (iii) those describing a linear time with manipulation of time intervals.

2.1
Models expressing sequences of states
This category includes the earliest work in AI relating to time; it concerned the situational calculus (McCarthy and Hayes, 1969). The domain of the study is represented through a set of states whose modalities of sequence are formally defined. Each state, called a situation, is described by features that themselves are used to infer other features about this situation or about future situations. The fluents are functions enabling calculations on these features. They are defined on the set of situations and are divided into two groups according to the nature of the target sets: the range of the propositional fluents is the set {True, False}, while the range of the situational fluents is the set of situations. For instance, Rain(x)(s) is true if in the situation s it rains in x, and the value of time(s) is the time t associated with the situation s. The causal assertions between situations are denoted by the specific fluent F(Π)(s), where Π is any propositional fluent: F(Π)(s) means that the situation s must be followed (without specifying the moment) by a situation s' satisfying the fluent Π. Some other particular fluents enable a more accurate description of the sequences of situations (past or future) and also of the representation of actions. It is interesting to note that in this model neither a complete axiomatics of time nor a decision procedure similar to certain theorem-proving techniques has been developed (see Section 3). At the same time, examination of this model leads us to consider different approaches to modelling the changes between the states or the actions (Hayes, 1971); but, in any case, these approaches have the same limitations as the situational calculus. More recently, a temporal model expressing the chain of discrete events
has been integrated into a medical expert system for cardiovascular diseases. This is the MECS-AI system (Koyama et al., 1981), in which the past is introduced by key words like PREVIOUS or IN-PAST and the present by a special state called PRESENT; the future has no representation. This system handles chainings between states whose sequencing is not determined by action rules. During a consultation every state is generated, and the inference rules defined from past-related data enable the fixing of a diagnosis and the corresponding treatment. This model has been elaborated so as to solve some of the temporal problems arising in the selected domain, which justifies the limits on its power of expression associated with its low level of formalization.

2.2
Models expressing nonlinear times
The study of computer modelling of branching and non-dense time has been approached in the context of natural-language understanding and, more specifically, in the context of tense analysis. The most significant model denotes the time by an ordered pair (T,

The deduction relation ⇒ is the smallest transitive and reflexive relation satisfying the axioms of Table 2 (Porte, 1965). The relation = has been introduced to reduce the length of the axioms. We state that a = b (the description a "is equal to" the description b) if and only if a ⇒ b and b ⇒ a. We can easily show that the relation = is an equivalence. Conditions C1-C3 and C10-C16 specify that the semantic properties of the connectives ∧, ∨ and ¬ are the same as in classical logic. Conditions C5, C6, C5' and C6' indicate respectively that the present and the immediate future belong to the future, and that the present and the immediate past belong to the past. Conditions C17 and C17' express the determinism of the evolution of descriptions. Conditions C19 and C19' ensure that the connectives ⊕ and ⊖ are transitive. C20 and C20' express the coherence of the evolution of descriptions, in that the set Δ has one and only one minimum. Conditions C21, C22, C21' and C22' mean that if the formula a ⇒ b is always true then it will always be true in the future and in the past. Finally, conditions C23 and C24 express the symmetry of the connectives defining the future and the past.
6
SOME SEMANTIC CONTENTS OF TLΔ
The interpretation domain of TLΔ is the (non-empty) union of two sets Di and Dc, containing the symbols for individuals and concepts respectively. Clearly Dc ⊆ P(Di), the power set of Di, because every concept may be interpreted as a set of individuals, and Dc ⊆ Di because every concept may be interpreted as some individual. Every interpretation of a description b is to be viewed as the set of the interpretations characterized by b: more precisely, as a mapping of the integers ℤ into P(Di), so as to take account of the evolution of individuals:

Δ → (ℤ → P(Di))
Associated with this mapping, there is a corresponding function C that associates each component of TLΔ with its interpretation in the domain Di ∪ Dc. Hence C(b)(j) ∈ P(Di) determines the set of interpretations of individuals that is covered by the description b in the state j, where j is a member of a set isomorphic to ℤ. The interpretation of any general description is defined as the set of interpretations of individuals characterized by this description, and successive (time-)states in the evolution can be labelled (as below) by successive elements n of ℤ. It is often convenient to begin the evolution from an initial state n = 0, which is a starting-point for iterations such as (4) and (5) below, which describe the evolution of the correspondence function C as n varies. The interpretation of the classical connectives is given by

(1) C(a ∧ b)(n) = C(a)(n) ∩ C(b)(n)
(2) C(a ∨ b)(n) = C(a)(n) ∪ C(b)(n)
(3) C(¬a)(n) = ∁C(a)(n)  (∁ is complement in Di)

These rules correspond to the classical and intuitive interpretations of ∧, ∨ and ¬. The interpretation of the temporal connectives is given by

(4) C(+a)(n) = C(a)(n + 1)
(5) C(⊕a)(n) = ⋃p≥0 C(a)(n + p)
(6) C(−a)(n) = C(a)(n − 1)
(7) C(⊖a)(n) = ⋃p≥0 C(a)(n − p)
(8) C(Λ)(n) = ∅  (i.e. empty)
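Rules (1)-(8) admit a direct executable reading if the infinite unions in (5) and (7) are truncated. In the sketch below (ours, not the chapter's; the toy domain, the horizon and the sample description `awake` are all assumptions), a description denotes a function from an integer state to a set of individuals.

```python
# Bounded-horizon sketch (ours) of rules (1)–(8): a description's
# interpretation is a function n ↦ set of individuals; the infinite unions
# of rules (5) and (7) are truncated at a finite HORIZON.
HORIZON = 50
D_I = {"musclor", "grayskull"}            # assumed toy domain of individuals

def conj(a, b): return lambda n: a(n) & b(n)                     # rule (1)
def disj(a, b): return lambda n: a(n) | b(n)                     # rule (2)
def neg(a):     return lambda n: D_I - a(n)                      # rule (3)
def nxt(a):     return lambda n: a(n + 1)                        # rule (4), +
def future(a):  # rule (5), connective ⊕ (present included: p = 0, 1, ...)
    return lambda n: set().union(*(a(n + p) for p in range(HORIZON)))
def prev(a):    return lambda n: a(n - 1)                        # rule (6), −
def past(a):    # rule (7), connective ⊖
    return lambda n: set().union(*(a(n - p) for p in range(HORIZON)))
empty = lambda n: set()                                          # rule (8), Λ

# "awake" holds of musclor only at state 3.
awake = lambda n: {"musclor"} if n == 3 else set()
assert future(awake)(0) == {"musclor"}     # some future state satisfies it
assert future(awake)(4) == set()           # but not from state 4 onwards
assert past(awake)(4) == {"musclor"}
assert future(future(awake))(0) == future(awake)(0)   # ⊕⊕a = ⊕a (C19)
```

Within the horizon, the idempotency checks mirror conditions C19 and C19' of Table 2.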
Rules (4), (5), (6) and (7) determine the semantics of the evolution of descriptions: the interpretation of the connectives + and − shows that these express the evolution between two consecutive states Ei and Ei+1 or Ei and Ei−1; the interpretation of the connectives ⊕ and ⊖ expresses the evolution between two successive but not necessarily consecutive states Ei and Ej (j ≥ i or j ≤ i). From rules (5) and (7) we may also deduce that C(⊕⊕a)(n) = C(⊕a)(n) and C(⊖⊖a)(n) = C(⊖a)(n), which proves the transitivity of the connectives ⊕ and ⊖. Finally, we may remark that, unlike the connectives + and −, the connectives ⊕ and ⊖ guarantee that the present is a part of the future or of the past (because j ≥ i or j ≤ i). Rule (8) shows that, whatever the state Ei, there is no individual x characterized by the empty description. The connectives ⊕ and ⊖ must be introduced in order to derive descriptions in which the sequence of states is not explicit (the aim is to look for the existence of at least one state in which an individual x has a given group of properties; these connectives are similar to the existential quantifier in classical logic). We can establish the soundness of the deduction relation ⇒ of Table 2 with a proof that contains only technical details. We define the formula a ⇒ b to be valid if C(a)(i) is included in C(b)(i) (i.e. C(a)(i) ⊆ C(b)(i)) for all i and for every interpretation {Di ∪ Dc, C}. The soundness of ⇒ as a deduction relation is then proved by separate proofs of soundness of every condition Ck in Table 2 that makes explicit or (through =) indirect use of this relation. Further details of TLΔ that are of interest can be derived from the material above. For example, from conditions C5, C6, C5' and C6' and from the transitivity of the relation ⇒, we can easily establish the six following propositions:
¬⊕¬a ⇒ a      ¬⊖¬a ⇒ a
¬⊕¬a ⇒ +a     ¬⊖¬a ⇒ −a
¬⊕¬a ⇒ ⊕a     ¬⊖¬a ⇒ ⊖a

These propositions show that the complex operators "¬⊕¬" and "¬⊖¬" may be interpreted as universal quantification in the framework of time expression (an interpretation analogous to ∃ and ∀ in classical logic: ¬∃¬ = ∀); thus we may express the permanence of a description whatever the state of knowledge. From the axiomatization of the relation ⇒, we can also establish the following propositions:

+(a ∨ b) = +a ∨ +b      −(a ∨ b) = −a ∨ −b
⊕(a ∨ b) = ⊕a ∨ ⊕b      ⊖(a ∨ b) = ⊖a ∨ ⊖b
⊕(a ∧ b) ⇒ ⊕a ∧ ⊕b      ⊖(a ∧ b) ⇒ ⊖a ∧ ⊖b
The converse propositions of the last line are not satisfied. Therefore, in order to represent descriptions in a normal form expressed as a sum of disjunctions, we make the hypothesis that descriptions of the type ⊕(a ∧ b), ⊕¬(a ∨ b), ⊖(a ∧ b) and ⊖¬(a ∨ b) are not formulae of TLΔ.
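The failure of the converse of the last line can be seen on a tiny countermodel. The sketch below is ours (the horizon, domain and descriptions are assumptions): a holds of x only at state 0 and b only at state 1, so x belongs to the interpretation of ⊕a ∧ ⊕b at state 0 but never to that of ⊕(a ∧ b).

```python
# A tiny countermodel (ours) showing that ⊕a ∧ ⊕b ⇒ ⊕(a ∧ b) fails.
HORIZON = 10
a = lambda n: {"x"} if n == 0 else set()
b = lambda n: {"x"} if n == 1 else set()

def future(d):   # connective ⊕, truncated to a finite horizon
    return lambda n: set().union(*(d(n + p) for p in range(HORIZON)))

both_future = future(a)(0) & future(b)(0)        # ⊕a ∧ ⊕b at state 0
future_both = future(lambda n: a(n) & b(n))(0)   # ⊕(a ∧ b) at state 0
print(both_future, future_both)   # {'x'} set()
```

The asymmetry arises because ⊕ quantifies existentially over states: a and b each hold somewhere in the future, but never at the same state.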
Table 2  The axioms and inference rules of TLΔ

(i) Classical connectives

C1: a ∧ b ⇒ a and a ∧ b ⇒ b
C2: if a ⇒ b then a ∧ c ⇒ b ∧ c
C3: if a ∧ ¬b ⇒ b then a ⇒ b
C4: if a and a ⇒ b then b
C10: a ∧ b = b ∧ a ; a ∨ b = b ∨ a
C11: a ∧ (b ∧ c) = (a ∧ b) ∧ c ; a ∨ (b ∨ c) = (a ∨ b) ∨ c
C12: a ∧ a = a ; a ∨ a = a
C13: a ∧ Λ = Λ ∧ a = Λ ; a ∨ Λ = Λ ∨ a = a
C14: a ∧ (a ∨ b) = a ; a ∨ (a ∧ b) = a
C15: a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c) ; a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)
C16: a ∧ ¬a = ¬a ∧ a = Λ

(ii) Temporal connectives

C5: a ⇒ ⊕a                      C5': a ⇒ ⊖a
C6: +a ⇒ ⊕a                     C6': −a ⇒ ⊖a
C7: +⊕a = ⊕+a                   C7': −⊖a = ⊖−a
C8: ⊕(a ∨ b) = ⊕a ∨ ⊕b          C8': ⊖(a ∨ b) = ⊖a ∨ ⊖b
C9: ⊕¬⊖¬a = a                   C9': ⊖¬⊕¬a = a
C17: +¬a = ¬+a                  C17': −¬a = ¬−a
C18: +(a ∧ b) = +a ∧ +b         C18': −(a ∧ b) = −a ∧ −b
C19: ⊕⊕a = ⊕a                   C19': ⊖⊖a = ⊖a
C20: ⊕Λ = Λ                     C20': ⊖Λ = Λ
C21: if a ⇒ b then +a ⇒ +b      C21': if a ⇒ b then −a ⇒ −b
C22: if a ⇒ b then ⊕a ⇒ ⊕b      C22': if a ⇒ b then ⊖a ⇒ ⊖b
C23: −+a = +−a
C24: ⊖⊕a = ⊕⊖a
Similarly, focusing on ANDTHEN, we can derive simple results like non-commutativity (a ANDTHEN b ≠ b ANDTHEN a) and significant and desirable results like non-associativity ((a ANDTHEN b) ANDTHEN c ≠ a ANDTHEN (b ANDTHEN c)) and the fact that a permanent description cannot evolve (¬⊕¬a ANDTHEN b = Λ).
3
95
Formal Expression of Time in a Knowledge Base
7 EXAMPLE OF DECISION PROCEDURE FOR THE RELATION ⇒
Figure 4 shows a simple outline of the proof of the goal "Does (will) there exist at least one state such that a robot will be awake?" In order to achieve this goal, the corresponding decision procedure involves on the one hand the facts "the robot Musclor is in the passage P; a moment later it will be in the workshop W and then it will perform the task T", and on the other hand the
[Fig. 4 Example of a decision tree. The knowledge base comprises a rule base containing R150, "if (ROBOT x ⊕ACTIVITY(ISA, TASK(y))) then (ROBOT x ⊕STATE(ISA, AWAKE))", and a database containing the description (ROBOT MUSCLOR LOCALIZATION(IN, PASSAGE(P)) ∧ +LOCALIZATION(IN, WORKSHOP(W)) ∧ ++ACTIVITY(ISA, TASK(T))). The tree eliminates the "and" connective, applies C21, C17 and C15 to obtain (ROBOT MUSCLOR ⊕ACTIVITY(ISA, TASK(T))), instantiates R150 with x = MUSCLOR and y = T to obtain (ROBOT MUSCLOR ⊕STATE(ISA, AWAKE)), and thus satisfies the goal (ROBOT z ⊕STATE(ISA, AWAKE)) with z = MUSCLOR.]
rule "if there exists (will exist) at least one state such that a robot performs (will perform) a task then there exists (will exist) at least one state such that a robot is (will be) awake". More precisely, the procedure whose effect is shown in Fig. 4 determines whether the relation H ⇒ C is satisfied, given a pair of descriptions (H, C). It works by using the formal properties of ⇒ noted above, and methods of problem decomposition and construction of the corresponding and/or graphs. It builds up two trees Ah and Ac for the hypothesis H and the tentative conclusion C, by development from the descriptions associated with H and C, and then constructs an and/or graph A by "appending" to each leaf of Ah the tree Ac without its root. Finally, it tries to satisfy H ⇒ C by searching for at least one valid "and" subtree of A, by using all the axiomatic properties of TL∆ (e.g. in Tables 1 and 2).
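The flavour of this decomposition can be conveyed with a much-simplified propositional sketch. This is not the ARCHES implementation: the encoding of descriptions, the treatment of the immediate connective "+" as a time offset (justified by C17 and C18), and the restriction to a sound but incomplete containment test are all assumptions made for the example.

```python
from itertools import product

# A description is a nested tuple:
#   ("atom", name, offset)  -- offset counts applications of "+" (the immediate future)
#   ("and", d1, d2), ("or", d1, d2), ("next", d)  -- "next" encodes the "+" connective

def dnf(d):
    """Return the disjuncts of d as frozensets of (name, offset) atoms."""
    tag = d[0]
    if tag == "atom":
        return [frozenset([(d[1], d[2])])]
    if tag == "or":                      # disjuncts simply accumulate
        return dnf(d[1]) + dnf(d[2])
    if tag == "and":                     # distribute "and" over "or" (axiom C15)
        return [a | b for a, b in product(dnf(d[1]), dnf(d[2]))]
    if tag == "next":                    # "+" distributes over the connectives (C17, C18)
        return [frozenset((n, k + 1) for n, k in disj) for disj in dnf(d[1])]
    raise ValueError(tag)

def implies(h, c):
    """Sound (but incomplete) check of H => C: every disjunct of H
    must contain some disjunct of C."""
    hd, cd = dnf(h), dnf(c)
    return all(any(cj <= hj for cj in cd) for hj in hd)

# "Musclor is in P, a moment later in W, then performs T" entails
# "in two moments Musclor performs T":
H = ("and", ("atom", "in_P", 0),
     ("and", ("next", ("atom", "in_W", 0)),
             ("next", ("next", ("atom", "task_T", 0)))))
C = ("next", ("next", ("atom", "task_T", 0)))
print(implies(H, C))   # -> True
```

The and/or structure mirrors the text: disjuncts are the "or" branches of the graph, and the subset test plays the role of finding a valid "and" subtree.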
8 CONCLUSION
Temporal reasoning plays a crucial role in several knowledge domains. Because of the complexity of the technical problems that have to be solved, it has probably been more common in the past for people to build special-purpose computational models to solve specific problems than for them to use temporal logics and automatic theorem-proving. This may have happened in part because systematic presentations of the available types of temporal logic have been hard to find. The present chapter is intended to fill the gap, and to encourage the use of temporal logics for computational time-based reasoning. Following the general outline in Sections 1-4, we have given as an example and a case study the construction of a particular temporal logic TL∆ that is linear in a time coordinate that extends infinitely far into the past and the future. The novelty of the basic structure of TL∆ is that the temporal connectives make a distinction between immediate past and future and the usual general views of "past" and "future". This distinction is made in order to allow the easy expression of transformations that may typically affect data in the course of time. The axiomatization of the connectives of TL∆, and the natural development of its properties that has been summarized in Section 4, have led us to define a model of interpretation for the logic that is comparable to that of Kripke (1963) for modal logic. With the help of this model, we can prove consistency and soundness of TL∆, which allows us to preserve the intrinsic coherence of the knowledge base of our ARCHES system when TL∆ is embedded in ARCHES. In addition, we have given a decision procedure that is based on the TL∆ axiomatic system and that supports information retrieval from the knowledge base, as in Fig. 4, taking into consideration the
temporal constraints. The completeness problem has not been approached in the framework of this study, because the objective of our present research is the representation of time in a knowledge base rather than automatic theorem-proving.

BIBLIOGRAPHY

Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Commun. ACM, 26, 832-843.
Allen, J. F. (1984). Towards a general theory of action and time. Artificial Intelligence, 23, 123-154.
Bruce, B. C. (1972). A model for temporal references and its application in a question answering program. Artificial Intelligence, 3, 1-26.
Chouraqui, E. (1981). Contribution à l'étude théorique de la représentation des connaissances, le système symbolique ARCHES. Thèse de doctorat d'état, Nancy.
Chouraqui, E. (1983). Formal expression of the evolution of knowledge. Proc. Int. Systems Dynamics Conf., MIT.
Chouraqui, E. (1986). Un système formel de caractérisation de l'évolution des connaissances. Proc. Canadian Society for Computational Studies of Intelligence (CSCSI-86), Montréal, pp. 256-261.
Cocchiarella, N. (1966a). Modality within tense logic. J. Symbolic Logic, 31, 690-691.
Cocchiarella, N. (1966b). A completeness theorem of tense logic. J. Symbolic Logic, 31, 688-691.
Farinas del Cerro, L. (1981). Déduction automatique et logique modale. Thèse de doctorat d'état, Paris.
Farinas del Cerro, L. (1986). MOLOG, Manuel d'utilisation. Laboratoire Langages et Systèmes Informatiques, Université Paul Sabatier, Toulouse.
Fusaoka, A., Seki, H. and Takahashi, K. (1983). A description and reasoning of plant controllers in temporal logic. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-83), Karlsruhe, Vol. 1.
Gabbay, D. M. (1972). Tense systems with discrete moments of time. J. Phil. Logic, 1, 35-44.
Hayes, P. J. (1971). A logic of actions. Machine Intelligence 6 (ed. B. Meltzer and D. Michie), pp. 495-520. Edinburgh University Press.
Hughes, G. E. and Cresswell, M. J. (1973). An Introduction to Modal Logic.
Fletcher and Sons, Northwich, UK.
Kandrashina, E. Yu. (1983). Representation of temporal knowledge. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-83), Karlsruhe, Vol. 1, pp. 346-348.
Koyama, T., Kaihara, S., Minimamikawa, T. and Kurokawa, T. (1981). Time-oriented features for medical consultation systems. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-81), Vancouver, Vol. 2, pp. 910-912.
Kripke, S. A. (1963). Semantical considerations on modal logic. Acta Phil. Fennica, 16, 83-94.
Long, W. J. (1983). Reasoning about state from causation and time in medical domain. Proc. American Association for Artificial Intelligence Conf. (AAAI-83), Washington, DC, pp. 251-254.
Long, W. J. and Russ, T. A. (1983). A control structure for time dependent reasoning. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-83), Karlsruhe, Vol. 1, pp. 230-232.
Malik, J. and Binford, T. D. (1983). Reasoning in time and space. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-83), Karlsruhe, Vol. 1, pp. 343-345.
Manna, Z. and Pnueli, A. (1983). Verification of concurrent programs: a temporal proof system. Report STAN-CS-83-967, Dept Computer Science, Stanford Univ.
McCarthy, J. and Hayes, P. J. (1969). Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence 4 (ed. B. Meltzer and D. Michie), pp. 463-502. Edinburgh University Press.
Porte, J. (1965). Recherches sur la théorie générale des systèmes formels. Gauthier-Villars, Paris.
Prior, A. N. (1957). Time and Modality. Clarendon Press, Oxford.
Prior, A. N. (1967). Past, Present and Future. Clarendon Press, Oxford.
Prior, A. N. (1968). Papers on Time and Tense. Clarendon Press, Oxford.
Rescher, N. and Garson, J. (1968). Topological logic. J. Symbolic Logic, 33, 537-548.
Schwind, C. B. (1978). Representing actions by state logic. Proc. Artificial Intelligence and Simulation of Behaviour (AISB/GI), Hamburg, pp. 304-308.
Schwind, C. B. (1983). A completeness proof for a logic of actions. Groupe Représentation et Traitement des Connaissances (GRTC/LISH 127bis), Marseille.
Schwind, C. B. (1984). A Prolog theorem prover for temporal and modal logic. Groupe Représentation et Traitement des Connaissances (GRTC/LISH 386), Marseille.
Vilain, M. B. (1982). A system for reasoning about time. Proc. American Association for Artificial Intelligence Conf. (AAAI-82), Pittsburgh, pp. 197-201.
DISCUSSION

Claudette Testemale: In the first part of the chapter different computer models of time are briefly presented. With regard to models expressing sequences of states, one of the most famous is the situation calculus. A situation is a time interval over which no state of interest changes truth value; it denotes a stable period of time (or state, for short). For instance (T s (location John office)) means "It is true in situation s that John is at his office". An event is something that can happen, like John going to his house, and expresses a transition between states. So we can represent the effect of "John goes back home" by (T (result s (do John (go home))) (location John home)), where "result" builds a new situation. Finally, it can be expressed that in a given situation, a given event actually occurs. One of the limitations of the situation calculus is that it focuses on transitions between discrete states. Moreover, the situation calculus is most useful when there is a single next situation after a given situation; that is, in the deterministic case. In the section dealing with linear-time models with manipulation of time intervals, the work of McDermott (1982) deserves to be mentioned. In this approach, partially ordered sets of states are dealt with. A fact is represented by the set of states where this fact is true, and an event is represented by the set of intervals during which this event happens. Moreover, a date is associated with each state. McDermott has proposed a semantic approach to a temporal logic of dating using a temporal precedence relation defined on the set of dates. This precedence relation satisfies the properties of transitivity, left-linearity (there is only one past), density and continuity. The temporal logic defined by McDermott is most interesting in Artificial Intelligence applications.
The second part of the chapter deals with temporal logics, divided into tense logics and logics of dating. It would be useful to have a semantic characterization of the axioms in the framework of a temporal structure, in terms of a temporal precedence relation and a meaning function. In the rest of the chapter, a particular temporal logic and its application to a symbolic system of knowledge representation are described. This temporal logic belongs to the non-measurable tense logics with discrete moments of time, but the axiomatics has been made suitable for the knowledge-representation application. However, the need for a specific temporal logic is not emphasized enough in the chapter. A possible way of giving a justification could consist in talking about semantics first. The semantics of the evolution of the so-called descriptions is described by the rules of interpretation of the different connectives. Then it is interesting to look for an axiomatization (with inference rules and axioms) of a special kind of formula (cf. tautologies in first-order logic). For instance, the axioms denoted C17, C17', C20 and C20' seem to be fundamental and indispensable in any minimal set of axioms for the system TL∆. Besides, it would have been interesting to get a synthetic view of the differences between the temporal-logic system TL∆ and the temporal logics listed in the second part of the chapter. For instance, among the main characteristics of the system TL∆, we find the use of four temporal connectives (the usual ones, mediate past and mediate future, and two other connectives, immediate past and immediate future), and a specific organization of time where each point of time has a unique successor. The set of points of time may be represented by the set of integers. Then the temporal precedence relation underlying the semantic interpretation is the usual order on the set of integers (less than). This relation satisfies the properties of linearity and transitivity.
Finally, it is worth noting that the usual connectives H (true every time in the past) and G (true every time in the future) are recovered as ¬⊖¬ and ¬⊕¬ respectively. The final comment concerns the property of the relation of deduction in the formal system TL∆. The relation is proved to be valid, but nothing is said about completeness (that is, whether every valid formula is provable). From the axiomatization of the relation ⇒, the proposition ⊕(a ∧ b) ⇒ ⊕a ∧ ⊕b can be established, while the converse cannot. However, it can easily be proved that the following relation is valid:

⊕a ∧ ⊕b ⇒ ⊕(a ∧ b) ∨ ⊕(a ∧ ⊕b) ∨ ⊕(⊕a ∧ b)
Therefore one may ask for a justification for not considering formulae in which the connectives "and" and "or" appear in the scope of ⊕ or ⊖.
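The validity claim above can be spot-checked semantically on a small finite linear model. This is a sketch, not a proof in TL∆: it assumes a reflexive "now or later" reading of ⊕ (suggested by the reflexivity axiom a ⇒ ⊕a) over a bounded window of integer time points, and exhaustively tries every valuation of a and b.

```python
from itertools import product

N = 5                                  # finite window of time points 0..N-1

def future(pred, t):
    """Reflexive 'sometime now or in the future' within the window."""
    return any(pred(u) for u in range(t, N))

def check():
    """Test  ⊕a ∧ ⊕b  ⇒  ⊕(a ∧ b) ∨ ⊕(a ∧ ⊕b) ∨ ⊕(⊕a ∧ b)  at every
    point of every valuation of a and b on the window."""
    for bits in product([False, True], repeat=2 * N):
        a = lambda t: bits[t]
        b = lambda t: bits[N + t]
        for t in range(N):
            lhs = future(a, t) and future(b, t)
            rhs = (future(lambda u: a(u) and b(u), t)
                   or future(lambda u: a(u) and future(b, u), t)
                   or future(lambda u: future(a, u) and b(u), t))
            if lhs and not rhs:
                return False
    return True

print(check())   # -> True
```

The three disjuncts correspond to the three possible orderings of the instants at which a and b hold: simultaneously, a first, or b first.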
L. Farinas del Cerro: When temporal logic is used in computer science, as in other domains, an important problem arises concerning the choice of the underlying model of time that we have in mind. Broadly, two main models can be considered: the linear model of time and the branching model of time (i.e. time is formalized by a total or by a partial order). For example, in Chouraqui's paper the linear approach is taken. Many discussions about the relevance of capturing the time structure of a particular problem appear in the literature. The languages associated with these two models can give, in a first approximation, some ideas about how to compare them. For example, in the branching-time model we can distinguish an event that is possible from an event that is certainly possible; for example "Marie can come tomorrow" and "Marie will come tomorrow". From this example, we can see that branching logic is
more expressive than linear logic. However, Lamport (1980) proves that linear logic and branching logic are not equivalent. He proves that for a reasonable language there is a formula in linear logic that is not equivalent to any formula of branching logic, and on the other hand, there is a formula of branching logic that is not equivalent to any formula of linear logic. This result shows us that the choice of temporal logic must be made using a pragmatic criterion, for example the field of application. Consequently, particular attention must be paid to the justification of the structure of time selected. In this chapter the choice of the linear model is made under the hypothesis of the deterministic nature of time changes.

J. A. Campbell:
There are a considerable number of logics dealing with time, but there is not much agreement on the key questions that the various kinds of temporal logic are expected to answer. The book on this subject by Turner (1984) is a good source for this variety. Chouraqui's classification of actual and possible work on temporal logics in terms of the structure of time is a helpful supplement to Turner's treatment. It is useful to read the two in parallel; surveys of temporal logic are otherwise hard to find. The chapter then continues with an example of a new and specific logic. When any new proposal of this kind appears, the reader needs to see first whether it is a technical development that avoids some of the difficulties of some existing class(es) of temporal logic or whether it is sui generis, introducing new issues or new views of reasoning about time. If it fits the former description then the examination can focus on the technical achievements and the specific connections with the logics that it improves upon. If not, then it can be assessed on the likely importance or usefulness of the new issues that it raises (provided that it passes an inspection for internal technical defects). Chouraqui's case study is difficult to place at first sight, because it is presented as a contribution to the topic of evolution of the contents of a knowledge base in time (which contains sub-topics involving several different kinds of logic). However, most of these sub-fields are just mentioned but otherwise ignored here. This is understandable in a short chapter, but it would have been interesting to see at least an estimation of the relative advantages and disadvantages of TL∆ and time-stamp or dating logics (Section 3.2) for the knowledge-base application. In practice, Chouraqui's contribution is quite narrowly a system in which observations about the past, present and future truth or falsity of logical assertions can be made.
It competes with several existing logical frameworks that deal with unbounded linear time, and can be judged in competition with them. The general lesson of experience with these frameworks is that complicated systems of axioms cause more trouble than they are worth. For example, they complicate the achievement of technically necessary goals such as proofs of completeness. It is not intuitively obvious how to start a proof of completeness in his scheme. It may therefore not be coincidence that even a short report about completeness is not present in the chapter. In this respect, the case-study part of the chapter is more a report of work in progress than a finished survey of an area. Apart from this objection, the technical basis for further development of Chouraqui's logic seems sound, but the motivation for the approach (i.e. the answer to the question "why should somebody use just this approach for reasoning about the evolution of a time-dependent knowledge base, rather than some other one indicated by Turner (1984) or its bibliography?") is not clear. The evident novelty of the logic is
the distinction between two parts (immediate and mediate) of both the past and the future. It is certainly true that many acts of reasoning are made more convenient if one can give special treatment to the immediate past or future (for example, an assertion may be known to be true or false close to the present but have no definite truth-value at more remote times), but informally this intuition is captured only if the two-way split is exclusive. However, in the logic under discussion it is not exclusive: if the past is P and the immediate past is IP, then IP is a subset of P. Consequently, the truth of a in IP requires one to deduce that a is true in P. (Axiom C6' justifies both of these points.) It is hard to see what problems in time-based reasoning are better suited to a logic containing C5, C5', C6 and C6' than one in which the fundamental partitioning of the past deals with (and makes axiomatic distinctions between) IP and P − IP. The same applies to the future, where IF and F − IF have the obvious meanings. When the special features of the logic that refer to IP and IF are removed, the remaining apparatus allows a fairly conventional temporal logical structure to be built. This structure (or, rather, a family of structures in which the non-"immediate" part of the present structure is a member) is summarized well in Chapter 6 of Turner (1984). This reference also contains developments of the basic structure in which intervals of time are primary units of discussion, a natural and welcome development if one is trying to find temporal logics for the formulation of practical examples in artificial intelligence. In fact, in some cases the interval is the basic unit of discussion. Allen's (1981) temporal logic is a good example, which also shows that the issues (e.g. of "naturalness" of axioms) are not all cut and dried: there is still room for research and improvement in interval logics and temporal logics in general.
This issue of improvement is currently a live one, as very recent work by Ladkin (1987) and Allen and Hayes (1987) indicates. The work arises from a development (Allen and Hayes, 1985) of Allen's original logic, and does not have the property of completeness. Ladkin shows that one of the Allen and Hayes axioms is redundant, and adds a new axiom to achieve completeness. There are still open research questions in the area defined by these logics, for example establishing decidability in all instances where this is possible, and finding easy or practical decision procedures. A useful background reference to these logics is the text by van Benthem (1983). Among the more recent kinds of temporal logic that have emerged in response to the pressure of need, it is worth mentioning the scheme Tempura developed by Moszkowski (1986) as simultaneously a logical formalism and a programming language for the description and simulation of time-dependent processes. Although Tempura was first intended for concurrent and parallel computing exercises, it has some features that make it attractive also as an experimental means of describing time-dependent evolution of knowledge. Part of its quality of design is evidently due to the fact that its syntax and all of its semantics have been built up carefully together: a useful aim when one is trying to construct a computationally effective temporal logic without running into both theoretical and practical difficulties. Chouraqui's presentation above does not make it evident that this happened with all the parts of the logic described in his case study.
Reply: TL∆ is a specific temporal logic that we purposely designed for the application we had in mind; that is, to allow representation in the ARCHES system of certain modes of knowledge evolution and to solve the related problems. Hence its elaboration is closely dependent upon the development of ARCHES, and so TL∆ is obviously not the most general feasible temporal language. This situation seems quite sensible methodologically speaking. Indeed, the temporal logics used in Artificial Intelligence
present features inferred from the domains of application and/or the classes of problems to solve (natural-language understanding, knowledge representation, databases, automatic theorem-proving, planning, causal reasoning and theory of actions, etc.). There are many different logical theories permitting reasoning about time that have been developed in Artificial Intelligence, as shown by the extensive bibliography relative to this area, but no particular theory can generally be considered as the best by the scientific community. Each one has useful properties allowing solution of the problem for which it was specifically developed, and thus leaves a lot of open questions. The case study that I have presented in the second part of the chapter stands in this research framework. The general application domain supported by this case study is the representation of knowledge. Of course, TL∆ does not allow solution of all the questions relative to the representation of time. Choices have been made with respect to the class of problems that ARCHES can represent and treat: linear and discrete expression of time, allowing representation of concepts like Yesterday, Tomorrow or Then, representation of state-changing or of the chain of discrete events, and updating of the knowledge base while keeping the history. Thus TL∆ enables reasoning about general time sequences. With this purpose, two new temporal connectives have been defined: the immediate past and future. The semantic properties of these connectives entail the existence and uniqueness of the past (or future) instant (linear time); each is then an instant of the past (or of the future). In contrast with Campbell, I think that this last property is very intuitive: the immediate past (or future) always belongs to the past (or future). Moreover, many works on the discrete representation of time have the same kind of properties: see for example Gabbay (1972), Manna and Pnueli (1983) and Schwind (1983).
Thus TL∆ is a specific non-measurable temporal logic built up from a classical NMTL logic for which the time is transitive, linear and doubly infinite (see Section 2.1). I completely agree with Campbell about how to design a logical system: its syntax and its semantics must be carefully built up together, with the goal of constructing a computationally effective temporal logic. And I do stress that this process has been carried out in the design of TL∆ (see Sections 5 and 6). Indeed, how could it not be so in order to prove the consistency and the soundness of this logic, since these two properties are essential for respecting the intrinsic coherence of the knowledge base of the ARCHES system. Similarly, we have elaborated a decision procedure particularly based upon the axiomatic system of TL∆, which allows the carrying out of information retrieval from the base, taking the temporal constraints into consideration. The completeness problem has not been approached in the framework of this study, since our objective was not automatic theorem-proving, but rather time representation in a knowledge base. In particular, the induction axiom a ∧ ¬⊕¬(a ⇒ +a) ⇒ ¬⊕¬a [...] B(P(x)))}
120
R. C. Moore
and we are asked to decide whether ¬P(C) should follow from these premises. At first, it might appear that it should, since the third premise says that every object with property P is believed to have property P, and only A and B are believed to have property P. But what if C = A or C = B is true? The quantifier in the third premise ranges over objects, not names of objects, so the possibility is open that P(C) is true, because C is believed to have property P under a different name. Rather than ¬P(C), it seems that what we want to be able to infer is P(C) ⊃ (C = A ∨ C = B). It is not immediately clear what is the most general pattern of reasoning of which this is an instance, or how to describe it formally. As a final word on the applications of autoepistemic logic, we shall take note of an area where autoepistemic logic probably ought not to be applied, although it is often confused with autoepistemic reasoning. Non-monotonic reasoning is frequently discussed as if it were a single unified phenomenon and all cases of non-monotonic reasoning should be handled by a single formalism or mechanism. But non-monotonicity is a rather abstract syntactic property of an inference system, and there is no a priori reason to believe that all forms of non-monotonic reasoning should have the same logical basis. In fact, it appears that formalisms better suited to modelling autoepistemic reasoning are often mistakenly applied to the rather different phenomenon of default reasoning. By default reasoning we mean the drawing of plausible inferences from less-than-conclusive evidence in the absence of information to the contrary. The classic example concerns the ability of birds to fly. If we know that Tweety is a bird we shall normally assume, in the absence of evidence to the contrary, that Tweety can fly. If, however, we later learn that Tweety is a penguin, we shall withdraw our prior assumption. This inference is unquestionably non-monotonic, but is it autoepistemic?
It does have an autoepistemic component, which is perhaps the source of the confusion, since we should reflect on whether we have any reason to believe that Tweety cannot fly before inferring that he can, but there is more to it than that. Perhaps the major difference between autoepistemic reasoning and default reasoning is that autoepistemic reasoning is logically valid, but default reasoning is not. If we know that Tweety is a bird then that gives us some evidence that Tweety can fly, but it is not conclusive. In the absence of information to the contrary, however, we are willing to go ahead and tentatively conclude that Tweety can fly; the conclusion is not certain, though, so default reasoning is not a form of valid inference. Consider the belief that lies behind our willingness to infer that Tweety can fly from the fact that Tweety is a bird. It is probably something like "most birds can fly", or "almost all birds can fly", or "a typical bird can fly". To model this kind of reasoning, in a theory whose only premises are "Tweety is a
4
121
Autoepistemic Logic
bird" and "Most birds can fly", we ought to be able to infer (non-monotonically) "Tweety can fly". If this were a form of valid inference then we should be guaranteed that the conclusion is true if the premises are true. This is manifestly not the case. The premises of this inference give us a good reason to draw the conclusion, but not the ironclad guarantee that validity demands. McDermott (1982, p. 33) suggests using a formula equivalent to the following to sanction non-monotonic inferences about birds being able to fly:
∀x((Bird(x) ∧ ¬B(¬Canfly(x))) ⊃ Canfly(x))
McDermott suggests as a gloss of this formula "Most birds can fly", which would indicate that he thinks of the inferences it sanctions as default inferences. But the formula actually says something quite different: "For all x, if x is a bird and it is not believed that x cannot fly, then x can fly." McDermott's formula, then, says that the only birds that cannot fly are the ones that are believed not to fly. If we have a theory whose only premises are this one and an assertion to the effect that Tweety is a bird, then the conclusion that Tweety can fly would be a valid inference. That is, if it is true that Tweety is a bird, and it is true that only birds believed not to fly are in fact unable to fly, and Tweety is not believed not to fly, then it must be true that Tweety can fly. This, then, is a pure autoepistemic inference, not a default inference. To put the problem slightly differently, if we took McDermott's formula as a premise, and we did not have any information about any birds that cannot fly, we should be able to infer that all birds do fly. But this is not reasonable, even as a default inference, if all we know is that most, or almost all, birds fly. Default reasoning and autoepistemic reasoning are both non-monotonic, but for different reasons. Default reasoning is non-monotonic because, to use a term from philosophy, it is defeasible: its conclusions are tentative, so, given better information, they may be withdrawn. Purely autoepistemic reasoning, however, is not defeasible. If one really believes that one already knows all the instances of birds that cannot fly then one cannot consistently hold to that belief and at the same time accept new instances of birds that cannot fly. As Stalnaker (1980) has observed, autoepistemic reasoning is non-monotonic because the meaning of an autoepistemic statement is context-sensitive; it depends on the theory in which the statement is embedded.
The operator B changes its meaning with context just as do indexical words in natural language, such as "I", "here" and "now". The non-monotonicity associated with autoepistemic statements should therefore be no more puzzling than the fact that "I am hungry" can be true when uttered by a particular speaker at a particular time, but false when uttered by a different
speaker at the same time or the same speaker at a different time. So we might say that, whereas default reasoning is non-monotonic because it is defeasible, autoepistemic reasoning is non-monotonic because it is indexical.
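The autoepistemic reading of McDermott's bird formula can be made concrete with a small guess-and-check sketch. This is an illustrative propositional rendering, not Moore's own procedure: the atom names and the rule encoding are assumptions for the example. A guess of what is believed is kept only if closing the premises under ordinary consequence, with that guess fixed, yields back exactly the guessed belief set.

```python
from itertools import chain, combinations

# Premises, propositionalized for a single bird: "bird" is a fact, and the
# rule encodes "if bird and it is not believed that not_fly, then fly".
FACTS = {"bird"}
RULES = [({"bird"}, {"not_fly"}, "fly")]   # (conditions, atoms that must NOT be believed, conclusion)

def consequences(believed):
    """Ordinary closure of the facts under the rules, given a guessed belief set."""
    derived = set(FACTS)
    changed = True
    while changed:
        changed = False
        for conds, blocked, concl in RULES:
            if conds <= derived and not (blocked & believed) and concl not in derived:
                derived.add(concl)
                changed = True
    return derived

def stable_expansions(atoms):
    """A guess is stable when what is derived is exactly what was guessed believed."""
    out = []
    for guess in chain.from_iterable(combinations(sorted(atoms), r)
                                     for r in range(len(atoms) + 1)):
        guess = set(guess)
        if consequences(guess) == guess:
            out.append(guess)
    return out

print([sorted(e) for e in stable_expansions({"bird", "fly", "not_fly"})])   # -> [['bird', 'fly']]
```

The only stable guess contains fly: relative to McDermott's formula the inference is valid, which is exactly the point that it behaves as an autoepistemic inference rather than a defeasible default.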
4 RELATED LOGICS
Autoepistemic logic is closely related to the logic of knowledge and ignorance of Halpern and Moses (1984), the chief difference being that theirs is a logic of knowledge rather than belief. Levesque (1981) has also developed a kind of autoepistemic logic, but in his system the agent's premises are restricted to a sub-language that makes no reference to what he believes. Autoepistemic logic is most closely related, however, to the non-monotonic logics of McDermott and Doyle (1980) and McDermott (1982). In fact, it was designed to be a reconstruction of these logics to avoid some of their difficulties. In the first logic that they define, McDermott and Doyle (1980) introduce an operator M, with formulae of the form M(P) being read informally as "P is consistent". Their logic, however, gives such a weak notion of consistency that, as they point out, M(P) is not inconsistent with ¬P. That is, it is possible for a theory to assert simultaneously that P is consistent with the theory and that P is false, without there being a formal contradiction. McDermott subsequently (1982) tried strengthening non-monotonic logic by developing non-monotonic modal logics based on T, S4 and S5. He discovered, however, that the most plausible candidate for formalizing the notion of consistency that he wanted, non-monotonic S5, collapses to ordinary S5 and is therefore monotonic. The reasons for these problems are readily apparent if we compare McDermott and Doyle's logics to autoepistemic logic. To make the comparison, we shall interpret B as the dual of M. That is, M(P) is taken as an abbreviation for ¬B(¬P). In other words, P is said to be consistent if ¬P is not believed. Since we suppose we are dealing with ideally rational agents, this seems appropriate.
On this interpretation, McDermott and Doyle's first logic is very similar to our autoepistemic logic, with one glaring exception: its specification includes nothing corresponding to Stalnaker's Condition 2 (if P ∈ T then B(P) ∈ T). McDermott and Doyle define the non-monotonic fixed points of a set of premises A, corresponding to our stable expansions of A. Their definition is equivalent to the following:
T is a fixed point of A if and only if T is the set of ordinary logical consequences of A ∪ {¬B(P) | P ∉ T}.

Compare this with our definition of a stable expansion of A:
4
Autoepistemic Logic
123
T is a stable expansion of A if and only if T is the set of ordinary logical consequences of A ∪ {B(P) | P ∈ T} ∪ {¬B(P) | P ∉ T}.

In McDermott and Doyle's non-monotonic logic, {B(P) | P ∈ T} is missing from the "base" of the fixed points. This makes it possible for there to be non-monotonic theories with fixed points that contain P but not B(P). So, under an autoepistemic interpretation of B, McDermott and Doyle's agents are omniscient as to what they do not believe, but they may know nothing about what they do believe. This explains essentially all the peculiarities of McDermott and Doyle's original logic. For instance, they note (1980, p. 69) that M(C) does not follow from M(C ∧ D). Changing the modality to B, this is equivalent to saying that ¬B(P) does not follow from ¬B(P ∨ Q). The problem is that, lacking the ability to infer B(P) from P, non-monotonic logic permits interpretations of B that are more restricted than simple belief. Suppose that we interpret B as "inferable in n or fewer steps" for some particular n. P might be inferable in exactly n steps, and P ∨ Q in n + 1. According to this interpretation, ¬B(P ∨ Q) would be true and ¬B(P) would be false. Since this interpretation of B is consistent with McDermott and Doyle's definition of a fixed point, ¬B(P) does not follow from ¬B(P ∨ Q). The other example of this kind noted by McDermott and Doyle is that {M(C), ¬C} has a consistent fixed point, which amounts to saying simultaneously that P is consistent with everything asserted and that P is false. But this set of premises is equivalent to {¬B(P), P}, which would have no consistent fixed points if B(P) were forced to be in every fixed point that contains P. On the other hand, McDermott and Doyle consider it to be a problem that the set of premises {M(C) ⊃ D, ¬D} has no fixed point in their logic. Restated in terms of B, this set of premises is equivalent to {P ⊃ B(Q), P}.
Every stable autoepistemic theory containing these premises will also contain Q. (If such a theory is consistent then, being closed under ordinary logical consequence, it will contain B(Q), and therefore must contain Q to avoid containing ¬B(Q). On the other hand, an inconsistent autoepistemic theory will contain Q because it contains everything.) But Q is not contained in any theory grounded in the premises {P ⊃ B(Q), P}; it is possible for P ⊃ B(Q) and P both to be true with respect to an agent while Q is false. So there is no stable expansion of {P ⊃ B(Q), P} in autoepistemic logic; hence this set of premises cannot be the foundation of an appropriate set of beliefs for an ideally rational agent. Thus our analysis justifies non-monotonic logic in this case, contrary to the intuition of McDermott and Doyle. McDermott and Doyle recognized the weakness of the original formulation of non-monotonic logic, and McDermott (1982) went on to develop a group of theories that are stronger because they are based on modal logic
124
R. C. Moore
rather than classical logic. McDermott's non-monotonic modal theories alter the logic in two ways. First, the definition of fixed point is changed to be equivalent to

T is a fixed point of A only if T is the set of modal consequences of A ∪ {¬B(P) | P ∉ T},
where "modal consequence" means that P ⊢ B(P) is used as an additional inference rule. Secondly, McDermott considers only theories that include as premises the axioms of one of the standard logics T, S4 and S5. Merely changing the definition of fixed point brings McDermott's logic much closer to autoepistemic logic. In particular, adding P ⊢ B(P) as an inference rule means that all modal fixed points of A are stable expansions of A. However, adding P ⊢ B(P) as an inference rule, rather than adding {B(P) | P ∈ T} to the base of T, has as a consequence that not all stable expansions of A are modal fixed points of A. The difference is that, in autoepistemic logic, if P can be derived from B(P) then both can be in a stable expansion of the premises, whereas in McDermott's logic there must be a derivation of P that does not rely on B(P). Thus, although in autoepistemic logic there is a stable expansion of {B(P) ⊃ P} that includes P, in McDermott's logic there is no modal fixed point of {B(P) ⊃ P} that includes P. It is as if, in autoepistemic logic, one can acquire the belief that P and justify it later by the premise that if P is believed then it is true. In non-monotonic logic, however, the justification of P has to precede belief in B(P). This makes the interpretation of B in non-monotonic modal logic more like "justified belief" than simple belief. Since we have already shown that autoepistemic logic requires no specific axioms to model ideal autoepistemic reasoning, we might wonder what purpose is served by McDermott's second modification of non-monotonic logic, the addition of the axioms of various modal logics. The most plausible answer is that, besides behaving in accordance with the principles of autoepistemic logic, an ideally rational agent might well be expected to know what some of those principles are. For instance, the modal logic T has all instances of the schema B(P ⊃ Q) ⊃ (B(P) ⊃ B(Q)) as axioms.
This says that the agent's beliefs are closed under modus ponens, which is true for an ideally rational agent, so he might as well believe it. S4 adds the schema B(P) ⊃ B(B(P)), which means that if the agent believes P then he believes that he believes it (Stalnaker's Condition 2). S5 adds the schema ¬B(P) ⊃ B(¬B(P)), which means that if the agent does not believe P then he believes that he does not believe it (Stalnaker's Condition 3). Since all these formulae are always true with respect to any ideally rational agent, it seems plausible to expect him to adopt them as premises. Thus S5 seems to be the most plausible candidate of the non-monotonic logics as a model of
autoepistemic reasoning. Unfortunately, non-monotonic S5 turns out to be equivalent to ordinary S5. The problem is that all of these logics also contain the schema B(P) ⊃ P, which means that if the agent believes P then P is true; but this is not generally true, even for ideally rational agents.† It turns out that B(P) ⊃ P will always be contained in any stable autoepistemic theory (that is, ideally rational agents always believe that their beliefs are true), but making it a premise allows beliefs to be grounded that otherwise would not be. As a premise, the schema B(P) ⊃ P can itself be justification for believing P, while as a "theorem" it must be derived from ¬B(P), in which case P is not believed, or from P, in which case P must be independently justified, or from some other grounded formulae. In any case, as a premise schema, B(P) ⊃ P can sanction any belief whatsoever in autoepistemic logic. This is not generally true in modal non-monotonic logic, as we have also seen, but it is true in non-monotonic S5. The S5 axiom schema ¬B(P) ⊃ B(¬B(P)) embodies enough of the model theory of autoepistemic logic to allow B(P) to be "self-grounding": the schema ¬B(P) ⊃ B(¬B(P)) is equivalent to the schema ¬B(¬B(P)) ⊃ B(P), which allows B(P) to be justified by the fact that its negation is not believed. This inference is never in danger of being falsified, but, from this and B(P) ⊃ P, we obtain an unwarranted justification for believing P. The collapse of non-monotonic S5 into monotonic S5 follows immediately. Since B(P) ⊃ P can be used to justify belief in any formula at all, there are no formulae that are absent from every fixed point of theories based on non-monotonic S5.
It follows that there are no formulae of the form ¬B(P) that are contained in every fixed point of theories based on non-monotonic S5; hence there are no theorems of the form ¬B(P) in any theory based on non-monotonic S5, because McDermott takes the theorems of a theory to be the intersection of all the fixed points. Since these formulae are just the ones that would be produced by non-monotonic inference, non-monotonic S5 collapses to monotonic S5. In more informal terms, an agent who assumes that he is infallible is liable to believe anything, so an outside observer can conclude nothing about what he does not believe. The real problem with non-monotonic S5, then, is not the S5 schema; therefore McDermott's rather unmotivated suggestion to drop back to non-monotonic S4 (1982, p. 45) is not the answer.

† B(P) ⊃ P would be an appropriate axiom schema if the interpretation of B(P) were "P is known" rather than "P is believed", but that notion is not non-monotonic. An agent cannot, in general, know when he does not know P, because he might believe P, leading him to believe that he knows P, while P is in fact false. Since agents are unable to reflect directly on what they do not know (only on what they do not believe), an autoepistemic logic of knowledge alone would not be a non-monotonic logic; rather, the appropriate logic would seem to be monotonic S4.

The S5 schema merely makes
explicit the consequences of adopting B(P) ⊃ P as a premise schema that are implicit in the logic's natural semantics. If we want to base non-monotonic logic on a modal logic then the obvious solution is to drop back, not to S4, but to what Stalnaker (1980) calls "weak S5", that is, S5 without B(P) ⊃ P, or K45 in Chellas's (1980) terminology. It is much better motivated, and moreover has the advantage of actually being non-monotonic. In autoepistemic logic, however, even this much is unnecessary. Adopting any of the axioms of weak S5 as premises makes no difference to what can be derived. The key fact is the following (Moore, 1985, Theorem 4.1).
Theorem 6  If P is true in every autoepistemic interpretation of T, then T is grounded in A ∪ {P} if and only if T is grounded in A.

An immediate corollary of this result is that if P is true in every autoepistemic interpretation of T then T is a stable expansion of A ∪ {P} if and only if T is a stable expansion of A. Since the modal axiom schemata of weak S5,

B(P ⊃ Q) ⊃ (B(P) ⊃ B(Q))
B(P) ⊃ B(B(P))
¬B(P) ⊃ B(¬B(P))

simply state Stalnaker's Conditions 1-3, all their instances are true in every autoepistemic interpretation of any stable autoepistemic theory. The non-modal axioms of weak S5 are just the valid formulae of ordinary logic, so they are true in every interpretation (autoepistemic or otherwise) of any autoepistemic theory (stable or otherwise). Therefore it immediately follows from Theorem 6 that a set of premises containing any of the axioms of weak S5 will have exactly the same stable expansions as the corresponding set of premises without any weak-S5 axioms.
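To make the stable-expansion condition of this chapter concrete, here is a small illustrative sketch (mine, not Moore's) that checks candidate stable expansions for a Tweety-style default. It freezes each modal atom B(φ) occurring in the premises into an ordinary propositional atom (the name `B_notFlies` is my own encoding), guesses which modal atoms are believed, and tests stability by classical truth-table entailment. Restricting attention to the modal atoms occurring in the premises is a simplification that is adequate only for such quantifier-free, non-nested examples.

```python
from itertools import product

# Formulas are nested tuples; the modal atom B(not Flies) is frozen into
# the ordinary atom "B_notFlies", so classical entailment is all we need.
def ev(f, v):
    op = f[0]
    if op == "atom": return v[f[1]]
    if op == "not":  return not ev(f[1], v)
    if op == "and":  return ev(f[1], v) and ev(f[2], v)
    if op == "imp":  return (not ev(f[1], v)) or ev(f[2], v)

def atoms(f):
    return {f[1]} if f[0] == "atom" else set().union(*(atoms(g) for g in f[1:]))

def entails(premises, goal):
    """Every valuation satisfying all the premises also satisfies the goal."""
    names = sorted(atoms(goal).union(*(atoms(p) for p in premises)))
    for bits in product([False, True], repeat=len(names)):
        v = dict(zip(names, bits))
        if all(ev(p, v) for p in premises) and not ev(goal, v):
            return False
    return True

# Premises A: Bird, and the default (Bird & not B(not Flies)) -> Flies.
A = [("atom", "Bird"),
     ("imp", ("and", ("atom", "Bird"), ("not", ("atom", "B_notFlies"))),
             ("atom", "Flies"))]
modal = {"B_notFlies": ("not", ("atom", "Flies"))}  # B_notFlies abbreviates B(not Flies)

# Guess which modal atoms are believed, then test the stability condition:
# B(phi) is in the guess exactly when phi follows from A plus the guess.
results = {}
for believed in (frozenset(), frozenset({"B_notFlies"})):
    base = A + [("atom", m) if m in believed else ("not", ("atom", m))
                for m in modal]
    results[believed] = all((m in believed) == entails(base, modal[m])
                            for m in modal)

for believed, ok in results.items():
    print(sorted(believed), "stable" if ok else "not stable")
```

With nothing believed, the default fires and Flies is derived, and the guess is stable; assuming B(¬Flies) is unstable because nothing supports it. This mirrors the unique stable expansion of the Tweety default discussed earlier in the chapter.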
ACKNOWLEDGMENTS This research was supported in part by the US Air Force Office of Scientific Research under Contract F49620-82-K-0031. It was also made possible in part by a gift from the Systems Development Foundation.
BIBLIOGRAPHY

Chellas, B. F. (1980). Modal Logic: An Introduction. Cambridge University Press. (This is one of the main textbooks on modal logic.)
Halpern, J. Y. and Moses, Y. (1984). Towards a theory of knowledge and ignorance: preliminary report. Proc. Workshop on Non-Monotonic Reasoning, Mohonk Mountain House, New Paltz, New York, pp. 125-143. American Association for Artificial Intelligence, Menlo Park, California; reprinted in Logics and Models of Concurrent Systems (ed. K. Apt), pp. 459-476. Springer-Verlag, Berlin (1985). (This paper gives a treatment of the locution "all that I know is P", which is very close to autoepistemic logic. The intended applications are to distributed computing.)
Kripke, S. A. (1971). Semantical considerations on modal logic. Reference and Modality (ed. L. Linsky), pp. 63-72. Oxford University Press. (This is an easily understandable presentation of the formal semantics of modal logic by the most important contributor to that field.)
Levesque, H. J. (1981). The interaction with incomplete knowledge bases: a formal treatment. Proc. 7th Int. Joint Conf. on Artificial Intelligence, Vancouver, pp. 240-245. William Kaufmann, Los Altos, California. (Levesque's goal was to characterize knowledge bases that can answer questions about what they know. The formalism does not permit the knowledge base to make explicit reference to what it knows, however, so that facts such as "if P were true, I would know it" are not representable.)
McDermott, D. and Doyle, J. (1980). Non-monotonic logic I. Artificial Intelligence 13, 41-72. (See Section 4 for a detailed discussion.)
McDermott, D. (1982). Nonmonotonic logic II: Nonmonotonic modal theories. J. ACM 29, 33-57. (See Section 4 for a detailed discussion.)
Minsky, M. (1974). A framework for representing knowledge. MIT Artificial Intelligence Laboratory, AIM-306; reprinted in Mind Design (ed. J. Haugeland), pp. 95-128. MIT Press, Cambridge, Mass. (1981). (This is the paper that initially raised awareness of the problem posed by non-monotonicity for modelling commonsense reasoning as logical inference.
Minsky seems to have thought this was a crushing argument against the use of logic, but the response was the invention of logics that are non-monotonic.)
Moore, R. C. (1984). Possible-world semantics for autoepistemic logic. Proc. Workshop on Non-Monotonic Reasoning, Mohonk Mountain House, New Paltz, New York, pp. 344-354; also SRI Artificial Intelligence Center Technical Note 337, SRI
International, Menlo Park, California (August 1984). (The contents of this paper are summarized in Section 2.2.)
Moore, R. C. (1985). Semantical considerations on nonmonotonic logic. Artificial Intelligence 25, 75-94. (This is the original paper on autoepistemic logic, with the main emphasis on diagnosing the problems of McDermott and Doyle's non-monotonic logics.)
Stalnaker, R. (1980). A note on non-monotonic modal logic. Dept. Philosophy, Cornell Univ. (This originated as a commentary on McDermott (1982). It did much to point the way toward autoepistemic logic, but was unfortunately never published.)
DISCUSSION

Jean Fargues: In this discussion we shall use the notation □p instead of Bp, because we shall refer to the traditional modal systems S4 and S5. The autoepistemic logic interpretation introduced by Moore in this chapter is a very seductive one, in the sense that it tries to avoid some of the problems encountered
by previous workers in this field. In that sense, it is a major contribution to the representation problem of deduction on the beliefs of rational agents, but the interest of that work goes beyond the autoepistemic formalism because it is related to all the other work on non-monotonic logics. Thus this chapter is very relevant to current research. Its great interest consists in the new and original way in which the author introduces the "belief" logic in the autoepistemic interpretation. Consequently, it is difficult to relate the proposed formalism to the other modal logics and to the traditional way of presenting an interpretation of a modal logic system. Thus it is not easy for the author to relate his discussion to our intuitive notions or to recent work on non-monotonic logics. I am sure that the reader will feel that the prime concern of the author was to be didactic, although the subject should be presented with the usual rigour, as the author did. This discussion should, strictly speaking, start with a survey of the belief logics, or of modal logic. This is not necessary here because the last two sections of the article are a good survey by themselves. Another interesting survey has been produced by Hanks and McDermott (1986), McDermott being one of the main contributors in the related non-monotonic logic field. I shall not compare here all the formalisms proposed for representing and manipulating the "beliefs" of a rational agent, because this productive comparison should be one of the benefits that a reader can gain from reading several chapters of this book. (Another reason is that the conclusion of such a comparison depends on the point of view adopted: a philosophical one versus a practical one, an atomistic one versus a continuous one, and so on.) Thus I should like to focus the discussion on two points that may be found difficult if one starts from the well-known notions that are the basis of classical logic.
These points are: the notion of deducibility in autoepistemic logic; the notion of consistency in autoepistemic logic. When I read the chapter, it appeared to me that a major difficulty was understanding what could be the notion of deduction underlying the characterization of a stable autoepistemic theory. In classical logic the deducibility relation ⊢ can be viewed as a fixed-point operator, i.e. if TH(A) = {x : A ⊢ x} is the deducibility operator (defining the set of theorems derivable from a set A of formulae) then the monotonicity meta-axiom may be written as

∀X, ∀Y, X ⊆ Y ⇒ TH(X) ⊆ TH(Y)
Furthermore, it implies that the TH operator is a fixed-point operator, i.e. that TH(A) = TH(TH(A)). Thus there is a unique set of formulae that we can derive from the axioms, i.e. a unique set of theorems. In the case of non-monotonic logic, the uniqueness of the set TH(A) of theorems derivable as a fixed point of the deducibility operator is not necessarily satisfied. For this reason, it is only possible to characterize a stable expansion of an autoepistemic theory by a non-iterative statement:
T = TH(A ∪ {□p | p ∈ T} ∪ {¬□p | p ∉ T})

This is the first difficulty, because this "holistic" definition cannot lead to an iterative definition such as
T0 = A
Tn+1 = TH(Tn ∪ {□p | p ∈ Tn} ∪ {¬□p | p ∉ Tn})
because of the possibility of going back, at the (n+1)th step, on some inference done at the jth step, j ≤ n. An important consequence is that the proof methods for autoepistemic logic cannot be envisioned as the standard mechanized proofs that we know for classical logic or even for traditional modal logics (S4 or S5). The proof method detailed in this chapter is an interesting contribution to deduction in autoepistemic logic. But, for the above reason, it is in one sense non-constructive, as opposed to proof methods like resolution methods, or semantic-array-based methods. I could also discuss the fact that it seems less deterministic than the known methods, because of the non-iterative definition of the stable expansion of an autoepistemic theory. Thus the gap between the theoretical results on autoepistemic logic and the effective implementation of a proof procedure in a computer today seems difficult to reduce. I think that future work, starting from Moore's results, could provide an easier way to apply the autoepistemic formalism to real application domains. My second remark concerns the relativity of the consistency definition when we consider non-monotonic logics. I should like to emphasize that a naive point of view about the consistency notion could lead the reader to encounter some paradoxical results. In fact, when we enter this "non-monotonic-logic world", we must be aware that the most obvious definitions that we learned in classical logic may be related to monotonicity, and might not be valid in autoepistemic logic, for example. Thus let us consider the consistency definition in classical logic. A set of formulae X is inconsistent if and only if any of the following conditions holds:

1. ∃p such that X ⊢ p and X ⊢ ¬p;
2. ∀p, X ⊢ p;
3. X ⊢ f, where f is a distinguished formula (the truth-value falsehood), the negation ¬p being defined as the formula p ⊃ f.

These three conditions may be proved to be equivalent, but the proof that I know uses the monotonicity property, for example for (2) ⇒ (1) or for (2) ⇒ (3), but not for (1) ⇒ (2) or for (3) ⇒ (1). Thus I should like to know if there is some relationship between the consistency notion in autoepistemic logic and the fact that it is non-monotonic. Another question concerns the serial property of the accessibility relation between worlds (there always exists a world accessible from a given world). The serial property implies that {□p, □¬p} is inconsistent. Is it inconsistent in autoepistemic logic too?
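The two classical properties Fargues appeals to, monotonicity and TH(TH(A)) = TH(A), can be checked mechanically on a toy propositional language. The sketch below is my own illustration (not part of the discussion); TH is restricted to a small fixed universe of formulas so that it is computable by truth tables.

```python
from itertools import product

ATOMS = ("p", "q")
VALUATIONS = [dict(zip(ATOMS, bits))
              for bits in product([False, True], repeat=len(ATOMS))]

# A small closed universe of formulas, each given as (name, truth-function).
UNIVERSE = [
    ("p",   lambda v: v["p"]),
    ("q",   lambda v: v["q"]),
    ("p>q", lambda v: (not v["p"]) or v["q"]),
    ("p&q", lambda v: v["p"] and v["q"]),
    ("p|q", lambda v: v["p"] or v["q"]),
]
FN = dict(UNIVERSE)

def TH(A):
    """Classical consequences of A, restricted to the finite universe."""
    models = [v for v in VALUATIONS if all(FN[a](v) for a in A)]
    return frozenset(name for name, fn in UNIVERSE
                     if all(fn(v) for v in models))

X, Y = frozenset({"p"}), frozenset({"p", "p>q"})
print(sorted(TH(X)), sorted(TH(Y)))
assert X <= Y and TH(X) <= TH(Y)   # monotonicity: X ⊆ Y ⇒ TH(X) ⊆ TH(Y)
assert TH(TH(Y)) == TH(Y)          # fixed point: TH(TH(A)) = TH(A)
```

In a non-monotonic setting both assertions can fail, which is exactly why a stable expansion has to be characterized by the holistic fixed-point statement rather than by an iteration.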
Peter Jackson: First of all, I have a request for clarification. What is in T? There is a reading of Moore's paper that seems to allow every proposition p in an autoepistemic theory T to be implicitly in the scope of the belief operator B. After all, T is meant to be a theory about what an agent believes, and not a theory about the world. An autoepistemic interpretation of T ought to be a model of T only if every proposition in T is a true statement of the agent's beliefs, regardless of whether the agent's beliefs are correct. This is how I interpret Definition 2, after reflection, although my original reading of it required that unmodalized formulae in T be true of the world. If every proposition in T is implicitly in the scope of B then the semantics that Moore proposes for B, in terms of a complete S5 Kripke structure, appears to follow
immediately, without the intricacies of Definitions 8 and 9 and Theorem 4. This is because Stalnaker's conditions define a modal logic as strong as K45, therefore containing K5, and it is not hard to show that ⊢K5 □α iff ⊢S5 α (see e.g. Chellas, 1980, p. 142). Moore's statement that every autoepistemic theory will contain instances of S5 schemata such as Bα ⊃ α tends to confirm this reading of Definition 2, and it is also consistent with the secondary reflexiveness of the accessibility relation implied by the K5 theorem □(□α ⊃ α). If everything in T isn't implicitly in the scope of B then one would want to add seriality to Stalnaker's conditions in order to maintain Moore's semantics. Otherwise, there is a Kripke counter-model where {w} is the set of worlds and both Bp and B¬p can be true at w. In other words, it seems that if Bp is in T, then ¬B¬p must also be in T, if the beliefs of our rational agent are to be consistent. From an axiomatic point of view, this is like adding the doxastic equivalent of □α ⊃ ¬□¬α to weak S5. The resulting logic would then be as strong as KD45, which is widely regarded as an appropriate logic of belief (e.g. Halpern and Moses, 1985; Konolige, 1985).
Paradoxes of strict implication. Secondly, I should like to ask Moore if he is happy with the following theorems of autoepistemic logic:

B(Bp ⊃ Bq) ∨ B(Bq ⊃ Bp)
Bp ⊃ B(q ⊃ p)

which have prompted other researchers in doxastic logic (e.g. Levesque, 1984) to adopt alternatives to possible-world semantics.

How could it be used? Thirdly, I should like to express some reservations about the non-monotonic aspects of autoepistemic logic. Under what conditions would you ever want to enumerate all the stable expansions of an autoepistemic theory (even where it was possible in principle to do so)? There is a lack of convincing examples here. Also, the restriction on quantifying into modal contexts, necessitated by the undecidability of even the monadic modal predicate calculus, must severely curtail the practical utility of any system that attempts to apply autoepistemic logic to problems requiring non-monotonic patterns of reasoning. As the author himself states, the ability to quantify in is required by many useful formulae that we might wish to add as proper axioms to an autoepistemic theory, e.g. ∀x(P(x) ⊃ B(P(x))).
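Jackson's one-world counter-model can be checked directly with a tiny Kripke-style evaluator (an illustrative sketch of mine, not part of the discussion): with an empty accessibility relation, Bp and B¬p are both vacuously true at w, whereas making the relation serial rules out their conjunction.

```python
# Minimal Kripke-model evaluator for one modality B ("box"), read as
# truth of the embedded formula in all accessible worlds.
def holds(f, w, R, V):
    op = f[0]
    if op == "atom": return w in V[f[1]]
    if op == "not":  return not holds(f[1], w, R, V)
    if op == "box":  return all(holds(f[1], u, R, V) for u in R[w])

Bp     = ("box", ("atom", "p"))
B_notp = ("box", ("not", ("atom", "p")))

R_empty  = {"w": set()}    # Jackson's counter-model: w sees no world
R_serial = {"w": {"w"}}    # a serial variant: w sees itself
V = {"p": {"w"}}           # p true at w (irrelevant when R is empty)

# Both beliefs hold vacuously in the non-serial model...
print(holds(Bp, "w", R_empty, V), holds(B_notp, "w", R_empty, V))
# ...but seriality blocks Bp together with B(not p).
print(holds(Bp, "w", R_serial, V) and holds(B_notp, "w", R_serial, V))
```

This is exactly the force of the D axiom (seriality): it is what separates KD45, where Bp and B¬p cannot hold together, from K45.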
Philippe Smets: (i) In his introduction, Moore says that "if P is not a consequence of a set of premises A then P is not believed". This does not fit with the commonsense meaning given to "belief". I can think of a situation where P is not a consequence of a set of premises but nevertheless I believe P. Moore's interpretation of "belief" is some idealized concept. But why does he restrict belief to "sound belief" ("justified belief")? Why does he require that "all the agent's beliefs should be grounded in his premises"? It seems that Moore's belief is knowledge in a world where all the premises would be true; this world might of course be different from the actual world. If the premises were true in the actual world then Moore's belief could be plain knowledge. (ii) Could we interpret "ideally rational agent" (see Section 2.2) by claiming that "ideal" means completeness (deductive closure of T) and "rational" means soundness (believe only what can be deduced from the premises)? (iii) Which modal system could fit doxastic logic? K45 logic (also called weak S5 in Moore's chapter) or KD45 (as mentioned in Jackson's comments) seem to us inadequate as they contain the negative introspection axiom 5 (¬BP ⊃ B¬BP). The systems KD4 or KD4! (i.e. K: B(P ⊃ Q) ⊃ (BP ⊃ BQ), D: BP ⊃ ¬B¬P, 4:
BP ⊃ BBP, !: BBP ⊃ BP; see Chellas, 1980, p. 155) seem to us more appropriate for doxastic logic, where a Peircean definition is given to "belief" (I believe A means that I would act as if A were true). In order to explore what kind of modal logic belief functions are extensions of, we consider belief functions with their range reduced to 0 and 1 (so each belief function has only one focal proposition; see Section 3.1 in Chapter 9). One finds KD4!, but not KD45, as axiom 5 is not satisfied. Let us take a space X with only two propositions, P and ¬P. There are three ways to allocate the (here unique) mass on X. Should P = "God exists" and ¬P = "God does not exist", then the three possible allocations correspond to those characterizing a = a deist (BP), b = an atheist (B¬P) and c = an agnostic (?P, where ?P is a shorthand for ¬BP ∧ ¬B¬P), where B is the modal operator "I believe that". Let Y = {a, b, c}. Equation (6.1) of Chapter 9 (see page 272), when B = Y, gives for x ⊆ X and y ∈ Y
bel_X(x | Y) = Σ_{z ⊆ Y} m_0(z) · min_{y ∈ z} bel_X(x | y)
where bel_X(x | y) is the degree of belief that x is true given that y ∈ Y, and bel_X(x | Y) is the degree of belief that x is true given an a priori belief function bel_0 defined on Y with mass m_0. Table 1 shows bel_X(x | y) for x = P, ¬P and P ∨ ¬P and y ∈ Y = {a, b, c}.
Table 1  Values of bel_X(x | y)

x          Deist a       Atheist b     Agnostic c
           bel_X(·|a)    bel_X(·|b)    bel_X(·|c)
P               1             0             0
¬P              0             1             0
P ∨ ¬P          1             1             1
                BP            B¬P           ?P
Table 2 shows the seven possible allocations of the unique mass by the a priori belief functions defined on Y. Thus the column a ∨ b corresponds to the case where I only believe that I am either a deist or an atheist, i.e. I only believe that I am not an agnostic. The modal translation renders my a priori belief with modal operators, so that "I only believe that I am not an agnostic" is translated into B(a ∨ b) = B¬?P. At the bottom of the table the modal translation is the translation of bel_X(x | Y) computed with (6.1) given above. One must note that BBP and BP are in the same column; therefore one has BP ≡ BBP, so axiom 4! is satisfied. But when ¬BP is true, one does not necessarily have B¬BP. ¬BP is true in all a priori allocations except the first one, but B¬BP is true only in one of the six a priori allocations compatible with ¬BP. So axiom 5 (¬BP ⊃ B¬BP) is not satisfied. Therefore belief functions are a generalization of KD4! but not of KD45, when degrees of belief can take values in the interval [0, 1] and not only in {0, 1}.
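The bottom row of Table 2 can be recomputed mechanically. The sketch below is my own illustration; it reads equation (6.1), in the {0, 1} case with a single focal set carrying all the a priori mass, as a minimum over the members of that set, and the names `bel` and `modal_translation` are mine.

```python
# bel_X(x | y) for the three single-minded agents, i.e. the Table 1 values.
bel = {
    "a": {"P": 1, "notP": 0, "taut": 1},   # deist:    BP
    "b": {"P": 0, "notP": 1, "taut": 1},   # atheist:  B(notP)
    "c": {"P": 0, "notP": 0, "taut": 1},   # agnostic: ?P
}

def bel_given_Y(x, focal):
    """bel_X(x | Y) when the a priori mass m_0 puts all its weight on the
    focal set `focal`; the sum in (6.1) then reduces to a single min term
    over the members of that set (my reading of the formula)."""
    return min(bel[y][x] for y in focal)

def modal_translation(focal):
    if bel_given_Y("P", focal) == 1:
        return "BP"
    if bel_given_Y("notP", focal) == 1:
        return "B notP"
    return "?P"

# The seven possible focal sets, written as strings of their members.
for focal in ("a", "b", "c", "ab", "ac", "bc", "abc"):
    print(focal, modal_translation(focal))
```

Only the singleton allocations a and b yield a definite belief; every allocation whose focal set mixes agents collapses to ?P, which is the point of the bottom row of Table 2.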
Table 2  Values of bel_X(x | Y) for x in X for each of the seven possible a priori belief functions on Y. ??P is shorthand for B(a ∨ b ∨ c), the case of total ignorance on Y

Focal element for
which m_0(·) = 1:   a      b      c      a∨b     a∨c     b∨c     a∨b∨c
Modal translation:  B(a)   B(b)   B(c)   B(a∨b)  B(a∨c)  B(b∨c)  B(a∨b∨c)
                    BBP    BB¬P   B?P    B¬?P    B¬B¬P   B¬BP    ??P

x
P                   1      0      0      0       0       0       0
¬P                  0      1      0      0       0       0       0
P ∨ ¬P              1      1      1      1       1       1       1
Modal translation:  BP     B¬P    ?P     ?P      ?P      ?P      ?P
E. Gregoire: I should like to give an additional comment on the applicability of Moore's autoepistemic logic (AEL) and on the epistemology of logics for introspective reasoning. AEL allows the modelling of the beliefs that an ideally rational introspective reasoner is entitled to hold on the basis of an initial set of premises. As such, AEL is a competence model of reflection upon one's beliefs. Mitigated computational results and limited applicability have been presented by Moore. One obvious reason for these defects that I should like to stress is the ideal nature of the kind of reasoning modelled by AEL, which may assume unbounded computational resources. Logical completeness is assumed for such a mode of reasoning. AEL is thus an adequate model when all the possible logical consequences of an initial set of premises must be taken into account. However, such abilities are not required, or even desirable, for several AI applications where cognitive skill and even negative introspection are to be modelled. With respect to these applications, AEL should thus be regarded as an idealized model to which an effective system may converge, but not a tool that should be used directly. Other systems for introspective reasoning must be designed to this end. To illustrate this, let us consider the well-known Tweety example. Independently of the other reasons raised by Moore, the ideal nature of the reasoning modelled by AEL may discard this problem from the possible domain of applicability. Actually, if we are able to derive that Tweety can fly, it is not because we cannot prove, or do not believe through the use of perfect, sound and complete logical abilities, that Tweety does not fly. The paradigm that we want to model is instead concerned with the fact that we do not explicitly hold the belief that Tweety cannot fly, and we do not arrive, with a certain amount of introspective investigation, at the conclusion that Tweety does not fly. This introspective investigation certainly requires some logical abilities, but not the whole capacity of a complete first-order theorem-prover (at least, as far as I consider my personal way of reasoning about my knowledge of flying birds). Actually, we can question whether an adequate model should present completeness and soundness with respect to what is implicitly implied by an initial set of beliefs. In contrast, an ideal logical introspective reasoner is closed under logical consequence and has the following logical omniscience properties.
The final sets of beliefs include all the valid formulae. They contain all or none of each class of logically equivalent formulae. All formulae are present if contradictory beliefs are held. However, it should be possible to avoid these often unrealistic properties. Recent new logics of knowledge and belief have been proposed in order to avoid the properties of logical omniscience. For instance, we may not require that the deductive system be complete (see Konolige, 1984), or we can reinterpret the possible-world semantics in order to get more reasonable ways of characterizing beliefs of rational agents (for instance, by distinguishing between what is explicit and implicit in one's beliefs-see Levesque, 1984). Although these new logics have not yet really coped with limited negative introspection, they may influence the design of systems for limited introspective reasoning. Moore's AEL should thus be viewed as a model of perfect introspective reasoning abilities. When aiming at the modelling of beliefs and disbeliefs held by agents that do not need to have complete reasoning abilities, other models seem desirable. In conclusion, AEL is used to model the beliefs of an ideally rational introspective agent. As Moore demonstrates, AEL can be used directly in several application domains. However, it should be remarked that AEL relies upon strong assumptions with respect to the modelled reasoning abilities. In several domains, computational shortcuts or restricted reasoning abilities seem acceptable or even desirable. Under these conditions, the role of AEL should appear as an idealized model because of the completeness of the logical reasoning that it models.
Reply: Fargues is right to focus on the "non-constructive" nature of the specification and computation of stable expansions in autoepistemic logic. It was for this very reason that I have studiously avoided the use of the term "deducibility" in my chapter, since autoepistemic consequences are not deducible from premises in any ordinary sense of the term. In the general case, constructive methods are known to be unachievable, since stable expansions are not always recursively enumerable. The interesting question is whether for special cases there would be constructive, iterative methods of computing stable expansions. In fact there is a well-known iterative method that is sound, but not complete, based on the "negation by failure" rule widely used in database querying and Prolog-style reasoning systems. The idea is that to prove ¬B(P), the system tries all possible ways of proving P. If the system exhausts all possible ways of proving P without finding a proof, and if the underlying monotonic deduction method is complete, then this constitutes a (meta) proof that P is not deducible; hence ¬B(P) is true. The method is not complete, however, because the attempt to prove P may not terminate. In that case ¬B(P) is true, but the system never arrives at that conclusion. It would seem to be a very important question for theoretical analysis to find out whether there is any natural characterization of the autoepistemic theories for which the attempt to prove a formula always terminates, and hence for which this iterative method would be complete. The three notions of consistency discussed by Fargues are, in fact, all equivalent in autoepistemic logic. This follows immediately from the fact that stable autoepistemic theories are closed under ordinary logical consequence. Autoepistemic logic has another aspect, however, that is in some intuitive sense another kind of inconsistency, namely the possibility that there is no stable expansion of a set of premises.
That is, it may be that from a certain set of premises, an agent cannot construct any set of beliefs that conforms to the conditions of autoepistemic logic, not even a set that is inconsistent in
134 R. C. Moore
the ordinary sense. I believe that the familiar paradox of "the unexpected hanging" (Montague and Kaplan, 1960) can be reconstructed in this way. To answer Fargues's question about seriality of the accessibility relation, it is necessary to look at the relationship between autoepistemic logic and standard modal logics from two different perspectives. One question that we can ask is what standard modal logics are included in an autoepistemic theory. A corollary of Theorem 3 is that every stable autoepistemic theory includes all the valid sentences of S5 and is closed under S5 consequence. Hence if we build standard Kripke models for stable autoepistemic theories then the accessibility relation will be an equivalence relation, and therefore serial. A slightly different question, however, is what modal logic describes autoepistemic logic from the outside (so to speak). That is, suppose we do put every formula of an autoepistemic theory within the scope of an additional B, as Jackson suggests. It appears that the appropriate standard modal logic for this purpose is what I have called (following Stalnaker) weak S5 (or K45 in Chellas's terminology), which does not satisfy seriality. We cannot use KD45, which incorporates B(P) ⊃ ¬B(¬P), because we want to be able to describe inconsistent autoepistemic theories. These can arise because, if a set of premises is inconsistent, then it will have a single stable expansion, which is also inconsistent, in view of Definition 4 and the stability conditions. K45, then, seems to be the logic that has as its models (not its theories) all possible stable autoepistemic theories. Hence, from this point of view, seriality is not appropriate. Jackson's question "What is in T?" is a natural consequence of the self-reflective nature of autoepistemic theories.
The key to understanding what is going on is to realize that autoepistemic theories are not merely about the beliefs of a rational agent; they are supposed to be the beliefs of a rational agent. A primitive proposition P in an autoepistemic theory, then, is true only if P is true in the world. To make an analogy, suppose that we have a book that collects the speeches of some famous politician. Of course one can ask whether the book is accurate in the sense of correctly recording what the politician said, but if we ask whether a particular statement in the book is true (e.g. "inflation has never been lower"), we have to look at the world, not merely at what the politician said. To push the analogy even further, if the politician says "I have said that inflation has never been lower" then that can be true whether or not inflation actually has been lower, but that statement involves explicit reference to what has been said. So, in Jackson's terms, things in T are not implicitly in the scope of B with respect to their truth conditions. Jackson's proposed counter-model reflects a misunderstanding of the relationship between the possible-world interpretations of autoepistemic theories and standard Kripke structures for modal logic. Recall that a possible-world interpretation of an autoepistemic theory is an ordered pair of a single possible world, to represent what is true in the actual world, and a complete S5 structure, to represent what the agent believes. If we wanted to use a standard Kripke structure rather than these ordered pairs, we should add the actual world to the S5 structure, making each world in the original S5 structure accessible from the actual world, but the actual world would not necessarily be accessible from any other world. The resulting structure would satisfy the so-called Euclidean property, R(x, y) ∧ R(x, z) ⊃ R(y, z), rather than necessarily being an equivalence relation as in S5 structures.
Using this construction, {w} with ¬R(w, w) represents the situation where w is the actual world and the agent has inconsistent beliefs; hence there are no worlds compatible with what the agent believes in the actual world. In such a case, it is perfectly appropriate for B(P) and B(¬P) to both be true at w.
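This construction is easy to experiment with on finite structures. The sketch below is our illustration, not Moore's: the world names and helper functions are invented for the example. It builds the accessibility relation from an actual world and a set of belief worlds, checks the Euclidean property, and shows that with no accessible worlds both B(P) and B(¬P) hold vacuously.

```python
def build_R(actual, belief_worlds):
    # Every belief world is accessible from the actual world and from each
    # other; the actual world need not be accessible from anywhere.
    R = {(actual, v) for v in belief_worlds}
    R |= {(u, v) for u in belief_worlds for v in belief_worlds}
    return R

def is_euclidean(R):
    # Euclidean property: R(x, y) and R(x, z) imply R(y, z).
    return all((y, z) in R for (x, y) in R for (x2, z) in R if x == x2)

def believes(w, R, worlds, holds):
    # B(F) holds at w iff F holds at every world accessible from w.
    return all(holds(v) for v in worlds if (w, v) in R)

worlds = {'w', 'u1', 'u2'}
R = build_R('w', {'u1', 'u2'})
print(is_euclidean(R))  # → True, although R is not an equivalence relation

# Inconsistent beliefs: no world at all is accessible from the actual world w.
R0 = build_R('w', set())
p = lambda v: v == 'u1'
print(believes('w', R0, worlds, p),
      believes('w', R0, worlds, lambda v: not p(v)))
# → True True: B(P) and B(¬P) are both (vacuously) true at w
```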
Jackson's question about the paradoxes of strict implication seems really beside the point. These are paradoxes only if one is trying to use strict implication to model natural-language conditionals. Autoepistemic logic makes no attempt to say anything about conditionals, and if we remind ourselves that in autoepistemic logic, as in classical logic, P ⊃ Q means nothing more than ¬P ∨ Q, then both of the examples Jackson gives are perfectly reasonable. As to Jackson's final points, the computational methods discussed in the paper apply to autoepistemic theories with more than one stable expansion more for theoretical reasons than practical ones. Straightforward examples of autoepistemic reasoning, like the Nixon example in the paper, do in fact result in unique stable expansions. Of course it is possible for pathological cases to occur, but the fact that things like the "unexpected hanging" paradox are so puzzling to our untutored intuition suggests that this is a genuine aspect of autoepistemic reasoning and not just an artefact of the logic. The fact that the monadic modal predicate calculus is undecidable does not, in and of itself, mean that quantifying into the scope of the autoepistemic operator is computationally intractable. Since autoepistemic logic is actually stronger than ordinary weak S5, it is not clear that the result to which Jackson refers is applicable. For example, the word problem for groups is undecidable, but the word problem for Abelian groups (a stronger theory) is trivial. The real problem with quantifying into the scope of the autoepistemic operator is to figure out what it means. Until this conceptual problem is solved, the computational problem cannot even be properly posed. In answer to Smets, whether every belief is the consequence of a set of premises depends on what one takes the premises to be.
Reasoning has to start somewhere, so for our purposes an agent's premises could be taken to be simply those beliefs that are not the result of reasoning from other beliefs. These might be beliefs that are the direct result of perception. The notion of premise, then, is used in an almost tautological way in this theory. Smets is correct in his conjecture that I am taking "ideal" to mean that the agent's reasoning is logically complete, and "rational" to mean that his reasoning is logically sound. Of course, this itself is a highly idealized notion of rationality, since it is often rational to believe something that one doesn't have irrefutable evidence for, but that is more in Professor Smets's line than mine. Although Smets finds negative introspection unintuitive as a property of belief, the whole enterprise of autoepistemic logic is, in some sense, just an exploration of the consequences of having perfect negative introspection, so this question is one that cannot really be addressed from within the perspective of autoepistemic logic. To defend negative introspection, however, consider the question of whether the President of the United States was standing up or not at precisely 10.00 a.m. on 1 January 1987. I dare say that the vast majority of people have no belief whatever about this question, and that, furthermore, they are firmly convinced that they have no such belief. But being so convinced requires negative introspection! I must confess that the exact relation of autoepistemic logic to the logic of belief functions remains obscure to me, so I do not really understand why negative introspection seems to be ruled out in that framework. There is, however, what to me is a more obvious way of connecting autoepistemic logic to a framework that deals in degrees of belief, namely subjective probability. Suppose that we define a probability measure over the set of possible worlds in the S5 structure that characterizes a particular stable autoepistemic theory. We can interpret the degree of belief in a proposition to be the probability of the set of worlds in which that proposition is true. If we let the language talk about the subjective probability the agent assigns to propositions, then the analogue of perfect introspection will simply be to say that every agent has
perfect knowledge of exactly what subjective probability he assigns to every proposition. This may not be the most appealing theory of degrees of belief, but at least the framework itself does not rule out the possibility of negative introspection, as Smets's approach to belief functions seems to. The issue that Gregoire raises is to some extent a result of tension between the desire, on the one hand, that autoepistemic logic concern itself with the reflection of rational agents upon what they believe, and the fact, on the other hand, that it is a logic. In treating it as a logic, we are concerned with its completeness properties, and a logically complete theory can be viewed as a set of beliefs only of an ideally rational agent. It is important to realize, though, that the basic semantic definitions of autoepistemic interpretation and model and the notion of soundness are applicable even to incomplete theories. Lacking a specific theory of the reasoning abilities of less-than-ideally rational agents, however, it is hard to derive any definite results about such agents.

Additional references
Halpern, J. Y. (ed.) (1986). Proc. 1986 Conf. on Theoretical Aspects of Reasoning about Knowledge. Morgan Kaufmann, Los Altos, California.
Halpern, J. Y. and Moses, Y. (1985). A guide to the modal logics of knowledge and belief: preliminary draft. Proc. 9th Int. Joint Conf. on Artificial Intelligence, Los Angeles, pp. 480-490. Morgan Kaufmann, Los Altos, California.
Hanks, S. and McDermott, D. (1986). Default reasoning, non-monotonic logics, and the frame problem. Proc. American Association for Artificial Intelligence Conf. (AAAI-86), Philadelphia, pp. 328-333.
Konolige, K. (1984). A deduction model of belief and its logics. Report STAN-CS-84-1022, Dept. Computer Sci., Stanford Univ.
Konolige, K. (1985). A computational theory of belief introspection. Proc. 9th Int. Joint Conf. on Artificial Intelligence, Los Angeles, pp. 502-508. Morgan Kaufmann, Los Altos, California.
Levesque, H. J. (1984). A logic of implicit and explicit belief. Proc. American Association for Artificial Intelligence Conf. (AAAI-84), pp. 198-202. Morgan Kaufmann, Los Altos, California.
Montague, R. and Kaplan, D. (1960). A paradox regained. Notre Dame J. Formal Logic 1, 79-90; reprinted in Formal Philosophy: Selected Papers of Richard Montague (ed. R. H. Thomason), pp. 271-285. Yale University Press, New Haven, Conn. (1974).
Newell, A. (1980). The knowledge level. AI Magazine 2, no. 2, pp. 1-20.
5
The Preferential-Models Approach to Non-Monotonic Logics
PHILIPPE BESNARD, IRISA, Rennes, France
PIERRE SIEGEL, GIA, Universite d'Aix en Provence a Luminy, Marseille, France
Abstract We suggest that non-monotonic reasoning could be expressed by preferring some interpretations and models over others in a given state of knowledge. We study a notion of preference based upon an ordering of the models of a theory. This notion enables us to compare various non-monotonic systems, including circumscription.
1 INTRODUCTION
Logics, those special systems aimed at formalizing arguments into dedicated languages, rely on the notion of inference, an operation that assigns a conclusion to premises. Accordingly, a logic derives its basic features from the properties of the various kinds of inferences that it allows. Classical first-order logic, for instance, is monotonic because any deduction† is valid; that is, there is no (first-order) interpretation‡ (and hence no (first-order) model§ of the premises) in which the premises of the deduction would hold as opposed
† Deduction is the precise word for the notion of inference developed in first-order logic. The conclusion of a deduction is said to be deduced from the premises of that deduction. We consider here deduction from the semantic, that is model-theoretic, point of view, bearing in mind that the semantic and syntactic approaches are equivalent in first-order logic.
‡ A (first-order) interpretation is a special structure of first-order logic that makes every sentence (a formula) of a first-order language be assigned a truth value, either true or false.
§ Given a theory, that is, a set of sentences (formulae), a (first-order) model of the theory is a (first-order) interpretation that makes every sentence (formula) of the theory be assigned the truth value true.
NON-STANDARD LOGICS FOR AUTOMATED REASONING ISBN 0-12-649520-3 Copyright © 1988 Academic Press Limited. All rights of reproduction in any form reserved.
to the conclusion. Stated alternatively, monotonicity is a property of first-order logic stating that if H is part of a collection of premises K then whatever can be deduced from H can be deduced from K as well. The concept of monotonicity can easily be extended to any logic by substituting the notion of inference for that of deduction. Here is an example that illustrates the relationship between the monotonicity of a logic and the arguments to be formalized by a logic. Knowing that Socrates is a man and that all men are mortal, we can conclude that Socrates is mortal. This argument can be regarded as exemplifying validity because it can be formalized in first-order logic as the deduction whose premises are MAN(Socrates) and ∀x MAN(x) ⊃ MORTAL(x) and whose conclusion is MORTAL(Socrates). Then, by monotonicity (of first-order logic here), that MORTAL(Socrates) can be deduced from the two given premises implies that it can be deduced whichever group of additional premises is introduced. In fact, the word monotonicity describes the intuitive principle by which increasing a collection of premises always increases the class of all corresponding conclusions. In contrast with the preceding valid inference, consider the following argument. Knowing that Socrates is a man and that almost all men do not suffer from aguty† leads us to conclude that Socrates does not suffer from aguty. After translating the components of this argument into sentences of an adequate first-order language, we are faced with a purported inference that would have ¬AGUTIED(Socrates) as a conclusion and a suitable collection of sentences (formulae) as premises. So far so good. However, if AGUTIED(Socrates) is added to the initial collection of premises then, clearly, no inference can be accepted that would have ¬AGUTIED(Socrates) as a conclusion following from the enlarged collection of premises.
This means that, although the collection of premises in hand has been increased, the class of all corresponding conclusions did not increase, in the sense that the class lost the element ¬AGUTIED(Socrates). In such a case, the monotonicity principle is violated (as further evidence for this, consider the original definition of monotonicity and observe that ¬AGUTIED(Socrates) cannot be inferred from a collection of premises a part of which makes it possible to be inferred). What is obviously required here is a non-monotonic logic (see AI (1980) or NMRW (1984) for an introduction to the topic). It appears from this example that monotonicity enforces a notion of inference too restrictive to formalize all those arguments that everybody commonly presents or accepts without considering validity. Yet, the importance of such arguments in Artificial Intelligence and in the field of
† Aguty is the disease that makes people unable to taste food.
expert systems in particular seems to be uncontroversial, as we are usually content with being right in all plausible states of affairs if we are not certain to be right in all possible states of affairs. In the context of models, this is to say that we are to be concerned with truth in models of which commonsense tells us that they are worthy of more consideration than the others. From this point of view, the right way to define a non-monotonic logic should be by just specifying a set of such preferential models as a subset of the set of first-order models (such a formulation, by explicitly defining preferential models as first-order models, emphasizes the fact that deductions are to be captured by the logic being devised, which is then going to extend first-order logic, as it seems inescapable to have all valid inferences in a logic that admits some invalid ones). This paper is devoted to a few non-monotonic logics that can indeed be defined in such a way (no philosophical issues are addressed, as we are not concerned here with why one particular definition of preferential models has been chosen rather than another).
2 FROM FIRST-ORDER MODELS TO MINIMAL MODELS
Starting from here, we focus on non-monotonic logics for which the notion of preferential models involved corresponds to minimality with respect to a certain preordering defined on sets of models. This enables us to present a few of those non-monotonic logics within a unique yet reasonably simple framework, to be introduced in some detail shortly. For the time being, let us briefly review some basic notions of the model theory of first-order logic, in particular in order to fix terminology and notation. First of all, an alphabet A consists of the usual logical symbols, variables, function symbols and relation symbols. Given an alphabet A, a first-order language, denoted L(A), consists of ∅, TRUE, FALSE, terms and formulae built in the usual manner from symbols in A.
Definition Given an alphabet A, an interpretation M is a function over L(A) such that the following requirements are met:

(i) M[∅] = D, where D is a non-empty set called the domain of M;
(ii) M[f] ∈ D^(D^n) for every function symbol f of rank n;
(iii) M[R] ∈ 2^(D^n) for every relation symbol R of rank n;
(iv) M[t] ∈ D for every term t;
(v) M[F] ∈ {0, 1} for every formula F;
(vi) M[f(t1, ..., tn)] = M[f](M[t1], ..., M[tn]) for all function symbols f of rank n and terms t1, ..., tn;
(vii) M[R(t1, ..., tn)] = M[R](M[t1], ..., M[tn]) for all relation symbols R of rank n and terms t1, ..., tn;
(viii) M[FALSE] = 1 − M[TRUE] = 0;
(ix) M[¬F] = 1 − M[F] for every formula F;
(x) M[F ∨ F'] = max(M[F], M[F']) for all formulae F and F';
(xi) M[∀xF] = 1 iff N[F] = 1 for all models N such that N | L(A − {x}) = M | L(A − {x}), for every formula F and variable x.†
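Conditions (vi)-(xi) amount to a recursive evaluator for formulae. The Python sketch below is our illustration, not the authors': it assumes a finite domain (so that (xi) becomes a minimum over domain elements instead of a condition on restricted models) and treats constants simply as pre-bound variables in an environment.

```python
# Formulae as nested tuples: ('rel', R, t1, ...), ('not', F), ('or', F, G),
# ('forall', x, F); a term is a variable name or ('fun', f, t1, ...).

def eval_term(M, env, t):
    if isinstance(t, tuple) and t[0] == 'fun':      # condition (vi)
        return M['fun'][t[1]](*(eval_term(M, env, a) for a in t[2:]))
    return env[t]                                   # variables and constants

def eval_formula(M, env, F):
    op = F[0]
    if op == 'rel':                                 # condition (vii)
        args = tuple(eval_term(M, env, t) for t in F[2:])
        return 1 if args in M['rel'][F[1]] else 0
    if op == 'not':                                 # condition (ix)
        return 1 - eval_formula(M, env, F[1])
    if op == 'or':                                  # condition (x)
        return max(eval_formula(M, env, F[1]), eval_formula(M, env, F[2]))
    if op == 'forall':                              # condition (xi), finite D
        return min(eval_formula(M, dict(env, **{F[1]: d}), F[2])
                   for d in M['dom'])
    raise ValueError(op)

# A finite interpretation for the aguty vocabulary used later in this section:
M = {'dom': {'Socrates', 'Emos'}, 'fun': {},
     'rel': {'MAN': {('Socrates',), ('Emos',)}, 'AGUTIED': {('Socrates',)}}}
env = {'Socrates': 'Socrates', 'Emos': 'Emos'}      # constants pre-bound

# ∀x (¬MAN(x) ∨ ¬AGUTIED(x)) is false here: Socrates is an agutied man.
F = ('forall', 'x', ('or', ('not', ('rel', 'MAN', 'x')),
                           ('not', ('rel', 'AGUTIED', 'x'))))
print(eval_formula(M, env, F))  # → 0
```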
Notation Values 0 and 1 of the set 2 = {0, 1} correspond to the usual values for characteristic functions of subsets of a set (context always making it clear what this set is: the product D^n, the set of formulae, ...).
By conditions (vi)-(xi), the truth value assigned to a formula is constrained to respect the principle of compositionality: the meaning of a formula depends on the meaning of its components. Modal logic (for a brief account of which see Appendix B to the Introduction to this book, and for an application see Chapter 2), for instance, differs from first-order logic in this respect. A model of a theory (i.e. set of formulae) T is a first-order interpretation in which every formula of T is satisfied or true (i.e. assigned value 1). A theory is satisfiable if and only if there exists at least one model of it. A theory T entails a formula F, denoted T ⊨ F, if and only if F is true in all models of T. At the end of the last section we expressed, on a model-theoretic level, our view of the non-monotonic character of commonsense arguments as follows: in the perspective of commonsense, some models can be neglected. According to our approach, then, in a non-monotonic logic formalizing commonsense arguments the right models correspond to a class of preferential first-order models. If we want to characterize a non-monotonic logic in this way then we have to specify what the preferential models are. To do that, we could just elicit some models, but, of course, we won't get anywhere this way, so let's try something else. Think of the aguty example. In the light of a model where it is true that Socrates does not suffer from aguty, a model where it is true that Socrates suffers from aguty is not worth considering. More generally, given two models, we hope that we are able to prefer one to the other. We can then discard the latter model and proceed to the next pair of models. Eventually,
† The notation f | D is used to designate the restriction of the function f to the domain D.
we shall get a stable state where only preferred models remain (this is only an intuitive presentation, so we don't need to bother about the process described here being effective or not). Such a comparison between models is intended to be achieved by means of a preordering, and the preferential models are then defined to be all models that are minimal with respect to the preordering. We now proceed to define the notion of preemption preordering, which is the cornerstone of our preferential-models approach to non-monotonic logics.

Definition Given an alphabet A with set of relation symbols R, the preemption preordering ≤P attached to the partition P = (R=, R+, R−, R•) of the set of relation symbols R is a preordering such that, for every two models M and N over L(A), N minors M with respect to partition P, denoted N ≤P M, iff

(a) N[∅] = M[∅];
(b) N[x] = M[x] for all variables x;
(c) N[f] = M[f] for all function symbols f;
(d) N[R] = M[R] for all relation symbols R in R=;
(e) N[R] ⊇ M[R] for all relation symbols R in R+;
(f) N[R] ⊆ M[R] for all relation symbols R in R−.
Here is an example of minoring where M and N are models sharing the same domain, namely D = {Socrates, Emos}, and interpreting the constant Socrates (respectively Emos) by the element Socrates (respectively Emos) of the domain D. The preemption preordering to be considered is such that a model should be preferred to another if it makes the property AGUTIED smaller (R− consists of the unique relation symbol AGUTIED) and if it makes the property ORDINARY larger (R+ consists of the unique relation symbol ORDINARY), regardless of the property FAST-WALKER (R•, the set of varying relations, consists of the unique relation symbol FAST-WALKER); in addition, all other relations (including the property MAN) should be the same in both the preferred model and the non-preferred model (R= = R − {AGUTIED, ORDINARY, FAST-WALKER}). Relations in M are as follows:

M[MAN] = {Emos, Socrates}
M[ORDINARY] = { }
M[AGUTIED] = {Socrates}
M[FAST-WALKER] = {Socrates}

Relations in N are as follows:

N[MAN] = {Emos, Socrates}
N[ORDINARY] = {Emos}
N[AGUTIED] = { }
N[FAST-WALKER] = {Emos}
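The minoring claimed for this example can be checked mechanically. The following sketch is our illustration (unary relations are represented as plain sets of domain elements); it implements conditions (d)-(f), conditions (a)-(c) being trivially met here because M and N share the same domain, interpret the constants identically, and involve no function symbols.

```python
def minors(N, M, P):
    # Conditions (d)-(f): equal on R=, larger on R+, smaller on R-;
    # the varying relations in the fourth component are unconstrained.
    R_eq, R_plus, R_minus, R_var = P
    return (all(N[r] == M[r] for r in R_eq) and
            all(N[r] >= M[r] for r in R_plus) and
            all(N[r] <= M[r] for r in R_minus))

M_rel = {'MAN': {'Emos', 'Socrates'}, 'ORDINARY': set(),
         'AGUTIED': {'Socrates'}, 'FAST-WALKER': {'Socrates'}}
N_rel = {'MAN': {'Emos', 'Socrates'}, 'ORDINARY': {'Emos'},
         'AGUTIED': set(), 'FAST-WALKER': {'Emos'}}
P = ({'MAN'}, {'ORDINARY'}, {'AGUTIED'}, {'FAST-WALKER'})

print(minors(N_rel, M_rel, P))  # → True:  N minors M
print(minors(M_rel, N_rel, P))  # → False: M does not minor N
```

So N is strictly preferred to M, exactly as the text asserts.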
Since the preemption preordering ≤P is based on the partition P = ..., tn] = f(t1, ..., tn) for all function symbols f of A and all ground terms t1, ..., tn of L(A').
Such a model is called discriminant (Colmerauer, 1979) because it tends to make two different ground terms denote two different elements of its domain. Observe that Herbrand models are a proper subclass of the discriminant models, the actual difference between the two notions being too technical a matter to be of interest here. The fundamental concept with respect to subimplication is minimality over the class of discriminant models. We make this precise in the following definition.

Definition A formula F follows from a theory T by subimplication, denoted T ⊨SUB F, iff F is true in all discriminant models of T that are minimal with respect to the preemption preordering ≤P, where P = ({ }, { }, R, { }).

Of course, throughout this section, the term minimal refers to the preemption preordering attached to the partition P = ({ }, { }, R, { }). To illustrate these definitions, let us return to the theory T1 consisting of the formula COMP-MANUF(Bull) and other formulae not involving the relation symbol COMP-MANUF. We call M3 the model such that
(i) its domain D is {Bull, Peugeot, Dassault, ...} (that is, it consists of all ground terms of L(A));
(ii) M3[Bull] = Bull, M3[Peugeot] = Peugeot, M3[Dassault] = Dassault, ...;
(iii) M3[COMP-MANUF](d) = 1 iff d = Bull, for all d ∈ D.

Clearly, M3 is a discriminant minimal model of T1. Also, all other discriminant minimal models of T1 look like M3, so that T1 ⊨SUB ¬COMP-MANUF(Peugeot), T1 ⊨SUB ¬COMP-MANUF(Dassault), and so on. More importantly, T1 ⊨SUB ∀x COMP-MANUF(x) ≡ x = Bull, as desired. Subimplication is obviously a non-monotonic logic, and in any case the example illustrating the non-monotonic nature of CWA could be used here as well. It has to be pointed out that considering only discriminant models is precisely what permits the equality relation to be minimized in fact. As we shall see in the next section, minimization of equality cannot always be achieved if arbitrary models are involved, and this sometimes precludes minimization of other relations. In contrast with CWA, subimplication need not be restricted to Horn theories. Returning to the theory T2 that consists of the unique formula COMP-MANUF(Peugeot) ∨ COMP-MANUF(Dassault), no contradiction follows from T2 by subimplication. Subimplication combines truth in models where COMP-MANUF(Peugeot) is true but not COMP-MANUF(Dassault) and truth in models where COMP-MANUF(Dassault) is true but not COMP-MANUF(Peugeot), to yield the formula ¬COMP-MANUF(Peugeot) ∨ ¬COMP-MANUF(Dassault). In symbols, T2 ⊨SUB ¬COMP-MANUF(Peugeot) ∨ ¬COMP-MANUF(Dassault). It can be shown that universal theories always have at least one discriminant minimal model. It follows that consistency of subimplication is assured over the class of satisfiable universal theories. Here is an example of a theory that has no minimal model.
Let T3 be the theory consisting of the four formulae below:

∀x G(s(x), x)
∀xyz G(x, y) ∧ G(y, z) ⊃ G(x, z)
∀x ¬G(x, x)
∃x∀y G(y, x) ≡ L(y)

To see that T3 has no minimal models, it helps to interpret s as "successor of", G as "is greater than" and L as "is a large number". Then, on the set of
natural numbers, the fourth formula means that there is a natural number that is the greatest natural number that is not a "large number". Obviously, a first case for the set of the "large numbers" is all natural numbers, another is all natural numbers except zero, still another is all natural numbers except zero and one, and so on. In terms of models, this means that there exists an infinite chain of inclusions of subsets of natural numbers, all of these subsets being interpretations for L. Consequently, there is an infinite chain of minorings of models of T3 that have the structure of natural numbers. For models with arbitrary domains things are slightly more complicated, but follow the same lines. The definition of subimplication we provide above is quite satisfactory for theories that are minimally modellable (a theory is said to be minimally modellable (Bossu and Siegel, 1982), or well-founded (Etherington et al., 1985), iff every model of the theory is minored by a minimal model). A finer definition of a more achieved version of subimplication (which coincides with the version presented here over the class of minimally modellable theories) exists, for which consistency over all theories can be proved, and which guarantees monotonicity over the class of positive formulae:

T ⊨ F iff T ⊨SUB F, for all formulae F in which ∧ and ∨ are the only connectives

Since subimplication is thereby identified with first-order logic as regards positive formulae, it is very well suited to databases (for more on this point of view see Nicolas, 1979). Subimplication has a syntactical counterpart in the form of a proof procedure† over the class of groundable formulae whose basic form is

where all variables actually occurring on the right side of the connective ⊃ occur on the left side (R1, ..., Rn+m are relation symbols). Variables can be replaced only by constants, or the resulting formulae will not be groundable.
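The subimplication inference drawn from T2 earlier in this section can be reproduced by brute force over a finite Herbrand domain. In this sketch (ours, not the authors'; a model is identified with the extension of COMP-MANUF, which is legitimate here because it is the only relation symbol involved), the two minimal discriminant models emerge, and the disjunction of negations holds in both while failing in a non-minimal model:

```python
from itertools import combinations

D = ['Bull', 'Peugeot', 'Dassault']  # the ground terms of L(A)
exts = [frozenset(c) for r in range(len(D) + 1) for c in combinations(D, r)]

# Discriminant models of T2, identified with the extension of COMP-MANUF:
# T2 is COMP-MANUF(Peugeot) v COMP-MANUF(Dassault).
models = [e for e in exts if 'Peugeot' in e or 'Dassault' in e]

# Minimal models: no model strictly below them (every relation symbol is
# in R-, so minoring is just inclusion of extensions).
minimal = [m for m in models if not any(n < m for n in models)]
print(sorted(sorted(m) for m in minimal))  # → [['Dassault'], ['Peugeot']]

concl = lambda e: 'Peugeot' not in e or 'Dassault' not in e
print(all(concl(m) for m in minimal))  # → True:  holds in every minimal model
print(all(concl(m) for m in models))   # → False: not a classical consequence
```

The last two lines make the non-monotonic gain visible: ¬COMP-MANUF(Peugeot) ∨ ¬COMP-MANUF(Dassault) is a subimplication consequence of T2 but not a first-order one.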
5 CIRCUMSCRIPTION
In the general framework given in Section 2, relation symbols are divided into four groups. CWA and subimplication, which we have described so far, are based on very simple partitions; in fact all relation symbols are in R−. Circumscription (McCarthy, 1980, 1986) is an example of a non-monotonic logic based on a more complex partition. Circumscription deals with
† A proof procedure is an effective formal system in which inferences of a given logic can be carried out.
minimization of a relation (to be specified) in a way that may result in certain relations (also to be specified) being altered. After this informal description of circumscription, the precise definition given below is straightforward.

Definition A formula F follows from a theory T by circumscription of a relation R in T with relations R1, ..., Rn varying, denoted T ⊨CIR(P) F, iff F is true in all models of T that are minimal with respect to the preemption preordering ≤P such that P = (R=, { }, {R}, {R1, ..., Rn}), where R= is of course the complement of {R, R1, ..., Rn} in R.

Observe that circumscription is actually a generic term, as emphasized by the notation ⊨CIR(P), where P denotes the partition of R that furnishes the relations R, R1, ..., Rn characterizing the circumscription of relation R with the relations R1, ..., Rn being allowed to vary. Let us consider the theory T1 once again. Circumscribing COMP-MANUF with no varying relation yields the preemption preordering attached to the partition P of R such that P = (R − {COMP-MANUF}, { }, {COMP-MANUF}, { }), and T1 ⊨CIR(P) ∀x COMP-MANUF(x) ≡ x = Bull (i.e. what follows from circumscription). Evidently, circumscription is more powerful than CWA (Lifschitz, 1985b). Consistency of circumscription is very naturally linked to the existence of minimal models. Accordingly, there is nothing more to say than about consistency of subimplication: there exist examples of theories the circumscription of which is inconsistent, and there exists a proof of consistency of circumscription for well-founded theories (Etherington et al., 1985) (hence circumscription, when applied to universal theories, whether they are Horn or not, is consistent). The usefulness of varying relations can be illustrated easily by means of an example about employing a circumscriptive conclusion like ∀x COMP-MANUF(x) ≡ x = Bull as the premise of an inference. Indeed, if the knowledge base of our expert system is presented with the information "the new SS tax applies to all companies except computer manufacturers", which can be encoded as the formula ∀x COMPANY(x) ∧ ¬COMP-MANUF(x) ⊃ SS-TAXED(x), then circumscription of COMP-MANUF with the relation SS-TAXED varying yields the desired result, namely T ⊨CIR(P) ∀x COMPANY(x) ⊃ [¬SS-TAXED(x) ⊃ x = Bull]. This illustrates the underlying principle of circumscription by which minimization of a relation is given priority up to fully specifying other relations (among the ones allowed to vary). Subimplication would not lead to the same result, basically because subimplication is monotonic for positive formulae (see end of Section 4).
It is not surprising that circumscription allows more inferences than subimplication, since the partitioning of ℛ is far more flexible for circumscription than for subimplication. All this is illustrated in the continuation of our example below. Let T4 be the theory consisting of the two formulae considered in our example; that is,

COMP-MANUF(Bull)
∀x COMPANY(x) ∧ ¬COMP-MANUF(x) ⇒ SS-TAXED(x)
We call M5 the Herbrand model of T4 such that

(i) its domain D is {Bull, Peugeot, Dassault, ...};
(ii) M5[Bull] = Bull, M5[Peugeot] = Peugeot, M5[Dassault] = Dassault, ...;
(iii) M5[COMPANY](d) = 1 for all d ∈ D;
(iv) M5[COMP-MANUF](d) = 1 for all d ∈ D;
Preferential-Models Approach to Non-Monotonic Logics
(v) M5[SS-TAXED](d) = 1 for no d ∈ D.

We call M6 the Herbrand model of T4 such that

(i) its domain D is {Bull, Peugeot, Dassault, ...};
(ii) M6[Bull] = Bull, M6[Peugeot] = Peugeot, M6[Dassault] = Dassault, ...;
(iii) M6[COMPANY](d) = 1 for all d ∈ D;
(iv) M6[COMP-MANUF](d) = 1 iff d = Bull, for all d ∈ D;
(v) M6[SS-TAXED](d) = 1 iff d ≠ Bull, for all d ∈ D.
Both M5 and M6 are minimal with respect to the preemption preordering attached to subimplication. Then T4 ⊭SUB ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]. In contrast, M5 is not minimal with respect to the preemption preordering P attached to the circumscription of COMP-MANUF with SS-TAXED varying, because M6 ≤P M5 and M6 is not minored by M5. In fact,

T4 ⊨C/R(P) ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]
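The model comparison above can be sketched on a finite domain. The following Python fragment is illustrative only (the domain, relation names and the preordering test are assumptions, not the authors' code): it checks that M5 and M6 both satisfy T4, and that M6 is strictly preferred when COMP-MANUF is minimized with SS-TAXED varying and COMPANY fixed.

```python
# Toy check of the preemption preordering for T4 on a three-element domain.
DOMAIN = {"Bull", "Peugeot", "Dassault"}

def satisfies_T4(company, comp_manuf, ss_taxed):
    """Check the two axioms of T4 on the finite domain."""
    if "Bull" not in comp_manuf:                 # COMP-MANUF(Bull)
        return False
    for d in DOMAIN:                             # COMPANY(x) & ~COMP-MANUF(x) => SS-TAXED(x)
        if d in company and d not in comp_manuf and d not in ss_taxed:
            return False
    return True

def preferred(m_a, m_b):
    """m_a <=_P m_b: fixed relations agree and COMP-MANUF's extension shrinks;
    SS-TAXED is allowed to vary, so it is not compared."""
    return m_a["COMPANY"] == m_b["COMPANY"] and m_a["COMP_MANUF"] <= m_b["COMP_MANUF"]

M5 = {"COMPANY": DOMAIN, "COMP_MANUF": DOMAIN, "SS_TAXED": set()}
M6 = {"COMPANY": DOMAIN, "COMP_MANUF": {"Bull"}, "SS_TAXED": DOMAIN - {"Bull"}}

assert satisfies_T4(M5["COMPANY"], M5["COMP_MANUF"], M5["SS_TAXED"])
assert satisfies_T4(M6["COMPANY"], M6["COMP_MANUF"], M6["SS_TAXED"])
# M6 is strictly preferred to M5, so M5 is not minimal for this circumscription:
assert preferred(M6, M5) and not preferred(M5, M6)
```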
From a proof-theoretic point of view, circumscription offers the exciting prospect of a general formulation, as opposed to CWA and subimplication, for which not all theories are eligible for the existing proof procedures. In fact, circumscription benefits from a very elegant proof-theoretic approach through a mere axiom schema, the so-called circumscription schema.

Definition
The circumscription schema for circumscription of the relation R in the theory T (in the form of the sentence† T{R, R1, ..., Rn}) with varying relations R1, ..., Rn is of the form

[T{R', R'1, ..., R'n} ∧ (∀x R'(x) ⇒ R(x))] ⇒ (∀x R(x) ⇒ R'(x))

where R', R'1, ..., R'n are formulae and T{R', R'1, ..., R'n} is the theory T{R, R1, ..., Rn} in which all occurrences of R, R1, ..., Rn are replaced by R', R'1, ..., R'n respectively.

Intuition is often lost when it comes to selecting formulae to replace the formula parameters R', R'1, ..., R'n in the circumscription schema. It is a difficult task (Reiter, 1982; Besnard, 1984; Lifschitz, 1985a) to find an instance‡ (of the circumscription schema) that ultimately leads to what could

† A (finite) theory can always be written in the form of a formula in which all occurring variables are quantified.
‡ An instance of a schema is a formula that results from substituting formulae for the formula parameters in the schema.
P. Besnard and P. Siegel
be called conclusive formulae. This is why it seems prudent to provide the reader with at least a rough idea of what the circumscription schema means. First, T{R', R'1, ..., R'n} testifies, if deducible from T, that R' is admissible for R, in the sense that the way R is (incompletely) specified in T does not preclude R from being R' (only extensionally, of course). Secondly, (∀x R'(x) ⇒ R(x)) ensures that R is required (by the formulae of T) to be true whenever R' is true, so that if R' is admissible for R then restricting R to R' (by means of the formula (∀x R(x) ⇒ R'(x))) is the least that can be done in order to have R minimized. The circumscription schema is to be added to first-order predicate calculus† (a brief account of it is given in the Introduction, Appendix A, p. 8) for use by first-order inference rules as any first-order axiom schema, thus disturbing first-order predicate calculus as little as possible.

Definition

A formula F is derivable by circumscription of the relation R in the theory T with the relations R1, ..., Rn varying, denoted T ⊢C/R(P) F, iff F is derivable by first-order predicate calculus from T supplemented with the corresponding circumscription schema, in symbols C[T{R/R1, ..., Rn}] ⊢ F.‡ Let us see how all this works in practice. Returning to our illustration, let us circumscribe COMP-MANUF, with SS-TAXED being allowed to vary, in the theory T4 consisting of the two following formulae:

COMP-MANUF(Bull)
∀x COMPANY(x) ∧ ¬COMP-MANUF(x) ⇒ SS-TAXED(x)
Let us consider the instance yielded by the circumscription schema for the substitutions

COMP-MANUF'(x) ≡ x = Bull
SS-TAXED'(x) ≡ ¬ x = Bull

Then T4{COMP-MANUF', SS-TAXED'} consists of

Bull = Bull
∀x COMPANY(x) ∧ ¬ x = Bull ⇒ ¬ x = Bull
Both formulae are valid and, accordingly, can be derived from any theory; the former is an immediate consequence of the axiom of reflexivity of equality, ∀x x = x; the latter comes from the axiom schema (A ∧ B) ⇒ B. As regards ∀x COMP-MANUF'(x) ⇒ COMP-MANUF(x), it is the formula

∀x x = Bull ⇒ COMP-MANUF(x)

which can be derived from COMP-MANUF(Bull) by virtue of Leibniz' substitutivity schema, namely ∀x ∀y x = y ∧ A(x) ⇒ A(y) for every formula A. Since the left member of the considered instance of the circumscription schema can be derived from T4, we can use modus ponens (from formulae A and A ⇒ B infer formula B) to get ∀x COMP-MANUF(x) ⇒ COMP-MANUF'(x), that is,

∀x COMP-MANUF(x) ⇒ x = Bull

† First-order predicate calculus is the syntactical part of first-order logic.
‡ The symbol ⊢ is the usual notation of derivability by first-order logic.
At this stage, we have used the circumscription schema directly to obtain the formula ∀x COMP-MANUF(x) ⇒ x = Bull, which can thus be said to be derivable from T4 by circumscription (of COMP-MANUF in T4 with SS-TAXED being allowed to vary). Symbolically, T4 ⊢C/R(P) ∀x COMP-MANUF(x) ⇒ x = Bull, where C/R(P) denotes the circumscription of COMP-MANUF in T4 with SS-TAXED being allowed to vary. From this formula, and sticking to pure first-order predicate calculus, it is possible to arrive at T4 ⊢C/R(P) ∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]. Details are as follows. The second formula of T4 can be written in the form

∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ COMP-MANUF(x)]

from which, using ∀x COMP-MANUF(x) ↔ x = Bull (which can be derived from the formula obtained above and the formula ∀x x = Bull ⇒ COMP-MANUF(x) that we have already seen to be derivable from the formula COMP-MANUF(Bull) of T4), we conclude

∀x COMPANY(x) ⇒ [¬SS-TAXED(x) ⇒ x = Bull]

Deciding which formulae are to be substituted for the formula parameters in the circumscription schema is fundamental to obtaining conclusive formulae (particularly ∀x COMP-MANUF(x) ⇒ x = Bull), as many instances of the circumscription schema, for example the one arising from substituting COMPANY(x) ∧ x = Bull for COMP-MANUF'(x), lead nowhere. We now consider non-monotonicity of circumscription through the theory T5, which is the theory T4 supplemented with the formula
COMP-MANUF(Peugeot)
It is not possible to have

∀x COMP-MANUF(x) ⇒ x = Bull

derivable from T5 by circumscription, i.e. circumscription is non-monotonic. However, it is possible to derive

∀x COMP-MANUF(x) ⇒ x = Bull ∨ x = Peugeot

from T5 by circumscription, using the substitution x = Bull ∨ x = Peugeot for COMP-MANUF'(x) and ¬ x = Bull ∧ ¬ x = Peugeot for SS-TAXED'(x).

6  CONCLUDING REMARKS
The account presented of the preferential-models approach to non-monotonic logics seems to leave out two major contributions to non-monotonic reasoning: default logic (Reiter, 1980) and autoepistemic logic (Chapter 4 and Moore, 1985). For the former, specifying an appropriate preemption preordering is rather easy for a restricted fragment like the one corresponding to CWA. Unfortunately, things get much more involved if non-atomic formulae are taken into account. For autoepistemic logic, working with a modal language adds another difficulty, mainly because a logic in which theories consist only of first-order formulae is easier to capture by means of a preemption preordering (rendering the effect of the non-monotonic, perhaps content-specific, inference rules). In any case, both default logic and autoepistemic logic are more general than CWA, subimplication and circumscription in that they require maximization of one or several relations in many theories. That it would be impossible to characterize certain existing non-monotonic logics through the preferential-models approach would just mean that their semantic bases rely on non-constructive fixed points, which cannot be captured by any preemption preordering. It is only a matter of expression. But the principle of the preferential-models approach is not disputed: first formalize our intuitions about non-monotonic reasoning within a model-theoretic setting and then devise a formal system for it (thereby being concerned with constructing a subclass of the preferential models defined by means of non-constructive fixed points).
BIBLIOGRAPHY

AI (1980). Special Issue on Non-monotonic Logics. Artificial Intelligence 13, nos 1-2. (Apart from the closed world assumption, introduces the very first non-monotonic logics.)
Besnard, P. (1984). Vers une caractérisation de la circonscription. Rapport Inria. (Some results on circumscription about its proof theory.)
Bossu, G. and Siegel, P. (1982). Non monotonic Reasoning and Databases. Advances in Database Theory (ed. H. Gallaire, J. Minker and J.-M. Nicolas), pp. 239-284. Plenum Press, New York. (A first (technical) glance at subimplication.)
Bossu, G. and Siegel, P. (1985). Saturation, Non monotonic Reasoning, and the Closed World Assumption. Artificial Intelligence 25, 13-63. (Where subimplication is defined.)
Chang, C. C. and Lee, R. C. T. (1973). Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York. (The most readable textbook on Herbrand models and their role in the design of proof-theoretic systems, especially resolution.)
Colmerauer, A. (1979). Sur les bases théoriques de Prolog. Rapport de Recherche, Université d'Aix-Marseille 2. (A paper roughly along the lines of the previous item, but devoted to a particular system, namely Prolog, the logic programming language.)
Etherington, D. W., Mercer, R. E. and Reiter, R. (1985). On the adequacy of predicate circumscription for closed world reasoning. Computational Intelligence 1, 11-15. (A must as far as the formal study of circumscription is concerned.)
Farinas del Cerro, L. and Herzig, A. (1988). An automated modal logic of elementary changes. Chapter 2 of this book. (An introduction to a different view of logic programming.)
Lifschitz, V. (1985a). Computing circumscription. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), pp. 121-127. Kaufmann, Los Angeles. (As the title says.)
Lifschitz, V. (1985b). Closed world databases and circumscription. Artificial Intelligence 27, 229-235. (Results on when circumscription and the closed-world assumption meet.)
McCarthy, J. (1980). Circumscription: a form of non monotonic reasoning. Artificial Intelligence 13, 27-39. (A beautiful exposition of the motivation for circumscription and its definition.)
McCarthy, J. (1986). Applications of circumscription to formalizing commonsense knowledge. Artificial Intelligence 28, 89-116. (More insight into circumscription.)
Marchal, B. (1988). Modal logic: a brief tutorial. Appendix B, Introduction to this book. (The basic definitions and concepts of modal logic.)
Moore, R. C. (1985). Semantical considerations on non-monotonic logic. Artificial Intelligence 25, 75-94. (Together with circumscription and default logic, the best proposal within the field of non-monotonic logics. Highly technical.)
Moore, R. C. (1988). Autoepistemic logic. Chapter 4 of this book. (Good background reading for the previous paper.)
Nicolas, J.-M. (1979). Contribution à l'étude théorique des bases de données: apports de la logique mathématique. Thèse d'état, Onera-Cert, Toulouse. (A study of the application of logic to database theory.)
NMRW (1984). Non Monotonic Reasoning Workshop, New Paltz. (A collection of papers, the standard of which is rather mixed but including valuable contributions to the study of non-monotonic logics.)
Reiter, R. (1978). On closed world databases. Logic and Databases (ed. H. Gallaire and J. Minker), pp. 55-76. Plenum Press, New York. (A pioneering and very insightful paper on non-monotonicity in logic systems.)
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence 13, 81-132. (An essential work in the field of non-monotonic logics.)
Reiter, R. (1982). Circumscription implies predicate completion (sometimes). Proc.
American Association for Artificial Intelligence Conf. (AAAI-82), pp. 418-420. Kaufmann, Pittsburgh. (One more result on circumscription.)

DISCUSSION
C. Froidevaux:

1  Informal introduction to circumscription and CWA

Commonsense reasoning supposes very wide knowledge, so we prefer to use general statements rather than store all the elementary facts they could generate. But in everyday life there are exceptions to almost all general assertions, and they are too many to be mentioned explicitly: our knowledge is necessarily incomplete. Thus we need to use general statements admitting exceptions. This process introduces non-monotonicity into commonsense reasoning. The prototypical example is the classical inference about birds: from the facts that generally birds can fly and that ostriches are birds but cannot fly, we can infer that Tweety, provided that it is a bird, can fly. But if we subsequently discover that Tweety is an ostrich, the ability of Tweety to fly can no longer be derived. To express such general statements, the circumscription approach uses classical logic and introduces abnormality predicates. (We should point out that there are as many abnormality predicates as types of exceptions.) In the case of birds, we say that a bird that is not abnormal (with respect to the ability to fly) can fly. We obtain the following first-order formulae:

(∀x) (bird(x) ∧ ¬abnormal(x) ⇒ flies(x))
(∀x) (ostrich(x) ⇒ bird(x))
(∀x) (ostrich(x) ⇒ ¬flies(x))
bird(Tweety)
From these formulae, it can be inferred that an ostrich is abnormal, but to deduce that Tweety flies, we must know that only ostriches are abnormal. We require the well-known default principle (cf. Chapter 7): if something relevant is an abnormal object then this is explicitly stated; otherwise it can reasonably be considered as normal. More generally, this principle extends to the following one: the objects that can be shown to have a certain property P (here abnormal) are all (and only) the objects that satisfy this property: we circumscribe the extension of the predicate P. In order to do this, we have to minimize its extension. From a semantic point of view, we have introduced a notion of preferability between models: models where the extension of P is minimal are preferable. Another form of non-monotonicity occurs with the assumption that only positive information is represented and that negative information can be derived by completion. This process considerably simplifies the storage of the data: we need only give the relevant positive facts, while the amount of negative information is generally very high. In the context of relational databases, this is known as the closed-world assumption (CWA): we assume that a relation instance is true only if it is given explicitly or else implied by one of the universal rules defining the relation. (In general, only Horn theories are considered.)
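As a hypothetical illustration of this minimization (a toy sketch, not the chapter's formalism), the bird axioms can be evaluated on a one-element domain and only the models with a minimal extension of abnormal retained; flies is allowed to vary:

```python
# Circumscribing 'abnormal' for the bird example on a tiny finite domain.
from itertools import combinations

DOMAIN = ["Tweety"]
BIRD = {"Tweety"}

def subsets(xs):
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def models(ostrich):
    """All (abnormal, flies) pairs satisfying the four axioms of the example."""
    result = []
    for ab in subsets(DOMAIN):
        for fl in subsets(DOMAIN):
            if all((x not in BIRD or x in ab or x in fl)    # bird & ~abnormal => flies
                   and (x not in ostrich or x in BIRD)      # ostrich => bird
                   and (x not in ostrich or x not in fl)    # ostrich => ~flies
                   for x in DOMAIN):
                result.append((ab, fl))
    return result

def circumscribe_abnormal(ostrich):
    """Keep the models whose 'abnormal' extension is minimal; flies may vary."""
    ms = models(ostrich)
    return [(ab, fl) for ab, fl in ms if not any(ab2 < ab for ab2, _ in ms)]

# With no ostrich information, every preferred model lets Tweety fly:
assert all("Tweety" in fl for _, fl in circumscribe_abnormal(set()))
# Adding ostrich(Tweety) non-monotonically withdraws that conclusion:
assert all("Tweety" not in fl for _, fl in circumscribe_abnormal({"Tweety"}))
```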
For example, let us consider a universe of blocks and a table as follows: A, B and C are blocks, D is a table, A is green, C is red, A is ON D, C is ON D, B is ON C, and if object X is ON object Y and object Y is ON object Z then X is ON Z. Under the CWA, we restrict our attention to the world where A, B and C are the only blocks, D is the only table, A is the only green thing and C is the only red thing. Moreover, if object X is ON object Y then (X, Y) ∈ {(C, D), (B, C), (B, D), (A, D)}. Hence in this world the colours of B and of D remain unknown, and A is neither on B nor on C. Thus the CWA leads us to prefer models where every predicate has a minimal extension. While circumscription focuses on some predicates for the minimization process, the CWA deals with all predicates. This brief presentation highlights the fact that for these two formalisms the notion of first-order model is insufficient to capture non-monotonicity, and suggests that we resort to a concept of preferability between models: not all models are desirable. The preference criterion obviously depends on the formalism considered. Non-monotonicity results from the following observation: if M is a preferable model for a set of axioms T and if T ⊂ T', then M is not even necessarily a model of T'.
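The blocks-world reading of the CWA can be sketched as follows (an assumed encoding, for illustration only): the positive ON facts are closed under the transitivity rule, and any ON query that is not derivable is answered negatively.

```python
# Blocks world under the closed-world assumption: only derivable ON facts hold.
base_on = {("A", "D"), ("C", "D"), ("B", "C")}

def closure(on):
    """Least fixpoint of the Horn rule ON(x,y) & ON(y,z) => ON(x,z)."""
    on = set(on)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(on):
            for (y2, z) in list(on):
                if y == y2 and (x, z) not in on:
                    on.add((x, z))
                    changed = True
    return on

ON = closure(base_on)

def cwa_on(x, y):
    """Under the CWA, ON(x,y) is false unless explicitly given or derived."""
    return (x, y) in ON

assert ON == {("C", "D"), ("B", "C"), ("B", "D"), ("A", "D")}
assert cwa_on("B", "D")           # derived by transitivity
assert not cwa_on("A", "B")       # not derivable, hence false under the CWA
assert not cwa_on("A", "C")
```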
2  A preemption preordering on the models

Besnard and Siegel provide a general framework to define the semantics of both formalisms. To every theory T they attach a partition P of the set ℛ of its relation symbols, so that P = (R=, R+, R−, R~). R= denotes the set of relations that must be identical in all models, R+ the set of relations to be maximized, R− the set of relations to be minimized and R~ the set of relations allowed to vary from model to model. This partitioning is used to compare models: ≤P denotes the preemption preordering with respect to P on the models. Recall the example of Section 2 of Besnard and Siegel's chapter. Let T be the following theory: T = {MAN(Emos), MAN(Socrates),
We shall not elaborate on the role of exceptions in the rest of this paper, but they do not seem to conceal more difficulty than the cases we shall treat.
3  PRESENTATION OF THE DEFAULT LOGIC (REITER)

3.1  Definition of the theory

3.1.1  Intuitive approach
Our presentation of the formalism will be essentially syntactic, because the proposals for a semantical definition of default theories did not provide a more intuitive insight into these theories. This formalism presents many interesting features and is, in our opinion, an appropriate tool for handling default reasoning. In order to make the syntactical definitions more intuitive, we begin with a few informal considerations. Recall that by default reasoning we mean the drawing of plausible inferences from less-than-conclusive evidence in the absence of information to the contrary. The first idea is to introduce into the first-order formalism a basic default operator, denoted ⊬, where ⊬w means "w cannot be deduced from the given knowledge base". The first-order theory will be augmented with inference schemata like this one, for example:

(5)  bird(x), ⊬(penguin(x) ∨ ostrich(x) ∨ ...) / flies(x)
7  Inheritance in Semantic Networks and Default Logic
Such a formula means: "if x is a bird and if it cannot be proved that x is a penguin or an ostrich, then deduce that x flies." If we add the formula

(6)  bird(tweety)
then we can infer that tweety flies. Default reasoning is non-monotonic in the sense that the addition of new statements may invalidate previously derived facts: the set of theorems does not grow monotonically with the set of axioms. Namely, in the example, if we add to the formulae (5) and (6) the axiom

(7)  penguin(tweety)
then the theory is still consistent, but default (5) is not applicable and the formula flies(tweety) is no longer a theorem. An important difference between classical logic and default reasoning is that a single set of axioms can have more than one set of conclusions. For example, consider a universe with two objects A and B. Assume an object is not a block unless it is required to be. Assume also that either A or B is a block. We get the two statements:

(8)  ⊬Block(x) / ¬Block(x)

(9)  Block(A) ∨ Block(B)
Default (8) means "if it cannot be proved that x is a block then deduce that x is not a block". Now neither Block(A) nor Block(B) can be proved using the classical inference rules, so that ¬Block(A) and ¬Block(B) should be provable by means of (8). But the statement ¬Block(A) ∧ ¬Block(B), which then becomes provable, is inconsistent with (9). In order to avoid this inconsistency, we will manage to have two possible extensions, one in which Block(A) and ¬Block(B) are provable, and another in which Block(B) and ¬Block(A) are provable. This feature of default reasoning explains the difficulty of providing a semantics for defaults that allows a set of axioms to give rise to several extensions. Another problem is related to the definition of the non-monotonic theorems. In the inference schema

(10)  ⊬P / Q

it is necessary to know that P is not in the set of all provable statements in order to declare that Q is in that set. This amounts to saying that the set of provable statements should be known before any proof begins. To avoid this
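The clash between (8) and (9) can be sketched with a simple possible-worlds check (illustrative only; this is not Reiter's fixpoint construction of extensions): each default conclusion ¬Block(o) is admissible on its own, but applying the default to both objects contradicts the disjunction.

```python
# Two objects A and B; axiom (9) says at least one of them is a block.
from itertools import product

worlds = [dict(zip("AB", v)) for v in product([True, False], repeat=2)]
worlds = [w for w in worlds if w["A"] or w["B"]]     # keep worlds satisfying (9)

def admissible(not_blocks):
    """A set of conclusions {~Block(o)} is admissible iff some world
    satisfying (9) makes them all true."""
    return any(all(not w[o] for o in not_blocks) for w in worlds)

# Each conclusion of default (8) is admissible separately ...
assert admissible({"A"}) and admissible({"B"})
# ... but drawing both at once is inconsistent with (9):
assert not admissible({"A", "B"})
```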
C. Froidevaux and D. Kayser
E(εi) = 0,    var(εi) = σi²

Here π̂i is simply assumed to be a measurement value, without regard to its probabilistic properties. This model may only serve as a crude approximation, as probability values outside the interval [0, 1] are not excluded (cf. Rauch, 1984).
G. Paass
Binomial errors: niπ̂i ~ B(ni, πi(θ)), where B(ni, πi(θ)) indicates a binomial distribution. This error model presupposes an experiment involving ni independent random drawings according to the "true" probability πi of the rule or fact Fi. For each observation in the resulting sample Si it can only be determined whether Fi is true or false. Let ci be the number of times where Fi holds and ci/ni the relative frequency. The probability of ci out of ni for a fixed πi(θ) is then proportional to πi(θ)^ci (1 − πi(θ))^(ni−ci). The error model was proposed by Ginsberg (1985) and Paass (1986). It can be justified if the experts get their information by experience and base their probability estimates on relative frequencies that they have observed in practice. Because of the limited number of observed cases, there will be a deviation of ci/ni from the true value πi(θ), the sampling error, which is assumed to be the only cause of errors. It decreases with growing sample size ni. Many variants of these error distributions (e.g. transformed normal errors, discussed by Genest and Zidek, 1986; Lindley, 1985) may be used to represent the knowledge of the decision-maker about the precision of the experts' judgements. The parameters of the error distributions can, on the one hand, be specified directly if they have an intuitive meaning, like the standard deviation σi. On the other hand, they can be expressed in terms of quantiles of the error distribution. Assume, for instance, that the decision-maker wants to determine the sample size ni for the binomial error model and he knows that with probability 0.95 the value π̂i will be in the interval [0.7, 0.9] if πi(θ) = 0.8. Then, using the definition of the binomial distribution (or appropriate approximations), the corresponding sample size ni can be determined as 66.5. Assume that an expert has specified π̂i = 1 but the decision-maker thinks that this statement is uncertain to some extent.
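A sketch of this quantile condition under a normal approximation to the binomial (illustrative only; the exact binomial calculation used in the text yields 66.5, slightly larger than the approximate value below):

```python
# Sample size n_i such that P(|phat - pi| <= half_width) ~ coverage,
# using the normal approximation var(phat) = pi (1 - pi) / n.
from statistics import NormalDist

def sample_size(pi, half_width, coverage):
    z = NormalDist().inv_cdf((1 + coverage) / 2)   # two-sided normal quantile
    return (z / half_width) ** 2 * pi * (1 - pi)

n = sample_size(0.8, 0.1, 0.95)    # roughly 61.5 under this approximation
assert 60 < n < 63
```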
By means of an error model, it is possible to specify this uncertainty without necessarily stating evidence against π̂i = 1. This means that an estimate π̂i = 1 will be the "best" estimate of πi if the other probability assignments do not imply some evidence in favour of πi < 1. Usually the errors for different πi are assumed to be statistically independent. In other words, it is supposed that the actual deviation π̂i − πi for some πi is not influenced by the actual deviation π̂j − πj for any other πj, j ≠ i. This is reasonable if the experts use different sources of information and do not collaborate. Clemen and Winkler (1985) analyse the impact of dependence between experts on the precision of final estimates for probabilities. They show that dependent sources of information considerably reduce the precision of estimates in comparison with the independent case. An alternative is to model the joint distribution p(π̂i, π̂j | πi(θ), πj(θ)) with the
8  Probabilistic Logic
corresponding covariances or interactions. The statistical techniques discussed below can be applied directly to this case (cf. Paass, 1986). There are procedures to combine the opinions of experts that use no explicit statistical model but start from desirable properties of combination methods. One example is the linear opinion pool, where the estimated final distribution p̂W is a convex combination of the distributions pWi specified by the experts: p̂W = Σi λi pWi, with nonnegative weights λi summing to one. However, the same result would arise if an additive error with constant variance were assumed. For a discussion of these and other approaches see Genest and Zidek (1986).

5.2  Evaluation by statistical methods
Let us first discuss some statistical evaluation methods (for a short survey see Dawid, 1983). A widespread method is the likelihood approach. Its direct appeal lies in the idea that it is a good way to compare parameter values θ1 and θ2 by means of the probability that they assign to the observed "data". For given "data" π̂i specified by independent experts and error distributions p(π̂i | πi(θ)), we can define the likelihood function L(θ) by

L(θ) := p(π̂ | π(θ)) := ∏(i=1..nr) p(π̂i | πi(θ))
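As a toy illustration of maximizing this product (assumed data, not the chapter's system): two experts report frequencies for the same unknown probability under the binomial error model, and a simple grid search recovers the maximum-likelihood estimate, which here is just the pooled relative frequency.

```python
# Maximum-likelihood estimation for a single probability from two binomial reports.
import math

# expert reports: (relative frequency, hypothetical sample size) -- assumed values
reports = [(0.70, 57), (0.80, 43)]

def log_likelihood(pi):
    ll = 0.0
    for freq, n in reports:
        c = round(freq * n)                         # observed successes
        ll += c * math.log(pi) + (n - c) * math.log(1 - pi)
    return ll

# grid search for the maximizer of L(theta), cf. (13)
grid = [i / 1000 for i in range(1, 1000)]
pi_hat = max(grid, key=log_likelihood)

# the MLE pools the hypothetical samples: total successes / total draws
pooled = sum(round(f * n) for f, n in reports) / sum(n for _, n in reports)
assert abs(pi_hat - pooled) < 0.002
```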
This function summarizes all information present in the data: θ1 is more compatible with the data than θ2 if L(θ1) > L(θ2) (in the absence of other information). On the other hand, the same inferences should result for θ1 and θ2 if L(θ1) = L(θ2). Let Θmax be the set of parameters where L(θ) is maximal. Let us assume that Θmax contains only one element. This will happen if the number nθ of parameters is not too large in relation to the number (and structure) of the data items π̂i. It can be checked by examining the derivatives of L(θ). The unique maximal parameter value θ̂ is called the maximum-likelihood estimate:

L(θ̂) = max over θ of L(θ)        (13)

It utilizes all available information in an efficient way and yields the true parameter θ0 if the sample sizes go to infinity. The likelihood function may be maximized by different optimization methods using first- or second-order derivatives (McIntosh, 1982) or by simple iterative algorithms (e.g. the EM algorithm; Dempster et al., 1977) that work without derivatives. If the unknown distribution contains very many parameters then simplified algorithms have to be used to reduce the computational effort (cf. Paass, 1986). There are several ways to evaluate the likelihood function. First, of course, θ̂ can be utilized to calculate the associated value of p(U*) = g*'pW(θ̂) for a
proposition U*. The precision of the estimate p(U*) can be determined by a confidence interval. It measures the information contained in the data and enables the decision-maker to distinguish between well-established but equal probabilities and ignorance caused by missing information. It can be estimated using the appropriate likelihood-ratio statistic, whose accuracy depends on the accuracy of the approximation of L(θ) by a normal distribution, which increases with growing sample sizes ni. To explore the stochastic relation between some variables of interest, the marginal probabilities for these variables may be estimated. In this way, one can determine, for instance, the information content of yet-unknown symptoms for a diagnosis (Paass, 1986). If two experts consider identical rules Fi and Fj and give the same uncertain probability statement π̂i = π̂j with the same variance σ², then the combination of these two statements is equivalent to a statement with identical π̂i but variance σ²/2. Therefore pieces of evidence are accumulated, and the true πi is more likely to be located near π̂i. An inherent feature of the maximum-likelihood approach is that "contradictions" between given probabilities π̂i are resolved and a unique estimate p̂W = pW(θ̂) is determined. Estimated values π̃i may differ from the specified values π̂i. Contradictions are resolved in such a way that for less reliable π̂i the extent of modification is largest. Hence, by checking the difference between π̂i and π̃i, the "most contradictory" probabilities can be detected.
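The variance-halving remark can be checked with a small inverse-variance (precision-weighted) combination sketch, assuming independent errors with known variances:

```python
# Precision-weighted pooling of independent estimates (value, variance).
def combine(estimates):
    precisions = [1.0 / var for _, var in estimates]
    total = sum(precisions)
    value = sum(v * p for (v, _), p in zip(estimates, precisions)) / total
    return value, 1.0 / total

# Two identical statements with equal variance: same value, half the variance.
value, var = combine([(0.7, 0.04), (0.7, 0.04)])
assert value == 0.7 and abs(var - 0.02) < 1e-12
```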
Table 2

Rule or fact Fi    Value π̂i supplied    Interval               Sample size
                   by expert            [πi,low, πi,high]      ni
p(D | A)           0.20                 [0.00, 0.40]           11
p(D | ¬A)          0.80                 [0.70, 0.90]           43
p(B | D)           0.70                 [0.60, 0.80]           57
p(B | ¬D)          0.40                 [0.30, 0.50]           65
p(B | A)           0.10                 [0.00, 0.20]           24
p(A)               0.50                 [0.30, 0.70]           17
Example

Suppose that for our example the probabilities π̂i listed in Table 2 have been assigned by independent experts. Assuming binomial errors, the decision-maker assesses the reliability of the experts by specifying a "probable interval" [πi,low, πi,high] that will contain the true probability value πi with the prescribed probability Pi = 0.90. From this interval, the sample size ni of the corresponding hypothetical sample can be determined (a normal approximation was used). The hypothetical sample size ni is based on the assumption that the interval [πi,low, πi,high] corresponds to the 90% confidence interval for πi. The EM algorithm yields the maximum-likelihood estimate p(D | ¬A ∧ B) = 0.75 with an estimated standard deviation of 0.07. Compared with the worst-case solution of the example given above, this interval is much tighter. If the information about p(B | A) is not taken into account and A and B are assumed to be conditionally independent given D, the estimate p(D | ¬A ∧ B) = 0.87 results. This effect is quite general, as the assumption of conditional independence has a tendency to overstate the information content of the available "data". Now suppose that an additional expert states his knowledge π̂B := p(B | ¬A) = 0.50 and the decision-maker assumes this expert to be rather reliable, with πB,low = 0.45 and πB,high = 0.55. The corresponding hypothetical sample size is 270.7, yielding an estimate p(D | ¬A ∧ B) = 0.88. However, the π̂i are now contradictory to some extent, and the estimated values deviate from the specified values, for example p(B | A) = 0.35 instead of 0.10 and p(B | ¬A) = 0.52 instead of 0.50. According to a χ² test, the deviation of p(B | A) is more significant than the deviation of p(B | ¬A). In this sense the specified p(B | A) is "more contradictory" than p(B | ¬A). An alternative statistical evaluation principle is Bayesian analysis. Here the vector θ of unknown parameters itself is considered as an nθ-dimensional random variable with an unknown distribution Pr(θ). Probability vectors are treated as points in ℝ^nθ, and Pr(θ) is a probability measure on that space, which can usually be described by a probability density. It induces a distribution Pr(π) on π because pW = pW(θ) and π = (R⁺pW) ÷ (R⁻pW). The aim of Bayesian analysis is the combination of a given prior distribution Pr(θ) with some data.
In our context, Pr(θ) may result from two different lines of reasoning. First, it can encode structural information about the distribution discussed above (e.g. the absence of higher-order associations, higher plausibility of specific distributions). On the other hand, it can reflect complete ignorance and give the same density to all θ-values. However, the definition of such "non-informative" priors is a controversial topic (Berger, 1980, pp. 68ff). The crucial and arguable feature of Bayesian reasoning is that the prior distribution Pr(θ) is always assumed to exist. For a given prior distribution and a known error model p(π̂ | θ), the posterior density for the unknown parameter θ is

Pr(θ | π̂) = p(π̂ | θ) Pr(θ) / ∫ p(π̂ | θ) Pr(θ) dθ        (14)

It specifies the density value or "relative" probability of θ-vectors after the information contained in π̂ has been taken into account. Its maximum, the
maximum posterior estimate, if it exists, gives the "most probable" parameter value. Moreover, posterior regions may be determined where the posterior density is highest and which contain the true parameter with a prescribed probability. This can even be done in the case where there is no unique maximum of the posterior density. If the decision-maker has a constant "non-informative" prior density Pr(θ) = constant, indicating missing information about the parameter, then the maximum-posterior estimate is identical with the maximum-likelihood estimate (13). The determination of the maximum-posterior estimate can be done with the same methods and comparable effort as the calculation of the maximum-likelihood estimate. The determination of posterior regions involves the evaluation of multivariate integrals, which is computationally very demanding in the general case. Simplifications arise if Pr(θ) is assumed to be normally distributed, and least-squares methods like the extended Kalman filter (Lederman, 1984, pp. 902ff) may be used. Gokhale and Kullback (1978, pp. 199ff) discuss the application of the maximum-entropy approach. A new class of algorithms for the solution of the nonlinear optimization problems employs statistical simulation techniques. With relatively little effort they yield approximate solutions whose quality increases as the optimization progresses. Since in the case of uncertain probabilities the variance of θ̂ is usually large, suboptimal solutions are often sufficient. An example is the simulated annealing algorithm (Aarts and Laarhoven, 1985), in which a cost function C(θ) is to be minimized. The algorithm consists in successive modifications of a single component θi of θ to a new value θ̃i. If C(θ̃) < C(θ) the modification is accepted. Otherwise the modification is accepted with probability exp([C(θ) − C(θ̃)]/t).
If the control parameter t is slowly decreased towards zero, it can be shown that the resulting values concentrate on the set {θ | C(θ) = min C(θ′)} of minimal-cost parameters. In a Bayesian context we may define C_B(θ) := −log(p(π̂ | θ) Pr(θ)), and the procedure yields the maximum posterior estimate. In the same way the negative log-likelihood function C_L(θ) := −log p(π̂ | θ) or other cost functions may be utilized. If, for a suitable parameterization θ and the Bayesian cost function C_B(θ), the value of t is set to 1, it can be shown (Paass, 1987) that the resulting simulated sequence of parameter values can be considered as a sample of the posterior distribution Pr(θ | π̂). By observing the evolution of θ after convergence to the steady state, the marginal posterior distribution and corresponding posterior regions of any parameter can be obtained. In the same way, confidence regions can be established if the likelihood cost function C_L(θ) is employed as criterion function. In contrast with the sample-based procedure of Bundy (1985), the values of higher-order interactions not affected by the criterion function
8
Probabilistic Logic
237
can be controlled and, for example, can be set to their 'least informative' zero values.
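The annealing procedure described above can be sketched in a few lines of code. The following is an illustrative sketch, not Paass's implementation; the quadratic toy cost function, the step size and the geometric cooling schedule are arbitrary choices made for the example.

```python
import math
import random

def simulated_annealing(cost, theta, t0=1.0, cooling=0.999, steps=20000):
    """Minimize cost(theta) by successive single-component modifications.

    A worse candidate is accepted with probability
    exp((C(theta) - C(theta_new)) / t), and t is slowly decreased towards zero.
    """
    rng = random.Random(0)
    t = t0
    c = cost(theta)
    best, best_c = list(theta), c
    for _ in range(steps):
        cand = list(theta)
        i = rng.randrange(len(cand))
        cand[i] += rng.gauss(0.0, 0.1)      # modify a single component
        c_new = cost(cand)
        if c_new < c or rng.random() < math.exp((c - c_new) / t):
            theta, c = cand, c_new
            if c < best_c:
                best, best_c = list(theta), c
        t *= cooling                        # slowly decrease the control parameter
    return best

# A hypothetical Bayesian cost C_B(theta) = -log posterior, with mode at (1, 2)
cost_B = lambda th: (th[0] - 1.0) ** 2 + (th[1] - 2.0) ** 2
theta_map = simulated_annealing(cost_B, [0.0, 0.0])
```

With t held fixed at 1 instead of cooled, the accepted values would behave like a sample from the posterior distribution, as described in the text.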
6 SUMMARY AND COMPARISON

6.1 Assessment of probabilistic logic
In this section we summarize and discuss the main features of probabilistic logic in expert systems. It has been demonstrated above that the probability of a proposition A can be interpreted as the "subjective degree of belief" of the decision-maker in A. No frequency concept is needed, as the existence of a probability measure p(x) can be derived from a few axioms of rational behaviour, and there exists a clear framework for the interpretation of probabilities. An expert system seems to be particularly suitable for the application of probability, as it is a closed, simplified system with fixed states that can react to the real world only according to a limited number of "pieces of evidence".

In contrast with the early utilization of Bayes' rule, where a large number of prior probabilities were required, the approaches discussed in this chapter need only the information about the structure and probability values of the inference net that is logically necessary to arrive at a result at all. Of course, the price paid for such a relaxation of the requirements is that in general the resulting probabilities may no longer be unique point values but may only be known to be located within some interval.

The inference net can have an arbitrary form with cycles. Structural assumptions may be stated by restricting the class of distributions considered (if, for instance, higher-order interactions are zero) or by assuming informative prior distributions. In worst-case analysis no structural assumptions are necessary. The inference net can be revised and enlarged by simply exchanging rules or adding variables. If new propositions occur in the course of reasoning then the probability measure can be extended consistently. Hence probabilistic logic is not confined to situations where all propositions are known in advance. It is not necessary to stick to a single number for the description of the degree of belief in a proposition.
The precision of probabilities can be characterized by a whole function, the likelihood function or the related posterior density. Unlike classical logic, in general no single possible world will emerge as the "true" world, but all possible worlds have to be taken into account with differing chances of being the "true" world. Because of this complexity, no simple "explanation" of a result is usually feasible. It is,
however, possible to identify the main reasons that led to a specific result by considering the probabilities of the antecedents of rules. An inherent feature of the approaches for handling uncertain probabilities is their ability to resolve contradictions and to take into account the relative precision of "inputs" in the determination of the resulting probabilities. During the process of probabilistic reasoning, the probability p(A), and hence the truth value assigned to a proposition A, can change if new evidence arrives and uncertain probabilities are assumed. Consequently, probabilistic logic with uncertain pieces of evidence is a sort of non-monotonic logic.

The intention of this chapter was to discuss the different evaluation principles and inherent assumptions that may be chosen in probabilistic logic. The algorithms presented often involve large computational effort. They give a sort of reference solution, which may be used to derive simpler, computationally feasible procedures. Methods that may be employed for larger inference nets are:

(i) the linear-programming approach, where the restrictions are specified in terms of marginal probabilities;
(ii) the INFERNO approach of Quinlan (1983);
(iii) the Bayesian network technique proposed by Pearl (1986), where a specific interaction pattern is assumed;
(iv) statistical simulation techniques (Paass, 1987), which give approximate solutions for the most general case.
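To make method (i) concrete, here is a toy instance in the style of Nilsson (1986): given p(A) = 0.7 and p(A → B) = 0.9, the bounds on p(B) follow from a linear program over the four possible worlds. The numbers are invented for illustration, and the brute-force search over a hundredths grid below merely stands in for a real LP solver.

```python
# Worlds over (A, B): w1 = A and B, w2 = A and not-B, w3 = not-A and B,
# w4 = not-A and not-B.  Work in hundredths so the constraints are exact
# integer equalities:
#   p(A)      = p1 + p2      = 70
#   p(A -> B) = p1 + p3 + p4 = 90   (material implication, not-A or B)
def bounds_on_pB():
    lo, hi = None, None
    for p1 in range(101):
        for p2 in range(101 - p1):
            for p3 in range(101 - p1 - p2):
                p4 = 100 - p1 - p2 - p3
                if p1 + p2 == 70 and p1 + p3 + p4 == 90:
                    pB = p1 + p3
                    lo = pB if lo is None else min(lo, pB)
                    hi = pB if hi is None else max(hi, pB)
    return lo / 100, hi / 100

lo, hi = bounds_on_pB()   # p(B) is only known to lie in an interval
```

The constraints force p2 = 0.1 and p1 = 0.6, leaving p3 free in [0, 0.3], so the result is the interval [0.6, 0.9] rather than a point value, exactly the behaviour described above.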
As many research groups are currently working on new algorithms, significant progress can be expected in the near future.

6.2 Relation to similar approaches
The characteristic of the Shafer-Dempster approach (Shafer, 1976; Chapter 9 of the present book) is that it does not lead to a probability distribution over the exclusive and exhaustive possible worlds Wᵢ ∈ 𝒲, but rather starts with a "probability mass function" m(A) ≥ 0 on the Boolean algebra ℱ over 𝒲. This function is supplied by the experts, and the masses have to sum to 1.

Example  Consider the possible worlds 𝒲 = {W₁, W₂, W₃}. Then the corresponding Boolean algebra is given by ℱ = {∅, W₁, W₂, W₃, W₁ ∨ W₂, W₁ ∨ W₃, W₂ ∨ W₃, W₁ ∨ W₂ ∨ W₃}. Assume the probability mass function m(W₁) = 0.3, m(W₃) = 0.1, m(W₂ ∨ W₃) = 0.4, and m(W₁ ∨ W₂ ∨ W₃) = 0.2.

If a mass is given to Wᵢ ∨ Wⱼ, this means that this "amount" of belief can be
attributed jointly to Wᵢ and Wⱼ, but the decision-maker does not have enough information to allocate proportions of m(Wᵢ ∨ Wⱼ) to the single propositions Wᵢ and Wⱼ. Consequently the mass given to the disjunction of all Wᵢ cannot be allocated at all.

This situation can also be modelled by means of probabilistic logic. Assume for our example that there is a random variable w with possible "values" W₁, W₂ and W₃, and that there exists a probability measure p(w) for w. The values of p(w) are not known. There exists, however, another variable v that has the "values" v₁, v₃, v₂₃ and v₁₂₃, corresponding to the elements of ℱ with positive mass. w and v are assumed to have a joint distribution p(v | w)p(w) = p(w, v) = p(w | v)p(v). About p(w | v), the decision-maker has some structural information:

p(W₁ | v₁) = 1,    p(W₂ | v₁) = 0,    p(W₃ | v₁) = 0
p(W₁ | v₃) = 0,    p(W₂ | v₃) = 0,    p(W₃ | v₃) = 1
p(W₁ | v₂₃) = 0,    p(W₂ | v₂₃) = α,    p(W₃ | v₂₃) = 1 - α
p(W₁ | v₁₂₃) = β₁,    p(W₂ | v₁₂₃) = β₂,    p(W₃ | v₁₂₃) = 1 - β₁ - β₂
The free parameters α, β₁ and β₂, however, are unknown to him. They represent the information necessary for an allocation of probability masses to the probabilities of the Wᵢ. The basic probability assignment defines the marginal distribution p(v). It is then the task of the decision-maker to derive conclusions about the probability of the elements of the Boolean algebra. Obviously there is no unique solution, but upper and lower bounds on these probabilities can be established by use of worst-case analysis. The resulting bounds are the narrowest possible without introducing additional information into p(v, w).

Grosof (1986) and Kyburg (1987) point out that every Shafer-Dempster belief function may be expressed by inequality constraints on an underlying probability measure. The converse, however, is not true, and hence the Shafer-Dempster scheme has less expressive power than the probability approach with upper and lower bounds. In the case of two experts supplying two different probability mass functions, an error model could be formulated according to the precision of their assignments. By convex Bayesian analysis (Thompson, 1985), the mass functions can be combined (cf. Grosof, 1986). It would be interesting to compare the results of Dempster's rule of combination with these probabilistic techniques. Lemmer (1986) has already shown that the combination rule yields results contradictory to a probability interpretation.
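For the three-world example above, the lower and upper bounds induced by the mass function are Shafer's belief and plausibility values, which can be computed directly. A minimal sketch, where sets of world indices stand in for the disjunctions:

```python
# Mass function from the example: {1} stands for W1, {2, 3} for W2 v W3, etc.
m = {frozenset({1}): 0.3,
     frozenset({3}): 0.1,
     frozenset({2, 3}): 0.4,
     frozenset({1, 2, 3}): 0.2}

def belief(a):
    """Lower bound: total mass committed to subsets of a."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(a):
    """Upper bound: total mass of focal elements compatible with a."""
    return sum(v for b, v in m.items() if b & a)

# For W2 v W3: belief = 0.1 + 0.4 = 0.5, plausibility = 0.1 + 0.4 + 0.2 = 0.7
bel = belief(frozenset({2, 3}))
pl = plausibility(frozenset({2, 3}))
```

The mass 0.2 on W₁ ∨ W₂ ∨ W₃ contributes to the plausibility of every non-empty element but to the belief of none except the whole algebra, which is exactly the "unallocatable" belief discussed in the text.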
Many of the structural features of other approaches to uncertain reasoning are similar to probabilistic logic. Gaines (1978) shows that probabilistic logic as well as fuzzy logic (see Chapter 10) can be considered as special cases of a "standard uncertainty logic", which he defines by a set of axioms. To arrive at probabilistic logic, the axiom of the excluded middle p(A ∨ ¬A) = 1 has to be added, while another axiom is added to arrive at a variant of fuzzy logic. Goodman and Nguyen (1985) develop generalized set-membership functions with probabilistic and fuzzy logic as special cases.

Horvitz et al. (1986, pp. 212f) discuss the relation of two distinct forms of fuzzy logic to probabilistic logic. The first type (Zadeh, 1983) allows beliefs to be assigned to propositions that are fuzzy, i.e. remain ill-defined. Proponents of probability theory have pointed out that imprecision in the specification of a proposition could always be converted to uncertainty about a related precise event with similar or identical semantic content. Cheeseman (1986) proposed that probability distributions over variables of interest can capture the characteristics of fuzziness within the framework of probability. The fuzzy proposition "Mary is young", for instance, can be represented by a distribution specifying the probability that Mary has age z. He claims that "fuzzy logic is unnecessary for representing and reasoning about uncertainty." The second type of fuzzy logic (Gaines, 1978) interprets the degree μ_T(A) of membership of a proposition A in the set of true propositions as the degree of belief in A. In this approach the degree of belief in a conjunction is defined by μ_T(A ∧ B) = min(μ_T(A), μ_T(B)). Obviously, this is not consistent with the factorization p(A ∧ B) = p(A | B)p(B) of probability theory. It is, however, comparable to features of the worst-case analysis discussed above. The relation between probabilistic logic and default logic is discussed in Chapter 10.
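The connection between the min rule and worst-case analysis can be stated precisely: when only the marginals p(A) and p(B) are known, the worst-case (Fréchet) bounds confine p(A ∧ B) to an interval whose upper end is exactly min(p(A), p(B)). A small illustration, with arbitrary marginal values:

```python
def conjunction_bounds(pA, pB):
    """Frechet bounds on p(A and B) given only the marginals."""
    lower = max(0.0, pA + pB - 1.0)   # attained under maximal disagreement
    upper = min(pA, pB)               # coincides with the fuzzy min rule
    return lower, upper

lo, hi = conjunction_bounds(0.7, 0.8)
# the independence value p(A) * p(B) = 0.56 always lies inside [lo, hi]
```

So the min combination, while inconsistent with the product factorization, reappears in probabilistic logic as the upper bound of the worst-case interval.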
There are many other approaches to uncertain reasoning, some of which are rather ad hoc. Horvitz et al. (1986) showed, for example, that the certainty-factors approach is inconsistent. Heckerman (1986) modified the certainty-factor method in such a way that it satisfies the requirements of a Bayesian probability interpretation.

The above arguments show that probabilistic logic is able to exhibit features similar to some "new" concepts of non-monotonic reasoning. Hence the concepts seem to be complementary rather than contradictory. For each approach it is most important to clarify and check all the inherent assumptions (independence, absence of higher-order associations, etc.) before applying it to a concrete problem. There has been great progress in this field during the last few years, and it is to be hoped that parallel research on different concepts will stimulate progress in uncertain reasoning as a whole.
ACKNOWLEDGMENTS
I should like to thank Didier Dubois, Gabor Gyarfas, Hermann Quinke, Philippe Smets and Frank Veltman for their valuable comments. In addition, I am grateful to the Gesellschaft für Mathematik und Datenverarbeitung, who provided the opportunity to work on this subject.

BIBLIOGRAPHY

The following references provide an introduction to the basic principles and problems of probabilistic logic and compare it with other approaches.

Cheeseman, P. (1985). In defense of probability. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 1002-1009. (Starting with the notion of probability as a measure of belief in the truth of a proposition, the positive features of probabilistic reasoning in comparison with other approaches are compiled. It is argued that probability theory, when used correctly, is sufficient for the task of uncertain reasoning.)

Fishburn, P. C. (1986). The axioms of subjective probability (with discussion). Statist. Sci. 1, 335-358. (Gives an up-to-date survey of axiom systems, including comparative probability relations, decision-theoretic approaches, interval probabilities, etc. The discussion gives an impression of the controversies in this field.)

French, S. (1985). Group consensus probability distributions: a critical survey. Bayesian Statistics 2 (ed. J. M. Bernardo et al.), pp. 183-202. North-Holland, Amsterdam. (Two main versions of the group-consensus problem are considered. In the expert problem a group of experts submits probability judgements to a decision-maker outside the group, who has to aggregate the experts' opinions. In the group-decision problem the group itself is responsible for aggregating their probability judgements into a consistent probability distribution.)

Genest, C. and Zidek, J. V. (1986). Combining probability distributions: a critique and an annotated bibliography. Statist. Sci. 1, 114-148.
(This paper discusses the problem of aggregating a number of probability distributions specified by different experts. In contrast with probabilistic reasoning, no marginal or conditional distributions are specified by the experts. The different approaches are compared from the point of view of decision theory. The extensive bibliography and the discussion give a comprehensive picture of this field.)

Kanal, L. N. and Lemmer, J. F. (eds) (1986). Uncertainty in Artificial Intelligence. North-Holland, Amsterdam. (This collection contains nearly 40 papers and gives a representative impression of current developments. The topics of probabilistic reasoning, belief functions, maximum entropy, and interval probabilities are covered and compared in depth.)

Nilsson, N. J. (1986). Probabilistic logic. Artificial Intelligence 28, 71-87. (Defines the truth value of sentences of first-order logic by their probability in probabilistic reasoning systems. The derivation applies to any logical system for which the consistency of a finite set of sentences can be established.)

Paass, G. (1986). Consistent evaluation of uncertain reasoning systems. Proc. 6th Int. Workshop on Expert Systems and their Applications, Avignon, pp. 73-94. (Inference nets are considered where the probabilities of facts and rules are not known exactly but
are subject to error. These probabilities are modelled as random samples, where the number of elements determines their reliability. The inference net is evaluated according to the maximum-likelihood principle, allowing conflicting evidence to be processed.)

Quinlan, J. R. (1983). INFERNO: a cautious approach to uncertain reasoning. Comp. J. 26, 255-269. (Specifies a method for the evaluation of inference nets where intervals are specified for marginal and conditional probabilities, yielding intervals that have to contain the true probabilities. The approach is computationally cheap, but may yield intervals that are larger than optimal.)

Spiegelhalter, D. J. (1986a). A statistical view of uncertainty in expert systems. Artificial Intelligence and Statistics (ed. W. Gale), pp. 17-55. Addison-Wesley, Reading, Mass. (Different approaches to uncertain reasoning (e.g. probabilistic reasoning, fuzzy reasoning, belief functions and the theory of endorsements) are compared from a statistical point of view. It is argued that a subjectivist Bayesian view of uncertainty can provide many features demanded by expert systems. Some examples as well as numerically feasible methods for the evaluation of inference nets are discussed.)
Other references
Aarts, E. H. L. and van Laarhoven, P. J. M. (1985). Statistical cooling: a general approach to combinatorial optimization problems. Philips J. Res. 40, 193-226.
Berger, J. O. (1980). Statistical Decision Theory. Springer-Verlag, New York.
Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M. (eds) (1985). Bayesian Statistics 2. North-Holland, Amsterdam.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, Mass.
Bundy, A. (1985). Incidence calculus: a mechanism for probabilistic reasoning. J. Autom. Reasoning 1, 263-283.
Cheeseman, P. (1986). Probabilistic versus fuzzy reasoning. In Kanal and Lemmer (1986), pp. 85-102.
Clemen, R. T. and Winkler, R. L. (1985). Limits for the precision and value of information from dependent sources. Operations Res. 33, 427-442.
Dalkey, N. C. (1986). Inductive inference and the representation of uncertainty. In Kanal and Lemmer (1986), pp. 393-397.
Dawid, A. P. (1983). Statistical inference. Encyclopedia of Statistics, Vol. 4 (ed. S. Kotz and N. L. Johnson), pp. 80-105. Wiley, New York.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B39, 1-38.
Diaconis, P. and Zabell, S. L. (1982). Updating subjective probability. J. Am. Statist. Assn 77, 822-830.
Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press, Cambridge, Mass.
Gaines, B. R. (1978). Fuzzy and probability uncertainty logics. Info. Control 38, 154-169.
Gale, W. (ed.) (1986). Artificial Intelligence and Statistics. Addison-Wesley, Reading, Mass.
Ginsberg, M. L. (1985). Does probability have a place in nonmonotonic reasoning? Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85) (ed. A. Joshi), Los Angeles, pp. 107-110.
Gokhale, D. V. and Kullback, S. (1978). The Information in Contingency Tables. Marcel Dekker, New York.
Good, I. J. (1982). Axioms of probability. Encyclopedia of Statistical Sciences, Vol. 1 (ed. S. Kotz and N. L. Johnson), pp. 169-176. Wiley, New York.
Goodman, I. R. and Nguyen, H. T. (1985). Uncertainty Models for Knowledge-Based Systems. North-Holland, Amsterdam.
Grosof, B. N. (1986). An inequality paradigm for probabilistic reasoning. In Kanal and Lemmer (1986), pp. 259-275.
Haber, M. and Brown, M. B. (1986). Maximum likelihood methods for log-linear models when expected frequencies are subject to linear constraints. J. Am. Statist. Assn 81, 477-482.
Hamburger, H. (1986). Representing, combining and using uncertain estimates. In Kanal and Lemmer (1986), pp. 399-414.
Heckerman, D. (1986). Probabilistic interpretations for MYCIN's certainty factors. In Kanal and Lemmer (1986), pp. 167-196.
Horvitz, E. J., Heckerman, D. E. and Langlotz, C. P. (1986). A framework for comparing alternative formalisms for plausible reasoning. Proc. American Association for Artificial Intelligence Conf. (AAAI-86), pp. 210-214.
Hunter, D. (1986). Uncertain reasoning using maximum entropy inference. In Kanal and Lemmer (1986), pp. 203-209.
Kahneman, D., Slovic, P. and Tversky, A. (1982). Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press.
Konolige, K. (1982). An information-theoretic approach to subjective Bayesian inference in rule-based systems. Draft, SRI International, Menlo Park.
Kyburg, H. E. (1987). Bayesian and non-Bayesian evidential updating. Artificial Intelligence 31, 271-293.
Lederman, E. (ed.) (1984). Handbook of Applicable Mathematics, Vol. VI, Part B: Statistics. Wiley, Chichester.
Lemmer, J. F. (1986).
Confidence factors, empiricism and the Dempster-Shafer theory of evidence. In Kanal and Lemmer (1986), pp. 117-125.
Lindley, D. V. (1985). Reconciliation of discrete probability distributions. In Bernardo et al. (1985), pp. 375-390.
Loui, R. P. (1986). Interval-based decisions for reasoning systems. In Kanal and Lemmer (1986), pp. 459-472.
McIntosh, A. A. (1982). Fitting Linear Models: An Application of Conjugate Gradient Algorithms. Springer-Verlag, New York.
Paass, G. (1988). Uncertain reasoning by stochastic simulation. Working Paper GMD/F3, St Augustin, FRG.
Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence 29, 241-288.
Rauch, H. E. (1984). Probability concepts for an expert system used for data fusion. AI Magazine (Fall), pp. 55-60.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press.
Shore, J. E. (1986). Relative entropy, probabilistic inference, and AI. In Kanal and Lemmer (1986), pp. 211-215.
Smith, C. A. B. (1961). Consistency in statistical inference and decision. J. R. Statist. Soc. B23, 1-25.
Spiegelhalter, D. J. (1986b). Probabilistic reasoning in predictive expert systems. In Kanal and Lemmer (1986), pp. 47-67.
Thompson, T. R. (1985). Parallel formulation of evidential reasoning theories. Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-85), Los Angeles, pp. 321-327.
Zadeh, L. A. (1983). The role of fuzzy logic in management of uncertainty in expert systems. Fuzzy Sets and Systems 11, 199-227.
DISCUSSION

Frank Veltman: Since most decisions must be made under circumstances of uncertainty, the designer of an expert system is immediately confronted with the problem of representing the modes of inference typical of such circumstances. Gerhard Paass's chapter offers an encyclopaedic survey of the statistical techniques that become available once one has taken for granted that the kind of "uncertainty" at stake here is best captured by probability theory. I shall not dispute the idea that probability theory offers the best characterization of uncertainty; indeed, there are strong arguments in favour of this position (see e.g. Lindley, 1982). My comments, one remark and one question, only pertain to the particular way in which Paass develops this idea.

(i) The remark that I want to make is this: the domain of application of the mathematical framework offered in Sections 2.3 and 3.1 is rather limited, much more limited than the author seems to think.

Amplification. The theory presented in Sections 2.3 and 3.1 requires for its application that probabilities be assigned to things that bear a truth value: sentences (or propositions, as the author prefers to call them). This is established by choosing as the sample space a set of so-called possible worlds, each determined by some maximal consistent set of sentences. The practice, however, does not fit this theory. For one thing, there is no way to make sense of the example presented in Section 2.1 if the symbols "A", "B" and "D" are really to be interpreted as sentences. This is what Paass says:

Suppose that a doctor has to decide whether or not a patient has a disease D. The relation between the two symptoms A and B and the disease D is specified in the form of the following rules F₁, ..., F₅, which hold with a certain probability:

F₁ := "If A then D follows" holds with probability π₁
F₂ := "If ¬A then D follows" holds with probability π₂
[...] These probabilities πᵢ reflect the subjective degree of belief of the doctor in the truth of the rules for a certain universe, for instance the people of a town. [...] the probability associated with a rule is always defined as the conditional probability of the consequence given the antecedent. For F₁ we have for example π₁ = p(D | A) := p(A ∧ D)/p(A).
One might try to interpret "A" as "a person chosen at random shows symptom A", and "D" as "a person chosen at random suffers from disease D", but in this manner the sentence (A ∧ D) is not going to mean "a person chosen at random has both the
symptom A and the disease D", let alone that p(D | A) can serve as a formalization of the conditional probability that a person chosen at random suffers from disease D, given that this person shows symptom A. There are two alternative, and appropriate, ways to interpret the quoted paragraphs, but in neither of them can A and D be understood as sentences.

On the first reading, A and D are to be interpreted as predicates, and the sample space concerned is not a set of possible worlds, but a set of possible "patients", each given by some maximal consistent set of predicates. Fortunately, the statistical techniques discussed in Section 4 work just as well, mutatis mutandis, for this set-up as they do for the original. However, this only holds as long as we restrict ourselves to one-place predicates. For more complex predicates things go wrong:

Example 1  Some people think that the kissing disease is transmitted by kissing. Clearly, these people, and those who disagree with them, will want to talk about the conditional probability that a person x will be infected with the kissing disease by a person y who has the kissing disease, and who has kissed x. As far as I can see, there is no way to handle this probability within the framework offered by Paass, not even if we interpret it in a different manner.
The second way to read the quoted paragraphs is to think of A and D as formulae Ax and Dx in which the free variable x is suppressed. Ax can be considered as an abbreviation of "x shows symptom A", and Dx as an abbreviation of "x suffers from disease D". Where Paass writes (A ∧ D), we read (Ax ∧ Dx), and instead of p(D | A) we read p(Dx | Ax). Perhaps, at first sight, there is not much difference between this reading and the one discussed above. However, the advantage of not suppressing the free variables becomes clear if we want to formalize examples that involve both formulae with free variables and sentences, i.e. formulae without free variables.

Example 2
Compare the following:

(a) the probability that every male smoker dies from cancer before the age of 60;
(b) the conditional probability that a randomly chosen person dies from cancer before the age of 60, given that this person is a male smoker.
Note that these two probabilities are not necessarily the same. In fact we can be pretty sure that an expert will assign the probability 0 to (a), and a positive probability to (b). Still, there are logical relations between these two probabilities: if one of them is 1 then so is the other. Within Paass's framework, these logical relations cannot be made explicit. What one needs for this example, as well as for the previous one, is a fully fledged probabilistic semantics for arbitrary first-order formulae with or without free variables: a semantics that assigns to a formula φ(x₁, ..., xₙ) the probability of finding for a randomly chosen n-tuple of objects that the property expressed by φ applies to them. Given such a framework, we could safely write "Dx" for "x dies before the age of 60", and "Sx" for "x is a male smoker", and so arrive at p(∀x(Sx → Dx)) for the probability under (a), and p(Dx | Sx) for the probability under (b). The probability of Example 1 could be formalized as p(Ixy | Py ∧ Kyx), where "Ixy" is an abbreviation of "x will be infected by y", "Px" is an abbreviation of "x has the kissing disease", and "Kxy" is an abbreviation of "x has kissed y".

A probability semantics with the desired properties was devised in the early sixties by the Polish logician Jerzy Łoś. Unfortunately, lack of space prevents me from
describing his theory here. Let me just mention the relevant literature. The locus classicus is Łoś (1963). The theory is further developed in Fenstad (1967) and in Gaifman and Snir (1982). Cooke (1986) discusses the theory with a view to application in expert systems.

(ii) The question that I want to ask is this: How does the discussion of Section 5 relate to other work that has been done on the subject of expert resolution? More precisely, is it meant as an alternative to the approach that tries to develop criteria for evaluating expert probability assessment?

Amplification. At several places in his chapter, Paass emphasizes that the probabilities at stake are supposed to reflect the expert's subjective degree of belief in the propositions concerned. Now, clearly, different experts may have different opinions about the same propositions, and therefore assign different numbers to them. In one way or another the decision-maker will have to resolve this conflict of opinions. I must confess that I do not fully understand the strategies that Paass recommends for this purpose. Actually, I find myself already at a loss with the way he introduces the problem. He says that the probabilities supplied by the experts may be erroneous to some extent, and he introduces so-called error models that give the stochastic relation between the true probability and the numbers supplied by the experts. I find it odd to find the word "error" in a context where subjective probabilities are involved (can people really be mistaken about their own degree of belief?), and I should be greatly helped if the author could give an example of the kind of errors he has in mind. Neither do I see what can possibly be meant by "the true probability", and it does not help much when the author says on p. 230 that
The "true" probability 1t; can be considered as the subjective probability estimate that would be supplied by a rational expert with complete information about all aspects of the problem. As far as I can see, the only feasible subjective probability estimates that any rational being-expert or not-will supply in these ideal circumstances are 0 for the propositions known to be false, and I for the propositions known to be true. But this is not what the author has in mind, I am afraid.t Anyway, the question arises as to why the author takes recourse to these error models to solve a problem that first and foremost seems to be a selection problem: which of the experts is the most reliable-or, perhaps better: which one has so far been found to be the most reliable? This question has received a lot of attention recently. and several criteria have been proposed for evaluating expert probability assessment. Important contributions can be found in Lichtenstein et a/. (1982), De Groot and Fienberg (1983) and Cooke (1985). I am grateful to Roger Cooke and Michie! van Lambalgen for their help.
† On p. 217ff Paass discusses as a special case the situation where the experts concerned have given their subjective estimates of an objective probability, one or another statistical fact. Of course, in this special case the words "error" and "true probability" do apply.

Didier Dubois and Henri Prade: Are probability measures inevitable for modelling subjective uncertainty? Several authors cited by Paass, such as Cheeseman (1983) and Horvitz et al. (1986), have claimed that axioms of rational behaviour force an
uncertainty measure to be a probability measure. To do so, they put forward Cox's (1946) axiom system for the modelling of "reasonable expectation". However, Cox's axioms are not always exactly reported. Namely, he starts from the following requirements. Letting f(b | a) be a measure of the "reasonable credibility" of the proposition b when the proposition a is known to be true, Cox proposes two basic axioms:

C1: there is some operation * such that

f(c ∧ b | a) = f(c | b ∧ a) * f(b | a)    (1)

C2: there is a function S such that

f(¬b | a) = S(f(b | a))    (2)

where ¬b denotes not b. The following additional technical requirement is needed:

C3: * and S both have continuous second-order derivatives.
Then .f is proved to be isomorphic to a probability measure. Cheeseman (1985) proposes Cox's results as a formal proof that no other set functions than probability measures are reasonable for the modelling of subjective uncertainty. This claim can be disputed for two reasons. Although (I) seems very sensible as a definition of conditional credibility function, the purely technical assumption (C3) is very strong and cannot be justified on commonsense arguments. For instance* =minimum is a solution of (I) that does not violate the algebra of propositions, but it certainly violates C3. It is recovered as a valid solution as soon as C3 is relaxed to a more intuitive continuity assumption. A second objection concerns Axiom C2, which explicitly states that only one number is enough to describe both the uncertainties of b and --,b. Clearly, this statement rules out the ability to distinguish between the notions of possibility and certainty. This distinction is the very purpose of belief functions, possibility measures, and any kind of upper and lower probability system. Hence Cox's setting, although an interesting attempt at recovering probability measures from a purely non-frequentist point of view, does not provide the ultimate answer to the problem of justifying subjective probabilities. Dubois and Prade (1982) should be consulted for another axiomatic setting encompassing both probability and possibility measures as admissible models of subjective uncertainty. Namely the degree g(a) attached to proposition a should satisfy the following decomposability axiom: Dl:
if a " b =
0
then there is an operation .l such that g(a v b)= g(a) .l g(b)
(3)
These are called decomposable measures. They are usually different from belief functions in the sense of Shafer, although possibility and probability measures do belong to both settings. But many belief functions are not decomposable. Another comment concerns the interpretation of degrees of probability as degrees of intermediate truth. In our contribution to this book (Chapter 10) we strongly argue against such a confusion. A degree of truth is not a degree of uncertainty about truth. In particular, the logical propositions considered in Paass's chapter are only either true or false. This distinction, in a probabilistic setting, is at least as old as Carnap's (1945) paper. However, Carnap's view of the probability of a proposition a as the ratio of the number of possible worlds where a is true to the number of possible worlds is not really convincing, since it assumes that the possible worlds are equally
248
G. Paass
probable, an assumption that is very difficult to check and which turns out to be false in many cases. Actually, a probability measure on an algebra of propositions is justified if the possible worlds are identified with the set of outcomes of a random process, and statistical data about this process are available.
Philippe Smets: The author, like probabilists, translates the sentence

F: "if W then N" holds with probability p
as P(N | W) = p. This interpretation is questionable. One can translate F in at least two ways:

1: P(N | W) = p
2: P(¬W ∨ N) = p
Suppose that an urn contains 100 balls. Let W = "the ball is white" and N = "the ball is numbered". Suppose that there are 60 W ∧ N balls, 15 W ∧ ¬N, 12 ¬W ∧ N and 13 ¬W ∧ ¬N. A ball is going to be drawn at random (each ball having a probability of 0.01 of being selected). If one translates F into "if I have extracted a white ball then the probability that the ball is numbered is p" then one has P(N | W) = p = 0.80. But one can also consider all extractions where the proposition W → N is true, i.e. whenever the ball is either W ∧ N or ¬W. One obtains the second translation with P(¬W ∨ N) = p = 0.85. The two translations correspond respectively to 1':
W → (P(N) = p)
2': P(W → N) = p
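The urn numbers can be checked exactly; a minimal Python sketch (the counts are the text's own; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

# Ball counts from the example: (white?, numbered?) -> number of balls (100 in all).
counts = {(True, True): 60, (True, False): 15, (False, True): 12, (False, False): 13}
total = sum(counts.values())

def p(pred):
    """Probability that a randomly drawn ball satisfies pred(white, numbered)."""
    return Fraction(sum(n for (w, nb), n in counts.items() if pred(w, nb)), total)

p_n_given_w = p(lambda w, nb: w and nb) / p(lambda w, nb: w)   # translation 1
p_material = p(lambda w, nb: (not w) or nb)                    # translation 2

print(p_n_given_w)   # 4/5, i.e. P(N | W) = 0.80
print(p_material)    # 17/20, i.e. P(¬W ∨ N) = 0.85
```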
The decision as to which translation is relevant can only be derived from information external to F. In medicine, with G = "if symptom S then disease D holds with probability p", one usually derives G from the fact that among those with symptom S a proportion p have disease D, in which case interpretation 1 holds. But it could also be that one receives the rule S → D from a professor whose proportion of correct assertions is p. Then interpretation 2 holds. A study of which proposition receives the 1 − p can be enlightening. In the case of G, p is the proportion of time the professor tells the truth and 1 − p is the proportion of time the professor does not tell the truth. To be a good probabilist, one must then construct the probability distribution on all the sentences that the professor could assert given that he does not tell the truth (often an unreal requirement) and distribute the probability 1 − p among these sentences. With belief functions, the distinction between 1 and 2 disappears. One has bel(N | W) = c(bel(¬W ∨ N) − bel(¬W)) (see Chapter 9, Section 3.6), with c = 1 or c = 1 − bel(¬W), depending on the open- or closed-world assumption (Chapter 9, Section 4). Let 𝒲 and 𝒩 be the spaces on which W and N are defined (i.e. 𝒲 is the set of colours). When I have only the information "my belief is p that W → N is true", I build a belief function on 𝒲 × 𝒩 such that bel(N | W) = p and such that bel(cyl(W)) = bel(cyl(¬W)) = 0 (Chapter 9, Section 6), with cyl(W) = (W, N) ∨ (W, ¬N). Then c = 1 and bel(N | W) = bel(¬W ∨ N).
8
Probabilistic Logic
249
Therefore in the G case, the belief-function approach consists in allocating the mass p to S → D and the mass 1 − p to the tautology, avoiding in fact the obligation to enumerate the set of sentences that might be uttered by the professor when he lies, and to allocate a probability to each of them. To assimilate the probability of a conditional A → C to a conditional probability P(C | A) is hazardous, especially when considering iterated conditionals like A → (B → C), as shown by Lewis (1976) in his first triviality result. Let us define the → operator such that P(A → C) = P(C | A) for every A and C. Then one gets

P(A → C | C) = P(C | A ∧ C) = 1
P(A → C | ¬C) = P(C | A ∧ ¬C) = 0

For any D one has

P(D) = P(D | C)P(C) + P(D | ¬C)P(¬C)

If D is A → C then one obtains

P(C | A) = 1 · P(C) + 0 · P(¬C) = P(C)

so A and C are probabilistically independent! Conditionals are highly delicate concepts (see Harper et al., 1981), and their direct translation as conditional probabilities can be misleading.
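Lewis's argument is purely symbolic, but the urn above already shows numerically that no such → operator can exist on that space; a small exact check (counts from the urn example):

```python
from fractions import Fraction

# Same urn as in the comment: (white?, numbered?) -> ball count.
counts = {(True, True): 60, (True, False): 15, (False, True): 12, (False, False): 13}
total = sum(counts.values())

def p(pred):
    return Fraction(sum(n for (w, nb), n in counts.items() if pred(w, nb)), total)

p_n = p(lambda w, nb: nb)                                      # P(C), with C = N
p_n_given_w = p(lambda w, nb: w and nb) / p(lambda w, nb: w)   # P(C | A), with A = W

# If an operator -> with P(A -> C) = P(C | A) existed (also under conditioning
# on C and ¬C), Lewis's argument would force P(C | A) = P(C).
# Here P(N | W) = 4/5 while P(N) = 18/25, so no such operator exists.
assert p_n_given_w != p_n
print(p_n_given_w, p_n)
```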
Reply: First, I should like to discuss the objections of Frank Veltman to the definition of a probability measure using propositions. Recall that the set W = {w1, ..., wn} of elementary propositions was defined as an exhaustive collection of consistent and mutually exclusive statements about the world. Exactly one of these statements, wj, is true. To remain in our example of medical diagnosis, the world is described by the characteristics of the next patient in the waiting room of the doctor. The possible worlds are the possible, logically consistent combinations wi of symptoms and diseases of this patient. The subjective probability measure of the doctor assigns a number 0 ≤ p(U) ≤ 1 to each subset U of W according to the subjective degree to which he believes this subset to contain the fixed, but as yet unobserved, realization wj ∈ W, the true symptoms and diseases of the patient. Hence relevant propositions are stated with respect to a specific situation (a specific patient). A sample or a population is not necessary for the specification and evaluation of subjective beliefs by subjective probability measures. As long as the set of elementary propositions is finite, this conceptually simple setup may be utilized. Nilsson (1986, pp. 77ff), for instance, shows how problems of first-order logic may be solved by this approach. However, first-order logic and probability are only loosely connected in Nilsson's theory, as in a first step the "internal" consistency of the possible worlds wi has to be established within first-order logic, and in a second step probability theory is applied to derive the desired probability of consequences. Therefore I agree that integrated probability semantics for arbitrary first-order formulae, as proposed by Veltman, are more appropriate in this case. Let me now discuss the remarks of Veltman concerning the case of uncertain knowledge about probabilities.
Here an independent external decision-maker is postulated who is able to judge the reliability of different experts. This decision-maker is assumed to specify his subjective belief about the reliability of the ith expert for a series of hypothetical situations. In each such hypothetical situation he is asked to assume a specific "true" value, e.g. πi = 0.3, for the probability p(A) in question.
Subsequently, he is asked to specify his subjective probability that the number π̂i that the expert will assign to p(A) will be lower than a specific value ci, for example ci = 0.1. By specifying his subjective probability for different values ci, the decision-maker can formulate his conditional subjective probability measure p(π̂i | πi = 0.3) for the situation that πi = 0.3. This process is repeated for other hypothetical situations with different values of πi. In this way, a subjective conditional probability measure p(π̂i | πi) can be gained, giving a complete picture of how the decision-maker judges the performance of the ith expert in different circumstances. Note that p(π̂i | πi) contains no indication of the "true" probability in question. In the literature on group decision making, such an external decision-maker has been called a "supra-Bayesian" (Genest and Zidek, 1986, pp. 120ff). This approach has several advantages over others. (i)
If such a decision-maker exists then the pooling process is not a problem, as he can treat the experts' judgements as data and update his prior via Bayes' theorem (Genest and Zidek, 1986, p. 120).
(ii)
The likelihood solution corresponds to a Bayesian solution with noninformative priors.
(iii)
Many known approaches to combining subjective probability estimates can be understood as special cases. Forming a weighted average of the probabilities of the experts (linear opinion pool), for example, corresponds to the assumption of a normal error model with the inverse weights as variances.
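The supra-Bayesian update of (i), under the normal error model mentioned in (iii), can be sketched on a discretized grid; all names, priors and numbers below are illustrative assumptions, not the chapter's notation:

```python
import math

# Discretized support for the unknown probability p(A); uniform prior (illustrative).
grid = [i / 100 for i in range(1, 100)]
prior = [1.0 / len(grid)] * len(grid)

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def supra_bayes_update(prior, reports, sds):
    """Treat expert reports as data: multiply the prior over p(A) by each
    likelihood p(report_i | p(A)), here a normal error model whose sd encodes
    how reliable the decision-maker judges that expert to be."""
    post = list(prior)
    for report, sd in zip(reports, sds):
        post = [w * normal_pdf(report, g, sd) for w, g in zip(post, grid)]
    z = sum(post)
    return [w / z for w in post]

# Two hypothetical experts report 0.30 and 0.40; the first is judged more reliable.
post = supra_bayes_update(prior, reports=[0.30, 0.40], sds=[0.05, 0.15])
mean = sum(g * w for g, w in zip(grid, post))
print(round(mean, 2))  # posterior mean sits nearer the more reliable expert's 0.30
```

The inverse-variance weighting visible here is exactly the "normal error model with the inverse weights as variances" reading of the linear opinion pool in (iii).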
I agree with Dubois and Prade that there are different ways to model subjective uncertainty. The selection of such a theory depends on the characteristics of the problem at hand. It has to be demonstrated, however, that a new formalism is internally consistent and offers more than well-established approaches (e.g. probability theory). Otherwise it may have a confusing effect and ignore the rich theory developed for established paradigms. The interpretation of the sentence F: "if W then N" holds with probability p as P(N | W) = p is not essential to the approach of the paper. Depending on the situation of interest, the interpretation P(¬W ∨ N) = p may be more appropriate, as demonstrated in the comment of Smets. Both interpretations state some characteristics of the joint probability measure, and may be used without problems with the techniques discussed in this chapter.
Additional references
Carnap, R. (1945). The two concepts of probability. Phil. Phenomenol. Res. 5, 513-532.
Cooke, R. M. (1985). Expert resolution. Proc. 2nd Conf. on Analysis, Design and Evaluation of Man-Machine Systems. Pergamon Press, New York.
Cooke, R. M. (1986). Probabilistic reasoning in expert systems reconstructed in probability semantics. Philosophy of Science Association 1986, Vol. 1.
Cox, R. (1946). Probability, frequency and reasonable expectation. Am. J. Phys. 14, 1-13.
De Groot, M. and Fienberg, S. E. (1983). The comparison and evaluation of forecasters. Statistician 32, 12-22.
Dubois, D. and Prade, H. (1982). A class of fuzzy measures based on triangular norms.
A general framework for the combination of uncertain information. Int. J. Gen. Syst. 8, 43-61.
Fenstad, J. E. (1967). Representations of probabilities defined on first order languages. Sets, Models, and Recursion Theory (ed. J. N. Crossley), pp. 156-172. North-Holland, Amsterdam.
Gaifman, H. and Snir, M. (1982). Probabilities over rich languages, testing and randomness. J. Symbolic Logic 47, 495-548.
Harper, W. L., Stalnaker, R. and Pearce, G. (1981). Ifs: Conditionals, Belief, Decision, Chance, and Time. Reidel, Dordrecht.
Lewis, D. (1976). Probabilities of conditionals and conditional probabilities. Phil. Rev. 85, 297-315. Also in Harper et al. (1981), pp. 129-147.
Lichtenstein, S., Fischhoff, B. and Phillips, D. (1982). Calibration of probabilities: the state of the art to 1980. Judgement under Uncertainty: Heuristics and Biases (ed. D. Kahneman, P. Slovic and A. Tversky), pp. 306-335. Cambridge University Press.
Lindley, D. V. (1982). Scoring rules and the inevitability of probability. Int. Statist. Rev. 50, 1-26.
Los, J. (1963). Semantic representation of the probability of formulas in formalized theories. Studia Logica 14, 183-194.
9
Belief Functions
PHILIPPE SMETS
IRIDIA, Université Libre de Bruxelles, Belgium
Abstract This chapter is a short self-contained presentation of the use of belief functions, a mathematical tool for the quantification of subjective, personal credibility.
1
INTRODUCTION
In order to delimit the problems covered by belief functions, we briefly describe various types of ignorance closely related to belief. There are at least three forms: possibilistic, probabilistic or credibilistic, each endowed with its own mathematical model. 1.1
Possibility
The information that "John's height is over 170 cm" implies that, in describing John, any height h over 170 is possible and any height equal to or below 170 is impossible. This can be represented by a possibility function on the height domain whose value is 0 for h ≤ 170 and 1 for h > 170 (where 0 = impossible and 1 = possible). Ignorance is due to the lack of precision, of specificity, of the information "over 170". This type of ignorance can be generalized with statements like "John is tall". It implies that a height less than 160 cm is impossible (value = 0) and a height above 180 cm is possible (value = 1). In between, one may consider that the possibility takes some intermediate value between its extrema 0 and 1, the greater the height, the greater the possibility. Ignorance is here due to the imprecision that results from the use of the fuzzy, vague, ill-defined term "tall". This type of possibilistic ignorance is covered by Dubois and Prade in Chapter 10, and will not be discussed here. 1.2
Probability
Another form of ignorance results from randomness encountered in chance
set-ups. For example, when throwing a die, the probability that the outcome is one is 1/6. This model can be generalized by considering that the probability of each event is not known as a real value between 0 and 1, but as belonging to an interval. This results in the "upper and lower probability" theory (Good, 1950; Smith, 1961; Dempster, 1967, 1968). This theory should not be confused with the one covered by belief functions: it requires the existence of an underlying probability whose value is known only to be within a crisp interval. It has been further generalized to cases where the probability is known as a fuzzy number (close to 0.6), as a linguistic variable (small), or to lie within a fuzzy interval (approximately between 0.4 and 0.5) (Zadeh, 1975). Another generalization is obtained by the introduction of some meta-probability that describes our knowledge about the value of the unknown but existing probability (Lindley et al., 1979). This meta-probability expresses in fact our degree of belief about an unknown probability, where degrees of belief are quantified by a probability function. It is a particular form of Bayesian probability. Furthermore, it is also a special case of the theory of belief functions when belief is quantified by a probability function, a particular form of belief function described below. 1.3
Credibility
Belief functions aim to model and to quantify the subjective, personal credibility (called belief hereinafter) induced in us by evidence. Some evidence is strong enough to induce knowledge: if it is 11 a.m. then I know it is daytime. Other, less definite, evidence may induce only a belief: given the information available on 15 July 1986, I believe that I will be in Cordes on 22 September 1986. This belief can be more or less strong, thus admitting degrees of belief. Bayesian probabilists have claimed that this degree of belief can be quantified by probability functions, whose major axiom, the additivity axiom, states that the probability of the union of two disjoint events is the sum of the probabilities of each event (Fine, 1973). The Bayesian approach is usually justified by axioms describing decision processes or betting behaviour. Within such a context, our belief can indeed be described by a probability function, but it does not follow that our belief should always be so modelled. Belief can exist outside any decision or betting context. It is a cognitive process that exists per se. The Bayesian argument implies only that when we face a decision problem we must be able to construct a probability function based on our belief. This chapter presents a model to quantify someone's degree of belief based on belief functions (Shafer, 1976). Within the AI community, it is often called
the Dempster-Shafer model, an unfortunate denomination that allows too widespread a confusion between upper and lower probabilities and belief functions, the first dealing with an imprecisely known underlying probability, the second with the intensity of our credibility. Some modifications of Shafer's initial model are introduced, essentially the distinction between the open- and the closed-world assumptions and its impact on the normalization. This chapter is a short self-contained exposition of the whole theory, but Shafer's (1976) highly readable seminal book should be read before really pursuing the topic. The model developed here should not be confused with those found in recent AI research papers on belief networks (Pearl, 1986a, b). In these papers beliefs are quantified by classical Bayesian probabilities, and the problem under consideration is their implementation for AI applications. Section 2 of this chapter discusses the nature of the frame of discernment on which a degree of belief will be established, and presents the distinction between open- and closed-world assumptions. Section 3 introduces the general model and presents an example. Section 4 presents the relevant mathematical definitions and properties. Our presentation considers an algebra of propositions rather than sets, as is often done. Both approaches could have been used; our choice reflects a personal taste and the idea that the concept of the truth of a proposition precedes that of belonging to a set. Section 5 presents Dempster's rules of conditioning and combination. Section 6 presents Bayes' theorem, generalized within a belief-function framework (Smets, 1978). Section 7 discusses discounting evidence, i.e. what to do when the available evidence is not reliable. Section 8 presents canonical experiments that can explain the meaning of the numerical value of the belief given to a proposition A. Section 9 concludes and presents some hints about the use of this model for automated reasoning.
2 THE FRAME OF DISCERNMENT
2.1
Open- and closed-world assumptions
In most probability theories, as well as in Shafer's theory, one starts by postulating some frame of discernment A (also called the Universe of Discourse or the Domain of Reference) on which evidence induces some belief. In reality, the cognitive process is hardly as simple. When faced with a cognitive problem, one starts by constructing the set KP of those propositions Known as Possible. But there is also (1) the set UP of Unknown Propositions, for which we have no idea whether they are possible or impossible, and (2) the set KI of those propositions Known as Impossible. In
the classical approach, one considers that UP is empty and accepts the highly idealized closed-world assumption, i.e. that the truth is necessarily in KP and that A is KP. The content of the three sets depends not only on the problem studied, but also on the pieces of evidence available. As evidence becomes available, propositions are redistributed between the three sets. (1)
A proposition A is transferred from KP to KI when the evidence permits the claim that A is impossible. This corresponds to the classical concept of conditioning.
(2)
A proposition A is transferred from UP to KP if the evidence induces us to consider as possible some forgotten propositions.
(3)
A proposition A is transferred from UP to KI if the evidence induces us to consider that some forgotten propositions are in fact impossible. In practice, this has no direct impact, as the degrees of belief are constructed only on KP.
(4)
Transfer from KI to KP or UP, and from KP to UP, would be inconsistent with the definition of the three sets if one accepts, as here, that the allocation of any proposition to one of the three sets is always correct. A true proposition may be correctly allocated to KP or UP, and a false proposition may be correctly allocated to KP, KI or UP.
A true proposition may not be allocated to KI, and any proposition allocated to KI will stay in KI, inducing monotonicity for the impossible (false) propositions. A generalization could be considered by accepting that a true proposition might be in KI and constructing some meta-belief function on the set of all propositions that expresses the degree of belief that each proposition belongs to any of the three sets. The closed-world assumption postulates an empty UP set. The open-world assumption admits the existence of a non-empty UP set, and the fact that the truth might be in UP. 2.2
Notation
This presentation of the frame of discernment is formalized as follows. One writes ¬, ∨, ∧ and ⇒ for the negation, disjunction, conjunction and material-implication connectives. The set KP will be based on A, a finite set of elementary propositions. Let Ω be the Boolean algebra of propositions derived from A, i.e. Ω contains the
conjunctions, disjunctions and negations of any set of propositions of A. Let 1_Ω be the tautology relative to Ω, i.e. 1_Ω is the disjunction of all elementary propositions of A. Let 0_Ω be the contradiction relative to Ω, i.e. none of the propositions of A implies 0_Ω. Then the conjunction of any two distinct propositions of A is 0_Ω. The set UP will be denoted by Θ. No details about its structure or about KI are needed. Any support given by some piece of evidence to some proposition A of Ω is in fact given to A ∨ Θ. In order to simplify the notation, we shall not repeat the disjunction with Θ, but it must be understood that whenever a proposition A of Ω is mentioned, it corresponds to A ∨ Θ. The proposition 0_Ω is not the contradiction, as it corresponds to 0_Ω ∨ Θ. There would be a contradiction only if Θ were empty (the closed-world assumption). The proposition 1_Ω corresponds to 1_Ω ∨ Θ and is thus a tautology, as all propositions in KI are false by definition. The negation of any proposition A of Ω, symbolized by ¬A, is taken relative to A: ¬A is the disjunction of Θ and of any elementary proposition of A not implying A. The ∈ symbol is used with the following meanings: A ∈ A means that A is an elementary proposition of A; A ∈ Ω means that A is a proposition of Ω;
for B ∈ Ω, A ∈ B means that A is an elementary proposition implying B. Thus 0_Ω ∈ Ω is true, but 0_Ω ∈ A and 0_Ω ∈ B are false, as 0_Ω is not an elementary proposition. (Being an element of the algebra Ω is different from being an element of an element of Ω.) For any A ∈ Ω, |A| is the number of elementary propositions B of the frame such that B ∈ A. For A, B ∈ Ω, the symbol A → B means "it is true that A implies B", i.e. A and B are such that whenever X ∈ A then X ∈ B. Note that 0_Ω → A can be asserted for all A in Ω. We say that a proposition B ∈ Ω is based on some elementary proposition A of the frame if A ∈ B.
3 QUANTIFICATION OF DEGREE OF BELIEF
3.1 General model
Suppose that there is a piece of evidence that induces in us some belief concerning the truth of propositions defined on a finite frame of discernment
A, with Ω being the Boolean algebra derived from A. It is postulated that there exists some finite amount of belief that is spread among the various propositions A of Ω according to the available evidence. For instance, suppose that Mrs Jones has been murdered and we, the judges, know that the suspects are Peter, Paul and Mary. Thus A = {Peter, Paul, Mary}. Given the available evidence, parts of the amount of belief are allocated to each of the three potential murderers, as in a Bayesian model. But some evidence might support something other than only one of the three persons. Such is the case of the evidence "the murderer is a male". This evidence supports A = "Peter or Paul", and we allocate some part m of our total mass of belief to A without being able to split it between the two components of A. In such a case, probabilists usually invoke the Principle of Insufficient Reason or an argument of symmetry to decide that the mass m must be split into two equal parts, one for Peter and one for Paul. The originality (and the power) of Shafer's model is that it does not invoke these principles and leaves the mass m allocated to the proposition A. The total amount (mass) of belief is arbitrary, but is conveniently scaled to 1 without any loss of generality. The non-negative mass m(A) allocated to the proposition A ∈ Ω that cannot be allocated to any proposition A′ such that A′ → A, A′ ≠ A is called a basic probability number by Shafer (1976). (A → B is short for "it is true that A implies B".) The function m: Ω → [0, 1] is called a basic probability assignment whenever
∑_{A→1_Ω} m(A) = 1

where 1_Ω is the tautology relative to Ω. The notation ∑_{A→B} means that the sum is taken over all propositions A ∈ Ω that imply B ∈ Ω, or over all propositions B ∈ Ω implied by A ∈ Ω, depending on which of the symbols A or B is not fixed by the context. Any A such that m(A) > 0 is called a focal proposition.
3.2
Practical example
As a practical example, suppose that we are the judges and must analyse the available evidence concerning Mrs Jones' case. Three witnesses provide evidence (testimonies). Let the three pieces of evidence be symbolized by E1, E2 and E3. E1:
Witness I is a janitor, who claims he heard the victim yelling and then saw a small man running out of the victim's house.
E2 :
Witness 2 is an old lady, who lives across the street from the victim and who saw the crime through her window and claims the murderer was much taller than the victim.
E3 :
Witness 3 is Peter's girlfriend, who testifies that Peter was at her home far away from the victim's house when the crime happened.
How do we evaluate the meaning of these three pieces of evidence, how do we quantify their respective support for the potential murderers, how do we combine these supports, and what do we do if doubt can be cast on the quality of the testimonies? Let k symbolize the killer. E1 supports that k is a man. Furthermore, k looks small, which fits Paul or Mary, both being small, but not Peter, who is quite tall. But as the janitor was far from the house, his opinion about the tallness of the man he saw running is doubtful, as is his testimony about the sex, as Mary has short hair and could have worn slacks. The impact of E1 on Ω can be summarized by three masses, one pointing to {Peter or Paul}, one pointing to {Paul or Mary} and one unallocated, i.e. pointing to 1_Ω. E2 suggests Peter, but as the witness is short-sighted and claims she had taken off her glasses just before looking through the window, some reservation must be made concerning the value of her testimony. The impact of E2 can be summarized by two masses, one pointing to {Peter}, the other being unallocated. E3 suggests Paul or Mary. But as the witness is Peter's girlfriend, serious doubts must be cast on her testimony. The impact of E3 can be summarized by two masses, one pointing to {Paul or Mary}, the other being unallocated. Indeed, if the witness is lying, E3 does not support that Peter is the killer; it only makes her testimony meaningless. Table 1 (columns m1, m2 and m3) presents the masses quantifying the impact of the three pieces of evidence on Ω. The evaluation of the masses is not discussed in this chapter; it will be touched on briefly in Section 8. Table 1
Masses derived from the three pieces of evidence, and their combination

Ω                      m1     m2     m3     m12    m123
0_Ω                    —      —      —      0.12   0.36
Peter                  —      0.6    —      0.48   0.24
Paul                   —      —      —      —      0.10
Mary                   —      —      —      —      0.00
Peter or Paul          0.5    —      —      0.20   0.10
Peter or Mary          —      —      —      —      0.00
Paul or Mary           0.2    —      0.5    0.08   0.14
Peter, Paul or Mary    0.3    0.4    0.5    0.12   0.06
The present example is based on external evidence (testimonies). But one could just as well have used internal (objective) evidence like the fingerprints found on the weapon or the knowledge that the killer smokes a certain brand of cigarettes. 3.3
Combination of evidence
Pieces of evidence are combined by the application of Dempster's rule of combination on the basic probability assignments. The product of the masses induced by two distinct pieces of evidence is allocated after combination to the conjunction of the two focal propositions. Let mi(A) be the masses derived from evidence Ei, i = 1, 2, and let m12(A) be the mass obtained after the combination of the pieces of evidence E1 and E2. So m1(A)·m2(B) is allocated to the conjunction A ∧ B. All such possible products are computed and all masses allocated to the same proposition are added together:

m12(A) = ∑_{X→¬A, Y→¬A, X∧Y=0_Ω} m1(A ∨ X) m2(A ∨ Y) = ∑_{X∧Y=A} m1(X) m2(Y)
m123 is computed by combining m12 with m3 in the same way. Table 1 presents the results. Dempster's rule of combination is associative: whatever the order in which basic probability assignments are combined, the results are identical. 3.4
Belief and plausibility
The quantity m(A) measures the amount of belief that one commits specifically to A, not the total belief that one commits to A. Each mass m(A) also supports any proposition implied by A. Therefore the total degree of support (belief) that we have in the fact that a proposition A is true is obtained by adding all the masses m(B) allocated to propositions B that imply A without implying ¬A (which means that 0_Ω must be discarded from the sum). The degree of belief given to A is quantified by the belief function bel: Ω → [0, 1], with

bel(A) = ∑_{B→A, B≠0_Ω} m(B)    (3.1)
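Continuing the example, (3.1) can be evaluated on the final masses of Table 1 (a sketch; frozensets again stand for disjunctions, and zero-mass rows are omitted):

```python
OMEGA = frozenset({"Peter", "Paul", "Mary"})

# Final masses m123 of Table 1 (zero-mass rows of the table omitted).
m123 = {
    frozenset(): 0.36,
    frozenset({"Peter"}): 0.24,
    frozenset({"Paul"}): 0.10,
    frozenset({"Peter", "Paul"}): 0.10,
    frozenset({"Paul", "Mary"}): 0.14,
    OMEGA: 0.06,
}

def bel(a, m):
    """Equation (3.1): total belief in a = sum of the masses of the non-empty
    focal propositions that imply a (i.e. the non-empty subsets of a)."""
    return sum(w for b, w in m.items() if b and b <= a)

print(round(bel(frozenset({"Peter", "Paul"}), m123), 2))  # 0.44
print(round(bel(frozenset({"Paul", "Mary"}), m123), 2))   # 0.24
```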
B, 0} can be ordered as q1, ..., qm such that ∀i = 1, ..., m−1, qi ∈ I(qi+1), then the function Pl is a possibility measure, and conversely. In this case Cr is also a necessity measure (see e.g. Dubois and Prade, 1982b).
2.2.3
Modal logic
It is interesting to discuss the links between possibility logic and modal logics
which provide a syntactic modelling of the concepts of possibility and necessity, usually referring to possible-world semantics (see Appendix B of the Introduction). The main differences between the two approaches seem to be as follows. (i)
In modal logic, possibility and necessity are all-or-nothing concepts. As a consequence they can be introduced as special symbols in the language: □p reads "p is necessary", while ◇p reads "p is possible". In contrast, in possibility theory possibility is a graded notion, as is necessity; whence the use of numbers.
(ii)
Modal logics propose numerous axiomatic settings, while the axioms of possibility theory are well-defined and unique. Along this line, it is relevant to define qualitative counterparts of possibility-theory axioms, in the style of modal logic, restricting ourselves to the case where Π(p) ∈ {0, 1}. One way of doing this is to use the following translation rules:

⊢ □p translates into N(p) = 1
⊢ ◇p translates into Π(p) = 1
10
Possibilistic and Fuzzy Logics
Clearly, the classical identity ¬◇p = □¬p translates into 1 − Π(p) = N(¬p), which is a basic relationship in possibility theory. Moreover, a numerical translation of Lewis's implication □(p → q) is clearly N(p → q) = 1, which implies that p → q is true in possibility logic. The basic axiom of possibility logic can be expressed as

⊢ ◇(p ∨ q) ↔ (◇p ∨ ◇q)    (28)
This is one of the basic axioms of the modal-logic system T according to von Wright (see Hughes and Cresswell, 1968). In addition, possibility theory recovers the square of Aristotelian modalities, as does the S5 system. It would be interesting to relate possibility theory to some existing formal systems in modal logic.
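The graded counterpart of axiom (28) is the maxitivity property Π(p ∨ q) = max(Π(p), Π(q)), with N(p) = 1 − Π(¬p) as the duality noted above. A toy check in Python (the distribution is illustrative):

```python
worlds = {"u", "v", "w"}
pi = {"u": 1.0, "v": 0.6, "w": 0.3}   # possibility distribution (illustrative)

def poss(prop):
    """Π(p): possibility of the most possible world where p holds."""
    return max((pi[x] for x in prop), default=0.0)

def nec(prop):
    """N(p) = 1 - Π(¬p): necessity as the dual of possibility."""
    return 1.0 - poss(worlds - prop)

p, q = {"u"}, {"v", "w"}
# Graded counterpart of (28): possibility distributes over disjunction.
assert poss(p | q) == max(poss(p), poss(q))
# Duality behind ¬◇p = □¬p: here ¬p is exactly q.
assert nec(p) == 1.0 - poss(q)
print(poss(p | q), nec(p))
```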
2.2.4
Default reasoning using logics of uncertainty
Probabilistic logic and possibility logic have both been suggested as possible approaches to default reasoning (Rich, 1983; Farreny and Prade, 1986). The idea is to interpret the weight bearing on an "if-then" rule as a measure of the extent to which the rule has no exception. The rule then models an imperfect "is-a" link in a semantic network. See Chapter 7. This interpretation of logics of uncertainty faces several problems. (i)
It is not clear that in default logic the grade of uncertainty must be attached to a logical implication p → q. The use of conditioning instead of implication may appear more natural for modelling imperfect "is-a" relations. For instance, Prob(q | p) expresses the proportion of q's among those that are p's. See Zadeh (1983) for a treatment of default rules in terms of fuzzy proportions. The problem raised here is the difference between what can be called a "conjecture" (i.e. a universal assertion that is true or false, but can as yet neither be proved nor refuted) and what Zadeh (1985) calls a "disposition" (an assertion that generally holds, but sometimes does not).
(ii)
Default rules do not always underlie a statistical interpretation. In particular, typicality (Chapter 7) seems to be of a different nature. In that case a possibilistic treatment of default rules, as done by Farreny and Prade (1986), may be more satisfactory. A default rule is then modelled by knowledge of the quantity O(q 1 p) defined by the relation n(p
1\
q) =min (n(p), n(q I p))
(29)
from the knowledge of a possibility measure. However, as indicated
by Dubois and Prade (1986a), this notion of conditioning is very close to logical implication, since Π(q | p) = 1 − N(p → ¬q), or 1.
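Relation (29) does not determine Π(q | p) uniquely: when Π(p ∧ q) reaches Π(p), any value up to 1 solves the equation. The sketch below adopts the least-specific (greatest) solution, a convention common in min-based conditioning but an assumption here, since the text does not spell it out:

```python
def cond_possibility(pi_p_and_q, pi_p):
    """Greatest solution of (29): PI(p & q) = min(PI(p), PI(q | p)).
    When PI(p & q) == PI(p), the least-specific choice is 1 (assumed
    convention); otherwise the solution is PI(p & q) itself."""
    return 1.0 if pi_p_and_q >= pi_p else pi_p_and_q

# In both regimes the defining relation (29) is recovered:
for pi_pq, pi_p in [(0.5, 0.8), (0.8, 0.8)]:
    assert min(pi_p, cond_possibility(pi_pq, pi_p)) == pi_pq
```

The two-valued behaviour of the result (either Π(p ∧ q) or 1) is exactly what makes this conditioning "very close to logical implication", as noted above.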
3 FUZZY LOGIC
Uncertain propositions, considered in the preceding section, must not be confused with fuzzy propositions. In the first case we have propositions that are true or false (thus involving non-vague predicates), but, due to the lack of precision of the available information, we can in general only estimate to what extent it is possible or necessary that a proposition is true. In the second case the available information is precise, but the vagueness of predicates leads to propositions with intermediate degrees of truth. Obviously we may encounter a fuzzy proposition for which the available reference information is not precise; then we have the general case of an uncertain fuzzy proposition; the study of such propositions is outside the scope of this introduction. This situation is the most complicated one, and it leads to fuzzy truth-values as indicated in Section 1. See also Prade (1985b) and Yager (1984) for a representation and a treatment of uncertain fuzzy propositions in terms of possibility distributions.
3.1 Formal reasoning with vague predicates
Patterns of reasoning in the style of modus ponens can be developed for fuzzy propositions, i.e. bounds on t(q) can be computed from knowledge of t(p) and t(p → q); see Dubois and Prade (1980a) for instance. However, there are different natural ways of defining t(p → q) from (14)-(16), as discussed by Dubois and Prade (1984a); moreover, we may have t(¬q → ¬p) ≠ t(p → q) for some definitions of the implication operator, when t(p → q) is not defined as t(¬p ∨ q). It can be proved that t(q) ≥ min(t(¬p ∨ q), t(p)) if and only if t(¬p ∨ q) + t(p) > 1 (see Dubois and Prade, 1980a, p. 167), a result that contrasts with (20). Quite early in the development of fuzzy-set theory, an extension of Robinson's resolution principle was proposed by Lee (1972) for ground clauses in the framework of the fuzzy logic defined by (14)-(16), i.e. for dealing with fuzzy propositions; note that the resolution principle avoids the explicit use of the implication connective in the representation of the knowledge. Basically, Lee proved that if all the truth-values of the parent clauses are strictly greater than 0.5 then a resolvent clause derived by the resolution principle always has a truth-value between the maximum and the minimum of those of the parent clauses. See Dubois and Prade (1987a) for a bibliography of subsequent work along this line.
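Lee's result is easy to check numerically. The sketch below (our own illustrative truth assignment and clause encoding, not from the text) uses the max-min truth functions (14)-(16): t(¬p) = 1 − t(p), and the truth of a clause is the maximum over its literals:

```python
def clause_truth(clause, t):
    """Truth value of a ground clause, given as a list of (atom, positive?)
    literals, under t(~p) = 1 - t(p) and t(disjunction) = max."""
    return max(t[a] if pos else 1.0 - t[a] for a, pos in clause)

t = {"p": 0.4, "a": 0.7, "b": 0.9}       # truth assignment to ground atoms
c1 = [("p", True), ("a", True)]          # p v a
c2 = [("p", False), ("b", True)]         # ~p v b
resolvent = [("a", True), ("b", True)]   # a v b (resolve on p)

t1, t2, tr = (clause_truth(c, t) for c in (c1, c2, resolvent))
assert t1 > 0.5 and t2 > 0.5             # Lee's precondition on the parents
assert min(t1, t2) <= tr <= max(t1, t2)  # resolvent lies between the parents
```

Here t1 = 0.7 and t2 = 0.9, and the resolvent gets 0.9, within the bounds Lee proved.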
Note that a set of fuzzy propositions is generally not a Boolean algebra. In particular, the law of contradiction is not valid, so that the refutation method, which is the basis of many logic-programming techniques, seems hard to implement here. This fact, and also the fact that the above results are applicable only to ground clauses, may restrict the applicability of formal proving methodologies for fuzzy logic.
3.2 Fuzzy logic based on meaning computation
Another approach to reasoning with fuzzy statements, i.e. statements containing fuzzy predicates, is described by Zadeh (1979). Let S1, S2, …, Sn be n statements expressed, say, in natural language. Let {M(Si), i = 1, …, n} be the meanings of S1, …, Sn, as defined in Section 1. M(Si) is obtained by translating Si into a meaning-representation language such as PRUF (Zadeh, 1978b), and is a fuzzy restriction on a variable vector Xi taking its values on a universe Ui. Let U be the universe of discourse built from U1, …, Un, and let V be some subuniverse from which a variable Y in which we are interested takes its values. Reasoning is here viewed as using the statements S1, …, Sn to say something about Y. This is done in three basic steps:

(i)
compute the meanings M(Si) of the Si, i = 1, …, n, and their cylindrical extensions on U, say M0(Si);
(ii)
calculate the join of the M(Si); this yields a fuzzy relation R on U, interpreted as a possibility distribution;
(iii)
project R on the universe V, to obtain the fuzzy relation Projv (R), which is the meaning of the conclusion statement S about Y.
This scheme is very general. R is obtained by intersection of the fuzzy relations M0(Si). Let X be the vector variable pertaining to U; X can be denoted (Y, Y′), where Y′ pertains to variables taking values in V′ such that U = V × V′. ProjV(R) is defined by

∀Y, μProjV(R)(Y) = sup_{Y′} μR(Y, Y′)   (30)

consistently with possibility theory. The classical modus-ponens pattern has been generalized to the following fuzzy-logic pattern:

S1: X is A′
S2: if X is A then Y is B
S:  Y is B′

where M(S1) = A′, a fuzzy set on U1
M(S2) is obtained by means of a multiple-valued implication connective denoted → (see e.g. Dubois and Prade (1984b) for a review). We have

μM(S2)(X, Y) = μA(X) → μB(Y)   (31)

The universe U is U1 × V, and the relation R on U is such that

μR(X, Y) = μA′(X) * (μA(X) → μB(Y))   (32)

where * is an intersection operation such as (15). Thus, projecting R on V as in (30), we get

μB′(Y) = sup_{X} μA′(X) * (μA(X) → μB(Y))   (33)
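The generalized modus ponens (33) can be sketched on a small discretized example (our own invented membership values), taking * = min and the Gödel implication, a → b = 1 if a ≤ b and b otherwise; with a normalized A′ = A this combination gives back exactly B:

```python
def goedel_implication(a, b):
    """Goedel implication: a -> b = 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def generalized_modus_ponens(mu_a_prime, mu_a, mu_b, implication):
    """B'(y) = sup_x min(A'(x), A(x) -> B(y)), i.e. pattern (33) with * = min,
    on finite universes represented as lists of membership values."""
    return [max(min(ap, implication(a, b)) for ap, a in zip(mu_a_prime, mu_a))
            for b in mu_b]

mu_a = [0.0, 0.5, 1.0, 0.5, 0.0]   # fuzzy set A on a 5-point universe U1
mu_b = [0.2, 1.0, 0.2]             # fuzzy set B on a 3-point universe V

# With A' = A and the Goedel implication, the pattern returns exactly B:
assert generalized_modus_ponens(mu_a, mu_a, mu_b, goedel_implication) == mu_b
```

Replacing the Gödel implication by another connective (e.g. max(1 − a, b)) generally yields a B′ larger than B, which is the behaviour discussed below for (33) and (34).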
It is shown by Dubois and Prade (1984b, 1985a) that the choice of the implication operation → is dictated by the choice of the intersection * as soon as the meaning of S2 is defined from μA and μB under the following constraints:

(i) from X is A one should conclude S: Y is B;

(ii) M(S2) should be as unspecific as possible (i.e. as large a fuzzy set as possible) so as not to be arbitrary.
In particular, if * = min then → should be the Gödel implication (a → b = 1 if a ≤ b, and b otherwise). An axiomatic approach to the definition of many-valued implication connectives (Rescher, 1969) is given in Trillas and Valverde (1985), for instance. A unified view of several classes of many-valued implication functions is proposed by Dubois and Prade (1984a). The pattern of the resolution principle can be dealt with using the same methodology:
S1: X is A′ or Z is C
S2: X is not A or Y is B
S:  (Y, Z) is D

This pattern can be called a generalized resolution principle. Indeed, here A′ ≠ A, and the predicates are fuzzy. It can be checked that, using (14)-(16) for the basic set-theoretic operations,

μD(Y, Z) = sup_{X} min(max(μA′(X), μC(Z)), max(1 − μA(X), μB(Y)))   (34)

Note that μD(Y, Z) ≥ max(μC(Z), μB(Y)) always holds; equality is obtained when A′ = A is a crisp subset. The classical resolution principle is then recovered. Note that when C is empty, the pattern of the generalized modus ponens (33) is recovered, with * = min and a → b = max(1 − a, b). When the implication used is not Gödel's, A′ = A does not yield B′ = B in (33), nor D = B or C in (34). In other words, the elimination of fuzzy predicates
is not always permitted in fuzzy counterparts of the resolution principle. To recover the elimination property requires, once again, a proper choice of the implication operations in the pattern

S1: if X is not A then Z is C
S2: if X is A then Y is B
S:  (Y, Z) is B or C

Note that the inference mechanism in fuzzy logic is generally a nonlinear-programming technique. Examples of systems based on these ideas are proposed by Baldwin (1979, 1983), Yager (1984) and Martin-Clouaire and Prade (1986), for example. See also Prade and Negoita (1986) and Sanchez and Zadeh (1987) for application-oriented papers, and Prade (1985a) for a larger bibliography.
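The behaviour of the generalized resolution pattern (34) can be checked on a small discretized example (our own invented values): when A′ = A is crisp, the classical result max(μC(Z), μB(Y)) is recovered, as stated above.

```python
def generalized_resolution(mu_a_prime, mu_a, mu_b_y, mu_c_z):
    """mu_D(y, z) per pattern (34), with min/max/1-x for the connectives,
    evaluated at fixed y and z over a finite universe for X."""
    return max(min(max(ap, mu_c_z), max(1.0 - a, mu_b_y))
               for ap, a in zip(mu_a_prime, mu_a))

mu_a = [0.0, 1.0, 1.0, 0.0]        # crisp A on a 4-point universe; take A' = A
for b, c in [(0.3, 0.6), (0.9, 0.1)]:
    d = generalized_resolution(mu_a, mu_a, b, c)
    assert d == max(b, c)          # classical resolution is recovered
```

With a genuinely fuzzy A′ ≠ A the same function yields μD strictly above max(μC, μB), illustrating why the elimination of fuzzy predicates is not always permitted.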
3.3 Illustrative example
Let us consider the following example, which illustrates various aspects of possibilistic and fuzzy logics. We have two rules and two facts:

(a) if a person is a professional jockey (j) then his/her weight is approximately between 45 and 50 kg (A′);

(b) if a person is a male (m) and his weight is between 40 and 50 kg (A) then it is likely he is a teenager (t);

(c) John (J) is a male (m) and a professional jockey (j).
The possible weights of a professional jockey specified by rule (a) are represented by means of a bell-shaped possibility distribution μA′, like that pictured in Fig. 6, whose support is the interval [45, 50]. The conclusion part of rule (b) is pervaded with uncertainty; this can be modelled using the notation introduced above:

(a): j(x) → A′(x);
(b): m(x) ∧ A(x) → t(x)   (α);
(c): m(J), j(J)

where α = N(∀x, m(x) ∧ A(x) → t(x)). A particular case of the generalized modus ponens enables us to deduce from (a) and (c) that A′(J), i.e. John's weight is fuzzily restricted by A′. Then we compute to what extent we are certain that the condition part of (b) holds, as

N(m(J) ∧ A(J); m(J) ∧ A′(J)) = min(N(m(J); m(J)), N(A(J); A′(J)))
                             = min(1, N(A(J); A′(J)))
                             = N(A(J); A′(J))

which can be easily computed using (11) with π = μA′(J). Then by (20) we propagate the uncertainty along rule (b):

N(t(J)) ≥ min(N(A(J); A′(J)), α)
However, here we may find the conclusion that John is somewhat certainly a teenager highly undesirable. The problem of controlling transitivity effects is classical in default logic; see Chapter 7. The way of coping with this problem in our framework is as follows. First, as others do, we substitute for (b) a more precise statement, namely

(b′) if a person is a male and his weight is between 40 and 50 kg and he is not a jockey then it is likely that he is a teenager.

Then if we know that John is a jockey, rule (b′) can no longer be applied to John. Secondly, if we just know that John is a male and that his weight is fuzzily restricted by A′ then, in order to be able to use (b′), we keep the piece of default knowledge that the (a priori) possibility that a person is a jockey is very low, say ε. The corresponding a priori certainty that a person is not a jockey is then high (1 − ε). The evaluation of the condition part of (b′) now yields min(N(A(J); A′(J)), 1 − ε), and finally our certainty that John is a teenager will be min(N(A(J); A′(J)), 1 − ε, α), i.e. a strong certainty if α and N(A(J); A′(J)) are close to 1.
3.4 Reasoning with fuzzy quantifiers
Often items of knowledge are expressed in the form of statements involving quantifiers different from the universal and existential ones. These quantifiers can be viewed as proportions that may be only vaguely specified. They translate linguistic terms such as "most of" and "some". Zadeh (1983, 1985) has considered syllogisms with propositions involving fuzzy quantifiers modelled by fuzzy subsets of the unit interval, for example the so-called intersection/product syllogism of the form

Q1 As are Bs
Q2 (A and B)s are Cs
Q1 ⊗ Q2 As are (B and C)s

where ⊗ is the extended product of fuzzy numbers (Dubois and Prade, 1980b). In the above syllogism Q1 restricts the possible values of the proportion |A ∩ B|/|A| (where | | denotes cardinality) or, more generally, of the conditional probability P(B | A). Q2 is defined in a similar way. The resulting quantifier Q1 ⊗ Q2 is justified by the well-known probabilistic identity P(B ∩ C | A) = P(B | A) · P(C | A ∩ B). Note that patterns of reasoning that are valid when universally quantified may no longer hold, even in a weaker form, with fuzzy or numerical quantifiers. For instance the syllogism, where ∀ means "all",
∀ As are Bs
Q1 Bs are Cs
Q  As are Cs

is valid with Q1 = Q = ∀. However, as soon as Q1 ≠ ∀, nothing can be said about the proportion of As that are Cs, which may take any value in the unit interval, i.e. Q = [0, 1]. Moreover, if we add the supplementary piece of knowledge

Q2 Bs are As

then it can be established that, when μQi is increasing for i = 1, 2 (i.e. Q1 and Q2 are variants of "most" as in Fig. 6),

Q = max(0, 1 ⊖ (1 ⊖ Q1) ⊘ Q2)   (35)

where max is the extended maximum for fuzzy numbers, ⊖ the extended subtraction and (1 ⊖ Q1) ⊘ Q2 an extended quotient. This result only expresses the fact that if A ⊆ B, P(A | B) ≥ q2 and P(C | B) ≥ q1 then, from the laws of probability theory, we conclude that P(C | A) ≥ max(0, 1 − (1 − q1)/q2). In (35) the laws of probability theory are combined with results in fuzzy arithmetics (Dubois and Prade, 1980a). Consequently, Zadeh's theory of fuzzy syllogisms is nothing but probabilistic logic expressed in terms of conditional probabilities (as in the approach described in Chapter 8) with the assumption that the knowledge of probability values is in the form of fuzzy intervals, i.e. fuzzy probabilities (Zadeh, 1984), instead of point probabilities or interval-valued ones. Based on this modelling, some forms of reasoning with default rules can be defined, when general rules of the form "Q As are Bs" are instantiated. Note that in
Fig. 6  Q = "most".
their linguistic forms, the quantifiers may not explicitly appear in the rules (e.g. "snow is white" is short for "usually, snow is white"). Rules with explicit or implicit quantifiers are called "dispositions" by Zadeh (1985). The degree of truth of a statement of the form S = "Q As are Bs" is computed by Zadeh as

t(S) = μQ(|A ∩ B|/|A|) = μQ( (Σ_{u∈U} min(μA(u), μB(u))) / (Σ_{u∈U} μA(u)) )

provided that all the values {(μA(u), μB(u)) | u ∈ U} are stored in the database. Yager (1983) proposes another treatment of quantified statements of the form "Q As are Bs", which does not relate to conditional probabilities.
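Zadeh's relative sigma-count formula above can be sketched as follows; the membership values and the piecewise-linear shape chosen for "most" are our own illustrative choices:

```python
def truth_of_quantified(mu_q, mu_a, mu_b):
    """t("Q As are Bs") = mu_Q( sum min(mu_A, mu_B) / sum mu_A ):
    the relative sigma-count of B in A, mapped through the quantifier."""
    ratio = sum(min(a, b) for a, b in zip(mu_a, mu_b)) / sum(mu_a)
    return mu_q(ratio)

# "most" as an increasing membership function on [0, 1] (illustrative shape)
def most(r):
    if r <= 0.5:
        return 0.0
    return 1.0 if r >= 0.9 else (r - 0.5) / 0.4

mu_a = [1.0, 1.0, 0.5, 0.5]   # fuzzy set A over a 4-element population
mu_b = [1.0, 0.5, 0.5, 0.0]   # fuzzy set B over the same population

t = truth_of_quantified(most, mu_a, mu_b)   # sigma-count ratio = 2.0 / 3.0
```

Here the relative sigma-count is 2/3, and "most" maps it to a moderate degree of truth (about 0.42 with this shape), reflecting that B covers only part of A.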
4 EXAMPLES OF APPLICATIONS IN THE AUTHORS' RESEARCH GROUP

Implementing the generalized modus ponens can be achieved very simply using parametrized representations of membership functions (Dubois et al., 1987b); the order of complexity of fuzzy production systems is thus not higher than for usual production systems. Of course the processing time for each rule is higher than in the classical case, but this is counterbalanced by the better expressive power of fuzzy production rules: a few fuzzy rules can generally account for the behaviour of a larger set of non-fuzzy rules. Implementing possibilistic logic in the production-system style is quite efficient owing to the max-min matrix-calculation scheme for uncertainty propagation. In the resolution style, linear resolution strategies can be adapted, and a heuristic search algorithm has been developed (analogous to Nilsson's A*) to maximize the certainty of the resulting empty clause (Dubois et al., 1987a).

The design of the inference engine SPII (Martin-Clouaire and Prade, 1986) has been motivated by the need for a sufficiently general inference system able to (i) deal with both the imprecision and the uncertainty pervading factual and expert knowledge, and (ii) combine symbolic reasoning with numerical computation. SPII-2 is capable of treating pieces of information (facts or rules) that are imprecise (since they are expressed by means of vague predicates) or uncertain (since their truth is not fully guaranteed). SPII-2 works in backward chaining. Possibility theory is used for representing imprecision in terms of possibility distributions, and uncertainty by means of a pair of possibility and necessity measures. More technically, SPII-2 (i) propagates uncertainty and imprecision in the reasoning process via deductive inferences; (ii) estimates the degree of matching between facts and the condition parts of rules in the presence of vagueness; (iii) combines imprecise or
uncertain pieces of information relative to the same matter; and (iv) performs computation on ill-known numerical quantities using fuzzy arithmetics. SPII-2 has been developed and experimentally tested on a realistic prospect-appraisal problem in petroleum geology involving fuzzy rules (Lebailly et al., 1987). SPII-2 is written in LELISP and runs on a VAX 11-780 computer as well as on a Macintosh microcomputer.

DIABETO (Buisson et al., 1987) is a medical expert system, accessible from the French videotex network TELETEL, which is a decision-aid tool for the treatment of diabetes. In DIABETO-III, imprecise/uncertain rules and facts are represented in a unified manner using possibility distributions. In particular, DIABETO-III deals with expert rules involving fuzzy conditions, which are understood as "the more the condition is satisfied, the more certain is the conclusion". Besides, an interpolation method enables the system to build, from a given set of fuzzy rules, a new fuzzy rule more adapted to the current situation if necessary. Presently the knowledge base contains about 300 rules (the full knowledge base should contain about 1000 rules). The system is designed for use by sick people themselves. It is implemented in NIL (a dialect of LISP) on a VAX 11-780.

The inference engine TAIGER (Farreny et al., 1986) is able to handle not only uncertain rules but also imprecise and uncertain factual pieces of knowledge concerning the values of logical or numerical variables. The possibilistic representation of uncertainty that is used is somewhat similar to that of MYCIN (Buchanan and Shortliffe, 1984), but the chaining and combination operations of the possibility-theory-based approach differ somewhat from the empirical choices (obtained as distorted probabilistic laws) made in MYCIN. Besides, imprecision is dealt with in the same possibilistic framework in TAIGER. TAIGER manipulates numerical values pervaded with imprecision and uncertainty, while inference engines like that of MYCIN treat uncertain rules and facts only. TAIGER maintains a representation of imprecise or uncertain facts in terms of possibility distributions, while the uncertainty of a rule is modelled by the numbers appearing in a 2 × 2 matrix representation of the rule (Farreny and Prade, 1986). TAIGER works in backward chaining. TAIGER is currently implemented on an IBM-PC microcomputer in MULISP.
5 CONCLUSION

Possibility theory offers a common setting for modelling uncertainty and imprecision in reasoning systems. However, the reasoning methodology in fuzzy logic differs drastically from the theorem-proving approach. In the latter, statements are translated into logical formulae. Inference is then performed symbolically, regardless of the meaning of the formulae. In fuzzy
logic, in contrast, statements are translated into elastic constraints in a meaning-representation language, and the meaning of the conclusion is directly computed via nonlinear-programming techniques. However, in possibility logic, as soon as no vagueness pervades the knowledge, it seems that part of the theorem-proving methodology can be extended, as stressed in Section 2. Finally, we have pointed out that the notion of truth can be viewed as the result of a semantic pattern-matching process. This view leads to the definition of operational procedures for computing degrees of truth and degrees of uncertainty that can feed approximate-reasoning systems.

BIBLIOGRAPHY

Baldwin, J. A. (1979). A new approach to approximate reasoning using a fuzzy logic. Fuzzy Sets and Systems 2, 309-325. (An extensive treatment of the generalized modus ponens based on fuzzy truth-values.)
Dubois, D. and Prade, H. (1980a). Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York. (An account of the fuzzy-set literature in the nineteen-seventies. Covers a broad range of topics.)
Dubois, D. and Prade, H. (1985a) (avec la collaboration de H. Farreny, R. Martin-Clouaire, C. Testemale). Théorie des Possibilités: Applications à la Représentation des Connaissances en Informatique. Collection Méthodes + Programmes, Masson, Paris. (English translation to be published by Plenum Press, New York.) (A complement to the previous reference on some aspects of fuzzy-set theory, especially possibility measures, fuzzy arithmetics and fuzzy-set-theoretic operations. Focuses on applications to approximate reasoning, heuristic search, fuzzy programming and relational databases.)
Dubois, D. and Prade, H. (1986a). Possibilistic inference under matrix form. Fuzzy Logic in Knowledge Engineering (ed. H. Prade and C. V. Negoita), pp. 112-126. Verlag TÜV Rheinland, Köln. (An extensive presentation of possibilistic logic.
Also deals with the question of the possibility of conditionals and conditional possibility.)
Dubois, D. and Prade, H. (1987a). Necessity measures and the resolution principle. IEEE Trans. Syst. Man Cyber. 17, 474-478. (The theorem-proving approach to possibilistic logic.)
Gaines, B. R. (1976). Foundations of fuzzy reasoning. Int. J. Man-Machine Stud. 8, 623-668. (A basic reference on the links between multiple-valued logics and fuzzy-set theory.)
Lee, R. C. T. (1972). Fuzzy logic and the resolution principle. J. Assoc. for Computing Machinery 19, 109-119. (The main and oldest reference on the theorem-proving approach to the max-min multiple-valued logic underlying fuzzy-set theory.)
Ponasse, D. (1978). Algèbres floues et algèbres de Łukasiewicz. Rev. Roum. Math. Pures Appl. 23, 103-113. (A fuzzy counterpart of Stone's theorem for Boolean algebras.)
Prade, H. (1985a). A computational approach to approximate and plausible reasoning with applications to expert systems. IEEE Trans. Pattern Anal. Machine Intelligence 7, 260-283 (Corrections in 7, 747-748). (An overview of approximate-reasoning methodologies related to possibility theory and fuzzy logic. Includes a very large bibliography.)
Prade, H. and Negoita, C. V. (eds) (1986). Fuzzy Logic in Knowledge Engineering.
Verlag TÜV Rheinland, Köln. (A collection of up-to-date contributions by major researchers in the area of possibility theory and fuzzy logic applied to approximate reasoning, databases and expert systems.)
Sanchez, E. and Zadeh, L. A. (eds) (1987). Approximate Reasoning in Intelligent Systems, Decision and Control. Pergamon Press, Oxford. (A similar collection, with other contributions.)
Yager, R. R. (1983). Quantified propositions in a linguistic logic. Int. J. Man-Machine Stud. 19, 195-227. (An alternative approach to fuzzy quantifiers, extending the substitution method in logic.)
Zadeh, L. A. (1965). Fuzzy sets. Info. Control 8, 338-353. (The founding paper on fuzzy-set theory. It is still recommended reading to capture the basic intuitions.)
Zadeh, L. A. (1978a). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3-28. (The first paper on possibility measures. Stresses the links between possibility distributions and linguistic information. Adopts a physical point of view on possibility, as opposed to statistical knowledge.)
Zadeh, L. A. (1979). A theory of approximate reasoning. Machine Intelligence 9 (ed. J. E. Hayes, D. Michie and L. I. Mikulich), pp. 149-194. Elsevier, Amsterdam. (Zadeh's approach to reasoning with vague information. Describes in detail the combination/projection methodology sketched in Section 3.2.)
Zadeh, L. A. (1985). Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions. IEEE Trans. Syst. Man Cyber. 15, 754-763. (The most up-to-date paper on the theory of dispositions and the treatment of fuzzy quantifiers.)

Other references
Baldwin, J. (1983). A fuzzy relational inference language for expert systems. Proc. 13th IEEE Int. Symp. on Multiple-Valued Logic, Kyoto, pp. 416-423. IEEE, New York.
Bellman, R. E. and Zadeh, L. A. (1977). Local and fuzzy logics. Modern Uses of Multiple-Valued Logics (ed. J. M. Dunn and G. Epstein), pp. 103-165. Reidel, Dordrecht.
Buchanan, B. G. and Shortliffe, E. H. (1984). Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, Mass.
Buisson, J. C., Farreny, H., Prade, H., Turnin, M. C., Tauber, J. P. and Bayard, F. (1987). TOULMED, an inference engine which deals with imprecise and uncertain aspects of medical knowledge. Proc. Eur. Conf. on Artificial Intelligence in Medicine (AIME-87), Marseilles. Springer-Verlag, Berlin.
Dubois, D. (1987). Possibility theory: towards normative foundations. Risk, Decision and Rationality (ed. B. Munier). Reidel, Dordrecht. (To appear.)
Dubois, D. and Prade, H. (1980b). New results about properties and semantics of fuzzy-set-theoretic operators. Fuzzy Sets: Theory and Applications to Policy Analysis and Information Systems (ed. P. P. Wang and S. K. Chang), pp. 59-75. Plenum Press, New York.
Dubois, D. and Prade, H. (1982a). A class of fuzzy measures based on triangular norms. Int. J. Gen. Syst. 8, 43-61.
Dubois, D. and Prade, H. (1982b). On several representations of an uncertain body of evidence. Fuzzy Information and Decision Processes (ed. M. M. Gupta and E. Sanchez), pp. 167-181. North-Holland, Amsterdam.
Dubois, D. and Prade, H. (1984a). A theorem on implication functions defined from triangular norms. Stochastica 8, 267-279.
Dubois, D. and Prade, H. (1984b). Fuzzy logics and the generalized modus ponens revisited. Cybernetics and Systems 15, 293-331.
Dubois, D. and Prade, H. (1984c). The management of uncertainty in expert systems: the possibilistic approach. Operational Research '84: Proc. 10th Triennial IFORS Conf., Washington, DC (ed. J. P. Brans), pp. 949-964. North-Holland, Amsterdam.
Dubois, D. and Prade, H. (1985b). A review of fuzzy set aggregation connectives. Info. Sci. 36, 85-121.
Dubois, D. and Prade, H. (1985c). Evidence measures based on fuzzy information. Automatica 31, 547-562.
Dubois, D. and Prade, H. (1986b). Fuzzy sets and statistical data. Eur. J. Operational Res. 25, 345-356.
Dubois, D., Prade, H. and Testemale, C. (1986). Weighted fuzzy pattern matching. Proc. Journée Nationale sur les Ensembles Flous, la Théorie des Possibilités et leurs Applications, Toulouse, pp. 115-145. (To appear in Fuzzy Sets and Systems, 1988.)
Dubois, D., Lang, J. and Prade, H. (1987a). Theorem proving under uncertainty: a possibility theory-based approach. Proc. 10th Int. Joint Conf. on Artificial Intelligence (IJCAI-87), Milan.
Dubois, D., Martin-Clouaire, R. and Prade, H. (1987b). Practical computing in fuzzy logic. Fuzzy Computing (ed. M. M. Gupta and T. Yamakawa). North-Holland, Amsterdam. (To appear.)
Farreny, H. and Prade, H. (1986). Default and inexact reasoning with possibility degrees. IEEE Trans. Syst. Man Cyber. 16, 270-276.
Farreny, H., Prade, H. and Wyss, E. (1986). Approximate reasoning in a rule-based expert system using possibility theory: a case study. Information Processing '86 (ed. H. J. Kugler), pp. 407-413. North-Holland, Amsterdam.
Giles, R. (1982). Foundations for a theory of possibility. Fuzzy Information and Decision Processes (ed. M. M. Gupta and E. Sanchez), pp. 183-195. North-Holland, Amsterdam.
Goodman, I. R. and Nguyen, H. T. (1985). Uncertainty Models for Knowledge-Based Systems. North-Holland, Amsterdam.
Hughes, G. E. and Cresswell, M. J. (1968). An Introduction to Modal Logic. Methuen, London.
Lebailly, J., Martin-Clouaire, R. and Prade, H. (1987). Use of fuzzy logic in rule-based systems in petroleum geology. Approximate Reasoning in Intelligent Systems, Decision and Control (ed. E. Sanchez and L. A. Zadeh), pp. 125-144. Pergamon Press, Oxford.
Martin-Clouaire, R. and Prade, H. (1986). SPII-1: a simple inference engine capable of accommodating both imprecision and uncertainty. Computer-Assisted Decision-Making (ed. G. Mitra), pp. 117-131. North-Holland, Amsterdam.
Prade, H. (1983). Data bases with fuzzy information and approximate reasoning in expert systems. Proc. IFAC Int. Symp. on Artificial Intelligence, Leningrad, pp. 113-120.
Prade, H. (1985b). Reasoning with fuzzy default values. Proc. 15th IEEE Int. Symp. on Multiple-Valued Logic, Kingston, Ontario, pp. 191-197. IEEE, New York.
Rescher, N. (1969). Many-Valued Logic. McGraw-Hill, New York.
Rich, E. (1983). Default reasoning as likelihood reasoning. Proc. American Association for Artificial Intelligence Conf. (AAAI-83), Washington, DC, pp. 348-351.
Schweizer, B. and Sklar, A. (1963).
Associative functions and abstract semi-groups. Publ. Math. Debrecen 10, 69-81.
Shackle, G. L. S. (1961). Decision, Order and Time in Human Affairs, 2nd edn. Cambridge University Press.
Suppes, P. (1966). Probabilistic inference and the concept of total evidence. Aspects of Inductive Logic (ed. J. Hintikka and P. Suppes), pp. 49-65. North-Holland, Amsterdam.
Trillas, E. and Valverde, L. (1985). On implication and indistinguishability in the setting of fuzzy logic. Management Decision Support Systems Using Fuzzy Sets and Possibility Theory (ed. J. Kacprzyk and R. R. Yager), pp. 198-212. Verlag TÜV Rheinland, Köln.
Yager, R. R. (1984). Approximate reasoning as a basis for rule-based expert systems. IEEE Trans. Syst. Man Cyber. 14, 636-643.
Zadeh, L. A. (1978b). PRUF: a meaning representation language for natural languages. Int. J. Man-Machine Stud. 10, 395-460.
Zadeh, L. A. (1981). Test-score semantics for natural languages and meaning representation via PRUF. Technical Note 247, SRI International, Menlo Park, California. Also in Empirical Semantics (ed. B. B. Rieger), pp. 281-349. Brockmeyer, Bochum, 1982.
Zadeh, L. A. (1983). The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems 11, 199-228.
Zadeh, L. A. (1984). Fuzzy probabilities. Info. Proc. Mgmt 19, 148-153.
DISCUSSION

Marie-Odile Cordier: Dubois and Prade make in their paper a clear distinction between uncertain reasoning and vague (or approximate?) reasoning. Uncertain reasoning is precise reasoning on a given situation (or world) incompletely described in a database. The answer to a precise query can then be "surely" true, "surely" false, or "possibly" true or false: i.e. a degree of certainty. If one knows, for example, that John is tall, then the query "is John's height more than 1.80 m?" is answered in terms of whether there is more or less possibility that the statement is true. Vague reasoning is reasoning with vague predicates on a precise database. The predicates are defined approximately and are more or less verified by precise data, i.e. are more or less true. The answer to a query is then a degree of truth. If one knows that John's height is 1.70 m, then a statement such as "John is tall" can be said to be true with a to-be-determined degree of truth.

Possibilistic logic and fuzzy logic are two ways of reasoning on imperfect knowledge; they use fuzzy-set theory as a common tool and for that reason are quite often confused. They are clearly distinguished in Dubois and Prade's chapter. Possibilistic logic is concerned with uncertain reasoning where the database is a fuzzy description of a given world; it is a logic of uncertainty, as are probabilistic logic and the logic of evidence. Fuzzy logic is concerned with fuzzy reasoning on precise information; it is a logic of vagueness. Both are numerical approaches, in the sense that degrees of certainty and degrees of truth are both estimated by numerical values. In possibilistic logic an assertion is labelled by two values, the possibility and the necessity, which describe an interval of certainty of this assertion. In fuzzy logic an assertion is labelled with a degree of truth describing its conformity with reality. An important result given by Dubois and Prade states that logics of uncertainty cannot be truth-functional.

Uncertain reasoning: evaluation of uncertainty versus use of dependency links. Uncertain reasoning means reasoning on incomplete information: the value of
what Dubois and Prade call a variable (such as the height of John) is unknown, but can be restricted to a set of values. These restrictions reduce the set of possible worlds that can be modelled by such an incomplete database. One way of doing this is to evaluate the possible values of the variable using numerical estimations, which are easy to combine and to manipulate. These numbers can be obtained using fuzzy-set theory, as is done in possibilistic logic, or probability theory, as in probabilistic logic, and can describe the possibility, probability, credibility etc. of the corresponding assertion. Symbolic estimations can also be used, such as the modalities proposed by Kodratoff et al. (1985). Another way is to consider unknown values as possible hypotheses and to use hypothetical reasoning; assertions are then labelled by hypothetical contexts that describe precisely under what conditions the result can be obtained. Instead of producing answers such as "likes(Mary, Paul) with possibility α and necessity β", it would produce "likes(Mary, Paul) if Paul earns more than $10000 or Paul is less than 40 years old", which can be more useful or more instructive.
One of the problems with logics of uncertainty is how to transform uncertain information into information labelled by a certainty degree. In probabilistic logic the certainty degree describes a probability that can be obtained by using well-known probability theory. It seems that probabilities are well adapted to describe the certainty of a fact such as "the king of diamonds is in West's hand" (game of bridge), which can be computed precisely. In possibilistic logic the possibility and necessity are obtained through the use of fuzzy-set theory. The meaning of a fuzzy predicate is described by a fuzzy set; the justification of these values is a question of agreement on the meaning of a word; a fuzzy set describing "John is tall" in terms of precise heights, or "Peter is rich" in terms of amount of salary, can be said to be good only if it satisfies the user. The meaning of a word is a matter of opinion and cannot be formally justified. The choice between logics of uncertainty seems to be quite dependent on the domain; Dubois and Prade argue that some results are worse (using the resolution principle) in probabilistic logic than in possibilistic logic, but what if inputs could be more precisely determined in the first case? Comparison with other logics of uncertainty.
Possibility and necessity of an implication: p ⊃ q (p → q). Dubois and Prade show how to get the couple (possibility, necessity) for an element of a fuzzy database: the meaning of a fuzzy predicate is given by a fuzzy set, represented by a trapezoidal curve; these curves are used to obtain, for a given fuzzy fact represented by a ground atomic formula, the two certainty measures. It is not so clear when one considers an implication like p → q: where do possibility and necessity come from? What is their (intuitive) meaning? Possibility and necessity of an implication can be seen as extensions of possibility and necessity of a ground fact; they would describe the fuzzy relation between two propositions p and q, and express the uncertainty of expressions such as "it is possible that", "it can be that", "probably ...". It would be expected that an implication would be labelled by possibility and
10 Possibilistic and Fuzzy Logics 317
necessity measures as is the case for ground assertions; it seems that a matrix is proposed instead; this matrix is said to express "the grade of necessity of all ways of relating p and q". But no means (such as fuzzy sets for expressing the fuzzy predicates) are given for determining these measures. What is the intuitive meaning of these values? Where does this matrix come from? Are fuzzy sets used to compute it? For example, what is n₁₁? Is it the necessity of (p → q) or the necessity of q knowing p? Or are these two notions equivalent? In Farreny and Prade (1986) it seems that the matrix corresponds to conditional possibilities, with the relation Π(p ∧ q) = min(Π(q|p), Π(p)). Is this always the case? What are the properties of the matrix induced by the properties on possibilities and necessities? Does the matrix replace the possibility and necessity of an implication? Or can these measures be obtained from it?

Possibility and necessity of first-order implications. It is not so difficult to imagine the possibility and necessity of a propositional implication like "if attends-to-the-meeting(Peter) then very probably attends-to-the-meeting(Mary)". It is not so easy when one considers first-order implications. What is now the meaning of possibility Poss and necessity Nec for ∀x P(x) → Q(x)?

(1) It could be the possibility (respectively necessity) of the global formula:

Poss [∀x P(x) → Q(x)]

This means that "it can be" that all P(x) are Q(x), as in "it can be that all planes are on strike today":

Poss [∀x plane(x) → on-strike(x)]

It cannot be used to express the uncertainty on "all birds fly", for example.

(2) It is more probably something like the possibility of the conclusion when the conditions are verified, which is not so far from conditional possibilities:

∀x P(x) → Poss [Q(x)]

as in

∀x attends-to-the-meeting(Peter, x) → Poss [attends-to-the-meeting(Mary, x)]

or

∀x smokes(x) → Poss [diedbefore60(x)].

These implications could be rewritten as:

smokes(Toto) → Poss [diedbefore60(Toto)]
smokes(Lulu) → Poss [diedbefore60(Lulu)]
...

The possibility does not depend on the x concerned, and remains the same after instantiation.

(3) But what about when the possibility depends on the domain of x (as is the case when one considers statistical measures)? For example, in

∀x bird(x) → Poss [flies(x)],
the possibility reflects the fact that Bird is a superclass, a union of classes of birds that fly and of classes of birds that do not fly; Poss is only valid for x being a bird but
changes when the domain of x is restricted to a subset of bird such as duck or penguin; the implication cannot be specialized, via the classical rule of specialization, without an update of the possibility. Let us suppose

∀x penguin(x) → bird(x)
∀x bird(x) → Poss [flies(x)]

One cannot derive from this

∀x penguin(x) → Poss [flies(x)]
This is the same argument as used in Duval and Kodratoff (1986): the French usually drink coffee, but it cannot be used to derive that there is some possibility that someone (who is French) drinks coffee. More generally, the problem is that of an evaluation that is true for a group E of x, but cannot be used for a subset of E. The same problem is met with the use of fuzzy quantifiers; and it seems to be difficult to reason on such information. Implementation issues.
Let us suppose two certain inferences:

∀x height(x) > 1.80 m → basketball-player(x)
∀x height(x) > 1.80 m → likes(Mary, x)

and that we know from a database that height(John) > 1.80 with Π = α and N = β. If a contradiction such as ¬basketball-player(John) is added, then it seems that we have to

(i) update the certainty measures of height(John) > 1.80;
(ii) update the certainty measures of the derived assertions such as likes(Mary, John);
(iii) update all the certainty measures concerned with the height of John, such as height(John) > 1.90 ...;
(iv) if the first inference were ∀x height(x) > 1.80 m and weight(x) < 80 kg → basketball-player(x), then this update could be done on weight(John) too.
If, for dealing with a contradiction, a complete TMS (truth maintenance system) algorithm has to be implemented, requiring the use of dependency links to antecedents, is it not easier to use these links to reason directly, as in hypothetical reasoning, on the unknown or incompletely known values? In conclusion, this chapter seems to be an up-to-date treatment of a crucial problem, that of reasoning on imperfect knowledge. A clear presentation is made of possibilistic and fuzzy logics, and a number of exciting problems remain to be explored.

Paul Gochet: Dubois and Prade acknowledge that the notion of truth with which they operate has been tailored for a special purpose. They define truth as the agreement, which can be partial (graded), between the representation of the meaning
of a statement and the representation of what is actually known. If that definition is combined with the standard definition of knowledge as justified true belief, then a vicious circle is generated. This objection, however, can be dismissed. The authors are entitled to take the notion of "knowledge base" as primitive and to define knowledge as the content of a knowledge base. They present the correspondence theory of truth as the standard concept: "Truth is generally understood as the conformity between a statement and the actual state of affairs it supposedly refers to." At first sight, that presentation can be questioned. Tarski's semantic concept of truth and Ramsey's earlier redundancy theory of truth have won wider agreement, at least among logicians and philosophers, than the traditional correspondence theory of truth. For Tarski, truth consists of satisfaction by all sequences of objects of the domain, and satisfaction, in turn, is given a recursive definition (Gochet, 1986). That definition enables him to do without the metaphoric expression of "conformity" and to avoid commitment to dubious entities such as facts (Gochet, 1980) or states of affairs. For Tarski, the predicate "true in language L" can be defined either absolutely, i.e. independently of a model, or relatively, i.e. with respect to a model. The second definition is more often used today, as it plays a crucial role in formal semantics, where it serves to define validity (truth in all models). That definition of validity is very general. It applies also to non-classical logics in which the models have been enriched by the introduction of possible worlds, moments of time, accessibility relations, and modified by a non-standard interpretation of logical constants. A version of the correspondence theory of truth was recently defended by Perry and Barwise within the framework of their situation semantics.
It has been shown, however, that situation semantics and Montague semantics, despite significant differences, can both be subsumed under a slightly modified version of the framework that Montague provided in his Universal grammar (Muskens, 1988). Since Montague's framework embodies and enlarges Tarski's definition of truth, this result shows that Dubois and Prade's correspondence theory of truth can fit in with the "received view", i.e. with Tarski's definition of truth. Dubois and Prade's theory, however, is incompatible with Ramsey's theory. This is worth examining, since Ramsey has taken a stance on the issue raised by the concept of degree of truth. According to Ramsey, and also according to Ayer (Gochet, 1988), who has much improved on Ramsey's theory, saying that a statement is true is nothing more than reasserting the statement. The sentence "It is true that p" means nothing more than "p". The predicate "true" is redundant. Haack (1980) observes that the very notion of degree of truth ceases to make sense if we take up the redundancy theory: "... given that he holds that 'It is true that p' means that p, it is natural that Ramsey should say that 'It is ½ true that p' means nothing at all, since there seems to be no way of modifying the right-hand side of Ramsey's definition to give a sense to the modified left-hand side." One might question the claim that the adverbial modification of the truth-predicate cannot be transferred meaningfully to the asserted sentence. Instead of saying "It is half true that the flag is white", one could say "The flag is half white". But this counterexample fails to refute Haack's claim. By cancelling out the expression "It is true that" and displacing the degree adverb, we change the meaning. The former sentence allowed one interpretation only ("The flag is grey"), whereas the latter allows several, and the preferred reading is "Half of the flag is white".
Moreover, there are cases where the syntactic shift is really impossible. We can say
"It is ½ true that France is hexagonal", but the sentence "France is ½ hexagonal" is sheer nonsense. The clash between Dubois and Prade's admission of degrees of truth and Ramsey's redundancy theory is definitely not an argument against the former view, since it is open to us to abandon Ramsey's theory and retain the idea that truth comes in degrees. We make rough statements such as the above-mentioned sentence "France is hexagonal", borrowed from J. L. Austin. Two strategies are possible to cope with that linguistic use. We can say that such a statement fits the facts to a certain degree, and decide that statements that fit the facts to a degree ranging between 50% and 100% are true, whereas statements whose "degree of fit" falls below 50% are to be ascribed the truth-value False. Or we can collapse the two dimensions of assessment (fitness to facts and truth-value) into one and introduce the notion of degrees of truth. This is Dubois and Prade's policy. This policy enables them to exhibit the interconnections between classical logic, modal logic, possibilistic logic, many-valued logic and fuzzy logic. This fully justifies their choice, even if it departs from ordinary parlance.
Flash Sheridan: The problem with fuzzy logics is not that they are bad logics, but that they are not logics at all. It is hard to define what logic is, but it can be clear that something isn't a logic: if it has significant empirical consequences, or if it doesn't have connectives satisfying the most basic properties (see below) of and and or. Dubois and Prade's logic proves something that I claim is an empirical statement about the nature of colour. An alternative fuzzy logic has connectives that claim to be and and or, but are nothing like them. (I shall restrict my attacks to or; it is the easier target.) The two most basic things about or are that p or p is the same as p, and that p or q is no less true than either p or q. Call the first "idempotence", the second "monotonicity". And must also be idempotent, and monotonic the other way: p and q is no more true than either p or q. Say we have a pencil that is fairly red, and fairly orange. I claim that it is at least conceivable that it is very red or orange. (In fact, there is such a pencil, but that doesn't matter.) With the "most popular choice of operations" (Dubois and Prade's equations (14) and (15)) this is impossible: the pencil is fairly red or orange. (t(P ∨ Q) = max(t(P), t(Q)); t(P) is the degree of truth of the proposition P.) Dubois and Prade do have a theory of colour that makes sense of this; I think it is arbitrary and wrong, but that doesn't matter. What matters is that one can deduce their theory of colour from their logic. I claim that this theory is empirical, so this version of fuzzy logic is not a logic. I am not going to discuss the meaning of "empirical"; if you feel the existence of such a pencil is not an empirical matter, you need not believe my argument. (I think one could even make a case that it isn't.) But if you agree that it is empirical, you may be intrigued by fuzzy logic's usefulness, but you must believe that this usefulness is accidental.
I know of a different version of or. It uses + instead of max. The obvious problem with this is that one may then get truth values greater than 1; it dodges the problem by fiat: if the value one gets is greater than 1, pretend it is 1: t(p ∨ q) = min[t(p) + t(q), 1]. This is not idempotent. I am not here attacking the idea of vagueness; I should be interested to see a good logic of vagueness, although there are strong reasons to believe that there can be no such thing. The best philosopher to address the issue of vagueness has concluded that
it is incoherent (Dummett, 1975; see also Fine, 1975, to which latter article, through Frank Veltmann, I am indebted for the colour example). If there is a way to axiomatize vagueness, it seems it would have to be far more radical than you would be willing to accept. (It would probably have to be an extreme version of an extreme philosophical position called "strict finitism".) But, except for computational convenience, I see no reason for it to be truth-functional.
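The two candidate disjunctions Sheridan discusses, and the properties at stake, can be checked mechanically (an illustrative sketch, not part of the discussion; the operator names are glosses):

```python
zadeh_or = lambda a, b: max(a, b)           # the "most popular choice": max
bounded_or = lambda a, b: min(a + b, 1.0)   # the "+ instead of max" version

t = 0.4  # "fairly" red, "fairly" orange

# Idempotence: p or p should be exactly as true as p.
assert zadeh_or(t, t) == t          # max is idempotent
assert bounded_or(t, t) == 0.8      # bounded sum is not: 0.4 or 0.4 -> 0.8

# Monotonicity: p or q is no less true than either disjunct -- both pass.
assert zadeh_or(0.4, 0.7) >= 0.7
assert bounded_or(0.4, 0.7) >= 0.7

# The pencil example: with max, "red or orange" can never exceed the larger
# of the two degrees, so "very red or orange" (say 0.8) is unreachable.
assert zadeh_or(0.4, 0.4) == 0.4
```

The sketch makes Sheridan's point concrete: max keeps idempotence but caps the disjunction, while the bounded sum reaches 0.8 only by giving idempotence up.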
Reply: The three discussants have each focused on different aspects of our paper; their respective comments can be summarized into the following questions.

(i) Can there be a logic of vague propositions (Sheridan)?
(ii) What is the expressive power of possibilistic logic and its relevance for commonsense reasoning (Cordier)?
(iii) What is the meaning of graded truth (Gochet)?
All three questions are very much relevant to a proper understanding of fuzzy-set and possibility theory, and we are grateful to the discussants for raising them. First of all, fuzzy-set theory has no special claim to stand as a general theory of vagueness. To build a membership function one needs three objects: a referential set Ω, a set of membership values V, and a mapping μ_A from Ω to V that discriminates between membership and non-membership in A. The set V is usually taken to be the unit interval, but this is clearly a matter of convenience. More generally, V should be allowed to be a lattice (Goguen, 1967); then one can build purely qualitative models of vague concepts. Ω can be any kind of set, but in practice the use of the fuzzy-set approach is made easier whenever Ω is what we shall call a "simple set", i.e. either a finite set with small cardinality, or a linear numerical scale, or a Cartesian product thereof. Outside these cases, it is difficult to find a procedure that enables the membership function to be elicited in a reasonable way. Fortunately the above cases occur quite often in practice, especially when the predicate A can be expressed by means of some clearly identified attribute a of objects in Ω, ranging over some scale S that is a simple set. For instance, Ω is a set of (possibly numerous) people, a(ω) evaluates the size of ω ∈ Ω, and A means "tall", and is defined on S rather than Ω using the membership function μ_A: S → V. μ_A(a(ω)) is then the degree of tallness of the individual ω. The identification of a membership function on a simple set is a problem in empirical psychometry, which is not especially difficult (see Smithson, 1987; Norwich and Turksen, 1984). Ancestors of membership functions have been suggested by philosophers of vagueness (for example Black's (1937) consistency profiles) as a reasonable way of capturing the meaning of vague concepts.
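As a sketch of this construction (all names, labels and thresholds are invented for illustration), a vague predicate over a set Ω of people can be defined through an attribute map a: Ω → S and a membership function on the simple scale S, here valued in a small ordered lattice V rather than the unit interval:

```python
# V: a finite chain of qualitative membership labels (a lattice, after Goguen).
V = ["not-at-all", "somewhat", "rather", "fully"]   # ordered by position

def mu_tall_label(cm):
    """Membership function mu_A on the scale S (heights in cm), valued in V."""
    if cm < 165: return "not-at-all"
    if cm < 175: return "somewhat"
    if cm < 185: return "rather"
    return "fully"

attribute = {"John": 170, "Mary": 186}   # a : Omega -> S (hypothetical data)

def degree_of_tallness(person):          # mu_A(a(omega))
    return mu_tall_label(attribute[person])

# Since V is a chain, the lattice join (used for 'or') is just the larger label.
join = lambda u, v: V[max(V.index(u), V.index(v))]

print(degree_of_tallness("John"), degree_of_tallness("Mary"))
```

The point of the sketch is that the elicitation problem lives entirely on the simple set S, not on the (possibly huge) set Ω, and that nothing forces V to be numerical.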
But note that the expression of membership functions in terms of random sets (Kampé de Fériet, 1982; Goodman and Nguyen, 1985) enables statistical interpretations to live alongside purely psychometric interpretations of fuzzy sets. This dependence of fuzzy logic upon empirical or statistical matters may look disgraceful to fully fledged logicians. In particular, our theory of vagueness offers no "ontological", absolute definition of graded truth. From a philosophical point of view, fuzzy or possibilistic logic may appear to be accidental. But all practical problems are philosophically accidental too, and the purpose of the logic systems in this book is the solving of practical problems rather than the solving of philosophical issues, although Gochet tends to suggest that our classification of logic systems may have some philosophical relevance. Let us turn to the question of truth-functionality. Sheridan stands strongly against
the idea that a logic of vagueness can be truth-functional. First, we have never said that it always is. Truth-functionality is preserved only in the presence of complete information. It fails whenever the available information is not complete, even for standard formulae. Another point is that gradedness of truth is not compatible with Boolean algebra. An algebra of vague propositions necessarily has fewer properties than a Boolean algebra. Here there is a choice about which structural properties we are to give up. Let t(p) be the truth value of the (vague) proposition p. Sheridan thinks that the basic properties of disjunction are monotonicity (t(p ∨ q) ≥ max(t(p), t(q))) and idempotency. Taking truth-functionality for granted leads to t(p ∨ q) = t(p) ⊥ t(q), where ⊥ is continuous, coincides with the logical "or" for binary truth values, is commutative, and is monotonically increasing in the wide sense. If we further assume that x ⊥ 0 = x (i.e. p or "false" = p), then the only possible choice for t(p ∨ q) is t(p ∨ q) = max(t(p), t(q)), given Sheridan's requirements of monotonicity and idempotency. Moreover, idempotency is incompatible with the excluded-middle law (Dubois and Prade, 1984d). If the latter must be preserved then we must drop idempotency, and the only possible solution, up to an isomorphism, becomes t(p ∨ q) = min(1, t(p) + t(q)). These proposals are not arbitrary, but are dictated by the algebraic properties that one wishes to keep. Hence there are several possible algebraic structures for a set of vague propositions, and they are all compatible with the unit interval. This is why truth-functionality can be preserved for vague propositions. But we acknowledge the fact that the truth-functionality assumption is made in order to get a simple theory of vagueness. We make it because it is not self-inconsistent and because it makes computations easy to carry out. We agree with Sheridan that 0.4 red and 0.4 orange may lead to 0.8 red or orange.
We could always find a disjunction operation that satisfies this condition. However, we believe that nobody would state this property this way. We prefer the approach that first states which algebraic properties of the fuzzy "or" are sensible in a given situation, and then derives the proper class of "or"s accordingly. If this class contradicts the available evidence then maybe the truth-functionality assumption should be dropped. But we are aware that truth-functionality is here a matter of convenience and is clearly an assumption. See Osherson and Smith (1981, 1982) for a discussion of its limitations from a psychological point of view, and the discussions by Zadeh (1982) and Cohen and Murphy (1985) of the extensionality of the logical combination of vague concepts. Let us now turn to possibilistic logic. Cordier raises a number of very interesting issues in her comments. First, why use numbers instead of symbols to express uncertainty? A purely symbolic approach to uncertainty such as that of Duval and Kodratoff (1986) faces a challenging problem, namely how to combine the symbolic modalities. At the end of a proof one gets a list of modalities to be interpreted as a whole, and to our knowledge there is no guideline as to how this should be done. In contrast, here, uncertainty propagation is done according to the rules of a given theory of uncertainty (whose choice depends upon the nature of the available information). Of course, the degree of uncertainty bearing on a conclusion may not be informative enough, and, as Cordier stresses, one may wish to get the reasons for uncertainty as well. But handling uncertainty is not incompatible with maintaining the hypothetical assumptions under which this uncertainty would be removed. The two tasks are not redundant: uncertainty expresses to what extent information is lacking, while hypothetical reasoning is useful for characterizing what extra information is needed to remove the uncertainty.
In that sense, the suggestion of using Truth Maintenance System-like approaches in conjunction with uncertain reasoning
is certainly valuable. Note also that in the estimation of the uncertainty of a compound proposition with respect to a given (incomplete) state of information using a fuzzy pattern-matching technique, it is possible not only to compute a possibility and a necessity degree, but also to determine what part of the information needs to be made more precise, and in what manner, in order to come closer to complete certainty. Another important issue is that of properly interpreting degrees of necessity and possibility, and being able to get them out of the available evidence. As mentioned earlier, possibilistic information usually stems from linguistic information involving vague terms referring to simple sets. Incompleteness and vagueness of the available evidence lead to grades of uncertainty obtained through fuzzy pattern-matching. This is true for elementary facts as well as rules, since a fuzzy linguistic rule also translates into a possibility distribution. Moreover, the interval between the necessity and the possibility of an assertion p, say [N(p), Π(p)], can be viewed as bounds on an unknown probability: either lower-bounded (if Π(p) = 1) or upper-bounded (if N(p) = 0).
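On a finite set of possible worlds these measures take a simple computational form (a sketch with assumed numbers; the possibility distribution pi is hypothetical):

```python
worlds = ["w1", "w2", "w3", "w4"]
pi = {"w1": 1.0, "w2": 0.7, "w3": 0.3, "w4": 0.0}   # possibility distribution

def possibility(models):
    """Poss(p) = max of pi over the worlds where p holds."""
    return max((pi[w] for w in models), default=0.0)

def necessity(models):
    """N(p) = 1 - Poss(not p): certainty of p = impossibility of its negation."""
    return 1.0 - possibility([w for w in worlds if w not in models])

p_models = {"w1", "w2"}                              # worlds where p is true
print(possibility(p_models), necessity(p_models))    # the interval [N(p), Poss(p)]

# Total ignorance (pi = 1 everywhere) gives N(p) = N(not p) = 0 for any
# contingent p: the absolute reference point mentioned in the reply.
```

With these numbers p gets the interval [0.7, 1.0]: fully possible and fairly certain, since the best world refuting p has possibility only 0.3.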
Why is possibilistic logic interesting at all compared with probabilistic logic?

(i) Possibilistic logic offers an absolute reference point for expressing ignorance, namely N(p) = N(¬p) = 0. Expressing some belief about p comes down to choosing a point in [0, 1] in between certainty (N(p) = 1, N(¬p) = 0) and ignorance (N(p) = 0, N(¬p) = 0). Ignorance cannot be modelled in probability theory, where it is approximated by randomness. Moreover, in the case of randomness one can say nothing about Prob(p) compared with Prob(¬p) unless one knows how many alternatives are offered by p and ¬p. Note that possibility cannot model randomness, i.e. probability theory has its own usefulness, of course. Upper and lower probability systems can model ignorance, and possibility theory can be viewed as the simplest of upper and lower probability systems.

(ii) Possibility theory offers a nice framework for attaching weights of uncertainty to rules "if p then q" in complete accordance with classical logic. An uncertain "if ... then" rule is better expressed by a conditional measure (g(q|p)) than by the measure of a conditional (g(p → q)). The quantity N(p → q) is very close to a conditional possibility measure, as explained in Dubois and Prade (1986a). Namely we have

N(p → q) = N(q|p) ≜ 1 − Π(¬q|p)

as soon as Π(p ∧ q) ≠ Π(p), where Π(q|p) is defined as the greatest solution of Π(p ∧ q) = min(Π(q|p), Π(p)). Hence the necessity of a conditional is close to a definition of conditional necessity, and possibilistic logic, when we restrict the pair (N(p), N(¬p)) to be either (0, 1) or (1, 0), reduces to classical logic. The probability of a conditional is seldom equal to the conditional probability (Prob(q|p) = Prob(p → q) only if they are both 1, or if Prob(p) = 1). Hence translating uncertain rules into conditional probabilities does not yield a logic that generalizes classical logic, strictly speaking.

(iii) Possibilistic logic is a quasi-qualitative calculus where numbers are compared, not added or multiplied. Numbers are useful only to model gradedness, and no great precision is required. In contrast, probabilistic logic requires sufficiently precise inputs in order to be able to carry out long inferences that remain informative.
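The conditional possibility Π(q|p), defined above as the greatest solution of Π(p ∧ q) = min(Π(q|p), Π(p)), can be computed directly from that equation (an illustrative sketch; the numeric inputs are invented):

```python
def cond_possibility(poss_p_and_q, poss_p):
    """Greatest solution x of poss_p_and_q == min(x, poss_p).

    If Poss(p and q) < Poss(p), the equation forces x = Poss(p and q);
    otherwise any x >= Poss(p) solves it, and the greatest solution is 1.
    """
    assert poss_p_and_q <= poss_p, "Poss is monotone: Poss(p and q) <= Poss(p)"
    return poss_p_and_q if poss_p_and_q < poss_p else 1.0

# Hypothetical degrees: Poss(p) = 0.8, Poss(p and q) = 0.8, Poss(p and not-q) = 0.3
pi_q_given_p = cond_possibility(0.8, 0.8)       # = 1.0
pi_notq_given_p = cond_possibility(0.3, 0.8)    # = 0.3
n_q_given_p = 1.0 - pi_notq_given_p             # N(q|p) = 1 - Poss(not-q|p)
print(pi_q_given_p, n_q_given_p)
```

Note that only comparisons and one subtraction from 1 are involved, in line with point (iii): the calculus is quasi-qualitative.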
Let us now consider the problem of first-order implications. It is clear that N(∀x, P(x) → Q(x)) = α is not the same as ∀x N(P(x) → Q(x)) = α (where P and Q are non-vague predicates, for simplicity). The first expression represents a conjecture that is refuted by finding x₀ such that P(x₀) → Q(x₀) is false. The other expression is closer to a default rule when α is close to 1, since it means that for all x, "P(x) → Q(x) is true" is almost sure (but there may be exceptions). That is exactly equivalent to saying that when P(x) is true then Q(x) is almost surely true, putting the necessity on Q(x) only. This identity of meaning is reflected by the fact that both approaches, i.e. putting the necessity on the rule or on the rule's conclusion, are equivalent. To see this, let M(P) = {x | P(x) is true}. N(Q(x)) = α is expressed by the fuzzy set M(Q_α) defined by (Prade, 1985b)

μ(x) = 1 if x ∈ M(Q), 1 −