Lecture Notes in Artificial Intelligence, Subseries of Lecture Notes in Computer Science, Edited by J. Siekmann
Lecture Notes in Computer Science Edited by G. Goos and J. Hartmanis
Editorial
Artificial Intelligence has become a major discipline under the roof of Computer Science. This is also reflected by a growing number of titles devoted to this fast developing field to be published in our Lecture Notes in Computer Science. To make these volumes immediately visible we have decided to distinguish them by a special cover as Lecture Notes in Artificial Intelligence, constituting a subseries of the Lecture Notes in Computer Science. This subseries is edited by an Editorial Board of experts from all areas of AI, chaired by Jörg Siekmann, who look forward to considering further AI monographs and proceedings of high scientific quality for publication. We hope that the constitution of this subseries will be well accepted by the audience of the Lecture Notes in Computer Science, and we feel confident that the subseries will be recognized as an outstanding opportunity for publication by authors and editors of the AI community.

Editors and publisher
Lecture Notes in Artificial Intelligence Edited by J. Siekmann Subseries of Lecture Notes in Computer Science
419

Kurt Weichselberger   Sigrid Pöhlmann
A Methodology for Uncertainty in Knowledge-Based Systems
Springer-Verlag Berlin Heidelberg New York London Paris Tokyo Hong Kong
Authors
Kurt Weichselberger   Sigrid Pöhlmann
Seminar für Spezialgebiete der Statistik
Universität München
Ludwigstraße 33, D-8000 München 22, FRG
CR Subject Classification (1987): I.2.3-4
ISBN 3-540-52336-7 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-52336-7 Springer-Verlag New York Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1990
Printed in Germany

Printing and binding: Druckhaus Beltz, Hemsbach/Bergstr.
2145/3140-543210 - Printed on acid-free paper
FOREWORD
The number of publications on the management of uncertainty in expert systems has grown considerably over the last few years. Yet the discussion is far from drawing to a close. Again and again new suggestions have been made for the characterization and combination of uncertain information in expert systems. None of these proposals has been adopted generally. Most of the methods recommended introduce new concepts which are not founded on classical probability theory. This book, however, written by statisticians, investigates the possibility of giving a systematic treatment using the classical theory. It also takes into account that in many expert systems the available information is too weak to produce reliable point estimates for probability values. Therefore the handling of interval-valued probabilities is one of the main goals of this book.

We have not dealt with all important aspects of these issues in our study. We intend to continue our research on the subject with the aim of solving those problems which still remain unsolved. Also we are aware of the fact that the experience of other researchers may throw new light on some of our statements. Therefore we are grateful for any criticism and for all suggestions concerning possible improvements to our treatment.

We had the opportunity to discuss some parts of our study with Thomas Kämpke, Ulm, and owe valuable suggestions to him. Since our native tongue is German and we live in a German speaking environment, we had some difficulties as regards the English style. Louise Wallace, Plymouth, has supported us very much in this respect, although she bears no responsibility for remaining imperfections. Anneliese Hüser and Angelika Lechner, both from Munich, carefully managed the editing of a manuscript which progressed step by step to its final version. Dieter Schremmer, Munich, supported us by drawing the diagrams. Their help is greatly appreciated.
Munich, January 1990
Kurt Weichselberger, Sigrid Pöhlmann
CONTENTS

1. The aims of this study
2. Interval estimation of probabilities
3. Related theories
   3.1 Choquet-capacities and sets of probability distributions
   3.2 Choquet-capacities and multivalued mappings
   3.3 Theory of belief functions
   3.4 Combination rules of the Dempster-Shafer type
   3.5 The methods used in the expert system MYCIN
4. The simplest case of a diagnostic system
   4.1 A solution without further assumptions
   4.2 Solutions with double independence and related models
5. Generalizations
   5.1 The formalism
   5.2 Some aspects of practical application
6. Interval estimation of probabilities in diagnostic systems
   6.1 An approach without additional information
   6.2 Additional information about ~j
   6.3 The combination rule for two units
   6.4 The combination rule for more than two units
7. A demonstration of the use of interval estimation
Appendix: Application of Formula (3.21) to structures defined by k-PRIs
References
LIST OF DEFINITIONS

k-PRI
k-dimensional probability interval
reasonable
structure
feasible
derivable
degenerate
global independence
total independence
double independence
k-independence
mutual k-independence
interval-admissible

Concepts which can be found in related theories are not included.
CHAPTER 1
The Aims of this Study

Expert systems of a certain kind rely essentially upon the availability of a method for handling uncertainty. These systems cannot be conceived without a decision first being made about the choice of this method. Obviously this is true for all expert systems using empirical knowledge which in itself is not absolutely certain. As an example, we could mention a medical expert system which draws conclusions from the observed symptoms about whether or not a certain disease is present. All conclusions of this type unavoidably contain an amount of uncertainty. The rules which lead to these conclusions should not be confused with logical rules and must not be treated in the same way.
We shall call expert systems of this type diagnostic systems. They are mostly found in the field of medicine, but can also be used for meteorological or geological purposes, and of course for the control of technical installations. We shall demonstrate some results of our study by means of an example which uses the alarm system of a power plant. Therefore the expression "diagnostic system" should always be understood in the sense of an expert system which relies upon empirical interdependences for drawing its conclusions and consequently requires the treatment of uncertainty.
In order to make it possible to decide upon an appropriate therapy, a quantitative measure of uncertainty has to be applied in all relevant cases of a diagnostic system. Additionally it may be sensible to establish rules which, in certain stages of the investigation, direct the investigator's efforts depending on the degree of certainty achieved for possible hypotheses. It is evident therefore, that researchers who design diagnostic systems have to answer the question as to which method of measuring uncertainty should be employed.

For more than three hundred years scientists, philosophers, mathematicians and statisticians have used the concept of probability to describe degrees of uncertainty. Over three centuries a huge amount of theoretical results and experiences concerning the applicability of probability theory in different fields of human knowledge has been accumulated. Nevertheless many doubts concerning the appropriateness of the use of probability in diagnostic systems have arisen during the last few decades. So one has to ask what new aspects have evolved in the construction of such systems, which could possibly result in the necessity to develop methods for measuring uncertainty beyond classical probability theory.

First of all it must be stated that although the basic ideas prevailing in some considerations about diagnostic systems sound convincing, they violate fundamental requirements for reasonable handling of uncertainty. These ideas may be described as follows: If a certain fact is observed, a measure M1 of uncertainty concerning the hypothesis in question must exist. If in addition another fact is observed, which produces a measure M2 with respect to the same hypothesis, a combination rule must be given which yields the measure of uncertainty of this hypothesis resulting from both observations. Such a rule, which calculates the measure of uncertainty for the combined observation as a function of the measures M1 and M2, can never take into account the kind of mutual dependence of the two observed facts. It might well be that these facts nearly always occur together, if indeed they occur at all. In such a situation the second observation is redundant and should not be used to update the measure of uncertainty. In another situation the two facts very seldom occur simultaneously, and if they do, then this is an important indication concerning the hypothesis in question; in that case the updating of the measure of uncertainty should have drastic consequences. A combination rule which treats these two situations equally can by no means be regarded as useful. The question arises: Should probability theory be blamed for not supporting the construction of such a combination rule?

Yet more fundamental is the question: Is it justifiable to attribute a certain measure of uncertainty to the observation of a given fact, irrespective of the circumstances? Take the example of a medical diagnostic system: If a symptom Z is observed, and a measure of uncertainty is used concerning the hypothesis of the presence of a certain disease, can this measure remain valid if this disease occurs much more frequently than before? Once again an appropriate use of probability theory reveals the kind of dependence prevailing in this case. However, this will not be a popular result, because it states that a diagnostic system using this type of measure of uncertainty cannot be applied to populations showing different frequencies of this disease. Later in this study we shall demonstrate that negligence with respect to the aspects mentioned above may result in the inclusion of information into a diagnostic system which is equivalent to ruining it.
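The inadequacy of any rule operating on M1 and M2 alone can be illustrated by a small calculation (the numbers below are our own, chosen only for illustration, not taken from the text). In both scenarios the individual measures P(H | Z1) and P(H | Z2) equal 0.8, yet the correct combined value differs: it stays at 0.8 when the second observation is redundant, and rises to about 0.94 when the two observations are conditionally independent given the hypothesis.

```python
from itertools import product

def joint_redundant():
    # Scenario 1: Z2 is always identical to Z1, so the second observation
    # is redundant.  P(H = 1) = 0.5 and P(Z1 = h | H = h) = 0.8.
    p = {}
    for h in (0, 1):
        for z1 in (0, 1):
            pz = 0.8 if z1 == h else 0.2
            p[(h, z1, z1)] = 0.5 * pz
    return p

def joint_cond_independent():
    # Scenario 2: Z1 and Z2 behave identically one at a time, but are
    # conditionally independent given H.
    p = {}
    for h, z1, z2 in product((0, 1), repeat=3):
        p1 = 0.8 if z1 == h else 0.2
        p2 = 0.8 if z2 == h else 0.2
        p[(h, z1, z2)] = 0.5 * p1 * p2
    return p

def p_h_given(p, z1=None, z2=None):
    # P(H = 1 | the stated observations), computed from the joint table.
    def match(key):
        _, a, b = key
        return (z1 is None or a == z1) and (z2 is None or b == z2)
    num = sum(q for key, q in p.items() if match(key) and key[0] == 1)
    den = sum(q for key, q in p.items() if match(key))
    return num / den
```

Since both scenarios produce the same pair (M1, M2) = (0.8, 0.8), no function of M1 and M2 alone can reproduce both combined answers.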
Another argument against a possible application of probability theory in diagnostic systems is as follows: While probability theory affords statements using real numbers as measures of uncertainty, the informative background of diagnostic systems is often not strong enough to justify statements of this type. This is indeed a true concern of the conception of diagnostic systems not met by probability theory in its traditional form. However, it is possible to expand the framework of probability theory in order to meet these requirements without violating its fundamental assumptions. In our study we shall present elements of a systematic treatment of problems of this kind and refer to related theories. Therefore we believe that the weakness of estimates for measures of uncertainty as used in diagnostic systems represents a stimulus to enrich probability theory and the methodological apparatus derived from it, rather than an excuse for avoiding its theoretical claims.

A third argument which is met in the discussion about the application of probability theory in diagnostic systems refers to the disputes about the foundations of that theory. It is easy to quote prominent probabilists who express completely contradictory opinions about the essential meaning of a probability statement. It must however be noted that those difficulties concerning the concept of probability originate mainly from the problems of statistical inference, which may be the object of an expert system, but never of a diagnostic system. When a diagnostic system is conceived, the experimental background of the information employed is not explicitly considered: Now the question is which measure of uncertainty should be applied in order to describe this information, irrespective of whether it stems from the experience of an expert or from the evaluation of a sample. Concerning this problem the queries about the foundations of probability cannot be a reason for turning away from the language of probability, and even more so because the probabilists are in agreement about the basic rules for the use of probability. Only these basic rules are required in a diagnostic system. Therefore as far as these systems are concerned, if a measure of uncertainty has to be developed we recommend that the language of probability be relied upon and that one refrains from interfering in the dispute about probability concepts. It should be explicitly stated that this recommendation emphasizes that all rules of classical probability theory must be respected and that any new principle which cannot be justified by this theory must be avoided. Nevertheless in our study we shall discuss methods which employ such principles if they have been proposed for use in diagnostic systems: the Dempster-Shafer rule of combination and the methods applied in the expert system MYCIN.

Probably the main problem which arises from the construction of diagnostic systems is the combination of information stemming from different sources. We shall concentrate on the discussion of this problem, which has attracted much attention in recent literature on Artificial Intelligence [see e.g.
KANAL and LEMMER, 1986; PEARL, 1988]. Since it is our concern to promote the use of probability theory for handling uncertainty in artificial intelligence, we shall investigate situations which may be described as follows: A number of sources of information are given, for instance the results of different parts of a compound medical test or the behaviour of different alarm units controlling the state of a power plant. For each of the sources of information a probability statement about a problem under consideration can be made, for instance concerning the state of health of the person tested or concerning the momentary state of the power plant. How can these probability statements stemming from different sources of information be combined into an overall probability statement? Since the circumstances suggest in many cases that we should consider the sources of information as if they were in a temporal sequence, the expression "updating of probability" may be used to describe this problem.

To avoid confusion we shall carefully explain the difference between the problem we are concerned with and another problem, which is sometimes called "combining of probability distributions". The latter problem deals with subjective probability distributions stemming from different persons, and the aim is to find a single probability distribution which may be defined as being attributed to the group of persons as a whole. It assumes that all persons of the group possess the same stock of information. Deviations between their probability statements can then be ascribed solely to their personal attitudes. An abundance of literature concerning this problem is available, going back as far as Abraham Wald [GENEST, ZIDEK, 1986; LEHRER, WAGNER, 1981]. If the main difference between the two problems is kept in mind, i.e. different sources of information or one common stock of information, it should always be possible to distinguish between them, even if one has to combine probability statements stemming from different experts in a diagnostic system. To make the distinction as clear as possible we shall never use expressions like "expert view" in our study. In this way we also wish to demonstrate that we are not concerned about the origin of the probability statements used in diagnostic systems. Whether these statements are created through theoretical considerations, through evaluations of empirical results or through personal views of experts does not influence the way we use them.

We therefore assume that probability estimates are given under well defined conditions, regardless of whether these are point estimates or interval estimates. Certainly this must be seen as a realistic assumption, because it allows for situations in which little information is available and which therefore produce very wide probability intervals. In the case that absolutely no knowledge at all is given, if this ever occurs, this has to be described by an interval reaching from zero to one. It should be stated explicitly that we refrain from using fuzzy sets to define probability estimates. We believe that the use of interval estimates produces a degree of freedom large enough to distinguish between situations which may be relevant for the use in diagnostic systems. The combination of the theory of fuzzy sets with the methods proposed here would inevitably lead to further complications of these methods and consequently result in an impediment to their application.
As already mentioned, the results described in this study are obtained by elementary probability theory. From the standpoint of this theory they provide no new insights. The methodology recommended for use in diagnostic systems depends upon the features of the sources of information involved. We propose rules for combining two sources of information in the simple case of two states of nature and two symptoms distinguished for each source of information, if all relevant probabilities are given as numbers, but no assumption is made about mutual independence of the sources of information (Chapter 4.1). In the case of mutual independence of the sources, a concept which is discussed already in Chapter 3.5, we provide basic results for two sources of information in Chapter 4.2 and generalize them in Chapter 5.1, so that no restrictions concerning the number of sources of information or the number of states of nature or the number of symptoms distinguished remain. In Chapter 5.2 we shall give an answer to the question of changing prior distributions, which was mentioned before when we referred to the disease whose frequency had been increased. All this is done under the assumption that all probabilities are estimated by real numbers.
Interval estimates of probabilities are described in a general manner in Chapter 2, while theories related to this subject are reported in Chapters 3.1 to 3.3. The problem of combining information stemming from mutually independent sources in the case that probabilities are estimated by intervals is treated in Chapter 6. In that chapter we confine ourselves to the case that only two states of nature are distinguished and only two symptoms can be observed for each source of information. The resulting recommendations for two sources of information are described in Chapter 6.3 and those for more than two sources of information in Chapter 6.4. Their behaviour is demonstrated in Chapter 7. The cases of more than two states of nature or more than two possible symptoms afford additional methodological considerations which will have to be postponed for further studies. We hope to be able to include results referring to this problem in a second edition of this book.

An important aspect of our investigation is the discussion of alternative combination rules. In Chapter 3.4 the Dempster-Shafer rule is described, which is often recommended in the literature. Apart from theoretical considerations, which bring to light a lack of justification for this rule, we demonstrate its behaviour in problems of practical relevance. The main result of this part of our study is shown in Example (3.11): The Dempster-Shafer rule can produce misleading results. Reasonable statements concerning the same problem are derived through probability theory in Chapter 4 and in Chapter 5. Compare Example (4.3) and Example (5.1)! These comparisons exclude the Dempster-Shafer combination rule from the stock of methods which can be recommended. The expert system MYCIN introduces a technique relying on the construction of certainty factors. Its background and its behaviour are investigated in Chapter 3.5, where it turns out that it is also not suitable as a basis for diagnostic systems.

We do not discuss all relevant recent literature on the subject of handling uncertainty in expert systems which deserves careful consideration, primarily the book by Judea PEARL [1988], but we intend to do so in a later edition of our study. Since in Pearl's book a comprehensive bibliography can be found, we refrain from including a bibliography in our contribution.
CHAPTER 2
Interval Estimation of Probabilities

Many methodological considerations about diagnostic systems start with the assumption that probability estimates are given by intervals and not by real numbers. Therefore it seems worthwhile to discuss the formal aspects of such a situation. In this chapter an approach is presented which promises to qualify for use in diagnostic systems. Let us start with some definitions.

Definition (2.1): Let 𝔈 = {E_1, ..., E_k}. The unknown probability distribution P(E_1), ..., P(E_k), with Σ_{i=1}^k P(E_i) = 1, has to be estimated. A set of intervals [L_i; U_i], i = 1, ..., k, with 0 ≤ L_i ≤ U_i ≤ 1, ...

Since P(E_i) ≥ L_i for all i ≠ j, it follows:

    U_j + Σ_{i≠j} L_i ≤ 1.

Analogous for b). □
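The conditions just used, a) U_j + Σ_{i≠j} L_i ≤ 1 and b) L_j + Σ_{i≠j} U_i ≥ 1 for every j, can be checked mechanically. The minimal Python sketch below is our own (the function names are not the book's); for the "reasonable" test we assume the usual requirement Σ_i L_i ≤ 1 ≤ Σ_i U_i, since the corresponding definition is garbled in this copy.

```python
def is_reasonable(L, U):
    # Our reading of the "reasonable" requirement for a k-PRI:
    # the lower limits together must not exceed one, and the upper
    # limits together must be able to reach one.
    return sum(L) <= 1 <= sum(U)

def is_feasible(L, U):
    # Conditions a) and b) of Theorem (2.2):
    #   a) U_j + sum_{i != j} L_i <= 1   for every j
    #   b) L_j + sum_{i != j} U_i >= 1   for every j
    k = len(L)
    for j in range(k):
        if U[j] + sum(L[i] for i in range(k) if i != j) > 1:
            return False
        if L[j] + sum(U[i] for i in range(k) if i != j) < 1:
            return False
    return True
```

For the 3-PRI used later in Example (2.7), 0.1 ≤ P(E_1) ≤ 0.4, 0.2 ≤ P(E_2) ≤ 0.5, 0.3 ≤ P(E_3) ≤ 0.6, both checks succeed.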
In Example (2.2) the violation of the conditions described in Theorem (2.2) under a) for j = 2 and j = 3 can be recognized, and it is also evident that the estimate would be feasible if all the upper limits were 0.4. The question now arises whether it is possible to derive a feasible estimate from a reasonable one which is not feasible. In other words: We are searching for an algorithm determining which part of the intervals can be eliminated if a reasonable but not feasible estimate has to be converted into a feasible one. We first need a definition which describes this programme:
Definition (2.5): A k-PRI (L', U') is called derivable from a reasonable k-PRI (L, U), if

    S*(L', U') ⊇ S*(L, U)        (2.5a)

    L_i ≤ L_i' ≤ U_i' ≤ U_i  for i = 1, ..., k        (2.5b)
The proof of Theorem (2.3) makes use of the following two lemmata:

Lemma 1: If Σ_{i≠j*} L_i + U_{j*} ≥ 1, then

    Σ_{i≠j} U_i + L_j ≥ 1    for each j ≠ j*.

Lemma 2: If Σ_{i≠j*} U_i + L_{j*} ≤ 1, then

    Σ_{i≠j} L_i + U_j ≤ 1    for each j ≠ j*.

Proof of Lemma 1: If Σ_{i≠j*} L_i + U_{j*} ≥ 1, then for each j ≠ j*:

    Σ_{i≠j} U_i + L_j = Σ_{i≠j,j*} U_i + U_{j*} + L_j ≥ Σ_{i≠j,j*} L_i + U_{j*} + L_j = Σ_{i≠j*} L_i + U_{j*} ≥ 1.

Proof of Lemma 2: analogous to the proof of Lemma 1.

Conclusions from these lemmata: For a reasonable k-PRI which is not feasible, there are only three possibilities:
α) only a)-conditions, as defined in Theorem (2.2), are violated but no b)-condition;
β) only b)-conditions are violated but no a)-condition;
γ) there exists only one j* for which both conditions are violated.

Proof of Theorem (2.3): a) To show: (L', U') is derivable from (L, U). Obviously (2.5b) holds. Let {P(E_1), ..., P(E_k)} ∈ S*(L, U); then L_j ≤ ...

Case α, only a)-conditions are violated: ... ≥ 1 (otherwise it would be case γ). Consequently

    Σ_{i≠j*} U_i' + L_{j*}' = Σ_{i≠j*} U_i + L_{j*} ≥ 1.

For all j ≠ j*:
    Σ_{i≠j} U_i' + L_j' = Σ_{i≠j,j*} U_i' + U_{j*}' + L_j = Σ_{i≠j*} U_i - U_j + (1 - Σ_{i≠j*} L_i) + L_j =
        = 1 + Σ_{i≠j*} (U_i - L_i) - (U_j - L_j) ≥ 1.

This means that all b)-conditions are satisfied if |J| = 1.

If |J| ≥ 2: let j_1, j_2 ∈ J, j_1 ≠ j_2; then for all j ≠ j_1:

    Σ_{i≠j} U_i' + L_j' = Σ_{i≠j,j_1} U_i' + U_{j_1}' + L_j = Σ_{i≠j,j_1} U_i' + (1 - Σ_{i≠j_1} L_i) + L_j =
        = 1 + Σ_{i≠j_1} (U_i' - L_i) - (U_j' - L_j) ≥ 1,

since U_i' ≥ L_i for all i, and analogously

    Σ_{i≠j} U_i' + L_j' = 1 + Σ_{i≠j_2} (U_i' - L_i) - (U_j' - L_j) ≥ 1    for all j ≠ j_2.

Since every j differs from j_1 or from j_2, for |J| ≥ 2 all b)-conditions are satisfied.
Case β, only b)-conditions are violated: analogous to case α.
Case γ, there exists only one j* for which both conditions are violated:

    U_{j*}' = 1 - Σ_{i≠j*} L_i ,    U_i' = U_i  for i ≠ j*
    L_{j*}' = 1 - Σ_{i≠j*} U_i ,    L_i' = L_i  for i ≠ j*

Therefore

    Σ_{i≠j*} U_i' + L_{j*}' = 1    and    Σ_{i≠j*} L_i' + U_{j*}' = 1.
This means conditions a) and b) are fulfilled for j = j*. That for j ≠ j* conditions a) and b) are not violated for (L', U') can be shown in the same way as in case α.

c) Uniqueness: Let (L'', U'') be derivable from (L, U) and feasible. We want to show: (L'', U'') = (L', U'). From Equation (2.6) it follows: S*(L, U) = S*(L', U') = S*(L'', U''). As both k-PRIs are feasible we can conclude:

    L_j' = L_j'' = min_{S*(L,U)} P(E_j) ,    U_j' = U_j'' = max_{S*(L,U)} P(E_j). □
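The corrections used in cases α to γ can be collected into one formula; the closed form below is our reading of Equation (2.6), which is garbled in this copy: each limit is tightened against the opposite limits of the other components.

```python
def derive_feasible(L, U):
    # Tighten a reasonable k-PRI into the uniquely derivable feasible
    # k-PRI.  The closed form
    #     L_j' = max( L_j , 1 - sum_{i != j} U_i )
    #     U_j' = min( U_j , 1 - sum_{i != j} L_i )
    # is our reading of Equation (2.6), collecting the corrections used
    # in cases alpha to gamma of the proof above.
    k = len(L)
    L2 = [max(L[j], 1 - sum(U[i] for i in range(k) if i != j)) for j in range(k)]
    U2 = [min(U[j], 1 - sum(L[i] for i in range(k) if i != j)) for j in range(k)]
    return L2, U2
```

For instance, the reasonable but not feasible 3-PRI with all L_i = 0.1 and all U_i = 0.9 (our own numbers) is tightened to U_i' = 0.8, while an already feasible k-PRI is left unchanged.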
The following example shows how a feasible interval estimation can be derived when a reasonable one is given. Example (2.3):
It is worthwhile to explain how the six corners originate: Each upper limit U_i cuts the corresponding apex of the triangle and creates two new corners of S*_1, because there are two borders which meet at the apex of the original triangle. The number of the corners of S*_1 is the product of three (cuttings of an apex) times two (corners created by each cutting). If the upper limits U_i were smaller than in Example (2.5), larger parts of the apices would be cut, but the structure S* remains a hexagon.

Example (2.6): Only if each U_i = 0.5 is the hexagon reduced to a triangle (s_21, s_22, s_23), as can be seen in Figure 2. In this case the structure of the 3-PRI, denoted by S*_2, is described by
    S*_2 = { λ_1 (0.5, 0.5, 0.0)ᵀ + λ_2 (0.5, 0.0, 0.5)ᵀ + λ_3 (0.0, 0.5, 0.5)ᵀ : λ_i ≥ 0, Σ_{i=1}^3 λ_i = 1 }
It is evident that by choosing smaller values for U_i the number of the corners of S* either remains unaltered or may be reduced, but the number will never increase. In order to proceed to the general case of a feasible 3-PRI we shall make use of Example (2.7): Let a 3-PRI be:
0.1 ≤ P(E_1) ≤ 0.4
0.2 ≤ P(E_2) ≤ 0.5
0.3 ≤ P(E_3) ≤ 0.6
(It is easy to check that this is a feasible 3-PRI.) If we again use triangular coordinates, S* will also in this case be represented by a hexagon, as is seen in Figure 3. It is easy to recognize the way in which the six corners of S* can be calculated: Each of them corresponds to a probability distribution consisting of one probability equal to its lower limit and another one equal to the respective upper limit, provided that the third probability does not exceed its limits. In this example we arrive at the following values:
s_1: P(E_1) = L_1 = 0.1 ; P(E_2) = U_2 = 0.5 ⟹ P(E_3) = 0.4
s_2: P(E_1) = L_1 = 0.1 ; P(E_3) = U_3 = 0.6 ⟹ P(E_2) = 0.3
s_3: P(E_2) = L_2 = 0.2 ; P(E_3) = U_3 = 0.6 ⟹ P(E_1) = 0.2
s_4: P(E_1) = U_1 = 0.4 ; P(E_2) = L_2 = 0.2 ⟹ P(E_3) = 0.4
s_5: P(E_1) = U_1 = 0.4 ; P(E_3) = L_3 = 0.3 ⟹ P(E_2) = 0.3
s_6: P(E_2) = U_2 = 0.5 ; P(E_3) = L_3 = 0.3 ⟹ P(E_1) = 0.2
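The corner search just carried out by hand can be done mechanically. The following Python fragment is our own sketch, not the book's: it enumerates exactly the points where one probability sits at its lower limit, a second at its upper limit, and the remainder stays inside its own interval.

```python
from itertools import permutations

def hexagon_corners(L, U, eps=1e-12):
    # Enumerate the corners of the structure S* of a feasible 3-PRI:
    # one probability at its lower limit, a second at its upper limit,
    # the third equal to the remainder -- kept only if that remainder
    # stays inside its own interval (the s-points, not the t-points).
    corners = set()
    for i, j, r in permutations(range(3)):
        rest = 1 - L[i] - U[j]
        if L[r] - eps <= rest <= U[r] + eps:
            point = [0.0, 0.0, 0.0]
            point[i], point[j], point[r] = L[i], U[j], rest
            corners.add(tuple(round(x, 10) for x in point))
    return sorted(corners)
```

Applied to the 3-PRI of Example (2.7) it returns exactly the six points s_1 to s_6 listed above.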
Figure 3: The representation of S* in the case of Example (2.7).
One should note that in all other cases, in which two probabilities are equal to one of their respective limits, the third probability does not lie between its limits. In Figure 3 we labelled these points t_1, t_2, t_3, t_4, t_5, t_6.

t_1: P(E_2) = U_2 = 0.5 ; P(E_3) = U_3 = 0.6 ⟹ P(E_1) = -0.1
t_2: P(E_1) = L_1 = 0.1 ; P(E_2) = L_2 = 0.2 ⟹ P(E_3) = 0.7
t_3: P(E_1) = U_1 = 0.4 ; P(E_3) = U_3 = 0.6 ⟹ P(E_2) = 0.0
t_4: P(E_2) = L_2 = 0.2 ; P(E_3) = L_3 = 0.3 ⟹ P(E_1) = 0.5
t_5: P(E_1) = U_1 = 0.4 ; P(E_2) = U_2 = 0.5 ⟹ P(E_3) = 0.1
t_6: P(E_1) = L_1 = 0.1 ; P(E_3) = L_3 = 0.3 ⟹ P(E_2) = 0.6
We may therefore describe S* of this example as follows:

    S* = { λ_1 (0.1, 0.5, 0.4)ᵀ + λ_2 (0.1, 0.3, 0.6)ᵀ + λ_3 (0.2, 0.2, 0.6)ᵀ + λ_4 (0.4, 0.2, 0.4)ᵀ
           + λ_5 (0.4, 0.3, 0.3)ᵀ + λ_6 (0.2, 0.5, 0.3)ᵀ : λ_i ≥ 0, Σ_i λ_i = 1 }
and it is obvious that S* cannot be represented mathematically in a simpler way. This is, however, not true for every feasible 3-PRI. To describe results concerning the structure S* we introduce a new term.

Definition (2.6): A feasible k-PRI is called degenerate, if there exists a value Q_i with Q_i = L_i or Q_i = U_i for each i = 1, ..., k, so that Σ_{i=1}^k Q_i = 1. □

It is easy to see that a feasible 3-PRI can be degenerate only if either Σ_{i=1}^3 L_i + (U_j - L_j) = 1 or Σ_{i=1}^3 U_i + (L_j - U_j) = 1 for an index j ∈ {1, 2, 3}.
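Definition (2.6) can be tested directly by trying all 2^k choices of Q_i ∈ {L_i, U_i}; the brute-force sketch below is ours and is, of course, only suitable for small k.

```python
from itertools import product

def is_degenerate(L, U, eps=1e-12):
    # Definition (2.6): a feasible k-PRI is degenerate if one can pick
    # Q_i = L_i or Q_i = U_i for every i so that the Q_i sum to one.
    # product(*zip(L, U)) runs through all 2^k such choices.
    return any(abs(sum(q) - 1) <= eps for q in product(*zip(L, U)))
```

The 3-PRI of Example (2.7) is not degenerate; raising U_3 from 0.6 to 0.7 (our own modification, for illustration) makes it degenerate, since 0.1 + 0.2 + 0.7 = 1.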
We can now formulate:

Theorem (2.4): The structure S* of a 3-PRI is represented by a hexagon, if the 3-PRI is not degenerate.

Proof of Theorem (2.4): We first define the six corners of S* as in Example (2.7):

s_1: P(E_1) = L_1 ; P(E_2) = U_2 ; P(E_3) = 1 - L_1 - U_2
s_2: P(E_1) = L_1 ; P(E_2) = 1 - L_1 - U_3 ; P(E_3) = U_3
s_3: P(E_1) = 1 - L_2 - U_3 ; P(E_2) = L_2 ; P(E_3) = U_3
s_4: P(E_1) = U_1 ; P(E_2) = L_2 ; P(E_3) = 1 - U_1 - L_2
s_5: P(E_1) = U_1 ; P(E_2) = 1 - U_1 - L_3 ; P(E_3) = L_3
s_6: P(E_1) = 1 - U_2 - L_3 ; P(E_2) = U_2 ; P(E_3) = L_3

Due to the condition of feasibility none of these probabilities exceeds its respective limits, for instance: L_3 ≤ 1 - L_1 - U_2 ≤ U_3. We now suppose that the 3-PRI is not degenerate. We want to show that in this case the representation of S* has exactly six corners, namely the corners s_1, s_2, s_3, s_4, s_5 and s_6. Therefore firstly it has
to be shown that there are no additional corners. The possible candidates are those points where for two probabilities either the upper limits or the lower limits are reached, e.g.:

t_1: P(E_1) = 1 - U_2 - U_3 ; P(E_2) = U_2 ; P(E_3) = U_3
t_6: P(E_1) = L_1 ; P(E_2) = 1 - L_1 - L_3 ; P(E_3) = L_3

As the 3-PRI is not degenerate, we obtain for the point t_1:

    P(E_1) = 1 - U_2 - U_3 < L_1    because of    U_1 + U_2 + U_3 - (U_1 - L_1) > 1

and for the point t_6:

    P(E_2) = 1 - L_1 - L_3 > U_2    because of    L_1 + L_2 + L_3 - (L_2 - U_2) < 1

The other four points are excluded in the same way. Secondly it must be shown that the corners s_1 to s_6 are indeed six different points. If for two points s_j and s_j' all three respective probabilities are to be alike, then two equations must hold, e.g. if s_1 = s_2: U_2 = 1 - L_1 - U_3 and 1 - L_1 - U_2 = U_3. As this is possible only if the 3-PRI is degenerate, it can be excluded due to the underlying assumption in Theorem (2.4). □

The case of a degenerate 3-PRI is demonstrated through Example (2.8), which modifies the estimates of Example (2.7) in only one place.

Example (2.8): A 3-PRI be:
0.1 ≤ P(E1) ≤ ...

...

... ≥ P*(B1 ∪ B2) + P*(B1 ∩ B2)
c) Σ_{I1} Ui > 1 - Σ_{¬I1} Li   and   Σ_{I2} Ui < 1 - Σ_{¬I2} Li

Then:

P*(B1) + P*(B2) = 1 - Σ_{¬I1} Li + Σ_{I2} Ui ≥

≥ 1 - Σ_{¬I1} Li + Σ_{I2} Ui - Σ_{(¬I1)∩I2} (Ui - Li) =

= 1 - Σ_{(¬I1)∩(¬I2)} Li + Σ_{I1∩I2} Ui ≥ P*(B1 ∪ B2) + P*(B1 ∩ B2)

because of inequality 3) as well.
d) Σ_{I1} Ui > 1 - Σ_{¬I1} Li   and   Σ_{I2} Ui > 1 - Σ_{¬I2} Li

Then:

P*(B1) + P*(B2) = 1 - Σ_{¬I1} Li + 1 - Σ_{¬I2} Li =

= 2 - Σ_{(¬I1)∪(¬I2)} Li - Σ_{(¬I1)∩(¬I2)} Li ≥ P*(B1 ∪ B2) + P*(B1 ∩ B2)

because of inequality 4). Therefore (3.5a) holds in each of the four cases.

Example (3.1): We refer to Example (2.9), where the following estimates are given:

0.00 ≤ P(E1) ≤ 0.10
0.10 ≤ ...

...

CF = CF1 + CF2 - CF1·CF2                       for CF1, CF2 > 0

CF = CF1 + CF2 + CF1·CF2                       for CF1, CF2 < 0        (3.28)

CF = (CF1 + CF2) / (1 - Min[|CF1|, |CF2|])     otherwise
Here CF1 and CF2 are the certainty factors which come from two different observations Z1 and Z2; CF is the certainty factor which arises from the combined observation of Z1 and Z2. It is understood that MBs, MDs and CFs are always calculated with respect to the original prior probability. The new combination formula (3.28) has obvious advantages: it secures - as Shortliffe and Buchanan stress - commutativity. Therefore it is not necessary to store all MBs and MDs. It is sufficient to store the last CF because another sequence of the same observations would not change the result. [SHORTLIFFE, BUCHANAN, 1985, p.216]. The authors consider Formula (3.28) more plausible than the original formula. The main objection remains: It is not possible to judge the merits of any combination rule which does not take into account a measure of association or correlation between the observations concerned. In the rules MYCIN uses for the final decisions, the certainty factors of competing hypotheses are compared and therapy is based on hypotheses with high CF-values. This is described in the following way [SHORTLIFFE, BUCHANAN, 1985, pp.261/262]: "We have shown that the numbers thus calculated are approximations at best. Hence it is not justifiable simply to accept as correct the hypothesis with the highest CF after all relevant rules have been tried. Therapy is therefore chosen to cover for all identities of organisms that account for a sufficiently high proportion of the possible hypotheses on the basis of their CF's. This is accomplished by ordering them from highest to lowest and selecting all those on the list until the sum of their CF's exceeds z (where z is equal to 0.9 times the sum of the CF's for all confirmed hypotheses). This ad hoc technique therefore uses a semiquantitative approach in order to attain a comparative goal." (Italics in the original text.)
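The quoted selection procedure - order the confirmed hypotheses by CF and accumulate until the sum exceeds z = 0.9 times the total CF of all confirmed hypotheses - can be sketched as follows; the hypothesis names and the function name are illustrative only:

```python
def select_for_therapy(cf, z_factor=0.9):
    # cf: dict mapping hypothesis name -> certainty factor.
    # Only confirmed hypotheses (CF > 0) take part in the selection.
    confirmed = {h: v for h, v in cf.items() if v > 0}
    z = z_factor * sum(confirmed.values())
    selected, running = [], 0.0
    for h, v in sorted(confirmed.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(h)
        running += v
        if running > z:
            break
    return selected

# Illustrative CF values; B1 is ignored entirely although, as the following
# example shows, it may still be the most probable hypothesis.
print(select_for_therapy({"B1": 0.0, "B2": -0.967, "B3": 0.305}))  # ['B3']
```

The sketch makes the criticized behaviour visible: a hypothesis with CF = 0 never enters the therapy list, regardless of its probability.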
This method can lead to very dangerous results if prior probabilities for different hypotheses are unequal. This was shown by various authors and is demonstrated in the following example.

Example (3.14): Let the prior probabilities of three possible hypotheses B1, B2 and B3 be

P(B1) = 0.65     P(B2) = 0.30     P(B3) = 0.05

A certain observation Z more or less eliminates the possibility of B2 and transfers almost all of its probability to the hypothesis B3:

P(B1|Z) = 0.65   P(B2|Z) = 0.01   P(B3|Z) = 0.34

It follows:

MB(B1,Z) = 0                               MD(B1,Z) = 0
MB(B2,Z) = 0                               MD(B2,Z) = (0.30 - 0.01)/0.30 = 0.967
MB(B3,Z) = (0.34 - 0.05)/(1 - 0.05) = 0.305    MD(B3,Z) = 0

Therefore the certainty factors are (in order of magnitude)

CF(B3,Z) = +0.305
CF(B1,Z) = 0
CF(B2,Z) = -0.967

The rule recommended by MYCIN bases its therapy on the hypothesis B3, which has 100 % of the sum of CFs for all confirmed hypotheses. Hypothesis B1 is not taken into account because its certainty factor is zero. But hypothesis B1 still has a probability of 65 %.
[]
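The MB and MD values of Example (3.14) follow the usual MYCIN definitions - MB = (P(h|Z) - P(h))/(1 - P(h)) if the probability rises, MD = (P(h) - P(h|Z))/P(h) if it falls, and CF = MB - MD. A sketch reproducing the numbers above (function names ours):

```python
def mb(prior, posterior):
    # Measure of increased belief: positive only if the probability went up.
    return max(0.0, (posterior - prior) / (1.0 - prior))

def md(prior, posterior):
    # Measure of increased disbelief: positive only if the probability went down.
    return max(0.0, (prior - posterior) / prior)

def cf(prior, posterior):
    # With these definitions at most one of MB, MD is non-zero, so CF = MB - MD.
    return mb(prior, posterior) - md(prior, posterior)

priors     = {"B1": 0.65, "B2": 0.30, "B3": 0.05}
posteriors = {"B1": 0.65, "B2": 0.01, "B3": 0.34}
for h in priors:
    print(h, round(cf(priors[h], posteriors[h]), 3))
```

Running this yields CF(B1) = 0, CF(B2) ≈ -0.967, CF(B3) ≈ +0.305, so the therapy rule singles out B3 even though B1 keeps a probability of 65 %.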
This is far from an extreme example. A rule which may yield results like the one described in this example is liable to spoil valuable information and to be misleading in decision making and therefore should not be recommended. To summarize: The method of certainty factors can be described as an attempt to develop a special language for the communication between experts and an expert system. This attempt proves to be a failure as it does not meet the basic requirement of introducing a new language: an exact description of the concepts used. Only when both communication partners have agreed, at least to a certain extent, on these concepts, is communication possible. But in the case of MYCIN one partner - the expert system - develops ideas about what the many other partners have to bear in mind when they make a certain statement. It is very informative that the authors write: "Suppes pressed us early on to state whether we were trying to model how expert physicians do think or how they
ought to think. We argued that we were doing neither. Although we were of course influenced
by information regarding the relevant cognitive processes of experts ...... , our goals were oriented much more toward the development of the high-performance computer program. Thus we sought to show that the CF model allowed MYCIN to reach good decisions comparable to those of experts and intelligible both to experts and to the intended user community of practicing physicians." [SHORTLIFFE, BUCHANAN, 1985, p.211, Italics in the original text]. Obviously the system MYCIN does not provide a suitable basis for a new concept of credibility for use in diagnostic systems. On the other hand, if probability is used for this purpose in the traditional way, the majority of users are familiar with the employed concept. Others can be given short informative courses if they need some elements of probability either to express their empirical knowledge or to use a diagnostic system properly. Therefore we strongly recommend the application of probability theory in its original version in diagnostic systems. What can be achieved in this way will be demonstrated in the following chapters.
CHAPTER 4

The Simplest Case of a Diagnostic System

4.1 A Solution Without Further Assumptions

In Chapter 3.4 we discussed the example of the power station (Example (3.11)) which has different alarm units. In order to construct a probabilistic model we shall assume that each alarm unit consists of a stochastic control mechanism. At certain time intervals, for instance every hour, all control mechanisms are read. We shall assume that there is a certain probability that such a reading produces an alarm signal and call these probabilities pj, with j = 1,...,l, if there are l mechanisms or - as we shall say generally in future - l units Aj, j = 1,...,l. For simplicity we shall take l = 2 as the simplest case of a diagnostic system and obtain the following table:

            Z1              ¬Z1
  Z2        p               p2 - p              p2
  ¬Z2       p1 - p          1 - p1 - p2 + p     1 - p2
            p1              1 - p1              1
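The table just given can be checked numerically: all four cells are non-negative exactly when p stays within the limits Max(0, p1+p2-1) ≤ p ≤ Min(p1, p2). A small Python sketch with illustrative values (p1, p2 as in the power-station example, p chosen here as the value for global independence p1·p2):

```python
p1, p2 = 0.02, 0.02   # alarm probabilities of the two units
p = 0.0004            # P(Z1 and Z2); here p1*p2, i.e. global independence

# Frechet limits for p quoted in the text
assert max(0.0, p1 + p2 - 1.0) <= p <= min(p1, p2)

# Joint table of (Z1, Z2): all four cells must be non-negative and sum to 1
table = {
    ("Z1", "Z2"):   p,
    ("Z1", "~Z2"):  p1 - p,
    ("~Z1", "Z2"):  p2 - p,
    ("~Z1", "~Z2"): 1.0 - p1 - p2 + p,
}
assert all(v >= 0.0 for v in table.values())
print(round(sum(table.values()), 10))  # 1.0
```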
In the event that unit A1 causes an alarm we call this Z1 and similarly use Z2 when the alarm is caused by unit A2. In the event that unit Aj does not cause an alarm we call this ¬Zj (j = 1,2). To complete the description of the system we need in addition the probability that both units cause an alarm at the same time. We call this probability p. It characterizes the kind of dependence between the two alarm units. Obviously the following limits exist for p:

Max(0; p1 + p2 - 1) ≤ p ≤ Min(p1, p2)

If p reaches its maximal possible value, the unit with the lower probability of alarm can never give alarm without the other one doing so at the same time. In the special case that p1 = p2, either both or neither unit gives alarm. Obviously in this case one of the two units is unnecessary. Without loss of generality we assume that p1 ≤ p2. ... A value κF > 1 describes positive association between Z1 and Z2 with respect to EF: The probability that both events occur simultaneously is greater than it would be in the case of independence. Using Definition (4.1) we are able to derive the following theorem.
Theorem (4.2): If κF and κN describe the dependence of Z1 and Z2 with respect to EF and EN, then:

x++ = (κF·ω1ω2/ω) / (κF·ω1ω2/ω + κN·(1-ω1)(1-ω2)/(1-ω))        (4.5)

□

Proof of Theorem (4.2): With the notation used above we obtain:

P(Zj|EF) = pjωj/ω   and   P(Zj|EN) = pj(1-ωj)/(1-ω) ,   j = 1,2.

Therefore:

p·x++ = κF·p1p2·ω1ω2/ω        (*)

and

p·(1-x++) = κN·p1p2·(1-ω1)(1-ω2)/(1-ω)        (**)

By summation:

p = p1p2·[κF·ω1ω2/ω + κN·(1-ω1)(1-ω2)/(1-ω)]

Dividing (*) by (**) we get Formula (4.5). □
Due to the assumed knowledge of the kind of dependence between Z1 and Z2 under EF and EN we derive x++ as a number, not as an interval. Therefore we may conclude that the interval estimation of x++ in the preceding chapter is only due to the fact that no assumption was made about the conditional dependence or independence of Z1 and Z2 with relation to the two states of nature. It should be noted that (4.5) is a simple result of probability theory and does not contain any concepts which are not part of classical probability theory. Of course it does not use any ad hoc proposals for combining evidence. In fact results like (4.5) can be found in textbooks of probability theory. Any combination rule which is not in agreement with Formula (4.5) therefore contradicts elementary probability theory and cannot be justified by any argument about experimental verification. The most frequently used assumption about the conditional dependencies is the assumption of double independence of Z1 and Z2 in EF as well as in EN. This means κF = 1 and κN = 1. In this case Formula (4.5) is simplified to

x++ = (ω1ω2/ω) / (ω1ω2/ω + (1-ω1)(1-ω2)/(1-ω))        (4.6a)
This result is even more popular than (4.5), because the assumption of double independence is widely used. (4.6a) occurs when probabilities of a hypothesis are updated, provided that a total probability of the hypothesis is considered available - a situation which is often described as "Bayesian updating". The methods used in the expert system PROSPECTOR are of this kind and therefore may be regarded as almost identical [DUDA, HART, NILSSON, 1986]. The only relevant difference between the two treatments is that PROSPECTOR uses likelihoods of the type P(Zj|EF) in its formula, while in (4.6a) probabilities of the type P(EF|Zj) = ωj are used. This changes the appearance of the results to some extent. The use of odds instead of probabilities by PROSPECTOR is purely superficial. It is possible to obtain, analogous to Theorem (4.2), the other probabilities of EF, for instance x+-, the probability of a failure in case of Z1 and ¬Z2. In this case we have to use corresponding values of κF and κN. For double independence all those values of κ take the value 1. Therefore formulas for x+-, x-+ and x-- can be derived in the same way as Formula (4.6a) for x++:

x+- = (ω1ω̄2/ω) / (ω1ω̄2/ω + (1-ω1)(1-ω̄2)/(1-ω))        (4.6b)

x-+ = (ω̄1ω2/ω) / (ω̄1ω2/ω + (1-ω̄1)(1-ω2)/(1-ω))        (4.6c)

x-- = (ω̄1ω̄2/ω) / (ω̄1ω̄2/ω + (1-ω̄1)(1-ω̄2)/(1-ω))        (4.6d)

Here ω̄j denotes P(EF|¬Zj).
Example (4.3): With the parameters used in Example (4.1) we achieve under the assumption of double independence the following four probabilities of a failure of the power plant:

x++ = (0.0001/0.000298) / (0.0001/0.000298 + 0.9801/0.999702) = 0.255

x+- = x-+ = (0.000001/0.000298) / (0.000001/0.000298 + 0.989901/0.999702) = 0.003377

x-- = (1·10^-8/0.000298) / (1·10^-8/0.000298 + 0.999800/0.999702) = 0.000034

These are results in accordance with the purpose of an alarm system: If there are two units and both go off, the probability of a failure is much greater than in the case that there is only one unit and this goes off. One should remember the misleading result gained by the Dempster-Shafer combination rule in Example (3.11)! The corresponding p-value derived from equation (**) in Theorem (4.2) is p = 0.000526 and is remarkably different from the value p = 0.0004, which is a consequence of global independence.
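The computations of Example (4.3) follow directly from (4.6) together with the prior ω from (4.1). A Python sketch with the parameter values of the example (function name ours):

```python
def posterior(wa, wb, w):
    # Formula (4.6) under double independence:
    # wa, wb are P(EF|observation), w is the prior P(EF).
    num = wa * wb / w
    return num / (num + (1.0 - wa) * (1.0 - wb) / (1.0 - w))

pj = 0.02                    # P(Zj), alarm probability of each unit
wj, wbarj = 0.01, 0.0001     # P(EF|Zj) and P(EF|~Zj)
w = pj * wj + (1.0 - pj) * wbarj   # prior P(EF) by (4.1): 0.000298

x_pp = posterior(wj, wj, w)          # both units alarm
x_pm = posterior(wj, wbarj, w)       # one alarms, one does not
x_mm = posterior(wbarj, wbarj, w)    # neither alarms
p = pj * pj * (wj * wj / w + (1.0 - wj) ** 2 / (1.0 - w))
print(x_pp, x_pm, x_mm, p)  # approx. 0.255, 0.00338, 0.000034, 0.000526
```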
It may be asked how these results correspond to those of the preceding chapter, where for each p-value intervals for x++, x+-, x-+ and x-- were obtained. If we use the p-value of 0.000526 to calculate those intervals we find, due to Equations (4.4a-d):

0.1939 ≤ x++ ≤ 0.3802
0 ≤ x+-, x-+ ≤ 0.00503
0 ≤ x-- ≤ 0.000102.

Therefore all probabilities derived under the assumption of double independence lie in the corresponding intervals. □

The question, whether this statement is generally true, can be answered by the following theorem.

Theorem (4.3): The probabilities deduced from (4.6) (under the assumption of double independence) lie in the corresponding intervals (4.4). □

Proof of Theorem (4.3): It can be checked that for

p = p1p2·[ω1ω2/ω + (1-ω1)(1-ω2)/(1-ω)]        (***)

the values x++, x+-, x-+, x-- according to (4.6) are solutions of the System (4.2). For instance,

p·x++ + (p1-p)·x+- = p1p2·ω1ω2/ω + p1(1-p2)·ω1ω̄2/ω = (p1ω1/ω)·[p2ω2 + (1-p2)ω̄2] = p1ω1,

due to:

p + p1(1-p2)·[ω1ω̄2/ω + (1-ω1)(1-ω̄2)/(1-ω)] =
= p1·{ (ω1/ω)·[p2ω2 + (1-p2)ω̄2] + ((1-ω1)/(1-ω))·[p2(1-ω2) + (1-p2)(1-ω̄2)] } = p1,

so that p1 - p = p1(1-p2)·[ω1ω̄2/ω + (1-ω1)(1-ω̄2)/(1-ω)].

As the inequalities (4.4) are valid for all solutions of the System (4.2), they must be valid for the solutions x++, x+-, x-+, x--, provided the p-value for double independence is used. □

It becomes obvious that p according to (***) is necessary for double independence, as shown in the proof of Theorem (4.2), but is not sufficient. It should be noted that double independence of Z1 and Z2 in the case of EF and EN comes nearest to what might be understood by "independent sources of information" as used by Dempster. But of
course the two concepts are not directly comparable, in that double independence is defined exactly, whereas the meaning of "independent sources of information" is not totally clear. With respect to what we call similarity of the two concepts it is justifiable to compare the results of Dempster-Shafer's combination rule and (4.6). This is first done by means of Examples (3.11) and (4.3). The application of the Dempster-Shafer combination rule, in this case, gives results which are very different from those stemming from probability theory. If we compare the two formulas which are applied in both rules, the origin of the differences becomes obvious: while in (4.6) the total probability of EF is used as a denominator, it is not used at all in Dempster-Shafer's formula. Therefore the difference in the information used is described exactly by this "prior probability". It is easy to convert the Formulas (4.6) into Dempster-Shafer's formula, if all total probabilities are equal to one another, so that the denominators may be reduced. Therefore the application of the Dempster-Shafer combination rule leads to the same results as the application of (4.6), if the total probabilities are:

P(EF) = P(EN) = 1/2

Such an assumption is of course very far from any estimation of the probability that a power station has a breakdown. This is the reason why Example (3.11) led to such obviously dangerous results, which are not seen in the examples used by Shafer himself. As long as the regarded events Ei are more or less equally probable a priori, the application of Dempster-Shafer's rule comes near to the results of (4.6) and therefore leads to sensible results. It should be noted that the Dempster-Shafer rule applied to x++ does not use values of ω̄j, which influence the prior probability ω by (4.1). Therefore it is possible to imagine values of ω̄j which lead - in combination with any pj - to ω = 1/2. In this case the result of the Dempster-Shafer rule would be correct. For the example of the power station these ω̄j would have to be near to 1, in contrast to the true values of ω̄j which are much smaller than ωj. Now the results of the application of the Dempster-Shafer rule in Example (3.11) become understandable. They are the same as those of the application of (4.6), provided ω̄j is near to 1, indicating that the signals Z1 and Z2 designate a state of low danger while ¬Zj designates high danger. In such a case it would be quite reasonable that Z1∧Z2 would signal an extremely low danger. Therefore the probability of EF in case of Z1∧Z2 would be in accordance with the results of the Dempster-Shafer rule. Of course this explanation is good only for x++. If one applies the Dempster-Shafer rule to x--, it amounts to the application of probability theory with the assumption that ωj is very near to 1. In the cases of x+- and x-+ the assumptions would be that those ωj or ω̄j which are not used in the formula itself are near to 1. It is easily seen that using the knowledge of ω1 and ω̄1 for the power station a value of ω = 1/2 can be excluded, even if the values of p1 and p2 are not known. Therefore in such cases the application of Dempster-Shafer's rule, amounting to such an assumption, is not tolerable at all. The same is true in most medical diagnostic systems, when it is an important aim to detect rare diseases: If the Dempster-Shafer rule is applied, such a detection is impossible.
Concerning the strategy in cases of low information availability, generally it can be said:

a) It is hard to understand that the probabilities of a state Ei in the case of Z1 and Z2 could be known, while the total probability of Ei is not known at all. Therefore in most cases it must be possible to give at least a good estimation of ω, even if p1 and p2 are not known (which is not necessary for the application of (4.6)).

b) If it is absolutely impossible to estimate ω, at least we know that, due to (4.1), ω is a weighted average of ω1 and ω̄1; therefore ω lies between these two limits. And the same is true for ω2 and ω̄2: these are also limits for ω. Using the narrowest of these limits, interval estimates for x++ etc. can be derived from (4.6), even if "no information whatsoever about the prior exists".
Such interval estimates use the following facts:

1) Max_j Min(ωj, ω̄j) ≤ ω ≤ Min_j Max(ωj, ω̄j)        (4.7)

2) x++ as well as x+-, x-+ and x-- are monotonously decreasing functions of ω.
In the following example we demonstrate this kind of interval estimation:

Example (4.4): Using the same values for ωj and ω̄j as in Example (4.3), but assuming that ω remains unknown (and p is not known either) we have:

ω̄j = 0.0001 ≤ ω ≤ 0.01 = ωj

Due to the monotony of the probabilities x++ etc. as functions of ω, the upper limits are found by using ω = 0.0001 and the lower limits by using ω = 0.01:

0.01 ≤ x++ ≤ 0.505
0.0001 ≤ x+-, x-+ ≤ 0.01
0.99·10^-6 ≤ x-- ≤ 0.0001

Although these limits are far ranging, it is evident that the results of the application of Dempster-Shafer's combination rule - as in Example (3.11) - lie far outside these limits in all cases. This demonstrates: even if no information at all exists about the total probability, there is no excuse for the use of Dempster-Shafer's rule, at least in this example, because the information about ωj and ω̄j is sufficient to show that the results of this combination rule are by far unreasonable.
[]
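The interval estimation of Example (4.4) needs only the endpoint evaluations of (4.6), thanks to the monotonicity stated in 2). A sketch for the x++ bounds (helper function ours):

```python
def posterior(wa, wb, w):
    # Formula (4.6) under double independence.
    num = wa * wb / w
    return num / (num + (1.0 - wa) * (1.0 - wb) / (1.0 - w))

wj, wbarj = 0.01, 0.0001   # P(EF|Zj) and P(EF|~Zj) as in Example (4.3)

# By (4.7) the unknown prior satisfies wbarj <= w <= wj; since the x's are
# monotonously decreasing in w, the bounds come from the two endpoints.
lo = posterior(wj, wj, wj)       # lower limit of x++, at w = 0.01
hi = posterior(wj, wj, wbarj)    # upper limit of x++, at w = 0.0001
print(round(lo, 3), round(hi, 3))  # 0.01 0.505
```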
The derivation of x++ in (4.6) resulting from the Equation (4.5) was carried out under the assumption that κF = κN = 1. Of course formulas which are analogous to (4.5) may be derived for x+-, x-+ and x--. Formulas (4.6) are special cases of these generalized formulas if the corresponding values of κF and κN are equal to one. In order to describe the necessary conditions for the validity of (4.6) we designate the coefficients which must be used in formulas analogous to (4.5):

κF+- = P(Z1∧¬Z2|EF) / (P(Z1|EF)·P(¬Z2|EF))

κF-+ = P(¬Z1∧Z2|EF) / (P(¬Z1|EF)·P(Z2|EF))        (4.8)

κF-- = P(¬Z1∧¬Z2|EF) / (P(¬Z1|EF)·P(¬Z2|EF))

κN+-, κN-+, κN-- analogous.

Now the question arises whether the condition that all κF and all κN are equal to 1 is necessary for the validity of Equations (4.6). It is obvious that (4.6a) is also derivable from (4.5) if

κF++ = κN++        (4.9)

Therefore the probability x++ may be described by (4.6a) not only in the case of double independence but also in those cases in which both symptoms are dependent upon each other, but the kind of dependence under EF and EN is equal in the sense of (4.9). One should be reluctant to use this kind of "dependence of equal strength". In the same way as κ++ may be used as a measure of dependence, the values of κ+-, κ-+ and κ-- may also be used. Therefore it would only be justifiable to talk about dependence of equal strength if all four measures were equal in the case of EF and EN:

κF++ = κN++        κF+- = κN+-
                                        (4.10)
κF-+ = κN-+        κF-- = κN--

All these equations are true if each of the eight values κ is equal to one. This means double independence. The possibility that (4.10) is true in other cases is investigated in Theorem (4.4).

Theorem (4.4): (4.10) holds iff either all values κ are equal to 1 or

P(Zj|EF) = P(Zj|EN) ,  j = 1,2.        (4.11)

□

According to this theorem the equality of all corresponding values of κF and κN is only possible in the case of double independence or in the trivial case that the signals have the same probability under EF and EN and therefore the units are not suitable for distinguishing between the two states (or for recognizing a disease).
Proof of Theorem (4.4): Using the following table for EF:

κF++·P(Z1|EF)·P(Z2|EF)        κF+-·P(Z1|EF)·(1-P(Z2|EF))        P(Z1|EF)

κF-+·(1-P(Z1|EF))·P(Z2|EF)    κF--·(1-P(Z1|EF))·(1-P(Z2|EF))    1-P(Z1|EF)
                                                                              (4.12)
P(Z2|EF)                       1-P(Z2|EF)                         1

and an analogous table for EN, we obtain

κF+- = (1 - κF++·P(Z2|EF)) / (1 - P(Z2|EF))        κN+- = (1 - κN++·P(Z2|EN)) / (1 - P(Z2|EN))

κF-+ = (1 - κF++·P(Z1|EF)) / (1 - P(Z1|EF))        κN-+ = (1 - κN++·P(Z1|EN)) / (1 - P(Z1|EN))

κF-- = (1 - P(Z1|EF) - P(Z2|EF) + κF++·P(Z1|EF)·P(Z2|EF)) / ((1-P(Z1|EF))·(1-P(Z2|EF)))

and an analogous result for κN--. By means of these results it can be shown that κF+- = κN+- and κF-- = κN-- together are only possible if κF++ = 1 or (4.11) holds. The analogous statements can be derived with respect to κF-+, κN-+, κF-- and κN--.
[]
We demonstrate these considerations by use of the data to be found in Example (4.1):

Example (4.5): Recalling that for the power station ωj = 0.01, ω̄j = 0.0001, pj = 0.02 (j = 1,2), we find for the conditional probabilities P(Zj|EF):

P(Zj|EF) = P(EF|Zj)·P(Zj) / P(EF) = ωjpj/ω = 0.6711 ,  j = 1,2.

In an analogous way: P(Zj|EN) = 0.0198. These results characterize the two units: If there is a breakdown, each unit gives alarm with a probability of about 2/3; if the power plant is in normal state, the probability of a blind alarm is about 2 % for each unit. The probability of both units giving alarm can be determined, if κF++ and κN++ are chosen. For example we take κF++ = κN++ = 1.3 and obtain

P(Z1∧Z2|EF) = 0.5856
P(Z1∧Z2|EN) = 0.0005.

Table (4.12) can now be completed by subtraction:

EF:   0.5856   0.0856   0.6711
      0.0856   0.2433   0.3289
      0.6711   0.3289   1

EN:   0.0005   0.0193   0.0198
      0.0193   0.9609   0.9802
      0.0198   0.9802   1

From this we obtain:

κF+- = κF-+ = 0.3878 ,   κF-- = 2.2495
κN+- = κN-+ = 0.9939 ,   κN-- = 1.0001

If we now apply (4.5) to x++, we arrive at our previous result:

x++ = 0.255            (because of κF++ = κN++)

while

x+- = x-+ = 0.0013     (in contrast to 0.0034 in the case of double independence)

and

x-- = 0.000075         (in contrast to 0.000034).
[]
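The generalized version of (4.5) used in Example (4.5) simply weights the two terms of (4.6) with the corresponding κF and κN. A sketch reproducing the numbers of the example (κ values as derived above, function name ours):

```python
def posterior_k(wa, wb, w, kF, kN):
    # Generalization of (4.5): the EF term is weighted with kF, the EN term with kN.
    num = kF * wa * wb / w
    return num / (num + kN * (1.0 - wa) * (1.0 - wb) / (1.0 - w))

w = 0.000298               # prior P(EF) from (4.1)
wj, wbarj = 0.01, 0.0001   # P(EF|Zj), P(EF|~Zj)

x_pm = posterior_k(wj, wbarj, w, 0.3878, 0.9939)     # one unit alarms
x_mm = posterior_k(wbarj, wbarj, w, 2.2495, 1.0001)  # no unit alarms
print(x_pm, x_mm)  # approx. 0.0013 and 0.000075
```

With κF = κN the weights cancel, which is why x++ = 0.255 is unchanged in the example.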
Up to now in this chapter we have used a description of the stochastic behaviour of the two units Z1 and Z2 with respect to the two states of nature: EF and EN. In previous chapters we used the concept of global independence, which is related to O, the union of EF and EN. Global independence is assured by

p = P(Z1∧Z2) = P(Z1)·P(Z2) = p1·p2.

In Chapter (4.1) it was shown that for this p there results an interval for x++. If a further assumption is then made, either the independence in the case of EF or of EN, we describe situations which are called (O,EF)-independence or (O,EN)-independence. Because of the symmetry between them it is sufficient to discuss (O,EF)-independence. In this case x++ is derived in the following way:

P(EF|Z1∧Z2) = P(Z1∧Z2|EF)·P(EF) / P(Z1∧Z2) = P(Z1|EF)·P(Z2|EF)·P(EF) / (P(Z1)·P(Z2)) =

            = P(EF|Z1)·P(EF|Z2) / P(EF)        (4.13)

Of course the probability of EN in the case of (O,EF)-independence cannot be derived in an analogous way but must be calculated by

P(EN|Z1∧Z2) = 1 - P(EF|Z1∧Z2) = 1 - P(EF|Z1)·P(EF|Z2)/P(EF) =

            = (P(EN|Z1) + P(EN|Z2) - P(EN) - P(EN|Z1)·P(EN|Z2)) / (1 - P(EN))        (4.14)

The formulas for x+-, x-+ and x-- correspond to (4.13) and (4.14). In case of (O,EN)-independence the Formulas (4.13) and (4.14) (or their corresponding ones) must be interchanged. The effect of the assumption "(O,EF)-independence" instead of "double independence" is shown in the following example.

Example (4.6): Using the previous parameters and (O,EF)-independence it follows that:

P(EF|Z1∧Z2) = 0.3356
P(EN|Z1∧Z2) = 0.6644

If we use the respective formulas for (O,EN)-independence:

P(EF|Z1∧Z2) = 0.0196
P(EN|Z1∧Z2) = 0.9804
In the case of double independence we had:

P(EF|Z1∧Z2) = 0.255
P(EN|Z1∧Z2) = 0.745.
If these results are compared with previous results, we find that the value for x++ lies in the interval which is derived from Theorem (4.1), if the value attributed to p describes global independence, i.e. p = 0.0004. This is a consequence of the fact that (O,EF)-independence is a special case of global independence. The analogous value of x++ in the case of (O,EN)-independence is not in concordance with the corresponding interval from Theorem (4.1). If the reason for this deviation is sought, it can be shown that (O,EN)-independence is not possible in this case. This is an example of the limitations for the construction of (O,EF)-independence and (O,EN)-independence, which we have already mentioned in Chapter 3.
[]
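Formulas (4.13) and (4.14) can be evaluated with the power-station parameters to reproduce Example (4.6); a sketch:

```python
w = 0.000298    # P(EF)
w1 = w2 = 0.01  # P(EF|Z1), P(EF|Z2)

# (4.13): P(EF|Z1 and Z2) under (O,EF)-independence
x_f = w1 * w2 / w

# (4.14): the complementary probability of EN - NOT a product formula
n, n1, n2 = 1.0 - w, 1.0 - w1, 1.0 - w2
x_n = (n1 + n2 - n - n1 * n2) / (1.0 - n)

print(round(x_f, 4), round(x_n, 4))  # 0.3356 0.6644
```

Evaluating P(EN|Z1)·P(EN|Z2)/P(EN) instead, as (O,EN)-independence would require, gives the 0.9804 of the example and illustrates why the two assumptions cannot be interchanged freely.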
If we investigate the concept of these types of independence using the model of this chapter we find:

Theorem (4.5): With the notations used in this chapter, the assumption of (O,EF)-independence is possible, iff

a) (1-ω1)(1-ω2)/(1-ω) ≤ 1

b) p2 ≤ 1 - ...

c) p1 ≤ ...