By formulas (83) and (84), the constituents in question have an equal degree of confirmation with respect to C.
RISTO HILPINEN
5. Our principal results are now seen from formulas (52), (55), (58), (79), (81), (83) and (84). They can be interpreted as follows: (i) It is not reasonable to assume that such singularities as are found in our evidence represent real singularities of our universe. (ii) All the constituents according to which in our universe there are unobserved singularities have an equal degree of confirmation with respect to any kind of evidence. Moreover, these constituents are exactly as probable as that constituent which denies the existence of any unobserved singularities. Our result (i) seems quite natural and reasonable from the point of view of intuitive judgment as soon as we pay attention to the fact that we have been dealing with an infinite universe, i.e. we have considered the case in which N → ∞. It is obvious that even if there were an attributive constituent exemplified by exactly one individual in our infinite universe, our chances of finding the individual in question are minimal. This is clearly seen also from formula (42): If we assume that the singularities found in our evidence represent real singularities of our universe, P(C|W) → 0 when N → ∞. The probability that we should have in our evidence such a Ct-predicate instantiated as is exemplified by only one individual in the whole universe approaches zero when the total number of individuals in our universe grows without limit. What we just said holds of course also for any finite number of individuals, as is seen from (79). On the other hand, it is of course possible that our finite evidence contains singularities although in the universe there are no such singularities. Our second main result (ii) seems somewhat problematic. According to (ii), all constituents that say that in our universe there are unobserved singularities are equally probable. In addition, they are as probable as such a constituent as denies the existence of singularities.
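The intuition behind (i) can be made vivid with a toy computation of ours (not Hilpinen's formula (42)): if exactly one individual in a universe of N exemplifies a given attributive constituent, the chance that a sample of n individuals contains it is n/N, which vanishes as N grows.

```python
def chance_of_catching_singular(n, N):
    # Probability that a random sample of n out of N individuals
    # contains the unique individual exemplifying a singular cell.
    return n / N

# With a fixed evidence size, the chance shrinks as the universe grows.
for N in (10**3, 10**6, 10**9):
    assert chance_of_catching_singular(100, N) <= 0.1
assert chance_of_catching_singular(100, 10**9) < 1e-6
```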
It is clear that in the case discussed above a reasonable man would choose such a generalization as says that the singularities in question do not exist. This generalization would be the simplest one, and in choosing it one would not postulate the existence of any unobserved kinds of individuals. Even if it were not true, in the sense that in our universe there in fact are singularities denied by the generalization in question, the risk that our future experiences would contradict it is minimal. However, such a choice cannot be defended in terms of our inductive logic. What is the reason for this discrepancy between our intuitions and our formal results? One could say that the generalizations we have considered are somehow unnatural. It seems rather strange to specify in one's generalizations numbers of individuals up to a given point, as we have done. In fact, these numerical assumptions are absolutely unverifiable when an infinite universe
ON INDUCTIVE GENERALIZATION
is concerned, and, as we have seen, they are not confirmable either. It is not possible to explain anything by postulating such singularities. Moreover, even if it were possible to express in our language the existence of singularities, our result (ii) suggests that we perhaps should not distribute a priori probabilities among the constituents according to the method used above. According to this method, each constituent with a fixed depth q received an equal a priori probability. Some of these constituents differed from each other only because of unconfirmable numerical assumptions, and the equality of their a priori probabilities was reflected again in the equality of their a posteriori probabilities. It would perhaps be advisable to give a relatively high a priori probability to such constituents as deny the existence of singularities, and a lower a priori probability to other constituents. In Section 8 we shall inquire whether this can be done in some simple and natural way. To some extent, both the results (i) and (ii) may be taken to reflect the limitations of the language systems we are here studying, rather than the limitations of the basic ideas of our inductive logic. It does not seem reasonable to expect that the exact numbers of the different kinds of individuals we can distinguish from each other can in any case be accounted for in terms of purely qualitative concepts, i.e. monadic predicates. Our results may perhaps be taken to justify this pessimism.

6. If N is a finite number, i.e. if we are speaking of a finite domain of individuals, we obtain results different from those obtained in the previous case. In particular, if n is not negligible in comparison to N, (i) and (ii) do not hold any more. Our formulas in the sequel are rough approximations and presuppose that both N and n are large in comparison with K. For the sake of simplicity, we shall restrict our remarks to the case in which q = 2.
Formulas (38) and (39) hold for the finite case as well as for the infinite one. If n is not negligible in comparison with N, and both are large in comparison with K, P(C|W) = m(W)/M(W) is, instead of (47), approximately

    ((N − n)/N)^{d_{0,1}} · (1/N^{d_{1,1}}) · (1/(w_2 − d_{1,2})^n).   (85)

Because the value of the denominator Σ_U P(C|U) of the formula for P(W|C) is independent of the choice of W, we shall in the sequel inquire which constituent W has the highest degree of confirmation with respect to C by considering formula (85) only. When (85) assumes its greatest value, P(W|C) assumes its greatest value, too.
It is easy to see from (85) that the value of P(C|W) with

    d_{0,1} = 0   (86)

is larger than any of the values which it assumes when d_{0,1} > 0. When (86) holds, (85) reduces to

    P(C|W) = (1/N^{d_{1,1}}) · (1/(w_2 − d_{1,2})^n).   (87)

Because of (21), (24) and (25), (87) can be written as follows:

    P(C|W) = (1/N^{d_{1,1}}) · (1/(c_2 + c_1 − d_{1,1} + d_{0,2})^n).   (88)
Because we assumed that n > K and thus n > d_{1,1}, (88) assumes its greatest value when

    d_{0,2} = 0.   (89)

(88) thus becomes

    P(C|W) = (1/N^{d_{1,1}}) · (1/(c_2 + c_1 − d_{1,1})^n).   (90)

How should we now choose d_{1,1} so as to make (90), and therefore also P(W|C), as large as possible? The required choice of d_{1,1} depends on how the numbers n, N, c_1 and c_2 are related to each other. Let us consider two different possibilities. We may choose

    d_{1,1} = 0   (91)

and obtain

    P(C|W) = 1/(c_2 + c_1)^n,   (92)

or alternatively

    d_{1,1} = c_1,   (93)

whence

    P(C|W) = (1/N^{c_1}) · (1/c_2^n).   (94)

By comparing formulas (92) and (94) one can see that if

    n > log_{(c_2 + c_1)/c_2} N^{c_1},   (95)
a constituent with (93) has a higher probability than one with (91). Conversely, if

    n < log_{(c_2 + c_1)/c_2} N^{c_1},   (96)

a constituent W with d_{1,1} = 0 has a higher degree of confirmation with respect to C than one with (93) and is thus preferable to it. If

    n = log_{(c_2 + c_1)/c_2} N^{c_1},   (97)
the two constituents in question have an equal a posteriori probability. As we mentioned earlier, our argument is not quite strict, because we have used formulas that are only rough approximations. Our results (95)–(97) cannot therefore be taken to give exact quantitative conditions for the preferability of our alternative constituents to each other. However, they can be interpreted qualitatively as follows: If our evidence C contains a relatively large part of the individuals in the whole universe, it is reasonable to assume that the singularities we have found in our evidence are genuine singularities of our universe. On the other hand, if the number n of observed individuals is small in comparison with N, it is not advisable to draw such a conclusion. In the latter case we ought to expect that in the whole domain of individuals there are unobserved individuals that exemplify attributive constituents instantiated in C. This result can be extended to apply to any number of layers of quantifiers. It is different from the result (i) obtained in the infinite case. When our universe is finite, (i) must be replaced by (i') It is not reasonable to expect that the singularities of our evidence are genuine singularities of our universe unless we have examined a large part of the individuals in the whole universe. (86) shows that our result (ii) does not hold if the number of individuals in our evidence C is not negligible in comparison with the total number of individuals in the whole domain of individuals. If N is a finite number, it is not reasonable to assume that there exist unobserved singularities in our universe, and the unreasonableness of such assumptions is brought out by our inductive logic. The smaller the ratio n/N, the smaller is the difference P(W: d_{0,1} = 0 | C) − P(W: d_{0,1} > 0 | C). When n/N → 0, we obtain the result (ii) as a limit case.
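The qualitative comparison above can be illustrated numerically. The sketch below assumes the approximate forms P ≈ 1/(c_2 + c_1)^n for a constituent with d_{1,1} = 0 and P ≈ (1/N^{c_1}) · (1/c_2^n) for one with d_{1,1} = c_1 (our reading of the garbled formulas (92) and (94), so the exact expressions are an assumption); under these forms the crossover point is the threshold of (95)–(97).

```python
import math

def p_no_singularities(n, c1, c2):
    # Approximate log-confirmation of the constituent with d_{1,1} = 0
    # (cf. (92)): evidence spread over c2 + c1 non-empty cells.
    return -n * math.log(c1 + c2)

def p_genuine_singularities(n, N, c1, c2):
    # Approximate log-confirmation of the constituent with d_{1,1} = c1
    # (cf. (94)): each of the c1 singular cells costs a factor 1/N.
    return -c1 * math.log(N) - n * math.log(c2)

def threshold(N, c1, c2):
    # Evidence size at which the two alternatives are equally confirmed (cf. (97)).
    return math.log(N ** c1) / math.log((c2 + c1) / c2)

N, c1, c2 = 10_000, 1, 9
t = threshold(N, c1, c2)  # roughly 87 for these values
small_n, large_n = int(t) - 20, int(t) + 20
# Small evidence: denying genuine singularities wins.
assert p_no_singularities(small_n, c1, c2) > p_genuine_singularities(small_n, N, c1, c2)
# Large evidence: accepting the observed singularity as genuine wins.
assert p_genuine_singularities(large_n, N, c1, c2) > p_no_singularities(large_n, c1, c2)
```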
7. One can notice certain analogies between the results obtained in the present paper and those obtained by using Carnap's confirmation function c*.
In fact, our system of inductive logic becomes more and more like Carnap's system when q grows. As we mentioned earlier, constituents correspond to different partitions W = (W_q, W_{q−1}, ..., W_1, W_0) of the class of all attributive constituents. Each constituent specifies a statistical distribution over the attributive constituents Ct_1, Ct_2, ..., Ct_K. Because of the restriction imposed on the number of layers of quantifiers, the distributions are not defined completely: the class W_q contains all Ct-predicates exemplified by at least q individuals. If we do not restrict the numbers of layers of quantifiers in the way mentioned above, but instead let our formulas contain any number of layers of quantifiers, it is possible to define all the possible statistical distributions in our universe. If N is the total number of individuals in our universe, we need also N layers of quantifiers to specify completely all the possible statistical distributions. In this case our constituents become Carnap's structure-descriptions.³ (This procedure is of course unavailable in an infinite universe.) When we are speaking of an infinite universe, our result (ii) is analogous to that obtained in Carnap's system. If we let the maximal number of layers of quantifiers grow, the number of such constituents as we are not able to distinguish from each other in terms of their respective degrees of confirmation becomes larger, too. When the number q grows, the highest a posteriori probability given by our system to any constituent W becomes smaller. By letting q become large enough, the probability of any constituent with respect to any evidence whatsoever can be pushed arbitrarily close to zero. Carnap's system of inductive logic gives somewhat unsatisfactory and strange results in the case of generalizations, especially in an infinite universe, and so does Hintikka's theory, if applied in the particular way in which we have applied it so far (and which need not be the only way of applying it).
This similarity is not accidental, but is due to the connection which obtains between Carnap's system and the situations studied above. In a sense, our results bring out the underlying reasons for the failure of Carnap's system to deal with inductive generalizations. This perhaps is the greatest interest of applying Hintikka's ideas to the case at hand in the way we have done so far. We shall see, however, that if we look away from Carnap's system, there are somewhat more natural ways of dealing with inductive generalizations in a monadic first-order logic with identity.

8. In the preceding considerations we have assumed that as soon as the
³ For structure-descriptions, see Carnap [1950], pp. 114–117.
maximal number q of layers of quantifiers is fixed, a priori probabilities are distributed evenly among all constituents that can be specified by using at most q layers of quantifiers. In Section 5 this method of assigning a priori probabilities to constituents was found to lead to result (ii), which does not seem to be in accordance with our inductive common sense. From the point of view of intuitive judgment it is clearly reasonable to prefer a constituent which denies the existence of unobserved singularities to any alternative constituent according to which such singularities do exist. However, according to result (ii), this preference cannot be defended in terms of our inductive logic. Can this shortcoming of our system be removed by assigning a priori probabilities to the constituents of depth q ≥ 2 unevenly in some suitable and natural way? As we suggested earlier, it is perhaps advisable to assign a relatively high a priori probability to a constituent which denies the existence of singularities in our universe, and attach correspondingly lower probabilities to other constituents. This seems very natural indeed, because we are more willing to accept a constituent of the former kind than other constituents. Is there some simple way of doing this? Constituents of different depths are related to each other in certain simple ways which suggest a natural solution to this problem. Let us consider first constituents with q = 1. Constituents of this kind can be expressed without using the sign of identity. Each constituent of depth q = 1 is equivalent to a disjunction of constituents with q = 2. Constituents which can be specified by using at most two layers of quantifiers are equivalent to disjunctions of constituents of depth q = 3, and so on. In general, a constituent of depth q is always equivalent to a disjunction of constituents of depth q + r (r = 1, 2, ...), which are called by Hintikka ([1965a] p. 56) constituents subordinate to the first-mentioned constituent.
An especially natural method of assigning a priori probabilities to constituents is obviously as follows: We distribute first a priori probabilities evenly among all constituents with q = 1. The probability of each constituent with q = 1 is then distributed evenly among all such constituents with q = 2 as are subordinate to it. This procedure can be continued and thus extended to constituents of any depth q.⁴ All the constituents with q = 2 will not have an equal a priori probability, because the number of constituents subordinate to a constituent of depth q = 1 is different for different constituents of this depth.

⁴ This method, as well as the methods used earlier, has been suggested by Hintikka [1965b], p. 282.
It is easy to see that a constituent of depth q = 2 has the larger a priori probability the larger is the number of such attributive constituents as are not instantiated according to the constituent in question. When a priori probabilities are distributed among the constituents in the way described above, the probability of a constituent with a given depth is independent of the maximal number of layers of quantifiers that the sentences of our language are allowed to contain. This is of course a very natural feature of our new a priori probabilities. How do the a posteriori probabilities of constituents behave if a priori probabilities are distributed according to this new method? We shall restrict our remarks to the case in which q = 2. The probability of each constituent of this depth is again distributed evenly among all state-descriptions which make the constituent in question true. It is assumed again that the universe we are dealing with is infinite or very large. Our evidence is supposed to be described in the same way as earlier. The a priori probability of each constituent with q = 1 is 1/2^K. A constituent with q = 2, according to which w_0 specified kinds of individuals are empty, has now obviously for its a priori probability
    P(W) = 1/(2^K · 2^{K − w_0}),   (98)

i.e.

    P(W) = 2^{w_0}/2^{2K}.   (99)
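On this assignment the priors over all depth-2 constituents still sum to one: there are C(K, w_0) · 2^{K−w_0} such constituents with exactly w_0 empty kinds. A small check, assuming our reading of (98)–(99) (the function name is ours):

```python
from math import comb

def prior(K, w0):
    # A priori probability of a depth-2 constituent with w0 empty kinds:
    # 1/2^K for its depth-1 superordinate constituent, split evenly among
    # its 2^(K - w0) subordinate depth-2 constituents.
    return 2 ** w0 / 2 ** (2 * K)

K = 4
total = sum(comb(K, w0) * 2 ** (K - w0) * prior(K, w0) for w0 in range(K + 1))
assert abs(total - 1.0) < 1e-12  # the uneven priors are properly normalized
# A constituent leaving more kinds of individuals empty is a priori more probable:
assert prior(K, K) > prior(K, 0)
```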
According to Bayes' formula, the degree of confirmation of W with respect to C is thus

    P(W|C) = P(W) · P(C|W) / Σ_U P(U) · P(C|U),   (100)

where U represents an arbitrary constituent compatible with C. The values of P(C|W) and P(C|U) are given by formulas (42) and (49). Which constituent W has the highest degree of confirmation with respect to C? Because the denominator of (100) is identical for all constituents, we can answer this question by asking when the numerator of (100) assumes its greatest value. When N grows without limit, the numerator of (100) becomes, according to (42), proportional to

    (2^{w_0} · w_2^{d_{1,1}}) / (N^{d_{1,1}} · w_2^n).   (101)
When N → ∞, (101) approaches a value different from zero only if

    d_{1,1} = 0.   (102)

When (102) holds, (101) becomes

    2^{w_0} / w_2^n.   (103)

(103) obviously assumes its greatest value when w_0 is as great as possible. In constituents compatible with C, 0 ≤ w_0 ≤ c_0. The evidence C thus gives the highest degree of confirmation to a constituent W in which

    w_0 = c_0   (104)

and therefore also (because of (23) and (26))

    d_{0,1} = 0   (105)

and

    d_{0,2} = 0.   (106)

In other words, C gives the highest degree of confirmation to a constituent W according to which (iii) singularities of our evidence do not represent genuine singularities of the universe (formula (102)), and (iv) in the whole universe there are only such kinds of individuals as are already instantiated in experience, i.e. unobserved singularities do not exist (formulas (105) and (106)). The result (i) obtained in Section 5 thus remains valid; it is identical with (iii). On the other hand, (ii) no longer holds: C gives the highest degree of confirmation to such a constituent as denies the existence of unobserved singularities. Accordingly, we should prefer this generalization; and our choice can now be defended in terms of inductive probabilities. Obviously the difference between (iv) and (ii) is due to a different assignment of a priori probabilities. The method used in Sections 3–5 represents a priori indifference as far as the choice of a constituent with a fixed maximal depth q was concerned. This a priori indifference implies a posteriori indifference as regards constituents containing unverifiable numerical assumptions. On the other hand, according to the present method we prefer a priori a constituent which denies the existence of singularities, and this preference is reflected again in a posteriori probabilities. Both methods are, however, based on an even distribution of a priori probabilities to constituents. According to the former
method, a priori probabilities are distributed first evenly among the deepest constituents that can be specified without using more than q layers of quantifiers. According to the latter method, a priori probabilities are distributed first evenly among such constituents as contain only one layer of quantifiers. When the latter method is used, it is possible to consider constituents of different depths at one and the same time, which seems to be in accordance with our inductive habits. It might be suggested that the defects of Carnap's confirmation function c* are due to the fact that it is in effect based on the former and not on the latter method.⁵

References

CARNAP, R., 1950, The Logical Foundations of Probability (University of Chicago Press, Chicago; second ed., 1963)
HINTIKKA, J., 1965a, Distributive normal forms in first-order logic, in: Formal Systems and Recursive Functions, Proc. Eighth Logic Colloquium, Oxford, 1963, eds. J. N. Crossley and M. A. E. Dummett (North-Holland Publ. Co., Amsterdam) pp. 47–90
HINTIKKA, J., 1965b, Towards a theory of inductive generalization, in: Proc. 1964 Intern. Congress for Logic, Methodology, and Philosophy of Science, ed. Y. Bar-Hillel (North-Holland Publ. Co., Amsterdam) pp. 274–288
⁵ This study has been facilitated by a grant by the Finnish State Council of Humanities (Valtion humanistinen toimikunta) to a team led by Prof. Jaakko Hintikka.
INDUCTIVE GENERALIZATION IN AN ORDERED UNIVERSE

RAIMO TUOMELA
University of Helsinki, Helsinki, Finland
1. Most of the existing work in inductive logic is concerned with monadic first-order logic, while few studies concerning polyadic cases have been made. In this paper an attempt is made to extend Hintikka's system of inductive logic (see Hintikka [1965a]) from monadic first-order logic to a special polyadic case, viz. to certain ordered universes. We are mainly interested in universal generalizations describing the structure of these universes. For instance, "every third individual in the ordering has the property P" is a typical and frequently cited example of such a generalization. Only some brief comments will be made on singular sentences at the end of the paper. One reason for beginning the construction of an inductive logic from generalizations is that greater difficulties, due to the logical interdependencies of individuals, may be expected to arise in connection with singular sentences. Carnap discusses the problem of ordered universes briefly (Carnap [1950], § 15 B). He indicates that a solution may be looked for either by defining the ordering in the usual first-order logic or by using a richer language capable of expressing elementary number theory. Let us first consider the latter approach. Carnap suggests that two ways to a solution may be tried here. In the first we simply ignore the order and transfer the old definitions of degree of confirmation to the richer language. But this leads to results that may seem quite counterintuitive: (1) As in unordered universes, the degree of confirmation (d.c.) of a universal generalization approaches zero as the number of individuals in the universe increases. It receives non-negligible values only when a relatively large part of the universe has been examined. (2) Carnap claims that in science we do not strive for generalizations that are highly confirmed themselves, but for those that have a high degree of instance confirmation. This kind of degree of confirmation is just the d.c.
of the hypothesis that the next unexamined individual satisfies the generalization. But it does not take into account the order in which the individuals occur in the sequence. For instance, if we have examined a sequence of 100 individuals
such that every third individual has the property P and that the last two have the property ~P, the d.c. of the hypothesis that the 101st individual has the property P ought to be very high. However, in Carnap's system it is approximately 1/3. Thus the first kind of solution in the language of elementary arithmetic seems unsatisfactory. In Carnap's second proposal for a solution the order of individuals is taken into account in defining the degree of confirmation of the sentences of the language. This can be done by using positional predicates (e.g. P(x_i) = the i-th individual has the property P). However, Carnap's critics (Putnam, Achinstein) have shown that there are serious difficulties also in this kind of approach. Their argumentation proceeds roughly as follows. First they adopt some conditions of adequacy which they try to make intuitively plausible. Then they show that no computable measure function defined in this kind of language can fulfil these conditions simultaneously. So they conclude that they have shown it impossible to construct an inductive logic for this kind of language. Carnap's answers to his critics amount to little more than saying that the only thing they have proved is that their own conditions of adequacy are logically incompatible, and nothing else. We will not enter into the details of this discussion here, because in our approach we use first-order logic, which Carnap did not consider promising for the solution of the problem in question.

2. In this paper a first-order language system similar to Carnap's L is used. Only a slight modification concerning quantifiers will be made. The order among the individuals is defined axiomatically by using a dyadic predicate "immediate successor". To each individual belongs one property from a family of (complex) properties. The "strong" generalizations describing the structure of the universe specify what kinds of individuals there exist.
The kind of an individual is determined by the properties belonging to it and to its predecessors and successors in the sequence. How shall the descriptions of possible kinds of worlds be formed? Carnap does this by means of his structure-descriptions. The main drawback in Carnap's procedure seems to be its dependence on the domain of individuals in question. It is not very natural to start one's inductive logic from the assumption that one is always familiar with the whole of one's world, for the logic is largely designed to reconstruct some of the procedures we use in coming to know it. Hence it is not surprising that Carnap's theory leads to difficulties in connection with generalizations, as was mentioned above. Hence we may also conclude that we cannot use structure-descriptions in describing
a possible kind of world, but descriptions that are independent of one's list of individuals. They must be described by general and not by singular sentences. (By a general sentence we mean here a sentence of first-order logic without individual constants and without free individual variables. A sentence which is not general is called singular.) This can be achieved by using in the description the distributive normal forms of general sentences proved to exist e.g. in Hintikka [1965b]. According to Hintikka's results every consistent general sentence of first-order logic may be transformed into its distributive normal form, which is a disjunction of certain conjunctions called constituents which are of a fixed depth d (= maximum number of layers of quantifiers occurring in it). As our language system contains a dyadic predicate and hence dependent argument places, a certain number of the constituents will turn out to be incompatible with the axioms for our universe. In the general polyadic case finding an effective method of eliminating inconsistent constituents is equivalent to finding an effective decision procedure for first-order logic, which is known not to exist. However, ordered universes like ours constitute a special case in which this elimination can be accomplished. An inductive logic will be constructed by defining a regular symmetric measure function for the set of consistent constituents. The a posteriori probabilities of constituents are calculated by Bayes' theorem. Finally, the d.c. of a sentence expressible as a disjunction of constituents is of course obtained as a sum of the d.c.'s of the constituents in the disjunction.

3. The order in our universe will be defined by a dyadic predicate Rxy = "y is an immediate successor of x". Each individual in the ordering has exactly one predicate from a family of predicates. For the sake of simplicity this family will here consist of only two members, P and ~P.
In defining the order we would obviously need three quantifiers. For besides saying that some y is an immediate successor of x we must add that every z that also is an immediate successor of x must be identical with y, and similarly with respect to the converse relation. Because of certain practical difficulties (to avoid laborious computations) we restrict the number of layers of quantifiers to two by using numerical quantifiers instead of the usual ones. This is not an essential restriction, because it only reduces the number of attributive constituents and hence also the number of constituents, while the method of eliminating inconsistent constituents remains essentially the same. The numerical quantifiers needed here are defined as follows by taking the
ordinary quantifiers as primitives:
    (Ex_0)F(x) =df ~(Ex)F(x),
    (Ex_1)F(x) =df (Ex)[F(x) & (y)(F(y) ⊃ y = x)],   (1)
    (Ex_2)F(x) =df (Ex)F(x) & ~(Ex_1)F(x).
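Over a finite domain the three numerical quantifiers of (1) — "no x", "exactly one x", and "at least two x" satisfying F — can be checked directly; a minimal sketch (the function names are ours):

```python
def E0(domain, F):
    # (Ex_0)F(x): no individual satisfies F
    return not any(F(x) for x in domain)

def E1(domain, F):
    # (Ex_1)F(x): some x satisfies F, and every y satisfying F is identical with x
    return sum(1 for x in domain if F(x)) == 1

def E2(domain, F):
    # (Ex_2)F(x): F is satisfied, but not by exactly one individual
    return any(F(x) for x in domain) and not E1(domain, F)

d = [0, 1, 2, 3]
assert E0(d, lambda x: x > 5)        # no individual exceeds 5
assert E1(d, lambda x: x == 2)       # exactly one individual equals 2
assert E2(d, lambda x: x % 2 == 0)   # 0 and 2 both satisfy: at least two
```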
The axioms for our ordered universe can now be given:

    A1. (x)(Ey_1)(Rxy),
    A2. (x)(Ey_1)(Ryx),   (2)
    A3. (x)(y)(Rxy ⊃ ~Ryx).
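As a quick model check, a finite cycle of length m ≥ 3 satisfies A1–A3, while in a 2-cycle Rxy and Ryx both hold, violating A3. A sketch, with the cycle encoded as R(x, y) = "y = x + 1 (mod m)" (our encoding):

```python
def check_axioms(m):
    """Check A1-A3 for the cycle 0 -> 1 -> ... -> m-1 -> 0."""
    dom = range(m)
    R = lambda x, y: y == (x + 1) % m
    a1 = all(sum(1 for y in dom if R(x, y)) == 1 for x in dom)   # unique successor
    a2 = all(sum(1 for y in dom if R(y, x)) == 1 for x in dom)   # unique predecessor
    a3 = all(not (R(x, y) and R(y, x)) for x in dom for y in dom)  # asymmetry
    return a1 and a2 and a3

assert check_axioms(3) and check_axioms(7)
assert not check_axioms(2)  # in a 2-cycle, R(0,1) and R(1,0) both hold
```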
Intuitively speaking, these axioms say that every individual has exactly one immediate successor and predecessor. In addition, the succession relation is asymmetric. What kinds of orderings are compatible with these axioms? Obviously finite cycles without initial and terminal members and infinite open orderings, plus any combination (relation-theoretic sum) of these. Again, for the sake of simplicity, the relation-theoretic sums qualifying as models for our universe will be disregarded in developing the inductive logic, as their effect on the degrees of confirmation of our constituents can be shown to be negligible. A proof of this is sketched in appendix 4. We could equally well have laid down other axioms for order, however, keeping in mind that the universes must be such as to permit the generalizations to be satisfiable in a finite domain. For instance, a linear order with an initial and a terminal member would also qualify as such. What will a generalization concerning our universe look like? The first task in answering this question is to define the notion of a possible kind of an individual, which is called an attributive constituent in our language. The notion of a possible kind of a world is defined by listing all kinds of individuals that are exemplified and adding that they are the only ones exemplified. The result is called a constituent. We begin by defining the attributive constituents Ct_i^d(x), which are complex attributes of the individual x. The index d gives the depth of the sentences in question. Consider some arbitrary individual x to be constituted. We specify its relation to all possible reference point individuals y in the ordering. The individual y is either (1) an immediate successor or (2) an immediate predecessor of x or (3) a distant individual (= neither a successor nor a predecessor). Of these, (1) and (2) always exist in virtue of A1 and A2, and are unique. Hence the only question concerning them is whether they have the predicate P or not.
Excluding very small cycles, there are also distant individuals. What we have to specify is whether there are distant individuals with the predicate P, and whether there are those without it. The result of these specifications is a Ct_{i;j}^1-predicate attributed to x. To make this more concrete we give the beginning of the list of all these predicates:

    Ct_{i;j}^1-predicate    predecessor   x itself   successor   distant individuals
    Ct_{0;0}(x) says        P             P          P           EP, E~P
    Ct_{0;1}(x) says        P             P          P           EP, ~E~P
    Ct_{0;2}(x) says        P             P          P           ~EP, E~P
    Ct_{0;3}(x) says        P             P          P           ~EP, ~E~P
    Ct_{1;0}(x) says        P             P          ~P          EP, E~P
    etc.

(EP = a distant individual with the property P exists, E~P = a distant individual without it exists, etc.)
Now a purely formal description of the attributive constituents can be given. Taking into account axioms A1–A3 reduces the number of terms we need to write explicitly in the attributive constituents. For instance, terms like Rxx are excluded on grounds of the asymmetry of (immediate) succession. This leads to

    Ct_{i;j}^1(x) = (±)P(x) & (Ey_1)[Rxy & (±)P(y)] &
                    (Ey_1)[Ryx & (±)P(y)] &
                    (±)(Ey_0)[~Rxy & ~Ryx & (±)P(y)] &   (3)
                    (Ey_0)~{[Rxy & (±)P(y)] v [Ryx & (±)P(y)] v
                    [~Rxy & ~Ryx & (±)P(y)]}
where i = 0, ..., 7; j = 0, ..., 3. The upper index (1) shows that these attributive constituents are of depth one. The number of attributive constituents is 2^5 = 32, and hence the number of constituents is 2^32. However, it will be seen that the number of those consistent constituents which we have to consider turns out to be very small and, with two obvious exceptions, the only attributive constituents occurring in them are of the same type Ct_{i;0}(x), which means that all kinds of distant individuals exist relative to x. Accordingly, as the attributive constituents differ only with respect to x, its immediate predecessor and successor, there is in the case of each constituent a one-to-one correspondence between the attributive constituents of the consistent constituents and the following
triples S_i, which say what kind of property belongs to the immediate predecessor of x, to x itself, and to its immediate successor:

    S_0: P  P  P
    S_1: P  P  ~P
    S_2: P  ~P P
    S_3: P  ~P ~P
    S_4: ~P P  P
    S_5: ~P P  ~P
    S_6: ~P ~P P
    S_7: ~P ~P ~P
The two exceptions as to the kind of distant individuals occurring in the attributive constituents of the consistent constituents are the two constituents containing only the attributive constituent Ct_0;1(x) or Ct_7;2(x), which allow only distant individuals with the property P or ~P, respectively, to be exemplified. But this does not affect the one-to-one correspondence between the triples S_i and our attributive constituents in each constituent. Thus we will use these triples as representatives of the attributive constituents when making our analysis. The next task is to define the notion of a constituent (a description of a possible kind of world) in our system. This is done by listing all kinds of individuals (= the triples S_i) and specifying whether they are exemplified or not. This is equivalent to saying which kinds of individuals are exemplified, and adding that these are the only kinds of individuals exemplified in our world. Formally speaking, the constituent C_w is defined by

C_w = Π_{i∈I} (Ex)Ct_i(x) & (x) Σ_{i∈I} Ct_i(x),     (4)

where Π is the conjunction operator, Σ is the disjunction operator, and I is the set of indices of the kinds of individuals said to be exemplified. But the axioms for our universe greatly reduce the number of sequences of triples which otherwise would qualify as constituents. We need an effective method for eliminating the inconsistent sequences of triples S_i (= constituents). It will be given in graph-theoretical terms. We say that the triples S_i and S_j are immediately adjacent (in that order) if the two rightmost members of S_i coincide with the two leftmost members of S_j. In this case a move from the triple S_i to S_j is called a transition of one step. Obviously, there are only two logically possible transitions of one step from a given triple S_i, namely those to the two triples obtained by adding an individual with either the property P or ~P to the two rightmost members of S_i. Thus it is possible, for instance, to make a transition of one step from S0 to S1 but not to S2. Let us now form the maximal graph G_M by
INDUCTIVE GENERALIZATION
161
taking all the triples S_i as vertices and all the possible transitions of one step (S_i, S_j) as arcs. (See appendix 1 for the definition of the graph-theoretical terms used.)
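The adjacency rule just stated is mechanical, and G_M can be sketched in a few lines. The bit encoding of the triples (0 for P, 1 for ~P, so that S0 = (P, P, P) and S7 = (~P, ~P, ~P)) is our assumption, chosen to match the transitions described in the text:

```python
# Sketch: the maximal graph G_M of one-step transitions between triples.
# Encoding assumption: triple S_i is the 3-bit expansion of i, where
# bit value 0 stands for P and 1 for ~P (predecessor, x, successor).

def triple(i):
    """The triple S_i as a tuple of three bits (predecessor, x, successor)."""
    return ((i >> 2) & 1, (i >> 1) & 1, i & 1)

def adjacent(i, j):
    """S_i and S_j are immediately adjacent (in that order) iff the two
    rightmost members of S_i coincide with the two leftmost members of S_j."""
    return triple(i)[1:] == triple(j)[:2]

# All arcs of G_M: every vertex has exactly two possible one-step transitions.
GM = {i: [j for j in range(8) if adjacent(i, j)] for i in range(8)}

print(GM[0])  # transitions from S0: [0, 1] -- to S0 and S1, but not to S2
```

With this encoding every vertex of G_M has exactly two outgoing arcs, as the text requires.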
Constituents can now be represented by certain subgraphs of G_M. These subgraphs (other than S0 and S7) are characterized by the fact that every vertex in them must occur as an initial and as a terminal vertex of some arc that does not return to the same vertex. For only then can an ordered finite universe satisfy the axioms A1-A3. (Our reason for disregarding infinite universes here will be given later.) Subgraphs satisfying this condition are strongly connected or sums of strongly connected graphs. (We have already decided to disregard the latter; cf. appendix 4A.) A method of eliminating inconsistent constituents can now be given: a (putative) constituent given by (4) is finitely satisfiable if and only if it is representable by a strongly connected graph that is a subgraph of G_M (or by a sum of such graphs). The list of such consistent constituents, which cannot be decomposed as sums, is given in appendix 2. It can be seen that the restrictions we have imposed on the set of all initially possible constituents are very strong. Out of 2^32 = 4,294,967,296 constituents only 24 remain to be considered. An examination of our constituents (cf. appendix 2) shows that there is, with some exceptions, a one-to-one correspondence between these constituents and the graphs formed by using the triples S_i as vertices. An example of an exceptional case with two-to-one correspondence is provided by the
graph S0S1S2S4. It corresponds to two constituents which have the attributive constituents Ct_0;0(x), Ct_1;0(x), Ct_4;0(x) in common and are different only with respect to the distant individuals corresponding to S2, i.e. they include either Ct_2;0 or Ct_2;1. In the latter case it is denied that distant individuals with the property ~P exist. It can be shown that this kind of exception among the constituents is an uninteresting rarity as to its d.c., because it allows the problematic kind of triple to be exemplified by only a single evidence instance. Thus we may omit these few cases from our discussion as singularities. The proof that justifies this omission is given in appendix 4B. The one-zero matrix associated with a strongly connected graph (cf. appendix 1), and hence with a finitely satisfiable consistent constituent, must satisfy the following condition: in every row and every column of the matrix there must be at least one nonzero element (and at most two of them) which is not a diagonal element, except for the constituents S0 and S7, which are represented by a 1 x 1 matrix with 1 as the only element. An example of a consistent constituent is e.g. S0S1S3S6S4. Its graph and matrix are presented in appendix 1. Thus all possible submatrices of the maximal matrix A_M corresponding to G_M which satisfy this condition will qualify as matrices corresponding to consistent constituents. Before attempting to construct an inductive logic it is, perhaps, worth while considering the possible effects of the simplifications made in constructing the set of consistent constituents. So far we have made two main simplifications whose effects have not yet been discussed in the text or in the appendixes. The first concerns the number of logically different individuals to be treated simultaneously (= the depth of the sentences in question). This number depends on the maximum number k of argument places in the predicates of the language used.
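The row-and-column condition is easy to check mechanically. A minimal sketch; the example matrix is our reading of A_1 from appendix 1, which is an assumption:

```python
# Sketch: check the condition on the 0-1 matrix of a constituent:
# every row and every column must contain at least one (and at most two)
# nonzero off-diagonal elements; the 1x1 matrix [1] (S0 or S7) is the exception.

def satisfies_condition(A):
    n = len(A)
    if n == 1:
        return A[0][0] == 1
    for k in range(n):
        row = sum(A[k][j] for j in range(n) if j != k)  # off-diagonal row sum
        col = sum(A[i][k] for i in range(n) if i != k)  # off-diagonal column sum
        if not (1 <= row <= 2 and 1 <= col <= 2):
            return False
    return True

# Our assumed reading of the matrix A1 of the constituent S0 S1 S3 S6 S4:
A1 = [[1, 1, 0, 0, 0],   # S0 -> S0, S1
      [0, 0, 1, 0, 0],   # S1 -> S3
      [0, 0, 0, 1, 0],   # S3 -> S6
      [0, 0, 0, 0, 1],   # S6 -> S4
      [1, 1, 0, 0, 0]]   # S4 -> S0, S1

print(satisfies_condition(A1))  # True
```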
In our case the most complex predicate is dyadic and the number of layers of (usual) quantifiers needed is three. But in defining our ordered universe we "quasi-reduced" this three-quantifier case to a two-quantifier case by using numerical quantifiers. The other simplification was made in selecting the axioms for the universe. The axioms A1-A3 require that the graphs representing finitely satisfiable consistent constituents be strongly connected. However, if the universe is assumed to contain an initial and a terminal member these graphs need no longer be strongly connected. For instance, S0S1S2S5 would then qualify as a finitely satisfiable consistent constituent with the initial triple S0. But after a transition to the second triple in the ordering (S1) has occurred, S0 can no more be exemplified in future experience. Generalizations of this kind with "reflexive" elements (S0 or S7) in them are in a sense very
interesting as far as their predictiveness is concerned. This is due to the fact that such cases are now allowed as have either no arc coming to the reflexive triple or no arc leaving it, and this makes it possible to predict accurately either the non-occurrence or the continual occurrence of the triple S0 or S7, respectively. These generalizations are also pathological as to their degree of confirmation, however, as will be indicated later.

4. The next task is to define a measure function for constituents (= strong generalizations). The following features of Hintikka's inductive logic for the monadic case will be taken as conditions of adequacy for this measure: (1) Among possible strong generalizations compatible with the evidence there will be one approaching 1 as limit when the positive evidence increases. (2) The simplest constituent compatible with evidence will have the greatest a posteriori probability. As to the justification of these conditions, both of them sound intuitively natural when considered separately. We want our strong generalizations to be as reliable as possible in a fair-betting sense. We also regard informativeness (roughly synonymous with simplicity in its presystematic sense) as a desirable feature of our strong generalizations. It will be seen that these conditions may be satisfied together in the system to be developed. The simplest way to assign a priori probabilities to generalizations is to give equal a priori probabilities to all the consistent constituents C_j (j = 1, ..., w, ...). (Of course the a priori probabilities could be assigned according to some other principle than that of indifference, too.) As a constituent is regarded as the set of its models, its probability may be divided evenly among these models. The a posteriori probabilities δ(C_w|e) are then given by Bayes' theorem:

δ(C_w|e) = δ(C_w)·δ(e|C_w) / Σ_j δ(C_j)·δ(e|C_j) = δ(e|C_w) / Σ_j δ(e|C_j).     (5)
The evidence statement is given by a general sentence telling what kinds of triples of individuals have been exemplified in a sequence of n individuals. Due to the nature of our universe we can say only that these n individuals are adjacent individuals, without being able to speak of their location with respect to all the other N - n individuals of the universe. The degrees of confirmation for general sentences are obtained as sums of the d.c.'s of the constituents occurring in their normal forms. The main task is, of course, to evaluate the conditional probabilities δ(e|C_j). In other words, we want the probability that just this kind of
evidence is observed provided that C_j is true. This can be evaluated as the relative frequency of models compatible with both the evidence and with C_j among all the models of C_j. A model of C_j is a path of length N - 1 in which n - 1 transitions coincide with the sequence of transitions of triples exemplified in the evidence. Let r_j^(k1) be the number of different ways we can take k_1 = N - 1 adjacent steps in the graph of C_j, where any vertices of the graph may occur as possible initial and terminal vertices of the paths. This is just the total number of models of C_j. As our graphs are strongly connected, the evidence does not exclude any one-step transition possibilities from the remaining N - n individuals. Accordingly, r_j^(k2), where k_2 = N - n - 1, represents the number of models compatible both with the evidence and with C_j. The conditional probability of the evidence is then obtained as a quotient between the number of favorable models and the number of all possible models¹:

δ(e|C_j) = r_j^(k2) / r_j^(k1).     (6)
This probability measure can be shown to satisfy the usual probability axioms. The measures r_j^(k) may be evaluated from the graphs by combinatorial means. But a more elegant way to do this is to use matrix-theoretical devices and operate with the matrix associated with the graph of C_j. According to a well-known theorem, given e.g. in Berge [1964], p. 131, the element a_ij^(k) of the matrix A^k (obtained by taking the product of A with itself k times) is equal to the number of distinct paths of length k which go from S_i to S_j. The sum of the elements of A^k says in how many ways we can take k steps in the graph of C_j without regard to the initial and the terminal vertex of the path. This is identical with the measure r_j^(k):

r_j^(k) = e A_j^k e',     (7)

where e is the unit row-vector, and e' is its transpose.

¹ The only constituents consistent with the axioms A1-A3 that do not have strongly connected graphs are constituents satisfiable only in an infinite domain (e.g. S0S1S2S5). Thus their conditional probabilities of evidence are equal to zero in a finite domain, and hence their d.c.'s must also equal zero according to Bayes' theorem. Taking the limit, it is seen that this holds also in an infinite domain.
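Formula (7) can be cross-checked against direct path enumeration. A sketch, using our assumed reading of the matrix A_1 of the constituent S0S1S3S6S4:

```python
import numpy as np

# Sketch of formula (7): r^(k) = e A^k e', the number of ways to take
# k adjacent steps in the graph, summed over all initial/terminal vertices.
# The matrix below is our assumed reading of A1 (constituent S0 S1 S3 S6 S4).
A = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 1, 0, 0, 0]])

def r(A, k):
    """r^(k) = e A^k e' : the sum of all elements of A^k."""
    return int(np.linalg.matrix_power(A, k).sum())

def r_brute(A, k):
    """Direct enumeration of all paths of length k, for cross-checking."""
    n = len(A)
    paths = [[i] for i in range(n)]
    for _ in range(k):
        paths = [p + [j] for p in paths for j in range(n) if A[p[-1]][j]]
    return len(paths)

assert all(r(A, k) == r_brute(A, k) for k in range(1, 7))
print(r(A, 1))  # 7: the sum of the elements of A, i.e. the number of arcs
```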
We still want to know (a) what kind of function of N the measure r^(k) is, in order to evaluate the behavior of δ(e|C_j); (b) how it depends on the characteristics of the constituents C_j. The first problem is solved by transforming A into its Jordan canonical form J by pre- and postmultiplying it by certain nonsingular matrices P and P^(-1): A = P^(-1)JP (cf. appendix 3 for a more detailed explanation). Taking the kth power of A is known not to affect P; thus A^k = P^(-1)J^k P (Browne [1958], p. 181). Hence

r^(k) = e A^k e' = F(λ_l^(k)).     (8)

Here F(λ_l^(k)) is a linear combination of certain powers (≤ k) of the latent roots λ_l (l = 1, ..., r) of A. By using theorems of the theory of nonnegative matrices some interesting results can be obtained as to the behavior of F(λ_l^(k)) in our case. A real matrix is nonnegative if it does not include any negative elements. It is irreducible if and only if the graph corresponding to it is strongly connected. Finally, the matrix is primitive if in the graph corresponding to it the greatest common divisor of all the lengths of its closed paths is 1. A primitive matrix is necessarily both nonnegative and irreducible. All the matrices A_j corresponding to the constituents C_j are nonnegative and irreducible. They are also primitive except for those few which are permutation matrices. As to these exceptional matrices, the sums of their elements do not change when the matrices are raised to higher powers. Obviously, for such matrices δ(e|C_j) = 1. Thus we need consider only the theory of primitive nonnegative irreducible matrices. Some theorems concerning these matrices are stated below without proof (for proofs, see Varga [1962], ch. 2). Let A be a primitive nonnegative irreducible matrix. Then A has a dominant positive latent root ω such that: (a) ω is simple, (b) ω has an absolute value greater than that of any other latent root, (c)

min_i Σ_j a_ij ≤ ω ≤ max_i Σ_j a_ij,

(d) ω increases when any entry of A increases. As these results are applied to our case it is seen that the matrices A_j associated with the finitely satisfiable consistent constituents have a unique dominant latent root ω_j such that 1 ≤ ω_j ≤ 2. In F(λ_l^(k)), ω_j is the only latent root occurring with the highest power k. Thus it is seen that the measure r^(k) grows
exponentially towards infinity as k grows without limit. With sufficiently large values of k it can be approximated by aω_j^k, where a is a small positive constant due to the matrices P and P^(-1) (cf. above). Hence the expression for the conditional probability of evidence (7) (with k_1 - k_2 = n) can be approximated by

δ(e|C_j) ≈ aω_j^(k2) / aω_j^(k1) = 1/ω_j^n.     (9)

The complexity of a constituent C_j may be measured by the amount of one-step transition possibilities in its graph. This is equal to the sum of the elements of the associated matrix A_j. And by (d) this can be measured by ω_j. Hence the smaller ω_j is, the greater is the simplicity of C_j. Finally, the a posteriori probability of C_w is given by Bayes' theorem:

δ(C_w|e) = (1/ω_w^n) / Σ_j (1/ω_j^n) = 1 / Σ_j (ω_w/ω_j)^n.     (10)
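Formula (10) is straightforward to evaluate numerically. A sketch with hypothetical dominant roots ω_j (the values are illustrative assumptions, not those of any particular constituent), showing how the smallest root comes to dominate as n grows:

```python
# Sketch of formula (10): the a posteriori probability of constituent C_w,
# given n evidence instances, from the dominant latent roots omega_j of the
# constituents compatible with the evidence. The omega values are hypothetical.

def posterior(omegas, w, n):
    """delta(C_w | e) = (1/omega_w^n) / sum_j (1/omega_j^n)."""
    weights = [om ** (-n) for om in omegas]
    return weights[w] / sum(weights)

omegas = [1.4, 1.8, 2.0]   # hypothetical dominant roots; 1 <= omega <= 2
for n in (1, 10, 100):
    print(n, round(posterior(omegas, 0, n), 4))
# the simplest constituent (smallest omega) takes almost all the probability
# mass as n grows, while N drops out of the formula entirely
```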
Thus the a posteriori probability is independent of the number of individuals in the universe. It depends only on the simplicity of the constituents and the number of positive evidence instances (= n). The behavior of (10) is the most clear-cut in the asymptotic case, i.e. when n is allowed to increase without limit. Among the constituents compatible with given evidence there is (at most) one whose d.c. approaches 1 as a limit with increasing positive evidence. This is the constituent with the smallest dominant latent root and therefore the simplest in the sense defined above. The degree of confirmation of all the other constituents of course approaches zero. The qualification 'at most' is occasioned by the case where there are several constituents with equal minimal dominant latent roots. According to our previous results the matrices of these constituents cannot be primitive nonnegative irreducible ones. Therefore, only in cases where two or more permutation matrices occur among the matrices of the constituents compatible with the evidence is the asymptotic behavior of (10) exceptional. For in these cases all the constituents representable by permutation matrices have an equal share of the asymptotic degree of confirmation (= 1). As one can see (cf. appendix 2), in our particular universe this exceptional case (= many permutation matrices compatible with given evidence) is not even possible. More generally, the degree of confirmation of a generalization consisting of a disjunction of constituents is obtained as the sum of the d.c.'s of the constituents occurring in the disjunction. Thus the d.c. of a generalization approaches 1 as a limit if and only if the constituent with the smallest latent
root (of constituents compatible with given evidence) occurs in the distributive normal form of the generalization. Above, the simplicity of a constituent was measured by the dominant latent root of the corresponding matrix. According to our previous results ω is the greater, the more nonzero elements the matrix includes, and hence the more vertices and arcs the corresponding graph includes. In particular, the more vertices with two transition possibilities (= arcs) there are, the more subpaths there are in the graph, and hence the greater is the number of possibilities to take k steps in the graph. All this amounts to saying that a constituent with the following properties is the one with the highest d.c. both in preasymptotic and in asymptotic cases: (a) it allows the minimum number of kinds of individuals compatible with evidence to be exemplified; and (b) within these limits it allows as few transition possibilities as possible (and its matrix is, if compatible with the evidence, a permutation matrix allowing at each point only one transition possibility). Thus our conditions of adequacy are satisfied. Let us take an example to illustrate our results. We assume that the following string of individuals has been observed in our universe: ... P P P ~P ~P P P .... This amounts to saying that the triples (attributive constituents) S0S1S3S6S4 have been exemplified in this order. The consistent constituents compatible with this evidence are: C1 = S0S1S3S6S4, C2 = S0S1S3S7S6S4, C3 = S0S1S2S5S3S6S4, and the maximal constituent C_max = S0S1S2S5S3S7S6S4. The dominant latent roots of the matrices corresponding to these constituents can be evaluated by e.g. the power method: ω_j ≈ (trace A^(2k))^(1/(2k)). Thus we find that C1 has the smallest value of ω (= 1.87) while C_max has the maximum ω = 2. The other ω-values lie between these limits. The substitution of these values in formula (10) yields the highest degree of confirmation for the simplest constituent C1.
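The trace estimate of the dominant latent root can be sketched as follows. For the maximal constituent the matrix is that of the full graph G_M, whose row sums all equal 2, so by theorem (c) the estimate should come out at the exact value ω = 2; the construction of A_M from the transition rule is our rendering of the text's description:

```python
import numpy as np

# Sketch: estimating the dominant latent root by the power method,
# omega ~ (trace A^(2k))^(1/(2k)), for the matrix A_M of the maximal
# constituent, i.e. the full transition graph G_M (all row sums = 2).

AM = np.zeros((8, 8), dtype=int)
for i in range(8):
    for j in ((2 * i) % 8, (2 * i) % 8 + 1):  # the two one-step transitions
        AM[i, j] = 1

def omega_estimate(A, k):
    """Power-method estimate of the dominant latent root."""
    return np.trace(np.linalg.matrix_power(A, 2 * k)) ** (1 / (2 * k))

print(round(omega_estimate(AM, 10), 3))  # the exact dominant root here is 2
```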
Finally, a few words about confirmation of generalizations in a (linearly) ordered universe with an initial and a terminal member. As indicated previously, this situation is in general similar to ours. We only have to deal with more constituents, as attributive constituents taking into account the first and the last individual must be added to the constituents. Another change would be the addition of finitely satisfiable constituents with reflexive elements (loops), i.e. constituents whose graphs are not strongly connected, e.g. S0S1S2S5. The measure function r^(k) for these constituents need not be an exponentially increasing function of N but only a monotonically increasing one, and
general results as clear-cut as those above concerning their d.c.'s cannot be given. The constituents must therefore be studied one by one. For instance, the d.c. of S0S1S2S5 can then be shown to approach 1/2 asymptotically.

5. Thus far the problem of evaluating the degree of confirmation a posteriori of constituents, and hence of all general sentences, has been solved in a way which satisfies our criteria of adequacy. We can still ask for the degree of confirmation of the hypothesis h that a certain attributive constituent compatible with C_j will be instantiated by a certain hitherto unexamined individual. This will be a singular prediction, as the sentence h contains free individual symbols. We have
δ(h|e) = δ(h & e) / δ(e),

where in the denominator δ(e) = Σ_j δ(e|C_j) δ(C_j) = Σ_j (1/ω_j^n) δ(C_j). Here δ(h & e) is obtained from δ(e) by replacing n by n + 1. Then δ(h|e) → 1 as n → ∞. If we want to calculate the probability that a certain attributive constituent will be exemplified, we cannot assign equal probabilities to all the Ct-predicates. As there are independent circuits in the graph, we might first assign probabilities to them, either equal or with weights proportional to the number of vertices (attributive constituents) they contain. But these probabilities do not change with increasing evidence. Perhaps we would want some learning from experience to take place even in the case of singular predictions and not only in the case of generalizations, for different circuits in the graph may not be exemplified equally often. This can be achieved by generalizing this system by adding a parameter which enables such learning from experience to take place (cf. Hintikka [1966]). But that does not belong in this paper. Anyway the main problem, the construction of a measure function for generalizations in an ordered universe, has been solved in a way that satisfies the assumed conditions of adequacy.
APPENDIX 1

Some basic concepts in the theory of graphs
(For reference, see Berge [1962], pp. 5-8.) We say that we have a graph whenever we have: 1. a set X, 2. a function Γ mapping X into X.
A graph G can now be defined as the pair G = (X, Γ). Fig. 1 presents two graphs G1 and G2 corresponding to the constituents S0S1S3S6S4 and S0S1S2S5S3S6.
Fig. 1. [Drawings of the graphs G1 (vertices S0, S1, S3, S6, S4) and G2 (vertices S0, S1, S2, S5, S3, S6) with their arcs.]
The set X consists here of the points or vertices S_i. (If S_i and S_j are joined by a continuous line with an arrowhead pointing from S_i to S_j, this is denoted by S_j ∈ ΓS_i.) The pair (S_i, S_j) with S_j ∈ ΓS_i is called an arc of G. The set of arcs of a graph will be designated by U = {(S_i, S_j)}. Obviously G = (X, Γ) = (X, U). For an arc u = (S_i, S_j), the vertex S_i is called its initial vertex and S_j its terminal vertex. A path is a sequence (u_1, u_2, ...) of arcs of a graph (X, U) such that the terminal vertex of each arc coincides with the initial vertex of the succeeding arc. A path is simple if it does not use the same arc twice, and composite otherwise. The length of a path μ = (u_1, u_2, ..., u_k) is the number of arcs in the sequence. A loop is a circuit of length 1, which consists of a single arc. A graph is said to be strongly connected if there is a path joining any pair of arbitrary distinct vertices. For instance, G1 is a strongly connected graph whereas G2 is not. Finally, we need the concept of the matrix associated with a graph. If G has n points we form an n x n square matrix A. To the ith row of A we associate the point S_i ∈ X; to the jth column the point S_j ∈ X. The elements of A are denoted by a_ij; we put a_ij = 1 if (S_i, S_j) ∈ U and a_ij = 0 if
(S_i, S_j) ∉ U. The matrix A_1 associated with the graph G_1 is given below:

            S0   S1   S3   S6   S4
     S0   [  1    1    0    0    0 ]
     S1   [  0    0    1    0    0 ]
A_1 = S3  [  0    0    0    1    0 ]
     S6   [  0    0    0    0    1 ]
     S4   [  1    1    0    0    0 ]
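Strong connectivity, the consistency test used throughout, can be checked by a reachability search from every vertex. A sketch using the vertex sets of G_1 and G_2 from Fig. 1, with arcs generated by the one-step transition rule of the text (triples S_i encoded by the three bits of i, our assumption):

```python
# Sketch: a graph is strongly connected iff every vertex is reachable from
# every other. G1 (constituent S0 S1 S3 S6 S4) should pass, while
# G2 (S0 S1 S2 S5 S3 S6) should fail, as stated in the text.

def successors(i, vertices):
    """One-step transitions from S_i that stay inside the given vertex set."""
    base = (2 * i) % 8
    return [j for j in (base, base + 1) if j in vertices]

def reachable(start, vertices):
    """All vertices reachable from start by a depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for j in successors(stack.pop(), vertices):
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def strongly_connected(vertices):
    return all(reachable(v, vertices) == set(vertices) for v in vertices)

print(strongly_connected({0, 1, 3, 6, 4}))     # G1: True
print(strongly_connected({0, 1, 2, 5, 3, 6}))  # G2: False (S0 unreachable)
```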
APPENDIX 2

List of finitely satisfiable consistent constituents

The constituents that are representable by strongly connected graphs (or by sums of such graphs) are the only finitely satisfiable constituents compatible with the axioms A1-A3 given in the text. Those that are not sums of smaller graphs are listed below:

(i) including S0:
S0
S0S1S2S4
S0S1S2S5S3S6S4
S0S1S2S5S3S7S6S4
S0S1S3S6S4
S0S1S3S7S6S4
S2S4S0S1S3S6
S2S4S0S1S3S7S6

(ii) including S1 but not S0:
S1S2S4
S1S2S5S3S6S4
S1S2S5S3S7S6S4
S1S3S6S4
S1S3S7S6S4
S1S3S6S5S2S4
S1S3S7S6S5S2S4
S2S4S1S3S7S6
S3S6S4S1S2
(iii) including S2 but neither S0 nor S1 (hence not including S4 either):
S2S5
S2S5S3S6
S2S5S3S7S6

(iv) including S3 but none of S0, S1, S2 (hence not including S4 either):
S3S6S5
S3S7S6S5

(v) including S4, S5, S6 but none of S0, S1, S2, S3: none

(vi) including S7 but none of S0, S1, S2, S3, S4, S5, S6:
S7

APPENDIX 3
The Jordan form of a matrix

For every complex n x n matrix A there exists a nonsingular n x n matrix P which reduces the matrix A to its Jordan normal form, i.e.

PAP^(-1) = J = diag(J_1, J_2, ..., J_r),

where each of the n_l x n_l submatrices J_l has the form

J_l = [ λ_l   1              ]
      [      λ_l   1         ]
      [           ...    1   ]
      [                 λ_l  ]

with λ_l on the main diagonal, 1's on the superdiagonal and 0's elsewhere. The kth power of A is given by A^k = P^(-1)J^k P; thus

r^(k) = e A^k e' = e P^(-1) J^k P e' = Σ_l d_l λ_l^(k) = F(λ_l^(k)).

F(λ_l^(k)) is a linear combination of the terms of e P^(-1) J^k P e', as explained in the text.
APPENDIX 4
A. Degree of confirmation and the partitioning of the population of individuals into subpopulations

As models of our order axioms there occur finite cycles without initial and terminal members, and as limit cases infinite open linear orderings, as well as any combinations (relation-theoretic sums) of these. It was claimed in the text that the relation-theoretic sums may be omitted from discussion, as their effect on the degree of confirmation of a constituent can be shown to be negligible in comparison to the degree of confirmation obtained when no partitioning of the population is allowed. The proof of this is easy. We count the measures r^(k) both for the simple uniform universe consisting of only one ordering and for the complex case allowing the relation-theoretic sums to occur, and show that the degrees of confirmation in both cases are asymptotically equal. In the complex case, let the total population of N individuals be partitioned into p subpopulations such that N_1 + N_2 + ... + N_p = N; the value of our measure in this case will be denoted by r_0^(k). In the simple case treated in the text the measure was r^(k) as defined there; denote it here by r_1^(k). As shown before, the measures r^(k) are exponentially growing functions of the number of possible one-step transitions of the cycles. It can now be seen that the quotient r_0^(k)/r_1^(k) → 1 as N → ∞. Denote the conditional probabilities of the evidence in the complex and the simple case by δ(e|C_w)_0 = r_0^(k2)/r_0^(k1) and δ(e|C_w)_1 = r_1^(k2)/r_1^(k1), respectively, where k_1 and k_2 have the same meanings as in the text. Thus it is seen that δ(e|C_w)_0/δ(e|C_w)_1 = (r_0^(k2) r_1^(k1))/(r_0^(k1) r_1^(k2)) → 1. It then follows that δ(C_w|e)_0/δ(C_w|e)_1 → 1, which completes our proof.
B. Graphs corresponding to more than one constituent
As mentioned in the text, there exist strongly connected graphs corresponding to two, and not only one, finitely satisfiable consistent constituents of our particular universe. There are four such constituents: S0S1S2S4, S0S1S3S6S4 and their mirror constituents (with respect to P) S7S6S5S3, S7S6S4S1S3. The problematic triples are S2 in the first two constituents and S5 in their mirror images. To each of these triples correspond two attributive constituents Ct_i;0 and Ct_i;1 (i = 2, 5) with both kinds or only one kind of distant individuals exemplified. The constituents with the latter kind of attributive constituents are descriptions of an unusual world in which a certain kind of individual (Ct_i;1(x)) is allowed to be exemplified by only one evidence instance. We will show here that such an exceptional constituent C'_j will always have a vanishingly small a posteriori probability, and thus the other constituent corresponding to the same graph can always be preferred to C'_j. The d.c. of C'_j can be evaluated in the usual manner by (6). In graph-theoretical terms r^(k) is now the number of ways to take k transitions of one step such that the paths corresponding to the models are allowed to go only once through the vertex representing the problematic triple with only one kind of distant individuals. When this combinatorial task is solved it is seen that δ(e|C'_j) = f(k_2)/f(k_1), where f(k_1) and f(k_2) are linear combinations of k_1 and k_2, respectively, with the same parameters. The d.c. of C'_j is now obtained by (10) as usual. It is seen that when we let n (≤ N) and N grow without limit, the denominator of (10) increases without limit and hence δ(C'_j|e) → 0 in all cases.
References

ACHINSTEIN, P., 1963, Confirmation theory, order, and periodicity, Philosophy of Science, vol. 30, pp. 17-35
BERGE, C., 1964, The theory of graphs (Methuen & Co., London)
BROWNE, E. T., 1958, Introduction to the theory of determinants and matrices (University of North Carolina Press, Richmond, Virginia)
CARNAP, R., 1950, The logical foundations of probability (University of Chicago Press, Chicago; second ed., 1963)
CARNAP, R., 1963, Variety, analogy, and periodicity in inductive logic, Philosophy of Science, vol. 30, pp. 222-227
HINTIKKA, J., 1965a, Towards a theory of inductive generalization, in: Proc. 1964 Intern. Congress for Logic, Methodology, and Philosophy of Science, ed. Y. Bar-Hillel (North-Holland Publ. Co., Amsterdam) pp. 274-288
HINTIKKA, J., 1965b, Distributive normal forms in first-order logic, in: Formal Systems and Recursive Functions, Proc. Eighth Logic Colloquium, Oxford, July 1963, ed.
J. N. Crossley and M. A. E. Dummett (North-Holland Publ. Co., Amsterdam) pp. 47-90
HINTIKKA, J., 1966, A two-dimensional continuum of inductive methods, this volume, pp. 113-132
PUTNAM, H., 1963, Degree of confirmation and inductive logic, in: The Philosophy of Rudolf Carnap (The Library of Living Philosophers), ed. P. A. Schilpp (Open Court Publishing Company, La Salle, Illinois) pp. 761-783
ROSENBLATT, D., 1957, On the graphs and asymptotic forms of finite Boolean relation matrices and stochastic matrices, Naval Research Logistics Quarterly, vol. 4, pp. 151-161
SCHNEIDER, H., editor, 1964, Recent advances in matrix theory (University of Wisconsin Press, Madison, Wisconsin)
VARGA, R., 1962, Matrix iterative analysis (Prentice-Hall, Inc., Englewood Cliffs, New Jersey)
WALK, K., Extension of inductive logic to ordered sequences, IBM Technical Report TR 25.053
NOTES ON THE "PARADOXES OF CONFIRMATION"

MAX BLACK

Cornell University, Ithaca, New York
1. Formulation of the paradoxes. The label, "paradoxes of confirmation", was assigned by Professor C. G. Hempel, in his well-known paper (Hempel [1945]), to the following sheaf of arguments. Consider the proposition, All ravens are black (which I shall call "the raven hypothesis"). Common sense is inclined to say the following three things about it: (i) The hypothesis is, or would be, shown to be false by the existence of a single non-black raven. (ii) The existence of a black raven supports, or would support, the hypothesis; it is, or would be, empirical evidence favoring its truth. (iii) Not all objects bear upon the hypothesis, negatively or positively, in these ways: for instance, the existence of Halley's comet neither falsifies nor supports the raven hypothesis. In short, the common sense position is that the existence of some, but not all, things is relevant to the raven hypothesis. This might be called the principle of limited relevance. Hempel now offers the following arguments against this principle: The raven hypothesis can be symbolically represented as
(x)(Rx ⊃ Bx),     (1)

and a positive instance of it will accordingly have the form

Ra · Ba.     (2)

Now (1) is logically equivalent to its "contrapositive":

(x)(~Bx ⊃ ~Rx),     (3)

a positive instance of which has the form

~Bb · ~Rb,     (4)
and is also logically equivalent to what I shall call its "comprehensive",

(x)[(Rx v ~Rx) ⊃ (~Rx v Bx)],     (5)

a positive instance of which has the form

(Rc v ~Rc) · (~Rc v Bc).     (6)
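The logical equivalences on which the argument turns are truth-functional and can be verified by exhausting the four truth assignments; a quick sketch:

```python
from itertools import product

# Sketch: formulas (1), (3) and (5) are truth-functionally equivalent
# instance by instance: for any object, "R implies B", its contrapositive,
# and the "comprehensive" form have the same truth value.

def implies(p, q):
    """Material implication: p ⊃ q."""
    return (not p) or q

for R, B in product([True, False], repeat=2):
    original = implies(R, B)                           # (1): Rx ⊃ Bx
    contrapositive = implies(not B, not R)             # (3): ~Bx ⊃ ~Rx
    comprehensive = implies(R or not R, (not R) or B)  # (5)
    assert original == contrapositive == comprehensive

print("all three formulations agree on every truth assignment")
```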
On the assumption that logically equivalent hypotheses are "confirmed" (Hempel's term, which I shall discuss later) by the same instances, we are led to the following conclusions: (iv) Any non-black non-raven confirms the raven hypothesis (cf. formula (4) above). (v) Any non-raven also confirms the same hypothesis (because any object, c, for which ~Rc is the case, will satisfy formula (6) above). (vi) Any black thing will also confirm the hypothesis (for Bc will logically imply (6) above). Thus it would seem that the raven hypothesis is, or would be, "confirmed", for instance, by the existence of a white handkerchief (cf. (iv)), by the existence of a stone (cf. (v)) and by the existence of a black pearl (cf. (vi)). These conclusions are certainly startling. The foregoing arguments purport to show that any object, o, without exception, is relevant to the raven hypothesis, in the sense of either confirming it or falsifying it. For if
conferred upon formula (1) above by formulas (2) or (4) or (6) could be considered in abstraction from any meanings that might be attached to 'R' or to 'B' and hence without reference to any "background knowledge" which we might normally regard as relevant. Now it is argued that common sense's reluctance to accept some of the paradoxical instances produced in the last section can be explained by our possession of relevant "background knowledge": if we think the existence of a stone does nothing to strengthen the empirical evidence supporting the raven hypothesis, that is because we already know that no stones are ravens and are therefore receiving no additional information that is relevant>, It is extremely difficult, however, to suppress, ignore or "bracket" the knowledge we in fact have about ravens, colored things  and, more generally, about birds, other physical objects and other relevant broadscale features of the universe 3. One might, however, deliberately construct an artificial context in which covert appeal to "background knowledge" would be impossible. Suppose somebody invites me to consider a proposition, H, say, of the form, All A are B, where 'A' and 'B' stand for definite characters, known to my interlocutor, but deliberately concealed from me. Let it be further supposed that my interrogator now tells me that, to his knowledge, a certain object, not further described or identified, is in fact both A and B: I am asked 2 Cf. Hempel's discussion of an attempt to "support" the assertion, All sodium salts burn yellow by burning a piece of ice: ..... we happen to 'know anyhow' that ice contains no sodium salt; this has the consequence that the outcome of the flamecolour test becomes entirely irrelevant for the confirmation of the hypothesis and thus can yield no new evidence for us" (Hempel [1945] p. 19). So far as I can see, this kind of defense could be used only in connection with "contrapositive instances" (cf. 
(4) above) and would leave the still more puzzling cases of "comprehensive instances" (stones, lumps of coal) to be explained. However, all three types of instances are made to seem relevant by Hempel's original argument and in the end the relevance of all three types of cases must be explained or explained away.

3 I am strongly inclined to think that nobody who lacked such knowledge could qualify as understanding the raven hypothesis or as knowing the language in which it is expressed. Could somebody who had never seen or heard of animals understand our intention in saying "All ravens are black"? And would not an effort to teach him the intended meaning naturally begin with a recital of certain facts, familiar to us, but surprising to him – "There are creatures, like us in certain respects, but unlike in others, who do such and such, etc. etc."? If we did not need to understand the raven hypothesis in order to assess its confirmation by given empirical facts (as an advocate of a formal concept of confirmation expects), what I have just said would be irrelevant. But I shall be arguing that reference to meaning is necessary if apparently "paradoxical" consequences are to be generated.
MAX BLACK
to say whether, in my opinion, the likelihood of H being true has been increased by this information. The correct answer seems to me that I am in no position to have any rational opinion about the matter. For one thing, my enforced ignorance concerning the identity of the hypothesis in question prevents me from forming any reasonable opinion concerning the initial likelihood of H. (To have any opinion about this would be as absurd as to have an opinion about the height of some object, known to the questioner, but completely unknown to me.) Given this condition of almost total nescience on my part, it would seem absurd for me to say that I now have more empirical support for H than I had a moment ago. Of course, the report of a confirming instance does tend, in general, to favor a hypothesis, but it may be a long step from "tends to favor" to "positively supports" 4. I am inclined to say that if the presupposition of total ignorance of relevant background were taken seriously, a question concerning the empirical support for a given hypothesis could not yet arise. 'Empirical support' may well be a threshold concept whose application requires appeal to some background knowledge and hence one that fails to apply if such background knowledge is lacking.

3. Some peculiarities of the stock example. I have been following the practice of previous writers in using Hempel's illustration – for it is intended to be no more than that – of the "raven hypothesis". Attention to some specific illustration is indeed essential, if what some philosophers like to call our "preanalytical intuitions" are to be consulted and, if necessary, rectified. For common sense speaks with an uncertain voice at best about abstract relations of "confirmation" between propositions about unidentified A's and B's (as I have just been arguing). However, the stock example may be held to have been an unfortunate choice, considering certain of its features which I shall now enumerate.
(These oddities might plausibly be held responsible for at least some of the air of "paradox" that clings to Hempel's conclusions.)

4 The following analogy may help: Suppose I am told about some chess game, not otherwise identified, that White has just captured a pawn: should I conclude that White has a better chance of winning than he had before making that move? Well, other things being equal, capturing a pawn tends to help a chess-player to win. But then so much depends upon the state of the game! Finding empirical support for an hypothesis may well resemble playing well at chess, rather than conducting a valid argument. (We can sometimes prove that a move is bad, but whether a move is good is usually debatable.)

(a) Asymmetry between the raven hypothesis and its contrapositive. There
is a noticeable artificiality about the contrapositive, All non-black things are non-ravens, and a still greater artificiality about what I have called the "comprehensive", Everything that is either a raven or a non-raven is either not a raven or black or both. If it is hard to imagine anybody seriously undertaking to put the contrapositive to empirical test, one reason may be that non-black is a non-individuating predicate, one that omits any specification of the logical type of the things to which it is intended to apply. (Is a rainbow to count as a non-black thing? And what about an electron – or even a prime number?) Here may be one source of our resistance to the claim that any non-black thing, without exception, "confirms" the raven hypothesis 5. If we replace the original example by one in which predicate and subject are both individuating terms, we may get such a proposition as All vertebrate animals are warm-blooded animals, which might be symbolised as

(x)(Ax · Vx ⊃ Ax · Wx).   (7)

The corresponding "restricted contrapositive", as it might be called, would be All cold-blooded animals are invertebrate animals, symbolised as

(x)(Ax · ~Wx ⊃ Ax · ~Vx).   (8)
Here it is by no means implausible, or shocking to common sense, that the existence of a cold-blooded invertebrate should support the hypothesis. We might therefore conjecture that one source of the "paradoxes" is the choice, as a paradigm, of a proposition of the form (1) rather than one of the form (8). Let us call objects that are A's (i.e. animals, in our example) members of the range of the hypothesis: we would still have to explain why members inside the range seem eligible, while objects outside the range do not, to be treated as relevant instances of the hypothesis. For it is easy to transform (7) or (8) into a "comprehensive" form like (5) 6. And when this has been done, the old perplexities about apparently paradoxical instances (about unlimited relevance) will reappear.

5 Consider instead the statement, Anything whatever is either a non-raven or black or both (k, say). If my own "intuitions" can be trusted, no corresponding resistance is evoked: it seems reasonable enough to say that k is "confirmed" by a rainbow, by an electron, by the number 5 – or, more generally, by anything that is not black. The soundness of Hempel's conclusions accordingly turns on whether the raven hypothesis can properly be regarded as identical with, or at least logically equivalent to, the statement k. If the answer is Yes, Hempel is quite right.

6 Thus (7) is logically equivalent to (x)[(Ax ∨ ~Ax) ⊃ ((~Ax ∨ ~Vx) ∨ (Ax · Wx))].
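The equivalence claimed between (7), its restricted contrapositive (8), and the "comprehensive" form of footnote 6 can be checked mechanically by exhausting the eight truth-value assignments to Ax, Vx and Wx. The following sketch uses Python purely as a truth-table calculator; the function names are mine, not the text's:

```python
from itertools import product

def implies(p, q):
    # Material implication: p ⊃ q is false only when p is true and q false.
    return (not p) or q

def form7(a, v, w):
    # (7): (Ax · Vx) ⊃ (Ax · Wx)
    return implies(a and v, a and w)

def form8(a, v, w):
    # (8): (Ax · ~Wx) ⊃ (Ax · ~Vx)
    return implies(a and not w, a and not v)

def comprehensive(a, v, w):
    # Footnote 6: (Ax ∨ ~Ax) ⊃ ((~Ax ∨ ~Vx) ∨ (Ax · Wx))
    return implies(a or not a, (not a or not v) or (a and w))

# The three open formulas take the same truth-value under every assignment,
# so the corresponding universal statements are logically equivalent.
assignments = list(product([False, True], repeat=3))
equivalent = all(form7(*t) == form8(*t) == comprehensive(*t) for t in assignments)
print(equivalent)  # True
```

The single falsifying case for all three is, as the text implies, a vertebrate animal that is not warm-blooded (Ax and Vx true, Wx false).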
(b) Uncertainty about the meaning of the raven hypothesis. It is natural to understand the assertion that all ravens are black as intending to attribute to all ravens some typical species-identifying character (like some identifying shape of beak). But if so, the hypothesis would express a "lawlike" association of attributes, applicable to possible as well as to actual specimens of the class of animals in question. And then any serious investigation of the truth of the hypothesis, thus construed, would be far more complicated than the symbolic formulas so far used would suggest. (If falsification of the hypothesis required only the existence of some non-black raven, why not "falsify" the hypothesis by painting some raven pink all over?) It seems clear, however, that neither Hempel nor other writers on this topic have understood the raven hypothesis in this natural way. It is not unfair to them, I think, to say that they conceive of the blackness of any given object as determinable by immediate inspection. What is more important, however, is their conception of the hypothesis in question as an indefinite accidental association of attributes – as the use of the sign for material implication in such formulas as (1) above shows 7. Now, if the allegedly universal blackness of ravens is conceived as a tremendous cosmic accident, with each raven merely happening to be black (if it is black), as a die might happen to throw a six, independently of what happens to any other raven, we are dealing with a proposition so unlikely, on general principles, to be true that it is hard to take the notion of "confirming" it seriously. This brings me to a final source of dissatisfaction with the stock example.

(c) Uncertainty about the truth-value of the hypothesis. One is inclined to think that the truth of the raven hypothesis is a commonplace – to assume that "everybody knows" that ravens are black, just as everybody is supposed to know that robins are red.
(This "background knowledge", if we genuinely possess it, will interfere massively with our efforts to scrutinise the allegedly abstract logical relationship of "confirmation".) But the hypothesis is a commonplace – if indeed it really is true – only when construed in the "lawlike" way. Taken however, as intended, as an enormously, perhaps infinitely, extended conjunction of independent singular accidental assertions, it seems almost certainly false. (If there is any non-zero probability against the "accidental" coincidence of two attributes in a given instance, the chance of these attributes coinciding indefinitely often becomes in the long run vanishingly small: the blackness of all ravens would eventually become as unlikely as an uninterrupted run of sixes from a fair die.) It is worth noticing that, on the view of "confirmation" advocated by Hempel (and also, with some variations, by Carnap) the increment in the degree of "confirmation" contributed to the raven hypothesis by even a straightforward positive instance of a black raven is zero. For in this type of confirmation theory, every generalization has an a priori probability of zero, that cannot be raised by any finite number of positive instances 8. Now, if this were correct, it would be hard to evoke the "intuitions" to which we are supposed to appeal. If the raven hypothesis remains infinitely unlikely to be true after any amount of empirical evidence has been found, it is perhaps not so paradoxical after all that a white shoe might make the same negligible contribution as anything else 9.

(d) Conclusions. I have been objecting – pedantically, as some will think 10 – to my predecessors' preoccupation with the stock illustration of the ravens because it does not lend itself to contraposition and other relevant logical transformations, because its intended meaning is unclear (with a dangerous oscillation between a "lawlike" and an "accidental" interpretation) and because its presumed truth-value is uncertain (so that we don't know whether to treat it as a tiresome commonplace or a wildly implausible prediction). Can we find another simple example, free of these blemishes? Perhaps the following might serve: All American taxpayers turned on the radio at least once in the course of 1965 11. Considering, in connection with this hypothesis

7 Formal logic, for all its sophistication, is still notoriously embarrassed by the task of distinguishing a "lawlike" from an "accidental" generalization.

8 This is plausible enough if a generalization is viewed as an extended logical product of indefinitely many singular statements about the independently existing individuals in the universe. Roughly speaking, the "width" of the singular confirming proposition is always negligibly small when compared with the "logical width" of the generalization under test.

9 Yet one is still inclined to think the white shoe irrelevant in a way in which the black raven is not! This is what still needs to be explained – or explained away.

10 But if empirical support is not a wholly formal notion – contrary to the hopes of the architects of inductive logic – such pedantry is quite in order. If the appropriate methods for an empirical test of a given hypothesis depend on the meaning of that hypothesis, clarification of such meaning is essential if paralogism is to be avoided.

11 In order to stay as close to Hempel's program as possible, I have deliberately chosen an hypothesis in which there can be no serious suggestion of a "lawlike" connection – one could hardly suppose that being an American taxpayer is a reason for listening to the radio. It seems to me an open question whether the taxpayer hypothesis is true: a statistical test, by sampling American taxpayers, would therefore have some point – if anybody were interested in the answer.
the three types of prima facie "paradoxical" instances produced by Hempel, we can perhaps say the following: restricted "contrapositive instances" (Americans who did not use a radio in 1965 and paid no taxes) still seem somewhat odd 12 – though perhaps less so than in the case of the raven hypothesis; on the other hand, the remaining types (e.g. non-Americans, also those who use their radios, whether Americans or not) still seem – with whatever justice – to be plainly irrelevant. I therefore conclude that the puzzle will not be resolved by introducing better illustrations: its roots must be deeper.

4. The technical sense of "confirmation" and the common notion of "empirical evidence". "Confirmation" is, of course, a term of art 13, however firmly established in the writings of English-speaking philosophers of science. It is plainly intended to be a technical surrogate for the common notion that I have been referring to as "constituting empirical support" – or, at least, some component of that notion 14. Now, that common notion is closely connected with the notion of what might be called a deliberate investigation of a given hypothesis. Anybody who becomes seriously interested in determining the truth-value of some hypothesis, H, will normally initiate a search for relevant empirical data that may reasonably be expected to establish H, to refute it – or, failing either of these results, to alter the initial plausibility of H. (I shall ignore, for the purpose in hand, any questions about the coherence of H with provisionally accepted generalizations and theories.) A full-blooded, authentic, empirical inquiry of this sort (as, for instance, in contemporary investigations of the suspected causal links between cigarette-smoking and cancer) has the following characteristic features:

12 Some reasons for this lingering oddity will be suggested later.

13 It is good English to speak of "confirming a rumour" or "confirming a statement" – approximately in the sense of citing an independent statement by a trustworthy source – but in non-technical contexts "confirming an hypothesis" would sound so odd as to be barely intelligible.

14 In the 1945 article, Hempel uses a number of ordinary, non-technical expressions, intended to indicate, at least roughly, the "preanalytical" concept he hopes to "reconstruct". He speaks of the raven hypothesis as "being tested by confrontation with experimental findings" (p. 1), of its receiving "favourable evidence" (p. 2), of data being "in accord with" the hypothesis (ibid.) and so on. He also speaks of "relevant evidence" (p. 3) and asks how a fact could "affect ... [the] probability of a given hypothesis" (p. 9). It is doubtful whether these various expressions, that Hempel treats as synonyms, really do have the same meanings and the same presuppositions: for instance, some writers would refuse to assign probability to hypotheses or other general statements, while readily admitting that hypotheses can be empirically tested.
(i) It is conducted against a background of at least provisionally accepted relevant information, notably including a large number of generalizations: the empirical investigator never begins with a clean slate 15.

(ii) The investigation is typically selective or comparative: the merits of H are weighed, in the light of acquired empirical findings, against those of a finite number of rival hypotheses, H', H'', ... 16.

(iii) The inquiry may be viewed as a finite series of investigation-episodes, each of which is conducted according to a predetermined plan of operations and prearranged understandings (rules of valuation) as to how and to what degree the possible outcomes of the episode are to count in H's favor. An empirical investigation, as here conceived, is not a fishing expedition conducted at random, but an orderly series of consecutive operations, performed according to a well-defined modus operandi, that may not be changed, without good reason, during the course of the investigation in question.

When an hypothesis, H, has successfully withstood a rigorous examination of this kind it may be said to have been empirically supported, in a strong sense of that expression. Alternatively it may be said to have acquired (a certain amount of) empirical evidence. The notion of empirical support, as I have depicted it, is of course thoroughly "pragmatic": whether a given hypothesis has been supported (and in what degree) depends upon historical facts concerning the identity and competence of the investigators in question and the soundness of the procedures employed by them, as well as upon the character of the facts uncovered.

15 Try to imagine Sherlock Holmes operating with no experience of human nature, no preconceptions about criminal behavior, and so on!

16 In current statistical procedure, a choice is usually made between two hypotheses, antecedently designated. (The choice of the hypotheses partially determines the character of the subsequent inquiry.)
It is particularly important to stress that the amount of support that acquired data yield normally depends strongly upon the method employed in finding those data 17: If I merely hear by accident of a hundred black ravens in Jerusalem, that cannot appreciably strengthen my belief in the raven hypothesis; the case would be different if the batch had been selected from scattered regions in accordance with a rigorous sampling technique. My sketch of the main features of the empirical investigation of hypotheses is somewhat idealized: important scientific discoveries have been made by

17 This point was repeatedly and emphatically made by Peirce. In this respect, he saw further than his commentators who have, for instance, tended to pooh-pooh Peirce's insistence upon the importance of "predesignation".
accident and we often do, in practice, admit, as empirical evidence, data stumbled upon in the absence of prior commitment to a rigorous searching procedure. (We can sometimes accommodate such uncovenanted findings retroactively, by treating them as if we had set out to find them – we might then, perhaps, speak of "virtual empirical support".) There are, therefore, weaker senses of "empirically supports" and "constitutes empirical evidence for" which do not imply the deliberate contrivance, adherence to explicit rules of procedure, and so on, that are characteristic of what I have above called the "stronger" senses of these expressions. Yet something of the stronger senses still clings to the weaker senses: if we count the fact F, however it came to our knowledge, as supporting H, that is normally so – in view of the nature of empirical support here advocated – because we can, after the event, concoct a defensible and appropriate procedure by which F might have been discovered in such a way as to have rendered H more credible than its designated rivals. When this is not the case, there will be something distinctly misleading about using such expressions as "empirical support" or "empirical evidence". Let us see how these ideas apply to Hempel's discussion. He says that the existence of a black raven "confirms" the raven hypothesis. If one agrees, that is – I suggest – because we substitute "empirically supports" for "confirms" and naturally think (retroactively!) of the admissible procedure of selecting a raven at random in order to determine whether or not it is black. But even this example suffices to highlight the gap between Hempel's designedly formal concept of "confirmation" and the pragmatic concept of empirical support outlined above.
For it is one procedure to select a raven at random in order to determine whether it is black, and a different procedure to select a black thing at random (assuming that makes sense, which seems very questionable) in order to see whether it is a raven. In general, an hypothesis, All A are B, will receive different amounts of empirical support from A random A has been found to be B and from A random B has been found to be A, respectively, while the hypothesis of course receives the same "confirmation" from Aa·Ba as from Ba·Aa 18.

18 Cf. G. H. von Wright's discussion of the paradoxes in Von Wright [1957] pp. 125–127, in which he relies upon the difference made by the order in which an investigator comes to observe an A or a B.

The gap between the two concepts becomes still more glaring when we consider the paradoxical but allegedly confirming instance of a non-raven. What would be the corresponding modus operandi of an "investigation episode" to inquire into the raven hypothesis? Could it be: to look at anything one pleases, with the prior understanding that if it turns out to be not a raven that will count as supporting the hypothesis? Apart from the intrinsic absurdity of such a rule of operation, there is a conclusive objection to it, viz. that the outcome would not discriminate between the raven hypothesis and its contrary, No ravens are black 19. Hence, finding a non-raven would not count as a test of the hypothesis 20, would not count as empirical support for it.

To sum up, there seems to be an appreciable and important gap between the formal and artificially constructed concept of confirmation and certain familiar if somewhat ill-defined concepts in common use ("empirical support", "empirical evidence") of which it is intended to be a technical "reconstruction". Whatever empirically supports an hypothesis must at least be in logical agreement with it – but the converse is untrue. Now one explanation for the "paradoxes" may well be that we naturally think of them in terms of the pragmatic concept, rather than in terms of the technical concept of "confirmation". Of course, Hempel might retort that he can hardly be held responsible for any such confusion between the two concepts: let us remember what he means by "confirmation" and all risk of confusion will be obviated 21. But the issue is neither so simple, nor so clear-cut. The whole interest of Hempel's studies in confirmation (and of similar essays by those who hope to develop a formal inductive logic) depends upon the "adequacy" of his technical surrogate for the common notions of support and empirical evidence. Any striking disparity between the two notions is a prima facie ground for distrusting the "logical reconstruction" offered. (This, as I have already said, is what makes the "paradoxes of confirmation" philosophically significant – something more than substitutes for the daily crossword puzzle.)

5. Possible ways out.
It may be useful at this point to survey the various strategies that have been used – or that might plausibly be used – for dealing with the paradoxes.

19 On the customary symbolization, (x)(Rx ⊃ Bx) is compatible with (x)(Rx ⊃ ~Bx). But cf. the discussion below of "ordinary" uses of conditionals.

20 This point has been well made by Nelson Goodman and, following him, by Scheffler [1963].

21 One may recall the disputes between Spearman and those psychologists who rejected his definition of "intelligence" as inadequate. Spearman and his defenders used to say that by "g" they meant "g" (as technically defined, e.g. in connection with factor analysis). If anybody wanted to identify "g" with "intelligence", that was his affair!
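Footnote 19's observation – that on the material-implication reading the raven hypothesis and No ravens are black are compatible – comes out vividly in any domain containing no ravens, where both universal conditionals are vacuously true. A minimal sketch of such a model check (the domain and the encoding of individuals as predicate-value pairs are illustrative assumptions of mine, not the text's):

```python
def implies(p, q):
    # Material implication: p ⊃ q
    return (not p) or q

def holds_in(domain, open_formula):
    # A universally quantified material conditional is true in a finite
    # domain iff its open formula is satisfied by every individual.
    return all(open_formula(x) for x in domain)

# Individuals encoded as (is_raven, is_black) pairs; no ravens present.
domain = [(False, True), (False, False)]  # say, a lump of coal and a white shoe

all_ravens_black = lambda x: implies(x[0], x[1])      # (x)(Rx ⊃ Bx)
no_ravens_black  = lambda x: implies(x[0], not x[1])  # (x)(Rx ⊃ ~Bx)

# Both are (vacuously) true together, so an inspection that only ever
# turns up non-ravens cannot discriminate between the two.
print(holds_in(domain, all_ravens_black), holds_in(domain, no_ravens_black))
# True True
```

Adding a single raven to the domain would immediately falsify one of the two, which is why only examination of ravens counts as a discriminating test.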
We might (a) reject Hempel's original argument as unsound in some identified respect; or, we might (b) accept the soundness of his argument. If the latter course seems right, we shall need to explain why the argument's conclusions should seem "paradoxical". In so doing, we might rely (b1) upon the temptation to invoke "background knowledge"; or (b2) upon disparities between the technical notion of "material implication", as used in the paradox-generating arguments, with "ordinary" uses of "if-then"; or (b3) upon confusion between low (or negligible) confirmation and irrelevance; or, finally, (b4) upon some other significant differences between "confirmation" and other notions with which it might easily be confused. (The last four are obviously compatible strategies.) I shall now make brief comments upon these possible solutions, reserving (b2) and (b3) for more extended treatment later.

(a) Rejecting the paradox-engendering argument. Given the relative simplicity and perspicuity of Hempel's original exposition, there seem to be only three ways in which this might plausibly be done.

(a1) We might try to reject the assumption that logically equivalent propositions receive exactly the same confirmation from given data. This is a decidedly uninviting stratagem. If "logically equivalent propositions" are understood to be such as are, of logical necessity, rendered true or false by precisely the same states-of-affairs, the conclusion seems inescapable that they cannot be confirmed in different degrees by the same evidence. (This verdict remains correct if we substitute "empirical support" for "confirmation".)

(a2) We might try to reject the remaining assumption that an instance of a simple generalization always confirms it 22. This is somewhat more plausible (given what we know, at the back of our minds, about the lurking notion of "empirical support").
However, I find it hard to see any reasonable way in which this assumption could be denied, once a formal definition of confirmation had been adopted.

(a3) We might try to question the alleged logical equivalence between the original raven hypothesis and its "contrapositive" (formula (3) above) or between it and its "comprehensive" (formula (5) above). There can be no question, however, that the three propositions, as expressed, are indeed logically equivalent. (Any lingering doubts about the equivalence of these

22 That is to say, that a truth of the form Aa·Ba always confirms the corresponding hypothesis (x)(Ax ⊃ Bx).
expressions – or, rather, the ordinary-language expressions corresponding to them – might better be dismissed under one of the subheadings that immediately follow.) On the whole, then, it seems to me that Hempel's argument, taken as intended, must be regarded as perfectly sound. There is no prospect of finding an internal flaw in it: if we are startled by its conclusions, the fault must lie in some stubborn confusion or prejudice.

(b) Accepting the paradox-engendering argument. Hempel has made a strong case for the view that the common sense principle of limited relevance, which I mentioned earlier, arises only from a "misleading intuition": he claims that "the impression of a paradoxical situation is not objectively sound; it is a psychological illusion" (Hempel [1945] p. 18). We tend, he suggests, to confuse "practical" with "logical" considerations: for the very form of the raven hypothesis reveals a practical interest in the application of the hypothesis to ravens – and a relative lack of interest in its bearing upon non-ravens, non-black things, etc. 23 Yet, in this case, as in others like it, "... the hypothesis nevertheless asserts something about, and indeed imposes restrictions upon, all objects (within the logical type of the variable occurring in the hypothesis ...)" (Hempel [1945]). The raven hypothesis, we need to see, is indeed an assertion about 24 every physical object in the world, claiming of each such thing that it is either a non-raven or a black thing; and, for similar reasons, any generalization whatever is about each and every thing of the appropriate logical type in the universe. Once we grasp this point, it should no longer appear startling or paradoxical that every physical object, without exception, bears one way or the other upon the truth of the raven hypothesis.
On the whole, Hempel's analysis impresses me as attractively straightforward and persuasive, by contrast with some of the more elaborate explanations of the "psychological illusion" that some subsequent writers have proposed. If further explanation seems needed in order to account for our proneness to overlook the simple logical point that Hempel mainly relies upon (the logical equivalence of formulas (1), (3) and (5) above), we have a choice between a number of plausible options (see (b1)–(b4) above). Since enough has already been said about the possible influence of "background knowledge" (i.e. (b1)), I shall proceed at once to discuss the possible influence of the logical gap between material implication and ordinary uses of "if-then" (option (b2)).

23 Is it always true – or even generally true – that the subject of an assertion "interests" us more than the predicate, and that what is explicitly mentioned interests us more than what is implicit? How would "interest", in the relevant sense, be defined or detected?

24 There is surely a covert extension here of the common notion of about. Even philosophers don't talk "about" everything whenever they utter generalizations.

6. Some relevant peculiarities of material implication. (a) Ordinary uses of singular conditionals. I wish to recall some familiar features of ordinary uses of sentences of the form If A then C, or of similar sentences obtainable from such sentences by changes of mood. The "logic" of such ordinary singular conditionals is closely related to the "logic" of general statements of the form All A are B and may be expected to throw some useful light on the latter. We shall find it convenient to distinguish between indicative singular conditionals, such as "If the temperature falls, we shall have snow", subjunctive singular conditionals, such as "If you were to touch that plate you would get burned", and counterfactual singular conditionals, such as "If I had betted on Excelsior, I would have won". I can think of no other types that are relevant to the present discussion.

It has often been observed that when a speaker asserts an indicative singular conditional, he normally implies some connection between the antecedent and the consequent. Suppose I say, "If you interrupt Robinson now he will be angry". If you do proceed to interrupt Robinson and he does become angry, that will not necessarily show that my original assertion was true: for if Robinson became angry because somebody entered the room at the moment you interrupted him, we should have to say that the truth of my original assertion remained unsettled. Thus the force of my original remark was approximately the same as that of "If you interrupt Robinson he will become angry because you interrupt him".
In this more explicit form, the word "because" expresses the intended presence of some reason (often, though not always, of a causal sort) why the antecedent and consequent should have the same truth-value. The character of the imputed connection between antecedent and consequent varies from case to case: antecedent and consequent may be intended to be both true in virtue of some common cause, or the implied link may be supplied by somebody's promise, decision, and so on. The general formula seems to be that the truth of the antecedent A is such as to provide a reason, of some sort, for the truth of B. (Hence, somebody who in ordinary life says "If A then B" can always properly be asked, in the kind of case I here have
in mind, why the truth of A should make B also true.) When a singular sentence is used in this familiar way, I shall speak of the statement as a connected singular conditional. For the reasons I have explained, a connected singular conditional statement is a stronger statement than the corresponding material conditional, symbolized as "A ⊃ B". Although if-then sentences are normally used in the way I have described, there are, I believe, special and exceptional occasions when a speaker wishes to be understood as making only the weaker statement. When I say "If that penny comes down heads when tossed now, so will that other penny", I cannot mean that the truth of the antecedent will constitute a reason for the truth of the consequent: my intended meaning is simply that if A is made true, B will as a matter of fact, and not for any specifiable reason, also be true – just that and nothing more. In such a case one might speak of an unconnected or, perhaps, an accidental singular conditional. Of course, a connected singular conditional logically implies the corresponding accidental singular conditional – but not vice versa. I do not wish to argue here that "if-then" has different meanings or senses in the types of cases I have called "connected" and "accidental". If I had to choose, I would say that the same meaning was involved each time.

(b) Truth-conditions and direct verification of accidental conditionals. It is obvious that an accidental singular conditional is directly verified by A·B and is directly falsified by A·~B. If you toss both pennies and both show heads, my original assertion was true (since I made no further claim about there being any connection between the two states of affairs); if you toss them and the first shows heads, but the second tails, my original assertion was false. But suppose you throw the first penny into the fire as soon as I have made the prediction, so that the antecedent A remains unfulfilled.
Then it seems that the truth-value of the conditional remains open; and the original assertion has received and can henceforward receive no direct test. (If this is correct, there is a sharp contrast with the truth conditions for a material implication of the form A ⊃ B.) We need not abrogate the law of excluded middle in such a case: even if you refuse to make the test, I can still sensibly maintain, "What I said was true: if you had tossed both pennies, the second would have come down heads if the first did". But in the absence of direct test, any further argument about the conditional's truth-value will have to rest upon indirect evidence. (If I had some hidden device that would allow me to produce heads at will, I might have a good reason for reaffirming the
MAX BLACK
truth of the original accidental conditional, in the absence of direct verification.) Let us now compare these results with the corresponding results for the contrapositive, If not-B then not-A. It is seen at once that while this proposition is directly verified by ~B·~A and directly falsified by ~B·A, it has no direct verification or falsification if B is true. Thus we see that whereas the original proposition and its contrapositive are both falsified by the same complex state of affairs, A·~B, A·B, which directly verifies the original proposition, leaves its contrapositive's truth-value open, while ~B·~A, which directly verifies the contrapositive, leaves the truth-value of the original proposition open. In the case of ~A·B, neither of the two propositions receives direct verification. If we write P for the original proposition and Q for its contrapositive, we shall obtain the following summary: P is verified by A·B, falsified by A·~B, left open by ~A·B and by ~A·~B.
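This summary for P, and the corresponding one for Q below, can be checked mechanically. The following sketch is my own illustration, not part of Black's text; the helper name `verdict` is hypothetical. It enumerates the four states of affairs under the rule that an accidental conditional is directly tested only when its antecedent is fulfilled:

```python
# Direct-test verdict for an accidental conditional "if antecedent then consequent":
# the conditional is tested only when its antecedent is fulfilled; otherwise
# its truth-value is left open.
def verdict(antecedent: bool, consequent: bool) -> str:
    if not antecedent:
        return "open"
    return "verified" if consequent else "falsified"

for a in (True, False):
    for b in (True, False):
        p = verdict(a, b)            # P: "if A then B"
        q = verdict(not b, not a)    # Q: "if not-B then not-A"
        print(f"A={a!s:5} B={b!s:5}  P {p:9}  Q {q}")

# P and Q share one falsifying case (A true, B false), but each is directly
# verified by a case that leaves the other's truth-value open.
assert verdict(True, False) == "falsified"
assert verdict(True, True) == "verified" and verdict(False, False) == "open"
```

The printed table reproduces exactly the verification ranges stated in the text: P verified only by A·B, Q verified only by ~A·~B, both falsified by A·~B, both left open by ~A·B.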
Q is verified by ~A·~B, falsified by A·~B, left open by ~A·B and by A·B. Thus P and Q have different ranges of direct verification: if two men bet on P and Q respectively, then if one lost so would the other, but one might win while the other neither lost nor won. It seems, therefore, that in ordinary uses an accidental singular conditional and its contrapositive are not logically equivalent. This point can be clinched by showing that situations can arise in which one of the two propositions is directly verified while the other is actually false. Suppose P is "If you now press the switch, the light will go on" and Q is the corresponding contrapositive, "If the light does not now go on, you will not in fact have pressed the switch", where both are intended to be taken "accidentally". Then if you do not press the switch and the light does not go on (i.e. if ~A·~B is the case) Q will be directly verified. But this result is compatible with the falsity of P: we might know, for instance, that the lamp was broken and therefore be in a position to assert, in retrospect, "If you had pressed the switch the light would not have gone on", and hence to derive the falsity of P. Such results as these are so unlike the corresponding results for material conditionals that conclusions based, as in Hempel's arguments, upon theorems of the standard propositional calculus must be interpreted with great caution. Before leaving this topic, we may notice the following simple way of representing "accidental singular conditionals" in terms of the familiar symbolism of the propositional calculus. Using the technique of Carnap's "reduction
sentences" we may write the following formulas:

A ⊃ (P ≡ B)
~B ⊃ (Q ≡ ~A)

which highlight the indeterminacy of direct verification previously noticed. These expressions can, in turn, be "solved" for P and Q, respectively yielding:

P ≡ A·B ∨ ~A·X
Q ≡ ~A·~B ∨ B·Y
where X and Y are to be taken as indeterminate parameters, propositions of unspecified truth-values 25. As I have already suggested, values of X or Y, respectively, can sometimes be supplied by indirect reasoning from similar cases or, what comes almost to the same thing, by indirect reasoning from relevant generalizations.
(c) The verification of restricted accidental generalizations. Let us now apply the results already obtained to the verification of the general statement All the white balls in this urn are solid (P', say). I have chosen a statement that, to common sense at least, seems to be about a finite, although unknown, number of objects, viz., the white balls contained in the urn in question. The corresponding "restricted" contrapositive may be taken to be All the nonsolid balls in this urn are nonwhite (Q', say). Our generalization, P', may reasonably be construed as a finite conjunction of an indefinite number of accidental singular conditionals: it says of each ball in the urn that if it is white, then, as a matter of fact (and not for any special "connection" or reason) that ball is also solid. The asymmetry between the conditions for direct test of an accidental singular conditional and its contrapositive will obviously reappear in the present case. When P' is directly tested by examining each ball in the urn separately, a given ball may be found to agree with P' by being both white and solid (W·S) or it may disagree with it by being white and not solid (W·~S), whereupon the testing process will terminate; but if it should be nonwhite, the instance will be dismissed as irrelevant. If, however, the restricted contrapositive, Q', is

25 It is easy to see that these parameters must be functions of A and B. Suppose we have P1 ≡ A·B ∨ ~A·X1 and P2 ≡ A·~B ∨ ~A·X2. We want P1 and P2 to be contradictories, which will require ~A ⊃ ~(X1·X2) to be the case. Thus the "parameters" cannot be chosen altogether freely, if the ordinary conventions for "if-then" are to be respected. I shall not pursue this topic here.
being directly tested, different judgments will be in point (except in the case of falsification). Let a be the class of white-and-solid balls in the urn; b the class of white-and-nonsolid balls in the urn; c the class of nonwhite-and-solid balls in the urn; and, finally, d the class of nonwhite-and-nonsolid balls in the urn. Then P' is partially verified by each member of a, is falsified by each member of b, and is unaffected by each member of c and by each member of d; while the restricted contrapositive, Q', is unaffected by each member of a, is falsified by each member of b, is unaffected by each member of c, and is partially verified by each member of d. We have, once again, the pattern previously observed of different though overlapping verification conditions, but identical falsification conditions. There is, however, the following new point. In order to establish P' directly, we must eventually examine each ball in the urn, in order to record it as partially verifying the hypothesis, or as irrelevant to it. If P' is in fact true (and not void on account of the absence of any white balls) completion of the entire process of direct testing will also, thereby, supply all the data we need for direct verification of the contrapositive, Q'. It is easily seen that if we have found by direct test that P' is true, then Q', if not vacuous (through the absence of nonsolid balls in the urn) must likewise be true. We may therefore say, without doing any violence to our "intuitions", that any direct test of P' will be an indirect test of Q', and vice versa. This will sometimes make it plausible to examine cases of nonsolid balls, even if we are setting out to test the direct generalization, P'.
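The bearing of the four classes a-d on P' and Q' can be tabulated in a small sketch (my own illustration, not Black's; the urn contents and helper names are invented):

```python
# Each ball is a (white, solid) pair.  Classes: a = W·S, b = W·~S,
# c = ~W·S, d = ~W·~S.  The urn contents below are invented for illustration.
urn = [(True, True), (True, True), (False, True), (False, False)]

def bearing_on_P(white: bool, solid: bool) -> str:
    """P': 'All the white balls in this urn are solid'."""
    if not white:
        return "irrelevant"
    return "partially verifies" if solid else "falsifies"

def bearing_on_Q(white: bool, solid: bool) -> str:
    """Q': 'All the nonsolid balls in this urn are nonwhite'."""
    if solid:
        return "irrelevant"
    return "partially verifies" if not white else "falsifies"

for white, solid in urn:
    print(white, solid, bearing_on_P(white, solid), "|", bearing_on_Q(white, solid))

# Identical falsification conditions: exactly the class-b balls (W·~S)
# falsify both P' and Q'.
assert all(
    (bearing_on_P(w, s) == "falsifies") == (bearing_on_Q(w, s) == "falsifies")
    for w, s in urn
)
```

Only the verification and irrelevance patterns differ between the two hypotheses, which is why a completed direct test of either one settles the other indirectly.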
For example, if we knew that we could locate three balls that were known to be the only nonsolid ones in the urn, examination of each of them for their color would provide us with a rapid way of indirectly testing P', without the lengthy and tedious routine of successively examining each ball in the urn. This might be regarded as a scrutiny of all the possible negative instances, those that might be of the form ~S·W. On the other hand, it would never be appropriate to consider cases that are ~W·S, whether we were testing P' or Q'. If we knew that we could extract a certain subset of the balls, each known to be both nonwhite and solid (~W·S), we should at once discard them as being irrelevant to the testing of either P' or Q'. If we now apply these results to a modified form of Hempel's example, such as the hypothesis, All ravens in the New York Zoo are black (R, say), we shall want to say that each black raven in the Zoo directly confirms R, each nonblack raven in the Zoo falsifies it, each white or other nonblack object is irrelevant (so far as direct test goes) and each black thing in the Zoo
either confirms it or is irrelevant, depending on whether or not it is a raven. Furthermore, each thing outside the New York Zoo must count as irrelevant. But suppose Hempel, or someone who agrees with his approach, asks us to consider instead the original raven hypothesis, All ravens are black? If this is intended to be understood as an unrestricted accidental generalization, to the effect that each raven (past, present and future) is in fact black, it is doubtful whether the notion of "direct verification", with its implication of successive scrutiny of every physical object, without exception, continues to make sense 26. At any rate, the chance of such a tremendous cosmic "accident" occurring is so small, on general grounds, that there would be something odd about saying that observation of a single black raven should count as "partial" verification. Here, perhaps, "verification" assumes the negative sense of "absence of falsification", and with this understanding the sting is removed from the paradoxes. There is certainly nothing paradoxical about saying that both a white shoe and a black cat fail to falsify the raven hypothesis, although even here we might wish to make a distinction between an object, like the first, that might have falsified the hypothesis, and one like the second that could not have done so. Paradoxical suggestions would be conveyed by this manner of description only if we were led to suppose that an appropriate method of testing an unrestricted accidental generalization might reasonably consist of an unsystematic and exhaustive scrutiny of every object of the appropriate logical type in the entire universe.
(d) Transition to the case of connected conditionals. I have been arguing that an "accidental" singular conditional of the form If A then B does not have the same direct truth-conditions as its contrapositive, If not-B then not-A, and may therefore be treated as a distinct proposition.
I now wish to consider whether a similar point can be made with respect to "connected" singular conditionals. Take, as an example, If I press the switch the light will go on (If A then B, or P) with the intended implication that fulfillment of the antecedent will be a reason for the fulfillment of the consequent. The possibilities for direct testing of this strong conditional are even more restricted than in the case of the corresponding weak, accidental, conditional. For even if I do press the

26 In the case of the balls in the urn, discussed above, the examination of a single ball is one step forward in a procedure which I know, in advance, will terminate at a known instant. But to undertake to scrutinise "everything in the universe" would be to start something which could never be known to have been accomplished.
switch and the light does go on (A·B), that might still have been just a coincidence, and more evidence of an indirect sort (e.g. concerning the mechanism of the switch) will be needed before the assertion can be regarded as established. Similar remarks apply to the contrapositive, If the light will not go on then I will not have pressed the switch (Q). So, with respect to both P and Q we now have the situation that each of them is falsified by A·~B and neither is directly verified by any of the three remaining possibilities, A·B, ~A·B, and ~A·~B. It looks as if the asymmetry of the truth-conditions upon which I previously relied has vanished. Common sense, however, will still wish to make a distinction between the bearing upon P of the case A·B and the bearing upon it of the cases ~A·B or ~A·~B. Consider the first: If I press the switch, then for all I know the light may not go on, in which case P would be false; if, therefore, the light does go on, I have, to be sure, obtained no conclusive evidence of P's truth, but I have obtained some relevant information. A natural way to describe this would be to say that I have obtained some partial verification of P. (This would be all the more natural if we were to think of P as a conjunction of an accidental conditional and an assertion about the imputed "reason" or "connection"; the observation of A·B then directly verifies the first conjunct, while leaving the second open.) On the other hand, if I do not press the switch, then there is no chance of my falsifying P, so nothing that subsequently happens, whether the light goes on or not, can give me any direct information. These cases, one is inclined to say, are irrelevant. If we apply similar considerations to Q, we get the following patterns of truth conditions for the original assertion and its contrapositive: P is falsified by A·~B, is partially verified by A·B, and is unaffected by ~A·B and by ~A·~B; Q is falsified by A·~B, is partially verified by ~A·~B, and is unaffected by ~A·B and by A·B. Thus we get a modified asymmetry, somewhat resembling what we found in the case of accidental conditionals, and can proceed as before. We must notice, however, that the logical relations between "strong" conditionals such as P and Q differ from those of the corresponding "weak" conditionals. It is easy to see, indeed, that if P is true, Q must also be true; and if P is false, Q cannot be true. Thus we must admit a relation of logical equivalence between the two propositions. It follows that any case that partially verifies P by direct test (A·B) will also indirectly and partially verify its contrapositive Q. Our analysis must accordingly be modified as follows:
TABLE OF TRUTH CONDITIONS
A·~B directly falsifies P; directly falsifies Q.
A·B partially and directly verifies P; and hence partially and indirectly verifies Q.
~A·~B partially and directly verifies Q; and hence partially and indirectly verifies P.
~A·B leaves P unaffected; leaves Q unaffected.

(e) Conclusions. I believe that a prima facie case has now been made for thinking that the discomfort produced by the paradoxical cases of confirmation is partly due to the logical gap between material implication and "ordinary" implication. However, it is hard to be sure of this in the absence of any thorough and comprehensive examination of the discrepancies between the two concepts 27.

7. Bayesian approaches. I have left for the last a type of "solution", involving considerations of "inverse probability", that has been astonishingly popular 28, considering the notorious difficulties that have generally discredited the classical "Bayesian" approach to the confirmation of empirical generalizations 29. The argument fastens upon the circumstance that the number of ravens in the universe is very much smaller than the number of nonblack things. This being admitted, an attempt is made to show that the "prior" or antecedent likelihood of finding a raven to be black is smaller than the likelihood of finding a nonblack thing to be a nonraven. Indeed, if the class of ravens is much smaller than the class of nonblack things, the first likelihood is much smaller than the second. (Call this last contention Step one.) It is now urged that, on the basis of Step one, the increase in degree of confirmation produced by finding a raven to be black is much greater than the increase produced by

27 A valuable contribution to this neglected task is a recent paper by Adams [1965].
28 See, for instance, Hosiasson-Lindenbaum [1940]. This contains one of the best attempts to deal with the paradoxes in the way now to be explained. For critical comments, see Hempel [1945] p. 21, footnote 2. A recent attempt of this sort is Mackie [1963], which contains references to other essays in the same vein.
29 There is a useful summary of such difficulties in Von Wright [1957] pp. 112-117. Von Wright says that "... such uses of inverse probability as those of determining the probability that the sun will rise tomorrow or that the next raven will be black are illegitimate" (p. 115). Anybody who agrees will reject the "Bayesian approach" to the paradoxes at the very outset.
finding a nonblack thing to be a nonraven (call this last claim Step two). We now explain our initial reluctance to see that a white shoe, or any other nonblack nonraven, is an authentically confirming instance of the raven hypothesis, as arising from a confusion between low confirmation and irrelevance: intuitively grasping, as we are supposed to do, the negligible contribution to the confirmation of the hypothesis made by a "paradoxical" instance, we mistakenly suppose that it makes no contribution at all. Once this error has been detected, there is no further reason to reject such an instance as irrelevant, even though it makes a trifling contribution in practice to the support of the hypothesis that interests us. I believe I can here dispense with a detailed examination of the ingenious arguments by which this approach has been supported, since the following considerations seem sufficient to show its inadequacy. (i) The defense of what I have called Step two (the crucial link in the argument offered) is admittedly intricate and problematic 30. (ii) Hence, even if Step two is correct (which I doubt), common sense, if it does rely upon the supposedly different contributions made by the two sets of instances, must be using a fallacious argument. (iii) No reason has been given to believe that "common sense" does in fact believe in Step two; indeed the empirical evidence (and this is an empirical question!) suggests that common sense simply holds the paradoxical cases to be irrelevant. (iv) Consider the contrapositive, H', of the raven hypothesis, i.e. All non-

30 One might be inclined to think it obvious that an instance antecedently less likely to arise supports the hypothesis (H, say) more strongly than does an instance that is antecedently more likely to arise. (Roughly speaking, the less surprising an observed consequence of a law under empirical test, the less support such an observation gives to the law.)
But any careful attempt to calculate the relevant degrees of "confirmation" quickly reveals the implicit fallacy. Call the positive instance (a black raven) a and the contrapositive instance (e.g. a white shoe) b. Let the probability of a being observed if H is true, P(a|H), be p1 and similarly, let P(b|H) = q1; let P(a|~H) = p2; and, finally, P(b|~H) = q2. Then the observation of a raises the antecedent odds in favor of H in the ratio p1/p2; and the observation of b raises those same odds in the ratio q1/q2. Whether the positive instance, a, supports H more strongly than the contrapositive instance, b, therefore depends on whether p1/p2 is greater than q1/q2. Clearly, more is involved than the sizes of the classes of ravens and nonblack things respectively. What is at stake may, with some simplification, be said to be whether a predominance of contrapositive instances over positive instances is more likely on the supposition that H is true than upon the supposition that H is false. It is hard to see how this question could possibly be answered. As Peirce said, universes are not as plentiful as blackberries; and hence speculation about the number of white shoes to be expected if not all ravens are black is bound to be idle.
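The odds calculation in footnote 30 can be made concrete with a toy sketch (the likelihood figures below are my own illustrative assumptions, not values from the text): what decides which instance supports H more is the size of each likelihood ratio, not the sizes of the two classes.

```python
# Bayes: posterior odds on H after an observation = prior odds * likelihood ratio.
def updated_odds(prior_odds: float, p_given_H: float, p_given_not_H: float) -> float:
    return prior_odds * (p_given_H / p_given_not_H)

prior_odds = 1.0  # even odds on H, "All ravens are black" (illustrative)

# Observing a black raven (instance a): likelihood ratio p1/p2.
after_a = updated_odds(prior_odds, 0.010, 0.008)   # ratio 1.25 (assumed values)

# Observing a white shoe (instance b): likelihood ratio q1/q2.
after_b = updated_odds(prior_odds, 0.500, 0.499)   # ratio ~1.002 (assumed values)

print(after_a, after_b)

# Here a supports H more strongly than b solely because p1/p2 > q1/q2,
# irrespective of how many ravens or nonblack things there are.
assert after_a > after_b
```

With different assumed likelihoods the comparison could come out the other way, which is exactly Black's point against arguing from class sizes alone.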
black things are nonravens (or Nothing that is not black is a raven). On the solution proposed, "common sense" ought to treat direct instances of H' (nonblack things that are nonravens) as irrelevant 31. (For the subject class of H', the nonblack things, is supposed to have enormously more members than the complement of its predicate class.) This is as paradoxical as what is supposed to be explained. On the whole, the Bayesian approach seems to me wrong in principle and ineffective in practice.
8. Postscript. Considering the amount of sophisticated discussion that the paradoxes have received, the lack of some generally acceptable solution is disappointing. The preceding remarks have no claim to serve as a satisfactory basis for such a solution. If they have any merit, it may be that of drawing attention to some subtleties that have been overlooked in the past. I have been concerned throughout to stress the gap between the syntactical notion of confirmation and the common notion of "empirical evidence" (see especially section 4). But the latter notion is still shrouded in unnecessary obscurity.

References

ADAMS, E. W., 1965, The logic of conditionals, Inquiry, vol. 8, pp. 166-197
HEMPEL, C. G., 1945, Studies in the logic of confirmation, Mind, vol. 54, pp. 1-26, 97-121
HOSIASSON-LINDENBAUM, Janina, 1940, On confirmation, J. Symbolic Logic, vol. 5, pp. 133-148
MACKIE, J. L., 1963, The paradox of confirmation, British Journal for the Philosophy of Science, vol. 13, pp. 265-277
SCHEFFLER, I., 1963, The anatomy of inquiry (Alfred A. Knopf, New York)
VON WRIGHT, G. H., 1957, The logical problem of induction (Macmillan, New York)
31 This point has been well made by Professor I. Scheffler.
A BAYESIAN APPROACH TO THE PARADOXES OF CONFIRMATION *

PATRICK SUPPES
Stanford University, Stanford, Calif.
1. Introduction. What I have to say about the paradoxes of confirmation from a Bayesian standpoint is rather simple. The ideas have been implicitly expressed several times, probably first by Hosiasson-Lindenbaum [1940]. Perhaps the only virtue of the present paper is to make the Bayesian ideas very explicit. The remarks in the last section on the different probabilistic forms of causal and non-causal laws are very likely the most original aspect of the analysis. The paradoxes arise from two "facts". First, the sentence
(∀x)(Ax ⊃ Bx)   (1)

is logically equivalent to its contrapositive:

(∀x)(~Bx ⊃ ~Ax),   (2)

where '~' is the symbol of negation (and later of set complementation). Second, the singular sentence

Aa & Ba   (3)

seems to confirm (1) in a way that the singular sentence

~Aa & ~Ba   (4)

does not, but with respect to (2) the roles of (3) and (4) are reversed, even though (1) and (2) are logically equivalent.
2. Bayesian approach. On a Bayesian approach, we first look at the four classes and assign each a prior probability in the universe of objects; exactly how this universe is to be characterized I leave open for the moment.
* I am indebted to Ernest W. Adams and Paul Holland for several helpful comments on an earlier draft of this paper. The writing of this paper has been partly supported by the Carnegie Corporation of New York.
Using the familiar notation '{x: Ax}' for describing the set of objects x that have property A, we then have, in terms of four mutually exclusive and exhaustive classes,

P({x: Ax & Bx}) = p1
P({x: Ax & ~Bx}) = p2
P({x: ~Ax & Bx}) = p3
P({x: ~Ax & ~Bx}) = p4

and Σ pi = 1. Also for simplicity I assume throughout that pi ≠ 0, for i = 1, 2, 3, 4. If we take the familiar example and let 'Ax' be 'x is a raven' and 'Bx' be 'x is black', then p4 should be much larger than p1, p2 and p3 for any very broadly construed universe. The central question is why we are right in our intuitive assumption that we should look at randomly selected ravens and not randomly selected nonblack things in testing the generalization that all ravens are black. We may consider the general case, representing classes by 'A' and 'B' in the obvious way: A = {x: Ax}, etc. First of all, we note that
P(A) = p1 + p2,   (5)
P(B) = p1 + p3,   (6)

and thus in terms of conditional probability

P(B | A) = p1/(p1 + p2),   (7)

P(~A | ~B) = p4/(p2 + p4).   (8)
Now we want to justify the sampling rule that we look at A's rather than non-B's if P(B | A) < P(~A | ~B), that is, if

p1/(p1 + p2) < p4/(p2 + p4).   (9)

It is an immediate arithmetical truth that (9) is true if and only if

p1 < p4.   (10)

I believe the argument is straightforward for holding that the decision to look at A's rather than non-B's hinges upon the simple inequality (10). In sampling objects to confirm or disconfirm the general law '(∀x)(Ax ⊃ Bx)', we want to test the law. This, I take it, means that we want to sample items
with a higher prior probability of disconfirming the law. This point is made clear by noting

P(~B | A) = p2/(p1 + p2)   (11)

and

P(A | ~B) = p2/(p2 + p4),   (12)

and selection of an A has a higher prior probability of disconfirming the law than does selection of a non-B just when

P(A | ~B) < P(~B | A),

that is, just when once again
p1 < p4. It should be made clear that the adoption of a rational rule for what to observe or sample does not follow from the prior probabilities alone. Some other ingredient must be added, but the rule that tells us to select an A rather than a non-B when p1 < p4 seems a natural one. Suppose, for example, that p1 = 0.000001, p2 = 0.0000001, p3 = 0.0001 and p4 = 0.9998989. Then

P(B | A) = 1/1.1 = 0.9090909   (13)

P(~A | ~B) = 0.9998989/0.9998990 = 0.999999899.   (14)
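The arithmetic of formulas (7)-(14) is easy to check mechanically. In the sketch below (my own check, not part of Suppes' text) the four prior values are an assumption chosen so as to reproduce the quotients displayed in (13) and (14):

```python
from fractions import Fraction as F

# Assumed priors over the four classes; they sum to 1 and reproduce the
# quotients in (13) and (14).
p1 = F(10, 10**7)         # raven & black
p2 = F(1, 10**7)          # raven & not black
p3 = F(1000, 10**7)       # not raven & black
p4 = F(9998989, 10**7)    # not raven & not black
assert p1 + p2 + p3 + p4 == 1

P_B_given_A = p1 / (p1 + p2)        # formula (7): 1/1.1, as in (13)
P_nA_given_nB = p4 / (p2 + p4)      # formula (8): ~0.999999899, as in (14)
print(float(P_B_given_A), float(P_nA_given_nB))

# Inequality (9) holds if and only if p1 < p4, i.e. inequality (10):
assert (P_B_given_A < P_nA_given_nB) == (p1 < p4)

# Prior probability of drawing a disconfirming case, formulas (11) and (12):
P_nB_given_A = p2 / (p1 + p2)
P_A_given_nB = p2 / (p2 + p4)
assert P_A_given_nB < P_nB_given_A   # so sampling ravens is the sharper test
```

Using exact rationals rather than floats keeps the equivalence of (9) and (10) an exact arithmetical fact, as the text asserts, rather than a matter of rounding.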
The probability of (14) is so close to 1 that it seems ridiculous to try to change it by additional observation. This is far less true for the probability of (13). Consequently we sample ravens rather than nonblack things. On the other hand, it would be a mistake to think that p4 will always turn out to be close to 1, and therefore that we should always look at A's rather than non-B's. Consider the following example. Suppose that in a certain election district we want to test the generalization that all voters in this district are literate. The universe X of objects we define to be the adult population of the district. Let V be the subset of voters and L be the subset of literate people of X. A quite reasonable a priori probability for each of the four classes could be something like the following:

p1 = P(V & L) = 0.75
p2 = P(V & ~L) = 0.05
p3 = P(~V & L) = 0.15
p4 = P(~V & ~L) = 0.05,

and because p4 < p1, it is here the non-L's rather than the V's that we should sample.
When the universe of objects is in no way limited, the probability p4 becomes extremely close to 1. But when irrelevant objects like nonblack thoughts are excluded from sampling by the probability argument given above, the sample space represents a radically reduced population that does not have p4 absurdly close to 1. There are several features of the Bayesian viewpoint that stand in sharp contrast to the position of Carnap and others who work mainly in the framework of inductive logic rather than mathematical statistics. On the one hand, the logicians are scandalized at the vague and subjective character of the prior probabilities used by Bayesians. On the other hand, Bayesians are scandalized at the artificiality and simplistic character of nearly all examples considered by the logicians. I don't pretend to be able to offer arguments that will resolve a conflict of this depth, but I would like to say some things that have perhaps already been said elsewhere but that can perhaps be said in a somewhat new way to defend the Bayesian viewpoint. I would claim to do this without too much bias because in two other articles in this volume I am concerned to criticize what I take to be fundamental limitations of the Bayesian approach to induction and problems of rational information processing. First, the Bayesian is quick to remark that in most systems of inductive logic, the probabilities p1 and p4 in the raven example are assigned the same, or nearly the same, value if the example is analyzed from scratch in terms of the classes or properties A and B. (For the Bayesian this pure a priori assignment reflects wanton waste of evidence already available.) Now the inductive logician may reply that if he were seriously pursuing this example, he would begin much further back and introduce a number of fundamental elementary predicates into his language, and use them to express explicitly the known evidence about ravens, A's and B's, or what have you. And this reply leads to the second Bayesian retort.
The inductive logicians, to the Bayesian at least, are the heirs to an intellectually distinguished but misguided tradition of logical atomism that begins at least with Hume. The constraints imposed by the atomism of Wittgenstein's Tractatus are weak compared to the assumptions of statistical independence built into the Carnapian measures imposed on state or structure descriptions. It is the virtue of Carnap to have pursued the atomistic tradition to its more complete probabilistic version, for what the approach comes to in terms of specific questions of confirmation or observation can be settled in a way that is not possible for the relatively vague and noncategorical doctrines of Hume's Treatise or Wittgenstein's Tractatus. But for Bayesians it is an approach that is bound to fail for fundamental reasons. First, it is impossible to express in explicit
form all the evidence relevant to even our simplest beliefs. There is no canonical set of elementary propositions to be approached as an ideal for expressing exactly what evidence supports a given belief, whether it be a belief about ravens, gods, electrons or patches of red. Arguments in support of this last assertion are numerous, and it is worth examining the most important ones because of the fundamental nature of this particular issue for any theory of induction or rational behavior. The simple memory of a computer does provide an example of evidence organized in terms of canonical propositions, but this is because all beliefs or propositions for the computer are categorical. Every issue or belief is an utterly black or white affair. There is no place for partial beliefs, tentative evidence or vague but perceptive hunches. And the lack of these characteristics is perhaps the most salient feature of contemporary computers. At the present time partial beliefs or intuitive hunches can probably be analyzed more thoroughly in terms of probability distributions than in any other fashion. It is not unreasonable to say that the propensity for generating Bayesian distributions is the human facility computers most sorely need. However, there is no need to bring computers into the argument. Analysis of ordinary beliefs, with appropriate contrast of Bayesian and Carnapian views, will stand on its own feet. If I ask a person who follows politics with any serious interest, what the Republican chances are of winning the 1968 presidential election, he would probably not hesitate to give some sort of qualitative answer, like "Not so good", "Unlikely", or "Very small in my judgment". And if I go on to ask him for the basis of his estimate of the chances, he will probably go on to offer a packet of heterogeneous facts and the reasons for thinking some of them are particularly significant. All this is very standard in political conversation and also very Bayesian. 
Most political beliefs are not quite pinned down; the evidence is assembled higgledy-piggledy from all kinds of sources that vary widely in reliability and relevance. Now it is not uncommon for an inductive logician to be willing to admit all this, but he may go on to say that while the consideration of hunches or badly formulated beliefs may be an essential part of discovery, it is not essential in the validation and assessment of hypotheses. The Bayesian is not content with this thin bone of discovery. He will go on to reply that the vague and subjective prior distribution is of importance primarily in summarizing all the information about the experiment or proposed test which lies outside the narrow framework of the experiment itself, but which is still relevant in varying degrees. The assumption of a prior distribution is a systematic way of summarizing a great deal of heterogeneous information.
204
PATRICK SUPPES
And here another point arises. The Bayesian is more modest than the inductive logician in what he hopes to express by means of a prior distribution. It is of fundamental importance to any deep appreciation of the Bayesian viewpoint to realize that the particular form of the prior distribution expressing beliefs held before the experiment is conducted is not a crucial matter. If a moderate number of observations is taken in the experiment, the conclusions drawn will be relatively robust, that is, relatively indifferent to moderate variations in the prior distribution; and the greater the number of systematic observations, the more robust the conclusion. There is a very general theorem that can be stated here, but I shall not digress to formulate it precisely. It is to the effect that given any two prior distributions drawn from a large class of possible distributions, there is, for a broad class of experiments, a sufficiently large number of observations to bring the two posterior distributions as close together as is desired. For the Bayesian, concerned as he is to deal with the real world of ordinary and scientific experience, the existence of a systematic method for reaching agreement is important. To him it is hopeless to strive for an atomistic expression of the total relevant evidence in terms of elementary observation sentences. The well-designed experiment is one that will swamp divergent prior distributions with the clarity and sharpness of its results, and thereby render insignificant the diversity of prior opinion. The Bayesian does not believe that we can find ways to express these diverse prior opinions in logically tight, explicit form. The task of the theory of rationality, for the Bayesian, is to understand how to conceive and design experiments that will eliminate or reduce diversity of opinion about serious questions, and part of the task of this theory is being clear about puzzling matters like the paradoxes of confirmation. 
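The swamping of divergent priors by data can be illustrated in the simplest conjugate setting. The following sketch uses a Beta-Bernoulli model with hypothetical numbers (not from the text): two observers with very different prior opinions about an unknown proportion end up with nearly identical posterior means after a moderate run of observations.

```python
# Illustrative sketch only: Beta-Bernoulli updating for two divergent priors.

def posterior_mean(alpha, beta, successes, failures):
    """Posterior mean of a Beta(alpha, beta) prior after the observed data."""
    return (alpha + successes) / (alpha + beta + successes + failures)

# Two divergent prior opinions about the same unknown proportion (hypothetical).
optimist = (8.0, 2.0)   # prior mean 0.8
skeptic = (1.0, 9.0)    # prior mean 0.1

# A moderately long run of observations: 70 "successes" in 100 trials.
successes, failures = 70, 30

m1 = posterior_mean(*optimist, successes, failures)
m2 = posterior_mean(*skeptic, successes, failures)

# Both posterior means sit near the observed frequency 0.7, and the gap
# between them is far smaller than the gap between the prior means (0.7).
print(round(m1, 3), round(m2, 3))
print(abs(m1 - m2) < abs(0.8 - 0.1))
```

With more observations the two posterior means can be brought as close together as desired, which is the content of the convergence theorem Suppes alludes to.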
I hope that these more general remarks will have defined more sharply the framework within which I have proposed to resolve the paradoxes. 3. Causal versus noncausal laws. I have some additional specific remarks to make about the paradoxes. From a standard statistical viewpoint the analysis already given goes only part of the way. A well-defined sample space for a well-defined experiment was not constructed for either the raven or voting example. What bothers me is that the construction of the appropriate sample space does not seem at all natural. In trying to determine why this is so, what has struck me most is the complete artificiality of the problem. The experimental literature of biology, psychology and to some extent even physics is full of meaningful experiments testing meaningful hypotheses in a statistical
THE PARADOXES OF CONFIRMATION
205
fashion, but essentially none of those hypotheses is of the form "All ravens are black". The main point seems to be that no one applies systematic statistical procedures for making inductive inferences or testing hypotheses when the hypothesis in question asserts a nonprobabilistic implication about a discrete classification. In the case of physics particularly, statistical procedures are used to test deterministic hypotheses, but the hypotheses are about continuous quantities, and statistical questions enter mainly in discussing errors of measurement. The reason for this attitude toward deterministic hypotheses seems clear. The assessment of evidence for or against such a hypothesis is a quite simple-minded affair. A single observation will falsify the hypothesis; all positive instances will confirm it. We have no serious or systematic statistical problem of assessing evidence. Laplace, I am sure, would have considered the raven sort of problem rather silly, because he thought the apparatus of probability theory was to be applied to the determination of the complex causes of phenomena when no simple or deterministic scheme would work in practice. More can be said on this point, but let us see how the main thrust of these remarks bears on the paradoxes of confirmation moved to a more realistic setting. The paradigm sort of hypothesis is now: Smoking tends to cause cancer. Put in terms of classes A and B, we move from

For all x, if x is A then x is B

to

(15)  P(B|A) > P(B|~A)

or

(16)  P(cancer|smoking) > P(cancer|nonsmoking).
The first thing to note is that the obvious form of the paradox of confirmation disappears, for in general

P(B|A) ≠ P(~A|~B),

i.e., the direct analogue of contraposition is not valid in terms of conditional probability. On the other hand, it reappears in another form, which is innocuous in many applications. We need the usual 2 x 2 contingency table to bring out the point. The distribution of the population (or sample) is shown by the numbers n11, n12, n21, n22:

(17)
          B      ~B
    A     n11    n12
   ~A     n21    n22
We may use this table to show that (15) holds if and only if

(18)  P(~A|~B) > P(~A|B),

and (18) is a sort of probabilistic contrapositive of (15). Using (17), we have

P(B|A) > P(B|~A)
if and only if  n11/(n11 + n12) > n21/(n21 + n22)
if and only if  n11·n21 + n11·n22 > n11·n21 + n12·n21
if and only if  n11·n22 > n12·n21
if and only if  n11·n22 + n21·n22 > n12·n21 + n21·n22
if and only if  n22/(n12 + n22) > n21/(n11 + n21)
if and only if  P(~A|~B) > P(~A|B),

which establishes the desired equivalence. In terms of smoking and cancer, we have:
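Since both inequalities reduce to the same cross-product condition n11·n22 > n12·n21, the equivalence can also be checked mechanically. An illustrative brute-force sketch over small positive tables:

```python
from itertools import product

def check_equivalence(n11, n12, n21, n22):
    """(15): P(B|A) > P(B|~A), versus (18): P(~A|~B) > P(~A|B)."""
    p15 = n11 / (n11 + n12) > n21 / (n21 + n22)
    p18 = n22 / (n12 + n22) > n21 / (n11 + n21)
    cross = n11 * n22 > n12 * n21   # the common reduced form
    return p15 == p18 == cross

# Exhaustively verify over all tables with cell counts 1..5.
assert all(check_equivalence(*cells)
           for cells in product(range(1, 6), repeat=4))
print("equivalence holds on all tables tested")
```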
P(cancer|smoking) > P(cancer|nonsmoking) if and only if P(nonsmoking|noncancer) > P(nonsmoking|cancer), and not only does this seem reasonable, but it also seems reasonable to sample either the causes (smoking) or the effects (cancer) and their absences in establishing a probabilistic causal law. We may sample by looking at smokers and nonsmokers, or by looking at persons with cancer and those without cancer. (For detailed design of an experiment, the question of precisely what class seems a priori most appropriate to sample or, more realistically, in what proportions classes of individuals should be sampled, would follow the same line of analysis pursued earlier in discussing the raven example, and will not be considered in detail again.) However, a subtle point has been illegitimately smuggled in, and the situation changes when we consider something closer to the raven case, i.e., a noncausal law. We may entertain the noncausal probabilistic law:

(19)  Most ravens are black.

The natural probability expression of this hypothesis is not the analogue of (15),

(20)  P(B|R) > P(B|~R),

but rather

(21)  P(B|R) > P(~B|R),
and without further assumption the apparent "contrapositive" probability analogue of (21) is not necessarily equivalent to it. To be explicit, (21) is not necessarily equivalent to

(22)  P(~R|~B) > P(R|~B),

as may be seen from using table (17) as before, and with this observation the paradoxes of confirmation vanish for (19). (It may be argued that the bare inequality of (21) does not reflect the exact meaning of most and that a stronger form of inequality should be used, but meeting this criticism is not crucial for the present discussion.) As far as I know, the relevance for the paradoxes of confirmation of the sharp distinction between causal and noncausal laws, particularly the relevance of the different probabilistic forms of such laws, has not been previously noticed. It should be apparent that the kind of causal law pertinent to this discussion is probabilistic rather than deterministic in character, and is of the sort ordinarily tested in biological, medical and psychological experiments and reported in contingency-table data. A certain lack of clarity in the distinction between causal and noncausal laws is also to be found in the terminology used in the statistical literature. Statisticians have developed measures of association for contingency-table data and the probabilistic causal laws tested by the tables. It would seem more natural to reserve the term association for testing the noncausal laws, but such tests are not ordinarily discussed in the same detailed fashion, undoubtedly because of the greater importance of causal laws from both a practical and conceptual standpoint. I do not mean to suggest that inequality (15) offers a very profound analysis of the probabilistic notion of cause. My limited objective in this paper has been to point out the conceptually sharp distinction between causal and noncausal laws when they are expressed in a probabilistic form. The ideas used here go no deeper than what I would call the level of naive causes. The identification of genuine causes, which to me seems necessarily relative to a particular conceptual scheme, requires a more elaborate probabilistic structure than I have introduced here. 
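The non-equivalence is easy to exhibit with hypothetical counts. In the table below, (21) reduces to n11 > n12 and (22) to n22 > n12, so a table can satisfy one without the other:

```python
# A counterexample with hypothetical counts: (21) holds while its apparent
# "contrapositive" (22) fails, so the two are not equivalent.
n11, n12, n21, n22 = 10, 5, 0, 3   # R&B, R&~B, ~R&B, ~R&~B

p21 = n11 / (n11 + n12) > n12 / (n11 + n12)   # P(B|R) > P(~B|R)
p22 = n22 / (n12 + n22) > n12 / (n12 + n22)   # P(~R|~B) > P(R|~B)

print(p21, p22)   # (21) is true of this table, (22) is false
```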
But the introduction of additional structure would not change what I have said about the nonexistence of the paradoxes of confirmation for either causal or noncausal laws of a probabilistic sort.
THE PARADOXES OF CONFIRMATION*

G. H. VON WRIGHT

The Academy of Finland, Helsinki, Finland
1. We consider generalizations of the form "All A are B". An example could be "All ravens are black". We divide the things, of which A (e.g. ravenhood) and B (e.g. blackness) can be significantly (meaningfully) predicated, into four mutually exclusive and jointly exhaustive classes. The first consists of all things which are A and B. The second consists of all things which are A but not B. The third consists of all things which are B but not A. The fourth, finally, consists of all things which are neither A nor B. Things of the second category or class, and such things only, afford disconfirming (falsifying) instances of the generalization that all A are B. Since things of the first and third and fourth category do not afford disconfirming instances one may, on that ground alone, say that they afford confirming instances of the generalization. If we accept this definition of the notion of a confirming instance, it follows that any thing which is not A ipso facto affords a confirming instance of the generalization that all A are B. This would entail, for example, that a table, since notoriously it is not a raven, affords a confirmation of the generalization that all ravens are black. A consequence like this may strike one as highly "paradoxical". It may now be thought that a way of avoiding the paradox would be to give to the notion of a confirming instance a more restricted definition. One suggestion would be that only things of the first of the four categories, i.e. only things which are both A and B, afford confirmations of the generalization that all A are B. This definition of the notion of a confirming instance is sometimes referred to under the name "Nicod's Criterion". According to this criterion, only propositions to the effect that a certain thing is a raven and is
* The treatment of the Paradoxes of Confirmation which is suggested in this paper is substantially the same as the one given in my essay in Theoria, vol. 31 (1965), pp. 254-274. The non-formal parts of the discussion in the two papers are largely identical. The formal argument, as presented here, is more condensed and also, I hope, more perspicuous than in the Theoria paper.
black can rightly be said to confirm the generalization that all ravens are black. But if we adopt Nicod's Criterion as our definition of the notion of a confirming instance we at once run into a new difficulty. Consider the generalization that all not-B are not-A. According to the proposed criterion we should have to say that only things which are not-B and not-A afford confirmations of this generalization. The things which are not-B and not-A are the things of the fourth of the four categories which we distinguished above. But, it is argued, the generalization that all A are B is the same as the generalization that all not-B are not-A. To say "all A are B" and to say "all not-B are not-A" appear to be but two ways of saying the same thing. It is highly reasonable, not to say absolutely necessary, to think that what constitutes a confirming or disconfirming instance of a generalization should be independent of the way the generalization is formulated, expressed in words. Thus any thing which affords a confirmation or disconfirmation of the generalization g must also afford a confirmation or disconfirmation, respectively, of the generalization h, if "g" and "h" are logically equivalent expressions. This requirement on the notion of a confirming instance is usually called "The Equivalence Condition". To accept Nicod's Criterion thus seems to lead to conflict with the Equivalence Condition. This conflict constitutes another Paradox of Confirmation. 2. Before we proceed to a "treatment" of the paradoxes which we have mentioned, the following question must be asked and answered: Are confirmations of the generalization that all A are B through things which are not-A always and necessarily to be labelled "paradoxical", and never "genuine"? Simple considerations will show, I think, that the answer is negative. Let us imagine a box or urn which contains a huge number of balls (spheres) and of cubes, but no other objects. 
Let us further think that every object in the urn is either black or white (all over). We put our hand in the urn and draw an object "at random". We note whether the drawn object is a ball or a cube and whether it is black or white. We repeat this procedure, without replacing the drawn objects, a number of times. We find that some of the cubes which we have drawn are black and some white. But all the balls which we have drawn are, let us assume, black. We now frame the generalization or hypothesis that all spherical objects in the box are black. In order to confirm or refute it we continue our drawings. The drawn object would disconfirm (refute) the generalization if it turned out to be a white ball. If it is a black ball or a white cube or a black
cube, it confirms the generalization. Is any of these types of confirming instance to be pronounced worthless? It seems to me "intuitively" clear that all the three types of confirming instance are of value here and that no type of confirmation is not a "genuine" but only a "paradoxical" confirmation. (Whether confirmations of all three types are of equal value for the purpose of confirming the generalization may, however, be debated.) I would support this opinion by the following ("primitive") argument: What we are anxious to establish in this case is that no object in the box is white and spherical. Not knowing whether there are or are not any white balls in the box, we run a risk each time when we draw an object from the box of drawing an object of the fatal sort, i.e., a white ball. Each time when the risk is successfully stood, we have been "lucky". We have been lucky if the object which our hand happened to touch was a cube (and, since we could feel that it was a cube, it need not be examined for colour at all); and we have been lucky if the object was a ball which upon examination was found to be black. To touch a ball, one might say, is exciting, since our tension (fear of finding a white ball) is not removed until we have examined its colour. To touch a cube is not exciting at all, since it ipso facto removes the tension we might have felt. But to draw from the box is in any case exciting, since we do not know beforehand whether we shall, to our relief, touch a cube, or touch a ball and, to our relief, find that it is black, or touch a ball and, to our disappointment, find that it is white. Let "S" be short for "spherical object in the box", "C" for "cubical object in the box", "B" for "black", and "W" for "white". All things in the box can be divided into the four mutually exclusive and jointly exhaustive categories of things which are S and B, S and W, C and B, and C and W. 
It is not connected with any air of paradoxality to regard things of all the four types as relevant (positively or negatively) to the generalization that all S are B. All things in the world can be divided into the four mutually exclusive and jointly exhaustive categories of things which are S and B, S but not B, B but not S, and neither S nor B. Things of the first category obviously bear positively and things of the second category negatively on the generalization. But of the things of the third and fourth category some, we "intuitively" feel, do not bear at all on the generalization, have nothing to do with its content, and therefore "confirm" it only in a "paradoxical" sense. The categories of things C & B and S & W differ from the categories of things ~S & B and ~S & ~B in this feature: All things of the first two categories are things in the box, but some things (in fact the overwhelming majority of things) of the last two categories are things outside the box. The things which we "intuitively" regard as affording "paradoxical" confirmations of the generalization that all S are B are those things of the 3rd and 4th category which are not things in the box. I shall here introduce the term range of relevance of a generalization. And I shall say that the range of relevance of our generalization above, that all spherical things in the box are black, is the class of all things in the box. I now put forward the following thesis: All things in the range of relevance of a generalization may constitute genuine confirmations or disconfirmations of the generalization. The things outside the range are irrelevant to the generalization. They cannot confirm it genuinely. Since, however, they do not disconfirm it either, we may "by courtesy" say that they confirm it, though only "paradoxically". In order to vindicate my thesis I shall try to show, by means of a formal argument, that the irrelevance of the "paradoxical" confirmations consists in the fact that they are unable to affect the probability of the generalization. Showing this is one way, and a rather good one it seems to me, of dispelling the air of paradoxality attaching to these confirmations. 3. It is important to state explicitly the logico-mathematical frame of probability within which we are going to conduct our formal argument concerning the confirmation paradoxes. The probability concept of the confirmation theories of Carnap and Hintikka is a two-place functor which takes propositions (or, on an alternative conception, sentences) as its arguments. The probability concept used by us is a functor the arguments of which are characteristics (attributes, properties). Let "φ" and "ψ" stand for arbitrary characteristics of the same logical type (order). 
The expression "P(φ|ψ)" may be read "the probability that a random individual is φ, given that it is ψ". Instead of "is" we can also say "has the characteristic", and instead of "given" we can say "on the datum" or "relative to". We stipulate axiomatically that, for any pair of characteristics which are of the same logical type and such that the second member of the pair is not empty, the functor "P( | )" has a unique, non-negative numerical value. Furthermore, the functor obeys the following three axioms:

A1. (Ex)ψx & (x)(ψx → φx) → P(φ|ψ) = 1,
A2. (Ex)ψx → P(φ|ψ) + P(~φ|ψ) = 1,
A3. (Ex)(χx & φx) → P(φ|χ)·P(ψ|χ & φ) = P(φ & ψ|χ).
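On a finite domain, relative frequency within a non-empty reference class gives a model of these axioms. A sketch with illustrative sets (the domain and the particular characteristics are assumptions, not from the text):

```python
from fractions import Fraction

# A finite-frequency model of the functor P(.|.): characteristics are taken
# as extensions (sets of individuals) over a small, illustrative domain.
DOMAIN = frozenset(range(8))

def P(phi, psi):
    """P(phi|psi): relative frequency of phi among the psi-individuals."""
    assert psi, "the second characteristic must be non-empty"
    return Fraction(len(phi & psi), len(psi))

phi = {0, 1, 2, 3}
psi = {2, 3, 4, 5}
chi = {1, 2, 3, 4, 5, 6}

# A1: if psi is non-empty and universally implies phi, then P(phi|psi) = 1.
assert P(phi, {2, 3}) == 1
# A2: P(phi|psi) + P(~phi|psi) = 1.
assert P(phi, psi) + P(DOMAIN - phi, psi) == 1
# A3: P(phi|chi) * P(psi|chi & phi) = P(phi & psi|chi), chi & phi non-empty.
assert P(phi, chi) * P(psi, chi & phi) == P(phi & psi, chi)
print("A1-A3 hold in this model")
```

Exact rational arithmetic (`Fraction`) is used so that the multiplicative axiom A3 can be checked by strict equality.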
It is a rule of inference of the calculus that logically equivalent (names of) characteristics are intersubstitutable in the functor "P( | )" ("Principle of Extensionality"). The application of probabilities, which are primarily associated with characteristics, to individuals is connected with notorious difficulties. The application is sometimes even said to be meaningless. This, however, is an unnecessarily restricted view of the matter. If x is an individual in the range of significance of φ and ψ, and if it is true that P(φ|ψ) = p, then we may, in a secondary sense, say that, as a bearer of the characteristic ψ, the individual x has a probability p of being a bearer also of the characteristic φ.
4. If R is the range of relevance of the generalization that all A are B, and if this generalization holds true in that range, then it will also be true that (x)(Rx → (Ax → Bx)). This may be regarded as a "partial definition" of the notion of a range of relevance. For the sake of convenience, I shall introduce the abbreviation "Fx" for "Ax → Bx". "F", we can also say, denotes the property which a thing has by virtue of the fact that it satisfies the propositional function "Ax → Bx". I define a second-order property Ψ_R by laying down the following truth-condition: The (first-order) property X has the (second-order) property Ψ_R if, and only if, it is universally implied by the (first-order) property R. That X is universally implied by R means that it is true that (x)(Rx → Xx). The property Ψ_R, in other words, is the property which a property has by virtue of belonging to all things in the range R. A property which belongs to all things in a range can also be said to be universal in that range. Assume we can order all things of which A, B and R can be significantly predicated into a sequence x₁, x₂, …, xₙ, …. Then we can define a sequence of second-order properties ℱ₁, ℱ₂, …, ℱₙ, … as follows: The (first-order) property X has the (second-order) property ℱₙ if, and only if, it is true that Rxₙ → Xxₙ. The property ℱₙ, in other words, is the property which a property has (solely) by virtue of belonging to a certain individual thing, if this thing is in the range R. ("If" here means material implication.) For the sake of convenience, I introduce the abbreviation "Φₙ" for the logical product of the first n properties in the sequence ℱ₁, ℱ₂, …, ℱₙ, …. "Φₙ", we can also say, denotes the property which a property has by virtue of the fact that it is not missing from any of those of the first n things in the world which also are things in the range R.
Finally, let "Θ" denote a tautological second-order property, i.e. a property which any first-order property tautologically possesses, for example, the property of either having or not having the second-order property Ψ_R (or ℱₙ). 5. We prove the following theorem of probability:

T. P(Ψ_R|Θ) > 0 → (P(Ψ_R|Φₙ₊₁) > P(Ψ_R|Φₙ) ↔ P(ℱₙ₊₁|Φₙ) < 1).

The first-order property R trivially has the second-order property Θ & Φₙ. For Θ(R) may be equated with the tautology Φₙ(R) ∨ ~Φₙ(R), and "Φₙ(R)" is an abbreviation for "(Rx₁ → Rx₁) & … & (Rxₙ → Rxₙ)". Consequently it is logically true (for all values of n) that (EX)(Θ(X) & Φₙ(X)). It follows immediately that it is logically true, too, that (EX)Θ(X) and (EX)Φₙ(X). From A3 we derive, by substitution and detachment, that P(Φₙ & Ψ_R|Θ) = P(Φₙ|Θ)·P(Ψ_R|Θ & Φₙ). "Φₙ & Ψ_R" is logically equivalent with "Ψ_R" alone. This follows from the way the second-order properties were defined: that Ψ_R(X) means that (x)(Rx → Xx), and that Φₙ(X) means that (Rx₁ → Xx₁) & … & (Rxₙ → Xxₙ). Similarly, "Θ & Φₙ" is logically equivalent with "Φₙ" alone. Substituting the simpler equivalents, the equality above reduces to P(Ψ_R|Θ) = P(Φₙ|Θ)·P(Ψ_R|Φₙ). By an exactly analogous argument we derive the equality P(Ψ_R|Θ) = P(Φₙ₊₁|Θ)·P(Ψ_R|Φₙ₊₁). Combining the two equalities we get P(Φₙ|Θ)·P(Ψ_R|Φₙ) = P(Φₙ₊₁|Θ)·P(Ψ_R|Φₙ₊₁). Now assume that P(Ψ_R|Θ) > 0. Since probabilities are non-negative, it follows that P(Ψ_R|Φₙ₊₁) > P(Ψ_R|Φₙ) ↔ P(Φₙ|Θ) > P(Φₙ₊₁|Θ). By repeated application of A3 we detach the equalities P(Φₙ|Θ) = P(ℱ₁|Θ)·…·P(ℱₙ|Θ & Φₙ₋₁) and P(Φₙ₊₁|Θ) = P(ℱ₁|Θ)·…·P(ℱₙ|Θ & Φₙ₋₁)·P(ℱₙ₊₁|Θ & Φₙ). The assumption that P(Ψ_R|Θ) > 0 guarantees that all the factors of the products are different from 0. Hence, after cancellation, we get P(Φₙ|Θ) > P(Φₙ₊₁|Θ) ↔ P(ℱₙ₊₁|Φₙ) < 1. This completes the proof of T. Let us now see what this theorem amounts to in plain words. 
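Theorem T can be checked by brute force on a toy model. The following sketch is built on illustrative assumptions (a 4-thing domain, properties identified with subsets of the domain, and a uniform measure over all properties), none of which come from the text:

```python
from itertools import combinations

# Brute-force check of theorem T: the "universe of properties" is the set of
# all subsets of a small domain, weighted uniformly (illustrative assumptions).
DOMAIN = (0, 1, 2, 3)
PROPS = [frozenset(s) for r in range(len(DOMAIN) + 1)
         for s in combinations(DOMAIN, r)]
R = frozenset({0, 1, 3})   # range of relevance; thing 2 falls outside it

def psi_R(X):
    """Psi_R(X): the property X belongs to all things in the range R."""
    return R <= X

def F(n):
    """F_{n+1}(X) of the text (0-indexed here): R x_n -> X x_n."""
    return lambda X: DOMAIN[n] not in R or DOMAIN[n] in X

def Phi(n):
    """Phi_n: the logical product F_1 & ... & F_n (Phi_0 plays Theta's role)."""
    return lambda X: all(F(k)(X) for k in range(n))

def P(prop, given):
    """P(prop|given) under the uniform measure on PROPS."""
    pool = [X for X in PROPS if given(X)]
    return sum(prop(X) for X in pool) / len(pool)

assert P(psi_R, Phi(0)) > 0            # the antecedent of T
for n in range(len(DOMAIN)):
    grows = P(psi_R, Phi(n + 1)) > P(psi_R, Phi(n))
    not_max = P(F(n), Phi(n)) < 1
    assert grows == not_max            # the biconditional asserted by T
print("theorem T holds on this model")
```

At n = 2 the thing lies outside R, so the extra datum is maximally probable and the probability of Ψ_R does not grow; at every other step both sides of the biconditional are true.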
"P(Ψ_R|Θ) > 0" says that the probability that a random property in the universe of properties is true of all things in the range R is greater than 0. "P(Ψ_R|Φₙ₊₁) > P(Ψ_R|Φₙ)" says that the probability that a random property is true of all things in the range R is greater, given that it is true of those of the first n+1 things in the world which fall in this range, than given (only) that it is true of those of the first n things which fall in this range. "P(ℱₙ₊₁|Φₙ) < 1", finally, says that the probability that a random property is true of the (n+1)st thing in the world, if this thing belongs to the range R, is smaller than 1, given that this property is true of those of the first n things in the world which fall in that range. The theorem as a whole thus says the following: If the probability that a random property in the universe of properties is true of all things in the range R is not minimal (0), then the probability that this property is true of all things in the range is greater, given that it is true of those of the first n+1 things which fall in the range, than given (only) that it is true of those of the first n things which fall in the range, if and only if the probability that it is true of the (n+1)st thing, if this belongs to the range, is not maximal (1), given that it is true of those of the first n things which belong to the range R. Now apply the theorem to the individual property F. To say that F is true of all things in the range R is tantamount to saying that the generalization that all A are B is true in the range R. To say that F is true of those of the first n (or n+1) things in the world which are also things in the range amounts to saying that the first n (or n+1) things afford confirming instances of the generalization that all A are B in the range R. To say that F is true of the (n+1)st thing, if this thing belongs to the range, finally, comes to saying that this thing affords a confirming instance of the generalization that all A are B in the range R. 
When applied to the individual property F, the theorem as a whole thus says the following: If, on tautological data ("a priori"), the probability that all A are B in the range R is not minimal, then the probability of this generalization is greater on the datum that the first n+1 things in the world afford confirming instances of it than on the datum that the first n things afford confirming instances, if and only if the probability that the (n+1)st thing affords a confirming instance is not maximal on the datum that the first n things afford confirming instances. It follows by contraposition that, if this last probability is maximal (1), then the new confirmation of the generalization in the (n+1)st instance does not increase its probability. The new confirmation is, in this sense, irrelevant to the generalization. 6. Now assume that the thing xₙ₊₁ actually does not belong to the range
of relevance R of the generalization that all A are B. In other words, assume that ~Rxₙ₊₁. It is a truth of logic (tautology) that ~Rxₙ₊₁ → (Rxₙ₊₁ → Xxₙ₊₁). Since "X" does not occur in the first antecedent, we can generalize the consequent in "X". It is a truth of logic, too, that ~Rxₙ₊₁ → (X)(Rxₙ₊₁ → Xxₙ₊₁). By definition, ℱₙ₊₁(X) can replace Rxₙ₊₁ → Xxₙ₊₁. Thus it is a truth of logic that ~Rxₙ₊₁ → (X)ℱₙ₊₁(X). From this it follows trivially that ~Rxₙ₊₁ → (X)(Φₙ(X) → ℱₙ₊₁(X)). According to axiom A1 of probability, (X)(Φₙ(X) → ℱₙ₊₁(X)) entails that P(ℱₙ₊₁|Φₙ) = 1, provided that at least one property has the (second-order) property Φₙ. The existential condition is satisfied, since the property R trivially has the property Φₙ: Φₙ(R) means by definition the same as (Rx₁ → Rx₁) & … & (Rxₙ → Rxₙ), which is a tautology. Herewith it has been proved that, if it is the case that ~Rxₙ₊₁, i.e. if the (n+1)st thing in the world does not belong to the range R, then it is also the case that P(ℱₙ₊₁|Φₙ) = 1, i.e. then the probability that this thing will afford a confirmation of any generalization to the effect that something or other is true of all things in this range is maximal. This probability being maximal, the confirmation which is trivially afforded by the thing in question is irrelevant to any such generalization in the sense that it cannot contribute to an increase in its probability. And this constitutes a good ground for saying that a thing which falls outside the range of relevance of a generalization can be said to afford only a "vacuous" or "spurious" or "paradoxical", and not a "genuine", confirmation of the generalization in question. 7. After all these formal considerations we are in a position to answer such questions as this: Is it possible to confirm genuinely the generalization that all ravens are black through the observation, e.g., of black shoes or white swans? 
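A small brute-force model illustrates this result; the 3-thing domain, the range R, and the uniform measure over subsets-as-properties are assumptions for illustration only:

```python
from itertools import combinations

# Toy model of the claim just proved: if thing x_{n+1} is not in R, then
# P(F_{n+1}|Phi_n) = 1 and the extra "confirmation" changes nothing.
DOMAIN = (0, 1, 2)
PROPS = [frozenset(s) for r in range(len(DOMAIN) + 1)
         for s in combinations(DOMAIN, r)]
R = frozenset({0})   # things 1 and 2 lie outside the range of relevance

psi_R = lambda X: R <= X                                  # X universal in R
F = lambda n: (lambda X: DOMAIN[n] not in R or DOMAIN[n] in X)
Phi = lambda n: (lambda X: all(F(k)(X) for k in range(n)))

def P(prop, given):
    pool = [X for X in PROPS if given(X)]
    return sum(prop(X) for X in pool) / len(pool)

# Thing x_2 (index 1) is outside R: its confirmation is maximally probable...
assert P(F(1), Phi(1)) == 1
# ...and therefore cannot raise the probability of the generalization.
assert P(psi_R, Phi(2)) == P(psi_R, Phi(1))
print("out-of-range confirmations are probabilistically irrelevant")
```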
The answer is that this is possible or not, depending upon which is the range of relevance of the generalization, upon what the generalization "is about". If, say, shoes are not within the range of relevance of the generalization that all ravens are black, then shoes cannot afford genuine confirmations of this generalization. This is so, because no truth about shoes can then affect the probability of the generalization that, in the range of relevance in question, all things which are ravens are black. So what then is the range of relevance of the generalization that all ravens are black? Here it should be noted that it is not clear by itself which is the range of relevance of a given generalization such as, e.g., that all ravens are black. Therefore it is not clear either which things will afford genuine and
which only paradoxical confirmations. In order to tell this we shall have to specify the range. Different specifications of the range lead to so many different generalizations, one could say. The generalization that all ravens are black is a different generalization when it is about ravens and ravens only, and when it is about birds and birds only, and when it is, if it ever is, about all things in the world unrestrictedly. As a generalization about ravens, only ravens are relevant to it, and not, e.g., swans. As a generalization about birds, swans are relevant to it, but not, e.g., shoes. And as a generalization about all things, all things are relevant, and this means: of no thing can it then be proved that the confirmation which it affords is maximally probable relative to the bulk of previous confirmations and therefore incapable of increasing the probability of the generalization. When the range of relevance of a generalization of the type that all A are B is not specified, then the range is, I think, usually understood to be the class of things which fall under the antecedent term A. The generalization that all ravens are black, range being unspecified, would normally be understood to be a generalization about ravens, and not about birds or about animals or about everything there is. I shall call the class of things which are A the natural range of relevance of the generalization that all A are B. It would be a mistake to think that, when the range of relevance of a generalization is unspecified, it must be identified with the natural range. If it strikes one as odd or implausible to regard the genus bird, rather than the species raven, as the range of relevance of the generalization that all ravens are black, this is probably due to the fact that the identification of birds as belonging to this or that species is comparatively easy. 
But imagine the case that species of birds were in fact very difficult to distinguish, that it would require careful examination to determine whether an individual bird was a raven or a swan or an eagle. Then the generalization that all birds which are (upon examination turned out to be) ravens are black might be an interesting hypothesis about birds. Perhaps we can imagine circumstances too under which all things, blankets and shoes and what not, would be considered relevant to the generalization that all ravens are black. But these circumstances would be rather extraordinary. (We should have to think of ourselves as beings who quasi put their hands into the universe and draw an object at random.) Only in rare cases, if ever, do we therefore intuitively identify the unspecified range with the whole logical universe of things. It would also be a mistake to think that the range of a generalization must be specified at all. But even when the range is left unspecified we may
THE PARADOXES OF CONFIRMATION
have a rough notion of what belongs to it and what does not, and therefore also a rough idea about which things are relevant to testing (confirming or disconfirming) the generalization. No ornithologist would ever dream of examining shoes in order to test the hypothesis that all ravens are black. But he may think it necessary to examine some birds which look very like ravens, although they turn out actually to belong to some other species. 8. In conclusion I shall say a few words about the alleged conflict between the so-called Nicod Criterion and the Equivalence Condition (cf. above, section 1). The Nicod Criterion, when applied to the generalization that all A are B, says that only things which are both A and B afford genuine confirmations of the generalization. Assume now that the range of relevance of the generalization in question is A, i.e. assume that we are considering this generalization relative to what we have here called its natural range. Then, by virtue of what we have proved (sections 4-6), anything which is not-A cannot afford a genuine confirmation of the generalization. In other words: Within the natural range of relevance of a generalization, the class of genuinely confirming instances is determined by Nicod's Criterion. But is this not in conflict with the Equivalence Condition? This condition, as will be remembered, says that what shall count as a confirming (or disconfirming) instance of a generalization cannot depend upon any particular way of formulating the generalization (out of a number of logically equivalent formulations). Do we wish to deny then that the generalization that all A are B is the same generalization as that all not-B are not-A? We do not wish to deny that "all A are B" as a generalization about things which are A expresses the very same proposition as "all not-B are not-A" as a generalization about things which are A.
Generally speaking: when taken relative to the same range of relevance, the generalization that all A are B and the generalization that all not-B are not-A are the same generalization. But the generalization that all A are B with range of relevance A is a different generalization from the one that all not-B are not-A with range of relevance not-B. If we agree that, range of relevance not being specified, a generalization is normally taken relative to its "natural range", then we should also have to agree that, the ranges not being specified, the forms of words "all A are B" and "all not-B are not-A" normally express different generalizations. The generalizations are different, because their "natural" ranges of relevance are different. This agrees, I believe, with how we naturally tend to understand the two formulations.
G. H. VON WRIGHT
Speaking in terms of ravens: The generalization that all ravens are black, as a generalization about ravens, is different from the generalization that all things which are not black are things which are not ravens, as a generalization about all not-black things. But the generalization that all ravens are black as a generalization about, say, birds is the very same as the generalization that all things which are not black are not ravens as a generalization about birds. (For then "thing which is not black" means "bird which is not black".) Within its natural range of relevance, the generalization that all A are B can become genuinely confirmed only through things which are both A and B, and is "paradoxically" confirmed through things which are B but not A, or neither A nor B. Within its natural range of relevance, the generalization that all not-B are not-A can become genuinely confirmed only through things which are neither A nor B, and is "paradoxically" confirmed through things which are both A and B, or B but not A. Within the natural range of relevance, Nicod's Criterion of confirmation is necessary and sufficient. Within another specified range of relevance R, the generalization that all A are B may become genuinely confirmed also through things which are B but not A, or neither A nor B. And within the same range of relevance R, the class of things which afford genuine confirmations of the generalization that all A are B is identical with the class of things which afford genuine confirmations of the generalization that all not-B are not-A. Thus, in particular, if the range of relevance of both generalizations is all things whatsoever, i.e. the whole logical universe of things of which A and B can be significantly predicated, then everything which affords a confirming instance of the one generalization also affords a confirming instance of the other generalization, and vice versa, all confirmations being "genuine" and none "paradoxical".
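The classification just described can be put schematically. The following Python sketch is our own illustrative encoding (the example objects and labels are hypothetical, not anything from the text): it labels an object's bearing on "all A are B" relative to a chosen range of relevance.

```python
# Illustrative sketch (not from the text): classifying instances of
# "all A are B" relative to a range of relevance R.
def classify(x, is_A, is_B, in_range):
    """Return how object x bears on 'all A are B' within the range."""
    if not in_range(x):
        return "irrelevant"          # outside the range of relevance
    if is_A(x) and not is_B(x):
        return "disconfirming"       # a counter-instance
    if is_A(x) and is_B(x):
        return "genuine"             # Nicod-style confirmation
    return "paradoxical"             # B but not A, or neither A nor B

# Natural range: the ravens themselves.  A black raven confirms genuinely;
# a white shoe is simply irrelevant there, not paradoxical.
is_raven = lambda x: x in {"raven1", "raven2"}
is_black = lambda x: x in {"raven1", "shoe-black"}

assert classify("raven1", is_raven, is_black, is_raven) == "genuine"
assert classify("shoe-white", is_raven, is_black, is_raven) == "irrelevant"
# Widening the range to all things makes the shoe paradoxically confirming:
assert classify("shoe-white", is_raven, is_black, lambda x: True) == "paradoxical"
```

Widening `in_range` from the natural range to the whole universe is exactly the move that lets non-A objects count as (paradoxical) confirmations.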
ASSIGNING PROBABILITIES TO LOGICAL FORMULAS*

DANA SCOTT, Stanford University, Stanford, Calif.,
and
PETER KRAUSS, University of California, Berkeley, Calif.
1. Introduction. Probability concepts nowadays are usually presented in the standard framework of the Kolmogorov axioms. A sample space is given together with a σ-field of subsets (the events) and a σ-additive probability measure defined on this σ-field. When the study turns to such topics as stochastic processes, however, the sample space all but disappears from view. Everyone says "consider the probability that X ≥ 0", where X is a random variable, and only the pedant insists on replacing this phrase by "consider the measure of the set {ω ∈ Ω : X(ω) ≥ 0}". Indeed, when a process is specified, only the distribution is of interest, not a particular underlying sample space. In other words, practice shows that it is more natural in many situations to assign probabilities to statements rather than sets. Now it may be mathematically useful to translate everything into a set-theoretical formulation, but the step is not always necessary or even helpful. In this paper we wish to investigate how probabilities behave on statements, where to be definite we take the word "statement" to mean "formula of a suitable formalized logical calculus". It would be fair to say that our position is midway between that of Carnap and that of Kolmogorov. In fact, we hope that this investigation can eventually make clear the relationships between the two approaches. The study is not at all complete, however. For example, Carnap wishes to emphasize the notion of the degree of confirmation, which is like a conditional probability function. Unfortunately the mathematical theory of general conditional probabilities is not yet in a very good state. We hope in future papers to comment on this problem. Another question concerns the formulation of
* This work was partially supported by grants from the National Science Foundation and the Sloan Foundation.
interesting problems. So many current probability theorems involve expectations and limits that it is not really clear whether consideration of probabilities of formulas alone really goes to the heart of the subject. We do make one important step in this direction, however, by having our probabilities defined on infinitary formulas involving countable conjunctions and disjunctions. In other words, our theory is σ-additive. The main task we have set ourselves in this paper is to carry over the standard concepts from ordinary logic to what might be called probability logic. Indeed ordinary logic is a special case: the assignment of truth values to formulas can be viewed as assigning probabilities that are either 0 (for false) or 1 (for true). In carrying out this program, we were directly inspired by the work of Gaifman [1964], who developed the theory for finitary formulas. Aside from extending Gaifman's work to the infinitary language, we have simplified certain of his proofs, making use of a suggestion of C. Ryll-Nardzewski. Further we have introduced a notion of a probability theory, in analogy with theories formalized in ordinary logic, which we think deserves further study. In Section 2 the logical languages are introduced along with certain syntactical notions. In Section 3 we define probability systems, which generalize relational systems as pointed out by Gaifman. In Section 4 we show how, given a probability system, the probabilities of arbitrary formulas are determined. In Section 5 we discuss model-theoretic constructs involving probability systems. In Section 6 the notion of a probability assertion is defined, which leads to the generalization of the notion of a theory to probability logic. In Section 7 we specialize and strengthen results for the case of finitary formulas. In Section 8 examples are given. An appendix (by Peter Krauss) is devoted to the mathematical details of a proof of a measure-theoretic lemma needed in the body of the paper. 2.
The languages of probability logic. Throughout this paper we will consider two different first-order languages, a finitary language ℒ(ω) and an infinitary language ℒ. To simplify the presentation both languages have an identity symbol = and just one nonlogical constant, a binary predicate R. Most definitions and results carry over with rather obvious modifications to the corresponding languages with other nonlogical constants, and we will occasionally make use of this observation when we give specific examples. The language ℒ(ω) has a denumerable supply of distinct individual variables v_n, for each n < ω, and ℒ has distinct individual variables v_ξ, for each ξ < ω₁, where ω₁ is the first uncountable ordinal. Both languages have logical
constants ∧, ∨, ¬, ∀, ∃, and = standing for (finite) conjunction, disjunction, negation, universal and existential quantification, and identity as mentioned before. In addition the infinitary language ℒ has logical constants ⋀ and ⋁ standing for denumerable conjunction and disjunction respectively. The expressions of ℒ are defined as transfinite concatenations of symbols of length less than ω₁, and the formulas of ℒ(ω) and ℒ are built from atomic formulas of the forms Rv_ξv_η and v_ξ = v_η in the normal way by means of the sentential connectives and the quantifiers. Free and bound occurrences of variables in formulas are defined in the well-known way. (For a more explicit description of infinitary languages see the monograph Karp [1964].) A sentence is a formula without free variables. We will augment the nonlogical vocabulary of our languages with various sets T of new individual constants t ∈ T and denote the resulting languages by ℒ(ω)(T) and ℒ(T) respectively. It is then clear what the formulas and sentences of ℒ(ω)(T) and ℒ(T) are. For any set T of new individual constants let 𝒮 and 𝒮(T) be the sets of sentences of ℒ and ℒ(T) respectively, and let 𝒟(T) be the set of quantifier-free sentences of ℒ(T). We adopt analogous definitions for the language ℒ(ω). If Σ is a set of sentences and φ is a sentence, then φ is a consequence of Σ if φ holds in all models in which all sentences of Σ hold, and we write Σ ⊨ φ. φ is valid if it is a consequence of the empty set, and we write ⊨ φ. For both languages ℒ(ω) and ℒ we choose standard systems of deduction, and we write Σ ⊢ φ if φ is derivable from Σ. φ is a theorem if φ is derivable from the empty set, and we write ⊢ φ. (For details concerning the infinitary language we again refer the reader to Karp [1964].) By the well-known Completeness Theorem of finitary first-order logic we have, for every Σ ⊆ 𝒮(ω) and every φ ∈ 𝒮(ω):
Σ ⊨ φ iff Σ ⊢ φ. Two sentences φ and ψ are provably equivalent if ⊢ φ ↔ ψ; we write φ/⊢ for the equivalence class of φ, and 𝒮(T)/⊢ and 𝒟(T)/⊢ for the corresponding quotient algebras.

3. Probability systems. We start with the definition of a concept which corresponds to the notion of a relational system in ordinary logic. Recall that if 𝒜 is a Boolean σ-algebra, then a probability on 𝒜 is a σ-additive probability measure on 𝒜; a finitely additive probability on 𝒜 is a finitely additive probability measure on 𝒜. For a detailed discussion of these concepts see Halmos [1963] and Sikorski [1964].

DEFINITION. A probability system (or sometimes, a probability model) is a quintuple 𝔄 = ⟨A, R, Id, 𝒜, m⟩, where A is a nonempty set, ⟨𝒜, m⟩ is a measure algebra (m a strictly positive probability on the Boolean σ-algebra 𝒜), and Id and R are functions from A × A into 𝒜, Id having the substitution property of identity. 𝔄 has strict identity if Id(a, b) = 1 whenever a = b and Id(a, b) = 0 whenever a ≠ b.

Most concrete examples of probability systems we have encountered have strict identity. However some intuitively very suggestive model constructions, such as the ultraproduct construction and symmetric probability systems which will be discussed in Section 5, lead beyond the realm of probability systems with strict identity. For this reason we thought it advisable to introduce the more general notion of a probability system. If 𝔄 = ⟨A, R, Id, 𝒜, m⟩ is a probability system, define for a, b ∈ A:

a ∼ b iff Id(a, b) = 1.

The substitution property of Id implies that ∼ is a congruence relation on 𝔄. The cardinality of 𝔄, denoted by |𝔄|, is defined to be the cardinality of the set of equivalence classes with respect to ∼. If 𝔄 is a probability system with strict identity, then |𝔄| is the cardinality of the set A. More generally, for any subset A′ ⊆ A the system-cardinality of A′, denoted by |A′|_𝔄, is defined to be the cardinality of the set of equivalence classes with respect to ∼ which have a nonempty intersection with A′.

4. Probability interpretations. We now interpret the language ℒ in probability systems and give a definition of the concept "a sentence φ holds in a probability system 𝔄 with probability a", where 0 ≤ a ≤ 1 is a real number. The definition could be given in the traditional way using an analogue of Tarski's concept of satisfaction; however, in the context of probability logic it seems to be more appropriate to use the equally well-known device of new individual constants. Let 𝔄 = ⟨A, R, Id, 𝒜, m⟩ be a probability system, and let T_𝔄 = {t_a : a ∈ A} be a set of distinct new individual constants, one t_a for each a ∈ A. The valuation h from 𝒮(T_𝔄) into 𝒜 is defined by the following recursion:
(i) h(t_a = t_b) = Id(a, b),
(ii) h(R t_a t_b) = R(a, b),
(iii) h(¬φ) = 1 − h(φ),
(iv) h(⋁_{i<ξ} φ_i) = ⋁_{i<ξ} h(φ_i),
(v) h(⋀_{i<ξ} φ_i) = ⋀_{i<ξ} h(φ_i),
(vi) h(∃v φ) = ⋁_{a∈A} h(φ(t_a)),
(vii) h(∀v φ) = ⋀_{a∈A} h(φ(t_a)).
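To make the recursion concrete, here is a small Python sketch (ours, not the paper's formalism): a finite two-individual system in which the Boolean algebra is the full power set of a four-point sample space, so the clauses for ¬, ∨, ∧, ∃ and ∀ become complements, unions, intersections and finite unions/intersections over the domain. The relation values R(a, b) are stipulated toy events.

```python
# A finite toy model of the valuation h (a sketch, not the paper's definition):
# events are subsets of a finite sample space Omega, and mu(phi) = m(h(phi)).
from fractions import Fraction

Omega = [0, 1, 2, 3]                       # four equiprobable sample points
m = lambda event: Fraction(len(event), len(Omega))
A = ["a", "b"]                             # the individuals of the system

# R(x, y) is an event: the sample points at which R holds of (x, y).
R = {("a", "a"): {0, 1}, ("a", "b"): {0, 2},
     ("b", "a"): {1, 2}, ("b", "b"): {0, 1, 2, 3}}

def h(phi):
    """Valuation: formulas (nested tuples) -> events (sets)."""
    op = phi[0]
    if op == "R":   return set(R[(phi[1], phi[2])])          # atomic clause
    if op == "not": return set(Omega) - h(phi[1])            # h(not phi) = 1 - h(phi)
    if op == "or":  return h(phi[1]) | h(phi[2])
    if op == "and": return h(phi[1]) & h(phi[2])
    if op == "ex":                         # union over the domain A
        return set().union(*(h(phi[1](a)) for a in A))
    if op == "all":                        # intersection over the domain A
        e = set(Omega)
        for a in A:
            e &= h(phi[1](a))
        return e
    raise ValueError(op)

mu = lambda phi: m(h(phi))                 # mu(phi) = m(h(phi))

# Exists v R(v, v): holds on {0,1} union {0,1,2,3} = Omega, probability 1.
print(mu(("ex", lambda a: ("R", a, a))))           # 1
# For all v R("a", v): {0,1} intersect {0,2} = {0}, probability 1/4.
print(mu(("all", lambda a: ("R", "a", a))))        # 1/4
```

Because the sample space is finite, the suprema in clauses (vi) and (vii) reduce to finite unions and intersections, which is what makes this toy version exact.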
The following lemma is well-known and easy to prove:
LEMMA 4.1. (i) For all φ, ψ ∈ 𝒮(T_𝔄), if ⊢ φ ↔ ψ, then h(φ) = h(ψ); (ii) h induces a σ-homomorphism from 𝒮(T_𝔄)/⊢ into 𝒜.
Proof: By the definition of h it suffices to prove that for all φ ∈ 𝒮(T_𝔄), if ⊢ φ then h(φ) = 1. This can be done by considering a standard system of deduction for ℒ(T_𝔄), and showing that h maps all axioms into 1 and that the property of being mapped into 1 is preserved under all rules of deduction. For a more detailed presentation see Karp [1964]. (ii) is an immediate consequence of (i).

We identify h with the induced homomorphism and define μ_𝔄(φ/⊢) = m(h(φ)) for all φ ∈ 𝒮(T_𝔄). Then μ_𝔄 is a probability on 𝒮(T_𝔄)/⊢. (This is a well-known fact in measure theory, and a proof can be found in Halmos [1963] p. 66.) Since hardly any confusion could arise we write μ_𝔄(φ) for μ_𝔄(φ/⊢), and we read "μ_𝔄(φ) = a" as "φ holds in the probability system 𝔄 with probability a". If 𝔄 is a probability system with strict identity and m is two-valued, then for every φ ∈ 𝒮(T_𝔄), μ_𝔄(φ) = 1 iff φ holds in the model ⟨A, R⟩.
THEOREM 4.2. For every ∃vφ ∈ 𝒮(T_𝔄),

(G)  μ_𝔄(∃vφ) = sup_{F ∈ A(ω)} μ_𝔄(⋁_{a∈F} φ(t_a)),

where A(ω) is the set of all finite subsets of A.
Proof: μ_𝔄(∃vφ) = m(h(∃vφ)) = m(⋁_{a∈A} h(φ(t_a))). 𝒜 is a measure algebra and therefore satisfies the countable chain condition (see Halmos [1963] p. 67). Thus there exists a countable subset A′ ⊆ A such that

⋁_{a∈A} h(φ(t_a)) = ⋁_{a∈A′} h(φ(t_a)).

Thus

μ_𝔄(∃vφ) = m(⋁_{a∈A′} h(φ(t_a))) = m(h(⋁_{a∈A′} φ(t_a))) = μ_𝔄(⋁_{a∈A′} φ(t_a)).

Since μ_𝔄 is σ-additive,

μ_𝔄(⋁_{a∈A′} φ(t_a)) = sup_{F ∈ A′(ω)} μ_𝔄(⋁_{a∈F} φ(t_a)).

Condition (G) now follows from the choice of A′.
We now present the preceding ideas from a slightly different point of view
to see that the probability system 𝔄 may be identified with the restriction of the probability μ_𝔄 to 𝒟(T_𝔄)/⊢. We first observe that from the definition of h and the countable chain condition in 𝒜 it follows that for the purpose of our probability interpretation we may assume that 𝒜 is the σ-algebra generated by the union of the images of A × A under Id and R respectively. If the definition of h is restricted to clauses (i)-(v), then obviously Lemma 4.1 holds with 𝒮(T_𝔄) replaced by 𝒟(T_𝔄). Since m is strictly positive on 𝒜, we have h(φ) = 0 iff μ_𝔄(φ) = 0. Thus the quotient algebra of 𝒟(T_𝔄)/⊢ modulo the σ-ideal {φ/⊢ ∈ 𝒟(T_𝔄)/⊢ : μ_𝔄(φ) = 0} is isomorphic to 𝒜, and it is a well-known fact that the probability m on 𝒜 may be uniquely recaptured from the probability μ_𝔄 on 𝒟(T_𝔄)/⊢. (See, e.g., Halmos [1963] pp. 64ff.) Thus the probability system 𝔄 is, up to the obvious isomorphism, determined by the ordered pair ⟨T_𝔄, μ_𝔄⟩, where μ_𝔄 is restricted to 𝒟(T_𝔄)/⊢. In general any ordered pair ⟨T, μ⟩, where T is a set of new individual constants and μ is a probability on 𝒟(T)/⊢, uniquely determines a probability system 𝔄. Indeed let A = T, let 𝒜 be the quotient algebra of 𝒟(T)/⊢ modulo the σ-ideal {φ/⊢ : φ ∈ 𝒟(T), μ(φ) = 0}, let m be the probability on 𝒜 induced by μ, and let Id(t, t′) and R(t, t′) be the images of t = t′/⊢ and R(t, t′)/⊢ under the canonical homomorphism of 𝒟(T)/⊢ onto 𝒜. Then 𝔄 = ⟨A, R, Id, 𝒜, m⟩ clearly is a probability system; it is easy to check that the valuation homomorphism h is the canonical homomorphism, and μ is the restriction of μ_𝔄 to 𝒟(T)/⊢. Moreover, if μ(t = t′) = 0 for all t, t′ ∈ T where t ≠ t′, then 𝔄 has strict identity. Thus we may also regard a probability system as an ordered pair ⟨T, m⟩, where T is a set of new individual constants, and m is a probability on 𝒟(T)/⊢. The probability systems with strict identity are then characterized by the condition m(t = t′) = 0 for all t, t′ ∈ T where t ≠ t′.
This is the form in which Gaifman [1964] introduces the concept of a probability model and, whenever convenient, we will also adopt this terminology. From this new point of view we have the following extension theorem:

THEOREM 4.3. Let ⟨T, m⟩ be a probability system. Then there exists a unique probability m* on 𝒮(T)/⊢ which extends m and satisfies the Gaifman Condition: whenever ∃vφ ∈ 𝒮(T), then

(G)  m*(∃vφ) = sup_{F ∈ T(ω)} m*(⋁_{t∈F} φ(t)),

where T(ω) is the set of all finite subsets of T.
Proof: The existence of m* is clear from our considerations above. The uniqueness of the extension will be proved by transfinite induction. During the course of our proof we will make use of analogues of Lemma 7.9, which will be established separately, and of course independently, for the finitary language ℒ(ω)(T) in Section 7 of this paper. For every ordinal ξ < ω₁ we shall define sets of sentences 𝒟_ξ(T) ⊆ 𝒮_ξ(T) ⊆ 𝒮(T) by recursion: First let 𝒟₀(T) = 𝒟(T). Then if ξ > 0, let 𝒟_ξ(T) be the closure of ⋃_{η<ξ} 𝒮_η(T) under denumerable propositional combinations. For every ξ < ω₁, let 𝒮_ξ(T) be the closure of 𝒟_ξ(T) under quantification and finite propositional combinations. Then obviously 𝒟_η(T) ⊆ 𝒟_ξ(T) and 𝒮_η(T) ⊆ 𝒮_ξ(T) whenever η < ξ < ω₁, and

𝒮(T) = ⋃_{ξ<ω₁} 𝒟_ξ(T) = ⋃_{ξ<ω₁} 𝒮_ξ(T).

Now suppose μ₁ and μ₂ are both σ-additive probability measures on 𝒮(T)/⊢ which extend m and satisfy condition (G). We shall prove by transfinite induction that for every ξ < ω₁ and every φ ∈ 𝒮_ξ(T), μ₁(φ) = μ₂(φ). If ξ > 0, first observe that (⋃_{η<ξ} 𝒮_η(T))/⊢ is a subalgebra of 𝒮(T)/⊢ that σ-generates 𝒟_ξ(T)/⊢. By way of an induction hypothesis, we assume that μ₁(φ) = μ₂(φ) for all φ ∈ ⋃_{η<ξ} 𝒮_η(T). Since μ₁ and μ₂ are σ-additive measures, we conclude by a well-known extension theorem of measure theory (see Halmos [1950] p. 54) that μ₁(φ) = μ₂(φ) for all φ ∈ 𝒟_ξ(T). For φ ∈ 𝒮_ξ(T) the quantifier steps are then handled by means of condition (G) and the analogues of Lemma 7.9, which completes the induction.
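Numerically, the Gaifman Condition (G) can be watched at work in a toy case (our illustration, not from the paper): take countably many constants t_0, t_1, … and let the φ(t_k) be pairwise incompatible events with m(φ(t_k)) = 2^(−(k+1)). The finite disjunctions then increase to the value 1 that (G) assigns to ∃vφ.

```python
# Sketch: the supremum in condition (G) over finite subsets F = {t_0,...,t_{n-1}}.
# With m(phi(t_k)) = 2^-(k+1) and the phi(t_k) pairwise incompatible,
# m(disjunction over F) = 1 - 2^-n, whose supremum over n is 1.
from fractions import Fraction

def m_finite_disjunction(n):
    # probability of the disjunction of phi(t_0), ..., phi(t_{n-1})
    return sum(Fraction(1, 2 ** (k + 1)) for k in range(n))

values = [m_finite_disjunction(n) for n in range(1, 6)]
print(values)   # [Fraction(1, 2), Fraction(3, 4), Fraction(7, 8), Fraction(15, 16), Fraction(31, 32)]
```

No finite disjunction reaches probability 1, which is why (G) must be stated with a supremum rather than a maximum.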
The measure-theoretic lemma proved in the appendix of this paper, which renders the existence part of Theorem 4.3 almost trivial, was suggested to us by Professor C. Ryll-Nardzewski. It is now clear how to define various probability-model-theoretic concepts in analogy to the standard concepts of ordinary model theory. We will discuss a few examples in the next section.

5. Model-theoretic concepts in the theory of probability systems. If 𝔄 and 𝔅 are probability systems, the notions (i) of a subsystem, (ii) of an ℒ-subsystem, and (iii) of ℒ-equivalence can be defined in analogy with ordinary model theory.

Remark: The concepts defined in (i), (ii) and (iii) correspond to the concepts of subsystem, ℒ-subsystem and ℒ-equivalence respectively in ordinary model theory. Not many interesting results concerning these concepts are known for the infinitary language ℒ, a phenomenon which the probability-model theory of ℒ seems to share with the ordinary model theory of ℒ. In many cases the authors have been able to establish for probability logic analogues of major results known from ordinary logic; this is particularly true for the finitary language ℒ(ω), for which several results have already been published by Gaifman [1964]. We present next a few standard constructions for probability systems.

Independent Unions. Let I be an index set. For each i ∈ I, let ℒ_i be the infinitary language whose only nonlogical constant is the binary predicate R_i, and let 𝒮_i be its set of sentences. Let T be a set of new individual constants. For each i ∈ I, let ⟨T, m_i⟩ be a probability system, where m_i is a probability on 𝒟_i(T)/⊢, and let

m = ∏_{i∈I} m_i

be the product measure on 𝒟(T)/⊢ (𝒟(T) taken in the language with all the predicates R_i) induced by the family {m_i : i ∈ I}.
Then we define the independent union of the family of probability systems {⟨T, m_i⟩ : i ∈ I} to be the probability system ⟨T, m⟩.
We note two corollaries of the construction given with this definition:

COROLLARY 5.1. For every i ∈ I and φ ∈ 𝒮_i(T), m*(φ) = m_i*(φ).
Proof: We argue by transfinite induction along the same lines as the uniqueness part of the proof of Theorem 4.3. Let i ∈ I, and for every ξ < ω₁ define sets of sentences 𝒟_{i,ξ}(T) ⊆ 𝒮_{i,ξ}(T) ⊆ 𝒮_i(T), as in the proof of Theorem 4.3. If φ ∈ 𝒟_{i,0}(T) then m*(φ) = m_i(φ) by the definition of m. The rest of the induction is carried out as in the proof of Theorem 4.3.
We state a simple fact about product measures. Let X, Y be sets; let 𝒜₀, ℬ₀ be fields of subsets of X, Y respectively; let 𝒜, ℬ be the σ-fields generated by 𝒜₀, ℬ₀ respectively; and let μ, ν be probabilities on 𝒜, ℬ respectively. Let 𝒜 × ℬ be the product σ-field of 𝒜 and ℬ. Then we have:

LEMMA 5.2. If λ is a probability on 𝒜 × ℬ such that λ(A × B) = μ(A)·ν(B) for all A ∈ 𝒜₀, B ∈ ℬ₀, then λ(A × B) = μ(A)·ν(B) for all A ∈ 𝒜, B ∈ ℬ.
Proof: Let 𝒜₀ × ℬ₀ be the field of subsets of X × Y generated by rectangles A × B, where A ∈ 𝒜₀, B ∈ ℬ₀. Then the condition λ(A × B) = μ(A)·ν(B) determines the probability λ on 𝒜₀ × ℬ₀. 𝒜 × ℬ is σ-generated by 𝒜₀ × ℬ₀. Thus this condition determines λ on 𝒜 × ℬ. The product measure μ × ν on 𝒜 × ℬ agrees with λ on 𝒜₀ × ℬ₀. Thus λ = μ × ν, which proves the assertion.
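A finite sanity check of the rectangle condition in Lemma 5.2 (our own toy measures on two small finite sets; the σ-field is the full power set here, so the generated-field subtleties of the lemma do not arise):

```python
# Sketch: lambda(A x B) = mu(A) * nu(B) on rectangles determines the product
# measure on all subsets of X x Y in the finite case.
from fractions import Fraction
from itertools import product

X, Y = ["x0", "x1"], ["y0", "y1", "y2"]
mu = {"x0": Fraction(1, 4), "x1": Fraction(3, 4)}
nu = {"y0": Fraction(1, 2), "y1": Fraction(1, 3), "y2": Fraction(1, 6)}

# Product measure on the points of X x Y.
lam = {(x, y): mu[x] * nu[y] for x, y in product(X, Y)}

def measure(event):
    """Measure of an arbitrary subset of X x Y under the product measure."""
    return sum(lam[p] for p in event)

A, B = {"x1"}, {"y0", "y2"}
rect = {(x, y) for x in A for y in B}
print(measure(rect), mu["x1"] * (nu["y0"] + nu["y2"]))   # 1/2 1/2
```

In the infinite case the point of the lemma is precisely that agreement on the generating rectangles propagates, via the uniqueness of measure extensions, to the whole product σ-field.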
COROLLARY 5.3. For every n < ω, let i_n ∈ I and let φ_n ∈ 𝒮_{i_n}(T). Then

m*(⋀_{n<ω} φ_n) = ∏_{n<ω} m*(φ_n).

Proof: The assertion follows from the continuity of m* if we can establish: If n < ω, i₀, …, i_{n−1} ∈ I and φ_k ∈ 𝒮_{i_k}(T) for all k < n, then m*(⋀_{k<n} φ_k) = ∏_{k<n} m*(φ_k). As in the proof of Theorem 4.3, for every ξ < ω₁ and k < n we define sets of sentences 𝒟_{i_k,ξ}(T) ⊆ 𝒮_{i_k,ξ}(T) ⊆ 𝒮_{i_k}(T). If φ_k ∈ 𝒮_{i_k}(T) for all k < n, then there exists ξ < ω₁ such that φ_k ∈ 𝒮_{i_k,ξ}(T) for all k < n. Accordingly we prove by transfinite induction: For every ξ < ω₁, if φ_k ∈ 𝒮_{i_k,ξ}(T) for all k < n, then m*(⋀_{k<n} φ_k) = ∏_{k<n} m*(φ_k). First, if φ_k ∈ 𝒟_{i_k,0}(T) for all k < n, then the assertion holds by the definition of m*. If φ_k ∈ 𝒮_{i_k,0}(T) for all k < n, then for every k < n we may write φ_k in prenex normal form Q_k M_k, where Q_k is a string of quantifiers and M_k is a 𝒟_{i_k,0}(T)-matrix of φ_k, which means every substitution instance of M_k belongs to 𝒟_{i_k,0}(T). It is now easy to see that by an analogue of Lemma 7.10, straightforward computations with sup's and inf's, and the fact that we have established the assertion for φ_k ∈ 𝒟_{i_k,0}(T), k < n, we obtain m*(⋀_{k<n} Q_k M_k) = ∏_{k<n} m*(Q_k M_k). We omit the cumbersome details and illustrate the idea with a simple example. By an analogue of Lemma 7.10,

m*(∃v₀M₀(v₀) ∧ ∀v₁M₁(v₁)) = sup_{F₀} inf_{F₁} m*[(⋁_{t₀∈F₀} M₀(t₀)) ∧ (⋀_{t₁∈F₁} M₁(t₁))]
  = sup_{F₀} inf_{F₁} [m*(⋁_{t₀∈F₀} M₀(t₀)) · m*(⋀_{t₁∈F₁} M₁(t₁))]
  = sup_{F₀} m*(⋁_{t₀∈F₀} M₀(t₀)) · inf_{F₁} m*(⋀_{t₁∈F₁} M₁(t₁)),

because ⋁_{t₀∈F₀} M₀(t₀) ∈ 𝒟_{i₀,0}(T) and ⋀_{t₁∈F₁} M₁(t₁) ∈ 𝒟_{i₁,0}(T), and those are formulas for which the assertion has already been established. Next assume ξ > 0 and that the assertion holds for all ordinals smaller than ξ. First suppose φ_k ∈ 𝒟_{i_k,ξ}(T) for all k < n. Remember that for every k < n, (⋃_{η<ξ} 𝒮_{i_k,η}(T))/⊢ is a subalgebra of 𝒮_{i_k}(T)/⊢ and σ-generates 𝒟_{i_k,ξ}(T)/⊢. Now suppose ψ_k ∈ ⋃_{η<ξ} 𝒮_{i_k,η}(T) for all k < n. Then for some η < ξ, ψ_k ∈ 𝒮_{i_k,η}(T) for all k < n. Thus, by the inductive hypothesis, m*(⋀_{k<n} ψ_k) = ∏_{k<n} m*(ψ_k). By an n-dimensional version of Lemma 5.2, m*(⋀_{k<n} φ_k) = ∏_{k<n} m*(φ_k). In the general case where φ_k ∈ 𝒮_{i_k,ξ}(T) for all k < n, we proceed in the familiar fashion using prenex normal forms and Lemma 7.10, as in the second part of the case ξ = 0. This completes the proof of Corollary 5.3.

The construction of independent unions is particularly valuable for the introduction into a given probability system of "a priori conditions" such as ordinary relational structures. For example, we may consider a probability system 𝔄 =
It is natural to ask for the definition of an analogue of the direct product of ordinary relational systems; however, a reasonable, natural generalization of this construction for probability systems does not seem to exist. On the other hand we are able to give an intuitively very suggestive definition of an analogue of the ultraproduct construction of relational systems.

Ultraproducts. Consider again our language ℒ with one binary predicate R. Let I be an index set. For each i ∈ I, let T_i be a set of new individual constants, and let ⟨T_i, m_i⟩ be a probability system, where m_i is a probability on 𝒟(T_i)/⊢. Let T = ∏_{i∈I} T_i be the Cartesian product of the family of sets {T_i : i ∈ I}. For φ ∈ 𝒮(T) and i ∈ I, let φ|i be the projection of φ onto the i-th coordinate; that is, replace in φ every t ∈ T by t_i ∈ T_i. Then for every φ ∈ 𝒮(T) and i ∈ I, ⊢ φ implies ⊢ φ|i. Finally let λ be a probability on the power set of I. Define for all φ ∈ 𝒟(T) a function m by the equation

m(φ) = ∫_I m_i(φ|i) dλ(i).
LEMMA 5.4. (i) For every φ, ψ ∈ 𝒟(T), if ⊢ φ ↔ ψ, then m(φ) = m(ψ). (ii) For every φ ∈ 𝒟(T), if ⊢ φ, then m(φ) = 1. (iii) m, regarded as a function on 𝒟(T)/⊢, is a probability.
Proof: (i) and (ii) are trivial. Thus m may indeed be regarded as a function on 𝒟(T)/⊢, and it suffices to prove σ-additivity. Suppose φ_n ∈ 𝒟(T) for all n < ω, and ⊢ ¬(φ_m ∧ φ_n) for all m ≠ n. Then ⊢ ¬(φ_m|i ∧ φ_n|i) for all m ≠ n and all i ∈ I. Thus for all i ∈ I

m_i(⋁_{n<ω} φ_n|i) = Σ_{n<ω} m_i(φ_n|i).

Therefore by the Dominated Convergence Theorem

m(⋁_{n<ω} φ_n) = ∫_I m_i(⋁_{n<ω} φ_n|i) dλ(i) = ∫_I Σ_{n<ω} m_i(φ_n|i) dλ(i)
  = Σ_{n<ω} ∫_I m_i(φ_n|i) dλ(i) = Σ_{n<ω} m(φ_n).
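For a finite index set the defining integral is just a λ-weighted sum, and the additivity argument of Lemma 5.4 can be checked on a toy example (the component values m_i(φ|i) below are stipulated stand-ins of our own, not anything derived from the paper):

```python
# Sketch of the mixing construction m(phi) = integral of m_i(phi|i) d lambda(i)
# over a finite index set, where the integral reduces to a weighted sum.
from fractions import Fraction

I = ["i0", "i1"]
lam = {"i0": Fraction(1, 3), "i1": Fraction(2, 3)}     # lambda, a probability on I

# Each m_i assigns probabilities to (projections of) two disjoint sentences
# and their disjunction; the numbers are consistent toy stipulations.
m_component = {
    "i0": {"phi0": Fraction(1, 2), "phi1": Fraction(1, 4),
           "phi0 v phi1": Fraction(3, 4)},
    "i1": {"phi0": Fraction(1, 8), "phi1": Fraction(1, 2),
           "phi0 v phi1": Fraction(5, 8)},
}

def m(phi):
    """m(phi) = sum over i of lambda(i) * m_i(phi|i)."""
    return sum(lam[i] * m_component[i][phi] for i in I)

# Additivity over disjoint sentences survives the mixing, as in Lemma 5.4:
print(m("phi0") + m("phi1"), m("phi0 v phi1"))   # 2/3 2/3
```

The same integral-as-mixture picture is what makes the dominated convergence step in the lemma the natural tool in the countable case.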
We define the ultraproduct with respect to λ of the family of probability systems {⟨T_i, m_i⟩ : i ∈ I} to be the probability system ⟨T, m⟩.

COROLLARY 5.5. For all φ ∈ 𝒮(T),

m*(φ) = ∫_I m_i*(φ|i) dλ(i).

Proof: Define μ(φ) = ∫_I m_i*(φ|i) dλ(i) for all φ ∈ 𝒮(T). By the same argument as in Lemma 5.4, μ is a probability on 𝒮(T)/⊢. Clearly μ extends m. By Theorem 4.3 it suffices to prove that μ satisfies condition (G). Let ∃vφ ∈ 𝒮(T). For every i ∈ I,

m_i*(∃vφ|i) = sup_{F ∈ T_i(ω)} m_i*(⋁_{t∈F} (φ|i)(t)).

Therefore for every i ∈ I there are t_{i,k} ∈ T_i, k < ω, such that

m_i*(∃vφ|i) = lim_{n→∞} m_i*(⋁_{k<n} (φ|i)(t_{i,k})).

For n < ω let s_n ∈ T be such that s_n|i = t_{i,n} for every i ∈ I. Then for every i ∈ I,

m_i*(∃vφ|i) = lim_{n→∞} m_i*(⋁_{k<n} φ(s_k)|i).

Thus, by the Monotone Convergence Theorem,

μ(∃vφ) = ∫_I m_i*(∃vφ|i) dλ(i) = ∫_I lim_{n→∞} m_i*(⋁_{k<n} φ(s_k)|i) dλ(i)
  = lim_{n→∞} ∫_I m_i*(⋁_{k<n} φ(s_k)|i) dλ(i) = lim_{n→∞} μ(⋁_{k<n} φ(s_k))
  = μ(⋁_{k<ω} φ(s_k)) = sup_{F ∈ T(ω)} μ(⋁_{t∈F} φ(t)).
Remark: Let t, t′ ∈ T and let J = {i ∈ I : t_i = t_i′}. If for every i ∈ I the system ⟨T_i, m_i⟩ has strict identity, then m(t = t′) = λ(J); hence the ultraproduct of probability systems with strict identity need not itself have strict identity.
A function π ∈ T^T is a permutation of T if π is one-to-one and onto. A permutation π is finite if π(t) = t for all but a finite number of t ∈ T. Following a suggestion of Gaifman [1964] we call a probability system ⟨T, m⟩ symmetric if m(φ) = m(φ^π) for every φ ∈ 𝒟(T) and every finite permutation π of T, where φ^π results from φ by replacing every t by π(t).
THEOREM 5.6. Let ⟨T, m⟩ be a symmetric probability system. Then for every permutation π of T and every φ ∈ 𝒮(T), m*(φ) = m*(φ^π).
Proof: We argue by transfinite induction, as in the proof of Theorem 4.3. For φ ∈ 𝒟₀(T) the assertion holds by the symmetry of ⟨T, m⟩. If φ ∈ 𝒮₀(T), we write φ in prenex normal form QM with a 𝒟₀(T)-matrix M, and it follows from the analogues of Lemma 7.9 and the properties of sup's and inf's that we obtain m*(QM) = m*(QM^π). We illustrate the argument with a simple example. By an analogue of Lemma 7.9,

m*(∃v₀∀v₁M(v₀, v₁)) = sup_{F₀∈T(ω)} inf_{F₁∈T(ω)} m*(⋁_{t₀∈F₀} ⋀_{t₁∈F₁} M(t₀, t₁)).

Now ⋁_{t₀∈F₀} ⋀_{t₁∈F₁} M(t₀, t₁) ∈ 𝒟₀(T), and the assertion has already been established for these formulas, so we have for every F₀, F₁ ∈ T(ω)

m*(⋁_{t₀∈F₀} ⋀_{t₁∈F₁} M(t₀, t₁)) = m*(⋁_{t₀∈F₀} ⋀_{t₁∈F₁} M^π(π(t₀), π(t₁))).

Finally, since π is onto,

m*(∃v₀∀v₁M(v₀, v₁)) = sup_{F₀∈T(ω)} inf_{F₁∈T(ω)} m*(⋁_{t₀∈F₀} ⋀_{t₁∈F₁} M^π(t₀, t₁)) = m*(∃v₀∀v₁M^π(v₀, v₁)).

Next, assume ξ > 0 and that the assertion holds for all ordinals smaller than ξ. Recall that (⋃_{η<ξ} 𝒮_η(T))/⊢ is a subalgebra of 𝒮(T)/⊢ that σ-generates 𝒟_ξ(T)/⊢. By the inductive hypothesis m*(φ) = m*(φ^π) for every φ ∈ ⋃_{η<ξ} 𝒮_η(T). Let E be the set of φ ∈ 𝒟_ξ(T) for which the assertion holds. If φ_n ∈ E and ⊢ φ_n → φ_{n+1} for all n < ω, then

m*(⋁_{n<ω} φ_n) = lim_{n→∞} m*(φ_n) = lim_{n→∞} m*(φ_n^π) = m*(⋁_{n<ω} φ_n^π).
A similar argument for decreasing sequences proves that E is monotone, and therefore by a well-known fact about monotone classes (see Halmos [1950], p. 27), E = 𝒟_ξ(T). If φ ∈ 𝒮_ξ(T), we proceed as in the second part of the case ξ = 0.

In a sense the symmetric probability systems are diametrically opposite to ordinary relational systems. In ordinary relational systems the probability is as concentrated as possible; in symmetric probability systems it is completely dispersed. The condition of symmetry is a severe restriction on a probability system, as an example in Section 6 will demonstrate. This completes our discussion of probability-model-theoretic concepts, and we now turn to the analogue in probability logic of theories in ordinary logic.

6. Probability assertions. In ordinary logic a theory of ℒ is any subset of 𝒮 closed under deduction. In probability logic we first have to define the concept of probability assertions, which play the role of sentences (or, better, axioms and theorems) in ordinary logic. For this purpose we introduce a new language ℳ, the first-order language of real algebra. ℳ has denumerably many distinct individual variables λ_n, n < ω. The nonlogical constants of ℳ are a binary predicate ≤, binary function symbols + and ·, and individual constants 0, +1 and −1. The logical constants of ℳ are ∧, ∨, ¬, ∀ and ∃, standing for (finite) conjunction, disjunction, negation, universal and existential quantification respectively. Formulas and sentences of ℳ are defined as usual. ℳ is to be interpreted in the real numbers in the standard way with the obvious meaning being given to the symbols. Let Re denote the set of real numbers, and say that ℳ is interpreted in the relational system ℜ = ⟨Re, ≤, +, ·, 0, +1, −1⟩.
The set of sentences of ℳ true in ℜ is called the set of theorems of real algebra. An algebraic formula is a quantifier-free formula of ℳ. Every algebraic formula is equivalent in real algebra to a disjunction (conjunction) of conjunctions (disjunctions) of polynomial inequalities of the form p ≥ 0 or p > 0, where p is a polynomial with integral coefficients. We call an algebraic formula closed (open) if it is equivalent to a disjunction of conjunctions of polynomial inequalities of the form p ≥ 0 (p > 0). It is obvious that in this definition we could have used the conjunctive instead of the disjunctive normal form. We now make several definitions. A probability assertion of ℒ is an (n+1)-tuple ⟨Φ, φ₀, …, φ_{n−1}⟩, where n < ω, Φ is an algebraic formula with exactly n free variables and φ₀, …, φ_{n−1} ∈ 𝒮. A probability assertion is called closed (open) if the algebraic formula is closed (open). A probability system 𝔄 satisfies the probability assertion ⟨Φ, φ₀, …, φ_{n−1}⟩ if Φ holds in ℜ when each variable λ_k is assigned the value μ_𝔄(φ_k).
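As an illustration (our own encoding; the predicate Φ and the sentence names below are hypothetical), satisfying a probability assertion ⟨Φ, φ₀, …, φ_{n−1}⟩ amounts to evaluating the algebraic formula at the probabilities μ_𝔄(φ_k):

```python
# Sketch: checking satisfaction of a probability assertion.  Phi is encoded
# as a Python predicate on n real arguments, standing in for an algebraic
# formula with free variables lambda_0, ..., lambda_{n-1}.
from fractions import Fraction

def satisfies(Phi, mu, sentences):
    """Substitute mu(phi_k) for lambda_k in Phi and evaluate in the reals."""
    return Phi(*(mu(s) for s in sentences))

# Toy probabilities for two sentence names (stipulated, not derived).
mu = {"p": Fraction(1, 3), "q": Fraction(1, 2)}.get

# A closed assertion: 2*lambda_0 - lambda_1 >= 0 applied to <p, q>,
# i.e. 2*mu(p) >= mu(q).
Phi = lambda l0, l1: 2 * l0 - l1 >= 0
print(satisfies(Phi, mu, ["p", "q"]))    # True, since 2/3 >= 1/2
```

Note how the closed/open distinction of the text shows up here as the choice between the comparison operators `>=` and `>` in the encoded polynomial inequality.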
bility assertions of JI. This, however, is not true for open probability assertions of JI'. We thus fall somewhat short of our objectives. Nevertheless since not very much work has yet been done towards the investigation of probability assertions, we chose the present formulation for its simplicity. If L:s; .'7, then L: determines a set of probability assertions {o.o 1 ~ 0, 1p):lpEL:}. Accordingly we say that
and
Let L:",
= {< qnAo  Pn ~ 0,
U
«
q~Ao
+ P~ ~ 0, tp »: n < w}
and let L:= U L:",. Accordingly we say that
fl if m(Ip)=fl(
complete and consistent theory, then L1 uniquely determines a two valued oadditive probability measure fl on Yj't. In this case
{V
n<w
P~n: ~
< wd
u {.
[P~n
1\
P~'n]: ~
< ~' < w 1 , n < w}.
It is easy to see that Σ is consistent. Suppose then there exists a σ-additive probability measure μ on 𝒮/⊢ such that μ(
ξ < ω₁ such that μ(P_{ξn}) > 0. Since μ([P_{ξn} ∧ P_{ξ′n}]) = 0 for ξ ≠ ξ′, this is a contradiction. Obviously every complete and consistent set of sentences of an infinitary propositional language has a model. In infinitary propositional logic the trouble therefore arises from the fact that the Prime Ideal Theorem fails for Boolean σ-algebras. Naturally the question arises: Does every complete and consistent set Σ ⊆ 𝒮 have a model? The answer is again no, and a counterexample is due to Professor C. Ryll-Nardzewski. Interestingly enough, the counterexample produces a probability model of the complete consistent set of sentences under consideration. The question of whether every complete consistent set Σ ⊆ 𝒮 has a probability model can, however, be settled by a similar counterexample, and we shall discuss both of these examples in a form slightly modified from Ryll-Nardzewski's original suggestion. Let ℒ be an infinitary language with countably many one-place predicates P_j, one for each j < ω, and define a probability model 𝔄 = ⟨A, R_j, 𝒜, m⟩_{j<ω} as follows: Let A = ω, and let 𝒜 be the Borel sets of the product space (2^ω)^ω; that is, the σ-field of subsets of (2^ω)^ω generated by all sets of the form {ξ ∈ (2^ω)^ω : ξ(i)(j) = 1}, where i, j < ω. Let m be the product measure on 𝒜 determined by m({ξ ∈ (2^ω)^ω : ξ(i)(j) = 1}) = ½ for all i, j < ω. Finally, for j < ω, define R_j(i) = {ξ ∈ (2^ω)^ω : ξ(i)(j) = 1} for all i ∈ A. (Note: strictly speaking, 𝔄 is not a probability model, since ⟨𝒜, m⟩ is not a measure algebra. Thus we would have to consider the quotient algebra 𝒜/I of 𝒜 modulo the σ-ideal I = {x ∈ 𝒜 : m(x) = 0}, and lift m up to a strictly positive probability on 𝒜/I. In this example, however, all sup's and inf's in 𝒜 that have to be taken into consideration are countable; clauses (vi) and (vii) of the definition of the valuation function h make sense; and everything comes out just the same. We can omit the tedious details.) Then let {t_i : i ∈ A} be a set of new individual constants such that t_i ≠ t_{i′} if i ≠ i′.
Now we observe that for every φ ∈ 𝒮, the element h(φ) ∈ 𝒜 is invariant under all finite permutations of the second coordinate in (2^ω)^ω. By the well-known 0–1 Law (Hewitt and Savage [1955], p. 496) m is two-valued on h(φ). Thus the set Σ = {φ ∈ 𝒮 : m(h(φ)) = 1} is a complete and consistent theory of ℒ. We wish to show that Σ has no model. Indeed, suppose 𝔅 = ⟨B, S_j⟩_{j<ω} is a model of Σ. Since B must be nonempty, let b ∈ B. For j < ω define formulas Q_j(v) = P_j(v), if b ∈ S_j; while Q_j(v) = ¬P_j(v), if b ∉ S_j. Then ∃v[⋀_{j<ω} Q_j(v)] holds in 𝔅. However, as a straightforward computation shows, m(h(∃v[⋀_{j<ω} Q_j(v)])) = 0, which is a contradiction. On the other hand, by its very construction, 𝔄 is a probability model of Σ; that is, μ_𝔄(φ) = 1 for all φ ∈ Σ.
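A numerical companion to the straightforward computation just mentioned (our own illustration, not part of the proof): under the product measure, a fixed individual realizes a prescribed pattern of n fair-coin predicates with probability 2⁻ⁿ, so among k individuals the probability that some individual matches is 1 − (1 − 2⁻ⁿ)ᵏ, which tends to 0 as n grows with k fixed; countable additivity then forces measure 0 for the full infinite conjunction over all of A = ω.

```python
# Exact computation (ours): probability that at least one of k independent
# individuals satisfies a fixed pattern of n independent fair-coin predicates.

def prob_some_match(k, n):
    # each individual matches the prescribed pattern with probability 2^-n
    return 1.0 - (1.0 - 2.0 ** (-n)) ** k

probs = [prob_some_match(1000, n) for n in (10, 20, 40)]
```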
For our second example we let 𝔄′ = ⟨A, R′_j, 𝒜′⟩_{j<ω} be that Boolean-algebraic model where A = ω, where 𝒜′ = 𝒜/J, the ideal J being the σ-ideal of all first-category sets in the Borel algebra 𝒜, and where R′_j(i) = R_j(i)/J for all i, j < ω. Since A is countable, we note that the valuation h′ for 𝔄′ is such that h′(
Boolean algebra has a strictly positive σ-additive probability measure iff it is weakly distributive and has the Kelley property (see Kelley [1959]). In Karp [1964] we find the theorem that every countable consistent set Σ ⊆ 𝒮 has a countable model in the ordinary sense. The exact analogue of that result also holds for probability logic. We can also treat the case of theories with identity. To help formulate the result we define the formula θ_n for 0 <
each probability system ⟨T, m⟩ with strict identity we have m̄(θ_n) ∈ {0, 1} for 0 < n < ω.
THEOREM 6.2. (i) Let μ be a probability on 𝒮/⊢, and let Σ ⊆ 𝒮 be a countable set. Then there exists a countable probability model ⟨T, m⟩ such that for every φ ∈ Σ, m̄(φ) = μ(φ). (ii) If for every 0 <
We will give a detailed proof of part (ii) and leave the proof of part (i) to the reader. To prove this result we require a few lemmas. The first is a measure-theoretic generalization of the well-known Rasiowa–Sikorski Lemma (see Rasiowa and Sikorski [1950]). The proof is given in full in the Appendix.

LEMMA 6.3. Let ℬ be a Boolean σ-algebra and let 𝒜 ⊆ ℬ be a σ-subalgebra. Let μ be a probability on 𝒜, and for every m, n < ω let b_{mn} ∈ ℬ. Then there exists a finitely additive probability ν on ℬ such that
(i) ν(x) = μ(x), for all x ∈ 𝒜;
(ii) ν(⋀_{n<ω} b_{mn}) = lim_{n<ω} ν(⋀_{i<n} b_{mi}), for all m < ω.
For every φ ∈ 𝒮(T) we recursively define an ordinal number λ(φ) < ω₁, called the length of φ, by these equations:
(i) if φ is atomic, λ(φ) = 1;
(ii) λ(¬φ) = λ(φ) + 1;
(iii) λ(φ₁ ∨ φ₂) = λ(φ₁ ∧ φ₂) = λ(φ₁) + λ(φ₂) + 1;
(iv) λ(⋁_{n<ω} φ_n) = λ(⋀_{n<ω} φ_n) = Σ_{n<ω} λ(φ_n).
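For finitary formulas the recursion can be traced mechanically. In the sketch below (our own encoding of formulas as nested tuples) the ordinal sum of clause (iv) degenerates to an integer sum; genuinely infinitary disjunctions are outside the scope of this sketch.

```python
# Finitary sketch (our notation): formulas are ('atom', name), ('not', phi),
# ('or', [phis]) or ('and', [phis]). Ordinal arithmetic is replaced by
# ordinary integer arithmetic, which agrees with it on finite lengths.

def length(phi):
    tag = phi[0]
    if tag == 'atom':
        return 1                                            # clause (i)
    if tag == 'not':
        return length(phi[1]) + 1                           # clause (ii)
    if tag in ('or', 'and'):
        parts = phi[1]
        if len(parts) == 2:
            return length(parts[0]) + length(parts[1]) + 1  # clause (iii)
        return sum(length(p) for p in parts)                # finite stand-in for (iv)
    raise ValueError(tag)
```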
LEMMA 6.4. If φ ∈ 𝒮(T), λ(φ) ≥ ω, and ¬ occurs in φ only in front of atomic formulas, then there exists a sequence ψ_i ∈ 𝒮(T) such that λ(ψ_i) < λ(φ) for all i < ω, and either ⊢ φ ↔ ⋁_{i<ω} ψ_i or ⊢ φ ↔ ⋀_{i<ω} ψ_i.
Proof: By transfinite induction on λ(φ). If λ(φ) < ω, the assertion holds trivially. Thus assume λ(φ) ≥ ω and that the assertion holds for all φ′ such that λ(φ′) < λ(φ). Thus for some n < ω, either φ = ⋁_{i<n} φ_i or φ = ⋀_{i<n} φ_i. Consider the first case.
Since λ(φ) ≥ ω, we see that λ(φ_i) ≥ ω for some i < n. Let m be the largest such integer i < n. Then λ(φ_m) < λ(φ) and λ(φ_i) < ω for all m < i < n. By inductive hypothesis there exists a sequence ψ_j ∈ 𝒮(T) such that λ(ψ_j)
< λ(φ_m) for all j < ω, and either ⊢ φ_m ↔ ⋁_{j<ω} ψ_j or ⊢ φ_m ↔ ⋀_{j<ω} ψ_j. Again consider the first case. Then
⊢ φ ↔ ⋁_{j<ω} [⋁_{i<m} φ_i ∨ ψ_j ∨ ⋁_{m<i<n} φ_i].
It now follows from well-known laws of ordinal addition that for every j < ω,
λ(⋁_{i<m} φ_i ∨ ψ_j ∨ ⋁_{m<i<n} φ_i) = λ(⋁_{i<m} φ_i) + λ(ψ_j) + λ(⋁_{m<i<n} φ_i)
< λ(⋁_{i<m} φ_i) + λ(φ_m) + λ(⋁_{m<i<n} φ_i)
= λ(φ),
which proves the assertion. All other cases are treated analogously.
If μ is a finitely additive probability on 𝒮/⊢ and Σ ⊆ 𝒮 is a set of infinite conjunctions and disjunctions, we say μ preserves Σ if for every ⋀_{n<ω} φ_n, ⋁_{n<ω} ψ_n ∈ Σ,
μ(⋀_{n<ω} φ_n) = lim_{n<ω} μ(⋀_{i<n} φ_i) and μ(⋁_{n<ω} ψ_n) = lim_{n<ω} μ(⋁_{i<n} ψ_i).
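A concrete instance of this preservation condition (our own illustration): in a fair-coin product measure, the finite conjunctions "heads at tosses 0, …, n−1" have probability 2⁻ⁿ, and the required limit is exactly the measure 0 of the infinite conjunction.

```python
# Our illustration: mu applied to the finite conjunctions approximating an
# infinite conjunction of independent fair-coin events.

def mu_finite_conjunction(n):
    # probability of "heads at each of the first n tosses"
    return 2.0 ** (-n)

approximations = [mu_finite_conjunction(n) for n in range(60)]
limit = approximations[-1]   # numerically indistinguishable from mu of the
                             # infinite conjunction, which is 0
```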
LEMMA 6.5. For every φ ∈ 𝒜(T) there exists a denumerable set Σ ⊆ 𝒜(T) of infinite conjunctions and disjunctions such that for all finitely additive probabilities m₁ and m₂ on 𝒜(T)/⊢, if they agree on the finitary sentences and preserve Σ, then m₁(φ) = m₂(φ).
Proof: By transfinite induction on λ(φ). If λ(φ) < ω, the assertion holds trivially. Thus assume λ(φ) ≥ ω; by Lemma 6.4, either ⊢ φ ↔ ⋁_{i<ω} ψ_i or ⊢ φ ↔ ⋀_{i<ω} ψ_i. Consider the first case. Then for every n < ω, we note that λ(⋁_{i<n} ψ_i) < λ(φ). By the inductive hypothesis, for every n < ω there exists a denumerable set Σ_n such that for all finitely additive probabilities m₁, m₂, if they agree on the finitary sentences and preserve Σ_n, then m₁(⋁_{i<n} ψ_i) = m₂(⋁_{i<n} ψ_i). Let Σ = ⋃_{n<ω} Σ_n ∪ {⋁_{i<ω} ψ_i}. Then clearly m₁(φ) = m₂(φ) for all finitely additive probabilities m₁, m₂ which agree on the finitary sentences and preserve Σ. The other case is completely analogous.
Now we begin with the proof of part (ii) of Theorem 6.2. Let μ be a probability on 𝒮/⊢ such that for every 0 < n < ω,
∪ {∃v_{i_n} φ_n(v_{i_n}) → ⋁_{j<ω} φ_n(t_j) : n < ω}.
The following lemma is essentially due to Ehrenfeucht and Mostowski [1961].
LEMMA 6.6. If 𝔄 = ⟨
cursion:
(1) If for some n < ω, m = σ(n) and if there exists an x ∈ A ∖ {a_i : i < m} that satisfies φ_n(v_{i_n}) in
Let Θ = {¬θ_n : n < ω}, let Δ = Γ ∪ Θ, and define a mapping f from 𝒮/Θ⊢ into 𝒮(T)/Δ⊢ by f(φ/Θ⊢) = φ/Δ⊢. Since Θ ⊆ Δ, the mapping is well defined. Clearly f is a σ-homomorphism. Moreover, f is an isomorphism into. Indeed, let φ ∈ 𝒮 and suppose Δ ⊢ ¬φ. If Θ ∪ {φ} has a model then, by the Löwenheim–Skolem Theorem, it has a denumerably infinite model. Thus by Lemma 6.6, Δ ∪ {φ} has a model, contrary to the assumption. Thus Θ ∪ {φ} has no model; therefore, by the "weak" Completeness Theorem, Θ ⊢ ¬φ, which proves the assertion. Now let g be the canonical σ-homomorphism of 𝒮(T)/⊢ onto 𝒮(T)/Δ⊢; that is, g(φ/⊢) = φ/Δ⊢. For orientation we draw a
diagram
𝒮/⊢ → 𝒮/Θ⊢ →_f 𝒮(T)/Δ⊢ ←_g 𝒮(T)/⊢.
As is well-known, since μ(¬θ_n) = 1 for all 0 <
(φ/⊢) = (f ∘ g)(φ′/⊢), we have μ(φ) = μ(φ′). By Lemma 6.5, for each φ′ there exists a denumerable set Σ′ ⊆ 𝒜(T) of infinite conjunctions and disjunctions such that for all probability measures m₁ and m₂ on 𝒜(T)/⊢, if they agree on the finitary sentences and preserve Σ′, then m₁(φ′) = m₂(φ′). Finally,
let Σ′ be the union of all these sets Σ′. Σ′ is denumerable. By Lemma 6.3, there exists a finitely additive probability ν on 𝒮(T)/⊢ which extends μ and preserves Σ′. Let π be the restriction of ν to the finitary quantifier-free sentences. Then π is σ-additive. (This is a consequence of Lemma 7.1 which will be established in Section 7.) As is well-known, π may be extended to a probability m on 𝒜(T)/⊢. We claim that
Γ = {t_i ≠ t_j : i < j < N}, Θ = {¬θ_n : n < N} ∪ {θ_n : N < n < ω}, Δ = Γ ∪ Θ.
As before, define a mapping f from 𝒮/Θ⊢ into 𝒮(T)/Δ⊢ such that f(φ/Θ⊢) = φ/Δ⊢. Again f is an isomorphism into. Indeed, let φ ∈ 𝒮 and suppose Δ ⊢ ¬φ. Then
Θ ⊢ (⋀_{i<j} t_i ≠ t_j) → ¬φ.
Thus
Θ ⊢ ∃v₀ … ∃v_{n−1} (⋀_{i<j} v_i ≠ v_j) → ¬φ.
However,
Θ ⊢ ∃v₀ … ∃v_{n−1} (⋀_{i<j} v_i ≠ v_j).
Thus Θ ⊢ ¬φ. Finally we observe that if φ′ is obtained from φ ∈ Σ by eliminating quantifiers, that is, by successively replacing all existential subformulas ∃vψ(v) by ⋁_{i<ω} ψ(t_i), then Δ ⊢ φ ↔ φ′. Thus we can complete the proof as
before. We conclude the proof of Theorem 6.2 with a remark concerning the proof of part (i). In this case we need only replace the refined method of the EhrenfeuchtMostowski Theorem (Lemma 6.6) by the somewhat cruder method of the original Henkin Completeness Proof (Henkin [1949]). The steps are quite similar though simpler than those just given. In this case, however, we clearly cannot expect the probability model
We now give an example to show that there are probabilities on 𝒮/⊢ which have a probability model but do not have a symmetric probability model. Indeed, for every φ ∈ 𝒮 let μ(φ) = 1 iff φ holds in ⟨ω₁, <⟩, the system of the countable ordinals with their natural ordering. Every ξ < ω₁ is definable in ℒ; that is, there exists a formula φ_ξ of ℒ with exactly one free variable v such that for every η < ω₁, η satisfies φ_ξ in ⟨ω₁, <⟩ iff η = ξ. Thus whenever ξ < ω₁, we have μ(∃vφ_ξ) = 1; and whenever η < ξ < ω₁, we have μ(∃v[φ_η ∧ φ_ξ]) = 0. Now suppose μ has a symmetric probability model
finitary language ℒ(ω), which is due to Gaifman [1964]. We will give a simplified version of his proof in Section 7, Theorem 7.14. The questions of whether there is a method of deductively generating the probability consequences from a given set of probability assertions, or of deductively generating all probability laws, are clouded by the fact that it is difficult to give the concept of deductive generation a workable meaning for infinitary languages. Nevertheless we can make a few remarks. Our discussion thus far certainly shows that we do not have much reason to expect a positive answer to the first question. On the other hand, the second problem has in a sense a positive solution, which we present now.
THEOREM 6.7. Let
⟨Φ, φ₀, …, φ_{n−1}⟩ is a probability law of ℒ iff the sentence
∀λ₀ … ∀λ_{n−1} [[⋀_{i∈I} λ_i = 0 ∧ ⋀_{i<n} λ_i ≥ 0 ∧ λ₀ + ⋯ + λ_{n−1} = 1] → Φ]
is a theorem of real algebra.
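Theorem 6.7 reduces lawhood to a real-algebra sentence universally quantified over the simplex {λ : λ_i ≥ 0, Σλ_i = 1}. The sketch below is our own numerical companion, not the theorem's decision procedure: sampling the simplex on a grid can refute a candidate law but can never prove one.

```python
# Hedged refutation search (ours): a probability law must hold at every point
# of the simplex of constituent probabilities, so a grid scan can only refute.

import itertools

def simplex_grid(n, steps):
    """Rational grid points (i_0/steps, ..., i_{n-1}/steps) with coordinate sum 1."""
    for c in itertools.product(range(steps + 1), repeat=n):
        if sum(c) == steps:
            yield tuple(i / steps for i in c)

def refuted(phi, n, steps=20):
    """Return a counterexample grid point where phi fails, or None."""
    for point in simplex_grid(n, steps):
        if not phi(point):
            return point
    return None

# lambda_0 + lambda_1 <= 1 holds everywhere on the simplex (float tolerance):
law = lambda lam: lam[0] + lam[1] <= 1.0 + 1e-9
# lambda_0 >= 1/2 is not a law:
non_law = lambda lam: lam[0] >= 0.5
```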
Proof: Suppose
therefore yields a method of generating all probability laws of ℒ. Whether there is a more useful way of generating the laws remains to be seen. Theorem 6.7 also provides suggestions for the definition of an analogue of the concept of consistency in ordinary logic. The general problem of conditions under which a set of probability assertions has a probability model is completely open for the infinitary language ℒ, and it is apparently quite difficult. This completes our general discussion of the probability logic of the infinitary language ℒ, and we now turn to the finitary language ℒ(ω), where the situation is somewhat more satisfactory.

7. The finitary case. The Boolean algebras 𝒮(ω)/⊢, 𝒮(ω)(T)/⊢ and 𝒜(ω)(T)/⊢ are subalgebras of the σ-algebras 𝒮/⊢, 𝒮(T)/⊢ and 𝒜(T)/⊢ respectively. Our definitions and results concerning the infinitary language ℒ therefore have rather obvious applications to the finitary language ℒ(ω), and in many cases they can be considerably strengthened. This is due to two important facts which we state for comment and reference.
LEMMA 7.1. Every finitely additive probability μ on 𝒮(ω)/⊢ is σ-additive.
Proof: For every Σ ⊆ 𝒮(ω) and every
The following lemma is well-known and has an easy proof by means of elementary methods of functional analysis. For a purely algebraic proof we refer the reader to Horn and Tarski [1948].
LEMMA 7.2. Let ℬ be a Boolean algebra and let 𝒜 ⊆ ℬ be a subalgebra. Every finitely additive probability on 𝒜 can be extended to a finitely additive probability on ℬ.
As pointed out before, Gaifman [1964] gives a proof of the next theorem. Our proof of Theorem 6.2 can be essentially simplified to yield this result by replacing the role of Lemma 6.3 by Lemmas 7.1 and 7.2. Indeed, Lemma 6.3 has been designed to patch up the difficulties arising from the fact that Lemma 7.2 fails for σ-additive probabilities. This in turn corresponds to the fact that the Prime Ideal Theorem for σ-ideals fails for Boolean σ-algebras.
THEOREM 7.3. (i) Every probability μ on 𝒮(ω)/⊢ has a denumerable probability model.
(ii) If for every 0 <
Remarks: (1) As can readily be verified by an analysis of Lemma 6.6, our proof of part (ii) does not go through for a language ℒ(ω) which either has infinitely many individual constants or nondenumerably many nonlogical constants to begin with. This fact is substantiated by two counterexamples of Gaifman [1964]. Nevertheless, Theorem 7.3 still holds for these languages, as the proof in Gaifman [1964] shows. We will not discuss the question of adapting our method of Lemma 6.6 to this situation. (2) The cardinality statements of Theorem 7.3 obviously depend on our assumption that ℒ(ω) has only denumerably many nonlogical constants. If we allow nondenumerably many nonlogical constants, then the well-known adjustments have to be made. The same remark applies to all other theorems of this part which contain statements about the cardinality of probability systems. Let T be the set of complete consistent theories in ℒ(ω), that is, the set of prime ideals in 𝒮(ω)/⊢. As is well-known from the Representation Theorem for Boolean Algebras (see, e.g., Halmos [1963], p. 77), T is a compact Hausdorff space with a basis of closed-open (clopen) sets of the form {Σ ∈ T : φ ∈ Σ}, where φ ∈ 𝒮(ω), and 𝒮(ω)/⊢ is isomorphic to the field of clopen subsets of T under an isomorphism which maps φ/⊢ into {Σ ∈ T : φ ∈ Σ}. Every model determines a complete consistent theory in ℒ(ω), that is, a point in T. By the ordinary Completeness Theorem, every complete consistent theory
of ℒ(ω) has a model; thus T may be identified with the space of models. Many important results in the ordinary logic of ℒ(ω) may conveniently be established through topological considerations in the space T. This topological construction can be generalized in the strictest sense of the word; thus the space M of probability models of ℒ(ω) can be defined as a compact Hausdorff space such that the space T can be homeomorphically embedded into M. The construction makes use of well-known definitions and methods of functional analysis. For the details we have to refer the reader to Dunford and Schwartz [1958]. Let C(T) be the linear space of all continuous real functions on T. Since the characteristic functions of clopen subsets of T are continuous, we may regard 𝒮(ω)/⊢ as a subset of C(T). Let L be the linear subspace of C(T) generated by 𝒮(ω)/⊢ in C(T). As is well-known, C(T) is a Banach space under the sup-norm, where for x ∈ C(T) we have ‖x‖ = sup_{Σ∈T} |x(Σ)|. Any finitely additive probability μ on 𝒮(ω)/⊢ uniquely extends to a linear functional μ on C(T) such that (i) μ(x) ≤ ‖x‖ for all x ∈ C(T); (ii) μ(1) = 1. Conversely, any linear functional on C(T) satisfying (i) and (ii) uniquely determines a finitely additive probability on 𝒮(ω)/⊢. Let C(T)* be the linear space of all continuous linear functionals on C(T). As is well-known, C(T)* is also a Banach space with its own norm, such that for every μ ∈ C(T)*, ‖μ‖ ≤ 1 iff μ(x) ≤ ‖x‖ for all x ∈ C(T). Let M = {μ ∈ C(T)* : ‖μ‖ ≤ 1, μ(1) = 1}. By our remark above, the set of all finitely additive probabilities on 𝒮(ω)/⊢ may be identified with M, and by Lemma 7.1 this set agrees with the set of probabilities on 𝒮(ω)/⊢. Every probability model determines a probability on 𝒮(ω)/⊢, and by Theorem 7.3 every probability on 𝒮(ω)/⊢ has a probability model; thus M may be identified with the space of probability models. We now consider C(T)* with the so-called weak star topology. The basic open neighborhoods are sets of the form
N(μ; x₀, …, x_{n−1}; ε) = {ν ∈ C(T)* : |ν(x_i) − μ(x_i)| < ε for all i < n},
where n < ω, μ ∈ C(T)*, x₀, …, x_{n−1} ∈ C(T), and ε > 0 is a real number. It is easy to see that M is a closed subset of the unit sphere {μ ∈ C(T)* : ‖μ‖ ≤ 1}; therefore, by the Alaoglu Theorem, M is a compact Hausdorff space with the relativized weak star topology. Every Σ ∈ T uniquely determines a two-valued probability μ on 𝒮(ω)/⊢,
and conversely. Thus there exists a natural embedding of T into M. Finally we observe that M is a convex set; that is, for every μ₁, μ₂ ∈ M and every real number 0 < α < 1, αμ₁ + (1 − α)μ₂ ∈ M. For any subset K of a linear space, the convex hull of K is the smallest convex set containing K. The closed convex hull of a subset K of a linear topological space is the closure of the convex hull of K. We say μ ∈ M is an extreme point of M iff for every μ₁, μ₂ ∈ M and every 0 <
To prove Theorem 7.4 we first establish a useful lemma.
LEMMA 7.5. Sets of the form M ∩ N(μ; x₀, …, x_{n−1}; ε), where x₀, …, x_{n−1} ∈ 𝒮(ω)/⊢, constitute a basis for the weak star topology of M.
Proof: Let μ ∈ C(T)*, x₀, …, x_{n−1} ∈ C(T), and ε > 0. Consider ν ∈ M ∩ N(μ; x₀, …, x_{n−1}; ε). Let δ = ε − max_{i<n} |ν(x_i) − μ(x_i)|. By the Stone–Weierstrass Theorem, there exist y₀, …, y_{n−1} ∈ L such that ‖x_i − y_i‖ < ⅓δ for all i < n. We show M ∩ N(ν; y₀, …, y_{n−1}; ⅓δ) ⊆ M ∩ N(μ; x₀, …, x_{n−1}; ε). Indeed, let λ ∈ M ∩ N(ν; y₀, …, y_{n−1}; ⅓δ) and i < n. Then, since λ, ν ∈ M,
|λ(x_i) − μ(x_i)| ≤ |λ(x_i) − λ(y_i)| + |λ(y_i) − ν(y_i)| + |ν(y_i) − ν(x_i)| + |ν(x_i) − μ(x_i)|
< ‖x_i − y_i‖·‖λ‖ + ⅓δ + ‖y_i − x_i‖·‖ν‖ + ε − δ
< ε.
This proves that sets of the form M ∩ N(μ; x₀, …, x_{n−1}; ε), where x₀, …, x_{n−1} ∈ L, form a basis. Now let x ∈ L. Then x = α₀x₀ + ⋯ + α_{r−1}x_{r−1}, where x₀, …, x_{r−1} ∈ 𝒮(ω)/⊢ and α₀, …, α_{r−1} are real numbers. Consider ν ∈ M ∩ N(μ; x; ε), and let δ = ε − |ν(x) − μ(x)|. Then a straightforward computation yields ⋂_{i<r} M ∩ N(ν; x_i; δ/(r·|α_i|)) ⊆ M ∩ N(μ; x; ε). This proves the lemma.
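A finite-propositional miniature of this picture (our own, far smaller than ℒ(ω)): with k atoms, complete consistent theories are just truth assignments, each sentence yields a basic clopen set, and the probabilities on the resulting finite algebra form a simplex whose extreme points are exactly the two-valued measures — the points of T — anticipating Theorem 7.4.

```python
# Our finite sketch of the Stone space T and the probability space M.

import itertools

def stone_points(k):
    """The space T of complete theories over k propositional atoms."""
    return list(itertools.product((False, True), repeat=k))

def clopen(phi, k):
    """Basic clopen set {Sigma in T : phi in Sigma} for a sentence phi,
    given as a Boolean function of the k atoms."""
    return frozenset(a for a in stone_points(k) if phi(*a))

def is_extreme(mu, tol=1e-12):
    """In the simplex of probabilities on a finite T, a measure is extreme
    iff some coordinate is 1 (hence it is two-valued)."""
    return any(abs(p - 1.0) < tol for p in mu)

k = 2
conj = clopen(lambda p, q: p and q, k)
neg  = clopen(lambda p, q: not (p and q), k)
point_mass = (0.0, 1.0, 0.0, 0.0)   # a two-valued measure: a point of T
mixture = (0.5, 0.0, 0.5, 0.0)      # a proper convex combination: not extreme
```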
Now we proceed with the proof of Theorem 7.4. For part (i) we regard T as a subset of M and show that the topology of T is the topology of M relativized to T. Let n < ω, μ ∈ T and x₀, …, x_{n−1} ∈ 𝒮(ω)/⊢. Then
{ν ∈ T : |ν(x_i) − μ(x_i)| < ε for all i < n} = ⋂_{i<n} {ν ∈ T : |ν(x_i) − μ(x_i)| < ε}.
For every i < n, {ν ∈ T : |ν(x_i) − μ(x_i)| < ε} will be either the empty set, the whole set, or the set of all prime ideals of 𝒮(ω)/⊢ containing x_i, depending on the choice of μ(x_i) and ε; in any case it is a clopen subset of T. Since a similar argument shows that every clopen set of T is the restriction of an open set of M to T, Lemma 7.5 proves assertion (i). To prove part (ii) we let μ ∈ T and consider μ₁, μ₂ ∈ M and 0 <
(ii) Σ has a probability model with strict identity iff every finite subset of Σ has a probability model with strict identity.
Proof: It is clear that the set of probability models of a closed probability assertion of ℒ(ω) is a closed subset of M. Part (i) therefore follows from the compactness of M. To prove part (ii), let Σ be a set of closed probability assertions of ℒ(ω) such that every finite subset of Σ has a probability model with strict identity. Consider the set {θ_n : 0 <
where λ₀ = i_k is short for −λ₀ ≥ 0 if i_k = 0, and λ₀ − 1 ≥ 0 if i_k = 1. Clearly Δ_{n,i} is a set of closed assertions. Then for every 0 < n < ω, there exists i ∈ I_n such that Σ ∪ Δ_{n,i} has a probability model. Indeed, suppose that this is not the case.
Then, by part (i), there exists 0 < n < ω such that, for every i ∈ I_n, Σ ∪ Δ_{n,i} has no
a probability model with strict identity; however, for some i ∈ I_n this probability model has to be a model of Δ_{n,i}, which is a contradiction. If for some n there exists i ∈ I_n which is not identically zero, and Σ ∪ Δ_{n,i} has a probability model, then, by virtue of our observation above, there exists 0 <
LEMMA 7.7. If ∀vφ ∈ 𝒮(ω)(T), then μ(∀vφ) = inf_F μ(⋀_{t∈F} φ(t)).
If φ, ψ ∈ 𝒮(ω)(T), then an occurrence of
LEMMA 7.8. (i) Let ψ′ be obtained from ψ by replacing a simple occurrence of ∃vφ in ψ by ⋁_{t∈F} φ(t). Then μ(ψ) = sup_F μ(ψ′).
(ii) Let ψ′ be obtained from ψ by replacing a simple occurrence of ∀vφ in ψ by ⋀_{t∈F} φ(t). Then μ(ψ) = inf_F μ(ψ′).
LEMMA 7.9. Let φ ∈ 𝒮(ω)(T) be of the form Q₀v₀ … Q_{n−1}v_{n−1} M(v₀, …, v_{n−1}), where for each i < n, Q_i is a quantifier. Write bd_i for sup_{F_i} if Q_i is existential and inf_{F_i} if Q_i is universal, and write #_i for the corresponding distribution prefix ⋁_{t_i∈F_i} or ⋀_{t_i∈F_i}. Then
μ(φ) = bd₀ … bd_{n−1} μ(#₀ … #_{n−1} M(t₀, …, t_{n−1})).
Proof: By induction on n. For n = 1 the assertion holds by condition (G) and Lemma 7.7. For n > 1 we have by inductive hypothesis
μ(φ) = bd₀ … bd_{n−1} μ(#₀ … #_{n−1} Q_n v_n M(t₀, …, t_{n−1}, v_n)).
Let F₀, …, F_{n−1} be fixed. Let N = ∏_{i<n} |F_i|, where |F_i| is the number of elements in F_i. We enumerate the Cartesian product set F = ∏_{i<n} F_i, say F = {f_k : k < N}. Applying Lemma 7.8 to each f ∈ F in turn, with sets G_{f₀}, …, G_{f_{N−1}}, and then using the monotonicity of disjunction and conjunction and the elementary properties of sup's and inf's, we have
μ(#_{f∈F} Q_n v_n M(f, v_n)) = bd_n μ(#_{f∈F} #_n (s ∈ G_f) M(f, s)) = bd_n μ(#_{f∈F} #_n (t_n ∈ F_n) M(f, t_n)).
Thus
μ(φ) = bd₀ … bd_n μ(#₀ … #_n M(t₀, …, t_n)).
The last lemma of this series is an easy consequence of Lemma 7.9 and the distributive laws. For a convenient formulation we adopt the notation of the proof of Lemma 7.9 and write
bd_F μ(#_{f∈F} M(f)) = bd₀ … bd_{n−1} μ(#₀ … #_{n−1} M(t₀, …, t_{n−1})).
For every k < r, let φ_k ∈ 𝒮(ω)(T) be of the form Q_k M_k, where Q_k is a string of quantifiers and M_k is a formula. For every k < r, let bd_k and #_k be the associated boundary and distribution prefixes respectively.
LEMMA 7.10.
(i) μ(⋀_{k<r} φ_k) = bd₀ … bd_{r−1} μ(⋀_{k<r} #_k M_k);
(ii) μ(⋁_{k<r} φ_k) = bd₀ … bd_{r−1} μ(⋁_{k<r} #_k M_k).
We are now in a position to prove analogues of many important results about the ordinary logic of ℒ(ω). As a matter of fact, if we regard closed probability assertions of ℒ(ω) as the analogues of sentences in ordinary logic, then the analogy in many respects seems to be complete. There are exceptions, however: we mentioned before our failure to define an analogue of direct products. We state next a few of the positive results and comment on their proofs.
THEOREM 7.11 (DOWNWARD LÖWENHEIM–SKOLEM THEOREM). Let
m*(∃vφ) = sup_{F⊆T′(ω)} m*(⋁_{t∈F} φ(t)),
where ⟨T, m⟩ is the given probability system.
THEOREM 7.12 (DIRECTED UNION THEOREM). Let {⟨T_i, m_i⟩ : i ∈ I} be a ⊆-directed family of probability systems; that is, for all i, j ∈ I there exists k ∈ I such that both
every φ ∈ 𝒮(ω)(T), m̄(φ) = m̄_i(φ), where φ ∈ 𝒜(ω)(T_i). Then
We now present a result concerning symmetric probability systems which obviously has no analogue in ordinary logic. This result is due to Gaifman [1964], whose proof we have simplified by using ideas from the ultraproduct construction of probability models.
THEOREM 7.14. Let Σ be a set of probability assertions of ℒ(ω). Then Σ has a denumerable probability model iff Σ has a denumerable symmetric probability model.
Proof: Suppose Σ has a denumerable probability model
yields χ(Ω) = 1. We define for all φ ∈ 𝒮(ω)(T):
μ̂(φ) = ∫ m*(φ^π) dχ(π).
By the same argument as in the proof of Lemma 5.4 we show that μ̂, regarded as a function on 𝒮(ω)(T)/⊢, is a finitely additive probability. By Lemma 7.1, μ̂ is σ-additive. μ̂ satisfies the condition (G). Indeed, let ∃vφ
Since each π ∈ Ω
is onto,
m*(∃vφ^π) = sup_F m*(⋁_{t∈F} φ^π(t)).
Since T is denumerable, by the Dominated Convergence Theorem,
μ̂(∃vφ) = sup_{F⊆T(ω)} ∫ m*(⋁_{t∈F} φ^π(t)) dχ(π) = sup_F μ̂(⋁_{t∈F} φ(t)).
If
Proof: By Theorems 7.3 and 7.14.
Remarks: (1) It is clear that our proof of Theorem 7.14 depends on the assumption that Σ has a countable probability model. Consequently Corollary 7.15 depends on the assumption that ℒ(ω) has only countably many nonlogical constants; that is, in our standard case, one binary predicate R.
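The averaging device behind Theorem 7.14 can be imitated on a finite system. The sketch below is our own (the k! relabelings stand in for the integral over Ω): averaging a probability over all permutations of the individuals yields a permutation-invariant, i.e. symmetric, probability.

```python
# Our finite sketch of symmetrization by averaging over relabelings.

import itertools

def symmetrize(mu, k):
    """mu: dict mapping outcome tuples of length k to probabilities."""
    perms = list(itertools.permutations(range(k)))
    out = {}
    for outcome, p in mu.items():
        for sigma in perms:
            relabeled = tuple(outcome[sigma[i]] for i in range(k))
            out[relabeled] = out.get(relabeled, 0.0) + p / len(perms)
    return out

mu = {(1, 0): 1.0}          # all mass on "individual 0 succeeds, individual 1 fails"
sym = symmetrize(mu, 2)     # mass split evenly over the two relabelings
```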
Indeed, the counterexample concerning symmetric probability models given in Section 6 can be constructed in the finitary language if we allow nondenumerably many unary predicates. (2) The probability system
{π ∈ Ω : π(t_m) = π(t_n)} = ⋃_{i<ω} ({π ∈ Ω : π(t_m) = t_i} ∩ {π ∈ Ω : π(t_n) = t_i}).
Consequently
χ({π ∈ Ω : π(t_m) = π(t_n)}) = ∫_Ω (π(t_m) = π(t_n)) dχ(π) ≥ Σ_{i<ω} λ({t_i})² > 0.
If
We conclude our discussion of the finitary language ℒ(ω) with an immediate consequence of Theorem 6.7.
THEOREM 7.16. The set of probability laws of ℒ(ω) is recursively enumerable.
Proof: In the remarks following the proof of Theorem 6.7 we can, for the finitary language ℒ(ω), replace "effectively generate" everywhere by "recursively enumerate". This yields a proof of Theorem 7.16.
8. Examples. We have reason to hope that the results of probability logic may have useful applications to deductive logic, to inductive logic and to probability theory. The first point was illustrated by Ryll-Nardzewski's example of a complete theory without models. The second point is rather obvious; as a matter of fact, our work originally started with a study of Carnap's inductive logic. We will illustrate the third point by considering
some well-known measurability problems in the theory of stochastic processes with nondenumerable index sets. Let ℝ̄ be the two-point compactification of the set of real numbers; that is, the set of real numbers together with the points −∞, ∞. Let ℚ be the set of rational numbers. Let ℬ be the σ-field of Borel sets of ℝ̄. Let T be an index set, and we will choose T ⊆ ℝ̄. Let ℬ^T be the product σ-field of subsets of the Cartesian product space ℝ̄^T induced by ℬ. As is well-known, a stochastic process with index set T may be identified with a probability space ⟨ℝ̄^T, ℬ^T, m⟩, where m is a probability on ℬ^T. (For further details see, e.g., Loève [1960], p. 497 ff.) Let 𝒢 be the σ-field of Borel sets of ℝ̄^T; that is, the σ-field of subsets of ℝ̄^T generated by the closed sets of the product topology on ℝ̄^T. Then ℬ^T ⊆ 𝒢, and ℬ^T ≠ 𝒢 if T is nondenumerable. During the investigation of stochastic processes one frequently would like to assign probabilities to sets X ⊆ ℝ̄^T which are not m-measurable; that is, X ∉ ℬ^T. It is well-known that this can always be done with finitely many sets at a time. Thus if X₀, …, X_{n−1} ⊆ ℝ̄^T, then m can always be extended to a probability on the σ-field generated by ℬ^T ∪ {X₀, …, X_{n−1}} (see, e.g., Halmos [1950], p. 71). The extension is not unique, however. In general, given a σ-field 𝒜 ⊇ ℬ^T, the question arises of whether there exists a probability n on 𝒜 which extends m. Moreover, one attempts to specify convenient conditions which render such an extension unique. Nelson [1959] investigates this question for 𝒢 and gives a sufficient condition for the existence of a uniquely determined extension. He also shows that many interesting sets belong to 𝒢. Another extension result is Doob's Separability Theorem (Loève [1960], p. 507). Let 𝒜̂ be the σ-field generated by ℬ^T and sets of the form ⋂_{t∈I∩T} {x ∈ ℝ̄^T : x(t) ∈ C}, where I ⊆ ℝ̄ is an open interval and C ⊆ ℝ̄ is closed.
Then Doob's Theorem says that every probability m on ℬ^T has an extension to 𝒜̂. Moreover, the extension n may be assumed to be separable; that is, there exists a denumerable subset S ⊆ T such that for all open intervals I and closed sets C,
n(⋂_{t∈I∩T} {x ∈ ℝ̄^T : x(t) ∈ C}) = n(⋂_{t∈I∩S} {x ∈ ℝ̄^T : x(t) ∈ C}).
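The separating-set identity can be sanity-checked numerically (this is our own illustration, not part of Doob's theorem): for a continuous path and a closed interval C, a countable grid of time points detects the same containment event as a much finer grid.

```python
# Our sketch: the event "x(t) in [lo, hi] for all t in the sampled set"
# computed on a fine grid and on a coarser countable "separating" grid.

def stays_in(x, ts, lo, hi):
    return all(lo <= x(t) <= hi for t in ts)

x = lambda t: t * t                       # a continuous sample path on [0, 1]
fine = [i / 1000 for i in range(1001)]    # fine grid of time points
dense = [i / 97 for i in range(98)]       # coarser countable grid
```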
S is called a separating set. The separability condition makes the extension unique. Upon closer inspection the separability condition turns out to be an instance of a "Gaifman Condition." This will appear more clearly during the later development of our example. Indeed, it seems that from the earliest investigations of the extension problem conditions for the "reasonableness"
of extensions of probabilities on ℬ^T have been proposed which strikingly resemble particular instances of a "Gaifman Condition" (see, e.g., Doob [1947]). Thus one might be tempted to put down a Gaifman Condition on extensions of probabilities on ℬ^T to a certain σ-field 𝒜 ⊇ ℬ^T which renders such extensions unique, provided they exist, and then to investigate the problem of the existence of probabilities satisfying this condition. Our example will point in this direction. Of course, a Gaifman Condition is most conveniently stated in terms of a language rather than in terms of certain representations of sets. For our example we use the infinitary language ℒ. As nonlogical constants of ℒ we provide a binary predicate <, and for every q ∈ ℚ a unary predicate P_q. Moreover, we augment ℒ by a set of individual constants which, for convenience, we choose to be the index set T ⊆ ℝ. Let S be the set of relational systems of the similarity type of ℒ(T). We embed ℝ^T pointwise into S. For x ∈ ℝ^T define 𝔄_x ∈ S as follows: 𝔄_x = ⟨
LEMMA 8.1. For all t, t′ ∈ T, q ∈ ℚ and φ, φ_i ∈ 𝒜(T):
(i) M(t = t′) ∩ ℝ^T = ℝ^T if t = t′, and = ∅ if t ≠ t′;
(ii) M(t < t′) ∩ ℝ^T = ℝ^T if t < t′, and = ∅ if t ≮ t′;
(iii) M(P_q(t)) ∩ ℝ^T = {x ∈ ℝ^T : x(t) ≤ q};
(iv) M(¬φ) ∩ ℝ^T = ℝ^T ∖ M(φ);
(v) M(⋁_{i<ξ} φ_i) ∩ ℝ^T = ⋃_{i<ξ} M(φ_i) ∩ ℝ^T;
(vi) M(⋀_{i<ξ} φ_i) ∩ ℝ^T = ⋂_{i<ξ} M(φ_i) ∩ ℝ^T;
(vii) M(∃vφ) ∩ ℝ^T = ⋃_{t∈T} M(φ(t)) ∩ ℝ^T;
(viii) M(∀vφ) ∩ ℝ^T = ⋂_{t∈T} M(φ(t)) ∩ ℝ^T.
The operation of restriction to a subset is a complete homomorphism of fields of sets. By Lemma 8.1, (i)–(vi), this homomorphism maps {M(φ) : φ ∈ 𝒜(T)} onto ℬ^T. Accordingly, we define for all φ ∈ 𝒜(T),
μ_m(M(φ)) = m(M(φ) ∩ ℝ^T).
We obtain a probability μ_m on {M(φ) : φ ∈ 𝒜(T)} and thus also on 𝒜(T)/⊢. We write μ_m(φ) for μ_m(M(φ)) and obtain ⟨T, μ_m⟩ as a probability system. We are interested in the σ-field 𝒜 = {M(φ) ∩ ℝ^T : φ ∈ 𝒮(T)}. By our remark above, ℬ^T ⊆ 𝒜, and we will see later that 𝒜 contains a vast assortment of interesting sets. First we must complete our series of definitions. Suppose n is a probability on 𝒜 which extends m. Define for all φ ∈ 𝒮(T),
ν_n(φ) = n(M(φ) ∩ ℝ^T).
We say that n satisfies the Gaifman Condition if ν_n satisfies (G). We thus have as an immediate corollary of Theorem 4.3:
THEOREM 8.2. For every probability m on ℬ^T there exists at most one probability on 𝒜 which extends m and satisfies the Gaifman Condition.
This settles the uniqueness part of the extension problem; the existence part is of course much more difficult. Consider the probability system ⟨T, μ_m⟩ induced by m. It is well-known that the equation n(M(φ) ∩ ℝ^T) = μ̄_m(M(φ)) for all φ ∈ 𝒮(T) defines a probability n on 𝒜 iff whenever φ ∈ 𝒮(T) and ℝ^T ⊆ M(φ), then μ̄_m(φ) = 1. Indeed, just in this case n is a well-defined set function on 𝒜 (see, e.g., Halmos [1963], p. 65). This leads to:
THEOREM 8.3. Let m be a probability on ℬ^T. Then there exists a probability on 𝒜 which extends m and satisfies the Gaifman Condition iff whenever φ ∈ 𝒮(T) and ℝ^T ⊆ M(φ), then μ̄_m(φ) = 1.
The authors have been able to show that not every probability m on ℬ_T has a Gaifman extension to 𝒜. A counterexample can already be produced in the case of dependent Bernoulli trials. In this case the stochastic process is two-valued, the space ℝᵀ collapses to 2ᵀ, and in our language ℒ we only need one unary predicate P (together with ≤, of course). Further M(P(t)) ∩ 2ᵀ = {x ∈ 2ᵀ: x(t) = 1}, so that P(t) means "success at time t". For φ we choose a finitary sentence which says "P has a least upper bound", as follows:
∃v₀ [∀v₁ [P(v₁) → v₁ ≤ v₀] ∧ ∀v₁ [∀v₂ [P(v₂) → v₂ ≤ v₁] → v₀ ≤ v₁]].
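For a finite chain T the sentence is satisfied by every element of 2ᵀ (for the empty P the least upper bound is the least element of T), which illustrates why 2ᵀ ⊆ M(φ); the counterexample in the text therefore requires an infinite index set. A small Python check with our own encoding of the sentence:

```python
from itertools import product

# Our own finite check: with T a finite chain every P ⊆ T has a least upper
# bound, so all 2^|T| subsets satisfy the sentence.
T = range(5)

def has_lub(P):
    """Evaluate ∃v0 [∀v1 (P(v1) → v1 ≤ v0) ∧ ∀v1 (∀v2 (P(v2) → v2 ≤ v1) → v0 ≤ v1)]."""
    def upper(v):
        return all(v1 <= v for v1 in T if P[v1])
    return any(upper(v0) and all(v0 <= v1 for v1 in T if upper(v1)) for v0 in T)

assert all(has_lub(dict(zip(T, bits)))
           for bits in product([False, True], repeat=len(T)))
```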
DANA SCOTT AND PETER KRAUSS
Then 2ᵀ ⊆ M(φ), but a measure m may be defined so that μ*_m(φ) < 1. Further, ⋂_{t∈T} {x ∈ ℝᵀ: x(t) ≤ q} = M(∀v₀ [P_q(v₀)]) ∩ ℝᵀ, and the separability condition amounts to the Gaifman Condition (G) for sentences of the form ∀v₀ [P_q(v₀)].
(1) The set of nondecreasing functions:
∀v₀ ∀v₁ [v₀ ≤ v₁ → ⋀_{q∈Q} [P_q(v₁) → P_q(v₀)]].
(2) The set of functions assuming a maximum:
∃v₁ ∀v₀ ⋀_{q∈Q} [P_q(v₁) → P_q(v₀)].
(3) The set of functions assuming at most n different values:
∃v₀ ⋯ ∃v_{n−1} ∀v_n ⋁_{i<n} ⋀_{q∈Q} [P_q(vᵢ) ↔ P_q(v_n)].
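Conditions (1) and (3) can be tested directly when T and Q are finite and the values of x lie in Q, so that the atoms P_q(t) ⇔ x(t) ≤ q determine x exactly. A Python sketch (names and the finite sets are our own illustration):

```python
from itertools import product

# Our own finite illustration of examples (1) and (3): T and Q are finite and
# the values x(t) lie in Q, so the atoms P_q(t) <=> x(t) <= q determine x.
T = [0, 1, 2, 3]
Q = [0, 1, 2]

def Pq(x, q, t):
    return x[t] <= q

def nondecreasing(x):
    """Example (1): for all v0 <= v1 and all q, P_q(v1) -> P_q(v0)."""
    return all(Pq(x, q, t1) <= Pq(x, q, t0)   # bool '<=' is implication
               for t0 in T for t1 in T if t0 <= t1 for q in Q)

def at_most_n_values(x, n):
    """Example (3): some v0..v_{n-1} agree on all P_q with every v_n."""
    return any(all(any(all(Pq(x, q, ti) == Pq(x, q, tn) for q in Q) for ti in vs)
                   for tn in T)
               for vs in product(T, repeat=n))

# Both formulas agree with the direct definitions on every x: Q^T -> Q.
for x in product(Q, repeat=len(T)):
    assert nondecreasing(x) == all(x[i] <= x[i + 1] for i in range(len(T) - 1))
    assert at_most_n_values(x, 2) == (len(set(x)) <= 2)
```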
In the following examples the sentence
APPENDIX by Peter Krauss
A measure-theoretic generalization of the Rasiowa–Sikorski Lemma
In Rasiowa and Sikorski [1950] the following theorem is proved, which is now generally known as the RASIOWA–SIKORSKI LEMMA:
Let ⟨ℬ, ∧, ∨, −⟩ be a Boolean algebra, let b ∈ ℬ such that b ≠ 1, and for every m < ω let 𝒞_m ⊆ ℬ be a subset such that ⋀𝒞_m exists in ℬ. Then there exists a prime ideal 𝔭 in ℬ such that (i) b ∈ 𝔭; (ii) for every m < ω, ⋀𝒞_m ∈ 𝔭 iff for some c ∈ 𝒞_m, c ∈ 𝔭.
An immediate consequence is the following relativised version:
THEOREM 1. Let … (iii) If … .
Proof: Apply the Rasiowa–Sikorski Lemma to the quotient algebra ℬ/[…].
We prove the following measure-theoretic generalization of Theorem 1:
THEOREM 2. Let ⟨ℬ, ∧, ∨, −⟩ be a Boolean σ-algebra, let 𝒜 ⊆ ℬ be a σ-subalgebra, and let μ be a σ-additive probability measure on 𝒜. Let ν′ be a finitely additive probability measure on ℬ such that ν′(x) = μ(x) for all x ∈ 𝒜, let x₀, …, x_{n−1} ∈ ℬ and ε > 0. Finally, for every m, n < ω let b_{mn} ∈ ℬ. Then there exists a finitely additive probability measure ν on ℬ such that
(i) |ν(xᵢ) − ν′(xᵢ)| < ε for all i < n;
(ii) for every m < ω, ν(⋀_{n<ω} b_{mn}) = lim_{n→∞} ν(⋀_{i<n} b_{mi});
(iii) ν(x) = μ(x) for all x ∈ 𝒜.
Throughout the rest of this appendix we assume that ⟨ℬ, ∧, ∨, −⟩, 𝒜, and μ are fixed as in the hypothesis of Theorem 2. The following lemma is well known:
LEMMA 3. Let b ∈ ℬ − 𝒜. Then [𝒜 ∪ {b}] = {(x ∩ b) ∪ (y − b): x, y ∈ 𝒜}.
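For finite fields of sets, Lemma 3 can be verified by brute force. The following Python sketch (the particular subalgebra and b are our own toy choices) closes 𝒜 ∪ {b} under the Boolean operations and compares the result with the right-hand side:

```python
# Our own toy instance: close A ∪ {b} under the Boolean operations and compare
# with the right-hand side of Lemma 3.
U = frozenset(range(4))
A = {frozenset(), frozenset({0, 1}), frozenset({2, 3}), U}  # a subalgebra of 2^U
b = frozenset({1, 2})                                       # b not in A

def close(sets):
    """Close a family of subsets of U under complement, union, intersection."""
    fam = set(sets)
    while True:
        new = {U - s for s in fam}
        new |= {s | t for s in fam for t in fam}
        new |= {s & t for s in fam for t in fam}
        if new <= fam:
            return fam
        fam |= new

generated = close(A | {b})
lemma_form = {(x & b) | (y - b) for x in A for y in A}
assert generated == lemma_form and len(generated) == 16
```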
LEMMA 4. Let b ∈ ℬ − 𝒜, and let a, c ∈ 𝒜 such that a ⊆ b ⊆ c and μ(a) = sup {μ(x): x ⊆ b, x ∈ 𝒜}, μ(c) = inf {μ(x): b ⊆ x, x ∈ 𝒜}. Let d = c − a, and let α, β ≥ 0 be real numbers such that α + β = 1. Define for x, y ∈ 𝒜:
ν((x ∩ b) ∪ (y − b)) = μ((x ∩ a) ∪ (y − c)) + αμ(x ∩ d) + βμ(y ∩ d).
Then ν is a σ-additive probability measure on [𝒜 ∪ {b}] such that ν(x) = μ(x) for all x ∈ 𝒜.
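The defining equation of Lemma 4 can be checked numerically on a small example. In the Python sketch below the algebra, the measure, and the choice α = 0.3 are our own illustration; the assertions confirm that ν extends μ and is additive on the generated algebra:

```python
from itertools import product

# Toy instance of Lemma 4; the algebra, the measure and alpha are our own choices.
U = frozenset(range(4))
A = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), U]   # subalgebra of 2^U
mu = {A[0]: 0.0, A[1]: 0.5, A[2]: 0.5, A[3]: 1.0}
b = frozenset({1, 2})                                        # b not in A

a = max((s for s in A if s <= b), key=lambda s: mu[s])       # here a = empty set
c = min((s for s in A if b <= s), key=lambda s: mu[s])       # here c = U
d = c - a
alpha, beta = 0.3, 0.7                                       # alpha + beta = 1

# nu((x ∩ b) ∪ (y − b)) = mu((x ∩ a) ∪ (y − c)) + alpha·mu(x ∩ d) + beta·mu(y ∩ d);
# in this toy example all three arguments on the right happen to lie in A.
nu = {}
for x, y in product(A, repeat=2):
    nu[(x & b) | (y - b)] = mu[(x & a) | (y - c)] + alpha * mu[x & d] + beta * mu[y & d]

assert all(abs(nu[x] - mu[x]) < 1e-12 for x in A)            # nu extends mu
for s, t in product(list(nu), repeat=2):                     # finite additivity
    if not (s & t):
        assert abs(nu[s | t] - nu[s] - nu[t]) < 1e-12
```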
The next lemma is also known:
LEMMA 5. Let b ∈ ℬ − 𝒜 and let ν be a finitely additive probability measure on [𝒜 ∪ {b}] such that ν(x) = μ(x) for all x ∈ 𝒜. Then ν is σ-additive.
Proof: Let x_n ∈ 𝒜, n < ω be a decreasing sequence. Then ν((⋀_{n<ω} x_n) ∩ b) ≤ lim_{n→∞} ν(x_n ∩ b). Suppose ν((⋀_{n<ω} x_n) ∩ b) < lim_{n→∞} ν(x_n ∩ b). Then
μ(⋀_{n<ω} x_n) = ν(⋀_{n<ω} x_n) = ν((⋀_{n<ω} x_n) ∩ b) + ν((⋀_{n<ω} x_n) − b) < lim_{n→∞} ν(x_n ∩ b) + lim_{n→∞} ν(x_n − b) = lim_{n→∞} ν(x_n) = lim_{n→∞} μ(x_n),
contradicting the hypothesis that μ is σ-additive. Lemma 3 now proves Lemma 5.
LEMMA 6. Let b_n ∈ ℬ, n < ω be a decreasing sequence such that ⋀_{n<ω} b_n = 0. Then there exists a finitely additive probability measure ν on ℬ such that
(i) lim_{n→∞} ν(b_n) = 0;
(ii) ν(x) = μ(x) for all x ∈ 𝒜.
Proof: Define by recursion: 𝒜₀ = 𝒜, 𝒜_{n+1} = [𝒜_n ∪ {b_n}], for n < ω. By Lemma 4 we see that without loss of generality we may assume that b_n ∉ 𝒜_n for n < ω. For every n < ω choose a_n ∈ 𝒜 such that a_n ⊆ b_n and μ(a_n) = sup {μ(x): x ⊆ b_n, x ∈ 𝒜}. Since b_n is decreasing we may assume that a_n is decreasing. By recursion we wish to define a σ-additive probability measure ν_n on 𝒜_n such that ν₀ = μ and for n < ω, ν_{n+1}(x) = ν_n(x) for every x ∈ 𝒜_n. Suppose ν_n has been defined on 𝒜_n. To define ν_{n+1} on 𝒜_{n+1}, let a′_n, c′_n ∈ 𝒜_n be such that a′_n ⊆ b_n ⊆ c′_n and
ν_n(a′_n) = sup {ν_n(x): x ⊆ b_n, x ∈ 𝒜_n},
ν_n(c′_n) = inf {ν_n(x): b_n ⊆ x, x ∈ 𝒜_n},
and let ν_{n+1} be the measure given by Lemma 4 (applied to ν_n on 𝒜_n with a = a′_n, c = c′_n, α = 0, β = 1); in particular ν_{n+1}(x ∩ b_n) = ν_n(x ∩ a′_n).
We prove by induction on n: For every n < ω, x ∈ 𝒜_n,
ν_{n+1}(x ∩ b_n) = sup {μ(z): z ⊆ x ∩ b_n, z ∈ 𝒜}.
By definition, ν_{n+1}(x ∩ b_n) = ν_n(x ∩ a′_n). In case n = 0, let x ∈ 𝒜₀. Then x ∩ a′₀ ∈ 𝒜 and by the definition of a′₀,
ν₁(x ∩ b₀) = μ(x ∩ a′₀) = sup {μ(z): z ⊆ x ∩ b₀, z ∈ 𝒜}.
Now suppose for every x ∈ 𝒜_n,
ν_{n+1}(x ∩ b_n) = sup {μ(z): z ⊆ x ∩ b_n, z ∈ 𝒜}.
We first show: ν_{n+2}(b_{n+1}) = ν_{n+2}(a_{n+1}). In fact, by definition, ν_{n+2}(b_{n+1}) = ν_{n+1}(a′_{n+1}). By the induction hypothesis, ν_{n+1}(b_n) = μ(a_n) = ν_{n+1}(a_n). Since a′_{n+1} ⊆ b_{n+1} ⊆ b_n, ν_{n+1}(a′_{n+1}) = ν_{n+1}(a′_{n+1} ∩ a_n). Since a′_{n+1} ∈ 𝒜_{n+1} and a′_{n+1} ⊆ b_n, a′_{n+1} = x ∩ b_n for some x ∈ 𝒜_n. Thus ν_{n+1}(a′_{n+1}) = ν_{n+1}((x ∩ a_n) ∩ b_n), where x ∩ a_n ∈ 𝒜_n. By the induction hypothesis,
ν_{n+1}((x ∩ a_n) ∩ b_n) = sup {μ(z): z ⊆ (x ∩ a_n) ∩ b_n, z ∈ 𝒜}.
Furthermore a_{n+1} ⊆ a′_{n+1} ⊆ b_{n+1} and a_{n+1} ⊆ a_n. Thus a_{n+1} ⊆ a′_{n+1} ∩ a_n = (x ∩ a_n) ∩ b_n ⊆ b_{n+1}. Therefore, by the definition of a_{n+1}, ν_{n+1}((x ∩ a_n) ∩ b_n) = μ(a_{n+1}). This proves ν_{n+2}(b_{n+1}) = ν_{n+2}(a_{n+1}).
Now let x ∈ 𝒜_{n+1}. Then
ν_{n+2}(x ∩ b_{n+1}) = sup {μ(z): z ⊆ x ∩ b_{n+1}, z ∈ 𝒜}.
In fact, ν_{n+2}(b_{n+1}) = ν_{n+2}(a_{n+1}) and a_{n+1} ⊆ b_{n+1}; therefore ν_{n+2}(x ∩ b_{n+1}) = ν_{n+2}(x ∩ a_{n+1}). Since x ∩ a_{n+1} ∈ 𝒜_{n+1} and x ∩ a_{n+1} ⊆ x ∩ b_{n+1}, ν_{n+2}(x ∩ b_{n+1}) ≤ sup {μ(z): z ⊆ x ∩ b_{n+1}, z ∈ 𝒜}. Clearly ν_{n+2}(x ∩ b_{n+1}) ≥ sup {μ(z): z ⊆ x ∩ b_{n+1}, z ∈ 𝒜}, which completes the inductive proof. As a direct corollary we obtain: For every n < ω, ν_{n+1}(b_n) = μ(a_n). Now define a finitely additive probability measure ν on the subalgebra ⋃_{n<ω} 𝒜_n
by ν(x) = ν_n(x) for x ∈ 𝒜_n, and extend ν to a finitely additive probability measure on ℬ. Then lim_{n→∞} ν(b_n) = lim_{n→∞} μ(a_n) = 0, since ⋀_{n<ω} a_n ⊆ ⋀_{n<ω} b_n = 0. This proves Lemma 6.
Now we proceed to prove Theorem 2. C(X) with the sup-norm is a Banach space. Any finitely additive probability measure ν on ℬ uniquely extends to a linear functional on C(X) such that (i) ν(x) ≤ ‖x‖ for all x ∈ C(X); (ii) ν(1) = 1. And, conversely, any linear functional on C(X) satisfying (i) and (ii) uniquely determines a finitely additive probability measure on ℬ. Let C(X)* be the conjugate space of C(X) and consider the weak-star topology on C(X)*. For φ ∈ C(X)*, x₀, …, x_{n−1} ∈ C(X), and ε > 0, let
N(φ; x₀, …, x_{n−1}; ε) = {ψ ∈ C(X)*: |ψ(xᵢ) − φ(xᵢ)| < ε for i < n}
M_μ = {φ ∈ C(X)*: ‖φ‖ ≤ 1, φ(x) = μ(x) for all x ∈ 𝒜}.
By the Alaoglu Theorem, M_μ with the weak-star topology is a compact Hausdorff space.
LEMMA 7. Let b_n ∈ ℬ, n < ω be a decreasing sequence such that ⋀_{n<ω} b_n = 0. For every r ≥ 1, let P_r = {φ ∈ M_μ: 1/r ≤ lim_{n→∞} φ(b_n)}. Then P_r is nowhere dense in M_μ.
Proof: Since P_r = ⋂_{n<ω} {φ ∈ M_μ: 1/r ≤ φ(b_n)}, P_r is closed in M_μ. Suppose for some φ ∈ C(X)*, x₀, …, x_{n−1} ∈ C(X) and ε > 0, we have
N(φ; x₀, …, x_{n−1}; ε) ∩ M_μ ⊆ P_r.
By the Stone–Weierstrass Theorem there exist v₀, …, v_{n−1} ∈ L such that ‖xᵢ − vᵢ‖ < ε/2, where the vᵢ lie in the subalgebra determined by some c₀, …, c_{n−1} ∈ ℬ. Let φ′ be the restriction of φ to [𝒜 ∪ {c₀, …, c_{n−1}}]. By Lemma 5, φ′ is σ-additive on [𝒜 ∪ {c₀, …, c_{n−1}}]. By Lemma 6, φ′ has an extension ψ to C(X) such that ψ ∈ M_μ and lim_{n→∞} ψ(b_n) = 0. Let i < n. Then φ(vᵢ) = φ′(vᵢ) = ψ(vᵢ). Thus
|φ(xᵢ) − ψ(xᵢ)| ≤ |φ(xᵢ) − φ(vᵢ)| + |ψ(vᵢ) − ψ(xᵢ)| ≤ ‖φ‖ ‖xᵢ − vᵢ‖ + ‖ψ‖ ‖xᵢ − vᵢ‖ < ε.
Thus ψ ∈ P_r, which is a contradiction.
Now consider the hypothesis of Theorem 2. It clearly suffices to assume that for every m < ω, b_{mn} is a decreasing sequence in n such that ⋀_{n<ω} b_{mn} = 0. For every m < ω, r ≥ 1 let
P_{mr} = {φ ∈ M_μ: 1/r ≤ lim_{n→∞} φ(b_{mn})}.
By Lemma 7, P = ⋃_{m<ω} ⋃_{r≥1} P_{mr} is of first category in M_μ. Since the set
N(ν′; x₀, …, x_{n−1}; ε) ∩ M_μ is a nonempty open set and M_μ is a compact Hausdorff space, Theorem 2 follows by the Baire Category Theorem.

References
DOOB, J. L., 1947, Probability in function space, Bull. Am. Math. Soc., vol. 53, pp. 15–30
DUNFORD, N. and J. T. SCHWARTZ, 1958, Linear operators I (Interscience, New York)
EHRENFEUCHT, A. and A. MOSTOWSKI, 1961, A compact space of models of axiomatic theories, Bull. Acad. Polon. Sci., Sér. Sci. Math., Astron. Phys., vol. 9, pp. 369–373
FENSTAD, J. E., A limit theorem in polyadic probabilities, Proc. Logic Colloquium in Leicester, August 1965 (forthcoming)
GAIFMAN, H., 1964, Concerning measures on first-order calculi, Israel J. Math., vol. 2, pp. 1–18
HALMOS, P. R., 1950, Measure theory (Van Nostrand, New York)
HALMOS, P. R., 1963, Lectures on Boolean algebras (Van Nostrand, New York)
HENKIN, L., 1949, The completeness of the first-order functional calculus, J. Symbolic Logic, vol. 14, pp. 159–166
HEWITT, E. and L. SAVAGE, 1955, Symmetric measures on Cartesian products, Trans. Am. Math. Soc., vol. 80, pp. 470–501
HORN, A. and A. TARSKI, 1948, Measures in Boolean algebras, Trans. Am. Math. Soc., vol. 64, pp. 467–497
KARP, C. R., 1964, Languages with expressions of infinite length (North-Holland Publ. Comp., Amsterdam)
KELLEY, J. L., 1959, Measures in Boolean algebras, Pacific J. Math., vol. 9, pp. 1165–1177
LOÈVE, M., 1960, Probability theory (Van Nostrand, New York)
ŁOŚ, J., 1962, Remarks on foundations of probability, Proc. Intern. Congress of Mathematicians, Stockholm, 1962, pp. 225–229
ŁOŚ, J. and E. MARCZEWSKI, 1949, Extension of measure, Fundamenta Mathematicae, vol. 36, pp. 267–276
NELSON, E., 1959, Regular probability measures on function space, Ann. Math., vol. 69, pp. 630–643
RASIOWA, H. and R. SIKORSKI, 1950, A proof of the completeness theorem of Gödel, Fundamenta Mathematicae, vol. 37, pp. 193–200
SIKORSKI, R., 1964, Boolean algebras (second ed., Springer-Verlag, Berlin)
PROBABILITY AND THE LOGIC OF CONDITIONALS* ERNEST W. ADAMS University of California, Berkeley, California
1. Introduction. The purpose of this paper is to give a rigorous mathematical foundation for a theory of the logic of conditionals based on probabilistic concepts, which has been informally presented in Adams [1965]. This theory involves a formal calculus analogous to the propositional calculus, for symbolizing conditional statements and their truth-functional components, and gives rules for determining the formal validity of inferences symbolized within the calculus. The objective of setting up this calculus is to give a more adequate representation of inferences involving conditionals than does the propositional calculus. Thus, many writers, e.g. Lewis [1932], Belnap [1960], Belnap and Anderson [1962], and Angell [1962] have felt that the fact that the propositional calculus allows the 'fallacies' of material implication (from 'not p' to infer 'if p then q', or from 'q' to infer 'if p then q') as valid inferences shows that this calculus does not adequately represent the laws of the ordinary English 'if then', and they have proposed alternative calculi aiming to better represent the logic of conditionals. The calculus I will develop here has the same objectives as those of these other writers; only I hope to show that it has several advantages over previously given calculi for representing the logic of English conditionals. Before going into the details of our calculus, or discussing its advantages, I shall briefly describe the main theses upon which the formal theory rests. The main philosophical ideas upon which our theory is based will be stated here without much discussion or defense, for which the reader may refer to Adams [1965].
* This work was supported in part by National Science Foundation Grant GS824, and was carried out during the tenure of a Fellowship at the Center for Advanced Study in the Behavioral Sciences.
conditional statements whose antecedents prove to be false. Without getting involved with the tricky question as to whether one can repair the lack of definition of the terms 'true' and 'false' by supplying a satisfactory general criterion of truth which would apply to conditionals, the lack of a clear ordinary definition suggests that it may be more profitable not to attempt to characterize validity of inferences involving conditionals by the requirement that it should be impossible for the premises of the inference to be true but the conclusion to be false. My second and more positive contention is that in analyzing the validity of inferences involving conditionals, one must consider what I vaguely characterize as conditions of justified assertability for the statements involved. Some points of contrast between justified assertability and truth are the following. First, 'truth' suggests verification or tests to determine truth, and tests must be performed at some time and, usually, on some things. Normally, the things upon which verification tests are performed are things referred to in the statement under test, and the test is performed at a time indicated by the tense of the statement, or explicitly referred to in the statement (thus, to verify a statement like 'John will get married tomorrow', one must wait until tomorrow, and then observe John). In contrast, the conditions of justified assertability are ones which prevail at the time of the making of the statement, and depend on such things as the authority of the speaker to say what he says at the time he says it. Though there is certainly a connection between justified assertability and truth, one is usually neither a necessary nor a sufficient condition for the other. In particular, one may be well justified in making a statement which ultimately proves to be false, and one may also be completely unjustified in making a statement which ultimately proves to be true.
The foregoing observation leads immediately to my third main thesis. The extent to which one is justified in making a statement on an occasion is a matter of degree. And, if it is the case that one may be very well justified in making a statement even when there is non-negligible likelihood that the statement may eventually prove false, this suggests that it may be fruitful to bring probabilistic ideas explicitly into consideration in developing a theory of the logic of conditionals. That is, I propose to replace the vague and unquantified notion of 'justified assertability' by that of 'high probability' (i.e. probability very close to 1). This leads in turn to formulating an alternative criterion of 'reasonableness' for inferences, in place of the criterion of validity stated in terms of truth-conditions, as follows: an inference is reasonable just in case it is impossible for its premises to have high probability while the conclusion has low probability. Thus, we arrive at the criterion for reasonableness of inferences simply by substituting 'high probability' for 'true' in the classical definition of 'validity'. As it turns out, this substitution makes no difference in those inferences of the propositional calculus which do not involve conditionals, the intuitive reason being that, where 'true' is clearly defined, 'high probability' is the same as 'high probability of being true'. The close connection between high probability and truth which exists for ordinary unconditional statements does not prevail with conditionals. This is a consequence of my fourth basic thesis, which is that the probabilities of conditional statements are to be construed as conditional probabilities. The conditional probability of the statement 'if p then q' can easily be seen to be high just in case it is much more likely that p and q are both true than that p is true and q is false. But this condition is quite different from one requiring that the probability that 'if p then q' is true (where truth is connected with tests) is high. The fact that the close connection between probability and truth which exists for unconditional statements fails to hold for conditionals partly explains, I think, why it is that the propositional calculus gives a more adequate representation of logical relations among non-conditional statements than it does among conditionals. The main elements of the present theory may be summed up as follows: I propose to replace the criterion of 'validity' stated in terms of truth conditions by a criterion of 'reasonableness'. This criterion requires that the high probability of the premises of a reasonable inference must guarantee that the conclusion also has a high probability. And, finally, the probabilities of conditional statements are taken to be conditional probabilities. With this informal exposition, we are in a position to proceed to the development of a precise mathematical formulation and analysis of these ideas.
Before doing that, though, it may help to motivate what follows to describe some of the evidence which can be brought in support of our theory, showing how it helps to explain many facts concerning the logic of conditionals, and even predicts some heretofore unsuspected phenomena. First, it is easy to explain in the light of the theory why it is that the 'fallacies' of material implication do strike people as fallacious. Why should we be reluctant to infer from 'It will not rain tomorrow' the conclusion 'If it does rain tomorrow then the game will be played'? One good reason, at least, for our reluctance is surely this: we can only assert 'It will not rain tomorrow' with a high probability, but this high probability does not by any means entail a high probability for 'If it does rain tomorrow then the game will be played'. At any rate, this conclusion would certainly not be a safe one to draw from the assertion, and our intuitive recognition of this fact explains, I
think, why we regard the inference as questionable. Note that in arguing in the foregoing way we have taken explicitly into account the situation of the persons making the statements in question, and of their hearers, and not just the things the statements are about (tomorrow's weather, and the game). A second bit of evidence supporting the theory is that it predicts that some of the rules of inference involving conditionals which have so far been accepted as valid are not in fact universally valid, and that it should be possible to find instances of premises and conclusions conforming to these rules such that it would not be reasonable to infer the conclusions from the premises. Several examples of 'exceptions' to usually valid rules are given in Adams [1965], and I select one for illustration, a counterexample to show that the law of Hypothetical Syllogism does not always yield reasonable inferences. The counterexample to this rule of inference (to infer 'if p then r' from 'if q then r' and 'if p then q') is the following: If Smith wins the election then Brown will retire to private life. If Brown dies before the election then Smith will win the election. Therefore, if Brown dies before the election then he will retire to private life. Of course, it may be argued that this example and ones like it represent the kinds of logical oddities which formal logic may properly ignore, or which may be handled in ad hoc ways and do not need to be taken into account in a general theory. The point in favor of the present theory, however, is that it predicts the existence of such examples, and yields considerable information to help in the search for them (by telling what kinds of probability relations must subsist between 'p', 'q' and 'r' and their compounds if it is to be the case that 'if q then r' and 'if p then q' have high probabilities, but 'if p then r' has a low probability).
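The required probability relations can be exhibited with concrete numbers. In the Python sketch below the distribution over truth-value combinations is our own illustration; it makes both premises highly probable while the conclusion's conditional probability is 0:

```python
# A concrete distribution (the numbers are our own) realizing the counterexample.
# Worlds assign truth values to p = "Brown dies before the election",
# q = "Smith wins the election", r = "Brown retires to private life".
worlds = {  # (p, q, r): probability
    (True,  True,  False): 0.01,   # Brown dies, Smith wins, no retirement
    (False, True,  True):  0.495,  # Brown lives, Smith wins, Brown retires
    (False, False, False): 0.495,  # Brown lives, Smith loses
}

def prob(pred):
    return sum(w for v, w in worlds.items() if pred(*v))

def cond(consequent, antecedent):
    """P('if antecedent then consequent') as a conditional probability."""
    pa = prob(antecedent)
    pboth = prob(lambda p, q, r: antecedent(p, q, r) and consequent(p, q, r))
    return 1.0 if pa == 0 else pboth / pa

premise1 = cond(lambda p, q, r: r, lambda p, q, r: q)    # if Smith wins, Brown retires
premise2 = cond(lambda p, q, r: q, lambda p, q, r: p)    # if Brown dies, Smith wins
conclusion = cond(lambda p, q, r: r, lambda p, q, r: p)  # if Brown dies, Brown retires
assert premise1 > 0.97 and premise2 == 1.0 and conclusion == 0.0
```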
The third kind of evidence supporting the theory is that it yields an explanation as to why certain rules of inference involving conditionals are usually accepted without question, even though they 'have exceptions' (if the theory is correct), whereas other rules of inference (e.g., those associated with the fallacies of material implication) are immediately seen to be fallacious. One example is the logical law connecting disjunctions with conditionals (to infer 'if p then q' from 'either not p or q' and vice versa), which the theory shows to have exceptions, though it is usually unchallenged. Indeed, if this law were universally valid, then this would justify treating the English 'if then' as a material conditional for the purposes of logical analysis. What the present theory shows is that inferring 'if p then q' from 'either not p or q' is not always reasonable, but that the only situation under which 'either not p or q' has a high probability but 'if p then q' has a low one is the situation in
which 'not p' has a high probability. Assuming this, we have an immediate explanation of why we are ordinarily willing to infer 'if p then q' from 'either not p or q': the reason is that people do not ordinarily assert a disjunction when they are in a position to assert one of its members outright (in fact, it is misleading to do so, and therefore doing it probably runs against strong conventions for the proper use of language). Thus, if one heard it said that 'either the game will not be played tomorrow, or the Dodgers will win' he would be well justified in inferring 'if the game is played tomorrow, then the Dodgers will win', and what would justify the inference would be the knowledge that the person asserting 'either the game will not be played or the Dodgers will win' did not do so simply on the grounds of information he had to the effect that the game would not be played. Similar analysis explains why the laws of Contraposition and Hypothetical Syllogism, both of which have exceptions, nevertheless are reasonable in the loose sense that if conventions of proper language use (e.g., don't assert a disjunction when you are in a position to assert one of its members outright) are adhered to, then on any occasion on which the premises are asserted with high probabilities the conclusions must also have high probabilities (whereas no such conventions justify the 'fallacious' inferences of material implication). We now turn to the formal development of the theory. This formal theory is not here being offered as a serious rival to standard formal logic. As will be seen, the formalism involves many questionable and arbitrary features, and it certainly falls far short of current formal logic in scope and power (in spite of its advantages in handling conditionals).
On the other hand, it seems to me desirable to attempt to develop a precise theory even though it is highly provisional, since in this way we may hope to bring to light difficulties which would otherwise go unnoticed, and to get some indications which could lead to improvements.
2. Basic concepts. Three formal concepts are basic to the present theory: those of a formula (and that of a language), of a probability function for a language, and of reasonable consequence, which is a relation holding between sets of formulas and formulas. The formulas we work with are intended to symbolize ordinary truth-functionally analyzable propositions, and conditionals whose antecedents and consequents are truth-functional propositions. Formal truth-functional compounds are constructed just using the operations for conjunction, disjunction and negation (the symbols for which are '&', '∨', and '~', respectively), and conditionals are constructed using the arrow, '→'. As a technical convenience I also include two distinguished
formulas, 'T' and 'F', among the atomic formulas, which are interpreted as a logical truth and a contradiction, respectively. One important limitation of this formalism, distinguishing it from the ordinary propositional calculus, is that the arrow can occur only as the main connective of a formula, and therefore truth-functional compounds of conditionals, as well as iterated conditionals, cannot be symbolized within the formalism. The general reason for imposing this limitation is to insure that all formulas of the language can have probabilities assigned to them which obey the axioms of the standard calculus of conditional probability, which does not apply in its usual form to compound expressions with conditional components. That this restriction is somewhat more severe than necessary is probably obvious: in fact, formal denials and 'quasi-conjunctions' of conditionals will be introduced later as metalinguistic abbreviations, and, as Patrick Suppes has pointed out, conditionals with conditional consequents might be introduced with the assumption that they obey a probabilistic version of the law of Importation. Not all 'truth-functional' compounds of conditionals are so easily handled, however (see Adams [1965] for informal discussion of some problems arising with disjunctions of conditionals, 'only if' statements, and conditionals with conditional antecedents) and so it seems somewhat easier at the outset to impose a blanket restriction excluding all compounds with conditional components, rather than attempt to formulate complicated rules specifying just which compounds containing conditionals can have probabilities assigned, and which cannot.¹
DEFINITION 1.
1.1. A formula is any expression of one of the following three kinds:
i) atomic: 'T', 'F', and all lower case Latin letters, with or without numeral subscripts;
ii) truth-functional: all atomic formulas and expressions constructable from them using the binary connectives '&' and '∨', and the unary connective '~';
iii) conditional: all formulas of the form φ→ψ, where φ and ψ are truth-functional formulas.
1.2. Let α be a set of atomic formulas including 'T' and 'F'. The language of α is the set of all formulas which include only atomic formulas in α. A language is a set of formulas which is the language of some set, α, of atomic formulas including 'T' and 'F'.
In what follows, I will ordinarily use lower case Greek letters as variables ranging over truth-functional formulas, use capital Latin letters from 'A'
through 'D' as variables ranging over conditional formulas, and use capital Latin letters 'S'–'Z' as variables ranging over sets of formulas. I will speak freely of a formula or set of formulas, S, tautologically implying a formula A, even where S and A may include conditional formulas. In applying the concepts of tautology and tautological implication to conditional formulas, it will be understood that they are to be regarded as material conditionals for the purpose of the application. Thus, I will say that 'p' tautologically implies 'p→q', though the inference is not reasonable. An abbreviation which we will use informally is to write 'φψ' in place of 'φ & ψ'. It is convenient to introduce at this point some auxiliary concepts and notation applying to formulas and sets of formulas, even though most of it will only be employed in later sections. Some metalinguistic notation is defined first.
DEFINITION 2. Let A be a formula.
2.1. If A = ψ then Ā = ~ψ; if A = φ→ψ, then Ā = φ→~ψ.
2.2. If A = ψ then Ant(A) = T, Cons(A) = ψ and Cond(A) = T→ψ; if A = φ→ψ then Ant(A) = φ, Cons(A) = ψ, and Cond(A) = φ→ψ.
What Definition 2.1 does is to introduce a kind of metalinguistic negation operation, which is the same as the ordinary negation as applied to truth-functional formulas, and which, as applied to a conditional φ→ψ, yields the 'contrary conditional', φ→~ψ. Denials of conditionals could have been included directly in the object language, but putting them into the metalanguage emphasizes the fact that these are to be regarded as abbreviations, and are not to be interpreted as symbolizing English statements having the form of denials of conditionals. Definition 2.2 introduces formal symbols for the antecedents and consequents of conditionals. It also introduces a kind of formal conditional T→ψ associated with a truth-functional statement, ψ.
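Definitions 1 and 2 are easy to mirror in code. The Python sketch below (class and function names are our own) represents truth-functional parts as opaque strings and implements the metalinguistic operations:

```python
from dataclasses import dataclass
from typing import Union

# Sketch with our own class names: truth-functional formulas are kept as
# opaque strings; a conditional is a pair of truth-functional parts.
@dataclass(frozen=True)
class TF:
    text: str

@dataclass(frozen=True)
class Cond:
    ant: TF
    cons: TF

Formula = Union[TF, Cond]

def neg(a: Formula) -> Formula:
    """Definition 2.1: ordinary negation on TF formulas; the 'contrary
    conditional' phi -> ~psi on conditionals."""
    if isinstance(a, TF):
        return TF(f"~{a.text}")
    return Cond(a.ant, TF(f"~{a.cons.text}"))

def ant(a: Formula) -> TF:
    """Definition 2.2: Ant(psi) = T for truth-functional psi."""
    return TF("T") if isinstance(a, TF) else a.ant

def cond_of(a: Formula) -> Cond:
    """Definition 2.2: Cond(psi) = T -> psi; a conditional is its own Cond."""
    return Cond(TF("T"), a) if isinstance(a, TF) else a

assert neg(Cond(TF("p"), TF("q"))) == Cond(TF("p"), TF("~q"))
assert cond_of(TF("q")) == Cond(TF("T"), TF("q"))
```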
It will be seen that the formulas ψ and T→ψ are interchangeable in all of their logical relations (including ones of reasonable inference), and it is convenient to be able to reduce everything to just the consideration of relations among conditionals, by replacing any truth-functional ψ by T→ψ. The next series of notions to be defined apply only to finite languages: i.e., to languages, ℒ, generated by finite sets of atomic formulas. As is well known, these languages are 'atomic', in the following sense. There is a set, SDℒ, of 'state descriptions for ℒ', having the following properties: (1) SDℒ is a finite set of consistent truth-functional formulas of ℒ, (2) any two distinct members α and β of SDℒ are tautologically inconsistent, and (3)
every truth-functional formula φ of ℒ which is not tautologically equivalent to F is tautologically equivalent to a (unique) disjunction of formulas of SDℒ. These SD-sets for finite languages are constructable in well known ways, and I will assume without giving the explicit construction that each finite language, ℒ, has associated with it a uniquely specified SD-set which will be called 'the SD-set of ℒ'. This assignment of SD-sets to finite languages is assumed in the following definition.
DEFINITION 3. Let ℒ be a finite language, let SD be the SD-set of ℒ, let α be a member of SD, let S be a finite set of formulas of ℒ, let A be a formula of ℒ, and let Ant(A) = φ and Cons(A) = ψ.
3.1. α belongs to φ if and only if α tautologically implies φ.
3.2. SD(A) is the set of all α in SD belonging to φ; SD*(A) is the set of all α in SD belonging to φψ.
3.3. SD(S) is the union of all SD(B) for B in S; SD*(S) is the union of all SD*(B) for B in S.
3.4. S is null if and only if SD(S) = SD*(S).
The significance of the functions SD(S) and SD*(S) will only become apparent later. For the moment we may note the following. Strictly speaking, the concepts introduced in Definition 3 should all be relativized to the particular language ℒ. We are being somewhat informal in not making this relativization explicit in the notation. In future developments, ambiguity will be avoided by always specifying which language is under consideration. In the case of the concept 'null', the dependence on the language is only apparent, since a set S of formulas is null relative to one language if and only if it is null relative to all (finite) languages. Probability functions for languages are defined next, using essentially the Kolmogorov axioms for conditional probability, with one important modification. The modification consists in assigning probability 1 to all conditional formulas whose antecedents have probability 0. The justification for this procedure is the following.
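For a finite language the SD-set can be enumerated directly. A minimal Python sketch (the representation is our own: a state description is stored as the set of its literals):

```python
from itertools import product

# Sketch (representation our own): a state description over a finite set of
# atoms is one full assignment of truth values, stored as a set of literals.
def sd_set(atoms):
    return [frozenset(zip(atoms, vals))
            for vals in product([True, False], repeat=len(atoms))]

def belongs(alpha, literal):
    """alpha 'belongs to' a literal iff it tautologically implies it."""
    return literal in alpha

SD = sd_set(["p", "q"])
assert len(SD) == 4                                    # property (1): finitely many
assert sum(belongs(a, ("p", True)) for a in SD) == 2   # 'p' = disjunction of 2 SDs
```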
Assignment of probability 1 to a statement can be justified if it can be shown that the statement is perfectly 'safe', and in Adams [1965] 'safe' is interpreted to mean 'safe in a betting sense'. Now, any bet on a conditional is itself conditional, and is called off in case the antecedent 'condition' fails to be satisfied. If the antecedent condition has probability 0, then the bet is certain to be called off, and therefore the bet (and, by implication, the statement) is perfectly safe in the sense that it is practically sure not to be lost (though it is equally sure not to be won). The foregoing argument to justify assigning probability 1 to conditionals whose antecedents
have probability zero is certainly not conclusive: for the present it is being used as a heuristic justification for making some disposition of these troublesome cases. The arbitrariness of this disposition shows the need to consider the consequences of alternative solutions. One such alternative is in fact discussed in Section 9, where it is shown that the alternative does lead to somewhat different rules for 'reasonable inference', but the two systems of rules are related in a very simple way. DEFINITION 4. Let se be a language. 4.1. A probabilityfunction for se is a realvalued function P with domain se such that: for all truthfunctional formulas
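The convention just motivated can be written down directly. The following Python sketch is a minimal illustration (it implements only the conditional-probability clause discussed above, not the full list of axioms of Definition 4; the helper names are assumptions of the sketch): a distribution over state descriptions induces a probability function, and a conditional whose antecedent has probability 0 receives probability 1.

```python
def make_P(dist):
    """dist: list of (state_description, weight) pairs, weights summing to 1."""
    def P(formula):
        # Probability of a truth-functional formula: total weight of the
        # state descriptions that make it true.
        return sum(w for sd, w in dist if formula(sd))

    def P_cond(ant, cons):
        # Probability of the conditional ant -> cons: the conditional probability
        # of cons given ant, set to 1 when P(ant) = 0 (the 'safe bet' convention).
        pa = P(ant)
        if pa == 0:
            return 1.0
        return P(lambda sd: ant(sd) and cons(sd)) / pa

    return P, P_cond

dist = [
    ({"p": True,  "q": True }, 0.50),
    ({"p": True,  "q": False}, 0.25),
    ({"p": False, "q": True }, 0.25),
]
P, P_cond = make_P(dist)
print(P_cond(lambda sd: sd["p"], lambda sd: sd["q"]))   # ~ 2/3
print(P_cond(lambda sd: False,  lambda sd: sd["q"]))    # 1.0, by the convention
```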
274
ERNEST W. ADAMS
namely, that it should be possible to guarantee arbitrarily high probability for A by assigning sufficiently high (but less than 1) probabilities to the formulas of S. We shall see later that this requirement is equivalent to the following even weaker requirement: that if A is reasonably inferable from S, it should not be the case that arbitrarily high probabilities for formulas in S are compatible with arbitrarily low probability for A. In addition to the notion of 'reasonable consequence', a concept of 'strict consequence' is also defined, according to which A is a strict consequence of S if and only if the fact that all formulas of S have probability 1 guarantees that A has probability 1. As might be expected, strict consequence proves to be equivalent to tautological consequence, and hence, in fact, is a less stringent requirement than reasonable consequence. DEFINITION 5. Let ℒ be a language, and let S and A be, respectively, a set of formulas of ℒ and a formula of ℒ. 5.1. A is a reasonable consequence of S (in symbols, S ⊩ A) if and only if for all ε > 0 there exists δ > 0 such that for all probability functions P for ℒ, if P(B) > 1 − δ for all B in S, then P(A) > 1 − ε. 5.2. A is a strict consequence of S (in symbols, S ⊨ A) if and only if for all probability functions P of ℒ, if P(B) = 1 for all B in S, then P(A) = 1. Three things are to be noted about the concepts just defined. First, formal strictness would require that the dependence of the concepts of reasonable and strict consequence on the language be made explicit in the notation. In fact, however, the dependence is only apparent, since it is easy to show that A is a reasonable or strict consequence of S relative to ℒ if and only if it is a reasonable or strict consequence of S relative to any other language. Second, it is trivial to see that if P₁, P₂, ... is a uniform sequence of probability functions, S is a finite set, lim_{n→∞}(P_n(B)) = 1 for all B in S, but lim_{n→∞}(P_n(A)) < 1, then A is not a reasonable consequence of S. The converse is also true (i.e., if the fact that lim_{n→∞}(P_n(B)) = 1 for all B in S guarantees that lim_{n→∞}(P_n(A)) = 1, then A is a reasonable consequence of S), but this is not so obvious. Finally, I have deliberately used the tautological consequence symbol '⊨' here as the symbol for strict consequence. The justification has already been noted: namely, that strict and tautological consequence are equivalent. This is shown in Theorem 1, below, which establishes some other elementary consequences of the basic definitions.
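Definition 5.1 can be put to work negatively: to show that a conclusion is not a reasonable consequence of premises, it suffices to exhibit probability functions giving the premises probabilities arbitrarily close to 1 while the conclusion's probability stays bounded away from 1. The sketch below is an illustrative three-world family (the Hypothetical Syllogism, discussed in the next section, supplies the example; the particular weights are this sketch's assumptions): as the scaling parameter t shrinks, the premises a → b and b → c approach probability 1 while a → c stays fixed near 0.1.

```python
def model(t):
    # Weighted worlds over atoms a, b, c; t in (0, 1] is a hypothetical scaling parameter.
    worlds = [
        ({"a": True,  "b": True, "c": False}, 0.9 * t),
        ({"a": False, "b": True, "c": True }, 1.0 - t),
        ({"a": True,  "b": True, "c": True }, 0.1 * t),
    ]
    def P(h):
        return sum(w for sd, w in worlds if h(sd))
    def P_cond(ant, cons):
        # Conditional probability, with the probability-0 antecedent convention.
        pa = P(ant)
        return 1.0 if pa == 0 else P(lambda sd: ant(sd) and cons(sd)) / pa
    return P_cond

a = lambda sd: sd["a"]; b = lambda sd: sd["b"]; c = lambda sd: sd["c"]

for t in (0.1, 0.01, 0.001):
    Pc = model(t)
    # Premises a -> b and b -> c approach 1; the conclusion a -> c stays near 0.1.
    print(Pc(a, b), Pc(b, c), Pc(a, c))
```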
THEOREM 1. Let ℒ be a language, and let S and A be, respectively, a set of formulas and a formula of ℒ.
1.1. S ⊨ A if and only if A is a tautological consequence of S. 1.2. If S ⊩ A then S ⊨ A. 1.3. ⊩ is a deduction relation when restricted to finite sets: i.e., if S and S' are finite sets of formulas, then i) if A is in S, then S ⊩ A; ii) if S' ⊩ B for all B in S, and S ⊩ A, then S' ⊩ A; iii) if S' and A' result from S and A, respectively, by substituting a truth-functional formula
A' is a tautology, and therefore it has probability 1 for any probability function. And it follows from this that any probability function P such that P(B_i) = 1 for i = 1, ..., n also assigns P(A') = 1. But it follows directly from the definition of a probability function that if C is any formula and C' is its 'associated' material conditional (or C' = C, if C is truth-functional), then P(C) = 1 if and only if P(C') = 1. Hence, if P(B_i) = 1 for i = 1, ..., n, then P(A) = 1, and therefore A is a strict consequence of S', and hence of S. This concludes the proof of 1.1. Proof of 1.2. This follows immediately from Theorem 1.1 just proven. For, if S ⊩ A but not S ⊨ A, then A is not a tautological consequence of S, and therefore there is a truth-assignment to the formulas of the language under which all formulas of S have the value 'true' but A has the value 'false'. Again, a probability function can be defined as in the proof of 1.1,
such that P(B) = 1 for all B in S but P(A) = 0, from which it follows trivially that A cannot be a reasonable consequence of S. This proves 1.2. Proof of 1.3. That if A is in S then S ⊩ A is obvious. To prove part (ii), suppose that S and S' are both finite, that S' ⊩ B for all B in S, and S ⊩ A. For any ε > 0 there exists δ > 0 such that if P(B) > 1 − δ for all B in S, then P(A) > 1 − ε. Since S' ⊩ B for all B in S, there exists δ_B for each B in S such that if P(C) > 1 − δ_B for all C in S', then P(B) > 1 − δ. Since S is finite, there exists a minimum δ_B for all B in S, which is positive; let δ₀ be this minimum. Clearly, then, if P(C) > 1 − δ₀ for all C in S', then P(B) > 1 − δ for all B in S, and therefore P(A) > 1 − ε. Hence S' ⊩ A, as was to be shown. Part (iii) follows directly from the fact that, for any probability function P' of ℒ, it is possible to construct another probability function P of ℒ such that P(B) = P'(B') for all formulas B and B' of ℒ, where B' results from B by replacing all occurrences of a by φ. The construction of P is elementary and will not be described here. Assuming this construction, it follows directly that if not S' ⊩ A', then not S ⊩ A. For, if there were some ε > 0 such that for all δ > 0 there existed a probability function P' such that P'(B') > 1 − δ for all B' in S' but P'(A') ≤ 1 − ε, then it would also be the case that P(B) > 1 − δ for all B in S but P(A) ≤ 1 − ε, and hence not S ⊩ A. This concludes the proof. Theorem 1.1 shows that the notion of strict consequence is of no formal interest, since it is equivalent to tautological consequence. The intuitive significance of Theorem 1.1 is that it suggests that we should not 'get in trouble' in analyzing logical relations among conditional statements by treating them as material conditionals, so long as the premises of our arguments can be asserted with logical certainty.
That is, where we may expect trouble in applications of standard logic is in situations in which we are reasoning from premises which are not known with certainty. Theorem 1.3 is significant in showing that the reasonable consequence relation has at least some minimal properties of deduction relations, and therefore justifies calling this a 'consequence' relation, at least as applied to finite sets of premises. That the probabilistic consequence relation is not a deduction relation when its domain is extended to include infinite sets of formulas is seen from the fact that it fails to satisfy the compactness condition: i.e., there are infinite sets of formulas S and formulas A such that S ⊩ A, but not S' ⊩ A for any finite subset S' of S. An example of a set S and formula A having this property is as follows. Let S be the set of all formulas B_i = 'a_i ∨ a_{i+1} → a_{i+1} & ¬a_i' for i = 1, 2, ... (where the 'a_i' are distinct atomic formulas), and let A = 'a₁ → F'. Now it is a trivial consequence of the axioms of probability that if P(B_i) ≥ ⅔ for all i = 1, 2, ..., then P(a_i) ≤ ½P(a_{i+1}) for all i,
from which it follows that P(a₁) must be 0, hence P(a₁ → F) = 1. Clearly, therefore, S ⊩ A, since an arbitrarily high probability for A can be guaranteed by requiring that all formulas of S have probability of at least ⅔. On the other hand, the same argument shows that for any finite subset S' of S, an assignment P(a₁) > 0, and therefore P(a₁ → F) = 0, is consistent with assigning arbitrarily high probabilities to all formulas of S', so it is not the case that S' ⊩ A.
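The arithmetic behind this example is easy to spot-check. The following sketch (a random numerical test, not part of the formal argument) samples joint distributions over a_i and a_{i+1} and confirms that a probability above ⅔ for B_i forces P(a_i) ≤ ½P(a_{i+1}):

```python
import random

def spot_check(trials=20_000):
    for _ in range(trials):
        # Random weights for the four joint truth-value cases of (a_i, a_{i+1}).
        w = [random.random() for _ in range(4)]
        s = sum(w)
        p_tt, p_tf, p_ft, p_ff = (x / s for x in w)
        p_ai = p_tt + p_tf               # P(a_i)
        p_aj = p_tt + p_ft               # P(a_{i+1})
        p_or = p_tt + p_tf + p_ft        # P(a_i v a_{i+1})
        # B_i = a_i v a_{i+1} -> a_{i+1} & ~a_i,
        # so P(B_i) = P(a_{i+1} & ~a_i) / P(a_i v a_{i+1}).
        p_B = 1.0 if p_or == 0 else p_ft / p_or
        if p_B > 2 / 3 and p_ai > 0.5 * p_aj + 1e-9:
            return False
    return True

print(spot_check())  # True
```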
In what follows we shall be concerned exclusively with the reasonable consequence relation restricted to finite sets of premises. It will prove that the reasonable consequence relation restricted to finite sets is equivalent to several other conditions with intuitive significance, and in fact it is possible to give a system of rules of inference within a natural deduction system such that a conclusion A follows from a finite set S of premises if and only if A is derivable from S by those rules. These rules will be given in the following section, in the definition of the relation of 'probabilistic consequence', and it will be shown that derivability in accordance with these rules is a sufficient condition for a conclusion to be a reasonable consequence of premises. The proof that probabilistic consequence is also a necessary condition for reasonable consequence (the completeness proof) is more difficult, and requires further preliminaries. 3. Probabilistic consequence. We now give a set of rules for deriving 'probabilistic consequences' from sets of formulas S, and show that if a formula A is a probabilistic consequence of S, then A is a reasonable consequence of S. The rules for deriving probabilistic consequences form the clauses of Definition 6, below. DEFINITION 6. Let S be a set of formulas. Then the set of probabilistic consequences (abbreviated 'p.c.s') of S is the smallest set S' having S as a subset such that for all truth-functional formulas φ, ψ, and γ: PC1. if
ψ → γ;
if either
In practice, in considering the probabilistic consequences of a set S of formulas of a language ℒ, we may restrict the application of the rules of inference PC1–PC8 to just formulas of ℒ. That is, the class of all p.c.s of S that are formulas of ℒ is the same as the class of formulas derivable from S by rules PC1–PC8 restricted to formulas of ℒ. Observe concerning rules PC1–PC8 that all of these are tautologically valid. This is to be expected, of course, if probabilistic consequence is to be equivalent to reasonable consequence, since all reasonable consequences are tautological consequences (Theorem 1.2). Two of these rules are, however, conspicuously weaker than well-known corresponding rules of tautological inference. Thus, the tautologically valid rule to deduce φ → γ from φ ∨ ψ → γ is weakened in PC5, so that φ → γ can only be inferred from φ ∨ ψ → γ together with ψ → ¬γ. Likewise, PC8 is a weakened version of the Hypothetical Syllogism. As noted in Section 1, the Hypothetical Syllogism is not reasonable in complete generality; what PC8 asserts is the weaker law that φ → γ can be inferred from φ → ψ and φ & ψ → γ, and this is reasonable in complete generality. Something similar also holds in the case of PC5; the stronger version of that rule (that φ → γ can be derived from φ ∨ ψ → γ alone) is not reasonable in complete generality. In fact, we shall eventually prove the following: if either PC5 or PC8 is replaced by its 'stronger' version in the definition of probabilistic consequence, then all tautological consequences are derivable by the modified rules. The next theorem asserts what was taken for granted above: namely, that all probabilistic consequences are reasonable consequences.
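The failure of the 'strong' version of PC5 is easily witnessed. In the sketch below (an illustrative two-world distribution; the atom names and weights are this sketch's assumptions), the premise φ ∨ ψ → γ is highly probable, yet the stronger conclusion φ → γ has probability 0, and the auxiliary premise ψ → ¬γ that PC5 demands is itself improbable:

```python
# Two weighted worlds over the atoms f (for phi) and g (for gamma); psi is ~f here.
worlds = [
    ({"f": True,  "g": False}, 0.05),   # a phi-world where gamma fails
    ({"f": False, "g": True }, 0.95),   # a psi-world where gamma holds
]

def P(h):
    return sum(w for sd, w in worlds if h(sd))

def P_cond(ant, cons):
    # Conditional probability, with the probability-0 antecedent convention.
    pa = P(ant)
    return 1.0 if pa == 0 else P(lambda sd: ant(sd) and cons(sd)) / pa

phi = lambda sd: sd["f"]
psi = lambda sd: not sd["f"]
gamma = lambda sd: sd["g"]
disj = lambda sd: phi(sd) or psi(sd)

print(P_cond(disj, gamma))                      # ~ 0.95: phi v psi -> gamma is probable
print(P_cond(phi, gamma))                       # 0.0: phi -> gamma is not
print(P_cond(psi, lambda sd: not gamma(sd)))    # 0.0: the side premise psi -> ~gamma fails too
```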
All that is required to prove this is to show that any conclusion derivable by a single application of any of PC1–PC8 is a reasonable consequence of the formulas from which it is immediately derived, since Theorem 1.3 guarantees that reasonable consequences of reasonable consequences are themselves reasonable consequences. THEOREM 2. Let S be a set of formulas, and let A be a formula. If A is a probabilistic consequence of S then A is a reasonable consequence of S. Proof. As noted above, all that is required is to show that all immediate inferences in accordance with rules PC1–PC8 are reasonable consequences of the formulas from which they are derived. This will be shown in only two cases, Rules PC1 and PC4, since the proofs for the other rules proceed in entirely similar fashion. PC1 is trivial. If φ is tautologically equivalent to ψ, then for any probability function P, P(φ) = P(ψ), and therefore P(φ → γ) = P(ψ → γ). Hence ψ → γ is clearly a reasonable consequence of φ → γ.
The proof of rule PC4, as well as of the other 'non-trivial' rules PC5–PC8, is most easily obtained using the following simple inequality (which will prove important in other developments): for any probability function P and truth-functional formulas φ, ψ, γ and μ such that P(φ) and P(γ) are both positive,

P(ψ ∨ μ)/P(φ ∨ γ) ≤ P(ψ)/P(φ) + P(μ)/P(γ).

This follows as a matter of simple algebra. For, if we set P(ψ & ¬μ) = a, P(ψ & μ) = b, P(μ & ¬ψ) = c, P(φ & ¬γ) = d, P(φ & γ) = e, and P(γ & ¬φ) = f, then P(ψ ∨ μ) = a + b + c, P(φ ∨ γ) = d + e + f, P(ψ) = a + b, P(μ) = b + c, P(φ) = d + e, and P(γ) = e + f, and what must be shown is that:

(a + b + c)/(d + e + f) ≤ (a + b)/(d + e) + (b + c)/(e + f),

where all the numbers involved are non-negative and all of the denominators are positive. Verifying this inequality is trivial, though tedious, since the result of cross-multiplying to clear of fractions yields an inequality in which all the terms on the left are cancelled by ones on the right; hence the inequality reduces to an inequality of the form 0 ≤ t, where t is a sum of products of the terms a–f. Carrying out this verification is left to the reader. Now it can be shown that it is possible to guarantee a probability of at least 1 − ε for φ ∨ ψ → γ by requiring that both P(φ → γ) and P(ψ → γ) be greater than 1 − ½ε. Assume first that both probabilities P(φ) and P(ψ) are positive, and consider the probabilities P(φ → ¬γ) and P(ψ → ¬γ). We have then:

P(φ → ¬γ) = P(φ & ¬γ)/P(φ) = 1 − P(φ → γ) < ½ε

and

P(ψ → ¬γ) = P(ψ & ¬γ)/P(ψ) = 1 − P(ψ → γ) < ½ε.

By the inequality just proven, then,

P((φ & ¬γ) ∨ (ψ & ¬γ))/P(φ ∨ ψ) < ½ε + ½ε = ε.

But it is an elementary consequence of the axioms of probability that, if P(φ ∨ ψ) is not zero,

P(φ ∨ ψ → γ) = 1 − P((φ & ¬γ) ∨ (ψ & ¬γ))/P(φ ∨ ψ),

hence P(φ ∨ ψ → γ) > 1 − ε.
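The 'trivial, though tedious' verification can also be checked numerically; the following sketch (a random spot-check, not a proof) samples non-negative values for a–f with positive denominators and tests the displayed inequality:

```python
import random

def spot_check(trials=100_000):
    for _ in range(trials):
        a, b, c = (random.random() for _ in range(3))
        # Keep d, e, f away from 0 so that both denominators are positive.
        d, e, f = (random.uniform(1e-6, 1.0) for _ in range(3))
        lhs = (a + b + c) / (d + e + f)
        rhs = (a + b) / (d + e) + (b + c) / (e + f)
        if lhs > rhs + 1e-9:        # tolerance for floating-point noise
            return False
    return True

print(spot_check())  # True
```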
In the case in which one or both of P(φ) and P(ψ) is zero, the proof is even simpler, for it follows directly from the axioms in that case that if, say, P(φ) = 0, then P(φ ∨ ψ → γ) = P(ψ → γ), and therefore a value of at least 1 − ε for P(φ ∨ ψ → γ) is assured by requiring that P(ψ → γ) be at least 1 − ε. This concludes the proof that φ ∨ ψ → γ is a reasonable consequence of φ → γ and ψ → γ, and therefore finishes the argument. An interesting sidelight on the proof of Theorem 2 is the following: it is possible to guarantee a probability of at least 1 − ε for an immediate inference from a single premise by requiring that the premise have probability at least 1 − ε, and it is possible to guarantee a probability of at least 1 − ε for an immediate inference from two premises by requiring that both premises have probabilities at least 1 − ½ε. It will be shown later that this result generalizes to remote inferences as well: that is, if an inference of a conclusion from n premises is reasonable, then it is possible to guarantee a probability of at least 1 − ε for the conclusion by requiring the premises to have probabilities at least 1 − ε/n. Thus, one may establish conclusively that an inference of a conclusion from n premises is not reasonable by finding some ε such that probabilities greater than 1 − ε/n for all the premises are compatible with a probability less than 1 − ε for the conclusion. We conclude this section by listing a number of probabilistic consequences which follow from rules PC1–PC8. These consequences will be used in proving the completeness of the rules. THEOREM 3. Let γ and φ₁, ..., φ_n and ψ₁, ..., ψ_n be truth-functional formulas. 3.1. If γ is a tautological consequence of ψ₁ then φ₁ → γ is a p.c. of φ₁ → ψ₁. 3.2. φ₁ → ψ₁ and φ₁ → φ₁ & ψ₁ are p.c.s of one another. 3.3. φ₁ ∨ φ₂ → ¬(φ₁ & ¬ψ₁) is a p.c. of φ₁ → ψ₁. 3.4. φ₁ ∨ ... ∨ φ_n → ¬(φ₁ & ¬ψ₁) & ... & ¬(φ_n & ¬ψ_n) is a p.c. of φ₁ → ψ₁, ..., φ_n → ψ_n. 3.5.
If φ₂ & ψ₂ is a tautological consequence of φ₁ & ψ₁, and φ₁ & ¬ψ₁ is a tautological consequence of φ₂ & ¬ψ₂, then φ₂ → ψ₂ is a p.c. of φ₁ → ψ₁. Proof. The proofs of four parts of this theorem are most conveniently presented in the form of schemata of natural deduction derivations of the conclusions of the inferences from their premises. The proof of 3.1 goes as follows:
1. φ₁ → ψ₁ given.
2. ψ₁ ⊨ γ (ψ₁ tautologically implies γ) given.
3. φ₁ & ψ₁ → γ 2, PC3.
4. φ₁ → γ 1, 3, PC8.
The derivation of φ₁ → φ₁ & ψ₁ from φ₁ → ψ₁ goes as follows:
1. φ₁ → ψ₁ given.
2. φ₁ tautologically implies φ₁.
3. φ₁ → φ₁ 2, PC3.
4. φ₁ → φ₁ & ψ₁ 1, 3, PC7.
The derivation of φ₁ → ψ₁ from φ₁ → φ₁ & ψ₁ is also simple.
1. φ₁ → φ₁ & ψ₁ given.
2. φ₁ & ψ₁ tautologically implies ψ₁ & φ₁.
3. φ₁ → ψ₁ & φ₁ 1, 2, Theorem 3.1.
4. φ₁ → ψ₁ 1, 3, PC6.
The proof of 3.3 goes:
1. φ₁ → ψ₁ given.
2. φ₁ & ψ₁ ⊨ ¬(φ₁ & ¬ψ₁) tautology.
3. φ₁ & ψ₁ → ¬(φ₁ & ¬ψ₁) 2, PC3.
4. φ₁ → ¬(φ₁ & ¬ψ₁) 1, 3, PC8.
5. φ₂ & ¬φ₁ ⊨ ¬(φ₁ & ¬ψ₁) tautology.
6. φ₂ & ¬φ₁ → ¬(φ₁ & ¬ψ₁) 5, PC3.
7. φ₁ ∨ (φ₂ & ¬φ₁) → ¬(φ₁ & ¬ψ₁) 4, 6, PC4.
8. φ₁ ∨ φ₂ → ¬(φ₁ & ¬ψ₁) 7, PC1.
Theorem 3.4 is obtained by simple iteration of applications of Theorem 3.3, plus use of rule PC7. Thus, Theorem 3.3 entails that φ₁ ∨ ... ∨ φ_n → ¬(φ_i & ¬ψ_i) is a p.c. of φ_i → ψ_i for each i = 1, ..., n. And applying PC7 n − 1 times yields a derivation of the desired formula as a p.c. of the formulas φ₁ ∨ ... ∨ φ_n → ¬(φ_i & ¬ψ_i), for i = 1, ..., n. Theorem 3.5 requires a somewhat longer derivation.
1. φ₁ → ψ₁ given.
2. (φ₁ & φ₂) ∨ (φ₁ & ¬φ₂) → ψ₁ 1, PC1.
3. φ₁ & ψ₁ ⊨ φ₂ & ψ₂ given.
4. φ₁ & ¬φ₂ ⊨ ¬ψ₁ 3.
5. φ₁ & ¬φ₂ → ¬ψ₁ 4, PC3.
6. φ₁ & φ₂ → ψ₁ 2, 5, PC5.
7. φ₁ & φ₂ & ψ₁ ⊨ ψ₂ 3.
8. φ₁ & φ₂ & ψ₁ → ψ₂ 7, PC3.
9. φ₁ & φ₂ → ψ₂ 6, 8, PC8.
10. φ₂ & ¬ψ₂ ⊨ φ₁ & ¬ψ₁ given.
11. φ₂ & ¬φ₁ ⊨ ψ₂ 10.
12. φ₂ & ¬φ₁ → ψ₂ 11, PC3.
13. (φ₁ & φ₂) ∨ (φ₂ & ¬φ₁) → ψ₂ 9, 12, PC4.
14. φ₂ → ψ₂ 13, PC1.
This concludes the proof of the theorem. 4. P-orderings and their associated uniform sequences of probability functions. We have now obtained sufficient conditions for a formula A to be a reasonable consequence of a set of formulas S: namely, that A be a probabilistic consequence of S. In this and the next section it will be shown that these conditions are also necessary. In order to do this we need to consider systematically the construction of probability functions and sequences of probability functions with respect to which the probabilities of the formulas in S approach 1 as a limit, but the limit approached by the probability of A is less than 1. One rather convenient way of generating such sequences of probability functions is based on what will be called a 'P-ordering' of the truth-functional formulas of a language. Intuitively, a P-ordering is a weak ordering relation ≤ of the truth-functional formulas of a language which can be interpreted such that if φ and ψ are truth-functional formulas and φ ≤ ψ holds, then the probability ratio P(ψ)/P(
r: F);
iii)
doing this, though, we prove that P-orderings of finite languages are generated in a rather simple way from weak orderings of the SD-sets of the language (augmented by F). THEOREM 4. Let ℒ be a finite language, let SD be the SD-set of ℒ, and let ≤ be a binary relation over the truth-functional formulas of ℒ. Then ≤ is a P-ordering of ℒ if and only if there exists a weak ordering ≤₀ of the set SD⁺ = SD ∪ {F} such that: i) F ≤₀ α for all α in SD⁺, and F
if and only if α ≤ β. The first equivalence follows from conditions (i) and (iii) of Definition 7.1. For, by condition (iii), φ ≤ ψ (where φ is assumed to be a disjunction of elements of SD⁺) if and only if α' ≤ ψ for all α' in SD⁺ belonging to φ, and by condition (i), α' ≤ ψ for all such elements α' if and only if α ≤ ψ, where α is a maximal element of SD⁺ belonging to φ. The second equivalence follows from conditions (i) and (iv) of Definition 7.1. Thus, supposing that ψ is a disjunction of members of SD⁺, it follows from condition (iv) that α ≤ ψ if and only if α ≤ β' for some disjunct β' in ψ, and by condition (i), α ≤ β' for some such disjunct if and only if α ≤ β, where β is a maximal element of SD⁺ belonging to ψ. This proves condition (ii) of the theorem, and therefore that if ≤ is a P-ordering of ℒ, then there is a weak ordering ≤₀ of SD⁺ satisfying conditions (i) and (ii). The proof of the theorem is completed by showing that if ≤₀ is a weak ordering of SD⁺, and ≤₀ and ≤ satisfy conditions (i) and (ii) of the theorem, then ≤ satisfies conditions (i)–(iv) of Definition 7.1, and therefore ≤ is a P-ordering of ℒ. Each of conditions (i)–(iv) of Definition 7.1 follows trivially from the fact that ≤₀ is a weak ordering of SD⁺ and that ≤ and ≤₀ satisfy conditions (i) and (ii) of the theorem. Thus, to show that ≤ is a weak ordering, suppose that α, β and δ are maximal members of SD⁺ in the ordering ≤₀ belonging to φ, ψ and γ, respectively, where φ, ψ and γ are truth-functional formulas of ℒ. By condition (ii) of the theorem, φ ≤ ψ holds if and only if α ≤₀ β, and ψ ≤ φ holds if β ≤₀ α; and, since ≤₀ is a weak ordering, either α ≤₀ β or β ≤₀ α, and therefore either φ ≤ ψ or ψ ≤ φ holds. Again, if φ ≤ ψ and ψ ≤ γ, then, by condition (ii) of the theorem, α ≤₀ β and β ≤₀ δ, and since ≤₀ is a weak ordering, α ≤₀ δ, and therefore φ ≤ γ.
Hence the relation ≤ is strongly connected and transitive, and so is a weak ordering. Conditions (ii)–(iv) of Definition 7.1 follow in similar fashion, and we omit the arguments. This concludes the proof. Now we come to what is more important about P-orderings: namely, their connection with uniform sequences of probability functions. It was asserted that the relation φ ≤ ψ could be interpreted to mean that the limit approached by the probability ratios P(ψ)/P(φ) is positive. This idea is made precise in Definition 8, below, which introduces the idea of a uniform sequence P₁, P₂, ... associated with a P-ordering ≤. DEFINITION 8. Let ℒ be a language, let ≤ be a P-ordering of ℒ, and let P₁, P₂, ... be a uniform sequence of probability functions of ℒ. Then P₁, P₂, ... is a uniform sequence associated with ≤ if and only if for all truth-
functional formulas φ and ψ of ℒ, φ ≤ ψ if and only if lim_{n→∞}(P_n(φ ∨ ψ → ψ)) > 0. We will not go through the argument to show that, provided the limits involved exist, lim_{n→∞}(P_n(φ ∨ ψ → ψ)) is positive if and only if the limit of P_n(ψ)/P_n(φ) is positive (or possibly equal to plus infinity), and therefore the intuitive interpretation of φ ≤ ψ is justified, since we will not use this fact in what follows. The proof, however, is elementary. Likewise, it follows trivially from Definition 8 that if P₁, P₂, ... is associated with ≤, then φ < ψ holds if and only if lim_{n→∞}(P_n(φ ∨ ψ → φ)) = 0, and (provided the limit exists) this is equivalent to the condition that lim_{n→∞}(P_n(φ)/P_n(ψ)) = 0. What are important for present purposes are the facts asserted in the next theorem. THEOREM 5. Let ℒ be a finite language, and let A and S be, respectively, a formula and a finite set of formulas of ℒ. 5.1. If P₁, P₂, ... is a uniform sequence of probability functions for ℒ, then there is a unique P-ordering ≤ of ℒ such that P₁, P₂, ... is associated with ≤. 5.2. If ≤ is a P-ordering of ℒ then there is a uniform sequence of probability functions of ℒ associated with ≤. 5.3. If ≤ is a P-ordering of ℒ, and P₁, P₂, ... is a uniform sequence associated with ≤, then A holds in ≤ if and only if lim_{n→∞}(P_n(A)) = 1. 5.4. If S ⊩ A then A holds in all P-orderings of ℒ in which all formulas of S hold. Proof of 5.1. This proof proceeds by showing that the binary relation defined by the condition: for all φ and ψ, φ ≤ ψ if and only if lim_{n→∞}(P_n(φ ∨ ψ → ψ)) > 0, satisfies conditions (i)–(iv) of Definition 7.1, and is therefore a P-ordering of ℒ, and is furthermore one such that P₁, P₂, ... is a uniform sequence associated with it, according to Definition 8. The proof of each of conditions (i)–(iv) of Definition 7.1 is routine, and we shall actually carry out only the proof of (i): that ≤ is a weak ordering of the truth-functional formulas of ℒ.
That either φ ≤ ψ or ψ ≤ φ must hold follows, since

lim_{n→∞}(P_n(φ ∨ ψ → φ ∨ ψ)) = 1 ≤ lim_{n→∞}(P_n(φ ∨ ψ → φ)) + lim_{n→∞}(P_n(φ ∨ ψ → ψ)).

Hence at least one of the two limits on the right above must be positive, and therefore either φ ≤ ψ or ψ ≤ φ. The transitivity of the relation ≤ follows from the following inequality
of the pure calculus of probability: for any probability function P_n whatever,

P_n(φ ∨ γ → γ) ≥ P_n(φ ∨ ψ → ψ) · P_n(ψ ∨ γ → γ).

This inequality follows by simple algebra from the axioms of probability as given in Definition 4.1. Assuming this inequality, transitivity follows immediately, since if both φ ≤ ψ and ψ ≤ γ, then both of the limits of P_n(φ ∨ ψ → ψ) and P_n(ψ ∨ γ → γ) must be positive, hence the limit of their product must be positive, and therefore, by the inequality, the limit of P_n(φ ∨ γ → γ) is also positive, hence φ ≤ γ holds. This concludes the proof of 5.1. Proof of 5.2. Let ≤ be a P-ordering for ℒ, and let SD⁺ = SD ∪ {F}, where SD is the SD-set for ℒ. Assume that the elements of SD⁺ are ordered as follows: the elements of SD⁺ are arranged in increasing 'blocks' α_{m_{i−1}+1}, ..., α_{m_i}, where the elements within each block are all equivalent to one another. Now, for each i = 1, ..., k, let β_i be the disjunction of the elements in the i-th block.
We first define the values of P_n(β_i), for n = 2, 3, ... and for i = 1, ..., k. If k = 2, then everything is trivial; we set P_n(β₁) = 0 and P_n(β₂) = 1 for all n = 2, 3, .... If k > 2 then the probabilities are defined as follows:
P_n(β₁) = 0, P_n(β_i) = 1/n^{2(k−i)} for i = 2, ..., k − 1,

and

P_n(β_k) = 1 − Σ_{i=2}^{k−1} 1/n^{2(k−i)},

for n = 2, 3, .... Now, it follows by simple algebra from the foregoing equations that:

Σ_{i=1}^{k} P_n(β_i) = 1 for n = 2, 3, ...,

and

lim_{n→∞} P_n(β_i)/P_n(β_{i+1}) = 0 for i = 1, ..., k − 1.
Next, the probabilities of the individual elements of SD⁺ are defined by setting the probabilities of the elements of any one β_i equal:
if α_j is a disjunct of β_i, then P_n(α_j) = P_n(β_i)/r_i, where r_i is the number of disjuncts of β_i. Clearly, the probabilities P_n(α_j) form a distribution over the set SD, and therefore define a unique probability function P_n for the language ℒ. It is clear also that P₂, P₃, ... form a uniform sequence of probability functions for ℒ, and all that remains to be shown is that this is a uniform sequence associated with ≤. Assuming the result of Theorem 5.1, all that is necessary to prove that P₂, P₃, ... is a sequence associated with ≤ is to show that for any two α_h and α_j in SD⁺, α_h ≤ α_j if and only if lim_{n→∞}(P_n(α_h ∨ α_j → α_j)) is positive. For, by Theorem 5.1, the sequence P₂, P₃, ... is associated with some unique P-ordering of ℒ, and according to Theorem 4, this ordering is itself uniquely determined by its restriction just to the set SD⁺. But now what remains to be proved is trivial. If α_h ≤ α_j, then either α_h ≈ α_j or α_h < α_j. In the first case, P_n(α_h) = P_n(α_j), from which it follows immediately that P_n(α_h ∨ α_j → α_j) ≥ ½, and therefore the limit of these probabilities is at least ½, which is greater than 0. If α_h < α_j, it follows that α_h and α_j belong to disjunctions β_p and β_q such that p < q
must be 1, since the limit of P_n(φ → ¬ψ) is zero. The argument is reversible, so if the limit of P_n(φ → ψ) is 1, then φ → ψ holds in ≤. In case φ ≤ F, the limit approached by P_n(φ ∨ F → F) must be 1. Since P_n(φ ∨ F → F) can only be 0 or 1, and 1 only if P_n(φ) = 0, it can only be that P_n(φ) is 0 for all but a finite number of values of n. But in this case P_n(φ → ψ) can differ from 1 for only a finite number of values of n, and therefore lim_{n→∞}(P_n(φ → ψ)) = 1. Conversely, if lim_{n→∞}(P_n(φ → ψ)) = 1 and φ ≤ F, clearly φ → ψ holds in ≤. This concludes the proof of 5.3. Proof of 5.4. This follows trivially from 5.2 and 5.3. Suppose that there is a P-ordering ≤ such that all members of S hold in ≤, but A does not hold in it. Then by 5.2 there is a uniform sequence P₁, P₂, ... associated with ≤, and by 5.3, lim_{n→∞}(P_n(B)) = 1 for all B in S, but lim_{n→∞}(P_n(A))
9.3. The ordered partition of SD generated by S is the sequence SD₁, ..., SD_{p+1} of subsets of SD such that SD₁ = SD − SD*(S₁), SD_{i+1} = SD*(S_i) − SD*(S_{i+1}) for i = 1, ..., p − 1, and SD_{p+1} = SD*(S_p), where S₁, ..., S_p is the reduction sequence of S. 9.4. If it is not the case that SD*(S) = SD, and if the ordered partition of SD generated by S is SD₁, ..., SD_{p+1}, then the standard P-ordering of ℒ associated with S is the P-ordering ≤ such that for all α and β in SD, if α is in SD_i and β is in SD_j, then α ≤ F if and only if i = p + 1, and α ≤ β if and only if j ≤ i. Definition 9.4 presupposes what has not been shown: namely, that the ordered partition of SD generated by S is a partition of SD. This and other important properties of these partitions and their associated P-orderings will be derived below. First, however, it may help to make the intuitive bases of the concepts introduced in Definition 9 clearer to illustrate them as they apply in a simple example. Consider the language generated from just the two atomic sentences 'p' and 'q' (plus 'T' and 'F'). The SD-set of ℒ may be taken to consist of the four formulas a = '¬p & ¬q', b = 'p & ¬q', c = 'p & q', and d = '¬p & q', whose relations may be most easily visualized with the aid of a Venn diagram. Now, let S be the set containing just the four formulas p → q, ¬q → ¬p, q → ¬p, and p ∨ q → p. To determine the immediate reduction of S, we must determine which formulas A in S have the property that SD(A) ⊆ SD*(S). These concepts are characterized in Definitions 3.2 and 3.3: SD(A) is the set of SDs belonging to Ant(A) (the antecedent of A), SD*(A) is the set of SDs belonging to Ant(A) & ¬Cons(A), and SD*(S) is the union of all SD*(A) for A in S. The formulas A of S, together with the sets SD(A) and SD*(A), are conveniently represented in a table, as below:

formula A          SD(A)        SD*(A)
1. p → q           {b, c}       {b}
2. ¬q → ¬p         {a, b}       {b}
3. q → ¬p          {c, d}       {c}
4. p ∨ q → p       {b, c, d}    {d}

Here SD*(S) = {b, c, d}. The formulas A of S such that SD(A) ⊆ SD*(S) are clearly the first, third and
fourth in the above table: i.e., the formulas p → q, q → ¬p, and p ∨ q → p. These, then, comprise the set Red(S). To construct the reduction sequence of S, we simply iterate the process described above. Thus, we set S₁ = S, and set S₂ = Red(S₁) = {p → q, q → ¬p, p ∨ q → p}. To construct S₃, we should find Red(S₂), which again is easily determined from the above table by finding all A in S₂ such that SD(A) ⊆ SD*(S₂). But here it is clear that SD*(S₂) = {b, c, d}, since S₂ consists of formulas 1, 3 and 4, and the union of SD*(A) for A in that set is {b, c, d}. It is the case, then, that SD(A) ⊆ SD*(S₂) for all A in S₂; i.e., Red(S₂) = S₂. This being the case, the construction of the reduction sequence terminates with S₂: i.e., the reduction sequence is just the sequence S₁, S₂, where S₁ = S and S₂ consists of formulas 1, 3 and 4. Having constructed the reduction sequence of S, the determination of the ordered partition of SD generated by S goes as follows. We take SD₁ to be the set of all SDs outside of SD*(S₁). Since SD*(S₁) = SD*(S) = {b, c, d}, SD₁ = {a}. Next, SD₂ is defined as the set difference SD*(S₁) − SD*(S₂). We have already observed that SD*(S₁) = SD*(S₂) = {b, c, d}, and therefore SD₂ is the empty set, Λ. Finally, SD₃ is defined to be SD*(S₂) = {b, c, d}. The ordered partition of SD generated by S is then just the sequence of sets SD₁ = {a}, SD₂ = Λ, SD₃ = {b, c, d}. Given the foregoing ordered partition, the associated standard P-ordering of ℒ is constructed essentially by reversing the ordering of the SDs in the partition, and setting all elements of SD₃ equivalent to F; i.e., the associated ordering is determined by the following ordering of SD ∪ {F}:

F ≈ b ≈ c ≈ d < a.

The manner in which the ordering of two truth-functional formulas φ and ψ in ℒ is determined by the above ordering of the SDs is spelled out in Theorem 4: essentially, the ordering of φ and ψ is the same as the ordering of the maximal SDs in the ordering belonging to each of them. Thus, to determine the ordering of p and q, we need only note that all of the SDs belonging to both of them are equivalent, and therefore p and q are equivalent. On the other hand, the maximal SDs belonging to p are strictly less than the maximal SD belonging to ¬p (which is a), and therefore p is strictly less than ¬p in the ordering. Note in particular that the following hold in the standard ordering: p ≤ F, ¬q & p < ¬q & ¬p, q ≤ F, and p ∨ q ≤ F, from which it follows that the four formulas of S all hold (in the sense of Definition 7.2) in the standard ordering. The foregoing is not accidental: one of the most
PROBABILITY AND THE LOGIC OF CONDITIONALS
essential facts about standard orderings associated with sets of formulas, S, is that all formulas of S hold in them. This is proved in the next theorem.

THEOREM 6. Let ℒ be a finite language, let SD be its SD-set, let S be a finite set of formulas of ℒ, and let SD₁, ..., SDₚ₊₁ be the ordered partition of SD generated by S.
6.1. SD₁, ..., SDₚ₊₁ is a partition of SD.
6.2. If SD(S) is not equal to SD, then all formulas of S hold in the standard P-ordering of ℒ associated with S.
Proof of 6.1. This follows immediately from the fact that in the reduction sequence S₁, ..., Sₚ of S, each later element in the sequence is a subset of the earlier elements, and therefore the subset relations SD(Sₚ) ⊆ SD(Sₚ₋₁) ⊆ ... ⊆ SD(S₁) hold. Elementary set theory then entails that the sets SD₁ = SD ∖ SD(S₁), ..., SDₚ₊₁ = SD(Sₚ) are mutually exclusive, and their union is SD.
Proof of 6.2. Let A be a member of S, and suppose that SD ∖ SD(S) is nonempty; hence the standard P-ordering, ≤, associated with S is defined. We can assume without loss of generality that A is conditional, since if A is unconditional, A holds in ≤ if and only if the conditional formula Ant(A)→Cons(A) = Cond(A) holds. Suppose that A = φ→ψ: it will be shown that A holds in ≤. Note first that it follows immediately from the definition of the ordered partition SD₁, ..., SDₚ₊₁ that for i = 1, ..., p, SD(Sᵢ) = SDᵢ₊₁ ∪ ... ∪ SDₚ₊₁, where S₁, ..., Sₚ is the reduction sequence of S. Since A ∈ S = S₁, A is in at least one Sᵢ in the reduction sequence, and if A is in Sᵢ then SD(A) ⊆ SD(Sᵢ) = SDᵢ₊₁ ∪ ... ∪ SDₚ₊₁. Suppose first that A is in Sₚ. Then, since Sₚ = Red(Sₚ), S̄D̄(A) ⊆ SD(Sₚ) = SDₚ₊₁, and the latter set is the set of all SDs which are equivalent to F in the ordering ≤. But S̄D̄(A) is the set of all SDs belonging to φ, and therefore all SDs belonging to φ are equivalent to F, and so φ ≤ F. This in turn entails that φ→ψ holds in ≤.
Suppose that A is not in Sₚ. Then there is some i such that A is in Sᵢ but not in Sᵢ₊₁. As before, SD(A) is a subset of SD(Sᵢ) = SDᵢ₊₁ ∪ ... ∪ SDₚ₊₁. On the other hand, since A is not in Sᵢ₊₁ = Red(Sᵢ), it is not the case that S̄D̄(A) ⊆ SD(Sᵢ), so S̄D̄(A) must contain an SD in the union SD₁ ∪ ... ∪ SDᵢ. Now, the members of SD(A) are all SDs belonging to φ&~ψ, and
ERNEST W. ADAMS
therefore all SDs belonging to φ&~ψ are in SDᵢ₊₁ ∪ ... ∪ SDₚ₊₁. The members of S̄D̄(A) are all SDs belonging to φ, so there must be some SD in SD₁ ∪ ... ∪ SDᵢ belonging to φ, and therefore to φ&ψ, since that element cannot belong to φ&~ψ. It follows immediately from this that φ&~ψ < φ&ψ, since the maximal elements in the ordering belonging to φ&~ψ are in SDᵢ₊₁ ∪ ... ∪ SDₚ₊₁, and these are strictly less than the maximal elements in φ&ψ, which are in the union SD₁ ∪ ... ∪ SDᵢ. But φ&~ψ < φ&ψ entails that φ→ψ holds in ≤. Therefore, the assumption that A is in S entails that A holds in ≤, so Theorem 6.2 is proved.

6. Completeness and a decision procedure. Let A be a formula and S be a finite set of formulas. We are now ready to establish the equivalence of the following three conditions: (1) A is a probabilistic consequence of S (in the sense of Definition 6), (2) A is a reasonable consequence of S, and (3) A holds in all P-orderings in which all members of S hold. Actually, we have already shown that condition (1) entails condition (2) (Theorem 2), and that condition (2) entails condition (3) (Theorem 5.4), so what we have to do now is 'close the ring' by showing that condition (3) entails condition (1). It turns out to be easier to do this if three more links are added to the chain: conditions (4), (5) and (6) such that condition (3) entails (4), (4) entails (5), (5) entails (6), and (6) entails (1). Adding these links actually simplifies the proof and, moreover, yields an immediate decision procedure for determining whether a conclusion is a reasonable consequence of premises.

THEOREM 7. Let ℒ be a finite language, let SD be its SD-set, let A be a formula and S be a finite set of formulas of ℒ; let S′ be the set S ∪ {~A}, let S′₁, ..., S′ₖ be the reduction sequence for S′, let SD₁, ...,
, SDₖ₊₁ be the ordered partition of SD generated by S′, and let S₀ be the set S′ₖ ∖ {~A} (i.e., S₀ is the set resulting from the deletion of ~A from S′ₖ). Then the following conditions are equivalent:
(1) A is a probabilistic consequence of S,
(2) A is a reasonable consequence of S,
(3) A holds in all P-orderings of ℒ in which all members of S hold,
(4) S̄D̄(A) ⊆ SD(S′ₖ),
(5) SD(A) ⊆ SD(S₀) and S̄D̄(S₀) ∖ SD(S₀) ⊆ S̄D̄(A) ∖ SD(A),
(6) for some subset S″ of S, SD(A) ⊆ SD(S″) and S̄D̄(S″) ∖ SD(S″) ⊆ S̄D̄(A) ∖ SD(A).
Proof. That condition (1) entails condition (2) and condition (2) entails
condition (3) were proven in Theorems 2 and 5.4, and that (5) entails (6) is
trivial. What will now be shown is that (3) entails (4), (4) entails (5), and (6) entails (1). Assume first that condition (3) is satisfied: i.e., A holds in all P-orderings of ℒ in which all members of S hold. We can assume here and later that A is a conditional formula φ→ψ, since if A is some unconditional formula ψ, then A can be replaced by T→ψ throughout without altering any of the conditions under consideration. Consider now the augmented set S′ = S ∪ {~A}, where ~A was defined (Definition 2.1) to be the formula φ→~ψ. Either SD(S′) = SD, in which case the standard ordering associated with S′ is undefined, or SD(S′) ≠ SD, in which case the standard ordering associated with S′ is defined. Consider the case in which SD(S′) = SD first. In this case condition (4) holds trivially, since the reduction sequence then terminates with S′₁ = S′, and S̄D̄(A) is clearly a subset of SD(S′₁) = SD, the set of all SDs of ℒ. Now, suppose that SD(S′) ≠ SD, and therefore that the standard ordering, ≤, associated with S′ is determined. Since ~A is in S′, and all formulas of S′ hold in ≤ (by Theorem 6.2), ~A holds in ≤. Likewise, since S is a subset of S′, all formulas of S hold in ≤, and therefore A must hold in ≤, by the assumption that condition (3) holds. Now, it follows directly from the definition of 'holding' that two 'contrary' formulas A = φ→ψ and ~A = φ→~ψ can both hold at once in a P-ordering ≤ only if φ ≤ F. But the SDs equivalent to F in the standard ordering are precisely those in SD(S′ₖ); hence all members of S̄D̄(A), which are the SDs belonging to φ, are in SD(S′ₖ): i.e., condition (4) holds. Assume next that condition (4) holds; it will be shown that condition (5) follows. Suppose that a is a member of SD(A):
i.e., a is an SD belonging to φ&~ψ (see Definition 3.2). SD(A) is a subset of S̄D̄(A); hence, by condition (4), a, which is a member of S̄D̄(A), is a member of SD(S′ₖ) = S̄D̄(S′ₖ). Also, since S′ₖ = S₀ ∪ {φ→~ψ}, SD(S′ₖ) = SD(S₀) ∪ SD(φ→~ψ): hence a is a member of SD(S₀) ∪ SD(φ→~ψ). But the elements of SD(φ→~ψ) are just the SDs belonging to φ&ψ, and so a does not belong to SD(φ→~ψ). Therefore a is a member of SD(S₀), and we have shown that SD(A) is a subset of SD(S₀). To show that S̄D̄(S₀) ∖ SD(S₀) is a subset of S̄D̄(A) ∖ SD(A), suppose that a is a member of S̄D̄(S₀) ∖ SD(S₀). Since S₀ is a subset of S′ₖ, a must be an element of S̄D̄(S′ₖ) = SD(S′ₖ) = SD(S₀) ∪ SD(φ→~ψ). By hypothesis, though, a is not in SD(S₀), so it must be a member of SD(φ→~ψ) = SD(~A). But the members of SD(φ→~ψ) are all of the SDs belonging to φ&ψ, and these are the SDs composing the set S̄D̄(A) ∖ SD(A). Hence a is a member of S̄D̄(A) ∖ SD(A), and we have shown that condition (4) entails condition (5). Now suppose that condition (6) holds: i.e., SD(A) ⊆ SD(S″) and S̄D̄(S″) ∖ SD(S″) ⊆ S̄D̄(A) ∖ SD(A) for some subset S″ of S. We will show that A is a probabilistic consequence of S″, and hence of S, since S″ is a subset of S. Let us assume that S″ is the set of conditional formulas φ₁→ψ₁, ..., φₙ→ψₙ; if any unconditional formula ψᵢ occurs in S″, then we can replace it by T→ψᵢ, since the latter is a probabilistic consequence of ψᵢ (Rule PC2 of Definition 6). Now construct the formula:

B = (φ₁ ∨ ... ∨ φₙ) → (~(φ₁ & ~ψ₁) & ... & ~(φₙ & ~ψₙ)).
According to Theorem 3.4, B is a probabilistic consequence of S″. Moreover, it is the case that SD(B) = SD(S″) and S̄D̄(B) = S̄D̄(S″), and therefore S̄D̄(B) ∖ SD(B) = S̄D̄(S″) ∖ SD(S″). That S̄D̄(B) = S̄D̄(S″) follows from the fact that the antecedent of B is the disjunction of the antecedents of the formulas φ₁→ψ₁, ..., φₙ→ψₙ in S″. Therefore, S̄D̄(B) is the set of all SDs belonging to φ₁ ∨ ... ∨ φₙ, which is the union of the SDs belonging to φ₁, φ₂, etc., which is equal to S̄D̄(S″). That SD(B) = SD(S″) follows from the fact that Ant(B) & ~Cons(B) is tautologically equivalent to (φ₁&~ψ₁) ∨ ... ∨ (φₙ&~ψₙ). SD(B) is the set of all SDs belonging to the formula Ant(B) & ~Cons(B), and is therefore the same as the union of the sets of SDs belonging to φ₁&~ψ₁, ..., φₙ&~ψₙ. But SD(φᵢ→ψᵢ) is the set of all SDs belonging to φᵢ&~ψᵢ, so SD(S″) is equal to the union of all these sets, and is therefore equal to SD(B). Now, given any two formulas A and B, it can only be the case that
SD(A) ⊆ SD(B) if Ant(B)&~Cons(B) is a tautological consequence of Ant(A)&~Cons(A). For SD(A) and SD(B) are, respectively, the SDs belonging to Ant(A)&~Cons(A) and Ant(B)&~Cons(B), and the set of SDs belonging to one formula is a subset of the set of SDs belonging to a second if and only if the second is a tautological consequence of the first. Therefore, Ant(B)&~Cons(B) is a tautological consequence of Ant(A)&~Cons(A), since SD(A) ⊆ SD(S″) = SD(B). The same kind of argument can be used to show that if S̄D̄(B) ∖ SD(B) ⊆ S̄D̄(A) ∖ SD(A), then the formula Ant(A)&Cons(A) is a tautological consequence of Ant(B)&Cons(B). Therefore Ant(A)&Cons(A) is a tautological consequence of Ant(B)&Cons(B), since S̄D̄(B) ∖ SD(B) = S̄D̄(S″) ∖ SD(S″) ⊆ S̄D̄(A) ∖ SD(A). We have now shown both that Ant(B)&~Cons(B) is a tautological consequence of Ant(A)&~Cons(A), and that Ant(A)&Cons(A) is a tautological consequence of Ant(B)&Cons(B). It follows from Theorem 3.5 in this case that A is a probabilistic consequence of B, and hence A is a probabilistic consequence of S. This completes the proof.

Theorem 7 presents the key mathematical results of this paper. In the following section we will derive some relatively easy correlates of Theorem 7, some of which have, perhaps, a more immediate intuitive significance than does the main theorem. In concluding this section, note that Theorem 7 provides the basis for a fairly direct decision procedure for determining whether a formula A is a reasonable consequence of a finite set, S, of formulas. Probably the simplest procedure for determining whether A is a reasonable consequence of S is to construct the reduction sequence of the augmented set S′ = S ∪ {~A}, and determine whether S̄D̄(A) is a subset of SD(S′ₖ), where S′ₖ is the final term in the reduction sequence of S′. According to condition (4) of the theorem, A is a reasonable consequence of S if and only if S̄D̄(A) is a subset of SD(S′ₖ).
The following illustration will help to make clear how this decision procedure works. Suppose that the problem is to determine whether the tautologically valid inference of ~p→q from p∨q is reasonable. In this case S is the singleton set {p∨q}, and A = ~p→q. As always, we begin by replacing all nonconditional formulas by their corresponding conditionals in the standard way. In this case, p∨q is replaced by T→p∨q, the justification being that the two formulas are each reasonable consequences of the other. The next step is to form the augmented set S′ = S ∪ {~A}, where ~A in this case is ~p→~q. The next step is to form the reduction sequence, S′₁, ..., S′ₖ, of S′. One simple way is to use the tabular representation of formulas B, and the sets S̄D̄(B) and SD(B). These are exhibited here, letting a, b, c and d be the
formulas '~p&~q', 'p&q', 'p&~q' and '~p&q', respectively (see diagram on p. 289). Then we have:

formula B      S̄D̄(B)          SD(B)
T→p∨q         {a, b, c, d}    {a}
~p→~q         {a, d}          {d}

SD(S′₁) = SD(S′) = {a, d}.
The second term of the reduction sequence, S′₂, is the set of all B in S′₁ = S′ such that S̄D̄(B) ⊆ SD(S′₁): so in this case, S′₂ is the singleton set {~p→~q}. S′₃ is determined in similar fashion, by finding all B in S′₂ such that S̄D̄(B) ⊆ SD(S′₂) = {d}. In this case there are no members B of S′₂ with this property, so S′₃ = Λ. And, since Red(S′₃) = S′₃, S′₃ terminates the sequence: i.e., the reduction sequence of S′ is the sequence S′₁, S′₂, S′₃ where:
S′₁ = {T→p∨q, ~p→~q}     SD(S′₁) = {a, d}
S′₂ = {~p→~q}            SD(S′₂) = {d}
S′₃ = Λ                  SD(S′₃) = Λ.
Now, S̄D̄(A) = S̄D̄(~p→q) = {a, d}, and this is not a subset of SD(S′₃) = Λ, and so it is not the case that A is a reasonable consequence of S. The procedure outlined above also leads immediately to the construction of a P-ordering in which all members of S hold, but A does not hold, if it proves to be the case that A is not a reasonable consequence of S. In particular, if ≤ is the standard P-ordering associated with the augmented set S′ = S ∪ {~A}, but S̄D̄(A) is not a subset of SD(S′ₖ), then all formulas of S hold in ≤, but A does not. Thus, in the foregoing example, the ordered partition of SD generated by S′ is the sequence SD₁, ..., SD₄, where SD₁ = SD ∖ SD(S′₁) = {b, c}, SD₂ = SD(S′₁) ∖ SD(S′₂) = {a}, SD₃ = SD(S′₂) ∖ SD(S′₃) = {d}, and SD₄ = SD(S′₃) = Λ. The standard P-ordering is thus determined from the following ordering of the set SD ∪ {F}:

F < d < a < b ≈ c.
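The whole decision procedure can likewise be sketched in a few self-contained lines, again under the illustrative assumption that a formula is represented by the pair of sets (S̄D̄(A), SD(A)), with the SD labels of this example (a = '~p&~q', b = 'p&q', c = 'p&~q', d = '~p&q'); the function names are mine, not the paper's.

```python
def sd(formulas):
    """SD(S): union of the falsifying SD-sets of the formulas in S."""
    return set().union(*(fal for _, fal in formulas)) if formulas else set()

def reduction_sequence(s):
    """S'_1 = S', S'_{i+1} = Red(S'_i); stop when Red is the identity."""
    seq = [list(s)]
    while True:
        sd_s = sd(seq[-1])
        nxt = [(adm, fal) for adm, fal in seq[-1] if adm <= sd_s]
        if nxt == seq[-1]:
            return seq
        seq.append(nxt)

def reasonable_consequence(s, a):
    """Condition (4) of Theorem 7: S̄D̄(A) ⊆ SD(S'_k) for the final term S'_k."""
    adm_a, fal_a = a
    neg_a = (adm_a, adm_a - fal_a)        # ~(φ→ψ) = φ→~ψ: falsifiers swap
    final = reduction_sequence(list(s) + [neg_a])[-1]
    return adm_a <= sd(final)

# The example of the text: ~p→q is NOT a reasonable consequence of p∨q.
premise = ({'a', 'b', 'c', 'd'}, {'a'})   # p∨q, rewritten as T→p∨q
concl = ({'a', 'd'}, {'a'})               # ~p→q, falsified only at a = ~p&~q
print(reasonable_consequence([premise], concl))   # → False
```

As a sanity check, every formula is a reasonable consequence of itself under this test, since the augmented set's final reduction term then still contains the formula and its negation.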
The fact that all formulas of S hold in this ordering, while A does not, is among a number of corollaries to Theorem 6 which are derived in the following section.

7. Consequences of the completeness theorem. One important corollary of the completeness theorem is still another necessary and sufficient condition for a formula A to be a reasonable consequence of a finite set of formulas S. This corollary is stated in terms of another 'consequence' relation which has some significance in its own right. There is an intuitive sense in which we may speak of a conditional statement of the form 'if p then q' as being verified in a situation in which p and q are both found to be true, as being falsified in a situation in which p is found to be true and q is found to be false, and as being neither verified nor falsified in case p is found to be false. If 'verified' and 'falsified' are taken in the senses described above, then a conditional formula is verified just in case the corresponding conditional bet wins, is falsified if the bet loses, and is neither verified nor falsified in case the 'condition' of the bet is not fulfilled, so that the bet is not in force. The concepts of verification and falsification under a truth assignment are next defined formally, and in terms of those the relation of strong entailment is defined such that a set S of formulas strongly entails a formula A in case: (1) any truth assignment falsifying A must falsify at least one member of S, and (2) any truth assignment not falsifying any member of S and verifying at least one member of S must verify A.

DEFINITION 10. Let ℒ be a language, let S and A be respectively a set of formulas and a formula of ℒ, and let f be a function mapping the class of truth-functional formulas of ℒ into {0, 1}.
10.1. f is a truth-assignment for ℒ if and only if for all truth-functional formulas φ and ψ in ℒ, f(~φ) = 1 − f(φ), f(φ&ψ) = f(φ)·f(ψ) and f(φ∨ψ) = f(φ) + f(ψ) − f(φ&ψ), and f(F) = 0 and f(T) = 1.
10.2.
If f is a truth-assignment for ℒ, then A is verified under f if and only if either A is truth-functional and f(A) = 1, or A = φ→ψ for some φ and ψ, and f(φ) = f(ψ) = 1; A is falsified under f if and only if either A is truth-functional and f(A) = 0, or A = φ→ψ for some φ and ψ, and f(φ) = 1 and f(ψ) = 0.
10.3. S strongly entails A if and only if for all truth assignments f of ℒ: (i) if no B in S is falsified under f, then A is not falsified under f, and (ii) if no B in S is falsified under f, and at least one B is verified, then A is verified under f.
Another way of describing the relation of strong entailment between a
set S and a formula A is to say that not losing on any bet on a formula of S guarantees not losing on A, and not losing on any bet on a member B of S and winning on at least one of them guarantees winning on A (and not just not losing). It should perhaps be noted that, despite any intuitive significance the concept of strong entailment may have, the relation formally defined here is not a deduction relation in the usual sense. It is not always the case, for instance, that if a formula A is strongly entailed by a subset of S, then A is strongly entailed by S (even if S is finite). The connection between strong entailment and reasonable consequence is shown in the next theorem.

THEOREM 8. Let ℒ be a finite language, and let A and S be, respectively, a formula and a finite set of formulas of ℒ. Then A is a reasonable consequence of S if and only if there is a subset, S′, of S such that S′ strongly entails A.
Proof. By Theorem 7, a necessary and sufficient condition for A to be a reasonable consequence of S is that there exists a subset S′ of S such that: SD(A) ⊆ SD(S′) and S̄D̄(S′) ∖ SD(S′) ⊆ S̄D̄(A) ∖ SD(A). It will now be shown that S′ satisfies the two conditions above if and only if S′ strongly entails A in the sense of Definition 10.3. It will then follow directly that A is a reasonable consequence of S if and only if there exists a subset S′ of S such that S′ strongly entails A. If SD is the SD-set of ℒ, then every element a of SD defines a unique truth-assignment fₐ for ℒ such that for all truth-functional φ in ℒ,
fₐ(φ) = 1 if and only if a belongs to φ.
Furthermore, the correspondence between members of SD and truth assignments is one-one, since for every truth assignment f there is a unique element a of SD such that f = fₐ. Now, consider any formula A and set of formulas S of ℒ. The following facts are easily established: (1) A is falsified under fₐ if and only if a is in SD(A), (2) A is verified under fₐ if and only if a is in S̄D̄(A) ∖ SD(A), (3) at least one member of S is falsified under fₐ if and only if a is in SD(S), (4) no member of S is falsified and at least one is verified under fₐ if and only if a is in S̄D̄(S) ∖ SD(S). In proving (1)-(4) we need only consider the case in which A and S consist of conditional formulas, since the strategy of replacing any unconditional formula ψ by the conditional T→ψ works here as elsewhere. Now, if A = φ→ψ, then fₐ falsifies A if and only if fₐ(φ) = 1 and fₐ(ψ) = 0, hence if and only if a belongs to φ and not to ψ. But these are precisely the conditions
for a to belong to SD(A) (see Definition 3.2). Similarly, A is verified under fₐ if and only if fₐ(φ) = fₐ(ψ) = 1, hence if and only if a belongs to both φ and ψ, which is again trivially equivalent to the requirement that a belong to S̄D̄(A) ∖ SD(A). At least one member of S is falsified under fₐ if and only if for some B in S, fₐ falsifies B, hence if and only if for some B in S, a is in SD(B). And this is equivalent to the condition that a be in the union of all SD(B) for B in S, which is by definition equal to SD(S). Finally, a necessary and sufficient condition for fₐ not to falsify any B in S, and to verify at least one, is that a not be a member of any SD(B) for B in S, and be a member of S̄D̄(B) ∖ SD(B) for some B in S. The latter is equivalent to the requirement that a be a member of the set:
⋃_{B ∈ S} (S̄D̄(B) ∖ SD(B)) ∖ ⋃_{B ∈ S} SD(B).
By elementary set theory, though, the above set is equal to
⋃_{B ∈ S} S̄D̄(B) ∖ ⋃_{B ∈ S} SD(B),
and this is by definition the same as S̄D̄(S) ∖ SD(S). Having established equivalences (1)-(4) above, the rest of the proof is trivial. Thus, if SD(A) is a subset of SD(S′), then every truth assignment that falsifies A must falsify at least one member of S′ (by equivalences (1) and (3)); hence every truth assignment not falsifying any member of S′ must not falsify A. And, if S̄D̄(S′) ∖ SD(S′) is a subset of S̄D̄(A) ∖ SD(A), then every truth assignment not falsifying any member of S′ and verifying at least one must verify A (by equivalences (2) and (4)). Hence, S′ must strongly entail A. All of these equivalences are reversible, so that if S′ strongly entails A then SD(A) is a subset of SD(S′) and S̄D̄(S′) ∖ SD(S′) is a subset of S̄D̄(A) ∖ SD(A), and therefore S ⊩ A. This completes the proof.

Theorem 8, besides possibly helping to illuminate the intuitive significance of the reasonable consequence relation (which is now seen to be closely connected with strong entailment), affords a rather quick way of checking inferences which are tautologically valid, to see if they are also reasonable. If an inference is tautologically valid, then no truth assignment which fails to falsify any of the premises can falsify the conclusion; and what is required in addition to guarantee that the strong entailment relation holds is to show that every truth assignment failing to falsify any of the premises and verifying at least one of them must also verify the conclusion. Often this second condition is quite easy to check. For example, consider contraposition as a
300
ERNEST W. ADAMS
rule of inference for conditionals: to infer ~q→~p from p→q. In this case there is only one premise, p→q, and so what must be the case in order for p→q to strongly entail ~q→~p is that verifying p→q entails verifying ~q→~p. But this is clearly not the case, since verifying p→q requires assigning both 'p' and 'q' the value 'true', whereas verifying ~q→~p requires assigning 'p' and 'q' both the value 'false'. Hypothetical Syllogism is also seen not to satisfy the condition of strong entailment, for by assigning the values 'false', 'true', 'true' to 'p', 'q' and 'r', respectively, we can fail to falsify p→q, verify q→r, but fail to verify p→r, which shows that p→q and q→r do not strongly entail p→r. The reader can, incidentally, easily check rules PC1-PC5 for probabilistic consequence, to verify that in each case the premises do strongly entail the conclusion. Thus, it is clear that p→q&r does strongly entail p→q, since p→q is a tautological consequence of p→q&r, and, furthermore, verifying p→q&r requires assigning 'true' to each of 'p', 'q' and 'r', which in turn verifies p→q. Checking to determine whether a conclusion is strongly entailed by premises has two drawbacks as a method for determining whether the conclusion is a reasonable consequence of premises. One of these is obvious: a conclusion may be a reasonable consequence of premises without being strongly entailed by them, provided that a subset of the premises strongly entails the conclusion (i.e., some of the premises are redundant). Thus, p→p is a reasonable consequence of ~p, though ~p does not strongly entail p→p. The reason is that p→p is strongly entailed by the empty set (in fact the necessary and sufficient condition for a formula to be strongly entailed by the empty set is simply that it be a tautology).
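These spot checks translate directly into a brute-force program. In the sketch below (an illustration of mine, not the paper's formalism), a conditional formula is a pair of Python predicates (ant, cons) over a truth assignment, represented as a dict from atomic letters to booleans, and strongly_entails enumerates every truth assignment and applies clauses (i) and (ii) of Definition 10.3.

```python
from itertools import product

def verified(f, v):
    ant, cons = f
    return ant(v) and cons(v)        # the conditional bet wins (Definition 10.2)

def falsified(f, v):
    ant, cons = f
    return ant(v) and not cons(v)    # the conditional bet loses (Definition 10.2)

def strongly_entails(premises, conclusion, atoms):
    """Definition 10.3, checked over every truth assignment of the atoms."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if any(falsified(b, v) for b in premises):
            continue                  # some bet in S loses: no constraint on A
        if falsified(conclusion, v):
            return False              # clause (i) violated
        if any(verified(b, v) for b in premises) and not verified(conclusion, v):
            return False              # clause (ii) violated
    return True

p_q = (lambda v: v['p'], lambda v: v['q'])              # p→q
nq_np = (lambda v: not v['q'], lambda v: not v['p'])    # ~q→~p
q_r = (lambda v: v['q'], lambda v: v['r'])              # q→r
p_r = (lambda v: v['p'], lambda v: v['r'])              # p→r
p_qr = (lambda v: v['p'], lambda v: v['q'] and v['r'])  # p→q&r
p_p = (lambda v: v['p'], lambda v: v['p'])              # p→p

print(strongly_entails([p_q], nq_np, ['p', 'q']))          # contraposition: False
print(strongly_entails([p_q, q_r], p_r, ['p', 'q', 'r']))  # hyp. syllogism: False
print(strongly_entails([p_qr], p_q, ['p', 'q', 'r']))      # PC-style rule: True
print(strongly_entails([], p_p, ['p']))                    # tautology from Λ: True
```

Exhaustive enumeration is exponential in the number of atomic letters, but for the two- and three-letter checks of this section it is instantaneous.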
Hence, showing that a conclusion is not strongly entailed by a set of premises is sufficient to show that the conclusion is not a reasonable consequence of the premises only if the premises are in fact not redundant. The second drawback of testing for strong entailment as a decision procedure for reasonable consequence is that the method provides less information about the relation between the premises and conclusion than does the one outlined at the end of Section 6. In particular, the method there outlined showed how to construct a P-ordering (and associated uniform sequence of probability functions) in which the premises hold but the conclusion does not, in case the conclusion is not a reasonable consequence of the premises. This is important in applications of the formal calculus where, once having shown theoretically that a conclusion is not a reasonable consequence of premises, one wants to find instances of the premises and conclusion among sentences of English, in which it is obvious that the premises have high
probabilities but the conclusion has a low one. The P-ordering gives information as to what the probability relations among the atomic propositions belonging to the premises and conclusion must be in order that the premises should have high probabilities, but the conclusion a low probability, and this information is extremely useful in the search for counterexamples in English. On the other hand, showing that, say, verifying p→q does not entail verifying ~q→~p does not provide any direct means for finding out what the probability relations among 'p' and 'q' and their compounds must be, in order that p→q should have high probability, but ~q→~p a low one. Aside from its intuitive significance, the connection between reasonable consequence and strong entailment established in Theorem 8 provides a useful tool in the derivation of many general 'metatheorems' concerning reasonable consequence. A number of these are given in Theorem 9, below.

THEOREM 9. Let A and B be formulas and S be a finite set of formulas.
9.1. S ⊩ A if and only if S, ~A ⊩ Ant(A)→F.
9.2. If S, B ⊩ A and S, ~B ⊩ A, then S ⊩ A.
9.3. If either S is empty or A is truth-functional, then S ⊩ A if and only if S tautologically implies A.
9.4. If all formulas of S are truth-functional, then S ⊩ A if and only if either A is a tautology, or S tautologically implies Ant(A) & Cons(A).
9.5. If S = {φ₁→ψ₁, ..., φₙ→ψₙ} and S ⊩ A but not S′ ⊩ A for any proper subset S′ of S, then A is a reasonable consequence of C(S) = (φ₁ ∨ ... ∨ φₙ) → (~(φ₁&~ψ₁) & ... & ~(φₙ&~ψₙ)).
9.6. If A = p→ψ, where p is an atomic formula not occurring in S, then S ⊩ A if and only if either p→ψ is a tautology, or S ⊩ ~ψ→F.¹
Proof of 9.1. Assume that A = φ→ψ and therefore ~A = φ→~ψ. It follows directly from Definition 6 that φ→F is a probabilistic consequence of {φ→ψ, φ→~ψ}. Hence, if S ⊩ A, then φ→F = Ant(A)→F must be a probabilistic, and therefore a reasonable, consequence of S and ~A.
To show the converse, suppose that φ→F is a reasonable consequence of S and ~A = φ→~ψ. By Theorem 8, there exists a subset S′ of S ∪ {~A} such that S′ strongly entails φ→F. Either ~A is in S′ or it is not. In the second case, S′ must be a subset of S, and therefore there exists a subset of S which strongly entails φ→F. Hence φ→F is a reasonable consequence of S, according to Theorem 8. But it is easy to show that A = φ→ψ is itself a

¹ An incorrect statement of the conditions under which S ⊩ p→ψ, where p does not occur in S, was originally presented in Adams [1965].
reasonable consequence of φ→F, and therefore in this case, A must be a reasonable consequence of S. In case ~A = φ→~ψ is a member of S′, let S₀ = S′ ∖ {φ→~ψ}: i.e., S′ = S₀ ∪ {~A}. It will be shown that S₀, which is a subset of S, strongly entails A. Suppose first that a truth assignment f fails to falsify any member of S₀, but falsifies A = φ→ψ. f therefore verifies ~A = φ→~ψ, hence fails to falsify any member of S′ = S₀ ∪ {~A}, and verifies at least one. But S′ strongly entails φ→F, and therefore f would have to verify φ→F, which is impossible. Therefore if f fails to falsify any member of S₀, it cannot falsify A. Now suppose that f fails to falsify any member of S₀ and verifies at least one member of it. In this case f must actually verify A, for if it failed to do so, f would fail to falsify ~A, and therefore would fail to falsify any member of S′ = S₀ ∪ {~A}, and verify at least one, and so would have to verify φ→F, which is impossible. Therefore, if f fails to falsify any member of S₀ and verifies at least one member, it verifies A. This shows that S₀ strongly entails A, and therefore A is a reasonable consequence of S, by Theorem 8. This concludes the proof of 9.1.
Proof of 9.2. Suppose that S, B ⊩ A and S, ~B ⊩ A. According to Theorem 9.1, then, S, B, ~A ⊩ Ant(A)→F and S, ~B, ~A ⊩ Ant(A)→F. It will be shown that S, ~A ⊩ Ant(A)→F, from which it will follow by Theorem 9.1 again that S ⊩ A. Suppose that A = φ→ψ, and therefore ~A = φ→~ψ, and Ant(A) = φ. Further, let S⁺ be the set S ∪ {~A}: by hypothesis then S⁺, B ⊩ φ→F and S⁺, ~B ⊩ φ→F, and what must be shown is that S⁺ ⊩ φ→F. From the fact that S⁺, B ⊩ φ→F, it follows that there exists a subset S′ of S⁺, B such that S′ strongly entails φ→F. In case B does not belong to S′, then S′ is a subset of S⁺, and therefore there is a subset of S⁺ strongly entailing φ→F, so S⁺ ⊩ φ→F, as was to be shown. Suppose on the other hand that B does belong to S′.
Now let S₀ = S′ ∖ {B}, so that S′ = S₀ ∪ {B} strongly entails φ→F. The immediate consequence of this is that S₀ strongly entails ~B. For, suppose that f is a truth assignment failing to falsify any member of S₀, but falsifying ~B; then f would actually verify B, so f would fail to falsify any member of S′ = S₀ ∪ {B}, and would verify at least one member (namely, B), and hence would have to verify φ→F, since the latter is strongly entailed by S′. But it is impossible to verify φ→F, and so no truth assignment failing to falsify any member of S₀ can falsify ~B. If a truth assignment f fails to falsify any member of S₀ and verifies at least one member, then it must actually falsify B and therefore verify ~B, since if it failed to falsify B, then no member of S′ would be falsified, and at least one would be verified, which would again entail that φ→F was verified, which is impossible.
It has been shown, then, that S⁺ has a subset, S₀, which strongly entails ~B, from which it follows that S⁺ ⊩ ~B. But it follows immediately from this, together with the assumption that S⁺, ~B ⊩ φ→F, that S⁺ ⊩ φ→F, as was to be shown. This concludes the proof of 9.2.
Proof of 9.3. According to Theorem 1.2, if S ⊩ A then S tautologically implies A. Hence to prove Theorem 9.3, what we must do is show that if either S is empty or A is truth-functional, and S tautologically implies A, then S ⊩ A. Suppose first that S is empty and S tautologically implies A: i.e., A is a tautology. Then S strongly entails A, since no truth assignment falsifies A, and no truth assignment can verify at least one member of S. Hence, in this case S ⊩ A, by Theorem 8. Now suppose that A is truth-functional and S tautologically implies A. Again, it cannot be the case that any truth assignment which fails to falsify any member of S falsifies A. But this entails that any truth assignment failing to falsify any member of S must actually verify A, since in the case of truth-functional formulas A, any truth assignment failing to falsify them must verify them. Therefore, again, S must strongly entail A, hence S ⊩ A. This proves 9.3.
Proof of 9.4. Suppose that all formulas of S are truth-functional. If S ⊩ A, then there is a subset S′ of S such that S′ strongly entails A. Two cases must be considered: (1) S′ is empty, and (2) S′ is not empty. In case S′ is empty, it strongly entails A if and only if A is a tautology. In case (2), it will be shown that S′ tautologically implies Ant(A) & Cons(A). Again, we can assume without loss of generality that A is conditional, say A = φ→ψ.
For the converse, suppose that S tautologically implies Ant(A) & Cons(A); then, by 9.3, S ⊩ Ant(A) and S ⊩ Cons(A). But A is strongly entailed by Ant(A) and Cons(A), hence Ant(A), Cons(A) ⊩ A, and so, by transitivity of ⊩, S ⊩ A. This concludes the proof of Theorem 9.4.
Proof of 9.5. Suppose that S = {φ₁→ψ₁, ..., φₙ→ψₙ} and S ⊩ A, but not S′ ⊩ A for any proper subset S′ of S. This means that S must itself strongly entail A. But it is not hard to see that S strongly entails A if and only if the formula

C(S) = (φ₁ ∨ ... ∨ φₙ) → (~(φ₁&~ψ₁) & ... & ~(φₙ&~ψₙ))

strongly entails A. The reason is this: C(S) is so constructed that any truth assignment failing to falsify any member of S also fails to falsify C(S), and any truth assignment failing to falsify any member of S and verifying at least one of them must verify C(S). Thus, if a truth assignment, f, failed to falsify any φᵢ→ψᵢ for i = 1, ..., n, but falsified C(S), it would have to assign 'false' to one of the formulas ~(φᵢ&~ψᵢ) in the consequent of C(S). But this would then falsify the formula φᵢ→ψᵢ, contrary to hypothesis. If f failed to falsify any φᵢ→ψᵢ for i = 1, ..., n, and verified at least one of them, it would have to assign the value 'true' to at least one of the φᵢ, for i = 1, ..., n, and to all of the ~(φᵢ&~ψᵢ) for i = 1, ..., n. But this would make both the antecedent and consequent of C(S) true, and would verify C(S). The converse of the above also holds: any truth assignment failing to falsify C(S) must fail to falsify any member of S, and any assignment which verifies C(S) verifies at least one member of S. From what has been shown, then, C(S) strongly entails A, since S does, and therefore C(S) ⊩ A. This concludes the proof of 9.5.
Proof of 9.6. Suppose that A = p→ψ, where p is an atomic formula not occurring in S. Assume first that S ⊩ A. Then there is a subset S′ of S such that S′ strongly entails A. If S′ is empty, then A must be a tautology. If S′ is not empty, it will be shown that ~ψ→F is strongly entailed by S′. Since S′ strongly entails A, A is a tautological consequence of S′. But it is an easy theorem of the propositional calculus that if S′ tautologically implies A = p→ψ, where p does not occur in S′, then S′ tautologically implies ψ, and hence S′ tautologically implies ~ψ→F. And it follows directly from this that any truth assignment failing to falsify any member of S′ must also fail to falsify ~ψ→F.
strongly entails A. The reason is this: C (S) is so constructed that any truth assignment failing to falsify any member of S also fails to falsify C(S), and any truth assignment failing to falsify any member of S and verifying at least one of them must verify C (S). Thus, if a truth assignment, j, failed to falsify any fPi+ 'Pi for i = 1, ... , n, but falsified C (S), it would have to assign 'false' to one of the formulas fPi  'Pi in the consequent of C (S). But this would then falsify the formula fPi+ 'Pi' contrary to hypothesis. Iff failed to falsify any fPi+ 'Pi for i= 1, ... , n, and verified at least one of them, it would have to assign the value 'true' to at least one of the fPi' for i = 1, ... , n, and to all of the  (fPi 'PJ for l = 1, ... , fl. But this would make both the antecedent and consequent of C(S) true, and would verify C(S). The converse of the above also holds: any truth assignment failing to falsify C (S) must fail to falsify any member of S, and any assignment which verifies C(S) verifies at least one member of S. From what has been shown, then, C(S) strongly entails A, since S does, and therefore C(S)II A. This concludes the proof of 9.5. Proof of 9.6. Suppose that A =p+ 'P, where p is an atomic formula not occurring in S. Assume first that SII A. Then there is a subset S' of S such that S' strongly entails A. If S' is empty, then A must be a tautology. If S' is not empty it will be shown that  'P+F is strongly entailed by S'. Since S' strongly entails A, A is a tautological consequence of S'. But, it is an easy theorem of the propositional calculus that if S' tautologically implies A = p+'P, where p does not occur in S', then S' tautologically implies 'P, and hence S' tautological1y implies  'P+F. And, it follows directly from this that any truth assignment failing to falsify any member of S' must also fail to falsify  'P + F. 
We show next that the condition that any truth assignment failing to falsify any member of S' and verifying at least one member of S' verifies ¬ψ→F holds vacuously, since no truth assignment can fail to falsify any member of S' while verifying at least one. The reason is clear: such a truth assignment would have to verify p→ψ, since A = p→ψ is strongly entailed by S'. But since p does not occur in S', the value 'false' could be assigned
PROBABILITY AND THE LOGIC OF CONDITIONALS
305
to p independently, so that the truth assignment would fail to falsify any member of S', verify at least one member, but not verify p→ψ, since p was assigned the value 'false'. Therefore, the subset S' strongly entails ¬ψ→F, and hence S ⊩ ¬ψ→F. Conversely, suppose that either A = p→ψ is a tautology, or S ⊩ ¬ψ→F. In the first case, A is a reasonable consequence of the empty set, and hence of S. In the second case, p→ψ is a reasonable consequence of S because it is strongly entailed by ¬ψ→F. Thus, any truth assignment failing to falsify ¬ψ→F must give the value 'true' to ψ, and hence fail to falsify p→ψ. And no truth assignment verifies ¬ψ→F, so the other requirement for strong entailment holds vacuously. Hence, in either case that A is a tautology or S ⊩ ¬ψ→F, S ⊩ A. This concludes the proof. Each of the parts of Theorem 9 has some intuitive significance, which will be discussed briefly. Theorems 9.1 and 9.2 are exhibited to show that at least some of the usual metatheorems of the propositional calculus carry over to the theory of reasonable inference. The rule that if Ant(A)→F is a reasonable consequence of S and ¬A, then A is a reasonable consequence of S (Theorem 9.1) is a kind of generalization of the reductio ad absurdum principle, and it reduces to the familiar principle in case the conclusion, A, is truth-functional. The generalization of the rule to the conditional case, A =
306
ERNEST W. ADAMS
steps in the reasoning may not be strictly in accordance with the rules for reasonable inference. Theorem 9.4 says, essentially, that no interesting conditional conclusions can be drawn from unconditional premises. A conditional conclusion from premises would not be practically interesting if it were in fact tautological, or if it could only be asserted on the grounds that its antecedent and consequent could be independently asserted, and Theorem 9.4 says that these are the only conditions under which the conditional can be reasonably inferred from truth-functional premises. Once again, one may look upon this result as casting doubt on the adequacy of the present theory. At any rate, most empiricists would probably argue that the data upon which all of our conclusions about the world are based should be represented as unconditional statements, and if these are the only 'premises' we have to go on, we must be able to justify inferring certain conditional consequences from them. This shows that the traditional problems of induction may have their counterparts even in the very modest calculus here set forth. On the more technical side, Theorem 9.4 shows what we might have expected on intuitive grounds: that conditional statements can't be replaced by logically equivalent (in the sense of our calculus) truth-functional statements. In particular, it follows immediately from the theorem that the formula p→q is not equivalent to any truth-functional formula, for it could only be a reasonable consequence of a truth-functional formula φ if both p and q followed from φ, and in that case, φ would not be a reasonable consequence of p→q. The principal significance of Theorem 9.5 is to show that in deriving reasonable inferences from nonredundant sets of premises, one can replace the set of premises (here, as always, assumed to be finite) by a single conditional premise.
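The non-equivalence of p→q with every truth-functional formula can also be exhibited numerically: two probability distributions over the four p, q worlds suffice to separate the conditional probability P(q | p) from all sixteen possible truth tables. A small sketch, in which the particular distributions are illustrative choices, not from the text:

```python
from fractions import Fraction as F

WORLDS = [(False, False), (False, True), (True, False), (True, True)]

def p_cond(dist):
    """P(p -> q) as the conditional probability P(q | p); 1 if P(p) = 0."""
    pp = sum(m for (p, q), m in zip(WORLDS, dist) if p)
    pq = sum(m for (p, q), m in zip(WORLDS, dist) if p and q)
    return F(1) if pp == 0 else pq / pp

def p_tf(f, dist):
    """Probability of the truth-functional formula with truth table f."""
    return sum(m for fv, m in zip(f, dist) if fv)

DISTS = [(F(1, 4),) * 4,
         (F(1, 10), F(1, 10), F(7, 10), F(1, 10))]

# No truth table over p, q reproduces P(q | p) on both distributions.
separated = all(
    any(p_tf(f, d) != p_cond(d) for d in DISTS)
    for f in ([bool(b >> i & 1) for i in range(4)] for b in range(16))
)
assert separated
```

Exact rational arithmetic is used so that the comparison with each of the sixteen truth tables is free of rounding artifacts.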
In a sense, this suggests that one might introduce a kind of 'quasi-conjunction' operation applying to conditionals, as follows: if A = φ₁→ψ₁ and B = φ₂→ψ₂, then we would set

A & B = (φ₁ ∨ φ₂) → (¬(φ₁ & ¬ψ₁) & ¬(φ₂ & ¬ψ₂)).

This conjunction does have the property that any conclusion, C, which follows from A and B, but not from A alone or B alone, also follows from A & B. One can also show easily that quasi-conjunction reduces to ordinary conjunction in case both A and B are truth-functional. On the other hand, this is not a true conjunction operation because in general neither A nor B is a reasonable consequence of A & B. In fact, Theorem 9.5 entails directly that no such 'true' conjunction operation is definable within this calculus. In particular, if
we set A = p→r and B = q→r, then if C were any formula which was both a reasonable consequence of A and B, and which had both A and B as reasonable consequences, then C would have to be a reasonable consequence of the quasi-conjunction:

A & B = (p ∨ q) → (¬(p & ¬r) & ¬(q & ¬r)),

which is equivalent to p ∨ q→r. But the latter formula has neither p→r nor q→r as a probabilistic consequence. It seems to me that this result somewhat confirms our intuitions about conditionals: namely, that the joint assertion of two conditionals is not the same as, and cannot be paraphrased as, the assertion of a single conditional. The intuitive significance of Theorem 9.6 is this: we cannot infer anything about what will happen if a certain situation occurs (represented by the atomic formula p) from premises which do not in any way refer to that situation, unless some very strong conditions are met. One might say that this shows that not only is the well-known fallacy of material implication (to infer q→p from p) not valid in our calculus, but no fallacy of a similar kind can arise unless special conditions are met. One's intuitive reaction to the inference of q→p from the premise p is that what is wrong here is the drawing of a conclusion about what would happen in situation q from a premise not involving q (e.g. to infer from 'it will rain tomorrow' the conclusion 'if John's car is a Ford then it will rain tomorrow'). Of the two special cases in which the inference of a conclusion p→ψ, whose antecedent is not mentioned among the premises, is reasonable, one is familiar: namely, that in which the conclusion is a tautology, and therefore practically uninteresting. The other case is that in which it is also possible to deduce the formula ¬ψ→F from the premises. What is important to note here is that, in a sense, the statement ¬ψ→F is much 'stronger' than the statement ψ. That is, though ¬ψ→F and ψ are equivalent truth-functionally, ψ can in general have a probability which is arbitrarily close to, but different from, 1, whereas ¬ψ→F can have a high probability only when its probability is equal to 1, and that probability is equal to 1 only if the probability of ψ is 1.
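A concrete probability model makes the point about the quasi-conjunction vivid. In the following sketch the numbers are hypothetical: q is nearly certain and carries r with it, while the rare p-worlds mostly lack r, so P(p ∨ q→r) is high although P(p→r) is low:

```python
# Hypothetical distribution over (p, q, r) worlds; unlisted worlds get mass 0.
P = {
    (True, False, False): 0.009,   # p without q; r fails
    (True, False, True): 0.001,
    (False, True, True): 0.95,     # q is nearly certain and brings r with it
    (False, False, False): 0.04,
}

def pr(event):
    return sum(m for w, m in P.items() if event(w))

def p_cond(ant, cons):
    """Probability of a conditional as the conditional probability."""
    return pr(lambda w: ant(w) and cons(w)) / pr(ant)

p_or_q_to_r = p_cond(lambda w: w[0] or w[1], lambda w: w[2])
q_to_r = p_cond(lambda w: w[1], lambda w: w[2])
p_to_r = p_cond(lambda w: w[0], lambda w: w[2])

assert p_or_q_to_r > 0.99   # the quasi-conjunction (p v q) -> r is probable
assert q_to_r > 0.99        # q -> r is probable too
assert p_to_r < 0.2         # yet p -> r is improbable: no probabilistic consequence
```

The same distribution thus witnesses that p→r is not a probabilistic consequence of p ∨ q→r.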
Thus, the second special case in which p→ψ can be reasonably inferred from premises not containing p is that in which a high probability of the premises actually insures the certainty of ψ (in the sense that the probability of ψ is 1). This intuitive explanation has a more precise counterpart in the result stated in Theorem 10.3, to follow. Let us note here, by way of caution, that we are treading on dangerous ground in attaching significance to formulas of the form ¬ψ→F, which occur
in both Theorems 9.1 and 9.6. The reason is that in assigning any probability other than 0 to them, we are relying on our somewhat doubtful convention to assign any conditional the probability 1 if the probability of its antecedent is 0. Remarks in Section 9 throw further light on this issue. In concluding this section, three further corollaries to Theorem 8 are derived, all of which deal explicitly with probabilities. THEOREM 10. Let ℒ be a finite language, let A and S be, respectively, a formula and a finite set of formulas of ℒ, and let ψ be a truth-functional formula of ℒ. 10.1. If not S ⊩ A, then for all ε > 0 there exists a probability function P of ℒ such that P(B) > 1 − ε for all B in S, but P(A) < ε. 10.2. If S ⊩ A and S has n members, then for all ε > 0 and all probability functions P of ℒ, if P(B) > 1 − ε for all B in S, then P(A) > 1 − nε. 10.3. If S ⊩ ¬ψ→F and S has n members, then for all probability functions P of ℒ, if P(B) > 1 − 1/n for all B in S, then P(ψ) = 1. Proof of 10.1. Suppose that A is not a reasonable consequence of S. We can assume, as usual, that A is conditional, say A =
C(S') is seen to be equivalent to the iterated quasi-conjunction B₁ & … & Bₙ in the sense that the antecedent and consequent of B₁ & … & Bₙ are tautologically equivalent, respectively, to the antecedent and consequent of C(S'). And the necessary and sufficient condition that C(S') strongly entail A = φ→ψ is just that φ & ¬ψ tautologically imply Ant(C(S')) & ¬Cons(C(S')), and Ant(C(S')) & Cons(C(S')) tautologically imply φ & ψ. Now we will show that if C = φ₁→ψ₁ and D = φ₂→ψ₂, and we define

C & D = (φ₁ ∨ φ₂) → (¬(φ₁ & ¬ψ₁) & ¬(φ₂ & ¬ψ₂)),

and if P(C) > 1 − ε and P(D) > 1 − δ, then P(C & D) > 1 − ε − δ. Assume first that neither P(φ₁) nor P(φ₂) is zero. Then:
P(¬C) = P(φ₁ → ¬ψ₁) = P(φ₁ & ¬ψ₁) / P(φ₁) = 1 − P(C) < ε, and

P(¬D) = P(φ₂ → ¬ψ₂) = P(φ₂ & ¬ψ₂) / P(φ₂) = 1 − P(D) < δ.
By the inequality of general probability theory cited in the proof of Theorem 2,
P((φ₁ & ¬ψ₁) ∨ (φ₂ & ¬ψ₂)) / P(φ₁ ∨ φ₂) ≤ P(φ₁ & ¬ψ₁) / P(φ₁) + P(φ₂ & ¬ψ₂) / P(φ₂) < ε + δ.
But, by the elementary calculus of probability,
P(C & D) = 1 − P((φ₁ & ¬ψ₁) ∨ (φ₂ & ¬ψ₂)) / P(φ₁ ∨ φ₂) > 1 − ε − δ.

In case either P(φ₁) or P(φ₂) is zero, the proof is even simpler. For example, suppose that P(φ₁) = 0. Then by elementary probability,

P(C & D) = P(φ₁ ∨ φ₂ → ¬(φ₁ & ¬ψ₁) & ¬(φ₂ & ¬ψ₂)) = P(φ₂ → ψ₂) = P(D),

hence clearly P(C & D) > 1 − ε − δ. Thus, we have shown that the 'probability defect' of the conjunction C & D is not greater than the sum of the 'defects' of the conjuncts. Iterating this, it follows directly that if
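The defect bound just proved, 1 − P(C & D) ≤ (1 − P(C)) + (1 − P(D)), can be spot-checked by random sampling. The following sketch (the four-atom encoding is an assumption of the sketch) tests it on randomly generated distributions, using the paper's convention that a conditional with an antecedent of probability 0 gets probability 1:

```python
import itertools
import random

WORLDS = list(itertools.product([False, True], repeat=4))  # (a1, c1, a2, c2)

def pr(dist, pred):
    return sum(m for w, m in zip(WORLDS, dist) if pred(w))

def p_cond(dist, ant, cons):
    pa = pr(dist, ant)
    if pa == 0:
        return 1.0               # convention: probability 1 when P(ant) = 0
    return pr(dist, lambda w: ant(w) and cons(w)) / pa

random.seed(1)
checked = 0
for _ in range(1000):
    raw = [random.random() for _ in WORLDS]
    total = sum(raw)
    dist = [x / total for x in raw]
    pC = p_cond(dist, lambda w: w[0], lambda w: w[1])     # C = a1 -> c1
    pD = p_cond(dist, lambda w: w[2], lambda w: w[3])     # D = a2 -> c2
    pCD = p_cond(dist,
                 lambda w: w[0] or w[2],                  # quasi-conjunction C & D
                 lambda w: not (w[0] and not w[1]) and not (w[2] and not w[3]))
    # defect of the quasi-conjunction is at most the sum of the defects
    assert 1 - pCD <= (1 - pC) + (1 - pD) + 1e-9
    checked += 1
```

The small tolerance only guards against floating-point rounding; the inequality itself is the one established in the proof.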
P(φᵢ→ψᵢ) > 1 − ε for i = 1, …, n, then P(C(S')) > 1 − nε. We have supposed that C(S') was so constructed that Ant(A) & ¬Cons(A) tautologically implies Ant(C(S')) & ¬Cons(C(S')), and Ant(C(S')) & Cons(C(S')) tautologically implies Ant(A) & Cons(A). To shorten writing, let us set A = φ→ψ and C(S') = γ→μ. Then φ & ¬ψ tautologically implies γ & ¬μ and
γ & μ tautologically implies φ & ψ, so that

P(φ & ¬ψ) ≤ P(γ & ¬μ) and P(γ & μ) ≤ P(φ & ψ),

from which it follows immediately that:

P(γ→μ) = P(γ & μ) / (P(γ & μ) + P(γ & ¬μ)) ≤ P(φ & ψ) / (P(φ & ψ) + P(φ & ¬ψ)) = P(φ→ψ),

provided that none of the probabilities involved is zero. If P(γ & μ) = 0 but P(γ) ≠ 0, then P(γ→μ) = 0, so clearly P(γ→μ) ≤ P(φ→ψ).
A technical observation on Theorem 10.2 is that it reduces to a theorem of Suppes (this volume) in the case in which all of the formulas involved are truth-functional. Comparison of the proofs illustrates strikingly how much simpler the 'inequality' calculus of unconditional probabilities is than the corresponding calculus of inequalities among conditional probabilities. 8. Analysis of inferences which are tautologically valid, but not reasonable. The general results derived in the previous sections provide a stock of mathematical tools with which to attack a number of questions having to do with the formal theory of reasonable inference and its applications. The final two sections deal briefly with two topics of some methodological interest connected with the present theory, which the earlier results shed some light on. In both cases the discussion will be informal. The nature of inferences which are tautologically valid but not reasonable in the sense of this theory warrants further consideration. In particular, we may ask: given an instance of an inference schema to infer a conclusion A from a set of premises S, where A is a tautological but not a reasonable consequence of S, what conditions must obtain in order that the probabilities of the premises be high, but that of the conclusion be low? Our results concerning P-orderings give some information on this question, as will be shown below. First, however, a few examples of inference schemata which are tautologically valid but not reasonable may help to show how pervasive this phenomenon is. None of the following schemata are reasonable: (1) to infer p→q from any of ¬p ∨ q, ¬p, or q; (2) to infer ¬p→F from p; (3) to infer p→r from p ∨ q→r; (4) to infer p & q→r from p→r; (5) to infer ¬q→¬p from p→q; (6) to infer p→r from p→q and q→r; (7) to infer p→¬q from p→r and q→¬r.
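For two of these schemata, contraposition (5) and transitivity (6), explicit counterexample distributions are easy to write down; the numbers below are illustrative, not taken from the text. In each case the premises have conditional probability above 0.95 while the conclusion falls to 0.2 or below:

```python
def cp(P, a, b):
    """Conditional probability P(b | a) over a dict mapping worlds to masses."""
    num = sum(m for w, m in P.items() if a(w) and b(w))
    den = sum(m for w, m in P.items() if a(w))
    return num / den

# (5) contraposition fails: P(q | p) high, P(~p | ~q) low.  Worlds are (p, q).
P5 = {(True, True): 0.89, (True, False): 0.01,
      (False, True): 0.099, (False, False): 0.001}
assert cp(P5, lambda w: w[0], lambda w: w[1]) > 0.95          # p -> q
assert cp(P5, lambda w: not w[1], lambda w: not w[0]) < 0.1   # ~q -> ~p fails

# (6) transitivity fails: P(q | p) and P(r | q) high, P(r | p) low.  Worlds (p, q, r).
P6 = {(True, True, False): 0.018, (True, True, True): 0.002,
      (False, True, True): 0.93, (False, False, False): 0.05}
assert cp(P6, lambda w: w[0], lambda w: w[1]) > 0.95          # p -> q
assert cp(P6, lambda w: w[1], lambda w: w[2]) > 0.95          # q -> r
assert cp(P6, lambda w: w[0], lambda w: w[2]) < 0.2           # p -> r fails
```

Similar distributions can be found for the other non-reasonable schemata in the list.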
As an example of a standard metatheorem of inference which is not valid for reasonable inference, we may cite the deduction theorem (rule of conditionalization). That is, the following rule does not hold for reasonable inference: if S is a set of formulas and
premises S according to those rules if and only if A were a tautological consequence of S. The proof of this is trivial in the case of the first rule: to infer φ→ψ from ¬φ ∨ ψ. For, suppose that S tautologically implies a formula, A. If A is truth-functional, then A is also a reasonable consequence of S (by Theorem 9.3), and hence A would also follow by the augmented rules. If A is conditional, say A = φ→ψ, then clearly it is the case that S tautologically implies the truth-functional formula ¬φ ∨ ψ, so ¬φ ∨ ψ is also a reasonable consequence of S. But, with the additional rule to infer φ→ψ from ¬φ ∨ ψ, A = φ→ψ could also be derived from S. The reader can easily verify in the cases of the other non-reasonable rules that adding them to those for reasonable inference would make the resulting system equivalent to the propositional calculus. Since all rules except (1) and (2) are highly plausible intuitively, this shows what very strong intuitively plausible reasons there are for representing 'if then' statements in English by material conditionals for the purposes of logical analysis. Thus, one may say, granted that one wants to give a truth-condition analysis of conditionals in order to determine their logical relations with other statements, then one is practically forced to assign truth values to them which are the same as those for the material conditional. Now we return to the question raised earlier: what probability relations must the components of the premises and conclusion of a tautologically valid but not reasonable inference have in order for the premises to have high probability but the conclusion a low probability? Answering this question may be the most interesting application of the formal theory, and it is here that the results concerning P-orderings prove useful.
For, supposing that the premises and conclusion are S and A, respectively, we can deal with the problem most efficiently by asking the related question: what conditions must be satisfied by a P-ordering ≤ in order that all the formulas of S hold in ≤, but A does not? To illustrate, consider the non-reasonable inference rule: to infer p→q from ¬p ∨ q. What sorts of P-orderings are such that ¬p ∨ q holds in the ordering but p→q does not? To formulate the conditions under which ¬p ∨ q holds, but p→q does not, we may use the Venn diagram (see p. 289), letting a, b, c and d stand for ¬p & ¬q, p & ¬q, p & q and ¬p & q respectively. Then: ¬p ∨ q holds in ≤ if and only if b < a ∨ c ∨ d; and p→q does not hold in ≤ if and only if:

c ≤ b and F < b ∨ c.
Now, it is a simple matter of inference from the axioms for P-orderings to show that the two conditions above hold if and only if: F < b and c ≤ b and b < a ∨ d.
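Illustrative masses for the four regions (the particular numbers are hypothetical) show that the three conditions can be met jointly, with ¬p ∨ q highly probable while p→q is not:

```python
# Hypothetical masses for the four regions of the Venn diagram:
# a = ~p & ~q, b = p & ~q, c = p & q, d = ~p & q.
a, b, c, d = 0.90, 0.008, 0.002, 0.09

assert b > 0                   # F < b: the region b has positive probability
assert c <= b                  # c is not large relative to b
assert b < 0.1 * (a + d)       # b is small compared with a v d, i.e. with ~p

p_not_p_or_q = a + c + d       # P(~p v q)
p_cond = c / (b + c)           # P(p -> q) = P(q | p)

assert p_not_p_or_q > 0.99 and p_cond < 0.25
```

The threshold 0.1 merely stands in for "small relative to"; any sufficiently lopsided assignment of the masses behaves the same way.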
Given the interpretation of φ < ψ as meaning that the probability of φ must be small relative to that of ψ, the above conditions lead to the following conclusions concerning the probabilities of a, b, c and d (and, by implication, about the probabilities of combinations of them): (1) that P(p) > 0, (2) that P(b) is not small by comparison to P(c), and (3) that P(b) is small with respect to P(a ∨ d) = P(¬p). One inference in particular is significant: namely that, since P(c) is not large relative to P(b) and P(b) is small relative to P(a ∨ d), then P(p) = P(b ∨ c) is small relative to P(¬p) = P(a ∨ d). Hence, the formula ¬p must hold in the ordering ≤. The foregoing facts help in the search for an actual counterexample to the rule to derive p→q from ¬p ∨ q: i.e., such an example must be one in which P(p & ¬q) = P(b) > 0, P(p & ¬q) is not small relative to P(p & q) = P(c), and it is small relative to P(¬p), and, therefore, P(¬p) is close to 1. Now all we have to do is to find examples of English statements, p and q, whose probabilities have these relations. First, the probability of ¬p must be high, so that of p must be small: i.e., p must be a statement with low probability. Let us take any such statement to start with: e.g., let p = Mr. Jones will have an accident on his way to work.
Now q must be chosen in such a way that the probability of p & ¬q is not small in comparison to that of p & q. In fact, one way to insure that this condition is satisfied is to require that P(p & q) be small in comparison to P(p & ¬q), which is the same as to require that, if the event p does occur (which is unlikely), then it will be much more likely that q does not occur than that it does. So we may take q to assert something which would be highly improbable in the event of p's occurrence, for example: q = Mr. Jones will arrive on time for work.
All that remains, after having chosen p and q in the above way, is to verify that these are statements such that ¬p ∨ q is highly probable, but p→q is not. In this case, ¬p ∨ q = Either Mr. Jones will not have an accident on his way to work, or he will arrive on time for work, and p→q = If Mr. Jones has an accident on his way to work, then he will arrive on time for work.
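With hypothetical numbers attached to the example (say an accident has probability 0.01, and arriving on time given an accident has probability 0.05), the required relations can be verified directly:

```python
# Hypothetical figures, chosen only to satisfy the relations described in the text.
p_accident = 0.01               # P(p): an accident is improbable
q_given_p = 0.05                # P(q | p): on time despite an accident

p_and_q = p_accident * q_given_p              # P(p & q)
p_and_not_q = p_accident * (1 - q_given_p)    # P(p & ~q)

premise = 1 - p_and_not_q       # P(~p v q) = 1 - P(p & ~q)
conclusion = q_given_p          # P(p -> q) = P(q | p)

assert premise > 0.99           # ~p v q is highly probable
assert conclusion < 0.1         # but p -> q is highly improbable
```

Note also that P(p & ¬q) here is 0.0095, which is indeed not small relative to P(p & q) = 0.0005, yet small relative to P(¬p) = 0.99, exactly as the P-ordering analysis requires.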
Now, ¬p ∨ q is probable, simply because its first disjunct (that Mr. Jones won't have an accident on his way to work) is probable, but clearly p→q is highly improbable, and so it would certainly be unreasonable to deduce p→q from the premise ¬p ∨ q in this case. The foregoing example also illustrates a point made in Section 1. The example is odd in one respect: namely, that it suggests that the premise ¬p ∨ q is being asserted in a situation in which the stronger assertion ¬p could be made (in fact, both ¬p and q could be asserted here). I suggested earlier that to actually assert a disjunction under such circumstances is misleading (the hearer is likely to ask 'what is the connection between p and q?'). And, if we are justified in assuming that speakers do not deliberately make misleading statements, the inference of p→q from ¬p ∨ q is 'reasonable' in the sense that it can be justified by this assumption. The above point can be made more precise in the following way. If, on some occasion, P(¬p ∨ q) > 1 − ε, but P(p→q) ≤ ½, it is easy to show that P(¬p) > 1 − 2ε and P(p) < 2ε. More generally, we may assert: if S tautologically implies φ→ψ but not S ⊩
9. A modified criterion of reasonableness. This section is concerned with the possible doubtfulness of those consequences of the basic assumptions of the present theory which depend on the arbitrary assignment of probability 1 to all conditionals whose antecedents have probability O. One way to avoid the problem of having to make some arbitrary assignment of probabilities to conditionals whose antecedents have zero probability is to work only with probability functions which assign values in the open interval (0,1) to all but logically false or logically true statements. Such probability functions have been called regular: they may be formally defined as probability functions in the sense of Definition 4.1, which satisfy the following additional axiom: for all truth functional