North Holland is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
225 Wyman Street, Waltham, MA 02451, USA

First edition 2011
Copyright © 2011 Elsevier B.V. All rights reserved

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-444-52936-7
ISSN: 1874-5857
For information on all North Holland publications visit our web site at elsevierdirect.com
Printed and bound in Great Britain
INTRODUCTION
While the more narrow research program of inductive logic is an invention of the 20th century, philosophical reflection about induction as a mode of inference is as old as philosophical reflection about deductive inference. Aristotle was concerned with what he calls epagoge, and he studied it with the same systematic intent with which he approached the logic of syllogisms. However, it turned out that inductive inferences are much harder to evaluate, and it took another 2300 years to make substantial progress on these issues. Along the way, a number of philosophical and scientific turning points were achieved, and we can now look back on the excitingly rich history that this handbook covers in considerable detail.

After Aristotle, our history took off in the 18th century with the ingenious insights and contributions of two philosophers: David Hume famously formulated the problem of induction with tremendous clarity. This problem (also called Hume’s Problem) has kept philosophers busy ever since; many responses have been put forward and, in turn, criticized, and variants of a major philosophical claim (“scepticism”) have been defended on its basis. At around the same time, Blaise Pascal and the philosophers of the School of Port Royal developed probability theory and laid the groundwork for decision theory. Both developments eventually led to a much better understanding of inductive inferences, and it is difficult to see how their impact on philosophy and science could be overestimated.

The strong bond between developments in science and philosophy (as far as they can be separated) can also be observed in the later course of this history. Think, for example, of the work by Carnap, Hintikka, Ramsey and de Finetti and the contemporary endeavours in learning theory and Bayesian inference. The close interaction between science and philosophy is obvious here, which makes the field of inductive logic rather special.
While there are many examples where a science split from philosophy and became autonomous (such as physics with Newton and biology with Darwin), and while there are, perhaps, topics that are of exclusively philosophical interest, inductive logic, as this handbook attests, is a research field where philosophers and scientists fruitfully and constructively interact.

A final development should be noted: While much of deductive logic has been developed in an anti-psychologistic spirit (an exception is van Lambalgen and Stenning’s Human Reasoning and Cognitive Science, MIT Press 2008), inductive logic profits considerably from empirical studies. And so it is no wonder that contemporary cognitive psychologists pay much attention to inductive reasoning and set out to study it empirically. In the course of this work philosophical accounts (such as Bayesianism) can be critically evaluated, and alternatives might be inspired.

Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
It is to be hoped that philosophers and psychologists will interact on these issues more closely in the future, and that the new trend in experimental philosophy will prove beneficial.

It was our intention to include a chapter on the Port Royal contributions to probability theory and decision theory. For reasons of space, we decided to avoid duplication with Russell Wahl’s excellent chapter, “Port Royal: The Stirrings of Modernity”, which appears in volume two of the Handbook, Mediaeval and Renaissance Logic.

The Editors are deeply and most gratefully in the debt of the volume’s superb authors. For support and encouragement thanks are also due to Nancy Gallini, Dean of Arts, and Margaret Schabas, Head of Philosophy (and her successor Paul Bartha), at UBC, and to Christopher Nicol, Dean of Arts and Science, and Michael Stingl, Chair of Philosophy, University of Lethbridge. Special thanks to Jane Spurr, Publications Administrator in London; Carol Woods, Production Associate in Vancouver; and our colleagues at Elsevier, Senior Acquisitions Editor Lauren Schultz and Assistant Editor Gavin Becker.

Dov M. Gabbay
King’s College London

Stephan Hartmann
Tilburg University

John Woods
University of British Columbia and King’s College London and University of Lethbridge
CONTRIBUTORS
Nick Chater, University College London, UK. [email protected]
Frederick Eberhardt, Washington University, USA. [email protected]
Malcolm Forster, University of Wisconsin-Madison, USA. [email protected]
Maria Carla Galavotti, University of Bologna, Italy. [email protected]
Clark Glymour, Carnegie Mellon University, USA. [email protected]
Ulrike Hahn, Cardiff University, UK. [email protected]
Evan Heit, University of California, Merced, USA. [email protected]
James Joyce, University of Michigan, USA. [email protected]
Marc Lange, University of North Carolina at Chapel Hill, USA. [email protected]
Hannes Leitgeb, University of Bristol, UK. [email protected]
Ulrike von Luxburg, University of Tuebingen, Germany. [email protected]
John Milton, King’s College London, UK. [email protected]
Alan Musgrave, University of Otago, New Zealand. [email protected]
Ilkka Niiniluoto, University of Helsinki, Finland. [email protected]
Mike Oaksford, Birkbeck College London, UK. [email protected]
Daniel Osherson, Princeton University, USA. [email protected]
Ronald Ortner, Montanuniversität Leoben, Austria. [email protected]
Stathis Psillos, University of Athens, Greece. [email protected]
Jan-Willem Romeijn, University of Groningen, The Netherlands. [email protected]
Bernhard Schoelkopf, University of Tuebingen, Germany. [email protected]
Robert Schwartz, University of Wisconsin - Milwaukee, USA. [email protected]
Jan Sprenger, Tilburg University, The Netherlands. [email protected]
Scott Weinstein, University of Pennsylvania, USA. [email protected]
Jonathan Weisberg, University of Toronto, Canada. [email protected]
Sandy Zabell, Northwestern University, USA. [email protected]
INDUCTION BEFORE HUME
J. R. Milton

The word ‘Induction’ and its cognates in other languages, of which for present purposes the most important is Latin ‘inductio’, have a complex semantic history, as does the Greek ἐπαγωγή from which they were derived. Though some of these uses (electromagnetic induction, or the induction of a clergyman into a new benefice) are manifestly irrelevant, others that still diverge significantly from any of the uses current among present-day philosophers and logicians are not. As will soon become apparent, any attempt to write a history that focused solely on the direct ancestors of modern usage would be arduous if not impossible to execute, and deeply unsatisfactory if it could be brought to a conclusion. The net must, at least initially, be cast more widely.

Another potential problem is that there may have been philosophers who discussed problems of inductive inference without using the word ‘induction’ (or its equivalents) at all. The most conspicuous suspect here is David Hume, who has been widely seen (in the twentieth century at least¹) as an inductive sceptic, even though it is notorious that he rarely used the word, and never in the passages where his inductive scepticism has been located. Whether or not this interpretation of Hume is correct lies outside the scope of this chapter, but it is at least entirely clear that the issue cannot be decided simply from an analysis of Hume’s vocabulary.

¹ [Stove, 1973; Winkler, 1999; Howson, 2000; Okasha, 2001].

In the Hellenistic era discussions of non-deductive inference were centred on what became known as inference from signs (semeiosis). This was concerned with arguments from the apparent to the non-apparent: either to the temporarily and provisionally non-apparent (for example something at a distance), or to the permanently and intrinsically non-apparent (for example invisible bodies such as atoms). How useful it is for modern historians to employ the terminology of induction when dealing with this material is disputed: some do so quite freely, e.g. [Asmis, 1984], while others reject it altogether [Barnes, 1988]. In the present study no attempt will be made to discuss this material in any detail; for some modern accounts see [Burnyeat, 1982; Sedley, 1982; Allen, 2001].

1 THE ANCIENT WORLD

Human beings have been making generalisations since time immemorial, and certainly long before any logicians arrived on the scene to analyse what they were doing. Techniques could sometimes go well beyond induction by simple enumeration, as the following remarkable passage from the Old Testament shows:
And Gideon said unto God, If thou wilt save Israel by mine hand, as thou hast said, Behold I will put a fleece of wool in the floor; and if the dew be on the fleece only, and it be dry upon all the earth beside, then shall I know that thou wilt save Israel by mine hand, as thou hast said. And it was so: for he rose up early on the morrow, and thrust the fleece together, and wringed the dew out of the fleece, a bowl full of water. And Gideon said unto God, Let not thine anger be hot against me, and I will speak but this once: let me prove, I pray thee, but this once with the fleece; let it now be dry only upon the fleece, and upon all the ground let there be dew. And God did so that night: for it was dry upon the fleece only, and there was dew on all the ground. (Judges, vi. 36–40).

Neither the writer of this passage nor his readers had ever read Mill, or heard of the Method of Agreement or Method of Difference, but few could have found Gideon’s procedures difficult to comprehend. As Locke was to comment sardonically, ‘God has not been so sparing to Men to make them barely two-legged Creatures, and left it to Aristotle to make them Rational’ (Essay, IV. xvii. 4; [Locke, 1975, p. 671]).

It was, nevertheless, Aristotle who was the first philosopher to give inductive reasoning a name and to provide an account, albeit a brief and imperfect one, of what it was and how it worked. The name chosen was ἐπαγωγή (epagoge), derived from the verb ἐπάγειν, variously translated, according to context, as to bring or lead in, or on. Like ‘induction’ in modern English, epagoge had (and continued to have) a variety of other, irrelevant meanings: Plato had used it for an incantation (Republic 364c), and Aristotle himself employed it for the ingestion of food (De Respiratione 483a9).
1.1 Socrates and Plato

Although none of Aristotle’s predecessors had anticipated him in using the term epagoge for inductive arguments, he had himself picked out Socrates for his use of what Aristotle called ἐπακτικοὺς λόγους (Metaphysics 1078b28). Though Aristotle would have had testimony about Socrates’ activities that has since been lost, there can be little doubt that his main source of information was Plato. In the early dialogues, Socrates was often portrayed as using modes of argument that Aristotle would certainly have classed as epagoge, for example in Protagoras 332C, where Socrates is reporting his interrogation of Protagoras:

Once more, I said, is there anything beautiful?
Yes.
To which the only opposite is the ugly?
There is no other.
And is there anything good?
There is.
To which the only opposite is the evil?
There is no other.
And there is the acute in sound?
True.
To which the only opposite is the grave?
There is no other, he said, but that.
Then every opposite has one opposite only and no more?
He [Protagoras] assented. [Plato, 1953, vol. I, p. 158]

Here and elsewhere (e.g. Charmides 159–160; Ion 540) the conclusion is a philosophical one that could have been grasped directly by someone intelligent and clear-sighted enough. Plato was concerned with truths such as these, not with empirical generalisations involving white swans or other sensory particulars [Robinson, 1953, pp. 33–48; McPherran, 2007].
1.2 Aristotle
Aristotle’s theory of induction (or, to put it more neutrally, of epagoge, since there is disagreement even about the most appropriate translation of that term) has long been a matter of controversy. It is widely regarded as incomplete and in various respects imperfect: one modern commentator has referred to ‘the common belief [that] Aristotle’s concept of induction is incomplete, ill-conceived, unsystematic and generally unsatisfactory’, at least in comparison with his theory of deduction [Upton, 1981, p. 172]. Though not everyone might agree with this, it is clear that there is no consensus either about what exactly Aristotle was trying to do, or about how successful he was.²

² A selection of diverse views can be found in [Kosman, 1973; Hamlyn, 1976; Engberg-Pedersen, 1979; Upton, 1981; Caujolle-Zaslavsky, 1990; McKirahan, 1992, pp. 250–7; De Rijk, 2002, pp. 140–8].

When Aristotle used the word epagoge to characterise his own arguments, his employment of the term was thoroughly Socratic, or at least Platonic; the arguments were seldom empirical generalisations, or anything like them. The following passage from Metaphysics I is in this respect entirely typical:

That contrariety is the greatest difference is made clear by induction [ἐκ τῆς ἐπαγωγῆς]. For things which differ in genus have no way to one another, but are too far distant and are not comparable; and for things that differ in species the extremes from which generation takes place are the contraries, and the distance between extremes, and therefore that between the contraries, is greatest. (1055a5–10).

Similar remarks can be found elsewhere in the same book, e.g. in 1055b17 and 1058b9.

Aristotle discussed epagoge in three passages, none of them very long. The earliest is in Topics A12, where dialectical arguments are divided into two kinds, syllogismos and epagoge. The meaning of the former term is certainly broader than ‘syllogism’ as now generally understood, and as the word is used in Aristotle’s later writings; it can probably best be translated as ‘deduction’. Epagoge is characterised quite briefly:

Induction is the progress from particulars to universals; for example, ‘If the skilled pilot is the best pilot, and the skilled charioteer the best charioteer, then in general the skilled man is the best man in any particular sphere.’ Induction is more convincing and more clear and more easily grasped by sense perception and is shared by the majority of people, but reasoning [syllogismos] is more cogent and more efficacious against argumentative opponents (105a12–19).

The first part of this subsequently became the standard definition of induction in the Middle Ages and Renaissance. It is natural for a modern reader to interpret it as meaning that induction is the mode of inference that proceeds from particular to universal propositions, but the Greek does not quite say this. Induction is merely the passage (ἔφοδος) from individuals to universals, τὰ καθόλου, and in other places (notably Posterior Analytics B19) these universals would seem to be, or at least to include, universal concepts. It should also not be automatically assumed that ‘ἔφοδος’ means inference in any technical sense [De Rijk, 2002, pp. 141–4].

Aristotle’s longest account of epagoge is in Prior Analytics B23:

Now induction, or rather the syllogism which springs out of induction [ὁ ἐξ ἐπαγωγῆς συλλογισμὸς], consists in establishing syllogistically a relation between one extreme and the middle by means of the other extreme, e.g. if B is the middle term between A and C, it consists in proving through C that A belongs to B. For this is the manner in which we make inductions. For example let A stand for long-lived, B for bileless, and C for the particular long-lived animals, e.g. man, horse, mule.
A then belongs to the whole of C: for whatever is bileless is long-lived.³ But B also (‘not possessing bile’) belongs to all C. If then C is convertible with B, and the middle term is not wider in extension, it is necessary that A should belong to B. For it has already been proved that if two things belong to the same thing, and the extreme is convertible with one of them, then the other predicate will belong to the predicate that is converted. But we must apprehend C as made up of all the particulars. For induction proceeds through an enumeration of all the cases. (68b15–29).

³ The phrase ‘for whatever is bileless is long-lived’, given in italics in the printed text, makes no sense here; it may be an interpolation and if so should be excised [Aristotle, 1973, p. 514], even though there is no manuscript support for doing this [Ross, 1949, p. 486].

This is not an easy passage to understand, and has been the subject of much discussion. Aristotle appears to be applying his method of conversion, devised as part of his account of syllogisms, to a case where it is not obviously applicable: hence the mention of middle terms. The crucial step in the argument is that B belongs to all C, i.e. that every long-lived animal is bileless. This could mean that every individual long-lived animal is bileless, or it could mean that every species of such animals is bileless. The latter seems to be indicated by the examples given (man, horse, mule, rather than, say, Socrates, Bucephalus, etc.). If so, then Aristotle appears to have been giving an example of what has subsequently come to be termed perfect (i.e. complete) induction: an inference from a finite sample that is sufficiently small for all the particular cases to be examined. This might seem to be what is indicated by the final remark, that ‘induction proceeds through an enumeration of all the cases’, but here (as often) the Oxford translation supplies words not present in the Greek, which merely says ‘for induction [is] through all’, ἡ γὰρ ἐπαγωγὴ διὰ πάντων.

It is perhaps significant here that the proposition being proved, that all bileless animals are long-lived, is a generalisation about the natural world, and therefore very unlike the propositions argued for by Socrates in the early Platonic dialogues. It is manifestly not something that could in principle be grasped immediately by intuition. The same is true of another proposition described as having been derived by induction: in Posterior Analytics A13 (78a30–b4) Aristotle gave a celebrated example of a scientific demonstration:
(1) The planets do not twinkle.
(2) Whatever does not twinkle is near.
Therefore (3) The planets are near.
This counts as a demonstration, as distinct from a merely valid syllogism, because it states the cause: it is because the planets are near (i.e. nearer than the fixed stars) that they do not twinkle. Premise (2) is described as having been reached ‘by induction or through sense-perception’ (78a34–5), though the same must in fact be true also of premise (1). For (1) the argument is straightforward and unproblematic — Mercury does not twinkle, Venus does not twinkle, etc. — but for (2) it is not. There is clearly no difficulty in assembling a long list of particular non-twinkling objects that are also nearby, but how could the general proposition that all such objects are nearby be established? If it is supposed to be the conclusion of an inductive argument, then the enumeration is manifestly incomplete, and the inference correspondingly fallible. The demonstrations analysed in the Posterior Analytics are syllogistic arguments (here ‘syllogism’ is being used in the strict sense) which proceed from premises that are ‘true, primary, immediate, better known than, prior to, and causative of their conclusion’ (71b20–2). All these premises are universal in form, and this raises an obvious question: if the primary premises from which demonstrations proceed cannot themselves be demonstrated, how are they to be known? It was an issue that Aristotle deferred until the final chapter of the second book. The problem is stated quite clearly at the beginning of the chapter, but the discussion that follows at first sight seems rather puzzling: rather than discussing
inductive arguments, Aristotle appears to be trying to account for the acquisition of universal concepts — from the perception of several individual men to the species man, and then to the genus animal (100a3–b3). He then commented (this is the only place in which the word epagoge occurs in the whole chapter): ‘Thus it is clear that it is necessary for us to come to know the first principles by induction, because this is also the way in which universals are put into us by sense perception’ (100b3–5). The whole passage is undeniably difficult, and has been diversely interpreted, as the two main English commentaries on the Posterior Analytics show. Sir David Ross took it that Aristotle was concerned with both concept formation and induction, and treated them together because ‘the formation of general concepts and the grasping of universal propositions are inseparably interwoven’ [Ross, 1949, p. 675]. Jonathan Barnes, on the other hand, held that ‘Here “induction” is used in a weak sense, to refer to any cognitive progress from the less to the more general . . . Thus construed, 100b3–5 says no more than that concept acquisition proceeds from the less to the more general.’ [Barnes, 1975, p. 256]. On Barnes’s reading, the passage is not concerned with the inference from singular to universal propositions at all. This is not a dispute that can easily be resolved: the relevant texts are quite short, and all the participants in the debate are thoroughly familiar with them. My own inclination is to side with Ross. Aristotle’s position here is very different from that found in a later empiricist like Locke. Locke had an account of how humans — unlike the other animals that he called ‘brutes’ — had a capacity to frame abstract general ideas from the ideas of particular things given in perception [Locke, 1975, pp. 159–60], but this process had nothing to do with an inductive ascent from particular to universal propositions, about which Locke said virtually nothing. 
For Aristotle what comes to rest in the soul (more specifically, in the intellect) is not a mere Lockean abstract general idea, a particular entity that has the capacity to function as a universal sign, but rather a real universal thing, a form freed from matter and thereby de-individuated. This is why the same psychological process can be used to explain both the acquisition of universal concepts and the knowledge of first principles. In the Posterior Analytics the account of this is little more than a sketch, but it was subsequently fully worked out by Aristotle’s followers in late antiquity and in the Middle Ages. There is no hint whatever in Aristotle that epagoge is merely one of several ways by which we can gain knowledge of first principles. The view found in many modern empiricists that while some universal truths are known — or at least receive some degree of evidential support — a posteriori, by induction, others (for example Euclid’s axiom that all right angles are equal) are known a priori, is entirely foreign to his way of thinking. For Aristotle it is impossible to view (θεωρῆσαι) universals except through induction (Posterior Analytics 81b2). In all the passages mentioned so far, epagoge is treated as a process leading to universals, whether concepts, or propositions, or both. This is explicit in the definition in the Topics, but it can also be seen in the Prior and the Posterior Analytics. Often, however, and especially in the practical affairs of life, we are
concerned with reasoning from particulars to other particulars: whether the sun will rise tomorrow, whether this loaf of bread will nourish me, and so on. Aristotle was, of course, well aware that we do this, and classified such inferences as ‘examples’ (paradeigmata). What is less clear is whether paradeigma is a type of induction, or whether it is a different kind of argument, resembling induction in various ways, but not a sub-variety of it.

In Prior Analytics B24, the chapter immediately after the chapter on induction, there is an account of paradeigmata. To give one specimen of such an argument, Athens against Thebes and Thebes against Phocis are both cases of wars against neighbours; the war against Phocis was bad for Thebes, so a war against Thebes would be bad for Athens (68b41–69a13). The inference might appear to proceed via a more general principle that war against neighbours is always bad (69a4, 6), which would make it an application of induction: a two-part argument involving an inductive ascent to a generalisation followed by a deductive descent to a particular case. Aristotle, however, insisted that the two kinds of inference were distinct: example is not reasoning from part to whole or from whole to part, but from part to part (69a14–15). Induction proceeds by an examination of all the individual cases (ἐξ ἁπάντων τῶν ἀτόμων), while example does not (69a16–19).

In Aristotle’s Rhetoric, however, induction and example seem much closer, if not identical:

just as in dialectic there is induction on the one hand and syllogism or apparent syllogism on the other, so it is in rhetoric. The example is an induction, the enthymeme⁴ is a syllogism, and the apparent enthymeme is an apparent syllogism. I call the enthymeme a rhetorical syllogism and the example a rhetorical induction. Every one who effects persuasion through proof does in fact use either enthymemes or examples: there is no other way. And since every one who proves anything at all is bound to use either syllogisms or inductions (and this is clear to us from the Analytics), it must follow that enthymemes are syllogisms and examples are inductions (1356b1–10).

⁴ Aristotle’s account of enthymeme is complex and has often been misunderstood, but lies outside the scope of this chapter; for a penetrating modern analysis, see [Burnyeat, 1994].

The exhaustive division of all arguments into either syllogismos or epagoge is not peculiar to the Rhetoric: it can be found in both parts of the Analytics (68b13–14, 71a5–6), as can the identification of enthymeme and example as their rhetorical counterparts (71a9–11). One very plausible way of interpreting this is that enthymeme and example are not sub-varieties of syllogismos and epagoge, still less entirely different types of argument, but rather instances of syllogismos and epagoge ‘when these occur in a rhetorical speech rather than in a dialectical argument’ [Burnyeat, 1994, p. 16]. If this is done, however, the notion of epagoge must be broadened to include most if not all non-deductive argument, since one thing that is absolutely certain about paradeigma is that it concerns arguments from particulars to particulars.
None of Aristotle’s surviving works contains a detailed and systematic account of induction, and there is no evidence that one was ever produced. Why this should have been the case is not obvious, given the potential importance of such reasoning in his theory of knowledge, but one explanation may be that the separation of form and content, which had been central to his analysis of the syllogism, was (and still remains) more difficult to achieve in the case of induction. At all events, Aristotle did not bequeath to his successors an account of induction that was in any way comparable to his treatment of the syllogism.
1.3 Hellenistic and later Greek accounts

In the three centuries that followed Aristotle’s death, his technical writings were not much studied outside the (declining) Peripatetic school, and the terms that he had devised were replaced by others. The problems involved in inference from particular to universal propositions were raised occasionally, but they seem not to have become the central issue of discussion, unlike the problems of inference from signs.

Alcinous

The lack of any serious interest in induction among the Platonists is indicated by the extremely brief treatment in one of the few philosophical textbooks to survive, the Handbook of Platonism (Didaskalikos) attributed to a certain Alcinous, often identified with the Middle Platonist Albinus (2nd century AD):

Induction is any logical procedure which passes from like to like, or from the particular to the general. Induction is particularly useful for activating the natural concepts (Didaskalikos, 6.7; [Dillon, 1993, p. 10]).

The last remark may allude to the well-known passage in the Meno where the slave boy is being led to reveal his innate knowledge of geometry [Dillon, 1993, p. 77]. One finds here a characteristic blend of Platonism and Aristotelianism: the role of induction is to provide particular examples that can bring to full consciousness the concepts implanted in us by nature.

Diogenes Laertius

Two other Greek writers from the Roman period had rather more to say about induction: the biographer Diogenes Laertius (early 3rd century?), and the Pyrrhonian sceptic, Sextus Empiricus (late 2nd or early 3rd century?). Neither was an original thinker, and indeed Diogenes was barely a thinker at all, but rather a scissors-and-paste compiler whose labours would have been ignored by posterity had they not resulted in the only extensive compendium of philosophical biographies to have survived from antiquity.
Diogenes’ remarks on induction are in his life of Plato (III. 53–55). Epagoge is defined as an argument in which we infer from some true premises a conclusion resembling them. There are two varieties: from opposites (κατ᾿ ἐναντίωσιν), and from implication (ἐκ τῆς ἀκολουθίας). The former is a mode of argument that bears little resemblance to any modern notion of induction: If man is not an animal he will be either a stick or a stone. But he is not a stick or a stone, for he is animate and self-moved. Therefore he is an animal. But if he is an animal, and if a dog or an ox is also an animal, then man by being an animal will be a dog and an ox as well. The first part of this is clear enough — it seems that either Diogenes or his source was using an ancient version of the question ‘Animal, Vegetable or Mineral?’ — but the last part is considerably more obscure. The second kind of induction is much more familiar. There are two sub-varieties: one, described as belonging to rhetoric, in which the argument is from particulars to other particulars, and the other, belonging to dialectic, in which it is from particulars to universals. The former is clearly the Aristotelian paradeigma, though that term was not used. An instance of the latter is the argument that the soul is immortal: And this is proved in the dialogue on the soul [presumably the Phaedo] by means of a certain general proposition, that opposites proceed from opposites. And the general proposition is established by means of some propositions which are particular, as that sleep comes from waking and vice-versa, and the greater from the less and vice-versa. These are not examples of empirical generalisations. Sextus Empiricus Among the immense range of sceptical arguments preserved and deployed by Sextus Empiricus, inductive scepticism is inconspicuous, though not wholly absent. In the Outlines of Pyrrhonism II. 
204 inductive arguments were dismissed in a very cursory, almost contemptuous, manner: It is also easy, I consider, to set aside the method of induction [τὸν περὶ ἐπαγωγῆς τρόπον]. For, when they propose to establish the universal from the particulars by means of induction, they will effect this by a review either of all or of some of the particular instances. But if they review some, the induction will be insecure, since some of the particulars omitted in the induction may contravene the universal; while if they are to review all, they will be toiling at the impossible, since the particulars are infinite and indefinite. Thus on both grounds, as I think, the consequence is that induction is invalidated.5 [Sextus, 1967, p. 283].
5 Literally, ‘shaken’, or ‘made to totter’.
J. R. Milton
Another passage a few pages earlier (II. 195) supplies a little more detail: Well then, the premiss ‘Every man is an animal’ is established by induction from particular instances; for from the fact that Socrates, who is a man, is also an animal, and Plato likewise, and Dion and each one of the particular instances, they think it is possible to assert that every man is an animal. . . [Sextus, 1967, p. 277]. Sextus was not persuaded: if even a single counter-example can be found, the universal conclusion is not sound (ὑγιής, i.e. healthy), ‘thus, for example, when most animals move the lower jaw, and only the crocodile the upper, the premiss “Every animal moves the lower jaw” is not true.’ [Sextus, 1967, p. 277]. At first sight this differs from the familiar modern textbook example of ‘All swans are white’ being falsified by the observation of a single individual black swan, but in fact the differences are small. In the case of the swans, what makes the falsification effective is that it was a species of black swans that was discovered. Logically speaking, a single negative instance can falsify a universal proposition; in practice it usually would not, as a variety of what Imre Lakatos called ‘monster-barring’ stratagems would come into play. It is very unlikely that the generalisation about how animals move their jaws, with the crocodile as an exception, was original to Sextus: the same example can be found in Apuleius’ Peri Hermeneias [Apuleius, 1987, p. 95]. It had probably long been a stock example, repeated from author to author. Alexander of Aphrodisias The view that conclusions drawn from inductive arguments are not conclusively established was not peculiar to the sceptics — indeed it can be found among the Aristotelians themselves, notably the late second-century commentator Alexander of Aphrodisias. On the passage in Topics 105a10ff quoted above, Alexander observed: So induction has the quality of persuasiveness; but it does not have that of necessity. 
For the universal does not follow by necessity from the particulars once these have been conceded, because we cannot get something through induction by going over all the particular cases, since the particular cases are impossible to go through [Alexander, 2001, p. 93]. As this and other remarks to be quoted in what follows show quite clearly, it is utterly mistaken to suppose that Hume was the first person to notice that inductive arguments are not deductively valid, and that any universal generalisation which covers a field that is either infinite or too large to survey completely is vulnerable to counter-examples. To suppose this would be unfair both to Hume, who was certainly doing something more radical and much less banal, and to his predecessors, who had taken the fallibility of such inferences for granted.
1.4 Roman philosophy
Cicero and the rhetorical tradition The Romans, unlike their medieval successors, had little interest in logic as a technical discipline,6 but rhetoric was a central — perhaps the central — element of their educational curriculum. When philosophy began to be written in Latin, a new technical vocabulary needed to be devised. Who introduced the term ‘inductio’ for epagoge is not now known, but in the surviving corpus of Latin literature the word first appears with this sense in a youthful work by Cicero, De Inventione. Here it is described as a form of argument in which the speaker first gets his opponent to agree on some undisputed propositions, and then leads him to assent to others resembling them. In the example Cicero gave, Pericles’ sharp-witted mistress Aspasia is interrogating the wife of a certain Xenophon (not the historian): ‘Please tell me, if your neighbour had a better gold ornament than you have, would you prefer that one, or your own?’ ‘That one’, she said. ‘And if she had clothes or other finery more expensive than you have, would you prefer yours or hers?’ ‘Hers, of course’, she replied. ‘Well then, if she had a better husband than you have, would you prefer yours or hers?’ At this, the woman blushed. (I. 55). Clearly this is not a specimen of inductive generalisation, but rather of what Aristotle called paradeigma. Cicero had little interest in the kinds of generalisation that might be made by a natural philosopher: his concern, here as elsewhere, was with the strategies that can be used in public speaking or in a court of law. In a later rhetorical treatise, the Topics, induction is mentioned very briefly as merely one variety of a more extensive class of arguments from similarity. The example Cicero gave — that if honesty is required of a guardian, a partner, a bailee and a trustee, it is required of an agent (Topics, 42) — is described as an epagoge (the Greek term was used), but it is clearly a case of what Aristotle had called paradeigma. 
In the rhetorical tradition, it was the analysis and employment of arguments of this type that attracted most interest. Cicero’s account of induction was followed by the writers of rhetorical treatises and textbooks, notably Quintilian’s Institutio Oratoria, V. x. 73, xi. 2 [Quintilian, 1921, vol. II, pp. 241, 273], though the treatment is fairly cursory: induction was merely one rather unimportant variety of reasoning, less deserving of extended analysis than either arguments from signs or examples. This subsumption of induction into the theory of rhetoric had the unwelcome result (for analytically minded historians of philosophy) that what they have thought of as the Problem of Induction — the enquiry into how (if at all) universal propositions can be proved, or
6 Though the aversion was by no means universal: see [Barnes, 1997, ch. 1].
at least made probable,7 from evidence of particular cases — was never properly raised, let alone answered. Boethius It was only in the final twilight of the ancient world, after the fall of the Empire in the west, that Aristotle’s writings started to be translated into Latin. Boethius had planned to translate and comment on the entire corpus, but by the time of his premature death only a small part of this exceedingly ambitious project had been completed. The only translations that have survived were of the Categories and De Interpretatione, but Boethius’ own logical writings gave his early medieval successors some information about the content of Aristotle’s other works on logic. Induction was dealt with fairly briefly in De Topicis Differentiis [Stump, 1978, pp. 44–46], being described in Aristotelian rather than Ciceronian terms as a progression from particulars to universals. This is taken directly from Aristotle’s account in the Topics, as was the example given to illustrate it: just as a pilot should be chosen on the basis of possessing the appropriate skill rather than by lot, and similarly with a charioteer, so generally if one wants something governed properly one should choose someone on the basis of their skill. The main historical importance of Boethius’ account is not that it added anything to earlier analyses — it did not — but that it provided his early medieval readers with information then unavailable from any other source. Summary It is striking that no sustained discussion of inductive reasoning has survived from the ancient world. Of course the vast majority of Greek and Roman philosophical works have perished, and are accessible only from fragments quoted by other writers, or often not at all. If more had been preserved, then the patchy and episodic account given above could unquestionably have been made considerably longer and more detailed. 
There is nevertheless no sign that a major and systematic account of inductive reasoning has been lost: among the many lists of works given by Diogenes Laertius there is no sign of any treatise with the title Peri Epagoges or something similar. It would appear, therefore, that induction was not something that any of the ancients regarded as one of the central problems of philosophy. Several reasons for this state of affairs can be discerned. One is that the general drift of philosophy, especially in late antiquity, was also away from the kind of systematic empirical enquiry practised by Aristotle and his immediate successors. Plotinus, for example, used the word epagoge only twice, once for an argument to show that there is nothing contrary to substance, and once for an argument that whatever is destroyed is composite (Enneads, I. 8. 6; II. 4. 6). The kind of understanding gained through empirical generalisation was too meagre and unimportant for the modes of argument leading to it to merit sustained analysis. Another reason is that interest in the systematic investigation of the natural world was intermittent and localised. It did not help that the scientific discipline in which the greatest advances were made had been mathematical astronomy, and this was not a field where the problems posed by inductive reasoning would have surfaced, still less become pressing. Constructing a model for the motions of a planet was a highly complex business, but it did not involve generalisation from data in the form ‘this A is B’ and ‘this A is B’ to ‘every A is B’. Ptolemy, indeed, seems to have felt so little urge to generalise that his models for the individual planets are all given separately, and (in the Almagest at least) not integrated into a single coherent system. Finally, the centrality of rhetoric in ancient education meant that when inductive arguments were discussed, they tended to be evaluated for their persuasiveness, not for their logical merits. Inductive arguments became almost lost in a mass of miscellaneous un-formalised arguments that were not investigated for their validity, or any inductive analogue thereof, but for their plausibility in the context of a speech.
7 On the meaning of probabilis and related terms in Cicero and other ancient authors, see [Glucker, 1995]. On subsequent history, see [Hacking, 1975; Cohen, 1980; Franklin, 2001].
2 THE MIDDLE AGES
2.1 Arabic accounts Two civilisations inherited the legacy of ancient philosophy. Starting in the late eighth century, a large part of the philosophical literature that had been fashionable in late antiquity was translated into Arabic, including most of the corpus of Aristotle’s writings and many of the works of his commentators. The accounts of induction in the Prior Analytics, the Posterior Analytics and the Topics became the starting point of subsequent treatments. Very little of the Greek technical terminology was directly transliterated, a notable exception being the word for philosophy itself (falsafah). Epagoge was translated as istiqrâ, a word whose root meaning was investigation or examination.8 Al-Fârâbî The first Arabic writer to give a systematic account of induction was al-Fârâbî (c.870–c.950) [Lameer, 1994, pp. 143–154, 169–175]. His conception of induction differed in one important respect from Aristotle’s. According to Joep Lameer: For Aristotle, induction is the advance from a number of related particular cases to the corresponding universal. In opposition to this, al-Fârâbî explains induction in terms of an examination of the particulars. This view must be taken to be a natural consequence of the fact that in the Arabic Prior Analytics, epagôgê was rendered as istiqrâ (‘collection’ in the sense of a scrutiny of the particulars). [Lameer, 1994, p. 173, cf. p. 144]. This conception of induction as proceeding by a one-by-one examination of the particulars had the consequence that inductions have full probative force only when they are complete [Lameer, 1994, pp. 144, 147]. 
Al-Fârâbî also made a distinction between induction and what he called methodic experience (tajriba, equivalent to Greek empeiria): methodic experience means that we examine the particular instances of universal premises to determine whether a given universal is predicable of each one of the particular instances, and we follow this up with all or most of them until we obtain necessary certainty, in which case that predication applies to the whole of that species. Methodic experience resembles induction, except the difference between methodic experience and induction is that induction does not produce necessary certainty by means of universal predication, whereas methodic experience does. [McGinnis and Reisman, 2007, p. 67].
8 For insight into the meaning of Arabic terminology I am grateful to my colleague Peter Adamson.
Induction is inferior to methodic experience because it does not uncover necessary truths or lead to certain knowledge. Avicenna The same distinction between induction and methodic experience appears in Avicenna (Ibn Sina, 980–1037) [McGinnis, 2003], [McGinnis, 2008]. In his main philosophical work, The Cure (Book of Demonstration, I. 9. §§ 12, 21), induction was described as inferior to methodic experience, in that unless it proceeds from an examination of all the relevant cases, it leads only to probable belief [McGinnis and Reisman, 2007, pp. 149, 152]. Methodic experience is not like induction . . . methodic experience is like our judging that the scammony plant [Convolvulus scammonia] is a purgative for bile; for since this is repeated many times, it stops being a case of something that occurs by chance, and the mind then judges and grants that it is characteristic of scammony to purge bile. Purging bile is a concomitant accident of scammony. [McGinnis and Reisman, 2007, p. 149]. The Aristotelian background is apparent here: events due merely to chance do not recur regularly, and a regular succession is therefore a sign that something is occurring naturally: Now one might ask: ‘This is not something whose cause is known, so how are we certain that scammony cannot be sound of nature, and yet not purge bile?’ I say: Since it is verified that purging bile so happens to belong to scammony, and that becomes evident by way of much repetition, one knows that it is not by chance, for chance is not always or for the most part. Then one knows that this is something scammony necessarily brings about by nature, since there is no way it can be an act of choice on the part of scammony. [McGinnis and Reisman, 2007, p. 
149] To use the language of more recent philosophers, we know a priori that the physical world is full of natural law-like regularities, and we merely need enough experience to show that the apparent regularity we are considering is one of these, and not something due purely to chance. Even if this is granted, however, methodic experience does not produce certainty: there is always the risk of coming up with a generalisation that is too wide: We also do not preclude that in some country, some temperament and special property is connected with or absent from the scammony such that it does not purge. Nevertheless, the judgment based on methodic experience that we possess must be that the scammony commonplace among and perceived by us purges bile, whether owing to its essence
or a nature in it, unless opposed by some obstacle. [McGinnis and Reisman, 2007, p. 151] Another problem is effectively identical to the white-swan problem of modern textbooks: Were we to imagine that there were no people but Sudanese, and that only black people were repeatedly perceived, then would that not necessarily produce a conviction that all people are black? On the one hand, if it does not, then why does one repetition produce such a belief, and another repetition does not? On the other hand, if the one instance of methodic experience does produce the belief that there are only black people, it has in fact produced an error and falsehood. [McGinnis and Reisman, 2007, p. 150] It was a very pertinent question, and Avicenna’s response was rather opaque: you can easily resolve the puzzle concerning the Sudanese and their procreation of black children. In summary form, when procreation is taken to be procreation by black people, or people of one such country, then methodic experience will be valid. If procreation is taken to be that of any given people, then methodic experience will not end with the aforementioned particular instances; for that methodic experience concerned a black people, but people absolutely speaking are not limited to black people. [McGinnis and Reisman, 2007, p. 150] Though Avicenna’s writings had an immense influence on the philosophers in the universities of medieval Europe, this particular work was never translated into Latin. The purgative powers of scammony, however, became a stock example in scholastic discussions, probably through its use in Avicenna’s medical writings, which had an immense influence on medical education in the Latin west [Weinberg, 1965, pp. 134–135].
2.2 The Latin West Boethius’ categorisation of induction as a progression from particulars to universals was only one definition current during the Middle Ages. Another was a more rhetorical definition, derived ultimately from Cicero and transmitted by authors such as Victorinus and Alcuin, that made no mention of universality. Alcuin defined induction as an argument that from certain things proves uncertain ones, and compels the assent of the unwilling [Halm, 1863, p. 540].9 The reception and translation of the main body of Aristotle’s writings into Latin during the course of the twelfth and thirteenth centuries, initially from Arabic, 9 Inductio est oratio quae per certas res quaedam incerta probat et nolentem ducit in assensionem, Disputatio de Rhetorica et De Virtutibus, 30. According to Victorinus, Inductio est oratio, quae rebus non dubiis captat adsensiones eius, quicum instituta est, Explanationum in Rhetoricam M. Tullii Ciceronis Libri Duo, I. 31, [Halm, 1863, p. 240].
but subsequently directly from Greek, focused the attention of philosophers in the universities on the logical rather than the rhetorical tradition. There are two main locations for discussions of induction in the works of the schoolmen. One was in commentaries and questions on the Prior and Posterior Analytics, the other in general treatises on logic and logic textbooks, though few of these dealt with it at length.10 Robert Grosseteste One of the most elaborate and most interesting commentaries on the Posterior Analytics was one of the first, written by Robert Grosseteste before 1230 [Hackett, 2004, p. 161]. Grosseteste’s account of induction was based closely on the final chapter of the Posterior Analytics, though the specific example he used came from Avicenna: For when the senses several times observe two singular occurrences, of which one is the cause of the other or is related to it in some other way, and they do not see the connections between them, as, for example, when someone frequently notices that the eating of scammony happens to be accompanied by the discharge of red bile and does not see that it is the scammony that attracts and withdraws the red bile, then from constant observation of these two observable things it begins to form [estimare] a third, unobservable thing, namely that scammony is the cause that withdraws the red bile [Grosseteste, 1981, pp. 214–215; Crombie, 1953, pp. 73–74] This looks much more like the advancing of a causal hypothesis than a specimen of inductive generalisation. The next part of Grosseteste’s account followed Aristotle closely: repeated perceptions are stored in the memory, and this in turn leads to reasoning: Reason begins to wonder and consider whether things really are as the sensible recollection says, and these two lead the reason to the experiment [ad experientiam], namely, that scammony should be administered after all other causes purging red bile have been isolated and excluded. 
But when he has administered scammony many times with the sure exclusion of all other things that purge red bile, then there is formed in the reason this universal, namely that all scammony of its nature withdraws red bile; and this is the way in which it comes from sensation to a universal experimental principle. [Grosseteste, 1981, p. 215; Crombie, 1953, p. 74] If the conclusion had merely been that all scammony draws out red bile, then the argument would be a clear and unproblematic case of inductive generalisation. In 10 Little has been written specifically on medieval accounts of induction, but for two short general surveys, see [Weinberg, 1965; Bos, 1993].
fact the conclusion is stronger than this: that scammony of its nature [secundum se] draws out red bile. No doubt Grosseteste would have replied that the inference would only be safe if the power of drawing out bile really was part of the nature of scammony. The framework and most of the details of Grosseteste’s account are plainly Aristotelian, but there is one important difference. In the Posterior Analytics a plurality of memories constitute a single experience (empeiria), and this, unlike the memories from which it had arisen, is universal (100a5–6); there is no suggestion whatever that anything that we would now describe as an experiment needs to be undertaken. Grosseteste’s procedure was much more interventionist: scammony is to be administered in a variety of situations in which all the other substances that are known to purge bile have been excluded, and it is this systematic variation of the circumstances that provides the justification for the universal conclusion. William of Ockham The most detailed account of induction by any of the writers on logic was given by Ockham in his Summa Logicae, Part III, section iii, chapters 31–36. In the first of these, induction was defined in the manner of Aristotle and Boethius, as a progression from singulars to a universal [Ockham, 1974, p. 707]. In both the premises and the conclusion the predicate remains the same, and variation occurs merely in the subject: for example ‘This [man] runs, that [man] runs, and so on for other singulars [et sic de singulis], therefore every man runs’, or ‘Socrates runs, Plato runs, and so on for other singulars, therefore every man runs’ [Ockham, 1974, p. 708]. In all these examples Ockham was concerned with propositions ascribing a predicate to an individual (Socrates, this man, that white thing), and not a species. This is fully consonant with his thoroughgoing nominalism: only individuals exist, and universals are merely signs that represent them. 
In the chapters that follow Ockham gave a series of rules for sound and unsound inductive inferences. He began by considering non-modal propositions about present states of affairs (de praesenti et de inesse). There are three rules for these: 1. Every true universal proposition has some true singular. 2. If all the singulars of some universal proposition are true, then the universal is true. 3. If a negative universal proposition is false, then it follows that at least one of its singulars is false. [Ockham, 1974, pp. 708–709] The first of these points to a fundamental difference between medieval and modern post-Fregean logic, in which ‘Every A is B’ does not imply that ‘Some A is B’. The second rule might seem obvious, but as becomes apparent in the chapters that
follow, there are types of proposition for which Ockham thought that it did not apply. For some modal propositions — those in sensu divisionis 11 — the same rules apply: just as we can draw the conclusion that ‘Every man runs’ from ‘Socrates runs’, ‘Plato runs’, etc., so we can make the inference ‘Socrates is contingently an animal, Plato is contingently an animal, and so on for other singulars, therefore every man is contingently an animal’ [Ockham, 1974, p. 715]. In cases where the modality is in sensu compositionis, however, different rules apply: this rule is not generally true [vera] ‘all the singulars are necessary, therefore the universal is necessary’. Similarly . . . this rule is not general ‘the universal is necessary therefore the singulars are necessary’. [Ockham, 1974, p. 717] Another inference that is not valid (non valet) is ‘all the singulars are possible, therefore the universal is possible’: For it does not follow ‘this is possible: this contingent proposition is true; and this is possible: that contingent proposition is true, and so for the other singulars; therefore this is possible: every contingent proposition is true’. [Ockham, 1974, p. 718] It is clear that in these chapters Ockham was not concerned with the problems discussed in modern treatises on probability and induction. Although the subject matter was described as induction, the problems addressed are those of deductive logic, in particular the relations between universal propositions — or to be more accurate propositions involving universal quantification — and their associated singular propositions. When, for example, he wrote that ‘this rule is not valid, the singulars are contingent, therefore the universal is contingent’,12 it is quite clear that he meant all the singulars, and not merely some of them. The problems involved in generalisation from a finite sample were not even raised, let alone answered: here at least Ockham was not engaged in that kind of enquiry. 
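In modern notation, the failure of the inference in the composite sense can be sketched as follows. This is an anachronistic gloss and not Ockham's own formalism: the symbols $p_1, p_2, \dots$ (assumed here to stand for the singular propositions, each saying of one contingent proposition that it is true) and $\Diamond$ (possibility) are introduced purely for illustration.

```latex
% Requires the amssymb package for \Diamond.
% Premises: each singular is possible when taken on its own.
% Conclusion: the universal, read as the joint truth of all the
% singulars, is possible. The inference does not go through.
\[
  \Diamond p_1,\quad \Diamond p_2,\quad \dots
  \;\;\not\Rightarrow\;\;
  \Diamond\,(p_1 \wedge p_2 \wedge \dots)
\]
```

One way to see why it fails, on this reading, is that a contingent proposition and its negation are each possibly true, yet the two cannot be true together, so the possibility of each singular separately does not secure the possibility of all of them jointly.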
Jean Buridan Ockham never wrote a commentary on either the Prior or the Posterior Analytics, but one fourteenth-century nominalist who did was Jean Buridan (c.1300–1358). Buridan took it for granted that inductive arguments are invalid if only some of the singulars are considered: ‘an induction is not a good consequence [bona consequentia] unless all the singulars are enumerated in it. But we cannot enumerate all of them because they are infinitely many.’ [Biard, 2001, p. 92]. We do nevertheless draw general conclusions from finite samples: For when you have often seen rhubarb purge bile and have memories of this, and have never found a counterexample in the many different circumstances you have considered, then the intellect, not as a necessary consequence, but only from its natural inclination to the truth, assents to the universal principle and understands it as if it were an evident principle based on an induction such as ‘this rhubarb purged bile, and that [rhubarb]’, and so on for many others, which have been sensed and held in memory. Then the intellect supplies the little clause [clausulam] ‘and so on for the [other] singulars’, because it has never witnessed a counterexample . . . nor is there any reason or dissimilarity apparent why there should be a counterexample. [Biard, 2001, p. 93] Parts of this may remind a modern reader of Hume’s account of the operation of the mind, but there is one crucial difference: Buridan’s ‘inclinations’ are inclinations to the truth, not mere habits grounded on the association of ideas. In the background there is the unquestioned assumption — notoriously absent in Hume — that God has equipped us with faculties that, when not mis-used, will lead us to truth rather than error.
11 Modal propositions in sensu divisionis (or in sensu diviso) were those where the modal operator was applied to part of the proposition, not the whole; propositions in sensu compositionis (or in sensu composito) were those where the operator was applied to the whole proposition: see [Broadie, 1993, pp. 59–60]; on Ockham’s usage, see [Lagerlund, 2000, pp. 98–100].
12 ‘ista regula non valet, singulares sunt contingentes, igitur universalis est contingens’, ch. 36, [Ockham, 1974, p. 720].
3 THE RENAISSANCE
3.1 The revival of rhetoric At the risk of some simplification, it seems fair to say that the Renaissance saw a rise in the status of rhetoric, and a fall in the status of logic, or at least formal logic, though the process was far from uniform or complete. Aristotle continued generally to be treated with respect, even by those who did not think of themselves as Aristotelians, but the refinements of later medieval logic, with its intricate subtleties and (to the humanists) barbarous grammar and terminology, were quite another matter. Hostility towards formal logic can be traced back at least as far as Petrarch, but the first sustained attack was at the hands of Lorenzo Valla (1407–1457). The earliest textbook in the new style was the De Inventione Dialectica of Rudolph Agricola (1444–1485), first published in 1515 and reprinted sufficiently often thereafter for it to have been described as ‘the first humanist work in logic to become a best seller’ [Monfasani, 1990, p. 181]. Similar criticisms of scholastic logic were made by Juan Luis Vives (1492–1540), who had studied logic in the University of Paris as an undergraduate, and had not enjoyed the experience [Broadie, 1993, pp. 192–206]; his In Pseudodialecticos was first published in 1520 [Vives, 1979]. Valla’s opposition to traditional logic was deeper than that of his successors, in that he disliked not merely late medieval subtleties, but formal logic as such [Mack, 1993, pp. 83–4]. The rules of sound reasoning, like the rules of good writing,
were to be drawn ad consuetudinem eruditorum atque elegantium [Valla, 1982, p. 217], that is, from the actual Latin usage of the best writers of the best period. Logic therefore became merely one part — and a relatively unimportant one at that — of rhetoric. Valla explicitly indicated his dislike of, and dissent from, the Boethian description of induction as a progression from particulars to universals [Valla, 1982, p. 346]: for him it was the rhetorical argument from particulars to particulars that mattered. Agricola preferred the term ‘enumeratio’ to ‘inductio’, even though both had been used by Cicero: ‘to me it seems that induction should be more rightly called enumeration, since Cicero called it an argument from the enumeration of all the parts’ [Agricola, 1992, p. 316]. Some of the examples given are inductions of the traditional kind, but some certainly are not, for example: ‘the wall is mine, the foundation is mine, the roof is mine, the rest of the parts are mine. Therefore the house is mine.’ [Agricola, 1992, p. 316]. This kind of argument seems to have become a recognised type of induction in the rhetorical tradition: in the early eighteenth century Vico’s Institutiones Oratoriae drew a distinction between two kinds of induction, inductio partium and inductio similium. The former in turn had two sub-varieties, one involving an enumeration of all the species that made up a genus, the other an enumeration of all the parts that make up a totality, such as the limbs and organs of the human body [Vico, 1996, p. 90].
3.2 Zabarella
One of the most interesting sixteenth-century accounts of induction was by one of the professors at Padua, at that time the leading university in Italy, and arguably in Europe, and one where the study of logic continued to flourish [Grendler, 2002, pp. 250–253, 257–266]. Jacopo Zabarella (1533–1589) has been described by Charles Schmitt as ‘in the methodological matters . . . without a doubt the most acute and most influential of the Italian Renaissance Aristotelians’ [Schmitt, 1969, p. 82]. In chapter 4 of his short treatise De Regressu, he distinguished two kinds of induction: dialectical and demonstrative. Dialectical induction is used when the subject matter is mutable and contingent (in materia mutabili et contingente) and has no strength (nil roboris habet) unless all the particulars are considered without exception [Zabarella, 1608, col. 485d]. Demonstrative induction, by contrast, can be employed in necessary [subject] matter, and in things which have an essential connection among themselves, and for that reason in it [demonstrative induction] not all the particulars are considered, for our mind having inspected certain of these at once grasps the essential connection [statim essentialem connexum animadvertit], and leaving aside the remainder of the singulars, at once infers [colligit] the universal: for it knows it to be necessary that things are thus with the remainder [Zabarella, 1608, col. 485d–e].
A similar account can be found in the longer treatise De Methodis, III. 14 [Zabarella, 1608, col. 255f]. For Aristotelians like Zabarella, demonstrative induction was needed because it alone among the varieties of induction could lead to certain knowledge of the universal propositions that serve as the premises of demonstrative syllogisms. Such truths can become known to us not by a complete survey of all the particulars, which is impossible, but by enough of them being inspected for the appropriate universal to be formed in the soul. This is not merely a universal concept, but a real universal, a form abstracted from matter and thereby de-individuated. The situation may be represented by a diagram:
(a) Singular propositions
(b) Universal propositions
(c) Real individuals
(d) Real universals
Logically speaking, induction is an inference from (a) to (b) — this much was agreed by everyone working in the Aristotelian (as distinct from the rhetorical) tradition. For the medieval and post-medieval realists, including Zabarella, this inference from (a) to (b) was mirrored by the relation between the real individuals (c) and the real universals (d): what made a universal proposition true was what later philosophers might have called a universal fact. The existence of these facts explained why a universal proposition could be known to be true even though not all the relevant particulars had been surveyed — indeed sometimes when only a few (aliqua pauca) of them had been [Zabarella, 1608, col. 255f]. Once the intellect had grasped the universal, further investigation of the particulars was no longer required. In demonstrative induction this kind of grasp could be achieved, and certainty was therefore attainable. It is clear that this account of induction presupposed a realist account of universals, of a kind apparently held (though in a form that still remains a subject of dispute) by Aristotle, and certainly developed in a variety of different and incompatible forms by his successors in the Middle Ages and later [Milton, 1987]. It was not available to nominalists such as Ockham for whom the entities in class (d), the supposed real universals, were wholly non-existent. Despite the brilliance of several of its advocates, nominalism always remained a minority option among the university-based Aristotelians. In the seventeenth century it was to become much more popular.
4
THE SEVENTEENTH CENTURY AND EARLY EIGHTEENTH CENTURY
Many of the most original and creative philosophers of the seventeenth and early eighteenth century had little or nothing to say about induction. The word does not appear in either Spinoza’s Ethics or Locke’s Essay, and only once in passing in Berkeley’s Principles of Human Knowledge, § 50. That Spinoza had nothing to say is perhaps not very surprising,13 but the reason for Locke’s virtual silence — the term was used once in The Conduct of the Understanding 14 — is less immediately obvious. Part of the explanation may be that he had no confidence that natural philosophy would ever become a science, and that his own experience had mainly been as a physician, reasoning about particular cases and using general rules only as fallible guides to practice.
4.1
Bacon
Though Francis Bacon was the first thinker to invert the traditional priority and give induction precedence over deduction, it is potentially misleading to describe him as the founder of inductive logic. Bacon was not a logician either by temperament or doctrine, and it would be unhelpful to see him as a remote precursor of Carnap. His treatment of induction should be seen in the context of a massive but incomplete programme for the discovery of a new kind of scientific knowledge [Malherbe, 1996; Gaukroger, 2001, pp. 132–159]. Like Descartes a generation later, Bacon while still quite young became profoundly dissatisfied with all the many and various kinds of natural philosophy currently taught in the universities, but while Descartes was repelled by the uncertainty of this so-called knowledge, Bacon despised it for its uselessness — its utter failure to provide any grounding for practically effective techniques of controlling nature. Bacon’s disdain for traditional philosophy was made plain in 1605, in his first major publication, the Advancement of Learning. His low opinion of the logic taught by the schoolmen extended to their treatment of induction: Secondly, the Induction which the Logitians speake of, and which seemeth familiar with Plato, whereby the Principles of Sciences may be pretended to be invented, and so the middle propositions by derivation from the Principles; their fourme of Induction, I say is utterly vitious and incompetent. . . . For to conclude uppon an Enumeration of particulars, without instance contradictorie is no conclusion but a coniecture; for who can assure (in many subjects) uppon those particulars which appeare of a side, that there are not other on the contrarie side, which appeare not? [Bacon, 2000a, pp. 109–110]
13 The word does occur in ch. 11 of the Tractatus Theologico-Politicus [Spinoza, 2004, p. 158].
14 Our observations ‘may be establish’d into Rules fit to be rely’d on, when they are justify’d by a sufficient and wary Induction of Particulars’ [Locke, 1706, p. 49].
What Bacon proposed to use instead of this vicious and incompetent form of induction is not explained, though he did promise the reader that ‘if God give me leave’ he would one day publish an account of his new method, which he called the Interpretation of Nature [Bacon, 2000a, p. 111]. The promise was eventually honoured in 1620 with the publication of Bacon’s most substantial philosophical work, the Novum Organum, designed as the second part, though the first to be published, of a massive — and unfinished — six-part project, the Great Instauration (Instauratio Magna). The title chosen for this second part made it clear that Bacon was making an open challenge to Aristotle. Aristotle’s logical works had become known collectively as the Organon, or tool, and the New Organon was intended not merely as a supplement, but as a replacement. Bacon’s case against the traditional logic of the schools had two main strands. In the first place the old logic was concerned with talk rather than action: For the ordinary logic professes to contrive and prepare helps and guards for the understanding, as mine does; and in this one point they agree. But mine differs from it in three points especially; viz., in the end aimed at; in the order of demonstration; and in the starting point of the inquiry. For the end which this science of mine proposes is the invention not of arguments but of arts; not of things in accordance with principles, but of principles themselves; not of probable reasons, but of designations and directions for works. And as the intention is different, so accordingly is the effect; the effect of the one being to overcome an opponent in argument, of the other to command nature in action. [Bacon, 1857–74, IV, pp. 23–24] Training in traditional logic encouraged the wrong kind of mental skills: it placed a premium on intellectual subtlety, but ‘the subtlety of nature is far greater than the subtlety of the senses and understanding’ (Novum Organum, I. 10).
Facility with words and agility in debate are not what is required when one is trying to penetrate the workings of nature. Secondly, by concentrating on the forms of argument, syllogistic logic draws attention away from defects in their matter, which are far more dangerous: The syllogism consists of propositions, propositions consist of words, words are symbols of notions. Therefore if the notions themselves (which is the root of the matter) are confused and over-hastily abstracted from the facts, there can be no firmness in the superstructure. Our only hope therefore lies in a true induction. [Novum Organum, I. 14] This is the first occasion in which induction was mentioned in the Novum Organum (as distinct from the parts of the Instauratio Magna that preceded it), but there is no subsequent explanation of how induction could contribute to the rectification of
defective concepts. One thing that is apparent, however, is that Bacon’s approach was quite different to Descartes’: there was no suggestion that the establishment of a set of clear and distinct ideas either could or should precede the investigations undertaken with their help. The improvement of concepts and the growth of knowledge had to take place together, by slow increments. Despite these harsh remarks, Bacon did not reject syllogistic reasoning entirely, but he restricted its use to areas of human life where ‘popular’, superficial concepts are employed: Although therefore I leave to the syllogism and these famous and boasted modes of demonstration their jurisdiction over popular arts and such as are matter of opinion (in which department I leave all as it is), yet in dealing with the nature of things I use induction throughout . . . [Bacon, 1857–74, IV, p. 24]15 Bacon was the opposite of an ‘ordinary language’ philosopher: he had no belief whatever that the concepts embedded since time immemorial in common speech would prove to be the ones needed in a reformed natural philosophy — indeed quite the contrary. One of his fundamental objections to Aristotle was that Aristotle had taken as his starting point popular notions and merely ordered and systematised them, instead of replacing them by something better. Bacon had no liking for neologisms, and whenever possible preferred ‘to retaine the ancient tearmes, though I sometimes alter the uses and definitions, according to the Moderate proceeding in Civill government’ [Bacon, 2000a, p. 81]. But though he was prepared to retain the traditional vocabulary, the kind of induction he was planning to use would be very unlike anything described by his predecessors: In establishing axioms, another form of induction must be devised than has hitherto been employed, and it must be used for proving and discovering not first principles (as they are called) only, but also the lesser axioms, and the middle, and indeed all. For the induction which proceeds by simple enumeration is childish; its conclusions are precarious and exposed to peril from a contradictory instance; and it generally decides on too small a number of facts, and on those only which are at hand. [Novum Organum, I. 105]16 The fallibility of induction by simple enumeration could hardly be more clearly expressed. Bacon had no intention of retaining it and merely adding safeguards that would make its use less risky and any conclusions reached more probable. He wanted it to be discarded in favour of something entirely different:
15 See also the letter of 30 June 1622 to Fr. Redemptus Baranzan [Bacon, 1857–74, XIV, p. 375].
16 Axioms here are not the axioms of modern mathematics and logic, but rather important general principles; the term comes from Stoic logic [Frede, 1974, pp. 32–37; Kneale and Kneale, 1962, pp. 145–147].
But the induction which is to be available for the discovery and demonstration of sciences and arts, must analyse nature by proper rejections and exclusions; and then, after a sufficient number of negatives, come to a conclusion on the affirmative instances. . . . But in order to furnish this induction or demonstration well and duly for its work, very many things are to be provided which no mortal has yet thought of; insomuch that greater labour will have to be spent in it than has hitherto been spent on the syllogism. [Novum Organum, I. 105] The last part of this was a warning that Bacon’s own account of this new kind of induction would (at this stage) be far from complete. He never supposed that his method could be described in detail, prior to its employment in actual investigations. The specimen given in the Novum Organum of an enquiry made using the new kind of induction was explicitly described as a First Vintage, or provisional interpretation (interpretatio inchoata, II. 20); a full account would have to wait until the final part of the Instauratio Magna, the Scientia Activa, which was never written, or indeed even begun. One thing that was clear from the start, however, is that it would be a form of eliminative induction, relying on ‘rejections and exclusions’. A great mass of merely confirming instances, however large, is never enough. Bacon’s own preliminary account of his method is given in Book II of the Novum Organum. There are three stages: the compilation of a ‘natural and experimental history’ of the nature under investigation, the ordering of this in tables, and finally induction. While the first two of these are described in considerable detail (Novum Organum, II. 10–14), the account of induction itself is strikingly brief: We must make, therefore, a complete solution and separation of nature, not indeed by fire, but by the mind, which is a kind of divine fire.
The first work therefore of true induction (as far as regards the discovery of Forms) is the rejection or exclusion of the several natures which are not found in some instance where the given nature is present, or are found in some instance where the given nature is absent, or are found to increase in some instance when the given nature decreases, or to decrease when the given nature increases. [Novum Organum, II. 16] Bacon’s theory of forms is notoriously obscure — they are certainly not the substantial forms of the Aristotelians — but it is clear that, whatever they might be in ontological terms, they are the causes of the (phenomenal) natures [Pérez-Ramos, 1988, pp. 65–132]. The form of heat is something which is present in all hot bodies, absent from all cold bodies, and which varies in intensity according to the degree of heat found in a body. The conclusion of the process of induction was described in a vivid (but opaque) metaphor taken from contemporary chemistry: ‘after the rejection and exclusion has been duly made, there will remain at the bottom, all light opinions vanishing into smoke, a Form affirmative, solid, and true and well defined’ (Novum Organum, II. 16), like a puddle of gold at the bottom of an alchemist’s crucible. Bacon’s own
comment on this is entirely apposite: ‘this is quickly said; but the way to come at it is winding and intricate.’ Bacon’s confidence that his method of eliminative induction would produce certain knowledge rested on several presuppositions, of which the most important is what Keynes subsequently termed a Principle of Limited Variety. Though the world as we experience it appears unendingly varied, all this complexity arises from the combination of a finite — indeed quite small — number of simple natures. There is an alphabet of nature,17 the contents of which cannot be guessed or discovered by speculation, but which will start to be revealed once the correct inductive procedures are employed. Bacon made no attempt to give an a priori justification of this, and there is no reason to suppose that he would have regarded any such justification as either possible or necessary. As always, validation would be retrospective — by having supplied those who employed the method correctly with power over nature.
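The four exclusions listed in Novum Organum, II. 16 can be sketched in code. What follows is only a toy illustration in Python, not a reconstruction of Bacon's own procedure, which he never reduced to a mechanical routine. The function name `surviving_natures`, the numeric "degrees", and the sample table are all assumptions invented for the example; the table is not taken from Bacon's tables on heat.

```python
def surviving_natures(instances, candidates):
    """Apply the exclusions of Novum Organum II. 16 to a table of instances.

    Each instance is a dict with:
      "target":  degree of the nature under investigation (0 means absent)
      "natures": dict mapping a candidate nature to its degree
                 (0 or missing means absent)
    """
    survivors = set(candidates)

    # Tables of presence and absence.
    for inst in instances:
        for nature in candidates:
            degree = inst["natures"].get(nature, 0)
            # Reject a nature absent where the given nature is present.
            if inst["target"] > 0 and degree == 0:
                survivors.discard(nature)
            # Reject a nature present where the given nature is absent.
            if inst["target"] == 0 and degree > 0:
                survivors.discard(nature)

    # Table of degrees: reject a nature that moves against the given nature.
    for first in instances:
        for second in instances:
            change = second["target"] - first["target"]
            for nature in candidates:
                delta = (second["natures"].get(nature, 0)
                         - first["natures"].get(nature, 0))
                if change * delta < 0:
                    survivors.discard(nature)

    return survivors


# Invented toy table: the nature under investigation is heat.
instances = [
    {"target": 3, "natures": {"light": 3, "motion": 3, "rarity": 2}},
    {"target": 2, "natures": {"motion": 2, "rarity": 1}},
    {"target": 0, "natures": {"light": 1}},
    {"target": 0, "natures": {"rarity": 2}},
]
candidates = ["light", "motion", "rarity"]

print(sorted(surviving_natures(instances, candidates)))  # ['motion']
```

The sketch makes the dependence on limited variety visible: a unique survivor is guaranteed to be the Form only if the list of candidates is known in advance to be finite and complete, which is exactly the presupposition discussed above.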
4.2
Descartes
Induction played no significant role in Descartes’ mature philosophy, but there are some remarks on it in the early and unfinished Regulae ad Directionem Ingenii (c.1619–c.1628). Whether Descartes had read any of Bacon’s works at this stage in his life is not known — he certainly became familiar with Bacon’s thought subsequently [Clarke, 2006, p. 104] — but his account in the Regulae appears to have owed nothing whatever to the Novum Organum. In the Regulae the most certain kind of knowledge comes from intuition, a direct apprehension of the mind unmediated by any other intellectual operations. Deduction is needed because some chains of reasoning are too complex to be grasped by a single act of thought: we can grasp intuitively the link between each element in the chain and its predecessor, but not all the links between the elements at once. Induction18 was dealt with more briefly, Rule VII stating that In order to make our knowledge complete, every single thing relating to our undertaking must be surveyed in a continuous and wholly uninterrupted sweep of thought, and be included in a sufficient and well ordered enumeration [sufficienti et ordinata enumeratione]. [Descartes, 1995, I, p. 25] It would seem that for Descartes the words ‘inductio’ and ‘enumeratio’ were merely alternative names for the same thing; their equivalence is suggested by phrases such as ‘enumeratio, sive inductio’ and ‘enumerationem sive inductionem’ in the passages quoted below [Descartes, 1908, pp. 388, 389], [Marion, 1993, p. 103].
17 On this, and Bacon’s work on an Abecedarium Naturae, see [Bacon, 2000b, pp. xxix–xl, 305].
18 The word occurs three times in Rule VII [Descartes, 1908, pp. 388, 389, 390] and once in Rule XI (p. 408). There is one place in Rule III (p. 368) where ‘inductio’ appears in the first edition of 1701, but this may be a transcriber’s or printer’s error for ‘deductio’; for a discussion of the problem, see [Descartes, 1977, pp. 117–119].
The function of a sufficient enumeration is given in the explication of Rule VII: We maintain furthermore that enumeration is required for the completion of our knowledge [ad scientiae complementum]. The other Rules do indeed help us resolve most questions, but it is only with the aid of enumeration that we are able to make a true and certain judgement about whatever we apply our minds to. By means of enumeration nothing will wholly escape us and we shall be seen to have some knowledge on every question. In this context enumeration, or induction, consists in a thorough investigation of all the points relating to the problem at hand, an investigation which is so careful and accurate that we may conclude with manifest certainty that we have not inadvertently overlooked anything. So even though the object of our enquiry eludes us, provided we have made an enumeration we shall be wiser at least to the extent that we shall perceive with certainty that it could not possibly be discovered by any method known to us. [Descartes, 1995, I, pp. 25–26] If an enumeration is to lead to a negative conclusion that the knowledge of something lies entirely beyond the reach of the human mind, then it is essential that it should be ‘sufficient’: We should note, moreover, that by ‘sufficient enumeration’ or ‘induction’ [sufficientem enumerationem sive inductionem] we just mean the kind of enumeration which renders the truth of our conclusions more certain than any other kind of proof [aliud probandi genus] (simple intuition excepted) allows. But when our knowledge of something is not reducible to simple intuition and we have cast off our syllogistic fetters, we are left with this one path, which we should stick to with complete confidence. [Descartes, 1995, I, p. 26] This notion of a sufficient enumeration plays a crucial role in Descartes’ account, and it is unfortunate that his explication of it singularly fails to meet his own professed ideal of perfect clarity [Beck, 1952, p. 131]. 
It is not the same as completeness. If I wish to determine how many kinds of corporeal entities there are, I need to distinguish them from one another and make a complete enumeration of all the different kinds, But if I wish to show in the same way that the rational soul is not corporeal, there is no need for the enumeration to be complete; it will be sufficient if I group all bodies together into several classes so as to demonstrate that the rational soul cannot be assigned to any of these. [Descartes, 1995, I, pp. 26–27] The thought here appears to be that we do not need to make a complete list of all the different kinds of body: if we are merely attempting to establish a negative
thesis about what the rational soul is not, a division into several broad classes is enough. One relatively straightforward example of an enumeration is given in Rule VIII. Someone attempting to investigate all the kinds of knowledge will have to begin by considering the pure intellect, since the knowledge of everything else depends on this; then ‘among what remains he will enumerate [enumerabit] whatever instruments of knowledge we possess in addition to the intellect; and there are only two of these, namely imagination and sense perception’ [Descartes, 1995, I, p. 30]. He will make a precise enumeration [enumerabit exacte] of all the paths to truth which are open to men, so that he may follow one which is reliable. There are not so many of these that he cannot immediately discover them all by means of a sufficient enumeration [sufficientem enumerationem]. . . [Descartes, 1995, I, p. 30] Another example of an enumeration — here explicitly described as an induction — is more puzzling: To give one last example, say I wish to show by enumeration that the area of a circle is greater than the area of any other geometrical figure whose perimeter is the same length as the circle’s. I need not review every geometrical figure. If I can demonstrate that this fact holds for some particular figures, I shall be entitled to conclude by induction that the same holds true in all the other cases as well. [Descartes, 1995, I, p. 27] This does not appear to be an example of a complex proof composed of separate proofs of a finite set of more specific cases, as when a theorem about triangles in general is established by showing it to be true for acute, obtuse and right-angled triangles. There are clearly an infinite number of polygons with the same perimeter as a given circle but with smaller areas. 
How the argument is meant to proceed is not clear, but it is certainly not through an exhaustive case-by-case analysis.19 There is no further discussion of induction in the works that Descartes published. The Regulae was not printed until 1701, though copies circulated in manuscript and were read by Leibniz, and (possibly) by Locke.20 The account of knowledge that Locke gave in book IV of the Essay concerning Human Understanding certainly has close parallels with the account in the Regulae, but there is no mention at all of induction.
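For regular polygons, at least, the comparison in Descartes' circle example can be made explicit. The following is a sketch under stated assumptions and not anything found in the Regulae: the symbols $P$ (the common perimeter), $n$ (the number of sides) and $A_n$ (the area of the regular $n$-gon) are introduced here purely for illustration.

```latex
% Area of a regular n-gon with perimeter P (side P/n, apothem P / (2n \tan(\pi/n))):
A_n = \frac{P^2}{4\,n\,\tan(\pi/n)}, \qquad n \ge 3 .

% Area of the circle with circumference P (radius P / 2\pi):
A_{\circ} = \frac{P^2}{4\pi} .

% Since \tan x > x for 0 < x < \pi/2, we have n \tan(\pi/n) > \pi, and therefore
A_n < A_{\circ} \quad \text{for every } n \ge 3 .
```

This settles only the regular polygons, by a single general argument and not by inspecting cases one at a time. It says nothing about irregular figures, so it does not show how Descartes thought a few particular figures could license the universal conclusion.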
4.3
Gassendi
Pierre Gassendi discussed induction in his Institutio Logica [Gassendi, 1981], first published in 1658 as part of his posthumous Opera Omnia. Following in the tradition of the Prior Analytics, he treated induction as a kind of syllogism, for example: Every walking animal lives, every flying animal lives, and also every swimming animal, every creeping animal, every plant-like animal; therefore every animal lives. [Gassendi, 1981, p. 53] In such an induction there is a concealed premise: Every animal is either walking, or flying, or swimming, or creeping, or plant-like. Without this, the inference would have no force (consequutionis vis nulla foret), since if there were another kind of animal in addition to these, a false conclusion could emerge. If an induction is to be valid (legitima) it has to be based on an enumeration of all the relevant species or parts, and as Gassendi commented, such an enumeration is usually difficult if not impossible to achieve [Gassendi, 1981, p. 54]. The same account appeared with only minor changes in the French Abrégé de la philosophie de Gassendi, published by Gassendi’s disciple, François Bernier [Bernier, 1684, I, pp. 132–133].
19 The result is not elementary. A non-rigorous proof, first given by Zenodorus (2nd century BC?), is preserved in Book V of Pappus’ Collections [Cuomo, 2000, pp. 61–62].
20 If Locke had seen a copy, it would have been between 1683 and 1689 when he was in exile in the Netherlands. There is no mention of this anywhere in his private papers.
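The role of Gassendi's concealed premise can be illustrated with a small Python sketch. It is a toy model only: the function name `legitimate_induction`, the sample animals and their assignment to species are hypothetical and chosen for the example, and nothing here is drawn from the Institutio Logica itself.

```python
def legitimate_induction(genus, species, has_property):
    """Return True only if the induction is 'legitima' in Gassendi's sense.

    genus:        iterable of all individuals the conclusion is about
    species:      dict mapping a species name to the individuals listed under it
    has_property: function from an individual to True/False
    """
    covered = set()
    for members in species.values():
        covered |= set(members)

    # The concealed premise: the listed species exhaust the genus.
    concealed_premise = covered >= set(genus)
    # The explicit premises: every listed individual has the property.
    particular_premises = all(has_property(m) for m in covered)

    return concealed_premise and particular_premises


# Hypothetical toy data.
genus = {"horse", "eagle", "trout", "snake", "sponge"}
species = {
    "walking": {"horse"},
    "flying": {"eagle"},
    "swimming": {"trout"},
    "creeping": {"snake"},
    "plant-like": {"sponge"},
}


def lives(animal):
    return True


print(legitimate_induction(genus, species, lives))

# Drop one species: the explicit premises still hold, but the concealed one fails.
incomplete = {name: m for name, m in species.items() if name != "creeping"}
print(legitimate_induction(genus, incomplete, lives))
```

In the second call every stated premise is still true, yet the inference is rejected, which is the sense in which it 'would have no force' without the enumeration being complete.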
4.4
Arnauld and Nicole
The work most strongly influenced by Descartes’ as yet unpublished Regulae was La Logique ou l’art de penser, published in 1662 by Antoine Arnauld and Pierre Nicole. Induction is introduced in traditional and broadly neutral terms: Induction occurs whenever an examination of several particular things leads us to knowledge of a general truth. Thus when we experience several seas in which the water is salty, and several rivers in which the water is fresh, we infer that in general sea water is salty and river water is fresh. [Arnauld and Nicole, 1996, p. 202] Induction is described as the beginning of all knowledge, because singular things are presented to us before universals. This sounds thoroughly Aristotelian, but the resemblance is only superficial. Though I might never have started to think about the nature of triangles if I had not seen an individual example, ‘it is not the particular examination of all triangles which allows me to draw the general and certain conclusion about all of them . . . but the mere consideration of what is contained in the idea of the triangle which I find in my mind’ [Arnauld and Nicole, 1996, p. 202]. The same is true of the very general axioms that have application in fields quite remote from geometry, for example the principle that a whole is greater than its part, the ninth and last of Euclid’s Common Notions. According to certain philosophers — un-named, but presumably Gassendi and his followers — we know this only because ever since our infancy we have observed that a man is larger than his head, a house larger than a room, a forest larger than a tree, and
so on. Arnauld and Nicole replied that ‘if we were sure of this truth . . . only from the various observations we had made since childhood, we would be sure only of its probability [nous n’en serions probablement assurés], since induction is a certain means of knowing something only when the induction is complete’ [Arnauld and Nicole, 1996, p. 247]. It is striking that several of Arnauld and Nicole’s examples of over-confident reliance on inherently fallible inductive arguments are taken from recent developments in the physical sciences. Natural philosophers had long believed that a piston could not be drawn out of a perfectly sealed syringe and that a suction pump could lift water from any depth, and they supposed these alleged truths to be founded on ‘a very certain induction based on an infinity of experiments [expériences]’ [Arnauld and Nicole, 1996, p. 203, translation modified]. Again, it was assumed that if water was contained in a curved vessel (e.g. a U-tube) of which one arm was wider than the other, the level in the two arms would be equal; experiment had shown that this was not true when one arm was very narrow, allowing capillary attraction to become significant [Arnauld and Nicole, 1996, p. 247]. It would probably be going too far to say that new discoveries in the natural sciences were the main force fuelling inductive scepticism, but they do seem to have played a part in reducing confidence in the age-old experiential data on which Aristotelian science had been based [Dear, 1995].
4.5
Hobbes and Wallis
Despite Hobbes’s strong and unswerving commitment to nominalism, inductive reasoning did not play a large role in his philosophy, and he had little to say about it. In the mid-1650s he became involved in a series of acrimonious arguments with the mathematician John Wallis [Jesseph, 1999], part of which touched on Wallis’s use of inductive arguments. Hobbes thought that induction had no place in mathematics, or at least in mathematical demonstration: The most simple way (say you) of finding this and some other Problemes, is to do the thing it self a little way, and to observe and compare the appearing Proportions, and then by Induction, to conclude it universally. Egregious Logicians and Geometricians, that think an Induction without a Numeration of all the particulars sufficient to infer a Conclusion universall, and fit to be received for a Geometricall Demonstration! [Hobbes, 1656, p. 46] Hobbes clearly thought that there were only two kinds of induction, one founded on a complete enumeration of all the particulars, which could lead to certainty, and the other founded on a partial enumeration, which could not. The remarks by Wallis that Hobbes had found so objectionable were in the Arithmetica Infinitorum of 1656, but his fullest treatment of the issue is to be found in his much later Treatise of Algebra [Wallis, 1685], where he was responding to the criticisms of a very much more capable mathematician than Hobbes, Pierre
Fermat [Stedall, 2004, pp. xxvi–xxvii]. Wallis insisted that inductive arguments have a legitimate role in mathematics: As to the thing itself, I look upon Induction as a very good Method of Investigation; as that which doth very often lead us to the easy discovery of a General Rule; or is at least a good preparative to such an one. And where the Result of such Inquiry affords to the view, an obvious discovery; it needs not (though it may be capable of it,) any further Demonstration. And so it is, when we find the Result of such Inquiry, to put us into a regular orderly Progression (of what nature soever,) which is observable to proceed according to one and the same general Process; and where there is no ground of suspicion why it should fail, or of any case which might happen to alter the course of such Process. [Wallis, 1685, p. 306] The example Wallis gave of this was the expansion of the binomial $(a + e)^n$:
$(a + e)^2 = a^2 + 2ae + e^2$,
$(a + e)^3 = a^3 + 3a^2e + 3ae^2 + e^3$,
$(a + e)^4 = a^4 + 4a^3e + 6a^2e^2 + 4ae^3 + e^4$,
and so on. The coefficients in each case can be found by repeated multiplication, but more easily by using the diagram now known as Pascal’s triangle, in which each element is the sum of the two diagonally above it in the row above:
                    1
                  1   1
                1   2   1
              1   3   3   1
            1   4   6   4   1
          1   5  10  10   5   1
The general result, that this procedure can be used for any power, is described as being established by induction. Wallis remarked: But most Mathematicians that I have seen, after such Induction continued for some few Steps, and seeing no reason to disbelieve its proceeding in like manner for the rest, are satisfied (from such evidence,) to conclude universally, and so in like manner for the consequent Powers. And such Induction hath hitherto been thought (by such as do not list to be captious) a conclusive Argument. [Wallis, 1685, p. 308] There is no indication here or anywhere else that the mathematical results reached by this kind of induction are merely probable, or in any way uncertain. The reason is clear: ‘there is, in the nature of Number, a sufficient ground for such a sequel’ [Wallis, 1685, p. 307]. Wallis was not using what has since become known as mathematical induction [Cajori, 1918]: his argument is much closer to the demonstrative induction of the later Aristotelians such as Zabarella, in which the examination of a few cases is enough to reveal the underlying regularity.
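The agreement between the two procedures can be checked mechanically for as many powers as one likes. The Python sketch below is only an illustration and is not drawn from Wallis: the function names `pascal_rows` and `expand_binomial` are hypothetical. It builds each row of the triangle from the row above and compares it with the coefficients obtained by repeated multiplication by (a + e).

```python
def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle."""
    if n_rows <= 0:
        return []
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Each inner element is the sum of the two elements diagonally above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows


def expand_binomial(power):
    """Coefficients of (a + e)**power, found by repeated multiplication."""
    coeffs = [1]  # (a + e)**0
    for _ in range(power):
        # Multiplying by (a + e) shifts the coefficient list and adds it to itself.
        product = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            product[i] += c      # term multiplied by a
            product[i + 1] += c  # term multiplied by e
        coeffs = product
    return coeffs


for power, row in enumerate(pascal_rows(6)):
    # Agreement in the cases tried; this is a check, not a proof for all powers.
    assert row == expand_binomial(power)
    print(power, row)
```

A run of this kind reproduces the situation Wallis describes: the two methods agree for every power examined, and the code, like the mathematician who stops after 'some few Steps', examines only finitely many.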
Wallis’s account of induction in Book III, chapter 15 of his Institutiones Logicae [Wallis, 1687, pp. 167–172] is more traditional, and much of it is concerned with the reduction of perfect inductions to various figures of the syllogism. In imperfect inductions the conclusion is described as only conjectural, or probable, on the familiar grounds that it can be overturned by a single negative instance [Wallis, 1687, p. 170]. No attempt was made to estimate any degrees of probability. Wallis insisted several times that the weakness (imbecillitas) that characterised all imperfect inductions did not lie in their form, which was that of a syllogism, but in their matter: For example, if someone argues Teeth in the upper jaw are absent in all horned animals; because it is thus in the Ox, the Sheep, the Goat, nor is it otherwise (as far as we know) in the others; Therefore (at least as far as we know) in all. This conclusion is not certain, but only probable [verisimilis]; not through a defect in the Syllogistic form, but through the uncertainty of the matter, or the truth of the premises. [Wallis, 1687, pp. 170–171] In other words an inductive argument is not a fallible inference from reliably established premises such as ‘The ox has no teeth in the upper jaw’ and ‘The sheep has no teeth in the upper jaw’, but rather a deductive inference from premises like ‘The ox, the sheep and the goat have no teeth in the upper jaw, nor is it otherwise in the other horned animals’, all of which are uncertain and are provisionally accepted merely because no counter-examples are known to exist.
4.6 Leibniz
Induction was not a central issue in Leibniz’s philosophy, but given his omnivorous intellectual curiosity, it is not surprising either that he said something or that what he had to say is of considerable interest [Rescher, 1981; 2003; Westphal, 1989]. In the preface to his New Essays on Human Understanding Leibniz explained that one point of fundamental disagreement between him and Locke concerned the existence or non-existence of innate principles in the soul. This in turn raised the question of ‘whether all truths depend on experience, that is on induction and instances, or if some of them have some other foundation’. Leibniz chose the second answer: the senses ‘never give us anything but instances, that is particular or singular truths. But however many instances confirm a general truth, they do not suffice to establish its universal necessity; for it does not follow that what has happened will always happen in the same way.’ [Leibniz, 1981, p. 49]. Our knowledge of truths of reason, such as those of arithmetic and geometry, is not based on induction at all. Not all truths are, however, truths of reason, and the other kind of truths — truths of fact — need to be discovered in a different way, at least by human beings. As Leibniz remarked in a paper ‘On the souls of men and beasts’, written around 1710:
J. R. Milton
there are in the world two totally different sorts of inferences, empirical and rational. Empirical inferences are common to us as well as to beasts, and consist in the fact that when sensing things that have a number of times been experienced to be connected we expect them to be connected again. Thus dogs that have been beaten a number of times when they have done something displeasing expect a beating again if they do the same thing, and therefore they avoid doing it; this they have in common with infants. [Leibniz, 2006, p. 66] Beasts and infants are not alone in making such inferences: so too do human beings. As he noted in § 28 of the Monadology, Men act like beasts insofar as the sequences of their perceptions are based only on the principle of memory, like empirical physicians who have a simple practice without theory. We are all mere empirics in three-fourths of our actions. For example, when we expect daylight tomorrow, we act as empirics, because this has always happened up to the present. Only the astronomer concludes it by reason. [Leibniz, 1969, p. 645, translation modified]21 There is however one difference between beasts and mere empirics: ‘beasts (as far as we can tell) are not aware of the universality of propositions . . . And although empirics are sometimes led by inductions to true universal propositions, nevertheless it only happens by accident, not by the force of consequence.’ [Leibniz, 2006, p. 67]. Human beings when relying purely on experience make generalisations that are often wrong, but beasts seem not to generalise at all. It would seem from this that there are three kinds of reasoning (using this word in a large sense): (1) inferences from particulars to other particulars, the kind of reasoning that earlier philosophers had called paradeigma or example; (2) inductive generalisation proper; and (3) deduction. 
Something of this kind seems to be indicated in a note he made on the back of a draft letter dated May 1693, where a distinction is made between three grades of confirmation (firmitas): logical certainty, physical certainty, which is only logical probability, and physical probability. The first example [is] in propositions of eternal truth, the second in propositions which are known to be true by induction, as that every man is a biped, for sometimes some are born with one foot or none; the third that the south wind brings rain, which is usually true but not infrequently false. [Couturat, 1961, p. 232] Physical certainty is identified with moral certainty in New Essays IV. vi. 13 [Leibniz, 1981, p. 406]. The implication is that the conclusions reached by inductive inferences can at least sometimes be morally certain.
21 The parallel between beasts and empirical physicians is one that Leibniz drew several times: Principles of Nature and Grace, § 5 [Leibniz, 1969, p. 638], New Essays, preface [Leibniz, 1981, p. 50].
An indication of how such certainty can be obtained is provided by one of the earliest expositions of Leibniz’s views on induction. The Dissertatio de Stilo Philosophico Nizolii was a preface written in 1670 for a new edition of Mario Nizzoli’s De Veris Principiis et Vera Ratione Philosophandi contra Pseudophilosophos, first published in 1553. Nizzoli (1488–1567) was an idiosyncratic thinker who has been described as a Ciceronian Ockhamist, and for Leibniz, his fundamental error is his nominalism — his denial of the existence of real universals: If universals were nothing but collections of individuals, it would follow that we could attain no knowledge through demonstration . . . but only through collecting individuals or by induction.22 But on this basis knowledge would straightway be made impossible, and the skeptics would be victorious. For perfectly universal propositions can never be established on this basis because you are never certain that all individuals have been considered. You must always stop at the proposition that all the cases which I have experienced are so. But . . . it will always remain possible that countless other cases which you have not examined are different. [Leibniz, 1969, p. 129] Leibniz admitted that we believe confidently that fire burns, and that we will ourselves be burned if we place our hand in one, but this kind of moral certainty does not depend on induction alone and is reached only with the assistance of other universal propositions: 1. if the cause is the same or similar in all cases, the effect will be the same or similar in all; 2. the existence of a thing which is not sensed is not assumed; and, finally, 3. whatever is not assumed, is to be disregarded in practice until it is proved. [Leibniz, 1969, p. 129]. The second and third of these are methodological principles, similar though not identical to Ockham’s Razor. 
The first is a more carefully worded version of Hume’s principle that ‘like causes always produce like effects’.23 Without the aid of these helping propositions (adminicula), as Leibniz called them, not even moral certainty would be possible. Our knowledge of the adminicula cannot therefore be grounded on induction: ‘For if these helping propositions, too, were derived from induction, they would need new helping propositions, and so on to infinity’ [Leibniz, 1969, p. 130]. Leibniz’s language here was Baconian,24 but his thought manifestly was not: it is much closer to Hume.
22 As the Latin (collectionem singularium, seu inductionem) makes clear, this is one process named in two ways, not two distinct processes [Leibniz, 1840, p. 70].
23 The sixth of Hume’s rules by which to judge of causes and effects, Treatise of Human Nature, I. iii. 15.
24 The Adminicula inductionis were announced as a topic of future discussion in Novum Organum, II. 21, but never described in detail.
There is an illuminating comparison to be made between this kind of sophisticated induction, buttressed by the adminicula, and the demonstrative induction described by Zabarella. In demonstrative induction the conclusion can be made certain to us because the intellect grasps the universal nature on which the truth of the universal proposition is grounded. In Leibniz the help is provided by principles of a much higher degree of generality, such as the Law of Continuity [Leibniz, 1969, pp. 351–352], and ultimately the Principle of Sufficient Reason. In the words of Foucher de Careil: Thus Leibniz has seen that in order to be introduced into science, induction needs the help of certain universal propositions that in no way depend on it. And since there can be no obstacle to the complete and systematic unity of science except the diversity of the facts of experience, he saw that the law of continuity, which is the link between the universal and the particular, and which unites them in science, is the true basis of induction . . . without it induction is sterile, with it, it generates moral certainty. [Leibniz, 1857, p. 422] It is the regularity of nature — i.e. the fact that it is law-governed — that makes properly conducted inductive inferences safe.
5 CONCLUSION
The story told in the pages above has been an episodic and fragmentary one, with remarks about induction extracted from the writings of authors who were almost always concerned primarily with other matters, and for whom inductive reasoning was a matter of relatively minor importance. The one significant exception was Bacon, and even his treatment of his new method of eliminative induction was remarkably brief, given its pivotal role in his programme. Even in the modern world a philosopher is not required to deal with induction at any length — or indeed at all — if they wish to be considered as a candidate for greatness.
Given the direction of his interests, no one would have expected Nietzsche, for instance, to have focused his considerable talents on the problem, and the same is true of a large number of his predecessors. What is striking is not that many philosophers chose to concentrate on other matters, but that virtually everyone did. The notion that there is a general and far-reaching ‘problem of induction’ is relatively recent. One of the earliest and most influential uses of the phrase was in J. S. Mill’s System of Logic, III. iii. 3, where the discussion of induction concluded with the following peroration: Why is a single instance, in some cases, sufficient for a complete induction, while in others, myriads of concurring instances, without a single exception known or presumed, go such a very little way towards establishing an universal proposition? Whoever can answer this question
knows more of the philosophy of logic than the wisest of the ancients, and has solved the problem of induction. [Mill, 1973–4, p. 314]
Whether anyone has subsequently succeeded in solving — or dissolving — the problem may be doubted, though confident (and sometimes absurd) claims have continued to be made. What does seem clear is that no one before the nineteenth century saw induction as posing a single, general problem, still less regarded a failure to solve it as being, in C. D. Broad’s often-quoted words, ‘the scandal of Philosophy’ [Broad, 1926, p. 67].
BIBLIOGRAPHY
[Agricola, 1992] R. Agricola. De Inventione Dialectica Libri Tres. Edited by L. Mundt. Tübingen: Max Niemeyer Verlag, 1992.
[Alexander, 2001] Alexander of Aphrodisias. On Aristotle Topics I. Translated by J. M. Van Ophuijsen. London: Duckworth, 2001.
[Allen, 2001] J. Allen. Inference from Signs: Ancient Debates about the Nature of Evidence. Oxford: Clarendon Press, 2001.
[Apuleius, 1987] D. Londey and C. Johanson. The Logic of Apuleius. Leiden/New York/Copenhagen/Cologne: Brill, 1987.
[Aquinas, 1970] Thomas Aquinas. Commentary on the Posterior Analytics of Aristotle. Translated by F. R. Larcher. Albany: Magi Books, 1970.
[Aristotle, 1966] Aristotle. Posterior Analytics, Topica. Translated by H. Tredennick and E. S. Forster. London/Cambridge MA: William Heinemann and Harvard University Press, 1966.
[Aristotle, 1973] Aristotle. Categories, On Interpretation, Prior Analytics. Translated by H. P. Cooke and H. Tredennick. London/Cambridge MA: William Heinemann and Harvard University Press, 1973.
[Arnauld and Nicole, 1996] A. Arnauld and P. Nicole. Logic or the Art of Thinking. Translated by J. V. Buroker. Cambridge: Cambridge University Press, 1996.
[Asmis, 1984] E. Asmis. Epicurus’ Scientific Method. Ithaca: Cornell University Press, 1984.
[Atherton, 1999] M. Atherton. The Empiricists: Critical Essays on Locke, Berkeley and Hume. Lanham MD: Rowman & Littlefield, 1999.
[Bacon, 1857–74] F. Bacon. The Works of Francis Bacon. Collected and edited by J. Spedding, R. L. Ellis and D. D. Heath. London: Longman & Co., 1857–74.
[Bacon, 2000a] F. Bacon. Advancement of Learning. Edited by M. Kiernan. The Oxford Francis Bacon, vol. IV. Oxford: Clarendon Press, 2000.
[Bacon, 2000b] F. Bacon. The Instauratio magna: Last Writings. Edited and translated by G. Rees. The Oxford Francis Bacon, vol. XIII. Oxford: Clarendon Press, 2000.
[Bacon, 2004] F. Bacon. The Instauratio magna Part II: Novum organum and Associated Texts. Edited and translated by G. Rees and M. Wakeley. The Oxford Francis Bacon, vol. XI. Oxford: Clarendon Press, 2004.
[Barnes, 1975] J. Barnes. Aristotle’s Posterior Analytics. Oxford: Clarendon Press, 1975.
[Barnes, 1988] J. Barnes. Epicurean Signs. Oxford Studies in Ancient Philosophy, Supplementary Volume, 1988, pp. 91–134. Oxford: Clarendon Press, 1988.
[Barnes, 1997] J. Barnes. Logic and the Imperial Stoa. Leiden/New York/Cologne: Brill, 1997.
[Barnes et al., 1982] J. Barnes, J. Brunschwig, M. Burnyeat and M. Schofield. Science and Speculation: Studies in Hellenistic theory and practice. Cambridge: Cambridge University Press, 1982.
[Beck, 1952] L. J. Beck. The Method of Descartes. Oxford: Clarendon Press, 1952.
[Bernier, 1684] F. Bernier. Abregé de la philosophie de Gassendi. Lyon: Anisson, Posuel & Rigaud, 1684.
[Biard, 2001] J. Biard. The Natural Order in John Buridan. In [Thijssen and Zupko, 2001, pp. 77–96].
[Bos, 1993] E. P. Bos. A Contribution to the History of Theories of Induction in the Middle Ages. In [Jacobi, 1993, pp. 553–576].
[Broad, 1926] C. D. Broad. The Philosophy of Francis Bacon. Cambridge: Cambridge University Press, 1926.
[Broadie, 1993] A. Broadie. Introduction to Medieval Logic. Oxford: Clarendon Press, 1993.
[Burnyeat, 1982] M. F. Burnyeat. The Origins of Non-Deductive Inference. In [Barnes et al., 1982, pp. 193–238].
[Burnyeat, 1994] M. F. Burnyeat. Enthymeme: the Logic of Persuasion. In [Furley and Nehamas, 1994, pp. 3–56].
[Cajori, 1918] F. Cajori. Origin of the Name ‘Mathematical Induction’. American Mathematical Monthly, 25: 197–201, 1918.
[Caujolle-Zaslavsky, 1990] F. Caujolle-Zaslavsky. Étude préparatoire à une interprétation du sens aristotélicien d’ἐπαγωγή. In [Devereux and Pellegrin, 1990, pp. 365–387].
[Cicero, 1949] Marcus Tullius Cicero. De Inventione, De Optimo Genere Oratorum, Topica. Translated by H. M. Hubbell. London/Cambridge MA: William Heinemann and Harvard University Press, 1949.
[Clarke, 2006] D. M. Clarke. Descartes: A Biography. Cambridge: Cambridge University Press, 2006.
[Cohen, 1980] L. J. Cohen. Some Historical Remarks on the Baconian Conception of Probability. Journal of the History of Ideas, 41: 219–231, 1980.
[Couturat, 1961] L. Couturat. Opuscules et fragments inédits de Leibniz. Hildesheim: Georg Olms, 1961.
[Crombie, 1953] A. C. Crombie. Robert Grosseteste and the Origins of Experimental Science, 1100–1700. Oxford: Clarendon Press, 1953.
[Cuomo, 2000] S. Cuomo. Pappus of Alexandria and the Mathematics of Late Antiquity. Cambridge: Cambridge University Press, 2000.
[Dear, 1995] P. Dear. Discipline and Experience: The Mathematical Way in the Scientific Revolution. Chicago: University of Chicago Press, 1995.
[De Rijk, 2002] L. M. De Rijk. Aristotle: Semantics and Ontology. Volume I: General Introduction. Works on Logic. Leiden/Boston/Cologne: Brill, 2002.
[Descartes, 1908] R. Descartes. Oeuvres de Descartes, vol. 10. Edited by C. Adam and P. Tannery. Paris: J. Vrin, 1908.
[Descartes, 1977] R. Descartes. Règles utiles et claires pour la direction de l’esprit en la recherche de la vérité. Edited by J.-L. Marion. The Hague: Martinus Nijhoff, 1977.
[Descartes, 1985] R. Descartes. The Philosophical Writings of Descartes. Edited by J. Cottingham, R. Stoothoff and D. Murdoch. Cambridge: Cambridge University Press, 1985.
[Devereux and Pellegrin, 1990] D. Devereux and P. Pellegrin. Biologie, logique et métaphysique chez Aristote: actes du séminaire C.N.R.S.–N.S.F. Paris: Editions du C.N.R.S., 1990.
[Dillon, 1993] J. Dillon. Alcinous: The Handbook of Platonism. Oxford: Clarendon Press, 1993.
[Diogenes Laertius, 1980] Diogenes Laertius. Lives of Eminent Philosophers. Translated by R. D. Hicks. London/Cambridge MA: William Heinemann and Harvard University Press, 1980.
[Engberg-Pedersen, 1979] More on Aristotelian Epagoge. Phronesis, 24: 301–319, 1979.
[Franklin, 2001] J. Franklin. The Science of Conjecture. Baltimore/London: Johns Hopkins University Press, 2001.
[Frede, 1974] M. Frede. Die Stoische Logik. Göttingen: Vandenhoeck and Ruprecht, 1974.
[Furley and Nehamas, 1994] D. J. Furley and A. Nehamas, editors. Aristotle’s Rhetoric: Philosophical Essays. Princeton: Princeton University Press, 1994.
[Gassendi, 1981] Pierre Gassendi’s Institutio Logica (1658). Edited and translated by H. Jones. Assen: Van Gorcum, 1981.
[Gaukroger, 2001] S. Gaukroger. Francis Bacon and the Transformation of Early-modern Philosophy. Cambridge: Cambridge University Press, 2001.
[Glucker, 1995] J. Glucker. Probabile, Veri Simile and Related Terms. In [Powell, 1995, pp. 115–144].
[Grendler, 2002] P. F. Grendler. The Universities of the Italian Renaissance. Baltimore/London: Johns Hopkins University Press, 2002.
[Grosseteste, 1981] R. Grosseteste. Commentarius in Posteriorum Analyticorum Libros. Edited by P. Rossi. Firenze: L. S. Olschki, 1981.
[Hackett, 2004] J. Hackett. Robert Grosseteste and Roger Bacon on the Posterior Analytics. In [Lutz-Bachmann et al., 2004, pp. 161–212].
[Hacking, 1975] I. Hacking. The Emergence of Probability. Cambridge: Cambridge University Press, 1975.
[Halm, 1863] K. F. von Halm. Rhetores Latini Minores. Leipzig: B. G. Teubner, 1863.
[Hamlyn, 1976] D. Hamlyn. Aristotelian Epagoge. Phronesis, 21: 167–184, 1976.
[Hobbes, 1656] T. Hobbes. Six Lessons To the Professors of the Mathematiques. London: Andrew Crook, 1656.
[Howson, 2000] C. Howson. Hume’s Problem: Induction and the Justification of Belief. Oxford: Clarendon Press, 2000.
[Jacobi, 1993] Argumentationstheorie: Scholastische Forschungen zu den logischen und semantischen Regeln korrekten Folgerns. Edited by K. Jacobi. Leiden/New York/Cologne: Brill, 1993.
[Jesseph, 1999] D. M. Jesseph. Squaring the Circle: The War between Hobbes and Wallis. Chicago: University of Chicago Press, 1999.
[Kneale and Kneale, 1962] W. Kneale and M. Kneale. The Development of Logic. Oxford: Clarendon Press, 1962.
[Kosman, 1973] L. A. Kosman. Understanding, Explanation and Insight in Aristotle’s Posterior Analytics. In [Lee et al., 1973, pp. 374–392].
[Lameer, 1994] J. Lameer. Al-Farabi and Aristotelian Syllogistics. Leiden/New York/Cologne: Brill, 1994.
[Lagerlund, 2000] H. Lagerlund. Modal Syllogistics in the Middle Ages. Leiden/Boston/Cologne: Brill, 2000.
[Lee et al., 1973] E. N. Lee, A. P. D. Mourelatos and R. M. Rorty. Exegesis and Argument. Assen: Van Gorcum, 1973.
[Leibniz, 1840] G. W. Leibniz. Opera Philosophica quae extant Latina Gallica Germanica omnia. Edited by J. E. Erdmann. Berlin: G. Eichler, 1840.
[Leibniz, 1857] G. W. Leibniz. Nouvelles lettres et opuscules de Leibniz. Edited by A. Foucher de Careil. Paris: Auguste Durand, 1857.
[Leibniz, 1969] G. W. Leibniz. Philosophical Papers and Letters. Translated by L. E. Loemker. Dordrecht/Boston/London: Reidel, 1969.
[Leibniz, 1980] G. W. Leibniz. Philosophische Schriften, Band 2: 1663–1672. Edited by H. Schepers, W. Kabitz and W. Schneiders. Berlin: Akademie Verlag, 1980.
[Leibniz, 1981] G. W. Leibniz. New Essays on Human Understanding. Translated by P. Remnant and J. Bennett. Cambridge: Cambridge University Press, 1981.
[Leibniz, 2006] G. W. Leibniz. The Shorter Leibniz Texts: A Collection of New Translations. Edited and translated by L. H. Strickland. London: Continuum, 2006.
[Locke, 1706] J. Locke. Posthumous Works of Mr. John Locke. London: A. and J. Churchill, 1706.
[Locke, 1975] J. Locke. An Essay concerning Human Understanding. Edited by P. H. Nidditch. Oxford: Clarendon Press, 1975.
[Lutz-Bachmann et al., 2004] M. Lutz-Bachmann, A. Fidora and P. Antolic. Erkenntnis und Wissenschaft: Probleme der Epistemologie in der Philosophie des Mittelalters. Berlin: Akademie Verlag, 2004.
[Mack, 1993] P. Mack. Renaissance Argument: Valla and Agricola in the Traditions of Rhetoric and Dialectic. Leiden/New York/Cologne: Brill, 1993.
[Malherbe, 1996] M. Malherbe. Bacon’s Method to Science. In [Peltonen, 1996, pp. 75–98].
[Marion, 1993] J.-L. Marion. Sur L’ontologie grise de Descartes. Paris: Vrin, 1993.
[Marrone, 1986] S. P. Marrone. Robert Grosseteste on the Certitude of Induction. In [Wenin, 1986, Volume II, pp. 481–488].
[McGinnis, 2003] J. McGinnis. Scientific Methodologies in Medieval Islam. Journal of the History of Philosophy, 41: 307–327, 2003.
[McGinnis, 2008] Avicenna’s Naturalized Epistemology and Scientific Method. In [Rahman et al., 2008, pp. 129–152].
[McGinnis and Reisman, 2007] J. McGinnis and D. C. Reisman, editors. Classical Arabic Philosophy. Indianapolis/Cambridge: Hackett, 2007.
[McKirahan, 1992] R. D. McKirahan Jr. Principles and Proofs: Aristotle’s Theory of Demonstrative Science. Princeton: Princeton University Press, 1992.
[McPherran, 2007] M. L. McPherran. Socratic Epagoge and Socratic Induction. Journal of the History of Philosophy, 45: 347–364, 2007.
[Mill, 1973–4] J. S. Mill. A System of Logic, Ratiocinative and Inductive. Edited by J. M. Robson and R. F. McRae. Collected Works of John Stuart Mill, vols. VII and VIII. Toronto: University of Toronto Press, 1973, 1974.
[Milton, 1987] J. R. Milton. Induction before Hume. British Journal for the Philosophy of Science, 38: 49–74, 1987.
[Monfasani, 1990] J. Monfasani. Lorenzo Valla and Rudolph Agricola. Journal of the History of Philosophy, 28: 181–200, 1990.
[Ockham, 1974] William of Ockham. Summa Logicae. Edited by P. Boehner, G. Gál and S. Brown. Opera Philosophica, vol. I. St Bonaventure, N.Y.: Franciscan Institute, 1974.
[Okasha, 2001] S. Okasha. What did Hume really show about Induction? Philosophical Quarterly, 51: 307–327, 2001.
[Oliver, 2004] S. Oliver. Robert Grosseteste on Light, Truth and Experimentum. Vivarium, 42: 151–180, 2004.
[Peltonen, 1996] M. Peltonen. The Cambridge Companion to Bacon. Cambridge: Cambridge University Press, 1996.
[Pérez-Ramos, 1988] A. Pérez-Ramos. Francis Bacon’s Idea of Science and the Maker’s Knowledge Tradition. Oxford: Clarendon Press, 1988.
[Plato, 1953] The Dialogues of Plato. Translated by B. Jowett. 4th Edition. Oxford: Clarendon Press, 1953.
[Powell, 1995] J. G. F. Powell, editor. Cicero the Philosopher. Oxford: Clarendon Press, 1995.
[Quintilian, 1921] Marcus Fabius Quintilianus. Institutio Oratoria. Edited by H. E. Butler. London/Cambridge MA: William Heinemann and Harvard University Press, 1921.
[Rahman et al., 2008] S. Rahman, A. Street and H. Tahiri, editors. The Unity of Science in the Arabic Tradition. New York/London: Springer, 2008.
[Rescher, 1981] N. Rescher. Inductive Reasoning in Leibniz. In N. Rescher, Leibniz’s Metaphysics of Nature, pp. 20–28. Dordrecht/Boston/London: Reidel, 1981.
[Rescher, 2003] N. Rescher. The Epistemology of Inductive Reasoning in Leibniz. In N. Rescher, On Leibniz, pp. 117–126. Pittsburgh: University of Pittsburgh Press, 2003.
[Robinson, 1953] R. Robinson. Plato’s Earlier Dialectic. Oxford: Clarendon Press, 1953.
[Ross, 1949] Aristotle’s Prior and Posterior Analytics. Introduction and commentary by W. D. Ross. Oxford: Clarendon Press, 1949.
[Schmitt, 1969] C. B. Schmitt. Experience and Experiment: A Comparison of Zabarella’s View with Galileo’s in De motu. Studies in the Renaissance, 16: 80–138, 1969.
[Sedley, 1982] D. Sedley. On Signs. In [Barnes et al., 1982, pp. 239–272].
[Serene, 1979] E. F. Serene. Robert Grosseteste on Induction and Demonstrative Science. Synthese, 40: 97–115, 1979.
[Sextus, 1967] Sextus Empiricus. Outlines of Pyrrhonism. Translated by R. G. Bury. London/Cambridge MA: William Heinemann and Harvard University Press, 1967.
[Spinoza, 2004] B. Spinoza. A Theologico-Political Treatise and A Political Treatise. New York: Dover Books, 2004.
[Stedall, 2004] J. A. Stedall. The Arithmetic of Infinitesimals: John Wallis 1656. New York/London: Springer, 2004.
[Stove, 1973] D. C. Stove. Probability and Hume’s Inductive Scepticism. Oxford: Clarendon Press, 1973.
[Stump, 1978] E. Stump. Boethius’ De topicis differentiis: Translated with notes and essays on the text. Ithaca/London: Cornell University Press, 2001.
[Thijssen, 1987] J. M. M. H. Thijssen. John Buridan and Nicholas of Autrecourt on Causality and Induction. Traditio, 43: 237–255, 1987.
[Thijssen and Zupko, 2001] J. M. M. H. Thijssen and J. Zupko, editors. The Metaphysics and Natural Philosophy of John Buridan. Leiden/Boston/Cologne: Brill, 2001.
[Upton, 1981] T. V. Upton. A Note on Aristotelian epagoge. Phronesis, 26: 172–176, 1981.
[Valla, 1982] L. Valla. Repastinatio Dialectice et Philosophice. Edited by G. Zippel. Padova: Editrice Antenore, 1982.
[Vico, 1996] G. Vico. The Art of Rhetoric: (Institutiones Oratoriae, 1711–1741). Edited by G. Crifò; translated by G. A. Pinton and A. W. Shippee. Amsterdam/Atlanta: Editions Rodopi, 1996.
[Vives, 1979] Against the Pseudodialecticians: A Humanist Attack on Medieval Logic. Translated with an introduction by R. Guerlac. Dordrecht/Boston/London: Reidel, 1979.
[Wallis, 1685] J. Wallis. A Treatise of Algebra, Both Historical and Practical. London: John Playford, 1685.
[Wallis, 1687] J. Wallis. Institutio Logicae, Ad communes usus accomodata. Oxford: E Theatro Sheldoniano, 1687.
[Weinberg, 1965] J. R. Weinberg. Abstraction, Relation, and Induction: Three Essays in the History of Thought. Madison/Milwaukee: University of Wisconsin Press, 1965.
[Wenin, 1986] L’homme et son univers au Moyen Âge: Actes du septième congrès internationale de philosophie médiévale. Edited by C. Wenin. Philosophes médiévaux, 26–27. Louvain-la-Neuve: Editions de l’institute supérieur de philosophie, 1986.
[Westphal, 1989] Leibniz and the Problem of Induction. Studia Leibnitiana, 21: 174–187, 1989.
[Winkler, 1999] K. Winkler. Hume’s Inductive Skepticism. In [Atherton, 1999, pp. 183–212].
[Zabarella, 1608] J. Zabarella. Opera Logica. Frankfurt: Lazarus Zetzner, 1608; reprinted Hildesheim: G. Olms, 1966.
HUME AND THE PROBLEM OF INDUCTION
Marc Lange
1 INTRODUCTION
David Hume first posed what is now commonly called “the problem of induction” (or simply “Hume’s problem”) in 1739, in Book 1, Part iii, section 6 (“Of the inference from the impression to the idea”) of A Treatise of Human Nature (hereafter T). In 1748, he gave a pithier formulation of the argument in Section iv (“Skeptical doubts concerning the operations of the understanding”) of An Enquiry Concerning Human Understanding (E).1 Today Hume’s simple but powerful argument has attained the status of a philosophical classic. It is a staple of introductory philosophy courses, annually persuading scores of students of either the enlightening or the corrosive effect of philosophical inquiry, since the argument appears to undermine the credentials of virtually everything that passes for knowledge in their other classes (mathematics notably excepted2). According to the standard interpretation, Hume’s argument purports to show that our opinions regarding what we have not observed have no justification. The obstacle is irremediable; no matter how many further observations we might make, we would still not be entitled to any opinions regarding what we have not observed. Hume’s point is not the relatively tame conclusion that we are not warranted in making any predictions with total certainty. Hume’s conclusion is more radical: that we are not entitled to any degree of confidence whatever, no matter how slight, in any predictions regarding what we have not observed. We are not justified in having 90% confidence that the sun will rise tomorrow, or in having 70% confidence, or even in being more confident that it will rise than that it will not. There is no opinion (i.e., no degree of confidence) that we are entitled to have regarding a claim concerning what we have not observed. This conclusion “leaves not the lowest degree of evidence in any proposition” that goes beyond our present observations and memory (T, p. 267).
Our justified opinions must be “limited to the narrow sphere of our memory and senses” (E, p. 36).
1 All page references to the Treatise are to [Hume, 1978]. All page references to the Enquiry are to [Hume, 1977].
2 However, even in mathematics, inductive logic is used, as when we take the fact that a computer search program has found no violation of Goldbach’s conjecture up to some enormously high number as evidence that Goldbach’s conjecture is true even for higher numbers. For more examples, see [Franklin, 1987]. Of course, such examples of inductive logic in mathematics must be sharply distinguished from “mathematical induction”, which is a form of deductive reasoning.
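The computer search mentioned in footnote 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (is_prime, goldbach_witness, first_violation) and the bound of 10,000 are hypothetical choices made for the example, not anything described in the text, and trial division is used purely for simplicity.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small illustrative bounds."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def goldbach_witness(n: int):
    """Return primes (p, q) with p <= q and p + q == n, or None if no pair exists."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


def first_violation(limit: int):
    """Return the smallest even n in [4, limit] with no witness, or None if all pass."""
    for n in range(4, limit + 1, 2):
        if goldbach_witness(n) is None:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical bound chosen only for illustration.
    bound = 10_000
    print(f"first violation up to {bound}: {first_violation(bound)}")
```

A search of this kind that finds no violation shows only that no counterexample lies below the chosen bound. Treating that result as evidence about larger even numbers is the inductive step the footnote describes; it is not a proof, and it is distinct from mathematical induction.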
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
Hume’s problem has not gained its notoriety merely from Hume’s boldness in denying the epistemic credentials of all of the proudest products of science (and many of the humblest products of common-sense). It takes nothing for someone simply to declare himself unpersuaded by the evidence offered for some prediction. Hume’s problem derives its power from the strength of Hume’s argument that it is impossible to justify reposing even a modest degree of confidence in any of our predictions. Again, it would be relatively unimpressive to argue that since a variety of past attempts to justify inductive reasoning have failed, there is presumably no way to justify induction and hence, it seems, no warrant for the conclusions that we have called upon induction to support. But Hume’s argument is much more ambitious. Hume purports not merely to show that various, apparently promising routes to justifying induction all turn out to fail, but also to exclude every possible route to justifying induction. Naturally, many philosophers have tried to find a way around Hume’s argument — to show that science and common-sense are justified in making predictions inductively. Despite these massive efforts, no response to date has received widespread acceptance. Inductive reasoning remains (in C.D. Broad’s famous apothegm) “the glory of Science” and “the scandal of Philosophy” [Broad, 1952, p. 143]. Some philosophers have instead embraced Hume’s conclusion but tried to characterize science so that it does not involve our placing various degrees of confidence in various predictions. For example, Karl Popper has suggested that although science refutes general hypotheses by finding them to be logically inconsistent with our observations, science never confirms (even to the smallest degree) the predictive accuracy of a general hypothesis. 
Science has us make guesses regarding what we have not observed by using those general hypotheses that have survived the most potential refutations despite sticking their necks out furthest, and we make these guesses even though we have no good reason to repose any confidence in their truth: I think that we shall have to get accustomed to the idea that we must not look upon science as a ‘body of knowledge,’ but rather as a system of hypotheses; that is to say, a system of guesses or anticipations which in principle cannot be justified, but with which we work as long as they stand up to tests, and of which we are never justified in saying that we know that they are ‘true’ or ‘more or less certain’ or even ‘probable’. [Popper, 1959, p. 317; cf. Popper, 1972] However, if we are not justified in having any confidence in a prediction’s truth, then it is difficult to see how it could be rational for us to rely upon that prediction [Salmon, 1981]. Admittedly, “that we cannot give a justification . . . for our guesses does not mean that we may not have guessed the truth.” [Popper, 1972, p. 30] But if we have no good reason to be confident that we have guessed the truth, then we would seem no better justified in being guided by the predictions of theories that have passed their tests than in the predictions of theories that have failed their
Hume and the Problem of Induction
tests. There would seem to be no grounds for calling our guesswork “rational”, as Popper does. Furthermore, Popper’s interpretation of science seems inadequate. Some philosophers, such as van Fraassen [1981; 1989], have denied that science confirms the truth of theories about unobservable entities (such as electrons and electric fields), the truth of hypotheses about the laws of nature, or the truth of counterfactual conditionals (which concern what would have happened under circumstances that actually never came to pass — for example, “Had I struck the match, it would have lit”). But these philosophers have argued that these pursuits fall outside of science because we need none of them in order to confirm the empirical adequacy of various theories, a pursuit that is essential to science. So even these interpretations of science are not nearly as austere as Popper’s, according to which science fails to accumulate evidence for empirical predictions.
In this essay, I will devote sections 2, 3, and 4 to explaining Hume’s argument and offering some criticism of it. In section 6, I will look at the conclusion that Hume himself draws from it. In sections 5 and 7-11, I will review critically a few of the philosophical responses to Hume that are most lively today.3
2 TWO PROBLEMS OF INDUCTION
Although Hume never uses the term “induction” to characterize his topic, today Hume’s argument is generally presented as targeting inductive reasoning: any of the kinds of reasoning that we ordinarily take as justifying our opinions regarding what we have not observed. Since Hume’s argument exploits the differences between induction and deduction, let’s review them. For the premises of a good deductive argument to be true, but its conclusion to be false, would involve a contradiction. (In philosophical jargon, a good deductive argument is “valid”.)
For example, a geometric proof is deductive since the truth of its premises ensures the truth of its conclusion by a maximally strong (i.e., “logical”) guarantee: on pain of contradiction! That deduction reflects the demands of non-contradiction (a semantic point) has a metaphysical consequence — in particular, a consequence having to do with necessity and possibility. A contradiction could not come to pass; it is impossible. So it is impossible for the premises of a good deductive argument to be true but its conclusion to be false. (That is why deduction’s “guarantee” is maximally strong.) It is impossible for a good deductive argument to take us from a truth to a falsehood (i.e., to fail to be “truth-preserving”) because such failure would involve a contradiction and contradictions are impossible. A good deductive argument is necessarily truth-preserving.
3 My critical review is hardly exhaustive. For an admirable discussion of some responses to Hume in older literature that I neglect, see [Salmon, 1967].
In contrast, no contradiction is involved in the premises of a good inductive argument being true and its conclusion being false. (Indeed, as we all know, this
Marc Lange
sort of thing is a familiar fact of life; our expectations, though justly arrived at by reasoning inductively from our observations, sometimes fail to be met.) For example, no matter how many human cells we have examined and found to contain proteins, there would be no contradiction between our evidence and a given as yet unobserved human cell containing no proteins. No contradiction is involved in a good inductive argument’s failure to be truth-preserving. Once again, this semantic point has a metaphysical consequence if every necessary truth is such that its falsehood involves a contradiction (at least implicitly): even if a given inductive argument is in fact truth-preserving, it could have failed to be. It is not necessarily truth-preserving.4 These differences between deduction and induction lead to many other differences. For example, the goodness of a deductive argument does not come in degrees; all deductive arguments are equally (and maximally) strong. In contrast, some inductive arguments are more powerful than others. Our evidence regarding the presence of oxygen in a room that we are about to enter is much stronger than our evidence regarding the presence of oxygen in the atmosphere of a distant planet, though the latter evidence may still be weighty. As we examine more (and more diverse) human cells and find proteins in each, we are entitled to greater confidence that a given unobserved human cell also contains proteins; the inductive argument grows stronger. Furthermore, since the premises of a good deductive argument suffice to ensure its conclusion on pain of contradiction, any addition to those premises is still enough to ensure the conclusion on pain of contradiction. In contrast, by adding to the premises of a good inductive argument, its strength may be augmented or diminished. 
By adding to our stock of evidence the discovery of one human cell that lacks proteins, for example, we may reduce the strength of our inductive argument for the prediction of proteins in a given unobserved human cell.
4 Sometimes it is said that since the conclusion of a good deductive argument is true given the premises on pain of contradiction, the conclusion is implicitly contained in the premises. A good deductive argument is not “ampliative”. It may make explicit something that was already implicit in the premises, and so we may learn things through deduction, but a deductive argument does not yield conclusions that “go beyond” its premises. In contrast, a good inductive argument is ampliative; it allows us to “go beyond” the evidence in its premises. This “going beyond” is a metaphor that can be cashed out either semantically (the contrary of an inductive argument’s conclusion does not contradict its premises) or metaphysically (it is possible for the conclusion to be false and the premises true).
That inductive arguments are not deductive — that they are not logically guaranteed to be truth-preserving — plays an important part in Hume’s argument (as we shall see in a moment). But the fact that the premises of a good inductive argument cannot give the same maximal guarantee to its conclusion as the premises of a deductive argument give to its conclusion should not by itself be enough to cast doubt on the cogency of inductive reasoning. That the premises of an inductive argument fail to “demonstrate” the truth of its conclusion (i.e., to show that the conclusion could not possibly be false, given the premises) does not show that its premises fail to confirm the truth of its conclusion — to warrant us (if
our belief in the premises is justified) in placing greater confidence (perhaps even great confidence) in the conclusion. (Recall that Hume purports to show that even modest confidence in the conclusions reached by induction is not justified.) That the conclusion of a given inductive argument can be false, though its premises are true, does not show that its premises fail to make its conclusion highly plausible. In short, inductive arguments take risks in going beyond our observations. Of course, we all know that some risks are justified, whereas others are unwarranted. The mere fact that inductive arguments take risks does not automatically show that the risks they take are unreasonable. But Hume (as it is standard to interpret him) purports to show that we are not justified in taking the risks that inductive inferences demand. That induction is fallible does not show that inductive risks cannot be justified. To show that, we need Hume’s argument. It aims to show that any scheme purporting to justify taking those risks must fail.
It is important to distinguish two questions that could be asked about the risks we take in having opinions that go beyond the relatively secure ground of what we observe:
1. Why are we justified in going beyond our observations at all?
2. Why are we justified in going beyond our observations in a certain specific way: by having the opinions endorsed by inductive reasoning?
To justify our opinions regarding what we have not observed, it would not suffice merely to justify having some opinions about the unobserved rather than none at all. There are many ways in which we could go beyond our observations. But we believe that only certain ways of doing so are warranted. Our rationale for taking risks must be selective: it must reveal that certain risks are worthy of being taken whereas others are unjustified [Salmon, 1967, p. 47].
In other words, an adequate justification of induction must justify induction specifically; it must not apply equally well to all schemes, however arbitrary or cockeyed, for going beyond our observations. For example, an adequate justification of induction should tell us that as we examine more (and more diverse) human cells and find proteins in each, we are entitled (typically) to greater confidence that a given unobserved human cell also contains proteins, but not instead to lesser confidence in this prediction — and also not to greater confidence that a given unobserved banana is ripe. In short, an answer to the first of the two questions above that does not privilege induction, but merely supports our taking some risks rather than none at all, fails to answer the second question. An adequate justification of induction must favor science over guesswork, wishful thinking, necromancy, or superstition; it cannot place them on a par.
3 HUME’S FORK: THE FIRST OPTION
Consider any inductive argument. Its premises contain the reports of our observations. Its conclusion concerns something unobserved. It may be a prediction
regarding a particular unobserved case (e.g., that a given human cell contains proteins), a generalization concerning all unobserved cases of a certain kind (that all unobserved human cells contain proteins), or a generalization spanning all cases observed and unobserved (that all human cells contain proteins) — or even something stronger (that it is a law of nature that all human cells contain proteins). Although Hume’s argument is not limited to relations of cause and effect, Hume typically gives examples in which we observe a cause (such as my eating bread) and draw upon our past experiences of events that have accompanied similar causes (our having always derived nourishment after eating bread in the past) to confirm that a similar event (my deriving nourishment) will occur in this case. Another Hume favorite involves examples in which a body’s presenting a certain sensory appearance (such as the appearance of bread) and our past experiences confirm that the body possesses a certain disposition (a “secret power,” such as to nourish when eaten). How can the premises of this inductive argument justify its conclusion? Hume says that in order for the premises to justify the conclusion, we must be able to reason from the premises to the conclusion in one of two ways: All reasonings may be divided into two kinds, namely demonstrative reasoning, or that concerning relations of ideas, and moral reasoning, or that concerning matter of fact and existence. (E, p. 22) By “demonstrative reasoning”, Hume seems to mean deduction. As we have seen, deduction concerns “relations of ideas” in that a deductive argument turns on semantic relations: certain ideas contradicting others. Then “moral reasoning, or that concerning matter of fact and existence” would apparently have to be induction. 
(“Moral reasoning”, in the archaic sense that Hume uses here, does not refer specifically to reasoning about right and wrong; “moral reasoning” could, in the strongest cases, supply “moral certainty”, a degree of confidence beyond any reasonable doubt but short of the “metaphysical certainty” that a proof supplies.5) I will defer the half of Hume’s argument concerned with non-demonstrative reasoning until the next section.
5 For other examples of this usage, see the seventh definition of the adjective “moral” in The Oxford English Dictionary.
Is there a deductive argument taking us from the premises of our inductive argument about bread (and only those premises) to the argument’s conclusion? We cannot think of one. But this does not show conclusively that there isn’t one. As we all know from laboring over proofs to complete our homework assignments for high-school geometry classes, we sometimes fail to see how a given conclusion can be deduced from certain premises even when there is actually a way to do it. But Hume argues that even if we used greater ingenuity, we could not find a way to reason deductively from an inductive argument’s premises to its conclusion. No way exists. Here is a reconstruction of Hume’s argument.
If the conclusion of an inductive argument could be deduced from its premises, then the falsehood of the conclusion would contradict the
truth of the premises. But the falsehood of its conclusion does not contradict the truth of its premises. So the conclusion of an inductive argument cannot be deduced from its premises. How does Hume know that the conclusion’s falsehood does not contradict the truth of the argument’s premises? Hume says that we can form a clear idea of the conclusion’s being false along with the premises being true, and so this state of affairs must involve no contradiction. Here is the argument in some of Hume’s words: The bread, which I formerly eat, nourished me; that is, a body of such sensible qualities, was, at that time, endued with such secret powers: But does it follow, that other bread must also nourish me at another time, and that like sensible qualities must always be attended with like secret powers? The consequence seems nowise necessary. . . . That there are no demonstrative arguments in the case, seems evident; since it implies no contradiction, that the course of nature may change, and that an object, seemingly like those which we have experienced, may be attended with different or contrary effects. May I not clearly and distinctly conceive, that a body, falling from the clouds, and which, in all other respects, resembles snow, has yet the taste of salt or the feeling of fire? . . . Now whatever is intelligible, and can be distinctly conceived, implies no contradiction, and can never be proved false by any demonstrative argument or abstract reasoning a priori. (E, pp. 21-2) This passage takes us to another way to understand “Hume’s fork”: the choice he offers us between two different kinds of reasoning for taking us from an inductive argument’s premises to its conclusion. 
We may interpret “demonstrative reasoning” as reasoning a priori (that is, reasoning where the step from premises to conclusion makes no appeal to what we learn from observation) and reasoning “concerning matter of fact and existence” as reasoning empirically (that is, where the step from premises to conclusion depends on observation). According to Hume, we can reason a priori from p to q only if “If p, then q” is necessary — i.e., only if it could not have been false. That is because if it could have been false, then in order to know that it is true, we must check the actual world — that is, make some observations. If “If p, then q” is not a necessity but merely happens to hold (i.e., holds as a “matter of fact”), then we must consult observations in order to know that it is the case. So reasoning “concerning matter of fact and existence” must be empirical. Now Hume can argue once again that by reasoning a priori, we cannot infer from an inductive argument’s premises to its conclusion. Here is a reconstruction:
If we could know a priori that the conclusion of an inductive argument is true if its premises are true, then it would have to be necessary for the conclusion to be true if the premises are true. But it is not necessary for the conclusion to be true if the premises are true. So we cannot know a priori that the conclusion of an inductive argument is true if its premises are true. Once again, Hume defends the middle step on the grounds that we can clearly conceive of the conclusion being false while the premises are true. Hence, there is no contradiction in the conclusion’s being false while the premises are true, and so it is not necessary for the conclusion to be true if the premises are true. Hume says: To form a clear idea of any thing, is an undeniable argument for its possibility, and is alone a refutation of any pretended demonstration against it. (T , p. 89; cf. T , pp. 233, 250) Of course, if the premises of our inductive argument included not just that bread nourished us on every past occasion when we ate some, but also that all bread is alike in nutritional value, then there would be an a priori argument from the premises to the conclusion. It would be a contradiction for all bread to be nutritionally alike, certain slices of bread to be nutritious, but other slices not to be nutritious. However, Hume would ask, how could we know that all bread is alike in nutritional value? That premise (unlike the others) has not been observed to be true. It cannot be inferred a priori to be true, given our observations, since its negation involves no contradiction with our observations. Earlier I said that Hume aims to show that we are not entitled even to the smallest particle of confidence in our predictions about (contingent features of) what we have not observed. But the arguments I have just attributed to Hume are directed against conclusions of the form “p obtains”, not “p is likely to obtain” or “p is more likely to obtain than not to obtain”. 
How does Hume’s argument generalize to cover these conclusions? How does it generalize to cover opinions short of full belief — opinions involving a degree of confidence less than certainty? The appropriate way to extend Hume’s reasoning depends on what it is to have a degree of belief that falls short of certainty. Having such a degree of belief in p might be interpreted as equivalent to (or at least as associated with) having a full belief that p has a given objective chance of turning out to be true, as when our being twice as confident that a die will land on 1, 2, 3, or 4 than that it will land on 5 or 6 is associated with our believing that the die has twice the objective chance of landing on 1, 2, 3, or 4 than on 5 or 6. As Hume says, There is certainly a probability, which arises from a superiority of chances on any side; and accordingly as this superiority increases, and
surpasses the opposite chances, the probability receives a proportionable increase, and begets still a higher degree of belief or assent to that side, in which we discover the superiority. If a die were marked with one figure or number of spots on four sides, and with another figure or number of spots on the two remaining sides, it would be more probable, that the former would turn up than the latter; though, if it had a thousand sides marked in the same manner, and only one side different, the probability would be much higher, and our belief or expectation of the event more steady and secure. (E, p. 37; cf. T , p. 127) Suppose, then, that our having n% confidence in p must be accompanied by our believing that p has n% chance of obtaining. Then Hume could argue that since there is no contradiction in the premises of an inductive argument being true even while its conclusion lacks n% chance of obtaining — for any n, no matter how low — we cannot proceed a priori from an inductive argument’s premises to even a modest degree of confidence in its conclusion. For example, there is no contradiction in a die’s having landed on 1, 2, 3, or 4 twice as often as it has landed on 5 or 6 in the many tosses that we have already observed, but its not having twice the chance of landing on 1, 2, 3, or 4 as on 5 or 6; the die could even be strongly biased (by virtue of its mass distribution) toward landing on 5 or 6, but nevertheless have happened “by chance” to land twice as often on 1, 2, 3, or 4 as on 5 or 6 in the tosses that we have observed. Hume sometimes seems simply to identify our having n% confidence in p with our believing that p has n% chance of obtaining. 
For example, he considers this plausible idea: Shou’d it be said, that tho’ in an opposition of chances ‘tis impossible to determine with certainty, on which side the event will fall, yet we can pronounce with certainty, that ‘tis more likely and probable, ‘twill be on that side where there is a superior number of chances, than where there is an inferior: . . . (T , p. 127) Though we are not justified in having complete confidence in a prediction (e.g., that the die’s next toss will land on 1, 2, 3, or 4), we are entitled to a more modest degree of belief in it. (One paragraph later, he characterizes confidence in terms of “degrees of stability and assurance”.) He continues: Shou’d this be said, I wou’d ask, what is here meant by likelihood and probability? The likelihood and probability of chances is a superior number of equal chances; and consequently when we say ‘tis likely the event will fall on the side, which is superior, rather than on the inferior, we do no more than affirm, that where there is a superior number of chances there is actually a superior, and where there is an inferior there is an inferior; which are identical propositions, and of no consequence. (T , p. 127)
In other words, to have a greater degree of confidence that the die’s next toss will land on 1, 2, 3, or 4 than that it will land on 5 or 6 is nothing more than to believe that the former has a greater chance than the latter. However, it is not the case that our having n% confidence that p must be accompanied by our believing that p has n% chance of obtaining. Though the outcomes of die tosses may be governed by objective chances (dice, after all, is a “game of chance”), some of our predictions concern facts that we believe involve no objective chances at all, and we often have non-extremal degrees of confidence in those predictions. For instance, I may have 99% confidence that the next slice of bread I eat will be nutritious, but I do not believe that there is some hidden die-toss, radioactive atomic decay, or other objectively chancy process responsible for its nutritional value. For that matter, I may have 90% confidence that the dinosaurs’ extinction was preceded by an asteroid collision with the earth (or that the field equations of general relativity are laws of nature), but the objective chance right now that an asteroid collided with the earth before the dinosaurs’ extinction is 1 if it did or 0 if it did not (and likewise for the general-relativistic field equations). Suppose that our having n% confidence that the next slice of bread I eat will be nutritious need not be accompanied by any prediction about which we must have full belief — such as that the next slice of bread has n% objective chance of being nutritious, or that n% of all unobserved slices of bread are nutritious, or that n equals the limiting relative frequency of nutritious slices among all bread slices. 
Since there is no prediction q about which we must have full belief, Hume cannot show that there is no a priori argument from our inductive argument’s premises to n% confidence that the next slice of bread I eat will be nutritious by showing that there is no contradiction in those premises being true while q is false. We have here a significant gap in Hume’s argument [Mackie, 1980, pp. 15-16; Stove, 1965]. If our degrees of belief are personal (i.e., subjective) probabilities rather than claims about the world, then there is no sense in which the truth of an inductive argument’s premises fails to contradict the falsehood of its conclusion — since there is no sense in which its conclusion can be false (or true), since its conclusion is a degree of belief, not a claim about the world. (Of course, the conclusion involves a degree of belief in the truth of some claim about the world. But the degree of belief itself is neither true nor false.) Hence, Hume cannot conclude from such non-contradiction that there is no a priori argument from the inductive argument’s premises to its conclusion. Of course, no a priori argument could demonstrate that the premises’ truth logically guarantees the conclusion’s truth — since, once again, the conclusion is not the kind of thing that could be true (or false). But there could still be an a priori argument from the opinions that constitute the inductive argument’s premises to the degrees of belief that constitute its conclusion — an argument showing that holding the former opinions requires holding the latter, on pain of irrationality. This a priori argument could not turn entirely on semantic relations because a degree of belief is not the sort of thing that can be true or false, so it cannot
be that one believes a contradiction in having the degrees of belief in an inductive argument’s premises without the degree of belief forming its conclusion. Thus, the a priori argument would not be deductive, as I characterized deduction in section 2. Here we see one reason why it is important to distinguish the two ways that Hume’s fork may be understood: (i) as deduction versus induction, or (ii) as a priori reasoning versus empirical reasoning. Hume apparently regards all a priori arguments as deductive arguments, and hence as arguments that do not yield mere degrees of belief, since degrees of belief do not stand in relations of contradiction and non-contradiction. (At E, p. 22, Hume explicitly identifies arguments that are “probable only” with those “such as regard matter of fact and real existence, according to the division above mentioned” — his fork. See likewise T , p. 651.) If degrees of belief can be interpreted as personal probabilities, then there are a priori arguments purporting to show that certain degrees of belief cannot rationally be accompanied by others: for example, that 60% confidence that p is true cannot be accompanied by 60% confidence that p is false — on pain not of contradiction, but of irrationality (“incoherence”). Whether such a priori arguments can resolve Hume’s problem is a question that I will take up in section 9. On the other hand, even if our degrees of belief are personal probabilities rather than claims about the world, perhaps our use of induction to generate our degrees of belief must (on pain of irrationality) be accompanied by certain full beliefs about the world. Suppose we regard Jones as an expert in some arcane subject — so much so that we take Jones’ opinions on that subject as our own. 
Surely, we would be irrational to regard Jones as an expert and yet not believe that there is a higher fraction of truths among the claims in Jones’ area of expertise about which Jones is highly confident than among the claims in Jones’ area of expertise about which Jones is highly doubtful (presuming that there are many of both). If we did not have this belief, then how could we consider Jones to be an expert? (A caveat: Perhaps our belief that Jones is an expert leaves room for the possibility that Jones has a run of bad luck so that by chance, there is a higher fraction of truths among the claims about which Jones is doubtful than among the claims about which Jones is highly confident. However, perhaps in taking Jones to be an expert, we must at least believe there to be a high objective chance that there is a higher fraction of truths among the claims about which Jones is highly confident than among the claims about which Jones harbors grave doubts.) We use induction to guide our predictions. In effect, then, we take induction as an expert; we take the opinions that induction yields from our observations and make them our own. Accordingly, we must believe that there is (a high chance that there is) a higher fraction of truths among the claims to which induction from our observations assigns a high degree of confidence than among the claims to which induction from our observations assigns a low degree of confidence (presuming that there are many of both). (Perhaps we must even believe that there is (a high chance that there is) a high fraction of truths among the claims to which induction from our observations assigns a high degree of confidence. Otherwise, why would we have such a high degree of confidence in their truth?)
We may now formulate an argument [Skyrms, 1986, pp. 25-27] in the spirit of Hume’s. To be justified in using induction to generate our degrees of belief, we must be justified in believing that there is (a high chance that there is) a higher fraction of truths among the claims to which induction from our observations assigns a high degree of confidence than among the claims to which induction from our observations assigns a low degree of confidence. But the falsehood of this claim does not contradict our observations. So we cannot know a priori (or deductively) that this claim is true given our observations. For us to be justified in using induction, would it suffice that we justly possess a high degree of confidence that there is (a high chance that there is) a higher fraction of truths among the claims to which induction from our observations assigns a high degree of confidence than among the claims to which induction from our observations assigns a low degree of confidence? Perhaps.6 If so, then once again, our Humean argument is vulnerable to the reply that there may be an a priori argument for our having this high degree of confidence, given our observations, even if there is no contradiction between our observations and the negation of the claim in which we are placing great confidence. On the other hand, consider our expert Jones. Suppose we merely possess a high degree of confidence that there is a higher fraction of truths among the claims in Jones’ area of expertise about which Jones is highly confident than among the claims in Jones’ area of expertise about which Jones is highly doubtful. Then although we might give great weight to Jones’ opinions, we might well not take Jones’ opinions as our own. We should, if possible, consult many other experts along with Jones and weigh each one’s opinion regarding p by our confidence in the expert who holds it in order to derive our own opinion regarding p. 
We should take into account whether we believe that a given expert is more likely to err by placing great confidence in claims about which he should be more cautious or by having grave doubts regarding claims in which he should place greater confidence. But our relation to the expert Jones would then be very different from our relation to our “in-house” expert Induction. In contrast to Jones’ opinions, the opinions that induction generates from our observations we take unmodified as our own. If we possessed merely a high degree of confidence that there is a higher fraction of truths among the claims to which induction from our observations assigns a high degree of confidence than among the claims to which induction from our observations assigns a low degree of confidence, then we would have to take the degrees of belief recommended by induction and amend them in light of our estimates of induction’s tendency to excessive confidence and tendency to excessive caution. We do not seem to rest our reliance on induction upon any balancing (or even contemplation) of these correction factors.
6 Though Hume doesn’t seem to think so: “If there be any suspicion, that the course of nature may change, and that the past may be no rule for the future, all experience becomes useless, and can give rise to no inference or conclusion.” (E, p. 24)
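The contrast drawn above, between weighing an outside expert's opinion against others and taking induction's verdicts unmodified as our own, can be made concrete with a small sketch. It assumes, purely for illustration, that degrees of belief are numbers in [0, 1] and that "weighing each one's opinion by our confidence in the expert" is modeled as a trust-weighted average (a linear opinion pool). Neither assumption comes from the text, and the names `linear_pool` and `defer_completely`, like the experts and numbers in the usage example, are hypothetical.

```python
from typing import Sequence


def linear_pool(opinions: Sequence[float], trust: Sequence[float]) -> float:
    """Trust-weighted average of several experts' degrees of belief in p.

    Illustrative assumption: "weighing" an opinion by our confidence in
    the expert who holds it is modeled as a weighted mean.
    """
    if len(opinions) == 0 or len(opinions) != len(trust):
        raise ValueError("need exactly one trust weight per opinion")
    if any(not 0.0 <= o <= 1.0 for o in opinions):
        raise ValueError("degrees of belief must lie in [0, 1]")
    if any(t < 0 for t in trust):
        raise ValueError("trust weights must be non-negative")
    total = sum(trust)
    if total <= 0:
        raise ValueError("at least one trust weight must be positive")
    return sum(o * t for o, t in zip(opinions, trust)) / total


def defer_completely(opinion: float) -> float:
    """Adopt a source's degree of belief unmodified.

    This models the relation the text says we bear to induction: its
    verdict simply becomes our own, with no correction factor applied.
    """
    return opinion


if __name__ == "__main__":
    # Hypothetical degrees of belief in p held by Jones and two other experts.
    jones, second_expert, third_expert = 0.9, 0.6, 0.5
    # Hypothetical relative trust in each expert (Jones trusted most).
    pooled = linear_pool([jones, second_expert, third_expert], [3, 1, 1])
    print("pooled opinion:", pooled)
    print("deferring to induction:", defer_completely(0.99))
```

On this sketch, a merely high degree of confidence in Jones shows up as a pooled opinion that differs from Jones's own, whereas our reliance on induction corresponds to `defer_completely`, where no weighting or amendment takes place.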
4 HUME’S FORK: THE SECOND OPTION
Let’s now turn to the second option in Hume’s fork: Is there an inductive (rather than deductive), or empirical (rather than a priori), argument taking us from the premises of a given inductive argument to its conclusion? Of course there is: the given inductive argument itself! But since that is the very argument that we are worrying about, we cannot appeal to it to show that we are justified in proceeding from its premises to its conclusion. Is there any independent way to argue inductively (or empirically) that this argument’s conclusion is true if its premises are true? Hume argues that there is not. He believes that any inductive (or empirical) argument that we would ordinarily take to be good is of the same kind as the argument that we are worrying about, and so cannot be used to justify that argument on pain of circularity:

[A]ll experimental conclusions [what Hume on the following page calls “inferences from experience”] proceed upon the supposition that the future will be conformable to the past. To endeavour, therefore, the proof of this last supposition by probable arguments, or arguments regarding existence, must be evidently going in a circle, and taking that for granted, which is the very point in question. (E, p. 23)

[P]robability is founded on the presumption of a resemblance betwixt those objects, of which we have had experience, and those, of which we have had none; and therefore ’tis impossible this presumption can arise from probability. (T, p. 90)7

7 Although Hume’s is the canonical formulation of the argument, the ideas behind it seem to have been in the air. In 1736, Joseph Butler [1813, p. 17] identified the probability “that all things will continue as we experience they are” as “our only natural reason for believing the course of the world will continue to-morrow, as it has done as far as our experience or knowledge of history can carry us back.”

Since all non-deductive arguments that we consider good are based on the “principle of the uniformity of nature” (that unexamined cases are like the cases that we have already observed), it would be begging the question to use some such argument to take us from the premises to the conclusion of an inductive argument. For example, suppose we argued as follows for a high degree of confidence that the next slice of bread to be sampled will be nutritious:

1. We have examined many slices of bread for their nutritional value and found all of them to be nutritious.
2. (from 1) If unobserved slices of bread are like the slices of bread that we have already examined, then the next slice of bread we observe will be nutritious.
3. When in the past we examined things that had not yet been observed, we usually found them to be like the things that we had already observed.
Marc Lange
4. So (from 3) unobserved slices of bread are probably like examined slices of bread.
5. Therefore (from 2 and 4) it is likely that the next slice of bread we observe will be nutritious.

But the step from (3) to (4) is based on our confidence that unobserved things are like observed things, which, had we been entitled to it, could have gotten us directly from (1) to (5) without any detour through (2), (3), and (4). As Hume wrote,

Shou’d it be said, that we have experience, that the same power continues united with the same object, and that like objects are endow’d with like powers, I wou’d renew my question, why from this experience we form any conclusion beyond those past instances, of which we have had experience. If you answer this question in the same manner as the preceding, your answer gives still occasion to a new question of the same kind, even in infinitum; which clearly proves, that the foregoing reasoning had no just foundation. (T, p. 91)

To justify induction by arguing that induction is likely to work well in the future, since it has worked well in the past, is circular.8

It might be suggested that although a circular argument is ordinarily unable to justify its conclusion, a circular argument is acceptable in the case of justifying a fundamental form of reasoning. After all, there is nowhere more basic to turn, so all that we can reasonably demand of a fundamental form of reasoning is that it endorse itself. However, certain ludicrous alternatives to induction are also self-supporting. For instance, if induction is based on the presupposition that unexamined cases are like the cases that we have already observed, then take “counterinduction” to be based on the opposite presupposition: that unexamined cases are unlike the cases that we have already observed.
8 Moreover, surely we did not have to wait to accumulate evidence of induction’s track record in order to be justified in reasoning inductively. It has sometimes been suggested (for instance, by [Black, 1954]) that an inductive justification of induction is not viciously circular. Roughly speaking, the suggestion is that the argument from past observations of bread to bread predictions goes by a form of reasoning involving only claims about bread and other concrete particulars, whereas the argument justifying that form of reasoning (“It has worked well in past cases, so it will probably work well in future cases”) goes by a form of reasoning involving only claims about forms of reasoning involving only claims about bread and the like. In short, the second form of reasoning is at a higher level than and so distinct from the first. Therefore, to use an argument of the second form to justify an argument of the first form is not circular. This response to the problem of induction has been widely rejected on two grounds [BonJour, 1986, pp. 105–6]: (i) Even if we concede that these two forms of argument are distinct, the justification of the first form remains conditional on the justification of the second form, and so on, starting an infinite regress. No form ever manages to acquire unconditional justification. (ii) The two forms of argument do not seem sufficiently different for the use of one in justifying the other to avoid begging the question.

For example, induction urges us to expect unexamined human cells to contain proteins, considering that
every human cell that has been tested for proteins has been found to contain some. Accordingly, given that same evidence, counterinduction urges us to expect unexamined human cells not to contain proteins.9 Counterinduction is plainly bad reasoning. However, just as induction supports itself (in that induction has worked well in the past, so by induction, it is likely to work well in the future), counterinduction supports itself (in that counterinduction has not worked well in the past, so by counterinduction, it is likely to work well in the future). If we allow induction to justify itself circularly, then we shall have to extend the same privilege to counterinduction (unless we just beg the question by presupposing that induction is justified whereas counterinduction is not). But as I pointed out at the close of section 2, an adequate justification of induction must justify induction specifically; it must not apply equally well to all schemes, however arbitrary or cockeyed, for going beyond our observations. Even counterinduction is self-supporting, so being self-supporting cannot suffice for being justified. [Salmon, 1967, pp. 12—17] It might be objected that there are many kinds of inductive arguments — not just the “induction by enumeration” (taking regularities in our observations and extrapolating them to unexamined cases) that figures in Hume’s principal examples, but also (for example) the hypothetico-deductive method, common-cause inference [Salmon, 1984], and inference to the best explanation [Harman, 1965; Thagard, 1978; Lipton, 1991]. Does this diversity undermine Hume’s circularity argument? One might think not: even if an inference to the best explanation could somehow be used to support the “uniformity assumption” grounding one of Hume’s inductions by enumeration, we would still need a justification of inference to the best explanation in order to justify the conclusion of the inductive argument. There is some justice in this reply. 
9 Of course, expressed this crudely, “counterinduction” would apparently lead to logically inconsistent beliefs: for instance, that the next emerald we observe will be yellow (since every emerald we have checked so far has been found not to be yellow) and that the next emerald we observe will be orange (since every emerald we have checked so far has been found not to be orange). One way to reply is to say: so much the worse, then, for any argument that purports to justify counterinduction! Another reply is to say that like induction, counterinduction requires that we form our expectations on the basis of all of our evidence to date, so we must consider that every emerald we have checked so far has been found not merely to be non-yellow and non-orange, but to be green, so by counterinduction, we should expect only that the next emerald to be observed will not be green. Finally, we might point out that induction must apply the principle of the uniformity of nature selectively, on pain of leading to logically inconsistent beliefs, as Goodman’s argument will show (in a moment). “Counterinduction” must likewise be selective in applying the principle of the non-uniformity of nature. But no matter: let’s suppose that counterinduction allows that principle to be applied in the argument that I am about to give by which counterinduction supports itself.

However, this reply also misunderstands the goal of Hume’s argument. Hume is not merely demanding that we justify induction, pointing out that we have not yet done so, and suggesting that until we do so, we are not entitled to induction’s fruits. Hume is purporting to show that it is impossible to justify induction. To do that, Hume must show that any possible means of justifying induction either cannot reach its target or begs the question in reaching it. The only way that inference to the best explanation (or some other
non-deductive kind of inference) can beg the question is if it, too, is based on some principle of the uniformity of nature. That it has not yet itself been justified fails to show that induction cannot be justified. In other words, Hume’s argument is not that if one non-deductive argument is supported by another, then we have not yet justified the first argument because the second remains ungrounded. Rather, Hume’s argument is that every non-deductive argument that we regard as good is of the same kind, so it would be circular to use any of them to support any other. In other words, Hume is arguing that there is a single kind of non-deductive argument (which we now call “induction”) that we consider acceptable. Consequently, it is misleading to characterize Hume’s fork as offering us two options: deduction and induction. To put the fork that way gives the impression that Hume is entitled from the outset of his argument to presume that induction is a single kind of reasoning. But that is part of what Hume needs to and does argue for: [A]ll arguments from experience are founded on the similarity, which we discover among natural objects, and by which we are induced to expect effects similar to those, which we have found to follow from such objects. (E, p. 23) If some good non-deductive argument that does not turn on a uniformity-of-nature presumption could be marshaled to take us from the premises to the conclusion of an inductive argument, then we could invoke that argument to justify induction. As long as the argument does not rely on a uniformity-of-nature presumption, we beg no question in using it to justify induction; it is far enough away from induction to avoid circularity. Hume’s point is not that any non-deductive scheme for justifying an induction leaves us with another question: how is that scheme to be justified? 
Hume’s point is that any non-deductive scheme for justifying an induction leaves us with a question of the same kind as we started with, because every non-deductive scheme is fundamentally the same kind of argument as we were initially trying to justify. Let me put my point in one final way. Hume has sometimes been accused of setting an unreasonably high standard for inductive arguments to qualify as justified: that they be capable of being turned into deductive arguments [Stove, 1973; Mackie, 1974]. In other words, Hume has been accused of “deductive chauvinism”: as presupposing that only deductive arguments can justify. But Hume does not begin by insisting that deduction is the only non-circular way to justify induction. Hume argues for this by arguing that every non-deductive argument is of the same kind. If there were many distinct kinds of non-deductive arguments, Hume would not be able to suggest that any non-deductive defense of induction is circular. Hume’s argument, then, turns on the thought that every inductive argument is based on the same presupposition: that unobserved cases are similar to the cases that we have already observed. However, Nelson Goodman [1954] famously showed that such a “principle of the uniformity of nature” is empty. No matter
what the unobserved cases turn out to be like, there is a respect in which they are similar to the cases that we have already observed. Therefore, the “principle of the uniformity of nature” (even if we are entitled to it) is not sufficient to justify making one prediction rather than another on the basis of our observations. Different possible futures would continue different past regularities, but any possible future would continue some past regularity [Sober, 1988, pp. 63–69].

For example, Goodman says, suppose we have examined many emeralds and found each of them at the time of examination to be green. Then each of them was also “grue” at that time, where

Object x is grue at time t iff x is green at t where t is earlier than the year 3000 or x is blue at t where t is during or after the year 3000. (See footnote 10.)

Every emerald that we have found to be green at a certain moment we have also found to be grue at that moment. So if emeralds after 3000 are similar to examined emeralds in their grueness, then they will be blue, whereas if emeralds after 3000 are similar to examined emeralds in their greenness, then they will be green. Obviously, the point generalizes: no matter what the color(s) of emeralds after 3000, there will be a respect in which they are like the emeralds that we have already examined. The principle of the uniformity of nature is satisfied no matter how “disorderly” the world turns out to be, since there is inevitably some respect in which it is uniform. So the principle of the uniformity of nature is necessarily true; it is knowable a priori. The trouble is that it purchases its necessity by being empty. Thus, we can justify believing in the principle of the uniformity of nature. But this is not enough to justify induction. Indeed, by applying the “principle of the uniformity of nature” indiscriminately (both to the green hypothesis and to the grue hypothesis), we make inconsistent predictions regarding emeralds after 3000.
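The definition of “grue” given above is simple enough to write as a predicate. What follows is a minimal Python sketch, offered only as an illustration and not as anything found in Goodman or Hume: the function names, the representation of colors as strings, and the sample observation years are all assumptions. It checks that the same pre-3000 observations satisfy both the green regularity and the grue regularity, while the two regularities require different colors after 3000.

```python
CUTOFF_YEAR = 3000  # changeover year from the definition in the text


def is_green(color: str, year: int) -> bool:
    """An object is green at a time just in case its color then is green."""
    return color == "green"


def is_grue(color: str, year: int, cutoff: int = CUTOFF_YEAR) -> bool:
    """Grue at `year`: green if before the cutoff, blue if at or after it."""
    return color == "green" if year < cutoff else color == "blue"


def color_required_by(regularity: str, year: int, cutoff: int = CUTOFF_YEAR) -> str:
    """Color an emerald must have at `year` for the named regularity to continue."""
    if regularity == "green":
        return "green"
    if regularity == "grue":
        return "green" if year < cutoff else "blue"
    raise ValueError(f"unknown regularity: {regularity!r}")


# Hypothetical observations (color, year examined); the years are illustrative.
observations = [("green", 1954), ("green", 1988), ("green", 2011)]

all_green = all(is_green(c, y) for c, y in observations)
all_grue = all(is_grue(c, y) for c, y in observations)

if __name__ == "__main__":
    print("every examined emerald was green:", all_green)
    print("every examined emerald was grue: ", all_grue)
    for regularity in ("green", "grue"):
        print(regularity, "requires in 3001:", color_required_by(regularity, 3001))
```

On this toy evidence both `all_green` and `all_grue` come out true, so the observations alone do not favor projecting one regularity over the other.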
10 Here I have updated and simplified Goodman’s definition of “grue.” He defines an object as “grue” if it is green and examined before a given date in the distant future, or is blue otherwise. My definition, which is more typical of the way that Goodman’s argument is presented, defines what it takes for an object to be grue at a certain moment and does without the reference to the time at which the object was examined. Notice that whether an object is grue at a given moment before 3000 does not depend on whether the object is blue after 3000, just as to qualify as green now, an object does not need to be green later.

So to justify induction, we must justify expecting certain sorts of past uniformities rather than others to continue.

The same argument has often been made in terms of our fitting a curve through the data points that we have already accumulated and plotted on a graph. Through any finite number of points, infinitely many curves can be drawn. These curves disagree in their predictions regarding the data points that we will gather later. But no matter where those points turn out to lie, there will be a curve running through them together with our current data. Of course, we regard some of the curves passing through our current data as making arbitrary bends later (at the year 3000, for instance); we would not regard extrapolating those curves as justified. To justify induction requires justifying those extrapolations we consider
“straight” over those that make unmotivated bends. It might be alleged that of course, at any moment at which something is green, there is a respect in which it is like any other thing at any moment when it is green, whereas no property is automatically shared by any two objects while they are both grue; they must also both lie on the same side of the year 3000, so that they are the same color. Thus, “All emeralds are grue” is just a linguistic trick for papering over a non-uniformity and disguising it as a uniformity. But this move begs the question: why do green, blue, and other colors constitute respects in which things can be alike whereas grue, bleen, and other such “schmolors” do not? Even if there is some metaphysical basis for privileging green over grue, our expectation that unexamined emeralds are green, given that examined ones are, can be based on the principle of the uniformity of nature only if we already know that all green objects are genuinely alike. How could we justify that without begging the question? The principle of the uniformity of nature does much less to ground some particular inductive inference than we might have thought. At best, each inductive argument is based on some narrower, more specific presupposition about the respect in which unexamined cases are likely to be similar to examined cases. Therefore, Hume is mistaken in thinking that all inductive arguments are of the same kind in virtue of their all turning on the principle of the uniformity of nature. Hence, Hume has failed to show that it is circular to use one inductive argument to support another. Of course, even if this gap in Hume’s argument spoils Hume’s own demonstration that there is no possible way to justify induction, it still leaves us with another, albeit less decisive argument (to which I alluded a moment ago) against the possibility of justifying any particular inductive argument. 
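To return briefly to the curve-fitting form of Goodman’s point made earlier: it can be illustrated numerically. The following is a rough Python sketch under stated assumptions; the four data points, the choice of the line y = x as the “straight” curve, and all function names are hypothetical. It relies only on the fact that adding any multiple of the product (x - x1)(x - x2)... to a curve leaves its values at the data points x1, x2, ... unchanged.

```python
from math import prod

# Hypothetical data gathered so far; every point lies on the line y = x.
DATA_X = [0.0, 1.0, 2.0, 3.0]
DATA_Y = [0.0, 1.0, 2.0, 3.0]


def straight(x: float) -> float:
    """The curve we would ordinarily extrapolate."""
    return x


def bent(x: float, c: float) -> float:
    """A rival curve: the added term is zero at every data point."""
    return x + c * prod(x - xi for xi in DATA_X)


def bend_needed(new_x: float, new_y: float) -> float:
    """Coefficient c that makes `bent` pass through a future point.

    Assumes new_x is not one of the existing data x-values.
    """
    return (new_y - new_x) / prod(new_x - xi for xi in DATA_X)


def fits_data(curve) -> bool:
    """True if the curve passes exactly through every current data point."""
    return all(curve(x) == y for x, y in zip(DATA_X, DATA_Y))


if __name__ == "__main__":
    c = bend_needed(4.0, 100.0)  # suppose the next point turned out to be (4, 100)
    rival = lambda x: bent(x, c)
    print("both fit current data:", fits_data(straight), fits_data(rival))
    print("predictions at x = 4:", straight(4.0), rival(4.0))
```

Whatever value is supplied for the future point, `bend_needed` yields a curve that agrees with all current data and also passes through that point, which is why agreement with the data cannot by itself select the “straight” extrapolation.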
Rather than argue that any inductive justification of induction is circular, we can offer a regress argument. If observations of past slices of bread justify our high confidence that the next slice of bread we eat will be nutritious, then what justifies our regarding those past observations of bread as confirming that the next slice of bread we eat will be nutritious? Apparently, other observations justify our believing in this link between our bread observations and our bread predictions — by justifying our high confidence that unexamined slices of bread are similar in nutritional value to already observed slices of bread. But whatever those other observations are, this bread uniformity does not follow deductively from them. They confirm it only by virtue of still other observations, which justify our believing in this link between certain observations and the bread uniformity. But what justifies our regarding those observations, in turn, as confirming this link, i.e., as confirming that the bread uniformity is likely if the first set of observations holds? We are off on a regress. A given inductive link is justified (if at all) only by observations, but those observations justify that link (if at all) only through another inductive link, which is justified (if at all) only by observations, which justify that link (if at all) only through another inductive link. . . . How can it end? If all non-deductive arguments can be justified only by way of observations, then any argument that
we might use to justify a given non-deductive argument can itself be justified only by way of observations, and those observations could justify that argument only by an argument that can be justified only by way of other observations, and those observations could justify that argument only by an argument that can be justified only by way of still other observations, and so on infinitely. No bottom ever seems to be reached, so none of these arguments is actually able to be justified.

In other words, any non-deductive argument rests upon high confidence in some contingent (i.e., non-necessary) truth (such as that slices of bread are generally alike nutritionally) linking its conclusion to its premises. We cannot use a non-deductive argument to justify this confidence without presupposing high confidence in some other contingent truth. We have here a close cousin of Hume’s argument, one that leads to the same conclusion, but through a regress rather than a circle.11 Even if there is no single “principle of the uniformity of nature” on which every inductive argument rests, there remains a formidable argument that no inductive argument can be justified inductively.12

11 Perhaps it is even Hume’s argument. See the passage I quoted earlier from T, p. 91, where Hume’s argument takes the form of a regress.

12 Contrast John Norton [2003], who gives similar arguments that there is no general rule of inductive inference. Norton contends that different inductions we draw are grounded on different opinions we have regarding various particular contingent facts (e.g., that samples of chemical elements are usually uniform in their physical properties). He concludes that there is no special problem of induction. There is only the question of how there can be an end to the regress of justifications that begins with the demand that we justify those opinions regarding particular contingent facts on which one of our inductive arguments rests.

5 THREE WAYS OF REJECTING HUME’S PROBLEM

Let’s sum up Hume’s argument. We cannot use an inductive argument to justify inferring an inductive argument’s conclusion from its premises, on pain of circularity. We cannot use a deductive argument to justify doing so, because there is no deductive argument from the premises of an inductive argument to its conclusion. So there is no way to justify the step from an inductive argument’s premises to its conclusion.

Hume, according to the standard interpretation, holds that we are not entitled to our opinions regarding what we have not observed; those opinions are unjustified. This is a bit different from the conclusion that there is no way to justify that step, that is, that no successful (e.g., non-question-begging) argument can be given for it. Perhaps there is no argument by which induction can be justified, but we are nevertheless justified in using induction, and so we are entitled to the opinions that we arrive at inductively. In this section, I shall look briefly at some forms that this view has taken.

It has often been thought that certain beliefs are justified even though there is no argument by which they acquire their justification; they are “foundational”. Many epistemologists have been foundationalists, arguing that unless certain beliefs are
justified without having to inherit their justification by inference from other beliefs that already possess justification, none of our beliefs is justified. (After all, a regress seems to loom if every belief acquires its justification from other beliefs that acquire their justification from other beliefs. . . . How could this regress end except with beliefs that are justified without having to have acquired their justification from other beliefs? This regress argument poses one of the classic problems of epistemology.) The beliefs that we acquire directly from making certain observations have often been considered foundational. Another kind of belief that has often been considered foundational consists of our beliefs in certain simple propositions that we know a priori, from which we infer the rest of our a priori knowledge — for example, that a person is tall if she is tall and thin. We rest our knowledge of this fact on no argument. It has sometimes been maintained that we just “see” — by a kind of “rational insight” — that this fact obtains (indeed, that it is necessary). Some philosophers have suggested that the proper lesson to take from Hume’s argument is that induction is likewise foundational. For instance, Bertrand Russell [1959, pp. 60-69] offers an inductive principle and suggests that it does not need to rest on anything to be justified. It is an independent, fundamental rule of inference.13 But this approach has all of the advantages of theft over honest toil.14 It fails to explain why induction rather than some alternative is a fundamental rule of inference. It does not tell us why we should expect the products of inductive reasoning to be true. It tries to make us feel better about having no answer to Hume’s problem — but fails. As Wesley Salmon writes: This is clearly an admission of defeat regarding Hume’s problem, but it may be an interesting way to give up on the problem. 
The search for the weakest and most plausible assumptions sufficient to justify alternative inductive methods may cast considerable light upon the logical structure of scientific inference. But, it seems to me, admission of unjustified and unjustifiable postulates to deal with the problem is tantamount to making scientific method a matter of faith. [Salmon, 1967, pp. 47–8]

13 Russell offers the following as a primitive inductive principle: “(a) When a thing of a certain sort A has been found to be associated with a thing of a certain other sort B, and has never been found dissociated from a thing of the sort B, the greater the number of cases in which A and B have been associated, the greater is the probability that they will be associated in a fresh case in which one of them is known to be present; (b) Under the same circumstances, a sufficient number of cases of association will make the probability of a fresh association nearly a certainty, and will make it approach certainty without limit.” [1959, p. 66; cf. Russell, 1948, pp. 490–1] Of course, this principle is vulnerable to Goodman’s “grue” problem. Other approaches offering primitive inductive principles are Mill’s [1872] “axiom of the uniformity of the course of nature” and Keynes’ [1921] presumption of “limited independent variety”.

14 The phrase is Russell’s: “The method of ‘postulating’ what we want has many advantages; they are the same as the advantages of theft over honest toil.” [1919, p. 71]

When philosophers have identified certain sorts of beliefs as foundational, they have generally offered some positive account of how those beliefs manage to be
non-inferentially justified (e.g., of how certain of us qualify as able to make certain kinds of observations, or of how we know certain facts a priori). Simply to declare that induction counts as good reasoning seems arbitrary. A similar problem afflicts the so-called “ordinary-language dissolution” of the problem of induction. Many philosophers have suggested that induction is a fundamental kind of reasoning and that part of what we mean by evidence rendering a given scientific theory “justified”, “likely”, “well supported”, and so forth is that there is a strong inductive argument for it from the evidence. Hence, to ask “Why is inductive reasoning able to justify?” is either to ask a trivial question (because by definition, inductive reasoning counts as able to justify) or to ask a meaningless question (because, in asking this question, we are not using the word “justify” in any familiar, determinate sense). As P.F. Strawson remarks: It is an analytic proposition that it is reasonable to have a degree of belief in a statement which is proportional to the strength of the evidence in its favour; and it is an analytic proposition, though not a proposition of mathematics, that, other things being equal, the evidence for a generalization is strong in proportion as the number of favourable instances, and the variety of circumstances in which they have been found, is great. So to ask whether it is reasonable to place reliance on inductive procedures is like asking whether it is reasonable to proportion the degree of one’s convictions to the strength of the evidence. Doing this is what ‘being reasonable’ means in such a context. . . . In applying or withholding the epithets ‘justified’, well founded’, &c., in the case of specific beliefs, we are appealing to, and applying, inductive standards. But to what standards are we appealing when we ask whether the application of inductive standards is justified or well grounded? 
If we cannot answer, then no sense has been given to the question. [Strawson, 1952, pp. 256–7]15

15 Cf. [Horwich, 1982, pp. 97–98; Salmon, Barker, and Kyburg, 1965]. For critique of this view, I am especially indebted to [BonJour, 1998, pp. 196–199; Salmon, 1967, pp. 49–52; and Skyrms, 1986, pp. 47–54].

In contending that it is either trivial or meaningless to ask for a justification of induction, the ordinary-language approach does not purport to “solve” the problem of induction, but rather to “dissolve it”: to show that the demand for a justification of induction should be rejected. One might reply that this line of thought offers us no reason to believe that the conclusions of strong inductive arguments from true premises are likely to be true. But the ordinary-language theorist disagrees: that these conclusions are the conclusions of strong inductive arguments from true premises is itself a good reason to believe that they are likely to be true. What else could we mean by a “good reason” than the kind of thing that we respect as a good reason, and what’s more respectable than induction? In his Philosophical Investigations,
Ludwig Wittgenstein recognizes that we may feel the need for a standard that grounds our standards for belief, but he urges us to resist this craving:

480. If it is now asked: But how can previous experience be a ground for assuming that such-and-such will occur later on? – the answer is: What general concept have we of grounds for this kind of assumption? This sort of statement about the past is simply what we call a ground for assuming that this will happen in the future. . . .

481. If anyone said that information about the past could not convince him that something would happen in the future, I should not understand him. One might ask him: what do you expect to be told, then? What sort of information do you call a ground for such a belief? . . . If these are not grounds, then what are grounds? – If you say these are not grounds, then you must surely be able to state what must be the case for us to have the right to say that there are grounds for our assumption. . . .

482. We are misled by this way of putting it: ‘This is a good ground, for it makes the occurrence of the event probable.’ That is as if we had asserted something further about the ground, which justified it as a ground; whereas to say that this ground makes the occurrence probable is to say nothing except that this ground comes up to a particular standard of good grounds – but the standard has no grounds! . . .

484. One would like to say: ‘It is a good ground only because it makes the occurrence really probable.’ . . .

486. Was I justified in drawing these consequences? What is called a justification here? – How is the word ‘justification’ used? . . .

[Wittgenstein, 1953; cf. Rhees and Phillips, 2003, pp. 73–77]

But this argument makes the fact that induction counts for us as “good reasoning” seem utterly arbitrary. We have not been told why we should respect induction in this way. We have simply been reminded that we do.
If part of what “good reason” means is that inductive reasons qualify as good, then so be it. We can still ask why we ought to have a term that applies to inductive arguments (and not to bogus arguments instead or in addition) and where a consequence of its applying to some argument is that we ought to endorse that argument. The mere fact that these circumstances of application and consequences of application are coupled in the meaning of “good reason” cannot prevent us from asking why they ought to be coupled — just as (to use Michael Dummett’s example) the term “Boche” has “German” as its circumstance of application and “barbaric” as its consequence of application, so that it is contradictory to say “The Boche are not really barbaric” or “The Germans are not really the Boche”, but we can still ask why we ought (not) to have “Boche” in our language. [Dummett,
1981, p. 454; Brandom, 1994, pp. 126–127]. It is not contradictory to conclude that we should not use the term “Boche” because it is not true that the Germans are barbaric. Analogously, we can ask why we ought to have a term like “good argument” if an inductive argument automatically qualifies as a “good argument” and any “good argument” is automatically one that conveys justification from its premises to its conclusion. Without some account of why those circumstances of application deserve to go with those consequences of application, we have no reason to put them together — and so no reason to think better of arguments that qualify by definition as “good.” As Salmon says, It sounds very much as if the whole [ordinary-language] argument has the function of transferring to the word ‘inductive’ all of the honorific connotations of the word ‘reasonable’, quite apart from whether induction is good for anything. The resulting justification of induction amounts to this: If you use inductive procedures you can call yourself ‘reasonable’ — and isn’t that nice! [Salmon, 1957, p. 42; cf. Strawson, 1958] It does not show us why we ought to be “reasonable”. However, the ordinary-language dissolutionist persists, to ask why we ought to use the term “good reason” — why we ought to couple its circumstances and consequences of application — is just to ask for good reasons for us to use it. We cannot find some point outside of all of our justificatory standards from which to justify our standards of justification. What standards of justification do we mean when we demand a justification of induction? If an argument meets some “standards of justification” that are not our usual ones, then it does not qualify as a justification. On the other hand, if it meets our usual standards of justification, then (since, Hume showed, no deductive argument can succeed in justifying induction) the argument will inevitably be inductive and so beg the question, as Hume showed. 
But we do not need to specify our standards of justification in advance in order for our demand for a justification of induction to make sense. We know roughly what a justification is, just as we know roughly what it is for an argument to beg the question. A justification of induction would consist of an argument that we believe is properly characterized as justificatory — an argument that, we can show, meets the same standards as the familiar arguments that we pretheoretically recognize as justificatory. In showing this, we may be led to new formulations of those standards — formulations that reveal features that had been implicit in our prior use of “justification”. When we ask for a justification of induction, we are not trying to step entirely outside of our prior standards of justification, but at the same time, we are not asking merely to be reminded that induction is one of the kinds of reasoning that we customarily recognize as good. Rather, we are asking for some independent grounds for recognizing induction as good — a motivation for doing so that is different enough to avoid begging the question, but not so different that it is unrecognizable as a justification. Of course, it may be unclear
Marc Lange
what sort of reasoning could manage to walk this fine line until we have found it. But that does not show that our demand for a justification of induction is trivial or meaningless. Here is an analogy. Suppose we want to know whether capital punishment counts as “cruel” in the sense in which the United States Constitution, the English Bill of Rights, and the Universal Declaration of Human Rights outlaw cruel punishment. One might argue that since capital punishment was practiced when these documents were framed (and is practiced today in the United States), “cruel” as they (and we) mean it must not apply to capital punishment. But the question of whether capital punishment is cruel cannot be so glibly dismissed as trivial (if we mean “cruel” in our sense) or meaningless (if we mean “cruel” in some other, unspecified sense) — and could not be so dismissed even if no one had ever thought that capital punishment is cruel. What we need, in order to answer the question properly, is an independent standard of what it takes for some punishment to qualify as “cruel”. The standard must fit enough of our pretheoretic intuitions about cruel punishment (as manifested, for example, in legal precedents) that we are justified in thinking that it has managed to make this notion more explicit, rather than to misunderstand it. Furthermore, the standard must derive its credentials independently from whatever it says about capital punishment, so that we avoid begging the question in using this standard to judge whether capital punishment qualifies as cruel. Of course, prior to being sufficiently creative and insightful to formulate such a standard, we may well be unable to see how it could be done. But that is one reason why it takes great skill to craft good arguments for legal interpretations — and, analogously, why it is difficult to address Hume’s problem. 
Another approach that deems induction to be good reasoning, even though no non-question-begging argument can be given to justify it, appeals to epistemological naturalism and externalism. On this view, if inductive reasoning from true premises does, in fact, tend to lead to the truth, then an inductive argument has the power to justify its conclusion even though the reasoner has no non-circular basis for believing that the conclusions of inductive arguments from true premises are usually true [Brueckner, 2001; Kornblith, 1993; Papineau, 1993, pp. 153–160; Sankey, 1997; van Cleve, 1984]. To my mind, this approach simply fails to engage with Hume’s problem of induction. The externalist believes that we qualify as having good reasons for our opinions regarding the future as long as inductive arguments from true premises do in fact usually yield the truth regarding unexamined cases. But the problem of induction was to offer a good reason to believe that a given inductive argument from true premises will likely yield the truth regarding unexamined cases. Suppose the externalist can persuade us that to be justified in some belief is to arrive at it by reliable means. Then we are persuaded that if induction is actually reliable, then the conclusion of an inductive argument (from justified premises) is justified. We are also persuaded that if induction actually is reliable, then an inductive reasoner is justified in her belief (arrived at inductively, from the frequent success of past
inductive inferences) that induction will continue to be reliable. Nevertheless, the externalist has not persuaded us that induction is reliable.

6 HUME’S CONCLUSION
Hume, according to the standard interpretation of his view, is an “inductive skeptic”: he holds that we are not entitled to our opinions regarding what we have not observed. There are plenty of textual grounds for this interpretation. For example, in a passage that we have already quoted (T , p. 91), he says that an inductive argument for induction has “no just foundation”, suggesting that his main concern is whether induction has a just foundation. Sometimes Hume appears to concede induction’s justification: I shall allow, if you please, that the one proposition [about unexamined cases] may justly be inferred from the other [about examined cases]: I know in fact, that it always is inferred. (E, p. 22). But his “if you please” suggests that this concession is merely rhetorical – for the sake of argument. His point is that someone who believes that we are justified in having expectations regarding unexamined cases should be concerned with uncovering their justification. When Hume finds no justification, he concludes that these expectations are unjustified. Accordingly, Hume says, our expectations are not the product of an “inference” (E, p. 24) or some “logic” (E, p. 24) or “a process of argument or ratiocination” (E, p. 25): it is not reasoning which engages us to suppose the past resembling the future, and to expect similar effects from causes, which are, to appearance, similar. (E, p. 25) Rather, Hume says, our expectations regarding what we have not observed are the result of the operation of certain innate “instincts” (E, pp. 30, 37, 110). When (for example) the sight of fire has generally been accompanied by the feeling of heat in our past experience, these instinctual mental mechanisms lead us, when we again observe fire, to form a forceful, vivid idea of heat — that is, to expect heat: What then is the conclusion of the whole matter? A simple one; though, it must be confessed, pretty remote from the common theories of philosophy. 
All belief of matter of fact or real existence is derived merely from some object, present to the memory or senses, and a customary conjunction between that and some other object. Or in other words; having found, in many instances, that any two kinds of objects, flame and heat, snow and cold, have always been conjoined together; if flame or snow be presented anew to the senses, the mind is carried by custom to expect heat or cold, and to believe, that such
a quality does exist, and will discover itself upon a nearer approach. This belief is the necessary result of placing the mind in such circumstances. It is an operation of the soul, when we are so situated, as unavoidable as to feel the passion of love, when we receive benefits; or hatred, when we meet with injuries. All these operations are a species of natural instincts, which no reasoning or process of the thought and understanding is able, either to produce, or to prevent. (E, p. 30) With prose like that, is it any wonder that Hume’s argument has become such a classic? Hume believes that we cannot help but form these expectations, in view of the way our minds work. So Hume does not recommend that we try to stop forming them. Any such attempt would be in vain. But Hume’s failure to recommend that we try to resist this irresistible psychological tendency should not lead us to conclude (with Garrett [1997]) that Hume believes our expectations to be justified or that Hume is uninterested in evaluating their epistemic standing. Hume sometimes uses normative-sounding language in giving his naturalistic, psychological account of how we come by our expectations: [N]one but a fool or a madman will ever pretend to dispute the authority of experience, or to reject that great guide of human life. . . (E, p. 23) But by “authority” here, he presumably means nothing normative, but merely the control or influence that experience in fact exercises over our expectations — experience’s “hold over us”.16 As Hume goes on to explain on the same page, he wants “to examine the principle of human nature, which gives this mighty authority to experience. . . ” That “principle of human nature” has no capacity to justify our expectations; it merely explains them. (And since it is irresistible, none can reject it and none but a fool or a madman will pretend to reject it.) 
16 That “authority” here should be interpreted as brute power to bring about rather than entitlement (“rightful authority”) to bring about is evident from other passages: “If the mind be not engaged by argument to make this step, it must be induced by some other principle of equal weight and authority [namely, custom]. . . ” (E, p. 27). An interpretation of “authority” as normative has led some to regard Hume not as an inductive skeptic, but instead as offering a reductio of some particular conception of knowledge on the grounds that it would deem what we know inductively not to be knowledge: “Far from being a skeptical challenge to induction, Hume’s ‘critique’ is little more than a prolonged argument for the general position that Newton’s inductive method must replace the rationalistic model of science” according to which a priori reasoning is “capable of deriving sweeping factual conclusions” [Beauchamp and Rosenberg, 1981, p. 43]. See also [Smith, 1941; Stroud, 1977].

Hume often terms this principle of the association of ideas “custom” or “habit”:

’Tis not, therefore, reason which is the guide of life, but custom. That alone determines the mind, in all instances, to suppose the future conformable to the past. However easy this step may seem, reason would never, to all eternity, be able to make it. (T, p. 652)
For wherever the repetition of any particular act or operation produces a propensity to renew the same act or operation, without being impelled by any reasoning or process of the understanding; we always say, that this propensity is the effect of Custom. By employing that word, we pretend not to have given the ultimate reason of such a propensity. We only point out a principle of human nature, which is universally acknowledged, and which is well known by its effects. . . . [A]fter the constant conjunction of two objects, heat and flame, for instance, weight and solidity, we are determined by custom alone to expect the one from the appearance of the other. (E, p. 28) Hume is thus offering a theory of how our minds work.17 His “arguments” for this scientific theory are, of course, inductive. What else could they be? For instance, Hume points out that in the cases we have seen, a correlation in some observer’s past observations (such as between seeing fire and feeling heat) is usually associated with that observer’s forming a certain expectation in future cases (e.g., expecting heat, on the next occasion of seeing fire). Having noted this association between an observer’s expectations and the correlations in her past observations, Hume extrapolates the association; Hume is thereby led to form certain expectations regarding what he has not yet observed, and so to believe in a general “principle of human nature”.18 Some interpreters have suggested that since Hume is here using induction, he must believe that he is (and we are) entitled to do so — and so that induction is justified.19 However, in my view, Hume’s use of induction shows no such thing. Hume says that we cannot help but form expectations in an inductive way — under the sway (“authority”) of certain mental instincts. Hume’s behavior in forming his own expectations regarding the expectations of others is just one more example of these mental instincts in action. 
17 As we have seen, the “principle of the uniformity of nature” must be applied selectively, on pain of leading to contradiction. So insofar as Hume’s theory of the mind incorporates such a principle as governing the association of ideas, it does not suffice to account for our expectations.

18 By analogous means, Hume arrives at other parts of his theory, such as that every simple idea is a copy of a prior impression, that there are various other principles of association among ideas, etc.

19 See, for instance, [Garrett, 1997], according to which Hume’s main point is not to give “an evaluation of the epistemic worth of inductive inferences” (p. 94) but rather to do cognitive psychology: to identify the component of the mind that is responsible for those opinions (imagination rather than reason). For more discussion of this line of interpretation, see [Read and Richman, 2000].

So Hume’s own belief in his theory of human nature is accounted for by that very theory. Like his expectations regarding the next slice of bread he will eat, his expectations regarding human belief-formation fail to suggest that Hume regards our expectations regarding what we have not observed to be justified. By the same token, Hume notes that as we become familiar with cases where someone’s expectations regarding what had not yet been observed turn out to be accurate and other cases where those expectations turn out not to be met, we notice the features associated with these two sorts of cases. We then tend to be
guided by these associations in forming future expectations. This is the origin of the “Rules by which to judge of causes and effects” that Hume elaborates (T, p. 173): “reflexion on general rules keeps us from augmenting our belief upon every encrease of the force and vivacity of our ideas” (T, p. 632). We arrive at these “rules” by way of the same inductive instincts that lead us to form our other expectations regarding what we have not observed. Some have argued that Hume’s offering these rules shows that Hume is not an inductive skeptic, since if there are rules distinguishing stronger from weaker inductive arguments, then such arguments cannot all be bad.20 But as I have explained, Hume’s endorsement of these rules does not mean that he believes that expectations formed in accordance with them are justified. The rules do not distinguish stronger from weaker inductive arguments. Rather, they result from our instinctively forming expectations regarding the expectations we tend to form under various conditions.

Today, in the wake of Darwin’s theory of evolution by natural selection, we might argue that natural selection has equipped us with various innate belief-forming instincts. Creatures with these instincts stood at an advantage in the struggle for existence, since these instincts gave them accurate expectations and so enabled them to reproduce more prolifically. But once again, this theory does not solve Hume’s problem; it does not justify induction. To begin with, we have used induction of some kind to arrive at this scientific explanation of the instincts’ origin. Furthermore, even if the possession of a certain belief-forming instinct was advantageous to past creatures because it tended to lead to accurate predictions, we would need to use induction to justify regarding the instinct’s past predictive success as confirming its future success.

20 See prior note.

7 BONJOUR’S A PRIORI JUSTIFICATION OF INDUCTION

Laurence BonJour [1986; 1998, pp. 203–216] has maintained that philosophers have been too hasty in accepting Hume’s argument that there is no a priori means of proceeding from the premises of an inductive argument to its conclusion. BonJour accepts that there is no contradiction in an inductive argument’s premises being true and its conclusion false. But BonJour rejects Hume’s view that the only truths that can be established a priori are truths that hold on pain of contradiction. BonJour is inclined to think that there is something right in the view that only necessary truths can be known a priori. But again, he believes that there are necessary truths that are not analytic (i.e., necessary truths the negations of which are not contradictions). BonJour concedes that we cannot know a priori that anything like the “principle of the uniformity of nature” holds. However, he thinks that a good inductive argument has as its premise not merely that the fraction of Gs among examined Fs is m/n, but something considerably less likely to be a coincidence: that the fraction converged to m/n and has since remained approximately steady as more
(and more diverse) F s have been examined. BonJour suggests that (when there is no relevant background information on the connection between being F and being G or on the incidence of Gs among F s) we know by a priori insight (for certain properties F and G) that when there has been substantial variation in the locations and times at which our observations were made, the character of the observers, and other background conditions, the fraction m/n is unlikely to remain steady merely as a brute fact (i.e., a contingent fact having no explanation) or just by chance — e.g., by there being a law that approximately r/n of all F s are G, but “by chance” (analogous to a fair coin coming up much more often heads than tails in a long run of tosses), the F s we observed were such that the relative frequency of Gs among them converged to a value quite different from r/n and has since remained about there. We recognize a priori that it is highly unlikely that any such coincidence is at work. Moreover, as the observed F s become ever more diverse, it eventually becomes a priori highly unlikely that the explanation for the steady m/n fraction of Gs among them is that although it is not the case that there is a law demanding that approximately m/n of all F s are G, the F s that we have observed have all been Cs and there is a law demanding that approximately m/n of all F Cs are G. As the pool of observed F s becomes larger and more diverse, it becomes increasingly a priori unlikely that our observations of F s are confined only to Cs where the natural laws demand that F Cs behave differently from other F s. For that to happen would require an increasingly unlikely coincidence: a coordination between the range of our observations and the natural laws. 
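The fair-coin analogy in the passage above can be made concrete with a small calculation. The following Python sketch is an illustration only and is not part of BonJour's argument. It assumes independent tosses of a fair coin (an assumption that itself presupposes a probability model) and computes the exact probability that heads come up in at least 60% of n tosses. The function name `prob_heads_at_least` and the 60% threshold are hypothetical choices made for the example.

```python
from math import comb


def prob_heads_at_least(n: int, k_min: int) -> float:
    """Exact probability that a fair coin shows at least k_min heads in n tosses.

    Assumes independent tosses with probability 1/2 of heads each
    (a modelling assumption made for this illustration).
    """
    # Count the equally likely toss sequences with k_min or more heads.
    favourable = sum(comb(n, k) for k in range(k_min, n + 1))
    # Divide by the total number of sequences, 2**n.
    return favourable / 2**n


# How likely is a "lopsided" run (at least 60% heads) as the run gets longer?
for n, k_min in ((10, 6), (100, 60), (1000, 600)):
    print(n, prob_heads_at_least(n, k_min))
```

Under these assumptions the probability of a persistent 60% run of heads shrinks rapidly as n grows, which is the sense in which a long-run departure from the true rate counts as an unlikely coincidence. The calculation takes the probability model for granted, so it illustrates BonJour's claim rather than supporting it against Hume.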
(Analogous arguments apply to other sorts of possible explanations of the steady m/n fraction of Gs among the observed Fs, such as that the Fs we observed in each interval happened to consist of about the same fraction of Cs, and the laws assign a different likelihood to FCs being G than to F∼Cs being G.) Thus, in the case of a good inductive argument, it is a priori likely (if the act of observing an F is not itself responsible for its G-hood) that our evidence holds only if it is a law that approximately m/n of all Fs are G. In other words, we know a priori that the most likely explanation of our evidence is the “straight inductive explanation”. BonJour does not say much about the likelihoods that figure in these truths that we know a priori. They seem best understood as “logical probabilities” like those posited by John Maynard Keynes [1921] and Rudolf Carnap [1950], among others: logical, necessary, probabilistic relations obtaining among propositions just in virtue of their content.21

21 BonJour [personal communication] is sympathetic to the logical interpretation of probability.

Just as we know by rational insight that the premises of a deductive argument can be true only if the conclusion is true, so likewise (BonJour seems inclined to say) we know by rational insight that the premises of a good inductive argument make its conclusion highly likely. As Keynes wrote:

Inasmuch as it is always assumed that we can sometimes judge directly that a conclusion follows from a premiss, it is no great extension of this assumption to suppose that we can sometimes recognize that a
conclusion partially follows from, or stands in a relation of probability to a premiss. [Keynes, 1921, p. 52] Presumably, part of what we grasp in recognizing these probabilistic relations is that (in the absence of other relevant information) we should have high confidence in any proposition that stands to our evidence in a relation of logically high probability. But I wonder why we should. If a conclusion logically follows from our evidence, then we should believe the conclusion because it is impossible for the premises to be true without the conclusion being true. But we cannot say the same in the case of a conclusion that is merely “highly logically probabilified” by our evidence. To say that the conclusion is made likely, and so we should have great confidence that it is true, is to risk punning on the word “likely”. There is some sort of logical relation between the premises and the conclusion, and we call this relation “high logical probabilification” presumably because it obeys the axioms of probability and we think that (in the absence of other relevant information) we ought to align our subjective degrees of probability (i.e., our degrees of confidence) with it. But then this relation needs to do something to deserve being characterized as “high logical probabilification”. What has it done to merit this characterization? The problem of justifying induction then boils down to the problem of justifying the policy of being highly confident in those claims that stand in a certain logical relation to our evidence. Calling that relation “high logical probabilification” or claiming rational insight into that relation’s relevance to our assignment of subjective probability does not reveal what that relation does to merit our placing such great weight upon it. Why should my personal probability distribution be one of the “logical” probability functions? How do we know that my predictions would then tend to be more accurate? 
(To say that they are then “likely” to be more accurate, in the sense of “logical” probability, is to beg the question.) Let’s turn to a different point. BonJour recognizes that no matter how many F s we observe, there will always be various respects C in which they are unrepresentative of the wider population of F s. (All F s so far observed existed sometime before tomorrow, to select a cheap example.) Of course, we can never show conclusively that the steady m/n frequency of Gs among the observed F s does not result from a causal mechanism responsible for the G-ness of F Cs but not for the G-ness of other F s. That concession is no threat to the justification of induction, since strong inductions are not supposed to be proofs. But how can we be entitled even to place high confidence in the claim that there is no such C? BonJour writes (limiting himself to spatial Cs for the sake of the example): [Our data] might be skewed in relation to some relevant factor C . . . because C holds in the limited area in which all the observations are in fact made, but not elsewhere. It is obviously a quite stubborn empirical fact that all of our observations are made on or near the surface of the earth, or, allowing for the movement of the earth, in the general region of the solar system, or at least in our little corner of the galaxy, and
it is possible that C obtains there but not in the rest of the universe, in which case our standard inductive conclusion on the basis of those observations would presumably be false in relation to the universe as a whole, that is, false simpliciter. . . . The best that can be done, I think, is to point out that unless the spatio-temporal region in which the relevant C holds is quite large, it will still be an unlikely coincidence that our observations continue in the long run to be confined to that region. And if it is quite large, then the inductive conclusion in question is in effect true within this large region in which we live, move, and have our cognitive being. [BonJour, 1998, p. 215] BonJour seems to be saying that we know a priori that it would be very unlikely for all of our observations so far to have been in one spatiotemporal region (or to have been made under one set of physical conditions C) but for the laws of nature to treat that region (or those conditions) differently from the region in which (or conditions under which) our next observation will be made. But this does not seem very much different from purporting to know a priori that considering the diversity of the F s that we have already examined and the steady rate at which Gs have arisen among them, the next case to be examined will probably be like the cases that we have already examined. Inevitably, there will be infinitely many differences between the F s that we have already examined and the F s that we will shortly examine (if we are willing to resort to “gruesome” respects of similarity and difference). Is the likely irrelevance of these differences (considering the irrelevance of so many other factors, as manifested in the past steadiness of the m/n fraction) really something that we could know a priori? 
Isn’t this tantamount to our knowing a priori (at least for certain properties F and G) that if it has been the case at every moment from some long past date until now that later Fs were found to be Gs at the same rate as earlier Fs, then (in the absence of other relevant information) it is likely that the Fs soon to be examined will be Gs at the same rate as earlier Fs? That seems like helping ourselves directly to induction a priori. Consider these two hypotheses:

(h) A law requires that approximately m/n of all Fs are G;

(k) A law requires that approximately m/n of all Fs before today are G but that no Fs after today are G.

On either of these hypotheses, it would be very likely that approximately m/n of any large, randomly-selected sample of Fs before today will be G.22 Do we really have a priori insight into which of these hypotheses provides the most likely explanation of this fact? BonJour apparently thinks that we do, at least for certain Fs and Gs, since (k) would require an a priori unlikely coordination between the present range of our observations and the discriminations made by the natural laws. Moreover, insofar as (k) is changed so that the critical date is pushed back from today to the year 3000 (or 30,000, or 300,000. . . ), (k)’s truth would require less of an a priori unlikely coordination between the laws and the present range of our observations, but, BonJour seems to be saying, it then becomes increasingly the case that “the inductive conclusion in question is in effect true within this large region in which we live, move, and have our cognitive being.”

22 See section 10.

8 REICHENBACH’S PRAGMATIC JUSTIFICATION OF INDUCTION
Hans Reichenbach [1938, pp. 339–363; 1949a, pp. 469–482; 1968, pp. 245–246] has proposed an intriguing strategy for justifying induction. (My discussion is indebted to [Salmon, 1963].) Reichenbach accepts Hume’s argument that there is no way to show (without begging the question) that an inductive argument from true premises is likely to lead us to place high confidence in the truth. However, Reichenbach believes that we can nevertheless justify induction by using pragmatic (i.e., instrumental, means-ends) reasoning to justify the policy of forming our expectations in accordance with an inductive rule. Of course, Reichenbach’s argument cannot justify this policy by showing that it is likely to lead us to place high confidence in the truth, since that approach is blocked by Hume’s argument. Reichenbach believes that to justify an inductive policy, it is not necessary to show that induction will probably be successful, or that induction is more likely to succeed than to fail, or even that the claims on which induction leads us to place high confidence are at least 10% likely to be true. It suffices, Reichenbach thinks, to show that if any policy for belief-formation will do well in leading us to the truth, then induction will do at least as well. That is, Reichenbach believes that to justify forming our opinions by using induction, it suffices to show that no other method can do better than induction, even if we have not shown anything about how well induction will do. Induction is (at least tied for) our best hope, according to Reichenbach, though we have no grounds for being at all hopeful.23 BonJour [1998, pp. 194–196] has objected that Reichenbach’s argument cannot justify induction because it does not purport to present us with good grounds for believing that induction will probably succeed. It does not justify our believing in the (likely) truth of the claims that receive great inductive support.
23 Feigl [1950] distinguishes “validating” a policy (i.e., deriving it from more basic policies) from “vindicating” it (i.e., showing that it is the right policy to pursue in view of our goal). A fundamental policy cannot be validated; it can only be vindicated. Accordingly, Reichenbach is often interpreted as purporting to “vindicate” induction.

It purports to give us pragmatic rather than epistemic grounds for forming expectations in accordance with induction. Surely, BonJour says, if we are not entitled to believe that a hypothesis that has received great inductive support is likely to be true, then we have not really solved Hume’s problem. Reichenbach writes:

A blind man who has lost his way in the mountains feels a trail with his stick. He does not know where the path will lead him, or whether it may take him so close to the edge of a precipice that he will be plunged into the abyss. Yet he follows the path, groping his way step by step; for if there is any possibility of getting out of the wilderness, it is by
feeling his way along the path. As blind men we face the future; but we feel a path. And we know: if we can find a way through the future it is by feeling our way along this path. [Reichenbach, 1949a, p. 482]

BonJour replies:

We can all agree that the blind man should follow the path and that he is, in an appropriate sense, acting in a justified or rational manner in doing so. But is there any plausibility at all to the suggestion that when we reason inductively, or accept the myriad scientific and commonsensical results that ultimately depend on such inference, we have no more justification for thinking that our beliefs are likely to be true than the blind man has for thinking that he has found the way out of the wilderness? [BonJour, 1998, pp. 195–196]

BonJour’s objection illustrates how much Reichenbach is prepared to concede to Hume. Reichenbach’s point is precisely that an agent “makes his posits because they are means to his end, not because he has any reason to believe in them.” [Reichenbach, 1949b, p. 548]24

The policy for forming our expectations that Reichenbach aims to justify is the “straight rule”: If n Fs have been examined and m have been found to be G, then take m/n to equal (within a certain degree of approximation) the actual fraction of Gs among Fs, or, if there are infinitely many Fs and Gs (and so the fraction is infinity divided by infinity), take m/n to approximate the limiting relative frequency of Gs among Fs. Reichenbach presents his policy as yielding beliefs about limiting relative frequencies, rather than as yielding degrees of confidence, because Reichenbach identifies limiting relative frequencies with objective chances. Accordingly, the “straight rule” is sometimes understood as follows: If n Fs have been examined and m have been found to be G, then take m/n to equal (within a certain degree of approximation) an F’s objective chance of being G.
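The straight rule as just stated is simple enough to write down directly. The following Python sketch is only an illustration: the function names `straight_rule` and `running_posits` are hypothetical, each observation is encoded as a boolean (True when an examined F was found to be G), and the “certain degree of approximation” mentioned in the rule is left out for simplicity.

```python
from fractions import Fraction


def straight_rule(observations):
    """Return m/n: the observed fraction of Gs among the n examined Fs.

    `observations` is a sequence of booleans, one per examined F,
    True when that F was found to be G (an encoding assumed here).
    """
    n = len(observations)
    if n == 0:
        raise ValueError("the straight rule needs at least one examined F")
    m = sum(1 for is_g in observations if is_g)
    return Fraction(m, n)


def running_posits(observations):
    """Apply the straight rule after each new observation in turn."""
    return [straight_rule(observations[:i])
            for i in range(1, len(observations) + 1)]


sample = [True, True, False, True]
print(straight_rule(sample))   # the posit after all four observations
print(running_posits(sample))  # how the posit is revised along the way
```

The running version brings out that the rule is a policy for continually revising a posit as observations accumulate. Nothing in the code indicates whether the sequence of posits will settle near a limiting frequency, which is the question Reichenbach's argument goes on to address.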
Reichenbach then argues that if there is a successful policy for forming beliefs about limiting relative frequencies, then the straight rule will also succeed. For instance, suppose that some clairvoyant can predict the outcome of our next experiment with perfect accuracy. Then let F be that the clairvoyant makes a certain prediction regarding the outcome and G be that the clairvoyant is correct. Since m/n (from our past observations of Fs and Gs) equals 1, the straight rule endorses our taking 1 to be the limiting relative frequency of truths among the clairvoyant’s predictions (or chance that a given prediction by the clairvoyant will come to pass). In short, Reichenbach argues that if the world is non-uniform (i.e., if there is no successful policy for forming beliefs about limiting relative frequencies), then the straight rule will fail but so will any other policy, whereas if the world is uniform (i.e., if there is a successful policy), then the straight rule will seize upon the uniformity (or policy). Hence, the straight rule can do no worse than

24 However, Reichenbach [1968, p. 246] says “in my theory good grounds are given to treat a posit as true”.
Marc Lange
any other policy for making predictions. Under any circumstances, it is at least tied for best policy. Therefore, its use is pragmatically justified.

Even this rough statement of Reichenbach’s argument suffices to reveal several important difficulties it faces. First, as we saw in connection with the principle of the uniformity of nature, the straight rule licenses logically inconsistent predictions. For instance, if the Fs are the emeralds, then “G” could be “green” or “grue.” In either case, m/n is 1, but we cannot apply the straight rule to both hypotheses on pain of believing that emeralds after 3000 are all green and all blue. For the rule to license logically consistent predictions, it must consist of the straight rule along with some principle selecting the hypotheses to which the straight rule should be applied. However, a straight rule equipped with a principle of selection is no longer guaranteed to succeed if any rule will. If all emeralds are grue (and there are emeralds after 3000), then a straight rule equipped with a principle of selection favoring the grue hypothesis over the green hypothesis will succeed whereas a straight rule favoring green over grue will fail (as long as all of our emeralds are observed before the year 3000).25

25 Reichenbach responds, “The rule of induction . . . leads only to posits that are justified asymptotically.” [1949a, p. 448] In the long run, we observe emeralds after 3000. So although “applying the rule of induction to [grue], we shall first make bad posits, but while going on will soon discover that [emeralds after 3000 are not grue]. We shall thus turn to positing [green] and have success.” This response is vulnerable to the reply that we make all of our actual predictions in the short run rather than the long run (as I will discuss momentarily). Moreover, if we “apply the rule of induction” to grue as well as to green, then we make predictions that are contradictory, not merely inaccurate.

Here is a related point. The straight rule does not tell us what properties to take as F and G. It merely specifies, given F and G, what relative frequency (or chance) to assign to unexamined Fs being G. But then there could be an F and a G to which the straight rule would lead us to assign an accurate relative frequency, but as it happens, we fail to think of that F and G. (For instance, we might simply not think of tallying the rate at which the clairvoyant’s predictions have been accurate in the past, or of taking “grue” as our G. [Putnam, 1994, p. 144]) In other words, the straight rule is concerned with justifying hypotheses once they have been thought up, not with thinking them up in the first place. (It is not a “method of discovery”; it is a “method of justification.”) So the sense in which the straight rule is guaranteed to lead us to the genuine limiting relative frequency, if any rule could, is somewhat limited.

Let’s set aside this difficulty to focus on another. Reichenbach compares the straight rule to other policies for making predictions. But what about the policy of making no predictions at all? Of course, the straight rule is more likely (or, at least, not less likely) than this policy to arrive at accurate predictions. But it is also more likely than this policy to arrive at inaccurate predictions. If our goal is to make accurate predictions and we incur no penalty for making inaccurate ones, then the straight rule is obviously better than the no-prediction policy. This seems to be Reichenbach’s view:

We may compare our situation to that of a man who wants to fish in an unexplored part of the sea. There is no one to tell him whether or not there are fish in this place. Shall he cast his net? Well, if he wants to fish in that place I should advise him to cast the net, to take the chance at least. It is preferable to try even in uncertainty than not to try and be certain of getting nothing. [Reichenbach, 1938, pp. 362–363]26

This argument presumes that there is no cost to trying: that the value of a strategy is given by the number of fish that would be caught by following it, so that if a strategy leads us to try and fail, then its value is zero, which is the same as the value of the strategy of not trying at all. So the fisherman has everything to gain and nothing to lose by casting his net. But doesn’t casting a net come with some cost? (It depletes the fisherman’s energy, for instance.) In other words, the straight rule offers us some prospect of making accurate predictions, whereas the policy of making no predictions offers us no such prospect, so (Reichenbach concludes) the straight rule is guaranteed to do no worse than the no-prediction rule. But why shouldn’t our goal be “the truth, the whole truth, and nothing but the truth”, so that we favor making accurate predictions over making inaccurate predictions or no predictions, but we favor making no predictions over making inaccurate predictions? Reichenbach then cannot guarantee that the straight rule will do at least as well as any other policy, since if the straight rule fails, then the policy of making no predictions does better. Thus, Reichenbach’s argument may favor the straight rule over alternative methods of making predictions, but it does not justify making some predictions over none at all.

Let us now look at Reichenbach’s more rigorous formulation of his argument. Consider a sequence of Fs and whether or not each is G. Perhaps the sequence is G, ∼G, ∼G, G, ∼G, ∼G, ∼G . . .
, and so at each stage, the relative frequency of Gs among the Fs is 1/1, 1/2, 1/3, 1/2, 2/5, 1/3, 2/7 . . . Either this sequence converges to a limiting relative frequency or it does not. (According to Reichenbach, this is equivalent to: either there is a corresponding objective chance or there is not.) For instance, if 1/4 of the first 100 Fs are G, 3/4 of the next 1000 are G, 1/4 of the next 10,000 are G, and so forth, then the relative frequency of Gs among Fs never converges and so there is no limiting relative frequency. No method can succeed in arriving at the limit if there is no limit, so in that event, the straight rule does no worse than any other method.

26 Salmon [1991, p. 100] takes himself to be following Reichenbach in arguing that the policy of making no predictions fails whether nature is uniform or not, so it cannot be better than using induction, since the worst that induction can do is to fail. But although the no-prediction rule fails in making successful predictions, it succeeds in not making unsuccessful predictions.

On the other hand, if there is a limiting relative frequency, then the straight rule is guaranteed to converge to it in the long run: the rule’s prediction is guaranteed eventually to come within any given degree of approximation to the limiting relative frequency, and thenceforth to remain within that range. That is because by definition, L is the limit of the sequence $a_1, a_2, a_3, \ldots$ exactly when for any small positive number ε, there is an integer N such that for any n > N, $a_n$ is within ε of L. So if L is the limit of the sequence $a_1, a_2, a_3, \ldots$ where $a_n$ is the fraction (m/n) of Gs among the first n Fs, then at some point in the sequence, its members come and thenceforth remain within ε of L, and since at any point the straight rule’s prediction of the limit is just the current member m/n of the sequence, the straight rule’s prediction of the limit is guaranteed eventually to come and thenceforth to remain within ε of L.

If instead our goal is the accurate estimation of the objective chance of an F’s being G, then if there is such a chance, the straight rule is 100% likely to arrive at it (to within any specified degree of approximation) in the long run. (That is, although a fair coin might land heads repeatedly, the likelihood of its landing heads about half of the time becomes arbitrarily high as the number of tosses becomes arbitrarily large.)

So the straight rule is “asymptotic”: its prediction is guaranteed to converge to the truth in the long run, if any rule will. Surely, Reichenbach seems to be suggesting, it would be irrational to knowingly employ a rule that is not asymptotic if an asymptotic rule is available. Plenty of rules are not asymptotic. For instance, consider the “counterinductive rule”:

If n Fs have been examined and m have been found to be G, then take (1 − m/n) to approximate the actual fraction of Gs among Fs.

The counterinductive rule is not asymptotic since, unless the limit is 50%, its prediction is guaranteed to diverge from the straight rule’s in the long run, and the straight rule’s is guaranteed to converge to the truth (if there is a truth to converge to) in the long run. In this way, Reichenbach’s argument purports to justify following the straight rule rather than the counterinductive rule. (Notice how this argument aims to walk a very fine line: to justify induction without saying anything about induction’s likelihood of leading us to the truth!)
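The running relative frequencies for the seven-member sequence considered above, and the contrast between the straight and counterinductive rules, can be checked with a short sketch. The function names and the periodic test sequence (every fourth F is G) are assumptions introduced here for illustration; they are not taken from Reichenbach.

```python
from fractions import Fraction

def running_frequencies(outcomes):
    """Straight-rule posit m/n after each of the first n Fs (True means G)."""
    posits, m = [], 0
    for n, is_g in enumerate(outcomes, start=1):
        m += is_g
        posits.append(Fraction(m, n))
    return posits

def counterinductive(posit):
    """Counterinductive posit 1 - m/n, given the straight-rule posit m/n."""
    return 1 - posit

# The sequence from the text: G, ~G, ~G, G, ~G, ~G, ~G
seq = [True, False, False, True, False, False, False]
straight = running_frequencies(seq)
print([str(p) for p in straight])
# ['1', '1/2', '1/3', '1/2', '2/5', '1/3', '2/7']

# Illustrative periodic sequence (an assumption): every fourth F is G,
# so the limiting relative frequency is 1/4.
long_seq = [(i % 4 == 0) for i in range(1, 4001)]
final = running_frequencies(long_seq)[-1]
print(final, counterinductive(final))  # 1/4 3/4
```

On the periodic sequence the straight rule's posit sits at the limit, 1/4, while the counterinductive posit sits at 3/4, which is the sense in which the two rules come apart whenever the limit is not 50%.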
This argument does not rule out another rule’s working better than the straight rule in the short run, that is, converging more quickly to the limiting relative frequency than the straight rule does. For instance, the rule that would have us (even before we have ever tossed a coin!) guess 50% as the approximate limiting relative frequency of heads among the coin-toss outcomes might happen to lead to the truth right away. So in this respect, it might do better than the straight rule. But it cannot do better than the straight rule in the long run, since the straight rule is guaranteed to lead to the truth in the long run. (The 50% rule is not guaranteed to lead to the truth in the long run.)

However, in the long run (as Keynes famously quipped), we are all dead! All of our predictions are made in the short run, after a finite number of observations have been made. Why should a rule’s success in the long run, no matter how strongly guaranteed, do anything to justify our employing it in the short run? We cannot know how many cases we need to accumulate before the straight rule’s prediction comes and remains within a certain degree of approximation of the genuine limit (if there is one). So the fact that the straight rule’s prediction is guaranteed to converge eventually to the limit (if there is one) seems to do little to justify our being guided by the straight rule in the short run. Why should the straight rule’s success under conditions that we have no reason to believe we currently (or will ever!) occupy give us a good reason to use the straight rule?
(This seems to me closely related to BonJour’s objection to Reichenbach.)

A final problem for Reichenbach’s argument is perhaps the most serious. A nondenumerably infinite number of rules are entitled to make the same boast as the straight rule: each is guaranteed to converge in the long run to the limiting relative frequency if one exists. Here are a few examples:

If n Fs have been examined and m have been found to be G, then take m/n + k/n for some constant k (or, if this quantity exceeds 1, then take 1) to equal (within a certain degree of approximation) the actual fraction of Gs among Fs.

If n Fs have been examined and m have been found to be G, then take the actual fraction of Gs among Fs to equal 23.83%, if n < 1,000,000, or m/n, otherwise.

If n Fs have been examined and m have been found to be G, and among the first 100 Fs examined, M were found to be G, then take the actual fraction of Gs among Fs to equal (m + M)/(n + min{n, 100}). (In other words, “double count” the first 100 Fs.)

In the long run, each of these rules converges to the straight rule and so must converge to the limit (if there is one). The straight rule cannot be shown to converge faster than any other asymptotic rule. Moreover, the asymptotic rules disagree to the greatest extent possible in their predictions: for any evidence and for any prediction, there is an asymptotic rule that endorses making that prediction on the basis of that evidence.27

The first of these three rivals to the straight rule violates the constraint that if G and H are mutually exclusive characteristics and a rule endorses taking p as G’s relative frequency and q as H’s relative frequency among Fs, then the rule should endorse taking (p + q) as the relative frequency of (G or H).
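The three rival rules can be set beside the straight rule in a short sketch. This is only an illustration: the function names, the choice k = 2, and the tallies are all assumptions made here. It shows the rules disagreeing in the short run, nearly coinciding in the long run, and the first rival failing the additivity constraint just mentioned.

```python
from fractions import Fraction

def straight(m: int, n: int) -> Fraction:
    """The straight rule: posit m/n."""
    return Fraction(m, n)

def shifted(m: int, n: int, k: int = 2) -> Fraction:
    """First rival: m/n + k/n, capped at 1 (k = 2 is an arbitrary choice)."""
    return min(Fraction(m + k, n), Fraction(1))

def stubborn(m: int, n: int) -> Fraction:
    """Second rival: 23.83% while n < 1,000,000, and m/n afterwards."""
    return Fraction(2383, 10000) if n < 1_000_000 else Fraction(m, n)

def double_count(m: int, n: int, M: int) -> Fraction:
    """Third rival: count the first 100 Fs twice (M of them were G)."""
    return Fraction(m + M, n + min(n, 100))

def posits(m, n, M):
    return [straight(m, n), shifted(m, n), stubborn(m, n), double_count(m, n, M)]

# Short run (hypothetical tallies): the four rules disagree.
short = posits(m=60, n=200, M=50)
print([str(p) for p in short])  # ['3/10', '31/100', '2383/10000', '11/30']

# Long run (same overall frequency of 3/10): they nearly coincide.
long_run = posits(m=3_000_000, n=10_000_000, M=50)
print(max(abs(p - Fraction(3, 10)) for p in long_run) < Fraction(1, 100_000))  # True

# Additivity failure of the first rival for mutually exclusive G and H:
# 3 Gs and 4 Hs among 10 Fs, so 7 Fs are (G or H).
print(shifted(3, 10) + shifted(4, 10), shifted(7, 10))  # 11/10 9/10
```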
Furthermore, the second and third of these three rivals to the straight rule violate the constraint that the rule endorse taking the same quantity as the limiting relative frequency for any sequence of Gs and ∼Gs with a given fraction of Gs and ∼Gs, no matter how long the sequence or in what order the Gs and ∼Gs appear in it. Constraints like these have been shown to narrow down the asymptotic rules to the straight rule alone. [Salmon, 1967, pp. 85–89, 97–108; Hacking, 1968, pp. 57–58] However, it is difficult to see how to justify such constraints without begging the question. For instance, if the sequence consists of the Fs in the order in which we have observed them, then to require making the same prediction regardless of the order (as long as the total fraction of Gs is the same) is tantamount to assuming that

27 Reichenbach [1938, pp. 353–354] notes this problem. In [Reichenbach, 1949a, p. 447], he favors the straight rule on grounds of “descriptive simplicity”. But although Reichenbach regards “descriptive simplicity” as relevant for selecting among empirically equivalent theories, rules are not theories. In any case, the rival rules do not endorse all of the same predictions in the short run, so they are not “equivalent” there. Of course, in the long run, they are “equivalent”, but why is that fact relevant?
later Fs are no different from earlier Fs: that the future is like the past, that each F is like an independent flip of the same coin as every other F (i.e., that each F had the same objective chance of being G).

9 BAYESIAN APPROACHES

Suppose it could be shown (perhaps by a Dutch Book argument or an argument from calibration [Lange, 1999]) that rationality obliges us (in typical cases) to update our opinions by Bayesian conditionalization (or some straightforward generalization thereof, such as Jeffrey’s rule). This would be the kind of argument that Hume fails to rule out (as I explained in section 3): an argument that is a priori (despite not turning solely on semantic relations, since a degree of belief is not capable of being true or false) and that proceeds from the opinions that constitute a given inductive argument’s premises to the degree of belief that constitutes its conclusion.

Such an argument would still be far from a justification of induction. Whether Bayesian conditionalization yields induction or counterinduction, whether it underwrites our ascribing high probability to “All emeralds are green” or to “All emeralds are grue,” whether it leads us to regard a relatively small sample of observed emeralds as having any bearing at all on unexamined emeralds: all depend on the “prior probabilities” plugged into Bayesian conditionalization along with our observations. So in order to explain why we ought to reason inductively (as an adequate justification of induction must do; see section 2 above), the rationality of Bayesian conditionalization would have to be supplemented with some constraints on acceptable priors.

This argument has been challenged in several ways. Colin Howson [2000] argues that a justification of induction should explain why we ought to reason inductively. Thus, it can appeal to our prior probabilities; Bayesian conditionalization, acting on these particular priors, underwrites recognizably inductive updating.
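How much turns on the priors can be made concrete with a small sketch. It is an illustration under stated assumptions rather than part of the chapter's argument: the two priors compared (a uniform prior over the unknown chance that an emerald is green, and a prior treating every emerald as an independent 50/50 case) and all function names are choices made here.

```python
from fractions import Fraction
from math import comb

def prior_uniform_chance(k: int, n: int) -> Fraction:
    """Prior probability of one particular sequence of n emeralds with k green,
    under a uniform prior over the unknown chance of green."""
    return Fraction(1, (n + 1) * comb(n, k))

def prior_independent_fair(k: int, n: int) -> Fraction:
    """Prior probability of the same sequence if every emerald is treated as
    an independent 50/50 case."""
    return Fraction(1, 2 ** n)

def next_green_given_all_green(prior, n: int) -> Fraction:
    """Conditionalize: pr(emerald n+1 is green | first n were green)
    = pr(first n+1 all green) / pr(first n all green)."""
    return prior(n + 1, n + 1) / prior(n, n)

for n in (0, 10, 100):
    print(n,
          next_green_given_all_green(prior_uniform_chance, n),
          next_green_given_all_green(prior_independent_fair, n))
# 0 1/2 1/2
# 10 11/12 1/2
# 100 101/102 1/2
```

Both agents update by the same rule on the same observations. Only the first treats examined emeralds as bearing on the next one; for the second, the sample is probabilistically irrelevant.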
The “initial assignments of positive probability . . . cannot themselves be justified in any absolute sense”. [Howson, 2000, p. 239] But never mind, Howson says. Inductive arguments are in this respect like sound deductive arguments, they don’t give you something for nothing: you must put synthetic judgements in to get synthetic judgements out. But get them out you do, and in a demonstrably consistent way that satisfies certainly the majority of those intuitive criteria for inductive reasoning which themselves stand up to critical examination. [Howson, 2000, p. 239, see also p. 171]

All we really want from a justification of induction is a justification for updating our beliefs in a certain way, and that is supplied by arguments showing Bayesian conditionalization to be rationally compulsory. As Frank Ramsey says,

We do not regard it as belonging to formal logic to say what should be a man’s expectation of drawing a white or black ball from an urn; his original expectations may within the limits of consistency be any he likes, all we have to point out is that if he has certain expectations, he is bound in consistency to have certain others. This is simply bringing probability into line with ordinary formal logic, which does not criticize premisses but merely declares that certain conclusions are the only ones consistent with them. [Ramsey, 1931, p. 189]

Ian Hacking puts the argument thus:

At any point in our grown-up lives (let’s leave babies out of this) we have a lot of opinions and various degrees of belief about our opinions. The question is not whether these opinions are ‘rational’. The question is whether we are reasonable in modifying these opinions in light of new experience, new evidence. [Hacking, 2001, p. 256]

But the traditional problem of induction is whether by reasoning inductively, we arrive at knowledge. If knowledge involves justified true belief, then the question is whether true beliefs arrived at inductively are thereby justified. And if an inductive argument, to justify its conclusion, must proceed from a prior state of opinion that we are entitled to occupy, then the question becomes whether we are entitled to those prior opinions, and if so, how come.

I said that Bayesian conditionalization can underwrite reasoning that is intuitively inductive, but with other priors plugged into it, Bayesian conditionalization underwrites reasoning that is counterinductive or even reasoning that involves the confirmation of no claims at all regarding unexamined cases. However, it might be objected that if hypothesis h (given background beliefs b) logically entails evidence e, then as long as pr(h|b) and pr(e|b) are both non-zero, it follows that pr(e|h&b) = 1, and so by Bayes’s theorem, we have pr(h|e&b) = pr(h|b)pr(e|h&b)/pr(e|b) = pr(h|b)/pr(e|b) > pr(h|b), so by Bayesian conditionalization, e confirms h. On this objection, then, Bayesian conditionalization automatically yields induction.
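For reference, the objector's chain of equalities can be laid out step by step. This is only a restatement of the reasoning above in the same notation. One small caveat, which the objection leaves implicit, is marked in the final step: the inequality is strict only when pr(e|b) is less than 1, that is, when e is not already certain given b.

```latex
\begin{align*}
\mathrm{pr}(h \mid e \,\&\, b)
  &= \frac{\mathrm{pr}(h \mid b)\,\mathrm{pr}(e \mid h \,\&\, b)}{\mathrm{pr}(e \mid b)}
     && \text{(Bayes's theorem)} \\
  &= \frac{\mathrm{pr}(h \mid b)}{\mathrm{pr}(e \mid b)}
     && \text{(} h \,\&\, b \text{ entails } e \text{, so } \mathrm{pr}(e \mid h \,\&\, b) = 1 \text{)} \\
  &\ge \mathrm{pr}(h \mid b)
     && \text{(since } 0 < \mathrm{pr}(e \mid b) \le 1 \text{; strict if } \mathrm{pr}(e \mid b) < 1 \text{)}
\end{align*}
```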
However, this confirmation of h (of “All emeralds are green,” for example) by e (“The emerald currently under examination is green”) need not involve any inductive confirmation of h (roughly, any confirmation of h’s predictive accuracy). For example, it need not involve any confirmation of g: “The next emerald I examine will turn out to be green.” Since g (given b) does not logically entail e, pr(e|g&b) is not automatically 1, and so pr(g|e&b) is not necessarily greater than pr(g|b).

Howson insists that it is no part of the justification of induction to justify the choice of priors, just as deductive logic does not concern itself with justifying the premises of deductive arguments. [Howson, 2000, p. 2, see also pp. 164, 171, 239; cf. Howson and Urbach, 1989, pp. 189–190] To my mind, this parallel between deduction and induction is inapt. It presupposes that prior probabilities are the premises of inductive arguments, that they are, in other words, the neutral input or substrate to which is applied Bayesian conditionalization, an inductive rule of inference. But, as Howson rightly emphasizes, it is Bayesian conditionalization that is neutral; anything distinctively “inductive” about an episode of Bayesian updating must come from the priors. Consequently, a justification of induction must say something about how we are entitled to those priors.

Some personalists about probability have argued that if anything distinctively “inductive” about Bayesian updating must come from the priors, then so much the better for resolving the problem of induction, since we are automatically entitled to adopt any coherent probability distribution as our priors. This view is prompted by the notorious difficulties (associated with Bertrand’s paradox of the chord) attending any principle of indifference for adjudicating among rival priors. For example, Samir Okasha writes:

Once we accept that the notion of a prior distribution which reflects a state of ignorance is chimerical, then adopting any particular prior distribution does not constitute helping ourselves to empirical information which should be suppressed; it simply reflects the fact that an element of guess work is involved in all empirical enquiry. [Okasha, 2001, p. 322]

Okasha’s argument seems to be that a prior state of opinion embodies no unjustified information about the world since any prior opinion embodies some information. But the inductive sceptic should reply by turning this argument around: Since any prior opinion strong enough to support an inductive inference embodies some information, no prior opinion capable of supporting an inductive inference is justified. In other words, Okasha’s argument seems to be that there are no objectively neutral priors, so if the inductive sceptic accuses our priors of being unjustified, we need only ask the sceptic ‘What prior probability do you recommend?’ [. . .] It does not beg the question to operate with some particular prior probability distribution if there is no alternative to doing so.
Only if the inductive sceptic can show that there is an alternative, i.e., that ‘information-free’ priors do exist, would adopting some particular prior distribution beg the question. [Okasha, 2001, p. 323]

But there is an alternative to operating from a prior opinion strong enough to support the confirmation of predictions. If the sceptic is asked to recommend a prior probability, she should suggest a distribution that makes no probability assignment at all to any prediction about the world that concerns logically contingent matters of fact. By this, I do not mean the extremal assignment of zero subjective probability to such a claim. That would be to assign it a probability: zero. Nor do I mean assigning it a vague probability value. I mean making no assignment at all to any such claim. According to the inductive sceptic, there is no degree of confidence to which we are entitled regarding predictions about unexamined cases.28

28 Admittedly, the sceptic’s prior distribution violates the requirement that the domain of a probability function be a sigma algebra. For example, it may violate the additivity axiom [pr(q or ∼q) = pr(q) + pr(∼q)] by assigning to (q or ∼q) a probability of 1 but making no probability assignment to q and none to ∼q. Some Bayesians would conclude that the sceptic’s “pr” does not qualify as a probability function. However, the sceptic is not thereby made vulnerable to a Dutch Book. She is not thereby irrational or incoherent. What is the worst thing that can be said of her? That she shows a certain lack of commitment. That characterization will hardly bother the sceptic! It may well be overly restrictive to require that the domain of a probability function be a sigma algebra. (See, for instance, [Fine, 1973, p. 62] or any paper discussing the failure of logical omniscience.)

Though an observation’s direct result may be to assign some probability to e, the sceptic’s prior distribution fails to support inductive inferences from our observations (since it omits some of the probabilities required by Bayesian conditionalization or any generalization of it). But that is precisely the inductive sceptic’s point. There is no alternative to operating with a prior distribution that embodies information about the world, as Okasha says, if we are going to use our observations to confirm predictions. But to presuppose that we are justified in using our observations to confirm predictions is obviously to beg the question against the inductive sceptic.

10 WILLIAMS’ COMBINATORIAL JUSTIFICATION OF INDUCTION

In 1947, Donald Williams offered an a priori justification of induction that continues to receive attention [Williams, 1947; Stove, 1986, pp. 55–75]. The first ingredient in Williams’ argument is a combinatorial fact that can be proved a priori: if there is a large (but finite) number of Fs, then in most collections of Fs beyond a certain size (far smaller than the total population of Fs), the fraction of Gs is close to the fraction of Gs in the total F population. For example, consider a population of 24 marbles, of which 16 (i.e., 2/3) are white. The number of 6-member sets that are also 2/3 white (i.e., 4 white, 2 non-white) is [(16 × 15 × 14 × 13)/(4 × 3 × 2)] × [(8 × 7)/2] = 50,960. The number of sets containing 5 white and 1 non-white marbles is 34,944, and the number containing 3 white and 3 non-white marbles is 31,360. So among the 134,596 6-member sets, about 87% contain a fraction of white marbles within 1 marble (16.7%) of the fraction of white marbles in the overall population. For a marble population of any size exceeding one million, more than 90% of the possible 3000-member samples have a fraction of white marbles within 3% of the overall population’s fraction, no matter what that fraction is (even if no sample has a fraction exactly matching the overall population’s). Notice that this is true no matter how small the sample may be as a fraction of the total population. [Williams, 1947, p. 96; Stove, 1986, p. 70]

The second ingredient in Williams’ argument is the rationality of what he calls the “statistical [or proportional] syllogism”: if you know that the fraction of As that are B is r and that individual a is A, then if you know nothing more about a that is relevant to whether it is B, then r is the rational degree of confidence for you to have that a is B. Williams regards this principle as an a priori logical truth: “the native wit of mankind . . . has found the principle self-evident.” [Williams, 1947, p. 8] It appears to be far enough from the inductive arguments in question that no circularity is involved in justifying them by appealing to it. A statistical syllogism would be another argument of the kind that Hume fails to rule out (as I explained in section 3): an argument that is a priori (despite not turning solely on semantic relations, since a degree of belief is not capable of being true or false) and that proceeds from the opinions constituting a given inductive argument’s premises to the degree of belief that constitutes its conclusion.

But how can we use a statistical syllogism to justify induction? Let the As be the large samples of Fs and let B be the property of having a fraction of Gs approximating (to whatever specified degree) the fraction in the overall F population. Since it is a combinatorial fact that the fraction of As that are B is large, it follows (by the statistical syllogism) that in the absence of any evidence to the contrary, we are entitled to have great confidence that the fraction of Gs in a given large sample of Fs approximates the fraction of Gs among all Fs.29 So if f is the fraction of Gs in the observed large sample of Fs, then we are entitled (in the absence of any countervailing evidence) to have great confidence that f approximates the fraction of Gs among all Fs. As Williams says,

Without knowing exactly the size and composition of the original population to begin with, we cannot calculate . . . exactly what proportion of our “hyper-marbles” [i.e., large samples of marbles] have the quality of nearly-matching-the-population, but we do know a priori that most of them have it. Before we choose one of them, it is hence very probable that the one we choose will be one of those which match or nearly match; after we have chosen one, it remains highly probable that the population closely matches the one we have, so that we need only look at the one we have to read off what the population, probably and approximately, is like.
[Williams, 1947, pp. 98–99]

Induction is thereby justified. However, we might question whether the statistical syllogism is indeed a principle of good reasoning.30 It presupposes that we assign every possible large sample the same subjective probability of being selected (so since there are overwhelmingly more large samples that are representative of the overall population, we are overwhelmingly confident that the actual sample is representative). Why is this equal-confidence assignment rationally obligatory? The intuition seems to be that when we have no reason to assign any of these samples greater subjective probability than any other, we ought to assign them equal subjective probabilities. To do otherwise would be irrational. But perhaps, in the absence of any relevant information about them, we have no reason to assign any subjective probabilities to any of them.

29 This argument, as used to infer from a population’s fraction of Gs to a sample’s likely fraction, is often called “direct inference”, and hence the statistical syllogism is termed “the principle of direct inference” [Carnap, 1950]. An inference in the other direction, from sample to population, is then termed “inverse reasoning”. That probably a sample’s fraction of Gs approximates the population’s fraction can apparently take us in either direction.

30 It cannot be justified purely by Bayesian considerations. If P is that the fraction of Gs in the population is within a certain range of f, and S is that f is the fraction of Gs in the large observed sample, then Bayes’s theorem tells us that pr(P|S) = pr(P) pr(S|P)/pr(S), where all of these probabilities are implicitly conditional on the size of the sample. It is unclear how to assign the priors pr(S) and pr(P). Even pr(S|P) does not follow purely from combinatorial considerations.

In other words, the motivation for the equal-confidence assignment seems to be that if we have no relevant information other than that most marbles in the urn (or most possible samples) are red (or representative), then it would be irrational to be confident that a non-red marble (or unrepresentative sample) will be selected. But this undoubted fact does not show that it would be rational to expect that a red (or representative) one will be selected. Perhaps we are not entitled to any expectation unless we have further information, such as that the marble (or sample) is selected randomly (i.e., that every one has an equal objective chance of being selected). Williams is quite correct in insisting that in the absence of any relevant information, we are not entitled to believe that the sample is selected randomly. So why are we entitled to other opinions in the absence of any relevant information?

Furthermore, we do have further information, for example, that samples with members that are remote from us in space or time will not be selected. This information does not suggest that the sample we select is unrepresentative, if we believe that Fs are uniform in space and time. But we cannot suppose so without begging the question.31 Since Williams’ argument is purely formal, we could apparently just as well take G as green or as grue while taking the Fs as the emeralds. But we cannot do both on pain of being highly confident both that all emeralds are green and that all emeralds are grue.
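Returning briefly to the combinatorial ingredient of Williams’ argument: the counts for the 24-marble example given earlier in this section can be recomputed directly. The following is a small sketch using Python's standard `math.comb`; the helper name `samples_with` is an assumption made for illustration.

```python
from math import comb

WHITE, NON_WHITE, SAMPLE = 16, 8, 6  # population of 24 marbles, samples of 6

def samples_with(white_in_sample: int) -> int:
    """Number of 6-member sets containing exactly this many white marbles."""
    return comb(WHITE, white_in_sample) * comb(NON_WHITE, SAMPLE - white_in_sample)

total = comb(WHITE + NON_WHITE, SAMPLE)
near = sum(samples_with(w) for w in (3, 4, 5))  # within one marble of 2/3 white

print(total)                                 # 134596
print([samples_with(w) for w in (3, 4, 5)])  # [31360, 50960, 34944]
print(round(near / total, 3))                # 0.871
```

The computation is purely a matter of counting sets. It carries no assumption about how any actual sample comes to be selected, which is exactly where the objections above take hold.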
If we regard the fact that all of the emeralds are sampled before the year 3000 as further information suggesting that the sample may be unrepresentative, then neither hypothesis is supported.32 Finally, is there any reason to believe that statistical syllogisms will lead us to place high confidence in truths more often than in falsehoods (or, at least, that they have a high objective chance of doing so)? If, in fact, our samples are unrepresentative in most cases where we have no other relevant information, then statistical syllogisms will lead us to place high confidence in falsehoods more often than in truths. That we have no good reason to think that a sample is unrepresentative does not show that we are likely to reach the truth if we presume

31 For defense of the statistical syllogism, see [Williams, 1947, pp. 66–73 and p. 176; Carnap, 1959, p. 494; McGrew, 2001, pp. 161–167]. Maher [1996] has argued that the fraction of Gs in the sample may suggest that the sample is unrepresentative and so undercut the statistical syllogism. How are we entitled a priori to have the opinion that a sample with a given fraction of Gs is no less likely if it is representative than if it is unrepresentative?

32 Stove [1986, pp. 140–142] says that he is not committed to reasoning with green in the same way as with grue, but he identifies no a priori ground for privileging one over the other. Campbell [1990], who endorses Williams’s argument, says that “a complex array of higher-order inductions, about natural kinds, about discontinuities in nature, and about the kinds of properties it is significant to investigate” justify privileging green over grue. But aren’t these inductions also going to be subject to versions of the grue problem?
Marc Lange
it to be representative unless we believe that if it were unrepresentative, we would probably have a good reason to suspect so. But why should we believe that we would be so well-informed?33

11 THE INDUCTIVE LEAP AS MYTHICAL

Hume’s argument appears to show that our observations, apart from any theoretical background, are unable to confirm or to disconfirm any predictions. Supplemented by different background opinions (whether understood as prior probabilities or a uniformity principle), the same observations exert radically different confirmatory influences. But then to be warranted in making any inductive leap beyond the safety of our observations, we must be justified in holding some opinions regarding some prediction’s relation to our observations. These opinions may rest on various scientific theories, which in turn have been confirmed by other observations. But when we pursue the regress far enough (or look at cases where we have very little relevant background knowledge), how can any such opinions be justified? My view [Lange, 2004] is that especially in theoretically impoverished circumstances, the content of our observation reports may include certain expectations regarding the observations’ relations to as yet undiscovered facts: expectations enabling those observations to confirm predictions regarding those facts. An observation report (“That is an F”) classifies something as belonging to a certain category (F). That category may be believed to be a “natural kind” of a certain sort (e.g., a species of star, mineral, animal, disease, chemical . . . ). Part of what it is for a category to be a natural kind of a given sort is for its members generally to be alike in certain respects. (These respects differ for different sorts of natural kind.)
Members of the same species of star, for instance, are supposed to be generally similar in temperature, intrinsic luminosity, the mechanism by which they generate light, and so forth (and generally different in many of these respects from members of other star species). Therefore, to observe that certain stars are (for instance) Cepheid-type variables, an astronomer must be justly prepared (in the absence of any further information) to regard the examined Cepheids’ possession of various properties (of the sorts characteristic of natural kinds of stars) as confirming that unexamined Cepheids possess those properties as well. In that case, Cepheid observations (in the absence of any other evidence) suffice to justify astronomers in expecting that unexamined Cepheids will exhibit, say, a certain simple period-luminosity relation that examined Cepheids display. No opinions independent of Cepheid observations must be added to them in order to give them the capacity to confirm certain predictions. Hence, there arises no regress-inducing problem of justifying some such independent opinions.

33 I made an analogous point regarding BonJour’s a priori justification of induction. McGrew [2001, pp. 167–170] responds to this criticism of Williams’s argument. Kyburg [1956] argues that even if we knew that some method of inductive reasoning would more often lead to truth than to falsehoods in the long run, we could not justify using it except by a statistical syllogism.
To observe that certain stars are Cepheids, an astronomer must already have the resources for going beyond those observations. As Wilfrid Sellars says, in arguing for a similar point: The classical ‘fiction’ of an inductive leap which takes its point of departure from an observation base undefiled by any notion as to how things hang together is not a fiction but an absurdity. . . . [T]here is no such thing as the problem of induction if one means by this a problem of how to justify the leap from the safe ground of the mere description of particular situations, to the problematic heights of asserting lawlike sentences and offering explanations.[Sellars, 1963b, p. 355] I could not make observations categorizing things into putative natural kinds if I were not justified in expecting the members of those kinds to be alike in the respects characteristic of such kinds. The observations that we most readily become entitled to make in theoretically impoverished contexts are (perhaps paradoxically) precisely those with inductive import — those purporting to classify things into various sorts of natural kinds. That is because although an observation report (“That is an F ”) has noninferential justification, I am justified in making it only if I can justly infer, from my track record of accuracy in making similar responses in other cases, that my report on this occasion is probably true [Sellars, 1963a, pp. 167–170]. If I have no reason to trust myself — if I am not in a position to infer the probable accuracy of my report — then on mature reflection, I ought to disavow the report, regarding it as nothing more than a knee-jerk reaction. But obviously, this inference from my past accuracy in making similar responses is inductive. 
I am entitled to regard my past successes at identifying F s as confirming that my latest F report is accurate only because I am entitled to expect that generally, unexamined F s look like the F s that I have accurately identified in the past and look different from various kinds of non-F s. (Only then is my past reliability in distinguishing F s good evidence — in the absence of countervailing information — for my future reliability in doing so.) That is just what we expect when the F s form a natural kind. My past reliability at identifying F s, where F s form a natural kind, confirms (in the absence of countervailing information) my future reliability without having to be supplemented by further regress-inducing background opinions. Accordingly, “taxonomic observations” (i.e., identifications of various things as members of various species) are among the sorts of observations that scientists are most apt to be in a position to make in a new field, where their background theory is impoverished. Of course, there is no logical guarantee that these putative observations are accurate. But one becomes qualified to make them precisely because of — rather than despite — the “thickness” of their content. If observers in theoretically impoverished contexts had not taken the F ’s as forming a certain sort of natural kind, then the range of cases in which those observers are justified in making “That is F ” reports would have been different. For example, suppose astronomers had taken the Cepheid-type variable stars as
consisting simply of all and only those variables having light curves shaped like those of the two prototype Cepheids (delta Cephei and eta Aquilae). Then astronomers would have classified as non-Cepheids certain stars that they actually deemed to be Cepheids (and would have deemed other stars to be “somewhat Cepheid” or “Cepheid-like”, categories that are never used). Instead astronomers took the Cepheid category as extending from the prototypical Cepheids out to wherever the nearest significant gap appears in the distribution of stellar light curves. That is because they understood the Cepheids to be a natural kind of star sharply different from any other kind. A taxonomic observation report (such as “That is a Cepheid”) embodies expectations regarding as yet undiscovered facts, and these expectations, which ground the most basic inductive inferences made from those observations, are inseparable from the reports’ circumstances of application. The content of the observation reports cannot be “thinned down” so as to remove all inductive import without changing the circumstances in which the reports are properly made. Hume’s problem of induction depends on unobserved facts being “loose and separate” (E, p. 49) from our observational knowledge. But they are not.

12 CONCLUSION
Having surveyed some of the most popular recent responses to Hume’s argument (and having, in the previous section, bravely sketched the kind of response I favor), I give the final word to Hume:

Most fortunately it happens, that since reason is incapable of dispelling these clouds, nature herself suffices to that purpose, and cures me of this philosophical melancholy and delirium, either by relaxing this bent of mind, or by some avocation, and lively impression of my senses, which obliterates all these chimeras. I dine, I play a game of backgammon, I converse, and am merry with my friends; and when after three or four hour’s amusement, I wou’d return to these speculations, they appear so cold, and strain’d, and ridiculous, that I cannot find in my heart to enter into them any farther. Here then I find myself absolutely and necessarily determin’d to live, and talk, and act like other people in the common affairs of life. (T, p. 269)

Most fortunately for us, Hume did not act like other people in all affairs of life. Rather, he bequeathed to us an extraordinary problem from which generations of philosophers have derived more than three or four hours’ amusement. I, for one, am very grateful to him.

BIBLIOGRAPHY

[Beauchamp and Rosenberg, 1981] T. Beauchamp and A. Rosenberg. Hume and the Problem of Causation. Oxford University Press, New York, 1981.
[Black, 1954] M. Black. Inductive support of inductive rules. In Black, Problems of Analysis. Cornell University Press, Ithaca, NY, pp. 191–208, 1954.
[BonJour, 1986] L. BonJour. A reconsideration of the problem of induction. Philosophical Topics 14, pp. 93–124, 1986.
[BonJour, 1998] L. BonJour. In Defense of Pure Reason. Cambridge University Press, Cambridge, 1998.
[Brandom, 1994] R. Brandom. Making it Explicit. Harvard University Press, Cambridge, MA, 1994.
[Broad, 1952] C.D. Broad. Ethics and the History of Philosophy. Routledge and Kegan Paul, London, 1952.
[Brueckner, 2001] A. Brueckner. BonJour’s a priori justification of induction. Pacific Philosophical Quarterly 82, pp. 1–10, 2001.
[Butler, 1813] J. Butler. Analogy of religion, natural and revealed. In Butler, The Works of Joseph Butler, volume 1. William Whyte, Edinburgh, 1813.
[Campbell, 1990] K. Campbell. Abstract Particulars. Blackwell, Oxford, 1990.
[Carnap, 1950] R. Carnap. Logical Foundations of Probability. University of Chicago Press, Chicago, 1950.
[Dummett, 1981] M. Dummett. Frege: Philosophy of Language, 2nd ed. Harvard University Press, Cambridge, MA, 1981.
[Feigl, 1950] H. Feigl. De principiis non disputandum . . . ? In M. Black (ed.), Philosophical Analysis. Cornell University Press, Ithaca, pp. 119–156, 1950.
[Fine, 1973] T. Fine. Theories of Probability. Academic, New York, 1973.
[Franklin, 1987] J. Franklin. Non-deductive logic in mathematics. British Journal for the Philosophy of Science 38, pp. 1–18, 1987.
[Garrett, 1997] D. Garrett. Cognition and Commitment in Hume’s Philosophy. Oxford University Press, New York, 1997.
[Goodman, 1954] N. Goodman. Fact, Fiction and Forecast. Harvard University Press, Cambridge, MA, 1954.
[Hacking, 1968] I. Hacking. One problem about induction. In I. Lakatos (ed.), The Problem of Induction. North-Holland, Amsterdam, pp. 44–59, 1968.
[Hacking, 2001] I. Hacking. An Introduction to Probability and Inductive Logic. Cambridge University Press, Cambridge, 2001.
[Harman, 1965] G. Harman. Inference to the best explanation. Philosophical Review 74, pp. 88–95, 1965.
[Horwich, 1982] P. Horwich. Probability and Evidence. Cambridge University Press, Cambridge, 1982.
[Howson, 2000] C. Howson. Hume’s Problem: Induction and the Justification of Belief. Clarendon, Oxford, 2000.
[Howson and Urbach, 1989] C. Howson and P. Urbach. Scientific Reasoning: The Bayesian Approach. Open Court, La Salle, IL, 1989.
[Hume, 1977] D. Hume. An Enquiry Concerning Human Understanding, ed. Eric Steinberg. Hackett, Indianapolis, 1977.
[Hume, 1978] D. Hume. A Treatise of Human Nature, ed. L.A. Selby-Bigge and P.H. Nidditch, 2nd ed. Clarendon, Oxford, 1978.
[Keynes, 1921] J.M. Keynes. A Treatise on Probability. Macmillan, London, 1921.
[Kornblith, 1993] H. Kornblith. Inductive Inference and its Natural Ground. MIT Press, Cambridge, MA, 1993.
[Kyburg, 1956] H. Kyburg. The justification of induction. Journal of Philosophy 53, pp. 394–400, 1956.
[Lange, 1999] M. Lange. Calibration and the epistemological role of Bayesian conditionalization. Journal of Philosophy 96, pp. 294–324, 1999.
[Lange, 2004] M. Lange. Would direct realism resolve the classical problem of induction? Nous 38, pp. 197–232, 2004.
[Lipton, 1991] P. Lipton. Inference to the Best Explanation. Routledge, London, 1991.
[Mackie, 1980] J.L. Mackie. The Cement of the Universe. Clarendon, Oxford, 1980.
[Maher, 1996] P. Maher. The hole in the ground of induction. Australasian Journal of Philosophy 74, pp. 423–432, 1996.
[McGrew, 2001] T. McGrew. Direct inference and the problem of induction. The Monist 84, pp. 153–78, 2001.
[Mill, 1872] J.S. Mill. A System of Logic, 8th ed. Longmans, London, 1872.
[Norton, 2003] J. Norton. A material theory of induction. Philosophy of Science 70, pp. 647–70, 2003.
[Okasha, 2001] S. Okasha. What did Hume really show about induction? The Philosophical Quarterly 51, pp. 307–327, 2001.
[Papineau, 1993] D. Papineau. Philosophical Naturalism. Blackwell, Oxford, 1993.
[Popper, 1959] K. Popper. The Logic of Scientific Discovery. Basic Books, New York, 1959.
[Popper, 1972] K. Popper. Conjectural knowledge: my solution to the problem of induction. In Popper, Objective Knowledge. Clarendon, Oxford, pp. 1–31, 1972.
[Putnam, 1994] H. Putnam. Reichenbach and the limits of vindication. In Putnam, Words and Life. Harvard University Press, Cambridge, MA, pp. 131–148, 1994.
[Ramsey, 1931] F. Ramsey. Truth and probability. In Ramsey, The Foundations of Mathematics and Other Logical Essays. Routledge and Kegan Paul, London, pp. 156–198, 1931.
[Read and Richman, 2000] R.J. Read and K.A. Richman. The New Hume Debate. Routledge, London, 2000.
[Reichenbach, 1938] H. Reichenbach. Experience and Prediction. University of Chicago Press, Chicago, 1938.
[Reichenbach, 1949a] H. Reichenbach. The Theory of Probability. University of California Press, Berkeley, 1949.
[Reichenbach, 1949b] H. Reichenbach. Comments and criticism. Journal of Philosophy 46, pp. 545–549, 1949.
[Reichenbach, 1968] H. Reichenbach. The Rise of Scientific Philosophy. University of California Press, Berkeley, 1968.
[Rhees and Phillips, 2003] R. Rhees and D.Z. Phillips. Wittgenstein’s On Certainty: There – Like our Life. Blackwell, Oxford, 2003.
[Russell, 1919] B. Russell. Introduction to Mathematical Philosophy. George Allen & Unwin, London, 1919.
[Russell, 1948] B. Russell. Human Knowledge: Its Scope and Limits. Simon and Schuster, New York, 1948.
[Russell, 1959] B. Russell. The Problems of Philosophy. Oxford University Press, London, 1959.
[Salmon, 1957] W. Salmon. Should we attempt to justify induction? Philosophical Studies 8, pp. 33–48, 1957.
[Salmon, 1963] W. Salmon. Inductive inference. In B. Baumrin (ed.), Philosophy of Science: The Delaware Seminar, volume II. Interscience Publishers, New York and London, pp. 35370, 1963.
[Salmon, 1967] W. Salmon. The Foundations of Scientific Inference. University of Pittsburgh Press, Pittsburgh, 1967.
[Salmon, 1981] W. Salmon. Rational prediction. British Journal for the Philosophy of Science 32, pp. 115–25, 1981.
[Salmon, 1984] W. Salmon. Scientific Explanation and the Causal Structure of the World. Princeton University Press, Princeton, 1984.
[Salmon, 1991] W. Salmon. Hans Reichenbach’s vindication of induction. Erkenntnis 35, pp. 99–122, 1991.
[Salmon, Barker, and Kyburg, 1965] W. Salmon, S. Barker, and H. Kyburg, Jr. Symposium on inductive evidence. American Philosophical Quarterly 2, pp. 265–80, 1965.
[Sankey, 1997] H. Sankey. Induction and natural kinds. Principia 1, pp. 239–54, 1997.
[Sellars, 1963a] W. Sellars. Empiricism and the philosophy of mind. In Sellars, Science, Perception and Reality. Routledge and Kegan Paul, London, pp. 127–196, 1963.
[Sellars, 1963b] W. Sellars. Some reflections on language games. In Sellars, Science, Perception and Reality. Routledge and Kegan Paul, London, pp. 321–358, 1963.
[Skyrms, 1986] B. Skyrms. Choice and Chance, 3rd ed. Wadsworth, Belmont, CA, 1986.
[Smith, 1941] N.K. Smith. The Philosophy of David Hume. Macmillan, London, 1941.
[Sober, 1988] E. Sober. Reconstructing the Past. Bradford, Cambridge, MA, 1988.
[Stove, 1965] D. Stove. Hume, probability, and induction. Philosophical Review 74, pp. 160–77, 1965.
[Stove, 1973] D. Stove. Probability and Hume’s Inductive Scepticism. Clarendon, Oxford, 1973.
[Stove, 1986] D. Stove. The Rationality of Induction. Oxford University Press, Oxford, 1986.
[Strawson, 1952] P.F. Strawson. An Introduction to Logical Theory. Methuen, London, 1952.
[Strawson, 1958] P.F. Strawson. On justifying induction. Philosophical Studies 9, pp. 20–21, 1958.
[Stroud, 1977] B. Stroud. Hume. Routledge, London and New York, 1977.
[Thagard, 1978] P. Thagard. The best explanation: criterion for theory choice. Journal of Philosophy 75, pp. 76–92, 1978.
[van Cleve, 1984] J. van Cleve. Reliability, justification, and the problem of induction. In P. French, T. Uehling, and H. Wettstein (eds.), Midwest Studies in Philosophy IX. University of Minnesota Press, Minneapolis, pp. 555–567, 1984.
[van Fraassen, 1981] B. van Fraassen. The Scientific Image. Clarendon, Oxford, 1981.
[van Fraassen, 1989] B. van Fraassen. Laws and Symmetry. Clarendon, Oxford, 1989.
[Williams, 1947] D.C. Williams. The Ground of Induction. Harvard University Press, Cambridge, MA, 1947.
[Wittgenstein, 1953] L. Wittgenstein. Philosophical Investigations. Blackwell, Oxford, 1953.
THE DEBATE BETWEEN WHEWELL AND MILL ON THE NATURE OF SCIENTIFIC INDUCTION
Malcolm Forster
1 WHY THE DEBATE IS NOT MERELY TERMINOLOGICAL

The very best examples of scientific induction were known in the time of William Whewell (1794–1866) and John Stuart Mill (1806–1873). It is puzzling, therefore, that there was such a deep disagreement between them about the nature of induction. It is perhaps astounding that the dispute is unresolved to this very day! What disagreement could there be about Newton’s discovery of universal gravitation? Prior to Newton, it was well known that gravity acts on objects near the Earth’s surface, and Copernicus even speculated that the planets have a spherical shape because they have their own gravity. But Newton was the first to understand that it’s the Earth’s gravity that keeps the Moon in orbit around the Earth, and that the Sun’s gravity keeps the Earth and the Moon in orbit around the Sun. At the root of this discovery was Newton’s explication of the kinematical concept of acceleration. To understand that the Moon (just like the fabled apple) is pulled by the Earth, one has to understand that the Moon is accelerating towards the Earth even if it is moving uniformly on a circular orbit around the Earth. Acceleration must not be defined as the time rate of change of speed, but as the time rate of change of velocity, where velocity has direction as well as magnitude. Thus, the Moon is accelerating towards the Earth because its velocity is changing its direction. Galileo, on the other hand, worked with a circular law of inertia, according to which uniform circular motion around the Earth was a “natural” motion that required no force. Further explication of the new conception of acceleration led Newton to discover that if the line from a point O to a body B sweeps out equal areas (Kepler’s second law), then B is accelerating towards O.
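The step from equal areas to an acceleration directed along OB can be made explicit in modern vector notation. This is only a sketch: the symbols r, v and a (the position of B relative to O, its velocity and its acceleration) and the use of the cross product are assumptions of the illustration, whereas Newton’s own argument was geometrical.

```latex
% r: position of B relative to O;  v = dr/dt;  a = dv/dt
\[
  \frac{dA}{dt} \;=\; \tfrac{1}{2}\,\lvert \mathbf{r} \times \mathbf{v} \rvert
  \qquad \text{(area swept out by the line $OB$ per unit time)}
\]
\[
  \frac{d}{dt}\,(\mathbf{r} \times \mathbf{v})
  \;=\; \mathbf{v} \times \mathbf{v} + \mathbf{r} \times \mathbf{a}
  \;=\; \mathbf{r} \times \mathbf{a}
\]
% Motion in a fixed plane with equal areas in equal times means r x v is constant:
\[
  \mathbf{r} \times \mathbf{v} = \text{const}
  \;\Longrightarrow\; \mathbf{r} \times \mathbf{a} = \mathbf{0}
  \;\Longrightarrow\; \mathbf{a} \parallel \mathbf{r}
\]
% Special case: uniform circular motion of radius r and constant speed v
\[
  \lvert \mathbf{a} \rvert = \frac{v^{2}}{r}, \qquad \text{directed from $B$ towards $O$}
\]
```

In the circular case the speed never changes, yet the acceleration is non-zero because the direction of the velocity changes. This is the contrast between rate of change of speed and rate of change of velocity drawn above.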
If, in addition, the body follows an elliptical path with O at one focus (Kepler’s first law), then the acceleration towards O is inversely proportional to the square of the distance of B from O. In the case of the planets moving around the sun, if we assume that the constant of proportionality is the mass of the sun, then Kepler’s third law follows as well. Thus, Newton’s new conception of acceleration causes Kepler’s three laws to “jump together” in a way that tests the conceptions that Kepler had previously employed, involving ellipses,
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
areas swept out by the line OB, the mean length of that line, and its period of revolution around the sun. For Whewell, the addition of the conceptions in each of these inductions is the defining characteristic of induction. Whewell introduced a new term for the process of binding the ‘facts’ by a new conception. He called it the colligation of facts, and used this phrase interchangeably with the word ‘induction’. Mill reacted negatively to this ‘improper’ use of the term. Mill agreed that new conceptions are often applied to the ‘facts’ during an induction, but he insisted that they are not part of the induction, and certainly not a defining characteristic. For Mill, induction consisted in extrapolating or interpolating a regularity from the known instances to the unknown instances, as is classically the case in examples of simple enumerative induction such as: All observed swans are white; therefore all swans are white. Whewell agreed that interpolation and extrapolation does, in general, result from a colligation of facts, but should not be the property that defines induction. It is tempting at this point to dismiss the debate as merely terminological. Whewell has an unusual conception of what induction is, but once it is taken on board, it is possible to translate between the two vocabularies. I agree that there is a large terminological component in the debate, but I insist that it is not merely terminological. Behind the difference in terminology is a very deep disagreement about the objectivity of human knowledge. Mill and Whewell both want to defend the objectivity of human knowledge. But they have quite distinctive views on how it comes about, and Whewell’s idea is interesting and new. Mill is entrenched in the rather extreme empiricist view that human knowledge is objective because it is built on an objective foundation of empirically given statements from which higher claims are inferred using the objective canons of inductive reasoning. 
Human knowledge maintains its objectivity (to the extent that it succeeds) by minimizing the influences of subjective elements at every stage of the process. For Whewell, subjective and objective elements are inseparable parts of human knowledge at any level in the hierarchy of knowledge, from the concept-ladenness of perceptual knowledge at the bottom, to the concept-ladenness of the highest forms of scientific knowledge at the top. The counter-proposal is that empirical success at the higher levels of knowledge, captured in terms of what he called the consilience of inductions, can help to secure the lower levels as a kind of bootstrapping effect. For example, Kepler’s colligations of facts are concept-laden in a way that makes them subjective at first, but once Newton used the new conception of force and acceleration to show how the facts, described in terms of Kepler’s colligations, lead successfully to a higher level colligation of facts, then the subjective elements involved are successfully “objectified”. Knowledge is like a building in which the addition of higher floors helps strengthen the lower levels. Whewell harbored a deep disdain for Mill’s purely empiricist philosophy, which he saw as constantly downplaying the importance of the subjective component of knowledge, or as trying to reduce it to purely empirical elements at every stage.
In contrast, the conceptual components of knowledge are, for Whewell, the very instruments that ultimately explain how human knowledge is possible. They produce the colligations that may be confirmed by the consiliences of colligations, which serve to objectify the subjective elements, making knowledge possible. The introduction of new conceptions in the colligation of facts is therefore a defining characteristic of induction. There is a major problem in trying to understand the Whewell-Mill debate from what the authors wrote. Whewell was primarily a historian of science, but Mill did not have a good knowledge of the history of science. Whewell allowed Mill to center the debate on particular examples of induction such as Kepler’s inference that Mars moves on an ellipse. They got so tied up in that example that the larger philosophical differences got lost in the discussion. It’s possible that Whewell’s hierarchical view of knowledge led him to believe that the bigger picture is played out in smaller examples on a smaller scale. Unfortunately, Whewell did not recall the Kepler example in sufficient detail to bring out those features of it. In section 2, I attempt to remedy that problem by describing the Kepler example in a way that challenges Mill’s picture of it. Section 3 turns to Whewell’s bigger picture by discussing his tests of hypotheses, while section 4 argues that the Whewell-Mill debate helps us understand why sophisticated methods of induction have not been programmed to run automatically on a computer. Finally, section 5 asks whether the Whewell-Mill debate may help us identify fundamental limitations in the scope of Bayesian and Likelihoodist theories of evidence and confirmation.

2 THE KEPLER EXAMPLE AND THE COLLIGATION OF FACTS

The colligation of facts was Whewell’s name for scientific induction. Its defining characteristic is the introduction of a new conception not previously applied to the data at hand, which unites and connects the data.
In curve fitting, the idea is easy to visualize. According to Whewell, “the Colligation of ascertained Facts into general Propositions” consists of (1) the Selection of the Idea, (2) the Construction of the Conception, and (3) the Determination of the Magnitudes. In curve fitting, these three steps correspond to (1) the determination of the Independent Variable, (2) the Formula, and (3) the Coefficients. Once the variables are chosen (Step 1), one chooses a particular functional relationship (Step 2; choose the Formula, Conception, family of curves) characterized in terms of some adjustable parameters (which Whewell calls coefficients), and then one fits the curves to the data in order to estimate the values of the parameters (Step 3; determining the magnitude of the coefficients). Consider the simplest possible example. Suppose we hang an object on a beam balance in order to infer its mass from the distance at which a unit mass must be slid along the beam to counterbalance the object in question. If the units are chosen appropriately, and the device is built well, then the mass value can be read straight from the distance at which the unit weight balances the beam. The dependent variable chosen in step 1 of the colligation of facts is x (there is no
independent variable), and the family of “curves” or the formula chosen in step 2 is x = m, where x is the distance of the unit mass from the fulcrum and m is an adjustable parameter, which represents the mass of the object in question. Whewell’s third step in the colligation of facts refers to the determination of the mass values by inferring them from the x values using the formula. The conception being introduced is the formula (∀o)(x(o) = m(o)), where ‘o’ ranges over a set of objects. The formula is something added to or imposed upon the facts by the mind of the investigator; it is not contained in, or read from, those facts. Of course, the magnitude of the mass is read from the facts; indeed, this is the third step in Whewell’s colligation of facts. But that does not mean that the formula itself is determined by the facts. The underdetermination implies that the subjective elements in the colligation of facts make the inductive conclusions uncertain and conjectural. In order to defend the objectivity of our knowledge, we have two choices. We can choose the Millean strategy of denying that there is ever any such underdetermination, or go for the Whewellian strategy of allowing that the consilience of inductions can later test the conjecture, and upgrade its confirmational status in light of this higher-level empirical success. To take our hindsight wisdom for granted, as Mill does, and to suppose that the initial induction had this status all along, is to commit the kind of error that non-historians often make. Though our beam balance example is not a real piece of history, the Millean mistake in that example would be to take the agreement of spring balance measurements of mass and beam balance measurements of mass for granted, and to assume that the justification for postulating ‘mass’ already existed prior to the consilience.
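The three steps of the colligation, and the later consilience with a second kind of measurement, can be sketched in a few lines of code. Everything specific here is an assumption made for illustration: the readings are invented, the names (`fit_beam`, `fit_spring`, `consilient`) are hypothetical, and the spring-balance formula y = K·m with a known calibration constant K is a stand-in that the text does not supply.

```python
from statistics import mean

# Step 1 (Selection of the Idea): the variable is the balance distance x.
# Hypothetical repeated readings per object, in units chosen so that x = m.
beam_readings = {
    "a": [2.01, 1.98, 2.02],
    "b": [4.97, 5.03, 5.00],
    "c": [1.00, 1.01, 0.99],
}

# Step 2 (Construction of the Conception): the formula x(o) = m(o).
# The formula is imposed on the readings; it is not read off from them.
def beam_model(m):
    return m

# Step 3 (Determination of the Magnitudes): estimate each m from the data.
# For the constant model x = m, the least-squares estimate is the mean reading.
def fit_beam(readings):
    return {obj: mean(xs) for obj, xs in readings.items()}

def beam_residuals(readings, masses):
    # How far each reading departs from what the fitted formula predicts.
    return {obj: [x - beam_model(masses[obj]) for x in xs]
            for obj, xs in readings.items()}

# A second, independent colligation (assumed form): spring extension
# y(o) = SPRING_K * m(o), with a hypothetical calibration constant.
SPRING_K = 0.5
spring_readings = {
    "a": [1.00, 1.01],
    "b": [2.49, 2.51],
    "c": [0.50, 0.50],
}

def fit_spring(readings, k=SPRING_K):
    # Least-squares estimate of m under y = k * m is mean(y) / k.
    return {obj: mean(ys) / k for obj, ys in readings.items()}

def consilient(est_a, est_b, tol):
    # The two colligations "jump together" if their mass values agree.
    return all(abs(est_a[obj] - est_b[obj]) <= tol for obj in est_a)

beam_mass = fit_beam(beam_readings)
spring_mass = fit_spring(spring_readings)

if __name__ == "__main__":
    print(beam_mass)
    print(spring_mass)
    print(consilient(beam_mass, spring_mass, tol=0.05))
```

Nothing in `fit_beam` alone tests the formula x = m, since any set of readings yields some value of m. Only the comparison in `consilient` could fail, which is why the agreement carries evidential weight that the first fit lacked.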
Unfortunately, the debate centers on the Kepler example, and neither author presents this important example in sufficient detail for the purpose at hand. It is especially confusing because Mill held the very strange and rather complicated view that Kepler did not perform any induction at all, even in the very broad sense in which Mill uses the term. Mill’s strategy is to make a distinction between a description and an explanation, and to argue that the conclusion in the Kepler example is merely a description of the data, and therefore, there was no induction performed by Kepler. For example, in his view, when the ancients hypothesized that the planets move by being embedded in crystalline spheres, they put forward an explanation of celestial motions. But when Ptolemy and Copernicus conceived of the motions in terms of the combinations of circles, they were merely putting forward a description. In Mill’s words:

When the Greeks abandoned the supposition that the planetary motions were produced by the revolution of material wheels, and fell back upon the idea of “mere geometrical spheres or circles,” there was more in this change of opinion than the mere substitution of an ideal curve for a physical one. There was the abandonment of a theory, and the replacement of it by a mere description. No one would think of calling the doctrine of material wheels a mere description. That doctrine was an attempt to point out the force by which the planets were acted upon, and compelled to move in their orbits. But when, by a great step in philosophy, the materiality of the wheels was discarded, and the geometrical forms alone retained, the attempt to account for the motions was given up, and what was left of the theory was a mere description of the orbits. [Mill, 1872, Book III, ch. ii, section 4]

It’s true that no one would think of calling the doctrine of material wheels a mere description. But it is very strange that Mill should insist that it becomes a mere description as soon as the materiality of the wheels is discarded. For these “mere descriptions” entail predictions that are not part of the data, and anything that goes beyond the data goes from the known to the unknown, and should therefore count as an induction, according to Mill’s own definition. Thus, even if Kepler’s conclusion were a mere description, in the sense that Mill has just described, that should not disqualify Kepler’s inference from counting as an induction. In order to be as charitable as possible to Mill, let me begin with the example that he presents as the clearest in his favor. It is about the circumnavigation of an island:

A navigator sailing in the midst of the ocean discovers land: he cannot at first, or by any one observation, determine whether it is a continent or an island; but he coasts along it, and after a few days finds himself to have sailed completely round it: he then pronounces it an island. Now there was no particular time or place of observation at which he could perceive that this land was entirely surrounded by water: he ascertained the fact by a succession of partial observations, and then selected a general expression which summed up in two or three words the whole of what he so observed. But is there anything of the nature of an induction in this process?
Did he infer anything that had not been observed, from something else which had? Certainly not. He had observed the whole of what the proposition asserts. That the land in question is an island, is not an inference from the partial facts which the navigator saw in the course of his circumnavigation; it is the facts themselves; it is a summary of those facts; the description of a complex fact, to which those simpler ones are as the parts of a whole. [Mill, 1872, Book III, ch. ii, section 3] Astonishingly, even in this example, Mill’s case is very weak. For if we think carefully about what is observed in this example, it is the similarity of the view of the shoreline at the start and the end of the circumnavigation. The views are not exactly the same because the distance from the shore is different, the tides are different, and the times of day are different. It is not given in the facts that the views are of the same shoreline. That is a conclusion. The hypothesis that an island has been circumnavigated explains why the views look similar. That the conclusion is inductive is made plain by the fact that it makes a prediction,
which may be false. For it predicts that if we continue sailing further in the same direction, then we will see an ordered sequence of previously seen views of the shoreline. It is puzzling that Mill does not see this; he clearly defines induction, in his terms, as any inference from the known to the unknown. Perhaps he sees the logical gap as small in this case. But it gets much larger in the Kepler example because it is not merely a circumnavigation that is inferred, but also the exact path (Kepler’s first law) and rate of motion (Kepler’s area law). The puzzle is resolved a little once we look more carefully at Mill’s description of the Kepler example. He continues from the previous passage. Now there is, I conceive, no difference in kind between this simple operation [in the island example], and that by which Kepler ascertained the nature of the planetary orbits: and Kepler’s operation, all at least that was characteristic in it, was not more an inductive act than that of our supposed navigator. The object of Kepler was to determine the real path described by each of the planets, or let us say the planet Mars (since it was of that body that he first established the two of his three laws which did not require a comparison of planets). To do this there was no other mode than that of direct observation: and all which observation could do was to ascertain a great number of the successive places of the planet; or rather, of its apparent places. That the planet occupied successively all these positions, or at all events, positions which produced the same impressions on the eye, and that it passed from one of these to another insensibly, and without any apparent breach of continuity; thus much the senses, with the aid of the proper instruments, could ascertain. What Kepler did more than this, was to find what sort of a curve these points would make, supposing them to be all joined together. He expressed the whole series of the observed places of Mars by what Dr. 
Whewell calls the general conception of an ellipse. This operation was far from being as easy as that of the navigator who expressed the series of his observations on successive points of the coast by the general conception of an island. But it is the very same sort of operation; and if the one is not an induction but a description, this must also be true of the other. [Mill, 1872, Book III, ch. ii, section 3]

Mill’s first naiveté is his passing from “the successive apparent places of the planet” to “the successive places of the planet”, as if there were no important gap between the 3-dimensional positions of Mars and the angular position of Mars relative to the fixed stars. Then, without any additional argument, Mill simply affirms the analogy: “. . . if the one is not an induction but a description, this must also be true of the other.” Let’s read more.

The only real induction concerned in the case, consisted in inferring that because the observed places of Mars were correctly represented by points in an imaginary ellipse, therefore Mars would continue to revolve in that same ellipse; and in concluding (before the gap had been filled up by further observations) that the positions of the planet during the time which intervened between two observations, must have coincided with the intermediate points of the curve. For these were facts which had not been directly observed. They were inferences from the observations; facts inferred, as distinguished from facts seen. But these inferences were so far from being a part of Kepler’s philosophical operation, that they had been drawn long before he was born. Astronomers had long known that the planets periodically returned to the same places. [Mill, 1872, Book III, ch. ii, section 3]

So, finally, Mill states why Kepler did not perform an induction. The induction was already performed by astronomers before him who had concluded that the planets returned to the same places after a fixed period of time. Yes, astronomers before Kepler did assume that planets repeated exactly the same paths. But that inductive conclusion is very vague because it does not say what the path was. Specifying the path adds a great deal of predictive content, and so Kepler’s inference does take us from what is known to what is unknown even if we treat the periodicity of the orbits as known. The only way out for Mill is to insist that the full specification of the path (the particular ellipse) was a part of the data. Mill seems to be assuming that continuous sections of Mars’s orbit were observed at various times, and that over time, these sections covered the whole ellipse. This is factually incorrect, as we shall see. But even if it were true, it still does not follow that Kepler’s conclusion is a mere description of the data, unless the observations are exact. Any margin of error can allow for a multitude of possible paths that can disagree in the accelerations that are attributed to the planets at different times.
The consequences that Kepler’s laws have concerning the (unobserved) instantaneous accelerations of the planets will be crucial in Newton’s higher-level colligation of Kepler’s three laws, according to which all the planets are attracted to the sun in inverse proportion to the square of their distances from the sun. It’s time to correct this series of mistakes (see also [Harper, 1989; 1993; 2002; Harper et al., 1994]).1 Mill’s first mistake was to ignore the difference between angular positions and 3-dimensional positions; this is a huge mistake. The correct story is complicated because it’s not so easy to fill this logical gap. To do it, Kepler first needed to determine earth’s orbit around the sun in relation to a particular point on the orbit of Mars. The measured period of the Martian orbit was 687 days, which is a little under two years. From Tycho Brahe’s observations from earth at E, and at E1 687 days later, Kepler obtained the angle SE1M directly, and obtained ESE1 from well-known tabulations of the (angular) motion of the sun across the fixed stars. (See Fig. 1.) Mill is right that Kepler simply assumed that the orbits were periodic, even though this assumption could never have been justified as exactly true (because it is not).

1 I follow Hanson’s [1970, pp. 277–282] account.
Figure 1. The first step in Kepler’s determination of Mars’ orbit was the calculation of the earth’s orbital motion. S denotes the sun, and M refers to Mars.

As a check, Kepler might also have compared the two apparent positions of Mars relative to the fixed stars to obtain the third angle in the triangle, SME1 (given that Mars returns to the same position M after 687 days). This is an important check given that the periodicity assumption is not entirely secure. The shape of the triangle SE1M is thereby given, and this determines the distance SE1 as a ratio of the (unknown) distance SM. Similar calculations for triangles SE2M, etc., obtained when Mars had returned to the point M again, then give the distances SE2, etc., as a ratio of SM also. By then fitting a smooth elliptic orbit to these discrete data points, Kepler determined the motion of the Earth around the sun. Only now is Kepler able to return to the main problem of measuring the distance of Mars from the sun at different stages of its orbit. Consider another observation of Mars at M′ in opposition with the earth at E0′, and again 687 days later with the earth at E1′. (See Fig. 2.) Again, the shape of the triangle SE1′M′ is determined from the knowledge of its angles, and this gives the distance SM′ as a ratio of SE1′. But the distances SE1′ are known (as a ratio of SM) from the previous colligation of the facts concerning the orbit of earth. Therefore, SM′, SM′′, etc., are determined as ratios of SM. Kepler then fitted another elliptic curve to obtain the orbit of Mars around the sun as a continuous function of time, which he described in his first law (elliptic path) and second law (equal areas swept out in equal times). Here Kepler is adding a new conception by applying his elliptical formula to the inferred data. Although
these inductions are suggestive, and he may have eliminated many competing hypotheses, Kepler himself did not succeed in fully justifying his results. That was left to Newton.
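The triangulation step described above can be sketched numerically. The following is only an illustration under stated assumptions, not a reconstruction of Kepler’s own computations: the function names are hypothetical, the angle of 42.9 degrees at the sun is an illustrative value (687 days is roughly that much short of two full revolutions of the earth), and the orbits are placed in a plane. Given the angle at S and the angle at E1 in the triangle SE1M, the law of sines yields the distance SE1 as a ratio of SM.

```python
import math


def distance_ratio(angle_at_sun_deg, angle_at_earth_deg):
    """Return SE1 / SM for the triangle S-E1-M, given two of its angles in degrees."""
    angle_at_mars_deg = 180.0 - angle_at_sun_deg - angle_at_earth_deg
    if min(angle_at_sun_deg, angle_at_earth_deg, angle_at_mars_deg) <= 0.0:
        raise ValueError("the three angles must form a proper triangle")
    # Law of sines: SE1 / sin(angle at M) = SM / sin(angle at E1).
    return (math.sin(math.radians(angle_at_mars_deg))
            / math.sin(math.radians(angle_at_earth_deg)))


def angle_between_deg(u, v):
    """Angle in degrees between two plane vectors u and v."""
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(dot / norm))


def synthetic_check(sm=1.5, se1=1.0, angle_at_sun_deg=42.9):
    """Build a made-up triangle with known sides, then recover SE1 / SM from angles alone.

    The default distances (1.0 and 1.5) are illustrative assumptions, not measured data.
    """
    sun = (0.0, 0.0)
    mars = (sm, 0.0)
    a = math.radians(angle_at_sun_deg)
    earth = (se1 * math.cos(a), se1 * math.sin(a))
    to_sun = (sun[0] - earth[0], sun[1] - earth[1])
    to_mars = (mars[0] - earth[0], mars[1] - earth[1])
    # This is the angle S-E1-M that an observer on earth could measure.
    angle_at_earth_deg = angle_between_deg(to_sun, to_mars)
    return distance_ratio(angle_at_sun_deg, angle_at_earth_deg)


if __name__ == "__main__":
    print(synthetic_check())  # should be close to 1.0 / 1.5
```

The sketch makes the text’s point concrete: the output is only a ratio of distances, and it is obtained only by assuming that Mars has returned to the same point M, so each “data point” on the orbit is itself the product of an inference.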
Figure 2. The second step in Kepler’s calculation of the Martian orbit.

Against Mill, it is now clear that Kepler’s data was only a discrete sampling of points on Mars’ orbit. Moreover, each was inferred from measurements of the angles of a triangle and distance ratios that were inferred from another colligation of facts. They were hardly the incorrigible “givens” that empiricists like Mill assume to be the bedrock of inductive inferences. The 3-dimensional positions attributed to Mars were determined in a heavily theory-laden way. However natural it might seem to assume, in hindsight, that the planets live in a 3-dimensional space, such attributions are not part of any theory-neutral observation language [Kuhn, 1970]. But, for Whewell, this does not signal the end of the objectivity of science. Higher-level consiliences discovered by Newton will eventually ground the validity of these lower-level conceptions. The same point applies to Kepler’s ellipse. Yes, the ellipse hypothesis might have produced the best fit with the data out of the nineteen hypotheses that Kepler tried, but that does not mean that it was completely secure at that time. It was later confirmed by the intimate connection between the inverse square law and Kepler’s first and second laws discovered by Newton when he proved that any planet moving such that the line from the sun sweeps out equal areas in equal times is accelerating towards the sun, and further, that if the path is an ellipse, the sun-seeking acceleration is inversely proportional to the square of the distance. Furthermore, Kepler’s third law is icing on the cake because it also follows from
the inverse square law that the ratios R³/T² are independent measurements of the Sun’s mass, adding to the consilience of inductions.

Colligation, for Mill, is a part of the invention process, whereas induction (properly so-called) is relevant to questions of justification. Whewell’s characterization of induction, Mill objects, belongs to (what we call) the ‘context of discovery’. Accordingly, Mill [1872, Book III, ch. ii, section 5] charges that “Dr Whewell calls nothing induction where there is not a new mental conception introduced and everything induction where there is.” “But,” he continues, “this is to confuse two very different things, Invention and Proof.” “The introduction of a new conception belongs to Invention: and invention may be required in any operation, but it is the essence of none.” Abstracting a general proposition from known facts without concluding anything about unknown instances, Mill goes on to say, is merely a “colligation of facts” and bears no resemblance to induction at all. In sum, Mill thinks that colligations of facts are mere descriptions that have nothing to do with the justification of scientific hypotheses.

Contrary to what Mill thinks, colligations are not mere descriptions. They do add something unknown to the facts; any general proposition (in Whewell’s sense) can be tested further, either by untried instances, or by the consilience of inductions. Such a proposition does, therefore, go beyond the data. Yes, mental acts are essential to invention and discovery. But they are also essential to justification. Conceptions are essential to the justification of the hypothesis that results from a colligation of facts in spite of the fact that conceptions are mental, and therefore subjective. Conceptions are essential because there can be no consilience of inductions without them.
For, the consilience of inductions often consists of the agreement of magnitudes (Step 3 in the colligation of facts) determined in separate inductions, which derive from the new conception imposed upon the facts in those inductions. Mill has no good reason to accuse Whewell of confusing invention and proof. At its core, the dispute is really about the nature of evidence and justification: about how hypotheses are tested and confirmed.

3 WHEWELL’S TESTS OF HYPOTHESES

Whewell distinguishes four tests of scientific hypotheses (although the last one is more like a sign than a test). By ‘instances’ he is referring to empirical data that can be fitted to the hypothesis in question:

1. The Prediction of Tried Instances;
2. The Prediction of Untried Instances;
3. The Consilience of Inductions; and
4. The Convergence of a Theory towards Simplicity and Unity.

Keep in mind that Whewell uses the term ‘colligation of facts’ interchangeably with ‘induction’. A consilience of inductions occurs when two, or more, colligations of
facts are successfully unified in some way. Newton’s theory of gravity applied the same form of equation to celestial and terrestrial motions (the inverse square law), and in the case of the moon and the apple, both colligations of facts made use of the same adjustable parameter (the earth’s mass). Consequently, the moon’s motion and an apple’s motion provided independent measurements of the earth’s mass, and the agreement of these independent measurements was an important test of Newton’s hypothesis. This test is more than a prediction of tried or untried instances. It leads to a prediction of facts of a different kind (facts about celestial bodies from facts about terrestrial bodies, and vice versa). The consilience of inductions leads to a convergence towards simplicity and unity because unified theories forge connections between disparate phenomena, and these connections may be tested empirically, usually by the agreement of independent measurements. So, a theory can be unified in response to a successful consilience of inductions. Simplicity and unity are necessary conditions for the consilience of inductions, but not sufficient. A theory like ‘everything is the same as everything else’ is highly unified, but not consilient. As Einstein once described it, science should be simple, but not too simple. In the Novum Organon Renovatum, Whewell [1989, 151] speaks of the consilience of inductions in the following terms: We have here spoken of the prediction of facts of the same kind as those from which our rule was collected [tests (1) and (2)]. But the evidence in favour of our induction is of a much higher and more forcible character when it enables us to explain and determine cases of a kind different from those which were contemplated in the formation of our hypothesis. The instances in which this has occurred, indeed, impress us with a conviction that the truth of our hypothesis is certain. No accident could give rise to such an extraordinary coincidence. 
No false supposition could, after being adjusted to one class of phenomena, exactly represent a different class, where the agreement was unforeseen and uncontemplated. That rules springing from remote and unconnected quarters should thus leap to the same point, can only arise from that being the point where truth resides. Accordingly the cases in which inductions from classes of facts altogether different have thus jumped together, belong only to the best established theories which the history of science contains. And as I shall have occasion to refer to this peculiar feature of their evidence, I will take the liberty of describing it by a particular phrase; and will term it the Consilience of Inductions. [Whewell, 1989, 153] “Real discoveries are . . . mixed with baseless assumptions” (Whewell, 1989, 145), which is why Whewell considers the consilience of inductions to provide additional guidance in finding the “point where the truth resides.” Whewell has been soundly criticized over the years for his claim that the consilience of inductions “impress us with a conviction that the truth of our hypothesis
is certain” and that “no false supposition could, after being adjusted to one class of phenomena, exactly represent a different class, where the agreement was unforeseen and uncontemplated.” Given the explication of the notion of truth that we use today, according to which a hypothesis is false if any small part of it is false, Whewell’s claims cannot be defended. But if they are suitably qualified, they cannot be so easily dismissed. It is true that such cases “belong only to the best established theories which the history of science contains.”

In place of the consilience of inductions, Mill talks about the deductive subsumption of lower-level empirical laws under more fundamental laws, which is a well-known part of hypothetico-deductivism. Whewell’s account of consilience gets around the common objection that deductive subsumption is too easy to satisfy. For instance, hypothetico-deductivism tries to maintain that Galileo’s theory of terrestrial motion, call it G, and Kepler’s theory of celestial motion, K, are subsumed under Newton’s theory N because N deductively entails G and K. The problem is that G and K are also subsumed under the mere conjunction (G&K), so deductive subsumption by itself cannot fully capture the advantage that N is more unified or consilient. Many respond to the problem by saying that unification and simplicity must be added to the confirmational equation as non-empirical virtues. But this is to short-change empiricism, because N does make empirical predictions that (G&K) does not. Namely, N predicts the agreement of independent measurements of the earth’s mass from celestial and terrestrial phenomena. That is why Whewell’s theory is better than Mill’s theory.

Many of these ideas about confirmation have been raised in the literature before [Forster, 1988]. Earman [1978] uses the idea that unified hypotheses have greater empirical content to make sense of Ramsey’s argument for realism.
Friedman [1981; 1983] uses a similar idea to make sense of arguments for the reality of spacetime. Glymour [1980] discusses ideas about theory and evidence that have a distinctly Whewellian flavor. Norton [2000a; 2000b] emphasizes the overdetermination of parameters, while Harper and Myrvold [2002] and Harper [2002; 2007] emphasize the importance of the agreement of independent measurements, and provide excellent detailed examples. These authors appreciate the nuances involved in real examples of scientific discovery, yet there is still a failure to see two things clearly: (1) the depth of the difficulties facing standard theories of confirmation, such as Bayesianism, and (therefore) (2) the relevance of Whewell’s ideas to contemporary debates about theory and evidence. To defend the objectivity of knowledge, we need to understand how conceptions introduced in our best explanations are “objectified” by the agreement of independent measurements in a hierarchy of successive generalizations. None of this is going to “fall out” of standard formal theories of epistemology.
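The agreement of independent measurements discussed in this section can be illustrated with a rough calculation. This is a sketch under assumptions that are not part of the text: it uses rounded modern values for the gravitational constant, the orbital radii and periods, and the earth’s radius and surface gravity, and it treats every orbit as circular. The function names are hypothetical. Under the inverse square law, an orbit gives GM = 4π²R³/T², while the fall of a body at the earth’s surface gives GM = gr².

```python
import math

G = 6.674e-11  # gravitational constant in m^3 kg^-1 s^-2 (rounded modern value)
DAY = 86400.0  # seconds per day


def mass_from_orbit(radius_m, period_s):
    """Central mass implied by a circular orbit: M = 4 pi^2 R^3 / (G T^2)."""
    return 4.0 * math.pi ** 2 * radius_m ** 3 / (G * period_s ** 2)


def mass_from_surface_gravity(g, radius_m):
    """Mass implied by the acceleration of falling bodies at the surface: M = g r^2 / G."""
    return g * radius_m ** 2 / G


# Celestial measurement of the earth's mass (the moon): assumed rounded values.
earth_mass_from_moon = mass_from_orbit(3.844e8, 27.32 * DAY)

# Terrestrial measurement of the earth's mass (the apple): assumed rounded values.
earth_mass_from_apple = mass_from_surface_gravity(9.81, 6.371e6)

# R^3 / T^2 for several planets, each read as a measurement of the sun's mass.
# Entries are (mean orbital radius in metres, period in days), rounded.
PLANETS = {
    "Earth": (1.496e11, 365.25),
    "Mars": (2.279e11, 687.0),
    "Jupiter": (7.785e11, 4332.6),
}
sun_mass_estimates = {
    name: mass_from_orbit(radius, days * DAY)
    for name, (radius, days) in PLANETS.items()
}

if __name__ == "__main__":
    print("earth mass from moon :", earth_mass_from_moon)
    print("earth mass from apple:", earth_mass_from_apple)
    for name, mass in sun_mass_estimates.items():
        print("sun mass from", name, ":", mass)
```

On these inputs the two estimates of the earth’s mass agree to within a couple of percent, and the three estimates of the sun’s mass agree more closely still. The orbital formula strictly measures the combined mass of the two bodies, which is one reason the agreement is approximate rather than exact. The mere conjunction (G&K) says nothing about whether these numbers should agree at all.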
4 DISPUTES ABOUT INDUCTION THAT HAVE IGNORED THESE LESSONS

Hempel [1945] made an important distinction between the direct and indirect confirmation of hypotheses. Direct confirmation is the familiar process by which a generalization is confirmed by observed instances of it, while indirect confirmation arises from its place in a larger network of hypotheses. For example, the law of free fall on the moon is directly confirmed by the experiments done on the moon by the Apollo astronauts, but was indirectly confirmed long before that by being deduced from Newton’s theory of gravitation, which has its own support. Whewell’s discussion of what he termed successive generalizations and the consilience of inductions can be seen as an account of indirect confirmation.

Whewell’s idea is this: The aim of any inductive inference is to extract information from the data that can then be used in higher-level inductions. For example, Copernicus’s theory can be used to infer 3-dimensional positions of the planets relative to the sun from 2-dimensional positions relative to the fixed stars. The 3-dimensional positions were then used by Newton to provide instances of the inverse square law of gravitation, which enables us to make predictions about one planet based on observations of other planets. It was only this higher-level empirical success that finally confirmed Copernicus’s conjecture that the earth moved, with the sun at the center. Only then can we fully trust the inferences about 3-dimensional positions inferred from Copernicus’s theory on which Newton’s inductions were based. Whewell explains why this circle is not vicious. Mill’s mistake is to reduce Whewell’s innovative idea of the consilience of inductions to the deductive subsumption of lower-level generalizations under higher-level laws.
The problem with Mill’s idea is that it seems to involve a kind of circular reasoning: A is confirmed because A entails B and B is confirmed; but wait, B is now better confirmed because A is confirmed and A entails B. Mill fails to notice that higher-level generalizations have a direct kind of empirical confirmation in terms of the agreement of independent measurements of theoretically postulated quantities. In the case of Newton’s theory of planetary motions, it was the agreement of independent measurements of the earth’s mass obtained by observing the moon’s motion and terrestrial projectiles, and the agreement of independent measurements of the sun’s mass, and of Jupiter’s mass, and so on. The consilience of inductions thereby relies on aspects of the data that play no role in the confirmation of lower-level generalizations. This is why indirect confirmation, on the Whewellian view, avoids the Millian circle.

Whewell’s writings were responsible, in part, for the existence of Book III, On Induction, of Mill’s System of Logic, in which many footnotes and sections are devoted to the important task of separating Mill’s views from Whewell’s. In 1849, Whewell published a reply called “Of Induction, with Especial reference to Mr. Mill’s System of Logic”. Near the beginning of his commentary, Whewell’s main complaint [1989, p. 267] is that Mill “has extended the use of the term Induction not only to cases in which general induction is consciously applied to particular instances; but to cases in which the particular instance is dealt with by means of experience in the rude sense in which experience is asserted of brutes; and in which, of course, we can in no way imagine that the law is possessed or understood as a general proposition.” Mill has thus “overlooked the broad and essential difference between speculative knowledge and practical action; and has introduced cases which are quite foreign to the idea of science, alongside with cases from which we may hope to obtain some views of the nature of science and the processes by which it must be formed.”

In a footnote to chapter i, Book III, Mill [1872] replies: “I disclaim, as strongly as Dr. Whewell can do, the application of such terms as induction, inference, or reasoning, to operations performed by mere instinct, that is from an animal impulse, without the exertion of any intelligence.” But the essence of Whewell’s complaint is that simple enumerative induction, and Mill’s other methods of induction, are no more complicated than animal impulses even when they are consciously employed; at least, not different in a way that accounts for the difference in intelligence. If the complaint is about the established use of the word “induction”, then I tend to think that Whewell is the one swimming against the tide. But it would be a mistake to think that this is merely a linguistic debate about the use of the word ‘induction’; for as Whewell notes, there is always a proposition that accompanies every definition, and the proposition in this case is something like: Simple enumerative induction (such as inferring that all humans are mortal from the fact that John, Paul, . . . are mortal) adequately represents the habit of mind that brings about the highest forms of human knowledge. This is an assumption that should be questioned in light of what we know today.
Whewell expands upon his worries by characterizing most generalizations of the form “All humans are mortal” as mere juxtapositions of particular cases [Whewell, 1989, 163]. Whewell agrees that induction is the operation of discovering and proving general propositions, but he appears to have a different understanding of the term “general”. For Whewell (1989, 47) it is necessary that “In each inductive process, there is some general idea introduced, which is given, not by the phenomena, but by the mind.” The inductive conclusion is, therefore, composed of facts and conceptions “bound together so as to give rise to those general propositions of which science consists”. “All humans are mortal” is not general in the appropriate sense because there has been no conception added to the fact that John, Paul, . . . are mortal.2 Whewell insists that in every genuine induction, “The facts are known but they are insulated and unconnected . . . The pearls are there but they will not hang together until some one provides the string” [Whewell, 1989, 140-141]. The “pearls” are the data points and the “string” is a new conception that connects and unifies the data. The “pearls” in “All As are Bs” are unstrung because “All As are Bs”, though general in the sense that it is universally quantified, does not connect or unify the facts; it does not colligate the facts. For Whewell, this process of uniting the facts under a general conception, which he calls the colligation of facts, is an essential step in the formation of human knowledge. Mill would gladly transfer Whewell’s description of the colligation of facts to his own pages, but fails to see that it has the kind of importance that Whewell attaches to it.

2 But it would be incorrect to say that Whewell thinks that no generalization of the form “All As are Bs” can introduce a new conception. For example, it could be that “All metals conduct electricity” qualifies as an inductive conclusion because the term ‘metal’ may represent a new conception not contained in the facts. I owe this point to Dan Schneider.

There are two worries that everyone should have about simple enumerative induction: (1) It is not a habit of mind that we have in a great many cases; in fact, it is the subject of well-known philosophical jokes. A philosopher jumps from the Empire State Building and is heard to say as he falls past the 99th floor “99 floors and I’m not dead!” As a different example, imagine a study of radioactive decay in which all the samples observed are radioactive, yet the very law of radioactive decay discovered from these observations leads us to deny that any finite sample will be radioactive for all times. (2) When such a habit of mind is desirable, it is very easy to implement. Simple associative learning is not what marks the difference between human intelligence and animal intelligence. I say ‘salt’ and you think ‘pepper’. Pavlov’s dogs are the most famous case of a kind of associative learning in animals known as classical conditioning. In more recent times, the same learning ability has been demonstrated in animals as primitive as sea slugs (Aplysia californica). It’s not just that “brutes” do it, sea slugs do it! A strong 1-sec electric shock to the mantle of the slug (called the unconditioned stimulus, UCS) elicits a prolonged withdrawal of its siphon. The UCS in Pavlov’s dogs is the smell of meat, which elicits salivation. The aim of the experiments is to demonstrate an ability to learn to predict the UCS from a conditioned stimulus (CS). In Pavlov’s dogs, the CS was the sound of a bell.
When presented immediately prior the presentation of food on several occasions, the bell would eventually trigger the salivation response by itself without the smell of meat, thereby indicating that the dogs had learned to predict the presence of meat from the sound of the bell. In the case of the sea slugs, one CS was a short tactile stimulation of the siphon, which elicited a short withdrawal of the siphon. When the CS was presented a short 0.5 sec before the UCS, and this was repeated 15 times, the CS would produce a siphon withdrawal that is more 4 times as long as what would have resulted without the learned association between the CS and the UCS. Just as Pavlov’s dogs appear to learn to “predict” the presence of food from the sound of a bell, the sea slugs appear to anticipate a large electrical shock from a short tactile stimulation of the siphon.3 Sea slugs have about 20,000 nerve cells in its central nerve system arranged in nine ganglia [Macphail, 1993, p. 32] compared to the approximately 1012 neurons in a human being, some of which may have several thousand synaptic contacts [Nauta and Feirtag, 1986]. What is the function of these extra neurons? To learn a billion more associations of the same 3 No such association is learned when the CS is presented after the UCS. See [Macphail, 1993, pp. 103-5], for a more complete description of the experiment, or the original source; Carew, Hawkins, and Kandel 1983.
Malcolm Forster
kind? If so, how are these learned associations organized or associated together? The most influential part of the System of Logic is Mill’s four methods of induction [Mill, 1872, Book III, Chapters VIII, IX]; but these are also the butt of many jokes. A philosopher goes to a bar on Monday and drinks whiskey and soda water all night. The next day he drinks vodka and soda. The following night, gin and soda, and then the night after that, bourbon and soda. Finally, on Friday, he comes into the bar and complains that he’s been too inebriated for the past week to get much work done, so tonight he’s going to drink whiskey without the soda. The philosopher has used Mill’s method of agreement to observe that the only common thread in the four times he’s been inebriated is that he’s been drinking soda water. Therefore, soda water causes inebriation. So much the worse for simple inductive rules mindlessly applied. Of Mill’s four methods, Whewell [1989, p. 286] writes: “Upon these methods, the obvious thing to remark is, that they take for granted the very thing which is the most difficult to discover, the reduction of the phenomena to formulae such as are here presented to us. When we have any set of complex facts offered to us; for instance ... the facts of the planetary paths, of falling bodies, of refracted rays, of cosmical motions, of chemical analysis; and when, in any of these cases, we would discover the law of nature which governs them, or if any one chooses so to term it, the feature in which all the cases agree, where are we to look for our A, B, C, and a, b, c? Nature does not present to us the cases in this form ...” Whewell’s point is very simple.
In order to discover a connection between two disparate phenomena, we need to be able to extract the relevant information from each domain, that is, introduce quantities that will prove to be connected, yet we don’t know that until after we collect the right kind of data and see whether the quantities fit together in higher-level regularities. This kind of catch-22 makes discovery extremely difficult, though not impossible for human beings. But for present-day machines, computer systems, and primitive organisms, it has not been possible. A failure to see the depth of the problem is the root cause of the overly optimistic forecasts in the 1960s about how AI systems would match human intelligence within 20 years. Even the apparent exceptions to this, such as the Deep Blue chess-playing program, prove the rule. In 1997, Deep Blue became the first computer system to defeat a reigning world champion (Garry Kasparov) in a match under standard chess tournament time controls. But it did so by brute-force computing power, rather than the pattern-recognition techniques of human chess masters, which enable them to play 40 opponents at once. (See [Dreyfus, 1992] for an in-depth analysis.) In 1987, researchers based at Carnegie Mellon University (CMU) published a book called Scientific Discovery: Computational Explorations of the Creative Process by Langley, Simon, Bradshaw, and Zytkow. Again, the basic Whewellian criticism was raised about computer programs such as Bacon, an AI system that rediscovered numeric laws such as Kepler’s third law, which makes the period of revolution of a planet around the Sun proportional to the 3/2 power of the mean radius of its orbit. It’s one
The Debate between Whewell and Mill on the Nature of Scientific Induction
thing to ask how to relate one variable to another when the variables are already given, but quite another to discover Kepler’s laws from raw data about the angular positions of the planets at various times. Even knowing that ‘position relative to the fixed stars’ and ‘time’ can be functionally related is a major step forward. Nothing like this has been replicated by any computer system. That’s not to say that it’s impossible (indeed Langley and Bridewell [in press] speak in terms that remind me of Whewell). After all, our brains are computers and a network of these computers did solve the problem. But we must recognize that the requisite “explication of the conceptions”, to use Whewell’s term, is difficult. The most recent instance of this kind of disagreement surrounds the work by another group at CMU headed by Spirtes, Glymour and Scheines [1993], who have developed algorithms for discovering causal models or Bayes nets. Humphreys and Freedman [1996] published a critique, while Spirtes, Glymour and Scheines [1997] and Korb and Wallace [1997] published replies. Again, this research in computer-automated algorithms of scientific discovery is extremely valuable. The question is whether it could be improved by an implementation of Whewellian ideas (see [Forster, 2006]). In 1981, Hinton and Anderson edited an important volume on Parallel Models of Associative Memory, which was followed up by the very famous work on parallel distributed processing edited by Rumelhart and McClelland in 1986, which gave birth to a thriving industry on connectionist networks, otherwise known as artificial neural networks. The breakthrough was made possible by the mathematical discovery of how to implement a learning algorithm in neural networks that propagates errors backwards through the network to adjust connection weights so as to reduce the error in the output [Rumelhart et al., 1986].
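The contrast drawn above, between relating variables that are already given and discovering the right variables from raw data, can be made concrete with a small sketch. It is not a reconstruction of the Bacon program. It only shows that once ‘mean radius’ and ‘period’ are handed over as columns of numbers, recovering the 3/2 exponent is a routine least-squares fit in log-log space. The planetary values are approximate textbook figures, and the names PLANETS and fit_power_law are hypothetical choices made for this illustration.

```python
import math

# Assumed, approximate textbook values: (mean orbital radius in astronomical
# units, period of revolution in years). They are not taken from the chapter.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Ordinary least-squares fit of log(T) = k * log(R) + c; returns (k, c)."""
    xs = [math.log(radius) for radius, _ in pairs]
    ys = [math.log(period) for _, period in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


exponent, intercept = fit_power_law(list(PLANETS.values()))
print(f"fitted exponent: {exponent:.3f}")
```

The fitted exponent comes out close to 1.5. The hard part that Whewell points to is entirely outside this code: deciding that these two quantities, rather than raw angular positions at various times, are the ones to tabulate.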
Yet again, the lesson turned out to be the same: an all-purpose neural network is able to approximate any function in principle, but in practice too much flexibility creates difficulties. Top-down constraints need to be imposed on the network before data-driven search methods can match any of the cognitive abilities of human beings. My only point is that, in each of these episodes, it has taken quite some time to rediscover some of the points that were raised 150 years ago in the Whewell-Mill debate.
5 IMPLICATIONS FOR PROBABILISTIC THEORIES OF EVIDENCE AND CONFIRMATION
Allow me to predict a new example of the same thing. At the present time, there seems to me to be an overestimation of what the methods of statistical inference can achieve. In philosophy of science, major figures in the field endorse the view that Bayesian or Likelihoodist approaches to statistical reasoning can be extended to cover scientific reasoning more generally. In [Forster, 2007], I have argued that standard statistical methods of model selection, such as AIC [Akaike, 1973] and BIC [Schwarz, 1978], are fundamentally limited in their ability to replicate the methods of scientific discovery. (Note that connectionist networks are also implementing a standard statistical learning rule known as the method
of least squares.) In [Forster, 2006], I put forward a positive suggestion about how Whewellian ideas about the consilience of inductions enrich the relationship between theory and evidence, which could improve the rate of learning and the amount that can be learned. Continuing on the same theme, philosophers of science, such as Hesse [1968; 1971], Achinstein [1990; 1992; 1994], and more recently Myrvold [2003], have tried to capture the confirmational value of consilience and unification in terms of standard probabilistic theories of confirmation, but with limited success. The reason for their limited success is illustrated by the following schematic example. Suppose we have a set of three objects {a, b, c} that can be hung on a mass measuring device, either individually or in pairs, a*b, a*c, and b*c, where a*b denotes the object consisting of a conjoined with b, and so on. Suppose that the Data consists of six measurements of the distances at which the counterweight needs to be hung from the center of a beam balance in order to balance the object being measured. Let’s denote this observed distance as x(o), where o is the name of the object being measured. In order to talk about the consilience of inductions, we need two, or more, separate inductions; so let’s divide the data into two parts, and consider inductions performed on each part. Data1 = {x(a) = 1, x(b) = 2, x(c) = 3}, and Data2 = {x(a*b) = 3, x(a*c) = 4, x(b*c) = 5}. The core hypothesis under consideration is the assertion that for all objects o, x(o) = m(o), where m(o) denotes a theoretically postulated property of object o called mass. M:
(∀o)(x(o) = m(o)).
The quantity x can be repeatedly measured, but no assumption is made that its value will be the same on different occasions. That depends on what the world is like. On the other hand, the hypothesis M asserts that masses are constant over time. The postulated constancy of m, combined with the equation, predicts that repeated measurements on the same object will be the same. It’s easy to equate some new quantity m with the outcome of measurement x, but it’s not so easy to defend the new quantity as representing something real underlying the observable phenomena. If we apply the conception that x(o) = m(o) to the two data sets, we notice that the hypothesis accommodates the data in each case, and there is no test of the hypothesis in the precise sense that the hypothesis would not have been refuted had the data been “generated by” a contrary hypothesis [Mayo, 1996]. The predictive content is not tested by single measurements of each mass. Yet, we
can arrive at an inductive conclusion from the data according to standard rules. In the case of Data1, we arrive at the hypothesis h1 :
M &{m(a) = 1, m(b) = 2, m(c) = 3}.
Note that h1 ⇒ Data1, where ‘⇒’ means ‘logically entails’. I have no problem with the claim that the data Data1 confirms the hypothesis h1 , although it does so by pointing to the particular predictive hypothesis out of all those compatible with M , rather than confirming M itself. Now let’s consider the inductive conclusion arrived at on the basis of Data2: h2
M & {m(a ∗ b) = 3, m(a ∗ c) = 4, m(b ∗ c) = 5}.
Again, h2 ⇒ Data2, and the data confirms the inductive hypothesis. On my understanding of Whewell and Mill, they would agree on this. To explain the difference between Whewell and Mill, let’s consider a stronger inductive conclusion that includes the standard Newtonian conception that the mass of a composite object such as a*b is the sum of the masses of the parts. We shall call this the law of the composition of masses (LCM), and write it more formally as: LCM
(∀o1 )(∀o2 )(m(o1 ∗ o2 ) = m(o1 ) + m(o2 )).
Let’s denote the stronger inductive conclusions drawn from the data sets by H1 = h1 & LCM and H2 = h2 & LCM, respectively. Again, the data confirms the respective hypotheses, but only by picking out the mass values that correctly apply to the objects. There is no confirmation of the general propositions in the inductive hypotheses by Data1 or Data2. But all this changes when we consider the bigger picture; for H1 and H2 entail more than the data from which they were inductively inferred: they predict the other data set as well. That is, H1 ⇒ Data2, and H2 ⇒ Data1. This is an illustration of the idea behind Whewell’s consilience of inductions: “That rules springing from remote and unconnected quarters should thus leap to the same point, can only arise from that being the point where truth resides” [Whewell, 1989, p. 153]. The hypotheses h1 and h2 enjoy no such relationship with the data. Another way of seeing the same thing is to note that the two data sets, Data1 and Data2, provide independent measurements of the theoretically postulated masses, m(a), m(b), and m(c), and the independent measurements agree. [Footnote 4: “Independent” just means that the measurements are calculated from non-overlapping sets of data.] From Data1, we obtain values of m(a), m(b), and m(c), and from Data2, we obtain the equations m(a) + m(b) = 3, m(a) + m(c) = 4, and m(b) + m(c) = 5. Since there are three equations in three unknowns, these equations yield an independent set
of values for the three masses, which agree with the first set. Therefore H is confirmed by agreement of independent measurements of its postulated quantities, while h = h1 & h2 is not. The intuition just described is far more forceful if we embellish the example by including a set of mass measurements on a larger set of objects; say 25 objects. Then Data1 consists of 25 measurements of the 25 objects, whereas Data2 consists of 300 measurements of all possible pairings of the 25 objects, which provides 12 more independent measurements of each mass. The fact that 13 independent measurements of mass agree for each of 25 different objects is very strong evidence for the hypothesis H. Unfortunately, we cannot obtain this conclusion (that H is better supported by the Data than h) from the standard theories of confirmation used in contemporary philosophy of science or in statistics, such as Bayesianism and Likelihoodism (see footnote 5). These views are committed to a likelihood theory of evidence that says that the degree to which the total evidence, the Data in our example, supports a hypothesis, such as H or h, is fully exhausted by the likelihoods P(Data|H) and P(Data|h). But H ⇒ Data, and h ⇒ Data, and, therefore, P(Data|H) = 1 = P(Data|h). The relationship between theory and evidence is therefore the same for each of the hypotheses according to these (well respected) accounts of the nature of evidence. I suspect that the Bayesians and Likelihoodists will respond to this example along the following lines. Instead of considering the hypotheses as I have defined them, which include the “determination of the magnitudes” (as Whewell would put it), we should consider just the generalizations M and (M & LCM). Then we can argue that (M & LCM) gives the Data a greater probability (i.e., the hypothesis has a greater likelihood).
[Footnote 5: The one exception that I know of is Mayo [1996]. Her take on this example would be that H is severely tested by the Data because the probability is high that H would be refuted if H were false. But h is not severely tested by the Data because it would not be refuted if h were false. The uneasiness I have with this approach is the reference to counterfactual data. Other things being equal, I prefer a theory of confirmation that focuses only on the actual data.] They may argue that P(Data | M & LCM) > P(Data | M). [Footnote 6: Proof: P(Data | M & LCM) = P(Data1 | M & LCM) P(Data2 | M & LCM & Data1). But P(Data2 | M & LCM & Data1) = 1, so P(Data | M & LCM) = P(Data1 | M & LCM). But now it is clear that the hypotheses “say the same thing” about Data1, so P(Data1 | M & LCM) = P(Data1 | M), and it is obvious that P(Data1 | M) > P(Data | M). Thus, the result follows.] The idea behind this claim is very simple, but first you need to understand that (by the axioms of probability) the likelihood of a family of hypotheses is equal to a weighted average of the likelihoods of the hypotheses in the family. (M & LCM) is a family of hypotheses in which one member, namely H, has likelihood 1, while all the others have likelihood 0 because they get at least one mass value wrong (out of the masses that have been measured). The same applies to M; it contains one hypothesis with likelihood 1 and the rest with likelihood 0. (Having likelihood 0 usually means that the hypothesis is refuted by the data.) Thus, (M & LCM) has a greater likelihood because the likelihood of M is calculated by
averaging over a larger set of other hypotheses, all of which have zero likelihood. In other words, the likelihood of M is smaller because its maximum likelihood is washed out by averaging over a greater number of hypotheses. The first problem with this reply is that it changes the subject. We began by talking about the confirmation of H and h, and ended up talking about something else. But let’s consider the confirmation of (M & LCM) and M. The problem is that under any sensible way of averaging likelihoods, the average turns out to be zero, zilch, nil. This is because there is only one point hypothesis that has non-zero likelihood, so any weighting that averages (integrates) over an infinite number (a continuum) of point hypotheses will yield an average likelihood of zero [Forster and Sober, 1994]. So, the claim that P(Data | M & LCM) > P(Data | M) is incorrect. It should have been P(Data | M & LCM) ≥ P(Data | M). And under the rather general conditions I have stated, P(Data | M & LCM) = P(Data | M). The core part of the Bayesian argument, the part that was right, derives from the inequality P(Data2 | M & LCM & Data1) = 1 > P(Data2 | M & Data1) = 0. But this inequality is just what lies at the heart of Whewell’s consilience of inductions! Once we see that the inequality is what’s crucial, then we can express what should be said about the original example in the language of probability, without changing the subject. For note that the hypothesis (M & LCM) & Data1 is logically equivalent to H1, as we previously defined it, and M & Data1 is logically equivalent to h1. So the inequality is just P(Data2 | H1) = 1 > P(Data2 | h1) = 0, to which we could add P(Data1 | H2) = 1 > P(Data1 | h2) = 0. In other words, the part of the likelihood analysis that makes sense rests on Whewellian principles. Why try to wrap it up in a Bayesian package with trappings that are false at worst, and irrelevant at best?
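The averaging argument can be checked on a toy discretisation. The sketch below is an illustration under stated assumptions, not a calculation from the chapter: every free mass parameter is restricted to a finite grid of candidate values that happens to contain the measured ones, all point hypotheses get equal weight, and a point hypothesis has likelihood 1 only if it predicts all six measurements exactly. The name average_likelihood and the grid itself are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Measurements from the schematic example in the text.
DATA1 = {"a": 1, "b": 2, "c": 3}
DATA2 = {("a", "b"): 3, ("a", "c"): 4, ("b", "c"): 5}


def average_likelihood(grid, with_lcm):
    """Equal-weight average of 0/1 likelihoods over all point hypotheses.

    Under M & LCM only the three single-object masses are free parameters.
    Under M alone the three composite masses are free as well.
    """
    singles = list(DATA1)
    pairs = list(DATA2)
    n_free = len(singles) + (0 if with_lcm else len(pairs))
    hits = 0
    total = 0
    for values in product(grid, repeat=n_free):
        single_mass = dict(zip(singles, values))
        if with_lcm:
            # LCM fixes each composite mass as the sum of its parts.
            pair_mass = {(x, y): single_mass[x] + single_mass[y] for x, y in pairs}
        else:
            pair_mass = dict(zip(pairs, values[len(singles):]))
        # Likelihood is 1 only if every measurement is predicted exactly.
        if single_mass == DATA1 and pair_mass == DATA2:
            hits += 1
        total += 1
    return Fraction(hits, total)


grid = range(1, 6)  # assumed candidate mass values 1..5
p_m_lcm = average_likelihood(grid, with_lcm=True)
p_m = average_likelihood(grid, with_lcm=False)
print(p_m_lcm, p_m)

# With n candidate values per parameter the two averages are 1/n**3 and
# 1/n**6, so refining the grid drives both towards zero.
for n in (5, 50, 500):
    print(n, Fraction(1, n**3), Fraction(1, n**6))
```

On this grid the averages are 1/125 for (M & LCM) and 1/15625 for M, which is the strict inequality the Bayesian reply relies on. Both shrink as the grid is refined, which is the sense in which the inequality collapses to an equality at zero once the point hypotheses form a continuum.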
I suggest that it is philosophically more fruitful to understand the relationship between theory and evidence in Whewellian terms right from the beginning. To repeat, as Whewell points out, nature does not present inductive problems in a form that lends itself to any simple methods of induction. In the mass measurement example, we began with two sets of data, with two phenomena, each of which is colligated by the formula x(o) = m(o), but we can discover no deeper connection between them until we explicate the concept of mass by introducing the law of composition of masses (LCM). Question: How do we explain why these thirteen independent measurements agree? Answer: By concluding that they are measurements of the same quantity, the effects of a common cause. Arguing that we should explain many effects in terms of a common cause is the easy part of the discovery. The harder part is to arrive at the problem in this form. The same is true of the Kepler example.
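The agreement of independent measurements in the mass example can be verified mechanically. The following minimal sketch assumes the data values given in the text and uses hypothetical helper names (predict_pairs, masses_from_pairs). It checks that H1 predicts Data2, that H2 recovers Data1 by solving the three pair equations, and that 25 objects yield 300 pairings.

```python
from fractions import Fraction
from math import comb

# Measurements from the schematic example in the text.
DATA1 = {"a": 1, "b": 2, "c": 3}
DATA2 = {("a", "b"): 3, ("a", "c"): 4, ("b", "c"): 5}


def predict_pairs(single_mass):
    """H1 = h1 & LCM: the mass of a composite is the sum of its parts."""
    return {(x, y): single_mass[x] + single_mass[y] for x, y in DATA2}


def masses_from_pairs(pair_reading):
    """H2 = h2 & LCM: solve the three pair equations for the single masses."""
    objects = sorted({o for pair in pair_reading for o in pair})
    if len(objects) != 3 or len(pair_reading) != 3:
        raise ValueError("this sketch only handles three objects")
    # Adding the three equations counts every mass twice.
    total = Fraction(sum(pair_reading.values()), 2)
    masses = {}
    for o in objects:
        # The one pair that omits o weighs total - m(o).
        (other_pair,) = [p for p in pair_reading if o not in p]
        masses[o] = total - pair_reading[other_pair]
    return masses


h1_predicts_data2 = predict_pairs(DATA1) == DATA2
h2_predicts_data1 = masses_from_pairs(DATA2) == DATA1
print(h1_predicts_data2, h2_predicts_data1)

# Size of Data2 in the embellished example: all pairings of 25 objects.
print(comb(25, 2))
```

Both checks succeed only because LCM is part of the hypotheses. Without it, nothing links the composite readings to the single-object readings, so there is nothing for the two data sets to agree about.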
ACKNOWLEDGEMENTS
I would like to thank Elizabeth Wrigley-Field and Daniel Schneider for very helpful comments on an earlier draft.
BIBLIOGRAPHY
[Achinstein, 1990] P. Achinstein. Hypotheses, Probability, and Waves. British Journal for the Philosophy of Science 41: 73-102, 1990.
[Achinstein, 1992] P. Achinstein. Inference to the Best Explanation: Or, Who Won the Mill-Whewell Debate? Studies in the History and Philosophy of Science 23: 349-364, 1992.
[Achinstein, 1994] P. Achinstein. Explanation v. Prediction: Which Carries More Weight? In David Hull and Richard M. Burian (eds.), PSA 1994, vol. 2, East Lansing, MI, Philosophy of Science Association, 156-164, 1994.
[Akaike, 1973] H. Akaike. Information Theory and an Extension of the Maximum Likelihood Principle. In B. N. Petrov and F. Csaki (eds.), 2nd International Symposium on Information Theory: 267-81. Budapest: Akademiai Kiado, 1973.
[Carew et al., 1983] T. J. Carew, R. D. Hawkins, and E. R. Kandel. Differential classical conditioning of a defensive withdrawal reflex in Aplysia californica. Science 219: 397-400, 1983.
[Dreyfus, 1992] H. L. Dreyfus. What Computers Still Can’t Do: A Critique of Artificial Reason. MIT Press: Cambridge, Mass, 1992.
[Earman, 1978] J. Earman. Fairy Tales vs. an Ongoing Story: Ramsey’s Neglected Argument for Scientific Realism. Philosophical Studies 33: 195-202, 1978.
[Forster, 1988] M. R. Forster. Unification, Explanation, and the Composition of Causes in Newtonian Mechanics. Studies in the History and Philosophy of Science 19: 55-101, 1988.
[Forster, 2006] M. R. Forster. Counterexamples to a Likelihood Theory of Evidence. Minds and Machines 16: 319-338, 2006.
[Forster, 2007] M. R. Forster. A Philosopher’s Guide to Empirical Success. Philosophy of Science 74 (5): 588-600, 2007.
[Forster and Sober, 1994] M. R. Forster and E. Sober. How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions. British Journal for the Philosophy of Science 45: 1-35, 1994.
[Friedman, 1981] M. Friedman. Theoretical Explanation. In Time, Reduction and Reality. Edited by R. A. Healey. Cambridge: Cambridge University Press. Pages 1-16, 1981.
[Friedman, 1983] M. Friedman. Foundations of Space-Time Theories. Princeton, NJ: Princeton University Press, 1983.
[Glymour, 1980] C. Glymour. Explanations, Tests, Unity and Necessity. Noûs 14: 31-50, 1980.
[Hanson, 1973] N. R. Hanson. Constellations and Conjectures, W. C. Humphreys, Jr. (ed.) D. Reidel: Dordrecht-Holland, 1973.
[Harper, 1989] W. L. Harper. Consilience and Natural Kind Reasoning. In J. R. Brown and J. Mittelstrass (eds.) An Intimate Relation: 115-152. Dordrecht: Kluwer Academic Publishers, 1989.
[Harper, 1993] W. L. Harper. Reasoning from Phenomena: Newton’s Argument for Universal Gravitation and the Practice of Science. In Paul Theerman and Adele F. Seeff (eds.) Action and Reaction, Newark: University of Delaware Press, 144-182, 1993.
[Harper, 2002] W. L. Harper. Howard Stein on Isaac Newton: Beyond Hypotheses. In David B. Malament (ed.) Reading Natural Philosophy: Essays in the History and Philosophy of Science and Mathematics. Chicago and La Salle, Illinois: Open Court. 71-112, 2002.
[Harper, 2007] W. L. Harper. Newton’s Method and Mercury’s Perihelion before and after Einstein. Philosophy of Science 74: 932-942, 2007.
[Harper et al., 1994] W. L. Harper, B. H. Bennett and S. Valluri. Unification and Support: Harmonic Law Ratios Measure the Mass of the Sun. In D. Prawitz and D. Westerståhl (eds.) Logic and Philosophy of Science in Uppsala: 131-146. Dordrecht: Kluwer Academic Publishers, 1994.
[Hempel, 1945] C. G. Hempel. Studies in the Logic of Confirmation. Mind, vol. 54, 1945. Reprinted in Hempel [1965].
[Hempel, 1965] C. G. Hempel. Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: The Free Press, 1965.
[Hesse, 1968] M. Hesse. Consilience of Inductions. In I. Lakatos (ed.), Inductive Logic. North-Holland, Amsterdam, 1968.
[Hesse, 1971] M. Hesse. Whewell’s consilience of inductions and predictions. The Monist 55: 520-524, 1971.
[Hinton and Anderson, 1981] G. E. Hinton and J. A. Anderson, eds. Parallel Models of Associative Memory. Hillsdale, NJ: Lawrence Erlbaum Associates, 1981.
[Humphreys and Freedman, 1996] P. Humphreys and D. Freedman. The Grand Leap. British Journal for the Philosophy of Science 47: 113-123, 1996.
[Korb and Wallace, 1997] K. B. Korb and C. S. Wallace. In Search of the Philosopher’s Stone: Remarks on Humphreys and Freedman’s Critique of Causal Discovery. British Journal for the Philosophy of Science 48: 543-553, 1997.
[Kuhn, 1970] T. Kuhn. The Structure of Scientific Revolutions, Second Edition. Chicago: University of Chicago Press, 1970.
[Kuhn, 1970a] T. Kuhn. The Structure of Scientific Revolutions, Second Edition. Chicago: University of Chicago Press, 1970.
[Langley et al., 1987] P. H. Langley, H. A. Simon, G. L. Bradshaw, and J. M. Zytkow. Scientific Discovery: Computational Explorations of the Creative Process. MIT Press, Cambridge, Mass, 1987.
[Langley and Bridewell, in press] P. H. Langley and W. Bridewell. Processes and constraints in explanatory scientific discovery. Proceedings of the Thirtieth Annual Meeting of the Cognitive Science Society. Washington, D.C., in press.
[Macphail, 1993] E. M. Macphail. The Neuroscience of Animal Intelligence: From the Seahare to the Seahorse. Columbia University Press, New York, 1993.
[Mayo, 1996] D. G. Mayo. Error and the Growth of Experimental Knowledge. Chicago and London, The University of Chicago Press, 1996.
[Mill, 1872] J. S. Mill. A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation, 1872. Eighth Edition (Toronto: University of Toronto Press, 1974).
[Myrvold, 2003] W. Myrvold. A Bayesian Account of the Virtue of Unification. Philosophy of Science 70: 399-423, 2003.
[Myrvold and Harper, 2002] W. Myrvold and W. L. Harper. Model Selection, Simplicity, and Scientific Inference. Philosophy of Science 69: S135-S149, 2002.
[Nauta and Feirtag, 1986] W. J. H. Nauta and M. Feirtag. Fundamental Neuroanatomy. New York: W. H. Freeman, 1986.
[Norton, 2000a] J. D. Norton. The Determination of Theory by Evidence: The Case for Quantum Discontinuity, 1900-1915. Synthese 97: 1-31, 2000.
[Norton, 2000b] J. D. Norton. How We Know about Electrons. In Robert Nola and Howard Sankey (eds.) After Popper, Kuhn and Feyerabend, Kluwer Academic Press, 67-97, 2000.
[Rumelhart et al., 1986] D. E. Rumelhart, J. McClelland, et al. Parallel Distributed Processing, Volumes 1 and 2. MIT Press, Cambridge, Mass, 1986.
[Rumelhart et al., 1986a] D. E. Rumelhart, G. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature 323: 533-536, 1986.
[Schwarz, 1978] G. Schwarz. Estimating the Dimension of a Model. Annals of Statistics 6: 461-464, 1978.
[Spirtes et al., 1993] P. Spirtes, C. Glymour and R. Scheines. Causation, Prediction and Search. New York: Springer-Verlag, 1993.
[Whewell, 1840] W. Whewell. The Philosophy of the Inductive Sciences (1967 edition). London: Frank Cass & Co. Ltd. 1840.
[Whewell, 1847] W. Whewell. Philosophy of the Inductive Sciences, 2 vols. (London, John W. Parker), 1847.
[Whewell, 1858] W. Whewell. Novum Organon Renovatum, Part II of the third edition of The Philosophy of the Inductive Sciences, London, Cass, (1858), 1967.
[Whewell, 1989] W. Whewell. William Whewell: Theory of Scientific Method. Edited by Robert Butts. Hackett Publishing Company, Indianapolis/Cambridge, 1989.
AN EXPLORER UPON UNTRODDEN GROUND: PEIRCE ON ABDUCTION
Stathis Psillos
Abduction, in the sense I give the word, is any reasoning of a large class of which the provisional adoption of an explanatory hypothesis is the type. But it includes processes of thought which lead only to the suggestion of questions to be considered, and includes much besides.
Charles Peirce (2.544, note)
1 INTRODUCTION
Charles Sanders Peirce (1839-1914), the founder of American pragmatism, spent a good deal of his intellectual energy and time trying to categorise kinds of reasoning and to examine their properties and their mutual relations. During this intellectual adventure, he was constantly breaking new ground. One of his major achievements was that he clearly delineated a space for non-deductive, that is ampliative, reasoning. In particular, he took it to be the case that there are three basic, irreducible and indispensable forms of reasoning. Deduction and Induction are two of them. The third is what he came to call abduction, the study of which animated most of Peirce’s intellectual life. In his fifth lecture on Pragmatism, in 1903, Peirce claimed that “abduction consists in studying facts and devising a theory to explain them” (5.145). [Footnote 1: All references to Peirce’s works are to his Collected Papers, and are standardly cited by volume and paragraph number. The Collected Papers are not in chronological order. Every effort has been made to make clear the year in which the cited passages appeared.] And in the sixth lecture, he noted that “abduction is the process of forming an explanatory hypothesis” (5.171). He took abduction to be the only kind of reasoning by means of which new ideas can be introduced (cf. 5.171). In fact, he also thought that abduction is the mode of reasoning by means of which new ideas have actually been introduced: “All the ideas of science come to it by the way of Abduction” (5.145). “Abduction”, he added, “consists in studying facts and devising a theory to explain them. Its only justification is that if we are ever to understand things at all, it must be in that way” (5.145).
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
Peirce never doubted the reality, importance, pervasiveness and reasonableness of explanatory reasoning. And yet he thought that explanatory reasoning had been understudied: its character as a distinct logical operation had not been understood. Nor had it been sufficiently distinguished from other basic forms of reasoning. In 1902, in the middle of his unfinished manuscript Minute Logic, he made it clear that he was fully aware of the unprecedented character of the task he had set himself. In his study of ‘Hypothetic inference’, as he put it, he was “an explorer upon untrodden ground” (2.102). In this chapter I will narrate the philosophical tale of this exploration. Section 2 will recount Peirce’s debts to Kant and Aristotle. Section 3 will articulate and present Peirce’s own two-dimensional framework for the study of reasoning and set out Peirce’s key aim, viz., the study of the mode of reasoning that is both ampliative and generative of new content. Section 4 explains Peirce’s early syllogistic approach to inference and discusses his division of ampliative reasoning into Hypothesis and Induction. Section 5 examines Peirce’s mature approach to abduction. Section 6 focuses on the issue of the legitimacy of abduction qua mode of reasoning and relates it to Peirce’s pragmatism. Section 7 relates Peirce’s conception of inquiry as a three-stage project which brings together all three basic and ineliminable modes of reasoning, viz., abduction, deduction and induction. The chapter concludes with some observations about Peirce’s legacy.
2 IDEAS FROM KANT AND ARISTOTLE
In setting out for the exploration of the untrodden ground, Peirce had in his philosophical baggage two important ideas; one came from Kant and the other from Aristotle. From Kant he took the division of all reasoning into two broad types: explicative (or necessary) and ampliative. In his Critique of Pure Reason, Kant famously drew a distinction between analytic and synthetic judgements (A7/B11). He took it that analytic judgements are such that the predicate adds nothing to the concept of the subject, but merely breaks this concept up into “those constituent concepts that have all along been thought in it, although confusedly”. For this reason, he added that analytic judgements can also be called “explicative”. Synthetic judgements, on the other hand, “add to the concept of the subject a predicate which has not been in any wise thought in it, and which no analysis could possibly extract from it; and they may therefore be entitled ampliative”. Peirce (cf. 5.176) thought that Kant’s conception of explicative reasoning was flawed, if only because it was restricted to judgements of the subject-predicate form. Consequently, he thought that though Kant was surely right to draw the distinction between explicative and ampliative judgements, the distinction was not properly drawn. [Footnote 2: For Peirce’s critique of Kant’s conception of analytic judgements, see his The Logic of Quantity (4.85-4.93), which is chapter 17 of the Grand Logic, in 1893.] He then took it upon himself to rectify this problem. One way in which Kant’s distinction was reformed was by a further division
of ampliative reasoning into Induction and Hypothesis. In fact, Peirce found in Aristotle the idea that there is a mode of reasoning which is different from both deduction and Induction. In his Prior Analytics, chapter 25 (69a20ff), Aristotle introduced an inferential mode which he entitled apagōgē and which was translated into English as ‘reduction’. Aristotle characterised apagōgē as follows: “We have Reduction (1) when it is obvious that the first term applies to the middle, but that the middle applies to the last term is not obvious, yet nevertheless is more probable or not less probable than the conclusion”. This is rather opaque,3 but the example Aristotle used may help us see what he intended to say.4 Let A stand for ‘being taught’ or ‘teachable’; B for ‘knowledge’ and C for ‘morality’. Is morality knowledge? That is, is it the case that C is B? This is not clear. What is evidently true, Aristotle says, is that knowledge can be taught, i.e., B is A. From this nothing much can be inferred. But if we hypothesise or assume that C is B (that morality is knowledge), we can reason as follows:

C is B
B is A
Therefore, C is A.

That is: Morality is knowledge; Knowledge can be taught; therefore, morality can be taught. If the minor premise (C is B; morality is knowledge) is not less probable or is more probable than the conclusion (C is A; morality can be taught), Aristotle says, we have apagōgē: “for we are nearer to knowledge for having introduced an additional term, whereas before we had no knowledge that [C is A] is true”. The additional term is, clearly, B and this, it can be argued, is introduced on the basis of explanatory considerations.5

In the uncompleted manuscript titled Lessons from the History of Science (c. 1896), Peirce noted that “There are in science three fundamentally different kinds of reasoning, Deduction (called by Aristotle {synagōgē} or {anagōgē}), Induction (Aristotle’s and Plato’s {epagōgē}) and Retroduction (Aristotle’s {apagōgē}, but misunderstood because of corrupt text, and as misunderstood usually translated abduction)” (1.65). Peirce formed the hypothesis that Aristotle’s text was corrupt in some crucial respects and that Aristotle had in fact another kind of inference in mind than the one reconstructed above (which was also acknowledged by Peirce himself; cf. 7.250-7.252).6

3 There is a second clause in Aristotle’s definition, but it need not concern us here.

4 There is actually some controversy over the exact rendering of Aristotle’s text and the interpretation of ‘reduction’. For two differing views, see W. D. Ross (1949, 480-91) and Smith (1989, 223-4).

5 Ross [1949, p. 489] claims that apagōgē is a perfect syllogism and works on the assumption that if a proposition (which is not known to be true) is admitted, then a certain conclusion follows, which would not have followed otherwise. Smith [1989, p. 223] notes that apagōgē “is a matter of finding premises from which something may be proved”.

6 Here is how he explained things in his fifth lecture on Pragmatism in 1903: “[I]t is necessary to recognize three radically different kinds of arguments which I signalized in 1867 and which

He took it that Aristotle was after an inference according to which the
Stathis Psillos
minor premise (Case) of a syllogism is “inferred from its other two propositions as data” (7.249), viz., the major premise (Rule) and the conclusion (Result). Peirce took it that the proper form of hypothetical reasoning (the one that, according to Peirce, Aristotle was after in Prior Analytics, chapter 25) must be:7

Rule — M is P
Result — S is P
Case — S is M

which amounts to a re-organisation of the premises and the conclusion of the following deductive argument:

Rule — M is P
Case — S is M
Result — S is P.

But then it occurred to Peirce that there is yet another re-organisation of the propositions of this argument, viz.,

Case — S is M
Result — S is P
Rule — M is P

which, in his early period, he took to characterise induction. We shall discuss the details of Peirce’s account of ampliative inference in the sequel. For the time being, let us just rest with the note that Peirce’s studies of ampliative reasoning were shaped by his re-evaluation and critique of ideas present in Kant and Aristotle. As we are about to see in the next section, Peirce created his own framework for the philosophical study of reasoning, which was essentially two-dimensional.

had been recognized by the logicians of the eighteenth century, although [those] logicians quite pardonably failed to recognize the inferential character of one of them. Indeed, I suppose that the three were given by Aristotle in the Prior Analytics, although the unfortunate illegibility of a single word in his MS. and its replacement by a wrong word by his first editor, the stupid [Apellicon], has completely altered the sense of the chapter on Abduction. At any rate, even if my conjecture is wrong, and the text must stand as it is, still Aristotle, in that chapter on Abduction, was even in that case evidently groping for that mode of inference which I call by the otherwise quite useless name of Abduction—a word which is only employed in logic to translate the [{apagōgē}] of that chapter” (5.144).

7 Peirce said of Aristotle: “Certainly, he would not be Aristotle, to have overlooked that question [whether the minor premise of a syllogism is not sometimes inferred from its major premise and the conclusion]; and it would no sooner be asked than he would perceive that such inferences are very common” (7.249). In 8.209 (c. 1905), Peirce expressed doubts concerning his earlier view that the text in chapter 25 of Prior Analytics was corrupt.

3 PEIRCE’S TWO-DIMENSIONAL FRAMEWORK

In one of his last writings on the modes of reasoning, a letter he sent to Dr Woods in November 1913, Peirce summed up the framework within which he examined
reasoning and its properties. As he explained (8.383-8.388), there are two kinds of desiderata or aims that logicians should strive for when they study types of reasoning: uberty and security. Uberty is the property of a mode of reasoning in virtue of which it is capable of producing extra content; its “value in productiveness”, as he (8.384) put it. Security is the property of a mode of reasoning in virtue of which the conclusion of the reasoning is at least as certain as its premises. These two desiderata delineate a two-dimensional framework within which reasoning is studied. Peirce’s complaint is that traditional studies of reasoning have focused almost exclusively on its “correctness”, that is, on “its leaving an absolute inability to doubt the truth of the conclusion as long as the premises are assumed to be true” (8.383). By doing so — by focusing only on security — traditional approaches have tended to neglect non-deductive modes of reasoning. They have confined their attention to deduction, which is the only mode of reasoning that guarantees security. This one-dimensional approach has, however, obscured the fact that there are different types of reasoning, which have different forms and require independent and special investigation. Induction and abduction are such types of reasoning. What is more, together with deduction they constitute the three ultimate, basic and independent modes of reasoning. This is a view that runs through the corpus of the Peircean work.

Peirce’s two-dimensional framework suggests a clear way to classify the three modes of reasoning. Deduction scores best in security but worst in uberty. Abduction scores best in uberty and worst in security. Induction is between the two (cf. 8.387). But why is uberty needed? One of Peirce’s stable views was that reasoning should be able to generate new ideas, or new content.
The conclusion of a piece of reasoning should be able to be such that it exceeds in information and content whatever is already stated in the premises. Deduction cannot possibly do that. So either all attempts to generate new content should be relegated to processes that do not constitute reasoning or there must be reasoning processes which are non-deductive. The latter option is the one consistently taken by Peirce. The further issue then is the logical form of non-deductive reasoning. Peirce was adamant that there are two basic modes of non-deductive reasoning. Throughout his intellectual life he strove to articulate these two distinct modes, to separate them from each other and to relate them to deductive reasoning. ‘Abduction’, ‘Retroduction’, ‘Hypothetic Inference’, ‘Hypothesis’ and ‘Presumption’ are appellations aiming to capture a distinct mode of reasoning — distinct from both induction and deduction. What makes this kind of reasoning distinctive is that it relies on explanation — and hence it carves a space in which explanatory considerations guide inference. In his letter to Dr Woods, he said: “I don’t think the adoption of a hypothesis on probation can properly be called induction; and yet it is reasoning and though its security is low, its uberty is high” (8.388). What exactly is reasoning? In his entry on ampliative reasoning in the Dictionary of Philosophy and Psychology (1901-2), Peirce wrote: Reasoning is a process in which the reasoner is conscious that a judgment, the conclusion, is determined by other judgment or judgments,
the premisses, according to a general habit of thought, which he may not be able precisely to formulate, but which he approves as conducive to true knowledge. By true knowledge he means, though he is not usually able to analyse his meaning, the ultimate knowledge in which he hopes that belief may ultimately rest, undisturbed by doubt, in regard to the particular subject to which his conclusion relates. Without this logical approval, the process, although it may be closely analogous to reasoning in other respects, lacks the essence of reasoning (2.773).

This is instructive in many respects. Reasoning is a directed movement of thought (from the premises to the conclusion) that occurs according to a rule, though the rule (what Peirce calls “a general habit of thought”) might not be explicitly formulated. Actually, it is the task of the logician, broadly understood, to specify, and hence make explicit, these rules. There is more, however, to reasoning than being a directed movement of thought according to a rule. The rule itself — the general pattern under which a piece of reasoning falls — must be truth-conducive. Or at least, the reasoner should hold it to be truth-conducive. Here again, it is the task of the logician to show how and in virtue of what a reasoning pattern (a rule) is truth-conducive. Peirce puts the matter in terms of knowledge: reasoning should lead to knowledge (and hence to truth). This is important because the reasoning process confers justification on the conclusion: it has been arrived at by a knowledge-conducive process. But Peirce’s account of knowledge is thoroughly pragmatic. Knowledge is not just true belief — justification is also needed. And yet, a belief is justified if it is immune to doubt; that is, if it is such that further information relevant to it will not defeat whatever warrant there has been for it.
If justification amounts to resistance to doubt, it is clear that there are two broad ways in which a process of reasoning can confer justification on a belief. The first is by making it the case that if the premises are true, the conclusion has to be true. The second is by rendering a belief plausible and, in particular, by making a belief available for further testing, which — ideally at least — will be able to render this belief immune to revision. The security of deductive reasoning (and the justification it offers to its conclusions) are related to the fact that no new ideas are generated by deduction. But no new ideas are generated by induction either (cf. 5.145). Induction, understood as enumerative induction, generalises an observed correlation from a (fair) sample to a population. It moves from ‘All observed As have been B’ to ‘All As are B’. This is clearly non-demonstrative reasoning; and it is equally clearly ampliative, or content-increasing. But it is true that no new ideas are generated by this kind of reasoning. The reason is simple. The extra content generated by induction is simply a generalisation of the content of the premises; it amounts to what one may call ‘horizontal extrapolation’. Enumerative induction, pretty much like deduction, operates with the principle ‘garbage in, garbage out’: the descriptive vocabulary of the conclusion cannot be different from that of the premises. Hence with enumerative induction, although we may arguably gain knowledge of hitherto unobserved correlations between instances of the attributes
involved, we cannot gain ‘novel’ knowledge, i.e., knowledge of entities and causes that operate behind the phenomena. Peirce was quite clear on this: “[Induction] never can originate any idea whatever. No more can deduction. All the ideas of science come to it by the way of Abduction” (5.145). For Peirce then there must be a mode of reasoning which is both ampliative and generates new ideas. How this could be possible preoccupied him throughout his intellectual life. The key idea was that new content is generated by explanation — better put, explanatory reasoning (viz., reasoning that is based on searching for and evaluating explanations) is both ampliative and has the resources to generate new content, or new ideas. But in line with his overall approach to reasoning, it has to have a rather definite logical form. What then is this form? Peirce changed his mind over this at least once in his life. The reason is that his first attempt to characterise explanatory reasoning was constrained by his overall syllogistic conception of inference. This conception did not leave a lot of room for manoeuvre when it came to the formal properties of reasoning. A rather adequate formal characterisation of explanatory reasoning required a broadening of Peirce’s conception of logic, as the logic of inquiry. In fact, in 1882, Peirce came to see logic as “the method of methods”, or “the art of devising methods of research” (7.59). In his later writings, he was more inclined to equate logic with the scientific method, broadly understood as “the general method of successful scientific research” (7.79). We are about to see all this in detail in the next section. The general point, if you wish, is that Peirce’s view of explanatory reasoning has gone through two distinct phases. In the first phase, he took explanatory reasoning to be a specific inferential pattern that stands on its own and is meant to capture the formation and acceptance of explanatory hypotheses. 
In the second phase, explanatory reasoning was taken to be part of a broader three-stage methodological pattern (the method of inquiry). Explanatory reasoning no longer stands on its own; and though Peirce’s later view of explanatory reasoning shares a lot with his earlier view (for instance, the thought that explanatory reasoning is the sole generator of new content), the key difference is that explanatory reasoning yields conclusions that need further justification, which is achieved by means of deduction and induction. Abduction — Peirce’s settled term for explanatory reasoning — leads to hypotheses with excess/fresh content. These hypotheses do not wear their justification on their sleeves. They need to be further justified — by deduction of predictions and their confirmation (which is Peirce’s settled view of induction), because it is only this further process that can render them part of a body of beliefs that, though fallible, cannot be overturned by experience. Abductively generated beliefs should be subjected to further testing (encounters with possibly recalcitrant experience) and if they withstand it successfully (especially in the long run) they become true — in the sense that Peirce thinks of truths as doubt-resistant (or permanently settled) beliefs (truth: “a state of belief unassailable by doubt” 5.416).8

8 Two of Peirce’s commentators, Arthur Burks [1946, p. 301] and K. T. Fann [1970, pp. 9-10], have rightly contrasted the two phases of Peirce’s views of explanatory reasoning along the following lines: in the first phase, Hypothesis is an evidencing process, while in the second phase
4 HYPOTHESIS VS INDUCTION

“The chief business of the logician”, Peirce said in 1878, “is to classify arguments; for all testing clearly depends on classification. The classes of the logicians are defined by certain typical forms called syllogisms” (2.619). As noted already in section 2, given this syllogistic conception of argument, typically exemplified by Barbara (S is M; M is P; therefore S is P), there is a division of ampliative types of argument into Induction and Hypothesis. Deduction is captured by a syllogism of the form:

D: {All As are B; a is A; therefore, a is B}.

There are two re-organisations of the premises and the conclusion of this syllogism:

I: {a is A; a is B; therefore All As are B}; and
H: {a is B; All As are B; therefore a is A}.

Here is Peirce’s own example (2.623):

DEDUCTION
Rule. — All the beans from this bag are white.
Case. — These beans are from this bag.
∴ Result. — These beans are white.

INDUCTION
Case. — These beans are from this bag.
Result. — These beans are white.
∴ Rule. — All the beans from this bag are white.

HYPOTHESIS
Rule. — All the beans from this bag are white.
Result. — These beans are white.
∴ Case. — These beans are from this bag.

The crucial thing here is that Peirce took both I and H to be formal argument patterns, which characterise “synthetic” reasoning. So, making a hypothesis falls under an inferential pattern. Peirce said:

Suppose I enter a room and there find a number of bags, containing different kinds of beans. On the table there is a handful of white beans; and, after some searching, I find one of the bags contains white beans only. I at once infer as a probability, or as a fair guess, that this handful was taken out of that bag. This sort of inference is called making an hypothesis.

it is a methodological as well as an evidencing process.
The intended contrast here, I take it, is with the statement “These beans are from the white-beans bag” (the conclusion of H) being a mere guess, or a wild conjecture and the like. Though it is clear that the conclusion of H does not logically follow from the premises (this is simply to say that H is not D), it is a conclusion, that is, the result of an inferential process — a movement of thought according to a rule. Already in 1867, in On the Natural Classification of Arguments, (Proceedings of the American Academy of Arts and Sciences, vol. 7, April 9, 1867, pp. 261-87), he took it to be the case that the adoption of a hypothesis is an inference “because it is adopted for some reason, good or bad, and that reason, in being regarded as such, is regarded as lending the hypothesis some plausibility” (2.511, note). More generally, although both H and I are logically invalid, they are not meant to be explicative inferences but ampliative. Their conclusion is adopted on the basis that the premises offer some reason to accept it as plausible: were it not for the premises, the conclusion would not be considered, even prima facie, plausible; it would have been a mere guess. The difference between I and H is, as Peirce put it, that induction “classifies”, whereas hypothesis “explains” (2.636). Classification is not explanation and this implies that Induction and Hypothesis are not species of the same genus of ampliative reasoning, viz., explanatory reasoning. Induction is a more-of-the-same type of inference.9 As noted already, the conclusion of an induction is a generalisation (or a rule) over the individuals mentioned in the premises. Hypothesis (or hypothetical inference) is different from induction in that the conclusion is a hypothesis which, if true, explains the evidence (or facts) mentioned in the premises. 
Here is how Peirce put it: “Hypothesis is where we find some very curious circumstance, which would be explained by the supposition that it was a case of a certain general rule, and thereupon adopt that supposition” (2.624). The supposition is adopted for a reason: it explains ‘the curious circumstance’. The centrality of explanation in (at least a mode of) ampliative reasoning is a thought that Peirce kept throughout his intellectual engagement with the forms of reasoning.

A key idea that Peirce had already in the 1870s10 was that Hypothesis is different from Induction in that the conclusion of H is typically a new kind of fact, or something “of a different kind from what we have directly observed, and frequently something which it would be impossible for us to observe directly” (2.640).

9 In the seventh of his Lowell lectures in 1903, Peirce developed a rather elaborate theory of the several types of induction (cf. 7.110-7.130). He distinguished between three types of induction: Rudimentary or crude induction, which is a form of default reasoning, viz., if there is no evidence for A, we should assume that A is not the case; Predictive induction, where some predictions are being drawn from a hypothesis and they are further tested; and Statistical or Quantitative induction, where a definite value is assigned to a quantity, viz., it moves from a quantitative correlation in a sample to the entire class “by the aid of the doctrine of chances”. Peirce goes on to distinguish between several subtypes of induction. The characteristic of all types is that their justification is that they will lead to convergence to the truth in the limit. Rudimentary induction is the weakest type, while statistical induction is the strongest one. See also his The Variety and Validity of Induction (2.755-2.760), from manuscript ‘G’, c. 1905.

10 In a series of six articles published in Popular Science Monthly between 1877 and 1878.

The excess content (the new ideas) generated by Hypothesis concerns, in a host of
typical cases, unobservable entities that cause (and hence causally explain) some observable phenomena. Indeed, Peirce took this aspect of hypothetical reasoning to be one of the three reasons which suggest that Hypothesis and Induction are distinct modes of ampliative reasoning. As he put it in 1878: “Hypothetic reasoning infers very frequently a fact not capable of direct observation” (2.642). Induction lacks this capacity because it is constrained by considerations of similarity. Induction works by generalisation, and hence it presupposes that the facts mentioned in the conclusion of an inductive argument are similar to the facts mentioned in the premises. Hypothesis, on the other hand, is not constrained by similarity. It is perfectly possible, Peirce noted, that there are facts which support a hypothetically inferred conclusion, which are totally dissimilar to the facts that suggested it in the first place. The role played by similarity in Induction but not in Hypothesis is the second reason why they are distinct modes of ampliative reasoning. As he put it: “(. . . ) the essence of an induction is that it infers from one set of facts another set of similar facts, whereas hypothesis infers from facts of one kind to facts of another” (2.642).11

The example Peirce used to illustrate this feature of hypothetical reasoning is quite instructive. The existence of Napoleon Bonaparte is based on a hypothetical inference. It is accepted on the basis that it accounts for a host of historical records, where these records serve as the ground for the belief in the historical reality of Napoleon. There is no way, Peirce thought, that this kind of inference could be turned into an induction. If there were, we would have to be committed to the view that all further facts that may become available in the future and confirm the historical reality of Napoleon will be similar to those that have already been available. But it is certainly possible, Peirce suggested, that evidence that may become available in the future might well be of a radically different sort than the evidence already available. To illustrate this, he envisaged the possibility that “some ingenious creature on a neighboring planet was photographing the earth [when Napoleon was around], and that these pictures on a sufficiently large scale may some time come into our possession, or that some mirror upon a distant star will, when the light reaches it, reflect the whole story back to earth” (2.642). There is clearly no commitment to similarity in the case of Hypothesis. Further facts that may confirm the hypothesis that Napoleon was historically real may be of any sort whatever. Actually, this thought tallies well with his claim that hypotheses can gain extra strength by unifying hitherto unrelated domains of fact (see 2.639). A case like this is the kinetic theory of gases, which Peirce took to have gained in strength by relating (unifying) a “considerable number of observed facts of different kinds” (2.639).12

11 Another salient point that distinguishes Induction from Hypothesis is that Hypothesis does not involve an enumeration of instances (see 2.632).

12 The third reason for the distinction between Hypothesis and Induction is somewhat obscure. He takes it that Induction, yielding as it does a general rule, leads to the formation of a habit. Hypothesis, by contrast, yielding as it does an explanation, leads to an emotion. It’s not quite clear what Peirce intends to say here. The analogy he uses is this: “Thus, the various sounds made by the instruments of an orchestra strike upon the ear, and the result is a peculiar musical
From the various examples of hypothetical reasoning Peirce offers,13 he clearly thinks that Hypothesis (meaning: hypothetical inference) is pervasive. It is invariably employed in everyday life as well as in science. But as noted already, Peirce’s approach to ampliative reasoning has been two-dimensional. Hypothesis might well score high in uberty, but it scores quite low in security. It is clear that Hypothesis does give us reasons to accept a conclusion, but the reasons might well be fairly weak. Here is Peirce again: “As a general rule, hypothesis is a weak kind of argument. It often inclines our judgment so slightly toward its conclusion that we cannot say that we believe the latter to be true; we only surmise that it may be so” (2.625). The point, clearly, is that the fact that a hypothesis might explain some facts is not, on its own, a conclusive reason to think that this hypothesis is true. Perhaps Peirce’s careful wording (compare: “it often inclines our judgement so slightly . . . ”) implies that some conclusions of hypothetical inferences are stronger than others — that is, there might be some further reasons which enhance our degree of belief in the truth of the conclusion of a hypothetical inference. Indeed, Peirce went on to offer some rules as to how “the process of making an hypothesis should lead to a probable result” (2.633). This is very important because it makes clear that from quite early on, Peirce took it that hypothetical reasoning needs some, as it were, external support. It may stand on its own as a mode of reasoning (meaning: as offering grounds or reasons for a conclusion), but its strength (meaning: how likely the conclusion is) comes, at least partly, from the further testing that the adopted hypothesis should be subjected to. The three rules are (cf. 2.634):

1. Further predictions should be drawn from the adopted hypothesis.
2. The testing of the hypothesis should be severe. That is, the hypothesis should be tested not just against data for which it is known to do well but also against data that would prove it wrong, were it false.
3. The testing should be fair, viz., both the failures and the successes of the hypotheses should be noted.

emotion, quite distinct from the sounds themselves” (2.643). “This emotion”, he carries on saying, “is essentially the same thing as an hypothetic inference, and every hypothetic inference involves the formation of such an emotion”. My guess, motivated by the example above, is that Peirce means to highlight the fact that Hypothesis generates beliefs with extra content or new ideas, which exceed the content of whatever beliefs were meant to explain. This might be taken to generate in the mind a feeling of comprehension that was not there before. In a later piece, he seemed to equate the emotion involved in hypothetical inference with the fact that the adopted explanation removes the surprise of the explanandum (cf. 7.197).

13 (A) “I once landed at a seaport in a Turkish province; and, as I was walking up to the house which I was to visit, I met a man upon horseback, surrounded by four horsemen holding a canopy over his head. As the governor of the province was the only personage I could think of who would be so greatly honored, I inferred that this was he”. (B) “This was an hypothesis. Fossils are found; say, remains like those of fishes, but far in the interior of the country. To explain the phenomenon, we suppose the sea once washed over this land. This is another hypothesis”. (C) Numberless documents and monuments refer to a conqueror called Napoleon Bonaparte. Though we have not seen the man, yet we cannot explain what we have seen, namely, all these documents and monuments, without supposing that he really existed. Hypothesis again. (2.625).
Peirce did also suggest — somewhat in passing — that the proper ground for hypothetical inference involves comparison and elimination of alternative hypotheses: “When we adopt a certain hypothesis, it is not alone because it will explain the observed facts, but also because the contrary hypothesis would probably lead to results contrary to those observed” (2.628). But he did not say much about this. Later on, in 1901 (in The Logic of Drawing History from Ancient Documents), he did say a lot more on the proper ground for explanatory inference (by then called abduction). For instance, he insisted that the hypothesis that is adopted should be “likely in itself, and render the facts likely” (7.202). Part of the ground for stronger hypothetical inferences (abductions) comes from eliminating alternative and competing hypotheses that, if true, would account for the facts to be explained.

One interesting issue concerns the very nature of a hypothesis. The very term ‘hypothesis’ alludes to claims that are put forward as conjectures or as suppositions or are only weakly supported by the evidence. Peirce was alive to this problem, but he nonetheless stuck with the term ‘hypothesis’ in order to stress two things: first, hypotheses are explanatory; and second, they admit of various degrees of strength.14 In his discussion of the case of the kinetic theory of gases (2.639 ff), Peirce makes clear that the outcome of hypothetical reasoning might vary from a “pure” hypothesis to a “theory”. The outcome does not vary with respect to how it is adopted — it is always adopted on the basis of explanatory considerations. But it varies with respect to its explanatory power, which may well change over time. For instance, the kinetic theory of gases was a “pure hypothesis” when it merely explained Boyle’s law, but it became a “theory” when it unified a number of empirical laws of different kinds and received independent support by the mechanical theory of heat. The idea here is that a hypothesis gains in explanatory strength when it unifies various phenomena and when it gets itself unified with other background theories (like the principles of mechanics). “The successful theories”, as Peirce put it, “are not pure guesses, but are guided by reasons” (2.638). In a move later made famous by Wilfrid Sellars, Peirce stressed that hypotheses gain in explanatory strength when they do not just explain an empirical law, but when they also explain “the deviations” from the law (2.638). Empirical laws are typically inexact and approximate and genuinely explanatory hypotheses replace them with stricter and more accurate theoretical models.

14 Concerning his own use of the term ‘hypothesis’, he said: “That the sense in which I have used ‘hypothesis’ is supported by good usage, I could prove by a hundred authorities. The following is from Kant: ‘An hypothesis is the holding for true of the judgment of the truth of a reason on account of the sufficiency of its consequents.’ Mill’s definition (Logic, Book III, Ch. XIV §4) also nearly coincides with mine” (2.511, note).

Already in the 1870s, Peirce took it to be the case that hypothetical reasoning is indispensable. Immediately after he noted that Hypothesis is a weak type of argument, he added: “But there is no difference except one of degree between such an inference and that by which we are led to believe that we remember the occurrences of yesterday from our feeling as if we did so” (2.625). Bringing memory
in as an instance of hypothetical inference suggests that if hypothetical inference cannot be relied upon, there is no way to form any kind of beliefs that exceed what we now perceive. Hence, belief with any content that exceeds what is immediately given in experience requires and relies upon hypothetical reasoning. The very issue of the trustworthiness of memory is tricky, but this should be clear. There is no way to justify the reliability of memory without presupposing that it is reliable. Even if an experiment were to be made to determine the reliability of memory, it should still be the case that the results of the experiments should themselves be correctly remembered before they could play any role in the determination of the reliability of memory. All this might well imply that memory (and hence hypothetical inference) is too basic a mode of belief formation either to be fully doubted or to be justifiable on the basis of even more basic inferential modes. Indeed, to say that hypothetical reasoning is weak (or low in security, as Peirce would later on put it) is not to say that it is unjustified or unjustifiable. Rather, it implies that its justification is a more complex affair than the justification of deduction. It also implies that there can be better or worse hypothetical inferences, and the task of the logician is to specify the conditions under which a hypothetical inference is good. Given Peirce’s syllogistic conception of inference, H and I clearly have different logical forms. Besides, Peirce has insisted that it is only Hypothesis that explains — that is, that is based on explanatory considerations. But, it may be argued, presented as above, the difference between H and I is rather superficial.15 The conclusions of both H and I are hypotheses, even though H and I have different logical forms. Besides, both types of conclusion seem to be explanatory. As Peirce himself put it on one occasion already in 1878: “(...) 
when we make an induction, it is drawn not only because it explains the distribution of characters in the sample, but also because a different rule would probably have led to the sample being other than it is” (2.628). This is a clear point to the effect that Induction is also based on explanatory considerations and is guided by them. More generally, that laws (law-like generalisations) are explanatory of their instances has been part of the traditional view of explanation that Peirce clearly shared.
15 If I read him correctly, Nagel [1938, p. 385] makes this point, but from a different angle.
Peirce took it that explanation and prediction are the two sides of the same coin. Actually, his overall conception of explanation is that it amounts to “rationalisation”, that is, to rendering a phenomenon rational (rationally explicable), where this rationalisation consists in finding a reason why the phenomenon is the way it is, the reason being such that, were it taken into account beforehand, the phenomenon would have been predicted with certainty or high probability. Here is how he put the matter: “(. . . ) what an explanation of a phenomenon does is to supply a proposition which, if it had been known to be true before the phenomenon presented itself, would have rendered that phenomenon predictable, if not with certainty, at least as something very likely to occur. It thus renders that phenomenon rational, — that is, makes it a logical consequence, necessary
Stathis Psillos
or probable” (7.192).16 It should be obvious then that law-like generalisations do explain their instances, and in particular, that they do explain the observed correlation between two properties or factors. Peirce went as far as to argue that regularisation is a type of rationalisation (cf. 7.199), where a regularisation makes some facts less isolated than before by subsuming them under a generalisation: why are these As B? Because all As are B. If laws are explanatory and if law-like statements (qua generalisations) are the products of Induction, it seems that H and I are closer to each other than Peirce thought. Hence, it might be argued, both H and I are modes of generation and acceptance of explanatory hypotheses, be they about singular facts (e.g., causes) or about generalisations (e.g., All As are B). Besides, it seems that H involves (at least in many typical cases) a law-like generalisation in its premises, since in the syllogistic guise in which it has been presented thus far, the claim is that what explains a certain singular fact is another singular fact and a general fact in tandem, viz., H: {a is B; All As are B; therefore a is A}. As Peirce acknowledged: “By hypothesis, we conclude the existence of a fact quite different from anything observed, from which, according to known laws, something observed would necessarily result” (2.636). It seems reasonable to claim that the chief difference between H and I is that Induction involves what we have called ‘horizontal extrapolation’, whilst Hypothesis involves (or allows for) ‘vertical extrapolation’, viz., hypotheses whose content is about unobservable causes of the phenomena. Indeed, as has been stressed already, the very rationale for Hypothesis is that it makes possible the generation of new content or new ideas. It turns out, however, that if Hypothesis is constrained by its syllogistic form, it cannot play its intended role as a creator of new content.
Think of H presented as above: H: {All As are B; a is B; therefore a is A}.
The conclusion of H might well be a hypothesis, but its content is not really new: it is already contained in the major premise. So the inference does not create new content; rather it unpacks content that is already present in the premises. The very syllogistic character of H leaves no choice here: premises and conclusion must share vocabulary; otherwise the conclusion cannot be inferred in the way H suggests. The inference process is such that the antecedent of the major premise is detached and is stated as the conclusion. This might be an illegitimate move in deductive inference but it captures the essence of H. In this process, no new content is created; instead some of the content of the premises is detached and is asserted.
16 The similarity with the standard Deductive-Nomological model of explanation developed by Hempel [1965] is quite striking. On a different occasion, Peirce noted that an explanation is “a syllogism exhibiting the surprising fact as necessarily consequent upon the circumstances of its occurrence together with the truth of the credible conjecture, as premises” (6.469).
This creates a certain tension in Peirce’s account. Hypothesis is ampliative and the sole generator of new ideas or content. And yet, in the syllogistic conception
of hypothetic inference, the new ideas or content must already be there before they are accepted as the conclusion of the inference. If Hypothesis is the sole generator of new content and ideas, and if this is the reason why it is, in the end, indispensable despite its insecurity, it must have been a great problem for Peirce that the syllogistic conception of reasoning, and of Hypothesis in particular, obscured this fact. Perhaps this was part of the reason why Peirce abandoned the syllogistic conception of explanatory reasoning. Another part of the reason is that, as noted above, Peirce came to think that the difference between the logical forms of Induction and Hypothesis is not as fundamental as he initially thought. In 1902, he offered the following honest diagnosis of his earlier thinking about explanatory reasoning: (M)y capital error was a negative one, in not perceiving that, according to my own principles, the reasoning with which I was there dealing [‘Hypothetic Inference’] could not be the reasoning by which we are led to adopt a hypothesis, although I all but stated as much. But I was too much taken up in considering syllogistic forms and the doctrine of logical extension and comprehension, both of which I made more fundamental than they really are. As long as I held that opinion, my conceptions of Abduction necessarily confused two different kinds of reasoning. When, after repeated attempts, I finally succeeded in clearing the matter up, the fact shone out that probability proper had nothing to do with the validity of Abduction, unless in a doubly indirect manner (2.102). What then are the two different kinds of reasoning that Peirce’s earlier syllogistic approach confused? It seems clear that the confusion was between the reasoning process by means of which hypotheses (with new and extra content) are being formulated and adopted on the basis of explanatory considerations and the reasoning process by means of which these hypotheses are rendered likely. 
In Peirce’s later writings hypothetical inference is liberalised. It is no longer constrained by the syllogistic conception of inference. It becomes part of a broader methodological process of inquiry. Induction, on the other hand, is given the role of confirmation. In his eighth Lowell lecture in 1903, Peirce took abduction to be “any mode or degree of acceptance of a proposition as a truth, because a fact or facts have been ascertained whose occurrence would necessarily or probably result in case that proposition were true” (5.603).
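The contrast at issue in this section can be set out schematically. What follows is only an illustrative sketch: the deductive syllogism is placed beside the schema H quoted above, and the predicate-logic rendering (the symbols $\forall$, $\rightarrow$, $\vdash$, $\nvdash$, with $A$ and $B$ read as predicates and $a$ as an individual constant) is a modern gloss assumed here for clarity, not Peirce’s own notation.

```latex
% Requires amsmath and amssymb (for \text and \nvdash).
% Deduction: the conclusion follows necessarily from the premises.
\[
\frac{\text{All } A\text{s are } B \qquad a \text{ is } A}{a \text{ is } B}
\qquad\qquad
\forall x\,(Ax \rightarrow Bx),\; Aa \;\vdash\; Ba
\]
% Hypothesis (H): same major premise, but minor premise and conclusion trade places.
\[
\frac{\text{All } A\text{s are } B \qquad a \text{ is } B}{a \text{ is } A}
\qquad\qquad
\forall x\,(Ax \rightarrow Bx),\; Ba \;\nvdash\; Aa
\]
```

Read this way, H detaches the antecedent of the major premise. That is why it is not deductively valid, and also why its conclusion uses no vocabulary beyond that already found in the premises.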
5 THE ROAD TO ABDUCTION
In his unfinished manuscript of 1896, Lessons from the History of Science, Peirce employed the term ‘retroduction’ (or ‘retroductive inference’) to capture hypothetical reasoning or, perhaps, the species of it where the hypothesis concerns things
past. Here too, it is explanation that makes this mode of inference distinctive. As he put it: Now a retroductive conclusion is only justified by its explaining an observed fact. An explanation is a syllogism of which the major premiss, or rule, is a known law or rule of nature, or other general truth; the minor premise, or case, is the hypothesis or retroductive conclusion, and the conclusion, or result, is the observed (or otherwise established) fact (1.89). As he explained, he took ‘retroduction’ to render into English the Aristotelian term apagōgē, which, as Peirce noted (and as we have already seen in section 2), was “misunderstood because of corrupt text, and as misunderstood [it was] usually translated abduction” (1.65). But he did opt for the term ‘abduction’ in the end, though he also toyed with the term ‘Presumption’. As he put it in 1903: “Presumption, or, more precisely, abduction (. . . ), furnishes the reasoner with the problematic theory which induction verifies” (2.776). In the same context, he noted that “Logical or philosophical presumption is non-deductive probable inference which involves a hypothesis. It might very advantageously replace hypothesis in the sense of something supposed to be true because of certain facts which it would account for” (2.786). It should be clear that abduction inherits some of the characteristics of Hypothesis, while it forfeits others. The two main points of contact are that a) it is explanatory considerations that guide abduction, qua an inference; and b) abduction is “the only kind of reasoning which supplies new ideas, the only kind which is, in this sense, synthetic” (2.777). But, unlike Hypothesis, abduction c) is not constrained by syllogistic formulations; d) any kind of hypothesis can be adopted on its basis, provided it plays an explanatory role. As Peirce notes: “Abduction must cover all the operations by which theories and hypotheses are engendered” (5.590).
Besides, Peirce took it that his shift to abduction laid further emphasis on the fact that abduction is an insecure mode of reasoning; that the abductively adopted hypothesis is problematic; that it needs further testing. The rationale for abduction, then, is that if rational explanation is possible at all, it can only be achieved by abduction. As he put it: “Its only justification is that its method is the only way in which there can be any hope of attaining a rational explanation” (2.777). Peirce’s classic characterisation of abduction qua inference, in his seventh lecture on Pragmatism titled Pragmatism and Abduction in 1903, is this (cf. 5.189):
(CC) The surprising fact, C, is observed;
But if A were true, C would be a matter of course,
Hence, there is reason to suspect that A is true.
Immediately before (CC) he noted that abduction is “the operation of adopting an explanatory hypothesis” and that “the hypothesis cannot be admitted, even as
a hypothesis, unless it be supposed that it would account for the facts or some of them”. The emphasis on C being a matter of course relates to Peirce’s conception of explanation as rational expectability. What follows the classic characterisation is even more interesting. Peirce says: “Thus, A cannot be abductively inferred, or if you prefer the expression, cannot be abductively conjectured until its entire content is already present in the premise, ‘If A were true, C would be a matter of course’”. This claim captures the way in which abduction (in contradistinction to the earlier Hypothesis) can be genuinely ampliative and generative of new ideas and content. What Peirce implies, and what seems right anyway, is that the abductive inference both generates the major premise ‘If A were true, C would be a matter of course’ and licenses the conclusion that there is reason to accept A as true. Though A may be familiar in itself, it does not follow that it is the case that ‘If A were true, C would be a matter of course’. Qua a hypothesis A has excess and new content vis-à-vis C precisely because it offers a reason (explains) why C holds. The following then sounds plausible. Abduction is a dual process of reasoning. It involves the generation of some hypothesis A with excess content in virtue of which the explanandum C is accounted for, where the explanatory connection between A and C is captured by the counterfactual conditional ‘If A were true, C would be a matter of course’. But it also allows the detachment of the antecedent A from the conditional and hence its acceptance in its own right. The detachment of the antecedent A requires reasons and these are offered by the explanatory connection there is between the antecedent and the consequent. Peirce had a rather broad conception of a surprising fact.
He took it that the very presence of regularities in nature is quite surprising in that irregularity (the absence of regularity) is much more common than regularity in nature. Hence the presence of regular patterns under which sequences of events fall is, for Peirce, unexpected and requires explanation (cf. 7.189; 7.195). This suggests that Peirce would not take the regularities there are in nature as crude facts which admit or require no further explanation. Regularities hold for a reason and their explanation amounts to finding the reason for which they hold, thereby rendering the phenomena rational (cf. 7.192). But aren’t deviations from a regularity also surprising? Peirce insists that if an existing regularity is breached, this does require explanation (cf. 7.191). So both the regularity and the deviations from it require explanation, though the explanations offered are at different levels. A key feature of explanation according to Peirce is that it renders the explananda less “isolated” than they would have been in the absence of an explanation (cf. 7.199). This feature follows from the fact that explanation amounts to rational expectability. For a fact is isolated if it does not fall under a pattern. And if it does not fall under a pattern, in its presence we “do not know what to expect” (7.201). Actually, abduction is justified, Peirce claimed, because it is the “only possible hope of regulating our future conduct rationally” (2.270), and clearly this rational regulation comes from devising explanations which render the facts less isolated.
A stable element of Peirce’s thought on explanatory reasoning is that it is a reasoning process — it obeys a rule of a sort. In The Logic of Drawing History from Ancient Documents (1901), he insisted that abduction amounts to an adoption of a hypothesis “which is likely in itself, and renders the facts likely”. He noted: “I reckon it as a form of inference, however problematical the hypothesis may be held” (7.202). And he queried about the “logical rules” that abduction should conform to. But why should he think that abduction is a reasoning process? Recall that for him reasoning is a conscious activity by means of which a conclusion is drawn based on reasons. In his Short Logic (1893), he emphasised that reasoning (the making of inferences) amounts to the “conscious and controlled adoption of a belief as a consequence of other knowledge” (2.442). Reasoning is a voluntary activity, which among other things, involves considering and eliminating options (cf. 7.187). This is what abduction is and does. The point is brought home if we consider abduction as an eliminative inference. Not all possible explanatory hypotheses are considered. In answering the objection that abduction is not reasoning proper because one is free to examine whatever theories one likes, Peirce noted: The answer [to the question of what need of reasoning was there?] is that it is a question of economy. If he examines all the foolish theories he might imagine, he never will (short of a miracle) light upon the true one. Indeed, even with the most rational procedure, he never would do so, were there not an affinity between his ideas and nature’s ways. However, if there be any attainable truth, as he hopes, it is plain that the only way in which it is to be attained is by trying the hypotheses which seem reasonable and which lead to such consequences as are observed (2.776). What exactly is this criterion of reasonableness? Peirce took it that abduction is not a topic-neutral inferential pattern. 
It operates within a framework of background beliefs, depends on them and capitalises on them. It is these beliefs that determine reasonableness or plausibility. Here is Peirce, in 1901: Of course, if we know any positive facts which render a given hypothesis objectively probable, they recommend it for inductive testing. When this is not the case, but the hypothesis seems to us likely, or unlikely, this likelihood is an indication that the hypothesis accords or discords with our preconceived ideas; and since those ideas are presumably based upon some experience, it follows that, other things being equal, there will be, in the long run, some economy in giving the hypothesis a place in the order of precedence in accordance with this indication (7.220). Background beliefs play a dual role in abduction. Their first role is to eliminate a number of candidates as “foolish”. What hypotheses will count as foolish will surely depend on how strong and well supported the background beliefs are. But
Peirce also insisted that some hypotheses must be discarded from further consideration ab initio. They are the hypotheses that, by their very nature, are untestable (cf. 6.524). This call for testability is a hallmark of Peirce’s pragmatism. In his Lectures on Pragmatism, he famously noted that “the question of pragmatism” is “nothing else than the question of the logic of abduction” (5.196). As he went on to explain, the link between the two is testability. The Maxim of Pragmatism is that admissible hypotheses must be such that their truth makes a difference in experience. In the present context, Peirce put the Maxim thus: (. . . ) [T]he maxim of pragmatism is that a conception can have no logical effect or import differing from that of a second conception except so far as, taken in connection with other conceptions and intentions, it might conceivably modify our practical conduct differently from that second conception (5.196). In other words, there can be no logical difference between two hypotheses that results in no difference in experience. Abduction, according to Peirce, honours this maxim because it “puts a limit upon admissible hypotheses” (5.196). And the limit is set by the logical form of abduction CC, which has already been noted. The major premise of an abductive inference is: ‘If A were true, C would be a matter of course’. For A to be admissible at all it must be the case that it renders C explicable and expectable. We have already seen that Peirce equated explanation with rational expectability and this clearly implies that he took it that explanation yields predictions (or even that explanation and prediction are the two sides of the same coin). Predictions are, ultimately, what differentiates between admissible (qua testable) hypotheses and inadmissible (qua untestable) ones. Here is Peirce’s own way to put the point: Admitting, then, that the question of Pragmatism is the question of Abduction, let us consider it under that form. 
What is good abduction? What should an explanatory hypothesis be to be worthy to rank as a hypothesis? Of course, it must explain the facts. But what other conditions ought it to fulfil to be good? The question of the goodness of anything is whether that thing fulfils its end. What, then, is the end of an explanatory hypothesis? Its end is, through subjection to the test of experiment, to lead to the avoidance of all surprise and to the establishment of a habit of positive expectation that shall not be disappointed. Any hypothesis, therefore, may be admissible, in the absence of any special reasons to the contrary, provided it be capable of experimental verification, and only insofar as it is capable of such verification. This is approximately the doctrine of pragmatism (5.197). The second role that background beliefs play in abduction concerns the ranking of the admissible candidates in an “order of preference”; or the selection of hypotheses. Accordingly, the search for explanatory hypotheses is not blind but guided by reasons. The search aims to create, as Peirce (5.592) nicely put it,
“good” hypotheses. So the search will be accompanied by an evaluation of hypotheses, and by their placement in an order of preference according to how good an explanation they offer. In Hume on Miracles (1901) Peirce put this point as follows: The first starting of a hypothesis and the entertaining of it, whether as a simple interrogation or with any degree of confidence, is an inferential step which I propose to call abduction. This will include a preference for any one hypothesis over others which would equally explain the facts, so long as this preference is not based upon any previous knowledge bearing upon the truth of the hypotheses, nor on any testing of any of the hypotheses, after having admitted them on probation (cf. 6.525). It is clear from this passage that the preferential ranking of competing hypotheses that would explain the facts, were they true, cannot be based on judgements concerning their truth, since if we already knew which hypothesis is the true one, it would be an almost trivial matter to infer this as against its rivals. So the ranking should be based on different criteria. What are they? The closest Peirce comes to offering a systematic treatment of this subject is in his Abduction, which is part of The Logic of Drawing History from Ancient Documents (1901). The principles which should “guide us in abduction or the process of choosing a hypothesis” include:
A. Hypotheses should explain all relevant facts;17
B. Hypotheses should be licensed by the existing background beliefs;
C. Hypotheses should be, as far as possible, simple (“incomplex” (7.220-1));
D. Hypotheses should have unifying power (“breadth” (7.220-1));18
E. Hypotheses should be further testable, and preferably entail novel predictions (7.220).19
17 “Still, before admitting the hypothesis to probation, we must ask whether it would explain all the principal facts” (7.235).
18 Peirce characterised unifying power thus: “The purpose of a theory may be said to be to embrace the manifold of observed facts in one statement, and other things being equal that theory best fulfils its function which brings the most facts under a single formula” (7.410).
19 Peirce stressed that “the strength of any argument of the Second Order depends upon how much the confirmation of the prediction runs counter to what our expectation would have been without the hypothesis” (7.115).
6 FROM THE INSTINCTIVE TO THE REASONED MARKS OF TRUTH
The picture of abduction that Peirce has painted is quite complex. On the face of it, there may be a question of its coherence. Abduction is an inference by
means of which explanatory hypotheses are admitted, but it is not clear what this admission amounts to. Nor is it clear whether there are rules that this mode of inference is subject to. In a rather astonishing passage that preceded Peirce’s classic characterisation of abduction noted above, Peirce stressed: “It must be remembered that abduction, although it is very little hampered by logical rules, nevertheless is logical inference, asserting its conclusion only problematically or conjecturally, it is true, but nevertheless having a definite logical form” (5.188). How can it be that abduction has a definite logical form (the one suggested by CC above) and yet not be hampered by logical rules? Besides, Peirce made the seemingly strange point that “(. . . ) abduction commits us to nothing. It merely causes a hypothesis to be set down upon our docket of cases to be tried” (5.602). To resolve the possible tensions here we need to take into account Peirce’s overall approach to ampliative reasoning. Peirce was adamant that the conclusion of an abductive inference can be accepted only on “probation” or “provisionally”. Here is one of the very many ways in which Peirce expressed this key thought of his: “Abduction, in the sense I give the word, is any reasoning of a large class of which the provisional adoption of an explanatory hypothesis is the type” (2.544, note). One important reason why explanatory hypotheses can only be accepted on probation comes from the history of science itself, which is a history of actual abductions. Though it is reasonable to accept a hypothesis as true on the basis that “it seems to render the world reasonable”, a closer look at the fate of explanatory hypotheses suggests that they were subsequently controverted because of wrong predictions.
“Ultimately”, Peirce said, “the circumstance that a hypothesis, although it may lead us to expect some facts to be as they are, may in the future lead us to erroneous expectations about other facts, — this circumstance, which anybody must have admitted as soon as it was brought home to him, was brought home to scientific men so forcibly, first in astronomy, and then in other sciences, that it became axiomatical that a hypothesis adopted by abduction could only be adopted on probation, and must be tested” (7.202). Peirce did consider the claim that abduction might admit of a strict logical form based on Bayes’s theorem. Well, he did not put it quite that way, but this is what he clearly meant when he said that according to a common theory, reasoning should be “guided by balancing probabilities, according to the doctrine of inverse probability” (2.777). The idea here is that one updates one’s degree of belief in a proposition by using Bayes’s theorem: $\mathrm{Prob}_{\mathrm{new}}(H) = \mathrm{Prob}_{\mathrm{old}}(H/e)$, where $\mathrm{Prob}(H/e) = \mathrm{Prob}(e/H) \times \mathrm{Prob}(H) / \mathrm{Prob}(e)$. “Inverse probabilities” are what later on became known as likelihoods, viz., $\mathrm{Prob}(e/H)$. As Peirce immediately added, this approach to reasoning relies “upon knowing antecedent probabilities”, that is prior probabilities. But he was entirely clear that this Bayesian approach could not capture the logical form of abduction because he thought that prior probabilities in the case of hypotheses were not available. Peirce was totally
unwilling to admit subjective prior probabilities — if there were well-defined prior probabilities applied to hypotheses, they would have to be “solid” statistical probabilities “like those upon which the insurance business rests”. But when it comes to hypotheses, the hope for solid statistical facts is futile:
But they are not and cannot, in the nature of things, be statistical facts. What is the antecedent probability that matter should be composed of atoms? Can we take statistics of a multitude of different universes? An objective probability is the ratio of frequency of a specific to a generic event in the ordinary course of experience. Of a fact per se it is absurd to speak of objective probability. All that is attainable are subjective probabilities, or likelihoods, which express nothing but the conformity of a new suggestion to our prepossessions; and these are the source of most of the errors into which man falls, and of all the worst of them (2.777). It might be objected here that Peirce’s last point is unfair to probabilists, since he himself has tied the adoption of explanatory hypotheses to background beliefs, which could well be the sources of most of the errors into which man falls. Fair enough. Peirce would not think this is an objection to his views, precisely because the dependence of abduction on background beliefs is the reason he thought that it could not, in the first instance and on its own, yield probable results and that its conclusion should be accepted on probation, subject to further testing. So his point is that those who think that, by relying on subjective prior probabilities, they can have a conception of inference which yields likely hypotheses delude themselves. What all this means is that abductive inference per se is not the kind of inference that can or does lead to likely conclusions. It’s not as if we feed a topic-neutral and algorithmic rule with suitable premises and it returns likely conclusions. Abduction is not like that at all. Peirce insisted that when it comes to abduction “yielding to judgments of likelihood is a fertile source of waste of time and energy” (6.534). So abduction is not in the business of conferring probabilities on its conclusions.
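To make the rejected Bayesian picture concrete, here is a worked instance of the updating rule stated above. The numbers are purely illustrative assumptions (they are not drawn from Peirce or from any data), and the expansion of Prob(e) by the law of total probability is a standard step that the text leaves implicit.

```latex
% Requires amsmath. Assumed, purely illustrative values:
%   Prob(H) = 0.2,  Prob(e/H) = 0.9,  Prob(e/\neg H) = 0.3
\begin{align*}
\mathrm{Prob}(e) &= \mathrm{Prob}(e/H)\,\mathrm{Prob}(H)
                  + \mathrm{Prob}(e/\neg H)\,\mathrm{Prob}(\neg H) \\
                 &= 0.9 \times 0.2 + 0.3 \times 0.8 = 0.42, \\
\mathrm{Prob}_{\mathrm{new}}(H) &= \mathrm{Prob}_{\mathrm{old}}(H/e)
  = \frac{\mathrm{Prob}(e/H)\,\mathrm{Prob}(H)}{\mathrm{Prob}(e)}
  = \frac{0.18}{0.42} \approx 0.43 .
\end{align*}
```

The calculation cannot begin without the prior Prob(H) = 0.2, and it is exactly this input that Peirce held to be unavailable as a “solid” statistical fact where hypotheses are concerned.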
But to deny that abduction confers probabilities on its conclusions is not to imply that it is not an inference, or that it is not a means, qua inference, of yielding reasonably held conclusions. In his fifth lecture on Pragmatism (1903), Peirce drew a distinction between validity and strength (which is different from the one between uberty and security noted in section 3). An argument is valid, Peirce suggested, “if it possesses the sort of strength that it professes and tends toward the establishment of the conclusion in the way in which it pretends to do this” (5.192). This might sound opaque, but the underlying idea is that different sorts of inferences aim at different things and hence cannot be lumped together. Deduction aims at truth-preservation or truth-maintenance: if the premises are true, the conclusion has to be true. In deduction, validity and strength coincide because the conclusion of a deductive argument is at least as secure as its premises. But this is peculiar to deductive inference. Other inferential patterns may be such that validity and strength do not coincide. An inference may be weak and yet valid. It may be weak in that the conclusion of the inference might not be strongly supported by the premises, and yet it may be valid in Peirce’s sense above: the inference does not pretend to license stronger conclusions than it actually does. As he put it: “An argument is none the less logical for being weak, provided it does not pretend to a strength that it does not possess” (5.192). Abduction is a weak inference, but it can be reasonable
nonetheless (or “valid”, as Peirce would put it). Unlike deduction, abduction does not advertise itself as truth-preserving. Its aim is the generation of extra content and the provision of reasons for its adoption (based on explanatory considerations). Here is Peirce’s own way to put the point: “The conclusion of an abduction is problematic or conjectural, but is not necessarily at the weakest grade of surmise, and what we call assertoric judgments are, accurately, problematic judgments of a high grade of hopefulness” (5.192). Peirce had actually examined this issue in his earlier Notes on Ampliative Reasoning (1901-2). There, after noting that “an argument may be perfectly valid and yet excessively weak” (2.780), he went on to suggest that the strength of abduction is a function of its eliminative power.20 The strength of an abductively inferred hypothesis depends on “the absence of any other hypothesis”. But this would suggest that abduction is very weak, since how can it possibly be asserted that all other potentially explanatory hypotheses have been eliminated? To avoid rendering abduction excessively weak, Peirce suggested that strength might be measured in terms of “the amount of wealth, in time, thought, money, etc., that we ought to have at our disposal before it would be worth while to take up that hypothesis for examination”. This introduces a new factor into reasoning, over and above the requirements of explanation and testability noted above. We can call this factor ‘economy’, echoing Peirce’s own characterisation of it. Peirce tied economy to a number of features of abductive reasoning. In his eighth Lowell lecture in 1903, he stressed that “the leading consideration in Abduction” is “the question of Economy–Economy of money, time, thought, and energy” (5.600). Economy is related to the range of potential explanations that may be entertained and be subjected to further testing (cf. 6.528). It is related to the eliminative nature of abduction.
Economy dictates that when there is need for choice between competing hypotheses which explain a set of phenomena, some crucial experiment should be devised which eliminates many or most of the competitors.21 Economy is also related to the use of Ockham’s razor (cf. 6.535). The demand for, and the preference for, simple explanations is “a sound economic principle” because simpler explanations are more easily tested. The general point here is that abduction, qua reasoning, is subjected to criteria that do not admit a precise logical formulation. But these criteria are necessary for the characterisation of abduction nonetheless if abduction is to be humanly possible.
20 In the case of Induction, Peirce noted that the larger the number of instances that form the inductive basis, the stronger the induction. But, he added, weak inductions (based on small numbers of instances) are perfectly valid (cf. 2.780).
21 Here is how Peirce put it: “Let us suppose that there are thirty-two different possible ways of explaining a set of phenomena. Then, thirty-one hypotheses must be rejected. The most economical procedure, when it is practicable, will be to find some observable fact which, under conditions easily brought about, would result from sixteen of the hypotheses and not from any of the other sixteen. Such an experiment, if it can be devised, at once halves the number of hypotheses” (6.529).
In contradistinction to Descartes, Peirce was surely not interested in the project of pure enquiry. Inference, in particular ampliative inference, does not operate in a vacuum; nor is it subjected to no constraints other than the search for the truth. Nor does it occur in an environment of unlimited resources of time and energy. Principles of economy govern abductive inference precisely because abduction has to work its way through a space of hypotheses that is virtually inexhaustible. So either no abductive inference would be possible or there should be criteria that cut down the space of hypotheses to a reasonable size (cf. 2.776). It might be thought that these considerations of economy render abduction totally whimsical. For, one may wonder, what possibly could be the relation between abduction and truth? Note, however, that this worry would be overstated. Principles of economy are principles which facilitate the further testing of the selected hypotheses. Hence, they can facilitate finding out whether a hypothesis is true in the only sense in which Peirce can accept this, viz., in the sense of making a hypothesis doubt-resistant. But there is a residual worry that is more serious. If abduction does not operate within a network of true background beliefs, there is no way in which it can return hypotheses which have a good chance of being true. How can these true background beliefs emerge? In at least two different places, Peirce argues that the human mind has had the power to imagine correct theories, where this power is a “natural adaptation” (5.591). On one of these two occasions, he clearly associated this power of the human mind (“the guessing instinct”) with the principles of economy in abductive reasoning. These principles work because the mind has the power to hit upon the truth in a relatively small number of trials.
Here is how he put it: In very many questions, the situation before us is this: We shall do better to abandon the whole attempt to learn the truth, however urgent may be our need of ascertaining it, unless we can trust to the human mind’s having such a power of guessing right that before very many hypotheses shall have been tried, intelligent guessing may be expected to lead us to the one which will support all tests, leaving the vast majority of possible hypotheses unexamined (6.530). Peirce does not prove this claim, how could he? He does say in its support that truth has survival value (cf. 5.591). But it is not clear that this is anything other than speculation. A more likely ground for Peirce’s claim is quasi-transcendental, viz., that unless we accept that the human mind has had this power to guess right, there can be no rational explanation of why it has come up with some true theories in the first place. Peirce tries to substantiate this claim by means of a further argument. True theories cannot be a matter of chance because given all possible theories that could have been entertained, stumbling over a true one is extremely unlikely. The possible theories, Peirce said, “if not strictly innumerable, at any rate exceed a trillion – or the third power of a million; and therefore the chances are too overwhelmingly against the single true theory in the twenty or thirty thousand years during which man has been a thinking animal, ever having
come into any man’s head” (5.591). Note that this kind of argument is based on a statistical claim, which might be contentious: how can we come up with such statistics in the first place? Be that as it may, Peirce’s key point here is that though abduction does not wear its justification on its sleeve, it is reasonable to think that abduction does tend to operate within networks of true background beliefs. It is fair to say that though abduction cannot have a foundational role, its products cannot be doubted en masse, either. Its justification, qua mode of inference, comes from the need for rational explanation and in particular from the commitment to the view that rational explanation is possible; that the facts “admit of rationalization, and of rationalization by us” (7.219). Interestingly, Peirce claims that this commitment embodies another hypothesis, and as such it is the product of “a fundamental and primary abduction”. As Peirce put it: “it is a primary hypothesis underlying all abduction that the human mind is akin to the truth in the sense that in a finite number of guesses it will light upon the correct hypothesis”. This creates an air of circularity, of course. In essence, a grand abduction is employed to justify the possibility of abductive inference. Peirce does not address this problem directly. For him it seems that this circularity is the inevitable price that needs to be paid if human understanding is at all possible. Explanation aims at (and offers) understanding, but unless it is assumed that the human mind has a capacity or power to reach the truth in a finite number of trials, hitting the right explanations would be a random walk. It is no surprise, then, that Peirce brings in instinct once more. He draws a distinction between two kinds of considerations “which tend toward an expectation that a given hypothesis may be true”: the purely instinctive and the reasoned ones (7.220). 
The instinctive considerations kick in when it comes to the primary hypothesis that the human mind has a power to hit upon the truth. This is not reasoned, though it is supported by an induction on the past record of abductive inferences. As Peirce put it, “it has seldom been necessary to try more than two or three hypotheses made by clear genius before the right one was found”. The reasoned considerations kick in when a body of background beliefs has emerged which has some measure of truth in it. Then, the choice among competing hypotheses is guided by criteria noted above, e.g., breadth and incomplexity. For Peirce, however, it would be folly to try to hide the claim that “the existence of a natural instinct for truth is, after all, the sheet-anchor of science. From the instinctive, we pass to reasoned, marks of truth in the hypothesis” (7.220).22

In one of the first systematic treatments of Peirce’s views of abduction, Harry Frankfurt (1958, 594) raised what might be called Peirce’s paradox. This is that Peirce appears to want to have it both ways, viz., “that hypotheses are the products of a wonderful imaginative faculty in man and that they are products of a certain sort of logical inference”. It should be clear by now that this paradox is only apparent. Abduction involves a guessing instinct and is a reasoned process, but for Peirce these two elements operate at different levels. The guessing instinct is required for the very possibility of a trustworthy abductive inference. The reasoned process operates within an environment of background beliefs and aims to select among competing hypotheses on the basis of explanatory considerations.

22 The role of instinct in abduction is raised and discussed, in more or less the same way, in Peirce’s fifth lecture on Pragmatism, in 1903 (5.171-5.174). There, Peirce expresses his view that the instinct of guessing right is accounted for by evolution.

7 THE THREE STAGES OF INQUIRY
In Peirce’s mature thought, as we have already seen, abduction covers a cluster of operations that generate and evaluate explanatory hypotheses. Peirce was adamant, it was noted, that abduction is not the kind of inference that returns likely hypotheses. It is not in the business of producing judgements of likelihood. This is not to say, we have stressed, that abduction is not trustworthy. Rather, its trustworthiness is a function of the background beliefs within which it operates. But is it not the case that, at the end of the day, we want theories or hypotheses that are likely to be true? Peirce never doubted this. In 1901 he summed this up by saying: “A hypothesis then has to be adopted which is likely in itself and renders the facts likely. This process of adopting a hypothesis as being suggested by the facts is what I call abduction” (7.202). But how can abduction lead to likely hypotheses if it is not meant to do so? In his mature writings Peirce treated abduction as the first part of a three-stage methodological process, the other two stages being deduction and induction. The burden of likelihood is carried not by abduction in and of itself but by the other two methods which complement abduction in the overall method of inquiry. Abduction might confer plausibility or reasonableness on its conclusions, but their probability is determined by their further testing.
Here is a long but nice summary by Peirce himself, offered in 1908: The whole series of mental performances between the notice of the wonderful phenomenon and the acceptance of the hypothesis, during which the usually docile understanding seems to hold the bit between its teeth and to have us at its mercy, the search for pertinent circumstances and the laying hold of them, sometimes without our cognizance, the scrutiny of them, the dark laboring, the bursting out of the startling conjecture, the remarking of its smooth fitting to the anomaly, as it is turned back and forth like a key in a lock, and the final estimation of its Plausibility, I reckon as composing the First Stage of Inquiry. Its characteristic formula of reasoning I term Retroduction, i.e. reasoning from consequent to antecedent (6.469). Retroduction does not afford security. The hypothesis must be tested. This testing, to be logically valid, must honestly start, not as Retroduction starts, with scrutiny of the phenomena, but with examination of the hypothesis, and a muster of all sorts of conditional experiential consequences which would follow from its truth. This constitutes the
Second Stage of Inquiry. For its characteristic form of reasoning our language has, for two centuries, been happily provided with the name Deduction (6.470). The purpose of Deduction, that of collecting consequents of the hypothesis, having been sufficiently carried out, the inquiry enters upon its Third Stage, that of ascertaining how far those consequents accord with Experience, and of judging accordingly whether the hypothesis is sensibly correct, or requires some inessential modification, or must be entirely rejected. Its characteristic way of reasoning is Induction (6.472). Abduction is the sole method by means of which new ideas are introduced. It is the only method by means of which the phenomena are ‘rationalised’ by being explained. But to get from an abductively inferred hypothesis to a judgement of probability, this hypothesis should be subjected to further testing. According to Peirce: The validity of a presumptive adoption of a hypothesis for examination consists in this, that the hypothesis being such that its consequences are capable of being tested by experimentation, and being such that the observed facts would follow from it as necessary conclusions, that hypothesis is selected according to a method which must ultimately lead to the discovery of the truth, so far as the truth is capable of being discovered, with an indefinite approximation to accuracy (2.781). Taken on its own, abduction is the method of generation and ranking of hypotheses which potentially explain a certain explanandum. Peirce says: “The first starting and the entertaining of [a hypothesis], whether as a simple interrogation or with any degree of confidence, is an inferential step which I propose to call abduction” (6.525). But these hypotheses should be subjected to further testing which will determine, ultimately, their degree of confirmation. 
Accordingly, Peirce suggests that abduction should be embedded in a broader framework of inquiry so that the hypotheses generated and evaluated by abduction can be further tested. The result of this testing is the confirmation or disconfirmation of the hypotheses. So, Peirce sees abduction as the first stage of the reasoners’ attempt to add reasonable beliefs into their belief-corpus in the light of new phenomena or observations. The process of generation and first evaluation of hypotheses (abduction) is followed by deduction — i.e., by deriving further predictions from the abduced hypotheses — and then by induction which now Peirce understands as the process of testing these predictions and hence the process of confirming the abduced hypothesis (cf. 7.202ff). “As soon as a hypothesis has been adopted”, Peirce (7.203) says, the next step “will be to trace out its necessary and probable experiential consequences. This step is deduction”. And he adds:
Having, then, by means of deduction, drawn from a hypothesis predictions as to what the results of experiment will be, we proceed to test the hypothesis by making the experiments and comparing those predictions with the actual results of the experiment. (. . .) When, (. . .), we find that prediction after prediction, notwithstanding a preference for putting the most unlikely ones to the test, is verified by experiment, whether without modification or with a merely quantitative modification, we begin to accord to the hypothesis a standing among scientific results (7.206).

Induction, then, is given an overall different role than the one it had in his earlier thinking. It now captures the methods by means of which hypotheses are confirmed. Hence, in the transition from his earlier views to his later ones, what really changed is not Peirce’s conception of explanatory reasoning, but rather his views on induction. Induction changed status: from a distinct mode of ampliative reasoning with a definite syllogistic form which leads to the acceptance of a generalisation as opposed to a fact (early phase) to the general process of testing a hypothesis. As Peirce put it: “This sort of inference it is, from experiments testing predictions based on a hypothesis, that is alone properly entitled to be called induction” (7.206). Induction is a process “for testing hypotheses already in hand. The induction adds nothing” (7.217). Induction is no less indispensable than abduction in the overall process of inquiry, but its role is clearly different from the role of abduction. Peirce put this point in a picturesque way when he said that our knowledge of nature consists in building a “cantilever bridge of inductions” over the “chasm that yawns between the ultimate goal of science and such ideas of Man’s environment”, but that “every plank of [this bridge] is first laid by Retroduction alone” (6.475). Peirce kept his view that abduction and induction are distinct modes of reasoning.
In The Logic of Drawing History from Ancient Documents (1901), he noted that abduction and induction are “the opposite poles of reason, the one the most ineffective, the other the most effective of arguments” (7.218). Abduction is “the first step of scientific reasoning, as induction is the concluding step”. Abduction is “merely preparatory”. Abduction makes its start from the facts, without, at the outset, having any particular theory in view, though it is motived by the feeling that a theory is needed to explain the surprising facts. Induction makes its start from a hypothesis which seems to recommend itself, without at the outset having any particular facts in view, though it feels the need of facts to support the theory. Abduction seeks a theory. Induction seeks for facts. In abduction the consideration of the facts suggests the hypothesis. In induction the study of the hypothesis suggests the experiments which bring to light the very facts to which the hypothesis had pointed. Nonetheless, abduction and induction have a common feature: “that both lead to
the acceptance of a hypothesis because observed facts are such as would necessarily or probably result as consequences of that hypothesis”. Hence, Peirce has moved a long way from his earlier view on induction. Abduction covers all kinds of explanatory reasoning (including explanation by subsumption under a generalisation), while induction is confirmation. What is important to note is that Peirce took it that induction is justified in a way radically distinct from the way abduction is justified. He thought that induction is, essentially, a self-corrective method,23 viz., that “although the conclusion [of induction] at any stage of the investigation may be more or less erroneous, yet the further application of the same method must correct the error” (5.145). Being a frequentist about probabilities, Peirce clearly thought that a consistent application of the straight rule of induction will converge in the limit to the true relative frequency of a certain factor A in a class of events B.

In one of the most interesting studies of Peirce’s abduction, Douglas R. Anderson (1986, 162) noted that Peircean abduction “is a possibilistic inference whose test is in futuro”. This claim goes a long way in capturing the essence of Peircean abduction. Peirce employed the Aristotelian idea of “esse in futuro” to capture a mode of being which is potential, and not actual. For him, potentialities as well as laws of nature have their esse in futuro. Abduction, it might be claimed, has its justification in futuro or, better put, it has its full justification in futuro. This means that although a hypothesis might be reasonably accepted as plausible based on explanatory considerations (abduction), the degree of confidence in this hypothesis is not thereby settled. Rather it is tied to the degree of confirmation of this hypothesis, where the latter depends, ultimately, on the future performance of the hypothesis, viz., on how well-confirmed it becomes by further evidence.
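The self-corrective character attributed to induction above, namely that the straight rule approaches the true relative frequency of A among the Bs as observations accumulate, can be illustrated with a small simulation. The Python sketch below is offered only as an illustration: it assumes, beyond anything stated in the text, that the Bs are sampled independently with a fixed chance of being A, and the function name and parameters are hypothetical.

```python
import random


def straight_rule_estimates(true_frequency, num_observations, seed=0):
    """Return the running straight-rule estimates m/n, where m is the number
    of observed Bs that are A among the first n Bs examined.

    Assumption for the sake of illustration: each B is A independently,
    with probability `true_frequency`.
    """
    rng = random.Random(seed)  # seeded so that the run is reproducible
    hits = 0
    estimates = []
    for n in range(1, num_observations + 1):
        if rng.random() < true_frequency:
            hits += 1
        # Straight rule: project the observed relative frequency.
        estimates.append(hits / n)
    return estimates


estimates = straight_rule_estimates(true_frequency=0.3, num_observations=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(estimates[n - 1], 4))
```

Early estimates may be well off the mark, which corresponds to the conclusion being “more or less erroneous” at a given stage; the later ones settle close to 0.3. The simulation presupposes that there is a stable frequency to be found, which is just what the limit claim takes for granted.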
This conception of justification in futuro tallies well with Peirce’s account of knowledge and truth. The aim of inquiry is to get doubt-resistant beliefs. As noted already, truth itself boils down to doubt-resistant belief. In What Pragmatism Is, in 1905, Peirce said:

You only puzzle yourself by talking of this metaphysical ‘truth’ and metaphysical ‘falsity,’ that you know nothing about. All you have any dealings with are your doubts and beliefs, with the course of life that forces new beliefs upon you and gives you power to doubt old beliefs. If your terms ‘truth’ and ‘falsity’ are taken in such senses as to be definable in terms of doubt and belief and the course of experience (as for example they would be, if you were to define the ‘truth’ as that to a belief in which belief would tend if it were to tend indefinitely toward absolute fixity), well and good: in that case, you are only talking about doubt and belief. But if by truth and falsity you mean something not definable in terms of doubt and belief in any way, then you are talking of entities of whose existence you can know nothing, and which Ockham’s razor would clean shave off. Your problems would be greatly simplified, if, instead of saying that you want to know the ‘Truth,’ you were simply to say that you want to attain a state of belief unassailable by doubt (5.416).

All beliefs, then, which are not certain should be subjected to further testing; it is only this further testing (or, at least, the openness to further testing) that can render beliefs permanently settled and hence doubt-resistant. The justification of all fallible beliefs is in futuro. Abduction generates and recommends beliefs; but the process of their becoming doubt-resistant is external to abduction: that is where induction rules.

23 “That Induction tends to correct itself, is obvious enough” (5.776).
8 LOOKING AHEAD

Peirce had the intellectual courage to explore uncharted territories, but this exploration did not leave behind a full and comprehensive map. Despite his expressed wish to write a short book on “the real nature” of explanatory reasoning, he left behind papers, notes and unfinished manuscripts and, with them, a big challenge to his followers to reconstruct his thinking and put together a coherent and comprehensive theory of ampliative reasoning. A few decades passed after Peirce’s death in 1914 before philosophers started to appreciate the depth, richness and complexity of Peirce’s views of abduction. It was not until the publication of the first two volumes of his collected papers in 1931-2 that philosophers started to pay more systematic attention to Peirce’s philosophy, and to his writings on abduction in particular. In a paper published a few years after Peirce’s death, Professor Josiah Royce (who bequeathed Peirce’s manuscripts to the Harvard philosophy department and the Harvard library) brought to attention Peirce’s Lectures on Pragmatism as well as his Lowell Lectures on Logic in 1903-4 and made the following characteristic comment [Royce, 1916, p. 708]:

It was these latter [the Lowell Lectures] which James described as ‘flashes of brilliant light relieved against Cimmerian darkness’ – ‘darkness’ indeed to James as to many others must have seemed those portions on ‘Existential Graphs’ or ‘Abduction’.

William James’s reported view of Peirce’s writings on abduction was far from atypical. There is virtually no attempt at a reconstruction or exegesis of Peirce’s views of abduction before Arthur Burks’s (1946). In his long and instructive review of the first two volumes of Peirce’s Collected Papers, Ernest Nagel (1933, 382) devoted only a few lines to abduction, noting that “Presumptive reasoning, (. . .) (also called abduction, retroduction, hypothesis), consists in inferring an explanation, cause, or hypothesis from some fact which can be taken as a consequence of the hypothesis”. And Hans Reichenbach made the following passing note in his [1938, p. 36]:
I admire Charles Peirce as one of the few men who saw the relations between induction and probability at an early time; but just his remarks concerning what he calls ‘abduction’ suffer from an unfortunate obscurity which I must ascribe to his confounding the psychology of scientific discovery with the logical situation of theories in relation to observed facts.

When Peirce’s views were studied more carefully, there were two broad ways in which they were developed. The first focused on the issue of justification and reliability of ampliative reasoning; the second focused on the process of discovery of explanatory theories. Gilbert Harman’s [1965] paper on Inference to the Best Explanation (IBE) argued that the best way to conceive of abduction qua an inferential method was to see it as the method of inferring to the truth of the best among a number of competing rival explanations of a set of phenomena. On Harman’s view, abduction is the mode of inference in which a hypothesis H is accepted on the basis that a) it explains the evidence and b) no other hypothesis explains the evidence as well as H does. In a sense, IBE ends up being a liberalised version of Peircean abduction; it is defended as the mode of ampliative reasoning that can encompass hypothetico-deductivism and enumerative induction as special cases.24 One important issue in this way of thinking about abduction concerns its justification: why should it be taken to be the case that IBE is truth-conducive? Here the issue of the justification of IBE has been tied to the prospects of the defence of scientific realism in the philosophy of science.25 Another important issue concerns the virtues of hypotheses that make up goodness of explanation, or measure explanatory power. The identification of these virtues has not gone much further than what Peirce suggested (see, for instance, [Thagard, 1978]). But the justification of the truth-conducive character of these virtues has become a subject of intense debate (see [McMullin, 1992]). A third issue concerns the relationship between abduction, qua IBE, and the Bayesian theory of confirmation and belief updating (see [Lipton, 2004]).

24 For more on this, see Psillos [2002].

25 For more on this, see Psillos [1999, chapter 4].

It was Norwood Russell Hanson in the 1950s who suggested that Peirce’s abduction should be best seen as a logic of discovery. The then dominant tradition was shaped by Reichenbach’s distinction between the context of discovery and the context of justification, and the key thought (shared by Karl Popper and others as well) was that discovery was not subject to rules; it obeyed no logic and was subject only to a psychological study (see Reichenbach’s comment on Peirce above). Hanson suggested that discovery falls under rational patterns and argued that this was Peirce’s key idea behind abduction. He took it that a logic of discovery is shaped by the following type of structure: it proceeds retroductively, from an anomaly to the delineation of a kind of explanation H which fits into an organised pattern of concepts [1965, p. 50].
In the 1980s, the study of abduction found a new home in Artificial Intelligence. The study of reasoning, among other things, by computer scientists unveiled a variety of modes of reasoning which tend to capture the defeasible, non-monotonic and uncertain character of human reasoning. The study of abduction became a prominent aspect of this new focus on reasoning. In this respect, a pioneer among the researchers in AI has been Bob Kowalski. Together with his collaborators, Kowalski attempted to offer a systematic treatment of both the syntax and the semantics of abduction within the framework of Logic Programming. The aim of an abductive problem is to assimilate a new datum O into a knowledge-base (KB). So, KB is suitably extended by a certain hypothesis H into KB’ such that KB’ incorporates the datum O. Abduction is the process through which a hypothesis H is chosen (see [Kakas et al., 1992; 1997]). Others, notably Bylander and his collaborators (1991), have aimed to offer computational models of abduction which capture its evaluative element.26 Abduction has been used in a host of areas such as fault diagnosis (where abduction is used for the derivation of a set of faults that are likely to cause a certain problem); belief revision (where abduction is used in the incorporation of new information in a belief corpus); as well as scientific discovery, legal reasoning, natural language understanding, and model-based reasoning. In these areas, there have been attempts to advance formal models of abductive reasoning so that its computational properties are clearly understood and its relations to other kinds of reasoning become more precise.

A rich map of the conceptual and computational models of abduction is offered in Gabbay and Woods [2005]. In this work, Gabbay and Woods advance their own formal model of abduction that aims to capture some of the nuances of Peirce’s later account. They treat abduction as a method to solve an ignorance-problem, where the latter is a problem not solvable by presently available cognitive resources. Given a choice between surrender (leaving the problem unsolved) and subduance (looking for novel cognitive resources), Gabbay and Woods promote abduction as a middle way: ignorance is not (fully) removed, but becomes the basis for looking for resources upon which reasoned action can be based. The abduced hypothesis does not become known, but it is still the basis for further exploration and action.

Circa 1897, Peirce wrote this:

The development of my ideas has been the industry of thirty years. I did not know as I ever should get to publish them, their ripening seemed so slow. But the harvest time has come, at last, and to me that harvest seems a wild one, but of course it is not I who have to pass judgment. It is not quite you, either, individual reader; it is experience and history (1.12).

Both experience and history have now spoken. Peirce’s theory of abduction still yields fruits and promises good harvests for many years to come.

26 See also [Josephson and Josephson, 1994]. For a good survey of the role of abduction in AI, see [Konolige, 1996].
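The knowledge-base formulation of abduction mentioned in this section (extend KB by a hypothesis H so that the resulting KB’ incorporates the datum O) can be given a toy rendering. The Python sketch below is a deliberately simplified illustration and not the abductive logic programming framework of Kakas, Kowalski and Toni: it assumes a propositional knowledge base of Horn rules, reads ‘incorporates’ as ‘entails by forward chaining’, omits integrity constraints, and prefers the smallest hypotheses as a crude stand-in for economy. All names in it are hypothetical.

```python
from itertools import combinations


def closure(facts, rules):
    """Forward-chain over Horn rules (body, head) until nothing new follows."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


def abduce(facts, rules, abducibles, observation):
    """Return the smallest sets H of abducibles such that KB plus H entails
    the observation O; an empty list means no candidate hypothesis works."""
    candidates = sorted(abducibles)
    for size in range(len(candidates) + 1):
        found = [set(h) for h in combinations(candidates, size)
                 if observation in closure(set(facts) | set(h), rules)]
        if found:
            return found  # stop at the smallest size that explains O
    return []


# Toy knowledge base: two rival explanations of wet grass.
rules = [
    (frozenset({"rain"}), "wet_grass"),
    (frozenset({"sprinkler_on"}), "wet_grass"),
    (frozenset({"rain"}), "wet_road"),
]
print(abduce(set(), rules, {"rain", "sprinkler_on"}, "wet_grass"))
```

The sketch returns both single-assumption hypotheses and does nothing to choose between them. Choosing would call for the evaluative element discussed above and, on Peirce’s picture, for the further stages of deduction and induction: the rain hypothesis, for instance, predicts a wet road, and that prediction can be tested.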
FURTHER READING

Perhaps the most important early writings on Peirce’s theory of abduction are by Burks [1946], Frankfurt [1958], and Fann [1970]. A very significant more recent article is Anderson [1986]. Still more recent works that discuss aspects of Peirce’s views of abduction are Hofmann [1999] and Paavola [2007]. An excellent, brief but comprehensive account of Peirce’s philosophy of pragmatism is given in Misak [1999]. Thagard’s [1981] is a brief but suggestive account of the relation between abduction and hypothesis, while his [1977] explains the relation between Induction and Hypothesis. On Peirce’s account of Induction, see Goudge [1940], Jessup [1974] and Sharpe [1970]. On issues related to abduction as a logic of discovery, see Hanson [1965]. The classic book-length treatment of Inference to the Best Explanation is by Lipton [1991]. A recent thorough discussion of the rival interpretations of Peirce (IBE vs logic of discovery) is given in McKaughan [2008]. For an emphasis on computational aspects of abduction, see Aliseda [2006]. The role of abduction in science is discussed in Magnani [2001]. On the relation between abduction and Bayesian confirmation, see the symposium on Peter Lipton’s Inference to the Best Explanation in Philosophy and Phenomenological Research, 74: 421-462, (2007) (Symposiasts: Alexander Bird, Christopher Hitchcock and Stathis Psillos). For a development of the Peircean two-dimensional framework see Psillos [2002] and [2009].

BIBLIOGRAPHY

[Aliseda, 2006] A. Aliseda. Abductive Reasoning. Logical Investigations into Discovery and Explanation, Synthese Library vol. 330, Springer, 2006.
[Anderson, 1986] D. R. Anderson. The Evolution of Peirce’s Concept of Abduction, Transactions of the Charles S. Peirce Society 22: 145-64, 1986.
[Burks, 1946] A. Burks. Peirce’s Theory of Abduction, Philosophy of Science 13: 301-306, 1946.
[Bylander et al., 1991] T. Bylander, D. Allemang, M. C. Tanner, and J. R. Josephson. The Computational Complexity of Abduction, Artificial Intelligence 49: 25-60, 1991.
[Fann, 1970] K. T. Fann. Peirce’s Theory of Abduction, The Hague: Martinus Nijhoff, 1970.
[Frankfurt, 1958] H. Frankfurt. Peirce’s Notion of Abduction, The Journal of Philosophy 55: 593-7, 1958.
[Gabbay and Woods, 2005] D. M. Gabbay and J. Woods. The Reach of Abduction: Insight and Trial, volume 2 of A Practical Logic of Cognitive Systems, Amsterdam: North-Holland, 2005.
[Goudge, 1940] T. A. Goudge. Peirce’s Treatment of Induction, Philosophy of Science 7: 56-68, 1940.
[Hanson, 1965] N. R. Hanson. Notes Towards a Logic of Discovery, in R. J. Bernstein (ed.) Critical Essays on C. S. Peirce, Yale University Press, 1965.
[Harman, 1965] G. Harman. Inference to the Best Explanation, The Philosophical Review 74: 88-95, 1965.
[Hempel, 1965] C. G. Hempel. Aspects of Scientific Explanation, New York: The Free Press, 1965.
[Hofmann, 1999] M. Hofmann. Problems with Peirce’s Concept of Abduction, Foundations of Science 4: 271-305, 1999.
[Jessup, 1974] J. A. Jessup. Peirce’s Early Account of Induction, Transactions of the Charles S. Peirce Society 10: 224-34, 1974.
[Josephson and Josephson, 1994] J. R. Josephson and S. G. Josephson, eds. Abductive Inference, Cambridge University Press, Cambridge, 1994.
[Kakas et al., 1992] A. C. Kakas, R. A. Kowalski, and F. Toni. Abductive Logic Programming, Journal of Logic and Computation 2: 719-70, 1992.
[Kakas et al., 1997] A. C. Kakas, R. A. Kowalski, and F. Toni. The Role of Abduction in Logic Programming, in D. Gabbay et al. (eds) Handbook in Artificial Intelligence and Logic Programming, Oxford: Oxford University Press, 1997.
[Konolige, 1996] K. Konolige. Abductive Theories in Artificial Intelligence, in G. Brewka (ed.) Principles of Knowledge Representation, CSLI Publications, 1996.
[Lipton, 1991] P. Lipton. Inference to the Best Explanation (2nd enlarged edition, 2004), London: Routledge, 1991.
[Magnani, 2001] L. Magnani. Abduction, Reason and Science. Processes of Discovery and Explanation, New York: Kluwer Academic, 2001.
[McKaughan, 2008] D. J. McKaughan. From Ugly Duckling to Swan: C. S. Peirce, Abduction, and the Pursuit of Scientific Theories, Transactions of the Charles S. Peirce Society 44: 446-68, 2008.
[McMullin, 1992] E. McMullin. The Inference that Makes Science, Milwaukee: Marquette University Press, 1992.
[Misak, 1999] C. Misak. American Pragmatism – Peirce, in C. L. Ten (ed.) Routledge History of Philosophy, Volume 7, The Nineteenth Century, London: Routledge, 1999.
[Nagel, 1933] E. Nagel. Charles Peirce’s Guesses at the Riddle, The Journal of Philosophy 30: 365-86, 1933.
[Paavola, 2007] S. Paavola. On the Origin of Ideas: An Abductivist Approach to Discovery, Philosophical Studies from the University of Helsinki 15, 2007.
[Peirce, 1931–1958] C. S. Peirce. Collected Papers of Charles Sanders Peirce, C. Hartshorne & P. Weiss (eds) (volumes 1-6) and A. Burks (volumes 7 and 8), Cambridge MA: Belknap Press, 1931-1958.
[Psillos, 1999] S. Psillos. Scientific Realism: How Science Tracks Truth, London: Routledge, 1999.
[Psillos, 2002] S. Psillos. Simply the Best: A Case for Abduction, in A. C. Kakas and F. Sadri (eds) Computational Logic: From Logic Programming into the Future, LNAI 2408, Berlin-Heidelberg: Springer-Verlag, pp. 605-25, 2002.
[Psillos, 2009] S. Psillos. Knowing the Structure of Nature, Palgrave-MacMillan, 2009.
[Reichenbach, 1938] H. Reichenbach. On Probability and Induction, Philosophy of Science 5: 21-45, 1938.
[Ross, 1949] W. D. Ross. Aristotle’s Prior and Posterior Analytics, (with intr. and commentary), Oxford: Clarendon Press, 1949.
[Royce, 1916] J. Royce. Charles Sanders Peirce, The Journal of Philosophy 13: 701-9, 1916.
[Sharpe, 1970] R. Sharpe. Induction, Abduction and the Evolution of Science, Transactions of the Charles S. Peirce Society 6: 17-33, 1970.
[Smith, 1989] R. Smith. Aristotle – Prior Analytics (translation, with intr., notes and commentary), Indianapolis: Hackett Publishing Company, 1989.
[Thagard, 1977] P. Thagard. On the Unity of Peirce’s Theory of Hypothesis, Transactions of the Charles S. Peirce Society 13: 112-21, 1977.
[Thagard, 1978] P. Thagard. The Best Explanation: Criteria for Theory Choice, The Journal of Philosophy 75: 76-92, 1978.
[Thagard, 1981] P. Thagard. Peirce on Hypothesis and Abduction, in K. Ketner et al., eds., Proceedings of the C. S. Peirce Bicentennial International Congress, Texas: Texas Tech University Press, 271-4, 1981.
THE MODERN EPISTEMIC INTERPRETATIONS OF PROBABILITY: LOGICISM AND SUBJECTIVISM
Maria Carla Galavotti
This chapter will focus on the modern epistemic interpretations of probability, namely logicism and subjectivism. The qualification “modern” is meant to oppose the “classical” interpretation of probability developed by Pierre Simon de Laplace (1749-1827). With respect to Laplace’s definition, modern epistemic interpretations do not retain the strict linkage with the doctrine of determinism. Moreover, Laplace’s “Principle of insufficient reason” by which equal probability is assigned to all possible outcomes of a given experiment (uniform prior distribution) has been called into question by modern epistemic interpretations and gradually superseded by other criteria. In the following pages the main traits of the logical and subjective interpretations of probability will be outlined together with the position of a number of authors who developed different versions of such viewpoints. The work of Rudolf Carnap, who is widely recognised as the most prominent representative of logicism, will not be dealt with here as it is the topic of another chapter in the present volume.1
1 THE LOGICAL INTERPRETATION OF PROBABILITY
1.1 Forefathers
The logical interpretation regards probability as an epistemic notion pertaining to our knowledge of facts rather than to facts themselves. Compared to the “classical” epistemic view of probability forged by Pierre Simon de Laplace, this approach stresses the logical aspect of probability, and regards the theory of probability as part of logic. According to Ian Hacking, the logical interpretation can be traced back to Leibniz, who entertained the idea of a logic of probability comparable to deductive logic, and regarded probability as a relational notion to be valued in relation to the available data. More particularly, Leibniz is seen by Hacking as anticipating Carnap’s programme of inductive logic.2
1 See the chapter by Sandy Zabell in this volume. For a more extensive treatment of the topics discussed here, see Galavotti [2005].
2 See Hacking [1971] and [1975].
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
The idea that probability represents a sort of degree of certainty, more precisely the degree to which a hypothesis is supported by a given amount of information, was later worked out in some detail by the Czech mathematician and logician Bernard Bolzano (1781–1848). Author of the treatise Wissenschaftslehre (1837) which is reputed to herald contemporary analytical philosophy,3 Bolzano defines probability as the “degree of validity” (Grad der Gültigkeit) relative to a proposition expressing a hypothesis, with respect to other propositions, expressing the possibilities open to it. Probability is seen as an objective notion, exactly like truth, from which probability derives.4 The main ingredients of logicism, namely the idea that probability is a logical relation between propositions endowed with an objective character, are found in Bolzano’s conception, which can be seen as a direct ancestor of Carnap’s theory of probability as partial implication.
3 See Dummett [1993].
4 See Bolzano [1837].
1.2 Nineteenth century British logicists
In the nineteenth century the interpretation of probability was widely debated in Great Britain, and opposite viewpoints were upheld. Both the empirical and the epistemic views of probability had followers. The empirical viewpoint shaped the frequentist interpretation forged by two Cambridge scholars: Robert Leslie Ellis (1817–1859) and John Venn (1834–1923), author of The Logic of Chance, which appeared in three editions in 1866, 1876 and 1888. The epistemic viewpoint inspired the logical interpretation embraced by George Boole, Augustus De Morgan and Stanley Jevons, whose work is analysed in volume IV of the Handbook of the History of Logic, devoted to British Logic in the Nineteenth Century.5 Therefore, the present account will be limited to a brief outline of these authors’ views on probability.
5 See Gabbay and Woods, eds. [2008], in particular the chapters by Dale Jacquette on “Boole’s Logic” (pp. 331–379); Michael E. Hobart and Joan L. Richards on “De Morgan’s Logic” (pp. 283–329); and Bert Mosselmans and Ard van Moer on “William Stanley Jevons and the Substitution of Similars” (pp. 515–531).
George Boole (1815–1864) is the author of the renowned An Investigation of the Laws of Thought, on Which are Founded the Mathematical Theories of Logic and Probabilities (1854). Although his name is mostly associated with (Boolean) algebra, Boole made important contributions to differential and integral calculus, and also to probability. According to biographer Desmond MacHale, Boole’s work on probability was “greatly encouraged by W.F. Donkin, Savilian Professor of Astronomy in Oxford, who had himself written some important papers on the subject of probability. Boole was gratified that Donkin agreed with his results” [MacHale, 1985, p. 215]. William Donkin, on whom something will be added in the second part of this chapter, shared with Boole an epistemic view of probability, although he was himself closer to the subjective outlook.
According to Boole “probability is expectation founded upon partial knowledge” [Boole, 1854a; 1916, p. 258]. In other words, probability gives grounds for expectation, based on the information available to those who evaluate it. However, probability is not itself a degree of expectation: “The rules which we employ in life-assurance, and in the other statistical applications of the theory of probabilities, are altogether independent of the mental phaenomena of expectation. They are founded on the assumption that the future will bear a resemblance to the past; that under the same circumstances the same event will tend to recur with a definite numerical frequency; not upon any attempt to submit to calculation the strength of human hopes and fears”. [Boole, 1854a; 1916, pp. 258-259]
Boole summarizes his own attitude thus: “probability I conceive to be not so much expectation, as a rational ground for expectation” [Boole, 1854b; 1952, p. 292]. The emphasis on rationality is a characteristic trait of the logical interpretation, which takes a normative attitude towards the theory of probability. As we shall see, this marks a major difference from subjectivism. Within Boole’s perspective, the normative character of probability derives from that of logic, to which it belongs. The “laws of thought” investigated in his most famous book are not meant to describe how the mind works, but rather how it should work in order to be rational: “the mathematical laws of reasoning are, properly speaking, the laws of right reasoning only” [Boole, 1854a, 1916, p. 428].6 According to Boole’s logical perspective, probability does not represent a property of events, being rather a relationship between propositions describing events.
In Boole’s words: “Although the immediate business of the theory of probability is with the frequency of the occurrence of events, and although it therefore borrows some of its elements from the science of number, yet as the expression of the occurrence of those events, and also of their relations, of whatever kind, which connect them, is the office of language, the common instrument of reason, so the theory of probabilities must bear some definite relation to logic. The events of which it takes account are expressed by propositions; their relations are involved in the relations of propositions. Regarded in this light, the object of the theory of probabilities may be thus stated: Given the separate probabilities of any propositions to find the probability of another proposition. By the probability of a proposition, I here mean [...] the probability that in any particular instance, arbitrarily chosen, the event or condition which it affirms will come to pass”. [Boole, 1851, 1952, pp. 250-251] Accordingly, the theory of probability is “coextensive with that of logic, and [...] it recognizes no relations among events but such as are capable of being expressed by propositions” [Boole, 1851, 1952, p. 251].
6 According to some authors, Boole combines a normative attitude towards logic with psychologism. See Kneale [1948] and the “Introduction” (Part I by Ivor Grattan-Guinness and Part II by Gérard Bornet) in Boole [1997].
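Boole’s statement of the task, to find the probability of one proposition from the probabilities of others, can be illustrated on a small case. The following Python sketch is not Boole’s own algebraic method. It assumes two throws of a fair die with all 36 ordered outcomes equally likely, treats each proposition as a predicate that is true or false of an outcome, and obtains the probabilities of a conjunction and a disjunction by counting. The names `prob`, `first_six` and `second_six` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Assumption: two throws of a fair die, all 36 ordered outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(proposition):
    """Probability of a proposition, given as a predicate on outcomes."""
    favourable = [o for o in outcomes if proposition(o)]
    return Fraction(len(favourable), len(outcomes))

# Two simple propositions (hypothetical names).
def first_six(o):
    return o[0] == 6

def second_six(o):
    return o[1] == 6

p_first = prob(first_six)
p_second = prob(second_six)
p_both = prob(lambda o: first_six(o) and second_six(o))    # conjunction
p_either = prob(lambda o: first_six(o) or second_six(o))   # disjunction

print(p_first, p_second, p_both, p_either)
```

Under these assumptions the conjunction has probability 1/36 and the disjunction 11/36, which equals 1/6 + 1/6 − 1/36. The relations among the propositions fix the relations among their probabilities, which is the point of the passage quoted above.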
Two kinds of objects of interest fall within the realm of probability: games of chance and observable phenomena belonging to the natural and social sciences. Games of chance confront us with a peculiar kind of problem, where the ascertainment of data is in itself a way of measuring probabilities. Events of this kind are called simple. Sometimes such events are combined to form a compound event, as when it is asked what is the probability of obtaining a six twice in two successive throws of a die. By contrast, the probability of phenomena encountered in nature can only be measured by means of frequencies, and then we face compound events. Simple events are described by simple propositions, and compound events are described by compounded propositions. Simple propositions are combined to form compounded propositions by means of the logical relations of conjunction and disjunction, and the dependence of the occurrence of certain events upon others can be represented by conditional propositions. Once the events subject to probability are described by propositions, these can be handled by using the methods of logic. The fundamental rules for calculating compounded probabilities are presented by Boole in such a way as to show their intimate relation with logic, and more precisely with his algebra. The conclusion attained is that there is a “natural bearing and dependence” [Boole, 1854a, 1916, p. 287] between the numerical measure of probability and the algebraic representation of the values of logical expressions. The task Boole sets himself is “to obtain a general method by which, given the probabilities of any events whatsoever, be they simple or compound, dependent or independent, conditioned or not, one can find the probability of some other event connected with them, the connection being either expressed by, or implicit in, a set of data given by logical equations”. [Boole, 1854a, 1916, p. 287] In so doing Boole sets forth the logicist programme, to be resumed by Carnap a hundred years later.7
7 The reader is referred to Hailperin [1976] for a detailed account of Boole’s theory of probability.
Another representative of nineteenth century logicism is the mathematician Augustus De Morgan (1806–1871), who greatly influenced Boole.8 De Morgan’s major work in logic is the treatise Formal Logic: or, The Calculus of Inference, Necessary and Probable (1847) in which he claims that “by degree of probability we really mean, or ought to mean, degree of belief” [De Morgan, 1847, 1926, p. 198]. De Morgan strongly opposed the tenet that probability is an objective feature of objects, like their physical properties: “I throw away objective probability altogether, and consider the word as meaning the state of the mind with respect to an assertion, a coming event, or any other matter on which absolute knowledge does not exist” [De Morgan, 1847, 1926, p. 199]. However, when making these claims, De Morgan does not refer to actual belief, entertained by individual persons, but rather to the kind of belief a rational agent ought to adopt when evaluating probability. Therefore, to say that the probability of a certain event is three to one should be taken to mean “that in the universal opinion of those who examine the subject, the state of mind to which a person ought to be able to bring himself is to look three times as confidently upon the arrival as upon the non-arrival” [De Morgan, 1847, 1926, p. 200]. De Morgan also wrote some essays specifically devoted to probability, including Theory of Probabilities (1837), and An Essay on Probabilities, and on their Application to Life Contingencies and Insurance Offices (1838), where he maintains that “the quantities which we propose to compare are the forces of the different impressions produced by different circumstances” [De Morgan, 1838, p. 6], and that “probability is the feeling of the mind, not the inherent property of a set of circumstances” [De Morgan, 1838, p. 7]. At first glance, De Morgan’s description of probability as a “degree of belief” and “state of mind” associates him with subjectivism. But his insistence upon referring to the human mind as transcending individuals, not to the minds of single agents who evaluate probabilities, sets him apart from modern subjectivists like Bruno de Finetti.
8 On De Morgan’s life see the memoir written by his wife, in De Morgan, Sophia Elizabeth [1882], also containing some correspondence.
The logicist attitude towards probability also characterizes the work of the economist and logician William Stanley Jevons (1835-1882).9 In The Principles of Science (1873) Jevons claims that “probability belongs wholly to the mind” [Jevons, 1873, 1877, p. 198]. While embracing an epistemic approach, Jevons does not define probability as a “degree of belief”, because he finds this terminology ambiguous. Against Augustus De Morgan, his teacher at University College London, he maintains that “the nature of belief is not more clear [...] than the notion which it is used to define. But an all-sufficient objection is, that the theory does not measure what the belief is, but what it ought to be” [Jevons, 1873, 1877, p. 199]. Jevons prefers “to dispense altogether with this obscure word belief, and to say that the theory of probability deals with quantity of knowledge” [Jevons, 1873, 1877, p. 199]. So defined, probability is seen as a suitable guide of belief and action. In Jevons’ words: “the value of the theory consists in correcting and guiding our belief, and rendering one’s states of mind and consequent actions harmonious with our knowledge of exterior conditions” [Jevons, 1873, 1877, p. 199]. Deeply convinced of the utility and power of probability, Jevons established a close link between probability and induction, arguing “that it is impossible to expound the methods of induction in a sound manner, without resting them upon the theory of probability” [Jevons, 1873, 1877, p. 197]. In this connection he praises Bayes’ method: “No inductive conclusions are more than probable, and [...] the theory of probability is an essential part of logical method, so that the logical value of every inductive result must be determined consciously or unconsciously, according to the principle of the inverse method of probability”. [Jevons, 1873, 1877, p. xxix]
9 See Keynes [1936, 1972] for a biographical sketch of Jevons.
A controversial aspect of Jevons’ work is his defence of Laplace against various criticisms raised by a number of authors including Boole. While granting Laplace’s critics that the principle of insufficient reason is to a certain extent arbitrary, he still regards it as the best solution available: “It must be allowed that the hypothesis adopted by Laplace is in some degree arbitrary, so that there was some opening for the doubt which Boole has cast upon it. [...] But it may be replied [...] that the supposition of an infinite number of balls treated in the manner of Laplace is less arbitrary and more comprehensive than any other that can be suggested”. [Jevons, 1873, 1877, p. 256] According to Jevons, Laplace’s method is of great help in situations characterized by lack of knowledge, so it “is only to be accepted in the absence of all better means, but like other results of the calculus of probability, it comes to our aid when knowledge is at an end and ignorance begins, and it prevents us from over-estimating the knowledge we possess”. [Jevons, 1873, 1877, p. 269] When reading Jevons, one is impressed by his deeply probabilistic attitude, testified by statements like the following: “the certainty of our scientific inferences [is] to a great extent a delusion” [Jevons, 1873, 1877, p. xxxi], and “the truth or untruth of a natural law, when carefully investigated, resolves itself into a high or low degree of probability” [Jevons, 1873, 1877, p. 217]. Jevons regards knowledge as intrinsically incomplete and calls attention to the shaky foundation of science, which is based on the assumption of the uniformity of nature. He argues that “those who so frequently use the expression Uniformity of Nature seem to forget that the Universe might exist consistently with the laws of nature in the most different conditions” (Jevons [1873, 1877], p. 749). In view of all this, appeal to probability is mandatory. 
Although probability does not tell us much about what happens in the short run, it represents our best tool for facing the future: “All that the calculus of probability pretends to give, is the result in the long run, as it is called, and this really means in an infinity of cases. During any finite experience, however long, chances may be against us. Nevertheless the theory is the best guide we can have”. [Jevons, 1873, 1877, p. 261] This suggests that for Jevons the ultimate justification of inductive inference is to be sought on pragmatical grounds.
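The method of Laplace that Jevons defends can be made concrete with a small computation. The sketch below is an illustration under stated assumptions rather than a reconstruction of Jevons’ text. It assumes an urn of `n_balls` balls, a uniform prior over the number of white balls, and draws with replacement, and it takes the supposition of “an infinite number of balls” to correspond to the limit of large `n_balls`, where the predictive probability approaches the value usually called Laplace’s rule of succession, (s + 1)/(n + 2). The function names are hypothetical.

```python
def predictive_white(n_balls, s, n):
    """P(next draw is white | s white in n draws with replacement),
    assuming a uniform prior over the number k of white balls (0..n_balls)."""
    num = 0.0
    den = 0.0
    for k in range(n_balls + 1):
        x = k / n_balls                        # proportion of white balls
        likelihood = x**s * (1 - x)**(n - s)   # probability of the observed draws
        num += x * likelihood
        den += likelihood
    return num / den

def rule_of_succession(s, n):
    """Limiting value for an unbounded number of balls (uniform prior)."""
    return (s + 1) / (n + 2)

s, n = 3, 4  # assumed data: three white balls observed in four draws
for n_balls in (2, 10, 1000):
    print(n_balls, predictive_white(n_balls, s, n))
print("limit", rule_of_succession(s, n))
```

With only two balls in the urn the data leave a single admissible composition and the predictive value is 1/2. As the number of balls grows, the value approaches 4/6. The arbitrariness that Jevons concedes lies in the uniform prior, not in the arithmetic that follows from it.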
1.3 William Ernest Johnson
William Ernest Johnson (1858–1931), mathematician, philosopher and logician, Fellow of King’s College and lecturer in the University of Cambridge, greatly influenced outstanding figures such as John Maynard Keynes, Frank Plumpton Ramsey and Harold Jeffreys. His most important work is Logic, published in three volumes between 1921 and 1924. At the time of his death he was working on a fourth volume of Logic, dealing with probability. The drafts of the first three chapters were published posthumously in Mind in 1932, under the titles “Probability: The Relations of Proposal to Supposal”, “Probability: Axioms” and “Probability: The Deductive and the Inductive Problems”. The “Appendix on Eduction”, closing the third volume of Logic, also focuses on probability.
Johnson adopts a “philosophical” approach to logic, stressing its epistemic aspects. He regards logic as “the analysis and criticism of thought” [Johnson, 1921, 1964, p. xiii], and takes a critical attitude towards formal approaches. By doing so, he set himself apart from the mainstream of the period. In a sympathetic spirit, Keynes observes that Johnson “was the first to exercise the epistemic side of logic, the links between logic and the psychology of thought. In a school of thought whose natural leanings were towards formal logic, he was exceptionally well equipped to write formal logic himself and to criticize everything which was being contributed to the subject along formal lines”. [Keynes, 1931, 1972, p. 349] Johnson makes a sharp distinction between “the epistemic aspect of thought”, connected with “the variable conditions and capacities for its acquisition”, and its “constitutive aspect”, referring to “the content of knowledge which has in itself a logically analysable form” [Johnson, 1921, 1964, pp. xxxiii-xxxiv]. The epistemic and grammatical aspects of logic are the two distinct albeit strictly intertwined components along which logic is to be analysed. Regarding probability, Johnson embraces a logical attitude that attaches probability to propositions.
While taking this standpoint, he rejects the conception of probability as a property of events: “familiarly we speak of the probability of an event”, he writes, “but [...] such an expression is not justifiable” [Johnson, 1932, p. 2]. By contrast, “Probability is a character, variable in quantity or degree, which may be predicated of a proposition considered in its relation to some other proposition. The proposition to which the probability is assigned is called the proposal, and the proposition to which the probability of the proposal refers is called the supposal”. [Johnson, 1932, p. 8] The terms “proposal” and “supposal” stand for what are usually called “hypothesis” and “evidence”. As Johnson puts it, a peculiar feature of the theory of probability is that when dealing with it “we have to recognise not only the two assertive attitudes of acceptance and rejection of a given assertum, but also a third attitude, in which judgment as to its truth or falsity is suspended; and [...] probability can only be expounded by reference to such an attitude towards a given assertum” [Johnson, 1932, p. 2]. If the act of suspending judgment is a mental fact, and as such is the competence of psychology, the treatment of probability taken in reference to that act is also strongly connected to logic, because logic provides the norms to be imposed on it. The following passage describes in what sense for Johnson probability falls within the realm of logic: “The logical treatment of probability is related to the psychological treatment of suspense of judgment in the same way as the logical treatment of the proposition is related to the psychological treatment of belief. Just as logic lays down some conditions for correct belief, so also it lays down conditions for correcting the attitude of suspense of judgment. In both cases we hold that logic is normative, in the sense that it imposes imperatives which have significance only in relation to presumed errors in the processes of thinking: thus, if there are criteria of truth, it is because belief sometimes errs. Similarly, if there are principles for the measurement of probability, it is because the attitude of suspense towards an assertum involves a mental measurable element, which is capable of correction. We therefore put forward the view, that probability is quantitative because there is a quantitative element in the attitude of suspense of judgment”. [Johnson, 1932, pp. 2-3]
Johnson distinguished three types of probability statements according to their form. These three types should not be confused, because they give rise to different problems. They are: “(1) The singular proposition, e.g., that the next throw will be heads, or that this applicant for insurance will die within a year; (2) The class-fractional proposition, e.g., that, of the applicants to an insurance office, 3/4 of consumptives will die within a year; or that 1/2 of a large number of throws will be heads; (3) The universal proposition, e.g., that all men die before the age of 150 years”. [Johnson, 1932, p. 2] In more familiar terminology, Johnson’s worry is to distinguish between propositions referring to (1) a generic individual randomly chosen from a population, (2) a finite sample or population, (3) an infinite population. The distinction is important for both understanding and evaluating statistical inference, and Johnson has the merit of having called attention to it.10
10 Some remarks on the relevance of the distinction made by Johnson are to be found in Costantini and Galavotti [1987].
Closely related is Johnson’s view that probability, conceived as the relation between proposal and supposal, presents two distinct aspects: constructional and inferential. Grasping the constructional relation between any two given propositions means that “both the form of each proposition taken by itself, and the process by which one proposition is constructed from the other” [Johnson, 1932, p. 4] are taken into account. In the case of probability, the form of the propositions involved and the way in which the proposal is constructed by modification of the supposal will determine the constructional relation between them. On such constructional relation is in turn based the inferential relation, “namely, the measure of probability that should be assigned to the proposal as based upon assurance with respect to the truth of the supposal” [Johnson, 1932, p. 4]. A couple of examples, taken from Johnson’s exposition, will illustrate the point: “Let the proposal be that ‘The next throw of a certain coin will give heads’. Let the supposal be that ‘the next throw of the coin will give heads or tails’. Then the relation of probability in which the proposal stands to the supposal is determined by the relation of the predication ‘heads’ to the predication ‘heads or tails’. Or, to take another example, let the proposal be that ‘the next man we meet will be tall and red-haired’, and the supposal that ‘the next man we meet will be tall’. Then the relation of predication ‘tall and red-haired’ to the predication ‘tall’ will determine the probability to be assigned to the proposal as depending on the supposal. These two cases illustrate the way in which the logical conjunctions ‘or’ and ‘and’ enter into the calculus of probability”. [Johnson, 1932, p. 8]
Building on these concepts Johnson developed a theory of logical probability that is ultimately based on a relation of partial implication between propositions. This brings Johnson’s theory close to Carnap’s, with the fundamental difference that Carnap adopted a definition of the “content” of a proposition that relies on the more sophisticated tools of formal semantics. A major aspect of Johnson’s work on probability concerns the introduction of the so-called Permutation postulate, which corresponds to the property better known as exchangeability. This can be described by saying that exchangeable probability functions assign probability in a way that depends on the number of experienced cases, irrespective of the order in which they have been observed.
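The order-invariance just described can be checked on a small case. The following Python sketch is illustrative only and is not a reconstruction of Johnson’s argument. It assumes a hypothetical predictive rule, (n_i + 1)/(n + k) for k possible outcome types, where n_i counts earlier observations of type i among n observations, and it verifies that every reordering of the same observations receives the same probability.

```python
from fractions import Fraction
from itertools import permutations

def sequence_probability(sequence, categories):
    """Probability of observing `sequence` in this exact order, built up
    one observation at a time with the assumed rule (n_i + 1) / (n + k)."""
    counts = {c: 0 for c in categories}
    k = len(categories)
    p = Fraction(1)
    for n, outcome in enumerate(sequence):
        p *= Fraction(counts[outcome] + 1, n + k)
        counts[outcome] += 1
    return p

categories = ("H", "T")
observed = ("H", "H", "T", "H")  # assumed data: three heads, one tail

# All distinct orderings of the same observations.
orderings = set(permutations(observed))
probabilities = {o: sequence_probability(o, categories) for o in orderings}

print(probabilities)
print(len(set(probabilities.values())) == 1)
```

Each of the four orderings of three heads and one tail receives probability 1/20 under this rule, so only the counts matter.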
In other words, under exchangeability probability is invariant with respect to permutation of individuals. This property plays a crucial role within Carnap’s inductive logic (where it is named symmetry) and de Finetti’s subjective theory of probability, which will be examined in the second part of this chapter. As we shall see, Johnson’s discovery of this result left some traces in Ramsey’s work. Johnson’s accomplishment was explicitly acknowledged by the Bayesian statistician Irving John Good, whose monograph The Estimation of Probabilities. An Essay on Modern Bayesian Methods opens with the following words: “This monograph is dedicated to William Ernest Johnson, the teacher of John Maynard Keynes and Harold Jeffreys” [Good, 1965, p. v]. In that book Good makes extensive use of what he calls “Johnson’s sufficiency postulate”, a label that he later modified by replacing the term “sufficiency” with “sufficientness”. Sandy Zabell’s article “W.E. Johnson’s ‘Sufficientness’ Postulate” offers an accurate reconstruction of Johnson’s argument, giving a generalisation of it and calling attention to its relevance for Bayesian statistics.11
11 See Zabell [1982]. Also relevant are Zabell [1988] and [1989]. All three papers are reprinted in Zabell [2005].
By contrast, the insight of Johnson’s treatment of probability was not grasped by his contemporaries, and his contribution, including the exchangeability result, remained almost ignored. Charlie Dunbar Broad’s comment on Johnson’s “Appendix on Eduction” testifies to this attitude: “about the Appendix all I can do is, with the utmost respect to Mr Johnson, to parody Mr Hobbes’ remark about the treatises of Milton and Salmasius: ‘very good mathematics; I have rarely seen better. And very bad probability; I have rarely seen worse’” [Broad, 1924, p. 379].
1.4 John Maynard Keynes: a logicist with a human face
The economist John Maynard Keynes (1883–1946), one of the leading celebrities of the last century, embraced the logical view in his A Treatise on Probability (1921). Son of the logician John Neville Keynes, Maynard was educated at Eton and Cambridge, where he later became a scholar and member of King’s College. Besides playing a crucial role in public life as a political advisor, Keynes was an indefatigable supporter of the arts, as testified, among other things, by his contribution to the establishment of the Cambridge Arts Theatre. In “A Cunning Purchase: the Life and Work of Maynard Keynes” Roger Backhouse and Bradley Bateman observe that “Keynes’ role as an economic problem-solver and a patron of the arts would continue through his last decade, despite his poor health” [Backhouse and Bateman, 2006, p. 4].
In Cambridge, Keynes was a member of the “Apostles” discussion society (also known as “The Society”12) together with personalities of the calibre of Lytton Strachey, Leonard Woolf, Henry Sidgwick, John McTaggart, Alfred North Whitehead, Bertrand Russell, Frank Ramsey and, last but not least, George Edward Moore. The latter exercised a great influence on the group, as well as on the partly overlapping “Bloomsbury group”, of which Maynard was also part. It was in this atmosphere deeply imbued with philosophy that the young Keynes wrote his book on probability. In The Life of John Maynard Keynes Roy Forbes Harrod maintains that the Treatise was written in the years 1906-1911.13 Although by that time the book was all but completed, Keynes could not undertake its final revision until 1920, due to his political commitments.
12 See Levy [1979] and Harrod [1951] for more details on the Apostles Society.
13 See Harrod [1951]. On Keynes’ life see also Skidelsky [1983-1992].
When it finally appeared in print in 1921, the book was very well received, partly because of the fame that Keynes had by that time gained as an economist and political adviser, partly because it was the first systematic work on probability by an English writer after John Venn’s The Logic of Chance, whose first edition had been published fifty-five years earlier, namely in 1866. A review of the Treatise by Charlie Dunbar Broad opens with this passage: “Mr Keynes’ long awaited work on Probability is now published, and will at once take its place as the best treatise on the logical foundations of the subject” [Broad, 1922, p. 72], and closes as follows: “I can only conclude by congratulating Mr Keynes on finding time, amidst so many public duties, to complete this book, and the philosophical public on getting the best work on Probability that they are likely to see in this generation” [Broad, 1922, p. 85]. Referring to this statement, in an obituary of Keynes Richard Bevan Braithwaite observes that “Broad’s prophecy has proved correct” [Braithwaite, 1946, p. 284]. More evidence of the success attained by the Treatise is offered by Braithwaite in the portrait “Keynes as a Philosopher”, included in the collection Essays on John Maynard Keynes, edited by Maynard’s nephew Milo Keynes, where he maintains that “The Treatise was enthusiastically received by philosophers in the empiricist tradition. [...] The welcome given to Keynes’ book was largely due to the fact that his doctrine of probability filled an obvious gap in the empiricist theory of knowledge. Empiricists had divided knowledge into that which is ‘intuitive’ and that which is ‘derivative’ (to use Russell’s terms), and had regarded the latter as being based upon the former by virtue of there being logical relationships between them. Keynes extended the notion of logical relation to include probability relations, which enabled a similar account to be given of how intuitive knowledge could form the basis for rational belief which fell short of knowledge”. [Braithwaite, 1975, pp. 237-238]
Braithwaite’s remarks remind us that at the time when Keynes’ Treatise was published, empiricist philosophers, under the spell of works like Russell and Whitehead’s Principia Mathematica, paid more attention to the deductive aspects of knowledge than to probability. Nevertheless, one should not forget that, as we have seen, the logical approach to probability already counted a number of supporters in Great Britain.
Besides, in the same years a similar approach was embraced in Austria by Ludwig Wittgenstein and Friedrich Waismann.14 In the “Preface” to the Treatise Keynes acknowledges his debt to William Ernest Johnson, and more generally to the Cambridge philosophical setting, regarded as an ideal continuation of the great empiricist tradition “of Locke and Berkeley and Hume, of Mill and Sidgwick, who, in spite of their divergencies of doctrine, are united in a preference for what is matter of fact, and have conceived their subject as a branch rather of science than of creative imagination” [Keynes, 1921, pp. v-vi]. Keynes takes the theory of probability to be a branch of logic, more precisely that part of logic dealing with arguments that are not conclusive, but can be said to have a greater or lesser degree of inconclusiveness. In Keynes’ words: “Part of our knowledge we obtain direct; and part by argument. The Theory of Probability is concerned with that part which we obtain by argument, and treats of the different degrees in which the results so obtained are conclusive or inconclusive” [Keynes, 1921, p. 3]. Like the logic of conclusive arguments, the logic of probability investigates the general principles of inconclusive arguments. Both certainty and probability depend on the amount of knowledge that the premisses of
14 See Galavotti [2005], Chapter 6, for more on Wittgenstein and Waismann.
Maria Carla Galavotti
an argument convey to support the conclusion, the difference being that certainty obtains when the amount of available knowledge authorizes full belief, while in all other cases one obtains degrees of belief. Certainty is therefore seen as the limiting case of probability. While regarding probability as the expression of partial belief, or degree of belief, Keynes points out that it is an intrinsically relational notion, because it depends on the information available: “The terms certain and probable describe the various degrees of rational belief about a proposition which different amounts of knowledge authorize us to entertain. All propositions are true or false, but the knowledge we have of them depends on our circumstances; and while it is often convenient to speak of propositions as certain or probable, this expresses strictly a relationship in which they stand to a corpus of knowledge, actual or hypothetical, and not a characteristic of the propositions in themselves. A proposition is capable at the same time of varying degrees of this relationship, depending upon the knowledge to which it is related, so that it is without significance to call a proposition probable unless we specify the knowledge to which we are relating it”. [Keynes, 1921, pp. 3-4] Another passage states the same idea even more plainly: “No proposition is in itself either probable or improbable, just as no place can be intrinsically distant; and the probability of the same statement varies with the evidence presented, which is, as it were, its origin of reference” [Keynes, 1921, p. 7]. The corpus of knowledge on which probability assessments are based is described by a set of propositions that constitute the premisses of an argument, standing in a logical relationship with the conclusion, which describes a hypothesis. 
Probability resides in this logical relationship, and its value is determined by the information conveyed by the premisses of the arguments involved: “As our knowledge or our hypothesis changes, our conclusions have new probabilities, not in themselves, but relatively to these new premisses. New logical relations have now become important, namely those between the conclusions which we are investigating and our new assumptions; but the old relations between the conclusions and the former assumptions still exist and are just as real as these new ones” [Keynes, 1921, p. 7]. On this basis, Keynes developed a theory of comparative probability, in which conditional probabilities are ordered in terms of a relation of “more” or “less” probable, and are combined into compound probabilities. Like Boole, Keynes aimed to develop a theory of the reasonableness of degrees of belief on logical grounds. Within his perspective the logical character of probability goes hand in hand with its rational character. This element is pointed out by Keynes, who maintains that the theory of probability as a logical relation
“is concerned with the degree of belief which it is rational to entertain in given conditions, and not merely with the actual beliefs of particular individuals, which may or may not be rational” [Keynes, 1921, p. 4]. In other words, Keynes’ logical interpretation gives the theory of probability a normative value: “we assert that we ought on the evidence to prefer such and such a belief. We claim rational grounds for assertions which are not conclusively demonstrated” [Keynes, 1921, p. 5]. The kernel of the logical interpretation of probability lies precisely in the idea, stated by Keynes with great clarity, that in the light of the same amount of information the logical relation representing probability is the same for anyone. So conceived, probability is objective, its objectivity being warranted by its logical character: “What particular propositions we select as the premisses of our argument naturally depends on subjective factors peculiar to ourselves; but the relations, in which other propositions stand to these, and which entitle us to probable beliefs, are objective and logical”. [Keynes, 1921, p. 4] It is precisely because the logical relations between the premisses and the conclusion of inconclusive arguments provide objective grounds for belief that belief based on them can qualify as rational. As to the character of the logical relations themselves, Keynes says that “we cannot analyse the probability-relation in terms of simpler ideas” [Keynes, 1921, p. 8]. They are therefore taken as primitive, and their justification is left to our intuition. Keynes’ conception of the objectivity of probability relations and his use of intuition in that connection have been ascribed by a number of authors to Moore’s influence. Commenting on Keynes’ claim that “what is probable or improbable” in the light of a given amount of information is “fixed objectively, and is independent of our opinion” [Keynes, 1921, p.
4], Donald Gillies observes that when “Keynes speaks of probabilities as being fixed objectively [...] he means objective in the Platonic sense, referring to something in a supposed Platonic world of abstract ideas”, and adds that “we can see here clearly the influence of G. E. Moore. [...] In fact, there is a very notable similarity between the Platonic world as postulated by Cambridge philosophers in the Edwardian era and the Platonic world as originally described by Plato. Plato’s world of objective ideas contained the ethical qualities with the idea of the Good holding the principal place, but it also contained mathematical objects. The Cambridge philosophers thought that they had reduced mathematics to logic. So their Platonic world contained, as well as ethical qualities such as ‘good’, logical relations”. [Gillies, 2000, p. 33] The attitude just described is responsible for a most controversial feature of Keynes’ theory, namely his tenet that probability relations are not always measurable, nor comparable. He writes:
“By saying that not all probabilities are measurable, I mean that it is not possible to say of every pair of conclusions, about which we have some knowledge, that the degree of our rational belief in one bears any numerical relation to the degree of our rational belief in the other; and by saying that not all probabilities are comparable in respect of more and less, I mean that it is not always possible to say that the degree of our rational belief in one conclusion is either equal to, greater than, or less than the degree of our belief in another”. [Keynes, 1921, p. 34] In other words, Keynes admits of some probability relations which are intractable by the calculus of probabilities. Far from being worrying to him, this aspect testifies to the high value attached by Keynes to intuition. On the same basis, Keynes is suspicious of a purely formal treatment of probability, and of the adoption of mechanical rules for the evaluation of probability. Keynes believes that measurement of probability rests on the equidistribution of priors: “In order that numerical measurement may be possible, we must be given a number of equally probable alternatives” [Keynes, 1921, p. 41]. This admission notwithstanding, Keynes sharply criticizes Laplace’s principle of insufficient reason, which he prefers to call “Principle of Indifference” to stress the role of individual judgment in the ascription of equal probability to all possible alternatives “if there is an absence of positive ground for assigning unequal ones” [Keynes, 1921, p. 42]. To Laplace he objects that “the rule that there must be no ground for preferring one alternative to another, involves, if it is to be a guiding rule at all, and not a petitio principii, an appeal to judgments of irrelevance” [Keynes, 1921, pp. 54-55]. The judgment of indifference among various alternatives has to be substantiated with the assumption that there could be no further information, on account of which one might change such judgment itself. 
While in the case of games of chance this kind of assumption can be made without problems, most situations encountered in everyday life are characterized by a complexity that makes it arbitrary. For Keynes, the extension of the principle of insufficient reason to cover all possible applications is the expression of a superficial way of addressing probability, regarded as a product of ignorance rather than knowledge. By contrast, Keynes maintains that the judgment of indifference among available alternatives should not be grounded on ignorance, but rather on knowledge, and recommends that the application of the principle in question always be preceded by an act of discrimination between relevant and irrelevant elements of the available information, and by the decision to neglect certain pieces of evidence. A most interesting aspect of Keynes’ treatment is his discussion of the paradoxes raised by Laplace’s principle. As observed by Gillies: “It is greatly to Keynes’ credit that, although he advocates the Principle of Indifference, he gives the best statement in the literature [Keynes, 1921, Chapter 4] of the paradoxes to which it gives rise” [Gillies, 2000, p. 37]. The reader is referred to Chapter 3 of Gillies’ Philosophical Theories of Probability for a critical account of Keynes’ treatment of the matter. Keynes’ distrust of the practice of unrestrictedly applying principles holding
within a restricted domain regards not only the Principle of Indifference, but extends to the “Principle of Induction”, taken as the method of establishing empirical knowledge from a multitude of observed cases. Keynes is suspicious of the inference of general principles on an inductive basis, including causal laws and the principle of uniformity of nature. He distinguishes two kinds of generalizations “arising out of empirical argument”. First, there are universal generalizations corresponding to “universal induction”, of which he says that “although such inductions are themselves susceptible of any degree of probability, they affirm invariable relations”. Second, there are those generalizations which assert probable connections, which correspond to “inductive correlation” [Keynes, 1921, p. 220]. Both types are discussed at length, the first in Part III of the Treatise and the second in Part V. Keynes stresses the importance of the connection between probability and induction, a relationship that was clearly seen by Thomas Bayes and Richard Price in the eighteenth century, but was underrated by subsequent literature. After mentioning Jevons, and also “Laplace and his followers”, as representatives of the tendency to use probability to address inductive problems, Keynes adds: “But it has been seldom apprehended clearly, either by these writers or by others, that the validity of every induction, strictly interpreted, depends, not on a matter of fact, but on the existence of a relation of probability. An inductive argument affirms, not that a certain matter of fact is so, but that relative to certain evidence there is a probability in its favour”. [Keynes, 1921, p. 221] In other words, probability is not reducible to an empirical matter: “The validity and reasonable nature of inductive generalisation is [...] a question of logic and not of experience, of formal and not of material laws. 
The actual constitution of the phenomenal universe determines the character of our evidence; but it cannot determine what conclusions given evidence rationally supports”. [Keynes, 1921, p. 221] Granted that induction has to be based on probability, the objectivity and rationality of probabilistic reasoning rests on the logical character of probability taken as the relation between a proposition expressing a given body of evidence and a proposition expressing a given hypothesis. On the same basis, Keynes criticizes inferential methods entirely grounded on repeated observations, like the calculation of frequencies. Against this attitude, he claims that the similarities and dissimilarities among events must be carefully considered before quantitative methods can be applied. In this connection, a crucial role is played by analogy, which becomes a prerequisite of statistical inductive methods based on frequencies. In Keynes’ words: “To argue from the mere fact that a given event has occurred invariably in a thousand instances under observation, without any analysis of the circumstances accompanying the individual instances, that it is likely
to occur invariably in future instances, is a feeble inductive argument, because it takes no account of the Analogy”. [Keynes, 1921, p. 407] The insistence upon analogy is a central feature of the perspective taken by Keynes, who devotes Part III of the Treatise to “Induction and analogy”. In an attempt to provide a logical foundation for analogy, Keynes finds it necessary to assume that the variety encountered in the world has to be of a limited kind: “As a logical foundation for Analogy, [...] we seem to need some such assumption as that the amount of variety in the universe is limited in such a way that there is no one object so complex that its qualities fall into an infinite number of independent groups (i.e., groups which might exist independently as well as in conjunction); or rather that none of the objects about which we generalise are as complex as this; or at least that, though some objects may be infinitely complex, we sometimes have a finite probability that an object about which we seek to generalise is not infinitely complex”. [Keynes, 1921, p. 258] This assumption confers a finitistic character to Keynes’ approach, criticized, among others, by Rudolf Carnap.15 The principle of limited variety is attacked on a different basis by Ramsey. The topic is addressed in two notes included in the collection Notes on Philosophy, Probability and Mathematics, namely “On the Hypothesis of Limited Variety”, and “Induction: Keynes and Wittgenstein”, where Ramsey claims to see “no logical reason for believing any such hypotheses; they are not the sort of things of which we could be supposed to have a priori knowledge, for they are complicated generalizations about the world which evidently may not be true” [Ramsey, 1991a, p. 297]. Another important ingredient of Keynes’ theory is the notion of weight of arguments. Like probability, the weight of inductive arguments varies according to the amount of evidence. 
But while probability is affected by the proportion between favourable and unfavourable evidence, weight increases as relevant evidence, taken as the sum of positive and negative observations, increases. In Keynes’ words: “As the relevant evidence at our disposal increases, the magnitude of the probability of the argument may either decrease or increase, according as the new knowledge strengthens the unfavourable or the favourable evidence; but something seems to have increased in either case – we have a more substantial basis upon which to rest our conclusion. I express this by saying that an accession of new evidence increases the weight of an argument. New evidence will sometimes decrease the probability of an argument, but it will always increase its ‘weight’”. [Keynes, 1921, p. 71] The concept of weight mingles with that of relevance, because to say that a piece of evidence is relevant is the same as saying that it increases the weight of an
15 See Carnap [1950], § 62.
argument. Therefore, Keynes’ stress on weight backs the importance of the notion of relevance within his theory of probability. Keynes addresses the issue of whether the weight of arguments should be made to bear upon action choice. As he puts it: “the question comes to this – if two probabilities are equal in degree, ought we, in choosing our course of action, to prefer that one which is based on a greater body of knowledge?” [Keynes, 1921, p. 313]. This issue, he claims, has been neglected by the literature on action choice, essentially based on the notion of mathematical expectation. However, Keynes admits to finding the question “highly perplexing”, adding that “it is difficult to say much that is useful about it” [Keynes, 1921, p. 313]. The discussion of these topics leads to a sceptical conclusion, reflecting Keynes’ distrust of a strictly mathematical treatment of the matter, motivated by the desire to leave room for individual judgment and intuition. He maintains that: “The hope, which sustained many investigators in the course of the nineteenth century, of gradually bringing the moral sciences under the sway of mathematical reasoning, steadily recedes – if we mean, as they meant, by mathematics the introduction of precise numerical methods. The old assumptions, that all quantity is numerical and that all quantitative characteristics are additive, can be no longer sustained. Mathematical reasoning now appears as an aid in its symbolic rather than in its numerical character”. [Keynes, 1921, p. 316] Keynes’ notion of weight is the object of a vast literature. Some authors think that such a notion is at odds with the logicist notion of probability put forward by Keynes. For instance, in “Keynes’ Theory of Probability and its Relevance to his Economics” Allin Cottrell argues that “the perplexities surrounding ‘weight’... are important as the symptom of an internal difficulty in the notion of probability Keynes wishes to promote” [Cottrell, 1993, p. 35].
More precisely, Cottrell believes that the idea that some probability judgments are more reliable than others by virtue of being grounded on a larger weight requires that probabilities of probabilities be admitted, while Keynes does not contemplate them. Cottrell thinks that the frequency notion of probability could do the job. As a matter of fact, this cluster of problems has been extensively dealt with by a number of authors operating under the label of “Bayesianism”, mostly of subjective orientation. Keynes’ views on the objectivity of probability relations involve the tenet that the validity of inductive arguments cannot be made to depend on their success, and is not undermined by the fact that some events which have been predicted do not actually take place. Induction allows us to say that on the basis of a certain piece of evidence a certain conclusion is reasonable, not that it is true. Awareness of this fact should inspire caution towards inductive predictions, and Keynes warns against the danger of making predictions obtained by detaching the conclusion of an inductive argument. This is a typical aspect of the logical interpretation of probability, one that has been at the centre of a vast debate, in which Rudolf Carnap
also took part.16 The refusal of any attempt to ground probabilistic inference on success, that goes along with Keynes’ insistence on the logical and non-empirical character of probability relations, is stressed by Anna Carabelli, who writes that “Keynes was [...] critical of the positivist a posteriori criterion of the validity of induction, by which the inductive generalization was valid as far as the prevision based on it will prove successful, that is, will be confirmed by subsequent facts. [...] On the contrary, the validity of inductive method, according to Keynes, did not depend on the success of its prediction, or on its empirical confirmation”. [Carabelli, 1988, p. 66] However, Carabelli adds that “notably, that was what made the difference between Keynes’ position and that of those later logico-empiricists, like R. Carnap, who analysed induction from what he called the ‘confirmation theory’ point of view” [Carabelli, 1988, p. 66]. This claim is misleading, because Carnap’s confirmation theory is not so closely linked to the criterion of success as Carabelli claims. In his late writings Carnap appealed to “inductive intuition” to justify induction, thereby embracing a position not so distant from that of Keynes.17 But it should be added that Keynes assigns to intuition a much more substantial role than Carnap. A further difference between these two authors amounts to their different attitude towards formalization. Keynes, as we have seen, distrusted the pervasive use of mathematics and formal methods, whereas Carnap embraced a strictly formal approach. Carnap’s production on probability, which culminated with the publication in 1950 of Logical Foundations of Probability and occupied the last twenty years of his life, until he died in 1970, is no exception. 
Although his perspective underwent significant changes, Carnap never abandoned the programme of developing an inductive logic aimed at providing a rational reconstruction of probability within a formalized logical system. As described by Richard Jeffrey’s colourful expression, Carnap “died with his logical boots on, at work on the project” [Jeffrey, 1991, p. 259]. In this enterprise, Carnap was inspired by an unwavering faith in the powers of formal logic on the one side, and of experience on the other, in compliance with the logical empiricist creed. By contrast, Keynes embraced a moderate version of logicism, a logicism “with a human face”, imbued with a deeply felt need not to lose sight of ordinary speech and practice, and to assign an essential role to intuition and individual judgment. To conclude this presentation of Keynes’ views on probability, it is worth mentioning the long debated issue of Ramsey’s criticism and Keynes’ reaction to it. Soon after the publication of the Treatise, Ramsey published a critical review in The Cambridge Magazine challenging some of the central issues in the Treatise, like the conviction that there are unknown probabilities, the principle of limited
16 See, for instance, Kyburg [1968] and the discussion following it, with comments by Y. Bar-Hillel, P. Suppes, K.R. Popper, W.C. Salmon, J. Hintikka, R. Carnap, H. Kyburg jr.
17 See Carnap [1968].
variety, and the very idea that probability is a logical relation.18 As will be argued in more detail in what follows, Ramsey is very critical of this point, which also reappears in other writings. For instance, in “Truth and Probability” he objects to Keynes that “there really do not seem to be any such things as the probability relations he describes” [Ramsey, 1990a, p. 57], and in another note called “Criticism of Keynes” he maintains that: “there are no such things as these relations” [Ramsey, 1991a, p. 273]. After Ramsey’s premature death in 1930, Keynes wrote an obituary containing an explicit concession to Ramsey’s criticism. There he writes: “Ramsey argues, as against the view which I had put forward, that probability is concerned not with objective relations between propositions but (in some sense) with degrees of belief, and he succeeds in showing that the calculus of probabilities simply amounts to a set of rules for ensuring that the system of degrees of belief which we hold shall be a consistent system. Thus the calculus of probabilities belongs to formal logic. But the basis of our degrees of belief — or the a priori probabilities, as they used to be called — is part of our human outfit, perhaps given us merely by natural selection, analogous to our perceptions and our memories rather than to formal logic. So far I yield to Ramsey — I think he is right”. [Keynes, 1930, 1972, p. 339] Before adding some comments, it is worth recalling how the above quoted passage continues: “But in attempting to distinguish ‘rational’ degrees of belief from belief in general he [Ramsey] was not yet, I think, quite successful. It is not getting to the bottom of the principle of induction merely to say that it is a useful mental habit”. [Keynes, 1930, 1972, p. 339] As one can see, some ten years after publication of the Treatise, Keynes was still concerned with drawing a sharp boundary between rational belief and actual belief. 
Undeniably, such an attitude aligns him with logicism, as opposed to subjectivism. The literature is divided among those who believe that after Ramsey’s criticism Keynes changed his attitude towards probability, and those who are instead convinced that Keynes never changed his mind in a substantial way. Among others, Anna Carabelli believes that “Keynes did not change substantially his view on probability” [Carabelli, 1988, p. 255]. By contrast, Bradley Bateman in “Keynes’ Changing Conception of Probability” holds that the views on probability retained in the Treatise “underwent at least two significant changes in subsequent years. Keynes first advocated an objective epistemic theory of probability, but later advocated both subjective epistemic and objective aleatory theories of probability” [Bateman, 1987, p. 113]. A still different viewpoint is taken by Donald Gillies,
18 See Ramsey [1922].
who agrees with Bateman that Keynes changed his conception of probability as a consequence of Ramsey’s criticism, but disagrees as to the nature of such change. Gillies argues that “Keynes did realize, in the light of Ramsey’s criticism, that his earlier views on probability needed to be changed, and he may well have had some rough ideas about how this should be done, but he never settled down to work out a new interpretation of probability in detail. What we have to do therefore is not so much try to reconstruct, on the basis of rather fragmentary evidence, Keynes’ exact views on probability in the 1930s. I don’t believe that Keynes had very exact views on probability at that time. I suggest therefore that we should switch to trying to develop an interpretation of probability that fits the economic theory that Keynes presented in 1936 and 1937, but without necessarily claiming that his theory was precisely what Keynes himself had in mind”. [Gillies, 2006, p. 210] Keynes’ works to which Gillies refers are the well known book The General Theory of Employment, Interest and Money published in 1936, and the article “The General Theory of Employment” published in 1937. According to Gillies, Keynes accepted Ramsey’s criticisms to some extent, and moved to a theory of probability that Gillies labels “intersubjective” and describes as intermediate between logicism and subjectivism. Its distinctive feature is that of ascribing degrees of belief not to single individuals, as subjectivists do, but rather to groups. Gillies presents the theory as an extension of the subjective viewpoint, by demonstrating a Dutch Book Theorem holding for groups, which shows the following: “Let B be some social group. Then it is in the interest of B as a whole if its members agree, perhaps as a result of rational discussion, on a common betting quotient rather than each member of the group choosing his or her own betting quotient.
If a group does in fact agree on a common betting quotient, this will be called the intersubjective or consensus probability of the social group”. [Gillies, 2006, p. 212]19 Gillies argues that this interpretation “fits perfectly with Keynes’ theory of long-term expectation developed in his 1936 and 1937 publications” [Gillies, 2006, p. 212]. The issue of Keynes’ reaction to Ramsey’s criticisms and the relationship between his conception of probability and his views on economic theory is the object of ongoing debate.
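The reasoning behind a Dutch Book argument for groups can be illustrated with a small numerical sketch. It is not taken from Gillies’ text: it assumes the usual betting set-up in which a member with betting quotient q on an event E gains S(1 − q) if E occurs and −Sq otherwise, where the stake S, positive or negative, is chosen by an opponent. The function names `group_payoff` and `dutch_book_stakes` are hypothetical and introduced only for illustration.

```python
def group_payoff(quotients, stakes, event_occurs):
    """Net gain of the whole group on event E.

    Member i, with betting quotient quotients[i], gains
    stakes[i] * (1_E - quotients[i]); the opponent picks the stakes.
    """
    indicator = 1.0 if event_occurs else 0.0
    return sum(s * (indicator - q) for q, s in zip(quotients, stakes))


def dutch_book_stakes(quotients):
    """Stakes an opponent could choose against members who disagree.

    Illustrative assumption: unit stakes placed only on the members
    with the lowest and the highest betting quotient.
    """
    low = min(range(len(quotients)), key=quotients.__getitem__)
    high = max(range(len(quotients)), key=quotients.__getitem__)
    stakes = [0.0] * len(quotients)
    if quotients[low] < quotients[high]:
        stakes[low] = -1.0   # opponent bets on E at the low quotient
        stakes[high] = 1.0   # opponent bets against E at the high quotient
    return stakes


if __name__ == "__main__":
    quotients = [0.3, 0.6]            # two members who disagree about E
    stakes = dutch_book_stakes(quotients)
    for outcome in (True, False):
        print(outcome, group_payoff(quotients, stakes, outcome))
```

Under these assumptions the group’s net gain equals the lower quotient minus the higher one whether or not E occurs, so disagreement exposes the group as a whole to a sure loss. If all members share one quotient q, the net gain is (sum of stakes) × (1_E − q), which cannot be negative in both outcomes.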
1.5 Harold Jeffreys between logicism and subjectivism
Professor of astronomy and experimental philosophy at Cambridge University, Harold Jeffreys (1891–1989) is reputedly one of the last century’s most prominent
19 See also Gillies [2000], Chapter 8, for more on the intersubjective theory of probability.
geophysicists and a pioneer of the study of the Earth. As described by Alan Cook in a memoir of Jeffreys written for the Royal Society, “the major spherically symmetrical elements of the structure of the Earth that he [Jeffreys] did so much to elucidate, are the basis for all subsequent elaboration, and generations of students learnt their geophysics from his book The Earth” [Cook, 1990, p. 303]. Jeffreys’ work also left a mark in other fields, like seismology and meteorology, and, last but not least, probability.20 His interest in probability and scientific method led to the publication of the book Scientific Inference in 1931, followed in 1939 by Theory of Probability. In addition, he published a number of articles on the topic. Jeffreys was a wholehearted inductivist who used to say that Bayes’ theorem “is to the theory of probability what Pythagoras’ theorem is to geometry” [Jeffreys, 1931, p. 7]. He was led to embrace Bayesianism by his own work in geophysics, where he only had access to scarce data, and needed a method for assessing hypotheses regarding unknown situations, like the composition of the Earth. As a practising scientist he was faced with problems of inverse probability, having to explain experimental data by means of different hypotheses, or to evaluate general hypotheses in the light of changing data. This made it natural for Jeffreys to adopt both an epistemic notion of probability and Bayesian methodology, although at the time he started working on this kind of problem the Bayesian method was in disgrace among scientists and statisticians, for the most part supporters of frequentism. But as David Howie observed, “restricted to repeated sampling from a well-behaved population, and largely reserved for data reduction” frequentism “could apply neither to the diverse pool of data Jeffreys drew upon nor directly to the sorts of questions he was attempting to address” [Howie, 2002, p. 127].
Jeffreys’ refusal to embrace frequentism is responsible for the fact that his contribution to probability was not fully appreciated by his contemporaries, and he engaged in debates with various authors, including the physicist Norman Campbell and the statistician Ronald Fisher.21 Against frequentism, Jeffreys holds that “no ‘objective’ definition of probability in terms of actual or possible observations, or possible properties of the world, is admissible” [Jeffreys, 1939, 1961, p. 11].22 As will be argued in what follows, this relationship is actually reversed within Jeffreys’ epistemology, where probability comes before the notions of objectivity, reality and the external world [Jeffreys, 1936a, p. 325]. Jeffreys started working on probability together with Dorothy Wrinch, a mathematician and scientist, at the time a fellow of Girton College, Cambridge, who had approached epistemological questions under the influence of Johnson and Russell. In three papers written between 1919 and 1923,23 Jeffreys and Wrinch draw the lines of an inductivist programme that Jeffreys kept firm throughout his long life, and put at the core of a genuinely probabilistic epistemology revolving around the
20 For a scientific portrait of Harold Jeffreys centred on probability and statistics see Lindley [1991].
21 See Howie [2002] for a detailed reconstruction of the genesis of Jeffreys’ Bayesianism and the polemics he entertained with Fisher.
22 In connection with Jeffreys’ criticism of frequentism see also Jeffreys [1933] and [1934].
23 See Jeffreys and Wrinch [1919], [1921] and [1923].
Maria Carla Galavotti
idea that probability is “the most fundamental and general guiding principle of the whole of science” [Jeffreys, 1931, p. 7]. Jeffreys and Wrinch made the assumption that all quantitative laws form an enumerable set, and that their probabilities form a convergent series. This assumption allows for the assignment of significant prior probabilities to general hypotheses. In addition, Jeffreys and Wrinch formulated a simplicity postulate, according to which simpler laws are assigned a greater prior probability.24 According to its proponents, this principle corresponds to the practice of testing possible laws in order of decreasing simplicity. This machinery allows for the adoption of the Bayesian method. Jeffreys’ inductivism is grounded in an epistemic view of probability that shares the main features of logicism, but in certain respects comes closer to subjectivism. According to Jeffreys, probability “expresses a relation between a proposition and a set of data” [Jeffreys, 1931, p. 9]. Probability is deemed “a purely epistemological notion” [Jeffreys, 1955, p. 283], corresponding to the reasonable degree of belief that is warranted by a certain body of evidence, by which it is uniquely determined. Given a set of data, Jeffreys claims, “a proposition q has in relation to these data one and only one probability. If any person assigns a different probability, he is simply wrong” [Jeffreys, 1931, p. 10]. The conviction that there exist “unique reasonable degrees of belief” [Jeffreys, 1939, p. 36] puts him in line with logicism, while marking a crucial divergence from subjectivism, a divergence described by Bruno de Finetti as that between “necessarists” who affirm and subjectivists who deny “that there are logical grounds for picking out one single evaluation of probability as being objectively special and ‘correct’” [de Finetti, 1970, English edition 1975, vol. 2, p. 40]. For Jeffreys, the need to define probability objectively is imposed by science itself.
He aimed to define probability in a “pure” way suited for scientific applications. This led Jeffreys to criticize the subjective interpretation of probability put forward by Frank Ramsey, with whom he consorted and shared various interests but apparently never discussed probability.25 In Jeffreys’ eyes subjectivism is a “theory of expectation” rather than one of “pure probability” [Jeffreys, 1936a, p. 326]. For a scientist like Jeffreys subjective probability is a theory “for business men”. This is not meant as an expression of contempt, for “we have habitually to decide on the best course of action in given circumstances, in other words to compare the expectations of the benefits that may arise from different actions; hence a theory of expectation is possibly more needed than one of pure probability” [Jeffreys, 1939, 1961, p. 326]. But what science requires is a notion of “pure probability”, not the subjective notion in terms of preferences based on expectations.

24 For a discussion of the simplicity postulate see Howson [1988].
25 According to Howie and Lindley, Jeffreys found out about Ramsey’s work on probability only after Ramsey’s death in 1930; see Howie [2002], p. 117 and Lindley [1991], p. 13. However, Howie provides evidence that both Ramsey and Jeffreys took part in a group discussing psychoanalysis, whose activity is described in Cameron and Forrester [2000]. Strangely enough, during those meetings they did not discuss probability.
The Modern Epistemic Interpretations of Probability: Logicism and Subjectivism
In order to define probability in a “pure” way, Jeffreys grounds it on a principle, stated by way of an axiom, which says that probabilities are comparable: “given p, q is either more, equally, or less probable than r, and no two of these alternatives can be true” [Jeffreys, 1939, 1961, p. 16]. He then shows that the fundamental properties of probability functions follow from this assumption. By so doing, Jeffreys qualifies as one of the first to establish the rules of probability from basic presuppositions. Although admitting an affinity with Keynes’ perspective, Jeffreys is careful to keep his own position separate from that of Keynes. In the “Preface” to the second edition of his Theory of Probability, Jeffreys complains of having been labelled a “follower” of Keynes, and draws attention to the fact that Keynes’ Treatise on Probability appeared after he and Dorothy Wrinch had published their first contributions to the theory of probability, drawing the lines of an epistemic approach akin to Keynes’ logicism. He also points out that the resemblance between his own theory and that of Keynes depends on the fact that both attended the lectures of William Ernest Johnson [Jeffreys, 1939, 1961, p. v], thereby bringing more evidence of Johnson’s influence on his contemporaries. A major disagreement with Keynes concerns Keynes’ refusal “to admit that all probabilities are expressible by numbers” [Jeffreys, 1931, p. 223].26 In that connection, Jeffreys’ viewpoint coincides with subjectivism. A most interesting aspect of Jeffreys’ thought is his development of an original epistemology, which is deeply probabilistic in character.27 This is rooted in a phenomenalistic view of knowledge of the kind upheld by Ernst Mach and Karl Pearson. However, for Jeffreys “the pure phenomenalistic attitude is not adequate for scientific needs. It requires development, and in some cases modification, before it can deal with the problems of inference” [Jeffreys, 1931, p. 225].
The crucial innovation to be made with respect to Mach’s phenomenalism amounts to the introduction of probability or, to be more precise, probabilistic inference. Jeffreys’ epistemology is constructivist, in the sense that such crucial ingredients of scientific knowledge as the notions of “empirical law”, “objectivity”, “reality”, and “causality” are established by inference from experience. This is made possible by statistical methodology, seen as the fundamental tool of science. Concerning objectivity, in the “Addenda” to the 1937 edition of Scientific Inference, Jeffreys writes that “the introduction of the word ‘objective’ at the outset seems [...] a fundamental confusion. The whole problem of scientific method is to find out what is objective” [Jeffreys, 1931, 1973, p. 255]. The same idea is expressed in Theory of Probability, where he states: “I should query whether any meaning can be attached to ‘objective’ without a previous analysis of the process of finding out what is objective” [Jeffreys, 1939, p. 336]. Such a process is inductive and probabilistic; it originates in our sensations and proceeds step by step to the construction of abstract notions lying beyond phenomena. Such notions cannot be described in terms of observables, but are nonetheless admissible and useful, because they permit “co-ordination of a large number of sensations that cannot be achieved so compactly in any other way” [Jeffreys, 1931, 1973, p. 190].

26 Additional points of disagreement between Jeffreys and Keynes are described in Jeffreys [1922].
27 This is described in more detail in Galavotti [2003].

In this way empirical laws, or “objective statements”, are established. To this end, an inductive passage is needed, for it is only after the rules of induction “have compared it with experience and attached a high probability to it as a result of that comparison” that a general proposition can become a law. In this procedure lies “the only scientifically useful meaning of ‘objectivity’” [Jeffreys, 1939, p. 336]. Similar considerations apply to the notion of reality. According to Jeffreys, a useful notion of reality obtains when some scientific hypotheses receive from the data a probability so high that on their basis one can draw inferences whose probabilities are practically the same as if the hypotheses in question were certain. Hypotheses of this kind are taken as certain in the sense that all their parameters “acquire a permanent status”. In such cases, we can assert the associations expressed by the hypotheses in question “as an approximate rule”. Jeffreys takes a similarly empirical and constructivist view of causality. His proposal is to substitute the general formulation of the “principle of causality” with “causal analysis”, as performed within statistical methodology. This starts by treating all the variation observed in a given phenomenon as random, and proceeds to detect correlations which allow for predictions and descriptions that are the more precise, the better their agreement with observations. This procedure leads to asserting laws, which are eventually accepted because “the agreement (with observations) is too good to be accidental” [Jeffreys, 1937, p. 62].
Within scientific practice, the principle of causality is “inverted”: “instead of saying that every event has a cause, we recognize that observations vary and regard scientific method as a procedure for analysing the variation” [Jeffreys, 1931, 1957, p. 78]. The deterministic version of the principle of causality is thereby discarded, for “it expresses a wish for exactness, which is always frustrated, and nothing more” [Jeffreys, 1937, pp. 63-64]. Jeffreys’ position regarding scientific laws, reality and causality reveals the same pragmatical attitude underpinning Ramsey’s views on general propositions and causality, the main difference being that Ramsey’s approach is more strictly analytic, whereas Jeffreys grounds his arguments on probabilistic inference and statistical methodology alone. Furthermore, Jeffreys and Ramsey share the conviction that within an epistemic interpretation of probability there is room for notions like chance and physical probability. Jeffreys regards the notion of chance as the “limiting case” of everyday probability assignments. Chance occurs in those situations in which “given certain parameters, the probability of an event is the same at every trial, no matter what may have happened at previous trials” [Jeffreys, 1931, 1957, p. 46]. For instance, chance “will apply to the throw of a coin or a die that we previously know to be unbiased, but not if we are throwing it with the object of determining the degree of bias. It will apply to measurements when we know the true value and the law of error already. [...] It is not numerically assessable except when we know so much about the system already that we need to know no more”
[Jeffreys, 1936a, p. 329]. Jeffreys also contemplates the possibility of extending the realm of epistemic probability to a robust notion of “physical probability” of the kind encountered in quantum mechanics. He calls attention to those fields where “some scientific laws may contain an element of probability that is intrinsic to the system and has nothing to do with our knowledge of it” [Jeffreys, 1955, p. 284]. This is the case with quantum mechanics, whose account of phenomena is irreducibly probabilistic. Unlike the probability (chance) that a fair coin falls heads, intrinsic probabilities do not belong to our description of phenomena, but to the theory itself. Jeffreys claims to be “inclined to think that there may be such a thing as intrinsic probability. [...] Whether there is or not”, he adds, “it can be discussed in the language of epistemological probability” [Jeffreys, 1955, p. 284]. We will find similar ideas expressed by Ramsey. The pragmatical attitude that characterizes Jeffreys’ epistemology brings him close to subjectivism, and so does his conviction that science is fallible, together with his admission that empirical information can be “vague and half-forgotten”, a fact that “has possibly led to more trouble than has received explicit mention” [Jeffreys, 1931, 1973, p. 406]. These features of his perspective are somewhat at odds with his definition of probability as a degree of rational belief uniquely determined by experience, and with the idea that the evaluation of probability is an objective procedure, whose application to experimental evidence obeys rules having the status of logical principles.

2 THE SUBJECTIVE INTERPRETATION OF PROBABILITY

Modern subjectivism, sometimes also called “personalism”, shares with logicism the conviction that probability is an epistemic notion.
As already pointed out, the crucial point of disagreement between the two interpretations comes in connection with the fact that unlike logicists, subjectivists do not believe that probability evaluations are univocally determined by a given body of evidence.
2.1 The starters
William Fishburn Donkin (1814–1869), professor of astronomy at Oxford, fostered a subjective interpretation of probability in “On Certain Questions Relating to the Theory of Probabilities”, published in 1851. There he writes that “the ‘probability’ which is estimated numerically means merely ‘quantity of belief’, and is nothing inherent in the hypothesis to which it refers” [Donkin, 1851, p. 355]. This claim impressed Frank Ramsey, who recorded it in his notes.28 Donkin’s position is actually quite similar to that of De Morgan, especially when he maintains that probability is “relative to a particular state of knowledge or ignorance; but [...] it is absolute in the sense of not being relative to any individual mind; since, the same information being presupposed, all minds ought to distribute their belief in the same way” [Donkin, 1851, p. 355].

28 See document 003-13-01 of the Ramsey Collection, held at the Hillman Library, University of Pittsburgh.

If in view of claims of this kind Donkin qualifies more as a logicist than as a subjectivist, the appearance of his name in the present section on subjectivism is justified by the fact that he addressed the issue of belief conditioning in a way that anticipated the work of Richard Jeffrey a century later. Donkin formulated a principle imposing a symmetry restriction on the updating of belief as new information is obtained. In a nutshell, the principle states that changing opinion on the probabilities assigned to a set of hypotheses, after new information has been acquired, has to preserve the proportionality among the probabilities assigned to the considered options. Under this condition, the new and old opinions are comparable. The principle is introduced by Donkin as follows: “Theorem. If there be any number of mutually exclusive hypotheses, $h_1, h_2, h_3, \ldots$, of which the probabilities relative to a particular state of information are $p_1, p_2, p_3, \ldots$, and if new information be gained which changes the probabilities of some of them, suppose of $h_{m+1}$ and all that follow, without having otherwise any reference to the rest, then the probabilities of these latter have the same ratios to one another, after the new information, that they had before; that is, $p'_1 : p'_2 : p'_3 : \ldots : p'_m = p_1 : p_2 : p_3 : \ldots : p_m$, where the accented letters denote the values after the new information has been acquired”. [Donkin, 1851, p. 356] The method of conditioning known as Jeffrey conditionalization reflects precisely the intuition behind Donkin’s principle.29

The French mathematician Émile Borel (1871–1956), who made outstanding contributions to the study of the mathematical properties of probability, can be considered a pioneer of the subjective interpretation.
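Donkin’s theorem quoted above can be made concrete with a small numerical sketch. The Python function below is illustrative only: the name `donkin_update`, its arguments, and the assumption that the hypotheses are exhaustive as well as mutually exclusive (so that their probabilities sum to 1) are not part of Donkin’s text. It assigns the new probabilities to the hypotheses on which the information bears and rescales the remaining ones by a common factor, so that their mutual ratios are unchanged.

```python
from fractions import Fraction


def donkin_update(priors, revised):
    """Update probabilities while preserving ratios among untouched hypotheses.

    priors  -- list of prior probabilities p_1, ..., p_k (assumed to sum to 1)
    revised -- dict mapping the index of each directly affected hypothesis
               to its new probability
    """
    untouched = [i for i in range(len(priors)) if i not in revised]
    old_mass = sum(priors[i] for i in untouched)  # prior mass of untouched hypotheses
    new_mass = 1 - sum(revised.values())          # mass left for them afterwards
    if old_mass == 0 or new_mass < 0:
        raise ValueError("cannot rescale the untouched hypotheses")
    factor = new_mass / old_mass                  # one common factor keeps ratios fixed
    return [revised[i] if i in revised else priors[i] * factor
            for i in range(len(priors))]


# Hypothetical example: h1, h2, h3 with priors 1/2, 1/4, 1/4;
# new information raises the probability of h3 to 1/2.
priors = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
posterior = donkin_update(priors, {2: Fraction(1, 2)})
print(posterior)
print(posterior[0] / posterior[1] == priors[0] / priors[1])
```

With these numbers the untouched hypotheses h1 and h2 are both rescaled by 2/3, giving 1/3 and 1/6, so their ratio 2 : 1 is the same before and after the new information.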
In a review of Keynes’ Treatise originally published in 1924 and later reprinted in the last volume of the series of monographs edited by Borel under the title Traité du calcul des probabilités et ses applications (1939),30 Borel raises various objections to Keynes, whom he blames for overlooking the applications of probability to science and focusing only on the probability of judgments. Borel takes this to be a distinctive feature of the English as opposed to the continental literature, which he regards as more aware of the developments of science, particularly physics. When making such claims, Borel is likely to have in mind above all Henri Poincaré, whose ideas exercised a certain influence on him.31

29 See Jeffrey [1965], [1992a] and [2004].
30 The Traité includes 18 issues, collected in 4 volumes. The review of Keynes’ Treatise appears in the last issue, under the title “Valeur pratique et philosophie des probabilités”.
31 See von Plato [1994, p. 36], where Borel is described as a successor of Poincaré “in an intellectual sense”. The book by von Plato contains a detailed exposition of Borel’s ideas on probability. See also [Knobloch, 1987].
While agreeing with Keynes in taking probability in its epistemic sense, Borel claims that probability acquires a different meaning depending on the context in which it occurs. Probability has a different value in situations characterized by a different state of information, and is endowed with a “more objective” meaning in science, where its assessment is grounded on a strong body of information, shared by the scientific community. Borel is definitely a subjectivist when he admits that two people, given the same information, can come up with different probability evaluations. This is most common in everyday applications of probability, like horse races, or weather forecasts. In all such cases, probability judgments are of necessity relative to “a certain body of knowledge”, which is not the kind of information shared by everyone, like scientific theories at a certain time. Remarkably, Borel maintains that when talking of this kind of probability the “body of knowledge” in question should be thought of as “necessarily included in a determinate human mind, but not such that the same abstract knowledge constitutes the same body of knowledge in two distinct human minds” [Borel, 1924, English edition 1964, p. 51]. Probability evaluations made at different times, based on different information, ought not be taken as refinements of previous judgments, but as totally new ones. Borel disagrees with Keynes on the claim that there are probabilities which cannot be evaluated numerically. In connection with the evaluation of probability Borel appeals to the method of betting, which “permits us in the majority of cases a numerical evaluation of probabilities” [Borel, 1924, English edition 1964, p. 57]. This method, which dates back to the origin of the numerical notion of probability in the seventeenth century, is regarded by Borel as having “exactly the same characteristics as the evaluation of prices by the method of exchange. 
If one desires to know the price of a ton of coal, it suffices to offer successively greater and greater sums to the person who possesses the coal; at a certain sum he will decide to sell it. Inversely if the possessor of the coal offers his coal, he will find it sold if he lowers his demands sufficiently”. [Borel, 1924, English edition 1964, p. 57] At the end of a discussion of the method of bets, where he takes into account some of the traditional objections against it, Borel concludes that this method seems good enough, in the light of ordinary experience. Borel’s conception of epistemic probability has a strong affinity with the subjective interpretation developed by Ramsey and de Finetti. In a brief note on Borel’s work, de Finetti praises Borel for holding that probability must be referred to the single case, and that this kind of probability is always measurable sufficiently well by means of the betting method. At the same time, de Finetti strongly disagrees with the eclectic attitude taken by Borel, more particularly with his admission of an objective meaning of probability, in addition to the subjective.32 32 De
Finetti’s commentary on Borel is to be found in de Finetti [1939].
2.2 Ramsey and the principle of coherence

Frank Plumpton Ramsey (1903–1930), Fellow of King’s College and lecturer in mathematics at Cambridge, made outstanding contributions to a number of different fields, including mathematics, logic, philosophy, probability, and economics.33 In his obituary, Keynes refers to Ramsey as “one of the brightest minds of our generation” and praises him for the “amazing, easy efficiency of the intellectual machine which ground away behind his wide temples and broad, smiling face” [Keynes, 1930, 1972, p. 336]. A regular attender at the meetings of the Moral Sciences Club and the Apostles, Ramsey actively interacted with his contemporaries, including Keynes, Moore, Russell and Wittgenstein (whose Tractatus he translated into English), often influencing their ideas. Ramsey is considered the starter of modern subjectivism with his paper “Truth and Probability”, read at the Moral Sciences Club in 1926, and published in 1931 in the collection The Foundations of Mathematics and Other Logical Essays edited by Richard Bevan Braithwaite shortly after Ramsey’s death. Other sources are to be found in the same book, as well as in the other collection, edited by Hugh Mellor, Philosophical Papers (largely overlapping Braithwaite’s), and in addition in the volumes Notes on Philosophy, Probability and Mathematics, edited by Maria Carla Galavotti, and On Truth, edited by Nicholas Rescher and Ulrich Majer. Ramsey regards probability as a degree of belief, and probability theory as a logic of partial belief. Degree of belief is taken as a primitive notion having “no precise meaning unless we specify more exactly how it is to be measured” [Ramsey, 1990a, p. 63]; in other words, degree of belief requires an operational definition that specifies how it can be measured.
A “classical” way of measuring degree of belief is the method of bets, endowed with a long-standing tradition dating back to the birth of probability in the seventeenth century with the work of Blaise Pascal, Pierre Fermat and Christiaan Huygens. In Ramsey’s words: “the old established way of measuring a person’s belief is to propose a bet, and see what are the lowest odds which he will accept” [Ramsey, 1990a, p. 68]. Such a method, however, suffers from well-known problems, like the diminishing marginal utility of money, and is to a certain extent arbitrary, due to personal “eagerness or reluctance to bet”, and the fact that “the proposal of a bet may inevitably alter” a person’s “state of opinion” [Ramsey, 1990a, p. 68]. To avoid such difficulties, Ramsey adopted an alternative method based on the notion of preference, grounded in a “general psychological theory” asserting that “we act in the way we think most likely to realize the objects of our desires, so that a person’s actions are completely determined by his desires and opinions” [Ramsey, 1990a, p. 69]. Attention is called to the fact that “this theory is not to be identified with the psychology of the Utilitarians, in which pleasure had a dominant position. The theory I propose to adopt is that we seek things which we want, which may be our own or other people’s pleasure, or anything else whatever, and our actions are such as we think most likely to realize these goods” [Ramsey, 1990a, p. 69].

33 On Ramsey’s life, see [Taylor, 2006] and the last chapter of [Sahlin, 1990]. See also “Better than the Stars”, a radio portrait of Frank Ramsey written and presented by Hugh Mellor, with Alfred J. Ayer, Richard B. Braithwaite, Richard C. Jeffrey, Michael Ramsey (Archbishop of Canterbury and Frank’s brother), Lettice Ramsey (Frank’s widow), Ivor A. Richards, originally recorded in 1978, and later published in Mellor, ed. [1995]. More to be found in the Ramsey Archive of King’s College, Cambridge.

After clarifying that “good” and “bad” are not to be taken in an ethical sense, “but simply as denoting that to which a given person feels desire and aversion” [Ramsey, 1990a, p. 70], Ramsey introduces the notion of quantity of belief, by assuming that goods are measurable as well as additive, and that an agent “will always choose the course of action which will lead in his opinion to the greatest sum of good” [Ramsey, 1990a, p. 70]. The fact that people hardly ever entertain a belief with certainty, and usually act under uncertainty, is accounted for by appealing to the principle of mathematical expectation, which Ramsey introduces “as a law of psychology”. Given a person who is prepared to act in order to achieve some good, “if p is a proposition about which he is doubtful, any goods or bads for whose realization p is in his view a necessary and sufficient condition enter into his calculation multiplied by the same fraction, which is called the ‘degree of his belief in p’. We thus define degree of belief in a way which presupposes the use of mathematical expectation” [Ramsey, 1990a, p. 70]. An alternative definition of degree of belief is also suggested along the following lines: “Suppose [the] degree of belief [of a certain person] in p is m/n; then his action is such as he would choose it to be if he had to repeat it exactly n times, in m of which p was true, and in the others false” [Ramsey, 1990a, p. 70]. The two accounts point out two different, albeit closely intertwined, aspects of the same concept, and are taken to be equivalent.
Ramsey exemplifies a typical situation involving a choice of action that depends on belief as follows: “I am at a cross-roads and do not know the way; but I rather think one of the two ways is right. I propose therefore to go that way but keep my eyes open for someone to ask; if now I see someone half a mile away over the fields, whether I turn aside to ask him will depend on the relative inconvenience of going out of my way to cross the fields or of continuing on the wrong road if it is the wrong road. But it will also depend on how confident I am that I am right; and clearly the more confident I am of this the less distance I should be willing to go from the road to check my opinion. I propose therefore to use the distance I would be prepared to go to ask, as a measure of the confidence of my opinion”. [Ramsey, 1990a, pp. 70-71]
Denoting by $f(x)$ the disadvantage of walking $x$ metres, by $r$ the advantage of reaching the right destination, and by $w$ the disadvantage of arriving at a wrong destination, if I were ready to go a distance $d$ to ask, the degree of belief that I am on the right road is $p = 1 - f(d)/(r - w)$. To see why, suppose that I were to act $n$ times in the same way, and that in $np$ of these $n$ cases I was on the right road (and otherwise on the wrong one). The total good of not asking each time is $npr + n(1 - p)w = nw + np(r - w)$, while the total good of asking each time (in which case I would never go wrong) is $nr - nf(x)$. The total good of asking is greater than the total good of not asking, provided that $f(x) < (r - w)(1 - p)$. Ramsey concludes that the distance $d$ is connected with my degree of belief, $p$, by the relation $f(d) = (r - w)(1 - p)$, which amounts to $p = 1 - f(d)/(r - w)$, as stated above. He then observes that “It is easy to see that this way of measuring beliefs gives results agreeing with ordinary ideas. [...] Further, it allows validity to betting as means of measuring beliefs. By proposing to bet on p we give the subject a possible course of action from which so much extra good will result to him if p is true and so much extra bad if p is false”. [Ramsey, 1990a, p. 72] However, given the already mentioned difficulties connected with the betting scheme, Ramsey turns to a more general notion of preference. Degree of belief is then operationally defined in terms of personal preferences, determined on the basis of the expectation of an individual of obtaining certain goods, not necessarily of a monetary kind. The value of such goods is intrinsically relative, because they are defined with reference to a set of alternatives. The definition of degree of belief is committed to a set of axioms, which provide a way of representing its values by means of real numbers. Degrees of belief obeying such axioms are called consistent.
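The cross-roads relation derived above can be checked numerically. The following Python sketch is only an illustration: the function names, the linear cost of walking and the particular values of r and w are assumptions made for the example, not part of Ramsey’s text. It computes p from the distance d one is just prepared to walk, and compares the total good of asking with that of not asking.

```python
def degree_of_belief(f_d, r, w):
    """Ramsey's relation p = 1 - f(d) / (r - w)."""
    if r <= w:
        raise ValueError("the right destination must be better than the wrong one")
    p = 1 - f_d / (r - w)
    if not 0 <= p <= 1:
        raise ValueError("f(d) must lie between 0 and r - w")
    return p


def good_of_not_asking(n, p, r, w):
    """Total good over n trials when never asking: n*p*r + n*(1 - p)*w."""
    return n * p * r + n * (1 - p) * w


def good_of_asking(n, r, f_x):
    """Total good over n trials when always walking x to ask: n*r - n*f(x)."""
    return n * r - n * f_x


def walking_cost(x):
    """Hypothetical disadvantage f(x) of walking x metres (assumed linear)."""
    return x / 250


r, w = 10.0, 2.0   # assumed values of the right and wrong destinations
d = 500            # assumed distance I am just prepared to walk to ask
p = degree_of_belief(walking_cost(d), r, w)
print(p)
print(good_of_not_asking(1, p, r, w), good_of_asking(1, r, walking_cost(d)))
```

At the distance d itself the two courses of action are equally good, which is what makes d a measure of p; for any shorter walk the condition f(x) < (r − w)(1 − p) holds and asking is the better choice.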
The laws of probability are then spelled out in terms of degrees of belief, and it is argued that consistent sets of degrees of belief satisfy the laws of probability. Additivity is assumed in a finite sense, since the set of alternatives taken into account is finite. In this connection Ramsey observes that the human mind is only capable of contemplating a finite number of alternatives open to action, and even when a question is conceived, allowing for an infinite number of answers, these have to be lumped “into a finite number of groups” [Ramsey, 1990a, p. 79]. The crucial feature of Ramsey’s theory of probability is the link between probability and degree of belief established by consistency, or coherence — to use the term that is commonly adopted today. Consistency guarantees the applicability of the notion of degree of belief, which can therefore qualify as an admissible interpretation of probability. In Ramsey’s words, the laws of probability can be shown to be “necessarily true of any consistent set of degrees of belief. Any definite set of degrees of belief which broke them would be inconsistent in the sense that it violated the laws of preference between options. [...] If anyone’s mental condition violated these laws, his choice would depend
on the precise form in which the options were offered him, which would be absurd. He could have a book made against him by a cunning better and would then stand to lose in any event. We find, therefore, that a precise account of the nature of partial belief reveals that the laws of probability are laws of consistency. [...] Having any definite degree of belief implies a certain measure of consistency, namely willingness to bet on a given proposition at the same odds for any stake, the stakes being measured in terms of ultimate values. Having degrees of belief obeying the laws of probability implies a further measure of consistency, namely such a consistency between the odds acceptable on different propositions as shall prevent a book being made against you”. [Ramsey, 1990a, p. 78] By arguing that from the assumption of coherence one can derive the laws of probability Ramsey paved the way to a fully-fledged subjectivism. Remarkably, within this perspective the laws of probability “do not depend for their meaning on any degree of belief in a proposition being uniquely determined as the rational one; they merely distinguish those sets of beliefs which obey them as consistent ones” [Ramsey, 1990a, p. 78]. This claim brings us to the core of subjectivism, for which coherence is the only condition that degrees of belief should obey, or, to put it slightly differently, insofar as a set of degrees of belief is coherent there is no further demand of rationality to be met. Having adopted a notion of probability in terms of coherent degrees of belief, Ramsey does not need to rely on the principle of indifference. In his words: “the Principle of Indifference can now be altogether dispensed with” [Ramsey, 1990a, p. 85]. This is a decisive step in the moulding of modern subjectivism. 
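The way in which a “book” can be made against incoherent degrees of belief may be illustrated in the simplest case of a proposition p and its negation. The Python sketch below rests on stated assumptions: the agent is taken to accept bets at the same betting quotient for any stake, positive or negative, stakes are measured in units of value, and the function name `agent_gains` is introduced only for this example. A bet on a proposition with stake s at quotient q costs q·s and pays s if the proposition is true.

```python
def agent_gains(q_p, q_not_p, stake=1.0):
    """Agent's net gain if p is true and if p is false, for the book a bookie
    would make against betting quotients q_p (on p) and q_not_p (on not-p)."""
    total = q_p + q_not_p
    # If the quotients sum to more than 1, the bookie sells both bets to the
    # agent (positive stake); otherwise the bookie buys both (negative stake).
    s = stake if total > 1 else -stake
    gain_if_p = s * (1 - q_p) - s * q_not_p      # bet on p won, bet on not-p lost
    gain_if_not_p = -s * q_p + s * (1 - q_not_p) # bet on p lost, bet on not-p won
    return gain_if_p, gain_if_not_p


# Incoherent quotients (sum 1.2): the agent loses whatever happens.
print(agent_gains(0.6, 0.6))
# Coherent quotients (sum 1): no sure loss can be arranged with these two bets.
print(agent_gains(0.75, 0.25))
```

In both possible cases the agent’s gain equals s(1 − q_p − q_not_p), which is negative whenever the two quotients fail to sum to 1 and zero when they do. This covers only one of the laws of probability; it is meant as an instance of the argument, not as a proof of it.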
As we will see in the next Section, a further step was made by Bruno de Finetti, who supplied the “static” definition of subjective probability in terms of coherent degrees of belief with a “dynamic” dimension, obtained by joining subjective probability with exchangeability within the framework of the Bayesian method.34 Although this crucial step was actually made by de Finetti, there is evidence that Ramsey knew the property of exchangeability, of which he must have heard from Johnson’s lectures. Evidence for this claim is found in his note “Rule of Succession”, where use is made of the notion of exchangeability, named “equiprobability of all permutations”.35 What apparently Ramsey did not see, and was instead grasped by de Finetti, is the usefulness of applying exchangeability to the inductive procedure, modelled upon Bayes’ rule. Remarkably, in another note called “Weight or the Value of Knowledge”,36 Ramsey was able to prove that collecting evidence pays in expectation, provided that acquiring the new information is free, and shows how much the increase in weight is. This shows he had a dynamic view at least of this 34 This terminology is borrowed from Zabell [1991], containing useful remarks on Ramsey’s contribution to subjectivism. For a comparison between Ramsey and de Finetti on subjective probability, see [Galavotti, 1991]. 35 See Ramsey [1991a], pp. 279-281. For a detailed commentary see Di Maio [1994]. 36 See Ramsey [1990b]; also included in [1991a, pp. 285-287].
Maria Carla Galavotti
important process. As pointed out by Nils-Eric Sahlin and Brian Skyrms, Ramsey’s note on weight anticipates subsequent work by Savage, Good, and others.37 Ramsey put forward his theory of probability in open contrast with Keynes. In particular, Ramsey did not share Keynes’ claim that “a probability may [...] be unknown to us through lack of skill in arguing from given evidence” [Ramsey, 1922, 1989, p. 220]. For a subjectivist, the notion of unknown probability does not make much sense, as repeatedly emphasized also by de Finetti. Moreover, Ramsey criticized the logical relations on which Keynes’ theory rests. In “Criticism of Keynes” he writes that: “There are no such things as these relations. a) Do we really perceive them? Least of all in the simplest cases when they should be clearest; can we really know them so little and yet be so certain of the laws which they testify? [...] c) They would stand in such a strange correspondence with degrees of belief” [Ramsey, 1991a, pp. 273-274]. Like Keynes, Ramsey believed that probability is the object of logic, but they disagreed on the nature of that logic. Ramsey distinguished between a “lesser logic, which is the logic of consistency, or formal logic”, and a “larger logic, which is the logic of discovery, or inductive logic” [Ramsey, 1990a, p. 82]. The “lesser” logic, which is the logic of tautologies in Wittgenstein’s sense, can be “interpreted as an objective science consisting of objectively necessary propositions”. By contrast, the “larger” logic, which includes probability, does not share this feature, because “when we extend formal logic to include partial beliefs this direct objective interpretation is lost” [Ramsey, 1990a, p. 83], and can only be endowed with a psychological foundation.38 Ramsey’s move towards psychologism was inspired by Wittgenstein. 
This is manifest in a paper read to the Apostles in 1922, called “Induction: Keynes and Wittgenstein”, where Wittgenstein’s psychologism is contrasted with Keynes’ logicism. At the beginning of that paper, Ramsey mentions propositions 6.363 and 6.3631 of the Tractatus, where it is maintained that the process of induction “has no logical foundation but only a psychological one” [Ramsey, 1991a, p. 296]. After praising Wittgenstein for his appeal to psychology in order to justify the inductive procedure, Ramsey discusses Keynes’ approach at length, expressing serious doubts on his attempt at grounding induction on logical relations and hypotheses. At the end of the paper, after recalling Hume’s celebrated argument, Ramsey puts forward by way of a conjecture, of which he claims to be too tired “to see clearly if it is sensible or absurd”, the idea that induction could be justified by saying that “a type of inference is reasonable or unreasonable according to the relative frequencies with which it leads to truth and falsehood. Induction is reasonable because it produces predictions which are generally verified, not because of any logical relation between its premisses and conclusions. On this view we should establish by induction that induction was reasonable, and induction being reasonable this would be a 37 See Nils-Eric Sahlin’s “Preamble” to Ramsey [1990b], and Skyrms [1990] and [2006]. See in addition Savage [1954] and Good [1967]. 38 For some remarks on Ramsey’s psychological theory of belief see Suppes [2006].
The Modern Epistemic Interpretations of Probability: Logicism and Subjectivism
reasonable argument”. [Ramsey, 1991a, p. 301] This passage suggests that Ramsey had in mind a pragmatic justification of the inductive procedure. A similar attitude reappears at the end of “Truth and Probability”, where he describes his own position as “a kind of pragmatism”, holding that “we judge mental habits by whether they work, i.e. whether the opinions they lead to are for the most part true, or more often true than those which alternative habits would lead to. Induction is such a useful habit, and so to adopt it is reasonable. All that philosophy can do is to analyse it, determine the degree of its utility, and find on what characteristics of nature it depends. An indispensable means for investigating these problems is induction itself, without which we should be helpless. In this circle lies nothing vicious. It is only through memory that we can determine the degree of accuracy of memory; for if we make experiments to determine this effect, they will be useless unless we remember them”. [Ramsey, 1990a, pp. 93-94] As testified by a number of Ramsey’s references to William James and Charles Sanders Peirce, pragmatism is a major feature of his philosophy in general, and his views on probability are no exception. A puzzling aspect of Ramsey’s theory of probability is the relation between degree of belief and frequency. In “Truth and Probability” he writes that “it is natural [...] that we should expect some intimate connection between these two interpretations, some explanation of the possibility of applying the same mathematical calculus to two such different sets of phenomena” [Ramsey, 1990a, p. 83]. Such a connection is identified with the fact that “the very idea of partial belief involves reference to a hypothetical or ideal frequency [...] belief of degree m/n is the sort of belief which leads to the action which would be best if repeated n times in m of which the proposition is true” [Ramsey, 1990a, p. 84].
This passage — echoing the previously mentioned conjecture from “Induction: Keynes and Wittgenstein” — reaffirms Ramsey’s pragmatical tendency to associate belief with action, and to justify inductive behaviour with reference to successful conduct. The argument is pushed even further when Ramsey says that “It is this connection between partial belief and frequency which enables us to use the calculus of frequencies as a calculus of consistent partial belief. And in a sense we may say that the two interpretations are the objective and subjective aspects of the same inner meaning, just as formal logic can be interpreted objectively as a body of tautology and subjectively as the laws of consistent thought”. [Ramsey, 1990a, p. 84] However, in other passages the connection between these two “aspects” is not quite so strict:
“experienced frequencies often lead to corresponding partial beliefs, and partial beliefs lead to the expectation of corresponding frequencies in accordance with Bernoulli’s Theorem. But neither of these is exactly the connection we want; a partial belief cannot in general be connected uniquely with any actual frequency”. [Ramsey, 1990a, p. 84] Evidence that Ramsey was intrigued by the relation between frequency and degree of belief is found in some remarks contained in the note “Miscellaneous Notes on Probability”, written in 1928. There four kinds of connections are pointed out, namely: “(1) if degree of belief = γ, most prob((able)) frequency is γ (if instances independent). This is Bernoulli’s theorem; (2) if freq((uency)) has been γ we tend to believe with degree γ; (3) if freq((uency)) is γ, degree γ of belief is justified. This is Peirce’s definition; (4) degree γ of belief means acting appropriately to a frequency γ” [Ramsey, 1991a, p. 275]. After calling attention to such possible connections, Ramsey reaches the conclusion that “it is this last which makes calculus of frequencies applicable to degrees of belief”. Remarkably, the result known as de Finetti’s “representation theorem” tells us precisely how to treat relation (4). 
One might speculate that Ramsey would have found an answer to at least part of what he was looking for in this result, that de Finetti found out in the very same years, but was not available to him.39 Claims like that mentioned above to the effect that partial belief and frequency “are the two objective and subjective aspects of the same inner meaning”, might be taken to suggest that Ramsey admitted of two notions of probability: one epistemic (the subjective view) and one empirical (the frequency view).40 This emerges again at the very beginning of “Truth and Probability” where Ramsey claims that although the paper deals with the logic of partial belief, “there is no intention of implying that this is the only or even the most important aspect of the subject”, adding that “probability is of fundamental importance not only in logic but also in statistical and physical science, and we cannot be sure beforehand that the most useful interpretation of it in logic will be appropriate in physics also” [Ramsey, 1990a, p. 53]. It can be argued that in spite of these claims Ramsey trusted that the subjective interpretation has the resources for accounting for all uses of probability. His writings offer plenty of evidence for this thesis. There is no doubt that Ramsey took seriously the problem of what kind of probability is employed in science. We know from Braithwaite’s “Introduction” to The Foundations of Mathematics that he had planned to write a final section of “Truth and Probability”, dealing with probability in science. We also know from Ramsey’s unpublished notes that by the time of his death he was working on a book bearing the title “On Truth and Probability”, of which he left a number of tables of contents.41 Of the projected book he only wrote the first part, dealing with the notion of truth, which was published in 1991 under the title On Truth. It can be conjectured that he meant to include in the second part of the book the 39 On
this point, see Galavotti [1991] and [1995]. 40 For instance, this opinion is upheld in Good [1965, p. 8]. 41 See the “Ramsey Collection” held by the Hillman Library of the University of Pittsburgh.
content of the paper “Truth and Probability”, plus some additional material on probability in science. The notes published in The Foundations of Mathematics under the heading “Further Considerations”,42 and a few more published in the volume Notes on Philosophy, Probability and Mathematics, contain evidence that in the years 1928-29 Ramsey was actively thinking about such problems as theories, laws, causality, chance, all of which he regarded as intertwined. A careful analysis of such writings shows that, contrary to the widespread opinion that he was a dualist with regard to probability, in the last years of his life Ramsey was developing a view of chance and probability in physics fully compatible with his subjective interpretation of probability as degree of belief. Ramsey’s view of chance revolves around the idea that this notion requires some reference to scientific theories. Chance cannot be defined simply in terms of laws (empirical regularities) or frequencies, though the specification of chances involves reference to laws, in a way that will soon be clarified. In “Reasonable Degree of Belief” Ramsey writes that “We sometimes really assume a theory of the world with laws and chances and mean not the proportion of actual cases but what is chance on our theory” [Ramsey, 1990a, p. 97]. The same point is emphasized in the note “Chance”, also written in 1928, where the frequency-based view of chance put forward by authors like Norman Campbell is criticized. The point is interesting, because it highlights Ramsey’s attitude to frequentism, which he deems inadequate, far from considering it a viable interpretation of probability. As Ramsey puts it: “There is, for instance, no empirically established fact of the form ‘In n consecutive throws the number of heads lies between n/2 ± ε(n)’. On the contrary we have good reason to believe that any such law would be broken if we took enough instances of it.
Nor is there any fact established empirically about infinite series of throws; this formulation is only adopted to avoid contradiction by experience; and what no experience can contradict, none can confirm, let alone establish”. [Ramsey, 1990a, p. 104] To Campbell’s frequentist view, Ramsey opposed a notion of chance ultimately based on degrees of belief. He defines it as follows: “Chances are degrees of belief within a certain system of beliefs and degrees of belief; not those of any actual person, but in a simplified system to which those of actual people, especially the speaker, in part approximate. [...] This system of beliefs consists, firstly, of natural laws, which are in it believed for certain, although, of course, people are not really quite certain of them”. [Ramsey, 1990a, p. 104] In addition, the system will contain statements of the form: “when knowing ψx and nothing else relevant, always expect φx with degree of belief p (what is or 42 In Ramsey [1931, pp. 199-211]. These are the notes called: “Reasonable Degree of Belief”, “Statistics” and “Chance”, all reprinted in [1990a, pp. 97-109].
is not relevant is also specified in the system)” [Ramsey, 1990a, p. 104]. Such statements together with the laws “form a deductive system according to the rules of probability, and the actual beliefs of a user of the system should approximate to those deduced from a combination of the system and the particular knowledge of fact possessed by the user, this last being (inexactly) taken as certain” [Ramsey, 1990a, p. 105]. To put it differently, chance is defined with reference to systems of beliefs that typically contain accepted laws. Ramsey stresses that chances “must not be confounded with frequencies”, for the frequencies actually observed do not necessarily coincide with them. Unlike frequencies, chances can be said to be “objective” in two ways. First, to say that a system includes a chance value referred to a phenomenon means that the system itself cannot be modified so as to include a pair of deterministic laws, ruling the occurrence and non-occurrence of the same phenomenon. As explicitly admitted by Ramsey, this characterization of objective chance is reminiscent of Poincaré’s treatment of the matter, and typically applies “when small causes produce large effects” [Ramsey, 1990a, p. 106]. Second, chances can be said to be objective “in that everyone agrees about them, as opposed e.g. to odds on horses” [Ramsey, 1990a, p. 106]. On the basis of this general definition of chance, Ramsey qualifies probability in physics as chance referred to a more complex system, namely to a system making reference to scientific theories. In other words, probabilities occurring in physics are derived from physical theories. They can be taken as ultimate chances, to mean that within the theoretical framework in which they occur there is no way of replacing them with deterministic laws. The objective character of chances descends from the objectivity peculiarly ascribed to theories that are universally accepted.
Ramsey’s view of chance and probability in physics is obviously intertwined with his conception of theories, truth and knowledge in general. Within Ramsey’s philosophy the “truth” of theories is accounted for in pragmatical terms. In this connection Ramsey holds the view, whose paternity is usually attributed to Charles Sanders Peirce, but is also found in Campbell’s work, that theories which gain “universal assent” in the long run are accepted by the scientific community and taken as true. Along similar lines he characterized a “true scientific system” with reference to a system to which the opinion of everyone, grounded on experimental evidence, will eventually converge. According to this pragmatically oriented view, chance attributions, like all general propositions belonging to theories — including causal laws — are not to be taken as propositions, but rather as “variable hypotheticals”, or “rules for judging”, apt to provide a tool with which the user meets the future.43 To sum up, for Ramsey chances are theoretical constructs, but they do not express realistic properties of “physical objects”, whatever meaning be attached to this expression. Chance attributions indicate a way in which beliefs in various facts belonging to science are guided by scientific theories. Ramsey’s idea that 43 See
especially “General Propositions and Causality” (1929) in Ramsey [1931] and [1990a].
within the framework of subjective probability one can make sense of an “objective” notion of physical probability has passed almost unnoticed. It is, instead, an important contribution to the subjective interpretation and its possible applications to science.
2.3
de Finetti and exchangeability
With the Italian Bruno de Finetti (1906-1985) the subjective interpretation of probability came to completion. Working in the same years as Ramsey, but independently, de Finetti forged a similar view of probability as degree of belief, subject to the only constraint of coherence. To such a definition he added the notion of exchangeability, which can be regarded as the decisive step towards the edification of modern subjectivism. In fact exchangeability, combined with Bayes’ rule, gives rise to the inferential methodology which is at the root of the so-called neo-Bayesianism. This result was the object of the paper “Funzione caratteristica di un fenomeno aleatorio” that de Finetti read at the International Congress of Mathematicians, held in Bologna in 1928. In 1935, at Maurice Fréchet’s invitation de Finetti gave a series of lectures at the Institut Henri Poincaré in Paris, whose text was published in 1937 under the title “La prévision: ses lois logiques, ses sources subjectives”. This article, which is one of de Finetti’s best known, allowed dissemination of his ideas in the French speaking community of probabilists. However, de Finetti’s work came to be known to the English speaking community only in the 1950s, thanks to Leonard Jimmie Savage, with whom he entertained a fruitful collaboration. In addition to making a contribution to probability theory and statistics which is universally recognized as seminal, de Finetti put forward an original philosophy of probability, which can be described as a blend of pragmatism, operationalism and what we would today call “anti-realism”.44 Richard Jeffrey labelled de Finetti’s philosophical position “radical probabilism”45 to stress the fact that for de Finetti probability imbues the whole edifice of human knowledge, and that scientific knowledge is a product of human activity ruled by (subjective) probability, rather than truth or objectivity.
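The combination of exchangeability and Bayes' rule mentioned above can be previewed with a small numerical sketch, anticipating the representation theorem discussed later in this section. It assumes, purely for illustration, uniform weights over the unknown success rate p, in which case the probability of a particular sequence with h successes in n trials is the integral of p^h (1-p)^(n-h) over [0, 1], that is h!(n-h)!/(n+1)!. The uniform weighting and the function names are assumptions of this example, not de Finetti's own choices.

```python
from fractions import Fraction
from math import factorial


def sequence_probability(seq):
    """Probability of a specific 0/1 sequence under a uniform mixture.

    Assumption: uniform weights over p, so the mixture integral of
    p**h * (1 - p)**(n - h) over [0, 1] equals h! (n - h)! / (n + 1)!.
    """
    n, h = len(seq), sum(seq)
    return Fraction(factorial(h) * factorial(n - h), factorial(n + 1))


def predictive_probability(seq):
    """P(next event occurs | seq), obtained by conditioning (Bayes' rule)."""
    return sequence_probability(tuple(seq) + (1,)) / sequence_probability(seq)


# Exchangeability: only the number of successes matters, not their order.
print(sequence_probability((1, 1, 0)), sequence_probability((0, 1, 1)))
# The events are not independent: P(1, 1) differs from P(1) * P(1).
print(sequence_probability((1, 1)), sequence_probability((1,)) ** 2)
# Predictive probability after 7 successes in 10 trials.
print(predictive_probability((1,) * 7 + (0,) * 3))
```

Under these assumptions the predictive probability works out to (h + 1)/(n + 2), which moves ever closer to the observed frequency h/n as observations accumulate. This is one simple instance of the interplay between frequencies and degrees of belief that the section goes on to describe.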
De Finetti outlined his philosophy of probability in the article “Probabilismo” (1931), which he regarded as his philosophical manifesto. Yet another philosophical text bearing the title L’invenzione della verità, originally written by de Finetti in 1934 to take part in a competition for a grant from the Royal Academy of Italy, was published in 2006. The two main sources of de Finetti’s philosophy are Mach’s phenomenalism, and pragmatism, namely the version upheld by the so-called Italian pragmatists, including Giovanni Vailati, Antonio Aliotta and Mario Calderoni. The starting point of de Finetti’s probabilism is the refusal of the notion of truth, and the related view that there are “immutable and necessary” laws. In “Probabilismo” he 44 This is outlined in some detail in Galavotti [1989]. For an autobiographical sketch, the reader is referred to de Finetti [1982]. 45 See Jeffrey [1992b] and [1992c].
writes: “no science will permit us to say: this fact will come about, it will be thus and so because it follows from a certain law, and that law is an absolute truth. Still less will it lead us to conclude skeptically: the absolute truth does not exist, and so this fact might or might not come about, it may go like this or in a totally different way, I know nothing about it. What we can say is this: I foresee that such a fact will come about, and that it will happen in such and such a way, because past experience and its scientific elaboration by human thought make this forecast seem reasonable to me”. [de Finetti, 1931a, English edition 1989, p. 170] Probability makes forecast possible, and since a forecast is always referred to a subject, being the product of his experience and convictions, the instrument we need is the subjective theory of probability. For de Finetti probabilism is the way out of the antithesis between absolutism and skepticism, and at its core lies the subjective notion of probability. Probability “means degree of belief (as actually held by someone, on the ground of his whole knowledge, experience, information) regarding the truth of a sentence, or event E (a fully specified ‘single’ event or sentence, whose truth or falsity is, for whatever reason, unknown to the person)” [de Finetti, 1968, p. 45]. Of this notion, de Finetti wants to show not only that it is the only non contradictory one, but also that it covers all uses of probability in science and everyday life. This programme is accomplished in two steps: first, an operational definition of probability is worked out, second, it is argued that the notion of objective probability is reducible to that of subjective probability. As we have seen discussing Ramsey’s theory of probability, the obvious option to define probability in an operational fashion is in terms of betting quotients. 
Accordingly, the degree of probability assigned by an individual to a certain event is identified with the betting quotient at which he would be ready to bet a certain sum on its occurrence. The individual in question should be thought of as one in a condition to bet whatever sum against any gambler whatsoever, free to choose the betting conditions, like someone holding the bank at a gambling casino. Probability is defined as the fair betting quotient he would attach to his bets. De Finetti adopts this method, with the proviso that in case of monetary gain only small sums should be considered, to avoid the problem of marginal utility. Like Ramsey, de Finetti states coherence as the fundamental and unique criterion to be obeyed to avoid a sure loss, and spells out an argument to the effect that coherence is a sufficient condition for the fairness of a betting system, showing that a coherent gambling behaviour satisfies the principles of probability calculus, which can be derived from the notion of coherence itself. This is known in the literature as the Dutch book argument. It is worth noting that for de Finetti the scheme of bets is just a convenient way of making probability readily understandable, but he always held that there are other ways of defining probability. In “Sul significato soggettivo della probabilità”
[de Finetti, 1931b], after giving an operational definition of probability in terms of coherent betting systems, de Finetti introduces a qualitative definition of subjective probability based on the relation of “at least as probable as”. He then argues that it is not essential to embrace a quantitative notion of probability, and that, while betting quotients are apt devices for measuring and defining probability in an operational fashion, they are by no means an essential component of the notion of probability, which is in itself a primitive notion, expressing “an individual’s psychological perception” [de Finetti, 1931b, English edition 1992, p. 302]. The same point is stressed in Teoria delle probabilità, where de Finetti describes the betting scheme as a handy tool leading to “simple and useful insights” [de Finetti, 1970, English edition 1975, vol. 1, p. 180], but introduces another method of measuring probability, making use of scoring rules based on penalties. Remarkably, de Finetti assigns probability an autonomous value independent from the notion of utility, thereby marking a difference between his position and that of Ramsey and other supporters of subjectivism, like Savage. The second step of de Finetti’s programme, namely the reduction of objective to subjective probability, relies on what is known as the “representation theorem”. The pivotal notion in this context is that of exchangeability, which corresponds to Johnson’s “permutation postulate” and Carnap’s “symmetry”.46 Summarizing de Finetti, events belonging to a sequence are exchangeable if the probability of h successes in n events is the same, for whatever permutation of the n events, and for every n and h ≤ n. The representation theorem says that the probability of exchangeable events can be represented as follows. Imagine the events were probabilistically independent, with a common probability of occurrence p. Then the probability of a sequence e, with h occurrences in n, would be $p^h (1-p)^{n-h}$.
But if the events are exchangeable, the sequence has a probability P(e), represented according to de Finetti’s representation theorem as a mixture over the $p^h (1-p)^{n-h}$ with varying values of p:
$$P(e) = \int_0^1 p^h (1-p)^{n-h} \, dF(p)$$
where the distribution function F(p) is unique. The above equation involves two kinds of probability, namely the subjective probability P(e) and the “objective” (or “unknown”) probability p of the events considered. This enters into the mixture associated with the weights assigned by the function F(p) representing a probability distribution over the possible values of p. Assuming exchangeability then amounts to assuming that the events considered are equally distributed and independent, given any value of p. In order to understand de Finetti’s position, it is useful to start by considering how an objectivist would proceed when assessing the probability of an unknown 46 In his “farewell lecture”, delivered at the University of Rome before his retirement, de Finetti says that the term “exchangeability” was suggested to him by Maurice Fréchet in 1939. Before adopting this terminology, de Finetti had made use of the term “equivalence”. See de Finetti [1976, p. 283].
event. An objectivist would assume an objective success probability p. But its value would in general remain unknown. One could give weights to the possible values of p, and determine the weighted average. The same applies to the probability of a sequence e, with h successes in n independent repetitions. Note that because of independence it does not matter where the successes appear. De Finetti focuses on the latter, calling exchangeable those sequences where the places of successes do not make a difference in probability. These need not be independent sequences. An objectivist who wanted to explain subjective probability, would say that the weighted averages are precisely the subjective probabilities. But de Finetti proceeds in the opposite direction with his representation theorem: starting from the subjective judgment of exchangeability, one can show that there is only one way of giving weights to the possible values of the unknown objective probabilities. According to this interpretation, objective probabilities become useless and subjective probability can do the whole job. De Finetti holds that exchangeability represents the correct way of expressing the idea that is usually conveyed by the expression “independent events with constant but unknown probability”. If we take an urn of unknown composition, says de Finetti, the above phrase means that, relative to each of all possible compositions of the urn, the events can be seen as independent with constant probability. Then he points out that “what is unknown here is the composition of the urn, not the probability: this latter is always known and depends on the subjective opinion on the composition, an opinion which changes as new draws are made and the observed frequency is taken into account”. [de Finetti, 1995, English edition 2008, p. 
163] It should not pass unnoticed that for the subjectivist de Finetti probability, being the expression of the feelings of the subjects who evaluate it, is always definite and known. From a philosophical point of view, de Finetti’s reduction of objective to subjective probability is to be seen pragmatically; it follows the same pragmatic spirit inspiring the operational definition of subjective probability, and complements it. From a more general viewpoint, the representation theorem gives applicability to subjective probability, by bridging the gap between degrees of belief and observed frequencies. Taken in connection with Bayes’ rule, exchangeability provides a model of how to proceed in such a way as to allow for an interplay between the information on frequencies and degrees of belief. By showing that the adoption of Bayes’ method, taken in conjunction with exchangeability, leads to a convergence between degrees of belief and frequencies, de Finetti indicates how subjective probability can be applied to statistical inference. According to de Finetti, the representation theorem answers Hume’s problem because it justifies “why we are also intuitively inclined to expect that frequency observed in the future will be close to frequency observed in the past” [de Finetti, 1972a, p. 34]. De Finetti’s argument is pragmatic and revolves around the task of induction: to guide inductive reasoning and behavior in a coherent way. Like
Hume, de Finetti thinks that it is impossible to give a logical justification of induction, and answers the problem in a psychologistic fashion. De Finetti’s probabilism is deeply Bayesian: to his eyes statistical inference can be entirely performed by exchangeability in combination with Bayes’ rule. From this perspective, the shift from prior to posterior, or, as he preferred to say, from initial to final probabilities, becomes the cornerstone of statistical inference. In a paper entitled “Initial Probabilities: a Prerequisite for any Valid Induction” de Finetti takes a “radical approach” by which “all the assumptions of an inference ought to be interpreted as an overall assignment of initial probabilities” [de Finetti, 1969, p. 9]. The shift from initial to final probabilities receives a subjective interpretation, in the sense that it means going from one subjective probability to another, although objective factors, like frequencies, are obviously taken into account, when available. As repeatedly pointed out by de Finetti, updating one’s mind in view of new evidence does not mean changing opinion: “If we reason according to Bayes’ theorem, we do not change our opinion. We keep the same opinion, yet updated to the new situation. If yesterday I was saying ‘It is Wednesday’, today I would say ‘It is Thursday’. However I have not changed my mind, for the day after Wednesday is indeed Thursday” [de Finetti, 1995, English edition 2008, p. 43]. In other words, the idea of correcting previous opinions is alien to his perspective, and so is the notion of a self-correcting procedure, retained by other authors, like Hans Reichenbach. The following passage from the book Filosofia della probabilità, recently published in English under the title Philosophical Lectures on Probability, highlights de Finetti’s deeply felt conviction that subjective Bayesianism is the only acceptable way of addressing probabilistic inference, and the whole of statistics.
The passage also gives the flavour of de Finetti’s incisive prose: “The whole of subjectivistic statistics is based on this simple theorem of calculus of probability [Bayes’ theorem]. This provides subjectivistic statistics with a very simple and general foundation. Moreover, by grounding itself on the basic probability axioms, subjectivistic statistics does not depend on those definitions of probability that would restrict its field of application (like, e.g., those based on the idea of equally probable events). Nor, for the characterization of inductive reasoning, is there any need — if we accept this framework — to resort to empirical formulae. Objectivistic statisticians, on the other hand, make copious use of empirical formulae. The necessity to resort to them only derives from their refusal to allow the use of the initial probability. [...] they reject the use of the initial probability because they reject the idea that probability depends on a state of information. However, by doing so, they distort everything: not only as they turn probability into an objective thing [...] but they go so far as to turn it into a theological entity: they pretend that the ‘true’ probability exists, outside ourselves, independently of a person’s own judgement”.
Maria Carla Galavotti
[de Finetti, 1995, English edition 2008, p. 43]
For de Finetti objective probability is not only useless, but meaningless, like all metaphysical notions. This attitude is epitomized by the statement “probability does not exist”, printed in capital letters in the “Preface” to the English edition of Teoria delle probabilità. A similar statement opens the article “Probabilità” in the Enciclopedia Einaudi: “Is it true that probability ‘exists’? What could it be? I would say no, it does not exist” [de Finetti, 1980, p. 1146]. Such aversion to the ascription of an objective meaning to probability is a direct consequence of de Finetti’s anti-realism, and is inspired by the desire to keep the notion of probability free from metaphysics. Unfortunately, de Finetti’s statement has fostered the feeling that subjectivism is surrounded by a halo of arbitrariness. Against this suspicion, it must be stressed that de Finetti’s attack on objective probability did not prevent him from taking seriously the issue of objectivity. In fact he struggled against the “distortion” of “identifying objectivity and objectivism”, deemed a “dangerous mirage” [de Finetti, 1962a, p. 344], but did not deny the problem of the objectivity of probability evaluations. To clarify de Finetti’s position, it is crucial to keep in mind his distinction between the definition and the evaluation of probability. These are seen by de Finetti as utterly different concepts which should not be conflated. In his eyes, the confusion between the definition and the evaluation of probability pervades all the other interpretations of probability, namely frequentism, logicism and the classical approach. Upholders of these viewpoints look for a unique criterion, be it frequency or symmetry, and use it as grounds for both the definition and the evaluation of probability.
In so doing, they embrace a “rigid” attitude towards probability, which consists “in defining (in whatever way, according to whatever conception) the probability of an event, and in univocally determining a function” [de Finetti, 1933, p. 740]. By contrast, subjectivists take an “elastic” attitude, according to which the choice of one particular function is not tied to a single rule or method: “the subjective theory [...] does not contend that the opinions about probability are uniquely determined and justifiable. Probability does not correspond to a self-proclaimed ‘rational’ belief, but to the effective personal belief of anyone” [de Finetti, 1951, p. 218]. For subjectivists there are no “correct” probability assignments, and all coherent functions are admissible. The choice of one particular function is regarded as the result of a complex and largely context-dependent procedure. To be sure, the evaluation of probability should take into account all available evidence, including frequencies and symmetries. However, it would be a mistake to put these elements, which are useful ingredients of the evaluation of probability, at the basis of its definition. De Finetti calls attention to the fact that the evaluation of probability involves both objective and subjective elements. In his words: “Every probability evaluation essentially depends on two components: (1) the objective component, consisting of the evidence of known data and facts; and (2) the subjective component, consisting of the opinion concerning unknown facts based on known evidence” [de Finetti, 1974, p. 7]. The subjective component is seen as unavoidable, and for
The Modern Epistemic Interpretations of Probability: Logicism and Subjectivism
de Finetti the explicit recognition of its role is a prerequisite for the appraisal of objective elements. Subjective elements in no way “destroy the objective elements nor put them aside, but bring forth the implications that originate only after the conjunction of both objective and subjective elements at our disposal” [de Finetti, 1973, p. 366]. De Finetti calls attention to the fact that the collection and exploitation of factual evidence, the objective component of probability judgments, involves subjective elements of various kinds, like the judgment as to which elements are relevant to the problem under consideration and should enter into the evaluation of probabilities. In practical situations a number of other factors influence probability evaluations, including the degree of competence of the evaluator, his optimistic or pessimistic attitudes, the influence exercised by the most recent facts, and the like. Equally subjective for de Finetti is the decision on how to let belief be influenced by objective elements. Typically, when evaluating probability one relies on information regarding frequencies. Within de Finetti’s perspective, the interaction between degrees of belief and frequencies rests on exchangeability. Assuming exchangeability, whenever a considerable amount of information on frequencies is available this will strongly constrain probability assignments. But information on frequencies is often scant, and in this case the problem of how to obtain good probability evaluations becomes crucial. This problem is addressed by de Finetti in a number of writings, partly the fruit of his cooperation with Savage.47 The approach adopted is based on penalty methods, of the kind exemplified by the well-known “Brier’s rule”. Scoring rules like Brier’s are devised to oblige those who make probability evaluations to be as accurate as they can and, if they have to compete with others, to be honest. Such rules play a twofold role within de Finetti’s approach.
In the first place, they offer a suitable tool for an operational definition of probability, which is in fact adopted by de Finetti in his late works. In addition, these rules offer a method for improving probability evaluations made both by a single person and by several people, because they can be employed as methods for exercising “self-control”, as well as a “comparative control” over probability evaluations [de Finetti, 1980, p. 1151].48 The use of such methods finds a simple interpretation within de Finetti’s subjectivism: “though maintaining the subjectivist idea that no fact can prove or disprove belief”, he writes, “I find no difficulty in admitting that any form of comparison between probability evaluations (of myself, of other people) and actual events may be an element influencing my further judgment, of the same status as any other kind of information” [de Finetti, 1962a, p. 360]. De Finetti’s work in this connection is in tune with a widespread attitude, especially among Bayesian statisticians, that has given rise to a vast literature on “well-calibrated” estimation methods. Having clarified that de Finetti’s rejection of objective probability is not tantamount to a denial of objectivity, it should be added that such a rejection leads him to overlook notions like “chance” and “physical probability”. Having embraced
47 See Savage [1971], where such a cooperation is mentioned.
48 For further details, the reader is referred to Dawid and Galavotti [2009].
the pragmatist conviction that science is just a continuation of everyday life, de Finetti never paid much attention to the use made of probability in science, and held that subjective probability can do the whole job. Only the volume Filosofia della probabilità includes a few remarks that are relevant to the point. There de Finetti admits that probability distributions belonging to scientific theories (he refers specifically to statistical mechanics) can be taken as “more solid grounds for subjective opinions” [de Finetti, 1995, English edition 2008, p. 63]. This allows for the conjecture that late in his life de Finetti must have entertained the idea that probabilities encountered in science derive a peculiar “robustness” from scientific theories.49 Unlike Ramsey, however, de Finetti did not feel the need to include in his theory a notion of probability specifically devised for application in science. With de Finetti’s subjectivism, the epistemic conception of probability is committed to a theory that could not be more distant from Laplace’s perspective. Unsurprisingly, de Finetti holds that “the belief that the a priori probabilities are distributed uniformly is a well defined opinion and is just as specific as the belief that these probabilities are distributed in any other perfectly specified manner” [de Finetti, 1951, p. 222]. But what is more important is that the weaker assumption of exchangeability allows for a more flexible inferential method than Laplace’s method based on independence. Last but not least, unlike Laplace, de Finetti is not a determinist. He believes that in the light of modern science, we have to admit that events are not determined with certainty, and therefore determinism is untenable.50 For an empiricist and pragmatist like de Finetti, both determinism and indeterminism are unacceptable, when taken as physical, or even metaphysical, hypotheses; they can at best be useful ways of describing certain facts.
In other words, the alternative between determinism and indeterminism “is undecidable and (I should like to say) illusory. These are metaphysical diatribes over ‘things in themselves’; science is concerned with what ‘appears to us’, and it is not strange that, in order to study these phenomena it may in some cases seem more useful to imagine them from this or that standpoint” [de Finetti, 1976, p. 299].
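The penalty methods mentioned in this section can be made concrete with a small numerical sketch. It is offered only as an illustration, not as de Finetti's own formulation: the function names are hypothetical, and the quadratic penalty is the standard textbook form of Brier's rule. The sketch shows why such a rule rewards honesty: a forecaster whose degree of belief is `belief` minimizes the expected penalty by announcing exactly that value.

```python
def brier_penalty(announced: float, outcome: int) -> float:
    """Quadratic penalty for announcing probability `announced` when the
    event turns out to be `outcome` (1 if it occurs, 0 otherwise)."""
    return (outcome - announced) ** 2


def expected_penalty(belief: float, announced: float) -> float:
    """Expected penalty, computed with the forecaster's own degree of belief."""
    return (belief * brier_penalty(announced, 1)
            + (1 - belief) * brier_penalty(announced, 0))


def best_announcement(belief: float, grid_size: int = 1001) -> float:
    """Grid search for the announcement that minimizes the expected penalty."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda q: expected_penalty(belief, q))


def mean_brier(forecasts, outcomes) -> float:
    """Average penalty over a record of forecasts and observed outcomes;
    lower values indicate better past evaluations."""
    pairs = list(zip(forecasts, outcomes))
    return sum(brier_penalty(f, o) for f, o in pairs) / len(pairs)


if __name__ == "__main__":
    belief = 0.7
    # Announcing anything other than one's actual belief costs more on average.
    for q in (0.5, 0.7, 0.9):
        print(f"announce {q:.1f}: expected penalty {expected_penalty(belief, q):.4f}")
    print("best announcement on the grid:", best_announcement(belief))
    # "Comparative control": two forecasters scored on the same events.
    outcomes = [1, 0, 1, 1, 0]
    print("forecaster A:", mean_brier([0.8, 0.3, 0.7, 0.9, 0.2], outcomes))
    print("forecaster B:", mean_brier([0.5, 0.5, 0.5, 0.5, 0.5], outcomes))
```

The first loop corresponds to the use of the rule to elicit sincere evaluations, and the final comparison to its use as a “self-control” or “comparative control” over evaluations already made.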
CONCLUDING REMARKS
The epistemic approach is a strong trend in the current debate on probability. Of the two interpretations that have been outlined, namely logicism and subjectivism, subjectivism seems by far the more popular, at least within economics and more generally in the social sciences. This can be imputed to a number of reasons, the most obvious being that in the social sciences and economics personal opinions
49 This is argued in some detail in Galavotti [2001] and [2005].
50 The issue of determinism is addressed in de Finetti [1931c] and in the “Appendix” contained in de Finetti [1970]. Some comments on de Finetti’s attitude towards determinism are to be found in Suppes [2009] and Zabell [2009].
and expectations enter directly into the information used to support forecasts, forge hypotheses and build models. The work of Ramsey and de Finetti has exercised a formidable influence on subsequent literature. Under the spell of their ideas, novel research fields have been explored, including the theory of decision and the so-called dynamics of belief developed by authors like Richard Jeffrey, Brian Skyrms and many others.51 Equally impressive is the impact of Ramsey and de Finetti on the literature on exchangeability and Bayesian inference, with the work of L. J. Savage, I. J. Good, Dennis Lindley52 and many others working in their wake. From a philosophical point of view, the pragmatism and pluralism characterizing the subjective approach, especially its insistence on the role of various contextual factors, including the individual judgment of experts in the evaluation of probability, have gained considerable consensus. In the realm of natural sciences the prevailing tendency has always been to regard probability as an empirical notion and to assign it a frequentist interpretation. Exceptions to this tendency are the authors who have sided with logicism. One such exception is Harold Jeffreys, whose perspective was considered in Part I. In a similar vein, under the influence of Boole and Keynes the physicist Richard T. Cox derived the laws of probability from a set of postulates formulated in algebraic terms, introduced as plausibility conditions.53 In addition, Cox investigated the possibility of relating probability to entropy, taken as a measure of information and uncertainty, an idea shared by another physicist, namely Edwin T. Jaynes. A strong supporter of Bayesianism and an admirer of Jeffreys’ work, Jaynes put forward a “principle of maximum entropy” as an “objective” criterion for the choice of priors.54 The problem of suggesting objective criteria for the choice of prior probabilities is a burning topic within recent debate revolving around Bayesianism. 
This has given rise to a specific trend of research, labelled “objective Bayesianism”.55 Work in this connection tends to transpose to the framework of Bayesianism the fundamental divergence between logicism and subjectivism, which essentially amounts to the tenet, held by logicism but not by subjectivism, that a degree of belief should be univocally determined by a given body of evidence. It is on this ground that the influence of logicism on contemporary debates seems more tangible.
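The divergence just described can be illustrated with a toy calculation. It is a sketch under explicit assumptions rather than anything drawn from the authors discussed: the events are taken to be exchangeable, and the initial opinion is assumed to be representable by a Beta(a, b) distribution, so that the final probability of success on the next trial after observing `successes` in `trials` is (a + successes) / (a + b + trials). The function names and the two priors are hypothetical. With a = b = 1 the formula reduces to Laplace's rule of succession.

```python
def predictive_probability(a: float, b: float, successes: int, trials: int) -> float:
    """Final probability of success on the next trial, for an exchangeable
    sequence with a Beta(a, b) initial distribution (an illustrative assumption)."""
    if a <= 0 or b <= 0:
        raise ValueError("Beta parameters must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return (a + successes) / (a + b + trials)


def disagreement(prior_1, prior_2, successes: int, trials: int) -> float:
    """Absolute difference between the final probabilities of two agents
    who hold different initial opinions but see the same evidence."""
    p1 = predictive_probability(*prior_1, successes, trials)
    p2 = predictive_probability(*prior_2, successes, trials)
    return abs(p1 - p2)


if __name__ == "__main__":
    uniform = (1.0, 1.0)  # hypothetical agent with a uniform initial opinion
    skewed = (8.0, 2.0)   # hypothetical agent who initially expects successes
    for successes, trials in [(3, 5), (600, 1000)]:
        gap = disagreement(uniform, skewed, successes, trials)
        print(f"{successes}/{trials}: "
              f"uniform -> {predictive_probability(*uniform, successes, trials):.4f}, "
              f"skewed -> {predictive_probability(*skewed, successes, trials):.4f}, "
              f"gap {gap:.4f}")
```

On the same scant evidence the two coherent agents reach visibly different final probabilities, which is admissible for a subjectivist but not for a logicist; with abundant frequency data the gap nearly vanishes, in line with the role of exchangeability discussed earlier.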
51 In addition to the references reported in footnotes 29 and 37, see Skyrms [1996] and Jeffrey [2004].
52 See Savage [1954], Good [1965] and [1983] and Lindley [1965].
53 See Cox [1946] and [1961].
54 See Jaynes [1983] and [2003].
55 See Williamson [2009].
BIBLIOGRAPHY
[Backhouse and Bateman, 2006] R. E. Backhouse and B. Bateman. A Cunning Purchase: the Life and Work of Maynard Keynes. In [Backhouse and Bateman, 2006, pp. 1-18].
[Backhouse and Bateman, 2006] R. E. Backhouse and B. Bateman, eds. The Cambridge Companion to Keynes, Cambridge: Cambridge University Press, 2006.
[Bateman, 1987] B. Bateman. Keynes’ Changing Conception of Probability. Economics and Philosophy, III, pp. 97-120, 1987.
[Bolzano, 1837] B. Bolzano. Wissenschaftslehre. Sulzbach: Seidel, 1837. English partial edition Theory of Science, ed. by Jan Berg. Dordrecht: Reidel, 1973.
[Boole, 1851] G. Boole. On the Theory of Probabilities, and in Particular on Mitchell’s Problem of the Distribution of Fixed Stars. The Philosophical Magazine, Series 4, I, pp. 521-530, 1851. Reprinted in Boole [1952], pp. 247-259.
[Boole, 1854a] G. Boole. An Investigation of the Laws of Thought, on which are Founded the Mathematical Theories of Logic and Probabilities. London: Walton and Maberly, 1854. Reprinted as George Boole’s Collected Works, vol. 2. Chicago-New York: Open Court, 1916. Reprinted New York: Dover, 1951.
[Boole, 1854b] G. Boole. On a General Method in the Theory of Probabilities. The Philosophical Magazine, Series 4, VIII, pp. 431-44, 1854. Reprinted in Boole [1952], pp. 291-307.
[Boole, 1952] G. Boole. Studies in Logic and Probability, ed. by Rush Rhees. London: Watts and Co, 1952.
[Boole, 1997] G. Boole. Selected Manuscripts on Logic and its Philosophy, ed. by Ivor Grattan-Guinness and Gérard Bornet. Berlin: Birkhäuser, 1997.
[Borel, 1924] É. Borel. À propos d’un traité des probabilités. Revue Philosophique XCVIII, pp. 321-36, 1924. Reprinted in Borel [1972], vol. 4, pp. 2169-2184. English edition Apropos of a Treatise on Probability. In Kyburg and Smokler, eds. [1964], pp. 45-60 (not included in the 1980 edition).
[Borel, 1972] É. Borel. Oeuvres de Émile Borel. 4 volumes. Paris: Éditions du CNRS, 1972.
[Braithwaite, 1946] R. B. Braithwaite. John Maynard Keynes, First Baron Keynes of Tilton. Mind LV, pp. 283-284, 1946.
[Braithwaite, 1975] R. B. Braithwaite. Keynes as a Philosopher. In [Keynes, 1975, pp. 237-246].
[Broad, 1922] C. D. Broad. Critical Notices: A Treatise on Probability by J.M. Keynes. Mind XXXI, pp. 72-85, 1922.
[Broad, 1924] C. D. Broad. Mr. Johnson on the Logical Foundations of Science. Mind XXXIII, pp. 242-269 (part 1), pp. 367-384 (part 2), 1924.
[Cameron and Forrester, 2000] L. Cameron and J. Forrester. Tansley’s Psychoanalytic Network: An Episode out of the Early History of Psychoanalysis in England. Psychoanalysis and History II, pp. 189-256, 2000.
[Carabelli, 1988] A. Carabelli. On Keynes’ Method. London: Macmillan, 1988.
[Carnap, 1950] R. Carnap. Logical Foundations of Probability. Chicago: Chicago University Press, 1950. Second edition with modifications 1962, reprinted 1967.
[Carnap, 1968] R. Carnap. Inductive Logic and Inductive Intuition. In [Lakatos, 1968, pp. 258-267].
[Cook, 1990] A. Cook. Sir Harold Jeffreys. Biographical Memoirs of Fellows of the Royal Society XXXVI, pp. 303-333, 1990.
[Costantini and Galavotti, 1987] D. Costantini and M. C. Galavotti. Johnson e l’interpretazione degli enunciati probabilistici. In L’epistemologia di Cambridge 1850-1950, ed. by Raffaella Simili. Bologna: Il Mulino, pp. 245-62, 1987.
[Costantini and Galavotti, 1997] D. Costantini and M. C. Galavotti, eds. Probability, Dynamics and Causality. Dordrecht-Boston: Kluwer, 1997.
[Cottrell, 1993] A. Cottrell. Keynes’ Theory of Probability and its Relevance to his Economics. Economics and Philosophy IX, pp. 25-51, 1993.
[Cox, 1946] R. T. Cox. Probability, Frequency, and Reasonable Expectation. American Journal of Physics XIV, pp. 1-13, 1946.
[Cox, 1961] R. T. Cox. The Algebra of Probable Inference. Baltimore: The Johns Hopkins University, 1961.
[Dawid and Galavotti, 2009] A. P. Dawid and M. C. Galavotti. De Finetti’s Subjectivism, Objective Probability, and the Empirical Validation of Probability Assessments. In [Galavotti, 2009, pp. 97-114].
[de Finetti, 1929] B. de Finetti. Funzione caratteristica di un fenomeno aleatorio. In Atti del Congresso Internazionale dei Matematici. Bologna: Zanichelli, pp. 179-190, 1929. Also in de Finetti [1981], pp. 97-108.
[de Finetti, 1931a] B. de Finetti. Probabilismo. Logos, pp. 163-219, 1931. English edition Probabilism. In Erkenntnis XXXI, pp. 169-223, 1989.
[de Finetti, 1931b] B. de Finetti. Sul significato soggettivo della probabilità. Fundamenta mathematicae XVII, pp. 298-329, 1931. English edition On the Subjective Meaning of Probability. In de Finetti [1992], pp. 291-321.
[de Finetti, 1931c] B. de Finetti. Le leggi differenziali e la rinuncia al determinismo. Rendiconti del Seminario Matematico della R. Università di Roma, serie 2, VII, pp. 63-74, 1931. English edition Differential Laws and the Renunciation of Determinism. In de Finetti [1992], pp. 323-334.
[de Finetti, 1933] B. de Finetti. Sul concetto di probabilità. Rivista italiana di statistica, economia e finanza V, pp. 723-47, 1933. English edition On the Probability Concept. In de Finetti [1992], pp. 335-352.
[de Finetti, 1937] B. de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l’Institut Henri Poincaré VII, pp. 1-68, 1937. English edition Foresight: its Logical Laws, its Subjective Sources. In Kyburg and Smokler, eds. [1964], pp. 95-158. Also in the second edition (1980), pp. 53-118.
[de Finetti, 1939] B. de Finetti. Punti di vista: Émile Borel. Supplemento statistico ai Nuovi problemi di Politica, Storia, ed Economia V, pp. 61-71, 1939.
[de Finetti, 1951] B. de Finetti. Recent Suggestions for the Reconciliation of Theories of Probability. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, ed. by Jerzy Neyman. Berkeley: University of California Press, pp. 217-225, 1951.
[de Finetti, 1962a] B. de Finetti. Obiettività e oggettività: critica a un miraggio. La Rivista Trimestrale 1, pp. 343-367, 1962.
[de Finetti, 1962b] B. de Finetti. Does it Make Sense to Speak of ‘Good Probability Appraisers’? In The Scientist Speculates. An Anthology of Partly-Baked Ideas, ed. by Irving John Good et al., New York: Basic Books, pp. 357-364, 1962.
[de Finetti, 1968] B. de Finetti. Probability: the Subjectivistic Approach. In La philosophie contemporaine, ed. by Raymond Klibansky. Florence: La Nuova Italia, pp. 45-53, 1968.
[de Finetti, 1969] B. de Finetti. Initial Probabilities: a Prerequisite for any Valid Induction. Synthèse XX, pp. 2-16, 1969.
[de Finetti, 1970] B. de Finetti. Teoria delle probabilità, Torino: Einaudi, 1970. English edition Theory of Probability. New York: Wiley, 1975.
[de Finetti, 1972a] B. de Finetti. Subjective or Objective Probability: is the Dispute Undecidable? Symposia Mathematica IX, pp. 21-36, 1972.
[de Finetti, 1972b] B. de Finetti. Probability, Induction and Statistics. New York: Wiley, 1972.
[de Finetti, 1973] B. de Finetti. Bayesianism: Its Unifying Role for Both the Foundations and the Applications of Statistics. Bulletin of the International Statistical Institute, Proceedings of the 39th Session, pp. 349-68, 1973.
[de Finetti, 1974] B. de Finetti. The Value of Studying Subjective Evaluations of Probability. In The Concept of Probability in Psychological Experiments, ed. by Carl-Axel Staël von Holstein. Dordrecht-Boston: Reidel, pp. 1-14, 1974.
[de Finetti, 1976] B. de Finetti. Probability: Beware of Falsifications! Scientia LXX, pp. 283-303, 1976. Reprinted in Kyburg and Smokler, eds. [1964], second edition 1980, pp. 194-224 (not in the first edition).
[de Finetti, 1980] B. de Finetti. Probabilità. In Enciclopedia Einaudi. Torino: Einaudi. Vol. 10, pp. 1146-87, 1980.
[de Finetti, 1981] B. de Finetti. Scritti (1926-1930). Padua: CEDAM, 1981.
[de Finetti, 1982] B. de Finetti. Probability and my Life. In The Making of Statisticians, ed. by Joseph Gani. New York: Springer, pp. 4-12, 1982.
[de Finetti, 1992] B. de Finetti. Probabilità e induzione (Induction and Probability), ed. by Paola Monari and Daniela Cocchi. Bologna: CLUEB, 1992. (A collection of de Finetti’s papers both in Italian and in English.)
[de Finetti, 1995] B. de Finetti. Filosofia della probabilità, ed. by Alberto Mura. Milan: Il Saggiatore, 1995. English edition Philosophical Lectures on Probability, ed. by Alberto Mura. Dordrecht: Springer, 2008.
[de Finetti, 2006] B. de Finetti. L’invenzione della verità. Milan: Cortina, 2006.
[De Morgan, 1837] A. De Morgan. Theory of Probabilities. In Encyclopaedia Metropolitana, 1837.
[De Morgan, 1838] A. De Morgan. An Essay on Probabilities, and on their Applications to Life, Contingencies and Insurance Offices. London: Longman, 1838.
[De Morgan, 1847] A. De Morgan. Formal Logic: or, The Calculus of Inference, Necessary and Probable. London: Taylor and Walton, 1847. Reprinted London: Open Court, 1926.
[De Morgan, 1882] S. E. De Morgan. Memoir of Augustus De Morgan. London: Longman, 1882.
[Di Maio, 1994] M. C. Di Maio. Review of F.P. Ramsey, Notes on Philosophy, Probability and Mathematics. Philosophy of Science LXI, pp. 487-489, 1994.
[Donkin, 1851] W. Donkin. On Certain Questions Relating to the Theory of Probabilities. The Philosophical Magazine, Series IV, I, pp. 353-368, 458-466; II, pp. 55-60, 1851.
[Dummett, 1993] M. Dummett. Origins of Analytical Philosophy. London: Duckworth, 1993.
[Gabbay and Woods, 2008] D. Gabbay and J. Woods, eds. Handbook of the History of Logic. Volume IV: British Logic in the Nineteenth Century. Amsterdam: Elsevier, 2008.
[Galavotti, 1989] M. C. Galavotti. Anti-realism in the Philosophy of Probability: Bruno de Finetti’s Subjectivism. Erkenntnis XXXI, pp. 239-261, 1989.
[Galavotti, 1991] M. C. Galavotti. The Notion of Subjective Probability in the Work of Ramsey and de Finetti. Theoria LVII, pp. 239-259, 1991.
[Galavotti, 1995] M. C. Galavotti. F.P. Ramsey and the Notion of ‘Chance’. In The British Tradition in the 20th Century Philosophy. Proceedings of the 17th International Wittgenstein Symposium, ed. by Jaakko Hintikka and Klaus Puhl. Vienna: Holder-Pichler-Tempsky, pp. 330-340, 1995.
[Galavotti, 1999] M. C. Galavotti. Some Remarks on Objective Chance (F.P. Ramsey, K.R. Popper and N.R. Campbell). In Language, Quantum, Music, ed. by Maria Luisa Dalla Chiara, Roberto Giuntini and Federico Laudisa. Dordrecht-Boston: Kluwer, pp. 73-82, 1999.
[Galavotti, 2001] M. C. Galavotti. Subjectivism, Objectivism and Objectivity in Bruno de Finetti’s Bayesianism. In Foundations of Bayesianism, ed. by David Corfield and Jon Williamson. Dordrecht-Boston: Kluwer, pp. 161-174, 2001.
[Galavotti, 2003] M. C. Galavotti. Harold Jeffreys’ Probabilistic Epistemology: Between Logicism and Subjectivism. British Journal for the Philosophy of Science LIV, pp. 43-57, 2003.
[Galavotti, 2005] M. C. Galavotti. Philosophical Introduction to Probability. Stanford: CSLI, 2005.
[Galavotti, 2006] M. C. Galavotti, ed. Cambridge and Vienna. Frank P. Ramsey and the Vienna Circle. Dordrecht: Springer, 2006.
[Galavotti, 2009] M. C. Galavotti, ed. Bruno de Finetti, Radical Probabilist. London: College Publications, 2009.
[Gillies, 2000] D. Gillies. Philosophical Theories of Probability. London-New York: Routledge, 2000.
[Gillies, 2006] D. Gillies. Keynes and Probability. In [Backhouse and Bateman, 2006, pp. 199-216].
[Good, 1965] I. J. Good. The Estimation of Probabilities. An Essay on Modern Bayesian Methods. Cambridge, Mass.: MIT Press, 1965.
[Good, 1967] I. J. Good. On the Principle of Total Evidence. British Journal for the Philosophy of Science XVIII, pp. 319-321, 1967. Reprinted in Good [1983], pp. 178-180.
[Good, 1983] I. J. Good. Good Thinking. The Foundations of Probability and its Applications. Minneapolis: University of Minnesota Press, 1983.
[Hacking, 1971] I. Hacking. The Leibniz-Carnap Program for Inductive Logic. The Journal of Philosophy LXVIII, pp. 597-610, 1971.
[Hacking, 1975] I. Hacking. The Emergence of Probability. Cambridge: Cambridge University Press, 1975.
[Hailperin, 1976] T. Hailperin. Boole’s Logic and Probability. Amsterdam: North Holland, 1976.
[Harrod, 1951] R. F. Harrod. The Life of John Maynard Keynes. London: Macmillan, 1951.
[Howie, 2002] D. Howie. Interpreting Probability. Cambridge: Cambridge University Press, 2002.
[Howson, 1988] C. Howson. On the Consistency of Jeffreys’ Simplicity Postulate, and its Role in Bayesian Inference. The Philosophical Quarterly XXXVIII, pp. 68-83, 1988.
[Howson, 2006] C. Howson. Scientific Reasoning and the Bayesian Interpretation of Probability. In Contemporary Perspectives in Philosophy and Methodology of Science, eds. Wenceslao J. Gonzalez and Jesus Alcolea, La Coruña: Netbiblo, pp. 31-45, 2006.
[Jaynes, 1983] E. T. Jaynes. Papers on Probability, Statistics and Statistical Physics, ed. R. Rosenkrantz. Dordrecht: Reidel, 1983.
[Jaynes, 2003] E. T. Jaynes. Probability Theory: The Logic of Science, ed. G.L. Bretthorst. Cambridge: Cambridge University Press, 2003. http://bayes.wustl.edu.
[Jeffrey, 1965] R. C. Jeffrey. The Logic of Decision. Chicago: The University of Chicago Press, 1965. Second edition Chicago: The University of Chicago Press, 1983.
[Jeffrey, 1991] R. C. Jeffrey. After Carnap. Erkenntnis XXXV, pp. 255-62, 1991.
[Jeffrey, 1992a] R. C. Jeffrey. Probability and the Art of Judgment. Cambridge: Cambridge University Press, 1992.
[Jeffrey, 1992b] R. C. Jeffrey. Radical Probabilism (Prospectus for a User’s Manual). In Rationality in Epistemology, ed. by Enrique Villanueva. Atascadero, Cal.: Ridgeview, pp. 193-204, 1992.
[Jeffrey, 1992c] R. C. Jeffrey. De Finetti’s Radical Probabilism. In [de Finetti, 1992, pp. 263-275].
[Jeffrey, 2004] R. C. Jeffrey. Subjective Probability: The Real Thing. Cambridge: Cambridge University Press, 2004.
[Jeffreys, 1922] H. Jeffreys. Review of J.M. Keynes, A Treatise on Probability. Nature CIX, pp. 132-3, 1922. Also in Collected Papers VI, pp. 253-6.
[Jeffreys, 1931] H. Jeffreys. Scientific Inference. Cambridge: Cambridge University Press, 1931. Reprinted with Addenda 1937, 2nd modified edition 1957, 1973.
[Jeffreys, 1933] H. Jeffreys. Probability, Statistics and the Theory of Errors. Proceedings of the Royal Society, Series A, CXL, pp. 523-535, 1933.
[Jeffreys, 1934] H. Jeffreys. Probability and Scientific Method. Proceedings of the Royal Society, Series A, CXLVI, pp. 9-16, 1934.
[Jeffreys, 1936a] H. Jeffreys. The Problem of Inference. Mind XLV, pp. 324-333, 1936.
[Jeffreys, 1936b] H. Jeffreys. On Some Criticisms of the Theory of Probability. Philosophical Magazine XXII, pp. 337-359, 1936.
[Jeffreys, 1937] H. Jeffreys. Scientific Method, Causality, and Reality. Proceedings of the Aristotelian Society, New Series, XXXVII, pp. 61-70, 1937.
[Jeffreys, 1939] H. Jeffreys. Theory of Probability. Oxford: Clarendon Press, 1939. 2nd modified edition 1948, 1961, 1983.
[Jeffreys, 1955] H. Jeffreys. The Present Position in Probability Theory. The British Journal for the Philosophy of Science V, pp. 275-89. Also in Collected Papers VI, pp. 421-435, 1955.
[Jeffreys and Wrinch, 1919] H. Jeffreys and D. Wrinch. On Certain Aspects of the Theory of Probability. Philosophical Magazine XXXVII 38, pp. 715-731, 1919.
[Jeffreys and Wrinch, 1921] H. Jeffreys and D. Wrinch. On Certain Fundamental Principles of Scientific Inquiry. Philosophical Magazine XLII 42, pp. 369-90 (part I); XLV, pp. 368-74 (part II), 1921.
[Jeffreys and Wrinch, 1923] H. Jeffreys and D. Wrinch. The Theory of Mensuration. Philosophical Magazine XLVI, pp. 1-22, 1923.
[Jeffreys and Swirles, 1971–1977] H. Jeffreys and B. Swirles, eds. Collected Papers of Sir Harold Jeffreys on Geophysics and Other Sciences. 6 volumes. London-Paris-New York: Gordon and Breach Science Publishers, 1971–1977.
[Jevons, 1873] W. S. Jevons. The Principles of Science. London: Macmillan, 1873. Second enlarged edition 1877. Reprinted New York 1958.
[Johnson, 1921, 1922, 1924] W. E. Johnson. Logic. Cambridge: Cambridge University Press. Part I, 1921; Part II, 1922; Part III, 1924. Reprinted New York: Dover, 1964.
[Johnson, 1932] W. E. Johnson. Probability: The Relations of Proposal to Supposal; Probability: Axioms; Probability: The Deductive and the Inductive Problems. Mind XLI, pp. 1-16, 281-296, 409-423, 1932.
[Keynes, 1921] J. M. Keynes. A Treatise on Probability. London: Macmillan, 1921. Reprinted in Keynes [1972], vol. 8.
[Keynes, 1930] J. M. Keynes. Frank Plumpton Ramsey. The Economic Journal, 930, 1930. Reprinted in Keynes [1933] and [1972], pp. 335-346.
[Keynes, 1931] J. M. Keynes. W.E. Johnson. The Times, 15 January 1931. Reprinted in Keynes [1933] and [1972], pp. 349-350.
[Keynes, 1933] J. M. Keynes. Essays in Biography. London: Macmillan, 1933. Second modified edition 1951. Third modified edition in Keynes [1972], vol. 10.
[Keynes, 1936] J. M. Keynes. William Stanley Jevons. Journal of the Royal Statistical Society Part III, 1936. Reprinted in the second edition of Keynes [1933] and in [1972], pp. 109-160.
[Keynes, 1972] J. M. Keynes. The Collected Writings of John Maynard Keynes. Cambridge: Macmillan, 1972.
[Keynes, 1975] M. Keynes, ed. Essays on John Maynard Keynes. Cambridge: Cambridge University Press, 1975.
[Kneale, 1948] W. Kneale. Boole and the Revival of Logic. Mind LVII, pp. 149-175, 1948.
[Knobloch, 1987] E. Knobloch. Émile Borel as a Probabilist. In Krüger, Gigerenzer and Morgan, eds. (1987), vol. 1, pp. 215-33, 1987.
[Krüger et al., 1987] L. Krüger, G. Gigerenzer, and M. Morgan, eds. The Probabilistic Revolution. 2 volumes. Cambridge, Mass.: MIT Press, 1987.
[Kyburg, 1968] H. Kyburg, jnr. The Rule of Detachment in Inductive Logic. In [Lakatos, 1968, pp. 98-165].
[Kyburg and Smokler, 1964] H. Kyburg, jnr. and H. Smokler, eds. Studies in Subjective Probability. New York-London-Sydney: Wiley, 1964. Second modified edition Huntington (N.Y.): Krieger, 1980.
[Lakatos, 1968] I. Lakatos, ed. The Problem of Inductive Logic. Amsterdam: North-Holland, 1968.
[Levy, 1979] P. Levy. G.E. Moore and the Cambridge Apostles. Oxford-New York: Oxford University Press, 1979.
[Lindley, 1991] D. Lindley. Sir Harold Jeffreys. Chance IV, pp. 10-21, 1991.
[Lindley, 1965] D. Lindley. Introduction to Probability and Statistics. Cambridge: Cambridge University Press, 1965.
[MacHale, 1985] D. MacHale. George Boole. His Life and Work. Dublin: Boole Press, 1985.
[Mellor, 1995] H. Mellor, ed. Better than the Stars. Philosophy LXX, pp. 243-262, 1995.
[von Plato, 1994] J. von Plato. Creating Modern Probability. Cambridge-New York: Cambridge University Press, 1994.
[Ramsey, 1922] F. P. Ramsey. Mr. Keynes on Probability. The Cambridge Magazine XI, pp. 3-5. Reprinted in The British Journal for the Philosophy of Science XL (1989), pp. 219-222, 1922.
[Ramsey, 1931] F. P. Ramsey. The Foundations of Mathematics and Other Logical Essays, ed. by Richard Bevan Braithwaite. London: Routledge and Kegan Paul, 1931.
[Ramsey, 1990a] F. P. Ramsey. Philosophical Papers, ed. by Hugh Mellor. Cambridge: Cambridge University Press, 1990.
[Ramsey, 1990b] F. P. Ramsey. Weight or the Value of Knowledge. British Journal for the Philosophy of Science XLI, pp. 1-4, 1990.
[Ramsey, 1991a] F. P. Ramsey. Notes on Philosophy, Probability and Mathematics, ed. by Maria Carla Galavotti. Naples: Bibliopolis, 1991.
[Ramsey, 1991b] F. P. Ramsey. On Truth, ed. by Nicholas Rescher and Ulrich Majer. Dordrecht-Boston: Kluwer, 1991.
[Sahlin, 1990] N.-E. Sahlin. The Philosophy of F.P. Ramsey. Cambridge: Cambridge University Press, 1990.
[Savage, 1954] L. J. Savage. Foundations of Statistics. New York: Wiley, 1954.
[Savage, 1971] L. J. Savage. Elicitation of Personal Probabilities and Expectations. Journal of the American Statistical Association LXVI, pp. 783-801, 1971.
[Skidelsky, 1983-1992] R. Skidelsky. John Maynard Keynes. 2 volumes. London: Macmillan, 1983-1992.
[Skyrms, 1990] B. Skyrms. The Dynamics of Rational Deliberation. Cambridge, Mass.: Harvard University Press, 1990.
[Skyrms, 1996] B. Skyrms. The Structure of Radical Probabilism. Erkenntnis XLV, pp. 286-297, 1996. Reprinted in Costantini and Galavotti, eds. (1997), pp. 145-157.
[Skyrms, 2006] B. Skyrms. Discovering ‘Weight, or the Value of Knowledge’. In [Galavotti, 2006, pp. 55-66].
[Skyrms and Harper, 1988] B. Skyrms and W. L. Harper, eds. Causation, Chance, and Credence, 2 vols., Dordrecht-Boston: Kluwer, 1988.
[Suppes, 2006] P. Suppes. Ramsey’s Psychological Theory of Belief. In [Galavotti, 2006, pp. 55-66].
[Suppes, 2009] P. Suppes. Some philosophical reflections on de Finetti’s thought. In [Galavotti, 2009, pp. 19-40].
[Taylor, 2006] G. Taylor. Frank Ramsey — A Biographical Sketch. In [Galavotti, 2006, pp. 1-18].
[Williamson, 2009] J. Williamson. Philosophies of Probability: Objective Bayesianism and its Challenges. In Handbook of the Philosophy of Mathematics, volume IX of the Handbook of the Philosophy of Science, ed. A. Irvine, Amsterdam: Elsevier. 2009.
[Zabell, 1982] S. Zabell. W.E. Johnson’s ‘Sufficientness’ Postulate. The Annals of Statistics X, pp. 1091-9. 1982. Reprinted in [Zabell, 2005, pp. 84-95].
[Zabell, 1988] S. Zabell. Symmetry and its Discontents. In [Skyrms and Harper, 1988, vol I, pp. 155-190]. Reprinted in [Zabell, 2005, pp. 3-37].
[Zabell, 1989] S. Zabell. The Rule of Succession. Erkenntnis XXXI, pp. 283-321, 1989. Reprinted in [Zabell, 2005, pp. 38-73].
[Zabell, 1991] S. Zabell. Ramsey, Truth and Probability. Theoria LVII, pp. 210-238, 1991. Reprinted in [Zabell, 2005, pp. 119-141].
[Zabell, 2005] S. Zabell. Symmetry and its Discontents. Cambridge: Cambridge University Press, 2005.
[Zabell, 2009] S. Zabell. De Finetti, chance, quantum physics. In [Galavotti, 2009, pp. 59-83].
POPPER AND HYPOTHETICO-DEDUCTIVISM

Alan Musgrave

Popper famously declared that induction is a myth. This thesis, if true, makes nonsense of the current volume. But is the thesis true? And, before we get to that, what precisely does it mean?

Popper is a deductivist. He thinks that whenever we reason, we reason deductively or are best reconstructed as reasoning deductively. Most philosophers disagree. Most philosophers think that most reasoning is non-deductive. To understand why most philosophers think this, we have to look at the functions of reason or argument, and see that deduction seems quite unsuited to serve some of those functions.

What are the functions of argument? Why do people reason or argue? One function of reason or argument is to form new beliefs or come up with new hypotheses. Another is to prove or justify or give reasons for the beliefs or hypotheses that we have formed. A third is to explore the consequences of our beliefs or hypotheses in order to try to criticise them. We need a logic of discovery, a logic of justification, and a logic of criticism.

It is usually accepted that deductive logic is fine so far as the logic of criticism goes. “Exploring the consequences of our hypotheses” means exploring the deductive consequences of our hypotheses. Criticism proceeds by deducing some conclusion, showing that it is not true (because it does not square with experience, experiment, or something else that we believe), and arguing that some premise must therefore be false as well. Criticism only works if the reasoning is deductively valid, if the conclusion is ‘contained in’ the premises, if the reasoning is not ‘ampliative’. That is what entitles us to say that if the conclusion is false some premise must be false as well. If our argument were ampliative, criticism would not work. Showing that the conclusion is false would not entitle us to say that some premise must be false as well.
But deduction’s strength so far as criticism is concerned seems to be a weakness as far as discovery and justification are concerned. In a valid deduction the conclusion is contained in the premises, does not ‘amplify’ them, says nothing new. If we want to come up with new beliefs or hypotheses, deduction obviously cannot help us. And if we want to justify a belief, deduction cannot help us once more. Deducing the belief we want to justify from another stronger belief is bound to be question-begging. The logics of discovery and justification must be non-deductive or ampliative. The conclusions of the arguments involved cannot be contained in the premises, but must ‘amplify’ them and say something new. Or so said the critics of deductive logic, down the ages.

Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
ENTHYMEMES AND THEIR DEDUCTIVIST RECONSTRUCTIONS

Deductivists deem ampliative reasoning invalid. If most reasoning is ampliative, then deductivists deem most reasoning invalid. That unpleasant consequence may seem reason enough to reject deductivism. To be sure, logic has a critical function. The task of the logician is not just to describe or ‘model’ how people do in fact reason, but also to prescribe how people ought to reason if they are to reason well. But if most reasoning is ampliative, then deductivists seem committed to the view that most of the time we do not reason well. Deductivism is a utopian ethic according to which most ordinary logical behaviour is thoroughly immoral!

Deductivists have a way of avoiding this unpleasant consequence. Folk seldom spell out all the premises of their arguments. Most reasoning, including most everyday reasoning, is in enthymemes, arguments with unstated or ‘missing’ premises. An argument which is invalid as stated can often be validated by spelling out its missing premise. But deductivists must be careful here. Any invalid argument from premise(s) P to conclusion C can be validated if we count it an enthymeme and add the missing premise “If P then C”. If deductivists are not careful, they will end up saying that people never argue invalidly at all! Deductive logic will be deprived of any critical function! Deductivists will now have an ethical gospel of relaxation: “Reason how you will, I will validate it”.

There is an obvious deductivist response to this. Not every argument should be counted as having a missing premise. It must be clear from the context of production of an argument that it has a missing premise and what that is. Suppose somebody argues that it must be raining, because when it rains the streets get wet and the streets are wet. Deductivists do not validate this argument. They say it is a fallacy, an example of the fallacy of affirming the consequent.
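The two logical points just made can be checked mechanically for the propositional case. The following is a minimal sketch, not anything the chapter itself supplies: it assumes a brute-force truth-table test for validity, and the helper names `implies` and `is_valid` are hypothetical. It shows that the rain argument (affirming the consequent) is invalid, and that an arbitrary invalid argument from P to C becomes valid once “If P then C” is added as a premise.

```python
from itertools import product

def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b

def is_valid(premises, conclusion, num_vars):
    """Deductively valid iff no truth assignment makes every premise true
    and the conclusion false (hypothetical helper for this sketch)."""
    for values in product([False, True], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

# Variables: rain ("it is raining"), wet ("the streets are wet").
# Premises: if it rains the streets get wet; the streets are wet.
affirming_the_consequent = is_valid(
    [lambda rain, wet: implies(rain, wet), lambda rain, wet: wet],
    lambda rain, wet: rain,
    2,
)

# Variables: p (the stated premise) and c (the conclusion), logically unrelated.
bare_argument = is_valid([lambda p, c: p], lambda p, c: c, 2)
padded_argument = is_valid(
    [lambda p, c: p, lambda p, c: implies(p, c)],  # add [If P then C]
    lambda p, c: c,
    2,
)

print(affirming_the_consequent)  # False
print(bare_argument)             # False
print(padded_argument)           # True
```

The sketch only settles validity. Whether a given arguer really had the added premise in mind is the pragmatic question, and no truth table answers it.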
As to the uncertainty about what the missing premise of an argument is, if there is one, logic teachers have for generations asked students to supply the missing premises of arguments presented to them — and have marked their answers right or wrong. Still, this is not a logical issue, but a pragmatic one (for want of a better word). It must be admitted that in some cases it may not be clear from the context whether there is a missing premise and what it is. But then, if it matters, we can try to find out. Most philosophers do not like deductivist reconstructions. Most philosophers say that ampliative reasoning is not to be validated in this way. Ampliative reasoning is deductively invalid, to be sure, but it is perfectly good reasoning nevertheless. Or at least, some of it is. All of it is invalid, but some of it is cogent and some of it is not. Thus the inductive logicians, or at least the best of them, try to work out when an ampliative argument is cogent and when not. Here they face the same problem as the deductivists. If it is not clear what the missing premise is, that would convert a real-life argument into a valid deduction, then it is equally unclear what the ampliative rule or principle is, that the real-life arguer is supposed to be employing. Unclarity about enthymemes cannot comfort inductivists.
Furthermore, inductivists do not think that all invalid arguments are ampliative. They agree with deductivists that you commit a fallacy if you argue that it must be raining because when it rains the streets get wet and the streets are wet. They do not say that this argument, although it is not deductively valid, is a perfectly cogent argument in some fancy ampliative logic. Yet when it comes to other deductively invalid arguments, this is precisely what inductivists do say. This is puzzling.

‘AUTOMOBILE LOGIC’

To see how puzzling it is, suppose somebody produces the following argument:

    American cars are better than Japanese cars.
    Therefore, Cadillacs are better than Japanese cars.

What to say of this argument? You might quarrel with the premise, or with the conclusion, or with both. But what to say if you are a logician? Obviously, if you are a deductive logician, you will say that the argument is invalid, that the conclusion does not follow from the premise. That verdict may seem a bit harsh. You might soften it by suspecting an enthymeme. Perhaps our arguer also has an unstated or missing premise to the effect that Cadillacs are American cars. If we spell that premise out, we get the perfectly valid argument:

    [Cadillacs are American cars.]
    American cars are better than Japanese cars.
    Therefore, Cadillacs are better than Japanese cars.

Having reconstructed the argument this way, we can turn to the more interesting question of whether the premises and conclusion of the argument are true. This is all trivial, and familiar to countless generations of logic students.

[By the way, in my deductivist reconstructions of enthymemes, I place the missing premise in square brackets. As we will see, these missing premises are often general claims. But it is not to be assumed, as it often is, that missing premises are always general. Ordinary reasoners often suppress particular premises as well as general ones.
A real-life arguer might well say “We are all mortal — so one day George Bush will die” as well as “George Bush is only human — so one day he will die”. In the former case the suppressed premise (“George Bush is human”) is particular rather than general.] There is another way to soften the harsh verdict that the argument with which we started is invalid. You might say that though the argument is deductively invalid, it is a perfectly cogent argument in a special non-deductive or inductive or ampliative logic that deals with arguments about automobiles. This automobile logic is characterised by special rules of inference such as “Cadillacs are American cars”. (Do not object that “Cadillacs are American cars” is not a rule of inference,
but a sociological hypothesis about automobile manufacture. Recast it as a rule of inference: “From a premise of the form ‘x is a Cadillac’, infer a conclusion of the form ‘x is an American car’”.) Unlike the formal or topic-neutral rules of deductive logic, the rules of automobile logic are material or topic-specific. The formally valid arguments of deductive logic are boring, for their conclusions can contain nothing new. But the ‘materially valid’ or cogent arguments of automobile logic are exciting, because their conclusions do contain something new.

So, the deductivist ploy regarding an invalid argument she wishes to appropriate is to reconstruct it as an enthymeme and supply its missing premise. And the inductivist ploy regarding a valid argument he wishes to appropriate is to reconstruct it (perhaps ‘deconstruct it’ would be better) as an ampliative argument where some necessary premise becomes a material rule of inference. Both ploys risk being applied trivially, as the example of automobile logic makes clear.

Enough! Automobile logic is silly. Has any serious thinker ever gone in for anything like automobile logic? Yes, they have, and on a massive scale. But before I document that claim, a complication needs to be considered.

FORMAL AND SEMANTIC VALIDITY

Suppose somebody produces the following argument:

    Herbert is a bachelor.
    Therefore, he is unhappy.

This argument is invalid. As before, deductivists will treat it as an enthymeme with a missing premise to the effect that bachelors are unhappy, to obtain:

    [Bachelors are unhappy.]
    Herbert is a bachelor.
    Therefore, he is unhappy.

This argument is valid, and has a dubious hypothesis about the virtues of matrimony as its missing premise. As with automobile logic, we would think ill of a philosopher who said that the original argument, though formally invalid, is a materially valid or cogent argument in matrimonial logic, which has “Bachelors are unhappy” as one of its interesting inference-licenses.
But now suppose somebody produces the argument:

    Herbert is a bachelor.
    Therefore, he is unmarried.

As before, deductivists might insist upon treating this as an enthymeme and supplying its missing premise to obtain:

    [Bachelors are unmarried.]
    Herbert is a bachelor.
    Therefore, Herbert is unmarried.
But it may be objected that the original argument is already valid: it is impossible for its premise to be true and its conclusion false. This is because its missing premise is analytically or necessarily true, true by virtue of the meaning of the word ‘bachelor’. The argument is not formally valid, to be sure, but it is semantically valid. Deductivists, it may be objected, simply overlook the fact that there are semantically valid arguments as well as formally valid ones. This objection raises the vexed question of whether the analytic/synthetic distinction is viable. Are there, as well as logical truths that are true in virtue of their logical form, also analytic truths that are true in virtue of the meanings of their non-logical terms? Quine said NO. He was a logical purist who insisted that so-called ‘semantically valid’ arguments are to be treated as enthymemes and converted into formally valid arguments by spelling out their missing premises. This purist policy can be defended by pointing to the notorious vagueness of the category of analytic truths. There are a few clear cases (like “Bachelors are unmarried”?), many clear non-cases (like “Bachelors are unhappy”?), and an enormous grey area in between. Or so most philosophers think. Quine thinks that the grey area is all encompassing, and that there is no analytic/synthetic distinction to be drawn, not even a vague one. Quine thinks the purist policy is the only game in town. But this purist policy hides a deep problem. What are formal validity and logical truth? They are validity and truth in virtue of logical form. And what is logical form? You get at it by distinguishing logical words from non-logical or descriptive words. But formal validity and logical truth turn on the meanings of the logical words. Formal validity is just a species of semantic validity, and logical truth just a species of analytic truth. 
(As is well-known, formal validity of the argument “P, therefore C” is tantamount to logical truth of the conditional statement “If P, then C” corresponding to the argument.) The deep problem is that there seems no sharp distinction between logical and descriptive words. There are a few clear cases of logical words, and the deductive logician’s usual list gives them: propositional connectives, quantifiers, the ‘is’ of predication, and the ‘is’ of identity. There are many clear cases of non-logical or descriptive words. But then there is a grey area, the most notorious examples being comparatives: ‘taller than’, ‘older than’, generally ‘X-er than’. These seem to be mixed cases, with a descriptive component, the X, and a logical component, the ‘-er’. What are we to say of statements expressing the transitivity of comparatives, statements of the form “If a is X-er than b, and b is X-er than c, then a is X-er than c”? Do we count these logical truths, and the arguments corresponding to them formally valid? Or do we count them analytic truths, and the arguments corresponding to them semantically valid? Or do we count them as synthetic truths, and the arguments corresponding to them simply invalid unless we add the transitivity of ‘X-er than’ as a suppressed premise?

Quine favours the last option. His famous attack on the analytic/synthetic distinction proceeds for the most part by taking the notion of logical truth for granted. Logical truth is truth by virtue of the meanings of ... (here comes
the usual list of logical words). Quine then contrasts the precision of this notion with the vagueness of the notion of analytic truth. But then, all of a sudden and without much argument, we are told that all truths depend for their truth on the way the world is, that the whole of our knowledge faces the tribunal of experience as a corporate body, and that experience might teach us that even so-called ‘logical truths’ like “Either P or it is not the case that P” are false. Yet elsewhere Quine tells us that the deviant logician who wants to reject the law of excluded middle has a problem: he wants to convict us of an empirical error, but all he can really do is propose a new concept of negation. In short, the logical truths are restored: the law of excluded middle is true by virtue of the meanings of ‘or’ and ‘not’, and to reject it is to change the meaning of one of these terms.

Fortunately, these foundational issues are irrelevant to our main concern, which is with non-deductive or inductive logic. Recognition of the category of analytic but non-logical truth, and of semantic but non-logical validity, does not take us outside the realm of deduction. It does not get us to inductive or ampliative inferences. So let us return to that issue. (Later I will consider the view that inductive arguments are actually semantically valid deductive arguments.)

HISTORICAL INTERLUDE: MILL VERSUS ARISTOTLE

As we all know, deductive logic was founded by Aristotle, who worked out the logic of categorical propositions and the theory of the syllogism. Aristotle found out that there were 256 possible syllogisms that folk might use, and determined that only 24 of these were valid. So precious were these valid syllogisms that each was given its own name, and in the 13th century Pope John XXI put all their names into a rhyme. Thereafter every educated person had to learn the rhyme and remember the valid syllogisms so that they might reason well.
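The figures 256 and 24 can be reproduced by brute force. What follows is a rough sketch under stated assumptions, not a reconstruction of Aristotle's own procedure: it assumes the traditional scheme of four figures and three categorical statements of form A, E, I or O, and it assumes that all three terms are non-empty (existential import), which the text above does not spell out. All function and variable names are hypothetical. A model is a choice of which of the eight Venn regions for the terms S, M and P are occupied, which is enough to decide validity for arguments of this kind.

```python
from itertools import product

S, M, P = 0, 1, 2  # minor, middle and major term
REGIONS = list(product([False, True], repeat=3))  # (in S, in M, in P)

def holds(form, x, y, occupied):
    """Truth of a categorical statement with subject x and predicate y."""
    some_x_is_y = any(r[x] and r[y] for r in occupied)
    some_x_not_y = any(r[x] and not r[y] for r in occupied)
    return {"A": not some_x_not_y,   # all x are y
            "E": not some_x_is_y,    # no x are y
            "I": some_x_is_y,        # some x are y
            "O": some_x_not_y}[form] # some x are not y

# Term order of (major premise, minor premise) in each figure.
FIGURES = {1: ((M, P), (S, M)), 2: ((P, M), (S, M)),
           3: ((M, P), (M, S)), 4: ((P, M), (M, S))}

def models(existential_import):
    """Every way of marking the eight regions occupied or empty."""
    for bits in product([False, True], repeat=len(REGIONS)):
        occupied = [r for r, b in zip(REGIONS, bits) if b]
        if existential_import and not all(
                any(r[t] for r in occupied) for t in (S, M, P)):
            continue  # assumption: skip models with an empty term
        yield occupied

def valid_syllogisms(existential_import):
    all_models = list(models(existential_import))
    valid = []
    for figure, (major, minor) in FIGURES.items():
        for maj, mnr, concl in product("AEIO", repeat=3):
            if all(holds(concl, S, P, m) for m in all_models
                   if holds(maj, major[0], major[1], m)
                   and holds(mnr, minor[0], minor[1], m)):
                valid.append((maj + mnr + concl, figure))
    return valid

total_forms = len(FIGURES) * 4 ** 3
with_import = valid_syllogisms(True)
without_import = valid_syllogisms(False)
print(total_forms)          # 256
print(len(with_import))     # 24
print(len(without_import))  # 15
```

On this sketch the count of 24 depends on the non-emptiness assumption. If empty terms are allowed, nine forms that draw a particular conclusion from two universal premises drop out and 15 remain.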
Yet down the centuries people often complained that Aristotle’s logic was trivial, uninformative or useless, precisely because valid syllogisms were not ampliative. The grumbling grew loud during the Scientific Revolution. The philosophers of the Scientific Revolution wanted to put Aristotle behind them. They dreamt of inferences that would not be trivial or uninformative or useless, ampliative inferences that would lead to something new. And they dreamt of a logic or method that might systematise such inferences, and tell us which of them were good ones, just as Aristotle had told us which syllogisms were good ones. Bacon and Descartes and Locke are but three examples of philosophers who criticised Aristotle’s logic in this way. In the nineteenth century, John Stuart Mill was another.

Aristotle’s critics had a point. Aristotelian logic is trivial in the sense that it deals only with relatively trivial arguments. There are lots of valid deductive arguments that Aristotelian logic cannot deal with. Perhaps the most famous everyday example is “All horses are animals. Therefore, all heads of horses are heads of animals”. More important, it is hopeless to try to capture mathematical reasoning in syllogistic form. But beginning in about 1850 with Boole, deductive logic progressed way beyond Aristotelian logic.

There is an irony of history here. While deductive logic was poor, one could forgive people for thinking that it needed supplementing with some non-deductive logic. But after deductive logic became rich, as it has in the last 150 years, one might suppose that anti-deductivist tendencies might have withered away. But nothing could be further from the truth. Which suggests that the belief in non-deductive or ampliative inference stems from a deeper source than Aristotle’s inability to deal adequately with “All horses are animals. Therefore, all heads of horses are heads of animals”.

It does indeed stem from a deeper source: it stems from the idea that the logics of discovery and justification must be ampliative. Mill argued (syllogistically, by the way!) that all genuine inferences lead to conclusions that are new, while syllogisms lead to nothing new, so that syllogisms are not genuine inferences at all. All genuine inferences, according to Mill, are inductive or ampliative inferences:

    All inference is from particulars to particulars. General propositions are merely registers of such inferences already made and short formulae for making more. The major premise of a syllogism, consequently, is a formula of this description; the conclusion is not an inference drawn from the formula, but an inference drawn according to the formula; the real logical antecedent or premise being the particular facts from which the general proposition was collected by induction. [Mill, 1843: II, iii, 4]

What did Mill mean by induction? He meant arguments from experience. He was an empiricist, who thought that knowledge came from experience. But knowledge transcends experience. So we need ampliative or inductive reasoning to get us from premises that experience supplies, to conclusions that transcend those premises. The paradigmatic kind of inductive reasoning is inductive generalisation. Here is an example:

    All observed emeralds were green.
    Therefore, all emeralds are green.

This argument is invalid. But if that verdict seems harsh, deductivists might soften it by reflecting that people seldom state all of their premises. Perhaps this argument has a missing premise, to the effect that unobserved cases resemble observed cases. If we spell that premise out, we get the perfectly valid argument:

    [Unobserved cases resemble observed cases.]
    All observed emeralds were green.
    Therefore, all emeralds are green.

(By the way, if we change the conclusion from a generalisation about all emeralds to a prediction about the next case, we get so-called singular predictive inductive inference. That is what Mill had in mind when he said “All inference is from particulars to particulars”.)
I said earlier that automobile logic is silly. But nearly everybody thinks that inductive logic is not silly. Why? The situation with the emeralds argument is symmetrical with the situation with the argument about Cadillacs. Yet most philosophers insist that they be treated differently. Most philosophers would not touch automobile logic with a barge-pole, yet insist that inductive logic must exist. Why? What is the difference between the two cases?

One obvious difference is that the missing premise of the Cadillac argument is true, while the missing premise of the emeralds argument is false. Another difference, connected with the first, is that “Unobserved cases resemble observed cases” is a much more general hypothesis than “Cadillacs are American cars”. So what? It will hardly do to say that arguments with true missing premises are to be reconstructed as deductive, while arguments with false missing premises are not. Where, in the continuum of generality, will we draw a line below which we have empirical hypotheses, and above which we have material rules of inductive logic? And what do we gain by disguising a bit of false and human chauvinistic metaphysics like “Unobserved cases [always] resemble observed cases” as a principle of some fancy ampliative inductive logic?

We gain nothing, of course, and inductive logicians are smart enough to realise this. So they warm to the task of getting more plausible inductive rules than “Unobserved cases resemble observed cases”. With characteristic clarity, Mill put his finger precisely on one big problem they face. Mill said that in some cases a single observation is “sufficient for a complete induction” (as he put it), while in other cases a great many observations are not sufficient. Why? Mill wrote:

    Whoever can answer this question knows more of the philosophy of logic than the wisest of the ancients, and has solved the problem of induction. [1843: III, iii, 3]

The answer to Mill’s question is obvious.
In the first kind of case, we are assuming that what goes for one instance goes for all, whereas in the second kind of case we are not. But philosophers do not like this obvious answer. Peter Achinstein discusses Mill’s own example as follows:

    ... we may need only one observed instance of a chemical fact about a substance to validly generalise to all instances of that substance, whereas many observed instances of black crows are required to [validly?] generalise about all crows. Presumably this is due to the empirical fact that instances of chemical properties of substances tend to be uniform, whereas bird coloration, even in the same species, tend[s] not to be. [Achinstein, 2009: 8]

Quite so. And if we write these empirical facts (or rather, empirical assumptions or hypotheses) as explicit premises, then our arguments become deductions. In the first case we have a valid deduction from premises we think true, in the second case we have a valid deduction from premises one of which we think false (namely,
that what goes for the colour of one or many birds of a kind goes for all of them). Mill’s question is answered, and his so-called ‘problem of induction’ is solved.

Other inductive logicians say that an inductive generalisation will only be ‘cogent’ if the observed cases are typical cases, or only if the observed cases are a representative sample of all the cases. But reflection on what ‘typical’ or ‘representative’ mean just yields a more plausible deductivist reconstruction of the emeralds argument. To say that the emeralds we have observed are ‘typical’ or ‘representative’ is just to say that their shared features are common to all emeralds. Spelling that out yields:

    [Observed emeralds are typical or representative emeralds: their shared features are common to all emeralds.]
    All observed emeralds were green.
    Therefore, all emeralds are green.

Or perhaps the generalizer about emeralds had something even more restricted in mind. Perhaps the hidden assumption was that emeralds form a ‘natural kind’, and that colour is one of the essential or ‘defining’ features of things of that kind. Spelling that out yields:

    [All emeralds have the same colour.]
    All observed emeralds were green.
    Therefore, all emeralds are green.

Of course, the second premise of this argument is now much stronger than it need be. Once we have assumed that all emeralds have the same colour, we need observe only one emerald and then can argue thus:

    [All emeralds have the same colour.]
    This emerald is green.
    Therefore, all emeralds are green.

This is an example of what old logic books called demonstrative induction. It is not induction, but (valid) deduction. Aristotle called it epagoge. He insisted that one observed case is enough for you to “intuit the general principle” provided that the observation yields the essence of the thing [Prior Analytics, 67a22]. Aristotle also described this as “a valid syllogism which springs out of induction [that is, observation]” [Prior Analytics, 68b15].
There is also what people call ‘perfect’ or ‘complete’ or ‘enumerative’ induction, where it is tacitly assumed that we have observed all the instances. Again, this is not induction but deduction. An example is:

    [The observed emeralds are all the emeralds.]
    All the observed emeralds were green.
    Therefore, all emeralds are green.

Popper mistakenly takes Aristotle’s epagoge to be a complete induction [1963: 12, footnote 7].
Finally, there is so-called ‘eliminative induction’. Once again, this is not induction, but just a special kind of deduction. An example is:

    [Either the wife or the mistress or the butler committed the murder.]
    The wife did not do it.
    The mistress did not do it.
    Therefore, the butler did it.

As the example indicates, this is a typical pattern of argument from detective stories. Sherlock Holmes argues that way all the time. So does your car mechanic to find out what is wrong with your car. So does your doctor to find out what is wrong with you. It is also the ‘form of induction’ advocated by Francis Bacon, High Priest of the experimental method of science, to find out the causes of things. Of course, an eliminative induction, though perfectly valid, is only as good as its major and often suppressed premise. You may get a false conclusion, if that premise does not enumerate all the ‘suspects’.

As you can see, it is child’s play, philosophically speaking, to reconstruct patterns of so-called ‘inductive reasoning’ as valid deductions. We have done it with inductive generalisation, singular predictive inference, enumerative induction, demonstrative induction, and eliminative induction. (I did not even mention mathematical induction; everybody agrees that that is deduction.) As I shall show later, one can also do it with abduction, and with its intellectual descendant, inference to the best explanation.

Yet most philosophers do not like these deductivist reconstructions of so-called inductive or ampliative arguments. They prefer to say, with Mill, that whether an ‘inductive generalisation’ is valid or cogent depends on the way the world is. This means that inductive logic, which sorts out which inductive arguments are cogent and which not, becomes an empirical science. Deductive logic is not empirical. Empirical inquiry can tell you that the conclusion of a valid argument is false (and hence the premises as well).
Empirical inquiry can tell you that the premises of a valid argument are false (though not necessarily, of course, its conclusion as well). But neither finding out that the premises are false nor finding out that the conclusion is false, shows that the argument is invalid. Empirical research can produce ‘premise defeaters’ and/or ‘conclusion defeaters’, but it cannot produce ‘argument defeaters’. However, when it comes to inductive arguments, empirical research can provide ‘argument defeaters’ as well. An argument defeater casts doubt on the cogency of the argument, without necessarily impugning the truth of either the premises or the conclusion. The error of ‘psychologism’ was to suppose that logic describes how people think and is a part of empirical psychology. Despite its invocation of ‘cogency’ to parallel the notion of validity, inductive logic is also descriptive, part of empirical science in general, since whether an inductive argument is cogent depends on the way the world is. Deductive validity is an all-or-nothing business, it does not come in degrees. If you have a valid argument, you cannot make it more valid by adding premises. Inductive logic is different. Inductive cogency does come in degrees. You can make
Popper and Hypothetico-deductivism
an inductive argument more cogent or less cogent by adding premises. If I have observed ten green emeralds, I can pretty cogently conclude that all emeralds are green. But my inference will be more cogent if I add a further premise about having observed ten more green emeralds. However, if I add instead the premise that I observed my ten green emeralds in the collections of a friend of mine who has a fetish about collecting green things (green bottles, green postage stamps, green gemstones, and so forth), then my argument becomes less cogent or perhaps not cogent at all. Deductive logic, as well as being non-empirical, is monotonic. You cannot make a valid argument invalid by adding a premise. (Here I ignore the relevance logicians, who think that any valid argument can be invalidated by adding the negation of a premise as another premise.) Inductive logic is non-monotonic. An ‘inductively valid’ or cogent argument can be made invalid or non-cogent by adding a premise. Deductivists prefer to keep logic and empirical science separate. They stick to deductive logic, monotonous and monotonic though it may be. Are they just sticks-in-the-mud, or worse, closet logical positivists? Why not go in for inductive logic, which is exciting rather than monotonous and monotonic? Yet as we have seen, it is child’s play to do without induction, and to reconstruct so-called inductive arguments as hypothetico-deductive ones. So why is everybody except Karl Popper reluctant to do that? Why does everybody believe in induction, and in ampliative reasoning?
The answer lies in the fact that our deductivist reconstructions of so-called inductive or ampliative arguments turn them into hypothetico-deductive arguments, whose missing premises are hypotheses of one kind or another — Cadillacs are American cars, Bachelors are unhappy, Unobserved cases resemble observed cases, All emeralds have the same colour, Instances of chemical properties of substances are uniform, Observed Xs are typical or representative Xs, and so forth. Hypothetico-deductive reasoning is no use to us if we want to justify the conclusions we reach. (It is perfectly good, however, if we want just to arrive at interesting new hypothetical conclusions, if we want a logic of discovery. I shall return to this.) If our interest is justification, then why not render hypotheses invisible by resisting deductivist reconstructions? But this is just to hide the problem of induction, not to solve it.

WITTGENSTEINIAN INSTRUMENTALISM

Mill inaugurated the view that general hypotheses are not premises of our arguments, but rules by which we infer particulars from particulars. The logical positivists said the same thing. They read in Wittgenstein’s Tractatus (1921):

Suppose I am given all elementary propositions: then I can simply ask what propositions I can construct out of them. And then I have all propositions, and that fixes their limits. (4.51)
Alan Musgrave
A proposition is a truth-function of elementary propositions. (5)

All propositions are the results of truth-operations on elementary propositions. (5.3)

All truth-functions are the results of successive applications to elementary propositions of a finite number of truth-operations. (5.32)

Schlick took the ‘elementary propositions’ to be particular observation statements. As usual, there is some dispute whether this reading was correct. But given that reading, general propositions are not genuine propositions at all. They are not (finite) truth-functions of particular observation statements, and so are not verifiable by observation. Given the verifiability theory of meaning, general propositions are meaningless. Thus Schlick on the general laws of science:

It has often been remarked that, strictly, we can never speak of the absolute verification of a law. . . the above-mentioned fact means that a natural law, in principle, does not have the logical character of a statement, but is, rather, a prescription for the formation of statements. The problem of induction consists in asking for a logical justification of universal statements about reality . . . We recognise, with Hume, that there is no logical justification: there can be none, simply because they are not genuine statements. (Schlick, as translated by Popper 1959: 37, note 7)

And Ramsey:

Variable hypotheticals are not judgements but rules for judging. . . . when we assert a causal law we are asserting not a fact, nor an infinite conjunction, nor a connection of universals, but a variable hypothetical which is not strictly a proposition at all but a formula from which we derive propositions. [Ramsey 1931: 241, 251]

If general statements, including general principles and (putative) laws of science, are not true or false propositions, what are they? What exactly are “prescriptions for the formation of statements” (Schlick) or “rules for judging” (Ramsey)?
Wittgenstein’s Tractatus did not help much, with its vaguely Kantian suggestions: Newtonian mechanics, for example, imposes a unified form on the description of the world. . . . Mechanics determines one form of description of the world . . . (6.341) . . . the possibility of describing the world by means of Newtonian mechanics tells us nothing about the world: but what it does tell us something about is the precise way in which it is possible to describe it by those means. (6.342) The whole modern conception of the world is founded on the illusion that the so-called laws of nature are the explanations of natural phenomena. (6.371)
W. H. Watson, a physicist who sat at the master’s feet, put it thus: It should be clear that the laws of mechanics are the laws of our method of representing mechanical phenomena, and since we actually choose a method of representation when we describe the world, it cannot be that the laws of mechanics say anything about the world. [Watson 1938: 52]; this is parroted by [Hanson, 1969: 325] Toulmin and Hanson attempt to clarify this Kantian view by saying that theories are like maps, and that general laws are like the conventions of map-making or ‘laws of projection’: Our rules of projection control what lines it is permissible to draw on the [map]. Our rules of mechanics control what formulae it is permissible to construct as representing phenomena . . . Perhaps what we have called “the laws of nature” are only the laws of our method of representing nature. Perhaps laws show nothing about the natural world. But it does show something about the world that we have found by experience how accurate pictures of the world . . . can be made with the methods we have learned to use. [Hanson, 1969: 325-6]; this parrots [Toulmin, 1953: 108-9] As well as flirting with these vaguely Kantian suggestions, the Wittgensteinians revert (without knowing it) to Mill’s idea that general statements, including the general principles or laws of science, are ‘material’ rules of non-deductive inference. Ryle insisted that “the most ‘meaty’ and determinate hypothetical statements” like “If today is Monday, then tomorrow is Tuesday” or “Ravens are black” are not premises of arguments but material rules of inference or ‘inference-licences’ [Ryle, 1950, 328]. Harre said the same of empirical generalisations: The natural process of prediction of an instance is to state the instance as a consequence of another instance, for example, that a creature is herbivorous follows from the fact that it’s a rabbit. The justification of this move . . . 
takes us back to the generalization or its corresponding conditional . . . These are not premises since they validate but do not belong in the argument that expresses the deduction. It is natural to call them the rules of the deduction. We infer a particular not from a generalization but in accordance with it. [Harre, 1960: 79-80] Toulmin and Hanson say the same of the hypotheses or principles or (putative) laws of science: . . . the role of deduction in physics is not to take us from the more abstract levels of theory to the more concrete . . . Where we make strict, rule-guided inferences in physics is in working out, for instance, where a planet will be next week from a knowledge of its present position, velocity, and so on: this inference is not deduced from the laws of
motion, but drawn in accordance with them, that is, as an application of them. [Toulmin, 1953, 84-5]; see also [Hanson, 1969, 337-8] The idea that universal statements about reality are not genuine statements at all enabled Schlick to solve, or rather sidestep, the problem of induction. Watson agreed: It seems that the expression ‘the correct law of nature’ is not a proper grammatical expression because, not knowing how to establish the truth of a statement employing this form of speech, we have not given it a meaning. [Watson, 1938, 51]; see also [Hanson, 1969, 324] But this just hides the problem — it does not solve it. If observation cannot establish the truth of a universal statement, then neither can observation establish the soundness of a material rule of inference. Humean sceptical questions about the certainty of general hypotheses or the reliability of predictions drawn from them can simply be rephrased as sceptical questions about the usefulness of inference-licences or the reliability of predictions drawn according to them. If answering Hume was the aim, it has not been achieved. The other arguments or motivations for the inference-licence view are equally broken-backed (as I show in my [1980]). I said earlier that automobile logic is silly, and asked whether any serious philosopher has gone in for anything like it. Well, as we have seen, Wittgenstein and his followers went in for it, on a massive scale. (I leave the reader to judge whether they count as serious philosophers.) I also said earlier that the widespread belief in non-deductive logic stems from the view that deductive logic is useless either as a logic of discovery or as a logic of justification. Well, let us see.
‘LOGIC OF DISCOVERY’ — DEDUCTIVE OR INDUCTIVE?

The distinction between the contexts of discovery and justification is due to the logical positivists and Popper. They were sceptical about there being any logic of discovery. They regarded the ‘context of discovery’ as belonging to the province of psychology rather than logic. Popper famously declared (1959: 31): “The initial stage, the act of conceiving or inventing a theory, seems to me neither to call for logical analysis nor to be susceptible of it”. That this statement occurs in a book called The Logic of Scientific Discovery has astonished many readers. The oddity can be partially relieved. ‘Discover’ is a success-word. One cannot discover that the moon is made of green cheese, because it isn’t. To discover that p one must come up with the hypothesis that p, or guess that p, and then show that p is true. It is consistent to maintain that the initial ‘guessing’ stage is not susceptible of logical analysis, and that logic only plays a role in the second stage, where we show that p is true, prove it or justify it. This only partially relieves the oddity of Popper’s claim, because he famously claims that there is no proving or justifying
our hypotheses either. All he gives us in The Logic of Scientific Discovery is a logical analysis of the process of empirical testing. Because ‘discover’ is a success word, it is odd to speak of discovering a false hypothesis. It would be better to speak, not of the context of discovery, but of the context of invention. Then we can separate the question of inventing a hypothesis from the question of justifying it. But ‘context of justification’ is not a happy phrase either, at least in Popper’s case. He thinks that while scientists can rationally evaluate or appraise hypotheses, they can never justify or prove them. So as not to beg the question against that view, it would be better to speak of the context of appraisal. These terminological suggestions are due to Robert McLaughlin [1982, p. 71]. Were the positivists and Popper right that there is no logic of invention, no logical analysis of the initial stage of inventing a hypothesis? No. People do not typically invent hypotheses at random or through flashes of mystical intuition or in their dreams. People typically invent new hypotheses by reason or argument. But, the pervasive thought is, these reasonings or arguments cannot be deductive, for the conclusion of a valid deduction contains nothing new. Hence we need an inductive or ampliative logic of invention (discovery). But that, too, is wrong. We already saw that it is child’s play to reconstruct inductive generalisation, singular predictive inference, enumerative induction, demonstrative induction, and eliminative induction as valid deductions. And when we did that, we did not say whether we were reconstructing ‘discovery arguments’ or ‘justification arguments’. Let us suppose the former, and revisit one trivial example. Suppose you want to know what colour emeralds are. Do you lie on your couch, close your eyes, and somehow dream up conjectures that you will then subject to test? No. 
You observe an emerald and perform a trivial ‘demonstrative induction’ (deduction):

[Emeralds share a colour.]
This emerald is green.
Therefore, all emeralds are green.

Your major premise, perhaps left un-stated, is a presupposition of the question “What colour are emeralds?”.

Here is another trivial example of the same thing. Suppose you want to know what the relationship is between two measurable quantities P and Q. You have the hunch that it might be linear, or decide to try a linear relationship first. Do you lie on your couch, think up some linear equations between P and Q (there are infinitely many of them!), and then put them to the test? No. You make a couple of measurements and perform a trivial deduction:

[P = aQ + b, for some a and b.]
When Q = 0, P = 3, so that b = 3.
When Q = 1, P = 10, so that a = 7.
Therefore, P = 7Q + 3.
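The deduction above can be carried out mechanically, which may help to show that nothing ampliative is involved. The following is only a minimal sketch: the function name `fit_line` and the use of exact fractions are illustrative assumptions, not anything in the original argument. The bracketed premise (that P = aQ + b for some a and b) is built into the function; the two measurements then fix a and b by elementary algebra.

```python
from fractions import Fraction


def fit_line(m1, m2):
    """Deduce (a, b) in P = a*Q + b from two measurements given as (Q, P) pairs.

    The linear form is the assumed major premise; it is not derived
    from the data. (Hypothetical helper, for illustration only.)
    """
    (q1, p1), (q2, p2) = m1, m2
    if q1 == q2:
        # Two measurements at the same Q cannot fix the slope.
        raise ValueError("measurements must be taken at two different values of Q")
    a = Fraction(p2 - p1, q2 - q1)  # slope, from the two data points
    b = p1 - a * q1                 # intercept, from the first data point
    return a, b


# The two measurements from the text: Q = 0 gives P = 3, Q = 1 gives P = 10.
a, b = fit_line((0, 3), (1, 10))
print(f"P = {a}Q + {b}")  # P = 7Q + 3
```

On this sketch the conclusion is exactly as good as the linear premise. A third measurement that fell off the line, say P = 20 at Q = 2 where the conclusion gives 17, would show that the premise is false; it would not show that the deduction was invalid.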
This is called ‘curve-fitting’. It is supposed to be induction. But of course, it is really deduction. These are trivial examples of the ‘logic of invention (discovery)’. Other examples are less trivial. Newton spoke of arriving at scientific theories by deduction from the phenomena. Newton was right to speak of deduction here, not of induction, abduction, or anything like that. He was wrong to speak of deduction from phenomena alone. The premises of his arguments do not just contain statements of the observed phenomena. They also contain general metaphysical principles, heuristic principles, hunches. Newton first called them ‘Hypotheses’. Then, anxious to make it seem that there was nothing hypothetical in his work, he rechristened them ‘Rules of Reasoning in Philosophy’. (As we can see, disguising hypothetical premises as rules of ampliative reasoning, as in automobile logic, has a fine pedigree - it goes all the way back to Newton!) Newton had four ‘Rules of Reasoning’: RULE I We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances. To this purpose the philosophers say that Nature does nothing in vain, and more is vain, when less will serve; for Nature is pleased with simplicity, and affects not the pomp of superfluous causes. RULE II Therefore to the same natural effects we must, as far as possible, assign the same causes. RULE III The qualities of bodies, which neither admit intensification nor remission of degrees, and which are found to belong to all bodies within the reach of our experiments, are to be esteemed the universal qualities of all bodies whatsoever. RULE IV In experimental philosophy we are to look upon propositions inferred by general induction from phenomena as accurately or very nearly true, notwithstanding any contrary hypotheses that may be imagined, till such time as other phenomena occur, by which they may be made either more accurate, or liable to exceptions. 
This rule we must follow, that the argument of induction may not be evaded by hypotheses. (Principia, Book III; Newton 1934, 398-400.) Rule III enables Newton to arrive at the law of universal gravitation: Lastly, if it universally appears, by experiments and astronomical observations, that all bodies about the earth gravitate towards the earth,
and that in proportion to the quantity of matter that they severally contain; that the moon likewise, according to the quantity of its matter, gravitates toward the earth; that, on the other hand, our sea gravitates toward the moon; and all the planets one toward another; and the comets in like manner toward the sun: we must, in consequence of this rule, universally allow that all bodies whatsoever are endowed with a principle of mutual gravitation . . . [Newton, 1934, 399] Here Newton lists the ‘phenomena’ that experiment and astronomical observation have revealed to him. (These ‘phenomena’ are highly theory-laden, of course, but that is not the issue here.) He then applies Rule III “The qualities . . . which are found to belong to all bodies within the reach of our experiments, are . . . the universal qualities of all bodies whatsoever” and deduces “all bodies whatsoever are endowed with a principle of mutual gravitation”. So, Newton did deduce things, but not just from ‘phenomena’, also from general metaphysical principles disguised as ‘Rules of Reasoning’. Or so it seems — as we will see shortly, careful reading reveals another interpretation, in which the ‘Rules of Reasoning’ are not metaphysical principles at all but rather epistemic principles. Here what matters is that once we spell out Newton’s so-called ‘Rules of Reasoning’ as explicit premises, whether metaphysical or epistemic, his arguments all become deductive. Newtonian deduction from the phenomena is ubiquitous in science. There is now quite a body of literature analysing real episodes from the history of science and demonstrating the fact. 
Examples include Cavendish’s deduction of the electrostatic inverse square law (see [Dorling, 1973a; 1973b]), Einstein’s deduction of the photon hypothesis (see [Dorling, 1971]), Rutherford’s deduction of the Rutherford model of the atom (see [McLaughlin, 1982; Musgrave, 1989]), and Einstein’s deductions of the special and general theories of relativity (see [Zahar, 1973; 1983]). Sometimes the major and often ‘missing’ premises of these deductions are general metaphysical principles like Newton’s. Sometimes they are more specific hypotheses that make up the ‘hard core’ of a particular scientific research programme. Imre Lakatos and his followers have produced many case-studies of the latter kind (see the papers collected in [Howson, 1973; Latsis, 1976]). What all the examples show is that there is a logic of discovery, despite positivist/Popperian orthodoxy, and that it is deductive logic, despite philosophic orthodoxy in general. What of the argument that logic of discovery must be non-deductive or ampliative because discovery is by definition coming up with something new and the conclusion of a valid deduction contains nothing new? Here we must distinguish logical novelty from psychological novelty. True, the conclusion of a valid deduction is not ‘logically new’, which is just a fancy way of saying that it is logically contained in the premises. But the conclusion of a valid deduction can be psychologically new. We can be surprised to discover the consequences of our assumptions. When Wittgenstein said that in logic there are no surprises, he was just wrong: Hobbes was astonished that Pythagoras’s Theorem could be deduced from Euclid’s axioms. Moreover, the conclusion of a valid deduction can have
interesting new properties not possessed by any of the premises taken singly — being empirically falsifiable, for example. Nor, finally, do deductivist reconstructions of inventive arguments take all the inventiveness out of them and render hypothesis-generation a matter of dull routine. The originality or inventiveness lies in assembling a new combination of premises that may yield a surprising conclusion. It also lies in obtaining that conclusion, which in interesting cases is no trivial or routine task. The positivists and Popper were wrong. There is a logic of invention (discovery). And it is deductive logic, or is best reconstructed as such. It will be objected that hypothetico-deductive inventive arguments are only as good as their premises. And further, that the ‘missing’ premises of inventive arguments, the general metaphysical or heuristic principles that lie behind them, are very often false. It is not true that unobserved cases always resemble observed cases, that the relationship between two measurable quantities is always linear, that Nature is simple, that like causes have like effects, and like effects like causes, and so forth. Moreover, the objector might continue, scientists know this. Is it plausible to think that scientists argue from premises that they know to be false? There is no evading this objection by viewing the arguments as non-deductive arguments which proceed according to rules of inductive reasoning. It is equally implausible to think that scientists reason according to rules that they know to be unsound. Of course, that a hypothetico-deductive inventive argument contains a premise that is false, or at least not known to be true, is fatal to the idea that the argument proves or establishes its conclusion. But we should not mix up discovery and proof, or the logic of invention and the logic of justification. There is nothing wrong with getting new hypotheses from general heuristic principles that are not known to be true.
There is not even anything wrong with getting new hypotheses from general principles that are known to be false, though they have some true cases. This may fail to convince. If so, we can if we wish recast hypothetico-deductive inventive arguments so that they become arguments that are not only sound but known to be so (at least so far as their general heuristic principles are concerned). To see how, let us return to Newton. So far we have had Newton deducing things, not just from ‘phenomena’, but also from general metaphysical principles disguised as ‘Rules of Reasoning’. But careful reading reveals another interpretation, in which the ‘Rules of Reasoning’ are not metaphysical principles at all but rather epistemic principles, about what we ought to admit, assign, esteem, or look upon to be the case. (It was, so far as I know, John Fox who first drew attention to this reading in his 1999.) On this reading, Newton does not deduce the law of universal gravitation (G) — what he deduces is that we must “allow that” or “esteem that” or “look upon it that” G is the case. In short, Newton’s conclusion is that it is reasonable for us to conjecture that G. And his ‘Rules of Reasoning’ are general epistemic or heuristic principles like “It is reasonable for us to conjecture that the qualities . . . which are found to belong to all bodies within the reach of our experiments, are . . . the universal qualities of all bodies whatsoever”. This
epistemic principle is, I submit, true and known to be true. Its truth is not impugned by the fact that a conjecture reached by employing it might subsequently get refuted. If we reasonably conjecture something and later find it to be false, we find out that our conjecture is wrong, not that we were wrong to have conjectured it. This epistemic interpretation makes sense of Newton’s Rule IV, in which Newton admits that any conclusion licensed by or reached from his first three Rules might be refuted. The purpose of Rule IV is to deny that sceptical proliferation of alternative hypotheses counts as genuine criticism (“This rule we must follow, that the argument of induction may not be evaded by hypotheses”). This is not trivial. It is important to see that the sceptical proliferation of alternative hypotheses is no criticism of any hypothesis we might have. It is only an excellent criticism of the claim that the hypothesis we have is proved or established by the data which led us to it. Return to so-called ‘inductive generalisation’, for example: All observed emeralds were green. Therefore, all emeralds are green. We validated this argument by spelling out its missing premise, to obtain: [Unobserved cases resemble observed cases.] All observed emeralds were green. Therefore, all emeralds are green. We then objected that this missing premise is a piece of false and human chauvinistic metaphysics, and that nothing is gained by replacing obvious invalidity by equally obvious unsoundness. But here is a better deductivist reconstruction of the argument: [It is reasonable to conjecture that unobserved cases resemble observed cases.] All observed emeralds were green. Therefore, it is reasonable to conjecture that all emeralds are green. This argument does not, of course, establish that emeralds are all green. But our interest here is invention (discovery), not proof. 
Or consider analogical reasoning, another alleged pattern of inductive inference which in its simplest form goes like this:

a and b share property P.
a also has property Q.
Therefore, b also has property Q.

We might validate this by adding an obviously false metaphysical missing premise:

[If a and b share property P, and a also has property Q, then b also has property Q.]
a and b share property P.
a also has property Q.
Therefore, b also has property Q.

But if our interest is invention rather than proof, we can reconstruct analogical reasoning as a sound deductive argument:

[If a and b share property P, and a also has property Q, then it is reasonable to conjecture that b also has property Q.]
a and b share property P.
a also has property Q.
Therefore, it is reasonable to conjecture that b also has property Q.

I submit that the missing premises of these deductivist reconstructions, premises about what it is reasonable for us to conjecture, are true and known to be true. That might be disputed. These missing premises simply acknowledge that scientists are inveterate ‘generalisers from experience’. The same applies to ordinary folk in the common affairs of life. Small children who have once burned themselves on a hot radiator do not repeat the experiment — they jump to conclusions and avoid touching the radiator again. The same applies to animals, as well. Popper tells the nice story of the anti-smoking puppy who had a lighted cigarette held under his nose. He did not like it, and after that one nasty experience he always ran away sneezing from anything that looked remotely like a cigarette. It seems that ‘jumping to conclusions’ is ‘hard-wired’ into us, part of the hypothesis-generating ‘software’ that Mother Nature (a.k.a. Natural Selection) has provided us with. Sometimes the ‘hard-wiring’ runs deep, being built into the perceptual system, and may be quite specific. A famous example concerns the visual system of the frog, which contains special mechanisms for detecting flies. A fly gets too close to a frog and triggers the mechanism, whereupon the frog catches and eats the fly. The frog’s eye is specially designed (by Natural Selection) for detecting flies. Similar discoveries have been made about the eyes of monkeys. Monkey eyes have special cells or visual pathways that are triggered by monkey hands. Of course, all this is unconscious.
But if we adopt Dennett’s ‘intentional stance’ and attribute ‘as if’ beliefs to frogs and monkeys, we can see that they form beliefs on the basis of visual stimuli combined with general principles that are hard-wired into their visual systems. The beliefs that are formed in this way may be false. Experimenters can fool the frog into trying to eat a small metal object introduced into the visual field and moved around jerkily as a fly might move. Experimenters can fool baby monkeys into reaching out for a cardboard cut-out shaped roughly like a monkey-hand. But is it reasonable for us to proceed in this way? Philosophers have long been aware of inbuilt generalising tendencies. Francis Bacon said: “The human understanding is of its own nature prone to suppose the existence of more order and regularity in the world than it finds” (Novum Organum, Book I, Aphorism xlv). Bacon deplored this and tried to get rid of it. Hume deplored it, thought it could not be got rid of, and deemed us all irrational. But in deploring our generalising tendencies, Bacon and Hume mixed up invention (discovery) and proof (appraisal).
Hypotheses arrived at by ‘jumping to conclusions’ are not thereby shown to be true. But we need to navigate ourselves around the world, and forming beliefs in this way is a perfectly reasonable way to begin. Does this mean that positivist-Popperian orthodoxy was basically correct? Is the context of invention (wrongly ascribed to the province of psychology and deemed incapable of logical analysis, but no matter) irrelevant to the context of appraisal? No. To describe a conjecture as a reasonable conjecture is to make a minimal appraisal of it. And such minimal appraisals are important. Philosophers tediously and correctly point out that infinitely many possible hypotheses are consistent with any finite body of data. Infinitely many curves can be drawn through any finite number of data-points. We observe nothing but green emeralds and hypothesise that all emeralds are green — why not hypothesise that they are grue or grack or grurple? Scientists, not to mention plain folk, are unimpressed with the philosopher’s point, and do not even consider the gruesome hypotheses produced in support of it. What enables them to narrow their intellectual horizons in this way, and is it reasonable for them to do so? What enables them to do it are epistemic principles about reasonable conjecturing, which are, so far as we know, true. These principles are not necessarily true. We can imagine possible worlds in which they would deliver more false hypotheses than true ones, and thus be unreliable. But just as it may be reasonable to persist in a false belief until it is shown to be false, so also it may be reasonable to persist in an unreliable belief-producing mechanism until it is shown to be unreliable. And nobody has shown that the belief-producing mechanisms I have been discussing are unreliable. So much for the context of invention (discovery). I have resisted the idea that the logic of invention (discovery) must be inductive or ampliative.
But what about the context of justification? Surely justification requires ampliative reasoning. Which brings me to the last, and most important, reason for the widespread belief in inductive logic.

‘LOGIC OF JUSTIFICATION’ — DEDUCTIVE OR INDUCTIVE?

People reason or argue not just to arrive at new beliefs, or to invent new hypotheses. People also argue for what they believe, reason in order to give reasons for what they believe. In short, people reason or argue to show that they know stuff. Knowledge is not the same as belief, not even the same as true belief — knowledge is justified true belief. People reason or argue to justify their beliefs. Seen from this perspective, deductive arguments are sadly lacking. To be sure, in a valid deductive argument the premises are a conclusive reason for the conclusion — if the premises are true, the conclusion must be true as well. But if we want to justify a belief, producing a valid deductive argument for that belief is always question-begging. The argument “C, therefore C” is as rigorously valid as an argument can be. But it is circular, and obviously question-begging. Non-circular valid deductive arguments for C simply beg the question in a less obvious way. Moreover, the premises of a non-circular valid argument for C might be false,
even if C is true. Such is the case with some of our deductivist reconstructions of arguments from experience. An inveterate generaliser observes a few ravens, notices that they are all black, and jumps to the conclusion that all ravens are black. His argument is invalid. But it does not help to validate it by adding the false inductive principle that unobserved cases always resemble observed cases. He might have tacitly assumed that to arrive at his new belief — remember, he is an inveterate generaliser. But if our interest is justification, there is no point replacing obvious invalidity by equally obvious unsoundness. And even where it is not obvious that the heuristic principle is false, spelling it out in a deductivist reconstruction does no good if we want justification. It does no good because it is not known to be true. We want a reason for our conclusion C, and produce a valid deductive argument “P , therefore C” to obtain one. But now we need a reason for the stronger claim P which logically contains C. And so on, ad infinitum, as sceptics tirelessly and rightly point out. But what if we start from premises for which no further reason is required, premises whose truth we know directly from observation or experience? And what if there are inductive or ampliative arguments from our observational premises to our conclusions? These arguments are not deductively valid, to be sure. Induction is not deduction. But inductive arguments might be cogent, they might give us good though defeasible reasons for their conclusions. We do not need ampliative arguments in the logic of criticism. We might not even need them in the logic of invention (discovery). But we surely do need them in the logic of justification. Or so everybody except Popper and me thinks. We are here confronted with the problem of induction. I think Popper has solved this problem. Let me briefly explain how. 
(What follows is a controversial reading of Popper, which is rejected by many self-styled ‘Popperians’. For more details, see my [2004] and [2007].) The key to Popper’s solution is to reject justificationism. What is that? As everybody knows, the term ‘belief’ is ambiguous between the content of a belief, what is believed, the proposition or hypothesis in question, and the act of believing that content. I shall call a belief-content just a ‘belief’, and a belief-act a ‘believing’. Talk of ‘justifying beliefs’ inherits this ambiguity. Do we seek to justify the belief or the believing of it? It is obvious, I think, that we seek to justify believings, not beliefs. One person can be justified in believing what another person is not. I can be justified in believing today what I was not justified in believing yesterday. The ancients were justified in believing that the earth does not move, though of course we are not. Despite these platitudes, justificationism is the view that a justification for a believing must be a justification for the belief. Given justificationism, we must provide some sort of inductive or ampliative logic leading us from evidential premises to evidence-transcending conclusions. At least, we must provide this if any evidence-transcending believings are to be justified believings. But if we reject justificationism, we need no inductive or ampliative reasoning. Our evidence-transcending believings might be justified even though our evidence-transcending beliefs cannot be. Of course, we need a theory of when an evidence-transcending believing is justified. Popper’s general story is
that an evidence-transcending believing is justified if the belief in question has withstood criticism. As we saw, the logic of criticism is entirely deductive. Popper’s critics object that he smuggles in inductive reasoning after all. In saying that having withstood criticism is a reason for believing, Popper must be assuming that it is a reason for belief as well. But these critics smuggle in precisely the justificationist assumption that Popper rejects.

This is all terribly abstract. To make it concrete, let us consider abduction, and its intellectual descendant, inference to the best explanation (IBE). Abduction is generally regarded as the second main type of ampliative reasoning (the other being induction). Abduction was first set forth by Charles Sanders Peirce, as follows:

The surprising fact, C, is observed.
But if A were true, C would be a matter of course.
Hence, . . . A is true.
[Peirce, 1931-58, Vol. 5, p. 159]

Here the second premise is a fancy way of saying “A explains C”. By the way, abduction was originally touted, chiefly by Hanson, as a long neglected contribution to the ‘logic of discovery’. It is no such thing. The explanatory hypothesis A figures in the second premise as well as the conclusion. The argument as a whole does not generate this hypothesis. Rather, it seeks to justify it. The same applies, despite its name, to ‘inference TO the best explanation’ (IBE). Abduction and IBE both belong in the context of appraisal (justification) rather than in the context of invention (discovery).

Abduction is invalid. We can validate it by viewing it as an enthymeme and supplying its missing premise “Any explanation of a surprising fact is true”. But this is no use: it merely trades obvious invalidity for equally obvious unsoundness. The missing premise is obviously false.
Nor is any comfort to be derived from weakening it to “Any explanation of a surprising fact is probably true” or to “Any explanation of a surprising fact is approximately true”. (Philosophers have cottage industries devoted to both of these!) It is a surprising fact that marine fossils are found on mountain-tops. One explanation of this is that Martians came and put them there to surprise us. But this explanation is not true, or probably true, or approximately true.

IBE attempts to improve upon abduction by requiring that the explanation is the best explanation that we have. It goes like this:

F is a fact.
Hypothesis H explains F.
No available competing hypothesis explains F as well as H does.
Therefore, H is true.
[Lycan, 1985, p. 138]

This is better than abduction, but not much better. It is also invalid. We can validate it by viewing it as an enthymeme and supplying its missing premise “The
best available explanation of a (surprising) fact is true”. But this missing premise is also obviously false. Nor, again, will going for probable truth or approximate truth help matters.

But wait! Peirce’s original abductive scheme was not quite what we have considered so far. Peirce’s original scheme went like this:

The surprising fact, C, is observed.
But if A were true, C would be a matter of course.
Hence, there is reason to suspect that A is true.

This is also invalid. But to validate it the missing premise we need is “There is reason to suspect that any explanation of a surprising fact is true”. This missing premise is, I suggest, true. After all, the epistemic modifier “There is reason to suspect that . . . ” weakens the claim considerably. In particular, “There is reason to suspect that A is true” can be true even though A is false. So we have not traded obvious invalidity for equally obvious unsoundness. Peirce’s original scheme may be reconstructed so as to be both valid and sound. Why does everybody misread Peirce’s scheme and miss this obvious point? Because everybody accepts justificationism, and assumes that a reason for suspecting that something is true must be a reason for its truth.

IBE can be rescued in a similar way. I even suggest a stronger epistemic modifier than “There is reason to suspect that . . . ”, namely “There is reason to believe (tentatively) that . . . ” or equivalently, “It is reasonable to believe (tentatively) that . . . ”. What results when this missing premise is spelled out is:

[It is reasonable to believe that the best available explanation of a fact is true.]
F is a fact.
Hypothesis H explains F.
No available competing hypothesis explains F as well as H does.
Therefore, it is reasonable to believe that H is true.

This is valid and instances of it might well be sound. Inferences of this kind are employed in the common affairs of life, in detective stories, and in the sciences.
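The contrast between the invalid schemes and the reconstructed one can be checked mechanically. The following Python sketch is an illustration under loudly stated assumptions, not a piece of the author’s apparatus: it flattens the schemes into propositional logic, reads “if A were true, C would be a matter of course” as a plain material conditional (a simplification of Peirce’s wording), and treats “it is reasonable to believe that H is true” as an atom logically independent of H itself. The helper names and atom labels are hypothetical.

```python
from itertools import product


def is_valid(premises, conclusion, atoms):
    """Truth-table test: valid iff no assignment makes every premise true
    and the conclusion false."""
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False
    return True


def implies(a, b):
    """Material conditional (an assumption; the text's conditional is subjunctive)."""
    return (not a) or b


# Abduction as usually misread: C; if A then C; hence A.
abduction_valid = is_valid(
    premises=[lambda w: w["C"], lambda w: implies(w["A"], w["C"])],
    conclusion=lambda w: w["A"],
    atoms=["A", "C"],
)

# Reconstructed IBE, with hypothetical atoms:
#   B = "H is the best available explanation of the fact F"
#   R = "it is reasonable to believe that H is true"
#   H = "H is true"
# The premise B -> R is an instance of the general epistemic principle.
ibe_premises = [lambda w: implies(w["B"], w["R"]), lambda w: w["B"]]

ibe_valid = is_valid(ibe_premises, lambda w: w["R"], ["B", "R", "H"])

# The same premises do not establish H itself.
ibe_entails_h = is_valid(ibe_premises, lambda w: w["H"], ["B", "R", "H"])

print(abduction_valid, ibe_valid, ibe_entails_h)
```

On this simplified reading the first scheme fails the test, the reconstructed scheme passes it, and its premises remain compatible with H being false, which is just the point that a reason for believing need not be a reason for what is believed.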
Why does everybody misread IBE and miss this obvious point? Because everybody accepts justificationism, and assumes that a reason for believing that something is true must be a reason for its truth. (The cottage industries devoted to probable truth and approximate truth stem from the same source.) All the criticisms of IBE presuppose justificationism. People object that the best available explanation might be false. Quite so — and so what? It goes without saying that any explanation might be false, in the sense that it is not necessarily true. But it is absurd to suppose that the only things we can reasonably believe are necessary truths. People object that being the best available explanation of a fact does not show that something is true (or probably true or approximately true). Quite so — and again, so what? This assumes the justificationist principle that a reason for believing something must be a reason for what is believed. People
object that the best available explanation might be the “best of a bad lot” and actually be false. Quite so, and again, so what? It can be reasonable to believe a falsehood. Of course, if we subsequently find out that the best available explanation is false, it is no longer reasonable for us to believe it. But what we find out is that what we believed was wrong, not that it was wrong or unreasonable for us to have believed it.

What goes for IBE goes for so-called ‘inductive arguments’ in general. They can be turned into sound deductive enthymemes with epistemic principles among their premises and epistemic modifiers prefacing their conclusions. Let us confine ourselves to inductive generalisation or singular predictive inference. In the context of justification we require a stronger epistemic modifier than “It is reasonable to conjecture that . . . ”. We need “It is reasonable to believe that . . . ”. For singular predictive inference we obtain:

[It is reasonable to believe that unobserved cases resemble observed cases.]
All observed emeralds have been green.
Therefore, it is reasonable to believe that the next observed emerald will be green.

Robert Pargetter and John Bigelow [1997] suggest an improved version of this, in which a tacit total evidence assumption is made explicit:

All observed emeralds have been green.
This is all the relevant evidence available.
Therefore, it is reasonable to believe that the next observed emerald will be green.

However, as in the above formulation, Pargetter and Bigelow do not spell out or make explicit the general epistemic principle involved here: “If all observed As have been B, and if this is all the relevant evidence available, then it is reasonable to believe that the next observed A will be B”.
They do not spell this principle out because they regard it as analytically or necessarily true, true by virtue of the meaning of the term ‘reasonable’, so that the argument as it stands is semantically though not logically valid. They say, of arguments like the emeralds argument as set out above:

They are, of course, not formally valid . . . They are valid just in the sense that it is not possible for their premises to be true while their conclusions are false. They are valid in the way that arguments like these are valid: ‘This is red, so it is coloured’, ‘This is square, so it is extended’, and so on. The validity of the emeralds argument rests not just on its logical form but on the nature of rationality. [Pargetter and Bigelow, 1997, p. 70]

Now I do not want to quarrel about whether ‘Anything red is coloured’ or ‘Anything square is extended’ are analytic or necessary truths, as Pargetter and Bigelow
evidently think. But I do wonder whether it is analytic that “If all observed As have been B, and if this is all the relevant evidence available, then it is reasonable to believe that the next observed A will be B”. This principle conflicts with the following Humean justificationist principle: “It is reasonable to believe a conclusion only if your premises establish that it is true or probably true (more likely true than not)”. I do not think this Humean principle is ‘conceptually confused’. So neither do I think the anti-Humean principle an analytic truth. But this is a family quarrel among deductivists, so I shall say no more about it (there is more in my [1999]). Deductivists have a different family quarrel with John Fox. Fox is generally sympathetic to deductivist reconstructions of so-called inductive arguments as deductive arguments with epistemic principles among their premises. He says that one can be a “deductivist without being an extreme inductive sceptic, by holding that the best analysis of why inductive beliefs are rational when they are displays no inferences but deductively valid ones as acceptable”, where the inferences “conclude not to predictions or generalisations, that is, not to inductive beliefs, but to judgements about their reasonableness” [Fox, 1999, pp. 449, 456]. But Fox thinks that this is not enough: . . . real-life arguers conclude to something further, which is not a deductive consequence of their premises: to the generalisations or predictions themselves. . . . In his primary concern to establish how surprisingly much can be reached simply by deduction, Musgrave seems simply to have overlooked both this further step and its non-deductive character. [Fox, 1999, 456] Fox says that all we need to get from a conclusion of the form “It is reasonable to believe that P ” to P itself is a further very simple non-deductive inference which he calls an epistemic syllogism. Examples are: It is reasonable to believe that P . Therefore, P . 
One should accept that P . Therefore, P . Epistemic syllogisms are obviously invalid, and could only be validated by invoking absurd metaphysical principles like “Anything that it is reasonable to believe is true” or “Anything that one should accept as true is true”. But here Fox makes an ingenious suggestion. He does not try to validate epistemic syllogisms but he does think that they can be trivially ‘vindicated’. To vindicate an argument is to show that, given its premise(s), it is reasonable to accept its conclusion. The premise of the epistemic syllogism is that it is reasonable to believe that P. If this is correct, then trivially it is reasonable to conclude, further, that P. Which vindicates epistemic syllogisms: “Indeed, precisely if these deductively drawn conclusions are correct, it is reasonable so to conclude” [Fox, 1999, p. 456].
This is clever — but is it correct? The matter turns on the word ‘conclude’, which is ambiguous between inferring and believing. Fox distinguishes a weak sense of ‘infer’ whereby one infers a conclusion from some premises without coming to believe it, from a strong sense of ‘infer’ whereby “to infer a conclusion from premises is to come to accept it on their basis” [Fox, 1999, p. 451]. I say that you infer in the strong sense if you first infer in the weak sense and then, as a result of having made that inference, come to accept or believe the conclusion. This can happen. But being caused to believe a conclusion, by inferring it from premise(s) that you believe, is not some special ‘strong’ kind of inferring. Making the inference is one mental act, coming to believe its conclusion is another. The former can cause the latter. But coming to believe something is not the conclusion of the inference, it is the effect of making it. Aristotle’s so-called ‘practical syllogism’, whose premises are statements and whose conclusion is an action, is an oxymoron. Fox agrees, but thinks his epistemic syllogisms are different: Aristotle’s ‘practical syllogism’ was not an inference at all. Its ‘premise’ was a proposition, to the effect that one should do x; its conclusion was the action of doing x. When the premise is that one should accept p, coming to accept p is doing just what the premise says one should, the ‘conclusion’ of an Aristotelian practical syllogism. But doing this is precisely (strongly) inferring in accordance with the pattern I vindicated above. Because here inference is involved, the term ‘syllogism’ is more apt than in most practical syllogisms. [Fox, 1999, p. 451] I can see no difference between Aristotle’s practical syllogism and Fox’s epistemic syllogism. Both involve or are preceded by inferences. In Aristotle’s case, you infer that you should do x from some premise(s). 
In the epistemic case, you infer that you should accept or believe P from some premise(s). The further steps, actually doing x or accepting P , are actions rather than the conclusions of arguments. Fox’s ‘vindication’ of his epistemic syllogisms seems trivial: from the premise “It is reasonable to believe that P ” the conclusion “It is reasonable to believe that P ” trivially follows. But “It is reasonable to believe that P ” does not say that any way of arguing for P is reasonable. It says nothing about any way of arguing for P — it speaks only of P . In particular, it does not say that “It is reasonable to believe that P . Therefore, P ” is a reasonable way to argue for P . Why does Fox think his obviously invalid epistemic syllogisms are necessary? Why does he think that “real life arguers” need to argue, not just that it is reasonable to believe some evidence-transcending hypothesis, but also for that hypothesis itself? Well, if you assume that a reason for believing P must be a reason for P itself, then you will need to invoke epistemic syllogisms to get you (invalidly) from a reason for believing P to a reason for P . But we should get rid of that justificationist assumption.
GETTING STARTED: ‘FOUNDATIONAL BELIEFS’

In discussing induction, I talked of evidence and evidence-transcending hypotheses. And in discussing abduction and IBE, I talked of having ‘facts’ that require explanation. What is the source of this evidence or of these facts? There are two main sources, sense-experience and testimony. Justificationism bedevils discussion of these matters, too.

My nose itches and I scratch it. The itch causes (or helps cause) the scratching. The itch is also a reason for the scratching (or part of the reason). In cases like this, we are happy with the thought that causes of actions are reasons for them. The experience (the itch) is both a cause and a reason for the action (the scratching).

I see a tree and I form the belief that there is a tree in front of me. The tree-experience causes (or helps cause) the believing. Is the tree-experience also a reason (or part of the reason) for the believing? The two cases seem symmetrical. Yet many philosophers treat them differently. Many philosophers are unhappy with the thought that the tree-experience is both a cause and a reason for the tree-believing. Why the asymmetry?

Justificationism lies behind it. Justificationism says that a reason for believing something must be a reason for what is believed. What is believed is a statement or proposition. Only another proposition can be a reason for a proposition. But perceptual experiences are not propositions, any more than itches or tickles are. So my tree-experience cannot be a reason for my tree-belief, and cannot be a reason for my tree-believing either.

If we reject justificationism, we can allow that perceptual experiences are reasons as well as causes of perceptual believings (though not, of course, for the perceptual beliefs, the propositions believed). We can even allow that they are good reasons. They are not conclusive reasons, of course, but defeasible ones. There is the ever-present possibility of illusion or hallucination.
The tree-belief transcends the tree-experience, and future experiences may indicate that it is false. Still, it is reasonable to “trust your senses” unless you have a specific reason not to. In support of this, we can regard perceptual belief as a case of IBE. A simple example, formulated in the usual way, is: “I see a cat in the corner of the room. The best explanation of this is that there is a cat in the corner of the room. Therefore, there is a cat in the corner of the room”. But this formulation is wrong. The question was not “Why is there a cat in the corner of the room?”, but rather “Why do you believe that there is a cat in the corner of the room?”. What we are trying to justify or give a reason for is not the statement that there is a cat in the corner of the room, but rather my coming to believe this. So the conclusion ought to be “It is reasonable to believe that there is a cat in the corner of the room”. And the missing premise required to convert the argument into a perfectly valid deduction is “It is reasonable to believe the best explanation of any fact”. Of course, my reasonable perceptual belief might turn out to be false. If evidence comes in of hallucination or some less radical kind of perceptual error, I may concede that my perceptual belief was wrong — but that does not mean that I was wrong to have believed it.
Much the same applies to testimony. Somebody tells me something and I come to believe it. Is the testimony a reason as well as a cause for my believing? Many philosophers are unhappy with the thought that it is. Again, justificationism lies behind this. My hearing the testimony is not a proposition, any more than an itch or a tickle is. So my hearing the testimony cannot be a reason for my belief, and cannot, if justificationism is right, be a reason for my believing either. If we reject justificationism, we can allow that testimony is a reason as well as a cause of believing (though not, of course, for what is believed). We can even allow that it is a good reason. It is not a conclusive reason, of course, but a defeasible one. There is the ever-present possibility that my informant is misinformed or even lying to me. Future experience may indicate that the belief I acquired from testimony is false. Still, it is reasonable to “trust what other folk tell you” unless you have a specific reason not to.

These reflections on the role of sense-experience and testimony are really no more than common sense. These ‘sources of knowledge’, or rather sources of reasonable believings, are simply ways of getting started. Sense-experience and testimony yield foundational beliefs, ‘foundational’ not in the sense that they are certain and incorrigible but only in the sense that they do not arise by inference from other beliefs. (For more on all this, see my [2009].)

And so I conclude. We do not need inductive or ampliative logic anywhere: not in the context of criticism, not in the context of invention, and not in the context of appraisal either.

BIBLIOGRAPHY

[Achinstein, 2006] P. Achinstein. Mill’s Sins, or Mayo’s errors?, in D. Mayo and A. Spanos, eds., Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, London: Cambridge University Press, 2009.
[Dorling, 1971] J. Dorling. Einstein’s Introduction of Photons: Argument by Analogy or Deduction from the Phenomena?, British Journal for the Philosophy of Science, 22, 1-8, 1971.
[Dorling, 1973a] J. Dorling. Henry Cavendish’s Deduction of the Electrostatic Inverse Square Law from the Result of a Single Experiment, Studies in History and Philosophy of Science, 4, 327-348, 1973.
[Dorling, 1973b] J. Dorling. Demonstrative Induction: Its Significant Role in the History of Physics, Philosophy of Science, 40, 360-372, 1973.
[Fox, 1999] J. Fox. Deductivism Surpassed, Australasian Journal of Philosophy, 77, 447-464, 1999.
[Hanson, 1969] N. R. Hanson. Perception and Discovery, San Francisco, CA: Freeman, Cooper & Co, 1969.
[Harre, 1960] R. Harre. An Introduction to the Logic of the Sciences, London: Macmillan & Co, 1960.
[Howson, 1976] C. Howson, ed. Method and Appraisal in the Physical Sciences, London: Cambridge University Press, 1976.
[Latsis, 1976] S. J. Latsis, ed. Method and Appraisal in Economics, London: Cambridge University Press, 1976.
[Lycan, 1985] W. Lycan. Epistemic Value, Synthese, 64, 137-164, 1985.
[McLaughlin, 1982] R. McLaughlin. Invention and Appraisal, in R. McLaughlin (ed), What? Where? When? Why?: Essays on Induction, Space and Time, and Explanation, Dordrecht: Reidel, 69-100, 1982.
[Mill, 1843] J. S. Mill. A System of Logic Ratiocinative and Inductive, London: Longmans, Green and Co, 1843.
[Musgrave, 1980] A. E. Musgrave. Wittgensteinian Instrumentalism, Theoria, 46, 65-105, 1980. [Reprinted in his Essays on Realism and Rationalism, Amsterdam/Atlanta, GA: Rodopi, 1999, 71-105.]
[Musgrave, 1989] A. E. Musgrave. Deductive Heuristics, in K. Gavroglu et al. (eds), Imre Lakatos and Theories of Scientific Change, Dordrecht/Boston/London: Kluwer Academic Publishers, 15-32, 1989.
[Musgrave, 1999] A. E. Musgrave. How To Do Without Inductive Logic, Science and Education, 8, 395-412, 1999.
[Musgrave, 2004] A. E. Musgrave. How Popper [might have] solved the problem of induction, Philosophy, 79, 19-31. [Reprinted in Karl Popper: Critical Assessments of Leading Philosophers. Anthony O’Hear (ed). London: Routledge (2003), Volume II, 140-151; and in Karl Popper: Critical Appraisals. P. Catton and G. Macdonald (eds). London: Routledge (2004) 16-27.]
[Musgrave, 2007] A. E. Musgrave. Critical Rationalism, in E. Suarez-Iniguez (ed), The Power of Argumentation (Poznan Studies in the Philosophy of the Sciences and the Humanities, vol. 93), Amsterdam/New York, NY: Rodopi, 171-211, 2007.
[Musgrave, 2009] A. E. Musgrave. Experience and Perceptual Belief, in Z. Parusnikova & R. S. Cohen (eds), Rethinking Popper (Boston Studies in the Philosophy of Science), Springer Science & Business Media, 5-19, 2009.
[Newton, 1934] I. Newton. Sir Isaac Newton’s Mathematical Principles of Natural Philosophy and his System of the World, Motte’s translation, revised by Cajori, Berkeley & Los Angeles: University of California Press, 1934.
[Pargetter and Bigelow, 1997] R. Pargetter and J. Bigelow. The Validation of Induction, Australasian Journal of Philosophy, 75, 62-76, 1997.
[Peirce, 1931-58] C. S. Peirce. The Collected Papers of Charles Sanders Peirce, ed. C. Hartshorne & P. Weiss, Cambridge, MA: Harvard University Press, 1931–1958.
[Popper, 1963] K. R. Popper. Conjectures and Refutations, London: Routledge & Kegan Paul, 1963.
[Popper, 1959] K. R. Popper. The Logic of Scientific Discovery, London: Hutchinson & Sons, 1959.
[Ramsey, 1931] F. P. Ramsey. The Foundations of Mathematics, London: Routledge & Kegan Paul, 1931.
[Ryle, 1950] G. Ryle. “If”, “so”, and “because”, in M. Black (ed), Philosophical Analysis, New York: Cornell University Press, 323-340, 1950.
[Toulmin, 1953] S. E. Toulmin. Philosophy of Science: An Introduction, London: Hutchinson & Co, 1953.
[Watson, 1938] W. H. Watson. On Understanding Science, London: Cambridge University Press, 1938.
[Wittgenstein, 1961] L. Wittgenstein. Tractatus Logico-Philosophicus, translated by D. F. Pears & B. F. McGuinness, London: Routledge & Kegan Paul, 1961.
[Zahar, 1973] E. G. Zahar. Why did Einstein’s Programme supersede Lorentz’s?, British Journal for the Philosophy of Science, 24, 95-123 & 223-262, 1973.
[Zahar, 1983] E. G. Zahar. Logic of Discovery or Psychology of Invention?, British Journal for the Philosophy of Science, 34, 243-261, 1983.
HEMPEL AND THE PARADOXES OF CONFIRMATION
Jan Sprenger
1 TOWARDS A LOGIC OF CONFIRMATION
The beginning of modern philosophy of science is generally associated with the label of logical empiricism, in particular with the members of the Vienna Circle. Some of them, such as Frank, Hahn and Neurath, were themselves scientists; others, such as Carnap and Schlick, were philosophers, but deeply impressed by the scientific revolutions at the beginning of the 20th century. All of them were united in their admiration for the systematicity and enduring success of science. This affected their philosophical views and led to a sharp break with the “metaphysical” philosophical tradition and to a re-invention of empiricist epistemology with a strong emphasis on science, our best source of high-level knowledge. Indeed, the members of the Vienna Circle were scientifically trained and used to the scientific method of test and observation. For them, metaphysical claims were neither verifiable nor falsifiable through empirical methods, and therefore neither true nor false, but meaningless. Proper philosophical analysis had to separate senseless (metaphysical) from meaningful (empirical) claims and to investigate our most reliable source of knowledge: science.¹ The latter task included the development of formal frameworks for discovering the logic of scientific method and progress. Rudolf Carnap, who devoted much of his work to this task, was in fact one of the most influential figures of the Vienna Circle. In 1930, Carnap and the Berlin philosopher Hans Reichenbach took over the journal ‘Annalen der Philosophie’ and renamed it ‘Erkenntnis’. Under that name, it became a major publication organ for the works of the logical empiricists. The German-Austrian collaboration on the editorial board was no matter of chance: congenial to the Vienna group, several like-minded researchers based in Berlin gathered in the ‘Berlin Society for Empirical Philosophy’, among them Reichenbach.
It was here that a young German student of mathematics, physics and philosophy, Carl Gustav Hempel, came into contact with empiricist philosophy. At the 1929 conference on the epistemology of the exact sciences in Berlin, he got to know Carnap and soon moved to Vienna himself. Nevertheless, he obtained his doctoral degree in Berlin in 1934, but faced with Nazi rule, Hempel
¹ Cf. [Friedman, 1999; Uebel, 2006].
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
soon opted for emigration and later became Carnap’s assistant at the University of Chicago. Thus it is no matter of chance that the contents of Carnap’s and Hempel’s philosophy are so close to each other. Like Carnap, Hempel was interested in the logic of science and in particular in the problem of inductive inference. Like Carnap, Hempel thought that the introduction of rigorous methods would help us to establish a logic of induction and confirmation. Carnap’s life project consisted in developing a probabilistic logic of induction (Carnap [1950], cf. [Zabell, 2009]), similar to a deductive calculus for truth-preserving inferences. Indeed, the success of the calculus of deductive logic suggests a similar calculus for inductive, ampliative inferences that could be applied to the confirmation of scientific hypotheses. Having a logic of confirmation would thus contribute to the central aims of empiricist philosophy: to understand the progress and success of science, and in particular the replacement of old by new theories and the testability of abstract hypotheses by empirical observations. While in principle cherishing Carnap’s probabilistic work in that area, Hempel had some subtle methodological reservations: prior to explicating the concept of confirmation in a probabilistic framework, we are supposed to clarify our qualitative concept of confirmation and to develop general adequacy criteria for an explication of confirmation. Therefore my essay deals less with probabilistic than with qualitative approaches to confirmation theory in modern philosophy of science. Hempel’s main contribution, the essay ‘Studies in the Logic of Confirmation’, was published in 1945, right after the first pioneering works in the area (e.g. [Hossiasson-Lindenbaum, 1940]), but before Carnap’s [1950; 1952] major monographs.
On the way, we will also stumble over Hempel’s famous paradoxes of confirmation, which pose, or so I will argue, a great challenge for any account of confirmation.2

Let us begin with some preliminary thoughts. In science, confirmation becomes an issue whenever science interacts with the world, especially when scientific hypotheses are subjected to empirical tests. Where exactly can a logic of induction and confirmation help us? Hempel distinguishes three stages of empirical testing [Hempel, 1945/1965, pp. 40-41]: First, we design, set up and carefully conduct scientific experiments, we try to avoid misleading observations, double-check the data, clean them up and finally bring them into a canonical form that we can use in the next stage.3 In the second stage, these data are brought to bear on the hypothesis at stake: do they constitute supporting or undermining evidence? Third and last, the hypothesis is re-assessed on the basis of a judgment of confirmation or disconfirmation: we decide to accept it, to reject it or to suspend judgment and to collect further evidence. Of these three stages, only the second is, or so Hempel argues, accessible to a logical analysis: the first and the third stage are full of pragmatically loaded decisions, e.g. which experiment to conduct, how to screen off the data against external nuisance factors, or which strength of evidence is required for accepting a hypothesis. Evidently, those processes cannot be represented by purely formal means. Matters are different for the second stage, which compares observational sentences (in which the evidence is framed) with theoretical sentences which represent the hypothesis or theory. This is the point where logical tools can help to analyze the relation between both kinds of sentences and to set up criteria for successful scientific confirmation.

2 Confirmation is generally thought to hold between a hypothesis and pieces of evidence: a piece of evidence proves, confirms, undermines, refutes or is irrelevant to a hypothesis. At first sight, it sounds plausible to think of confirmation as a semantic relation between a scientific theory on the one side and a real-world object on the other side. For instance, a black raven seems to confirm the hypothesis that all ravens are black. But recall that we would like to assimilate confirmation theory to deductive logic and to find a system of syntactic rules for valid inductive inference. Therefore we should frame the evidence into sentences of a (formal) language, in order to gain access to powerful logical tools, e.g. checking deducibility and consistency relations between evidence and hypothesis. Thus, Hempel argues, a purely semantic account of confirmation is inadequate. We should set up a syntactic relation between hypothesis and evidence where both relata are (sets of) first-order sentences. (Cf. [Hempel, 1945/1965, pp. 21-22].) When I nevertheless say that ‘a black raven confirms hypothesis H’, this is just a matter of convenience and means the corresponding observation report ‘there is a black raven’.

A fundamental objection against a logic of confirmation holds that scientists frequently disagree whether an empirical finding really confirms a theoretical hypothesis, and this phenomenon is too common to ascribe it to irrationality on the part of the researchers. Common scientific sense may not be able to decide such questions, first because the case under scrutiny might be very complicated and second, because people might have different ideas of common sense in a specific case. Formal criteria of confirmation help to settle the discussion, and once again, it is helpful to consider the analogy to deductive logic. For each valid deductive inference, there is a deduction of the conclusion from the logical axioms (that is the completeness theorem for first-order logic).
Hence, in case there is a disagreement about the validity of a deductive inference, the formal tools can help us to settle the question. In the same way that the validity of a deductive inference can be checked using formal tools (deductions), it is desirable to have formal tools which examine the validity of an inductive inference. Sometimes this project is deemed futile because scientists do not always make their criteria of confirmation explicit. But that objection conflates a logical with a psychological point [Hempel, 1945/1965, pp. 9-10]: the lack of explicit confirmation criteria in scientific practice does not refute their existence. The objection merely shows that if such criteria exist, scientists are often not aware of them. But since scientists make, in spite of all disagreement in special cases, in general consistent judgments on evidential relevance, this is still a fruitful project. Confirmation theory thus aims at a rational reconstruction of inductive practice that is not only descriptively adequate, but also able to correct methodological mistakes in science. Thus confirmation theory is vastly more than a remote philosophical subdiscipline; it is actually a proper part of the foundations of science, in the very spirit of logical empiricism. Later works where debates about proper scientific method intersect with confirmation-theoretic problems (e.g. [Royall, 1997]) vindicate this view. Let us now review Hempel’s pioneering work.

3 Suppes [1962/1969] refers to this activity as building ‘models of data’.
2 ADEQUACY CRITERIA

Carnap and Hempel both worked on an explication of confirmation, but their methods were quite different. While Carnap connected confirmation to probability by proposing ‘degree of confirmation’ as an interpretation of probability, Hempel pursued a non-probabilistic approach which precedes the quantitative analysis. His method can be described thus: At the beginning, general considerations yield adequacy criteria for every sensible account of confirmation [Hempel, 1945/1965, pp. 30-33], considerably narrowing down the space of admissible accounts. Out of the remaining accounts, Hempel selects the one that also captures a core intuition about confirmation, namely that hypotheses are confirmed by their instances. Let us now see which criteria Hempel develops. The first criterion which he suggests is the

Entailment Condition (EnC): If the observation report E logically implies the hypothesis H then E confirms H.

For example, if the hypothesis reads ‘there are white ravens’ then, obviously, the observation of a white raven proves it and, a fortiori, confirms it: logical implication is the strongest possible form of evidential support. So the Entailment Condition sounds very reasonable. Next, if a theory is confirmed by a piece of evidence, it seems strange to deny that consequences of the theory are confirmed by the evidence. For instance, if observations confirm Newton’s law of gravitation, they should confirm Kepler’s laws, too, since the latter’s predictions have to agree with the gravitation law. In other words, we demand satisfaction of the

Consequence Condition (CC): If an observation report E confirms every member of a set of sentences S, then it confirms every consequence of S (i.e. every sentence H for which S |= H).

In fact, the consequence condition is quite powerful, and several natural adequacy criteria follow from it.
For instance, the Equivalence Condition (EC): If H and H′ are logically equivalent sentences, then the observation report E confirms H if and only if E confirms H′.4 It is straightforward to see that (EC) follows from (CC): If a sentence H is confirmed by E and H is equivalent to H′, then H′ is a logical consequence of {H} and the Consequence Condition can be applied, yielding that H′ is also confirmed by E, and vice versa.

4 This condition can naturally be extended to a condition for the evidence, asserting that the confirmation relation is invariant under replacing the evidence statement by logically equivalent statements.
Certainly, the equivalence condition is a minimal constraint on any account of confirmation. We have already said that scientific hypotheses are usually framed in the logical vocabulary of first-order logic (or a reduct thereof). That allows us to state them in different, but logically equivalent forms.5 The idea of the equivalence condition is that ‘saying the same with different words’ does not make a difference with regard to relations of confirmation and support: hypotheses which express the same content in different words are equally supported and undermined by a piece of evidence, independent of the chosen formulation. To see this in more detail, note that for deductive relations, the Equivalence Condition holds by definition: if A logically implies B, A also implies any B′ that is logically equivalent to B. An account of confirmation should contain relations of deduction and entailment as special cases: if an observation entailed the negation of a hypothesis, in other words, if the hypothesis were falsified by actual evidence, this would equally speak against all equivalent versions and formulations of that hypothesis. Deduction and logical entailment do not distinguish between equivalent sentences, and logical and mathematical axiomatizations are typical of the modern exact sciences (e.g. the propagation of sound is described by a general theory of mechanical waves). If the Equivalence Condition did not hold, the degree of support which a hypothesis got would depend on the specific formulation of the hypothesis. But that would run counter to all efforts to introduce exact mathematical methods into science, thereby making scientific analysis more precise, and ultimately more successful. Obviously, the Consequence Condition also implies the Special Consequence Condition (SCC): If an observation report E confirms a hypothesis H, then it confirms every consequence of H.
However, there is an important confirmation intuition that contradicts (SCC) and stems from the link between prediction, test and confirmation. When a theory makes a prediction and this prediction is indeed observed, those observations lend empirical support to the theory. Abstract theories, like the General Theory of Relativity (GTR), are often not directly testable. We have to focus on parts of them and to use those parts for deriving observational consequences. This agrees with falsificationist methodology (e.g. [Popper, 1963]): we derive conjectures and predictions from a theory and test them against the empirical world. For instance, Eddington’s observations of the solar eclipse in 1919 did not prove GTR, but merely confirmed one of its predictions, namely the bending of light by massive bodies. Had the outcome been different, GTR (or one of the auxiliary assumptions) would have been falsified. Evidently, the stronger a theory, the higher its predictive power. In particular, if the theory T predicts the observation sentence E, E is also a prediction of any stronger theory T′. This line of reasoning suggests the Converse Consequence Condition (CCC): If an observation report E confirms a hypothesis H, then it confirms every hypothesis H′ that logically implies H (i.e. H′ |= H).

5 For instance, the definition of compactness for sets of real numbers can be stated in topological or in analytical terms.

Obviously, the Converse Consequence Condition (CCC) stands in sharp contrast to the Special Consequence Condition (SCC). Indeed, accepting both adequacy conditions at once would trivialize the concept of confirmation: Every observation report E trivially implies itself, so by (EnC), E confirms E. By (CCC), E also confirms E.H for any hypothesis H since E.H logically implies E. Since E.H implies H and is confirmed by E, E confirms H by (SCC). Note that this derivation holds for an arbitrary hypothesis H and arbitrary observations E! Our paradoxical result reveals that we have to make a decision between the prediction/observation-based scheme of inference (CCC) and the ‘conservative’ (SCC). Hempel believed that the idea of predictive confirmation expressed in (CCC) is not an adequate image of confirmation in science. Sure, general laws such as the law of gravitation are tested by observable consequences, such as the planetary motions. Indeed, successful tests of Kepler’s three laws are also believed to support the law of gravitation. But the evidence transfers from Kepler’s laws to the gravitation law because it is also an instance of the gravitation law, and not because the law of gravitation is logically stronger than Kepler’s laws. For instance, even the hypothesis ‘There is extraterrestrial life and Kepler’s laws hold’ is logically stronger than Kepler’s laws alone, but we would not like to say that this hypothesis can be confirmed by, let’s say, observing the orbit of Jupiter. If (CCC) is accepted, any hypothesis whatsoever (X) can be tacked to the confirmed hypothesis (H), and the new hypothesis H.X is still confirmed by the evidence. These are the paradoxes of hypothetico-deductive confirmation, the tacking paradoxes.6 Moreover, (CCC) licenses the confirmation of mutually incompatible hypotheses: Let H be confirmed by E. Then both H.X and H.¬X are, according to (CCC), confirmed by E.
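Laid out step by step, the two arguments above read as follows. This is only a restatement, under the assumptions that ∧ stands for the conjunction written with a dot in the text and that ‘E confirms H’ is abbreviated by a relation symbol Conf(E, H), a notation introduced here purely for illustration.

```latex
\begin{align*}
&\text{Trivialization, for arbitrary } E \text{ and } H: \\
&(1)\quad E \models E \;\Rightarrow\; \mathrm{Conf}(E, E) && \text{by (EnC)} \\
&(2)\quad E \wedge H \models E \;\Rightarrow\; \mathrm{Conf}(E, E \wedge H) && \text{by (CCC) and (1)} \\
&(3)\quad E \wedge H \models H \;\Rightarrow\; \mathrm{Conf}(E, H) && \text{by (SCC) and (2)} \\[1ex]
&\text{Tacking, given } \mathrm{Conf}(E, H) \text{ and arbitrary } X: \\
&(4)\quad H \wedge X \models H \;\Rightarrow\; \mathrm{Conf}(E, H \wedge X) && \text{by (CCC)} \\
&(5)\quad H \wedge \neg X \models H \;\Rightarrow\; \mathrm{Conf}(E, H \wedge \neg X) && \text{by (CCC)}
\end{align*}
```

Steps (4) and (5) together show that one and the same E confirms two mutually incompatible hypotheses.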
This sounds strange and arbitrary: the content of X is not at all relevant to our case, and if both hypotheses (H.X and H.¬X) are equally confirmed, it is not clear what we should believe in the end. There are now two ways to proceed. On the one hand, we can try to restrict (CCC) to logically stronger hypotheses that stand in a relevance relation to the evidence. Then, the paradoxes vanish. Several authors have tried to mitigate the paradoxes of hypothetico-deductive confirmation along these lines, namely by the additional requirements that the tacked hypothesis H.X or H.¬X be a content part of the hypothesis [Gemes, 1993] or that the inference to the evidence be ‘premise-relevant’ [Schurz, 1991]. So arbitrary hypotheses are no longer confirmed together with H. Hempel, however, chooses the other way: he rejects (CCC) in favor of (SCC). Contradictory hypotheses should, or so he argues, not be confirmed by one and the same evidence, in opposition to (CCC). We can put this view into another adequacy condition: hypotheses confirmed by a piece of evidence E must be consistent with each other.

Consistency Condition (CnC): If an observation report E confirms the hypotheses H and H′, then H is logically consistent with H′ (i.e. there is at least one model of H that is also a model of H′).

Finally, we summarize the three conditions that are essential to Hempel’s account:
1. Entailment Condition (EnC): If E |= H, then E confirms H.
2. Consequence Condition (CC): If E confirms every member of S and S |= H, then E confirms H. (Note: (CC) contains the Equivalence Condition (EC) and the Special Consequence Condition (SCC) as special cases.)
3. Consistency Condition (CnC): If E confirms H and H′, then H is logically consistent with H′.

6 Cf. [Musgrave, 2009; Weisberg, 2009].

3 THE SATISFACTION CRITERION
What should we demand of a piece of evidence in order to confirm a hypothesis? In general, logical entailment between evidence and hypothesis is too strong as a necessary criterion for confirmation. In particular, if the hypothesis is a universal conditional, no finite set of observations will ever be able to prove the hypothesis. But the evidence should certainly agree with those parts of the hypothesis that it is able to verify. Hempel suggests that, if an observation report says something about the singular terms a, b and c, the claims a hypothesis makes about a, b and c should be satisfied by the evidence. From such an observation report we could conclude that the hypothesis is true of the class of objects that occur in E. That is all we can demand of a confirming observation report, or so Hempel argues. In other words, we gain instances of a hypothesis from the evidence, and such instances confirm the hypothesis. To make this informal idea more precise, we have to introduce some definitions (partly taken from [Gemes, 2006]):

DEFINITION 1. An atomic well-formed formula (wff) β is relevant to a wff α if and only if there is some model M of α such that: if M′ differs from M only in the value β is assigned, M′ is not a model of α.

So intuitively, β is relevant to α if in at least one model of α the truth value of β cannot be changed without making α false. Now we can define the domain (or scope) of a wff:

DEFINITION 2. The domain of a well-formed formula α, denoted by dom(α), is the set of singular terms which occur in the atomic (!) well-formed formulas (wffs) of L that are relevant to α.

For example, the domain of Fa.Fb is {a, b}, whereas the domain of Fa.Ga is {a} and the domain of ∀x : Fx is the set of all singular terms of the logical language. In other words, quantifiers are treated substitutionally. The domain of a formula is
thus the set of singular terms about which something is asserted. Those singular terms are said to occur essentially in the formula:

DEFINITION 3. A singular term a occurs essentially in a formula β if and only if a is in the domain of β.

For example, a occurs essentially in Fa.Fb, but not in (Fa ∨ ¬Fa).Fb. Now, we are interested in the development of a formula for the domain of a certain formula.

DEFINITION 4. The development of a formula H for a formula E, H|dom(E), is the restriction of H to the domain of E, i.e. to all singular terms that occur essentially in E.7

For instance, (∀x : Fx)|{a,b} is Fa.Fb, and the development of the formula ∀x : Fx for Fa.Ga.Gb is Fa.Fb. Now we have the technical prerequisites for understanding Hempel’s satisfaction criterion: the evidence does not entail the hypothesis directly, but it entails the restriction of the hypothesis to the domain of the evidence.

DEFINITION 5. (Satisfaction criterion) A piece of evidence E directly Hempel-confirms a hypothesis H if and only if E entails the development of H to the domain of E. In other words, E |= H|dom(E).

DEFINITION 6. (Hempel-confirmation) A piece of evidence E Hempel-confirms a hypothesis H if and only if H is entailed by a set of sentences Γ so that for all sentences φ ∈ Γ, φ is directly Hempel-confirmed by E.
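Definitions 1 to 5 can be made concrete with a small brute-force sketch. Everything about the encoding is an assumption made for illustration: formulas are nested tuples, only unary predicates and the universal quantifier are supported, the evidence is assumed to be quantifier-free, and the function names (`domain`, `develop`, `directly_hempel_confirms`) are hypothetical. Entailment is checked by enumerating truth assignments, which is only feasible for toy examples.

```python
from itertools import product

# Hypothetical encoding: ("atom", "F", "a"), ("not", p), ("and", p, q),
# ("or", p, q), ("imp", p, q), ("forall", "x", p).

def atoms(f):
    """Ground atoms (predicate, term) occurring in a quantifier-free formula."""
    if f[0] == "atom":
        return {(f[1], f[2])}
    if f[0] == "not":
        return atoms(f[1])
    return atoms(f[1]) | atoms(f[2])

def evaluate(f, val):
    """Truth value of a quantifier-free formula under the assignment val."""
    tag = f[0]
    if tag == "atom":
        return val[(f[1], f[2])]
    if tag == "not":
        return not evaluate(f[1], val)
    if tag == "and":
        return evaluate(f[1], val) and evaluate(f[2], val)
    if tag == "or":
        return evaluate(f[1], val) or evaluate(f[2], val)
    if tag == "imp":
        return (not evaluate(f[1], val)) or evaluate(f[2], val)
    raise ValueError("develop quantified formulas before evaluating them")

def valuations(atom_set):
    """All truth assignments over the given atoms."""
    ordered = sorted(atom_set)
    for bits in product([False, True], repeat=len(ordered)):
        yield dict(zip(ordered, bits))

def domain(e):
    """Definitions 1-2: terms of atoms whose flip can falsify a model of e."""
    dom = set()
    for beta in atoms(e):
        for val in valuations(atoms(e)):
            flipped = {**val, beta: not val[beta]}
            if evaluate(e, val) and not evaluate(e, flipped):
                dom.add(beta[1])
                break
    return dom

def substitute(f, var, const):
    """Replace free occurrences of the variable var by the constant const."""
    tag = f[0]
    if tag == "atom":
        return ("atom", f[1], const if f[2] == var else f[2])
    if tag == "not":
        return ("not", substitute(f[1], var, const))
    if tag == "forall":
        return f if f[1] == var else ("forall", f[1], substitute(f[2], var, const))
    return (tag, substitute(f[1], var, const), substitute(f[2], var, const))

def develop(h, constants):
    """Definition 4: restrict h to a finite, non-empty set of constants."""
    if not constants:
        raise ValueError("the development needs a non-empty domain")
    tag = h[0]
    if tag == "atom":
        return h
    if tag == "not":
        return ("not", develop(h[1], constants))
    if tag == "forall":
        # Substitutional reading: a conjunction of all instances.
        parts = [develop(substitute(h[2], h[1], c), constants)
                 for c in sorted(constants)]
        result = parts[0]
        for part in parts[1:]:
            result = ("and", result, part)
        return result
    return (tag, develop(h[1], constants), develop(h[2], constants))

def entails(premise, conclusion):
    """Brute-force check that every model of premise satisfies conclusion."""
    all_atoms = atoms(premise) | atoms(conclusion)
    return all(evaluate(conclusion, v)
               for v in valuations(all_atoms) if evaluate(premise, v))

def directly_hempel_confirms(e, h):
    """Definition 5: e entails the development of h to dom(e)."""
    return entails(e, develop(h, domain(e)))
```

On this sketch, the report Fa directly Hempel-confirms ∀x : Fx, whereas Fa.Ga.Gb does not, because the development Fa.Fb also makes a claim about b that the report leaves open.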
There are also formulations of those criteria that refer to a body of knowledge which provides the background for evaluating the confirmation relation (e.g. our current theory of physics). We do not need that for illustrating Hempel’s basic idea, but background information plays a crucial role in contrasting a hypothesis with empirical observations, as illustrated by the Duhem problem:8 Does the failure of a scientific test speak against the hypothesis or against the auxiliary assumptions which we need for connecting the evidence to the hypothesis? Therefore we give a formulation of Hempel’s satisfaction criterion which includes background knowledge.

DEFINITION 7. (Satisfaction criterion, triadic formulation) A piece of evidence E directly Hempel-confirms a hypothesis H relative to background knowledge K if and only if E and K jointly entail the development of H to the domain of E. In other words, E.K |= H|dom(E).9

DEFINITION 8. (Hempel-confirmation, triadic formulation) A piece of evidence E Hempel-confirms a hypothesis H relative to K if and only if H is entailed by a set of sentences Γ so that for all sentences φ ∈ Γ, φ is directly Hempel-confirmed by E relative to K.

7 The development of a formula can be defined precisely by a recursive definition, cf. [Hempel, 1943]. For our purposes, the informal version is sufficient.
8 Cf. [Duhem, 1914].
9 Cf. [Hempel, 1945/1965, pp. 36-37].
For example, Fa (directly) Hempel-confirms the hypothesis ∀x : Fx. Obviously, every piece of evidence that directly Hempel-confirms a hypothesis also Hempel-confirms it, but not vice versa. It is easy to see that any sentence that follows from a set of Hempel-confirmed sentences is Hempel-confirmed, too.10 Hence, Hempel’s confirmation criterion satisfies the Consequence Condition. The same holds true of the Consistency Condition. Indeed, Hempel’s proposal satisfies his own adequacy conditions. Moreover, many intuitively clear cases of confirmation are successfully reconstructed in Hempel’s account. However, one can raise several objections against Hempel, some of which were anticipated by Hempel himself in a postscript to “Studies in the Logic of Confirmation”. First, some hypotheses do not have finite developments and are therefore not confirmable. Take the hypothesis H2 = (∀x : ¬Gxx).(∀x : ∃y : Gxy).(∀x, y, z : Gxy.Gyz → Gxz) which asserts that G is a serial, irreflexive and transitive two-place relation. These properties entail that H2 is not satisfiable in any finite structure and thus not Hempel-confirmable by a finite number of observations. But certainly, H2 is not meaningless: you might interpret G as the ‘greater than’ relation and then, the natural numbers with their ordinary order structure are a model of H2. Read like this, H2 asserts that the ‘greater than’ relation is transitive and irreflexive, and that for any natural number, there is another natural number which is greater than it. It is strange that such hypotheses are not confirmable on Hempel’s account. The problem is maybe purely technical, but it is nevertheless embarrassing. Second, consider c, an individual constant of our predicate language, and the hypotheses H3 = ∀x : Ix and H4 = ∀x : (x ≠ c → ¬Lx). Take the set of all planets of the solar system as the universe of our intended structure and let the individual constant c refer to Planet Earth.
Then H3 might be interpreted as the claim that iron exists on all planets and H4 as the claim that no life exists on other planets. Both are meaningful hypotheses open to empirical investigation. Now, the observation report E = Ic (there is iron on Earth) directly Hempel-confirms H3.H4 (there is iron on all planets and life does not exist on other planets) relative to empty background knowledge.11 While this may still be acceptable, it also follows that H4 is Hempel-confirmed by E = Ic, due to the Special Consequence Condition. This is utterly strange since the actual observation (there is iron on Earth) is completely independent of the hypothesis at stake (no life exists on other planets). Clearly, this conclusion goes beyond what the available evidence entitles us to infer. More embarrassingly, this type of inference is generalizable to other examples, too.12

10 Assume that S |= H where S is Hempel-confirmed by E. Then there is a set Γ so that any element of Γ is directly Hempel-confirmed by E and that Γ |= S. Since by assumption S |= H, it follows that Γ |= H, too. Thus H is Hempel-confirmed by E.
11 The development of H3.H4 with regard to c is Ic.
12 Cf. [Earman and Salmon, 1992].
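The step from E = Ic to H4 can be checked by writing out the development. The following is a sketch that reads H4 as ∀x : (x ≠ c → ¬Lx), matching the gloss ‘no life exists on other planets’, and writes ∧ for the dot conjunction used in the text:

```latex
\begin{align*}
\mathrm{dom}(Ic) &= \{c\} \\
(H_3 \wedge H_4)|_{\{c\}} &= Ic \wedge (c \neq c \rightarrow \neg Lc) \\
&\equiv Ic && \text{since } c \neq c \text{ is false, the conditional holds vacuously} \\
Ic &\models (H_3 \wedge H_4)|_{\{c\}} && \text{so } E \text{ directly Hempel-confirms } H_3 \wedge H_4 \\
H_3 \wedge H_4 &\models H_4 && \text{so } E \text{ Hempel-confirms } H_4 \text{ (Definition 6 with } \Gamma = \{H_3 \wedge H_4\})
\end{align*}
```

The restriction to {c} removes exactly the instances of H4 that make a claim about other planets, which is why the report about Earth suffices.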
These technical problems may be mitigated in refined formulations of Hempel-confirmation, but there are more fundamental problems, too. In a similar vein, they are connected to the fact that Hempel-confirmation satisfies the Special Consequence Condition. When a hypothesis H is Hempel-confirmed by a piece of evidence E (relative to K), an arbitrary disjunct X can be tacked to H while leaving the confirmation relation intact. For example, the hypothesis that all ravens are black or all doves are white is Hempel-confirmed by the observation of a black raven, although it is not clear to what extent that observation is relevant for the hypothesis that all doves are white. Even worse, the same observation also confirms the hypothesis that all ravens are black or no doves are white. The tacked disjunct is completely arbitrary. Evidential relevance for the hypothesis gets lost, but a good account of confirmation should take care of these relations. Finally, consider the following case: A single card is drawn from a standard deck. We do not know which card it is. Compare, however, the two hypotheses that the card is the ace of diamonds (H5) and that the card is a red card (H6). Now, the person who draws the card tells us that the card is either an ace or a king of diamonds. Obviously, the hypothesis H6 is entailed by the evidence and thus Hempel-confirmed. But what about H5? We are now much more confident that H5 is true because the evidence favors the hypothesis that the card is an ace of diamonds over the hypothesis that the card is no ace of diamonds, in the usual relative sense of confirmation. However, the observation does not Hempel-confirm the hypothesis that the card is an ace of diamonds. This is so because not all assertions H5 makes about this particular card (that it is an ace and a diamond) are satisfied by the observation report.
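The sense in which the report raises our confidence in H5 can be illustrated numerically. The sketch below assumes a uniform probability distribution over the 52 cards, which the text does not state explicitly; the names `prob`, `h5`, `h6` and `evidence` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king", "ace"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def h5(card):
    """H5: the card is the ace of diamonds."""
    return card == ("ace", "diamonds")

def h6(card):
    """H6: the card is red."""
    return card[1] in ("diamonds", "hearts")

def evidence(card):
    """E: the card is the ace of diamonds or the king of diamonds."""
    return card in (("ace", "diamonds"), ("king", "diamonds"))

def prob(event, given=None):
    """Probability of event under a uniform distribution (an assumption),
    optionally conditional on the event `given`."""
    pool = [card for card in DECK if given is None or given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))

prior_h5 = prob(h5)                  # before the report
posterior_h5 = prob(h5, evidence)    # after the report
posterior_h6 = prob(h6, evidence)    # H6 is entailed by the report
```

Under this assumption the probability of H5 rises from 1/52 to 1/2, while H6 reaches probability 1: the report is positively relevant to both hypotheses, although only H6 is Hempel-confirmed.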
This behavior of Hempel-confirmation is awkward and stands in contrast to the most popular quantitative account of confirmation, the Bayesian account. Our toy example has analogues in science, too: it is not possible to Hempel-confirm the conjunction of Kepler’s three laws by confirming one of its three components. Any confirming observation report would have to entail each of Kepler’s laws (with regard to the planet that is observed). This is at least strange because we often cannot check each prediction of a theory. To give an example, an observation of the diffraction pattern of light apparently confirms the hypothesis that light is an electromagnetic wave. But waves have more characteristic properties than just exhibiting a diffraction pattern, in particular properties that are not shown in our particular observation. Partial confirmation thus becomes difficult on a Hempelian view of confirmation. Hence, Hempel’s satisfaction criterion is not only liable to severe technical objections, but also fails to reconstruct an important line of thought in scientific observation and experimentation.

Thus, the above objections do not only illuminate technical shortcomings of Hempel’s account, but also a general uneasiness with the Consequence Condition and the Special Consequence Condition. But why did they seem to be so plausible at first sight? I believe, following Carnap [1950], that the missing distinction between the absolute and the relative concept of confirmation is the culprit.13 We often say that a certain theory is well confirmed, but we also say that a certain piece of evidence confirms a hypothesis. These two different usages correspond to different meanings of the word ‘confirmation’. When we use the former way of speaking (‘theory T is well confirmed’) we say something about a particular theory: T enjoys high confidence, the total available evidence speaks for T and favors it over all serious rivals. To be confirmed or to be well confirmed becomes a property of a particular hypothesis or theory. By contrast, the latter use says something about a relationship between hypothesis and evidence: it is asked whether a piece of evidence supports or undermines a hypothesis. Relative confirmation means that an empirical finding, a piece of evidence, lends support to a hypothesis or theory. This need not, however, imply that on account of the total available evidence, the theory is highly credible.

13 A discussion of that criticism which is more charitable towards Hempel can be found in [Huber, 2008].

The Consequence Condition is plausible whenever absolute confirmation is examined. When a strong, comprehensive theory is strongly endorsed, in the sense of ‘highly plausible’ or ‘empirically supported beyond all reasonable doubt’, any part of this theory is also highly plausible, etc., in agreement with (CC) and (SCC). Obviously, the less risky a conjecture is, the more confidence we can put in it, and any proper part of a theory is logically weaker and thus less risky than the entire theory. Therefore the Consequence Condition makes perfect sense for degrees of belief and conviction, i.e. when it comes to endorsement and absolute confirmation.

It is, however, highly questionable whether the Consequence Condition is also a sensible condition with regard to relative confirmation. Here, the evidence has to be informative with respect to the hypothesis under test. For instance, Eddington’s observations of the 1919 eclipse apparently confirmed the hypothesis that light is bent by massive bodies such as the sun.
The General Theory of Relativity (GTR), the overarching theory, was at that time still fiercely contested, and the agreement of Eddington’s observations with GTR and their discrepancy from the Newtonian predictions constituted key evidence in favor of GTR. (The bending effect in the GTR predictions was roughly twice as high as in Newtonian theory.) But it would be much more controversial to claim (as (CC) does) that Eddington’s observations directly confirmed those parts of GTR that were remote from the bending-of-light effect, e.g. the gravitational redshift which was demonstrated in the Pound-Rebka experiment in 1959. Confirmation does not automatically transmit to other sub-parts of an overarching theory, as vindicated in the (probabilistic) analysis by Dietrich and Moretti [2005]. Thus, we are well advised to drop the Consequence Condition. A similar criticism can be directed against the Consistency Condition since any coherent and unified theory that was in agreement with Eddington’s observations would have been confirmed by them. In a postscript to “Studies in the Logic of Confirmation” that appeared in 1965, Hempel admitted some of the problems of his account. In particular, he felt uncomfortable about the Consistency Condition which he thought to be too strong to figure as a necessary condition for (relative) confirmation. Thus, the satisfaction criterion is too narrow as a qualitative definition of confirmation. This concession suggests that Hempel actually spotted the problem of combining the concepts of
relative and absolute confirmation in a single account (cf. [Huber, 2008]). But Hempel [1945/1965, p. 50] still contends that his adequacy conditions may be sufficient for a definition of confirmation. However, the next section will come up with a telling counterexample: a case of spurious confirmation which the satisfaction criterion fails to discern.

4 THE RAVEN PARADOX
Hypotheses about natural laws and natural kinds are often formulated in the form of universal conditionals. For instance, the assertion that all F’s are G’s (H = ∀x : Fx → Gx) suits hypotheses like ‘all planets have elliptical orbits’, ‘all ravens are black’ or ‘all cats are predators’. How are such claims confirmed? There is a longstanding tradition in philosophy of science that stresses the importance of instances in the confirmation of universal conditionals, going from Nicod [1925/1961] over Hempel [1945/1965] to Glymour [1980]. A confirming instance consists in the observation of an F that is also a G (Fa.Ga) whereas an observation of an F that is no G (Fa.¬Ga) refutes H. According to Nicod, only these two kinds of observation (‘confirmation’ and ‘infirmation’) are relevant to the hypothesis. L’induction par l’infirmation proceeds by refuting and eliminating other candidate hypotheses, l’induction par la confirmation supports a hypothesis by finding its instances. There is, however, an important asymmetry [Nicod, 1925/1961, pp. 23-25]: while observing a non-black raven refutes the raven hypothesis once and for all, observing a black raven does not permit such a conclusive inference. Nicod adds that it is not the sheer number of instances that is decisive, but the variety of instances which can be accrued in favor of that hypothesis. If we try to put this idea of instance confirmation into a single condition, we might arrive at the following condition:

Nicod Condition (NC): For a hypothesis of the form H = ∀x : Rx → Bx and an individual constant a, an observation report of the form Ra.Ba confirms H.

However, this account does not seem to exhaust the ways a hypothesis can be confirmed. Recall the

Equivalence Condition (EC): If H and H′ are logically equivalent sentences then E confirms H if and only if E confirms H′.

As already argued, the equivalence condition is an uncontroversial constraint on a logic of confirmation.
Combining (EC) with Nicod’s condition about instance confirmation leads, however, to paradoxical results: Take the hypothesis that nothing that is non-black can be a raven (H ′ = ∀x : ¬Bx → ¬Rx). A white shoe is an instance of that hypothesis, thus, observing it counts as a confirming observation report. By the Equivalence Condition, H ′ is equivalent to H = ∀x : Rx → Bx so that a white shoe also confirms the hypothesis that all ravens are black.
Hempel and the Paradoxes of Confirmation
But a white shoe seems to be utterly irrelevant to the color of ravens. Hence, we have three individually plausible but jointly incompatible claims, at least one of which has to be rejected:
1. Nicod Condition (NC): For a hypothesis of the form H = ∀x : Rx → Bx and any individual constant a, an observation report of the form Ra.Ba confirms H.
2. Equivalence Condition (EC): If H and H′ are logically equivalent sentences then E confirms H relative to K if and only if E confirms H′ relative to K.
3. Confirmation Intuition (CI): A hypothesis of the form H = ∀x : Rx → Bx is not confirmed by an observation report of the form ¬Ra.¬Ba.
This set of jointly inconsistent claims constitutes the paradox of confirmation and was first discussed in detail by Hempel [1965].14 The main conflict consists in the fact that (EC) and (NC) merely consider the logical form of scientific hypotheses whereas (CI) implicitly assumes that there is an ‘intended domain’ of a scientific hypothesis. In particular, only ravens seem to be evidentially relevant to the hypothesis that all ravens are black.
One option to dissolve the paradox, discussed (and rejected) by Hempel [1945/1965], consists in re-interpreting the hypothesis. General natural laws in the form of universal conditionals apparently confer existential import on the tentative hypotheses: ‘All ravens are black’ could be read as ‘all ravens are black and there exists at least one raven’. Then, there is no inconsistency between the above three claims. But that proposal is not convincing. The observation of a single black raven provides conclusive evidence in favor of the second part of the hypothesis. As Alexander [1958, p. 230] has pointed out, we will then focus on confirming or undermining the first part of the hypothesis (‘all ravens are black’) as soon as a black raven has been observed. Hence, the paradox appears again. Interpreting the raven hypothesis as having existential import does not remove the problem.
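The contraposition step that drives the paradox can be checked mechanically. The following is a minimal sketch, assuming the conditionals are read as material conditionals and that Nicod-style instance confirmation simply means that the object satisfies both antecedent and consequent; the function and variable names are illustrative and do not come from the text.

```python
from itertools import product


def h_instance(raven: bool, black: bool) -> bool:
    """Instance of H for one object: Ra -> Ba (material conditional)."""
    return (not raven) or black


def h_prime_instance(raven: bool, black: bool) -> bool:
    """Instance of H' for one object: not-Ba -> not-Ra."""
    return (not (not black)) or (not raven)


def nicod_instance(antecedent: bool, consequent: bool) -> bool:
    """Nicod-style positive instance of 'all A are C': the object is A and C."""
    return antecedent and consequent


rows = []
for raven, black in product([True, False], repeat=2):
    rows.append({
        "raven": raven,
        "black": black,
        # H and H' agree on this kind of object
        "equivalent": h_instance(raven, black) == h_prime_instance(raven, black),
        # positive instance of H: a raven that is black
        "nicod_H": nicod_instance(raven, black),
        # positive instance of H': a non-black thing that is a non-raven
        "nicod_H_prime": nicod_instance(not black, not raven),
    })

equivalent_everywhere = all(row["equivalent"] for row in rows)
white_shoe = next(r for r in rows if not r["raven"] and not r["black"])

for row in rows:
    print(row)
```

In this sketch the two conditionals agree on every kind of object, yet the non-black non-raven is a positive instance of H′ only. Transferring that confirmation to H requires (EC), which is exactly where (NC), (EC) and (CI) collide.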
Before going into the details of attempted solutions it is interesting to note a line of thought that can be traced back to Hempel himself. “If the given evidence E [...] is black [Ba], then E may reasonably be said to even confirm the hypothesis that all objects are black [∀x : Bx], and a fortiori, E supports the weaker assertion that all ravens are black [H = ∀x : Rx → Bx].”15
14 Note that the inconsistency vanishes if the conditionals are interpreted as subjunctive and not as material conditionals: contraposition is not a valid form of inference for subjunctive conditionals.
15 [Hempel, 1945/1965, p. 20].
We can transfer this argument in a canonical way to non-ravens (cf. [Goodman, 1983, pp. 70-71]):
Jan Sprenger
If the given evidence E is a non-raven [¬Ra], then E may reasonably be said to even confirm that all objects are non-ravens [∀x : ¬Rx], and a fortiori, E supports the weaker assertion that all non-black objects are non-ravens [∀x : ¬Bx → ¬Rx], i.e. that all ravens are black [H = ∀x : Rx → Bx].16
Thus we obtain another ostensibly decisive argument against (CI). But as remarked by Fitelson [2006], the argument requires additional assumptions. Above all, the step from ‘a non-raven confirms H’ to ‘a black non-raven confirms H’ is far from trivial: it rests on the principle of monotonicity, according to which extending the evidence cannot destroy the confirmation relation. Without this additional claim, the above argument would not bear on the observation of a black non-raven. Moreover, Hempel’s adequacy condition (SCC) is employed, namely in the transition from ‘E confirms ∀x : ¬Rx’ to ‘E confirms ∀x : ¬Bx → ¬Rx’. We may suspend judgment on monotonicity, but (SCC) is, as seen in the previous section, a controversial condition on relative confirmation. So the above reasoning does not remove the paradox convincingly.17
Hempel suggests that we should learn to live with the paradoxical conclusion. His argument can be paraphrased thus:18 Assume that we observe a grey, formerly unknown bird that is in most relevant external aspects very similar to a raven. That observation puts the raven hypothesis in jeopardy. It might just be the case that we have seen a non-black raven and falsified our hypothesis. But a complex genetic analysis reveals that the bird is no raven. Indeed, it is more closely related to crows than to ravens. Hence, it sounds logical to say that the results of the genetic analysis corroborate the raven hypothesis: it was at risk and it has survived a possible falsification. In other words, a potential counterexample has been eliminated.
Thus there is no paradox in saying that an observation report of the form ¬Ra.¬Ba confirms H, in the sense that a satisfies the constraint given by H that nothing can be both a raven and have a color different from black.19 Hempel elaborates the crucial point in more detail, too: Compare two possible observation reports. First, we observe a crow which we know to be a crow and notice that it is grey (E1 = ¬Ba, K1 = ¬Ra).
16 I borrow the idea to paraphrase Hempel’s argument in this way from Maher [1999] and Fitelson [2006].
17 Quine [1969], by contrast, defends (CI) and finds the paradox unacceptable. Since he maintains (EC), too, he is forced to reject the Nicod Condition. Nonetheless, he defends a modified Nicod Condition whose content is restricted to natural kinds. Only instances of natural kinds confirm universal conditionals, and clearly, neither non-ravens nor non-black things count as natural kinds. However, this line of reasoning is subject to the Hempelian criticism explained in the text.
18 [Hempel, 1945/1965] makes the argument for quite a different example (‘all sodium salts burn yellow’) but I would like to stick to the original raven example in order not to confuse the reader.
19 It might now be objected that the observation of a black raven seems to lend stronger support to the raven hypothesis than the observation of a grey crow-like bird since such an observation is more relevant to the raven hypothesis. But this is a problem for a quantitative account of confirmation (we will get back to this in section 5) and not for a qualitative one.
This seems to be a fake experiment
if evaluated with regard to the raven hypothesis — we knew beforehand that a crow could not have been a non-black raven. There was no risk involved in the experimentation, so neither confirmation nor Popperian corroboration could result. In the second case we observe an object about which we do not know anything beforehand and discover that the bird is a grey crow (E2 = ¬Ra.¬Ba, K2 = ∅). That counts as a sound case of confirmation, as argued above. Hempel describes the difference thus: When we are told beforehand that the bird is a crow [...] “this has the consequence that the outcome of the [...] color test becomes entirely irrelevant for the confirmation of the hypothesis and thus can yield no new evidence for us.”20 In other words, the available background knowledge in the two cases makes a crucial difference. Neglecting this difference is responsible for the fallacious belief (CI) that non-black non-ravens cannot confirm the hypothesis that all ravens are black. (CI) is plausible only if we tacitly introduce the additional background knowledge that the test object is no raven. Thus, in the above example, H should be confirmed if we do not know beforehand that the bird under scrutiny is a crow (K2 = ∅) and it should not be confirmed if we know beforehand that the bird is a crow (K1 = ¬Ra). In Hempel’s own words, “If we assume this additional information as given, then, of course, the outcome of the experiment can add no strength to the hypothesis under consideration. But if we are careful to avoid this tacit reference to additional knowledge (which entirely changes the character of the problem) [...] we have to ask: Given some object a [that is neither a raven nor black, but we do not happen to know this, J.S.]: does a constitute confirming evidence for the hypothesis? And now [...] 
it is clear that the answer has to be in the affirmative, and the paradoxes vanish.”21 Thus, the paradox is a psychological illusion, created by tacit introduction of background knowledge into the confirmation relation. From a logical point of view, (CI) reveals itself as plainly false. One of the three premises of the paradox has been discarded.
A problem that remains, though, is that the Hempelian resolution does not make clear why ornithologists should go into the forest to check their hypothesis and not randomly note the properties of whatever object they encounter. This might be called the problem of armchair ornithology. In fact, this criticism is raised by Watkins [1957]. Watkins insinuates that Hempel may cheaply confirm H = ∀x : Rx → Bx by summing up observations of non-black non-ravens while sitting in the armchair.22
20 [Hempel, 1945/1965, p. 19].
21 [Hempel, 1945/1965, pp. 19-20].
22 A reply to Watkins is [Vincent, 1964].
In a similar vein, Watkins [1957] objects that cases of confirmation such as the observation of a white shoe do not
put the hypothesis to a real test and thus contradict the falsificationist methodology for scientific hypotheses. On the Popperian, falsificationist account, hypotheses can only be corroborated by the survival of severe tests, and observing shoes does not count as a real test of a hypothesis. Second, the ‘negation’ of observing a white shoe, namely observing a black shoe, would equally confirm the raven hypothesis on Hempel’s account. This trivializes the notion of instance confirmation on which Hempel’s satisfaction criterion is based. Every universal conditional is automatically confirmed by lots of irrelevant evidence. Watkins concludes that inductivist reasoning about confirmation (such as Hempel’s instance confirmation) had better be replaced by a truly falsificationist account.
Alexander [1958; 1959] replies to Watkins that falsificationist corroboration also presupposes some kind of inductive reasoning if it is supposed to affect our expectations about the future: if a hypothesis survives several tests, we expect that it will survive future tests, too. Otherwise it would not make sense to say that the hypothesis has been corroborated. So Watkins’s dismissal of inductive, instance-based reasoning goes too far. Moreover, Hempel makes an important proviso, namely that there be no substantial background assumptions when evaluating the evidential relevance of ¬Ra.¬Ba. If we do not know where to find and where not to find ravens, i.e. if we randomly sample from the class of all objects, the observation of a white shoe does count as a genuine test of the raven hypothesis. One may object that Hempel’s proviso is unrealistic for actual cases of (dis)confirmation (cf. [Watkins, 1960]), but conditional on this proviso, Hempel’s conclusion that everything that is not a non-black raven supports H = ∀x : Rx → Bx seems to be correct. So the first objection vanishes.
Second, it is misleading to say that the raven hypothesis is confirmed by conflicting evidence. Rather, different kinds of evidence (namely, shoes of different color) equally confirm the hypothesis. Similarly, observing male as well as female black ravens confirms the raven hypothesis. Here, nobody would object that those pieces of evidence are conflicting and therefore inadmissible for confirmation.
However, as pointed out by Agassi [1959], Hempel’s conclusion is stronger than the claim that a non-black non-raven may confirm the raven hypothesis: it is claimed that this piece of evidence always confirms the raven hypothesis, independent of the background knowledge. Good [1960; 1961] has suggested the following (slightly modified) example to refute that conjecture: The only black middle-sized objects which a child sees are black crows and black ravens. No other black objects occur, and all ravens and crows the child sees are black. Suddenly she discovers a white crow. Then she says: “How surprising! Apparently objects that are supposed to be black can sometimes be white instead.”23 And what is good for the goose (crows) is equally good for the gander (ravens). So the child concludes that ravens may be white, too. On Hempel’s account, the observation of a grey crow would support rather than undermine the hypothesis that all ravens are black. Isn’t that behavior insensitive to the peculiarities of the specific case?
23 [Good, 1961, p. 64]. Cf. [Swinburne, 1971].
I believe Agassi and Good are on the right track, but they do not fully pin down
Hempel’s problem. We may admit that Hempel succeeds in explaining away the paradoxical nature of the problem. But his own satisfaction criterion fails to resolve the paradox. Remember Hempel’s diagnosis that tacitly introduced or deliberately suppressed background information is the source of the paradox. While perfectly agreeing with Hempel on this point, Fitelson and Hawthorne [2009] point out that Hempel is unable to make that difference in his own theory of confirmation. The reason is that his account is in general monotone with regard to the background knowledge: as long as the domain of the evidence is not extended (i.e. no individual constants are added), additional background knowledge cannot destroy the confirmation relation. Hempel inherits this property from deductive logic, because E.K ⊨ H|dom(E) is the crucial condition for direct Hempel-confirmation, and thus also for Hempel-confirmation. Evidently, logical entailment is preserved when further conditions are added to the antecedent. Therefore Hempel’s own account yields confirmation even if the background knowledge is far too strong. In the case where we do not know beforehand that a is no raven, confirmation follows from E2.K2 = ¬Ra.¬Ba ⊨ (Ra → Ba) = H|dom(E), and in the case where we do know this beforehand, we have precisely the same implication E1.K1 = ¬Ba.¬Ra ⊨ (Ra → Ba) = H|dom(E). Hence, adding the background knowledge that the test object is no raven does not destroy the (Hempel-)confirmation of H.
Certainly Hempel spots two points correctly: First, the paradoxical conclusion of the raven example should be embraced, contra (CI). Second, background knowledge plays a crucial role when it comes to explaining the source of the paradox. But while pointing in the right direction, Hempel fails to set up an account of confirmation that conforms to his own diagnosis of the paradox. In particular, the adequacy criteria outlined in section 2 fail to be sufficient for a satisfactory concept of confirmation.
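The monotonicity point can be illustrated with a brute-force entailment check over the two atomic sentences Ra and Ba. This is only a sketch under simplifying assumptions: a single individual constant a, propositional valuations in place of first-order models, and helper names (`entails`, `all_propositions`) that are hypothetical and not taken from Hempel or from Fitelson and Hawthorne.

```python
from itertools import product

ATOMS = ("Ra", "Ba")
VALUATIONS = list(product([True, False], repeat=len(ATOMS)))


def entails(premises, conclusion) -> bool:
    """True iff every valuation satisfying all premises satisfies the conclusion."""
    for values in VALUATIONS:
        v = dict(zip(ATOMS, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True


def all_propositions():
    """Yield every truth function over Ra and Ba (16 in total)."""
    for mask in product([True, False], repeat=len(VALUATIONS)):
        table = dict(zip(VALUATIONS, mask))
        yield lambda v, t=table: t[(v["Ra"], v["Ba"])]


def not_raven(v):
    return not v["Ra"]


def not_black(v):
    return not v["Ba"]


def h_restricted(v):
    """H restricted to dom(E) = {a}, i.e. Ra -> Ba."""
    return (not v["Ra"]) or v["Ba"]


# Crow known beforehand: E1 = not-Ba, K1 = not-Ra
known_crow = entails([not_black, not_raven], h_restricted)
# Nothing known beforehand: E2 = not-Ra.not-Ba, K2 empty
unknown_object = entails([not_raven, not_black], h_restricted)
# Adding any further background sentence over the same atoms keeps the entailment
monotone = all(
    entails([not_raven, not_black, k], h_restricted) for k in all_propositions()
)
# The colour report alone, without the information not-Ra, does not entail Ra -> Ba
colour_only = entails([not_black], h_restricted)

print(known_crow, unknown_object, monotone, colour_only)
```

On this toy reconstruction the entailment holds in both cases and survives every additional background sentence, which is why the satisfaction criterion cannot register the difference that Hempel’s own diagnosis relies on.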
The raven paradox drastically shows how valuable it is to distinguish between evidence and background knowledge. The distinction has to be formalized in a way that avoids Hempel’s problem. It further exhibits the problem of monotonicity with regard to evidence and background knowledge: when we happen to know more, confirmation might get lost. Therefore monotonicity is not a desirable property for accounts of confirmation, and I take this to be the third important moral from the paradoxes of confirmation. On the other hand, the arguments to resolve the paradox by giving up (CI) were on the whole convincing, and Hempel’s sixty-three-year-old judgment that part of the paradoxical appearance often rests on a psychological illusion has some plausibility. The next section examines the paradoxes of confirmation from a probabilistic perspective.24
24 In a recent paper, Branden Fitelson [2009] elaborates the similarity of the raven paradox to a famous logical puzzle: the Wason Selection Task [Wason and Shapiro, 1971]. In the Wason Selection Task, four cards lie on the table. On the front side of each card, there is a letter, on the back side, there is a number. The hypothesis H is: All cards with an even number on
5 THE BAYESIAN’S RAVEN PARADOX
5.1 Bayesian confirmation theory and the Nicod Condition
So far, we have discussed the paradox in a qualitative way: does observing a non-black non-raven confirm the hypothesis that all ravens are black? The Hempelian resolution does not, however, clarify why we would recommend that an ornithologist go into the forest in order to confirm the raven hypothesis. A natural reply would contend that black ravens confirm the raven hypothesis to a much stronger degree than white shoes. That thesis motivates a quantitative treatment of the paradox and will be the main subject of this section. Actually, the ‘confirmation intuition’ (CI) about the missing confirmatory value of non-ravens has three versions, a qualitative, a comparative and a quantitative one:
Qualitative Intuition: The observation of a non-black non-raven does not confirm the hypothesis that all ravens are black.
Comparative Intuition: The observation of a non-black non-raven confirms the hypothesis that all ravens are black to a lower degree than the observation of a black raven.
Quantitative Intuition: The observation of a non-black non-raven confirms the hypothesis that all ravens are black only to a minute degree.
Part of the confusion in the existing literature is due to the fact that these three intuitions are not clearly set apart from each other. Hempel criticized exclusively the qualitative version. The quantitative and the comparative versions save the part of (CI) that concerns the extent of confirmation, and here our intuitions seem to be more stable. They form the resilient kernel of (CI) which makes the raven paradox so intriguing for modern confirmation theory.
A further source of confusion is the question of which background knowledge should be assumed when evaluating these intuitions. Are they meant to hold for some, for empty or for all conceivable background assumptions?
Or are those intuitions relative to the actual background assumptions?25 Hence, twelve (= 3 × 4) different confirmation intuitions about the paradox could in principle be distinguished. But I believe intuitions with respect to actual background knowledge to be most interesting.
24 (continued) one side have a vowel printed on the other side. Which of the cards (A, 2, F, 7) should you turn over to test the truth of H? Of course you have to turn over the card with the ‘2’ since this can be an obvious instance or counterexample to H. This line of reasoning is captured in the Nicod Condition, too. It is less obvious that you also have to turn over the ‘F’ in order to test the contrapositive: All cards with a consonant on one side have an odd number on the other side. People regularly fail to recognize that the ‘F’ has to be turned over, too. The kind of confirmation which this action yields is structurally identical to confirming the raven hypothesis by observing that a grey bird is not a raven, but a crow. Both the results in the Wason Selection Task and the debate around the raven paradox highlight the same kind of reluctance to accept instances of the contrapositive as instances of the hypothesis itself.
25 I borrow these distinctions from [Fitelson, 2006].
First, most people seem to have that in mind when being
confronted with the paradox, so it is arguably the most accurate reconstruction of the paradox. Second, we will later argue that the Nicod Condition is best understood as referring to actual background knowledge. Indeed, Good’s [1961] raven/crow example suggests that the above confirmation intuitions will trivially hold for some background knowledge and trivially be false for every conceivable background knowledge. Finally, what empty background knowledge means stands in need of explication (though see [Carnap, 1950; Maher, 2004]). Thus we are well advised to focus on actual background knowledge. Here we have seen that the qualitative version of (CI) is under pressure, but on the other hand, the comparative and the quantitative versions enjoy some plausibility. This section tries to reinforce the arguments against the qualitative intuition and to vindicate the comparative and quantitative intuition from the point of view of Bayesian confirmation theory. The problem with the raven paradox is not the alleged truth of (CI), but the truth of the weaker comparative and quantitative versions. Qualitatively, Bayesian confirmation amounts to an increase in rational degree of belief upon learning new evidence. Degrees of belief are symbolized by subjective probabilities. In other words, evidence E confirms H if and only if P (H|E) > P (H). But we have to remember a lesson from the very first chapter of the book — confirmation is a three place predicate, relative to background knowledge. As both the raven paradox and the Duhem problem teach us, background assumptions are a crucial part of relating theory to evidence and inductive reasoning in science. The natural way to integrate them consists in taking background information for granted and conditionalizing an agent’s degrees of belief on it.26 That said, we can write down a first, qualitative definition of Bayesian confirmation: DEFINITION 9. 
A piece of evidence E confirms a hypothesis H relative to background assumptions K if and only if P(H|E.K) > P(H|K).
This definition gives a probabilistic explication of relative confirmation, not of absolute confirmation: Definition 9 describes the relevance of evidence for a hypothesis, not high credibility of a hypothesis. However, the definition remains qualitative. To be able to tackle the comparative and quantitative versions of the paradox, we have to introduce a measure of confirmation. The following three candidates have been especially popular in the literature (see [Fitelson, 2001] for a discussion of their virtues and vices):
Difference Measure
\[ d(H, E, K) := P(H \mid E.K) - P(H \mid K) \]
Log-Ratio Measure
\[ r(H, E, K) := \log \frac{P(H \mid E.K)}{P(H \mid K)} \]
Log-Likelihood Measure
\[ l(H, E, K) := \log \frac{P(E \mid H.K)}{P(E \mid \neg H.K)} \]
26 Nonetheless, for reasons of convenience, we will often speak (but not write) as if the background knowledge were empty.
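To see how the three measures behave on concrete numbers, here is a small sketch. The prior and the two likelihoods are purely hypothetical values chosen for illustration, the background knowledge K is taken to be already absorbed into them, and the function names are not from the text.

```python
import math


def difference_measure(p_h_given_ek: float, p_h_given_k: float) -> float:
    """d(H, E, K) = P(H|E.K) - P(H|K)."""
    return p_h_given_ek - p_h_given_k


def log_ratio_measure(p_h_given_ek: float, p_h_given_k: float) -> float:
    """r(H, E, K) = log[P(H|E.K) / P(H|K)]."""
    return math.log(p_h_given_ek / p_h_given_k)


def log_likelihood_measure(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """l(H, E, K) = log[P(E|H.K) / P(E|not-H.K)]."""
    return math.log(p_e_given_h / p_e_given_not_h)


# Hypothetical inputs (assumptions for illustration only)
prior_h = 0.5          # P(H|K)
p_e_given_h = 0.8      # P(E|H.K)
p_e_given_not_h = 0.4  # P(E|not-H.K)

# Law of total probability and Bayes' theorem
p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
posterior_h = prior_h * p_e_given_h / p_e  # P(H|E.K)

d = difference_measure(posterior_h, prior_h)
r = log_ratio_measure(posterior_h, prior_h)
l = log_likelihood_measure(p_e_given_h, p_e_given_not_h)

print(f"posterior={posterior_h:.4f} d={d:.4f} r={r:.4f} l={l:.4f}")
```

With these inputs all three measures are positive, so they agree with the qualitative verdict of Definition 9 that E confirms H. They can still disagree about how strongly one piece of evidence confirms compared to another, which is why the choice of measure matters for the comparative and quantitative versions of the paradox.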
For reasons of simplicity, I restrict myself in the following to d and l, which suffice to illustrate the substantial points. In the 1950s and early 1960s, the discussion of the confirmation paradoxes focussed on discussing, defending and rebutting (CI). In particular, Hempel himself has rejected (CI) and argued that tacit introduction of background knowledge may be responsible for the paradoxical appearance. In the light of Bayesian confirmation theory, one could, however, not only reject (CI), but also question (NC). Again, four versions of (NC) have to be distinguished.
Nicod Condition (NC): For a hypothesis of the form H = ∀x : Rx → Bx and any individual constant a, an observation report of the form Ra.Ba confirms H, relative to every/actual/tautological/any background knowledge.
Certainly, the Nicod Condition (every black raven confirms the raven hypothesis) is true relative to some background knowledge. But that claim is very weak and practically not helpful. It is somewhat more surprising that it is not true under all circumstances. I. J. Good [1967] constructed a simple counterexample in a note for the British Journal for the Philosophy of Science: There are only two possible worlds. In one of them, W1, there are a hundred black ravens, no non-black ravens and one million other birds. In the other world W2, there are a thousand black ravens, one white raven and one million other birds. Thus, H is true whenever W1 is the case, and false whenever W2 is the case. For all suggested measures of confirmation, the observation of a black raven is evidence that W2 is the case and therefore evidence that not all ravens are black: P(Ra.Ba|W1)
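Good’s two-world example can be worked through numerically. The sketch below rests on two assumptions that are not stated in the passage above: the observed bird is drawn uniformly at random from all birds in the actual world, and both worlds receive a prior of 1/2. The dictionary keys and helper names are illustrative.

```python
from fractions import Fraction
import math

# World compositions as described in the text
W1 = {"black_ravens": 100, "nonblack_ravens": 0, "other_birds": 1_000_000}
W2 = {"black_ravens": 1000, "nonblack_ravens": 1, "other_birds": 1_000_000}


def p_black_raven(world: dict) -> Fraction:
    """P(Ra.Ba | world), assuming a bird is sampled uniformly at random."""
    return Fraction(world["black_ravens"], sum(world.values()))


p1 = p_black_raven(W1)  # likelihood of a black raven in W1 (H true)
p2 = p_black_raven(W2)  # likelihood of a black raven in W2 (H false)

prior_w1 = Fraction(1, 2)  # assumed prior; H is true exactly if W1 is the case
posterior_w1 = prior_w1 * p1 / (prior_w1 * p1 + (1 - prior_w1) * p2)

d = posterior_w1 - prior_w1   # difference measure for H
l = math.log(float(p1 / p2))  # log-likelihood measure for H

print(f"P(Ra.Ba|W1) = {float(p1):.6f}")
print(f"P(Ra.Ba|W2) = {float(p2):.6f}")
print(f"P(H|Ra.Ba)  = {float(posterior_w1):.4f}")
print(f"d = {float(d):.4f}, l = {l:.4f}")
```

Under these assumptions a black raven is roughly ten times more likely in W2 than in W1, so both d and l come out negative: the black raven disconfirms the hypothesis that all ravens are black, contrary to the unrestricted Nicod Condition.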
[...] 0, and P(Ba.Ra|K) > 0.
• P(¬Ba|H.K) > P(Ra|H.K).
•
\[ \frac{P(H \mid Ra.K)}{1 - P(H \mid Ra.K)} \cdot \frac{1 - P(H \mid \neg Ba.K)}{P(H \mid \neg Ba.K)} > P(Ba \mid Ra.\neg H.K) + [1 - P(Ba \mid Ra.\neg H.K)] \, \frac{P(Ra \mid H.K)}{P(\neg Ba \mid H.K)} \tag{2} \]
Then l(H, Ba.Ra, K) > l(H, ¬Ba.¬Ra, K), i.e.
\[ \log \frac{P(Ba.Ra \mid H.K)}{P(Ba.Ra \mid \neg H.K)} > \log \frac{P(\neg Ba.\neg Ra \mid H.K)}{P(\neg Ba.\neg Ra \mid \neg H.K)} \tag{3} \]
and in particular
\[ P(H \mid Ba.Ra.K) > P(H \mid \neg Ba.\neg Ra.K) \tag{4} \]
(The proof can be found in [Fitelson and Hawthorne, 2009].)
The theorem asserts that the degree of support which Ba.Ra lends to H, as measured by the log-likelihood ratio l, exceeds the degree of support ¬Ba.¬Ra lends to H (see (3)). In other words, Fitelson and Hawthorne vindicate the comparative version of (CI): black ravens confirm the raven hypothesis better than white shoes. It follows easily that the posterior probability of H is higher if a black raven is observed than if a white shoe (or any other non-black non-raven) is observed.
To evaluate their result, we have to look at the assumptions of the theorem. The first set of assumptions is fully unproblematic: it is demanded that neither the observation of a black raven nor the observation of a non-black non-raven will determine the truth or falsity of H. Moreover, the rational degree of belief that non-black ravens, black ravens or non-black non-ravens will be observed has to be higher than zero (though it can be arbitrarily small). These are just assumptions that reflect the openness of our probability assignments to empirical evidence. The second assumption is a little bit richer in content, but still extremely plausible: if H is true then we are more likely to observe a non-black object than a raven. That reflects the belief that there are many non-black objects (grey birds, for example), but comparatively few ravens.
Thus the last inequality (2) carries the main burden of the theorem. Is it a plausible assumption? Let us have a look at the right hand side first. Even if H is wrong, we expect the number of black ravens to vastly exceed the number of non-black ravens. (Note that we have already observed many black ravens!) Thus, x := P(Ba | Ra.¬H.K) is quite close to 1. Moreover, in any case there are many more non-black things than ravens. So the ratio P(Ra | H.K)/P(¬Ba | H.K) will be very small, and the second addend on the right hand side of (2) can be
neglected (since 1 − x is close to zero). Now we come to the left hand side. Regardless of whether we observe black ravens or white shoes, a single observation of either object will not impose major changes on the posterior probability of H. This transfers to the posterior odds of H after observing Ba.Ra or ¬Ba.¬Ra, respectively. Thus, the quotient of those posterior odds will be close to 1, even closer than x = P(Ba | Ra.¬H.K). And by this line of reasoning, we have established (2) and the last of Fitelson and Hawthorne’s premises. Thus, their argument is not only valid, but also conclusive.
Of course, it is still possible to doubt one of the plausibility arguments in the previous paragraphs. But I think they are cogent enough to put the burden of proof on those who doubt Fitelson and Hawthorne’s comparative solution. Moreover, the elegance of their proof deserves high praise, and since they use clear-cut assumptions, their analysis directly identifies the points of disagreement between defenders and critics of their solution. Furthermore, they do not rely on the independence claim (IA) or variants thereof.
6 SUMMARY
The first part of this article has described and reviewed Hempel’s theory of confirmation and his analysis of the paradoxes of confirmation. Hempel’s approach to modeling confirmation departs from Carnap’s probabilistic approach: he decides to lay the qualitative foundations first by formulating general adequacy constraints that any account of confirmation has to satisfy. Hempel’s qualitative account of confirmation breaks with the classical hypothetico-deductive approach and proposes the satisfaction criterion: the restriction of a hypothesis to a specified object domain has to be entailed by the evidence. The criterion, however, has several shortcomings, some of them of a technical nature, others connected to the failure to account for confirmation by successful prediction.
One of the most severe objections contends, however, that the satisfaction criterion is often monotone with respect to the background knowledge and thus unable to deal with the paradoxes of confirmation. On the one hand, Hempel has convincingly argued that the paradoxes rest on a psychological illusion, due to the tacit introduction of additional background knowledge. But on the other hand, his own criterion of confirmation neglects that insight and therefore fails to remove the paradox.
The second part of the article focuses on recent attempts to solve the paradoxes in the framework of Bayesian confirmation theory. While Hempel was probably right that the qualitative version of the paradox was just a Scheinproblem, there are comparative and quantitative versions of the paradox, too. To vindicate these intuitions in a probabilistic framework has proved to be a tough task. By their [2009] result, Fitelson and Hawthorne solve the comparative problem and give some reasons for optimism. But so far, the quantitative problem remains unsolved. Even more embarrassing, I have argued that there are principled problems that impair a sufficiently general resolution of the paradoxes of confirmation. The conclusion which I draw, scepticism towards quantitative solutions of the paradox, is
somewhat atypical because most contributions to the literature either propose a solution or suggest replacing a previous attempt with a novel and better one.34 But the longstanding history of the paradox indicates that it will be hard to overcome.

34 [Korb, 1994] and [Vranas, 2004] are notable exceptions.

ACKNOWLEDGEMENTS

I would like to thank Andreas Bartels, Branden Fitelson, Stephan Hartmann, James Hawthorne, Franz Huber, Kevin Korb, and Jacob Rosenthal for their incredibly helpful advice and criticism.

BIBLIOGRAPHY

[Agassi, 1959] J. Agassi. Corroboration versus Induction, British Journal for the Philosophy of Science, 9:311-317, 1959.
[Alexander, 1958] H. G. Alexander. The Paradoxes of Confirmation, British Journal for the Philosophy of Science, 9:227-233, 1958.
[Alexander, 1959] H. G. Alexander. The Paradoxes of Confirmation – A Reply to Dr Agassi, British Journal for the Philosophy of Science, 10:229-234, 1959.
[Black, 1966] M. Black. Notes on the ‘paradoxes of confirmation’. In Jaakko Hintikka and Patrick Suppes, eds., Aspects of Inductive Logic, pp. 175-197, North-Holland, Amsterdam, 1966.
[Carnap, 1950] R. Carnap. Logical Foundations of Probability, The University of Chicago Press, Chicago, 1950.
[Carnap, 1952] R. Carnap. The Continuum of Inductive Methods, The University of Chicago Press, Chicago, 1952.
[Dietrich and Moretti, 2005] F. Dietrich and L. Moretti. On Coherent Sets and the Transmission of Confirmation, Philosophy of Science, 72:403-424, 2005.
[Duhem, 1914] P. Duhem. La Théorie Physique: Son Objet, Sa Structure. 1914. Second edition, reprinted in 1981 by J. Vrin, Paris.
[Earman, 1992] J. Earman. Bayes or Bust? The MIT Press, Cambridge/MA, 1992.
[Earman and Salmon, 1992] J. Earman and W. Salmon. The Confirmation of Scientific Hypotheses. In Merrilee H. Salmon, ed., Introduction to the Philosophy of Science, pp. 42-103. Hackett, Indianapolis, 1992.
[Fitelson, 2001] B. Fitelson. A Bayesian Account of Independent Evidence with Applications, Philosophy of Science, 68:S123-S140, 2001.
[Fitelson, 2006] B. Fitelson. The Paradox of Confirmation, Philosophy Compass, 1:95-113, 2006.
[Fitelson, 2009] B. Fitelson. The Wason Task(s) and the Paradox of Confirmation, Synthese, 2009.
[Fitelson and Hawthorne, 2009] B. Fitelson and J. Hawthorne. How Bayesian Confirmation Theory Handles the Paradox of the Ravens. In Ellery Eells and James Fetzer, eds., Probability in Science, Open Court, Chicago, 2009.
[Friedman, 1999] M. Friedman. Reconsidering Logical Positivism, Cambridge University Press, Cambridge, 1999.
[Gaifman, 1979] H. Gaifman. Subjective Probability, Natural Predicates and Hempel’s Ravens, Erkenntnis, 21:105-147, 1979.
[Gemes, 1993] K. Gemes. Hypothetico-Deductivism, Content and the Natural Axiomatisation of Theories, Philosophy of Science, 60:477-487, 1993.
[Gemes, 2006] K. Gemes. Content and Watkins’ Account of Natural Axiomatizations, dialectica, 60:85-92, 2006.
[Glymour, 1980] C. Glymour. Theory and Evidence, Princeton University Press, Princeton, 1980.
[Good, 1960] I. J. Good. The Paradox of Confirmation, British Journal for the Philosophy of Science, 11:145-149, 1960.
[Good, 1961] I. J. Good. The Paradox of Confirmation (II), British Journal for the Philosophy of Science, 12:63-64, 1961.
[Good, 1967] I. J. Good. The White Shoe is a Red Herring, British Journal for the Philosophy of Science, 17:322, 1967.
[Good, 1968] I. J. Good. The White Shoe qua Herring is Pink, British Journal for the Philosophy of Science, 19:156-157, 1968.
[Goodman, 1983] N. Goodman. Fact, Fiction and Forecast, Fourth Edition. Harvard University Press, Oxford, 1983.
[Hempel, 1943] C. G. Hempel. A Purely Syntactical Definition of Confirmation, Journal of Symbolic Logic, 8:122-143, 1943.
[Hempel, 1965] C. G. Hempel. Studies in the Logic of Confirmation, Aspects of Scientific Explanation, 3-51, 1965. The Free Press, New York. Reprint from Mind 54, 1945.
[Hempel, 1967] C. G. Hempel. The White Shoe: No Red Herring, British Journal for the Philosophy of Science, 18:239-240, 1967.
[Hosiasson, 1940] J. Hosiasson-Lindenbaum. On Confirmation, Journal of Symbolic Logic, 5:133-148, 1940.
[Horwich, 1982] P. Horwich. Probability and Evidence, Cambridge University Press, Cambridge, 1982.
[Howson and Urbach, 1993] C. Howson and P. Urbach. Scientific Reasoning: The Bayesian Approach. Second Edition. Open Court, La Salle, 1993.
[Huber, 2008] F. Huber. Hempel’s Logic of Confirmation, Philosophical Studies, 139:181-189, 2008.
[Humburg, 1986] J. Humburg. The Solution of Hempel’s Raven Paradox in Rudolf Carnap’s System of Inductive Logic, Erkenntnis, 24:57-72, 1986.
[Korb, 1994] K. B. Korb. Infinitely many solutions of Hempel’s paradox. Theoretical Aspects Of Rationality and Knowledge – Proceedings of the 5th conference on Theoretical aspects of reasoning about knowledge. pp. 138-149. San Francisco: Morgan Kaufmann Publishers, 1994.
[Mackie, 1963] J. L. Mackie. The Paradox of Confirmation, British Journal for the Philosophy of Science, 13:265-276, 1963.
[Maher, 1999] P. Maher. Inductive Logic and the Ravens Paradox, Philosophy of Science, 66:50-70, 1999.
[Maher, 2004] P. Maher. Probability Captures the Logic of Confirmation. In Christopher Hitchcock, ed., Contemporary Debates in the Philosophy of Science, pp. 69-93. Blackwell, Oxford, 2004.
[Musgrave, 2009] A. Musgrave. Popper and Hypothetico-Deductivism. In this volume, 2009.
[Nicod, 1961] J. Nicod. Le Problème Logique de l’Induction. Paris: Presses Universitaires de France. Originally published in 1925 (Paris: Alcan), 1961.
[Popper, 1963] K. R. Popper. Conjectures and Refutations: The Growth of Scientific Knowledge, Routledge, London, 1963.
[Royall, 1997] R. Royall. Statistical Evidence: A Likelihood Paradigm, Chapman & Hall, London, 1997.
[Schurz, 1991] G. Schurz. Relevant Deduction, Erkenntnis, 35:391-437, 1991.
[Suppes, 1969] P. Suppes. Models of Data. In P. Suppes, ed., Studies in the Methodology and Foundations of Science. Selected Papers from 1951 to 1969, pp. 24-35. Reidel, Dordrecht, 1969. Originally published in Ernest Nagel, Patrick Suppes and Alfred Tarski (eds.): “Logic, Methodology and Philosophy of Science: Proceedings of the 1960 International Congress”. Stanford: Stanford University Press, 252-261, 1962.
[Swinburne, 1971] R. Swinburne. The Paradoxes of Confirmation – A Survey, American Philosophical Quarterly, 8:318-330, 1971.
[Uebel, 2006] T. Uebel. The Vienna Circle, 2006.
[Vincent, 1964] D. H. Vincent. The Paradoxes of Confirmation, Mind, 73:273-279, 1964.
[von Wright, 1966] G. H. von Wright. The Paradoxes of Confirmation. In Jaakko Hintikka and Patrick Suppes, eds., Aspects of Inductive Logic, pp. 208-218. North-Holland, Amsterdam, 1966.
[Vranas, 2004] P. Vranas. Hempel’s Raven Paradox: A Lacuna in the Standard Bayesian Solution, British Journal for the Philosophy of Science, 55:545-560, 2004.
[Wason and Shapiro, 1971] P. C. Wason and D. Shapiro. Natural and contrived evidence in a reasoning problem, Quarterly Journal of Experimental Psychology, 23:63-71, 1971.
[Watkins, 1957] J. W. N. Watkins. Between Analytical and Empirical, Philosophy, 33:112-131, 1957.
[Watkins, 1960] J. W. N. Watkins. Confirmation without Background Knowledge, British Journal for the Philosophy of Science, 10:318-320, 1960.
[Weisberg, 2009] J. Weisberg. Varieties of Bayesianism, this volume, 2009.
[Woodward, 1985] J. Woodward. Critical Review: Horwich on the Ravens, Projectability and Induction, Philosophical Studies, 47:409-428, 1985.
[Zabell, 2009] S. Zabell. Carnap and the Logic of Induction, this volume, 2009.
CARNAP AND THE LOGIC OF INDUCTIVE INFERENCE
S. L. Zabell
1 INTRODUCTION
This chapter discusses Carnap’s work on probability and induction, using the notation and terminology of modern mathematical probability, viewed from the perspective of the modern Bayesian or subjective school of probability. (It is a much expanded and more mathematical version of [Zabell, 2007].) Carnap initially used a logical notation and terminology that made his work accessible and interesting to a generation of philosophers, but it also limited its impact in other areas such as statistics, mathematics, and the sciences. Using the notation of modern mathematical probability is not only more natural, but also makes it far easier to place Carnap’s work alongside the contributions of such other pioneers of epistemic probability as Frank Ramsey, Bruno de Finetti, I. J. Good, L. J. Savage, and Richard Jeffrey.

Carnap’s interest in logical probability was primarily as a tool, a tool to be used in understanding the quantitative confirmation of a hypothesis based on evidence and, more generally, in rational decision making. The resulting analysis of induction involved a two-step process: one first identified a broad class of possible confirmation functions (the regular c-functions), and then identified either a unique function in that class (early Carnap) or a parametric family (later Carnap) of specific confirmation functions. The first step in the process put Carnap in substantial agreement with subjectivists such as Ramsey and de Finetti; it is the second step, the attempt to limit the class of probabilities still further, that distinguishes Carnap from his subjectivist brethren. So: precisely what are the limitations that Carnap saw as natural to impose? In order to discuss these, we must begin with his concepts of probability.

2 PROBABILITY
The word ‘probability’ has always had a multiplicity of meanings. In the beginning mathematical probability had a meaning that was largely epistemic (as opposed to aleatory); thus for Laplace probability relates in part to our knowledge and in part to our ignorance. During the 19th century, however, empirical alternatives
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier BV. All rights reserved.
arose. In the years 1842 and 1843, no fewer than four independent proposals for an objective or frequentist interpretation were first advanced: those of Jakob Friedrich Fries in Germany, Antoine Augustin Cournot in France, and John Stuart Mill and Robert Leslie Ellis in England. Less than a quarter of a century later, John Venn’s Logic of Chance [Venn, 1866], the first book in English devoted exclusively to the philosophical foundations of probability, took a purely frequentist view of the subject. Ramsey, in advancing his view of a quantitative subjective probability based on a consistent system of preferences [Ramsey, 1926], deftly side-stepped the debate by conceding that the frequency interpretation of probability was a perfectly reasonable one, one which might have considerable value in science, but argued that this did not preclude a subjective interpretation as well. During the 20th century the debate became increasingly more complex, von Mises, Reichenbach, and Neyman advancing frequentist views, and Keynes, Ramsey, and Jeffreys competing logical or subjective theories. Carnap sought to bring order into this chaos by introducing the concepts of explicandum and explicatum. Sometimes philosophical debates arise unnecessarily due to the use of ill-defined (or even undefined) concepts. For example, an argument about whether or not viruses constitute a form of life can only really arise from a failure to define just what one means by life; define the term and the status of viruses (whose structure and function are in many cases very well understood) will become clear one way or the other. This is essentially an operationalist or logical positivist perspective, a legacy of Carnap’s days in the Vienna Circle. For Carnap the explicandum was the ill-defined concept; the explicatum the clarification of it that someone advanced. But probability did not involve just a dispute over the explication of a term. 
The term itself did double duty, being used by some in an epistemic fashion (the degree of belief in a proposition or event), and by others in an aleatory fashion (a frequency in a class or series). To unravel the Gordian knot of probability, one had to sever the two concepts and recognize that there are two distinct explicanda, each requiring separate exegesis.
2.1 Early views

In his paper “The two concepts of probability” [1945b], Carnap introduced the terms probability₁ and probability₂, the first referring to probability in its guise as a measure of confirmation, the second as a measure of frequency. This had twin advantages: with the issue put so clearly, debates about the one true meaning of probability became less credible; and the more neutral terminology helped shift the argument from issues of linguistic usage (which, after all, varies from one language to another) to conceptual explication. These ideas were developed at great length in Carnap’s magisterial Logical Foundations of Probability [1950], probabilities being assigned to sentences in a formal language. In his later work Carnap discarded sentences (which he viewed as insufficiently expressive for his purposes)
in favor of events or propositions, which he regarded as essentially equivalent, and we shall adopt this viewpoint. (The main technical complication in working at the level of sentences is that more than one sentence can assert the same proposition; for example, α ∧ β and ¬(¬α ∨ ¬β).)

Carnap’s approach was a direct descendant of Wittgenstein’s relatively brief remarks on probability in the Tractatus, later developed at some length by Waismann [1930]. Carnap, following Waismann, assumed the existence of a regular measure function m(x) on sentences, defining these by first assuming a normalized nonnegative function on molecular sentences and then extending these to all sentences. Carnap then defined in the usual way c(h, e), the conditional probability of a proposition h given the proposition e, as the ratio m(h ∧ e)/m(e). Carnap interpreted the conditional probabilities c(h, e) as a measure of the extent to which evidence e confirms hypothesis h.

Such functions had already been studied by Janina Hosiasson-Lindenbaum [1940] a decade earlier. Unlike Carnap, Hosiasson-Lindenbaum took a purely axiomatic approach: she studied the general properties of confirmation functions c(h, e), assuming only that they satisfied a basic set of axioms. There are several equivalent versions of this set appearing in the literature; here is one particularly natural formulation:

The axioms of confirmation

1. 0 ≤ c(h, e) ≤ 1.
2. If h ↔ h′ and e ↔ e′, then c(h, e) = c(h′, e′).
3. If e → h, then c(h, e) = 1.
4. If e → ¬(h ∧ h′), then c(h ∨ h′, e) = c(h, e) + c(h′, e).
5. c(h ∧ h′, e) = c(h, e) · c(h′, h ∧ e).

Carnap’s conditional probabilities c(h, e) satisfied these axioms (and so were plausible candidates for confirmation functions).
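To make the definition c(h, e) = m(h ∧ e)/m(e) concrete, the following is a minimal sketch on a toy language with two atomic sentences. The particular weights, the representation of propositions as predicates on "worlds", and all function names are assumptions made for illustration; they are not Carnap's construction.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy language: two atomic sentences a and b, hence four "worlds".
WORLDS = list(product([False, True], repeat=2))

# An assumed regular measure: every world has positive weight, and weights sum to 1.
WEIGHTS = dict(zip(WORLDS, [Fraction(1, 10), Fraction(2, 10),
                            Fraction(3, 10), Fraction(4, 10)]))

def m(prop):
    """Measure of a proposition, given as a predicate on worlds."""
    return sum(WEIGHTS[w] for w in WORLDS if prop(w))

def c(h, e):
    """Confirmation c(h, e) = m(h and e) / m(e); requires m(e) > 0."""
    return m(lambda w: h(w) and e(w)) / m(e)

a = lambda w: w[0]
b = lambda w: w[1]
not_a = lambda w: not w[0]
a_and_b = lambda w: w[0] and w[1]
a_or_b = lambda w: w[0] or w[1]
tautology = lambda w: True

# Axiom 3: a entails (a or b), so the confirmation is 1.
assert c(a_or_b, a) == 1
# Axiom 4: (a and b) and (not a) are mutually exclusive, so confirmation adds.
assert c(lambda w: a_and_b(w) or not_a(w), tautology) == \
    c(a_and_b, tautology) + c(not_a, tautology)
# Axiom 5: the product rule.
assert c(a_and_b, tautology) == c(a, tautology) * c(b, a)
```

Exact rational arithmetic is used so that the axioms can be checked with equality rather than a floating-point tolerance.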
2.2 Betting odds and Dutch books
But just what do the numbers m(e) or c(h, e) represent? It was one of the great contributions of Ramsey and de Finetti to advance operational definitions of subjective probability; for Ramsey, primarily as arising from preferences, for de Finetti as fair odds in a bet. By then imposing rationality criteria on such quantities, both were able to derive the standard axioms for finitely additive probability. Ramsey, in a remarkable tour-de-force, was able to demonstrate the simultaneous existence of utility and probability functions u(x) and p(x). He did this by imposing natural consistency constraints on a (sufficiently rich) set of preferences, introducing the device of the ethically neutral proposition (the philosophical equivalent of tossing a fair coin) as a means of interpolating between competing alternatives. The
functions u(x) and p(x) track one’s preferences in the sense that one action is preferred to another if and only if its expected utility is greater than that of the other. (Jeffrey [1983] discusses Ramsey’s system and presents an extremely interesting variant of it.)

De Finetti, in contrast, initially gave primacy to probabilities interpreted as betting odds. (If p is a probability, then the corresponding odds are p/(1 − p).) The odds represent a bet either side of which one is willing to take. (Thus, odds of 2 : 1 in favor of an event mean that one would accept either a bet of 2 : 1 for, or a bet of 1 : 2 against. This is somewhat akin to the algorithm for two children dividing a cake: one divides the cake into two pieces, the other chooses one of the two pieces.) De Finetti imposed as his rationality constraint the requirement that these odds be coherent; that is, that it be impossible to construct a Dutch book out of them. (In a Dutch book, an opponent can choose a portfolio of bets such that he is assured of winning money. The existence of a Dutch book is analogous to the existence of arbitrage opportunities in the derivatives market.) A conditional probability P(A | B) in de Finetti’s system is interpreted as a conditional bet on A, available only if B is determined to have happened. De Finetti was able to show that the probabilities corresponding to a coherent set of betting odds must satisfy the standard axioms of finitely additive probability. For example, if one takes the axioms for confirmation listed in the previous subsection, all are direct consequences of coherence.

John Kemeny, one of Carnap’s collaborators in the 1950s, proved a beautiful converse to this result [Kemeny, 1955]. He showed that the above five properties of a confirmation function are at once both necessary and sufficient for coherence.
That is, although de Finetti had in effect shown that coherence implies the five axioms, in principle there might be other, incoherent confirmation functions also satisfying the five axioms. If one did not begin by accepting (coherent) betting odds as the operational interpretation of c(h, e), this left open the possibility of other confirmation functions, ones not falling into the Ramsey and de Finetti framework. The power of Kemeny’s result is that if one accepts the five axioms above as necessary desiderata for any confirmation function c(h, e), then such functions necessarily assign coherent betting odds to the universe of events. This was a powerful argument in favor of the betting odds interpretation, and it persuaded Carnap, who adopted it. Thus, while in The Logical Foundations of Probability Carnap had advanced no fewer than three possible interpretations for probability₁ (evidential support, fair betting quotients, and estimates of statistical frequencies), in his later work he explicitly abandoned the first of these, and wrote almost exclusively in terms of the second. (The “normative” force of Dutch book arguments has of course been the subject of considerable debate. Armendt [1993] contains a balanced discussion of the issues and provides a useful entry into the literature.)

Nevertheless, even accepting the subjective viewpoint, the issue remains: can the inductive confirmation of hypotheses be understood in quantitative terms? It was this latter question that was of primary interest to Carnap, and the one to
which he turned in a second paper “On inductive logic” [1945a].

3 CONFIRMATION

In order to better appreciate Carnap’s analysis of the inductive process, let us briefly review the background against which he wrote. First some basic mathematical probability. Suppose we have an uncertain event that can have one of two possible outcomes, arbitrarily termed “success” and “failure”, and let $S_n$ denote the number of successes in $n$ instances (“trials”). If the trials are independent, and have a constant probability $p$ of success, then the probability of $k$ successes in the $n$ trials is given by the binomial distribution:

$$P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad 0 \le k \le n.$$

Here

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient, and $n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1$.

Suppose next that the probability $p$ is itself random, with some probability distribution $d\mu(p)$ on the unit interval. For example, success and failure might correspond to getting a head or tail when tossing a ducat, and the ducat is chosen from a bag of ducats having variable probability $p$ of coming up heads (reflecting the composition of coins in the bag). In this case the probability $P(S_n = k)$ is obtained by averaging the binomial probabilities over the different possible values of $p$. This average is standardly given by an integral, namely

$$P(S_n = k) = \int_0^1 \binom{n}{k} p^k (1-p)^{n-k} \, d\mu(p), \qquad 0 \le k \le n.$$

In our example $d\mu(p)$ is aleatory in nature, tied to the composition of the bag. But it could just as well be taken to be epistemic, reflecting our degree of belief regarding the different possible values of $p$.
3.1 The rule of succession

In this analysis there are several important questions as yet unanswered. In particular, the nature of $p$ (is it a physical probability or a degree of belief?) has not been specified, and no guidance has been given regarding the origin of the initial or prior distribution $d\mu(p)$. Even if the nature of $p$ is specified, how does one determine the prior distribution $d\mu(p)$? For Laplace and his school, one had resort to the principle of indifference: lacking any reason to favor one value of $p$ over another, the distribution was taken to be uniform over the unit interval: $d\mu(p) = dp$. In this case the integral simplifies to give:

$$P(S_n = k) = \frac{1}{n+1}, \qquad 0 \le k \le n.$$
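The simplification to 1/(n + 1) can be checked exactly for small n. The sketch below is illustrative only: the function name is hypothetical, and it evaluates the integral under the uniform prior by expanding (1 − p)^(n−k) with the binomial theorem and integrating term by term.

```python
from fractions import Fraction
from math import comb

def prob_successes_uniform_prior(n, k):
    """Exact integral of C(n,k) p^k (1-p)^(n-k) dp over [0, 1].

    (1-p)^(n-k) is expanded by the binomial theorem and each monomial
    p^m is integrated as 1/(m+1).
    """
    total = Fraction(0)
    for j in range(n - k + 1):
        total += Fraction((-1) ** j * comb(n - k, j), k + j + 1)
    return comb(n, k) * total

# Every count k from 0 to n receives the same probability 1/(n+1).
for n in range(1, 8):
    for k in range(n + 1):
        assert prob_successes_uniform_prior(n, k) == Fraction(1, n + 1)
```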
But in fact the Reverend Thomas Bayes, the eponymous founder of the subject of Bayesian statistics, employed a subtler argument that paralleled Carnap’s later approach. Bayes [1764] reasoned that in a case of complete ignorance (“an event concerning the probability of which we absolutely know nothing antecedently to any trials made concerning it”), one has $P(S_n = k) = 1/(n+1)$ for all $n \ge 1$ and $0 \le k \le n$ (in effect Bayes takes the latter to be the definition of the former), and this in turn implies that the prior must be uniform. The argument can in fact be made rigorous. Let $k = n$; then Bayes’s postulate $P(S_n = k) = 1/(n+1)$ tells us that

$$\int_0^1 p^n \, d\mu(p) = \frac{1}{n+1} = \int_0^1 p^n \, dp, \qquad n \ge 1.$$
Thus the as yet unknown probability $d\mu(p)$ has the same moments as the so-called “flat” prior $dp$. But the Hausdorff moment theorem tells us that a probability measure on a compact set (here $[0, 1]$) is characterized by its moments. Thus $d\mu(p)$ and $dp$, having the same moments, must coincide.

Given the Bayes-Laplace formula $P(S_n = k) = 1/(n+1)$, it is a simple matter to derive the corresponding predictive probabilities. If, for example, $X_j$ is a so-called indicator variable taking the values 1 or 0, depending on whether the outcome of the $j$-th trial is a success or failure, respectively (so that the number of successes $S_n$ is $X_1 + \dots + X_n$), then $P(X_{n+1} = 1 \mid S_n = k)$ is the conditional probability of a success on the next trial, based on the experience of the past $n$ trials. Since the formula for conditional probability is $P(A \mid B) = P(A \text{ and } B)/P(B)$, it follows after a little algebra that

$$P(X_{n+1} = 1 \mid S_n = k) = \frac{k+1}{n+2}.$$
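The “little algebra” can be spelled out as follows. This is a sketch that assumes the uniform prior $d\mu(p) = dp$ and uses the standard Beta integral $\int_0^1 p^a (1-p)^b \, dp = a!\,b!/(a+b+1)!$ for nonnegative integers $a$ and $b$, an identity not stated in the text.

```latex
\begin{align*}
P(X_{n+1}=1,\ S_n=k)
  &= \int_0^1 \binom{n}{k}\, p^{k+1}(1-p)^{n-k}\,dp
   = \binom{n}{k}\,\frac{(k+1)!\,(n-k)!}{(n+2)!}
   = \frac{k+1}{(n+1)(n+2)}, \\[4pt]
P(X_{n+1}=1 \mid S_n=k)
  &= \frac{P(X_{n+1}=1,\ S_n=k)}{P(S_n=k)}
   = \frac{(k+1)/\bigl((n+1)(n+2)\bigr)}{1/(n+1)}
   = \frac{k+1}{n+2}.
\end{align*}
```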
This is the celebrated (or infamous) rule of succession. Both it and the controversial principle of indifference on which it was based were the subject of harsh criticism beginning in the middle of the 19th century; see Zabell [1989]. Stigler [1982] argues that Bayes’s form of the indifference postulate, applying as it does to the discrete outcome k, does not entail the same paradoxes as the principle of indifference applied to the continuous parameter p. But Bayes’s ingenious argument was forgotten, and Laplace’s approach became the focus of controversy. The Cambridge phenom Robert Leslie Ellis objected in the 1840s that one could not conjure something out of nothing: ex nihilo nihil; the German Johann von Kries countered in 1886 that one could invoke instead the principle of cogent reason: alternatives are judged equipossible because our knowledge is distributed equally among them; the point is the equi-distribution of knowledge rather than nihilist ignorance. In pragmatic England the Oxford statistician and economist F. Y. Edgeworth argued the use of flat priors was justified on approximate empirical grounds; the Cambridge logician and antiquarian John Venn ridiculed the use of the rule of succession. In France the distinguished Joseph Bertrand challenged
the cogency of subjective probability; the even more distinguished Henri Poincaré championed it. This was the decidedly unsatisfactory state of affairs in 1921, the year when John Maynard Keynes’s Treatise on Probability appeared. Keynes’s Treatise contains a useful summary of much of this debate. The next several decades saw increasing clarification of the foundations of probability and its use in inductive inference. But the particular thread we are interested in here involves a curious development that took place in two independent stages.
4 EXCHANGEABILITY

In 1924 William Ernest Johnson, an English logician and philosopher at King’s College, Cambridge, published the third volume of his Logic. In an appendix at the end, Johnson suggested an alternative analysis to the one just discussed, one which represented a giant step forward. But despite the respect accorded him in Cambridge, Johnson had only limited influence outside it, and after his death in 1931, his work was little noted. It is one of the ironies of this subject that Carnap later followed essentially the same route as Johnson, but to much greater effect, in part because Carnap’s Logical Foundations of Probability embedded his analysis in a much more detailed setting, and in part because he continued to refine his treatment of the subject for nearly two decades (whereas Johnson died only a few years after the appearance of his book).

Johnson’s analysis contained several elements of novelty. The first two of these were designed to meet the two basic objections that had been raised regarding the classical rule of succession: its appeal to the so-called “principle of indifference”, and its appeal by way of analogy to drawing balls from an urn.
4.1 Multinomial sampling
First, Johnson considered the case of t ≥ 2 equipossible cases (instead of just two). This was no mere technical generalization. In many of the most telling attacks on the principle of indifference, situations were considered where it was unnatural to think of the outcome of interest as being one of two equipossible competing alternatives. By encompassing the multinomial case (several possible categories rather than just two) Johnson’s analysis applied to situations in which the multiple competing outcomes are either naturally viewed as equipossible (for example, rolling a fair, six-sided die), or can be further broken down into equipossible subcases.
4.2 The permutation postulate
Second, Johnson presciently introduced the concept of exchangeability. Let us consider a sequence of random outcomes $X_1, \dots, X_n$, each taking on one of $t$ possible types $c_1, \dots, c_t$. (For example, you are on the Starship Enterprise, and each time
you encounter someone, they are either Klingon, Romulan, or Vulcan, so that $t = 3$.) Then a typical probability of interest is of the form

$$P(X_1 = e_1, X_2 = e_2, \dots, X_n = e_n), \qquad e_i \in \{c_1, \dots, c_t\}, \quad 1 \le i \le n.$$
In the classical inductive setting, the order of these observations is irrelevant, the only thing that matters being the counts or frequencies observed for each of the $t$ categories. (More complex situations will be discussed later.) Thus, if $n_i$ is the number of $X_j$ falling into the $i$-th category, it is natural to assume that all sequences $X_1 = e_1, X_2 = e_2, \dots, X_n = e_n$ having the same frequency counts $n_1, n_2, \dots, n_t$ have the same probability. Johnson termed this assumption the permutation postulate. (Carnap called the sequences $e_1, \dots, e_n$ state descriptions, the frequency counts $n_1, \dots, n_t$ structure descriptions, and made the identical symmetry assumption.)

The valid application of the rule of succession presupposes, as Boole notes, the aptness of the analogy between drawing balls from an urn (the urn of nature, as it was later called) and observing an event [Boole 1854, p. 369]. As Jevons [1874, p. 150] put it, “nature is to us like an infinite ballot-box, the contents of which are being continually drawn, ball after ball, and exhibited to us. Science is but the careful observation of the succession in which balls of various character present themselves . . . ”. The importance of Johnson’s “permutation postulate” is that it is no longer necessary to refer to the urn of nature. To what extent is observing instances like drawing balls from an urn? Answer: to the extent that the instances are judged exchangeable.

Venn and others, having attacked the rote use of the rule of succession, rightly argued that some additional assumption, other than mere repetition of instances, was necessary for valid inductive inference. From time to time various names for such a principle have been advanced: Mill’s “Uniformity of Nature”; Keynes’s “Principle of Limited Variety”; Goodman’s “projectibility”.
It was Johnson’s achievement to have realized both that ‘the calculus of probability does not enable us to infer any probability-value unless we have some probabilities or probability relations given’ [Johnson, 1924, p. 182]; and that the vague, verbal formulations of his predecessors could be captured in the mathematically precise formulation of exchangeability. The permutation postulate (the assumption of exchangeability in modern parlance) was later independently introduced by the Italian Bruno de Finetti (see, for example, [de Finetti, 1937]), and became a centerpiece of his theory.

For our purposes here, the basic point is that if the sequence is assumed to be exchangeable, then an assignment of probabilities to sequences of outcomes $e_1, e_2, \dots, e_n$ reduces to assigning probabilities $P(n_1, n_2, \dots, n_t)$ to sequences of frequency counts $n_1, n_2, \dots, n_t$. This is because there are (using the standard notation for the multinomial coefficient)

$$\binom{n}{n_1\ n_2\ \cdots\ n_t} = \frac{n!}{n_1!\, n_2! \cdots n_t!}$$
different possible sequences $e_1, e_2, \dots, e_n$ having the same set of frequency counts $n_1, n_2, \dots, n_t$, and each of these is assumed to be equally likely, so by exchangeability and the additivity of probability

$$P(n_1, n_2, \dots, n_t) = \frac{n!}{n_1!\, n_2! \cdots n_t!}\, P(e_1, e_2, \dots, e_n).$$

(That is, the probability of a state description $e_1, \dots, e_n$, times the number of state descriptions having the same corresponding structure description $n_1, \dots, n_t$, gives the probability of that structure description.)

It is a simple but nevertheless instructive exercise to verify that the predictive probabilities in this case take on a simple form:

$$P(X_{n+1} = c_i \mid X_1 = e_1, X_2 = e_2, \dots, X_n = e_n) = P(X_{n+1} = c_i \mid n_1, n_2, \dots, n_t).$$

(That is, although the conditional probability apparently depends on the entire state description $e_1, \dots, e_n$, in fact it only depends on the corresponding structure description $n_1, \dots, n_t$.) In statistical parlance this last property is summarized by saying that the frequencies $n_1, \dots, n_t$ are sufficient statistics: no information is lost in summarizing the sequence $e_1, \dots, e_n$ by the counts $n_1, \dots, n_t$. Such statistics turn out to be a powerful tool in extensions of exchangeability discovered in recent decades; see, e.g., [Diaconis and Freedman, 1984].
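The counting claim behind this reduction, that the number of state descriptions sharing a given structure description is the multinomial coefficient, can be verified by brute force for small cases. The following sketch is illustrative; the function names and the choice n = 5, t = 3 are assumptions, and types are encoded as the integers 0, ..., t − 1.

```python
from collections import Counter
from itertools import product
from math import factorial

def multinomial(counts):
    """n! / (n_1! n_2! ... n_t!) for a tuple of frequency counts."""
    result = factorial(sum(counts))
    for n_i in counts:
        result //= factorial(n_i)
    return result

def sequences_per_structure(n, t):
    """Group all t**n state descriptions of length n by their frequency counts."""
    groups = Counter()
    for seq in product(range(t), repeat=n):
        counts = tuple(seq.count(i) for i in range(t))
        groups[counts] += 1
    return groups

groups = sequences_per_structure(n=5, t=3)

# Each structure description is shared by exactly a multinomial number of sequences.
assert all(size == multinomial(counts) for counts, size in groups.items())
```

Under exchangeability all sequences in one group have the same probability, which is why multiplying that common value by the group size gives the probability of the structure description.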
4.3 The combination postulate
But what do we choose for $P(n_1, n_2, \dots, n_t)$? In the case $t = 2$, this reduces to assigning probabilities to the pairs $(n_1, n_2)$. A little thought will show that Bayes’s postulate (that the different possible frequencies $k$ are equally likely) is equivalent to assuming that the different pairs $(n_1, n_2)$ are equally likely (since $n_1 = k$, $n_2 = n - n_1$ and $n$ is fixed). This in turn suggests the probability assignment that takes each of the possible structure descriptions to be equally likely, and this is in fact the path that both Johnson and Carnap initially took (Johnson termed this the combination postulate). Since there are

$$\binom{n+t-1}{t-1}$$

possible structure descriptions (the “ordered $t$-partitions of $n$”; the count is a well-known combinatorial fact, see, e.g., [Feller, 1968, p. 38]), and each of these is assumed equally likely, one has

$$P(n_1, n_2, \dots, n_t) = \frac{1}{\binom{n+t-1}{t-1}}.$$
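As a small check on the count of structure descriptions, one can enumerate them directly and compare with the binomial coefficient. This is a sketch under illustrative assumptions: the function names are hypothetical, and the recursive enumeration is just one convenient way to list the ordered t-partitions of n.

```python
from fractions import Fraction
from math import comb

def structure_descriptions(n, t):
    """Yield every tuple (n_1, ..., n_t) of nonnegative integers summing to n."""
    if t == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in structure_descriptions(n - first, t - 1):
            yield (first,) + rest

def combination_postulate(n, t):
    """Probability of each structure description when all are equally likely."""
    return Fraction(1, comb(n + t - 1, t - 1))

# For t = 2 there are n + 1 structure descriptions, matching Bayes's postulate.
assert len(list(structure_descriptions(4, 2))) == 5
assert combination_postulate(4, 2) == Fraction(1, 5)

# For three types and five observations there are C(7, 2) = 21 of them.
assert len(list(structure_descriptions(5, 3))) == comb(7, 2)
```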
Together, the combination and permutation postulates uniquely determine the probability of any specific finite sequence; if a state description $e_1, e_2, \dots, e_n$ has structure description $n_1, n_2, \dots, n_t$ then its probability is

$$P(e_1, e_2, \dots, e_n) = \frac{1}{\binom{n+t-1}{t-1} \binom{n}{n_1\ n_2\ \cdots\ n_t}};$$

see Johnson [1924, appendix on eduction]. This is Carnap’s m⋆ function. Having thus specified the probabilities of the “atomic” sequences, all other probabilities, including the rules of succession, are completely determined. Some simple algebra in fact yields

$$P(X_{n+1} = c_i \mid n_1, n_2, \dots, n_t) = \frac{n_i + 1}{n + t};$$

see Johnson [1924]. This is Carnap’s c⋆ function.

5 THE CONTINUUM OF INDUCTIVE METHODS
Although the mathematics of the derivation of the c⋆ system is certainly attractive, its assumption that all structure descriptions are equally likely is hardly compelling, and Carnap soon turned to more general systems. It is ironic that here too his line of attack very closely paralleled that of Johnson. After criticisms from C. D. Broad [1924] and others, Johnson devised a more general postulate, later termed by I. J. Good [1965] the sufficientness postulate. This assumes that the predictive probabilities for a particular type $i$ are a function of how many observations of the type have been seen already ($n_i$), and the total sample size $n$. It is a remarkable fact that this characterizes the predictive probabilities or rules of succession (and therefore the probability of any sequence).
5.1 The Johnson-Carnap continuum
Suppose $X_1, X_2, \dots, X_n, \dots$ represent an infinite sequence of observations, each assuming one of (the same) $t$ possible values, and that at each stage $n$ the sequence satisfies the permutation postulate. (In modern parlance, one has an infinitely exchangeable, $t$-valued sequence of random variables.) Assume the sequence satisfies the following three conditions:

1. Any state description $e_1, \dots, e_n$ is a priori possible: $P(e_1, \dots, e_n) > 0$.
2. The “sufficientness postulate” is satisfied: $P(X_{n+1} = e_i \mid n_1, \dots, n_t) = f_i(n_i, n)$.
3. There are at least three types of species; $t \ge 3$.
Then (unless the outcomes are independent of each other, so that observing one or more provides no predictive power regarding the others) the predictive probabilities have a very special form: there exist positive constants $\alpha_1, \dots, \alpha_t$ such that if $\alpha = \alpha_1 + \dots + \alpha_t$, then for all $n \ge 1$, states $e_i$, and structure descriptions $n_1, \dots, n_t$,

$$P(X_{n+1} = e_i \mid n_1, \dots, n_t) = \frac{n_i + \alpha_i}{n + \alpha}.$$

This truly beautiful result characterizes the predictive probabilities up to a finite sequence of positive constants $\alpha_1, \alpha_2, \dots, \alpha_t$. Note that Carnap’s c⋆ measure of confirmation is a special case of the continuum, with $\alpha_i = 1$ for all $i$.

The assumption that all state descriptions have positive probability is needed to ensure that the requisite conditional probabilities are well-defined. (In Carnap’s terminology, the probability function is regular.) The restriction $t \ge 3$ is necessary because otherwise the sufficientness postulate would be vacuous. (One can recover the result in the case $t = 2$ by replacing the sufficientness postulate by the assumption that the predictive probabilities are linear in $n_i$; see, e.g., [Zabell, 1982].)
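The predictive rule of the continuum is simple enough to compute directly. The sketch below is illustrative: the function name and the example counts are assumptions, and exact fractions are used so that the special case $\alpha_i = 1$ can be compared with $(n_i + 1)/(n + t)$ without rounding.

```python
from fractions import Fraction

def predictive(counts, alphas):
    """Johnson-Carnap rule: P(next is type i | counts) = (n_i + a_i) / (n + a)."""
    n = sum(counts)
    alpha = sum(alphas)
    return [(Fraction(n_i) + a_i) / (n + alpha)
            for n_i, a_i in zip(counts, alphas)]

# Three types observed 3, 1 and 0 times (n = 4).
counts = (3, 1, 0)

# With all alphas equal to 1 this is the c-star rule (n_i + 1) / (n + t).
c_star = predictive(counts, (1, 1, 1))
assert c_star == [Fraction(4, 7), Fraction(2, 7), Fraction(1, 7)]

# Other positive constants give other members of the continuum.
half = Fraction(1, 2)
other = predictive(counts, (half, half, half))
assert other == [Fraction(7, 11), Fraction(3, 11), Fraction(1, 11)]
```

Smaller constants let the observed frequencies dominate sooner; an unobserved type still keeps positive predictive probability.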
5.2 The de Finetti representation theorem
The assumption that arbitrarily long sequences satisfy the permutation postulate means their probabilities admit an integral representation of the type mentioned earlier in Section 3; this is the content of the celebrated de Finetti representation theorem [de Finetti, 1937]. Specifically, let $\Delta_t$ denote the set of probabilities on $t$ elements:

$$\Delta_t := \Big\{(p_1, \ldots, p_t) : p_j \geq 0,\ \sum_{j=1}^{t} p_j = 1\Big\}.$$
De Finetti’s theorem states that if $X_1, X_2, X_3, \ldots$ is an infinitely exchangeable sequence on $t$ elements, then there exists a probability measure $d\mu$ on $\Delta_t$, such that for every $n \geq 1$, if $n_1, \ldots, n_t$ are the frequency counts of $X_1, \ldots, X_n$, then

$$P(n_1, n_2, \ldots, n_t) = \int_{\Delta_t} \frac{n!}{n_1!\, n_2! \cdots n_t!}\; p_1^{n_1} p_2^{n_2} \cdots p_t^{n_t}\; d\mu(p_1, \ldots, p_t).$$

(Note that a single measure $d\mu$ simultaneously achieves this for all sample sizes $n$.) There are a number of interesting foundational issues arising from this result. The integrand

$$\frac{n!}{n_1!\, n_2! \cdots n_t!}\; p_1^{n_1} p_2^{n_2} \cdots p_t^{n_t}$$

is a multinomial probability, and the theorem asserts that an exchangeable probability $P$ can be represented as an integral mixture of multinomial probabilities. It is obvious that a multinomial probability and more generally any mixture of multinomials is exchangeable; the force of the theorem is that the converse holds:
every exchangeable probability is expressible as a mixture. There is no restriction placed on the mixing measure $d\mu$. Many results in the literature of inductive inference are often easier to state, prove, or interpret in terms of such representations. For example, Johnson’s theorem can be interpreted as telling us that when the sufficientness postulate is satisfied the averaging measure in the representation is a member of the classical Dirichlet family of prior distributions:

$$d\mu(p_1, \ldots, p_t) = \frac{\Gamma\big(\sum_{j=1}^{t} \alpha_j\big)}{\prod_{j=1}^{t} \Gamma(\alpha_j)} \prod_{j=1}^{t} p_j^{\alpha_j - 1}\; dp_1 \cdots dp_{t-1} \qquad (\alpha_j > 0).$$
(Here $\Gamma$ denotes the gamma function; if $k$ is a positive integer, then $\Gamma(k) = (k-1)!$.) The ability to characterize the predictive probabilities using Johnson’s sufficientness postulate, however, means that in principle one can entirely pass over this interesting but more mathematically complex fact. As Johnson himself observed,

I substitute, for the mathematician’s use of Gamma Functions and α-multiple integrals, a comparatively simple piece of algebra, and thus deduce a formula similar to the mathematician’s, except that, instead of for two, my theorem holds for α alternatives, primarily postulated as equiprobable. [Johnson, 1932, p. 418; Johnson’s α corresponds to our $t$]

Why are rules of succession so important? Note that the joint probability of a sequence of events can be built up from the corresponding sequence of conditional probabilities. For example, the joint probability $P(X_1 = e_1, X_2 = e_2, X_3 = e_3)$ can be expressed as

$$P(X_1 = e_1) \cdot P(X_2 = e_2 \mid X_1 = e_1) \cdot P(X_3 = e_3 \mid X_1 = e_1, X_2 = e_2).$$

Thus one can express joint probabilities in terms of initial probabilities and rules of succession.
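The chain-rule construction can be sketched in a few lines. The function `sequence_probability` below is a hypothetical helper, not part of any library; it multiplies successive Johnson-Carnap predictive probabilities (with integer or rational parameters assumed) and checks that every ordering of the same observations receives the same joint probability, as exchangeability requires.

```python
from fractions import Fraction
from itertools import permutations

def sequence_probability(seq, alphas):
    """Joint probability of an ordered sequence, built up by the chain rule
    from the rule of succession (n_i + alpha_i) / (n + alpha)."""
    counts = [0] * len(alphas)
    alpha = sum(alphas)
    prob = Fraction(1)
    for n, x in enumerate(seq):   # n observations have been seen before x
        prob *= Fraction(counts[x] + alphas[x], n + alpha)
        counts[x] += 1
    return prob

alphas = [1, 1, 1]       # assumed parameters (Carnap's c* with t = 3)
observed = (0, 0, 1)     # hypothetical sequence of type labels
orderings = set(permutations(observed))
joint = {seq: sequence_probability(seq, alphas) for seq in orderings}
print(joint)             # every ordering has the same probability
```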
5.3 Interpretation of the Continuum

Let us consider a specific method in the continuum, say with parameters $\alpha_1, \ldots, \alpha_t$. Then one can write the rule of succession as

$$P(X_{n+1} = c_i \mid n_i) = \frac{n_i + \alpha_i}{n + \alpha} = \left(\frac{n}{n + \alpha}\right)\left[\frac{n_i}{n}\right] + \left(\frac{\alpha}{n + \alpha}\right)\left[\frac{\alpha_i}{\alpha}\right].$$

The two expressions in square brackets have obvious interpretations: the first, $n_i/n$, is the empirical frequency, and represents the input of experience; the second,
$\alpha_i/\alpha$, is our initial or prior probability concerning the likelihood of seeing $c_i$ (set $n_i = n = 0$ in the formula). The two terms in rounded brackets, $n/(n + \alpha)$ and $\alpha/(n + \alpha)$, sum to one and express the relative weight accorded to our observations versus our prior information. If $\alpha$ is small, then $n/(n + \alpha)$ is close to one, and the empirical frequencies $n_i/n$ are accorded primacy; if $\alpha$ is large, then $n/(n + \alpha)$ is small, and the initial probabilities are accorded primacy. Of course, “if $\alpha$ is large” must be understood relative to a fixed value of $n$; no matter how large $\alpha$ is, for a fixed value of $\alpha$ it is evident that

$$\lim_{n \to \infty} \frac{n}{n + \alpha} = 1,$$

reflecting the fact that no matter how large the initial weight assigned to our initial probabilities, these prior opinions are ultimately swamped by the overwhelming weight of empirical evidence.
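The weighted-average reading of the rule of succession, and the swamping of the prior, can be checked numerically. The helper names and the particular values of $n$, $n_i$, $\alpha_i$ and $\alpha$ below are assumptions chosen only for illustration.

```python
from fractions import Fraction

def data_weight(n, alpha):
    """Weight n / (n + alpha) given to the empirical frequency n_i / n."""
    return Fraction(n, n + alpha)

def weighted_rule(n_i, n, alpha_i, alpha):
    """Rule of succession written as a weighted average of n_i/n and alpha_i/alpha."""
    w = data_weight(n, alpha)
    return w * Fraction(n_i, n) + (1 - w) * Fraction(alpha_i, alpha)

# The weighted average agrees with the direct form (n_i + alpha_i) / (n + alpha).
direct = Fraction(7 + 1, 10 + 3)
print(weighted_rule(7, 10, 1, 3) == direct)

# Even a heavy prior (alpha = 1000, an assumed value) is eventually swamped.
for n in (10, 1_000, 1_000_000):
    print(n, float(data_weight(n, 1000)))
```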
5.4 History
The result itself has an interesting history. Johnson considered the special case when the function $f_i(n_i, n) = f(n_i, n)$; that is, it does not depend on the category or type $i$. In this case there is just one parameter, $\alpha$, since $\alpha_i = \alpha/t$ for all $i$. Johnson did not publish his result in his own lifetime (shades of Bernoulli and Bayes!); he had planned a fourth volume of his Logic, but only completed drafts of three chapters of it at the time of his death. A (then very young) R. B. Braithwaite edited the chapters for publication, and they appeared as three separate articles in Mind in 1932 [Johnson, 1932]. (It is ironic that G. E. Moore, the editor of Mind, questioned the desirability of including a mathematical appendix giving the details of the proof in such a journal, but Braithwaite, fortunately, insisted.) Due to its posthumous character, the proof as published contained a few lacunae, and a desire to fill these led to [Zabell, 1982]. This paper shows not only that the above-mentioned lacunae can be filled, but that Johnson’s method very naturally generalizes to cover the asymmetric case (when the predictive function $f_i(n_i, n)$ depends on $i$), the case $t = \infty$, and the case of finite exchangeable sequences that are not infinitely extendable.

Carnap followed much the same path as Johnson, initially considering the symmetric, category independent case, except that he assumed both the sufficientness postulate and the form of the predictive probabilities given in the theorem. It was only later that his collaborator John G. Kemeny was able to prove the equivalence of the two (assuming $t > 2$). Carnap subsequently extended these results, first to cover the case $t = 2$ [Carnap and Stegmüller, 1959]; and finally in [Jeffrey, 1980, Chapter 6] abandoned the assumption of symmetry between categories and derived the full result given above (see also [Kuipers, 1978]). The historical evolution is traced in [Schilpp, 1963, pp. 74–75 and 979–980; Carnap and Jeffrey, 1971, pp. 1–4 and 223; Jeffrey, 1980, pp. 1–5 and 103–104].
6 CONFIRMATION OF UNIVERSAL GENERALIZATIONS

Suppose all $n$ observations are of the same type; for example, that we are observing crows and thus far all have been black. In such situations, it is natural to view our experience as evidence not just that most crows are black, but as confirming the “universal generalization” that all crows are black. This apparently natural expectation, however, leads to unexpected complexities.
6.1 Paradox feigned

This is due to an interesting property of the Johnson-Carnap continuum: (infinite) universal generalizations have zero probability! For example, having observed $n$ black crows, it follows from $k$ successive applications of the rule of succession that the probability the next $k$ crows are also black is

$$P(X_{n+1} = X_{n+2} = \cdots = X_{n+k} = c_i \mid n_i = n) = \prod_{j=n}^{n+k-1} \frac{j + \alpha_i}{j + \alpha}.$$
It is not hard to see that this product tends to zero as $k$ tends to infinity. It is a standard result that if $0 < a_n \leq 1$ ($n \geq 1$), then the infinite product $\prod_{n \geq 1} a_n$ diverges to zero if and only if the corresponding infinite series $\sum_{n \geq 1} (1 - a_n)$ diverges to infinity (see, e.g., [Knopp, 1947, pp. 218-221]). Because

$$\sum_{j=n}^{\infty} \frac{\alpha - \alpha_i}{j + \alpha}$$

diverges (it is essentially the harmonic series), one has

$$\lim_{k \to \infty} \prod_{j=n}^{n+k-1} \frac{j + \alpha_i}{j + \alpha} = 0.$$
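The slow decay of this product is easy to see numerically. The sketch below assumes $\alpha_i = 1$ and $\alpha = 2$ (two categories under $c^\star$), in which case the product telescopes to $(n+1)/(n+k+1)$; the function name is hypothetical, and logarithms are summed to avoid floating-point underflow for large $k$.

```python
import math

def prob_next_k_same(n, k, alpha_i, alpha):
    """Product over j = n, ..., n+k-1 of (j + alpha_i) / (j + alpha)."""
    log_p = sum(math.log((j + alpha_i) / (j + alpha)) for j in range(n, n + k))
    return math.exp(log_p)

n = 10  # hypothetical number of black crows seen so far
for k in (1, 10, 100, 10_000, 1_000_000):
    print(k, prob_next_k_same(n, k, alpha_i=1, alpha=2))
```

The probabilities shrink toward zero, but only at a harmonic rate, which is why the phenomenon concerns infinite generalizations rather than predictions about any moderate number of future cases.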
This was viewed as a defect of Carnap’s system by several critics, for example, [Barker, 1957, pp. 87-88; Ayer, 1972, pp. 37-38, 80-81]. But the phenomenon itself had been both noted and defended much earlier, by Augustus De Morgan [1838, p. 128] in the nineteenth century (“No finite experience whatsoever can justify us in saying that the future shall coincide with the past in all time to come, or that there is any probability for such a conclusion”), and by C. D. Broad [1918] in a similar situation (the “finite rule of succession”) in the twentieth. The obvious Bayesian response was advanced by Wrinch and Jeffreys [1919] a year after Broad wrote: one assigns non-zero initial probability to the generalization. As Edgeworth noted shortly after in his review of Keynes’s Treatise, “pure induction avails not without some finite initial probability in favour of the generalisation, obtained from some other source than the instances examined” [Edgeworth 1922, p. 267]. But can one build such a “finite initial probability” into the Carnapian approach (that is, via axiomatic characterization)? In order to understand this, let us first consider the simplest case.
6.2 Paradox lost

It is possible to see what is going wrong in terms of the sufficientness postulate. Suppose there are three categories, 1, 2, and 3, and none of the observations thus far fall into the first. What can one say about $P(X_{2n+1} = c_1 \mid n_1, n_2, n_3)$? According to the sufficientness postulate, there is no difference between the three cases (a) $n_2 = 2n$, $n_3 = 0$, (b) $n_2 = 0$, $n_3 = 2n$, and (c) $n_2 = n_3 = n$. But from the point of view of universal generalizations there is an obvious difference: the first and second cases confirm different universal generalizations (which may have different initial probabilities), while the third case disconfirms both. Continua confirming universal generalizations must treat the cases differently. Thus it is necessary to relax the sufficientness postulate, at least in the case when $n_i = n$ for some $i$.

This diagnosis suggests a simple remedy. Suppose one modifies the sufficientness postulate so that the “representative functions” $f_i(n_1, \ldots, n_t)$ (to use yet another terminology sometimes employed) are assumed to be functions of $n_i$ and $n$ unless $n_i = 0$ and $n_j = n$ for some $j \neq i$. Then it can be shown (see, e.g., [Zabell, 1996]) that as long as the observations are exclusively of one type, the representative function consists of two parts: a term corresponding to the posterior probability that future observations will continue to be of this type (the “universal generalization”), and a Johnson-Carnap term. If, however, at any stage a second type is observed, then the representative function reverts to a pure Johnson-Carnap form. So this was a tempest in a teapot: this criticism of the continuum was easily answered even at the time it was initially made.
In hindsight the reason Johnson’s postulate gives rise to the problem is apparent; the minimal change to the postulate necessary to remedy the problem results in an expanded continuum confirming precisely the desired universal generalizations (and no others); and this can be demonstrated by a straightforward modification of Johnson’s original proof (for further discussion and references, see [Zabell, 1996]). But in fact much more is true: such an extension of the original Carnap continuum is merely a special case of a much richer class of extensions due to Hintikka, Niiniluoto, and Kuipers.
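One way to see the two-part structure described above is through a simple Bayesian mixture. This is an illustrative sketch under stated assumptions, not the parameterization of [Zabell, 1996]: with a hypothetical prior probability `eps` every observation is of type $i$ (the universal generalization), and otherwise the observations follow a Johnson-Carnap process. All function names are invented for the example.

```python
from fractions import Fraction

def jc_all_same(n, alpha_i, alpha):
    """Johnson-Carnap probability that the first n observations are all of type i."""
    p = Fraction(1)
    for j in range(n):
        p *= Fraction(j + alpha_i, j + alpha)
    return p

def posterior_generalization(n, eps, alpha_i, alpha):
    """Posterior probability of 'all are type i' after n observations of type i."""
    return eps / (eps + (1 - eps) * jc_all_same(n, alpha_i, alpha))

def predictive_all_same(n, eps, alpha_i, alpha):
    """P(next is type i | first n all type i) under the assumed mixture."""
    joint_n = eps + (1 - eps) * jc_all_same(n, alpha_i, alpha)
    joint_next = eps + (1 - eps) * jc_all_same(n + 1, alpha_i, alpha)
    return joint_next / joint_n

eps = Fraction(1, 10)   # assumed prior weight on the generalization
post = posterior_generalization(5, eps, 1, 3)
# Two parts: the generalization term plus a weighted Johnson-Carnap term.
two_part = post + (1 - post) * Fraction(5 + 1, 5 + 3)
print(predictive_all_same(5, eps, 1, 3) == two_part)
```

Under this mixture the probability that all future observations are of type $i$ never falls below the posterior weight on the generalization, so it no longer tends to zero.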
6.3 Hintikka-Niiniluoto systems
In order to appreciate Hintikka’s contribution, consider first the category symmetric case. Let Tn (X1 , X2 , ..., Xn ) denote the number of distinct types or species observed in the sample. In the continuum discussed in the previous subsection the predictive probabilities now depend not just on ni and n, but also on Tn , the number of instantiated categories. Specifically: is Tn = 1 or is Tn > 1? Thus put, this suggests a natural generalization: let the predictive probabilities be any
function of $n_i$, $n$, and $T_n$. The result is a very attractive extension of the Carnap continuum. In brief, if the predictive probabilities depend on $T_n$, then in general they arise from mixtures of Johnson-Carnap continua concentrated on subsets of the possible types. Thus, given three categories $a$, $b$, $c$, the probabilities can be concentrated on $a$ or $b$ or $c$ (universal generalizations), or Johnson-Carnap continua corresponding to the three pairs $(a, b)$, $(a, c)$, $(b, c)$, or a Johnson-Carnap continuum on all three. In retrospect, this is of course quite natural. If only two of the three possibilities are observed in a long sequence of observations (say $a$ and $b$), then (in addition to giving us information about the relative frequency of $a$ and $b$) this tentatively confirms the initial hypothesis that only $a$’s and $b$’s will occur. In the more general category asymmetric case, the initial probabilities for the six different generalizations ($a$, $b$, $c$, $ab$, $ac$, and $bc$) can differ, and the predictive probabilities are postulated to be functions of $n_i$, $n$, and the observed constituent: that is, the specific set of categories observed. (Thus in our example it is not enough to be told that $T_n = 2$; one must also be told which two categories or species have been observed.) This beautiful circle of results originates with Hintikka [1966], and was later extended by Hintikka and Niiniluoto [1979]. The monograph by Kuipers [1978] gives an outstanding survey and synthesis of this work, including discussion of Kuipers’s own contributions; for a recent summary and evaluation, see Niiniluoto [2009].
6.4 Attribute symmetry

Both the original Johnson-Carnap continuum and its Hintikka-Niiniluoto-Kuipers generalizations are of great interest, but share a common weakness. If what one is trying to do is to capture precisely the notion of a category-symmetric state of knowledge, no more and no less, then the one and only constraint is that the resulting probabilities be invariant under permutation of the categories. Carnap referred to such invariance as attribute symmetry. If one writes an $n$-long sequence in compact form as $X : \{1, \ldots, n\} \to \{1, \ldots, t\}$, and $P$ is a probability on the possible sequences $X$, then exchangeability requires $P$ to be invariant under permutations of $\{1, \ldots, n\}$ and attribute symmetry requires $P$ to be invariant under permutations of $\{1, \ldots, t\}$. Suppose one adds attribute symmetry to exchangeability as a restriction on $P$. The resulting class of probability functions is still infinite dimensional; see Zabell [1982, p. 1097; 1992, pp. 216–217]. At first sight this seems surprising: if our knowledge is category symmetric, surely the sufficientness postulate should hold. But it is not hard to construct counterexamples. For example, suppose we have a die and know one face is twice as likely to come up as any other, but not which face. Then there are six hypotheses $H_j$: for $1 \leq j \leq 6$, $H_j : p_j = 2/7$, $p_k = 1/7$, $k \neq j$; and the six $H_j$ are judged equiprobable. Consider the following two possible
frequency vectors that could occur in a sample of size $n = 70$:

$$\mathbf{n}_1 = (20, 10, 10, 10, 10, 10), \qquad \mathbf{n}_2 = (20, 30, 5, 5, 5, 5).$$
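A short computation makes the contrast between these two frequency vectors concrete. The sketch below assumes the six equiprobable hypotheses $H_j$ just described and computes the predictive probability of a one on the next trial by Bayes's rule; the function name `predict_face` is an invention for the example. Both vectors have the same $n_1 = 20$ and the same $n = 70$.

```python
from fractions import Fraction

def predict_face(counts, face):
    """Predictive probability of `face` when one unknown face has chance
    2/(t+1) and every other face has chance 1/(t+1), with equal priors."""
    t = len(counts)
    # The likelihood of the counts under H_j is proportional to 2**counts[j].
    weights = [Fraction(2) ** c for c in counts]
    total = sum(weights)
    posterior = [w / total for w in weights]
    favoured, other = Fraction(2, t + 1), Fraction(1, t + 1)
    return sum(post * (favoured if j == face else other)
               for j, post in enumerate(posterior))

pred_1 = predict_face((20, 10, 10, 10, 10, 10), face=0)
pred_2 = predict_face((20, 30, 5, 5, 5, 5), face=0)
print(float(pred_1), float(pred_2))  # the two predictions differ markedly
```

The first prediction is close to 2/7 and the second close to 1/7, even though the pair $(n_1, n)$ is identical in the two samples.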
Obviously $\mathbf{n}_1$ supports $H_1$ over $H_2$, and $\mathbf{n}_2$ supports $H_2$ over $H_1$, even though, if the sufficientness postulate held, the predictive probabilities for seeing a one on the next trial should be the same in each case. So there exist natural category symmetric epistemic states in which the sufficientness postulate fails. In general, if there is attribute symmetry the sufficient statistics are the frequencies of the frequencies (denoted $a_r$): for each $r$, $0 \leq r \leq n$, $a_r$ is the number of categories $j$ such that $n_j = r$. The recognition that even in these cases the entire list of frequencies $n_i$ may contain relevant information concerning the individual categories via the $a_r$ appears to go back to Turing; see [Good, 1965, Chapter 8]. Thus even assuming both exchangeability and attribute symmetry admits a rich family of possible probabilities; and it might be thought this would limit their utility. But even exchangeability by itself has many interesting qualitative consequences. The next section illustrates one of these.

7 INSTANTIAL RELEVANCE

One important desideratum of a candidate for confirmation is instantial relevance: if a particular type is observed, then it is more likely that such a type will be observed in the future. In its simplest form, this is the requirement that if $i < j$, then $P(X_j = 1 \mid X_i = 1) \geq P(X_j = 1)$ (the $X_k$ denoting indicators that take on the values 0 or 1). It is not hard to see that exchangeability alone does not ensure instantial relevance. Suppose, for example, one draws balls at random from an urn initially having three red balls and two black balls. If the sampling is without replacement, then the probability of selecting a red ball is initially 3/5, but the probability of selecting a second red ball, given the first is red, is 1/2. In the past there was a small cottage industry devoted to investigating the precise circumstances under which the principle of instantial relevance does or does not hold for a sequence of observations.
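The urn example can be verified by brute-force enumeration. The sketch below lists every ordered pair of distinct balls from the urn of three red and two black balls; the variable names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations

urn = [1, 1, 1, 0, 0]   # 1 = red, 0 = black
draws = list(permutations(range(len(urn)), 2))   # equally likely ordered pairs

p_first_red = Fraction(sum(urn[i] for i, _ in draws), len(draws))
p_both_red = Fraction(sum(urn[i] * urn[j] for i, j in draws), len(draws))
p_second_given_first = p_both_red / p_first_red

# Correlation between the two indicator variables (same marginal for both draws).
variance = p_first_red * (1 - p_first_red)
rho = (p_both_red - p_first_red ** 2) / variance

print(p_first_red, p_second_given_first, rho)
```

Seeing a red ball lowers the chance of the next ball being red, so the two draws are negatively correlated and instantial relevance fails for this exchangeable pair.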
If the observations in question can be imbedded in an infinitely exchangeable sequence (that is, into an infinite sequence X1 , X2 , ..., any finite segment X1 , ..., Xn of which is exchangeable), then instantial relevance does hold. After the power of the de Finetti representation theorem was appreciated, very simple proofs of this were discovered (see, e.g., [Carnap and Jeffrey, 1971, Chapters 4 and 5]). There are also simple ways of seeing this without using the representation theorem. For example, the principle of instantial relevance is equivalent to the assertion that the observations are nonnegatively correlated. If X1 , X2 , ..., Xn is an
exchangeable sequence of random variables, then an elementary argument shows that the correlation coefficient $\rho = \rho(X_i, X_j)$ satisfies the simple inequality

$$\rho \geq -\frac{1}{n - 1}.$$

This is because (using both the formula for the variance of a sum and the exchangeability of the sequence) if $\sigma^2 = \mathrm{Var}[X_i]$, one has

$$0 \leq \mathrm{Var}[X_1 + \cdots + X_n] = n\sigma^2 + n(n - 1)\rho\sigma^2.$$

Thus, if the sequence can be indefinitely extended (so that one can pass to the limit $n \to \infty$), it follows that $\rho \geq 0$. The case $\rho = 0$ then corresponds to the case of independence (the past conveys no information about the future, inductive inference is impossible); and the case $\rho > 0$ corresponds to inductive inference and positive instantial relevance.

8 FINITE EXCHANGEABILITY

In the end, infinite sequences are really just fictions, so we would rather not incorporate them into our Weltanschauung in an essential way. In this section we take a closer look at this question.
8.1 Extendability

The de Finetti representation only holds for infinite sequences; it is easy to construct counterexamples otherwise. Consider, for example, the exchangeable assignment

$$P(RB) = P(BR) = \frac{1}{2}; \qquad P(RR) = P(BB) = 0.$$

This corresponds to sampling without replacement from an urn containing one red ball ($R$) and one black ball ($B$). This exchangeable probability assignment on ordered pairs cannot be extended to one on ordered triples. To see this, suppose otherwise. Then

$$P(RBR) + P(RBB) = P(RB) = \frac{1}{2},$$

so either $P(RBR) > 0$ or $P(RBB) > 0$ (or both). Suppose without loss of generality that $P(RBR) > 0$. Then

$$P(RR) \geq P(RRB) = P(RBR) > 0$$

(the first inequality follows because probabilities are monotone, that is, if $A \subseteq B$, then $P(A) \leq P(B)$; the equality because $P$ is by assumption exchangeable). But this is impossible, since $P(RR) = 0$. (It is not hard to see this is typical: sampling without replacement from a finite population results in an exchangeable probability assignment that cannot be extended.)
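The non-extendability argument can also be explored numerically. An exchangeable assignment on triples of $R$ and $B$ is determined by the total probability $w_k$ given to triples containing $k$ $R$'s. The sketch below, whose helper names and grid resolution are assumptions, searches a grid of such assignments and records the smallest value of $P(RR) + P(BB)$ attained by the induced pair marginal.

```python
from fractions import Fraction
from itertools import product
from math import comb

def pair_marginal(w):
    """Marginal on the first two draws of an exchangeable triple.
    w[k] = total probability of triples with k R's (k = 0, 1, 2, 3)."""
    q = [Fraction(w[k], comb(3, k)) for k in range(4)]  # each particular triple
    return {"RR": q[3] + q[2], "RB": q[2] + q[1],
            "BR": q[2] + q[1], "BB": q[1] + q[0]}

m = 12  # grid resolution (an arbitrary choice)
min_same = None
for a, b, c in product(range(m + 1), repeat=3):
    d = m - a - b - c
    if d < 0:
        continue
    marg = pair_marginal([Fraction(x, m) for x in (a, b, c, d)])
    same = marg["RR"] + marg["BB"]
    if min_same is None or same < min_same:
        min_same = same

print(min_same)  # smallest P(RR) + P(BB) found on the grid
```

On this grid the minimum is strictly positive, whereas the assignment above has $P(RR) + P(BB) = 0$, in agreement with the argument that it admits no exchangeable extension to triples.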
In general, if X1 , X2 , ..., Xn is an exchangeable sequence, then it may or may not be possible to extend it to a longer exchangeable sequence X1 , X2 , ..., Xn , ..., Xn+r , r ≥ 1. If it is possible to do so for every r ≥ 1, then we can think of X1 , X2 , ..., Xn as the initial sequence of an infinitely exchangeable sequence X1 , X2 , X3 , ... (thanks to the Kolmogorov existence theorem). Thus the de Finetti representation theorem applies, the infinite sequence can be represented as a mixture of iid (independent and identically distributed) sequences, and hence a fortiori the initial segment of length n can be so represented. On the other hand, if a finite exchangeable sequence of length n has a representation as a mixture of iid sequences, it is immediate that it is infinitely extendable. Thus: A finite exchangeable sequence is infinitely extendable if and only if it is representable as a mixture of iid sequences. To summarize: in general a finite exchangeable sequence may or may not be extendable. Carnap alludes to this fact when he reports that while at the Institute for Advanced Studies in 1952–1953, he and his collaborator John Kemeny had talks with L. J. Savage. Among other things, Savage showed them that the use of a language LN with a finite number of individuals is not advisable, because a symmetric M -function in LN cannot always be extended to an M -function in a language with a greater number of individuals. [Carnap and Jeffrey, 1971, p. 3] Note the curious phrase “not advisable”. It is unclear why Savage thought this (if indeed he did): recall sampling without replacement from a finite population results in a perfectly respectable exchangeable assignment even though it cannot be extended. More generally think of any population which is naturally finite in extent, and to which we wish to extrapolate on the basis of a partial sample from it. (For example, think of a limited edition of a book, and whether or not such books are defective.) 
The phenomenon of non-extendability is in no sense pathological. Of course there is a price to pay: the loss of the de Finetti representation. Or is there?
8.2 The finite representation theorem
Given a set of counts $\mathbf{n} = (n_1, \ldots, n_t)$, imagine an urn containing $n_j$ balls of each type, and suppose one successively draws out “at random” without replacement each ball in the urn (“at random” meaning that all possible sequences are judged equally likely). There are a total of $(n_1 + \cdots + n_t)!/(n_1! \cdots n_t!)$ such sequences; the exchangeable probability assignment $H_{\mathbf{n}}$ giving each of these equal probability is called the hypergeometric distribution. If, more generally, $X_1, \ldots, X_n$ is any exchangeable sequence whatsoever, and $P(\mathbf{n})$ the corresponding probability assignment on the set of counts $\mathbf{n}$, then the overall probability assignment $P$ on the set of sequences is a mixture of the hypergeometric probabilities $H_{\mathbf{n}}$ using the weights
$P(\mathbf{n})$; compactly this can be expressed as

$$P = \sum_{\mathbf{n}} P(\mathbf{n})\, H_{\mathbf{n}}.$$
This result is the finite de Finetti representation theorem. It is basically just the so-called “theorem of total probability” in disguise. It tells us that the structure of the generic finite exchangeable sequence is really quite simple. If the sequence is $N$ long, and the outcomes can be of $t$ different types, then you can think of it as a sequence of draws from an urn with $N$ balls, each of which can be one of the $t$ types, but the distribution of types among the $N$ balls (the $\mathbf{n}$) is unknown. If (as the Spartans would say), you knew the distribution of types, then your probability assignment would be the appropriate hypergeometric distribution. But since you don’t, you assign a prior distribution to $\mathbf{n}$ and then average. Although the finite representation theorem is not quite as well known (or appreciated) as its big brother, the representation theorem for an infinite exchangeable sequence, it would be a serious mistake to underestimate it. To begin, thanks to the representation, there is a drastic reduction in the number of independent probabilities to be specified; in the case of tossing a coin 10 times, for example, from $2^{10} - 1 = 1023$ to 11. But there are also important conceptual and philosophical advantages to thinking in terms of the finite representation theorem.
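The finite representation can be checked directly on a small example. The sketch below assumes a hypothetical urn of three red and two black balls from which three are drawn without replacement (a non-extendable exchangeable sequence), computes the weights on the counts, and confirms that each sequence probability equals its weight times the uniform probability over arrangements with that count.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product
from math import comb

urn, n = "RRRBB", 3   # assumed urn; draw n balls without replacement
orderings = list(permutations(range(len(urn)), n))

# Exchangeable probability assignment P on ordered sequences of length n.
P = Counter()
for idx in orderings:
    P["".join(urn[i] for i in idx)] += Fraction(1, len(orderings))

# Weights P(n): probability that the sequence contains k R's.
weights = {k: sum((p for s, p in P.items() if s.count("R") == k), Fraction(0))
           for k in range(n + 1)}

def mixture(seq):
    """Weight on the count times the uniform probability of one arrangement."""
    k = seq.count("R")
    return weights[k] * Fraction(1, comb(n, k))

all_seqs = ["".join(s) for s in product("RB", repeat=n)]
matches = all(P[s] == mixture(s) for s in all_seqs)
print(weights, matches)
```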
8.3 The finite rule of succession
The classical rule of succession, that if in $n$ trials there are $k$ successes, then the probability of a success on the next trial is $(k+1)/(n+2)$, assumes you are sampling from an infinite population (see [Laplace, 1774]). (Strictly speaking the last makes no sense, but it can be viewed as a shorthand either for sampling with replacement (so that the population remains unaltered by the sampling) or for passing to the limit in the case of sampling from a finite population.) In particular, if all $n$ are of the same type, then the probability that the next is also of this type is $(n + 1)/(n + 2)$. But it is clear that the basic relevant question is a different one: the probability if you are sampling without replacement from a finite population. This question was first asked and answered by Prevost and L’Huilier [1799]. To answer the question, of course, one must make some assumption regarding the composition of the urn (that is, adopt some set of prior probabilities regarding the different possible urn compositions). The natural assumption, parallel to the Bayes-Laplace analysis, is to assume all possible vectors of counts are equally likely. Doing this, Prevost and L’Huilier were able first to derive the posterior probabilities for the different constitutions of the urn, and then from this to derive the rule of succession as a consequence, the final result being that (given $p$ successes out of $m$ to date) the probability of a success on the next trial is $(p + 1)/(m + 2)$, exactly the same answer as the classical Laplace rule of succession!
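The coincidence can be confirmed by direct calculation for any particular urn. The following sketch assumes an urn of $N$ balls with every number $K$ of successes from 0 to $N$ equally likely a priori; the function name is hypothetical, and the posterior over $K$ is obtained from the hypergeometric likelihood of the sample.

```python
from fractions import Fraction
from math import comb

def finite_rule_of_succession(N, m, p):
    """P(next draw is a success | p successes in m draws without replacement),
    with all urn compositions K = 0..N equally likely a priori (requires m < N)."""
    # Likelihood of the sample given K successes in the urn, up to a constant.
    weights = {K: comb(K, p) * comb(N - K, m - p) for K in range(N + 1)}
    total = sum(weights.values())
    # Given K, the chance that the next ball is a success is (K - p) / (N - m).
    return sum(Fraction(w, total) * Fraction(K - p, N - m)
               for K, w in weights.items())

print(finite_rule_of_succession(N=50, m=10, p=7))   # compare with (7 + 1)/(10 + 2)
```

The answer does not depend on the urn size $N$, which is the striking feature of the result.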
This result was subsequently independently rediscovered several times over the next century and a quarter, the last being by C. D. Broad in 1918, when it finally gained some traction in philosophical circles (see generally [Zabell, 1988]). The brute force mathematical derivation of this particular rule of succession requires the evaluation of a tricky combinatorial sum; and its history of successive rediscovery is a phenomenon that is sometimes seen in the mathematical literature when a result is interesting enough (so that it repeatedly attracts attention), hard enough (so that it is deemed worthy of publication), and obscure or technical enough (so that it is then subsequently easily forgotten or overlooked). But our point here is that this striking coincidence between the finite and infinite rules of succession, which, when viewed through the prism of the combinatorial legerdemain required to evaluate the necessary sum, appears to be a minor miracle, is in fact obvious when thought of in terms of the finite representation theorem. For consider. Suppose $X_1, X_2, \ldots$ is an infinite exchangeable sequence of 0s and 1s having mixing measure $dQ(p) = dp$ in the de Finetti representation (that is, the Bayes-Laplace process). If $S_n = X_1 + \cdots + X_n$ denotes the number of 1s in $n$ trials, then, as noted earlier,

$$P(S_n = k) = \int_0^1 \binom{n}{k} p^k (1 - p)^{n-k}\, dp = \frac{1}{n + 1}.$$

Now consider the initial segment $X_1, X_2, \ldots, X_n$ by itself. This is a finite exchangeable sequence, and so has a finite representation in terms of some mixture of hypergeometric probabilities. But the mixing measure for the finite representation in the dichotomous case is $P(S_n = k)$, which is, as just noted, $1/(n + 1)$, the Prevost-L’Huilier prior (or, as Jack Good might put it, the Prevost-L’Huilier-Terrot-Todhunter-Ostrogradskii-Broad prior).
But the finite representation uniquely determines the stochastic structure of a finite exchangeable sequence; thus an $n$-long Prevost-L’Huilier sequence is stochastically identical to the initial, $n$-long segment of the Bayes-Laplace process, and therefore the two coincide in all respects, including (but not limited to) their rules of succession. No tricky sums! Viewed from the perspective of the philosophical foundations of inductive inference the finite rule of succession is important for two reasons vis-a-vis the classical Laplacean analysis:

1. It eliminates a variety of possible concerns about the occurrence of the infinite in the Laplacean analysis (e.g., [Kneale, 1949, p. 205]): that is, attention is focused on a finite segment of trials, rather than a hypothetical infinite sequence or population.
2. The frequency, propensity, or objective chance $p$ that appears in the integral is replaced by the fraction of successes in a finite population; thus a purely personalist or subjective analysis becomes possible and objections to “probabilities of probabilities” or “unknown probabilities” (e.g., [Keynes, 1921, pp. 372–75]) are eliminated.
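The identity between the two processes can be checked sequence by sequence. In the sketch below (function names are assumptions), `bayes_laplace` builds a sequence probability from Laplace's rule of succession by the chain rule, while `prevost_lhuilier` uses the uniform prior $1/(n+1)$ on the number of 1s followed by a uniformly random arrangement.

```python
from fractions import Fraction
from itertools import product
from math import comb

def bayes_laplace(seq):
    """Chain rule with Laplace's rule: P(next = 1) = (ones + 1) / (n + 2)."""
    prob, ones = Fraction(1), 0
    for n, x in enumerate(seq):
        p_one = Fraction(ones + 1, n + 2)
        prob *= p_one if x == 1 else 1 - p_one
        ones += x
    return prob

def prevost_lhuilier(seq):
    """Uniform prior 1/(n + 1) on the count, times 1 / C(n, k) for the arrangement."""
    n, k = len(seq), sum(seq)
    return Fraction(1, n + 1) * Fraction(1, comb(n, k))

n = 6  # length of the initial segment (an arbitrary choice)
agree = all(bayes_laplace(s) == prevost_lhuilier(s)
            for s in product((0, 1), repeat=n))
print(agree)
```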
8.4 The finite continuum of inductive methods

As one final example of both the utility and interest of considering finite exchangeable sequences, we note in passing that Johnson’s derivation of the continuum of inductive methods carries over immediately to the finite case, the chief element of novelty being that now the $\alpha$ parameters in the rule of succession can be negative (since, for example, when sampling without replacement from an urn, the more balls of a given color one sees, the less likely it becomes to see other balls of the same color); see [Zabell, 1982].
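To see how negative parameters arise, consider an urn of known composition sampled without replacement. The sketch below, with hypothetical function names and an assumed urn of 5, 3 and 2 balls of three colors, shows that the urn's predictive probabilities have the Johnson-Carnap form with $\alpha_i$ equal to minus the number of balls of color $i$.

```python
from fractions import Fraction

def urn_predictive(composition, counts, i):
    """Chance the next ball has colour i when drawing without replacement."""
    N, n = sum(composition), sum(counts)
    return Fraction(composition[i] - counts[i], N - n)

def continuum_predictive(alphas, counts, i):
    """Rule of succession (n_i + alpha_i) / (n + alpha)."""
    n, alpha = sum(counts), sum(alphas)
    return Fraction(counts[i] + alphas[i], n + alpha)

composition = (5, 3, 2)                       # assumed urn contents
alphas = tuple(-c for c in composition)       # negative parameters
counts = (2, 1, 0)                            # balls drawn so far
same = all(urn_predictive(composition, counts, i)
           == continuum_predictive(alphas, counts, i)
           for i in range(len(composition)))
print(same)
```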
8.5 The proper role of the infinite

Aristotle (Physics 3.6, see, e.g., [Heath, 1949, pp. 102–113]) distinguishes between the actual infinite and the potential infinite, a useful distinction to keep in mind when thinking about the use of the infinite in probability. One might summarize Aristotle as saying that the use of the infinite is only appropriate in its potential rather than actual sense. Let us apply this to the case of probability: theories that depend in an essential way on the actual infinite are fatally flawed. Consider von Mises’s frequency theory. In any theory of physical probability, if $0 < p < 1$ is the probability of an outcome in a sequence of independent trials, then any finite frequency $k$ in $n$ trials has a positive probability. Thus any observed value of $k$ is consistent with any possible value of $p$. In von Mises’s theory in order to achieve this consistency of any $p$ with any $k$, it is essential that $p$ be an infinite limiting frequency. But, being infinite in nature, $p$ is unobservable, hence metaphysical (in the pejorative sense); see, e.g., [Jeffrey, 1977]. But, one might object, doesn’t the infinite representation theorem also suffer from this defect, since it holds just for infinitely exchangeable sequences (rather than finitely exchangeable sequences, the only things we really see)? The answer is no, if one correctly understands it from both a mathematical and a philosophical standpoint.

Mathematical interpretation of the representation theorem

In applied mathematics one frequently uses infinite limit theorems as approximations to the large but finite. That is, the sequence, although of course necessarily finite, is viewed as effectively unlimited in length. (So, for example, in tossing a coin, there is no practical limit to how many times we can toss it, although it will certainly wear down after many googols of tosses.) But the applied mathematician must also have some idea of when to use a limit theorem as an approximation and when not.
This is the reason the central limit theorem (CLT) is of practical use, but the law of the iterated logarithm (LIL) is not: the CLT provides an excellent approximation to sums of random variables for surprisingly small sample sizes; the LIL only for surprisingly large. What this ultimately means is that what the applied mathematician needs is either a generous fund of experience or a more informative mathematical result:
not just the limiting value but the rate of convergence to that limit. Happily such a result is available for the de Finetti representation theorem, thanks to Persi Diaconis and David Freedman [1980a]. First some notation: if S is a set, let S n denote its n-fold Cartesian product (n ≤ ∞). If p is a probability on S, let pn denote the corresponding n-fold product probability on S n (corresponding to an n-long, p-iid sequence). If P is a probability on S n , then Pk denotes its restriction to S k , k ≤ n. If Θ parametrizes the set of probabilities on S and μ is a a probability on Θ (to be thought of as a mixing measure), let Pµn denote the resulting exchangeable probability on S n ; that is pnθ dμ(θ). Pµn = Θ
Finally, if $P$ and $Q$ are probabilities on $S^n$, let
\[ \|P - Q\| = \max_{A \subset S^n} |P(A) - Q(A)| \]
denote the variation distance between $P$ and $Q$. Then one has the following result: Suppose $S$ is a finite set of cardinality $t$ and $P$ is an exchangeable probability on $S^n$. Then there exists a probability $\mu$ on the Borel sets of $\Theta$ and a constant c such that
\[ \|P_k - P_{\mu k}\| = \left\| P_k - \int p_\theta^{\,k} \, d\mu(\theta) \right\| \le \frac{2tk}{n} \quad \text{for all } k \le n. \]
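The bound 2tk/n can be checked numerically in the simplest case. The sketch below is an illustration only, not part of the original argument. It assumes t = 2 and takes P to be the law of n draws without replacement from an urn holding r red balls among n, which is exchangeable. As the comparison mixture it assumes μ is a point mass at θ = r/n, so that the mixture is simply k iid draws. The function names are hypothetical. Both laws give equal probability to all sequences with the same number of reds, so the variation distance between sequences equals the variation distance between the counts of reds.

```python
from math import comb

def hypergeometric_pmf(j, k, r, n):
    """P(j reds in k draws without replacement from n balls, r of them red)."""
    if j < 0 or j > k or j > r or k - j > n - r:
        return 0.0
    return comb(r, j) * comb(n - r, k - j) / comb(n, k)

def binomial_pmf(j, k, p):
    """P(j reds in k iid draws, each red with probability p)."""
    return comb(k, j) * p**j * (1 - p) ** (k - j)

def variation_distance(k, r, n):
    """max_A |P_k(A) - Q_k(A)|, with P_k the first k draws without replacement
    and Q_k the law of k iid draws with p = r/n (assumed point-mass mixture)."""
    p = r / n
    return 0.5 * sum(
        abs(hypergeometric_pmf(j, k, r, n) - binomial_pmf(j, k, p))
        for j in range(k + 1)
    )

def bound(k, n, t=2):
    """The right-hand side 2tk/n of the displayed inequality."""
    return 2 * t * k / n

if __name__ == "__main__":
    for k in (1, 5, 10, 25):
        print(k, variation_distance(k, 30, 100), bound(k, 100))
```

For a fixed k the computed distance shrinks as the urn (and hence the extendability of the sequence) grows, which is the qualitative content of the theorem.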
This beautiful result has a number of interesting consequences. First, it makes precise the interrelationship between extendability and the existence of an integral representation. Given an exchangeable sequence of length k, if the sequence is extendable to a longer sequence of length n, then it can be approximated by an integral mixture to order k/n in variation distance. The more the sequence can be extended, the more it looks like an integral mixture. Thus it is not surprising (and Diaconis and Freedman in fact use the above theorem to prove) that a sequence which can be extended indefinitely (equivalently, is the initial segment of an infinitely exchangeable sequence) has an integral representation. But the theorem also tells us how to think about the application of the representation theorem. Given a sequence that is the initial segment of a “potentially infinite” sequence (that is, unbounded in any practical sense), thinking of it as an integral mixture is a reasonable approximate procedure (in just the same way as summarizing a population of heights in terms of a normal distribution is a reasonable approximation to an ultimately discrete underlying reality). For a very readable discussion of this topic, see [Diaconis, 1977]. Philosophical interpretation of the representation theorem From this perspective the representation is a tool used for mathematical approximation. The “parameter” p is a purely mathematical object, not a physical quantity. This was in fact de Finetti’s view: “it is possible... and to my mind preferable,
to stick to the firm and unexceptionable interpretation that the limit distribution is merely the asymptotic expression of frequencies in a large, but finite, number of trials” [de Finetti, 1972, p. 216]. De Finetti was a finitist who rejected the use of countable additivity in probability as lacking a philosophical justification. (It is not a consequence of the usual Dutch book argument.) In particular, de Finetti’s statement and proof of the representation theorem uses only finitely additive probability. See Cifarelli and Regazzini [1996] for an outstanding discussion of the role of the infinite in de Finetti’s papers.

9 THE FIRST INDUCTION THEOREM

There is a very interesting result, which Good [1975, p. 62] terms the first induction theorem. Its interest is that it makes no reference at all to exchangeability, and yet it provides an account of enumerative induction, in that it tells us that confirming instances (in a sense to be made precise in a moment) increase the probability of other potential instances. To be precise, if $P(H) > 0$ and $P(E_j \mid H) = 1$, $j \ge 1$ (the $E_j$ are “implications” of $H$), then ($E_1 E_2$ denoting the conjunction of $E_1$ and $E_2$, and so on),
\[ \lim_{n \to \infty} P(E_{n+1} E_{n+2} \cdots E_{n+m} \mid E_1 E_2 \cdots E_n) = 1 \]
uniformly in $m$. The proof (due to [Huzurbazar, 1955]) is at once simple and elegant. Just note that for any $n \ge 1$, one has $P(E_1 \cdots E_n \mid H) = 1$, hence
\[ P(E_1 \cdots E_n) \ge P(E_1 \cdots E_n E_{n+1}) \ge P(E_1 \cdots E_n E_{n+1} H) = P(H) > 0. \]
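Before the proof is completed, the statement of the theorem and the inequality just displayed can be illustrated numerically. The sketch below rests on an assumed toy model that is not in the original text: H has prior probability h and makes every E_j certain, while if H is false the E_j are independent, each with probability q < 1. The values of h and q and the function names are hypothetical.

```python
def u(n, h=0.01, q=0.9):
    """P(E_1 ... E_n) in the toy model: h from H, (1 - h) * q**n otherwise."""
    return h + (1 - h) * q**n

def predictive(n, m, h=0.01, q=0.9):
    """P(E_{n+1} ... E_{n+m} | E_1 ... E_n), computed as u(n + m) / u(n)."""
    return u(n + m, h, q) / u(n, h, q)

def uniform_lower_bound(n, h=0.01, q=0.9):
    """P(H) / u(n): a lower bound on predictive(n, m) that holds for every m,
    because u(n + m) >= P(H) whatever m is."""
    return h / u(n, h, q)

if __name__ == "__main__":
    for n in (5, 50, 200):
        print(n, predictive(n, 10), predictive(n, 10**6), uniform_lower_bound(n))
```

Because the lower bound does not involve m, its convergence to one as n grows is what makes the convergence uniform in m in this toy model.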
It follows that $u_n = P(E_1 \cdots E_n)$ is a decreasing sequence bounded from below by a positive number, and therefore has a positive limit. Thus
\[ \lim_{n \to \infty} P(E_{n+1} E_{n+2} \cdots E_{n+m} \mid E_1 E_2 \cdots E_n) = \lim_{n \to \infty} \frac{u_{n+m}}{u_n} = 1; \]
and it is apparent that the convergence is uniform in $m$ (for every $m$ the ratio $u_{n+m}/u_n$ is at least $\lim_k u_k / u_n$, which does not depend on $m$). The result is not so surprising for sampling from a finite population, but for a potentially infinite sequence it is at first startling. It tells us that observing a sufficiently long sequence of confirming instances makes the probability of any further finite sequence of confirming instances, no matter how long, as close to one as desired. Good [1975, p. 62] says “the kudology is difficult”, but cites both Keynes [1921, Chapter 20] and Wrinch and Jeffreys [1921]; see also [Jeffreys, 1961, pp. 43–44].

10 ANALOGY
Simple enumeration is an important form of inductive inference but there are also others, based on analogy. Carnap distinguished between two forms of analogy:
analogy by proximity and analogy by similarity; that is, proximity in time (or sequence number) and similarity of attribute. In the case of inductive analogy, Carnap wished to generalize his results, allowing for the possibility that the inductive strength of P varies depending on some measure of “closeness” of either time or attribute. In the case of attributes this required the specification of a “distance” on the attribute set; in the case of time such a metric is of course already present. But Carnap obtained only partial results in this case (see [Carnap and Jeffrey, 1971, p. 1; Jeffrey, 1980, Chapter 6, Sections 16–18]). De Finetti and his successors were more successful. De Finetti formulated early on a concept of partial exchangeability [de Finetti, 1938], differing forms of partial exchangeability corresponding to differing forms of analogy. He viewed matters in effect as a spectrum of possibilities, exchangeability representing one extreme, a limiting case of “absolute” analogy. At the other extreme all one has is Bayes’s theorem, P (E|A) = P (AE)/P (A); absent “particular hypotheses concerning the influence of A on E”, nothing further can be said, “no determinate conclusion can be deduced”. The challenge was to find “other cases ... more general but still tractable”. For an English translation of de Finetti’s paper, see [Jeffrey, 1980, Chapter 9]. Diaconis and Freedman [Jeffrey, 2004, pp. 82–97] provide a very readable introduction to de Finetti’s ideas here.
10.1 Markov exchangeability
One example of building analogy by proximity into a probability function is the concept of Markov exchangeability (describing a form of analogy in time). Suppose X0 , X1 , ... is an infinite sequence of random outcomes, each taking values in the set S = {c1 , ..., ct }. For each n ≥ 1, consider the statistics X0 (the initial state of the chain) and the transition counts nij recording the number of transitions from ci to cj in the sequence up to Xn . (That is, the number of times k, 0 ≤ k ≤ n − 1, such that Xk = ci and Xk+1 = cj .) If for all n ≥ 1, all sequences X0 , ..., Xn starting out in the same initial state x0 and having the same transition counts nij have the same probability, then the sequence is said to be Markov exchangeable. Suppose further that the sequence is recurrent: the probability is 1 that Xn = X0 for infinitely many n. (That is, the sequence returns to the initial state infinitely often.) There is, it turns out, a de Finetti type representation theorem for the stochastic structure (probability law) of such sequences: they are precisely the mixtures of Markov chains, just as ordinary exchangeable sequences are mixtures of binomial or multinomial outcomes [Diaconis and Freedman, 1980b]. Furthermore there is also a Johnson-Carnap type rule of succession [Zabell, 1995]. Of course one might ask why Markov exchangeability is a natural assumption to make. Diaconis and Freedman [Jeffrey, 2004, p. 97] put it well: “If someone ... had never heard of Markov chains it seems unlikely that they would hit on the appropriate notion of partial exchangeability. The notion of symmetry seems strange at first ... A feeling of naturalness only appears after experience and
reflection.” For further discussion of Markov exchangeability and its relation to inductive logic, see [Skyrms, 1991].
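The definition of Markov exchangeability can be made concrete with a short computation. The sketch below is illustrative only: the two-state chains, their weights, and the function names are hypothetical choices. It computes the transition counts of a sequence and shows that a mixture of Markov chains assigns the same probability (given the initial state) to two sequences with the same initial state and the same transition counts, but not in general to an arbitrary rearrangement.

```python
from collections import Counter
from math import isclose, prod

def transition_counts(seq):
    """Counter mapping (a, b) to the number of k with seq[k] == a, seq[k+1] == b."""
    return Counter(zip(seq, seq[1:]))

def chain_probability(seq, matrix):
    """Probability of the transitions in seq under one chain, given seq[0]."""
    return prod(matrix[a][b] for a, b in zip(seq, seq[1:]))

def mixture_probability(seq, weighted_chains):
    """Mixture over chains: sum of weight * chain probability."""
    return sum(w * chain_probability(seq, m) for w, m in weighted_chains)

# Hypothetical mixing measure: two transition matrices with weights 0.3 and 0.7.
CHAINS = [
    (0.3, [[0.9, 0.1], [0.5, 0.5]]),
    (0.7, [[0.2, 0.8], [0.6, 0.4]]),
]

seq_a = [0, 0, 1, 0, 1]   # transitions: 0->0 once, 0->1 twice, 1->0 once
seq_b = [0, 1, 0, 0, 1]   # same initial state, same transition counts
seq_c = [0, 0, 0, 1, 1]   # a rearrangement of seq_a with different counts

if __name__ == "__main__":
    for s in (seq_a, seq_b, seq_c):
        print(s, dict(transition_counts(s)), mixture_probability(s, CHAINS))
```

The third sequence has the same number of 0s and 1s as the first, so an ordinary exchangeable law would treat them alike; the mixture of Markov chains does not.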
10.2 Analogy by similarity

Given the tentative and limited nature of Carnap’s attempts to formulate an inductive logic that incorporated analogy by similarity, this stood as an obvious challenge and since Carnap’s death there have been a number of attempts in this direction; see, e.g., [Romeijn, 2006] and the references there to earlier literature. Skyrms [1993; 1996] suggests using what he terms “hyperCarnapian” systems: finite mixtures of Dirichlet priors. He argues (p. 331): “In a certain sense, this is the only solution to Carnap’s problem. ... HyperCarnapian inductive methods are the general solution to Carnap’s problem of analogy by similarity”. But what if the outcomes are continuous in nature? In order to discuss this, it will be necessary to first revisit the definition of exchangeability.
10.3 The general definition of exchangeability

Consider first the general definition of exchangeability. A probability $P$ on the space of sequences $x_1, x_2, \ldots, x_n$ of real numbers (that is, on $\mathbb{R}^n$) is said to be (finitely) exchangeable if it is invariant under all permutations $\sigma$ of the index set $\{1, \ldots, n\}$; a probability $P$ on the space of infinite sequences $x_1, x_2, \ldots$ (that is, on $\mathbb{R}^\infty$) is said to be infinitely exchangeable if its restriction $P_n$ to finite sequences $x_1, x_2, \ldots, x_n$ is exchangeable for each $n \ge 1$. There is a sweeping generalization of the de Finetti representation theorem that characterizes such probabilities. Some notation, briefly. Let $\{P_\theta : \theta \in \Theta\}$ denote the set of independent and identically distributed (iid) probabilities on infinite sequences. (That is, if $p_\theta$ is a probability measure on $\mathbb{R}$, then $P_\theta = (p_\theta)^\infty$ is the corresponding product measure on $\mathbb{R}^\infty$. Here $\theta$ is just an index for the probabilities on the real line. Certain measure-theoretic niceties are being swept under the carpet at this point to simplify the exposition.) Now suppose that $P$ is an infinitely exchangeable probability on infinite sequences. Then there exists a unique probability $\mu$ on $\Theta$ such that
\[ P = \int_{\Theta} P_\theta \, d\mu(\theta). \]
That is, every exchangeable P on infinite sequences can be represented as a mixture of independent and identically distributed probabilities. (It is clear that every mixture of iid sequences is exchangeable; it is the point of the representation theorem that conversely every infinitely exchangeable probability arises thus. Aldous [1986] contains an outstanding survey of this and other generalizations of the original de Finetti theorem.) Thus, in order to arrive at P , it suffices to specify μ. Unfortunately, Θ is an uncountably infinite set, and the representation usefully reduces the dimensionality
of the problem of determining P only if one is able to exploit a difference in infinite cardinals!
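The easy direction of the representation theorem, that every mixture of iid sequences is exchangeable, can be checked directly in a small discrete case. The sketch below is an illustration under assumptions not in the original text: it uses 0/1 outcomes rather than real numbers, and a hypothetical two-point mixing measure. It also shows that the resulting sequence is exchangeable without being independent, which is what allows past observations to bear on future ones.

```python
from itertools import permutations
from math import isclose, prod

# Hypothetical mixing measure mu: weight w on an iid coin with P(1) = p.
MIXTURE = [(0.5, 0.2), (0.5, 0.9)]

def sequence_probability(xs, mixture=MIXTURE):
    """P(x_1, ..., x_n) = sum over theta of mu(theta) * prod_j p_theta(x_j)."""
    return sum(w * prod(p if x == 1 else 1 - p for x in xs) for w, p in mixture)

def is_permutation_invariant(xs, mixture=MIXTURE):
    """True if every rearrangement of xs has the same probability as xs."""
    base = sequence_probability(xs, mixture)
    return all(
        isclose(sequence_probability(perm, mixture), base)
        for perm in permutations(xs)
    )

def conditional_next_one(mixture=MIXTURE):
    """P(x_2 = 1 | x_1 = 1) under the mixture."""
    return sequence_probability((1, 1), mixture) / sequence_probability((1,), mixture)

if __name__ == "__main__":
    print(is_permutation_invariant((1, 1, 0, 1)))
    print(sequence_probability((1,)), conditional_next_one())
```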
10.4 The pragmatic Bayesian approach
In practical Bayesian statistics one sometimes proceeds as follows. Based on the background, training, and experience of the statistician, it is judged that the underlying but unknown distribution $p_\theta$ of a population of numbers is a member of some particular parametric family (for example, normal, exponential, geometric, or Poisson) and it is the task of the statistician to estimate the unknown parameter $\theta$. The parameter space $\Theta$ is now finite dimensional, often one dimensional. The mathematical model for a sample from such a population is an iid sequence of random variables $X_1, X_2, X_3, \ldots$, each $X_j$ having distribution $p_\theta$, so that $X_1, X_2, X_3, \ldots$ has distribution $P_\theta = (p_\theta)^\infty$. Being a Bayesian, the statistician assigns a “prior” or initial probability to $\Theta$; the average over $\Theta$ using $d\mu$ then specifies a probability $P$ as in the displayed formula above. Given a “random sample” (iid sequence) $X_1, \ldots, X_n$ from the population, the statistician then computes the “posterior” or final probability $P(\theta \mid X_1, \ldots, X_n)$ using Bayes’s theorem. In general, the larger the sample, the more concentrated the posterior distribution is about some value of the parameter. For example, if the density of $p_\theta$ is
\[ p_\theta(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - \theta)^2}{2} \right), \quad -\infty < x < \infty, \]
(that is, normal, standard deviation one, unknown mean $\theta$), then (except for certain “over-opinionated” priors) the posterior distribution for $\theta$ will be concentrated about $\bar{X}_n$, the sample mean for the random sample $X_1, \ldots, X_n$.

It is apparent that this procedure in fact captures precisely the form of analogical reasoning that Carnap had in mind. That is, if the sample mean is $\bar{X}_n = x$, then the resulting posterior distribution expresses support for the belief that the next observation will be in the vicinity of $x$, the strength of the evidence for different values $y$ decreasing as the distance of $y$ from $x$ increases. “But”, the Carnapian may object, “this is an enterprise entirely different from the one Carnap envisaged!
There is no logical justification proffered for the choice of the parametric family $p_\theta$, or the choice of the prior $d\mu$!” True, but how might such a justification, if it existed, proceed? Consider the multinomial case in the continuum of inductive methods. There the de Finetti representation theorem tells us that the most general exchangeable sequence is a mixture of multinomial probabilities. The elegance of the Johnson-Carnap approach is that it replaces the essentially arbitrary, albeit mathematically convenient, quantitative assumption of the practicing Bayesian statistician that the prior is a member of a specific low-dimensional family (the Dirichlet priors
on $\Delta_{t-1}$) by the purely qualitative sufficientness postulate. That is, based on information received one might well arrive at the purely qualitative judgment that the probability that the next observation will be of a certain type should depend only on the number of that type already observed and the total number of observations to date. This is certainly a more principled approach to the problem of assigning a prior, in stark contrast to assuming the prior is Dirichlet purely for reasons of mathematical convenience. Framed in this way, the form of a principled Bayesian approach to the more general problem (of deciding on priors for other parametric families) is also clear. Can one find, at least for the most common parametric families in statistics, a natural qualitative assumption on a sequence of observations in addition to exchangeability that implies the sequence is in fact not just an arbitrary mixture of iid probabilities, but a mixture of distributions strictly within the given parametric family? For example, what would be an analog of the sufficientness postulate ensuring that an exchangeable sequence is a mixture of normal, or exponential, or geometric, or Poisson distributions?
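The normal example of this section can be sketched in a few lines. The code below is a minimal illustration, assuming (this is not specified in the text) a normal prior for θ with hypothetical mean and variance; the data values and function names are likewise made up. With unit-variance normal observations the posterior for θ is again normal, and the predictive density for the next observation is highest near the sample mean and falls off with distance from it, which is the analogical behaviour described above.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def posterior(sample, prior_mean=0.0, prior_var=100.0):
    """Posterior (mean, variance) of theta for iid N(theta, 1) data,
    under an assumed N(prior_mean, prior_var) prior."""
    n = len(sample)
    precision = 1 / prior_var + n          # prior precision plus one per observation
    mean = (prior_mean / prior_var + sum(sample)) / precision
    return mean, 1 / precision

def predictive_density(y, sample, prior_mean=0.0, prior_var=100.0):
    """Density of the next observation at y: N(posterior mean, posterior var + 1)."""
    mean, var = posterior(sample, prior_mean, prior_var)
    return normal_pdf(y, mean, var + 1.0)

SAMPLE = [4.8, 5.3, 5.1, 4.6, 5.2]  # hypothetical data, sample mean about 5.0

if __name__ == "__main__":
    print(posterior(SAMPLE))
    for y in (5.0, 6.0, 8.0):
        print(y, predictive_density(y, SAMPLE))
```

Nothing in the computation justifies the choice of the normal family or of the prior; it only shows what follows once they are granted, which is exactly the Carnapian complaint.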
10.5 Group invariance and sufficient statistics

Thanks to some very deep and hard mathematics on the part of David Freedman, Persi Diaconis, Phil Dawid, and others, one can in fact answer this question for many of the most common statistical families. Here are some examples, followed by a brief summary of the currently known state of the theory. Let $\varphi_{\mu,\sigma^2}(x)$ denote the density of the normal distribution with mean $\mu$ and variance $\sigma^2$; that is,
\[ \varphi_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right). \]
If a random variable $X$ has such a distribution, then this is denoted $X \sim N(\mu, \sigma^2)$. The first example, characterizing exchangeable sequences that are a mixture of $N(0, \sigma^2)$, is admittedly not the most interesting from a statistical standpoint, but it provides a simple illustration of the type of results the theory provides.

EXAMPLE 1. An infinite sequence of random variables $X_1, X_2, X_3, \ldots$ is said to be orthogonally invariant if for every $n \ge 1$, the sequence $X_1, \ldots, X_n$ is invariant under all orthogonal transformations of $\mathbb{R}^n$. (An orthogonal transformation is a linear map that preserves distances. It can be thought of as an $n$-dimensional rotation.) Schoenberg’s theorem tells us that every orthogonally invariant infinite sequence of random variables is a mixture of $N(0, \sigma^2)$ iid random variables. (Note that a coordinate permutation is a very special kind of orthogonal transformation; thus orthogonal invariance entails exchangeability and is much more restrictive.) In terms of the de Finetti representation, if $P$ is the distribution of the orthogonally
invariant sequence $X_1, X_2, \ldots$, and $P_\sigma$ the distribution of an iid sequence of $N(0, \sigma^2)$ random variables, then there exists a probability measure $Q$ on $[0, \infty)$ such that
\[ P = \int_0^\infty P_\sigma \, dQ(\sigma). \]
There is an equivalent formulation of Schoenberg’s theorem in terms of sufficient statistics. Consider the statistic
\[ T_n = \sqrt{X_1^2 + \cdots + X_n^2}. \]
Then the property of orthogonal invariance is equivalent to the property that, for each $n \ge 1$, conditional on $T_n$ the distribution of $X_1, \ldots, X_n$ is uniform on the $(n-1)$-sphere of radius $T_n$. Furthermore, the limit $T = \lim_{n\to\infty} T_n/\sqrt{n}$ exists almost surely and $P(T \le \sigma) = Q([0, \sigma])$; that is, the mixing measure $Q$ is the distribution of the limit $T$. This has (accepting for the moment that one is willing to talk about infinite sequences of random variables, about which more later) a striking consequence. The statistic $T_n/\sqrt{n}$ is the standard sample estimate of the standard deviation $\sigma$. Thus one has a natural interpretation of both the $Q$ and the $\sigma$ appearing in the de Finetti representation. Far from being merely mathematical objects in the representation theorem, they acquire a significance of their own. The “parameter” ($\sigma$) emerges as the limit of the sample standard deviation (note one is certain of the existence of the limit but not its value); $Q$ is our degree of belief regarding the unknown parameter (our uncertainty regarding the value of $\sigma$); and conditional on the limit being $\sigma$ the sequence is iid $N(0, \sigma^2)$. Thus one has a complete explication of the role of parameters, parametric families, and priors used by the pragmatic Bayesian statistician in this case. The particular parametric family arises from the particular strengthening of exchangeability (here orthogonal invariance) reflecting the knowledge of the statistician in this case. (If he doesn’t subscribe to orthogonal invariance, he shouldn’t be using a mixture of mean zero normals!) The single parameter $\sigma$ is interpreted as the large sample limit of the sample standard deviation; and the mixing measure $Q$ reflects our degree of belief as to the value of this limit. Very neat!

EXAMPLE 2. Suppose $P$ is a mixture of iid $N(\mu, \sigma^2)$ normals. Then it is easy to see that $P$ is invariant under transformations that are orthogonal and preserve the line $L_n : x_1 = x_2 = \cdots = x_n$.
Dawid’s theorem states that this is in fact the necessary and sufficient condition for $P$ to be such a mixture. In this case there are two sufficient statistics:
\[ U_n = X_1 + \cdots + X_n; \qquad V_n = \sqrt{X_1^2 + \cdots + X_n^2}, \]
and the symmetry assumption is equivalent to the property that, conditional on $U_n, V_n$, the distribution of $X_1, \ldots, X_n$ is uniform on the resulting $(n-2)$-sphere. Furthermore, one has that the limits
\[ U = \lim_{n\to\infty} U_n/n, \qquad V = \lim_{n\to\infty} V_n/\sqrt{n} \]
exist almost surely and generate the mixing measure on the two-dimensional parameter space R × [0, ∞).
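The way the “parameters” emerge as limits of sample statistics in the two examples can be seen in a simulation. The sketch below is an illustration only, with hypothetical function names and parameter values. It assumes the second statistic is read as the root mean square of the observations, so that for iid N(μ, σ²) data it tends to the square root of μ² + σ², from which σ is recovered. In a mixture one would first draw (μ, σ) from the mixing measure; here a single draw is simply fixed.

```python
import random
from math import sqrt

def limiting_statistics(mu, sigma, n, seed=0):
    """Simulate X_1..X_n iid N(mu, sigma^2) and return
    (sum of X_i divided by n, square root of the mean of X_i squared)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        total += x
        total_sq += x * x
    return total / n, sqrt(total_sq / n)

def recovered_parameters(mu, sigma, n, seed=0):
    """Estimates of (mu, sigma) read off from the two statistics."""
    u, v = limiting_statistics(mu, sigma, n, seed)
    return u, sqrt(max(v * v - u * u, 0.0))

if __name__ == "__main__":
    # Example 1 (mean zero): the second statistic alone approaches sigma.
    print(limiting_statistics(0.0, 3.0, 200_000))
    # Example 2: both parameters are recovered from the pair of statistics.
    print(recovered_parameters(1.5, 2.0, 200_000))
```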
Characterizations of this kind are known for a number of standard statistical distributions. Many of these form “exponential families”; Diaconis and Ylvisaker [1980] characterize the conjugate priors for such families in terms of the linearity of their posterior expectations. In other cases the challenge remains to find such characterizations, preferably in terms of both symmetry condition and sufficient statistics. Diaconis and Freedman [1984] is an outstanding exposition, describing many such results and placing them into a unified theoretical superstructure. In sum: Carnap recognized the limited utility of the inductive inferences that the continuum of inductive methods provided, and sought to extend his analysis to the case of analogical inductive inference: an observation of a given type makes more probable not merely observations of the exact same type but also observations of a “similar” type. The challenge lies both in making precise the meaning of “similar”, and in being able to then derive the corresponding continua. Carnap sought to meet the first challenge by proposing that underlying judgements of similarity is some notion of “distance” between predicates; but then immediately hit the brick wall of how one could use a general notion of distance to derive plausible continua. Neither Carnap nor any of his successors were able to solve this problem (although not for want of trying). The Diaconis-Freedman theory enables us to see why. If one recognizes that the problem of analogical reasoning is essentially that of justifying parametric Bayesian inference, then it is indeed possible to derive attractive results that parallel those for the multinomial case. But these results are not trivial; they involve very hard mathematics, and although many special cases have been successfully tackled, it is possible to argue that no complete theoretical superstructure yet exists.
11 THE SAMPLING OF SPECIES PROBLEM
Another important problem concerns the nature of inductive inference when the possible types or species are initially unknown (this is sometimes referred to in the statistical literature as the sampling of species problem). Carnap thought this could be done using the equivalence relation R: belongs to the same species as. (That is, one has a notion of equivalence or common membership in a species, without prior knowledge of that species.) Carnap did not pursue this idea further, however, thinking the attempt premature given the relatively primitive state of the subject at that time. Carnap’s intuition was entirely on the mark here. One can construct a theory for the sampling of species problem, one that parallels the classical continuum of inductive methods, but the attendant technical difficulties are considerable, exchangeable random sequences being replaced by exchangeable random partitions. (Two sequences generate the same partition if they have the same frequencies of frequencies $a_r$ defined earlier.) Fortunately, the English mathematician J. H. C.
Kingman did the necessary technical spadework in a brilliant series of papers a quarter of a century ago. Kingman’s beautiful results enable one to establish a parallel inductive theory for this case, including a Johnson-type characterization of an analogous continuum of inductive methods; see [Zabell, 1992; 1997]. In brief, consider the following three axioms, that parallel (in two cases) or extend (in one case) those of Johnson.

1. All sequences of outcomes are possible (have positive probability).
2. The probability of seeing on the next trial the $i$-th species already seen is a function of the number of times that species has been observed, $n_i$, and the total sample size $n$: $f(n_i, n)$.
3. The probability of observing a new species is a function only of the number of species already observed $t$ and the sample size $n$: $g(t, n)$.

It is a remarkable fact that if these three assumptions are satisfied, then one can prove that the functions $f(n_i, n)$, $g(t, n)$ are members of a three-dimensional continuum described by three parameters $\alpha, \theta, \gamma$.

The continuum of inductive methods for the sampling of species

Case 1: If $n_i < n$ for some $i$, then
\[ f(n_i, n) = \frac{n_i - \alpha}{n + \theta}, \qquad g(t, n) = \frac{t\alpha + \theta}{n + \theta}. \]
Note that if $n_i < n$, then $t > 1$, there are at least two species, and the universal generalization is disconfirmed.

Case 2: If $n_i = n$ for some $i$, then
\[ f(n_i, n) = \frac{n_i - \alpha}{n + \theta} + c_n(\gamma), \qquad g(t, n) = \frac{t\alpha + \theta}{n + \theta} - c_n(\gamma); \]
here
\[ c_n(\gamma) = \frac{\gamma(\alpha + \theta)}{(n + \theta)\left[ \gamma + (\alpha + \theta - \gamma) \prod_{j=1}^{n-1} \dfrac{j - \alpha}{j + \theta} \right]} \]
represents the increase in the probability of seeing the i-th species again due to the confirmation of the universal generalization. Not all parameter values are possible: one must have 0 ≤ α < 1; θ > −α; 0 ≤ γ < α + θ. There is a simple interpretation of the three parameters θ, α, γ. The first, θ, is related to the likelihood of new species being observed; the larger the value of θ, the more likely it is that the next observation is that of a new species.
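As a consistency check on the two cases, one can verify that the predictive probabilities sum to one. The sketch below is illustrative and uses exact rational arithmetic; the parameter values and function names are hypothetical. It reads the displayed correction term as γ(α + θ) divided by (n + θ) times the bracket γ + (α + θ − γ) ∏ (j − α)/(j + θ), with the product over j from 1 to n − 1. Case 2 applies exactly when a single species accounts for the whole sample, that is, when t = 1.

```python
from fractions import Fraction
from math import prod

def c(n, alpha, theta, gamma):
    """Correction term c_n(gamma) used when only one species has been seen."""
    product = prod((Fraction(j) - alpha) / (j + theta) for j in range(1, n))
    bracket = gamma + (alpha + theta - gamma) * product
    return gamma * (alpha + theta) / ((n + theta) * bracket)

def f(n_i, n, alpha, theta, gamma):
    """Probability that the next observation is the species already seen n_i times."""
    base = (n_i - alpha) / (n + theta)
    return base + c(n, alpha, theta, gamma) if n_i == n else base

def g(t, n, alpha, theta, gamma):
    """Probability that the next observation is a new species, t species seen so far."""
    base = (t * alpha + theta) / (n + theta)
    return base - c(n, alpha, theta, gamma) if t == 1 else base

def total_probability(counts, alpha, theta, gamma):
    """Sum of f over the observed species plus g; should equal 1."""
    n, t = sum(counts), len(counts)
    return sum(f(n_i, n, alpha, theta, gamma) for n_i in counts) + g(t, n, alpha, theta, gamma)

# Hypothetical parameter values satisfying 0 <= alpha < 1, theta > -alpha,
# and 0 <= gamma < alpha + theta.
ALPHA, THETA, GAMMA = Fraction(1, 4), Fraction(1), Fraction(1, 2)

if __name__ == "__main__":
    print(total_probability([3, 1, 1], ALPHA, THETA, GAMMA))  # Case 1
    print(total_probability([4], ALPHA, THETA, GAMMA))        # Case 2
    print(g(1, 3, ALPHA, THETA, GAMMA))
```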
Observation of a new species has a double inductive import: it is a new species, and it is a particular species. Observing it contributes both to the likelihood that a new species will again be observed and, if a new species is not observed, that the species just observed will again be observed (as opposed to another species already observed); this is the role of $\alpha$. Finally, the parameter $\gamma$ is related to the likelihood that only one species will be observed. If $\epsilon$ is the initial probability that there will only be one species, then $\gamma = (\alpha + \theta)\epsilon$. The special case $\alpha = \gamma = 0$ is of particular interest. In this case the probability of an “allelic partition” (set of frequencies of frequencies $a_r$) has a particularly simple form: given a sample of size $n$,
\[ P(a_1, a_2, \ldots, a_n) = \frac{n!}{\theta(\theta + 1) \cdots (\theta + n - 1)} \prod_{r=1}^{n} \frac{\theta^{a_r}}{r^{a_r} \, a_r!}; \]
this is the Ewens sampling formula. There is a simple urn model for such a process in this case, analogous to the Polya urn model [Hoppe, 1984]. Suppose we start out with an urn containing a single, black ball: the mutator. The first time we select a ball, it is necessarily the black one. We replace it, together with a ball of some color. As time progresses, the urn contains the mutator and a number of colored balls. Each colored ball has a weight of one, the mutator has weight $\theta$. The likelihood of selecting a ball is proportional to its weight. If a colored ball is selected, it is replaced together with a ball of the same color; this corresponds to observing a species that has already been observed before (hence balls of its color are already present). If the mutator is selected, it is replaced, together with a ball of a new color; this corresponds to observing a new species. It is not difficult to verify that the rules of succession for this process are
\[ f(n_i, n) = \frac{n_i}{n + \theta}; \qquad g(n) = \frac{\theta}{n + \theta}. \]
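The Ewens sampling formula and the urn's rules of succession can be checked against one another in exact arithmetic. The sketch below is illustrative; the function names and the value of θ are hypothetical. It evaluates the formula for every allelic partition of a small sample and confirms that the probabilities sum to one, and it compares two special cases with what the rules of succession give directly: a sample in which every observation is of a single species, and a sample in which every observation is a new species.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def rising(theta, n):
    """theta (theta + 1) ... (theta + n - 1)."""
    out = Fraction(1)
    for j in range(n):
        out *= theta + j
    return out

def ewens(a, theta):
    """Ewens probability of the allelic partition a = {r: a_r}, n = sum of r * a_r."""
    n = sum(r * a_r for r, a_r in a.items())
    weight = Fraction(1)
    for r, a_r in a.items():
        weight *= theta**a_r / (r**a_r * factorial(a_r))
    return factorial(n) * weight / rising(theta, n)

def partitions(n, largest=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def allelic(partition):
    """Frequencies of frequencies: a_r = number of species observed r times."""
    return Counter(partition)

def one_species_by_urn(n, theta):
    """Probability that n draws all show one species, via f(j, j) = j / (j + theta)."""
    out = Fraction(1)
    for j in range(1, n):
        out *= Fraction(j) / (j + theta)
    return out

def all_new_by_urn(n, theta):
    """Probability that each of n draws is a new species, via g(j) = theta / (j + theta)."""
    out = Fraction(1)
    for j in range(n):
        out *= theta / (j + theta)
    return out

THETA = Fraction(3, 2)  # hypothetical mutator weight

if __name__ == "__main__":
    print(sum(ewens(allelic(p), THETA) for p in partitions(6)))
    print(ewens({4: 1}, THETA), one_species_by_urn(4, THETA))
```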
Note that in this case the probability of a new species does not depend on the number observed. Such predictive probabilities arguably go back to De Morgan; see [Zabell, 1992]. 12 A BUDGET OF PARADOXES Strictly speaking, true paradox (in the sense of a basic contradiction in the theory itself) is no more possible in the Bayesian framework than it is in propositional logic: both are theories of consistency of input. The term “paradox” is often used instead to describe either some unexpected (but reasonable) consequence of the theory (so that we learn something from it); or an inconsistency arising from conflicting sets of inputs (which is what the theory is supposed to detect); or an apparent failure of the theory to explain what we regard as a valid intuition (which should be viewed as more of a challenge than a paradox). Nevertheless, analyzing and understanding such conundrums often gives us much greater insight
into a subject, and the theory of probability has certainly had its fair share of such “challenge problems”. In the following paragraphs a few of these paradoxes are briefly noticed, more by way of initial orientation and an entry into the literature, than any detailed analysis. Indeed the literature on all of these is considerable.
12.1 The paradoxes of conditional probability
There is an amusing and interesting literature concerning conditional probability paradoxes such as the paradox of the second ace [Shafer, 1985], the three prisoner paradox [Falk, 1992], and the two-envelope paradox [Katz and Olin, 2007]. The unnecessary controversies that sometimes arise over these (for example, in Philosophy of Science and The American Statistician, names omitted to protect the guilty) are object lessons in the pitfalls that can attend informal attempts to analyze problems based on vague intuitions without the rigor of first carefully defining the sample space of possibilities or modeling the way information is received. Properly understood these puzzles serve as examples of the utility of the theory, not its deficiencies.
12.2 Hempel’s paradox of the ravens
Nicod’s criterion states that an assertion “all A are B” is supported by an observation of an A that is also a B; Hempel’s equivalence condition that two logically equivalent propositions are equally confirmed by the same evidence. Hempel’s paradox [Hempel, 1945], in its best-known (or most notorious) form considers the assertion “all ravens are black”. This is equivalent to its contrapositive, “all nonblack objects are not ravens”. If one then observes a pink elephant, does this confirm the proposition “all ravens are black”? Strictly speaking this is not a paradox of logical or subjective probability, because it follows just from Nicod’s criterion and the equivalence condition. It is in any case easily accommodated within the Bayesian framework which, in brief, notes that pink elephants can indeed confirm black ravens, albeit to a very slight degree; see, e.g., [Hosiasson-Lindenbaum, 1940; Good, 1960]. Vranas [2004a], Howson and Urbach [2006, pp. 99–103], Fitelson [2008] provide entries to the recent literature; Sprenger [2009] provides a general survey and assessment.
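The claim that a pink elephant confirms “all ravens are black”, but only very slightly, can be illustrated with a small Bayes factor calculation. The sketch below is a toy model and not the specific analysis of any of the authors cited: the two candidate worlds, their object counts, the sampling protocols, and the function names are all hypothetical. It compares sampling a raven at random and finding it black with sampling a non-black object at random and finding it is not a raven.

```python
# Two hypothetical worlds: under H all ravens are black; under ALT ten are not.
H = {"black_raven": 100, "nonblack_raven": 0, "nonblack_other": 900_000}
ALT = {"black_raven": 90, "nonblack_raven": 10, "nonblack_other": 900_000}

def p_black_given_raven(world):
    """Chance that a raven drawn at random is black."""
    ravens = world["black_raven"] + world["nonblack_raven"]
    return world["black_raven"] / ravens

def p_nonraven_given_nonblack(world):
    """Chance that a non-black object drawn at random is not a raven."""
    nonblack = world["nonblack_raven"] + world["nonblack_other"]
    return world["nonblack_other"] / nonblack

def posterior(prior, bayes_factor):
    """Posterior probability of H from its prior and the Bayes factor for H."""
    return prior * bayes_factor / (prior * bayes_factor + (1 - prior))

# Bayes factors in favour of H for the two kinds of observation.
bf_black_raven = p_black_given_raven(H) / p_black_given_raven(ALT)
bf_pink_elephant = p_nonraven_given_nonblack(H) / p_nonraven_given_nonblack(ALT)

if __name__ == "__main__":
    print(bf_black_raven, bf_pink_elephant)
    print(posterior(0.5, bf_black_raven), posterior(0.5, bf_pink_elephant))
```

In this toy model both observations favour H, but the non-black non-raven does so by a factor barely above one, because non-black objects vastly outnumber ravens.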
12.3 Goodman’s new riddle of induction
For Carnap, probability1 is analytic and syntactic; probability2 synthetic and semantic. Returning in 1941 to Keynes’s Treatise on Probability with increased appreciation, Carnap sought to provide a satisfactory technical and quantitative foundation for inductive inference he saw as absent in Keynes. But after his paper proposing a purely syntactic justification for inductive inference [Carnap, 1945b], Nelson Goodman [1946] immediately published a serious challenge to it. To use
the example later put forward by Goodman in Fact, Fiction, and Forecast (1954), under the striking heading of “the new riddle of induction”, Goodman defined a predicate grue: say an object is grue if, for some fixed time t, it is green before t and blue after. If all emeralds observed prior to time t are green, then this is equally consistent with their being either green or grue, and therefore apparently supports to an equal degree the expectation that emeralds observed after time t will be either green or blue. Goodman’s conclusion was that inductive inference is not purely syntactic in nature; that to varying degrees predicates are more or less projectible, projectability depending on the extent to which a predicate is entrenched in natural language. Although Goodman and Carnap soon agreed to disagree, there was no escape; and Goodman’s point is now generally accepted. (Carnap sought to meet this objection by invoking his requirement of total evidence, of which more in a moment.) Goodman’s “new riddle” has sparked a substantial literature (see, e.g., [Stalker, 1994]). For a recent survey, see Schwartz [2009]. From a Bayesian perspective, projectability is effectively a question of the presence of exchangeability (or partial exchangeability); and as such this literature may be viewed as a complement to, rather than rival of, the subjectivist position (see, e.g., [Horwich, 1982, pp. 67–72]). For Carnap’s final views on grue, see [Carnap and Jeffrey, 1971, pp. 73–76].
12.4 The principle of total evidence

Carnap’s initial defense to Goodman’s example was to invoke a requirement of total evidence, that

in the application of inductive logic to a given knowledge situation, the total evidence available must be taken as basis for determining the degree of confirmation. [Carnap, 1950, p. 211]

This closed one hole in the dike, only for another to arise. In 1957 Ayer raised a fundamental question: in any purely logical theory of probability, why are new observations important? This is an issue that, as Good [1967] observes, is both related to the principle of total evidence and relevant to subjective theories of probability. Good’s solution to the conundrum was a neat one:

[I]n expectation, it pays to take into account further evidence, provided that the cost of collecting and using this evidence, although positive, can be ignored. In particular, we should use all the evidence already available, provided that the cost of doing so is negligible. With this proviso then, the principle of total evidence follows from the principle of rationality [that is, of maximizing expected utility].

For further discussion of the principle of total evidence, see [Skyrms, 1987]; for the value of knowledge, see [Horwich, 1982, pp. 122–129; Skyrms, 1990, Chapter 4].
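Good's point can be illustrated with a small decision problem. The sketch below is a toy example: the states, acts, utilities, probabilities, and function names are all hypothetical. It compares the expected utility of choosing an act now with the expected utility of first observing cost-free evidence and then choosing the act that is best given that evidence. The second is never smaller, and the two coincide when the evidence is uninformative.

```python
from math import isclose

PRIOR = {"rain": 0.3, "dry": 0.7}
UTILITY = {                      # UTILITY[act][state], hypothetical values
    "umbrella": {"rain": 5.0, "dry": 3.0},
    "no_umbrella": {"rain": 0.0, "dry": 6.0},
}
FORECAST = {                     # FORECAST[state][evidence] = P(evidence | state)
    "rain": {"wet": 0.8, "fine": 0.2},
    "dry": {"wet": 0.1, "fine": 0.9},
}
USELESS = {                      # evidence that is independent of the state
    "rain": {"wet": 0.5, "fine": 0.5},
    "dry": {"wet": 0.5, "fine": 0.5},
}

def value_without_evidence(prior, utility):
    """max over acts of sum_s P(s) u(act, s)."""
    return max(sum(prior[s] * u[s] for s in prior) for u in utility.values())

def value_with_evidence(prior, likelihood, utility):
    """sum over e of max over acts of sum_s P(s) P(e | s) u(act, s),
    which equals sum_e P(e) * (best posterior expected utility given e)."""
    signals = next(iter(likelihood.values())).keys()
    return sum(
        max(sum(prior[s] * likelihood[s][e] * u[s] for s in prior)
            for u in utility.values())
        for e in signals
    )

if __name__ == "__main__":
    print(value_without_evidence(PRIOR, UTILITY))
    print(value_with_evidence(PRIOR, FORECAST, UTILITY))
    print(value_with_evidence(PRIOR, USELESS, UTILITY))
```

The inequality holds because a maximum of sums cannot exceed the sum of the maxima; the proviso about cost matters because any positive charge for the evidence would have to be subtracted from the second quantity.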
Carnap and the Logic of Inductive Inference
Related questions here are Glymour’s problem of old evidence (if a theory T entails an experimental outcome E, but one observes E before this is discovered, does this increase the probability of T?), see, e.g., [Garber, 1983; Jeffrey, 1992, Chapter 5; Earman, 1992, Chapter 5; Jeffrey, 2004, pp. 44–47; Howson and Urbach, 2006, pp. 197–20]; and I. J. Good’s concept of dynamic (or evolving) probability [Good, 1983, Chapter 10]. Central to both is the issue of the appropriateness of the principle of logical omniscience: if H logically entails E, then P(E | H) = 1. As Good notes [1983, p. 107], invoking a standard chestnut, it makes sense for purposes of betting to assign a probability of 1/10 that the millionth digit of π is a 7, even though one can, given sufficient time and resources, compute the actual digit (so that some would argue that the probability is either 0 or 1 depending). Discussion of this issue goes back at least to Polya [1941]; Hacking [1967] deals with the issue in terms of sentences that are “personally possible”. (Of course from a practical Bayesian perspective one simple solution is to work with probabilities defined on subsets of a sample space rather than logical propositions or sentences. Thus in the case of π, take the sample space to be the set {0, 1, ..., 9}, and assign a coherent probability to the elements of the set. Whether or not it is profitable to expand the sample space to accommodate further events then goes to the issue of the value of further knowledge.)
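The sample-space treatment of the π example can be written out in a few lines. This is only a sketch: the uniform assignment, the helper names, and the reading of "coherent" as "non-negative and summing to one on a finite space" are assumptions made here for illustration. The digit itself is never computed; the point is that a probability defined on subsets of {0, 1, ..., 9} is well defined regardless.

```python
from fractions import Fraction

# Sample space for the unknown digit; the uniform assignment is an assumption.
sample_space = range(10)
prob = {digit: Fraction(1, 10) for digit in sample_space}


def is_coherent(p):
    """Check the finite-space probability axioms: non-negative and total mass 1."""
    return all(value >= 0 for value in p.values()) and sum(p.values()) == 1


def event_probability(p, event):
    """Probability of an event, i.e. of a subset of the sample space."""
    return sum(p[digit] for digit in event)


def fair_price(p, event, stake=1):
    """Price at which a bet paying `stake` if the event occurs has zero expected gain."""
    return stake * event_probability(p, event)


print("coherent:", is_coherent(prob))
print("P(digit is 7):", event_probability(prob, {7}))
print("fair price of a bet paying 10 on a 7:", fair_price(prob, {7}, stake=10))
```

Enlarging the sample space, for instance to pairs of digits, would call for a new assignment; whether that effort is worthwhile is the value-of-knowledge question raised at the end of the paragraph.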
12.5
The Popper-Carnap controversy and Miller’s paradox
Karl Popper was a lifelong and dogged opponent of Carnap’s inductivist views. In Appendix 7 of his Logic of Scientific Discovery [Popper, 1968] Popper made the claim that the logical probability of a universal generalization must be zero; today this can only be regarded as an historical curiosity. For two critiques (among many) of Popper’s claim, see [Howson, 1973; 1987]. For those interested in the more general debate between Popper and Carnap, their exchange in the Schilpp volume on Carnap [Schilpp, 1963] is a natural place to start. For a general overview, see [Niiniluoto, 1973]. One important thread in the debate was Miller’s paradox; Jeffrey [1975] is at once a useful reprise of the initial debate, and a spirited rebuttal. Closely related to Miller’s paradox is Lewis’s “principal principle”; see [Vranas, 2004b] for a recent discussion and many earlier references. For a more sympathetic view of Popper than the one here, see [Miller, 1997].
13
CARNAP REDUX
Thus far we have discussed Carnap’s basic views regarding probability and inductive inference, some of his technical contributions to this area, and some of the extensions of Carnap’s approach that took place during his lifetime and after. In this final part of the chapter we return to the philosophical (rather than technical)
S. L. Zabell
underpinnings of Carnap’s approach, and attempt to place them in the context of both his predecessors and his successors.
13.1 “Two concepts of probability”
In his 1945 paper “The Two Concepts of Probability”, Carnap advanced his view of “the problem of probability”. Noting a “bewildering multiplicity” of theories that had been advanced over the course of more than two and a half centuries, Carnap suggested one had to carefully steer between the Scylla and Charybdis of assuming either too few or too many underlying explicanda, and settled on just two. These two underlying concepts Carnap called probability1 and probability2: degree of confirmation versus relative frequency in the long run. Carnap’s identification of these two basic kingdoms of probability was not however novel; it is clearly stated in Poisson’s 1837 treatise on probability (where Poisson uses the terms probability and chance to distinguish the two). Thus Poisson writes:
In this work, the word chance will refer to events in themselves, independent of our knowledge of them, and we will retain the word probability ... for the reason we have to believe. [Poisson, 1837, p. 31]
Much the same distinction was made shortly after by Cournot [1843] in his Exposition de la théorie des chances et des probabilités, where he notes its “double sense”, which he refers to as subjective and objective, a terminology also found later in [Bertrand, 1890] and [Poincaré, 1896]. Hacking [1975, p. 14] sees the distinction as going even further back to Condorcet in 1785. For discussion of Poisson and Cournot, see [Good, 1986, pp. 157–160; Hacking, 1990, pp. 96–99]. In the 20th century, Frank Plumpton Ramsey, one of the great architects of the modern subjective theory, likewise noted the possible validity of both senses:
In this essay the Theory of Probability is taken as a branch of logic, the logic of partial belief and inconclusive argument; but there is no intention of implying that this is the only or even the most important aspect of the subject. Probability is of fundamental importance not only in logic but also in statistical and physical science, and we cannot be sure beforehand that the most useful interpretation of it in logic will be appropriate in physics also. Indeed the general difference of opinion between statisticians who for the most part adopt the frequency theory of probability and logicians who mostly reject it renders it likely that the two schools are really discussing different things, and that the word ‘probability’ is used by logicians in one sense and by statisticians in another.
This is as clear a statement of Carnap’s distinction as one might imagine. (It can also be found clearly stated in a number of other places such as [Polya, 1941; Good, 1950].)
Thus, although the clear recognition of the fundamentally dual nature of probability did not originate with Carnap, the importance of his contribution is this: despite clear statements by Poisson in the 19th century, Ramsey in the 20th, and others both before and after, the lesson had not been learned; and even those who recognized the duality implicit in the usage of the word for the most part believed this to reflect a confusion of thought, only one of the two senses being truly legitimate. By carefully, forcefully, and in sustained fashion arguing for the legitimacy of both, Carnap enabled the distinction to at last become an entrenched philosophical commonplace. “The duality of probability has long been known to philosophers. The present generation may have learnt it from Carnap’s weighty Logical Foundations” [Hacking, 1975, p. 13].
13.2
The later Carnap
Just as there is an early and later Wittgenstein, there is an early and later Carnap in inductive logic. Some of these changes were technical, but others reflected substantial shifts in Carnap’s underlying views. The appearance of Carnap’s book generated considerable discussion and debate in the philosophical community. A second volume was promised, but never appeared. Like many before him, who found themselves enmeshed in the intellectual quicksand of the problem of induction (such as Bernoulli and Bayes), Carnap continued to grapple with the problem, refining and extending his results, but found that new advances and insights (on the part of himself, his collaborators, and others) were coming so quickly that he eventually abandoned as impractical the project of a definitive and systematic book-length treatment in favor of publishing from time to time compilations of progress reports. Two such installments eventually appeared [Carnap and Jeffrey, 1971; Jeffrey, 1980], although even these were delayed far past their initially anticipated date of publication. Because no true successor to his Logical Foundations of Probability ever appeared, it is not always appreciated just how much of an evolution in Carnap’s views about probability took place over the last two decades of his life. This change reflected in part a changing environment: the increasing appreciation of the prewar contributions of Ramsey and de Finetti, and the publication of such books as [Good, 1950; Savage, 1954; Raiffa and Schlaifer, 1961]. Important materials in documenting this shift include the introduction to the second [1962] edition of [Carnap, 1950], his paper “The aim of inductive logic” ([Carnap, 1962], reprinted in revised form in [Carnap and Jeffrey, 1971, Chapter 1]), Carnap’s contributions to the Schilpp [1963] volume, and his posthumous “Basic system of inductive logic” ([Carnap and Jeffrey, 1971, Chapter 2; Jeffrey, 1980, Chapter 6]). 
Technical shifts
Some of these shifts, although technical in nature, were quite important. First, there was a shift from sentences in a formal language to (effectively) subsets of a sample space. This reflected in part a desire to use the technical apparatus of modern mathematical probability, and in part a desire to formulate inductive logic
in terms that had come to be standard in mathematical probability theory and theoretical statistics, where probabilities are attributed to “events” (or “propositions”) which are construed as sets of entities which can handily be taken to be models, in the sense in which that term is used in logic. [Carnap and Jeffrey, 1971, p. 1]
Second, as discussed at the beginning of this chapter, Carnap accepted the Ramsey–de Finetti–Savage link of probability to utility and decision making, its betting odds interpretation, the use of coherence and the Dutch book to derive the basic axioms of probability, and the central role of Bayes’s theorem in belief revision. This placed Carnap squarely in the Bayesian camp, the differences coming down to ones of the existence or status of further epistemic constraints. This change came fairly quickly; it is already evident in Carnap’s 1955 lecture notes [Carnap, 1973]. It is carefully stated in Carnap [1962] and then systematically elaborated in his Basic System. Carnap also announced in the preface to his second edition of Logical Foundations the abandonment of his requirements of logical independence (replacing it by Kemeny’s “meaning postulates”), and completeness for primitive predicates (replacing it by axioms relevant to language extensions). These are of less interest to us here.
The emerging Bayesian majority
Carnap’s shift to the subjective was certainly noted by others. I. J. Good, for example, remarks “Between 1950 and 1961 Carnap moved close to my position in that he showed a much increased respect for the practical use of subjective probabilities” [Good, 1975, p. 41; see also p. 40, Figure 1]. For the best evidence of this convergence of view between Carnap and the subjectivists, however, one can summon Carnap himself as a witness.
In his Basic System (his last, posthumously published work on inductive inference), Carnap tells us:
I think there need not be a controversy between the objectivist point of view and the subjectivist or personalist point of view. Both have a legitimate place in the context of our work, that is, the construction of a set of rules for determining probability values with respect to possible evidence. At each step in the construction, a choice is to be made; the choice is not completely free but is restricted by certain boundaries. Basically, there is merely a difference in attitude or emphasis between the subjectivist tendency to emphasize the existing freedom of choice, and the objectivist tendency to stress the existence of limitations. [Jeffrey, 1980, p. 119]
The ultimate difference between Carnap and subjectivists of the de Finetti–Savage–Good stripe, then, appears to be how they view the logical status of these additional constraints. Carnap seems to have thought of them as forming in some sense a sequence or hierarchy (thus his “at each step in the construction”); modern Bayesians, in contrast, view these more as auxiliary tools. They do not deny the utility of the symmetry arguments that underlie much of the Carnapian approach but, as Savage remarks, they “typically do not find the contexts in which such agreement obtains sufficiently definable to admit of expression in a postulate” [Savage, 1954, p. 66]. Such arguments fall instead under the rubric of what I. J. Good terms “suggestions for using the theory, these suggestions belonging to the technique rather than the theory” itself [Good, 1952, p. 107].
Let us take this a little further. Is what is at stake really just a “difference in attitude or emphasis” between choice and limitation? Here is how W. E. Johnson himself saw the enterprise (as he notes in his paper deriving the continuum of inductive methods):
the postulate adopted in a controversial kind of theorem cannot be generalized to cover all sorts of working problems; so it is the logician’s business, having once formulated a specific postulate, to indicate very carefully the factual and epistemic conditions under which it has practical value. [Johnson, 1932, pp. 418–419]
This is surely right. There are no universally applicable postulates: different symmetry assumptions are appropriate under different circumstances, and none is logically compulsory. The best one can do is identify symmetry assumptions that seem natural, have identifiable consequences, and may be a natural reflection of one’s beliefs under some reasonable set of circumstances.
In judging the appropriate use of the sufficientness postulate, for example, the issue is not one of favoring “limitation” versus “choice”; it is one of whether or not you think the postulate accurately captures the epistemic situation at hand. This is the mission of partial exchangeability: to find different possible qualitative descriptions of “the factual and epistemic conditions” that obtain in actual situations, descriptions that then turn out to have useful and satisfying quantitative implications.
From credence to credibility
Nevertheless Carnap did argue for additional symmetry requirements such as exchangeability; his explanation of this is perhaps most clearly presented in his 1962 paper “The aim of inductive logic”. It will be apparent that Carnap and the subjectivists part company at this point because they had radically different goals. Let Cr_t denote the subjective probability of an individual at time t, termed by Carnap credence. Using Bayes’s rule, Carnap imagines a sequence of steps in which one obtains discrete quanta of data E_j, j = 1, 2, ..., giving rise in turn to a sequence of credences Cr_{t+j}, j = 1, 2, ....
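The stepwise passage from one credence to the next can be sketched in code. The example is hypothetical throughout: the two rival hypotheses about a success chance, the starting credence, and the data are invented for illustration, and the trials are assumed to be independent given each hypothesis. Each datum E_j is absorbed by one application of Bayes's rule, producing the next credence in the sequence.

```python
from fractions import Fraction as F

# Hypothetical rival hypotheses: the chance of "success" on each trial.
hypotheses = {"low": F(1, 4), "high": F(3, 4)}

# Hypothetical starting credence Cr_t over the hypotheses.
initial_credence = {"low": F(1, 2), "high": F(1, 2)}


def condition(credence, outcome):
    """One step of Bayes's rule: the new credence is proportional to
    the old credence times the likelihood of the observed outcome."""
    joint = {
        h: credence[h] * (chance if outcome else 1 - chance)
        for h, chance in hypotheses.items()
    }
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


def credence_sequence(credence, data):
    """Return the list [Cr_t, Cr_{t+1}, ..., Cr_{t+n}] for data E_1, ..., E_n."""
    history = [credence]
    for outcome in data:
        credence = condition(credence, outcome)
        history.append(credence)
    return history


data = [1, 1, 0, 1]  # E_1, ..., E_4: 1 = success, 0 = failure (invented)
history = credence_sequence(initial_credence, data)

for step, credence in enumerate(history):
    print(f"Cr_t+{step}: high = {credence['high']}, low = {credence['low']}")

# Under this model the final credence depends only on the counts of
# successes and failures, not on the order in which they arrived.
reordered = credence_sequence(initial_credence, [0, 1, 1, 1])
print("order-invariant:", reordered[-1] == history[-1])
```

The order-invariance shown in the last line is a consequence of the independence assumption built into this toy model; it is the kind of symmetry that exchangeability expresses, here obtained as a property of a chosen model rather than imposed as a requirement of rationality.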
In the case of a human being we would hesitate to ascribe to him a credence function at a very early time point, before his abilities of reason and deliberate action are sufficiently developed. But again we disregard this difficulty by thinking either of an idealized human baby or of a robot. ... [L]et us ascribe to him an initial credence function Cr_0 for the time point T_0 before he obtains his first datum E_1.
(This curiously echoes Price’s analysis of inductive inference in his appendix to Bayes’s essay; see [Zabell, 1997, Section 3].) The subsequent conditional credences based on this initial credence Cr_0 Carnap terms a credibility; and contrasts these with the “adult credence functions” of Ramsey, Savage, and de Finetti:
When I propose to take as a basic concept, not adult credence, but either initial credence or credibility, I must admit that these concepts are less realistic and remoter from overt behavior and may therefore appear as elusive and dubious. On the other hand, when we are interested in rational decision theory, these concepts have great methodological advantages. Only for these concepts, not for credence, can we find a sufficient number of requirements of rationality as a basis for the construction of a system of inductive logic.
Thus Carnap asserts there are additional rationality requirements for Cr_0, ones having “no analogue for credence functions”; for example, symmetry of individuals (i.e., exchangeability). The assertion is that absent identifiable differences between individuals at the initial time T_0 (and since we are at the initial time T_0 we have not yet learned of any), the probability of any proposition involving two or more individuals should remain unchanged if the individuals are permuted (see [Carnap, 1962, pp. 313–314; 1971, p. 118]). Carnap regards this as “the valid core of the old principle of indifference ... the basic idea of the principle is sound. Our task is to restate it by specific restricted axioms” [Carnap, 1962, p. 316; 1973, p. 277].
No wonder this part of Carnap’s program never gained traction! It focuses on the credences of an “idealized human baby” rather than an adult; appeals to a state of complete ignorance; and presents itself as a rehabilitated version of the principle of indifference. And what does it mean to talk about individuals about which we know nothing except that they are different? In the end one exchanges one problem for another, replacing the task of finding a probability function by the (in fact much more daunting and questionable) task of establishing the existence of an underlying ideal language, one in which the description of sense experiences can be broken down into atomic interchangeable elements. Such ideal languages are a seductive dream that in one form or another goes back centuries, as in John Wilkins’s philosophical language, or Leibniz’s “characteristica universalis”, which Leibniz thought could be used as the basis of a logical probability [Hacking, 1975, Chapter 15]. If Wittgenstein’s early program of logical atomism had been successful, then logical probability might be possible, but the failure of the former dooms the latter. Lacking an ultimate language in one-to-one
correspondence with reality, Carnapian programs retain an irreducible element of subjectivism. Despite the ultimate futility of Carnap’s program to justify induction in quantitative terms, the subjective Bayesian does provide a number of qualitative explicata. Inductive rationality in a single individual is not so much a matter of present opinion as the ability to be persuaded by further facts; and for two or more individuals by their ultimate arrival at consensus. To this end a number of results regarding convergence and merging of opinion have been discovered. For convergence of opinion see Skyrms [2006], and the earlier literature cited there; for merging of opinion see the classic paper of Blackwell and Dubins [1962] and the discussion in [Earman, 1992], as well as [Kalai and Lehrer, 1994] and [Miller and Sanchirico, 1999]. For further discussion of Carnap’s program for inductive logic in its final form, see [Jeffrey, 1973].
14
CONCLUSION
Like his distinguished predecessors Bernoulli and Bayes, Rudolf Carnap continued to grapple with the elusive riddle of induction for the rest of his life. Throughout he was an effective spokesman for his point of view. But although the technical contributions of Carnap and his invisible college (such as Kemeny, Bar-Hillel, Jeffrey, Gaifman, Hintikka, Niiniluoto, Kuipers, Costantini, di Maio, and others) remain of considerable interest even today, Carnap’s most lasting influence was more subtle but also more important: he largely shaped the way current philosophy views the nature and role of probability, in particular its widespread acceptance of the Bayesian paradigm (as, for example, in [Horwich, 1982; Earman, 1992; Maher, 1993; Jaynes, 2003; Bovens and Hartmann, 2004; Jeffrey, 2004; Howson and Urbach, 2006]).
BIBLIOGRAPHY
[Armendt, 1993] Brad Armendt. Dutch books, additivity, and utility theory. Philosophical Topics, 21:1–20, 1993.
[Ayer, 1972] A. J. Ayer. Probability and Evidence. Macmillan, London, 1972.
[Barker, 1957] S. F. Barker. Induction and Hypothesis. Cornell University Press, Ithaca, 1957.
[Bayes, 1764] Thomas Bayes. An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370–418, 1764.
[Blackwell and Dubins, 1962] David Blackwell and Lester Dubins. Merging of opinions with increasing information. Annals of Mathematical Statistics, 33:882–886, 1962.
[Boole, 1854] George Boole. An Investigation of the Laws of Thought. Macmillan, London, 1854. Reprinted 1958, Dover Publications, New York.
[Bovens and Hartmann, 2004] Luc Bovens and Stephan Hartmann. Bayesian Epistemology. Oxford University Press, Oxford, 2004.
[Broad, 1918] C. D. Broad. The relation between induction and probability. Mind, 27:389–404; 29:11–45, 1918.
[Broad, 1924] C. D. Broad. Mr. Johnson on the logical foundations of science. Mind, 33:242–261, 369–384, 1924.
[Carnap, 1945a] Rudolf Carnap. On inductive logic. Philosophy of Science, 12:72–97, 1945.
[Carnap, 1945b] Rudolf Carnap. The two concepts of probability. Philosophy and Phenomenological Research, 5:513–532, 1945.
[Carnap, 1950] Rudolf Carnap. Logical Foundations of Probability. University of Chicago Press, Chicago, 1950. Second edition, 1962.
[Carnap, 1952] Rudolf Carnap. The Continuum of Inductive Methods. University of Chicago Press, Chicago, 1952.
[Carnap, 1962] Rudolf Carnap. The aim of inductive logic. In E. Nagel, P. Suppes, and A. Tarski, editors, Logic, Methodology and Philosophy of Science, pages 303–318. Stanford University Press, Stanford, 1962.
[Carnap, 1973] Rudolf Carnap. Notes on probability and induction. Synthese, 25:269–298, 1973.
[Carnap and Jeffrey, 1971] Rudolf Carnap and Richard C. Jeffrey, editors. Studies in Inductive Logic and Probability, volume I. University of California Press, Berkeley and Los Angeles, 1971.
[Carnap and Stegmüller, 1959] Rudolf Carnap and W. Stegmüller. Induktive Logik und Wahrscheinlichkeit. Springer-Verlag, Vienna, 1959.
[Cifarelli and Regazzini, 1996] Donato Michele Cifarelli and Eugenio Regazzini. De Finetti’s contribution to probability and statistics. Statistical Science, 11:253–282, 1996.
[de Finetti, 1937] Bruno de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l’Institut Henri Poincaré, 7:1–68, 1937. Translated in H. E. Kyburg, Jr. and H. E. Smokler (eds.), Studies in Subjective Probability, Wiley, New York, 1964, pp. 93–158.
[de Finetti, 1938] Bruno de Finetti. Sur la condition de “équivalence partielle”. In Actualités Scientifiques et Industrielles, volume 739, pages 5–18. Hermann, Paris, 1938.
[de Finetti, 1972] Bruno de Finetti. Probability, Induction, and Statistics. Wiley, New York, 1972.
[De Morgan, 1838] Augustus De Morgan. An Essay on Probabilities, and their Application to Life Contingencies and Insurance Offices. Longman, Orme, Brown, Green, and Longmans, London, 1838.
[Diaconis and Freedman, 1980a] Persi Diaconis and David Freedman. De Finetti’s theorem for Markov chains. Annals of Probability, 8:115–130, 1980.
[Diaconis and Freedman, 1980b] Persi Diaconis and David Freedman. Finite exchangeable sequences. Annals of Probability, 8:745–764, 1980.
[Diaconis and Freedman, 1984] Persi Diaconis and David Freedman. Partial exchangeability and sufficiency. In J. K. Ghosh and J. Roy, editors, Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, pages 205–236. Indian Statistical Institute, Calcutta, 1984.
[Diaconis and Ylvisaker, 1980] Persi Diaconis and Donald Ylvisaker. Conjugate priors for exponential families. The Annals of Statistics, 7:269–281, 1979.
[Earman, 1992] John Earman. Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. M. I. T. Press, 1992.
[Edgeworth, 1922] Francis Ysidro Edgeworth. The philosophy of chance. Mind, 31:257–283, 1922.
[Ellis, 1854] Robert Leslie Ellis. Remarks on the fundamental principle of the theory of probabilities. Transactions of the Cambridge Philosophical Society, 9:605–607, 1854.
[Falk, 1992] Ruma Falk. A closer look at the probabilities of the notorious three prisoners. Cognition, 43:197–223, 1992.
[Feller, 1968] William Feller. An Introduction to Mathematical Probability. Wiley, New York, 3rd edition, 1968.
[Fitelson, 2008] Branden Fitelson. Goodman’s “new riddle”. Journal of Philosophical Logic, 37:613–643, 2008.
[Garber, 1983] Daniel Garber. Old evidence and logical omniscience in Bayesian confirmation theory. In J. Earman, editor, Testing Scientific Theories, volume 10, pages 99–131. University of Minnesota Press, Minneapolis, 1983.
[Good, 1950] I. J. Good. Probability and the Weighing of Evidence. Hafner Press, New York, 1950.
[Good, 1952] I. J. Good. Rational decisions. Journal of the Royal Statistical Society B, 14:107–114, 1952.
[Good, 1959] I. J. Good. Kinds of probability. Science, 129:443–447, 1959.
[Good, 1960] I. J. Good. The paradoxes of confirmation. British Journal for the Philosophy of Science, 11:145–149, 1960. 12, 63–64.
[Good, 1965] I. J. Good. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. M. I. T. Press, Cambridge, Mass., 1965.
[Good, 1967] I. J. Good. On the principle of total evidence. British Journal for the Philosophy of Science, 17:319–321, 1967.
[Good, 1971] I. J. Good. 46656 varieties of Bayesians. American Statistician, 25:62–63, 1971.
[Good, 1975] I. J. Good. Explicativity, corroboration, and the relative odds of hypotheses. Synthese, 30:39–73, 1975.
[Good, 1983] I. J. Good. Good Thinking. University of Minnesota Press, Minneapolis, 1983.
[Good, 1986] I. J. Good. Some statistical applications of Poisson’s work. Statistical Science, 1:157–170, 1986.
[Goodman, 1946] Nelson Goodman. A query on confirmation. Journal of Philosophy, 43:383–385, 1946.
[Goodman, 1954] Nelson Goodman. Fact, Fiction, and Forecast. Hackett, Indianapolis, 1954.
[Hacking, 1967] Ian Hacking. Slightly more realistic personal probability. Philosophy of Science, 34:311–325, 1967.
[Hacking, 1975] Ian Hacking. The Emergence of Probability. Cambridge University Press, Cambridge, 1975.
[Heath, 1949] Sir Thomas Heath. Mathematics in Aristotle. Clarendon Press, Oxford, 1949.
[Hempel, 1945] C. G. Hempel. Studies in the logic of confirmation. Mind, 54:1–26, 97–121, 1945.
[Hintikka and Niiniluoto, 1980] J. Hintikka and I. Niiniluoto. An axiomatic foundation for the logic of inductive generalization. In R. C. Jeffrey, editor, Studies in Inductive Logic and Probability, volume 2, pages 157–181. University of California Press, Berkeley, 1980.
[Hintikka, 1966] J. Hintikka. A two-dimensional continuum of inductive methods. In J. Hintikka and P. Suppes, editors, Aspects of Inductive Logic, pages 113–132. North-Holland, Amsterdam, 1966.
[Hoppe, 1984] Fred Hoppe. Polya-like urns and the Ewens sampling formula. Journal of Mathematical Biology, 20:91–94, 1984.
[Horwich, 1982] Paul Horwich. Probability and Evidence. Cambridge University Press, Cambridge, 1982.
[Hosiasson-Lindenbaum, 1940] Janina Hosiasson-Lindenbaum. On confirmation. Journal of Symbolic Logic, 5:133–148, 1940.
[Howson and Urbach, 2006] Colin Howson and Peter Urbach. Scientific Reasoning: The Bayesian Approach. Open Court Press, Chicago and La Salle, IL, 3rd edition, 2006.
[Howson, 1973] Colin Howson. Must the logical probability of laws be zero? British Journal for the Philosophy of Science, 24:153–163, 1973.
[Howson, 1987] Colin Howson. Popper, prior probabilities, and inductive inference. British Journal for the Philosophy of Science, 38:207–224, 1987.
[Huzurbazar, 1955] V. S. Huzurbazar. On the certainty of an inductive inference. Proceedings of the Cambridge Philosophical Society, 51:761–762, 1955.
[Jaynes, 2003] E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, Cambridge, 2003.
[Jeffrey, 1973] Richard C. Jeffrey. Carnap’s inductive logic. Synthese, 25:299–306, 1973.
[Jeffrey, 1975] Richard C. Jeffrey. Probability and falsification: critique of the Popper program. Synthese, 30:95–117, 1975.
[Jeffrey, 1977] Richard C. Jeffrey. Mises redux. In R. E. Butts and J. Hintikka, editors, Basic Problems in Methodology and Linguistics, pages 213–222. Reidel, Dordrecht, 1977.
[Jeffrey, 1980] Richard C. Jeffrey, editor. Studies in Inductive Logic and Probability, volume II. University of California Press, Berkeley and Los Angeles, 1980.
[Jeffrey, 1983] Richard C. Jeffrey. The Logic of Decision. University of Chicago Press, Chicago, 2nd ed, 1983.
[Jeffrey, 1988] Richard C. Jeffrey. Conditioning, kinematics, and exchangeability. In W. L. Harper and B. Skyrms, editors, Causation, Chance, and Credence, volume 1, pages 221–255. Kluwer, Dordrecht, 1988.
[Jeffrey, 1992] Richard C. Jeffrey. Probability and the Art of Judgment. Cambridge University Press, Cambridge, 1992.
[Jeffrey, 2004] Richard C. Jeffrey. Subjective Probability: The Real Thing. Cambridge University Press, Cambridge, 2004.
[Jeffreys, 1961] Harold Jeffreys. Theory of Probability. Clarendon Press, Oxford, 3rd edition, 1961.
[Jevons, 1874] William Stanley Jevons. The Principles of Science: A Treatise on Logic and Scientific Method. Macmillan, London, 1st edition, 1874. (2nd ed. 1877; reprinted 1958, Dover, New York).
[Johnson, 1924] William Ernest Johnson. Logic, Part III: The Logical Foundations of Science. Cambridge University Press, 1924.
[Johnson, 1932] William Ernest Johnson. Probability: The deductive and inductive problems. Mind, 41:409–423, 1932.
[Kalai and Lehrer, 1994] Ehud Kalai and Ehud Lehrer. Weak and strong merging of opinions. Journal of Mathematical Economics, 23:73–86, 1994.
[Katz and Olin, 2007] Bernard D. Katz and Doris Olin. A tale of two envelopes. Mind, 116:903–926, 2007.
[Kemeny, 1955] John Kemeny. Fair bets and inductive probabilities. Journal of Symbolic Logic, 20:263–273, 1955.
[Keynes, 1921] John Maynard Keynes. A Treatise on Probability. Macmillan, London, 1921.
[Kneale, 1949] William Kneale. Probability and Induction. The Clarendon Press, Oxford, 1949.
[Knopp, 1947] Konrad Knopp. Theory and Application of Infinite Series. Hafner Press, New York, 1947.
[Kuipers, 1978] Theo A. F. Kuipers. Studies in Inductive Probability and Rational Expectation. D. Reidel, Dordrecht, 1978.
[Maher, 1993] Patrick Maher. Betting on Theories. Cambridge Studies in Probability, Induction and Decision Theory. Cambridge University Press, Cambridge, 1993.
[Miller, 1997] David Miller. Sir Karl Raimund Popper, CH, FBA. Biographical Memoirs of Fellows of The Royal Society of London, 43:367–409, 1997.
[Miller and Sanchirico, 1999] Ronald I. Miller and Chris William Sanchirico. The role of absolute continuity in “merging of opinions” and “rational learning”. Games and Economic Behavior, 29:170–190, 1999.
[Niiniluoto, 1973] Ilkka Niiniluoto. Review: Alex C. Michalos’ The Popper-Carnap Controversy. Synthese, 25:417–436, 1973.
[Niiniluoto, 2009] Ilkka Niiniluoto. The development of the Hintikka program. In Dov Gabbay, Stephan Hartmann, and John Woods, editors, Handbook of the History of Logic, volume 10. Elsevier, London, 2009.
[Poincaré, 1896] Henri Poincaré. Calcul des probabilités. Paris, Gauthier-Villars, 1896 (2nd ed. 1912).
[Polya, 1941] George Polya. Heuristic reasoning and the theory of probability. American Mathematical Monthly, 48:450–465, 1941.
[Popper, 1959] Karl Popper. The Logic of Scientific Discovery. Basic Books, New York, 1959. Second ed. 1968, New York: Harper and Row.
[Prevost and L’Huilier, 1799a] Pierre Prevost and S. A. L’Huilier. Sur les probabilités. Mémoires de l’Académie Royale de Berlin 1796, pages 117–142, 1799.
[Prevost and L’Huilier, 1799b] Pierre Prevost and S. A. L’Huilier. Mémoire sur l’art d’estimer la probabilité des causes par les effets. Mémoires de l’Académie Royale de Berlin, 1796:3–24, 1799.
[Ramsey, 1931] Frank Plumpton Ramsey. Truth and probability. In R. B. Braithwaite, editor, The Foundations of Mathematics and Other Logical Essays, pages 156–198. Routledge and Kegan Paul, London, 1931. Read before the Cambridge Moral Sciences Club, 1926.
[Romeijn, 2006] Jan Willem Romeijn. Analogical predictions for explicit similarity. Erkenntnis, 64:253–280, 2006.
[Savage, 1954] Leonard J. Savage. The Foundations of Statistics. John Wiley, New York, 1954. Reprinted 1972, New York: Dover.
[Schilpp, 1963] P. A. Schilpp, editor. The Philosophy of Rudolf Carnap. Open Court, La Salle, IL, 1963.
[Schwartz, 2009] Robert Schwartz. Goodman and the demise of the syntactic model. In Dov Gabbay, Stephan Hartmann, and John Woods, editors, Handbook of the History of Logic, volume 10. Elsevier, London, 2009.
[Shafer, 1985] Glenn Shafer. Conditional probability. International Statistical Review, 53:261–275, 1985.
[Skyrms, 1987] Brian Skyrms. On the principle of total evidence with and without observation sentences. In Logic, Philosophy of Science and Epistemology: Proceedings of the 11th International Wittgenstein Symposium, pages 187–195. Hölder–Pichler–Tempsky, 1987.
[Skyrms, 1990] Brian Skyrms, editor. The Dynamics of Rational Deliberation. Harvard University Press, Cambridge, MA, 1990.
[Skyrms, 1991] Brian Skyrms. Carnapian inductive logic for Markov chains. Erkenntnis, 35:439–460, 1991.
[Skyrms, 1993] Brian Skyrms. Analogy by similarity in hypercarnapian inductive logic. In J. Earman, A. I. Janis, G. J. Massey and N. Rescher, editors, Philosophical Problems of the Internal and External Worlds: Essays Concerning the Philosophy of Adolf Grünbaum, pages 273–282. Pittsburgh University Press, Pittsburgh, 1993.
[Skyrms, 1996] Brian Skyrms. Inductive logic and Bayesian statistics. In Statistics, Probability, and Game Theory: Papers in Honor of David Blackwell, volume 30 of IMS Lecture Notes, Monograph Series, pages 321–336. Institute of Mathematical Statistics, 1996.
[Skyrms, 2006] Brian Skyrms. Diachronic coherence and radical probabilism. Philosophy of Science, 73:959–968, 2006.
[Sprenger, 2009] Jan Sprenger. Hempel and the paradoxes of confirmation. In Dov Gabbay, Stephan Hartmann, and John Woods, editors, Handbook of the History of Logic, volume 10. Elsevier, London, 2009.
[Stalker, 1994] Douglas Stalker. Grue! The New Riddle of Induction. Open Court, Chicago, 1994.
[Stigler, 1982] Stephen M. Stigler. Thomas Bayes’s Bayesian inference. Journal of the Royal Statistical Society Series A, 145:250–258, 1982.
[Venn, 1866] John Venn. The Logic of Chance. Macmillan, London, 1866. (2nd ed. 1876, 3rd ed. 1888; reprinted 1962, Chelsea, New York).
[Vranas, 2004a] P. B. M. Vranas. Hempel’s raven paradox: A lacuna in the standard Bayesian solution. The British Journal for the Philosophy of Science, 55:545–560, 2004.
[Vranas, 2004b] P. B. M. Vranas. Have your cake and eat it too: the old principal principle reconciled with the new. Philosophy and Phenomenological Research, 69:368–382, 2004.
[Waismann, 1930] Friedrich Waismann. Logische Analyse des Wahrscheinlichkeitsbegriffs. Erkenntnis, 1:228–248, 1930.
[Wrinch and Jeffreys, 1919] Dorothy Wrinch and Harold Jeffreys. On certain aspects of the theory of probability. Philosophical Magazine, 38:715–731, 1919.
[Wrinch and Jeffreys, 1921] Dorothy Wrinch and Harold Jeffreys. On certain fundamental principles of scientific enquiry. Philosophical Magazine, Series 6, 42:369–390, 1921.
[Zabell, 1982] S. L. Zabell. W. E. Johnson’s “sufficientness postulate”. Annals of Statistics, 10:1091–1099, 1982.
[Zabell, 1988] S. L. Zabell. Symmetry and its discontents. In W. L. Harper and B. Skyrms, editors, Causation, Chance, and Credence, volume 1, pages 155–190. Kluwer, Dordrecht, 1988.
[Zabell, 1989] S. L. Zabell. The rule of succession. Erkenntnis, 31:283–321, 1989.
[Zabell, 1992] S. L. Zabell. Predicting the unpredictable. Synthese, 90:205–232, 1992.
[Zabell, 1995] S. L. Zabell. Characterizing Markov exchangeable sequences. Journal of Theoretical Probability, 8:175–178, 1995.
[Zabell, 1996] S. L. Zabell. Confirming universal generalizations. Erkenntnis, 45:267–283, 1996.
[Zabell, 1997] S. L. Zabell. The continuum of inductive methods revisited. In John Earman and John D. Norton, editors, The Cosmos of Science: Essays of Exploration, Pittsburgh-Konstanz Series in the Philosophy and History of Science, pages 351–385. University of Pittsburgh Press/Universitätsverlag Konstanz, 1997.
[Zabell, 2007] S. L. Zabell. Carnap on probability and induction. In Michael Friedman and Richard Creath, editors, The Cambridge Companion to Carnap, pages 273–294. Cambridge University Press, 2007.
THE DEVELOPMENT OF THE HINTIKKA PROGRAM
Ilkka Niiniluoto
One of the highlights of the Second International Congress for Logic, Methodology, and Philosophy of Science, held in Jerusalem in 1964, was Jaakko Hintikka’s lecture “Towards a Theory of Inductive Generalization” (see [Hintikka, 1965a]). Two years later Hintikka published a two-dimensional continuum of inductive probability measures (see [Hintikka, 1966]), and ten years later he announced an axiomatic system with K ≥ 2 parameters (see [Hintikka and Niiniluoto, 1976]). These new original results showed once and for all the possibility of systems of inductive logic where genuine universal generalizations have non-zero probabilities in an infinite universe. Hintikka not only disproved Karl Popper’s thesis that inductive logic is inconsistent (see [Popper, 1959; 1963]), but he also decisively improved on Rudolf Carnap’s attempts to develop inductive logic as the theory of partial logical implication (see [Carnap, 1945; 1950; 1952]). Hintikka’s measures have later found rich applications in semantic information theory, theories of confirmation and acceptance, cognitive decision theory, analogical inference, theory of truthlikeness, and machine learning. The extensions and applications have reconfirmed (pace the early evaluation of Imre Lakatos [1974]) the progressive nature of this research program in formal methodology and philosophy of science.
1 INDUCTIVE LOGIC AS A METHODOLOGICAL RESEARCH PROGRAM
Imre Lakatos [1968a] proposed that Carnap’s inductive logic should be viewed as a methodological “research programme”. Such programs, both in science and methodology, are characterized by a “hard core” of basic assumptions and a “positive heuristics” for constructing a refutable “protective belt” around the irrefutable core. Their progress depends on the problems that they originally set out to solve and later “problem shifts” in their dynamic development. In a paper written in 1969, Lakatos claimed that Popper has achieved “a complete victory” in his attack against “the programme of an a priori probabilistic inductive logic or confirmation theory”, although, he added, “inductive logic, displaying all the characteristics of a degenerating research programme, is still a booming industry” [Lakatos, 1974, p. 259].1 A similar position is still advocated by David Miller [1994], one of the leading Popperian critical rationalists.
1 If the emphasis is on the term “a priori”, Hintikka agrees with Lakatos. However, we shall see that Hintikka’s “victory” over Carnap is completely different from Popper’s.
Handbook of the History of Logic. Volume 10: Inductive Logic. Volume editors: Dov M. Gabbay, Stephan Hartmann and John Woods. General editors: Dov M. Gabbay and John Woods. © 2011 Elsevier B.V. All rights reserved.
In this paper, I argue for a different view about inductive logic (see also [Niiniluoto, 1973; 1983]). Hintikka’s account of inductive generalization was a meeting point of several research traditions, and as a progressive turn it opened new important paths in Bayesian methodology, epistemology, and philosophy of science. Its potential in artificial intelligence is still largely unexplored. The roots of inductive logic go back to the birth of probability calculus in the middle of the seventeenth century. Mathematical probabilities were interpreted as relative frequencies of repeatable events, as objective degrees of possibility, and as degrees of certainty [Hacking, 1975]. For the classical Bayesians, like the determinist P. S. Laplace in the late eighteenth century, probability was relative to our ignorance of the true causes of events. The idea of probabilities as rational degrees of belief was defended in the 19th century by Stanley Jevons. This Bayesian interpretation was reborn in the early 20th century in two different forms. The Cambridge school, represented by W. E. Johnson, J. M. Keynes, C. D. Broad, and Harold Jeffreys, treated inductive probability P (h/e) as a logical relation between two sentences, a hypothesis h and evidence e. Related logical views, anticipated already by Bernard Bolzano in the 1830s, were expressed by Ludwig Wittgenstein and Friedrich Waismann. The school of subjective or personal probability interpreted degrees of belief as coherent betting ratios (Frank Ramsey, Bruno de Finetti) (see [Skyrms, 1986]). In Finland, pioneering work on “probability logic” in the spirit of logical empiricism was published by Eino Kaila in the 1920s (see [Kaila, 1926]). Kaila’s student Georg Henrik von Wright wrote his doctoral dissertation on “the logical problem of induction” in 1941 (see [von Wright, 1951; 1957]). 
Von Wright, with influences from Keynes and Broad, tried to solve the problem of inductive generalization by defining the probability of a universal statement in terms of relative frequencies of properties (see [Hilpinen, 1989; Festa, 2003; Niiniluoto, 2005c]). Von Wright was the most important teacher of Jaakko Hintikka who, in turn with his students, continued the Finnish school of induction. Rudolf Carnap came relatively late to the debates about probability and induction. Karl Popper had rejected induction in Logik der Forschung in 1934 (see [Popper, 1959]), and he was also sharply critical of the frequentist probability logic of Hans Reichenbach. Kaila had sympathies with Reichenbach’s empiricist approach. In a letter to Kaila on January 28, 1929, Carnap explained that he would rather seek a positivist solution where “probability inferences are equally analytic (tautologous) as other (syllogistic) inferences” (see [Niiniluoto, 1985/1986]). Carnap — influenced by the ideas of Keynes, Jeffreys, and Waismann on objective inductive probabilities, and to the disappointment of Popper — started to develop his views about probability in 1942-44 (see [Carnap, 1945]). Carnap’s Logical Foundations of Probability (LFP, 1950) gave a detailed and precise account of the inductive probability measure c∗ , which is a generalization of Laplace’s famous “rule of succession”. In 1952 Carnap published A Continuum of Inductive Methods. Its class of measures, defined relative to one real-valued parameter λ, contained c∗ as a special case only. Another special case c+ , proposed
by Bolzano and Wittgenstein, was rejected by Carnap since it does not make learning from experience possible. The same point had been expressed in the 19th century by George Boole and Charles Peirce in their criticism of Bayesian probabilities. With John Kemeny, Carnap further showed how the λ-continuum can be justified on an axiomatic basis (see [Kemeny, 1963]). Part of Carnap’s counterattack to Popper’s praise of improbability was based on the new exact theory of semantic information that he developed with Yehoshua Bar-Hillel (see [Carnap and Bar-Hillel, 1952]). Ian Hacking [1971] has argued that the typical assumptions of Carnap’s inductive logic can be found already in the works of G. W. F. Leibniz in the late 17th century:
(L1) There is such a thing as non-deductive evidence.
(L2) ‘Being a good reason for’ is a relation between propositions.
(L3) There is an objective and formal measure of the degree to which one proposition is evidence for another.
These assumptions can be found also in Keynes’ A Treatise on Probability (1921). Carnap formulated them in the Preface to LFP as follows:
(C1) All inductive reasoning is reasoning in terms of probability.
(C2) Inductive logic is the same as probability logic.
(C3) The concept of inductive probability or degree of confirmation is a logical relation between two statements or propositions, a hypothesis and evidence.
(C4) The frequency concept of probability is used in statistical investigations, but it is not suitable for inductive logic.
(C5) All principles and theorems of inductive logic are analytic.
(C6) The validity of induction is not dependent upon any synthetic presuppositions.
The treatment of probabilities of the form P(h/e), where h is a hypothesis and e is evidence, connects Carnap to the Bayesian tradition. Against the subjectivist school, Carnap’s intention was to eliminate all “psychologism” from inductive logic, just as Gottlob Frege had done in the case of deductive logic. Carnap’s C4 accepts a probabilistic dualism with both physical and epistemic probabilities. By C5 and C6, probability as partial entailment is independent of all factual assumptions. In practical applications of inductive logic, degrees of confirmation P(h/e) have to be calculated relative to the total evidence e available to scientists. Carnap’s commitment to probabilistic induction (C1 and C2) leaves open the question whether the basic notion of induction is support (e confirms h) or acceptance (h is rationally acceptable on e). According to Carnap, David Hume was
right in denying the validity of inductive inferences, so that the proper task of inductive logic is to evaluate probabilities of the form P(h/e). Such probabilities can then be used in practical decision making by applying the rule of Maximizing Expected Utility (cf. [Stegmüller, 1973]). In normative decision theory, there are “quasi-psychological” counterparts to “purely logical” inductive probabilities (see [Carnap, 1971]). Carnap was followed by Richard Jeffrey in the denial of inductive acceptance rules (cf. the debate in [Lakatos, 1968b]). In the second edition of LFP (1962), Carnap defended himself against the bitter attacks of Popper by distinguishing two senses of “degree of confirmation”: posterior probability P(h/e) and increase of probability P(h/e) − P(h). This was a clarification of the core assumption C3. In LFP, Carnap demanded that inductive logic should give an account of the following types of cases:
(a) Direct inference: from a population to a sample
(b) Predictive inference: from a sample to another sample
(c) Inference by analogy: from one individual to another by their known similarity
(d) Inverse inference: from a sample to a population
(e) Universal inference: from a sample to a universal hypothesis.
He showed how the measure c∗ helps to solve these problems. But Carnap did not wish to claim that c∗ is “perfectly adequate” or the “only adequate” explicatum of inductive probability (LFP, p. 563). So his method can be characterized by the following heuristic principle:
(C7) Use logic to distinguish alternative states of affairs that can be expressed in a given formal language L. Then define inductive probabilities for sentences of L by taking advantage of symmetry assumptions concerning such states of affairs.
The systematic applications of C7 distinguish the Carnapian program of inductive logic from the more general Bayesian school which admits all kinds of prior probability measures. As we shall see in the next section, Hintikka’s work on inductive logic relies on the heuristic principle C7 in a novel way, so that the problem (e) of universal inference gets a new solution. Hintikka’s system satisfies the core assumptions C1, C2, and C4. But, in his reply to Mondadori [1987], Hintikka himself urges that his studies did not just amount to “tinkering with Carnap’s inductive logic” or removing some “anomalies” from it, but rather “it means to all practical purposes a refutation of Carnap’s philosophical program in developing his inductive logic” [Hintikka, 1987b]. What Hintikka has in mind is the “logicism” involved in the Carnapian core assumptions C3, C5, and C6. Hintikka’s own move is to replace C3 and C6 with more liberal formulations:
(C3′) Inductive probability P(h/e) depends on the logical form of hypothesis h and evidence e.
(C6′) Inductive probabilities, and hence inductive probabilistic inferences, may depend on extra-logical factors.
Here C6′ allows that inductive inferences may have contextual or “local” presuppositions (cf. [Bogdan, 1976]). Inductive probability is thus not a purely syntactical or semantical notion, but its explication involves pragmatic factors. However, in the spirit of what Hintikka calls “logical pragmatics”, C3′ and C6′ should be combined with C7 so that the dependence and interplay of logical and extra-logical factors is expressed in an explicit and precise way. Then it turns out that C5 is ambiguous: some principles of induction may depend on pragmatic boundary conditions (like the extra-logical parameters), while some mathematical theorems of inductive logic turn out to be analytically true.
2 FROM CARNAP TO HINTIKKA’S TWO-DIMENSIONAL CONTINUUM
In inductive logic, probabilities are at least partly determined by symmetry assumptions concerning the underlying language [Carnap, 1962; Hintikka and Suppes, 1966; Niiniluoto and Tuomela, 1973]. In Carnap’s λ-continuum the probabilities depend also on a free parameter λ which indicates the weight given to logical or language-dependent factors over and above purely empirical factors (observed frequencies) (see [Carnap, 1952]). Carnap’s λ thus serves as an index of caution in singular inductive inference. In Hintikka’s 1966 system one further parameter α is added to regulate the speed at which positive instances increase the probability of a generalization.
More precisely, let $Q_1, \ldots, Q_K$ be a K-fold classification system with mutually exclusive predicates, so that every individual in the universe U has to satisfy one and only one Q-predicate. A typical way of creating such a classification system is to assume that a finite monadic first-order language L contains k basic predicates $M_1, \ldots, M_k$, and each Q-predicate is defined by a k-fold conjunction of positive or negative occurrences of the M-predicates: $(\pm)M_1x \,\&\, \ldots \,\&\, (\pm)M_kx$. Then $K = 2^k$. Each predicate expressible in language L is definable as a finite disjunction of Q-predicates. Carnap generalized this approach to the case where the dichotomies $\{M_j, \sim M_j\}$ are replaced by families of mutually exclusive predicates $\mathbf{M}_j = \{M_{j1}, \ldots, M_{jm}\}$, and a Q-predicate is defined by choosing one element from each family $\mathbf{M}_j$ (see [Jeffrey, 1980]). For example, one family could be defined by colour predicates, another by a quantity taking discrete values (e.g., age).
Assume that language L contains N individual names $a_1, \ldots, a_N$. Let L be interpreted in universe U with size N, so that each object in U has a unique name in L. A state description relative to individuals $a_1, \ldots, a_N$ tells for each $a_i$ which Q-predicate it satisfies in universe U.
A structure description tells how many individuals in U satisfy each Q-predicate. Every sentence within this first-order
monadic framework L can be expressed as a disjunction of state descriptions; in particular, a structure description is a disjunction of state descriptions that can be obtained from each other just by permuting individual constants. The state descriptions in L that entail sentence h constitute the range R(h) of h. Regular probability measures m for L define a non-zero probability m(s) for each state description s of L. For each sentence h in L, m(h) is the sum of all measures m(s), s ∈ R(h). A regular confirmation function c is then defined as conditional probability:
(1) $c(h/e) = \dfrac{m(h \,\&\, e)}{m(e)}$.
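Definition (1) can be checked by brute force in a small finite universe. The following is a minimal sketch, not part of the original text: it assumes that sentences are represented as Python predicates over state descriptions (tuples assigning a Q-predicate index to each individual), and it uses the measure that spreads probability evenly over structure descriptions and then evenly over the state descriptions within each one. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def state_descriptions(K, N):
    """All assignments of one of K Q-predicates (indices 0..K-1) to N individuals."""
    return list(product(range(K), repeat=N))


def m_star(state, K):
    """Equal weight per structure description, shared evenly by its state descriptions."""
    N = len(state)
    n_structures = comb(N + K - 1, K - 1)   # number of occupancy-count vectors
    same_counts = factorial(N)              # multinomial coefficient for this vector
    for q in range(K):
        same_counts //= factorial(state.count(q))
    return Fraction(1, n_structures * same_counts)


def measure(sentence, K, N, m=m_star):
    """m(h): sum of m(s) over the state descriptions s in the range of h."""
    return sum(m(s, K) for s in state_descriptions(K, N) if sentence(s))


def confirmation(h, e, K, N, m=m_star):
    """Formula (1): c(h/e) = m(h & e) / m(e)."""
    return measure(lambda s: h(s) and e(s), K, N, m) / measure(e, K, N, m)


# Evidence: a1 and a2 both satisfy Q_1 (index 0); hypothesis: a3 satisfies Q_1.
e = lambda s: s[0] == 0 and s[1] == 0
h = lambda s: s[2] == 0
print(confirmation(h, e, K=2, N=4))
```

The enumeration grows as K^N, so this is only useful for checking small cases by hand.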
Let now $e_n$ describe a sample of n individuals in terms of the Q-predicates, and let $n_i \geq 0$ be the observed number of individuals in cell $Q_i$ (so that $n_1 + \ldots + n_K = n$). Carnap’s λ-continuum takes the posterior probability $c(Q_i(a_{n+1})/e_n)$ that the next individual $a_{n+1}$ will be of kind $Q_i$ to be
(2) $\dfrac{n_i + \lambda/K}{n + \lambda}$.
This value is known as the representative function of an inductive probability measure. The probability (2) is a weighted average of $n_i/n$ (observed relative frequency of individuals in $Q_i$) and $1/K$ (the relative width of predicate $Q_i$). The choice λ = 0 gives Reichenbach’s Straight Rule, which allows only the empirical factor $n_i/n$ to determine posterior probability. The choice λ = ∞ would give the range measure proposed in Wittgenstein’s Tractatus, which divides probability evenly to state descriptions, but it makes the inductive probability (2) equal to $1/K$, which is a priori independent of the evidence e and, hence, does not allow for learning from experience. When λ < ∞, predictive probability is asymptotically determined by the empirical factor:
(3) $[c(Q_i(a_{n+1})/e_n) - n_i/n] \to 0$, when $n \to \infty$.
Principle (3) is known as Reichenbach’s Axiom [Carnap and Jeffrey, 1971; Kuipers, 1978b]. It is known that (3) implies the principle of Positive Instantial Relevance:
(4) $c(Q_i(a_{n+2})/e_n \,\&\, Q_i(a_{n+1})) > c(Q_i(a_{n+1})/e_n)$.
The choice λ = K in (2) gives Carnap’s measure c∗, which allocates probability evenly to all structure descriptions. The formula
(5) $c^*(Q_i(a_{n+1})/e_n) = \dfrac{n_i + 1}{n + K}$
includes as a special case ($n_i = n$, $K = 2$) Laplace’s Rule of Succession
(6) $\dfrac{n + 1}{n + 2}$.
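The special cases of formula (2) discussed above can be illustrated numerically. This is a small sketch under the assumption that λ is passed as a finite non-negative number (the limiting case λ = ∞ is only approximated by a large value); the function name is hypothetical.

```python
from fractions import Fraction


def carnap_predictive(n_i, n, K, lam):
    """Formula (2): (n_i + lam/K) / (n + lam), computed exactly."""
    lam = Fraction(lam)
    if n + lam == 0:
        raise ValueError("formula (2) is undefined when n = 0 and lam = 0")
    return (n_i + lam / K) / (n + lam)


# lam = 0: the Straight Rule, i.e. the observed relative frequency n_i / n.
print(carnap_predictive(3, 10, K=4, lam=0))
# lam = K: Carnap's c*, formula (5); with n_i = n and K = 2 this is Laplace's rule (6).
print(carnap_predictive(3, 3, K=2, lam=2))
# Very large lam: the value approaches the relative width 1/K whatever the evidence.
print(float(carnap_predictive(3, 10, K=4, lam=10**9)))
```

Exact rational arithmetic is used so that the identities with (5) and (6) hold without rounding.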
Laplace derived this probability of the next favourable instance after n positive ones by assuming that all structural compositions of an urn with white and black balls are equally probable. If the Q-predicates are defined so that they have different relative widths $q_i$, such that $q_1 + \ldots + q_K = 1$, then (2) is replaced by
(2′) $\dfrac{n_i + q_i\lambda}{n + \lambda}$
[Carnap and Stegmüller, 1959; Carnap, 1980]. (2) is obtained from (2′) by choosing $q_i = 1/K$ for all $i = 1, \ldots, K$.2
If the universe U is potentially infinite, so that its size N may grow without limit, the probability c(h/e) is defined as the limit of the values (1) in a universe of size N (when N → ∞). Then it turns out that all measures of Carnap’s λ-continuum assign the probability zero to universal generalizations h on singular evidence e. Carnap admitted that such a result “may seem astonishing at first sight”, since in science it has been traditional to speak of “well-confirmed laws” (see [Carnap, 1945]). But he immediately concluded that “the role of universal sentences in the inductive procedures of science has generally been overestimated”, and proposed to measure the instance confirmation of a law h by the probability that a new individual not mentioned in evidence e fulfills the law h. Carnap’s attempted reduction of universal inference to predictive singular inference did not convince all his colleagues. In Lakatosian terms, this move was a regressive problem-shift in the Carnapian program.
One of those who criticized Carnap’s proposal was G. H. von Wright [1951a]. Von Wright knew, on the basis of Keynes, that universal generalizations h can be confirmed by positive singular evidence $e_n = i_1 \,\&\, \ldots \,\&\, i_n$ entailed by h if two conditions are satisfied: (i) the prior probability P(h) is not minimal, and (ii) new confirmations of h are not maximally probable relative to previous confirmations [von Wright, 1951b]. The Principal Theorem of Confirmation thus states the following:
(7) If $P(h) > 0$ and $P(i_{n+1}/e_n) < 1$, then $P(h/e_n \,\&\, i_{n+1}) > P(h/e_n)$.
As Carnap’s system does not satisfy this theorem, it is “no Confirmation-Theory at all” (see [von Wright, 1957, pp. 119, 215]). It also fails to solve the dispute of Keynes and Nicod about the conditions for the convergence of posterior probability to its maximum value one with increasing positive evidence:
(8) $P(h/e_n) \to 1$, when $n \to \infty$.
It is remarkable that Popper, the chief opponent of inductive logic, also argued for the zero logical probability of universal laws, i.e., the same result that shattered Carnap’s system (see [Popper, 1959, appendices vii and viii]). Lakatos [1968a]
2 Carnap’s probabilities (2) and (2′) are known to statisticians as symmetric and non-symmetric Dirichlet distributions (see [Festa, 1993]). Skyrms [1993a] has observed that statisticians have extended such distributions to a “value continuum” (i.e., the discrete set of Q-predicates is replaced by a subclass of a continuous space).
called the assumption that P(h) > 0 for genuinely universal statements h “the Jeffreys-Keynes postulate”, and Carnap’s thesis about the dispensability of laws in inductive logic “the weak atheoretical thesis”. If indeed P(h) = 0 for laws h, then P(h/e) = 0 for any evidence e; and equally well all other measures of confirmation (like Carnap’s difference measure) or corroboration (cf. [Popper, 1959]) are trivialized (see [Niiniluoto and Tuomela, 1973, pp. 212–216, 242–243]).
Carnap’s notion of instance confirmation restricts the applications of inductive logic to singular sentences. A related proposal is to accept the Carnapian framework for universal generalization in finite universes. This move has been defended by Mary Hesse [1974]. However, the applications of inductive logic would then depend on synthetic assumptions about the size of the universe, contrary to principle C6. Moreover, the Carnapian probabilities of finite generalizations behave qualitatively in a wrong way: the strongest confirmation is given to those universal statements that allow many Q-predicates, even when evidence seems to concentrate only on a few Q-predicates (see [Hintikka, 1965a; 1975]).
In his “Replies” in the Schilpp volume (see [Schilpp, 1963, p. 977]), Carnap reported that he had constructed confirmation functions which do not give zero probabilities to universal generalizations, but “they are considerably more complicated than those of the λ-system”. He did not publish any of these results. In this problem situation, Hintikka’s presentation of his “Jerusalem system” in the 1964 Congress was a striking novelty. Hintikka solves the problem of universal generalization by dividing probability to constituents. He had learned this logical tool during the lectures of von Wright in Helsinki in 1947–1948.
Von Wright characterized logical truth by means of “distributive normal forms”: a tautology of monadic predicate logic allows all constituents, which are mutually exclusive descriptions of the constitution of the universe. Hintikka’s early insight in 1948, at the age of 21, was the way of extending such distributive normal forms to the entire first-order logic with relations (see [Bogdan, 1987; Hintikka, 2006, p. 9]). This idea resulted in 1953 in a doctoral dissertation on distributive normal forms. Hintikka was thus well equipped to improve Carnap’s system of induction.
Let L again be a monadic language with Q-predicates $Q_1, \ldots, Q_K$. A constituent $C^w$ tells which Q-predicates are non-empty and which empty in universe U. The logical form of a constituent is thus
(9) $(\pm)(\exists x)Q_1(x) \,\&\, \ldots \,\&\, (\pm)(\exists x)Q_K(x)$.
If $Q_i$, $i \in CT$, are precisely those Q-predicates claimed to be non-empty by (9), then (9) can be rewritten in the form
(10) $\bigwedge_{i \in CT} (\exists x)Q_i(x) \,\&\, (x)\Big[\bigvee_{i \in CT} Q_i(x)\Big]$.
The cardinality of CT is called the width of constituent (10). Often a constituent with width w is denoted by $C^w$. Then $C^K$ is the maximally wide constituent which claims that all Q-predicates (i.e., all kinds of individuals which can be described
by the resources of language L) are exemplified in the universe. Note that if $C^K$ is true in universe U, then there are no true universal generalizations in L. Such a universe U is atomistic with respect to L, and $C^K$ is often referred to as the atomistic constituent of L. The number of different constituents of L is $2^K$. Among them we have the empty constituent of width zero; it corresponds to a contradiction. Other constituents are maximally consistent and complete theories in L: each of them specifies a “possible world” by means of primitive monadic predicates, sentential connectives and quantifiers. Thus, constituents are mutually exclusive, and the disjunction of all constituents is a tautology. Note that in a language with finitely many individual constants each constituent can be expressed by a disjunction of state descriptions or by a disjunction of structure descriptions.
Each consistent generalization h in L (i.e., a quantificational statement without individual constants) can be expressed as a finite disjunction of constituents:
(11) $\vdash h \equiv \bigvee_{i \in I_h} C_i$
(11) is the distributive normal form of h. Constituents are strong generalizations in L, and other generalizations in L are weak. By (11), the probability of generalizations reduces to the probabilities of constituents. As above, let evidence e be a description of a finite sample of n individuals, and let c be the number of different kinds of individuals observed in e. Sometimes we denote this evidence by $e_n^c$. Then a constituent $C^w$ of width w is compatible with $e_n^c$ only if $c \leq w \leq K$. By Bayes’s formula,
(12) $P(C^w/e) = \dfrac{P(C^w)\,P(e/C^w)}{\sum_{i=0}^{K-c} \binom{K-c}{i}\, P(C^{c+i})\,P(e/C^{c+i})}$.
Hence, to determine the posterior probability $P(C^w/e)$, we have to specify the prior probabilities $P(C^w)$ and the likelihoods $P(e/C^w)$. In his first papers, Hintikka followed the heuristic principle C7. His Jerusalem system is obtained by first dividing probability evenly to all constituents and then dividing the probability-mass of each constituent evenly to all state descriptions belonging to it [Hintikka, 1965a]. His combined system is obtained by first dividing probability evenly to all constituents, then evenly to all structure descriptions satisfying a constituent, and finally evenly to state descriptions belonging to a structure description [Hintikka, 1965b]. In both cases, the prior probabilities $P(C^w)$ of all constituents are equal to $1/2^K$. It turns out that there is one and only one constituent $C^c$ which has asymptotically the probability one when the size n of the sample e grows without limit. This is the “minimal” constituent $C^c$ which states that the universe U instantiates precisely those c Q-predicates which are exemplified in the sample e:
(13) $P(C^c/e_n^c) \to 1$, if $n \to \infty$ and c is fixed;
$P(C^w/e_n^c) \to 0$, if $n \to \infty$, c is fixed, and $w > c$.
It follows from (13) that a constituent which claims some uninstantiated Q-predicates to be exemplified in U will asymptotically receive the probability zero. A weak generalization h in L will receive asymptotically the probability one if and only if its normal form (11) includes the constituent $C^c$:
(14) Assuming that $n \to \infty$ and c is fixed, $P(h/e_n^c) \to 1$ iff $C^c \vdash h$.
The Keynes-Nicod debate thus receives an answer by Hintikka’s probability assignment. In his two-dimensional continuum of inductive methods, Hintikka [1966] was able to formulate a system which contains as special cases his earlier measures as well as the whole of Carnap’s λ-continuum. Hintikka proposes that likelihoods relative to $C^w$ are calculated in the same way as in Carnap’s λ-continuum (cf. (2)), but by restricting the universe to the w Q-predicates that are instantiated by $C^w$. Thus, if e is compatible with $C^w$, we have
(15) $P(Q_i(a_{n+1})/e \,\&\, C^w) = \dfrac{n_i + \lambda/w}{n + \lambda}$.
By (15), we can calculate that
(16) $P(e/C^w) = \dfrac{\Gamma(\lambda)}{\Gamma(n + \lambda)} \prod_{j=1}^{c} \dfrac{\Gamma(n_j + \lambda/w)}{\Gamma(\lambda/w)}$,
where Γ is the Gamma-function. Note that Γ(n + 1) = n!. For prior probabilities Hintikka proposes that $P(C^w)$ should be chosen as proportional to the Carnapian probability that a set of α individuals is compatible with $C^w$. This leads to the assignment
(17) $P(C^w) = \dfrac{\Gamma(\alpha + w\lambda/K)\,/\,\Gamma(w\lambda/K)}{\sum_{i=0}^{K} \binom{K}{i}\, \Gamma(\alpha + i\lambda/K)\,/\,\Gamma(i\lambda/K)}$.
The posterior probability $P(C^w/e)$ can then be calculated by (12), (16), and (17). If α = 0, then (17) gives equal priors to all constituents:
(18) $P(C^w) = 1/2^K$ for all $C^w$.
The Jerusalem system is then obtained by letting λ → ∞. Small value of α is thus an indication of the strength of a priori considerations in inductive generalization, just as small λ indicates strong weight to a priori considerations in singular inference. But if 0 < α < ∞, then we have
(19) $P(C^w) < P(C^{w'})$ iff $w < w'$.
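The behaviour of the priors (17) described in (18) and (19) can be checked with a short computation. The sketch below makes two assumptions that are not stated in the text: α is restricted to non-negative integers, and the ratio Γ(α + x)/Γ(x) is evaluated as the rising factorial x(x + 1)...(x + α − 1), which sidesteps the pole of Γ at x = 0 (the empty constituent). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, alpha):
    """x (x+1) ... (x+alpha-1); equals Gamma(alpha + x) / Gamma(x) for integer alpha >= 0."""
    result = Fraction(1)
    for j in range(alpha):
        result *= x + j
    return result


def constituent_prior(w, K, alpha, lam):
    """Formula (17): prior probability of one constituent of width w."""
    def weight(i):
        return rising(Fraction(i) * lam / K, alpha)

    total = sum(comb(K, i) * weight(i) for i in range(K + 1))
    return weight(w) / total


K = 3
# alpha = 0: every one of the 2**K constituents gets the same prior, as in (18).
print([constituent_prior(w, K, alpha=0, lam=K) for w in range(K + 1)])
# alpha > 0: the prior grows with the width w, as in (19).
print([constituent_prior(w, K, alpha=2, lam=K) for w in range(K + 1)])
```

With λ = K the weights reduce to (α + w − 1)!/(w − 1)!, the form used in the generalized combined system.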
Given evidence $e_n^c$ which has realized c Q-predicates, the minimal constituent $C^c$ compatible with $e_n^c$ claims that the universe is similar to the sample $e_n^c$. This constituent is the simplest of non-refuted constituents in the sense of ontological parsimony (cf. [Niiniluoto, 1994]). By (19) it is initially the least probable, but by (13) it is the only constituent that receives asymptotically the probability one with increasing but similar evidence.
If λ is chosen to be a function of w, so that λ(w) = w, then Hintikka’s generalized combined system is obtained; the original combined system of [Hintikka, 1965b] is a special case with α = 0. The formulas of the two-dimensional system are reduced to simpler equations:
(15′) $P(Q_i(a_{n+1})/e \,\&\, C^w) = \dfrac{n_i + 1}{n + w}$
(16′) $P(e/C^w) = \dfrac{(w-1)!}{(n+w-1)!} \prod_{j=1}^{c} (n_j!)$
(17′) $P(C^w) = \dfrac{(\alpha+w-1)!\,/\,(w-1)!}{\sum_{i=0}^{K} \binom{K}{i}\, (\alpha+i-1)!\,/\,(i-1)!}$.
Hence, by (12),
(20) $P(C^w/e_n^c) = \frac{(\alpha + w - 1)!/(n + w - 1)!}{\sum_{i=0}^{K-c} \binom{K-c}{i}\, (\alpha + c + i - 1)!/(n + c + i - 1)!}.$
In particular, when n = α, (20) reduces to
$\frac{1}{\sum_{i=0}^{K-c} \binom{K-c}{i}} = \frac{1}{2^{K-c}}.$
If n and α are sufficiently large in relation to K, then using the approximation $(m + n)! \simeq m!\, m^n$, where m is sufficiently large in relation to $n^2$ (see [Carnap, 1950, p. 150]), we get from (20) an approximate form of $P(C^w/e)$:
(21) $P(C^w/e_n^c) \simeq \frac{(\alpha/n)^{w-c}}{(1 + \alpha/n)^{K-c}}.$
(See [Niiniluoto, 1987, p. 88].) Formula (21) shows clearly the asymptotic behaviour (13) of the posterior probabilities when n increases without limit. The representative function of the generalized combined system is
(22) $P(Q_i(a_{n+1})/e_n^c) = (n_i + 1)\, \frac{\sum_{i=0}^{K-c} \binom{K-c}{i}\, (\alpha + c + i - 1)!/(n + c + i)!}{\sum_{i=0}^{K-c} \binom{K-c}{i}\, (\alpha + c + i - 1)!/(n + c + i - 1)!}.$
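Formulas (20), (21) and (22) lend themselves to a small numerical check. The sketch below is one possible Python rendering, not the author’s own implementation: the function names are hypothetical, factorials are evaluated through `math.lgamma` so that non-integer α is also allowed, the evidence is assumed to realize at least one Q-predicate (c ≥ 1), and (22) is evaluated for an already observed cell ($n_i > 0$).

```python
import math


def log_weight(w, n, alpha):
    """log of (alpha + w - 1)! / (n + w - 1)!, the unnormalized weight of width w >= 1."""
    return math.lgamma(alpha + w) - math.lgamma(n + w)


def posterior(w, c, n, K, alpha):
    """Posterior (20) of one constituent of width w >= c, given n individuals of c kinds."""
    logs = [
        math.log(math.comb(K - c, i)) + log_weight(c + i, n, alpha)
        for i in range(K - c + 1)
    ]
    shift = max(logs)  # subtract the largest term to avoid floating-point underflow
    total = sum(math.exp(x - shift) for x in logs)
    return math.exp(log_weight(w, n, alpha) - shift) / total


def posterior_approx(w, c, n, K, alpha):
    """Approximation (21), intended for n and alpha large in relation to K."""
    r = alpha / n
    return r ** (w - c) / (1 + r) ** (K - c)


def predictive(n_i, c, n, K, alpha):
    """Representative function (22) for an observed cell with n_i > 0."""
    binom = [math.log(math.comb(K - c, i)) for i in range(K - c + 1)]
    num = [binom[i] + math.lgamma(alpha + c + i) - math.lgamma(n + c + i + 1)
           for i in range(K - c + 1)]
    den = [binom[i] + math.lgamma(alpha + c + i) - math.lgamma(n + c + i)
           for i in range(K - c + 1)]
    shift = max(den)
    return (n_i + 1) * sum(math.exp(x - shift) for x in num) / sum(
        math.exp(x - shift) for x in den
    )
```

With these definitions one can confirm that the posterior of the minimal constituent $C^c$ grows towards one as n increases with c fixed, and that (22) agrees with the mixture of the likelihoods (15′) weighted by the posteriors (20).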
If h is a universal generalization in L which claims that certain b Q-predicates are empty, and if h is compatible with e, then
(23) $P(h/e_n^c) = \frac{\sum_{i=0}^{K-b-c} \binom{K-b-c}{i}\, (\alpha + c + i - 1)!/(n + c + i - 1)!}{\sum_{i=0}^{K-c} \binom{K-c}{i}\, (\alpha + c + i - 1)!/(n + c + i - 1)!}.$
Approximately, for sufficiently large α and n, (23) gives
(24) $P(h/e) \simeq \frac{1}{(1 + \alpha/n)^b}.$
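The exact expression (23) and its approximation (24) can be compared numerically. The following Python sketch uses hypothetical function names and assumes c ≥ 1 and 0 ≤ b ≤ K − c; it is meant as an illustration of the two formulas rather than as a definitive implementation.

```python
import math


def log_terms(m, c, n, alpha):
    """Logs of binom(m, i) * (alpha + c + i - 1)! / (n + c + i - 1)! for i = 0..m."""
    return [
        math.log(math.comb(m, i))
        + math.lgamma(alpha + c + i)
        - math.lgamma(n + c + i)
        for i in range(m + 1)
    ]


def prob_generalization(b, c, n, K, alpha):
    """Posterior (23) of a generalization claiming that b unobserved cells are empty."""
    num = log_terms(K - b - c, c, n, alpha)
    den = log_terms(K - c, c, n, alpha)
    shift = max(den)  # common shift keeps the exponentials in floating-point range
    return sum(math.exp(x - shift) for x in num) / sum(
        math.exp(x - shift) for x in den
    )


def prob_generalization_approx(b, n, alpha):
    """Approximation (24) for sufficiently large alpha and n."""
    return 1 / (1 + alpha / n) ** b
```

Two limiting cases are easy to read off: for b = 0 the generalization is a tautology and the value is one, and for b = K − c the generalization coincides with the minimal constituent $C^c$, so that (23) reduces to (20) with w = c.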
In agreement with (14), the value of (24) approaches one when n increases without limit. On the other hand, if α → ∞, we can see by (20) that $P(C^w/e) \to 1$ if and only if w = K. In fact, the same result holds for the prior probabilities of constituents:
(25) If α → ∞, then $P(C^K) \to 1$ and $P(C^w) \to 0$ for w < K.
More generally, we have the result that the probabilities of Hintikka’s α-λ-continuum approach the probabilities of Carnap’s λ-continuum, when α → ∞. The result (25) explains why the probabilities of all universal generalizations are zero for all of Carnap’s measures: his probabilities of universal generalizations are fixed purely a priori in the sceptical fashion that the prior probability of the atomistic constituent $C^K$ is one. Carnap’s λ-continuum is thus the only special case (α = ∞) of Hintikka’s two-dimensional continuum where the asymptotic behaviour (13) of posterior probabilities does not hold.
3 AXIOMATIC INDUCTIVE LOGIC
The aim of axiomatic inductive logic is to find general rationality principles which narrow down the class of acceptable probability measures. The first axiomatic treatment of this kind was presented by W. E. Johnson [1932] (cf. [Pietarinen, 1972]). His main results were independently, and without reference to him, rediscovered by Kemeny and Carnap in 1952-54 (see [Schilpp, 1963]). Let P be a real-valued function defined for pairs of sentences (h, e), where e is consistent, of a finite monadic language L. Assume that P satisfies the following:
(A1) Probability axioms.
(A2) Finite regularity: For singular sentences h and e, P(h/e) = 1 only if ⊢ e ⊃ h.
(A3) Symmetry with respect to individuals: The value of P(h/e) is invariant with respect to any permutation of individual constants.
(A4) Symmetry with respect to predicates: The value of P(h/e) is invariant with respect to any permutation of the Q-predicates.
(A5) λ-principle: There is a function f such that $P(Q_i(a_{n+1})/e) = f(n_i, n)$.
For the advocates of personal probability, A1 is the only general constraint of rational degrees of belief. It guarantees that probabilities serve as coherent betting ratios. A2 excludes that some contingent singular sentence has the prior probability one. A3 is equivalent to De Finetti’s condition of exchangeability (cf. [Carnap and Jeffrey, 1971; Hintikka, 1971]). It entails that the probability $P(Q_i(a_{n+1})/e)$ depends upon evidence e only through the numbers $n_1, ..., n_K$, so that it is independent of the order of observing the individuals in e. A4 states that the Q-predicates are symmetrical: $P(Q_i(a_j)) = 1/K$ for all i = 1, ..., K. A5 is Johnson’s [1932] “sufficientness postulate”, or Carnap’s “axiom of predictive irrelevance”. It states that the representative function $P(Q_i(a_{n+1})/e)$ is independent of the numbers $n_j$, $j \neq i$, of observed individuals in other cells than $Q_i$ (as long as the sum $n_1 + ... + n_K = n$). The Kemeny–Carnap theorem states that axioms A1–A5 characterize precisely Carnap’s λ-continuum with λ > 0: if A1–A5 hold for P, then
$f(n_i, n) = \frac{n_i + \lambda/K}{n + \lambda},$
where
$\lambda = \frac{K f(0, 1)}{1 - K f(0, 1)}.$
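As a brief illustration of the Kemeny–Carnap characterization, the following Python sketch (with hypothetical function names) evaluates the representative function of the λ-continuum and recovers λ from the single value f(0, 1), as in the formula above. It is a sanity check on the algebra, not part of the theorem’s proof.

```python
def carnap_f(n_i, n, lam, K):
    """Representative function of Carnap's lambda-continuum: (n_i + lam/K) / (n + lam)."""
    return (n_i + lam / K) / (n + lam)


def lam_from_f01(f01, K):
    """Recover lambda from f(0, 1) via lambda = K f(0,1) / (1 - K f(0,1))."""
    return K * f01 / (1 - K * f01)


K, lam = 3, 2.5
recovered = lam_from_f01(carnap_f(0, 1, lam, K), K)

# The predictive probabilities over all K cells sum to one for any sample.
counts = [3, 1, 0]
total = sum(carnap_f(n_i, sum(counts), lam, K) for n_i in counts)
```

Choosing λ = K turns the function into $(n_i + 1)/(n + K)$, which is the familiar Laplacean rule of succession generalized to K cells.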
If K = 2, the proof requires the additional assumption that f is a linear function of $n_i$. The case λ = 0 is excluded by A2. By dropping A4, the function $f(n_i, n)$ will have the form (2′). Hence, we see that a regular and exchangeable inductive probability measure is Carnapian if and only if it satisfies the sufficiency postulate A5. In particular, the traditional Bayesian approach of Laplace with probability c* satisfies A5. Axiom A5 is very strong, since it excludes that predictive singular probabilities $P(Q_i(a_{n+1})/e_n^c)$ about the next instance depend upon the variety of evidence $e_n^c$, i.e., upon the number c of cells $Q_i$ such that $n_i > 0$. As the number of universal generalizations in L which evidence e falsifies is also a simple function of c, axiom A5 makes induction purely enumerative and excludes the eliminative aspects of induction (see [Hintikka, 1968b]). We have already seen that the representative function (22) of Hintikka’s generalized combined system depends on c. The inability of Carnap’s λ-continuum to deal with inductive generalization is thus an unhappy consequence of the background assumption A5. The Carnap-Kemeny axiomatization of Carnap’s λ-continuum was generalized by Hintikka and Niiniluoto in 1974, who allowed that the inductive probability (2) of the next case being of type $Q_i$ depends on the observed relative frequency $n_i$ of kind $Q_i$ and on the number c of different kinds of individuals in the sample e (see [Hintikka and Niiniluoto, 1976]):
(A6) c-principle: There is a function f such that $P(Q_i(a_{n+1})/e_n^c) = f(n_i, n, c)$.
The number c expresses the variety of evidence e, and it also indicates how many universal generalizations e has already falsified. Hintikka and Niiniluoto proved that measures satisfying axioms A1–A4 and A6 constitute a K-dimensional system determined by K parameters
$\lambda = \frac{K f(1, K+1, K)}{1 - K f(1, K+1, K)} - K,$
$\gamma_c = f(0, c, c)$, for c = 1, ..., K − 1.
Here λ > −K and
(26) $0 < \gamma_c \le \frac{\lambda/K}{c + \lambda}.$
This K-dimensional system is called the NH-system by Kuipers [1978b]. (See also [Niiniluoto, 1977].) The upper bound of (26) is equal to the value of probability f(0, c, c) in Carnap’s λ-continuum; let us denote it by $\delta_c$. It turns out that, for infinite universes, the probability of the atomistic constituent $C^K$ is
$P(C^K) = \frac{\gamma_1 \cdots \gamma_{K-1}}{\delta_1 \cdots \delta_{K-1}}.$
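The role of the parameters $\gamma_c$ relative to their Carnapian bounds $\delta_c$ can be illustrated with a short Python sketch. The function names are hypothetical, λ > 0 is assumed for simplicity (so that every $\delta_c$ is positive), and the product formula is the one stated above for infinite universes.

```python
import math


def delta(c, lam, K):
    """Carnapian value of f(0, c, c), i.e. the upper bound in (26)."""
    return (lam / K) / (c + lam)


def prior_atomistic(gammas, lam, K):
    """P(C^K) as the product of gamma_c / delta_c for c = 1, ..., K - 1."""
    if len(gammas) != K - 1:
        raise ValueError("expected K - 1 parameters gamma_1, ..., gamma_{K-1}")
    deltas = [delta(c, lam, K) for c in range(1, K)]
    if any(not 0 < g <= d for g, d in zip(gammas, deltas)):
        raise ValueError("each gamma_c must satisfy condition (26)")
    return math.prod(g / d for g, d in zip(gammas, deltas))


K, lam = 4, 4.0
carnapian = [delta(c, lam, K) for c in range(1, K)]
# Carnapian parameter values: the whole prior mass goes to C^K.
p_carnap = prior_atomistic(carnapian, lam, K)
# Halving every gamma_c leaves room for non-zero priors of generalizations.
p_halved = prior_atomistic([d / 2 for d in carnapian], lam, K)
```

In this toy setting the Carnapian choice gives $P(C^K) = 1$, while halving each parameter gives $P(C^K) = (1/2)^{K-1}$, so that the remaining prior mass is available for constituents of smaller width.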
Hence, $P(C^K) = 1$ iff $\gamma_i = \delta_i$ for all i = 1, ..., K − 1. In other words, Carnap’s λ-continuum is the only special case of the K-dimensional system which does not attribute non-zero probabilities to some universal generalizations. Again, Carnap’s system turns out to be biased in the sense that it assigns a priori the probability one to the atomistic constituent $C^K$ that claims all Q-predicates to be instantiated in universe U. The reduction of all inductive probabilities to K parameters, which concern probabilities of very simple singular predictions, gives a counter-argument to Wolfgang Stegmüller’s [1973] claim that it does not “make sense” to bet on universal generalizations (cf. [Hintikka, 1971]). In the K-dimensional system, a bet on a universal law is equivalent to a system of K bets on singular sentences on finite evidence. The parameter $\gamma_c = f(0, c, c)$ expresses the predictive probability of finding a new kind of individual after c different successes. For such evidence e, the posterior probability of $C^c$ approaches one when $\gamma_c$ approaches zero. Further, $P(C^c)$ decreases when $\gamma_c$ increases. Parameter $\gamma_w$ thereby serves as an index of caution for constituents of width w. While Hintikka’s two-dimensional system has one index α of overall pessimism about the truth of constituents $C^w$, w < K, in the K-dimensional system there is a separate index of pessimism for each width w < K. The K-dimensional system allows more flexible distributions of prior probabilities of constituents than Hintikka’s α-λ-continuum. For example, principle
(19) may be violated. One may divide prior probability equally first to sentences $S^w$ (w = 0, ..., K) which state that there are w kinds of individuals in the universe. Such “constituent-structures” $S^w$ are disjunctions of the $\binom{K}{w}$ constituents $C^w$ of width w. This proposal was made by Carnap in his comment on Hintikka’s system (see [Carnap, 1968]; cf. [Kuipers, 1978a]). Assuming that the parameters $\gamma_c$ do not have their Carnapian values, one can show
(27) $P(Q_i(a_{n+1})/e \& C^w) = \frac{n_i + \lambda/K}{n + w\lambda/K}.$
Comparison with formula (15) shows that, in addition to the λ-continuum, the intersection of the K-dimensional system and Hintikka’s α-λ-system contains those members of the latter which satisfy the condition that λ as a function of w equals aw for some constant a > 0.³ The case with a = 1 is Hintikka’s generalized combined system (cf. (15′)). This new way of motivating this system shows its naturalness. The relations of different inductive systems are studied in detail by Theo Kuipers [1978b].⁴ It follows from (27) that the K-dimensional system satisfies Reichenbach’s Axiom (3) and Instantial Positive Relevance (4). The fundamental adequacy condition (13) of inductive generalization is satisfied whenever the parameters $\gamma_i$ are chosen more optimistically than their Carnapian values:
(28) If $\gamma_i < \delta_i$, for i = c, ..., K − 1, then $P(C^c/e) \to 1$ when n → ∞ and c is fixed.
This result shows again that the much discussed result of Carnap’s λ-continuum, viz. the zero confirmation of universal laws, is really an accidental feature of a system of inductive logic. We get rid of this feature by weakening the λ-principle A5 to the c-principle A6.
4 EXTENSIONS OF HINTIKKA’S SYSTEM
Hintikka’s two-dimensional continuum was published in Aspects of Inductive Logic [1966], edited by Hintikka and Patrick Suppes. This volume is based on an International Symposium on Confirmation and Induction, held in Helsinki in late September 1965 as a continuation of an earlier seminar at Stanford University in the spring of the same year. Hintikka had at that time a joint appointment at Helsinki and Stanford. Besides essays on the paradoxes of confirmation (Max Black, Suppes, von Wright), the volume includes several essays on induction by Hintikka’s Finnish students: Risto Hilpinen, Raimo Tuomela, and Juhani Pietarinen.
3 This class of inductive methods is mentioned by Hilpinen [1968, p. 65]. Kuipers [1978b] calls them SH-systems.
4 Zabell [1997] has developed Johnson’s axiomatization so that constituents of width one receive non-zero probabilities. This result is a special case of the K-dimensional system with $\gamma_1 < \delta_1$.
Like Carnap’s continuum, Hintikka’s two-dimensional system is formulated for a monadic first-order language with finitely many predicates. As a minor technical improvement, the Q-predicates may have different widths (cf. (2′)). The Q-predicates may be defined by families of predicates in Carnap’s style, so that they allow discrete quantitative descriptions. Moreover, the number of predicates may be allowed to be countably infinite (see [Kuipers, 1978b]). However, more challenging questions concern extensions of Hintikka’s framework to languages which are essentially more powerful than monadic predicate logic. Hilpinen [1966] considers monadic languages with identity. In such languages it is possible to record that we have picked out different individuals in our evidence (“sampling without replacement”). Numerical quantifiers “there are at least d individuals such that” and “there are precisely d − 1 individuals” can be expressed by sentences involving a layer of d interrelated quantifiers. The maximum number of nested quantifiers in a formula is called its quantificational depth. Hence, by replacing existential quantifiers in formula (9) by numerical quantifiers, constituents of depth d can specify that each Q-predicate is satisfied by either 0, 1, ..., d − 1, or at least d individuals (see [Niiniluoto, 1987, p. 59]). A constituent of depth d splits into a disjunction of “subordinate” constituents at the depth d + 1: the claim that there are at least d individuals in $Q_i$ means that there are precisely d or at least d + 1 individuals in $Q_i$. For finite universes monadic constituents with identity are equivalent to Carnap’s structure descriptions, but expressed without individual constants.
Hilpinen extends Hintikka’s Jerusalem system to the monadic language with identity by dividing the probability mass evenly to all constituents at the depth d and then evenly to state descriptions entailing a constituent. Hilpinen shows that, on the basis of this probability assignment, it is not reasonable to project that cells not instantiated in our evidence are occupied by more than d individuals. Also it is not rational to project that observed singularities (i.e., cells with only one observed individual) are real singularities in the whole universe. However, all constituents according to which in our universe there are unobserved singularities have an equal degree of posterior probability on any evidence, and these constituents are equally probable as the constituent which denies the existence of unobserved singularities. The last result is not intuitive. Hilpinen shows that it can be changed by an alternative probability assignment: distribute probability first evenly to constituents of depth 1, then evenly to all subordinate constituents of depth 2, etc. (cf. [Hintikka, 1965a]). Then the highest probability is given to the constituent which denies the existence of unobserved singularities. Tuomela [1966] shows that Hintikka’s main result (13) about inductive generalization can be achieved in an ordered universe. The decision problem for a first-order language containing the relation Rxy = “y is an immediate successor of x” is effectively solvable. The Q-predicates for such a language specify triples: predecessor, object, successor. The constituents state which kinds of triples there are
in the universe. If all constituents are given equal prior probabilities, the simplest constituent compatible with evidence will have the greatest posterior probability. Inductive logic for full first-order logic is investigated by Hilpinen [1971].⁵ In principle, Hintikka’s approach for monadic languages can be generalized to this situation, since Hintikka himself showed in 1953 how distributive normal forms can be defined for first-order languages L with a finite class of polyadic relations (cf. [Niiniluoto, 1987, pp. 61-80]). For each quantificational depth d > 0, i.e., the number of layers of quantifiers, a formula of L can be expressed as a disjunction of constituents of depth d. This normal form can be expanded to greater depths. A new feature of this method results from the undecidability of full first-order logic: some constituents are non-trivially inconsistent and there is no effective method of locating them. The logical form of a constituent of depth d is still (13), but now the Q-predicates or “attributive constituents” are trees with branches of the length d. A constituent of depth 1 tells what kinds of individuals there are in the universe, now described by their properties and their relations to themselves. A constituent $C^{(2)}$ of depth 2 is a systematic description of all different kinds of pairs of individuals that one can find in the universe. A constituent $C^{(d)}$ of depth d is a finite set of finite trees with maximal branches of the length d. Each such branch corresponds to a sequence of individuals that can be drawn with replacement from the universe. Such constituents $C^{(d)}$ are thus the strongest generalizations of depth ≤ d expressible in language L. Each complete theory in L can be axiomatized by a monotone sequence of subordinate constituents $\langle C^{(d)} \mid d < \infty \rangle$, where $... \; C^{(d+1)} \vdash C^{(d)} \vdash ... \vdash C^{(1)}$.
Given the general theory of distributive normal forms, the axiomatic approach can in principle be applied to the case of constituents of depth d. Whenever assumptions corresponding to A1, A2, A3, and A6 can be made, there will be one constituent $C^{(d)}$ which receives asymptotically the probability one on the basis of evidence consisting of ramified sequences of d interrelated individuals. The general case has not yet been studied. Hilpinen’s 1971 paper is still the most detailed analysis of inductive logic with relations.⁶
5 Assignment of mathematical probabilities to formulas of a first-order language L is studied in probability logic (see [Scott and Krauss, 1966; Fenstad, 1968]). The main representation theorem, due to Jerzy Łoś in 1963, tells that the probability of a quantificational sentence can be expressed as a weighted average involving two kinds of probability measures: one defined over the class of L-structures, and the other measures defined over sets of individuals within each L-structure. Again exchangeability (cf. A3) guarantees some instantial relevance principles (cf. [Nix and Paris, 2007]), but otherwise probability logic has not yet led to new fruitful applications in the theory of induction.
6 Nix and Paris [2007] have recently investigated binary inductive logic, but they fail to refer to Hintikka’s program in general and to Hilpinen [1971] in particular. The basic proposal of Nix and Paris is to reduce binary relations to unary ones: for example, ‘x pollinates y’ is treated as equivalent to a long sentence involving only unary predicates of x and y. As a general move this proposal is entirely implausible. The reduction of relations to monadic properties was a dogma of classical logic, until De Morgan and Peirce in the mid-nineteenth century started the serious study of the logic of relations (see [Kneale and Kneale, 1962]). The crucial importance of the distinction between monadic and polyadic first-order logic was highlighted by the metalogical results in the 1930s: the former is decidable, the latter is undecidable. This difference has dramatic consequences to Hintikka’s theory of distributive normal forms as well.
Hilpinen studies constituents of depth 2. Evidence e includes n observed individuals and a complete description of the relations of each pair of individuals in e. Now constituents $C^w$ of depth 2 describe what kinds of individuals there are in the universe U. A statement $D_v$ which specifies, for each individual $a_i$ in e, which attributive constituent $a_i$ satisfies, gives an answer to the question of how observed individuals are related to unobserved individuals. Hilpinen distributes inductive probability $P(C^w)$ to constituents $C^w$ evenly. Probabilities of the form $P(D_v/C^w)$ are defined by the Carnap-Hintikka style representative function of the form (15). Likelihoods $P(e/D_v \& C^w)$ are also determined by the same type of representative function, but now applied to pairs of individuals. Again, corresponding to Hintikka’s basic asymptotic result (13), the highest posterior probability on large evidence is given to the simplest conjunction $D_e \& C^e$, where $D_e$ states that the individuals in e are related to unobserved individuals in the same ways as to observed individuals and $C^e$ states that there are in the universe only those kinds of individuals that are, according to $D_e$, already exemplified in e. Hilpinen observes that there is another kind of “simplicity”: the number of kinds of individuals in $D_e$ may be reduced by assuming that all individuals in e bear the same relations to observed and some yet unobserved individuals. If this statement is denoted by $D_1$, and the corresponding constituent by $C^1$, then $P(D_v/C^v)$ is maximized by $D_1 \& C^1$. Hence, inductive methods dealing with polyadic languages need at least two separate parameters which regulate the weights given to the two kinds of simplicity.
An extension of Hintikka’s system to modal logic has been proposed by Soshichi Uchii [1972; 1973; 1977] (cf. [Niiniluoto, 1987, pp. 91-102]). Such an account is interesting, if the formulation of lawlike generalizations requires intensional notions like necessity and possibility (see [Pietarinen, 1972]). Hintikka himself is one of the founders of the possible worlds semantics for modal logic (see [Bogdan, 1987; Hintikka, 2006]). Uchii is interested in a monadic language L(□) with the operators of nomic or causal necessity □ and nomic possibility ◇. Here ◇ = ∼□∼. It is assumed that necessity satisfies the conditions of the Lewis system S5. The nomic constituents of L(□) can now be defined in analogy with (10):
(29) $\bigwedge_{i \in CT} \Diamond(\exists x) Q_i(x) \;\&\; \Box(x)\Big[\bigvee_{i \in CT} Q_i(x)\Big].$
Uchii calls (29) “a non-paradoxical causal law”. (29) specifies which kinds of individuals are physically possible and which kinds are physically impossible. Even stronger modal statements can be defined by
(30) $\bigwedge_{i \in H} \Diamond C_i \;\&\; \Box\Big[\bigvee_{i \in H} C_i\Big],$
where $C_i$ are the ordinary constituents of the language L without □. The laws expressible in L(□) are typically what John Stuart Mill called “laws of coexistence”.
To express Mill’s “laws of succession”, some temporal notions have to be added to L(□) (see [Uchii, 1977]). Let us denote by $B_i$ the nomic constituent (29) which has the same positive Q-predicates as the ordinary constituent $C_i$. As actuality entails possibility, there are K − w nomic constituents compatible with an ordinary constituent $C^w$ of width w. Uchii’s treatment assumes that $P(C_i) = P(B_i)$ for all i. Further, the probability of evidence e, given knowledge about the actual constitution $C_i$ of the universe, is not changed if the corresponding nomic constituent $B_i$ is added to the evidence: $P(e/C_i) = P(e/C_i \& B_i)$. It follows from (13) that
(31) $P(B^c/e_n^c) \to 1$, when n → ∞ and c is fixed, iff $P(B^c/C^c) = 1$.
Thus, if we have asymptotically become certain that $C^c$ is the true description of the actual constitution of the universe, the same certainty holds for the nomic constituent $B^c$ if and only if $P(C^c/B^c) = P(B^c/C^c) = 1$. Uchii makes this very strong assumption, which simply eliminates all the K − c nomic constituents compatible with $C^c$ and undermined by the asymptotic evidence. In fact, he postulates that $P((\exists x)\varphi(x)/\Diamond(\exists x)\varphi(x)) = 1$ for all formulas φ. This questionable metaphysical doctrine, which says that all genuine possibilities are realized in the actual history, is known as the Principle of Plenitude. An alternative interpretation is proposed by Niiniluoto [1987, pp. 101-102]. Perhaps the actual constitution of the universe is not so interesting, since evidence e obtained by active experimentation will realize new possibilities. As laws of nature have counterfactual force, experimentation can be claimed to be the key to their confirmation (see [von Wright, 1971]). So instead of the fluctuating true actual constituent $C^c$, we should be more interested in the permanent features of the universe expressed by the true nomic constituent. This suggests that the inductive approach of Sections 2 and 3 is directly formulated with nomic constituents, so that the axiomatic assumptions imply a convergence result for the constituent $B^c$ on the basis of experimental evidence $e_n^c$.
5 SEMANTIC INFORMATION
Hintikka was quick to note that his inductive probability measures make sense of some of Popper’s ideas. Hintikka [1968b] observed that his treatment of induction is not purely enumerative, since the inductive probability of a generalization depends also on the ability of evidence to refute universal statements. This eliminative aspect of induction is related to the Popperian method of falsification.
Popper [1959] argued that preferable theories should have a low absolute logical probability: good theories should be falsifiable, bold, informative, and hence improbable. In Hintikka’s two-dimensional system with the prior probability assignment (17), the initially least probable of the constituents compatible with evidence e, i.e., constituent $C^c$ (see (19)), eventually will have the highest posterior probability. The smaller the finite value of α, the faster we switch our degrees of confirmation from initially more probable constituents to initially less probable constituents.
Hence, the choice of a small value of parameter α is “an indication of one aspect of that intellectual boldness Sir Karl has persuasively advocated” [Hintikka, 1966, p. 131]. A systematic argument in defending essentially the same conclusion comes from the theory of semantic information [Hintikka and Pietarinen, 1966]. It can be shown that the degree of information content of a hypothesis is inversely proportional to its prior probability. A strong generalization is the more informative the fewer kinds of individuals it admits of. Therefore, $C^c$ is the most informative of the constituents compatible with evidence. For constituents, high degree of information and low prior probability (Popper’s basic requirements) but also high degree of posterior probability go together. The relevant notion of semantic information was made precise by Carnap and Bar-Hillel [1952] (cf. [Niiniluoto, 1987, pp. 147-155]). They defined the information content of a sentence h in monadic language L by the class of the content elements entailed by h, where content elements are negations of state descriptions. Equally, information content could be defined as the range R(∼h) of the negation ∼h of h, i.e., the class of state descriptions which entail ∼h. If Popper’s “basic sentences” correspond to state descriptions, this is equivalent to Popper’s 1934 definition of empirical content. As the surprise value of h Carnap and Bar-Hillel used the logarithmic measure
(32) $\mathrm{inf}(h) = -\log P(h),$
which is formally similar to Shannon’s measure in statistical information theory. For the degree of substantial information of h Carnap and Bar-Hillel proposed
(33) $\mathrm{cont}(h) = P(\sim h) = 1 - P(h).$
Substantial information is thus inversely related to probability, just as Popper [1959] also required. As both Carnap and Popper thought that P(h) = 0 for all universal generalizations, they could not really use the cont-measure to serve any comparison between rival laws or theories.
Hintikka’s account of inductive generalization opened a way for interesting applications of the theory of semantic information. He developed these ideas further in “The Varieties of Information and Scientific Explanation” [Hintikka, 1968a] and in the volume Information and Inference (1970), edited together by Hintikka and Suppes. Hintikka [1968a] defined measures of incremental information which tell how much information h adds to the information already contained in e:
(34) $\mathrm{inf}_{add}(h/e) = \mathrm{inf}(h \& e) - \mathrm{inf}(e)$
$\mathrm{cont}_{add}(h/e) = \mathrm{cont}(h \& e) - \mathrm{cont}(e).$
Measures of conditional information tell how informative h is in a situation where e is already known:
(35) $\mathrm{inf}_{cond}(h/e) = -\log P(h/e)$
$\mathrm{cont}_{cond}(h/e) = 1 - P(h/e).$
Hence, $\mathrm{inf}_{add}$ turns out to be the same as $\mathrm{inf}_{cond}$. Measures of transmitted information tell how much the uncertainty of h is reduced when e is learned, or how much substantial information e carries about the subject matter of h:
(36) $\mathrm{transinf}(h/e) = \mathrm{inf}(h) - \mathrm{inf}(h/e) = \log P(h/e) - \log P(h)$
$\mathrm{transcont}_{add}(h/e) = \mathrm{cont}(h) - \mathrm{cont}_{add}(h/e) = 1 - P(h \vee e)$
$\mathrm{transcont}_{cond}(h/e) = \mathrm{cont}(h) - \mathrm{cont}_{cond}(h/e) = P(h/e) - P(h).$
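The identities in (34)–(36) can be verified on a small finite example. The Python sketch below rests on assumptions that are not in the source: a toy space of four equiprobable state descriptions, propositions modelled as sets of state descriptions (so that conjunction is intersection and disjunction is union), and logarithms taken to base 2. All function names are chosen for illustration only.

```python
import math
from fractions import Fraction

# Hypothetical toy space: four equiprobable state descriptions.
WORLDS = {s: Fraction(1, 4) for s in ("s1", "s2", "s3", "s4")}


def prob(prop):
    """Probability of a proposition given as a set of state descriptions."""
    return sum(WORLDS[s] for s in prop)


def cond(h, e):
    """Conditional probability P(h/e); assumes P(e) > 0."""
    return prob(h & e) / prob(e)


def inf(h):
    return -math.log2(prob(h))          # (32), base 2 assumed


def cont(h):
    return 1 - prob(h)                  # (33)


def inf_add(h, e):
    return inf(h & e) - inf(e)          # (34)


def cont_add(h, e):
    return cont(h & e) - cont(e)        # (34)


def inf_cond(h, e):
    return -math.log2(cond(h, e))       # (35)


def cont_cond(h, e):
    return 1 - cond(h, e)               # (35)


def transinf(h, e):
    return inf(h) - inf_cond(h, e)      # (36)


def transcont_add(h, e):
    return cont(h) - cont_add(h, e)     # (36)


def transcont_cond(h, e):
    return cont(h) - cont_cond(h, e)    # (36)


h = frozenset({"s1", "s2"})
e = frozenset({"s1", "s2", "s3"})
```

Exact rational arithmetic is used for the cont-measures so that the closed forms in (36) can be compared by equality, while the logarithmic measures are compared up to floating-point tolerance.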
Thus, e transmits some positive information about h, in the sense of transinf and $\mathrm{transcont}_{cond}$, just in case P(h/e) > P(h), i.e., e is positively relevant to h. In the case of $\mathrm{transcont}_{add}$, the corresponding condition is that h ∨ e is not a tautology, i.e., h and e have some common information content. Hilpinen [1970] used these measures to give an account of the information provided by observations. His results provide an information-theoretic justification of the principle of total evidence. Measures of transmitted information have also an interesting application to measures of explanatory power or systematic power (see [Hintikka, 1968a; Pietarinen, 1970; Niiniluoto and Tuomela, 1973]). In explanation, the explanans h is required to give information about the explanandum e. With suitable normalizations, we have three interesting alternatives for the explanatory power of h with respect to e:
(37) $\mathrm{expl}_1(h, e) = \mathrm{transinf}(e/h)/\mathrm{inf}(e) = \frac{\log P(e) - \log P(e/h)}{\log P(e)}$
$\mathrm{expl}_2(h, e) = \mathrm{transcont}_{add}(e/h)/\mathrm{cont}(e) = \frac{1 - P(h \vee e)}{1 - P(e)} = P(\sim h/\sim e)$
$\mathrm{expl}_3(h, e) = \mathrm{transcont}_{cond}(e/h)/\mathrm{cont}(e) = \frac{P(e/h) - P(e)}{1 - P(e)}.$
Here $\mathrm{expl}_2(h, e)$ is the measure of systematic power proposed by Hempel and Oppenheim in 1948 (see [Hempel, 1965]). Note that all of these measures receive their maximum value one if h entails e, so that they cannot distinguish between alternative deductive explanations of e. On the other hand, if inductive explanation is explicated by the positive relevance condition (cf. [Niiniluoto and Tuomela, 1973; Festa, 1999]), then they can be used for comparing rival inductive explanations h of data e.⁷
7 For measures of systematic power relative to sets of competing hypotheses, see Niiniluoto and Tuomela [1973].
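The behaviour of the three measures of explanatory power in (37) can be illustrated with a toy computation. The following Python sketch is based on assumptions of its own: eight equiprobable state descriptions, propositions as sets, natural logarithms, and hypothetical function names. It requires 0 < P(e) < 1 and P(e/h) > 0.

```python
import math
from fractions import Fraction

# Hypothetical toy space: eight equiprobable state descriptions 0..7.
UNIVERSE = frozenset(range(8))
WORLDS = {s: Fraction(1, 8) for s in UNIVERSE}


def prob(prop):
    return sum(WORLDS[s] for s in prop)


def cond(a, b):
    """Conditional probability P(a/b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)


def expl1(h, e):
    return (math.log(prob(e)) - math.log(cond(e, h))) / math.log(prob(e))


def expl2(h, e):
    return (1 - prob(h | e)) / (1 - prob(e))


def expl3(h, e):
    return (cond(e, h) - prob(e)) / (1 - prob(e))


e = frozenset({0, 1, 2, 3})            # explanandum, P(e) = 1/2
h_strong = frozenset({0})              # entails e
h_weak = frozenset({0, 1})             # also entails e
h_inductive = frozenset({0, 1, 2, 4})  # raises P(e) from 1/2 to 3/4
```

Both deductive explanations receive the value one on all three measures, whereas the inductive explanation receives values strictly between zero and one, which is the sense in which the measures can only rank rival inductive explanations.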
6 CONFIRMATION AND ACCEPTANCE
Hintikka’s basic result shows that universal hypotheses h can be confirmed by finite observational evidence. This notion of confirmation can be understood in two different senses (cf. [Carnap, 1962; Niiniluoto, 1972]): h may have a high posterior probability P(h/e) on e, or e may increase the probability of h.
(HP) High Probability: e confirms h if $P(h/e) > q \ge \frac{1}{2}$.
(PR) Positive Relevance: e confirms h iff P(h/e) > P(h).
PR is equivalent to conditions P(h&e) > P(h)P(e), P(h/e) > P(h/∼e), and P(e/h) > P(e). The basic difference between these definitions is that HP satisfies the principle of Special Consequence:
(SC) If e confirms h and h ⊢ g, then e confirms g,
while PR satisfies the principle of Converse Entailment:
(CE) If a consistent h entails a non-tautological e, then e confirms h.
In Peirce’s terminology, hypothetical inference to an explanation is called abduction, so that by “abductive confirmation” one may refer to the support that a theory receives from its explanatory successes. By Bayes’s Theorem, PR satisfies the abductive criterion, when P(h) > 0 and P(e) < 1:
(38) If h deductively or inductively explains or predicts e, then e confirms h
(see [Niiniluoto, 1999]). It is known that no reasonable notion of confirmation can satisfy SC and CE at the same time. In the spirit of PR, Carnap [1962] proposed that quantitative degrees of confirmation are defined by the difference measure:
(39) $\mathrm{conf}(h, e) = P(h/e) - P(h).$
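The contrast between HP and PR with respect to Special Consequence can be made concrete with a small example. The Python sketch below assumes a toy space of four equiprobable state descriptions and hypothetical function names; entailment h ⊢ g is modelled as set inclusion. It illustrates one counterexample and is not a general proof.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy space: four equiprobable state descriptions.
STATES = ("w1", "w2", "w3", "w4")
WORLDS = {s: Fraction(1, 4) for s in STATES}


def prob(prop):
    return sum(WORLDS[s] for s in prop)


def cond(a, b):
    """Conditional probability P(a/b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)


def conf(h, e):
    """Difference measure (39)."""
    return cond(h, e) - prob(h)


def confirms_pr(h, e):
    """Positive Relevance: P(h/e) > P(h)."""
    return conf(h, e) > 0


def confirms_hp(h, e, q=Fraction(1, 2)):
    """High Probability with threshold q >= 1/2."""
    return cond(h, e) > q


h = frozenset({"w1"})
g = frozenset({"w1", "w2"})   # h entails g, since h is a subset of g
e = frozenset({"w1", "w3"})

# All non-empty pieces of evidence over the toy space.
evidences = [
    frozenset(combo)
    for r in range(1, len(STATES) + 1)
    for combo in combinations(STATES, r)
]
```

Here e raises the probability of h from 1/4 to 1/2 but leaves the probability of the weaker g at 1/2, so PR confirms h without confirming its consequence g. For HP, by contrast, P(g/e) ≥ P(h/e) whenever h ⊢ g, so the threshold is inherited by g for every piece of evidence in the list.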
We have seen in (36) that (39) measures the transmitted information that e carries on h. As Hintikka [1968b] points out, many other measures of confirmation, evidential support, and factual support are variants of (39) (see also [Kyburg, 1970]). This is the case also with Popper’s proposals for the degree of corroboration of h by e (see [Popper, 1959, p. 400]). Popper was right in arguing that degrees of corroboration should not be identified with prior or posterior probability. But Hintikka’s system has the interesting result that, in terms of measure (39) and its variants, with sufficiently large evidence e the minimal constituent $C^c$ at the same time maximizes posterior probability $P(C^w/e)$ and the information content $\mathrm{cont}(C^w)$. Hence, it also maximizes the difference (39), which can be written in the form P(h/e) + cont(h) − 1 (see [Hintikka and Pietarinen, 1966]). Hintikka [1968a] proposed a new measure of corroboration which gives an interesting treatment of weak generalizations (see also [Hintikka and Pietarinen, 1966; Niiniluoto and Tuomela, 1973]). Assume that h is equivalent to the disjunction of constituents $C_1, ..., C_m$, and define corr(h, e) as the minimum of the posterior probabilities $P(C_i/e)$:
The Development of the Hintikka Program
(40) corr(h, e) = min {P(C_i/e) | i = 1, ..., m}. Measure (40) guarantees that, unlike probability, corroboration covaries with logical strength: (41) If e ⊢ h_1 ⊃ h_2, then P(h_1/e) ≤ P(h_2/e). (42) If e ⊢ h_1 ⊃ h_2, then corr(h_1, e) ≥ corr(h_2, e). Further, (40) favours C^c among all (weak and strong) generalizations in language L: (43) With sufficiently large evidence e_n^c with fixed c, corr(h, e_n^c) has its maximum value when h is the constituent C^c. Hintikka’s inductive probability measures can be applied to the famous and much debated paradoxes of confirmation. In Hempel’s paradox of ravens, the universal generalization “All ravens are black” is confirmed by three kinds of instances: black ravens, black non-ravens, and non-black non-ravens. The standard Bayesian solution, due to Janina Hosiasson-Lindenbaum in 1940 (see [Hintikka and Suppes, 1966; Niiniluoto, 1998]), is that these three instances give different incremental confirmation to the hypothesis, since in a finite universe these cells are occupied by different numbers of objects. Instead of such an empirical assumption, one could also make a conceptual stipulation to the effect that the predicates “black” and “non-black”, and “raven” and “non-raven”, have different widths, and then apply the formula (2′). Hintikka [1969a] proposes that the “inductive asymmetry” of the relevant Q-predicates could be motivated by assuming an ordering of the primitive predicates (cf. [Pietarinen, 1972]). Another famous puzzle is Nelson Goodman’s paradox of grue. Here Hintikka’s solution appeals to the idea that parameter α regulates the confirmability of universal laws. More lawlike generalizations can be more easily confirmed than less lawlike ones. 
If we associate a smaller α with the conceptual scheme involving the predicate “green” than with the scheme involving the odd predicate “grue”, then differences in degrees of confirmation of the generalizations “All emeralds are green” and “All emeralds are grue” can be explained (see [Hintikka, 1969b; Pietarinen, 1972; Niiniluoto and Tuomela, 1973]). Carnap’s system of induction does not include rules of acceptance. Rather, the task of inductive logic is to evaluate the epistemic probabilities of various hypotheses. These probabilities can be used in decision making (see [Carnap, 1980; Stegmüller, 1973]). Carnap agrees here with many statisticians, both frequentists (Jerzy Neyman, E. S. Pearson) and Bayesians (L. J. Savage), who recommend that inductive inferences be replaced by inductive behaviour or probability-based actions. In this view, the main role of scientists is to serve as advisors of practical decision makers rather than as seekers of new truths. On the other hand, according to the cognitivist model of inquiry, the tentative results of scientific research constitute a body of accepted hypotheses, the so-called “scientific
Ilkka Niiniluoto
knowledge” at a given time. In the spirit of Peirce’s fallibilism, they may be at any time questioned and revised by new evidence or novel theoretical insights. But, on some conditions, it is rational to tentatively accept a hypothesis on the basis of evidence. One of the tasks of inductive logic is to define such rules of acceptance for corrigible factual statements. Hintikka, together with Isaac Levi [1967], belongs to the camp of the cognitivists. The set of accepted hypotheses is assumed to be consistent and closed under logical consequence (cf. [Hempel, 1965]). Henry E. Kyburg’s lottery paradox then shows that high posterior probability alone is not sufficient to make a generalization h acceptable. But in Hintikka’s system one may calculate for the size n of the sample e a threshold value n_0 which guarantees that the informative constituent C^c has a probability exceeding a fixed value 1 − ε: (44) Let n_0 be the value such that P(C^c/e) ≥ 1 − ε if and only if n ≥ n_0. Then, given evidence e, accept C^c on e iff n ≥ n_0. (See [Hintikka and Hilpinen, 1966; Hilpinen, 1968].) Assuming logical closure, all generalizations entailed by C^c are then likewise acceptable on e.8 In Hintikka’s two-dimensional continuum, n_0 can be defined as the largest integer n for which

$$\varepsilon' \le \max_{c} \sum_{i=1}^{K-c} \binom{K-c}{i} \left( \frac{c}{c+i} \right)^{n-\alpha},$$

where the maximum is taken over values of c, 0 ≤ c ≤ K − 1, and ε′ = ε/(1 − ε). Hintikka and Hilpinen [1966] argue further that a singular hypothesis is inductively acceptable if and only if it is a substitution instance of an acceptable generalization: (45) A singular hypothesis of the form φ(a_i) is acceptable on e iff the generalization (x)φ(x) is acceptable on e. This principle reduces singular inductive inferences (Mill’s “eduction”) to universal inferences.

8 Note that (44) is a factual detachment rule in the sense that it concludes a factual statement from factual and probabilistic premises. This kind of inductive rule should be distinguished from probabilistic detachment rules which can be formulated as deductive arguments within the probability calculus (see [Suppes, 1966]). An example of the latter kind of rules is the following:

$$\frac{P(h/e) = r \qquad P(e) = 1}{P(h) = r}$$

7 COGNITIVE DECISION THEORY

According to Bayesian decision theory, it is rational for a person X to accept the action which maximizes X’s subjective expected utility. Here the relevant utility function expresses quantitatively X’s subjective preferences concerning the outcomes of alternative actions, usually in terms of some practical goals. The probabilities needed to calculate expected utility are X’s personal probabilities, degrees of belief concerning the state of nature. Cognitive decision theory adopts the same Bayesian decision principle with a new interpretation: the relevant actions concern the acceptance of rival hypotheses, and the utilities express some cognitively important values of inquiry. Such epistemic utilities may include truth, information, explanatory and predictive power, and simplicity. Anticipated by Bolzano, the basic ideas of cognitive decision theory were suggested in the early 1960s independently by Hempel and Levi (cf. [Hempel, 1965; Levi, 1967]). Inductive logic is relevant to this project, since it may provide the relevant epistemic probabilities [Hilpinen, 1968; Niiniluoto and Tuomela, 1983; Niiniluoto, 1987]. Let us denote by B = {h_1, ..., h_n} a set of mutually exclusive and jointly exhaustive hypotheses. Here the hypotheses in B may be the most informative descriptions of alternative states of affairs or possible worlds within a conceptual framework L. For example, they may be state descriptions, structure descriptions or constituents of a monadic language, or complete theories expressible in a finite first-order language.9 If L is interpreted on a domain U, so that each sentence of L has a truth value (true or false), it follows that there is one and only one true hypothesis (say h∗) in B. Our cognitive problem is to identify the target h∗ in B. The elements h_i of B are the potential complete answers to the cognitive problem. The set D(B) of partial answers consists of all non-empty disjunctions of complete answers. The trivial partial answer in D(B), corresponding to ‘I don’t know’, is represented by a tautology, i.e., the disjunction of all complete answers.
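The structure of the set D(B) can be made concrete with a short sketch. It is an illustration only, under the assumption that a disjunction of complete answers is represented by the set of its disjuncts; the labels "h1", "h2", "h3" and the function name are hypothetical.

```python
from itertools import combinations


def partial_answers(complete_answers):
    """Return D(B): every non-empty disjunction of complete answers,
    each represented as a frozenset of its disjuncts."""
    answers = list(complete_answers)
    return [
        frozenset(subset)
        for size in range(1, len(answers) + 1)
        for subset in combinations(answers, size)
    ]


B = ["h1", "h2", "h3"]      # hypothetical labels for the complete answers
D_B = partial_answers(B)
tautology = frozenset(B)    # the trivial answer "I don't know"

print(len(D_B))             # 7, i.e. 2**3 - 1
print(tautology in D_B)     # True
```

On this representation a cognitive problem with n complete answers has 2^n − 1 partial answers, the weakest of which is the disjunction of all elements of B.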
For any g ∈ D(B) and h_j ∈ B, we let u(g, h_j) be the epistemic utility of accepting g if h_j is true. We also assume that a rational probability measure P is associated with language L, so that each h_j can be assigned its epistemic probability P(h_j/e) given the available evidence e. Then the best hypothesis in D(B) is the one g which maximizes the expected epistemic utility:

(46) $U(g/e) = \sum_{j=1}^{n} P(h_j/e)\, u(g, h_j).$
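A minimal computational sketch of formula (46) may help here. The utility function used below is a hypothetical toy choice made only for illustration (one unit for a true answer, minus a penalty proportional to the number of disjuncts); it is not a definition taken from the text, and the posterior probabilities are likewise invented.

```python
from fractions import Fraction
from itertools import combinations


def expected_utility(g, posterior, u):
    """Formula (46): U(g/e) = sum over j of P(h_j/e) * u(g, h_j)."""
    return sum(p * u(g, h) for h, p in posterior.items())


def toy_utility(q, n):
    """Hypothetical utility: 1 if g is true, minus a penalty q * |g| / n
    that grows as the answer g gets logically weaker."""
    def u(g, h):
        truth = 1 if h in g else 0
        return truth - q * Fraction(len(g), n)
    return u


# Assumed posterior probabilities P(h_j/e) over three complete answers.
posterior = {"h1": Fraction(7, 10), "h2": Fraction(2, 10), "h3": Fraction(1, 10)}
u = toy_utility(q=Fraction(1, 2), n=len(posterior))

# D(B): all non-empty disjunctions, represented as sets of disjuncts.
answers = [
    frozenset(s)
    for k in range(1, len(posterior) + 1)
    for s in combinations(posterior, k)
]
best = max(answers, key=lambda g: expected_utility(g, posterior, u))
print(sorted(best), expected_utility(best, posterior, u))  # ['h1', 'h2'] 17/30
```

With these assumed numbers the maximizing answer is the disjunction of h1 and h2 rather than the most probable complete answer h1 or the tautology, which shows how the choice of u trades probability of truth against informativeness.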
Expected utility gives us a new possibility of defining inductive acceptance rules: (EU) Accept on evidence e the answer g ∈ D(B) which maximizes the value U(g/e). Another application is to use expected utility as a criterion of epistemic preferences and cognitive progress: (CP) A step from answer g ∈ D(B) to another answer g′ ∈ D(B) is cognitively progressive on evidence e iff U(g/e) < U(g′/e).

9 The framework also includes situations where B is a subset