RETHINKING EXPLANATION
BOSTON STUDIES IN THE PHILOSOPHY OF SCIENCE
Editors ROBERT S. COHEN, Boston University JÜRGEN RENN, Max-Planck-Institute for the History of Science KOSTAS GAVROGLU, University of Athens
Editorial Advisory Board THOMAS F. GLICK, Boston University ADOLF GRÜNBAUM, University of Pittsburgh SYLVAN S. SCHWEBER, Brandeis University JOHN J. STACHEL, Boston University MARX W. WARTOFSKY†, (Editor 1960–1997)
VOLUME 252
RETHINKING EXPLANATION Edited by
JOHANNES PERSSON Lund University, Sweden
and
PETRI YLIKOSKI University of Helsinki, Finland
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10 1-4020-5580-3 (HB)
ISBN-13 978-1-4020-5580-5 (HB)
ISBN-10 1-4020-5581-1 (e-book)
ISBN-13 978-1-4020-5581-2 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com
Printed on acid-free paper
All Rights Reserved © 2007 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
CONTENTS
Contributors
Acknowledgements
Preface
Part 1: Theory of Explanation
Bengt Hansson EXPLANATIONS ARE ABOUT CONCEPTS AND CONCEPT FORMATION
Henrik Hållsten WHAT TO ASK OF AN EXPLANATION-THEORY
Petri Ylikoski THE IDEA OF CONTRASTIVE EXPLANANDUM
Jan Faye THE PRAGMATIC-RHETORICAL THEORY OF EXPLANATION
Olav Gjelsvik CAUSAL EXPLANATION PROVIDES KNOWLEDGE WHY
Stathis Psillos CAUSAL EXPLANATION AND MANIPULATION
Erik Weber and Jeroen Van Bouwel ASSESSING THE EXPLANATORY POWER OF CAUSAL EXPLANATIONS
Rebecca Schweder SOME NOTES ON UNIFICATIONISM AND PROBABILISTIC EXPLANATION
Part 2: Issues in Explanation
Alexander Bird SELECTION AND EXPLANATION
Johannes Persson IBE AND EBI: ON EXPLANATION BEFORE INFERENCE
Jaakko Kuorikoski EXPLAINING WITH EQUILIBRIA
Annika Wallin EXPLANATION AND ENVIRONMENT: THE CASE OF PSYCHOLOGY
Mika Kiikeri and Tomi Kokkonen BIOLOGICAL NOTIONS OF INNATENESS AND EXPLANATION OF LANGUAGE ACQUISITION
Robin Stenwall ASPECT KINDS
INDEX
CONTRIBUTORS
Alexander Bird (University of Bristol)
Jan Faye (University of Copenhagen)
Olav Gjelsvik (University of Oslo)
Bengt Hansson (Lund University)
Henrik Hållsten (Stockholm University)
Mika Kiikeri (University of Tampere)
Tomi Kokkonen (University of Helsinki)
Jaakko Kuorikoski (University of Helsinki)
Johannes Persson (Lund University)
Stathis Psillos (University of Athens)
Rebecca Schweder (Lund University)
Robin Stenwall (Lund University)
Jeroen Van Bouwel (Ghent University)
Annika Wallin (SCAS and Lund University)
Erik Weber (Ghent University)
Petri Ylikoski (University of Helsinki)
ACKNOWLEDGEMENTS
We want to express our gratitude to Matti Sintonen, who has supported this project in so many ways. The editors also want to thank NOS-HS for funding the project.
PREFACE
This volume is a product of the international research project Theory of Explanation, which was funded by the Joint Committee for Nordic Research Councils for the Humanities and the Social Sciences (NOS-HS). The project started in 2001 and ran for three years, organizing a number of workshops on scientific explanation in Norway, Iceland, Sweden and Finland. The workshops included presentations by people involved in the project and by invited guests. Both groups are represented in this volume, which brings together some of the papers presented at these meetings. The central theme of the research project was scientific explanation, but it was approached from many different angles. This plurality of approaches is also visible in the present volume. The authors share a joint interest in explanation, but not the same theoretical or methodological assumptions. As a whole, this volume shows that, although the theory of explanation has been a major industry within philosophy of science, there are still both conceptual problems to be solved and fresh philosophical ideas to explore.
The papers in this volume have been divided into two broad groups. Part 1 consists of papers dealing with general issues in the theory of explanation, while the papers in Part 2 focus on more specific problems.
Part 1: Theory of Explanation
The book opens with Bengt Hansson’s chapter on what explanation is and is not. He argues that an explanation is not a logical structure, and that it cannot be characterized in syntactic terms. It is rather an epistemological structure, a structure organising conceptual content. To show this, Hansson begins with a simple example and develops his argument by systematically exploring the effects of making the premises more general or more specific, thereby exposing the implicit suppositions and consequences of some commonly held views.
In the next chapter, Henrik Hållsten argues that any theory of explanation must do justice to the intuition that a putative explanation can be irrelevant in at least two ways: it can be irrelevant relative to the epistemic situation, and it can be objectively irrelevant. He argues that the latter sort of irrelevance cannot be relativized to the epistemic situation and that this poses an important, but often neglected, challenge for many theories of explanation.
In chapter three, Petri Ylikoski discusses the idea of a contrastive explanandum. He begins by presenting the intuitive idea behind contrastive questions, and suggests a novel way to see the difference between scientific and everyday explanatory questions. He argues that all explananda can be analysed as contrastive and that this is a fruitful approach to making explanatory questions more explicit. In the latter part of the paper he defends the contrastive view against the major criticisms that have been raised against it.
Jan Faye focuses on pragmatic theories of explanation. According to him, these theories make a problematic association between the relevance and the correctness of an explanation. An answer to an explanation-seeking question can be false but still relevant, and therefore be an explanation. Faye suggests a pragmatic-rhetorical theory of explanation, in which an explanation is grounded in a rhetorical practice of raising questions and answering them within a certain discourse. He also argues that his view has an interesting implication: the distinction between scientific and everyday explanations turns out to be arbitrary.
In chapter five, Olav Gjelsvik examines David Lewis’s thesis that to explain an event is to provide some information about its causal history. According to Gjelsvik, this requirement is too weak to provide a fruitful starting point for a theory of explanation. He then proceeds to develop an account of his own by considering a number of suggestions for strengthening Lewis’s account of causal relevance.
Stathis Psillos discusses causal explanation too. His focus is on the manipulationist account of causal explanation developed by James Woodward. According to this theory, c causally explains e on condition that, if c were to be (actually or counterfactually) manipulated, e would change too. Psillos discusses the role of laws in explanations of this kind. He claims that the very possibility of experimental counterfactuals requires that laws be more than just invariances among magnitudes.
In chapter seven, Erik Weber and Jeroen Van Bouwel defend the view that the criteria by which the explanatory value of explanations is to be judged are context-dependent. Explanation-seeking questions have different motivations, and the criteria for evaluating explanatory power depend on the motivation. Weber and Van Bouwel argue that in most contexts a posteriori probability is important. They also argue that explanatory depth (going back further and further in time) is relevant in only one context. In most contexts there are further explanatory criteria (e.g. familiarity or manipulability), but, interestingly, unification is not among these.
In the last chapter of Part 1, Rebecca Schweder argues that there are at least three different ways to understand the notion of probabilistic explanation. She proposes that these differences reflect real metaphysical differences. According to Schweder, this constitutes a problem for the unificationist account of explanation, since not all three would count as explanations.
Part 2: Issues in Explanation
Alexander Bird starts Part 2 by examining selection explanations. He argues that familiar raven cases provide counterexamples not only to the hypothetico-deductive model of confirmation but also to the D-N model of explanation. Thus not everything deducible from a law is explained by that law. In selection cases the law operating is often negative, and the contrapositives of such laws do not explain their instances. However, the physical sciences tend to focus not on such negative general truths but on their (particularizable) contrapositives. What makes selection explanations different, Bird argues, is their use of negative properties. Consequently, in selection explanation it is the negative generalization that has a particularizable explanation, while the logically equivalent positive generalization is non-particularizable. Selection explanations, Bird concludes, do not appear fundamentally different from other nomological explanations.
Johannes Persson looks into the relations between explanation and inference. Sometimes explanation clearly comes before inference (EBI). But are such cases also instances of inference to the best explanation (IBE)? Persson answers this question by examining some of the limiting cases of IBE, such as when no hypothesis meets the minimal requirement for being an explanation, when we have competing concepts of explanation, or when the possibility of an IBE coexists with the applicability of other inferential tools. Persson finds that essentially explanatory considerations sometimes nevertheless belong to the early phases of such inquiries, for instance those involving concept formation. Later phases of such enterprises merely appear to be governed by IBE.
According to Jaakko Kuorikoski, equilibrium explanations, which are pervasive in economics and biology, have not received the philosophical attention they deserve. Elliott Sober once claimed that equilibrium explanations constitute a counterexample to “the causal thesis of explanation”, because they do not require the exposition of the actual causal history of the event to be explained. Kuorikoski shows that this claim is mistaken and rests on a confusion about the relation between causal and structural explanations. On closer inspection, Kuorikoski argues, the explanatory uses of equilibrium models require that the models pick out causally relevant properties of the system.
In the next chapter, Annika Wallin investigates the explanatory role of the environment in psychology and the cognitive sciences. She begins by considering different types of adaptation to the environment in psychology. Depending on the notion of adaptation adopted, different sorts of explanations of psychological phenomena become viable. Wallin describes how psychologists use methodological shortcuts to avoid some of the most difficult conceptual questions related to the differences between these notions of adaptation. For instance, using representative sampling (i.e. making sure that the environment in which measures are taken is typical of the participants’ everyday lives) gives psychologists an idea of how a cognitive process functions in its normal environment, even when they are unable to characterise this environment more precisely. She concludes by discussing how psychologists can rely on such methodological shortcuts when explaining behaviour, and what their environment-oriented explanations require to be theoretically solid.
Mika Kiikeri and Tomi Kokkonen start by comparing the empiricist and nativist accounts of language acquisition. Nativist explanations of language acquisition have assumed that all details that cannot be accounted for at the cognitive level are innate, based on the biological properties of an organism. At the same time, the details of biological implementation are left to future research. Innateness has thus remained a kind of explanatory black box for linguistics and cognitive science. Kiikeri and Kokkonen point out that the concept of innateness has somewhat different explanatory roles in biological and psychological contexts, which makes the assumption of a neat disciplinary division of labour untenable. However, they argue that an adequate biological analysis of psychological innateness could still help to re-evaluate the role of innate information in language acquisition.
This book ends with Robin Stenwall’s interesting study of explanations involving natural kinds. These explanations can be structurally complicated. For instance, in explaining why zinc sulphate is easily soluble in water, we refer to solubility as a characteristic of metallic sulphates, but when an explanation of the positive effect of zinc sulphate on the immune system is called for, we may point out that it is a zinc compound. Stenwall argues that when we explain in this manner, we almost always do so in relation to some contingent property of the natural kinds involved. Aspect kinds are kinds defined in terms of their contingent properties.
Johannes Persson and Petri Ylikoski
PART 1
THEORY OF EXPLANATION
EXPLANATIONS ARE ABOUT CONCEPTS AND CONCEPT FORMATION BENGT HANSSON
I have many ideas about explanations, and I find it difficult to bring them all together under a sufficiently catchy keyword. I have tried a nuanced, many-faceted and thoroughly argued approach elsewhere,1 and I will now try the opposite. By varying a trivial example along a single dimension I will put forward my main thesis: that an explanation is not a logical structure, that it cannot be characterised in syntactic terms, but that it is rather an epistemological structure, and, more specifically, a structure organising conceptual content.
Let us start with a trivial example and assume, without presupposing any particular theory of explanation and perhaps for no other reason than wishing to disprove it by a reductio argument, that the general (law-like) fact that specimens of table salt dissolve when put into water and the singular fact that this pinch of crystals is a specimen of table salt constitute an explanation of the fact that this pinch of crystals dissolved, and, furthermore, that the explanatory power lies essentially in the deductive relation, even if many theories of explanation also state additional requirements. The following puzzle about such explanations has been repeated ad nauseam in the literature:
(A1) All specimens of table salt dissolve when put into water.
(A2) This salt is table salt.
(A3) This salt has been hexed.
(A4) This salt will dissolve when put into water.
Obviously, if (A3) had not been included, this would have been a paradigmatic example of what I described in the second paragraph. As it stands, the explanation seems somewhat awkward. But does the inclusion of (A3) actually do any harm? And if so, is it because it is silly or because it is superfluous? Yes, the inclusion of (A3) is harmful, and not because of its silliness. The explanation would still be awkward if we substituted “This salt was once owned by W. Salmon,” although it would then no longer be silly.
1 Hansson (2006).
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 3–11. © 2007 Springer.
To realise that superfluous premises do in fact harm an explanation, we may exaggerate the situation by adding lots of irrelevant information, e.g. all of the Encyclopaedia Britannica. No matter what theory of explanation you subscribe to, this would not be satisfactory as an explanation in any reasonable sense of the word. So explanation is not only about having sufficient information; it is also about having the right kind of information. At the very least, it should be relevant. So, what is it for information to be relevant? Deductive relevance is not the only kind. Consider the following example:
(B1) Omega-3 fatty acids prevent thrombosis.
(B2) This pill contains an adequate amount of omega-3 fatty acids.
(B3) Omega-3 fatty acids work by way of several parallel mechanisms; they are i.a. highly anti-inflammatory and also prevent blood cells from clumping together.
(B4) Taking this pill will reduce your risk of thrombosis.
The premise (B3) is relevant because it both strengthens our belief that (B1) is correct and explains why it is correct, and therefore raises our trust in the conclusion (B4). But it is not necessary; the explanation would still be an explanation without it. We need not, and cannot, bring in all that is relevant. Perhaps we should require all premises to be indispensable, as we already require the law-like premise to be? That would be a step in the right direction, but still not enough. Consider the two variants of the following example:
(C1) All sodium salts are soluble.
(C2) This salt is sodium chloride.
(C3) This salt will dissolve.
(C'1) All sodium salts are soluble.
(C'2a) This is a sodium salt.
(C'2b) This salt is a chloride.
(C'3) This salt will dissolve.
Both variants contain the same information in the premises. But (C'2b) is superfluous in the C'-variant and would have to go if we allow only indispensable premises. We are now facing the following dilemma: Either C and C' are different explanations.
Then we conclude that the identity of an explanation lies not in the information contained in the premises but in something else, and our task is to find what that is. I suggest that this has to do with how the explanation conceptualises the situation and I will say more about that later. Or else C and C' are the same explanation, only differently phrased. Since C' is defective because of the superfluous premise, C must be defective too, but covertly. I will try to prove that this is not so by casting doubt on the idea that covert superfluity is a fault. My strategy, therefore, is to attempt a reductio. If covert superfluity is to be avoided, how should we go about purging explanation C of it? (C3) is safe, being given as the explanandum, so we have to deal with (C1) and (C2).
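The indispensability test applied to C and C' can be made concrete with a toy encoding. This is only an illustration under loud assumptions, not Hansson's own formalism: each premise is modelled as a set of atomic facts about "this salt" plus single-antecedent rules, the atom names (`sodium_salt`, `chloride`, `soluble`) and the helpers `fact`, `rule`, `closure` and `dispensable` are hypothetical, and the "information" in a set of premises is crudely equated with its deductive closure. On that encoding C and C' have the same closure, yet only C' contains a premise that can be dropped without losing the conclusion.

```python
from itertools import chain

# Hypothetical toy encoding: a premise is (facts about "this salt", rules).
# Each rule is a pair (antecedent, consequent) with a single antecedent atom.
Premise = tuple[frozenset[str], frozenset[tuple[str, str]]]


def fact(*atoms: str) -> Premise:
    """A singular premise asserting one or more atoms of this salt."""
    return (frozenset(atoms), frozenset())


def rule(antecedent: str, consequent: str) -> Premise:
    """A general premise: whatever has `antecedent` also has `consequent`."""
    return (frozenset(), frozenset({(antecedent, consequent)}))


def closure(premises) -> set[str]:
    """Forward-chain the rules over the asserted facts until nothing changes."""
    premises = list(premises)
    facts = set(chain.from_iterable(f for f, _ in premises))
    rules = set(chain.from_iterable(r for _, r in premises))
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts


def dispensable(named: dict[str, Premise], conclusion: str) -> list[str]:
    """Names of premises that can be dropped while the conclusion still follows."""
    return [
        name
        for name in named
        if conclusion in closure(p for n, p in named.items() if n != name)
    ]


# Explanation C: "sodium chloride" packs sodium_salt and chloride into one premise.
C = {
    "C1": rule("sodium_salt", "soluble"),
    "C2": fact("sodium_salt", "chloride"),
}
# Explanation C': the same content split into separate premises.
C_prime = {
    "C'1": rule("sodium_salt", "soluble"),
    "C'2a": fact("sodium_salt"),
    "C'2b": fact("chloride"),
}

print(dispensable(C, "soluble"))        # []
print(dispensable(C_prime, "soluble"))  # ["C'2b"]
print(closure(C.values()) == closure(C_prime.values()))  # True
```

Note that `dispensable` is sensitive only to how the same content is packaged into premises, which is precisely the syntactic feature that, on Hansson's argument, cannot by itself fix the identity of an explanation.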
Let us first look at (C2). It assigns the immediately given object to the category “sodium chloride”. Anyone familiar with this concept knows that it refers to a salt composed of sodium cations and chloride anions. (Some may even think that this is obvious already from the linguistic form of the phrase, but that would be premature: the phrase “atom bomb”, for example, does not mean a bomb made of atoms.) But only part of this conceptual content is needed to apply (C1). So, if the fault is with (C2), we will have to strip it down to (C'2a). The other possibility would of course be to make (C1) less general. It now says that all sodium salts are soluble; if it said this only about sodium chloride, it would fit well with (C2). In either case, getting rid of covert superfluity means saying less. To show that this is not desirable, I will follow two lines of argument to the extreme to see where they lead, the first having to do with the generality of (C1), and the second with the specificity of (C2). I therefore look first at an explanation which is a little more general than C and C':
(D1) All salts of alkali metals are soluble.
(D2) Sodium is an alkali metal.
(D3) This is a sodium salt.
(D4) This salt will dissolve.
In a similar manner, this explanation must contain covertly superfluous premises. (D3) specifies, unnecessarily, which alkali metal is involved. “This is a salt of an alkali metal” could replace both (D2) and (D3), ridding the explanation of superfluous information. But would this make it a better explanation? I think not, but to support my view further, I will take a few more steps in the same direction:
(E1) All salts of metals with a single electron in the outermost orbital are soluble.
(E2) Alkali metals have a single electron in the outermost orbital.
(E3) Sodium is an alkali metal.
(E4) This is a sodium salt.
(E5) This salt will dissolve.
Obviously, the same move can be repeated here. An appropriate change in (E4) could replace all of (E2)-(E4).
What distinguishes example E from example D is that we have introduced yet another intervening concept between sodium and solubility. This has a number of consequences: the amount of excess information increases, the complexity of the explanation increases, and more concepts need to be mastered; but the explanation is also brought closer to those properties of sodium that account for the solubility of its salts, which makes it a better answer to the question of why the salt will dissolve. If we wish to pursue this line of thought as far as it goes, we must be more and more specific about which mechanisms are in operation and approach a full chemical specification of what makes a salt soluble in water. That means that we would have to introduce new concepts, often not previously available to the explainee. Schematically, it might look like this:
(F1) Solubility consists in having properties $X_1$.
(F2) Laws that relate properties $X_2$ to properties $X_1$.
(F3) Laws that relate properties $X_3$ to properties $X_2$.
...
($F_n$) Laws that relate properties $X_n$ to properties $X_{n-1}$.
($F_{n+1}$) This salt has properties $X_n$.
($F_{n+2}$) This salt will dissolve.
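The scheme F above can likewise be sketched as a chain of links. This is a minimal illustration under stated assumptions: properties are bare labels `X1`, ..., `Xn`, each law-like premise is treated as a single link from one property to the next (with (F1) linking `X1` to solubility), and the function names `scheme_f` and `trace` are hypothetical. It shows only the structural point that any depth n of 1 or more yields a derivation from the singular premise to the explanandum, with one further link per level.

```python
def scheme_f(n):
    """Return the law-like links and the singular premise of F at depth n."""
    if n < 1:
        raise ValueError("depth n must be at least 1")
    # (F1): solubility consists in having properties X1
    links = [("F1", "X1", "soluble")]
    # (F2)..(Fn): laws relating properties Xk to properties X(k-1)
    for k in range(2, n + 1):
        links.append((f"F{k}", f"X{k}", f"X{k - 1}"))
    # (F(n+1)): this salt has properties Xn
    singular = (f"F{n + 1}", f"X{n}")
    return links, singular


def trace(n):
    """Walk from the singular premise through each link to the explanandum."""
    links, (label, prop) = scheme_f(n)
    next_link = {ante: (name, cons) for name, ante, cons in links}
    steps = [f"{label}: this salt has {prop}"]
    while prop != "soluble":
        name, cons = next_link[prop]
        steps.append(f"by {name}: {prop} -> {cons}")
        prop = cons
    steps.append(f"F{n + 2}: this salt will dissolve")
    return steps


for line in trace(3):
    print(line)
# F4: this salt has X3
# by F3: X3 -> X2
# by F2: X2 -> X1
# by F1: X1 -> soluble
# F5: this salt will dissolve
```

Calling `trace(1)` gives the shortest version, (F2), then (F1), then the conclusion; larger values of `n` only insert intermediate links, each of which could in turn be expanded or contracted.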
Again, we face increasing complexity and superfluity, but also greater depth in our understanding. I would like to note the following aspects of this explanation scheme:
* It admits of what might be called variable depth. Each of (F2), (F3), ..., ($F_n$) could (but need not) be expanded into an explanation of its own, but the whole explanation could also be made very simple by putting $n = 1$.
* Whether explanation F is expanded, left intact, or contracted, it is still an explanation. But it may be better or worse, and what determines this is the purpose of the why-question and the explainee’s pre-understanding. These are situation-specific circumstances which may be lumped together under the label “pragmatic”. Explanation F is best when each step is apprehensible to the explainee, given the concepts she masters and the beliefs she holds.
* Law-likeness is less prominent in F than in the previous examples. Although (F1) will generate several laws, it is not itself a law, and (F2) and its sequels could well be rephrased in the manner of (F1). This does not mean that laws can be dispensed with; they will still hold true as general propositions and perhaps play a part in derivations, but they will be secondary consequences of something more fundamental, like the conceptual relationships expressed by (F1), rather than primary elements of an explanation.
Let us now go back to the question whether C and C' are the same explanation, and whether C must be thought defective because of covert superfluity. I have now concluded my first line of argument against this idea by showing that one possible cause of covert superfluity, the generality of (C1), is no fault, and sometimes even an asset. For explanation F, which takes the generality of the first premise to the extreme, is, when properly adapted to pragmatic circumstances, a better explanation than any of C, D, and E.
Great generality, and the covert superfluity this generates in relation to a particular explanandum, is not harmful but often beneficial, and it may even be instrumental in adapting an explanation to pragmatic circumstances. It now remains to pursue my second line of argument, which has to do with the superfluity deriving from the specificity of (C2), the singular premise in explanation C. This premise says that what we have is a specimen of sodium chloride, but for the explanation to be valid it need only say that it is a sodium salt. The superfluity is amplified in explanations D, E, and F. The corresponding premise in E is (E4), which need only say that this is a salt of a metal with a single electron in the outermost orbital, a claim rather far from the observation that it is common table salt. In the extreme case of explanation F, $n$ has to be set to 1 to avoid superfluity, and the singular premise, which will then be (F2), has to state that this salt has exactly the properties $X_1$, no matter how complex or theoretically embedded these properties are. Banning superfluity forces the singular premise to say exactly what is required to trigger the general premise, neither more (which would produce superfluity) nor less (there would be no deductive relation). This means that we can only have one-step explanations, without any intermediary steps. In general, this is an unhappy state of affairs, since the essential function of an explanation is to connect previously unconnected pieces of knowledge, and these are many. If combined with my previous argument to the effect that we should seek a highly general first premise, this seems to indicate that we should strip down explanation F to the following:
(F'1) Solubility consists in having certain (rather specific physical and chemical) properties.
(F'2) This salt has these properties.
(F'3) This salt will dissolve.
(F'1) is certainly informative and may well be at the cutting edge of scientific knowledge, but (F'2) does not help much in connecting this advanced knowledge with the little grains on the table before me. But this is not the end of the story. Explanation F' still contains superfluous information. Any classification of this specimen of salt, even merely classifying it as table salt, and a fortiori classifying it as in (F'2), means adding superfluous information. The only superfluity-free explanation would be this:
(G1) Anything exactly like this specimen of salt is soluble.
(G2) This salt will dissolve.
True, but not exciting! And not what we want from an explanation. I thus conclude that my second line of argument has also succeeded in showing that the idea that explanation C is defective because of covert superfluity leads to unacceptable consequences. So, covert superfluity is no fault, and C is a perfectly good explanation (in non-sophisticated circumstances) even though C' is not. But C and C' contain the same information.
So, the identity of an explanation lies not in its informational content. Something else, then, besides information, is important. I claim that the difference lies in the way the situation is conceptualised. Covertly superfluous information often comes as part of useful concepts. In C, “sodium chloride” is presented as a simple natural category, it is taken for granted that it is contained in the wider category “sodium salts”, and this wider category is then connected to the concept of solubility. In C', we treat “sodium” and “chloride” as separate concepts, and only the former is connected to solubility. As it stands, C' suffers from the superfluous (C'2b), but without it, it would be a perfectly good explanation. The weakness of this particular example is of course that it is plain, from the very words, that to be “sodium chloride” means both to be something having to do with sodium and to be a chloride, so that it is not a simple concept, which makes the difference between C and C' minimal. But this is an argument against the particular example and not against my thesis. To see this, we can look at a third variant of the C explanation:
(C''1) All sodium salts are soluble.
(C''2) Table salt is a sodium salt.
(C''3) This salt is common table salt.
(C''4) This salt will dissolve.
“Table salt” is more clearly a simple natural concept than “sodium chloride”, because it is derived by induction from sensory experience. I claim that C'' is a better explanation than C, because it takes us in well-linked steps from an empirical observation, (C''3), via a theoretical classification of this observation, (C''2), to a general law, (C''1), thereby providing unification and better organisation of our knowledge. C is less satisfactory, because (C2) mixes empirical reference with theoretical categorisation and leaves the purely empirical element unconnected. Of course, the same move can be repeated for explanations D and E, and the point will be seen even more clearly.
This takes us to the question of better and worse explanations. I take it to be uncontroversial that, as we pass from C to F, we proceed from the trivial to the more genuinely explanatory. The main difference is that we progressively spell out more intermediate concepts, making each individual link in the explanation easier to apprehend, but making the explanation as a whole longer and more complex. How far one should go in this is to be determined by pragmatic factors, such as the explainee’s state of knowledge and pre-understanding, and the purpose and depth of intention behind the why-question, because these are the factors that determine when increased detail becomes a burden rather than an aid to explanation. That the links are easy to apprehend (once they are spelled out) does not imply that they are more probable or easier to believe.
On the contrary, a good explanation will often require the explainee to master new concepts and learn new facts about them, and it is often the creation of new concepts (rather than new knowledge) that prepares the way for better explanations.
This view of explanation seems to conflict with the reasonable desideratum that an explanation should simplify things, and in particular with those brands of unificationism which seek to minimise the number of assumptions referred to. This is certainly right at the local level of a single explanation, but the increased complexity at that level may, at least partly, be compensated by a better economy of concepts at the global level of the system of all potential explanations.2 But this is not a decisive argument; regardless of the size of the compensation, the need to sometimes make an explanation more complex can also be justified by another argument.
2 I was made aware of the importance of the distinction between global and local conditions by Rebecca Schweder, who discusses it in Schweder (2004).
This other argument consists in a further deployment of the distinction between general laws and accidental generalisations. Both are universally quantified sentences, but the former also exhibit some species of necessity, making them more true to actual processes and mechanisms in the world, however difficult it is to specify exactly what that means. This is reflected in the oft-mentioned fact that laws support counterfactual statements, whereas accidental generalisations do not. However, this is only a useful criterion and not an explication of what it is to be a law, for the truth conditions for counterfactuals are no less obscure than the concept of a law. I do not pretend to be able to give a full explication of the concept of a law, only to show that there is more to it than being true and supporting counterfactual statements, and that the general premises in an explanation must be laws in this stronger sense. What needs to be added is that the separate links in a good explanation should somehow track real events, processes or dependencies, or, to use an ancient metaphor, cut nature at its joints.
Take for example, disregarding for the moment the distinction between deterministic and probabilistic explanations, the common knowledge that approaching thunderstorms cause headaches. Is this an acceptable law in an explanation? With regard to its origin, it is no more than a generalisation from popular experience, although not an accidental generalisation. We have good reasons to believe that it is a real regularity in nature which we have come to observe. It therefore supports the counterfactual statement “If a thunderstorm had been approaching, more people would have had headaches”. But is it a real law? And is it fit for use in explanations? All it really adds to a specific fact is that similar things have often happened before. Is that sufficient to explain that fact? By contrast, one can spell out a number of intermediary steps, corresponding to clearly separable processes in nature:
(H1) An approaching thunderstorm produces an excess of positive ions in the air.
(H2) When inhaled, positive ions irritate the lungs and the respiratory tract, causing a mild inflammatory reaction.
(H3) An inflammation in the lungs impairs their ability to supply oxygen to the blood.
(H4) Less oxygen being supplied to the cranial arteries tends to produce a headache.
It cannot be denied that this, if true, is a much better explanation than “Thunderstorms cause headaches” for anyone who is serious about her why-question, even though there exist situations where the simpler statement is adequate.3
3
It has been pointed out by several authors that an amplification like (H1)-(H4) becomes an answer to the question “Why do thunderstorms cause headaches?”, i.e. an explanation for a different explanandum. I do not deny that, but it is also an amplified explanation of the old explanandum. It must be possible to ask for a deeper or better explanation of the same explanandum without committing oneself to the view that it therefore must also contain an explanation of each of its premises, and so on. The proper place to stop is neither dogmatically after one premise nor, equally dogmatically, never at all, but at a level determined by pragmatic circumstances.
B. HANSSON
Why, then, talk about concepts and conceptual links rather than real properties, processes and mechanisms? There are two main reasons, both having to do with the fact that explaining is an epistemic activity and that it is via concepts that we have access to the world. First, there may be real properties for which there are no concepts, either because the properties are really inapprehensible or because science has not yet advanced far enough. Obviously, such properties cannot play any explicit role in an explanation, and they should not be allowed to play an implicit one either. For example, we shall not accept explanations like the following:
(I1) There exists some property (apart from solubility itself and specific varieties of it) such that everything with this property is soluble.
(I2) This salt has one such property.
(I3) This salt will dissolve.
If quantification is to be allowed at all in (I1) and (I2), it should only be allowed to range over properties for which there are concepts. The second reason has to do with the pragmatic desire for variable depth. Even if the conceptual links in an explanation do cut nature at its joints, there is no need to cut nature at every joint. The size of the chunks should be allowed to vary. Obviously, to discuss in detail the ontological relationship between properties and concepts would be to go too far in this note, but I think it can safely be said that concepts are more flexible than properties for the purpose of variable depth. To sum up: The above exercise about superfluous information shows that it is the conceptual arrangement of the premises, rather than their informational content, that makes a good explanation. I therefore advocate the view that explanation consists in placing the explanandum in relation to fundamental (and hence general) facts through intermediary concepts that make the connecting links easy to apprehend and, to the best of our knowledge, track real events, processes or dependencies.
In a sense, explanations are like proofs in mathematics, which link the statement of a theorem to fundamental facts (axioms) through intermediate steps which are (comparatively) easy to apprehend and which reflect logically valid principles. In both cases it is essentially a conceptual, and hence epistemological, exercise, which is, however, restricted by objective conditions. But placing explanations in the realm of concepts means that one is exposed to the danger of individual or collective ad hoc constructions. One check against this is to look at the global level and to restrict the concepts to those that function in many different contexts or are part of a globally well-connected system of concepts. Another check is the requirement that the concepts should track real events, processes or dependencies. The question that arises is, of course, whether these two checks are in accord or pull in different directions, but a full answer would have to involve the whole debate about social constructivism, so I have to leave it for another occasion. The position I argue for could therefore be described as another form of unificationism, but not one in which unification is a goal by definition; rather, unification is a consequence of other goals on the conceptual level. While the classical unificationist
is right in asking for intellectual and epistemological economy, she is wrong if she identifies this with having as few premises or beliefs as possible. Rather, global economy concerning what concepts are needed to make the world intelligible is more basic than either global or local economy of assumptions or premises.
REFERENCES
Hansson, B. (2006). Why explanations? Fundamental, and less fundamental ways of understanding the world. Theoria LXXII (1): 23-59.
Schweder, R. (2004). A Unificationist Theory of Scientific Explanation. Lund: Studentlitteratur.
WHAT TO ASK OF AN EXPLANATION-THEORY
HENRIK HÅLLSTEN
1. In the following I will discuss some of the issues that an explanation-theory should address. Though I will try to stay away from the question of which particular theory is the correct one, I will argue for and against different ways of addressing these issues. Part of what I will try to do is to start listing some of the issues on which we, as philosophers in the theory of explanation, should make up our minds. In some cases, making up our minds will consist in agreeing on terminology; in others, it concerns deeper questions. First I would like to clear up some terminological issues. I will use the following terms in the following way: A model of explanation is an attempt to define, or explicate, explanation, or a related concept, such that putative examples of explanations can be either ruled out or shown to be real explanations. An explanation-theory is a theory concerning explanation. It is in our explanation-theory that we advocate certain models of explanation or try to account for their apparent shortcomings. The theory of explanation is the theoretical study of explanation, i.e. a specific branch of philosophy.4 It is sometimes argued that there are two different approaches in the theory of explanation: one that tries to construct highly normative models used for judging the doings of scientists, and another that is more on a par with the scientists’ doings, in that it is a more or less empirical theory of what is considered to be explanatory. The above stipulation of the usage of terms can appear to fit only the first of these approaches. Without getting too deep into this discussion, let us note that any theory needs to sort out its terms. When studying whales (in the theory of whales), we need means of sorting whales from non-whales. These means will get more elaborate when we move from common knowledge to science: dolphins are whales, whale-sharks are not, and, getting more fine-grained, killer-whales are dolphins.
These models of the concepts of whale and dolphin are normative as well as
4
This terminology is basically an adaptation of Dummett’s terminology concerning the theory of meaning and meaning-theory (Dummett 1991, p. 22).
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 13–26. © 2007 Springer.
stipulative. They could have been different. Some people might regard whale-sharks as whales, and killer-whales as non-dolphins, and they might adapt their whale-theories accordingly. What we then have to do is to show, within our whale-theory, that our models are better. We have to account for the shortcomings of our models as well as for their extravagances. Basically, the case is no different in the theory of explanation. As competent language-users we have intuitions concerning both specific instances of putative explanations and more general claims about what makes these instances explanatory. These intuitions are not uncontroversial. This is an important fact that probably separates the theory of explanation from other fields of philosophy, such as the theory of meaning, where one seldom finds philosophers disagreeing about the meaning of “the snow is white” in the sense that they behave differently in similar situations when confronted with the truth of the proposition. In the theory of explanation, by contrast, whether we consider a putative explanation complete or merely a probable sketch may well have important consequences for how we, as scientists, go about our research. James Woodward discusses another division of the approaches towards an account of explanation and causation. In Woodward (2003) he presents what he calls a non-reductive account of both concepts, which he contrasts with reductive accounts, characterized as follows:
I take the general idea [of a reductive account] to be that concepts like “cause” and “explanation” belong to an interrelated family or circle of concepts that also include notions like “law”, “physical possibility” and other modally committed notions. (Woodward 2003, p. 20)
An account is reductive if it analyzes concepts in this family solely in terms of concepts that lie outside of it. (Woodward 2003, p. 20)
This division seems to point at something more controversial. When Wesley Salmon revised his process-theory of causation, as presented in Salmon (1984), he did so on the grounds that it was not reductive. Arguments by Phil Dowe and Philip Kitcher, among others, convinced Salmon that his account of causality was in fact couched in “modally committed notions” (Kitcher 1989; Dowe 1992; Salmon 1994). A similar example can be found in earlier writings in the theory of explanation. The concept “law” played an essential role in Hempel’s characterizations of both the D-N and the I-S models of explanation, and so it appears that a similar argument can be made against Hempel, since Hempel without doubt intended a reductive theory, in Woodward’s terms. The difference lies in the fact that Hempel acknowledged his dependence on the concept of law and tried to give a similar account of it. Thus, if we are aiming for a reductive account, we must be clear as to which terms we consider acceptable for elucidating explanation and which concepts play a role in our project but need further clarification. If causation is used to explicate explanation, it too needs further clarification. If, on the other hand, we are aiming for a non-reductive theory, it must be made clear which related concept(s) we build our explications on. In Woodward’s case it is
causation, where one causal connection is elucidated in terms of other causal connections (Woodward 2003, pp. 22, 104-105).
2. Petri Ylikoski uses the following example from Garfinkel as an introduction to a discussion of contrastive explanations:
When Willie [Sutton] was in prison, a prison priest asked him why he robbed banks. Willie answered, “Well, that’s where the money is.” The joke is based on a confusion, for Willie was not answering the question the priest was asking. The priest had in his mind the question: “why do you rob banks, instead of leading an honest life?”, whereas Willie answered the question: “why do you rob banks, rather than gas stations or grocery stores?” (Ylikoski present volume; Ylikoski 2001, p. 22; Garfinkel 1981, pp. 21-22)5
This example can be used to show that we cannot discuss an explanation out of context. If we want to be able to rule out certain putative explanations as irrelevant, we must specify in what contexts they are given. The idea is that although there might be situations where Sutton’s answer does constitute an explanation, it does not do so in the present context. If we were to understand being explanatorily relevant as being part of a complete explanation, and, for example, took Hempel’s D-N model to explicate complete explanation, this example would show how we would then fail to capture an important factor concerning explanatory relevance. Let us exercise the above terminology: The example can be used as a counterexample to a naïve explanation-theory that takes Hempel’s D-N model as a definition of a complete explanation, together with the idea that any act conveying information about the complete explanation is an act of explanation. Elaborations on the example can show further things. An obviously bad explanation would be “The bankers want me to”, since this is false. The bankers did not want Sutton to rob them, and any putative explanation based on the assumption that they did would be bad due to this fact. The first (in the sense of simplest) lesson to be learned from the example is thus that the explaining proposition(s), the explanans, should be true. Now, this is in fact not an uncontroversial requirement. See for example (Cartwright 1983) concerning the requirement in physical explanations. See also (Hållsten 2001, §§44, 46 and 47) for an attempt to account for apparent violations of this requirement. This is, however, not the issue of the present paper, but before we leave the question let me briefly address another putative counterexample to the requirement. If we ever explain any human behaviour, false beliefs are certain to play more or less important roles. If Sutton believed that the bankers wanted him to rob their banks, this would certainly be explanatorily relevant.
But to construe this as a counterexample to the requirement that an explanation must be true is to blur the distinction between the true proposition that Sutton
5
I have used this example before, but with the details all wrong, and I would like to thank Petri for sorting them out for me.
believed the bankers wanted him to rob them, and the false one that the bankers wanted Sutton to rob them. A third example of a bad explanation would be “The cashiers are all blond”; this would be so even if it were true that they were blond. Of course, Sutton could have a psychological disposition such that this putative explanation would be good, but let us, for argument’s sake, assume that he did not. Let us assume that the cashiers were all blond but that this had no bearing on the fact that Sutton chose to rob banks. And in order to avoid confusion, let us assume that this bad explanation is given not by Sutton but by someone else. This putative explanation would then be irrelevant in a different way from his original answer. It is possible that there are explanation-consumers for whom Sutton’s original answer about the money would be explanatory, but given the assumed facts, the blondness-information cannot be genuinely explanatory for any explanation-consumer. Whereas our first variation of the original example had to do with the truth of the explanans, this variation deals, just like the original example, with the relevance of the explanans. But the missing relevance is different in the original example and in our last version of it. This is so because there are at least two ways in which a putative explanans can be irrelevant. The first case of irrelevance is obviously connected to a specific context, but this is not so for the second type of irrelevance. If the putative explanans is irrelevant as in the original example, the explanatory claim is irrelevant for the present consumer; but if the putative explanans is irrelevant as in this last version, the explanatory claim is simply false, and the putative explanans is irrelevant for any consumer. These two notions of relevance are arguably important, and any full-blooded explanation-theory must do justice to them.
Thus, I claim that a putative explanation can fail by being either (i) false, (ii) objectively irrelevant or (iii) contextually irrelevant. In (ii) the explanatory claim is false, but in (iii) the explanatory claim is irrelevant for present purposes. As the answer “that’s where the money is” shows, something can be objectively relevant but contextually irrelevant. But as “the cashiers are all blond” shows, objective relevance is a necessary condition for contextual relevance. A similar division can be found in (Ylikoski present volume), with (i) corresponding to his factual failure, and (iii) corresponding to his pragmatic failure and perhaps also to his “failure to correct incorrect presuppositions of the explanatory question”. But (ii) lacks any counterpart in Ylikoski’s division. (This does not necessarily point to anything other than different but compatible focuses of our respective articles.) To summarize: Any explanation-theory must do justice to the distinction between objective explanatory relevance and context-dependent explanatory relevance, or provide good arguments as to why this distinction should not be upheld. If the distinction is not to be upheld, the proponent of such an explanation-theory must show how it is reasonable that “Well, that’s where the money is.” is irrelevant in the same sense as “The cashiers are all blond.” Since this seems very hard to do, given that we can imagine someone for whom the first answer is
explanatory, whereas nothing similar seems possible for the other answer, I will from now on make use of the distinction.6
3. It is the objective relevance relation that was the focus of the early philosophers in the theory of explanation, most notably Hempel, whose D-N model can be seen as an attempt to explicate the objective relevance relation. Note that Hempel also provided a long discussion of pragmatic issues in chapters 4 and 5 of “Aspects of Scientific Explanation”. However, when Hempel presented the first account of statistical explanation, the distinction became, as we shall see, blurred. If we construe a model of statistical explanation such that only a relaxation of the deductive requirement sets it apart from a deductive one, we run into the problem that Hempel called statistical ambiguity. Let us illustrate this with Hempel’s own example. Jones (j) has a streptococcal infection (Sj), gets penicillin (Pj) and recovers (Rj). The problem with accepting the argument
$$
\begin{array}{l}
p(R \mid S \wedge P) = r \\
Sj \wedge Pj \\
\hline
Rj
\end{array}
\qquad [r], \quad r \text{ is close to } 1
$$
as an explanation of the recovery is that everything in the putative explanans is compatible with our knowing that Jones’ infection is penicillin-resistant (S*j) and that the following argument is also sound:
$$
\begin{array}{l}
p(\neg R \mid S^{*} \wedge P) = r^{*} \\
S^{*}j \wedge Pj \\
\hline
\neg Rj
\end{array}
\qquad [r^{*}], \quad r^{*} \text{ is close to } 1
$$
since both Sj and S*j can be true, and known to be true, at the same time. But though the premises of the two arguments are consistent with each other, their conclusions are not. Were we to disregard the importance of ambiguity, we would have to claim that our first argument constitutes an explanation of the low-probability event of Jones’ recovery from a penicillin-resistant infection after he has been given penicillin.7 Hempel’s solution was the requirement of maximal specificity:
6
The above distinction is fundamentally the same as the one introduced in (Kitcher and Salmon 1987) in a critique of van Fraassen.
7
That is, Jones has a penicillin-resistant infection, gets penicillin and recovers. (To make the example slightly more realistic, we might qualify “recovers” as “recovers within a week”.)
if the total state of knowledge, K, implies that a belongs to subclass F* of F, then K must also imply a statement specifying the probability of G given F*, $p(G \mid F^{*}) = r_1$, and $r_1 = r$. (Hempel 1965, pp. 399-400)
Thus, when considering whether to accept something as an I-S explanation, we must take into account everything we know. If we know that Jones’ infection is penicillin-resistant, then we should not accept an explanation that is built on the administration of penicillin. Hempel was clear about the fact that this requirement made his I-S model very different from his D-N model: there is an explicit relativization to our present state of knowledge in the definition of what counts as an I-S explanation. Thus, I-S explanations are openly epistemically relativized. As Hempel puts it:
The preceding considerations show that the concept of statistical explanation for particular events is essentially relative to a given knowledge situation as represented by a class K of accepted statements. Indeed, the requirement of maximal specificity makes explicit an unavoidable reference to such a class, and it serves to characterize the concept of ‘I-S explanation relative to the knowledge situation represented by K.’ We will refer to this characterization as the epistemic relativization of statistical explanation. (Hempel 1965, p. 402, my italics)
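Hempel’s requirement, and the relativization to K that comes with it, can be made concrete with a small Python sketch. It rests on a toy encoding that is an assumption made here, not Hempel’s: a reference class is a set of attribute labels read conjunctively, so adding a label yields a narrower subclass, and the knowledge state K is reduced to the attributes known of Jones plus a table of known probabilities of recovery. The names `known_subclasses` and `satisfies_rms`, and the probabilities 0.95 and 0.10, are hypothetical. The check reads the quoted requirement literally and makes no attempt at the further qualifications a full treatment would need.

```python
from itertools import combinations

# Hypothetical encoding: a reference class is a frozenset of attribute labels
# read conjunctively, so adding labels yields a narrower subclass.
RefClass = frozenset[str]


def known_subclasses(ref_class: RefClass, known_attrs: RefClass):
    """Yield every strictly narrower class F* that K says the individual belongs to."""
    extra = sorted(known_attrs - ref_class)
    for k in range(1, len(extra) + 1):
        for combo in combinations(extra, k):
            yield ref_class | frozenset(combo)


def satisfies_rms(ref_class: RefClass, known_attrs: RefClass,
                  known_probs: dict[RefClass, float], tol: float = 1e-9) -> bool:
    """Literal reading of the quoted requirement of maximal specificity.

    ref_class   -- F, the reference class used in the I-S argument
    known_attrs -- the attributes K says the individual a has
    known_probs -- statistical statements in K: class -> p(G | class)
    """
    if not ref_class <= known_attrs:
        return False              # K does not even say that a is F
    r = known_probs.get(ref_class)
    if r is None:
        return False              # K contains no statistical statement for F
    for narrower in known_subclasses(ref_class, known_attrs):
        r1 = known_probs.get(narrower)
        if r1 is None or abs(r1 - r) > tol:
            return False          # K must imply p(G|F*) = r1 with r1 = r
    return True


F = frozenset({"S", "P"})             # streptococcal infection, given penicillin
F_STAR = frozenset({"S", "S*", "P"})  # ...and the strain is penicillin-resistant
PROBS = {F: 0.95, F_STAR: 0.10}       # hypothetical values of p(R | class)

print(satisfies_rms(F, frozenset({"S", "P"}), PROBS))             # True
print(satisfies_rms(F, frozenset({"S", "P", "S*"}), PROBS))       # False
print(satisfies_rms(F_STAR, frozenset({"S", "P", "S*"}), PROBS))  # True
```

The same argument from F passes relative to a K that omits S* and fails relative to a K that includes it, while the facts about Jones stay the same; this is the epistemic relativization at issue.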
Accepting epistemic relativity for explanation is, as we will see, a controversial standpoint. Assume that we know that Jones is infected, that he gets penicillin and that he recovers, but are oblivious to the fact that he has a penicillin-resistant infection and that his recovery is in fact highly improbable. According to the I-S model we would then have an explanation of the recovery: not a probable explanation or a sketch of one, but an explanation, full stop. With the distinctions introduced above in 1.2 it is easy to see how the I-S model fails to capture objective relevance. The penicillin is simply objectively irrelevant to the improbable recovery of Jones. No matter what our interests or state of knowledge, any putative explanation built on information about the penicillin is simply bad. Though it is true that Jones got penicillin, the explanatory claim that this explains his recovery is false, and hence it cannot be relevant. The part of the I-S model that is to be held accountable for accepting the irrelevant penicillin as explanatory is obviously its epistemic relativization, and it is to this that we now turn.
4. This flaw of the I-S model has been widely recognised. See for example Coffa (1974), Humphreys (1989), Railton (1980, 1990) and Salmon (1989). Railton, for instance, writes:
On the D-N account we can say that while being in error may lead us to accept a bad explanation, such ignorance cannot make it a good explanation; on the I-S account, we cannot draw this fundamental distinction. (Railton 1980, p. 227)
Humphreys, though not commenting directly on the I-S model, writes:
It is no explanation to provide a distorted representation of the world and the “understanding” induced by such incorrect models is illusory at best. (Humphreys 1989, p. 103)
By now it probably appears to the reader that I am pushing at an open door, but an elaboration of what it is that makes the I-S model fail will hopefully show this to be more controversial than it appears (especially concerning the theories of the above four writers). First of all, let us try to capture the implicit requirement that the I-S model fails to live up to.
The non-ER requirement on theories: An explanation-theory should not relativize what is to be counted as an explanation to a specific state of knowledge, unless the relativization has to do with the present interest and knowledge (the needs) of the explanation-consumer.8
Since the theory of explanation largely consists of elaborations on examples of putative explanations, we want to rephrase this requirement for examples of explanations. This is important since we often argue against a proposed model of explanation by presenting examples that we claim are genuine but that fail to be labelled as such by the model in question. Thus, before we start to discuss models and explanation-theories, we need some sort of agreement on the type of examples that the models and theories are meant to account for. My proposed rephrasing is the following:
The non-ER requirement on examples: If a putative explanation is to be interpreted as a genuine explanation by an explanation-theory, it must be impossible for further knowledge to show part of the explanans to be (objectively) irrelevant without also showing it to be false.
Observe that “explanation” above does not have to be a full explanation (be it the total causal history or a deductive argument); it is simply any information supplied in an explanatory answer, or used in an explanatory claim. And though further knowledge concerning the function of banks would make “that’s where the money is” irrelevant for an ignorant explanation-consumer, it would not make it objectively irrelevant.
If the reader thinks that I am presupposing too much here,9 we can resort to:
The non-ER requirement on examples*: If a putative explanation is to be interpreted as a genuine explanation by an explanation-theory, it must be impossible for further knowledge to show part of the explanans to be irrelevant without also showing it to be false, unless the putative explanation follows from the further knowledge.
The improbable recovery of Jones from a penicillin-resistant infection would then not be explained by Jones’ taking penicillin, even though we would be justified in believing so, had we not known that the infection was penicillin-resistant. And
8
Another way of putting it is that the knowledge-state of the explanation-producer should be kept out of the picture.
9
That is, the distinction between pragmatic and objective relevance discussed in section 2.
Sutton’s answer is explanatory to an ignorant consumer even though further knowledge would make it irrelevant, since the truth of Sutton’s answer is implied by the expanded knowledge. Before we make use of this requirement, let us elaborate on the terms further knowledge and impossible. The first of the terms is simple to clarify: further knowledge is a conservative extension of a knowledge state. Thus, knowledge is added but nothing is revised. What can be revised are, of course, beliefs about explanatory relations, but since I am arguing that putative explanations violating the non-ER requirement do not constitute knowledge, this is not a problem for the present discussion. One way of understanding the requirement is that it must be impossible for us to enhance our knowledge of the present world in such a way that the explanans is shown to be irrelevant. If we use possible worlds as a heuristic vehicle, then, understood this way, the requirement only sets limits on the possible worlds that differ from ours with respect to epistemic facts, facts about our knowledge. With respect to non-epistemic facts, they are all alike. With respect to the epistemic facts, the worlds must be such that everything that is known here is also known there, but in addition we have found out a few more things, and none of these things shows our putative explanans to be irrelevant. This is arguably too weak an interpretation of impossible in the present context. The world could be such that nothing that falls under the I-S model can in fact be made irrelevant by further knowledge: by chance, people never recover from penicillin-resistant infections when given penicillin, so no counterexamples exist. But saving the I-S model by lucky chance is hardly satisfactory. Thus, we need to give a stronger interpretation of the notion of impossible. Instead, the possible worlds used to express our requirement must be similar to ours as far as we now know, but free in all other respects.
Thus, we know more in them, and they might also differ with respect to the facts we know nothing about in this world. Under this interpretation, we will have a counterexample to the I-S model, and to any epistemically relativized notion of explanation, since even if our world were such that no one who recovers after penicillin in fact has a penicillin-resistant infection, the model must also work in the world where Jones lives and recovers. Given this strong interpretation, we see that the requirement of non-epistemic relativization is indeed demanding. But put simply, the demand is just that if a theory of explanation is to interpret something as an explanation, then the defining clauses must conclusively decide the explanatory relevance of the putative explanation. That is what is missing in the I-S model, and that is what the authors cited at the beginning of this section criticize; what I have done is to clarify this requirement of non-epistemic relativization in relation to an instance of an explanation.
5. As mentioned, the importance of the non-ER requirement on examples has to do with our evaluation of different explanation-theories. When Hempel advanced the I-S model, it was because he thought that there existed examples of explanations that his D-N model did not capture. And when Coffa, Humphreys, Railton and Salmon
argued that the I-S model was corrupt, they did not claim that the D-N model therefore sufficed; instead, they argued that other models were needed to capture those elusive examples of statistical explanation. As outlined in 1.1, this is the way we go about it. We claim that certain examples should fall under a concept and, together with certain other intuitions about the concept, we try to give a definition that does justice both to our intuitions about the examples and to our more general intuitions.10 The non-ER requirements on theories and examples are meant to capture the intuitions that lie behind the above quotations from Humphreys and Railton. But these authors only considered the requirements concerning theories; the examples they put forward were simply presupposed to live up to the non-ER requirement. I have argued at length in (Hållsten 2001) and (Hållsten 2004) that this is not so. The arguments therein focus on the fact that there are many ways in which an event can come about, especially if the event is undetermined. Even if one of the alternatives is the most likely, it still could have been ineffective. And if it is possible that further knowledge can show this, then we should, according to the non-ER requirement on examples, refrain from calling it a genuine explanation. If a person gets cancer, it is not necessarily so that this is due to all the exposures that might lead to cancer. The very fact that most probabilistic mechanisms are not necessary opens up the possibility that a very unlikely (and unmentioned) putative explanation is responsible, whereas the very likely putative explanation is not. Since we are dealing with indeterministic mechanisms, the fact that a mechanism is “put into action” does not determine that it will cause the effect, and the fact that there are other alternatives makes it possible that the effect was caused by one of them.
I claim that all putative explanations of probabilistic character fail on account of the non-ER requirement on examples.11, 12 The very intuition that lies behind many
10
I intend nothing deep by “intuitions”; “gut-feelings” or “pre-theoretical presuppositions” would suffice.
11
In Hållsten (2001) and Hållsten (2004) I also discuss examples in which no further knowledge can rule out one of the alternatives. My conclusion is the same there: one of them could have been responsible and the other ineffective. This conclusion is reached through different assumptions about how we should understand the important concept of probabilistic causality and is therefore too long to recapitulate here. However, examples such as these (with full knowledge) are very unlikely, and it is not they that are put forward as necessitating a concept of statistical explanation. Thus, it suffices to look at putative explanations and ask whether they live up to the non-ER requirement for examples. If this is done thoroughly, it will be obvious that they do not.
12
Rebecca Schweder uses a distinction from Hållsten (2001) between probabilistic explanations that can be improved by the addition of more information and explanations for which this is not possible (Schweder present volume). As Hållsten (2001) is a plea for deductivism, i.e. the non-existence of genuine probabilistic explanations, my distinction should be regarded as holding between putative explanations. The above argument, as well as some of those in Hållsten (2001) and (2004), can then be presented as follows: the first part consists in establishing that there are no explanations of the second type; the second part consists in showing that if a putative explanation is of the first type, then the additional information can not only enhance it but also disqualify it, as in the example with Jones when we are oblivious of the fact that his infection is penicillin-resistant. (Hållsten (2001) and (2004) also show that if an explanation were of the second type, then the deductivist would have a viable position in claiming that it is the objective probability that is explained.) Contrary to what is argued in (Schweder present volume), it does not matter for the validity of these arguments whether or not the putative explanans consists of “genuine laws of nature” or “mere generalisations”.
H. HÅLLSTEN
philosophers’ argument against the I-S model also rules out putative examples of probabilistic explanation as genuine. The non-ER requirement is so strong that most instances of what we under normal circumstances regard as explanations should be regarded as mere probable sketches of explanations, or partial explanations. This has nothing to do with the fact that most of the information that we supply as explanatory is in fact falsifiable. In the Jones example it is not falsification of the putative explanans that disqualifies the penicillin-explanation of the unlikely recovery; it is the fact that the recovery was brought about in some other way. This can be illustrated with the Lewis-type example of Gjelsvik: “There has been a crash. There is the icy road, the drunk driver, the bald tyre, and much more.” (Gjelsvik present volume) As long as neither the icy road, the bald tyre nor the drunk driver necessitates the crash, or necessitates their cooperation in bringing about the crash, any one of them could have been explanatorily irrelevant. The driver could have been so drunk that he simply drifted over into the wrong lane without any attempt at stopping, or the ice could have been so hidden and slippery that any driver on any tyre would have crashed. If investigators of the crash discover that no attempt was made to stop or steer away, they can presumably rule out the bald tyre. Thus, the original example is epistemically relativized.13 If we want to present an example of an explanation, we must ask ourselves whether further knowledge might show that, although everything in the putative explanation is still believed to be true, it is simply not (objectively) relevant to the explanandum. If this is possible, then we should consider the example a probable explanation, where “probable” is to be understood as “the information is true and probably explanatorily relevant”.
Even without having gone through any particular example of a putative explanation, I now assume that the reader no longer considers the non-ER requirement obvious, nor views the above as “kicking in open doors”. Instead, it should be clear that if we take the non-ER requirement to heart, this has severe consequences. But before we go on to give a better argument for the requirement than those found in the quotations from Railton and Humphreys above, let us consider what might appear to be an easy way out. A causal relation between the explanans and the explanandum is often assumed to be necessary: in order for A to explain B, A must at least be part of B’s cause. If information about this causal link is added to the explanans, then it is obvious how further knowledge that disqualifies the explanans also falsifies it. If “The taking of penicillin caused the recovery” is added to the corrupt explanation of Jones’ recovery, this will be falsified by knowledge showing that the infection was penicillin-resistant. By adding a causal claim to every
13 as long as the probabilistic laws of nature violate the sine qua non requirement. If the explanandum could possibly be brought about by something other than the putative explanans, then further information could show the putative explanans to be irrelevant. And the antecedent of this claim is true for all attempts to explicate probabilistic laws or mechanisms, as well as for all putative examples of probabilistic explanation. Thus, putative examples of probabilistic explanation violate the non-ER requirement. As with my use of Ylikoski’s example, I have not used Gjelsvik’s example as part of an argument against Gjelsvik’s paper.
WHAT TO ASK OF AN EXPLANATION-THEORY
putative explanation, something in the explanans would then be shown to be false if further knowledge disqualified the explanation. The problem is how to understand this causal claim. Either it is understood as something that cannot be analysed further, and hence cannot be justified without other causal claims, or causality can be analysed in a similar way to explanation. Since a non-ER requirement similar to the one concerning explanation can be posed concerning causality, the second option does not solve the problem. The requirement would then simply shift place, and any example of a causal claim would have to be interpreted as a probable claim, as above. The first option seems to be in accordance with Woodward’s attempt, as introduced in section 1.1. It is non-reductive in the sense that explanation is analysed in terms of a fellow concept without a reductive analysis of that fellow concept. But unless knowledge can be claimed about the other causal connections that elucidate the connection in question, this does not help us either. The example would still fail the non-ER requirement.
6. Considering how strong the non-ER requirement seems to be, it may be tempting to give it up. Above, I have not given any arguments for it but have merely quoted authors who endorse it. I find Railton’s point concerning our inability to distinguish between errors leading us to accept bad explanations and errors making something an explanation very convincing. And Humphreys’ reasoning concerning the “understanding” induced by incorrect models is certainly convincing at face value. But in the face of the grave consequences we might be willing to give the requirement up. I will end by sketching an argument against giving up the non-ER requirement.
More than an ordinary argument, however, it is an attempt to show that giving up the requirement puts us in bad, or in fact no, company concerning another hotly discussed issue in philosophy, namely the Gettier-examples in the theory of knowledge. In Gettier (1963), counterexamples were presented to the account of knowledge as justified true belief. Assuming nothing more than that justification need not be conclusive (we can be justified in believing something that is false) and that deduction transfers justification (if we are justified in P and can deduce Q from P, then we are justified in Q), Gettier presented examples that satisfy the justified-true-belief account of knowledge but fly in the face of our intuitive grasp of the notion. Gettier’s two cases, which have been followed by many others, certainly appear contrived, but they are nevertheless effective in casting doubt on the justified-true-belief account. One way of saving the account is, of course, to bite the bullet and accept that Smith does know that “The man who will get the job has ten coins in his pocket” when it is true that Smith will get the job and that he has ten coins in his pocket, as long as Smith is justified in believing that Jones will get the job and that Jones has ten coins in his pocket. As far as I know, nobody has explicitly defended this position.
Let us now assume that Gettier’s Jones is identical with Hempel’s Jones and that Smith is his doctor. Smith has given penicillin to Jones and is now justified in believing that Jones has recovered, owing to the high probability of recovery after penicillin administration. As before, Jones has recovered. Smith is therefore justified in the true belief that Jones has recovered. But, unknown to Smith, Jones made an improbable recovery from a penicillin-resistant infection. Thus, Jones has recovered and Smith is justified in believing so. I hope that the elaborations of the above examples have been enough for the reader to appreciate this as a Gettier-like example. Accepting the I-S explanation of Jones’ improbable recovery would then be a move very similar to accepting that Smith has acquired knowledge in the Gettier-like example. And giving up the non-ER requirement would then be to give up a way of blocking the “Gettier-explanations” that epistemically relativized explanations constitute. If we look at the commentary literature on the Gettier-examples, an even stronger connection can be detected. Robert Fogelin argues that the central feature of Gettier-examples is non-monotonicity:
My suggestion is then that the central feature of Gettier’s original examples is this: Given a certain body of information, our subject S, using some standard procedure, justifiably comes to believe that a proposition, h, is true. We are given wider information than S possesses, and in virtue of this wider information see that S’s grounds, though responsibly invoked, do not justify h. (Fogelin 1994, p. 22)
I will not defend this characterization here, but I note its similarity to our present issue. What we are interested in is the situations in which we want to claim that some information is explanatory. If S uses an epistemically relativized notion of explanation as the “standard procedure” for determining this, S runs the risk of labelling something as explanatory when we “in virtue of this wider information see that S’s grounds, though responsibly invoked, do not justify” this explanatory claim.14 The similarities between the Gettier-examples, in Fogelin’s interpretation, and our epistemically relativized I-S explanations are striking. And if it is in fact true that practically all putative probabilistic explanations are epistemically relativized, then endorsing them as genuine explanations is similar to endorsing Gettier-examples as examples of genuine knowledge. As far as I know, nobody takes this position in the theory of knowledge, but in the theory of explanation its sister position is not as widely rejected. Either it is openly endorsed, as in the case of Hempel’s I-S model, or it is smuggled in through the back door, since the examples used to boost confidence in the presented models could in fact be
14 Another way of appreciating the similarities is to try to capture what is wrong with the Gettier-examples without referring to individuals as possessors of knowledge. Knowledge is then seen as a relation between true and believed-to-be-true propositions. (Or, to be more precise, a relation between sets of true and believed-to-be-true propositions on the one hand, and true and believed-to-be-true propositions on the other.) A suitable requirement concerning knowledge could then be the non-monotonicity requirement on knowledge examples: if h is true as well as believed to be so, then if h is to be interpreted by a knowledge-theory as known in a specific state of knowledge, it must be impossible for further knowledge to show h to be unknown.
“Gettier-explanations”. The explanans in any putative example of probabilistic explanation might just turn out to be irrelevant in the same way as the penicillin in the case of Jones’ recovery. Endorsing the presented examples as genuine explanations is similar to endorsing the penicillin in the recovery example, and this is similar to endorsing a Gettier-example as genuine knowledge.15
7. To recapitulate: an explanation-theory must do justice both to our intuitions concerning the interpretation of particular putative explanations and to certain more basic intuitions about what can be demanded of an explanation. These intuitions are constantly re-evaluated in the light of our explanation-theory, as well as playing a vital role in evaluating our theories. The often-invoked requirement of non-epistemic relativization of explanation-theories can be shown to be a very strong requirement. In fact, it forces us to interpret most putative explanations as merely probably explanatorily relevant. But it is a legitimate requirement, as is its sibling position of not accepting Gettier-like examples as genuine knowledge. It seems clear from the above considerations that what is lacking in the theory of explanation is more focus on what constitutes a probable partial explanation, or an explanation sketch. And it bears repeating that the reason for dubbing the explanation probable is not that the explanans is falsifiable, but that the explanatory claim is falsifiable without falsifying the explanans. If we aspire to capture the notion of explanation, to do justice to the non-ER requirement and to interpret explanatory acts in science, we need a notion of probable explanation. Furthermore, we need to sort out the ways in which scientists show putative explanatory information to be more or less probable, since this is often an issue even when the truth of the information is undisputed.
If, for argument’s sake, we assume Hempel’s position of holding on to the D-N model while trying to accommodate certain putative explanations that do not fit it, then I argue that we should not introduce a new model, as Hempel seemed to do, but should instead advance our theory so as to accommodate these putative explanations as probable partial explanations rather than as full explanations of a new kind. Regardless of which explanation-theory we adhere to, the non-Epistemic Relativization requirement should be addressed. If we hold on to the requirement, we should interpret putative explanations accordingly.
15 It has often been argued that, just as we cannot restrict knowledge to those things we can justify with conclusive certainty, we cannot restrict explanation to those things we can predict with conclusive certainty. The above considerations show that we get the same problem in the theory of explanation as has been encountered in the theory of knowledge. When we do not require conclusiveness of justified beliefs and explanations, many particular justifications and putative explanations might be the wrong ones. (It should be pointed out that it is not my intention here to solve the Gettier-problems, and especially not by requiring conclusiveness of justification.)
REFERENCES
Cartwright, N. (1983). How the Laws of Physics Lie. Oxford: Clarendon Press.
Coffa, A. (1974). Hempel’s Ambiguity. Synthese 28: 141-163.
Dowe, P. (1994). Wesley Salmon’s Process Theory of Causality and the Conserved Quantity Theory. Philosophy of Science 59: 195-216.
Dummett, M. (1991). The Logical Basis of Metaphysics. Massachusetts: Harvard University Press.
Fogelin, R. (1994). Pyrrhonian Reflections on Knowledge and Justification. Oxford: Oxford University Press.
Garfinkel, A. (1981). Forms of Explanation. New Haven: Yale University Press.
Gettier, E. (1963). Is Justified True Belief Knowledge? Analysis 23(6): 121-123.
Gjelsvik, O. (present volume). Causal Explanation Provides Knowledge Why.
Hempel, C. G. (1965). Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: The Free Press.
Humphreys, P. (1989). The Chances of Explanation. New Jersey: Princeton University Press.
Hållsten, H. (2001). Explanation and Deduction; a defence of deductive chauvinism. Stockholm: Almqvist&Wiksell.
Hållsten, H. (2004). The Explanatory Virtues of Probabilistic Causal Laws. In Faye, Needham, Scheffler and Urchs (eds.): Nature’s Principles. Springer: 157-170.
Kitcher, P. (1989). Explanatory Unification and the Causal Structure of the World. In Minnesota Studies in the Philosophy of Science XIII. Minneapolis: University of Minnesota Press: 410-508.
Kitcher, P. and W. Salmon. (1987). Van Fraassen on Explanation. Journal of Philosophy 84: 315-330.
Railton, P. (1980). Explaining Explanation: A Realist Account of Scientific Explanation and Understanding. Ph.D.-thesis, University Microfilms International.
Railton, P. (1990). Taking Physical Probability Seriously. In Salmon (ed.): The Philosophy of Logical Mechanism: 251-283.
Salmon, W. (1989). Four Decades of Scientific Explanation. In Minnesota Studies in the Philosophy of Science XIII. Minneapolis: University of Minnesota Press: 3-219.
Salmon, W. (1994). Causality Without Counterfactuals. Philosophy of Science 61: 297-312.
Salmon, W. (1998). Causality and Explanation. Oxford: Oxford University Press.
Schweder, R. (present volume). Some Observations on Unificationism and Probabilistic Explanation.
Woodward, J. (2003). Making Things Happen. Oxford: Oxford University Press.
Ylikoski, P. (2001). Understanding Interests and Causal Explanation. Helsinki: http://ethesis.helsinki.fi.
Ylikoski, P. (present volume). The Idea of Contrastive Explanandum.
THE IDEA OF CONTRASTIVE EXPLANANDUM
PETRI YLIKOSKI
In this paper, I will discuss the idea of the contrastive explanandum. I will restrict my discussion to singular causal explanation, but the basic ideas and arguments have a broader application: they are also relevant to other kinds of explanation. In the first section I will present the intuitive idea of contrastive questions, and then elaborate it by discussing typical criteria for the choice of a contrast. I also suggest a novel way of seeing the difference between scientific and everyday explanatory questions. In the second section I will discuss the major criticisms presented against contrastive theories of explanation in order to further clarify my position. I argue that all explananda can be analyzed as contrastive and that this is a fruitful approach to understanding explanatory questions. I also argue that the contrastive thesis should be understood as a claim about what an explanation can explain, not as a thesis about what the explainee has in mind. Finally, I defend the thesis that a contrastive explanandum can be reduced to a non-contrastive explanandum against the arguments presented by Dennis Temple and John W. Carroll.
1. THE CONTRASTIVE EXPLANANDUM
A famous joke about the bank robber Willie Sutton introduces the basic idea of contrastive explanation. When Willie was in prison, a prison priest asked him why he robbed banks. Willie answered, “Well, that’s where the money is.” The joke is based on a confusion, for Willie was not answering the question the priest was asking. The priest had in mind the question “Why do you rob banks, instead of leading an honest life?”, whereas Willie answered the question “Why do you rob banks, rather than gas stations or grocery stores?” This is the basic insight of the contrastive approach. We do not explain simply ‘Why f?’; rather, our explanations are answers to the contrastive question ‘Why f rather than c?’ (Garfinkel 1981, pp. 21–22.)
Instead of explaining plain facts, we are explaining contrastive facts. Several philosophers of explanation have used the same basic idea. (For example, Hart and Honoré 1959; Hansson 1975; van Fraassen 1980; Garfinkel 1981; Hesslow 1983; Woodward 1993 (originally published in 1984); Lewis 1986; Sober 1994 (originally published in 1986); Temple 1988; Lipton 1990, 1991, 1993; Barnes 1994; Hitchcock 1996, 1999; Carroll 1997, 1999; Risjord 2000.) I will follow their lead and try to develop the contrastive idea a little further.
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 27–42. © 2007 Springer.
P. YLIKOSKI
In the following discussion I am not committed to any specific theory of explanation-seeking questions. (See van Fraassen 1980; Tuomela 1980; Sintonen 1984; Koura 1988; Hintikka and Halonen 1995.) Some advocates of the contrastive approach subscribe to the thesis that all explanatory questions are why-questions (van Fraassen 1980). The account presented here includes no such commitment. The thesis that explanations are answers to questions should be kept separate from the thesis that all explanation-seeking questions are why-questions. As I see it, the same explanatory request can often be made using various linguistic devices (Scriven 1959, p. 451). For example, in some cases a how-question is a more natural way of making a contrastive explanatory request than a why-question. From the point of view of my account, it is not essential whether every explanation-seeking question is a why-question or whether all explanation-seeking questions can be paraphrased as why-questions. As Markwick (1999, p. 191) has noted, there is no deep commitment to any specific question-theoretical approach among most supporters of the contrastive theory of explanation. I will continue this tradition of non-commitment. Contrary to a common misunderstanding, the erotetic approach to explanation is fully compatible with a realist account of explanation. It is not confined to the pragmatics of giving explanations, nor does it commit one to the kind of explanatory subjectivism advocated by Bas van Fraassen (1980). As I have argued earlier (Ylikoski 2001), the contrastive account is completely compatible with the view that the aim of explanations is to track objective relations of dependency in the world. Explanations are about things in the world, and there is more to an explanation than the recipient’s being satisfied with it, as the extreme pragmatic theory of explanation would have it. It is possible for the recipient to be wrong in accepting a certain answer as an explanation.
Although the question is wholly up to the recipient, the adequacy of the answer is not. In fact, the idea that explanations are contrastive is natural if one thinks that the aim of explanation is to trace relations of dependency and that questions are a crucial element in understanding explanation. In this context the idea that explanations are answers to ‘what-if-things-had-been-different’ questions (Woodward 1993, 2003) is quite natural. We wish to know how a change in the causes brings about a change in the effects. We want to know what makes the difference, and to leave out the factors that do not. Furthermore, if there is no change in the effects when we make changes to our putative explanatory factors, then we do not have truly explanatory factors, since there is no appropriate relation of dependency. The contrastive idea works nicely with our preferred form of causal inquiry: the scientific experiment. When we are looking for explanations using the methods of experimental inquiry, we are basically working in a contrastive setting. For example, we contrast the control group with the experimental group, or the process after the intervention with the process before it. In both of these cases, we are trying to account for the differences between the outcomes. The basic idea is to keep the causal background constant and bring about changes in the outcomes by carefully controlled interventions. The same contrastive setting also works in comparative research, which is our second option when experiments are impossible.
Clearly, by adopting the contrastive idea, we are starting from a very intuitive and central feature of our cognition (Hilton 1995). It can be speculated that our contrastive explanatory preferences stem from our nature as active interveners in natural, psychological, and social processes. We want to know where to intervene to produce the changes we want, and this knowledge often presupposes answers to some why- and how-questions. Without this knowledge we would not know when the circumstances are suitable for our intervention, and we would not be able to predict the results of our interventions. I am not claiming that we can reduce the notion of explanation to its origins in agency. We also want explanations for things that we cannot manipulate. However, our instrumental orientation might still explain why our explanatory preferences are as they are. Causal explanations face the problem of explanatory selection (Hesslow 1983). We have to pick the right aspects of the causal process to be included in the explanation. Not all causal information is explanatory information. Usually, the causal history of an event includes a vast number of elements and aspects that are not explanatorily relevant to the explanation-seeking question we are addressing. We want only the items that make a difference to the things we are interested in. But the problem of explanatory selection is wider than that, for it also requires that the explanans be described in the right way. Things can be described in various ways and at various levels of abstraction, and the challenge is to find the right way and the right level for the explanation at hand. The crucial advantage of the contrastive approach is that it allows us to be specific about the explanandum. As a consequence, it can be profitably used in the analysis of apparently competing explanations. As it will turn out, many apparently competing explanations are in reality complementary explanations.
2. VARIETIES OF CONTRASTIVE EXPLANATION
There are various sorts of contrastive explanation, depending on the nature of the explanandum. They all share two basic ideas of contrastive explanation: in all of them, the explanation traces objective relations of dependence, and it is seen as an answer to a contrastive question. These explanations differ in terms of their explananda and in terms of the kind of dependence relationship they track. First, there is singular causal explanation, which is the topic of this paper. In singular causal explanation we explain facts about events in terms of facts about the earlier development of the causal process in question. For example, we explain facts about a car accident by referring to facts about events and circumstances that occurred before the accident. The explanation aims to select the relevant facts from the causal history, the measure of relevance being their contribution to the difference between the fact and its foil. I will return to this explanatory setting shortly. Second, there is singular explanation of an instantiation of a property. For example, we might explain the fragility of a glass by referring to some facts about its molecular structure. This explanation does not directly refer to causal processes, or
causal dependency, between molecular structure and fragility, but to the relations of dependency between the properties. The fact that a glass has a certain molecular structure constitutes the fact that it is fragile. Certain facts about the molecular structure make the difference between being fragile and being something else. They also determine the specific way in which the glass is fragile. As not all facts about the molecular structure of the glass are relevant to its fragility, we face a problem of explanatory selection similar to the one in singular causal explanation. Furthermore, as there might be, and probably are, many different ways to constitute fragility, our singular explanation of property instantiation is not necessarily a general explanation of the property of fragility. (For an account of this kind of explanation, see Cummins (1983).) Regularities and laws are the third important class of explananda to which the contrastive model of explanation is applicable. In such cases we are interested in the dependence between regularities and more fundamental laws and mechanisms. The laws or regularities to be explained are the way they are because the more fundamental laws and mechanisms are the way they are. The contrastive approach also works here: when explaining laws and regularities, we are explaining why they are the way they are, rather than otherwise. This short survey of various kinds of explanation is not intended to be exhaustive. The central point is that the intuitions behind the contrastive approach are general intuitions about explanation, not ad hoc specifications made to suit the case of singular causal explanation.
3. INDICATING THE CONTRAST
There are various linguistic means of indicating the contrast in an explanation-seeking question (van Fraassen 1980, p. 128; Garfinkel 1981, pp. 25 and 29; Sober 1994, p. 176). For example, in English we can ask:
Why f rather than c?
Why f and not c?
Why f instead of c?
Given that f or c, why f?
The contrast can also be indicated by a combination of emphasis and non-linguistic contextual cues. Sometimes the contrast is so obvious from the context that there is no need to indicate it at all. The existence of alternative linguistic means even within a single language suggests that there is no single privileged way of indicating the contrast (contra Carroll 1997, 1999). In order to simplify the presentation, and to avoid generalizing the curiosities of one particular linguistic way of indicating the contrast, I will denote the contrastive explanandum by f [c]. Here f is a fact, and c is an incompatible alternative (a foil) to f. This notation allows one to state the intended explanandum more clearly, or at least more economically. The usual linguistic devices can be ambiguous and clumsy in the complicated situations that arise in philosophical discussion. The use of technical notation also underlines the difference
between understanding the structure of the explanandum on the one hand, and the pragmatics of locutions like ‘… rather than …’ in the English language on the other. Some of the pioneers of the contrastive approach (for example, Bas van Fraassen and Alan Garfinkel) speak about a contrast class instead of a single contrast. I think this way of speaking is misleading from the point of view of analyzing explanation. The basic unit of explanation is always an explanation of a singular f [c]. The contrast class f [c, c*, c**, etc.] can always be understood as a cluster of simpler explananda, and partitioned into them: f [c], f [c*], f [c**], etc. This observation does not rule out that, in practice, a given piece of explanatory information can explain more than one contrast. In such a case it simply contains more than one explanation: a single answer can sometimes answer more than one question. It is also possible for a whole group of contrasts to be equivalent from the point of view of explanation. In this case the same explanatory information explains all of the explananda f [c], f [c*], f [c**]. Cases like this can be found, for example, in equilibrium explanations. However, neither the excess of explanatory information nor the explanatory equivalence of some contrasts is an argument against the idea that the basic unit of explanation is the singular contrastive fact.
4. THE IDEAL EXPLANATORY TEXT
There are two basic approaches to identifying the basic unit of explanation, and they are often seen as incompatible. The issue concerns the standard for the completeness of an explanation. The erotetic approach advocated in this paper regards an answer to an explanation-seeking (contrastive) question as the basic unit of explanation. For advocates of the other view, this is merely an observation about the pragmatics of requesting and giving explanations.
They separate answers from explanation proper by saying that the explanations people actually give merely provide information about the real, and more complete, explanation. Peter Railton (1978, 1981; see also Hållsten 2001) calls this more inclusive unit of explanation an ideal explanatory text. According to Railton, the basic goal of science is the subsumption of particular facts and regularities under the ‘nomic nexus’. This is achieved by fitting the world’s phenomena into a fully general and comprehensive theory. In Railton’s vision, the ideal for which science strives is the ‘ideal explanatory text’, which would be able to explain every aspect of the phenomenon under consideration and would also have a deductive-nomological structure. He notes that “... plainly there is no question of ever setting such an ideal text on paper” (Railton 1981, p. 247), but he wants to underline the regulative role of such an ideal. The aim of science is to develop a capacity to provide material for such ideal explanatory texts. I do not want to discuss the merits of Railton’s ideal as a description of the goal of scientific knowledge. He clearly describes a picture of science that is accepted by many philosophers of science, but the validity of this picture is not at issue here. The question concerning the ideal form of scientific knowledge is separate from the question ‘What makes a given explanation explanatory?’.
P. YLIKOSKI
The trouble with the idea of the ideal explanatory text is that it does not give us any hints about the principles that govern the choice of explanatory information. Real-life explanations are said to provide information about the ideal explanatory text, but Railton himself admits that he has no conceptual tools to cash out this idea (Railton 1981, p. 244; Woodward 2003, p. 175-181). Obviously, his approach requires additional ideas to handle this central problem, and in practice only the erotetic approach can supply them. As a consequence, in order to have a full account of explanation, the advocate of the ideal text approach needs both ideas. In this sense the two approaches are not real alternatives. The situation is different from the point of view of the erotetic approach. Its supporters can claim that they can have all the advantages of the ideal text approach without accepting any of its philosophical presuppositions. At least, this is what I would like to argue. The basis of my argument is that the concepts of a complete explanatory text and an ideal explanatory text can be characterized using the idea of a contrastive explanandum. A complete explanatory text for the fact f would contain all the information required for answering any possible contrastive question f [x] about f. Constructing the complete explanatory text would involve explaining why f against all possible contrasts. The concept of the ideal explanatory text would be even more ambitious: it would combine complete explanations for every fact f about some event e. The ideal explanatory text would literally explain everything about e. With the help of these concepts, the advocate of the erotetic approach can, if she wishes, say that the ideal aim of science is to provide ideal (or complete) explanatory texts, as Railton has suggested. She can also claim that the erotetic approach is more fundamental.
The elements of the ideal explanatory text for the event e are explanatory because they are parts of adequate answers to some contrastive questions about e. However, there is no way in which the supporters of the ideal text approach can accomplish a similar derivation. This asymmetry between the two approaches suggests that the erotetic approach is conceptually more fundamental.
5. EXPLANATORY ADEQUACY AND FAILURE
We have an adequate explanation of f [c] when we have explained why f happened rather than the intended contrast c. An explanation is inadequate when it does not explain f [c]. An explanation can fail in various ways (Lewis 1986, p. 226-228). We can distinguish between a broad and a narrow sense of explanatory failure. An explanation fails in the narrow sense when the offered explanation does not fulfill the requirements of an adequate explanation (see Ylikoski 2001, Chapter 2). For example, an explanation can fail by being partial. In such a case, the provided information is explanatorily relevant and true, but the explanation needs to be supplemented to be fully adequate. This notion is especially important in the case of probabilistic explanation (Ylikoski 2001, Chapter 3), but its use is not limited to probabilistic contexts.
THE IDEA OF CONTRASTIVE EXPLANANDUM
In the broad sense an explanation can fail in two different ways. First, the explanation might provide misinformation: it can claim things that are not true. Let us call this a factual failure. Although in practice it can often be very difficult to determine the facts of the matter, misinformation is not a big theoretical problem for the theory of explanation. A distinction between a possible explanation and the true explanation can be useful for avoiding confusion. A possible explanation satisfies all the other criteria of a good explanation except for the truth requirement. If it were true, it would explain the explanandum; it fails for purely factual reasons. Another kind of factual failure is the failure to correct incorrect presuppositions of the explanatory question. In such cases the explainer fails to point out to the recipient that her question rests on premises that are not true. This failure could also be classified as a pragmatic failure, but I think it is clearer to treat it as a factual failure. The second possible explanatory failure in the broad sense is pragmatic failure. In these cases the explainer does not provide what the recipient of the explanation wants: for some reason, there is a communication breakdown between the explainer and the recipient. Such failures occur in all forms of communication, so they are not unique to the communication of explanatory information. Again, a pragmatic failure can occur in a number of ways. First, the explanation can answer the wrong question. The explanation could be perfectly good, but it does not address the question that the recipient wants answered. It might be that the recipient already knows that answer, or she simply does not care about that particular question. Second, the explanatory information might be in a form that the recipient cannot understand.
The explanation might be so technical, or the vocabulary so full of jargon, that the recipient cannot cope with it. The explainer might also presuppose background knowledge that the recipient does not have, which leads to a failure to understand. The third way to fail pragmatically is to provide the explanatory information in such a form that the recipient cannot separate it from all the other information provided. In such cases, the explainer provides the explanatory information along with so much other information that the recipient cannot disentangle them. A similar failure happens when I ask for Frank’s telephone number, and someone gives me all the numbers in the telephone book.
6. HOW DO CONTRASTS ARISE?
Eric Barnes (1994, p. 37) warns against taking a linguistic approach to the generation of contrasts. He notes that most writers on contrastive explanation have used substitutional transformations of sentences to generate the contrasts:
Halonen won the 2000 Finnish presidential election [Aho won the 2000 Finnish presidential election, Hautala won the 2000 Finnish presidential election, …]
This way of presenting contrasts is obviously both pedagogically and stylistically practical, but it can give the misleading impression that this is the right or the only way to generate contrasts. For example, the following two suggestions are sensible contrasts, but they cannot be generated by substitutional transformation:
Halonen won the 2000 Finnish presidential election [The 2000 Finnish presidential election ended in a tie]
or
Halonen won the 2000 Finnish presidential election [The results of the 2000 Finnish presidential election were nullified]
Clearly, our focus should be on the contrasted states of affairs, not on their linguistic representations. But how do we arrive at these contrasting states of affairs? There is more than one way to come up with a contrast. It can arise through imagination, through comparison, or through a combination of the two. In everyday life, explanatory questions most often arise as a consequence of an abnormal or unexpected incident. Something abnormal happens and raises our curiosity; we want to know why it happened. In such situations the choice of the contrast is obvious: we contrast the abnormal occurrence with the normal or expected state of affairs (Hart and Honoré 1959, p. 31-38). In his important 1983 paper, Germund Hesslow distinguishes five different ways in which the contrast can arise. It is useful to summarize these cases, since they illustrate various senses of ‘normality’ in an explanatory context. First, the contrast can be the statistically normal case. For example, when we ask why a particular barn caught fire, we are asking what distinguishes the barn under consideration from other barns. When we consider these other barns, which are made of similar materials, placed similarly and used similarly, we find that most barns of this type have not burned. This fact is our contrast. We will be looking for causal factors that are present in the case of our burned barn, but not in the statistically normal case (Hesslow 1983, p. 95). Second, the contrast can be the temporally normal case.
When we ask why the barn caught fire at some particular time, we are comparing the barn at the time of the fire with the same barn at earlier times. So here we are not comparing different but similar objects, but the same object at different times. In this case we will be looking for some change in the conditions to account for the change in the state of affairs we are interested in (Hesslow 1983, p. 95). The third possible contrast is a theoretical ideal. Here the contrast does not arise from the observation of a difference, but from the predictions or assumptions of our theory. A theoretical account gives us a kind of ‘default’ contrast, and the use of this kind of contrast facilitates the systematization of the field covered by the theory. Hesslow mentions Max Weber’s ‘ideal types’, the equilibrium models of the perfect market in neoclassical economics, the definition of a ‘wild type’ in genetics, and the physiology of the healthy organism in medicine as typical examples of such theoretical ideals (Hesslow 1983, p. 95-96). These all provide scientists with a standard of comparison that helps in picking out the things to be explained. Hesslow also compares theoretical ideals with what Stephen Toulmin calls ideals of natural order. Toulmin writes:
Our 'ideals of natural order' mark off for us those happenings in the world around us which do require explanation, by contrasting them with 'the natural course of events'—i.e. those events which do not. Our definition of the 'natural course of events' is therefore given in negative terms: positive complications produce positive effects, and are invoked to account for deviations from the natural ideal, rather than conformity to it (Toulmin 1961, p. 79).
Since both Hesslow and Toulmin are very brief in characterizing these notions, it is difficult to say whether their concepts are the same, but at least their function in the context of explanation seems to be the same: both work as generators of contrasts for scientific explanatory questions. The fourth source of a contrast is the subjectively expected case. Here the contrast is what the agent was expecting to happen. These explanations show how the fact to be explained could have occurred against the expectations we had on the basis of knowledge of earlier conditions (Hesslow 1983, p. 96). This source of contrast is interesting because it can be related to the intuition behind the original covering-law theory. According to Hempel, the function of an explanation is to make the explanandum expected (Hempel 1965, p. 337). He later interpreted this intuition as a requirement that the explanation must make the explanandum highly probable. The contrastive analysis gives an alternative way of interpreting this intuition: many explanations are related to our expectations, since we are typically explaining facts that do not accord with our expectations. Showing why the unexpected happened corrects our background beliefs and in this sense reduces the unexpectedness of the explanandum. If this is right, we can give up the requirement that the explanation must make the explanandum highly probable without losing the intuition that, at least sometimes, the function of an explanation is to reduce surprise. The fifth possible source of a contrast is a moral ideal. Sometimes an action is contrasted with a normative account of how an agent should have acted. In such a case we are asking for an explanation of the deviation from this standard of conduct, and we choose as explanatory causes those conditions that should not have been present (Hesslow 1983, p. 96-97). This is not an exhaustive list of all possible ways in which a contrast can arise.
Certainly there are other ways of generating contrasts. What is essential is to see that there are various ways in which a contrast can arise; an exhaustive list of them is not. However, there is one way in which Hesslow’s list can be misleading. All the explananda in his list are either abnormal or unexpected. This fits nicely with our everyday practices of explanation. However, we sometimes also want to explain the normal case. For example, we might want to explain why grass is (normally) green rather than red. The contrastive approach works here too. It might be that explananda in which the fact is the normal case and the foil the abnormal one are quite rare, but this does not reduce the legitimacy of such questions. At least some of these questions are sensible. Indeed, this observation suggests one way of characterizing the difference between everyday reasoning and science. The difference lies in the explanatory questions asked: in science we also try to explain the normal case, whereas in everyday reasoning we are only interested in explaining abnormalities. Everyday reasoning takes the normal course of events for granted and only wants to account for deviations. Usually we are not able to explain the normal course of events, and this does not bother us. But if something unusual happens, we want to know why. A deviation calls for an explanation. Scientists also explain deviations, but they also want to know about the normal case, and this can be reconstructed as a reversal of the usual way of using contrasts. In this sense they are like children.
7. THE QUESTION OF COMPATIBLE CONTRASTS
Peter Lipton (1990) has claimed that not all contrasts are incompatible, and a number of authors have accepted this thesis without reservation (Barnes 1994; Carroll 1997, 1999; Hitchcock 1999; Risjord 2000). As this thesis threatens my analysis of the contrastive explanandum, let us consider it more closely. Recall the famous example of paresis. Paresis is a form of neurosyphilis, and no one contracts this dreadful disease unless he has had latent, untreated syphilis. However, the evolution of the disease is unknown, and only a small percentage of those who have untreated syphilis get paresis. Now we have two persons: Smith, who has had latent syphilis and now has paresis, and his friend Jones, who does not have latent syphilis. We can explain why Smith, rather than his friend, contracted paresis, for only he had syphilis. This is something that everybody accepts. However, Lipton notes that the fact that Smith has syphilis is compatible with the possibility that Jones also has syphilis. These two facts are independent of each other and for this reason compatible. It just happens that Jones does not have syphilis. This is not a peculiarity of this particular example.
The situation is quite common: we want an explanation for the difference between two apparently similar situations which turned out differently, but which might have turned out similarly (Lipton 1990, p. 250). This is typical of explaining differences, so if Lipton’s thesis is right, we might expect that a good many contrasts are compatible. However, Lipton’s thesis is much less radical than it appears. In his 1993 paper he writes:
When we ask questions such as why Smith rather than Jones contracted paresis, our underlying interest often really concerns a contrast about Smith alone. […] The talk about Jones is a way of getting at a certain type of question about Smith. Thus we see why a contrastive question retains the feeling of incompatibility even when the explicit contrast is compatible. (Lipton 1993, p. 46)
Here Lipton admits that the reference to Jones is a surrogate for a counterfactual claim about Smith. We are really interested in Smith’s illness, not in his friend’s health. We are asking why Smith has paresis rather than being like Jones, who does not have paresis. The exclusive alternative in the explanandum is Smith not having paresis, not Jones not having it. The reference to Jones offers a convenient way of picking out the desired contrast about Smith’s health, and the explanation cites what makes the difference between the two cases. We are contrasting two causal scenarios of Smith’s health. The first scenario culminates in Smith having paresis, and the second in Smith being like Jones, that is, without paresis. When subjected to a more careful analysis, the apparently compatible contrasts turn out to be incompatible. Naturally, we could also ask why Jones does not have paresis when Smith does, but similar considerations will apply. And in some situations we might desire an explanation for both of these contrasts. But in these cases we have two separate explanations, not one explanation with compatible contrasts. In order to make sense of this apparent conceptual confusion, we should distinguish between two sorts of explanatory questions. First, there are cases where we are contrasting two alternative outcomes of the same process. Here the contrast is imagined, and the fact and its foil are incompatible. In the second case we are contrasting two separate processes with different outcomes. Here both the fact and the foil are actual. However, in this case there is a difference between the surface structure of the contrastive statement and the contrastive structure of the explanatory question. The actual foil is a surrogate for a counterfactual claim about the process that led to the fact to be explained. So the contrast is compatible at the surface level, but incompatible at the deeper level that is the real concern of the theory of explanation. My claim is that all apparently compatible contrasts turn out to be incompatible when inspected more carefully. I have not seen a single example of a compatible contrast that cannot be resolved in this manner. There is good reason for this. Explanation-seeking questions have to be reconstructed in the above manner in order to be given a properly contrastive answer; otherwise the explanatory counterfactual could not do its job. Looking too closely at the linguistic form of the contrastive statement can lead to a misguided analysis.
The basic idea in the contrastive approach to explanation is to look for the implied contrasts, instead of being satisfied with the usual statement of the explanandum. The same approach should be used here: one should not be satisfied with just any contrastive statement. Instead, one should look behind linguistic formulations and try to capture the real contrast. Against this background it is natural to ask: what is the point of compatible contrasts if they are so misleading? The reasons are methodological. It is often sensible to try to explain differences between two actual outcomes rather than the difference between an actual and an imagined outcome. First, in the former case one does not have to consider whether the foil is a possible outcome of the process: as it has occurred, it is certainly possible. Of course, this presupposes that the cases compared are similar enough. But if they are, there is one potential challenge less for the explanatory question. There is also a second advantage. When explaining the differences between two actual outcomes, one only has to find differences between the two causal histories to find good candidates for the explanans. This is a much less theoretically burdened process than the one used in explaining the differences between actual and imagined outcomes. Significantly less hangs on one’s theoretical understanding of the process, since the whole process of imagining the alternative causal history is cut out. So if the processes really are similar enough, explaining actual differences is a much more convincing way to proceed.
8. ARGUMENTS AGAINST THE CONTRASTIVE IDEA
A very common way to deny the philosophical relevance of contrastive questions in the theory of explanation is to claim that not all explanations are contrastive (Ruben 1990, p. 40; Humphreys 1989, p. 137). The point implied by this claim is that a philosophical account of explanation need not concern itself with the contingent features of some individual explanations: if only some explanations are contrastive, the contrastive approach does not seem suitable for analyzing explanation in general. Responding to this critique is somewhat tricky. The challenge assumes that the supporters of the contrastive approach are making universal statements about all explananda. I have not found any writer on this topic who is committed to this bold claim. For example, Lipton claims to be agnostic about the issue (Lipton 1990, p. 261). The reason for this is easy to see. Besides the logical problems with proving a universal statement, there is a problem concerning the vague boundaries of what counts as an explanation and what does not. People do have conflicting intuitions about the explanatoriness of a good many explanations, and the putative counterexamples to the contrastive thesis often belong to this heterogeneous class. I think that a more fruitful way to understand the contrastive approach is to interpret it as claiming that all explananda can be analyzed as contrastive. If the contrastive analysis is generally applicable and in most cases provides fruitful results, we have an argument for it. I think that the existing literature on contrastive explanation shows that interesting results can indeed be achieved with the contrastive approach. It might be that in some special cases it does not help much, but this remains to be shown. And, of course, the critics can try to come up with examples where the use of the contrastive idea is a hindrance to the analysis of explanation. I have not seen such examples yet.
The position I am taking can be further clarified by comparing two ways of understanding the contrastive approach to explanation. The first attaches the contrastive idea to a pragmatic theory of explanation. In this approach the contrastive thesis is about what people really have in mind when they pose explanation-seeking questions. The explanation-seeking question is thought to reflect the explainee’s epistemic state, and the contrastive suggestion is understood as a way of specifying what the explainee wants to know. I take this to be ‘the mainstream approach’ in the literature on contrastive explanation. Its most famous representative is Bas van Fraassen (1980). Dennis Temple has noted a problem with this position. He writes: “… in many cases a speaker who asks ‘Why P?’ is simply puzzled about P, and without having any particular contrary in mind” (Temple 1988, p. 147). Temple has a point. We do not always have a specific contrast in mind. For example, the explainee can be confused, or she might want answers to many different questions at the same time. Of course, the supporter of the pragmatic approach can reply that his theory is about an ideal or rational explainee, and not about ordinary people, who are often confused. However, this would be a strange way to defend a pragmatic theory of explanation.
The defender of the mainstream approach can respond by pointing out that asking for a contrast is a natural and effective way of clarifying or improving the intended explanandum. We can ask the explainee to specify her request by suggesting possible contrasts, such as: “Do you mean why f rather than c, or why f rather than c*?” This is a useful and very common way to settle the question (Hesslow 1983, p. 94). This is a good point, but I think we do not need the mainstream pragmatic approach to make it. The alternative interpretation, which I support, takes the contrastive thesis to be a central contribution to a theory about what an explanation can achieve. It is not concerned primarily with the explainee’s epistemic states, but with the things that an explanation can do. It asks, given that somebody has provided an explanation, what it explains or which question it answers. It does not make claims about the usual format of the explanation-seeking question, but about the questions for which our explanations could be satisfying answers. We should allow that people may be confused or simply unclear about their intended explananda. The contrastive thesis should be about what an explanation can explain, not about what kind of questions people have, or can have, in their minds. In this alternative account, pragmatic and contextual factors determine which questions we want to ask or which contrasts we choose, but once the explanatory question is fixed, the adequacy or inadequacy of the given explanation is a non-pragmatic matter. Consequently, the theory of the contrastive explanandum is not a pragmatic theory of explanation in any interesting sense (Hesslow 1983, p. 97-98; Woodward 1993, p. 276). Of course, it can naturally be extended with pragmatic components, but that is a different matter.
9. ARE CONTRASTIVE EXPLANANDA REDUCIBLE?
The contrastive approach has also been criticized by claiming that a contrastive explanandum can be reduced to a non-contrastive form. Dennis Temple and later John W. Carroll have suggested that ‘Why f rather than c?’ is equivalent to ‘Why f and not-c?’ (Temple 1988; Carroll 1997, 1999). Temple claims that a consequence of this reduction is that the contrastive approach has no advantage over the traditional ‘propositional approach’, which sees the explanandum as a (non-contrastive) proposition. Let us call Temple’s position ‘the conjunctive view’. Temple makes it sound as if the contrastive approach has nothing new to say about the explanandum. This is not true. The traditional explanandum has been the plain ‘f’, but if we accept that the right representation of the explanandum is the conjunction ‘f and not-c’, we have already made a substantial point about explanation. Earlier it was not thought that the explanandum is complex in this way. There are two ways of reading Temple’s suggestion. The difference between these readings is whether we accept the following three propositions as equivalent:
(1) ‘… explained the fact that f and not-c’
(2) ‘… explained the fact that f and explained the fact that not-c’
(3) ‘… explained the fact that f rather than c’
The weak reading accepts that (1) and (3) are equivalent, but it does not accept that (2) is equivalent to them. The strong reading accepts that (1), (2), and (3) are all equivalent. The non-conjunctivist position naturally denies all claims of equivalence between these propositions. Let us first take a look at the strong reading of the conjunctive view. Although neither Temple nor Carroll says so, this reading is really a reductio ad absurdum of the contrastive approach. It claims that contrastive explanations are really conjunctions of two non-contrastive explanations, which would make contrastive explananda quite superficial phenomena. This is ironic: the contrastive suggestion was originally coined to make sense of our ordinary way of explaining ‘why f?’, but now it turns out that we cannot do that, because the suggestion presupposes that we already know how to answer this question. Does this reductive thesis hold water? There are two principal arguments against it. The first argument was advanced by Peter Lipton, who has pointed out that explaining ‘Why f rather than c?’ requires less than explaining ‘Why f and not-c?’ (Lipton 1990, p. 252-253). Recall the example of Smith’s paresis. No one contracts this dreadful disease unless he has latent, untreated syphilis. However, the evolution of the disease is unknown, and only a small percentage of those who have untreated syphilis get paresis. Now, we can (fully) explain why Smith, rather than his friend Jones, contracted paresis, for only he had syphilis. However, we cannot explain why he, among all syphilitics, got it: under the assumptions of the example, we don’t know why some, but not all, of those with untreated syphilis contract paresis. The strong reading of the conjunctive view would require that we first explain why Smith contracted paresis, which we cannot do, and then explain why his friend Jones did not contract it, which we can do.
So, on the conjunctive view we cannot explain something that we intuitively think we can explain. This suggests that the strong reading does not work: we cannot infer (2) ‘… explained the fact that f and explained the fact that not-c’ from (3) ‘… explained the fact that f rather than c’. The second argument is due to David Ruben. The acceptance of the strong reading requires that there are no limitations on possible contrasts. This can be seen by considering any arbitrary f and not-c. Let f be ‘snow is white’ and let c be ‘grass is red’. Suppose that I explain both f and not-c. Have I then explained the fact that snow is white rather than that grass is red? Clearly something is missing here. We presuppose that there is some sort of relevance between f and c when we contrast them, but the ordinary truth-functional ‘and’ does not include any considerations of relevance (Ruben 1990, p. 42). The strong reading does not respect the relevance requirements of the ‘… rather than …’ locution, which makes it a failed reduction. What about the weak reading? It does not accept the equivalence between (1) and (2), so Lipton’s and Ruben’s arguments cannot refute it. However, these arguments show that the weak reading is of very limited interest. Ruben’s argument shows that if we rephrase ‘f rather than c’ as ‘f and not-c’, the ‘and’ does not work in the normal, truth-functional way. The normal ‘and’ does not require any relevance, but ‘f rather than c’ requires some relevance between f and c. Consequently, the weak reading uses ‘and’ in a non-standard way: there is more to it than a simple conjunction. This makes Temple’s suggestion just an alternative way of indicating the contrast. And this is not big news. As already noted above, the contrast can be expressed by alternative linguistic means. The fate of Temple’s argument teaches us an already familiar lesson: the analysis of contrastive explanation should not focus too closely on linguistic issues. The central focus of interest should be the cognitive setting, not the linguistic means of expressing it. Our interest should be in contrastive facts, not in contrastive statements. The danger is that philosophical analysis regresses into a study of the pragmatics of some English locutions, which is not what philosophical analysis should do. After all, explanations are also given in other languages.
REFERENCES
Barnes, E. (1994). Why P Rather Than Q? The Curiosities of Fact and Foil. Philosophical Studies 73: 35-53.
Carroll, J. W. (1997). Lipton on Compatible Contrasts. Analysis 57: 170-178.
Carroll, J. W. (1999). The Two Dams and That Damned Paresis. The British Journal for the Philosophy of Science 50: 65-81.
Cummins, R. (1983). The Nature of Psychological Explanation. Cambridge: MIT Press.
Garfinkel, A. (1981). Forms of Explanation. New Haven: Yale University Press.
Hansson, B. (1975). Explanations—Of What? Unpublished manuscript.
Hart, H. L. A. and Honoré, A. M. (1959). Causation in the Law. Oxford: Clarendon Press.
Hempel, C. (1965). Aspects of Scientific Explanation. New York: The Free Press.
Hesslow, G. (1983). Explaining Differences and Weighting Causes. Theoria 49: 87-111.
Hilton, D. J. (1995). Logic and Language in Causal Explanation. In Sperber, Premack and Premack (eds.): Causal Cognition. Oxford: Oxford University Press: 495-525.
Hintikka, J. and Halonen, I. (1995). Semantics and Pragmatics for Why-Questions. The Journal of Philosophy 92: 636-657.
Hitchcock, C. C. (1996). The Role of Contrast in Causal and Explanatory Claims. Synthese 107: 395-419.
Hitchcock, C. C. (1999). Contrastive Explanation and the Demons of Determinism. The British Journal for the Philosophy of Science 50: 585-612.
Hållsten, H. (2001). Explanation and Deduction. A Defence of Deductive Chauvinism. Acta Universitatis Stockholmiensis. Stockholm Studies in Philosophy 21. Stockholm: Almqvist & Wiksell.
Humphreys, P. (1989). The Chances of Explanation. Princeton: Princeton University Press.
Koura, A. (1988). An Approach to Why-Questions. Synthese 74: 191-206.
Lewis, D. (1986). Philosophical Papers vol. II. Oxford: Oxford University Press.
Lipton, P. (1990). Contrastive Explanations. In Knowles (ed.): Explanation and its Limits. Cambridge: Cambridge University Press: 247-266.
Lipton, P. (1991). Inference to the Best Explanation. London: Routledge.
Lipton, P. (1993). Making a Difference. Philosophica 51: 39-54.
Markwick, P. (1999). Interrogatives and Contrasts in Explanation Theory. Philosophical Studies 96: 183-204.
Railton, P. (1978). A Deductive-Nomological Model of Probabilistic Explanation. Philosophy of Science 45: 206-226.
Railton, P. (1981). Probability, Explanation, and Information. Synthese 48: 233-256.
Risjord, M. W. (2000). Woodcutters and Witchcraft. Albany: SUNY.
Ruben, D.-H. (1990). Explaining Explanation. London: Routledge.
Scriven, M. (1959). Truisms as the Grounds for Historical Explanations. In Gardiner (ed.): Theories of History. New York: The Free Press: 443-475.
Sintonen, M. (1984). The Pragmatics of Scientific Explanation. Acta Philosophica Fennica 37. Helsinki: Societas Philosophica Fennica.
Sober, E. (1994). Explanatory Presupposition. In From a Biological Point of View. Cambridge: Cambridge University Press: 175-183.
Temple, D. (1988). The Contrast Theory of Why-Questions. Philosophy of Science 55: 141-151.
42
P. YLIKOSKI
Toulmin, S. (1961). Foresight and Understanding. London: Hutchinson. Tuomela, R. (1980). Explaining Explaining. Erkenntnis 15: 211-243. van Fraassen, B. (1980). The Scientific Image. Oxford: Oxford University Press. Woodward, J. (1993). A Theory of Singular Causal Explanation. In Ruben (ed.): Explanation. Oxford: Oxford University Press: 246-274. Woodward, J. (2003). Making Things Happen. A Theory of Causal Explanation. Oxford: Oxford University Press. Ylikoski, P. (2001). Understanding Interests and Causal Explanation. Ph.D. thesis. May 2001. [http://ethesis.helsinki.fi/julkaisut/val/kayta/vk/ylikoski/]
THE PRAGMATIC-RHETORICAL THEORY OF EXPLANATION
JAN FAYE
Explanation is one of the most discussed notions in the philosophy of science. This may be because there is little consensus among specialists on how explanation in a scientific context should be characterised. Three main approaches appear to be alive today: the formal-logical view, the ontological view, and the pragmatic view. Beyond the expectation that an explanation is meant to provide a particular kind of information about matters of fact, there seems to be little agreement among these three classes of theories. Given this, the pragmatic view has at least one advantage, namely, its ability to accommodate the others. Alternative conceptions of explanation may be construed as promoting entirely legitimate goals of a given scientific explanation, in so far as the pragmatic situation makes it appropriate to pursue these goals. What pragmatists deny is that any of these other views tells us what scientific explanation is or covers all forms of scientific explanation, i.e., that there is any one goal of scientific explanation.
1. VARIOUS APPROACHES
The formal-logical approach considers scientific explanation as something quite distinct from ordinary explanation. It holds that every scientific explanation should have certain objective features by which it can be completely characterised and understood. Following Carl Hempel, a scientific explanation is to be construed as an argument with a propositional structure, i.e., the explanandum is a proposition that follows deductively from the explanans. This kind of approach gives us a prescriptive account of explanation in the sense that a proposition counts as a scientific explanation if, and only if, it fulfils certain formal requirements.
As Hempel remarked, summarising his own position, “Explicating the concept of scientific explanation is not the same thing as writing an entry on the word ‘explain’ for the Oxford English Dictionary.”16 His approach offers certain norms by which we can demarcate scientific explanations from other forms of explanation. Apart from Hempel’s original covering-law model, this view includes approaches such as Wesley Salmon’s statistical-relevance model and the unificationist theory of scientific explanation as elaborated by Michael Friedman and Philip Kitcher.
16 Hempel (1965), p. 413.
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 43–68. © 2007 Springer.
The ontological view considers a scientific explanation to be something that involves causal mechanisms or other factual structures. The idea is that facts and events explain things. In particular, causes explain their effects: a cause tells us why its effect occurs. A scientific explanation is an objective account of how the real world is connected. The cognitive representation of the facts of the matter does not contribute to the meaning of explanation. An explanation is both true and relevant if, and only if, it discloses the causal structure behind the given phenomena. Furthermore, an everyday account counts as an explanation if it is reducible to scientific talk about causal processes.
The pragmatic view sees scientific explanations as basically similar to explanations in everyday life. It regards every explanation as an appropriate answer to an explanation-seeking question, emphasising that the context of the discourse, including the explainer’s interest and background knowledge, determines the appropriate answer. Thus, pragmatists think that the explanatory product presupposes the circumstances under which the explanation is produced. The similarity between different kinds of explanations is found in the discourse of questions and answers that takes place in a context consisting of both factual and cognitive elements. The claim is that we do not understand what an explanation is unless we also take the pragmatic aspects of the communicative situation into consideration. The pragmatic view regards explanation as an agent of change in belief systems. For his part, Hempel would not have denied that this effect is a pragmatic consequence of explanation, but in his view it had nothing to do with its explanatory quality.
In fact, he saw his covering-law model as an abstraction from every pragmatic context:
This ideal intent suggests the problem of constructing a nonpragmatic concept of scientific explanation—a concept which is abstracted, as it were, from the pragmatic one, and which does not require relativization with respect to questioning individuals any more than does the concept of mathematical proof.17
17 Hempel (1965), p. 426.
In contrast, the pragmatic view holds that a response to an explanation-seeking question need not follow deductively from a set of premises; hence, its validation as an explanation involves many contextual elements. The pragmatic view does not claim to give us more than a descriptive account of explanation. Whether an explanation is good or bad, true or false, is not the issue as long as it fits into the general pattern of scientific inquiry. So the insight that can be associated with the pragmatic view of explanation is that scientific inquiry, and thus scientific explanation, is goal-oriented and context-bound. It is always performed relative to some set of interests.
The pragmatic view can be divided into different theories. One is the cognitivist theory of scientific explanation, among whose proponents I count Peter Gärdenfors and Matti Sintonen. As Gärdenfors explains it:
The central idea is that the explanans should increase the belief value, i.e., the probability, of the explanandum in a non-trivial way. The belief value of a sentence is defined in terms of a given epistemic state. This state is not the one where the explanation is desired, but instead the contraction of that state with respect to the explanandum statement.18
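Gärdenfors’ criterion, and the objection raised against it in what follows, can be stated compactly in probabilistic notation. The notation is an illustrative assumption, not Gärdenfors’ own: $K$ is the questioner’s epistemic state, $K^{-}_{q}$ its contraction with respect to the explanandum $q$, $P_K$ the belief-value (probability) function associated with $K$, and $e$ the explanans. On this reading, $e$ explains $q$ when

$$P_{K^{-}_{q}}(q \mid e) > P_{K^{-}_{q}}(q).$$

In the uncontracted state $K$, where $q$ is already believed, the belief value cannot rise: if $P_K(q) = 1$ and $P_K(e) > 0$, then $P_K(q \wedge e) = P_K(e)$, so

$$P_K(q \mid e) = \frac{P_K(q \wedge e)}{P_K(e)} = 1.$$

This is why the criterion is formulated relative to the contracted state; the second identity restates in probabilistic terms the point that the belief value of a known explanandum is already one.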
I believe, however, that this suggestion fails to convey what is essential about explanation. If somebody asks why something is the case, that person already knows that it is the case, and the explanation will not increase the belief value concerning this fact, which is already one. If I observe one morning that my dahlias have wilted overnight, my belief in this fact does not increase when I am told that this is so because there was a severe frost during the night. I am pretty sure that my dahlias are dead regardless of whether I ever get an explanation. What the explanans does is to fill in some bits of information missing from my system of knowledge. Gärdenfors may still be correct about a minor point. Although the explanans does not raise my belief value of the explanandum, i.e., of the proposition stating that my dahlias have withered, one could suggest that it does inductively increase another belief value, namely the one concerning the metaphysical belief that nothing happens to my dahlias without a cause. But this belief belongs to my background knowledge and is not what has to be explained. Moreover, I doubt that possessing a particular explanation will increase my trust in such a metaphysical principle.
The cognitive approach has one important thing in common with the formal-logical approach: both conceive of the explanandum as a proposition. But this narrows the scope of their analysis. The aim of the cognitive approach is to analyse explanation by pragmatic means alone, without any appeal to practical issues such as human interest. Matti Sintonen, for instance, includes the extra-logical contingencies of explanatory discourse in his five-place analysandum. I shall reword his formula as:
S explained to H why q by uttering u in a problem context P.19
The rationale behind the utterance u can then be stated as follows:
The role of u, as intended by S, is to cause in H an epistemic change vis-à-vis H’s question.
The analysandum, however, does not rule out that u has to meet certain formal requirements in order to cause an epistemic change. Rather, one must imagine that u must somehow be relevant to the question “why q” in order to succeed in changing H’s belief states. Not every possible answer will do. Thus, what guarantees u’s relevance could still be some formal logic of explanation à la Hempel, unless the elements of the problem context P somehow exclude that any inferential characteristic of relevance can be abstracted from the pragmatic context. The basic notion is the problem context P: a response u is relevant to the question “why q” only with respect to it. The problem context has indeed both a material and an epistemic side. A problem arises in the tension between what is known and what is not known. Thus, the problem context can be characterised in terms of a set of propositions stating a series of known facts. This set belongs to H’s background knowledge. Furthermore, the problem context also includes metaphysical principles such as the principle that regularities will continue to remain regular. Finally, it contains a number of propositions whose truth H does not know but which she suspects to be relevant to solving the problem. But why she suspects them to be relevant is an open question, which the content of the problem context does not help us to answer.
18 Gärdenfors (1990), p. 111.
19 Sintonen (1989). This and the following indented sentence are not quite Sintonen’s own formulations. Instead I have borrowed the above formulation from a student of mine, Thomas Basbøll, who, in his MA thesis, criticised Sintonen for writing “Why E”, where E stands for a proposition mentioned, rather than “Why q”, where q stands for a proposition used, i.e., a fact stated. A fact and a proposition are two different things: we explain facts with propositions.
Therefore, neither Sintonen nor Gärdenfors has shown that the pragmatic account is superior to the formal-logical account. To win the day, the pragmatists must argue convincingly that some elements of scientific practice are relevant for understanding scientific explanation, i.e., elements that cannot be explicated in terms of non-pragmatic objectives. What is missing from their analysis is H’s own interest in the problem. What H considers relevant is partly determined by her background knowledge and partly by what she finds desirable to know. Thus, I believe that personal interest should be added to the problem context. This means that the problem context contains both beliefs concerning how the world is and beliefs about how one wishes the world to be. The upshot is that an explanation is not something that is entirely objective, but is always an account seen from a certain cultural or personal perspective.
In an earlier paper I defended a stronger version of the pragmatic view that also attempts to focus on the practical interest of the interlocutor and the respondent.20 I call this theory the pragmatic-rhetorical one. It maintains that an explanation should be seen as a response to a question concerning an issue about which the interlocutor lacks information.
Explanations are determined by the rhetorical practice of raising questions and providing answers, in the sense that explanations are intentional communicative actions: they are concrete answers to definite questions, answers that have to fulfil certain rhetorical demands of purposiveness, relevance, asymmetry, etc., seen in relation to our background knowledge. As communication, explanations are context-dependent, goal-oriented, intentional and potentially persuasive. The answers are relevant and informative with respect to the context in which the questions are placed and to the background knowledge, and perhaps even the interests, of the interlocutor and the respondent. I want here to develop this idea further. In addition, I will argue that from a certain perspective the requirements of the formal-logical approach or the ontological approach can be adopted by the pragmatic approach.
20 Faye (1999). See also Faye (2002), Chapter 3.
2. WHY IS EXPLANATION A MATTER OF PRAGMATICS?
What are the reasons for changing from a formal-logical to a pragmatic treatment of explanation? I believe there are a number of good answers.
First, we have to recognise that even within the natural sciences there exist many different types of accounts which scientists regard as explanatory. In the natural sciences you find not only nomic accounts, but also mathematical, probabilistic, causal, functional, and structural ones. Nomic accounts may seem to fit the requirements of the formal-logical approach reasonably well. But I doubt that they always do. For law statements often contain ceteris paribus clauses, saying that they are true only if certain idealisations are fulfilled, but these conditions never obtain in this world. Thus, no law statement applies directly to the world; rather, a law statement is true of a model. Furthermore, law statements, such as Newton’s laws of motion, never refer to any particular object; they express merely how various properties are interrelated. The model represents some concrete system whose change we want to explain, picturing it according to certain interpretative standards. What we find is that causal explanations are usually carried out with the help of models, which do not have deductive connections with a theory. Models give us causal explanations; laws do not. Nancy Cartwright has offered a fine illustration of this.21 In quantum mechanics there is a phenomenon called radiative damping. It appears as a broadening of the spectral lines. The atom is represented in a model with the nucleus surrounded by electrons in various energy levels. It decays spontaneously from an excited state; it emits a quantum of energy into the radiation field, which then may be reabsorbed by the atom. The reaction of the field on the atom both provides the line width and causes a shift of the line called the Lamb shift. There are, nevertheless, six different ways of handling the line broadening using three different equations. None of them has priority; none of them gives us the correct covering law. What we see is rather that these different approaches are useful for different purposes.
So while the causal explanation is the same, the theoretical treatment may differ depending on which mathematical technique works, which in turn depends on the physicist’s ability to find a good approximation that fits the problem. The upshot of this and similar examples is that, in general, physicists select a particular covering law on the basis of pragmatic criteria. No theoretical description applies directly (i.e., without interpretation) to the world. Hence, these pragmatic criteria cannot be neutralised through a logical abstraction but are essential for grasping the notion of a scientific explanation.
21 Cartwright (1983), p. 78 ff.
Second, if one is looking for a prescriptive treatment of explanation, I see no reason why the social sciences and the humanities should be excluded from such a prescription. If they are included, the prescriptive account must include intentional and interpretive explanations, i.e., accounts providing information about either motives or meanings. The practice of the natural sciences also implicitly contains these kinds of accounts, although they often play a methodological or meta-scientific role in explaining how to carry out, say, nomic or causal explanations. We can only make full sense of the entire scientific enterprise, including the development of new theories, if we make room for some kinds of interpretative explanations as well.
Third, the meaning of a why-question alone does not determine whether an answer is relevant or not. Pragmatic elements concerning the presentation of the question also have a significant role to play. Take a question like: “Why do birds migrate to Africa?” Which response would be relevant to this explanation-seeking question depends on which words the inquirer emphasises. Hempel and Oppenheim construed an explanandum as a proposition and an explanans as premises from which this explanandum logically followed. Bengt Hansson, however, was the first to point out that two utterances such as “Why do birds migrate to Africa?” and “Why do birds migrate to Africa?”, where the emphasis falls on different words, express the same sentence but nevertheless mean different things.22 Unless we take into account such differences in utterance meaning, it is unclear what the inquirer asks to have explained. Does the questioner want to know why birds migrate to Africa, or does she want to know why it is Africa that birds migrate to? Since the formal-logical approach treats only propositions, which are acontextual, it will not be able to capture this difference.
Hansson proposed constructing a “reference class” for the explanation-seeking why-question. The idea was to capture in precise terms the nuances in meaning that could only be conveyed by style, emphasis, stress, italics or the relative position of the words. The meanings, he argued, of “Why did Adam eat the apple?” “Why did Adam eat the apple?” “Why did Adam eat the apple?” and “Why did Adam eat the apple?” are “not independent of the tacitly understood reference class.” This reference class contains the alternatives that might have obtained instead but are not mentioned in the question. It varies in each of the four cases. For instance, the reference class for “Why did Adam eat the apple?”, with the stress on “apple”, includes not only apples, bananas, grapes, figs, dates, and other fruits and vegetables in the garden of Eden, but also the snake and other edible animals. A relevant answer to this why-question should give the reason why Adam chose to eat the apple instead of a banana, a fig, a date, etc. In general, a response R to the question Q is explanatory if it informs the questioner I about why P obtains in contrast to another of the many relevant possibilities X.
A why-question will probably be fixed by a reference class that is indefinitely large, and we may never come to know all its members. Understanding a why-question, however, is a matter of being able to decide, for any given possibility, whether or not it belongs to the reference class. But we do not possess an infinite capacity to imagine all the members of the reference class. This is where the context of the question shows its importance: it helps us narrow down the alternative possibilities. In the story about Adam and Eve, we are interested in knowing why it was the apple that Adam ate, and not any other particular fruit, because God had allowed Adam and Eve to eat all kinds of fruit except the apples. He had explicitly forbidden them to eat the apples from the tree of wisdom. But Adam ate the apple because Eve convinced him to do so after the snake had tempted her to taste it. They ate the apple, rather than any other unspecified fruit, because by eating it they would gain wisdom.
Someone might want to argue that the formal-logical model could handle the difference in emphasis by abstracting the logical features from the pragmatic context differently for each pattern of emphasis. Every reference class would then consist of a set of propositions; most of them would be false, but each of these propositions could figure in a formal inference. There are, however, severe problems with such a suggestion, for it is the context, and only the context, that determines which proposition is relevant.
22 Hansson (1975).
Fourth, John Searle has correctly argued that the meaning of every indicative sentence is context-dependent.23 He does not deny that many sentences have literal meaning, which is traditionally seen as the semantic content a sentence has independently of any context. What he holds is that our understanding of the meaning of such sentences happens “against a set of background assumptions about the context in which the sentence could be appropriately uttered.” Thus, the background does not merely determine the utterance meaning, whose context dependence may, as with indexicals, already be realised in the semantic content of the sentences uttered. An extensive background of assumptions, practices, habits, institutions, traditions and so on also determines the literal meaning of sentences. The background consists of a network of assumptions and, as Searle maintains, such background assumptions cannot be made entirely explicit as a determinate part of the truth conditions of the sentence. Rather, the truth conditions of the sentence will vary with the variation of the background assumptions. These cannot be turned into objective implications of the sentences in question, and therefore cannot form part of the semantic content.
Searle illustrates his point using the well-worn example: “The cat is on the mat.” Any assertion of this sentence logically implies the presence of a cat and the existence of a mat. A basic assumption, which is not implied but tacitly presupposed, is that there is a gravitational field defining up and down. The sentence is true only if the mat supports the weight of the cat. A space-cat, travelling in outer space under weightless conditions, would be no more on the mat than the mat would be on the cat. But the existence of gravity does not follow logically from the notion of “being-on.” (Even a person without any knowledge of gravitation would have no problem understanding the meaning of the sentence.) Searle’s point is that any attempt to make the basic assumption part of the explicit content of the sentence would not help, for such an explication could then be extended endlessly, since there is no logical point at which to stop. For instance, the solidity of the cat, the mat lying horizontally, and the firm ground supporting the mat are also among the assumptions which would then have to be included in the explication. In fact, the sentence may sometimes be true without some of these riders being satisfied. The example here focuses on the context dependence of the word ‘on’, but the words ‘cat’ and ‘mat’ are equally, if not more, context-dependent. We may conclude that a sentence like “The cat is on the mat” does not have a set of truth conditions that uniquely specifies a truth value in a particular situation of utterance unless this set is taken to comprise features whose existence is not logically implied by the sentence.24 These features do not invariantly belong to a determinate set of truth conditions of the sentence, and therefore are not part of its semantic content, but belong to our background knowledge of a non-linguistic kind.
If this analysis is correct, the consequence is that the meaning of scientific statements also contains features that are not part of the semantic content. Hence, any attempt to make a formal abstraction from the pragmatic context, as Hempel had in mind, in order to establish an objective model of explanation, is doomed to fail. Even when they are understood literally, the meanings of the explanans and of the explanandum are always relative to a set of background assumptions. Their truth is not fixed with respect to a determinate and invariant set of truth conditions. It presupposes conditions that are only determinable with respect to the actual situation in which the explanation takes place, and which therefore cannot be taken into account by a formal-logical model. These conditions are often buried in ceteris paribus clauses. Generally, they reflect the explainer’s background, i.e., her interests, beliefs, assumptions, practices, habits, institutions, and traditions.
23 Searle (1978).
24 Collin (1999) criticises Searle for holding that the meaning of a sentence is a function of its truth conditions, but that a sentence only possesses truth conditions given a certain setting. I think, however, that Searle is consistent though not very explicit in his suggestion. As I read Searle, he imagines that the truth conditions of an indicative sentence consist of two parts: one is invariant from one setting to another and constitutes the semantic content; the other varies from setting to setting. But both contribute to fixing the truth value in a particular setting, i.e., a situation of utterance.
Fifth, many explanations take the form of stories. Arthur Danto has argued that what we want to explain is always a change of some sort.25 When a change occurs, we have one situation before and another situation after, and the explanation is what connects these two situations. This is the story: we have a beginning, a middle, and an end. Indeed, this model of explanation reflects not only complex historical-intentional explanations; causal explanations fit in as well. A wooden farmhouse stands on the plain; lightning strikes it, and flames consume it. The change is taken to be the cause that explains why a new situation follows the original, where the new situation is the effect of the change. However, stories are not logical arguments.
They are told from a certain perspective, which is determined by the interests and background knowledge of the explainer.
25 Danto (1985).
Sixth, a change always takes place in a complex causal field of circumstances, each of which is necessary for its occurrence. Writers like P.W. Bridgman, Norwood Russell Hanson, John Mackie, and Bas van Fraassen have all correctly argued that events are enmeshed in a causal network and that it is the salient factors mentioned in an explanation that constitute the causes of those events. For instance, oxygen needs to be present in a certain critical amount for the farmhouse to catch fire, but we normally cite the lightning as the cause. Most of these standing conditions, or other necessary factors, do not interest us; they need not be covered by the answer in order to provide a causal explanation, and they may not even be explicable. Rather, which of these necessary factors a speaker picks out as the cause depends on his interests. The farmer himself may regard the bolt of lightning as the real cause, whereas the insurance company may consider a defective lightning conductor as the real cause. As van Fraassen aptly sums up, having quoted Russell Hanson with approval:
In other words, the salient feature picked out as ‘the cause’ in that complex process, is salient to a given person because of his orientation, his interests, and various other peculiarities in the way he approaches or comes to know the problem-contextual factors.26
That we have a tendency to select the salient or the most perspicuous event, e.g., the bolt of lightning, as the explanatory cause, does not only depend on our interest (i.e., how we want the world to be) but it also clings on our general background knowledge as such (i.e., knowledge of how the world is). Imagine, for instance, that a lightning conductor normally protects the farmhouse, but that it has been taken down for replacement. Assume, also, that you know that the plains are a high-risk lightning area and that lightning had struck the farmhouse many times before, but that nothing else had happened due to the effectiveness of the old lightning conductor. In this case you may, in virtue of the known facts, point to the absence of the lightning conductor as the real cause rather than the lightning itself. Nothing of what has been said here implies that causal explanations are subjective. The causal field as such exists objectively regardless of the fact that all the necessary factors may not be entirely explicable. What it tells us, however, is that there does not exist only one correct way of explaining things, since any correct explanation is still given in the light of the speaker’s interest and background knowledge. We may add a further point to this analysis of causal explanations as being highly context-dependent. It is well known that any counterfactual analysis of causation makes causation very contextual. This is due to David Lewis’ and Robert Stalnaker’s theories of how to evaluate counterfactuals in terms of similarity between possible worlds. Any appeal to a similarity relation between such worlds is not a purely objective matter. The standards of similarity between possible worlds are selected on a partly subjective basis since they depend on the conversational purposes for which we assert these counterfactual sentences. 
But even if we say, as I personally prefer, that causation is a primitive notion and therefore cannot be fully grasped in counterfactual terms, this would not solve the problem of the contextuality of causal expressions that we have already seen. Causal statements still logically imply counterfactuals. One would therefore expect that whatever is contextual about causal explanations will reappear as contextual elements in the assertion of the corresponding counterfactuals. Seventh, the level of explanation also depends on our communicative interests. In science an appropriate nomic or causal account can be given at different explanatory levels, and which of these levels one selects as informative depends very much on one’s rhetorical purposes. If a toxicologist tells the jury in a courtroom that the victim died because she had been poisoned by strychnine, he gives the explanation most relevant for this particular purpose. He chooses a level of explanation which is appropriate within the judicial system and which suits the audience’s understanding. Had he chosen a chemically more accurate and detailed explanation of why this particular toxin killed this particular person, telling how the molecules of the strychnine had interacted with the cells of the body, the explanation would be on a different level. He would no longer focus on the causal
26 van Fraassen (1980), p. 125.
J. FAYE
mechanism of the substance on a living body but on the effects of the strychnine molecules on the individual body cells. An explanation on this level may be relevant for other toxicologists. Similarly, a physicist might provide a causal explanation on an even lower level, trying to give an account of the process at the atomic level. This possibility is indeed only relevant to other physicists. The question remains, however, whether all these explanations at various levels can be carried out as independent accounts, or whether every macroscopic explanation is in principle reducible to a microscopic explanation. Supposing the latter, we must draw the conclusion that the atomic, or subatomic, explanation is basically the correct scientific account, since every other explanation can be reduced to it. Such a view fits well with the formal-logical approach. It fails, nevertheless, because we want a scientific explanation of what killed the person, and identifying something as a dead body evidently presupposes some everyday conditions. It does not make much sense to describe a dead body in terms of molecules, atoms, quarks or superstrings. The conclusion seems to be that every level of explanation is relevant with respect to certain problem contexts and not with respect to others. There is no single correct explanation. The communicative situation, including the interest of the audience and the descriptive level of the explanandum, determines what is considered to be the appropriate explanans, and the communicative situation changes all the time. This corresponds with the pragmatic-rhetorical theory. Eighth, scientific theories are empirically underdetermined by data. It is always possible to develop competing theories that explain things differently, and it is therefore impossible to set up a crucial experiment that shows which of these theories yields the correct account of the available data. 
The Bohmian theory of quantum mechanics is, for all we know, empirically equivalent to orthodox quantum mechanics, although each gives a very different picture of the quantum world. The former is a deterministic theory that explains quantum phenomena in terms of hidden variables, whereas the latter is an indeterministic theory that explains everything probabilistically in terms of observables. The various reasons outlined above place explanation within the domain of the erotetic practice of science. Explaining a phenomenon amounts to answering a question, in particular a why-question. But the formal-logical approach completely ignores such an interrogative perspective. It does so because it does not realise that even scientific explanations rest on conditions other than scientific ones. A scientific explanation reflects a certain understanding of the context, including the questioner’s interest, and encapsulates many everyday presumptions that form our background knowledge.
3. EXPLANATION AS SPEECH ACT
From the examples discussed above, it should be clear that scientific explanations as such cannot be grasped in terms of formal logic or semantics. Explanation comprises an important pragmatic dimension, which cannot be ignored since it forms an essential part of a comprehensive understanding of explanation. This dimension is
THE PRAGMATIC-RHETORICAL THEORY OF EXPLANATION
important because explanation is an appropriate answer to an explanation-seeking question, and pragmatic elements like intention and context determine what counts as an appropriate answer. This kind of insight is also what drives Peter Achinstein in his understanding of explanation as a speech act. He argues that explanation can be understood as a process or as a product. The product is the content of the linguistic performance that a person makes in producing an explanation. But he also holds that the process concept is primary, because any characterisation of the product must take account of the intention behind the explanation. Hence, he calls his account the illocutionary theory. Says Achinstein, “Explaining is what Austin calls an illocutionary act. Like warning and promising, it is typically performed by uttering words in certain contexts with appropriate intentions.”27 While this approach escapes some of the problems we saw in Gärdenfors’ and Sintonen’s cognitive theories, it, too, fails to answer certain central points, specifically in fleshing out what such notions as the explanatory context and intentions really mean. Is it possible to say something more precise about the intentions and the context? In the process of developing his illocutionary view Achinstein lays down two aspects of explanation: (1) If S explains q by uttering u, then S utters u with the intention that his utterance of u renders q understandable. (2) If S explains q by uttering u, then S believes that u expresses a proposition that is a correct answer to Q.28 Oddly, this preliminary formulation overlooks that an illocutionary act is always directed towards somebody. S explains q to an audience. Thus, the kind of intention the speaker S has is to make a certain fact q understandable to an audience. What is problematic is the notion of understanding, if it is to add anything new to our concept of explanation. 
If “understandable” means raising the probability of the belief concerning the fact to be explained, then we are no better off with Achinstein than with Gärdenfors. But this is not what Achinstein has in mind. He gives the following definition of understanding: A understands q, [if, and] only if there exists a proposition p such that A knows of p that it is a correct answer to Q, and p is a complete content-giving proposition with respect to Q. (Here p is a proposition expressed by a sentence u uttered by A.)29
27 Achinstein (1983), p. 16.
28 Achinstein (1983), pp. 16-17.
29 Achinstein (1983), p. 42. The quotation expresses only a necessary condition, but later he takes it to express a sufficient condition as well (p. 57).
An example may illustrate the idea. Consider the fact that Nero played his fiddle after he had set Rome on fire. The question would then be “Why did Nero play the fiddle?”, and a straightforward complete content-giving proposition with respect to this question is “The reason Nero fiddled is that he was happy.” How shall we understand the phrase “to be a correct answer”? A correct answer is, according to Achinstein, one that is true as well as relevant to the question. I wish to argue, however, that Achinstein makes a serious mistake by thinking of explanations in terms of correct answers. My objections to his view are the following. First, a person A may understand q regardless of whether she knows the correct answer that explains the existence of q. Second, A may believe that p is a correct answer to Q; in many cases, however, we cannot say that she knows that p is true, and therefore that p is a correct answer. Third, what establishes that p is a relevant answer to the question Q? It cannot be that A knows that p is a correct answer to Q. This would be highly question-begging, as Salmon pointed out.30 And, as mentioned above, A need not know whether p is true anyway. Fourth, there may be no single complete content-giving proposition with respect to Q. If there can be many such propositions, which one should we then choose, and on what conditions? Salmon suggested instead that there must be a causal mechanism that provides us with an objective relevance relation. But not every explanation is a causal explanation, and even causal explanations consist only partly of a description of an objective relationship. Hence, such a relationship can only be one of several constraining factors that may determine the relevance of a given answer. Explanation is an act of communication. It is goal-oriented and context-bound, so we cannot understand the relevance of the explanatory content without knowledge of the goal and the context involved.
4. THE RHETORICAL SITUATION
An explanation is, I shall argue, an answer to an explanation-seeking question that the explainer puts forward in a problem context whenever she intends to solve the inquirer’s problem with her information-giving answer. To avoid the charge of offering a question-begging definition, it must then be possible to define an explanation-seeking question independently of what we mean by explanation. I hold an explanation-seeking question to be a question that expresses an epistemic problem. This does not suffice, however. Not knowing the time is a cognitive problem for the person who must be at a business meeting at a certain hour. But asking somebody what time it is, is asking about a fact, not about an explanation of a fact. We must be able to exclude cognitive problems of this sort. Thus we may add a further requirement: the epistemic problem must be brought to an end when the question is answered with reference to other facts, and when this connection, by being brought to the questioner’s attention, improves her understanding of the fact mentioned in the question. I think these remarks will do for now.
30 Salmon (1989), p. 148.
We have also pointed out that the problem context in which the respondent utters his answers is insufficient to capture the notion of explanatory relevance. Besides the problem context, the notion of explanatory relevance also relies on interests and perspective. Thus, an explanation is a communicative act depending on the intention of the explainer, the problem of the inquirer, the background knowledge and interests of both the explainer and the inquirer, and not least the facts of the matter that provoked the problem. Taken together, these various elements create a certain rhetorical situation that I shall name the explanatory situation. The notion of the rhetorical situation was made famous by the American rhetorician Lloyd F. Bitzer, who in 1968 set out to characterise situations that invite a discursive response. Bitzer argued that rhetorical situations are governed by exigencies, that is, urgencies that call upon a speaker to address an audience capable of modifying the urgency if persuaded to do so. In the case of scientific explanation I think that these exigencies can be identified with the epistemic problem. According to Bitzer: A work of rhetoric is pragmatic; it comes into existence for the sake of something beyond itself; it functions ultimately to produce action or change in the world; it performs some tasks. In short, rhetoric is a mode of altering reality, not by the direct application of energy to objects, but by the creation of discourse which changes reality through the mediation of thought and action. The rhetor alters reality by bringing into existence a discourse of such a character that the audience, in thought and action, is so engaged that it becomes mediator of change. In this sense rhetoric is always persuasive.31
This is exactly why I take explanation to be a work of rhetoric. The aim of an explanation is to induce new beliefs in the person who asks an explanation-seeking question. Moreover, I think that we cannot understand the notion of explanation unless we take its intended functions into account. Bitzer maintains that the rhetorical situation consists of three elements prior to any discourse: 1) the exigence; 2) the audience to be constrained in decision and action; and 3) the constraints that influence the rhetor and can be brought to bear upon the audience.32 Any exigence “is an imperfection marked by urgency; it is a defect, an obstacle, something waiting to be done.” There are numerous exigencies, but the only rhetorical ones are those that can be modified or changed. Also, a rhetorical exigence must be modifiable by means of discourse; changes brought about in other ways are not rhetorical. Bitzer claims, moreover, that in a rhetorical situation there will be at least one exigence that controls and organises the situation: “it specifies the audience to be addressed and the change to be effected.” Next is the audience. A rhetorical discourse requires an audience because a rhetorical discourse is one that can influence people to change and thereby to make decisions and take actions. Finally, in every rhetorical situation there exists a set of constraints. These constraints are created by persons, events, objects and relations that have the power to confine the decision and
31 Bitzer (1968/1999), p. 219.
32 Bitzer (1968/1999), p. 220 f.
the action necessary in order to change the exigence. Bitzer mentions beliefs, attitudes, documents, facts, traditions, images, interests, and motives as some of the main sources of constraint. In addition to these, and apart from the manner in which his discourse harnesses the constraints already given by the situation, the orator brings in further constraints, namely his own personal character, his logical proofs, and his style. Equally important constraints, I may add, are the orator’s background knowledge and his entire worldview. How does the rhetorical situation help us to understand explanation, in particular scientific explanation? In fact, Bitzer denies that scientific discourse requires the same kind of audience as a rhetorical discourse. He argues that science does not need an audience to produce its ends, since scientists can produce a discourse expressive and generative of knowledge without engaging another mind.33 I think, however, that this statement conveys a superficial understanding of scientific discourse, first and foremost because science is a highly social enterprise. I agree that a single scientist can establish empirical knowledge without engaging an audience. She may observe a lot of low-level scientific facts, such as that a mercury column, at the same temperature, is higher at sea level than at the top of Mont Blanc. But a scientific explanation is usually not the result of observational knowledge. It most often expresses some hypothetical beliefs because, in general, scientific explanations appeal to unobservable facts that the explainer believes explain the phenomenon in question. The scientific community as a whole must accept any such theoretical assumptions to elevate them to scientific knowledge. It goes without saying that an explanation always has a proper audience, namely at least the person who originally raised the explanation-seeking question. 
Bitzer defines his rhetorical audience as persons “who are capable of being influenced by discourse and of being mediators of change.” This description fits the inquirer, even if she is the same person as the explainer. The answer she eventually produces changes her beliefs or modifies her state of mind from ignorance to knowledge. During this process the scientist will bring her explanation to a larger forum. Through journals or conferences she will present her response to a certain question to her fellow scientists, who may have asked the same question and struggled to find a proper explanation. In the end, if she succeeds in convincing them of her suggestion, it is not only her mind that has undergone changes but the entire scientific community’s. What, then, can we say about the rhetorical exigence? Bitzer sees it as an imperfection that specifies both the audience to be addressed and the changes to be made by the discourse. In the case of explanation the rhetorical exigence is the lack of knowledge, which a person signals openly when asking why P. This both controls and organises the situation. A person’s lack of knowledge is an imperfection that can be remedied by an explanatory response. And the question points to the person who is the primary addressee of the explanation. Indeed, giving a reason why something is the case is constrained in several ways. It seems that every constraint that confines ordinary explanation can also influence a scientific explanation proper. First, the explanatory situation is
33 Bitzer (1968/1999), p. 221.
constrained by the facts of the matter. To be successful as a scientific explanation, the response to a why-question cannot deny or ignore obvious facts. Second, the contrast class also constrains the explanatory situation. The actual explanation should be more probable than any other explanation in terms of the contrast class. The explainee will not be convinced of the response if it is far-fetched with respect to what else she believes, that is, if the response appeals to assumptions that are not part of the common scientific background in which she has been trained and socialised. There are exceptions, of course. She may convince herself if new theoretical considerations or new empirical evidence support such an explanation. But her personal beliefs, interests and perspective also play a role in producing a response, or accepting a response, as the explanation. The mere fact that an explanation cannot bring in all the appropriate facts at once makes it the result of a selection. But we should also remember that explanations can be empirically underdetermined and that it may be impossible to select the best explanation among different theoretical proposals. Likewise, the kind of action the inquirer may want to take on the basis of the explanatory information can be a constraining factor. In general, we may say that the explanatory situation must meet the requirement that the response to an explanation-seeking question is relevant, but what counts as relevant depends not merely on objective facts but on social and personal facts as well. Thus, thinking of explanation as a rhetorical discourse helps us to grasp the essential notion. The explanation is called into existence by a situation: the situation which an explainer understands as an invitation to create and present an explanation. We have called this the explanatory situation. Furthermore, not every response will do, since not every response fits the explanatory situation. 
It must be a fitting response; it has to be relevant in the sense that it provides the wanted information. Seeing a situation as one that invites a fitting response makes sense only if the situation itself somehow prescribes the response that fits. A response has to meet the requirements of the situation, which are partly objective and partly subjective.34 The exigency, which generates the need for an explanation, is an epistemic problem of why something is the case, and the explanation is meant to give the solution to this problem. The person who formulates the problem may also address and solve it, but the explanatory situation requires that such a solution always be formulated in terms that can be understood by, and communicated to, other scientists who struggle with the same problem. Most often someone within the community raises a question and somebody else answers it.
34 Bitzer takes a strong realist stand on the rhetorical situation by saying: “The exigency and the complex of persons, objects, events, and relations which generate rhetorical discourse are located in reality, are objective and publicly observable historic facts in the world of experience, are therefore available for scrutiny by an observer or critic who attends them. To say the situation is objective, publicly observable, and historic means that is real or genuine—that our critical examination will certify its existence.” (p. 223). This is at the same time a meta-rhetorical stand. The rhetorical situation itself may contain features which belong to the perspective of the persons involved in the discourse and which may not be publicly observable. These features exist nevertheless and may be revealed by other means.
5. EXPLANATORY RELEVANCE
An important requirement of an explanation is that the response to an explanation-seeking question is relevant. An answer that is considered irrelevant does not function as an explanation. What, then, establishes explanatory relevance, and how much does it depend on the problem context? It seems that no single feature characterises explanatory relevance, but that formal, semantic, methodological and pragmatic elements of the explanatory situation all play a role in determining whether a response suits it. The features I am talking about are descriptive as well as normative. We may say that relevance is always measured against the explanatory situation, including 1) the background of the inquirer, 2) the epistemic problem revealed by the interlocutor, and 3) the objective state of affairs which has generated the epistemic problem. The background of a scientific inquirer contains metaphysical beliefs, theoretical assumptions, empirical knowledge, practical skills and social training, as well as cognitive and methodological values. Different inquirers share to a large extent the same beliefs, practices and values, but some of these may vary from scientist to scientist. For instance, scientists have different metaphysical views on the world, and this will influence what they take as a relevant response. Einstein never accepted quantum mechanics as an adequate theory of atomic processes because he believed that the world was deterministic, whereas Bohr did not share the same predilection for determinism. He considered quantum mechanics the only proper account of atomic phenomena. A similar difference may exist with respect to methodological values. Among them we find simplicity, accuracy, consistency, inter-theoretical unity and coherence, and fruitfulness. 
Thomas Kuhn correctly pointed out that methodological values are vague and that different scientists may apply them differently, but even if they did not, these values would sometimes conflict with one another.35 Some scientists prefer a more accurate explanation, while others look for explanations having a broader perspective and better explanatory resources. Kuhn’s example par excellence was what happened when astronomers had to choose between the geocentric and the heliocentric explanation of the planetary movements before Kepler added his laws and Newton came forward with his classical mechanics. The geocentric explanation had its spokesmen because it was in agreement with the physics of that time. But others gave their support to the heliocentric explanation since it was overall simpler than the geocentric explanation. Two scientists may share the same methodological values; they may apply them in the same way, and put them into the same hierarchy of importance in case of a conflict. Nevertheless, they may disagree with respect to the relative weight these values have in those cases where the values work together. Indeed, they may agree on everything concerning these methodological values and still prefer different explanations, provided that the accounts given as a proper response are empirically underdetermined. So apart from common criteria of relevance, which scientists share due to their scientific training, we also find individual criteria. These rely on the
35 Kuhn (1977).
scientist’s previous experience, the type of work she has done until the time of inquiry, whether she has had success within her earlier field of work, the kinds of concepts and techniques she masters, and so on. Some scientists may prefer mathematically developed explanations; others seek more visualisable accounts. Non-scientific values also figure among the individual criteria. Kuhn argued, for instance, that the young Kepler accepted that the heliocentric explanation gave a relevant account of the planetary movement because he was occupied with hermetic and Neo-Platonist thoughts at that time. The second feature of the explanatory situation that determines whether a response is relevant as an answer is the problem that creates the question. The scientist understands a lot of facts, but in relation to these facts there is perhaps something that she doesn’t understand, and an explanation is relevant only if it can provide information about what she is missing. But a response is adequate not only with respect to what information it gives but also with respect to how it is told. She may not understand why a certain phenomenon exists or why a certain anomaly appears, and she will ask for an explanation that reflects the kind of epistemic problem she has. The nature of this problem points to the genre of explanation, that is, to the format a response must take to be considered relevant. Here we find explanatory genres such as nomic, causal, probabilistic, functional, functionalist, structural, intentional, and interpretative explanation. All are responses to why-questions, but it is the particular problem in question that prescribes the relevant genre. A scientist may want to understand why a particular event occurs; hence a causal explanation, in which one appeals to the cause of this event, will be the kind of account that is relevant for reaching such an understanding. 
Another scientist hopes to grasp why a certain property helps an organism or an artefact to be successful. Here an appeal to its actual effect, instead of an appeal to its cause, may be considered relevant to gain the appropriate understanding. The effect is not intended to explain why this particular feature exists, but to explain what the particular function of the feature is, and therefore how possessing this feature contributes to the object’s existence. The same is the case when it comes to understanding people’s actions. The epistemic problem is to get to know why they did as they did. We usually perceive people to have motives for fulfilling certain goals, and we see their actions as the means of realising those aims. Hence a social scientist will regard as relevant a response which explains the action in relation to the intended effect, the goal, and not the actual cause, the motive. Likewise, the real world constrains the explanatory situation and thereby determines the relevance of the explanatory discourse. All serious requests for knowledge are formed as information-seeking questions, and an answer to these questions, if the answer does not merely consist in stating a fact, gives us an explanation. Thus, explanations are answers that provide information about a fact by relating it to other facts. But again, not every fact will do, nor every form of relation. If a response reflects a fact which by no means could have had any real relation to, or any influence on, the fact whose existence gave rise to the question, the response will not be relevant and therefore not be an explanation. I have, for instance, previously argued that it is fully legitimate to claim that certain patterns in
the English cornfields, which have been reported now and then, are due to beings from outer space. Such an answer is relevant as an explanation, although highly improbable, because it refers to something that could make such patterns if such beings were real and had visited the earth. Facts like the height of the Eiffel Tower, the date of my birth, that some mammals lay eggs, or that supernovas are exploding stars cannot figure legitimately in a response if the answer is to count as an explanation. They do not belong to the right ontological categories to stand in the appropriate connection to the cornfield patterns. In other words, an answer is only to be considered as an explanation if it does not commit a category mistake. Among the explanatory relations, the causal connection seems to be by far the most effective. It is not spurious, it is real, and it is observable. The explanatory virtue of causes is that causes exist in the world, that they connect facts or events together, and that we think of the cause as what brings about the effect. Because of these features, any answer that appeals to a cause is taken to be highly relevant and therefore to provide an explanation of the effect. But we have to remember two things. First, causes take place in a network of circumstances, and no single fact or event among those constituting this network is objectively the cause. It is the entire network as such that constrains the explanatory situation, while we select the factors we find most interesting. Second, there are real relations other than causal relations that can constrain the explanatory situation and condition the answer to be an explanation. Not every relation in nature is a causal one. And in a world of structures, functions, meanings, rules or interpretations there are other kinds of relations that play the same kind of constraining role. The exigence of the explanatory situation in these cases is the lack of knowledge concerning them. 
Beneath these relations we may find causal ones, but this would be irrelevant as long as our cognitive goal is to get to know things in terms of their structures, functions, meanings, rules, and interpretations.
6. EXPLANATORY FORCE
What gives an explanation its force to explain a certain fact? An explanation consists of a description of the event to be explained and of the events that are invoked to account for it. The explanans itself is only an explanans relative to the explanandum: it is towards the explanandum that its explanatory force is directed. The question is whether every explanans must stand in the same kind of relation to its explanandum in order to be explanatory. Let us see what the pragmatic-rhetorical theory has to offer on this issue. Before we proceed, an important distinction has to be made. We must distinguish between explanatory relevance and explanatory force. The former notion means that the explanatory answer fits the explanation-seeking question in the sense that there exists an appropriate thematic connection between the two. Whether the accessible information is seen as relevant or not is determined, as we have seen, by our background knowledge, our interests, and the nature of the epistemic problem. The latter notion, however, reflects the fact that an answer is successful in getting the
interlocutor to believe that it answers her question and therefore that the facts are as stated by the explanation. The pragmatic-rhetorical view holds that logic alone cannot account for explanatory force. The formal-logical approach takes explanatory force to consist in the inferential link between the explanans and the explanandum. If the explanans logically entails the explanandum, then and only then does the explanans have the power to explain the explanandum. It assumes that there is a logical fact of the matter which gives an explanation its explanatory force, and whenever the interlocutor grasps this objective state of affairs, she understands how and why the explanation explains. But we have already argued that there is no such deductive link between a theory, a set of propositions, and those propositions that state the facts we want to explain. And even if there were such an inferential connection, fundamental laws could still not explain anything because they do not describe the real world. The pragmatic-rhetorical theory also insists that truth has little, if anything, to do with explanatory force. Nor has truth anything to do with explanatory relevance. A theory of explanation should be able to specify what an explanation is regardless of whether it is true or false. Truth is definitely not sufficient, because an explanans, if it is true, is never true relative to the explanandum, whereas it is an explanans only relative to the explanandum. Nor does truth seem to be necessary. Looking into the history of science, we see, more often than not, hypotheses promoted as yielding appropriate explanations of observed facts that later turn out to be false. Many of the assumptions that scientists acclaim today will probably turn out to be false too. In spite of their falsity, they are nevertheless thought of as vehicles of genuine explanations. False hypotheses are the rule, true ones the exception. Hence, truth cannot be a necessary condition either. 
As long as the scientific community thinks of a theory as true, nobody has a problem with the suggestion that it can explain what it has to explain. However, no one wants a false explanation, because it gives us wrong and useless information, which will mislead us if we want to take action based on it. So once a theory is known to be false, scientists reject it as what gives us the wanted explanation. So truth in itself may not be necessary for explanation, but what may be necessary is the belief that the explanation is true. Undoubtedly we often acknowledge something as an explanation, even though we know that it is false. We do so because we see the answer as relevant to the explanation-seeking question. For instance, Lamarck’s suggestion that acquired attributes could be inherited by a new generation was used to explain the development of biological species. Since Darwin, biologists have regarded this hypothesis as false; however, they may still maintain that it is an explanation, albeit a wrong one. This and many similar cases indicate that we can accept a response to a why-question as a possible explanation without this acceptance being accompanied by a belief in its truth. Again, we may come up with a distinction between look-alike explanations and proper explanations in order to say that potential explanations are only look-alike explanations. In real life, lying to his wife about the lipstick on his collar, a man may tell her that the bus had stopped very abruptly, and a lady’s face had bumped into his
J. FAYE
shoulder. He knows that it is not true, and she does not believe him. She thinks of it as explaining things away. It appears as an explanation, but is it a proper explanation? In other words, can a response to a why-question show itself as an explanation without being one? Intuitions concerning this point seem to be divided. If one holds that a belief in the truth of the response is required for a proper explanation, the man’s lie does not count as such. Assuming that look-alike explanations are not different from proper explanations, we must then be able to specify what makes such responses like proper explanations. Look-alike explanations and proper explanations must have some essential feature in common. What is it? Both provide reasons why something is a fact. Here the reason takes the form of a causal story. The man appeals to a possible causal connection between the lipstick on his collar and an imagined episode on the bus, hoping that his wife would then believe him. In other words, his response is relevant in virtue of its reference to this possible connection, although it is false and his wife does not believe him. Hence, it seems justified to say that any relevant response to an explanation-seeking question is an explanation. Accepting this analysis means accepting that an explanation is a response that is considered to be relevant to an explanation-seeking question. So an explanation may lack explanatory force. Whether the man speaks the truth or tells a lie, his explanation would in both cases fail to explain the fact. As long as his wife doesn’t believe him, the explanation has not fulfilled its purpose. She refuses to embrace the explanation not because she doesn’t see it as relevant, but because she has not been convinced that things are as she is told. It does not weaken the argument to suppose that she has no problem imagining things happening as stated in her husband’s explanation.
But she possesses no evidential support and may instead have counter-evidence, like love letters to her husband from an unknown woman, disrupted phone calls, etc. What is missing, and what would make the explanation successful, is her trust in her husband. The man has lost his authority as an explainer. The wife therefore thinks that the explanation is very implausible, although she deems it to be an explanation. Thus, an explanation has, as an act of rhetorical discourse, the force of explaining a fact if, and only if, it can persuade the audience to think of it as being correct. Its ability to convince an audience rests on both the explanatory relevance and the explainer’s ethos. An explanation has explanatory relevance in relation to the problem context it addresses. But this is not sufficient to make the explainee believe that things are as the explanation says they are. Whether an explanation is successful or unsuccessful in explaining the facts in question is partly due to the rhetorical situation, which includes the explainer’s ethos. In science, too, people sometimes accept explanations because of the explainer’s ethos. It is well known among historians of science that an experiment or an assumption will be given more credit than it deserves if a famous scientist supports it. Even a wrong formula may find long-lasting acceptance in the scientific community only because of the reputation of the person(s) who made the calculation. A couple of years after the advent of the theory of relativity, several physicists, among them Max Planck and Albert Einstein himself, independently produced a formulation of the thermodynamical laws in accordance with the special
principle of relativity. Their treatment was adopted by many textbooks over the years until H. Ott, as late as 1963, and independently H. Arzeliès in 1965, discovered that the old formulation was not satisfactory. In particular, this was so because Planck and Einstein had used generalised forces instead of true mechanical forces in the description of thermodynamical processes.36 In his discussion of this very instructive example of how the explainer’s ethos plays a role in the audience’s belief in her explanation, Møller describes the laws of relativistic thermodynamics in terms that fit neither the formal-logical nor the ontological approach. Let me quote at length:
The papers by Ott and Arzeliès gave rise to many controversial discussions in the literature and at the present there is no generally accepted description of relativistic thermodynamics. This is because many different formulations of the thermodynamical laws are possible, since the principle of relativity alone does not determine them uniquely. In fact, from this principle we may conclude only that the classical laws of thermodynamics are valid in the momentary rest system S0 of the matter, independently of the motion of this system with respect to the fixed stars. However, there is a wide spectrum of possible ways of describing relativistic thermodynamics in any other system S, since the basic laws may be assumed in a rather arbitrary way to depend explicitly on the velocity of the matter relative to S. In this situation we must have recourse to arguments of simplicity and convenience.37
Thus, Møller maintains that the choice of a relativistic thermodynamics is empirically and theoretically underdetermined, and that which of the many possible formulations one prefers depends on methodological and pragmatic criteria such as simplicity and convenience. Again we see an illustration of the fact that there is no single correct covering law, and therefore no single explanation.
7. THE LOGIC OF EXPLANATION
Looking in the Oxford English Dictionary, one will see that the verb “to explain” is given two seemingly different meanings. The term means either to make something plain or clear, or to give or be a reason for something. This distinction reflects the difference between description and explanation. So we have description-giving explanations and reason-giving explanations. These different kinds of explanation seem to correspond to different types of questions. The description-giving explanation would be the answer to a how- or what-question, whereas reason-giving explanations are responses to why-questions. But things are not as obvious and straightforward as they seem. There can be no doubt that the appropriate answer to every why-question is a reason-giving response. This is the essential feature of the logic of discourse to which a why-question is subjected. Consider a question like:
36 See Møller (1972), pp. 107, 219, and 232 f. for details and further references. Møller also used the old formulation in the first edition of his book, which was published in 1952. For instance, the Joule heat developed in the electric body per unit of time and volume with respect to a moving frame S would in the old formulation be expressed as $\varphi_0 (1 - u^2/c^2)^{1/2}$, whereas in the new formulation it becomes $\varphi_0 / (1 - u^2/c^2)^{1/2}$.
37 Møller (1972), p. 233.
(1) Why do some birds migrate to Africa in the autumn? An appropriate reaction to this question is to say: (2) The reason that some birds migrate to Africa in the autumn is that they would not be able to find food during the winter in Europe.
The question and the answer only make sense, of course, within the broader context of the Earth’s geography and the annual climate changes in the Northern Hemisphere. In fact, the answer that cites the lack of food is no more relevant than one that refers to the lack of daylight, cold temperature, or snow coverage. All these facts are parts of the same overall causal story, in which the lack of food is the perspicuous result. Another appropriate answer is: (3) The reason that some birds migrate to Africa in the autumn is that they have an instinct to do so. This answer (and question) requires an even broader context, including biological evolution and selection, to make sense. I think that it is because the stated reason in either (2) or (3) is meant to justify the existence of a puzzling fact (the migration) that explanations are often associated only with responses to why-questions. It is this element of justification connected with stating a reason that intuitively gets people to think of an explanation as a reason-giving answer and, therefore, as an answer to a why-question. I have previously argued that scientific explanations may be responses to kinds of questions other than why-questions, such as how- and what-questions. William Dray and Michael Scriven38 already noticed this, and Sylvain Bromberger, Peter Achinstein, and the late Wesley Salmon39 have also denied that all explanations are answers to why-questions. Furthermore, I have argued that the distinction between description and explanation is a matter of pragmatics, not of logic or semantics. I shall briefly elaborate on these issues. Indeed, many requests for knowledge phrased as how- or what-questions can be rephrased as why-questions. A what-causal question such as (4) What causes some birds to migrate to Africa in the autumn? can be replaced by (1). Hence (2) and (3) are both possible answers to (4). But also a what-question such as (5) What do you mean?
38 Scriven (1962), pp. 173-174.
39 Salmon (1989), p. 138.
may be reformulated as (6) Why are you saying so? But not every how- or what-question can be translated into a corresponding why-question without a loss of meaning. Again, it is the context that decides. The translation argument does not settle the debate. We have to show, one may argue, that it is impossible to find any why-question that matches the response. The important issue is whether or not a response to a how- or a what-question can always be construed as if it were also a response to a why-question. A positive claim does not hinge on the claim that non-why-questions cannot be constructed for explanatory answers, nor on the claim that these explanation-seeking non-why-questions are somehow disguised why-questions. The suggestion is only that if a proper answer to, say, a how-question is to count as an explanation, you must also be able to find a matching why-question for this answer. In other words, the claim is that it is necessary that an explanation be a potential response to a why-question; however, it is not necessary that it be an actual response to a why-question. The idea behind this suggestion is that an explanation is identical with an answer to any kind of wh-question as long as it states a reason. Therefore, it must always be possible to find a why-question that matches the reason-giving answer. Consider the following question: (7) How do birds from Northern Europe actually migrate to Africa? This how-question cannot simply be replaced with (8) Why do birds from Northern Europe actually migrate to Africa? These two questions mean different things. The how-question asks for the actual manner in which birds migrate, whereas the why-question asks for the actual reason why birds migrate. In both cases the inquirer’s emphasis determines that it is the migration the answer should inform about.
The straightforward, but highly relevant, response to this how-question would be that birds do fly (instead of walking, swimming, etc.), whereas the proper response to the why-question cites the lack of food in Northern Europe during the winter (instead of the lack of daylight, cold temperature, snow coverage, etc.). Thus, since the answer to (7) does not contain a reason, it cannot be an explanation. It has been recognised by several authors that how-possibly questions are genuine explanation-seeking questions. Take a question like: (9) How is it possible for birds to migrate to Africa? At first sight, (9) expresses that the person who poses the question is under the mistaken impression that the occurrence is physically impossible or highly improbable. This is Hempel’s interpretation. But there are several other adequate
interpretations. The correct understanding depends on the explanatory situation. The question may, for instance, just as well express not the person’s disbelief but her lack of knowledge concerning which properties birds have that allow them to head in the right direction. The intended meaning of the question is then something like “How do birds find their way to Africa while migrating?” or “How are birds able to navigate their way to Africa?” The relevant answer to this question depends on whether the birds are only nocturnal migrants, only daytime migrants, or both. The response may therefore refer to the birds’ internal star maps, or their magnetic sense, and/or their ability to correct their course by the sun as well as by landmarks. One answer is (10) The reason that it is possible for nocturnal migrants to migrate to Africa is that they can navigate with the help of the stars. Here we can easily find a why-question that matches this response to (9): (11) Why is it possible for birds to fly straight to Africa? Another possible interpretation of (9) arises in an explanatory situation in which the person has been thinking of the distance between Northern Europe and Africa: (12) How are small birds able to make such a long journey? In this case (10) would no longer be an appropriate response. Instead, an answer such as (13) The reason that it is possible for birds to migrate the long distance to Africa is that they can find enough food while resting would reflect the intentions behind the question. Again, (13) is an appropriate response to the following why-question: (14) Why is it possible for birds to travel the long distance to Africa? Thus, any satisfactory answer, including answers to how-possibly questions, gives us the reason for birds finding Africa, and as a consequence, it seems, it is possible to find an appropriate why-question that corresponds to the how-possibly question.
Let us return to the how-actually question, this time with another example taken from Salmon. Instead of (7) we could ask (15) How did mammals (other than bats) come to be in New Zealand? The answer to this question is that human beings came in boats and later imported other mammals. Salmon regarded this answer as a genuine scientific
explanation. It is not an explanation of why they came there, but an explanation of how they got there. Thus (15) cannot be reformulated as (16) Why did mammals (other than bats) come to be in New Zealand? Here an appropriate response is (17) The reason that mammals came to be in New Zealand is that people wanted to use them (except for mice and rats), whereas the proper response to (15) cannot be expressed as a reason-giving answer. Does this mean, then, that a story about how mammals came to be in New Zealand does not count as an explanation? Salmon said no, but he never told us why. Some may hold that answering a how-actually question merely gives us a description, because such a response does not provide us with a reason, as an answer to a why-question normally does. In my opinion such a reply is wrong. It is correct that only a response to a why-question (or a matching what-causal or how-possibly question) yields a reason. If one thinks of explanation in terms of giving reasons, then answers to how-actually questions cannot act as explanations. The case is closed. But if one takes an explanation to be an answer that has been selected from a huge repertoire of possible responses, then the case is still open. I think that many answers to how- and what-questions that cannot be replaced by a why-question function as genuine scientific explanations. Think of responses to questions like “How did the Universe begin?”, “How did the Egyptians build the Pyramids?” and “What kind of chemical bond connects Na atoms and Cl atoms?” In every case of explanation, we explain one fact by relating it to another fact in contrast to a whole class of possible facts, i.e. the contrast class, each member of which might have been mentioned in an alternative explanation. An answer counts as an explanation in the explanatory situation because it is informative in virtue of the fact that other answers are possible.
When, in some rhetorical situations, we think of an answer to a how- or a what-question as a description but in others rather as an explanation, it has to do with whether or not the question poses an enigma. And whether it constitutes a riddle depends on our background knowledge. The answer to (7), that birds fly, is seen more as a description than an explanation, because it is part of common knowledge that nearly all birds fly, and that this is how they prefer to move around over longer distances. This is not something which science has discovered. This is something we know. How birds migrate does not normally represent an epistemic problem to anybody. We are quite certain that we will never discover that migrating birds don’t fly. But, again, the question could signal the presence of a real epistemic problem in the context. Other answers to (7) are possible, responses which we may then take to be explanations. What I have in mind are those explanatory situations where the person who asks the question wonders whether or not birds migrate in all kinds of weather, flying by day and/or night, making stops or flying non-stop, etc. If (7) expresses an epistemic problem, an appropriate answer relies on scientific investigation, and the result of this investigation is the subject of the explanation.
Accordingly, the answer to (15), namely that human beings came in boats and later imported other mammals, functions as a scientific explanation. Here the information provided by the answer is not part of common knowledge. How mammals came to New Zealand represents an epistemic problem, not merely to a single person but to the scientific community as a whole. The information that solves the problem has been uncovered through scientific research. Therefore it is also possible that biologists and historians will one day reveal that mammals already lived in New Zealand when humans arrived in their boats. The upshot of our discussion is that responses to how- and what-questions also function as explanations, in spite of the fact that the erotetic logic of these kinds of questions does not allow us to formulate reason-giving answers. The distinction between description and explanation is a pragmatic one. If a response addresses the epistemic problem raised in a question in a relevant and informative way, the answer yields an explanation. If the answer does not approach any epistemic problem because the question does not express one, it merely functions as a description.
REFERENCES
Achinstein, P. (1983). The Nature of Explanation. New York: Oxford University Press.
Bitzer, L. (1968/1999). The Rhetorical Situation. Philosophy and Rhetoric 1. Reprinted in Lucaites et al. (eds.): Contemporary Rhetorical Theory. New York: The Guilford Press: 217-225.
Cartwright, N. (1983). How the Laws of Physics Lie. Oxford: Clarendon Press.
Collin, F. (1999). Literal Meaning, Interpretation and Objectivity. In Haapala and Naukkarinen (eds.): Interpretations and its Boundaries. Helsinki: Helsinki University Press.
Danto, A. (1985). Narration and Knowledge. New York: Columbia University Press.
Faye, J. (1999). Explanation Explained. Synthese 120: 61-75.
Faye, J. (2002). Rethinking Science. Aldershot: Ashgate.
Gärdenfors, P. (1990). An Epistemic Analysis of Explanations and Causal Beliefs. Topoi 9.
Hansson, B. (1975). Explanation: Of What? (Unpublished manuscript).
Hempel, C. G. (1965). Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: The Free Press.
Kuhn, T. S. (1977). Objectivity, Value Judgment, and Theory Choice. In his The Essential Tension. Chicago: The University of Chicago Press: 340-52.
Møller, C. (1972). The Theory of Relativity. Oxford: Clarendon Press.
Salmon, W. (1989). Four Decades of Scientific Explanation. In Minnesota Studies in the Philosophy of Science XIII.
Scriven, M. (1962). Explanations, Predictions, and Laws. In Minnesota Studies in the Philosophy of Science III.
Searle, J. (1978). Literal Meaning. Erkenntnis 13: 207-24.
Sintonen, M. (1989). Explanation: In Search of the Rationale. In Minnesota Studies in the Philosophy of Science XIII.
van Fraassen, B. (1980). The Scientific Image. Oxford: Clarendon Press.
CAUSAL EXPLANATION PROVIDES KNOWLEDGE WHY
OLAV GJELSVIK
1. INTRODUCTION40
Events have causes. We often try to explain events, and we often succeed. The causal relation is a relation in the world which either holds or fails to hold independently of how its relata are described: the relation is extensional, and its relata are normally taken to be events. The explanatory relation is, however, intensional. This means that we cannot replace a term with co-referring or coextensional terms within an explanatory context without risking a change in the truth-value of the whole.41 I shall simply say that “explains” is an intensional relation, and I do so without thinking of this as an ontological commitment, or as something that anything really hangs on. It might be helpful to concentrate on the limitations on substitutivity and forget all talk about intensional entities. We must, of course, come to terms with the insight that what we explain are events in the real world. Events are concrete. The intensionality of “explains” brings with it no mystery, however; an explanation of an event explains that event “under a description”. This latter locution is technical, as talk about propositions is. The point is that a set of sentences explains an event only as long as the event to be explained is described in such a way that the relevance of what we say is plain. Only then do we provide an explanation; an explanation must, necessarily, provide understanding. Relevance is relevance for this (causal) understanding. This point about explanation is what underlies the intensionality of explanations, and what limits substitutivity. The sentence or statement describing the event to be explained must express a fact, or a true proposition. If it does not, there is nothing to explain. The fact to be explained in the causal case is typically that an event of a certain kind has occurred.
40 I am grateful to Alexander Bird, Nancy Cartwright, Edmund Henden, Stathis Psillos, Nils Roll-Hansen, Tim Williamson, and the two editors of this volume for help and comments.
41 This whole consists of a larger set of statements, which in turn is made up of two smaller sets of statements linked by the word "explains". The relata of the explanatory relation may be seen as statements, or as what those statements might be seen as expressing: propositions or thoughts.
69 J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 69–92. © 2007 Springer.
O. GJELSVIK
When we explain why a particular event of a certain type has occurred (a fact), we can for short say that we explain that event. It is thus quite easy to maintain that we explain events even if the explanatory relation itself is an intensional relation between statements or propositions. I now want to push aside considerations such as those above and simply speak of the explanation of events. Let us move on. Events have causes, and causes themselves have other causes. Typically, when we give a causal explanation, we relate cause (or causes) and effect somehow. But we do not just relate them (extensionally understood). That would not necessarily provide an explanation at all. We must relate them in particular ways to provide an explanation. I shall maintain that these ways are identified thus: there is no explanation of why an event occurred unless we provide an understanding of why that event occurred. This seems to me to capture the point of explanation. There have been very many suggestions about how we must relate cause and effect to provide a causal explanation. Many have seen the requirement that explanation provide understanding as insubstantial and vague. Similar doubts would attach to any attempt to illuminate understanding by means of the concept of knowledge. On the whole, it has been seen as entirely unilluminating to employ the concept of knowledge in accounting for explanation; knowledge was at best the final aim, never a starting point in such analyses. This paper aims to turn that around in the theory of explanation. The most famous account of explanation is Hempel’s D-N account. (NB: Hempel always thought that explanations could be probabilistic as well.) In its classical form Hempel’s view is that an explanation of a particular event consists in a correct deductive-nomological argument. There are law-premises and particular-fact premises. The conclusion says that the event we want explained took place, and the argument is valid.
All premises are true. On this view the structure of an explanation is the starting point. The intensionality of explanation is accounted for by the need to deduce the occurrence of an event from a law. The needs of deduction from laws determine the intensionality of explanation, we might say. Hempel’s account is not insubstantial or vague. But there are many well-known problems for this classical approach. For example, it seems to let in too many arguments as explanations. The approach therefore needs to impose further constraints. An easy way out here might be to say that only a D-N argument that presents a (set of) cause(s) is an explanation. But letting in too many is not the only problem: the approach also seems to let in too few arguments as explanations. Most causal explanations in real life fall far short of the ideal of deductive arguments with law-premises. Furthermore, the ideal might indeed be pointless, because our interest might fall on some particular part of the causal history of an event. David Lewis has presented an exceedingly simple view of causal explanation, and has related it to, and defended it against, many views in the literature. His thesis is this: To explain an event is to provide some information about its causal history. Lewis does not, of course, think of this as giving necessary and sufficient conditions for something to count as an explanation. Lewis does think of his thesis as equivalent to the following thesis about understanding: To understand why something happened is to possess some information about its causal history.
This paper can be seen as a critical reflection on Lewis. I agree with Lewis that we should not aim for a reductive account of causal explanation. I shall nevertheless dispute his thesis. The thesis is so weak that it can hardly be false, someone might think. I shall argue that it is much too weak. We ought to say something stronger in order to provide something fruitful about what explanation is all about. In the first part I shall consider two strengthenings of Lewis’s thesis. In the second part I shall develop my account a bit further, and finally I shall bring this account of explanation to bear on the issue of inference to the best explanation. The two strengthenings of Lewis are these:
1) To explain an event is to provide some knowledge about its causal history.
2) To explain p is to provide knowledge why p.
It is easy to see that the first strengthening is a somewhat minor modification of Lewis’s view. It leaves us with a view that relates to other views on causal explanation in many of the same ways as Lewis’s view does. The second strengthening gives us a view that seems to depart more from Lewis. Both suggested strengthenings are rooted in my conviction that a thesis about explanation should state the point of explanation in such a way that it gives a potential explainer some helpful guidelines about how to go about her task. But there is also more to it than that. Another preparation for what is to come is this: Sylvain Bromberger has pointed out that an explanation might be subjected to two very different types of question. An explanation is something about which it makes sense to ask: “How long did it take?”, “Who gave it?” etc. These questions see an explanation as an act by someone, an intentional activity in space and time. An explanation may also be an object about which we can ask: “Does anyone know it? Who thought of it first? Is it very complicated?” Perhaps we can think of this as a process/product ambiguity.
I shall think of it as the two sides of explanation: The activity of explaining, and the explanation given by the activity, which we may call the activity side and structural side of explanation. We can focus on either side when thinking about explanation, but we need to think about both for a general theory of (causal) explanation. One important issue is precisely how to relate them. Bromberger’s point is fully compatible with seeing explanation as providing understanding, since providing understanding is also both a concrete activity and the provision of an abstract object. Another way of putting the point is this: An explanation may be something concrete (an activity) and it may be something abstract. If it is the concrete act of explaining, we may ask how long it took, whether it was interrupted etc. We are then thinking of an act of explaining something. We may also think of it as an abstract entity, as something about which it can make sense to ask whether anyone knows it, whether it is complicated, what is its structure etc.
2. FIRST MOVEMENT: THE ACTIVITY SIDE
Guide for the reader: The objections to my views are numerous. I will answer some major objections to my views about the activity side after I have argued in support of the two modifications I am going to suggest. I ask the reader for patience until that point.
2.1 The minor modification: Explaining is factive, and more than that
Let us first look at the act of explaining something, in this case the act of explaining a particular event. In such an act, someone who possesses some knowledge or information about the cause or causes of this event tries to convey this knowledge or information to someone else. That is done with the purpose of providing understanding of why it happened: the other person asks for an explanation, asks why something occurred, and that is a request for understanding why it happened. The serious explainer must think of herself as possessing an explanation, i.e. information that provides understanding, and must think of her reply as making that explanation available to a person who does not possess the explanation. In giving the reply the explainer makes an assertion. We might think that this fact in itself constrains our account of explanation. In particular, if we think that the constitutive rule of assertion directly links knowledge and assertion, and says that you should only assert what you know, then the fact that we assert something when giving an explanation might in itself have consequences for a thesis about explanation.42 But I shall leave that issue aside, at least for the moment, and concentrate on the act of explaining and what this act is supposed to provide (namely understanding). Our attempt at explaining p might fail. When considering what to say at all in response to a request for an explanation, we must consider whether we have something we can say which is not an outright failure.
What we say when we try to give an explanation must be something we at least believe to be an explanation, something we believe provides understanding of why the event to be explained happened. Again: This paper is not in the business of giving anything like necessary and sufficient conditions for explanation. Nevertheless we must consider possible constraints on a reasonable thesis about explanation. A possible constraint is this: a thesis about explanation must, somehow, throw light on what it is to explain, and thereby, at least in a minimal sense, illuminate for the person who tries to explain how she should go about the task. If a thesis about explanation clearly allows as a possible explanation any number of statements that the explainer knows not to be explanations, then the thesis does not illuminate explanation. Lewis’s thesis is that to explain is to provide some information about the causal history. Lewis is clear that he does not use “information” in such a way as to imply truth. Information can be misinformation. This creates a problem. It seems entirely unreasonable to take Lewis to allow that an explainer can knowingly pass on misinformation and still think of what she is doing as explaining. Still, that seems to be implied by his stated thesis about explanation in conjunction with his use of the term “information”. Let us rule out that possibility, both as an interpretation of Lewis and as a possible position on explanation. Minimally, then, the explainer must believe that the information she gives is correct. I take that to imply that a serious explainer would wholly withdraw an explanation in which an untruth played an essential role. If an untruth played a part but not an essential part, then the explanation might be modified, but not wholly withdrawn. The notion of “essential role” I am appealing to is Lewis’s, and it is itself in need of illumination. I shall maintain that “essential” here must be understood as essential for (causal) understanding. Let us look at a Lewis-type example. There has been a crash. There is the icy road, the drunk driver, the bald tyre, and much more. (There is also the bald driver, the red car, the fact that it happened on a day with an “r” in it, and whatnot. I shall come back to these facts.) In a case like this there might be no easy way of identifying the cause; there might be, to put it loosely, many causes that work together. Competing explanations might mention different causes, and this raises many interesting issues. But so far the simple question before us is whether, when we give an explanation by providing some information about the causal history, we should a) believe, b) believe with good reason, or, perhaps, c) know that the information we provide is correct. I shall take a) as obviously true, as argued above. The question is whether we need something stronger than a). In the normal case, it seems clearly true that we need at least b) when we concentrate on the activity side. This is because we need reasons for our explanatory belief when we utter it.
42 See Williamson's (2001) theory of assertion, chapter 11.
CAUSAL EXPLANATION PROVIDES KNOWLEDGE WHY
In order to offer as an explanation that the driver was drunk, that the tyre was bald, or both, we would normally require that we have reason to believe that the driver was drunk or that the tyre was bald. We would normally feel obliged to check one way or another whether these things were the case before we offered such an explanation to someone. It seems clearly true that we should have some reason to believe that the information we offer in an explanation is correct. I conclude that the explainer must minimally believe, with some reason and perhaps with good reason, that the essential information given in an explanation is true. The question is whether this observation should have consequences for our thesis about explanation. It seems to me that it would if we accept the constraint on such theses stated above, namely that a thesis about explanation must illuminate (in a minimal sense) for an explainer how to go about the task. Since untruths should not be told, this must be reflected in the thesis. The thesis might then be: to explain an event is to give some information one has reason to believe is correct about its causal history. Now, if the information is not correct, is the information given explanatory? Has the person requesting information been given an explanation? Has understanding been provided? The answer seems to be clearly “no”. If your potential understanding of why something happened is based on false belief, then you do not understand why it happened. It is not enough for the provision of understanding why something happened that one has generated beliefs about causal factors.
Understanding must really be based on correct beliefs. One has not provided understanding to someone else unless that person’s understanding of the causal process is based on correct information or belief. Explaining, and hence providing an explanation, is therefore factive, and the factivity of explanation derives from the factivity of understanding. Let us move on to the issue of whether we should require something stronger than b), namely that the correct information we give should amount to knowledge on our part. If we knew that drivers in these parts very often had bald tyres, we would have a reason to believe that this particular driver was driving around with bald tyres. Still, it seems to me that this knowledge would not entitle us simply to give, as an explanation of the accident in question, that the tyres were bald. Even if we knew that drivers in this part of the country had bald tyres much more often than not, it would not seem right just to put forward the explanation that this accident was caused by a bald tyre. We might, in the imagined case, be entitled to say that the tyre might have been bald, or even that it was probably bald, but that would be as far as we could rightly go. We would not be entitled simply to explain the accident by saying that the tyre was bald. The same goes for drunk driving. Even if we knew that 98% of the drivers in these parts were drunk at the hour of the accident, it would seem wrong to straightforwardly maintain that the driver was drunk as part of an explanation of a particular accident. It seems that you should know that the present driver was drunk in order to explain the accident by that fact. If the only thing you do know is that 98% of the drivers in these parts were drunk at this hour, then that is what you should say. These points about knowledge may possibly derive from facts about assertion. I hinted at that possibility above.
Still, I claim they also reflect facts about understanding: If you thought this accident happened because the driver was drunk, and it turned out that the driver was not drunk, then you would not understand why this accident happened. In general you do not understand why something happens as long as an untruth plays an essential role in the explanation you provide. I take it as established that understanding causal processes is factive. Understanding is what you transfer when you explain to someone why something happened. When we explain something, we must point to something true, and in fact to something we also have reason to believe is true. The choices were: should we a) believe, b) believe with good reason, or, perhaps, c) know that the information we provide is correct? The suspicion now is that we should opt for c): only c) seems obviously to provide understanding. b) also seems too weak; no account of good reason will, on its own, do the trick. Notice this as well: If you, when giving an explanation, state more than what you know, then the likelihood that the person who receives the explanation forms false beliefs about the causal history of the event to be explained increases, and with it the likelihood of misunderstanding. To provide understanding, and to prevent misunderstanding, you should provide things you know, and only things you know.
These considerations drive us towards this: In the normal case of a causal explanation of a particular event, the explainer considering what to say in giving an explanation should state facts about the causes, facts she knows to be true. In that case one can see oneself as providing understanding in the right way to the person who asks for understanding. Of course knowledge is fallible and all that, but that is granted. It is similarly taken for granted that you do not have to know that you know in order to know. If we accept the general constraint that a thesis about explanation must somehow illuminate for an explainer how she should go about the business of explaining, then we seem stuck with the thesis that to explain an event is to provide understanding, and that to provide understanding we should provide knowledge about its causal history.43 We seem to have established a first modification of Lewis: thesis 1 above. We have done that by a) imposing as a constraint upon a thesis about explanation that it should give a potential explainer some guidance on how to go about giving an explanation (not a very strong constraint), and b) imposing structure derived from the aim of explanation, i.e. understanding. As a result we conclude that explaining is factive, and that the factiveness of explanation reflects the factiveness of understanding. I am also much inclined to continue: The factiveness of understanding is a reflection of the factivity of causal knowledge.44 The latter point, however, is in need of further argument.
2.2 The major modification
We have before us two theses, and the first, the weaker thesis, seems reasonably well established. The thesis is this: To explain an event is to provide some knowledge about its causal history. The question before us is whether we should settle for this thesis, or go for a stronger thesis, namely this: 2) To explain p is to provide knowledge why p.
This thesis represents a more radical break with Lewis. The break above, thesis 1, limits possible explanations to the giving of known information about causal histories. Should we limit them further? It seems to me that we should. It seems possible to know facts about the cause without really understanding why it brought about the effect. One can know the cause from a causal story and still not understand why the effect occurred. Understanding is a richer notion than the one captured by the first amendment, in several ways. I will consider some of them. First, we must not forget that causal relations are extensional and permit substitutions of co-referring singular terms. Identity conditions for events are a tricky topic, but most people admit that the burning of a bonfire might be the event Bob had been looking forward to most on a specific day. If this bonfire caused the house to burn down, then the event Bob had been looking forward to most that day caused the house to burn down. The truth of the last sentence might be the only thing we know about the cause, but this knowledge does not provide any understanding of why the effect took place. Let us return to the Lewis-type example. There has been a crash. There is the icy road, the drunk driver, the bald tyre, and much more. There are also other known facts: the bald driver, the red car, a day with an “r” in it. These latter known facts differ from the facts above: They seem causally irrelevant, and they do not seem to help in providing causal understanding. They are nevertheless pieces of correct information about the events in the causal history of the explanandum, the event to be explained. They are known facts about the causal history. We can imagine scenarios where these apparently irrelevant facts are indeed relevant. Such scenarios are very remote possibilities in this case, though. (The best we can probably do is to imagine a scenario where such properties indicate the presence of a relevant causal factor: there might be a strong correlation between driving patterns and baldness, so that baldness indicates a reckless driver; the road might be cleared of ice on all days without an “r” in them; etc.) What this seems to show is that providing knowledge about the causal history of an event in a causal explanation is not sufficient for providing understanding.
43
44
I have said that my reasoning above might be seen as appealing to general facts about assertion rather than particular facts about explanation in its appeal to knowledge. It might be a general truth that the constitutive rule of assertion is that you should only assert what you know to be true. My aim is to argue independently of such a thesis about assertion, and to argue about the particular case of putting forward an explanation, where I see the point of explanation as providing understanding. Perhaps my argument here supports such a general thesis about assertion. It is not, however, meant to rely upon it. I rely on the intuition that when an explanation has been provided, it must necessarily be able to provide understanding. There is a parallel here on the personal level: when you don’t know why, you don’t understand why.
We are not in the business of giving necessary and sufficient conditions, but it still seems that we must improve matters here. What the example indicates is that when we provide knowledge of the causal history of an event, we must not only provide information we know to be true about the events in the causal-historical chain; we must also give further information we at least believe is relevant for understanding why the event to be explained (the crash) occurred. I do not doubt that Lewis in a way presupposes that this is the kind of story we should provide, but if so, we should make the point explicit. There is a further interesting question whether we should only give information we know is relevant, or whether belief is sufficient. If we have reason to believe that the information is relevant, and the information is relevant, then we are typically in the position of knowing that the information is relevant. The reasons we have for believing that the information is relevant are typically other things that we know. If that is so, it seems to me that we will do better in providing understanding if we also provide the (relevant or salient) reason we have for thinking that the information we provide is relevant. Salience has now been mentioned, and that is indeed another factor here. When we explain an event like a crash we typically point to a specific causal factor, the factor that is in fact most salient to us. One explainer says the bald tyre was the cause of the crash, another maintains it was the drunkenness, a third points to the icy road. Very often these explainers would agree that all three factors are essential for
the causal history: There would not have been a crash had any one of these factors been absent. If that is so, there is disagreement about salience, but not about relevance. Disagreement about salience I see as disagreement about which factor, among several operative or relevant causal factors, it is natural to mention in an explanation. Many of the other factors would typically be taken for granted and not mentioned at all. They might be common knowledge. Salience works against such a background. It also works against such a background when it comes to picking the salient reason for believing that the information is relevant. There might also be disagreement about relevance. Explainer A might hold that this level of drunkenness is very likely to lead to a crash under all circumstances: the crash probably would have happened even if the tyre had not been bald or the road had not been icy. That is explainer A’s judgement, and if so, A had better have some reasons for this view. Explainer B might hold that this driver was not so drunk that it really mattered for driving ability in this case, and that the icy road was so slippery that all tyres, even new tyres, would have slipped. C would go for the baldness of the tyre. In a scenario like this there would be competing explanations. A, B and C would be expected to bring forward reasons and justifications for their views. In the end all of them might be wrong: all the mentioned factors might have been necessary for this accident after all. If we look at the justifications brought forward by A, B and C when explanations compete in this way, we see that general probabilistic knowledge about the relationship between drunkenness and driving accidents would not help A all that much; he would have to back his view with specific knowledge about the particular case, and with justifications for his view that the other factors were not relevant in this particular case.
The same goes for the other two, B and C, and their argumentative tasks. This strongly indicates that a reasonable belief that a certain factor is relevant, reasonable because it is the type of factor which very often plays a role, is not, when challenged, sufficient to put one in a position to promote the view that it was relevant. It seems that we need more than just reasonable belief that a certain factor was causally relevant. If challenged, we need at least reasonable beliefs about causal relevance that are grounded in observations and investigative results about the particular case. To A, B and C, the grounded reasonable beliefs they rely on would be subjectively indistinguishable from knowledge. In this dispute, all three would think of themselves as putting forward knowledge about which factors were causally relevant. Of course they cannot all be right, but that is another matter. Only someone who is right does in fact understand why this crash happened. The issues around salience must not distract us. What we are now concentrating on is whether we should be required to know that a factor is causally relevant if we offer that factor in a causal explanation of an event. The considerations above about A, B and C seem to support that requirement. Reasonable belief about relevance does not seem sufficient for A, B or C. All three of them might have reasonable beliefs about relevance. Only one of them (at most) can have knowledge about relevance in this
example, and only knowledge seems to provide understanding. They need not, of course, know that they have knowledge about relevance. Salience, as I understand it, connects with understanding in a different way. We might ask why something happened, and seek understanding, even when many causal factors are common knowledge. (There was oxygen in the air, etc., etc.) In that context, understanding might be achieved by pointing to a further factor, to the fact that a known factor was much stronger than usual, to the absence of a factor that is normally known to be there, or to an unusual factor not normally known to be there. When an explanation provides understanding, that normally takes place in a context where there are not only many beliefs and assumptions but also much knowledge that plays a role in generating the question. Issues in the pragmatics of explanation must be understood against precisely this background of knowledge. Our conclusion is that, when explaining p, we should not merely provide knowledge about the causal history of p. We should provide factors we know are relevant to the case, and thereby to the understanding we want to provide: factors which in fact played a role and were significant. We might limit our reply to factors that are particularly significant (salient), and there is a lot of pragmatic leeway in providing understanding, but on the whole we should only mention factors that we know are relevant when providing an explanation. A further important point which the example of A, B and C brings out is this: It simply seems true that we often, when explaining, do better by providing not only a known relevant factor, but also the reasons and grounds we have for believing that factor relevant. Those reasons should be part of the causal story we provide.
In order to promote understanding and prevent misunderstanding, it seems that everybody would gain by being given the specific grounded reasons for believing that the relevant factors were indeed relevant. In a given situation, one way of doing that might be to show that one factor was sufficient, or nearly sufficient, for the occurrence of the explanandum, given the background knowledge. That might be done by subsuming the event to be explained under a known regularity: The drunkenness was perhaps at a level where driving ability is very much reduced, so much reduced that normal driving along this stretch of road is extremely unlikely. If that is known, it is vital to pass this knowledge on to those requesting an explanation. We should therefore provide more than a known factor and the statement that this factor is relevant: we should provide why we take it to be relevant, and in doing so we should contribute what we know about why it is relevant. So there are two things: knowledge about which factors are relevant, and knowledge about why these factors are indeed relevant. In explanation both kinds of knowledge are important. The first kind links the explanans and the explanandum as cause and effect. The second kind links the explanans and the explanandum by providing the relevant reasons for thinking the first link is there. Note that this way of putting it does not tie the property of being explanatory to the property of predictive power, or to the property of establishing a modal link that exhibits the necessity of the effect. Still, it makes the provision of understanding the central feature. Another, very different and indeed interesting case would be one where the causal factor does not make the explanandum very likely at all, but is the only known causal factor. Then this last fact needs to be stated in the explanation. (This can be
taken as the case of untreated latent syphilis and paresis. In this case we know of no other causal factor, and for all we know there might not be one. What we have is, to my mind, an explanation, because it provides the causal knowledge we have, knowledge that again might be the only causal knowledge we can have in this case.) Again: we should supply what we know and only what we know, in order to prevent misunderstanding and in order to make plain how we know that the factor in question was relevant. Conclusion: Explanation provides understanding, and understanding is, simply put, knowing why. In understanding we link the explanandum and the explanans. Knowing, for good reason(s), that the link is there constitutes causal understanding. Knowing what there is to know about which links there are to many or all facts in the causal history constitutes (full) causal understanding. It seems to me that the aim of providing (causal) understanding when explaining limits the ways of giving causal histories dramatically. The ways of limiting causal histories all contribute to accounting for the semantic intensionality of explanations. Especially important for accounting for the intensionality is the need to preserve knowledge in the allowable substitutions. Lewis does not aim to provide necessary and sufficient conditions for explanation, and neither do I. In many ways I share Lewis’s picture. I still hold that we can do better than Lewis, and that we do so by seeing understanding as the aim of explanation. This first part can be summed up in a simplified slogan: to explain p is to provide knowledge why p.45
2.3 Three qualifications
a) Conjectures
Many are now getting impatient to raise a major objection: We often do not know the explanans for a fact when we explain. I agree. There is clearly such a thing as conjecturing an explanation.
(There is also such a thing as mistakenly believing one knows p, for instance when p happens, contrary to belief, to be false.) In the case of conjecturing an explanation, we either suggest an explanation while believing that we do not really know that the explanans took place (we say: Maybe the road was icy?), or we do not know that the factor we point to was relevant (we say: It might have been the icy road). What we typically do when we conjecture an explanation is to mark or indicate that what we are doing falls somewhat short of explaining. We indicate that by the maybes and might-have-beens. I said above that the serious explainer must think of herself as possessing an explanation (of some sort), and must think of her reply as making that explanation available to a person who does not possess it. I believe the explainer who conjectures an explanation sees herself as not really possessing an explanation, but only a reasonable or likely hypothesis. To possess an explanation one needs further evidence. It is also clear that by conjecturing an explanation I may be providing the propositional structure that happens to be the right objective explanation. My distinction between conjecturing an explanation and explaining lies wholly on the subjective side, and concerns whether the potential explainer should rightly be said to see herself as putting forward knowledge why p or not. Of course one might be mistaken in that as well. Lastly, there is a delicate and difficult border terrain stretching from conjectures like the one about drunkenness to probabilistic knowledge of causal factors and statistical explanation. Take the drunk-driving case I have described. Knowing that 98% of the drivers in this place are drunk at this time is only good enough for conjecturing an explanation of the crash as long as the interesting condition, the drunkenness, is something that we know can be checked independently by a blood test. If that has not been done, we should stick to a conjecture when it comes to identifying the cause(s) of the crash. But in many cases there is no such independently checkable factor: there is nothing more to identifying the cause(s) than the probabilistic knowledge we have about connections, and no independent condition (like drunkenness) that we know how to check for. In that case we must again state precisely that: we must state what we believe we know, and leave it at that. We are then not conjecturing an explanation that we believe we can have; we are stating the only explanation we have and believe (with reason) we can have.
45 There is a considerable simplification here. Nancy Cartwright has raised the worry that I may understand without knowing: "When explainers pass on an explanation to me, I may understand without knowing because I don’t have the same justification as they do - they may be experts. I accept what they say on the weak ground that I hypothesize that they probably know what they are talking about. If they do, then it seems that I have understanding without having knowledge." I see this situation clearly, but I would think about this case differently from Cartwright. You have understanding of the causal structure, and you also have some knowledge (though you might not know that); in fact you understand to the extent that you know, but your knowledge and your understanding are different from, and in a sense inferior to, the experts' knowledge and understanding.
b) Idealizations
In most explanations of some complexity we idealize and simplify, and what we say when explaining is often strictly false. “Since this is so, your thesis is completely off the mark,” I hear many people saying. I reply that this is true in a way, but there is a lot to consider here. Firstly, I am only dealing with causal explanation. Simplification in the causal case must in general be such that it promotes causal understanding. In a given situation, it might promote such understanding to leave out some tiny factors and concentrate on the major ones (among those relevant). It might promote understanding to simplify things so that one can get manageable mathematics going. Still, after all that, all idealization is and must be a matter of sound judgement against the background of knowledge why the event happened. Only against such a background is it clear that simplifications and idealizations are fully in order and that they promote understanding rather than the opposite. The understanding of how simplification and idealization work must therefore be provided as a development of the present thesis about causal explanation, and such things constitute a (scientific) refinement of the activity of providing knowledge why.
c) Knowing what
Above, I have in fact presupposed that we know, and agree on, what our epistemic puzzle is when asking for an explanation. That presupposition lies behind the thought that what we are after can be captured by “why”-questions in the causal case. Maybe this is wrong. If it is, my starting point is really a causal query rather than a specific why-question; I do not think the syntactic form of the question is essential. It has also been presupposed that we all know what the explanandum is, or what the epistemic query is. But sometimes we do not know this. Sometimes we are in the business of finding that out. In that case, a what-question may disguise itself as a why-question. Sometimes deep scientific disagreements concern not how to explain why an event occurred, but what the event to be explained is. Then the what-question is the prior and primary question, and the why-question is secondary. Take one of Lewis’s possible counterexamples to his own thesis: Walt is immune to smallpox. Why? He is immune because he possesses antibodies able to fight off a virus attack. This reply does not give the cause of the immunity. It rather explains what it is to be immune. If this is the reply we seek, we have the case of a what-question in the form of a why-question. We should not find that odd. Note that we may be after a cause after all. Perhaps the answer we want is this: Because he has been vaccinated. That is in fact the natural answer to a why-question. It gives a causal explanation of the immunity, and assumes that we know what the explanandum is. Take a very different sort of case. Why did Walt’s hand move? Here is one reply: It moved because he wanted to give a sign to Peter. A very different reply might cite a set of neurophysiological events resulting in the hand movement. These two replies differ in very interesting ways. They seem to conceptualise the explanandum differently. The first thinks of the explanandum as an action. Actions we typically explain by the acting person’s motives. The second reply seems to think of the hand movement as a neurophysiological event with a neurophysiological cause, and of itself as stating that cause.
There is the possibility that these two explanations are explanations of the same event, only described in different ways. There is also the possibility that they explain different events. In fact, we are here up against very big controversies about the relationship between the mental and the physical. On one view, which sees the action and the neurophysiological event as ontologically different entities, there might be no question of the explanations competing to explain the same explanandum. There are two distinct events, and they are explained within different explanatory schemes (which does not necessarily deny interactions between these schemes). On another view, which sees the action as also a complex neurophysiological event, the two explanatory schemes might represent different ways of explaining the same thing, and the relationship between them might be thought of in many different ways. One such view sees the relationship as that between folk theory (intentional explanation) and (real) science. Problems surrounding reductionism loom in the background. My aim here is not to argue for or against one metaphysical view or another. My aim is only to point to the fundamental importance of “what”-questions, and to separate disagreements on those issues from disagreements about how to answer why-questions. There might be full agreement about how to explain neurophysiological events but disagreement about whether actions can be explained that way at all. Different answers to what-questions might be at the bottom of
82
O. GJELSVIK
the most fundamental scientific disagreements. The different answers to what-questions might reflect deep differences not only in scientific but also in metaphysical outlook. Parallel issues arise for the relationship between the entities and events described in the various special sciences, between physics, chemistry and biology. On the whole our general metaphysical outlook is deeply influenced by the development of the sciences and the success of various explanatory strategies, and there has of course been significant development through the centuries in how we view and think about what-questions at all. For the rest of the paper I shall push “what”-questions aside, and return to the why-questions.
3. INTERLUDE: EXPLAINING WELL AND BADLY
An act of explaining can be more or less satisfactory. Lewis holds that it will not be instructive to fuss about whether an unsatisfactory explanation deserves to be called an explanation. I agree with that statement. Does the knowledge requirement bring with it constraints which conflict with this? I do not think so. I shall survey the important cases; they are a somewhat mixed bag. It seems to me that my view has much the same flexibility as Lewis’s in most of the cases, but that it fares much better than Lewis’s in that it can account in a natural way for why certain explanations are unsatisfactory. Consider the following:
1. “The explanatory information given might be correct, but not thanks to the explainer. He may have said what he did not know and had no very good reason to believe, even if the information happens to be satisfactory” (Lewis 1987, p. 227). To me it is far from clear why Lewis’s approach should find something unsatisfactory in this case. It seems to me obvious that the present account can easily and naturally explain why it is unsatisfactory.
The case seems most likely to be one of someone who presents herself as giving an explanation where she should at most have presented herself as conjecturing an explanation. Such cases support my general line strongly, since I can say what is unsatisfactory about them. I can still describe the case as a case where an explanation has been given.
2. There are false statements in the explanans. It follows from the knowledge requirement that there should be no false propositions in the explanans. Lewis gives the example of a case where the explanans has a natural division into conjuncts, and a few of them are false. Still, it might be the case that they are not far from the truth, and that the explainer knows the most important conjuncts perfectly well. If the false conjunct is a relatively unimportant one which could be fairly easily corrected, and the main conjuncts are known, it seems to me that we, from our perspective, might want to say that the explainer knew the essentials of the explanation even if some detail was wrong. As long as we can say that, we might also want to say that an explanation was given, even if it was not fully satisfactory. It is not satisfactory because there were things the explainer should have known in addition to what she knew. Of course the explainer ought to see herself as knowing an explanation as long as she is not conjecturing an explanation.
CAUSAL EXPLANATION PROVIDES KNOWLEDGE WHY
83
There are also the following cases:
3. The explainer provides fairly little information. This case is easy: The explainer knows something and provides that, but that is not all that much. Of course it would have been better if more had been known.
4. The information given may be stale news.
5. The information given may not be of the sort the recipient wants.
6. The information might be given in a jumbled or bad way.
7. The recipient might start out with some misinformation and the explainer might fail to correct this.
In all of these cases my view does not really commit me to saying whether or not the unsatisfactory explanation deserves to be called an explanation. That might be an issue we should not push. Nevertheless, my view provides a platform for assessing and analysing the unsatisfactoriness, and that we do need. The latter cases, from 3 to 7, are all cases where the recipient does not receive the right thing: what is received is too little, stale, and so on. The situation of the recipient is not much improved, and if the recipient did not know an explanation of an occurrence of something before receiving the service of the explainer, fairly little has changed. The knowledge situation of the receiver ought to change upon receiving an explanation, and ideally the situation should be such as to leave the recipient with knowledge why. If that has not happened, an explanation has not been given. The present view provides the rough standard, and the point of an explanation. It also provides us with the right material for saying, in many cases, that this is not really explaining. The difficulty with Lewis’s view on explanation is again that we seem not to be given any material for saying anything like that. The point is that such material should be provided by the account of explanation, even if the account need not settle all grey-zone cases and rule on each of them whether it is an explanation or not.
The other type of case is where the receiver receives the objectively right stuff but the explainer does not know that.
4. SECOND MOVEMENT: THE OBJECT (STRUCTURE?) SIDE
When thinking of oneself as conjecturing an explanation, one might be providing the objectively correct explanation and one might not. Explanation can, as Bromberger so forcefully pointed out, be an objective entity, something that we are the first to know. Let us concentrate on this objective side of things, and the abstract object that is the explanation, an object which can be known for the first time, and which has a structure we can discover. A natural thought, given what we have argued so far, is that the structure of the abstract object should be seen as deeply constrained by the fact that a good explanation provides understanding, and this I see as providing knowledge why. There cannot be a further point to the structure of this abstract object beyond that of serving the point of explanation. Much of the best work on explanation can in fact be seen as attempts to identify the type of structure such an abstract entity, the explanation, must have. There is a temptation to think that there is an objective entity here whose structure and properties we can discover, an entity that exists independently of our
explanatory attempts. When we objectify, this ideal object becomes the object of inquiry, and we see the task of the theory of explanation as identifying the requirements for being such a structure. Satisfying these requirements is then seen as necessary and sufficient for being an explanation. There are, of course, disagreements about what type of structures and properties we should concentrate upon. One line is to concentrate on the logical and syntactic structure of the explanation. Linguistic objects are in good ontological standing, and are taken as unproblematic by physicalists, as logical properties also are. Hempel and Oppenheim seem to have thought along these lines when laying down their explicit requirements for an explanation of a fact. (A fact is a true description of an event. There are law premises and particular-fact premises; the conclusion says that the event we want explained took place; the argument from the premises to the conclusion is deductively valid; and all premises are true.) The verdict today, as stated above, is that the strategy of capturing what an explanation is by identifying structural properties, for instance formal and syntactic properties of such an objective structure, fails. Furthermore, the idea that explanation and prediction go closely together seems altogether wrong, and on the knowledge approach one can see explanation in a quite different way. But Hempel’s view still has appeal: In a range of cases it seems right, or perhaps ideally right, to provide a D-N explanation. And we might also ask why this is so. David Lewis discusses in some detail the relationship between his own proposal and the D-N model. Defenders of the D-N model of course object to employing the ordinary notion of causation: that notion has resisted precise analysis and is not available.
But when they give examples to motivate the D-N model, they invariably pick examples where the covering-law model includes a list of joint causes of the explanandum event. In fact, Lewis’s model and the covering-law model of explanation could have been reconciled if one had a covering-law model of causation. Even the present approach to explanation could have been reconciled with a covering-law model if we had a workable D-N-type reductive theory of causation. But we do not have a D-N analysis of causation, and in my judgement there is no good reason to think that we can or even could have one. However, a D-N argument may present us with causes, and then it definitely looks explanatory. This fact we ought to be able to account for. I think we can account for it this way: When the D-N argument provides the relevant causal information, it is explanatory because it provides the relevant causal knowledge, and because it provides the best reason we have for thinking that the cause is the cause. Providing such a reason is exactly the role of subsumption under the regularity, which in this case is a true causal regularity. The knowledge approach can therefore easily account for the great appeal of the D-N model, by noting that there may be many cases where the D-N model provides exactly what the knowledge approach says should be provided, namely knowledge why the event to be explained happened. But, the knowledge approach would maintain, in many cases we have a different sort of reason for thinking that the cause is the cause. It is indeed a fact that real-life explainers mostly do not bother to serve up full D-N arguments. We seldom do, and most often we cannot. It may be that we cannot because there is so much we do not know. There is always a lot more to
know about everything. But this permanent lack of complete knowledge does not stop us from knowing why many an event occurred. We may have such knowledge in very many different ways, i.e. in ways much short of anything like D-N arguments. Sometimes the best we can have is the identification of the mechanism (for instance in a psychological case) that produced the event we want explained. Knowing the mechanism is then knowing why; it is the reason for believing the cause to be the cause, and it should not, in this case, be thought of as an explanation sketch to be filled out into a D-N argument.46 There is no denying that the great appeal of the D-N model has been intimately connected with the great appeal of a regularity view of causation. It is fairly easy to see that such a reductive view of causation, a view which reduces the causal relation to epistemically manageable factors and connections, namely regularities and the like, lends a lot of credence to the D-N model. Some of the ammunition against the D-N model is ammunition against the regularity view of causation. It might also be fair to say that when Wesley Salmon developed his alternative model of explanation, the statistical relevance model, he was also led to an alternative view of causation and causal processes. Salmon's view of explanation is that to explain an event is to assign to it the broadest homogeneous reference class. (It is noticeable how he avoids using the concept of causality when characterizing explanation.) In causal explanation, one issue then is whether this always gives us the best reason for identifying the cause as a cause. It clearly does not always do that: it all depends on the causal mechanisms at work and on how they interact, whether mechanisms cancel each other out, and so on.
Salmon needs to link evidential probabilistic facts with causal knowledge, and it is far from clear that this can be done: sometimes it works, but sometimes it does not, as Nancy Cartwright has stressed. (Salmon's work on causal processes is to my mind very refreshing, not least because it turns the theory of causation away from arguments and linguistic structures and towards (ontic) processes in the world. This turn is very significant, and I believe it is a good and healthy move in thinking about causation. The delicate issue here is the extent to which you can say something interesting and informative about causation without actively employing causal notions. Perhaps what you can say amounts to this: The presence of a cause must somehow increase the likelihood of an effect when all other causal factors are held fixed.47 Maybe you can say little that is true about causation beyond this, and one needs to employ causal notions even to get this right. (One can say that a cause makes the effect happen, but I see that as employing causal notions to illuminate causality.)) James Woodward has also recently argued against the view that causal explanation involves subsumption under laws.48 According to him, whether or not a generalization can be used to explain depends on whether it is invariant rather than on whether it is lawful. Invariance comes in degrees and has other advantages: a generalization can be invariant even if it holds only in a limited spatiotemporal interval. (Note how the notion of cause is again avoided in characterizing causal explanations!) Again, my perspective on this view is that we have a good reason for believing the cause to be a cause of some effect when we can connect that cause with the effect by an invariant generalization. That is why explanations with invariant generalizations are explanatory. But the question is whether this is the only type of reason for believing a cause to be a cause, or the only one we should look for. There is no good reason to think that, even if there are often good reasons for looking for invariant generalizations. The present approach maintains that the explanation should provide knowledge why, and that this task unifies causal explanation. We can see the task of providing knowledge why as a task that can be fulfilled in quite different ways. However, it involves stating what the cause is (what the causes are) and providing, in the right sort of way, the reasons for thinking that the cause is the cause. Which way is right depends on the audience, and so on. Logical or semantic structure cannot identify causal explanation. This conception follows naturally if we agree that there is no reductive account of causation or of causal knowledge available.
46 See Jon Elster's (1999) discussion of mechanisms, chapter 1. I agree with much of what he says about mechanisms, but I would stress that we may have causal knowledge in many different ways. The Sherlock Holmes method, for instance, rules out the impossible and leaves us with the improbable (but still the only thing possible).
47 It seems to me that Salmon saw his theory of explanation as the evidential starting point of explanations, a basis that also needed explaining. The hard problem has always been, as it is for the regularity view, to find a way to distinguish between mere correlations and genuine causal relations. Salmon's strategy went by way of spatially continuous processes and interactions between those processes, and in fact it appealed to the notion of a genuine causal process. This notion surely had the traditional problems built into it if the aim was reductive. Salmon later turned to the transfer of conserved quantities to account for the causal relation, and then he definitely seemed to be begging the question about many types of causal connection, for instance all issues about mental causation. (He wrote that "causal processes transmit conserved quantities, and by virtue of this fact they are causal" in Salmon (1998, p. 253). See the discussion in Paul Humphreys (2001, pp. 523-528).)
48 See Woodward (2000).
The reductive approach to causation lends support to the idea that there is an objective structure common to all causal explanations, since on that approach there are necessary and sufficient conditions for causality, and knowledge of those conditions would by definition always yield causal knowledge. If we leave that reductive ideal behind as misguided, then the demands on how to fulfil the explanatory task will naturally be given a concrete interpretation relative to the discipline and the knowledge situation in which we find ourselves. We can start with the causal connections we recognize and think we know about, and we have no external or general motivation for being revisionary about what causal connections there are. Of course we can change our minds about such things as we learn more, but that is another matter. But this means taking our present explanatory practices very seriously indeed, and seeing ourselves as working from within them when theorising about explanation. This does not mean that theorising about explanation imposes no normative constraints upon explanation. Of course it does, and one such constraint is to be explicit about your assumptions and all your reasoning. Still, this view can maintain that the way of arriving at a normative theory has some of the elements of a reflective equilibrium. (The sort of normative equilibria I am thinking of can be local and situated.) In a discipline where we do explain from laws, the D-N model might become the natural instantiation of the task of exhibiting knowledge why in an explanation. But in other disciplines, economics for instance, we do not explain by invoking laws; we explain by identifying causal mechanisms, and the identification of such mechanisms exhibits the causal knowledge that can be had. On this way of thinking, the constraints on the structural/objective side should be seen as a reflection of the task of stating knowledge why in the normatively best way in concrete situations. However, disciplines change, knowledge situations change, the perceived relationship between disciplines changes, and the general metaphysical outlook which frames the knowledge situation also undergoes historical changes. W. Salmon once wrote:
[…] what constitute adequate explanation depends crucially upon the mechanisms that operate in the world. In all of this there is [...] no logical necessity whatever. [...] My aim has been to articulate contingent features of scientific explanations in this world as we presently conceive it. (Salmon 1984, pp. 240 and 278)
My difference from Salmon, if there is one on this point, is that I see these contingent features of explanation as various concrete ways of fulfilling the task of providing knowledge why. That general feature I do not see as contingent. I therefore think that the search for necessary and sufficient conditions for explanation in general, and the search for such conditions in the properties of an objective structure, the structure of causal explanation, is, in a way, badly misconceived. It is a misconception that there is such a thing as the objectively right structure of all causal explanations of particular events. This misconception is fed by the misconception that in the end there will be a satisfactory account of causation. This latter misconception fails to realise that some of the basic concepts, like the concepts of knowledge and causation, are so centrally placed that there are no reductive accounts of them. We can still say illuminating things about such concepts, and we should not be afraid of employing them for our philosophical purposes. Employing them becomes a way of illuminating both the philosophical problems and these concepts. We can put this somewhat differently: There is an objective explanation with a particular structure when a good explanation of an event has been given, and in that case we can ask all of Bromberger's questions about it. Still, the properties of the objective explanation do not generalize to all explanations of all particular events, even if they might generalize interestingly. They do not generalize across the board because understanding and knowledge why p are different from discipline to discipline, and sometimes within disciplines. The objective structure that is an instantiation of knowledge why is clearly not the same in all disciplines or in all causal explanations outside science. What remains constant is only that they are instantiations of knowledge why.
At this point there will be various charges. One charge will be that the concept of knowledge is vague; another will be that by employing the concept of knowledge we start where we should at best end, since it is not a concept available for use. The concepts that are available for use are presumably evidence, beliefs (expressed in linguistic structures), inferential connections, assertion, justification, etc. To these points I want to say: T. Williamson’s work in epistemology tries precisely to show
that we throw a lot of interesting light on belief by starting with knowledge. By seeing knowledge as the prior concept, we might put this by saying that belief is to be understood as failed knowledge, knowledge is not to be understood as belief with success. Once we cease to take belief to be simple and conceptually prior, we can experiment with using the concept of knowledge to elucidate concepts of justification and evidence. (Williamson 2001 p. 9)
Williamson goes on to give extremely interesting elucidations of evidence, justification and assertion. I extend a strategy of this type to explanation and (causal) understanding. In the light of Williamson’s work this is indeed a natural suggestion. Considerations of conceptual priority give no reason not to employ the concept of knowledge: if knowledge is prior to the concepts needed to account for explanation in all the attempts that superficially seem to avoid employing knowledge, then knowledge is prior to explanation anyway. What such attempts gain by avoiding knowledge is really nothing. There is another charge, and that is that the concept of knowledge lacks precision. It is true that it lacks precision in the sense that we cannot give a precise definition of knowledge. That is just because the concept is so central, and a definition would have to employ concepts that are secondary to knowledge. But the fruits of employing knowledge can only be seen in its application. One such application is in diagnosing and understanding the shortcomings of alternative approaches. The case of causal explanation is special because the concept of causality seems to be a concept with many parallels to the concept of knowledge: it has an equally central standing, as has been forcefully argued by Nancy Cartwright. I want to make the following points as a closing of this paper (and final preparation for the coda), and as an indication of where we might want to go from here:
a) A knowledge account of causal explanation carries with it great insights that are likely to be lost by “reductive” accounts. The knowledge account can be seen to contain resources that make it able to say why the D-N model is right when it is right and wrong when it is wrong. It also accounts for the general attraction of the epistemic accounts of causal explanation; they are attractive because they capture the need for reasons for seeing the cause(s) as cause(s).
The various epistemic virtues they appeal to, be it uniformity, coherence, or simplicity, all have their role in providing reasons for believing the cause to be the cause. Predictive power, however, is not a defining trait of explanation. It is rather a consequence of there being a special type of good reasons for believing a cause to be a cause. The present view also has resources to explain why Salmon’s approach is right when it is right, and wrong when it is wrong. Salmon is right that there might be limitations in what kinds of reasons we can give for believing a cause to be a cause, and we should not be misled by a misguided ideal of epistemic perfection. His view of explanation, that to explain an event is to assign to it the broadest homogeneous reference class, is nevertheless much too simple as a view of causal explanation: it does not always link evidence with causal knowledge, and there might be causal structures it cannot identify.
I will give a controversial example. We might hold: We explain an action by giving the reason(s) for which the agent acted. This does not fit the D-N model, and nothing is gained by trying to force it into it. Nothing is gained by introducing the idea of transmission of conserved quantities in this explanatory context. Knowing the reason for which the agent acted is clearly a way of knowing why the agent acted thus, and provides us with an explanation of the act. If we are also provided with information about how we know this reason to be the reason for which the agent acted, we have a good explanation, knowledge why. Knowing the reason is being familiar with a causally relevant factor, and nothing is gained by denying this because you accept a reductive view of causation. Of course a lot more needs to be said about what it is to explain an action. But a main point made here is independent of the more general and also controversial point that we explain actions causally by giving reasons. It is the point that if the latter point were right, we would have a way of knowing why an action took place.
b) We have to see the constraints on the structural side, the properties of the objective explanation given, as arising out of the meeting between the demand for knowledge why (arising out of the need for understanding, i.e. causal understanding) and the concrete situation in which the explanation is given, a situation which is ultimately framed within a scientific and a metaphysical outlook. The approach takes very seriously the need to bring in the knowledge situation in question, and it therefore brings the world in. In the scientific case it brings in the knowledge situation of the discipline in question. It is a serious mistake for an account of explanation not to do that.
c) The account does not at all aim to replace the normative role of a theory of explanation with a purely descriptive task. Knowledge is a normative concept.
Explanations get their point from the normative task of providing knowledge why, and the aim of science is scientific knowledge. When we explain causally, we apply our knowledge why. Our understanding is relative to our system of concepts. Such systems also develop and improve, and as they do improve, we improve our understanding, our explanations and our knowledge. Still, as long as explanations provide knowledge why, we have all the objectivity we can ask for in explanation. d) This account has a complex relationship with other accounts. In one respect it operates on a high level of abstraction, and it aims to integrate both positive and negative knowledge about explanation into a more concrete account. Interesting non-reductive theories are always much indebted to reductive attempts. I have argued that various theories of explanation can be seen as exemplifying various kinds of explanatory knowledge, and that one main problem simply is that they do not generalize. There is great need of exploring further what explanatory knowledge consists in - there might be kinds that have not been covered, and there might be many corrections that need be made also in the case of accounts of explanation that seem correct for a certain situation. Knowing p does not require that one knows that one knows: knowing p is not luminous.49 On the other, knowing that you do know p is very helpful when deciding whether or not to assert p. It is also very helpful when 49
See Williamson (2000).
90
O. GJELSVIK
wondering whether to assert an explanation. Part of the motivation for continued effort in exploring explanation could be to get closer to the favoured state of knowing that you do know why when you explain. This opens up a different perspective on what the philosophy of science is about which is able to exploit and digest for its own purposes much of the present good work. The debate between modalist, unificationist/epistemicist, and ontic-mechanist approaches to explanation, to use Salmon’s vocabulary, could all be seen in this particular light as well. 5. CODA: INFERENCE TO THE BEST EXPLANATION The change of focus gives a perspective on most issues in a theory of explanation. I shall end by making some remarks about inference to the best explanation. This I do because this issue has become such a central issue in debates about realism.50. It has becomes central because of the claim that the best explanation of the success of science is the truth of science, and we therefore should conclude that science is mainly true. The latter conclusion requires the soundness of inference to the best explanation.51 The traditional issue of whether inference to the best explanation is acceptable can be seen as the issue of whether we can infer from the fact that one explanation satisfies all criteria for being a good explanation - better than its competitors-, to the conclusion that this explanation is also to be believed: we should accept it as a true description of the world. The problem is naturally conceived in a setting where truth is not a necessary property of an explanation; it is rather the issue (in Hempel’s wording) of whether the best explanation is a true explanation. The whole 50
51
As a preamble to that, I will cast a glance at Hempel's discussion of his requirements. His last requirement, the requirement that the premises must be true, has a somewhat interesting history in Hempel's discussion. In a postscript (from 1964) to his early (1948) article with Oppenheim, Hempel adds that true premises characterize a correct or true explanation. He is therefore making a distinction between true and correct explanations, and explanation in general. In an analysis of the logical structure of explanatory arguments, he says this truth-requirement may be disregarded. (See Hempel (1965 p. 249)). Of course, what seems to be implied by the latter is simply that the rendering of the full logical structure of explanatory arguments is independent of whether the premises are true: the formal structures of correct and incorrect explanations are the same. If the aim is to capture the essentials of explanation by identifying their formal structure, then a truth-requirement is additional. From where does it stem and what is its status? Note how Hempel's distinction between true explanations and explanations is almost the mirror image of the activity-side distinction between conjecturing an explanation and explaining. From the present perspective the requirement that statements in the explanans be true flows directly out of the knowledge approach to explanation, that one must know that the explanans is true. This truth-requirement thus flows out of what I have called the activity side, and that is a side played down by Hempel. Ben-Menahem (1990) contains a nice discussion of inference to best explanation. It also contains a negative discussion of the use of such inference to prove realism. I myself am sceptical as to whether truth is a property that can have a real explanatory role in empirical explanations. I think that what really explains the success of the theory T that the atom is F is that the atom is F. This point can be put by employing the concept of truth. 
Still the real explanation of the empirical phenomenon of the success of science need not make reference to truth, and can be given also by those who believe truth is a purely logical concept.
CAUSAL EXPLANATION PROVIDES KNOWLEDGE WHY
problematic presupposes that we have available the important properties of good (potential) explanations independently of whether we believe they explain anything. But that can be granted. The subjective part of the inference to the best explanation naturally comes out as the issue of whether we should commit ourselves (in belief) to the best of several competing hypotheses (or to the one and only hypothesis in case there is no competition). We might maintain, with Timothy Williamson, that the mental state of belief should be seen as deriving its properties from the mental state of knowing, and that to adopt a belief is to make an epistemic commitment that would be knowledge if what is believed is true. This then describes the subjective side of inference to the best explanation: the issue boils down to whether the best of competing hypotheses deserves such an epistemic commitment. And of course there is no general rule about that; it may deserve such a commitment, and it may not. That all depends on the competition and on how well the better explanation satisfies other requirements on being something we should commit ourselves to in belief. This again boils down to general epistemological issues, where it of course counts much in favour of accepting a proposition that it explains well propositions known to be true. But there is no simple valid inference to the best explanation, no rule of inference. And the property of loveliness, a property of the whole explanation, is not linked with the property of likeliness, a property of the explanans.52 The issue has another side, and it is this. If we think of ourselves as really explaining and not as conjecturing an explanation, then we are already committed to the truth of the explanation we are giving. In this case inference to the best explanation is not even an issue.
(It is possible to imagine a case where various explanations, to whose truth we are committed anyway, compete with one another. That again is a different situation.) Inference to the best explanation, when made in practice, can therefore be seen as making a judgement as to whether we can judge ourselves to be explaining the explanandum, and not just conjecturing an explanation of it. In order to make that positive step, it matters much how well the hypothesis we favour, the best hypothesis, explains. On the other hand, if we are committed to seeing ourselves as explaining the event in question, we are committed to the truth of the explanation we give. The second commitment is then contained in the first. Inference to the best explanation might thus be two different things. This is brought out by the way I conceive of the relationship between what I have called the activity side and the structural side of explanation. I distinguish two different subjective states, explaining and conjecturing an explanation. Subjectively, there might be many cases we do not know how to categorize, since we do not know whether we are explaining or just conjecturing. If it is right that to explain p you need to know why p, there is a sense in which inference to the best explanation is utterly trivial: if you are committed to the best explanation in the sense that you hold that it actually explains the explanandum, then you are committed to the truth of this explanation. There is
52 See Peter Lipton's work on inference to the best explanation.
O. GJELSVIK
another sense in which inference to the best explanation is not trivial at all, and that is when making the inference to the best explanation boils down to judging that one is not just conjecturing an explanation but is in fact explaining when putting forward the best explanation one has. Making that step is no simple inference at all, and there is no general recommendation that you always commit yourself in belief to the best or most likely among competing explanations. Even the best of them might not be all that good or all that likely; they might all be conjectures, some good and some bad. There being no general rule here reflects the fact that you need not know that you know when you know, and you need not believe that you know when you know. You may believe that you know the explanation and be wrong, and you may believe you do not know the explanation and be wrong. Still, the commitment you make when you start believing that you know is a very significant one. If you are right, you are truly able to explain.
REFERENCES
Ben-Menahem, Y. (1990). The Inference to the Best Explanation. Erkenntnis 33: 319-344.
Bromberger, S. An Approach to Explanation. In R. J. Butler (ed.): Analytical Philosophy, 2nd Series. Oxford: Blackwell: 72-103.
Elster, J. (1999). Alchemies of the Mind. Cambridge: Cambridge University Press.
Hempel, C. (1965). Aspects of Scientific Explanation. New York: The Free Press.
Humphreys, P. (2000). Review of W. Salmon's Causality and Explanation. Journal of Philosophy 97(9): 523-528.
Lewis, D. (1987). Causal Explanation. In his Philosophical Papers, vol. 2. Oxford: Oxford University Press.
Lipton, P. (1991). Inference to the Best Explanation. London: Routledge.
Salmon, W. (1984). Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press.
Salmon, W. (1989). Four Decades of Scientific Explanation. In Minnesota Studies in the Philosophy of Science, Vol. XIII. Minneapolis: University of Minnesota Press: 3-219.
Salmon, W. (1998). Causality and Explanation. New York: Oxford University Press.
Williamson, T. (2000). Knowledge and its Limits. Oxford: Oxford University Press.
Woodward, J. (2000). Explanation and Invariance in the Special Sciences. British Journal for the Philosophy of Science 51: 197-254.
CAUSAL EXPLANATION AND MANIPULATION
STATHIS PSILLOS
1. INTRODUCTION
Causal explanation proceeds by citing the causes of the explanandum. Any model of causal explanation requires a specification of the relation between cause and effect in virtue of which citing the cause explains the effect. In particular, it requires a specification of what it is for the explanandum to be causally dependent on the explanans and of what types of things (broadly understood) the explanans are. There have been a number of such models. For the benefit of the unfamiliar reader, here is a brief statement of some major views. On David Lewis’s account, c causally explains e if c is connected to e by a network of causal chains. For him, causal explanation consists in presenting portions of the explanatory information captured by the causal network. On Wesley Salmon’s reading, c causally explains e if c is connected with e by a suitable continuous causal process (i.e., one capable of transmitting a mark). On the standard deductive-nomological reading of causal explanation, for c to causally explain e, c must be a nomologically sufficient condition for e. And for John Mackie, for c to causally explain e there must be event-types C and E such that C is an inus-condition for E.53 In a series of papers and a book, James Woodward (1997, 2000, 2002, 2003a, 2003b) has put forward a ‘manipulationist’ account of causal explanation. Briefly put, c causally explains e if e causally depends on c, where the notion of causal dependence is understood in terms of relevant (interventionist) counterfactuals, that is, counterfactuals that describe the outcomes of interventions. A bit more accurately, c causally explains e if, were c to be (actually or counterfactually) manipulated, e would change too. This model ties causal explanation to actual and counterfactual experiments that show how manipulation of factors mentioned in the explanans would alter the explanandum. It also stresses the role of invariant relationships, as opposed to strict laws, in causal explanation.
Explanation in this model consists in answering a network of “what-if-things-had-been-different questions”, thereby placing the explanandum within a pattern of counterfactual dependencies (cf. Woodward 2003a, p. 201). For instance, the law of ideal gases is said to be explanatory not because it renders a certain explanandum (e.g., that the pressure of a certain gas increased) nomically expected, but because it can tell us how the
53 For details on all these, see my (2002).
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 93–107. © 2007 Springer.
pressure of the gas would have changed, had the antecedent conditions (e.g., the volume of the gas) been different. The explanation proceeds by locating the explanandum “within a space of alternative possibilities” (Woodward 2003a, p. 191). The key idea, I take it, is that causal explanation shows how the explanandum depends on the explanans in a stable way. Not only does it show why the explanandum holds; it also shows how it would vary in a stable way, had the factors mentioned in the explanans been different. Woodward’s theory is developed in great detail in his (2003a), and I cannot do full justice to it in this paper. Since I will be mostly critical of his appeal to interventionist counterfactual conditionals, I should state right from the start that Woodward’s theory is invariably interesting and insightful. In particular, it casts new light on the practice of causal explanation, especially in the so-called special sciences. It makes clear how causal explanation is concerned with factors that make a difference to the presence or absence of the explanandum. It deals quite effectively with the traditional problems of asymmetries in explanation and of citing irrelevant factors as explanatory. It accommodates omissions and preventings as explanatory factors. It shows that not all causal explanations should take the form of deductive (or inductive) arguments. Nonetheless, it displays how generalisations (suitably understood) do play a role in causal explanation. Leaving all these positive elements to one side, this paper will focus on two central conceptual ingredients of Woodward’s account of causal explanation, viz., interventionist counterfactuals and invariant generalisations. Section 2 offers a brief presentation of Woodward’s theory, highlighting its two central ingredients. Section 3 calls into question Woodward’s interventionist counterfactuals.
It claims that they blur the distinction between truth- and evidence-conditions of counterfactual assertions and leave us with no clear account of the semantics of counterfactuals. Section 4 discusses the role of laws in causal explanation and claims that the very possibility of experimental counterfactuals requires that laws be understood in a sense stronger than relations of invariance among variables.
2. MANIPULATIONIST CAUSAL EXPLANATION
Woodward takes his theory of causal explanation to be intimately linked to his theory of causation. This, of course, is as it should be. Causal explanation is meant to provide information about the causes of the explananda, and hence it requires an account of what it is for c to cause e. On Woodward’s view, causation is based on counterfactual manipulation. His theory is counterfactual in the following sense: what matters is what would happen to a relationship, were interventions to be carried out. A relationship among some variables X and Y is causal if, were there an intervention that changed the value of X appropriately, the relationship between X and Y wouldn’t change and the value of Y would change. To use a stock example, the force exerted on a spring causes a change of its length, because if an intervention changed the force exerted on the spring, the length of the spring would change too (but the relationship between the two magnitudes, expressed by Hooke’s law, would remain invariant within a certain range of interventions).
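A minimal Python sketch of the spring example, under assumptions that are not in the text: the rest length, stiffness and breaking force are invented numbers, the spring's actual behaviour is simulated by a function, and breaking is modelled crudely as there being no length to report. It illustrates the idea of a generalisation, L = L0 + F/k, that holds under a range of interventions on the force and not beyond it.

```python
from typing import List, Optional

# Illustrative (assumed) parameters for a toy spring; not taken from the text.
REST_LENGTH = 0.10     # metres
STIFFNESS = 50.0       # newtons per metre
BREAKING_FORCE = 20.0  # newtons


def world_length(force: float) -> Optional[float]:
    """Simulated 'world': the spring's length after an intervention sets the force.

    Up to the breaking force the spring behaves elastically; beyond it the
    spring snaps and there is no length to report (None).
    """
    if force > BREAKING_FORCE:
        return None
    return REST_LENGTH + force / STIFFNESS


def hooke_prediction(force: float) -> float:
    """The generalisation under test: L = L0 + F / k."""
    return REST_LENGTH + force / STIFFNESS


def invariant_under(force: float, tol: float = 1e-9) -> bool:
    """Would the generalisation still hold if an intervention set F to `force`?"""
    actual = world_length(force)
    return actual is not None and abs(actual - hooke_prediction(force)) < tol


interventions: List[float] = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
stable = [f for f in interventions if invariant_under(f)]
print(stable)  # [0.0, 5.0, 10.0, 15.0, 20.0]
```

Nothing turns on the particular numbers: any generalisation that tracks the simulated behaviour for some but not all settings of the intervened variable would display the same limited invariance.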
Let us describe, somewhat sketchily, the two key notions of intervention and invariance. The gist of Woodward’s characterisation of an intervention is this. A change of the value of X counts as an intervention I if it has the following characteristics: a) the change of the value of X is entirely due to the intervention I; b) the intervention changes the value of Y, if at all, only through changing the value of X. The first characteristic makes sure that the change of X does not have causes other than the intervention I, while the second makes sure that the change of Y does not have causes other than the change of X (and its possible effects).54 These characteristics are meant to ensure that Y-changes are exclusively due to X-changes, which, in turn, are exclusively due to the intervention I. As Woodward stresses, there is a close link between intervention and manipulation. Yet, his account makes no special reference to human beings and their (manipulative) activities. In so far as a process has the right characteristics, it counts as an intervention. So interventions can occur ‘naturally’, even if they can be highlighted by reference to “an idealised experimental manipulation” (2000, p. 199). Woodward links the notion of intervention with the notion of invariance. A certain relation (or a generalisation) is invariant, Woodward says, “if it would continue to hold—would remain stable or unchanged—as various other conditions change” (2000, p. 205). What really matters for the characterisation of invariance is that the generalisation remains stable under a set of actual and counterfactual interventions. So Woodward (2000, p. 235) notes: the notion of invariance is obviously a modal or counterfactual notion [since it has to do] with whether a relationship would remain stable if, perhaps contrary to actual fact, certain changes or interventions were to occur.
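The conditions on interventions can be illustrated with a toy structural-equation model. The model is a hypothetical illustration, not Woodward's own example: the common cause Z, the equations and the coefficients are all assumptions. Setting X from outside overrides X's own equation (condition a), leaves Y's equation untouched so that the intervention reaches Y only through X (condition b), and takes Z as a background value chosen independently of the intervention (the third condition, in note 54).

```python
from typing import Dict, Optional


def model(z: float, set_x: Optional[float] = None) -> Dict[str, float]:
    """Hypothetical structural model with arrows Z -> X, Z -> Y and X -> Y.

    If set_x is given, it plays the role of an intervention I on X:
    - X's own equation (X = Z) is overridden, so X's value is due entirely to I;
    - Y's equation is not touched, so I can affect Y only through X;
    - z is supplied independently of I, so I is uncorrelated with Z.
    """
    x = z if set_x is None else set_x  # X = Z unless intervened on
    y = 2.0 * x + 3.0 * z              # Y depends on X and, directly, on Z
    return {"Z": z, "X": x, "Y": y}


# Observational comparison: X and Y also co-vary through the common cause Z.
obs_low, obs_high = model(z=0.0), model(z=1.0)
observed_slope = (obs_high["Y"] - obs_low["Y"]) / (obs_high["X"] - obs_low["X"])

# Interventional comparison: Z held fixed, X changed only by the intervention.
int_low, int_high = model(z=1.0, set_x=0.0), model(z=1.0, set_x=1.0)
interventional_slope = (int_high["Y"] - int_low["Y"]) / (int_high["X"] - int_low["X"])

print(observed_slope, interventional_slope)  # 5.0 2.0
```

The contrast shows why the conditions matter: the observational comparison mixes the contribution of X with that of Z, whereas the change in Y under the intervention is due to the change in X alone.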
Let me highlight three important general elements of Woodward’s approach. First, causal claims relate variables. He (2003a, p. 112) insists that causes should be such that it makes sense to say of them that they could be changed or manipulated. Thinking of them as variables, which can take different values, is then quite natural. But as he goes on to note, it is not difficult to translate talk in terms of changes in the values of variables into talk in terms of events and conversely. For instance, instead of saying that the hitting by the hammer (an event) caused the shattering of the vase (another event), we may say that the change of the value of a certain indicator variable from not-hit to hit caused the change of the value of another variable from unshattered to shattered. This strategy, however, will not work in cases in which putative causes cannot be understood as values of variables.55 But then again, this is
54 There is a third characteristic too, viz., that the intervention I is not correlated with other causes of Y besides X.
55 For an important attempt to show how the relata of the interventionist counterfactual approach can be seen as events, see Kluve (2004, especially 81-2).
fine for Woodward, as he claims that in those cases causal claims will be, to say the least, ambiguous (cf. 2003a, p. 115ff). Second, generalisations need not be invariant under all possible interventions. Hooke’s law, for instance, would ‘break down’ if one intervened to stretch the spring beyond its breaking point. Still, Hooke’s law does remain invariant under some set of interventions. In so far as a generalisation is invariant under a certain range of interventions, it can be explanatorily useful without being exceptionless (cf. 2000, p. 227-8). Woodward (2000, p. 214) stresses: “[t]here are generalisations that are invariant and that can be used to answer a range of what-if-things-had-been-different questions and that hence are explanatory, even though we may not wish to regard them as laws and even though they lack many of the features traditionally assigned to laws by philosophers”. In particular, a generalisation can be causal even if it is not universally invariant (cf. 2003a, p. 15). Third, Woodward does not aim to offer a reductive account of causation or causal explanation. The notion of intervention is itself causal and, in any case, causal considerations are necessary to specify when a relationship among some variables is causal. For instance, an appropriate intervention I on variable X with respect to variable Y should be such that it is not correlated with other causes of Y and does not directly cause a change of the value of Y. I think Woodward (2003a, p. 104-7) is right in insisting that his account is not trapped in a vicious circle. In any case, an account of causation or causal explanation need not be reductive to be illuminating. In light of the above, causal explanation proceeds by exploiting the manipulationist element of causation and the invariant element of generalisations. Explanatory information “is information that is potentially relevant to manipulation and control” (Woodward 2003a, p. 10).
Causal relations are explanatory because they provide information about counterfactual dependencies among causal variables. And invariant generalisations are explanatory because they exhibit stable patterns of counterfactual dependence among causal variables, in virtue of which different values of the effect-variable counterfactually depend on different values of the cause-variable.
3. COUNTERFACTUALS
It is already evident that counterfactual conditionals loom large in Woodward’s account. Interventions need not be actual. They can be hypothetical or counterfactual. And invariance is not understood in terms of stability under actual interventions: the causal relationship (generalisation) should be invariant under hypothetical or counterfactual interventions. Counterfactual conditionals have been criticised on the grounds that they are context-dependent and vague. Take, for instance, the following counterfactual: ‘If Jones had not smoked so heavily, he would have lived a few years more’. What is it for it to be true? Any attempt to say whether it is true, were it possible at all, would require specifying what else should be held fixed. For instance, other aspects of Jones’s health should be held fixed, assuming that other factors (e.g., a weak heart) would not have caused a premature death anyway. But what things to hold fixed is not,
necessarily, an objective matter. Or consider the following pair of counterfactuals: ‘If Julius Caesar had been in charge of U. N. Forces during the Korean war, then he would have used nuclear weapons’ and ‘If Julius Caesar had been in charge of U. N. Forces during the Korean war, then he would have used catapults’. It is difficult to see how we could possibly tell which of them, if any, is true. As the reader will surely know, there have been many significant attempts to offer a semantics for counterfactual conditionals. Perhaps the most well-developed, and certainly the most well-known, is Lewis’s (1973) account in terms of possible worlds. I will not discuss this theory here.56 The relevant point is that Woodward offers an account of counterfactuals that tries to avoid the metaphysical excesses of Lewis’s theory.57
3.1 Experimental counterfactuals
Woodward is very careful in his use of counterfactuals. Not all of them are of the right sort for the evaluation of whether a relation is causal. Only counterfactuals that are related to interventions can be of help. An intervention gives rise to an “active counterfactual”, that is, to a counterfactual whose antecedent is made true “by interventions” (1997, p. 31; 2000, p. 199). In his (2003a, p. 122) he stresses that the appropriate counterfactuals for elucidating causal claims are not just any counterfactuals but rather counterfactuals of a very special sort: those that have to do with the outcomes of hypothetical interventions. […] it does seem plausible that counterfactuals that we do not know how to interpret as (or associate with) claims about the outcomes of well-defined interventions will often lack a clear meaning or truth value.
In his (2003b, p. 3), he very explicitly characterises the appropriate counterfactuals in terms of experiments: they “are understood as claims about what would happen if a certain sort of experiment were to be performed” (cf. also 2003a, p. 10 and 114). Consider a case he (2003b, p. 4-5) discusses. Take Ohm’s law (that the voltage E across a wire is equal to the product of the current intensity I and the resistance R of the wire) and consider the following two counterfactuals:
(1) If the resistance were set to R=r at time t, and the voltage were set to E=e at t, then the intensity I would be i=e/r at t.
(2) If the resistance were set to R=r at time t, and the voltage were set to E=e at time t, then the intensity I would be i* ≠ e/r at t.
There is nothing mysterious here, says Woodward, “as long as we can describe how to test them” (2003b, p. 6). We can perform the experiments at a future time t* in order to see whether (1) or (2) is true. If, on the other hand, we are interested in what would have happened had we performed the experiment at a past time t, Woodward invites us to rely on the “very good evidence” we have “that the
56 See my (2002, 92-101).
57 For a discussion of Lewis’s theory in relation to Woodward’s, see his (2003a, 133-45).
behaviour of the circuit is stable over time” (2003b, p. 5). Given this evidence, we can assume, in effect, that the actual performance of the experiment at a future time t* is as good for the assessment of (1) and (2) as a hypothetical performance of the experiment at the past time t. For Woodward, the truth-conditions of counterfactual statements (and their truth-values) are not specified by means of an abstract metaphysical theory, e.g., by means of abstract relations of similarity among possible worlds. He calls his own approach “pragmatic”. This is how he (2003b, p. 4) puts it: For it to be legitimate to use counterfactuals for these goals [understanding causal claims and problems of causal inference], I think that it is enough that (a) they be useful in solving problems, clarifying concepts, and facilitating inference, that (b) we be able to explain how the kinds of counterfactual claims we are using can be tested or how empirical evidence can be brought to bear on them, and (c) we have some system for representing counterfactual claims that allows us to reason with them and draw inferences in a way that is precise, truth-preserving and so on.
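The test procedure for the Ohm's law counterfactuals (1) and (2) can be sketched in Python under stated assumptions: the circuit is simulated by a function of time, the values of e, r, t and t* are invented, and the "drifting" circuit is a hypothetical case in which the stability assumption fails. The sketch shows why the outcome of an actual experiment at a future t* bears on what would have happened at a past t only given that the circuit's behaviour is stable over time. Unlike a real experimenter, the simulation has access to the hypothetical past outcome.

```python
import math
from typing import Callable

# Hypothetical model: a circuit maps (voltage e, resistance r, time t) to the
# current intensity that would be measured if E and R were set at time t.
Circuit = Callable[[float, float, float], float]


def stable_circuit(e: float, r: float, t: float) -> float:
    """Obeys Ohm's law (E = I * R) at every time t."""
    return e / r


def drifting_circuit(e: float, r: float, t: float) -> float:
    """Effective resistance drifts by 1% per unit time (an invented instability)."""
    return e / (r * (1.0 + 0.01 * t))


def verdict(circuit: Circuit, e: float, r: float, t: float) -> str:
    """Which counterfactual does setting R = r and E = e at time t bear out?

    Returns "(1)" if the intensity is e / r, and "(2)" if it is some i* != e / r.
    """
    return "(1)" if math.isclose(circuit(e, r, t), e / r) else "(2)"


E, R, T_PAST, T_STAR = 12.0, 4.0, 0.0, 10.0
for name, circuit in [("stable", stable_circuit), ("drifting", drifting_circuit)]:
    # Actual test at the future time t* versus the merely hypothetical test at t.
    print(name, verdict(circuit, E, R, T_STAR), verdict(circuit, E, R, T_PAST))
# stable (1) (1)
# drifting (2) (1)
```

For the stable circuit, the future test and the hypothetical past test agree. For the drifting circuit they come apart, which is why the inference from t* to t leans on evidence of stability.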
Yet, Woodward’s view is also meant to be realist and objectivist. He is quite clear that counterfactual conditionals have non-trivial truth-values independently of the actual and hypothetical experiments by virtue of which it can be assessed whether they are true or false. He (2003b, p. 5) says: On the face of things, doing the experiment corresponding to the antecedent of (1) and (2) doesn’t make (1) and (2) have the truth values they do. Instead the experiments look like ways of finding out what the truth values of (1) and (2) were all along. On this view of the matter, (1) and (2) have non-trivial truth values—one is true and the other false— even if we don’t do the experiments of realizing their antecedents. Of course, we may not know which of (1) and (2) is true and which false if we don’t do these experiments and don’t have evidence from some other source, but this does not mean that (1) and (2) both have the same truth-value.
This point is repeated in his (2003a, p. 123), where he stresses: We think instead of [a counterfactual such as (1) above] as having a determinate meaning and truth value whether or not the experiment is actually carried out—it is precisely because the experimenters want to discover whether [this counterfactual] is true or false that they conduct the experiment.
So though “pragmatic”, Woodward’s theory is also objectivist. But it is minimally so. As he (2003a, p. 121-2) notes, his view: requires only that there be facts of the matter, independent of facts about human abilities and psychology, about which counterfactual claims about the outcome of hypothetical experiments are true or false and about whether a correlation between C and E reflects a causal relationship between C and E or not. Beyond this, it commits us to no particular metaphysical picture of the ‘truth-makers’ for causal claims.
The main problem that I see in Woodward’s theory relates to the question: what are the truth-conditions of counterfactual assertions? Woodward doesn’t take all counterfactuals to be meaningful and truth-valuable. As we have seen (see also 2003a, 122), he takes only a subclass of them, the active counterfactuals, to be such. However, he does not want to say that the truth-conditions of active counterfactuals are fully specified by (are reduced to) actual and hypothetical experiments. If he said this, he could no longer say that active counterfactuals have
determinate truth-conditions independently of the (actual and hypothetical) experiments that can test them. In other words, Woodward wants to distinguish between the truth-conditions of counterfactuals and their evidence- (or test) conditions, which are captured by certain actual and hypothetical experiments. The problem that arises is this. Though we are given a relatively detailed account of the evidence-conditions of counterfactuals, we are not given anything remotely like this for their truth-conditions. What, in other words, is it that makes a certain counterfactual conditional true? One thought here might be that there is no need to say anything more detailed about the truth-conditions of counterfactuals than to offer a Tarski-style metalinguistic account of them, of the form
(T) ‘If x had been the case, then y would have been the case’ is true iff, if x had been the case, then y would have been the case.
This move is possible but not terribly informative. We don’t know when to assert (or hold true) the right-hand side. And the question is precisely this: when is it right to assert (or hold true) the right-hand side? Suppose we were to tell a story in terms of actual and hypothetical experiments that realise the antecedent of the right-hand side of (T). The problem with this move is that the truth-conditions of the counterfactual conditional would then be specified in terms of its evidence-conditions, which is exactly what Woodward wants to block. Besides, if we just stayed with (T) above, without any further explication of its right-hand side, any counterfactual assertion (and not just the active counterfactuals) would end up meaningful and truth-valuable. Here again, Woodward’s project would be undermined. Woodward is adamant: “Just as non counterfactual claims (e.g., about the past, the future, or unobservables) about which we have no evidence can nonetheless possess nontrivial truth-values, so also for counterfactuals” (2003b, p. 5). This is fine.
But in the case of claims about the past or about unobservables there are well-known stories to be told as to what the difference is between truth- and evidence-conditions. When it comes to Woodward’s counterfactuals, we are not told such a story. Another thought might be motivated by Woodward’s view that causal claims are irreducible. Woodward says: According to the manipulationist account, given that C causes E, which counterfactual claims involving C and E are true will always depend on which other causal claims involving other variables besides C and E are true in the situation under discussion. For example, it will depend on whether other causes of E besides C are present (2003a, p. 136).
The idea here, I take it, is that the truth-conditions of counterfactuals depend on the truth-conditions of certain causal claims, most typically causal claims about the larger causal structure in which the variables that appear in the counterfactuals under examination are embedded. Intuitively, this is a cogent claim. Consider two variables X and Y and examine the counterfactual: if X had changed (that is, if an intervention I had changed the value of X), the value of Y would have changed. Whether this is true or false will depend on whether I causes the value of Y to
change by a route independent of X, or on whether some other variable Z causes a direct change of the value of Y. Causal facts such as these are part of the truth-conditions of the foregoing counterfactual. It is clear that they may, or may not, obtain independently of any intervention on X. So whether or not an intervention I on X were to occur, it might be the case that, were it to occur, it would not influence the value of Y by a route independent of X. The thought, then, may be that the truth-conditions of a counterfactual are specified by certain causal facts that involve the variables that appear in the counterfactual as well as the variables of the broader causal structure in which the variables of interest are embedded. I see two problems with this thought. The first is that this account is very abstract and general. It is informative in that it says that causal facts are required for the truth of counterfactuals, but it does not specify which causal facts are required: what these facts are will depend on, and vary with, each causal structure under consideration. The second problem is that this account seems circular. Causal claims, we are told, should be understood in terms of counterfactual dependence (where the counterfactuals are interventionist). To fix our ideas, let us consider the causal claim
B0: X causes Y.
For B0 to be true, the following counterfactual C1 should be true:
C1: if X had changed (that is, if an intervention I had changed the value of X), the value of Y would have changed.
On the thought we are presently considering, the truth of C1 will depend, among other things, on the truth of another causal claim:
B1: I does not cause a change to the value of Y directly (that is, by a route independent of X).
How does the truth of B1 depend on counterfactuals?
Let us assume that relations of counterfactual dependence are part of the truth-conditions of causal claims. Then at least one other (interventionist) counterfactual C2 would have to be true in order for B1 to be true:
C2: if an(other) intervention I' had changed the value of I, the value of Y would not have changed (by a route independent of X).
But what makes C2 true? Suppose it is another causal claim B2:
B2: I' does not cause a change to the value of Y directly.
For B2 to be true, another counterfactual C3 would have to be true, and so on. Either a regress is in the offing or the truth of some causal claims has to be accepted as a brute fact. In the former case, counterfactuals are part of the truth-conditions of other counterfactuals, with no independent account of what it is for a counterfactual to be true. In the latter case, we are left in the dark as to which causal claims capture brute facts. In particular, why should we not take it as a brute fact that B0 or B1 is true? Suppose, on the other hand, that we do not take relations of counterfactual dependence to be part of the truth-conditions of causal claims. We would still need an account of the truth-conditions of causal claims. But even if we ignore this, a circle is still present. Suppose we settle for the weaker view that relations of counterfactual dependence are needed for establishing that a causal claim is true (without their being what makes this claim true). The circle we are now caught in is this: establishing that certain counterfactuals are true is necessary for establishing that other counterfactuals are true or false. For instance, establishing that C1 is true requires that another counterfactual, C2, be established as true, and so on. Since C1 is distinct from C2, the circle might not be vicious. But the point is that there is no obvious place to break into the circle of counterfactuals and get it going. We have examined two ways to specify the truth-conditions for counterfactual claims and we have found them both wanting. Still, there are two general options available. One is to collapse the truth-conditions of counterfactuals into their evidence-conditions. One can see the prima facie attraction of this move. Since evidence-conditions are specified in terms of actual and hypothetical experiments, the right sort of counterfactuals (the active counterfactuals), and only those, end up being meaningful and truth-valuable.
But there is an important drawback. Recall counterfactual assertion (1) above. On the option presently considered, what makes (1) true is that its evidence-conditions obtain. Under this option, counterfactual conditionals lose, so to speak, their counterfactuality. (1) becomes shorthand for a future prediction and/or for the evidence that supports the relevant law. If t is a future time, (1) gives way to an actual conditional (a prediction). If t is a past time, then, given that there is good evidence for Ohm’s law, all that (1) asserts under the present option is that there has been good evidence for the law. In any case, Woodward is keen to keep evidence- and truth-conditions apart. Then (and this is the other option available) some informative story must be told about what the truth-conditions of counterfactual conditionals are and how they are connected with their evidence-conditions (that is, with actual and hypothetical experiments). There may be a number of stories to be told here.58 The one I favour
58
One might try to keep truth- and evidence-conditions apart by saying that counterfactual assertions have excess content over their evidence-conditions in the way in which statements about the past have excess content over their (present) evidence-conditions. Take the view (roughly Dummett’s) that statements about the past are meaningful and true in so far as they are verifiable (i.e., their truth can be known). This view may legitimately distinguish between the content of a statement about the past and the present or future evidence there is for it. Plausibly, this excess content of a past statement may be cast in terms of counterfactuals: a meaningful past statement p implies counterfactuals of the form ‘if x were present at time t, x would verify that p’. This move presupposes that there are meaningful and
S. PSILLOS
ties the truth-conditions of counterfactual assertions to laws of nature. It is then easy to see how the evidence-conditions (that is, actual and hypothetical experiments) are connected with the truth-conditions of a counterfactual: actual and hypothetical experiments are symptoms of the presence of a law. There is a hurdle to be jumped, however. It is notorious that many attempts to distinguish between genuine laws of nature and accidentally true generalisations rely on the claim that laws do, while accidents do not, support counterfactuals. So counterfactuals are called for to distinguish laws from accidents. If at the same time laws are called for to tell when a counterfactual is true, we go around in circles. Fortunately, there is the Mill-Ramsey-Lewis view of laws (see my 2002, Chapter 5). Laws are those regularities which are members of a coherent system of regularities, in particular, a system which can be represented as an ideal deductive axiomatic system striking a good balance between simplicity and strength. On this view, laws are identified independently of their ability to support counterfactuals. Hence, they can be used to specify the conditions under which a counterfactual is true.59 It might be that Woodward aims only to provide a criterion of meaningfulness for counterfactual conditionals without also specifying their truth-conditions. This would seem in line with his “pragmatic” account of counterfactuals, since it would offer a criterion of meaningfulness and a description of the ‘evidence conditions’ of counterfactuals, which are presumed to be enough to understand causation and causal explanation. In response to this, I would not deny that Woodward has indeed offered a sufficient condition of meaningfulness. Saying that counterfactuals are meaningful if they can be interpreted as claims about actual and hypothetical experiments is fine. But can this also be taken as a necessary condition?
Can we say that only those counterfactuals are meaningful which can be seen as claims about actual and hypothetical experiments? If we did say this, we would rule out as meaningless a number of counterfactuals that philosophers have played with over the years, e.g., the pair of Julius Caesar counterfactuals considered in section 3. Though I agree with him that they are “unclear”, I am not sure they are meaningless. Take one of Lewis’s examples, that had he walked on water, he would not have been wet. I don’t think it is meaningless. One may well wonder what the point of offering such counterfactuals might be. But whatever it is, they are understood and, perhaps, are true. Perhaps, as Woodward (2003a, p. 151) says, the antecedents of such counterfactuals are “unmanipulable for conceptual reasons”. But if they are understood (and if they are true), this would be enough of an argument against the view that manipulability offers a necessary condition for meaningfulness. It turns out, however, that there are more sensible counterfactuals that fail Woodward’s criterion. Some of them are discussed by Woodward himself (2003a, p. 127-33). Consider the true causal claim: Changes in the position of the moon with
true counterfactual assertions. But note that a similar story cannot be told about counterfactual conditionals. If we were to treat their supposed excess content in the way we just treated the excess content of past statements, we would be involved in an obvious regress: we would need counterfactuals to account for the excess content of counterfactuals.
59
Obviously, the same holds for the Armstrong-Dretske-Tooley view of laws (see my 2002, chapter 6). If one takes laws as necessitating relations among properties, then one can explain why laws support counterfactuals and, at the same time, identify laws independently of this support.
respect to the earth and corresponding changes in the gravitational attraction exerted by the moon on the earth’s surface cause changes in the motion of the tides. As Woodward himself admits, this claim cannot be said to be true on the basis of interventionist (experimental) counterfactuals, simply because realising the antecedent of the relevant counterfactual is physically impossible. His response is to offer an alternative way of assessing counterfactuals: counterfactuals can be meaningful if there is some “basis for assessing the truth of counterfactual claims concerning what would happen if various interventions were to occur”. Then, he adds, “it doesn’t matter that it may not be physically possible for those interventions to occur” (2003a, p. 130). And he sums it up by saying that “an intervention on X with respect to Y will be ‘possible’ as long it is logically or conceptually possible for a process meeting the conditions for an intervention on X with respect to Y to occur” (2003a, p. 132). My worry then is this. We now have a much more liberal criterion of meaningfulness at play, and it is not clear, to say the least, which counterfactuals end up meaningless by applying it. In any case, Woodward (2003a, p. 132) offers an important warning: [I]t would be a mistake to make the physical possibility of an intervention on C constitutive in any way of what it is for there to be a causal connection between C and E. […] When an intervention changes C and in this way changes E, this exploits an independently existing causal link between C and E. One can perfectly well have the link without the physical possibility of an intervention on C.
I take this to imply that his counterfactual approach provides an extrinsic way to identify a sequence of events as causal, viz., that the sequence remains invariant under certain interventions. In an earlier piece, he (2000, p. 204) stressed: what matters for whether X causes […] Y is the ‘intrinsic’ character of the X-Y relationship but the attractiveness of an intervention is precisely that it provides an extrinsic way of picking out or specifying this intrinsic feature.
So there seems to be a conceptual distinction between causation and invariance-under-interventions: there is an intrinsic feature of a relationship in virtue of which it is causal, an extrinsic symptom of which is its invariance under interventions.60 If I have got Woodward right, causation has excess content over invariance-under-interventions. So there is more to causation—qua an intrinsic relation—than invariance-under-actual-and-counterfactual-interventions. Hence, there is more to be understood about what causation and causal explanation are. To sum up. We need to be told more about the truth-conditions of counterfactual conditionals. If Woodward ties too close a knot between counterfactuals and actual and hypothetical experiments, then counterfactual assertions may reduce to claims about actual and hypothetical experiments (without any excess content). If, on the other hand, Woodward wants to insist that counterfactuals have their truth-conditions independently of their evidence-conditions, then it is an entirely open option that the truth-conditions of counterfactual assertions involve laws of nature.
60
In his (2003a, p. 125) Woodward says “there is a certain kind of relationship with intrinsic features that we exploit or make use of when we bring about B by bringing about A”.
3.2 No laws in, no counterfactuals out
As we have already seen, when it comes to causal explanation, Woodward stresses that reliance on invariant generalisations suffices. He (2003a, p. 236) says: [W]hat matters for whether a generalisation is explanatory is whether it can be used to answer a range of what-if-things-had-been-different questions and to support the right sorts of counterfactuals about what will happen under interventions.
Naturally, when checking whether a generalisation or a relationship among magnitudes or variables is invariant, we need to subject it to some variations/changes/interventions. What changes will it be subjected to? The obvious answer is: those permitted by the laws of nature. Suppose that we test Ohm’s law. Suppose also that one of the interventions envisaged was to see whether it would remain invariant if the measurement of the intensity of the current was made on a spaceship moving faster than light. This, of course, cannot be done, because it is a law that nothing travels faster than light. So some laws must already be in place before any generalisation can be established, on grounds of invariance, to be invariant under some interventions. Hence, Woodward’s notion of “invariance under interventions” cannot offer an adequate analysis of lawhood, since laws are required to determine what interventions are possible. Couldn’t Woodward say that even basic laws—those that determine what interventions and changes are possible—express just relations of invariance? Take, once more, the law that nothing travels faster than light. Can the fact that it is a law be the result of subjecting it to interventions and changes? Hardly. For the law itself establishes the limits of possible interventions and control.61 I do not doubt that genuine laws may well express relations of invariance. But this is not the issue. For the manifestation of invariance might well be the symptom of a law without being constitutive of it. It seems that Woodward must be committed to this symptom/constitution distinction. As he explains in detail, invariance does not characterise laws only; other relationships or generalisations, which cannot be deemed laws, display invariance, especially in the special sciences. For instance, Woodward (2000, p. 214) notes: [t]here are generalisations that are invariant and that can be used to answer a range of what-if-things-had-been-different questions and that hence are explanatory, even though we may not wish to regard them as laws and even though they lack many of the features traditionally assigned to laws by philosophers.
Note, however, that at least some accidental generalisations do possess some range of invariance. So if invariance is to be found in laws as well as in non-laws, it should be at best a symptom of lawhood. What, then, does lawhood consist in? Woodward is perfectly happy with the thought that laws are not what philosophers have taken them to be. He (2000, p. 222) thinks that most of the standard criteria
61
Woodward (2000, p. 206-7) too agrees that this law cannot be accounted for in terms of invariance.
are not helpful either for understanding what is distinctive about laws of nature or for understanding the feature that characterise explanatory generalisations in the special sciences.
In particular, he takes it that in so far as a generalisation is invariant under a certain range of interventions, it can be a law without being exceptionless (cf. 2000, p. 227-8). But no clear picture emerges as to what exactly makes a generalisation a law. For, as Woodward (2000, p. 227) admits, even laws will not be invariant under all actual and possible interventions. For instance, Maxwell’s laws break down at the Planck scale, where quantum mechanical effects take over. As a result of all this, the difference between laws, invariant-generalisations-that-are-explanatorily-useful-but-non-laws, and mere accidents is deemed to be a difference “in degree (…) rather than of kind” (2000, p. 241). It is a difference in degree precisely because the notion of invariance under interventions admits of degrees. Some generalisations have a wider range of invariance, whereas others have a narrower range and yet others are “highly non-invariant” (2000, p. 237). This is not to say, Woodward claims, that the difference in degree is no difference at all. For, as he (2000, p. 242) says, the features possessed by generalisations, like Maxwell’s equations [which are paradigmatic cases of laws]—greater scope and invariance under larger, more clearly defined, and important classes of interventions and changes—represent just the sort of generality and unconditionality standardly associated with laws of nature.
Be that as it may, it should be stressed that laws are required in order to fix the range of invariance of a generalisation. For, in order to specify the range of invariance of a generalisation, we first need (a) to specify which interventions are physically possible and (b) to determine which of them, if they happened, would leave the given generalisation unchanged. Both of these tasks, however, require prior reliance on laws. As noted above, it is laws that specify the physically possible interventions. What needs to be added here is that it is laws that govern the assessment of the counterfactual in (b). For instance, specifying what interventions, had they happened, would have left Kepler’s laws unchanged requires holding other laws fixed. For if laws, e.g., Newton’s laws, were allowed to be violated, then the range of invariance of Kepler’s laws would be very limited. So, it seems that Woodward’s account boils down to the following circular statement: a generalisation is a law if it is invariant “under a large and important set of changes” (2000, p. 241), where the relevant set of changes is determined by laws.62 To sum up. Without an independent account of what laws are, there is no clear way in which we can deem some (interventionist) counterfactual assertions true or false. Which interventions are physically possible and which interventions leave certain relations invariant depends on what laws there are. The latter cannot be fully understood as relations that remain invariant under interventions, since they specify what interventions are possible.
62
I take to heart Marc Lange’s (2000) recent important diagnosis: either all laws, taken as a whole, form an invariant-under-interventions set, or, strictly speaking, no law, taken in isolation, is invariant-under-interventions. This does not yet tell us what laws are. But it does tell us what marks them off from intuitively accidental generalisations.
4. CONCLUSION
Perhaps the worries raised in this paper do not affect causal explanation as a practical activity. In many practical cases, we may well have a lot of information about a particular causal structure, and this may be enough to answer questions about which (interventionist) counterfactuals are true and which generalisations are invariant under interventions. When we deal with stable causal or nomological structures,63 interventionist counterfactuals are meaningful and truth-valuable. The worries raised in the paper concern the prospects of the manipulationist account as a philosophical theory of causal explanation. Simply put, the main worry is that, as it stands, Woodward’s theory highlights and exploits the symptoms of a good causal explanation, without offering a fully-fledged theory of what causal explanation consists in. Invariance-under-interventions is a symptom of causal relations and laws. It is not what causation or lawhood consists in. It is a great virtue of Woodward’s approach that it exploits these symptoms to show how causal explanation can proceed. But this undeniable virtue should not obscure the fact that there is more to causal explanation (by there being more to causation and to lawhood) than stable relations of (interventionist) counterfactual dependence. Woodward (2003a, pp. 114 and 130) has stressed that his notion of intervention should be seen as a “regulative ideal”. Its function, he says, is “to characterise the notion of an ideal experimental manipulation and in this way to give a purchase on what we mean or are trying to establish when we claim that X causes Y” (2003a, p. 130). Perhaps his theory of causal explanation is also meant to be a regulative ideal: it tells us what we should mean and strive to do when we claim that X causally explains Y. I have no quarrel with this, provided it is also acknowledged that the regulative ideal is still short of being constitutive of what causal explanation is.
REFERENCES
Kluve, J. (2004). On the Role of Counterfactuals in Inferring Causal Effects. Foundations of Science 9: 65-101.
Lange, M. (2000). Natural Laws in Scientific Practice. Oxford: Oxford University Press.
Lewis, D. (1973). Counterfactuals. Cambridge MA: Harvard University Press.
Psillos, S. (2002). Causation and Explanation. Chesham: Acumen.
Simon, H. A. and Rescher, N. (1966). Cause and Counterfactual. Philosophy of Science 33: 323-40.
Woodward, J. (1997). Explanation, Invariance and Intervention. Philosophy of Science 64 (Proceedings): 26-41.
Woodward, J. (2000). Explanation and Invariance in the Special Sciences. The British Journal for the Philosophy of Science 51: 197-254.
63
My favourite way to spell out this notion is given by Simon and Rescher (1966). In fact, in showing how a stable structure can make some counterfactuals true, they blend the causal and the nomological in a fine way.
Woodward, J. (2002). What is a Mechanism? A Counterfactual Account. Philosophy of Science 69: 366-377.
Woodward, J. (2003a). Making Things Happen: A Theory of Causal Explanation. New York: Oxford University Press.
Woodward, J. (2003b). Counterfactuals and Causal Explanation. http://philsci-archive.pitt.edu/archive/00000839/.
ASSESSING THE EXPLANATORY POWER OF CAUSAL EXPLANATIONS
ERIK WEBER AND JEROEN VAN BOUWEL
1. INTRODUCTION
According to Wesley Salmon, causal explanations of singular facts must contain descriptions of the causal interactions that caused the fact to be explained, and descriptions of the causal processes that link these interactions to one another and to the explanandum event. In his view, a causal explanation is a description of a causal net in which causal interactions are the nodes and causal processes constitute the links between the nodes. The explanatory power of an explanation depends on its depth: an explanation is better than another if it cites more relevant causal interactions and causal processes. It is not a desideratum that an explanation makes the explanandum highly probable. This view is opposed by, among others, Nancy Cartwright, who gives the following example:
I consider eradicating the poison oak at the bottom of my garden by spraying it with defoliant. The can of defoliant claims that it is 90 per cent effective; that is, the probability of a plant's dying given that it is sprayed is .9, and the probability of its surviving is .1. Here (...) only the probable outcome, and not the improbable, is explained by the spraying. One can explain why some plants died by remarking that they were sprayed with a powerful defoliant; but this will not explain why some survive. (Cartwright 1983, p. 28)
In this article we will defend the view that the criteria by which the explanatory power of causal explanations is to be judged are context-dependent. Explanation-seeking questions (even if we confine ourselves to causal explanations) can have different motivations. The criteria for explanatory power depend on the motivation. More specifically, we will argue that:
(1) in most contexts, a posteriori probability is important; so Salmon is wrong in denying its relevance;
(2) explanatory depth is relevant in only one context; so Salmon overestimates its relevance;
(3) in most contexts, there is an extra criterion; so there is more to explanatory power than a posteriori probability and/or depth.
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 109–118. © 2007 Springer.
E. WEBER AND J. VAN BOUWEL
Causal explanations have different formats. In order to develop a clear argument for the context-dependence of explanatory power, we will confine ourselves to what we call structure-interaction explanations (SI explanations). These explanations causally explain a property of an object by referring to permanent properties of this object (the structural component of the explanation) and to causal interactions with other objects. In Section 2 we will clarify the concept of SI explanation. In Section 3 we identify the contexts by distinguishing different types of explanation-seeking questions and their underlying motivation. In Sections 4 and 5 we will discuss explanatory power in these different contexts. These sections contain our arguments for the claims (1)-(3) above. In Section 6 we clarify why the idea of unification is absent in our account. In Section 7 we summarise our conclusions and spell out the implications of our views for the possibility of an “ideal explanatory text”.
2. WHAT ARE SI EXPLANATIONS?
Let us start by clarifying what interactions are. The concept of causal interaction was introduced by Salmon in order to capture the innovative aspect of causation. There has been a lot of discussion about the best way to define causal interactions (Dowe 1992; Salmon 1994; Dowe 1995). This discussion is not relevant for our purposes. We will adopt a definition that is very close to Salmon's original definition:
(CI) At t there is a causal interaction between objects x and y if and only if
(1) there is an intersection between x and y at t (i.e. they are in adjacent or identical spatial regions at t),
(2) x exhibits a characteristic P in an interval immediately before t, but a modified characteristic P′ immediately after t,
(3) y exhibits a characteristic Q in an interval immediately before t, but a modified characteristic Q′ immediately after t,
(4) x would have had P immediately after t if the intersection had not occurred, and
(5) y would have had Q immediately after t if the intersection had not occurred.
An object can be anything in the ontology of science (e.g. atoms, photons, ...) or common sense (humans, chairs, trees, ...). This definition incorporates the basic ideas of Salmon. The main difference is that, according to our definition, interactions occur between two objects. In Salmon's definition, interactions are something that happens between two processes (Salmon 1984, p. 171). This modification was suggested in Dowe (1992) (we do not agree with the other modifications that Dowe suggests, so this is the only change we want to make). Because we stick close to Salmon’s original definition, we can borrow his examples. Collision is the prototype of causal interaction: the momentum of each object is changed, this change would not have occurred without the collision, and the new momentum is preserved in an interval immediately after the collision. When a white light pulse goes through a piece of red glass, this intersection is also a causal interaction: the light pulse becomes and remains red, while the filter undergoes an increase in energy because it absorbs some of the light. The glass retains some of the energy for some time beyond the actual moment of interaction. As an example of an intersection which is not a causal interaction, we consider two spots of light, one red and the other green, that are projected on a white screen. The red spot moves diagonally across the screen from the lower left-hand corner to the upper right-hand corner, while the green spot moves from the lower right-hand corner to the upper left-hand corner. The spots meet momentarily at the centre of the screen. At that moment, a yellow spot appears, but each spot resumes its former colour as soon as it leaves the region of intersection. No modification of colour persists beyond the intersection, so no causal interaction has occurred.
Now that we have clarified what interactions are, we can spell out what SI explanations are. They are chains of elementary SI explanations. Let us give an example of the latter. Consider a thermometer that is immersed in hot water. We observe that the mercury column first drops and then rises. An elementary SI explanation for this fact is:
The mercury column of the thermometer first drops and then rises because: (I) the thermometer has been rapidly immersed in hot water; and (S) this thermometer consists of a glass tube which is partly filled with mercury.
This explanation refers to the causal interaction in which the fact to be explained was brought about (the immersion in hot water) and to the permanent properties of the thermometer that determined how it reacted to the immersion.
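For readers who want to see clauses (1)-(5) of (CI) applied mechanically, here is a minimal Python sketch. It is our illustration, not the authors' own formalisation: it assumes that each object's relevant characteristic can be recorded as a single value before t, immediately after t, and in the counterfactual case where the intersection did not occur. The `Trajectory` class, the function names and the momentum values are hypothetical. The sketch contrasts the collision (both momenta change and would not have changed without the collision) with the two light spots (each spot has its former colour again immediately after the intersection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one characteristic of one object around time t."""
    before: object                      # value in an interval immediately before t (P)
    after: object                       # actual value immediately after t (P or P')
    after_without_intersection: object  # counterfactual value had the intersection not occurred


def modified_by_intersection(obj: Trajectory) -> bool:
    # Clauses (2)/(3): the characteristic is modified (P becomes P').
    modified = obj.after != obj.before
    # Clauses (4)/(5): without the intersection it would still have been P.
    unchanged_otherwise = obj.after_without_intersection == obj.before
    return modified and unchanged_otherwise


def is_causal_interaction(intersect: bool, x: Trajectory, y: Trajectory) -> bool:
    """Clause (1) plus a mutual, intersection-dependent modification of x and y."""
    return intersect and modified_by_intersection(x) and modified_by_intersection(y)


# Collision: momenta (made-up values, arbitrary units) change and stay changed.
ball_1 = Trajectory(before=2.0, after=-1.0, after_without_intersection=2.0)
ball_2 = Trajectory(before=-1.0, after=2.0, after_without_intersection=-1.0)

# Light spots: yellow appears only at the intersection; immediately after t
# each spot has its former colour again, so clauses (2) and (3) fail.
red_spot = Trajectory(before="red", after="red", after_without_intersection="red")
green_spot = Trajectory(before="green", after="green", after_without_intersection="green")

if __name__ == "__main__":
    print(is_causal_interaction(True, ball_1, ball_2))        # True
    print(is_causal_interaction(True, red_spot, green_spot))  # False
```

Recording only one characteristic per object is a simplification; (CI) only requires that some characteristic of each object be modified in this way.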
The general format of elementary SI explanations is:
Object a has property Q because: (I) it causally interacted with object b, which had property P; and (S) it has permanent properties S1, …, Sn.
(I) describes the interaction in which a acquired property Q. The properties mentioned in (S) are causally relevant because they determine how a reacts to the causal interaction. An SI explanation is a chain of elementary SI explanations, in which the explanandum of one explanation is an initial condition for a subsequent element of the chain. For instance, an SI explanation of the drop and rise of the mercury column could consist of the elementary explanation given above plus similar explanations of why the thermometer was immersed in hot water and why it had the described structure.
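The chain structure can also be made explicit. The Python sketch below is a hypothetical illustration, not something the authors propose. It represents an elementary SI explanation as an explanandum plus an interaction clause (I) and structural clauses (S). It then checks the chain condition under one simple reading: every element except the last must explain an initial condition (an interaction or a structural property) of some later element. Statements are plain strings matched verbatim, and the (I) and (S) clauses of the first link are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ElementarySI:
    """Object a has property Q because (I) it interacted with b, which had P, and (S) it has S1..Sn."""
    explanandum: str                                    # "object a has property Q"
    interaction: str                                    # clause (I)
    structure: List[str] = field(default_factory=list)  # clause (S): permanent properties

    def initial_conditions(self) -> Set[str]:
        return {self.interaction, *self.structure}


def is_si_chain(chain: List[ElementarySI]) -> bool:
    """Each element except the last must explain an initial condition of some later element."""
    if not chain:
        return False
    for i, element in enumerate(chain[:-1]):
        if not any(element.explanandum in later.initial_conditions() for later in chain[i + 1:]):
            return False
    return True


IMMERSED = "the thermometer has been rapidly immersed in hot water"

# Hypothetical first link: its (I) and (S) clauses are invented for illustration.
immersion = ElementarySI(
    explanandum=IMMERSED,
    interaction="a lab assistant picked up the thermometer and plunged it into the water",
    structure=["the thermometer is small and light enough to be handled"],
)
# The elementary SI explanation given in the text.
mercury = ElementarySI(
    explanandum="the mercury column first drops and then rises",
    interaction=IMMERSED,
    structure=["this thermometer consists of a glass tube which is partly filled with mercury"],
)

if __name__ == "__main__":
    print(is_si_chain([immersion, mercury]))  # True
    print(is_si_chain([mercury, immersion]))  # False
```

Allowing an element to feed any later element, not only the next one, accommodates the thermometer case, where explanations of the immersion and of the structure both feed the final link.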
3. TYPES OF EXPLANATION PROBLEMS
SI explanations do not have a fixed explanatory value: their value depends on the kind of question one wants to answer with the explanation. Suppose that we observe that x has property P at time t. This observation can give rise to different explanation-seeking questions, even if all questions are assumed to be requests for causal explanations. At least four types of questions must be distinguished:
(E) Why does x have property P, rather than the expected property P′?
(I) Why does x have property P, rather than the ideal property P′?
(IN) Why does x have property P, while y has the ideal property P′?
(F) Is the fact that x has property P the predictable consequence of some other events?
(H) Is the fact that x has property P causally connected with events we are more familiar with?
P and P′ are mutually exclusive properties. An E-type question compares an actual population fact with one that we expected. For instance, we can try to explain why only 57.5% of the Belgian population (between the ages of 15 and 65) is working, while we expected 61% (the average of the European Union). An I-type question compares an actual population fact with an ideal one (one we would like to be the case). For instance, we can try to explain why only 57.5% of the Belgian population (between the ages of 15 and 65) is working, while the ideal put forward by the European Union is 70%. An IN-type question does basically the same, but a different object in which the ideal situation is realised is used to emphasise that the ideal is not unrealistic. E-type questions are obviously motivated by surprise: things are otherwise than we expected them to be, and we want to know where our reasoning process failed (which causal factors did we overlook?). Contrastive questions of type (I) and (IN) are motivated by a therapeutic or preventive need: they request that we isolate causes which help us to reach an ideal state that is not realised now (therapeutic need) or to prevent the occurrence of similar events in the future (preventive need). The non-contrastive questions of type (F) also have a pragmatic motivation: the desire to have information which enables us to predict whether and in which circumstances similar events will occur in the future. H-type questions are motivated by a psychological desire rather than by pragmatic considerations. In Sections 4 and 5 we will discuss explanatory power with respect to questions of the four types. We start with the non-contrastive ones.
4. THE CONTEXT-DEPENDENCE OF EXPLANATORY POWER: NON-CONTRASTIVE CASES
4.1 H-type questions
We start with H-type questions. The simplest possible answer to such a question is an elementary SI explanation as defined in Section 2. As an example, consider a hard-working farmer, John, who is considering introducing new techniques and specialising in cash crops. He wants to stop producing food crops for consumption by himself and his family. He discovers that his wife has spent most of the money he earned by selling cash crops in previous years. He had hidden it in a place he thought safe, but his wife found the money. John decides to continue using old techniques and to stick to a mix of cash and food crops. This behaviour can be explained as follows:
John decides to stick to old techniques and to a mix of cash and food crops because: (I) when he checked the hiding place, most of the money was gone; and (S) he has an aversion to risk.
In this explanation, the explanandum is subsumed under a stereotype: risk aversion. The explanation works because risk aversion is familiar to everyone. Even extreme risk seekers can understand how people with risk aversion think. And many people might conclude that they would do the same thing in the same circumstances, because they have a similar risk aversion. In other words: the explanation works by enabling empathy. There are many similar stereotypes that can be used to reduce seemingly strange behaviour to something familiar: philosophers sometimes (?) violate deadlines for submission because they systematically underestimate the time required to write a paper, people sometimes buy things they do not need because they are misled by a salesman, members of parliament sometimes vote against their own opinions because following the party discipline is more rewarding in the long run. Our examples might suggest that H-type questions always relate to human behaviour. Let us correct this wrong impression by giving examples of a different kind.
Suppose that Peter had a car accident last night. An H-type question about this event can be answered by providing evidence that a prototypical cause of car crashes (drunkenness, ice on the road, heavy rain, a wrong-way driver, use of cruise control) was present in this case. In the period from November 1914 to September 1918, several British battleships (Bulwark, Princess Irene, Natal, Glatton and Vanguard) exploded while they were lying in the harbour. They were not under attack by German surface warships, and the possibility of a submarine attack was excluded (it was established that the explosions were internal, so they were not caused by a torpedo). Newspapers, but also the official investigation committee, tried to link the explosions to prototypical causes: sabotage by a German agent, accident due to carelessness, accident due to a construction fault, ... .
E. WEBER AND J. VAN BOUWEL
It is clear that the explanatory power of answers to H-type questions depends on whether or not they establish a causal link with a familiar phenomenon. We do not need probability values. For instance, it does not matter how probable farmer John’s behaviour is, nor does it matter how high the success rate of sabotage is. Explanatory power does not depend on the depth of the explanation either: as soon as a link with a familiar phenomenon is established, we can stop. So depth is not an intrinsic value in this context. However, going back in time can sometimes be instrumental in obtaining an adequate answer: we may need a chain of elementary SI explanations in order to arrive at some familiar phenomenon.

4.2 F-type questions

In order to discuss F-type questions, we use Robert Axelrod's analysis of unofficial truces (Axelrod 1984; Little 1991, p. 58) as an example. In World War I, there were unofficial truces between military units on both sides: each side continued to fire its weapons but without inflicting much damage. Axelrod explains these truces as rational behaviour based on a strategy of conditional co-operation in a repeated prisoner's dilemma situation (this strategy amounts to: start with co-operation, and keep on co-operating as long as the opponent co-operates). The units were engaged in trench warfare, which guarantees a relatively stable, clearly identifiable enemy (units are not replaced overnight) whose reactions can be easily observed. The underlying idea is that in other types of warfare (Blitzkrieg, guerrilla), similar truces are impossible because there is no stable enemy. In this example, a truce is explained as the aggregate result of the behaviour of two units.
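The conditional co-operation strategy described above (co-operate first, then keep co-operating as long as the opponent does) closely resembles what Axelrod calls TIT FOR TAT, and its day-by-day logic can be sketched in a few lines. This is a minimal illustration, not Axelrod's own model: it assumes a two-move encoding ("C" for firing without inflicting much damage, "D" for firing to inflict damage), simultaneous daily moves, and hypothetical function names `conditional_cooperation` and `play_days`.

```python
from typing import List, Tuple

COOPERATE = "C"  # fire without inflicting much damage
DEFECT = "D"     # fire to inflict damage


def conditional_cooperation(opponent_history: List[str]) -> str:
    """Co-operate on the first day, then repeat the opponent's previous move."""
    if not opponent_history:
        return COOPERATE
    return opponent_history[-1]


def play_days(days: int, b_defects_on: int = -1) -> Tuple[List[str], List[str]]:
    """Simulate units a and b, which move simultaneously once per day.

    Both follow conditional co-operation; if b_defects_on >= 0, unit b
    deviates once, on that day, so that we can see how unit a reacts.
    """
    a_moves: List[str] = []
    b_moves: List[str] = []
    for day in range(days):
        # Each unit reacts only to what it observed the other do on earlier days.
        a_move = conditional_cooperation(b_moves)
        b_move = conditional_cooperation(a_moves)
        if day == b_defects_on:
            b_move = DEFECT
        a_moves.append(a_move)
        b_moves.append(b_move)
    return a_moves, b_moves


if __name__ == "__main__":
    print(play_days(5))                  # a stable truce: both units always play "C"
    print(play_days(5, b_defects_on=1))  # a single deviation by unit b
```

With no deviation, the truce holds on every day. After b's single deviation on day 1, a retaliates on day 2 and the two units then alternate; this echo is a known weakness of the bare strategy, which the sketch deliberately does not repair.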
The behaviour of each unit can be explained as follows:

Unit a fires its weapons at b without inflicting any damage because: (I) it observed that b fired without inflicting damage the previous day; (S1) unit a adopts a conditional co-operation strategy; (S2) unit a considers unit b to be its relatively stable enemy; and (S3) unit a can easily observe the reactions of unit b.

Since F-type questions are motivated by a desire for information that enables us to predict whether, and in which circumstances, similar events will occur in the future, probability values are important. The explanation is worthless if we do not have a covering law that tells us how probable the explanandum is given the causes mentioned in the explanation. Moreover, high probabilities are valuable, and deductive explanations are the ideal: if we are sure that something undesirable will happen, there can be no doubt that we have to try to do something about it; if we can predict it only with, e.g., probability 0.5, decision making is more complicated. What about depth? When answering F-type questions, depth of the explanation has an intrinsic value, rather than an instrumental one as in Section 4.1. The reason is very simple: the sooner we know that something undesirable will happen, the better our chances of avoiding it.
ASSESSING THE EXPLANATORY POWER OF CAUSAL EXPLANATIONS
5. THE CONTEXT-DEPENDENCE OF EXPLANATORY POWER: CONTRASTIVE CASES

5.1 I- and I'-type questions

In order to discuss I- and I'-type questions, we introduce a fictitious but realistic (because it is based on real causal knowledge) example. Two neighbouring cities, Koch City and Miasma City, have a history of simultaneous cholera epidemics: every ten years or so, after excessive rainfall, cholera breaks out in both cities. Suddenly, in the year X, the population of Koch City remains healthy after a summer with lots of rain, while Miasma City is hit by cholera again. Explaining the difference can help Miasma City in the future (therapeutic function). Let us consider the following explanation of the contrast:

There was a cholera outbreak in Miasma City because: (I) there was a lot of rainfall; and (S) Miasma City had no sewage system. There was no cholera outbreak in Koch City, despite the fact that (I) there was a lot of rainfall, because (S) Koch City started building a sewage system after the previous outbreak, and this system was now ready.

The explanation refers to a difference in structure that is the result of a human intervention that was present in one case but absent in the other. This is why it can serve a therapeutic function. An answer to an I- or I'-type question is adequate only if the difference that is singled out is in some way manipulable. In the example, the value of the explanation lies in the fact that Miasma City can also build a sewage system if it wants to. Manipulability is a minimal condition of adequacy for explanations in the context we are discussing here. However, high probability values are also important. If a sewage system is the only causally relevant factor (i.e. if cities with a sewage system are never struck by cholera, and cities without one always are after a certain amount of rainfall), the explanation above is perfect: it describes the only possible therapy, and this therapy is 100% efficient.
The value of an answer to an I- or I'-type question depends on manipulability, but also on the degree of efficiency and indispensability of the therapeutic measure it suggests. With respect to depth, the situation is similar to that of Section 4.1: going back further and further in time may be necessary to arrive at a factor that satisfies the minimal condition of manipulability.
5.2 E-type questions

We will discuss E-type questions using an example from the famous sociologist Robert Merton. During World War II, the American government asked Merton to analyse the success and failure of propaganda campaigns (Merton 1957, Chapter 14). In his analysis, Merton takes individual propaganda documents (e.g. a movie, a pamphlet, a radio speech) as units. Each propaganda document has a specific aim, viz. to convince the reader/viewer/listener to adopt a specific role in the war machine. One of Merton’s most interesting examples is a pamphlet that was meant to convince Afro-Americans to volunteer for the army, i.e. to adopt the role of soldier. The pamphlet was a complete failure. It increased the self-confidence of Afro-Americans, but did not convince them to join the army. So the following E-type question arose: Why does this pamphlet increase the self-confidence of its Afro-American readers, rather than convince them to become soldiers? Merton’s method for answering this question (and similar ones) was to divide the document into items. Movies are divided into scenes, while pamphlets contain two kinds of items: text paragraphs and pictures accompanied by captions. According to Merton’s analysis, the unsuccessful pamphlet contained 198 items. Items are then compared with the messages the writers want to communicate to their readers or audience. The aim of the pamphlet was twofold: (1) to convince the readers that, while Afro-Americans still suffer from discrimination, great progress has been made; and (2) to convince them that these attainments are threatened if the Nazis win the war. Some of the items were designed to communicate the first message, others the second. In general, the items of a propaganda document can be grouped according to the specific message they are meant to communicate.
In Merton’s view, a pamphlet will be successful if and only if the items of each group are sufficient for communicating the corresponding message to the audience or readers. The pamphlet failed because the authors tried to communicate the second message mainly through text items. The Afro-American readers, who were mostly less educated, did not read the text; they just looked at the pictures. Most of the pictures related to the first theme: they featured Afro-Americans who occupied important positions in American society. The result was that their self-confidence increased, but they did not conclude from the pamphlet that their attainments were in danger. Hence, they did not volunteer for military service. An answer to an E-type question must show what was wrong in the line of reasoning that led to the wrong expectation: it has to point to a hidden or forgotten factor. This is what Merton does: he shows that the authors neglected the fact that the average degree of schooling of Afro-Americans was low. This led them to the wrong expectation that their pamphlet would work.
What about probability values and depth of the explanation? The situation is similar to that in Section 5.1. Probability values are important in order to assess whether the factor that is singled out was the only mistake in the naive line of reasoning. Going back further in time is not valuable in itself, but may be instrumental in finding one or more mistakes.

6. WHY IS UNIFICATION ABSENT?

Our account of the explanatory power of causal explanations was restricted to singular cases, where the explanandum is one fact (with or without a contrast). We think that in such cases unification is irrelevant, because the explainee is only interested in the case at hand. In Section 3, we distinguished five types of explanation problems and the questions that express them. We think that unification is irrelevant for the value of answers to questions of these types. However, there are other types of explanation problems. Consider someone asking: “Why do Peter and Mary both have blood group A?”. Here too, the context determines what is to count as a good answer. A satisfactory answer will have to consist in showing that “Peter has blood group A” and “Mary has blood group A” are instances of the same law. This can be done by giving an appropriate pair of deductive-nomological explanations, e.g.

L: All humans who belong to category IAIA×IAIO have blood group A.
C1: Peter is a human.
C2: Peter belongs to category IAIA×IAIO.
____________________________________________________
E: Peter has blood group A.

L: All humans who belong to category IAIA×IAIO have blood group A.
C1: Mary is a human.
C2: Mary belongs to category IAIA×IAIO.
____________________________________________________
E: Mary has blood group A.

The phenotypes of the ABO blood group system (the blood groups A, B, AB and O) are determined by the alleles IA, IB and IO. IAIA×IAIO is a category of cross: an IAIA×IAIO-individual is a descendant of one parent with genotype IAIA and one parent with genotype IAIO.
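The law premise L in both arguments can be checked mechanically. A minimal Python sketch, assuming simple Mendelian inheritance in which IA and IB are codominant and both dominant over IO (standard textbook ABO genetics); the names `phenotype` and `cross` are illustrative. It enumerates the four equally likely allele combinations of an IAIA×IAIO cross and confirms that every offspring genotype has blood group A, which is why one law covers both Peter and Mary.

```python
from collections import Counter
from itertools import product
from typing import Tuple

Genotype = Tuple[str, str]


def phenotype(genotype: Genotype) -> str:
    """ABO blood group of a genotype: IA and IB are codominant, IO is recessive."""
    alleles = set(genotype)
    if alleles == {"IA", "IB"}:
        return "AB"
    if "IA" in alleles:
        return "A"
    if "IB" in alleles:
        return "B"
    return "O"


def cross(parent1: Genotype, parent2: Genotype) -> Counter:
    """Count the equally likely offspring genotypes of a cross (a Punnett square)."""
    # Each parent passes on one of its two alleles; sorting makes (IO, IA) == (IA, IO).
    return Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))


offspring = cross(("IA", "IA"), ("IA", "IO"))
print(offspring)
print({phenotype(g) for g in offspring})  # {'A'}
```

The same two functions handle any other cross; for instance, IAIO×IBIO yields all four blood groups, so no single law of the form L would cover its offspring.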
Unification is certainly a desideratum for explanations in such contexts, but the explanation problems we have discussed here are of a different nature. It is common to distinguish explanations of laws from explanations of singular facts. We think that a threefold distinction is better: there are explanations of laws, of singular facts, and of sets of facts.64 Unification is important when dealing with sets of facts, and that is the reason why it has no place in the account developed in Sections 3-5.
64 See also Weber (1999) and Weber and Van Dyck (2002).
7. CONCLUSION
Our results can be summarised as follows:
(1) In most contexts (F-type, I- and I'-type and E-type questions), a posteriori probability is important.
(2) Explanatory depth is an intrinsic value in only one context (F-type questions); in the other contexts its value is merely instrumental.
(3) In most contexts, there is more to explanatory power than a posteriori probability and/or depth: H-type questions require familiarity, I- and I'-type questions require manipulability, and E-type questions require the highlighting of a hidden or neglected factor.

We started this article by referring to Salmon’s idea that the explanatory power of an explanation depends on its depth: an explanation is better than another if it cites more relevant causal interactions and causal processes. The ideal would then be to include all facts, no matter how insignificant they might seem, that are in any way relevant to the explanandum under consideration. This would give us the ideal explanatory text (cf. Railton 1981). But, as should be obvious from our examples, it is misleading to talk of an ideal explanatory text in Railton’s sense, the complete causal story, as an ideal. The explanatory power of a causal explanation has to be judged by taking the type of question (H, F, I, I' or E) and the context into account, which implies that there is no single criterion or desideratum on the basis of which the explanatory power of all explanations can be assessed. Therefore, there is no general ideal. Even a complete causal story, Railton’s ideal explanatory text, is not the most efficient or best explanation in all cases. An ideal cannot be identified independently of the types of explanation-seeking questions and their context. The latter are indispensable elements in the assessment of explanatory power.

REFERENCES
Axelrod, R. (1984). The Evolution of Cooperation. New York: Basic Books.
Cartwright, N. (1983). How the Laws of Physics Lie. Oxford: Clarendon Press.
Dowe, P. (1992). Wesley Salmon's Process Theory of Causality and the Conserved Quantity Theory. Philosophy of Science 59: 195-216.
Dowe, P. (1995). Causality and Conserved Quantities: A Reply to Salmon. Philosophy of Science 62: 321-333.
Little, D. (1991). Varieties of Social Explanation. Boulder: Westview Press.
Merton, R. (1957). Social Theory and Social Structure. Glencoe, Illinois: The Free Press.
Railton, P. (1981). Probability, Explanation, and Information. Synthese 48: 233-256.
Salmon, W. (1984). Scientific Explanation and the Causal Structure of the World. Princeton, New Jersey: Princeton University Press.
Salmon, W. (1994). Causality without Counterfactuals. Philosophy of Science 61: 297-312.
Weber, E. (1999). Unification: What Is It, How Do We Reach It and Why Do We Want It? Synthese 118: 479-499.
Weber, E. and Van Dyck, M. (2002). Unification and Explanation: A Comment on Halonen & Hintikka and Schurz. Synthese 131: 145-154.
SOME NOTES ON UNIFICATIONISM AND PROBABILISTIC EXPLANATION
REBECCA SCHWEDER
1. THE UNIFICATIONIST MODEL OF SCIENTIFIC EXPLANATION

The simplest and most convenient way to present the unificationist model of explanation is to state what the unificationist sees as the necessary and sufficient conditions for something’s being a scientific explanation. This exposition will also make clear how the unificationist model relates to Hempel’s covering law model, and what distinguishes the two. According to the unificationist, S is an explanation iff
C1. S is a deductively valid logical argument, where at least one of the premises is a law-like statement.
C2. S is an (indispensable) part of the best organized system of beliefs.
From an intuitive point of view, C1 simply says that a necessary condition for S’s being an explanation is that S is a Deductive-Nomological explanation (or DN-explanation, for short). So far, the unificationist model completely agrees with Hempel’s covering law model of explanation. To mark the difference between the covering law model and the unificationist model, a DN-explanation will be called “an explanatory argument”: according to the unificationist, a bona fide explanation is a covering law explanation that has the additional property of being part of a belief system of a certain kind. The covering law model has met with criticism that usually comes in the form of counterexamples. The literature makes it clear that although Hempel’s covering law model states the necessary conditions for something’s being an explanation, the model does not provide the sufficient conditions (although its critics seldom put it that way, this is what the counterexamples amount to). An attempt to formulate sufficient conditions for explanation must go beyond the semantic and syntactic features of the argument. I do not have conclusive evidence for this contention, but none of the attempts made so far seem to have been successful (see e.g. Schweder (2004) for discussion).
119 J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 119–128. © 2007 Springer.
The condition C2 is intended to capture conditions that, together with C1, are sufficient for explanation. Once S complies with C1, its fulfilling C2 ensures its status as a bona fide explanation, according to the unificationist proposal.65

2. PROBABILISTIC EXPLANATION

Any theory of explanation that aspires to be taken seriously must also address the issue of probabilistic explanation. The term “probabilistic explanation” is intended to denote explanatory arguments where we do not have a universally true law statement from which the phenomenon to be explained can be deduced. Perhaps the easiest way to grasp the idea of a probabilistic explanation is through an example. Suppose that we would like to explain why Jones, who is a criminal and has spent many years in prison, engages in criminal acts almost immediately after his release. Although there is no universal law to the effect that released prisoners immediately start committing crimes, there might be a law-like statistical generalization that can be relied on to explain the fact that Jones takes up his criminal activities again after his release. The generalization allows us to construct an explanatory argument that has the logical form of the ordinary DN-explanation:

(1) Two thirds of the totality of convicted criminals take up criminal activities within less than a year of their release from prison.
Jones is a convicted criminal who has been released from prison.
==========
Jones takes up criminal activities within a year of his release.

The double line is to indicate that the explanans (the premises of the argument) do not strictly entail the explanandum (the phenomenon to be explained). Instead, the explanans makes the explanandum phenomenon very likely, or expected to a high degree.

Hempel rhetorically asks why probabilistic explanations cannot be deductive arguments, along the lines of, e.g.,

(2) Most convicted criminals take up criminal activities within less than a year of their release from prison.
Jones is a convicted criminal who has been released from prison.
__________
(It is very likely that) Jones turns back to criminal activities within a year of his release.

The reason given is that sentences operated on by modal locutions such as ‘it is very likely that,’ ‘many,’ ‘it is almost certainly the case that’ do not have a definite truth value. They are neither true nor false, and for this reason they are unfit to serve as explanandum sentences in an argument. A conclusion of a logical argument must be a sentence capable of having a definite truth value (Hempel 1965, p. 382).

65 For a detailed discussion of the condition C2 and the nature of the belief system, see Schweder (2004).

Nevertheless, there seems to be a reason for fitting Inductive-Statistical explanations (or IS-explanations for short) into the general framework of the covering law model of explanation. The similarity, or shared characteristic, between DN-explanations and IS-explanations is that both kinds of argument make the explanandum phenomenon expected. Although the law-like statement in (1) does not make a categorical claim, it does give us a reason to expect Jones’ relapse. Still, there are familiar difficulties with the IS-model concerning how to assess the notion of “being expected.” This becomes especially clear when the probability statement used in the explanatory argument makes a precise claim about the probability. It appears arbitrary to claim that an event with a probability of, say, 0.85 was to be expected, but that an event with a probability of 0.84 was not. But this particular problem has already received much attention, and will not be addressed in the present paper.

3. THE ROLE OF EXPECTATION IN EXPLANATION

There is a different and, in my eyes, more interesting problem relating to expectancy in explanation. Recall that in the original formulation of the conditions for DN-explanation, the explanandum sentence is a description of some singular phenomenon thought to be in need of explanation. But the unificationist disagrees with Hempel precisely on this point. In fact, this particular disagreement is one of the fundamental reasons why the unificationist thinks that the covering law model is inadequate. In his (1974) article, Friedman points out that science is only rarely concerned with singular phenomena. In other words, it is not usually the case that scientific explanations take the form of a covering law argument where the explanandum is a description of a singular phenomenon.
If this observation holds, as I think it does, then we must conclude that scientific explanation is instead concerned with general phenomena. This is also supported by Kitcher, who points out that

“Singular why-questions are often concerned to relate the phenomenon described by the topic to other similar phenomena [...] the intent of the question is implicitly general and we could say that, while the apparent topic of the question is singular, the real topic concerns a regularity.” (1989, p. 427)
This contention, of course, casts doubt on the idea that expectation has a central role to play in explanation. For, as Friedman points out, general phenomena cannot literally be expected: they are not located in time or space. If it is correct that expectation is not a central part of explanation, then the least common denominator that brings DN- and IS-explanations under the same roof seems to vanish. There must be other grounds for claiming that IS-explanations have the form rendered by (1). The unificationist, already suspicious of the idea that the explanans makes the explanandum expected, thinks that we need to supply the model of explanation with a different rationale. The guiding principle is that explanations provide understanding because successful explanation results in cognitive economy. According to Friedman, we aim for a system of beliefs with as few basic or underived facts as possible, from which as many facts as possible can be derived. This is the notion of understanding that the unificationist finds most useful, and also the reason why explanations take the form of logical arguments (see also Schweder (2004) for a more extensive discussion).

Now we are in a position to give the issue a more precise formulation: why are probabilistic explanations problematic from the point of view of the unificationist model? The reason is that, for the unificationist, the deductive element is an inescapable and essential component of explanation. Deduction is the means by which we organize the beliefs in the belief system, and by which we come to understand, and thereby explain, phenomena. But only DN-explanations have this characteristic; IS-explanations do not. The question facing the unificationist is whether and how to accommodate such non-deductive explanatory arguments.

4. PROBABILISTIC EXPLANATION

I will start by qualifying our concept of probabilistic explanation a bit. To do this, we will introduce a new element into our reasoning: that of a predictive argument. A predictive argument is exactly like an explanatory argument except that the conclusion is a description of an event that has yet to occur.66 The difference between an explanation and a prediction in our sense is factual, in that the event to be explained has occurred, whereas the event to be predicted has not. There is no logical distinction between the two kinds of argument. The notion of a predictive argument is a useful tool for making an important distinction, which I believe has a bearing on explanation and explanatory argument, but which cannot be expressed within that framework.
Hållsten (2001) proposes a distinction between those probabilistic explanations that can be improved (that is, become better explanations) by adding more information, and those that cannot. Let us start by considering those IS-arguments that can be improved in precision by adding more information. As an example, consider the following IS-argument:

(3) Two thirds of the totality of convicted criminals take up criminal activities within less than a year of their release from prison.
Jones is a convicted criminal who has been released from prison.
==========
Jones takes up criminal activities within a year of his release.
66 The use of predictions in the context of the theory of explanation is not uncommon. Hempel (1965) noted that his covering law explanations had the same logical form as predictions. Although the so-called symmetry thesis is controversial, the connection between explanation and prediction has continued to capture the imagination of many writers.
If we think of it as a predictive argument (that is, the explanandum is not a perceived matter of fact, but a state of affairs that may or may not come about), the argument could be improved to make more accurate predictions if new information were added. The argument could be modified in the following way. Suppose that the group of released prisoners could be divided into two subgroups: one consisting of ex-prisoners who had found a job by the time of their release, the other consisting of ex-prisoners who were unable to get a job. Suppose, furthermore, it was discovered that those who were unable to find a job on their release were much more likely to take up criminal activities than the other group. Let us say that the rate of relapse was four out of five for ex-prisoners who did not find a job. The new information could be added to the argument, so that, instead of (3), we would have

(4) Four fifths of the totality of convicted criminals, if they do not find a job, take up criminal activities within less than a year of their release from prison.
Jones is a convicted criminal, released from prison, and is unemployed.
==========
Jones takes up criminal activities within a year of his release.

Intuitively, we think that the latter argument is a better explanation than the one that preceded it. What are the reasons for preferring (4) over (3)? Before answering that, we should pay attention to a further, related problem with the IS-model of explanation: two different IS-explanations may have contradictory conclusions. Suppose that we are informed that there is a high probability that married prisoners do not relapse after their release from prison, and that Jones is married. From this piece of information, we can construct the following explanatory argument:

(5) Four fifths of the totality of convicted criminals, if they are married, adapt well after their release.
Jones is a convicted criminal who is married.
==========
Jones adapts well after his release from prison.

The conclusion of (5) contradicts the conclusions of (3) and (4), so how are we to handle this particular difficulty? Hempel suggests introducing the requirement of ‘maximal specificity,’ to the effect that all the relevant background information must be taken into account in probabilistic explanation. This idea has also been developed formally by Salmon (1971). The requirement of maximal specificity helps us handle the choice between (3) and (4), but leaves the conflict between (3) and (4) on the one hand and (5) on the other unresolved. And the unificationist has no quarrel with the covering law theorist over the idea of maximal specificity. On the contrary, the unificationist thinks that background knowledge is always relevant to explanation, whether IS or DN (for a detailed discussion, see Schweder (2004)).

What are we to say of the other kind of IS-argument, the kind that cannot be made more precise by adding more relevant information? This sort of IS-argument seems to presuppose that there are genuinely random events. There does not appear to be a consensus over whether there are such phenomena. But if there are, they typically belong to the domain of microphysics: the decay of certain kinds of matter and the tunnelling effect are usually mentioned as examples of such random events. The following IS-argument illustrates what the “explanation” of the decay of a particular atom might look like:

(6) U-234 has a half-life of 250 000 years.
a is a U-234 atom.
==========
a decays.

The argument is not very precise, but given that decay is a genuinely random event, an argument such as (6) is the best explanation to be had. No additional facts of the matter would change the argument, or enable us to predict the decay with greater precision.

The arguments (3), (4), (5) and (6) have a superficially similar structure. But the similarity is deceptive and hides deep-going differences as to the real explanatory power of the arguments. The discovery that some probabilistic law statements can be complemented with new information to give more precise predictions while others cannot suggests that there might be some further distinction between different kinds of probabilistic statement. This distinction cuts across the epistemic dimension and accounts for the explanatory power inherent in the probabilistic statements.

5. PROBABILISTIC LAW-LIKE STATEMENTS

How do we account for the difference between the two kinds of IS-arguments? It seems reasonable to assume that the differences lie in the nature of the probabilistic statements used in the explanatory arguments. We need to find out what the probability is a probability of.
On the assumption that there are genuinely random events, it appears reasonable to think of probabilistic statements such as the one in (6) as expressing a law of nature. This move allows us to distinguish probabilistic statements that express genuine laws of nature from probabilistic statements that express subjective probability and from probabilistic statements that merely state a frequency. Possibly, further distinctions can be made, but the ones mentioned are enough for present purposes. I will devote some space to discussing each.

Let us start with the simplest case, where the statement expresses a frequency. A statement of frequency is a generalization over particular instances: it refers to observed instances. Take the statement that 5% of a population A carries the HIV virus. This probability represents a frequency: 1 individual out of 20 in the population A is afflicted, and the remaining 19 are not. Suppose that we want to “explain” the fact that Alice is a carrier: the argument would have the following form:

(7) There is a 5% probability of acquiring HIV.
Alice has been infected with the HIV virus.
==========
Alice carries HIV.

The probabilistic argument that contains a frequency premise is similar to the probabilistic argument containing a statement that expresses a genuine law of nature in that no added information changes the probability. The argument is, in a sense, “complete.” Given that the probabilistic law statement in (7) represents a frequency, is it reasonable to think that the phenomenon of Alice’s being a carrier of the virus has been explained? Clearly not, it seems. A statement of frequency is based on perceived instances. It makes no more sense to think that Alice’s having the virus is explained by the probability statement in (7) than to think that it is explained by any of the other particular instances of HIV carriers, e.g. Elsa’s carrying the HIV virus, or Carl’s carrying the virus. Statements of frequency are useful for many purposes, but they do not carry any explanatory power; or, whatever explanatory power they do possess cannot be utilized in the sort of explanatory argument under investigation.

The next kind of probabilistic statement to be considered is the kind found in IS-arguments that can be improved or made more precise by adding more information. The reason the argument can be improved is that the probability expresses subjective probability: our belief in a certain event given our present state of knowledge.
This, again, can be illustrated by an example:
(8) The probability of tossing a coin heads up is 0.5
    The coin a is tossed
    ______________________________
    The coin a lands heads up
Adding relevant factors, such as whether the coin is loaded in a certain way, the manner in which the coin is tossed, whether air resistance interferes with the coin’s journey, and so on, may change the probability of the coin’s landing heads up. In the ideal case, we arrive at a complete explanation: the explanatory argument is then no longer an IS-explanation but has been turned into a complete DN-explanation. We can think of statements of subjective probability such as (8) as typical of situations in which we are ignorant of some of the relevant factors. The ignorance may be restricted to an individual, but it may also pervade the scientific community as a whole. For instance, it is a belief held by (parts of) the scientific community that smoking causes a certain kind of cancer. Plausibly, there are causal mechanisms responsible for this, but these mechanisms have yet to be specified. We simply do not have the full explanation for that kind of cancer.
Sometimes it is not clear whether the probability statement is intended to express a frequency or a subjective probability. This ambiguity is of course an obstacle to determining, in the singular case, whether the argument is explanatory or not. Finally, there is the explanatory argument that cannot be improved or made more precise. The reason these IS-explanations must be deemed “complete” is that they represent genuine probabilities: they are neither statements of subjective probability reflecting ignorance on our part nor mere frequencies. Given that the probability statement in the example represents a genuine law of nature, does the argument qualify as a real explanation? Before addressing this question, we need to make a general observation concerning our probabilistic statements.
6. GENERAL FACTS
Hållsten’s distinction between probabilistic explanations that can be improved and those that cannot is, indeed, a very useful one. However, his distinction suggests that the difference is merely epistemological, whereas I think that it reflects a deep metaphysical disparity. I believe that we will be better informed about the explanatory power of IS-arguments once we are clear about the nature of the distinct probabilistic statements. The distinction can be made to correspond to the metaphysical distinction between a “mere” generalization and a genuinely probabilistic law. In order to understand the distinction fully, we must once again go back to the idea of accepted beliefs and of general and singular facts. A singular fact is located in time and space: a sentence describing such a singular fact says of something that it is something. Examples make this a bit clearer: Jones is a criminal; a, who is a smoker, gets cancer; this piece of litmus paper turns blue. These are all descriptions of singular facts, or singular phenomena if we want to adapt our terminology to the previous reasoning.
But what are we to say of general facts? By hypothesis, the law-like sentences that are indispensable in all kinds of explanatory statements are sentences describing or stating general facts. But there are two kinds of general fact, and they correspond precisely to Hållsten’s two kinds of probabilistic explanation: on the one hand, the genuine laws of nature; on the other, those general facts that are mere generalizations. Again, we might make this a bit clearer by saying that a generalization is just a collection of singular facts. The generalization that ‘smoking causes cancer’ is nothing over and above the collection of the singular facts ‘a is a smoker and gets cancer,’ ‘b is a smoker and gets cancer,’ ‘c is a smoker and gets cancer,’ . . . ‘n is a smoker and gets cancer,’ and so on. We can characterize the difference between generalizations and laws of nature by claiming that a generalization depends for its truth on the truth of its instances, whereas a genuine law of nature can be true without any actual instances at all. Or, in counterfactual terms: a law of nature supports counterfactual claims, whereas a generalization does not.
Returning to probabilistic explanation, Hållsten’s division between IS-explanatory arguments that can be enhanced with additional information and IS-probabilistic explanations that cannot corresponds to the division between IS-arguments whose explanans contains a generalization and IS-arguments whose explanans contains a genuine law. Although all IS-explanations superficially look the same, the IS-form harbours two fundamentally different sorts of argument.
7. UNIFICATIONISM AND PROBABILISTIC EXPLANATION
Returning, then, to the main question of the article: can IS-explanations be accommodated within the unificationist framework sketched in the previous sections? Can there be probabilistic explanation, according to the unificationist? Rehearsing our criteria of explanation, we observe that IS-arguments fail to comply with our condition C1, since they are not deductively valid arguments. In fact, I think that this result is well in line with our intuitions concerning scientific explanation. Consider first the kind of IS-argument where the probabilistic statement denotes a frequency. Frequencies, we recall, are generalizations over particular instances. The IS-argument that utilizes a frequency statement is as clear a case of a non-explanation as can be, because an explanation in terms of a generalization is an instance of so-called “self-explanation.” The observation that Alice is carrying the HIV virus is just one of all those instances that “make up” the frequency “1 out of 20 in population A carries the HIV virus.” My contention is that we don’t have to worry about this kind of IS-argument being ruled out by our unificationist conditions C1 and C2. Next, there are those IS-arguments where the probability statement refers to some subjective probability, as in (8).
Although the law-like statement does not, strictly speaking, denote a law of nature, it seems to me that there is good reason to think of it as having explanatory force: the IS-argument suggests that there is an explanation to be had, of which the IS-argument will be a significant part. Though incomplete, the IS-argument can be expanded and improved and, in the ideal case, turned into a complete DN-explanation. The probability argument is a sort of vicarious explanation, suggestive of the “real” explanation that will eventually be had. While this second type of IS-argument complies with our condition C2, it fails to fulfil C1. One way to deal with the difficulty is to drop the requirement of deductive validity. We would then have that S is an explanation iff
C1'. S is an argument where at least one of the premises is a law-like statement.
C2. S is part of the best organized belief system.
This characterization of explanation does not appear altogether unreasonable. However, we must not lose sight of the temporary nature of such IS-arguments. The acceptance of IS-arguments rests on their being useful tools in organizing our belief systems. A statement such as that there is a 0.5 probability of tossing the coin heads up may indeed be very useful, but it seems to me to have an air of the accidental about it. Its status is that of an auxiliary hypothesis, one that can and will be discarded as soon as we have better and more precise information. Rather than modifying our conditions for explanation to accommodate temporary IS-arguments, I much prefer to stay with the original two conditions, thus allowing IS-arguments the role of auxiliary hypotheses or temporary explanatory arguments. The most intriguing issue from the point of view of explanation theory is whether “genuine” IS-arguments, those where the law-like sentence denotes what is, for all we know, a law of nature, can be said to be explanations. The question is of central importance to the unificationist, since if she has to accept that some IS-arguments are bona fide explanations, that calls for a radical modification of the conditions set up for explanation. Intuitions do not seem a reliable guide to answering this question. On the one hand, we can accept that the phenomenon of the uranium atom’s decay is explained as in (6). On the other hand, nothing stops us from going in the opposite direction and holding that some phenomena simply defy explanation, and that the decay of atoms is just such a phenomenon. The end result of this modest investigation into the nature of IS-arguments may seem disappointing, since we have not really reached a conclusion concerning “complete” IS-arguments. But I will have to let matters rest until we have a more fully fledged theory of laws of nature, and until the question of whether there actually are probabilistic laws has been answered satisfactorily.
REFERENCES
Friedman, M. (1974). Explanation and Scientific Understanding. The Journal of Philosophy 71(1).
Hempel, C. G. (1965). Aspects of Scientific Explanation. New York: Free Press.
Hållsten, H. (2001). Explanation and Deduction: A Defence of Deductive Chauvinism. Stockholm: Almqvist & Wiksell International.
Kitcher, P. (1989). Explanatory Unification and the Causal Structure of the World. In Kitcher and Salmon (eds.): Scientific Explanation. Minneapolis: University of Minnesota Press.
Salmon, W. (1971). Statistical Relevance. In his Statistical Explanation and Statistical Relevance. Pittsburgh: University of Pittsburgh Press.
Schweder, R. (2004). A Unificationist Theory of Scientific Explanation. Lund: Studentlitteratur.
PART 2
ISSUES IN EXPLANATION
SELECTION AND EXPLANATION ALEXANDER BIRD
1. INTRODUCTION
Explanations appealing to natural selection have an unusual and prima facie paradoxical feature. While we may explain general truths using such explanations, those explanations do not transfer to the particular instances of those general truths. Thus natural selection and the selective advantage of speed in escaping predators can explain why healthy, normal, adult gazelles can run fast. Yet such an explanation does not explain why any particular gazelle can run fast: the explanation in individual cases would appeal to the physiology of the animal, in particular its musculo-skeletal structure and heart and lung capacity and so forth. This contrasts with an explanation of why diamonds are hard, which does transfer to individual diamonds. The reason why all diamonds are hard is the same reason why any particular diamond is hard. Such explanations one might call ‘particularizable’. Explanations appealing to natural selection are not particularizable. In this paper I aim to isolate the source of the difference in the two kinds of explanation and thereby resolve the apparent paradox in the case of selection explanations. While the interest in such explanations stems from their use in evolutionary biology, the feature referred to is not limited to biological explanations alone. Any selection explanation has the property of not being particularizable, and it will be helpful in this paper to focus on a non-biological example. A restaurant has a rule that no gentleman will be admitted who is not wearing a tie. We may imagine that the rule is newly introduced and that potential customers do not know about it, so that the rule itself does not influence the wearing of ties on the particular evening we are considering. So it is true on this evening that all the men in the restaurant are wearing ties. And the explanation is a selection one. Many men wanted to dine in the restaurant; only those wearing ties were admitted; the tieless were sent away.
But this explanation of the general proposition, that all the men are wearing ties, does not explain why any individual diner is wearing a tie. Mr Grey has just come from his office and is wearing the suit and tie he wears for work; Colonel Black is a very formal gentleman who always wears a tie; Dr White is trying to impress his new date, etc. So the selection explanation of the general proposition is not particularizable.
131 J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 131–136. © 2007 Springer.
2. HEMPEL ON EXPLANATION AND CONFIRMATION
Hempel, as is well known, gave related accounts of explanation and confirmation (Hempel 1965). To explain some fact is to cite a law or laws plus other relevant conditions from which the explanandum may be deduced (the deductive-nomological (D-N) model of explanation). To confirm a hypothesis is to deduce some observed phenomenon from the hypothesis plus other relevant known conditions (the hypothetico-deductive model of confirmation). Putting these two together tells us that an observation confirms a (nomic) hypothesis if that hypothesis would, if true, explain the observation. That relationship between explanation and confirmation is itself open to question. Nonetheless, it is not far from the truth (cf. Dretske 1977, p. 261). What I want to focus on is the fact that, because of the intimate relationship between the two models, problems for one frequently translate into problems for the other. That is, if the relationship is roughly right, then a counterexample to the model of explanation ought also to be a counterexample to the model of confirmation. An example of this is the following. Achinstein’s famous counterexample to the D-N model of explanation cites the law that anyone who ingests a pound of arsenic will die within 24 hours (Achinstein 1983). Jones ingests a pound of arsenic and indeed does die within 24 hours. However, shortly after taking the arsenic he is run down by a bus. So in this case the explanandum is deducible from the law and conditions, yet clearly they do not constitute an explanation. Now imagine that we were testing the hypothesis that anyone who ingests a pound of arsenic will die within 24 hours. We see Jones ingest a pound of arsenic and subsequently record the observation that he is dead. Does this confirm the hypothesis? In the light of the additional information that he was killed by the bus, Jones’ death provides no confirmation.
Thus a counterexample to the D-N model of explanation is readily transformed into a counterexample to the hypothetico-deductive model of confirmation. I believe that the reverse holds true also. The so-called raven paradox is often held to be the main counterexample to the hypothetico-deductive model of confirmation, although Hempel denied this. We are testing the hypothesis that all ravens are black. We see a white object; we deduce from our hypothesis that it is not a raven. Closer inspection shows that indeed it is not a raven: it is a shoe. The hypothetico-deductive model alleges that we have here a confirmation of the hypothesis. Hempel upholds this, claiming that there is indeed some confirmation, albeit of a very small degree. However, the intuition of many is that a white shoe has no bearing on the hypothesis concerning ravens. In favour of the latter view, it is telling that the raven paradox translates into a counterexample against the D-N model of explanation. Since I can deduce from the law that all ravens are black that this white object (the shoe) is not a raven, I have, according to that model, an explanation of why the shoe is not a raven. Similarly, the law ‘all metals at very low temperatures show superconductivity’ plus the fact that this piece of metal is not superconducting allows us to deduce that this piece of metal is not at a low temperature. But we do not have an explanation of why it is not at a low temperature. Let us call such cases ‘raven counterexamples’ to the D-N model. We shall find that they play an important role in seeing why selection explanations are not particularizable.
3. EXPLAINING INSTANCES OF A GENERALIZATION
Consider any generalization of the form ‘all Fs are Gs’. Let the set of Fs that are Gs (i.e. all the Fs) be called the set of instances of the generalization. The generalization (A) ‘all ravens are black’ is logically equivalent to the generalization (B) ‘all non-black things are non-ravens’. However, the two generalizations have different sets of instances. The instances of (A) are all the ravens, whereas the set of instances of (B) is the set of non-black things. What the raven counterexamples to the D-N model show is that whereas a nomic generalization such as (A) may explain the properties of its instances, the contrapositive generalization (B) will not explain its instances. Similarly, an explanation of (A) may be particularizable while an explanation of (B) will not be. We may have an explanation of why (A´) all metals at low temperature are superconducting, and hence we have an explanation of why (B´) all metals that are not superconducting are not at low temperatures. But that explanation will be particularizable with respect to (A´) but not to (B´). (B´) is a non-accidental generalization. It can be explained by reference to the laws of nature (whatever explains (A´) explains its logical equivalent (B´)), and to that extent it is quasi-nomic. But that explanation transfers only to the instances of (A´), not to those of (B´). David Armstrong’s account of laws states that the law itself is a second-order relation of necessitation among universals (Armstrong 1983). The law that Fs are Gs may be symbolized N(F,G). We can use this account to see why explanations associated with a generalization may not transfer to the instances of its contrapositive.
For N(F,G) is not the same as N(not-G,not-F). First, there may be no universals not-G and not-F. Secondly, even if there are such universals, they may not be related by N. N(F,G) will explain why cases of F are G: they are necessitated to be. But in the absence of the law N(not-G,not-F) there need be no similar explanation of why some not-G is not-F. So although N(F,G) explains why ∀x(Fx→Gx), and thus why ∀x(¬Gx→¬Fx), it does not explain why some particular non-G is not an F (Armstrong 1983, pp. 40–45). We are inclined to think that it is the generalizations employing positive predicates (such as (A) and (A´)) that genuinely reflect the structure of the law and thus can explain their instances, or whose own explanations are particularizable. Correspondingly, we think that their contrapositives, employing negated predicates, mislead as regards the underlying law, and this is why they cannot explain their instances. However, it would be wrong to think that this is always the case. Of course, the easy way to see this is just to employ only predicates that are the negations of those we used originally. There is, however, a deeper reason, which is that in some cases the laws and explanations do involve absences of properties. The fact that the graviton (if it exists) is massless explains why it travels at the speed of light; the camouflage of a stick insect explains why it is invisible to predators; the high velocity of a neutrino explains why it does not interact with normal detectors. So when presented with two generalizations that are logically equivalent, the one being the contrapositive of the other, we need to be careful in deciding which of the generalizations (if either) may be used in the explanation of its instances. It will not in every case be the generalization expressed using positive predicates.
4. EXPLANATION AND SELECTION
We have already seen that not all generalizations explain their instances, even if they are not accidental generalizations but consequences of the laws of nature. Contrapositives of the laws of nature do not always explain their instances, as the raven counterexamples show. Similarly, the explanation of a non-accidental generalization is not in every case particularizable. If ∀x(Fx→Gx) and ∀x(¬Gx→¬Fx) are non-accidental generalizations, the explanation of those generalizations will typically be particularizable with respect to at most one of them, not both. We are now in a position to see why the explanations of the generalizations we have focussed on, which arise from selection processes, are not always particularizable. Our discussion tells us that the explanation of ‘all men in this restaurant are wearing ties’ need not also explain the instances of that generalization, even though the generalization is no accidental truth but the consequence of a rule. That this exemplifies the structure I have depicted can be seen by considering the contrapositive: ‘all tieless men are outside the restaurant’. This seems rather closer to having explanatory force, for some individuals are outside the restaurant precisely because they are tieless. Mr Green tried to get into the restaurant but was turned away for lack of a tie.
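A minimal Python sketch may help keep the logical bookkeeping straight. It assumes a small, hypothetical domain loosely modelled on the paper's characters; the `Man` record, its fields and the helpers `holds` and `instances` are illustrative inventions, not part of the paper. It models only truth values and sets of instances, not explanation itself, and shows that ‘all men in this restaurant are wearing ties’ and its contrapositive ‘all tieless men are outside the restaurant’ are both true over the domain yet have disjoint sets of instances, while only Mr Green, not Mr Brown, instantiates the exclusion rule.

```python
from dataclasses import dataclass


# Hypothetical record for one man on the evening in question (illustrative only).
@dataclass(frozen=True)
class Man:
    name: str
    wears_tie: bool
    inside: bool             # inside the restaurant
    sought_admittance: bool  # tried to get in


# Hypothetical domain, loosely modelled on the paper's characters.
DOMAIN = [
    Man("Mr Grey", wears_tie=True, inside=True, sought_admittance=True),
    Man("Colonel Black", wears_tie=True, inside=True, sought_admittance=True),
    Man("Dr White", wears_tie=True, inside=True, sought_admittance=True),
    Man("Mr Green", wears_tie=False, inside=False, sought_admittance=True),   # turned away
    Man("Mr Brown", wears_tie=False, inside=False, sought_admittance=False),  # never tried
]


def holds(f, g, domain):
    """Truth of 'all Fs are Gs' over a finite domain."""
    return all(g(x) for x in domain if f(x))


def instances(f, domain):
    """Instances of 'all Fs are Gs' are the Fs (names, sorted for stable output)."""
    return sorted(x.name for x in domain if f(x))


def is_inside(m):
    return m.inside


def has_tie(m):
    return m.wears_tie


def is_tieless(m):
    return not m.wears_tie


def is_outside(m):
    return not m.inside


def is_tieless_applicant(m):
    return not m.wears_tie and m.sought_admittance


# (A) 'all men inside wear ties' and its contrapositive (B) 'all tieless men
# are outside' have the same truth value over the domain...
print(holds(is_inside, has_tie, DOMAIN), holds(is_tieless, is_outside, DOMAIN))  # True True
# ...but disjoint sets of instances.
print(instances(is_inside, DOMAIN))   # ['Colonel Black', 'Dr White', 'Mr Grey']
print(instances(is_tieless, DOMAIN))  # ['Mr Brown', 'Mr Green']
# Only tieless men who sought admittance instantiate the exclusion rule.
print(instances(is_tieless_applicant, DOMAIN))  # ['Mr Green']
```

The sketch deliberately stops at truth values and instance sets: as the text argues, which of two logically equivalent generalizations explains its instances is not something the logic alone settles.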
Our puzzle would be solved most neatly if we could say that whereas ‘all men in this restaurant are wearing ties’ does not explain its instances, ‘all tieless men are outside this restaurant’ does explain its instances. For then we could say that our puzzle was generated by focussing on the wrong member of the pair of generalizations. While this is indeed the correct resolution of the puzzle in rough outline, the details require a little adjustment to this response. For it is clear that neither ‘all tieless men are outside this restaurant’ nor even the stronger statement with modal force, ‘all tieless men must be outside this restaurant’, explains all instances of the generalization. Some tieless men, such as tieless Mr Brown who lives 350 miles from the restaurant, have never attempted to gain admittance. Their being outside the restaurant is not explained by the rule excluding them. So we have to regard ‘all tieless men are outside this restaurant’ as a consequence of the rule ‘all tieless men seeking admittance to the restaurant are excluded from it’ plus the more-or-less trivial ‘all tieless men not seeking admittance to the restaurant are outside it’. It is only instances of the rule that get explained. We may depict the set of relationships between the various generalizations and their instances thus:
5. CONCLUSION
This paper has shown the following:
(a) The raven cases provide counterexamples to the D-N model of explanation, not only to the hypothetico-deductive model of confirmation. Thus not everything deducible from a law is explained by that law. In particular, not every non-accidental generalization can explain its instances. If ‘all Fs are Gs’ is a law, we should not expect it to provide an explanation of why a non-G is a non-F. Consequently, even though ‘all non-Gs are non-Fs’ is a non-accidental generalization, it cannot explain its instances. Similarly, not every explanation of a generalization is particularizable.
(b) In selection cases the rule or law operating is often negative: men without ties are excluded, gazelles that do not run fast do not survive, etc. It is these negative rules and laws that do explain their instances. Consequently, the contrapositives of such rules and laws do not explain their instances. So although it is a non-accidental truth that all men in the restaurant are wearing ties, we should not expect that truth to explain why any individual man in the restaurant is wearing a tie.
We may further conclude:
(c) Non-accidental generalizations that are true in virtue of selection processes seemed to have an unusual feature, that their explanations did not transfer to their instances. This feature was not shared by typical explanations in the physical sciences. That might suggest that selection explanations are somehow different from explanations in the physical sciences and thus fuel the view that Darwinian explanations in terms of natural selection are a new kind of explanation, somehow irreducible to the normal nomic or causal explanations of the physical sciences. This paper has shown that the non-particularizability of an explanation is not limited to selection explanations but is a consequence of the nature of explanation in general. Explanations of many non-accidental truths in the physical sciences will also be non-particularizable (e.g. of ‘all non-hard objects are non-diamonds’, and of ‘all non-superconducting metals are not at low temperature’). However, in the physical sciences we tend not to focus on such negative general truths, focussing instead on their (particularizable) contrapositives. All that is different about selection explanations is that they themselves employ negative properties. Consequently, in their cases it is the negative generalization that has a particularizable explanation, while it is the logically equivalent positive generalization that is non-particularizable. Selection explanations do not appear, as far as this issue is concerned, fundamentally different from other nomological explanations.
REFERENCES
Achinstein, P. (1983). The Nature of Explanation. Oxford: Oxford University Press.
Armstrong, D. (1983). What is a Law of Nature? Cambridge: Cambridge University Press.
Dretske, F. (1977). Laws of Nature. Philosophy of Science 44: 248–268.
Hempel, C. (1965). Aspects of Scientific Explanation. New York: The Free Press.
IBE AND EBI On explanation before inference JOHANNES PERSSON
1. IBE CHARACTERISED
Inference to the best explanation (IBE) is theoretically interesting in that it promises to throw new light on what an explanation is. IBE challenges the standard view of the relation between inference and explanation. We tend to think that first we infer and then we scan our pool of inferences for suitable explanations. But as Peter Lipton (2004, Chapter 4) convincingly argues, and as we all suspected from detective stories, this view seriously underestimates the epistemic role of explanation. The director of Circus Rinaldo is very upset. There has been a theft of several of his valuable animals. The famous Swedish detective Ture Sventon is contracted. He enters one of the circus wagons. He discovers a pair of sharp black shoes and pressed cheviot trousers. With a voice like a series of gunshots, Sventon concludes: “Will the Weasel, always this weasel!”67 And, to take another similar but less dramatic example, yesterday when entering my office I found a cell phone on the table. I quickly concluded that during my absence someone had come to see me. Both these inferences are instances of the well-known category of self-evidencing explanations (Hempel 1970, p. 370). Self-evidencing explanations describe benign circles, with the explanatory relation holding in one direction and the evidential in the other. I concluded that someone had been in my room (forgetting the phone when tired of waiting) because this was the only potential explanation of its presence I could come up with. Similarly in Sventon’s case. In these cases it seems that explanation guides inference both by telling us where to look and by telling us whether we have found it; “it is not simply that the phenomena to be explained provide reasons for inferring the explanations: we infer the explanations precisely because they would, if true, explain the phenomena.” (Lipton 2004, p. 56). I will call this challenge to the standard view Explanation before inference, or EBI.
IBE is a species of EBI, since it can be characterised as a self-evidencing explanation that is better than its competitors and judged satisfactory. Lipton put it this way in the first major work on IBE:
67 My translation. The Swedish name is Ville Vessla. Cf. Åke Holmberg (1964).
137 J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 137–147. © 2007 Springer.
Given our data and our background beliefs, we infer what would, if true, provide the best of the competing explanations we can generate of these data (so long as the best is good enough for us to make any inference at all). (Lipton 1991, p. 58)
Following this official characterisation we can formulate three general requirements for the applicability of IBE. First, whatever version of IBE we assume, and whatever merits it has, it can be applied only in situations where explanation candidates exist. That is,
(1) one must regard at least one of the hypotheses to be selected from as a possibly satisfactory explanation (=MIN).
Furthermore, in many actual cases either none or more than one hypothesis meets the minimal requirements, so unless the applicability of IBE is to be restricted to just the odd cases,
(2) IBE must be capable of selecting one or a few hypotheses from several possibly satisfactory explanations (=RANK).
A third requirement is also important because sometimes, in the end, no available option seems outstanding enough. Hence,
(3) the best explanation has to be good enough (=SATISFACTORY).
2. VIOLATION OF MIN AND ITS CONSEQUENCES
The three requirements imply corresponding limitations on IBE’s applicability. I start with the violation of MIN. What counts as an explanation depends on the concept of explanation: a platitude, surely, but useful here. MIN already makes the applicability of IBE clearly dependent on one’s concept of explanation. Sometimes, simply because of too much divergence between a group of hypotheses and a concept of explanation, no available hypothesis meets the minimal requirement one has for regarding it as a possibly satisfactory explanation. I will provide plenty of illustrations below, but let us first think through the consequences. There seem to be only two alternative responses. The first option is to accept this limitation on the applicability of IBE. This is impossible for the advocate of IBE who, along the lines of Gilbert Harman (1965), claims that all non-demonstrative inference is IBE. But many modern advocates of IBE, like Lipton, Timothy Day and Harold Kincaid, can rest content with this option.
Lipton, for one, says: “I accept at the outset that Inference to the Best Explanation cannot be the whole story about the assessment of scientific hypotheses” (Lipton 2001, p. 93). The second option is to claim that even though IBE is not applicable in such environments, this doesn’t matter, since in those cases no epistemic selection process among hypotheses occurs. To be more precise, the descriptivist claims that in such cases the inquirer doesn’t choose between the hypotheses, and the normativist claims that in such environments there is simply no epistemically justified way of choosing between them. Advocates of IBE interested in establishing it as a very widely applicable tool should choose one of the versions of the second option, and the imperialist, who tries to argue that IBE is the only such epistemic tool, has to opt for one. Little can be said in favour of either version of the second option. It is easy to find examples that seem perfectly rational and that are incompatible with the descriptive version as well. I will present a few rather dissimilar examples.
The first is from Pierre Duhem who, it is fair to say, would have been hostile to the idea of IBE as an important epistemic tool, at least in physics. The second is from Yemima Ben-Menahem, who actually tries to argue for the importance of IBE in some situations, but offers an example that I think nicely illustrates a situation where the combination of the concept of explanation and the group of hypotheses under scrutiny makes IBE inapplicable. The final example comes from Wesley Salmon, who stands out as one of the finest critics of IBE. These examples prove the existence of actual and interesting situations where IBE is inapplicable because MIN is violated.
2.1 Duhem’s case
In The Aim and Structure of Physical Theory, Pierre Duhem intends to offer “a simple logical analysis of the method by which physical science makes progress” (Duhem 1906/1954, intro). He starts by considering the option that physics proceeds by a process of explanation. In order to evaluate this view, he asks what an explanation is and then settles for the following account:
To explain (explicate, explicare) is to strip reality of the appearances covering it like a veil, in order to see the bare reality itself. (Duhem 1906/1954, p. 7)
Certain modifications concerning the appearances obviously go on in physics. But are they of the right kind? In the beginning, the observation of physical phenomena enables us to apprehend the appearances in a particular and concrete form. Then the experimental laws deal with these appearances in a more abstract and general way. But, Duhem stresses, the more abstract scientific notions one achieves through this process only let us know things as they are in relation to us, not as they are in themselves. There is no attempt to strip reality of the appearances. By MIN, this part of physics is not an explanatory process. Surely, some of us have a less metaphysical concept of explanation than Duhem. And some of us understand basic theory formation in the sciences differently. But it does not matter whether we agree with Duhem on these points or not. According to the Duhemian conception of explanation, it is clear that the scientific process he has described so far could not be conducted by IBE. Neither the facts nor the experimental laws established meet the minimal requirement for being an explanation. Since none of them is even a potentially satisfactory explanation, MIN is not fulfilled. What makes the case especially worrying is that Duhem at the same time finds no problem in principle with inferential progress at this level of theory formation. So other inferential tools are at work. It is important here to keep in mind that my argument works by keeping fixed both the concept of explanation and the method of inference. The natural reaction, saving IBE by opposing Duhem’s concept of explanation, is therefore blocked. What remains open to further inquiry is only whether a further specification of Duhem’s concept of explanation, which may well be needed, would still yield the same result. When Duhem wrote in 1906, models of explanation had not been discussed as systematically as they are today. But I cannot see how a further specification of Duhem’s concept could resolve the conflict. While his views of what drives scientific inference were not metaphysical, his concept of explanation clearly moves along that path.
2.2 Ben-Menahem’s case
In an otherwise genuinely interesting and good paper, Yemima Ben-Menahem (1990), who defends Hempel and Oppenheim’s (1948) deductive-nomological model as a minimal requirement for explanations, confusingly offers the following example as an instance of IBE:
A was seen chasing B into a shack, a knife in his hand. A few minutes later A was seen leaving, the knife still in his hand. B was found murdered in the shack. No other person was seen entering or leaving the shack. In court A denies that he is the murderer, claiming that he was just trying to frighten B, and that a third unidentified person, C, who was hiding in the shack committed the murder. Naturally, the judge is not impressed and sentences A to life imprisonment. (Ben-Menahem 1990, pp. 322-323)
Actually the story has a true background, with a different conclusion. “Interestingly”, Ben-Menahem adds in a footnote, “the Talmudic judge, convinced as he was of the accused’s guilt, could not convict him, due to both the strict procedural requirements in cases of capital punishment, and the unacceptability of circumstantial evidence.” One gets the impression that, without the recognition of IBE, Talmudic judges lack an epistemic tool that would make a difference in this case. But this must be mistaken since, according to the deductive-nomological model of explanation that Ben-Menahem accepts, an explanation takes the form of a deductive argument containing at least one law-statement. This constitutes a minimal requirement for IBE’s applicability. Neither the true nor the modified story, however, conveys any laws of murder-behaviour or any deductive arguments. Unless all this were added to this elliptic explanation (Hempel 1970, p. 415), Ben-Menahem cannot use IBE to improve on the Talmudic outcome. But there is little reason to think that the elliptic explanation can be turned into a deductively complete explanation; it is at best partial (Hempel 1970, p. 416). Hence I think that IBE in Ben-Menahem’s version is plainly not applicable to this murder case, and so we have another instance where the violation of MIN makes IBE inapplicable. Again, it does not matter whether we think that Ben-Menahem has presented good reasons for holding on to the deductive-nomological model of explanation. She does hold on to it, and then presents two examples: in the first, a hypothesis is chosen from a group of candidates; in the second, no choice can be made. In neither case, however, does the hypothesis together with the surrounding theory have the characteristics that, according to her concept of explanation, make it an explanation.
There is inference in one of the two cases, but it is not inference to the best explanation, and the reason why not is that, in spite of what Ben-Menahem says, the hypotheses in the modified story do not qualify as potential explanations. What we have not proved is that IBE could not apply in a case like this. Perhaps with a more suitable concept of explanation, e.g. a causal one, there would have been a best explanation guiding the judge. (For reasons that will become clearer later on, I would not want to deny that either.) But keeping the concepts fixed, as we actually find them, we see that IBE was in fact not in play. That is enough for the argument.
2.3 An example from Salmon
One of Wesley Salmon's instructive examples of a violation of MIN is from the time when Hempel’s models were the received view of explanation: “Where there is no explanation,” starts Salmon, “there cannot be any inference to the best explanation. If Hempel is right about the nature of explanation, then, according to Harman’s doctrine, there cannot be any nondemonstrative inference in history (or any other discipline) in the absence of laws.” (Salmon 2001a, p. 65). So either we say that Hempel is wrong about explanation, or we give up on nondemonstrative inference in such cases, or we conclude that there might be nondemonstrative inferences that are not IBE. Few of us have anything against the first option, but here we keep Hempel’s concept fixed as an assumption and agree with Salmon that the last option is the right choice: some nondemonstrative inferences are not IBE. Regardless of whether Dray and Scriven were correct about explanation without laws, the historical inferences they pointed to are of course possible. With only the slightest touch of creativity, analogous limitations can be found for any substantial concept of explanation. On a general level, the common distinction between the power of a hypothesis to predict and its power to explain provides a number of actual examples. For instance, it has often been claimed that ecological theory can explain but not predict. And then we have biologists like Robert H. Peters (1991), who propose that ecology should instead aim for predictive success and care less about explanation. Both the traditional position and Peters’s in effect demonstrate the limitations of IBE. The very idea of separating explanatory and predictive theories seems to lead to the acknowledgement of extensive theory development that is not governed by IBE.
3. CONFLICTS BETWEEN BEST EXPLANATIONS: VIOLATION OF MIN AND RANK
The second violation arises because of the interplay between two or more concepts of explanation. This is again bad news for the imperialist, but this time the claim about inapplicability is directly relevant to less imperialistic versions as well. However we spell out the slogan Inference to the best explanation, IBE should be applicable exactly in those situations where we have competing explanations, provided that they are good enough to merit inference. That is not always the case when two or more concepts of explanation are in play. For the following two reasons, it cannot be assumed that all concepts of explanation yield the same result when a group of hypotheses is ordered with regard to their explanatory power. First, consider the following three hypotheses:
H1: Fc and All F are G
H2: c is the cause of e
H3: In reality, e and its apparent property G are X
The hypotheses are deliberately simplified. Assume now that Ben-Menahem, Lipton (an advocate of causal explanation) and Duhem form a scientific group that wants to infer one of H1-H3. Then the following three explanation concepts compete:
E1: deductive-nomological
E2: causal
E3: Duhemian
H1 is the only hypothesis that meets MIN if E1 is assumed, H2 is the only hypothesis that meets MIN if E2 is assumed, and H3 is the only hypothesis that meets MIN if E3 is assumed. Consequently, in this case we would have three partial orderings with different outcomes. In all three cases, MIN might result in the selection of a best explanation, but each time a different hypothesis is selected. The illustration was chosen for its simplicity only; real examples can be a lot messier. I am not at all sure, however, that the diversity in the above situation is unrealistic for anyone but the contemporary student of explanation. And even in his or her world a number of explanation concepts occur: unification, necessity, and causal concepts. We know that some causal explanations do not unify and that some relations of necessity are not causal. Now follows a related case. We might, for instance, appeal only to the laws of classical physics or only to a principle of the general theory of relativity when we explain why a balloon moves forward, towards the cockpit, when the plane accelerates for takeoff (Salmon 2001a, p. 67). And we may have one physicochemical and one functional explanation of the existence of a thermostat:
Some philosophers would say that the mechanical explanation is the better one, and that the functional explanation is not so good. Wright maintains that the functional explanation is not necessarily superseded by the mechanical explanation; both are legitimate and may stand side-by-side complementing one another.
In one context one of these might be better; in another context the other might be preferable. The phrase, ‘inference to the best explanation,’ involves a uniqueness claim that is difficult to justify. (Salmon 2001a, p. 67)
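The structure of the H1-H3 case can be rendered as a small Python sketch. It is only a toy: the feature labels attached to each hypothesis and the yes/no tests standing in for MIN under E1-E3 are hypothetical stand-ins chosen for this illustration, not analyses of the deductive-nomological, causal, or Duhemian concepts.

```python
# Toy model of the H1-H3 case. Each explanation concept E1-E3 is treated,
# hypothetically, as a yes/no predicate that implements MIN for that concept.
from typing import Callable, Dict, List, Set

# Hypothetical feature encoding of the three hypotheses.
HYPOTHESES: Dict[str, Set[str]] = {
    "H1": {"law", "deduction"},       # Fc and All F are G
    "H2": {"cause"},                  # c is the cause of e
    "H3": {"strips_appearances"},     # in reality, e and G are X
}

# Assumed MIN tests, one per explanation concept.
CONCEPTS: Dict[str, Callable[[Set[str]], bool]] = {
    "E1 deductive-nomological": lambda f: {"law", "deduction"} <= f,
    "E2 causal": lambda f: "cause" in f,
    "E3 Duhemian": lambda f: "strips_appearances" in f,
}


def meets_min(concept: str) -> List[str]:
    """Return the hypotheses that pass MIN under the given concept."""
    test = CONCEPTS[concept]
    return [name for name, features in HYPOTHESES.items() if test(features)]


if __name__ == "__main__":
    for concept in CONCEPTS:
        print(concept, "->", meets_min(concept))
```

Run as a script, it lists one admissible hypothesis per concept, and a different one each time. This mirrors the situation of the imagined research group: each concept singles out a best explanation, but nothing in the MIN tests themselves says whose verdict to follow.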
Lipton also discusses the options for letting go of the uniqueness claim. As long as the explanations are compatible, he sees no reason not to allow inference to both of them: “‘Inference to the Best Explanation’ must thus be glossed by the more accurate but less memorable phrase, ‘inference to the best of the available competing explanations when the best one is sufficiently good’” (Lipton 2001, p. 104). But the case cannot be dissolved that swiftly. The alternatives in the two examples arise because they are generated from partly different research programmes, programmes that no doubt have differing metaphysical assumptions and probably also have different concepts of explanation built into them. According to classical physics, inference to the general-relativistic explanation cannot be allowed: the move from acceleration to a gravitational field on which it builds is not recognised there.
According to the functionalist framework, the physico-chemical explanation similarly cannot meet MIN. So what Lipton suggests must be interpreted in the following way: MIN is met if the alternative meets either the one explanation concept or the other. But where is the guarantee that a collection of hypotheses that passes the minimal requirement in this way can then be ranked in a suitable way, so that the two best candidates are compatible? Again, the answer is wanting. Second, IBE also builds on the idea that explanatory power is not an all-or-nothing affair. Explanatory power varies, and according to the requirement RANK, IBE has the tools to detect and take advantage of this fact. So if we instead assume that another set of hypotheses meets the minimal requirements for being an explanation according to each of E1 to E3, there is clearly a possibility that the further rankings of these three hypotheses would differ depending on which of these explanation concepts one has utilised. Where multiple concepts do occur, this is just what we should expect; there is simply no good reason to expect otherwise. Since it cannot be assumed that all concepts of explanation yield the same result when a group of hypotheses is ordered with regard to their explanatory power, IBE cannot be safely applied when at least two concepts of explanation are in use in the same epistemic process. Since the hypotheses might have several mutually inconsistent explanatory characteristics, and many different rankings are thus possible, there is no unique (group of) best explanation(s). Moreover, the following otherwise sensible move proposed by Alexander Bird does no good here:
Furthermore, one will require some measure of goodness (not merely a ranking), since we will want to infer the best explanation only if (i) it is itself good enough, and (ii) it is clearly better than the next best explanation (Bird 1999, p. 26)
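The point about RANK and Bird's proposal can be sketched in the same toy style. The scores, the hypothesis labels K1-K3, and the two thresholds below are all invented for illustration; the sketch merely assumes, as the text does, that every hypothesis passes MIN under each of E1 to E3 and that each concept induces its own graded ranking.

```python
# Toy model of RANK under several explanation concepts, combined with
# Bird's (1999) two conditions for inferring the best explanation.
from typing import Dict, Optional

# Hypothetical explanatory-goodness scores in [0, 1]. Every hypothesis is
# assumed to pass MIN under every concept; only the gradings differ.
SCORES: Dict[str, Dict[str, float]] = {
    "E1": {"K1": 0.9, "K2": 0.4, "K3": 0.3},
    "E2": {"K1": 0.2, "K2": 0.85, "K3": 0.5},
    "E3": {"K1": 0.3, "K2": 0.35, "K3": 0.95},
}

GOOD_ENOUGH = 0.7   # assumed threshold for Bird's condition (i)
CLEAR_MARGIN = 0.3  # assumed margin for Bird's condition (ii)


def bird_inference(scores: Dict[str, float]) -> Optional[str]:
    """Return the top hypothesis if it is good enough and clearly better
    than the runner-up; otherwise return None (no inference is drawn)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top >= GOOD_ENOUGH and top - second >= CLEAR_MARGIN:
        return best
    return None


if __name__ == "__main__":
    for concept, scores in SCORES.items():
        print(concept, "->", bird_inference(scores))
```

Under each concept the top hypothesis meets both of Bird's conditions, yet the three verdicts differ. Raising the thresholds can only make the procedure withhold inference; it cannot make the concepts agree on a winner.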
Being the only hypothesis that passes the minimal requirement already makes a hypothesis clearly better than the rest, but we can add a lot to this difference and still have the same problem. In each of the groups, the outstanding hypothesis can be excellent. We have seen one type of conflict, which arises because there are several concepts of explanation in the community of researchers within a single discipline. How often the concepts of explanation vary between, and indeed within, individuals of an epistemic community interested in the same hypotheses depends on how explicit the explanatory issues are. The amount of discussion of explanatory issues in the central scientific papers, and of course its degree of homogeneity, affects which concepts of explanation practitioners use in pursuing this enterprise. A discussion that is clear, consistent, and outspoken about matters of explanation will be less prone to generate multiple concepts of explanation. But how many such discussions do we know of? Probably very few are of this kind, and even fewer once we leave the scientific realm, so the opportunities for competing concepts to occur are plentiful. There are other fields where conflicting explanation concepts emerge. Most salient are probably cases where the object of research (or interest) can be approached from several perspectives. What is sometimes called multidisciplinary research is a good example.
Risk research is a typical field that can fruitfully be attacked from many different theoretical perspectives. Risks are traditionally studied in economics, engineering, medicine, psychology, etc., but there is also some important work done in social anthropology and, not to be forgotten, in philosophy. The question “What is a risk?” is frequently posed in all these disciplines, and the groups of hypotheses considered within the different disciplines are rather similar. But the explanation concepts clearly vary between the disciplines, and this is probably one reason why the preferred answers to the fundamental question initially tend to vary as well. To follow Ortwin Renn, the standard approach to what risks are according to most of these disciplines is a three-component answer consisting of:
outcomes that affect what humans value, possibility of occurrence, and a formula to combine the two. (Renn 1998, p. 51)
Interpretations of these components diverge. Psychologists have focussed on perceived outcomes and subjective utilities, engineers on physical effects and their relative frequencies, while philosophers have primarily distinguished between different kinds of risk with regard to the kinds of possibilities involved. Even though I agree with Lipton (2001, p. 100) that IBE does not depend on our having an adequate theory of what explanation is, in all actual situations our reluctance to formulate our concepts of explanation in advance increases the risk of competing explanations of this kind, which is malignant from the perspective of IBE. In situations where more than one concept of explanation exists, both MIN and RANK give rise to conflicts that make IBE inapplicable, and not because the hypotheses fail to be good enough as explanations.
4. CONFLICTS BETWEEN IBE AND TOOLS ADAPTED TO PROMOTE ALSO NON-EXPLANATORY PROPERTIES OF THEORIES
The received view seems to be that “in real cases of disagreement over the explanatory power of a theory, the dispute is hardly ever over the structure of an adequate explanation” (Ben-Menahem 1990, p. 325). I have presented two examples to the contrary: sometimes there is real disagreement about structure. But I now turn to familiar territory, because the third limitation, which I label conflicts between IBE and tools adapted to promote also non-explanatory properties of theories, is particularly easy to spot on common ground. If we add to RANK the presupposition that the choice between hypotheses depends not on structural or conceptual considerations but on content alone, the following two questions need an answer:
1. If explanatory differences are claimed to be operative on the content level, then what distinguishes them from non-explanatory differences?
2. If they are distinct, what happens when they come into conflict with each other?
The first question is the more fundamental, but I will start with the second. It is easy to get slightly disappointed when one reads work on IBE. In response to criticism, its advocates often narrow the gap between IBE and other epistemic tools, such as Bayesianism, in some clever way. Day and Kincaid (1994) and Ben-Menahem (1990) both do so by arguing that explanatory merit comes primarily from further background information and background beliefs, not from different kinds of evidence. In the end, it is not clear whether there really is a difference between IBE and other methods. But why, then, were we so eager beforehand to claim that it is considerations concerning the best explanation that count? If we do not distinguish clearly between IBE and its epistemic relatives, such as selecting the in-other-ways-best-supported hypothesis that meets the minimal requirements for being an explanation, the appeal of Harman’s imperialism is certainly undermined. It is interesting to see that he did not seem to notice the problem:
There is of course a problem about how one is to judge that one hypothesis is sufficiently better than another hypothesis. Presumably such a judgement will be based on considerations such as which hypothesis is simpler, which is more plausible, which explains more, which is less ad hoc, and so forth. (Harman 1965, p. 89)
Only one of the four characteristics he mentions is essentially explanatory in nature. Maybe this is the reason why he thought that IBE accounted for all nondemonstrative inferences. IBE, according to Harman, was probably a family of inferences in which explanation as a guide was only one of many possibilities. This is a different idea from the one that inspires the recent attempts, and in Harman’s case the slogan is clearly misleading: it could just as well have been spelled out as inference to the best hypothesis. I agree with Lipton (2001, pp. 93-94) and Wesley Salmon (2001a, p. 79) that such moves within a proper theory of inference to the best explanation would take away the interest of the approach, imperialistic or not. So let us first assume that we have a group of hypotheses, all with the same kind of explanation-property but differing in degree. Since there is, by assumption, a suitably refined and distinct explanation-property, one must acknowledge that there is an a priori possibility of discrepancy between the ordering of hypotheses with regard to this property and an ordering according to certain other (non-explanatory) properties. And by accepting this difference between the best explanation and the in-other-ways-best-supported hypotheses, we also have to accept the possibility of conflict between explanatory and non-explanatory virtues. This means that traditional criticisms, such as van Fraassen’s (1989, pp. 161-170) and Salmon’s (2001a and 2001b), are still on track. Sometimes there will be genuine conflicts between IBE and tools adapted to promote also non-explanatory properties of theories. To take a perhaps too simple case: from “Fa&Ga” we can infer “Some F are G”. A purely truth-preserving inference strategy would sanction this. From an explanationist perspective, “All F are G” would of course be preferable, since the explanatory value of “Some F are G” seems slight. As it stands, this is no critique of IBE.
It is a general claim that applies to any theory: either the properties are not distinct or divergence is possible. The problematic part arises when divergent cases are handled evasively. To the extent that cases of divergence are resolved by having IBE stand back in favour of the other tools, IBE is correspondingly circumscribed. This generalises too, and is a critique of any theory that stands back in cases of conflict.
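The logical asymmetry in the “Fa&Ga” example can be checked formally. The following Lean 4 sketch is a formalisation added here for illustration, not part of the original argument: it shows that “Some F are G” follows deductively from “Fa&Ga”, while “All F are G” does not, using a two-element countermodel over `Bool` chosen purely for convenience.

```lean
-- "Fa & Ga" deductively yields "Some F are G".
example {α : Type} (F G : α → Prop) (a : α) (h : F a ∧ G a) :
    ∃ x, F x ∧ G x :=
  ⟨a, h⟩

-- "All F are G" does not follow. Countermodel over Bool: F holds of
-- everything, G holds only of `true`. Then F true ∧ G true holds,
-- but F false holds while G false fails.
example : ¬ (∀ (F G : Bool → Prop) (a : Bool),
    (F a ∧ G a) → ∀ x, F x → G x) := by
  intro h
  have h2 : false = true :=
    h (fun _ => True) (fun x => x = true) true ⟨trivial, rfl⟩ false trivial
  cases h2
```

The truth-preserving strategy stops at the first theorem; the explanationist preference for “All F are G” goes beyond anything the premise licenses, which is exactly where the two kinds of virtue can come apart.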
My major worry, however, concerns the assumption that operative differences in explanatory power on a content level can easily be distinguished from non-explanatory powers. In many such cases, it seems far-fetched to distinguish between inference to the best explanation and inference to the best hypothesis. EBI (explanation before inference) effectively levels out the differences between the two by making the concepts we use match our explanatory needs. Let us return to risk research. It is quite clear that risk research has to do better than saying that what we currently have is a number of different and valuable aspects of risk covered by different disciplines. A cross-disciplinary selection process is going on, and it has to violate some of the explanatory requirements that exist in the various disciplines. This means that IBE is often inapplicable from a view from nowhere, but efficient from within each of the competing perspectives. A better way to put this is that in this phase EBI is on the agenda. Later on, some of these explanatory considerations are internalised, and accordingly it becomes difficult to draw the distinction between explanatory and non-explanatory inferences.
5. CONCLUSION
Peter Lipton claims, correctly in my view, that at least sometimes explanatory considerations come before inference (EBI). He uses this claim to argue for inference to the best explanation (IBE). This paper has shown three limitations of IBE. They occur in situations (i) where no hypothesis meets the minimal requirement for being an explanation, (ii) where we have competing concepts of explanation, and (iii) where the possibility of an IBE coexists with the applicability of other inferential tools. This immediately refutes any “imperialistic” IBE approach saying that all nondemonstrative inference is IBE, such as Harman’s (1965). But the limitations are also interesting in an indirect sense.
Given the intuitive importance of explanatory considerations, and having already conceded the three limitations above, it is natural to examine whether some of the apparent cases of IBE are also reflections of a more genuine role for explanation. And indeed, not every EBI is an IBE. Sometimes essentially explanatory considerations belong to the early phases of inquiry, for instance those involving concept formation. Later phases of such enterprises appear to be governed by IBE. But stripping reality of the appearances, we see that explanation contributes only derivatively to these inferences.
REFERENCES
Ben-Menahem, Y. (1990). The inference to the best explanation. Erkenntnis 33: 319-344.
Bird, A. (1999). Scientific revolutions and inference to the best explanation. Danish yearbook of philosophy 34: 25-42.
Day, T. and H. Kincaid (1994). Putting inference to the best explanation in its place. Synthese 98: 271-295.
Duhem, P. (1906/1954). The aim and structure of physical theory (La théorie physique: son objet, sa structure). Princeton, NJ: Princeton University Press.
Harman, G. (1965). Inference to the best explanation. Philosophical review 74: 88-95.
Hempel, C. G. and P. Oppenheim (1948). Studies in the logic of explanation. Philosophy of science 15: 135-175.
Hempel, C. G. (1970). Aspects of scientific explanation. Oxford: Glencoe.
Holmberg, Å. (1964). Privatdetektiv Ture Sventon på nya äventyr; en samlingsvolym. Stockholm: Rabén & Sjögren.
Lipton, P. (1991). Inference to the best explanation. London: Routledge.
Lipton, P. (2001). Is explanation a guide to inference? A reply to Wesley C. Salmon. In Hon and Rakover (eds.): Explanation: Theoretical approaches and applications. Kluwer: 93-120.
Lipton, P. (2004). Inference to the best explanation. London: Routledge.
Peters, R. H. (1991). A critique for ecology. Cambridge: Cambridge University Press.
Salmon, W. C. (2001a). Explanation and confirmation. In Hon and Rakover (eds.): Explanation: Theoretical approaches and applications. Kluwer: 61-91.
Salmon, W. C. (2001b). Reflections of a bashful Bayesian: A reply to Peter Lipton. In Hon and Rakover (eds.): Explanation: Theoretical approaches and applications. Kluwer: 121-136.
van Fraassen, B. (1989). Laws and symmetry. Oxford: Clarendon Press.
EXPLAINING WITH EQUILIBRIA68
JAAKKO KUORIKOSKI
Equilibrium explanations are pervasive in economics and biology. They are also becoming more frequent in the social sciences owing to the widespread adoption of economic (and sometimes biological) explanatory models. For example, the economist Edward Lazear (2000) has recently claimed that adherence to equilibrium concepts is a major factor contributing to the ‘scientific’ status of economics. This adherence would thus also explain, to some extent, why economic explanatory models have invaded other fields of social inquiry. This view is echoed, amongst others, in some branches of political science, where microeconomic explanations based on equilibrium solution concepts are seen as the paradigm of scientific thinking, and the demonstrable lack of stable equilibria in models of voting systems is viewed as relegating the field to the status of a dismal science (e.g., Riker 1980; Austen-Smith and Banks 1998). However, equilibrium explanations have so far attracted surprisingly little philosophical attention. An interesting exception is the puzzling role of the so-called general equilibrium models in economics.69 However, these intricate mathematical constructs are widely held to provide more of a consistency proof of the usual neo-classical modeling assumptions than actual explanations of events or phenomena (Hausman 1992; Weintraub 1985). Moreover, when equilibria are discussed, the actual philosophy of explanation tends to get buried under the mathematics, and a more approachable account might therefore prove useful. The focal point of this discussion is the account of equilibrium explanations given by Elliott Sober. Sober claimed in his ‘Equilibrium Explanation’ (1983) that equilibrium explanations constitute a counter-example to ‘the causal thesis of explanation’, because they do not require the exposition of the actual causal history of the event to be explained.
The point argued in this paper is that explanations of singular events are indeed causal, even those supplied by equilibrium models. What Sober construes as explanations of singular events by putative disjunctive causes should instead be seen as constitutive or structural explanations of (macro) properties of the system (population) under scrutiny. However, equilibrium models can be used to explain different kinds of explananda. What can be explained with a given explanatory model will in turn be assessed using James Woodward’s theory of explanation (Woodward 2000; 2003). It is further argued that these various explanatory uses of equilibrium models require that the models pick out causally relevant properties of the system, and that the models can and should therefore be seen as special cases of causal models. This argument supports Brian Skyrms’ contention that it is the dynamics of the equilibrium model that bear most of the explanatory burden (Skyrms 1997).
68 I would like to thank the Finnish Cultural Foundation for its generous support of this research.
69 Some have argued, however, that the whole theory of general equilibrium is dead (Ackerman 2002).
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 149–162. © 2007 Springer.
1. EQUILIBRIA
Intuitively, the idea of an equilibrium is quite simple: it is a stable state of a system that is maintained as a result of interacting forces in the system. Usually, equilibria are also thought of as having domains of attraction larger than the state itself, so that, if perturbed, the system would right itself and return to the equilibrium state. However, as we will see, technically this is not a necessary property of an equilibrium. Why are such states explanatory? Sober’s account of equilibrium explanation goes as follows: if a system has an equilibrium with a large enough domain of attraction, the occurrence of an equilibrium state can be explained by exhibiting it as an equilibrium. This description is explanatory because it shows that many different initial conditions could have led to the state in question. According to Sober, equilibrium explanations thus represent a counter-example to the claim that the explanation of a single event requires citing its actual causes, since equilibrium explanations seem to make the actual causal history of an event irrelevant: the system would have ended up in the equilibrium state even if the causal history had been different in a number of ways (Sober 1983, p. 207). Sober’s example of an equilibrium explanation is Ronald Fisher’s argument accounting for the 1:1 sex ratio common in a wide variety of populations. If a population diverges from the 1:1 sex ratio, there will be a reproductive advantage favouring pairs producing the minority sex. The ratio of male-to-female progeny has an impact on a parent’s fitness by virtue of the number of grandchildren produced.
If males are in the majority, individuals producing female offspring will, on average, have more grandchildren than others. Therefore, any heritable variants that cause overproduction of the minority sex tend to increase in frequency. Thus the population is eventually driven back to the 1:1 equilibrium (Sober 1983, pp. 201-202). For Sober, the crucial element behind the special explanatory properties of equilibria is the many-to-one relationship between the initial conditions and the explanandum. Sober’s characterisation of an equilibrium explanation therefore applies to a fairly large number of different models in different domains.
When discussing equilibrium models, a distinction should first be drawn between dynamic and non- or quasi-dynamic models. Equilibria in traditional rational-choice-based game theory (usually Nash equilibria) are only quasi-dynamic. An equilibrium is characterised only by its intrinsic properties (as a best-response strategy profile), and little or no attention is given to the question of how the system is supposed to reach it. As Nicola Giocoli (2003) puts it, the ascendancy of game theory and the associated topological techniques have transformed the paradigm of economic equilibrium from a system of forces to a mutual consistency of abstract relations (plans and strategies). These non-dynamic equilibria pose additional questions about the explanatory worth of the equilibrium concept, especially if the model has multiple equilibria.70 Therefore, I will not discuss these non-dynamic models here.71 In contrast, the classic Cournot game of oligopolistic competition, ordinary price-taking partial equilibrium models, equilibria in models of physical systems and models in evolutionary game theory are all dynamic in a loose sense, i.e., they purport to show, or at least to account for, how and why the equilibrium is supposedly reached. Phenotypic optimality models can be used to approximate genetic equilibrium models, and can therefore sometimes also be regarded as dynamic equilibrium models, when selection is at least implicitly assumed to drive the phenotype to the optimum (Esher and Feldman 2001). Evolutionary games, population models and some economic equilibrium models are also dynamic systems in a stricter formal sense, i.e., they are constructed as systems of differential or difference equations. Equilibria are then defined as fixed points, points of the state space where the rate of change of the system is zero.72 The explanatory worth of equilibria is usually taken to depend on the different stability properties they possess. An important distinction should be made here between two kinds of stability: dynamic and structural. Dynamic stability refers simply to the internal properties of the system as given: is the equilibrium only locally stable, or perhaps asymptotically or even globally stable? In other words, given some initial state, where do the given dynamics carry the system? Structural stability refers to the stability of the system itself, i.e., whether qualitative features of the model are robust under changes in parameters or dynamics.
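Fisher's argument, together with the notions of fixed point and dynamic stability, can be illustrated with a minimal numerical sketch. The functional form of the dynamics is an assumption made here for illustration, not Fisher's or Sober's own model: the male share p of offspring is taken to change at the rate k(1 - 2p), which is positive when males are the minority and negative when they are the majority, and it is integrated with a simple Euler step.

```python
# Toy sketch of Fisherian sex-ratio dynamics. The functional form
# dp/dt = k * (1 - 2p) is an assumption made for illustration only;
# p is the proportion of males among offspring.


def rate(p: float, k: float = 1.0) -> float:
    """Assumed rate of change of the male proportion: positive when males
    are the minority (p < 0.5), negative when they are the majority."""
    return k * (1.0 - 2.0 * p)


def simulate(p0: float, steps: int = 200, h: float = 0.1) -> float:
    """Euler-integrate the assumed dynamics from the initial proportion p0."""
    p = p0
    for _ in range(steps):
        p += h * rate(p)
    return p


def is_locally_stable(p_star: float, eps: float = 1e-6) -> bool:
    """Linearised test for a one-dimensional system: a fixed point is
    locally asymptotically stable if the slope of the rate is negative
    there. The slope is estimated by a central difference."""
    slope = (rate(p_star + eps) - rate(p_star - eps)) / (2.0 * eps)
    return slope < 0.0


if __name__ == "__main__":
    # Many different initial conditions lead to the same end state.
    for p0 in (0.05, 0.3, 0.5, 0.7, 0.95):
        print(f"start {p0:.2f} -> end {simulate(p0):.4f}")
    # The rate of change vanishes at p = 0.5, so it is a fixed point.
    print("rate at 0.5:", rate(0.5))
    print("locally stable:", is_locally_stable(0.5))
```

Every starting proportion in the loop ends up numerically at 0.5, the rate of change vanishes there, and the negative slope marks the point as locally asymptotically stable. This is the many-to-one relationship that Sober takes to carry the explanatory weight; the sketch says nothing about which causal history actually produced a given population's ratio.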
Such qualitative features usually refer to the topological properties of the model, i.e., a model is structurally stable if minimal perturbations result in topologically equivalent models (Skyrms 1999). There is some controversy about which topological stability features are interesting or even requisite for an acceptable model. Some bifurcation phenomena may be artefacts of the model, and some may reflect physical properties of the modelled system. According to the so-called stability dogma, only structurally stable models can be given an empirical interpretation. According to a more liberal
70 Hence the proliferation of solution concept refinements that attempt to pin down the one ‘right’ solution concept, one that would always yield a unique solution (cf. Harsanyi and Selten 1988). Bicchieri (1995) argues that the problem of multiple equilibria should be addressed by explicitly modelling the learning processes of the players.
71 This omission also gives one more reason to evade the thorny question of the explanatory role of general equilibrium theories, because the usual Walrasian tâtonnement ‘mechanism’ or its variants cannot reasonably be taken to describe the actual dynamics of actual systems (Rosenberg 1992, Chapter 7). Fixed-point theorems used in the existence proofs of general (in essence game-theoretical) equilibria leave the relevant mapping unspecified.
72 However, Skyrms has noted that dynamic equilibria cannot in fact be taken as the principal explanatory concepts, because many fixed points are unstable, have no domains of attraction and therefore do not support the apparently required many-to-one relationship. Skyrms claims that the obvious candidate for an explanatory concept is the attractor (although I would presume attracting sets would be enough). Furthermore, not all attractors are fixed points (Skyrms 1997). I basically agree with Skyrms, but I continue to speak of equilibria in order to stay true to Sober’s intentions and to the common usage of the word rather than to the terminology of applied mathematics. Besides, the formal finesses of fixed points and attractors do not play any role in the discussion to come.
J. KUORIKOSKI
interpretation, only structurally stable properties of dynamic systems should be considered physically relevant (Guckenheimer and Holmes 1986, Chapter 5).

2. MODELS AND PHENOMENA

A large part of Sober’s paper is devoted to the plausibility of disjunctive token causes: if equilibrium explanations explain events in the same way as ordinary (singular) causal explanations, can alternative histories together constitute a reasonable causal explanans (Sober 1983, pp. 204-206)? However, I find this explicit concern with the explanation of individual events rather curious. Is the intended target of Fisher’s argument really the 1:1 sex ratio of a single population at a given time? The perplexing explanandum is undoubtedly the pervasiveness of the 1:1 sex ratio across different populations, species and environments. What is being explained is primarily a generic pattern or phenomenon in the sense discussed by James Bogen and James Woodward (1988). According to Bogen and Woodward, phenomena fall into many different ontological categories: particular objects, objects with features, events, processes and states. By explaining phenomena rather than singular events or data, scientists avoid having to tell an enormous number of independent, highly local and idiosyncratic causal stories. From this perspective, the effort Sober puts into discussing the viability of disjunctive causes of a token event seems largely misguided. Instead, according to Bogen and Woodward, phenomena require what they call systematic explanation, which is usually supplied by explanatory models. Fisher’s argument is an outline of a model providing a basis for such a systematic explanation (cf. Seger and Stubblefield 2002, p. 7).73 Systematic explanations and singular causal explanations do differ, but they are by no means independent. Explanatory models as such do not explain individual events, but they can be used to construct such explanations.
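Fisher’s argument, described above as the outline of a model, can be turned into a small model of that kind. The following Python sketch assumes a large, randomly mating population in which sons and daughters cost the same, and it uses a common textbook simplification (not Fisher’s own formulation, nor the chapter’s) for the relative fitness of a rare mutant producing a fraction s_mut of sons when the resident population produces a fraction s_res: w = s_mut/s_res + (1 - s_mut)/(1 - s_res). The trait is then nudged in the direction of the selection gradient; all names and step sizes are hypothetical.

```python
def invasion_fitness(s_mut, s_res):
    """Relative fitness of a rare mutant producing a fraction s_mut of sons
    in a resident population producing a fraction s_res of sons: each sex
    is worth more the rarer it is in the population."""
    return s_mut / s_res + (1.0 - s_mut) / (1.0 - s_res)


def selection_gradient(s):
    """Derivative of invasion_fitness with respect to s_mut at s_mut == s_res == s."""
    return 1.0 / s - 1.0 / (1.0 - s)


def evolve(s0, step=0.005, generations=2000):
    """Take small steps in the direction currently favoured by selection."""
    s = s0
    for _ in range(generations):
        s += step * selection_gradient(s)
    return s


for s0 in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(f"start {s0:.2f} -> end {evolve(s0):.4f}")
```

All five starting values end at 0.5000: many different initial conditions lead to one end state, which is the many-to-one relationship Sober emphasises.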
As Sober himself notes, what counts as the cause of a single event is a context-dependent question, usually settled by largely pragmatic concerns. Sober also claims that diluting the causal requirement to the claim that causal explanations provide ‘information about the causal history of the event’ trivialises the whole concept of causal explanation (Sober 1983, pp. 202-203; see also Gjelsvik in the present volume). Yet, whereas these points lead Sober into pessimism about the whole idea of causal explanation, Woodward’s theory of explanation offers a more constructive approach: an account of how these context-dependent matters can be handled, one that tells what can be explained on the basis of given information or a given model. The basic idea behind Woodward’s theory is that explanatory models show how systems work. By giving a schematic account of the dependency relations in a system or a mechanism, explanatory models codify answers to what-if-things-had-been-different questions, i.e., assertions about various counterfactual, contrastive
73 Indeed, in their introductory text, Seger and Stubblefield claim that it is difficult to think of any other biological field concerned with specific phenotypes that would be more model-driven than sex-ratio research.
states of the system. In the most straightforward cases, these dependencies are represented as functional relationships between variables, where the possible values of the variables form the relevant contrast classes with respect to possible explanatory set-ups. More specifically, models of causal mechanisms give answers to questions concerning the effects of hypothetical interventions on these variables. Woodward’s claim is that both systematic explanation and singular causal explanation involve such calculations of the effects of possible interventions in the model, but that they differ in what kinds of interventions are relevant, i.e., in which variables should be held fixed in the evaluation of the relevant counterfactuals. Thus, this account captures the heavy context dependency of singular causal explanations that Sober was worried about without sweeping everything of interest under the rug of pragmatics (Woodward 2000, 2003). Woodward’s account also seems, at first sight, to account for why equilibrium explanations and their many-to-one dependencies create significant understanding and therefore provide ‘deep’ explanations: they answer a particularly large collection of what-if-things-had-been-different questions, i.e., they have a wide counterfactual ‘range’. Even if the initial conditions had been different (at least not too drastically different), the end result would have been much the same. However, in principle, this explanatory force does not differ from that of other schematic descriptions of mechanisms.
Whereas Sober’s intuitions about explanations seem to lean towards the epistemic or perhaps the modal conception in Wesley Salmon’s taxonomy (Salmon 1984, Chapter 4), the contrastive-counterfactual perspective advocated here is firmly ontic or realist: understanding is created through the exposition of objective, mind-independent dependencies, which facilitate action and inference by supporting counterfactuals about hypothetical manipulations.74

3. POSSIBLE EXPLANANDA OF EQUILIBRIUM MODELS

So far so good. According to Alexander Rosenberg, the resilience or persistence of a phenomenon has been associated with explanation since Plato. Rosenberg claims that this fundamental intuition motivates the search for explanations of change in terms of underlying persistence, which is itself seen as intrinsically understandable. Equilibria thus provide a natural stopping point for inquiry, because they minimise the amount of inherently problematic change (Rosenberg 1992, pp. 205-206). Rosenberg himself does not advocate this view, but he claims that such intuitions have been historically influential and that they are one of the main reasons why
74 Another area where the contrastive-counterfactual framework might prove useful is the controversy around the explanatory relevance of different stability properties of equilibria alluded to above. Adherence to the ontic account demands only that the stabilities and possible bifurcations reflect objective features of the system being modelled. Therefore, stability properties per se should not be used as criteria of empirical or explanatory significance. Instead, which stability properties are relevant depends on the specific problem at hand. The contrasts explained by different manipulations of the variables (initial values) or parameters could and should be analysed on a case-by-case basis. However, these questions are not pursued further here.
equilibrium explanations are so popular among different sciences. However, on closer examination, this view actually seems to run counter to the account of explanation used here: it is the invariance of the dependencies, not the invariance of the actual behaviour of the system, that is responsible for explanation. In fact, since equilibria with large basins of attraction are highly independent of their initial conditions, they actually seem to be a counter-example to the requirement of genuine dependence put forward by Woodward and Christopher Hitchcock. Woodward and Hitchcock claim that in order to be explanatory, a generalisation (or the corresponding dependency relation) has to be invariant under a testing intervention, i.e., there has to be a possible setting of the explanatory variables that would have changed the explanandum from its actual (equilibrium) state (Woodward and Hitchcock 2003, p. 17). This requirement is vital, since explanatory dependencies are characterised by counterfactual conditionals, and removing it would allow one to generate apparently explanatory pseudo-dependencies at will. However, in the extreme case of equilibrium models with global dynamic stability, such testing interventions would by definition be either non-existent or so drastic that they would disrupt the whole system, making it impossible to speak intelligibly of different values of the same explanandum variable (end state). It seems that with global stability there could not be any proper testing interventions on the explicit explanatory variables, and therefore no explanatory power. This is clearly unacceptable, since such stable equilibria are usually precisely the ones regarded as highly explanatory.
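The worry about testing interventions can be illustrated with the simplest globally stable case, the logistic equation dx/dt = r x (1 - x/K). In the following Python sketch (hypothetical parameter values, plain Euler integration chosen only for brevity), every positive initial condition ends at K. Interventions on the initial state therefore make no difference to the end state, whereas interventions on the structural parameter K do.

```python
def logistic_end_state(x0, r=1.0, K=100.0, dt=0.01, t_max=50.0):
    """Euler integration of dx/dt = r * x * (1 - x / K), starting from x0."""
    x = x0
    for _ in range(int(t_max / dt)):
        x += dt * r * x * (1.0 - x / K)
    return x


# Interventions on the initial condition: the end state does not move.
for x0 in (1.0, 50.0, 150.0):
    print("x0 =", x0, "->", round(logistic_end_state(x0), 3))

# Interventions on a structural parameter: the end state follows K.
for K in (50.0, 100.0, 200.0):
    print("K =", K, "->", round(logistic_end_state(10.0, K=K), 3))
```

In Woodward and Hitchcock’s terms, the initial state admits no testing intervention here, while K does; this is the sense in which the equilibrium depends on structural features rather than on where the system started.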
If explanations indeed track dependencies instead of persistence, the interesting explanatory relationship cannot be the one between the initial conditions and the equilibrium state, as might first be surmised, and as indeed seems to have been Sober’s view. Instead, what the equilibrium state does depend on are the structural features of the system. Equilibrium explanations are not causal explanations of events but structural or constitutive explanations of system-level properties. As an illustration, consider a Lotka-Volterra model of a continuous predator-prey system with strong self-limitation of the prey owing to competition in a finite environment, i.e., in the absence of predators the prey population grows according to a logistic equation:

$$\frac{dx}{dt} = r\left(1 - \frac{x}{K}\right)x - bxy$$

and the predators respectively:

$$\frac{dy}{dt} = (-c + dx)y$$

where x and y are the prey and predator densities, r the intrinsic rate of prey population increase, K the carrying capacity of the environment, b the predation rate coefficient, d the rate at which consumed prey are converted into predator reproduction and c the predator mortality rate. When projected onto the predator-prey plane, the trajectories of the system for various initial values look something like the following:
Figure 1. A predator-prey system with self-limitation
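The behaviour shown in the figure can be checked numerically. Setting dy/dt = 0 gives x* = c/d, and setting dx/dt = 0 with x > 0 then gives y* = r(1 - c/(dK))/b, provided that c/d < K. The Python sketch below uses hypothetical parameter values and a fixed-step fourth-order Runge-Kutta integrator (an implementation choice, not part of the model) to show trajectories from different starting points converging on this point, and a change in a parameter, rather than in the initial state, moving the point itself.

```python
def make_self_limiting(r, K, b, c, d):
    """Right-hand side of dx/dt = r(1 - x/K)x - bxy, dy/dt = (-c + dx)y."""
    def rhs(state):
        x, y = state
        return [r * (1.0 - x / K) * x - b * x * y, (-c + d * x) * y]
    return rhs


def equilibrium(r, K, b, c, d):
    """Interior fixed point: dy/dt = 0 gives x* = c/d; dx/dt = 0 then gives y*."""
    x_star = c / d
    y_star = r * (1.0 - x_star / K) / b
    return x_star, y_star


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (p + 2.0 * q + 2.0 * u + w) / 6.0
            for s, p, q, u, w in zip(state, k1, k2, k3, k4)]


def simulate(f, state, dt=0.01, t_max=200.0):
    for _ in range(int(t_max / dt)):
        state = rk4_step(f, state, dt)
    return state


params = {"r": 1.0, "K": 10.0, "b": 0.5, "c": 0.5, "d": 0.2}  # hypothetical values
rhs = make_self_limiting(**params)
print("predicted equilibrium:", equilibrium(**params))
for start in ([8.0, 0.5], [1.0, 4.0], [5.0, 5.0]):
    x, y = simulate(rhs, start)
    print(start, "->", round(x, 4), round(y, 4))

# Intervening on a parameter (higher predator mortality c), not on the
# initial state, moves the equilibrium itself.
shifted = dict(params, c=0.8)
x, y = simulate(make_self_limiting(**shifted), [8.0, 0.5])
print("after raising c:", equilibrium(**shifted), "->", round(x, 4), round(y, 4))
```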
However, if we loosen the self-limitation on the prey, we get the classic Lotka-Volterra model (a is the intrinsic rate of prey population increase), which has no asymptotically stable equilibria but instead yields cyclic behaviour of the predator and prey populations, of which there are well-documented instances:

$$\frac{dx}{dt} = (a - by)x$$

$$\frac{dy}{dt} = (-c + dx)y$$
Figure 2. A predator-prey system without self-limitation
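By contrast, the classic model has no attractor in the interior of the state space. A standard result, easily checked by differentiation, is that V = dx - c ln x + by - a ln y is constant along every trajectory of this system, so each orbit is a closed cycle fixed by its own initial condition. The Python sketch below (hypothetical parameter values; fixed-step fourth-order Runge-Kutta integration as an implementation choice) integrates one orbit and checks that the prey density keeps oscillating while V stays constant up to numerical error.

```python
import math


def classic_rhs(a, b, c, d):
    """Right-hand side of dx/dt = (a - by)x, dy/dt = (-c + dx)y."""
    def rhs(state):
        x, y = state
        return [(a - b * y) * x, (-c + d * x) * y]
    return rhs


def conserved(state, a, b, c, d):
    """V = d*x - c*ln(x) + b*y - a*ln(y); dV/dt = 0 along every trajectory."""
    x, y = state
    return d * x - c * math.log(x) + b * y - a * math.log(y)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (p + 2.0 * q + 2.0 * u + w) / 6.0
            for s, p, q, u, w in zip(state, k1, k2, k3, k4)]


a, b, c, d = 1.0, 0.5, 0.5, 0.2  # hypothetical; interior fixed point (c/d, a/b) = (2.5, 2.0)
rhs = classic_rhs(a, b, c, d)
state = [8.0, 0.5]
v_start = conserved(state, a, b, c, d)
dt = 0.01
late_prey = []
for step in range(20000):        # integrate from t = 0 to t = 200
    state = rk4_step(rhs, state, dt)
    if step >= 10000:            # keep only the second half of the run
        late_prey.append(state[0])

v_drift = abs(conserved(state, a, b, c, d) - v_start)
print("prey range late in the run:", round(min(late_prey), 2), "to", round(max(late_prey), 2))
print("drift in V:", v_drift)
```

The oscillations do not die out, and a different starting point would simply put the system on a different cycle: there is no many-to-one relationship between initial conditions and end state, even though the model is built from the same causal ingredients as the self-limiting one.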
Here we have very similar models apparently explaining very different kinds of phenomena. The first is a paradigmatic equilibrium explanation of certain stable levels of predator and prey populations, whereas the second explains the cyclic fluctuations of population levels sometimes encountered in predator-prey systems. However, the latter explanation does not exhibit the many-to-one structure of Sober’s equilibrium explanation. Moreover, cyclical population patterns extended indefinitely in time are hardly reasonable candidates for token events. Clearly, the classic Lotka-Volterra model as such does not primarily explain events. Instead, the explanandum is more naturally interpreted as a property of the system. The predator-prey dynamics, with constants referring to the mean or expected behaviour of token animals (the predation coefficient is sometimes actually meant to be estimable from ‘laboratory’ experiments with small sample sizes), give a structural (micro) explanation of a behavioural tendency of the macro-system composed of the two populations. Since it is difficult to see any fundamental difference between the classic model and the self-limiting model yielding an equilibrium, the equilibrium itself is also most naturally seen as a macro-property arising from the constituent causal properties (the predation coefficient, etc.) used in the construction of the differential equations. The explanatory dependencies captured by these models are not causal dependencies between successive events, such as the initial conditions and the end state, but constitutive dependencies between properties of a system and their causal basis.
Since it would be rather forced to claim that the explanatory principles in these models are radically different (it is also worth noting that the change in the differential equations can be given a loose empirical interpretation), the many-to-one relationship between the initial conditions and the equilibrium is clearly not in itself the factor responsible for the explanatory power of the equilibrium model. Instead, the causal mechanisms and capacities implicit in the model dynamics constitutively explain both the equilibrium and the non-equilibrium explananda.75

4. CAUSAL EXPLANATION OF EVENTS WITH EQUILIBRIUM MODELS

Suppose that one insists on demanding an explanation of a token event using Fisher’s argument, i.e., an explanation accounting for the occurrence of a 1:1 sex ratio in a population p at time t. The first contrast class of the explanandum event that comes to mind is composed of the other (logically) possible sex ratios of p at t. However, the initial conditions make no difference, either to the sex ratio or, for that matter, to t. What the argument tells us is that even if some forces had acted on the population in the past, pushing it out of the equilibrium, the population dynamics have apparently driven it back to the 1:1 ratio.
75 It should be noted that some regard the L-V models with suspicion precisely because they give rise to qualitatively different phenomena under alterations of parameters and therefore do not satisfy the strict stability dogma (cf. Guckenheimer and Holmes 1986). However, as argued above (footnote 74), the fact that L-V models can account for a variety of qualitatively different phenomena should not be held against them if the changes in the model responsible for the radical alterations can be given an intelligible empirical interpretation.
Notice how the lack of information about the actual history of the event is accompanied by a total lack of information that would single out the so-called event of the 1:1 ratio of p at t from any other corresponding pseudo-event (i.e., a 1:1 ratio at t′ or t″, etc.). Contrasting the time index t with other possible values simply does not seem to make much sense. As argued above, what the argument actually explains is a macro-property of the system, i.e., the tendency to have a 1:1 sex ratio. Once we know this property, explaining unremarkable token events becomes a trivial exercise: the system has the property of being in state s unless recently perturbed, and the system has not recently been perturbed. However, even in this trivial explanation of an event, the rather uneventful causal history of the system makes an indispensable appearance. Structural or constitutive explanations of properties are not causal in the sense that singular explanations of events are. A property instantiation can of course be given a causal explanation in terms of the causal history of its constituting token structure, but this is an answer to a different question. However, a modified Fisher argument or model could in principle also be used to explain changes in the population sex ratio. Fisher’s argument relies on an assumption of equal or linear fitness returns from the production of sons and daughters. In fact, as Fisher himself observed, the equilibrium concerns parental effort or ‘expenditure’, not numbers of offspring. This assumption can be relaxed in a number of ways. The ‘costs’ of the sexes can differ because of different mortality rates during the parental care period, different resource needs of male and female offspring and different effects on the parents’ future reproductive performance (Seger and Stubblefield 2002, p. 16). Suppose that the environment changes in a way that alters these differential costs.
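A minimal way to see how unequal costs shift the equilibrium is to let selection act on the fraction e of a fixed parental budget spent on sons, rather than on offspring numbers. As an assumption made purely for illustration (this is not Seger and Stubblefield’s model), the fitness of a rare parent spending e_mut on sons in a population spending e_res is taken to be e_mut/e_res + (1 - e_mut)/(1 - e_res). Selection then equalises expenditure, so if a son costs c_m and a daughter c_f, the equilibrium proportion of males among offspring is c_f/(c_m + c_f). The function names and costs below are hypothetical.

```python
def equilibrium_expenditure_on_sons(step=0.005, generations=2000, e0=0.2):
    """Small steps up the selection gradient on the fraction of the budget
    spent on sons; the gradient of the assumed fitness is 1/e - 1/(1 - e)."""
    e = e0
    for _ in range(generations):
        e += step * (1.0 / e - 1.0 / (1.0 - e))
    return e


def equilibrium_proportion_male(cost_son, cost_daughter):
    """Convert the equilibrium expenditure split into offspring numbers."""
    e = equilibrium_expenditure_on_sons()
    sons = e / cost_son
    daughters = (1.0 - e) / cost_daughter
    return sons / (sons + daughters)


print(round(equilibrium_proportion_male(1.0, 1.0), 3))  # equal costs
print(round(equilibrium_proportion_male(1.5, 1.0), 3))  # sons cost 50% more
```

With equal costs the sketch returns 0.5; when sons cost 1.5 times as much as daughters it returns 0.4, matching c_f/(c_m + c_f). The shift is driven by a parameter (the costs), not by the population’s initial state.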
This modification changes the population dynamics and creates a new equilibrium sex ratio (in terms of actual offspring), which the population eventually reaches through selection. The same kind of exercise could also be done with the Lotka-Volterra model with strong prey self-limitation: a change introduced in the parameters (not in the initial conditions) would shift the equilibrium levels, and the populations would eventually be driven to this new equilibrium. In this way, similar equilibrium models can be used to explain singular events. In fact, explanations with this kind of structure are the norm in economics and go by the name of comparative statics. Consider the following simple textbook explanation of a token event using a microeconomic supply-and-demand equilibrium. The explanandum is the increase in the price of steel in the U.S. from 1980 to 1985 by roughly 25% relative to the international (Amsterdam) market level. The explanation for this price increase is taken to be the ‘voluntary’ import quotas imposed by the Reagan administration. In the equilibrium model, the structural effect of the imposed quota on the steel market is represented as a shift in the supply curve (from S1 to S2), which moves the market equilibrium along the demand curve D. Market forces are then supposed to drive the price to this new equilibrium level (the explanandum), which is higher than the pre-quota level.
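The structure of such a comparative statics exercise can be sketched with hypothetical linear curves; the numbers are illustrative only and are not fitted to the steel data. Demand is Q = A - B*P and supply is Q = C + D*P, with the quota represented, as in the text, by a leftward shift of the supply curve (a lower intercept C). A simple excess-demand price-adjustment rule, assumed here for illustration, stands in for the ‘market forces’ that carry the price to the new equilibrium.

```python
def equilibrium_price(A, B, C, D):
    """Price at which demand A - B*P equals supply C + D*P."""
    return (A - C) / (B + D)


def adjust_price(p, A, B, C, D, k=0.05, steps=2000):
    """Raise the price under excess demand, lower it under excess supply."""
    for _ in range(steps):
        excess_demand = (A - B * p) - (C + D * p)
        p += k * excess_demand
    return p


A, B, C, D = 100.0, 1.0, 20.0, 1.0                # hypothetical curves
p_before = equilibrium_price(A, B, C, D)          # 40.0
quota = 10.0                                      # supply shifts left (S1 -> S2)
p_after = equilibrium_price(A, B, C - quota, D)   # 45.0
p_reached = adjust_price(p_before, A, B, C - quota, D)
print(p_before, p_after, round(p_reached, 6))
```

The parameter change (the supply shift), not any change in initial conditions, produces the new equilibrium, and the adjustment rule plays the role of the mechanism linking the disturbance to the explained price.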
Figure 3. A simple comparative statics exercise

Here an equilibrium model is constructed to represent the state of the system before the event to be explained (parameters are sometimes fitted against data if this is possible), a disturbance is simulated by manipulating the parametrised model, and the explanandum then hopefully corresponds to the (or an) equilibrium of the disturbed model. The disturbance thus causes the event to be explained in a straightforward manner through the mechanism schematised in the model dynamics (in this case implicit and quite informal). More generally, the forces responsible for the attainment of the equilibrium create a counterfactual dependency between structural parameters and the equilibrium state that is invariant under some reasonable interventions on the parameters. Changes in the parameters cause the system to settle into the new equilibrium. This kind of equilibrium reasoning underlies the bulk of the economic theory used to formulate econometric models, which are usually thought of (implicitly or explicitly) as representing causal relationships. Equilibria here serve more as important characteristics of mechanisms linking variables of interest, i.e., as parts of the explanans, than as things to be explained. Mechanisms underwrite the counterfactual dependencies required for causal explanation, but do not as such explain events; they only link them. When explaining token events with equilibrium models, the actual causal history is therefore far from irrelevant. Daniel Hausman agrees that what the Fisher argument explains, i.e., the sex ratio, is first and foremost a property of the population not reducible to its members. However, he takes this to be a contingent feature of this particular form of argument: if the population were divided into two equal subpopulations, one reproducing at a sex ratio of n:m and the other at m:n, there would be no selection pressure. Thus, the Fisher argument only yields a population average.
Hausman claims that other ‘arbitrage arguments’, which are essentially equilibrium explanations with the exploitation of disequilibrium (arbitrage) and the resulting selection acting as the
explanatory mechanism, can also be used to explain properties of individuals. Hausman’s example of an arbitrage argument is the familiar story used to legitimise the application of rational expectations to the theory of firm behaviour: there are (assumed to be) firms that behave according to predictions made by the best relevant economic theories. Such rational expectations are, on average, assumed to increase ‘fitness’ relative to firms behaving otherwise. However, there seem to be no systematic differences in firm survival that could be attributed to different predictive abilities. Therefore, on aggregate, firms behave according to the best relevant economic theories (Hausman 1989). Are properties of individual firms really being explained here? What about token property instantiations or even singular (f)acts? The exact formulation of the conclusion of Hausman’s arbitrage argument is as follows: Ceteris paribus, the expectations of firms for the most part match the predictions of economic theory. (Hausman 1989, p. 6) More generally, according to Hausman, the conclusion of an arbitrage argument takes the following form: Ceteris paribus, in the actual environment almost all x possess H. (Hausman 1989, p. 9) Although ostensibly about members of the population, the conclusion cannot be used to answer questions of the form why firm x, rather than firm y, has expectations matching economic theory. In fact, the very impossibility of such attributions served as a premiss in the argument! ‘Almost all x possess H’ is just an elliptical way of claiming ‘Almost all x that are G (or belong to g) possess H’. What is being explained is still first and foremost a property of the population (g), i.e., a high relative frequency of firms behaving according to economic theory. 
This conclusion certainly makes it likely that any member of the population will act as predicted, but properties or actions of individual firms would still require some information about their micro-structures or causal histories (cf. Ylikoski 2001, pp. 57-60).

5. ARGUMENTS AND MODELS

The preceding discussion was based on a shift of focus from explanations as single arguments to the various uses of explanatory models. One motivation for this shift was Sober’s perfectly legitimate unease about the multiplicity of causal factors affecting any given event and the resulting indeterminacy of the singular causal explanation of the event in question. In fact, the very idea of a contrastive explanandum reveals that something over and above a deductive argument is needed for explanation. This need not mean discarding the deductive ideal or the third dogma of empiricism, at least in one sense: the possibility of deducing the explanandum can serve as a threshold for what can be explained with a given model (Ylikoski 2001, p. 68). What is denied is that explanations can fruitfully be analysed as mere arguments, in isolation from the theories and models used in deriving them. In addition, focusing on equilibrium explanations as a special and distinct genus of explanatory arguments masks the important fact that the actual fixed points or attractors are not the only interesting features of dynamic systems. If the explanatory force is derived from the many-to-one relationship between the initial conditions and the equilibrium, other interesting and important dependencies in the model do not
get the attention they deserve. For example, George Gilchrist and Joel Kingsolver (2001) have recently stressed that optimality peaks and the associated population equilibria are not the only interesting features of so-called fitness surfaces and adaptive landscapes. Individual performance or fitness surfaces are mappings from individual phenotypes (or sometimes genotypes) to individual performance or fitness, whereas adaptive landscapes are maps from the mean phenotype or allele frequency to mean population fitness. These models are necessarily related but quite distinct, and only adaptive landscapes can be directly related to models of evolutionary dynamics. These models do not merely explain the attainment of equilibrium, but can be made to jump through all kinds of explanatory hoops. First of all, the simple fact that selection tends to drive the population up the steepest slope of an asymmetric peak in an adaptive landscape tells us something over and above the equilibrium point itself. However, this assumes the absence of underlying constraints. Asymmetric peaks in a multivariate fitness landscape imply functional, genetic or phenotypic correlations among traits that have to be taken into account when outlining the possible trajectories of the population (Gilchrist and Kingsolver 2001). If the combined fitness surface and adaptive landscape are treated analogously to a description of a mechanism, the explanatory power of these additional constraints can be analysed with the same theory of explanation as that of the equilibrium. Through the various evolutionary mechanisms summarised in the landscapes (selection being only one of them), these additional constraints can in turn explain various contrasts in the trajectory of the population: different manipulations of these constraints would lead to systematic differences in the trajectory of the population in the adaptive landscape.
Moreover, by moving the peak in the adaptive landscape so as to model gradual environmental change (such as a climate change that translates into a steady change in selection pressure), these combined models can be used to explain directional, long-term changes in populations and thus provide a link between micro- and macro-evolutionary patterns (Arnold, Pfrender and Jones 2002). Gilchrist and Kingsolver aptly start their discussion by imagining a mountaineering guidebook listing the location of every peak and the number of climbers encountered at the top. They conclude that without additional information such a book would be of little interest. For a climber, the really useful information would reside in the topography: how steep is the approach, are there ridges leading to the summit, are there other peaks nearby (Gilchrist and Kingsolver 2001, p. 219)? These considerations show that optimality models and adaptive landscapes can in principle be used to explain at least some general features of the actual causal histories of populations. Here the mechanisms underlying the model dynamics are responsible for the attainment of the peak (the equilibrium), as well as for the way the peak was actually reached and for the trajectory of the population under possible environmental changes. Therefore, the same theory of explanation could and should be used to analyse these different explananda. Conversely, the mere fact that the model providing an explanation is based on equilibrium assumptions does not mean that the explanation itself should have some unique feature distinguishing it from other explanations.
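The point about constraints in the adaptive-landscape discussion can be sketched with the multivariate response-to-selection equation of quantitative genetics (often written delta_z = G beta, after Lande), which the chapter itself does not discuss; everything below is an illustrative assumption. G is a hypothetical genetic covariance matrix, and the selection gradient beta is simply taken to be (theta - z)/S for a peak at theta and a hypothetical constant S, a simplification loosely motivated by a Gaussian peak. The sketch compares the direction of evolutionary response with the direction of steepest ascent, and then lets the peak move steadily to mimic gradual environmental change.

```python
import math

# Hypothetical additive genetic (co)variance matrix with a strong
# positive correlation between the two traits.
G = [[1.0, 0.8],
     [0.8, 1.0]]
S = 10.0  # hypothetical constant: selection gradient is (theta - z) / S


def response(z, theta):
    """Per-generation change in mean phenotype, delta_z = G * beta."""
    beta = [(theta[i] - z[i]) / S for i in range(2)]
    return [G[i][0] * beta[0] + G[i][1] * beta[1] for i in range(2)]


def angle_deg(u, v):
    """Angle in degrees between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))


# Fixed peak: how far does the response deviate from steepest ascent?
z, theta = [0.0, 0.0], [2.0, -1.0]
gradient = [(theta[i] - z[i]) / S for i in range(2)]
first_step = response(z, theta)
print("deflection from steepest ascent (degrees):",
      round(angle_deg(gradient, first_step), 1))

for _ in range(2000):
    dz = response(z, theta)
    z = [z[i] + dz[i] for i in range(2)]
z_fixed = z
print("end point with a fixed peak:", [round(v, 3) for v in z_fixed])

# Moving peak: the optimum of trait 1 shifts steadily; trait 2's does not.
velocity = [0.01, 0.0]
z, theta = [0.0, 0.0], [0.0, 0.0]
for _ in range(5000):
    theta = [theta[i] + velocity[i] for i in range(2)]
    dz = response(z, theta)
    z = [z[i] + dz[i] for i in range(2)]
print("steady lag behind the moving peak:",
      [round(theta[i] - z[i], 3) for i in range(2)])
```

With a fixed peak the population still reaches theta, but along a path bent away from steepest ascent. With a moving peak, the mean of the second trait is dragged away from its own, unchanged optimum by the genetic correlation: a contrast in trajectories that the location of the equilibrium alone does not reveal.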
What then is so special about equilibria? If equilibrium explanations (if, strictly speaking, there are indeed such things) do not themselves possess any special explanatory depth, why are they given such a privileged standing as the hallmark of a truly scientific enterprise? Here is my proposal. Equilibria are special and important, just not in quite the way Sober presented them. The strength of the concept stems from the ability of equilibrium constructs to underwrite powerful structural dependencies between the parts and the whole with only minimal assumptions about the constituents. Consequently, equilibrium assumptions facilitate causal explanations of events by providing credible and relatively tractable mechanisms linking structural changes to changes in the state of the system under scrutiny. In equilibrium models, there are usually not that many different kinds of nuts and bolts, and no intentional effort is needed to ensure that the equilibrium is attained and maintained. In contrast, imagine a complex thermostat with elaborate causal feedback mechanisms maintaining the status quo, or a centrally planned economy keeping its books balanced through massive data-gathering operations and active interventions. In a very concrete sense, both of these systems would be in an equilibrium state, but would this equilibrium have any extra explanatory import over and above the elaborate mechanisms maintaining it? Clearly not.

REFERENCES

Ackerman, F. (2002). Still dead after all these years: interpreting the failure of general equilibrium theory. Journal of Economic Methodology 9 (2): 119-193.
Arnold, S. J., Pfrender, M. E. and Jones, A. G. (2002). The adaptive landscape as a conceptual bridge between micro- and macroevolution. In Hendry, A. P. and Kinnison, M. T. (eds.): Microevolution: Rate, Pattern, Process. Kluwer: 9-32.
Austen-Smith, D. and Banks, J. (1998). Social Choice Theory, Game Theory and Positive Political Theory. Annual Review of Political Science 1: 259-287.
Bicchieri, C. (1995). The Epistemic Foundations of Nash Equilibrium. In Little, D. (ed.): On the Reliability of Economic Models. Kluwer: 91-137.
Bogen, J. and Woodward, J. (1988). Saving the Phenomena. The Philosophical Review XCVII (3).
Eshel, I. and Feldman, M. W. (2001). Optimality and Evolutionary Stability under Short-Term and Long-Term Selection. In Orzack and Sober (eds.) (2001): 161-190.
Gilchrist, G. W. and Kingsolver, J. G. (2001). Is Optimality Over the Hill? The Fitness Landscapes of Idealized Organisms. In Orzack and Sober (eds.) (2001): 219-241.
Giocoli, N. (2003). Modeling Rational Agents. Cheltenham UK: Edward Elgar.
Guckenheimer, J. and Holmes, P. (1986). Nonlinear Oscillations, Dynamical Systems and Bifurcations of Vector Fields. New York: Springer.
Harsanyi, J. C. and Selten, R. (1988). A General Theory of Equilibrium Selection in Games. Cambridge MA: The MIT Press.
Hausman, D. (1989). Arbitrage Arguments. Erkenntnis 30: 5-22.
Lazear, E. (2000). Economic Imperialism. The Quarterly Journal of Economics 115 (1): 99-146.
Orzack, S. H. and Sober, E. (eds.) (2001). Adaptationism and Optimality. Cambridge: Cambridge University Press.
Riker, W. (1980). Implications from the disequilibrium of majority rule for the study of institutions. American Political Science Review 74: 432-446.
Rosenberg, A. (1992). Economics – Mathematical Politics or The Science of Diminishing Returns. Chicago: The University of Chicago Press.
Salmon, W. (1984). Scientific Explanation and the Causal Structure of the World. Princeton NJ: Princeton University Press.
J. KUORIKOSKI
Seger, J. and Stubblefield, J. W. (2001). Models of sex ratio evolution. In Hardy (ed.): Sex Ratios – Concepts and Research Methods. Cambridge: Cambridge University Press: 2-25.
Skyrms, B. (1997). Chaos and the explanatory significance of equilibrium: Strange attractors in evolutionary game dynamics. In Bicchieri, Jeffrey and Skyrms (eds.): The Dynamics of Norms. Cambridge: Cambridge University Press: 199-222.
Skyrms, B. (2000). Stability and Explanatory Significance of Some Simple Evolutionary Models. Philosophy of Science 67: 94-113.
Sober, E. (1983). Equilibrium Explanation. Philosophical Studies 43: 201-210.
Weintraub, E. R. (1985). General Equilibrium Analysis. Cambridge: Cambridge University Press.
Woodward, J. (2000). Explanation and Invariance in the Special Sciences. British Journal for the Philosophy of Science 51: 197-254.
Woodward, J. (2003). Making Things Happen – A Theory of Causal Explanation. Oxford: Oxford University Press.
Woodward, J. and Hitchcock, C. (2003). Explanatory Generalizations – Part I: A Counterfactual Account. Noûs 37: 1-24.
Ylikoski, P. (2001). Understanding Interests and Causal Explanation. Ph.D. thesis, University of Helsinki. Available from http://ethesis.helsinki.fi/julkaisut/val/kayta/vk/ylikoski/
EXPLANATION AND ENVIRONMENT
The case of psychology
ANNIKA WALLIN
1. INTRODUCTION [76]

The environment in which our cognitive processes operate is crucial for understanding their current form, their reliability, and their function. In the following pages I will look at the role the environment plays in psychological explanations of cognitive behaviour, including explanations that are not evolutionary in character. In particular, I will focus on how environmental considerations (broadly construed) help us explain the form or the function of a psychological process. [77]

An ecological explanation explains the design of a psychological process by referring to the adaptive value of the design given a particular environment and a particular function. For instance, vision has an adaptive value since it helps us navigate our surroundings, find food and partners, and avoid predators and other dangers. An ecological explanation thus refers to the function of a process (finding food, avoiding dangers), to the environment in which the process is active (where food and dangers can be found through vision), and to the adaptive value of the function in the environment under consideration (vision helps the organism stay alive and reproduce).

In this explanation of why organisms have developed a visual system, I used vaguely evolutionary terms. There are also other types of explanation in psychology that refer to the relation between organism and environment. Both the social or cultural environment and individual learning experiences, such as trial and error, can affect the form and function of psychological processes. I might learn to pay attention to the colour of strawberries and to prefer red berries to green ones. This bias improves not only the culinary experience of eating strawberries but also the nutritional value of my food intake. We can explain the learning process with the same evolutionary means as above (we are programmed to prefer sugary food, since this has given us selective advantages in the past), but the learning itself is not a matter of Darwinian adaptation but rather of individual, or perhaps cultural, adaptation. A situation in which we learn to prefer green strawberries instead is not inconceivable. Imagine some dietary expert preaching the advantages of unripe food, with all the fibre you could desire and less of the sugar you should want to avoid. Clearly, individual and social learning can change psychological processes in a way that deviates from the more basic Darwinian adaptation.

[76] Part of this chapter was written while the author was a postdoctoral fellow at the Centre for Adaptive Behavior and Cognition at the Max Planck Institute for Human Development, Berlin.
[77] For example, the function of short-term memory includes keeping in mind things that are not immediately perceptually available, and its form includes the limited amount of information that short-term memory can retain in this way.

J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 163–175. © 2007 Springer.

Today, all three types of adaptation are used to explain the form and function of psychological processes. Regardless of the level of explanation, however, all ecological explanations require a specification of function, of environment, and of adaptive value. The goal of this chapter is to discuss the means available, for each of the three types of adaptation, to explain psychological processes in a way that uniquely identifies these key components of ecological explanation. How can we say anything in depth about the environment relevant to understanding Darwinian adaptation? How do we identify the proper function of psychological processes shaped through cultural adaptation? What is to be counted as a successful adaptation in individual learning?

The chapter begins with a brief description of the three types of adaptation that are used in psychology. Then I give three examples of how adaptation is used in current psychological explanations. This leads to a discussion of how well the different types of adaptation can identify the function, environment, and adaptive value required to provide a full explanation of the form and function of psychological processes.

2. ADAPTATION IN PSYCHOLOGY

Three different types of adaptation are used to explain psychological processes:

Darwinian adaptation, where the cognitive processes are assumed to be shaped by natural selection.
This means that the environment used to explain the form or function of a psychological process is the selective environment of our ancestors. Evolutionary psychology is almost exclusively concerned with Darwinian adaptation, but most researchers with an interest in how the environment influences cognitive processes will be forced to refer to the Darwinian environment at some point.

Cultural adaptation, which refers to instances where interactions with the environment have shaped cultural artefacts, such as decision rules or knowledge, over many generations. I will not discuss the processes that have shaped our cultural environment here (such as gene-culture co-evolution, e.g. Boyd and Richerson (1985), or memetics, e.g. Dawkins (1976)). Instead, I will simply assume that some of the cultural artefacts available to us today have been adapted to their environment over long stretches of time.

Individual adaptation, that is, the adaptation over time that we find in a single individual in a specific environment. Just as with cultural adaptation, there are two ways to approach individual adaptation. We can focus on learning processes to understand more about how cognitive processes change over time to accommodate changes in the environment; learning theory is an example. We can also measure
how well adapted individuals are to their current environment. An example is the neo-Brunswikian study of various experts. This chapter will focus on individual adaptation as a static phenomenon. The explanations discussed use the present state of the environment (assuming no major fluctuations in the relevant time-span) to explain the current form of the cognitive process.

Since the adaptations occur on different time-scales, it is possible for them to interact. Any Darwinian adaptations will have an influence on individual and cultural adaptations. Cultural adaptations are accomplished through, and influenced by, individual adaptations. And over time, we can expect cultural adaptations to influence Darwinian adaptations. We will come back to the possible interactions between adaptations in the discussion. First, it is time to look more closely at some examples of psychological explanations referring to adaptation.

3. EXAMPLE 1. OVERCONFIDENCE, OR THE COST OF IGNORING THE ENVIRONMENT

Overconfidence is a psychological phenomenon that refers to an overrating of the correctness of one’s judgements. Typically, participants are asked knowledge questions such as “Which city has more inhabitants, Hyderabad or Islamabad?” and are asked to rate, on a scale from 50% to 100%, how confident they are that their answer to this particular question is correct. Overconfidence occurs when the confidence ratings given by participants are higher than their actual accuracy. For instance, overconfidence would occur if, of all the answers that participants claim to be 100% certain of, only 80% were in fact correct (see, for instance, Fischhoff (1982)).

The overconfidence effect can, however, be made to disappear under certain experimental conditions. Some authors (e.g., Gigerenzer, Hoffrage and Kleinbölting (1991); Juslin (1994)) have claimed that the overconfidence effect is simply an effect of unrepresentative sampling.
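The notion of calibration used in this example can be made concrete with a short computation. The Python sketch below is illustrative only: it assumes the common over/underconfidence score (mean stated confidence minus proportion of correct answers, so that positive values indicate overconfidence), and the function names and toy data are hypothetical, modelled on the 100%-versus-80% illustration rather than taken from any study.

```python
from collections import defaultdict

def over_underconfidence(confidences, correct):
    """Mean stated confidence minus proportion of correct answers.

    Positive values indicate overconfidence, negative values underconfidence,
    and values near zero good calibration. Confidences are probabilities on
    the half-range scale [0.5, 1.0] used for two-alternative questions.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally long, non-empty sequences")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_confidence - accuracy

def calibration_table(confidences, correct):
    """Proportion of correct answers at each stated confidence level."""
    answers_by_level = defaultdict(list)
    for level, is_correct in zip(confidences, correct):
        answers_by_level[level].append(is_correct)
    return {level: sum(hits) / len(hits)
            for level, hits in sorted(answers_by_level.items())}

if __name__ == "__main__":
    # Hypothetical data: ten answers all given with 100% confidence, eight correct.
    confidences = [1.0] * 10
    correct = [True] * 8 + [False] * 2
    print(round(over_underconfidence(confidences, correct), 2))
    print(calibration_table(confidences, correct))
```

With these toy data the score is 0.2 and the table reports 80% correct at the 100% level. If the same confidence judgements were applied to a question set that over-represents difficult items, the proportion correct would fall while stated confidence stayed put, pushing the score upwards; this is the mechanism the unrepresentative-sampling critique appeals to.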
The basic idea behind the critique is that participants need a certain amount of information in order to make a correct estimate of their performance on a task. When this is not available, they will instead draw on their more general knowledge of the area. If I have no clear intuition about whether Islamabad or Hyderabad is the bigger city in the question above, I might use the knowledge I have of my general competence in geography, or what I know about the capitals of Asian countries, to produce a confidence judgement. [78] That means that if the knowledge questions are sampled in a skewed way, so that they contain more difficult questions than are normally encountered, participants will exhibit overconfidence (i.e. miscalibration). If the knowledge questions posed are instead randomly sampled from representative environments, the overconfidence effect appears to disappear (Gigerenzer et al. (1991); Juslin (1994)). This alternative explanation of the overconfidence effect is one that involves the environment in which participants normally make confidence judgements. In this respect, it is an ecological explanation. Note that the overconfidence effect is here explained by referring to individual adaptation.

[78] Recently the overconfidence effect has been put in serious doubt by, among others, Juslin (2000), who claims that it is simply a result of error variance in combination with regression, and thus a complete artefact. For the purposes of this chapter, the simpler critique of unrepresentativeness is sufficient.

4. EXAMPLE 2. LIMITATIONS OF SHORT-TERM MEMORY AND THE ESTIMATION OF CORRELATIONS

Short-term memory is well known to have a limited span (the “seven plus or minus two” usually attributed to Miller (1956)). A participant asked to remember a list of syllables or unconnected words will recall only a portion of them, and it is the size of this portion that the short-term memory span refers to. In a few innovative papers, Kareev (1995; 1997; 2000) proposes that we can explain the restricted span in terms of a limited sample size bringing with it exaggerated correlations. Since exaggerated correlations are easier to detect, Kareev suggests that the functional role of the limitations of short-term memory is to help us detect important correlations in our environment. The logic behind this claim can be, and has been, criticised (e.g. Juslin and Olsson (2005)), but this need not concern us here. What is important is that Kareev proposes a new function for short-term memory, one that relates it to correlations in our environment.

Kareev (2000) substantiates his claim by testing how good participants with low and high short-term memory spans are at correctly estimating correlations among entities. Indeed, he finds that participants with a lower memory span outperform participants with a higher span, an especially appealing finding since the span of short-term memory is correlated with standard measures of intelligence.
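Kareev’s starting point, that small samples tend to exaggerate correlations, can be checked with a short Monte Carlo simulation. The Python sketch below illustrates that statistical premise under stated assumptions and is not Kareev’s own procedure. It assumes a bivariate normal population with correlation 0.5 and samples of seven observations (standing in for a memory span of about seven items). The function names, seed and trial counts are arbitrary, hypothetical choices. The simulation should show that more than half of the small samples yield a correlation stronger than the population value, so the median sample correlation lies above it. It does not address whether short-term memory evolved for this reason.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_correlations(rho, n, trials, rng):
    """Correlations observed in `trials` samples of size n drawn from a
    standard bivariate normal population with correlation rho."""
    noise_scale = math.sqrt(1.0 - rho ** 2)
    results = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
        results.append(pearson_r(xs, ys))
    return results

def summarize(rs, rho):
    """Median sample correlation and share of samples exceeding rho."""
    return statistics.median(rs), sum(r > rho for r in rs) / len(rs)

if __name__ == "__main__":
    rng = random.Random(2000)  # arbitrary seed for reproducibility
    rho = 0.5
    for n in (7, 50):
        median_r, share_above = summarize(sample_correlations(rho, n, 10000, rng), rho)
        print(f"n={n:2d}: median r = {median_r:.3f}, "
              f"share of samples with r > {rho}: {share_above:.3f}")
```

With n = 50 the median should sit much closer to 0.5, which is the sense in which the exaggeration is a small-sample effect. Note that the effect concerns the median and the majority of samples. The average sample correlation is, if anything, slightly smaller than the population value, which is one reason the argument has been contested.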
Kareev’s functional explanation is framed in terms of a Darwinian adaptation: “In the view of the present argument it is tempting to speculate, though impossible to determine, whether the actual size of human working memory has prompted the emergence of the ability to form categories, or whether the need and ability to form categories were among the factors that caused human working memory to reach its present size” (Kareev 2000, p. 401). If we were to accept Kareev’s proposal, it would give us a novel view of the function of the short-term memory span. The span would no longer be a limited resource due to inherent constraints in our cognitive architecture; instead, it would be limited for a functional reason, namely to improve categorisation. Kareev’s reinterpretation of short-term memory demonstrates that ecological considerations can play a positive theoretical role in psychology. They can make us see already established psychological findings in a new light, and generate predictions that can be empirically tested.

5. EXAMPLE 3. THE RECOGNITION HEURISTIC, OR USING THE ENVIRONMENT TO FIND NEW PSYCHOLOGICAL PROCESSES

Considerations of environmental structure can also lead to hypotheses about new types of cognitive processes. For this to happen, we have to assume at the outset that
cognitive processes are adapted (in one of the ways outlined in section 1). Then we can study a particular aspect of the environment and make assumptions about what a psychological process adapted to this environment might look like. The example that will be considered here is the so-called recognition heuristic proposed by Gigerenzer and Goldstein (Gigerenzer and Goldstein (1996); Goldstein and Gigerenzer (2002)). The basic idea behind the recognition heuristic is that lack of information is also information, and cognitive processes can exploit it as such. If, for instance, I recognise only one of two cities, such as San Antonio and San Diego, I am likely to conclude that the one I recognise is the bigger one. The recognition heuristic states that if you recognise only one of two options, you should infer that this option has the higher “criterion value”, in this case the larger population. Gigerenzer and Goldstein demonstrate the power of the heuristic by letting American students judge, for a series of city pairs, which city in each pair is larger. In accordance with their hypothesis, American students make more correct judgements of the relative size of German cities (which they are less likely to recognise) than of the relative size of American ones (whose names they recognise more often). Here we see that by assuming adaptation to the current environment (it is unclear what type of adaptation Gigerenzer and Goldstein think is operating, but it is reasonable to assume that it is cultural or perhaps Darwinian), novel psychological processes can be hypothesised and tested. Thus, ecological explanations can be used not only to revise existing hypotheses and reinterpret known phenomena, but also to form hypotheses about new psychological processes, such as the recognition heuristic.

6. ECOLOGY IN PSYCHOLOGICAL EXPLANATIONS

As we have seen in the previous sections, ecological considerations do very useful work in psychology.
They can be used to explain the form and function of already established psychological phenomena, and also to propose new types of psychological processes. But how consistent are the explanations they propose? For instance, how do we determine the environment in which an individual forms a view of his or her competence at answering knowledge questions? How do we justify the claim that determining correlations is an important enough function for short-term memory to have developed such limitations? How do we know that the recognition heuristic is successful on average, and can therefore be explained as an adaptation?

For an ecological explanation to hold, it has to be able to specify what the environment relevant for the adaptation is, what the main function of the psychological process under consideration is, and how to measure independently the success, that is, the quality, of the adaptation, given the relevant environment and function. Depending on the type of adaptation assumed (be it individual, cultural or Darwinian), the environment, the function, and the level of adaptation have to be defined in different ways. For individual and, to some extent, cultural adaptation, we have the possibility of measuring both the environment and the level of adaptation directly. Darwinian adaptation, by contrast, has the advantage of a more detailed theoretical view of what adaptation is. In the following, these issues will be addressed in turn. For each, there will be a discussion of how the type of adaptation assumed affects which specifications of environment, function and level of adaptation can be made.

7. WHAT IS AN ENVIRONMENT?

In the section on the overconfidence bias, we saw that it can be misleading to consider a cognitive process without taking its relation to the environment into account. When knowledge questions were representatively sampled, the overconfidence bias simply disappeared. The basic intuition behind the first experiments deconstructing the effect was that if a cognitive process is adapted to anything, it is adapted to the genuine environment of the participants. For such a claim to be substantiated, however, we have to be able to define what this genuine environment is. In order to understand ecological explanations in psychology better, we have to know more about how to single out the relevant environment for which the psychological explanations hold.

7.1 The environment of evolutionary adaptiveness

Evolutionary psychologists (e.g. Barkow, Cosmides and Tooby (1992)), with their special focus on Darwinian adaptations, are among the few psychologists who discuss the nature of the environment at any length these days. For them, the environment is whatever exerted selective pressure on our psychological processes. Typically, this is taken to mean the Pleistocene, and in particular the hunter-gatherer societies we lived in then. The focus on Darwinian adaptation makes it easier to identify relevant parts of the environment, since they can be traced by their effects (i.e. the only relevant aspects of the environment are those that have influenced reproductive success). This lends some theoretical depth to the definition of the environment, but does not solve all explanatory problems.
First, anyone referring to Darwinian adaptation in a psychological explanation has to be able to show that the environmental factors they concern themselves with have had a selective impact. Only aspects of the environment that persist over long stretches of time, and that can be demonstrated to have an impact on reproductive success, can be allowed to be part of the Darwinian environment. Unless we propose, as John Tooby (quoted in Laland and Brown (2002)) does, that it is possible to take the average of the environmental circumstances as an approximation of the Darwinian environment (something that is clearly untenable from an evolutionary perspective), we have to be very careful to select aspects that have been stable over long periods of time. Second, we have to keep in mind that selection acts only on an organism as a whole, and that we therefore cannot expect each feature of the organism to be a perfect adaptation to any environment. A viable psychological explanation using Darwinian adaptation has to demonstrate that the proposed environment (i.e. the proposed environmental factors assumed to drive the adaptation) meets the criteria of persistence and selective impact. This becomes particularly difficult given the remoteness of the Pleistocene.

Darwinian adaptation thus has an enviably clear conception of what the environment is, and is able to motivate fully, on a theoretical level, which aspects of the environment to take into consideration in an explanation. On the other hand, such explanations face practical problems when it comes to actually obtaining data about the environment. For individual and cultural adaptations, the opposite is true.

7.2 Representative sampling

For individual and cultural adaptations, it is not possible to refer to selective pressures when identifying the important aspects of the environment. On the other hand, these types of adaptation have the advantage of being readily available for investigation. Researchers can randomly sample environments from the lives of participants, and so estimate the relative frequency of the environments under consideration and their relative importance. In this way, these explanations can define the environment empirically.

Brunswik’s (1955) notion of representative design, and in particular his use of representative sampling while studying perceptual constancy (Brunswik 1944), is perhaps the clearest example of how the environment can be empirically defined. In his 1944 study, Brunswik wanted to understand whether the retinal size of an object could be used to predict its actual size. In order to establish the relationship between retinal size and object size, participants were followed for several weeks and stopped at random intervals. For whatever object they were looking at at that moment, retinal size, object size, and distance were measured. Since the objects taken into account were the objects actually attended to by participants in their daily environments, Brunswik could estimate the real-life predictive power of retinal size for object size. His conclusion was that retinal size had some predictive power regardless of the distance to the object.

Note that Brunswik’s method as described here is only a method for understanding the environment. In order to explain how participants judge the size of objects, it has to be combined with a demonstration that retinal size is actually used to predict object size. However, the controlled experiment that can be used to test this hypothesis will not tell us how predictive retinal size is of object size; this requires a method such as Brunswik’s. Note also that the method of representative sampling is only possible in so far as the researcher already has a clear understanding of the cognitive process under investigation. Unless we have some idea of which aspects of the environment are accessed by the cognitive process, methodological shortcuts such as representative sampling are not possible. Simply stated, we have to know what to measure in order to measure it, even when the measurement is done through random sampling.
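What a representative sample of this kind estimates can be illustrated with a small simulation. The Python sketch below rests on a hypothetical environment in which object sizes and viewing distances are independent and log-normally distributed. It also treats retinal size as proportional to size divided by distance (a small-angle approximation). Each simulated “stop” records the cue (retinal size) and the criterion (object size) for whatever object happens to be in view. The correlation between their logarithms then serves as an estimate of the cue’s ecological validity. All distributions, parameters and function names are illustrative assumptions, not Brunswik’s data. In line with the point just made, the sketch characterises only the environment. It says nothing about whether perceivers actually use the cue.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def random_stop(rng, sd_log_size=1.0, sd_log_distance=1.0):
    """One hypothetical random interruption: (retinal size, object size).

    Sizes and distances are drawn independently from log-normal
    distributions (an assumption); retinal size uses the small-angle
    approximation size / distance.
    """
    size = math.exp(rng.gauss(0.0, sd_log_size))          # metres
    distance = math.exp(rng.gauss(1.0, sd_log_distance))  # metres
    return size / distance, size

def ecological_validity(stops):
    """Correlation between log retinal size (cue) and log object size (criterion)."""
    cues = [math.log(retinal) for retinal, _ in stops]
    criteria = [math.log(size) for _, size in stops]
    return pearson_r(cues, criteria)

if __name__ == "__main__":
    rng = random.Random(1944)  # arbitrary seed
    stops = [random_stop(rng) for _ in range(5000)]
    estimate = ecological_validity(stops)
    # Under these assumptions the population value is 1 / sqrt(1 + 1), about 0.707.
    print(f"estimated validity: {estimate:.3f}, theoretical: {1 / math.sqrt(2):.3f}")
```

Under these assumptions the cue’s population validity is σ_size / √(σ_size² + σ_distance²). The cue therefore remains informative across distances, but it is degraded by how much viewing distances vary. Lowering `sd_log_distance` shows how a narrower range of distances in the sampled environment makes retinal size a better predictor.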
8. WHAT IS THE FUNCTION OF A COGNITIVE PROCESS?
In section 4, we saw that Kareev invites us to consider the limitations of short-term memory as an advantage. The function of the limitation might actually be to help us discover large correlations among entities quickly. Thus, the limitation of short-term memory can be explained by referring to the need to discover correlations in our surroundings. This adaptive explanation of short-term memory span rests, among other things, on the assumption that detecting co-variation is one of the most important functions of short-term memory. For the explanation to hold, Kareev has to justify why estimating correlations is more important than other functions of short-term memory. Examples of such competing functions are remembering objects not directly perceptually available, and computing relations needed for complex reasoning.

8.1 Strong modularity

Traditionally, evolutionary psychologists avoid the issue of competing functions by denying that the same cognitive process (or function or algorithm, as they prefer to call it) can have several functions. This is the assumption of strong modularity that has been repeatedly made by Cosmides and Tooby (see, for instance, Cosmides and Tooby (1994)). If this assumption holds, we need not concern ourselves with trade-offs: each process is assumed to be primarily aimed at one function. Even if it is possible (but not altogether easy) to argue for strong modularity when it comes to Darwinian adaptation, this option is clearly untenable for individual and cultural adaptations. Individual and cultural adaptations have to utilise already existing cognitive structures for new purposes. An example is our use of cognitive processes specialised for producing and understanding spoken language to produce and understand written language. Setting Darwinian explanations aside for the time being, how can these other types of adaptation ever justify an assumption about a dominant function?
8.2 Specialisation

If we focus on explanations using individual and cultural adaptation, specialisation seems to be one way of identifying the most important functions of a psychological process. A person who is specialised in a particular type of problem (such as estimating the quality of pigs or making bail decisions) is bound to encounter such problems often and to regard them as important. This individual can be assumed to have specialised cognitive processes whose primary functions are to solve the commonly encountered problems in the area of specialisation. By studying experts, we can thus get a better understanding of the individual adaptations needed for performing well in a particular area. On the other hand, the results, in so far as they do not concern the process of specialisation in general, apply only to a very limited set of individuals and an even more limited set of problems.
8.3 Frequency

For processes that cannot be studied in relation to expert judgements, perhaps because of their more general nature, we need independent corroboration that they are primarily aimed at one function. The frequency with which we engage in different activities offers one way to proceed: we are more likely to adapt to situations that we encounter often than to less common demands. There are several methodologies available to measure the frequency of activities. Diaries and questionnaires can be used to estimate the frequency of different activities, as can the experience sampling method used by Robin Hogarth (2004). Here participants receive SMS messages at random times, and whenever they receive such a text message, they complete a short questionnaire about their most recent decisions. Note also that the representative sampling method mentioned in the previous section will reflect the relative frequency of the activity under investigation. When Brunswik stopped his participants at random intervals and asked them to determine the size of objects in their visual field, he could not estimate how often they actually would have determined the size of objects had he not asked them to. Had participants, however, not looked at anything at all when he asked them, this would have been reflected in the data obtained.

9. WHAT IS ADAPTIVE?

Above we have been concerned with how to define the environments in which a cognitive process is supposed to be adaptive, and which function (and thus, basically, which adaptation) is most important for understanding the process. In order to determine whether a cognitive process is adapted to a particular environment, we also need some way to measure or, alternatively, define adaptation. Only when this is done is the explanation of the cognitive process complete.

9.1 Reproductive success

For Darwinian adaptation, it is easy to define exactly what adaptation is.
The definition of adaptation is given by evolutionary theory, and anything that has led to reproductive success in the past will be considered adaptive for that environment. But the more complex the organism, the more difficult it is to identify exactly which features are adaptive and which are not. A very competitive design can carry even maladaptive traits and still reproduce (see, for instance, Gould and Lewontin (1979)). Thus, we are not entitled to assume a priori that anything that has been selected is adaptive per se. It is only the organism as a whole (in relation to other competing organisms) that can be considered adaptive. Using Darwinian adaptation to explain an individual feature therefore calls for caution. To this we can add that the past reproductive success of a Darwinian adaptation is no guarantee of future or even current competitiveness.
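The point that a competitive design can carry a maladaptive trait can be illustrated with the simplest textbook selection model. The Python sketch below rests on several assumptions: discrete, non-overlapping generations, asexual (haploid) reproduction, and multiplicative fitness effects. The numbers are also hypothetical: a 30% advantage from one trait and a 5% cost from another, bundled in the same design. Because selection acts on the design as a whole, the costly trait spreads along with the beneficial one. On its own, the same trait would be driven out.

```python
def next_frequency(p, w_focal, w_other):
    """Frequency of the focal design after one generation of selection
    (discrete generations, asexual reproduction)."""
    mean_fitness = p * w_focal + (1 - p) * w_other
    return p * w_focal / mean_fitness

def frequency_after(p0, w_focal, w_other, generations):
    """Iterate selection for a fixed number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, w_focal, w_other)
    return p

BENEFIT = 0.30  # hypothetical fitness gain from one trait
COST = 0.05     # hypothetical fitness loss from a linked, maladaptive trait

if __name__ == "__main__":
    # A rare design carrying both traits versus a resident design with neither.
    w_linked = (1 + BENEFIT) * (1 - COST)
    p_linked = frequency_after(0.01, w_linked, 1.0, 50)
    # The costly trait alone, competing against a design without it.
    p_cost_alone = frequency_after(0.5, 1 - COST, 1.0, 50)
    print(f"design with both traits after 50 generations: {p_linked:.3f}")
    print(f"costly trait on its own after 50 generations: {p_cost_alone:.3f}")
```

The first design approaches fixation, carrying the costly trait with it, while the costly trait on its own declines. Reproductive success of the whole design is thus no evidence that each of its features is adaptive.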
9.2 General success

For individual and cultural adaptation, the notion of reproductive success is not suitable. For these adaptations, we need to measure success in a more general way. Usually success then becomes a matter of behaviour that produces a good outcome in a particular task (finding food, solving a mathematical problem). When we measure adaptiveness by general success, the proposed function of the cognitive process becomes very important: the function we have settled on determines how we assess success for that particular process. How then is success to be measured? A single number telling us how often participants are correct in a particular task is not very helpful on its own. Any such measure has to be compared with other measures to carry weight.

9.3 Competitions

Given that several processes have been proposed for the same function, we can compare their performance. The one that does best relative to the others can be argued to be more successful (or to be better adapted to the environment in which the comparison is made). If the process then also successfully predicts the behaviour of individuals, we have a good explanation of the behaviour of interest: the cognitive process is the most adaptive one currently available to explain the behaviour. It is true that new processes may be proposed that outperform it, but this is to be expected and is not a disadvantage for the method. Sometimes the comparison takes the form of competitions between different simulated cognitive processes (examples of such competitions can be found in Axelrod (1984); Gigerenzer, Todd, and the ABC Research Group (1999)). In any competition between processes, the contestants act as each other’s benchmarks.

9.4 Normative models

It is also possible to use a favoured solution to a particular task as a benchmark of success. If there is independent evidence for why solving a problem in a particular way will be very successful, we can compare the proposed cognitive process to this benchmark.
A favoured solution to a particular task can be argued to be superior on independent grounds (say, through deduction from generally accepted axioms). Comparing proposed models to such a preferred solution is the methodology favoured by Kahneman, Tversky, and Slovic (1982) in their influential Judgment under Uncertainty.

10. ENVIRONMENT, FUNCTION AND ADAPTIVENESS TAKEN TOGETHER

In order to explain the form or function of a particular cognitive process in an ecological way, we have to refer to the environment in which the process is
supposed to be active and to the specific function the process is supposed to fulfil, and we have to demonstrate that the proposed form will be adaptive under these circumstances. As was shown above, there are methodological shortcuts that avoid having to prespecify exactly what an environment is. To some extent, it is possible to use frequency of activity, or specialisation, as a clue to the most important function of the process, and thus to which adaptation is most likely. Whether the proposed form of the process actually is adaptive can be measured against benchmarks, such as competing theories or normative models. Taken separately, each of these shortcuts has some merit. However, in order to provide a full ecological explanation, these components have to work together. Which environment the process is active in, and which function it fulfils, determine whether we consider it successful or not. Moreover, depending on the level of adaptation that is assumed, different environments and different functions come into focus. The possibility of multiple functions and the different types of adaptation that are available make it very difficult to do good theory in the area of psychological adaptations. Much more work is needed here before any substantial explanations of human behaviour can be given. It is clear, at least, that any researcher interested in psychological adaptations is well advised to consider possible competing functions, and to think about the trade-offs between different adaptations (also at different levels). Furthermore, researchers should try to test for the adaptation in as broad a way as possible (across different cultures, across differences in individual abilities), and try to make precise predictions for different types of environments. Shortcuts such as representative sampling and benchmarks are additional ways of learning more about the ecology of thought even when theoretical developments are scarce.
But such shortcuts depend on clearly formulated process models. When we have an idea of how the proposed cognitive process is supposed to interact with the environment, it is possible to sample environments in a representative way, and to predict the circumstances under which the process will behave adaptively. Setting the methodological shortcuts aside, psychological explanations assuming Darwinian adaptations seem to have advantages over explanations using cultural or individual adaptation. The advantages come from the more developed theoretical framework of evolutionary biology, which to some extent can also be used in psychology. On the other hand, explaining the form or function of a cognitive process by adaptations to individual or cultural environments can be done with access to the very environment to which the process is proposed to be adapted. This means that we can to some extent replace theory with practice, and sample genuine behaviour-environment complexes to understand them better. This, in turn, is a powerful advantage for explanations involving cultural and individual adaptations. For such shortcuts to work, the proposed cognitive process has to be very clearly defined. We also have to be clear about which type of adaptation we think is the appropriate one to explain the form or function of the process. Until this is done, the advantages of individual and cultural adaptation cannot be fully exploited.
A. WALLIN
11. CONCLUSIONS
In this chapter, I have tried to demonstrate the role that the notion of an environment (or indeed the ecology of thought) plays in cognitive psychology. I have claimed that the environment is vital if we are to understand properly how psychological processes really function. If the environment is forgotten, we risk ending up with pseudo-findings that indicate psychological functions that do not really exist. This has been argued to be the case with the overconfidence bias, and a number of earlier research findings are presently being re-evaluated in the light of adaptation to particular environments (see, for instance, Gigerenzer and Fiedler, to appear; Krueger and Funder, 2004). The ecology of thought also holds the promise of making real contributions to psychology. The possibly adaptive nature of thought invites hypotheses about the function of various psychological processes, and can even be used to hypothesise new processes. The ecology of thought is, however, a very difficult enterprise. So far, there is no precise definition of environments, only methods for approaching environmental adaptations (such as representative sampling). For a larger-scale theory, this is not enough. If, for instance, we want to claim that a certain psychological feature is an adaptation, we have to make sure that it is in fact successful in a number of circumstances. To do so, we need to know the distribution of actual environments relevant to the proposed adaptation. This seems impossible without at least a crude classification of environments. In addition, it is crucial that more theoretical work is done on the possible interactions between Darwinian, cultural and individual adaptation, so that we are able to disentangle which type of adaptation is tapped into through empirical research.
Only when we have a clearer theoretical underpinning will it be possible to design the crucial experiments that will help us determine empirically which type of adaptation we are dealing with.
REFERENCES
Axelrod, R. (1984). The evolution of co-operation. New York: Basic Books.
Barkow, J. H., Cosmides, L. and Tooby, J. (1992). The adapted mind: evolutionary psychology and the generation of culture. New York: Oxford University Press.
Boyd, R. and Richerson, P. J. (1985). Culture and the evolutionary process. Chicago: The University of Chicago Press.
Brunswik, E. (1944). Distal focussing of perception: size constancy in a representative sample of situations. Psychological Monographs 56(1), Whole No.
Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review 62(3): 193-217.
Cosmides, L. and Tooby, J. (1994). Origins of domain specificity: the evolution of functional organization. In L. Hirschfeld and S. Gelman (Eds.): Mapping the mind: domain specificity in cognition and culture. Cambridge: Cambridge University Press: 85-116.
Dawkins, R. (1976). The selfish gene. Oxford: Oxford University Press.
Fischhoff, B. (1982). Debiasing. In D. Kahneman, P. Slovic and A. Tversky (Eds.): Judgment under uncertainty: heuristics and biases. Cambridge: Cambridge University Press: 422-444.
Gigerenzer, G. and Fiedler, K. (to appear). Minds in environments: the potential of an ecological approach to cognition. In P. Todd, G. Gigerenzer and the ABC Group: Ecological rationality (provisional title). Oxford: Oxford University Press.
Gigerenzer, G. and Goldstein, D. (1996). Reasoning the fast and frugal way: models of bounded rationality. Psychological Review 103: 650-669.
Gigerenzer, G., Hoffrage, U. and Kleinbölting, H. (1991). Probabilistic mental models: a Brunswikian theory of confidence. Psychological Review 98(4): 506-528.
Gigerenzer, G., Todd, P. M. and the ABC Research Group (1999). Simple heuristics that make us smart. New York: Oxford University Press.
Goldstein, D. and Gigerenzer, G. (2002). Models of ecological rationality: the recognition heuristic. Psychological Review 109(1): 75-90.
Gould, S. J. and Lewontin, R. C. (1979). The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proceedings of the Royal Society of London 205(1161): 581-598.
Hogarth, R. (2004). Generalizing results of decision making studies: can we find ways to validate experimental studies? Paper presented at Risk, Decision and Human Error, Trento.
Juslin, P. (1994). The overconfidence phenomenon as a consequence of informal experimenter-guided selection of items. Organizational Behavior and Human Decision Processes 57: 226-246.
Juslin, P. and Olsson, H. (2005). Capacity limitations and the detection of correlations: a comment on Kareev (2000). Psychological Review 112(1): 256-267.
Juslin, P., Winman, A. and Olsson, H. (2000). Naive empiricism and dogmatism in confidence research: a critical examination of the hard-easy effect. Psychological Review 107(2): 384-396.
Kahneman, D., Slovic, P. and Tversky, A. (1982). Judgment under uncertainty: heuristics and biases. Cambridge: Cambridge University Press.
Kareev, Y. (1995). Through a narrow window: working memory capacity and the detection of covariation. Cognition 56: 263-269.
Kareev, Y. (2000). Seven (indeed plus or minus two) and the detection of correlations. Psychological Review 107(2): 397-402.
Kareev, Y., Lieberman, I. and Lev, M. (1997). Through a narrow window: sample size and the perception of correlation. Journal of Experimental Psychology: General 126(3): 278-287.
Krueger, J. I. and Funder, D. C. (2004). Towards a balanced social psychology: causes, consequences and cures for the problem-seeking approach to social behavior and cognition. Behavioral and Brain Sciences 27: 313-376.
Laland, K. N. and Brown, G. R. (2002). Sense and nonsense: evolutionary perspectives on human behaviour. Oxford: Oxford University Press.
Miller, G. A. (1956). Magical number seven. Psychological Review 63(2).
BIOLOGICAL NOTIONS OF INNATENESS AND EXPLANATION OF LANGUAGE ACQUISITION
MIKA KIIKERI AND TOMI KOKKONEN
1. INTRODUCTION
All children learn or acquire their first language during a relatively short period in childhood. Individual variation or differences in learning environments seem to have very little influence on this process. Basically, every human being acquires a native language in essentially the same way. These facts have long puzzled linguists and psychologists. Generally speaking, there have been two competing accounts of this phenomenon, the empiricist’s story and the nativist’s story. The empiricist’s story says that the acquisition of a first language is based on the principles of inductive learning. A child learns the rules of language by making inductive generalisations from linguistic data (i.e., utterances that she hears); subsequently, she tests and revises these generalisations against additional data. According to this account, first-language acquisition is a general learning problem, and every child solves it by general learning mechanisms and strategies. Most empiricists believe that language learning does not demand any domain-specific information. By contrast, nativists claim that general learning mechanisms are not enough. The most important evidence for this claim comes from the ‘poverty of the stimulus’ argument: since the end state of the acquisition process is much richer than the linguistic input from the environment, so the argument goes, grammars cannot be learned from the linguistic data alone. Consequently, nativists try to account for language acquisition by postulating innate knowledge, which provides the necessary supplement. The most famous variant of the nativist account is Noam Chomsky’s generative grammar. Chomsky and his followers famously claim that there is an innate universal grammar from which the grammars of every natural language can be derived when the values of certain open parameters are fixed by the linguistic environment (e.g., Chomsky 1986).
Philosophical discussion about language acquisition has concentrated on the empiricism/nativism distinction. This is not surprising, since the whole issue continues the old philosophical debate between empiricists and rationalists. One of the obstacles to progress in this debate has been that the details of learning mechanisms or acquisition devices at the brain level are not very well known. It
J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 177–192. © 2007 Springer.
M. KIIKERI AND T. KOKKONEN
seems that ultimately progress can be achieved only by taking into account the biological basis of language acquisition. Because the causal mechanisms are unknown, accounts of language acquisition have resorted to the analysis of linguistic data and the general forms of grammars. Chomsky was explicit on this issue: he distinguished between competence (the abstract linguistic knowledge of language users) and performance (how language is actually used)79. Linguistic theories account for competence, not for the more context- and error-sensitive performance. For this reason, linguists have usually concentrated on the so-called ‘logical problem of language acquisition’ and studied the logical relationships between an abstract acquisition device and linguistic data. As we will see, these kinds of idealizations could be problematic if we try to provide a causal explanation of language acquisition (for a critical discussion of this issue, see Cowie 1999, pp. 242-248). Psycholinguistic research has tried to take performance factors into account and to specify the role of abstract grammar in the actual production and understanding of language. Still, the biological basis of grammatical knowledge is largely left in the background. Explanations in linguistics (and in cognitive psychology more generally) have concentrated on the ‘cognitive level’, leaving the details of biological implementation to future research. It has been a kind of research strategy to contend that all details that cannot be accounted for on the cognitive level are innate, that is, based on the biological properties of the organism. But if we stay within the domains of linguistics or psychology, we cannot say much about the biological side of language. The general view has been that language is innate insofar as it is genetically coded or determined, as, for example, Jerry Fodor et al. (1974, p. 450) clearly stated:
(GL) Language is innate iff it is encoded in the gene code.
(GL) makes heritability a central feature of innateness, which has led to a search for language or grammar genes. However, developmental biologists and some philosophers of biology have recently challenged the received wisdom of innateness in biology. The discussion has turned from genetic heritability to developmental factors. There have been attempts to analyse innateness in terms of generative entrenchment (Wimsatt 1999) and canalisation (Ariew 1999), while others have argued that these various notions of innateness do not form any easily identifiable category (Griffiths 2002). We will take a closer look at some of these notions in order to see whether they clarify the concept of innateness from the biological perspective. In the remaining sections of this paper, we will briefly examine the present status of cognitive/nativistic explanations of language acquisition and sketch the role of biological notions of innateness in the debate.
79
Competence is replaced by the notion of internalist language or I-language in the more recent versions of generative grammar; see Chomsky (2000).
BIOLOGICAL NOTIONS OF INNATENESS AND EXPLANATION
2. INNATENESS AS A BIOLOGICAL CONCEPT
At the heart of the notion of innateness is the idea of something being present at birth. This is a fairly unproblematic idea when dealing with anatomical characteristics that are concretely present at birth. But the idea of innateness is sometimes extended to non-anatomical characteristics of an animal, such as its behaviour. Konrad Lorenz, the founder of ethology, made a distinction between innate and acquired behavioural traits; despite widespread criticism of this distinction, it is still often used in the explanation of behaviour and even applied to human behaviour and psychology (see Cartwright 2000; Griffiths 2002). In the case of psychology and behaviour, the presence of a trait at birth is not necessarily concrete; in order to be innate, a trait only has to have a disposition to appear. In some contexts, a trait is said to be innate if it appears in the normal course of development without any special external causal contribution, or if there is something more to its development than the external causal contribution. In these cases, the development of a trait or a characteristic cannot be explained by external factors alone; a reference to something internal is also needed, and this ‘something’ is conceptualised as being innate. Innateness, therefore, could be associated with constraints on the development of a trait, directing the outcome in interaction with the causally relevant factors of the environment (cf. Elman et al. 1996; Bjorklund and Pellegrini 2001). In the context of language acquisition, for example, the innateness of certain features of language means that language is a co-product of the linguistic data the child is exposed to and something the child is born with: a specific disposition to language. This type of innateness could be called dispositional innateness. However, innateness cannot be equated with a mere disposition to appear.
First of all, any developmental process usually requires the presence of some specific external conditions: introducing or omitting certain aspects of the environment could actively prevent the trait from developing. This is why the innateness of a trait is often thought to be relative to the ‘normal range’ of environments (see Stich 1975; Sober 1998). And because innateness depends on the causal relation between the trait’s development or activation and the environment, the problem arises of where to draw the line between the presence of a normal environment and the causal contribution of the environment. Secondly, it is natural to think that innateness is a matter of degree. A trait might be a result of the causal interaction between the environment and an innate component of development, and the more the development relies on the innate component, the more innate the trait is, even if there is a necessary external causal component. This is how the innateness of psychological phenomena such as concepts, ideas or language is often seen, from Plato to Chomsky and Fodor. It might be more correct to talk about an innate component of a trait instead of innate traits. There is, however, another, similar problem with dispositional innateness. Many of the traits that we would like to consider acquired require some unlearned cognitive capacity. One example is semantic memory. Semantic memory itself might be an innate capacity of human cognition, but surely we are not inclined to
think that all our behavioural dispositions and beliefs that depend on semantic memory are partly innate. A further distinction has to be made between a capacity for a trait and a special capacity for a particular trait. For example, it is not sufficient for language acquisition to be partly innate if the acquisition of language utilises only general learning mechanisms that are innate; there has to be a specialised developmental mechanism for language acquisition containing implicit information necessary for language to be acquired. There has to be specific intrinsic developmental input for the trait to be at least partly innate. From the explanatory point of view, a trait’s innateness is something that explains its existence, not just the capacity to acquire it. And it is precisely this distinction that runs into trouble in biology. Whatever innateness is, it has to be a biological property of a trait, even in a human context. The concept of innateness has, however, proved tricky in biological contexts. Biological traits do not simply divide into two complementary classes of innate and acquired, not even in degree. The complexity is evident in an example given by Elliott Sober (1998). There are three different ways in which birds develop their species-typical singing. Some species of birds are ‘preprogrammed’ for singing: they sing in a way typical of their species even if they have never heard any kind of singing at all. This may be taken as a paradigmatic case of innateness. Other species first need to hear other birds of their species in order to sing in the same way. This seems to be a case of learning. But a third class of species differs from both of these. Members of this third class do not sing at all unless they are exposed to birdsong, but once exposed they begin to sing in their species-specific way, no matter what kind of birdsong they have heard and no matter how different it is from their own.
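Sober's three developmental patterns can be contrasted as simple input-output rules. The Python sketch below is a toy illustration, not a model taken from Sober (1998): the rule names, the three rearing conditions and the assumption that the "learner" sings nothing in silence and copies whatever song it hears are hypothetical simplifications. For each rule it checks two properties: whether a song develops without any external input, and whether the song produced varies with the song heard.

```python
from typing import Callable, Optional

SPECIES_SONG = "species-typical song"
OTHER_SONG = "another species' song"

# Hypothetical rearing conditions; None stands for being raised in silence.
ENVIRONMENTS = [None, SPECIES_SONG, OTHER_SONG]

# A developmental rule maps the song heard to the song sung (None = no song).
Rule = Callable[[Optional[str]], Optional[str]]


def preprogrammed(heard: Optional[str]) -> Optional[str]:
    # Sings the species-typical song whatever it hears, even in silence.
    return SPECIES_SONG


def learner(heard: Optional[str]) -> Optional[str]:
    # Assumed idealisation: reproduces whatever song it was exposed to.
    return heard


def triggered(heard: Optional[str]) -> Optional[str]:
    # Needs some song as a trigger, but what it sings does not depend on which song.
    return SPECIES_SONG if heard is not None else None


def profile(rule: Rule) -> dict:
    """Summarise how a rule's output depends on the rearing condition."""
    outputs = {env: rule(env) for env in ENVIRONMENTS}
    exposed = [outputs[env] for env in ENVIRONMENTS if env is not None]
    return {
        "sings_in_silence": outputs[None] is not None,
        "output_tracks_input": len(set(exposed)) > 1,
    }


if __name__ == "__main__":
    for rule in (preprogrammed, learner, triggered):
        print(rule.__name__, profile(rule))
```

In this toy model the third rule is the only one that fails both tests: it needs an external trigger, yet its output does not track the input, so it fits neither the "innate" nor the "learned" pattern cleanly.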
This third case is not a matter of learning, nor is it ‘partly innate, partly learned’, since there is nothing in the triggering birdsong that would teach the bird the song it starts to sing. Nor is the development of the singing plastic or sensitive to variation in the environmental cue. The bird seems to have a sort of pre-programming for song, but it needs a special external input for its ability to be triggered. One could try to argue that the song is still an innate trait of the bird and that birdsong belongs to the normal range of environments. This, however, leads us back to the problem of dispositional innateness discussed earlier: the demarcation between the normal environment and the environment partly causing the development of a trait. This time, as already stated, the extent to which a trait is innate or acquired cannot be a matter of degree, since both genetic makeup and certain specific aspects of the environment are necessary causal components. If to be innate is to be causally independent of an external causal contribution, this is not a case of innateness. The fact that the song is not learned, however, tempts one to use that concept here. In fact, the idea of innateness combines a number of different ideas that are often thought to be interconnected but clearly are not necessarily so. In different contexts, the concept of innateness is used with different meanings that may or may not coincide in a given case. Patrick Bateson (1991) has listed seven such uses of ‘innateness’ found in the biological literature: (1) being present at birth, (2) dependence on certain genes, (3) being an evolutionary adaptation, (4) not changing in the course of development, (5) species-typicality, (6) not acquired, and, in the case of behaviour, (7) being directed by internal, trait-specific mental structures. It would be
too hasty to think that these uses are only different components of the same biological phenomenon; they are not. They may be different aspects of the vernacular concept of innateness, but that is another matter.80 In the same vein, Paul Griffiths (2002) has argued that the concept of innateness is exactly that: a vernacular, pre-scientific concept that is not useful in science. He argues that the concept basically has three components that are bound together in ‘folk biology’, but not in biological reality. These components are (1) species-typicality, (2) being fixed in development and (3) being the normal, ‘intended’ outcome of development. The explanatory power of innateness comes from the idea of a species-typical essence, which members of the species are internally directed to embody. However, there is no need to give up a concept altogether just because it is sometimes used in a confused way and because our folk-biological thinking connects it to a sort of essentialism that scientific biology explicitly rejects. Even if the consequences of some trait being innate were widely misunderstood, there might still be a useful and non-redundant way of using the concept, for example in describing and explaining the development of a trait. In this context, the main meaning of the concept seems to be developmental fixity of some kind; the explanatory power of innateness comes from this fixity. After looking more closely into this matter, we will return to the question of confusion surrounding the concept of innateness.
3. INNATENESS AS DEVELOPMENTAL FIXITY
A trait being developmentally fixed means that its development does not depend on the environment. As we remarked earlier, no trait is completely independent of the environment, since abnormal conditions may disturb normal development. But some traits are supposed to develop without any particular external factors being necessary. This is perhaps the central ingredient of biological innateness.
A trait is often thought to be innate if its development leads to invariant outcomes in a relevant range of different environments (see Ariew 1996); alternatively, innateness is thought to be a matter of degree, measured by how big a role the internal dispositions play in the trait’s development, in contrast to the external causal factors (see Khalidi 2002). The simplest place to look for a biological basis for such dispositions is the genome. Some have even argued that being innate and being genetic are one and the same thing (cf. the definition (GL) above). Being merely genetic is not enough, though, if being genetic means only that genes have a role in development. Particular environmental factors could still be necessary, and the outcome of the trait’s development could be highly sensitive to different environments. The concept often used here is a trait’s heritability (cf. Horvath 2000). But heritability is only a local measure that tells to what extent variation in the genome results in variation in the phenotype in a given range of environments. It is not a quantitative
80
Mameli and Bateson (forthcoming) go even further and discuss 26 possible candidates (some of them derivative) for the scientific explication of innateness and conclude that all of them have problems.
measure of the causal contribution of the genes in particular cases (cf. Sober 1988, 1998; Lewontin 1974). Furthermore, heritability is a population-level notion in the first place, not a property of the development of an individual organism. It has very little to do with the idea of developmental fixity. Whatever innateness is, it is manifested in individual developmental processes. Consequently, when innateness is linked to genes, what is usually meant is that in the development of a trait the genes guide the development and the environment has little, if anything, to do with it. Innateness is contrasted with plasticity: innate traits are stable across environmental variation. The same genotype produces the same phenotype in all normal environments. This idea cannot be equated with genes having all the causal power in the developmental process. There is a consensus within biology that all traits are the results of the causal interaction of genes and environments. There are, however, traits that develop in the same way regardless of the details of the environment. In developmental biology, this phenomenon is called ‘canalisation’. Sometimes innateness is equated precisely with this developmental phenomenon (see, for example, Ariew 1996, 1999). Canalisation is found especially in the early stages of development. For example, when certain basic tissue types in a foetus start to develop into specialised tissues (such as nervous tissue), the same developmental process can be carried out with a number of different hormones, even though the hormones otherwise have different effects. The development of organs is canalised as well: the organs develop into morphologically identical forms in different environments (in this case, environments still internal to the organism), even though the environments are active in the process. The same could, in principle, apply to other kinds of traits as well.
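The point made above, that heritability is a local, population-level statistic rather than a measure of what genes contribute to any individual's development, can be illustrated numerically. The Python sketch below assumes a hypothetical developmental rule (phenotype = 10 + 3·genotype + 2·environment) applied identically to every individual, with every genotype-environment pair equally frequent, and computes broad-sense heritability as the variance of genotypic means divided by the total phenotypic variance. The rule and its coefficients are invented for illustration only.

```python
from itertools import product
from statistics import mean, pvariance


def phenotype(genotype: int, environment: float) -> float:
    """Hypothetical developmental rule, identical for every individual."""
    return 10 + 3 * genotype + 2 * environment


def broad_sense_heritability(genotypes, environments) -> float:
    """H^2 = Var(genotypic means) / Var(phenotypes), all (g, e) pairs equally frequent."""
    population = list(product(genotypes, environments))
    phenotypes = [phenotype(g, e) for g, e in population]
    # Genotypic value: mean phenotype of a genotype across the environments sampled.
    genotype_means = {g: mean(phenotype(g, e) for e in environments) for g in genotypes}
    genotypic_values = [genotype_means[g] for g, _ in population]
    return pvariance(genotypic_values) / pvariance(phenotypes)


if __name__ == "__main__":
    print(broad_sense_heritability([0, 1], [1.0]))                       # one uniform environment: 1.0
    print(round(broad_sense_heritability([0, 1], [0.0, 1.0, 2.0]), 3))   # wider range: 0.458
```

Nothing about individual development changes between the two calls; only the range of environments sampled does, and the heritability estimate falls from 1.0 to about 0.46 (exactly 27/59 under these assumptions).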
The original state, determined by the genome and the environment, sets development into a certain canal, in which the causal interaction of the genes and the environment produces the outcome. But the outcome is always the same. The end states of the canals are genetically determined, but the process itself is interactive (see West-Eberhard 2003; Moore 2003). For many, it might seem counterintuitive to equate innateness with canalisation, since it would mean that the distinction between innate and acquired is not a distinction between genetic and environmental causes (see Moore 2003). However, there is no need to think that the distinction between innate and acquired should be a distinction between different causal factors participating in the developmental process. This distinction could also be seen as involving the different kinds of causal roles the environment plays in the processes that produce the trait. In the case of innateness, the outcome is insensitive to variation in the environment. On the other hand, if variation in the environment is causally relevant to the outcome of the process, the variation in the outcome is caused by the environment, and hence the trait is not innate. This description captures the general idea of innateness, and canalisation seems to fit into this picture. However, a number of problems arise if innateness is equated with canalisation. For one thing, canalisation is not about a trait appearing regardless of the environment, but about the same trait appearing in causal interaction with different environments. The developmental process can use a number of causal factors in
different environments. There is a theoretical continuum between a trait that is dependent on a particular causal makeup of the environment (and therefore not innate) and a trait that is canalised across all possible environments. There is no principled difference in the actual developmental process between these two extremes if the causally necessary characteristics of the environment are present. As such, this does not mean that innateness cannot be analysed in terms of canalisation, only that in doing so one is taking innateness to be something that is always related to the environment. It could also mean that, even in traits that seem to be canalised across a vast range of environments, not only can some characteristics of the environment disturb development, but development may in fact rely on a particular external causal factor. If the external factor is stable enough, it becomes impossible to infer from a complicated process of causal interactions which factors are necessary and whether the development is dependent on any particular feature of the environment. Consider birdsong once again as an example. The case in which the bird spontaneously starts to sing, even if it has grown up in silence, seems to be a case of canalisation. What about the case in which the singing is triggered by another kind of singing? There seems to be something innate, but the trait is not canalised. And, as stated earlier, the trait cannot be conceived as partly innate, partly acquired, since internal and external contributions are not additive. It is hard even to imagine what that would mean in this case. Moreover, why should it matter whether the development of a given trait is canalised or is dependent on a reliably present external causal factor? From the evolutionary point of view, for example, it does not matter.
In order to be an object of natural selection, a trait has to develop in the same way in the same environment with the same genetic makeup, but that is all the stability that is required. The trait does not have to develop independently of the environment; in fact, natural selection is ‘blind’ to the details of development. Only the outcome matters. Evolution, of course, is not the crucial question here, but it is the only thing that would make it matter whether the trait is canalised or not. There is no reason to think that canalisation, when it occurs, is a matter of universal mechanisms underlying the developmental process and directing it to the ‘intended outcome’ in all conditions. Rather, the canalisation of a trait is a result of parallel selective pressures in changing environments: the mechanisms that produce the same outcome have been separately selected for each environment. Outside the range of these natural environments, development is disturbed. We would call these environments abnormal. Exactly the same distinction could be applied to non-canalised traits: if development requires a particular external factor, an environment lacking this factor is an abnormal environment for the trait’s development. It is hard to come up with any biological or ontological reason for making a fundamental distinction between the two cases, except perhaps that one, put in the right way, fits the vernacular meaning of innateness and the other does not. This reason, however, seems rather inadequate. From an explanatory point of view, at least, the two cases are similar. One possible way to deal with the situation is to disconnect genes from innateness. One could accept the active role of particular causal factors in the development of a trait and treat the genes and the environmental factors
indifferently. Innateness would then be contrasted only with plasticity. Developmental systems theorists (cf. Griffiths and Gray 1994; Wimsatt 1999) argue that the fundamental difference between genes and environments should indeed be put aside when explaining the development of a trait, but by the same token they usually abandon the distinction between the innate and the acquired. William Wimsatt, however, has proposed that the theoretical role given to the concept of innateness should be given to another concept instead, namely to what he calls ‘generative entrenchment’ (Wimsatt 1999). This concept refers to how ‘deep’ in the architecture of the whole organism and its development the trait in question is. In the evolutionary process, traits do not evolve independently of each other, but instead depend on already evolved structures. This dependence is also reflected in individual development, which, in turn, is one of the main reasons why the evolution of new traits is constrained by old ones (cf. Gould and Lewontin 1979; Griffiths and Gray 1994). The ‘deeper’ traits might also develop in interaction with the environment, but their development is more fundamental, more rigidly fixed, more likely to be canalised, more likely to rely on stable causal factors in the environment, more likely to be species-typical and so on (Wimsatt 1999). Equating generative entrenchment with innateness is counterintuitive at the conceptual level, and Wimsatt is not offering a biological interpretation of the vernacular concept of innateness. Rather, he wants to fill the conceptual gap left by the elimination of the concept of innateness. According to him, the concept of generative entrenchment fills most of the roles we want to give to the concept of innateness.
But one could go even further and claim that generative entrenchment is the phenomenon we refer to when we talk about the innateness of a trait; we just have some misconceptions about the nature of this phenomenon. This view would also be compatible with the idea that in psychological contexts innateness refers to constraints on the development of a trait that direct the outcome in interaction with the causally relevant factors of the environment (cf. Elman et al. 1996; Bjorklund and Pellegrini 2001).

4. THE ‘POVERTY OF STIMULUS’ ARGUMENT AND COGNITIVE EXPLANATIONS OF LANGUAGE ACQUISITION

We now examine the role of innateness in cognitivist accounts of language acquisition. To make this role clear, consider the following argument (a variant of the ‘poverty of stimulus’ argument), which purports to show the indispensable role of innate knowledge in language acquisition. A characteristic reasoning pattern in theoretical linguistics runs from the poverty of data to the innateness of linguistic ability:

1. In order to master her native language, a child has to have capability X (specified in the author’s favourite linguistic theory).
2. A child either learns X from the utterances she hears during the learning period or she has been born with X or with the disposition to develop X.
3. A child cannot learn X from the utterances she hears.
4. Therefore, a child has to be born with X or with the disposition to develop X.
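Premises 2 and 3 yield the conclusion by disjunctive syllogism, so the argument is formally valid, and its force depends entirely on whether premise 3 is true and whether the disjunction in premise 2 is exhaustive. A minimal Lean 4 rendering, with both propositions left abstract (the names `Learns` and `Born` are our own labels):

```lean
-- Learns : the child learns X from the utterances she hears
-- Born   : the child is born with X or with the disposition to develop X
theorem poverty_of_stimulus (Learns Born : Prop)
    (premise2 : Learns ∨ Born)  -- assumed to exhaust the options
    (premise3 : ¬Learns) : Born :=
  premise2.elim (fun h => absurd h premise3) id
```

Premise 1 only specifies what X is; it plays no role in the derivation itself.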
BIOLOGICAL NOTIONS OF INNATENESS AND EXPLANATION
Judged from the biological point of view, innateness has played the role of an explanatory black box in nativist accounts. This leads to a kind of epistemic or methodological notion of innateness: a trait or ability is innate relative to our current knowledge and level of explanation. Only when the details of brain-level implementation are someday revealed will it become possible to give a full explanation in terms of causal mechanisms. This situation leaves the issue wide open, because reasoning from the poverty of stimulus is an instance of inference to the best explanation. The problem is that at present there are no real explanations. The choice is made between two possibilities: linguistic ability is either innate or learned. And because there are no plausible explanations in terms of inductive learning models, the nativist explanation (which from the biological perspective is not an explanation at all) is chosen as ‘the best explanation’. An important feature of inference to the best explanation is that we can trust its conclusion only if the list of alternative explanations is complete. There is no reason to think that this condition is met in the case of language acquisition. There might be a ‘third way’ between an empiricist’s and a nativist’s account. One interesting recent attempt to distinguish and develop such alternatives is Fiona Cowie’s What’s Within? (1999). It is, of course, clear that language acquisition demands psychological mechanisms that could be innate or even domain-specific. Most people who accept the empiricist’s story of language acquisition admit this. Cowie’s (1999) enlightened empiricism explicitly holds that language acquisition relies heavily on domain-specific information (but denies that this information is innate), while Hilary Putnam’s (1967) brand of empiricism relies on general learning mechanisms that could be innate.
So even an empiricist can admit that learning mechanisms are specifically linguistic, i.e., that language forms an independent module that could be informationally encapsulated from the other cognitive domains. Another issue is the nature of this module. The most influential position here is Chomsky’s theory of universal grammar: all languages have basically the same, innately specified structure, and the remaining differences between them are accounted for by postulating certain grammatical parameters whose values vary from language to language. The values of these parameters are then fixed during the language acquisition period. To systematise these distinctions, Cowie (1999, p. 176) distinguishes three theses whose conjunction characterises the nativist position:

(DS) Domain Specificity: Learning a language requires that the learner’s thoughts about language be constrained by principles specific to the linguistic domain.
(I) Innateness: The constraints on the learner’s thoughts during language learning are innately encoded.
(U) Universal Grammar: The constraints and principles specified in (DS) as being required for language learning are to be identified with the principles characterised in the Universal Grammar.
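Cowie's three theses can be treated as independent switches. The sketch below is an encoding of our own, not anything in Cowie (1999): it maps a stance on each thesis to the positions discussed in this section, namely nativism (all three theses), weak nativism ((DS) and (I) without (U)) and Cowie's enlightened empiricism ((DS) without (I), agnostic about (U)). The `classify` helper is hypothetical, and `None` stands for agnosticism.

```python
from typing import Optional


def classify(ds: bool, innate: bool, ug: Optional[bool]) -> str:
    """Name the position defined by a stance on theses (DS), (I) and (U).

    `ug=None` encodes agnosticism about (U). Labels follow Cowie's terms.
    """
    if ds and innate and ug is True:
        return "nativism"                # the conjunction of (DS), (I) and (U)
    if ds and innate and ug is False:
        return "weak nativism"           # (DS) and (I), but (U) denied
    if ds and not innate:
        return "enlightened empiricism"  # (DS) without (I); (U) may be left open
    return "unclassified"


print(classify(True, True, True))    # nativism
print(classify(True, True, False))   # weak nativism
print(classify(True, False, None))   # enlightened empiricism
```

Laid out this way, the intermediate positions differ from full nativism by flipping a single switch, which is why an argument against (I) alone is enough to move from nativism towards enlightened empiricism.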
Cowie criticises the arguments for (I), while she adopts (DS) and remains agnostic towards (U). She calls this position Enlightened Empiricism (EE, for short); she also distinguishes weak nativism, which adopts (DS) and (I) but denies (U). The idea is to admit that children need domain-specific information (which could be something like Chomskyan UG) in language acquisition, but that UG (or whichever grammatical theory we learn) does not need to be innate. Instead, it could be learned by general learning mechanisms that construct the grammar in a piecemeal fashion. Cowie argues for EE by making a detailed attack on the nativist position. According to the nativist’s story, primary linguistic data (pld, for short) together with general learning mechanisms cannot account for the complexity of language acquisition. A central problem has been the lack of negative evidence: there are no mechanisms that tell a child that her grammar contains the wrong rules. One of the main themes of Cowie’s attack on the nativist position is that nativists exaggerate the lack of negative data: the stimulus is not as poor as the nativists claim. It is obvious that such a lack of negative evidence causes problems when children are trying to learn syntactic rules. If a child makes an overgeneralisation from pld, she cannot get back to the correct form of a rule, because there is no data that could correct her. But, Cowie argues, there are many sources of negative evidence whose existence nativists have neglected because their view of language learning is too idealised and narrow. It could also be that the role of negative evidence is exaggerated in the first place. As an analogue, she considers how we learn to recognise curries. We need to taste curry only a few times in order to identify it reliably. There need not be any general rules for identifying curries, nor do we have to taste every type in order to be reliable curry detectors.
A few experiences are enough. Experiences of foods that are not curries give indirect negative confirmation to our curry-detecting abilities. Similarly, in the case of language acquisition, grammatical structures could be learned from a few instances, and there are numerous sources of indirect negative evidence. Cowie may have a point here, even though the analogy with curry detection is not very convincing. After all, the fact that we can detect curries does not imply that we are able to analyse their composition and produce new variants (or even reproduce familiar ones) in the way that children use language productively. There have been many critical reactions to Cowie’s book and to her enlightened empiricism (e.g., Collins 2003; Crain and Pietroski 2001; Fodor 2001; Laurence and Margolis 2001). These usually defend the nativist stance by pointing out that Cowie’s and other empiricists’ criticism is too narrow and misses its mark. For example, it is not enough to show that there could be linguistic environments so rich that grammar learning without innate constraints is possible. The task is to show that such circumstances are robust enough, that is, that such rich pld is always available. Cowie’s book is important in many respects. She has convincingly shown that there are largely neglected intermediate positions, such as enlightened empiricism and weak nativism, that should be taken into account, and she also provides some interesting arguments on their behalf. But the overall impression is that neither Cowie nor her nativist critics have advanced any arguments that settle the issue.
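Why overgeneralisation makes negative evidence matter can be illustrated with languages modelled as sets of sentences. This is a toy model of our own (the sentence strings and the `consistent` helper are placeholders), not a model of any actual grammar: if the learner's hypothesis is a proper superset of the target language, every positive example is consistent with it, so only negative evidence can force a retreat.

```python
def consistent(hypothesis: set[str], positives: list[str], negatives: list[str]) -> bool:
    """A hypothesised language survives if it contains every positive example and no negative one."""
    return all(s in hypothesis for s in positives) and not any(s in hypothesis for s in negatives)


target = {"a", "ab", "abb"}           # placeholder target language
overgeneral = target | {"b", "bb"}    # an overgeneralised hypothesis (proper superset)

positives = ["a", "ab", "abb", "ab"]  # utterances drawn from the target language
print(consistent(overgeneral, positives, []))     # True: positive data never refute it
print(consistent(overgeneral, positives, ["b"]))  # False: one negative datum does
```

In these terms, Cowie's position is that the `negatives` argument is rarely empty for real children, because indirect negative evidence is more plentiful than nativists assume, while her critics demand that it be reliably non-empty across the circumstances in which children actually learn.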
One line of argument is that cognitivist accounts of language acquisition have been too idealised, and that real progress can be achieved only by taking a closer look at the brain-level details of the actual acquisition process. What, then, about the empirical neurosciences? Could they shed new light on the empiricism/nativism controversy? Although some progress has been made, the empirical brain-level picture of what really happens during the language acquisition period is still very sketchy. Familial aggregation studies, twin studies and linkage analysis are the most important empirical research methods that behavioural geneticists have used to determine the roles of genes and the environment in the ontogenesis of language (Stromswold 1999). These techniques focus on language disorders and on differences between linguistically normal and abnormal individuals. For example, empirical research based on twin studies or on the examination of specific brain impairments could explain why certain individuals lack certain linguistic capacities by pointing out that some factors in the causal mechanisms are missing or working improperly compared to the normal case. This research gives some information about the causal mechanisms of language acquisition, but the information is incomplete, and the details of these mechanisms are still largely unknown. It is not surprising, then, that the general conclusion from these studies is that they cannot resolve the dispute between empiricists and nativists. On the question of whether there is a separate language module or only highly distributed and neurally plastic language-processing abilities, for example, both positions seem to be compatible with the empirical evidence from brain studies. Yet another alternative in this situation is to take the black-box aspect of (psychological) innateness seriously. Richard Samuels (2002, 2004) has defended a version of nativism in cognitive science that he calls primitivism.
His idea is that if there is no psychological explanation of how a cognitive structure is learned, then the structure is innate. Innate traits or structures are thus psychologically primitive. There are some obvious problems with this account: the notion of learning is difficult to define precisely, and putative counter-examples (the abnormal occurrence of some psychological traits through brain lesions, etc.) demand that normalcy conditions be added (see Samuels 2002, p. 259; Mameli and Bateson, forthcoming, pp. 13-15). Despite this, Samuels’s primitivism is an interesting effort to overcome some of the difficulties with the previous accounts. One interesting feature of Samuels’s account is the way it connects innateness to psychological theories and explanations. As a general definition or semantic analysis of innateness, this move is problematic. Mameli and Bateson (forthcoming) remark that his definition depends on arbitrary disciplinary boundaries and on a vague distinction between psychological and non-psychological explanations. Samuels tries to avoid these kinds of problems by referring to correct psychological theories and explanations, and by claiming that these are not relative to the current status of theorising within psychology. The phrase ‘there are not any psychological explanations of...’ should be interpreted in the strong sense that no such explanations will be discovered in the future either (Samuels 2002, p. 246, fn. 20). Samuels’s definition thus refers to the ideal final stage of science, in which the correct theories and explanations have already been found. Although this is a common strategy within the
philosophy of science, it is not very useful as an analysis of innateness in cognitive science. It would be more fruitful to admit that disciplinary boundaries and explanatory levels can change, and that our concept of innateness in psychological contexts is relative to such matters. This move focuses on the practical or methodological use of the notion of innateness (rather than on semantic analysis). If the current consensus within psychology is that learning is inductive learning from empirical experience, then a psychological structure or ability is innate if it cannot be learned inductively from the data. If other forms of learning are accepted as relevant for the acquisition of psychological structures (or if the available evidential database is made stronger), the range of innate information could change. If, for example, Cowie’s claims about the nature of the learning process and the role of negative evidence turn out to be correct, the role of innate constraints on language acquisition (as stated in thesis [I]) becomes less important, and her enlightened empiricism becomes more plausible. This view reflects the role of innateness in actual explanatory practice within psychology and the cognitive sciences. It is at least one promising possibility that would be worth pursuing further.

5. BIOLOGICAL INNATENESS AND LANGUAGE ACQUISITION

What, then, about the biological notions of innateness? How do they relate to the empiricism/nativism debate in cognitive accounts of language acquisition? As discussed earlier, from a biological point of view a trait is innate if its development is invariant across a normal range of environments. In one sense, language acquisition is highly sensitive to the environment: a child acquires the language he or she hears.
But in another sense the acquisition of a language is independent of any particular linguistic data, and language acquisition is a highly canalised process, since all children develop approximately the same linguistic capacities in the same order. What is thought to be innate, as discussed before, is in fact the child’s specialised capability of adopting linguistic rules from the linguistic data. This process has been conceptualised as involving an innate set of grammatical rules and principles from which the concrete syntactic rules can be derived in interaction with the linguistic environment. But how is this to be seen from the biological point of view? Particular languages are not canalised, for obvious reasons. But the linguistic capacity directing language acquisition could be: the general outlines of the language acquisition process could be identical, even though the particular features of a language vary. André Ariew (unpublished) has recently developed this kind of account. In so doing, he elaborates the canalisation account of innateness and replaces the innate/acquired dichotomy with a tripartite distinction among innate, triggered and acquired. Ariew introduces the distinction to deal with the difficulty we discussed before, namely, that some developmental processes that are fixed throughout all normal environments are not canalised, but instead depend on particular external causal factors. The example we used was a bird beginning to sing after hearing birdsong of any kind. Traits of this kind are triggered by an external stimulus, such as the singing. The trait is not fully canalised, but is not exactly
acquired either, and this is why a new category is needed. Particular languages, Ariew argues, are triggered, neither fully innate nor fully learned. What is canalised is the language organ, which directs each step of development. When a child develops a language, she in a sense ‘grows’ the grammar for that language. The linguistic data to which the child is exposed triggers the appropriate syntactic and other grammatical elements that together constitute the language. It is not, however, clear what is meant by the language organ being canalised. In one sense, the language organ is supposed to be a mental organ embodied in brain structure. This structure then directs the development of a language. What this means in practice is a reorganisation of the neural system at the neurobiological level. Thus, there is no independent trait directing the development of another trait. It is more natural to say that mastering a language is a capability that develops in interaction between the language to which a child is exposed and the internal structure of the developmental process. An innate language organ is a property of the developmental process of language mastery, not an external thing affecting it. It would be better to say that the development of language is canalised with respect to certain features, such as the order in which syntactic elements appear in the child’s command of the language. But this does not explain what the postulation of innate properties of language was meant to explain in the cognitive accounts in the first place: the acquisition of particular grammars. And it is even more puzzling to say that a particular language is ‘triggered’ in the way that birdsong is triggered by a certain external stimulus.
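Ariew's three categories can be approximated as a decision procedure over how a developmental outcome depends on the environment. The sketch below is our own simplification, not Ariew's definitions: environments are sets of features, `develop` is a hypothetical function from an environment to an outcome, and a trait counts as 'triggered' when its outcome is fixed across the normal environments yet changes once the tested factor (here, hearing any song) is removed. All names are hypothetical.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Env = FrozenSet[str]


def classify_trait(develop: Callable[[Env], Hashable],
                   normal_envs: Iterable[Env],
                   factor: Callable[[str], bool]) -> str:
    """Return 'innate', 'triggered' or 'acquired' under a simplified reading of Ariew.

    develop     -- hypothetical map from an environment to a developmental outcome
    normal_envs -- the assumed normal range of environments
    factor      -- selects the environmental features whose causal role is tested
    """
    envs = list(normal_envs)
    outcomes = {develop(env) for env in envs}
    if len(outcomes) > 1:
        return "acquired"  # the outcome tracks differences between normal environments
    deprived = {develop(frozenset(f for f in env if not factor(f))) for env in envs}
    if deprived == outcomes:
        return "innate"    # same outcome even without the tested factor (canalised)
    return "triggered"     # fixed across normal environments, yet needs the factor


def is_song(feature: str) -> bool:
    return feature.startswith("song:")


def always_sings(env: Env) -> str:
    return "sings"


def sings_after_any_song(env: Env) -> str:
    return "sings" if any(is_song(f) for f in env) else "silent"


def copies_song(env: Env) -> tuple:
    return tuple(sorted(f for f in env if is_song(f)))


normal = [frozenset({"food", "song:chaffinch"}), frozenset({"food", "song:sparrow"})]
for develop in (always_sings, sings_after_any_song, copies_song):
    print(develop.__name__, classify_trait(develop, normal, is_song))
# always_sings innate
# sings_after_any_song triggered
# copies_song acquired
```

Every verdict here is relative to the chosen `normal_envs` and to the factor being tested, which mirrors the way the canalisation account relativises innateness to a normal range of environments.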
If the grammar of a language is abstracted from a complicated ensemble of linguistic data, the primary explanatory questions concern such things as how a child picks out the ‘right’ features of the data and how these interact with rudimentary representations of grammatical knowledge. The more general view, that grammar is somehow triggered by stimuli, comes close to the traditional nativist explanation of language acquisition with its idealised and formal view of the acquisition process. The idea of innateness as a degree of generative entrenchment could be more interesting here. Recall that Wimsatt’s (1999) notion of generative entrenchment refers to the depth of a developmental process. If we apply this notion to the development of language, we get a picture of an acquisition process that is more fundamental in psychological development than learning by general learning mechanisms. It is constrained by developmental structures, yet it could be essentially dependent on environmental features. On this more biological description of the language acquisition process, there is perhaps no information on syntax encapsulated in the brain; rather, certain neurological structures develop in such a way that development is disposed to be influenced by certain environmental impulses during certain periods. The acquisition of grammar could be seen as a more fundamental and specialised learning process than it is in the empiricist alternative, but it nevertheless remains a mechanism that uses external information as an essential ingredient of the acquisition process. To say that the grammar of a language is deeply generatively entrenched would nicely capture the role that innate knowledge is given in the language acquisition debate, and at the same time it would give a role to empirical learning too. This account, however, would require more elaboration in order to differentiate the presence of a language
in the environment from the language being innate, when the presence is generatively entrenched to a high degree. Nevertheless, such an account could provide a possible basis for a biological analysis of what innateness is in the language acquisition debate. These points lead us to a final remark about innateness in biological and psychological contexts. As shown in previous sections, the concept of innateness is surrounded by confusion in biological contexts. The term ‘innateness’ is sometimes used to refer to certain developmental processes, sometimes to the universality of a trait or to a trait being an adaptation (see Griffiths 2002 and Mameli and Bateson forthcoming). We have restricted our discussion to ideas connected to developmental processes, since this is clearly the core idea of innateness. Even so, it is hard to tell exactly what the innateness of a trait is. One possibility is that the concept of innateness refers to a property of a developmental process that is only ever partially realised; an example would be the outcome of the developmental process being insensitive to environmental factors. Even canalisation fits this only partially, since the development of a trait is always canalised across some range of environments, not across all possible environments. It might be more appropriate, however, to say that nothing in biology is very innate, or that innateness is not interesting from a biological point of view. This need not be the case from the psychological point of view. As we have seen earlier, psychological discussion about innateness has more to do with learning than with what causally guides the outcome of development. Bateson, Griffiths and others have pointed out that the concept of innateness is used in different ways in different contexts. They see this as a serious problem for the whole notion. But there is an alternative possibility.
Since biological and psychological interest in the concept refers to different phenomena (what guides development and what is learned), why not distinguish between these two uses? The concept of innateness is an explanatory concept in both contexts, and it is used in analogous ways. When the context of interest is shifted to another explanatory level, the relevant contrast for the explanation is likewise shifted. From the biological point of view, the concept of innateness has to do with what kind of biological developmental processes produce the trait. From the psychological point of view, the concept of innateness has to do with what kind of psychological developmental processes produce the trait. To return to the problematic case of birdsong development discussed earlier, singing is not an innate trait from a biological point of view, since its development depends on a particular feature of the environment; but since the bird does not learn the singing either, we are also tempted to say that it is innate, from a psychological point of view. Whatever innateness is in a psychological context, it is a biological property. However, it would be erroneous to think that innateness in a psychological framework is the same thing as innateness in a biological framework. Still, it is crucial to look into innateness issues in the language acquisition debate from the biological point of view. We hope that our discussion has demonstrated these points.
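The proposal to keep the two uses apart can be sketched as two predicates evaluated on the same trait. Both are simplified renderings of our own: the biological predicate follows the developmental reading (no particular external factor is needed), and the psychological one follows the methodological suggestion in Section 4 (innate relative to the learning mechanisms currently accepted). The `Trait` class, the mechanism names and the example traits are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trait:
    name: str
    needs_external_factor: bool  # development depends on a particular environmental feature
    learnable_by: set[str] = field(default_factory=set)  # hypothetical learning mechanisms


def biologically_innate(trait: Trait) -> bool:
    # Biological sense: development needs no particular external causal factor.
    return not trait.needs_external_factor


def psychologically_innate(trait: Trait, accepted_mechanisms: set[str]) -> bool:
    # Psychological sense: no currently accepted learning mechanism acquires the trait.
    return not (trait.learnable_by & accepted_mechanisms)


song_onset = Trait("onset of singing", needs_external_factor=True)
print(biologically_innate(song_onset))                             # False
print(psychologically_innate(song_onset, {"inductive learning"}))  # True

grammar = Trait("grammar", needs_external_factor=True,
                learnable_by={"piecemeal domain-specific learning"})
print(psychologically_innate(grammar, {"inductive learning"}))     # True
print(psychologically_innate(grammar, {"inductive learning",
                                       "piecemeal domain-specific learning"}))  # False
```

The birdsong case comes out not biologically innate but psychologically innate, and the grammar case shows how widening the set of accepted learning mechanisms can shrink the range of what counts as psychologically innate.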
6. CONCLUSION

Future research may reveal in more detail the causal relationships between linguistic environments and the neurological and cognitive structures of children. Consequently, if the methodological view of innateness within psychology is accepted, the role of innate information may change, and the contrast between nativism and empiricism may evolve and become more precise. Still, the innateness debate in linguistics is about the sufficiency of general cognitive capacities for learning grammar, while the innateness debate in biology is about traits developing without the need for any particular external causal factor. These viewpoints may not (and need not) coincide. Cowie (1999) and Griffiths (2002) have suggested that in a sense the word ‘innateness’ refers to an explanatory black box (cf. also Section 4). Even though Griffiths’ remark is an eliminative one, meant to support his view that the concept of innateness should be discarded for good, one could try to defend its use in the language acquisition context. Appealing to innateness as an explanation of language acquisition is to claim that this learning process cannot be completely explained at the psychological level, since there is something in the process that also requires a biological explanation (cf. Samuels 2002, 2004). This extra-psychological component is as innate from the cognitive point of view as any biological constraint on perception. This does not mean that the developmental process has to be innate at the biological level in any biological sense of that word. Language can be both innate and a biological phenomenon without being biologically innate.
REFERENCES
Ariew, A. (1996). Innateness and Canalization. Philosophy of Science 63: 19-27.
Ariew, A. (1999). Innateness Is Canalization: In Defense of a Developmental Account of Innateness. In Hardcastle 1999: 117-138.
Ariew, A. (unpublished). Innateness as Triggering: Biologically Grounded Nativism. http://www.uri.edu/artsci/phl/triggering.htm#_ftn1.
Bateson, P. (1991). Are There Principles of Behavioural Development? In Patrick Bateson (ed.), The Development and Integration of Behaviour. Cambridge: Cambridge University Press: 19-39.
Bjorklund, D. F. and Pellegrini, A. D. (2001). The Origins of Human Nature. Evolutionary Developmental Psychology. Washington, DC: American Psychological Association.
Cartwright, J. (2000). Evolution and Human Behaviour. Darwinian Perspectives on Human Nature. New York: Palgrave.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin and Use. New York: Praeger.
Chomsky, N. (2000). New Horizons in the Study of Language and Mind. Cambridge: Cambridge University Press.
Collins, J. (2003). Cowie on the Poverty of Stimulus. Synthese 136: 159-190.
Cowie, F. (1999). What’s Within? Nativism Reconsidered. Oxford: Oxford University Press.
Crain, S. and P. Pietroski (2001). Nature, Nurture and Universal Grammar. Linguistics and Philosophy 24: 139-186.
Elman, J., E. Bates, M. Johnson et al. (1996). Rethinking Innateness: A Connectionist Perspective on Development. Journal of Consciousness Studies 5: 117-119.
Fodor, J. (2001). Doing Without What’s Within: Fiona Cowie’s Critique of Nativism. Mind 110: 99-148.
Fodor, J., T. Bever and M. Garrett (1974). The Psychology of Language. New York: McGraw-Hill.
Gould, S. J., and Lewontin, R. C. (1979). The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme. Proceedings of the Royal Society of London B 205: 581-598.
Griffiths, P. E. (2002). What Is Innateness? The Monist 85: 70-85.
Griffiths, P. E. and R. D. Gray (1994). Developmental Systems and Evolutionary Explanation. Journal of Philosophy 91: 277-304.
Hardcastle, V. G. (ed.) (1999). Where Biology Meets Psychology: Philosophical Essays. Cambridge, MA: MIT Press.
Horvath, C. (2000). Interactionism and Innateness in the Evolutionary Study of Human Nature. Biology and Philosophy 15: 321-337.
Khalidi, M. A. (2002). Nature and Nurture in Cognition. British Journal for the Philosophy of Science 53: 251-272.
Laurence, S. and E. Margolis (2001). The Poverty of the Stimulus Argument. British Journal for the Philosophy of Science 52: 217-276.
Lewontin, R. (1974). The Analysis of Variance and the Analysis of Causes. American Journal of Human Genetics 26: 400-411.
Mameli, M. and P. Bateson (forthcoming). Innateness and the Sciences. Biology and Philosophy.
Moore, D. S. (2003). The Dependent Gene. New York: Henry Holt and Company.
Putnam, H. (1967). The ‘Innateness Hypothesis’ and Explanatory Models in Linguistics. Synthese 17: 12-22.
Samuels, R. (2002). Nativism in Cognitive Science. Mind and Language 17: 233-265.
Samuels, R. (2004). Innateness in Cognitive Science. Trends in Cognitive Sciences 8: 136-141.
Sober, E. (1988). Apportioning Causal Responsibility. Journal of Philosophy 85: 303-318.
Sober, E. (1998). Innate Knowledge. In Craig (ed.): Routledge Encyclopedia of Philosophy. London: Routledge: 794-797.
Stich, S. (1975). The Idea of Innateness. In Stich (ed.): Innate Ideas. Los Angeles: University of California Press.
Stromswold, K. (1999). Cognitive and Neural Aspects of Language Acquisition. In Lepore and Pylyshyn (eds.): What is Cognitive Science? Malden, MA: Blackwell Publishers: 356-400.
West-Eberhard, M. J. (2003). Developmental Plasticity and Evolution. Oxford: Oxford University Press.
Wimsatt, W. C. (1999). Generativity, Entrenchment, Evolution, and Innateness: Philosophy, Evolutionary Biology, and Conceptual Foundations of Science. In Hardcastle (ed.) (1999): 139-180.
ASPECT KINDS
ROBIN STENWALL
1. INTRODUCTION

One of the things that distinguish natural kinds from mere anthropocentric kinds is that the former have an explanatory role to play in scientific reasoning, while the latter do not. One might, as Alexander Bird puts it, “randomly collect diverse things and give the collection a name, but one would not expect it to explain anything to say that a certain object belonged to this collection”.81 It is difficult to see which natural laws or explanations could involve notions dictated by our anthropocentric predilections, and easier to see the explanatory significance of kinds that exist as kinds independently of our conventions. Natural kinds are also hierarchically ordered, with individual kinds distinguished by their essential properties and related to each other by species and other subsumption relations (see sections 2 and 3). Some of these hierarchical kinds crosscut other natural kinds, as when we say that the flea is both a kind of insect and a kind of parasite, or that zinc sulphate is both a kind of zinc compound and a kind of sulphate. Similarly, explanations also crosscut. In explaining why zinc sulphate is easily soluble in water, we refer to solubility being a characteristic of metallic sulphates, and when an explanation of zinc sulphate’s positive effect on the immune system is called for, we may point out that it is a zinc compound. This paper is devoted to natural kinds that crosscut in this way. I will present the reader with a new category of natural kinds that possess all the characteristics usually associated with natural kinds, with the one exception that they do not have their species necessarily included in the class of members of those kinds. This is alarming, since it is assumed that hierarchical kinds adhere to the essential membership thesis, i.e. the thesis that a subordinate of a natural kind is any natural kind whose membership is necessarily included in the class of members of the superordinated natural kind.82 Hence, it is assumed that zinc sulphate is a subordinate of sulphates, since zinc sulphate is necessarily a kind of sulphate. Similarly, and for the same reasons, the relation is also assumed to hold between zinc sulphate and the crosscutting kind zinc compound. The challenge of this paper is to show that assuming the adequacy of this thesis results in some counterintuitive rulings. I will argue that we are only allowed to say that kinds in crosscutting hierarchical structures are necessarily included in the class of members of higher-level natural kinds if the shared common genus is ontologically dependent, in a weak sense, on the crosscutting kinds. However, I do not think that we are forced to abandon the essential membership thesis if we admit that there are properties that are necessary for some aspects of natural kinds, but not necessary for the kind as that kind. So in explaining why fleas require a host within which they can mature, we may point out that fleas have certain aspectual properties making them a kind of parasite, and that this is a characteristic feature of parasites.83 Still, in explaining these characteristics we are not referring to properties necessarily possessed by the flea, and so the flea is not necessarily a kind of parasite. The kind parasite is what the title of my paper refers to: an aspect kind. I will start by discussing the essential properties of natural kinds, distinguishing between essential properties that are shared with other natural kinds and properties that are unique to the kinds in question at some level of specification. Thereafter I will discuss how this distinction functions within hierarchical structures and its significance for our understanding of ontological dependence. Finally, I will discuss aspect kinds in relation to the account and the notions developed in the foregoing sections.

81 Bird (1998), p. 111.
82 See Ellis (2001), p. 23.

193 J. Persson and P. Ylikoski (eds.), Rethinking Explanation, 193–203. © 2007 Springer.

R. STENWALL

2. ESSENTIAL PROPERTIES IN HIERARCHICAL STRUCTURES

Kripke’s and Putnam’s arguments concerning the semantics of natural kind terms were meant to show that the fact that water is composed of hydrogen and oxygen and the fact that an atom of chlorine Cl-35 has 18 neutrons in its nucleus are necessary facts.84 Counterfactually, without any of these properties they would not be what they are.
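The essential membership thesis can be read in possible-worlds terms: a kind K is a subordinate of a kind K′ just in case, in every world, the members of K are among the members of K′. The sketch below is a toy model with invented worlds and individuals (placeholders, not chemical or biological data), and the helper names are hypothetical. It shows why, on this reading, zinc sulphate counts as a subordinate of sulphate, while the flea, which is assumed here to mature without a host in one possible world, does not count as a subordinate of parasite even though every actual flea is a parasite.

```python
# Toy possible-worlds model: each world assigns each kind the set of its members.
# Worlds and individuals are invented placeholders, not empirical data.
worlds: dict[str, dict[str, set[str]]] = {
    "actual": {
        "zinc sulphate": {"s1"}, "sulphate": {"s1", "s2"},
        "flea": {"f1"}, "parasite": {"f1", "p1"},
    },
    # Hypothetical world in which flea f1 matures without a host.
    "hostless": {
        "zinc sulphate": {"s1"}, "sulphate": {"s1", "s2"},
        "flea": {"f1"}, "parasite": {"p1"},
    },
}


def included_in(world: str, sub: str, sup: str) -> bool:
    """Every member of kind `sub` is a member of kind `sup` in the given world."""
    return worlds[world][sub] <= worlds[world][sup]


def necessarily_included(sub: str, sup: str) -> bool:
    """Essential membership thesis, read modally: inclusion holds in every world."""
    return all(included_in(w, sub, sup) for w in worlds)


print(necessarily_included("zinc sulphate", "sulphate"))  # True
print(included_in("actual", "flea", "parasite"))          # True
print(necessarily_included("flea", "parasite"))           # False
```

On this reading, actual inclusion is not enough for subordination; what the thesis requires is inclusion across all worlds, which is exactly what the aspect kind parasite fails to provide for the flea.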
If Putnam’s and Kripke’s arguments are sound, a couple of consequences are worth noticing. First of all, necessity of identity presupposes distinctiveness. For a kind to have an identity as that kind is for it to have distinguishing features, i.e. something that marks it off and provides it with certain characteristics in the nomenclature of kinds.⁸⁵ To suppose that the various constituents of the world possess properties essentially is to suppose that the constituents of the world belong to various kinds and cannot cease belonging to them without ceasing to be of this or that kind. To have an atomic number of 79, for example, is unique to gold; it is a property that is not shared by any other natural kind that is not gold. Similarly, the essential property of having spin ½ provides leptons with a distinctive feature making them into the kind of particles that they are.

⁸³ For those who have a hard time admitting that parasite is a natural kind with essential properties, I quote a standard textbook on parasitology: “Appropriate triggering mechanisms initiate the change from infective stages to parasitic stages. Once the parasite has begun its existence in a new host body, other triggering mechanisms initiate each change of the parasite during its development.” (E. R. Noble and G. A. Noble (1982), p. 7.) Thus certain properties have been discovered to be common to all parasites, properties that can explain in a lawful manner the parasitic behaviour of that kind. If the reader is still not convinced, I can assure them that whether or not parasites have essential properties is not important for the analysis that follows.
⁸⁴ See Kripke (1980) and Putnam (1975).
⁸⁵ Elder (1994).

Secondly, the properties that might be thought essential to any given kind fall into two categories: properties that characterize other kinds as well, and properties (or combinations of properties) that are distinctive of the kind in question. Take the stable chlorine isotope Cl-35 as an example. There is a fundamental difference between the essential property of having 18 neutrons in the atomic nucleus and that of having 17 protons in the atomic nucleus. There are no subordinated kinds, nor are there any superordinated kinds over and above that of Cl-35, that have 18 neutrons in the atomic nucleus, whereas the property of having atomic number 17 is shared by the superordinated kind chlorine. On the other hand, we would want to say that having atomic number 17 is a distinctive property of chlorine, since, as already remarked, it is unique to chlorine. If ‘