THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
EDITED BY DOUGLAS L. MEDIN, DEPARTMENT OF PSYCHOLOGY, NORTHWESTERN UNIVERSITY, EVANSTON, ILLINOIS
Volume 31
ACADEMIC PRESS
San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.
Copyright © 1994 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc. A Division of Harcourt Brace & Company, 525 B Street, Suite 1900, San Diego, California 92101-4495
United Kingdom Edition published by Academic Press Limited, 24-28 Oval Road, London NW1 7DX
International Standard Serial Number: 0079-7421
International Standard Book Number: 0-12-543331-X
PRINTED IN THE UNITED STATES OF AMERICA
94 95 96 97 98 99 BB 9 8 7 6 5 4 3 2 1
CONTENTS
Contributors
ASSOCIATIVE REPRESENTATIONS OF INSTRUMENTAL CONTINGENCIES
Ruth M. Colwill
I. Introduction
II. Binary Associations
III. Hierarchical Structure of Instrumental Learning
IV. Binary versus Hierarchical Associations
V. Conclusion
References
A BEHAVIORAL ANALYSIS OF CONCEPTS: ITS APPLICATION TO PIGEONS AND CHILDREN
Edward A. Wasserman and Suzette L. Astley
I. Introduction
II. Toward a Behavioral Analysis of Concepts
III. Empirical Evidence on Basic-Level Conceptualization by Pigeons
IV. Stimulus Generalization and Conceptualization
V. A Spencian Model of Basic-Level Categorization
VI. Conceptualization via Primary and Secondary Stimulus Generalization
VII. Nonsimilarity-Based Conceptualization
VIII. Summary of Empirical Evidence
IX. Concluding Comments
Appendix
References
THE CHILD'S REPRESENTATION OF HUMAN GROUPS
Lawrence A. Hirschfeld
I. Introduction
II. The Psychological Representation of Human Groups
III. An Alternative Model of Social Development
IV. Race and Perceptual Information
V. Do Children Have a Theory of Race?
VI. Racial Thinking and Folk Biology: Areas of Divergence
VII. Conclusion
References
DIAGNOSTIC REASONING AND MEDICAL EXPERTISE
Vimla L. Patel, José F. Arocha, and David R. Kaufman
I. Introduction
II. The Task Domain of Medical Diagnosis
III. Methodological Approach and Conceptual Issues
IV. Expert Diagnostic Reasoning
V. Novice Diagnostic Reasoning
VI. General Discussion
Appendix: Glossary of Medical Terms
References
OBJECT SHAPE, OBJECT NAME, AND OBJECT KIND: REPRESENTATION AND DEVELOPMENT
Barbara Landau
I. Introduction
II. Object Shape and Object Name: Basic Patterns of Development
III. Syntactic Effects and the Shape Bias
IV. Perceptual Effects: Information about Shape and the Shape Bias
V. Effects of Knowledge: The Role of Known Taxonomies and Functions
VI. The Role of Object Shape in Early Object Naming
VII. Summary and Conclusions
References
THE ONTOGENY OF PART REPRESENTATION IN OBJECT CONCEPTS
Philippe G. Schyns and Gregory L. Murphy
I. The Origins of Parts
II. Empirical Tests of the Functionality Principle
III. Shape Variation and Part Extraction
IV. General Discussion
References
Index
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
José F. Arocha (187), Departments of Medicine and Psychology, McGill University, Montreal, Quebec, Canada H3A 1A3
Suzette L. Astley (73), Department of Psychology, Cornell College, Mount Vernon, Iowa 52314
Ruth M. Colwill (1), Department of Psychology, Brown University, Providence, Rhode Island 02912
Lawrence A. Hirschfeld (133), Department of Anthropology, University of Michigan, Ann Arbor, Michigan 48109
David R. Kaufman (187), Laboratory of Cognitive Studies in Medicine, McGill University, Montreal, Quebec, Canada H3A 1A3
Barbara Landau (253), Department of Cognitive Science, University of California at Irvine, Irvine, California 92717
Gregory L. Murphy (305), The Beckman Institute, University of Illinois, Urbana, Illinois 61801
Vimla L. Patel (187), Departments of Medicine and Psychology, McGill University, Montreal, Quebec, Canada H3A 1A3
Philippe G. Schyns (305), Center for Computational and Biological Learning, Whitaker College, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Edward A. Wasserman (73), Department of Psychology, The University of Iowa, Iowa City, Iowa 52242
ASSOCIATIVE REPRESENTATIONS OF INSTRUMENTAL CONTINGENCIES
Ruth M. Colwill
I. Introduction
A cardinal feature of much of animal and human behavior is the susceptibility of that behavior to modification by its consequences. In general, it has been found that the arrangement of rewarding outcomes contingent on an action increases the frequency of that action, whereas the arrangement of response-contingent aversive outcomes reduces the probability of that action. This type of behavioral plasticity has been widely documented for many vertebrate species and even for some invertebrates. The question of how instrumental contingencies alter behavior has commanded considerable attention in the discipline of experimental psychology. The focus of this chapter is on the nature of the associative processes involved in instrumental learning. Particular consideration is given to two issues: what structures represent learning that a behavior will produce a positive outcome and what structures represent learning that a behavior will not have a positive outcome.
For several decades, the experimental investigation of instrumental learning has been occupied with the task of describing how the consequences of an action bring about a change in the subsequent probability of that action. One attempt to answer that question involves identifying what it is that an animal learns from its experience with an instrumental contingency. A typical situation used to address this issue is one in which a rat is trained to make a response (e.g., press a lever) for a rewarding outcome (e.g., a food pellet) either in the presence of a discriminative stimulus (e.g., a light) or in its absence (S+ and S- training, respectively). Experience with an S+ contingency results in an increase in the probability of lever pressing when the light is on, whereas experience with an S- contingency produces a decrease in lever pressing when the light is on.
Two rather different ideas have emerged about the nature of the associative structures that might account for these changes in behavior. The conventional approach has been to analyze learning in terms of simple binary associations between the three primary elements of an instrumental situation: the discriminative stimulus (S), the instrumental response (R), and the instrumental outcome (O). Individuals have disagreed, however, about which particular binary connections are important in instrumental learning. Most of the controversy has been over the form that encoding of the response is thought to take. Some have argued for an association between the stimulus and the response (S-R), whereas others have favored an association between the response and the outcome (R-O). In both cases, it has seemed reasonable to complement either the S-R or R-O relation with an association between the stimulus and the outcome (S-O). In contrast, contemporary analyses of instrumental learning have favored an approach in which the three primary elements are linked within a single hierarchical structure (S-R-O). Almost uniformly, models of this sort have represented the stimulus as a modulator of the R-O relation. In the discussion that follows, I describe some work from my laboratory that is pertinent to identifying the circumstances under which binary and hierarchical associative structures are recruited for representing information about instrumental contingencies.
The first part of this chapter reviews evidence that animals learn about each of the potential binary associations. I comment on the conditions that support this learning and discuss evidence regarding the nature of those learned associations. The second part of this chapter reviews studies showing support for a hierarchical account of instrumental learning. Finally, I discuss the relationship between these different representational structures and suggest one factor that might influence the type of associative structure used to support performance in goal-directed tasks.
II. Binary Associations
One of the most popular approaches to understanding instrumental behavior has been to dissect what the animal learns about each of the potential binary connections between S, R, and O. Colwill and Rescorla (1986) reviewed the evidence for the contribution of three different binary associations to instrumental behavior maintained by food rewards. They identified the primacy of an association between the response and the outcome (R-O) and documented the pervasiveness of that learning across a wide range of training conditions. More equivocal conclusions, however, were drawn about the relative importance of associations between the stimulus and the response (S-R) and the stimulus and the outcome (S-O) to instrumental performance.
A. RESPONSE-OUTCOME ASSOCIATIONS
1. Background
Precursors to the idea that an association is formed between the response and its outcome may be found in a number of early discussions of animal behavior. For example, Spencer (1871, p. 544) argued that animals could “perceive the connection between a muscular act and its immediate effect” and Morgan (1894, p. 216) suggested that one reason an animal repeats a rewarded action is “because it had become associated through experience with pleasurable consequences.” This viewpoint was embraced by Konorski and Miller (1937) and vigorously defended by Tolman (1933). Its popularity has not diminished and it continues to be endorsed by a number of contemporary learning theorists (Bolles, 1972; Colwill & Rescorla, 1986; Mackintosh & Dickinson, 1979).
Several lines of support have emerged in favor of the view that an animal’s actions may be part of a deliberate scheme to attain a specific goal (see Colwill & Rescorla, 1986). What is arguably the most powerful procedure for revealing the presence of an R-O association is to change the value of the outcome after instrumental learning has taken place. The theoretical implications of the outcome revaluation technique were formally recognized by Rozeboom (1957, 1958). He opined that if subsequent instrumental performance tracks the new value of the goal, it may be inferred that the subject has learned about the consequences of that instrumental act. It has been repeatedly demonstrated in the last few years that instrumental responses change in a manner appropriate to the new value of their goal object (e.g., Adams & Dickinson, 1981; Colwill & Rescorla, 1985a, 1985b, 1986, 1990a; Dickinson & Dawson, 1987; Rescorla, 1990b; Shipley & Colwill, 1993). For example, Colwill and Rescorla (1985a) trained rats to lever press for one outcome (e.g., pellets) and to chain pull for a different outcome (e.g., sucrose liquid). Then, one outcome was made aversive by pairing it with a nausea-inducing substance. In a subsequent extinction test, the rats preferred the response whose outcome was still valuable. Because the effect of devaluing an outcome was to change specifically performance of the response trained with that outcome, we have support for R-O learning.
2. Permanence of R-O Learning
Colwill and Rescorla (1986) presented considerable evidence that R-O learning occurs across a wide variety of instrumental training parameters. In the following sections, I describe some work suggesting that, once established, R-O associations are relatively permanent. Using the outcome revaluation technique to assess the presence of an R-O association, I have explored the impact of three potentially destructive operations on this type of association. In each case, a clear picture has emerged of the immunity of R-O associations to disruption caused by the passage of time and to extinction achieved either by simple nonreinforcement of the response or by arranging a negative correlation between the response and its outcome.
a. Passage of Time. There is an extensive literature documenting the fact that performance based on long-term memory for individual events deteriorates with the passage of time (Gleitman, 1971; Spear, 1978). Of particular relevance are studies that have examined the effect of increasing the interval between training and testing on memory for instrumental events (e.g., Gleitman & Steinman, 1964; Hunter, 1913, 1934; Perkins & Weyant, 1958; Steinman, 1967). The typical finding in these reports is that performance declines as the retention interval is increased. For example, Gleitman and Steinman (1964) examined contrast effects as a function of time. When rats that had been trained to run down an alley for a large reward were shifted to a small reward, they ran more slowly for that small reward than control subjects trained throughout with the small reward. This depression effect (Crespi, 1942) was not observed, however, if an interval of 66 days separated the end of training with the large reward and the shift to a small reward.
A natural interpretation of these results is that the animals tested after a long retention interval had forgotten details about the magnitude of the original outcome. In contrast, other work has suggested that learned associations may be quite resistant to the passage of time. Using rats, Gleitman and Jung (1963) found no forgetting, after a 44-day retention interval, of which of two keys had to be pressed 10 times for a food reward. Other studies have confirmed the relative stability of memory for simple choices (Chiszar & Spear, 1968; Maier & Gleitman, 1967). One possible explanation for these findings is that even if animals forget some of the details of an outcome, they can still use an intact response-outcome association to guide their selection of the reinforced response over the nonreinforced alternative.
In an experiment carried out at the University of Pennsylvania, I explicitly examined the effects of time on retention of response-outcome associations. In that study, I varied the retention interval between training instrumental responses and testing them for evidence of association with their outcomes. Rats were trained initially to make two responses (e.g., lever press and chain pull). One response was followed by food pellets and the other by a liquid sucrose reward. Fifty-four days after this training, those same animals were trained to make two other responses (e.g., nose poke and handle pull). Again, one of these responses was followed by food pellets and the other by sucrose liquid. Training of both pairs of responses was identical. Each response was reinforced first on a continuous reinforcement (CRF) schedule until 50 reinforcers had been obtained, and then on a variable interval (VI) schedule for two 20-min sessions. A VI 30-sec schedule was in effect in the first of these sessions and a VI 60-sec schedule in the second session. The various response-outcome combinations and their assignment to the distant or recent training condition were counterbalanced across animals. Immediately after training of the second pair of responses, one of the outcomes was made distasteful by pairing it with a nausea-inducing toxin, a 0.5% body weight injection of 0.6 M lithium chloride. In this and all subsequent experiments using the outcome devaluation technique, the procedure for devaluing the outcome was identical to that described by Colwill and Rescorla (1985a, Experiment 1). At the end of this treatment, no subjects ate their devalued outcome, but all subjects consumed their valued outcome. Finally, subjects were given two 10-min extinction tests, one with the recently trained responses and the other with the distantly trained responses.
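The counterbalancing described above can be made concrete with a short enumeration. The following is only a sketch under stated assumptions: the text does not list the exact factors that were crossed or the number of animals, so the three factors used here (which response pair is trained first, which member of each pair earns pellets, and which outcome is later devalued) and the round-robin assignment are illustrative choices, and the function names are hypothetical.

```python
from itertools import product

# Response pairs and outcomes named in the text.
PAIRS = (("lever press", "chain pull"), ("nose poke", "handle pull"))
OUTCOMES = ("pellets", "sucrose")


def counterbalancing_cells():
    """Enumerate every combination of the factors ASSUMED to be counterbalanced."""
    cells = []
    # distant: index of the pair trained first (the other pair is trained recently)
    # flip0 / flip1: whether the pellets/sucrose assignment is swapped within each pair
    # devalued: the outcome later paired with lithium chloride
    for distant, flip0, flip1, devalued in product(
        (0, 1), (False, True), (False, True), OUTCOMES
    ):
        mapping = {}
        for pair, flip in zip(PAIRS, (flip0, flip1)):
            first, second = OUTCOMES[::-1] if flip else OUTCOMES
            mapping[pair[0]] = first
            mapping[pair[1]] = second
        cells.append(
            {
                "distant_pair": PAIRS[distant],
                "recent_pair": PAIRS[1 - distant],
                "response_outcome": mapping,
                "devalued_outcome": devalued,
            }
        )
    return cells


def assign(animal_ids):
    """Cycle animals through the cells so that each cell is used about equally often."""
    cells = counterbalancing_cells()
    return {animal: cells[i % len(cells)] for i, animal in enumerate(animal_ids)}
```

Under these assumptions there are 16 cells, and every response is paired with pellets in exactly half of them, which is the property a counterbalanced design is meant to guarantee.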
The question of interest is whether the passage of time would attenuate the magnitude of the devaluation effect, that is, the degree to which the animals would choose the valued response over the devalued response during testing. The results of these tests are shown in Fig. 1, separated according to length of retention interval (RI) and the current value of the outcome. Analysis of the recently trained response data revealed significantly more valued responses than devalued responses. This finding replicates the basic devaluation effect obtained by Colwill and Rescorla (1985a). The unique contribution of these test results, however, is the finding that interpolating a retention interval of 66 days between the final session of VI training and the test sessions did not attenuate the magnitude of the devaluation effect. In fact, the difference between the valued and devalued responses is numerically, although not statistically, larger for the responses tested after a long retention interval. This pattern of results suggests that the strength of a response-outcome relation does not fade with time.
[Figure 1: two panels, SHORT RI (left) and LONG RI (right); legend: Valued, Devalued; x-axis: Blocks of two minutes (1 to 5).]
Fig. 1. Sensitivity to outcome devaluation of instrumental responding with either a short (left panel) or long (right panel) retention interval (RI) between training and testing. One outcome had been paired with a nausea-inducing substance (filled circles), but the other had not (open circles).
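The magnitude of the devaluation effect is discussed in the text as the difference between responding for the valued and the devalued outcome. A minimal sketch of how such a summary might be computed from per-block response rates follows. The function name, the preference ratio, and the input format are assumptions made for illustration; the chapter does not specify its analysis, and the numbers in the usage line are placeholders, not values read from the figure.

```python
def devaluation_effect(valued_rates, devalued_rates):
    """Summarize a devaluation test from per-block response rates.

    Returns (difference, preference): the mean valued rate minus the mean
    devalued rate, and the valued share of all responding (0.5 = indifference).
    """
    if not valued_rates or not devalued_rates:
        raise ValueError("both rate lists must be non-empty")
    mean_valued = sum(valued_rates) / len(valued_rates)
    mean_devalued = sum(devalued_rates) / len(devalued_rates)
    difference = mean_valued - mean_devalued
    total = mean_valued + mean_devalued
    # With no responding at all there is no preference to report.
    preference = mean_valued / total if total > 0 else 0.5
    return difference, preference


# Placeholder rates for illustration only; they are NOT data from the chapter.
print(devaluation_effect([6.0, 4.0, 2.0], [3.0, 2.0, 1.0]))
```

A difference score keeps the original units, whereas the preference ratio is less sensitive to overall response rate; the latter may matter when comparing conditions that differ in baseline responding.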
This design was not intended to diagnose any time-dependent changes in the long-term memories of individual elements. In fact, the design of the experiment presented several opportunities for repairing any decrement that might have occurred to the outcome representations. During training of the second pair of responses or during the outcome devaluation procedure, additional presentations of the outcomes might have served to remind the animals about the specific features of those outcomes. The beneficial effects of outcome presentations as reminder treatments have been well documented in both Pavlovian conditioning and instrumental training procedures (Miller & Springer, 1973, 1974).
In summary, these results replicate previous reports that responding is depressed when the outcome used to establish that responding is made undesirable. The specificity of this suppression of instrumental performance indicates that animals learn about the consequences of their actions. Moreover, the present results demonstrate excellent retention of this knowledge over a relatively long period of time. Thus, endurance over time is not a property unique to associations established through Pavlovian conditioning (Gleitman, 1971; Hilgard & Marquis, 1935). Furthermore, this evidence regarding the robustness of long-term associative memory is in line with other work showing that memories appear fully intact when tests are conducted in the presence of relevant retrieval cues (Miller, Jagielo, & Spear, 1990).
b. Simple Nonreinforcement. A well-established fact about instrumental behavior is that responding declines when rewards are discontinued. Various arguments have been made about the mechanisms underlying extinction of both Pavlovian and instrumental responses (Mackintosh, 1974). One salient distinction between the different accounts is the issue of whether extinction erases the association previously established by training. In the following experiment, I used the outcome devaluation procedure to assess whether simple nonreinforcement of an instrumental response destroys the association between that response and its outcome. Rats were trained to perform two pairs of responses (lever press and chain pull; nose poke and handle pull). One member of each response pair earned pellets and the other earned sucrose. Response-outcome combinations were counterbalanced across subjects. The training procedure for each of the four responses involved a session of CRF training until 30 reinforcers had been earned and then two 20-min sessions of reinforcement on a VI 30-sec schedule. Following this, there were five 20-min sessions of training with each pair of responses. In these sessions, each response was reinforced on an independent VI 60-sec schedule. One pair of responses was then extinguished for twenty 20-min sessions. For one half of the subjects, nose poking and handle pulling were extinguished; for the remaining subjects, lever pressing and chain pulling were extinguished. After extinction, all four responses were retrained in separate sessions with a third outcome, polycose. This training accomplished two goals. First, it raised the level of extinguished responses so that a devaluation effect might be observed.
Second, it eliminated the differences in terminal rates across response pairs resulting from the extinction treatment. One of the original outcomes (either pellets or sucrose) was then devalued by pairing it with a toxin. Finally, all animals received two 10-min extinction tests, one with the extinguished responses and one with the nonextinguished responses. Table I outlines the basic design of this experiment.
TABLE I
BASIC DESIGN OF EXTINCTION EXPERIMENT
Training    Extinction    Retraining    Devaluation    Test
R1-O1                     R1-O3         O1+, O2-       R1 v R2
R2-O2                     R2-O3
R3-O1       R3            R3-O3         O2+, O1-       R3 v R4
R4-O2       R4            R4-O3
Note. R1, R2, R3, and R4 are instrumental responses, lever pressing, chain pulling, nose poking, and handle pulling, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; O3 denotes polycose; + and - indicate the presentation or not of LiCl.
The results of these tests are shown in Fig. 2, separated according to extinction treatment and current value of the outcome. The left-hand side of Fig. 2 shows a standard devaluation effect, at the start of testing, for the nonextinguished responses. The right-hand side of Fig. 2 shows a similar devaluation effect for the extinguished responses. Clearly, simple nonreinforcement of an instrumental response did not remove the association between that response and its original outcome. Thus, not only is a memory of the instrumental outcome retained through extinction (Spear, 1967), but subjects retain information about which response earned that outcome. However, it is also apparent from Fig. 2 that extinction was not without an effect on responding. Comparison of the extinguished responses found their rate during testing to be significantly lower than that of the nonextinguished responses. Unfortunately, this difference in response rate precludes a meaningful comparison of the magnitude of the devaluation effect for the extinguished and nonextinguished responses. But the implications of these data are clear. They suggest that simple nonreinforcement of a response does not eliminate the sensitivity of that response to a change in the value of its outcome but that it does increase the vulnerability of that response to the suppressive effects of repeated extinction.
[Figure 2: two panels, NONEXTINGUISHED (left) and EXTINGUISHED (right); legend: Valued, Devalued; x-axis: Blocks of two minutes (1 to 5).]
Fig. 2. Sensitivity to outcome devaluation of instrumental responses with a history of extinction (right panel) or not (left panel). One outcome had been paired with a nausea-inducing substance (filled circles), and the other had not (open circles).
These findings are similar to those recently reported by Rescorla (1993). However, it is instructive to note two differences between the studies. First, my study used a more extensive period of extinction to take advantage of the fact that considerable spontaneous recovery normally occurs between sessions early in the extinction treatment. Thus, the subjects in the present experiment had frequent opportunities to experience the new consequences associated with responding. Attesting to the effectiveness of this treatment is the finding that spontaneous recovery between sessions was virtually eliminated by the end of extinction training. Yet, despite the greater opportunity to experience the new consequences of responding, detection of the original R-O association was still possible. The second difference between this study and that reported by Rescorla (1993) concerns the conditions under which the responses were trained following initial acquisition. In my study, a response earning pellets was trained concurrently with a response earning sucrose for several sessions. This feature of the present design thus guaranteed equivalent opportunities for each outcome to develop associations with both responses (Colwill & Rescorla, 1985a, Experiment 2). Evidently, the selective effect of outcome devaluation in my study could only have been mediated by the survival through extinction of the unique R-O associations, and not the common O-R associations.
c. Negative Correlation with the Outcome. A procedure considered to be superior to simple nonreinforcement in promoting an associative loss is one that arranges for a response to be negatively correlated with its outcome.
This assertion is based on three arguments. First, the continued delivery of the outcome during the extinction treatment reduces the contribution of generalization decrement to disruption of responding (Capaldi, 1967; Jenkins, 1965). This is especially relevant if the frequency of outcome presentations remains unchanged between training and extinction. Second, by continuing to deliver the outcome during extinction, any changes in the representation of that outcome (Rescorla, 1973) or in the subject’s motivational state (Spence, 1966) are prevented. Thus, any decrement in performance is more likely to represent a genuine associative loss. Third, by negatively correlating the response and the outcome, the previously positive relation should be replaced by an inhibitory association between the response and the outcome. Thus, whereas simple extinction merely neutralizes the R-O association, a negative correlation procedure actually reverses the polarity of the R-O connection. This prediction is derived from an influential model of Pavlovian conditioning (Rescorla & Wagner, 1972).
Studies examining the reacquisition of extinguished responses provide the strongest support for the belief that a negative correlation procedure is more effective than simple nonreinforcement for extinguishing behavior. In these experiments, reacquisition is routinely found to be considerably slower following noncontingent outcome deliveries or presentations of the outcome on a differential reinforcement of other behaviors (DRO) schedule during the extinction procedure (Pacitti & Smith, 1977; Uhl & Garcia, 1969). To the degree that comparisons across experiments have any validity, it is of interest to note that responding was reacquired more slowly following the negative correlation procedure used in the next experiment than after the simple nonreinforcement procedure used in the preceding experiment. This difference is particularly meaningful: because the reacquisition outcome was different from those used for original training, the difference cannot be attributed to a simple discrimination hypothesis (Jenkins, 1962).
In another experiment carried out at the University of Pennsylvania, I examined whether an R-O association would remain intact after that response had been negatively correlated with its outcome. The design was similar to that outlined in Table I. Two pairs of responses were trained (lever and chain; nose poke and handle pull). Each response was trained separately, first on a CRF schedule, then on a VI 30-sec schedule for one 20-min session, and finally, on a VI 60-sec schedule for two 20-min sessions; one member of each pair earned food pellets and one member earned sucrose liquid. Assignment of response-outcome pairings was balanced across subjects.
Then, one pair of instrumental responses (nose poke and handle pull for one half of the subjects; lever press and chain pull for the remaining subjects) was exposed to negative correlation training. In these sessions, as in training, only one response was available. A variable-time (VT) 60-sec schedule was used to determine outcome availability, but delivery of the outcome occurred only after a minimum of 5 sec had elapsed without a response. There were twenty 20-min sessions of this training with each response. To raise the overall response rate of the decremented responses and to minimize the differences in response levels between the decremented and nondecremented responses introduced by this treatment, all four responses were given six 20-min sessions of training with a third reinforcer, super-strength cherry-flavored pellets, available on a VI 30-sec schedule. One of the original outcomes was then made aversive through pairings with a toxin. Finally, two 10-min extinction tests were given with each pair of responses.
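The delivery rule of the negative correlation schedule (an outcome made available by the VT timer is delivered only once 5 sec have passed without a response) can be sketched as a small function. This is an illustration under assumptions, not a description of the laboratory software: the function name and the representation of responses as a sorted list of timestamps in seconds are hypothetical, and how the VT timer behaved while an outcome was being withheld is not stated in the text and is not modeled here.

```python
def delivery_time(available_at, response_times, min_pause=5.0):
    """Earliest time at or after `available_at` at which at least `min_pause`
    seconds have elapsed since the most recent response.

    `response_times` must be sorted in ascending order (seconds from session start).
    """
    t = available_at
    for r in response_times:
        if r > t:
            break  # responses after the delivery moment cannot postpone it
        if r + min_pause > t:
            t = r + min_pause  # a recent response pushes delivery back
    return t
```

For example, with an outcome that becomes available at 10 sec and responses at 3, 8, 12, and 16 sec, each response within the preceding 5 sec postpones delivery, so the outcome arrives only after the animal has paused. Responding therefore delays the outcome, which is what makes the response and the outcome negatively correlated.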
The results of these tests are summarized in Fig. 3. The left-hand side of Fig. 3 shows the nondecremented responses separated according to the current value of their outcomes, and documents a robust outcome devaluation effect. The right-hand side of Fig. 3 displays the data from the decremented responses and also reveals a substantial preference for the response trained with the valued outcome. Statistical analyses confirmed the visual impression that the magnitude of the devaluation effect was not reduced by using a negative correlation between a response and its outcome to reduce responding. Similar findings have been reported by Rescorla (1992a), who used a less extensive exposure of a response to either noncontingent or negatively correlated outcomes to produce extinction of that response.
Further confirmation that the R-O associations were fully intact following this decremental treatment comes from the results of another series of choice tests. In one of these tests, subjects were presented with the two valued responses; in the other test, the two devalued responses were available. In the test with the valued responses, a weakened R-O relation should be reflected in a preference for the nonextinguished valued response. In the test with the devalued responses, a preference for the extinguished devalued response might appear if a weakened R-O association lessened the suppressive effect of anticipating an aversive outcome but left intact the contribution of any conditioned reinforcers to performance. But the results shown in Fig. 4 are clear: The animals distributed their responses equally across the two alternatives. This indifference implies an equivalence in the strength of the R-O associations. The similarity of these data to those showing matching on concurrent schedules (Baum & Rachlin, 1969; de Villiers, 1977; Herrnstein, 1961; McSweeney, 1975) supports the conclusion that negatively correlating a response with its outcome does not weaken the original R-O association.
Fig. 3. Sensitivity to outcome devaluation of instrumental responses that had been negatively correlated with their training outcomes (right panel) or not (left panel). One outcome had been paired with a nausea-inducing substance (filled circles), but the other had not (open circles). [Panels: Nondecremented, Decremented. Legend: Valued, Devalued. Horizontal axis: Blocks of two minutes.]
d. Conclusions. The results of the three decremental operations reported here suggest that response-outcome associations are relatively permanent. The magnitude of the outcome devaluation effect did not diminish with the passage of time; neither did simple nonreinforcement of a response, or the arrangement of a negative correlation between a response and its outcome, remove the sensitivity of that response to devaluation of its outcome. In fact, in two of these conditions, the outcome devaluation effect was numerically larger for the responses exposed to the destructive treatment.
Fig. 4. Preference tests between a response that had been negatively correlated with its outcome (filled circles) and a response that had not (open circles). The training outcome had been devalued for one pair of responses (left panel) but not for the other pair of responses (right panel). [Panels: Devalued pair, Valued pair. Legend: Nondecremented, Decremented. Horizontal axis: Minutes.]
Two features of the present studies add force to the conclusion that R-O associations are fully preserved following these manipulations. First, in each experiment, a within-subjects design was used to probe the status of the R-O associations. This design guarantees that the potential impact of a decremental treatment on an R-O association is not confounded with any effect that treatment might have on the efficacy of the outcome devaluation procedure per se. Second, these studies were concerned with the issue of whether various manipulations altered the strength of R-O associations. Of the various probes that are available for detecting the presence of R-O learning, the outcome devaluation technique has an important advantage in that it has been shown to be sensitive to differences in the strength of R-O associations. Colwill and Rescorla (1988b) showed that a response trained with multiple outcomes was more sensitive to the current value of the outcome with which it had a stronger association. In that study, two responses were trained with two outcomes (O1 and O2). O1 occurred more often than O2 for one response, but less often than O2 for the other response. Then, one outcome was devalued. In a subsequent extinction test, performance of a response was found to be more depressed following devaluation of its frequent outcome than of its infrequent outcome.
Quite a separate issue raised by the two extinction studies is the question of why responding declines when its outcome is discontinued or negatively correlated with the response. Obviously, the observed declines in responding are not attributable to concomitant declines in the R-O connections. There are at least three separate accounts of extinction that leave original learning intact and appeal to some additional process to explain the decrease in performance. One common belief has been that responding declines because of the increase in some other behavior.
This response competition account provides a natural explanation for the more rapid decrease in responding that is sometimes obtained with noncontingent and negative correlation procedures compared to simple extinction (Johnson, McGlynn, & Topping, 1973; Nevin, 1968; Zeiler, 1971). However, the observation made here and elsewhere (Rescorla, 1992a) that the putative competing response routinely interferes more with performance of the decremented responses than with the nondecremented responses during their retraining in a common context weakens the validity of this account of extinction.
The present results are less helpful in distinguishing the merits of two other explanations for why instrumental responding declines in extinction. Several authors have argued that the response becomes associated with the aversive consequences, such as frustration, generated by the omission of an expected reward (Amsel, 1958; Wagner, 1966). Equally plausible,
however, is the possibility that an inhibitory association may develop between the extinction context and the instrumental response, just as an inhibitory S-R association develops between an S- and its nonreinforced response (Colwill, 1991). This view is similar in spirit to a suggestion made by Robbins (1990) that a component of extinction in Pavlovian conditioning involves a decline in attention to the conditioned stimulus (CS). Overall, the pattern of results reported here is explained equally well by both the frustration account and the inhibitory S-R account.
The idea that associations remain fully intact once they have been established is by no means a novel one. The study of paired-associate learning led some investigators to the conclusion that initial associations are not lost when new ones are acquired (McGeoch & Irion, 1952). Moreover, Pavlov’s (1927) discovery that extinguished responses recover either spontaneously over time or following the administration of some novel agent alerted him to the prospect that the association between a Pavlovian stimulus and its outcome was not abolished by extinction. This position has recently been reaffirmed by Bouton and his colleagues (Bouton, 1991; Peck & Bouton, 1990) who have argued that contextual stimuli associated with extinction modulate the original excitatory association between the CS and its outcome.
In summary, current data provide strong support for the view that R-O learning is a pervasive and fundamental component of instrumental learning. It will be important for both neural and connectionist models of the instrumental learning process to recognize the pivotal role played by R-O associations in goal-directed behavior and to acknowledge the permanent quality of R-O associations.
The major challenges for future studies of R-O learning will be to identify the mechanisms underlying extinction and to specify when the original goals of a response will affect subsequent behavior and when they will remain dormant.
B. STIMULUS-OUTCOME LEARNING
There remain authors opposed to an interpretation of the effect of outcome devaluation on instrumental performance in terms of R-O learning. Their opposition is based on allegiance to an ingenious scheme proposed by Spence (1956) to preserve the fundamental principle of classical S-R theory that instrumental learning involves the acquisition of stimulus-response connections (Guthrie, 1952; Hull, 1943). The validity of this principle was challenged by evidence that the identity of the outcome could influence instrumental behavior (see Colwill & Rescorla, 1986). However, by allowing stimuli correlated with the response to become Pavlovian signals for the outcome earned by the response, Spence (1956) provided
a way for instrumental performance to be mediated by an S-R association yet still be sensitive to manipulations involving the outcome. The merits of approaches that combine S-O and S-R associations, so-called two-process theories, were reviewed in some detail by Colwill and Rescorla (1986). As they noted, there is enough versatility in the definition of a stimulus to make it extremely difficult to distinguish empirically between a Pavlovian S-O and an instrumental R-O account of outcome devaluation effects. Nevertheless, it may yet prove profitable to examine the properties of explicit, discrete discriminative stimuli. Evidence that such stimuli differ from Pavlovian conditioned stimuli greatly reduces the plausibility of assigning Pavlovian properties to more local features of the context accompanying operation of the response manipulandum in order to explain the full range of circumstances under which outcome devaluation effects have been obtained.
In what follows, I focus on the issue of whether discriminative stimuli develop associations with their instrumental outcomes. The first section deals with the properties of a stimulus (S+) that signals that a response will be reinforced. The second section deals with the properties of a stimulus (S-) that signals that a response will not be reinforced. Evidence is presented that an S+ provides information about the instrumental outcome earned in its presence and that an S- provides information about the identity of the omitted outcome. In each case, the deficiency of a Pavlovian conditioning account of the learning is discussed and the opinion is ventured that one function of simple discriminative stimuli is to modulate the activation threshold of the outcome representation.
1. S+ Procedures
a. Transfer. One technique that has been used to identify an S-O association measures the effect of a discriminative stimulus on the performance of new instrumental responses. The rationale behind this transfer test is that an animal’s choice between two responses might be biased by a stimulus that signals the availability of the outcome for one of those responses. The use of this transfer procedure is illustrated in a study by Colwill and Rescorla (1988a, Experiment 1). Rats were allowed to earn a sucrose outcome in the presence of one stimulus (S1) and a food pellet outcome during another stimulus (S2). They earned one of these outcomes by nose poking and the other outcome by handle pulling. After S+ training, two new responses (lever press and chain pull) were trained in preparation for a transfer test. In this and all subsequent experiments using a transfer test, essentially the same procedure was used to train the responses. One response was trained with the sucrose outcome
and the other response was trained with the food pellet outcome. The assignment of outcomes to responses was balanced across subjects and orthogonal to the S+ treatments. After a session of CRF training and a 20-min session of reinforcement on a VI 30-sec schedule with each response, four 20-min sessions were given with both responses simultaneously available. In these sessions, responding was reinforced on independent VI 60-sec schedules. Finally, the two transfer responses were tested in extinction with occasional presentations of S1 and S2. It was expected that S1 would promote the transfer response trained with sucrose and that S2 would promote the transfer response trained with pellets.
These predictions were confirmed by the results of the transfer test. The effect of a stimulus depended on the identity of the outcome used to train the transfer responses: Each stimulus selectively augmented the response with which it shared an outcome but had no discernible effect on the response trained with a different outcome. The selectivity is important for excluding interpretations of transfer in terms of general effects of S+ training or generalization among the responses that may be legitimately applied to earlier reports of transfer in single-outcome studies (Hearst & Peterson, 1973; Walker, 1942). The inference to be drawn from the present results, however, is clear: An S+ provides information about the identity of the outcome earned in its presence.
b. Extinction. Evidence that S+s transfer to new responses based on their shared associations with the same outcome leaves unanswered the question of whether the same association mediates their ability to control their originally trained responses. One approach to this problem is to examine the impact on discriminative control of removing the S-O association.
If performance of an instrumental response in an S+ is a product of the combination of an S-O association with either an R-O or S-R association, it should be possible to reduce that performance by eliminating the S-O association. A procedure that promises to negate an S-O connection involves retraining the S+ as a signal for the nonreinforcement of a different response (S- training) trained with the outcome of S+. The evidence for this is presented later in the section on inhibitory S-O learning. All that needs to be appreciated at this point is that after receiving training as an S-, a stimulus specifically signals the omission of the instrumental outcome. Colwill (1993b, Experiment 1) inspected discriminative control over the original response after an S+ had been trained as an S- for a different response. The design of that study is shown in Table II. Rats were first trained with two S+s (noise and light), two responses (lever and chain), and two outcomes (sucrose and pellets). Each stimulus was trained as a signal for a different response-outcome association: S1 signaled that R1
TABLE II
BASIC DESIGN OF S-O EXTINCTION EXPERIMENT

S+ training    S- training                  Test
S1: R1-O1      R3-O1, S1: R3-, S2: R3-      S1: R1
S2: R2-O2      R3-O2, S1: R3-, S2: R3-      S2: R2

Note. S1 and S2 are instrumental discriminative stimuli, noise and light. R1 and R2 are instrumental responses, lever pressing and chain pulling, counterbalanced across animals. R3 represents nose poking; O1 and O2 denote food pellets and sucrose liquid; - indicates nonreinforcement.
would be followed by O1, and S2 signaled that R2 would be followed by O2. Subjects were then trained to nose poke; one half of the subjects earned food pellets and one half earned sucrose liquid. After this, both S+s were trained as signals that nose poke responses would not be reinforced in their presence. In this way, one S+ was trained as an S- with the same outcome, and one S+ was trained as an S- with a different outcome. This design ensured equivalent exposure to both S+s and thus controlled for any general disruptive effects of S- training. However, only S- training with the same outcome should extinguish the original S-O association. Both stimuli were then tested in extinction with their original responses.
Figure 5 shows performance of a response during the stimulus with which it was trained and during the ITI. Performance is shown separately for the stimulus trained as an S- with the same outcome (left panel) and for the stimulus trained as an S- with a different outcome (right panel). The data are unambiguous: Performance of a response was substantially reduced in the presence of its S+ following a treatment designed to remove the S-O association. In fact, this treatment completely undermined the ability of the S+ to produce a significant elevation of responding relative to the intertrial interval (ITI) rate. These results suggest that an intact S-O association is crucial to the ability of an S+ to produce performance of its own instrumental response.
Fig. 5. Effect on an instrumental response of training its discriminative stimulus as a signal for the nonreinforcement of another response trained either with the same outcome (left panel) or with a different outcome (right panel). Responding is also shown in the absence of its stimulus (ITI). From Colwill (1993b). © 1993. The Psychonomic Society, Inc. [Plot labels: SAME, DIFF. Horizontal axis: Trials.]
c. Nature of the S-O Association. Adherents of two-process theories would find neither the conclusion that an S+ encodes the identity of the instrumental outcome nor the conclusion that an intact S-O association is necessary for an S+ to produce its response particularly remarkable. In the first place, there is good evidence that Pavlovian cues develop associations with representations that are rich in detail about their consequences (Colwill & Motzkin, 1993; Rescorla, 1980). In the second place, eliminating the S-O association would be expected to deprive the S+ of the ability to produce either the motivational (Rescorla & Solomon, 1967) or stimulus support (Trapold & Overmier, 1972) for responding. However, attempts to show empirically that a discriminative stimulus is in essence a Pavlovian stimulus have yielded somewhat conflicting results (Colwill & Rescorla, 1986). The work that I describe next illustrates the difficulties in deciding this issue.
One strategy that has frequently been used to analyze S+s is to examine the degree to which their effects can be mimicked by Pavlovian CS+s. To the degree that the two types of stimuli are interchangeable, we have evidence consistent with the view that they share a common associative structure. There is general agreement that the transfer of CS+s to instrumental responses is outcome specific (Colwill & Motzkin, 1993; Colwill & Rescorla, 1988a; Kruse, Overmier, Konz, & Rokke, 1983). However, other features of the pattern of transfer with CS+s do not resemble those obtained with S+s. Colwill and Motzkin (1993) gave rats training with two Pavlovian cues (a noise and a light) and two outcomes (food pellets and sucrose liquid). Each CS was uniquely paired with one of the outcomes. Then, two different responses were given standard training in
preparation for a transfer test. One of these responses earned food pellets and the other response earned sucrose liquid. The results of the transfer test showed that the effect of a CS on a response was outcome specific: same responses were significantly more frequent than different responses. In this regard, the transfer of both an S+ and a CS+ seems to be mediated by shared association with a common outcome. However, a potentially important difference appears if these stimulus effects are compared with the level of responding during the ITI. Whereas the effect of an S+ is to increase performance of the same response and leave unaffected performance of the different response, a CS+ has no impact on the same response but depresses performance of the different response.
Colwill and Rescorla (1988a) discussed at length the constraints imposed by this pattern of results for answering the question of whether an instrumental S+ is simply a Pavlovian CS+. One major concern centered on the potential for differential contributions of a competing response during CS+ and S+ presentations that might have served to reduce the overall level of transfer responses during CS+. Whereas the CS+ would continue to evoke its Pavlovian response (e.g., approaching the food magazine) during testing, the absence of the manipulandum used to train the S+ prevented occurrences of a behavior that might have interfered with performance of the transfer responses. Thus, a CS+ and an S+ might differ only in the degree to which they elicit competing responses during testing. This argument may turn out to be intractable to empirical evaluation. To control for differences in the likelihood of competing responses, transfer might be measured in the presence of either a compound of an S+ for O1 and a CS+ for O2, or with a stimulus simultaneously trained as an S+ for O1 and a CS+ for O2. According to a competing response argument, the promotion of the same response by S+ would be reduced.
But if the original patterns of transfer represent genuine differences in the action of CS+ and S+, any combination of an S+ for O1 with a CS+ for O2 would lead to cancellation of their independent effects on a transfer response trained with O1. Thus, promotion of the same response should be reduced.
An alternative strategy that may prove more promising in illuminating the relationship between an S+ and a CS+ is to inspect the range of operations that lead to extinction of the instrumental S-O association. If treatments that are known to extinguish Pavlovian associations are found to be ineffective in eliminating the ability of an S+ to control its response, we would have some justification for making the claim that S+ is fundamentally different from CS+. It is well known that nonreinforced presentations of a CS+ lead to a decline in conditioned responding (Annau & Kamin, 1961; Pavlov, 1927) and that a stimulus that is negatively correlated
with an outcome develops the ability to suppress responding to a Pavlovian excitor (Rescorla, 1969) or to elicit a withdrawal response (Hearst & Franklin, 1977). Yet, we have found neither of these operations to reduce the ability of an S+ to produce its instrumental response. For example, Colwill (1993b, Experiment 2) gave rats training identical to that used to produce the data in Fig. 5 with the following exception. Instead of training the former S+ as S- for another response, the S+s were simply negatively correlated with one of the outcomes used for S+ training. The results of testing those stimuli with their original responses are shown in Fig. 6. Responding is separated according to whether S+ was negatively correlated with the same outcome used for its training (left panel) or with a different outcome (right panel). It is clear that there was no impairment in the ability of an S+ to produce its response despite a history of being negatively correlated with its original outcome. This finding indicates that the disruption of S+ control shown in Fig. 5 was not mediated by the negative correlation between the stimulus and the outcome intrinsic to S- training. Corroboration that this procedure of negatively correlating a stimulus and its outcome undermines the
Fig. 6. Effect on an instrumental response of negatively correlating its discriminative stimulus with either the same outcome (left panel) or a different outcome (right panel). Responding is also shown in the absence of its stimulus (ITI). From Colwill (1993b). © 1993. The Psychonomic Society, Inc. [Legend: SAME, DIFF, ITI. Horizontal axis: Trials.]
transfer of Pavlovian CS+s to instrumental responses would obviously strengthen the argument for a separation in the associative structures of S+ and CS+.
2. S- Procedures
a. Transfer. Traditional explanations for why animals do not respond in the presence of an S- were constrained by the belief that the only association learned in instrumental training was a stimulus-response connection. Because this S-R association was thought to be strengthened only when a reinforcer followed the response, accounts of the effects of an S- admitted just two possibilities. An S- failed to evoke the instrumental response either because it lacked an association with that response (Capaldi, 1970; Thorndike, 1911) or because it elicited an incompatible response (Guthrie, 1952; Hull, 1943; Thorndike, 1932). Criticism of classical S-R theory and the subsequent evolution of two-process theories permitted realization of a third possibility. As more theorists came to adopt the position that instrumental discriminative stimuli developed Pavlovian associations with the outcome, it seemed reasonable enough to suppose that the action of an S- was to be understood in terms of its Pavlovian analogue, conditioned inhibition (Rescorla & Solomon, 1967; Rilling, 1977). A principal characteristic of this approach was to regard the effects of omitting different outcomes as distinguishable only in terms of their global motivational valence (Dickinson, 1980; Dickinson & Dearing, 1979; Dickinson & Pearce, 1977). Thus, the omission of different events within the same affective category (e.g., food pellets and sucrose liquid) would be expected to have identical consequences for behavior. Support for this view has come from studies showing that Pavlovian inhibitors transfer to stimuli trained with different outcomes belonging to the same motivational class (Nieto, 1984; Pearce, Montgomery, & Dickinson, 1981).
However, the evidence that animals encode very detailed representations of those events when they occur offers more than adequate justification for questioning the suitability of accounts that ignore the potential for learning about the detailed identity of the omitted outcome.
To explore the issue of whether specific information about the omitted outcome is encoded by an S-, I examined the role of outcome identity in the transfer of S-s to new instrumental responses. If an S- signals that a particular outcome is locally unavailable, performance of any response trained with that outcome ought to be suppressed by that S-. In one study (Colwill, 1991, Experiment 1), two stimuli were trained as signals for the nonreinforcement of different instrumental responses. S1 signaled that R1 would
not be followed by pellets, and S2 signaled that R2 would not be followed by sucrose. Then, these stimuli were presented with two other responses, R3 trained with pellets and R4 trained with sucrose. It was expected that each S- would suppress the response trained with the outcome whose omission was signaled by the S-. In other words, S1 should suppress performance of R3, and S2 should suppress performance of R4.
The data from the transfer test are shown in Fig. 7. As predicted, an S- better suppressed the response trained with the outcome whose omission S- had been trained to predict. The outcome-dependent component of this transfer indicates that the subjects had learned which particular outcome would be omitted during that S-. Figure 7 also shows that an S- suppressed the different response relative to the ITI. This nonspecific transfer may be mediated by the shared features of omitted outcomes (e.g., frustration), by a competing response as anticipated by some of the traditional accounts of extinction, or by nonassociative factors (Colwill, 1991). In any event, the outcome-dependent transfer of S-s points to the presence of an inhibitory association between an S- and the instrumental outcome (S-O). At least two predictions that follow from this viewpoint have been confirmed empirically. First, Bonardi (1989) and Colwill (1991)
Fig. 7. Transfer test of negative discriminative stimuli (S-s) and instrumental responses. Responding is shown during a stimulus that signaled the omission of the same outcome (filled circles) or a different outcome (open circles), and during the ITI (triangles) when no stimuli were present. From Colwill (1991). © 1991. The Psychonomic Society, Inc. [Legend: SAME, DIFF, ITI. Horizontal axis: Blocks of two trials.]
found that if a previously omitted outcome were subsequently made contingent on a response in the presence of the S-, acquisition of S+ control was slower than if some other outcome were used. Second, Colwill (1991) identified a condition where knowledge that an event would be omitted led to an elevation of responding. Performance of a response that had earned two different outcomes, one of which was subsequently devalued, was increased during presentations of an S- for the devalued outcome.
b. Characteristics of Inhibitory S-O Associations. Previous reports describing the outcome specificity of instrumental inhibitors have been based on investigations of stimuli that signal the omission of the response-contingent outcome. However, two reasons make it of interest to examine the transfer properties of stimuli that signal reductions in the frequency of the instrumental outcome. First, it offers an opportunity to appraise the details of the subject’s encoding of the outcome. By varying the reinforcement history of the transfer responses used in testing, it becomes possible to separate learning only about the identity of the outcome from learning about the actual frequency of that outcome. Second, the role of Pavlovian conditioned inhibition can be evaluated by comparing the transfer of stimuli that signal fewer response-contingent reinforcers with the transfer of stimuli that signal the same reduction in outcome frequency but whose reinforcers are delivered noncontingently.
i. Precision of outcome encoding. If presentations of a stimulus accompany a reduction in the density of the VI reinforcement schedule associated with an instrumental response, there are at least two pieces of information that such a stimulus could come to convey. On the one hand, the stimulus might simply signal a decrease in the availability of the outcome; the magnitude of that decrease would be determined by the difference between the rate of reinforcement during the stimulus and in its absence. On the other hand, the stimulus might signal the absolute frequency of the outcome. In this case, the subject might compare the value of the VI schedule pertaining to that stimulus with the background rate, and reduce its response during the stimulus only when the stimulus signals a rate reduction. These two accounts can be distinguished by examining transfer to responses trained with different densities of reinforcement.
If the rate of reinforcement for the transfer response is greater than that signaled by the S-, both accounts predict that responding will be depressed. However, if the transfer response has been trained on the same VI schedule as that associated with the S-, only the first account predicts that responding will be suppressed. In fact, based on transfer studies of S+s, the view that the local reinforcement rate is encoded by a stimulus anticipates an increase in responding during the stimulus.
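The contrast between the two accounts can be laid out as a small decision rule. The sketch below is only an illustration of the logic of the preceding paragraphs: the function names and the account labels are hypothetical, a VI schedule is summarized by its nominal rate (60 divided by the mean interval in seconds), and the possibility of an increase in responding under the second account is collapsed into "no suppression".

```python
def vi_rate_per_min(vi_seconds):
    """Nominal reinforcers per minute for a VI schedule with the given mean interval."""
    return 60.0 / vi_seconds


def predict_transfer(signaled_vi_s, background_vi_s, transfer_vi_s):
    """Predicted effect of the stimulus on a transfer response under each account."""
    signaled = vi_rate_per_min(signaled_vi_s)
    background = vi_rate_per_min(background_vi_s)
    transfer = vi_rate_per_min(transfer_vi_s)
    return {
        # Account 1: the stimulus signals a decrease relative to its own background.
        "relative_decrease": "suppress" if signaled < background else "no suppression",
        # Account 2: the stimulus signals an absolute rate, which is compared
        # with the rate that maintained the transfer response.
        "absolute_frequency": "suppress" if signaled < transfer else "no suppression",
    }


# Stimulus signaling VI 240 sec against a VI 30-sec background:
rich = predict_transfer(240, 30, 60)    # transfer response trained on a richer schedule
lean = predict_transfer(240, 30, 240)   # transfer response trained on the same schedule
```

With a richer transfer schedule both entries read "suppress"; with a transfer schedule equal to the signaled one, only the relative-decrease account still predicts suppression, which is the case that separates the two views.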
The next two experiments investigated the transfer properties of stimuli that signaled a reduction in the relative frequency of reinforcement. The first study verified that such stimuli would differentially suppress transfer responses trained on a rich reinforcement schedule. The second study examined the transfer of such stimuli to responses trained on a lean reinforcement schedule. In both studies, two stimuli (tone and steady light) were trained each with a different response (nose poke or handle pull) and a different outcome (food pellets or sucrose liquid). Each response was reinforced on a VI 240-sec schedule during 30-sec presentations of its stimulus, but on a VI 30-sec schedule during the 30-sec ITI. The basic design of each experiment is shown in Table III. Both experiments tested the effects of these stimuli on two different responses (lever and chain), one trained with pellets and one trained with sucrose. In one study, the transfer responses were trained on a VI 60-sec schedule; in the other study, the transfer responses were trained on a VI 240-sec schedule. The standard procedure was used to train the transfer responses on a VI 60-sec schedule. The procedure for training the transfer responses on a VI 240-sec schedule deviated from that standard in the following ways. First, a 10% polycose solution was used to reinforce lever and chain responses during initial CRF training and for two 20-min sessions of VI 60-sec training. Each response was then trained on a VI 240-sec schedule for one 20-min session with its unique outcome (i.e., pellets for one response and sucrose for the other response). Eight 20-min sessions of VI 240-sec training were then given with both responses available within a session and each reinforced with its unique outcome. In both studies, the effect of the S-s on these responses was tested in extinction. In the
TABLE III
BASIC DESIGN OF TRANSFER EXPERIMENTS WITH SIGNALS FOR REDUCTION IN REINFORCEMENT RATE

Training                              Transfer           Test
R1-O1 (VI 30), S1: R1-O1 (VI 240)     R3-O1 (VI 60)      S1: R3 v R4
R2-O2 (VI 30), S2: R2-O2 (VI 240)     R4-O2 (VI 60)      S2: R3 v R4

R1-O1 (VI 30), S1: R1-O1 (VI 240)     R3-O1 (VI 240)     S1: R3 v R4
R2-O2 (VI 30), S2: R2-O2 (VI 240)     R4-O2 (VI 240)     S2: R3 v R4

Note. S1 and S2 are instrumental discriminative stimuli, tone and light. R1, R2, R3, and R4 are instrumental responses, lever pressing, chain pulling, nose poking, and handle pulling, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; VI indicates variable interval reinforcement schedule.
extinction test, both lever and chain were available and eight presentations each of the tone and light stimuli were delivered periodically. The test results from the first study are shown in Fig. 8. In the presence of a stimulus that had signaled a shift from a rich VI to a lean VI reinforcement schedule, performance of a response trained on a rich VI schedule with the same outcome was suppressed. That suppression was evident when same responses were compared both with different responses and with the ITI rate. The outcome-specific component of this transfer replicates the pattern of transfer observed with an S- conventionally trained to signal the omission of a response-contingent outcome. In this experiment, however, there was no evidence of any general disruption of responding by a signal for a reduction in reinforcement frequency: Different responses did not vary from the ITI rate. The test data from the second study in which the transfer responses had been trained on a lean VI schedule followed a pattern more like that shown in Fig. 7. The immediate effect of a stimulus that signaled a decrease in the frequency of reinforcement was to suppress another response trained with the same outcome slightly more than a response trained with a different outcome (4.6 and 5.6 responses per min, respectively, on the
[Figure 8 plot. Panel title: VI 30/VI 240. Legend: SAME, DIFF, ITI. X-axis: blocks of two trials.]
Fig. 8. Transfer test showing effect of a stimulus correlated with a shift from a VI 30-sec to a VI 240-sec schedule of reinforcement on responses trained on a VI 60-sec schedule. Responding is plotted separately during the stimulus trained with the same outcome (filled circles) or with a different outcome (open circles), and in the absence of stimuli (ITI).
first two test trials). However, performance of both responses during the stimulus presentations was reduced relative to their ITI rate (6.5 responses per min on the first two test trials), although that difference was only significant for the same response. These two studies reveal substantial consistency in the pattern of transfer to other responses of a stimulus trained as a signal for a reduction in the rate at which a response will be reinforced. Such stimuli preferentially suppressed other responses trained with the same outcome regardless of the rate at which they had earned that outcome. This result implies that an S- develops a simple inhibitory association with the instrumental outcome.
Experiments analogous to the ones reported here have been carried out on Pavlovian conditioned inhibitors (CI) that signal a reduction in the magnitude of an unconditioned stimulus associated with a CS+. Specifically, those studies have been concerned with the associative connections of a stimulus followed by a mild shock unconditioned stimulus (US) in the presence of a cue that separately signals a strong shock US. There is universal agreement that such a stimulus reduces responding to another signal trained with the strong shock US. However, the results of testing that stimulus with a signal trained with the weak shock US have been more controversial. Whereas some investigators have found no transfer of inhibition (Cotton, Goodall, & Mackintosh, 1982; Mackintosh & Cotton, 1985), others have obtained successful transfer (Wagner, Mazur, Donegan, & Pfautz, 1980). The potential discrepancy between these results and those we have found for instrumental S-s may reflect a fundamental difference in the associative structures of instrumental and Pavlovian inhibitors. On the other hand, it may be the case that, unlike variations in the intensity of an event, manipulations of the frequency of an event are not encoded as sensory features of the outcome representation.
ii. Role of Pavlovian conditioned inhibition. Traditionally, support for an explanation of the suppressive effects of an S- in terms of Pavlovian conditioned inhibition has been drawn from studies showing that Pavlovian inhibitors suppress instrumentally trained responses (Rescorla & LoLordo, 1965; Weisman & Litner, 1969). Arguments against this view have made much of two characteristic differences in the suppression produced by a Pavlovian CI and that produced by an instrumental S-. First, as previously noted, Pavlovian CIs appear to transfer across signals trained with different outcomes belonging to the same motivational class (Nieto, 1984; Pearce et al., 1981). Thus, Pavlovian CIs fail to demonstrate the kind of outcome specificity in their transfer that has been obtained with instrumental S-s. Second, direct comparisons of the magnitude of CI and S- have typically revealed that a conditioned inhibitor is less effective
and may sometimes be completely ineffective in reducing instrumental responding (Bonardi, 1988; Gutman & Maier, 1978). Neither of these arguments is especially convincing. In the first place, Kruse et al. (1983) detected an outcome-specific element in the effect of Pavlovian CIs on instrumental performance. In the second place, Colwill (1991) has argued that comparisons of the magnitude of transfer are biased in favor of an S-. Because an S- develops an inhibitory connection with its original response, generalization from that association can contribute to its ability to suppress another response. That source of generalization is not available to a Pavlovian CI. The following experiment used an alternative strategy to evaluate the role of Pavlovian conditioned inhibition in the suppressive effects of an S-. The intention was to dissociate the Pavlovian from the instrumental contingencies embedded in training a stimulus to signal a reduction in rate of reinforcement. To this end, a stimulus was correlated with a reduction in the frequency of an outcome by arranging for responding to be reinforced during the ITI but not during presentations of the stimulus. When the stimulus was presented, the outcome was delivered noncontingently but at a substantially lower rate than its frequency during the ITI. If Pavlovian conditioned inhibition underlies the transfer obtained with an instrumental S-, a stimulus trained in this procedure should demonstrate outcome-dependent suppression of another instrumental response. The design of this study was modeled after that used to produce the data shown in Fig. 8. Two stimuli (tone and steady light) were trained each with a different response (nose poke or handle pull) and a different outcome (food pellets or sucrose liquid). Each response earned its outcome on a VI 30-sec schedule during the ITI. Responses went unrewarded during stimulus presentations but outcomes were delivered on a VT 240-sec schedule.
Thus, the stimulus was correlated with the same reduction in density of outcome deliveries as the S- trained for the test shown in Fig. 8. All subjects were then trained to lever press and chain pull; one response earned pellets and the other earned sucrose. Preparation of these responses for a transfer test followed the standard training procedure. The results of that test are shown in Fig. 9. The pattern of responding in the presence of the stimuli bears no resemblance to the data obtained with explicitly trained S-s. Rather than depressing the response trained with the same outcome, a stimulus left that response unaffected and instead suppressed performance of the different response, at least at the outset of testing. This reversal offers persuasive evidence that the present training procedure did not simply generate a weaker form of the learning produced by explicit S- training. In fact, the results plotted in Fig. 9 most resemble those obtained with the transfer of Pavlovian CS+s. That
[Figure 9 plot. Panel title: VI 30/VT 240. Legend: SAME, DIFF, ITI. X-axis: blocks of two trials.]
Fig. 9. Transfer test showing effect of a stimulus correlated with a shift from a VI 30-sec to a VT 240-sec schedule of reinforcement on responses trained on a VI 60-sec schedule. Responding is plotted separately during the stimulus trained with the same outcome (filled circles) or with a different outcome (open circles), and in the absence of stimuli (ITI).
similarity suggests that the stimuli in the present experiment developed simple Pavlovian associations with their noncontingent outcomes. These results imply that an instrumental S- is not merely a more powerful duplicate of a Pavlovian CI but rather possesses a unique associative structure that cannot be reduced to that produced by Pavlovian contingencies.
3. Conclusion
The transfer technique has provided compelling evidence that an S+ and an S- provide information about the identity of the outcome whose occurrence or nonoccurrence is signaled by their respective presentations. This information allows these stimuli to control the performance of other responses trained with those outcomes. In neither case does it appear reasonable to attribute that learning to the operation of Pavlovian conditioning processes. Instead, it seems more appropriate to appeal to the idea that instrumental cues modulate the threshold for activation of the outcome representation. Whereas an S+ reduces this threshold, thus making it easier to excite the outcome representation, an S- raises that threshold, thus making it more difficult to excite the outcome representation.
In the case of an S+, this mechanism appears wholly satisfactory in accounting for the occurrence of the original response.
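The threshold-modulation idea can be made concrete with a toy sketch. What follows is only an illustration under stated assumptions, not a model proposed in the chapter: the numeric threshold, the size of the shifts, and the function names are all hypothetical. It shows a cue shifting the activation threshold only for the outcome it was trained with, which is one simple way to capture the outcome specificity of transfer.

```python
# Illustrative sketch only: every number and name here is hypothetical.
BASE_THRESHOLD = 1.0                    # arbitrary units
CUE_SHIFT = {"S+": -0.5, "S-": 0.5}     # S+ lowers, S- raises the threshold


def activation_threshold(outcome, cue_type=None, cue_outcome=None):
    """Threshold for activating `outcome`, given an optional cue.

    The cue shifts the threshold only when it was trained with the
    same outcome (assumed here to model outcome-specific transfer).
    """
    if cue_type is not None and cue_outcome == outcome:
        return BASE_THRESHOLD + CUE_SHIFT[cue_type]
    return BASE_THRESHOLD


def outcome_activated(r_o_strength, outcome, cue_type=None, cue_outcome=None):
    """True if input from the response (via its R-O association)
    reaches the current threshold of the outcome representation."""
    return r_o_strength >= activation_threshold(outcome, cue_type, cue_outcome)
```

For example, with these assumed values a response whose input is 0.8 fails to activate a pellet representation by itself but succeeds during an S+ trained with pellets, whereas an S+ trained with sucrose leaves it unchanged.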
C. S-R ASSOCIATIONS
One of the most pervasive themes in discussions of instrumental behavior has been that discriminative stimuli become associated with their instrumental responses. The function usually attributed to this learned association is an evocative one; in other words, the discriminative stimulus is thought to trigger execution of its instrumental response. Although this view has been favorably represented in theoretical accounts of instrumental learning, there has been little justification for it at an empirical level (Colwill & Rescorla, 1986). An alternative idea that has received rather less attention is that the function of an S-R association is to narrow the pool of response options to those that are immediately available. Execution of a specific response whose representation was retrieved by its discriminative stimulus would then depend on the results of evaluating the consequences associated with making that response. In the following sections, I comment first on the evidence that animals learn about the relation between a stimulus and its reinforced response. Then, I discuss the implications of studies showing that an S- demonstrates both a degree of response specificity and, under some circumstances, insensitivity to the identity of the outcome associated with its nonreinforced response.
1. Association between S+ and R
a. Persistence of Behavior. Attention has often been drawn to circumstances under which behavior persists despite a reduction in the quality of its consequent outcome. Even when rejection of a devalued outcome in a consumption test is complete, the rates of devalued responses remain at a level substantially above zero. An example of this residual responding is shown in Fig. 1. Observations of residual responding have provided the strongest support for an evocative function of an S-R association. Consistent with this interpretation is the fact that more residual responding is obtained in single-response tests than in choice tests (Colwill & Rescorla, 1986). However, Colwill and Rescorla (1986) advised caution in attributing residual behavior to an evocative function of an S-R association, warning that other explanations of residual performance were equally feasible. For instance, it may be that the standard devaluation procedure leaves the functional reinforcer with some residual value. In this way, an R-O association could then support the continued performance of the devalued response. Evidence consistent with this alternative was recently reported by Colwill and Rescorla (1990a, Experiment 2). They employed a reward that was administered directly into the animal’s mouth during both instrumental training and outcome devaluation. Under those conditions, residual responding after outcome devaluation was negligible. Direct administration of a food outcome into the subject’s mouth has two advantages over the more commonly used procedure of delivering the outcome to a food magazine. First, the immediacy of delivery inherent in direct administration of the outcome reduces the potential for events to intervene between execution of the response and consumption of the outcome. In the conventional outcome administration procedure, such intervening events might retain their value after devaluation of the food outcome and thus contribute to residual performance of the devalued response. Second, direct administration of the outcome into an animal’s mouth may improve the efficacy of the devaluation treatment by guaranteeing exposure to the outcome on each conditioning trial. In the conventional procedure, a subject may refuse to consume the outcome long before its value has been fully eliminated, and thus the functional outcome may be left with the capacity to sustain residual performance. Whatever the reason that direct administration of the outcome yields a superior outcome devaluation effect, it raises a serious concern about the use of residual responding as a basis for identifying an evocative S-R connection.
b. Mediation of Transfer. Colwill and Rescorla (1986) were more sympathetic toward the possibility that S-R associations are used to select the various response options whose appropriateness can then be determined by evaluation of their consequences, activated through R-O associations. This hybrid model has the advantage of accounting for the sensitivity of responses to outcome revaluation. Moreover, it can accommodate Colwill’s (1993b) finding that discriminative control is eliminated by training S+ as an S- with the same outcome; the consequence of S- training would be to interfere with activation of the outcome representation by the response. That some form of S-R learning occurs has been inferred from the observation that animals can solve certain kinds of instrumental discriminations (Colwill & Rescorla, 1986; Mackintosh, 1983). What has been lacking, however, is any direct evidence for an S-R association in instrumental learning. The following study was undertaken in an attempt to remedy that situation. It asked the question of whether transfer of a discriminative stimulus to a new response could be mediated by the activation of an outcome through a chain of S-R and R-O associations. Rats were trained with two discriminative stimuli, two responses, and one
outcome (i.e., S1: R1-O1 and S2: R2-O1). For half the animals, handle pulling earned pellets during a light and nose poking earned pellets during a tone; these stimulus-response pairs were switched for the remaining animals. Then, two new responses (lever press and chain pull) were trained, one with sucrose (R3-O2) and one with polycose (R4-O3). During the training of these responses, neither the light nor the tone was presented. Similar training was then conducted with the nose poke and handle pull responses; each response was separately trained, one with sucrose (R1-O2) and one with polycose (R2-O3). Finally, the lever and chain were made available and tested with occasional presentations of the light and tone. What was of interest was whether a stimulus would differentially affect performance of the transfer response trained with the same outcome subsequently associated with the stimulus’ original response. In other words, would S1 selectively affect R3 and S2 selectively affect R4? The results of this study are shown in Fig. 10. Over the course of testing, a differential effect of the stimuli on responding did emerge; the same response was significantly depressed relative to the different response. That is to say, a stimulus whose response had been separately paired with polycose suppressed performance of another response also trained with polycose. Suppression was also evident when a comparison was made
[Figure 10 plot. X-axis: blocks of two trials.]
Fig. 10. Effect of a discriminative stimulus ( S + ) on a response trained with either the same outcome (filled circles) or a different outcome (open circles) that had been paired with the response of the S + after the discrimination training.
with the ITI response rate (3.4 responses per min): Over the entire session, a stimulus suppressed the same response but had no effect on the different response. The selectivity of this effect provides direct evidence of an association between a discriminative stimulus and its original response. The finding that a stimulus suppressed rather than promoted the same response in the transfer test has two important implications for analyses of instrumental learning. First, it is relevant to a decision regarding the relationship between Pavlovian CS+s and instrumental S+s. It is clear that activation of the outcome representation by a response is not adequate to support transfer. That finding suggests a qualitative difference in the way in which discriminative stimuli and instrumental responses operate on the outcome representation. What appears to be important for transfer is not simple excitation or activation of the outcome representation; rather, to promote a response, a stimulus must have the power to lower the threshold for activation of the outcome representation associated with that response. In view of the overwhelming evidence that Pavlovian CS+s and instrumental responses develop identical associations with their outcomes (see Colwill & Rescorla, 1986; Mackintosh, 1983), it seems doubtful that there can be any disagreement about the nonequivalence of Pavlovian CSs and instrumental discriminative stimuli. Second, the present results suggest that, at least under some circumstances, different responses trained with the same outcome may interfere more with each other than with responses trained with different outcomes. This observation has obvious relevance to studies of performance on concurrent schedules (Herrnstein, 1961).
It may also have some relevance for the analysis of instrumental contingency effects in which additional noncontingent presentations of an outcome reduce performance of a response trained with that outcome but not that of a response trained with a different outcome (see Colwill & Rescorla, 1986). To the degree that the noncontingent presentations of an outcome adventitiously reinforce another behavior, the selective depression of the same response may, in part, be a product of an outcome-dependent response competition mechanism, and not entirely due to a change in the R-O association. Finally, it is worth noting one feature of the present study that may have especially encouraged the development of an S-R association. With the use of a single outcome to establish initial discrimination learning, the conditions may not have been optimal for producing learning about that outcome. It will be important to vary the circumstances under which initial training is carried out to verify the generality of the conclusions that may be drawn from this study with respect to the development of S-R associations in the normal course of instrumental learning.
2. Inhibitory S-R Associations
The quest for evidence of S-R learning has proven far more profitable in situations where stimuli signal that responses will not be followed by their rewarding outcomes. Application of three different techniques has generated evidence showing that an S- exhibits some response specificity and that in suppressing its response an S- appears insensitive to the identity of the outcome used to train that response. Such findings suggest that an S- reduces performance of its original response by specifically suppressing activation of the representation of that response.
a. Conflicting Outcomes in Acquisition. One strategy that has been particularly useful in disclosing the nature of the associations formed in Pavlovian conditioning has been to manipulate the conditions under which that learning is asked to take place. If the consistency between a signal and its outcome is varied systematically, it is possible to draw inferences about the identity of the outcome elements that participate in the association with the signal by examining the rate at which conditioning to the stimulus occurs. If learning is affected by changing a feature from one trial to the next, it suggests that learning about that feature takes place. An elegant illustration of the use of this technique was reported by Rescorla (1980). Using an autoshaping procedure with pigeons, he found superior second-order conditioning to a stimulus that was always followed by the same outcome compared with a stimulus that was followed by two different outcomes. In applying this logic to an analysis of an S-, we predicted that if an S- acts directly on its instrumental response, acquisition of that inhibitory S-R connection should be unaffected by the identity of the outcome that is omitted.
To test this idea, rats were given training with four different discriminative stimuli (noise, tone, a steady light, and an overhead flashing light) designed to establish them concurrently as S+s for one response and S-s for another response. Two stimuli signaled that nose poke responses (R1) would be followed by food pellets and the other two stimuli signaled that nose pokes would be followed by sucrose liquid. One of the pellet stimuli and one of the sucrose stimuli was also trained as a signal that another response, R2 (either lever or chain) would not be followed by pellets; the remaining two stimuli were trained as S-s for a third response, R3 (either chain or lever) trained with the sucrose outcome. The stimulus presentations were 30 sec long separated by a mean ITI of 30 sec duration. Responses were reinforced during the ITI on VI 30-sec schedules. Table IV illustrates the basic design of this experiment. The acquisition rates of S+ and S- control are shown in Fig. 11. There was no evidence of differential suppression of the nonreinforced responses
TABLE IV
BASIC DESIGN OF OUTCOME INFLUENCE ON CONCURRENT S+/S- TRAINING EXPERIMENT

S+/S- training
R2-O1, S1: R1-O1, R2-
R2-O1, S2: R1-O2, R2-
R3-O2, S3: R1-O2, R3-
R3-O2, S4: R1-O1, R3-

Note. R1 denotes the nose poke response; R2 and R3 are instrumental responses, lever press and chain pull, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; - indicates nonreinforcement; S1, S2, S3, and S4 are discriminative stimuli, steady light, flashing light, tone, and noise.
even though the omitted outcome for one response was earned concurrently by another response. But manipulation of the identity relation between the earned and the omitted outcome was not without effect on behavior. Acquisition of the reinforced response was at a disadvantage when the stimulus concurrently signaled the omission of its outcome for
[Figure 11 plot. Legend: SAME, DIFF. X-axis: blocks of three sessions.]
Fig. 11. Acquisition of discriminative control by a stimulus trained concurrently as a signal for the reinforcement of one response (S+) and for the nonreinforcement of a different response (S-). The outcome following the reinforced response was either the same as (filled circles) or different from (open circles) the outcome withheld for the nonreinforced response.
a different response. This finding extends the results depicted in Fig. 5, showing that prior S+ control was undermined by subsequent S- training with the same outcome. This experiment provides evidence that the operation of nonreinforcing a response during a stimulus has two consequences. First, it strengthens an inhibitory association between that stimulus and the response. Second, it produces an inhibitory connection between that stimulus and the representation of the outcome that was forfeited. This inhibitory S-O association is antagonistic to that generated between an S+ and the earned outcome.
b. Summation and Retardation. Another technique for determining the identity of the elements participating in an association with a stimulus is to combine different stimuli that vary in the degree to which they have overlapping associates (Miller & Price, 1974). The general idea is that summation of two stimuli signaling the same information will be different from that of two stimuli signaling different information. In applying this technique to the detection of associations between a stimulus and its nonreinforced response, Jennifer Richeson and I expected that a response would be depressed more in the presence of a compound containing its original S- and another stimulus also trained as an S- for that response than in a compound containing its original S- and an S- for a different response. Table V illustrates the basic design of our experiment to assess the presence of inhibitory connections between an S- and its nonreinforced response. A light was established as a signal for the nonreinforcement of two responses (lever and chain) that were otherwise reinforced with the same outcome (e.g., pellets). Each of these responses was also nonreinforced during another stimulus; one response (R1) was nonreinforced
TABLE V
BASIC DESIGN OF S- SUMMATION EXPERIMENT

S- training          Test                  Predictions
R1-O1, N: R1-        LN: R1-O1, R2-O1      (LN: R1 < R2)
R2-O1, T: R2-        LT: R1-O1, R2-O1      (LT: R1 > R2)
L: R1-, R2-

R3-O2, N: R3-        LN: R3-O2, R4-O2      (LN: R3 < R4)
R4-O2, T: R4-        LT: R3-O2, R4-O2      (LT: R3 > R4)
L: R3-, R4-

Note. R1, R2, R3, and R4 are instrumental responses, lever pressing, chain pulling, nose poking, and handle pulling, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; - indicates nonreinforcement; L, T, and N are discriminative stimuli, light, tone, and noise, respectively.
during a noise and the other response (R2) was nonreinforced during a tone. Parallel training of these stimuli with two other responses (nose poke and handle pull) and a different outcome (e.g., sucrose) was also conducted. For all four responses, outcomes were earned on a VI 30-sec schedule in the absence of stimulus presentations. Finally, summation tests were given with each pair of responses (lever and chain in the first test series; nose poke and handle pull in the second test series) and occasional presentations of a light-noise (LN) compound and a light-tone compound (LT). In this way, the elements of one compound stimulus (LN) had separately signaled the nonreinforcement of one response (R1 or R3), and the elements of the other compound stimulus had signaled the nonreinforcement of the other response (R2 or R4). We were concerned that several factors might obscure our ability to detect differential summation. For instance, both nonspecific suppression and suppression produced by the inhibitory S-O association observed in transfer tests with S-s might contribute to a floor effect. Consequently, responses were rewarded on a VI 30-sec schedule with their original training outcomes in the presence of the compound stimuli during summation testing. Any difference in the rate at which the compounds developed control over R1 and R2 (or R3 and R4) could not be attributed to outcome associations because the relevant comparison was between responses trained with the same outcome. Consequently, if acquisition of R1 were slower than R2 during the LN compound, and acquisition of R2 were slower than R1 during the LT compound, we would have evidence for the presence of specific inhibitory S-R associations. These predictions were confirmed during testing of R1 and R2.
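The logic behind these predictions can be sketched by counting, for each response, how many elements of a compound had been trained as an S- for it. This is a deliberately simplified illustration and not the authors' analysis: the assumption that suppression is ordered by this count, and the names `S_MINUS_FOR`, `inhibitor_count`, and `predicted_order`, are hypothetical. The training assignments themselves follow the first row of Table V.

```python
# Which responses each stimulus was trained as an S- for (Table V, R1/R2 set).
S_MINUS_FOR = {
    "L": {"R1", "R2"},  # light: S- for both responses
    "N": {"R1"},        # noise: S- for R1 only
    "T": {"R2"},        # tone: S- for R2 only
}


def inhibitor_count(compound, response):
    """Number of elements in `compound` (e.g. "LN") trained as an S- for `response`."""
    return sum(response in S_MINUS_FOR[element] for element in compound)


def predicted_order(compound, first="R1", second="R2"):
    """Predicted ordering of response rates during `compound`.

    Assumption (hypothetical simplification): the response with more
    inhibitory S-R elements in the compound is performed less.
    """
    a = inhibitor_count(compound, first)
    b = inhibitor_count(compound, second)
    if a > b:
        return f"{first} < {second}"
    if a < b:
        return f"{first} > {second}"
    return f"{first} = {second}"
```

Under this assumption, `predicted_order("LN")` gives `"R1 < R2"` and `predicted_order("LT")` gives `"R1 > R2"`, matching the Predictions column of Table V. Because both responses share the same outcome, an account based only on S-O associations would predict no difference between them.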
Although performance of both responses was negligible during the initial compound stimulus presentations, a significant difference emerged in that the probability of a response was reduced when both stimulus elements signaled its nonreinforcement (3.6 and 4.4 responses per min for the same and different responses, respectively). A similar pattern emerged during testing of R3 and R4, but the difference was marginally nonsignificant. These results provide additional support for the conclusion that an inhibitory association develops between an S- and its nonreinforced response.
c. Resistance to Manipulations of the S-O Relation. The third result that implicates the presence of inhibitory S-R associations is the finding that an S- will continue to suppress its original response following retraining as a signal for the reinforcement of another response with the same outcome. Thus, even though retraining the S- as an S+ results in the stimulus signaling the availability of the instrumental outcome, that S- nevertheless continues to inhibit performance of the response whose nonreinforcement
it has predicted in the past (Colwill, 1991, 1993a). This finding is illustrated by the following experiment reported by Colwill (1993a). Two stimuli (a noise and a light) were initially trained as S-s for two different responses (nose poke and handle pull). Each S- signaled that one response would not be followed by pellets and that the other response would not be followed by sucrose. Then, both S-s were trained as signals for the reinforcement of a third response (displacement of a joystick, R3). One S- signaled that R3 would be followed by pellets and the other S- signaled that R3 would be followed by sucrose. In this way, each stimulus now signaled the availability of one of the two outcomes used for its S- training. However, this treatment had no differential effect on the ability of those S-s to suppress their original responses (5.2 responses per min for the response whose outcome was now signaled by an S-, and 4.6 responses per min for the other response). To verify that S+ training with R3 had in fact established a viable S-O association, a transfer test was conducted with two other responses (lever press and chain pull), one trained with pellets and one with sucrose. In this transfer test, each S- promoted performance of the response associated with the outcome used for training that S- as an S+ (5.1 and 3.7 responses per min for the same and different response, respectively). In other words, the pellet-trained response was selectively promoted by the S- whose S+ training was carried out with pellets, and the sucrose-trained response was enhanced by the S- trained as an S+ for sucrose. Thus, despite having the ability to promote performance of another response trained with the same outcome as one of its originally nonreinforced responses, an S- continued to suppress that original response.
d. Conclusions. In summary, these three lines of evidence reveal that outcome identity is inconsequential to the development of a stimulus’ ability to suppress its nonreinforced instrumental response. The most straightforward interpretation of these data is that animals learn what response not to make during an S-, an instruction that is readily coded by an inhibitory S-R association. Although that learning obviously does not account for the outcome-dependent component of transfer that we have routinely observed, it may contribute to the nonspecific transfer to other responses sometimes obtained with an S-.
D. ASSOCIATION BETWEEN O AND R
In assessing the presence of S-R associations in instrumental learning, the preceding discussion concentrated on an analysis of the acquired properties of the explicit stimulus, and deliberately ignored the potential role that the outcome might play in evoking the response. However,
there have been significant developments of the idea that the instrumental outcome may serve as a stimulus for the response. This proposition has surfaced in two quite different guises (Linwick, Overmier, Peterson, & Mertens, 1988; Peterson, Linwick, & Overmier, 1987; Wagner, 1981). On the one hand, it has been claimed that the response becomes associated with the memory or trace of a preceding outcome. Evidence for this viewpoint has been drawn primarily from two sources: discrete trial experiments showing that behavior on one trial is guided by the outcome event occurring on the preceding trial (Capaldi, 1967); and free operant studies showing that extinguished responding may be reinstated by the delivery of noncontingent outcomes (Reid, 1957; Rescorla & Skucy, 1974). On the other hand, it has been argued that an instrumental response becomes attached to the expectancy of the instrumental outcome (Asratyan, 1974; Pavlov, 1932; Trapold & Overmier, 1972). Trapold and Overmier (1972) proposed that a Pavlovian association develops between S and O during instrumental training. As a result, S develops the ability to elicit an expectancy or anticipation of O on subsequent trials. Because the response is reinforced in the presence of this expectancy, the expectancy acquires stimulus control over the response. What is intriguing about this suggestion is that it provides a way for behavior to be sensitive to changes in the value of the outcome even though that behavior is mediated by an evocative S-R association. Because changing the value of the outcome alters the expectancy activated by S, some of the stimulus support for the response is removed, with the result that performance is reduced.
1. Association between Outcome Presentation and Response
A striking phenomenon associated with simple extinction of an instrumental response is the restoration or reinstatement of that extinguished response by the reintroduction of its training outcome on a VT schedule (Baker, Steinwald, & Bouton, 1991; Franks & Lattal, 1976; Reid, 1957; Rescorla & Skucy, 1974; Skinner, 1938). Several authors have suggested that this reinstatement effect is evidence of the stimulus control acquired by the outcome over the response during the original training. Because the response is reinforced in a context of previously earned outcomes, the opportunity exists for an outcome presentation to develop an association with the response. Quite a different explanation, however, has been developed for a related phenomenon observed in Pavlovian conditioning in which presentations of the US lead to recovery of extinguished Pavlovian conditioned responses (Bouton & Bolles, 1979; Bouton & Peck, 1989; Rescorla & Heth, 1975; Schachtman, Brown, & Miller, 1985). In this case, it has been proposed that the function of the US presentations is to restore the integrity of the
outcome representation whose decay contributed nonassociatively to a decline in conditioned responding. According to this account, outcome presentations might reinstate an extinguished instrumental response because they restore the integrity of the representation of the outcome that used to follow that response. One way to separate these two accounts is to examine the degree to which reinstatement of an extinguished instrumental response depends on presentation of the outcome used to train that response. Concurrent training of two instrumental responses, each with a different outcome, guarantees unique R-O relations (R1-O1 and R2-O2), but nonunique O-R connections (O1-R1, O1-R2, O2-R1, and O2-R2). If outcome presentations evoke their responses, then deliveries of O1 and O2 will be equally effective in reinstating performance of either R1 or R2. However, if restoration of depressed outcome representations underlies the reinstatement effect, presentations of O1 should selectively reinstate R1, whereas presentations of O2 should preferentially reinstate R2. Mark Arrigan and I explored these predictions in the following experiment. We trained rats to lever press for one rewarding outcome (O1) and to chain pull for a different rewarding outcome (O2) during a light S+. There were 16 sessions of concurrent training in which lever press and chain pull responses during the light were reinforced on independent VI 60-sec schedules. Both responses were then extinguished during the light for 20 sessions. These extinction sessions were identical to the training sessions except that reinforcers were not available. Finally, a reinstatement test was conducted in which there were eight presentations of one of the outcomes delivered noncontingently at random intervals during four of the eight light presentations. After another session of extinction, the reinstatement test was repeated with the other outcome.
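The contrasting predictions for this concurrent design can be spelled out in a minimal Python sketch. The dictionary encoding and the function names are illustrative assumptions, not part of the original experiment; the sketch only enumerates which responses each account expects a given outcome presentation to reinstate.

```python
# Hypothetical encoding of the concurrent design: response -> outcome it earned.
TRAINING = {"R1": "O1", "R2": "O2"}


def reinstated_if_outcome_evokes_response(presented, training):
    """O-R account: under concurrent training every earned outcome can
    precede either response, so each outcome is linked to both responses."""
    if presented not in set(training.values()):
        return set()
    return set(training)


def reinstated_if_representation_restored(presented, training):
    """Restoration account: only the response whose own outcome
    representation is restored should recover."""
    return {r for r, o in training.items() if o == presented}


for outcome in ("O1", "O2"):
    print(
        outcome,
        sorted(reinstated_if_outcome_evokes_response(outcome, TRAINING)),
        sorted(reinstated_if_representation_restored(outcome, TRAINING)),
    )
```

On this encoding, the evocative O-R account lists both responses for either outcome, whereas the restoration account lists only the response that earned the presented outcome.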
The results of these reinstatement tests are summarized on the left side of Fig. 12. It is clear that there was no selective enhancement of a response by presentations of its consequent outcome (S) compared with presentations of a different outcome (D). That reinstatement was obtained is indicated by comparisons of the stimulus rates in these tests to that obtained in the final extinction session (.44 responses per min). A similar comparison of the ITI rates (.45 responses per min during the final extinction test) revealed that outcome presentations did not promote ITI responding. These results suggest that reinstatement occurred but that it was not mediated by the R-O associations. The finding that the magnitude of reinstatement was unaffected by the identity of the consequent outcome is consistent with an evocative O-R association. But it also admits another less interesting possibility in terms of a mechanism of disinhibition. Pavlov (1927) reported that responding to an extinguished CS recovered after a
Fig. 12. Effect of outcome presentations on extinguished instrumental responses, separated according to whether the presented outcome was the same as (dark bars) or different from (blank bar) the outcome used to establish the instrumental response. Responses that had been concurrently trained and extinguished during a light S+ are shown during the light S+ (far left) and during the ITI (center). Responses that had been separately trained in a free operant procedure are shown on the right.
[Figure: panels CONCURRENT (S+, ITI) and SEPARATELY TRAINED; bars S and D in each panel; vertical axis, mean responses per minute.]
presentation of a novel event. He argued that the novel stimulus interfered with an inhibitory process responsible for depressing performance. An explanation in terms of disinhibition for the present results is ruled out by the following experiment in which two responses were trained in separate sessions. One response was trained with pellets and the other with sucrose liquid. In this way, each response had unique O-R and R-O associations. Both responses were then extinguished for several sessions. Finally, reinstatement tests with each outcome were administered. The results of these tests are shown on the far right of Fig. 12. Performance of an extinguished response was significantly increased by presentations of its training outcome but not by presentations of the outcome used to train another response. This selectivity is important for dismissing a disinhibition account of reinstatement; such an account would have predicted equivalent elevation of both responses. On the basis of these results, it is clear that outcome presentations acquire some sort of discriminative control over instrumental responding. However, the current data do not reveal the nature of the mechanism underlying that control. Rather than serving to evoke the response directly, the function of an outcome presentation may be to reestablish an expectation of that outcome which is then responsible for reinstating
responding. This possibility predicts that presentation of an outcome should have the same effect on an extinguished response as a presentation of an S+ for that outcome. Unfortunately, that prediction is not supported by empirical work. Whereas extinguished responses are not selectively reinstated by the presentations of their consequent outcomes, their performance is selectively promoted by S+s trained with their consequent outcomes. In an experiment from my laboratory, two stimuli were given standard training as S+s for different outcomes (food pellets and sucrose liquid). Then, two responses (lever press and chain pull) were trained in preparation for a transfer test. One response earned food pellets and the other earned sucrose liquid. Following this training, both responses were extinguished for 12 20-min sessions. Finally, a transfer test was conducted with the two S+s. The results of this test are displayed on the left-hand side of Fig. 13. A standard transfer effect was obtained: An S+ promoted performance of an extinguished response with which it shared an outcome (dark bar) but had no effect on an extinguished response trained with a different outcome (blank bar). Reinstatement tests were also given with these responses. Those data shown on the right-hand side of Fig. 13
Fig. 13. Concurrently trained and extinguished instrumental responses during a transfer test with an S+ (left panel) or during a reinstatement test following noncontingent presentations of an instrumental outcome (right panel). Left panel: Responding is plotted separately as a function of whether the consequent outcome for the response was the same as (dark bar) or different from (blank bar) that associated with the S+. Intertrial interval responding is also shown (hatched bar). Right panel: Responding is separated according to whether the reinstating outcome was the same as (dark bar) or different from (blank bar) that used for its initial training.
[Figure: panels TEST WITH S+ (bars S, D, ITI) and TEST WITH OUTCOME (bars S, D, Pre); vertical axis, mean responses per minute.]
indicate that there was no selective increase in responding as a function of outcome identity. Presentations of one outcome produced increased performance of both the same (dark bar) and different (blank bar) responses compared with their rate of occurrence during a 4-min extinction period preceding delivery of the first outcome (Pre). The inference to be drawn from these results is that the operation of an S+ on the outcome representation is functionally different from a presentation of the outcome per se. David Roe and I have examined this issue further by comparing the transfer of an outcome trained as a discriminative stimulus with an outcome presented as a consequence for a response. In one experiment, rats were rewarded for nose poking with one outcome (O1) during periods whose onsets were signaled by the delivery of a different outcome (O2). Each session contained 32 trials separated by a mean ITI of 90 sec. Trials began with the delivery of O2, which signaled that a VI 30-sec reinforcement schedule was in effect for a 30-sec period. For one half of the rats, food pellets signaled that nose pokes would be followed by sucrose liquid; for the other subjects, sucrose liquid signaled that nose pokes would be followed by pellets. A transfer test was then conducted with two other responses, lever press and chain pull. These responses had been trained before the start of discrimination training. One had earned O1 and the other had earned O2. Figure 14 illustrates the results of the transfer test in which there were eight presentations of each outcome. Responding was measured during the 30-sec period following an outcome delivery and during the 90-sec ITI. The left panel of Fig. 14 shows performance of the transfer response trained with O1 following presentations of O2 (dark bar) and following presentations of O1 (blank bar).
Performance of that response was increased by presentations of an outcome (O2) trained as an S+ for O1, but not by presentations of its consequent outcome, O1. The data shown on the right-hand side of Fig. 14 come from the transfer response trained with O2. Performance of this response was slightly decreased relative to the ITI rate by presentations of an outcome trained as an S+ for O1 and by presentations of its consequent outcome, O2. Interpretation of these ITI comparisons is complicated somewhat by the fact that no correction was made for the time taken to consume the outcomes. However, what is important to note is the absence of any increase in performance of a transfer response by presentations of its consequent outcome. These results indicate that it is important to preserve a distinction between a presentation of an outcome and an expectancy of an outcome. The impact of an S+ on behavior is not equivalent to the effect of a presentation of the outcome associated with that S+. The analysis of the reinstatement effect suggests outcome presentations develop some sort
Fig. 14. Transfer test with an outcome (O2) established as an S+ for a response trained with a different outcome (O1). The left side shows the response trained with O1 following presentations of the same S+ (O2, dark bar), or the same consequent (O1, blank bar), and during the ITI (hatched bar). The right side shows the transfer response trained with O2 following presentations of the different S+ (O2, dark bar), or a different consequent (O1, blank bar), and during the ITI (hatched bar).
[Figure: panels R1-O1 and R2-O2; bars O2, O1, and ITI in each panel.]
of discriminative control over responding. It is beyond the scope of this analysis, however, to determine the precise nature of that control. It is possible that a simple evocative association develops between the outcome presentation and the response; but it is equally feasible that the outcome presentation serves as a discriminative stimulus (S+) and operates as a modulator of the reinforcer representation.
2. Association between Outcome Expectancy and Response
In the initial reports of outcome devaluation and transfer, Colwill and Rescorla (1985a, 1988) took care to minimize the opportunity for the development of differential associations between the outcome expectancy and the instrumental response that might have contaminated their ability to identify R-O associations. However, those studies did not address the issues of whether any learning about the outcome expectancy and the response occurred, and if it did, whether that learning was relatively more important than R-O learning in the control of instrumental behavior. Rescorla and Colwill (1989) and Rescorla (1992b) addressed these points by directly comparing the relative contributions of the outcome as an antecedent event and as a consequent event to instrumental performance.
There are three critical stages common to each of the procedures used in these experiments. In the first stage, two stimuli are trained so that each evokes an expectancy of a unique outcome. Thus, S1 is trained to elicit an expectancy of O1 and S2 to evoke an expectancy of O2. In the second stage, instrumental responses are reinforced in the presence of each stimulus. The outcomes used to reinforce these responses are different from those whose expectancies are controlled by the discriminative cues. Thus, R1 is followed by O2 during S1, and R2 is followed by O1 during S2. In this way, the identities of the antecedent and consequent outcomes available for association with R1 and R2 are dissociated. Finally, in the third stage, either outcome devaluation or transfer techniques are used to evaluate the relative contributions of the O-R and the R-O associations to performance of R1 and R2. The consistent finding to emerge from these studies is that the more important determinant of performance is the consequent outcome. Devaluation of an outcome led to greater depression of the response for which that outcome had served as a consequent rather than as an antecedent event in Stage 2. Similarly, transfer was superior when a stimulus signaled the consequent outcome rather than the antecedent outcome associated with the response in Stage 2. These results held when outcome expectancies were generated either by training S1 and S2 as instrumental S+ (Rescorla and Colwill, 1989) or by training S1 and S2 as Pavlovian CS+ (Rescorla, 1992b). Moreover, when Rescorla and Colwill (1989) directly measured the influence of potential O-R learning on behavior, its contribution was relatively inconspicuous.
3. Summary
The preceding review has confirmed that the function of the instrumental outcome is not simply to serve as an associate of the discriminative stimulus or the instrumental response. Instead, it is apparent that previous presentations of an outcome develop the ability to control instrumental behavior. Current evidence suggests that the outcome expectancy is only a minor participant in instrumental learning. However, that conclusion was based on an analysis of behavior during its initial acquisition. It is entirely possible that extended training may increase the contribution of that outcome expectancy to performance.
E. CONCLUSION
Instrumental learning situations are a source of many opportunities for the development of binary associations. This survey has outlined evidence for associative links between all three elements of an instrumental task.
Moreover, it appears that the relative strengths of those links vary depending on the specific relations arranged between the elements. The role of task demands in determining the nature of instrumental learning is pursued in the next section.
III. Hierarchical Structure of Instrumental Learning
Skinner (1938) is generally acknowledged to be the first to object to an analysis of instrumental learning in terms of binary associations between the elements. He cautioned that a discriminative stimulus should not be viewed as a spur or goad for the response, but rather as “setting the occasion” for reinforcement of the response. Since then, others have agreed with his doctrine that the most suitable scheme for capturing information about an instrumental contingency is one that conjointly represents all three elements (Catania, 1971; Colwill & Rescorla, 1986; Jenkins, 1977; Mackintosh & Dickinson, 1979). However, it is only recently that discussions of this possibility have moved beyond theoretical speculation to some form of empirical evaluation.
A. EVIDENCE FOR ENCODING S-R-O CONJUNCTIONS
Unambiguous support for the contribution of a hierarchical structure to instrumental learning has come from an analysis of performance on an instrumental analogue of the classic Pavlovian switching procedure. In these experiments, discriminative stimuli disambiguate which responses will lead to which outcomes. Thus, one stimulus (S1) signals that R1 will be followed by O1, and R2 will be followed by O2, whereas a different stimulus (S2) signals the opposite arrangement of responses and outcomes (i.e., R1-O2 and R2-O1). Because each outcome follows both responses and occurs in the presence of both stimuli, the potential binary associations are rendered uninformative about which outcome would actually follow a response in a given stimulus. Yet several studies have shown that rats learn the particular R-O relations that are in effect during each stimulus. These studies have used two techniques, outcome devaluation and summation, to demonstrate that exposure to a switching procedure leads to the acquisition of some kind of hierarchical structure.
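The logic of the switching design can be made concrete with a short Python sketch. The table encoding and the function names are illustrative assumptions rather than anything taken from the original studies; the point is only that projecting the design onto S-O and R-O pairs loses the information that a conjoint S-R-O representation retains.

```python
from collections import defaultdict

# Switching design as described in the text: (stimulus, response) -> outcome.
DESIGN = {
    ("S1", "R1"): "O1",
    ("S1", "R2"): "O2",
    ("S2", "R1"): "O2",
    ("S2", "R2"): "O1",
}


def binary_associations(design):
    """Collapse the design into separate S-O and R-O association sets."""
    s_o, r_o = defaultdict(set), defaultdict(set)
    for (stimulus, response), outcome in design.items():
        s_o[stimulus].add(outcome)
        r_o[response].add(outcome)
    return dict(s_o), dict(r_o)


def binary_prediction(design, stimulus, response):
    """Outcomes compatible with both the S-O and the R-O associations."""
    s_o, r_o = binary_associations(design)
    return s_o[stimulus] & r_o[response]


def hierarchical_prediction(design, stimulus, response):
    """Outcome recovered from the conjoint S-R-O representation."""
    return {design[(stimulus, response)]}


print(sorted(binary_prediction(DESIGN, "S1", "R1")))
print(sorted(hierarchical_prediction(DESIGN, "S1", "R1")))
```

In this encoding the binary projection leaves both outcomes as candidates for every stimulus-response pair, while the conjoint lookup returns a single outcome.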
1. Outcome Devaluation
A convincing illustration that instrumental learning cannot always be reduced to a collection of binary associations was reported by Colwill and Rescorla (1990b, Experiment 2). Rats were trained on a switching task in
which various permutations of responses (lever and chain) and outcomes (pellets and sucrose liquid) were arranged in the presence of different discriminative stimuli (noise and light). Specifically, two stimuli (S1 and S2) signaled that two responses (R1 and R2) would be reinforced, one with pellets and one with sucrose. However, each stimulus signaled a unique combination of those responses and outcomes: thus, S1 signaled the relations R1-O1 and R2-O2, whereas S2 signaled the relations R1-O2 and R2-O1. After acquisition, one of the outcomes was made distasteful. In a subsequent extinction test with the stimuli and responses, the subjects showed a preference within each stimulus for the response that had previously earned the currently attractive outcome in that stimulus. In other words, after devaluation of O1, subjects showed a preference for R2 over R1 during S1 but a preference for R1 over R2 during S2. The fact that the prevailing response preference was conditional on the identity of the stimulus implies the presence of some kind of hierarchical associative structure.
2. Summation
Confirmation that instrumental discriminations are solved under some circumstances by learning about the specific conjunctions of particular S, R, and O elements was provided by Rescorla (1990a, Experiment 4) using a summation test. In that study, rats were trained on a switching procedure with two visual cues signaling different R-O combinations. Thus, L1 signaled R1-O1 and R2-O2, whereas L2 signaled R1-O2 and R2-O1. In addition, auditory cues were separately established as S+s for various subsets of those R-O relations. One auditory cue (A1) signaled R1-O1 and R2-O1; the other auditory cue (A2) signaled R1-O2 and R2-O2. Summation tests were then conducted with different combinations of the auditory and visual cues. For each combination, the elements of the compound shared one R-O relation but not another. Performance of a response was greater if each element of the compound signaled the same outcome for that response than if they signaled different outcomes. The fact that the degree of summation was affected by whether the R-O relation embraced by the compound elements was identical or not reveals that the animals had acquired that relational information. A similar pattern of results was obtained in an experiment conducted in my laboratory that tested summation across stimuli signaling different R-O relations that either did or did not share the outcome element. Rats were trained on a switching procedure with two visual cues (steady light and flashing light), two responses (lever and chain), and two outcomes (pellets and sucrose). In a summation test, each visual cue was combined
first with a tone S+ and then with a noise S+. One of these auditory cues had been trained as a signal that nose poking would be followed by pellets; the other cue had been trained as a signal that nose poking would be followed by sucrose. The immediate effect of each visual-auditory compound was to encourage preferentially performance of the response whose outcome was predicted by both elements of the compound (16.7 and 15.7 responses per min for the shared and mixed O conditions, respectively). Thus, lever pressing was preferred to chain pulling when the outcome for lever pressing in the visual element matched that signaled by the auditory component, but this preference was reversed when the outcome for chain pulling was signaled by both elements of the compound. Because both outcomes and both responses occurred during both visual stimuli, any generalization from the individual elements associated with the auditory cues should have been equivalent. However, learning about the specific R-O relations in effect during the visual cues would permit differential generalization from the relation trained with an auditory cue. This reasoning takes for granted that different R-O units generalize to each other more when they share an outcome element. These results point to the insufficiency of any analysis of instrumental learning exclusively in terms of simple binary associations. The confidence with which this assertion can be made is increased by the results of several other studies that have explored the conditions under which stimuli trained in a switching task either develop or lose control over their R-O relations. Each study used the strategy of manipulating the R-O relation independently of the individual elements contained in that relation.
B. FACTORS AFFECTING ACQUISITION AND EXTINCTION
1. Role of Information about R-O Relation
Perhaps the most revolutionary finding of modern learning theory is the discovery that learning about the predictive value of stimuli occurs only when those stimuli provide information about their consequences that is not otherwise available. Many studies have shown the importance of this principle in governing the acquisition of stimulus-outcome associations in Pavlovian conditioning (Blanchard & Honig, 1976; Kamin, 1968; Leyland & Mackintosh, 1978; Rescorla, 1968; Wagner, Logan, Haberlandt, & Price, 1968) and response-outcome associations in instrumental learning (Dickinson, Peters, & Schechter, 1984; Hall, Channel, & Pearce, 1981; Hall, Channel, & Schachtman, 1987; Mackintosh & Dickinson, 1979; Pearce & Hall, 1978; Schachtman & Hall, 1990; St. Claire-Smith, 1979a, 1979b; Williams, 1978, 1982; Williams & Heyneman, 1982). Its application to analyses of instrumental discriminative stimuli has generated
additional support for thinking of discriminative stimuli as signals for R-O relations. For instance, Colwill and Rescorla (1990b) and Rescorla (1990c) found that a stimulus was impaired in its ability to acquire control over a response when training was conducted in the presence of an S+ that already signaled the response-outcome relation. This kind of interference was not observed if training was conducted in the presence of an S+ that had previously signaled the individual elements in an instrumental discrimination but not their combination (Rescorla, 1990c). Related to these blocking effects are studies that manipulate the contingency between a stimulus and its R-O relation. When a stimulus is informative about the R-O relation, it acquires control over the response. Several studies have reported that discriminative control of a response develops in multiple VI-VT schedules (Bersh & Lambert, 1975; Boakes, 1973; Huff, Sherman, & Cohn, 1975; Lattal & Maxey, 1971; Weisman & Ramsden, 1973). Even if the rate of noncontingent outcomes during the intertrial interval is such that the stimulus is correlated with a decrease in the overall frequency of outcome deliveries, that stimulus nevertheless acquires control over its reinforced response (Colwill & Rescorla, 1990b). A particularly elegant demonstration of the role of contingency was provided by Rescorla (1990c). Rats were trained with two responses and two outcomes. During the ITI and presentations of one stimulus (redundant), R1 was followed by O1 and R2 was followed by O2; during presentations of another stimulus (informative), the R-O relations were switched so that R1 led to O2 and R2 led to O1. Following this training, the responses were partially extinguished and then tested with the two stimuli. Rescorla (1990c) found no increase in responding during the redundant stimulus but a significant elevation of response rate during presentations of the informative stimulus.
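The informative versus redundant contrast in the design just described can be sketched in a few lines of Python. The dictionary layout and the helper name `is_informative` are hypothetical conveniences; the sketch simply checks whether a stimulus signals any R-O relation that differs from the one already in force during the ITI.

```python
# R-O relations in effect during the ITI (the background), per the design above.
ITI_RELATIONS = {"R1": "O1", "R2": "O2"}

# R-O relations in effect during each stimulus.
STIMULUS_RELATIONS = {
    "redundant": {"R1": "O1", "R2": "O2"},
    "informative": {"R1": "O2", "R2": "O1"},
}


def is_informative(stimulus_relations, background_relations):
    """True if the stimulus signals an R-O relation that the
    background does not already provide."""
    return any(
        background_relations.get(response) != outcome
        for response, outcome in stimulus_relations.items()
    )


for name, relations in STIMULUS_RELATIONS.items():
    print(name, is_informative(relations, ITI_RELATIONS))
```

Only the stimulus flagged as informative by this check is the one reported to have elevated responding in the test.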
2. Interactions between R-O Relations
It is clear that the development of an association between a stimulus and an R-O relation may be impaired by reducing the relative validity of that stimulus as a predictor of that relation. Recently, we have explored whether there is any interaction between different R-O relations associated with the same stimulus. In one experiment from my laboratory, one group of rats was trained on a switching design with two stimuli (a noise and a light), two responses (nose poke and handle pull), and two outcomes (food pellets and sucrose liquid). A second group of rats was trained with the same R-O pairs but the combinations of stimuli and R-O relations were rearranged so that each discriminative stimulus was uniquely associated with one of the outcomes. Thus, for these animals, S1 signaled
that both responses would be followed by O1, and S2 signaled that both responses would be followed by O2. There was no significant difference between the two groups in acquisition of this problem: The development of control over responding was unaffected by whether the stimulus signaled multiple R-O relations with the same outcome or with different outcomes. This pattern would seem to suggest that there is no differential interference between R-O relations that share an outcome element relative to R-O relations that do not have a common outcome element. However, a transfer test conducted with two new responses (lever and chain), one trained with pellets and one with sucrose, suggested that there had been some interaction within a stimulus between R-O units sharing an outcome. The transfer responses were increased more by the stimulus compound of cues separately trained with different outcomes than by a compound of cues separately trained with the same outcome (10.7 and 8.5 responses per min) although the ITI responses did not differ significantly (5.9 and 4.7 responses per min). Taken together, these results suggest that similar R-O relations generalize to each other during training and thus reduce the asymptotic value of the individual connections S develops separately with each R-O unit. To account for the difference in transfer, we adopt a recommendation by Pearce (1987) that the component of generalized excitation does not itself generalize to other events.
3. Devaluation of the R-O Relation
Manipulations of the R-O relation after learning has taken place have also yielded results consistent with the thesis that discriminative stimuli may signal specific R-O relations. In a series of experiments, Rescorla (1990a) studied the effect that devaluing an R-O relation in one stimulus had on another stimulus associated with that R-O relation. In one experiment (Rescorla, 1990a, Experiment 2), rats were trained on a standard switching procedure with two visual stimuli (L1 and L2), two responses (R1 and R2), and two outcomes (O1 and O2). L1 signaled R1-O1 and R2-O2, whereas L2 signaled R1-O2 and R2-O1. Auditory cues were separately trained as signals for subsets of the R-O relations occurring during L1 and L2. One auditory cue (A1) signaled R1-O1 and R2-O1; the other auditory cue (A2) signaled R1-O2 and R2-O2. Then, both R1 and R2 were extinguished in the presence of one of the visual cues. For each response, this treatment should depress the connection with one of its associated outcomes. Finally, the two responses were tested in extinction with the auditory stimuli. Response preferences exhibited during testing of the auditory stimuli reflected that the animals had learned the particular R-O relations correlated with each visual cue. Thus, if R1 and
R2 had been extinguished in L1, the subjects displayed a preference for R2 in A1 and for R1 in A2. If subjects only had the ability to represent the binary relations, they would not have known which outcome followed a particular response in the switching task. Consequently, extinction of those responses would not have generated differential performance in the presence of the auditory cues observed in testing.
IV. Binary versus Hierarchical Associations
A. MULTIPLE CODES
The prospect of accounting for instrumental learning solely in terms of simple binary associations is seriously undermined by the data described in the preceding section. At the very least, a successful theory must accommodate the fact that animals sometimes represent the R-O relations operative in the presence of different stimuli. One suggestion as to how this might be accomplished is for S to develop an association with the R-O relation (Colwill & Rescorla, 1990b; Rescorla, 1990a, 1990c). The essence of this approach is to treat the R-O relation as analogous to a Pavlovian US. In this way, the principles derived for identifying the conditions for learning about the relations between CS and US may be applied directly to learning about the relation between instrumental stimuli and R-O relations. In fact, many of the studies described in the preceding section were guided by this insight. It should also be apparent that an S-(R-O) structure can do more than simply account for the relational data described in the preceding section. Colwill and Rescorla (1990b) argued that outcome-dependent transfer of discriminative stimuli may be attributed to differential generalization between R-O units. Thus, outcome-dependent transfer might reflect the greater similarity between R-O units sharing an element, in this case the outcome, than between those with no common elements. Moreover, by parsing an S-(R-O) structure in different ways, information about the various pairwise associations can be readily extracted. Consequently, it becomes quite reasonable to challenge the necessity for any type of binary model of instrumental learning.
1. Manipulation of S-O Association

To approach the issue of whether adoption of an S-(R-O) structure eliminates the need for a binary model of instrumental learning, we have examined how manipulations of a binary relation affect performance supported by a hierarchical structure. Because the S-(R-O) structure preserves the
integrity of the R-O relation as a fundamental component of instrumental learning, our manipulations have targeted the discriminative stimulus. A natural starting point is our discovery that discriminative S+ control can be seriously undermined by training S+ as an S- for a different response reinforced with the same outcome. In the following experiment, we examined whether this treatment would extinguish discriminative control of hierarchical cues. To this end, Heather Seiniger and I trained rats on a switching task using the same procedure as Colwill and Rescorla (1990b, Experiment 2). After acquisition, both stimuli were established as signals for the nonreinforcement of other responses (nose poke and handle pull). One stimulus signaled that one response (R3) would not be followed by pellets, and the other stimulus signaled that the other response (R4) would not be followed by sucrose. If a hierarchical structure is common to all instances of instrumental learning, our prediction was that this treatment should specifically undermine performance of the response that had earned the outcome used for converting its switching stimulus to an S-. The design of this study is shown in Table VI along with these predictions. Because the same response would be the recipient of any decrement induced by this manipulation, this test constitutes a particularly sensitive assessment of the relation between binary and hierarchical models of instrumental learning.

TABLE VI
BASIC DESIGN OF EXPERIMENT ON EFFECT OF S-O EXTINCTION ON SWITCHING PERFORMANCE

Switching procedure      S- training         Test (predictions)
S1: R1-O1, R2-O2         R3-O1, S1: R3-      S1: R1 < R2
S2: R1-O2, R2-O1         R4-O1, S2: R4-      S2: R2 < R1

Note. R1 and R2 are instrumental responses, lever pressing and chain pulling; R3 and R4 are instrumental responses, nose poking and handle pulling, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; S1 and S2 are discriminative stimuli, noise and light.

The results of the extinction session in which the two switching stimuli were tested with their original responses are summarized in Fig. 15. There was negligible impact of converting the switching stimuli into S-s on performance of their original responses. Both responses continued to be elevated in the presence of the switching stimuli relative to their ITI rates. Moreover, there was no significant effect of the identity relation between the outcome that had followed the instrumental response in that stimulus and the outcome used to convert the stimulus into an S-. Thus, despite training as a signal for the omission of a particular outcome, a switching stimulus continued to promote performance of its original response trained with that outcome as effectively as it did performance of its original response trained with a different outcome. This pattern of results is strikingly different from the one displayed in Fig. 5. These data suggest that animals have access to multiple associative codes for solving instrumental discriminations. However, the data leave unanswered the question of what feature of an instrumental task prejudices the animal toward a solution in terms of either binary or hierarchical connections. Two possible answers to that question are now considered.

Fig. 15. Effect of converting discriminative stimuli trained in a switching procedure to S- on their ability to produce their original responses. Each stimulus had been trained as an S- with an outcome that was the same as that following one of its original responses (solid circles) and different from that following the other of its original responses (open circles). Responding is also shown during the ITI (open triangles) when no stimuli were presented.

2. Biconditional Discrimination Learning

One straightforward possibility is that animals employ a hierarchical structure whenever a binary structure fails to capture accurately the information about the prevailing instrumental contingencies. To assess this idea, we
have analyzed the learning that occurs in an instrumental biconditional discrimination. This task was chosen both because of its structural similarity to the switching task and because it can be solved using either binary or hierarchical associations. In a switching task, the stimuli signal which one of two rewards will follow a specific response; in a biconditional discrimination, the stimuli signal which one of two concurrently available responses will be reinforced and which will not (Trapold, 1970). More specifically, one response (R1) is rewarded only during one stimulus (S1) and the other response (R2) is rewarded only during the other stimulus (S2). Thus, the identity of the correct response is conditional on the identity of the discriminative stimulus.

a. Outcome Devaluation. Our initial studies employed a biconditional discrimination in which the correct responses earned the same outcome (O). The ability of animals to solve the single-outcome version of this problem has traditionally been cited as prima facie evidence for some kind of S-R learning (Mackintosh, 1983). According to classical S-R theory, because rewards only follow performance of the correct response in each stimulus, an association between the discriminative stimulus and the correct response will be selectively strengthened. In this way, the unique S-R associations guarantee performance of the correct response in each stimulus. What is important to note about this evocative S-R account is that the outcome, although responsible for producing the association that permits this problem to be solved, is not itself represented in that associative structure. Consequently, manipulations of the value of the outcome after learning has taken place will have no specific impact on biconditional discriminative performance. To test this prediction, rats were trained concurrently on two independent biconditional discrimination tasks (Colwill & Delamater, 1993, Experiment 2).
In one task using auditory discriminative stimuli, one response (R1) was rewarded with one outcome (O1) in the presence of one stimulus (A1) and the other response (R2) was rewarded with the same outcome (O1) in the presence of the other stimulus (A2). In the other task using visual discriminative stimuli, a third response R3 was rewarded with a different outcome (O2) in the presence of one stimulus (V1), and a different response R4 was rewarded with that outcome (O2) in the presence of the other stimulus (V2). Following acquisition of the two biconditional discriminations, the value of one of the outcomes was reduced by pairing the outcome with a nausea-inducing agent. Subjects were then tested in extinction on each biconditional discrimination. The question of interest is whether performance of the correct response will be depressed when the outcome used to train that response has been devalued.
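The contrast between the two kinds of prediction can be laid out as a small sketch. It encodes the design just described and assumes only two coarse alternatives: a structure in which the outcome is not represented (the evocative S-R account) and one in which the outcome of the correct response is represented. The dictionary layout, the function name predicted_depressed, and the choice of O1 as the devalued outcome are illustrative assumptions.

```python
# Hypothetical encoding of the two single-outcome biconditional tasks.
# Each stimulus maps to (correct response, outcome earned by that response).
design = {
    "A1": ("R1", "O1"),
    "A2": ("R2", "O1"),
    "V1": ("R3", "O2"),
    "V2": ("R4", "O2"),
}


def predicted_depressed(devalued: str, outcome_encoded: bool) -> set:
    """Stimuli in which the correct response is predicted to be depressed.

    outcome_encoded=False stands for an evocative S-R structure, in which
    the outcome is not part of what is learned. outcome_encoded=True stands
    for any structure that represents the outcome of the correct response.
    """
    if not outcome_encoded:
        return set()
    return {s for s, (_, outcome) in design.items() if outcome == devalued}


print(sorted(predicted_depressed("O1", outcome_encoded=False)))  # []
print(sorted(predicted_depressed("O1", outcome_encoded=True)))   # ['A1', 'A2']
```

The sketch says nothing about the size of any effect; it only marks where a selective decrement would be expected under each assumption.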
The results of testing the biconditional stimuli with their original responses are shown in Fig. 16. Responding trained with the devalued outcome is shown on the left of Fig. 16 and responding trained with the valued outcome is shown on the right of Fig. 16. In each case, performance is shown separately for the correct response, the incorrect response, and during the ITI. There is no doubt that outcome devaluation had a major impact on biconditional performance. There was a substantial but selective decrease in performance of the correct response whose outcome was devalued.

Fig. 16. Mean rates of responding on the biconditional discriminations in the first extinction test after the training outcome had been devalued (left panel) or not (right panel). In each panel, performance is shown separately for correct responses (filled circles) and incorrect responses (open circles) during the discriminative stimuli and during the ITI (open triangles) when no stimuli were delivered. From Colwill & Delamater (1993).

This sensitivity of the rewarded response to the value of its outcome argues against a purely evocative S-R account of instrumental performance. Other accounts embracing some form of S-R learning face an equally dismal fate in their efforts to explain the present findings. Trapold and Overmier (1972) make predictions about the effects of outcome devaluation in this study that are indistinguishable from those made by classical S-R theory. Because each biconditional discrimination task employed only one outcome, the two discriminative stimuli within a task would each develop an association with the same outcome, and so elicit the same outcome expectancy. As a result, this expectancy would be less informative about which of the two responses was correct than the physical features of the biconditional cues. Given the evidence that incidental or redundant stimuli do not gain control over behavior (Mackintosh, 1983; Wagner et al., 1968), the contribution of outcome expectancies to performance would be negligible. Therefore, according to Trapold and Overmier (1972), outcome devaluation ought not to have differentially affected instrumental responding.

The version of two-process theory proposed by Rescorla and Solomon (1967) also encounters difficulty with the present data. According to their account, devaluation of the outcome should have reduced the motivation for responding in the presence of the discriminative cues. Consequently, performance of the incorrect responses should have been depressed following devaluation of the instrumental outcome. Although a floor effect may have contributed to a failure to detect such suppression in the test shown in Fig. 16, increasing the overall level of responding before a second test did not alter the results. Even with a substantial increase in the general level of responding, the likelihood of an incorrect response in the presence of the stimulus associated with the devalued outcome was not significantly different from the rate of incorrect responding in the stimulus that signaled the valued outcome. Such results provide little encouragement for the idea that the observed devaluation effects were mediated by a reduction in the motivation for responding normally provided by the Pavlovian S-O association.

A combination of S-R and R-O associations fares more successfully with the results displayed in Fig. 16. Assuming that the function of the S-R association is to retrieve a representation of the correct response, subjects can then use the R-O association to evaluate the current consequences of that response. Obviously, devaluing the outcome will reduce performance of that response.
But this account predicts an outcome devaluation effect only if the response has a single outcome. In a companion study, we used the same pair of responses for each biconditional problem. Thus, in one discrimination, correct responses earned pellets, but in the other discrimination, those correct responses earned sucrose. Under these conditions, an S-R and R-O account would not predict differential sensitivity to a change in the value of one outcome. Yet, when we depressed the value of one of the outcomes through satiation, correct responses were relatively less likely in the presence of the stimuli in which they had earned that outcome than during the stimuli in which they had earned the other outcome.

b. Manipulation of S-O Relation. The effects of outcome devaluation on biconditional discriminative performance may yet be explained by a
binary model that employs R-O, S-O, and inhibitory S-R associations. This account attributes performance of the correct response to a combination of S-O and R-O associations. Thus, original responses are controlled through the same kind of mechanism as transfer responses. To generate differential responding in the presence of the biconditional discriminative stimuli trained with a single outcome, promotion of the incorrect response by the S-O association must be counteracted. One way to accomplish this is through an inhibitory association between the stimulus and the incorrect response. To test this binary hypothesis, we exploited the fact that S+ control can be reduced by training the stimulus as an S- with the same outcome. If correct responses are in fact the product of S-O and R-O interactions, it should be possible to reduce their occurrence by training the biconditional cue as an S-. We have explored the impact of this extinction treatment on biconditional cues established in a variety of ways. The results are always the same: Regardless of variations in the procedures used to train biconditional cues, S- training does not have an outcome-specific negative effect on their ability to control performance of their correct response. We illustrate this point with an experiment designed to preserve the biconditional structure inherent in the preceding studies but to increase the similarity of the task to the discrimination in which we first observed an impact of the S- treatment. To this end, rats were trained on two biconditional tasks that used different outcomes within a task. The basic design of that study is outlined in Table VII. Rats were trained concurrently on two independent biconditional discriminations. Each discrimination employed the same pair of responses (nose poke and handle pull) and two different outcomes (food pellets and sucrose liquid).
TABLE VII
DESIGN OF S-O EXTINCTION EXPERIMENT WITH BICONDITIONAL STIMULI

Biconditional training    S- training                   Test
A1: R1-O1, R2-            R3-O1, A1: R3-, A2: R3-       A1: R1 v R2
A2: R2-O2, R1-            R4-O2, V1: R4-, V2: R4-       A2: R1 v R2
V1: R1-O2, R2-            R3-O2, A1: R3-, A2: R3-       V1: R1 v R2
V2: R2-O1, R1-            R4-O1, V1: R4-, V2: R4-       V2: R1 v R2

Note. R1 and R2 are instrumental responses, nose poking and handle pulling; R3 and R4 are instrumental responses, lever pressing and chain pulling, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; - indicates nonreinforcement; V1 and V2 are discriminative stimuli, steady light and flashing light; A1 and A2 are discriminative stimuli, tone and noise.

In one task using auditory cues (noise and tone), one response (R1) earned one outcome (O1) during presentations of one stimulus (A1), and the other response (R2) earned a different outcome (O2) during the other stimulus (A2); in the other task with visual cues (steady and flashing lights), R1 earned O2 during V1 and R2 earned O1 during V2. After acquisition of both discriminations, two other responses (lever and chain pull) were trained, one (R3) with O1 and the other (R4) with O2. The auditory biconditional cues were established as S-s for one of these responses; the visual cues were trained as S-s for the other transfer response; and R3 and R4 earned their training outcomes in the absence of the stimuli on a VI 30-sec schedule. In this way, one of the visual cues and one of the auditory cues were trained as S- for a response with the same outcome; and the other visual and auditory cues were trained as S- with a different outcome. Thus, the biconditional cues for one task signaled the omission of an outcome that was the same as that used in original biconditional
discrimination training, whereas the cue for the other task signaled the omission of a different outcome. Each pair of biconditional cues was then tested with its original pair of responses. It was anticipated that if performance of the correct response in this biconditional discrimination depended on the combination of S-O and R-O associations, then S- training would produce an outcome-dependent disruption of correct responses in those discriminations. The results of testing the biconditional stimuli with their original responses are displayed in Fig. 17. Responding in the presence of cues that had been trained as S-s with the same outcome is shown on the left of Fig. 17, and with a different outcome is shown on the right. In each case, correct, incorrect, and ITI responses are shown separately. There is no evidence that biconditional control was differentially affected by S- training as a function of whether the same outcome or a different outcome was used. Comparisons of correct response rates across the different and same conditions revealed no significant difference.

Fig. 17. Mean rates of responding on the biconditional discriminations following S- training. The left-hand side displays performance on the discrimination whose cues were trained with the same outcome used for S- training; the right-hand side displays the comparable data when S- training employed a different outcome. In each case, correct (filled circles) and incorrect responses (open circles) are plotted separately. Responding is also shown during the ITI (open triangles) when no stimuli were presented.

The preceding discussion has established that subjects adopt a hierarchical solution to a problem even when information about the instrumental contingency may be accurately represented by binary connections. In summary, the results of our analyses of biconditional discrimination learning suggest that conventional binary accounts do not capture the full spectrum of associative structures that animals can use to solve instrumental discriminations. It is safe to conclude that hierarchical structures are not limited to only those tasks where binary models fail.

3. Response-Outcome Ambiguity

a. Determinant of Hierarchical Structure. Another answer to the question of what feature of instrumental training encourages hierarchical learning focuses on the role of response-outcome ambiguity. Those studies that have identified hierarchical structures share the property that individual stimuli disambiguate which of two outcomes will follow a particular response. In some studies, a response is followed by one food outcome, for example, food pellets, in one stimulus and a different food outcome, for example, sucrose liquid, in the presence of another stimulus. In other studies, the outcomes used are rewards and nonrewards. To evaluate whether hierarchical control develops over a response whose multiple outcomes are uniquely correlated with different stimuli, I replicated the original study demonstrating loss of S+ control shown in Fig. 5 with one slight variation in the design. Instead of training S+s with different responses, the same response was used for both. Thus, rats were rewarded with one outcome (e.g., pellets) for nose poking during a light and with a different outcome (e.g., sucrose) for nose poking during a noise. Then both stimuli were trained as S- for another response, chain pulling. For one half of the animals, chain pulling was reinforced with pellets in the absence of the noise and light; for the other animals, chain pulling was reinforced with sucrose. In this way, all animals had one S+ converted to an S- using the same outcome (same S+/S-), and the
other S+ trained with a different outcome (diff S+/S-). The impact of this treatment on original discriminative control was assessed during an extinction test in which each stimulus was tested with the nose poke response. The results of that test showed that no matter what outcome was used for S- training, both stimuli continued to promote performance of their original responses relative to the ITI (6.8 responses per min) and the degree to which they did so was not significantly different (23.0 and 21.5 responses per min for the same S+/S- and the diff S+/S-, respectively). The disparity between these data and those shown in Fig. 5 strongly implies that the critical condition for producing hierarchical learning is one in which the multiple outcomes of an instrumental response are uniquely correlated with different stimulus conditions. The present data are also important for eliminating two other accounts of the failure of S-O manipulations to influence performance on biconditional and switching discriminations. First, it was usually more difficult to establish the cue as an S- in those studies. Often the terminal level of suppression was less than that obtained in the original experiment demonstrating loss of S+ control (shown in Fig. 5). Yet, in the present experiment, the level of inhibitory control at the end of S- training was indistinguishable from that obtained in the original study. Second, the designs of the biconditional and switching studies were extremely complex, involving multiple cues, responses, and outcomes in various combinations. But given the present results, it is unlikely that this factor obscured detection of the effect of extinction of the S-O relation in those other studies. The present analysis of biconditional and switching discriminations suggests that it is profitable to view a discriminative stimulus trained in these procedures as associated with the particular R-O relations arranged in its presence, S+(R-O) and S-/(R-O).
This approach provides a natural explanation for the effects of outcome devaluation on instrumental performance. It also anticipates selective transfer of a biconditional cue to other responses trained with the same outcome (Colwill & Delamater, 1993). Such transfer is attributed to differential generalization across R-O relations. Novel combinations of a discriminative stimulus and an R-O relation are treated as more similar to the original training condition when both responses share an association with the same outcome. An explanation for the preservation of hierarchical control following elimination of the S-O association is rather more speculative. Colwill and Delamater (1993) proposed that the resistance of biconditional performance to the detrimental effects of extinguishing the S-O associations might be understood in terms of changes in the generalization gradients produced by original training. In our biconditional discrimination experiments, each biconditional cue was simultaneously established as a signal for the reinforcement of one response and the nonreinforcement of a different response. This discrimination training may have sharpened the generalization gradients such that additional training of the cue as a signal for the nonreinforcement of yet another response would have little impact on the ability of that cue to elicit its original R-O relation (see Mackintosh, 1974). This account can also be extended to accommodate the absence of an effect of removing the S-O association on stimuli signaling which of two rewards will follow a response. To the degree that occurrences of the response with O1 are treated as experiences of the omission of O2 for that response, a similar sharpening of the generalization gradients should occur. In summary, the conclusion to be drawn from this discussion of hierarchical learning is that animals resolve the ambiguity of a response with multiple outcomes by associating each R-O relation with the appropriate stimulus. In the next two sections, I discuss the significance of this work for our understanding of other aspects of instrumental learning.

b. Implications for Differential Outcomes Design. The preceding analysis of biconditional discrimination learning raises a serious question about the customary interpretation of the differential outcomes effect (DOE) in terms of binary associations. Traditionally, some of the strongest evidence for an association between a discriminative stimulus and the instrumental outcome has been derived from studies of the DOE (Goeters, Blakely, & Poling, 1992). Trapold (1970) drew attention to the fact that animals learn a biconditional discrimination more readily when the correct response for one stimulus is rewarded with a different outcome from that earned by the correct response for a different stimulus.
Almost uniformly, however, studies replicating this effect base their evidence that acquisition is more rapid in the differential outcomes condition on a comparison with a condition in which the outcomes following the correct responses are mixed or nondifferential. Consequently, as noted by Colwill and Rescorla (1988a), it may not be the inconsistency between the stimulus and the outcome that retards acquisition in the nondifferential condition, but rather the inconsistency between the response and the rewarding outcomes. More promising evidence that the relationship between the stimulus and the outcome influences instrumental learning comes from experiments implementing the design outlined in Table VIII (Peterson & Trapold, 1982; Rescorla, 1992b). In this case, the relation between instrumental responses and their consequences is held constant across the consistent and inconsistent conditions. Each response is followed by only one reward during its correct stimulus and by nonreward during its incorrect stimulus. However, only in the consistent condition are the outcomes uniquely correlated with different discriminative stimuli. Studies using this design have found superior acquisition in the consistent condition. An example of this effect is shown in Fig. 18. Those data come from an experiment with rats by Denise Richardson and me using the design shown in Table VIII. They demonstrate that the difficulty experienced by subjects in acquiring the discrimination in the inconsistent condition is not that they are less likely to make the correct response, but rather that they are more likely to make the incorrect response.

TABLE VIII
BASIC DESIGN OF DIFFERENTIAL OUTCOMES EXPERIMENT

Consistent group      Inconsistent group
S1: R1-O1, R2-        S1: R1-O1, R2-
S2: R2-O2, R1-        S2: R2-O2, R1-
S1: R3-O1, R4-        S1: R3-O2, R4-
S2: R4-O2, R3-        S2: R4-O1, R3-

Note. R1 and R2 are instrumental responses, lever pressing and chain pulling; R3 and R4 are instrumental responses, nose poking and handle pulling, counterbalanced across animals. O1 and O2 denote food pellets and sucrose liquid; S1 and S2 are discriminative stimuli, noise and light.

Fig. 18. Demonstration of a differential outcomes effect on acquisition of an instrumental biconditional discrimination. For the consistent group (filled circles), the same outcome was used to reward both of the correct responses for a particular stimulus; for the inconsistent group (open circles), the two correct responses for a stimulus were rewarded with different outcomes. Performance of correct responses and incorrect responses is plotted separately across blocks of three sessions.

The data shown in Fig. 18 are readily accommodated by a hierarchical model of instrumental learning. Each discriminative stimulus develops independent excitatory associations with each of its correct R-O units. However, the generalization between the correct R-O units associated with a particular stimulus and the identity of the R-O relation that is incorrect is not equal across the consistent and inconsistent conditions. In the consistent condition, the incorrect R-O units for a specific stimulus do not share a common outcome element with the correct R-O units for that stimulus. But in the inconsistent condition, each stimulus is associated with an R-O unit containing the same outcome used to train one of its incorrect responses. Evidence that differential generalization underlies the DOE shown in Fig. 18 is supported by the results of extinguishing the connection between one of the R-O units and its correct stimulus. That treatment eliminated the difference in performance of the incorrect response whose outcome was the same as that of the extinguished correct response, but not that of the incorrect response whose outcome was different.

c. Implication for Structure of an S-. The evidence that a hierarchical association develops when a stimulus disambiguates the relations between a response and its outcomes may offer some insight into the conditions that might promote analogous associations for negative discriminative stimuli. Specifically, if a stimulus signals which one of two outcomes will not follow a response, that stimulus may inhibit a particular R-O relation rather than develop separate inhibitory connections with the response and the omitted outcome. The following experiment tested this prediction. Two visual cues (a localized steady light and an overhead flashing light) were trained as signals that two responses would not be reinforced. In the sessions with one visual cue, lever presses earned pellets and chain pull responses earned sucrose during the ITI; in sessions with the other visual cue, lever presses led to sucrose and chain pulling led to pellets during the ITI. Thus, each stimulus served as a potential signal for the nonreinforcement of both responses and the omission of both outcomes.
To detect whether the cues signaled which outcome would be omitted for a particular response, lever presses were reinforced in the presence of both cues with one of the outcomes (pellets for half the animals and sucrose for the other animals). The issue of interest is whether the stimuli would vary in the rate of acquisition of discriminative control over lever pressing. For example, if lever pressing for pellets developed more slowly in the presence of the cue that had signaled lever presses would not be followed by pellets, it would suggest that the signal acted on a specific R-O relation. In the first acquisition session, performance of both responses was negligible. However, the results of the second session revealed a difference indicative of a hierarchical association. Thus, lever pressing for pellets (or sucrose) occurred significantly less often during the stimulus that had signaled lever presses would not be followed by pellets (or sucrose) than during the stimulus that had signaled lever presses would not be followed by sucrose (or pellets), 3.5 and 4.5 responses per min respectively. This result suggests that under some circumstances a discriminative stimulus may act by inhibiting an R-O association. Because each stimulus had been trained with both responses and both outcomes, the pattern of results cannot be interpreted in terms of simple summation of its inhibitory action on the individual elements.
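The logic of that last step can be sketched in a few lines. The sketch assumes the two visual cues are labeled V1 and V2 (the assignment of labels to the steady and flashing lights is arbitrary here) and that each cue's training can be summarized as the set of response-outcome pairs whose nonreinforcement it signaled; the names omitted, element_counts, and inhibits_pair are hypothetical.

```python
from collections import Counter

# Hypothetical encoding of the S- training: each cue signaled that two
# response-outcome pairs, otherwise available during the ITI, would not occur.
omitted = {
    "V1": {("lever", "pellets"), ("chain", "sucrose")},
    "V2": {("lever", "sucrose"), ("chain", "pellets")},
}


def element_counts(stimulus: str) -> Counter:
    """Elementwise view: tally each response and outcome, ignoring how they were paired."""
    counts = Counter()
    for response, outcome in omitted[stimulus]:
        counts[response] += 1
        counts[outcome] += 1
    return counts


def inhibits_pair(stimulus: str, response: str, outcome: str) -> bool:
    """Relational view: did this cue signal the omission of this specific R-O pair?"""
    return (response, outcome) in omitted[stimulus]


# The cues are matched on every individual element...
print(element_counts("V1") == element_counts("V2"))  # True
# ...but differ in which response-outcome conjunction they signaled.
print(inhibits_pair("V1", "lever", "pellets"), inhibits_pair("V2", "lever", "pellets"))  # True False
```

Because the elementwise tallies are identical for the two cues, any account that sums separate inhibitory effects on the response and on the outcome treats them alike; only the pairing distinguishes them.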
B. IMPLICATIONS FOR ANALYSES OF PAVLOVIAN DISCRIMINATION LEARNING

We have proposed that instrumental learning may be served by both binary and hierarchical associative structures. It is of interest to note that a similar distinction emerged in efforts to account for Pavlovian occasion-setting. In this paradigm, one stimulus, the so-called occasion setter, provides information about the reinforcement or nonreinforcement of a target stimulus. As a result of experience with this paradigm, the occasion setter comes to control behavior to the target stimulus. Some authors have argued that the occasion setter operates on the association between the target stimulus and its outcome (Holland, 1983, 1989, 1992; Lamarre & Holland, 1987). Others have suggested that the occasion setter acts on the outcome representation by modulating its activation threshold (Rescorla, 1985). The present analysis of instrumental learning agrees with the essential features of both views but suggests that their appropriateness may vary depending on details of the training procedure used to produce an occasion setter. Finally, the analysis of instrumental learning provides additional support for exercising caution in adopting a Pavlovian configural cue account of
all instances of discrimination learning. According to that account, animals associate configurations of the occasion setter and its target stimulus, or the discriminative stimulus and its response, with the consequent outcome (e.g., Preston, Dickinson & Mackintosh, 1986; Wilson & Pearce, 1989). Several investigators (Holland, 1985, 1991, 1992; Rescorla, 1991) have attempted, in rather imaginative ways, to undermine a configural cue interpretation of Pavlovian occasion-setting. Rescorla (1990a) has also presented data that combat efforts to account for instrumental discriminative performance in terms of configural conditioning. This chapter describes two additional results that may present a further challenge to proponents of configural cue theory. First, the evidence supporting a partitioning of instrumental learning into binary and hierarchical structures does not, on the surface, appear easily reconcilable with a single-process model. Second, the evidence suggesting that binary instrumental S-O associations differ qualitatively from Pavlovian stimulus-outcome associations seems to conflict with a model embracing a purely Pavlovian connection between a configuration and its outcome. It remains for future research to determine if subsequent elaborations of configural conditioning models will succeed in accommodating the present results.
V. Conclusion
The work surveyed in this chapter illustrates some of the progress that has been made in documenting the rich array of associative connections that animals have at their disposal to solve instrumental problems. These advances owe much to the development of an impressive array of psychological tools for probing the structure of stored associative knowledge. Moreover, we are beginning to appreciate the relevant dimensions of an instrumental task that encourage use of one or other of the associative structures that define an animal’s cognitive repertoire. Indubitably, contemporary analyses of instrumental learning have revealed a sophistication and flexibility in the capacity of animals to represent information about instrumental contingencies that far exceeds the trial and error legacy of traditional S-R learning theory.
ACKNOWLEDGMENTS
This research has been generously supported by National Science Foundation Grants IBN 89-15342 to R. M. Colwill and BNS 83-08176 to R. A. Rescorla, and by a grant from Brown University BRSG 5-27469 to R. M. Colwill. Thanks are due to R. A. Rescorla for support of the work conducted at the University of Pennsylvania. I am particularly grateful
to Eric Wolfinger and Beth Kraemer, whose assistance with data collection at Brown and at Purdue, respectively, was invaluable. I am also appreciative of the efforts of many undergraduates who collaborated on this research during their summer vacations: Deborah Archer, Gregg Belle, John Brown, Ellen Capon, Kevin Goodrum, Jennifer Lee, Barbara Shipley, Virginia Smith, Robert Tetreault, and Emily Whitcomb. I am indebted to my colleague Andrea Megela Simmons for some insightful and critical comments on a previous version of this chapter, and to Jennifer Richeson and Robert Tetreault for their help in proofing the chapter. Correspondence concerning this article should be addressed to Ruth M. Colwill, Department of Psychology, Brown University, Box 1853, Providence, RI 02912.
REFERENCES
Adams, C. D., & Dickinson, A. (1981). Instrumental responding following reinforcer devaluation. Quarterly Journal of Experimental Psychology, 33B, 109-121.
Amsel, A. (1958). The role of frustrative nonreward in noncontinuous reward situations. Psychological Bulletin, 55, 102-119.
Annau, Z., & Kamin, L. J. (1961). The conditioned emotional response as a function of intensity of the US. Journal of Comparative and Physiological Psychology, 54, 428-432.
Asratyan, E. A. (1974). Conditional reflex theory and motivational behavior. Acta Neurobiologica Experimentalis, 34, 15-31.
Baker, A. G., Steinwald, H., & Bouton, M. E. (1991). Contextual conditioning and reinstatement of extinguished instrumental responding. Quarterly Journal of Experimental Psychology, 43B, 199-218.
Baum, W. M., & Rachlin, H. C. (1969). Choice as time allocation. Journal of Experimental Analysis of Behavior, 12, 861-874.
Bersh, P. J., & Lambert, J. V. (1975). The discriminative control of free-operant avoidance despite exposure to shock during the stimulus correlated with nonreinforcement. Journal of Experimental Analysis of Behavior, 23, 111-120.
Blanchard, R., & Honig, W. K. (1976). Surprise value of food determines its effectiveness as a reinforcer. Journal of Experimental Psychology: Animal Behavior Processes, 2, 67-74.
Boakes, R. A. (1973). Response decrements produced by extinction and by response-independent reinforcement. Journal of Experimental Analysis of Behavior, 19, 293-302.
Bolles, R. C. (1972). Reinforcement, expectancy, and learning. Psychological Review, 79, 394-409.
Bonardi, C. (1988). Mechanisms of inhibitory discriminative control. Animal Learning & Behavior, 16, 445-450.
Bonardi, C. (1989). Inhibitory discriminative control is specific to both the response and the reinforcer. Quarterly Journal of Experimental Psychology, 41B, 225-242.
Bouton, M. E. (1991). Context and retrieval in extinction and in other examples of interference in simple associative learning. In L. W. Dachowski & C. F. Flaherty (Eds.), Current topics in animal learning: Brain, emotion, and cognition (pp. 25-53). Hillsdale, NJ: Erlbaum.
Bouton, M. E., & Bolles, R. C. (1979). Role of conditioned contextual stimuli in reinstatement of extinguished fear. Journal of Experimental Psychology: Animal Behavior Processes, 5, 368-379.
Bouton, M. E., & Peck, C. A. (1989). Context effects on conditioning, extinction and reinstatement in an appetitive conditioning procedure. Animal Learning & Behavior, 17, 188-198.
Capaldi, E. J. (1967). A sequential hypothesis of instrumental learning. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 1). New York: Academic Press.
Capaldi, E. J. (1970). An analysis of the role of reward and reward magnitude in instrumental learning. In J. H. Reynierse (Ed.), Current issues in animal learning. Lincoln, NE: University of Nebraska Press.
Catania, A. C. (1971). Elicitation, reinforcement and stimulus control. In R. Glaser (Ed.), The nature of reinforcement (pp. 196-220). New York: Academic Press.
Chiszar, D. A., & Spear, N. E. (1968). Proactive interference in a T-maze brightness discrimination task. Psychonomic Science, 11, 107-108.
Colwill, R. M. (1991). Negative discriminative stimuli provide information about the identity of omitted response-contingent outcomes. Animal Learning & Behavior, 19, 326-336.
Colwill, R. M. (1993a). An associative analysis of instrumental learning. Current Directions in Psychological Science, 2, 111-116.
Colwill, R. M. (1993b). Signaling the omission of a response-contingent outcome reduces discriminative control. Animal Learning & Behavior, 21, 337-345.
Colwill, R. M., & Delamater, B. A. (1993). An associative analysis of instrumental biconditional discrimination learning. Animal Learning & Behavior. (Accepted for publication.)
Colwill, R. M., & Motzkin, D. K. (1993). Encoding of the unconditioned stimulus in Pavlovian conditioning. Animal Learning & Behavior. (Accepted for publication.)
Colwill, R. M., & Rescorla, R. A. (1985a). Post-conditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes, 11, 120-132.
Colwill, R. M., & Rescorla, R. A. (1985b). Instrumental responding remains sensitive to reinforcer devaluation after extensive training. Journal of Experimental Psychology: Animal Behavior Processes, 11, 520-536.
Colwill, R. M., & Rescorla, R. A. (1986). Associative structures in instrumental learning. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 20, pp. 55-104). New York: Academic Press.
Colwill, R. M., & Rescorla, R. A. (1988a). Associations between the discriminative stimulus and the reinforcer in instrumental learning. Journal of Experimental Psychology: Animal Behavior Processes, 14, 155-164.
Colwill, R. M., & Rescorla, R. A. (1988b). The role of response-reinforcer associations increases throughout extended instrumental training. Animal Learning & Behavior, 16, 105-111.
Colwill, R. M., & Rescorla, R. A. (1990a). Effect of reinforcer devaluation on discriminative control of instrumental behavior. Journal of Experimental Psychology: Animal Behavior Processes, 16, 40-47.
Colwill, R. M., & Rescorla, R. A. (1990b). Evidence for the hierarchical structure of instrumental learning. Animal Learning & Behavior, 18(1), 71-82.
Cotton, M. M., Goodall, G., & Mackintosh, N. J. (1982). Inhibitory conditioning resulting from a reduction in the magnitude of reinforcement. Quarterly Journal of Experimental Psychology, 34B, 163-180.
Crespi, L. P. (1942). Quantitative variation of incentive and performance in the white rat. American Journal of Psychology, 55, 467-517.
de Villiers, P. A. (1977). Choice in concurrent schedules and a quantitative formulation of the law of effect. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior. Englewood Cliffs, NJ: Prentice-Hall.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge: Cambridge University Press.
Dickinson, A., & Dawson, G. R. (1987). The role of the instrumental contingency in the motivational control of performance. Quarterly Journal of Experimental Psychology, 39B, 77-93.
Dickinson, A., & Dearing, M. F. (1979). Appetitive-aversive interactions and inhibitory processes. In A. Dickinson & R. A. Boakes (Eds.), Mechanisms of learning and motivation (pp. 203-231). Hillsdale, NJ: Erlbaum.
Dickinson, A., & Pearce, J. M. (1977). Inhibitory interactions between appetitive and aversive stimuli. Psychological Bulletin, 84, 690-711.
Dickinson, A., Peters, R. C., & Shechter, S. (1984). Overshadowing of responding on ratio and interval schedules by an independent predictor of reinforcement. Behavioral Processes, 9, 421-429.
Franks, G. J., & Lattal, K. A. (1976). Antecedent reinforcement schedule training and operant response reinstatement in rats. Animal Learning & Behavior, 4, 374-378.
Gleitman, H. (1971). Forgetting of long-term memories in animals. In W. K. Honig & P. H. R. James (Eds.), Animal memory. New York: Academic Press.
Gleitman, H., & Jung, L. (1963). Retention in rats: The effect of proactive interference. Science, 7, 19-20.
Gleitman, H., & Steinman, F. (1964). Depression effect as a function of retention interval before and after shift in reward magnitude. Journal of Comparative and Physiological Psychology, 57, 158-160.
Goeters, S., Blakely, E., & Poling, A. (1992). The differential outcomes effect. Psychological Record, 42, 389-411.
Guthrie, E. R. (1952). The psychology of learning (2nd ed.). New York: Harper & Row.
Gutman, A., & Maier, S. F. (1978). Operant and Pavlovian factors in cross-response transfer of inhibitory stimulus control. Learning and Motivation, 9, 231-254.
Hall, G., Channel, S., & Pearce, J. M. (1981). The effects of a signal for free or earned reward: Implications for the role of response-reinforcer associations in instrumental performance. Quarterly Journal of Experimental Psychology, 33B, 95-107.
Hall, G., Channel, S., & Schachtman, T. R. (1987). The instrumental overshadowing effect in pigeons: The role of response bursts. Quarterly Journal of Experimental Psychology, 39B, 173-188.
Hearst, E., & Franklin, S. R. (1977). Positive and negative relations between a signal and food: Approach-withdrawal behavior. Journal of Experimental Psychology: Animal Behavior Processes, 3, 37-52.
Hearst, E., & Peterson, G. B. (1973). Transfer of conditioned excitation and inhibition from one operant response to another. Journal of Experimental Psychology, 99, 360-368.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of Experimental Analysis of Behavior, 4, 267-272.
Hilgard, E. R., & Marquis, D. G. (1935). Acquisition, extinction, and retention of conditioned lid responses to light in dogs. Journal of Comparative Psychology, 19, 29-58.
Holland, P. C. (1983). "Occasion-setting" in Pavlovian feature positive discriminations. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Vol. IV. Discrimination processes. Cambridge, MA: Ballinger.
Holland, P. C. (1985). The nature of conditioned inhibition in serial and simultaneous feature negative discriminations. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition (pp. 267-297). Hillsdale, NJ: Erlbaum.
Holland, P. C. (1989). Transfer of negative occasion setting and conditioned inhibition across conditioned and unconditioned stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 15, 311-328.
Holland, P. C. (1991). Transfer of control in ambiguous discriminations. Journal of Experimental Psychology: Animal Behavior Processes, 17, 231-248.
Holland, P. C. (1992). Occasion setting in Pavlovian conditioning. In D. L. Medin (Ed.), The psychology of learning and motivation (Vol. 28, pp. 69-125). New York: Academic Press.
Huff, R. C., Sherman, J. E., & Cohn, M. (1975). Some effects of response-independent reinforcement on auditory generalization gradients. Journal of Experimental Analysis of Behavior, 23, 81-86.
Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.
Hunter, W. S. (1913). The delayed reaction in animals and children. Behavior Monographs, 2(1, Serial No. 6).
Hunter, W. S. (1934). Learning: IV. Experimental studies of learning. In C. Murchison (Ed.), A handbook of general experimental psychology (pp. 497-570). Worcester, MA: Clark University Press.
Jenkins, H. M. (1962). Resistance to extinction when partial reinforcement is followed by regular reinforcement. Journal of Experimental Psychology, 61, 111-121.
Jenkins, H. M. (1965). Measurement of stimulus control during discriminative operant conditioning. Psychological Bulletin, 64, 365-376.
Jenkins, H. M. (1977). Sensitivity of different response systems to stimulus-reinforcer and response-reinforcer relations. In H. Davis & H. M. B. Hurwitz (Eds.), Operant-Pavlovian interactions (pp. 47-62). Hillsdale, NJ: Erlbaum.
Johnson, D. L., McGlynn, F. D., & Topping, J. S. (1973). The relative efficiency of four response elimination techniques following VR reinforcement training. Psychological Record, 23, 203-208.
Kamin, L. J. (1968). Attention-like processes in classical conditioning. In M. R. Jones (Ed.), Miami symposium on predictability, behavior and aversive stimulation (pp. 9-33). Coral Gables, FL: University of Miami Press.
Konorski, J., & Miller, S. (1937). On two types of conditioned reflex. Journal of General Psychology, 16, 264-272.
Kruse, J. M., Overmier, J. B., Konz, W. A., & Rokke, E. (1983). Pavlovian conditioned stimulus effects upon instrumental choice behavior are reinforcer specific. Learning and Motivation, 14, 165-181.
Lamarre, J., & Holland, P. C. (1987). Acquisition and transfer of serial feature negative discrimination. Learning and Motivation, 18, 319-342.
Lattal, K. A., & Maxey, G. C. (1971). Some effects of response-independent reinforcers in multiple schedules. Journal of Experimental Analysis of Behavior, 16, 225-231.
Leyland, C. M., & Mackintosh, N. J. (1978). Blocking of first- and second-order autoshaping in pigeons. Animal Learning & Behavior, 6, 391-394.
Linwick, D., Overmier, J. B., Peterson, G. B., & Mertens, M. (1988). Interaction of memories and expectancies as mediators of choice behavior. American Journal of Psychology, 101, 313-334.
Mackintosh, N. J. (1974). The psychology of animal learning. London: Academic Press.
Mackintosh, N. J. (1983). Conditioning and associative learning. Oxford: Oxford University Press.
Mackintosh, N. J., & Cotton, M. M. (1985). Conditioned inhibition from reinforcement reduction. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition (pp. 89-111). Hillsdale, NJ: Erlbaum.
Mackintosh, N. J., & Dickinson, A. (1979). Instrumental (Type II) conditioning. In A. Dickinson & R. A. Boakes (Eds.), Mechanisms of learning and motivation (pp. 143-167). Hillsdale, NJ: Erlbaum.
Maier, S. F., & Gleitman, H. (1967). Proactive interference in rats. Psychonomic Science, 9, 63-64.
McGeoch, J. A., & Irion, A. L. (1952). The psychology of human learning. New York: Longmans, Green & Co.
McSweeney, F. K. (1975). Matching and contrast on several concurrent treadle-press schedules. Journal of Experimental Analysis of Behavior, 23, 193-198.
Miller, J. S., Jagielo, J. A., & Spear, N. E. (1990). Alleviation of short-term forgetting: Effects of the CS- and other conditioning elements in prior cueing or as context during test. Learning and Motivation, 21, 96-109.
Miller, L., & Price, R. D. (1974). Compounding of discriminative stimuli maintaining topographically different instrumental responses. Journal of General Psychology, 90, 109-121.
Miller, R. R., & Springer, A. D. (1973). Amnesia, consolidation, and retrieval. Psychological Review, 80, 69-79.
Miller, R. R., & Springer, A. D. (1974). Implications of recovery from experimental amnesia. Psychological Review, 81, 470-473.
Morgan, C. L. (1894). An introduction to comparative psychology. London: Walter Scott, Ltd.
Nevin, J. A. (1968). Differential reinforcement and stimulus control of not responding. Journal of Experimental Analysis of Behavior, 11, 715-726.
Nieto, J. (1984). Transfer of conditioned inhibition across different aversive reinforcers in the rat. Learning and Motivation, 18, 319-342.
Pacitti, W. A., & Smith, N. F. (1977). A direct comparison of four methods for eliminating a response. Learning and Motivation, 8, 229-237.
Pavlov, I. P. (1927). Conditioned reflexes. Oxford: Oxford University Press.
Pavlov, I. P. (1932). The reply of a physiologist to psychologists. Psychological Review, 39, 91-127.
Pearce, J. M. (1987). A model of stimulus generalization for Pavlovian conditioning. Psychological Review, 94, 61-73.
Pearce, J. M., & Hall, G. (1978). Overshadowing the instrumental conditioning of a lever press response by a more valid predictor of reinforcement. Journal of Experimental Psychology: Animal Behavior Processes, 4, 356-367.
Pearce, J. M., Montgomery, A., & Dickinson, A. (1981). Contralateral transfer of inhibitory and excitatory eyelid conditioning in the rabbit. Quarterly Journal of Experimental Psychology, 33B, 45-61.
Peck, C. A., & Bouton, M. E. (1990). Context and performance in aversive-to-appetitive and appetitive-to-aversive transfer. Learning and Motivation, 21, 1-31.
Perkins, C. C., Jr., & Weyant, R. G. (1958). The interval between training and test trials as determiner of the slope of generalization gradients. Journal of Comparative and Physiological Psychology, 51, 596-600.
Peterson, G. B., Linwick, D., & Overmier, J. B. (1987). On the comparative efficacy of memories and expectancies as mediators in the differential-reward conditional discrimination performance of pigeons. Learning and Motivation, 18, 1-20.
Peterson, G. B., & Trapold, M. A. (1982). Expectancy mediation of concurrent conditional discriminations. American Journal of Psychology, 95, 571-580.
Preston, G. C., Dickinson, A., & Mackintosh, N. J. (1986). Contextual conditional discriminations. Quarterly Journal of Experimental Psychology, 38B, 217-237.
Reid, R. L. (1957). The role of the reinforcer as a stimulus. British Journal of Psychology, 49, 292-309.
Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66, 1-5.
Rescorla, R. A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504-509.
Rescorla, R. A. (1973). Effect of US habituation following conditioning. Journal of Comparative and Physiological Psychology, 82, 137-143.
Rescorla, R. A. (1980). Pavlovian second-order conditioning: Studies in associative learning. Hillsdale, NJ: Erlbaum.
Rescorla, R. A. (1985). Inhibition and facilitation. In R. R. Miller & N. E. Spear (Eds.), Information processing in animals: Conditioned inhibition (pp. 299-326). Hillsdale, NJ: Erlbaum.
Rescorla, R. A. (1990a). Evidence for an association between the discriminative stimulus and the response-outcome association in instrumental learning. Journal of Experimental Psychology: Animal Behavior Processes, 16, 326-344.
Rescorla, R. A. (1990b). Instrumental responses become associated with reinforcers that differ in one feature. Animal Learning & Behavior, 18, 206-211.
Rescorla, R. A. (1990c). The role of information about the response-outcome relation in instrumental discrimination learning. Journal of Experimental Psychology: Animal Behavior Processes, 16, 262-270.
Rescorla, R. A. (1991). Transfer of inhibition and facilitation mediated by the original target stimulus. Animal Learning & Behavior, 19, 65-70.
Rescorla, R. A. (1992a). Response-independent outcome presentation can leave instrumental R-O associations intact. Animal Learning & Behavior, 20, 104-111.
Rescorla, R. A. (1992b). Response-outcome versus outcome-response associations in instrumental learning. Animal Learning & Behavior, 20, 223-232.
Rescorla, R. A. (1993). Preservation of response-outcome associations through extinction. Animal Learning & Behavior, 21, 238-245.
Rescorla, R. A., & Colwill, R. M. (1989). Associations with anticipated and obtained outcomes in instrumental learning. Animal Learning & Behavior, 17, 291-303.
Rescorla, R. A., & Heth, C. D. (1975). Reinstatement of fear to an extinguished conditioned stimulus. Journal of Experimental Psychology: Animal Behavior Processes, 1, 88-96.
Rescorla, R. A., & LoLordo, V. M. (1965). Inhibition of avoidance behavior. Journal of Comparative and Physiological Psychology, 59, 406-412.
Rescorla, R. A., & Skucy, J. C. (1974). Effect of response-independent reinforcers during extinction. Journal of Comparative and Physiological Psychology, 67, 381-389.
Rescorla, R. A., & Solomon, R. L. (1967). Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 74, 151-182.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Rilling, M. (1977). Stimulus control and inhibitory processes. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior. Englewood Cliffs, NJ: Prentice-Hall.
Robbins, S. J. (1990). Mechanisms underlying spontaneous recovery in autoshaping. Journal of Experimental Psychology: Animal Behavior Processes, 7, 175-190.
Rozeboom, W. W. (1957). Secondary extinction of lever-pressing behavior in the albino rat. Journal of Experimental Psychology, 54, 280-287.
Rozeboom, W. W. (1958). "What is learned?"-An empirical enigma. Psychological Review, 65, 22-33.
Schachtman, T. R., Brown, A. M., & Miller, R. R. (1985). Reinstatement induced recovery of a taste LiCl association following extinction. Animal Learning & Behavior, 13, 223-227.
Schachtman, T. R., & Hall, G. (1990). Potentiation and overshadowing of instrumental responding by pigeons: The role of behavioral contrast. Learning and Motivation, 21, 85-95.
Shipley, B. E., & Colwill, R. M. (1993). Direct effects on instrumental performance of outcome revaluation by drive shifts. Animal Learning & Behavior. (Accepted for publication.)
Skinner, B. F. (1938). The behavior of organisms. New York: Appleton-Century-Crofts.
Spear, N. E. (1967). Retention of reinforcer magnitude. Psychological Review, 74, 216-234.
Spear, N. E. (1978). The processing of memories: Forgetting and retention. Hillsdale, NJ: Erlbaum.
Spence, K. W. (1956). Behavior theory and conditioning. New Haven, CT: Yale University Press.
Spence, K. W. (1966). Cognitive and drive factors in the extinction of the conditioned eyeblink in human subjects. Psychological Review, 73, 445-458.
Spencer, H. (1871). The principles of psychology (Vol. 1, 2nd ed.). New York: Appleton-Century-Crofts.
St. Claire-Smith, R. (1979a). The overshadowing of instrumental conditioning by a stimulus that predicts reinforcement better than the response. Animal Learning & Behavior, 7, 224-228.
St. Claire-Smith, R. (1979b). The overshadowing and blocking of punishment. Quarterly Journal of Experimental Psychology, 31B, 51-61.
Steinman, F. (1967). Retention of alley brightness in the rat. Journal of Comparative and Physiological Psychology, 64, 105-109.
Thorndike, E. L. (1911). Animal intelligence. New York: Macmillan.
Thorndike, E. L. (1932). Fundamentals of learning. New York: Teachers College Press.
Tolman, E. C. (1933). Sign-Gestalt or conditioned reflex? Psychological Review, 55, 189-208.
Trapold, M. A. (1970). Are expectancies based upon different positive reinforcing events discriminably different? Learning and Motivation, 1, 129-140.
Trapold, M. A., & Overmier, J. B. (1972). The second learning process in instrumental learning. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 427-452). New York: Appleton-Century-Crofts.
Uhl, C., & Garcia, E. (1969). Comparison of omission with extinction in response elimination in rats. Journal of Comparative and Physiological Psychology, 69, 554-562.
Wagner, A. R. (1966). Frustration and punishment. In R. N. Haber (Ed.), Current research in motivation (pp. 229-239). New York: Holt, Rinehart & Winston.
Wagner, A. R. (1981). SOP: A model of automatic memory processing in animal behavior. In N. E. Spear & R. R. Miller (Eds.), Information processing in animals: Memory mechanisms. Hillsdale, NJ: Erlbaum.
Wagner, A. R., Logan, F. A., Haberlandt, K., & Price, T. (1968). Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 76, 171-180.
Wagner, A. R., Mazur, J. E., Donegan, N. H., & Pfautz, P. L. (1980). Evaluation of blocking and conditioned inhibition to a CS signaling a decrease in US intensity. Journal of Experimental Psychology: Animal Behavior Processes, 6, 376-385.
Walker, K. C. (1942). The effect of a discriminative stimulus transferred to a previously unassociated response. Journal of Experimental Psychology, 31, 312-321.
Weisman, R. G., & Litner, J. S. (1969). Positive conditioned reinforcement of Sidman avoidance behavior in rats. Journal of Comparative and Physiological Psychology, 68, 597-603.
Weisman, R. G., & Ramsden, M. (1973). Discrimination of a response-independent component in a multiple schedule. Journal of Experimental Analysis of Behavior, 19, 65-73.
Williams, B. A. (1978). Information effects on the response-reinforcer association. Animal Learning & Behavior, 6, 371-379.
Williams, B. A. (1982). Blocking the response-reinforcer association. In M. L. Commons, R. J. Herrnstein, & A. R. Wagner (Eds.), Quantitative analyses of behavior: Acquisition (Vol. 3, pp. 427-445). Cambridge, MA: Ballinger.
Williams, B. A., & Heyneman, N. (1982). Multiple determinants of “blocking” effects on operant behavior. Animal Learning & Behavior, 10, 72-76.
Wilson, P. N., & Pearce, J. M. (1989). A role for stimulus generalization in conditional discrimination learning. Quarterly Journal of Experimental Psychology, 41B, 243-273.
Zeiler, M. D. (1971). Eliminating behavior with reinforcement. Journal of Experimental Analysis of Behavior, 16, 401-405.
A BEHAVIORAL ANALYSIS OF CONCEPTS: ITS APPLICATION TO PIGEONS AND CHILDREN
Edward A. Wasserman
Suzette L. Astley
I. Introduction
Some conceptions are of things, some of events, some of qualities. Any fact, be it thing, event, or quality, may be conceived sufficiently for purposes of identification, if only it be singled out and marked so as to separate it from other things. Simply calling it “this” or “that” will suffice. . . . In this sense, creatures extremely low in the intellectual scale may have conception. All that is required is that they should recognize the same experience again. A polyp would be a conceptual thinker if a feeling of “Hollo! thingumbob again!” ever flitted through its mind. (pp. 462-463).
These lines, written by William James in 1890, are ripe with possible meanings. To us, they suggest several points that will help place into a special perspective our chapter on concepts. First, James was willing to consider the possibility, albeit remote, that even the simplest animals are capable of conceptual thought. Second, he deemed either identification or recognition to be sufficient indicators of conception. Third, he suggested that identification or recognition implies discrimination, so that some stimuli will elicit a particular reaction, whereas others will not. Finally, he spoke of overt and covert signs of conception; in the first paragraph, different public words in response to different stimuli indicate discrimination, whereas in the second, common private experiences in response to the same stimulus indicate recognition.
As was James, we too are willing to consider the possibility that nonhuman animals exhibit behaviors that, if they were to be exhibited by human beings, would lead most theorists to describe those actions as evidencing conceptualization. We also think that there are clear behavioral signposts of conceptualization and that recognition and discrimination are among them. We part company with James, and with a good many past and present theorists of cognition, by proposing that a rigorous and comparative analysis of concepts should primarily concern public stimuli and responses; speculations about private experiences, mental structures, and prototypical representations may be unnecessary for a sound understanding of the problem.
But just what is the problem? What behaviors do human and nonhuman animals exhibit that lead us to label those actions “conceptual”? Two examples from the exploits of the first author’s children illustrate well the nature of conceptual behavior.
When she was 2 years old, Lara was treated to her first film at a local theater. After her parents carefully tutored her in cinema etiquette, Lara attended and thoroughly enjoyed watching The Little Mermaid, her first movie. Soon after that experience, Lara saw her second movie on television, 20,000 Leagues Under the Sea. Days later, while her dad was remotely sampling the television’s viewing fare, he paused on one channel to see if Lara wanted to view an ongoing Jacques Cousteau adventure show. She said, “Yes, watch movie.”
When he was 2 years old, David displayed strong dislike for a piece of cake that, unlike any that he had tasted before, was garnished with nuts. Some weeks later, David was given a pasta salad for his main course at dinner.
After chewing for a while, he spit out a piece of celery and said, “I don’t like nuts.” The same nonverbal and verbal behaviors occurred on a later occasion when David was eating a helping of frozen yogurt that was filled with chocolate chips.
Evidently, Lara’s limited cinematic experience had led her to say that all films containing underwater scenes were “movies.” Similarly, David’s limited dining experience had led him to say that all firm ingredients that are mixed with softer foods were “nuts.” Parents, of course, find such vignettes charming and amusing; youngsters in the early stages of learning language often say cute things like these. Cognitive psychologists describe these behaviors differently; children in these cases are said to have made “conceptual” errors or to have induced “overly inclusive” categories. As we will see, behaviorists take a different tack: they analyze these and many other instances of human and animal behavior in terms of the familiar psychological processes of discrimination and generalization. Lara had previously discriminated different forms of visual entertainment from one another (e.g., cartoon vs. commercial), but in the preceding example, she generalized the term “movie” to all forms involving underwater action. David had previously discriminated different foods from one another (e.g., cereal vs. berry), but in the example, he generalized the term “nut” to all hard matter in soft food.
These two examples help us to define better the kinds of responses that James might have called conceptual had he more extensively analyzed the problem from a behavioral perspective. Instead of talking about things, events, and qualities as single, isolated stimuli, as he did in the opening passages, we should treat them as generic classes or categories, with the members of one category being more similar to each other than they are to the members of different categories (for more on this issue, see Quine, 1969). Generalization must then be considered along with discrimination as key elements in a behavioral analysis of concepts. We turn next to the origins of that behavioral analysis.
II. Toward a Behavioral Analysis of Concepts
The seed of a behavioral analysis of concepts can be traced to B. F. Skinner’s 1935 paper, “The Generic Nature of the Concepts of Stimulus and Response” (Skinner, 1935). This famous article does not explicitly deal with concepts, but it does discuss the basic problem for behavior analysis that no two instances of behavior are the same, nor do they occur in the same stimulating situation. The stimuli that control behavior-indeed, the very responses they control-can best be considered to belong to classes of related entities: “The procedure recommended by the present analysis is to discover the defining properties of a stimulus and a response and to express the correlation in terms of classes” (p. 56). The discovery of defining stimulus and response properties was not going to be a simple process. Skinner recognized that “it is difficult . . . to say precisely what defining properties are” (p. 49). Nevertheless, he offered that “the defining property of the stimulus is inferred from the part common to the different stimuli that are found to be effective [in producing the response]” (pp. 48-49). The notion of generic class and other elements of Skinner’s analysis of operant behavior were joined in an elegant account of conceptualization by two other pioneers in the experimental analysis of behavior: Fred S. Keller and William N. Schoenfeld. In their well-known textbook, Principles of Psychology (1950), Keller and Schoenfeld argued that the very question What is a concept? is the wrong one to ask. Instead, we should
begin with the question, What type of behavior is it that we call conceptual? They proposed that “when a group of objects gets the same response, when they form a class the members of which are reacted to similarly, we speak of a concept” (p. 154). So, for example, when a young child is taught to respond shoe to discriminably different members of one class of objects and book to discriminably different members of another class of objects, we have observed conceptual behavior. Keller and Schoenfeld did not explicitly consider the transfer of discriminative responding from familiar to novel stimuli. Nevertheless, we are quite certain that Keller and Schoenfeld would have expected such transfer, so that new shoes and new books will occasion appropriate behaviors, if they do not differ too much in appearance from the training stimuli. Their additional comments make it abundantly clear that their behavioral analysis of concepts did not hypothesize the existence of mental structures or prototypes, nor did it conjecture that conceptual behavior is under the control of abstract features or representations of objects or events. Indeed, Keller and Schoenfeld thought that there was absolutely no need whatsoever to invent any new behavior-analytic tools to explain conceptual behavior beyond the well-established principles of discrimination and generalization. “Generalization within classes and discrimination between classes-this is the essence of concepts” (p. 155). Finally, Keller and Schoenfeld broke from traditional opinion by proposing that conceptual behavior might be exhibited by nonverbal humans, like infants, and even by nonhuman animals. The proposal that conceptual behavior is neither uniquely verbal nor peculiarly human remains controversial to this day, as it most certainly must have been when James broached the notion over a century ago. Given our own special interest in the matter, that proposal will be of prime concern in this chapter. 
Thus, it is particularly appropriate here to repeat Keller and Schoenfeld’s (1950) insightful critique of the concept of “concept”:
It is curious to note the resistance that may be shown to the notion that the term concept need not be limited to matters capable of being verbalized or found only in the behavior of human adults. We seem to have here a problem in our own behavior. We have formed a concept of conceptual behavior which is based upon such factors as the age of the subject, his [or her] ability to verbalize, and the fact that he [or she] is human. (p. 159)
Despite the striking simplicity and objectivity of behavioral analyses, like Keller and Schoenfeld’s, these accounts have generally had rather little impact on mainstream research and theory in conceptualization. Until recently, cognitive psychologists with largely mentalistic and anthropocentric leanings have generally appropriated the study of concepts as
their own, with what some critics would say are unfortunate consequences (see Oden, 1987, for a review and analysis of such cognitive research and theory in the area of conceptual behavior). With only a few notable exceptions (such as work on human behavior using artificially constructed stimuli and similarity or connectionist models of classification; e.g., Gluck & Bower, 1988; Nosofsky, 1984), behavioral analyses of conceptualization have slowly simmered on the proverbial back burner of experimental psychology.
One likely reason for the historical lack of prominence of behavioral analysis was the absence of a strong empirical base for this general approach. Keller and Schoenfeld had little more data available to them to support their behavioristic account than James had to support his more mentalistic speculations. Indeed, writing not long after James penned the introductory quotations to this chapter, C. Lloyd Morgan (1894) considered, but then rejected, the possibility that nonhuman animals are capable of conceptual thought. Morgan came to this conclusion, “in no dogmatic spirit, and not in support of any preconceived theory or opinion, but because the evidence now before us is not, in my opinion, sufficient to justify the hypothesis” (p. 377).
In this light, the present chapter reviews most of the empirical evidence that our laboratory has collected over the past several years on conceptualization by pigeons. That evidence supports the speculations of James, Keller, and Schoenfeld that nonhuman animals do display conceptual behavior. The concepts that they acquire are not only of basic-level categories of objects, but of superordinate categories as well. We will also see that the notions of primary and secondary stimulus generalization (Hull, 1943) can explain these cases of conceptual behavior. And, despite our concentrating on the behavior of nonhuman animals, we will see that this analysis of concepts is also applicable to the behavior of young children.
III. Empirical Evidence on Basic-Level Conceptualization by Pigeons
Our decision to research conceptualization was not intended to shift the empirical focus of our pigeon laboratory (although that is precisely what happened); rather, it was the outgrowth of a continuing interest that the first author had in exploring the utility of behavioral analyses of cognitive processes (Wasserman, 1981, 1982, 1983). Many years of discussing Keller and Schoenfeld’s analysis of concepts in the classroom finally inspired an initial inquiry that was guided by two additional developments: Herrnstein’s celebrated studies of snapshot discrimination by pigeons
and our own experimental analogue for pigeons of concept learning by children. When we began our research into conceptual behavior in animals, we wished to build on the groundwork laid by Herrnstein and his colleagues (for a review of this research, see Herrnstein, 1985). That work-begun in 1964 by Herrnstein and Loveland-clearly established that pigeons could discriminate 35-mm color slides that depicted a particular class of stimuli (like people, fish, or trees) from otherwise comparable slides that did not. These discriminations not only held for large sets of previously seen slides, but they also generalized to novel slides from the feature-present and feature-absent categories. Successful stimulus generalization in these projects supported Herrnstein’s suggestion that basic-level concepts or categories are effectively open-ended; they comprise limitless instances of related stimuli.
So, we too taught pigeons to discriminate colorful photographic images, but we wanted them to engage in a much closer approximation to human conceptual behavior than the feature-present versus feature-absent discriminations studied by Herrnstein’s group. Complementary concepts like tree and nontree do conform with James’ earlier quoted discriminations of “this” versus “that.” However, these concepts are not only highly atypical and asymmetrical, but the stimulus control over behavior acquired in these tasks may differ notably from that acquired in other kinds of discriminations (for more on this matter, see Edwards & Honig, 1987; Herrnstein, 1979; Wasserman, 1974). Thus, in our work, we required pigeons to discriminate concurrently slide images depicting stimuli from four different classes of objects, each class representing basic-level categories from different superordinate categories.¹ In addition, Herrnstein (1985) had reported difficulty in training pigeons to discriminate slides with humanmade objects from slides without those objects.
Thus, two of the four classes of stimuli that we used were photographs of natural objects and two were photographs of humanmade objects. Of course, if we were to devise an effective and suitable analogue of human concept learning, then we would have to capture the gist of the
¹ Two of the categories that we used (chair and car) are identical to some of the nonbiological basic-level categories studied by Rosch, Mervis, Gray, Johnson, and Boyes-Braem (1976). The other two categories that we used were biological in nature and they varied across studies; these categories were not identical to any used by Rosch et al. However, it could be argued that these categories are at an equivalent level to the basic-level categories used there. Rosch et al. found the categories tree, fish, and bird to be at the basic level for human adults. The cat and flower categories that we used in the earliest experiments seem similar in level. In later studies, we substituted “human” for “cat.” It is possible that “human” is a special category unlike any other biological category for human subjects, but not for pigeon subjects.
process and translate it into a setting that would be suitable for the pigeon-a nonhuman and nonverbal creature. Our behavioral analysis of conceptualization led us to consider the many ways that we commonly learn concepts. We picked the very familiar case of a parent teaching a child to name the pictures in a book. In this so-called “name” game, the parent opens the picture book, points to one of its many colorful illustrations, and asks the child, “What is it?” If the child makes the correct verbal response, then positive social reinforcement is provided. If the child makes the incorrect verbal response, then no positive reinforcement is provided; instead, the parent may ask the child to try again, and if this request also fails to occasion the correct verbal response, then the parent may have to supply it. Many hours are spent by both parent and child in this highly enjoyable activity. Yet, the name game is much more than a mere diversion: It involves the parent’s very careful and selective application of social reinforcement, so that different classes of discriminative stimuli gain control over the child’s ever-expanding repertoire of verbal behavior.
A. EXPERIMENT 1: DISCRIMINATION AND GENERALIZATION
In our work, we tried to extract and apply the essence of the name game to pigeons to see if these animals can concurrently report stimuli from four different categories of objects. Instead of requesting verbal behavior from our subjects (an obvious impossibility), we asked the birds to report members of four different categories-cats, flowers, cars, and chairs-by pecking four circular keys surrounding a square central viewing screen. On each of 40 daily trials, the four pigeons in one of our first experiments (Bhatt, Wasserman, Reynolds, & Knauss, 1988, Experiment 1B) were shown color slides depicting 10 different examples from each of the four categories. Within each category, the slides differed greatly from each other in the number, size, color, brightness, orientation, location, and context of the stimulus object. Our aim in selecting objects to photograph was to capture a broad range of category instances in those places where humans would ordinarily find them. In some of the slides, the stimulus object was partially obscured, whereas in others, the whole object was visible. After 30 pecks to the viewing screen to ensure that the slide had been observed, the four report keys were illuminated (each with a different color to help the pigeon to distinguish its response options) and a single choice response was permitted. If it was to the correct key for reporting members of the category of stimulus shown on the viewing screen on that trial, then the pigeon was given food reinforcement; if it was to any of the three incorrect keys, then no reinforcement was given and a correction
trial immediately followed. (Correction trials were never scored.) The pigeon continued on to the next trial only after a correct choice-whether it was the first or the nth of a trial. A particular pigeon might have to peck the top left key in response to pictures of cats, the top right key in response to pictures of flowers, the bottom left key in response to pictures of cars, and the bottom right key in response to pictures of chairs. Different pigeons received different category-key assignments, so that across a quartet of birds each key was equally often associated with each of the four stimulus categories. Acquisition of discriminative responding under these conditions was quite regular and rose from the chance accuracy score of 25% to a mean of about 80% correct after 30 days of training. At no point in the conduct of this experiment or in any of numerous others conducted in our laboratory was there any sign that the pigeons categorized the photographs of humanmade stimuli more slowly or less accurately than they categorized the photographs of natural stimuli. Immediately after reaching the 80% level of acquisition, the pigeons were given generalization testing with 10 brand-new snapshots of objects in each of the four categories. The testing phase lasted 2 days, on each of which the pigeons saw 20 old slides and 20 new ones. Accuracy to the old slides averaged 81% and to the new ones 64%. Thus, the pigeons had acquired very discriminative behavior, which enabled them to categorize a set of highly complex and lifelike stimuli that they had seen 30 times before, and still other stimuli that they had never seen before. If, as Keller and Schoenfeld had argued, generalization within categories and discrimination between categories is the essence of concepts, then surely our pigeons can be said to have evidenced conceptual behavior. 
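The counterbalancing of category-key assignments described above can be illustrated with a short sketch. It is not the procedure actually used in the laboratory: the key labels, the function names, and the choice of a cyclic Latin square are assumptions made only for illustration. The sketch shows one way in which, across a quartet of birds, each key can be paired equally often with each category, and why chance accuracy with four report keys is 25%.

```python
# Illustrative sketch only: key labels and the cyclic (Latin-square) rotation
# are assumptions, not details reported for the original experiment.
CATEGORIES = ["cat", "flower", "car", "chair"]
KEYS = ["top_left", "top_right", "bottom_left", "bottom_right"]


def latin_square_assignments(categories, keys):
    """Return one {category: key} mapping per bird, rotating keys cyclically."""
    n = len(keys)
    return [
        {cat: keys[(i + bird) % n] for i, cat in enumerate(categories)}
        for bird in range(n)
    ]


def score_choice(assignment, category, chosen_key):
    """True if the chosen key is the one assigned to the slide's category."""
    return assignment[category] == chosen_key


assignments = latin_square_assignments(CATEGORIES, KEYS)

# With four report keys and one correct key per trial, random pecking
# yields one correct choice in four.
chance_accuracy = 1 / len(KEYS)
```

Under these assumptions, the first bird's mapping matches the example given in the text (cats on the top left key, flowers on the top right, cars on the bottom left, chairs on the bottom right), and each of the remaining birds receives a rotated version of it.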
We believe that it is noteworthy that categorization accuracy was reliably lower to the novel test stimuli than it was to the familiar training stimuli. This generalization decrement may be explained by a host of different theories of conceptual behavior-from exemplar models to prototype models (Smith & Medin, 1981; see also Astley & Wasserman, 1992). This fact suggests that the pigeons had memorized some or all of the photographic stimuli that they had seen during training, although absolutely nothing in the training regimen required them to do so.
B. EXPERIMENT 2: CATEGORY SIZE
Further data on the issue of individual stimulus learning and memory come from a later project conducted in our laboratory by Bhatt (1988; see Wasserman & Bhatt, 1992). There, three groups of four pigeons each were given 48 daily training trials comprising: 12 copies of 1 example from the
categories cat, flower, car, and chair (Group 1); 3 copies of 4 examples from the categories (Group 4); or 1 copy of 12 examples from the categories (Group 12). The obtained speed of discrimination learning was an inverse function of the number of examples shown per category. The mean numbers of days to reach a criterion of 70% accuracy on 2 successive days of training were 6 for Group 1, 11 for Group 4, and 22 for Group 12. Either the smaller number of stimulus repetitions or the greater number of stimuli to be remembered with increasing numbers of examples per category can account for this learning function. Of additional importance were the results of a generalization test with 32 novel stimuli-8 from each category. Here, accuracy was a direct function of the number of examples given during training. The mean percentages of correct choices on generalization test trials were 27% for Group 1, 45% for Group 4, and 62% for Group 12. Thus, although increasing the difficulty of original learning, greater numbers of training examples per category enhanced the accuracy of generalization performance, perhaps because of the increased likelihood that any given test stimulus will resemble one or more of the remembered training stimuli (Smith & Medin, 1981). Not only are these learning and testing data orderly, but they neatly correspond with a large body of research on categorization in humans (reviewed by Homa, Burruel, & Field, 1987) and with a recent report on two-category discrimination performance in pigeons involving 5 versus 35 examples of bird and mammal sketches per category (Cook, Wright, & Kendrick, 1990).² A final point to note is that all of the pigeons in this experiment exhibited greater accuracy to old than to new stimuli, even the birds shown 12 different pictures per category during original discrimination training (and the subjects that had evidenced the highest level of transfer to novel test pictures).
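The arithmetic of the three training regimens, and the learning criterion applied to them, can be made concrete with a brief sketch. The function names and the accuracy series are hypothetical. The sketch also assumes that the criterion means at least 70% correct on each of 2 consecutive days and that the reported day count includes the second of those days; the text does not spell out either convention.

```python
# Illustrative sketch: names and the sample accuracy series are hypothetical.
def daily_trials(examples_per_category, copies_per_example, n_categories=4):
    """Number of daily training trials for one group."""
    return n_categories * examples_per_category * copies_per_example


def days_to_criterion(daily_accuracy, criterion=0.70, run_length=2):
    """Return the 1-based day that completes the first run of `run_length`
    consecutive days at or above `criterion`, or None if never reached.
    (Counting the final day of the run is an assumed convention.)"""
    run = 0
    for day, accuracy in enumerate(daily_accuracy, start=1):
        run = run + 1 if accuracy >= criterion else 0
        if run == run_length:
            return day
    return None


# (examples per category, copies of each example) for the three groups
GROUPS = {"Group 1": (1, 12), "Group 4": (4, 3), "Group 12": (12, 1)}
trials_per_group = {
    name: daily_trials(examples, copies)
    for name, (examples, copies) in GROUPS.items()
}

# Made-up daily accuracy scores for a single bird, for illustration only.
sample_accuracy = [0.25, 0.40, 0.72, 0.65, 0.71, 0.74]
sample_days = days_to_criterion(sample_accuracy)
```

Each regimen yields the same 48 daily trials, so the groups differ only in how those trials are divided between distinct examples and repetitions of the same example.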
C. EXPERIMENT 3: TRIAL-UNIQUE STIMULI
All of the research described so far has entailed stimuli that were repeated-either between daily sessions or both between and within daily sessions of training. Is such stimulus repetition necessary to support successful discrimination learning and generalization?
² Using stripes of various colors and lengths, Pearce (1988) did not observe differences in go-no go discrimination learning between a group of pigeons given 18 S+s and 18 S-s, and another group given 1 S+ and 1 S-. Novel test stimuli were not given to test for generalization.
To investigate this issue, we created a large library of snapshots from four categories: people, flowers, cars, and chairs (Bhatt et al., 1988, Experiment 3). We replaced pictures of cats with pictures of people because the latter individuals were generally (but not always, as we were sometimes pointedly reminded) much more agreeable to having their photographs taken. With 2000 snapshots (500 from each of the four categories) and 40 trials per session, we could train the pigeons for 50 sessions with no stimulus ever being repeated; each trial was thus both a training trial and a testing trial. The results of the experiment were clear-cut: pigeons came to respond discriminatively to stimuli from the four different categories of pictures even if those individual examples were never repeated. After beginning at the chance level of 25%, discrimination accuracy of a group of four pigeons rose to a mean level of 70% over Days 41 to 50 of training. Clearly, the birds had to have been remembering something about the photographic stimuli that they had seen earlier. But what? Given that a large, nonrepeating set of stimuli was used in this experiment, if the pigeons remembered individual stimuli, then it is likely that on a given day they remembered some or all of the slides presented to them on the previous day. We tested this possibility in a follow-up investigation that began immediately after the conclusion of the main experiment. On Day 51, the pigeons were tested with a slide tray that contained a mixture of 20 of the slides shown on Day 50 plus 20 new slides. Accuracy to the old slides averaged 79% and to the new ones 75%. Although in the correct direction for indicating individual stimulus memory, this disparity in discrimination accuracy fell short of attaining statistical significance.
D. EXPERIMENT 4: STIMULUS REPETITION
The prior experiment convincingly showed that stimulus repetition was not at all necessary for categorical discrimination learning. It also failed to show any reliable positive effect of stimulus repetition on discrimination accuracy, although the conditions that were arranged for demonstrating such an effect could hardly have been poorer. Experiment 4 (Bhatt et al., 1988, Experiment 4) more systematically investigated the matter. Here, a set of 40 slides-10 each from the categories person, flower, car, and chair-was chosen randomly from the library of 2000 used in Experiment 3. Four different pigeons were trained with this set of slides on “repeating” sessions that alternated with “nonrepeating” sessions in which the birds were trained with new sets of slides that were never used again in another session. The pigeons were trained with the repeating 40-slide set on Days 1, 3, 5, . . . , 95 while being trained with novel nonrepeating slide sets on Days 2, 4, 6, . . . , 96.
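The alternating-day plan just described can be laid out in a few lines of code. This is a minimal sketch under stated assumptions: the function and variable names are hypothetical, and the only quantities taken from the text are the 96 training days, the 40-slide sessions, and the 2000-slide library. It simply counts how often the repeating set is shown and how many library slides the plan consumes.

```python
# Illustrative sketch: names are hypothetical; only the day counts, session
# size, and library size come from the description in the text.
SLIDES_PER_SESSION = 40
LIBRARY_SIZE = 2000


def session_plan(n_days):
    """Map each 1-based day to 'repeating' (odd days) or 'nonrepeating' (even days)."""
    return {
        day: ("repeating" if day % 2 == 1 else "nonrepeating")
        for day in range(1, n_days + 1)
    }


plan = session_plan(96)
repeating_days = [day for day, kind in plan.items() if kind == "repeating"]
nonrepeating_days = [day for day, kind in plan.items() if kind == "nonrepeating"]

# The repeating set is first seen on Day 1; every later odd day is a repetition.
exposures = len(repeating_days)
repetitions = exposures - 1

# One repeating set plus a fresh 40-slide set for every nonrepeating session.
slides_needed = SLIDES_PER_SESSION * (1 + len(nonrepeating_days))
```

On this reading, the repeating set is presented on 48 days (47 repetitions after its first presentation), and the plan draws on 1960 distinct slides, which fits within the 2000-slide library.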
The results clearly showed that discriminative responding rose faster and attained higher final levels of accuracy to the repeating slide set than to the nonrepeating slide sets. Performance on the repeating set rose from a mean of 29% in the first 4-day block to a mean of 85% in the last 4-day block; performance on the nonrepeating sets rose from a mean of 26% in the first 4-day block to a mean of 66% in the last 4-day block. Although quite unnecessary for categorical discrimination learning, stimulus repetition materially facilitates the process. Given that the set of 40 repeating slides was chosen at random from a total pool of 2000, it is highly unlikely that the superior performance on the repeating slides was a result of their being inherently easier to categorize by the pigeons than the nonrepeating ones. Nevertheless, to be more confident that the results of the experiment were not because of this unlikely confounding, we conducted a follow-up project. Here, we retrained the pigeons on the same task, this time using a different repeating slide set. Specifically, the set of novel stimuli presented to the pigeons on Day 96 was used as the repeating set in the follow-up investigation. The nonrepeating slide sets now entailed the photographs that the pigeons had seen as the nonrepeating slides in the main experiment. Thus, the nonrepeating sets consisted of slides that the pigeons had seen only once before, 95 days earlier, thereby making it even more difficult for us to obtain a reliable difference between the new repeating slide set and the nonrepeating slide sets. This follow-up investigation lasted 16 days and adhered to the same alternating-day plan as the main experiment. The results of the follow-up project were similar to those of the main experiment. 
Categorization performance on the repeating set of slides was more discriminative (M = 76% correct in the last 4-day block) than was performance on the nonrepeating sets of slides (M = 66% correct in the last 4-day block). Examination of individual birds’ scores disclosed that each pigeon showed more discriminative performance on the repeating slide set than on the nonrepeating slide sets and that this difference was present even in the first 8 days of the project. To obtain a more exact estimate of the number of repetitions necessary to facilitate categorization during the follow-up investigation, we looked for the first session with repeating slides on which two performance criteria were met: (a) the bird’s discrimination performance on that session was more accurate than its performance on the first exposure to the repeating slide set, and (b) the bird’s performance on that session was more accurate than its performance on the preceding and succeeding sessions with nonrepeating sets of slides. It took a mean of only two repetitions for the four pigeons to meet these criteria. That number was less than the six repetitions that, by chance alone, would have resulted in a session meeting
these criteria. That number was also greater than the single repetition that the birds in the follow-up portion of Experiment 3 had received, where no statistically reliable performance advantage over novel stimuli was evidenced by stimuli seen only one time before on the previous day. Finally, it should be noted that accuracy to the repeating slide set in the follow-up project (76% correct) was lower than was accuracy to the repeating slide set in the main experiment (88% correct). This finding is consistent with the different number of repetitions involved in each case: 7 in the follow-up investigation and 47 in the main experiment.
IV. Stimulus Generalization and Conceptualization
Given all of the preceding data on conceptual discrimination and generalization in pigeons, one might wonder whether the differential reinforcement that we so assiduously administered in those experiments really created the conceptual behavior that our pigeons exhibited. This point may appear to be an odd one to raise, but Herrnstein and de Villiers (1980) speculated on the basis of their own research that differential reinforcement may not produce, but merely disclose already-existing concepts: “Something in the pigeon’s perceptual dynamics ties [stimuli] together as a class, prior to differential reinforcement” (p. 87). This argument is tantamount to saying that primary stimulus generalization is at the root of conceptual behavior, an assertion that is entirely consistent with Keller and Schoenfeld’s behavioral analysis and one for which there is growing empirical support in both human and nonhuman animals (Harnad, 1987). It seems quite reasonable to hypothesize that many, if not most, basic-level human conceptual categories comprise highly similar stimuli. To our eyes, cats resemble one another much more than they resemble flowers, cars, or chairs. This perceptual similarity may be an important and inborn factor responsible for the emergence of the very concepts that we are considering, a possibility stated most clearly and emphatically by Quine (1969):
If then I say that there is an innate standard of similarity, I am making a condensed statement that can be interpreted, and truly interpreted, in behavioral terms. Moreover, in this behavioral sense it can be said equally of other animals that they have an innate standard of similarity too. It is part of our birthright. And, interestingly enough, it is characteristically animal in its lack of intellectual status. (p. 11)
Quine (1969) went on to suggest that the origin of perceptual similarity as well as the concordance of similarity relations from person to person result from the operation of evolutionary mechanisms (for a divergent
view, see Keil, 1989). “If people’s innate spacing of [perceptual] qualities is a gene-linked trait, then the spacing that has made for the most successful inductions will have tended to predominate through natural selection” (p. 13). Anderson (1991) later expanded on Quine’s thesis and proposed that the main force behind perceived similarity is physical similarity. In his words, “the mind has the structure it has because the world has the structure it has” (p. 428). Quite apart from the origins of perceived similarity (see Spinozzi, 1993, for a discussion of both phylogenetic and ontogenetic trends in spontaneous classificatory behavior in human and nonhuman animals), we can legitimately ask whether such categorical similarity is perceived by nonhuman animals like pigeons. And, if it is, then how can we tell? To answer these questions, our laboratory has pursued three different lines of inquiry suggesting that pigeons and humans do similarly group stimuli into different categories without the birds ever being required to do so by the prevailing contingencies of reinforcement.
A. EXPERIMENT 5: CATEGORIES AND PSEUDOCATEGORIES
One way to study the coherence of categories and their concordance in pigeons and people involves comparing the relative speeds of pigeons’ learning to sort the same pictorial stimuli into human conceptual categories (“true categorization”) or into absolutely arbitrary collections (“pseudocategorization”). Suppose that all of the slides in the total pool of cat, flower, car, and chair stimuli were equally discriminable from one another. If this supposition were true, then pigeons trained on the true categorization task should learn at the same rate as pigeons trained on the pseudocategorization task, in which equal numbers of cats, flowers, cars, and chairs are associated with the four different key-peck responses. However, if to pigeons members of the human conceptual categories more closely resemble one another than they resemble members of the other conceptual categories, then learning of the true categorization task should proceed faster than learning of the pseudocategorization task. This prediction follows from the fact that correct responding in the true categorization task should be bolstered by direct strengthening of responding to a particular key in the presence of a particular stimulus and by indirect strengthening due to similar stimuli in the same category occasioning the same response. However, in the pseudocategorization task, correct responding will be bolstered primarily by direct strengthening of responding to a particular key in the presence of a particular stimulus; greater generalization within the conceptual categories here should produce an equal likelihood of pecking all four keys, thus decreasing the accuracy of discriminative performance.
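The difference between the two sorting tasks can be sketched in code. The example is purely illustrative: the set size of 12 slides per category, the slide identifiers, and the rule used to spread slides across keys are assumptions, not details of the original study (a real pseudocategory would be an arbitrary collection rather than a systematic one). The point is only that, in the pseudocategorization task, every key is paired with equal numbers of cats, flowers, cars, and chairs, so that within-category generalization cannot favor any one key.

```python
from collections import Counter

# Illustrative sketch: set size, slide identifiers, and the modulo rule for
# building pseudocategories are assumptions made for this example.
CATEGORIES = ["cat", "flower", "car", "chair"]
KEYS = [1, 2, 3, 4]


def make_slides(per_category):
    """Represent each slide as a (category, index) pair."""
    return [(cat, i) for cat in CATEGORIES for i in range(per_category)]


def true_assignment(slides):
    """True categorization: every slide of a category shares one key."""
    key_for = dict(zip(CATEGORIES, KEYS))
    return {slide: key_for[slide[0]] for slide in slides}


def pseudo_assignment(slides):
    """Pseudocategorization: each category is spread evenly over all keys."""
    return {(cat, i): KEYS[i % len(KEYS)] for (cat, i) in slides}


def categories_per_key(assignment):
    """Count how many slides of each category are paired with each key."""
    counts = {key: Counter() for key in KEYS}
    for (cat, _), key in assignment.items():
        counts[key][cat] += 1
    return counts


slides = make_slides(per_category=12)
true_counts = categories_per_key(true_assignment(slides))
pseudo_counts = categories_per_key(pseudo_assignment(slides))
```

In the true task each key is paired with 12 slides from a single category; in the pseudo task each key is paired with 3 slides from every category, so any response tendency that generalizes among, say, cats is spread equally over all four keys.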
Our study comparing these two conditions (Wasserman, Kiedinger, & Bhatt, 1988, Experiment 2) clearly supported the latter possibility. Over Days 37 to 40 of discrimination training, pigeons on the true categorization task averaged 79% correct, whereas pigeons on the pseudocategorization task averaged only 44% correct. These results (and those of Edwards & Honig, 1987, Herrnstein & de Villiers, 1980, and Pearce, 1988, but not those of Cook et al., 1990, who used special pen-and-ink sketches of animals) implicate differential within- versus between-class generalization as a key feature of visual categorization in animals. Follow-up data were collected by Bhatt (1988) on the role of the number of different examples constituting true categories and pseudocategories. Experiment 2, described earlier, documented the decrease in the speed of learning true categories containing, respectively, 1, 4, or 12 different examples. Bhatt also found that, with pseudocategories, the mean numbers of days to reach a criterion of 70% accuracy on 2 successive days of training were 6 for Group 1,³ 18 for Group 4, and 50 for Group 12. Thus, the disparity between the speed of true category learning and the speed of pseudocategory learning rose as the number of examples per category or pseudocategory was increased: the disparity was 7 days with 4 examples per category or pseudocategory (11 vs. 18 days), and it was 28 days with 12 examples per category or pseudocategory (22 vs. 50 days).
B. EXPERIMENT 6: SPLIT CATEGORIES
A second line of inquiry also concerned the nature of the stimuli that to pigeons constitute a class or category of objects. Although others investigating animal behavior with different techniques had shown that members of one stimulus class resemble one another more than they resemble members of other stimulus classes (Cerella, 1979; Sands, Lincoln, & Wright, 1982), we devised a new method by which this notion could be tested with pigeons (Wasserman et al., 1988, Experiment 1). In any particular 40-trial session, pigeons were given a split-category discrimination in which they viewed 20 cat slides and 20 flower slides (or 20 cat slides and 20 chair slides, or 20 car slides and 20 flower slides, or 20 car slides and 20 chair slides). For each pigeon, one half of the cat slides would require a peck to one key (Key 1) and the other half of the cat slides would require a peck to a second key (Key 2), whereas one half of the flower slides would require a peck to a third key (Key 3) and the other half of the flower slides would require a peck to a fourth key (Key 4). (Cat-chair, car-flower, and car-chair sessions were similarly constructed, with different pigeons having different key assignments.) If the cat slides in the first set were equivalently discriminable from the 30 other slides shown in the illustrative session, then errors should be randomly distributed to Keys 2, 3, and 4. However, if the 10 slides in the first set of cats are more similar to the 10 slides in the second set of cats than they are to the 20 flower slides, then more errors should be made to Key 2 than to Keys 3 or 4. The pigeons’ pattern of errors of commission was clearly consistent with the latter possibility. Over Days 105-112 of training, a mean of 56% of all errors were within-category in nature (33% was the chance level, because mistakes could be made on only three keys).
This split-category experiment was relevant to another salient issue: the discriminability of the training and testing stimuli in our initial generalization experiments (Bhatt et al., 1988; the present Experiment 1). Did our birds respond discriminatively to the novel test stimuli because they were indiscriminably different from the training stimuli? Or had generalization occurred to discriminably different stimuli? Because the different split categories used by Wasserman et al. corresponded with the training and testing sets used by Bhatt et al., we could decisively answer the question. Over Days 105-112, the pigeons’ split-categorization accuracy averaged 72% (here, the chance of a correct response was again 25%). Because the training and testing stimuli in the Bhatt et al. project were demonstrably discriminable to pigeons, we conclude that the reliable generalization obtained in that study did not result from the birds’ inability to discriminate the new from the old sets of stimuli.
³ True categories and pseudocategories with just one example per category are, of course, equivalent.
This case of stimulus generalization thus passes a rather stringent test of conceptual behavior, one that excludes the rather uninteresting possibility of transfer resulting from the subject’s inability to discriminate the training stimuli from the testing stimuli (for more on this issue, see Quinn & Eimas, 1986; Younger & Cohen, 1985). A final point to note about the split-categorization experiment is that the within-category generalization results cannot be explained simply by the contingency of reinforcement to which the pigeons were exposed. If the pigeons were to have discriminated between but not within the given categories, then all errors of commission would have been of the within-category variety and correct responding could never have risen above 50%. The obtained rise in correct responding to a mean level of 72% indicates the acquisition of stimulus control by some or all of the individual pictures that the pigeons were shown. Thus, the prevailing contingency of reinforcement might have encouraged within-category generalization (yielding an increase of from 25% to 50% food reinforcement), but not nearly as much as it should have encouraged within-category discrimination (yielding an increase of from 25% to 100% reinforcement). Any effort on the part of the birds to maximize the obtained rate of food reinforcement should have reduced rather than increased their propensity to commit within-category errors.
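The chance levels and reinforcement percentages cited for the split-category task follow from simple counting over the four keys. The sketch below works them out with exact fractions; the function and dictionary key names are hypothetical labels, and the three "strategies" are idealized cases rather than descriptions of any bird's behavior.

```python
from fractions import Fraction

N_KEYS = 4  # four response keys in the split-category task


def split_category_expectations(n_keys=N_KEYS):
    """Expected proportions under idealized response strategies."""
    return {
        # No stimulus control: each key is equally likely to be pecked.
        "chance_correct": Fraction(1, n_keys),
        # An error can fall on any of the three wrong keys; only one of
        # them is assigned to the other half of the same category.
        "chance_within_category_error": Fraction(1, n_keys - 1),
        # Discriminating between but not within categories: a random
        # choice between the two keys assigned to the slide's category.
        "between_only_correct": Fraction(1, 2),
        # Discriminating every slide: always the correct key.
        "full_discrimination_correct": Fraction(1, 1),
    }


for name, value in split_category_expectations().items():
    print(f"{name}: {float(value):.0%}")
```

The obtained 72% accuracy lies above the 50% ceiling of the between-only strategy, which is why it indicates stimulus control by individual pictures.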
C. EXPERIMENT 7: GO-NO GO DISCRIMINATION
Direct evidence on the perceived similarity of category members comes from a more recent investigation (Astley & Wasserman, 1992, Experiment 2), in which pigeons learned a successive go-no go discrimination with 60 slides: 12 S+s and 48 S-s. All eight birds were given the same S-s: 12 people, 12 flowers, 12 cars, and 12 chairs. Different birds had different S+s: A given bird’s S+s might be 12 different people, 12 different flowers, 12 different cars, or 12 different chairs. Assuming that the 12 S+s are equally similar to all 48 S-s, errors should be randomly distributed among the four S- categories, including the one from which the S+s were picked. But, if to pigeons, members of a given human conceptual category more closely resemble one another than they resemble members of different conceptual categories, then errors should be nonrandomly distributed and should be disproportionately committed to the S-s from the same category as the S+s. Precisely the latter result was obtained regardless of whether the S+s were slides of people, flowers, cars, or chairs. Over all 16 days of multiple schedule training and all eight pigeons, a mean of 43% (rather than the chance mean of 25%) of all S- errors were committed to stimuli from the S+ category. This work thus reveals that pigeons group similar stimuli together, even when that grouping is completely unrelated to the prevailing contingency of reinforcement. Unlike our initial studies that first explicitly reinforced correct categorization responses and then found reliable generalization to untrained instances, categorical generalization here was evidenced by the birds’ untrained propensity to commit the most errors to negative discriminative stimuli from the same conceptual category as the positive discriminative stimuli. A strong perceptual basis for conceptualization is clearly implicated by the results of this study.
Beyond the fact that within-category errors exceeded between-category errors, we also found that the pigeons responded much more to the S+s than they did to the S-s from the same category as the S+s. This result again testifies to the discriminability of individual stimuli from the several human conceptual categories that we have studied (cf. Experiment 6). Still other data that we collected in experiments affiliated with this project concern previously discussed empirical relations and yet-to-be discussed theoretical interpretations.
In addition to investigating a photographic discrimination involving 12 different S+s and 48 different S-s, we also studied a photographic discrimination involving only 1 S+ and the same set of 48 different S-s (Astley & Wasserman, 1992, Experiment 1). The latter discrimination was learned faster and entailed a smaller percentage of within-category errors than did the former. It took only 6 days for birds given the 1 S+/48 S- discrimination to respond to the S+ picture at a mean rate that was at least an order of magnitude greater than that to the S- pictures, whereas it took 11 days for birds given the 12 S+/48 S- discrimination to do so. Also, the highest mean daily percentage of within-category errors for pigeons given the 1 S+/48 S- discrimination was 43%, whereas it was 57% for pigeons given the 12 S+/48 S- discrimination. Finally, we replaced some of the original S- slides from the 12 S+/48 S- discrimination with brand-new slides (Astley & Wasserman, 1992, Experiment 3). Those replacements were chosen from the three S- categories from which the S+s were not drawn, to ensure that any changes in response rate to the new stimuli from the three S- categories were not caused by changes in the similarity of the replaced stimuli to the S+ pictures. In special test sessions, the mean response rates to the original S+s and to the original S-s from the three conceptual categories combined were 3.33 and 0.11 key pecks per sec, respectively. The mean response rate to new stimuli from the three S- categories combined was 0.69 key pecks per sec, a rate that was reliably below that to the S+ slides and reliably above that to the original S- slides. The latter difference appears to represent an inhibitory analogue of the generalization decrement that we reported in our first study (Bhatt et al., 1988; the present Experiment 1).
Evidently, the new negative stimuli resembled the old ones enough that there was partial, but not complete, generalization of inhibition from the old to the new stimuli. Because of the nature of the discrimination task, generalization decrement in this experiment was translated into an increased rate of response rather than into a decreased percentage of correct choice responses.
V. A Spencian Model of Basic-Level Categorization
The work that we have reviewed represents most of the empirical evidence that we have so far collected on the pigeon’s acquisition and transfer of basic-level visual concepts. That evidence is, as we have suggested, quite consistent with a behavioral analysis relying primarily on the familiar principles of discrimination and generalization. (If Keller and Schoenfeld
were to have had similar data at their disposal, then their behavioral analysis of concepts might have had more impact on the psychological community than it has so far exerted.) In fact, after they learned of our data, two close colleagues at The University of Iowa, Cantor and Spiker (personal communication, 1987), found the case for a behavioral analysis of concepts so compelling that they posed to us the disarming question, “Why speak of concepts at all if the simple notions of discrimination and generalization will do?” Perhaps the best answer to this query is that, historically, the ideas of discrimination and generalization are strongly associated with stimuli that are readily measurable and manipulable. Indeed, the notion of a generalization gradient presupposes one or more dimensions along which the experimental stimuli clearly vary. When we consider lifelike stimuli, such as the color slides used in our own research, we see no easy way to think of the complex representations of cats, flowers, cars, and chairs as lying along simple physical dimensions, like hue, brightness, area, orientation, and so on. Hence, the straightforward application of the most prominent theories of discrimination learning, like that of Spence, appears to be impossible. Despite the seeming impossibility of applying Spence’s very influential and successful 1937 model of discrimination learning to lifelike visual stimuli, we (Astley & Wasserman, 1992) have recently made just such an effort. We are not at all sure that Spence would have approved of our extension of his theory. (To date, there have been no obvious signs of displeasure from Spence’s ghost that is rumored to roam the halls of the Spence Laboratories of Psychology at the University of Iowa in the wee hours of the night.) 
Nor are we at all confident that our extension of Spence’s theory will prove to be as successful as connectionist models or multidimensional scaling accounts; proponents of these other interpretations might view our Spencian model as merely a first step toward their more sophisticated accounts. Nevertheless, we present that model here and show how it can explain the empirical results that we have obtained. We leave it to the reader to evaluate the success of the model.
A. CONVENTIONS ADOPTED IN DEVELOPING THE MODEL
Some of the implications of our Spencian model of conceptualization were examined with computer simulations of the experiments previously described. Several conventions were adopted in generating these simulations. One of our first decisions concerned how to represent similarities within and between categories. Although many other options were available, we chose to represent stimuli within a conceptual category as uniformly spaced along a single, common dimension. We refer to such dimensions as “categorical dimensions.” The distance between stimuli along a categorical dimension is an inverse function of their similarity to one another. The left-right ordering of stimuli along the dimension is arbitrary. To simplify the model, we represented stimuli from different basic-level categories as if they fell along separate and orthogonal dimensions. We do not now have data that would allow us to quantify the similarity relations among the stimuli that we used in the preceding studies. Multidimensional scaling of stimulus similarity has proven to be highly productive in accounting for the behavior of humans categorizing complex visual stimuli (e.g., Nosofsky, 1992); this technique might productively be applied to the sorts of stimuli that we have used in our projects.
Our model adopts the Spencian notion that reinforced responding to a stimulus results in excitation accruing to that stimulus. In the model, excitation generated by reinforced responding to one stimulus generalizes to others along the same categorical dimension to the extent that the others are similar to the reinforced stimulus. Likewise, we assume that nonreinforced responding to a stimulus results in inhibition accruing to that stimulus and to similar stimuli on the same categorical dimension. Our simulations represent generalization from one stimulus to another along a categorical dimension as a decaying exponential function of the distance between them along that dimension (Shepard, 1987). The model also assumes faster acquisition of excitation than inhibition (see Herrnstein, 1966, p. 37; Rescorla & Wagner, 1972, pp. 85-86); thus, preasymptotically, the inhibitory gradient will be shallower than the excitatory gradient. We use the term “response tendency” to refer to the net value resulting from the combination of excitatory and inhibitory tendencies to an individual stimulus on the categorical dimension.
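As an illustration of these conventions, the following sketch expresses generalization as an exponentially decaying function of distance along a categorical dimension, with no generalization between orthogonal dimensions. The function names and the decay constant are assumptions made for this example; the values actually used in the simulations are not given in this section.

```python
import math


def generalization(distance, decay=0.1):
    """Proportion of strength that carries over a given distance.

    `decay` is a hypothetical constant; larger values give steeper gradients.
    """
    return math.exp(-decay * abs(distance))


def generalized_strength(source_strength, source_pos, target_pos,
                         same_dimension=True, decay=0.1):
    """Strength reaching target_pos from a stimulus trained at source_pos.

    Stimuli on different (orthogonal) categorical dimensions are assumed
    to share no generalized strength at all.
    """
    if not same_dimension:
        return 0.0
    return source_strength * generalization(target_pos - source_pos, decay)


# Excitation of 100 at position 50, sampled at increasing distances.
for pos in (50, 54, 58, 70):
    print(pos, round(generalized_strength(100.0, 50, pos), 1))
```

Because only the absolute distance enters the function, the gradient is symmetric, which is consistent with the arbitrary left-right ordering of stimuli on the dimension.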
Trial-by-trial changes in excitation and inhibition in our simulations were modeled according to the general principles of the Rescorla-Wagner (1972) model. Specifically, the trial-by-trial changes in excitation and inhibition that accrue to individual stimuli reflect the product of a rate parameter and the distance between the current associative level and the associative asymptote. Two factors affect the level of response tendency in this simulation: (a) directly reinforced or nonreinforced presentations of stimuli, and (b) generalization from neighboring stimuli along the categorical dimension. Approach to asymptote is a function of the summed (direct + generalized) response tendency to a stimulus. Our simulations modeled only the effects of categorical generalization on the tendency to respond to a particular stimulus. In an actual experimental situation, responding will be affected by the specific category to which a stimulus belongs and by any noncategorical attributes (such as
shape, brightness, location, etc.) that the stimulus shares with others given in the experimental situation. For example, our simulations of the go-no go procedures of Experiment 7 can generally be expected to underestimate the amount of actual responding. This underestimation may be clearest in cases where the simulation results in net inhibition to an S- or to novel stimuli from the S- category, but the actual data show some tendency to respond to them. The last section in this chapter describes the full set of parameters and operations of our computer simulations. Our simulations are not unlike the recent efforts of Gluck (1991) and Shanks (1991) in emphasizing stimulus generalization and in incorporating the Rescorla-Wagner theory of learning. These other simulations are different from our own, however, in that the association of elementary features to categories is their central explanatory principle (also see Medin & Schaeffer, 1978, and Pearce, 1988, for more on related approaches). In our simulations, the stimuli to be categorized are treated as indivisible wholes.⁴ We are not prepared at this point to make quantitative comparisons between other approaches and our own, nor are we ready to say that our general model incorporates all of the many elements that are necessary to account for complex categorization behavior. We wish only to show how a very simple set of principles with a long history in learning theory can account for a wide range of categorization phenomena.
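The trial-by-trial rule stated among the conventions above (a change equal to a rate parameter times the distance between the current associative level and its asymptote) can be sketched as follows. The asymptote of 100 and the two rate values are hypothetical; they are chosen only so that excitation is acquired faster than inhibition, as the model assumes.

```python
def update(level, asymptote, rate):
    """One trial: move `level` a fixed fraction of the way to `asymptote`."""
    return level + rate * (asymptote - level)


def train(level, asymptote, rate, trials):
    """Apply the update rule for a number of consecutive trials."""
    for _ in range(trials):
        level = update(level, asymptote, rate)
    return level


# Hypothetical rates: excitation grows faster than inhibition.
excitation = train(0.0, 100.0, rate=0.2, trials=10)
inhibition = train(0.0, 100.0, rate=0.1, trials=10)
print(round(excitation, 1), round(inhibition, 1))
```

Before asymptote, the slower rate leaves inhibition at a lower level than excitation after the same number of trials, which is the sense in which the inhibitory gradient is shallower preasymptotically.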
B. THE SIMULATIONS
In this section, we present computer simulations of the experiments previously described. Because it most directly fits the traditional Spencian model of excitation and inhibition, we will first discuss the go-no go procedures used in Experiment 7 and its affiliated investigations.
Experiment 7: Go-No Go Discrimination
Recall that in this experiment (Astley & Wasserman, 1992, Experiment 2), 12 slides from one category served as the S+s and 12 other slides from the same category served as the S-s, as did 12 slides from each of three different categories. In that experiment, errors were disproportionately committed to S-s from the same category as the S+s. A second experiment (Astley & Wasserman, 1992, Experiment 1) used the same set of 48 S-s, but only a single slide as the S+. Here, the discrimination was learned more quickly than when 12 different slides served as the S+s; in addition, pigeons trained with 1 S+ were less likely to peck the same-category S-s than were pigeons trained with 12 S+s.
⁴ One possible disadvantage of our approach concerns changes in the influence of particular dimensions of stimuli as training progresses; more will be said about this issue later.
In these experiments, discrimination training was always preceded by a base-line phase in which responses to all stimuli were nondifferentially reinforced. In our simulations, trial-by-trial increments in response tendency during nondifferential base-line training resulted in the development of excitatory gradients around each training stimulus. Because the base-line phases of these studies were conducted until performance stabilized, the simulation was run until asymptote was reached on each of the individual stimuli. During simulated discrimination training in these two investigations, response tendencies to the S-s declined as inhibitory tendencies accrued to them resulting from nonreinforcement. The S-s that were on the same dimension as the S+(s), however, continued to receive generalized excitation as reinforcement of the S+(s) continued. When a single slide served as the S+, only a few S-s (those most similar to the S+) received generalized excitation; when 12 different slides served as S+s, however, most of the different S-s from that dimension received some generalized excitation.
Figure 1 illustrates this result. The top panel depicts the net response tendencies to the single S+ and to the 12 S-s from the same categorical dimension after 16 days of simulated discrimination training. The bottom panel depicts the same net response tendencies when 12 S+s were given. Note the greater response tendency to the same-category S-s when there were 12 S+s than when there was only 1 S+. Because we assume that stimuli from different categories lie along orthogonal dimensions, the decline in response tendency to between-category S-s is unaffected by any generalized excitation (these data are not shown in Fig. 1).
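The qualitative pattern described here can be reproduced with a very small simulation. The sketch below is only an approximation written for illustration: the stimulus positions, the rates, the decay constant, the number of base-line sessions, and the choice of one presentation of each stimulus per session are all assumptions, and the code is not the simulation program described in this chapter. It tracks only the categorical dimension that contains the S+(s).

```python
import math

ASYMPTOTE = 100.0  # assumed ceiling (and, negated, floor) of response tendency


def net_tendency(target, positions, excitation, inhibition, decay):
    """Summed (direct + generalized) response tendency at `target`."""
    return sum((e - h) * math.exp(-decay * abs(target - p))
               for p, e, h in zip(positions, excitation, inhibition))


def simulate(s_plus, s_minus, sessions=16, baseline_sessions=50,
             rate_exc=0.2, rate_inh=0.1, decay=0.1):
    """Return (mean S+ tendency, mean same-category S- tendency)."""
    positions = list(s_plus) + list(s_minus)
    reinforced = [True] * len(s_plus) + [False] * len(s_minus)
    excitation = [0.0] * len(positions)
    inhibition = [0.0] * len(positions)

    def run(n_sessions, differential):
        for _ in range(n_sessions):
            for i, pos in enumerate(positions):
                t = net_tendency(pos, positions, excitation, inhibition, decay)
                if reinforced[i] or not differential:
                    # Reinforced trial: excitation moves toward the ceiling.
                    excitation[i] += rate_exc * (ASYMPTOTE - t)
                else:
                    # Nonreinforced trial: inhibition pushes toward the floor.
                    inhibition[i] += rate_inh * (t + ASYMPTOTE)

    run(baseline_sessions, differential=False)  # nondifferential base line
    run(sessions, differential=True)            # go-no go discrimination
    tend = [net_tendency(p, positions, excitation, inhibition, decay)
            for p in positions]
    n_plus = len(s_plus)
    return sum(tend[:n_plus]) / n_plus, sum(tend[n_plus:]) / len(s_minus)


same_category_s_minus = [4 + 8 * i for i in range(12)]
one_plus = simulate([48], same_category_s_minus)
twelve_plus = simulate([8 + 8 * i for i in range(12)], same_category_s_minus)
print("1 S+  :", [round(v, 1) for v in one_plus])
print("12 S+s:", [round(v, 1) for v in twelve_plus])
```

Under these assumed settings, the same-category S-s should retain a higher mean response tendency after 16 sessions when 12 interleaved S+s keep supplying generalized excitation than when a single S+ does.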
Figure 2 depicts the mean response tendencies to the S+(s) and to the within- and the between-category S-s over simulated sessions in the experiments with 1 S+ (top panel) and with 12 S+s (bottom panel). As did the actual data, the simulated data show faster acquisition of the overall discrimination with 1 S+ than with 12 S+s (the degree of discrimination in these plots is indicated by the difference in response tendency between the S+(s) and the S-s; see Figs. 1 and 3 in Astley & Wasserman, 1992). The simulated data also show a greater response tendency to within-category S-s with 12 S+s than with 1 S+. This result too reflects differences observed in the actual data from these experiments (see Astley & Wasserman, 1992, Tables 1 and 2).
[Figure 1. Horizontal axis: Categorical Dimension. Top panel legend: 1 S+, S-s; bottom panel legend: 12 S+s, S-s.]
Fig. 1. Simulated within-category response tendencies to individual stimuli for simulated training with 1 S+ (top panel) or with 12 S+s (bottom panel).
In the third experiment of this study, novel stimuli were selected from each of the three S- categories different from that of the S+s; in test sessions, these novel stimuli were intermixed with the training stimuli for birds given 12 S+s. The birds showed a greater tendency to respond to the novel S-s than they did to the S-s with which they were trained (see Astley & Wasserman, 1992, Table 5). Our simulation yielded the same result: a greater response tendency was shown to the novel stimuli from the S- category (-66) than to the original S-s (-74). This finding is called inhibitory generalization decrement; it represents the decline in inhibition that results from presentation of test stimuli some distance from trained S-s along the categorical dimension.
[Figure 2. Horizontal axis: Session (Base, 2 to 16). Top panel legend: 1 S+, Within-Category S-s, Between-Category S-s; bottom panel legend: 12 S+s, Within-Category S-s, Between-Category S-s.]
Fig. 2. Mean simulated response tendencies for the S+(s), for the within-category S-s, and for the between-category S-s for subjects trained with 1 S+ (top panel) or with 12 S+s (bottom panel).
Despite the generally good fit between our simulations and the actual data, we did find two noteworthy discrepancies. First, the Spencian model showed a decline in responding to the S+(s) early in discrimination training because of the generalization of inhibition from the S-s (as depicted in Fig. 2). The actual data showed exactly the opposite result: positive behavioral contrast, in the form of increased responding to the S+(s) early in discrimination training (see Astley & Wasserman, 1992, Tables 1 and 2). Many authors have noted that Spence’s model of discrimination learning cannot explain behavioral contrast (e.g., Reynolds, 1961, p. 293). If behavioral contrast is to be accounted for, then other mechanisms must be added to the model. Also contrary to our model was the fact that the number of S+s that subjects were given affected their responding to the between-category S-s; specifically, they exhibited more key-pecking to the between-category S-s when there were 12 S+s than when there was only 1 S+ (see Astley & Wasserman, 1992, Tables 1 and 2). The model expects this variable to have no effect on between-category responding. This outcome is understandable, however, when one considers the nature of the stimuli used in these studies. With photographs, it is likely that raising the number of S+s from 1 to 12 will increase the probability that one or more of them will resemble a between-category S-. The representation of categories as orthogonal dimensions is clearly limiting and, in this case, allows no mechanism for differential generalization between basic-level categories based on their perceptual similarity or the number of S+s. Perhaps a more complex model along the lines of the connectionist account proposed by Shepard and Kannappan (1991) would prove to be better able to cope with these results than our Spencian theory.
C. EXTRAPOLATION TO CATEGORIZATION AS DISCLOSED BY CHOICE PROCEDURES
Although the Spencian approach pretty well accommodated the results of the go-no go experiments just described, we wanted to see if it could also explain the results of experiments using other methods, notably our four-choice categorization procedures. Spence’s formulation considers only how excitation and inhibition affect the tendency to make a single response. Extension of the model to the four-choice procedure merely requires that we consider the strength of a connection between a given stimulus and a given response relative to the connection between that stimulus and the three other available response alternatives. So, learned tendencies to make any of the three incorrect alternative responses might be thought of as analogous to inhibition, in that they have a subtractive
effect on the tendency to make the correct response to a particular stimulus. The effective consequence of this elaboration of the model is that the theoretical correct response tendency now ranges from 25% (an equivalent tendency to respond to all four keys) to 100% (an exclusive tendency to respond to the correct key).
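One way to obtain a measure with exactly these bounds is to express the correct connection as a share of all four stimulus-response connections. The ratio rule below is an assumption made for illustration; the text does not state the formula that the simulations used, and the function name is hypothetical.

```python
def correct_response_tendency(correct_strength, incorrect_strengths):
    """Percentage of total connection strength held by the correct key.

    `incorrect_strengths` lists the strengths of the connections between
    the stimulus and each of the three incorrect keys.
    """
    total = correct_strength + sum(incorrect_strengths)
    if total <= 0:
        raise ValueError("at least one connection must have positive strength")
    return 100.0 * correct_strength / total


# Equal tendencies to all four keys give the 25% floor.
print(correct_response_tendency(1.0, [1.0, 1.0, 1.0]))
# An exclusive tendency to the correct key gives the 100% ceiling.
print(correct_response_tendency(1.0, [0.0, 0.0, 0.0]))
```

If the three incorrect tendencies are held constant while the correct connection grows, the measure rises from 25% toward 100%, which matches the range described above.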
1. Experiment 1: Discrimination and Generalization
In this study (Bhatt et al., 1988, Experiment 1B), pigeons increased the percentage of correct choice responses over training sessions in our four-choice categorization procedure. Their discriminative responding later transferred to novel stimuli drawn from the same four categories. Our simulation did likewise. The simulation was conducted with 10 stimuli in each of the four training categories. The top line in Fig. 3 shows simulated acquisition of the correct response tendency to the training stimuli (see Bhatt et al., 1988, Fig. 1). The correct response tendency began at 25% because the initial subtractive tendencies to make each of the three incorrect responses were assumed to be the same as the tendency to initially make the correct response. We assumed that any unlearned tendency to make erroneous responses remained constant, whereas the connection between each individual stimulus and the correct response increased over sessions. Thus, acquisition is entirely a function of changes in excitation. The lower line of Fig. 3 shows the simulated correct response tendency to the same set of novel test stimuli were they to be given at different times during simulated discrimination training. Reinforced experience with the training stimuli contributes to an increasing tendency to make the correct response to the novel test stimuli, a proclivity that never catches up with the tendency to respond discriminatively to the training stimuli. Therefore, a measurable generalization decrement is to be expected after reasonable periods of discrimination training.
[Figure 3. Horizontal axis: Session. Legend: Training Stimuli, Novel Stimuli.]
Fig. 3. Mean correct response tendency to training stimuli and to novel test stimuli at different points of simulated category training.
2. Experiment 2: Category Size
In this study (Bhatt, 1988), pigeons’ speed of acquisition was an inverse function of category size. Their degree of transfer to novel stimuli was, on the other hand, a direct function of category size. Our simulation performed similarly. Figure 4 shows the number of days taken to meet successively stringent criteria for the three groups in the simulation of this experiment. These simulated data closely match the results of the actual experiment in showing that the time to meet the three learning criteria rose with increasing numbers of training stimuli per category (see Wasserman & Bhatt, 1992, Fig. 10.2). Figure 5 shows the correct response tendency to the novel test stimuli (10 per category) and to the training stimuli in a test session conducted immediately after the 70% criterion had been met. The simulated results correspond closely with the actual results in showing that increasing the number of stimuli per category in the training set increased discriminative transfer to novel test stimuli (see Wasserman & Bhatt, 1992, Fig. 10.2; in both the actual and the simulated data, accuracy to the training stimuli was an inverse function of their number).
[Figure 4. Horizontal axis: Learning Criteria (40, 55, 70). Legend: 1 Stimulus, 4 Stimuli, 12 Stimuli.]
Fig. 4. Mean number of days required to meet different learning criteria (40% correct, 55% correct, and 70% correct) for simulated category training with 1, 4, or 12 stimuli per category.
3. Experiment 3: Trial-Unique Stimuli
Recall that, in Experiment 3, category training with a nonrepeating pool of 2000 slides was sufficient to generate a high level of correct responding by the end of 50 daily sessions of training. In Session 51, pigeons’ responding to a subset of five stimuli from each category that had been shown in Session 50 was compared with responding to a completely novel set of five stimuli from each category; their accuracy to the previously seen stimuli exceeded that to completely novel stimuli, but only slightly. Our simulation behaved in much the same way.
[Figure 5. Horizontal axis: Number of Training Stimuli (1, 4, 12). Legend: Training Stimuli, Novel Stimuli.]
Fig. 5. Mean correct response tendency to training stimuli and to novel test stimuli for simulated category training with 1, 4, or 12 stimuli per category.
Our limited computing facility prevented us from simulating the full effect of a nonrepeating set of 500 stimuli per category. Our simulation went only to a maximum of 90 different stimuli drawn from 100-stimulus dimensions. The results are depicted in Fig. 6. The top line shows the simulated acquisition of discriminative responding to nonrepeating stimuli as training involves a larger and larger set of prior stimuli. The correct response tendency, here, reflects the total impact of both generalized and direct reinforcement of the training stimuli; that is to say that the correct response tendency includes all of the stimuli given and reinforced to that point of training. The lower line shows the results of simulated novel stimulus tests; these one-time-only tests involve the same five stimuli per category, but they are conducted at different points in the acquisition process. Accuracy to the training stimuli increased from near-chance levels, when the first 10 stimuli were shown, to over 80% when the last 10 stimuli were shown (see Bhatt et al., 1988, Fig. 3). Comparison of the top and bottom lines in Fig. 6 illustrates the difference in response tendency to stimuli in whose presence responding had been reinforced once (members of the training set) and to stimuli that had never been shown or reinforced (members of the testing set). The slightly higher correct response tendency to past training stimuli reflects the very small effect of a single reinforcement (see Bhatt et al., 1988, Table 9).
4. Experiment 4: Stimulus Repetition
In this study (Bhatt et al., 1988, Experiment 41, sessions with a repeating set of 10 slides in each of four categories alternated with sessions with
0
10
20
30
40
50
60
70
80
90
Cumulative Number of Training Stimuli +Training Stimuli
D
Novel Stimuli
Fig. 6. Mean correct response tendency for the simulation of trial-unique category training and novel stimulus testing at different points of learning.
slides selected from a large nonrepeating set. Pigeons’ performances attained higher levels of accuracy for the repeating slide set than for the nonrepeating slide set. Our simulation did too. Figure 7 shows the simulated correct response tendency to repeating and nonrepeating stimulus sets over sessions. As in the actual data, our simulated data showed substantially faster overall acquisition to the repeating stimuli than to the nonrepeating stimuli (see Bhatt et al., 1988, Fig. 4). In our simulation, this result makes sense because the repeating stimuli benefit both from continued direct strengthening of response tendency through reinforcement and from indirect strengthening through stimulus generalization; the nonrepeating stimuli benefit only from indirect response strengthening through stimulus generalization. Because a different set of nonrepeating stimuli is selected for each simulated session, slight variations in response tendency from session to session are to be expected; as the stimuli vary, so too does the degree of generalized response tendency from the previously trained stimuli. This fact accounts for the slight drop in response tendency seen from the fifth to the sixth pairs of sessions in Fig. 7.
5. Experiment 5: Categories and Pseudocategories
In this experiment (Wasserman et al., 1988, Experiment 2), birds given true category training acquired discriminative responding far faster than
birds given pseudocategory training. So did our simulation. Figure 8 shows mean correct response tendencies for the category and pseudocategory groups during 30 days of simulated training. The simulated data neatly reflect the actual data (see Wasserman et al., 1988, Fig. 3). In short, the results arise from the fact that within-category generalization engenders correct responding in the category group, whereas this generalization primarily promotes incorrect responding in the pseudocategory group. Recall that follow-up data collected by Bhatt (1988) showed that the disparity between category and pseudocategory groups increased as the number of stimuli per category or pseudocategory increased (the dependent measure was the number of days to a criterion of 70% correct on 2 consecutive days). Figure 9 shows this measure for simulated category and pseudocategory training with 4 and 12 stimuli; these simulated data generally mirror the relations shown in the actual data. The only exception was the equally fast learning seen in the simulation when only 4 stimuli were included in each category or pseudocategory; the actual data showed slightly faster learning by the category group.
6. Experiment 6: Split Categories
In this experiment (Wasserman et al., 1988, Experiment 1), pigeons were given training in which four conceptual categories were subdivided, with two different responses out of the four available being assigned to one
Fig. 8. Mean correct response tendency over sessions for simulated category and pseudocategory training. [Plot not reproduced; x-axis: Session; series: Category and Pseudocategory.]
half of the stimuli from each subdivided category, a procedure called split-category training. On this procedure, pigeons attained high levels of discrimination accuracy; in addition, their errors were much more likely to be committed to the key that was correct for other members of the same conceptual category than to either of the other two keys that were correct for members of the other conceptual category that was shown in a training session. Our simulation responded similarly. Figure 10 shows simulated acquisition of the split-category discrimination over sessions (see Wasserman et al., 1988, Fig. 1). Figure 11 shows the response tendencies to Split Category 1 stimuli over 2-day blocks of simulated training. In this simulation, the 20 stimuli from a particular conceptual category were allocated to two split categories so that 10 were associated with one response (e.g., R1) and the other 10 were associated with another response (e.g., R2). Response 3 (R3) and Response 4 (R4) were correct for reporting stimuli from the two other split categories shown in a training session. Errors in this simulation are predominantly to the response (R2) associated with other stimuli from the same conceptual category as stimuli from Split Category 1 (see Wasserman et al., 1988, Fig. 1). Also, the tendency to make the incorrect response associated with stimuli from the same conceptual category (R2) declined only after reaching a peak on the seventh 2-day block of training. At this point, within-category generalization was overtaken by within-category discrimination (see Wasserman et al., 1988, Fig. 2 for corroborative results from a Markov model of split-category discrimination).
Fig. 9. Mean days to criterion for simulated category and pseudocategory training with 4 or 12 stimuli per category or pseudocategory.
Fig. 10. Mean correct response tendency over sessions of simulated split-category training.
Fig. 11. Mean tendency to make the correct response (R1), the response that is correct for other stimuli from the same category (R2), and the responses that were correct for the other two categories (R3 and R4) over sessions of split-category training. [Plot not reproduced; x-axis: 2-Day Blocks.]
D. APPLICABILITY OF THE MODEL
With relatively minor exceptions, then, our elaboration of Spence’s (1937) model of discrimination learning fits a substantial body of evidence on basic-level conceptualization by pigeons in both go-no go and forced-choice situations. Further tests of the model await additional empirical investigations.
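The generalization logic behind the category versus pseudocategory simulations can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential gradient, the gradient width, the baseline strength, the ratio rule for response tendency, and the absence of generalization across stimulus dimensions are all assumptions introduced here for illustration. Each stimulus is reinforced once, and the correct response tendency is then read off for the training stimuli.

```python
import math
import random

N_POSITIONS = 100   # length of each stimulus dimension (the text mentions 100-stimulus dimensions)
N_CLASSES = 4       # four categories and four report responses
PER_CLASS = 12      # stimuli per category
WIDTH = 10.0        # hypothetical width of the generalization gradient
BASELINE = 0.1      # hypothetical starting strength of every response


def gradient(distance):
    """Hypothetical exponential generalization gradient (1.0 at distance 0)."""
    return math.exp(-abs(distance) / WIDTH)


def correct_tendency(training, probe):
    """Share of total response strength held by the probe's correct response."""
    dim, pos, correct = probe
    strengths = [BASELINE] * N_CLASSES
    for t_dim, t_pos, t_resp in training:
        if t_dim == dim:  # assumption: no generalization across dimensions
            strengths[t_resp] += gradient(pos - t_pos)
    return strengths[correct] / sum(strengths)


def mean_tendency(training):
    """Mean correct response tendency over all training stimuli."""
    return sum(correct_tendency(training, s) for s in training) / len(training)


rng = random.Random(0)
positions = {d: rng.sample(range(N_POSITIONS), PER_CLASS) for d in range(N_CLASSES)}

# True categories: every stimulus on dimension d is reinforced for response d.
category = [(d, p, d) for d in range(N_CLASSES) for p in positions[d]]

# Pseudocategories: the same stimuli, but each dimension is spread evenly
# over the four responses.
pseudocategory = [
    (d, p, i % N_CLASSES)
    for d in range(N_CLASSES)
    for i, p in enumerate(positions[d])
]

category_mean = mean_tendency(category)
pseudocategory_mean = mean_tendency(pseudocategory)
print(f"category: {category_mean:.2f}  pseudocategory: {pseudocategory_mean:.2f}")
```

Under these assumptions, generalization from neighboring stimuli adds only to the correct response in the category condition, whereas in the pseudocategory condition most of it accrues to competing responses, so the category mean is necessarily the higher of the two.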
VI. Conceptualization via Primary and Secondary Stimulus Generalization
We have in this chapter pursued the utility of Keller and Schoenfeld’s behavioral analysis of concepts in terms of the joint operation of discrimination and generalization. We have also seen that this behavioral analysis relies heavily on the perceived similarity of stimuli, so that perceptually similar stimuli may be the basis for the creation of categories or concepts. This possibility brings us to an interesting and controversial point in the behavioral analysis of concepts because many current theorists, including Herrnstein (1990), Lea (1984), and Murphy and Medin (1985), have questioned the completeness of an account grounded in perceptual similarity. Murphy and Medin (1985), for example, have argued that people’s rough theories and practical knowledge of the world lead them to impose more structure on concepts than similarity alone would allow. Although potentially applicable to an analysis of human behavior, the relevance of this particular critique of similarity to animal behavior is not clear. (For more on the issue of similarity and concepts, see Medin, Goldstone, & Gentner, 1993.) In a different and possibly more pertinent vein, Lea (1984; see also Watanabe, Lea, & Dittrich, 1993) has proposed that a concept comprises stimuli that are bound together by relations that are not based solely on perceptual similarity. He argued, as did our colleagues Cantor and Spiker, that if responses learned to some members of a category transfer to others, and if this transfer is based only on perceptual similarity, then the very idea of a concept is superfluous; primary stimulus generalization alone can completely explain transfer from some category members to others.
Herrnstein (1990) has amplified Lea’s proposal by saying that in the acquisition of concepts learning a response to some members of a heterogeneous set of stimuli should ideally “propagate to all members of the set, without regard to similarity” (p. 150).
If we accept Lea’s proposal that concepts entail collections of stimuli whose coherence represents something more than perceptual similarity, then just what is that “something more” that allows the aggregation of perceptually dissimilar stimuli? One interesting possibility is that stimuli that are simply associated with the same response or context will come to be classed together, despite their perceptual dissimilarity. As an illustration, because of a common response (tables and rugs are both called “furniture”) and a common context (tables and rugs are usually sold in furniture stores), perceptually different stimuli (tables and rugs) come to be grouped into a common class or concept (furniture). Such perceptually heterogeneous collections are quite familiar and are termed superordinate categories (Rosch & Mervis, 1975). Significantly, the limits of a similarity-based approach to concepts were actually noted decades ago by Keller and Schoenfeld (1950), whose behavioral analysis has been the theoretical centerpiece of this chapter. They argued that secondary or mediated generalization may play a large role in human conceptual behavior. Secondary or mediated generalization occurs when perceptually different stimuli like the words urn and vase are associated with a common set of objects, thereby making one verbal stimulus the functional equivalent of the other (see Goldiamond, 1966, pp. 214-215 for the related distinction between topographical vs. functional classes of stimuli). According to Keller and Schoenfeld (1950), “Generalizations are said to be mediated when they are based upon a stimulus equivalence which results from training” (p. 160). Just what kinds of training support learned stimulus equivalence? Here, Keller and Schoenfeld are not very specific. They do say that “Classes of objects or events, differently responded to, develop different concepts” (p. 155).
So, different stimuli associated with the same contingency of reinforcement should presumably become functional equivalents of one another. And, although they discuss functional stimulus equivalence exclusively in connection with human verbal behavior, there is no good reason why Keller and Schoenfeld should limit nonsimilarity-based conceptualization to human adults.
VII. Nonsimilarity-Based Conceptualization
If we grant that adult human conceptual behavior may involve nonsimilarity-based transfer via secondary or mediated generalization, then we should ask whether there is any strong empirical support for the possibility that nonhuman animals exhibit nonsimilarity-based conceptualization (for negative evidence, see Bhatt & Wasserman, 1989). In one of the clearest
cases to date, Vaughan (1988) found that, after many discrimination reversals, pigeons came to treat members of each of two 20-item slide sets as functional equivalents of one another. Importantly, all 40 slides depicted trees; thus, the 20 items in each set were random assortments, with no obvious perceptual “glue” to bind them together into two classes entailing greater intraclass than interclass similarity. Other evidence of learned stimulus equivalence in pigeons given very small stimulus sets of highly abstract stimuli was reported by Urcuioli, Zentall, Jackson-Smith, and Steirn (1989) and by Zentall, Steirn, Sherburne, and Urcuioli (1991; for related data in rats, see Hall, Ray, & Bonardi, 1993).
A. EXPERIMENT 8: JOINT CATEGORY LEARNING BY PIGEONS
We (Wasserman, DeVolder, & Coppage, 1992) were quite encouraged by these prior successful studies and therefore undertook a project using substantial collections of lifelike stimuli that addressed the issue of superordinate conceptualization in pigeons via secondary or mediated generalization. Our experiment entailed a three-step procedure (cf., Urcuioli et al., 1989, Experiment 2). First, pigeons were trained to sort 12 different stimuli from each of four basic-level categories (people, flowers, cars, and chairs) into two arbitrary aggregations; Response 1 was reinforced in the presence of members of Categories 1 and 2, whereas Response 2 was reinforced in the presence of members of Categories 3 and 4. It is during this step of training that members of Categories 1 and 2 and members of Categories 3 and 4 might come to be merged into two classes of functionally equivalent, but perceptually different stimuli. Second, pigeons were trained to make new responses to members of Categories 1 and 3 only, with Response 3 being correct to members of Category 1 and Response 4 being correct to members of Category 3. Although only stimuli from Categories 1 and 3 were reassigned to new responses during this second step of training, stimuli from Categories 2 and 4 might also have been affected if the training in the first step were effective in establishing stimuli from Categories 1 and 2 as one aggregate group and stimuli from Categories 3 and 4 as a second aggregate group. The third and final step tested the pigeons for their tendency to make Response 3 versus Response 4 to stimuli from Categories 2 and 4, the two classes of stimuli withheld from the second step of training. Even though never before taught to make Response 3 or Response 4 to stimuli from Categories 2 or 4, the clear expectation from joint category formation in the first step would be that birds should predominately make Response 3 to stimuli from Category 2 and Response 4 to stimuli from Category 4.
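The logic of the three-step procedure can be written out as a short sketch. The dictionary layout and the function name are illustrative choices made here, not part of the original study; the class-to-response assignments are those given in the text. The function derives the response expected for each class at test if classes that shared a response in the first step have merged into a joint category.

```python
# Step 1 (original training): two classes share each response.
ORIGINAL = {"C1": "R1", "C2": "R1", "C3": "R2", "C4": "R2"}

# Step 2 (reassignment training): only C1 and C3 receive new responses.
REASSIGNMENT = {"C1": "R3", "C3": "R4"}


def predicted_test_responses(original, reassignment):
    """Predict the test response for every class under joint category formation.

    A class that was never reassigned is expected to inherit the new response
    taught to the class with which it shared a response in original training.
    """
    # Map each original response to the new response learned by its reassigned class.
    new_for_old = {original[cls]: new for cls, new in reassignment.items()}
    return {cls: new_for_old[old] for cls, old in original.items()}


predictions = predicted_test_responses(ORIGINAL, REASSIGNMENT)
for cls in sorted(predictions):
    print(cls, "->", predictions[cls])
```

The entries for C2 and C4 are the critical ones: they were never paired with R3 or R4 in training, so responding in line with these predictions cannot come from direct reinforcement.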
Eight pigeons served in the experiment that was conducted in the same four-key boxes used in all of the research reported in this chapter. As stimuli, we used 48 different color slides: 12 each of people, flowers, cars, and chairs. Each of the slides contained only one example of the category, except for flowers, in which the slides contained no more than five examples. We chose these particular slides because our prior work (Astley & Wasserman, 1992; the present Experiment 7) had disclosed their intraclass similarity to exceed their interclass similarity. In overview, the experiment comprised eight phases: Original Training 1 (28 sessions), Reassignment Training 1 (28 sessions), Testing 1 (4 sessions), Original Training 2 (28 sessions), Reassignment Training 2 (24 sessions), Original Training 3 (12 sessions), Reassignment Training 3 (8 sessions), and Testing 2 (4 sessions). The first three phases represent the minimum number needed to complete the three-step experimental design. The remaining five phases sought to enhance the (already statistically reliable) results that were obtained in the first test phase. Daily sessions of original training comprised 48 two-choice trials. Following intertrial intervals averaging 15 sec, the rotary projector advanced one slot to reveal a slide from one of four different classes: C1, C2, C3, or C4. The first peck to the screen key after 16 sec of slide exposure lit the top right (R1) and bottom left (R2) report keys. Choice of R1 was correct if the slide depicted members of C1 or C2; choice of R2 was correct if the slide depicted members of C3 or C4; all other choices were incorrect. Correct choices were followed by food reinforcement; incorrect choices were followed by as many correction trials as were necessary for a correct response to be emitted.
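As a bookkeeping aid, the eight-phase schedule listed above can be tallied in a few lines. The phase names and session counts are taken from the text; the grouping function is only an illustrative helper introduced here.

```python
# Phase names and session counts as listed in the text.
PHASES = [
    ("Original Training 1", 28),
    ("Reassignment Training 1", 28),
    ("Testing 1", 4),
    ("Original Training 2", 28),
    ("Reassignment Training 2", 24),
    ("Original Training 3", 12),
    ("Reassignment Training 3", 8),
    ("Testing 2", 4),
]


def sessions_by_kind(phases):
    """Sum session counts per phase type, ignoring the trailing phase number."""
    totals = {}
    for name, sessions in phases:
        kind = name.rsplit(" ", 1)[0]  # "Original Training 1" -> "Original Training"
        totals[kind] = totals.get(kind, 0) + sessions
    return totals


totals = sessions_by_kind(PHASES)
total_sessions = sum(totals.values())
print(totals, total_sessions)
```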
To control for the effects of interclass similarity, four pigeons received as C1, C2, C3, and C4, respectively, slides of flowers, chairs, cars, and people, and four other pigeons received as C1, C2, C3, and C4, respectively, slides of people, flowers, chairs, and cars. Twice-daily sessions of reassignment training each comprised 24 two-choice trials. Here, only the top left and bottom right report keys were lit as choice alternatives. To control for horizontal versus vertical reassignment of report keys from the original training locations, one half of the pigeons in each stimulus assignment subgroup received horizontal reassignment of R3 to C1 and R4 to C3 and the other half received vertical reassignment. For all pigeons, choice of R3 was correct if the slide depicted members of C1; choice of R4 was correct if the slide depicted members of C3; all other choices were incorrect. Again, incorrect choices necessitated correction trials. Daily sessions of testing comprised 48 two-choice trials. As in reassignment training, only the top left and bottom right report keys were lit as
choice alternatives, and differential reinforcement (with correction trials) was scheduled on trials involving slides from C1 and C3; R3 was correct to C1 and R4 was correct to C3. Nondifferential reinforcement was scheduled on trials involving slides from C2 and C4; food was given on all C2 and C4 trials irrespective of the choice of R3 or R4. Nondifferential reinforcement was given to guarantee that differential responding to C2 and C4 stimuli was not the result of explicit training effects during testing; repeated testing could then be conducted without confounding. Only nominally, then, can R3 to C2 stimuli and R4 to C4 stimuli be considered to be “correct” responses, as they were to determine the formation of joint categories. As a detailed account of the results of the experiment is available (Wasserman et al., 1992), we will focus here on the last three phases when discriminative responding was strongest:
1. In the final four-session block of Original Training 3, the mean percentage of correct choice responses was 89%.
2. In the final four-session block of Reassignment Training 3, the mean percentage of correct choice responses was 89%.
3. Across all four sessions of Testing 2, the mean percentage of correct choice responses to reassigned stimuli (C1 and C3) was 87% and to nonreassigned stimuli (C2 and C4) it was 72%. For both reassigned and nonreassigned stimuli on each test session, including the first, obtained accuracy reliably exceeded the chance accuracy score of 50%. Accuracy to reassigned stimuli (C1 and C3) reliably exceeded accuracy to nonreassigned stimuli (C2 and C4); as well, the drop in accuracy from reassigned (C1 and C3) to nonreassigned (C2 and C4) stimuli was reliably greater for the subgroup of pigeons with both natural and artificial stimuli in each joint category (87-64%) than for the subgroup of pigeons with only natural or artificial stimuli in each joint category (87-80%).
Additional analyses confirmed the reliability of the reassigned-nonreassigned decrement in each case. Across all four test sessions, obtained accuracy reliably exceeded chance accuracy for both reassigned and nonreassigned stimuli in each subgroup of pigeons. Finally, as to individual subject performance, across all four test sessions, the obtained accuracy of all eight pigeons to reassigned stimuli reliably exceeded chance accuracy; the obtained accuracy of six pigeons (all four from the subgroup with only natural or artificial stimuli in each combined category) to nonreassigned stimuli did so.
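One way to picture what "reliably exceeded chance" means for a two-choice task is an exact one-tailed binomial test against 50%. This is a hedged sketch only: the text does not say which statistical test was used, and the trial count below (96 nonreassigned test trials, with 69 scored as correct, roughly 72%) is a hypothetical figure chosen for illustration, not a number reported in the study.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 69 nominally correct choices out of 96 test trials.
HYPOTHETICAL_TRIALS = 96
HYPOTHETICAL_CORRECT = 69

p_value = binomial_tail(HYPOTHETICAL_CORRECT, HYPOTHETICAL_TRIALS)
print(f"one-tailed p = {p_value:.2e}")
```

With a chance level of 50%, a tail probability this small would indicate that accuracy near 72% over that many trials is very unlikely to arise from random key choice.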
This three-step procedure, therefore, proved highly effective in producing and disclosing nonsimilarity-based conceptualization in pigeons. Merely by being associated with a common response in the first step,
classes of perceptually dissimilar stimuli, like cars and chairs, appear to amalgamate into a new category of functionally equivalent stimuli. Thus, requiring a new response to be performed to only one of these two stimulus classes in the second step now brings about transfer to the other stimulus class in the third step. This transfer can be quite substantial (78% correct on the first day of the second test phase), but it cannot readily be explained in terms of primary stimulus generalization because of the counterbalancing of stimulus classes constituting the joint categories. Instead, appeal to mediated or secondary stimulus generalization appears to be necessary to account for this form of conceptual transfer. Saying that the experiment constitutes a clear case of nonsimilarity-based conceptualization does not mean that perceptual similarity played no part in the obtained results. First, choice accuracy was higher to reassigned stimuli (C1 and C3) than to nonreassigned stimuli (C2 and C4). At least part of that difference might plausibly be attributed to primary stimulus generalization decrement; C1 and C3 stimuli and C2 and C4 stimuli are indeed perceptually different from one another (Astley & Wasserman, 1992). Second, the drop in accuracy from reassigned (C1 and C3) to nonreassigned (C2 and C4) stimuli was greater for pigeons with both natural and artificial stimuli in each joint category. This result might be accounted for by smaller discriminal distances between people and flowers or between cars and chairs than between the remaining pairs of stimulus classes. However, this pattern of discriminal distances was not confirmed by accuracy scores during the three phases of original training, where one would expect faster learning with joint categories comprising only natural or artificial stimuli than with joint categories comprising both natural and artificial stimuli.
No statistically reliable differences in performance between the subgroups of pigeons were observed in any of the three phases of original training. We recognize that a related form of stimulus generalization might be implicated in the transfer of control from reassigned to nonreassigned stimuli. This transfer may involve stimulus elements that are common to members of C1 and C2. Perhaps, even though stimuli from C1 and C2 involve greater intraclass than interclass similarity, some aspect(s) of those stimuli may be shared with one another, but not shared by stimuli from C3 and C4. In a parallel fashion, some aspect(s) of stimuli from C3 and C4 may be shared with one another, but not shared by stimuli from C1 and C2. Successful transfer may then represent control by some common element(s) present in stimuli from either of the pairs of basic-level categories. Although we accept the possibility of this “common elements” account, we consider it to be ad hoc and of limited heuristic value. Most problematical is the fact that our pigeons showed effective transfer of control from reassigned to nonreassigned stimuli with two different C1 + C2 and C3 + C4 assignments: one being flowers + chairs and cars + people and the other being people + flowers and chairs + cars. Anyone espousing the common elements interpretation will have to provide realistic and independently manipulable candidates for such stimulus control before this account can be taken seriously. These results and arguments notwithstanding, whatever other features of the present data can be explained by primary stimulus generalization, the significantly above-chance overall accuracy scores obtained to nonreassigned stimuli appear to be largely inexplicable by this interpretation. Nor can response generalization account for the differential tendencies to make R3 to C2 stimuli and R4 to C4 stimuli during test sessions. By locating R1 and R2 along the positive diagonal of response keys and by locating R3 and R4 along the negative diagonal, we guaranteed that R1 and R2 were equidistant from R3 and R4, thus eliminating any systematic tendency for one or the other test response (R3 vs. R4) to be exhibited to either class of nonreassigned stimuli (C2 and C4). A final procedural safeguard was for an equal number of birds to have R3 and R4 horizontally or vertically relocated from R1 and R2.
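The geometric argument against response generalization can be checked with a small sketch. The panel dimensions below are hypothetical (the text gives no measurements); only the corner layout, with R1 and R2 on one diagonal and R3 and R4 on the other, comes from the description above. The sketch shows that each test key has the same set of distances to the two original-training keys, whatever the width and height of the panel.

```python
from math import dist

WIDTH, HEIGHT = 2.0, 1.0  # hypothetical panel dimensions, arbitrary units

KEYS = {
    "top_left": (0.0, HEIGHT),
    "top_right": (WIDTH, HEIGHT),
    "bottom_left": (0.0, 0.0),
    "bottom_right": (WIDTH, 0.0),
}
ORIGINAL_KEYS = ("top_right", "bottom_left")  # R1 and R2 (positive diagonal)
TEST_KEYS = ("top_left", "bottom_right")      # R3 and R4 (negative diagonal)


def distance_profile(test_key):
    """Sorted distances from one test key to the two original-training keys."""
    return sorted(dist(KEYS[test_key], KEYS[k]) for k in ORIGINAL_KEYS)


profiles = {key: distance_profile(key) for key in TEST_KEYS}
print(profiles)
```

On a non-square panel, each test key is one horizontal step from one original key and one vertical step from the other, which is why the horizontal versus vertical counterbalancing of R3 and R4 serves as the additional safeguard mentioned in the text.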
B. EXPERIMENT 9: JOINT CATEGORY LEARNING BY PRESCHOOL CHILDREN
We were both pleased and surprised by the magnitude and the robustness of our pigeons’ nonsimilarity-based conceptual behavior. But, without a basis of comparison, we were unsure of just how pleased and surprised to be. Therefore, we undertook a parallel project with preschool children given a photograph-sorting task to assess the generality of our results with pigeons. This project by DeVolder, Lohman, and Wasserman (reported in Wasserman & DeVolder, 1993) followed exactly the same logic and plan as the pigeon project just described. The subjects were 20 children recruited from a preschool in a large Midwestern community. The children’s ages ranged from 4 years, 5 months, to 5 years, 10 months, with a mean age of 4 years, 10 months. Half of the children were boys and half were girls. The children were paid for their participation with a $1 gift certificate at an ice cream parlor. A large white poster board laminated in clear acrylic to prevent soiling served as the response board. There were four rectangles outlined in each quadrant of the board and one rectangle located centrally. During each phase of the experiment, two diagonal rectangles were covered by large Xs, making them unavailable. The same 48 snapshots used in the pigeon
experiment were used here. These photographs were laminated in clear acrylic to prevent damage from handling by the experimenter and the children. There were three phases in this experiment:
1. In Original Training, children received a total of 48 trials. On each trial, the experimenter placed a photograph in the center rectangle and asked the child to place it in the correct choice location. During this phase, the top left and bottom right locations were covered, leaving only the top right and bottom left rectangles available as choice responses. Correct responses were verbally reinforced (“that is correct”) and incorrect responses were verbally punished (“that is incorrect”); no correction trials were given here or in either of the two following phases. Stimulus and response assignments followed the same plan used in the prior pigeon project. Original Training involved photographs from C1, C2, C3, and C4, and continued for 48 trials for all children; by this time, each child had reached the 90% correct level during at least one 12-trial training block.
2. Immediately after Original Training, each child underwent Reassignment Training during which the top right and bottom left response locations were covered, leaving only the top left and bottom right locations available as choice responses. Reassignment Training utilized only two of the stimulus classes (C1 and C3) from Original Training and comprised a total of 24 trials; by this time, each child had reached the 90% correct level during at least one 12-trial training block.
3. After completion of Reassignment Training, each child was shown each of the 48 photographs one time each during a single Testing period. As in Reassignment Training, only the top left and bottom right rectangles were available as choice responses. As in the two prior phases, differential reinforcement was given for trials involving photographs from C1 and C3. Nondifferential positive reinforcement (“that is correct”) was given for trials involving photographs from C2 and C4.
During Testing, the mean percentage of correct choice responses to reassigned stimuli (C1 and C3) was 99% and to nonreassigned stimuli (C2 and C4) it was 80%. For both reassigned and nonreassigned stimuli, obtained accuracy reliably exceeded chance accuracy. The drop in accuracy from reassigned (C1 and C3) to nonreassigned (C2 and C4) stimuli was also reliable; however, the assignment of stimuli to joint categories was not a reliable determinant of the children’s performance. Finally, as to individual subject behavior, the obtained accuracy of all 20 children to reassigned stimuli reliably exceeded chance accuracy; the obtained accuracy of 13 children (6 from the subgroup with only natural or artificial stimuli in each combined category) to nonreassigned stimuli did so.
Compared with the performance of our pigeons, our children mastered the tasks they were given far faster and reached even higher levels of discriminative behavior; nevertheless, their behavior to reassigned and nonreassigned stimuli in testing was rather similar. Mean first-session accuracy of pigeons and children to reassigned stimuli was 88% and 99%, respectively; mean first-session accuracy of pigeons and children to nonreassigned stimuli was 78% and 80%, respectively. All of the pigeons and children responded reliably above chance levels to the reassigned stimuli; 75% of the pigeons and 65% of the children responded reliably above chance levels to the nonreassigned stimuli. More importantly, in each case, the evidence was quite clear that association with a common response was sufficient for stimuli from perceptually different classes to join into a single class of functionally equivalent stimuli. Because of the design of the pair of experiments, the transfer of training from reassigned to nonreassigned stimuli is best interpreted as representing nonsimilarity-based conceptualization.
C. EXPERIMENT 10: CHILDREN’S JOINT CATEGORY LEARNING WITH SIMULATED HUMAN FACES
We now describe early data from an ongoing project with slightly older children. Our goal in this project, conducted by the second author and Paul Norwood, is to extend the range of stimuli to which the three-stage joint categorization procedure has been applied and to use naturalistic stimuli over which we can exert better experimental control than the snapshots that we used in prior research. Additionally, we wish to see if the effects of joint categorization training will extend to novel stimuli drawn from the component categories. The stimuli that we used in Pretraining and in the Original Training and Reassignment Training phases were 16 simulated human faces created with Mac-a-Mug software (Shaherazam Software, Kalamazoo, MI). The faces were black line drawings on a white background. There were four faces in each of four “families.” The members of each family had the same hair and chin, but each individual had unique eyebrows, eyes, nose, and mouth. The interior features of each face were unique; no other face shared those specific features. All of the faces had the same ears. All four faces from each of the four families are shown in Fig. 12. For Testing, two novel faces in each of the four families were created (they are not shown in Fig. 12). Like the family members used in Original Training and in Reassignment Training, the novel faces had the same family hair and chin types, but they had unique eyebrows, eyes, noses, and mouths. The subjects were children between the ages of 5 and 7, who were individually tested at several day care centers. The stimuli were presented
on an 11-in., black-and-white computer monitor; the children were asked to respond to the faces by pointing directly at a location on the screen. The children’s responses were followed by the experimenter’s manipulation of the computer mouse to indicate the location to which the child had pointed. As pilot testing had found this joint-category task to be hard for children of these ages to learn, we preceded Original Training with special Pretraining (described in the next paragraph) and with exposure to the Reassignment Training procedure. We suspected that these measures might aid learning in Original Training. Other measures we took to facilitate learning were to give two exposures to Original Training and to Reassignment Training, and to prompt correct responses on the first four trials of the first exposures to Original Training and Reassignment Training. In Pretraining, the children were first shown the hair and chin attributes (with no interior features) characteristic of each of the four families; they were told that they would see “boys” from each of these families. Then, they were shown each of the faces, one at a time, at the top of the screen and the four family outlines (hair and chin) at the bottom of the screen. The children were asked to match (by pointing) the face at the top of the screen with the correct family below. The children responded at a very high level of accuracy in this task. Before starting the main portion of the experiment, the children were informed that they would be playing a new game in which their job was to help the same boys whom they had seen earlier to get home for dinner. Each trial in Original Training began with the presentation of one of the 16 faces (approximately 2 in. wide and 3 in. high) in the middle of the computer screen. After 1 sec, line drawings of two houses appeared as well, one in the upper left corner of the screen and the other in the lower right corner.
One of the houses was correct for two of the families and the other was correct for the other two families. The joint categories of families were counterbalanced, so that the upper left response (R1) was correct for Families 1 and 2 for half of the subjects and for Families 1 and 4 for the other half of the subjects. Correct pointing responses were followed by auditory feedback from the computer; a cartoon-like voice said, “I’m home, what’s for dinner?” Incorrect choices were also followed by auditory feedback from the computer; on these occasions, the computer voice said, “Oh-oh, this isn’t my home.” The experimenter also verbally emphasized the correctness or incorrectness of the children’s choices. In Reassignment Training, only faces from Families 1 and 3 were shown, this time in conjunction with two different houses that were not available in Original Training. One house appeared in the upper right corner of the computer screen and the other appeared in the lower left corner. One of the houses was designated as correct for faces from Family 1 and the
other house was correct for faces from Family 3. Correct choices were followed by the same computer and experimenter feedback as in Original Training.

At the beginning of Testing, the children were told that they would be playing a game like the one that they had just played (in Reassignment Training), except that they would now see boys from all four of the families and some new boys too. In the 56-trial Testing period, there were four presentations of each of the four faces from Families 1 and 3, with correct responses and consequences arranged as in Reassignment Training. There were also two presentations of each of the four faces from Families 2 and 4, and subjects received positive reinforcement after choices in the presence of these stimuli no matter which house they selected. Also in the Testing period was a single presentation of each of the two novel faces from each of the four families, again with choices nondifferentially reinforced by computer and experimenter feedback. Testing trials were organized in blocks of seven, with each block comprising four trials with two different faces from each of Families 1 and 3, two trials with one face from each of Families 2 and 4, and one trial with a novel face from one of the four different families.

The order of training phases was Reassignment Training 1, Original Training 1, Original Training 2, Reassignment Training 2, and Testing. Correct responding was prompted on the first four trials of Reassignment Training 1 and Original Training 1. There was no prompting during Original Training 2, Reassignment Training 2, or Testing. Pretraining and all other phases of the experiment were completed within a single session that lasted from 30 to 90 min.

The data provided good evidence of joint category formation. All 10 children evidenced highly discriminative performance in both Original Training 2 and in Reassignment Training 2.
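As a quick check on the Testing design described above, the trial counts can be tallied in a few lines. This is only a bookkeeping sketch: the dictionary labels and variable names are illustrative and are not part of the original procedure.

```python
# Bookkeeping sketch of the Testing phase; labels are illustrative only.
# Each entry: (faces per family, number of families, presentations per face)
testing_design = {
    "reassigned": (4, 2, 4),      # Families 1 and 3
    "nonreassigned": (4, 2, 2),   # Families 2 and 4
    "novel": (2, 4, 1),           # two new faces in each of the four families
}

# Total trials of each kind across the whole Testing period
trials = {kind: faces * families * reps
          for kind, (faces, families, reps) in testing_design.items()}
total_trials = sum(trials.values())

# Trials of each kind within one seven-trial block, as described in the text
block = {"reassigned": 4, "nonreassigned": 2, "novel": 1}
n_blocks = total_trials // sum(block.values())

print(trials, total_trials, n_blocks)
```

The per-block composition multiplied by the number of blocks reproduces the per-kind totals, so the block structure and the 56-trial total agree with one another.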
In Testing, the children achieved a mean accuracy score to the reassigned stimuli (Families 1 and 3) of 100%, indicating that they maintained differential responding to these stimuli. To the nonreassigned stimuli (Families 2 and 4), subjects achieved a mean accuracy score of 86%, indicating that they had indeed formed joint categories in Original Training. Accuracy to the novel faces first shown in Testing closely resembled accuracy to the familiar faces seen in Original Training and in Reassignment Training. Accuracy to novel faces from the reassigned categories (Families 1 and 3) averaged 100%; accuracy to novel faces from the nonreassigned categories (Families 2 and 4) averaged 82%. Thus, the results of this project not only provide clear evidence of joint category formation with a different kind of complex visual stimulus than photographs of real-world objects, they also demonstrate transfer of discriminative control to novel stimuli from the reassigned and nonreassigned categories.

Fig. 12. Faces in the four family groupings of Experiment 10. Different families are shown in the different columns; different individuals in those families are shown in the different rows. Families 1, 2, 3, and 4 are depicted in Columns 1, 2, 3, and 4, respectively.

VIII. Summary of Empirical Evidence

Before making our concluding comments, it may prove to be useful for us to summarize the main empirical findings that we have presented in this chapter.
1. The pigeon can be taught to classify color slides of natural and humanmade objects into four mutually exclusive and exhaustive stimulus classes. This categorical discrimination learning can proceed quite rapidly and with similar speed for both natural (cats and flowers) and humanmade (cars and chairs) stimuli (Experiment 1). Categorical discrimination learning is faster the fewer examples per category are shown (Experiment 2) and if stimulus repetition is permitted from day to day (Experiment 4). Nevertheless, categorical discrimination learning is possible even when the slides that are shown to the pigeon are never repeated (Experiments 3 and 4). Finally, categorical discrimination is far faster if the training categories are isomorphic with human conceptual categories than if they are not (Experiment 5).

2. After categorical discrimination learning, the pigeon generalizes its classification behavior to novel slides (Experiment 1). Categorical generalization is broader the more examples per category are given in prior discrimination training (Experiment 2). There is a strong perceptual component to the pigeon’s categorical discrimination and generalization. Without any differential reinforcement of categorization behavior whatsoever, the pigeon shows a clear tendency to respond similarly to stimuli from the same human conceptual categories (Experiment 7). Furthermore, the pigeon’s within-category errors in a split-categorization procedure exceed its between-category errors (Experiment 6).

3. All of this (and other subsidiary) evidence can be encompassed by a modification of Spence’s (1937) theory of discrimination learning. The key modification is that stimuli from a single conceptual category lie along a common psychological dimension. Still, there is more to the pigeon’s conceptual behavior (and to the human being’s as well).
Collections of stimuli from two different perceptual groupings (like cars and chairs) can be forged into a single functional class by associating them with a common operant response (Experiment 8). Such superordinate category formation would appear to require behavioral mechanisms that transcend primary stimulus generalization. Invoking secondary or mediated stimulus generalization to account for nonsimilarity-based conceptualization is well justified from a behavioral vantage point. And, mediated generalization can be effectively applied to the very similar results of analogous experimentation with young children (Experiments 9 and 10).

IX. Concluding Comments

“[E]quivalent stimuli is what we mean when we speak of a concept” (p. 155). These words of Keller and Schoenfeld (1950) place the study of concepts squarely within the experimental analysis of behavior. Nevertheless, the research discussed in this chapter makes it clear that the origins of stimulus equivalence may be quite different. On the one hand, stimulus equivalence may result from the perceived similarity of stimuli. In this case, primary stimulus generalization may be said to support the transfer of discriminative responding from some stimuli to others. On the other hand, stimulus equivalence may result from prior experience. In this case, secondary or mediated stimulus generalization may be said to support the transfer of discriminative responding from some stimuli to others. Pigeons and children alike are capable of similarity-based and nonsimilarity-based conceptualization; but, in the case of similarity-based conceptual behavior, differential reinforcement appears to disclose preexisting concepts, whereas in the case of nonsimilarity-based conceptual behavior it appears to produce new ones.

A. A MEDIATIONAL ACCOUNT OF NONSIMILARITY-BASED CONCEPTUALIZATION
Our demonstration of nonsimilarity-based conceptualization in pigeons and children brings us to the nature of the behavioral processes involved in this transfer. Here, there is good reason to point to the importance of the common operant behavior that was performed to members of the joint categories. Thus, R1 was the common response to members of both C1 and C2 during Original Training. Even though R1 was unavailable during Reassignment Training, when R3 was required to members of C1, the partial or incipient performance of that initial response (r1) and the proprioceptive and kinesthetic feedback (s1) that may be produced by it could have become a portion of the complex of discriminative stimuli associated with R3. The later presentation of members of C2 during Testing, even though never before paired with R3, could have effectively led to the performance of R3 because these stimuli were associated with some of the discriminative stimulus complex hypothesized to be associated with R3.

The full theoretics of this mediational account are elaborated in Table I. This table details all of the experimentally available stimuli and responses in uppercase letters and all of the hypothesized stimuli and responses in lowercase letters for all three phases of experimental training. Most critical for the mediational account is its accurate prediction of the emergent stimulus-response associations that are observed in Testing: namely, the performance of R3 to stimuli in C2 (via the mediating chain C2 → r1 → s1 → R3) and the performance of R4 to stimuli in C4 (via the mediating chain C4 → r2 → s2 → R4). Such accurate prediction necessitates the training given during both of the first two phases of training; otherwise, there would be no basis for effective mediation of the emergent stimulus-response relations tested in the final phase.

TABLE I
MEDIATIONAL ANALYSIS OF NONSIMILARITY-BASED CONCEPTUALIZATION

Original training      Reassignment           Testing
C1 → r1 → s1 → R1      C1 → r1 → s1 → R3
C2 → r1 → s1 → R1                             C2 → r1 → s1 → R3 or R4?
C3 → r2 → s2 → R2      C3 → r2 → s2 → R4
C4 → r2 → s2 → R2                             C4 → r2 → s2 → R3 or R4?

Note. C1, C2, C3, and C4 refer to the four categories of stimuli shown to the subjects; R1, R2, R3, and R4 refer to the responses available to the subjects; r1, r2, r3, and r4 refer to the hypothesized mediating responses; s1, s2, s3, and s4 refer to the hypothesized stimulus consequences of each of the four mediating responses. Boldface responses in testing indicate those that would evidence mediated generalization.

It might occur to some readers that mediational mechanisms of the sort discussed by earlier authors like Osgood (1953), Kendler and Kendler (1962), and Underwood (1966) must surely be out of date. Yet, Herrnstein (1990) too has recently addressed the relevance and importance of mediated generalization and mediated association for comprehensive analyses of conceptual behavior:

The terms were used to describe transfer of conditioning across stimuli too different to be attributed to common features or attributes. The theoretical notion was that stimuli may have had no stimulus elements in common, but, presumably because of past conditioning, they evoked overlapping responses. The overlapping responses provided the bridge for transfer. (p. 150)
Mediational mechanisms have proven to be interpretively valuable well beyond the present context of conceptual behavior. For example, Sanders (1971) interpreted both species differences and developmental differences in discrimination learning as resulting from differences in mediating responses. Those wishing to learn more about the application of mediational mechanisms to other issues in cognition should consult Peterson (1984) and Heit (1992).
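The bookkeeping behind the mediational chains of Table I can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a model proposed in the chapter: it collapses each hypothesized mediating response and its feedback (r → s) into a single mediator label, and the names `mediators` and `predict_testing` are hypothetical.

```python
# Toy encoding of the three-phase design; all names are illustrative only.
original = {"C1": "R1", "C2": "R1", "C3": "R2", "C4": "R2"}   # Original Training
reassignment = {"C1": "R3", "C3": "R4"}                        # Reassignment Training


def mediators(original_training):
    """Assumption: each category acquires the covert mediator (r1, r2, ...)
    tied to the overt response (R1, R2, ...) it shared in Original Training."""
    return {cat: "r" + resp[1:] for cat, resp in original_training.items()}


def predict_testing(original_training, reassignment_training):
    """Predict the Testing response for categories never reassigned."""
    med = mediators(original_training)
    # In Reassignment Training, the mediator evoked by a category becomes
    # part of the cue complex for the newly required overt response.
    mediator_to_response = {med[cat]: resp
                            for cat, resp in reassignment_training.items()}
    return {cat: mediator_to_response.get(med[cat])   # None = no basis for transfer
            for cat in original_training
            if cat not in reassignment_training}


predictions = predict_testing(original, reassignment)
print(predictions)
```

Under these assumptions the sketch yields R3 for C2 and R4 for C4, matching the chains given in the text. If two categories never shared a response in Original Training, no mediator links them and the function returns None for the untrained category, which mirrors the point that both of the first two phases are needed.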
B. VERISIMILITUDE VERSUS CONTROLLABILITY OF DISCRIMINATIVE STIMULI
Some comment is in order concerning our choice of lifelike snapshots and drawings to be discriminated by our pigeon and children subjects. The
use of naturalistic stimuli substantially increases the representativeness of our results to real-life situations; however, it also entails a lack of experimental control that would otherwise be highly desirable. We do not know all that we would like to about the perceptual similarity of the present stimuli, but we do know and can easily demonstrate that, in our studies of nonsimilarity-based conceptualization, unequal discriminal distances among our four stimulus classes are as likely to produce below-chance transfer to new stimuli as they are to produce above-chance transfer. That is why, in the absence of detailed psychophysical information, we must rely on counterbalancing of the stimulus classes constituting the joint categories to neutralize any biases in the transfer performance we obtain. In future studies, we plan to establish composite categories of stimuli, like letters of the alphabet, with known discriminal distances between one another (Blough, 1985).
C. KINDS OF CONCEPTUAL CATEGORIES

Another matter for discussion concerns the place of our results within the familiar tripartite scheme: subordinate concept, basic-level concept, and superordinate concept (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). This scheme proposes that human conceptual categories can be located at three levels depending on the relative extent of intraclass and interclass stimulus similarity. The basic-level concept, like chair, enjoys both high intraclass similarity and low interclass similarity. The subordinate concept, like dining chair, entails much higher interclass similarity than the basic-level concept. The superordinate concept, like furniture, entails much lower intraclass similarity than the basic-level concept.

The joint categories established in two of our final three experiments and in the experiment of Vaughan (1988) fit well within this scheme. Each project began with basic-level categories: Vaughan chose 40 photographs of trees and we chose 12 photographs each of people, flowers, cars, and chairs. Vaughan proceeded to establish two subordinate categories, each comprising assortments of 20 photographs, with each of six pigeons trained with different assortments. We proceeded to establish two superordinate categories, each comprising two basic-level categories, with each of two different groups of pigeons or children trained with different pairs of basic-level categories. Each pigeon study found clear evidence that new categories of functionally equivalent stimuli were formed, thereby supporting the view that much of the richness and complexity of human conceptual behavior is to be found in the behavior of nonhuman and nonverbal animals (see also Roberts & Mazmanian, 1988). Just what verbal behavior adds to conceptualization now becomes a matter of special interest, given that
our own results conform well to the principle of secondary or mediated generalization, yet our pigeons’ behavior (unlike that of our children) must surely not have involved verbal mediation.
D. CONCEPTS AND LANGUAGE

Our pigeon data suggest that a good many of the details of conceptualization, both similarity-based and nonsimilarity-based, can be exhibited by the pigeon, a nonhuman and nonverbal animal. In addition, the behaviors of our preschool children look very much like those of our pigeons. Although it would be downright silly to insist that there are no differences in the conceptual behaviors of these vastly different organisms, we are quite impressed with the similarities that we have so far observed. At the very least, these resemblances indicate that complex conceptualization requires neither language nor the human brain.¹

Even if it were to be shown that language importantly influences conceptualization, there is no need to abandon a behavioral analysis. The very terms of our language may have been learned by the same process that is responsible for the learning of nonverbal concepts. At least that was the premise behind our choice of the name game as the human concept learning context on which we fashioned our pigeon conditioning analogue. And, it was the premise behind Quine’s (1969) suggestion that learning the meaning of such intangible terms as yellow through the rough and ready route of practical experience also requires no new principles:

All these delicate comparisons and shrewd inferences about what to call yellow are, in Sherlock Holmes’s terminology, elementary. . . . It is the same process by which an animal learns to respond in distinctive ways to his master’s commands or other discriminated stimulations. (p. 10)
¹ Our discussion of conceptualization in pigeons and children has centered on categorization behavior. Other work (Heit, 1992; Osherson, Smith, Wilkie, Lopez, & Shafir, 1990) has also concerned behavioral inferences of the form, “if this object is a bird, then it is likely to fly.” Whether such inferences can be shown in the behavior of nonhuman animals is an intriguing, but as yet unexplored issue.

E. A TAXONOMY OF STIMULUS CONTROL

Our findings also relate to a recent taxonomy of stimulus control proposed by Herrnstein (1990). Open-ended categorization, in his scheme, is what we have termed similarity-based conceptualization. “Any natural contingency of reinforcement is . . . likely to be open-ended . . . , comprising a virtually limitless set of exemplars from the subject’s vantage point . . . [and] the inevitability of this demand, and its solution by perceptual similarity, are what suggest that similarity is an evolutionary adaptation” (p. 144). A concept, in Herrnstein’s scheme, is what we have termed nonsimilarity-based conceptualization. Here, “the effects of contingencies [of reinforcement] applied to members of the [stimulus] set propagate to other members more than can be accounted for by the similarities among members of the set” (p. 150). Herrnstein notes that his level of concept relates to earlier authors’ notions of secondary or mediated generalization. Whereas most of those authors linked such mediation to human verbal behavior, Herrnstein does not. To him, “the issue is simply whether analogous examples of categorization can be convincingly shown for nonverbal animals” (p. 150). Our pigeon data and those of Vaughan (1988) seem to do just that. Our data also force us to update Premack’s (1983) appraisal that “pigeons have never been shown to have functional classes - furniture, toys, candy, sports equipment - where class members do not look alike; they only recognize physical classes - trees, humans, birds - where class members do look alike” (p. 359).
F. PERCEPTUAL SIMILARITY AND MODELS OF CATEGORIZATION

Similarity has been an important explanatory construct in many theories of human conceptual behavior. Classical, prototype (family resemblance), and exemplar models all rely to some extent on similarity to explain how categories are formed and how new instances are classified (e.g., Komatsu, 1992). The work in the latter half of this chapter concerned a nonsimilarity-based associative mechanism of categorization. This mechanism may prove to be a powerful addition to current models of categorization (see Nosofsky, 1992).

Much about the relative contributions of similarity- and nonsimilarity-based associative mechanisms to categorization must still be determined. However, some initial remarks are in order here.

First, although it was not included as a factor in our own computer simulations, we believe that it is quite possible that the perceived similarity of stimuli may change with an organism’s experience with them or with like stimuli. As Smith and Heise (1992) have recently noted, the dynamic nature of perceptual similarity has been a central concern of work ranging from Nosofsky’s (1984) account of categorization in humans to Sutherland and Mackintosh’s (1971) theory of selective attention in animal discrimination learning. Any full account of the role of perceptual similarity in categorization must take this dynamism into account.

Second, asserting that perceptual similarity may play a central role in categorization is not equivalent to asserting that perceptual similarity is
always the primary factor. Gelman and Markman (1986) and others have shown that even young children will frequently make judgments on the basis of category membership rather than on perceptual similarity when these two factors are pitted against one another. This result does not mean, however, that perceptual similarity plays little role in category judgments in the “real world” outside of the laboratory. Further research is necessary to explore the varying role of perceptual similarity in different types of tasks and at different developmental times. Third, and finally, implicit or explicit in several recent discussions of categorization (e.g., Komatsu, 1992) is a view that perceptual similarity and associative factors are too primitive and inflexible to deal with the full complexity of human categorization behavior. It is indeed likely that in many (if not most) cases, changes in selective attention of the sort described by Nosofsky (1984) and others will inevitably take a number of stimulus exposures to develop. This attentional change is also likely to be true of associative relations of the sort studied by Wasserman et al. (1992). Although the formation of associations and long-term changes in selective attention may take substantial time to develop, once established, the stimuli or associative relations that effectively activate behavior may be subject to change in an instant. In what is conventionally called a “conditional discrimination”, for example, the presence of one cue signals that a response to one of several available stimuli will be reinforced and the presence of a different cue signals that a response to a different stimulus will be reinforced. Even pigeons are quite capable of reversing their choices in response to such “conditional” cues (e.g., Carter & Eckerman, 1975). 
Smith and Heise (1992) less conventionally refer to those cues that momentarily change the evident similarity of a set of stimuli or differentially activate learned associations as “contexts.” By way of example, for 3-year-olds, the context of a verbal cue entailing a count noun (e.g., “This is a DAX”) guides the child’s response to an object’s shape; this is not the case in a different verbal context using the same word as an adjective (e.g., “This is a DAX one”; Smith, Jones, & Landau, 1992). As complex and intricate as these cases of conditional or contextual stimulus control are, they would appear to be straightforwardly incorporated into the kind of behavioral analysis that we have advanced here.
G. CONCEPTS AND CONSCIOUSNESS

Still more might be said about the completeness of the behavioral analysis of concepts that we have advanced. It will surely seem to some readers that we must have left something out. Discrimination, generalization, reinforcement: can that be all that there is to this remarkable feat of human and animal cognition?
We cautiously respond, Perhaps not. There may be even more complex actions of humans and animals that importantly transcend the explanatory power of contemporary behavior theory. We ourselves deemed it necessary to appeal to secondary or mediated stimulus generalization to explain nonsimilarity-based conceptualization when primary stimulus generalization alone was not fully up to the task. If future evidence emerges that challenges the completeness of behavior-analytic principles (such as the contextual or conditional stimulus control just discussed), then we suggest that those new findings too may yield to more careful and complete behavioral analysis. We are confident that this general approach has a great deal to contribute to our understanding of both human and nonhuman cognition. Should the full richness of behavioral adaptability truly fail to be captured by this approach, then our science will nevertheless have been well served by forcing competing theorists to stick to the behavioral facts at issue rather than drifting off to peripheral and diverting disputes.

Here, in fact, we must discuss just such a digression, as the respected ethologist D. R. Griffin (1992) has recently argued that our own analysis of conceptual behavior fails because it does not deal with the organism’s alleged thoughts and feelings. Speaking of this research, Griffin offers the following critique:

In keeping with the behavioristic tradition, the papers describing these impressive achievements of pigeons are titled “Conceptual Behavior in Pigeons.” Presumably this wording was chosen to reinforce the behavioristic insistence that any mental terms be scrupulously avoided. Animals may behave as though they utilized simple concepts, but behaviorists are constrained to ignore or deny the possibility that they might consciously think about the categories or concepts that must be postulated in order to explain their behavior. (pp. 135-136)
We did not in our papers refer to our subjects’ conscious experience, be they pigeons or children. To have done so would, in our opinion, have added nothing to our account of conceptualization that would be open to objective scrutiny. We see no need to discuss our subjects’ behaviors romantically or sentimentally. We recognize that others do. But we consider these efforts to be misguided. They undermine rather than advance the scientific study of behavior, particularly behavior that is considered to be complex or cognitive by most theorists. They confuse behavioral facts with mentalistic interpretations. They substitute unverifiable notions of feeling or sentiment for testable theoretical propositions. They tend to squelch rather than to foster experimental inquiry. As well, those espousing these notions often grossly distort the true aims and premises of a behavioral analysis of cognition in the interest of promoting their own mentalistic agenda.
It really is remarkable that we should be witnessing the reintroduction of consciousness into the analysis of behavior. Some years ago, cognitivists drifted perilously close to this point (see Hintzman, 1993, for a critical essay on the excesses of cognitivism). But now, the field of cognitive science is becoming increasingly computational and materialistic. Connectionist theories are edging ever closer to associative accounts of complex cognitive functions (see Gluck & Bower, 1988, and Kruschke, 1992, for connectionist and adaptive network models of category learning), and there are growing signs that theorists may once again be willing to develop theories of cognition that are sufficiently general to apply to both human and animal behavior (see Shanks, 1993, and Wasserman, 1990, for more about associative accounts of both human and animal learning). The emergence of Griffin’s not so aptly dubbed “cognitive ethology” in the midst of these positive developments is at once unexpected and unfortunate.

Is behaviorism (and other objective analyses of cognition) really as bad as Griffin and his followers would have us believe? Are we behaviorists a “cold and clammy” lot, of “limited imaginations,” and suffering from the “self-imposed handicap or blindness” of “paralytic perfectionism”? Are we afflicted with “compulsive parsimony” because we cannot or will not expand our interpretive horizons to embrace the private experiences of our subjects? Is it “really absurd [for us] to deny the existence and importance of mental experiences just because they are difficult to study” (Griffin, 1992, p. 117)? Shouldn’t we capitulate and at long last accept the central premise of cognitive ethology that “we cannot understand animals [or people for that matter] fully without knowing what their subjective lives are like” (Griffin, 1992, p. 252)? Those familiar with the writings of H. S. Jennings, J. B. Watson, and B. F. Skinner know the answers to these questions.
We will not repeat them here, having discussed them at length before (Wasserman, 1981, 1982, 1983, 1993). We will grant Griffin one important point, however. With the exception of Skinner’s (1957) pioneering treatment of verbal behavior, few behaviorists have seriously endeavored to analyze complex cognitive functioning from a behavioral perspective. We thus felt the sting of Griffin’s (1992) pen most when he prefaced his discussion of our own work and that of other behaviorists on conceptualization in the pigeon by the following remark, “Some of the best evidence that animals can think in terms of categories or concepts has become available from what at first thought may seem an unlikely source, namely the detailed analyses of animal learning by experimental psychologists” (p. 116). An unlikely source? Who should be in a better position to investigate conceptualization than experimental psychologists with a strong background in the laboratory study of learning?
We sincerely hope that whatever measure of success we have achieved in our studies of conceptual behavior in animals (and children) will encourage others to enter the fray. We surely do not want future critics of behaviorism to complain that we are too timid a bunch to dare to apply our analysis to complex learning and cognition.
H. CONCEPTS AND THE BRAIN
Finally, we come to the interrelation between conceptual behavior and the neural mechanisms that underlie it. The 1972 Nobel prize-winning neuroscientist G. M. Edelman has recently discussed this issue at some length. However, his specific neurological formulation is not as important to us here as is his general orientation. To begin, Edelman (1992) too does not insist that conceptualization requires the human brain or human language. Nonhuman and nonverbal animals are also capable of conceptual behavior. “An animal capable of having concepts identifies a thing or an action and on the basis of that identification controls its behavior in a more or less general way” (p. 108). Furthermore, Edelman hypothesizes that conceptualization is not of recent evolutionary vintage. “Conceptual capabilities develop in evolution well before speech” (p. 108). Edelman correctly notes that most theorists do not appreciate that animals are capable of conceptualization because “it is difficult to know which animals beside humans have conceptual abilities” (p. 108). Here, however, he pays no apparent attention to the basic behavioral basis for defining conceptualization and, given his own prior remarks on the role of language in conceptual behavior, he curiously resorts to the question of communication. Edelman asserts that the case for conceptualization in chimpanzees is persuasive, but “decisions about the conceptual capabilities of other animals are harder to make . . . because unlike the case with the chimpanzee, our communication with other animals is severely restricted” (p. 108). We strongly disagree with this assertion and so too should the reader after considering the large collection of empirical evidence that we have presented in this chapter.
Even more emphatically, we disagree with Edelman’s pessimistic and reductionistic conclusion that “the best we may be able to do [to discover if animals have concepts] is to compare the structures and functions of their brain regions with those of humans and make guesses to guide further study” (p. 108). Behavioral science is in a far better position to elucidate the cognitive abilities of nonhuman animals than to make neurological “guesses.” We can go out and collect objective and reliable evidence on the question. If Edelman’s recent comments are an accurate guide to how broadly
experimental psychologists’ work has extended beyond our own field, then we are also going to have to do a much better job publicizing the important positive contributions that behavioral science can make to our understanding of brain function than has heretofore been the case. Neuroscience will be better for it. So too will experimental psychology.

Appendix
All computer simulations were performed with the Quattro-Pro spreadsheet program (Borland International, Inc., Scotts Valley, CA). Trial-by-trial changes in response tendencies to individual stimuli were modeled via the Rescorla-Wagner model (Rescorla & Wagner, 1972), with an asymptote of 100 assumed for reinforced trials and an asymptote of -100 assumed for nonreinforced trials. The per-session learning rate parameter (combined alpha and beta) for the simulations was 0.10 for reinforced occurrences of a stimulus in all procedures and 0.05 for nonreinforced occurrences in the go/no-go procedure (Experiment 7). Figure 13 shows the excitatory and inhibitory acquisition curves that were generated in 24 sessions using these specific parameters. Categorical dimensions were represented by 100-cell columns in the spreadsheet and individual stimuli were represented by cells in a column. Columns were arranged so that changes in the response tendency to a particular cell generalized (at 0.75 strength) to adjacent cells. Figure 14 illustrates the excitatory and inhibitory gradients generated by these
Fig. 13. The simulated growth of excitation and inhibition over 24 sessions of training using the same parameters for the simulations of Experiments 1 through 7. (Horizontal axis: Session.)
Fig. 14. The generalization gradients of excitation and inhibition after 24 sessions of simulated training. (Horizontal axis: Categorical Dimension.)
particular parameters. Each stimulus had its own column in the spreadsheet. The net response tendency to a stimulus was obtained by summing across columns representing stimuli from the same categorical dimension. In all of our simulations, we assume only that there is a monotonic relation between response tendency and various aspects of actual responding. We have insufficient evidence at this point to specify more precisely the function relating response tendency to observed responding. Our simulations are thus intended to achieve qualitative, but not quantitative, patterns of prediction, as is common in this domain of inquiry (Rescorla & Wagner, 1972; Shanks, 1993; Wasserman, Elek, Chatlosh, & Baker, 1993).

ACKNOWLEDGMENTS

We thank our many past co-authors, assistants, and technicians for their invaluable help in conducting and reporting much of the work described in this chapter, particularly Ramesh Bhatt, Carol DeVolder, Lloyd Frei, Rob Kiedinger, Kim Knauss, Keith Miller, and Bill Reynolds. Especially helpful in the writing of this chapter were Joan Cantor, Evan Heit, Jon Ringen, and the late Charles Spiker. Preparation of this chapter was supported in part by Research Grant MH47313 to Edward A. Wasserman from the National Institute of Mental Health and by a Faculty Development Grant from Cornell College to Suzette L. Astley.
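The simulation procedure described in the Appendix can be approximated outside a spreadsheet. The following is only a minimal sketch under stated assumptions, not a reconstruction of the original Quattro-Pro worksheets. It assumes that each stimulus is updated against its own response tendency, that one update is applied per session, and that generalization at "0.75 strength" falls off geometrically with distance along the 100-cell column. The function names are hypothetical; the asymptotes, learning rates, session count, and column size come from the text.

```python
# Hypothetical sketch of the Appendix simulation; names and the geometric
# generalization rule are assumptions, not the authors' spreadsheet formulas.

def rw_update(v, asymptote, rate):
    """One Rescorla-Wagner step: move v toward the asymptote by a fixed fraction."""
    return v + rate * (asymptote - v)


def acquisition_curve(asymptote, rate, sessions=24):
    """Response tendency after each session, starting from zero (assumed)."""
    v = 0.0
    curve = []
    for _ in range(sessions):
        v = rw_update(v, asymptote, rate)
        curve.append(v)
    return curve


def generalization_gradient(peak, center, n_cells=100, spread=0.75):
    """Spread a trained value along a column, scaling by `spread` per cell of distance."""
    return [peak * spread ** abs(i - center) for i in range(n_cells)]


def net_tendency(columns, cell):
    """Sum the columns of stimuli that share a categorical dimension."""
    return sum(column[cell] for column in columns)


if __name__ == "__main__":
    excitation = acquisition_curve(asymptote=100.0, rate=0.10)
    inhibition = acquisition_curve(asymptote=-100.0, rate=0.05)
    print(round(excitation[-1], 2), round(inhibition[-1], 2))

    # Two stimuli trained at neighboring positions on the same dimension.
    col_a = generalization_gradient(excitation[-1], center=40)
    col_b = generalization_gradient(inhibition[-1], center=45)
    print(round(net_tendency([col_a, col_b], cell=42), 2))
```

With these parameters the excitatory tendency reaches roughly 92 and the inhibitory tendency roughly -71 after 24 sessions. Consistent with the Appendix, such values are best read as qualitative orderings rather than quantitative predictions of responding.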
REFERENCES

Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409-429.
Astley, S. L., & Wasserman, E. A. (1992). Categorical discrimination and generalization in pigeons: All negative stimuli are not created equal. Journal of Experimental Psychology: Animal Behavior Processes, 18, 193-207.
Bhatt, R. S. (1988). Categorization in pigeons: Effects of category size, congruity with human categories, selective attention, and secondary generalization. Unpublished doctoral dissertation, University of Iowa, Iowa City, IA.
Bhatt, R. S., & Wasserman, E. A. (1989). Secondary generalization and categorization in pigeons. Journal of the Experimental Analysis of Behavior, 52, 213-224.
Bhatt, R. S., Wasserman, E. A., Reynolds, W. F., Jr., & Knauss, K. S. (1988). Conceptual behavior in pigeons: Categorization of both familiar and novel examples from four classes of natural and artificial stimuli. Journal of Experimental Psychology: Animal Behavior Processes, 14, 219-234.
Blough, D. S. (1985). Discrimination of letters and random dot patterns by pigeons and humans. Journal of Experimental Psychology: Animal Behavior Processes, 11, 261-280.
Carter, D. E., & Eckerman, D. A. (1975). Symbolic matching by pigeons: Rate of learning complex discriminations predicted from simple discriminations. Science, 187, 662-664.
Cerella, J. (1979). Visual classes and natural categories in the pigeon. Journal of Experimental Psychology: Human Perception and Performance, 5, 68-77.
Cook, R. S., Wright, A. A., & Kendrick, D. F. (1990). Visual categorization by pigeons. In M. L. Commons, R. J. Herrnstein, S. M. Kosslyn, & D. M. Mumford (Eds.), Quantitative analyses of behavior: Pattern recognition (Vol. VIII, pp. 187-214). Hillsdale, NJ: Erlbaum.
Edelman, G. M. (1992). Bright air, brilliant fire: On the matter of the mind. New York: BasicBooks.
Edwards, C. A., & Honig, W. K. (1987). Memorization and “feature selection” in the acquisition of natural concepts in pigeons. Learning and Motivation, 18, 235-260.
Gelman, S. A., & Markman, E. M. (1986). Categories and induction in young children. Cognition, 23, 183-209.
Gluck, M. A. (1991). Stimulus generalization and representation in adaptive network models of category learning. Psychological Science, 2, 50-55.
Gluck, M. A., & Bower, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.
Goldiamond, I. (1966). Perception, language, and conceptualization rules. In B. Kleinmuntz (Ed.), Problem solving: Research, method, and theory (pp. 183-224). New York: Wiley.
Griffin, D. R. (1992). Animal minds. Chicago: University of Chicago Press.
Hall, G., Ray, E., & Bonardi, C. (1993). Acquired equivalence between cues trained with a common antecedent. Journal of Experimental Psychology: Animal Behavior Processes, 19, 391-399.
Harnad, S. (Ed.). (1987). Categorical perception: The groundwork of cognition. Cambridge: Cambridge University Press.
Heit, E. (1992). Categorization using chains of examples. Cognitive Psychology, 24, 341-380.
Herrnstein, R. J. (1966). Superstition: A corollary of the principles of operant conditioning. In W. K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 33-51). New York: Appleton-Century-Crofts.
Herrnstein, R. J. (1979). Acquisition, generalization, and discrimination reversal of a natural concept. Journal of Experimental Psychology: Animal Behavior Processes, 5, 116-129.
Herrnstein, R. J. (1985). Riddles of natural categorization. Philosophical Transactions of the Royal Society, B308, 129-144.
Herrnstein, R. J. (1990). Levels of stimulus control: A functional approach. Cognition, 37, 133-166.
Herrnstein, R. J., & de Villiers, P. A. (1980). Fish as a natural category for people and pigeons. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 59-95). San Diego, CA: Academic Press.
Herrnstein, R. J., & Loveland, D. H. (1964). Complex visual concept in the pigeon. Science, 146, 549-551.
Hintzman, D. L. (1993). Twenty-five years of learning and memory: Was the cognitive revolution a mistake? In D. E. Meyer & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience-A silver jubilee (pp. 359-391). Cambridge, MA: MIT Press.
Homa, D., Burruel, L., & Field, D. (1987). The changing composition of abstracted categories under manipulations of decisional change, choice difficulty, and category size. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 401-412.
Hull, C. L. (1943). Principles of behavior. New York: Appleton-Century-Crofts.
James, W. (1890/1950). Principles of psychology (Vol. 1). New York: Dover.
Keil, F. C. (1989). Concepts, kinds, and cognitive development. Chicago: University of Chicago Press.
Keller, F. S., & Schoenfeld, W. N. (1950). Principles of psychology. New York: Appleton-Century-Crofts.
Kendler, T. S., & Kendler, H. H. (1962). Vertical and horizontal processes in problem solving. Psychological Review, 69, 1-16.
Komatsu, L. K. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Lea, S. E. G. (1984). In what sense do pigeons learn concepts? In H. L. Roitblat, T. G. Bever, & H. S. Terrace (Eds.), Animal cognition (pp. 263-276). Hillsdale, NJ: Erlbaum.
Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254-278.
Medin, D. L., & Schaeffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Morgan, C. L. (1894). An introduction to comparative psychology. London: Walter Scott, Ltd.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology, 10, 104-114.
Nosofsky, R. M. (1992). Similarity scaling and cognitive process models. Annual Review of Psychology, 43, 25-53.
Oden, G. C. (1987). Concept, knowledge, and thought. Annual Review of Psychology, 38, 203-227.
Osgood, C. E. (1953). Method and theory in experimental psychology. New York: Oxford University Press.
Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185-200.
Pearce, J. M. (1988). Stimulus generalization and the acquisition of categories by pigeons. In L. Weiskrantz (Ed.), Thought without language (pp. 132-155). Oxford: Clarendon Press.
Peterson, G. B. (1984). The differential outcomes procedure: A paradigm for studying how expectancies guide behavior. In H. L. Roitblat, T. G. Bever, & H. S. Terrace (Eds.), Animal cognition (pp. 135-148). Hillsdale, NJ: Erlbaum.
Premack, D. (1983). Animal cognition. Annual Review of Psychology, 34, 351-362.
Quine, W. V. (1969). Natural kinds. In N. Rescher (Ed.), Essays in honor of Carl G. Hempel (pp. 5-23). Dordrecht, Holland: D. Reidel.
Quinn, P. C., & Eimas, P. D. (1986). On categorization in early infancy. Merrill-Palmer Quarterly, 32, 331-363.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning: II. Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Reynolds, G. S. (1961). An analysis of interactions in a multiple schedule. Journal of the Experimental Analysis of Behavior, 4, 289-294.
Roberts, W. A., & Mazmanian, D. S. (1988). Concept learning at different levels of abstraction by pigeons, monkeys, and people. Journal of Experimental Psychology: Animal Behavior Processes, 14, 247-260.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Sanders, B. (1971). Factors affecting reversal and nonreversal shifts in rats and children. Journal of Comparative and Physiological Psychology, 74, 192-202.
Sands, S. F., Lincoln, C. E., & Wright, A. A. (1982). Pictorial similarity judgments and the organization of visual memory in the rhesus monkey. Journal of Experimental Psychology: General, 111, 369-389.
Shanks, D. R. (1991). Categorization by a connectionist network. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 433-441.
Shanks, D. R. (1993). Associative versus contingency accounts of category learning: Reply to Melz, Cheng, Holyoak, and Waldmann (1993). Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1411-1423.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
Shepard, R. N., & Kannappan, S. (1991). Connectionist implementation of a theory of generalization. In R. P. Lippmann, J. Moody, & D. S. Touretzky (Eds.), Advances in neural information processing systems (pp. 1-7). San Mateo, CA: Morgan Kaufmann.
Skinner, B. F. (1935). The generic nature of the concepts of stimulus and response. Journal of General Psychology, 12, 40-65.
Skinner, B. F. (1957). Verbal behavior. New York: Appleton-Century-Crofts.
Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.
Smith, L. B., & Heise, D. (1992). Perceptual similarity and conceptual structure. In B. Burns (Ed.), Percepts, concepts, and categories. Amsterdam: Elsevier.
Smith, L. B., Jones, S. S., & Landau, B. (1992). Count nouns, adjectives, and perceptual properties in children's novel word interpretations. Developmental Psychology, 28, 273-286.
Spence, K. W. (1937). The differential response of animals to stimuli varying within a single dimension. Psychological Review, 44, 430-444.
Spinozzi, G. (1993). Development of spontaneous classificatory behavior in chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 107, 193-200.
Sutherland, N. S., & Mackintosh, N. J. (1971). Mechanisms of animal discrimination learning. New York: Academic Press.
Underwood, B. J. (1966). Experimental psychology. New York: Appleton-Century-Crofts.
Urcuioli, P. J., Zentall, T. R., Jackson-Smith, P., & Steirn, J. N. (1989). Evidence for common coding in many-to-one matching: Retention, intertrial interference, and transfer. Journal of Experimental Psychology: Animal Behavior Processes, 15, 264-273.
Vaughan, W., Jr. (1988). Formation of equivalence sets in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 14, 36-42.
Wasserman, E. A. (1974). Stimulus-reinforcer predictiveness and selective discrimination learning in pigeons. Journal of Experimental Psychology, 103, 284-297.
Wasserman, E. A. (1981). Comparative psychology returns: A review of Hulse, Fowler, and Honig’s Cognitive processes in animal behavior. Journal of the Experimental Analysis of Behavior, 35, 243-257.
Wasserman, E. A. (1982). Further remarks on the role of cognition in the comparative analysis of behavior. Journal of the Experimental Analysis of Behavior, 38, 211-216.
Wasserman, E. A. (1983). Is cognitive psychology behavioral? Psychological Record, 33, 6-11.
Wasserman, E. A. (1990). Detecting response-outcome relations: Toward an understanding of the causal texture of the environment. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 27-82). San Diego, CA: Academic Press.
Wasserman, E. A. (1993). Comparative cognition: Beginning the second century of the study of animal intelligence. Psychological Bulletin, 113, 211-228.
Wasserman, E. A., & Bhatt, R. S. (1992). Conceptualization of natural and artificial stimuli by pigeons. In W. K. Honig & J. G. Fetterman (Eds.), Cognitive aspects of stimulus control (pp. 203-223). Hillsdale, NJ: Erlbaum.
Wasserman, E. A., & DeVolder, C. L. (1993). Similarity- and nonsimilarity-based conceptualization in children and pigeons. Psychological Record, 43, 779-793.
Wasserman, E. A., DeVolder, C. L., & Coppage, D. J. (1992). Nonsimilarity-based conceptualization in pigeons. Psychological Science, 3, 374-379.
Wasserman, E. A., Elek, S. M., Chatlosh, D. L., & Baker, A. G. (1993). Rating causal relations: The role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 174-188.
Wasserman, E. A., Kiedinger, R. E., & Bhatt, R. S. (1988). Conceptual behavior in pigeons: Categories, subcategories, and pseudocategories. Journal of Experimental Psychology: Animal Behavior Processes, 14, 235-246.
Watanabe, S., Lea, S. E. G., & Dittrich, W. H. (1993). What can we learn from experiments on pigeon concept discrimination? In H. P. Zeigler & H.-J. Bischof (Eds.), Vision, brain, and behavior in birds (pp. 351-376). Cambridge, MA: MIT Press.
Younger, B. A., & Cohen, L. B. (1985). How infants form categories. In G. H. Bower (Ed.), The psychology of learning and motivation (pp. 211-247). San Diego, CA: Academic Press.
Zentall, T. R., Steirn, J. N., Sherburne, L. M., & Urcuioli, P. J. (1991). Common coding in pigeons assessed through partial versus total reversals of many-to-one conditional and simple discriminations. Journal of Experimental Psychology: Animal Behavior Processes, 17, 194-201.
THE CHILD’S REPRESENTATION OF HUMAN GROUPS
Lawrence A. Hirschfeld

THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 31
Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved.

I. Introduction

Developing the skills for competent cultural behavior is a major task of early childhood. Achieving cultural competency, in turn, depends on recognizing what entities form the cultural environment. Given that human groupings (i.e., collectivities of people based on their gender, race, native language, kinship status, etc.) are integral parts of nearly all social environments, acquiring knowledge of such groupings is a necessary part of the young child’s early development. Curiously, although we know a considerable amount about the young child’s understanding of classes of nonhuman living things (i.e., animals; Keil, 1979; Carey, 1985; Gelman, Spelke, & Meck, 1983), little research has explored the way children come to identify and represent the relevant social entities in their environment. Similarly, although many researchers have examined young children’s abilities to individuate persons, either in virtue of a perceptual device dedicated to the representation and recall of human faces (Carey & Diamond, 1980) or in terms of the ability to understand behavior as caused by specific constellations of traits and stable dispositions (Eder, 1989), we have little insight into the way children understand the aggregates of persons that compose the social environment. Nonetheless, the importance of understanding human groups can be readily seen by considering the range of competencies dependent on such
awareness. Learning to use kinship terms (by learning who is and who is not a member of one’s family), culturally appropriate forms of politesse (in knowing one’s own and others’ status group membership), or even mastery of language itself (in which awareness of human collectivities based on gender, relative age, or degree of familiarity between speakers is necessary for selecting the appropriate syntactic or lexical form), all rest on an appreciation of human group boundaries. Not surprisingly, then, researchers have found that sensitivity to social group difference and the ability to adjust behavior in virtue of membership in a social group emerge early (Dunn, 1988; Corsaro, 1979; Becker, 1982; Anderson, 1986, 1990). Acquiring knowledge of the human collectivities used in one’s culture is a fundamental part of common sense, as precociously learned and crucial to human interaction as the capacity to understand others in terms of mental states and mental causes. This social competence is even more remarkable when we consider the range of social affiliations the young child encounters. Who a particular person is, and how his or her behavior is to be explained, is highly variable, depending on context, perspective, and other changing phenomena. The same individual’s relevant group affiliation may not always be the same. Aunt Mary, for example, may be my physics teacher as well as my father’s sister. The variability of our identity relations has consequences for determining the behaviors I should adopt and the expectations I can entertain toward her. Arriving at the culturally appropriate interpretation of our relationship-by figuring out the nature and scope of the group membership that should govern our interactions-is not a simple matter. Yet, despite the complexity of social experience, students of social cognition have generally imagined that these sorts of social understanding are not difficult to acquire. 
It is widely assumed that children discover the regularities in the social environment through relatively straightforward learning processes, no different from the way children learn about any sort of phenomenon. Although most researchers tacitly acknowledge that children’s attention is directed only toward those human groupings that their culture deems relevant (no one imagines that American children learn the human groups specific to Javanese society), it is generally supposed that the child initially induces social categories from perceptual experience. Social categories, according to a widely held view, are constructed around the physical correlates of group membership. Figuring out group affiliation, then, is attributed to low-level perceptual processing, typically involving little more than judgments of similarity in appearance (Flavell, 1985; Kosslyn & Kagan, 1981; Ross, 1981). This picture of social development is curious for several reasons. First, research increasingly supports the view that domain-general, similarity-based learning plays a smaller role in the formation of common sense
knowledge than earlier accounts suggested (Hirschfeld & Gelman, 1994). Second, discovering the social dimensions that a given culture deems relevant often does not involve attention to perceptual or raw behavioral cues, but rests on an awareness of underlying currents of social organization, typically elaborated in language not action (Hirschfeld, 1988, 1993). In fact, anthropologists and social cognitivists have provided compelling evidence that young children are sensitive to nonobvious and hidden dimensions such as status, power, and context-dependent authority. Toddlers, for example, possess a nuanced understanding of social relations of power, investing the same person (such as an older sibling) with a different level of authority depending on whether an adult is in the room (Dunn, 1988). Preschoolers display considerable sophistication about the language of social status hierarchies (Anderson, 1986, 1990; Corsaro, 1979; Becker, 1982), and several ethnographically oriented studies have shown that young children can moderate their own linguistic and other behavior as a function of subtle differences in perceived status (Ochs & Schieffelin, 1984; Watson-Gegeo & Gegeo, 1986; Corsaro & Rizzo, 1988). In this chapter, I discuss several studies probing the young child’s understanding of a range of less hidden but widely studied social statuses, involving both intrinsic aspects of group identity (such as race) and variable ones (such as occupation). As noted earlier, previous researchers have supposed that young children have an early emerging and robust sensitivity to these social statuses because they have conspicuous physical correlates. To the contrary, I will argue that young children’s understanding is considerably richer in structure and more adultlike than standard views suggest. Young children’s understanding of these statuses reflects, I will show, a domain-specific concern with the nature and scope of human collectivities.
II. The Psychological Representation of Human Groups
Given their importance to explaining human behavior, it is disappointing that so little empirical work has examined how human groups, as opposed to individuals, are represented in memory (Hirschfeld, in press a; Hamilton, 1981). Not only has little research investigated the ways in which human groups might be conceptually distinguished, much previous research in social cognition has instead emphasized the close parallel between social and nonsocial concepts (Flavell, 1985; Kosslyn & Kagan, 1981), thus downplaying the differences between social and other object categories. A good example of this is the long tradition of research in social psychology on stereotyping (i.e., the willingness to induce properties about an individual based on their group identity alone; Allport, 1954; Hamilton & Trolier, 1986; Hamilton, 1981; Bar-Tal, Grauman, Kruglanski, & Stroebe, 1989). Much of this work has focused on the ways that social stereotypes may be linked to general categorization processes (Fiske & Taylor, 1991; Taylor, 1981; Asmore & Del Boca, 1981; McCauley & Stitt, 1978). Emphasis is placed on points in common among social and nonsocial categories, rather than those qualities that might set them apart. As a result, there has been a tendency to see all social categories as essentially alike (Turiel, 1983). For example, running through much of the literature on social prejudice is the assumption that all stereotypes have the same structural properties no matter whether the social category stereotyped is based on race, gender, occupation, sexual or political orientation, or other social status (Hamilton & Trolier, 1986). Little attention, accordingly, has been paid to possible differences in conceptualization that might result from variations in types of social groups encountered. Still, a few studies have addressed the ways in which human groups and individual humans may be representationally distinct. Wyer and Martin (1986), for example, have proposed that information about groups is stored independently of information about individual group members, particularly traits associated with the individual. Milner (1984) has suggested that different kinds of learning are involved in acquiring knowledge of individuals and groups. Adults appear to distinguish social properties (i.e., the attributes that are true of an actor) from social kinds (i.e., the sorts of actors there are; Hirschfeld, 1988), suggesting that the kind of group represented shapes the inferences one draws about members of different groups. For example, Rothbart (1981) recalls an anecdote of Sartre’s (1948) about a woman who disliked Jews because of her disagreeable encounters with Jewish furriers.
Sartre astutely wondered why she chose to hate Jews, not furriers. The explanation presumably is that it is more “natural” to hate Jews than furriers because prejudice adheres to the kinds of people there are (i.e., race) more readily than the kinds of activities in which people engage (i.e., occupation). According to most previous research, young children are incapable of drawing distinctions based on such abstract criteria. As observed earlier, standard accounts of social cognitive development have emphasized the integral role similarity in appearance is thought to play in the derivation and representation of a wide range of social categories. Young children are seen as externalists whose attention is drawn to surface properties and who rarely appeal to internal and hidden properties, causal principles, or psychological motives when explaining the behavior of humans and other living things (Piaget, 1951; Ross, 1981). Accordingly, it is generally accepted that young children do not distinguish social properties from social kinds, and in particular are thought not to model their explanation
of events on nonobvious social causes (for a review, see Hirschfeld, 1994). Perhaps the most detailed discussion of this process involves the young child’s notion of race, the topical focus of the following studies. Widely researched, the concept of race is among the earliest emerging of the preschool child’s social categories (Horowitz, 1939; Clark & Clark, 1940; Aboud, 1988; Katz, 1982). Cross-cultural and cross-national studies indicate that this sensitivity to race recurs across widely different systems of cultural belief (LeMaine, Ben Brika, & Bonnet, 1988; Vaughan, 1987; Hirschfeld, 1988). By most accounts, this stability in racial concept formation is attributable to a direct integration of perceptual experience: In building racial categories, young children are assumed to attend to “easily discernible” cues (Katz, 1982, p. 20) and group “people into categories on the basis of attributes perceived in common” (Vaughan, 1987, p. 91), including such indices of “concrete reality” as skin color (Clark & Clark, 1940, p. 168), and other “overwhelmingly” external attributes such as costume, cuisine, and language (Aboud, 1988, p. 106). Still, it is also widely accepted that the notion of race eventually comes to be embedded into an enriched conceptual system (adult understanding of race is clearly theory-like, involving essentialist expectations that go beyond the range of direct experience; Allport, 1954; Atran, 1990; Rothbart & Taylor, 1990). Thus, even though the extensions of adults’ and children’s racial categories largely overlap, the meaning of these categories is quite different (Aboud, 1988). The ability to categorize individuals racially without understanding the adult meaning of racial concepts is supposed to occur because the two tasks, categorization and interpretation, are thought to be the result of different processes.
Much as living things at the basic level “cry out to be named” (Berlin, 1992), both common folk belief and scientific tradition assume that the social environment visually imparts racial categories (Mosse, 1978; Banton, 1987). One reason that this account of racial concept formation enjoys such wide appeal is that it converges with other well-grounded arguments. For example, the close parallels between the adult notion of race on the one hand, and folk and scientific systems of biological belief on the other, have been frequently observed. A hallmark of both the racial and biological systems of belief is essentialist reasoning (Atran, 1990; Banton, 1987; Guillaumin, 1980; Allport, 1954). Viewing racial concepts as derived from unmediated observations of nonrandom gaps in human phenotype, accordingly, seems plausible because lay people and ethnobiologists alike argue that the folk-species concept is derived from unmediated observations of discontinuities in flora and fauna (Berlin, 1992; cf. Atran, 1990). Like acquiring knowledge about taxonomic differences among plants and animals (Johnson, Mervis, & Boster, 1992; Carey, 1985), coming to understand racial differences among human populations would initially turn on
similarity-based learning. Later, a more “theoretical” model would emerge in each domain. The analogy between folk biology and racial thinking, however, may not be apt. In contrast to work on folk biology that has uncovered surprisingly strong cross-cultural agreement in living kind classification (Atran, 1990; Berlin, 1992; Johnson et al., 1992), research from several traditions suggests that racial categorization is strikingly nonconvergent across cultures and historical epochs. For example, social historians and cultural anthropologists have shown that different systems of racial classification sort individuals of the “same” physical type quite differently (Harris, 1964; Stoler, 1992). Thus, despite the prevalence of the idea that racial types are fixed and derived from the way physical differences in the world are patterned, the notion of race itself varies considerably across time and across cultures. What salient racial differences are, and where their boundaries are drawn, depends on historical and cultural context. For example, during the early twentieth century, discussions of race in America were as (and perhaps more) likely to contrast Northern and Southern Europeans (and Protestants and Catholics) as blacks and whites. As Gordon (1989) observes, social workers’ “comments and expectations about [Southern European, Catholic] immigrants in this period were similar to views of black clients in the midtwentieth century” (p. 14), and racially mixed relationships referred to Protestant-Catholic, not black-white, unions. In contemporary American society, the same individual’s racial status can differ substantially inasmuch as each state has its own, often conflicting, rules for determining whether someone is black or white (Davis, 1991). Moreover, in the United States, a person is either black or white, but cannot be both.
The category of mixed-blood is not a viable cultural or legal option in spite of the frequency of mixed-racial unions (Hirschfeld, in press a; Molnar, 1992). In contrast, in many other cultural traditions (e.g., contemporary South Africa), mulatto populations constitute a third race (even when not legally defined as such; Stoler, 1992). Given this cultural and historical variability, it should not be surprising that most biologists now conclude that human races, as they have been culturally defined, simply do not correspond to interesting biological categories (Bodmer & Cavalli-Sforza, 1976; Gould, 1981). Although problems with fixing the referents of racial concepts do not force us to abandon the view that children learn about race by attending to physical differences, they do suggest that children would have to attend to something more than phenotypic gaps in the human population (and the way language labels these gaps) if the close correspondence in extension between adult and young children’s racial concepts is to be achieved. An obvious candidate would be parental teaching: children learn the culturally appropriate racial categories, even if the physical environment underdetermines them, because parents guide children’s attention. There is reason to believe, however, that attention to local environmental cues may also be insufficient. Several studies have shown that young children’s racial attitudes do not closely correspond to those of their parents, nor do children’s racial attitudes seem to respond readily to parental interventions directed at changing them (Branch & Newcombe, 1980, 1986; Hirschfeld, 1988). This does not, of course, rule out the possibility that young children’s racial categories are derived either from observation or through tuition from parental categories, but it does raise the possibility that children bring more to the task of learning about race than a willingness to categorize objects in the world or simply accept what their parents tell them. There are also cognitive questions raised by the view that racial concepts develop directly out of observations of physical difference. By stressing the integral role similarity in appearance is thought to play in the derivation of social categories, existing work on social cognitive development sets itself apart from recent research on common sense concept formation. Many researchers now believe that in a wide range of domains, early knowledge is organized through specialized faculties for understanding that direct the child’s attention to certain sorts of data and guide the child’s hypotheses about the nature of relations between category members. Often likened to scientific theories, several of these faculties have been empirically studied, including naive biology (Carey, 1985; Keil, 1989; S. Gelman, 1989), core beliefs about physical matter (Spelke, 1991; Baillargeon, 1992), theory of mind (Astington, Harris, & Olson, 1988), naive mathematics (Gelman & Gallistel, 1978), face recognition (Carey & Diamond, 1980), music (Lerdahl & Jackendoff, 1983), and syntactic knowledge of natural languages (Chomsky, 1986).
Of particular importance, these faculties provide the principles (a) for defining entities within a domain of knowledge, (b) for specifying operations on these entities, and (c) for governing causal reasoning about these entities (Carey, 1985; R. Gelman, 1990). With few exceptions (e.g., Turiel, 1983), however, social cognitivists have not seriously considered the possibility that the acquisition and representation of social categories may also be governed by a specialized faculty for understanding sensitive to deeper differences, and not merely appearances.
III. An Alternative Model of Social Development
What would a domain-specific account of social development in general, and of the notion of race in particular, look like? Such a model might predict that children develop racial categories in response to an impulse
to discover the sorts of humans there are rather than as an attempt to catalog physical differences among the humans they encounter. In trying to identify the human groups relevant to their cultural environment, and to discover the natures of their members, children would provide evidence of an early ontological curiosity. In earlier work, I have tried to show that young children do appear to believe that different principles of organization underlie the various common sense collectivities that govern social events. These include groups based on notions of common kinship (Hirschfeld, 1989a), language (Hirschfeld, 1989b), race, and occupation (Hirschfeld, 1988, 1993, 1994). Children, accordingly, might discover the kinds of groups there are in the world, not by attending to perceptual regularities in the environment, but by elaborating strategies for reasoning about difference derived from expectations about the nature of society and nonobvious social commonalities. Discovering the nature of society and social commonality, in turn, might involve discerning the principles for sorting the kinds of people there are in the world. Such a view suggests that young children’s social categories would be more richly structured than previous researchers have believed, so that social categories would be determined in significant measure by the folk theories in which they are embedded. In this regard, social and other object categories would be similar in that the development of both would be governed by domain-specific skeletal principles that guide children to parse the environment in terms of hidden and nonobvious commonalities among category members. The particular commonalities perceived, of course, would vary as a function of properties of the domain. In this case, the relevant structural property would be the specification of a social ontology and a discrimination of domain-specific principles of social causality. 
As already observed, the issue is not whether race is theory-like. All observers agree that adult belief about race is richly structured, and that children’s belief is not. Moreover, despite the considerable variation in racial systems of classification, there is marked universality in systems of racial theory. Adults the world over construe human racial variation in biological terms (as reflecting immutable and heritable anatomical differences; see van den Berghe, 1967; Banton, 1987). There is equally broad consensus about the source of this essentialist understanding. The conspicuous parallels (both cognitive and historical) between the species model of living kind difference and the racial model of human difference have frequently been noted (Guillaumin, 1980; Banton, 1987; Atran, 1990). The identification of the two realms is typically attributed to a transfer of biological principles of inference to the human domain (Rothbart & Taylor, 1990; Atran, 1990; Boyer, 1990).
The transfer of knowledge from better understood (or better grounded) domains to less understood (or less grounded) domains is of interest to a broad range of psychologists. Often discussed under the rubric of analogical transfer (Vosniadou & Ortony, 1989; Gentner & Stevens, 1983), such reasoning is thought to be particularly important in the development of knowledge in novel areas (Inagaki & Hatano, 1987; Carey, 1985). Strikingly, however, no convincing and general developmental account of the mechanisms governing the analogical transfer of knowledge, particularly across domains, has emerged (Brown, 1990). Moreover, several researchers have pointed out that the sorts of “natural” transfers, readily observed in normative contexts, are notoriously difficult to replicate experimentally (Novick, 1988; Resnick, 1994). This difficulty aside, cross-domain transfers do appear to occur. The spontaneous modeling of inferences about racial difference after principles of biological reasoning plausibly represents one of the best researched, normative instances of this sort of cross-domain knowledge transfer. The alternative, domain-specific model of race is not as committed as the standard view to knowledge transfer. On the standard model, racial thinking comes to encompass the causal principles and ontological commitments that lay folk use to reason about biological creatures in virtue of the perception of parallel morphological clustering: race is commonsensically seen as an analogue to species (more accurately, the folk species concept). In contrast, the alternative model predicts that a richly structured understanding of race develops before the visual aspect is fully elaborated. In fact, the naive theory enables the child to select the relevant physical differences to which to attend. Thus, according to the alternative model, a theory-like construal of race should be evident virtually from the outset.
On the standard model, perception shapes theory; on the alternative model, theory guides perception. Focusing on these contrasting predictions, the studies that follow explore in depth the derivation of young children’s beliefs about social difference.
IV. Race and Perceptual Information
A. EXPERIMENT 1: THE CONTRIBUTION OF PERCEPTUAL INFORMATION TO THE CONCEPTUAL SALIENCY OF COMMON SOCIAL CATEGORIES
The perceptual (or similarity-based) model of social category acquisition assumes that children attend to different categories as a function of their perceptual prominence. If a category has marked physical correlates, it
will be recognized more precociously than categories with less conspicuous attributes. If several categories are all similarly conspicuous, they should be equally salient conceptually. Existing studies provide some support for these claims. For example, young children learn racial categories (in which category membership is supposedly defined in terms of physical cues) before ethnic or religious ones (in which category membership is supposedly defined in terms of beliefs) (Aboud, 1987). Children’s sensitivity to racial, occupational (Blaske, 1984), and body build (Lerner, 1973) categories, which are all perceptually prominent social dimensions, emerges during the same early preschool period. These data, however, provide only weak support for the claim that all similarly conspicuous social categories have equal conceptual salience because previous studies have explored children’s understanding of each social dimension separately. Thus, it is not possible to tell whether two social dimensions, whose developmental courses closely parallel each other, are similarly salient. To explore this question, in a first experiment, I probed French preschoolers’ memories for social information as a way of assessing the conceptual importance of race relative to other perceptually prominent social descriptors, such as nonracial physical features, occupation, gender, and behavior. Social information was presented to children in a short narrative in which each character’s social affiliation and status were mentioned but were otherwise irrelevant to the story’s plot and structure. The story (adapted from a type that young children efficiently parse and readily recall) centers on a young protagonist whose goal is to buy a birthday present for his or her (depending on subject’s sex) mother. The story’s principal complication is that the child, in trying to find an appropriate store, encounters four different adults.
In story-grammar terms, the narrative consists of a setting, a complication, four developing episodes (each including a solicitation, a reaction, and an invitation to further action), and a resolution. Each adult character is described twice in terms of his or her race, occupation, gender, and a nonracial physical feature (stature, body build, or age). After hearing the test text, each subject was asked to freely recall the story. Following the recall, the experimenter reread the story to the subject and again solicited a free recall. The number of times subjects recalled the race, occupation, nonracial physical feature, or gender of each character was computed. Recall indices sensitive to each subject’s memory for the story were computed by dividing the frequency with which each social dimension was recalled by the number of story characters the child remembered (e.g., if a subject recalled three of the four characters, two of whose race was mentioned, the subject would get a race-recall score of .66, as race was mentioned two out of three times).
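The memory-adjusted recall index just described amounts to a simple ratio. As a minimal sketch only, it might be expressed as follows; the function name `recall_index`, its argument names, and the treatment of a child who recalls no characters at all are illustrative assumptions, not details reported for the original study.

```python
def recall_index(times_dimension_recalled: int, characters_recalled: int) -> float:
    """Proportion of the recalled characters for whom a given social
    dimension (e.g., race) was also recalled.

    Hypothetical helper: the chapter describes the ratio but not its
    edge cases, so the choices below are assumptions.
    """
    if times_dimension_recalled < 0 or characters_recalled < 0:
        raise ValueError("counts must be non-negative")
    if characters_recalled == 0:
        # Assumption: a child who recalls no characters scores 0.
        return 0.0
    if times_dimension_recalled > characters_recalled:
        raise ValueError("a dimension cannot be recalled for more characters than were recalled")
    return times_dimension_recalled / characters_recalled


# Worked example from the text: three of four characters recalled,
# race mentioned for two of them.
race_score = recall_index(times_dimension_recalled=2, characters_recalled=3)
print(round(race_score, 2))
```

For the worked example, the ratio is 2/3, which rounds to .67; the text reports the same quantity as .66.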
Conceptual importance was construed in terms of the relative availability of these social descriptors in children’s memory for a short narrative. Differences in subsequent recall of the social descriptors were accordingly attributable to differences in their conceptual importance. If children reasoned that all conspicuous social dimensions were equally important, then the rates of recall of the various dimensions should be quite similar. The pattern of the children’s responses, however, shows that the various social dimensions were not equally memorable. Subjects recalled the characters’ occupations (M = .29) significantly more than race (M = .18), gender (M = .15), or nonracial physical feature (M = .11). Contrary to what the standard view predicts, different social descriptors appear to be differentially salient. Also in contrast to standard predictions, race is not the most salient social dimension for preschoolers: Occupation, a social property usually seen as contingent rather than essential, was found to be more salient than race. Although children recalled more social information following the second reading than following the first, controlling for memory factors, by using a measure of recall sensitive to children’s overall memories of the story, also proved to be informative. There were no reliable between-age-group differences in rates of recall when indices that controlled for each child’s overall level of recall were used.¹ This finding is significant because previous work suggests that major changes occur in children’s social cognitions between 3 and 4 years of age. Awareness of race and other social properties supposedly emerges and becomes increasingly elaborated during the preschool years. The present findings, in contrast, indicate that it is children’s memory, not social awareness, which may be changing during this period (Table I). In sum, not all social categories are equally salient conceptually.
The prominence of a social category’s physical correlates does not predict its conceptual salience. Neither race nor gender is the most salient social descriptor for preschool children. Preschoolers’ social beliefs are both consistent and stable during this period of supposed change. These results are inconsistent with the standard account of young children’s social categorization, but fall short of allowing us to reject it. This experiment relies on verbal stimuli to probe the role of visual saliency and thus is not the strongest test of the contribution of surface cues to social category formation. It is possible, for example, that visual categories bring to mind the perceptual correlates of the categories more readily than do category labels by themselves. Thus, the perceptual-rich view would distinguish between performances involving the category labels alone and category labels directly linked with stimuli whose physical correlates are inescapably evident. To rule out this possibility, I conducted a second narrative recall experiment in which visual stimuli as well as verbal labels were used. If the conceptual saliency of racial categories is influenced by the sheer availability of physical exemplars, then the rates at which social information is recalled in Experiment 2 should exceed those in Experiment 1.

¹ For details of children’s performances as assessed using indices controlling for memory factors, see Hirschfeld, 1993.

TABLE I
EXPERIMENT 1: MEAN PROPORTION OF RECALL OF SOCIAL DESCRIPTORS FOR THE TWO AGE GROUPS

                              3-Year-olds (N = 33)           4-Year-olds (N = 31)
Social descriptor             First recall  Second recall    First recall  Second recall
Occupation                    .221          .302             .234          .392
Gender                        .118          .214             .130          .141
Race                          .108          .214             .148          .238
Nonracial physical feature    .086          .112             .092          .157

Note. From Hirschfeld (1993).

B. EXPERIMENT 2: VISUAL CUES AND THE RELATIVE CONCEPTUAL SALIENCE OF SOCIAL CATEGORIES
Subjects in Experiment 2 were also presented with characters whose social descriptions were irrelevant to the story’s plot and structure. In this instance, however, children were presented with a visual narrative paralleling the verbal narrative used in Experiment 1 and their memories of the story were probed. As in the first experiment, conceptual importance was construed in terms of the relative availability of social descriptors in children’s memories for the visual narrative. The effect that labeling had on memory for social information was examined by conducting the narrative recall task together with tasks that matched verbal labels for the social dimensions mentioned in the narrative recall task with their referents. One half of the subjects completed the verbal narrative recall task after participating in these additional tasks, the other half did the verbal narrative recall task before performing the additional tasks. If priming increases the association of label and knowledge of physical features, then children’s recall should improve under this order manipulation.
In the narrative recall experiment, subjects were shown a picture book containing 14 color-wash drawings. The plot line of the picture story paralleled that of the verbal narrative used in Experiment 1. In the picture story, the protagonist was a lost dog who encounters four adults in succession in trying to rejoin his master. As in Experiment 1, each adult character’s occupation, race, gender, and a nonracial physical feature were marked. Subjects were asked to look through the story book page by page. To ensure that subjects attended to the pictures, children were asked to describe each frame as they went through the book. Following this “reading,” the experimenter asked each subject to recall three things about the first person encountered by the lost dog. The procedure was repeated for all four characters. The number of times children recalled each of the social dimensions was calculated. These scores were transformed into proportions by dividing them by the total number of descriptions each subject offered. Again, if children reasoned that all conspicuous social dimensions were equally important, then the rates of recall of the various dimensions should be quite similar. As Table II shows, the rate of recall of gender (M = .27) was reliably higher than the rates of recall of occupation (M = .13), race (M = .07), and nonracial physical feature (M = .06), which were not significantly different from each other. Like the results of Experiment 1, these findings also lend little support to the claim that racial categories are rich in visual information. In particular, visual descriptions of racially marked stimuli did not elicit descriptions using racial labels. Strikingly, children were actually less likely to describe a social event in terms of a character’s race following a visual narrative than they were when recalling a verbal one. This was not true of all social categories.
Gender, for example, was better recalled following a visual narrative than a verbal one (though French provides more opportunities than English to underscore an individual’s gender). The results of Experiment 2, like those of Experiment 1, suggest that different social categories have distinct saliencies. Taken together, these findings are consistent with the alternative model I have proposed; namely, that a social category’s relevance is not simply a function of its perceptual properties. Specifically, racial categories do not appear to be initially rich in perceptual information. One potential problem with this interpretation is that race might be so obvious that children would not think to describe the pictured individuals in terms of it. Mentioning race would accordingly violate conversational rules about relevance. There are two reasons to reject this second interpretation. First, gender is an obvious way to describe people, yet this did not lead to a dispreference for gender descriptions on the task. Just the contrary; as Table II shows, gender is the most common descriptor recalled. Second, if race is taken as unreflectively shared knowledge (and thus there is no need to draw attention to it), we would expect race to be an especially memorable aspect of the described individual’s identity. Yet, subsequent questioning revealed that children were not particularly accurate at recalling whether the various ethnic groups had been depicted in the picture story. After completing the recall task, children were asked whether they recalled seeing individuals of each ethnic group in the story. Children’s memories for this information were generally poor, and their responses were no better than chance. If race is such an obvious feature that children do not find it worth mentioning, it is not clear why their recall would be so poor.

TABLE II
EXPERIMENT 2: MEAN PROPORTION OF RECALL OF SOCIAL DESCRIPTORS FOR THE TWO AGE GROUPS, VISUAL NARRATIVE TASK

                              3-Year-olds (N = 16)           4-Year-olds (N = 16)
Social descriptor             First recall  Second recall    First recall  Second recall
Gender                        .285          .231             .329          .223
Occupation                    .103          .113             .134          .173
Race                          .086          .072             .047          .075
Nonracial physical feature    .056          .044             .078          .068

Note. From Hirschfeld (1993).
1. Additional Measures
Two additional measures were given to the children participating in Experiment 2. The first assessed a subject’s ability to match physical depictions of occupations and races, figuring in the visual narrative, with verbal labels for these occupations and races. This label-to-picture pairing task served principally as a priming condition for the narrative recall task. Previous research has established that children of this age perform at ceiling in matching common occupational and racial labels to depictions of their referents, and our results replicated this. Accordingly, the results from this task will not be discussed here (see Hirschfeld, 1993, for the results and complete description of this task). A second task, match-to-sample, was included to help resolve a question left open from Experiment 1. That study established that children are more likely to recall a person’s occupation than race. The findings do not tell us if occupation is conceptually or perceptually more salient than race. The match-to-sample tasks address this issue by pitting race against occupation in perceptually based sortings. Subjects were shown three triads consisting of a target and two comparison pictures in which a person’s race, occupation, and gender were marked. Table III summarizes the three triads. Each child was asked to put the target drawing with the comparison picture to which it was most similar. In total, the three triads provided each subject with two opportunities to select race, occupation, or gender. The number of times each subject chose each social dimension was calculated. As Table IV shows, judgments did not differ from the chance expectation of 1. Both younger and older preschoolers displayed the same lack of preference for one dimension over the others, suggesting that they believe that individuals of the same race go together as readily as individuals of the same occupation, and as readily as individuals of the same gender. It is possible that this lack of preference is due to subjects’ failure to engage the task. As the results of Experiment 3 will show, however, this interpretation is not compelling: Given slightly different instructions (but essentially the same materials), children of the same age produce a quite different pattern of sorting. Thus, the perceptual salience of occupation is not substantially different from that of race or gender, so that differences in salience of occupation, relative to race and gender, uncovered in Experiments 1 and 2 must be attributed to conceptual, not perceptual factors.

TABLE III
MATCH-TO-SAMPLE ITEMS USED IN EXPERIMENT 2

Triad                           Target                    Comparison 1              Comparison 2
1st: Occupation versus race     Male black physician      Black female nurse        White female physician
2nd: Sex versus race            Black female nurse        White female physician    Black male physician
3rd: Occupation versus sex      White female physician    Black male physician      Black female nurse

TABLE IV
EXPERIMENT 2: MEAN NUMBER OF TIMES EACH DIMENSION WAS CHOSEN (OUT OF TWO)

Dimension       3-Year-olds (N = 16)    4-Year-olds (N = 16)
Sex             1.06 (.68)              0.94 (.77)
Occupation      1.00 (.54)              0.94 (.68)
Race            0.94 (.68)              1.12 (.72)

Note. Standard deviations are given in parentheses. From Hirschfeld (1993).

2. Priming
Results from the order manipulation lend additional support to the claim that racial categories are not initially rich with perceptual information. The narrative recall part of Experiment 2 was conducted under two conditions: a primed and an unprimed condition. In the primed condition, subjects first participated in the tasks described in the previous section. In the unprimed condition, subjects participated in these tasks after completing the picture book recall task. Recall that these tasks involved asking children to look at pictures portraying individuals of various races and occupations, and to match these pictures to the appropriate racial and occupational labels. If seeing a category exemplar (and hearing the verbal label for it) brings to mind the physical correlates of the category, then priming should increase the availability of this knowledge. Table V summarizes the results. Strikingly, priming has no significant effect on recall: Linking verbal labels with depictions of their referents on the pairing task did not facilitate the label’s retrieval when the child encountered similar representations of the same referent.
Having the child pair a picture of a black male with the label black, for example, did not increase the likelihood that subjects would subsequently refer to a picture of a black male with the term black. On the face of it, this is a curious finding. Previous research shows that preschool children can readily match such labels with pictorial (and other kinds of) exemplars (for race categories, see Aboud, 1988; Clark & Clark, 1940; for body build categories, Lerner, 1969, 1973; Lerner & Schroeder, 1971; for occupational categories, see Blaske, 1984; for a nonracial physical feature, see Pope Edwards, 1984). These performances have been interpreted as showing that perceptual properties are closely linked to the relevant verbal concepts. In fact, these previous studies establish only that some perceptual information is associated with the verbal categories; they do not specify the scope of that association.

TABLE V
EXPERIMENT 2: MEAN RATES OF RECALL FOR EACH SOCIAL DESCRIPTOR BY CONTROL CONDITION

Social descriptor               Primed    Unprimed
Gender
  First recall                  .237      .377
  Second recall                 .191      .265
Occupation
  First recall                  .126      .110
  Second recall                 .153      .133
Race
  First recall                  .074      .057
  Second recall                 .072      .076
Nonracial physical feature
  First recall                  .092      .043
  Second recall                 .062      .050

Note. Data from Hirschfeld (1993).

For example, recognizing that a verbal label and a particular perceptual cue go together implies either that (a) the perceptual information is directly represented in the concept such that observing the perceptual cue brings to mind a specific verbal label; (b) the perceptual information is contained in the verbal concept, although in a less direct way, such that the perceptual cues bring to mind the specific verbal label only under certain conditions (say, priming); or (c) only the fact that perceptual information is important (rather than any specific perceptual information per se) is represented in the concept. Hence, observing the perceptual cues brings to mind the verbal label only when attention is drawn to a relevant and perceptual contrast and the verbal label is provided. If the first relation (i.e., the verbal label is directly brought to mind by the perceptual cue) typified young children’s racial concepts, we would expect the results of Experiment 2 to parallel or exceed those of Experiment 1. At the least, race should be moderately salient in social description on both tasks. Clearly, neither of these predictions holds. If the second relation typified young children’s racial concepts (i.e., the label is brought to mind by the perceptual cues only under certain attention-directing conditions), then we would expect that the results of Experiment 2 would parallel those of Experiment 1 under the priming condition, but not under
the unprimed condition. This was not the case. Finally, if the third relation holds (i.e., the verbal label and the perceptual cues are associated only if the two are immediately conjoined and attention is drawn to the contrast), priming should have no necessary effect, and accordingly performance on the two tasks should not be related. This is the pattern obtained. These findings not only indicate how racial information is not represented, they help reveal the format under which racial information is initially represented. Preschoolers readily differentiate people on the basis of racially relevant perceptual cues. These same children also have racial categories: they recognize the existence of named, enduring groups of humans. Nearly all previous work on racial concept formation has assumed that these two phenomena are aspects of a single system. I argue that they represent two distinct, but overlapping conceptualizations. The systems articulate in two ways. First, perceptual and verbal categories overlap in development such that during early childhood a gradual calibration of the perceptually oriented and domain-oriented concepts is achieved. Previous research has confounded this evolving articulation of perceptual and racial categories by interpreting the first stages of this calibration as developmental changes in racial awareness per se. Second, racial concepts contain some perceptual information, but it is extremely fragmentary, specifying only that certain classes of perceptual differences are pertinent to group membership (young children probably realize that skin tone is a pertinent dimension, finger length is not).² My argument accordingly is not that perceptual input is irrelevant to emerging racial categories. As with other common sense categories, empirical regularities in the environment are recruited in the service of racial concept formation. I do suggest, however, that these empirical regularities greatly underdetermine racial concepts, so that in the process of building racial categories, obvious surface cues like skin color are not defining for young children. The results of Experiment 2 show that although young children are adept at matching racial labels to individuals differing on relevant physical dimensions, they do so only when both label and referent are simultaneously made available. When asked simply to describe a social event, race is not a particularly important dimension, so much so that children are not even sure if they have just encountered a story character of a certain race. Do these results imply that race is generally not relevant to young children? And, particularly, do these results suggest that race is poorly understood by young children? I turn to these questions in the next section.

² Skin tone is the easiest racially relevant perceptual feature to discriminate for preschoolers, although it is not the most salient perceptual cue in racial recognition (Sorce, 1979).
V. Do Children Have a Theory of Race?
Most previous work assumes that the development of racial and other social categories is characterized by low level computational processes involving attention to similarities in superficial appearances. Experiments 1 and 2 undermine confidence in this standard view by suggesting that children’s attention to social categories is not predicted by differences in the prominence of the category’s physical correlates. First, preschoolers do not find all conspicuous social categories to be equally salient. Second, preschoolers do not invest all social categories with the same amount of perceptual information, and thus presumably do not derive all social categories (most importantly race) from observations of physical difference. One interpretation of these findings is that children simply do not see race as a particularly important aspect of the social environment. Yet, observing that children do not find race an especially important way to describe social contexts does not mean that race is unimportant to explanations of social life. Studies showing that young children maintain strident racial prejudices, for example, certainly suggest that the potential of race to promote inference is quite powerful (Horowitz, 1939; Clark & Clark, 1940; Katz, 1982; Aboud, 1988). The studies in this section examine whether young children believe race to be a potent explanatory concept. In particular, they explore the possibility that young children’s expectations about nonobvious and abstract commonalities among members of social groups provide the context for development of children’s beliefs about racial and other social difference. If young children’s beliefs about race are not a function of their perception of regularities in outward appearance, as Experiments 1 and 2 suggest, it is plausible (in fact necessary) that these beliefs are derived from other features of the social environment.
One possibility is that race, although poorly understood in perceptual terms, is well understood in an ontological sense. According to this view, children would be better at thinking about race than attending to its physical qualities. Young children, in other words, conceptualize race more than see it. What would this conceptualization be like? First, as just noted, although we may not know what the conceptual model derives from, we do know that it eventually integrates with one feature of the social environment, namely, the adult model of race. As was also noted earlier, one striking aspect of all cultural models of race is the putative biological implications that race is supposed to have. Adults the world over believe humans are partitioned into “natural,” physically relevant, and innately determined groups. Biology provides the context and causal model in which race is understood, comprising a commitment to a particular ontology, a specific
pattern of explanatory principles, and a specified set of concepts. For race, the relevant ontology is the expectation that humans fall into several distinct physical types. Several causal principles are thought to govern the generation of these types. First, race is believed to be derived from physical (anatomical) differences. Second, physical properties differ in meaning, hence, they differentially contribute to racial identity (skin color is generally considered pertinent, stature is not). Third, racially relevant physical properties are thought to be immutable, not changing during one’s lifetime. Fourth, these properties are derived from family background, and are transmitted and fixed by birth. Young children, however, supposedly do not expect race to have any of the biological implications that adults impute to it (Aboud, 1987; Semaj, 1980). More generally, young children are thought not to understand that any sort of biological identity is immutable. Researchers have suggested that preschoolers fail to appreciate that most aspects of a person’s biological identity cannot change, including one’s race (Aboud, 1988), gender (Carey, 1985; Emmerich, Goldman, Kirsch, & Sharabany, 1977; Slaby & Frey, 1975), biological category (Keil, 1989), and personal identity (Guardo & Bohan, 1971). More specific to race, young children do not grasp that race is a biological property, derived from family background (Aboud, 1988) and fixed by birth (Solomon, Johnson, Zaitchik, & Carey, 1993). Children fail to understand these things because they supposedly overrely on outward appearances. Aboud & Skerry (1983) suggest that before 8 years of age, children do not distinguish the contributions that variable properties, such as ethnic clothing, and intrinsic ones, such as race, make to identity.
Similarly, Carey and Spelke (1994) propose that young children do not understand that family resemblances are due to biological transmission and are fixed at birth rather than attributable to social interaction and association. The findings from Experiments 1 and 2 force us to revisit the claim that young children have an atheoretical understanding of human variation: If young children do not rely on physical appearances in constructing racial categories, it is not obvious how they could overrely on them.
A. EXPERIMENT 3: RACE, MUTABILITY, AND FAMILY BACKGROUND
What meaning do young children give changes in appearance? Do children believe that an individual changes identity when gaining or losing a physical feature? Is such a change conceivable at all? For adult common sense, change of dress is not a change of identity, change of sex is. Change of race, in contrast, is not conceived of as a genuine possibility. In a series
of studies, I explored how children use their knowledge of various social categories in making judgments about preserved identity. If children do not see all social categories as equally salient, as Experiments 1 and 2 indicate, do they believe that different social categories make unequal contributions to identity? To test this, I asked American preschoolers and young school-age children living in a university town to make judgments about a person’s identity, using a format much like the one used in standard identity constancy studies. (The remaining studies reported in this chapter were conducted in the United States.) Three-, 4-, and 7-year-olds were shown a series of pictures, each portraying an adult of a specific race and body build, wearing occupationally relevant apparel (e.g., a stout, black police officer, or heavy, Hispanic nurse). The children were then shown a series of paired pictures portraying two children, each of whom shared two (of the three) social features (body build, race, and occupation) of the target picture. Each pair contrasted with the target picture on one social dimension (e.g., one pair consisted of a thin, black child wearing a police hat, toy gun, and whistle, and a plump, white child wearing a police hat, toy gun, and whistle), so that across the three pairs, all possible contrasts were presented. Subjects were then asked about possible identity relations between the target and companion pictures. Figure 1 illustrates one set of items (contrasting occupation and race). Most identity constancy studies ask children if the individuals in the target and (altered) companion pictures are the same person. The task I used had a somewhat different design. Children were asked whether the target and companion pictures depicted pairs of related individuals following a familiar transformation. Children are knowledgeable about two sets of changes that encompass preserved identity.
First, as people age they change in appearance, and children appear to have little difficulty reconciling such changes with a constant identity. Second, children appreciate that there are family resemblances, and that in spite of differences between the way individual family members look, they nonetheless maintain a certain identity relation (Hirschfeld, 1989a). I relied on this knowledge in designing the study. Two experimental conditions were used, an inheritance condition and a growth condition. In the inheritance condition, children were asked which of the contrast pair was the child of the target adult; in the growth condition, children were asked which of the contrast pair was a picture of the target adult as a child. If children are innocent of the underlying connection between growth and inheritance (and thus do not grasp the supposed biological basis of some social categories), then they should not infer that the patterns of change and stability in the two cases are much the same.
Fig. 1. Experiment 3. Sample items, male series.
Furthermore, if children focus only on changes in physical appearance when making identity judgments, modifications in body build should be as likely to signal a change in identity as a change in skin color. Yet, as Fig. 2 shows, although the effect was strongest for the oldest age group, children of all ages reliably expected that race, but not body build, preserved identity in the face of dramatic changes in appearance. One prediction of the standard view is that preschoolers should not distinguish between the contribution corporeal features (like skin color) or noncorporeal
ones (like clothing) make to identity. Springer & Keil (1989, 1991) propose a more moderate version of this hypothesis according to which some, but not all, conceptually enriched distinctions might be drawn (e.g., social and behavioral properties would be jointly contrasted with physical and biological properties). On their view, finer-grained conceptually enriched contrasts (say, between qualities that are relevant to collectivities vs. those that are not) would not influence young children’s judgments of identity. Yet, here too, results from my study challenge this characterization of young children’s social beliefs. As Fig. 2 indicates, by 4 years of age, children distinguish the contribution race, as opposed to other salient (but collectivity irrelevant) properties like occupation, makes to identity. Strikingly, as Table VI indicates, there were no differences in children’s judgments about growth and judgments about inheritance. It is worth considering the implication of this finding. Most studies of identity constancy ask children to assess the possibility or consequences of changes in characteristic but nonessential cues (dress) versus changes in characteristic but essential features (skin color). Implicit in such contrasts is the notion that inessential features are noncorporeal, whereas essential ones are literally embodied. Although these tasks all use familiar properties, they do not all involve familiar transformations. Children (or adults presumably) typically do not witness abrupt and major changes in a person’s
Fig. 2. Experiment 3. Mean number of choices (out of two) by age group and type of comparison (race over body build, occupation over body build, race over occupation) for 3-year-olds (N = 25), 4-year-olds (N = 29), and 7-year-olds. Note: From Hirschfeld (in press b).
TABLE VI
EXPERIMENT 3: MEAN NUMBER OF CHOICES (OUT OF TWO) BY QUESTION AND TYPE OF COMPARISON

                                           Question
Comparison                    Inheritance (N = 40)    Growth (N = 38)
Race over body build          1.62                    1.62
Race over occupation          1.23                    1.55
Occupation over body build    1.10                    1.20

Note. Data from Hirschfeld (in press b).
intrinsic physical state or presentation of self. Accordingly, asking children to determine whether someone’s identity remains the same under several different meaningful but unfamiliar changes in appearance may confuse young subjects because it is not clear whether the pretransformed and posttransformed individual is supposed to be (as opposed to could be) the same individual (Bem, 1989). As observed earlier, children do have considerable experience with, and knowledge of, natural transformations whose scope is as dramatic as those used in most identity studies: Children are familiar with major physical and behavioral changes occurring across the life span (i.e., in the context of growth) and across generations (i.e., in the context of inheritance). Young children appear to understand that these natural transformations are both lawful and nonrandom, and this understanding apparently involves domain-specific knowledge (Rosengren, Gelman, Kalish, & McCormick, 1991; Springer and Keil, 1989; Keil, 1989). For adults, these two sorts of resemblance (or canonical change) follow from the same source, biological relatedness. There is no a priori reason, however, to assume that young children make the same identification. The finding from this study that children do identify the preservation of features in growth with the preservation of features across generations suggests that children view this preserved identity in both circumstances as resulting from the person’s intrinsic nature, in that by definition it is this intrinsic quality that is constant over both growth and inheritance. Young children, thus, appear to have a biological understanding of these social properties. In short, and in contrast to previous studies, I found that children do not consider all physical properties of a person to be equally informative of their identity, and by extension, equally resistant to modification. If
children were focusing only on changes in corporeal appearance when making judgments about identity, they should find modifications in skin color as likely to signal a change in identity as changes in body build. Clearly this is not the case. It is important to keep in mind that body build is informative of individual identity and racial affiliation (Molnar, 1992). Moreover, variation in body build is attention-demanding for young children, and has inferential potential in the sense that stereotyping is predicated on it (Lerner, 1973). Body build is not a superficial property to either the biologist or the child. Yet it is consistently seen as irrelevant to identity by even 3-year-olds. In contrast to earlier reports, by 4 years of age children find race to be more critical to identity than either occupation or body build. Finally, the congruence in performance in the growth and inheritance conditions points to a general biological understanding of essential traits.
B. EXPERIMENT 4: BIRTH AND THE REPRODUCTION OF RACE
The results of Experiment 3 indicate that even young preschoolers consider a person’s racial identity to be immutable, related to family background, and derived from some physical properties but not others. These results paint a picture of children’s social inferencing that is quite different from that portrayed in most standard treatments. First, young children go beyond superficial reasoning when thinking about social difference. Second, this pattern of racial inference closely resembles expectations children have about nonhuman living things. How deep is this parallel? A more fully biological understanding of inheritance contains two components, namely, a belief in family resemblance and an understanding that the mechanism underlying this resemblance involves birth (Carey & Spelke, 1994). Several studies suggest that preschoolers indeed have such a grasp of inheritance, at least for nonhuman living things. Springer and Keil (1989) showed that preschool children expect offspring to resemble their parents. Gelman and Wellman (1991) found that preschoolers expect living kinds to possess an intrinsic potential that underlies expectations for continuity over the changes that occur during growth. Results from Experiment 3’s inheritance condition suggest that children believe that children racially resemble their parents, and results from the growth condition suggest that preschoolers expect racial continuity during growth. But these results fall short of confirming that children believe that race is fully heritable, at least in the adult sense of heritable. As Carey and Spelke (1994) caution, Gelman and Wellman’s and Springer and Keil’s results do not fully support the conclusion that young children have a biological understanding of inheritance, even for nonhuman living things.
Children, for instance, may attribute family resemblance to social, not biological causes. Based on a switched-at-birth study, Solomon et al. (1993) reason that children’s expectations about inheritance are far less adultlike than Springer and Keil, Gelman and Wellman, or the results of Experiment 3 suggest. Their study indicates that preschool children did not expect a child to share many physical properties (including race) with his or her birth parents. Solomon et al.’s contention has two parts: first, children younger than 7 do not understand that child-parent resemblances are mediated by mechanisms of biological reproduction. Second, young children fail to differentiate between the heritability of psychological and biological properties. Considering these conflicting findings, I conducted a study to reexamine whether preschoolers’ beliefs about race include a more fully biological notion of inheritance. Four- and 5-year-olds heard a set of stories about two couples, one black (Mr. and Mrs. Smith) and the other white (Mr. and Mrs. Jones). As they listened to the story, they were introduced to drawings depicting each couple. The first story was about a white couple who adopt the infant of a black couple:
These two people, Mr. and Mrs. Smith, had a baby girl [subject is shown a picture of a black couple]. That means that the baby came out of Mrs. Smith’s tummy. Right after it came out of her tummy, the baby went to live with these people, Mr. and Mrs. Jones [subject is shown a picture of a white couple]. The baby lived with them and Mr. and Mrs. Jones took care of her. They fed her, bought her clothes, and hugged her and kissed her when she was sad.
Children were then shown two pictures depicting a white school-age child and a black school-age child and asked which of these girls was the infant grown into school age. The second story was identical to the first except that the couples were reversed, so that the black couple adopt the white couple’s infant. If children’s responses are not governed by an understanding of the mediating role of birth in fixing family resemblances, they should reason that the school-age child will racially resemble the adoptive parents. Conversely, if they grasp the role of birth, they should expect the school-age child to racially resemble the birth parents. Children’s responses were scored in terms of how often they answered in accord with the “nature” or “nurture” hypotheses. Scores were summed across the two stories, and children were credited with a nature bias only if they selected a child of the biological parents’ race on both items. Similarly, they were scored as reasoning in accord with a nurture bias only if they selected a child of the adoptive parents’ race on both items. As Fig. 3 shows, both older and younger subjects clearly favored the “nature” hypothesis,
Fig. 3. Experiment 4. Mean percentage “nature,” “nurture,” and inconsistent choices by age group: 3-year-olds (N = 15), 4-year-olds (N = 18), 5-year-olds (N = 14).
overwhelmingly choosing the children whose race matched the birth parents on both items. Children’s justifications were informative: 34% of the 5-year-olds justified their responses in terms of the birth parents’ skin or hair color, 21% of their justifications cited experiential, emotional, or apparel associations (“I like her better,” “she’s wearing the same outfit,” etc.), 9% made reference to the adoptive parents’ skin color, and 34% of the children offered no justification. Younger children’s justifications were very similar: 37% of the 4-year-olds justified their responses in terms of the birth parents’ skin or hair color, 30% in terms of experiential, emotional, or other associations, and 33% of the children offered no justification. In contrast to Solomon et al.’s (1993) findings, both 4- and 5-year-olds expect that identity-relevant physical properties such as skin color are fixed at birth. Neither age group believes that family members physically resemble each other because of shared life experiences. Preschoolers are thus essentialists with respect to race, believing that something causes members of a racial category to develop in a certain way, no matter what the environment in which members are raised. We can plausibly attribute children’s reasoning about the origins of racial differences to an expectation of some material, but nonobvious link between biological parents and their offspring. Following Gelman and Wellman (1991), I interpret this nonobvious link as a belief in an intrinsic essence shared by category members. In this case, the relevant category membership involves the
notion of family under a biological interpretation. The adultlike scope of this belief is underscored if we recall that in Experiment 3, young children endorsed a single set of principles governing identity relations during growth and within a family.
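The scoring rule described for Experiment 4 can be made concrete with a short sketch. This is an illustration only, not code or data from the study: the function names, the string encoding of choices, and the example responses are all hypothetical. It assumes that each child makes one forced choice per story and that the birth parents are black in the first story and white in the second, as the stories are described above.

```python
from collections import Counter

# Assumed encoding: race of the birth parents in story 1 and story 2,
# following the description of the two adoption stories in the text.
BIRTH_PARENT_RACE = ("black", "white")


def classify_child(choices, birth_races=BIRTH_PARENT_RACE):
    """Return 'nature', 'nurture', or 'inconsistent' for one child.

    choices[i] is the race of the school-age picture chosen on story i.
    """
    matches = sum(choice == birth for choice, birth in zip(choices, birth_races))
    if matches == len(birth_races):
        return "nature"        # matched the birth parents on both items
    if matches == 0:
        return "nurture"       # matched the adoptive parents on both items
    return "inconsistent"      # one choice of each kind


def response_proportions(all_choices):
    """Proportion of children falling into each response category."""
    labels = Counter(classify_child(c) for c in all_choices)
    n = len(all_choices)
    return {k: labels[k] / n for k in ("nature", "nurture", "inconsistent")}


# Made-up responses for four children (NOT data from the study).
example = [("black", "white"), ("black", "white"),
           ("white", "black"), ("black", "black")]
print(response_proportions(example))
```

Because a child is credited with a bias only when both choices agree with the same hypothesis, a child who guesses on either item falls into the inconsistent category, which makes the nature and nurture scores conservative.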
C. EXPERIMENT 5: RICH STRUCTURE AND NONRACIAL SOCIAL CATEGORIES
Taken together, the results of Experiments 3 and 4 suggest that preschool children have a more richly structured and adultlike grasp of race than earlier researchers imagined. Still, there are interesting differences between children’s and adult models of social difference. Paradoxically, these differences may not turn on children having a less theory-like model of social difference, but one that is more theory-like than that of adults. For adults, occupation is a functional (or artifact-like) category and not a theory-laden one. Results of Experiment 3 raise the possibility that children in the youngest age group invest occupation with deeper importance than adults would. Recall that in that study, 4-year-olds not only found race more relevant to identity than body build, they also found it more informative than occupational apparel. Three-year-olds also reasoned that race was more informative of identity than body build, but unlike 4-year-olds, the younger preschoolers did not distinguish between the contributions race and occupational apparel make to identity. Two alternative interpretations come to mind. A first possibility is to attribute the difference in performance of younger preschoolers (who did not find race more important than occupation in judgments of identity) and older preschoolers (who did) to a developmental shift from a reliance on appearances to a reliance on a deeper understanding of social difference. Both groups appear to understand that the meaning of stable physical features (such as skin color and body build) is not the same, but only the older group expect that meaningful corporeal features are generally more important than sartorial differences. Thus, older but not younger preschoolers recognize a further level of contrast between the features.
Three-year-olds, by distinguishing between the role race and body build play in identity, show that they discriminate attributes in terms of the relevance they have in human variation. In contrast, 4-year-olds, by distinguishing the role that corporeal features play relative to variable properties like costume, discriminate the relevance different classes of features have for understanding identity. A second possible interpretation is that the youngest subjects in Experiment 3 may give apparel such weight not because they invest clothing with special meaning, but because they invest occupation with a greater
measure of significance. This is plausible for several reasons. Occupation is sartorially marked but it is not simply a category of clothing. As Experiments 1 and 2 showed, occupation is attention-demanding for children of this age, in part because it is an explanatory frame of reference: Occupation involves expectations about goal-directedness, areas of specific expertise, and habitual patterns of behavior (Hirschfeld, 1993). Young children who rely on occupational cues in judgments of identity may do so because of their sensitivity to function and teleological understanding. This interpretation suggests that the 3-year-olds’ performance may not reflect low level computational processes at all. Rather, the youngest children in Experiment 3 may be using an explanation-based inference strategy in thinking about social difference. In contrast to the strategy they use to reason about race (in which biological commonalities are thought to underlie category membership), here, children’s expectations may be derived from judgments about goals and habitual patterns of utilitarian behavior. Experiment 3, however, does not provide unambiguous evidence for this interpretation. Experiment 5 addresses this question directly. This study was similar to Experiment 3 in most respects except that instead of contrasting the relative contribution race, body build, and occupation make to identity, Experiment 5 explored the contribution that shared clothing color versus common occupation makes to identity. Two triad sets were used; the first consisted of a target picture of a boy dressed in yellow clothes and wearing a firefighter’s outfit and two comparison pictures, one depicting an adult firefighter dressed in red, the other portraying an adult in yellow civilian clothing. 
The second triad set consisted of a target picture that portrayed a girl wearing a pink dress and a waitress’s apron and cap, and comparison pictures depicted an adult waitress wearing a blue dress and an adult unmarked for occupation wearing a pink dress. Children were asked to choose which of the comparison pictures was a picture of the child as an adult. The dependent measure of interest was the number of times subjects selected shared occupation over shared clothing color. In general, children did not find occupation more salient to identity than shared clothing color, but separate analyses of each triad set, summarized in Fig. 4, revealed that occupation was significantly more relevant to the male items’ identity (M = 1.20, out of 2) than to the female (M = .81, out of 2). Occupation choices for the female items were significantly below the chance expectation of 1. Consistent with the findings of Experiment 3 (that 3-year-olds expect occupation to be as identity-relevant as race), occupation choices for the male items were reliably above chance for the 3-year-olds (M = 1.36). Overall, occupation choices on the male items tended toward but were not significantly above chance (M = 1.18).
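The chance expectation of 1 cited above can be illustrated with a small calculation. The sketch below is hypothetical and rests on an assumption the text does not spell out: that each score out of 2 is the sum of two independent forced choices between two pictures, each made at random with probability 0.5. The function names are illustrative, and no significance test from the study is reproduced.

```python
from math import comb


def guessing_distribution(n_items=2, p_choice=0.5):
    """Binomial distribution of scores if every choice is a coin flip."""
    return {
        k: comb(n_items, k) * p_choice**k * (1 - p_choice) ** (n_items - k)
        for k in range(n_items + 1)
    }


def expected_score(distribution):
    """Mean score implied by a score -> probability mapping."""
    return sum(score * prob for score, prob in distribution.items())


dist = guessing_distribution()
print(dist)                  # probability of scoring 0, 1, or 2 by guessing
print(expected_score(dist))  # chance expectation out of 2
```

Under these assumptions a guessing child averages 1 occupation choice out of 2, which is the baseline against which the reported means (e.g., M = 1.36 and M = .81) are compared.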
Fig. 4. Experiment 5. Mean number of occupation choices for each item set (male items, female items) by age group: 3-year-olds (N = 11), 4-year-olds (N = 15), 7-year-olds (N = 12).
Two interpretations of this item effect seem plausible. One possibility is that the male stimulus materials may have better conveyed the relevant cues about occupation than the female stimulus materials. A second possibility is that children consider occupational apparel more meaningful than clothing when drawing inferences about a male’s identity, but less meaningful when inferring a female’s identity. This interpretation would accordingly suggest that young children do discriminate between the contributions different kinds of apparel make to identity; it simply indicates that this expectation is complex. One reason to favor this second interpretation is that it is consistent with other work on children’s occupational stereotypes on the one hand, and their beliefs about gender status and role differences on the other (Blaske, 1984; Gettys & Cann, 1981; Cordua, McGraw, & Drabman, 1979). These studies indicate that young children find occupation to be more closely associated with males, whereas the home and child care are more closely associated with females. The results of Experiment 5 suggest that these beliefs not only condition children’s stereotypes, but also shape the inferences they draw about the contributions occupation and gender make to identity. Even if children’s beliefs about social difference are consistent with gender stereotypes widely, and redundantly, available in the environment,
the question remains: Why would children develop a more theory-like set of expectations about a social category than the adult model warrants? Why, in short, would young children come to believe that occupation captures something deeper about an individual than adults accept? I can only speculate, but imagine that this has something to do with the function these social categories play in young children’s thought. The results of Experiments 1, 2, and 3 suggest that race and occupation have quite different saliencies, and different meanings, for young children. Social descriptions are purposeful. Like all category-based reasoning, they allow the child to pick out relevant aspects of the environment and generalize over them. In the case of social categories, they allow the child to identify types of actors and appropriate explanations for their actions. Occupation categories are particularly salient because occupations represent recurring and purposeful behaviors (the police protect people, doctors cure them when they are ill, letter carriers deliver the mail, etc.). Occupation is a behaviorally relevant collectivity. The concept of race, in contrast, performs a quite different job. The notion of race is part of the child’s expanding social ontology; it is an early step in cataloging and discovering the relevant human groups. The fact that young children appear more concerned with developing a conceptual vocabulary for racial variation than a catalog of physical differences suggests that this impulse to find racial types involves expectations about global kinds of things, rather than the differentiation of specific ones (Hirschfeld, 1988, 1993; Mandler, Bauer, & McDonough, 1991). In the early stages of social category development, children may be unsure whether a category is a behaviorally relevant or kind-relevant one.
Or they may be in the throes of determining what the basis of kindhood is, so that behaviorally based categories are construed as the template against which kinds of people are measured. It is possible that this is why they come to momentarily invest occupation with the rich structure that they do. Further research is needed to decide these issues.
VI. Racial Thinking and Folk Biology: Areas of Divergence
Experiments 3 and 4 provide considerable support for the claim that young children have both a theory- and adultlike understanding of race. Like adults, the framework that children use to organize their beliefs about race is an essentialist one. Essentialist reasoning is most clearly elaborated in common sense biology (Atran, 1990; Keil, 1989; Gelman & Coley,
1990), but is not limited to biological thinking (Medin, 1989). Experiments 6 and 7 illustrate this by exploring one area in which biological and nonbiological essentialist reasoning do not converge. Essences function, among other ways, to model expectations about category identity. For example, Gelman & Wellman (1991) provide evidence that young children expect nonhuman living kinds to have an innate potential that governs subsequent development so that biological category identity is maintained despite variations in the environment or vagaries in the organism’s initial state. Children believe that tigers grow to be large, fierce, and capable of growling even though as cubs they are small, helpless, and purring. Children also expect tiger cubs to become tigers even if they are reared by lions. Adults explain this continuity across changes in behavior and appearance by appealing to the tiger’s underlying essence. Adults expect human races to have innate potential as well. The notion of racial potential, however, is recruited to serve a somewhat different purpose than the one studied by Gelman and Wellman. Unlike folk beliefs about nonhuman biological categories, common sense recognizes that members of different sorts of social categories are interfertile: doctors can mate with lawyers, blacks with whites, friendly people with grumpy ones. Innate racial potential comes into play as a way of determining the social category identity of offspring of mixed racial unions. According to both American folk belief and formal law, a “small” amount of one race’s “blood” renders one a member of that race, whereas a substantially greater proportion of another race’s “blood” does not affect category identity. In short, adults in the United States expect that some races have greater innate potential than others, in that they expect some races to contribute “more” to the physical identity of future generations than others (Davis, 1991; Molnar, 1992). 
Although racist construals of this are familiar (Nazism, the Ku Klux Klan, etc.), this expectation is restricted neither to fanatics nor to marginal belief systems. The taxonomic corollary is codified in many state laws about racial status. [In Louisiana, e.g., a person is black if one of his or her great-grandparents was classified as black; thus, 1/16 black “blood” has more innate potential than 15/16 white “blood” (Ebony, 1983).] It is the method used by the United States Census Bureau for determining the race of children of interracial families (Molnar, 1992).³ Given the wide distribution of the notion of innate racial potential (see Harris, 1964), it is of interest to know how and when children come to believe this aspect of the cultural model of race.
³ The one-drop rule derives from racist ideology, but not all adherents to it are by definition racist. See Footnote 5.
A. EXPERIMENT 6: CHILDREN’S BELIEFS ABOUT THE INNATE POTENTIAL OF HUMAN RACE
Understanding of innate racial potential was assessed by asking children to judge what the offspring of racially mixed couples would look like.⁴ Two groups of school-age children (7- to 8-year-olds and 11- to 12-year-olds) were shown pictures of four different couples one at a time, each holding an infant whose face and body were obscured. The couples were composed of either a black male and black female, a white male and a white female, a black male and a white female, or a white male and a black female. Each child was then shown pictures of three infants (a black infant, a white infant, and an infant that was intermediate between the black and white infants in terms of skin color, hair color, and hair quality), and asked which of the infants was the child of the target couple. All children judged that the black couple would have a black infant and the white couple a white infant. The contrast of interest involved the interracial couples. If children have no notion of racial potential (i.e., no strategy for determining heritability of racially relevant physical properties in ambiguous cases), they should choose at random. If children believe that offspring racially resemble both parents (i.e., both races have the same amount of potential), they should select the intermediate child. If children believe that whites possess more potential for determining identity-relevant properties than blacks, they should select the white child. If they believe that blacks have more potential than whites, they should select the black infant. As Fig. 5 shows, younger subjects showed no clear preference for any outcome. They believed that it was equally likely that the offspring of a racially mixed couple would have black, white, or intermediate colored skin, indicating that they have no beliefs about racial potential. Older children, in contrast, overwhelmingly chose the black baby on both mixed-race pairs.
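The four predictions above, and the comparison against random choice, can be illustrated with a short sketch. The chapter does not say which statistical test was used, so the exact binomial test below is an assumption made purely for illustration. The names `PREDICTIONS`, `CHANCE`, and `binomial_tail_at_least` are hypothetical, and the test treats every choice as an independent guess, which is a simplification because each child judged two mixed-race couples.

```python
from math import comb

# Hypothetical encoding of the four predictions for a mixed-race couple.
# Each tuple is the expected probability of choosing the
# (white, intermediate, black) infant under that belief.
PREDICTIONS = {
    "no notion of racial potential": (1 / 3, 1 / 3, 1 / 3),
    "equal potential": (0.0, 1.0, 0.0),
    "greater white potential": (1.0, 0.0, 0.0),
    "greater black potential": (0.0, 0.0, 1.0),
}

CHANCE = 1 / 3  # three infants to choose from


def binomial_tail_at_least(k: int, n: int, p: float = CHANCE) -> float:
    """Probability of k or more choices of one infant in n trials,
    assuming each trial is an independent guess with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

As a usage example with made-up counts (not data from Experiment 6), `binomial_tail_at_least(30, 40)` gives the probability that 30 or more of 40 choices would fall on a single infant by guessing alone; a very small value would count against the random-choice prediction.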
[Figure 5: bar chart; legend: white, intermediate, black; age groups: 7-8 years (N = 26), 11-12 years (N = 20).]
Fig. 5. Experiment 6. Mean proportion black, white, and intermediate infants were chosen by age group.

⁴ This line of inquiry came out of discussions with Dan Sperber, whose suggestions I gratefully acknowledge.

Older children, accordingly, appear to have learned the culturally dominant model of innate social potential, believing that blacks have greater racial potential than whites. These results suggest that by early adolescence, children come to believe that each race has a distinct racial potential. Still, the data fall short of confirming this because it is not clear whether children are making inferences specifically about race or whether their judgments are about the inheritance of physical properties generally. For instance, children might expect darker colored properties to prevail over lighter colored properties.

To rule out this possibility, the mixed-race task was combined with a mixed hair color task. Children were shown, one at a time, pictures of couples whose hair color was either the same or different (blond mother and father, brunette mother and father, blond mother and brunette father, brunette mother and blond father). After viewing each adult pair, subjects were shown three infants (a blond infant, a brunette infant, and a red-haired infant) and asked which was the child of the adult pair. If children’s answers in the first task were based on a belief that darker colors dominate in inheritance over lighter colors, their inferences about hair and skin color should be largely the same. If children were reasoning specifically about race in the first task, their inferences about a socially less relevant property, hair color, should differ from those about a socially relevant property, skin color. As Table VII shows, younger children’s expectations about hair and skin color inheritance do not reliably differ; in both cases, they showed no preference for a particular outcome. In contrast to their judgments about skin color, older children displayed no preference for any one outcome on the hair color task. That is, children in both age groups expected children of mixed hair color unions to be as likely to be blond, brunette, or redheaded.

TABLE VII
EXPERIMENT 6: MEAN PERCENTAGE EACH COLOR WAS CHOSEN BY AGE GROUP, HUMAN HAIR COLOR TASK

Hair color    7-8 Years (N = 19)    11-12 Years (N = 19)
Blond         34.2                  26.5
Red           44.7                  29.4
Brunette      21.0                  44.1

It is important to keep in mind that neither set of judgments is attributable to experience, inasmuch as neither corresponds to the biology of the phenomena. Older children’s belief that a child with one black and one white parent will have black features is not derived from direct observation of interracial children because there is no genetic dominance for skin color (Bodmer & Cavalli-Sforza, 1976). Children whose parents have markedly different skin color tend to have skin color that is intermediate between that of each parent (Byard, 1981). On the other hand, older children’s belief that a child with one blond and one brunette parent will have red (or intermediate) hair color is not an induction from experience because darker hair tends to be dominant over lighter hair (Robins, 1991).

If experience does not account for children’s judgments, what causes older children to believe that the children of interracial couples will have dark skin? In the United States, mixed-race children are considered black regardless of their physical appearance (Molnar, 1992). A considerable literature in psychology has demonstrated that even quite young children are sensitive to the politics of race (Cross, 1991). Thus, older children’s expectation that darker skin predominates over lighter skin appears to be shaped by knowledge of the social relations underlying an identity-relevant physical property, whereas their expectation that hair color is randomly inherited reflects a judgment about the lack of social relationship underlying a non-identity-relevant physical property. Children appear to be reasoning from social relations to biological principle. That is, biology provides the context for modeling these inferences, but the notion of race provides the principles that ultimately shape these judgments.
B. EXPERIMENT 7: CHILDREN’S BELIEF ABOUT THE INNATE SURFACE COLOR POTENTIAL OF NONHUMAN
The results of Experiment 6 show that during the late elementary school years children come to hold the adult model of innate racial potential, expecting that a child with one black parent is physically black. These
findings cannot be attributed to a general strategy for predicting the inheritance of surface color properties because the same children do not believe that hair color has the same innate potential as skin color. I interpreted this as evidence that children base their inferences on the distinction between socially relevant properties (i.e., skin color) and socially less relevant properties (i.e., hair color). There is one potential problem with this interpretation. Eleven-year-olds may simply have different rules governing the inheritance of skin color and the inheritance of hair color, without regard to whether one is socially relevant and the other less so. Although such a belief would have implications for the formation of later stereotypes, it would not be derived from prejudice itself. To explore this possibility, I conducted a second study that paralleled Experiment 6 in most respects, except that it used animals, not humans, as stimulus materials. Four different animal pairs were used; two consisted of animals whose surface color resulted from their skin color (alligators and elephants), and two whose surface color resulted from their hair color (bears and camels). Otherwise, the procedure was unchanged. Children were shown pictures of a mixed-color pair and told that they were the parents of a baby animal. Subjects were then shown pictures of three baby animals (one dark, one light, one intermediate) and asked to identify which one was the couple’s baby. Hair and skin color are both socially irrelevant for animals (at least for the species chosen). Thus, if subjects in Experiment 6 reasoned about skin color rather than race, per se, then their judgments should be the same, whatever the biological species in question. If, in contrast, subjects in Experiment 6 reasoned specifically about race, a uniquely human phenomenon, then their judgments about the heritability of skin color for humans and nonhuman animals should differ. Table VIII summarizes the findings.
TABLE VIII
EXPERIMENT 7: MEAN PERCENTAGE EACH COLOR OUTCOME WAS CHOSEN BY AGE GROUP, ANIMAL SKIN AND HAIR TASKS

              7-8 Years (N = 19)        11-12 Years (N = 19)
              Light   Blend   Dark      Light   Blend   Dark
Skin color    .336    .327    .336      .200a   .625b   .175a
Hair color    .356    .298    .346      .225    .515c   .200a

a Significantly below chance at p < .05.
b Significantly above chance at p < .01.
c Significantly below chance at p < .01.

In contrast to their expectations about human stimuli, children’s judgments about animal skin color and hair color did not differ, indicating that grade schoolers believe that the principles governing the inheritance of an animal’s surface color are the same whether the color is derived from the tint of the animal’s skin or hair. As with Experiment 6, older and younger children’s judgments differed. Younger children showed, as they did in Experiment 6, no preference for one outcome over another. Older children, however, significantly preferred a single outcome, as they did in Experiment 6. But unlike their judgments about human skin color, in which they inferred that darker skin color dominates lighter shades, older children consistently believed that the offspring of animals of different colors would be an intermediate blend of their parents’ colors.

Experiments 6 and 7 demonstrate that older school-age children expect different races to have distinct innate potentials, in that the physical features associated with one race are thought to dominate over the physical features associated with other races. The notion is specific to beliefs about human race, and does not reflect a general strategy for understanding the biological inheritance of color-related physical properties. Thus, by early adolescence, children have developed the cognitive underpinnings of an important aspect of racism, namely, that a minority race’s innate potential “contaminates” or “overpowers” that of majority culture. Remarkably, this expectation, a subtle but important aspect of racism, develops in a liberal college community where children may well have not received direct tuition in calculating the innate potential of various races. These inferences cannot be attributed to experience, inasmuch as these judgments do not correspond to the biological regularities. Instead, children’s beliefs appear to represent spontaneous inductions, surely promoted by a pervasive racial climate in the United States, but facilitated by the early emerging integration of the child’s theory of biology and theory of society.

In arguing that this reflects the internalization of the conceptual underpinnings of racism, am I going beyond the data? Experiments 6 and 7 did not probe children’s knowledge of the way society sorts people racially (i.e., they were not asked about the race of the child), but explored what children believed the offspring of interracial unions would look like. I concluded that older subjects’ preference for the black-featured outcome is related to prejudice, not because children inferred that the child is racially (i.e., categorically or sociologically) black, but because they inferred that the black physical features dominate. These “biological” expectations about human inheritance are not based on knowledge of the
physical world, but are governed by the children’s theory of society, shaped by a larger system of prejudiced belief.⁵ Racism is often linked with the claim that human variation is biologized (Atran, 1990; Banton, 1987; Gould, 1981; Guillaumin, 1980). The results of Experiments 6 and 7 suggest that it also involves the socializing of biology.
⁵ The question is not whether someone with one black parent is black, but whether someone with one black parent will have black physical features. The issue of social identity is distinct from the question of physical property inheritance. The notion that someone with any black heritage is categorically black is “rooted in the one-drop rule of racial identity, which, more than any other factor, has shaped the development of racial identity in America. Although it had its origins in racism, today the rule is staunchly defended by most members of the Black community” (Russell et al., 1992, p. 74). The issue under discussion in this chapter is whether biological expectations about property inheritance are shaped by sociopolitical factors. The results of Experiments 6 and 7 suggest that they are.

C. EXPERIMENT 8: SOCIAL ENVIRONMENT AND CHILDREN’S BELIEFS ABOUT INNATE RACIAL POTENTIAL

What makes a person black and what makes a person white, as already observed, involve cultural and legal criteria that may be only marginally linked to biology (Davis, 1991; Molnar, 1992). Thus, a child might believe that an interracial couple’s children were categorically (sociologically) black without expecting the children to have the physical features typically associated with blacks. In fact, a child whose knowledge was derived principally from experience of biological (not social) regularities would conclude precisely this, inasmuch as biologists studying mixed-race children have found that they tend to have skin color intermediate between that of their parents (Bodmer & Cavalli-Sforza, 1976). Although the school in which the testing was conducted was primarily white, there were several black children in each grade, and, importantly, there were mixed-race children in the classes from which participants were drawn. Presumably, then, encounters with mixed-race children are not sufficient to promote the biologically more accurate judgment that children’s skin color is a blend of their parents’. What role does experience play in shaping these beliefs? In particular, what sort of experience is crucial? It is possible that the critical set of experiences involves extended contact with a different subsystem (or subculture) of racial thinking. For example, in some minority communities subtle differences in skin color are highly salient (Russell, Wilson, & Hall, 1992). Does living in an environment where such differences are highly salient influence children’s knowledge of the biology of racially relevant physical features? By extension, are majority and minority children’s biological beliefs equally governed by dominant social relations? Experiment 8 explores this issue directly.

Children attending a largely (54% black) minority school in a blue-collar community near the majority community from which subjects in the earlier studies were drawn participated in the same tasks used in Experiments 6 and 7. Children were asked to infer what the offspring of couples differing in one color property would look like under four conditions: mixed-skin human couples, mixed-hair human couples, mixed-skin animal couples, and mixed-hair animal couples. Table IX summarizes the results. Unlike the pattern among children living in the majority community, there were no reliable age differences in these children’s judgments: older and younger children showed largely the same preferences. In all but one condition, minority community children expected that mixed couples (no matter whether the contrast was hair or skin, or whether the couples were animal or human) would have offspring whose surface color was an intermediate blend of each parent’s color. The exception was the human hair color condition, in which younger subjects believed that dark hair predominates, whereas older children believed that light hair dominates. The pattern of judgments is clearer if we directly compare majority and minority community children’s performances. Generally, younger majority community children chose at random, hence displaying no particular preference in any condition. In contrast, younger minority community children were more likely to select the intermediate (or blend) outcome, though the effect was strongest in the human skin color condition. Thus,
TABLE IX
EXPERIMENT 8: MEAN PROPORTION EACH COLOR OUTCOME WAS CHOSEN BY AGE GROUP, MINORITY COMMUNITY ENVIRONMENT

                        7-8 Years (N = 17)        11-12 Years (N = 20)
                        Light   Blend   Dark      Light   Blend   Dark
Skin color: Animals     .276    .487    .276      .125b   .750a   .129
Skin color: Humans      .342    .552c   .110b     .150b   .475    .375
Hair color: Animals     .289    .474    .237      .112b   .750a   .13?
Hair color: Human       .237    .237    .526a     .550a   .225    .225

a Significantly above chance at p < .01.
b Significantly below chance at p < .01.
c Significantly above chance at p < .05.
younger minority community children chose the dark and light skin infants reliably less often than did their majority community counterparts. There were no other significant differences between younger majority and minority community children’s inferences. Older majority community children overwhelmingly selected the black infant in the human skin color condition, and they were reliably higher in this choice than minority children. Older minority community children were more likely to choose the intermediate infant in the human skin color condition, and they were significantly more likely to do so than majority children. Finally, older majority community children chose the dark hair infant more often than minority community children.

The simplest explanation of these differences, namely differential experience with mixed-race children, is not compelling. Interracial families were uncommon but present in both schools, so that all subjects had direct experience with biracial children. Moreover, it is plausible that children in the majority school had as much or more experience with children whose parents have different hair color. Yet their judgments in the human hair color condition were no more biologically accurate than their inferences about skin color, nor were they more biologically accurate than those of minority community children. The crucial difference may be different levels of endorsement of a particular (and prejudiced) model of social difference in the two populations. This is not meant to imply that minority children’s beliefs about race are not shaped by the pervasive prejudice of American society. Many researchers have interpreted black children’s repeatedly documented preference for white dolls as evidence of such influence (for a review, see Cross, 1991).
Instead, at issue here is whether one aspect of the widespread (and prejudiced) model, namely that social relations of domination and power shape biological expectations, is equally held by members of the majority and minority communities. The results of Experiment 8 suggest that the notion of uneven racial potential may not be equally distributed. Clearly, more work is needed before this can be established.

These results also raise another issue that will require further research to resolve. Most studies on racial concept formation contrast the performance of children of different races on various tasks. The implication is that the relevant level of contrast is between individuals. In Experiment 8, we did not select children on the basis of the experimenter’s intuitions about their race. The reason for this was that race is a variable quality. Contrary to common sense (codified in much legal discourse), an individual’s race is not fixed. Considerable work in the social sciences now agrees that judgments of race and ethnicity are task-specific (Sollors, 1989; Goldberg, 1990; Root, 1992). Accordingly, the results from Experiment 8 were reported from the perspective of cultural environment (minority vs. majority community children), and not from the perspective of individuals (specified in terms of their race). We can test whether this is a valid strategy by comparing the performance of black and white children in the minority community (using school records to determine race of subject). If individuals are the appropriate units of comparison, we would expect white subjects, whatever their environment, to respond similarly. If social milieu, and not individual racial status, is the appropriate unit of contrast, then whites and blacks living in the minority environment should not differ in their judgments, and both groups should differ in their inductions from children living in the majority environment. As Table X shows, this is what was obtained: Black and white children living in the minority community did not differ in their expectations. The crucial distinction, accordingly, appears to be between the cultural environments in which the children live, not children’s racial status per se. Again, more research is needed.
TABLE X
EXPERIMENT 8: MEAN PROPORTION EACH COLOR OUTCOME WAS CHOSEN BY AGE GROUP AND RACE OF SUBJECT, MINORITY COMMUNITY ENVIRONMENT

                        7-8 Years (N = 19)        11-12 Years (N = 20)
                        Light   Blend   Dark      Light   Blend   Dark
Skin color: Animals
  Whites                .275    .525    .275      .150    .675    .175
  Blacks                .278    .444    .278      .100    .800    .100
Skin color: Humans
  Whites                .350    .600    .050      .200    .400    .400
  Blacks                .333    .500    .166      .100    .550    .350
Hair color: Animals
  Whites                .300    .450    .250      .175    .675    .150
  Blacks                .278    .500    .222      .050    .825    .125
Hair color: Human
  Whites                .300    .250    .450      .600    .200    .200
  Blacks                .167    .222    .610      .500    .250    .250
VII. Conclusion
A widely accepted view of social development, motivated by Piagetian expectations about children’s conceptual limitations, holds that young children overrely on external appearances when deriving and representing social categories. By the same token, young children are thought incapable of construing social difference in terms of the nonobvious and abstract criteria embedded in adult expectations about social entities. Accordingly, children are viewed as insensitive to the ontological commitment to group or collective that is integral to an adult understanding of social events. Finally, this view of social development predicts that children will attend to various social dimensions to the extent that these dimensions are physically prominent. Each study reported here challenges some aspect of this view of cognitive development. Experiments 1 and 2 demonstrate that children do not find all conspicuous social properties equally salient, nor do young children construe the conceptual salience of social categories in terms of their physical correlates. The studies establish that perceptual information may not in fact be crucial to the representation and derivation of several important social dimensions, especially race. Experiment 3 explored children’s judgments about the inheritance and growth of social properties, and provides considerable support for the claim that young children have a theory-like and principled understanding of race. The switched-at-birth study in Experiment 4 indicates that this understanding is not only richly structured but involves expectations of a mediating mechanism, specifically a nonobvious essence that is transmitted and fixed through biological reproduction and birth. The results of Experiment 5 suggest that young children also believe that nonracial social categories may be organized around theoretical principles.
Experiments 6 and 7 establish that having a biological understanding of race does not mean that all principles governing nonhuman biological reasoning apply to social kinds. In particular, these studies show that social categories are organized around entities, like race, that have social rather than strictly biological relevance. Experiment 8 shows that the environment in which social categories develop influences how biological and social knowledge are integrated. All three studies have direct implications for our understanding of prejudice. Table XI summarizes the main findings.

TABLE XI
SUMMARY OF MAIN FINDINGS

Age: 3-Year-olds, 4-Year-olds, 5-Year-olds, 7-Year-olds, 11-Year-olds

Pattern of inference:
Discriminate between attributes
Immutability of relevant physical features
Domain-specific causal principles:
  Growth
  Inheritance
Discriminate between classes of attributes
Domain-specific mechanisms of reproduction
Uneven innate racial potential

X
X X
X X
X X
X X
X X X X
X X X X
X X X X
X X X X X
X X
X

Taken together, these studies provide considerable evidence against the standard Piagetian view, and considerable support for the alternative model of young children’s understanding of human variation and human groupings outlined in the introduction to this chapter. According to this model, young children’s social cognitions are more richly structured and nuanced than earlier research suggested. Two features of the child’s representation of human groups are most striking. First, children appear to recognize that some human collectivities are based on nonobvious commonalities critical to each member’s intrinsic nature. Moreover, young American children’s expectations about what is socially intrinsic seem to be sensitive to a wider array of deep commonalities than American adult belief allows: different cultural traditions construe different social dimensions as intrinsic and others as contingent. In South Asia occupation is an aspect of intrinsic nature; in the United States it appears to be so only for 3-year-olds. Second, young children conceptualize some social categories, notably race, as involving nonobvious mediating mechanisms that are thought to explain the causal relations among members. The young child’s notion of race is not an empirical generalization made from observed differences, but a theory-like account of human differences, whether they are perceptually marked or not.

These studies raise a number of questions and possibilities, providing the basis for further research in several areas. These include explorations into the nature of prejudice, the relationship between individual social representations and aggregate historical and social phenomena, and issues in cognitive representation, particularly the relationship between cognitive domains and the most appropriate level of domain analysis.

A. THE NATURE OF PREJUDICE AND CONCEPTUAL ORGANIZATION
Psychologists have long argued that general categorization processes contribute to the formation of social stereotypes (Hamilton & Trolier, 1986; Fiske & Taylor, 1991; Rothbart & Taylor, 1990). A major tradition in social psychology has sought to understand what individual differences predict differences in stereotyping and prejudice. Some researchers have preferred personality factors, some macro-social factors, and some information processing factors. Still, no consensus on the roots of stereotyping and prejudice has emerged. One reason may be the peculiar nature of early prejudices. Several studies have uncovered a paradoxical relationship between young children’s racial prejudices and their behaviors. Although preschool children maintain quite strident ethnic and racial biases (Clark & Clark, 1940; Horowitz, 1939; Aboud, 1988; Katz, 1983), in everyday contexts, preschool children do not find race a particularly relevant dimension on which to sort people, particularly when it comes to choosing playmates (Doyle, 1983; Finkelstein & Haskins, 1983; Singleton & Asher, 1979; McCandless & Hoyt, 1961; Lambert & Taguchi, 1956). No compelling explanation for this lack of correlation between attitude and behavior
has been offered. Clearly it is not because young children’s attitudes generally do not motivate behavior, because gender stereotypes have been shown to shape playmate choice (Fagot, Leinbach, & Hagan, 1986; Katz, 1983). The alternative model of young children’s racial categories I propose may provide an explanation. Young children’s racial categories do not involve a discovery of perceptual regularities, but are initially aimed at specifying a social ontology. In this regard, young children seem to be more concerned with elaborating concepts at a higher level of generality that are relevant to a theory of society and developing a conceptual vocabulary for racial variation, than differentiating specific concepts and cataloging physical differences. Racial prejudice may not translate into biased patterns of behavior because young children may be unsure what the membership of any given racial category is. In short, young children’s racial beliefs might not influence behavior because they may fail to unambiguously instantiate racial category membership. By recasting our view of children’s racial cognitions, we may be in a position to resolve a number of the questions about children’s social cognition that have long remained open.

B. RELATIONSHIP BETWEEN MODELS OF SOCIAL COGNITION AND HISTORICAL AND ANTHROPOLOGICAL SCHOLARSHIP
Increasingly, social historians and anthropologists argue that race is a constructed phenomenon, varying according to the social and political context in which it is embedded (Stoler, 1992; Goldberg, 1990; Sollors, 1989). Such claims do not imply that there is no common thread in culturally and historically specific racial discourses. Rather, these proposals suggest that race is largely independent of the regularities in physiognomy that adults typically associate with it. From a historical perspective, race is not a visual ideology (Stoler, 1992). Clearly, this view of race resonates with the one proposed here. According to my model, young children do not discover races because they are there to be found at the raw perceptual level; they discover them because they are integral to finding the sorts of things there are in everyday social experience. That this everyday social experience is influenced by historical and cultural forces is not surprising. What is interesting is that conceptual organization (on the individual level) and culture (on the aggregate level) seem to dovetail in this instance: An essentialist view of race permits a flexibility in reasoning that a strictly physicalist interpretation does not allow. As social theorists have established, this flexibility is crucial to adult conceptualizations of race: Physical features are diagnostic
but not defining of race because one can be “mistaken,” on the basis of physical inspection alone, about the race of another. Intriguingly, this flexibility seems to have its roots in children’s beliefs about the same phenomenon. Further research would be well worth pursuing.

C. SOCIAL COGNITION AND LEVELS OF CONCEPTUAL ANALYSIS
Recent work in conceptual development provides compelling evidence that knowledge of the world develops through the agency of specialized faculties for understanding. Often, these specialized devices are identified with naive theories (Carey, 1985; Wellman & Gelman, 1992). According to this view, each domain (or naive theory) is characterized by a unique set of causal principles and ontological commitments (Carey, 1985; R. Gelman, 1990). One means by which knowledge is extended (and conceptual change occurs) is through integration of domain-specific principles of causality across kinds of phenomena, where better-grounded domains become the explanatory model for less well-grounded ones (Carey & Spelke, 1994). For example, Carey (1985) has argued that preschoolers do not have an autonomous domain of biology. Instead, grade-school children’s naive biology emerges out of an “intuitive theory of behavior” (i.e., a naive psychology). The “causal structure in which [biological] phenomena are embedded is social and psychological” (Carey, 1985, p. 188). As observed earlier, this proposal explains parallel patterns of reasoning in different areas of conceptualization as a function of knowledge transfer. The mechanism that produces knowledge transfer is analogical reasoning. As was also observed earlier, scholars from disparate traditions have proposed that the biological understanding of race results from just such analogical reasoning, in which human racial variation becomes identified with morphological regularities in nonhuman species. Neither of these claims has proved to be compelling. The problem with the general claim that knowledge transfer results from analogical reasoning is that such transfers have been as difficult to replicate under controlled conditions as they are easy to identify in natural thought.
The problem with the specific claim that principles of biology infuse racial understanding is that biological reasoning appears to be subsumed to racial thinking, not vice versa. Recent proposals about conceptual architecture are informative here. Keil (1994), Sperber (1994), and Leslie (1994) have suggested that naive theories do not derive from the child’s initial state so much as they are precipitates of more abstract mechanisms of conceptual organization. Keil (1994, p. 252) calls such mechanisms modes of construal, defining them as “opportunistic, exploratory entities that are constantly trying to find
resonances with aspects of real world structure.” This model has implications for the way knowledge is coordinated and integrated across domains. For example, the notion of modes of construal raises the possibility that natural knowledge transfer may be more apparent than real. Essentialist reasoning is characteristic of both biological and racial thinking. Studies reviewed in this chapter strongly indicate that essentialist reasoning is not a late but an early-emerging feature of beliefs about race. This finding suggests that essentialist principles of folk biology either transfer spontaneously to racial thinking or that there is a parallel (and almost simultaneous) instantiation of essentialist reasoning in both folk biology and racial belief. Studies discussed in this chapter lend some support to the second claim. The notion of knowledge transfer in the biology-to-race model relies on the recognition of a physical analogue across two comparable clusters of perceptual features. These readily perceptible phenotypic gaps are then taken as evidence of different populations. Yet the studies presented earlier indicate that perceptual information does not enjoy the same primacy in the development of racial thinking that it does in the emergence of naive biology. Thus, parallels in biological and racial reasoning may not represent an easy transfer of knowledge at all. Instead, essentialist reasoning in the two domains may signal the parallel endorsement of an essential mode of construal in distinct faculties, suggesting that essentialism, as Medin (1989) and later Keil (1994) supposed, is initially independent of biology. Again, only further research can resolve this issue. As with the earlier questions, this discussion points to the possibility that a more detailed understanding of the child’s theory of society may yield insights that have far-reaching implications.
Race and other aspects of the child’s model of society are not marginal to the study of cognition, but represent a richly structured and extensively elaborated series of common sense beliefs about the nature of difference and similitude. This should not come as a surprise; after all, our principal metaphor for similarity is kinship, our primary metaphors of difference are race and species. In all these instances, it is the notion of corporate collectivity, linked by an intrinsic commonality, that motivates the image. Understanding where such beliefs come from and how they develop is squarely within cognitive psychology’s mission.
ACKNOWLEDGMENTS
Research described in this chapter was supported by grants from the National Science Foundation (INT 8814397 and RCD 8751136), the Fondation Fyssen, and the Office of Vice President, University of Michigan. I am grateful to Evan Heit, Doug Medin, Heidi Schweingruber, and Ann Stoler for their comments on earlier drafts of this chapter.
Lawrence A. Hirschfeld
REFERENCES
Aboud, F. (1987). The development of ethnic self-identification and attitudes. In J. S. Phinney & M. J. Rotheram (Eds.), Children’s ethnic socialization: Pluralism and development (pp. 32-55). Newbury Park, CA: Sage Publications.
Aboud, F. (1988). Children and prejudice. New York: Basil Blackwell.
Aboud, F. E., & Skerry, A. (1983). Self and ethnic concepts in relation to ethnic constancy. Canadian Journal of Behavioural Science, 15, 14-26.
Allport, G. (1954). The nature of prejudice. Cambridge: Addison-Wesley.
Andersen, E. (1986). The acquisition of register variation by Anglo-American children. In B. Schieffelin & E. Ochs (Eds.), Language socialization across cultures. New York: Cambridge University Press.
Andersen, E. (1990). Speaking with style: The sociolinguistic skills of children. New York: Routledge.
Asmore, R., & Del Boca, F. (1981). Conceptual approaches to stereotypes and stereotyping. In D. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior. Hillsdale, NJ: Erlbaum.
Astington, J., Harris, P., & Olson, D. (1988). Developing theories of mind. New York: Cambridge University Press.
Atran, S. (1990). Cognitive foundations of natural history. New York: Cambridge University Press.
Baillargeon, R. (1992). The object concept revisited: New directions in the investigation of infants’ physical knowledge. In C. Granrud (Ed.), Visual perception and cognition in infancy. Carnegie-Mellon Symposia on Cognition, Vol. 23. Hillsdale, NJ: Erlbaum.
Banton, M. (1987). Racial theories. New York: Cambridge University Press.
Bar-Tal, D., Graumann, C., Kruglanski, A., & Stroebe, W. (1989). Stereotyping and prejudice: Changing conceptions. London: Springer-Verlag.
Becker, J. (1982). Children’s strategic use of requests to mark and manipulate social status. In S. Kuczaj II (Ed.), Language and development (Vol. 11). Hillsdale, NJ: Erlbaum.
Bem, S. (1989). Genital knowledge and gender constancy in preschool children. Child Development, 60, 649-662.
Berlin, B. (1992). Ethnobiological classification. Princeton, NJ: Princeton University Press.
Blaske, D. (1984). Occupational sex-typing by kindergarten and fourth-grade children. Psychological Reports, 53, 795-801.
Bodmer, W., & Cavalli-Sforza, L. (1976). Genetics, evolution, and man. San Francisco: W. H. Freeman.
Boyer, P. (1990). Tradition as truth and communication. New York: Cambridge University Press.
Branch, C., & Newcombe, N. (1980). Racial attitudes of black preschoolers as related to parental civil rights activism. Merrill-Palmer Quarterly, 26, 425-428.
Branch, C., & Newcombe, N. (1986). Racial attitude development among young black children as a function of parental attitudes: A longitudinal and cross-sectional study. Child Development, 57, 712-721.
Brown, A. (1990). Domain-specific principles affect learning and transfer in children. Cognitive Science, 14, 107-134.
Byard, P. (1981). Quantitative genetics of human skin color. Yearbook of Physical Anthropology, 24, 123-137.
Carey, S. (1985). Conceptual development in childhood. Cambridge: MIT Press.
Carey, S., & Diamond, R. (1980). Maturational determination of the developmental course of face encoding. In D. Caplan (Ed.), Biological studies of mental processes (pp. 60-93). Cambridge, MA: MIT Press.
Carey, S., & Spelke, E. (1994). Domain specific knowledge and conceptual change. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Chomsky, N. (1986). Knowledge of language. New York: Praeger.
Clark, K., & Clark, M. (1940). Skin color as a factor in racial identification of Negro preschool children: A preliminary report. Journal of Experimental Education, 8, 161-163.
Cordua, G., McGraw, K., & Drabman, R. (1979). Doctor or nurse: Children’s perception of sex typed occupations. Child Development, 50, 590-593.
Corsaro, W. (1979). Young children’s conception of status and role. Sociology of Education, 52, 46-59.
Corsaro, W., & Rizzo, T. (1988). Discussione and friendship: Socialization processes in the peer culture of Italian nursery school children. American Sociological Review, 53, 879-894.
Cross, W. (1991). Shades of black: Diversity in African-American identity. Philadelphia: Temple University Press.
Davis, F. (1991). Who is black: One nation’s definition. University Park, PA: Pennsylvania State University Press.
Doyle, A. (1983). Friends, acquaintances, and strangers: The influence of familiarity and ethnolinguistic background on social interaction. In K. Rubin & H. Ross (Eds.), Peer relationships and social skills in childhood. New York: Springer-Verlag.
Dunn, J. (1988). The beginnings of social understanding. New York: Basil Blackwell.
Ebony (1983). What makes you black? Vague definition of race is the basis of court battles. January.
Eder, R. (1989). The emergent personologist: The structure and content of 3 1/2, 5 1/2, and 7 1/2 year-olds’ concepts of themselves and other persons. Child Development, 60, 1218-1228.
Emmerich, W., Goldman, K., Kirsch, B., & Sharabany, R. (1977). Evidence for a transitional phase in the development of gender constancy. Child Development, 48, 930-936.
Fagot, B., Leinbach, M., & Hagan, R. (1986). Gender labeling and the adoption of sex-typed behaviors. Developmental Psychology, 22, 440-443.
Finkelstein, N., & Haskins, R. (1983). Kindergarten children prefer same-color peers. Child Development, 54, 502-508.
Fiske, S., & Taylor, S. (1991). Social cognition. New York: McGraw Hill.
Flavell, J. (1985). Cognitive development. Englewood Cliffs, NJ: Prentice Hall.
Gelman, R. (1990). First principles organize attention to and learning about relevant data. Cognitive Science, 14(1), 79-106.
Gelman, R., & Gallistel, C. (1978). The child’s understanding of number. Cambridge, MA: Harvard University Press.
Gelman, R., Spelke, E., & Meck, E. (1983). What preschoolers know about animate and inanimate objects. In D. Rogers (Ed.), The development of symbolic thought. New York: Plenum.
Gelman, S. (1989). Children’s use of categories to guide biological inferences. Human Development, 32, 65-71.
Gelman, S., & Coley, J. (1990). The importance of knowing a dodo is a bird: Categories and inferences in 2-year-old children. Developmental Psychology, 26, 796-804.
Gelman, S., & Wellman, H. (1991). Insides and essences: Early understandings of the nonobvious. Cognition, 38, 213-244.
Gentner, D., & Stevens, A. (1983). Mental models. Hillsdale, NJ: Erlbaum.
Gettys, L., & Cann, A. (1981). Children’s perceptions of occupational sex stereotypes. Sex Roles, 7, 301-308.
Goldberg, T. (1990). Anatomy of racism. Minneapolis, MN: University of Minnesota Press.
Gordon, L. (1989). Heroes of their own lives: The politics and history of family violence. London: Virago.
Gould, S. (1981). The mismeasure of man. New York: W. W. Norton.
Guardo, C., & Bohan, J. (1971). Development of a sense of self-identity in children. Child Development, 42, 1909-1921.
Guillaumin, C. (1980). The idea of race and its elevation to autonomous scientific and legal status. In Sociological theories: Race and colonialism (pp. 37-68). Paris: UNESCO.
Hamilton, D. (1981). Illusory correlation as a basis for stereotyping. In D. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior. Hillsdale, NJ: Erlbaum.
Hamilton, D., & Trolier, T. (1986). Stereotypes and stereotyping: An overview of the cognitive approach. In J. Dovidio & S. Gaertner (Eds.), Prejudice, discrimination, and racism. New York: Academic Press.
Harris, M. (1964). Patterns of race in the Americas. New York: Walker.
Hirschfeld, L. (1988). On acquiring social categories: Cognitive development and anthropological wisdom. Man, 23, 611-638.
Hirschfeld, L. (1989a). Rethinking the acquisition of kinship terms. International Journal of Behavioral Development, 12(4), 541-568.
Hirschfeld, L. (1989b). Discovering linguistic differences: Domain specificity and the young child’s awareness of multiple languages. Human Development, 32, 223-236.
Hirschfeld, L. (1993). Discovering social difference: The role of appearance in the development of racial awareness. Cognitive Psychology, 25, 317-350.
Hirschfeld, L. (1994). Is the acquisition of social categories based on domain-specific competence or on knowledge transfer? In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Hirschfeld, L. (in press a). Anthropology, psychology, and the meanings of social causality. In A. Premack, D. Premack, & D. Sperber (Eds.), Causal cognition: A multidisciplinary debate. New York: Oxford University Press.
Hirschfeld, L. (in press b). Do children have a theory of race? Cognition.
Hirschfeld, L., & Gelman, S. (1994). Toward a typography of the mind: An introduction to domain-specificity. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Horowitz, R. (1939). Racial aspects of self-identification in nursery school children. Journal of Psychology, 7, 91-99.
Inagaki, K., & Hatano, G. (1987). Young children’s spontaneous personification and analogy. Child Development, 58, 1013-1020.
Johnson, K., Mervis, C., & Boster, J. (1992). Developmental changes within the structure of the mammal domain. Developmental Psychology, 28, 74-83.
Katz, P. A. (1982). Development of children’s racial awareness and intergroup attitudes. In L. G. Katz (Ed.), Current topics in early childhood education (Vol. 4, pp. 16-54). Norwood: Ablex.
Katz, P. A. (1983). Developmental foundations of gender and racial attitudes. In R. L. Leahy (Ed.), The child’s construction of social inequality. New York: Academic Press.
Keil, F. (1979). Semantic and conceptual development: An ontological perspective. Cambridge: Harvard University Press.
Keil, F. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: Bradford Book/MIT Press.
Keil, F. (1994). The birth and nurturance of concepts by domains: The origins of concepts of living things. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Kosslyn, S., & Kagan, J. (1981). “Concrete thinking” and the development of social cognition. In J. Flavell & L. Ross (Eds.), Social cognitive development: Frontiers and possible futures. New York: Cambridge University Press.
Lambert, W., & Tachuchi, Y. (1956). Ethnic cleavage among young children. Journal of Abnormal and Social Psychology, 53, 380-382.
Lemaine, G., Ben Brika, J., & Bonnet, P. (1988). Identite et apparence physique chez les enfants. Revue Internationale de Psychologie Sociale, 1, 205-224.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: MIT Press.
Lerner, R. (1969). The development of stereotyped expectancies of body build-behavior relations. Child Development, 40, 137-141.
Lerner, R. (1973). The development of personal space schemata toward body build. Journal of Psychology, 84, 229-235.
Lerner, R., & Schroeder, C. (1971). Physique identification, preference, and aversion in kindergarten children. Developmental Psychology, 5, 538.
Leslie, A. (1994). ToMM, ToBy, and agency: Core architecture and domain specificity. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Mandler, J., Bauer, P., & McDonough, L. (1991). Separating the sheep from the goats: Differentiating global categories. Cognitive Psychology, 23, 263-298.
McCandless, B., & Hoyt, J. (1961). Sex, ethnicity and play preferences of preschool children. Journal of Abnormal and Social Psychology, 62, 683-685.
McCauley, C., & Stitt, C. (1978). An individual and quantitative measure of stereotypes. Journal of Personality and Social Psychology, 36, 929-940.
Medin, D. (1989). Concepts and conceptual structure. American Psychologist, 44, 1469-1481.
Milner, D. (1984). The development of ethnic attitudes. In H. Tajfel (Ed.), The social dimension (Vol. 1). Cambridge: Cambridge University Press.
Molnar, S. (1992). Human variation: Races, types, and ethnic groups. Englewood Cliffs, NJ: Prentice Hall.
Mosse, G. (1978). Toward the final solution: A history of European racism. New York: H. Fertig.
Novick, L. (1988). Analogical transfer, problem similarity, and expertise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 510-520.
Ochs, E., & Schieffelin, B. (1984). Language acquisition and socialization: Three developmental stories and their implications. In R. Shweder & R. LeVine (Eds.), Culture theory. New York: Cambridge University Press.
Piaget, J. (1951). The child’s conception of the world. London: Routledge Kegan Paul.
Pope Edwards, C. (1984). The age group labels and categories of preschool children. Child Development, 55, 440-452.
Resnick, L. (1994). Situated rationalism: Biological and social preparation for learning. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Robins, A. (1991). Biological perspectives on human pigmentation. New York: Cambridge University Press.
Root, M. (1992). Racially mixed people in America. Newbury Park, CA: Sage.
Rosengren, K., Gelman, S., Kalish, C., & McCormick, M. (1991). As time goes by: Children’s early understanding of growth in animals. Child Development, 62, 1302-1320.
Ross, L. (1981). The “intuitive scientists” formulation and its developmental implications. In J. Flavell & L. Ross (Eds.), Social cognitive development: Frontiers and possible futures. New York: Cambridge University Press.
Rothbart, M. (1981). Memory processes and social beliefs. In D. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior. Hillsdale, NJ: Erlbaum.
Rothbart, M., & Taylor, M. (1990). Category labels and social reality: Do we view social categories as natural kinds? In G. Semin & K. Fiedler (Eds.), Language and social cognition. London: Sage.
Russell, K., Wilson, M., & Hall, R. (1992). The color complex: The politics of skin color among African Americans. New York: Harcourt Brace Jovanovich.
Sartre, J.-P. (1948). Anti-semite and Jew. New York: Schocken Press.
Semaj, L. (1980). The development of racial evaluation and preference: A cognitive approach. The Journal of Black Psychology, 6, 59-79.
Singleton, L., & Asher, S. (1979). Racial integration and children’s peer preferences: An investigation of developmental and cohort differences. Child Development, 50(4), 936-941.
Slaby, R., & Frey, K. (1975). Development of gender constancy and selective attention to same-sex models. Child Development, 46, 849-856.
Sollors, W. (1989). The invention of ethnicity. New York: Oxford University Press.
Solomon, G., Johnson, S., Zaitchik, D., & Carey, S. (1993). The young child’s conception of inheritance. Paper presented at Society for Research on Child Development, New Orleans. April.
Sorce, J. (1979). The role of physiognomy in the development of racial awareness. The Journal of Genetic Psychology, 134, 33-41.
Spelke, E. (1991). Physical knowledge in infancy: Reflections on Piaget’s theory. In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and cognition. Hillsdale, NJ: Erlbaum.
Sperber, D. (1994). The modularity of thought and the epidemiology of representations. In L. Hirschfeld & S. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture. New York: Cambridge University Press.
Springer, K., & Keil, F. (1989). On the development of biologically specific beliefs: The case of inheritance. Child Development, 60, 637-648.
Springer, K., & Keil, F. (1991). Early differentiation of causal mechanisms appropriate to biological and nonbiological kinds. Child Development, 62, 767-781.
Stoler, A. (1992). Sexual affronts and racial frontiers: European identities and the cultural politics of exclusions in colonial Southeast Asia. Comparative Study in Society and History, 34, 514-551.
Taylor, S. (1981). A categorization approach to stereotyping. In D. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior. Hillsdale, NJ: Erlbaum.
Turiel, E. (1983). Interaction and development in social cognition. In E. T. Higgins, D. Ruble, & W. Hartup (Eds.), Social cognition and social development: A sociocultural perspective. New York: Cambridge University Press.
van den Berghe, P. (1967). Race and racism: A comparative perspective. New York: Wiley.
Vaughan, G. (1987). A social psychological model of ethnic identity. In J. Phinney & M. Rotheram (Eds.), Children’s ethnic socialization. Beverly Hills, CA: Sage.
Vosniadou, S., & Ortony, A. (1989). Similarity and analogical reasoning. New York: Cambridge University Press.
Watson-Gegeo, A., & Gegeo, D. (1986). Calling-out and repeating routines in Kwara’ae children’s language socialization. In B. Schieffelin & E. Ochs (Eds.), Language socialization across cultures. New York: Cambridge University Press.
Wellman, H., & Gelman, S. (1992). Cognitive development: Foundational theories of core domains. Annual Review of Psychology, 43, 337-375.
Wyer, R., & Martin, L. (1986). Person memory: The role of traits, group stereotypes, and specific behaviors in the cognitive representation of persons. Journal of Personality and Social Psychology, 50, 661-675.
DIAGNOSTIC REASONING AND MEDICAL EXPERTISE
Vimla L. Patel, Jose F. Arocha, and David R. Kaufman
I. Introduction
The study of medical cognition has been the subject of formal inquiry for more than 30 years. The term medical cognition refers to studies of cognitive processes, such as perception, comprehension, reasoning, decision making, and problem solving in medical practice itself or in tasks representative of medical practice. These studies use subjects who work in medicine, including medical students, physicians, and biomedical scientists. Cognitive psychologists have often used artificial medical stimuli (arbitrary disease and symptom pairings) to study categorization and prediction in general (e.g., Medin & Edelson, 1988; Gluck & Bower, 1988), without studying medical practitioners or realistic tasks. Although this work has been very influential, these efforts are not studies of medical cognition. Cognitive science in medicine refers to a broader discipline encompassing medical artificial intelligence (Clancey & Shortliffe, 1984), philosophy in medicine (Schaffner, 1986), medical linguistics (Baud, Rassinoux, & Scherrer, 1992), and medical cognition (Evans & Patel, 1992). There are two principal experimental approaches that have been used to study medical cognition: a decision making and judgment perspective in which a subject’s decisions are contrasted with a normative model, based on probability theory, indicating optimal choices under conditions of uncertainty (Dawes, 1988); and a problem-solving cognitive science
approach in which the focus is on a description of cognitive processes in reasoning tasks, making use of protocol-analytic techniques (Ericsson & Simon, 1993) and the development of cognitive models of performance. The focus in this chapter is on the cognitive science perspective. In addition, we can characterize two categories of research within this perspective: research targeted at understanding the structure and use of basic science knowledge in medical tasks (Patel, Evans, & Groen, 1989; Feltovich, Spiro, & Coulson, 1989; Boshuizen & Schmidt, 1992) and research investigating the process of diagnostic reasoning (Elstein, Shulman, & Sprafka, 1978; Feltovich, Johnson, Moller, & Swanson, 1984; Patel & Groen, 1986; Joseph & Patel, 1990; Arocha, Patel, & Patel, 1993). Cognitive science research in medicine has a dual purpose similar to that in other complex domains. The first purpose is to develop theoretical models of cognition and computational models unique to medicine. The second purpose is to engage in empirical research, exploiting the domain of medicine as a knowledge-rich and semantically complex domain to further our understanding of cognition and computation. For example, cognitive research into medical expertise and medical artificial intelligence have made significant contributions to understanding the nature of expertise and artificial intelligence, respectively. The primary purpose of this chapter is to characterize the current state of knowledge in diagnostic reasoning. The study of expertise is one of the principal paradigms in problem-solving research. Investigators in this area seek to understand what distinguishes outstanding individuals in a domain from less outstanding individuals (Ericsson & Smith, 1991).
Since the pioneering research of deGroot (1965) and the subsequent studies of Chase and Simon (1973) in chess, investigations of expertise have spanned the range of content domains including physics (e.g., Larkin, McDermott, Simon, & Simon, 1980; Anzai, 1991), sports (Allard & Starkes, 1991), music (Sloboda, 1991), and medicine (Patel & Groen, 1991a). Two edited books, Chi, Glaser, and Farr (1988) and Ericsson and Smith (1991), provide excellent overviews of the area. The designation of “expert” can be a function of achieving a certain level of performance, as exemplified by Elo ratings in chess, or of being certified by a sanctioned licensing body, as is characteristic of medicine. In either case, the achievement of expertise requires about 10 years of full-time performance (Hayes, 1981). In a complex domain, expertise is not a monolithic entity; there is considerable variation and specialization. In many disciplines, individuals can be expected to perform at an expert level only within a very narrow context. Expert knowledge is organized functionally in such a way as to support specific reasoning tasks. This explains why performance differences are observed between genetic counselors and academic geneticists (Smith, 1990), and between cardiology researchers and practitioners (Patel & Groen, 1991a). The issue of between-expert variability is an important one in investigations of medical expertise. Expert specialists often have a particular subspecialization. For example, certain endocrinologists are diabetes specialists and others are authorities on lipid metabolism disorders. Although many studies contrast an expert group with a novice group, expertise is best viewed as a continuum with a number of intermediate levels that result in unique performance characteristics. The development of expertise is marked by specific transitions corresponding to reorganizations of knowledge and nonmonotonic increases in mastery of domain-specific tasks (Patel & Groen, 1991b). There are a number of expert characteristics that have a certain degree of generality across domains. A consistent theme across studies of the development of expertise has been the role that the evolution of knowledge structures has in facilitating the recognition of significant objects within a problem and in enhancing the ability to recognize typical situations. The superior organization of expert knowledge is manifested in experts’ ability to categorize stimuli at a highly principled and functional level of abstraction (Chi, Feltovich, & Glaser, 1981), exhibit enhanced recall for domain information (Chase & Simon, 1973), and engage in forward-directed reasoning, from givens in the problem to unknowns, in routine problem-solving tasks (Larkin et al., 1980). The latter two aspects, enhanced recall and directionality of reasoning, have been of particular importance and a source of some controversy in medical diagnostic reasoning. These issues are dealt with in some detail in this chapter.
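The contrast between forward-directed reasoning (from givens to unknowns) and its backward-directed counterpart (from a candidate conclusion back to supporting givens) can be made concrete with a small sketch. The rules and findings below are hypothetical, invented purely for illustration; they are not drawn from the studies cited above and are not offered as a model of how clinicians actually reason.

```python
# Hypothetical rule base: each rule pairs a set of required facts with one
# conclusion. Names are illustrative assumptions, not clinical guidance.
RULES = [
    ({"fever", "productive cough"}, "respiratory infection"),
    ({"respiratory infection", "lobar consolidation on x-ray"}, "pneumonia"),
]


def forward_chain(givens, rules):
    """Forward-directed: start from the givens and fire rules until nothing new follows."""
    known = set(givens)
    trace = []
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((sorted(conditions), conclusion))
                changed = True
    return known, trace


def backward_chain(goal, givens, rules, depth=0, max_depth=10):
    """Backward-directed: start from a hypothesis and search for support in the givens."""
    if goal in givens:
        return True
    if depth >= max_depth:
        return False
    for conditions, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(c, givens, rules, depth + 1, max_depth)
            for c in conditions
        ):
            return True
    return False


givens = {"fever", "productive cough", "lobar consolidation on x-ray"}
known, trace = forward_chain(givens, RULES)
for conditions, conclusion in trace:
    print(conditions, "->", conclusion)
print(backward_chain("pneumonia", givens, RULES))
```

In the forward version the conclusion emerges from the data without a hypothesis being stated in advance; in the backward version a hypothesis must be proposed first and then tested against the data, which is the distinction at issue in the directionality debate.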
II. The Task Domain of Medical Diagnosis
Medical knowledge consists of two types of knowledge: clinical knowledge, including knowledge of disease entities and associated findings, and basic science knowledge, incorporating subject matter such as biochemistry, anatomy, and physiology. There are two broad perspectives on the nature of diagnostic reasoning: the fault model and the heuristic classification model. The first model suggests that medical diagnosis is akin to diagnostic troubleshooting in electronics, with a primary goal of finding the structural fault or systemic perturbation (Feinstein, 1973; Boshuizen & Schmidt, 1992). From this perspective, clinical and biomedical knowledge become intricately intertwined, providing medical practice with a sound scientific basis. This model suggests that biomedical and clinical knowledge could be seamlessly integrated into a coherent knowledge structure
that supports all cognitive aspects of medical practice, such as diagnostic and therapeutic reasoning. The second model views diagnostic reasoning as a process of heuristic classification involving the instantiation of specific slots in a disease schema (Clancey, 1988). The primary goal of diagnostic reasoning is to classify a cluster of patient findings as belonging to a specific disease category. From this perspective, the diagnostic reasoning process can be viewed as a process of coordinating theory and evidence rather than one of finding fault in the system. As expertise develops, the disease knowledge of a clinician becomes more dependent on clinical experience; clinical problem solving is increasingly guided by the use of exemplars and analogy, and is less dependent on a functional understanding of the system in question. That is not to say that basic science does not play an important role in medicine; rather, the process of diagnosis, particularly in dealing with routine problems, is essentially one of classification. Basic science knowledge is important in resolving anomalies and is essential in therapeutic contexts. These two models of reasoning have been discussed in medical artificial intelligence in the context of deep and shallow systems of reasoning (Chandrasakeran, Smith, & Sticklen, 1989). A shallow system, such as MYCIN or INTERNIST, reasons by relating observations to intermediate hypotheses, which partition the problem space, and further associating intermediate hypotheses with diagnostic hypotheses. The knowledge base would include only entities related to taxonomic classification: diagnostic hypotheses and clinical findings. The deep and shallow distinction has been raised in the context of the control architecture differences between first and second generation expert systems.
Therefore, a system such as NEOMYCIN, which has a complex control architecture but which is guided by heuristic classification, can be classified as being of intermediate depth. Chandrasakeran (Chandrasakeran et al., 1989) characterizes a deep system as similar to systems used in qualitative physics (Bobrow, 1985) that embody causal mental models. Systems such as MDX-2 (Chandrasakeran et al., 1989) or QSIM (Kuipers, 1987) have explicit representations of structural components and their relations, the functions of these components (in essence their purpose), and their relationship to behavioral states. The causal and diagnostic knowledge can be generated by running or simulating the system and qualitatively deriving behavioral sequences that can identify and explain the malfunction. We adhere to the second model, which suggests that diagnostic reasoning is a process of heuristic classification. It is our belief that biomedical knowledge and clinical knowledge constitute two distinct “worlds” that bridge at discrete points of correspondence rather than form one integrated
knowledge base (Patel, Evans, & Groen, 1989). Our conclusion is based on an ontological characterization of the domains and on psychological evidence suggesting that basic science knowledge is not easily integrated in clinical contexts. The clinical description and the physiological or biochemical description are at different levels of abstraction (detail); they belong to different ontological categories. Biomedical knowledge is of a qualitatively different nature, embodying elements of causal mechanisms and characterizing patterns of perturbation in function and structure (Schaffner, 1986). Medicine draws on different sources of knowledge from the biomedical and to a lesser degree the physical sciences. This knowledge can be arranged in a hierarchical schema of the scientific sources (Blois, 1990). At the bottom is atomic physics, where matter is described with reference to atoms and their constituent properties. At each higher level in the hierarchy, there are newly emergent properties not entirely predictable from lower levels. Each new level has different conceptual entities and a unique language of description. Higher levels introduce more uncertainty and a greater degree of inexactness in ascribing causality. At the clinical level, models of disease are commonly described in terms of associations between clinical findings and diagnoses. The physician is not merely matching findings to diagnostic categories; rather, he or she is developing a model of the patient, based on a sequence of findings with a specific temporal order, and coupling this information with the patient’s past medical history, family history, physical examination, and laboratory findings. Investigations of medical problem solving provide some evidence to support the contention that biomedical knowledge is not used optimally in clinical contexts (cf.
Patel, Evans, & Groen, 1989). The research findings suggest that basic science is used differentially in different tasks and in different medical domains (cf. Patel & Groen, 1986; Lesgold et al., 1988), experts and novices differ in their use of basic science, and, in many instances, basic science knowledge may actually interfere with clinical problem solving (Patel, Evans, & Groen, 1989). This is most apparent in tasks where subjects, students in particular, are explicitly asked to provide a basic science explanation of a clinical problem. For example, we have found that the coherence of an explanation is often reduced when basic science concepts are employed. That is to say, the overall explanation becomes fragmented into discrete and isolated chains of inference, some correct, others partially correct, and others incorrect. The evidence also suggests that students possess substantial inert knowledge that frustrates their ability to apply specific biomedical concepts to clinical problem-solving tasks (Patel, Kaufman, & Magder, 1991). Inert knowledge refers to domain knowledge acquired from texts or lectures that has not yet been exercised in problem-solving situations. The problems associated with the
use of basic science knowledge appear to be at least equally pervasive in problem-based medical schools that have an integrated curriculum (basic science and clinical content are taught together) as compared with medical schools that partition their curriculum into a basic science section followed by a clinical section (Patel, Groen, & Norman, 1993). Medical problems can be characterized as ill-structured, in the sense that the initial states, the definite goal state, and the necessary constraints are unknown at the beginning of the problem-solving process (Simon, 1973). Cognitive domains or tasks can be characterized on a continuum from well-structured tasks, where the problem space is well delineated, to ill-structured domains, which necessitate considerable structuring of the space of possibilities. Physics is an example of a well-structured domain where the constraints and possible operations are well understood. There are, of course, problem-solving tasks and content areas within any domain that are somewhat less structured. Writing or discourse production can be construed as an ill-structured task at the other end of the continuum, in which there are few initial constraints. In a diagnostic situation, the problem space of potential findings and associated diagnoses is enormous. The problem space becomes defined through the imposition of a set of plausible constraints that facilitate the application of specific decision strategies (Pople, 1982). Plausible constraints are produced, for example, by narrowing the range of possible diagnostic solutions by evoking categories of disorders (e.g., cardiovascular problems) or through the elimination of classes of problems. Given the complexity of the domain of medicine, there is a need for a framework for differentiating between different classes of conceptual entities. 
Evans and Gadd (1989) propose an epistemological framework that differentiates four levels into which clinical knowledge is organized in a medical problem-solving context. Observations are units of information that are recognized as potentially relevant in the problem-solving context; however, they do not constitute clinically useful facts. Findings are composed of observations that have potential clinical significance. Establishing a finding reflects a decision made by a physician that an array of data contains a significant cue or cues that need to be taken into account. Facets consist of clusters of findings that are suggestive of prediagnostic interpretations. They reflect general pathological descriptions, such as aortic insufficiency, or categorical descriptions, such as endocrine problem. They are also interim hypotheses that divide the information in the problem into sets of manageable subproblems and suggest possible solutions. Facets vary in terms of their levels of abstraction. A high-level facet may partition the problem space and may be a reasonable approximation to
a candidate solution. A low-level facet may involve a more local inference that may explain one or two findings and would not advance the problem-solving process to the same extent. Diagnosis is the level of classification that encompasses and explains all levels beneath it. The model is hierarchical, with facets and diagnoses serving both to establish a context in which observations and findings are interpreted, and also to provide a basis for anticipating and searching for confirming or discriminating findings. This framework has been used to code the inferences generated from doctor-patient dialogue during the clinical interview (Kaufman & Patel, 1991). In our earlier research, we conceived of facets as prestored knowledge structures, derived from the experience of solving many cases, much in the same way diagnoses are stored in individual knowledge bases (Patel, Evans, & Kaufman, 1989). There are several problems with such an approach (Groen & Patel, 1988). The first problem is of a representational nature. Facets can subsume diagnoses, in cases where they are of a more general nature (e.g., cardiovascular disorders), and they can be subsumed by diagnoses, in situations where they are a subcomponent of a diagnosis (e.g., the facet of myxedema is subsumed by the diagnosis hypothyroidism associated with myxedema precoma). This precludes a strictly hierarchical ontological scheme. The second problem with regarding facets as prestored knowledge structures is that it is inconsistent with some perspectives on skilled memory. There is substantial evidence that experts develop skilled memory to store and retrieve information rapidly in their domains of expertise (Ericsson & Smith, 1991; Ericsson & Polson, 1988). 
Skilled memory theory proposes that at the time of encoding, experts acquire a set of retrieval cues that are associated in a constructive or functional manner with the information to be stored in memory (Ericsson, Krampe, & Tesch-Romer, 1993; Ericsson & Staszewski, 1989). In a problem-solving situation, an expert can use these retrieval structures to provide selective and rapid access to long-term memory. These retrieval structures are sensitive to domain-specific information and can be used in problem-solving situations to organize stimulus material (e.g., a set of clinical findings) and to compute the intermediate results necessary in finding a solution. The use of retrieval structures was proposed to account for the remarkable ability of chess masters to generate mentally a sequence of intermediate chess positions and mental calculators’ ability to store and compute intermediate results in multidigit multiplication (Ericsson & Staszewski, 1989). In our view, a facet can be construed as a retrieval structure that can be used to access rapidly schemata from long-term memory (LTM) and partition a medical problem into manageable units to facilitate the instantiation of a
diagnostic hypothesis. If a facet can be constructed dynamically in working memory, then subjects can compose an infinite number of combinatorial possibilities, unbounded by the limitations of prestored schemata. The next section summarizes research pertaining to diagnostic reasoning with particular emphasis on our own investigations. To maintain continuity, the methodological approach is discussed in some detail before raising specific conceptual issues.
III. Methodological Approach and Conceptual Issues
We began our studies of medical expertise with investigations of recall and comprehension of medical texts (Patel, Groen, & Frederiksen, 1986). This, in part, explains our commitment to methods of discourse analysis and semantic representation. We have since extended our research both methodologically and conceptually (Groen & Patel, 1988). From a methodological perspective, experimental paradigms and methods of analysis have been developed that are suitable for the study of expertise in medical reasoning and problem solving. On the conceptual front, we have investigated the reasoning processes characteristic of experts and novices. In this regard, we have devoted considerable effort toward investigating the directionality of reasoning and the time-course of hypothesis generation and evaluation. The theoretical perspective we have been developing has consistently attempted to reconcile theories of semantic representation (reflecting the fact that medicine is a semantically complex domain) with a problem-solving/reasoning perspective typical of most expertise research. This approach is also supported by the finding that much of expert reasoning involves the development of an initial problem representation rather than sequential problem-solving operations. In this regard, we have recently attempted to consolidate the construction-integration theory proposed by Kintsch (1988) with a schema-theoretic perspective. We elaborate in detail on this matter in a subsequent section. In studies of problem solving, the primary method of data acquisition is the “think-aloud protocol” (Ericsson & Simon, 1993). In these studies, subjects are instructed to think aloud as they perform a particular task. In a typical straightforward problem-solving task, in which subjects are asked to think aloud as they make a diagnosis, the protocols generated tend to produce unsatisfactory information regarding the nature of the knowledge being used. 
For example, in a routine case, experts tended to produce very sparse protocols which did not provide us with much of a basis for characterizing reasoning patterns. Kuipers and Kassirer (1984) suggest that expert knowledge is so compiled that it is difficult to articulate
intermediate steps. Compiled knowledge refers to knowledge of causal expectations that people compile directly from experience and partly by chunking results from previous problem-solving endeavors (Chandrasekaran, 1991). This is manifested in very short chains of inference by experienced problem solvers. A widely adopted solution has been to use various kinds of probing tasks to elicit a more detailed reasoning process. A probe that we have found useful is a diagnostic explanation task, in which the subject is asked to “explain the underlying pathophysiology” of the patient’s condition (Patel & Groen, 1986). Physicians typically respond to this question by explaining the patient’s symptoms in terms of a diagnosis. The diagnostic explanation task has proven to be useful for the study of medical expertise in verbal tasks; however, it seems to be less of a predictor of expert performance in perceptual domains, such as radiology and dermatology. For example, in radiologic diagnosis, the subjects who provide the best explanations are not necessarily those who are most accurate in detecting abnormalities from visual cues in X rays (Lesgold, personal communication). An experimental procedure we have repeatedly used (Patel & Groen, 1986; Patel, Groen, & Arocha, 1990) involves asking subjects to read a written description of a clinical case for a specific period of time and to provide subsequently a free recall of the case. The subjects are then asked to explain the underlying pathophysiology of the case, without reference to either the text or the previous recall response. Lastly, the subjects are asked to provide a diagnosis. The diagnosis is requested after the diagnostic explanation to give the subject the opportunity to provide a diagnosis during the explanation task, such that the resulting protocol may reflect elements of the solution process. This is referred to as the immediate presentation paradigm. 
In a variation of this paradigm (Joseph & Patel, 1990), subjects are presented clinical information sequentially, one sentence at a time, and are asked to explain the incoming information and suggest a diagnosis. This kind of experimental approach is designated as sequential presentation. Diagnostic reasoning in real-life situations is an interactive task involving a dialogue between a physician and a patient in which the physician gathers the appropriate information.
A. PROPOSITIONAL AND SEMANTIC REPRESENTATIONS
The methods we have used include various techniques of semantic and conceptual analysis, which allow us to characterize the structures of subjects’ knowledge and the fine-grained differences in problem representations. These methods also provide relatively precise measures of coherence in explanations (Groen & Patel, 1988).
We have made use of two kinds of representational formalisms: propositional representations and semantic networks. Intuitively, a proposition is an idea underlying the surface structure of a text. The notion’s usefulness arises from the fact that a given piece of discourse may have many related ideas embedded within it. A propositional representation provides a means of representing these ideas, and the relationships between them, in an explicit fashion. In addition, it provides a way of classifying and labeling these ideas. Systems of propositional analysis (Kintsch, 1974; Frederiksen, 1975) are essentially languages that provide a uniform notation and classification for propositional representations. In all these approaches, as in case grammars, a proposition is denoted as a relation or predicate (usually called the head element) over a set of arguments (frequently referred to as concepts). Sowa’s (1984) system provides another example of a language of this type. Although there are notational differences in the formalisms, the underlying assumption is that propositions correspond to the basic units of the representation of discourse and form manageable units of knowledge representations. The method for generating semantic network representations is described in detail in Groen and Patel (1988). The primary concern is with representing the structure of verbal or written data arising from explanation tasks in which subjects are presented with a situation involving a clinical case and are asked to explain aspects of the situation. The first stage of analysis involves generating a propositional representation of the response protocol. This is then transformed into a semantic network representation. The network consists of propositions that describe attribute information, which form the nodes of the network, and those that describe relational information, which form the links. 
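The two-stage analysis described above (a propositional representation, then a network whose nodes come from attribute propositions and whose links come from relational propositions) might be sketched as follows. This is only an illustrative sketch: the `Proposition` class, the `ATT` head, the `RELATIONAL_HEADS` set, and the `build_network` helper are assumptions introduced here, not the notation of Kintsch, Frederiksen, or Sowa, and the sample propositions are a toy fragment rather than a coded protocol.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical label set for this sketch: heads treated as relational information.
RELATIONAL_HEADS = {"CAU", "COND"}


@dataclass(frozen=True)
class Proposition:
    head: str              # relation or predicate (the "head element")
    args: Tuple[str, ...]  # arguments (the "concepts")


def build_network(props: List[Proposition]) -> Tuple[Set[str], List[Tuple[str, str, str]]]:
    """Attribute propositions become nodes; relational propositions become labeled links."""
    nodes: Set[str] = set()
    links: List[Tuple[str, str, str]] = []
    for p in props:
        if p.head in RELATIONAL_HEADS:
            source, target = p.args  # binary dependency relation
            links.append((source, p.head, target))
        else:
            nodes.add(p.args[0])
    return nodes, links


# Toy fragment of a response protocol (illustrative only).
protocol = [
    Proposition("ATT", ("tachycardia",)),
    Proposition("ATT", ("shock state",)),
    Proposition("ATT", ("upsurge in blood pressure",)),
    Proposition("ATT", ("flame-shaped hemorrhage",)),
    Proposition("COND", ("tachycardia", "shock state")),
    Proposition("CAU", ("upsurge in blood pressure", "flame-shaped hemorrhage")),
]
nodes, links = build_network(protocol)
```

Keeping the link label between source and target makes it straightforward to add further relation types (algebraic, identifying, categorical) to the label set later.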
We then distinguish between attributes that appear in the description of the clinical case and those that are spontaneously generated by subjects. The semantics of the links are derived from Frederiksen’s (1975) propositional grammar. The primary relations of interest in these networks are binary dependency relations, specifically, causal, conditional, and Boolean connectives (and, alternating or, and exclusive or relations). In addition, algebraic relations (e.g., greater than), identifying relations, and categorical relations (i.e., category membership, part-whole relations) can be expressed. We also distinguish between the source of a process and the result of a process. Uncertainty in relations can be represented by modal qualifiers (e.g., can) and truth values can be indicated when they deviate from the default truth value (positive). The causal and conditional relationships that form a major part of the semantic network arising from our data closely resemble the rules in an expert system such as NEOMYCIN (Clancey, 1988). This system consists of data-directed and
hypothesis-directed rules, which can be viewed as equivalent to forward and backward reasoning. It can be shown that there is an isomorphism between this semantic network and a set of related production rules (Patel & Groen, 1986). To be more precise about our analysis, some notions from graph theory (Sowa, 1984) are used to divide the semantic network into components. These are described in considerable detail in Groen and Patel (1988). In summary, a graph is defined as a nonempty set of nodes and a set of arcs leading from a node N to a node N'. A graph is connected if there exists a path, directed or undirected, between any two nodes. In a directed path, every node has a source and a target connecting it to its immediate successor. If the graph is not connected, then it breaks down into disjoint components. A semantic network is a directed graph formed by nodes and labeled connecting paths. Nodes may represent either clinical findings or hypotheses, whereas the paths represent directed connections between nodes. These networks provide a relatively precise means for characterizing the directionality of reasoning. Forward reasoning corresponds to an oriented path from data to a hypothesis. Thus, forward-directed rules are identified whenever a physician attempts to generate a hypothesis from the findings in a case. Backward-directed rules correspond to an oriented path from a hypothesis to data. Pure forward reasoning refers to a network where all paths are oriented from data to hypothesis. Pure backward reasoning refers to a network where all paths are oriented from hypothesis to data. Our use of the term coherence is also defined in graph-theoretic terms, as the connectivity of the graph. Coherence can be estimated from the number of disjoint or disconnected components in a semantic network representation. In our analysis, we make use of two notions of coherence. Global coherence refers to the internal consistency of the overall explanation. 
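The graph-theoretic notions just summarized can be illustrated with a small sketch: counting disjoint components as a rough coherence estimate, and labeling each arc by its orientation between data and hypotheses. The function names, the simplification of classifying single arcs rather than whole paths, and the toy network (including the decision to treat "upsurge in blood pressure" as a hypothesis node) are assumptions made for illustration only.

```python
from collections import defaultdict


def count_components(nodes, arcs):
    """Number of disjoint components, ignoring arc direction."""
    neighbors = defaultdict(set)
    for source, target in arcs:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbors[node] - seen)
    return components


def direction(arc, data_nodes, hypothesis_nodes):
    """Label a single arc: data -> hypothesis is forward, hypothesis -> data is backward."""
    source, target = arc
    if source in data_nodes and target in hypothesis_nodes:
        return "forward"
    if source in hypothesis_nodes and target in data_nodes:
        return "backward"
    return "other"


# Toy network (illustrative assumption, not a transcription of any figure).
data_nodes = {"tachycardia", "fall in blood pressure", "flame-shaped hemorrhage"}
hypothesis_nodes = {"shock state", "upsurge in blood pressure"}
nodes = sorted(data_nodes | hypothesis_nodes)
arcs = [
    ("tachycardia", "shock state"),
    ("fall in blood pressure", "shock state"),
    ("upsurge in blood pressure", "flame-shaped hemorrhage"),
]

n_components = count_components(nodes, arcs)
labels = [direction(a, data_nodes, hypothesis_nodes) for a in arcs]
pure_forward = all(label == "forward" for label in labels)
```

In this toy case the network splits into two components and mixes forward and backward arcs, so it would count as neither fully connected nor pure forward reasoning.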
An explanation exhibiting maximum global coherence would include connections between all subcomponents of the problem without any apparent contradictions. Local coherence refers to the consistency in a component that explains parts of the clinical problem. An explanation that exhibits local coherence without global coherence would include isolated aspects of the problem that are well understood, but not interrelated. It is important to note that coherency does not necessarily imply accuracy. The use of these definitions is illustrated in Fig. 1. This is the network representation of the diagnostic explanation by a psychiatrist of a case in cardiology described by Patel, Groen, and Arocha (1990). The case is not within the subject's domain of specialization, and the diagnosis of a shock state is inaccurate.
Fig. 1. Semantic representation of explanation protocol of a psychiatrist in a cardiology case. The patient has been reacting to stress likely by his injecting a drug (or drugs), which has resulted in tachycardia, a fall in blood pressure, and elevated temperature. These findings are due to the toxic reaction caused by the injected drugs. He is in or near shock. The flame-shaped hemorrhage may represent a sequel of an upsurge in blood pressure possibly as a result of his injection of drugs. COND: conditional relation; CAU: causal relation; RSLT: resultive relation; boxed nodes, text cues; separately marked nodes, diagnostic hypotheses; arrows, directionality.
Because of this, the representation is lacking in coherence and contains one possible inconsistency, that is, a patient cannot have both high and low blood pressure at the same time. Furthermore, the underlying mechanism that explains the signs and symptoms in this patient is attributed to toxicity of drugs that are injected by the patient as a reaction to stress. This is an inaccurate description of the patient problem. However, this makes it useful for illustrative purposes because it contains a mixture of forward and backward reasoning. The diagram consists of nodes linked by arrows. The arrows have labels indicating the relationship between nodes. The two most important are CAU:, which means that the source node causes the target (e.g., upsurge in blood pressure causes flame-shaped hemorrhage), and COND:, which means that the source node is an indicator of the target (e.g., tachycardia indicates shock state). The arrows labeled CAU: represent causal relations, whereas those labeled COND: represent conditional relations. A difference between the two is in the strength of implication: COND: expresses a directional conditionality, P1 → P2, that implies if proposition P1 is true then P2 is true. CAU:, P1 ⊃ P2, is a stronger relation indicating that one variable, P2, is a function of another, P1. Although we can define these terms formally, it is important to note that subjects’ attributions of causality or conditionality may deviate from appropriate usage. We code protocols on the basis of subjects’ attributions even if there are
apparent inconsistencies. The nodes containing facts from the problem text are enclosed in boxes. The diagram also contains three “AND” nodes (conjunction), indicated by forks in the arrows. One of these is a target (tachycardia, fall in blood pressure, and elevated temperature indicate shock state). The remaining two nodes are sources (injection of drug causes tachycardia, fall in blood pressure, and elevated temperature). If we compare the text and the semantic network, it is apparent that the network does not necessarily reflect the temporal sequence of concepts referred to in the text. For example, the same concept can be referred to repeatedly, but it is represented as inferences on a single node. This is evident in the example. This graph contains several different lines of reasoning, which can be rendered precise by introducing a few more concepts from graph theory. We define a “cut-point” of a graph to be a node that, when removed together with all arrows leading to or from it, causes a graph to separate into disjoint components that are themselves graphs. Conversely, if two graphs have a common node, then their join is the new graph formed by joining them at that node. This suggests that an algorithm for finding the components can be defined in terms of generating, at each cut-point, two graphs whose join is the original graph (intuitively, we obtain two graphs by removing the cut-point and then reattaching it into each graph). What results from applying this algorithm is a hierarchy of components, some of which are uninteresting and some of which actually distort the logic of the process we are attempting to represent. Because of this, we prohibit the application of the algorithm in the following two cases: (a) AND nodes, and (b) graphs that consist of a single path without any branches. Carrying out this procedure results in the minimal components shown in Fig. 2. Although other components exist, they are all joins of these minimal components. 
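The cut-point procedure described above can be sketched with a brute-force search: remove each candidate node, count the resulting pieces, and reattach the cut-point to each piece. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the input graph is assumed to be connected, exclusion (b) is approximated by skipping any graph in which no node has more than two neighbors, and the toy network only borrows labels from the example rather than transcribing the figure.

```python
def components_without(nodes, arcs, removed=None):
    """Connected components (direction ignored) after dropping `removed` and its arcs."""
    remaining = [n for n in nodes if n != removed]
    neighbors = {n: set() for n in remaining}
    for source, target in arcs:
        if removed in (source, target):
            continue
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen, components = set(), []
    for start in remaining:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        components.append(group)
    return components


def cut_points(nodes, arcs, and_nodes=frozenset()):
    """Nodes whose removal separates the graph, honoring the two exclusions."""
    degree = {n: 0 for n in nodes}
    for source, target in arcs:
        degree[source] += 1
        degree[target] += 1
    if max(degree.values(), default=0) <= 2:
        return []  # exclusion (b): no branching anywhere, so nothing to split
    base = len(components_without(nodes, arcs))
    return [n for n in nodes
            if n not in and_nodes  # exclusion (a): never cut at an AND node
            and len(components_without(nodes, arcs, removed=n)) > base]


def split_at(nodes, arcs, cut):
    """Remove the cut-point, then reattach it to each resulting piece."""
    pieces = []
    for group in components_without(nodes, arcs, removed=cut):
        piece_nodes = group | {cut}
        piece_arcs = [(s, t) for s, t in arcs if s in piece_nodes and t in piece_nodes]
        pieces.append((piece_nodes, piece_arcs))
    return pieces


# Toy network (illustrative assumption).
nodes = ["injection of drug", "tachycardia", "fall in blood pressure",
         "shock state", "upsurge in blood pressure", "flame-shaped hemorrhage"]
arcs = [("injection of drug", "tachycardia"),
        ("injection of drug", "fall in blood pressure"),
        ("tachycardia", "shock state"),
        ("fall in blood pressure", "shock state"),
        ("injection of drug", "upsurge in blood pressure"),
        ("upsurge in blood pressure", "flame-shaped hemorrhage")]

found = cut_points(nodes, arcs)
pieces = split_at(nodes, arcs, "injection of drug")
```

Joining the pieces returned by `split_at` at the shared cut-point recovers the original arcs, which mirrors the remark that larger components are joins of the minimal ones. For large networks a linear-time articulation-point algorithm would be preferable to this repeated traversal.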
The directionality of these components is mixed. The component where the inference is made in the direction of data to hypothesis is coded as forward-driven inference. For example, tachycardia, fall in blood pressure, and elevated temperature indicate the diagnosis of shock state. Conversely, a component where the inference is made in the direction from hypotheses to data is coded as backward-directed inference. An example here is injection of drug causes flame-shaped hemorrhage. We can also obtain quantitative measures of directionality of reasoning by coding inference as forward or backward. In a given protocol, we can determine the percentage of forward and backward reasoning inferences. To have a reference point that indicates a certain standard of performance, we develop a reference model for each case that is presented to the subjects.
Fig. 2. Components of semantic representation given in Fig. 1. COND: conditional relation; CAU: causal relation; RSLT: resultive relation; boxed nodes, text cues; separately marked nodes, diagnostic hypotheses; arrows, directionality.
The reference model reflects an idealized knowledge representation that is constructed with the assistance of domain experts and pertinent medical texts (Patel & Groen, 1986; Joseph & Patel, 1990; Patel, Evans, & Kaufman, 1989). It serves as a benchmark or gold standard for certain types of analyses we perform on subjects’ response protocols. The construction of a reference model is similar in nature to a knowledge engineering task used in the development of expert systems. The goal is not necessarily to develop a faithful cognitive representation of an expert; rather, it is to develop something like an ideal model of the problem. It is an iterative process that begins with asking an expert not participating in our studies to explain the findings in the case. We can then consult other sources to verify and add to this information, at which point we can then go back to the expert and probe for specific details.
B. NOVICE-EXPERT CONTINUUM
As noted earlier, expertise reflects a continuum from genuine beginner to the highly trained specialist. The time period from entering medical school to becoming a board certified specialist is, on average, 10 years. This provides a basis for differentiating between subjects at various levels in the acquisition of expertise. Expert physicians have extensive knowledge of medicine (acquired through medical school and residency training), but
only a relatively narrow area of specialization. It is therefore possible to distinguish between specific (e.g., cardiology) and generic (e.g., general medicine) expertise. An individual may possess both, or only generic expertise. Early medical training, through medical school and internship, involves the acquisition of generic expertise. At the point physicians enter a residency training program they can choose to specialize, in which case they would acquire specific expertise as well as continue to develop generic expertise. The development of both kinds of expertise overlaps considerably. However, a medical resident would continue to acquire generic expertise through rotations in areas outside his or her area of specialization. To clarify some terminology, we provide the following definitions:
Novice: An individual who has only everyday knowledge of a domain or one who has the prerequisite knowledge assumed by the domain.
Intermediate: An individual who is above the beginner level but below the subexpert level.
Subexpert: An individual with generic knowledge but inadequate specialized knowledge of the domain.
Expert: An individual with a specialized knowledge of the domain.
A few things should be noted about these definitions. First, it is assumed that the domain is sufficiently well defined that it is possible to distinguish between generic and specific knowledge. Second, the notion of intermediate is not well defined at all. We will clarify this notion later. Every expert (e.g., endocrinologist) is a subexpert outside of his domain of specialization (e.g., in cardiology).
IV. Expert Diagnostic Reasoning
A. DIRECTIONALITY OF REASONING
The innovative research of Elstein, Shulman, and Sprafka (1978) represented the first efforts to use information processing theories to study medical cognition. This work was enormously influential in that Elstein et al. were the first researchers to use experimental methods and theories of cognitive psychology to study clinical reasoning. This work, which commenced in the late 1960s, was strongly influenced by the early work of Newell and Simon (1972) on simple knowledge-lean tasks such as the Tower of Hanoi and Cryptarithmetic. The principal goal of the studies by Elstein and colleagues was to characterize the general aspects of medical problem solving in terms of the cognitive processes used to acquire and manipulate patient data. The emphasis was on process parameters, such
as the number of hypotheses generated, rather than on content or domain knowledge. On the basis of this research, Elstein et al. (1978) suggested that clinical reasoning is characterized by a hypothetico-deductive process. The model identifies four stages in the diagnostic process: cue acquisition, hypothesis generation, cue interpretation, and hypothesis evaluation. This process begins with attention to initial cues which lead to the rapid generation of a few select hypotheses, on average five in a single workup. In the process of cue interpretation, each subsequent cue is designated as positive, negative, or noncontributory with respect to the hypotheses under consideration. The findings did not discriminate between physicians judged by their peers to be superior physicians and other physicians (Elstein et al., 1978). In a related study, using a very similar approach, Barrows, Feightner, Neufeld, and Norman (1978) found no differences between students and clinicians in their use of diagnostic strategies, except for the quality of the hypotheses considered and the accuracy of diagnosis. The characterization of hypothetico-deductive reasoning, essentially a form of backward reasoning, as an expert strategy seemed anomalous in view of the fact that it was widely regarded as a weak and inefficient method of problem solving (Groen & Patel, 1985), more characteristic of novices in domains such as physics (Simon & Simon, 1978). As a cognitive model, it necessitates the use of subgoals to test and evaluate each hypothesis. This makes unreasonable demands on working memory. Subsequent findings in the domain of medical cognition suggested that experts have an elaborate, highly structured knowledge base capable of supporting highly efficient reasoning (Feltovich et al., 1984; Bordage & Zacks, 1984). In the domain of radiology, Lesgold and colleagues (1988) found that expert radiologists have superior perceptual pattern recognition capabilities as compared to novices. 
This allows the expert to rapidly detect gross anatomical abnormalities. Gale and Marsden (1983) provided an additional insight when they suggested that Elstein et al. (1978) had erred when they collapsed all hypotheses into the category of diagnostic hypothesis. They suggested that if it is recognized that most hypotheses are prediagnostic interpretations (e.g., potential cardiac problem) rather than actual diagnostic hypotheses, then the diagnostic reasoning process would appear to involve a more incremental partitioning of the problem space than that suggested by the hypothetico-deductive model. In fact, this issue is still a point of controversy. Patel and Groen (1993) demonstrated that if you do not discriminate between diagnostic and subdiagnostic hypotheses, then you are likely to interpret the same data in a very different manner and reach fundamentally different conclusions concerning the diagnostic
reasoning process (cf. Elstein et al., 1993). This distinction is crucial in detecting differences in reasoning strategies between subjects of different levels of expertise. Patel and Groen (1986) were motivated by the seemingly anomalous description of the diagnostic reasoning process to characterize precisely the directionality of reasoning used by expert physicians in a diagnostic explanation task. The results indicated that expert cardiologists’ explanations could be accounted for by a forward-reasoning strategy that involved moving from propositions in the stimulus text, to conditions that suggested components of the diagnosis, to an accurate diagnostic solution. This is in contrast to cardiologists who misdiagnosed the case, who used a backward reasoning strategy that involved generating and testing hypotheses and the invocation of causal explanations. As discussed previously, forward reasoning is usually contrasted with backward reasoning, where the problem solver works from a hypothesis regarding the unknown back to the given information. It might be noted that the distinction is frequently made, perhaps more generally, in terms of goal-based (backward) versus knowledge-based (forward) heuristic search (e.g., Hunt, 1989). We phrase the distinction in terms of data and hypotheses because it relates more clearly to the types of empirical paradigms in which we are most interested. The distinction between forward and backward reasoning is closely related to another distinction made in problem-solving research between strong methods, which are highly constrained by the problem-solving environment, and weak methods, which are only minimally constrained. In fact, the two distinctions are logically independent (Hunt, 1989). Forward reasoning is highly error prone in the absence of adequate domain knowledge because there are no built-in checks on the legitimacy of the inferences that subjects make. 
Pure forward reasoning is only successful in constrained situations, where knowledge of a problem can result in a chain of inferences from the initial problem statement to the problem solution. In contrast, backward reasoning is slower and may make heavy demands on working memory, because one has to keep track of such things as goals and hypotheses; it is therefore most likely to be used when domain knowledge is inadequate. Backward reasoning is usually a symptom of a weak method. It is important to note that the term weak is being used in the technical sense of “weak constraints” as opposed to “strong constraints.” This does not imply that it is a weak way of solving a problem. In fact, a weak method is preferable when relevant prior knowledge is lacking, as is likely to be the case with anyone but an expert. We view our research program as a theory-building enterprise. Experimental research in psychology is somewhat skewed toward the use of
Vimla L. Patel, Jose F. Arocha, and David R. Kaufman
quantitative methods for confirming or disconfirming hypotheses that are assumed to be well formulated, rather than toward providing a foundation for theory development which can then provide a basis for hypothesis testing (diSessa, 1991). The work presented in this chapter focuses on in-depth qualitative analysis of individual protocols, as well as on quantitative analysis of group differences. The goal of this work is to establish and explain empirical phenomena pertaining to reasoning, the structure of knowledge, and the nature of expertise, rather than to develop broad categorical generalizations.

B. EXPERT REASONING AND DOMAIN SPECIFICITY

This section focuses on a series of investigations concerning the effects of problem difficulty and relevance of domain knowledge on expert reasoning, and examines how these effects differ as a function of immediate versus sequential problem presentation. In the last few years, the notion has become increasingly accepted in the study of expert-novice differences that expert reasoning is schema driven (Koedinger & Anderson, 1990; Hunt, 1989). A schema is assumed to be a structure that represents generic knowledge and is instantiated with specific new information in a given situation (Brewer, 1987). Schemata, which are built up as a function of experience within a domain of expertise, guide a subject to key elements in a problem and serve to filter out irrelevant information. In a complex medical problem, there is an inordinate number of potentially significant findings. An experienced physician can rapidly access appropriate schemata and reduce a problem to something manageable. The notion of schema is often used in cognitive psychology to explain phenomena of pattern recognition or something that is triggered by stimulus conditions. However, medical problem solving, except in the most routine sort of problems, necessitates more than a pattern recognition capability.
We have to allow for composability of schemata in real time, not just in terms of fixed structure in which slots are filled. Schemata can be viewed as a set of pointers from a problem encoding, residing in short-term or working memory, to relevant knowledge in long-term memory. Schemata also filter out the volume of irrelevant information present in the space of a typical medical problem. In the context of a semantic network representation, a schema can be described in terms of a cluster of connected propositions (nodes and labeled links), for example, diagnostic hypotheses, and the clinical findings that are explained by the hypothesis (van Dijk & Kintsch, 1983). As discussed earlier, diagnostic schemata are accessed from LTM, and facets or subdiagnostic hypotheses can be constructed or composed dynamically. It has been suggested that the rapid evocation of appropriate schemata is a hallmark
of expertise (Chi et al., 1981). Schemata serve a dual purpose of organizing stimulus material through hypothesis generation and guiding the search for further information. In our research, we assume that the generation of a hypothesis is indicative of the access or construction of a schema or multiple schemata. Studies of expertise in medicine and other domains have indicated that the early generation of diagnostic hypotheses is an important indicator of the existence of schemata. The rapidity with which the initial hypotheses are generated constitutes a striking feature of the behavior of experts. With minimal information, the expert unhesitatingly selects a single working hypothesis. Studies have shown that the earlier a good hypothesis set is created, the more predictive it is of the quality of the solution (Elstein et al., 1978). In the case of routine problems, this is accompanied by forward reasoning in which the direction of inferencing proceeds from data to hypotheses (Patel & Groen, 1986). This finding is consistent with expert schema-driven reasoning and results in a number of important conclusions that appear to apply across domains. The most important, for the purposes of this chapter, is that routine problems are solved by experts via a data-driven process of forward reasoning, which directly infers solutions from the facts in the problem without backtracking or generating subgoals. This raises the question of what variables might affect schema formation. Certain methods of problem presentation are more likely to elicit schemata than others. If the entire context of the problem is presented simultaneously to the subject as continuous text, then the text is more likely to generate a schema. This form of text presentation will be termed the immediate paradigm. On the other hand, if the information is split into segments and sequentially presented, such schemata cannot be initially accessed.
Instead, the subject must generate a schema on the basis of partial information and this schema must be modified or abandoned as the information becomes more complete. This form of text presentation will be called the sequential paradigm. An additional impediment to schema formation results from short-term memory limitations. Using standard assumptions in this regard, it can be assumed that propositions will only remain in short-term memory (STM) for a limited amount of time. If they vanish from STM, then the pointers can be expected to vanish with them, unless some indexing process exists that allows a new proposition to be substituted for the old one. We know from other domains that experts are highly skilled at developing temporary memory structures for maintaining intermediate results, such as unaccounted-for but significant findings (Ericsson & Staszewski, 1989). This memory encoding and retrieval ability is critical for maintaining and evaluating hypotheses in working memory, especially in the sequential paradigm.
1. Hypothesis Formation and Task Difficulty
Two factors are associated with task difficulty. The first is the relevance of the expert’s domain knowledge. If this domain knowledge is lacking, then schemata might point to irrelevant knowledge or might be fragmented. However, a subexpert can draw on a vast medical knowledge base, which can partially compensate for the absence of relevant domain knowledge. Thus, it is useful to distinguish between generic and specific expertise. Specific expertise refers to the possession of knowledge specific to a domain. Thus, board-certified cardiologists have specific expertise about cardiology; they do not necessarily have specific expertise about endocrinology. The reverse, obviously, holds true for endocrinologists. However, both cardiologists and endocrinologists know a considerable amount about clinical medicine by virtue of the fact that they are physicians trained in general medicine. This latter is what we mean by generic expertise. Empirical support for the notion that it is an important factor comes from results from our laboratory (Patel & Groen, 1991a) that indicate a ceiling effect on recall of clinical cases. Both subexperts and experts can recall all or most relevant case material. Analogous to the results in other domains, such as chess (Charness, 1991), experts recall clinical cases far better than novices (Patel, Groen, & Frederiksen, 1986). However, experts (regardless of domain) tend to recall all information relevant to the diagnosis of a clinical case. Moreover, this enhanced recall is unrelated to diagnostic accuracy. Further indications of generic expertise may come from a comparison with intermediates (i.e., subjects who are not novices, but who have not reached the expert level). Intermediates perform considerable irrelevant search of hypotheses, order irrelevant tests, and generally show an inability to cope economically with new information (Patel, Groen, & Patel, in press).
Although experts outside their domain may be viewed as intermediates, they do not perform the same as intermediates. Experts do not engage in such extraneous search, even when performing outside their domain. These results point to the need to differentiate between generic and specific experts, rather than consider the former as a type of intermediate. The second factor is the intrinsic difficulty of the problem regardless of domain knowledge. As already indicated, it is well known that experts solve routine problems by forward reasoning (e.g., Larkin et al., 1980). In a study designed to extend the findings of Patel and Groen (1986), Patel, Groen, and Arocha (1990) examined the extent to which the pattern of forward reasoning would break down when subjects were confronted with more difficult cases. They varied task difficulty by using two texts describing the history, physical examinations, and laboratory tests of two
patients. The first text described a case of a 63-year-old woman who suffered from an endocrine disorder called Hashimoto’s thyroiditis precipitated to myxedema precoma. The second clinical text described a case of a 62-year-old man who suffered from a cardiac disorder. This case is more difficult than the endocrinology case because there is considerable overlap of schematic knowledge, in the sense that the data are suggestive of multiple competing diagnostic possibilities. We will describe this case in more detail later. A glossary of medical terms used in this chapter is given in the Appendix (Section VII). According to the expert reference model of the Hashimoto’s thyroiditis case, the diagnosis can be decomposed into three diagnostic components. The most general and prototypical component is hypothyroidism. This is indicated by the textual cues suggesting fluid accumulation and decreased thyroid function. The second component, myxedema, indicates that the patient is in an advanced state of hypothyroidism. The clinical cues that constitute the third component suggest a very specific origin for the disease process: the autoimmune process known as Hashimoto’s thyroiditis. Cues from each of the three components need to be recognized to accurately diagnose the problem. It was found that the existence of loose ends in the endocrine problem tended to disrupt the pattern of forward reasoning, even when the diagnosis was correct. Loose ends are anomalous cues that are not directly related to the main diagnosis. Figure 3 gives the semantic network representation of the protocol of an endocrinology practitioner explaining the endocrinology problem. The information used from the given clinical text is shown in solid blocks and the diagnostic components that are generated or inferred are given in dashed blocks.
The other information consists either of inferences generated from the given information, which are intermediary constructs in generating diagnostic components, or of inferences generated to explain a given piece of clinical information. The input nodes are those that correspond to propositions in the text. The relations that connect the propositions are linking propositions (e.g., CAU:, a causal relation, or COND:, a conditional relation). In much of our analysis, CAU: refers to some statement regarding the functioning of the underlying pathophysiology. It is convenient to consider these rules as production (if-then) rules. The causal rules can be reversed into conditional rules, but the conditional rules leading to the actual diagnosis cannot be reversed in this fashion. For example, in Fig. 3, the causal rule hypoventilation leads to respiratory failure implies the conditional rule, if respiratory failure then hypoventilation. On the other hand, the purely conditional rule if vitiligo then autoimmune thyroidism cannot be reversed because autoimmune thyroidism does not cause vitiligo.
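The asymmetry between causal and conditional rules can be illustrated with a short Python sketch. This is a hedged illustration rather than the coding scheme actually used for the protocols: the `Rule` class, its field names, and the function `reverse_causal` are hypothetical names introduced only for this example, and the two rules are the ones quoted in the preceding paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    relation: str    # "CAU" (causal) or "COND" (conditional)
    antecedent: str
    consequent: str


def reverse_causal(rule: Rule) -> Optional[Rule]:
    """Turn the causal rule 'A leads to B' into the conditional rule
    'if B then A'. A purely conditional rule has no such reversal,
    so None is returned for it."""
    if rule.relation != "CAU":
        return None
    return Rule("COND", rule.consequent, rule.antecedent)


# The two examples given in the text.
causal = Rule("CAU", "hypoventilation", "respiratory failure")
conditional = Rule("COND", "vitiligo", "autoimmune thyroidism")
```

Applying `reverse_causal` to the first rule yields the conditional rule from respiratory failure to hypoventilation, whereas applying it to the second yields nothing, which is the asymmetry the passage describes.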
Fig. 3. Semantic network representation of the protocol of an endocrinology practitioner explaining the endocrinology problem. Adapted from Patel, Groen, & Arocha (1990, p. 4). Solid blocks, text cues; dashed blocks, diagnostic components.
The explanation provided by the endocrinologist in Fig. 3 presents very little text information and it is constructed to justify a diagnosis. There is one large component with forward reasoning and two small components with backward reasoning to explain the textual cues. The large component consists of three subcomponents:

Subcomponent 1
patient's current condition  --(COND: leads to)-->  autoimmune thyroiditis

The given information (vitiligo, thyromegaly [enlargement of thyroid], and progressive decrease in thyroid function) indicates a diagnosis of autoimmune thyroiditis. This is a forward-directed rule from the given information to a diagnostic inference.

Subcomponent 2
autoimmune thyroiditis and iodine  -->  further blockage of thyroxine release

Here, the consequence of the rule in subcomponent 1, autoimmune thyroiditis, together with a new piece of given information, administration of iodine mixture, indicates further blockage of thyroxine release. This is also a forward-directed rule, where the given information is used to generate an inference.

Subcomponent 3
blockage of thyroxine release and galactorrhea  -->  myxedema

Once again a new piece of given information, galactorrhea, is used together with the consequence of the rule in subcomponent 2 to generate a diagnostic inference of myxedema. In the node-link structure, the input nodes are those that correspond to propositions in the text. When one of these nodes is “fired,” the node to which it is linked registers this fact. However, because there is an AND-node, it does not fire until all its other antecedents fire. In terms of the graph theoretic interpretation, this results in a coherent semantic network leading to the correct diagnosis. The two additional components of Fig. 3 are generated in the direction of the causal rule, where the given information respiratory failure and hyponatremia are explained using information that is inferred, such as hypometabolic state and inappropriate diuretic hormones, respectively. This is termed backward reasoning, where the direction of inferences is toward the explanation of given information. In the explanation generated by the endocrinologist, the direction of reasoning is completely forward except at the very end of the protocol, where a problem related to a slowed metabolism (hypometabolic state) is explained in terms of a possible outcome (respiratory failure), and the existence of a low serum level of sodium ions (hyponatremia) is explained
in terms of impaired water excretion. There is one large component with forward reasoning leading to a diagnosis and two small components with backward reasoning to explain the anomalous textual cues. The amount of information unaccounted for (anomalous data) increased as experts moved out of their domains. A cardiologist practitioner’s explanation of the same endocrine problem led to a greater number of loose ends, although the main diagnostic component was still generated by forward reasoning, as shown in Fig. 4. A greater use of textual information is seen in this explanation than in the explanation generated by the expert in Fig. 3. The relational network in Fig. 4 can be decomposed into a number of components, the first of which is generated by the use of forward-directed reasoning:

Component 1
patient’s current condition  --(COND:)-->  long standing primary hypothyroidism
A list of given clinical signs and symptoms regarding the patient leads the physician to generate a diagnostic inference of long standing primary hypothyroidism. The other four components are all generated by backward-directed reasoning, where the given information, such as edema, pleural effusion, skin pallor, EKG changes, low body temperature, and so on, is explained with the use of inferred information. Thus the direction of reasoning is from inference to given information. These four components constitute loose ends because they are not connected to the main diagnostic component. It should be noted that although the diagnosis provided is accurate, it is only partial. The tasks used in the preceding studies were explanation-based in the sense that the subjects read the complete text (i.e., immediate presentation) before explaining the problem. Joseph and Patel (1990) studied the performance of experts and subexperts on the endocrine case using sequential rather than immediate presentation. Their results showed no significant differences between the groups in terms of selection of relevant and critical cues from the case. The experts generated accurate diagnostic hypotheses early in the problem encounter and spent the rest of the time evaluating (confirming and refining) the diagnosis by explaining the patient cues. Although the subexperts also generated the accurate diagnosis, they did so later in the case presentation. These subjects had difficulty in evaluating the hypotheses against the given information, which resulted
Fig. 4. Semantic network representation of the protocol of a cardiology practitioner explaining the endocrinology problem. Solid blocks, text cues; dashed blocks, diagnostic components.
in the inability to discriminate between and eliminate alternative diagnoses. This is consistent with the finding that if an expert includes the correct diagnostic hypothesis in the initial hypothesis set, the subsequent evaluation of data is almost always directed at the confirmation of the hypothesis, rather than the generation of any alternative possibilities (Patel, Evans, & Kaufman, 1989). This is equivalent to generating an accurate diagnosis using pure forward reasoning when all the information is given at one time. However, if the diagnosis is not accurate, then at the evaluation phase of the problem solving, alternative diagnoses may be generated or some additional patient cues will have to be explained. This results in backward reasoning. The most surprising aspect of this result was that sequential presentation did not have a disruptive effect. In fact, the diagnostic accuracy of subexperts was higher in the sequential than in the immediate presentation. The main difference was that experts generated relevant hypotheses sooner than subexperts. However, this may have been due to the relatively simple nature of the endocrine problem. This leads us to consider the effects of sequential presentation of the considerably more difficult cardiology problem. In a recent study presented at the 1992 Meeting of the Psychonomic Society, Patel, Groen, and Arocha tested eight senior physicians (four cardiologists and four endocrinologists) in the sequential presentation task. Given that the task was the diagnosis of a patient with a cardiac disorder, the expert group consisted of four cardiologists and the subexpert group consisted of four endocrinologists. The physicians were all board-certified, practicing physicians with 5 to 10 years of experience in their respective fields. The stimulus text described a cardiology problem, based on a real patient and modified by a cardiologist for the purposes of this study.
This problem is identical to the one used in Patel, Groen, and Arocha (1990). The clinical information in the case was arranged in the typical order of the patient’s medical history, findings from physical examination, and X-ray and laboratory test results. The problem describes the case of a 62-year-old man who was diagnosed as having pericardial effusion with pretamponade. This is a condition in which there is a compression of the heart produced by the accumulation of fluid in the pericardial sac (i.e., a fibrous sac that surrounds the heart) to the extent that the normal expansion of the heart is prevented. The fluid may result from the rupture of a blood vessel of the heart. To diagnose the case, the physician must decide whether the problem is caused by a failure of the left side or the right side of the heart, and then identify the presence of pericardial effusion (i.e., accumulation of fluid within the pericardial sac) and cardiac tamponade (i.e., compression of the heart by
the accumulation of fluid). Determining the actual causal process of specific heart failure (i.e., right or left) is a difficult task because many of the clinical features are common to different diagnostic possibilities. There are, however, a few cues that either serve to rule out alternative diagnoses or are not related to the main diagnosis. Based on the reference model, five major (sub)diagnostic components or facets were identified in the case that would lead to an accurate diagnosis. These include right heart failure, left heart failure, cardiac tamponade, hepatic congestion (accumulation of blood in the liver vessels due to lack of circulation), and pericardial effusion. This information was used to evaluate the subjects’ interpretations of the problem. The basic procedure was similar to that used by Joseph and Patel (1990). Each subject was tested individually in his or her office at the hospital. Before the presentation of the case, subjects were given a short practice session to familiarize themselves with the experimental apparatus and procedure. The stimulus material was presented to subjects on a microcomputer one segment at a time. Subjects controlled the rate of presentation by pressing the mouse button, whereupon each segment was replaced by the next on the display. However, the sequence of information was fixed and not under the subjects’ control. It was not possible to access information presented in prior segments at any time during the presentation of the case. Subjects were instructed to verbalize their thoughts about the role and importance of the information in each segment in reaching the correct diagnosis. After presentation of the entire case, subjects were asked to provide a summary of the case and then to offer their final diagnoses.
The protocols were analyzed in terms of (a) diagnostic accuracy, (b) the number and type of hypotheses generated, and (c) the representation of the problem over the time-course, using a combination of protocol and discourse analysis techniques. Protocol analysis allows one to study the problem-solving moves in relation to transitions in knowledge states. Discourse analysis allows one to study detailed semantic descriptions that capture complex relations in the protocols and allows characterization of the use of specific knowledge, the nature of inferences, and the overall pattern of reasoning. It also provides us with a basis for measuring coherence in subjects’ explanations. Diagnostic accuracy was assessed in terms of specific diagnostic components that were either included or not included in the verbalization process. In addition, a diagnosis was characterized as accurate when it was completely correct (i.e., all of its components were included), partially accurate when some components of the diagnosis were included, and inaccurate when no diagnostic components were included.
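The three-way accuracy classification just described can be written out as a small Python sketch. It is a literal reading of the three categories and is offered only as an assumption-laden illustration: the function name `classify_diagnosis` is hypothetical, the reference set is assumed to be non-empty, and the actual coding in these studies may have weighted some components more heavily than others.

```python
def classify_diagnosis(reference_components, generated_components):
    """Classify a diagnosis by how many reference components it includes.

    accurate            all reference components were generated
    partially accurate  some, but not all, were generated
    inaccurate          none were generated
    """
    reference = set(reference_components)
    matched = reference & set(generated_components)
    if matched == reference:
        return "accurate"
    if matched:
        return "partially accurate"
    return "inaccurate"


# The five major components named in the reference model for the cardiology case.
REFERENCE = {
    "right heart failure",
    "left heart failure",
    "cardiac tamponade",
    "hepatic congestion",
    "pericardial effusion",
}
```

Components generated by a subject that are not in the reference model are ignored by this sketch; only the overlap with the reference set determines the category.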
2. Diagnostic Accuracy

All the expert subjects generated completely accurate diagnoses, whereas the subexperts generated partially accurate diagnoses, with the exception of one subject, who generated an inaccurate diagnosis. Table I gives the list of major diagnostic components generated by the experts and subexperts. All experts generated each of the five major diagnostic components. The subexperts generated all but one major component, that of cardiac tamponade. Subject 6 was the exception, in that he did not generate the components of left and right heart failure. This is a serious enough error to make the diagnosis inaccurate.

3. The Time-Course of Hypothesis Generation
Analysis of the time-course production of diagnostic hypotheses focused on differences between experts and subexperts in (a) the relationship between the number of hypotheses generated in the course of presentation of the case description, and (b) the time and the order of production of accurate diagnostic components. The pattern of the cumulative mean sum of new hypotheses produced with each new segment of information for experts and subexperts is presented in Fig. 5. Each point on the figure represents the cumulative total number of new hypotheses generated as each segment is presented. The slope of the lines represents the pattern of hypothesis generation: the larger the slope, the greater the number of new hypotheses being generated; zero
TABLE I
DIAGNOSTIC COMPONENTS GENERATED BY EXPERTS (CARDIOLOGISTS) AND SUBEXPERTS (ENDOCRINOLOGISTS) SOLVING THE CARDIOLOGY PROBLEM

                                    Diagnostic components generated by subjects
List of major diagnostic            Experts              Subexperts
components from reference model     S1   S3   S5   S7    S2   S4   S6   S8
A  Heart failure                    X    X    X    X     X    X    X    X
B  Right heart failure              X    X    X    X     X    X    0    X
C  Left heart failure               X    X    X    X     X    X    0    X
D  Cardiac tamponade                X    X    X    X     0    0    0    0
E  Hepatic constriction             X    X    X    X     X    X    X    X
F  Pericardial effusion             X    X    X    X     X    X    X    X

Note: X, component generated; 0, component not generated.
Fig. 5. Hypothesis generation over the time-course of information presented by expert and subexpert subjects in the cardiology case (horizontal axis: sentence number). Letters A-F correspond to six diagnostic subcomponents as listed in Table I.
slope indicates that no new hypotheses were generated. There is a clear difference in the patterns of hypothesis generation for the two groups. Before the presentation of segment 8, the two groups of subjects had generated approximately the same number of hypotheses. After segment 8, the expert subjects used more of the incoming information to confirm the diagnostic components that had already been produced rather than generating new diagnostic hypotheses. Subexperts, in contrast, continued to generate alternative components, as reflected in the differences between the two slopes of experts and subexperts from segments 10 to 48. The subexperts produced the diagnostic components later than did the expert subjects. The two major diagnostic components, heart failure, and, specifically, right heart failure, were generated by both groups of subjects by the presentation of information in segment 8. However, all of the experts had also identified the third component, left heart failure, by segment 8. Specific individual differences in the generation of hypotheses were found in expert subjects with respect to the diagnostic component of tamponade. Two subjects generated this component before the component of pericardial effusion. No subexperts generated the tamponade component. They generated a number of intermediate diagnostic components that linked one component to another by an extended chain of inferences. As a mode of comparison, Fig. 6 gives the pattern of the cumulative mean sum of new hypotheses produced with each new segment of the information for experts (endocrinologists) and subexperts (cardiologists) using the less complex endocrinology problem presented in the study by Joseph and Patel (1990), described earlier. Both the experts and subexperts produced accurate diagnostic components. However, the subexperts produced them later than did the expert subjects. A comparison of the change of slopes of the cumulative lines of new hypotheses as a function of segment number reveals that hypotheses increase at the same rate up to the point where five hypotheses have been produced. After this point, experts produced very few new hypotheses, whereas for the subexperts, the rate of hypothesis production continued to rise. The experts most often used new findings from the physical examination and laboratory test results to confirm the diagnoses generated earlier and to determine secondary problems. The subexperts, on the other hand, focused on associating the findings from physical examination and the laboratory test results with new diagnostic possibilities. Thus, the pattern of results from the use of a less complex problem is similar to the findings in the context of more complex problems. The difference is in the total number of new hypotheses generated, where subjects using the complex
Fig. 6. Hypothesis generation over time-course of information presented by expert and subexpert subjects in an endocrinology case (easier case; horizontal axis: sentence number). Adapted from Joseph & Patel (1990, p. 37). A-C: production of diagnostic subcomponents hypothyroidism, myxedema, and Hashimoto’s thyroiditis.
text generate more hypotheses (mean for experts = 14, mean for subexperts = 21) than subjects using the less complex problem (mean for experts = 6, mean for subexperts = 9).
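The cumulative curves plotted in Figs. 5 and 6 can be illustrated for a single subject with a brief Python sketch. It is a minimal sketch under stated assumptions: the function names `cumulative_new_hypotheses` and `slopes` and the example protocol are hypothetical, and the published curves are means over the subjects in each group rather than individual totals.

```python
from itertools import accumulate


def cumulative_new_hypotheses(hypotheses_by_segment):
    """Running total of distinct hypotheses after each segment.

    hypotheses_by_segment is a list with one entry per segment, each
    entry being the hypotheses the subject verbalized at that segment.
    A hypothesis counts as new only the first time it appears.
    """
    seen = set()
    new_counts = []
    for hypotheses in hypotheses_by_segment:
        fresh = set(hypotheses) - seen
        new_counts.append(len(fresh))
        seen |= fresh
    return list(accumulate(new_counts))


def slopes(cumulative):
    """Segment-to-segment change in the cumulative total;
    zero means no new hypothesis was generated at that segment."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical protocol for one subject over four segments.
example = [
    ["heart failure"],
    ["heart failure", "right heart failure"],
    [],
    ["liver disease"],
]
```

For the example protocol, a restated hypothesis does not raise the curve, and the empty third segment appears as a flat stretch of zero slope.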
4. Patterns of Reasoning

Table II gives the directionality of reasoning in generating and evaluating the hypotheses by experts and subexperts. The results show that all subjects used more forward-directed reasoning than backward-directed reasoning. Experts used more backward reasoning than subexperts. The reason for this was that experts evaluated their hypotheses more often than subexperts. At a certain point it was evident that experts had already concluded the correct diagnoses and the additional information presented to them after this point was superfluous. They also used more confirmation strategies, the majority of which were forward directed. Subexperts generated a greater number of intermediate components that linked one hypothesis to another. These links were uncoded because they are neither forward- nor backward-directed links.

TABLE II
DIRECTIONALITY OF REASONING BY EXPERTS AND SUBEXPERTS IN SOLVING A COMPLEX PROBLEM SEQUENTIALLY

Columns: Subjects; Directionality of reasoning (Forward, Backward, Confirmation [a])
Subexperts (endocrinologists): S4, S2 [b], S7, S11 [b]
Experts (cardiologists): S9, S1, S6, S10
Values as extracted (row and column alignment lost): 16; 39, 20, 24; 7, 47, 15, 17; 19, 7, 9, 8

[a] Confirmation includes both forward and backward confirmations as well as rule-out strategy.
[b] Unencoded hypotheses (6 in S2 and 5 in S11) were identified.

5. Detailed Analysis of Protocols

Figures 7-9 present schematic representations of the subjects’ explanations of specific segments of the cardiology case. We concentrate on the
protocols of four subjects, two experts and two subexperts, which present the details of the changing representations of the problem and their relationship to the pattern of reasoning. The first pair of subjects, Expert 9 (E9) and Subexpert 4 (SE4), begin in the same way by generating the low-level facet hypothesis of left-sided heart failure in segment 3. Previously, we distinguished between low-level facet hypotheses, which result in local inferences, and high-level facet hypotheses, which partition the problem space. Soon after the introduction of the first low-level facet hypothesis, the expert introduces three high-level facet hypotheses: heart disease, liver disease, and kidney disease. These facets help partition the problem space into three potential causes.

Figure 7 presents the hypotheses generated by the expert and the subexpert at segments 7, 8, and 18. In segment 7, the patient presents the cue of increased appetite and loss of weight. Whereas the expert generates three high-level facets which partition the problem space into three distinguishable classes of problems that can account for the cues (kidney, heart, or liver problems), the subexpert interprets the cue in terms of a generalized swelling of the body (anasarca). The potential diagnoses that may account for anasarca, however, are not specified. This inference does not tie together any of the other findings and is therefore not used to delineate further the space of possibilities. This pattern is repeated in subsequent segments, with the expert maintaining high-level facet hypotheses and the subexpert generating new low-level facet hypotheses. Furthermore, the expert begins evaluating these hypotheses immediately after their introduction. For instance, in segment 8, the expert generates two hypotheses, swelling of the bowel and liver disease.
The subject accomplishes this in a backward-directed manner by generating a low-level facet (swelling of the bowel) which is caused by the high-level facet (liver disease). In this way, the expert confirms the involvement of the liver in the patient problem. The subexpert, in contrast, generates two new low-level facet hypotheses, accounting for the segment information in a local manner (biventricular heart failure and loss of protein), as is evident in segment 8. It is important to note that the hypotheses generated by the subexpert are meaningful hypotheses in the context of the presented data (i.e., one cannot discount them as irrelevant or incorrect), but they are not conducive to a partitioning of the problem space. Once the expert seemed to have confirmed the involvement of the liver in the process, the problem was to figure out whether this involvement is primary (i.e., the cause of the problem is hepatic) rather than secondary to a cardiac problem, and to differentiate between the three high-level facet hypotheses he generated earlier (i.e., heart, kidney, and liver disease). This is what the expert does at segment 18. This subject generates the first two diagnostic hypotheses, constrictive pericarditis and nephrotic
syndrome. In this way, the expert narrows the problem space from high-level facets to specific diagnoses. The subexpert interprets the same information in a local manner, as in segments 7 and 8, by generating another low-level facet hypothesis, namely, ascites.

Fig. 7. Schematic representation of explanations of segments 7, 8, and 18 of the more difficult case, pericardial effusion with cardiac pretamponade, by subexpert SE4 and expert E9. Legend symbols distinguish given information from diagnostic components.

To examine the extent to which the amount of verbalization influences this pattern, let us turn to one expert and one subexpert who produced highly verbal protocols and who maintained the same pattern of hypothesis activation throughout the problem-solving process. We can examine specifically what can be interpreted as the early generation of partial schemata and the use of confirmation strategies by experts in contrast to subexperts. Of the two subjects we present next, E1 generated 20 hypotheses and SE2 generated 29 hypotheses throughout their problem-solving process. E1, as did the rest of the experts, made a completely accurate diagnosis, whereas SE2 made a partially accurate diagnosis. Figure 8 presents the schematic representation of segments 2, 3, and 7 by subjects E1 and SE2. Both subjects begin by considering more than one hypothesis, three by the expert and two by the subexpert. A difference between these two subjects is that the subexpert introduces a specific diagnostic hypothesis (i.e., emphysema) at the beginning (segment 2). Also in segment 3, the subexpert generates another diagnostic hypothesis, rheumatic heart fever, besides the high-level facet hypotheses of cardiac and pulmonary problems mentioned earlier. The generation of further diagnostic hypotheses is repeated in segment 5, where the subexpert produces mitral stenosis, and in segment 7, cor pulmonale. The expert, in contrast, generates high-level facet hypotheses (cardiac problem and respiratory problem) and a low-level hypothesis (i.e., anemia) and evaluates the hypothesis of liver involvement (segment 7). Although the subexpert seems to be able to interpret most of the relevant findings correctly, he fails to see the relevance of some key findings, as is evident in Fig. 9. In segment 28, information that is interpreted by the expert as evidence against valvular heart disease is interpreted by the subexpert as consistent with left ventricular failure secondary to aortic stenosis and, in turn, consistent with pericardial effusion.
As noted previously, the expert mostly generates high-level facet hypotheses, such as the cardiac, respiratory, and general systemic problems already mentioned. Among these, however, he generates pericardial problem in segment 3, which leads him in the right direction and finally to the diagnosis of pericardial effusion with tamponade (segment 48). Up to segment 9, the expert subject generates high-level facet hypotheses: valvular heart disease, cardiomyopathy, and coronary artery disease. From then on, the subject evaluates the findings without introducing any more distinct hypotheses. He interprets some findings in terms of right- or left-sided heart failure and, in the process, discounts several hypotheses, such as valvular heart disease (see segment 28), primary liver problem, and ischemic heart disease. This process of generation of facet-level hypotheses leads him to conclude the final diagnosis through the evaluation of pericardial effusion and cardiac tamponade against further information. In summary, although the subjects' protocols presented are unique in many respects, two features are salient. The first one is that experts
interpret case data from the first few segments in terms of high-level facet hypotheses, which later they evaluate. This serves to partition the problem into manageable units, thus reducing the load of working memory. The second is that once the experts have generated these hypotheses, they use them as a basis for evaluating subsequently presented data without introducing any new hypotheses. In contrast, subexperts generated hypotheses mostly at the low-level facet, with some high-level facets and diagnostic hypotheses. Also, they kept generating new hypotheses even after producing most of the diagnostic components needed for the final diagnosis.

Fig. 8. Schematic representation of explanations of segments 2, 3, and 7 of the more difficult case, pericardial effusion with cardiac pretamponade, by subexpert SE2 and expert E1 (highly verbal subjects). Legend symbols distinguish given information from diagnostic components.

Fig. 9. Schematic representation of explanations of segments 28 and 48 of the more difficult case, pericardial effusion with cardiac pretamponade, by subexpert SE2 and expert E1 (highly verbal subjects). Legend symbols distinguish given information from diagnostic components.

The experts used forward-directed reasoning to generate initial diagnostic hypotheses. Once the diagnosis was generated, confirmation strategies were used to evaluate earlier diagnoses. This made them generate a high number of backward-directed hypotheses to tie up loose ends. Although the subexperts also used forward reasoning to generate the hypotheses as new text information was presented, they did not use confirmation strategies, making far fewer hypothesis evaluations after most diagnostic components had been produced.
Using a simple problem in the domain of endocrinology presented sequentially, Joseph and Patel (1990) showed that experts and subexperts were able to keep track of their schemata because the problem was well structured and without too many diagnostic alternatives. In the complex cardiology case, experts used forward reasoning to rule out alternative diagnoses before presenting the main diagnostic components (Patel, Groen, & Arocha, 1990). Backward reasoning was used to tie up loose ends. For experts, immediate problem presentation appears to encourage forward-directed reasoning because all the information is available at one time. In contrast, sequential presentation seems to encourage the generation of hypotheses and their subsequent confirmation, refinement, and modification.

B. EXPERT REASONING AND TASK DIFFICULTY
This section addresses the effects of problem difficulty and relevance of domain knowledge on clinical reasoning and how these effects differ as a function of immediate versus sequential problem presentation. To clarify the following discussion, the relationship between these studies is shown in Table III, which gives the format of presentation and level of difficulty used in each study.

TABLE III
STUDIES RELATING MEDICAL EXPERTISE, TASK DIFFICULTY, AND PROBLEM PRESENTATION FORMAT

Format of presentation   Simple problem                     Difficult problem
Immediate                Patel, Groen, & Arocha (1990),     Patel, Groen, & Arocha (1990),
                         endocrinology problem              cardiology problem
Sequential               Joseph & Patel (1990),             Patel & Groen (unpublished study),
                         endocrinology problem              cardiology problem

The results with the more difficult case of pericardial effusion with pretamponade resemble those of Joseph and Patel (1990), using the less difficult case of Hashimoto's thyroiditis, in two ways. The first way is in the time-course of hypothesis formation. In both studies, we can observe two distinct phases: a hypothesis generation phase and an evaluation phase, in which the graphs, shown in Figs. 5 and 6, reach an asymptote. Subexperts take longer to reach the evaluation stage than experts. The second resemblance is that there is a predominance of forward reasoning as opposed to backward reasoning with both subexpert and expert subjects.

These results, which resemble those found in many other domains, can be interpreted in terms of a distinction between partial and total schemata. A total schema exists when a set of coherent propositions that accounts for the entire case is retrieved and linked to a text base in working memory (the active problem representation). This total schema is built when immediate presentation is used. When sequential presentation is used, however, it is only possible to build partial schemata, which at best account for the information presented thus far, or more accurately, for those elements that remain in working memory. In this sense, the first stage of the hypothesis formation graphs indicates the elicitation or development of partial schemata. The second stage, when the curve flattens out, indicates the completion of a total schema. That is to say, experts complete the schema earlier and solve the problem at an earlier stage than subexperts, and use additional information to evaluate and confirm the hypotheses thus far generated. The fact that experts develop a total schema more quickly than subexperts is taken as an indication of the distinction between generic and specific expertise.

There are also some important differences in the results of the two studies that allow for a greater differentiation between generic and specific expertise. As mentioned previously, all subjects made an accurate diagnosis in the endocrine case (used by Joseph and Patel, 1990), but only experts made completely accurate diagnoses in the considerably more difficult cardiology case. This difference is taken as an indication that even though total schemata are also formed by subexperts, these schemata are not necessarily adequate.
The fact that subexperts tend to generate new hypotheses, rather than confirm old ones, suggests that this inadequacy may reside in the maintenance of the irrelevant propositions from the clinical case in working memory, which in turn suggests that total schemata are not fully completed. This would seem to imply that the generic expertise of subexperts may result in a tendency to use schema-driven reasoning, even when the schemata are inadequate to solve the problem.

It is of interest to note that in the cardiology case under immediate presentation (Patel, Groen, & Arocha, 1990), no subject made a completely accurate diagnosis. It is possible that although sequential presentation appears to enhance performance in both easy and difficult cases, the nature of this enhancement is different. Sequential presentation may enhance the accuracy of subexperts on easy cases and of experts on hard cases. There may be a connection between this and the notion proposed by Patel and Groen (1992) that coherence is a principal criterion for the adequacy of a schema. That coherence plays a role in the present study is indicated by the fact that there is a tendency of experts to use backward reasoning, which is associated with a process that Patel and Groen (1986) characterize as connecting loose ends unrelated to the main diagnosis. The cardiology case contains a number of loose ends that lead to conflicting conclusions when globally combined. Immediate presentation may result in a subject focusing on global coherence, which might result in attempts to reconcile these conflicting cues. On the other hand, sequential presentation necessitates a focus on local coherence, in which such cue combinations do not play a role.

When comparing the results across studies (cf. Patel, Groen, & Arocha, 1990; Joseph & Patel, 1990) to evaluate the effects of the presentation paradigm on diagnostic reasoning performance, we find that diagnostic accuracy interacted with case difficulty. In the simple case of endocrinology, subexperts did better with sequential than with immediate presentation. Experts were perfectly accurate in both situations, showing a ceiling effect. For the more difficult case (cardiology), subexperts performed better with the immediate than with the sequential presentation, whereas the experts performed better with sequential presentation than with immediate presentation. The results are presented in Table IV.

In these experiments, we also asked the subjects to recall the clinical cases. It should be noted that there were no recall differences between experts and subexperts in either case. There was a ceiling effect on the recall of relevant material. Both subexperts and experts exhibit superior recall performance when compared with either intermediates or novices. These results suggest that recall measures do not discriminate at this level of expertise and perhaps that the recall of relevant clinical material is a function of generic expertise.

TABLE IV
DIAGNOSTIC ACCURACY BY EXPERTISE, CASE TYPE, AND TEXT PRESENTATION FORMAT

                           Experts                    Subexperts
Case type and accuracy     Sequential   Immediate     Sequential   Immediate
Cardiology
  Accurate                 4            0             0            0
  Partially accurate       0            6             1            5
  Inaccurate               0            2             3            3
Endocrinology
  Accurate                 4            ?             4            0
  Partially accurate       0            ?             0            7
  Inaccurate               0            ?             0            1

? = value illegible in the source (extracted as "I I 0").
These results are partially counterintuitive in relation to what we know about working memory. Both sequential and immediate tasks impose a memory load, but sequential presentation should make more demands on working memory. However, this does not seem to be the case for the experts working on a difficult problem. There could be two possible reasons for these results: the first is that explanations are generated on the fly and the second is that a schema is constructed rather than retrieved. These reasons emphasize the dynamic aspects of schema construction.

In the medical domain, knowledge is organized in a hierarchy of the form observations → findings → facets → diagnosis, where facets are intermediate constructs related to diagnostic hypotheses. Within these intermediate constructs, there are many possible permutations which lead to specific diagnoses. Furthermore, specific diagnostic hypotheses are elicited via the use of retrieval structures which provide dynamic flexibility in their ability to partition the space of possible diagnoses. These structures provide access to information in long-term memory. Physicians may be able to generate a set of relatively stable structures for maintaining intermediate results depending on the representation of the problem and then complete the diagnostic process on the basis of the information maintained in these structures. The differential use of such structures is evident in the experts' repetitive use of the same intermediate constructs or high-level facets, such as liver involvement, which they use in a systematic way to organize and methodically evaluate patient data. The subexperts exhibit no such systematicity in their use of retrieval structures. This is analogous to the retrieval structures used by mental calculators, who set up intermediate results in working memory (Ericsson & Staszewski, 1989). These structures alleviate some of the burden imposed on working memory.
V. Novice Diagnostic Reasoning

A. NONMONOTONICITY IN THE DEVELOPMENT OF EXPERTISE
In much of the research characterizing the nature of expertise, the focal point is the “expert” and the novice is used merely as a basis of comparison. The theory of expertise really becomes a theory of the expert, in terms of the structure of knowledge and the various performance parameters. What is needed is a more elaborate theory of the progression from novice to expert. We particularly need to understand more about novice development to advance learning theories and promote more effective instructional approaches. In recent years, there have been a number of studies that have extended the expertise approach to study novices of
different ability levels and at different levels of training (e.g., Thibodeau, Hardiman, Dufresne, & Mestre, 1989; Chi, Bassok, Lewis, Reimann, & Glaser, 1989). In the following section, we discuss several studies of novice diagnostic reasoning. Like experts, novices can be considered on a continuum from beginners, who have just started their medical training, to advanced novices, who are in the final stages of medical school but have had relatively little practical experience. It is also important to distinguish between intermediates and beginners, possibly on the basis of time devoted to learning the domain, or having passed appropriate courses.

If the category of subexperts is excluded, then a distinctive developmental phenomenon emerges, which has been termed the intermediate effect. This refers to the fact that, although it seems reasonable to assume that performance improves with training or time on task, there appear to be particular transitions in which subjects exhibit a certain drop in performance. This is an example of what is referred to as nonmonotonicity in the developmental literature (Strauss & Stavy, 1982) and is also observed in skill acquisition. The symptom is a learning curve or developmental pattern that is shaped like either a U or an inverted U. It should be noted that not all intermediate performance is nonmonotonic; for example, on some global criteria such as diagnostic accuracy, there appears to be a steady improvement.

The intermediate effect occurs with many tasks and at various levels of expertise. The tasks vary from comprehension of clinical cases and explanation of clinical problems, to problem solving, to generating laboratory data. The subjects vary from students at different levels of training to medical residents and senior physicians. The common finding is that intermediate performance involves extraneous search.
For example, intermediates recall detailed information in recall tasks (Patel, Groen, & Frederiksen, 1986), provide extensive elaborations in explaining a patient problem (Arocha, Patel, & Patel, 1993), elicit considerable amounts of information from a patient to make a diagnosis (Kaufman & Patel, 1991), or request many extraneous laboratory tests (Patel et al., in press). The phenomenon may arise because intermediates have acquired an extensive body of knowledge but have not yet reorganized this knowledge in a functional manner to perform various tasks. Thus the knowledge has a sort of heterarchical or flat structure that results in considerable search and would also make it more difficult for intermediates to set up structures for rapid encoding and selective retrieval of information. Experts' knowledge is finely tuned to perform various tasks, and they can readily filter out irrelevant information using their hierarchically organized schemata. The difference is reflected both in the structural organization of knowledge and the extent to which it is proceduralized to perform different tasks.
Both of these interrelated factors are responsible for this extraneous search and the accessing of irrelevant prior knowledge. Schmidt and Boshuizen (1993) reported that intermediate nonmonotonicity effects in recall disappear when short exposure times (about 30 sec) are used. This suggests that under time-restricted conditions, intermediates cannot engage in extraneous search. In other words, intermediates process too much "garbage," whereas experts do not. Novices, on the other hand, do not conduct irrelevant searches, simply because they lack a knowledge base rich enough to search. Whereas a novice's knowledge base is likely to be sparse and an expert's is intricately interconnected, an intermediate may have a lot of the pieces of knowledge in place, but lack the extensive connectedness of an expert. Until this knowledge becomes further consolidated, the intermediate is more likely to engage in unnecessary search. The intermediate effect is not a one-time phenomenon; rather, it occurs repeatedly at strategic points in a student's or physician's training that follow periods in which large bodies of new knowledge or complex skills are acquired. These periods are followed by intervals in which there is a decrement in performance until a new level of mastery is achieved.

B. NOVICE REASONING AND COORDINATION OF HYPOTHESES AND EVIDENCE
It has long been recognized that, when solving problems, some subjects explore one solution path at a time, whereas others explore several solution paths simultaneously (Bruner, Goodnow, & Austin, 1956). This distinction is similar to the one made in the artificial intelligence (AI) literature between breadth-first and depth-first search procedures (see Fig. 10).
Fig. 10. Example of depth-first search (1) and breadth-first search (2). In (1), one hypothesis (h1) is explored in detail until it is either discarded (found not to account for the data) or accepted. In breadth-first search, two hypotheses are explored at the same time, until a decision is made between either of the two hypotheses. The symbols a1 and b1 represent hypotheses being evaluated; f1 and f2 represent case data (findings).
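The two blind search procedures contrasted in Fig. 10 can be sketched in a few lines of Python. This is a minimal illustration only, not a model of any subject's reasoning: the tree, the node names (`case`, `h1`, `h2`, and so on), and the function names are hypothetical and were invented for this example.

```python
from collections import deque

# Hypothetical hypothesis tree, invented for illustration only:
# each key is a node and its value lists the nodes reachable from it.
TREE = {
    "case": ["h1", "h2"],
    "h1": ["h1a", "h1b"],
    "h2": ["h2a"],
    "h1a": [],
    "h1b": [],
    "h2a": [],
}


def depth_first(tree, root):
    """Follow one path to its end, then backtrack to the nearest branch."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is explored first.
        stack.extend(reversed(tree[node]))
    return order


def breadth_first(tree, root):
    """Visit every node at one level before moving to the next level."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


print(depth_first(TREE, "case"))
print(breadth_first(TREE, "case"))
```

On this toy tree, the depth-first order exhausts everything under `h1` before touching `h2`, whereas the breadth-first order considers `h1` and `h2` side by side before descending. Neither function uses any knowledge about which branch is more promising, which is the sense in which both are blind strategies.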
Depth-first search is a search procedure in which one path is traversed in depth. This path is traversed first, and when the end of that path has been reached, the procedure backtracks to the closest branching node in the tree and traverses the new path in the same way (Harel, 1987). This is equivalent to exploring a single hypothesis in depth until all the important consequences have been explored. The other search procedure is a breadth-first procedure, in which all nodes at one level are explored before going to deeper levels. In this context, several hypotheses are explored simultaneously, rather than attempting to rule out, modify, or confirm any specific hypothesis. In machine problem-solving research, both of these procedures are considered to have their drawbacks (Charniak & McDermott, 1987). On the one hand, depth-first search may lead one through the wrong path, resulting in a loss of time and resources in the process; on the other hand, breadth-first search is highly inefficient because one explores all possible paths, good or bad, equally. In AI, these strategies are considered blind or brute-force strategies, in the sense that they do not make use of knowledge to guide their search. Algorithms have been developed to overcome these limitations by using evaluation functions of the search path taken (e.g., heuristic search). The use of search strategies by human problem solvers is, most of the time, informed by the knowledge possessed by the subjects, such that (a) not all paths are explored, but only the most relevant ones, and (b) a single path is not explored in its entirety; rather, it is explored until sufficient information has been obtained that satisfies the goal.

A study by Arocha et al. (1993) extended the research of Joseph and Patel (1990) on hypothesis generation and evaluation to novice subjects. They compared the performance of novice medical students at three levels of training in solving two cardiology problems.
The subjects were medical students in the second (early novices), third (intermediate novices), and fourth (advanced novices) year of medical training. The objective of the research was to investigate the strategies novices used in coordinating the hypotheses they had generated to account for the case data. Clinical problems were designed so that a particular hypothesis (i.e., myocardial infarction) was suggested by the initial case presentation as the medical trainees thought aloud while solving the problems. The first segments of both cases were consistent with a typical presentation of myocardial infarction. Segments 2 and 3 presented information that was inconsistent with myocardial infarction, and was suggestive of viral pericarditis in case 1 and aortic dissection in case 2. The case was segmented into a standard format in which a clinical interview is usually conducted, including the presenting complaint, the patient's history, and the results from the physical examination. The subjects were presented with information, one segment at a time, on a microcomputer and asked to think aloud while they read and interpreted the problem. At the end of each segment, the subjects made tentative diagnoses. Unlike the Joseph and Patel study, in which segments consisted of single sentences, the segments presented to the subjects were in the form of paragraphs. Finally, they were asked to provide a final diagnosis and an explanation of the patient problem. The verbal data were analyzed for diagnostic accuracy and the utilization and mapping of clinical cues in generating the diagnostic hypotheses.

The results showed that early novices (second-year students) generated and retained their initial hypothesis, which corresponded to the most common disease, using a strategy similar to depth-first search, in the sense that these subjects tended to generate the initially suggested hypothesis and to maintain this hypothesis throughout the whole case. They did so despite the presence of evidence inconsistent with the initial hypothesis. Early novices activated the fewest hypotheses and generated very little search. Intermediate novices (third-year students) used a strategy similar to breadth-first search, in the sense that several hypotheses were generated and evaluated in parallel, without resolving or eliminating them. That is, they tended to generate several diagnostic hypotheses to account for different findings, which resulted in a larger hypothesis space, a great deal of extraneous search, and a less coherent problem representation. This strategy is quite consistent with a hypothetico-deductive model of diagnosis described by Elstein et al. (1978). Like the intermediate subjects, advanced novices (fourth-year students) initially generated and evaluated multiple hypotheses, using a strategy similar to breadth-first search. However, they did so to account for the same set of findings, generating competing hypotheses.
This resulted in a decrease of irrelevant search as compared with the intermediates. Then, they evaluated these hypotheses against the data. In this way, advanced novices narrowed down the initial pool of hypotheses to only one or a few at the end of the problem. Figure 11 presents an illustration of the types of strategies used by three subjects at different levels of training. The figure shows schematic examples of the initial hypotheses generated by the three subjects. The early novice’s response to presentation of the first segment of the case consisted of the common and most typical disease (myocardial infarction). This subject evaluated only this single hypothesis against the subsequent information in the problem, changing the hypothesis only at the last segment. The intermediate subject generated various hypotheses to account for several cues in the case. Like the intermediate subject, the advanced novice also generated several hypotheses, with some differences: first, the hypotheses were produced to account for the same set of data; second,
[Diagram labels: 45-year-old male; high-risk age and gender; decreased CO2; decreased blood supply; predispose to; Type A personality; cardiovascular symptoms; circulatory problems; release of other compounds; dull frontal headache; not voided urine.]