Cognition, 7 (1979) 323-331
©Elsevier Sequoia S.A., Lausanne
Printed in the Netherlands
Does awareness of speech as a sequence of phones arise spontaneously?*

JOSÉ MORAIS, LUZ CARY, JESÚS ALEGRIA and PAUL BERTELSON
Université libre de Bruxelles
Abstract

It was found that illiterate adults could neither delete nor add a phone at the beginning of a non-word, but these tasks were rather easily performed by people with similar environments and childhood experiences who had learned to read, rudimentarily, as adults. Awareness of speech as a sequence of phones is thus not attained spontaneously in the course of general cognitive growth, but demands some specific training, which, for most persons, is probably provided by learning to read in the alphabetic system.

Introduction
Alphabetic writing, to a first approximation, represents speech at the level of units such as the phone and the phoneme.¹ Both spelling and reading in an alphabetic system imply, in addition to the ability to perceive minimal phonetic distinctions, explicit knowledge of the phonetic structure of speech. For example, the reader/writer must not only be able to distinguish between cat and bat, but must also know that cat and bat consist of three units and differ only in the first. An important question is how this knowledge is attained. In normal communication, people pay attention to meaning, not to the structural characteristics of the speech they hear and produce. However, conscious reflection on language, and therefore explicit knowledge of linguistic structures, does occur. Awareness of speech as a sequence of phones, for instance, might appear spontaneously at some age, as a normal outcome of cognitive growth, through maturation and/or linguistic experience. Alternatively, it may require some specific training, which for most children is provided by reading instruction itself.

*Reprints may be obtained from José Morais, Laboratoire de Psychologie expérimentale, Université libre de Bruxelles, 117 av. Ad. Buyl, B-1050 Bruxelles, Belgium.

¹While the term phone is generally used to indicate the more elementary units of speech that are perceptibly different, there is considerable disagreement in the literature about the definition of the phoneme. In the traditional perspective, the phoneme is any collection of phones whose differences are irrelevant to meaning distinctions; in the generative-transformational perspective, the phoneme is an abstract representation that depends on morphemic information and relates to pronunciation through a set of rules. For a discussion of the distinction between phone and phoneme, from the latter point of view, in relation to the alphabetic system, see Gleitman and Rozin (1977). In the present text we refer to analysis into phones rather than into phonemes, because the experimental task simply required our subjects to manipulate different sounds without regard for meaning.

The question is important not only from a theoretical point of view but also from a practical one: under the cognitive growth hypothesis, failures in learning to read can best be avoided by adjusting the age at which reading instruction starts to individual rates of development, while under the specific training hypothesis the solution should be sought in improved educational practices.

That the ability to manipulate phones is related to success in learning to read has been extensively documented. For instance, Savin (1972) reported that children who had failed to learn to read by the end of the first grade were generally unable to learn Pig Latin. This “secret language” requires shifting the initial consonant cluster of each word to the end of the word and adding the sound [ei]. This fact, however, may reflect either a delay in the spontaneous acquisition of the ability to analyse speech into phones or an inability to make abstract inferences about the sound system of language from its alphabetic representation.

Some observations on the linguistic behavior of preschool children would suggest that insight into the phonetic structure of language may be possible before formal instruction in reading and writing. Read (1978) was able to elicit phonetically correct judgments of vowel similarity from kindergarteners. Slobin’s (1978) daughter engaged in rhyming play and noticed sound similarities in her own speech at age 3;1: “eggs are beggs; more-bore”.
Preschool children apply the plural inflection to new words and appreciate the pronunciation of a sound in a word. However, the conscious manipulation of a particular phone or class of phones (like vowels, which are important in rhyme) does not necessarily imply awareness of speech as a sequence of phones. Phones that can be uttered in isolation may be more accessible, i.e., brought more easily to our awareness, than highly encoded ones. Awareness of such phones may be an example of awareness of a linguistic performance, rather than of a linguistic structure. The problem we consider here is how awareness of the phonetic structure, not of this or that phone, is attained. The few studies in which the development of the ability to make an explicit analysis of utterances into phones has been investigated do not permit one to choose between the cognitive growth and the specific training hypotheses. In one of those studies (Zhurova, 1973), children were shown dolls with colored jackets and told, for instance, “the boy with the
yellow jacket is Yan, the boy with the green jacket is Gan, the boy with the white jacket is Whan”, etc. Then they were tested for retention of the names and questioned about other dolls with colored jackets that had not been shown before (pink, violet, etc.). The rule for new jackets was used successfully by 12%, 39% and 100% of the children in the 4-to-5, 5-to-6 and 6-to-7-year age groups, respectively. In another study (Liberman, Shankweiler, Fischer and Carter, 1974), children were asked to play a tapping game in which they had to indicate, by the number of taps, the number of segments in a word spoken by the experimenter. The segments were either syllables or phones. The authors found that none of the nursery school children (mean age: 4 years 10 months) could segment by phone (i.e., reach a criterion of six consecutive errorless trials), while 46% could segment by syllable. The percentage of children able to segment by phone increased in the older groups: 17% of the kindergarteners (mean age: 5 years 10 months) and 70% of the first graders (mean age: 6 years 11 months). In both the Russian and the American studies, the most dramatic progress in segmentation performance occurred between ages 5 and 6. As the Haskins workers pointed out, this increase “might result from the reading instruction that typically begins between ages five and six. Alternatively it might be a manifestation of cognitive growth not specifically dependent on training” (Shankweiler and Liberman, 1976). A test of the issue, they suggested, would be provided by a developmental study of segmentation skills in children learning to read in a logographic system, such as Chinese, which does not demand explicit phonetic analysis. However, as they later pointed out (Liberman, Shankweiler, Liberman, Fowler and Fischer, 1977), such a study can no longer be carried out in China, because children there now learn to read alphabetic text before they start studying the logographic characters.
Fortunately, testing readers of non-alphabetic systems is not the only possibility. In communities where the writing system is alphabetic, there remains a minority of adults who either have never been taught to read or dropped out of school at a very early stage. If the improvement observed between ages 5 and 6 is related to reading instruction, illiterate people should be unable to perform tasks requiring conscious phonetic analysis. If, on the contrary, the improvement results from some cognitive growth process independent of reading, they should succeed.
Method

The present experiment was run in a poor agricultural area of Portugal (Mira de Aire, district of Leiria). Subjects were all of peasant origin, but
most were now working in the textile industry. Thirty illiterate people (I subjects) and 30 people who learned to read beyond the usual age (R subjects) were tested. I subjects, 6 males and 24 females, were aged 38 to 60, and R subjects, 13 males and 17 females, were aged 26 to 60. Among I subjects, twenty had never received any instruction at all, four had been taught by their children to identify letters, and six had been in school for 1 to 6 months in childhood (some of them could “draw” their names). R subjects had attended classes for illiterate people organized by the government, by the Army or by industry. All were 15 years old or older when they attended these classes. As a result, twenty-two had received some kind of certificate and eight had failed to obtain one.

Two tasks were administered. In the “deletion” task, the subject had to delete the first phone from an utterance provided by the experimenter. In the “addition” task, he had to introduce an additional phone at the beginning of the utterance. Half the subjects in each group worked with one task and half with the other. For each task, five subjects worked with the phone [p], five with the phone [ʃ], and five with the phone [m]; three different classes of consonants (plosives, fricatives and nasals) were thus represented in the experiment.

The test consisted of 15 introductory trials to illustrate the rule, and 20 experimental trials. The subjects were told that their task was to add (delete) one “sound” to (from) the utterances produced by the experimenter. In the introductory trials, these utterances were non-words which became words by adding (deleting) the phone assigned to the subject. For instance, “alhaço” became “palhaço” (clown) and “purso” became “urso” (bear). A correction procedure was used at that stage: when the subject failed to produce the correct response, the experimenter provided it.
The experimental trials were of two types. In W trials, the experimenter uttered a word which, by the transformation rule, would become another word: for instance, “uva” (grape) became “chuva” (rain), and vice versa. In NW trials, the experimenter uttered a non-word which would become another non-word: for instance, “osa” became “posa”, “chosa” or “mosa”, depending on the phone condition. In both types of experimental trials, no feedback was provided after the subject’s response. The subject had been told beforehand that on some experimental trials the correct response might be a non-word. All the words were in current use and, in all probability, known to the subjects.
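The two tasks are defined over phones, not letters, and Portuguese spelling makes the difference visible: the [ʃ] of “chuva” is written with two letters. The sketch below represents an utterance as a list of phone symbols and applies the deletion and addition rules to that list. The spelling-to-phone mapping (`DIGRAPHS`, `to_phones`) is a hypothetical simplification for the items quoted above: it reads only "ch", "lh" and "nh" as single phones and treats every other letter as one phone, which is only approximately true of Portuguese (intervocalic "s", for instance, is really [z]).

```python
# Hypothetical, simplified spelling-to-phone mapping: these digraphs are
# read as single phones; every other letter counts as one phone.
DIGRAPHS = {"ch": "ʃ", "lh": "ʎ", "nh": "ɲ"}


def to_phones(spelling: str) -> list[str]:
    """Segment a spelling into a list of phone symbols."""
    phones: list[str] = []
    i = 0
    while i < len(spelling):
        pair = spelling[i:i + 2]
        if pair in DIGRAPHS:
            phones.append(DIGRAPHS[pair])
            i += 2
        else:
            phones.append(spelling[i])
            i += 1
    return phones


def delete_initial(phones: list[str]) -> list[str]:
    """Deletion task: remove the first phone of the utterance."""
    return phones[1:]


def add_initial(phones: list[str], phone: str) -> list[str]:
    """Addition task: put the assigned phone in front of the utterance."""
    return [phone] + phones
```

With this mapping, `delete_initial(to_phones("chuva"))` gives `['u', 'v', 'a']`, the phones of “uva”, and adding [ʃ] to “osa” yields the four phones of the five-letter spelling “chosa”: one phone added, two letters written.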
Results

In interpreting the results, account must be taken of the fact that only NW trials provide unambiguous information regarding segmentation and fusion abilities. In W trials, the correct response might be found by searching the lexicon for a similar-sounding word. W trials in fact yielded better performance than NW trials. On NW trials, I subjects performed very poorly and R subjects quite well: mean correct responses were 19% and 72%, respectively. The pattern of results is nearly identical for the two tasks (Table 1).

Table 1. Mean percentages of correct responses for each type of trial, task, and group of subjects. In parentheses, the percentage of subjects who attained 100% correct responses.

                 Addition task         Deletion task
Subjects         W         NW          W         NW
I                46 (13)   19 (0)      26 (7)    19 (0)
R                91 (33)   71 (13)     87 (47)   73 (27)
Fifty percent of I subjects failed on all NW trials, while no R subject did. More than 50% of R subjects, and only one of the I subjects, gave 8 or more correct responses on the 10 NW trials (Figure 1). I subjects failed whatever the target phone: mean correct responses on NW trials were 17%, 19% and 20% for [p], [ʃ] and [m], respectively. I subjects who had been in school for some time in childhood or who had been taught the names of letters (n = 10) performed somewhat better on NW trials (30%) than the remaining subjects (13%). The difference approached significance at the 0.05 level (one-tailed t test: t = 1.696, df = 28). Within the R group, the mean percentage of correct responses on NW trials was 55% for the 8 subjects without a course certificate and 79% for the other 22. The difference is significant at p < 0.025 (t = 2.41, df = 28). On the other hand, R subjects who learned to read before age 25 (n = 10) did not perform significantly better than those who learned beyond that age (75% and 71%, respectively; t = 0.384, df = 28).

The analysis of errors on NW trials revealed that only 19% of the incorrect responses made by I subjects involved the correct deletion or addition of the required phone plus some other transformation, while such responses represented 56% of the R subjects’ errors.² A tendency to produce words in response to non-words was present in both the I and R groups and accounted for 46% and 32% of the errors, respectively; however, the proportion of wrong responses that were both words and involved the required phone³ was much smaller in group I (6%) than in group R (28%). The great majority of errors made by I subjects can thus be linked to a lack of awareness of phonetic structure, while a substantial portion of the errors made by R subjects was apparently due to some other cause.

²An example is the response pili instead of pécli.
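The subgroup figures reported above can be cross-checked with a little arithmetic. The sketch below uses plain Python with no statistics library: `pooled_mean` computes a size-weighted mean of subgroup percentages, and `one_tailed_p` integrates the Student t density with Simpson's rule to give P(T > t). Both names are illustrative, not part of any analysis the authors describe. Note that df = 28 is what an independent-samples test gives for 30 subjects split into two subgroups (30 - 2). The weighted means come out at about 18.7% for the I group and 72.3% to 72.6% for the R group, in line with the 19% and 72% reported (the subgroup means are themselves rounded), and the NW cells of Table 1, with 15 subjects per task, average to 19% and 72%. The p value for t = 1.696 lands just above 0.05, consistent with “approached significance”, and the one for t = 2.41 falls between 0.01 and 0.025.

```python
import math


def pooled_mean(groups: list[tuple[int, float]]) -> float:
    """Size-weighted mean of (n, mean) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t): 0.5 minus the integral of the density from 0 to t,
    using the composite Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


# Subgroup sizes and NW percentages quoted in the text.
print(pooled_mean([(10, 30.0), (20, 13.0)]))  # I group, by schooling
print(pooled_mean([(8, 55.0), (22, 79.0)]))   # R group, by certificate
print(pooled_mean([(10, 75.0), (20, 71.0)]))  # R group, by age
for t in (1.696, 2.41, 0.384):
    print(t, round(one_tailed_p(t, 28), 4))
```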
Figure 1. Number of subjects at the different levels of performance in the I and R groups (for NW trials only). [Figure: distribution of the number of correct responses (0 to 10) on NW trials, shown separately for I and R subjects.]
Table 2 shows the errors that occurred twice or more (out of a maximum of five) in NW trials for each combination of group, task and phone. Note that the most frequent errors were generally words (except bli, go and the repetitions mosa and maguto). The items in italics are those for which the phone to be deleted (or added) was not deleted (or added). This, the more frequent type of error, was made by the subjects of group I, not by those of group R.
³An example is the word podu instead of the non-word posu.
Table 2. Frequent errors in NW trials for each combination of group, task and phone. The first item is the stimulus and the second the response. The first number inside the brackets indicates the number of occurrences of the response; the second number indicates the total number of errors in the trial.

Deletion
[p]
[ʃ]
Puada - Ada (2/5) Pobli - Pobre (2/4) Pecli - PP (3/4)
Chuada
[m]
task
I Subjects
- Ada (2/5)
Muada - Amuada (3/5) Mobli - M&e1 (3/5)
Chube - Chuva (2/4) Chimi Chig6
- Mri (3/5)
- 6 (3/5)
Chabata’ - Batata R Subjects
Addition
Puada
- Ada (2/2)
(2/5)
Mimi - Mi (3/5) Mosa - Mosa (2/5) Migd - Amigo (3/5) Mapto - Mapto (2/5) Mabati - Batata (3/5)
Chuada - Ada (2/3) Chobli - Bli (2/2) Chimi - Ma’ (3/3) Chigi - G6 (3/4) Chabati - Ti (2/3)
task
I Subjects
Imri - Irma” (2/5) Abat
- Batata (2/5)
R Subjects
Imi Aquto - Po$o (2/3)
ACuto
- MZe (2/4)
- Chuto (2/4)
Discussion

Illiterate adults were unable to delete or add a phone at the beginning of a non-word, while adults from the same environment who learned to read in youth or as adults had little difficulty. It is interesting to note that the performance of the I subjects was slightly inferior to that of Belgian first graders aged 6 years who were tested with similar tasks in the third month of the school year (18% correct responses for deletion, 29% for addition). The performance of the R subjects was at about the same level as that of Belgian second graders aged 7 years tested in the fourth month of the school year (73% correct responses for deletion and 79% for addition) (Alegria and Morais, 1979).

The extremely poor performance of the I subjects cannot be explained in terms of some general inability to manipulate speech segments or to understand an inductive instruction. Cary and Morais (1979) tested a group of 12 illiterates, of the same origin as those of the present experiment, with a more complex task which consisted in reversing the order of either phones or syllables (for instance, chu for ach, or chave for vechá, respectively) after inductive training. In the phone-reversal condition the mean percentage of correct responses was 9% (range 0% to 20%), while in the syllable-reversal condition it was much higher: 48% (range 13% to 93%).

The present results clearly indicate that the ability to deal explicitly with the phonetic units of speech is not acquired spontaneously. Learning to read, whether in childhood or as an adult, evidently allows the ability to manifest itself. Thus, it is incorrect to say that awareness of the phonetic structure of speech is a precondition for starting to learn to read and write. The precondition for acquiring these skills is not phonetic awareness as such but the cognitive capacity for “becoming aware” during the first stages of the learning process.

Of course, the present results do not mean that cognitive growth plays no part in the development of phonetic awareness. Specific training may not be effective before some critical developmental stage. If awareness depends on instruction, it does not necessarily follow from it. Successful instruction, on the other hand, depends on awareness. There is a reciprocal relationship between learning to read and the developmental changes in phonetic awareness.

Two important questions should now be examined. The first is to what extent phonetic awareness can be provoked by other stimulating experiences. Although for most children learning to read constitutes the exercise that renders the analysis of speech into its phonetic elements imperative, it is not necessarily unique in that function, and other kinds of training might presumably achieve the same effect.
The second question is to what extent the procedures used in recognizing and producing speech can be affected by awareness of speech as a sequence of phones. The fact that illiterates are not aware of the phonetic structure of speech does not imply, of course, that they do not use segmenting routines at this level when they listen to speech. But that fact should remind us of the risk we incur in studying the mechanisms of speech perception through tasks that require conscious, explicit segmentation. Under the pressure of modern developments in linguistics and phonetics, some psychologists were led to consider the so-called “psychological reality” of, for example, transformational grammars, or of phones and phonemes. It is not always clear whether this kind of inquiry concerns implicit (tacit) or explicit knowledge (cf. the discussion of this point by Seuren, 1978). If the question concerns how we perceive speech, by first segmenting it either into phones
(phonemes) or into syllables (the question apparently considered by Savin and Bever, 1970, and other authors), then it refers to tacit knowledge. The present results with illiterates are irrelevant to this question, but they urge us to distinguish between the prevalence of one unit or another in segmenting routines at an unconscious level and the ease of access to the same units at a conscious, metalinguistic level.
References

Alegria, J. and Morais, J. (1979). Le développement de l’habileté d’analyse phonétique consciente de la parole et l’apprentissage de la lecture. Archives de Psychologie, in press.
Cary, L. and Morais, J. (1979). A aprendizagem da leitura e a consciência da estrutura fonética da fala. Revista Portuguesa de Psicologia, in press.
Gleitman, L. R. and Rozin, P. (1977). The structure and acquisition of reading. I: Relations between orthographies and the structure of language. In A. S. Reber and D. L. Scarborough (Eds.), Toward a Psychology of Reading. Hillsdale, Lawrence Erlbaum Associates.
Liberman, I. Y., Shankweiler, D., Fischer, F. W. and Carter, B. (1974). Reading and the awareness of linguistic segments. J. Exper. Child Psychol., 18, 201-212.
Liberman, I. Y., Shankweiler, D., Liberman, A. M., Fowler, C. and Fischer, F. W. (1977). Phonetic segmentation and recoding in the beginning reader. In A. S. Reber and D. L. Scarborough (Eds.), Toward a Psychology of Reading. Hillsdale, Lawrence Erlbaum Associates.
Read, C. (1978). Children’s awareness of language, with emphasis on sound systems. In A. Sinclair, R. J. Jarvella and W. J. M. Levelt (Eds.), The Child’s conception of language. Berlin, Springer-Verlag.
Savin, H. B. (1972). What the child knows about speech when he starts to learn to read. In J. F. Kavanagh and I. G. Mattingly (Eds.), Language by ear and by eye. Cambridge, Mass., MIT Press.
Savin, H. B. and Bever, T. G. (1970). The non-perceptual reality of the phoneme. J. Verb. Learn. Verb. Behav., 9, 295-302.
Seuren, P. (1978). Grammar as an underground process. In A. Sinclair, R. J. Jarvella and W. J. M. Levelt (Eds.), The Child’s conception of language. Berlin, Springer-Verlag.
Shankweiler, D. and Liberman, I. Y. (1976). Exploring the relations between reading and speech. In R. Knights and D. J. Bakker (Eds.), The neuropsychology of learning disorders: Theoretical approaches. Baltimore, University Park Press.
Slobin, D. I. (1978). A case study of early language awareness. In A. Sinclair, R. J. Jarvella and W. J. M. Levelt (Eds.), The Child’s conception of language. Berlin, Springer-Verlag.
Zhurova, L. Y. (1973). The development of analysis of words into their sounds by preschool children. In C. A. Ferguson and D. I. Slobin (Eds.), Studies of child language development. New York, Holt, Rinehart and Winston, Inc.
Résumé

A group of illiterate adults was unable to delete or add a phone at the beginning of a non-word, but these tasks were easily performed by a group of people whose environment and childhood experience were similar and who had learned to read rudimentarily as adults. Awareness of speech as a sequence of phones is therefore not acquired spontaneously in the course of cognitive development, but requires specific training, which, for most people, is probably provided by learning to read in the alphabetic system.
Cognition, 7 (1979) 333-362
©Elsevier Sequoia S.A., Lausanne
Printed in the Netherlands

Intentional communication in the chimpanzee: The development of deception*

GUY WOODRUFF
Primate Facility, University of Pennsylvania

DAVID PREMACK
Department of Psychology, University of Pennsylvania
Abstract

Communication about the location of a hidden incentive was studied in chimpanzee-human dyads, in which each member of a pair served alternately as “sender” and “recipient” of information. When the human cooperated with the chimpanzee in finding the goal, the chimpanzees were able from the very beginning to produce and comprehend behavioral cues which conveyed accurate locational information. When the human and chimpanzee competed for the goal, the chimpanzees learned both to withhold information from or mislead the recipient, and to discount or controvert the sender’s own misleading cues. The chimpanzee’s ability to convey and utilize both accurate and misleading information, by taking into account the nature of the sender or recipient, provides evidence of a capacity for intentional communication in this nonhuman primate species.
A central issue in the comparative study of animal communication concerns the concept of intentionality. A good deal of research has shown that the social behavior of many species from diverse phyletic levels can serve a communicative function, by transferring information from one individual to another (Altmann, 1967; Hinde, 1972; Marler, 1965; Smith, 1977). However, the extent to which different species communicate intentionally, i.e., understand and control the transfer of information, is largely unexplored.

*The research was supported by National Science Foundation grants BNS 75-19748 and BNS 77-16853, and by a facilities grant from the Grant Foundation. We thank G. Dank, R. Glick, Z. Goldfinger, S. Goldsmith, E. Kaufmann, K. Kennel, W. Langbauer, D. Liner, F. Massa, J. Moselsky, B. Pfeffer, R. Reisman, A. Samuels, E. Wier, B. Yarczowei, and ?. Zwicker for assistance as trainers and aides. We are also indebted to E. Menzel for advice and assistance during preparation of the project. Address reprint requests to G. Woodruff, University of Pennsylvania Primate Facility, Honey Brook, Pennsylvania, 19344 USA.

By definition, any communicative event involves a “sender”, a “recipient”, and behavioral signals that convey information between the two (MacKay, 1972). A particular instance of communication is intentional if, in addition, the sender (i) appreciates the fact that his behavior transmits information, (ii) recognizes that the recipient also knows that his behavior is informative, and (iii) is able to choose from a set of alternatives the course of action (or inaction) that will provide (or suppress) a given bit of information. Intentional communication is thus more than a simple transfer of information; it is a purposive transfer, based on the sender’s knowledge of the effect that his actions can have on the recipient.

Although intentionality no doubt plays a pervasive role in human communication, especially human language (Lyons, 1972), there is no firm evidence for this level of complexity in the communication systems of nonhumans. Indeed, some authors have foreclosed the issue, denying the possibility of intentionality outside the human species. But the present paucity of evidence makes such dismissals inadvisable. Even in the case of language, intentionality often remains hidden. The mere fact that communication is symbolic in form does not establish that it is intentional (and, conversely, the fact that it is not symbolic does not rule out intentionality). Often the underlying intentionality is unveiled only when there is a breakdown in the assumptions shared by speaker and listener. For example, an important element of ordinary conversation is the assumption of truth shared by both parties: the listener assumes the speaker tells the truth, and the speaker assumes the listener considers him truthful (Grice, 1967).
When this assumption is violated, intentionality may be revealed by the speaker’s ability to suppress or otherwise alter the information he conveys, and by the listener’s ability to adjust his response to false information provided by a devious speaker. Even very young children show evidence of this level of control and understanding of their communication (Flavell, Botkin, Fry, Wright and Jarvis, 1968). Students of animal communication suggest that evidence of deception would provide the best indication of intentional communication in a given species (Hinde, 1972; Marshall, 1970). In keeping with this suggestion, field observations of animals (typically, primates) engaging in behavior that misleads another individual (e.g., Köhler, 1925; Menzel, 1974; van Lawick-Goodall, 1971) have been cited as evidence of intentionality. Unfortunately, these provocative observations must remain at best suggestive; not all instances of misleading behavior qualify as deceit. For instance, a misleading
signal is not intentional if it results from an occasional “error” on the part of a sender who otherwise always conveys accurate information. Nor is it intentional if the behavior is always triggered by a particular stimulus situation (e.g., the instinctive behavior pattern of a bird “feigning” a broken limb in the presence of a predator; Simmons, 1951). A claim for intentionality requires a demonstration that an individual can reliably use his communicative behavior to convey either accurate or misleading information, as the situation demands.

In the present study we systematically explored the potential for deception in a nonhuman primate, the chimpanzee. In one test, a chimpanzee was informed of the location of hidden food, but was denied direct access to it by a physical barrier. The animal could obtain the food only by imparting information about its location to an uninformed human positioned outside the enclosure, in the vicinity of the goal. One human was friendly and cooperative: if he found the food, he gave it to the chimpanzee, but if he failed, the animal received nothing. Another human was hostile and competitive: if he found the food, he kept it for himself, but if he failed, the chimpanzee was allowed to leave the enclosure and obtain the food. Thus, the chimpanzee’s success in procuring the goal depended upon his ability to convey accurate locational information to a cooperative partner on the one hand, and to suppress or convey misleading information to a competitive individual on the other. In a second test, we reversed the roles of sender and recipient played by chimpanzee and human. The human was informed of the goal location, and the chimpanzee was now required to find the food by using the behavior of the human as a source of information.
The humans modelled the chimpanzees’ behavior patterns observed in the previous test, and in addition, one human was cooperative (he always indicated the correct location) whereas the other was competitive (he consistently indicated an incorrect location). Here we assessed the chimpanzee’s ability to comprehend accurate and misleading information, and to adjust his search accordingly.
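The payoff structure of the two tests can be made explicit in a few lines. This is a hypothetical sketch, not a model from the study: it reduces each trial to two containers, assumes that in the production test the trainer simply searches whichever container the chimpanzee indicates, and ignores the option of withholding information altogether. Under those assumptions, the rewarding strategy as sender is to indicate the baited container to the cooperative trainer and the empty one to the competitive trainer; as recipient, it is to follow the cooperative human's cue and choose the other container when the competitive human gives it.

```python
from typing import Literal

Side = Literal["left", "right"]
Trainer = Literal["cooperative", "competitive"]


def other(side: Side) -> Side:
    """The container on the opposite side."""
    return "right" if side == "left" else "left"


def chimp_rewarded(trainer: Trainer, baited: Side, searched: Side) -> bool:
    """Production test payoff: the chimpanzee eats if a cooperative trainer
    finds the food, or if a competitive trainer fails to find it."""
    found = searched == baited
    return found if trainer == "cooperative" else not found


def best_signal(trainer: Trainer, baited: Side) -> Side:
    """Container to indicate, assuming (hypothetically) that the trainer
    always searches wherever the chimpanzee directs him."""
    return baited if trainer == "cooperative" else other(baited)


def best_search(trainer: Trainer, indicated: Side) -> Side:
    """Comprehension test: the cooperative human indicates the baited
    container, the competitive human indicates the empty one."""
    return indicated if trainer == "cooperative" else other(indicated)
```

Under these assumptions, `chimp_rewarded(t, b, best_signal(t, b))` is true for both trainers and both baited sides, which is the sense in which accurate and misleading signals are each the right choice for one partner.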
Method

Subjects

The subjects were four African-born chimpanzees (Pan troglodytes), one male (Bert) and three females (Sadie, Luvie, and Jessie). The animals arrived in the laboratory at ages estimated to be one to one and a half years. They had lived in the laboratory as a group for ten months at the start of the
experiment. The animals were fed two meals daily (fresh fruit and Purina monkey chow), and received a variety of cookies and candies throughout the day in other experiments run concurrently with the present one.

Human trainers
Laboratory assistants and undergraduate student volunteers participated as trainers and aides in the experiment. The present tests were conducted over a period of three years, during which time the identity of the trainers changed periodically. However, all persons involved were familiar with the chimpanzees (sharing in caretaking duties and assisting in other experiments) before serving in the present tests. There were two basic types of trainers, distinguished by sartorial appearance and behavioral disposition. The “cooperative” trainer wore the usual green laboratory scrub suit, behaved in a friendly manner toward the animals, and vocalized in the soothing tone of voice one would use with a young child. The “competitive” trainer wore black boots, a white coat and hat, dark sunglasses, and a cloth over his mouth (after the fashion of a bandit). He behaved in a hostile manner toward the animals, occasionally swatting them as he passed in the hallway, and vocalized in a low, gruff tone of voice. A third person, the “passive” aide, always accompanied the chimpanzees during the tests, in order to reduce the animal’s distress at being separated from his/her companions. The passive aide interacted minimally with the animal during trials, only allowing the animal to cling to him if he/she so desired.

Materials
The tests were conducted in a 10 by 11 by 10 foot (height by width by length) laboratory room. Just inside the door of the room was a 4 by 5 foot enclosure, bounded on two sides by heavy gauge wire mesh. The mesh ran from floor to ceiling, except for a space of six inches between the bottom of the mesh and the floor. A cage door allowed passage through the enclosure into the testroom, and a one-way observation window was located in the wall to the left of the doorway. Small containers were used to conceal food in the room, and were selected from a set of common laboratory items (cardboard boxes, tin cans, plastic cups, coffee pots, and so on). Foods used during the experiment were small pieces of fresh or dried fruit, berries, candies, and cookies. Each session was videotaped by means of a Panasonic WV-I8OP studio camera with wide-angle lens, located in one corner of the testroom outside the enclosure, and a Sony VO-1800 videocassette recorder, located in a booth behind the observation window. An observer in the booth provided running commentary about trial events on the videotape soundtrack.
Procedure
The experiment consisted of three phases: the first lasting five months, followed by a six-month hiatus; the second lasting 14 months, followed by a ten-month hiatus; and the third lasting one month.

Phase 1. In the first phase, only the production test was administered. Sessions consisted of three to six trials with intertrial intervals of approximately two minutes, and were conducted five times per week. Prior to the start of a trial, an aide concealed food under one of two containers located in the testroom, about three feet from the mesh enclosure. The position of the baited container on the left or right side of the room (from the point of view of the subject in the enclosure) was randomized across trials. Only two containers (cardboard box and plastic cup) were used in this phase, and each contained food equally often across trials. The aide removed a chimpanzee from the home cage or the outdoor field, carried him/her into the room, and closed the door. The aide lifted the baited container, gave the subject a direct view of the food, and then concealed the food by replacing the container. The aide then carried the animal out of the testroom and down a hallway to an adjacent room. At a signal from the aide, either the “cooperative” or the “competitive” trainer and the “passive” aide left the adjacent room, and the chimpanzee was turned over to the passive aide. The trainer entered the testroom through the enclosure, shut the cage door, and positioned himself in the far corner of the room, equidistant from the two containers. Finally, the passive aide carried the subject into the enclosure, shut the room door, sat down in the corner by the door, and relaxed his grasp of the subject, at which point the trial officially began. The delay between the subject’s view of the food and the start of the trial ranged from 20 to 30 seconds. The trainer’s task was to use the behavior of the informed chimpanzee to determine which one of the two containers held the food.
Hence, several precautions were taken to ensure that other cues could not reveal the goal location. The exact location of each container changed from trial to trial, but always remained within an area of radius two feet. This prevented cues from changes in the position or orientation of the correct container during baiting. In addition, during baiting the testroom door was closed and the trainer and passive aide were stationed in an adjacent room with the door closed. This prevented cues from the sound of the correct container being shifted, lifted, and replaced during baiting. Thus, of the three organisms in the testroom, only the chimpanzee knew the location of food at the start of each trial.

The cooperative and competitive trainers were given several general instructions about their roles in the experiment. They were urged to use the chimpanzee’s behavior to locate the baited container as often as possible, encouraged to take as much time as they needed to be certain of their choice, and encouraged to move about freely in the room in order to solicit cues, approaching one or the other container without necessarily overturning it. In addition, each trainer was urged to develop an accurate choice strategy and then maintain that strategy as much as possible; if a subject later began to mislead him on repeated trials, he was not to change his strategy to “outwit” the subject (e.g., by choosing the opposite container). During initial trials the trainers were of course naive to the task. When experienced trainers later departed from the laboratory, their replacements were given the same instructions, as well as extensive experience viewing trials through the observation window and on videotape.

The outcome of each trial depended upon which container the trainer chose and whether he was playing the cooperative or competitive role. If the cooperative trainer chose the baited container, he gave the food to the chimpanzee. If he chose the unbaited container, however, the chimpanzee was consoled by the passive aide and led out of the room without food. On the other hand, if the competitive trainer chose the baited container, he kept the food for himself. If he chose the unbaited container, however, he retired to the far corner of the room in disgust, and sulked as the passive aide allowed the chimpanzee to leave the enclosure to find the food.
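The payoff rules just described for the Phase 1 production test can be restated compactly in code. This is only an illustrative summary of the rules in the text, not software used in the study; the enum and the function name `production_outcome` are hypothetical. Enumerating the four cases makes explicit that the chimpanzee obtains the food only when the cooperative trainer chooses correctly or the competitive trainer chooses incorrectly.

```python
from enum import Enum


class Trainer(Enum):
    COOPERATIVE = "cooperative"
    COMPETITIVE = "competitive"


def production_outcome(trainer: Trainer, chose_baited: bool) -> str:
    """Return who obtains the food on a Phase 1 production trial (hypothetical helper)."""
    if trainer is Trainer.COOPERATIVE:
        # Correct choice: the trainer gives the food to the chimpanzee.
        # Incorrect choice: the chimpanzee is led out of the room without food.
        return "chimpanzee" if chose_baited else "nobody"
    # Competitive trainer: keeps the food when correct; when wrong,
    # the chimpanzee is allowed to leave the enclosure and find it.
    return "trainer" if chose_baited else "chimpanzee"


# Print the full payoff table, one line per (trainer, choice) combination.
for trainer in Trainer:
    for chose_baited in (True, False):
        print(f"{trainer.value:12} chose_baited={chose_baited!s:5} -> "
              f"{production_outcome(trainer, chose_baited)}")
```

Under these rules, accurate information pays only with the cooperative trainer, while withholding or misleading cues is what pays with the competitive one.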
The chimpanzees were first given 24 trials with only the cooperative trainer at the outset of Phase 1 (the Pretest), in order to be sure that locational information could indeed be transferred in this situation. Thereafter, the chimpanzees were exposed to both trainers in random order across sessions, with only one type of trainer appearing in any given session. Testing in Phase 1 continued for each subject until both trainers’ choice performance appeared stable, with no obvious increasing or decreasing trends in accuracy.

Phase 2. In the second phase, the animals were given tests of both production and comprehension. The production test described for Phase 1 was modified in two ways. First, the containers were selected from a set of four new items, and all possible pairwise combinations of two containers were presented in random order. Second, a one-minute time limit was imposed on all trials. Now the trainers could not take all the time they wished, but were instructed to try to make a choice as soon as possible. (However, the trainers were not simply to guess if they remained uncertain about the location of the food, a strategy that had been allowed in Phase 1.) If the cooperative trainer failed to choose within the time limit, the chimpanzee was carried from the room without food. In contrast, if the competitive trainer was unable to choose within one minute, he retired to the far corner of the room and the chimpanzee was allowed to leave the enclosure to find the food. This time limitation was designed to foster rapid and efficient communication with the cooperative trainer, and to allow withholding of information for a specified period of time in the presence of the competitive trainer.

We also administered a test of comprehension, in which the roles of informed sender and uninformed recipient played by chimpanzees and humans were reversed. Prior to the start of a trial, an aide gave the trainer a direct view of the food concealed beneath one of the containers. At a signal from the aide, the passive aide then carried a chimpanzee from an adjacent room into the testroom, through the enclosure, to the far corner of the room. The informed trainer then entered the enclosure, shut the door, and performed a set of responses in a prescribed manner. Approximately five seconds after the trainer entered the room, the passive aide relaxed his grasp of the chimpanzee, at which point the trial officially began. The chimpanzee’s task was to use the behavior of the informed human to determine which one of the two containers held food. The same precautions as those described for the production test were taken to ensure that other cues could not reveal the goal location. As an additional precaution, the passive aide averted his gaze toward the floor beneath him throughout each trial.
Since he did not look at and “read” the cooperative or competitive trainer’s cues, the passive aide could not inadvertently cue the chimpanzee about the goal location. The cooperative trainer always oriented toward the baited container, whereas the competitive trainer always oriented toward the unbaited container. The behavior pattern performed by the trainers was derived from that commonly observed in the subjects on cooperative trials by the end of Phase 1. The trainer approached the wire mesh nearest one container, sat on the floor with head and torso oriented toward the container, extended a foot or hand under the mesh approximately six inches in the direction of the container, and looked back and forth from the chimpanzee to the container. The trainer thus served as a crude “model” of the chimpanzees’ behavior. Most subjects showed similar types of responses, although they were often embedded in a stream of behavior including other responses not oriented toward a container (jumping, climbing, scratching, clinging to and grooming the passive aide, and so on).
The outcome of each trial depended upon which container the chimpanzee chose. If he approached and overturned the baited item, he was allowed to eat the food. If he chose the unbaited container, he was consoled by the passive aide and carried out of the room without food. However, a one-minute time limit was also imposed on comprehension test trials. If the animal failed to inspect a container on a cooperative trial, he was carried from the room without food; but if the subject failed to choose on a competitive trial, the trainer left the room and the animal was given the food by the passive aide.

In both the production and comprehension tests of Phase 2, sessions consisted of six trials with intertrial intervals of approximately one minute. Only one type of trainer appeared in any given session. Testing consisted of two cycles of exposure to the following order of conditions: comprehension (cooperation, and then competition), followed by production (competition, and then cooperation). Testing in each condition was continued for each subject until at least one significant “run” (Grant, 1947) of responses was recorded, with the restriction that subjects received a minimum of 24 trials and a maximum of 144 trials per condition in each cycle.

Phase 3. In the third and last phase, the chimpanzees were given a final brief test of both production and comprehension. The procedures were the same as those described for Phase 2, with two modifications. First, containers were selected from a new and larger set of items, such that each trial contained a unique pair of baited and unbaited items. Second, trials with the cooperative and competitive trainers were counterbalanced within sessions, such that each trainer appeared on three of the six trials per session. Both production and comprehension tests entailed a total of 48 trials, 24 trials with each trainer.

Videotape analysis
At the conclusion of the experiment, videotapes of the chimpanzees’ behavior during Phases 1 and 2 of the production test were used to analyze response topographies. The first and last 24 trials with the cooperative and competitive trainers in each phase were edited onto new videotapes. The editing eliminated a view of the subject’s reaction to the outcome of the trainer’s choice (receipt or loss of food). The edited trials were played for a panel of three laboratory assistants, who were familiar with the chimpanzees but had neither observed nor participated in the experiment proper. The tapes were played at normal speed and in slow motion, repeatedly if necessary, and the observers recorded the frequency (and direction, if applicable) of the following types of behavior: (i) general movement in space toward one side of the room or the other (e.g., walking, running, jumping, somersaulting, sliding on the floor, and so on), (ii) more specific orientation of parts of the body (torso, head, limbs), (iii) visual behavior (glances toward the containers and the trainer), (iv) intensifications of a response as the trainer approached a container (a change in rocking motions was virtually the only response recorded in this category), and (v) responses which had the effect of moving or keeping the subject away from the containers (climbing the mesh to the ceiling, clinging to the passive aide). The observers discussed their observations and recorded a response only when a majority agreed upon a decision (interobserver agreement was 85% or more for each chimpanzee).
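The majority rule and the agreement figure can be made concrete with a short sketch. The text does not say exactly how interobserver agreement was computed; the code below assumes mean pairwise percent agreement among the three observers. All function names and the codes ("L" and "R" for orientation left or right, "-" for no directional response) are hypothetical, and the example data are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Optional, Sequence


def majority_code(codes: Sequence[str]) -> Optional[str]:
    """Code recorded by a strict majority of observers, or None if there is none."""
    label, count = Counter(codes).most_common(1)[0]
    return label if count > len(codes) / 2 else None


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of scored events on which two observers gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty code sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mean_pairwise_agreement(observers: Sequence[Sequence[str]]) -> float:
    """Average percent agreement over all pairs of observers (an assumed definition)."""
    pairs = list(combinations(observers, 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical codes from three observers for the same four scored events.
obs1 = ["L", "L", "R", "-"]
obs2 = ["L", "R", "R", "-"]
obs3 = ["L", "L", "R", "L"]

# Each event is recorded only if a majority of the three observers agree on it.
decisions = [majority_code(event) for event in zip(obs1, obs2, obs3)]
print(decisions, round(mean_pairwise_agreement([obs1, obs2, obs3]), 1))
```

With three observers, a "majority" means at least two agreeing codes. Other agreement indices (for example, chance-corrected ones) would give different numbers from the same data.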
Results
Production test

The cooperative trainer found enough information in the animals’ behavior to choose correctly almost from the beginning of the experiment. Table 1 presents an analysis of “runs” of consecutive correct or incorrect choices by the trainers. The table shows that within the first 24 trials (Pretest), the cooperative trainer showed a statistically significant run of consecutive correct choices beginning on Trials 17, 4, 8, and 6 for Sadie, Bert, Luvie, and Jessie, respectively. The left-hand column of Figure 1 presents the number of trials on which the trainer chose correctly during the Pretest, and these data reveal a significant proportion of trials with a correct choice for each animal (17 or more correct choices per 24 trials, p < 0.05, binomial test).

Having established in the Pretest that information could be transferred from chimpanzee to human in this situation, we contrasted cooperative with competitive trials and examined whether or not the animals could exert some degree of control over the flow of information. At first, there was no evidence that any animal differentially controlled the locational information imparted by his behavior. Table 1 indicates that the competitive trainer readily showed a significant run of correct choices, beginning on Trials 1, 25, 4, and 1 for Sadie, Bert, Luvie, and Jessie, respectively. Moreover, there was no significant difference between the trainers’ choice performance during the first 24 trials with each trainer (Phase 1i, Figure 1) for any subject. Thus, in the beginning the competitive trainer was as successful in denying food to the animals as the cooperative trainer was in giving it to them.
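The Pretest criterion of 17 or more correct choices out of 24 can be checked against the binomial distribution. A minimal sketch, assuming chance accuracy p = 0.5 (two containers) and a one-tailed test; the function names are illustrative, not from the study. It searches for the smallest count whose upper-tail probability falls below the significance level.

```python
from math import comb


def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant(n: int, alpha: float = 0.05, p: float = 0.5) -> int:
    """Smallest count k of correct choices with one-tailed P(X >= k) < alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) < alpha:
            return k
    raise ValueError("no count reaches the criterion")


k = min_significant(24)
# Show the criterion and the tail probabilities on either side of it.
print(k, round(upper_tail(k, 24), 4), round(upper_tail(k - 1, 24), 4))
```

Under these assumptions the search returns 17: P(X ≥ 17) is about 0.032, whereas P(X ≥ 16) is about 0.076, so 16 correct choices would not reach p < 0.05.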
Table 1. Significant “runs” of consecutive correct or incorrect choices by the trainers in the production test, and by the chimpanzees in the comprehension test. Each entry shows the number of trials in each run per total trials to the end of the run.

[Entries not recoverable: for each subject (Sadie, Bert, Luvie, Jessie) and Phase 1, 2, and 3, significant runs of correct and of incorrect choices under Cooperation and Competition, in the Production and Comprehension tests.]

Probabilities were computed with Grant’s “runs” test (Grant, 1947): a, p < 0.05; b, p < 0.01; c, p < 0.001.
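Grant (1947) supplied probabilities for runs of consecutive correct responses. The exact form of Grant's criterion is not given in this text, so the sketch below is a generic substitute rather than a reproduction of Grant's tables. It assumes independent two-choice trials with chance accuracy p = 0.5 and computes, by dynamic programming over the length of the current streak, the probability that n trials contain at least one run of k or more correct choices. The function name is hypothetical.

```python
def prob_run_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability that n independent trials, each correct with probability p,
    contain at least one run of k or more consecutive correct choices."""
    if k < 1:
        raise ValueError("run length must be at least 1")
    # state[j]: probability that no run of length k has occurred yet
    # and the current streak of correct choices has length j (0 <= j < k).
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability mass that has already produced a run of length k
    for _ in range(n):
        nxt = [0.0] * k
        for j, mass in enumerate(state):
            nxt[0] += mass * (1 - p)  # an error resets the streak
            if j + 1 == k:
                hit += mass * p  # the streak reaches k and is absorbed as a "hit"
            else:
                nxt[j + 1] += mass * p
        state = nxt
    return hit


print(prob_run_at_least(9, 9))   # a run of 9 correct choices filling all 9 trials
print(prob_run_at_least(6, 24))  # some run of 6 or more anywhere in 24 trials
```

Under these assumptions, for example, a run of 9 correct choices occupying the first 9 trials has probability 0.5**9, about 0.002. Longer sequences make any given run length easier to reach by chance, which is why a runs criterion must take the total number of trials into account.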
The introduction of the competitive trainer was not entirely without effect, however. The competitive trainer took longer to “read” the behavior of several of the animals and decide which was the baited container, even at the start of Phase 1, before the two trainers differed in their ability to choose correctly (Figure 1). Figure 1 shows that choice accuracy generally declined for both trainers during Phase 1. Whereas the cooperative trainer later regained his ability to choose correctly, the decrease in the competitive trainer’s performance level was permanent for all subjects. Indeed, accuracy for the cooperative trainer clearly exceeded that for the competitive trainer by the end of Phase 1 for three chimpanzees, and the magnitude of the difference increased over the course of Phase 2.

Figure 1. Number of trials with a correct choice (left-hand panels) and mean trial duration (right-hand panels) at various stages of the production test with each subject (Sadie, Bert, Luvie, Jessie). Solid lines show data for the cooperative trainer, broken lines for the competitive trainer. Data are shown for blocks of 24 trials with each trainer: 1p, initial Pretest with only the cooperative trainer; 1i, initial trials of Phase 1 with alternating trainers; 1f, final trials of Phase 1; 2i, initial trials of Phase 2; 2f, final trials of Phase 2. Vertical dotted lines connecting points for the cooperative and competitive trainers indicate a statistically significant difference (p < 0.05) between performances for the trainers.

Table 2. Accuracy scores for the trainers in the production test and for the chimpanzees in the comprehension test. Each score shows the number of trials with a correct choice (the baited container) per total trials on which a choice was made. Also shown are the number of additional trials in which the trainers or chimpanzees failed to make a choice within the time limit imposed in Phases 2 and 3.

[Entries not recoverable: for each subject (Sadie, Bert, Luvie, Jessie) and Phase 1, 2, and 3, the proportion correct and the number of trials with no choice, under Cooperation and Competition, in the Production and Comprehension tests.]

Probabilities (above or below chance) were computed with the normal approximation to the binomial distribution (Courts, 1966): a, p < 0.05; b, p < 0.01; c, p < 0.001.

Choice performance for the two trainers showed a statistically significant difference (p < 0.05; z-test for a difference between proportions) by the end of Phase 1 for Luvie, at the start of Phase 2 for Sadie, and by the end of Phase 2 for Bert and Jessie (Figure 1). The right-hand panels of Figure 1 show the course of change in the trainers’ latency to choose during the experiment. The competitive trainer generally took more time to make his choice, and the difference between trainers attained statistical significance (p < 0.05; t-test for a difference between means) at some point during Phase 1 or 2 for all animals.

Table 2 presents the overall performance levels for both trainers in each phase of the experiment. The proportion of trials with a correct choice by the cooperative trainer attained levels significantly above chance (p < 0.05; binomial test) for all subjects in all phases except Bert in Phase 1. In contrast, overall accuracy for the competitive trainer decreased during testing for all animals. The competitive trainer was able to choose correctly on a majority of trials in Phase 1 with Luvie and Jessie, but performed at chance levels in Phases 2 and 3 with these two animals. On the other hand, the competitive trainer showed chance levels of performance in Phase 1 with Sadie and Bert, and then accuracy declined to levels significantly below chance during Phases 2 and 3. Finally, the table shows that when the one-minute time limit was imposed in Phases 2 and 3, the competitive trainer was unable to make a choice on a substantial number of trials with Bert, Luvie, and Jessie, whereas the cooperative trainer never experienced this difficulty with any subject. Thus, all animals learned to convey or suppress information about the location of food, depending upon whether the trainer was cooperative or competitive.
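The z-test for a difference between two proportions, used above to compare the trainers' accuracy, can be sketched as follows. This assumes the common pooled-variance form with a two-sided normal p-value; the text does not report which variant was used, and the counts in the example are hypothetical, not taken from the tables. The function name is illustrative.

```python
from math import erf, sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sample z-test for H0: p1 == p2; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    z = (p1 - p2) / se
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - phi)


# Hypothetical block of 24 trials per trainer: 20 correct versus 12 correct.
z, p_value = two_proportion_z(20, 24, 12, 24)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

For these invented counts the statistic is exactly sqrt(6), about 2.45, which lies beyond the 1.96 cutoff for p < 0.05 (two-sided).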
However, the results show a further development for Sadie and Bert: these subjects demonstrated an ability to misinform the competitive trainer. This trainer eventually showed a significant run of incorrect choices (Table 1) and chose the unbaited container on a significant proportion of trials (Table 2) in Phase 2 for both animals. This ability to mislead the competitive trainer appeared first for Sadie at the outset of Phase 2, and for Bert by the end of Phase 2. Finally, the data from Phase 3 (Table 2) show that even after a ten-month hiatus in testing, the subjects were quite flexible in choosing to convey accurate information, to withhold it, or to convey misleading information when the type of recipient changed from trial to trial within sessions.

Response forms
Although these results establish that information was transmitted or suppressed by the chimpanzees in the production test, they tell us nothing about the actual behavior that was the source of information, nor how that behavior developed and changed during the experiment. Figures 2, 3 and 4 present the results of the videotape analysis performed on each chimpanzee’s behavior during the first and last 24 trials of Phases 1 and 2 with each trainer. Figure 2 presents the mean frequency (responses per minute) of each of the various types of response scored by the observers. Figure 3 shows the incidence of directional bias (left versus right) for those types of response that were oriented toward the containers. Lastly, Figure 4 shows a measure of the correlation between the direction of each type of response and the actual location of food.

Figure 2. Mean response rate for components of the subjects’ behavior patterns in the production test. The responses are: A, approach one side of mesh; T, orient torso toward container; Gc, gaze at container; P, “point” with extended arm or leg toward container; Gt, gaze at trainer while other parts of the body orient toward a container; R, change in intensity of rocking motions when trainer approaches a container; Cb, climb mesh to ceiling; Cl, cling and/or groom passive aide. Solid lines connect data points from trials with the cooperative trainer, broken lines those from trials with the competitive trainer.

Figure 2 provides an indication of the similarities in the subjects’ response forms, as well as differences in their respective styles. In one way or another, all animals tended to orient parts of their body toward the containers. At first, however, these responses were embedded in substantially different behavior patterns for each subject. For example, Sadie often sat with her back to one wall of the room in the enclosure, rocking from side to side for much of the time during early trials. Bert and Jessie tended to cling to the aide, making only brief sojourns to one side of the mesh or the other, and then returning to the aide. Only Luvie showed a relatively well-organized pattern of behavior which appeared “deliberately” informative at the start. On most trials, she left the aide immediately, walked to one side of the mesh, sat facing the near container, extended one leg under the mesh toward the container (“pointed”), and then glanced back and forth from trainer to container.

Figure 3. Proportion of all orientational responses which were directed toward the container on the left side of the room, irrespective of the location of the food. See Figure 2 for explanation of details.

Over the course of the experiment, similarities in the subjects’ response patterns became more pronounced. Orientational responses increased in frequency, and what was perhaps the most explicit cue, “pointing” with outstretched arm or leg, emerged for the remaining three animals. The fol-
Figure 4. Proportion of all orientational responses which were directed toward the baited container, irrespective of the position of the container on the left or right side of the room. See Figure 2 for explanation of details.
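The two measures plotted in Figures 3 and 4 are simple proportions over the scored orientational responses. A minimal sketch, assuming each response has been coded as a pair (direction of the response, side of the baited container); the function name and the example data are hypothetical.

```python
from typing import Literal, Sequence, Tuple

Side = Literal["left", "right"]


def orientation_measures(responses: Sequence[Tuple[Side, Side]]) -> Tuple[float, float]:
    """Given (direction of response, side of baited container) pairs, return
    (proportion directed left, proportion directed toward the baited container)."""
    if not responses:
        raise ValueError("no orientational responses scored")
    n = len(responses)
    to_left = sum(direction == "left" for direction, _ in responses) / n
    to_bait = sum(direction == baited for direction, baited in responses) / n
    return to_left, to_bait


# Hypothetical coded responses from one block of trials.
responses = [("left", "left"), ("left", "right"), ("right", "right"), ("right", "right")]
print(orientation_measures(responses))
```

Because the baited side was randomized, a pure side bias would move the first value away from 0.5 while leaving the second near 0.5. Informative orientation pushes the second value toward 1, and systematically misleading orientation pushes it below 0.5.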
, explained by a “structural” hypothesis, than in terms of strategies designed to locate given and new information.
Introduction

Although a large body of psycholinguistic research has been devoted to the study of sentences in isolation, it is now widely recognized that any approach which ignores the role of context is severely limited. One way of formulating the relationship between sentences and their contexts is in terms of their presuppositional content. For this reason the phenomenon of presupposition has received considerable attention from psychologists concerned with sentence comprehension (e.g., Haviland and Clark, 1974; Hornby, 1974), sentence memory (e.g., Offir, 1973; Singer, 1976; Hupet and Le Bouedec, 1977) and sentence production (e.g., Osgood, 1971; Bock, 1977). Similarly, by studying presupposition, the present research aimed to further elucidate the mechanisms by which sentences are understood in context. Our particular interest was in the type of presupposition which is created by the sentence’s surface structure.

*This research was partly supported by an Australian Research Grants Committee award to V. M. Holmes.
**Requests for reprints should be addressed to: J. Langford, Department of Psychology, University of Melbourne, Parkville, Victoria 3052, Australia.
J. Langford and V. M. Holmes
A syntactic presupposition may be identified as that part of a sentence’s meaning which is not affected by negation of the sentence. It may be distinguished from the focus, the part of the sentence falling within the scope of negation, and from the assertion, the message produced by the focus in combination with the presupposition.

(1) It was a napkin that John embroidered.
(2) It was not a napkin that John embroidered.
(3) It was John who embroidered a napkin.
(4) The one who embroidered a napkin was John.

In both sentences (1) and (2), the presupposition is John embroidered something and the focus is the napkin. The assertions of the two sentences concern whether the napkin was or was not embroidered by John. The relationship of a sentence to a given context may be specified in terms of the nature of the information contained within the presupposition and focus. A contextually appropriate sentence is one which presupposes established or given information and which focusses new or contrasting information. For example, in the context of the question What did John embroider?, (1) and (2) would be appropriate replies (though (2) is unhelpful), while (3) and (4) would be inappropriate. Because of its obvious association with contextual antecedents, presupposition is often referred to as given information, while focus and assertion are referred to as new information.
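The negation test for separating presupposition from focus can be illustrated on it-clefts such as (1) to (3). The sketch below is a deliberately toy, pattern-based analysis: `analyse_it_cleft` and `fits_context` are hypothetical helpers, not part of this research. It handles only the frame "It was [not] X that/who Y" and does not cover pseudoclefts such as (4).

```python
import re
from typing import NamedTuple


class CleftAnalysis(NamedTuple):
    presupposition: str
    focus: str
    negated: bool


# Toy pattern for it-clefts only: "It was [not] FOCUS that|who CLAUSE."
IT_CLEFT = re.compile(r"^It was (not )?(.+?) (that|who) (.+?)\.?$")


def analyse_it_cleft(sentence: str) -> CleftAnalysis:
    match = IT_CLEFT.match(sentence)
    if match is None:
        raise ValueError(f"not an it-cleft: {sentence!r}")
    negation, focus, relative, clause = match.groups()
    if relative == "that":
        # Object cleft: the focussed constituent is the object of the clause.
        presupposition = f"{clause} something"
    else:
        # Subject cleft: the focussed constituent is the subject of the clause.
        presupposition = f"someone {clause}"
    # Negation is recorded, but it leaves the presupposition unchanged.
    return CleftAnalysis(presupposition, focus, negation is not None)


def fits_context(sentence: str, given: str) -> bool:
    """Appropriate if what the sentence presupposes is the given information."""
    return analyse_it_cleft(sentence).presupposition == given


given = "John embroidered something"  # from the question "What did John embroider?"
for s in ("It was a napkin that John embroidered.",
          "It was not a napkin that John embroidered.",
          "It was John who embroidered a napkin."):
    print(analyse_it_cleft(s), fits_context(s, given))
```

Because negation only sets the `negated` flag, (1) and (2) produce the same presupposition, and only those two sentences match the given information in the question What did John embroider?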
The most fully developed account of the role of presupposition in the comprehension process is the Given-New strategy of Haviland and Clark (1974). This strategy, which Clark and Haviland (1977) have characterized as “a three step procedure for relating the current sentence to . . . [a] knowledge base”, involves the following stages: “At Step 1, the listener isolates the given and the new information in the current sentence. At Step 2, he searches memory for a direct antecedent, a structure containing propositions that match the given information precisely. Finally, at Step 3 the listener integrates the new information into the memory structure by attaching it to the antecedent found in Step 2” (Clark and Haviland, 1977, p. 5). The Given-New strategy was originally based on a series of experiments which investigated the processing of sentences containing lexical presuppositions (i.e., presuppositions produced by individual word meanings rather than by syntactic structure). Haviland and Clark (1974) found that these sentences were understood more rapidly when preceded by a context sentence which established a direct, as opposed to indirect, antecedent for the presupposition. Thus, the test sentence (7) was understood faster when it was preceded by the context sentence (5) than by (6).
Syntactic presupposition in sentence comprehension
(5) Ed was given an alligator for his birthday.
(6) Ed wanted an alligator for his birthday.
(7) The alligator was his favorite present.
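The three-step Given-New procedure can be rendered as a toy program operating on (5) to (7). The propositional encodings such as received(Ed, alligator) are hypothetical, and Step 1 (isolating given and new information) is assumed to have been done already. The sketch only shows where a missing direct antecedent forces an extra bridging step; it says nothing about how long that step takes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    given: frozenset[str]  # propositions presupposed by the sentence
    new: frozenset[str]    # propositions asserted by the sentence


def given_new(memory: set[str], sentence: Sentence) -> tuple[bool, set[str]]:
    """Toy Given-New integration; returns (direct antecedent found?, updated memory)."""
    # Step 2: look for a direct antecedent matching all given information exactly.
    direct = sentence.given <= memory
    updated = set(memory)  # leave the caller's memory untouched
    if not direct:
        # No direct antecedent: a bridging inference would be needed here.
        # As a placeholder, the given propositions are simply added as inferred.
        updated |= sentence.given
    # Step 3: attach the new information to the (possibly bridged) antecedent.
    updated |= sentence.new
    return direct, updated


context_5 = {"received(Ed, alligator)"}  # (5) Ed was given an alligator ...
context_6 = {"wanted(Ed, alligator)"}    # (6) Ed wanted an alligator ...
sentence_7 = Sentence(                   # (7) The alligator was his favorite present.
    given=frozenset({"received(Ed, alligator)"}),
    new=frozenset({"favorite_present(Ed, alligator)"}),
)

print(given_new(context_5, sentence_7)[0])  # direct antecedent after (5)?
print(given_new(context_6, sentence_7)[0])  # direct antecedent after (6)?
```

In this toy form, the slower comprehension of (7) after (6) corresponds to taking the bridging branch, which is absent after (5).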
These results may be taken as evidence that people find it more difficult to integrate a sentence with its context when the sentence’s presuppositions are not established directly by the context, and that some additional inferential processing is necessary to understand such sentences. However, because they only considered the processing of unfulfilled presuppositions, Haviland and Clark have not directly established that asserted and presupposed information are processed differently. It would seem likely that if the sentence’s assertion did not follow directly from the context, then a similar increase in comprehension time would be observed. For example, in the context of (8), sentence (9) would follow directly but (10) would not, even though in both cases the presupposition is fulfilled.

(8) Ed wanted an alligator for his birthday and was given one.
(9) The alligator was his favorite present.
(10) The alligator was his worst present.
Presumably, (10) would take longer to integrate with the context because it would require an additional bridging inference, for example, that Ed changed his mind about alligators. Since there is no evidence in Haviland and Clark’s experiment that new information is treated any differently from given information, their Given-New model remains unsubstantiated.

Quite a different model has been proposed to describe how people process presupposition and focus when verifying sentences. Presumably, assigning truth to a statement might introduce strategies different from those used in comprehension without verification. Hornby (1974) and Clark and Clark (1977) have suggested that, because speakers generally assign presupposition and focus appropriately, listeners are likely to assume that the presupposition is true (since normally this contains information they already know) and to examine more critically the focus, where new information is normally located. Thus, while the Given-New strategy suggests that the listener first corroborates the presupposition and then proceeds to assimilate the assertion, Hornby’s account suggests that the listener critically examines the focus while taking for granted the truth of the presupposition.

The evidence for this model, which might perhaps be designated the New-Given strategy, is not particularly convincing. Hornby’s experiment investigated the processing of syntactic, rather than lexical, presupposition in a sentence verification task. In this task each acoustically presented sentence was followed by the tachistoscopic exposure of a picture. Hornby found that more errors were made in recognizing a discrepancy between sentence and picture when the discrepancy involved a presupposed noun than when it involved a focussed noun. While this result appears to demonstrate a differential effect of presupposition and focus, it is open to alternative interpretations. Firstly, the extremely short presentation time ensured that not more than a single aspect of the picture could be attended to. In this situation all that the observed difference indicates is that subjects tended to examine focussed information first. This result does not, however, directly establish that subjects were “taking for granted” the truth of the presupposition, since there may have been no time left to examine the presupposed information. More importantly, in the experiment sentential focus coincided with the locus of heaviest stress. It is therefore quite possible that the superior recognition of discrepant focussed information was due to its dominant acoustic trace in short-term memory, rather than to a selective search for new information. The marking of focus by acoustic stress also leaves open the question of whether it is the syntax of the sentence that was used as a basis for distinguishing asserted and presupposed information.

It seems, then, that the evidence does not unequivocally implicate syntax as a means by which people distinguish between presupposition and assertion in sentence comprehension. Nor does either of the hypothesized Given-New or New-Given strategies have very strong supporting evidence. The experiments reported below thus aimed to determine whether the structural distinction between presupposition and assertion really is utilized in the processing of sentences in context. They also aimed to evaluate the relevance of the given-new distinction to the comprehension process.

Experiment I
Most previous studies concerned with presupposition have examined the performance consequences of presupposition failure. Similarly, this experiment was designed to compare the processing of contradicted presuppositions with the processing of contradicted assertions. The comprehension task chosen was a paragraph-sentence verification task. With this task it was possible to set up the two experimental conditions by constructing for each test sentence two contexts -- one contradicting the sentence’s assertion and the other the sentence’s presupposition. By comparing the verification times for a given sentence in the two context conditions it was hoped to determine whether presupposition and assertion are in fact processed differently. In order to avoid any confounding between acoustic salience and structural marking of presupposition and assertion, all sentences were visually presented. To ensure that the results would not be limited to any one particular sentence structure,
Syntactic presupposition in sentence comprehension
367
two groups of sentences were used. One group comprised cleft and pseudocleft sentences and the other, factive complement sentences. Because the task in the present experiment necessitated relating a target sentence back to a preceding paragraph, the Given-New strategy would seem an appropriate model of the processing involved. Yet, since the task was one of verification, the New-Given strategy is also applicable. The two models predict rather different outcomes of the experiment. Subjects using a Given-New strategy would first search memory for information corresponding to the presupposition in the target sentence and then would proceed to verify the assertion. These subjects would presumably detect information contradicting the presupposition before they would detect information contradicting the assertion. The Given-New strategy, therefore, would predict longer verification times for items with false assertions. Subjects using a New-Given strategy, on the other hand, would tend to ignore the presupposed information, assuming it to be true, and would selectively search memory for information relevant to the assertion. These subjects would be expected to succeed in detecting false assertions but would be highly likely to overlook false presuppositions. The New-Given strategy, then, would predict more errors on items with false presuppositions than on items with false assertions.

Materials and design
Materials consisted of 24 true and 24 contradicted (false) items. Each set contained 12 cleft-pseudocleft sentences and 12 factive complement sentences. To control for the order in which the target sentence mentioned assertion and presupposition, two target versions were constructed for each item. One version, either a cleft or an object complement sentence, mentioned the assertion first, and the other version, a pseudocleft or a subject complement, mentioned the presupposition first. Examples of target versions mentioning the assertion first are It was the coffee that ruined our carpet and The keeper was annoyed by their feeding the monkeys. Target versions which mention the presupposition first are What ruined our carpet was the coffee and Their feeding the monkeys annoyed the keeper.
The major experimental manipulation was achieved by constructing two contexts for each pair of target sentences. The two contexts were similar in length and content, but differed in that one contained information which contradicted the assertion in the target sentence while the other contained information which contradicted the target’s presupposition. There were thus four related versions of a given item. For the cleft-pseudocleft items, both inconsistencies involved the nouns in the target, while in the factive complements the inconsistencies involved the verbs in the target. To control for
where in the context the discrepant information occurred, half of the items were constructed so that information relevant to the assertion was mentioned last and the other half were constructed so that information relevant to the target’s presupposition was mentioned last. The 24 true items were as similar as possible to the false items. Table 1 shows examples of the context and target conditions for two false items used in Experiment I. In sum, the item design consisted of one between-items factor, referring to whether the last information in the context related to the assertion or the presupposition (Context Order). There were also two within-item factors, one referring to the type of proposition, assertion or presupposition, contradicted by the context (Proposition Type), and one referring to whether the target sentence mentioned the assertion or the presupposition first (Target Order). To prevent subjects from seeing more than one of the four versions of a given item, four lists were prepared, each containing every item in one of the four conditions obtained from crossing Target Order and Proposition Type. The assignment of conditions was systematically varied so that the four lists contained an equal number of items in each condition. The same random ordering of true and false items was used for the four lists. Each list was given to an independent group of subjects. Thus, the subject design included a between-subjects factor (Group), as well as three within-subject factors (Context Order, Proposition Type and Target Order). As well as performing subject and item analyses of variance, min F′ was calculated in order to permit simultaneous generalization to new subject and new item populations (cf. Clark, 1973). The level of significance for all statistical decisions was set at α = 0.05.

Procedure

All stimulus materials were presented on the oscilloscope terminal of a PDP-11 computer. Subjects began each trial by pressing a button marked Go.
A context paragraph appeared on the screen, which subjects had to read carefully, taking as much time as they needed. They then pressed the Go button again and a target sentence appeared on the screen. The subjects’ task was to decide as quickly and accurately as possible whether the target sentence was consistent or not consistent with the context, and then to press a Yes or a No button accordingly. The terms “consistent” and “inconsistent” were used in preference to “true” and “false” because of the logical problem that sentences with false presuppositions cannot themselves be false. The time taken to read the context and the time taken to verify the target sentence were measured to the nearest millisecond. Each subject received six practice trials during which the experimenter provided feedback about the correctness of each response.
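The min F′ statistic used throughout the analyses combines the by-subjects F ratio (F1) and the by-items F ratio (F2) into a single lower-bound test. The sketch below is a minimal illustration, assuming Clark's (1973) formula min F′ = F1·F2/(F1 + F2), with denominator degrees of freedom (F1 + F2)²/(F1²/n2 + F2²/n1), where n1 and n2 are the error degrees of freedom of F1 and F2. The function name min_f_prime is hypothetical and not taken from the original study.

```python
def min_f_prime(f1: float, df1_error: int, f2: float, df2_error: int) -> tuple[float, float]:
    """Return (min F', denominator df) from a by-subjects F1 and a by-items F2.

    Assumes Clark's (1973) lower bound on the quasi-F ratio:
        min F' = F1 * F2 / (F1 + F2)
        df     = (F1 + F2)**2 / (F1**2 / df2_error + F2**2 / df1_error)
    The numerator df is taken to be the same in both analyses.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("F ratios must be positive")
    value = f1 * f2 / (f1 + f2)
    df = (f1 + f2) ** 2 / (f1 ** 2 / df2_error + f2 ** 2 / df1_error)
    return value, df


if __name__ == "__main__":
    # Proposition Type effect in Experiment I: F1(1,36) = 47.34, F2(1,22) = 16.21
    value, df = min_f_prime(47.34, 36, 16.21, 22)
    print(f"min F'(1,{round(df)}) = {value:.2f}")
```

For the Proposition Type effect of Experiment I this gives min F′ ≈ 12.08 with about 37 denominator degrees of freedom, in line with the reported min F′(1,37) = 12.08. Because min F′ never exceeds the smaller of F1 and F2, an effect that is significant by subjects but weak by items can fail the min F′ test, as happens for the Structure effect in Experiment II.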
Table 1. Examples of contexts and targets for false items from Experiment I

Cleft-pseudocleft item
False Assertion Context: Jane and Mary are flatmates. They get on well together but often find themselves bored in the evenings. They already have a radio but Mary would like to buy a television as well.
False Presupposition Context: Jane and Mary are flatmates. They get on well together but often find themselves bored in the evenings. They already have a television but Jane would like to buy a radio as well.
Target Sentences(a):
a) It is Jane who wants to get a television.
b) The one who wants to get a television is Jane.

Factive complement item
False Assertion Context: Linda’s maths teacher is a defensive, discouraging person. He likes to prove his superiority by giving his students problems which are too hard for them. He was quite cross today when Linda, the brightest in the class, managed to solve the problem he set.
False Presupposition Context: Linda’s maths teacher is a defensive, discouraging person. He likes to prove his superiority by giving his students problems which are too hard for them. He was delighted today when not even Linda, the brightest in the class, could solve the problem he set.
Target Sentences:
a) Linda’s teacher was delighted that she could solve the problem.
b) The fact that Linda could solve the problem delighted her teacher.

(a) Target a) mentions the assertion first and b) the presupposition first.
Subjects
Forty undergraduate students at the University of Melbourne were paid for participating in the experiment. All were native speakers of English.
Results and discussion

The mean and standard deviation of each subject’s response distribution were calculated and, in order to minimize the influence of exceptionally long or short times, any verification time lying more than two standard deviations from the mean was set to the value at two standard deviations. This procedure affected 5.5% of verification times for false items and 3.8% of verification times when true and false items were combined. Data for incorrect responses were excluded from the verification time analyses.

Table 2 shows the means for the adjusted verification times for the test (inconsistent) items. In the analysis of variance on these means, the main effect of Proposition Type was highly significant, with F1(1,36) = 47.34, F2(1,22) = 16.21 and min F′(1,37) = 12.08, showing that sentences with false assertions were verified significantly faster than sentences with false presuppositions. The main effect of Target Order was significant by subjects, with F1(1,36) = 8.24. However, this effect was not significant in the item analysis, with F2(1,22) = 1.33, and therefore this result cannot be considered typical of all items. From inspection of Table 2, it can be seen that false assertions were detected faster when they appeared first rather than second in the target sentence (a difference of 176 milliseconds) but false presuppositions were detected no faster when the presupposition was first (a reverse difference of 28 milliseconds). However, this interaction between Proposition Type and Target Order did not approach significance in either the subject or the item analyses. Neither the main effect of Context Order, nor any of the other possible interaction effects, approached significance in either the subject or the item analyses.

Table 2. Mean verification times in milliseconds for false items in Experiment I

Target Order:           Assertion First          Presupposition First
Context Order:          A. Last   P. Last        A. Last   P. Last
False Assertion         2597      2462           2629      2782
False Presupposition    3047      2912           3039      3036

(Rows: Type of Proposition. A. Last = Assertion Last; P. Last = Presupposition Last.)

In order to compare the two types of target sentence structure, an analysis contrasted the verification times for cleft-pseudocleft and for factive complement sentences. There was no evidence that these two structural types differed in overall verification time, as the main effect of Sentence Type was not significant, with F1(1,36) = 2.96 and F2 < 1. Nor did Sentence Type interact significantly with either of the other factors in the analysis, Proposition Type and Target Order. In a further analysis the means of the test items were compared with the means of the distractor items, where the required response was consistent. The means for correct false and true items were 2826 milliseconds and 3387 milliseconds respectively. False items were verified significantly faster than true items, with F1(1,36) = 53.59, F2(1,46) = 25.95 and min F′(1,78) = 17.48.

Analyses were also performed on the mean numbers of errors. Table 3 shows the mean percentage error for the test items. A trend was observed for there to be more errors when presuppositions were falsified than when assertions were falsified, although inspection of Table 3 reveals that this effect differed in magnitude across the four order conditions, being entirely absent when the assertion was last in the context and the presupposition first in the target. In the subject analysis there were significant main effects of Proposition Type, F1(1,36) = 12.25, and of Context Order, F1(1,36) = 7.20, and significant interactions between Proposition Type and Context Order, F1(1,36) = 6.33, and between Proposition Type, Target Order and Context Order, F1(1,36) = 4.65. However, not one of these effects approached significance in the item analysis, suggesting that they would not generalize to another set of items. That the main effect of Proposition Type was not representative of the items used was confirmed by inspection of the individual item means; only 7 of the 24 false items exhibited the effect.

Table 3. Mean percentage of errors for false items in Experiment I

Target Order:           Assertion First          Presupposition First
Context Order:          A. Last   P. Last        A. Last   P. Last
False Assertion         5.8       5.0            6.7       5.0
False Presupposition    18.3      11.6           4.2       9.2

(Rows: Type of Proposition. A. Last = Assertion Last; P. Last = Presupposition Last.)

The means of the context inspection times for the two experimental conditions, i.e., for the false assertion and the false presupposition conditions, were 15,827 milliseconds and 15,669 milliseconds respectively. These were not significantly different in either the subject or the item analysis, with F1 < 1 and F2 < 1.

The major finding of this experiment was that verification times for items with false assertions were significantly faster than verification times for items with false presuppositions. There was also a tendency for there to be more errors on false presupposition items than on false assertion items, although this was only true of a subset of the items. The non-significant interaction between Sentence Type and Proposition Type suggests that the assertion-presupposition distinction was created just as strongly by the factive complement sentence structures as by the clefts and pseudoclefts. The absence of an interaction between which proposition was false and which was mentioned first in the target sentence rules out the possibility that subjects simply verified sentences in left-to-right sequence. The non-significant interaction between Proposition Type and Context Order indicates that verification times were no faster for items where the discrepant proposition was mentioned last in the context, suggesting that, at least in this task, recency of mention did not systematically affect the salience of contextual antecedents in working memory.

A possible weakness of Experiment I is that the experimental contrast necessitated a comparison between verification times for quite different fact-counterfact pairs. It is conceivable that the contradictions involved in the false presuppositions happened for some reason to be more difficult to detect than those involved in the false assertion condition. To determine whether or not this was the case, a control experiment was run in which each target sentence was separated into two simple “component” sentences. For example, a factive complement sentence from Experiment I, Basil’s failing physics upset his parents, was separated into Basil failed physics (the presupposition component) and Basil’s parents were upset (the assertion component). Cleft-pseudocleft sentences were separated by inserting indefinite pronouns. For example, It was the coffee that ruined our carpet was separated into The coffee ruined something (the assertion component) and Something ruined our carpet (the presupposition component).
The procedure of the control experiment was identical to that of Experiment I. Each component sentence appeared with the Experiment I context which contradicted it. Component sentences which had originally been presuppositions were no more difficult to verify than component sentences which had originally been assertions, with F1(1,18) = 3.91 and F2 < 1. In fact, the difference between the means for the two conditions was in the opposite direction. The results of this control experiment therefore rule out the possibility that the outcome of Experiment I was simply due to a confounding of difficulty of contradiction with type of proposition contradicted.

One other finding that deserves comment at this point is that true items in Experiment I took significantly longer to verify than false items. This result is atypical of the general finding in verification tasks that true responses, at least for explicitly affirmative sentences, are faster than false responses (e.g., Clark and Chase, 1972). A simple explanation of this finding is that the true items necessitated an exhaustive search of the context representation, whereas the false items permitted the search to be terminated as soon as a discrepancy was located. A further, artifactual reason for the finding may have been that subjects had difficulty in deciding whether some supposedly equivalent expressions were actually consistent or not. In fact, in some of the true items, the expression in the target sentence was more general than the corresponding expression in the context. Subjects reported that they sometimes found it difficult to decide whether terms such as “oyster” and “seafood”, “godfather” and “man”, and “relations with China” and “foreign policy” were meant to be consistent.

Experiment II
This experiment was designed to investigate whether the assertion-presupposition effect obtained in Experiment I would also be present when target sentences are processed in the absence of prior context. Presumably, any view based on the idea that people use the marking of assertion and presupposition as directions to new and given information would predict that the assertion-presupposition effect would not be present in such a situation. People would adjust to a situation where there is no prior context (and, ipso facto, no given information) and would have no need to distinguish structurally between assertion and presupposition. A verification task was again employed, but this time with the order of target sentence and context reversed. Accordingly, subjects were required to judge the relevance of a picture to a previously presented sentence. Picture contexts, rather than paragraph contexts, were used on the assumption that a nonverbal context would provide minimal interference with memory for the target sentence. The sentences were all clefts and pseudoclefts, factive complements being excluded because their semantic content proved too difficult to depict unambiguously.

Materials and design
There were 16 true and 16 false items. For each item there was one picture context and four possible sentence structures: cleft agent, cleft object, pseudocleft agent and pseudocleft object. Since all the target sentences were clefts or pseudoclefts, the inconsistency between picture and target always involved a noun. For any given false item, the discrepancy involved the same noun in all treatment conditions. As in Experiment I, the cleft and pseudocleft structures controlled for whether the target mentioned the assertion or the presupposition first. The agent and object sentences allowed the discrepant noun to be either focussed or presupposed. Table 4 gives examples of the target sentences used for two of the test items in Experiment II. Pictures were simple black line drawings. Half of the false items were constructed so that the discrepancy involved the logical subject of the action in the picture and the other half were constructed so that the discrepancy involved the logical object of the depicted action. To ensure that all lexical presuppositions were fulfilled, the discrepant noun was always present somewhere in the picture. For instance, in the first example in Fig. 1, the discrepant noun woman is depicted, but not in the appropriate relationship with the cupboard. In the pictures for the true distractor items there was always a third irrelevant object present, to prevent these items from being noticeably different from the false items. Figure 1 shows the pictures which were used for the two false items exemplified in Table 4. To summarize, for the test items the design consisted of one between-items factor (Role of False Entity) and two within-item factors (Proposition Type and Target Order). Role of False Entity referred to whether the discrepant entity was the logical subject or the logical object of the depicted action. As in Experiment I, Proposition Type referred to whether the target sentence asserted or presupposed the discrepant noun, and Target Order referred to whether the target sentence mentioned the assertion or the presupposition first. Again, all four treatment conditions for each item (corresponding to the four sentence structures) were assigned to different lists, and the assignment was varied over the items so that overall each list contained the same number of each sentence type. The same random ordering of true and false items was used for the four lists, which were given to four independent groups of subjects.
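The list construction described for both experiments, in which each item's four versions go to four different lists and the assignment rotates across items, is a Latin-square counterbalancing scheme. The sketch below is a minimal illustration, assuming a simple cyclic rotation (the text does not say exactly how the assignment was varied) and using the four sentence structures of Experiment II as condition labels; build_lists is a hypothetical name.

```python
from collections import Counter

# The four sentence structures of Experiment II, used here as condition labels.
STRUCTURES = ("cleft agent", "cleft object", "pseudocleft agent", "pseudocleft object")


def build_lists(n_items: int, conditions: tuple[str, ...] = STRUCTURES) -> list[list[tuple[int, str]]]:
    """Latin-square assignment of item versions to lists (one list per condition).

    Item i appears on list k in condition (i + k) mod len(conditions), so each
    list contains every item exactly once and no list contains two versions of
    the same item. Conditions are balanced within a list when n_items is a
    multiple of the number of conditions.
    """
    n = len(conditions)
    return [
        [(item, conditions[(item + k) % n]) for item in range(n_items)]
        for k in range(n)
    ]


if __name__ == "__main__":
    lists = build_lists(16)  # e.g. the 16 false items of Experiment II
    for k, assigned in enumerate(lists, start=1):
        counts = Counter(condition for _, condition in assigned)
        print(f"list {k}:", dict(counts))
```

With 16 items, each list receives four items in each sentence structure, and every item appears in a different structure on each of the four lists. The same function applies to Experiment I if the labels are replaced by the four Target Order by Proposition Type cells.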
The inspection times for all items (i.e., the times taken to read the target sentences) were classified according to two factors, Structure and Case of Clefted Noun. Structure referred to whether the sentence was a cleft or a pseudocleft, and Case of Clefted Noun referred to whether the sentence asserted the logical subject or the logical object. Once again, subject and item means were analysed, and min F′ was calculated for all analyses.

Procedure

The stimuli, which were black and white transparencies, were projected onto a light grey wall by a carousel projector. Subjects were seated at a response table. On each trial they pressed an Advance button to bring on the target sentence, and then, when ready, pressed it again to bring on the picture. Subjects then had to decide whether the picture and target were consistent or not, and to press the Yes or the No button accordingly. A digital printout timer, connected to a photocell in the projector, recorded to the nearest millisecond the time taken to inspect the target sentence and the verification time from the onset of the picture. At the beginning of each session there were seven practice trials during which the experimenter provided feedback as to the correctness of the subjects’ responses.

Table 4. Target sentences for two false items from Experiment II

a) A false logical subject item
Cleft agent: It’s the woman who is pushing the cupboard.
Cleft object: It’s the cupboard that the woman is pushing.
Pseudocleft agent: The one who is pushing the cupboard is the woman.
Pseudocleft object: What the woman is pushing is the cupboard.

b) A false logical object item
Cleft agent: It’s the man who is washing the floor.
Cleft object: It’s the floor that the man is washing.
Pseudocleft agent: The one who is washing the floor is the man.
Pseudocleft object: What the man is washing is the floor.

Figure 1. Picture contexts for (a) a false logical subject item and (b) a false logical object item in Experiment II.

Subjects
Forty undergraduate students at the University of Melbourne were paid for participating in the experiment. All were native speakers of English.
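The cut-off procedure introduced for Experiment I, and reused in the analyses below, replaces each of a subject's verification times lying more than two standard deviations from that subject's mean with the cutoff value itself. The sketch below is a minimal illustration, assuming the sample standard deviation (the text does not say whether the sample or population formula was used) and assuming the mean and SD are computed once from the unadjusted times; clip_to_two_sd is a hypothetical name.

```python
import statistics


def clip_to_two_sd(times_ms: list[float], k: float = 2.0) -> tuple[list[float], float]:
    """Clamp one subject's verification times to mean +/- k sample SDs.

    Returns the adjusted times and the proportion of times that were changed.
    The mean and SD are computed once, from the unadjusted times.
    """
    mean = statistics.mean(times_ms)
    sd = statistics.stdev(times_ms)  # sample SD (an assumption, see lead-in)
    low, high = mean - k * sd, mean + k * sd
    adjusted = [min(max(t, low), high) for t in times_ms]
    changed = sum(1 for t, a in zip(times_ms, adjusted) if t != a)
    return adjusted, changed / len(times_ms)


if __name__ == "__main__":
    # Hypothetical times for one subject: nine typical responses and one very slow one.
    times = [1000.0] * 9 + [5000.0]
    adjusted, proportion = clip_to_two_sd(times)
    print([round(t) for t in adjusted], f"{proportion:.0%} adjusted")
```

In this made-up example the mean is 1400 ms and the sample SD is about 1265 ms, so the 5000 ms response is pulled down to roughly 3930 ms while the other nine times are unchanged. Clamping rather than discarding keeps every trial in the analysis but limits the leverage of extreme values on the cell means.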
Results and discussion
The cut-off procedure described above was used in each verification time analysis. This affected 4.6% of verification times for false items and 5.5% of times for true and false items combined. Table 5 shows the means of the adjusted verification times for false items in Experiment II. In the analysis of variance, the main effect of Proposition Type was significant, with F1(1,36) = 19.58, F2(1,14) = 31.57 and min F′(1,48) = 12.09. Sentences with false assertions were verified significantly faster than sentences with false presuppositions. The main effects of Target Order and Role of False Entity were non-significant in both the subject and item analyses, as were all the possible interactions between the three factors. The mean verification times for false and true items were 1226 milliseconds and 1195 milliseconds respectively. These were not significantly different, with F1(1,36) = 2.20 and F2(1,30) < 1. Table 6 shows the mean percentage error for false items. Analyses of the mean numbers of errors revealed that, although there was once again a tendency for more errors on false presupposition items than on false assertion items, this difference was not significant, with F1(1,36) = 3.00 and F2(1,14) = 2.74. None of the other main or interaction effects was significant in either the subject or the item analysis. A preliminary analysis revealed that the means of the target inspection times for true and false items, which were 2146 milliseconds and 2344 milliseconds respectively, were not significantly different in either the subject or the item analysis. Thus, true and false inspection times were combined; the means are presented in Table 7. In the analyses, the main effect of Structure was significant by subjects, with F1(1,36) = 6.54, and by items, with F2(1,31) = 4.44, but min F′ failed to reach significance, with min F′(1,62) = 2.63.
There was a strong trend, therefore, for cleft sentences to be processed faster than pseudocleft sentences. The interaction between Structure and Case of Clefted Noun was significant in the subject analysis, with F1(1,36) = 5.68, but not by items, with F2 < 1. This interaction was a cross-over, whereby cleft agent sentences were inspected faster than cleft object sentences, but pseudocleft agent sentences were inspected more slowly than pseudocleft object sentences.

Table 5. Mean verification times in milliseconds for false items in Experiment II

Role of False Entity:   Logical Subject          Logical Object
Target Order:           A. First  P. First       A. First  P. First
False Assertion         1208      1142           1145      1131
False Presupposition    1321      1267           1297      1356

Table 6. Mean percentage error on false items in Experiment II

Role of False Entity:   Logical Subject          Logical Object
Target Order:           A. First  P. First       A. First  P. First
False Assertion         5.00      2.50           1.25      3.75
False Presupposition    7.50      3.75           5.00      5.00

(Tables 5 and 6, rows: Type of Proposition. A. First = Assertion First; P. First = Presupposition First.)

Table 7. Mean inspection times in milliseconds for all items in Experiment II

                        Case of Clefted Noun
Sentence Structure      Agent     Object
Cleft                   2100      2162
Pseudocleft             2244      2175

Experiment II has demonstrated that, even when sentences have no prior context, and therefore contain no given information, they are represented in a form which distinguishes between assertion and presupposition. Sentences which presupposed discrepant information took significantly longer to verify than sentences which asserted it. In contrast with Hornby’s findings, the overall error rate for false presupposition items was low, 5.3%. Furthermore, there was no significant difference between the numbers of errors made on items with false presuppositions and on items with false assertions.
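The Proposition Type contrasts of Experiment II can be read off Tables 5 and 6 by averaging each row over its four Role of False Entity by Target Order cells. The sketch below does that arithmetic with unweighted cell means, which is an assumption: the overall means quoted in the text (such as 1226 ms for false items) were computed from the raw data and need not coincide exactly with these averages. The dictionary and function names are hypothetical.

```python
# Cell values transcribed from Tables 5 and 6 (Experiment II, false items).
# Column order: Logical Subject/Assertion First, Logical Subject/Presupposition First,
#               Logical Object/Assertion First, Logical Object/Presupposition First.
TABLE5_RT_MS = {
    "false assertion": [1208, 1142, 1145, 1131],
    "false presupposition": [1321, 1267, 1297, 1356],
}
TABLE6_ERROR_PCT = {
    "false assertion": [5.00, 2.50, 1.25, 3.75],
    "false presupposition": [7.50, 3.75, 5.00, 5.00],
}


def row_means(table: dict[str, list[float]]) -> dict[str, float]:
    """Unweighted mean of each row across its cells."""
    return {row: sum(cells) / len(cells) for row, cells in table.items()}


if __name__ == "__main__":
    rt = row_means(TABLE5_RT_MS)
    err = row_means(TABLE6_ERROR_PCT)
    print("verification time (ms):", rt)
    print("presupposition minus assertion (ms):", rt["false presupposition"] - rt["false assertion"])
    print("errors (%):", err)
```

The false presupposition row averages 1310.25 ms against 1156.5 ms for the false assertion row, a difference of 153.75 ms. The false presupposition error rate averages 5.3125%, which rounds to the 5.3% quoted in the text, compared with 3.125% for false assertions.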
An advantage of Experiment II was that the two experimental conditions involved exactly the same contradiction between sentence and picture. Therefore the difference in verification times can only be attributed to differences between the target surface structures in the two conditions. The strength of the assertion-presupposition effect is quite remarkable in view of the unlimited inspection time, and of subjects’ own impressions that they were merely recoding the target sentences into a simple form. The fact that target inspection times for true and false items did not differ suggests that subjects could not have anticipated whether an item would be true or false on the basis of the target sentence alone. When true and false inspection times were pooled and analysed in terms of surface structure features, it was found that cleft sentences were processed more rapidly than pseudocleft sentences. In addition, the interaction effect suggested that, at least for some items, sentences which mentioned the logical subject, verb and logical object in that order tended to be processed faster than sentences with non-S-V-O orders. Thus, cleft agent sentences (It is the S that is V-ing the O) were inspected on average faster than cleft object sentences (It is the O that the S is V-ing), and pseudocleft object sentences (What the S is V-ing is the O) were inspected faster than pseudocleft agent sentences (The one that is V-ing the O is the S). The fact that inspection times tended to be sensitive to the different ways of expressing the same basic meaning justifies the removal of time constraints from the processing of the target sentence. If only a limited amount of time is allowed, as was the case in Hornby’s experiment, then more complex surface structures may be encoded less adequately. In the present experiment, it may be fairly safely assumed that by the verification phase of the trial, the four surface structure types had been encoded in equivalent form.
This is borne out by the absence of any effect of surface structure type per se on subsequent verification times and error rates. As in Experiment I, neither of the control factors was related to verification time. Thus there was no evidence either of a serial left-to-right verification strategy, nor of any primacy or recency effects on memory for the target sentence. Similarly, the results ruled out the possibility that subjects were using a systematic strategy to search the picture for logical subject before logical object. There was no difference between verification times for items with discrepant logical subjects and for items with discrepant logical objects. In contrast with Experiment I, verification times for true and false items were not significantly different. There are several aspects of the procedure in Experiment II which might explain this result. Firstly, the picture contexts were simpler than the paragraphs, and thus any exhaustive search in the true items would end much sooner. Secondly, the pictures were present in front of the subject, rather than being held in memory, so that any search involved
Syntactic presupposition in sentence cornprehension
379
would be much more efficient. Finally, the confirming instances were less equivocal in Experiment II, where there were no problems with the intended equivalence of sentences and pictures. Thus a search of the picture could be terminated by either a confirming or a disconfirming instance. General discussion The major finding of the experiments reported was that sentence verification times were significantly longer when a discrepancy between target sentence and context was located in the syntactic presupposition than when the discrepancy was in the assertion, a result which has not been demonstrated before. As was pointed out above, previous studies of presupposition have either failed to make a direct comparison between the processing of assertion and presupposition, or have made the comparison but have confounded the syntactic distinction between assertion and presupposition with other nonsyntactic factors such as acoustic stress. The present experiments were not open to either of these criticisms. The result provides confirmation that once the surface structure of a sentence is processed, not only does it influence the memory representation of the sentence meaning, but it also serves to direct subsequent verification processes. Returning to the two psycholinguistic accounts of presupposition outlined above, the present findings reveal that the Given-New Strategy of Haviland and Clark (1974) is inadequate as a description of the processing of presupposition and assertion in sentence verification tasks. If subjects had been using this strategy to integrate target sentences with contexts, then they should have detected false presuppositions more rapidly than false assertions. It would probably be argued by Haviland and Clark that the present findings do not constitute a refutation of their model since the model was never intended as a description of verification tasks. However, it is surprising that a “procedure for relating the current sentence to.. . 
[a] knowledge base” was not evident at least in Experiment I, where subjects had to compare a target sentence to a previously assimilated paragraph context. The New-Given strategy, i.e., the model proposed by Hornby (1974) and by Clark and Clark (1977), is also unacceptable as an explanation of the present findings. In Clark and Clark’s formulation, the New-Given model was based on the assumption that the semantic representation of a target sentence contains not only propositional, or underlying logical information, but also thematic, or Given-New specifications (cf., Clark and Clark, 1977, p. 89). According to this model, subjects first encode the sentence’s thematic structure, and then adopt a search strategy based on their expectations about which parts of this thematic structure are likely to be true. To account for Hornby’s finding that subjects were less likely to detect false presuppositions
380 J. Langford and V. M. Holmes
than false assertions, the New-Given model proposes that subjects assume that given information is true, and only search for facts relating to the new information. This account founders when confronted with the results of the present experiments: given enough time, subjects rarely failed to detect false presuppositions. This difficulty may be dealt with by modifying the New-Given model, and postulating that subjects do eventually search the context for information corresponding to the given part of the sentence, but only after they have searched for, and failed to detect, discrepant new information. This revised version of the New-Given strategy is superficially more consistent with the results of the present experiments. However, closer scrutiny reveals that there is a major problem with the idea that subjects may search in serial order first for new and then for given information. In Hornby’s experiment, where the discrepancy involved an unfulfilled lexical presupposition in either the focus or the presupposition of the target sentence, it was possible for subjects to search for new information before searching for given. However, for the cleft and pseudocleft sentences in the present experiment, where the lexical presuppositions for focussed words were always fulfilled, it was logically impossible to detect new information without reference to the content of the presupposition. In the experiments reported above, the discrepant new information involved not the focus, but the combination of focus and presupposition, i.e., the assertion of the target sentence. Thus, the idea that subjects search first for new information, and then for given, cannot account for the findings of the present experiments. Clearly, an alternative explanation is called for. In what follows, we will outline an alternative account of our results which we shall designate the Structural hypothesis.
A basic premise of the Structural hypothesis is that the encoding of the target sentence contains no specific marking of given and new information, but simply retains some representation of the hierarchical organization already present in the sentence’s surface structure. A second feature of the hypothesis is that it does not explain the assertion-presupposition effect in terms of ordered search strategies, and hence makes no assumptions about the order in which the context is searched. Instead, it assumes that the effect is attributable to the number of mental operations required to express, or to make explicit, the discrepancy. Finally, the hypothesis assumes that the expression of a discrepancy involves the construction of either a negated proposition or a Yes/No question, and that this process is necessary before a subject can be confident of a No response. When subjects attempt to construct a negated proposition, the simplest procedure is to predicate a negative to the sentence representation as it stands. Because the target sentence is represented according to its surface structure form, the first constructed negation will correspond to a denial of
the assertion but not of the presupposition. If it is the assertion which is discrepant with the context, then a No response will be appropriate and immediate. However, if the discrepancy is in the presupposition then the first constructed denial, which does not extend to the presupposition, will not correspond to the mismatch exactly. In this case, subjects will not be able to respond No immediately but will have to reformulate the target sentence. This reformulation will involve stripping off the main clause and isolating the subordinate clause so that the presupposition may be directly negated. If, on the other hand, subjects verify sentences by generating Yes/No questions, then a similar explanation can be provided for why they take longer to verify sentences with false presuppositions. On the assumption that surface structure form determines the representation of the target sentence, the most easily generated question is one which manipulates elements of the main clause. For example, ‘It is the boy who is chasing the cow’ is most simply transformed into ‘Is it the boy who is chasing the cow?’ As with sentential negation, then, the first-generated question interrogates only the assertion. If the discrepant fact is asserted, then the question will be appropriate and an immediate No response possible. However, in order to locate a discrepancy in the presupposition, another question will be required which specifically interrogates the subordinate clause. Once again, the need to generate a second question accounts for the longer times associated with verifying false presuppositions. Clark and Clark (1977) have already pointed to the similarity between verifying a sentence and answering a Yes/No question. Their reason for making this comparison was that they wished to establish, by analogy, a plausible reason for why subjects should assume the presupposition is true and selectively search for New information.
However, our proposal is that subjects may actually generate Yes/No questions, or alternatively, negated propositions, in the course of verifying sentences. It is possible to make some tentative suggestions as to the stage at which these structural manipulations occur within the overall comprehension process. In this regard, the distinction drawn by Cutler (1976) between Stage A and Stage B processing is pertinent. In her formulation, Stage A processing involves the parsing-plus-lexical-look-up activity necessary to construct a literal interpretation of the sentence, while Stage B processing involves the subsequent enrichment and modification of this interpretation in the light of extra-sentential factors. In Experiment I, where the verification times included the time taken to read the target sentence, it was not possible to isolate Stage A and Stage B processing. However, in Experiment II, where target inspection times were recorded separately, the two types of processing were distinguishable. The inspection times presumably reflected Stage A processing; they were sensitive to differences in the structural complexity of the four
sentence types, and resulted in a semantic interpretation adequate for the accurate verification of the sentence. On the other hand, the verification times may be assumed to reflect Stage B processing. Thus, the structural reformulations needed to apprehend a discrepancy between presupposition and context may be considered a part of Stage B activity. Finally, the major conclusion which may be drawn from the experiments described here is that the structural distinction between assertion and presupposition has a real effect on the processes by which sentences are integrated with their contexts. Although this effect is normally undetected, it becomes strikingly apparent when there is a discrepancy between a sentence and its context. In seeking an explanation for this effect we have rejected the proposal that it is due to strategies based on subjects’ expectations about where Given and New information are normally located in the sentence. Instead, it has been proposed that the effect is primarily due to the position of the presupposition within the subordinate clause, which renders it inaccessible to negative or interrogative predicates, both of which are implicated in the apprehension of discrepant information. In their discussion of Hornby’s findings, Clark and Haviland (1977) have noted that English has only clumsy and indirect devices for qualifying presuppositions. However, they conclude that the major reason for subjects having failed to detect false presuppositions was their assumption that the speaker was adhering to the Given-New contract. The difference between their account and ours lies in our emphasis not so much on the listener’s assumptions about adherence to a Given-New contract as on the unavoidable consequences of processing surface structure.
Although we have not embraced the notion that listeners use assertion and presupposition in a deliberate, strategic way, it is clear that the structure in which a message is conveyed may facilitate the process by which the listener can reconstruct that message. Conversely, it is apparent that the same structure may obstruct the processing which attempts to integrate the sentence with contextual knowledge. Whether a structure will be facilitative or disruptive depends on the nature of the information placed in its presupposition. It appears that the content of the presupposition must be that information which requires minimal processing if sentence comprehension is not to be obstructed. It should be noted that the preceding discussion is not inconsistent with the idea that assertion and presupposition serve a crucial function in the communicative process. However, the emphasis of our Structural hypothesis is that this function is determined by the effect of sentence structure on the language processing mechanisms, rather than by strategies based on the listener’s pragmatic expectations.
References
Bock, J. K. (1977) The effect of pragmatic presupposition on syntactic structure in question answering. J. verb. Learn. verb. Behav., 16, 723-735.
Clark, H. H. (1973) The language-as-fixed-effect fallacy: a critique of language statistics in psychological research. J. verb. Learn. verb. Behav., 12, 335-359.
Clark, H. H. and W. G. Chase (1972) On the process of comparing sentences against pictures. Cog. Psychol., 2, 101-111.
Clark, H. H. and E. V. Clark (1977) Psychology and Language. New York, Harcourt Brace Jovanovich, Inc.
Clark, H. H. and S. E. Haviland (1977) Comprehension and the Given-New contract. In R. O. Freedle (ed.), Discourse Production and Comprehension. Norwood, N.J., Ablex.
Cutler, A. (1976) Beyond parsing and lexical look-up: an enriched description of auditory sentence comprehension. In R. J. Wales and E. Walker (eds.), New Approaches to Language Mechanisms. Amsterdam, North Holland.
Haviland, S. E. and H. H. Clark (1974) What’s new? Acquiring new information as a process in comprehension. J. verb. Learn. verb. Behav., 13, 512-521.
Hornby, P. A. (1974) Surface structure and presupposition. J. verb. Learn. verb. Behav., 13, 530-538.
Hupet, M. and B. Le Bouedec (1977) The Given-New contract and the constructive aspect of memory for ideas. J. verb. Learn. verb. Behav., 16, 69-75.
Offir, C. E. (1973) Recognition memory for presuppositions in relative clause sentences. J. verb. Learn. verb. Behav., 12, 636-643.
Osgood, C. E. (1971) Where do sentences come from? In D. D. Steinberg and L. A. Jakobovits (eds.), Semantics. Cambridge, Cambridge University Press.
Singer, M. (1976) Thematic structure and the integration of linguistic information. J. verb. Learn. verb. Behav., 15, 549-558.
Summary
Two experiments were carried out to study the role of syntactic presupposition in sentence comprehension. In the first experiment, subjects had to verify cleft sentences, pseudocleft sentences and sentences with factive complements against contexts presented before the sentences. The contexts could contradict either the assertion or the presupposition of the target sentence. Subjects took significantly longer to verify sentences with false presuppositions than to verify sentences with false assertions. In Experiment II, subjects verified cleft and pseudocleft sentences against pictures presented after the sentences. Here too, verification times for sentences with false presuppositions were significantly longer than verification times for sentences with false assertions. These data are better accounted for by a “structural” hypothesis than in terms of strategies aimed at locating given or new information.
Cognition, 7 (1979) 385-407. ©Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands.
Discussion
On the psychology of prediction: Whose is the fallacy?*
L. JONATHAN COHEN**
Oxford University
We are all undeniably prone to certain perceptual illusions. Heat creates mirages, water distorts the appearance of shape, poor visibility leads us to over-estimate distances, trick diagrams (like the Müller-Lyer) promote errors about comparative size. Are we also prone to certain intellectual fallacies? Is it experimentally demonstrable that, unless specifically instructed about the matter at issue, we are systematically inclined to make certain sorts of mistakes in our reasonings? Psychologists have claimed this, and they are undoubtedly right in some cases. Indeed, to demonstrate human proneness to certain kinds of intellectual fallacy, psychological experiment is scarcely needed. For example, an uneducated person’s belief in a flat Earth can hardly result from anything other than a tendency to over-generalise from immediate appearances. But in at least one case it seems more likely that the fallacy has been in the experimenters’ interpretations of their data, rather than in the minds of the experimental subjects. Kahneman and Tversky (1973 and 1974) and Tversky and Kahneman (1974) have argued that intuitive judgments of probability are biassed towards predicting that outcomes will be similar to the evidence. But the Tversky-Kahneman argument is based on the assumption that the human mind has only one legitimate framework within which to reason about uncertain predictions, namely the calculus of chance that was gradually made explicit by Pascal and others in the seventeenth and eighteenth centuries and has long formed the mathematical basis of statistical theory. When that erroneously restrictive assumption is discarded, the relevant experiments may be construed instead as confirming the common use of a non-Pascalian concept of probability for tasks of the kind in question. The Pascalian theory of probability is admittedly the only explicit and systematic theory of probability that currently forms part of a scientist’s educational curriculum.
But, even on the subject of probability, we ought to be sufficiently open-minded to envisage the possibility that current educational curricula may not as yet reflect every legitimate mode of reasoning.
*I am grateful to Steve Stich for some helpful comments on an earlier draft of the present paper.
**Requests for reprints should be sent to L. Jonathan Cohen, The Queen’s College, Oxford University, Oxford, England.
L. J. Cohen
The illusoriness of a perceptual illusion is normally easy to demonstrate to anyone. Touch comes to the aid of vision, perhaps, and the stick in water that looks bent is felt to be straight. Or a closer look corrects a more distant one, and the mirage disappears. Here the methods of checking first impressions are familiar to every sane adult in our culture, and the criteria of correct judgement are universally accepted. But how can the illusoriness of an alleged intellectual illusion be demonstrated? Tversky and Kahneman claim that their subjects are guilty of ignoring calculable inconsistencies between the information given and the probability-judgments produced. Thus the fallacies that they allege to be committed are computational howlers. But to accuse someone of computational error within a logical or mathematical system you need first to be quite sure that you have correctly interpreted what system he is in fact using. The relativity theorist’s calculations are not erroneous just because they are non-Euclidean. So, in order to establish their own contentions about human fallibility, Tversky and Kahneman need to exclude the possibility of a non-Pascalian theory of probability within which their subjects’ calculations would be quite legitimate. But, so far from such a theory’s being impossible, it can be shown to be implicit even in the standard norms of experimental reasoning, norms so familiar in everyday scientific practice that they rarely arouse the reflective attention of those who use them. Francis Bacon (1620) was the first to recognise the existence of these norms in modern science, and his exposition of them was later amplified by Robert Hooke (1705), J. F. W. Herschel (1833), William Whewell (1847) and J. S. Mill (1843).
But these norms have lacked a systematic theoretical development until recently (Cohen, 1970) and in consequence it has been easy for psychologists, like Tversky and Kahneman, to misclassify certain human reasoning processes as being Pascalian and invalid, rather than as being Baconian and valid.
The alleged fallacy of representativeness
According to Tversky and Kahneman, people typically judge the probability that A originates from, or belongs to, B as high when A is highly representative of B (i.e., is highly similar to it) and as low when the opposite is the case. On their view this is a heuristic that, though sometimes harmless, tends to foster two important computational errors. It has the consequences both that prior probabilities, or base-rate frequencies, tend not to be taken into account where they should be, and also that the significance of differences in sample-size tends to be ignored. Two experiments reported by Tversky and Kahneman (1974) will show what is meant here.
In the first experiment the subjects were shown brief personality descriptions of several individuals, allegedly sampled at random from a group of 100 professionals (engineers and lawyers). The subjects were asked to assess, for each description, the probability that it belonged to an engineer rather than to a lawyer. In one experimental condition, subjects were told that the group from which the descriptions had been drawn consisted of 70 engineers and 30 lawyers. In another condition, subjects were told that the group consisted of 30 engineers and 70 lawyers. The odds that any particular description belongs to an engineer rather than to a lawyer should be higher in the first condition, where there is a majority of engineers, than in the second condition, where there is a majority of lawyers. Specifically, it can be shown by applying Bayes’ rule that the ratio of these odds should be $(0.7/0.3)^2$, or 5.44, for each description. In a sharp violation of Bayes’ rule, the subjects in the two conditions produced essentially the same probability judgments. Apparently, subjects evaluated the likelihood that a particular description belonged to an engineer rather than to a lawyer by the degree to which this description was representative of the two stereotypes, with little or no regard for the prior probabilities of the categories. The subjects used prior probabilities correctly when they had no other information. In the absence of a personality sketch, they judged the probability that an unknown individual is an engineer to be 0.7 and 0.3, respectively, in the two base-rate conditions. However, prior probabilities were effectively ignored when a description was introduced, even when this description was totally uninformative.
In a second experiment subjects failed to appreciate the role of sample size even when it was emphasized in the formulation of the problem. Consider the following question: ‘A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know about 50% of all babies are boys. However the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?’ [The numbers of subjects who gave the various possible answers were]: The larger hospital (21). The smaller hospital (21). About the same (that is, within 5 percent of each other) (53). Most subjects judged the probability of obtaining more than 60% boys to be the same in the small and in the large hospital, presumably because these events are described by the same statistic and are therefore equally representative of the general population. In contrast, sampling theory entails that the expected number of days on which more than 60% of the babies are boys is much greater in the small hospital than in the large one, because a large sample is less likely to stray from 50%.
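Both Pascalian claims in the quoted experiments can be checked numerically. The sketch below is illustrative only: it writes Bayes’ rule in odds form and treats each birth as an independent event with probability 1/2 of being a boy. The likelihood ratio `lr` for a personality description is a hypothetical value; it cancels when the two base-rate conditions are compared, which is why the ratio of odds is $(0.7/0.3)^2$ whatever description is used. The function names are illustrative, not taken from the source.

```python
from math import comb


def posterior_odds(prior_engineers: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = likelihood ratio x prior odds."""
    prior_odds = prior_engineers / (1 - prior_engineers)
    return likelihood_ratio * prior_odds


def prob_more_than_60pct_boys(n_births: int) -> float:
    """P(strictly more than 60% boys) among n_births, each a boy with probability 1/2."""
    threshold = (3 * n_births) // 5 + 1  # smallest integer count exceeding 0.6 * n_births
    favourable = sum(comb(n_births, k) for k in range(threshold, n_births + 1))
    return favourable / 2 ** n_births


# Hypothetical likelihood ratio for one description; it cancels in the comparison.
lr = 2.0
ratio = posterior_odds(0.7, lr) / posterior_odds(0.3, lr)
print(f"odds ratio between conditions: {ratio:.2f}")

small = prob_more_than_60pct_boys(15)  # smaller hospital, about 15 births a day
large = prob_more_than_60pct_boys(45)  # larger hospital, about 45 births a day
print(f"small hospital: {small:.3f} per day, about {365 * small:.0f} days a year")
print(f"large hospital: {large:.3f} per day, about {365 * large:.0f} days a year")
```

The odds ratio prints as 5.44 for any choice of `lr`, and the smaller hospital's daily probability (4944/32768, about 0.15) is the larger of the two, as sampling theory requires.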
If in such experiments as these Tversky and Kahneman’s subjects are to be thought of as reasoning in terms of Pascalian probabilities, they are indisputably committing gross fallacies. But before imputing gross fallacies to his fellows a psychologist always needs to be sure that no more charitable interpretation of their responses is available. Some attention must first be devoted to clarifying the content, scope, and credentials of the various norms that might act as arbiters of fallaciousness: and, of course, this clarification is something to be accomplished initially by logical or philosophical argument rather than by the study of experimental findings.
The structure of Baconian probability
God, as we are plausibly informed by John Locke, did not make man barely two-legged and leave it to Aristotle to make him logical. Nevertheless, though even those untutored in formal logic can make some kinds of valid deductions, it is an enterprise of considerable difficulty to construct an explicit statement of the principles governing demonstrative validity. Just the same is true of probabilistic reasoning. The seventeenth century saw the first steps taken towards an explicit theory of probability, but these steps led in two rather different directions. Along one main line of advance the initial ideas of Pascal and Fermat were taken up by Leibniz and Bernoulli and were soon refined and developed for a vast variety of self-conscious scientific purposes (cf., Hacking, 1975). Judgments of probability in this sense constrain one another in accordance with the familiar principles of complementation for negation ($p[B/A] = 1 - p[\text{not-}B/A]$) and multiplication for conjunction ($p[B \,\&\, C/A] = p[B/A] \times p[C/A \,\&\, B]$); and conditional, or posterior, probabilities are related to unconditional, or prior, ones by the Bayesian principle
$$p[B/A] = \frac{p[A/B] \times p[B]}{p[A]}, \quad \text{where } p[A] > 0.$$
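The three Pascalian principles just stated can be verified on any concrete probability assignment. The following is a minimal sketch under an assumed, entirely hypothetical joint distribution over three yes/no events A, B and C. Exact fractions are used so that the identities hold exactly rather than up to rounding; `p(event, given)` plays the role of $p[\text{event}/\text{given}]$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three yes/no events (A, B, C).
# The eight weights are arbitrary and sum to 40, so probabilities are exact fractions.
weights = [1, 2, 3, 4, 5, 6, 7, 12]
joint = {atom: Fraction(w, 40) for atom, w in zip(product([True, False], repeat=3), weights)}


def p(event, given=lambda a, b, c: True):
    """Pascalian probability p[event/given], read off the joint table."""
    den = sum(pr for atom, pr in joint.items() if given(*atom))
    if den == 0:
        raise ValueError("conditioning event has zero probability")
    num = sum(pr for atom, pr in joint.items() if given(*atom) and event(*atom))
    return num / den


A = lambda a, b, c: a
B = lambda a, b, c: b
C = lambda a, b, c: c
not_B = lambda a, b, c: not b
B_and_C = lambda a, b, c: b and c
A_and_B = lambda a, b, c: a and b

# Complementation for negation: p[B/A] = 1 - p[not-B/A]
assert p(B, A) == 1 - p(not_B, A)
# Multiplication for conjunction: p[B&C/A] = p[B/A] x p[C/A&B]
assert p(B_and_C, A) == p(B, A) * p(C, A_and_B)
# Bayesian principle: p[B/A] = p[A/B] x p[B] / p[A]
assert p(B, A) == p(A, B) * p(B) / p(A)
print(p(B, A), p(B_and_C, A))  # 3/10 1/10
```

With these weights the conjunction's probability (1/10) is lower than that of its conjunct B (3/10). That is the Pascalian behaviour the Baconian account below departs from.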
The other seminal ideas about non-demonstrative inference were given their classical exposition by Francis Bacon, and were concerned with the establishment of causal laws. On the one hand, it was in effect insisted, there must be controls in order for us to be sure that A itself, and not some phenomenon occurring alongside A, is necessary for causing B; on the other hand, the A-B sequence must be observed to occur in a variety of relevant circumstances, in order for us to be sure that A alone is sufficient to cause B. But the method of difference and the method of agreement, as J. S. Mill called these two requirements, have traditionally been catalogued rather than
systematised. They have been seen as separate and independent criteria. Indeed the Baconian tradition did not render explicit any well-regulated method of treating the extent of experimental support for a causal hypothesis as a matter of degree rather than all-or-nothing. Nor did it have any conception of the rational constraints that one such judgment of experimental support may place on another: for example, it had no principles relating the support for a hypothesis to the support for its negation, or the support for hypotheses considered separately to the support for their conjunction. As a theoretical explication of intuitive practices in non-demonstrative inference it was thus markedly inferior to the Pascalian tradition. One can understand therefore why Tversky and Kahneman ignore it altogether and identify the Pascalian account with what they call ‘the normative theory of prediction’, making no allowance for the possibility that probabilities may also be graded in a different way and that probability-judgments may place correspondingly different kinds of constraints on one another. A systematic and sufficiently precise development of Baconian probability is now available, however, and a brief, informal outline of this development will suffice to show its bearing on the issues investigated by Tversky and Kahneman. The theory has four key ideas. (i) The traditionally distinct methods of agreement and difference are generalised into a single ‘method of relevant variables’ for grading the inductive reliability of generalisations about natural phenomena in any domain that is assumed to obey causal laws. (ii) The (Baconian) probability of an A’s being a B is identified with the inductive reliability of the generalisation that all A’s are B’s. (iii) Judgments of Baconian probability are seen to constrain one another in accordance with principles that are derivable within a certain modal-logical axiom-system but not within the classical calculus of chance.
(iv) Baconian probability-functions are seen to deserve a place alongside Pascalian ones in any comprehensive theory of non-demonstrative inference, since Pascalian functions grade probabilification on the assumption that all relevant facts are specified in the evidence, while Baconian ones grade it by the extent to which all relevant facts are specified in the evidence. I shall sketch each of these four ideas in turn: fuller details and arguments are available elsewhere (in Cohen, 1970 and 1977). (i) The method of relevant variables grades generalisations by their capacity to resist interference from factors which sometimes interfere with the operation of other generalisations that have been, or could be, formulated in the same field of enquiry. A relevant variable is defined, roughly, as a (not logically exhaustive) set of such factors that are co-ordinate with one another, in the way that, for example, different kinds of previous medical history might be some of the relevant circumstances for clinical tests on
hypotheses about a drug’s efficacy. We may take it that an investigator approaches the task of testing a particular causal hypothesis in his field with an appropriate list of such inductively relevant variables in mind, and that this list is ordered in accordance with the supposed importance of the different variables and preceded by a pairing of the hypothesised cause with some appropriate control. Experimental tests of increasing severity may then be constructed by varying more and more of these factors in combination with one another. The inductive reliability of a causal or non-causal generalisation may be ranked in accordance with the complexity-grade of the most complex test that it succeeds in passing. No measure-function is possible here because the severity of an experimental test, thus construed, is a non-additive property of it. But this is clearly the framework within which, with greater or less thoroughness, a great deal of assessment and criticism is carried out in experimental science. A successful hypothesis is seen to hold good over a wider and wider range of inductively relevant circumstances. Of course, the list of appropriate controls and relevant variables in a particular field is always subject to modification or enlargement as we learn more about the subject-matter. For example, the thalidomide tragedy established the hitherto insufficiently recognised importance of the pregnancy factor in toxicity-tests. (Also, the tests may well be carried out on samples of an appropriate size, rather than just on individuals, if it is thought that several unknown variables may be operating: the hypothesis is then that the drug has a certain Pascalian probability of producing the specified result.) But the method of relevant variables itself, in one realisation or another, is an invariant factor in the rational competence of an experimental scientist. Even if he does not wish to rank-order hypotheses by it,
he must at least be prepared to draw comparisons of greater or less reliability in accordance with the rankings that it generates.
(ii). The laws we discover are of most use to us as licenses to carry out the corresponding inferences, but where the reliability of the law is not known to be complete the inference can be known only as a probable one. If the hypothesis that all A are B resists a fairly severe level of test, we have a license to infer with a correspondingly high level of inductive or Baconian probability from the premiss that a particular thing is A to the conclusion that it is B. But note the difference here from mathematical or Pascalian probability. The Baconian probability never depends (except in the limiting cases) on the ratio of A that are B to those that are either B or not-B. It depends instead on the extent of causally relevant factors which are powerless to interfere in any particular case with the A-B connection. It is thus a property that belongs distributively to each A rather than collectively to the totality of A. So we have here two different ways of gauging probability and we should no more expect judgments of Baconian probability to coincide with judgments of Pascalian probability than we expect judgments of the academic value of a lecture to coincide with judgments of its value for attracting large numbers of people into the lecture-hall. Indeed mere numbers of favourable instances have no close bearing on the inductive reliability of a generalisation, since the replicability of a test-result is nothing but a guarantee for its genuineness. What determines the extent of the inductive support that identically favourable test-results give to a hypothesis is the structure of the test carried out, not the number of times that the same test-result has in fact been repeated. However, the reliability of a generalisation may be maintained in the face of unsuccessful test-results if it is so modified as to exclude the presence of the circumstances that falsify it. Thus if ‘All A are B’ is falsified by variant $V_1$ of a relevant variable, but not by $V_2$, and if $V_2$ excludes $V_1$, then ‘All things that are both A and $V_2$ are B’ has greater reliability than ‘All A are B’. In such a case $p_I[B/A \,\&\, V_2]$ will have a higher value than $p_I[B/A]$ (where $p_I$ is a Baconian or inductive probability-function). In other words, if a Baconian probability is favourable, it increases with the weight of evidence.
(iii). Judgments of Baconian probability impose certain systematic constraints on one another, because of their connection with the method of relevant variables. For example, any test that is passed by both ‘All A are B’ and ‘All A are C’ will also be passed by ‘All A are both B and C’. Hence the latter generalisation is as reliable as either of the former, if they are equally reliable, or as the less reliable of the two, if they are not.
It follows that, if $p_I[B/A] \geq p_I[C/A]$, then $p_I[B \,\&\, C/A] = p_I[C/A]$. The Baconian probability of a conjunction does not have to be less than the probability of either of its conjuncts, in contrast with the familiar rule for Pascalian probability. Again, in some cases being A is quite irrelevant to being B, and then both ‘All A are B’ and ‘All A are not-B’ will have zero-grade inductive reliability. Hence it is quite possible for both $p_I[B/A]$ and $p_I[\text{not-}B/A]$ to be zero. Equally, whatever test ‘All A are B’ has passed, ‘All A are not-B’ must have failed. If the former has any greater than zero reliability, the latter has none. It follows that for any normal A, if $p_I[B/A] > 0$, then $p_I[\text{not-}B/A] = 0$. So the negation principle for Baconian probability is not complementational, as that for Pascalian probability is. Again, because of their dependence on covering generalisations, Baconian probabilities are invariant under contraposition, though this is not true of Pascalian ones. That is, $p_I[B/A] = p_I[\text{not-}A/\text{not-}B]$. These and other constraining principles for Baconian probability-judgments are all derivable within a suitable generalisation of the modal logic known as
L. J. Cohen
S4. And to understand why this is so we need to reflect that the ideal of Baconian induction is to establish causal laws, that is, necessities of nature. To ascribe complete inductive reliability to a generalisation is to ascribe it that kind of necessity; to ascribe it a lesser degree of reliability is to ascribe it something resembling, but inferior to, natural necessity. It turns out, moreover, that there is no appropriate way of mapping this modal logic onto the calculus of chance. Normal Baconian probabilities are not merely not equivalent to Pascalian ones, but are not even any kind of function of the latter. We can also define a concept of prior probability for Baconian reasoning, by putting $p_I[B]$ equal to $p_I[B/B\text{-or-not-}B]$, analogously to the definition of prior in terms of posterior Pascalian probability. It then turns out that we can have $p_I[B/A] > 0$ even when $p_I[B] = 0$: there is no analogue of Bayes’ law in the theory of Baconian probability. In Baconian reasoning prior probabilities always set a floor to posterior ones, but never a ceiling. (iv). Though we are here concerned with two logically disparate systems of reasoning, there is nevertheless an important sense in which not only are both systems equally entitled to claim that they issue in judgments of probability but also each is entitled to claim that it complements the other. Notoriously, even the Pascalian calculus is open to several alternative interpretations as a theory of probability. According to the semantics provided, it constitutes a theory of relative frequencies (Reichenbach, Von Mises), of belief intensities (Ramsey, de Finetti, Savage), of natural propensities (Peirce, Popper), of logical relations (Keynes, Carnap), or of some other appropriately structured domain (cf. Nagel, 1938, and Mackie, 1973).
In each of these probabilistic interpretations, however, it may be regarded as a theory about gradations of provability (Cohen, 1977): what differ from one interpretation to another are the criteria of gradation. Now systems of demonstrative proof may be divided into two categories, ‘complete’ or ‘incomplete’, in accordance with whether or not the non-provability of B is equivalent to the provability of not-B. Clearly any Pascalian theory must be regarded as a gradation of provability in a complete system, in this sense, because it incorporates a complementational principle for negation. But what about incomplete proof-systems, of which there are very many (e.g., Newtonian mechanics)? Any gradation of provability in an incomplete system must allow the possibility that, for some B, neither B nor not-B has any positive probability. In other words, when provability is put on a scale, there are two ways of doing it. One kind of scale runs from provability at the upper extreme to disprovability at the lower, and the further anything gets from being provable, the nearer it gets to being disprovable, i.e., to the provability of its negation. So the task of the probability-function here is to determine where on the scale the balance lies between proof and disproof. This kind of
On the psychology of prediction: Whose is the fallacy?
scale requires a complementational principle for negation. The other kind of scale, a non-Pascalian one, runs from provability at the upper extreme to non-provability at the lower. Here we obtain a non-zero gradation of provability for B only if the premisses are already on balance in favour of B, and the level of the gradation depends just on the weight of the premisses, i.e. on the amount of relevant facts they include. Only if the premisses were, on balance, in favour of not-B would we instead, by grading their weight, obtain a non-zero gradation of provability for not-B. So if the probability of B on A is greater than zero, that of not-B must be zero; and it is also possible for both B and not-B to have zero probability, since the premisses A might be wholly indecisive or irrelevant in relation to the issue of B or not-B. Therefore the fact that Baconian probability obeys a negation principle of this kind, as we saw earlier, is in no way a mark against it. Rather, we learn thus that Baconian criteria fill a legitimate niche, which is otherwise unoccupied, in any sufficiently comprehensive scheme for the gradation of provability. For example, the evidence that a North American male is thirty years old provides a high Pascalian probability for his survival till the age of fifty. But it provides only a low Baconian one, since the weight of the evidence is rather light: the man might also be a rock-climber, amateur pilot, heavy smoker, etc. The two probability-judgments differ substantially from one another. Nevertheless, neither implies the falsity of the other. Both judgments may be true, because each supplies us with a quite different kind of information from the other. The Pascalian judgment grades probability on the assumption that all relevant facts are specified in the evidence, while the Baconian one grades it by the extent to which all relevant facts are specified in the evidence.
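The principles listed in (ii) to (iv) can be set beside their Pascalian counterparts in a small toy model. This is a sketch under explicit assumptions, not Cohen's formalism: Baconian grades are represented as non-negative integers on an ordinal scale (0 meaning non-provability) rather than through the modal-logic semantics mentioned above, independence is assumed for the Pascalian conjunction, and every function name is invented for illustration.

```python
from fractions import Fraction

# Toy contrast between the Pascalian calculus and the Baconian principles stated
# in the text. Assumption of this sketch: Baconian grades are non-negative
# integers on an ordinal scale (0 = non-provability). All names are hypothetical.


def baconian_conjunction(grade_b: int, grade_c: int) -> int:
    """p_I[B & C/A] is the lower of p_I[B/A] and p_I[C/A]."""
    return min(grade_b, grade_c)


def baconian_negation_ok(grade_b: int, grade_not_b: int) -> bool:
    """Both grades may be zero, but at most one of them may be positive."""
    return grade_b == 0 or grade_not_b == 0


def baconian_posterior_ok(prior: int, posterior: int) -> bool:
    """The prior sets a floor, never a ceiling, on the posterior."""
    return posterior >= prior


def pascalian_conjunction(p_b: Fraction, p_c: Fraction) -> Fraction:
    """Multiplicative rule, assuming B and C are independent given A."""
    return p_b * p_c


def pascalian_negation(p_b: Fraction) -> Fraction:
    """Complementational rule: p(not-B/A) = 1 - p(B/A)."""
    return 1 - p_b


def pascalian_posterior(prior: Fraction, likelihood: Fraction,
                        evidence: Fraction) -> Fraction:
    """Bayes' law: p(H/E) = p(E/H) * p(H) / p(E)."""
    return likelihood * prior / evidence


if __name__ == "__main__":
    print(baconian_conjunction(3, 4))                                  # 3
    print(pascalian_conjunction(Fraction(7, 10), Fraction(7, 10)))     # 49/100
    print(baconian_negation_ok(0, 0), pascalian_negation(Fraction(3, 10)))  # True 7/10
    print(pascalian_posterior(Fraction(0), Fraction(9, 10), Fraction(1, 2)))  # 0
    print(baconian_posterior_ok(0, 4))                                 # True
```

The last two lines mirror the point about priors: a zero Pascalian prior forces a zero posterior through Bayes' law, whereas in the Baconian model a zero prior only fixes a floor.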
Some familiar uses of Baconian probability
The theory of Baconian probability is highly important for the interpretation of Tversky and Kahneman’s results. But, before examining what emerges from its application to some of those results, we should note the existence of several other, independent grounds for supposing that a good deal of intuitive reasoning about uncertain outcomes has a Baconian structure. This will serve to confirm that there is nothing at all odd or surprising about the conformity of Tversky and Kahneman’s subjects to Baconian norms in appropriate cases. The first point has already been mentioned. The experimental method of reasoning in modern science is seen to have an essentially Baconian structure, if we consider how it operates in the assessment of hypotheses about causal laws, in abstraction from any statistical issues involved. Wherever we use
those assessments as a basis for judging the validity of our predictions, we are invoking Baconian standards. Accordingly, so far as scientific enquiry uses only methodised commonsense, we should expect to find that even the lay subjects of Tversky and Kahneman’s experiments have a capacity for Baconian reasoning which they will exercise in appropriate cases. The second point concerns the relation between probability and rational belief. We naturally tend to suppose that, even where we cannot achieve certainty, there is some level of probability, on the total evidence available, that justifies rational belief. Only the neurotically sceptical always require absolute certainty in order to justify belief. But what kind of probability may this be? Pascalian or non-Pascalian? If the belief is about a particular matter of fact, like whether our friend John Smith will survive till the age of fifty, we are surely wise, in general, to insist on a high Baconian probability as well as on a high Pascalian one. If the total evidence available is that John Smith is now thirty years old, we may well have a high Pascalian probability of his survival to fifty. But it would be unwise to believe in this survival, because the evidence actually available forms so small a part of the facts about a man that are relevant to his life expectancy. Thirdly, attempts to found a criterion for rational belief on Pascalian probability have to adopt ad hoc evasive stratagems in order not to be hit by the well-known paradox of the lottery. A criterion based on Baconian probability avoids this paradox altogether (Cohen, 1977). Fourthly, besides the above kind of paradox in regard to a Pascalian criterion for predictive beliefs, there is also a whole group of paradoxes in regard to retrodictive beliefs. These paradoxes emerge with particular clarity in relation to the norms of Anglo-American jurisprudence for forensic proof.
Proof of an issue in criminal cases has to be at a level of probability that puts the matter beyond reasonable doubt, while in civil cases proof on the balance of probability suffices. So on a Pascalian account we might suppose the civil standard to require a probability of 0.6, perhaps, or 0.7. But then, if the plaintiff in a civil suit had to establish his case on two relatively independent issues (for example the terms of two separate contracts) in order to win, his case as a whole would not always be established on the balance of Pascalian probability even if each of the two component issues were. What bars this is the multiplicational principle for the Pascalian probability of a conjunction. Similarly we might suppose the required level of proof in a criminal case to be a Pascalian probability of $1 - \epsilon$, where $\epsilon$ is as small as you like. But on such a Pascalian analysis, even if all the legally distinguishable elements in a crime (for example, both the wounding and the malicious intention) have been established at the required level, it does not necessarily follow that the crime as a whole has. Now the courts see nothing particularly
problematic or paradoxical about applying the ordinary standards of proof to cases involving more than one issue. Presumably, therefore, if it is sufficient in their eyes for each component issue to be proved at the required level, the conjunction of these issues must also be taken to be proved thereby at the required level. And this is just what the conjunction principle for Baconian probability implies, as we have already seen. So the Anglo-American legal system endorses a Baconian, rather than a Pascalian, framework of reasoning in this context, and the lay jurors who adjudicate on the strength of the evidence put before them must be capable of operating within such a framework. Here is another paradox that would be generated by interpreting the normal standard of forensic proof for civil cases in Pascalian terms. Imagine a rodeo into which 400 people are known to have been admitted through an automatic turnstile after paying the proper sum. Then 1,000 people are counted on the seats, and a hole is discovered in the fence. A man is picked at random from the seats, who turns out to be John Smith. That is all the evidence before the court when John Smith is sued by the management of the rodeo for non-payment of his entry-money. Now, if the balance of probability involved is a Pascalian one, the management should win their case. But, in reality, of course, no jury would award it to them, or at least we should consider it highly unjust if they did. Why so? Because instead a jury would need to know particular facts about John Smith that establish a balance of Baconian probability against him (with a non-complementational negation-principle), e.g., that he was seen near the hole in the fence, that threads from his clothing were caught on the wire, and so on. Finally, consider the rule that an accused person in a criminal prosecution should be presumed innocent at the outset and judged only on the facts before the court.
What this rule seems to require is that a jury should be capable of reasoning within a certain kind of framework. They must be capable of reconciling the thesis that the probability of the accused’s guilt, on the evidence presented, is greater than zero, with the thesis that, prior to the presentation of this evidence, the probability of the accused’s guilt is zero. That is to say, lay juries must be capable of reasoning within a framework in which zero-level prior probability does not compel zero-level posterior probability. In Pascalian reasoning such a framework is not obtainable, because of Bayes’ law. But Baconian reasoning, as we have seen, allows this, because of its quite different foundations: zero-level Baconian probability means non-provability, not disprovability. So here is yet another reason for supposing that the ordinary laymen, who constitute British or American juries, must be fully capable of thinking about probabilities in Baconian terms. To suppose otherwise would be to assume a politically unacceptable
lack of fit between legal norms and social reality. Of course, Pascalian probabilities often enter into forensic proofs, in connection with issues about identity, occupational disease, actuarial risk, and innumerable other matters. But when such probabilities are important they enter into a proof as part of the premisses for its conclusion, not as gradations of the extent to which its premisses establish the conclusion. They are given in evidence by expert witnesses, not estimated by jurors in the process of deciding on their verdict. And they establish facts about whole categories of individual people or events, rather than just about the particular people or events that are involved in the case currently before the court.
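The arithmetic behind the forensic paradoxes discussed above can be checked directly. This sketch assumes, as the text's examples do, that separate issues are independent so that the Pascalian multiplication rule applies; the standard of 0.6, the two issues each proved to 0.7, and the rodeo figures come from the text, while the value chosen for $\epsilon$ and the helper name `pascalian_joint` are illustrative only.

```python
from fractions import Fraction
from math import prod


def pascalian_joint(issue_probs):
    """Probability that every (assumed independent) issue holds: product rule."""
    return prod(issue_probs, start=Fraction(1))


# Civil case: two independent issues, each proved to 7/10, against a 6/10 standard.
civil_standard = Fraction(6, 10)
civil = pascalian_joint([Fraction(7, 10), Fraction(7, 10)])
print(civil, civil >= civil_standard)  # 49/100 False

# Criminal case: each element proved to 1 - epsilon (epsilon is illustrative).
epsilon = Fraction(1, 100)
criminal = pascalian_joint([1 - epsilon, 1 - epsilon])
print(criminal, criminal >= 1 - epsilon)  # 9801/10000 False

# Rodeo: 400 paid admissions, 1,000 people counted on the seats.
p_not_paid = Fraction(1000 - 400, 1000)
print(p_not_paid, p_not_paid > Fraction(1, 2))  # 3/5 True
```

Under the Baconian conjunction principle, by contrast, a conjunction is graded at the level of its weaker component, so proving each issue to the required standard suffices for the case as a whole.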
A re-interpretation of some experimental results
If we bear in mind these normative points about Baconian probability, we shall be in a better position to make sense of Tversky and Kahneman’s experimental data. Look back now at the first experiment described above. In it the fallacy imputed to the subjects was that of failing to take prior probabilities into account when judging the probability that a particular description belonged to an engineer rather than to a lawyer. But this would only have been a fallacy if the subjects were reasoning in terms of Pascalian probabilities. If their judgments were Baconian, there was no fallacy. The prior Baconian probability of a particular man’s being an engineer, or of his being a lawyer, is presumably very low indeed, even if not actually zero. So the floor it sets to either posterior Baconian probability is of negligible importance. In assessing the latter all that needs to be done is to determine the inductive reliability of the generalisation that all people of the given description are engineers, or lawyers, as the case may be. That is, the question to be answered is: how well guarded is the description against those factors that create exceptions to some appropriate rule-of-thumb for inferring a man’s profession? Or, to put the point yet another way, we can understand what Tversky and Kahneman call the subject’s ‘stereotype’ of an engineer, or lawyer, as the description that the subject implicitly takes to guarantee membership of the appropriate profession. He attributes more or less complete inductive reliability to the generalisation that anyone satisfying the engineer-stereotype is in fact an engineer. So the Baconian probability that such-and-such a description betokens an engineer is judged by the extent to which that description approximates the full stereotype. Representativeness, as Tversky and Kahneman call it, is thus not just a heuristic here, as they regard it, but rather the rationally appropriate criterion. Mutually similar causes produce mutually similar
effects. It follows that if the subject commits any fallacy at all, the fallacy is not a logical or mathematical one, such as would arise from ignoring priors in the computation of Pascalian probabilities. If there were a fallacy, it would lie rather in the choice of stereotype. The subject might have made an empirical error about the factors that more or less guarantee a man to be an engineer. But how, you may ask, does a subject decide whether to assess the probabilities at issue by Pascalian standards or by Baconian ones? Well, the experimenters certainly give him no guidance in this dilemma. They are like someone who asks about the value of a particular lecture and leaves it to his hearers to decide whether he means its academic value or its value for attracting crowds. So, if we charitably assume the subjects to be at least as rational as ourselves, we can hope to construe at least some such experiments as revealing how subjects do decide between Pascalian and Baconian responses. That is to say, we should interpret their answers, where possible, in whichever of the two ways does not involve them in committing any logical or mathematical fallacy. And, if we can thus discover when their judgments are Pascalian, when Baconian, and when irredeemably fallacious, we can go on to look for the features of experimental set-ups which correlate with these various results. One hypothesis which seems to fit the available evidence is that people inexpert in statistical theory tend to apply Baconian patterns of reasoning instead, and to apply these correctly wherever they have an opportunity to make the probability in question depend on the amount of inductively relevant evidence that is offered. Thus it is reasonable to suppose that adoption of a particular profession is causally connected (in some as yet imperfectly known way) with a person’s character, interests, abilities, opportunities, etc.
So in the experiment about the engineers and lawyers the subjects naturally tended to go by the weight of evidence and use Baconian reasoning. A similar explanation covers the second experiment described above. The subjects were asked about two particular hospitals in a particular town, during a particular year. They were asked whether they thought that the size of the hospital affected the frequency with which its ratio of male births to female ones diverged markedly from the human average. Since a majority of them neglected sample-size and answered in the negative, we must infer this majority to have applied Baconian standards. Why did they do so? Well, it would have been quite reasonable to argue as follows (though it is not necessary to assume that conscious reasoning of this kind actually took place): maybe doctors can decide on a deliberate policy of impeding births that they know will be of a particular sex; but there is no particular reason to expect this more in a small hospital than in a larger one, since the size
of a hospital, so far as we know, does not affect its obstetricians’ attitudes to the sex-ratio of the babies born in it; so there is zero weight of evidence in favour of either hospital’s having more days with an exceptional sex-ratio of births. Such a way of arguing is fully rational in the same sense as the prospective mother, who is concerned to avoid any interference with the biological processes that determine the ratio of boys to girls within her family, is fully rational in paying no attention whatever to the long-run Pascalian probabilities about larger and smaller hospitals when she decides where to lie in. Of course, if the subjects had been familiar with Bernoulli’s theorem they might well have inferred from the information supplied, not that the smaller hospital actually recorded more days with an exceptional sex-ratio (since this inference too, pace Tversky and Kahneman, would have been a fallacy), but that such days had a higher Pascalian probability of occurrence (within a certain interval of approximation) in the smaller hospital. Suppose therefore that the experiment were re-run and the question were put unambiguously: in which hospital is there a higher Pascalian probability (within a specified interval of approximation) that any day picked at random would be one with an exceptional sex-ratio? No doubt subjects ignorant of Bernoulli’s theorem would not score well in their replies. But if they are ignorant about Bernoulli’s theorem, i.e., about certain crucial implications of the Pascalian concept of probability, have they fully understood the question? At best such an experiment only exposes the indeterminacy of the experimental situation here, which is like that of many other experiments about proneness to logical or mathematical error.
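The Pascalian conclusion that a subject familiar with Bernoulli's theorem might draw can be illustrated with an exact binomial calculation. The daily birth counts used here (15 and 45) and the definition of an ‘exceptional’ day (strictly more than 60% boys) are hypothetical values chosen for this sketch, not figures given in this section; births are modelled as independent, each a boy with probability 1/2.

```python
from fractions import Fraction
from math import comb


def prob_exceptional_day(births: int, threshold: Fraction = Fraction(3, 5)) -> Fraction:
    """Exact Pascalian probability that strictly more than `threshold` of a
    day's births are boys, with independent births and p(boy) = 1/2."""
    favourable = sum(comb(births, k) for k in range(births + 1)
                     if k > threshold * births)
    return Fraction(favourable, 2 ** births)


small = prob_exceptional_day(15)  # hypothetical smaller hospital
large = prob_exceptional_day(45)  # hypothetical larger hospital
print(float(small), float(large))
print(small > large)  # True
```

The calculation supports only the limited claim the text allows: a higher Pascalian probability of such days in the smaller hospital, not that the smaller hospital actually recorded more of them.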
The more inexpert the subjects are in probability theory, the less they can be expected to understand the question properly; the more expert they are, the less they can be expected to argue fallaciously (except through carelessness or other adventitious causes of performance-error). So an experimenter should not expect that his or her subjects’ systematic proneness to logical or mathematical error can be demonstrated unambiguously even from their answers to well-phrased questions about Pascalian probability. The most that such an experiment can really show is the extent to which its subjects fail to speak as if they understand the question correctly. On the other hand, where the question is ambiguous between Pascalian and Baconian probability, the hypothesis already offered seems to fit the facts. Indeed it fits Tversky and Kahneman’s results about representativeness even where the problem posed to their subjects concerns a random process, like coin-tossing, despite the fact that such processes are often treated as a paradigm field for Pascalian analysis. According to Tversky and Kahneman (1974):
People expect that a sequence of events generated by a random process will represent the essential characteristics of that process even when the sequence is short. In considering tosses of a coin for heads or tails, for example, people regard the sequence H-T-H-T-T-H to be more likely than the sequence H-H-H-T-T-T, which does not appear random, and also more likely than the sequence H-H-H-H-T-H, which does not represent the fairness of the coin.
But though this kind of estimate is fallacious if understood as a judgment of Pascalian probability, it is not at all fallacious if interpreted in Baconian terms. Suppose you encounter a particular sequence of just six items, where the items belong to just two different kinds. These might be black and white rocks on a path, boy- and girl-births in a family, men and women going through a doorway, ... or coin-tosses. You are asked how probable it is that the sequence is random, and the only information you have is the relative frequency of the two kinds in the given sequence and their pattern of distribution. Now the word ‘random’, outside the theory of Pascalian probability, means ‘without aim or purpose or principle’. So you are being asked about the probability that the sequence was not composed as a result of a single causal factor but through the undesigned interaction of several such factors. Well, you have not got much evidence to go on, if you can look only at the sequence’s own constituent structure and not at its surrounding circumstances. But the relative frequency of the two kinds of items, and their symmetry of distribution, would obviously be relevant variables. Designed, non-random sequences tend to be regular and symmetrical throughout, even though undesigned, random sequences often exhibit occasional stretches of regularity and symmetry. Hence you would be correct to infer a slightly higher Baconian probability of non-randomness for the sequence H-H-H-T-T-T than for the sequence H-T-H-T-T-H. And at least in the case of the men and women going through a doorway, the more probable conclusion might well be the true one. Moreover, it should be remembered here that Baconian probabilities are invariant under contraposition. So, writing R for ‘random’ and S for ‘symmetrical’, if $p_I[\text{not-}R/S] > p_I[\text{not-}R/\text{not-}S]$, it must also be the case that $p_I[\text{not-}S/R] > p_I[S/R]$.
And the latter inequality corresponds to the judgment that Tversky and Kahneman’s subjects actually produced. Therefore on the more charitable, Baconian interpretation these subjects committed no fallacy and were the victims of no illusion. (Analogous reasoning will account for their belief that H-T-H-T-T-H is more likely than H-H-H-H-T-H.) A similar point emerges with particular clarity in regard to a remark that Tversky and Kahneman make about interview-based predictions. According to them people are prone to experience much confidence in highly fallible judgments, a phenomenon that may be termed the illusion of validity. Like other perceptual and
judgmental errors, the illusion of validity often persists even when its illusory character is recognised. When interviewing a candidate, for example, many of us have experienced great confidence in our prediction of his future performance, despite our knowledge that interviews are notoriously fallible.
It is to be hoped that not too many candidates have suffered injustice because of Tversky and Kahneman’s confusion here.
For even if the results of interviews in general do not significantly correlate, in the long run, with future performance, we can still be quite rational in having a lot of confidence about our predictions in certain particular cases. The fallibility of interviews is a collective property of an indefinitely large class of events. It excludes a high Pascalian probability of success for any interview-based prediction that is selected at random. Justifiable confidence in an interview, however, where such confidence exists, is a property of that particular interview, and is grounded on the weight of evidence (the amount of inductively relevant facts) revealed at the particular interview, and thus on the high Baconian probability which that evidence validates. If the hypothesis proposed above is correct, what Tversky and Kahneman call “the representativeness heuristic” is the outcome of an underlying assumption that, where mutually similar causes operate, mutually similar effects occur. And similarity here is properly to be judged, of course, in terms of inductively relevant characteristics. But in some problems there are no features that may be appropriately treated as inductively relevant characteristics. A typical problem of this nature, studied by Olson (1976), may be stated as follows:
Consider two Quebec towns. Anglophones are a majority (65%) of the voters in Town A, but are a minority (45%) in town B. There is an equal number of electoral ridings in each town. You have the voters’ lists from all ridings in both towns. You randomly select a list from one riding, and observe that exactly 55% of the voters are Anglophones. What is your best guess - is the riding in Town A? or in town B?
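For comparison, here is one way a Pascalian analysis of Olson's problem might run. The problem does not say how many voters a riding has or how riding compositions vary within a town, so this sketch assumes, hypothetically, that each riding's voters behave like an independent binomial sample from its town's population; the equal number of ridings in each town supplies equal prior probabilities. The function name and the riding sizes tried are illustrative.

```python
import math


def posterior_town_a(n_voters: int, n_anglophone: int,
                     share_a: float = 0.65, share_b: float = 0.45) -> float:
    """P(riding is in Town A | observed Anglophone count).

    Assumes binomial sampling within each town and equal priors; the
    binomial coefficient is common to both likelihoods and cancels."""
    log_likelihood_ratio = (
        n_anglophone * math.log(share_a / share_b)
        + (n_voters - n_anglophone) * math.log((1 - share_a) / (1 - share_b))
    )
    # Posterior with equal priors: LR / (1 + LR), written in a stable form.
    return 1.0 / (1.0 + math.exp(-log_likelihood_ratio))


for n in (20, 100, 1000):
    # Ridings in which exactly 55% of the voters are Anglophones.
    print(n, round(posterior_town_a(n, 55 * n // 100), 3))
```

Under these assumptions the posterior falls slightly below one half for every riding size tried, so the Pascalian answer leans, narrowly, towards Town B; with a different model of riding composition the figures would change.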
In dealing with such problems Baconian reasoning has nothing to get a proper grip on, and the inductivist search for salient similarities may then be guided in fact by various intuitive strategies that are normatively indefensible. For example, as Olson has shown, subjects may respond to similarities in the absolute size of numbers rather than in minority-majority distribution. Other writers too have detected a tendency in statistically untutored individuals to base their predictions on intuitive beliefs about causality. Thus Ajzen (1977), investigating people’s estimates of a student’s grade point averages in relation to information of different kinds, found that when asked to make a prediction his subjects looked for factors that would cause the behaviour or event under consideration. Other items of information,
even though important by normal principles of statistical prediction, tended to be neglected if they had no apparent causal significance: statistical information was used mainly when no causal information was available. Again base-rate information tended to be neglected where it provided no evidence concerning the presence of factors that were perceived to stand in a causal relation to the phenomena considered. But where base-rate information did provide causally relevant evidence it was considered more seriously. (The same point is made by Tversky and Kahneman (1977), with a further wealth of examples). Indeed this overall human tendency to prefer reasoning from causal rather than statistical data is hardly surprising if we bear in mind that the former type of reasoning seems to develop much earlier in the child and to be much more deeply rooted. According to Piaget and Inhelder (1975) “the idea of chance and the intuition of probability constitute almost without a doubt secondary and derived realities, dependent precisely on the search for order and its causes”. However, what I am arguing for in the present paper is not only the factual point that causal inferences underlie much of Tversky and Kahneman’s data, but also the normative point that such inferences, and their gradation by appropriate criteria, should not necessarily be regarded as irrational.
Are people particularly prone to fallacies in diagnostic reasoning?
Tversky and Kahneman (1977) have also drawn attention to a type of phenomenon that was originally investigated by Turoff (1972). Tversky and Kahneman describe their result as follows:
Let C be the event that within the next 5 years Congress will have passed a law to curb mercury pollution, and let D be the event that within the next 5 years, the number of deaths attributed to mercury poisoning in the U.S. will exceed 500. Let $\bar{C}$ and $\bar{D}$ denote the complements of C and D respectively. Question: Which of the two conditional probabilities, $p(C/D)$ and $p(C/\bar{D})$, is higher? Question: Which of the two conditional probabilities, $p(D/C)$ and $p(D/\bar{C})$, is higher? The overwhelming majority of respondents state that Congress is more likely to pass a law restricting mercury pollution if the death toll exceeds 500 than if not, i.e., $p(C/D) > p(C/\bar{D})$. Most people also state that the death toll is less likely to reach 500 if a law is enacted within the next five years than if it is not, i.e., $p(D/C) < p(D/\bar{C})$. However, this seemingly plausible pattern of judgments violates the most elementary rules of probability ... provided $p(C)$ and $p(D)$ are non-zero.
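The elementary rule at issue is that, whenever all four conditional probabilities are defined, $p(C/D) > p(C/\bar{D})$ holds exactly when $p(D/C) > p(D/\bar{C})$, because both inequalities say that C and D are positively correlated. The sketch below checks this on randomly generated joint distributions over the four cells; the random-table generator and the helper names are illustrative devices, not part of the original experiment.

```python
import random
from fractions import Fraction


def sign(x) -> int:
    """-1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def random_joint(rng: random.Random) -> dict:
    """Random joint distribution over (c, d), every cell strictly positive."""
    cells = [(True, True), (True, False), (False, True), (False, False)]
    weights = {cell: rng.randint(1, 50) for cell in cells}
    total = sum(weights.values())
    return {cell: Fraction(w, total) for cell, w in weights.items()}


def comparisons(joint: dict) -> tuple:
    """Return (p(C/D) - p(C/not-D), p(D/C) - p(D/not-C))."""
    p_c = joint[(True, True)] + joint[(True, False)]
    p_d = joint[(True, True)] + joint[(False, True)]
    c_given_d = joint[(True, True)] / p_d
    c_given_not_d = joint[(True, False)] / (1 - p_d)
    d_given_c = joint[(True, True)] / p_c
    d_given_not_c = joint[(False, True)] / (1 - p_c)
    return c_given_d - c_given_not_d, d_given_c - d_given_not_c


rng = random.Random(0)
agree = all(
    sign(first) == sign(second)
    for first, second in (comparisons(random_joint(rng)) for _ in range(10_000))
)
print(agree)  # True: the reported pattern (one '>' and one '<') never occurs
```

Exact fractions are used so that the sign comparisons are not disturbed by floating-point rounding.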
And the explanation of this particular piece of alleged irrationality is said to lie in a more general form of alleged intellectual error:
When told to assume that a particular conditioning event has occurred, people are prone to focus on the causal impact of this event on the future and to neglect its significance as a source of information about the past.
Now a point to notice about Baconian reasoning is that the inductive probability of one event, given another event, cannot be determined unless the temporal relationship of the two events is specified. This is because the probability depends on the inductive reliability of the covering generalisation, as we have seen, and that reliability is in turn determined by the extent of the generalisation’s resistance to causal interference by relevant variables. Because inductive probability is rooted in the operation of causal factors, and causation works from earlier to later, it is necessary to specify which of the pair of events is supposed to come first. However, the design of Tversky and Kahneman’s experiment requires not only that a question be put to the subjects which is ambiguous or unspecific as to the temporal priority of one event to another but also that the subjects not be told that this ambiguity or lack of specificity is essential to the design of the experiment. In the circumstances, therefore, the subjects may perhaps be forgiven their apparent tendency to read temporal orderings into the event-pairs about which they are asked to make probability-judgments. Without such interpretations of what they are told, they cannot properly apply Baconian standards of evaluation. And there may be at least two reasons why the interpretations which are made should all in fact assume that what is at stake is the extent to which an earlier event probabilifies a later one, rather than vice versa.
The first is that the basic form of Baconian generalisation (the form that can be tested experimentally by the manipulation of relevant factors) runs naturally from earlier to later, if there is any time difference at all: the question is whether the later event is affected when the relevant circumstances of the earlier one are altered. The second is that special care with verb-tenses is required if conditionalisation is not to hint at this order. For example, instead of stating that “Congress is more likely to pass a law restricting mercury pollution if the death toll exceeds 500 than if not” it would be necessary to state “Congress is more likely to pass a law, or to have passed a law, restricting mercury pollution if the death toll exceeds 500 than if not”. But if the unspecificity of temporal order were made apparent in this way the point of the experiment would be destroyed. It follows that the judgments made by Tversky and Kahneman’s subjects are not in conflict with the logic of probability. On the one hand these subjects tend to hold, for any time t within the next five years, that p(C after t/D before t) > p(C after t/D̄ before t); on the other hand they also tend to hold that p(D after t/C before t) < p(D after t/C̄ before t).
Indeed these two comparative judgments are not only consistent with one another according to the logical principles constraining relations between different judgments of Baconian probability. They are also quite consistent with one another in terms of the Pascalian calculus. So Tversky and Kahneman’s results supply no reason to suppose that ordinary people have some systematic proneness to irrationality in regard to causally diagnostic reasoning as contrasted with causally predictive reasoning. Nor, on a Baconian interpretation, can Tversky and Kahneman’s subjects be said to be averse to committing themselves (at least by implication) on questions of diagnostic probability, since the contraposition of inductive probability-judgments (though not of Pascalian ones) is equivalence-preserving. The inductive probability that the annual death-toll will exceed 500 after t if Congress does not pass anti-pollutant legislation before t is equal to the inductive probability that Congress will have passed anti-pollutant legislation before t if the annual death-toll does not exceed 500 after t. In fact there seems no reason to believe that irrationality is particularly common in diagnostic reasoning even where the occurrence of percentage-figures as evidence in the statement of the problem makes it quite clear that Baconian probabilities are not involved. And it is necessary to demonstrate this in order to establish that such experiments provide no confirmation for Tversky and Kahneman’s theory that people are particularly prone to irrationality in diagnostic reasoning. Tversky and Kahneman (1977) seem to have drawn the wrong conclusion from two experiments of that kind. In the first, causally predictive, experiment subjects were instructed as follows: Consider the following hypotheses concerning the causes of death. (i) The chance of death from heart failure is 5% among males. (ii) The chance of death from heart failure is 10% among males who are heavy smokers.
(iii) The chance of death from heart failure is 45% among males with congenital high blood pressure. Dick is a heavy smoker with congenital high blood pressure. Question: What is the probability that Dick will die of heart failure? The experiment was designed to test whether subjects would recognise that the correct figure was one that was higher than 45%, in order to reflect the incremental force of two independent pieces of evidence. In the event a significant majority of respondents did in fact recognise this. In a second, causally diagnostic, experiment subjects were given the following information:
Bill has been referred by his physician to the hospital with suspicion of a malignant tumour. Following the examination the following data were obtained. (i) The chance of a malignant tumour is 5% among patients referred to the hospital for such examinations. (ii) The haematologist who examined Bill’s blood test estimated the chance of a malignancy to be 10%. (iii) The radiologist who examined Bill’s X-ray estimated the chance of a malignancy to be 45%. Question: What is the probability that Bill has a malignant tumour?
According to Tversky and Kahneman this second problem “is structurally similar” to the first one. “In both problems each datum provides support for the hypothesis in question, and it appears reasonable to assume the incremental property.” But in the second experiment “a significant majority of the subjects . . . violated the incremental property” and were inclined to average the estimates. But did the majority of subjects really commit a fallacy in the second of these two experiments? In the instructions for the first experiment clauses (ii) and (iii) state conditional probabilities. But in the instructions for the second experiment clauses (ii) and (iii) state that certain unconditional probabilities have been arrived at, presumably by detachment from conditional ones. Now any such detachment presupposes that no other evidence is available. A haematologist (or radiologist, or anyone else) cannot properly infer p(H) = x from knowledge of E and of p(H/E) = x, unless E is all the available evidence. So we recover a conditional probability p(H/E₁) from the information given in (ii) only on the assumption that there is no other available evidence than Bill’s blood test, E₁; and we recover a conditional probability p(H/E₂) from the information given in (iii) only on the assumption that there is no other available evidence than Bill’s X-rays, E₂. We are therefore not entitled to compound these two conditional probabilities into a value for p(H/E₁&E₂) from which we then detach an increased value for p(H) on the assumption that both E₁ and E₂ fall within the available evidence. The best that we can do is to derive the weighted average of the two estimates for p(H) that were arrived at separately, since these two estimates were reached on mutually incompatible presuppositions. The fallacy here has been committed by Tversky and Kahneman, not by their subjects.
It is therefore worth while to design a third experiment that is indeed structurally similar to the first, rather than to the second, but concerns the same subject-matter as the second experiment. The subjects are now given the following information:
Bill has been referred by his physician to the hospital with suspicion of a malignant tumour. Certain facts are known about such referrals:
(i) Among patients referred to the hospital for this reason, the probability of a malignant tumour is 5%. (ii) Among patients who have positive blood tests after referral, the probability of a malignant tumour is 10%. (iii) Among patients who have positive X-ray results after referral, the probability of a malignant tumour is 45%. Bill turns out to have both a positive blood test and a positive X-ray result. What is the probability that he has a malignant tumour? When this experiment was performed on a mixed-ability group of 25 people in the age-range 17-60, only one responded with a probability that was less than 45%. The remainder were evenly divided between those who responded with a figure greater than 45% and those who responded with just a 45% probability. And a charitably-minded psychologist could well interpret this division as stemming from an underlying factual disagreement about whether the particular probabilities (ii) and (iii) should be construed as being independent of one another or not. What emerges, therefore, when the design of the second experiment has been straightened out in such a way as to preserve a genuine structural similarity with the first experiment, is that Tversky and Kahneman’s results are not replicated. It is just not the case that a significant majority is inclined to average the probabilities given in (ii) and (iii). So here too no support whatever emerges for the Tversky-Kahneman theory that judgments about the probability of causal diagnoses are more prone to irrationality than judgments about the probability of causal predictions. We have no reason to suppose that people are prone to what Tversky and Kahneman call “a major underestimation of the impact of diagnostic evidence”, “which could have severe consequences in the intuitive assessment of legal, medical, or scientific evidence”, as they claim.
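Whether a figure above 45% is the right answer turns on the independence question just raised. One standard way to make the incremental property concrete is the odds form of Bayes' rule: assuming (as the problem statement does not itself say) that the blood test and the X-ray result are conditionally independent given a malignant tumour and given its absence, each datum multiplies the prior odds by its own likelihood ratio, so odds(H/E₁&E₂) = odds(H/E₁) · odds(H/E₂) / odds(H). The Python sketch below applies this to the 5%, 10% and 45% figures and sets it beside a simple average of 10% and 45%; the function names are illustrative.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability in (0, 1) to odds in favour."""
    return p / (1 - p)


def prob(o):
    """Convert odds in favour back to a probability."""
    return o / (1 + o)


def combine_independent(base, p_e1, p_e2):
    """P(H/E1&E2) from P(H), P(H/E1) and P(H/E2).

    Assumes E1 and E2 are conditionally independent given H and given not-H,
    so each datum multiplies the prior odds by odds(H/Ei) / odds(H).
    """
    return prob(odds(p_e1) * odds(p_e2) / odds(base))


base = Fraction(5, 100)    # (i)   referred patients
blood = Fraction(10, 100)  # (ii)  positive blood test
xray = Fraction(45, 100)   # (iii) positive X-ray

combined = combine_independent(base, blood, xray)
average = (blood + xray) / 2
print(combined, float(combined))  # 19/30, i.e. roughly 63%
print(average)                    # 11/40, i.e. 27.5%
```

Under the independence assumption the combined probability exceeds 45%, as the respondents in the first group answered. If instead the positive X-ray is taken to carry all the diagnostic information in the blood test, the answer stays at 45%, which is one way to read the second group of respondents.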
It would be more appropriate to be alarmed about the possibility that Tversky and Kahneman’s widely publicised “findings” may have led many people to distrust some of their fellow human beings’ probability-judgments unnecessarily. Of course, certain sorts of fallacy do commonly occur in the reasoning of laymen about probabilities. For example, Tversky and Kahneman claim to have shown that when subjects are already familiar with some instances of a class it tends to appear more numerous than a class of equal frequency but less familiarity. This claim is in no way refuted by what I have been saying about Baconian probability, and the same is true for their claim that the ease with which subjects can imagine a class affects their conception of its size. (Cf. also Smedslund, 1963.) Nor would it be surprising if Baconian and Pascalian modes of reasoning, in their intuitive, inexplicit forms, are sometimes confused with one another, with the result that nothing rational emerges. Such a confusion may well account for the tendency of ordinary people to overestimate the combinatorial probability of conjunctive events, a tendency that John Cohen, E. I. Cheswick and D. Haran (1972) claim to have found. For in Baconian, inductive reasoning a conjunctive event always has as high a probability as the less probable of its conjuncts, and a disjunctive event need have no higher an inductive probability than the more probable of its disjuncts. Perhaps in any case such confusions are especially to be expected in the artificially constrained circumstances of psychological experiment. In a natural situation, when you are confused by a problem, you can make further enquiries so as to determine what would be an appropriate framework within which to think about the issues involved. But as a subject in a laboratory experiment you would not normally have an opportunity for supportive investigations of this kind. Very great care is therefore needed in drawing any conclusions from such experiments about the ability of ordinary people to reason validly about probabilities; and experimenters must also be obliged to cast out the motes in their own eyes. Above all, “the normative theory of prediction” must be taken to include Baconian as well as Pascalian modes of reasoning. For, on the assumption, shared by all investigators of causes and effects, that like causes produce like effects, it is undeniably reasonable to use the degree of relevant likeness of the cause as one kind of criterion for the probability of the effect. And an assumption that is so widely made, in such reputable contexts as those of forensic proof and scientific experiment, can hardly be denied normative credentials.
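The contrast between the two modes of combination can be sketched side by side. The Python below is a simplification under stated assumptions: Baconian grades are treated as small non-negative integers, the grade of a conjunction is taken to equal the lowest grade among its conjuncts (reading "as high a probability as the less probable of its conjuncts" as equality), and the grade of a disjunction is taken to equal the highest grade among its disjuncts; the Pascalian side assumes independent events. The function names are hypothetical. Carrying the Baconian rule over to Pascalian numbers would produce exactly the kind of overestimation of conjunctive probabilities described above.

```python
from fractions import Fraction
from math import prod


def baconian_and(grades):
    """Grade of a conjunction: the lowest grade among the conjuncts (simplified reading)."""
    return min(grades)


def baconian_or(grades):
    """Grade of a disjunction: the highest grade among the disjuncts (simplified reading)."""
    return max(grades)


def pascalian_and(probs):
    """Probability of a conjunction of independent events."""
    return prod(probs)


def pascalian_or(probs):
    """Probability of a disjunction of independent events."""
    return 1 - prod(1 - p for p in probs)


# Five independent conjuncts, each judged quite likely.
probs = [Fraction(9, 10)] * 5
print(pascalian_and(probs))  # 59049/100000, i.e. under 0.6
print(pascalian_or(probs))   # 99999/100000
print(baconian_and([4, 4, 4, 4, 4]), baconian_and([4, 2, 5]))  # 4 2
```

On the Pascalian side, adding conjuncts drags the probability down; on the Baconian side, a conjunction of equally well-supported propositions keeps the same grade however many there are.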
References
Ajzen, I. (1977). Intuitive theories of events and the effects of base-rate information on prediction. J. Pers. Soc. Psychol. 35, 303-314.
Bacon, F. (1620). Novum Organum. London.
Cohen, J., Cheswick, E. I. and Haran, D. (1972). A confirmation of the inertial-ψ effect in sequential choice and decision. Brit. J. Psychol. 63, 41-46.
Cohen, L. J. (1970). The Implications of Induction. London, Methuen.
Cohen, L. J. (1977). The Probable and the Provable. Oxford, Clarendon Press.
Hacking, I. (1975). The Emergence of Probability. Cambridge, Cambridge University Press.
Herschel, J. F. W. (1833). A Preliminary Discourse on the Study of Natural Philosophy. London, Longmans, Green.
Hooke, R. (1705). A General Scheme or Idea of the Present State of Natural Philosophy and How its Defects may be Remedied by a Methodical Proceeding in the Making of Experiments and Collecting Observations. In R. Waller (ed.), The Posthumous Works of Robert Hooke. London, pp. 6-65.
Kahneman, D. and Tversky, A. (1973). On the psychology of prediction. Psychol. Rev. 80, 237-251.
Kahneman, D. and Tversky, A. (1974). Subjective probability: a judgment of representativeness. In C.A.S. Staël von Holstein (ed.), The Concept of Probability in Psychological Experiments. Dordrecht-Holland, Reidel. Pp. 25-48.
Mackie, J. L. (1973). Truth, Probability and Paradox. Oxford, Clarendon Press.
Mill, J. S. (1843). A System of Logic, Ratiocinative and Inductive. London.
Nagel, E. (1938). Principles of the Theory of Probability. In O. Neurath, R. Carnap and C. Morris (eds.), Foundations of the Unity of Science. Chicago, University of Chicago Press. Vol. I, 341-422.
Olson, C. L. (1976). Some apparent violations of the representativeness heuristic in human judgment. J. Exp. Psychol.: Hum. Percep. and Perf. 2, 599-608.
Piaget, J. and Inhelder, B. (1975). The Origin of the Idea of Chance in Children. London, Routledge. Introduction p. xv.
Smedslund, J. (1963). The concept of correlation in adults. Scand. J. Psychol. 4, 165-173.
Turoff, M. (1972). An alternative approach to cross-impact analysis. Technol. Forecast. Soc. Change 3, 309-339.
Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science 185, 1124-1131.
Tversky, A. and Kahneman, D. (1977). Causal thinking in judgment under uncertainty. In R. Butts and J. Hintikka (eds.), Basic Problems in Methodology and Linguistics. Dordrecht-Holland, Reidel. Pp. 167-190.
Whewell, W. (1847). The Philosophy of the Inductive Sciences. London, J. W. Parker.
Cognition, 7 (1979) 409-411 ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Discussion
On the interpretation of intuitive probability: A reply to Jonathan Cohen
DANIEL KAHNEMAN
University of British Columbia
AMOS TVERSKY
Stanford University
In our discussion of probability judgment and intuitive prediction we have described as errors some judgments and inferences that violate basic principles of probability and statistics. Cohen argues that our subjects’ answers should not be so viewed because they can be construed as compatible with an alternative normative system which he has recently developed. Cohen claims that his system has a sound normative basis, on a par with the standard probability calculus, and that it provides a viable interpretation of the responses of our subjects. It is easy to see, however, that Cohen’s system does not provide a viable explication of the intuitive notion of probability. In this system “the (Baconian) probability of an A being a B is identified with the inductive reliability of the generalization that all A’s are B’s”. Although Cohen does not describe explicitly how to evaluate inductive reliability, he specifies formal rules that govern inductive (or Baconian) probabilities. First, he demands that P_I(A/B) = P_I(Not B/Not A). Thus, according to Cohen, the inductive probability that a bird which has just been sighted is white if it is a raven must be equal to the probability that the bird is not a raven if it is not white. We believe that most people would judge the former probability to be vanishingly small and the latter to be substantial. Second, Cohen proposes that if P_I(A/B) > 0 then P_I(Not A/B) = 0. Thus, if there is non-zero inductive probability that the defendant in a trial is guilty, then the inductive probability that he is innocent must be zero, contrary to legal usage and common sense. More generally, Cohen’s system cannot assign non-zero probability to more than one member of a set of mutually exclusive hypotheses. For example, consider a murder investigation in which there are several suspects and the murderer is known to have acted alone. According to Cohen, the probability of guilt can be non-zero for only one suspect.
For all other suspects, the inductive probability of guilt is zero, just as for the rest of mankind.
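The two constraints in dispute can be written as simple checks on judged probabilities for a set of mutually exclusive and exhaustive hypotheses. The Python sketch below encodes the Pascalian requirement that such probabilities sum to one and, as Kahneman and Tversky state it, Cohen's requirement that at most one of them be non-zero. The suspect labels and the judged values are hypothetical, chosen only to illustrate the murder example; exact fractions avoid any floating-point tolerance in the sum.

```python
from fractions import Fraction


def pascalian_ok(judgments):
    """Mutually exclusive, exhaustive hypotheses: probabilities must sum to 1."""
    return sum(judgments.values()) == 1


def baconian_ok(judgments):
    """Kahneman and Tversky's reading of Cohen: at most one hypothesis may be non-zero."""
    return sum(1 for p in judgments.values() if p > 0) <= 1


# Hypothetical judged probabilities of guilt for three suspects, one of whom acted alone.
suspects = {"A": Fraction(1, 2), "B": Fraction(3, 10), "C": Fraction(1, 5)}
print(pascalian_ok(suspects), baconian_ok(suspects))  # True False
```

An ordinary graded spread of suspicion across several suspects satisfies the first check and violates the second, which is the substance of the objection.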
Whatever notion may be captured by Cohen’s formalism, it clearly does not conform to common usage of “the probability of an A being a B” or “the degree of belief in A, given evidence B”. For an attempt to model the relation between evidence and belief, which also departs from the standard calculus but is free of the above defects, see Shafer (1976). Cohen’s critique of our position is based on a reinterpretation of the questions that were answered by our subjects. In order to rationalize the neglect of base-rate, for instance, Cohen argues that a question such as “what is your probability that a person who owns a programmable calculator is an engineer rather than a lawyer?” is interpreted by subjects as “what is your confidence in the generalization that no lawyer owns a programmable calculator?”. We propose that Cohen’s claim regarding the equivalence of the two questions is incorrect, and we expect most people to give a fairly high probability as an answer to the former question and an extremely low probability as an answer to the latter. Incidentally, we have found that the judged probability that Mr. X (whose personality is briefly sketched) is an engineer rather than a lawyer and the judged probability that he is a lawyer rather than an engineer typically add up to unity both in a within-subject and in a between-subject design. This observation is inconsistent with Cohen’s formalism, which requires the smaller of the two probabilities to be zero. An even less plausible interpretation is introduced by Cohen to explain common answers to problems such as “which hospital (the large or the small) do you think recorded more days during the year in which more than 60% of the babies born were boys?”. We have suggested that subjects correctly attribute daily variations of sex-ratio to chance factors, but fail to appreciate the effect of sample size on sampling variability.
In contrast, Cohen argues that subjects attribute any imbalance of sex-ratio to some causal intervention by the obstetricians in the hospital. Because such intervention is presumably unrelated to hospital size, the subjects’ neglect of the variable can be justified. The conflict between the interpretations could be resolved empirically, e.g., by asking subjects whether occasional imbalances of sex-ratio reflect chance factors or hospital policy. We do not believe that Cohen’s hypothesis would survive such a test. We hope that these examples suffice to show that Cohen’s system has little normative or descriptive appeal, and that his interpretation of our findings is hardly compelling. We accept Cohen’s objection to the problem of the malignant tumor, which was indeed deleted in our subsequent treatment of causal and diagnostic reasoning (Tversky & Kahneman, 1979). This objection, however, does not bear on the interpretation of subjective probability. In conclusion, we can only invite the reader to look at the data presented in our papers and to judge whether the observed insensitivity to sample size, prior probability and reliability of evidence should be viewed as mistakes, which many of us are prone to make but would wish to correct, or as opinions which should be held with pride and confidence because they may be construed as compatible with Cohen’s Baconian formalism.
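The sample-size effect invoked for the hospital problem follows from the binomial distribution. The Python sketch below assumes, for illustration only, a small hospital with 15 births a day and a large one with 45, independent births, and a probability of one half that each baby is a boy; none of these figures is given in the reply. "More than 60%" is taken strictly, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def p_more_than_60pct_boys(n, p_boy=Fraction(1, 2)):
    """Probability that more than 60% of n independent births are boys."""
    # k / n > 3/5 is equivalent to 5k > 3n, which keeps the comparison exact.
    return sum(
        comb(n, k) * p_boy**k * (1 - p_boy) ** (n - k)
        for k in range(n + 1)
        if 5 * k > 3 * n
    )


small = p_more_than_60pct_boys(15)  # assumed births per day, small hospital
large = p_more_than_60pct_boys(45)  # assumed births per day, large hospital
print(float(small), float(large))
print(round(365 * float(small)), round(365 * float(large)))  # expected such days per year
```

Under these assumptions the small hospital has a probability of about 0.15 of such a day, against less than 0.1 for the large one, so it should record considerably more of them over a year even though nothing but chance is at work.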
References
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton, N.J., Princeton University Press.
Tversky, A. and Kahneman, D. (1979). Causal schemas in judgments under uncertainty. In M. Fishbein (ed.), Progress in Social Psychology. Hillsdale, N.J., Lawrence Erlbaum Associates.
Cognition, 7 (1979) 413-420 ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Discussion
Developmental and acquired dyslexia: Some observations on Jorm (1979)
ANDREW W. ELLIS*
Department of Psychology, University of Lancaster
In a recent paper on the cognitive and neuropsychological bases of developmental dyslexia, Jorm (1979) has attempted to unite the study of developmental dyslexia (defined as a specific reading disorder occurring in otherwise intelligent children provided with an adequate background and educational opportunities) with the study of acquired dyslexia (the term given to the reading problems encountered by hitherto normal readers as a consequence of brain damage). It is undoubtedly a matter for regret that psychologists studying either one of these two aspects of reading have tended to be ignorant of work in the other area, despite the obvious scope for comparison. Jorm is therefore to be congratulated on being one of the few who have tried to bring the two fields together in an integrated manner. However, in the course of his paper Jorm (1979) reiterates a claim made in Jorm (1977) that the symptoms of developmental dyslexia are closely similar to those of a particular variety of acquired dyslexia known either as deep dyslexia (Marshall and Newcombe, 1973) or phonemic dyslexia (Shallice and Warrington, 1975). An alternative proposal has been made by Holmes (1973; 1978), who argues that developmental dyslexia may be likened, not to deep dyslexia, but to another variety of acquired dyslexia which Marshall and Newcombe (1973) term surface dyslexia. In this paper I shall argue that Jorm’s grounds for comparing developmental dyslexia to deep dyslexia are ill-founded, and that such evidence as is available, though not unequivocal, tends to support Holmes’ position. On the basis of the studies of deep dyslexia by Marshall and Newcombe (1966; 1973), Patterson and Marcel (1977), Richardson (1975a, b), Saffran and Marin (1977), and Shallice and Warrington (1975), the following list of symptoms may be drawn up:
*Requests for reprints should be addressed to Andrew W. Ellis, Department of Psychology, University of Lancaster, Lancaster LA1 4YF, England.
(1) Severe problems in nonlexical grapheme-to-phoneme conversion, as evidenced by an almost complete inability to read nonwords such as blurg or gem-k.
(2) Errors when reading single words without time constraints. These paraphasias may be visual (shallow → “shadow”; mellow → “melon”), derivational (prefer → “preference”; faith → “faithful”) or semantic (speak → “talk”; berry → “grapes”).
(3) Pronounced effects of word characteristics on naming, such that nouns are read better than adjectives or verbs, which are, in turn, read better than function words. Also, imageable/concrete nouns are read better than abstract nouns.
Deep dyslexia is one of three varieties of acquired dyslexia discussed by Marshall and Newcombe (1973). A second variety is visual dyslexia, characterised by purely visual errors, while in a third variety, termed surface dyslexia, the vast majority of errors can be described as partial failures of grapheme-phoneme conversion (see also Newcombe and Marshall, 1973). Ambiguous consonant letters such as s, c or g, in which the choice of the correct phonemic counterpart depends upon the graphemic context, create particular problems, leading to errors such as guest → “just” (where g is assigned a value as in ‘gin’), or recent → “rikunt”. Other errors involve assigning a phonetic value to silent graphemes (listen → “liston”; island → “izland”), failure to apply the e-lengthening rule (lace → “lass”; describe → “describ”), or stress shifting (begin → “beggin”; omit → “ómmit”). Holmes (1973, 1978) has argued that the misreadings of developmental dyslexics are closely comparable to the errors made by surface dyslexics.
Thus, the errors made by the four 9- to 13-year-old boys she studied included exemplars of all the above categories, for example failures to apply the e-rule (wage → “wag”; quite → “kwit”), or mispronunciations of ambiguous graphemes, as when ch is pronounced /tʃ/ (as in “church”) in reading words such as anchor or monarch, where the correct realization of ch is /k/. Marshall and Newcombe (1973) ascribe surface dyslexia to a moderate-to-severe impairment of the “direct route” from visual written forms to semantic representations (a route for which there is now overwhelming evidence; see Coltheart, 1978; Marshall, 1976), combined with a lesser deficit in knowledge of grapheme-phoneme regularities. (One might note that, given the complexities of English spelling-to-sound relations, even normal skilled readers might be expected to make sizeable numbers of reading errors if forced to rely entirely upon grapheme-phoneme correspondence rules.) Holmes (1978) concurs with Marshall and Newcombe’s (1973) interpretation of surface dyslexia and extends that interpretation to incorporate developmental dyslexia. This view is opposed to Jorm’s (1979) claim that the direct visual-to-semantic route functions normally in developmental dyslexics.
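The dual-route account that Marshall and Newcombe and Holmes rely on can be caricatured in a few lines. In the Python sketch below, which is a hypothetical toy and not any of the cited authors' models, a word is read through a whole-word lexicon (the direct route) when that route is available and knows the word, and otherwise by one-letter-to-one-phoneme rules. The lexicon entries, the rule table and the ARPAbet-style phoneme codes are illustrative assumptions. Disabling the direct route reproduces the surface-dyslexic pattern of giving a value to silent letters: the s of island and the t of listen are pronounced.

```python
# Hypothetical whole-word entries for the direct (lexical) route, in ARPAbet-style codes.
LEXICON = {
    "island": "AY L AH N D",
    "listen": "L IH S AH N",
}

# Hypothetical one-letter-to-one-phoneme rules for the non-lexical route.
GPC_RULES = {
    "a": "AE", "d": "D", "e": "EH", "i": "IH",
    "l": "L", "n": "N", "s": "S", "t": "T",
}


def read_nonlexically(word):
    """Assemble a pronunciation letter by letter from the rule table."""
    return " ".join(GPC_RULES[letter] for letter in word)


def read_aloud(word, direct_route_intact=True):
    """Use the lexicon if the direct route works and knows the word; else fall back on rules."""
    if direct_route_intact and word in LEXICON:
        return LEXICON[word]
    return read_nonlexically(word)


for word in ("island", "listen"):
    print(word, "|", read_aloud(word), "|", read_aloud(word, direct_route_intact=False))
```

A real grapheme-phoneme route would use context-sensitive rules (for ambiguous c, g and s, or the e-lengthening rule), and a partial rather than total loss of the direct route would yield a mixture of correct and regularised readings.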
Jorm (1979) adduces four lines of evidence in support of his thesis that developmental dyslexia might be regarded as a genetic form of deep dyslexia. The validity of these four points will be examined one at a time.
(1) Impairment of grapheme-phoneme correspondence
Jorm cites unpublished work by Firth (1972) which apparently found that a test in which children had to read nonsense words like nate and iston discriminated successfully between dyslexic and normal subject groups. Unfortunately, Jorm provides no further information concerning the behavior of the developmental dyslexics. Do they, like deep dyslexics, sit mute in front of printed nonwords, or do they attempt pronunciations which are subsequently deemed by the experimenter to be incorrect? If the latter is the case, do the errors of the dyslexics result from random guesses, or from the application of inappropriate, irregular letter-to-sound correspondence? To illustrate the last point, consider the nonword ghoti. A “correct” reading would, presumably, be something like /goti/ and yet, as George Bernard Shaw observed, if gh is pronounced as in “tough”, o as in “women” and ti as in “nation”, then ghoti may be given an alternative reading as “fish”! Is it, then, that developmental dyslexics fail, in part, because of the application of such irregular grapheme-phoneme correspondences? Certainly, other data available on the reading and spelling performance of developmental dyslexics calls into question any claim that they lack all knowledge of grapheme-phoneme conversion rules (see below).
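The question about irregular correspondences can be made concrete with a greedy longest-match grapheme-to-phoneme converter. Both rule tables in the Python sketch below are hypothetical: one holds only "regular" values, the other overrides them with Shaw's irregular values for gh, o and ti. Sound values are written as plain letters rather than phonetic symbols, and the function name is illustrative.

```python
# Hypothetical rule tables mapping graphemes to (plain-letter) sound values.
REGULAR = {"gh": "g", "g": "g", "h": "h", "o": "o", "t": "t", "i": "i"}
SHAW = dict(REGULAR, gh="f", o="i", ti="sh")  # Shaw's irregular values override


def convert(word, rules):
    """Greedy longest-match grapheme-to-phoneme conversion."""
    longest = max(len(g) for g in rules)
    out, i = [], 0
    while i < len(word):
        # Try the longest grapheme first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return "".join(out)


for name, table in (("regular", REGULAR), ("Shaw", SHAW)):
    print(name, convert("ghoti", table))
```

The same segmentation procedure yields goti or fish depending only on which correspondences it is given, so an "incorrect" nonword reading need not indicate an absence of grapheme-phoneme knowledge.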
(2) The effect of imageability on word reading
Imageability of words is a major factor in predicting the reading performance of deep dyslexics (Richardson, 1975a; 1975b; Shallice and Warrington, 1975). Jorm (1977) showed that poor readers between 8 and 11 years of age were more successful at reading one- or two-syllable concrete nouns than abstract nouns matched for length and Thorndike-Lorge word frequency. One possible explanation of this result is that although Jorm’s (1977) concrete and abstract nouns were matched on overall word frequency in samples of written English, as assessed by Thorndike and Lorge (1944), concrete nouns might nevertheless plausibly be expected to predominate over abstract nouns in the reading experience of poor readers, particularly in the sorts of books commonly employed, which contain abundant illustrations, with single words or short sentences describing the illustrations and naming objects depicted in them.
An alternative explanation of Jorm’s (1977) finding is that imageability may affect readability for all readers, normal or dyslexic. (Here one would have to propose that Jorm’s failure to find an effect of concreteness on the reading performance of good readers is due to a ceiling effect.) This proposition is in line with the demonstrations by Holmes, Marshall and Newcombe (1971) and Marshall, Newcombe and Holmes (1975) that the effect of syntactic word class (noun, verb or adjective) on the reading performance of deep dyslexics also predicts the reading success of 10- to 11-year-old children and the tachistoscopic recognition thresholds of adult skilled readers for the same classes of word. Also relevant here is Spreen, Borkowski and Gordon’s (1966) demonstration, using normal subjects, that auditory recognition of words in noise is better for concrete than abstract nouns. A final, related point is that we do not have information about the effects of imageability on word recognition in other varieties of acquired dyslexia which one might wish to compare with developmental dyslexia (though see the discussion of ‘surface dyslexia’ below).
(3) Pattern of errors made
As noted above, deep dyslexics make some visual errors when attempting to read single words. Jorm (1977) noted that his poor readers also tended to make errors in which the response was visually similar to the stimulus word, and claims this as evidence for functional similarities between developmental and deep dyslexia (Jorm, 1979, p. 26). This is weak evidence, however, for two reasons. First, visual errors occur in all the recognised types of acquired dyslexia. Thus, their occurrence in developmental dyslexia tells us nothing about which of the specific forms of acquired dyslexia is closest in symptomatology to the developmental variety. Furthermore, visual errors are also produced by normal readers under conditions of rapid reading (e.g., Morton, 1964) or very brief presentations (Allport, 1977; Vernon, 1929): in other words, wherever reading difficulties are encountered, visual errors will be found also. A second objection to an argument based on error patterns is that the single most salient feature of “classical” deep dyslexia is the presence of semantic reading errors. In this context, Wells (1906, pp. 77-8) reports an intriguing case of a child taught to read entirely by the look-and-say method who made semantic errors such as corn → “wheat”, locomotive → “engine” and dog → “cat”. These errors apparently disappeared after the child was taught to read phonically. Here is a clear case of semantic errors occurring in an apparently normal child who lacked knowledge of grapheme-phoneme correspondences. Jorm (1977), however, was unable to discover any evidence of semantic relatedness between stimulus words and the error responses of poor readers. This apparent absence of semantic errors in developmental dyslexia must surely count strongly against any claim to functional similarity between developmental and deep dyslexia (and, by the same token, may be taken as corroborative support for Holmes’ (1973; 1978) position).
(4) Short-term memory impairment
Jorm (1979) devotes considerable space to a discussion of the relationship between developmental dyslexia and short-term memory impairment. There are a number of inconsistencies in his account which could be mentioned. Hardest to reconcile are the juxtaposed claims (Jorm, 1979, p. 23) that short-term memory difficulties in developmental dyslexics can account for (a) their susceptibility to order errors in immediate recall (suggesting a “deficit in the auditory-verbal short-term store”) and (b) their relative immunity to phonological (‘acoustic’) confusions in immediate recall (which apparently “suggests that dyslexics are not relying on the auditory-verbal short-term store to the same extent as normal readers”). This “heads-I-win-tails-the-dyslexics-lose” logic is made even more disquieting when one recalls that phonological similarity is far and away the most potent cause of order errors in immediate recall (Conrad, 1965; Ellis, 1979; Watkins, Watkins and Crowder, 1974; Wickelgren, 1965). To return to the main argument, however, Jorm (1979) notes that one of Marshall and Newcombe’s (1973) two patients (the patient G.R.) and the patient K.F. of Shallice and Warrington (1975) both had reduced memory spans. Short-term memory data are not provided for the other deep dyslexic patients in the literature. One problem here is that both G.R. and K.F. show dysphasic as well as dyslexic symptoms, and their reduced memory spans may be a concomitant of their aphasia rather than a contributory cause of their dyslexia. Also, although several of the studies cited by Jorm (1979) found differences in group means on immediate recall tasks between normal readers and poor or dyslexic readers, it has yet to be demonstrated that an impoverished memory span is a necessary condition for the occurrence of either developmental or acquired (deep) dyslexia. (For what it is worth, one of Marshall and Newcombe’s (1973) two surface dyslexics, the patient J.C., also displayed a reduced memory span, and is reported as having been more successful at reading concrete nouns than abstract nouns.)
In the most thorough investigation yet to be carried out on developmental dyslexia within the information-processing, dual-coding framework, Seymour and Porpodas (in press) conclude that the four dyslexic boys they studied possessed operational lexical (direct) and non-lexical (grapheme-phoneme)
routes, but that both systems were to a greater or lesser degree impaired. If anything, this conclusion supports Holmes’ (1978) theory over Jorm’s (1979); however, the two adult dyslexics studied by Seymour and Porpodas possessed generally efficient direct routes with impaired non-lexical routes, a finding more in line with Jorm’s (1979) interpretation. One might hypothesize on the basis of Seymour and Porpodas’ (in press) findings that the direct visual-to-semantic word recognition channel improves with age in developmental dyslexics whilst the non-lexical route remains impaired. An alternative possibility is that there exist individual differences in developmental dyslexics such that some individuals are impaired on both modes of word recognition whilst others are disproportionately impaired on the non-lexical mode. Seymour and Porpodas’ results would then be explicable if the former group tend to achieve a more or less tolerable level of reading ability in adulthood (perhaps because their dyslexia is due to a ‘maturational lag’), whilst the problems of the latter group persist into maturity. Jorm (1979) peremptorily dismisses the notion of varieties of developmental dyslexia, but the reader should also consult Vernon (1979) and the references contained therein for a more open-minded assessment. In summary, then, Jorm’s (1979) proposal that developmental dyslexics resemble brain-damaged deep dyslexics in their characteristics is not grounded on firm evidence. Holmes’ (1978) likening of developmental dyslexia to acquired surface dyslexia at least has the merit of demonstrating a clear similarity between the errors made by the two groups. Reading is an exceedingly complex skill, one which may be affected in a wide variety of ways by brain damage, producing a number of clinically distinguishable ‘pure’ forms of acquired dyslexia.
Such pure forms, however, are rare, and the average patient with reading problems will manifest a mixed symptomatology showing characteristics of several of the pure forms. By analogy, though developmental deep dyslexics and developmental surface dyslexics may exist (and, given that one has no reason to believe that genetic neuropsychological syndromes will be less varied than those resulting from brain injury, it would be surprising if this were not the case), one should not expect all developmental dyslexics to fall neatly into one or other of the postulated categories. In the field of acquired dyslexia, studies of heterogeneous groups of patients have been far less informative than intensive case studies of particularly interesting and theoretically relevant individuals. The present author is of the firm opinion that real advances in the understanding of developmental reading difficulties will occur only if the same approach is adopted.
References
Allport, D. A. (1977) On knowing the meaning of words we are unable to report: The effects of visual masking. In S. Dornic (ed.), Attention and Performance VI. New York, Academic Press.
Coltheart, M. (1978) Lexical access in simple reading tasks. In G. Underwood (ed.), Strategies of Information Processing. London, Academic Press.
Conrad, R. (1965) Order errors in immediate recall of sequences. J. verb. Learn. verb. Behav., 4, 101-109.
Ellis, A. W. (1979) Speech production and short-term memory. In J. Morton and J. C. Marshall (eds.), Psycholinguistics Series Vol. 2: Structures and Processes. London, Elek.
Firth, I. (1972) Components of Reading Disability. Unpublished doctoral dissertation, University of New South Wales.
Holmes, J. M. (1973) Dyslexia: a neurolinguistic study of traumatic and developmental disorders of reading. Unpublished Ph.D. thesis, University of Edinburgh.
Holmes, J. M. (1978) “Regression” and reading breakdown. In A. Caramazza and E. B. Zurif (eds.), Language Acquisition and Language Breakdown: Parallels and Divergencies. Baltimore, Johns Hopkins University Press.
Holmes, J. M., Marshall, J. C. and Newcombe, F. (1971) Syntactic class as a determinant of word-retrieval in normal and dyslexic subjects. Nature, 234, 416.
Jorm, A. F. (1977) Effect of word imagery on reading performance as a function of reader ability. J. educ. Psychol., 69, 46-54.
Jorm, A. F. (1979) The cognitive and neurological basis of developmental dyslexia: a theoretical framework and review. Cog., 7, 19-32.
Marshall, J. C. (1976) Neuropsychological aspects of orthographic representation. In R. J. Wales and E. Walker (eds.), New Approaches to Language Mechanisms. Amsterdam, North-Holland.
Marshall, J. C. and Newcombe, F. (1966) Syntactic and semantic errors in paralexia. Neuropsychol., 4, 169-176.
Marshall, J. C. and Newcombe, F. (1973) Patterns of paralexia: a psycholinguistic approach. J. Psycholinguist. Res., 2, 175-199.
Marshall, J. C., Newcombe, F. and Holmes, J. M. (1975) Lexical memory: a linguistic approach. In A. Kennedy and A. Wilkes (eds.), Studies in Long-term Memory. London, J. Wiley.
Morton, J. (1964) A model for continuous language behaviour. Lang. Speech, 7, 40-70.
Newcombe, F. and Marshall, J. C. (1973) Stages in recovery from dyslexia following a left cerebral abscess. Cortex, 9, 319-332.
Patterson, K. E. and Marcel, A. J. (1977) Aphasia, dyslexia and the phonological coding of written words. Quart. J. exp. Psychol., 29, 307-318.
Richardson, J. T. E. (1975a) The effect of word imageability in acquired dyslexia. Neuropsychol., 13, 281-288.
Richardson, J. T. E. (1975b) Further evidence on the effect of word imageability in dyslexia. Quart. J. exp. Psychol., 27, 445-449.
Saffran, E. M. and Marin, O. S. M. (1977) Reading without phonology: Evidence from aphasia. Quart. J. exp. Psychol., 29, 515-525.
Seymour, P. H. K. and Porpodas, C. D. (in press) Lexical and non-lexical processing of spelling in developmental dyslexia. In U. Frith (ed.), Cognitive Processes in Spelling. London, Academic Press.
Shallice, T. and Warrington, E. K. (1975) Word recognition in a phonemic dyslexic patient. Quart. J. exp. Psychol., 27, 261-273.
Spreen, O., Borkowski, J. G. and Gordon, A. M. (1966) Effects of abstractness, meaningfulness, and phonetic structure on auditory recognition of nouns. J. Speech Hear. Res., 9, 619-625.
Thorndike, E. L. and Lorge, I. (1944) The Teacher’s Word Book of 30,000 Words. New York, Columbia University, Teachers College, Bureau of Publications.
Vernon, M. D. (1929) The errors made in reading. Medical Research Council Reports of the Committee upon the Physiology of Vision, Special Report Series, No. 130. London, His Majesty’s Stationery Office.
Vernon, M. D. (1979) Variability in reading retardation. Br. J. Psychol., 70, 7-16.
Watkins, M. J., Watkins, O. C. and Crowder, R. G. (1974) The modality effect in free and serial recall as a function of phonological similarity. J. verb. Learn. verb. Behav., 13, 430-447.
Wells, F. L. (1906) Linguistic lapses. In J. McK. Cattell and F. J. E. Woodbridge (eds.), Archives of Philosophy, Psychology and Scientific Methods, No. 6 (Columbia University Contributions to Philosophy and Psychology, Vol. 14, No. 3). New York, Science Press.
Wickelgren, W. A. (1965) Short-term memory for phonemically-similar lists. Amer. J. Psychol., 78, 567-574.
Cognition, 7 (1979) 421-433
Discussion
©Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands
The nature of the reading deficit in developmental dyslexia: A reply to Ellis*
ANTHONY F. JORM
Deakin University, Australia
Ellis (1979) credits me with the view that “developmental dyslexia might be regarded as a genetic form of deep dyslexia” and then proceeds to argue that my grounds for this view are ill-founded. He offers an alternative proposal, that developmental dyslexia may be likened to (acquired) surface dyslexia. In this paper I will attempt to answer Ellis’ (1979) criticisms and provide a critique of his alternative proposal.

Developmental and Phonemic Dyslexia
The major point I would like to make is that Ellis has misrepresented my position somewhat. I did not, as he implies, propose that developmental dyslexia and (acquired) phonemic/deep dyslexia are functionally equivalent disorders. Rather, I pointed out certain functional similarities between the two disorders. This distinction is a crucial one. Because Ellis’ criticisms are directed at the view that developmental and phonemic dyslexia are functionally equivalent disorders, they are largely irrelevant to the theory I proposed. In order to clarify the matter, I will reiterate my original argument. What I proposed was that developmental dyslexia results from a genetically-based dysfunction of the left inferior parietal lobule. One of the lines of evidence cited to support this view was that this region of the brain plays a crucial role in reading, in particular reading using grapheme-phoneme rules. Evidence was cited to support the view that lesions to this region produce a deficit in reading via this phonological route. Furthermore, it was pointed out that “this form of acquired dyslexia has certain functional similarities to developmental dyslexia” (p. 26), namely difficulty in reading nonsense words, greater difficulty in reading low-imagery words than high-imagery words, and a tendency to make visual errors in reading. In short, what I argued was that developmental and phonemic dyslexia both involve a difficulty in reading via
*The author wishes to thank B. A. Kitchener for helpful comments on the manuscript and P. H. K. Seymour for providing a prepublication copy of the Seymour & Porpodas article. Requests for reprints should be sent to A. F. Jorm, Cognitive Psychology Research Group, School of Education, Deakin University, Geelong, Victoria, 3217, Australia.
the phonological route and that this results in some similarities in the sort of words these dyslexics find difficult to read and in the sort of reading errors they make. However, I did not argue that the reading deficit is functionally identical in both disorders. The distinction may seem slight, but it is an important one. In phonemic dyslexia, brain damage has produced a total or near total blockage of the phonological reading route, whereas in developmental dyslexia the delayed or incomplete maturation which I hypothesized would lead only to poorer (rather than nonexistent) performance at reading via this route. I did not claim, as Ellis suggests, that developmental dyslexics “lack all knowledge of grapheme-phoneme conversion rules”. A second and more obvious difference between the two disorders is that phonemic dyslexia involves the loss of the ability to carry out grapheme-phoneme conversion after reading skills are already well developed, while developmental dyslexia involves a difficulty in grapheme-phoneme conversion which is present from the start of reading instruction. Both of these factors will produce differences between the reading performances of the two groups. However, despite these differences, there should be some similarities between the reading performances of developmental and phonemic dyslexics if in both disorders the phonological reading route is impaired and the direct visual route is intact. More particularly, words which are more easily handled by the direct visual route (e.g., high-imagery words) should be read better than words which are more easily handled by the phonological route (e.g., nonsense words, low-imagery words). Furthermore, reading errors which are characteristic of the direct visual route should predominate in both disorders. Having set the record straight on this matter, I will now deal with some of Ellis’ specific criticisms of the evidence I cited to support the theory.
Ellis questions whether developmental dyslexics are like phonemic dyslexics and “sit mute in front of printed nonwords”, and he suggests that they may fail at this task because of “the application of inappropriate, irregular letter-to-sound correspondences”. The evidence I cited on the ability of developmental dyslexics to read simple nonsense words comes from an unpublished thesis by Firth (1972). Firth gave a 170-item nonsense word reading test to 8-year-old children who were classified as being bad or average readers and of average or low IQ. His results are presented in Table 1. As can be seen from the table, the difference in performance between the bad and average readers is very large indeed. When this test was used to classify the sample (a total of 96 children) as bad or average readers, it produced 98% correct classifications. This result indicates that the notion of a phonological recoding deficit is sufficient in itself
Table 1. Performance of bad and average eight-year-old readers on a nonsense word reading test (from Firth, 1972)

Group          Bad readers        Average readers     F ratio
               Mean    S.D.       Mean    S.D.
Low IQ         18.5    19.8       119.0   17.8        356.80
Average IQ     35.4    25.6       118.0   24.5        127.54
to account for the reading disabilities of these children. By contrast, the ability to associate spoken words with strings of letter-like visual symbols (a measure of individual differences in the direct visual route) did not discriminate the groups at all. The F ratios for this contrast were both less than 1.0, and the task produced 52% misclassifications. If, as Ellis suggests, dyslexic children have difficulties in reading via the direct visual route, we would expect this task to be a somewhat better discriminator than this. Firth (1972) offers no quantitative data on the sorts of errors made by children in attempting to read the nonsense words. However, he has this to say: The average readers sailed through the nonsense word test very rapidly, sometimes so fast that it was difficult to keep pace in recording answers. Usually they did not “sound out” these unfamiliar words, but pronounced them without hesitation. The bad readers, by contrast, found the task very difficult. The errors made by the bad readers were usually failures to produce any pronunciation at all, rather than the production of incorrect pronunciations. The worst of the bad readers, although able to read a few Schonell R1 words, could not produce any pronunciation at all for these nonsense words. Even with explanations, examples, coaching, and sounding of some letters by the tester, they still found the task impossible. The best of the bad readers had some idea of what phonics was about, and could produce some correct pronunciations. However, these pronunciations were produced very slowly and laboriously and with much sounding out of the letters (pp. 123-4).
However, data I have collected with 12 older children (mean age 11-2, mean reading age 8-2, mean IQ 111) showed that incorrect pronunciations are more common than omissions. These children were asked to read 15 nonsense words which were generated by taking some high-imagery nouns and altering the initial letter (e.g., doctor became factor, and letter became retter). These children could read an average of 58% of the nonsense words correctly, whereas a matched control group read an average of 92% correctly. Of the dyslexics’ reading errors, 36% were real words, 57% were neologisms, and only 7% were omissions. It is interesting to note that
despite the fact that the dyslexics could read only an average of 58% of the nonsense words, they managed to read an average of 88% of the real words from which the nonsense words were derived. For each of the 12 dyslexic children, performance on the real words was better than performance on the nonsense words, suggesting that these children must have been relying to a large extent on the direct visual route to read these words correctly. This latter finding argues against the position, which Ellis defends, that developmental dyslexics have an impairment of the visual-to-semantic route. Taken together, these results indicate that in the early years of primary school the dyslexic child is almost totally unable to generate pronunciations for written nonsense words, while by the final years of primary school nonsense words can be read but not very accurately.

The effect of imagery on word reading
Ellis has a number of criticisms of my (1977a) work showing that word imagery is a strong predictor of how easy a word is to read for disabled readers. In his criticisms, Ellis has misrepresented this study in a number of respects. Firstly, this study did not show, as Ellis claims, that concrete nouns are read more successfully than abstract nouns, but rather that high-imagery nouns are read more successfully than low-imagery nouns. Experiment 1 of the study showed that concreteness did not correlate with reading success at all when the effects of imagery, frequency, and length were partialled out, whereas imagery did correlate significantly with reading success when the effects of concreteness, frequency, and length were partialled out. Furthermore, in Experiment 2 of the study, the words were selected on the basis of their imagery values rather than their concreteness. A second misrepresentation is Ellis’ claim that: ...although Jorm’s (1977) concrete and abstract nouns were matched on overall word frequency in samples of written English, as assessed by Thorndike and Lorge (1944), nevertheless concrete nouns might plausibly be expected to predominate over abstract nouns in the reading experience of poor readers, particularly in the sorts of books commonly employed which contain abundant illustrations, with single words or short sentences describing the illustrations and naming objects depicted in them. What Ellis does not mention is that Experiment 2 matched the high-imagery and low-imagery nouns not only for Thorndike-Lorge (1944) frequency, but also for frequency according to Carroll, Davies and Richman’s (1971) word count of children’s books. Furthermore, the nouns used in Experiment 1 were taken from a series of elementary reading books and all had frequencies of occurrence greater than 4 in these books. The frequency estimates used in Experiment 1 for correlational purposes were again taken from
the Carroll et al. (1971) word count. However, even if Ellis were correct that children get greater exposure to words which are easily illustrated in books, this might explain a concreteness effect (since concrete nouns are defined as those with sensory referents), but could not explain an imagery effect. Ellis goes on to argue that even if imagery does affect readability, this effect may apply to all readers, both normal and dyslexic, and that my failure to find such a result with good readers may have been due to a ceiling effect. Some evidence relevant to this question comes from studies by Richardson (1976) and Whaley (1978) examining the effects of word attributes on response latencies in simple reading tasks. Because these studies used response times as their dependent measure, they are not subject to the criticism that ceiling effects may have been operating. Using undergraduate students as subjects, Richardson (1976) found that word imagery had no significant effect on either pronunciation latency or word-nonword classification time. With a similar sample of subjects, Whaley (1978) looked at the correlations between various word attributes and word-nonword classification time. Using his data, there is a significant correlation of 0.29 between word imagery and reciprocal response time when the effects of concreteness, length, and frequency are partialled out. (However, with this type of data analysis, we do not know whether this result holds across subjects as well as across words.) More generally though, I believe that Ellis’ criticism may be missing the whole point. I do not deny that imagery affects the reading processes of good readers. In fact, the whole thrust of the third experiment of my (1977a) study was to show that this is the case. What I would suggest is that word imagery affects the ease with which a word can be read via the direct visual route. Since both normal and dyslexic readers use this route, word imagery will affect their processing.
However, when the phonological route is operating efficiently, word imagery does not affect overall reading accuracy because words which cannot be handled by the direct visual route alone are handled by the phonological route (or perhaps some interaction of the two routes). Thus, good readers can read high-imagery and low-imagery nouns equally well and make very few reading errors. However, in developmental dyslexics where the phonological route is not operating efficiently, and in phonemic dyslexics where this route is not operating at all, the reader is forced to rely on the visual route and consequently word imagery predicts reading performance.

Pattern of errors made
One of the points of evidence I cited to support the notion that developmental and phonemic dyslexia share functional similarities was that both involve a tendency to make visual errors in reading. Ellis notes that visual errors occur
in all types of acquired dyslexia and hence this finding tells us nothing about which type of acquired dyslexia developmental dyslexia is closest to. I must agree with Ellis on this point, although it should be noted that Marshall and Newcombe (1973) claim that only about 2% of the errors of their surface dyslexic patients were visual confusions. Ellis also points out that developmental dyslexia is clearly different from phonemic dyslexia in that the latter involves a tendency to make semantic errors whereas the former does not. Again, I must agree with Ellis that developmental dyslexics do not make pure semantic errors (without a visual component). However, I would also point out that the frequency of pure semantic errors varies considerably from one phonemic dyslexic to another, and in some cases is very low. For example, Marshall and Newcombe’s (1973) patient K.U. produced only 2 pure semantic errors in reading 170 words. Similarly, only 4% of Shallice and Warrington’s (1975) patient’s errors were purely semantic (without a visual or derivational component). What does seem to be a general characteristic of phonemic dyslexics is a tendency to make derivational semantic errors (which in most cases are also visually related to the stimulus word). However, developmental dyslexics also sometimes make semantic errors which have a visual component. Some examples from the errors recorded in my 1977a study are: Christmas → “Christian”, prince → “princess”, life → “live”, sunshine → “sunny”, sheep → “shepherd”. Baron (1979) also reports some reading errors of this type in a group of dyslexic children. However, it may be more parsimonious to regard this type of error in dyslexic children as being visual rather than semantic.
One obvious reason for the failure to find pure semantic errors in developmental dyslexia is that generally the dyslexic child has some ability to apply grapheme-phoneme rules to written words, and this rudimentary ability is sufficient to rule out any pure semantic errors. For example, a child has only to be able to sound out the first letter of corn to know that it is not pronounced “wheat”; a full decoding is not necessary. Pure semantic errors are really only possible (at least in reading single words) where the ability to phonologically recode using rules is totally absent.

Short-term memory impairment
Ellis finds it hard to reconcile my juxtaposed claims that a deficit in the auditory-verbal short-term store of developmental dyslexics can account for both their susceptibility to temporal order errors and their immunity to phonological confusions in immediate recall. However, there is not necessarily any contradiction here at all. If we assume that the auditory-verbal short-term store is specialised to hold information in a phonological code and to store information about the temporal order of events, then it is to be expected
that a child who has a deficiency of this store will show both little evidence of using a phonological code (i.e., phonological confusions) and poor retention of order information. The evidence which Ellis cites to show that phonologically confusable stimuli produce more order errors in immediate recall is quite irrelevant to the matter of how individual differences in the incidence of various types of errors arise. Ellis goes on to point out that it has yet to be shown that a poor memory span is necessary for the occurrence of either developmental or phonemic dyslexia. Yet again, I must agree with Ellis. However, it must be remembered that memory span is an imperfect measure of the functioning of the auditory-verbal short-term store, since it is also affected considerably by control processes and the extent of a person’s long-term memory knowledge base (Chi, 1976, for example, goes so far as to argue that these factors are solely responsible for age differences in memory span). A strict test of the notion that dyslexia is necessarily associated with deficiencies of short-term storage requires a more satisfactory measure than memory span.
Surface Dyslexia and Developmental Dyslexia
I will now turn to Ellis’ proposal that developmental dyslexia may be likened to surface dyslexia. Ellis’ evidence for this view comes from the work of Holmes (1973, 1978). As I have not had access to Holmes’ (1973) unpublished thesis, I will confine my comments to her published (1978) work. Holmes (1978) reports that the majority of the reading errors made by both developmental and surface dyslexics can be classified as partial failures of grapheme-phoneme correspondence. I would agree that dyslexic children sometimes make reading errors of this kind; such errors are particularly evident when dyslexic children give neologisms as responses. However, whether such errors are in the majority and whether they implicate a deficit of the direct visual route is debatable. A major problem in evaluating Holmes’ (1978) evidence is that she only presents selected examples of errors to illustrate her conclusions rather than a quantitative analysis of all the data or a complete corpus of her subjects’ errors. Furthermore, in the examples she cites, Holmes (1978) generally does not clearly distinguish the errors made by the children from those made by the adult patients. Thus, the reader is left with little recourse but to accept Holmes’ own interpretation of her findings. Yet despite the difficulties which her paper presents to the critical reader, Holmes’ analysis of her subjects’ errors can be seen to be inadequate in some respects. Many of the errors she cites as partial failures of grapheme-phoneme
correspondence could equally plausibly be classified as visual, e.g., certain → “carton”, beggar → “badger”, muscle → “musical”, reign → “region”, revise → “rivers”. Holmes is aware of this possibility and comments at one point: “Some readers may argue that at least some of these errors can be attributed to the visual similarity of stimulus and response words. To some extent such a judgment must remain a matter of personal bias” (p. 93). A more general problem is that, even if Holmes’ (1978) error analysis is valid, the sort of reading deficit implied by such errors is by no means clear. If, as Holmes suggests, the reading errors of developmental dyslexics are characterised by a partial failure of grapheme-phoneme correspondence rules, then it seems odd for Ellis to argue that their primary deficit is in the direct visual route. It seems more plausible that a failure of rules would indicate a deficit of the phonological route. Ellis cites the work of Seymour and Porpodas (in press) as evidence against my position that developmental dyslexia involves a deficit of the phonological route with the direct route intact. Seymour and Porpodas concluded that the four dyslexic boys they studied showed some impairment of both routes. However, a careful look at Seymour and Porpodas’ findings and conclusions reveals that there is no incompatibility with my position. Seymour and Porpodas assessed the functioning of the direct route by tasks which compared performance on words versus nonwords, high-frequency words versus low-frequency words, and irregular words versus regular words. These tasks were all measures of the achievement of the direct route (i.e., the extent of the subject’s sight-word vocabulary) rather than the ability of the direct route (i.e., the subject’s capability at forming symbol-meaning or symbol-sound associations).
As Seymour and Porpodas themselves point out, the development of an extensive sight-word vocabulary may depend partly on having an adequate phonological route. This is exactly what I proposed in my original paper. Children with good phonics skills have a built-in teacher which they can use to add new words to their sight-word vocabulary, whereas children with poor phonics skills must rely on an external teacher for increasing their sight-word vocabulary. Poor achievement of the direct route is thus predicted from my theory, but poor ability of this route is not. Evidence recently reported by Baron (1977) and Brooks (1977) shows quite dramatically the dependence of the direct route on the phonological route. They taught adult subjects to associate strings of printed symbols from an artificial alphabet with spoken responses. In one condition of the experiment (the orthographic condition), the symbols could be related to the responses using grapheme-phoneme correspondence rules, while in the second condition (the paired-associate condition) there was only an arbitrary relationship between the symbols and responses. The surprising result was that
after several hundred practice trials with the artificial words, the words in the orthographic condition were read faster than the words in the paired-associate condition. Even after extensive practice, response times were influenced by the possibility of using grapheme-phoneme correspondence rules. Thus, a deficit in the phonological route in dyslexic children would produce not only a limited sight-word vocabulary, but also slow performance with highly familiar words.
The Notion of Sub-types of Developmental Dyslexia
Ellis suggests that there may be subtypes of developmental dyslexia, with some dyslexics being impaired on both the visual and phonological routes and others impaired disproportionately on the phonological route. He claims that I peremptorily dismiss the notion that there are subtypes and cites the references in Vernon’s (1979) article as evidence in support of this notion. I would wish to make it clear that I do not deny the possibility that subtypes of developmental dyslexia exist. The theory I have presented might really only be applicable to one subgroup of dyslexics. However, I would argue that at present there is no satisfactory evidence to support the notion of subtypes and that it is therefore better to adopt the more parsimonious view that developmental dyslexia is a unitary disorder. Let us take, for example, the studies of Naidoo (1972), Boder (1973), and Doehring and Hoshko (1977) which Vernon (1979) cites as indicating the existence of subtypes. Although these three studies are amongst the best available on the subject, I would argue that they do not provide any evidence to support the notion of distinct subtypes. Naidoo (1972) attempted to identify subgroups by carrying out a single linkage cluster analysis on data collected from over 90 dyslexic boys. Variables relating to “developmental history, neurological status, speech, language, auditory memory, visuo-spatial function, arithmetic, perinatal status, and familial factors” (p. 99) were used as the basis for the cluster analysis. The results of this study give little comfort to those who believe in subtypes. Naidoo (1972) concluded: The fact that clusters did not emerge naturally does not support the existence of clearly defined types of dyslexia in this sample.
One probable reason why no clusters were evident is that some features, for example, low scores on sound blending, Digit Span and Coding and lack of hand-eye-foot concordance, occur so frequently that they are unlikely to differentiate one group from another. Another reason may be that this highly selected sample was too homogeneous to include subgroups (p. 107).
Boder’s (1973) work is often cited to support the notion that there are subtypes. She concluded that the vast majority of dyslexic children can be clearly fitted into one of three groups, which are respectively characterised by a disorder of the direct visual route (dyseidetic dyslexics), a disorder of the phonological route (dysphonetic dyslexics), and a disorder of both routes (mixed dysphonetic-dyseidetic dyslexics). However, Boder (1973) presented no evidence to support this conclusion apart from a few illustrative examples of the spelling and reading errors of each type of child. She simply asserted that the children can be fitted into one of these three groups and provided no quantitative analysis to support this view. From the general descriptions which Boder (1973) gives of the three types of dyslexia, it seems equally plausible that they could be regarded as points along a single continuum of reading deficit, with dyseidetic dyslexics at one end having a mild disorder and mixed dyslexics at the other having a very severe disorder. There is also reason to doubt the ability of Boder’s testing procedure to distinguish these groups if they do exist. Central to Boder’s classification of her subjects is a word reading test in which words are classified as being in the child’s “sight vocabulary” if they are read within one second, and as being read by “word analysis-synthesis skills” if they are read within 1-10 seconds. The assumption that words read within one second are in a child’s sight vocabulary is a doubtful one. With 8-year-old children, I have found (Jorm, 1977b) that the pronunciation latency for reading a nonsense word is quite often less than a second. By Boder’s (1973) testing procedure, such nonsense words would be regarded as being part of the child’s sight vocabulary. Furthermore, in the majority of cases, I found that nonsense words had pronunciation latencies of less than 1.5 seconds.
Since Boder did not use any reaction-time recording equipment to aid her judgements, it seems doubtful that she could accurately discriminate words read within one second from words read within 1.5 seconds. It is thus possible that many of the words she classifies as being in a child’s sight vocabulary are in fact being read by grapheme-phoneme conversion.
Doehring and Hoshko’s (1977) study used Q-technique factor analysis to derive dimensions of test profile types for a group of 34 children with reading problems and a group of 31 children with mixed educational problems. Only the results for the former group are of interest in the present context. Doehring and Hoshko classified these subjects into groups according to the factor on which each child had the highest loading. Three of the 34 children could not be placed in any group, and a further four children fell into more than one group. The three groups of children were characterised by special difficulties in oral word and syllable reading (Group 1), auditory-visual letter matching (Group 2), and auditory-visual matching of words and syllables (Group 3). However, all groups were similar in that they tended to perform consistently poorly on the majority of the tests used. The major limitation of Doehring and Hoshko’s study is that it did not include the results of a normal control group in the factor analysis. The point of using a factor analysis technique should be to show that some disabled readers differ from normal readers on certain components of the reading process, while other disabled readers differ from normals on other components. The factors so derived should discriminate normal readers at one end from disabled readers at the other. Since Doehring and Hoshko did not include a normal group in the factor analysis, we do not know whether the three factors they derived represent test profile differences which distinguish disabled and normal readers. They left out the very source of variance (between the test profiles of disabled and normal readers) which the factor analysis should have been attempting to analyse. Other, more specific criticisms can be made of this study. The tests used were very brief (31 tests in approximately 1 hour) and consequently may have had low reliability, in which case the factors derived could have been highly contaminated with error variance. Another criticism is that a few of their reading-disabled subjects could hardly be described as having reading difficulties (e.g., a boy aged 9-2 with a reading grade level of 5-2, and a boy aged 14-9 with a reading grade level of 9-2).
Undoubtedly, other studies could be quoted to support the notion of subtypes, but I would argue that in all cases to date they are fraught with inadequacies and cannot be taken as positive evidence for the existence of subtypes. What I would agree these sorts of studies show is that dyslexics are a quite varied group. However, the presence of variability does not necessarily imply the existence of subtypes.
The reading processes and cognitive abilities of dyslexics can differ considerably from one child to another without these differences being related to their reading retardation. For example, there may be individual differences among dyslexics in the ability to associate spoken words with strings of visual symbols, but the evidence suggests that such individual differences are not a source of variance in reading achievement (Firth, 1972; Jorm, 1977a). It would, therefore, not be useful to say that there are two subgroups of developmental dyslexics, one good at associating spoken words with visual symbols and the other poor at this skill, any more than it would be useful to argue for the existence of corresponding subgroups amongst normal readers.
Intensive Case Studies as a Research Strategy
Ellis’ final point is that “in the field of acquired dyslexia, studies of heterogeneous groups of patients have been far less informative than intensive case studies of particular, interesting and theoretically-relevant individuals”. He therefore argues that “real advances in the understanding of developmental reading difficulties will occur only if the same approach is adopted”. I would argue, to the contrary, that a case study approach is unwise unless the subject population of interest is so small that only single cases can be obtained. It is undoubtedly because of the rarity of brain-damaged patients in whom dyslexia is the predominant problem (see Marshall & Newcombe, 1973, p. 175) that a case study approach has been used extensively in this area. I think it should be kept in mind that group studies are really case studies with replications. By replicating findings over a group of cases, we are able to separate those characteristics which are common to all cases from those which are idiosyncratic to particular individuals. By using a case study approach, we are in danger of being misled by the idiosyncrasies of the particular case we are studying and by the imperfect nature of our tests. However, I would agree with Ellis that we need “intensive” studies, by which I mean studies using multiple dependent variables, if meaningful progress is to be made in the area.
Let me say in conclusion that I certainly would not claim that the theory of developmental dyslexia I proposed is correct in all respects. Undoubtedly it shares the characteristic of all scientific theories of being in conflict with some evidence from the moment of its birth. I would not deny for a moment that there is some current evidence which the theory cannot easily accommodate. However, I believe that the theory accounts for the current evidence much better than any other theory which has been proposed. It is far easier to pick holes in the theory than to propose a better alternative.
References
Baron, J. (1977) Mechanisms for pronouncing printed words: Use and acquisition. In D. LaBerge and S. J. Samuels (eds.), Basic Processes in Reading: Perception and Comprehension. Hillsdale, Lawrence Erlbaum.
Baron, J. (1979) Orthographic and word-specific mechanisms in children’s reading of words. Child Dev., 50, 555-666.
Boder, E. (1973) Developmental dyslexia: A diagnostic approach based on three atypical reading-spelling patterns. Dev. Med. Child Neurol., 15, 663-687.
Brooks, L. R. (1977) Visual pattern in fluent word identification. In A. Reber and D. Scarborough (eds.), Towards a Psychology of Reading. Hillsdale, Lawrence Erlbaum.
Carroll, J. B., Davies, P. and Richman, B. (1971) The American Heritage Word Frequency Book. Boston, Houghton Mifflin.
Chi, M. T. H. (1976) Short-term memory limitations in children: Capacity or processing deficits? Mem. Cog., 4, 559-572.
Doehring, D. G. and Hoshko, I. M. (1977) Classification of reading problems by the Q-technique of factor analysis. Cortex, 13, 281-294.
Ellis, A. W. (1979) Developmental and acquired dyslexia: Some observations on Jorm (1979). Cog., 7, 413-420.
Firth, I. (1972) Components of Reading Disability. Unpublished doctoral dissertation, University of New South Wales.
Holmes, J. M. (1973) Dyslexia: A Neurolinguistic Study of Traumatic and Developmental Disorders of Reading. Unpublished doctoral dissertation, University of Edinburgh.
Holmes, J. M. (1978) “Regression” and reading breakdown. In A. Caramazza and E. B. Zurif (eds.), Language Acquisition and Language Breakdown: Parallels and Divergencies. Baltimore, Johns Hopkins University Press.
Jorm, A. F. (1977) Effect of word imagery on reading performance as a function of reader ability. J. Educ. Psychol., 69, 46-54 (a).
Jorm, A. F. (1977) Children’s reading processes revealed by pronunciation latencies and errors. J. Educ. Psychol., 69, 166-171 (b).
Marshall, J. C. and Newcombe, F. (1973) Patterns of paralexia: A psycholinguistic approach. J. Psycholinguist. Res., 2, 175-199.
Naidoo, S. (1972) Specific Dyslexia. London, Pitman.
Richardson, J. T. E. (1976) The effects of stimulus attributes upon latency of word recognition. Brit. J. Psychol., 67, 315-325.
Seymour, P. H. K. and Porpodas, C. D. (in press) Lexical and non-lexical processing of spelling in developmental dyslexia. In U. Frith (ed.), Cognitive Processes in Spelling. London, Academic Press.
Shallice, T. and Warrington, E. K. (1975) Word recognition in a phonemic dyslexic patient. Quart. J. Exp. Psychol., 27, 187-199.
Thorndike, E. L. and Lorge, I. (1944) The Teacher’s Wordbook of 30,000 Words. New York, Columbia University, Teachers College, Bureau of Publications.
Venezky, R. L. (1970) The Structure of English Orthography. The Hague, Mouton.
Vernon, M. D. (1979) Variability in reading retardation. Brit. J. Psychol., 70, 7-16.
Whaley, C. P. (1978) Word-nonword classification time. J. verb. Learn. verb. Behav., 17, 143-154.
Contents of Volume 7
Number 1
Editorial, 1
MARY SUE AMMON and DAN I. SLOBIN (University of California, Berkeley) A cross-linguistic study of the processing of causative sentences, 3
ANTHONY F. JORM (Deakin University, Australia) The cognitive and neurological basis of developmental dyslexia: A theoretical framework and review, 19
Brief Reports
HUGO VAN DER MOLEN and JOHN MORTON (MRC Applied Psychology Unit, Cambridge) Remembering plurals: Unit of coding and form of coding during serial recall, 35
ANNE CUTLER and JERRY A. FODOR (Massachusetts Institute of Technology) Semantic focus and sentence comprehension, 49
Discussions
JOHN KLOSEK (Graduate Center, CUNY) Two unargued linguistic assumptions in Kean’s “phonological” interpretation of agrammatism, 61
MARY-LOUISE KEAN (University of California, Irvine) Agrammatism: A phonological deficit?, 69
HELEN GOODLUCK (University of Wisconsin, Madison) and LAWRENCE SOLAN (University of Massachusetts, Amherst) A reevaluation of the basic operations hypothesis, 85
JERRY A. FODOR (Massachusetts Institute of Technology) In reply to Philip Johnson-Laird, 93
Books Received, 97
Number 2
THOMAS R. SHULTZ, ARLENE DOVER and ERIC AMSEL (McGill University) The logical and empirical bases of conservation judgements, 99
ANAT NINIO (The Hebrew University, Jerusalem) Piaget’s theory of space perception in infancy, 125
FRANCESCO ANTINUCCI (CNR, Rome), ALESSANDRO DURANTI (University of Rome) and LUCYNA GEBERT (University of Genoa) Relative clause structure, relative clause perception, and the change from SOV to SVO, 145
Discussions
MARK S. SEIDENBERG (Columbia University) and LAURA A. PETITTO (New York University) Signing behavior in apes: A critical review, 177
Number 3
STEVEN PINKER (Harvard University) Formal models of language learning, 217
TIMOTHY E. MOORE (Glendon College, York University) and IRVING BIEDERMAN (State University of New York) Speeded recognition of ungrammaticality: Double violations, 285
Discussions
STEPHEN P. SCHWARTZ (Ithaca College) Natural kind terms, 301
ANNE ERREICH, JUDITH WINZEMER MAYER and VIRGINIA VALIAN (CUNY Graduate Center) Language acquisition hypotheses: A reply to Goodluck and Solan, 317
Number 4
JOSE MORAIS, LUZ CARY, JESUS ALEGRIA and PAUL BERTELSON (Université Libre de Bruxelles) Does awareness of speech as a sequence of phones arise spontaneously?, 323
GUY WOODRUFF and DAVID PREMACK (University of Pennsylvania) Intentional communication in the chimpanzee: The development of deception, 333
J. LANGFORD and V. M. HOLMES (University of Melbourne) Syntactic presupposition in sentence comprehension, 363
Discussions
L. JONATHAN COHEN (Oxford University) On the psychology of prediction: Whose is the fallacy?, 385
DANIEL KAHNEMAN (University of British Columbia) and AMOS TVERSKY (Stanford University) On the interpretation of intuitive probability: A reply to Jonathan Cohen, 409
ANDREW W. ELLIS (University of Lancaster) Developmental and acquired dyslexia: Some observations on Jorm (1979), 413
ANTHONY F. JORM (Deakin University) The nature of the reading deficit in developmental dyslexia: A reply to Ellis, 421
Author Index of Volume 7
Alegria, Jesus, 323
Ammon, Mary Sue, 3
Amsel, Eric, 99
Antinucci, Francesco, 145
Bertelson, Paul, 323
Biederman, Irving, 285
Cary, Luz, 323
Cohen, Jonathan, L., 385
Cutler, Anne, 49
Dover, Arlene, 99
Duranti, Alessandro, 145
Ellis, Andrew, W., 413
Erreich, Anne, 317
Fodor, Jerry, A., 49, 93
Gebert, Lucyna, 145
Goodluck, Helen, 85
Holmes, V. M., 363
Jorm, Anthony, F., 19, 421
Kahneman, Daniel, 409
Kean, Mary-Louise, 69
Klosek, John, 61
Langford, J., 363
Mayer, Judith, W., 317
Moore, Timothy, E., 285
Morais, Jose, 323
Morton, John, 35
Ninio, Anat, 125
Petitto, Laura, A., 177
Pinker, Steven, 217
Premack, David, 333
Schwartz, Stephen, P., 301
Seidenberg, Mark, S., 177
Shultz, Thomas, R., 99
Slobin, Dan, I., 3
Solan, Lawrence, 85
Tversky, Amos, 409
Valian, Virginia, 317
van der Molen, Hugo, 35
Woodruff, Guy, 333
Erratum to Volume 7
Mark S. Seidenberg and Laura A. Petitto, Signing behavior in apes: A critical review, Cog., 7, 2, 177-215.
Unfortunately, two major errors were introduced in this paper before printing. The printers offer their sincere apologies to the authors and readers.
On page 206, the sentence starting at the end of the 25th line should have read: “If Y was a flat surface, they typically placed the object on it.” and not: “If Y was a flat surface, they typically placed the object in it.”
On page 211, in the penultimate paragraph, lines 23 and 24 were inadvertently added. This paragraph should have read:
Second, the source of many of the problems in the existing literature may be traced to the Gardners’ statement that their analyses “do not depend on any special theory of linguistics or psycholinguistics” (1975, p. 256). Their analyses depend upon a special theory that is created de facto by their acceptance of a simplistic set of assumptions about language structure and language learning. It is possible that Washoe could have accomplished more if her trainers had possessed a richer conception of language and communication.