Editorial
Hiroshi Amma Faculty of Education, Tokyo University, Tokyo, Japan Paul Bertelson Laboratoire de Psychologie E...
25 downloads
1447 Views
8MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Editorial
Hiroshi Amma Faculty of Education, Tokyo University, Tokyo, Japan Paul Bertelson Laboratoire de Psychologie Experimentale, Universite Libre de Bruxelles Avenue Adot’phe Buyl, 117 1050 BruxelZes, Belgique Ned Block Dept. of Philosophy, M.Z.T. Cambriolpe, Mass., U.S.A. T. G. R. Bower Dept. of Psychology, University of Edinburgh, Edinburgh, Great Britain Franwis Bresson Laboratoire de Psychologie, E.P.H.E. Paris, France Roger Brown Dept. of Psychology, Harvard University, Cambridge, Mass., U.S.A. Jerome S. Bruner Center for Cognitive Studies, Harvard University, Cambridge, Mass., U.S.A. Noam Chomsky Dept. Modern Languages and Lit&sties, M.Z.T., Cambridge, Mass., U.S.A.
Peter D. Eimas Walter S. Hunter Laboratory of Psychology, Brown University, Providence, Rho& Island 02912, U.S.A. Gunnar Fant Lob. of Speech Transmission, Royal Institute of Technology, Stockholm, Sweden Jerry Fodor Dept. of PsychoZogy, M.Z.T., Cambridge, Mass., U.S.A. Kenneth Forster Dept. of Psychology, Monash University, Clayton, Melbourne, Australia Merrill Garrett Department of Psychology, M.Z.T. ElO-034 Cambrt@e, Mass. 02139, U.S.A. Pierre Greco Laboratoire de Psychologie, 54, bvd. Raspail, Paris de. France Jean-Blab Grize, I, ChantemerZe, Neuch&tel, St&se David T. Hakes Department of Psychology, University of Texas, Austin, Texas 78712, U.S.A.
board
Henry Hecaen Directeur d’Etudes, Ecole Pratique des Hautes Etudes, Unite de Recherches Neuropsychologiques, Z.N.S.E.R.M., 2, Rue d’dlesia, 75 Paris 14, France Michel Imbert Laboratoire de Neurophysiologic, College de France, 11 Place Marcelin Berthelot, Paris 5, France Barbel Inhelder Institut des Sciences de I’Education, Palais Wilson, Gendve, Suisse Marc Jeannerod Laboratoire de Neuropsychologie Experimentale, 16 Av. Doyen Lepine 69500 Bron, France James Jenkins Center for Research and Human Learning, University of Minnesota, Minneapolis, Minn. 55455 U.S.A. Daniel Kahneman Dept. of Psychology, The Hebrew University of Jerusalem, Israel
Jerrold J. Katz Dept. of Philosophy, M.I.T., Cambrkige. Mass., U.S.A. Edward Klima Dept. of Linguistics, La Jolla, University of California, San Diego, Calif. 92037, U.S.A. Eric H. Lenneberg Dept. of Psychology, Cornell University, Ithaca, N. Y., U.S. A. Alexei Leontiev Faculty of Psychology, University of Moscow, Moscow, U.S.S.R. Wilhelm Levelt Psychological Laboratory, Nomegen University, Nijmegen, the Netherlands A. R. Luria University of Moscow, 13, Frunze Street, Moscow G. 19, U.S.S.R. John Lyons Dept. of Linguistics, Adam Ferguson Building, Edinburgh, Great Britain Humberto Maturana Escuela de Medicine, Universidad de Chile, A. Sanartu 1042, Santiago, Chile
John Morton Applied Psychology Unit, Cambridge, Great Britain George Noizet Laboratoire de Psychologie Experimentale, Abe-en-Province, France Domenico Parisi Instituto di Psicologia, Consiglio Nazionale delle Richer&e, Rome, Italy Michael Posner Dept. of Psychology, University of Oregon, Eugene, Oregon, U.S.A. Nicolas Ruwet Dept. de Linguistique, Centre Univ. de Vincennes, Paris, France Harris B. Savin Dept. of Psychology, University of Pennsylvania, Philadelphia, Pa., U.S.A. Robert Shaw Center for Research and Human Learning, University of Minnesota, Minneapolis, Minnesota U.S.A. Hermina Sinclair de Zwart Centre d’Epistemologie Genetique, Get&e, Suisse
Dan I. Slobin Department of Psychology, University of California, Berkeley, California 94720, U.S.A. Jan Smedshmd Institute of Psychology, Universitet I Oslo, Oslo, Norway Sydney Strauss Department of Educational Sciences Tel Aviv University, Ramat Aviv, Israel Alma Szeminska Olesiska 513, Warsaw, Poland Yoshihisa Tanaka Dept. of Psychology, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan Hans-Lukas Teuber Dept. of Psychology, M.I.T., Cambridge, Mass. 02139 U.S.A. Peter Wason University College London, Gower Street, London W.C.1, Great Britain
Editorial
When we accepted a paper from South Africa1 on its scientific merits, we were at first tempted to justify its inclusion in Cognition since many policies of that country are recognizably abhorrent. However, we concluded that it would be false liberalism to detail a defense of our editorial decision. Of course we deplore the regime currently governing South Africa, but then we are equally disturbed by policies of many other countries, e.g. the killing of Indians with modem warfare techniques for expropriation of land, the bombing of people fighting for their national independence, the penetration into underdeveloped countries by multinational trusts, the murder of political opponents, torture and persecution . . . and let us not forget those countries that publicly voice their unerring pursuit of justice and morality while they quietly sell arms to any buyer. ‘Liberals’ are not the only ones who consider condemning individuals because of the country or group they belong to. In some countries, inteIlectuaIs who have been banned from the ruling party are not allowed to publish their scientific results in professional journals or elsewhere.2 Other examples could be brought to the reader’s attention. In every case, we are confronted with actions that attempt to preserve the power of a State or cult without regarding individuals and the social nuclei in which they live and express themselves. It is evident that most nations participate in the practice of power which leads to acts that would be considered immoral if carried out by individuals. This does not release individuals from sharing responsibility in the acts of the geographically defined countries where they happen to reside. However, it is a traditional argument of scientists to distinguish an individual’s politics from an individual’s behavior as a scientist. This is the essence of ‘scientific neutrality’. Thus it is clearly a mistake to judge an individual by the country in which he or she lives. However, we must raise the question of the scientist’s social responsibility, the conditions under which scientific endeavours are harmful and the way in which such questions relate to the image that we have of ourselves as scientists. We have all been taught to consider that great men of science are dedicated, totally 1. Miller, R., The use of concrete and abstract concepts by children and adults. Cognition,
2(l), 49-58. 2. Le Momfe, December 27,1972.
8
Editorial
devoted, generous and untarnished by base political interests. There is nothing new about this ‘pasteurized’ conception of the Professor. Society has always had a dynamic need to preserve an image of the academy as a realm ensuring the unbiased evaluation of cultural knowledge and its applications. This image simplifies the complex motivations and contradictions which are embodied within each scientist and the scientific corporation. We do not deny that this conception may have been functional in the past; however it is being examined with increasing mistrust. Today some people are so disgusted by the social uselessness of much research that they have turned against all science; others contend that science is totally political. Still others suggest that science reflects primarily its logistics e.g. the structure of the laboratories, the funding system, the manner in which promotions are decided. Though we may agree or disagree with these points, we must grant that this debate reflects a growing awareness that science does not function in a political vacuum, but in societies. The socio/political background is a major factor in the determination of the way in which theoretical problems are conceived and formulated and also determines financial and administrative decisions. New research ideas that receive support in turn provide new models for society to consider. But the dependence of these models on the existing social forms guarantees that they will not be ‘new’ but will be a superficial reorganization of the pre-existing socially determined structures. Some scientists have adopted an odd way of extricating themselves from any debate on such issues. They claim that rational, scientific knowledge is neutral and that all the current doubts about science are triggered only by irresponsible applications. We even read that scientists themselves ought to oversee the technological applications of scientific knowledge and thereby play the role of apolitical and uninvolved judges. However, such a role would be false in many respects. In the first place, scientists work in the context of a competitive society, in competitive research centers where money is distributed according to goals that they rarely understand. In many such centers independence, criticism and questions are allegedly sought by all. However, ‘new’ ideas must fall within the pre-established limits of the subject under discussion and thus are emasculated from the outset. Questioning ‘side issues’ like the motivations behind considering specific theoretical or practical problems leads to the discovery that some questions are not as welcome as others. Clearly science is not as free to experiment as one would like. In making such statements we do not pretend that we have an answer as to what constitutes ‘socially valuable or valueless’ research. What we want to stress is the necessity of understanding what motivates scientists to follow certain lines of research. If one never questions why one pursues one activity rather than another, if one does not examine the social context and the ideology behind one’s work, the inevitable outcome is manipulation by those who have a clear intention to lead the community
Editorial
9
in a particular direction. Social science is a particularly sensitive field when considered from this standpoint. It is characteristic that many of those who insist on an apolitical conception of science emerge as major government advisors. On closer consideration of the political views of scientists, however, it becomes clear that they are willing to use the alleged sanctity of science as a political argument. For example, some ‘apolitical’ scientists have signed a petition maintaining that all research is neutral including that which explores ‘racial’ bases for individual capacity.3 They go so far as to claim that their most general interest is to protect academic freedom. First they alarm the reader with a reminder of Hitler (who, ironically, in addition to everything else, passed the Eugenic Sterilization Act of 1933). They then argue that many of those who attack them today are ‘militants’ and even ‘anti-scientists’ and suggest that to express hereditarian views or recommend further studies of the biological basis of behavior today is similar to being a heretic in the Middle Ages. One can argue with the petitioners quite vigorously on these matters. Rather than deplore attacks by militants and anti-scientists, we are more impressed by the fact that even when racism attained the hideous summit the apolitical scientists refer to, geneticists rarely spoke out: Though genetics were misused, geneticists remained silent. Why?” ‘The aversion of scientists for publicity and popularization is a long standing tradition in many countries. Those scientists who did go to the public were frequently suspected of charlatanism, weakness of character, and “irresponsibility”.’ It is still rare to find scientists questioning their own role. However, there is generally more awareness and response to such proposals by a number of serious researchers. It is striking that such reactions are perceived by the petitioners as the proposals of ‘militants’ and ‘anti-scientists’. Furthermore, the claim that those scientists who wish are not free today to investigate the influence of ‘race’ on behavior is absolutely false. In the past few years there has been a great deal of such research which presupposes racist and elitist ideas. Often critics of the position that heredity is the necessary basis for social distinction are at the same time philosophical nativists. Chomsky, for example, one of the most coherent critics of Skinner and Herrnstein, is also known to most psychologists for his nativist theories. This demonstrates that an interest in understanding the psychology of humanity can lead to a nativist assumption about what all people have in common. Others are interested in finding out differences between groups of people which is a use of psychology that serves an elitist ideology by presupposing it. But, as a counterpetition has stated,6 ‘theories of racial inferiority are rendered untenable by the 3. The resolution
on scientific
freedom,
cember 1972) Encounter. 4. Carlson, E.A (1972) A tormented
(Dehistory
(Book Review). Science, 180, 584-586. 5. The Committee Against Racism at the University of Connecticut, Storrs, COM.
10 Editorial
evidence of human history: every population has developed its own complex culture. Contrary to the supremacist view, the people of Africa and Asia have, at various times produced civilizations far more advanced than those existing simultaneously in Europe. Moreover, the constant geographical shift of centers of culture is in itself proof of equal capabilities of all people. It is nonsense to suppose genetic superiority wandering about the world.’ In itself the notion of superiority pertains to value judgments rather than to scientific enquiry. Although there are certain current theoretical proposals to turn psychology into a science of values,” such a science is not only premature today but it will remain so until it is based on a theory of humanity. It is in the context of such a controversy that the possible social applications of hereditarian views become most poignant. For example, we may decide as some societies have, that the ailing are to be suppressed, or, on the contrary, since they are less active physically, that they are to serve as the intellectuals. Other societies may decide that all heavy work should be produced by children since they are more docile than adults, while another society assumes that children should not have to work until puberty or adolescence. Thus the utilization of differences in humans is entirely based on a choice of values and cannot be scientifically decidable as an issue. But when any society is based on rewarding those who have the greatest socially defined ‘ability’, by placing them in a priviledged position, social conflict is an internal necessity. To study the alleged genetic basis for social conflict simply diffuses attention from the real causes. It is puzzling why there is so much interest today in this kind of research among otherwise serious scientists. There is no theory that unites the higher mental processes, so apparently mental tests are basically uninterpretable. Even the most general properties governing language and thought are still unknown to us; accordingly, how is it possible to engage in secondary enterprises such as ‘studying’ whether one group of humans is intellectually superior to another one, or to ask if ‘race’ (a difficult concept to define biologically) is a good indicator of success in this society for genetic reasons rather than for reasons of differential oppression. Given the current fundamental ignorance about cognition, a line of ‘research’ such as this can only be politically motivated, and it is with political arguments that it must be countered. To return to our original issue - the acceptance of a paper from a politically unfashionable country. The considerations raised make it clear that within every modern society the structure of social and scientific enterprises presupposes a form of elitism, just as all modern countries act unethically. This could lead to a sort of perverse censorship if we examined every article for geographical and ideological cleanliness: Such is not our intention. Rather we believe that the only hope of pulling social science 6. Skinner, B.F., (1972) Cunlularive 4 selection of‘ papers. New York,
record: Appleton
Century Croft5
Editorial 11
out of the dilemma we have outlined is to increase public and private discussion of the new ways of using science as a force for change rather than as a force for maintenance of the modus vivendi quo ante. No doubt we shall be accused of introducing political and social considerations into purely scientific debates. This does not disturb us; we hope that we have clarified why we cannot agree with the claim that science is apolitical and asocial. However, we have certainly not given concrete answers to the issues raised at the beginning of the editorial. One reason is that the social responsibility of scientists is a dynamic concept that evolves every day: The debate must be expanded and continued as an intrinsic part of all future science. J. MEHLER T. G. BEVER
N.B. The above-signed editorial does not necessarily reflect opinions of the other members of the Editorial Board. We urge readers to
consider the issues we raise and respond to them.
1
Seven
principles parsing
JOHN
of surface structure in natural language*
KIMBALL
Indiana University
Abstract In generative grammar there is a traditional distinction between sentence acceptability, having to do with performance, and sentence grammaticality, having to do with competence. The attempt of thispaper is to provide a characterization of the notion ‘acceptable sentence’ in English, with some suggestions as to how this characterization might be made universal. The procedure is to outline a set of procedures which are conjectured to be operative in the assignment of a surface structure tree to an input sentence. To some extent, these principles of parsing are modeled on certain parsing techniques formulated by computer scientists for computer languages. These principles account for the high acceptability of right branching structures, outline the role of grammatical function words in sentence perception, describe what seems to be ajixed limit on shortterm memory in linguistic processing, and hypothesize the structure of the internal syntactic processing devices. The operation of various classes of transformations with regard to preparing deep structures for input to parsing procedures such as those outlined in the paper is discussed.
1.
Introduction
In grammar there is a distinction between those sentences which are rejected by speakers on grounds of grammaticality versus those rejected for performance reasons. Thus, there is a quadripartite division of the surface structures of any language: (a) Those sentences which are both grammatical and acceptable, e.g. ‘It is raining’; * I am indebted to Frank DeRemer for many discussions on the material in this paper, particularly for providing information on current computer work in parsing, and for sug-
gestions concerning the principles discussed in section 4; also to Cathie Ringen, Jorge Hankammer, Paul Postal and the reviewers for reading and commenting on an earlier version. Cognition 2(l), pp. 15-47
16 John Kimball
(b) those sentences which are grammatical but unacceptable, e.g. ‘Tom figured that that Susan wanted to take the cat out bothered Betsy out’; (c) those which are ungrammatical but acceptable, e.g. ‘They am running’; and (d) those which are both ungrammatical and unacceptable, e.g. ‘Tom and slept the dog’. Part of the problem facing linguists, in particular psycholinguists, is to find a characterization of the processing of linguistic experience, i.e. performance, adequate to distinguish unacceptability from ungrammaticality. In the following paper I will present an interconnected set of principles of surface structure parsing in natural language which is intended to provide a characterization of the notion ‘acceptable sentence’. In particular, these hypotheses will explain the difficulty experienced by native speakers of English when sentences of the classical problem types such as in (1) are encountered. (1) a. That that that two plus two equals four surprised Jack astonished Ingrid bothered Frank. b. Joe believes that Susan left to be interesting. c. The boat floated on the water sank. d. The girl the man the boy saw kissed left. Further, these hypotheses will explain why a sentence like (2b) is not normally interpreted as (2a) in the same way that (3b) can be interpreted as (3a). (2) a. The woman that was attractive took the job. b. The woman took the job that was attractive. (3) a. The woman that was attractive fell down. b. The woman fell down that was attractive. The first section of the paper will be concerned with the traditional hypotheses that have been presented to account for the unacceptability of sentences like (1). In Section 3 I will look at some of the parsing techniques designed by computer scientists for processing the sentences of programming languages. Of particular interest will be those techniques which allow a string to be parsed top-down left-to-right building a phrase structure tree over the string as it is read, since this is the process employed by speakers of natural languages. We will find that there are restrictions on the class of languages which allow this kind of parsing, and it will be possible to examine the claim that part of the function of transformations is to put deep structures in a form to allow limited memory left-to-right parsing. It will also be possible to examine the claim (c$ Chomsky, 1965, chapter 1) that transformations function to construct a surface tree which is optimally parsed by those techniques available to native speakers. In Section 5 we will see that the known transformations of English divide themselves into three distinct classes with respect to the kinds of output structures that are produced.
Seven principles of surface structure parsing in natural language
2.
17
Some previous accounts of surface structure complexity
It is interesting to consider first attempts to explain the low acceptability of a sentence like (Id) ‘The girl the man the boy saw kissed left’. The surface parse tree for this sentence is shown in (4). (4) NP
VP
-
Npft
.cFN
I
the
I I kissed
g&l Np -Deli tl!e
N&P m/n
De% I I the boy
I saw
One hypothesis (Fodor and Garrett, 1967) is that such sentences are perceptually complex due to the low proportion of terminal symbols (words) to sentences. In general, one of the effects of transformations seems to be to reduce the non-terminal to terminal ratio in surface structures (Chomsky, 1965) as compared to deep structures; i.e., to map relatively tall structures such as (5a) into relatively flat structures such as (5b). b.
A D-C d -b
a a-b
It is argued that flaL structures are preceptually less complex, although no explanation of this fact has been offered. Why it should be that flat structures are easier to parse than deeply embedded structures is accounted for by Principle Two of Section4 below. The Fodor and Garrett hypothesis seems to be a specific form of this more general hypothesis. This specific hypothesis is, however, easily falsified by considering sentences which have fewer terminals per S than (Id), but which are perceptually much less complex. (6) is such a sentence.
18 John Kimball
(6) NP I
V&P
th&P
I
,bes I she
I runs
ma* I die
VP I w’alks
In (Id) the ratio of terminals to S-nodes is 9: 3 ; in (6) it is 7: 3. Thus, simple terminal to S ratio will not account for all cases of perceptual complexity, even those it was designed to cover. The more general terminal to non-terminal ratio is also inadequate. That this ratio is relevant to perceptual complexity can be predicted by the principles presented below in Section 4. Chomsky and Miller (1963) claim that (Id) is complex because the perceptual strategy of assigning a N (noun) to the following V (verb) is interrupted. Thus, the subject-verb relation must be assigned to the boy and saw before it can be assigned to the man and kissed, and this before it can be assigned to the girl and left. This hypothesis treats the difficulty of center embedding as one of assigning grammatical and, perhaps, semantic relations to the elements of a sentence. The principles discussed below treat the difficulty in (Id) in terms of surface tree configuration only. I conjecture, then, that it is not the impediment of a principle of semantic interpretation which is involved in the complexity of (Id). Rather, the complexity lies in the structure of the surface tree. The principles below attempt to outline the exact nature of this difficulty. Before presenting these principles, it will be useful to prepare the way for them by considering techniques of parsing designed for computer languages.
3.
Parsing algorthms for programming languages
Programming languages are often based on context free languages; the problem of designing parsers for such languages is solved by constructing a parser from the CF grammar for the language. Since the parser ‘knows’ the productions of the language, its job is then defined as that of selecting which productions had to have applied to generate a particular input string. (A usual requirement for programming languages is that their grammar be unambiugous, so the solution to the parsing problem for any string is always unique.) The parser must reconstruct the derivational history of a string from context. For example, if a grammar contains two productions with identical right hand sides, say A +X, and B +X, where X is an arbitrary string of
Seven principles of surface structure parsing in natural language
19
terminals and non-terminals, then when the parser reads X, it must decide from context which production in fact applied. For example, A might be introduced by a rule -+ E B, E -+e. Thus, when X is to be parsed as C+DA,D-+d,andBbytheruleC an A, it will always be preceded by a d and when it is to be parsed as a B, it will be preceded by an e. Given that this is the general type of problem that must be solved by a parser, then two types of questions may be asked concerning a particular language. First, is it possible to parse the string deterministically from left-to-right as it is read into the computer. And second, what is the size of the largest context that must be examined at any time by the parser to decide how to build the parse tree? For reasons of efficiency, an optimal programming language can be considered to be one in which it is possible to parse left-to-right, and for which there is a fixed finite bound on the size of the forward context which must be examined at any point. There are two general strategies used in parsing algorithms (c$ McKeeman et al., 1970). In the first, a tree is built for an input string by starting with the initial symbol of the grammar (that which is topmost in all trees generated by the grammar) and building a tree downwards to the terminal symbols. Such procedures are called top-down. The operation of such a procedure on an input string ai,. . . a,” for a grammar with an initial symbol S is illustrated below:
iii . . . a, The first step of the algorithm is to build a tree down to the first symbol of the input string, ai,. The language must be such that it is always possible to do so uniquely by examining no more than k symbols ahead of ai, (where k is fixed for the language) if the parser is to be deterministic. At this point, B acts as the new ‘initial’ symbol for some substring, and the parser operated to complete B, again working top-down. When B is filled out, the next higher node in the pathway down from S is the new initial symbol, for the purposes of the parser. The class of grammars which permit parsing by an algorithm such as that outlined above is called the class of LL(k) grammars. For such languages trees can be built top-down, and it is never necessary to look more than k symbols ahead of the given input symbol to determine what action is to be taken by the parser. The second type of parsing procedure involves building a tree from the bottom-up. The first action of such a parser is to assign the first m input symbols to some node, which is then placed at the top of a stack. Thus, in parsing ai, . . . ai,, if the ntst m symbols are dominated by B, the parser will operate as illustrated below:
20
John Kimball
Stack Input string Stage 1: ai, . . . a,at,+, . . . ain Stage 2: B aim+1 . . . ai The parse is completed when the initial symbol is the only symbol in the stack, and the input string has been completely read in. Those languages which can be parsed bottom-up by looking ahead in the input string no more than k symbols are called LR(k) languages; the grammars which generate such languages are called LR(k) grammers (c$ Knuth, 1965). In general, parsing of computer languages differs from that in natural languages in two significant ways. The first involves the fact that programming language grammars are unambiguous; a parser for a programming language yields a unique tree for each string. Not only this, but also the behavior of the computer parser must be deterministic. That is, its action at any given string of input terminals and stack configuration must be uniquely determined. A model of parsing in natural language must allow for more than one parse and should predict on the basis of considerations of surface structure complexity which parse will most likely be offered as the first choice by a native speaker. Second, and most important, parsing in computer languages differs from that in natural languages in that a computer parser is allowed an essentially unrestricted memory. For example, in the case of a parser for an LR(k) grammar, it is possible to look ahead k symbols and then decide that the appropriate action is to read in the next symbol. This can be done until the whole string is read into memory before the parse tree starts being built. On the other hand, there is considerable evidence that short-term memory (STM) in humans is quite restricted, and that a tree must be built over an input string constantly so that the initial parsed string may be cleared from STM. Principle Four below concerns what seems to be a fixed limit on linguistic STM. Because of the limitations on STM, the form of a parseable (acceptable) surface structure in natural language is quite restricted. 4.
Six or seven principles of surface structure parsing
The following principles, although presented here as distinct, are closely linked and interact in various ways, as will be pointed out during the discussion. The central principle of the scheme discussed here is the last, and from it various of the others follow deductively. 4.1
Principle One (Top-Down): Parsing in natural language proceeds according to a top-down algorithm
The operation is that parsing
of such algorithms was outlined above in Section 3. The claim made of natural languages is like that in LL(k) languages, with one variation
Seven principles of surface structure parsing in natural language
21
noted below. Thus, the first node built in such parsing is the top S, while in bottom-up parsing this is the last node to be constructed. Before considering the consequences of such an assumption, let us consider it in operation in parsing a sentence like (7). (7) That the boy and the girl left amazed us. The first step upon hearing the initial terminal that is to build a tree down from S of the form (Sa).
(8) a.
We may ask why such a tree is justified given that the initial terminal that could also have begun a sentence with a totally different structure, e.g. ‘That is a nice flower’. To reduce the number of false starts, we may add the assumption that English is a look-ahead language. That is, the speaker is allowed to hear symbols subsequent to a given terminal before making up his mind as to the appropriate or most probable tree structure. I have no immediate empirical justification for this assumption, other than general considerations of simplicity and efficiency of parsing. The assumption is based on the conjecture that it may require less computation to hold a symbol in storage without tree attachment while one, but probably no more than two, subsequent symbols are scanned, than to build a tree only to have to return to alter it. (We will see later that sentences in which large tree changes are required during parsing are indeed perceptually complex.) The reader will notice that complementizers are represented as Chomsky adjoined (cf: Kimball, 1972a, for a discussion of Chomsky adjunction) to the left of the appropriate notes, following Ross (1967). There is ample justification of this purely in terms of syntax. We will see later on that further justification arises from considering the role complementizers and other ‘function’ words play in perceptual routines. Let us continue now to parse (7). Upon reading the, (8a) becomes (8b), which becomes (8~) after boy is read.
(8) b.
2
NP
2
NP
22
John Kimball
At this point and is read, signalling a conjunction, and three new NP nodes are constructed, shown in (8d). (8)
d.
I
the
I
boy
The and here is represented as Chomsky adjoined to the phrase following it. Again, justification for this can be found in Ross (1967). The circled NP is inserted between the lower S and the first NP. Insertion of such nodes I claim is possible, and such insertion is the only deviation from the general claim that trees are built top-d0wn.l Furthermore, it seems to be the case that nodes that are so inserted are copies of already built nodes, so the
s structure is preserved. NP Once the NP the boy is completed, all subsequent material is taken as belonging to a different phrase. If there were a relative clause construction, such as ‘the boy who kissed the girl’, as soon as boy is read, the NP is closed. Reading of who occasions building a new phrase, as shown in (9). 1. As this type of parsing differs from the usual top-down procedures, we may seek a new name for it. De Remer suggests over-the-top (OTT). In fact, it may be suggested that the mechanism of parsing in fact utilised in natural language is this: Trees are not built down to single terminals but with regards to adjacent pairs of terminals (discriminant pairs). Given an initial member of a pair, a tree is built overthe-top down to the second member. This could be done in one of at least three ways: (1) The tree is built up only as far as the lowest common dominating node for the pair under consideration; (2) the tree is built up only as
far as the lowest common dominating S node for the pair, and then down to the second member; or (3) the tree is built all the way up to the highest S node, and down to the second member. As I have given it in the paper, the parsing hypothesized for natural language corresponds to this third type of OTT parsing. There are some testable consequences which result from taking one or another of the positions outlined above. These have to do with what is maintained in STM, and what is cleared into the syntactic processor. This question is discussed in more detail under principle seven below.
Seven principles of surface structure parsing in natural language
23
3-Y_
(9)
#x
Det
I
the
7 r” who “;’ boy
This observation concerning the closing a phrases is illustrative of a principle that operates generally in sentence parsing. This principle of closure will be discussed in detail below and is connected with another principle dealing with how semantic information is processed. Returning once more to (7), after the and girl are read, the tree looks like (8e).
(8) e.
When kft is read, the embedded sentence is closed, following the principle alluded to above, given look ahead to make sure no other possible parts of that sentence occurred (e.g., early, or to NY). When amazed is read a new VP in the main sentence is constructed, and an NP added as us is read. The final parse tree is shown in (Sf).
(8) f.
IL
Y-L
YP s
th:cS
v
ariazed
NP-Vp
YP N ,‘s
NP-P /2 Det I
the
4 A N and NP &t I
boy
A
Dct I
N I
the girl The ‘top-down’ principle interacts in an interesting way with the following principle.
24
John Kimball
4.2
Principle Two (Right Association): Terminal symbols optimally associate to the lowest nonterminal node.
This principle is designed in part to explain the frequently observed fact that sentences of natural language organize themselves generally into right-branching structures such as (lOa), and that these structures are perceptually less complex than leftbranching structures, such as (lob), or center embedded structures, such as (IOc). (10)
a.
b. *a&j b+ C%
A
BA a C-b D/\c
c.
A B-a b% D-c
h 1 i There is considerable evidence for the existence of such a principle. First, consider a sentence such as (11). (11) Joe figured that Susan wanted to take the train to New York out. The surface structure of this sentence is shown in (12). It will be noticed that the particle out must be associated with a node other than the lowest (and, thus, rightmost). That is, ‘out’ is not associated with the node dominating ‘New York’ but with a higher node. (12) l&VP k v?G-----
This principle also explains why a sentence that is potentially ambiguous, such as ‘Joe figured that Susan wanted to take the cat out’ is read by speakers naturally in a way such that ‘take the cat out’ is a phrase. The status of such sentences constituted a puzzle for Bever (1970a) in that no known principle other than general ‘memory limitation’ would explain the difficulty
Seven principles of surface structure parsing in natural language
25
of such sentences; he was unsure whether they should be marked ungrammatical or merely unacceptable. The sentence Bever gives is (37) (his numbering). (37) I thought the request of the astronomer who was trying at the same time to count the constellations on his toes without taking his shoes off or looking at me over. Such sentences should be counted as fully grammatical in that they are generated by general syntactic mechanisms. Their perceptual complexity is explained by Right Association, which is related to the principle of Closure discussed below. Right Association accounts for the difficulty with phrases like ‘the boy who Bill expected to leave’s ball’, or the preferred but incorrect interpretation of ‘the boy who Sam gave the ball’s book’ (incorrect reading is that it’s the ball’s book; correct is that it is the boy’s book). The reason is that the possessive ‘s optimally associates with the lowest constituent, instead of the higher NP dominating the whole phrase. This principle also explains the preferred interpretation of a sentence like (13). (13) The girl took the job that was attractive. Which is that which is not synonymous with (14). (14) The girl that was attractive took the job. However, there is a general grammatical process that would form (13) out of (14), known as Extraposition From NP. This transformation maps a sentence like (ISa) into (15b). (15) a. The girl that was attractive went to NY. b. The girl went to NY that was attractive. The reason for the preferred
interpretation
of (13) can be seen from its parse tree (16).
(16)
was
attractive
Principle Two predicts that the relative clause will be perceived as associated with the lowest, rightmost node, i.e., the job rather than as a daughter of the top S, where it would have to be interpreted as an extraposed relative. The seeming unambiguity of sentences like (13) have been taken as evidence for the existence of a special and probably quite powerful form of grammatical mechanism known as transderivational constraints. Briefly, according to the hypothesizers of such constraints, a derivation in which (13) comes from (14) is to be blocked because there
26 .John Kimball
already exists a sentence of the same form with a different interpretation. The fact is that (13) can be read as (14) with a little effort; it’s just that this reading is perceptually more difficult, due to Principle Two. Thus, this datum should be removed as evidence for the existence of transderivational constraints. Without going into the matter in detail, I conjecture that all putative evidence for these devices are of the form of that above, namely, they can be explained in terms of preferred interpretation on the basis of established principles of perception. If so, there is no reason to include transderivation constraints among the stock of possible grammatical mechanisms in the theory of universal grammar. (But c$ Hankammer, 1973.) There seem to be grammatical mechanisms to avoid the generation of sentences that would be perceptually complex under Principle Two, i.e., which would involve assigning terminals to other than the lowest, rightmost non-terminal. A transformation known as Heavy NP Shift is a case in point. This transformation moves heavy NP’s to the right hand side of sentences, where definition of ‘heavy’ is discussed in Ross (1967). Thus, a sentence like (17a) would be mapped into (17b). (17) a. Joe gave a book that was about the skinning of cats in Alberta between 1898 and 1901 to Berta. b. Joe gave to Berta a book that was about the skinning of cats in Alberta between 1898 and 1901. The perceptual complexity of (17a) can be seen in its surface parse tree (18). (18)
Seven principles of surface structure parsing in natural language
27
(It will be noticed that what are traditionally called prepositional phrases [e.g., ofcats] are here represented as NPs with Chomsky adjoined prepositions. Justification for this may be found in Ross, 1967). We can utilize the principle of Right Association in gaining a partial understanding of the complexity of sentences such as (Id) ‘The girl the man the boy saw kissed left’ which initiated the discussion of this paper. Part (but not all) of the difficulty resides in the fact that the verb kissed is optimally associated with the lowest, rightmost constituent of the tree. As this association is impossible on semantic grounds, it must receive association with a VP node in a higher sentence. The same goes for left. In this way (Id) violates Right Association. Another confirmation of this principle comes from observing the natural association of adverbs. In this respect, consider a sentence like (19). (19) NP& Jie V-h I said
I
\
rain
yesterday
The dotted lines indicate the possible association of yesterday. Compare these with the natural associations of this adverb as the sentence is interpreted. The easiest reading is that in which the adverb is read as attached to the lowest VP, next easiest is that reading in which it hangs off the middle VP, and the hardest or least likely reading is that in which it is associated with said. This is exactly the prediction made by Right Association. In fact, we can define a metric such that the perceptual complexity of a sentence is proportional to the number of nodes above the lowest rightmost node to which a terminal is attached. It is to be noted that there is a syntactic device available in English to disambiguate (19), as shown in (20a-c).
28
John Kimball
(20)
a. Yesterday Joe said that Martha expected that it would rain. b. Joe said that yesterday Martha expected it would rain. c. Joe said that Martha expected that yesterday it would rain. Notice that a sentence like (21) is read most naturally such that the adverb is associated with the higher clause. (21) Haastiin knew yesterday it rained. That is, the most natural than (22b),
structure
imputed
to (21) is that shown
in (22a) rather
(22)
it rained yesterday even though it is quite possible for an adverb to hang at the beginning of a sentence, as must be case, e.g., with (20a). That this should be the case is again predicted by the principle of Right Association. That is, the adverb must associate with the lowest, rightmost node. Conceivably, the tree could be restructured when the new embedded sentence is built, but such restructuring is very costly as discussed above and as will be elaborated below in the principle of Fixed Structure. Let us consider some apparent counter-examples to Right Association. For example, this principle requires that in ‘Joe bought the book for Susan’, ‘the book for Susan’ should be interpreted as a phrase more readily than ‘bought’, ‘the book’, and ‘for Susan’ being interpreted as separate constituents of a VP, because the new NP ‘for Susan’ should preferably be assigned to the lowest completed node. That this is not the case is a function of the interaction of parsing with semantic information accessible to the speaker during the sentence scan. In particular, the verb ‘buy’ carries lexical information with it such that the speaker would expect to hear a ‘for’ phrase in its VP. This interaction of semantics with parsing will be discussed further below when the principle of Processing is presented. In the same vein, notice that while ‘Joe cooked the peas in the pot’ is ambiguous, with either reading of equal complexity, ‘Joe rode down the street in the car’ does not carry the same ambiguity. That is, one does not read ‘the street in the car’ as a phrase because of semantics or knowledge about the world that streets usually aren’t in cars. Again we see the role of outside information influencing parsing.
Seven principles of surface structure parsing in natural language
29
Thus, the above are not counter-examples to Right Association. Rather, this principle defines the optimal functioning of the parsing algorithm if no outside effects are relevant. Its operation may be superceded by mechanisms other than the parser. The principle of Right Association operates in connection with another principle of perception, that of New Nodes. This principle is needed to account for the following observation: In processing a sentence, when the speaker has constructed a node A shown in (23a) and attached to it daughters a, b . . ., upon reading the next terminal, c, Right Association demands that c be connected as shown in (23b). However, two other things may in fact happen. First, some new node, B, may appear and be subtended under A, as in (23c), or B may be Chomsky adjoined to the right of A, as in (23d). (23)
a.
A / a b...c
c
b. a b...
c
d.
a&R . . . I c
A& A aB...
T c
All three forms of assimilating the new terminal into the existing parse tree shown in (23b-d) are observed to occur in natural language. Right Association predicts that (23a) should become (23b). New Nodes is designed to predict when (23a) will become either (23~) or (23d), i.e., when the terminal c occasions the construction of a new phrase. This principle is stated as follows:
4.3
Principle Three (New Nodes): The construction of a new node is signalled by the occurrence of a grammatical function word
There is a traditional grammatical distinction in the discussion of the parts of speech between what are called content words (nouns, verbs, adjectives, etc.) and function words (prepositions, conjunctions, etc.). In the literature of transformational grammar, this distinction surfaces in terms of the difference between lexical formatives and grammatical formatives. For the time being I will focus on just prepositions, wh-words (e.g., what, where, who, how, when, why, etc.) conjunctions, and complementizers (that, for-to, and pos-ing). Later other categories will be examined as to whether they work like function words for purposes of perception. There is syntactic evidence that grammatical formatives are Chomsky adjoined on surface structure (cf. Ross, 1967). (The assumption that this is the case is in fact not necessary to the correct operation of New Nodes, but I shall maintain the assumption in that which follows.) Thus, what is traditionally called a prepositional phrase is in fact a NP, as in (24a), and the complementizers and conjunctions appear on surface structure as in (24b,c).
30
(24)
John Kimball
a.
b. NP t haA
PrefiP
There is no direct proof that fronted wh-words, as in ‘What did he say?’ or ‘The boy who you say’ are Chomsky adjoined to the front of their clauses. As mentioned above, it makes no difference for New Nodes whether this is the case or not. Let us examine how New Nodes operates to correctly predict the parsing of a sentence like (25). (25) She asked him or she persuaded him to leave. After reading the first three words of (25) a tree such as that in (26a) is constructed. (26)
a. NPAVP I she “_P I asked
I him
At this point a conjunction is reached, and the speaker must decide whether there is a conjunction of NPs (She asked him or her to leave), of VPs (She asked him or persuaded him to leave) or of Ss, as in (25). In this case a look ahead of one word reveals that the latter is the case, and (26b) is constructed, where the new node is Chomsky adjoined to the right of the top S. (26)
b.
asked
him
Right Association says that in this case the conjunction of NPs is easiest, as it is the lowest node, that of VPs next easy, and a conjunction of Ss hardest to perceive. This seems in fact to be the case for the sentences listed above. The deeper the node from which a conjunction proceeds, the more perceptually complex the sentence. An extreme case would be: ‘Everyone said that Bill thought that Max believed that she was, although no one in his right mind who had been the movie would expect that Fred had told Sally that she was pregnant.’ The perceptual complexity arises from the large internal constituent breaks, of which there are two in the above sentence, one before the conjunction although, and one before pregnant.
Seven principles of surface structure parsing in natural language
31
When the to of to leave in (25) is reached, this is a signal that a new node, in this case a VP, is to be formed. The structure of the conjunction forces this to be Chomsky adjoined to the top S, yielding a final parse tree (26~). (26) c. s
persuaded
him
Thus, (25) illustrates how it is that both conjunctions and complementizers occasion the construction of a new node. New Nodes predicts that sentences in which the complementizers and relative pronouns have been deleted by optional rules will be perceptually more complex than those in which complementizers are present, i.e., that (27a) is more complex than (27b), and (28a) more complex than (28b). (27) a. He knew the girl left. b. He knew that the girl left. (28) a. The boy who the girl who the man saw kissed left. b. The boy the girl the man saw kissed left. There is experimental evidence to support this contention. Hakes (1972) found that sentences with complementizers were processed faster than those without complementizers. He writes: ‘When an optional cue to a sentence’s underlying grammatical relations is deleted, the difficulty of comprehending is increased. These results taken together with the numerous results on relative pronoun deletion suggest that the cue deletion effect is general and not limited to a particular cue or structure’ (pp. 283-284). Thus, New Nodes supplies a second piece in fitting together an explanation of the difficulty of a sentence like (Id). (The third, and perhaps crucial, piece comes from Principle Four below.) The particular example considered above illustrates only the nodes formed by a conjunction and the complementizer to. In sentences like (27a) or (27b), the complementizer that signals the existence of an embedded sentence. The occurrence of this complementizer here introduces a structure with three new nodes, as shown in (27~).
32
John Kimball
(27)
a. b.
Joe knew that it was a duck. That it was a duck annoyed Joe. NP
C.
$
A preposition
(28)
introduces
the node NP, as in (28).
S VP_---& Susan :Y went
to+4f B&ton
Traditionally articles (a, the) ale included as function words, and they do in fact serve to introduce new phrases, although they are not Chomsky adjoined. We should include, perhaps, all words which fill the determiner slot in surface structme: several, all, each, every, few, etc. Finally, we may consider auxiliaries, which traditionally were counted as function words. There is a debate concerning the proper surface structure of the auxiliaries. Following the Chomsky (1957) analysis, it would be (29a); Ross’ analysis (1967b) would predict (29b) as the correct structure. b. (29) a.
have
V I been
VP I 7
I
sleeping The evidence for auxiliaries occurring in a right associative configuration as shown in (29b) is quite strong, having to do with deletion, the operation of There Insertion, and other matters. If Ross is right, then auxiliaries do fit the pattern of other function words of introducing new phrases.
Seven principles of surface structure parsing in natural language
33
This statement of the role of function words in sentence perception as signallers of new phrases includes no hypothesis concerning their semantic role or their syntactic origin in deep structure. Function words themselves are among the things learned later in the process of acquisition, the first stage being that of telegraphic speech (Brown and Bellugi, 1964). Likewise, they are generally absent from ‘pidgins’. In both cases, the grammatical structures may be conjectured to be not sufficiently complicated (say, in terms of occurrence of embedded sentences) to require cues to surface structure. We may hypothesize that there is a certain permissible complexity of surface structures which do not require indicators of constituent organization. In a free word order language, not so much of the surface tree is relevant to a determination of the underlying syntactic relations, and surface structures in such languages may be flat and relatively uncomplicated compared with a language such as English; thus, such a language may not need overt cues to surface parsings. The operation of New Nodes in SOV languages needs further examination. In such languages grammatical formatives typically follow those constitutents to which they are attached (as pointed out to me by Jorge Hankammer). For cases where the constituent is a simple NP with a post-position, the principle could be operative, as this NP could be stored until a look-ahead to the post-position gave clue to its syntactic status. For large constitutents such as S’s with following complementizers, New Nodes simply is inoperative. Note, however, that such cases are not counter examples; New Nodes has the logical form of a conditional: If a grammatical function word occurs, it signals construction of a new phrase. It is possible now to consider a principle which is pervasive in application, and which is the first principle directly bearing on what short-term syntactic memory limitations 4.4
are.
Principle Four (Two Sentences): can be parsed at the same time
The constituents of no more than two sentences
The first pieces of supporting evidence for this principle come simply from considering pairs of sentences like (30a,b) and (31a,b) with respect to complexity. (30) a. That Joe left bothered Susan. b. That that Joe left bothered Susan surprised Max. c. That for Joe to leave bothers Susan surprised Max. (31) a. The boy the girl kissed slept. b. The boy the girl the man saw kissed slept. In processing both (30b, 3 1b) at some point the constituents of three different sentences must be held in memory. E.g., when the second that of (30b) is heard and recognized as a complementizer, the imputed structure is (30), where three unfinished sentences are being processed at once.
34
John Kimball
(30)
(30~) shows that repetition of ‘that’ does not here add noticeable complexity. Two Sentences provides the final principle explaining the difficulty of (Id). Part of the complexity of this sentence is due to the absence of wh-words to indicate the surface structure under the principle of New Nodes; part of the difficulty is that Right Association is violated; but the major difficulty here seems to be that the third sentence simply requires short-term memory space beyond the bounds of inherent capacity. Two Sentences is an attempt to state what that inherent capacity is. When the sentences in (30b) are nominatized, the result is much easier to parse, as in (32a).
my
cousin
(32b) shows a large left-branching structure which is, nevertheless, not difficult to comprehend and which is within one S. On the other hand, when a fourth sentence is added to (30b in 33a), the result is totally incomprehensible, while the nominalized version, (33b), is not nearly as bad.
Seven principles of surface structure parsing in natural ianguage
35
(33)
a. That that that Joe left bothered Susan surprised Max annoyed no one. b. Joe’s leaving’s bothering Susan’s surprising Max annoyed no one. One may conclude that the limitation is not on left branching, but rather on the number of Ss that must be processed at the same time. In a later discussion of semantic processing, I will discuss why this might be the case, and why it is permissible to string out relative clauses, embedded sentences, and prepositional phrases on the right of a sentence. The Two Sentences principle accounts also for why right branching relative clause structures are permissible, while center embedded structures are not. Consider a structure like (34). (34)
(34) [tree for: the dog saw the cat which chased the mouse into the house that Jack built]
It may be thought that such a sentence violates Two Sentences, because the top S is not finished until the last word of the bottom S is processed. To see why this is not so, let us say that we will consider a constituent to be ‘closed’ when the last immediately dominated rightmost daughter of that constituent is introduced in the process of parsing. In this sense, the top S is closed when the VP is reached, and the same for the second S. S1 wouldn’t be closed if, say, the sentential adverb frankly appeared sequentially after built and was to be attached to S1. Why this definition of closure is appropriate, and justified in that a phrase is through being parsed when it is closed, will be discussed below under the principle of Processing. First, however, it is necessary to consider the principle of Closure.
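To make the closure definition and the Two Sentences limit concrete, here is a minimal sketch (my own encoding, not anything taken from the text above): it scans a labelled bracketing from left to right and reports the largest number of S nodes that are simultaneously unfinished. The bracketed analyses supplied for (30a), (30b) and (33b) are rough and purely illustrative.

```python
# A minimal sketch: count how many S nodes are simultaneously open while
# scanning a bracketed string left to right. The bracketings are my own
# rough analyses, used only for illustration.

def max_open_s(bracketed: str) -> int:
    open_s = 0          # S nodes begun but not yet closed
    peak = 0
    for token in bracketed.split():
        if token == "[S":
            open_s += 1
            peak = max(peak, open_s)
        elif token == "]":
            open_s -= 1
    return peak

examples = {
    "(30a)": "[S [S That Joe left ] bothered Susan ]",
    "(30b)": "[S [S That [S that Joe left ] bothered Susan ] surprised Max ]",
    "(33b)": "[S Joe's leaving's bothering Susan's surprising Max annoyed no one ]",
}

for label, analysis in examples.items():
    n = max_open_s(analysis)
    verdict = "within" if n <= 2 else "beyond"
    print(f"{label}: at most {n} S nodes open at once ({verdict} the Two Sentences limit)")
```

On these analyses, (30b) peaks at three open S nodes, which is what Two Sentences rules out, while the nominalized (33b) never opens more than one.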
4.5 Principle Five (Closure): A phrase is closed as soon as possible, i.e., unless the next node parsed is an immediate constituent of that phrase
Closure explains in part the complexity of sentences like (1c) ‘The boat floated on the water sank’. In such sentences, as soon as the end of a potential S is reached, it is closed, unless the next phrase is also part of that S. Thus, at the end of ‘the boat floated on the water’, the assumption is that the S is closed. The remaining causes of the perceptual complexity of (1c) are accounted for below by Fixed Structure. Also, the increased perceptual complexity of a sentence like (35b) over that of (35a) is explained by Closure, as well as New Nodes and Fixed Structure.
(35) a. They knew that the girl was in the closet.
     b. They knew the girl was in the closet.
Without the complementizer to signal the embedded sentence (New Nodes), the sequence ‘they knew the girl’ would optimally be interpreted as an S (Closure) (modulo look-ahead), but when the later words were presented, the discovery that this assumption was incorrect would require a restructuring of the presumed tree, adding to complexity (Fixed Structure). Evidence for Closure derives from experiments performed by Chapin, Smith, and Abrahamson (1972). They found that clicks were attracted to preceding surface structure constituent boundaries, even when they do not mark breaks of surface clauses. In particular, when a click was placed between a preceding surface break and a following clause break, the tendency was for the click to be perceived as closer to the preceding boundary. The authors conclude from this that in imposing a parse tree on a sentence, a subject ‘attempts at each successive point to close off a constituent at the highest possible level. Thus, if a string of words can be a noun phrase, the subject assumes that it is a noun phrase and that the next element heard will be part of some subsequent constituent. Such a strategy would explain the strong preposing tendency observed in our experiment’ (p. 171). Closure interacts closely with a number of the principles discussed above. In particular, it is not clear whether this principle is distinct from the principle of Right Association, for when the latter is violated, a terminal must be placed as a daughter of a node not the lowest, rightmost in the tree. Thus, this higher node must be ‘reopened’ to have a constituent added to it, contrary to the optimal situation described in Closure. That is, consider an abstract tree like (36).
(36) [abstract tree: a node A whose daughters include B, within which E is being built, and a final daughter k]
In building such a tree, as soon as E was being built, Closure would require that A be finished. However, when k is reached, A must be re-opened to receive the new constituent. In this sense, Closure operates the same as Right Association. However, I think that there are genuine cases where Closure operates which could not be covered by Right Association. For example, a phrase like (38a) should by Closure be interpreted more readily as (38b) rather than (38c), because of the tendency to close the phrase begun with old, even though Right Association predicts the opposite, because by it the second phrase will be conjoined to the lowest NP available to it.
(38) a. old men who have small annual pensions and gardeners with thirty years of service
     b. [tree in which ‘old men who have small annual pensions’ and ‘gardeners with thirty years of service’ are conjoined NPs, so that old modifies only men]
     c. [tree in which old modifies the whole conjunction ‘men who have small annual pensions and gardeners with thirty years of service’]
Bever (1970b) accounts for the difficulty in sentences like (1c) (‘The boat floated on the water sank’) in terms of what he conjectures to be a general strategy of sentence perception. This principle (strategy B, p. 294) is that the first N . . . V . . . (N) sequence isolated in the course of parsing a sentence is interpreted as the main clause. This strategy is a particular case of Closure applied to sentences. Restated, it says that when an S node is ‘opened’ in the course of a parse, the first substring interpretable as an S (given some look-ahead) will be so interpreted. In general, when a terminal string can be interpreted as an X-phrase, it will be. With Closure established, we can now turn to Fixed Structure.
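Bever’s strategy B can itself be given a mechanical flavour. The sketch below (mine; the part-of-speech tags are hand-assigned and the categories deliberately crude) closes off the first N . . . V . . . (N) sequence it finds as the main clause, and so walks straight into the garden path of (1c).

```python
# A rough sketch of Bever's strategy B as a special case of Closure: the first
# N...V...(N) sequence isolated left to right is taken to be the main clause.
# The part-of-speech tags below are hand-assigned, for illustration only.

def first_clause(tagged):
    """Return the index just past the first N...V...(N) sequence."""
    i, n = 0, len(tagged)
    while i < n and tagged[i][1] != "N":     # find the first noun
        i += 1
    while i < n and tagged[i][1] != "V":     # find the next verb
        i += 1
    i += 1                                   # include the verb
    while i < n and tagged[i][1] in ("Det", "N", "P"):
        i += 1                               # greedily take object / PP material
    return i

sentence = [("The", "Det"), ("boat", "N"), ("floated", "V"),
            ("on", "P"), ("the", "Det"), ("water", "N"), ("sank", "V")]

cut = first_clause(sentence)
print("closed as main clause:", " ".join(w for w, _ in sentence[:cut]))
print("left over:", " ".join(w for w, _ in sentence[cut:]) or "(nothing)")
```

The leftover ‘sank’ is exactly the material that forces the costly reanalysis described under Fixed Structure below.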
4.6 Principle Six (Fixed Structure): When the last immediate constituent of a phrase has been formed, and the phrase is closed, it is costly in terms of perceptual complexity ever to have to go back to reorganize the constituents of that phrase
This principle explains the complexity of sentences like (39a,b), as explained above.
(39) a. The horse raced past the barn fell.
     b. The dog knew the cat disappeared, was rescued.
The principle is connected with the look-ahead capacities of the sentence analyzer. Part of the function of this assumed capacity is to prevent having to return to reorganize previously assigned constituents. For example, a sentence beginning with that could be continued in at least three different ways, in each of which that would be the initial constituent of a different phrase (‘That 2+2=4 is nice’, ‘that boy sang’, ‘that is a big camel’). Thus, the initial tree built down to that by Top-Down will not be determined until succeeding terminals have been scanned. From Fixed Structure we can conclude that English is a look-ahead language. The scanned but unconnected terminals occupy a certain portion of short-term memory, but not much, in that the biggest restriction here seems to be on the number of S nodes held and being processed, and the allocation of storage space is more than made up for by efficiency of parsing.
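The role of the look-ahead buffer can be illustrated with a toy classifier (my own construction; the word lists and the two-word window are arbitrary choices, not claims about the actual size of the buffer): it decides which phrase an initial that opens before any structure is fixed.

```python
# A toy sketch of how a small look-ahead buffer could decide which phrase an
# initial 'that' opens, so that Fixed Structure is never violated.
# The word lists are illustrative only.

NOUNS = {"boy", "camel", "girl"}
VERBS = {"sang", "is", "left"}

def classify_that(lookahead):
    """Guess the phrase 'that' begins, from the next scanned words."""
    nxt = lookahead[0] if lookahead else None
    if nxt in NOUNS:
        return "Det in an NP ('that boy sang')"
    if nxt in VERBS:
        return "pronoun NP ('that is a big camel')"
    return "complementizer opening an embedded S ('that 2+2=4 is nice')"

for sent in (["that", "2+2=4", "is", "nice"],
             ["that", "boy", "sang"],
             ["that", "is", "a", "big", "camel"]):
    print(" ".join(sent), "->", classify_that(sent[1:3]))
```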
4.7 Principle Seven (Processing): When a phrase is closed, it is pushed down into a syntactic (possibly semantic) processing stage and cleared from short-term memory
By this principle, when a chunk of the tree is finished (where by ‘a chunk’ is meant a node and all its immediate constituents), it is sent to processing. This principle requires the assumption that there are pointers in the processing unit to keep straight the original structure of the tree, but such devices are simple mechanisms of data organization and surely can be documented to occur in other kinds of data processing (for example, association) that humans perform. Under this assumption, consider how a sentence like (40) might be processed. At each stage, the contents of the processing unit (PU) are listed.
(40) Tom saw that the cow jumped over the moon.
[Trees (40a-f): the successive contents of the PU and of short-term memory at each stage of the parse. At one such stage, e.g., there are pointers indicating that the first NP in the PU is that dominated by the S, and the VP dominated by the S is that currently still being worked on in short-term memory.]
At the final stage there are two tree chunks in the PU, and pointers keep straight the relations among these chunks; e.g., it is kept straight which is matrix and which subordinate. Notice, in fact, that in a right branching structure the matrix sentence will always appear in the PU to the left of the embedded sentence. This suggests that the form of the tree in surface structure is relevant for the ease of its processing in the PU. I assume, further, that at any point semantic information in the PU is available for current decisions being made in constructing the parse tree, as in the different possibilities for parsing in ‘They cooked the peas in the pot’ absent in ‘They rode the street in the car’. Notice that at any given moment during the parse, not much more than a single phrase of one or at most two levels is held in short-term memory, a result of the fact that we have chosen for an example a sentence with a simple right branching structure. Sentences with center embedding require that a great deal more structure be held, simply because the higher phrases are not closed until the lower phrases are closed. Left branching structures don’t present a problem, as each chunk of the tree is snipped off and placed in the PU as it is completed.
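The following sketch is one way to make the Processing principle computational; it is my own bookkeeping, not the paper’s mechanism. Open phrases live on a short-term memory stack; marking a daughter as its parent’s last immediate constituent (something the real device would know by look-ahead or from a function word) closes the parent, which is shipped to the PU with pointers recording where its daughters’ chunks will sit.

```python
# A sketch of the Processing principle (my own bookkeeping): closed chunks are
# cleared from the short-term memory (STM) stack into a processing unit (PU),
# and pointers of the form <id> record where each daughter's chunk belongs.

class Parser:
    def __init__(self):
        self.stm = []        # open phrases: [id, label, constituents]
        self.pu = {}         # closed chunks, keyed by phrase id
        self.next_id = 0
        self.peak = 0        # most phrases ever open at once

    def open(self, label, last=False):
        """Open a phrase; last=True marks it as its parent's final daughter,
        which (by the closure definition above) closes the parent at once."""
        pid = self.next_id
        self.next_id += 1
        if self.stm:
            self.stm[-1][2].append(f"<{pid}>")   # parent points at this chunk
            if last:
                self._close_top()                # parent closes, goes to the PU
        self.stm.append([pid, label, []])
        self.peak = max(self.peak, len(self.stm))

    def word(self, w):
        self.stm[-1][2].append(w)

    def _close_top(self):
        pid, label, parts = self.stm.pop()
        self.pu[pid] = (label, parts)

    def finish(self):
        while self.stm:
            self._close_top()

# (40) 'Tom saw that the cow jumped over the moon' -- right branching
r = Parser()
r.open("S");             r.word("Tom")
r.open("VP", last=True); r.word("saw")
r.open("S'", last=True); r.word("that")
r.open("S", last=True);  r.word("the cow")
r.open("VP", last=True); r.word("jumped over the moon")
r.finish()

# A center-embedded subject, as in (31b): three S nodes open at once
c = Parser()
c.open("S"); c.word("the boy")
c.open("S"); c.word("the girl")      # relative clause: not S's last daughter
c.open("S"); c.word("the man saw")
c.finish()

print("right branching: peak open phrases =", r.peak)
print("PU chunks:", r.pu)
print("center embedding: peak open phrases =", c.peak)
```

On the right-branching (40) the stack never holds more than one open phrase at a time, while even a truncated center-embedded subject pushes the count to three, which is the situation Two Sentences excludes.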
From the principle of Processing, it is possible to deduce and thus explain some of the principles discussed above, as follows: (a) Closure results from the fact that as soon as a phrase is completed, it is pushed into the PU and thus removed from short-term memory. The longer a phrase remains uncompleted, the more of STM it takes up as its pieces are assembled. (b) Fixed Structure follows from Processing because once a phrase is formed and pushed into the PU, it should be difficult to reach down into the PU and pull the phrase back out (plus all related phrases) to rework its (their) structure. (c) New Nodes is explained because the occurrence of a function word indicates when a new phrase is begun and thus when the old one can be pushed into the PU. (d) By Processing, what is held in STM are those phrases which have not been completed in the sense of having their immediate constituents filled out. The statement of Two Sentences is thus made possible, for Processing establishes what is and is not in STM at any given time. That the specific limit should be two (versus one, three, four, etc.) does not follow from Processing. In this connection it is interesting to consider the hypothesis that the sentence is not only the unit of semantic processing, but also that of syntactic processing. Under such an hypothesis, the units which are formed in and cleared from STM are only S units. This hypothesis bears the same relation to Processing as Bever’s strategy B bears to Closure, namely, it is a specific case. At this time, I see no reason to prefer the restricted form over the more general. I conjecture, then, that syntactically all phrases are treated alike in the process of establishing the surface tree. Semantically, of course, S phrases occupy a special place. There is some evidence that the general Closure principle is correct; this same evidence seems to support Processing over the hypothesis considered above, as Closure follows from Processing. If correct, this means that syntactically the unit of perception is the phrase, while semantically the unit of perception is the sentence. This question deserves empirical investigation. It is interesting to consider which among the seven principles adduced above are universal and which particular to English. I would conjecture that Processing and all those principles that follow from it deductively are universal, that it is common to all perceptual routines for parsing surface structures that phrases are formed, closed, and pushed into a processing unit, and that the semantic manipulations occur in that unit. Likewise, the condition on memory limitations stated in Two Sentences is probably universal. Thus, except for Top Down, which is an assumption for which it would be difficult to accrue evidence, it seems at first glance that all the principles above are universal. On the other hand, it would be most productive to look at a language like Japanese, which is ‘backwards’ from English in its order of constituents in the base, to determine which sentences are perceptually complex and why. Notice, again, that none of the principles predicts that left branching structures per se, as are
found, e.g., in Japanese relative clauses, are difficult to parse. Center embeddings are difficult to process, as noted by everyone who has worked on the problem, and the principles above explain why. Notice that the explanation of complexity of surface structures offered above does not refer to the transformational ‘distance’ of a surface structure from its deep structure. That is, these principles refer only to the tree pattern and not to how closely this resembles the tree pattern that represents the semantic relations in deep structure. In this way, these principles go against some of the earliest work on sentence recognition that sought to explain the complexity of a surface form in terms of its transformational history, which has also influenced later writings. For example, Foss and Cairns (1970) write : ‘It seems reasonable to assume that the more the surface structure of a sentence distorts the grammatical relations in the structures underlying it, the more complex is the comprehension process and, hence, the more STS (short term storage) required to complete the task of understanding’ (p. 541). If the principles above are correct, then distance between deep and surface structure bears no relation to sentence perception. It may bear on sentence comprehension and the nature of the computing process in PU, but this is a different matter. So far I have said nothing concerning the internal structure of the PU, other than that its data input file consists of various tree chunks, plus an indication of how the pieces fit together. One model of the computations of the PU is that the surface tree is therein reformed, and transformations are applied ‘backwards’ to reconstruct a deep structure, which is then mapped into the meaning (under the identity mapping for a generative semanticist). I would conjecture that the sentence units of surface structure are reconstructed, being the basic units of comprehension. That is, in, say, ‘John was believed by Bill to have been seen by Susan’, part of the meaning is that John was believed and that John was seen.
5. Transformations and perceptual routines
Having a model of the perceptual mechanism, it is now possible to discuss how various transformations and classes of transformations arrange the form of surface structure with respect to optimal perception. That is, one traditional explanation for the existence of transformations in natural language (Chomsky, 1965, chapter 1) is that they serve to arrange perceptually complex deep structures into perceptually simple surface structures. Given the definition of perceptual simplicity adduced above, it will now be possible to examine this claim in detail. Research in transformational grammar has resulted in the accumulation of an inventory of transformations.
The transformations which have been discovered, however, seem to fall into two distinct classes, the cyclic versus the last-cyclic transformations, with respect to a number of properties (Kimball, 1972c). That this should be the case is in no way predicted by the general theory of transformations. The properties of transformations in these two classes are listed below.
    Cyclic                                           Last-cyclic
(1) Preserve form of input structure                 Derange input structure
(2) Make no essential use of variables               May make essential use of variables
(3) May have lexical exceptions                      No lexical exceptions
(4) Several may apply within one S                   Only one global transformation per S
(5) Seem not to introduce structural ambiguities     May introduce structural ambiguities
(6) Apply working upwards in tree                    Apply only on top S
The last-cyclic transformations themselves can be divided into two groups, according to whether a transformation is global, moving constituents over a variable, or local. For example, a global transformation would be Wh-Fronting which moves question words to the front of sentences from arbitrarily far to the right. Thus, (41a) becomes (41b) by this transformation. (41)
a. He told you to ask Jill to go to the store to find wh-book?
b. Wh-book he told you to ask Jill to go to the store to find?
c. What book did he tell you to ask Jill to go to the store to find?
(41b) becomes (41c) by Subject-Verb Inversion, which is an example of a local last-cyclic transformation. As it turns out, the local transformations are essentially ordered with respect to some global transformations. Perhaps the most interesting of the properties differentiating cyclic from last-cyclic transformations is the first, that cyclic transformations preserve the form of the input structures, while last-cyclic transformations tend to distort structure. For example, Passive operates on a structure of the form NP V NP, and produces an output structure of essentially the same form. Dative maps the structure V NP NP into one of the same form. An operation like Equi NP Deletion deletes an NP in an embedded sentence, but the sentence in the cycle on which it operates is unchanged. Likewise, Subject Raising operating on a sentence embedded in subject position of a matrix sentence results in extraposing the VP of the embedded sentence to the end of the VP of the matrix, as shown in (42a,b).
(42) a. [tree: the sentence she (to) be pretty embedded in subject position of seems]
     b. [tree after Subject Raising: she seems to be pretty, with the embedded VP at the end of the matrix VP]
(The to is the remains of a for-to complementizer.) Notice, however, that the form of the top S remains unchanged. Thus, the input structures to cyclic transformations are preserved across the operation of these transformations. On the other hand, global as well as local last-cyclic transformations result in the production of structures unlike the input structures, structures which are also quite unlike those produced by the base rules of the grammar. For example, Wh-Fronting produces the structure shown in (43), leaving a hole where the wh-NP was extracted.
(43) [tree: a wh-NP adjoined to the front of an S consisting of NP and VP, with a gap at the extraction site]
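At the level of word strings the difference between a global rule and a local one can be mimicked very crudely. In the sketch below (my own toy; real Wh-Fronting is defined over trees, and the do-support and tell/told morphology of (41c) are simply ignored) the wh-phrase is moved leftward over an unbounded stretch of material, leaving a gap, while inversion only manipulates adjacent items.

```python
# A toy, string-level picture of a global rule, Wh-Fronting, which moves a
# wh-phrase leftward over an unbounded variable and leaves a gap, versus a
# local rule that only touches adjacent items. Morphology is ignored here.

def wh_fronting(words):
    """Move the first wh-marked word, however far to the right, to the front."""
    for i, w in enumerate(words):
        if w.startswith("wh-"):
            return [w] + words[:i] + ["_gap_"] + words[i + 1:]
    return words

def subject_verb_inversion(words, aux="did"):
    # local: insert the auxiliary immediately after the fronted wh-phrase
    return words[:1] + [aux] + words[1:]

s41a = "he told you to ask Jill to go to the store to find wh-book".split()
s41b = wh_fronting(s41a)
print(" ".join(s41b))                          # wh-book he told you ... to find _gap_
print(" ".join(subject_verb_inversion(s41b)))  # crude stand-in for (41c)
```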
Extraposition moves sentences to the right, producing a structure like (44b) from (44a).
(44) a. [tree for: that 2+2=4 amused Susan]
     b. [tree for the extraposed version: it amused Susan that 2+2=4]
It is interesting now to consider the effect of the transformations of these various classes on the tree with respect to the principles of surface structure parsing. The cyclic transformations generally leave behind a tree with the same or less complexity
than the input tree. E.g., the perceptual complexity of a passive sentence is not discernibly different from that of the active. (It is true that passive sentences tend to be remembered as actives over long-term recall, but this can be taken to be a function of the fact that these sorts of memory processes are based on semantics, rather than evidence that a passive is more difficult to parse than an active.) The output tree of Subject Raising in subject position, (42b), is simpler perceptually than the input, (42a). On the other hand, the effect of the operation of last-cyclic transformations requires some scrutiny. The local last-cyclic transformations have little effect. The global last-cyclic operations are best considered by dividing them into two classes, those that move constituents to the right and those that move constituents to the left, which we may label right global last-cyclic (RGLC) and left global last-cyclic (LGLC) for convenience. Some RGLCs are listed below:
(45) a. Extraposition from NP
        the boy who was tall left → the boy left who was tall
     b. Extraposition of PP
        a review of this book will be coming out → a review will be coming out of this book
     c. Extraposition
        (44a) → (44b)
     d. Right Dislocation
        Joe gave the book that was about ducks to Susan → Joe gave it to Susan, the book that was about ducks
     e. Heavy NP Shift (discussed in Section 4)
        he asked the girl with the bright blouse to leave → he asked to leave the girl with the bright blouse
All of these transformations hang constituents out on right branches. They all may simplify the sentence in terms of principles like Right Association and Closure, for constituents internal to a sentence are made ‘lighter’, permitting closure to occur in general earlier. The relation of Heavy NP Shift to Right Association was discussed earlier. Note, however, that blind application of these rules does not in every case lead to a perceptually simpler sentence. E.g., (46a,b).
(46) a. He told the girl with the blonde eyelashes to go to the bank to ask the clerk to remove $100 from their account.
     b. He told to go to the bank to ask the clerk to remove $100 from their account the girl with the blonde eyelashes.
Finally, notice that all RGLC transformations except Heavy NP Shift leave behind some place marker in the tree. That is, although some constituent is moved, a pronoun or some lexical material remains to mark the place of the removal, and this is generally not true of LGLCs. In terms of Processing, this does not add to the complexity. For by the time the extraposed material is reached and parsed, the original material will
most likely be in the PU, and a pointer can be attached to the extraposed constituent assigning it to that place in the PU. Let us consider next some operations that move constituents to the left.
(47) a. Topicalization (moves an NP to the front of the main S)
        Joe told Martha to ask Susan to test the bagel for Will → the bagel Joe told Martha to ask Susan to test for Will
     b. Wh-Fronting (discussed above)
     c. Relative clause formation
        Joe spanked the child Bill had seen Betty kiss wh-child → Joe spanked the child which Bill had seen Betty kiss
     d. Left Dislocation (like Topicalization, except that it leaves a pronoun)
        Joe gave the book to Sally → the book, Joe gave it to Sally
Notice now that all these operations except (47d) leave no place marker to indicate the spot of removal. This may be an accident; however, from the point of view of the principles of perception none is required. How it is that the moved material is assigned the correct place for semantic analysis is a problem solved in the PU. One could imagine the moved constituents as being placed in a special category in the PU awaiting the first possible pointer assignment to a spot in the tree as it is parsed. But the difficulty in discovering the surface constituents of the tree is not increased by these operations that move elements leftwards. It could be conjectured that the RGLC rules leave a marker to indicate a place in the tree so that the extraposed material need not be assigned as a new constituent under some node already in the PU. That is, with Extraposition, for example, a place is opened and held for the extraposed sentence by the it. Without some marker, to correctly appoint the encountered extraposed sentence to its place, a new NP would have to be entered under the S and a pointer assigned to that place. This would violate the principle of Fixed Structure, for some structure that had been placed in the PU would have to be altered. Thus it is possible to explain this property of RGLC rules with respect to the operation of rules of perception. In summary, the cyclic transformations either effect no major change in structure from the point of view of perceptual complexity or, in the case of Subject Raising, may operate to hang material on right branches. A right branching tree is not difficult to parse, as predicted by Right Association and Closure. As pointed out to me by DeRemer, a right associative structure is easier for a top-down mechanism which is predictive. On the other hand, left associative structure is much easier for a bottom-up parser. One may conjecture that languages such as Japanese, with characteristic left branching, employ a mixed strategy of bottom-up and top-down parsing. RGLC transformations hang material on right branches, simplifying the tree. The fact that these transformations leave markers behind in the tree to indicate the spot
of removal is significant; an empty place remains in the tree for the extraposed material to be reassigned in the PU. Finally, LGLC transformations do not produce a tree that is perceptually more complex than the input structure. Their operation does require that the moved material be located back to the original place in the tree, but this is evidently performed in the PU, and so does not add to the perceptual complexity of the surface structure.
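The point about the it left behind by Extraposition can be put in the same PU terms as before. The following fragment (again my own bookkeeping, and only a sketch) shows the closed matrix chunk keeping a placeholder slot, so that the extraposed clause, once parsed, is linked by a pointer rather than by reopening structure already in the PU.

```python
# A small sketch of why the 'it' left by Extraposition matters under
# Processing: the chunk already shipped to the PU keeps a placeholder slot,
# so the extraposed S is later linked to it by a pointer instead of by
# reopening structure that is already closed.

pu = []                                      # closed chunks, in parse order

# 'that 2+2=4 amused Susan' extraposed to 'it amused Susan that 2+2=4':
pu.append({"label": "S",
           "parts": ["it", "amused", "Susan"],
           "placeholder": "it"})             # 'it' holds the subject slot open

# ... later the extraposed clause is reached, closed, and pointed back:
pu.append({"label": "S", "parts": ["that", "2+2=4"]})
pu[0]["filler"] = len(pu) - 1                # pointer only; nothing is reopened

print(pu[0])
print("filler chunk:", pu[pu[0]["filler"]])
```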
REFERENCES
Bever, T. G. (1970a) The influence of speech performance on linguistic structure. In G. B. F. d’Arcais and W. J. M. Levelt (Eds.) Advances in psycholinguistics. New York, American Elsevier.
Bever, T. G. (1970b) The cognitive basis for linguistic structures. In J. R. Hayes (Ed.) Cognition and the development of language. New York, John Wiley.
Brown, R. and Bellugi, U. (1964) Three processes in the child’s acquisition of syntax. Harv. educ. Rev., 34 (2), 133-151.
Chapin, P. G., Smith, T. S., and Abrahamson, A. A. (1972) Two factors in perceptual segmentation of speech. J. verb. Learn. verb. Beh., 11, 164-173.
Chomsky, N. A. (1957) Syntactic structures. The Hague, Mouton.
Chomsky, N. A. (1965) Aspects of the theory of syntax. Cambridge, M.I.T. Press.
Chomsky, N. A. and Miller, G. (1963) Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush and E. Galanter (Eds.) Handbook of mathematical psychology, vol. 2. New York, Wiley.
De Remer, F. (1971) Simple LR(k) grammars. Communications of the ACM, 14, 453-460.
Fodor, J. A. and Garrett, M. (1967) Some syntactic determinants of sentential complexity. Percept. Psychophys.
Foss, D. J. and Cairns, H. S. (1970) Some effects of memory limitation upon sentence comprehension and recall. J. verb. Learn. verb. Beh., 9, 541-547.
Hakes, D. T. (1972) Effects of reducing complement constructions on sentence comprehension. J. verb. Learn. verb. Beh., 11, 278-286.
Hankammer, J. (1973) Unacceptable ambiguity. Ling. Inq., 4 (1).
Kimball, J. (1972a) The formal theory of grammar. New Jersey, Prentice-Hall.
Kimball, J. (1972b) The modality of conditions. In J. Kimball (Ed.) Syntax and semantics, vol. 1. New York, Seminar Press.
Kimball, J. (1972c) Cyclic and linear grammars. In J. Kimball (Ed.) Syntax and semantics, vol. 1. New York, Seminar Press.
Knuth, D. E. (1965) On the translation of languages from left to right. Information and Control, 8.
McKeeman, W. M., Horning, J. J., and Wortman, D. B. (1970) A compiler generator. New Jersey, Prentice Hall.
Ross, J. R. (1967a) Constraints on variables in syntax. M.I.T. Dissertation.
Ross, J. R. (1967b) Auxiliaries as main verbs. Mimeo.
In generative grammar there is a traditional distinction between the acceptability of a sentence, which belongs to the domain of performance, and the grammaticality of a sentence, which belongs to the domain of competence. The aim of this article is to provide a characterization of the notion of ‘acceptable sentence’ in English and to suggest how this characterization might hold universally. The method is to give a set of procedures thought to be operative in assigning a surface structure tree to an input sentence. These parsing principles are partly inspired by techniques used in computer programming languages. They explain the high acceptability of right-branching structures, bring out the role of grammatical function words in sentence perception, describe what appears to be a fixed limit on short-term memory in linguistic processing, and permit a hypothesis about the structure of the internal mechanisms of syntactic processing. Finally, the various classes of transformations that can be used to prepare deep structures as input to parsing procedures are discussed.
2
The use of concrete and abstract concepts by children and adults*
R. MILLER University of Witwatersrand
Abstract
The general aim of the present study was to test the hypothesis that the younger the child the more perceptual/concrete are the concepts used. Two questions were posed. Firstly, is there a difference between children and adults in using both concrete and abstract concepts as opposed to only one kind of concept? Secondly, is there a difference between children and adults in using either concrete or abstract concepts for the first of two different kinds (concrete or abstract) of concepts used? Equivalence tasks of a forced-choice type were employed to test the use of concrete and abstract concepts. Only in a minority of cases were significant differences obtained between children and adults regarding (a) the use of both concrete and abstract concepts and (b) the first of two different kinds of concepts used.
Introduction

The work of Bolles (1937), Goldstein and Scheerer (1941), Welch (1940), and Werner (1948) led Sigel (1953) to derive the hypothesis that the younger the child the more perceptual are his organizations. This hypothesis has more recently been incorporated into Bruner’s theory of cognitive development (1966). In the typical experiments conducted by Bruner and his co-workers, subjects at various ages have been required to group or classify different objects on the basis of some similar attribute or property common to all the objects. Such a task is referred to as an equivalence task, and the concept used, as an equivalence concept. The core findings in Bruner’s experiments are
* This paper is based on part of a thesis submitted for the degree of M.A. to the University of Witwatersrand. The work was carried out
under the supervision of Professor J. W. Mann to whom the writer expresses his gratitude.
those reported by Olver and Hornsby on the development of equivalence concepts. They found that six-year old children group objects according to perceptible properties to a greater extent than older children, and that with increasing age there is a steady increase in functionally based equivalence. Bruner’s central thesis is that the child uses different modes of representing the world at various stages of development. As a consequence of ikonic representation, the young child groups objects according to perceptual attributes whereas the older child, using symbolic representation, uses more abstract attributes. It was with a view to testing some of Bruner’s assertions that the present study was designed. Although it is by no means a replication of Bruner’s work, neither in breadth nor depth, similar but not identical equivalence tasks are employed and performance on these tasks assessed according to similar criteria. In this connection, the following definitions are employed. A concrete concept is defined as a classification based on an observable attribute common to a group of objects, e.g. colour, shape, etc. An abstract concept is defined as a classification based on a feature common to a group of objects but requiring an inference of some kind, e.g. function. These definitions are based on Bruner’s notion of ‘going beyond the information given’ (1957) and are in accordance with the criteria used in his experiments. It is necessary to clarify one further term. The present experiment is concerned with the use of concepts by children and adults. This should not be confused with the formation or attainment of concepts. When discussing the use of concrete and abstract concepts, it is necessary to distinguish between two variables. Garner (1966) has expressed this as the difference between what a person ‘can do’ and ‘does’. More specifically, Price-Williams (1962) has pointed out that when prodded children use different kinds of concepts than when left to their own devices. It would appear, then, that three steps are necessary in investigating the use of concrete and abstract concepts. Firstly, to establish whether subjects of different ages do use both concrete and abstract concepts when given the opportunity to do so. This satisfies the ‘can do’ variable. Secondly, to establish whether subjects of different ages, using both kinds of concepts, differ with regard to the first of two different kinds of concepts used. This satisfies the ‘does’ variable. The results of both steps could lead to the assertion that children are more perceptual than adults, but the theoretical implications in each case would be different. The former situation could be regarded as a strong form of the assertion and the latter as a weak form. The third step involves an investigation of the nature of any differences which may be found in steps one and two, in terms of whether these differences are due to a greater or lesser use of concrete or abstract concepts by children and adults. In the present experiment the equivalence tasks were ideally of a forced-choice type with the materials being selected to facilitate two alternative equivalence classifications
and the subjects required to classify each set of objects twice. The experiment was designed to answer the following two questions, which represent the first two steps mentioned above.
1. In a task in which a set of objects may be formed into sub-sets, using either concrete or abstract concepts, is there a difference between children and adults in using both kinds of concepts to form these sub-sets?
2. Do children and adults, using both concrete and abstract concepts to form two sub-sets, differ with respect to the kind of concept used to form the first sub-set?

Method
Design
Question 1: The variables relate, on the one hand, to children and adults and, on the other, to the use of two different kinds of concepts (concrete AND abstract) and the use of only one kind of concept (concrete OR abstract). A 2 x 2 design was utilized. The two column categories of Table 2 refer respectively to the use by a subject of two different kinds of concepts and one kind of concept. The two row categories relate to children and adults.
Question 2: The variables relate to age and the kind of concept first used by those subjects who used both concrete and abstract concepts in forming two equivalence classifications. A 2 x 2 design was employed. The two column categories of Table 3 refer respectively to concrete concepts that were used first, and abstract concepts that were used first. The two row categories refer to children and adults.
Subjects

Forty-five children were randomly selected from the grade 1 pupils at two Johannesburg primary schools, the mean age for the sample being 6 years 6 months. Forty-five students were randomly selected from the first-year psychology students at the University of the Witwatersrand, the mean age for the sample being 19 years 2 months.

Materials
Eight sets of four objects each comprised the test materials. In most cases real objects were used but where this proved impractical miniature toy models were used (see Table 1). These sets of objects were so constructed that by removing one of the four objects, for each set, the remaining three would constitute a sub-set in terms of either a
concrete or abstract equivalence concept. The eight sets of objects and the concrete and abstract sub-sets, made possible by the removal of one of the four objects, are given in Table 1. In addition, the appropriate equivalence concept is indicated in brackets.
Table 1. Materials comprising the sets and sub-sets

Set 1. Complete set: Banana, Orange, Ball, Plum
       Concrete sub-set: Orange, Ball, Plum (circular shape)
       Abstract sub-set: Banana, Orange, Plum (fruit, edible, etc.)

Set 2. Complete set: Saw, Peg, Pliers, Hammer
       Concrete sub-set: Saw, Peg, Hammer (made partly of wood)
       Abstract sub-set: Saw, Pliers, Hammer (tools, work, etc.)

Set 3. Complete set: Stool (toy), Cupboard (toy), Dresser (toy), Camel (toy)
       Concrete sub-set: Stool, Dresser, Camel (four legs)
       Abstract sub-set: Stool, Cupboard, Dresser (furniture, etc.)

Set 4. Complete set: Spoon, Ball-point pen, Foot-long pencil, Mapping pen
       Concrete sub-set: Spoon, Ball-point pen, Mapping pen (size: all exactly the same length)
       Abstract sub-set: Ball-point pen, Foot-long pencil, Mapping pen (writing, etc.)

Set 5. Complete set: Blue aeroplane (toy), Red ship (toy), Red pen, Red car (toy)
       Concrete sub-set: Red ship, Red pen, Red car (red colour)
       Abstract sub-set: Blue aeroplane, Red ship, Red car (vehicles, transport, etc.)

Set 6. Complete set: Colander, Tennis racquet (toy), Ball, Round wooden bat
       Concrete sub-set: Colander, Tennis racquet, Round wooden bat (shape: a handle with a round shape)
       Abstract sub-set: Tennis racquet, Ball, Round wooden bat (sport, play, etc.)

Set 7. Complete set: Suitcase, Two books, Two coins, Two pens
       Concrete sub-set: Two books, Two coins, Two pens (number: duality)
       Abstract sub-set: Suitcase, Two books, Two pens (school, study, etc.)

Set 8. Complete set: Small round plate, Small square plate, Record, Large round plate
       Concrete sub-set: Small round plate, Record, Large round plate (circular shape)
       Abstract sub-set: Small round plate, Small square plate, Large round plate (eating, etc.)
Although the sets were constructed to yield only two possible sub-sets, it was decided prior to the experiment that, in the event of sub-sets being formed which were not anticipated by the experimenter, they would be judged on their merits. All subjects, irrespective of the nature of the sub-sets formed, were asked to furnish reasons for excluding a particular object. The four objects in each set were randomly ordered. The administration of each set with the given materials constituted a test.

Procedure

The subjects were tested individually and the materials were presented in the same order for each subject. All the subjects received the same instructions, which were as follows: ‘I am going to show you four things. Three of these things are the same and one isn’t. I want you to take away the thing that you think doesn’t belong. This sounds very easy but it isn’t really and I’ll show you why. Let’s look at these four things. (Blue triangle, blue triangle, green triangle, blue circle - all made of cardboard). These (pointing at the triangles) are all the same shape so we can take away the circle because it doesn’t belong. But these three (pointing at the two blue triangles and blue circle) are also the same because they have the same colour, so we can take away this one because it doesn’t belong. Do you understand? Now each time I show you four things I am going to let you have two turns. First, you must take away the thing you think doesn’t belong and then we’ll put it back and you must try and think of something else that doesn’t belong. Let’s do one more together before we start. Look at these four things (cup, mug, biscuit cutter, glass). These three (pointing at the cup, mug and glass) are the same because we can drink from them, so we take away this one (biscuit cutter). But these three (pointing at the cup, mug and biscuit cutter) are also the same because they all have handles, so we can take away this one (glass). Do you understand what to do?’
Results
Question 1

The observed frequencies of children and adults using two different kinds of concepts and one kind of concept are provided in Table 2. The results for each of the eight sets of materials were separately analyzed, using a chi-square test, corrected for continuity. It was decided to reject the null hypothesis at, or beyond, the 0.05 level of significance, and in all cases two-tailed tests were used. Significant differences (p

... p > .05, age group IV: X2 = 0,56; p > .05, age group V: X2 = 2,16; p > .05.

Events          I (3,7)      II (4,7)     III (5,6)    IV (6,6)     V (7,6)
                P-C   Pr+I   P-C   Pr+I   P-C   Pr+I   P-C   Pr+I   P-C   Pr+I
1. P/D/nF        6     9      8     6     10     6      8     7     11     2
2. P/nD/nF      14     1     13     1     12     3     10     5     13     0
Differences shown are very significant in the first square, significant in the second, and non-significant in the three others. From these two analyses by age groups, it appears that children below 5 or 6 use different verb forms to indicate differences in duration. From the age of 6 onwards, longer or shorter duration does not seem to influence the choice of verb forms for the description of perfective events. A different type of analysis confirms this conclusion; the six perfective events can be grouped as follows according to their duration :
events 2 and 6 (P/nD/nF and P/nD/nF/nS): 1 sec.
event 4 (P/nD/nF/S): 2 sec.
event 5 (P/D/F/nS): 5 sec.
events 1 and 3 (P/D/nF and P/D/F/S): 10 sec.
As shown by the following Figure 5, the use of present and imparfait increases with the increase of duration, and this increase is clearest between 3 and 6 years.

Figure 5. The proportion of P-C and Pr+I tenses is represented respectively by dark and blank columns for actions of 1, 2, 5 and 10 seconds' duration. Areas with full upper limit indicate this proportion for age groups I, II and III; areas with broken upper limit indicate the proportion for all age groups.
Frequence. The influence of frequence can be analysed only for the durative events, two of which are frequentative, and one non-frequentative; the three non-durative events are all non-frequentative. We saw (cf. Table 1) that differences in the use of tenses between the three D actions are significant. If we compare the tenses used for the two frequentative actions (3 and 5) with those used for the continuous action (1), we find a very significant difference (X2 = 6,6). The continuous versus frequentative feature can thus influence the choice of tenses. When actions 1 and 3, which only differ in the frequentative or continuous character of the action, are compared, a significant difference between the use of Pr+I and the use of P-C appears only for age group IV, e.g., between 6 and 7; these children use the present tense more often for continuous events than for frequentative ones. Thus, duration influences the choice of tenses for children until the age of 6, whereas, between 6 and 7, frequence seems to become predominant. This shift could explain the increases of present and imparfait in events 1 and 2 noted for age group IV (see p. 116).
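For readers who want to check the arithmetic, the chi-square values quoted here and in the tables are ordinary 2 x 2 Pearson statistics on tense counts. The sketch below is generic; as an illustration it uses the age-group V cells for events 1 and 2 from the table above and comes out close to the 2,16 reported in that caption (a Yates-corrected variant would differ slightly).

```python
# Pearson chi-square for a 2 x 2 table of tense counts. The specific counts
# below are the age-group V cells for events 1 and 2 in the table above;
# the function itself is just the standard formula.

def chi_square_2x2(a, b, c, d):
    """X^2 for the table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

x2 = chi_square_2x2(11, 2, 13, 0)   # P-C / Pr+I for events 1 and 2, age group V
print(round(x2, 2))                 # about 2.17, close to the 2,16 quoted above
```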
Table 6.
Distribution of P-C and Pr+I tenses used in the description of P/D/nF and P/D/F/S events by each age group. Difference between the results for P/D/nF and P/D/F/S events; age group I: X2 = 0,14; p > .05, age group II: X2 = 0,16; p > .05, age group III: X2 = 0,02; p > .05, age group IV: X2 = 3,98; p < .05, age group V: X2 = 0,38; p > .05.

Events          I (3,7)      II (4,7)     III (5,6)    IV (6,6)     V (7,6)
                P-C   Pr+I   P-C   Pr+I   P-C   Pr+I   P-C   Pr+I   P-C   Pr+I
1. P/D/nF        6     9      8     6     10     6      8     7     11     2
3. P/D/F/S       7     8      9     5      9     6     13     2     12     1
Success or failure. This distinction cannot be separated from other features in our investigation, since the two failure actions take half the total time of the two success actions. The small difference in tense use observed may therefore be attributed to the difference in duration (see above). Moreover, from the children’s comments it is clear that to them the failure of an action (e.g., the cow jumping over five fences without reaching the farm) is just as important a result as success.

2.2.2 Aperfective events
These events differ both in duration and frequence. As we noted before, in the description of the nD event (9: 3 sec.) the proportion of passés composés is more important than in the two D events (10 and 11: each 8 sec.), and between these two D actions, the proportion of passés composés is more important in the frequentative event (10) than in the continuous one (11). This confirms the results obtained for perfective events, but none of the differences are statistically significant.

2.2.3 Imperfective events
The two imperfective events only differ in their total duration. At no age does the difference in duration induce the children to use different tenses in their descriptions; probably, imperfective events are considered to be of indeterminate duration precisely because of the absence of any result whatsoever.
2.3 Other means of coding the dimensions of events
Apart from tenses, children use several means of expressing the different aspectual dimensions of actions: intonation, repetitions, gesture, choice of specific lexical items, and adverbs. Though it is very difficult to give a quantitative analysis of the use of those means, we can see a clear evolution, especially in the description of aperfective situations, and in the description of the four events of series B (e.g., the different jumps in actions 3, 4, 5 and 6). For the events of series B, children from 3 to 5 vary their descriptions by gestures, intonation and repetitions; gestures accompany all four events, a particular intonation is introduced in event 4 (i.e., the long jump is described as il saute, with a gesture imitating the movement and a long drawn-out vowel sound), and repetitions for events 3, 5 and 6 (il est là, là, là, là or il a sauté, sauté, sauté). Adverbs begin to appear in the descriptions from the age of 5, and after 6, typical descriptions are as follows: Il a sauté sur toutes les barrières et après il est rentré à la maison (‘he jumped over all the fences and then he went home’). For events 5 and 6, where the farmer only jumps over five of the ten fences and the horse only over one fence, numbers are already introduced in the descriptions from the age of 5 (Il a sauté sur quatre barrières, ‘he jumped over four fences’, or Il n’a sauté qu’une barrière, ‘he only jumped one fence’). In fact, the failure-actions, far from being assimilated by the children to imperfective events, give rise to very detailed descriptions of the result: (7,6) Elle a sauté par trois barrières et s’est arrêtée à la quatrième (‘she jumped over three fences and stopped at the fourth’). In the descriptions of aperfective events, adverbs are rare before 6 (though expressions such as un peu, ‘a bit’, are used for event 9). After that age adverbs appear, e.g. longtemps (‘for a long time’), and adverbial expressions such as une fois (‘once’) and beaucoup de fois (‘many times’). (7,4) event 9 (short cry): Il a crié une fois (‘he cried once’). (7,4) event 10 (series of short cries): Il a crié plus de fois (‘he cried more times’). (7,5) event 11 (long cry): Il a crié plus longtemps (‘he cried a longer time’). The youngest subjects, up to the age of 5,6, tend to use in all their descriptions the same rather vague verbs: Il marche (‘he walks’) or Il va (‘he goes’) for all six perfective actions; Il fait du bruit (‘he makes noise’) for the three aperfective actions. From 3,6 to 6, many different verbs are used: Il tire (‘he draws’), Il dépanne (‘he repairs and sets going’), Il roule (‘he drives’), Il pousse (‘he pushes’), Il avance (‘he moves forward’), etc., for the description of the same event 1. After 6, standard verbs appear: pousser (‘to push’) for events 1 and 2, sauter (‘to jump’) for events 3-6, and crier (‘to cry’) for events 9-11.
It seems important to note that the use of different tenses to express different aspectual features of the events appears at the same age as that of detailed lexical distinctions. Between 3,6 and 6, the information seems to be concentrated in the verb, sometimes reinforced by more ‘primitive’ means of expression (intonation, gestures, repetitions). After 6, part of the information is expressed by adverbs or adverbial phrases.

2.4 Focus of attention
Children’s attention can be retained by different parts of the events; the different focusings vary with age and with the type of event. Our results confirm the evolution found by Piaget (1946) and Ferreiro (1971); children of 3 to 5 start their descriptions with the result. (4,5) event 4: Elle se trouve devant la maison (‘she’s in front of the house’). (3,3) event 1: La voiture qui était dans le garage (‘the car that was in the garage’). (5,6) event 1: Une voiture avec un camion qui est dans un garage (‘a car with a truck that’s in a garage’). For event 3 (arriving at the farm after many short jumps) the first focus is often on the aim rather than on the result: (3,10) Y va à la ferme (‘he’s going to the farm’). (3,10) Il va aller à la maison (‘he’s going to go home’). This singling out of either the result or the aim as the first thing to be described (in the situations where this is possible) may already be followed by a different focus at an early age, even without encouragement by the experimenter; the sequence of description then no longer follows the actual time sequence of the event: e.g., (3,2) Elle a arrivé à la maison ... elle a sauté (‘she came home ... she jumped’). Older children either start immediately by describing the action itself, or, if they give a description in several parts, they meticulously follow the real time-sequence: (7,5) Une voiture était sur la route, un camion est venu et l’a poussée jusqu’au garage et est entré (‘there was a car on the road, a truck came and pushed it to the garage, and went inside’).
3. Discussion

The choice of tenses used in the descriptions shows that at all ages the subjects take into account the difference between perfective, aperfective and imperfective events. Actions that obtain a clear result are mostly described in the passé composé, actions without an intrinsic aim are described in the présent or the passé composé, and actions that do not lead to any result are described in the présent. Within the limits of the ages
observed, these ‘subjective’ aspects clearly influence the choice of tenses. Other, more ‘objective’ features of the events presented (duration and frequence) have no influence on the choice of tenses for imperfective events and little influence for aperfective events, but they determine significant differences for the perfective events. In the descriptions of perfective events, the use of passés composés decreases between the ages of 3 and 6, with the increase of the influence of the objective duration. Possibly, duration is not an ‘aspect’ by itself, but combined with other characteristics of the action, such as the distance covered, it gives the subject an important cue as to the interval between the start and the result of an action. If the action gives an immediate result (nD actions), the use of passé composé reaches its maximum because the observer can only focus on the result; if a certain time elapses before the result is obtained (D actions), the observer can either focus on the result or on the action process itself. The longer it takes to complete the action, the more probable becomes a focusing on the action and the use of présent. The distance covered by a moving object may have a similar influence. With the 6-year-olds, continuous D actions still give rise to an important proportion (40%) of présents, whereas frequentative D actions are described by passés composés. The frequentative feature also appears to favour a focus on the result rather than on the action itself. After the age of 7, all perfective events are generally described in the passé composé. The same trend towards an exclusive use of passés composés appears after the age of 6 for aperfective events, and simultaneously some imparfaits are introduced for imperfective events. How much light do these facts throw on the question of what determines the use of tenses in child language? Spontaneous use of past and future tenses certainly indicates that relationships of posteriority and anteriority (i.e., of the event in relation to the moment of enunciation) play a part, but our results indicate that other factors also have their importance, at least until the age of 6. All descriptions were given about 7 seconds after the termination of the events; there was therefore a clear posteriority relationship, which adult French speakers generally express by the passé composé, imparfait, plus-que-parfait and the more recondite passé simple. In current use, the passé composé expresses perfective past actions, and the imparfait imperfective past actions. Consequently, we can conclude that from the age of 6, when the trend towards passés composés for all actions becomes clearly established, and when imparfaits begin to appear, children use tenses to express the same temporal relationships as adults. Before the age of 6, however, the distinction between perfective and imperfective events seems to be of more importance than the temporal relation between action and the moment of enunciation. Imperfective actions are almost never expressed by past
tenses, and for perfective actions the use of présents is the more frequent the greater the probability of taking into account the unaccomplished part of the action. This probability is partly determined by duration, frequence and maybe other objective features we have not investigated. However, the distinction between the importance of the result as against that of the process of the action itself is the predominant, and may be the only, aspectual feature in the language of children below 6. Exclusive attention to the result of an action implies focusing on the ‘past’ character of an action; conversely, a focus on the process, without attention to the result, projects the action into a kind of perpetual present. For certain types of events, these early, incompatible focuses take on prime importance and lead the child to ignore the relationship of posteriority between enunciation and the termination of the action. Only when these privileged focuses lose their importance do children begin to express the temporal posteriority in the manner of adults, shifting attention from the character of the action itself to its temporal relation with the moment of enunciation. Finally, the older subjects use different means to express aspectual features. Young children use gestures, intonation and tenses, where 6-year-olds use adverbial expressions; after a first period of using passe-partout words for the actions, our young subjects use different lexical items for the same actions where older children use the same verb for the same type of action, though the actions differ in duration and frequence. Though we do not wish to argue that this developmental trend is simply the result of the cognitive development of such notions as time and duration as described by Piaget (1946), there is a striking parallelism with cognitive development in general, rather than directly with such operations as the coordination of time, speed and distance. In one of his latest works (1971), Piaget stresses the lack of differentiation between knowledge about physics and knowledge about logic that exists during the preoperational period (from 2 to 7 years approximately). Both logical knowledge and knowledge about properties of real objects stem from actions (mainly those performed, partially or totally, by the child himself, but also those that are observed by the child); and every action has two aspects. There is on the one hand the dimension of that which is generalizable, i.e., the way actions are coordinated, the way they can annul or compensate each other. On the other hand, there is a specific dimension, i.e., the particular characteristics of the action and its result. According to Piaget, the obstacles to the formation of the first fully coherent logical system are at least partly a result of the primacy of the particular character of each action over its general, coordinative aspect. The child is mainly interested in the physical outcome of his actions and does not yet differentiate this dimension from the general coordinations he can already perform. Since actions are not done to be undone immediately afterwards, and since their aim is some kind of change in reality, it is not surprising that the main
characteristics of the first operational system, i.e., reversibility and conservation, are still absent during this period. Though the 4- or 5-year-old knows full well the different factors of a conservation experiment, for example, he can neither dissociate them nor conciliate the different kinds of information he infers from them. In problems of time and duration these characteristics of the young child’s thinking lead to typical errors. He cannot coordinate the times of arrival and departure if he has to compare two events as to duration; even when he can already recount the simultaneity of start and finish, he will still not conclude that therefore the two events must have taken the same amount of time, except if the two movements were of the same speed and covered the same distance. Focusing on the end result, speed and time will be considered as co-varying with the distance covered; focusing on speed, this factor will overpower and deform the role of the others. For correct judgments, in which duration is ‘conserved’, the factors first have to be dissociated and then their interaction will be understood: equal duration, but different distances covered, can be explained by different speeds. Similarly, in the descriptions the child gives of the events performed in front of him, we observed a frequent mention of aim or result, a global apprehension of the different characteristics of the event expressed by the verb form itself, and finally, a dissociation of features expressed by adverbial locutions and a mainly temporal function of tenses. Our knowledge of language acquisition is still far too fragmentary to allow us to do more than point out very general mechanisms of cognitive development that appear to have explanatory value for language acquisition phenomena. Nevertheless, several other experimental results also indicate the existence of underlying cognitive mechanisms which must be one of the factors that determine the often surprising course of linguistic acquisitions (cf. Ferreiro, 1971; Sinclair and Ferreiro, 1970). A parallel of a very different kind can be found in historical linguistics. In this field it is becoming possible to go beyond the ‘establishment of an arbitrary initial stage of a phenomenon’ and to ‘study the dynamic aspects of the process of linguistic development’ (Watkins, 1969, pp. 2-3). Several studies of the Indo-European verb system have led a number of authors (Kurylowicz, 1964; Watkins, 1969) to surmise that initially this system did not comprise any temporal oppositions. There is, however, evidence of a very early opposition between injunctives and indicatives, and of the existence of the aspectual opposition between perfective and imperfective. This opposition can shift to a temporal function, by the opposition of past and present, and aspectual forms such as the desiderative can acquire a temporal function and fill the place of the future, thereby completing the temporal axis. Other aspectual distinctions such as accomplished versus generally ongoing can in their turn cause a rebuilding of the system, and certain forms (as for instance the -s aorist) may acquire a modal function. In this way, two distinct systems
can emerge, one aspectual and temporal, the other modal. In the historically attested languages, such rebuilding did not necessarily take place in the same manner nor at the same time; but to all of them Slobin's dictum (1973) about language acquisition can be applied: 'New forms first express old functions, new functions are first expressed by old forms.' Though such historical parallels are intriguing, we should of course guard against attributing explanatory value to them. Knowledge of cognitive development may help us to understand the course of language acquisition, but it can hardly be supposed that what we know about the way certain languages have changed in the course of their history can elucidate the acquisition process. In fact, if parallels such as the one we have referred to have a deeper significance than a chance resemblance, the relation may well take the opposite direction: The course of language acquisition may point towards some theory of historical development.
REFERENCES
Ferreiro, E. (1971) Les relations temporelles dans le langage de l'enfant. Genève, Droz.
Ferreiro, E. and Sinclair, H. (1971) Temporal relationships in language. Inter. J. Psychol., 6, 39-47.
Fraisse, P. (1948) Etude comparée de la perception et de l'estimation de la durée chez les enfants et les adultes. Enfance, 1, 199-211.
Grégoire, A. (1947) L'apprentissage du langage. Vol. II. Gembloux, Duculot.
Hockett, C. F. (1958) A course in modern linguistics. New York, Macmillan.
Kurylowicz, J. (1964) The inflectional categories of Indo-European. Heidelberg, Carl Winter Universitätsverlag.
Meillet, A. (1922) Introduction à l'étude comparative des langues indo-européennes. Paris, Hachette.
Piaget, J. (1946) Le développement de la notion de temps chez l'enfant. Paris, P.U.F.
Piaget, J. and Garcia, R. (1971) Les explications causales. Etudes d'épistémologie génétique, Vol. 26. Paris, P.U.F.
Sinclair, H. and Ferreiro, E. (1970) Compréhension, production et répétition de phrases au mode passif. Archives de Psychologie, 40, 1-42.
Slobin, D. I. (1973) Cognitive prerequisites for the development of grammar. In C. A. Ferguson and D. I. Slobin (Eds.), Studies of child language development. New York, Holt, Rinehart and Winston. Pp. 175-208.
Watkins, C. (1969) Indo-European origins of the Celtic verb. Dublin, The Dublin Institute for Advanced Studies.
This paper investigates the use of French verb forms by children aged between 2;11 and 8;7 (years;months). The experiment reported shows that children do not use tenses solely to indicate relations of posteriority, anteriority and simultaneity between the events described and the moment of enunciation; factors linked to aspect also intervene. Seventy-four children were asked to describe 11 actions mimed by the experimenter with toys. These actions differed in their result, their frequency and their duration. For all the children, the type of result influences the choice of verb form. More objective features such as frequency and duration exert an influence on children from 3 to 6 years of age. Beyond that age, the use of tenses begins to resemble that of adults, who use different verb forms mainly to express temporal relations. Other markers of aspect and tense show a similar development with age.
Reductionism and the nature of psychology
H. PUTNAM Harvard University
1. Reduction
A doctrine to which most philosophers of science subscribe (and to which I subscribed for many years) is the doctrine that the laws of such 'higher-level' sciences as psychology and sociology are reducible to the laws of lower-level sciences - biology, chemistry, ultimately to the laws of elementary particle physics. Acceptance of this doctrine is generally identified with belief in 'The Unity of Science' (with capitals), and rejection of it with belief in Vitalism, or Psychism, or, anyway, something bad. In this paper I want to argue that this doctrine is wrong. In later sections, I shall specifically discuss the Turing machine model of the mind - and the conception of psychology associated with reductionism and with the Turing machine model. I want to argue that while materialism is right and while it is true that the only method for gaining knowledge of anything is to rely on testing ideas in practice (and evaluating the results of the tests scientifically), acceptance of these doctrines need not lead to reductionism. I shall begin with a logical point and then apply it to the special case of psychology. The logical point is that from the fact that the behavior of a system can be deduced from its description as a system of elementary particles it does not follow that it can be explained from that description. Let us look at an example and then see why this is so. My example will be a system of two macroscopic objects, a board in which there are two holes, a square hole 1" across and a round hole 1" in diameter, and a square peg, a fraction less than 1" across. The fact to be explained is: The peg goes through the square hole, and it does not go through the round hole. One explanation is that the peg is approximately rigid under transportation and the board is approximately rigid. The peg goes through the hole that is large enough and not through the hole that is too small. Notice that the microstructure of the board and the peg is irrelevant to this explanation. All that is necessary is that, whatever
the microstructure may be, it be compatible with the board and the peg being approximately rigid objects. Suppose, however, we describe the board as a cloud of elementary particles (for simplicity, we will assume these are Newtonian elementary particles) and imagine ourselves given the position and velocity at some arbitrary time t of each one. We then describe the peg in a similar way. (Say the board is 'cloud B' and the peg is 'cloud A'.) Suppose we describe the round hole as 'region 1' and the square hole as 'region 2'. Let us say that by a heroic feat of calculation we succeed in proving that 'cloud A' will pass through 'region 2', but not through 'region 1'. Have we explained anything? It seems to me that whatever the pragmatic constraints on explanation may or may not be, one constraint is surely this: That the relevant features of a situation should be brought out by an explanation and not buried in a mass of irrelevant information. By this criterion, it seems clear that the first explanation - the one that points out that the two macro-objects are approximately rigid and that one of the two holes is big enough for the peg and the other is not - explains why 'cloud A' passes through 'region 2' and never through 'region 1', while the second - the deduction of the fact to be explained from the positions and velocities of the elementary particles, their electrical attractions and repulsions, etc. - fails to explain. If this seems counterintuitive it is for two reasons, I think. (1) We have been taught that to deduce a phenomenon in this way is to explain it. But this is ridiculous on the face of it. Suppose I deduce a fact F from G and I, where G is a genuine explanation and I is something irrelevant. Is G and I an explanation of F? Normally we would answer, 'No. Only the part G is an explanation'. Now, suppose I subject the statement G and I to logical transformations so as to produce a statement H which is mathematically equivalent to G and I (possibly in a complicated way), but such that the information G is, practically speaking, virtually impossible to recover from H. Then on any reasonable standard the resulting statement H is not an explanation of F; but F is deducible from H. I think that the description of the peg and board in terms of the positions and velocities of the elementary particles, their electrical attractions and repulsions, etc., is such a statement H: The relevant information, that the peg and the board are approximately rigid, and the relative sizes of the holes and the peg, is buried in this information, but in a useless way (practically speaking). (2) We forget that explanation is not transitive. The microstructure of the board and peg may explain why the board and the peg are rigid, and the rigidity is part of the explanation of the fact that the peg passes through one hole and not the other, but it does not follow that the microstructure, so to speak 'raw' - as an assemblage of positions, velocities, etc. - explains the fact that the peg passes through one hole and not the other. Even if the microstructure is not presented 'raw', in this sense, but the information
is organized so as to give a revealing account of the rigidity of the macro-objects, a revealing explanation of the rigidity of the macro-objects is not an explanation of something which is explained by that rigidity. If I want to know why the peg passes through one hole and not the other in a normal context (e.g., I already know that these macro-objects are rigid), then the fact that one hole is bigger than the peg all around and the other isn't is a complete explanation. That the peg and the board consist of atoms arranged in a certain way, and that atoms arranged in that way form a rigid body, etc., might also be an explanation - although one which gives me information (why the board and the peg are rigid) I didn't ask for. But at least the relevant information - the rigidity of the board and the peg, and the relation of the sizes and shapes of the holes and the peg - is still explicit. That the peg and the board consist of atoms arranged in a certain way by itself does not explain why the peg goes through one hole and not the other, even if it explains something which in turn explains that. The relation between (1) and (2) is this: An explanation of an explanation (a 'parent' of an explanation, so to speak) generally contains information I which is irrelevant to what we want to explain, and in addition it contains the information which is relevant, if at all, in a form which may be impossible to recognize. For this reason, a parent of an explanation is generally not an explanation. What follows from this is that certain systems can have behaviors to which their microstructure is largely irrelevant. For example, a great many facts about rigid bodies can obviously be explained just from their rigidity and the principles of geometry, as in the example just given, without at all going into why those bodies are rigid. A more interesting case is the one in which the higher-level organizational facts on which an explanation turns themselves depend on more than the microstructure of the body under consideration. This, I shall argue, is the typical case in the domain of social phenomena. For an example, consider the explanation of social phenomena. Marx, in his analysis of capitalism, uses certain facts about human beings - for example, that they have to eat in order to live, and they have to produce in order to eat. He discusses how, under certain conditions, human production can lead to the institution of exchange, and how that exchange in turn leads to a new form of production, production of commodities. He discusses how production of commodities for exchange can lead to production of commodities for profit, to wage labor and capital. Assume that something like this is right. How much is the microstructure of human beings relevant? The case is similar to the first example in that the specifics of the microstructure are irrelevant: What is relevant is, so to speak, an organizational result of microstructure. In the first case the relevant organizational result was rigidity: In the present case, the relevant organizational result is intelligent beings
able to modify both the forces of production and the relations of production to satisfy both their basic biological needs and those needs which result from the relations of production they develop. To explain how the microstructure of the human brain and nervous system accounts for this intelligence would be a great feat for biology; it might or might not have relevance for political economy. But there is an important difference between the two examples. Given the microstructure of the peg and the board, one can deduce the rigidity. But given the microstructure of the brain and the nervous system, one cannot deduce that capitalist production relations will exist. The same creatures can exist in pre-capitalist commodity production, or in feudalism, or in socialism, or in other ways. The laws of capitalist society cannot be deduced from the laws of physics plus the description of the human brain: They depend on 'boundary conditions' which are accidental from the point of view of physics but essential to the description of a situation as 'capitalism'. In short, the laws of capitalism have a certain autonomy vis-a-vis the laws of physics: They have a physical basis (men have to eat), but they cannot be deduced from the laws of physics. They are compatible with the laws of physics; but so are the laws of socialism and of feudalism. This same autonomy of the higher-level science appears already at the level of biology. The laws which collectively make up the theory of evolution are not deducible from the laws of physics and chemistry; from the latter laws it does not even follow that one living thing will live for five seconds, let alone that living things will live long enough to evolve. Evolution depends on a result of microstructure (variation in genotype); but it also depends on conditions (presence of oxygen, etc.) which are accidental from the point of view of physics and chemistry. The laws of the higher-level discipline are deducible from the laws of the lower-level discipline together with 'auxiliary hypotheses' which are accidental from the point of view of the lower-level discipline. And most of the structure at the level of physics is irrelevant from the point of view of the higher-level discipline; only certain features of that structure (variation in genotype, or rigidity, or production for profit) are relevant, and these are specified by the higher-level discipline, not the lower-level one. The alternatives, mechanism or vitalism, are false alternatives. The laws of human sociology and psychology, for example, have a basis in the material organization of persons and things, but they also have the autonomy just described vis-a-vis the laws of physics and chemistry. The reductionist way of looking at science both springs from and reinforces a specific set of ideas about the social sciences. Thus, human biology is relatively unchanging. If the laws of psychology are deducible from the laws of biology and (also unchanging) reductive definitions, then it follows that the laws of psychology are also unchanging. Thus the idea of an unchanging human nature - a set of structured
psychological laws, dependent on biology but independent of sociology - is presupposed at the outset. Also, each science in the familiar sequence - physics, chemistry, biology, psychology, sociology - is supposed to reduce to the one below (and ultimately to physics). Thus sociology is supposed to reduce to psychology which in turn reduces to biology via the theory of the brain and nervous system. This assumes a definite attitude towards sociology, the attitude of methodological individualism. (In conventional economics, for example, the standard attitude is that the market is shaped by the desires and preferences of individual people; no conceptual apparatus even exists for investigating the ways in which the desires and preferences of individuals are shaped by the economic institutions.) Besides supporting the idea of an unchanging human nature and methodological individualism, there is another and more subtle role that reductionism plays in one's outlook. This role may be illustrated by the effect of reductionism on biology departments: When Crick and Watson made their famous discoveries, many biology departments fired some or all of their naturalists! Of course, this was a crude mistake. Even from an extreme reductionist point of view, the possibility of explaining the behavior of species via DNA mechanisms 'in principle' is very different from being able to do it in practice. Firing someone who has a lot of knowledge about the habits of, say, bats, because someone else has a lot of knowledge about DNA is a big mistake. Moreover, as we saw above, you can't explain the behavior of bats, or whatever species, just in terms of DNA mechanisms - you have to know the 'boundary conditions'. That a given structure enables an organism to fly, for example, is not just a function of its strength, etc., but also of the density of the earth's atmosphere. And DNA mechanisms represent the wrong level of organization of the data - what one wants to know about the bat, for example, is that it has mechanisms for producing supersonic sounds, and mechanisms for 'triangulating' on its own reflected sounds ('echolocating'). The point is that reductionism comes on as a doctrine that breeds respect for science and the scientific method. In fact, what it breeds is physics worship coupled with neglect of the 'higher-level' sciences. Infatuation with what is supposedly possible 'in principle' goes with indifference to practice and to the actual structure of practice. I don't mean to ascribe to reductionists the doctrine that the 'higher-level' laws could be arrived at in the first place by deduction from the 'lower-level' laws. Reductionist philosophers would very likely have said that firing the naturalists was a misapplication of their doctrines, and that neglect of direct investigation at 'the level of sociology' would also be a misapplication of their doctrine. What I think goes on is this. Their claim that higher-level laws are deducible from lower-level laws and therefore higher-level laws are explainable by lower-level laws involves a mistake (in fact, two mistakes). It involves neglect of the structure of the higher-level explanations
which reductionists never talk about at all, and it involves neglect of the fact that more than one higher-level structure can be realized by the lower-level entities (so that what the higher-level laws are cannot be deduced from just the laws obeyed by
the 'lower-level' entities). Neglect of the 'higher-level' sciences themselves seems to me to be the inevitable corollary of neglecting the structure of the explanations in those sciences.
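The 'organizational result' point can be restated in programming terms. The following sketch is purely illustrative and not from the original text; the class names and the two functions are invented. Two quite different micro-realizations support the same higher-level property (rigidity), and the explanation of why the square peg clears the square hole but not the round one uses only that property plus elementary geometry.

```python
# Illustrative sketch only: the higher-level explanation appeals to rigidity
# and geometry, never to the particular microstructure that realizes rigidity.
import math

class WoodenLattice:            # one hypothetical micro-realization
    def is_rigid(self) -> bool:
        return True

class SteelCrystal:             # a very different micro-realization
    def is_rigid(self) -> bool:
        return True

def passes_through_square_hole(peg_side, hole_side, peg, board):
    assert peg.is_rigid() and board.is_rigid()   # all we need from the microstructure
    return peg_side < hole_side

def passes_through_round_hole(peg_side, hole_diameter, peg, board):
    assert peg.is_rigid() and board.is_rigid()
    # a rigid square peg has to fit its diagonal through a round hole
    return peg_side * math.sqrt(2) < hole_diameter

for peg, board in [(WoodenLattice(), WoodenLattice()), (SteelCrystal(), SteelCrystal())]:
    print(passes_through_square_hole(0.99, 1.0, peg, board),   # True: 0.99 < 1
          passes_through_round_hole(0.99, 1.0, peg, board))    # False: 0.99 * sqrt(2) > 1
```

The same two-line geometric account covers both realizations; swapping the microstructure changes nothing in the explanation, which is the sense in which the micro-description is 'largely irrelevant'.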
2. Turing machines
In previous papers,¹ I have argued for the hypothesis that (1) a whole human being is a Turing machine, and (2) that psychological states of a human being are Turing machine states or disjunctions of Turing machine states. In this section I want to argue that this point of view was essentially wrong, and that I was too much in the grip of the reductionist outlook just described. Let me begin with a technical difficulty. A state of a Turing machine is described in such a way that a Turing machine can be in exactly one state at a time. Moreover, memory and learning are not represented in the Turing machine model as acquisition of new states, but as acquisition of new information printed on the machine's tape. Thus, if human beings have any states at all which resemble Turing machine states, those states must (1) be states the human can be in at any time, independently of learning and memory; and (2) be total instantaneous states of the human being - states which determine, together with learning and memory, what the next state will be, as well as totally specifying the present condition of the human being ('totally' from the standpoint of psychological theory, that means). These characteristics already establish that no psychological state in any customary sense can be a Turing machine state.² Take a particular kind of pain to be a 'psychological state'. If I am a Turing machine, then my present 'state' must determine not only whether or not I am having that particular kind of pain, but also whether or not I am about to say 'three', whether or not I am hearing a shrill whine, etc. So the psychological state in question (the pain) is not the same as my 'state' in the sense of machine state, although it is possible (so far) that my machine state determines my psychological state. Moreover, no psychological theory would pretend that having a pain of a particular kind, being about to say 'three', or hearing a shrill whine, etc., all belong to one psychological state, although there could well be a machine state characterized by the fact that I was in it only when simultaneously having that pain, being about to say 'three', hearing a shrill whine, etc. So, even if I am a Turing machine, my machine states are not the same as my psychological states. My description qua Turing machine (machine table) and my description qua human being (via a psychological theory) are descriptions at two totally different levels of organization.

¹ Minds and machines, in Sidney Hook (Ed.), Dimensions of mind, New York University, 1960, pp. 148-179; Psychological predicates, in Merrill and Capitan (Eds.), Art, mind, and religion, University of Pittsburgh, 1965, pp. 37-48; The mental life of some machines, in Hector Castaneda (Ed.), Intentionality, minds, and perception, Wayne University, 1967, pp. 177-214.
² For an exposition of Turing machines, see Martin Davis (1958), Computability and unsolvability, New York, McGraw-Hill. There is also an attractive little monograph by Trachtenbrot on the subject.

So far it is still possible that a psychological state is a large disjunction (practically speaking, an almost infinite disjunction) of machine states, although no single machine state is a psychological state. But this is very unlikely when we move away from states like 'pain' (which are almost biological) to states like 'jealousy' or 'love' or 'competitiveness'. Being jealous is certainly not an instantaneous state, and it depends on a great deal of information and on many learned facts and habits. But Turing machine states are instantaneous and are independent of learning and memory. That is, learning and memory may cause a Turing machine to go into a state, but the identity of the state does not depend on learning and memory, whereas, no matter what state I am in, identifying that state as 'being jealous of X's regard for Y' involves specifying that I have learned that X and Y are persons and a good deal about social relations among persons. Thus jealousy can neither be a machine state nor a disjunction of machine states. One might attempt to modify the theory by saying that being jealous = either being in State A and having tape c1, or being in State A and having tape c2, or ... being in State B and having tape d1, or being in State B and having tape d2, or ... being in State Z and having tape y1, ... or being in State Z and having tape yn - i.e., define a psychological state as a disjunction, the individual disjuncts being not Turing machine states, as before, but conjunctions of a machine state and a tape (i.e., a total description of the content of the memory bank). Besides the fact that such a description would be literally infinite, the theory is now without content, for the original purpose was to use the machine table as a model of a psychological theory, whereas it is now clear that the machine table description, although different from the description at the elementary particle level, is as removed from the description via a psychological theory as the physico-chemical description is. I now want to make a different point about the Turing machine model. The laws of psychology, if there are 'laws of psychology' at all, need not even be compatible with the Turing machine model, or with the physico-chemical description, except in a very attenuated sense. And I don't have in mind any version of psychism. As an example, consider the laws stated by Hull in his famous theory of rote learning. Those laws specify an analytical relationship between continuous variables. Since a Turing machine is wholly 'discrete', those laws are formally incompatible
with the Turing machine model. Yet they could perfectly well be correct. The reader may at this point feel annoyed, and want to retort: Hull’s laws, if ‘correct’, are correct only with a certain accuracy, to a certain approximation. And the exact law has to be compatible with the Turing machine model, if I am a Turing machine, or with the laws of physics, if materialism is true. But there are two separate and distinct elements to this retort. (1) Hull’s laws are ‘correct’ only to a certain accuracy. True. And the statement ‘Hull’s laws are correct to within measurement error’ is perfectly compatible with the Turing machine model, with the physicalist model, etc. It is in this attenuated sense that the laws of any higher-level discipline have to be compatible with the laws of physics: It has to be compatible with the laws of physics that the higher-level laws could be true to within the required accuracy. But the model associated with the higher-level laws need not at all be compatible with the model associated with the lower-level laws. Another way of putting the same point is this. Let L be the higher-level laws as normally stated in psychology texts (or texts of political economy, or whatever). Let L* be the statement ‘L is approximately correct’. Then it is only L* that has to be compatible with the laws of physics, not L. (2) The exact law has to be compatible with the Turing machine model (or anyway the laws of physics). False. There need not be any ‘exact’ law - any law more exact than Hull’s - at the psychological level. In each individual case of rote learning, the exact description of what happened has to be compatible with the laws of physics. But the best statement one can make in the general case, at the psychological level of organization, may well be that Hull’s laws are correct to within random errors whose explanation is beneath the level of psychology. The general picture, it seems to me, is this. Each science describes a set of structures in a somewhat idealized way. It is sometimes believed that a non-idealized description, an ‘exact’ description, is possible ‘in principle’ at the level of physics; be that as it may, there is not the slightest reason to believe that it is possible at the level of psychology or sociology. The difference is this: If a model of a physical structure is not perfect, we can argue that it is the business of physics to account for the inaccuracies. But if a model of a social structure is not perfect, if there are unsystematic errors in its application, the business of accounting for those errors may or may not be the business of social science. If a model of, say, memory in functional terms (e.g., a flow chart for an algorithm) fails to account for certain memory losses, that may be because a better psychological theory of memory (different flow chart) is called for, or because on certain occasions memory losses are to be accounted for by biology (an accident in the brain, say) rather than by psychology. If this picture is correct, then ‘oversimplified’ models may well be best possible at the ‘higher’ levels. And the relationship to physics is just this: It is compatible with physics that the ‘good’ models on the higher levels should be approximately realized
by systems having the physical constitution that human beings actually have. At this point, I should like to discuss an argument proposed by Hubert Dreyfuss. Dreyfuss believes that the functional organization of the brain is not correctly represented by the model of a digital computer. As an alternative he has suggested that the brain may function more like an analogue computer (or a complicated system of analogue computers). One kind of analogue computer mentioned by Dreyfuss is the following: Construct a map of the railway system of the U.S. made out of string (the strings represent the railroad lines, the knots represent the junctions). Then to find the shortest path between any two junctions - say, Miami and Las Vegas - just pick up the map by the two corresponding knots and pull the two knots away from each other until a string between them becomes straight. That string will represent the shortest path. When Dreyfuss advanced this in conversation, I rejected it on the following grounds: I said that the physical analogue computer (the map) really was a digital computer, or could be treated as one, on the grounds that (1) matter is atomic; (2) one could treat the molecules of which the string consists as gears which are capable of assuming a discrete number of positions vis-a-vis adjacent molecules. Of course, this only says that the analogue computer can be well approximated by a system which is ‘digital’. What I overlooked is that the atomic structure of the string is irrelevant to the working of the analogue computer. Worse, I had to invent a microstructure which is just as fictitious as the idealization of a continuous string of constant length (the idea of treating molecules as ‘gears’) in order to carry through the re-description of the analogue device as a ‘digital’ device. The difference between my idealization (strings of ‘gears’) and the classical idealization (continuous strings) is that the classical idealization is relevant to the functioning of the device as an analogue computer (the device works because the strings are - approximately - continuous strings), while my idealization is irrelevant to the description of the system on any level.
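For readers who have not met the formalism, here is a minimal Turing machine simulator (an illustrative sketch; the state labels, the transition table DELTA and the run function are all invented for this example). It exhibits the two features the argument trades on: at every instant the machine occupies exactly one state drawn from a small fixed repertoire, and everything 'remembered' is written on the tape rather than encoded as new states.

```python
from collections import defaultdict

# A Turing machine is a finite control plus an unbounded tape. States are bare
# labels; the machine is in exactly one of them at a time, and all "memory"
# lives on the tape. Moves: +1 = right, 0 = stay put (a simplified variant).
DELTA = {
    ("seek_end", "1"): ("seek_end", "1", +1),  # skip over the unary input
    ("seek_end", "_"): ("write", "_", 0),      # found the blank at the end
    ("write", "_"):    ("halt", "1", 0),       # append one more mark
}

def run(tape_string: str) -> str:
    tape = defaultdict(lambda: "_", enumerate(tape_string))
    state, head = "seek_end", 0                # the total instantaneous condition
    while state != "halt":
        state, tape[head], move = DELTA[(state, tape[head])]
        head += move
    return "".join(tape[i] for i in range(min(tape), max(tape) + 1))

print(run("111"))   # '1111' - the tape changed; the repertoire of states did not
```

A run changes the tape but never enlarges the set of states, so nothing in the finite control could correspond to an acquired, content-laden condition such as being jealous of X's regard for Y.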
3. Psychology
The previous considerations show that the Turing machine model need not be taken seriously as a model of the functional organization of the brain. Of course, the brain has digital elements - the ‘yes-no’ firing of the neurons - but whether the larger organization is correctly represented by something like a flow chart for an algorithm, or by something quite different, we have no way of knowing right now. And Hull’s model for rote learning suggests that some brain processes are best conceptualized in terms of continuous rather than discrete variables. In the first section of this paper we argued that psychology need not be deducible
from the laws describing the functional organization of the human brain, and in the last section we used a psychological state (jealousy) to illustrate that the Turing machine model cannot be correct as a paradigm for psychological theory. In short, there are two different questions which have gotten confused in the literature: (1) Is the Turing machine model correct as a model for the functional organization of the human brain? and (2) Is the Turing machine model correct as a model for psychological theory? Only on the reductionist assumption that psychology is the description of the functional organization of the brain, or something very close to it, can these two questions be identified. Our answer to these two questions so far is that (1) there is little evidence that the Turing machine model is correct as a model of the functional organization of the brain; and (2) the Turing machine model cannot be correct as a model for psychological theory - i.e., psychological states are not machine states nor are they disjunctions of machine states. But what is the nature of psychological states? The idea of a fixed repertoire of emotions, attitudes, etc., independent of culture is easily seen to be questionable. An 'attitude' that we are very familiar with, for example, is the particular kind of arrogance that one person feels towards other people 'because' he does mental work and they do manual work. (The reason I put 'because' in shudder quotes is that really the causality is much more complicated - he feels arrogant because his society has successfully won him and millions of other people to the idea that the worker is 'superior' to the extent his work differs from the work of a 'common' laborer and resembles that of a manager, perhaps, or because it has won him and millions of other people to the idea that certain kinds of work are inherently 'above' most people - 'they couldn't understand' - etc.). An 'attitude' we find it almost impossible to imagine is the following: One person feeling superior to others because the first person cleans latrines and the others do not. This is not because people who clean latrines are innately inferior, nor because latrine-cleaning is innately degrading. Given the right social setting, this attitude which we cannot now imagine would be commonplace. Not only are the particular attitudes and emotions we feel culture-bound, so are the connections. For example, in our society, arrogance of mental workers is associated with extreme competitiveness; but in a different society it might be associated with the attitude that one is 'above' competing, while being no less arrogant. This might be a reflection of the difference between living in a society based on competition and living in a society based on a feudal hierarchy. Anthropological literature is replete with examples that support the idea that emotions and attitudes are culture-dependent. For example, there have existed and still exist cultures in which private property and the division of labor are unknown. An Arunta cannot imagine the precise attitude with which Marie Antoinette said 'let them
eat cake', nor the precise attitude of Richard Herrnstein towards the 'residue' of low I.Q. people which he says is being 'precipitated', nor the precise attitude which made me and thousands of other philosophers feel tremendous admiration towards John Austin for distinguishing 'Three Ways of Spilling Ink' ('intentionally', 'deliberately', and 'on purpose'). Nor can we imagine many of the attitudes which the Arunta feel, and which are bound up with their culture and religion. This suggests the following thesis: That psychology is as under-determined by biology as it is by elementary particle physics, and that people's psychology is partly a reflection of deeply entrenched societal beliefs. One advantage of this position is that it permits one to deny that there is a fixed human nature at the level of psychology, without denying that homo sapiens is a natural kind at the level of biology. Marx's thesis that there is no fixed 'human nature' which people have under all forms of social organization was not a thesis about 'nature versus nurture'.
4. Intelligence
As an example, let us take a look at the concept of ‘intelligence’ - a concept in vogue with racist social scientists these days. The concept of intelligence is both an ordinary language concept and a technical concept (under the name ‘IQ’). But the technical concept has been shaped at every point to conform to the politically loaded uses of the ordinary language concept. The three main features of the ordinary language notion of intelligence are (1) intelligence is hard or impossible to change. When one ascribes an excellent or poor performance to high or low skill there is no implication that this was not acquired or could not be changed; but when one ascribes the same performance to ‘high intelligence’ or ‘low intelligence’ there is the definite implication of something innate, something belonging to the very essence of the person involved. (2) Intelligence aids one to succeed, where the criterion of ‘success’ is the criterion of individual success, success in competition. It is built into the notion that only a few people can have a lot of intelligence. (3) Intelligence aids one no matter what the task. Intelligence is thought of as a single ability which may aid one in doing anything from fixing a car or peeling a banana to solving a differential equation. These three assumptions together amount to a certain social theory: The theory of elitism. The theory says that there are a few ‘superior’ people who have this one mysterious factor - ‘intelligence’ - and who are good at everything, and a lot of slobs who are not much good at anything. The IQ test was constructed to preserve the elitist features of the concept in the following way. (1) The IQ test was standardized so that IQ scores would not change
much with age, thus preserving the illusion of a measure of something unchanging. One can do this with any test, as long as relative standings are fairly stable. For example, suppose one takes a test of French vocabulary. One's score would presumably increase as one went through four years of college French. Now, suppose one standardized the scores so that on the average they did not increase (by simply giving less credit per item to people taking the test with one year of college French, still less credit to people with two years, etc.). Then a person with a score of 85 on 'French vocabulary' would, on the average, still score 85 after four full years. Finally, rebaptise the test, and call it a test of 'French ability' ('French intelligence'?) and lo and behold a new social distinction is born! The distinction of having high or low 'French ability'. But one is still, in the real world, just talking about high or low relative standing on plain old vocabulary tests. (2) The IQ test was 'validated' by selecting the items so that they would predict 'success' in college; the resulting correlation between IQ and 'success' - one of Herrnstein's arguments for identifying IQ with 'intelligence' - is thus 100% a statistical artifact of this method of validation. (3) The third feature of the ordinary language concept - that IQ is a single factor - was harder to ensure. All of the statistical evidence turned out to be against this hypothesis. In fact, it turns out that over a hundred different factors contribute to one's score on IQ tests. So one just takes an average, weighting the factors so that they predict success in school, and calls the result 'Intelligence Quotient'. And again, lo and behold! One has people with 'high IQ' and people with 'low IQ', 'gifted people' (a term Herrnstein et al. use interchangeably with 'high IQ') and 'dull people' (a term used interchangeably with 'low IQ'). In short, one recovers the full ordinary language use of the concept - but now with the appearance of 'scientific objectivity'. Under a less competitive form of social organization, the theory of elitism might well be replaced by a different theory - the theory of egalitarianism. This theory might say that ordinary people can do anything that is in their interest and do it well when (1) they are highly motivated, and (2) they work collectively. After all, a long tradition of libertarian thought, including such different thinkers as Marx and Kropotkin, has held that the extreme separation of mental from manual labor and the top-down organization of society are neither forced upon us by human nature nor the only conceivable forms of advanced social organization - that a society based upon what Kropotkin called mutual aid rather than competition for profit and coercion by the state is conceivable and may well represent the setting in which human potential for 'free conscious activity' and 'productive life' (in the words of the early Marx) can best develop. Those who have conceived of such a society have always anticipated that it would bring enormous opportunities for 'more science and art, more diffused knowledge and mental cultivation, more leisure for wage earners, and more capacity for intelligent pleasures', as Russell wrote in the 1920s. Egalitarian estimates of human potential rest on a number of objective facts. That motivation
plays a decisive role in acquiring almost any skill is a matter of everyone's experience. Take driving a car, for example. Anyone who is highly motivated can learn to drive exceptionally well, as a rule. One may be held back by certain emotional problems (themselves frequently connected with social attitudes), for example, nervousness or general insecurity. If one is trying to learn to drive in races, then fear or lack of fear will be a big factor. But if one is not insecure, not fearful, not so wrapped up in oneself or some problem that one lacks judgment, then one can learn to be far more skillful than average if one has the motivation. The same thing is true of any task. Again anthropological evidence is relevant. A Norwegian adolescent has 'exceptional' skiing ability by American standards. An Amazonian boy has 'fantastic' ability with a blowgun. A factory worker in Maine can sew two hundred collars in one hour - because she has to feed her family. The importance of working collectively is also evidenced in many ways. The Black and Latin prisoners in Attica Prison are presumably part of the low IQ 'residue'. But they organized brilliantly: Every popular revolution in history makes the same point, that ordinary people in a revolution can perform incredible feats of organization, planning, strategy, etc. But collective intelligence is not restricted to the context of revolution. Since the 1950s a series of studies have shown that even in the context of modern capitalist production, workers perform better, and find their jobs less dissatisfying, when the managerial hierarchy is reduced. E.g., studies in the coal mines showed that teams of workers working without a foreman (the jobs were divided up by the workers themselves; the workers rotated jobs at will, decided when they would take breaks, etc.; only the group as a whole had a production quota to meet, not the individual worker) produced better than when there was a foreman, that absenteeism was less, etc. Today when the automobile industry is experimenting with giving teams of workers a tiny bit of autonomy in this way, it is very evident that the reason for so much hierarchy, rigidity, and coercion was not, in fact, efficiency. Nor are the workers too lacking in intelligence to function without authoritarian discipline. Although it is a serious mistake, in my opinion, to consider China to be an egalitarian or libertarian society, similar experiments in China (possibly on a much larger scale, although it is hard to be sure) have had similar results. Of course, saying that people can solve any problem collectively implies something about the capacities of the individual person. Solving a problem collectively means many people individually doing many hard things. The fact that individual people can do things that are supposed to be 'beyond' them, when they are motivated to, is the basis of the fact that groups of people aiding each other can do anything, if they have to. But someone may be highly motivated in a collective struggle and 'lack motivation' when the context is one of individual competition - especially competition
in a situation which is loaded against him and his group. The same worker may be tremendously motivated in the context of organizing a strike or reorganizing a factory, given the opportunity, and tremendously 'unmotivated' when it comes to taking an IQ test. I am not arguing that the IQ test is simply a test of motivation. Among white, middle-class people - that is, among the sort of people the test was standardized on - there is little doubt that the test measures some sort of skill connected with reading and with interpreting written material and some sort of skill connected with abstract thinking, in the sense in which mathematics and logic are abstract. Nor am I concerned to deny that these skills have a significant genetic component. The high correlation between the IQ scores of identical twins is an impressive argument for the existence of such a component. But to use a test designed to show differences in the cognitive skills of white middle-class people to try to prove differences in the cognitive capacities of middle-class and working-class people, or white and black people, is unsound. Working-class people and especially disadvantaged people do not have the same motivation to acquire the 'scholastic' skills that these tests measure that middle-class people do - not because they are 'apathetic' or 'unmotivated' in some vegetable sense, but because these skills will not help them get good jobs or decent lives in this society. (For example, Bowles and Gintis³ have shown that in the absence of years of schooling or high-status family, mere possession of high IQ has negligible effect on income. They also show that the effect of schooling on income is independent of its effects on the cognitive skills measured by the adult IQ test.) Even more important is the fact that these tests do not measure any kind of absolute cognitive capacity for anyone - white or black, middle class or working class. From the fact that Mr. Jones' IQ is 100 you cannot infer how much he will know or be able to do; that depends on the cultural setting. This elementary point cannot be emphasized too strongly considering the uses currently being made of these tests. The absolute level of performance (i.e., level of knowledge displayed) on standardized intelligence tests has increased enormously since 1917. The mean IQ remains 100, of course, by definition. (Recall that IQ is a normalized measure; even if we were all as smart as Einstein, 50% of us would have IQ 'below 100'.) At one time, reading and writing were skills confined to a tiny minority. If at, say, the time of Charlemagne one had suggested that someday almost everyone would be able to read and write, and that the descendants of the serfs would have a vocabulary of many thousands of words, one would have been looked at as one would be looked at today if he suggested that in twenty years symbolic logic may be taught in elementary school, or in 200 years everyone (including the 50% with 'IQ below 100') will know relativity theory.

³ See IQ and the U.S. class structure, Social Policy, Jan.-Feb. 1973, 65-96.
But symbolic logic may well be taught in elementary school in twenty years, and I would be amazed if everyone didn't know relativity theory in 200 years. That relative standing with respect to certain cognitive skills may be largely genetically determined is socially unimportant if no upper bound is thereby set on the absolute level of skill that the great majority of people can attain. Notice that, if someone maintains that IQ scores do set an upper bound on the absolute level of skill that people can attain - say, that people with IQ below 100 will never be able to learn accounting, not even if we change the teaching methods and the motivational situation - then he is committed to the claim that the analogous contention in Charlemagne's time (with respect to literacy) was false because the 'intellectual' tasks of Charlemagne's time were still remote from the boundary of absolute human capacity, whereas accounting is close to the boundary. I find it hard to believe that we have begun to even come close to any absolute boundary of human capacity. Thus, when Herrnstein, for example, writes⁴ that the 50% of the human race with IQ below 100 are 'ineligible' for accounting, it is somewhat difficult to know what he means. If he means that under present conditions, people with IQ below 100 can't become accountants, this is uninteresting. But if he means that such people cannot become accountants even if we devise better teaching methods, and the culture is different in such a way that they are motivated to learn accounting, then this is highly implausible and in any case totally unproved. Yet it is this second, totally unproved contention that he needs to draw the conclusion that 'low IQ' people are destined to become an unemployable 'residue' in advanced industrial societies. Once again, we see the political assumptions that were built into the ordinary language notion of 'intelligence' operating. The normalization of the IQ measure automatically makes it competitive: How 'intelligent' you are is defined to be not a function of how much you can learn on any absolute scale, but what percentage of the population you can beat. And then the measure is used as if it were, after all, an absolute measure, to legitimize stratification which has, in fact, nothing to do with IQ at all.⁵

⁴ P. 51 in his article IQ, Atlantic Monthly, Sept. 1971, 43-64.
⁵ For a lucid exposition of the 'legitimizing' function of IQ, cf. the article by Bowles and Gintis cited in note 3.
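The normalization point can be made concrete with a toy calculation (an illustrative sketch, not anything from the text; the cohort numbers are invented, and the convention of mean 100 with standard deviation 15 is the usual deviation-IQ scale). Rescaling raw scores to a fixed mean and spread preserves only relative standing, so half of any population ends up 'below 100' however high its absolute performance.

```python
import statistics

def deviation_iq(raw_scores):
    """Rescale raw test scores to the conventional IQ scale (mean 100, SD 15).
    Purely relative: only a person's standing within the group survives."""
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)
    return [round(100 + 15 * (x - mean) / sd, 1) for x in raw_scores]

cohort_1917 = [12, 15, 18, 21, 24]     # invented raw scores
cohort_now  = [52, 55, 58, 61, 64]     # everyone knows far more in absolute terms...
print(deviation_iq(cohort_1917))
print(deviation_iq(cohort_now))        # ...yet the IQs come out identical
```

Because only standing within the group survives the rescaling, 'IQ below 100' names the lower half by construction and sets no absolute ceiling on what that half can learn.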
5. Psychology again
If these reflections are right, then it is worthwhile re-examining the nature of psychology.
Reductionism asserts that psychology is deducible from the functional organization of the brain. The foregoing remarks suggest that psychology is strongly determined by sociology. Which is right? The answer, I suspect, is that it 'depends on what you mean' by psychology. Chomsky remarks that 'so far as we know, animals learn according to a genetically determined program'. While scientific knowledge reflects the development of a socially determined program for learning, there can be little doubt that the possible forms of socially determined programs must in some ways be conditioned by the 'genetically determined program' and presuppose the existence of this program in the individual members of the society. The determination of the truth of this hypothesis, and the spelling out of the details, is the task of cognitive psychology. Nothing said here is meant to downgrade the importance of that task, or to downgrade the importance of determining the functional organization of the brain. Some parts of psychology are extremely close to biology - Hull's work on rote learning, much of the work on reinforcement, and so on. It is no accident that in my own reductionist papers the example of a psychological state was usually 'pain', a state which is strongly biologically marked. On the other hand, if one thinks of the parts of psychology that philosophers and clinical psychologists tend to talk about - psychological theories of 'aggression', for instance, or theories of 'intelligence', or theories of 'sexuality' - then it seems to me that one is thinking of the parts of psychology which study mainly societal beliefs and their effects in individual behavior. That these two sides of psychology are not distinguished very clearly is itself an effect of reductionism. If they were, one might have noticed that none of the literature on 'intelligence' in the past 75 years has anything in the slightest to do with illuminating the nature and structure of human cognitive capacity.
Discussion
Professors and psychological researchers: Conflicting values in conflicting roles
H. B. SAVIN University of Pennsylvania
Psychology Professor Philip Zimbardo and his colleagues hired ten undergraduates to play at being prisoners, and eleven others to be their jailers. The psychologist-directors spared no pains in constructing a convincing jail. They persuaded the Palo Alto police to arrest the 'volunteer' prisoners in squad cars, charge them with felonies at the police station and deliver them blindfolded to Zimbardo's jail, without informing them that this 'arrest' was the beginning of the experiment they had agreed to take part in. The principal result was that the guards behaved like prison guards and the prisoners like prisoners. Indeed, the guards seem to have been more brutal, and the prisoners more degraded, than one would expect them to have become after only a few days in an establishment jail; Zimbardo was outstandingly successful at simulating the most destructive aspects of prisons. Within five days, he reports, he had felt obliged to release four of his ten prisoners because of 'extreme depression, disorganized thinking, uncontrollable crying and fits of rage' (1973a, p. 48), even though he was not so squeamish as to have prevented his guards from forcing prisoners to clean toilets with their bare hands, spraying them with fire extinguishers and repeatedly making them 'do pushups, on occasion with a guard stepping on them' (p. 44). In short, one cannot make a prison a more humane institution by appointing Mr. Zimbardo its superintendent - a result no more surprising to him than to the rest of us because he knows all about man's 'dehumanizing tendency to respond to other people according to socially determined labels and often arbitrarily assigned roles' (1973a, p. 58). The roles of prisoner and guard have been much discussed of late, and their characteristics are reasonably well known. But Mr. Zimbardo's study also calls attention to the role of psychological researcher. Much research in psychology, and in medicine as well, cannot be done without subjecting people to injury, sometimes physical and sometimes psychological, sometimes temporary and sometimes permanent. When is such research justifiable? No one, surely, would object to embarrassing a few self-confident volunteers if the result were a cure for schizophrenia. But in a
great many experiments it is by no means clear that the good outweighs the harm, and the balance one strikes will depend in part, as social psychologists know better than anyone else, upon one's role. Consider again Zimbardo's study. He has acknowledged that his results were 'no surprise to sophisticated savants' (1973b, p. 123), but feels that, even if it did not contribute anything to scientific knowledge, it was worth doing because it would help to enlighten those who had not learned about the importance of roles from less melodramatic research than his.¹ Is the degradation of thirty-two young men justified by the importance of the results of this research? That depends, obviously, on one's point of view. In practice, the decision about whether to do an experiment in which the subjects are mistreated rests with the experimenter, who, of course, is likely to profit from whatever good comes of an experiment but does not experience the harm that it does to his subjects.

¹ Zimbardo makes one other claim about the value of this research: 'From what we have learned by observing the process of dehumanization and causal matrix in which pathology was so easily elicited, we can help design not only more humanitarian prisons but help average people break out of their self-imposed or socially ascribed prisons. We have begun to do the former with correctional personnel, and the latter through a fuller exposition of the psychology of imprisonment in a forthcoming book' (1973b, p. 123). Zimbardo does not explain how an experiment whose results were foreseeable can help us escape whatever metaphorical prisons we are in, nor how, except for its possible use in enlightening the ignorant, it can help him design more 'humanitarian' physical prisons.

Similar questions are raised by a great many other psychological experiments in which subjects are deceived, frightened, humiliated or maltreated in some other way. Like everyone else, psychological experimenters are bound by the code of criminal law, but the subject in a psychological experiment cannot assume that he will be treated any better than the law requires. Indeed, the police are apt to be somewhat lax in upholding the laws that might otherwise protect subjects because of a presumption that research of almost any sort is useful, or at least respectable. Professors and scientists have traditionally resisted any proposal that outsiders participate in judgments about what research ought not to be done because its objectionable side-effects outweigh its value. (The American Psychological Association continues this tradition in its newly published Ethical principles in the conduct of research with human participants, where numerous ethical precepts are set forth but the only advice given to an investigator whose proposed experiments would violate these precepts is to weigh the harm to be done by the research carefully against its possible benefits.) But, when the experimenters themselves decide how much mistreatment the importance of their work justifies them in inflicting on their subjects, the result is exactly what social psychologists would predict: Simple lying becomes a perfectly commonplace feature of even students' routine laboratory exercises; humiliation of
subjects is not uncommon; on occasion there is a hell like Zimbardo’s. Society survives in spite of its used-car salesmen, its politicians’ assistants, and a host of other people whose roles tempt them to be as obnoxious as the law allows, and it will not be destroyed by a few dozen psychologists who are similarly overzealous in the pursuit of their careers, but this particular kind of morally obtuse zeal raises special problems for the university. Most of the psychologists whose experiments involve mistreatment of human subjects are university professors, and most of their subjects are university students. Professors who, in pursuit of their own academic interests and professional advancement, deceive, humiliate, and otherwise mistreat their students are subverting the atmosphere of mutual trust and intellectual honesty without which, as we are fond of telling outsiders who want to meddle in our affairs, neither education nor free inquiry can flourish.
REFERENCES
American Psychological Association (1973) Ethical principles in the conduct of research with human participants. Washington, D.C.
Zimbardo, Philip G., Banks, W. Curtis, Haney, Craig, and Jaffe, David (1973a) The mind is a formidable jailer. The New York Times Magazine, April 8, 38ff.
Zimbardo, Philip G. (1973b) Letter. The New York Times Magazine, May 20, 123.