Cognition, 4 (1976) 303-307, ©Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands.
The influence of syntactic segmentation on perceived stress*
JAMES R. LACKNER
Brandeis University and Massachusetts Institute of Technology
BETTY TULLER
Brandeis University
Abstract
Chomsky and Halle (1968) claim that the stress and intonation of an utterance are not determined solely by physical properties of the acoustic signal but are also influenced by the syntactic organization of the utterance. Strong support for their contention has been obtained by presenting listeners with a continuously repeated string of monosyllabic words. Such sequences undergo spontaneous perceptual regrouping, sometimes producing strings with different syntactic organizations; these syntactic changes are accompanied by immediate and dramatic changes in apparent stress and intonation, although the physical signal itself never varies.
*Support was provided by the Rosenstiel Biomedical Sciences Foundation and the Spencer Foundation.
Chomsky and Halle (1968) have proposed that stress contours are not present as physical properties of utterances in anything comparable to the detail with which they are perceived: “The hearer uses certain cues and expectations to determine the syntactic structure and semantic content of an utterance. Given a hypothesis as to its syntactic structure - in particular its surface structure - he uses the phonological principles that he controls to determine a phonetic shape. The hypothesis will then be accepted if it is not too radically at variance with the acoustic material.” Recently, while studying the factors that enable a listener to maintain the perceptual integrity of the serial order of speech (Tuller and Lackner, 1976a, b), we noticed an interesting phenomenon that provides support for Chomsky and Halle’s claims about the interdependence of stress, intonation, and syntax.
In these experiments, we were trying to determine what prosodic and syntactic characteristics of the speech signal prevent it from undergoing primary auditory stream segregation (PASS). PASS, the perceptual segmentation of acoustically related elements within a continuous auditory signal into distinct spatial streams, prevents subjects from accurately resolving the relative order of repeated sequences of tones (Bregman and Campbell, 1971) or of consonant and vowel sounds (Lackner and Goldstein, 1974). To determine why PASS does not interfere with the resolution of natural speech, we had 48 subjects report the temporal order and degree of PASS of repeated sequences of four English monosyllables that varied in the degree of syntactic and intonational structure present. The sequences were of five types: 1) sentences spoken with sentence intonation, 2) sentences with list intonation, 3) scrambled sentences with sentence intonation, 4) scrambled sentences with list intonation, and 5) word lists. Types 3) and 4) were created by splicing cross-recorded copies of 1) and 2). Each sequence was 1 sec in duration and was repeated, without gaps between repetitions, for 64 sec. Full details of stimulus construction are given in Tuller and Lackner (1976a). The subjects listened over headphones to an experimental tape that included four examples each of stimulus types 1-4 as well as eight word-list sequences. Most of the subjects spontaneously reported that a repeated sequence $(a_1 a_2 a_3 a_4 a_1 a_2 a_3 a_4 \ldots)$ could be heard as $(a_1 a_2 a_3 a_4)$, $(a_2 a_3 a_4 a_1)$, $(a_3 a_4 a_1 a_2)$, or $(a_4 a_1 a_2 a_3)$ repeated, and that they could switch back and forth between these relative groupings by ‘shifting attention’.
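The alternative groupings that subjects reported are simply the cyclic rotations of the repeated four-word sequence. The Python sketch below is our illustration, not the authors' procedure. It assumes words are represented as strings, and it uses hypothetical helper names (`rotations`, `occurs_in_loop`) to enumerate the rotations of the ‘I like to fish’ sequence the authors report. It then confirms that each rotation is physically present in the same gapless looped stream.

```python
from typing import List, Tuple


def rotations(sequence: List[str]) -> List[Tuple[str, ...]]:
    """Return every cyclic rotation of a sequence.

    Looping a1 a2 a3 a4 without gaps yields a1 a2 a3 a4 a1 a2 a3 a4 ...,
    so a listener can begin a perceptual 'unit' at any of the n positions.
    """
    n = len(sequence)
    return [tuple(sequence[i:] + sequence[:i]) for i in range(n)]


def occurs_in_loop(grouping: Tuple[str, ...], sequence: List[str],
                   repetitions: int = 3) -> bool:
    """Check that `grouping`, repeated, is a contiguous stretch of the loop.

    The looped stimulus is modelled as the sequence concatenated with itself
    (no gaps); a grouping matches if its own repetition appears in that stream.
    """
    stream = sequence * (repetitions + 1)
    target = list(grouping) * repetitions
    return any(stream[i:i + len(target)] == target
               for i in range(len(stream) - len(target) + 1))


words = ["I", "like", "to", "fish"]
for grouping in rotations(words):
    print(" ".join(grouping), occurs_in_loop(grouping, words))
```

Every rotation prints `True`. That is the point of the exercise: whichever grouping is heard, the physical signal is the same.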
Moreover, the subjects found that when the relative grouping of the sequence members changed, the tonal quality and intensity of the individual elements also changed. These perceptual regroupings of repeated auditory sequences are analogous to the perceptual reorganizations originally described for visual stimuli by the Gestalt psychologists (Koffka, 1935). To clarify this phenomenon further, four additional listeners were presented with the same stimuli and asked to indicate the grouping changes and the apparent stress changes undergone by the experimental sequences. All listeners reported dramatic perceptual regroupings of the stimulus words, regroupings that were partially subject to voluntary control in that listeners could attend to, or ‘impose’, a particular pattern on a sequence. Every stimulus sequence was reported in different perceptual groupings by at least one subject. When a regrouping produced a different syntactic organization or imposed a syntactic organization on a repeated word sequence, then the apparent
stress and intonation pattern of the sequence immediately changed to fit the sequence as it was then being segmented. For example, listeners who originally perceived the repeated word list ‘at, it, ate, aid’ as having all words equally stressed found that after a few repetitions their perception of the sequence spontaneously changed to ‘ah, it’s fated’, with the first syllable of ‘fated’ stressed relative to the other elements. When perception of the sequence reverted to ‘at, it, ate, aid’, the stress pattern also immediately reverted to the original. The repeated stimulus sequence ‘the see I sun’ likewise underwent regrouping to ‘ice on the sea’. When this meaningful segmentation occurred, perceived stress and intonation immediately changed so that ‘sea’ was stressed and a pause was heard between ‘on’ and ‘the’. No pause was heard between ‘sun’ and ‘the’ in the original sequence form, nor was any pause actually present. All subjects reported stress changes accompanying syntactic regroupings. Although changes in apparent segmentation occurred for all sequences, the experimental design permits an evaluation of the relative influence of syntax and intonation on the effects of resegmentation. The changes in stress and intonation are especially dramatic with repeated stimulus sequences that can form acceptable syntactic groupings in more than one segmentation. For example, the repeated sequence ‘I like to fish’ was often perceived as ‘to fish, I like’. As the experimenters themselves experienced, when one voluntarily switches between these segmentations, the changes in stress and intonation are so startling that it is difficult to believe the acoustic signal has not been changed. Moreover, in the latter segmentation a considerable pause is heard after ‘fish’, which is perceived as having primary stress; no pause is actually present in the signal.
Listeners reported that although it is very easy to change voluntarily the segmentation of word list sequences, these changes result in much less dramatic stress and intonation changes than those resulting from syntactic regroupings. Moreover, when a syntactically organized sequence spoken with sentential intonation undergoes a non-meaningful regrouping much of the originally perceived stress is then lost. It is also easier to switch segmentation between two non-syntactically organized groupings than to impose a non-meaningful regrouping on a syntactically organized sequence. Some years ago Lieberman (1963, 1965) showed that the linguist, when using the Trager-Smith notation for marking stress contours, does not rely solely on physical properties of the acoustic signal such as fundamental frequency, amplitude and duration, but also utilizes his knowledge of English stress rules and the syntactic segmentations of preceding words when assigning stress to subsequent words. As a consequence, different preceding contexts can influence the stress ratings assigned by a linguist
to the same acoustic object. By contrast, in our experiments only the perceptual segmentation of the signal varied, never the physical context. It is unlikely that the stress changes associated with different perceptual segmentations can be attributed to sensitivity changes at some peripheral level of sensory representation, because the changes in perceptual segmentation are partly under voluntary control; moreover, it is as easy to ‘switch’ apparent segmentation toward the end of a stimulus train as it is after only a few repetitions. The influence of perceptual regroupings on the apparent stress and intonation of physically identical language stimuli may reflect the activity of constancy mechanisms analogous, at least in part, to the constancy mechanisms known to operate in the visual and auditory systems. For example, the perceived size associated with a particular retinal image depends on the represented distance of the stimulus object from the observer; identical physical stimuli at the retina lead to perceptual objects of different sizes if the represented distances of the external objects from the observer differ (von Holst, 1954). Analogously, perceptual resegmentation of repeated linguistic sequences causes physically identical linguistic stimuli to be heard with different stress and intonation contours, depending on their apparent syntactic segmentation. Hence it is an intriguing possibility that a linguistic constancy mechanism exists by which the listener relates the acoustic signal to his internalized syntactic rules. Regardless of the specific mechanisms involved, it seems clear that any adequate model of speech perception must approach the organism as “... a system which is already actively excited and organized. In the intact organism, behavior is the result of interaction of this background of excitation with input from any designated stimulus” (Lashley, 1951).
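The visual size-constancy example can be made concrete with elementary geometry. A fixed visual angle, that is, a fixed retinal image, corresponds to a linear size proportional to the distance at which the object is represented. The Python sketch below assumes the relation size = 2·D·tan(θ/2). This formalization is our choice for illustration and is not given by the authors. The angle and distances are hypothetical values.

```python
import math


def linear_size(visual_angle_deg: float, represented_distance: float) -> float:
    """Object size implied by a visual angle at a represented distance.

    Assumes the geometric relation size = 2 * D * tan(theta / 2); this is a
    standard optics formula used for illustration, not a model from the paper.
    """
    theta = math.radians(visual_angle_deg)
    return 2.0 * represented_distance * math.tan(theta / 2.0)


# The same retinal image (a fixed 2-degree visual angle) ...
angle_deg = 2.0
# ... attributed to an object at two different represented distances (metres).
near = linear_size(angle_deg, 1.0)
far = linear_size(angle_deg, 4.0)
print(f"near: {near:.3f} m, far: {far:.3f} m, ratio: {far / near:.1f}")
```

Identical input (the angle) yields different perceived sizes purely as a function of the represented distance. This is the structure of the analogy the authors draw to stress heard under different syntactic segmentations.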
References
Bregman, A. and Campbell, J. (1971). Primary auditory stream segregation and perception of order in rapid sequences of tones. J. exp. Psychol., 89, 244-249.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. New York, Harper and Row.
Holst, E. von (1954). Relations between the central nervous system and the peripheral organs. Brit. J. animal Behav., 2, 89-94.
Koffka, K. (1935). Principles of Gestalt Psychology. New York, Harcourt, Brace.
Lackner, J. and Goldstein, L. (1974). Primary auditory stream segregation of repeated consonant-vowel sequences. J. acoust. Soc. Amer., 56, 1651-1652.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral Mechanisms in Behavior. New York, Wiley.
Lieberman, P. (1963). Effects of semantic and grammatical context on perception and production. Lang. Speech, 6, 172-187.
Lieberman, P. (1965). On the acoustic basis of the perception of intonation by linguists. Word, 21, 40-54.
Tuller, B. and Lackner, J. (1976a). Primary auditory stream segregation of repeated word sequences. Percept. mot. Skills, 42, 1071-1074.
Tuller, B. and Lackner, J. (1976b). Some factors affecting the perceived ordering of natural speech stimuli. Submitted to Percept. mot. Skills.
Résumé
Chomsky and Halle (1968) maintain that the stress and intonation of an utterance are determined not only by the physical properties of the acoustic signal but also by the syntactic organization of the utterance. Results obtained by having listeners hear the continuous repetition of a string of monosyllabic words strongly support this thesis. These sequences undergo spontaneous perceptual regroupings, which sometimes result in sequences with a different syntactic organization. These syntactic variations are then accompanied by immediate and marked changes in apparent intonation and stress, even though the physical signal does not in fact vary.
Cognition, 4 (1976) 309, ©Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands.
On recursive reference
JOHN MORTON
MRC Applied Psychology Unit, Cambridge, UK
Further details of this paper can be found in Morton (1976).
Reference
Morton, J. (1976). On recursive reference. Cognition, 4, iv, p. 309.
Cognition, 4 (1976) 311-319, ©Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands.
Coding modality vs. input modality in hypermnesia: Is a rose a rose a rose?
MATTHEW HUGH ERDELYI*, SHIRA FINKELSTEIN, NADEANNE HERRELL, BRUCE MILLER and JANE THOMAS
Brooklyn College, C.U.N.Y., and Rutgers - The State University
Abstract
Recent experiments indicate that recall of pictures, unlike recall of words, may increase (hypermnesia) with time and effort. This study demonstrates that by recoding word inputs into ‘mental pictures’, i.e., images, subjects can transform inert word lists into hypermnesic ones. Thus, when word inputs are recoded into images, recall increases over time and in other respects resembles the recall of pictures.
*We wish to thank Robert Crowder, Ralph Haber, Elizabeth and Geoffrey Loftus, Arthur Melton, Ralph Miller, Ulric Neisser, Don Norman, Don Scarborough, and John Santa for critically reading and commenting on earlier drafts of this article. Different phases of the present research were supported by PHS/NIMH Grant 1 R03 MH25876-01 and by the City University of New York Faculty Research Award Program Grants 10261 and 10704 to the senior author. Requests for reprints should be addressed to Matt Erdelyi, Department of Psychology, Brooklyn College, C.U.N.Y., Brooklyn, N.Y. 11210.
Recent work in our laboratory has uncovered an unusual ‘forgetting’ phenomenon (Erdelyi and Becker, 1974; Shapiro and Erdelyi, 1974). Instead of the classic Ebbinghausian forgetting curve (Ebbinghaus, 1885/1964), in which memory decreases with time, we have experimentally induced the reverse of forgetting - hypermnesia instead of amnesia - such that recall increases rather than decreases with passing time and effort. This ‘negative forgetting’ phenomenon is powerful and highly reliable. Surprisingly, it has worked up to now only with picture and not word stimuli, a fact of some interest in view of the almost universal adoption, since Ebbinghaus, of verbal stimulus materials in memory research.
Despite the preponderant experimental literature, the notion that accessible memory might actually increase with time and effort - that lost memories might be recovered - has a long-standing tradition in a variety of clinical settings ranging from psychoanalysis and hypnosis to pre-operative temporal lobe stimulation (Breuer and Freud, 1895/1966; Lindner, 1955; Penfield, 1959; Reiff and Scheerer, 1959; White, Fox and Harris, 1940). Although dramatic hypermnesic effects have been reported in such situations, close examination has often cast doubt on their validity. The issue inevitably reduces to some variant of the general problem of reporting criteria, namely the subject’s criterion for deciding between ‘true’ memories and ‘false’ memories (Egan, 1958; Erdelyi, 1970; 1972; Green and Swets, 1966). Thus, Freud discovered, to his great shock, that the long-forgotten infantile seductions that his patients painfully ‘recovered’ in therapy, the basis of his early theory of hysteria, were in fact mostly confabulations (Freud, 1906/1963). Similarly, spectacular recoveries of lost memories occurring during hypnosis or temporal lobe stimulation have frequently turned out to be highly concrete fantasies (Barber, 1969; Horowitz, 1976; Neisser, 1967; Haber and Erdelyi, 1967; Erdelyi, 1970; 1972). This problem is all the more serious since all indications suggest that the subjects are reporting in good faith and actually believe their own confabulatory recollections (Haber and Erdelyi, 1967; Erdelyi, 1970; 1972).
It is important to emphasize, therefore, that unlike clinical reports and the existing experimental literature - whether the older work on reminiscence (e.g., Ballard, 1913; Brown, 1923; Buxton, 1914; McGeoch, 1935; Ward, 1937; 1943; Hovland, 1938; Huguenin, Williams, 1926) or more modern investigations (e.g., Ammons and Irion, 1954; Buschke, 1974; S. Ehrlich, 1970; Haber and Erdelyi, 1967; Lazar and Buschke, 1972) - our hypermnesia experiments control for reporting criteria through a forced-recall procedure. The obtained hypermnesias, therefore, appear to represent true increases in accessible memory rather than criterion shifts over time. The existence of a genuine hypermnesic effect has potentially far-reaching implications for psychological theory. The present report concerns one of these: the consistent difference found between the time course of picture and word recall, with hypermnesic functions for pictures but amnesic or stable functions for words. The present research arose from the observation by Erdelyi and Becker (1974) that while subjects exposed to word lists did not on average exhibit hypermnesia, the effect was nevertheless present in a few individuals.
Informal post-experimental inquiries yielded an intriguing pattern. Those few subjects evidencing hypermnesia for the word list reported having formed pictorial images of the word items as they were being presented. Subjects reporting other strategies, however, such as covert rehearsal or conceptual organization of the materials, lost items over time or, at best, retained a constant number. The question arose, therefore, whether the critical factor in hypermnesia was the input modality per se or the resultant coding modality adopted by the subject. The present research addresses this question directly by determining whether instructions to recode word inputs into images would produce hypermnesic memory functions similar to those obtained with picture stimuli.
Method
Three groups of 20 subjects were tested in small subgroups of 2 to 4 subjects. The picture (PIC) group subjects were shown slides of 60 simple sketches of objects such as FLAG, KEY, OCTOPUS, TRUMPET, etc., for 5 seconds each, with instructions to attend closely to each item for subsequent recall. A second group, the WNO subjects (words with no imagery instructions), was treated in exactly the same manner except that WNO subjects were shown slides of the word counterparts rather than the pictures themselves. A third group, the WIP subjects (words with ‘image pictures’ coding instructions), were also presented word stimuli but differed from the WNO subjects in one respect: in addition to the standard instructions to attend closely to each item, they were instructed to form vivid images of the named object as it was being shown. Recall instructions, essentially identical for all three groups, followed the 5-minute input phase. Reading the instructions after presentation of the stimulus list had the additional function of reducing possible recency effects in the first recall trial.
Subjects were asked to recall as many items of the input list as possible on the prepared recall protocol. (PIC subjects were asked to write the names of the objects they recalled rather than attempting to sketch them.) A forced-recall procedure was used in which subjects were required to fill in all response spaces in their recall protocol with nonrepeating responses, even if they could recall fewer items than there were response spaces. Previous research suggested that subjects would initially be able to recall only about 25 of the 60 items. Consequently, the recall protocol contained only 40 spaces, a number likely to exceed any subject’s initial recall level and to leave room for recall improvements on subsequent trials, while limiting the number of potentially interfering guesses.
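The forced-recall constraint can be expressed as a simple check on each protocol. The Python sketch below is a hypothetical illustration. It assumes responses have already been normalised to the same spelling as the study list. It also assumes a protocol is scored as the number of responses that name studied items, a scoring rule the text does not spell out. The item names are placeholders.

```python
from typing import Iterable, List

PROTOCOL_SPACES = 40  # response spaces per recall protocol, as described in the text


def score_forced_recall(responses: List[str], study_list: Iterable[str],
                        spaces: int = PROTOCOL_SPACES) -> int:
    """Return the number of correct items on one forced-recall protocol.

    Hypothetical scoring rule: a response counts as correct if it names an
    item from the study list. Responses are assumed to be pre-normalised.
    """
    if len(responses) != spaces:
        raise ValueError(f"forced recall requires exactly {spaces} responses")
    if len(set(responses)) != len(responses):
        raise ValueError("responses must be nonrepeating")
    studied = set(study_list)
    return sum(1 for response in responses if response in studied)


# Toy example: 60 studied items, 25 recalled correctly plus 15 forced guesses.
study = [f"item{i}" for i in range(60)]
responses = [f"item{i}" for i in range(25)] + [f"guess{i}" for i in range(15)]
print(score_forced_recall(responses, study))  # 25
```

Because every protocol must contain the same number of responses, a gain in this score on later trials cannot come from the subject simply being willing to report more items. This is the sense in which forced recall controls for reporting criteria.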
Subjects were allowed 7 min for their 40 recall attempts, at the end of which recall protocols were collected. A 7-min ‘think’ interval followed in which subjects were simply required to sit back and think about the stimulus set (the ‘think’ interval maximizes hypermnesia but is not necessary for its induction [see Erdelyi and Becker, 1974]). At the end of the first think interval, a new recall protocol was distributed and subjects were asked once again to recall as many stimulus items as possible, guessing if necessary to complete the required 40 responses. Again, 7 min were allowed. After the second recall attempt (R2), another 7-min think interval followed, after which a third and final recall test (R3) was administered. Post-experimental ratings were collected from WNO subjects to determine to what degree they spontaneously engaged in imaginal as well as other mnemonic strategies. (These ratings were available only for subjects tested at Brooklyn College [N = 9] and not for those tested at Rutgers University [N = 11].)
Results and Discussion
Figure 1 depicts the ‘forgetting’ curves obtained by the three groups. Visual inspection suggests that the PIC group recalled progressively more stimulus items on successive recall trials, while the WNO group recalled a roughly constant number. The critical WIP group seems to yield a decisive answer: when elements of a word list are recoded into images, memory for the recoded word list is very much like that for the picture list, both in initial recall and in recall over time. Statistical analysis confirms these observations. The main effect of Group (PIC, WIP, and WNO) was significant (F (2, 57) = 7.80, p < 0.01). Likewise, the effect of Recall Trial was significant (F (2, 114) = 29.66, p < 0.01), as was the Group × Recall Trial interaction (F (4, 114) = 5.04, p < 0.01). Tests for simple effects showed that recall for both the PIC (F (2, 114) = 24.93, p < 0.01) and the WIP (F (2, 114) = 13.29, p < 0.01) groups increased over successive recall trials, while recall for the WNO group did not (F (2, 114) = 1.48, n.s.). Furthermore, a simple-effects test on first recall trial (R1) performance reveals that the groups differed significantly (F (2, 57) = 3.13, p = 0.05). Specific contrasts showed further that while R1 scores for the PIC and WIP groups did not differ (t < 1), the difference between the R1 performance of WIP and WNO subjects was significant, despite the fact that they had been exposed to the identical stimulus list (t (57) = 1.92, p < 0.05). Thus, both in initial recall performance (R1) and in recall profile (R1, R2, R3), the WIP group is much more like the PIC than the WNO group, attesting to the power of the imagery recoding instructions.
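The degrees of freedom in these tests follow from the design. Three groups of 20 subjects (Group, between subjects) were each tested on three recall trials (Recall Trial, within subjects). The Python sketch below is our own check rather than the authors' analysis. It assumes a standard mixed (split-plot) analysis of variance with equal group sizes, and the function name is hypothetical.

```python
def mixed_anova_df(groups: int, subjects_per_group: int, trials: int) -> dict:
    """Degrees of freedom for a mixed (split-plot) ANOVA with equal group sizes.

    Group is the between-subjects factor; Recall Trial is the repeated factor.
    """
    n_subjects = groups * subjects_per_group
    subjects_within_groups = n_subjects - groups  # between-subjects error term
    return {
        "group": groups - 1,
        "error_between": subjects_within_groups,
        "trial": trials - 1,
        "group_x_trial": (groups - 1) * (trials - 1),
        # Within-subjects error: Trial x Subjects-within-Groups.
        "error_within": (trials - 1) * subjects_within_groups,
    }


df = mixed_anova_df(groups=3, subjects_per_group=20, trials=3)
print(df)
```

Under these assumptions, the Group test has (2, 57) degrees of freedom. The Recall Trial test, the Group × Recall Trial interaction and the simple-effect tests over trials use the within-subjects error term, which has 2 × 57 = 114 degrees of freedom.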
[Figure 1: ‘forgetting’ curves of the PIC, WIP and WNO groups over recall trials R1, R2 and R3.]
OUTPUT: sentences of the language
³The nature of these principles is currently the subject of considerable dispute. For example, different analyses are presented in Bresnan (1975), Chomsky (1976), and Horn (1974).
The problem can be likened to that of the classical black box. We know the sorts of things that go in - samples of language which the child happens
to hear - and we know what comes out - sentences of the language - but we do not know the structure of the box itself, that is, the structure of the language faculty. The aim of a theory of language is to discover the structure of this box. For obvious reasons we cannot open up the box and inspect it. Even if we could, the present state of neurophysiology would make such an enterprise of dubious value. Hence, we must try to guess the structure of the box from its behavior. The most minimal requirement that a theory of the black box must meet is to give a characterization of the output which meets the observed facts. In this case, this amounts to supplying a grammar which will generate all and only the sentences of the language. Theories meeting this requirement will be called observationally adequate⁴. In principle, there can be many such theories, since the same language could be generated by many different grammars employing many different principles. Moving beyond observational adequacy, we will say that a theory is descriptively adequate if it characterizes the one grammar that is actually in the black box. Finally, a theory will be explanatorily adequate if it can explain how the descriptively adequate grammar came to be there, given the initial input. The attainment of these various levels of adequacy does not necessarily have to proceed step by step. On the contrary, in practice the three levels can most fruitfully be pursued simultaneously. It should be clear that only a theory meeting explanatory adequacy can answer the questions posed earlier about the nature of UG, for only such a theory will address the question of what set of universal principles is involved in the construction of a grammar. As a way of discovering the principles of UG, one might proceed by elaborating grammars of particular languages. But these particular grammars can contribute to the study of UG only to the extent that they bear on the question of universal principles.
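The claim that observational adequacy underdetermines the grammar can be illustrated with two toy context-free grammars that generate the same strings. The Python sketch below is our illustration, not the authors'. The grammars `G1` and `G2` and the helper `generate` are hypothetical. Comparing outputs only up to a length bound illustrates, rather than proves, that the two grammars generate the same language.

```python
from collections import deque
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> list of right-hand sides

# Two hypothetical toy grammars for the language a^n b^n (n >= 1).
G1: Grammar = {"S": [["a", "S", "b"], ["a", "b"]]}
G2: Grammar = {"S": [["a", "T"]], "T": [["S", "b"], ["b"]]}


def generate(grammar: Grammar, start: str, max_len: int) -> Set[str]:
    """All terminal strings of length <= max_len derivable from `start`.

    Expands the leftmost nonterminal breadth-first. A sentential form is
    pruned once its terminals exceed max_len; this terminates because every
    rule in G1 and G2 adds at least one terminal symbol.
    """
    results: Set[str] = set()
    queue = deque([[start]])
    while queue:
        form = queue.popleft()
        if sum(1 for s in form if s not in grammar) > max_len:
            continue  # terminals are never removed, so this form is too long
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            results.add("".join(form))  # no nonterminals left: a sentence
            continue
        for rhs in grammar[form[idx]]:
            queue.append(form[:idx] + rhs + form[idx + 1:])
    return results


print(sorted(generate(G1, "S", 6)), generate(G1, "S", 6) == generate(G2, "S", 6))
```

Both grammars yield ab, aabb and aaabbb up to length 6, yet G2 posits an extra category T that G1 lacks. The output alone cannot decide between them; that is the question descriptive adequacy addresses.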
⁴Our discussion of the various levels of adequacy is based on Chomsky (1957).
Thus, to answer the question - How does someone learn a grammar? - it is not enough to produce an observationally adequate, or even a descriptively adequate, grammar; rather, one must also produce a theory of UG which, when exposed to linguistic data, will construct that grammar. How might one go about elaborating such a theory? Since this theory will ultimately have to produce a grammar, and a grammar will have to operate through the interaction of its various subparts, it is reasonable to suppose that research in UG could fruitfully proceed by elaborating the principles that underlie the subparts and the interactions among them. Put more concretely, this would involve the study of the principles that underlie the sound systems of language (phonology), the way languages construct words (morphology), the principles of syntax, semantics, and so on, and the way that
these interact with each other. It is not obvious from the beginning what these components are, or where to draw the boundary lines between them. Rather, it is part of the task of this research to discover these sorts of things. That one cannot precisely delimit the various components beforehand does not detract from the reasonableness of the overall approach; the successful elaboration of principles attests to the fruitfulness of this method. A familiar example is the so-called ‘sound laws’ of 19th-century historical linguistics, which could only have been discovered by abstracting the sound system away from other components of the grammar. The research strategy that we have been discussing, if pursued successfully, will lead to answers to the questions: What do people know about the grammar of their language, and how do they come to know it? The scope of a theory of grammar is basically limited to these kinds of concerns, i.e., to the tacit knowledge that a speaker has of the structure of his language, his linguistic competence. However, a theory of grammar and UG is only part of an overall theory of language. In particular, a study of competence abstracts away from the whole question of linguistic performance⁵, which deals with problems of how language is processed in real time, why speakers say what they say, how language is used in various social groups, how it is used in communication, etc. The question then arises whether the methodological issues discussed in relation to a theory of grammar can be extended fruitfully to these other domains of linguistic research. Consider, for example, the extension of the above to a theory of processing. Part of what a speaker knows about his language is how to process it. He is not taught how to do this, he learns it relatively quickly, and he is able to process an open-ended set of sentences. In effect, the situation parallels almost exactly what we saw to be true of knowledge of grammar.
⁵The competence-performance distinction is due to Chomsky (1965).
Hence, it is reasonable to suppose that the object of a theory of processing will also be the elaboration of certain general principles, which, in this case, underlie the functioning of the various components in a processor and the interactions among them. The methodology will remain essentially the same as for a theory of grammatical competence, though the principles will no doubt differ greatly. Thus, an explanatory theory of processing will not be limited to the production of models which can process language, but will aim to discover general principles from which such models will necessarily follow. Here, too, we can distinguish three levels of adequacy for theories of processing. A theory will be observationally adequate if it can reproduce observed facts about human processing. These observed facts will include the set of sentences that speakers of a language can process, the time it takes to process
these sentences, and so on. Here again it is possible to imagine that many different types of processors will meet these requirements. A theory of processing will be descriptively adequate if it incorporates just those principles of processing that speakers use. An explanatory theory of processing will provide an account of how those principles could have been acquired on the basis of the initial input. To sum up, a scientific study of language will aim at theories which attain the level of explanatory adequacy, i.e., theories which provide principles according to which the human language faculty is organized, and which account for the facts of language acquisition and use.
1.2. The Goals of AI Research into Language
We have just seen that for a theory of language to be of scientific interest, it must address itself to the principles according to which languages are organized and learned. Work in AI will be of such interest to the extent that it addresses itself to these questions and provides new and interesting solutions to them. On the face of it, much current work in AI seems to be aimed in just this direction. For example, Winograd (1972) writes that one of the goals of his SHRDLU program is to gain “a better understanding of what language is and how it is put together” (p. 2). Schank (1972) concludes:
We hope to be able to build a program that can learn, as a child does, how to do what we have described in this paper instead of being spoon-fed the tremendous information necessary. (p. 639)
Minsky (1974) writes concerning his Frame theory that “the Frame-system scheme may help explain a number of phenomena of human intelligence”, including “linguistic and other kinds of understanding” (p. 3). These and many other such passages all convey the impression that work in AI will bear crucially on the sorts of questions discussed above. However, this impression is quite misleading, for on closer examination it turns out, first, that AI work in language is primarily directed to other aims which bear only indirectly on these issues, and second, that the researchers quoted above misconceive what it would take to achieve the goals that they profess. For example, consider the three goals that Winograd sets for himself in Winograd (1972):
We had three main goals in writing such a program. The first is the practical desire to have a usable language-understanding system. Even though we used the robot as our test area, the language programs do not depend on any special subject matter, and they have been adapted to other uses.
B. Elan Dresher and Norbert Hornstein
The second goal is gaining a better understanding of what language is and how it is put together. To write a program we need to make all of our knowledge about language explicit, and we have to be concerned with the entire language process, not just one area such as syntax. This provides a rigorous test for linguistic theories, and leads us into making new theories to fill the places where the old ones are lacking. More generally, we want to understand what intelligence is and how it can be put into computers. Language is one of the most complex and unique of human activities, and understanding its structure may lead to a better theory of how our minds work. The techniques needed to write a language-understanding program may be useful in other areas of intelligence which involve integrating large amounts of knowledge into a flexible system. (p. 2)
Consider the first goal, “the practical desire to have a usable language-understanding system”. This goal is quite different from the goal of the scientific study of language described earlier, which is the study of the principles according to which languages are organized and learned, i.e., the principles of UG. For while the results of scientific research into language will certainly be of value to someone who wishes to create a “usable language-understanding system”, the converse does not necessarily hold. For just as one can build a usable machine without knowing or contributing to knowledge of the principles of physics which make it work, one can build a “usable language-understanding system” without knowing or contributing to knowledge of the principles of UG. In fact, if one approaches the task with a “practical desire”, the question of universal principles need hardly ever arise. For to create a functioning language-understanding system one need only program in some grammar, some semantic and pragmatic components, some lexicon and so on. These components do not have to meet descriptive or even observational adequacy, for their adequacy will be a function of what is required to make the system “usable” for the task for which it is being designed⁶. Thus, one could start with fairly primitive components (a small number of syntactic patterns, a small lexicon, etc.) which could be improved indefinitely (by adding more syntactic patterns, more lexical items) according to practical constraints such as time, money and computer space⁷. At no point in this continuing enterprise will it be necessary to deal with the questions which are characteristic of research into the nature of universal grammar: What is a possible syntactic structure? What is a permissible semantic interpretation rule? What is a possible lexical entry? And so on. Thus, “the practical desire to have a usable language-understanding system” does not lead one toward a solution of questions of UG, that is, towards scientifically relevant questions; on the contrary, it leads one away from a consideration of these issues⁸.
⁶“The point here is that computer programs that deal with natural language do so for some purpose. They need to make use of the information provided by a sentence so that they can respond properly. (What is proper is entirely dependent on the point of the computer program. A psychiatric interviewing program would want to respond quite differently to a given sentence than a mechanical translation program.)” (Schank, 1973: 189)
⁷“If someone is trying to build the best robot which can be completed by next year, he will try to avoid any really hard problems that come up, rather than accepting them as a challenge to look at a new area. There will be pressure from the organization of the projects and funding agencies to get results at the expense of avoiding hard problems.” (Winograd, 1974: 93)
Artificial intelligence and the study of language
Incidentally, although the AI literature commonly refers to language-understanding systems, it is not clear how this use of the term “understanding” relates to what people generally mean by this word. In AI, language-understanding systems are systems which can carry out certain limited tasks involving language in some way: e.g., answer questions about baseball, or engage in limited dialogue about a particular small world of blocks. Why these systems are graced with the epithet “language-understanding” rather than, say, “language receiving and responding” has never been adequately explained. If “understanding” is anything like what humans do, then obviously no “language-understanding system” yet constructed or contemplated can be said to understand. In the remainder of this paper we will continue to use the term in discussing such systems, but we wish to emphasize that we use it only as a term for the types of systems mentioned above.
Returning to the main discussion, we note that in contrast to Winograd’s first goal, goals two and three do directly involve questions relevant to a scientific theory of language. These goals are “gaining a better understanding of what language is and how it is put together”, “making new [linguistic] theories”, understanding the structure of language, and finding “a better theory of how our minds work”.
These questions can be answered only in the context of a study of UG, and we have already suggested that a practical pursuit of the first goal would tend to lead away from a serious investigation of this sort. However, it is clear from the rest of his article that Winograd interprets these goals in quite a different way. In order to understand his position it is necessary to consider his critique of linguistic theories such as the one developed by Chomsky: The decision to consider syntax as a proper study devoid of semantics is a basic tenet of many recent linguistic theories. Language is viewed as a way of organizing strings of abstract symbols, and competence is explained in terms of symbol-manipulating rules. At one level this has been remarkably successful. Rules have been formulated which describe in great detail how most sentences
⁸We do not mean to suggest that it is trivial or easy to build machines capable of performing even limited tasks involving language. But no matter how difficult such projects may be, they will seldom be required to deal with problems relevant to a scientific theory of language.
are put together. But with the possible exception of current work in generative semantics, such theories have been unable to provide any but the most rudimentary and unsatisfactory accounts of semantics. The problem is not that current theories are finding wrong answers to the questions they ask; it is that they are asking the wrong questions. What is needed is an approach which can deal meaningfully with the question “How is language organized to convey meaning?” rather than “How are syntactic structures organized when viewed in isolation?” (1972: 16)
Winograd’s characterization of current linguistic theories is rather misleading. According to Winograd it is “a basic tenet” of linguistic theory that syntax is “a proper study devoid of semantics”. Presumably, he is referring to what has come to be known as the “autonomy of syntax thesis”. Reduced to its essentials, this thesis amounts to the claim that languages contain a syntactic component which is organized according to principles which can be stated to a significant degree independently of semantic primitives. It does not claim that the syntactic component operates totally independently of, and does not interact with, the semantic component. This is an absurd position which no one has ever held and which has always been explicitly rejected⁹. Nor does it claim that linguists must not study semantics or that they have to view syntactic structures in isolation. In practice such a research strategy is not feasible, for it is not obvious a priori which phenomena of language are to be assigned to the syntactic component and which to the semantic, or some third, component. It is important not to confuse the strategies used in a research program with the ends of that program. Whether or not a syntactic component exists is an empirical question; how one goes about finding it out is left to the ingenuity of the individual researcher. Furthermore, it is important to emphasize that the “success” of linguistic theory should not be judged by the fact that “[r]ules have been formulated which describe in great detail how most sentences are put together”. Such rules can be formulated in virtually any linguistic theory. Rather, linguistic theories will be successful to the extent that they can discover principles and express generalizations characterizing the necessary form of such rules.
⁹For example, Chomsky (1957): “The fact that correspondences between formal and semantic features exist, however, cannot be ignored. These correspondences should be studied in some more general theory of language that will include a theory of linguistic form and a theory of the use of language as subparts.... We can judge formal theories in terms of their ability to explain and clarify a variety of facts about the way in which sentences are used and understood. In other words, we should like the syntactic framework of the language that is isolated and exhibited by the grammar to be able to support semantic description, and we shall naturally rate more highly a theory of formal structure that leads to grammars that meet this requirement more fully.” (Chomsky 1957: 102)
These issues aside, let us consider what Winograd supposes is the question that linguistic theory ought to be addressing itself to, namely, “How is language organized to convey meaning?” It has long been recognized that any adequate theory of language will have to answer such a question. For example, Chomsky (1965) has noted: A fully adequate grammar must assign to each of an infinite range of sentences a structural description indicating how this sentence is understood by the ideal speaker-hearer. (pp. 4-5)¹⁰
No one doubts that the ultimate aim of research into language is to explain how language is organized to convey meaning. The problem facing someone who wishes to tackle this question is to devise rational strategies for breaking down this large, intractable problem into manageable subproblems, in the same way that biologists rarely attempt to tackle head-on the problem “What is life?” but generally break it up into smaller and more modest problems, such as “What is the structure of the cell?” As a means of getting at the intractable question, “How is language organized to convey meaning?”, current linguistic theories ask “What are the principles of UG?” (and not, as Winograd would have it, “How are syntactic structures organized when viewed in isolation?”). Since (i) languages are pairings of sound and meaning, (ii) grammars generate languages, and (iii) grammars depend on the principles of UG, it follows that (iv) pairings of sound and meaning depend on principles of UG. In other words, an answer to “What are the principles of UG?” will contribute to determining how languages are organized to convey meaning. Moreover, only this type of answer would be of scientific interest for a theory of language, because no other answer would address itself to the organization of the human linguistic faculty, for reasons discussed in section I.1. If Winograd’s question is to be of linguistic, or more generally, of scientific interest, an answer to it must address itself to the principles of UG. In effect, Winograd has not so much asked a different question from the one asked by current linguistic theories as asked the same question in a much vaguer way. It is clear, however, that Winograd does not see things in this way. For the thrust of his methodology is not to discover general principles (of either the individual components or the principles of interaction among the components) but, in keeping with his first goal, to build a “usable language-understanding system”.
Thus, Winograd believes that “the best way to experiment with complex models of language is to write a computer program which can actually understand language within some domain”. In another context, in introducing a certain theoretical device he states that the justification of this device is that the system “is thereby enabled to engage in dialogs that simulate in many ways the behavior of a human language user” (1972: 26). If by the first of these statements Winograd means that one can evaluate competing theories of language by writing a computer program, then he is incorrect: no computer program will be able to test between two descriptively adequate grammars unless it is specifically designed to do so, i.e., unless it embodies a theory of UG. A computer program which “can actually understand language” can be constructed on the basis of any grammar which is adequate “within some domain”, where the domain may be quite restricted. The writing of such a program may be the best way to assess the efficiency or consistency of usable language-understanding systems, but it is certainly not the best way to evaluate competing grammars or linguistic theories; in fact, it is totally incapable of doing so. Similarly, there could be any number of different systems which could successfully “engage in dialogues”. A particular theoretical concept cannot be justified within a scientific theory of language just because it works within some limited domain. Rather, as with scientific theories generally, theoretical concepts can be justified only by showing that they have explanatory power. The ability of a system to engage in dialogues is no test of the explanatory adequacy of the theory that underlies it, and hence is of no direct relevance to a scientific theory of language.
¹⁰Likewise, Jespersen writes: “Now any linguistic phenomenon may be regarded either from without or from within, either from the outward form or from the inner meaning. In the first case we take the sound... and then inquire into the meaning attached to it; in the second case we start from the signification and ask ourselves what formal expression it has found in the particular language we are dealing with.” (1924: 33)
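The claim that a working program cannot choose between grammars that both succeed in its domain can be illustrated with a toy example, which is not from the paper. The Python sketch below assumes a hypothetical six-word command language for a blocks world and two hypothetical grammars, A and B, that bracket the same commands differently (a flat `V NP PP` versus `[V NP] PP`). A "usable system" built on either grammar executes every command identically, so its success in the domain says nothing about which structural analysis is correct.

```python
# Illustrative sketch only: a hypothetical blocks-world command language and two
# hypothetical grammars for it. Neither is SHRDLU's grammar.

VERBS = {"put", "move"}
NOUNS = {"block", "pyramid", "box"}
PREPS = {"in", "on"}


def is_command(words):
    """Accept exactly the strings 'V the N P the N' over the toy vocabulary."""
    if len(words) != 6:
        return False
    v, d1, n1, p, d2, n2 = words
    return (v in VERBS and d1 == "the" and n1 in NOUNS
            and p in PREPS and d2 == "the" and n2 in NOUNS)


def parse_a(words):
    """Grammar A: S -> V NP PP (flat). Returns a nested-tuple tree or None."""
    if not is_command(words):
        return None
    v, d1, n1, p, d2, n2 = words
    return ("S", ("V", v), ("NP", d1, n1), ("PP", p, ("NP", d2, n2)))


def parse_b(words):
    """Grammar B: S -> VP PP, VP -> V NP (a different bracketing)."""
    if not is_command(words):
        return None
    v, d1, n1, p, d2, n2 = words
    return ("S", ("VP", ("V", v), ("NP", d1, n1)), ("PP", p, ("NP", d2, n2)))


def interpret_a(tree):
    """Extract (action, object, relation, landmark) from a Grammar A tree."""
    _, (_, verb), (_, _, obj), (_, prep, (_, _, loc)) = tree
    return (verb, obj, prep, loc)


def interpret_b(tree):
    """Extract the same command from a Grammar B tree."""
    _, (_, (_, verb), (_, _, obj)), (_, prep, (_, _, loc)) = tree
    return (verb, obj, prep, loc)


def run(parse, interpret, sentence):
    """A 'usable system': parse, then return the command to execute (or None)."""
    tree = parse(sentence.split())
    return None if tree is None else interpret(tree)


commands = [f"{v} the {n1} {p} the {n2}"
            for v in sorted(VERBS) for n1 in sorted(NOUNS)
            for p in sorted(PREPS) for n2 in sorted(NOUNS)]
non_commands = ["put the box", "the block put in the box", "put block in the box"]

# Compare the two systems' observable behaviour on every test string.
same_behaviour = all(run(parse_a, interpret_a, s) == run(parse_b, interpret_b, s)
                     for s in commands + non_commands)
example = "put the block in the box".split()
print(same_behaviour)                        # do the two systems ever disagree?
print(parse_a(example) == parse_b(example))  # do they assign the same analysis?
```

Both systems accept and execute exactly the same commands while assigning different trees, so choosing between A and B would require evidence beyond the program's success in the domain, which is the role the authors assign to a theory of UG.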
It is clear, then, from the entire context of Winograd’s discussion that for him a meaningful answer to the question “How is language organized to convey meaning?” is the writing of a “computer program which can actually understand language within some domain”; in other words, the construction of “a usable language-understanding system”. But the successful construction of such a system can at best demonstrate that it is adequate in some particular domain with respect to some particular tasks, and thus avoids the issue of explanatory adequacy which must be central to any theory of language. In short, Winograd’s work is not directed to finding out how human languages are organized to convey meaning - a question which can be answered only by theories of explanatory power - but rather how language might be organized for the purposes of a machine performing particular tasks in a limited domain. Returning then to Winograd’s second and third goals - “gaining a better understanding of what language is and how it is put together” and developing “a better theory of how our minds work” - it is clear from everything he
writes that he considers their attainment to be by-products of the achievement of the first goal - “the practical desire to have a usable language-understanding system”. But this conclusion is incorrect, for as we have shown above the construction of a “usable language-understanding system” is irrelevant to the development of an explanatory theory of language. Hence, it is fair to conclude that Winograd has only one goal, “the practical desire to have a usable language-understanding system”, and, as argued, there is no reason to believe that the attainment of this goal will in any way contribute to “gaining a better understanding of what language is and how it is put together” or towards developing “a better theory of how our minds work”. And it will certainly not provide “a rigorous test for linguistic theories” or lead to “making new theories to fill the places where the old ones are lacking”. The belief that the development of an explanatory linguistic theory is closely linked to the construction of a language-understanding system is not unique to Winograd but pervades much work in AI. Minsky (1974) writes: In this essay I draw no boundary between a theory of human thinking and a scheme for making an intelligent machine; no purpose would be served by separating these today since neither domain has theories good enough to explain - or to produce - enough mental capacity. (p. 6) Contrary to Minsky, we see no purpose in conflating a theory of human thinking with a scheme for making an intelligent machine, since the goals of these programs are fundamentally different. The existence of a machine that can simulate certain aspects of human intelligence will not necessarily contribute to the development of a theory of human thinking. Conversely, it may often be impractical to incorporate a theory of human thinking into such a machine. Only if one’s sole interest is to build such an intelligent machine is there “no purpose” in separating these endeavors.
But if one’s sole purpose is to build such a machine then this work will only incidentally be of scientific interest. This bias towards (at best) observational adequacy is especially apparent in Minsky’s discussion of language. Like Winograd, his emphasis on language as a device for conveying meaning (an emphasis which is inherent in the task of communicating with machines) leads him to misconstrue the aims of syntactic research. Thus, he too is dubious of isolating syntax as a proper study in its own right: There is no doubt that there are processes especially concerned with grammar... But one must also ask: to what degree does grammar have a separate identity in the actual working of a human mind? Perhaps the rejection of an utterance (either as non-grammatical, as nonsensical, or most important as not understood) indicates a more complex failure of the semantic process to arrive at any usable
representation; I will argue now that the grammar-meaning distinction may illuminate two extremes of a continuum, but obscures its all-important interior. (pp. 24-25)
One will search in vain for such an argument in Minsky’s paper, but none need be given - it has long been recognized that sentences can be deviant for any number of reasons: phonological, semantic, syntactic, pragmatic, etc. Minsky is exploiting the truism that sentences can be not understood for many reasons to suggest that the various components of language, in particular the syntactic component, do not have “separate identities in the human mind”. This is a non sequitur. If by the phrase “separate identity in the actual working of the human mind” Minsky intends “not interacting with any other part of the mind”, then his claim is still a truism, as discussed above, and has no bearing on the empirical question of the autonomy of syntax¹¹. Schank (1972) also exhibits two characteristics which we have found to be common to the work of Winograd and Minsky. These are: (1) The belief that the development of explanatory theories of language will be closely linked with the development of language-understanding machines, and that the latter will be able to choose among competing theories; (2) A misconception about the goals of an explanatory theory of language and in particular about the aims of syntactic research. Concerning the first point Schank (1972) states: The goal of the research described here is to create a theory of human natural language understanding. While this goal may seem to be virtually unattainable, it is possible to divide the process into subgoals which seem somewhat more attainable. In creating such subgoals it is reasonable to inquire how one would know when these subgoals have been achieved. In other words, it may be fine
¹¹Wilks (1975) advances a similar objection to transformational grammar but takes it a step further. He believes that a serious theory of language should explain how any arbitrary string of sounds can convey meaning: “The fact of the matter is surely that we cannot have a serious theory of natural language which requires that there be some boundary to the language, outside which utterances are too odd for consideration. Given sufficient context and explanation anything can be accommodated and understood: it is this basic human language competence that generative linguistics has systematically ignored and which an AI view of language should be able to deal with.” (1975: 145). If by this Wilks is denying that one can make any distinctions between grammatical and ungrammatical sentences, then it follows that there can be no finite grammar and hence none but the most trivial principles of UG. Now, it is trivially true that given “sufficient context and explanation” it is possible to understand anything; for example, any non-Chinese speaker (a concept which presumably has no meaning for Wilks) can understand a lengthy debate in Chinese if supplied with a translation. But then one must still account for why certain utterances require so much more context and explanation than others. Replacing the notion ‘degree of grammaticality’ by a notion like ‘amount of context and explanation required’ seems unlikely to further advance research into language.
enough to create a theory of natural language understanding in a vacuum, but how does one select any given theory over any other? It seems to me that there are two possible general criteria by which such a theory can stand or fall. One is the ability to stand up under the tests that psychologists can devise.... The second testing possibility comes from the field of computer science. It is highly desirable for computers to understand natural language. An explicit theory of human language understanding should, in principle, be able to be put on a machine to enable computers to communicate with people in natural language. (p. 552)
With respect to this second point, Schank is very explicit in his rejection of syntax as “a proper study devoid of semantics”:
Since the transformational-generative grammar proposed by Chomsky (1957, 1965) is syntax-based it cannot seriously be proposed as a theory of human understanding nor is it intended as such.... There is then, a need for a theory of natural language understanding that is conceptually based. Furthermore, there is a need for an understanding model that recognizes that syntax is not a relevant object of study in its own right, but should be studied only as a tool for understanding of meaning. (1972: 555)
What does it mean to say that the grammar proposed by Chomsky (1957, 1965) is syntax-based? Since we have seen that Chomsky (1957: 102; 1965: 4-5) explicitly holds that a fully adequate grammar must indicate how sentences are understood, Schank can only mean that these particular works are primarily concerned with syntax. But it is no more justifiable to say that the theory of Chomsky (1965) is syntax-based than it is to say that The Sound Pattern of English, which is concerned with phonology, is a “phonologically-based” theory of language. The attainment of Schank’s professed goal of creating “a theory of human natural language understanding”, and especially his hope of building “a program that can learn as a child does”, is impossible if it is not carried on in the context of a study of the principles of UG, an important subpart of which is the discovery of principles of syntactic organization. In conclusion, our examination of the real, as opposed to the professed, goals of current AI work on language indicates that there is little reason to expect that such work will result in new explanatory theories of language which could complement or supersede current linguistic theories. Rather, such work is aimed in quite different directions which are only incidentally of scientific interest¹².
¹²Some in the AI community have expressed similar views. Wilks (1975) writes: “...it seems to me highly misleading, to say the least, to describe the recent flowering of AI work on natural language inference, or whatever, as theoretical work. I would argue that it is on the contrary, as psychologists insist on reminding us, the expression in some more or less agreeable semi-formalism of intuitive, common-sense knowledge revealed by introspection.... It seems clear to me that our activity is an engineering, not a scientific, one and that attempts to draw analogies between science and AI work on language are not only over-dignifying... but are intellectually misleading.” (p. 145). Weizenbaum (1976) writes: “Newell, Simon, Schank, and Winograd simply mistake the nature of the problems they believe themselves to be ‘solving’. As if they were benighted artisans of the seventeenth century, they present “general theories” that are really only virtually empty heuristic slogans, and then claim to have verified these “theories” by constructing models that do perform some tasks, but in a way that fails to give insight into general principles.... Furthermore, such principles cannot be discovered merely by expanding the range of a system in a way that enables it to get more knowledge of the world. Even the most clever clock builder of the seventeenth century would never have discovered Newton’s laws simply by building ever fancier and more intricate clocks!” (pp. 196-197).
II. The Results of AI Research into Language
Up to now, we have been discussing the goals of some recent influential work in AI. We have found that, despite first appearances, the true goals of this work are primarily technological rather than scientific, and that there is little reason to expect that the main goal, the desire to have a usable language-understanding system, will have as a byproduct the development of explanatory theories of language. Nevertheless, it is still possible that the actual work done in AI has in fact contributed to the development of such explanatory theories. Therefore, to properly assess AI contributions to a theory of language, it is necessary to go beyond statements of goals to a closer examination of the work itself. In this section we will review at some length some recent work of Winograd (1972, 1973), Minsky (1974), and Schank (1972, 1973).
Winograd’s SHRDLU Program
Winograd describes his language-understanding program as follows:
The subject chosen was the world of a toy robot with a simple arm. It can manipulate toy blocks on a table containing simple objects like a box. In the course of a dialogue, it can be asked to manipulate the objects, doing such things as building stacks and putting things into the box. It can be questioned about the current configurations of blocks on the table, about the events that have gone on during the discussion, and to a limited extent about its reasoning. It can be told simple facts which are added to its store of knowledge for use in later reasoning. The conversation goes on within a dynamic framework - one in which the computer is an active participant, doing things to change his toy world, and discussing them. The program was written in LISP on the PDP-10 ITS time-sharing system of the Artificial Intelligence Laboratory at MIT. It displays a simulated robot world on a television screen and converses with a human on a teletype. It was not written for any particular use with a real robot and does not have a model of language based on peculiarities of the robot environment. Rather, it is precisely by limiting the subject matter to such a small area that we can address the general issues of how language is used in a framework of physical objects, events, and a continuing discourse. The programs can be roughly divided into the three domains mentioned above: There is a syntactic parser which works with a large-scale grammar of English; there is a collection of semantic routines that embody the kind of knowledge needed to interpret the meanings of words and structures; and there is a cognitive deductive system for exploring the consequences of facts, making plans to carry out commands and finding the answers to questions. There is also a comparatively simple set of programs for generating appropriate English responses. In designing these pieces, the main emphasis was on the interaction of the three domains. The form in which we want to state a syntactic theory or a type of deduction must take into account the fact that it is only a part of a larger system. One of the most useful organizing principles was the representation of much of the knowledge as procedures. Many other theories of language state their rules in a form modelled on the equations of mathematics or the rules of symbolic logic. These are static rules that do not explicitly describe the process involved in using them, but are instead manipulated by some sort of uniform deduction procedure. By writing special languages suited to the various types of knowledge (semantic, syntactic, deductive), we are able to preserve the simplicity of these systems.
This is accomplished by putting the knowledge in the form of programs in which we can explicitly express the connections between the different parts of the system’s knowledge, thus enriching their possibilities for interaction. (1973: 154-155)
Since Winograd’s programs are made up of a syntactic parser, a collection of semantic routines, and a cognitive deductive system, we can best proceed by examining each of these separately and then looking at their interaction. Recall that his work would be of interest to a theory of language to the extent that he has formulated principles that underlie each of these components or principles which govern their interaction. We will begin with Winograd’s syntactic component. His grammar is based on Systemic Grammar as developed by Halliday. His choice of Systemic Grammar over other types of grammar (and in particular Transformational Grammar) rests on his belief that it provides a better basis for a model of language understanding: The speaker encodes meaning by choosing to build the sentence with certain syntactic features, chosen from a limited set. The problem of the hearer is to recognize the presence of those features and use them in interpreting the meaning of the utterance.
Our system is based on a theory called Systemic Grammar (Halliday, 1967, 1970) in which these choices of features are primary. This theory recognizes that meaning is of prime importance to the way language is structured. Instead of placing emphasis on a “deep structure” tree, it deals with “system networks” describing the way different features interact and depend on each other. The primary emphasis is on analyzing the limited and highly structured sets of choices which are made in producing a sentence or constituent. The exact way in which these choices are “realized” in the final form is a necessary but secondary part of the theory. (1972: 16)
According to Systemic Grammar, each syntactic unit has associated with it a set of syntactic features. For example, every clause which could stand alone as a sentence - in terms of systemic grammar every “Major clause” will have the syntactic form of either a question, a declarative sentence, or a command. In Systemic Grammar this is expressed by assigning to every Major clause one of the three mutually exclusive features QUESTION, DECLARATIVE, IMPERATIVE. Furthermore, if a sentence has the form of a question, it will either be a yes-no question (e.g., Did John go CO the store?) or a WH question (e.g., Whut did John buy at the store?). Hence, all clauses marked with the feature QUESTION must also be marked with one of the features YES-NO or WH-. These features can be diagrammed as follows: (14)
CLAUSE → MAJOR → {DECLARATIVE | IMPERATIVE | QUESTION → {YES-NO | WH-}}
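As an illustration only, the choices in (14) can be encoded as a small feature network and checked mechanically. The sketch below is not Winograd's notation or code: the names SYSTEMS and violations are hypothetical, and each system is assumed to be an entry feature plus a set of mutually exclusive choices, exactly one of which must be selected whenever the entry feature is present.

```python
# Hypothetical encoding of the fragment in (14), not Winograd's notation:
# each system pairs an entry feature with a set of mutually exclusive choices.
SYSTEMS = [
    ("MAJOR", {"DECLARATIVE", "IMPERATIVE", "QUESTION"}),
    ("QUESTION", {"YES-NO", "WH-"}),
]


def violations(features):
    """Return a description of each way the feature set breaks the network."""
    problems = []
    for entry, choices in SYSTEMS:
        chosen = features & choices
        if entry in features and len(chosen) != 1:
            # The entry condition holds, so exactly one choice is required.
            problems.append(f"{entry} needs exactly one of {sorted(choices)}")
        elif entry not in features and chosen:
            # A choice was made although its entry condition does not hold.
            problems.append(f"{sorted(chosen)} requires {entry}")
    return problems


print(violations({"CLAUSE", "MAJOR", "QUESTION", "WH-"}))        # []
print(violations({"CLAUSE", "MAJOR", "DECLARATIVE", "YES-NO"}))  # ["['YES-NO'] requires QUESTION"]
```

Every check in this sketch follows from the definitions of the features alone; nothing in the encoding says which features matter for meaning.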
The system is also capable of expressing more complex relations among features: features which cross-classify, unmarked features, etc. According to Winograd, the superiority of Systemic Grammar stems from its emphasis on these sorts of syntactic features, for these features play a primary role in the process of language understanding. It should be clear that, since these syntactic features are determined by syntactic structures, the information supplied by these features could be extracted from any grammar which systematically differentiates these different kinds of constructions. Since virtually any grammar will be able to distinguish, for example, between imperatives, questions and declaratives, between yes-no questions and wh- questions, between actives and passives, etc. - and indeed every traditional grammar has made these distinctions - it follows then that the relevant syntactic features can be defined for any grammar in a very natural way. The same is true in the case of grammatical functions such as
Artificial intelligence and the study of language
341
subject and object which are explicitly stated within Systemic Grammar. Winograd states that what sets Systemic Grammar apart is that it states explicitly what is only implicit in other theories: In most current theories, these features and functions are implicit in the syntactic rules. There is no explicit mention of them, but the rules are designed in such a way that every sentence will in fact be one of the three types listed above, and every WH- question will in fact have a question element. The difficulty is that there is no attempt in the grammar to distinguish significant features such as these from the many other features we could note about a sentence, and which are also implied by the rules. If we look at the “deep structure” of a sentence, again the features and functions are implicit. The fact that it is a YES-NO question is indicated by a question marker hanging from a particular place in the tree, and the fact that a component is the object or subject is determined from its exact relation to the branches around it. The problem isn’t that there is no way to find these features in a parsing, but that most theories don’t bother to ask “Which features of a syntactic structure are important to conveying meaning, and which are just a byproduct of the symbol manipulations needed to produce the right word order”. (1972: 21)
In Systemic Grammar, the set of features is assigned to surface structures by a set of realization rules. Thus, Systemic Grammar can be analyzed as consisting of two levels: a level at which syntactic structures are stated, without regard to considerations of conveying meaning, and a level consisting of these surface structures augmented by feature and function markings. These two levels are mediated by the realization rules. We have already noted that realization rules can be defined over the syntactic structures supplied by virtually any grammar. Therefore the fact that such rules have been formulated in Systemic Grammar tells us nothing about the adequacy of the syntactic structures specified by it. These must be justified by showing that they follow from a theory which captures syntactic generalizations, or which expresses some theory of UG, and this Winograd does not do. Turning to the realization rules themselves, let us consider Winograd’s assertion that they specify features which are of particular significance in the interpretation of meaning. The specification of a particular set of realization rules involves an empirical hypothesis about language; but Winograd has done nothing that could bear on testing the correctness or incorrectness of such a hypothesis. Thus, he simply asserts that the features he assigns to syntactic units are “significant” with respect to meaning, while certain other possible features are not. But it is not enough to assert this; nor is it an argument to show that these features can be incorporated into a system which is adequate in some limited domain.
As it turns out, it is not at all clear exactly what empirical claim is embodied in the realization rules that Winograd actually uses. For example, (14) shows only one small part of the CLAUSE system network; more of it is diagrammed in (15):
(15) [Diagram: a larger part of the CLAUSE system network, with features including MAJOR, IMPERATIVE, DECLARATIVE, QUESTION, YES-NO, SHORT, DANGLING, BOUND, ADJUNCT, RSQ, WHRS, SUBING, SUBTO, RSNG, REPORT, THAT, ITSUBJ, and PREPOBJ.] (1972: 48)
Even (15) shows only one part of the CLAUSE system network; there is another CLAUSE system dealing with transitivity, and there are systems of about equal complexity for every other part of speech. Under these circumstances, one is left to wonder what the “insignificant” features are that were not selected. If, taking the extreme case, the realization rules merely specify every feature which can be defined over surface structures, then the claim embodied in them is that every aspect of surface structure is of equal importance in conveying meaning. But in that case, the various system networks, and the realization rules that produce them, would be not so much empirical hypotheses about what aspects of syntax are important in conveying meaning, but rather would amount to statements of logically necessary connections between possible syntactic features. For example, no one can doubt that a sentence cannot have the syntactic form of a yes-no question without also having the syntactic form of a question, or that if a sentence has the syntactic form of an imperative then it cannot also have the syntactic form of a declarative13. Unless it is shown that the system of features goes beyond logical necessity, for example by specifying constraints on feature systems that rule out empirically unattested though logically possible systems, then the system networks presented by Winograd become irrelevant. For if system networks do not filter out information implicit in surface structure, then they must contain exactly the same information, albeit in a different notation, and hence there is no reason why semantic rules could not operate directly upon surface structures. Furthermore, a theory which puts no constraints (except the logically necessary constraints which follow from the definitions of the features) on feature systems embodies the claim that the answer to the question, “Which features of a syntactic structure are important to conveying meaning?” is “Everything”. This is an extremely minimal claim, and one which would make irrelevant Winograd’s stated reason for choosing Systemic Grammar.
Since Winograd does not propose any constraints on system networks, it is not clear if Systemic Grammar, as he uses it, embodies any but the most minimal hypothesis about the relation of syntactic structures to meaning. Another argument that Winograd advances in support of his grammar is that it is written explicitly in the form of a program: Our grammar is an interpretation grammar for accepting grammatical sentences. It differs from more usual grammars by being written explicitly in the form of a program. Ordinarily, grammars are stated in the form of rules, which are applied in the framework of a special interpretation process. This may be very complex in some cases (such as transformational grammars) with separate phases, special “traffic rules” for applying the other rules in the right order, cycles of application, and other sorts of constraints. In our system, the sequence of the actions is represented explicitly in the set of rules. The process of understanding an utterance is basic to the organization of the grammar. (1973: 178) In the above passage Winograd compares a transformational grammar to a grammar which represents the sequence of actions performed in parsing. This comparison is totally inappropriate, for TG is not meant to be a model of parsing. Rather, TG is intended as a theory which represents the tacit knowledge that speakers have about the possible structures of their language. As such it makes no claims about how this knowledge is put to use in the generation or understanding of actual sentences. It is in this sense that TG is neutral with respect to theories of parsing or production of sentences. Winograd disputes this, claiming that TG “is in fact highly biased towards the process of generating sentences rather than interpreting them” (1972: 42). His contention is apparently based on the difficulties encountered by parsers which incorporate a TG and which operate by trying to recover deep structures of sentences by unwinding transformations. Yet it does not follow that a parser which incorporates a TG must proceed in this way. The most that the difficulties encountered by such parsers show is that some particular ways of incorporating a TG into a parser are incorrect14. But this fact does not bear on the neutrality of TG with respect to parsing and generation and hence Winograd’s claim that TG is in fact biased towards generation is incorrect15. In keeping with his perception of TG as biased in favor of generation, Winograd interprets the arrows found in phrase structure rules and transformations as controlling the serial order in real time according to which sentences are produced by a speaker16.
13Although semantically declarative sentences can be used as questions and questions can be used as commands and so on, we are dealing here only with syntactic forms: “It is important to note that these features are syntactic not semantic. They do not represent the use of a sentence as a question, statement, or command, but are rather a characterization of its internal structure - which words follow in what order.” (Winograd 1973: 178)
He claims that the set of “typical grammar rules” shown in (16) would be expressed in his grammar by the program diagrammed in (17):
(16)
S → NP VP
NP → DETERMINER NOUN
VP → VERB/TRANSITIVE NP
VP → VERB/INTRANSITIVE
(1973: 179)
(17) [Flowchart: three procedures, DEFINE program SENTENCE, DEFINE program NP, and DEFINE program VP, with branches to RETURN success and RETURN failure.] (1973: 180-181)
14Even these results are not conclusive if they are not linked to some explicit theory of human computation. For what results in difficulties for a parser embodying one such theory may not be difficult for a parser which works on the basis of different principles. Arguments concerning computational complexity will be interesting to the extent that they are linked to human computational abilities as opposed to the computational abilities of certain computers.
15If anything it is a lot easier to see how a TG could be incorporated into a parsing model than into a production model; for whereas the input to a parser consists of utterances whose properties are at least available for study, it is totally unclear what the input to a production model would be. Since people do not babble sentences at random, production models would presumably start with thoughts or intentions and other such elusive entities whose properties are largely unknown and difficult to study. For further discussion see Katz and Postal (1964: 166-172).
16The belief that TG is intended as a theory of production is also held by Schank (1976): “... it is hard to see how a computer scientist would not get exasperated by a “theory” [TG] that states that generation and analysis are the same process, all you do is reverse generation to get analysis.”
If the arrows of (16) are interpreted in real time then the two models would have quite different functions - the model of (16) would be a model of production while that of (17) would be a parsing model. Put in these terms one could hardly object that the model in (17) is a more appropriate model of parsing than is (16). But this is true only if (16) is interpreted as a model of production. What such an interpretation means is that in producing a sentence, (16) directs the speaker to first produce an S, and then replace this S by NP VP, and then to expand the NP, and so on until the whole sentence is produced. But notice that under this interpretation neither (16) nor (17) is simply a grammar - each contains a grammar, supplemented in (16) by a production model and in (17) by a parsing model. But the grammar contained in each is the same. This grammar can be simply represented by the schema in (16) where the arrows are interpreted as indicating hierarchical dependency, without implications for production. So, for example, it does not follow from (16) under this interpretation, which is the standard one in TG, that a speaker when producing a sentence first conjures up an S in his mind, and then replaces this S by NP VP, and so on. This grammar is totally compatible with many other production models; e.g., one in which the speaker first generates VP, then Det N, and then connects these, in the process filling out the rest of the sentence, or one in which the speaker first picks all the words, and so on. In fact, the grammar of (16) specifies no production model any more than it specifies a parsing model. Hence it is neutral with respect to any of these questions. This being so, it is a simple category mistake to argue against the adequacy of a TG by bringing processing arguments against it.
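To make the point concrete, the following sketch treats the rules of (16) purely as a table of hierarchical dependencies and runs two different procedures over the same table: a top-down, left-to-right recognizer loosely in the spirit of (17), and a generator that enumerates every sentence the rules allow. The toy lexicon and all function names are hypothetical illustrations, not taken from Winograd; the sketch shows only that one rule table can serve several procedures.

```python
import itertools

# The rules of (16), read only as statements of hierarchical dependency.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DETERMINER", "NOUN"]],
    "VP": [["VERB/TRANSITIVE", "NP"], ["VERB/INTRANSITIVE"]],
}

# Hypothetical toy lexicon (not part of the source).
LEXICON = {
    "DETERMINER": ["the"],
    "NOUN": ["giraffe", "apples"],
    "VERB/TRANSITIVE": ["ate"],
    "VERB/INTRANSITIVE": ["slept"],
}


def parse(category, words, i):
    """Top-down, left to right: yield each position where `category` can end."""
    if category in LEXICON:
        if i < len(words) and words[i] in LEXICON[category]:
            yield i + 1
        return
    for rhs in GRAMMAR[category]:
        yield from parse_sequence(rhs, words, i)


def parse_sequence(symbols, words, i):
    """Match `symbols` in order starting at position i, backtracking as needed."""
    if not symbols:
        yield i
        return
    for j in parse(symbols[0], words, i):
        yield from parse_sequence(symbols[1:], words, j)


def recognizes(sentence):
    words = sentence.split()
    return any(end == len(words) for end in parse("S", words, 0))


def generate(category):
    """Use the same table to enumerate every word string of `category`.

    This terminates only because the rules of (16) are not recursive.
    """
    if category in LEXICON:
        for word in LEXICON[category]:
            yield [word]
        return
    for rhs in GRAMMAR[category]:
        for parts in itertools.product(*(list(generate(c)) for c in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate("S")]
print(len(sentences))                         # 6
print(all(recognizes(s) for s in sentences))  # True
print(recognizes("the giraffe ate"))          # False
```

Neither procedure is part of the rule table; either could be replaced without touching GRAMMAR.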
Thus, Winograd misses the point when he states that systems such as TG “are not valid as psychological models of human language, since they have not concerned themselves with the operational problems of a mental procedure. They often include types of representations and processes which are highly implausible, and which may be totally inapplicable in complex situations because their very nature implies astronomically large amounts of processing for certain kinds of computations” (1973: 167). Returning to Winograd’s assertion that his grammar differs from more usual grammars by being written explicitly in the form of a program, we find that his grammar really differs from “more usual grammars” because it contains a grammar supplemented by a parser. It is understood by everyone that any overall theory of language will contain a theory of parsing. Hence it is interesting to see what contributions, if any, Winograd’s program makes to such a theory. According to Winograd his parser “is basically [a] top-down left to right parser, but it modifies these properties when it is advantageous to do so”
(1972: 22). An example of such a modification arises in Winograd’s discussion of syntactic agreement (such as between subjects and verbs), where his parser is allowed instructions like “move left until you find a unit with feature X, then up until you find a CLAUSE, then down to its last constituent, and left until you find a unit meeting the arbitrary condition Y” (1972: 86). It is practically a condition of the problem that a parser will be basically left to right. This is so because people cannot produce sentences instantaneously, but must rather produce them one word at a time. All “left to right” means is that one processes what one hears in the order that one hears it. Though not logically impossible, it is hard to imagine how processing could be basically right to left. The fact that Winograd’s processor is basically left to right, then, does not set it apart from most processors one might wish to devise. As for his processor being “basically . . . top-down”, here it is not so obvious that a top-down processor is superior to a bottom-up processor, or some combination of these. It would be of interest if Winograd could show that his top-down parser is a better model of human parsing than some other type by showing that his parser has certain desirable properties that could not be easily duplicated. Unfortunately, arguments of this kind do not appear in Winograd’s lengthy monograph. Furthermore, given the power of his parser to accommodate virtually arbitrary instructions, such as those mentioned in connection with agreement, it emerges as almost an accident that his parser is top-down left to right rather than bottom-up right to left. For instead of these features following from a theory of parsing which dictates that his parser must be top-down left to right, they are merely stipulated17.
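The same rules are equally usable bottom-up. As a hypothetical illustration, the sketch below recognizes sentences with the rules of (16), repeated so that the block is self-contained, by building a naive chart of completed constituents from the words upward until nothing new can be added. The lexicon and function names are assumptions; the sketch shows only that the rules do not themselves dictate a top-down strategy, not that either strategy models human parsing.

```python
# The rules of (16) and a hypothetical toy lexicon (not from the source).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DETERMINER", "NOUN"]],
    "VP": [["VERB/TRANSITIVE", "NP"], ["VERB/INTRANSITIVE"]],
}
LEXICON = {
    "DETERMINER": ["the"],
    "NOUN": ["giraffe", "apples"],
    "VERB/TRANSITIVE": ["ate"],
    "VERB/INTRANSITIVE": ["slept"],
}


def spans(rhs, start, chart, n):
    """End positions reachable by matching the categories of `rhs` from `start`."""
    ends = [start]
    for category in rhs:
        ends = [j for e in ends for j in range(e + 1, n + 1)
                if (category, e, j) in chart]
    return ends


def bottom_up_recognizes(sentence):
    words = sentence.split()
    n = len(words)
    # Start from the words: record every lexical category each word can bear.
    chart = {(cat, i, i + 1) for i, w in enumerate(words)
             for cat, entries in LEXICON.items() if w in entries}
    changed = True
    while changed:  # combine constituents until the chart stops growing
        changed = False
        for lhs, expansions in GRAMMAR.items():
            for rhs in expansions:
                for start in range(n):
                    for end in spans(rhs, start, chart, n):
                        if (lhs, start, end) not in chart:
                            chart.add((lhs, start, end))
                            changed = True
    return ("S", 0, n) in chart


print(bottom_up_recognizes("the giraffe ate the apples"))  # True
print(bottom_up_recognizes("the giraffe ate"))             # False
```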
Another example which illustrates the arbitrariness of Winograd’s approach to parsing is his discussion of coordinate structures: the word “and” can be associated with a program that can be diagrammed as shown in Figure 4.12 [here (18)]. Given the sentence “The giraffe ate the apples and peaches”, it would first encounter “and” after parsing the noun “apples”. It would then try to parse a second noun, and would succeed, resulting in the structure shown in Figure 4.13 [here (19)]. If we had the sentence “The giraffe ate the apples and drank the vodka”, the parser would have to try several different things. The “and” appears at a point which represents boundaries between several units. It is after the noun “apples”, and the NP “the apples”. It is also after the entire VP “ate the apples”. The parser, however, cannot find a noun or NP beginning with the following word “drank”. It therefore tries to parse a VP and would successfully find “drank the vodka”. A CONJOINED VP would be created, producing the final result shown in Figure 4.14 [here (20)]. Of course the use of conjunctions is more complex than this, and the actual program must take into account such things as lists and branched structures in addition to the problems of backing up if a wrong possibility has been tried. But the basic operation of “look for another one like the one you just found” seems both practical and intuitively plausible as a description of how conjunction works. (1973: 179)
(18) [Flowchart, partly illegible: “... unit of the same type as the currently ...”; a branch to RETURN failure; and “Replace the node with a new node combining the old one and the one you have just found”.]
Figure 4.12. Conjunction Program. From T. Winograd, Understanding Natural Language, Cognitive Psychology, 3: 1-191. Copyright © by Academic Press. (1973: 181)
(19) [Tree diagram of “The giraffe ate the apples and peaches”, in which the nouns “apples” and “peaches” are conjoined under a single NOUN node.]
Figure 4.13. Conjoined Noun Structure. From T. Winograd, Understanding Natural Language, Cognitive Psychology, 3: 1-191. Copyright © by Academic Press.
(20) [Tree diagram of “The giraffe ate the apples and drank the vodka”, in which “ate the apples” and “drank the vodka” are conjoined as a VP.]
Figure 4.14. Conjoined VP Structure. From T. Winograd, Understanding Natural Language, Cognitive Psychology, 3: 1-191. Copyright © by Academic Press. (1973: 182)
17In fact, the semantic analogue to Winograd’s syntactic parser works according to different principles: “In describing the parser it was pointed out that we do not carry forward simultaneous parsings of a sentence. We try to find the “best” parsing, and try other paths only if we run into trouble. In semantics we take the other approach. If a word has two meanings, then two semantic descriptions are built simultaneously, and used to form two separate phrase interpretations.” (Winograd 1972: 31). No principled reason is given for treating the two parsers differently.
In this passage Winograd seems to be making a prediction about the parsing of coordinate structures. By setting up the parser so that it tries to conjoin the lowest level node first, he is in effect predicting a “garden path” effect in the case of conjunction of the higher level nodes18. That is, all things being equal, it ought to take longer to process a sentence like The giraffe ate the apples and drank the vodka than it takes to process a sentence like The giraffe ate the apples and peaches. This is because, upon encountering the word and in the former sentence, the listener will seek in vain first for an N and then for an NP before successfully finding the conjoined VP. These unsuccessful attempts will presumably add to the processing time of such sentences. However, this only seems to be a prediction; actually it is only a guess, for Winograd has indicated no reason why things should be ordered the way they are. The ordering certainly does not follow from any features inherent in his parser, which could just as easily have been set up to first seek to conjoin top level nodes. In that case, the predictions about garden path effects would be reversed. Moreover, the ordering is not suggested by empirical considerations; Winograd brings no evidence that indicates that people in fact do process conjunction in the order that his parser does. In short, he advances no reason of any kind - neither theoretical nor empirical - for setting up the parser the way he does. We cite this example as being characteristic of Winograd’s overall approach, which is to arbitrarily stipulate what are in reality matters that can only be decided by empirical research, and which can only be explained on the basis of theoretical work. Incidentally, Winograd’s discussion of coordinate structures does contain one argument that should be noted: Viewing “and” as a special program that interrupts the normal parsing sequence also gives us a sort of explanation for some puzzling syntactic facts. The statement “I saw Ed with Steve” has a corresponding question, “Whom did you see Ed with?” But “I saw Ed and Steve” cannot be turned into “Whom did you see Ed and?” The “and” program cannot be called when there is no input for it to work with. (1973: 179)
18For further discussion of garden paths see section III below.
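The arbitrariness of the ordering can be seen in a deliberately simplified, hypothetical reconstruction (not Winograd's program). The toy lexicon, the three candidate categories NOUN, NP and VP, and the requirement that the second conjunct run to the end of the sentence are all assumptions; the order in which the candidates are tried is simply a parameter. Counting failed attempts stands in for the extra processing discussed above and is not a measure of human processing time.

```python
# Hypothetical toy lexicon covering the two example sentences.
LEXICON = {
    "the": "DET", "giraffe": "NOUN", "apples": "NOUN", "peaches": "NOUN",
    "vodka": "NOUN", "ate": "VERB", "drank": "VERB",
}


def word(category, words, i):
    """Consume one word of `category` at position i; return the new position or None."""
    if i < len(words) and LEXICON.get(words[i]) == category:
        return i + 1
    return None


def noun(words, i):
    return word("NOUN", words, i)


def noun_phrase(words, i):  # NP -> DET NOUN
    j = word("DET", words, i)
    return None if j is None else noun(words, j)


def verb_phrase(words, i):  # VP -> VERB NP
    j = word("VERB", words, i)
    return None if j is None else noun_phrase(words, j)


PARSERS = {"NOUN": noun, "NP": noun_phrase, "VP": verb_phrase}


def conjoin(sentence, order):
    """Try each candidate category after "and" in the given order.

    Returns (category conjoined, failed attempts before it was found),
    or (None, failures) if every attempt fails.
    """
    words = sentence.split()
    start = words.index("and") + 1
    failures = 0
    for category in order:
        if PARSERS[category](words, start) == len(words):
            return category, failures
        failures += 1
    return None, failures


LOWEST_FIRST = ["NOUN", "NP", "VP"]
HIGHEST_FIRST = list(reversed(LOWEST_FIRST))
for s in ["the giraffe ate the apples and peaches",
          "the giraffe ate the apples and drank the vodka"]:
    print(conjoin(s, LOWEST_FIRST), conjoin(s, HIGHEST_FIRST))
# ('NOUN', 0) ('NOUN', 2)
# ('VP', 2) ('VP', 0)
```

Reversing the list reverses which sentence incurs the failed attempts, and nothing else in the sketch has to change.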
Recall that in section 1.2 we suggested that there was a principle of UG which blocked relativization of an NP in a coordinate structure. The fact discussed by Winograd shows that it is also not possible to question an NP in a coordinate structure. We suggested at the time that principle (8) might follow from more general considerations, and the data involving questions suggests such a more general principle: (21)
No gaps may appear in only one term of coordinate structure.19
This new principle accounts for the Hebrew and English relative clause facts mentioned in section 1.2 as well as facts about questions. Winograd in effect proposes the parsing analogue of this constraint. His constraint will also account for all the sentences noted above. The two principles are different, however, in that (21) is a constraint on possible syntactic configurations and hence is a constraint on the syntactic component, while Winograd’s constraint is a constraint on the parser. A theory of UG containing (21) will not have to specially stipulate such a parsing constraint since the syntactic component will rule out all such structures anyway; similarly, a theory containing Winograd’s principle could freely generate such structures in the syntactic component, and their unacceptability will be traced to the parsing constraint. Since both versions of the constraint will have the same empirical consequences (for they will both rule out the same sentences), it will be possible to choose between them only on theoretical grounds, e.g., by elaborating yet more general principles of syntax and of parsing from which one of the interpretations of the constraint follows. Whatever the answer turns out to be, the point is of interest for a scientific theory of language because it involves finding explanations for linguistic phenomena. The fact that Winograd so rarely attempts this kind of explanation supports our earlier contention (see section 1.2) that his main aim is technological - the construction of a usable language-understanding system - and not scientific, and moreover, that pursuit of this goal will not have as its byproduct the development of explanatory theories of language. Ultimately, the explanatory adequacy of any of the components that Winograd programs into his computer will have to be decided on grounds which are independent of programming considerations. So, for example, no contribution is made to deciding the adequacy of Systemic Grammar by making it part of a language-understanding system. For the kinds of considerations which enter into decisions of explanatory adequacy arise only incidentally in the course of programming a language-understanding system. Many of the general points that we made concerning Winograd’s syntactic system carry over to his treatment of semantics. Winograd states that a semantic theory must define the meaning of words, must be able to relate the meaning of groups of words in syntactic structures, and must describe how the meaning of a sentence depends on context. We will consider first one of these components: his treatment of the meanings of words. One of the problems of a theory of semantics is to outline a theory of the semantic structure of the lexicon. As part of such a theory, we would seek to give a characterization of how meanings of words are represented in the mind. In particular, one would like to discover what innate principles of semantic organization are used by a person in the acquisition of word meanings. One approach to this question is that of Katz and Fodor (1963) who sought to decompose the meanings of words into semantic markers.
19Cf. Ross (1967).
For example, the word bachelor would in one of its interpretations be decomposed into (adult, male, not married). The problem with the Katz-Fodor theory was that the number of semantic markers seemed to approach the number of words in the language. In the extreme case each word becomes its own semantic marker and the problem of discovering semantic primitives remains where it was before. Winograd’s semantic theory employs semantic markers of roughly the Katz-Fodor type, but he does not face the problem of discovering which features are basic and which are peripheral, for he simply stipulates all those features which he will need to converse with a computer about the blocks world. This is not too hard to do, as this world is extremely small, involving only a table, some blocks of different shapes and colors, and in which a small number of activities such as moving blocks and counting them goes on. The semantic markers that Winograd uses are diagrammed by him as shown in (22):
(22) [Diagram: hierarchy of semantic markers, including #PROPERTY, #SIZE, #LOCATION, #COLOR, #ANIMATE, #THING, #ROBOT, #PHYSOB, #CONSTRUCT, #PILE, #HAND, #ROW, #TABLE, #MANIP, #BOX, #PYRAMID, #BLOCK, #BALL, #EVENT, #RELATION, and #TIMELESS.] (1972: 128)
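A Katz-Fodor style decomposition of the kind described above can be sketched as a lexicon that maps each word to a set of markers. Only the entry for bachelor (adult, male, not married) follows the text; the entry for man, the "not-" prefix for negated markers, and the function names are hypothetical. Such a scheme can derive analytic entailments and flag anomalies like "married bachelor", but nothing in it says which markers are primitive.

```python
# Hypothetical decompositional lexicon in the Katz-Fodor style.
# A marker written "not-X" is read as the negation of marker X.
LEXICON = {
    "bachelor": {"adult", "male", "not-married"},
    "man": {"adult", "male"},
}


def entails(word, marker):
    """A word analytically carries every marker in its decomposition."""
    return marker in LEXICON[word]


def combine(modifier_markers, noun):
    """Union a modifier's markers with a noun's; None if they contradict."""
    markers = modifier_markers | LEXICON[noun]
    if any("not-" + m in markers for m in markers):
        return None  # e.g. both "married" and "not-married" are present
    return markers


print(entails("bachelor", "male"))          # True
print(combine({"married"}, "bachelor"))    # None
```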
An examination of these features reveals that the system is idiosyncratically geared to the requirements of the blocks world20. It is hard to see how a theory which treats such features as #BOX or #TABLE on a par with features like #SHAPE and #SIZE could be extended in any general way without running into the problems of a Katz-Fodor theory. In fact, the problems of a Katz-Fodor theory remain exactly where they were. By limiting his domain to only a small artificially-constructed world Winograd is able to give his computer all the semantic features which it will need to deal with that world. But the real world cannot be exhaustively specified in this way. Winograd’s approach is inapplicable to the main problems of a theory of word meaning, which is to specify principles according to which the meanings of words are organized and learned21.
20“Various researchers have approached world knowledge in various ways. Winograd (1972) dealt with the problem by severely restricting the world. This approach had the positive effect of producing a working system and the negative effect of producing one that was only minimally extendable.” (Schank and Abelson 1976: 1) “[Winograd’s] semantics is tied to the simple referential work of the blocks in a way that would make it inextensible to any general, real world situation. Suppose ‘block’ were allowed to mean ‘an obstruction’ and ‘a mental inhibition’, as well as ‘a cubic object’. It is doubtful whether Winograd’s features and rules could express the ambiguity, and, more importantly, whether the simple structures he manipulated could decide correctly between the alternative meanings in any given context of use.” (Wilks 1974:s)
The other two parts of Winograd’s semantic component as well as the
manner in which he encodes knowledge and the deductive system in his programs suffer from the same shortcomings. No general principles are enunciated which underlie the functioning of these systems and so whatever success they enjoy is purely a function of the limited domain in which they operate. We will not discuss the details of these components any further, as our critique of these parts of his model is parallel to what we had to say about the other components, and should by now be familiar. Up to now we have been focusing on how Winograd treats the various components of his language-understanding system in isolation. However, Winograd stresses repeatedly that his system is of interest not so much in its handling of various isolated components but mainly in how these components interact with each other to produce a total language-understanding system: To model this language understanding process in a computer we need a program which combines grammar, semantics and reasoning in an intimate way concentrating on their interaction.... To write a program we need to make all of our knowledge about language explicit, we have to be concerned with the entire language process, not just one area such as syntax. This provides a rigorous test
for linguistic theories, and leads us into making new theories to fill the places where the old ones are lacking. (1972: 2) Much of the research on language is based on an attempt to separate it into distinct components - components that can be studied independently,... This paper describes an attempt to explore the interconnections between the different types of knowledge required for language understanding. (1973: 152-3)
Even though Winograd has not formulated any principles that underlie the various components in isolation, nevertheless it is possible that his work does bear on that part of the theory of language which deals with the interaction of these components. His reference to making new theories leads one to believe that this is one of his interests. Recall that the requirements of a theory of interaction are no different from those of a theory of any of the individual components. Again, it is not enough to build some model which is adequate for some particular task. Rather, one must provide explanatory principles which dictate the form that these interactions must take. Yet even in the area of interaction among components, Winograd’s methodology is the same as the one he uses for each component. Here too his interest remains to design a model which will work reasonably efficiently for the domain he has chosen without going into any of the theoretical or empirical considerations which are of primary interest for a scientific theory of language or intelligence. An example of his approach is his treatment of the interaction of the syntactic and semantic components: there is no need to wait for a complete parsing before beginning semantic analysis. The NOUN GROUP specialist can be called as soon as a NOUN GROUP has been parsed, to see whether it makes sense before the parser goes on. In fact, the task can be broken up, and a preliminary NOUN GROUP specialist can be called in the middle of parsing (for example, after finding the noun and adjectives, but before looking for modifying clauses or prepositional phrases) to see whether it is worth continuing, or whether the supposed combination of adjectives and noun is nonsensical. Any semantic program has full power to use the deductive system, and can even call the grammar to do a special bit of parsing before going on to the semantic analysis. For this reason it is very hard to classify the semantic analysis as “top-down” or “bottom-up”. In general each structure is analyzed as it is parsed, which is a bottom-up approach. However whenever there is a reason to delay analysis until some of the larger structure has been analyzed, the semantic specialist programs can work in this top-down manner. (1972: 29-30)
21For an example of some interesting empirically based theoretical proposals concerning the learning of word meanings see Carey (1976).
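The interleaving described in the quotation can be sketched as a noun-group parser that hands control to a semantic check in mid-parse. Everything here is a hypothetical simplification, not Winograd's code: the noun-group pattern ([the] ADJ* NOUN), the word lists, the "world knowledge" set and the function names are assumptions, and the semantic specialist is reduced to a single callback. The check runs after the noun and adjectives but before any modifiers are sought, simply because the parser is written to call it there.

```python
ADJECTIVES = {"red", "green", "big"}
NOUNS = {"block", "pyramid", "idea"}
PHYSICAL_OBJECTS = {"block", "pyramid"}  # hypothetical world knowledge


def sensible(adjectives, noun):
    """Hypothetical semantic specialist: colour and size words need a physical object."""
    return not adjectives or noun in PHYSICAL_OBJECTS


def parse_noun_group(words, i, check):
    """Parse [the] ADJ* NOUN from position i; return (adjectives, noun, end) or None."""
    if i < len(words) and words[i] == "the":
        i += 1
    adjectives = []
    while i < len(words) and words[i] in ADJECTIVES:
        adjectives.append(words[i])
        i += 1
    if i >= len(words) or words[i] not in NOUNS:
        return None
    noun = words[i]
    # Preliminary semantic call in mid-parse: after the noun and adjectives,
    # before any modifying clauses or prepositional phrases are sought.
    if not check(adjectives, noun):
        return None
    # A fuller parser would go on to look for modifiers here.
    return adjectives, noun, i + 1


print(parse_noun_group("the big red block".split(), 0, sensible))  # (['big', 'red'], 'block', 4)
print(parse_noun_group("the green idea".split(), 0, sensible))     # None
```

Since the check is an arbitrary callback, the parser could just as well invoke it at any other point.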
The question here, as everywhere else, is why the components interact in just this manner. How do we know that the semantic component can intervene in the parsing of a noun group “to see whether it is worth continuing”? Winograd cites no empirical evidence that this is the way that parsing works, nor does he show that it captures any generalizations that could not otherwise be captured.²² Leaving aside any empirical considerations upon which Winograd’s model might or might not be based, one might still take his model to be an interesting hypothesis about how the various components in the language system interact. On closer examination, however, the empirical claim made by his hypothesis seems rather weak. For example, it is clear from the above quote that the semantic system has the power to be called at any time during the syntactic parsing and has “full power” to use the deductive system. In other words, there are no limits at all on the possible ways in which the semantic component can interact with the rest of the system. We are left with little more than the claim that meaning and reasoning are important in understanding language. No one has ever denied this; it would be absurd to maintain the contrary. What is at issue is not whether or not semantics plays a role at all but what this role is: what general principles delimit its role and make possible the acquisition of language understanding systems.²³
²² Winograd cites no evidence that humans in fact parse the way his system does. From the little that is known, it appears that, contrary to Winograd’s claim, the full power of the semantic and deductive systems cannot intervene at any arbitrary point in the syntactic parsing. For a review of experiments that have been done on this subject see Goodluck (1976).
We have seen that both in the details of his program and in its overall organization Winograd either avoids those questions which are of interest for a scientific theory of language and reasoning or else decides them in an arbitrary manner. This should not be surprising for, as Winograd puts it, his paper “explores one way of representing knowledge in a flexible and usable form.” But the aim of scientific research into language is not to produce theories which are flexible but, just the opposite, to produce theories which are maximally constrained, for only such theories can explain why things are the way they are and not some other way. Many of the features of Winograd’s system which we have pointed out as being undesirable if one’s interest is an explanatory theory of language are in fact virtues in a system which is designed to carry out limited tasks involving language. From this point of view, it is a virtue of Winograd’s system that his semantic programs have the power that they do.
For given our ignorance of how this component actually functions, it is good, if one’s interests are primarily practical, to have a system that will be able to handle, even if only in a brute force sort of way, whatever it turns out to be necessary to encode, and this entails designing systems which are maximally powerful and hence of minimal explanatory value. It is not for us to evaluate Winograd’s contributions in the area of programming. Nor do we wish to give the impression that it is a trivial task to construct a program such as Winograd’s. However, we do wish to emphasize that the requirements of a scientific theory of language can only be met by the formulation of general explanatory principles and not by the formulation of techniques for “integrating large amounts of knowledge into a flexible system”.
²³ “Indeed, it might be argued that, in a sense, and as regards its semantics, Winograd’s system is not about natural language at all, but about the other technical question of how goals and subgoals are to be organized in a problem solving system capable of manipulating simple physical objects.” (Wilks 1974: 5)
A second influential paper in AI is Minsky (1974).²⁴ In this paper, Minsky outlines what he believes to be “a partial theory of thinking” (1974: abstract). He tries to provide a comprehensive framework for work carried on in linguistics, psychology, and AI by elaborating a theory which goes beyond the provincial pursuits of these various subdisciplines. Theories in AI and psychology have on the whole been “too minute, local and unstructured to account ... for the effectiveness of common sense thought” (p. 1). Minsky’s goal is to deal with the big picture and give a structured account of what is to be done, and how it is to be done. The unified account he tries to present relies on the elaboration of the structure of an entity he calls a “frame”. Frames are data structures stored in memory which represent a “stereotyped situation” (p. 1). These stereotyped situations have two levels. The top part of a frame is quite immutable and represents those parts of a stereotype which are always true. The lower portions, however, are far less settled and can be deleted and replaced. Thus, frames can be “adapted to fit reality by changing details as necessary” (p. 1). Frames have the structure of node networks and relations, and can be linked into frame-systems through transformations; frame-systems can be linked by information retrieval networks which provide replacement frames when “a proposed frame cannot be made to fit reality” (p. 2).
The theory of frames he proposes is not complete, Minsky tells us, yet it is meant to be suggestive of what a comprehensive theory of human intelligence will look like. Its suggestiveness is supported by elaborating several areas where the scheme supposedly provides illuminating insights: “the ideas proposed here are not enough for a complete theory but ... the frame-system scheme may help explain a number of phenomena of human intelligence” (p. 3).
It should be clear that if Minsky is in fact able to do what he claims, then such a theory would have important implications for a scientific theory of human language. To assess Minsky’s claim, it is necessary to look at some of the specific proposals that he makes for dealing with particular problems. Before doing this, it is important to keep in mind just what a theory of frames will have to do to pass beyond the realm of metaphor into that of hypothesis. Minsky says that a frame can be regarded as “a network of nodes and relations”. For the theory to be of value, then, Minsky is obliged to tell us what the set of possible nodes is, that is, the types of nodes that we can have. He must also specify what the possible configurations of nodes might be. These two requirements are not exotic conditions on the adequacy of his proposal; they are conditions that must be met if Minsky is to be interpreted as saying anything at all, for if anything is a possible node and anything is a possible configuration, then frame theory says nothing at all. If frames are node networks, then the only way of knowing what properties they have is to outline the types of nodes in the network and the types of configurations these nodes can form. A second condition for the simple coherence of Minsky’s proposal is that he delimit the sorts of things that he envisions transformations among frames to be. It is a commonplace of research into language that unconstrained transformational power enables one to do anything. If one can do anything, explanation vanishes. Thus Minsky must tell us, in the course of his elaboration of frame theory, what a permissible transformation is. His inability to do this would be a major flaw in the theory of frames. Last but not least, Minsky’s theory must show what it is that underlies all the areas that he tries to unify. This can be done by showing what mechanisms the various subsystems have in common, e.g., certain primitive nodes or structural configurations of the various frames. This too is not something incidental to his presentation, for frame theory is presented as a unified theory of thinking, and if it is to be unified in more than name only it must provide principles common to the frames of the various subsystems. The fact is that Minsky’s presentation does none of these things.
²⁴ That Minsky’s paper is indeed influential is attested to by many in the AI world. Thus, Wilks refers to it as “an unpublished, but already very influential recent paper” (1974: 9). Schank and Abelson say that “Minsky’s frames paper has created quite a stir in AI and some immediate spin-off research...” (Schank and Abelson 1976: 1).
Minsky never elaborates or gives the slightest hint of what the substantive universals of his system are, i.e., what a possible node is. He never deals with what are or are not possible frame configurations or what can and cannot be changed in a frame. He never even tells us where the top or lower part of a frame is. Not only does he not say any of this explicitly, but the specific examples that he deals with leave one with no idea of how they can be elaborated into a general theory of human thinking. One can only conclude that the whole theory of frames, as he presents it, is quite vacuous. The nodes permitted are the ones needed for any specific problem. The structures permitted are those that one needs to solve any particular problem. The transformations permitted are those needed in any particular situation. In short, frame theory becomes little more than a rather cumbersome convention for the listing of facts. There is no theory of anything at all and especially not a theory of human thinking. Minsky presents a totally unconstrained system capable of doing anything at all. Within such a scheme explanation is totally impossible. In what follows we will examine a number of examples Minsky discusses. We do not deal with every one, but we think that the criticisms we make carry over completely to the examples we have not considered. To illustrate, let us consider how Minsky proposes to treat language. He cites the following two sentences discussed by Chomsky (1957):
(23) Colorless green ideas sleep furiously.
(24) Furiously sleep ideas green colorless.
Chomsky’s original point was that though both sentences are equally nonsensical, (23) is a far better sentence of English than (24), because (23) is syntactically well-formed while (24) is not. This suggests that there exists a level of syntactic organization which is largely independent of considerations of meaning. What is wrong with (23) is not the syntactic structure of the sentence but the fact that it violates selectional restrictions. Thus, ideas are not the sorts of things that can sleep, and a verb like sleep does not generally select an adverb like furiously. Minsky seems to be offering an alternative explanation, in terms of frames:
Since the meaning of an utterance is “encoded” as much in the positional and structural relations between the words as in the word choices themselves, there must be processes concerned with analyzing those relations in the course of building the structures that will more directly represent the meaning. What makes the words of [23] more effective and predictable than [24] in producing such a structure (putting aside the question of whether that structure should be called semantic or syntactic) is that the word-order relations in [23] exploit the (grammatical) convention and rules people usually use to induce others to make assignments to terminals of structures. (1974: 24)
Another way to put what Minsky has said here is that (23) is a grammatically well-formed sentence and (24) is not. Talking in terms of inducing others “to make assignments to terminals of structures” does not shed any further light on the subject. Minsky goes on to elaborate the distinction between (23) and (24) in terms of frames:
We certainly cannot assume that “logical” meaninglessness has a precise psychological counterpart. Sentence [23] can certainly generate an image! The dominant frame (in my case) is that of someone sleeping; the default system assigns a particular bed, and in it lies a mummy-like shape-frame with a translucent green color property. In this frame there is a terminal for the character of the sleep (restless, perhaps), and “furiously” seems somewhat inappropriate at that terminal, perhaps because the terminal does not like to accept anything so “intentional” for a sleeper. “Idea” is even more disturbing, because a person is expected, or at least something animate. I sense frustrated procedures trying to resolve these tensions and conflicts more properly, here or there, into the sleeping framework that has been evoked. Utterance [24] does not get nearly so far because no subframe accepts any substantial fragment. As a result no larger frame finds anything to match its terminals, hence finally, no top level “meaning” or “sentence” frame can organize the utterance as either meaningful or grammatical. By combining this “soft” theory with gradations of assignment tolerances, I imagine one could develop systems that degrade properly for sentences with “poor” grammar rather than none; if the smaller fragments (phrases and sub-clauses) satisfy subframes well enough, an image adequate for certain kinds of comprehension could be constructed anyway, even though some parts of the top level structure are not entirely satisfied. Thus, we arrive at a qualitative theory of “grammatical”: if the top levels are satisfied but some lower terminals are not we have a meaningless sentence; if the top is weak but the bottom solid, we can have an ungrammatical but meaningful utterance. (1974: 25)
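Minsky’s “qualitative theory of ‘grammatical’” can be restated as a four-way lookup. The sketch below assumes, as the discussion that follows argues, that “top levels satisfied” amounts to syntactic well-formedness and “lower terminals satisfied” to respecting selectional restrictions. The function name and its two boolean inputs are our own illustrative assumptions; Minsky gives no such procedure, and the case in which both levels are satisfied is implicit rather than stated in the quotation.

```python
def classify_utterance(top_satisfied, lower_satisfied):
    """Restate Minsky's 'qualitative theory of grammatical' as a lookup.

    top_satisfied   -- top-level (sentence) frame terminals are filled;
                       read here as syntactic well-formedness (an assumption).
    lower_satisfied -- lower terminals accept their fillers;
                       read here as selectional restrictions (an assumption).
    """
    if top_satisfied and lower_satisfied:
        return "grammatical and meaningful"
    if top_satisfied:
        return "grammatical but meaningless"
    if lower_satisfied:
        return "ungrammatical but meaningful"
    return "neither grammatical nor meaningful"


print(classify_utterance(True, False))   # sentence (23)
print(classify_utterance(False, False))  # sentence (24)
```

Once the two inputs are given their syntactic and semantic readings, nothing in the lookup depends on frames at all.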
Minsky asserts that the “dominant frame” in (23) “is that of someone sleeping” and that “in this frame there is a terminal for the character of the sleep” though “‘furiously’ seems somewhat inappropriate at that terminal”. It is difficult to see how what Minsky says is any different from saying that the main verb of the sentence is sleep and, while this verb can in general take an adverb of manner, furiously is not one of those; i.e., it violates selectional restrictions. Similarly, it is hard to see how saying of sentence (24) that “no subframe accepts any substantial fragment” is any different from saying that no substantial fragment of (24) is grammatically (syntactically or semantically) well-formed. Nor is there any difference between saying that “no top level meaning or sentence frame can organize the utterance as either meaningful or grammatical” and saying that the sentence is both syntactically and semantically deviant, i.e., that its structure is not one of a sentence of English and that it violates selectional restrictions. Minsky’s further speculations about combining this “‘soft’ theory with gradations of assignment tolerances” and his speculations about a “qualitative theory of ‘grammatical’” amount to little more than the observation that not all syntactically deviant sentences are incomprehensible and not all semantically deviant sentences are ungrammatical, and that grades of acceptability derive from the combination of syntactic and semantic deviations.²⁵ The notion of frames adds nothing to this discussion. Since top levels of frames encode syntactic information while lower terminals encode semantic information, the development of a qualitative theory of grammaticality will depend on the further development of theories of syntax and semantics, including the elaboration of notions such as selectional restriction. Changing syntactic and semantic to top and bottom seems unlikely to add anything to the further development of this theory.
²⁵ For further discussion of the notion ‘degree of grammaticalness’, see Chomsky (1965: 148-153).
Another area of language where Minsky believes that frames can make a contribution is in the distinction between actives and passives:
Just as any room can be seen from different physical viewpoints, so any assertion can be “viewed” from different representational viewpoints, as in the following, each of which suggests a different structure:
He kicked the ball.
The ball was kicked.
There was some kicking today.
Because such variations formally resemble the results of the syntactic, active-passive operations of transformational grammars, one might overlook their semantic significance. We select one or the other in accord with thematic issues: on whether one is concerned with what “he” did, with finding a lost ball, with who damaged it, or whatever. One answers such questions most easily by bringing the appropriate entity or action into the focus of attention, by evoking a frame primarily concerned with that topic. (1974: 33)
Minsky here makes the commonplace observation that languages possess devices for bringing certain parts of sentences into the focus of attention, and that changes of focus have semantic significance. The problem is to characterize exactly how languages deal with focus. For example, in English we can put the ball into focus by stressing it, as in He kicked THE BALL, or by using a special construction, as in It was the ball that John kicked. In both cases, we know that the ball is the focus because we know the rules for focus in English. One may wish to say that both these sentences “evoke” a certain kind of frame, one which is “primarily concerned” with the ball, but this formulation does not go any further than the simpler, traditional assertion that the ball is the focus in these sentences. There are at least two questions that can be asked about focus: what devices do languages use to focus items, and under what conditions will a speaker use these devices? Or, to use the terminology of frame theory: what kinds of frames can evoke a frame primarily concerned with a topic, i.e., some other frame? It seems to us that the relevant questions have not been sharpened by this translation. One might think that, even though frame theory adds nothing to the specifics of the theory of language, it nonetheless provides an illuminating framework which is capable of treating all cognitive systems, including language, in a unified theory. If this were true, one could argue that translating a theory of language into frame theory would allow one to express generalizations that hold between language and other cognitive systems, such as vision. The title of Minsky’s paper, “A Framework for Representing Knowledge”, and the first line of his abstract suggest that frames are intended to provide a unified theory of knowledge:
This is a partial theory of thinking, combining a number of classical and modern concepts from psychology, linguistics and AI. (abstract)
One similarity between language and vision that Minsky proposes concerns the following “analogy” between a frame for a room and a frame for a noun phrase:
Consider the analogy between a frame for a room in a visual scene and a frame for a noun-phrase in a discourse. In each case, some assignments to terminals are mandatory, while others are optional. A wall need not be decorated, but every moveable object must be supported. A noun phrase need not contain a numerical determiner, but it must contain a noun or pronoun equivalent. One generally has little choice so far as surface structure is concerned: one must account for all the words in a sentence and for all the major features of a scene. (1974: 32)
At this level of generality, one can think of many other objects which are analogous to rooms and noun phrases in just this way. To take an example from the cognitive domain of taste, consider an ice cream sundae. Here too “some assignments to terminals are mandatory while others are optional”. An ice cream sundae need not have nuts or fruit, and it can have any number of toppings (whipped cream or marshmallow readily spring to mind, though others are possible), but it must have ice cream. For what is an ice cream sundae without ice cream! A banana split, while sharing many of the terminals of a sundae, differs crucially from it in that the bananas, while only optional in a sundae, are obligatory in a banana split. In much the same way, a noun is only optional in a sentence (e.g., Move over!) but is obligatory in a noun phrase. The correct conclusion here is not that language is similar to vision (or to taste) in a way that is illuminated by thinking in terms of frames, but that words have meanings and the objects they refer to obligatorily have certain characteristics and optionally have others. While it is not uninteresting to attempt to discover generalizations that hold across different cognitive systems, most people will concede that generalizations such as the ones proposed above are not particularly illuminating. Frame theory, not being a theory at all but a rather confusing way of talking, is of no more value in capturing generalizations across cognitive systems than it is in capturing generalizations within cognitive systems. It should be stressed that the inability of frame theory to shed light on any of the examples that Minsky discusses (and we have only picked out a few of these) stems not from any insufficient working out of the details of the theory, but is a fundamental feature of the theory. The point is not that Minsky is lacking, as he himself admits, “a unified coherent theory” (p. 1); much worse, he lacks a theory of any sort.
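The analogy can be made explicit with a minimal sketch, assuming (our simplification, not Minsky’s specification) that a frame is nothing more than a name plus sets of mandatory and optional terminals. The class `Frame`, its method `fits`, and all terminal names are hypothetical. The same check serves equally for a noun phrase, a sundae or a banana split, which is the point made above: it records which slots each thing has without explaining anything about any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame: a name plus mandatory and optional terminal names."""
    name: str
    mandatory: frozenset
    optional: frozenset = frozenset()

    def fits(self, assignments):
        """True if every mandatory terminal is filled and no unknown terminal is used.

        `assignments` maps terminal names to fillers; None means "unfilled".
        """
        filled = {terminal for terminal, filler in assignments.items() if filler is not None}
        allowed = self.mandatory | self.optional
        return self.mandatory <= filled and filled <= allowed


NOUN_PHRASE = Frame("noun phrase", frozenset({"noun"}),
                    frozenset({"determiner", "numeral", "adjective"}))
SUNDAE = Frame("ice cream sundae", frozenset({"ice_cream"}),
               frozenset({"nuts", "fruit", "bananas", "topping"}))
BANANA_SPLIT = Frame("banana split", frozenset({"ice_cream", "bananas"}),
                     frozenset({"nuts", "topping"}))

order = {"ice_cream": "vanilla", "topping": "whipped cream"}
print(SUNDAE.fits(order))                       # True: bananas are optional in a sundae
print(BANANA_SPLIT.fits(order))                 # False: bananas are obligatory here
print(NOUN_PHRASE.fits({"determiner": "the"}))  # False: a noun phrase needs a noun
```

Every domain-specific fact lives in the hand-written terminal lists; the `Frame` machinery itself contributes nothing beyond a subset test.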
An analogy would be a person proposing a unified theory of physics which sees the entire universe as consisting of objects and relations, without specifying any of the objects or any of the relations. It could be argued in defense of this proposal that the objects and relations could be filled in in due time and that it is unfair to criticize it simply because it has not yet produced any results. The problem with such a theory, as with Minsky’s, is not that it is not sufficiently worked out, but that whatever needs to be worked out to make it a theory could be worked out just as well without it. Though for the most part frame theory seems to be only a way of talking, there is one point concerning which Minsky might be interpreted as making a substantive claim:
Here is the essence of the theory: When one encounters a new situation (or makes a substantial change in one’s view of the present problem) one selects from memory a structure called a Frame. This is a remembered framework to be adapted to fit reality by changing details as necessary. (p. 1)
The above statement may be interpreted as implying that cognition is mostly a matter of retrieving structures from memory and that learning is therefore mostly a matter of storing these structures in memory. If by “structures” Minsky means representations of particular things or events (e.g., the appearance of particular rooms, the structures of particular sentences), then at least for the case of language he is surely incorrect. For we have seen that language, being an open-ended system, cannot be learned as a list of patterns, much less as a list of sentences, but must be learned as a system of rules. There is no reason to suppose that similar considerations do not hold of other cognitive systems such as vision.²⁶ Of course, it is possible that the structures Minsky refers to can also be sets of rules or in fact whole theories; but now we are back to frames as a way of talking, and the question of empirical claims does not arise at all. In its emphasis on representation as opposed to explanation (his paper is called “A Framework for Representing Knowledge”; our emphasis), Minsky’s orientation is similar to Winograd’s. Both are interested in developing flexible systems (and nothing can be more flexible than frame theory) in order to represent, i.e., encode in some manner, any possible fact. Both presuppose the types of explanatory theories which it is the aim of scientific research to discover. For these reasons, neither approach is capable of contributing to a theory of human intelligence in general, or to a theory of language in particular.²⁷
²⁶ “Much of the discussion of Minsky’s frames paper is concerned with visual perception. There are a number of conclusions about visual perception and visual memory that result from the frames paradigm which appear to me to be wrong and even wrongheaded. The notion that we store a large number of separate views in purely symbolic form is one such.” (Feldman 1975: 102)
Another direction that work in AI on language has taken can be seen in the research of Roger Schank (1972, 1973). Schank is primarily interested in modelling the way meaning is represented in natural language, and in how these representations are arrived at in the course of processing a sentence:
Basically, the view of language understanding expressed here is that there exists a conceptual base into which utterances in natural language are mapped during understanding. Furthermore, it is assumed that this conceptual base is well-defined enough such that an initial input into the conceptual base can make possible the prediction of the kind of conceptual information that is likely to follow the initial input. Thus, we will be primarily concerned with the nature of the conceptual base and the nature of the mapping rules that can be employed to extract what we shall call the conceptualizations underlying a linguistic expression. (1973: 188)
Unlike Winograd, Schank is not trying to present an actual computer program that can understand some fragment of natural language. Rather, he explicitly maintains that he is presenting a theory that transcends the limitations of particular computer programs which are designed to handle natural language understanding only in particular domains:
The question of natural language understanding by computers is so enormous that those who have made attempts to solve the problem have had to severely limit the domain of the particular problem that they were trying to solve, and sacrifice theoretical considerations for programming considerations. Previous attempts at natural language understanding have neglected the theory of an integrated understanding system for the practicality of having a system that operates.... What is needed, and what has been lacking, is a cohesive theory of how humans understand natural language without regard to particular subparts of that problem, but with regard to that problem as a whole. The theory that is described here is also intended to be a basis for computer programs that understand natural language. But what is described here is a theory not a program. (1973: 552-553)
²⁷ Even Schank and Abelson, who “... agree with much of what Minsky said about frames...”, find that “[t]he frames idea is so general, however, that it does not lend itself to applications without further specialization” (Schank and Abelson 1976: 1). Less charitably, but more accurately, Feldman writes that “... the main difficulty with the frames paradigm as a theory [is that] it seems to be extensible to include anything at all. We can ask that a theory be at least conceptually refutable” (Feldman 1975: 102).
Before looking at how Schank elaborates such a theory, it is worthwhile clarifying what appears to be a confusion in his discussion of how a theory of meaning of this kind will fit into a total theory of language. In keeping with his misconception (discussed in section 1.2) that some theories of language are “syntax-based” while his is “conceptually based,” Schank misconstrues the role that syntax has been held to play in a theory of language:
What does a syntactic structure have to do with the structure of a conceptual base? Consider the following sentence:
1. I hit the boy with the girl with long hair with a hammer with vengeance.
It should be clear that the syntactic structure of sentence 1. will not provide the information necessary for dealing with the meaning of this sentence. That is, if we need to glean from this sentence the information that the hammer hit the boy we would have to do it by methods more sophisticated than syntactic analysis. In fact the syntactic formations used will be of very little value in that task. Now consider sentences 2. and 3.
2. John’s love of Mary was harmful.
3. John’s can of beans was edible.
These two sentences have identical syntactic structures. Of what use is this information in dealing with the conceptual content that they express? In 3. it is the case that the beans are edible. If in fact ‘Mary is harmful’ is true, it has not been expressed by this sentence. (1973: 188-189)
It is not clear to us exactly what Schank is trying to demonstrate with these examples. If he is arguing that syntactic analysis alone is insufficient for providing the meaning of a sentence, then he is arguing for something that no one has ever doubted, or could have doubted. Syntactic analysis by itself can tell us nothing about the meaning of sentences; rather, the results of syntactic analyses are sets of structural descriptions which must be interpreted before they mean anything at all. If he is trying to demonstrate that the relation between syntax and meaning is not one-to-one, then this fact too has hardly ever been doubted. It is just this observation which lends credence to the view that syntactic systems are organized along lines that are largely independent of semantic considerations, and hence have interesting properties that cannot be derived from the role they play in conveying meaning. On the other hand, it should not be concluded that syntax plays only an incidental role in conveying meaning. Schank appears to approach such a view in passages such as the following:
Whereas no one would claim today that syntactic analysis of a sentence is sufficient for programs which use natural language, it may not even be necessary. This is not to say that syntax is not useful: it most certainly is. But its function is as a pointer to semantic information rather than as a first step to semantic analysis as had been traditionally assumed. (1972: 555)
Despite his disclaimer, Schank appears to be implying that the role of syntax in conveying meaning is at best somewhat marginal, limited perhaps to the supplying of some cues, or “pointers” to the meaning. However, it is not hard to see that every aspect of syntactic form is fundamental in the conveying of meaning. Take, for example, such a relatively marginal syntactic fact as the placement of determiners before their head nouns, rather than after. If this aspect of syntax plays no major role in conveying meaning, we would expect that the sentence Tall man the hit small young bull a should convey about the same meaning as the sentence The tall man hit a small young bull. But of course the first of these sentences conveys no meaning at all. For sentences to be meaningful, they must in general be grammatical; although sentences can deviate more or less from strict grammaticality and still be intelligible (recall the discussion of “qualitative grammar” that arose in connection with Minsky’s paper), it is nevertheless fair to say that syntactic well-formedness is by and large necessary if a sentence is to be meaningful at all. Hence, syntax is more than just a pointer to semantic information; syntactic well-formedness is a precondition for a sentence’s having an interpretation at all. Schank’s assertion that “we are not interested in what is, and what is not, an acceptable utterance of English” (1973: 202) is comprehensible only given the presupposition that the sentences he is dealing with are grammatical; i.e., only given the presupposition that the syntactic processing necessary to distinguish a sentence of English from gibberish has already been done. It may well be true that certain rules of semantic interpretation do not require the full resources of a syntactic analysis. Just as most phonological rules require only a very impoverished syntactic representation in order to apply correctly, so too it is not inconceivable that certain rules of semantic interpretation are sensitive to only some aspects of syntactic structure. But even if this is true, it does not then follow, as Schank implies, that a full syntactic analysis is not necessary to the comprehension of a sentence.
This issue aside, let us now turn to a consideration of the theory that Schank proposes. What he is trying to account for is how people actually understand sentences. It is a truism that sentences do not explicitly contain all the information that people actually associate with them in the course of a conversation. Thus, people bring to bear a wide range of knowledge, suppositions, beliefs, and so on, to the task of understanding even the simplest sentence. Everybody knows that this is the case; one who wishes to explain this phenomenon must go beyond its mere statement by elaborating principles which can explain how this is done. The main defect with Schank’s theory of conceptual dependency is that it never goes beyond the level of restating the problem. What he does, in effect, is provide a series of anecdotes and examples which testify to the diversity of knowledge that humans automatically employ in the course of comprehending the meaning of what they hear. These anecdotes are encoded in a formalism which is claimed to have explanatory power. Yet a close look at this system reveals that it never goes beyond the listing of particular, arbitrarily chosen interpretations of the sentences whose meaning it is intended to explicate. The heart of Schank’s system is a conceptual base which consists of conceptual dependency networks called “C-diagrams”. These C-diagrams are intended to represent the meanings of sentences in a language-independent manner. The basic units of these conceptualizations are “concepts” which enter into relations called “dependencies”. For these C-diagrams to be part of a theory, rather than just a notation, Schank must give some indication of the type of entity that a concept can be and the type of relation that a dependency can be; also he must indicate some way of showing which C-diagram goes with which sentence. Furthermore, he must show that by defining these he is able to offer an explanation of phenomena which is otherwise unavailable. Consider the following sentence:
(25) John grew the plants with fertilizer.
In discussing the full meaning of this sentence, Schank notes the following points. First, he observes that when one says that someone grew the plants, what one really means is that someone did something which caused the plants to grow (i.e., the verb grow is a causative). Next he notes that when something grows it gets bigger. As a first approximation of the conceptualization of (25) he presents the diagram (26):
(26)
         p
    John ⇔ do ←I- fertilizer
      ⇑
    plants ⇔ phys st size = x + y
                   ↑
              phys st size = x
In (26), the different kinds of arrows represent different types of dependencies. The arrow ⇔ indicates a mutual dependency between an actor and an action; the p indicates that the event occurred in the past. The arrow ←I- represents “conceptual instrument”; hence, the top line indicates that John performed some unspecified act with fertilizer. The vertical arrow ↑ indicates a change of state, so the bottom line represents the plants growing, changing in size from x to x + y. Finally, ⇑ represents intentional causation and is a relation between two conceptualizations. The whole diagram indicates that John did something with fertilizer and that by doing this he intentionally caused the plants to grow. Schank goes on to note that even this is only a rough approximation of what is going on:
In fact, ‘fertilizer’ was not the instrument of the action that took place. It was the object. Consider what probably happened. John took his fertilizer bag over to the plants and added the fertilizer to the ground where the plants were. This enabled the plants to grow. (1973: 200)
Schank suggests that the unspecified action that John performed was to transfer the fertilizer from a bag into the ground. This action he denotes by TRANS, and the entire conceptualization is given as (27):
(27)
         p
    John ⇔ TRANS ←o- fertilizer ←D- (to: ground; from: bag)
      ⇑
    plants ⇔ phys st. size = x + y
                    ↑
               phys st. size = x
It is clear that the conceptualization proposed in (27) is by no means the only one that could be given for sentence (25). For example, there is no reason to believe that the fertilizer was contained in a bag; it could have been in a box or in a wagon. Similarly, the plants could have been in water and John might have poured the fertilizer into the water. These possibilities suggest that there are many other C-diagrams which could be proposed for (25). Hence the C-diagram that Schank actually proposes represents only one interpretation of sentence (25) and can in no way be said to represent the conceptualization that underlies (25). If, as these considerations suggest, there is no principled way of going from a sentence to a conceptualization, then, in Schank’s terms, the theory “is not doing its job”: The theory that is proposed here is doing its job if two linguistic structures, whether in the same or different languages, have the same conceptual representation if they are translations or paraphrases of each other. Furthermore, another subgoal of this research and test of the theory, is the question of the existence of explicit procedures for formally realizing two such linguistic structures as the one conceptual structure. (1972: 554-555)
If even a single linguistic structure cannot be realized as a unique conceptual structure, then obviously it is impossible for two different linguistic structures to be realized as a single unique conceptualization. It might be argued on behalf of Schank’s theory that the various alterations that we suggested for the C-diagram of (27) are only of a minor nature. It might be thought that even though one could replace bag by wagon or ground by water, the essential structure of the C-diagram will remain the same. This, however, is not the case. For just as (26) was found to be an inadequate representation of the meaning of (25), so too it is easy to see that (27) does not really give as full an account as possible of the meaning that can be associated with (25). For it is not really true that the mere act of transferring fertilizer from its container into the ground was what caused the plants to grow. The plants grew because they absorbed nutrients from the fertilizer and because they were able to break these down and use them, and so on to the limits of our present knowledge of organic chemistry and botany. All this involves only one sort of expansion of the C-diagram of (27). We have not even touched on the whole question of how John might have TRANSed the fertilizer, whether by shaking it or sprinkling it, etc. It should be quite clear, without going into the details, that a C-diagram containing even some of this additional information will bear very little resemblance to (27). Unless Schank can rule out this sort of infinite expansion, there is no principled way of specifying the C-diagram of even a single sentence, for there is no way of limiting, except arbitrarily, the complexity of any C-diagram. That this is true can be seen in Schank’s discussion of the following sentence:
(28) John ate the ice cream with a spoon.
Schank notes that “although syntactically ‘spoon’ is the instrument of ‘ate’, conceptually it is the object of an unspecified action that is the instrument of ‘ate’” (1973: 200). For reasons that remain obscure, Schank insists on representing the action represented by eat as INGEST, and proposes (29) as a first approximation to a C-diagram of (28):
(29)
         p
    John ⇔ INGEST ←o- ice cream
            ↑ I
    John ⇔ do ←o- spoon
The diagram indicates that John ingested the ice cream by doing something to a spoon. Schank notes, however, that this is really insufficient if one wishes to convey a full meaning of the sentence. For it is obvious to anyone who knows the language and is acquainted with even the most rudimentary knowledge of eating that John’s ingestion of the ice cream was made possible “by somehow getting the ice cream and his mouth in contact” (1973: 201). A fuller representation of (28) is (30):
(30)
         p
    John ⇔ INGEST ←o- ice cream
            ↑ I
    John ⇔ TRANS ←o- ice cream ←D- (to: mouth; from: spoon)
    (ice cream CONT spoon; mouth POSS-BY John)
Once again, there is no reason to stop here. As Schank makes clear, the endless expansion of any C-diagram is part of the very nature of his system: It is important to recognize that every ACT in a conceptual dependency theory case requires an instrumental case.... If every ACT requires an instrumental which itself contains an ACT, it should be obvious that we can never finish diagramming a given conceptualization. For sentence [28] for example, we might have: “John ingested the ice cream by transing the ice cream on a spoon to his mouth, by transing the spoon to the ice cream, by grasping the spoon, by moving his hand to the spoon, by moving his hand muscles, by thinking about moving his hand muscles,” and so on. Such an analysis is really what underlies the ACT ingest and is known to exist by all speakers when they use the word “eat”. (1973: 201)
It follows, then, that any given C-diagram is simply one of an endless list of possible C-diagrams. Schank has no principled way of choosing which of the endless series of different C-diagrams is required as the interpretation of any sentence. Rather, the choice is quite arbitrary: We shall not write in a conceptual diagram nor use in computer analyser any more than has been explicitly stated with respect to instruments, but we shall retain the ability to retrieve these instruments should we find this necessary. (1973: 201)
The above statements are at odds with the conditions that Schank sets on the well-formedness of conceptualizations: a C-diagram that contains only the sententially realized information will not be well-formed conceptually. That is, a conceptualization is not complete until all the conceptual cases required by the act have been explicated. (1972: 569)
These statements, taken together, imply that there will never be a well-formed conceptualization. Hence, even on Schank’s own terms, the theory of conceptual dependencies is incoherent. Moreover, the fact that every sentence can correspond to an endless series of different C-diagrams is not a minor problem for what purports to be an explanatory theory of human language understanding. For, as is clear from Schank’s discussion of sentences (25) and (28), there is no principled way to expand any C-diagram into a more complex C-diagram. Each step requires new information to be brought in. So, for example, in (28) we start with John eating ice cream with a spoon and we successively bring in his mouth, his hand muscles, his thought processes and so on. It is clear that this new information is idiosyncratic not just to the sentence under consideration but to any particular use of that sentence. Hence, there is no information which could in principle be excluded from the conceptualization of any sentence, given the right conditions. In other words, any sentence could have, as part of its meaning, anything at all.
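The regress described above can be made concrete with a small sketch. It is our illustration and not Schank's program. It uses a hypothetical Python encoding in which each conceptualization is an `Act` with a single instrument slot. Completeness is read, following the 1972 passage quoted above, as the requirement that every ACT have its instrumental case filled. The steps follow Schank's own paraphrase of (28). The ACT labels GRASP and MOVE are illustrative names we have chosen for those steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Act:
    """One conceptualization: an actor performing a primitive ACT on an object."""
    actor: str
    act: str
    obj: Optional[str] = None
    instrument: Optional["Act"] = None  # the instrumental case, itself an ACT


def innermost(c: Act) -> Act:
    """Follow the chain of instrumental ACTs down to the last one."""
    while c.instrument is not None:
        c = c.instrument
    return c


def depth(c: Act) -> int:
    """Count the ACTs in the instrumental chain, including the top one."""
    n = 1
    while c.instrument is not None:
        c = c.instrument
        n += 1
    return n


def is_complete(c: Act) -> bool:
    """Every ACT must have its instrument filled (our reading of Schank 1972: 569).
    The innermost ACT of any finite chain has an empty slot, so this is always False."""
    return innermost(c).instrument is not None


def expand(c: Act, step: Act) -> Act:
    """Fill the innermost empty instrument slot; the new ACT brings an empty slot of its own."""
    innermost(c).instrument = step
    return c


# Steps from Schank's paraphrase of (28); GRASP and MOVE are illustrative labels.
STEPS: List[Act] = [
    Act("John", "TRANS", "ice cream"),    # transing the ice cream on a spoon to his mouth
    Act("John", "TRANS", "spoon"),        # transing the spoon to the ice cream
    Act("John", "GRASP", "spoon"),        # grasping the spoon
    Act("John", "MOVE", "hand"),          # moving his hand to the spoon
    Act("John", "MOVE", "hand muscles"),  # moving his hand muscles
]

diagram = Act("John", "INGEST", "ice cream")
for step in STEPS:
    expand(diagram, step)
    # However many steps are added, the bottom of the chain is still open.
    assert not is_complete(diagram)

print(depth(diagram), is_complete(diagram))
```

Each call to `expand` lengthens the chain by one ACT and leaves exactly one empty slot. As a result, no finite diagram ever passes `is_complete`. Stopping at the explicitly stated instruments, as Schank proposes, leaves (29) incomplete by the same test.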
Now, we knew all along that people understand sentences in different ways, in more or less detail, depending on their knowledge of the subject, on the context of the discussion, on their state of mind at the time, and on many other factors. These are the facts that we start with; what is unexplained is how this occurs. Something purporting to be a theory of human language understanding should tell us explicitly under what conditions a speaker will arrive at any particular interpretation of a sentence. Or, to put it in Schank’s terminology, we want to know how far to expand a C-diagram (i.e., how many instruments we must posit) in any given situation. Schank’s answer is that we posit as many as “necessary”. This is not an answer; it is a restatement of the question. Given these difficulties with the notion of conceptual dependencies, it is difficult to see how they could play any useful role in a theory of processing. Yet it is one of Schank’s main claims that the positing of a level of conceptualizations is necessary to a theory of processing: it is necessary to recognize that an important part of the understanding process is in the realm of prediction... humans engaged in the understanding process make predictions about a great deal more than the syntactic structure of a sentence, and any adequate theory must predict much of what is received as input in order to know how to handle it. The point of this paper, then, is to explicate a theory of natural language understanding that: (1) is conceptually based; (2) has a conceptual base that consists of a formal structure; (3) can make predictions on the basis of this conceptual structure... (1972: 555-556)
According to Schank, the predictions about what is forthcoming in a sentence (or dialogue) can be made only at the conceptual level. The syntactic level will not suffice because it does not always contain enough information to make the required predictions: in attempting to uncover the actual conceptualization underlying a sentence, we must recognize that a sentence is often more than its component parts. In fact, a dialogue is usually based on the information that is left out of a sentence but is predicted by the conceptual rules. (1973: 196) It is precisely because the conceptual level goes beyond information provided by the syntactic rules that the conceptualizations can, Schank claims, make predictions in the course of understanding a sentence. According to Schank, when a word corresponding to an action is encountered in a sentence, the C-diagram of which it is a part is used to predict that it can have an actor and an object: The conceptual processor can then search through the sentence to find the candidates for these positions. It knows where to look for them by the sentential rules and what it is looking for by the syntax and semantics on the conceptual level. Thus, it is the predictive ability of the formulation of the conceptual rules that makes them powerful tools. (1973: 196)
Since we have seen that there is no unique C-diagram for any given sentence (that a full C-diagram for any sentence would have an infinite number of empty slots), it is difficult to see what this predictive ability amounts to. If, as Schank seems to imply in the above statements, the processor is going to have to fill all the empty slots predicted by the C-diagrams, then it will never succeed, because the number of empty slots is infinite due to the open nature of the instrument case. As there is no principled way of choosing one particular finite C-diagram for any sentence, there is no way for the conceptual processor to predict which slots to look for. We have seen in Schank’s discussion of sentence (28) that he has decided to limit C-diagrams only to those instruments which are explicitly stated. This means that the conceptual processor will no longer be able to predict instruments. Since, judging from the examples, it is precisely the instrument cases that provide much of the information which cannot be obtained from more purely linguistic levels, by giving these up Schank is giving up much of what is special to the level of conceptualizations for which he is arguing. Thus, consider once again sentence (28). What Schank is giving up is the ability to predict the expanded C-diagram (30). However, the C-diagram (29) which remains contains very little information which cannot be culled directly from the syntactic structure of (28). Looking only at the syntactic structure of (28), we can say that ate is the main action performed as it is the main verb, John is the actor as he is the subject, ice cream is the object, and with a spoon is some sort of instrumental associated with ate. This is essentially the information that is contained in the C-diagram (29). Even the presence of the unspecified verb do is predictable, since in Schank’s system it is impossible for a nominal to be a conceptual instrument; such a nominal must always be the conceptual object of an action.
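The derivation described in this paragraph can be sketched mechanically. The sketch is a hypothetical Python illustration and not part of Schank's system. The grammatical relations of (28) are supplied by hand rather than by a parser. The one-entry table mapping ate to INGEST stands in for the lexicon. Everything else in (29) follows from four syntactic inputs (the subject, the verb, the direct object and the instrumental with-phrase), plus the rule that a nominal instrument must be the object of an unspecified do.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Clause:
    """Grammatical relations read off a surface parse (given by hand here)."""
    subject: str
    verb: str
    direct_object: Optional[str] = None
    with_phrase: Optional[str] = None  # noun phrase in an instrumental "with" PP


@dataclass
class CDiagram:
    actor: str
    act: str
    obj: Optional[str] = None
    instrument: Optional["CDiagram"] = None


# Hypothetical verb-to-primitive table: the only non-syntactic input.
# Verbs missing from it are simply upper-cased.
PRIMITIVE: Dict[str, str] = {"ate": "INGEST"}


def from_syntax(clause: Clause) -> CDiagram:
    """Map subject -> actor, main verb -> ACT, direct object -> object."""
    diagram = CDiagram(
        actor=clause.subject,
        act=PRIMITIVE.get(clause.verb, clause.verb.upper()),
        obj=clause.direct_object,
    )
    if clause.with_phrase is not None:
        # A nominal cannot be a conceptual instrument, so it becomes the object
        # of an unspecified action "do" performed by the same actor.
        diagram.instrument = CDiagram(actor=clause.subject, act="do", obj=clause.with_phrase)
    return diagram


# (28) John ate the ice cream with a spoon.
d = from_syntax(Clause("John", "ate", "ice cream", "spoon"))
print(d)
```

The resulting structure has the same slots as (29). The instrument's do is supplied automatically whenever a with-phrase is present.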
When no action is explicitly posited, as in (28), we have to automatically supply the unspecified action do. From examples such as these one may conclude that Schank has not demonstrated that a conceptual level of the type that he posits contributes anything to a processor. We can see this more clearly by considering how this conceptual processor is supposed to work. Consider the following sentence:
(31) The boy ate a book.
Consider what even the most minimal theory of language would say about (31). First, we could note that ate is a transitive verb; i.e., that it may take a direct object. Second, we could note that the subject and object of ate must have certain semantic properties if the sentence is not to be somehow deviant. In particular, the subject should be some animate object, and the object should normally be something edible. These restrictions have traditionally been held to be stated in the dictionary entry of the word ate, where by ‘dictionary’ we mean that component of the grammar which lists the syntactic and semantic properties of words. Schank’s full theory of conceptualizations goes considerably beyond this level of information. However, by ignoring all non-explicitly stated conceptual instruments, his dictionary entries, which consist of unfilled C-diagrams, are limited to essentially this kind of information. (32) represents his dictionary entry for the verb eat:
(32)
    vt   x ⇔ INGEST ←o- y
         x: animal
         y: food
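Stripped of its notation, the entry in (32) records that eat is a transitive verb whose subject x is an animal and whose object y is food. Below is a minimal sketch of such a dictionary entry. It assumes a toy lexicon whose feature assignments are our own illustrative choices. Even so, these restrictions flag (31) as deviant without any C-diagram.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Entry:
    category: str         # "vt": may take a direct object
    subject_feature: str  # semantic feature required of the subject x
    object_feature: str   # semantic feature expected of the object y


# Toy lexicon; the feature sets are illustrative assumptions.
FEATURES: Dict[str, FrozenSet[str]] = {
    "boy": frozenset({"animal"}),
    "book": frozenset({"physical object"}),
    "ice cream": frozenset({"food"}),
}

LEXICON: Dict[str, Entry] = {
    "ate": Entry(category="vt", subject_feature="animal", object_feature="food"),
}


def violations(subject: str, verb: str, obj: Optional[str] = None) -> List[str]:
    """Return the subcategorization and selectional restrictions a clause breaks."""
    entry = LEXICON[verb]
    problems: List[str] = []
    if obj is not None and entry.category != "vt":
        problems.append(f"'{verb}' takes no direct object")
    if entry.subject_feature not in FEATURES.get(subject, frozenset()):
        problems.append(f"subject '{subject}' is not {entry.subject_feature}")
    if obj is not None and entry.object_feature not in FEATURES.get(obj, frozenset()):
        problems.append(f"object '{obj}' is not {entry.object_feature}")
    return problems


print(violations("boy", "ate", "book"))       # (31): the object restriction fails
print(violations("boy", "ate", "ice cream"))  # no violations
```

The check uses only the subcategorization and selectional information that a minimal dictionary already contains. Nothing in it depends on the dependency links drawn in (32).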
In (32), vt stands for transitive verb; as can be seen, it is subcategorized to occur with a direct object y. Both the subject, x, and the object, y, have the appropriate types of semantic features. Schank’s dictionary representation does not go beyond the most minimal assumptions of what would go into a dictionary entry; as such, it contributes nothing to what we already knew about this subject. In particular, the C-diagram itself is superfluous. This is so because everything it can predict in the course of processing a sentence (in this case, that ate normally occurs with an animate subject and an edible object) can be predicted equally well without it. Once one does away with the more exotic aspects of C-diagrams, such as those supplied by a never-ending series of instruments, they cannot be used to “make predictions about a great deal more than the syntactic structure of a sentence”. Insofar as these more elaborate C-diagrams are retained, they fail to make any processing predictions at all. Hence, the value of such a level in processing has not been demonstrated.
Incidentally, it is not clear to us what empirical facts are meant to be captured by the putative predictive power of a conceptual processor of the type being proposed here. Thus, though Schank constantly reiterates that humans make predictions about what they are going to hear in the course of a conversation, he never gives more than the vaguest indications of what these predictions are. In fact, he offers no evidence whatsoever that people make such predictions at all, beyond stating anecdotally that expectations of many different kinds exist. It is an empirical question to what extent this is true; it cannot be settled by fiat, as Schank does. Schank insists that “If we are going to design a computer program that is a model of human language understanding behavior, then we must in fact make the predictions at each level that a human is known to make” (1973: 190). But what are
these “known” predictions? Even at the syntactic level it is not clear what the facts are. Thus, when a person hears, at the beginning of a sentence, the words John took, what predictions does he make at that point about the structure of what will follow? It is plausible that one will predict a noun phrase object, as this is an obligatory element; but what about optional phrases, such as from Frank, to Harry, by force, and so on? It is not at all clear to what extent people attempt to predict (or rather, guess) the occurrence or non-occurrence of such elements. In the realm of concepts, where so many more choices exist, limited perhaps only by vague notions of appropriateness, the facts are simply mysterious. Not only is it unclear what predictions Schank is talking about; it is even unclear whether, supposing for a moment that his system can be made to work exactly as he wishes, the resulting system will in any way represent what it is that humans actually do.²⁸
The problems with the theory of conceptual dependencies go beyond the specific content of the theory, and involve also the entire approach to the problem of natural language understanding. For it is precisely the method of dealing with the “problem as a whole”, “without regard to particular subparts”, that makes the study of meaning unmanageable. Even though formidable problems stand in the way of a total theory of language understanding, one should not conclude that no progress can be made. As we suggested in section I.1, a rational approach to this kind of problem will involve breaking it up into more manageable subparts, and elaborating the properties of these. Such an approach would regard what we call the meaning of a sentence to be a function of different interacting subsystems.
Thus, the meaning of a sentence would be a function of the meanings of the words of the sentence, its syntactic organization, various semantic systems, such as those dealing with scope, reference, tense, and aspect, and pragmatic considerations of various kinds. For example, intonation plays an important role in conveying meaning. Consider the following much-discussed sentences:²⁹
²⁸ As one of the predictions that a human “is known to make” Schank asserts: With respect to syntax, rather than doing a complete syntactic analysis and then submitting the results of this analysis to some syntactic interpreter, we would like to use syntactic information as a pointer to the conceptual information. (1973: 190). Schank does not cite the source of his knowledge that people do not do a complete syntactic analysis; as we argued earlier, it is unclear what such a claim would involve. We do not know of any empirical evidence that supports it. Cf. Goodluck (1976). Riesbeck (1975) is equally certain of how human processing works: Why should consideration of the meaning of a sentence have to depend upon the successful syntactic analysis of that sentence? This is certainly not a restriction that applies to people. Why should computer programs be more limited? (1975: 15)
²⁹ Cf. Lakoff (1969), Chomsky (1972).
(33) John called Mary a Republican, and then SHE insulted HIM.
(34) John called Mary a Republican, and then she INSULTED him.
In the above sentences, the capitalized words are to be read as having heavy stress. One deduces from (33) that the speaker believes that to call someone a Republican is to insult him, while no such meaning is conveyed by (34). The difference in the meanings of these sentences can only be due to their differing intonation patterns, as they are otherwise identical. Notice that it is not necessary for a hearer of (33) to know ahead of time that it is an insult to be called a Republican; that the speaker believes this is deduced from the sentence, and may come as a surprise to the hearer. No special training or heritage is required to make this deduction; the hearer has only to know rules of English intonation. That this is so can be appreciated when we consider nonsensical sentences with the same intonation pattern:
(35) John glitched Mary, and then SHE blooped HIM.
Anyone who knows English can understand from (35) that, whatever else glitching is, to glitch someone is to bloop him. Examples such as these seem to indicate that questions of meaning are not all of a piece. Such intonation patterns bear directly on questions of meaning; however, their properties can be investigated to a large extent independently of other aspects of meaning. In fact, it is only by an investigation of this sort that one could even hope to account for what hearers understand by these sentences, given the otherwise identical nature of sentences (33) and (34). Having noticed these facts about intonation, one might want to go on to explore the various ways in which intonation contributes to meaning. To the extent that this type of research is successful, one will have made a step in the direction of understanding the larger phenomenon of meaning. Thus it is precisely by factoring out the contributions that such subsystems make to meaning that one can sharpen and clarify the issues involved. In contrast to such an approach, Schank’s C-diagrams haphazardly incorporate elements of meaning of all sorts. Thus, they indiscriminately mix together elements of the meanings of lexical items, information supplied by subcategorization and selectional restrictions, grammatical relations, and a grab bag of scraps of knowledge, beliefs and expectations. In such a system it becomes impossible to make distinctions; everything is a concept. Like frame theory, the theory of conceptual dependencies is not so much a theory as a confusing way of talking that obscures the distinctions that must be made if one is to have a theory. Schank states, “No claim is made that all of the problems of language analysis are solved by looking at things in our way, but we do feel that the problems have been made clearer” (1973: 24). On the contrary, the problems have been made less clear, for the reasons stated above.³⁰
There are two further points that Schank makes that are worth very briefly considering. In introducing the C-diagrams at the beginning of his paper he notes that one of the nodes that comprise a C-diagram (one of the elements related to ACT by the ⇔ relation) is called a Picture Producer (PP). PPs are one of the three “elemental kinds of concepts”, i.e., the nominal concepts (1972: 557). In describing what these things are, Schank says that words that are conceptually related to PPs conjure up mental images of things: That is, a word that is a realization of a nominal concept tends to produce a picture of that real world item in the mind of the hearer. (1972: 557) How seriously is one to take this? Clearly, if this is meant literally, then all sorts of problems crop up concerning the properties that these images have. Is the image that one has of man fat or thin? Is it bald, full of hair, etc.? These and other more serious problems immediately come to mind.³¹ In fact, a whole host of considerations, worked over among others by Wittgenstein, will operate to make this whole notion little better than incoherent. If, however, the term is not meant seriously, then how are we to take the notion of PP? Is it simply the conceptual analogue of NP? Schank never makes this clear. Another area where things are less than clear is in Schank’s discussion of the primitives that underlie his verbs. One of these primitives, TRANS, underlies verbs like come and go, fly, and other verbs of this kind. The problem, of course, is to say what this kind is. This Schank never does. He seems to be content to simply leave the process of decomposition at the stage of naming the primitives that he feels underlie a given verb, without any form of explication. This process of naming should not be confused with explanation.
To name something and then to refuse to outline the properties of what it is that one has named is not explanation at any interesting level.
³⁰ Thus we agree with Weizenbaum’s assessment: What is contributed when it is asserted that “there exists a conceptual base that is interlingual, onto which linguistic structures in a given language map during the understanding process and out of which such structures are created during generation [of linguistic utterances]”?... Nothing at all. For the term “conceptual base” could perfectly well be replaced by the word “something”. And who could argue with that so-transformed statement? Schank provides no demonstration that his scheme is more than a collection of heuristics that happen to work on specific classes of examples. The crucial scientific problem would be to construct a finite program that assigns appropriate conceptual structures to the infinite range of sentences that can occur in natural language. That problem remains as untouched as ever. (1976: 199)
³¹ For example, what picture is produced by the object of with in the sentence This theorem deals with geometrical figures too complex to be pictured in the mind. (This sentence was suggested by Noam Chomsky.)
To say that INGEST underlies eat and drink is no more explanatory, if left at this level, than to say that EAT underlies ingest or, for that matter, that DRINK underlies eat and ingest. Putting things into capital letters is not to explain their properties. What is needed is an account of what it is that underlies verbs of this sort beyond the level of vague intuition where Schank leaves it. There are several revealing respects in which Schank’s work resembles that of Minsky and Winograd.³² Foremost among these is an emphasis on notation over explanation. Despite Schank’s claim that he is aiming at an explanatory theory of language understanding, what he is really trying to do is develop a system by means of which some aspects of meaning can be encoded in a computer for some particular end. Given this orientation, many of the problems which we suggested were fatal to a theory of conceptual dependencies become unimportant. The fact that C-diagrams are in principle endless is not a major problem with respect to any particular domain. What one should include in a C-diagram and where to stop expanding the instruments can always be determined more or less satisfactorily if one knows ahead of time what one is programming the system for. As Schank points out: The point here is that computer programs that deal with natural language do so for some purpose. They need to make use of the information provided by a sentence so that they can respond properly. (What is proper is entirely dependent on the point of the computer program. A psychiatric interviewing program would want to respond quite differently to a given sentence than a mechanical translation program.) (1973: 189)
To take a more familiar example, the C-diagram of eat will be different depending on whether one wants to talk to a machine about restaurants or about digestion. This kind of flexibility, though something to be eliminated if one seeks an explanatory theory, is a virtue from the point of view of the “practical desire to have a usable language-understanding system”. Another feature that Schank’s system has in common with that of Minsky and Winograd is its immense reliance on memory. Although Schank states that it is his ultimate hope “to build a program that can learn as a child does how to do what we have described in this paper instead of being spoonfed the tremendous information necessary”, his system does not provide even the slightest hint as to how this might ever be possible. For the purposes of a usable language-understanding system adequate to some domain there is nothing wrong with programming into the memory as much knowledge as one wants. But if one wishes to build a machine that can learn as a child does (i.e., if one wishes to develop an explanatory theory of language or cognition), one must go beyond the development of techniques for the storing and retrieval of information from memory.³³
³² Schank’s more recent work on scripts (Schank and Abelson 1976: 1) is if anything even closer to Minsky’s frame theory. Schank and Abelson state: “The ideas presented here can be viewed as a specialization of the frame idea”.
To sum up then, we have seen that there exists no reason to believe that the type of AI research into language discussed here could lead to explanatory theories of language. This is because, first, workers in AI have misconstrued what the goals of an explanatory theory of language should be, and second, there is no reason to believe that the development of programs which could understand language in some domain could contribute to the development of an explanatory theory of language. An examination of actual work in AI, as opposed to its goals, has shown that this sort of research does not in fact contribute to the development of such a theory. Not only has work in AI not yet made any contribution to a scientific theory of language, there is no reason to believe that the type of research that we have been discussing will ever lead to such theories, for it is aimed in a different direction altogether. As Joseph Weizenbaum has written: ... what Winograd has done - indeed, what all of artificial intelligence has so far done - is to build a machine that performs certain specific tasks, just as, say, seventeenth-century artisans built machines that kept time, fired iron balls over considerable distances, and so forth. Those artisans would have been grievously mistaken had they let their successes lead them to the conclusion that they had begun to approach a general theoretical understanding of the universe, or even to the conclusion that, because their machines worked, they had validated the idea that the laws of the universe are formalizable in mathematical terms.
The hubris of the artificial intelligentsia is manifested precisely by its constant advance of exactly these mistaken ideas about the machines it has succeeded in building. (1976: 196) 33Reliance on memory is characteristic of much work in AI involving cognitive abilities. Newell and Simon write: ...that programmed computer and human problem solver are both species belonging to the genus ‘Information Processing System’.... The apparently complex behavior of the information processing system in a given environment is produced by the interaction of the demands of that environment with a few basic parameters of system, particularly characteristics of its memories. (Newell and Simon 1972: 870-871, cited in Weizenbaum 1976: 169). See also Weizenbaum’s discussion, 167 ff. Even more emphatically, Riesbeck asserts: COMPREHENSION IS A MEMORY PROCESS. This simple statement has several important implications about what a comprehension model should look like. Comprehension as a memory process implies a set of concerns very different from those that arose when natural language processing was looked at by linguistics. It implies that the answers involve the generation of simple mechanisms and large data bases. (Riesbeck 1975: 15) One of the beneficial effects that AI is capable of having on the study of natural language is, according to Wilks (1975: 144), its “emphasis on complex storage structures in a natural language understanding system: frames, if you like”.
III. Augmented Transition Networks
We have seen that models of language comprehension have played an important role in most of the AI work on language that we have surveyed in the previous sections. It has been a characteristic of all of these proposed comprehension models that they do not contribute to any general theory of comprehension; nor are they based on empirical data concerning the way that humans process and understand language. For these reasons such work does not really bear on the question of human language understanding in the normal sense. In the remainder of this paper we will be concerned with a computer-based model of parsing based on Augmented Transition Networks which does address itself to observed facts of human language comprehension. Nevertheless, we will show that the proposed model does not succeed in accounting for these observations in a principled way, despite claims to the contrary. We will begin with a brief discussion of how a theory of competence might be related to a theory of parsing, and then we will proceed to a review of the research involving the Augmented Transition Network model.
Transformational grammar is a model of linguistic competence, which is the tacit knowledge that speakers have of the structure of their language. It is not intended to be a theory of how this tacit knowledge is put to use in actually speaking and understanding a language. An overall theory of language use must specify, in addition to a theory of competence, a theory of production and a theory of parsing. In the ensuing discussion we will not be concerned with theories of production, given that it is unclear what such theories would be like (see note 335). As the relationship between theories of competence and theories of parsing can easily be misunderstood, it is worthwhile examining what the interaction between these could be. One possibility is that there is no relationship between the two, and that the parser only follows certain heuristic strategies of its own.
However, such a model leads to the strange conclusion that a person’s linguistic competence, i.e., his knowledge of the set of possible sentences in his language and their structures, plays no role in language use, i.e., in processing any particular sentence of his language. It is as if one were to suggest that a group of scientists possessing knowledge of the theory of rocketry which specified the characteristics that any rocket must have would not use it at all in the building of any particular rocket. Henceforth, we will assume that this position is incorrect. A more plausible theory is that there is some interaction between the competence and the parser, with the latter using the former as a sort of
template against which it matches incoming utterances. The specification of how it does this is the subject matter of the theory of parsing. A parser, then, would be a device with the following components: a grammar containing sets of possible structural descriptions (SDs), a schedule which orders these sets of SDs in a particular way, a heuristic filter which would suggest SDs according to certain heuristic principles, and a processor containing a short term memory. The parser would operate as follows. The processor would analyze the input string into words, assigning each of them its basic lexical category label. The parser would then attempt to match these strings of words with the SDs in the grammar in an order determined by the schedule and the heuristic filter. No doubt semantic and pragmatic information would play a role in ordering this matching process as well. The theory of parsing would be concerned with elaborating the properties of the various parts of the parser. Transformational grammar concerns itself with discovering the principles constraining the operation of the grammar. Other important work would involve defining the properties of the schedule, i.e., discovering the principles used in ordering the SDs, if any; defining the structure of the heuristic component, i.e., finding what sorts of cues such a component highlights to help the parser search through the possible SDs in finding the proper match for the input string; and elaborating the properties and principles underlying the functioning of the processor, e.g., investigating the structure of the short term memory and the scanning techniques used in processing incoming strings. The success of a theory of parsing will be a function of its success in elaborating the properties and principles underlying the functioning of the various components listed.
In a series of papers (Wanner and Maratsos, 1974; Wanner, Kaplan and Shiner, 1975) Wanner et al.
have claimed that experimental evidence supports the view that the Augmented Transition Network (ATN) model is a psychologically realistic model of certain aspects of human linguistic comprehension.³⁴ Moreover, they suggest that the success that this model has had in accounting for comprehension phenomena should lead one to prefer it to a TG. It should be clear from our above discussion that only another competence model can be compared to a TG. For their argument to go through, they must demonstrate, first, that the ATN incorporates a competence model which has properties which cannot be untrivially expressed in a TG and, second, that these properties make a crucial contribution to accounting for the observed phenomena. In what follows we will argue that neither of these is the case. The proposed ATN grammar is a version of TG and hence can have no properties that cannot be captured in a TG. Furthermore, the experiments that they discuss bear more directly on properties of the processor and the schedule than on any special properties of the grammar that they incorporate in their model.

³⁴ For discussion of the evolution of ATN see Wanner and Maratsos (1974: 5).

Wanner, Kaplan and Shiner (1975: 7-9) describe an ATN as follows:
"An ATN has three main components: a processor, a schedule, and an augmented network grammar. The ATN grammar contains information about the patterns of syntactic categories, phrases, and clauses which form the sentences of the language. In addition, the grammar contains a set of context-sensitive operations which assign grammatical functions to the parts of a sentence. The ATN processor analyzes a sentence by attempting to match the syntactic patterns stored in the grammar with the input sentence. This matching process is ordered according to the schedule. As matches are found, the processor executes operations which assign grammatical functions. We can illustrate this process with a simple example. Figure 1 [36] describes an elementary ATN grammar which is restricted in scope to a limited class of simple, active, affirmative sentences. This grammar is organized as a pair of networks. Each network is composed of a set of states represented as circles. The symbols inside each circle provide a unique name for each state. The arrows that connect the states in each network are called arcs. If an arc connects two states, then we will say that the ATN processor can make a transition between these states in the direction indicated by the arc. The labels on each arc indicate conditions which must be satisfied before a transition can be made. In addition, the list below [36] specifies a set of actions associated with each arc which must be executed by the processor whenever a transition is made.
(36) [Figure: ATN grammar consisting of a Sentence Network and a Noun Phrase Network; diagram not reproduced.]

Arc   Action
1     ASSIGN SUBJECT to current phrase
2     ASSIGN ACTION to current word
3     ASSIGN OBJECT to current phrase
4     ASSEMBLE CLAUSE; SEND current clause
5     ASSIGN DET to current word
6     ASSIGN MOD to current word
7     ASSIGN HEAD to current word
8     ASSEMBLE NOUN PHRASE; SEND current phrase

In general terms, the analysis of any sentence is accomplished in the following way. Beginning at the initial state in the network (here S0) with attention focused on the initial word in the input sentence, the processor attempts to find an arc whose condition is met by the input. This search must be carried out in accord with the specifications of the schedule. In principle any schedule is possible. The arcs leaving a state might be searched sequentially or in parallel. If sequential, the search might be arranged in fixed or flexible order. Here, however, we will assume for simplicity that the arcs leaving a given state are searched in a fixed, sequential order starting with the arc nearest the top of each state and proceeding in a clockwise direction. When this search locates an arc whose condition is met by the input, the processor will execute the actions associated with that arc, make the indicated transition to a new state, and shift attention to the next word in the sentence. This process will continue in an iterative fashion until it reaches the final state in the network (here S-end) with no more words left in the input sentence. The process then terminates, leaving in working memory a syntactic description of the input sentence which has been produced by the actions taken in the course of traversing the network."
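The traversal procedure just quoted can be sketched as a small interpreter. This is a minimal sketch under stated assumptions, not Wanner, Kaplan and Shiner's implementation: the state names (S0 to S3, NP0 to NP2), the arc encoding, the toy lexicon and the register names are hypothetical, chosen only to mirror (36) and its list of actions. Arcs are tried in a fixed, sequential order, as the quoted passage assumes; for simplicity the sketch backtracks only over the arcs at the current state and never re-enters a sub-network that has already returned.

```python
# Hypothetical toy lexicon: word -> syntactic category.
LEXICON = {
    "the": "Art", "a": "Art", "old": "Adj", "big": "Adj",
    "man": "N", "boy": "N", "fish": "N", "caught": "V", "loved": "V",
}

# Each network maps a state to its arcs, listed in the fixed order they are tried.
#   ("CAT", category, next_state, register)  consume one word of that category
#   ("SEEK", network, next_state, register)  traverse a sub-network for a phrase
#   ("SEND",)                                finish and return the assembled registers
NETWORKS = {
    "S": {
        "S0": [("SEEK", "NP", "S1", "SUBJECT")],   # cf. arc 1
        "S1": [("CAT", "V", "S2", "ACTION")],      # cf. arc 2
        "S2": [("SEEK", "NP", "S3", "OBJECT")],    # cf. arc 3
        "S3": [("SEND",)],                         # cf. arc 4
    },
    "NP": {
        "NP0": [("CAT", "Art", "NP1", "DET")],     # cf. arc 5
        "NP1": [("CAT", "Adj", "NP1", "MOD"),      # cf. arc 6 (a loop, hence Adj*)
                ("CAT", "N", "NP2", "HEAD")],      # cf. arc 7
        "NP2": [("SEND",)],                        # cf. arc 8
    },
}
START = {"S": "S0", "NP": "NP0"}


def traverse(net, state, words, i, registers):
    """Try the arcs leaving `state` in order; return (registers, next_index) or None."""
    for arc in NETWORKS[net][state]:
        if arc[0] == "SEND":
            return registers, i
        kind, label, next_state, register = arc
        if kind == "CAT":
            if i < len(words) and LEXICON.get(words[i]) == label:
                updated = dict(registers)
                if register == "MOD":  # several modifiers may be collected
                    updated["MOD"] = registers.get("MOD", []) + [words[i]]
                else:
                    updated[register] = words[i]
                result = traverse(net, next_state, words, i + 1, updated)
                if result is not None:
                    return result
        elif kind == "SEEK":
            phrase = traverse(label, START[label], words, i, {})
            if phrase is not None:
                sub_registers, j = phrase
                result = traverse(net, next_state, words, j,
                                  {**registers, register: sub_registers})
                if result is not None:
                    return result
    return None  # no arc's condition was met, so this path fails


def parse(sentence):
    """Return the registers for a complete analysis of `sentence`, or None."""
    words = sentence.lower().split()
    result = traverse("S", START["S"], words, 0, {})
    if result is not None and result[1] == len(words):
        return result[0]
    return None


print(parse("The old man caught the fish"))
```

Encoding the grammar as data and the schedule as list order keeps the two components separate, in line with the three-part division (processor, schedule, grammar) described above.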
The ATN model follows the general form of parsing models that we outlined earlier. In the case of the networks in diagram (36), the grammar is equivalent to a TG which contains the following phrase structure rules:
(37) S → NP V NP
     NP → Art (Adj)* N
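Because (37) contains no recursion, the category strings it generates form a simple regular set, so the equivalence with the network in (36) can be checked mechanically. The sketch below is an illustration only: it encodes each rule as a regular expression over space-separated category labels (the labels Art, Adj, N and V come from (37); the function name is hypothetical).

```python
import re

# NP -> Art (Adj)* N : an article, any number of adjectives, then a noun.
NP = r"Art(?: Adj)* N"
# S -> NP V NP
S = rf"{NP} V {NP}"


def generated_by_rules_37(categories):
    """True if the sequence of category labels is a sentence under (37)."""
    return re.fullmatch(S, " ".join(categories)) is not None


print(generated_by_rules_37(["Art", "Adj", "N", "V", "Art", "N"]))
```

The starred group plays the role of the Adj loop in the Noun Phrase Network of (36).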
It has long been known that phrase structure grammars do not have the power to adequately express the syntax of natural languages, and must be supplemented by additional devices.³⁵ In a traditional TG, these additional devices are transformations; in the ATN, the simple transition network of (36) is "augmented" by the addition of a HOLD list. The HOLD list has the ability to store elements in short term memory. This ability is required for parsing sentences containing restrictive relative clauses such as (38):
(38) The old man that the boy loved caught the fish.
³⁵ Cf. Chomsky (1957), especially chapters 3-5.
In sentence (38), there appears to be a "gap" after loved; the object of loved is missing. Moreover, the gap is interpreted as being the old man. In the ATN system this fact is captured by first storing the old man in the HOLD list and then retrieving it when the gap is reached. An ATN grammar containing the HOLD list is illustrated in (39):

(39) [Figure: Sentence Network and Noun Phrase Network augmented with HOLD arcs; diagram not reproduced.]

Arc   Action
1     ASSIGN SUBJECT to current phrase
2     ASSIGN ACTION to current word
3     ASSIGN OBJECT to current phrase
4     ASSEMBLE CLAUSE; SEND current clause
5     ASSIGN DET to current word
6     ASSIGN MOD to current word
7     ASSIGN HEAD to current word
8     ASSEMBLE NOUN PHRASE; SEND current phrase
9     HOLD
10    CHECK HOLD; ASSIGN MOD to current clause
11    ASSEMBLE NOUN PHRASE; SEND current phrase
12    (no action)

(from Wanner, Kaplan and Shiner 1975)
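The effect of the HOLD arcs can be sketched with a small recursive-descent parser that keeps an explicit HOLD list. This is a simplified illustration, not the ATN of (39): it writes the arc order as ordinary control flow, treats "that" as the only relativizer, uses a hypothetical toy lexicon, and assumes that any NP position where no article or proper name is found is a gap to be filled by retrieving the most recently held NP. Comments note which arc of (39) each step loosely corresponds to; `HoldParser`, `ParseError` and `parse` are hypothetical names.

```python
class ParseError(ValueError):
    """Raised when the toy grammar cannot analyze the input."""


# Hypothetical toy lexicon: word -> category ("Rel" marks the relativizer "that").
LEXICON = {
    "the": "Art", "old": "Adj", "eloise": "PN", "that": "Rel",
    "man": "N", "boy": "N", "fish": "N", "cheese": "N", "mouse": "N", "cat": "N",
    "caught": "V", "loved": "V", "chased": "V", "ate": "V", "pleased": "V",
}


class HoldParser:
    def __init__(self, sentence):
        self.words = sentence.lower().split()
        self.i = 0
        self.hold = []       # the HOLD list: head NPs waiting for their gaps
        self.peak_hold = 0   # most items ever on the HOLD list at once

    def category(self):
        return LEXICON.get(self.words[self.i]) if self.i < len(self.words) else None

    def take(self, category):
        if self.category() != category:
            raise ParseError(f"expected {category} at word {self.i}")
        self.i += 1
        return self.words[self.i - 1]

    def sentence(self):
        subject = self.noun_phrase()
        action = self.take("V")
        obj = self.noun_phrase()
        return {"SUBJECT": subject, "ACTION": action, "OBJECT": obj}

    def noun_phrase(self):
        if self.category() == "PN":
            return {"HEAD": self.take("PN")}
        if self.category() != "Art":
            # No article: treat this NP position as a gap (cf. arc 12, RETRIEVE HOLD).
            if not self.hold:
                raise ParseError(f"gap at word {self.i} but the HOLD list is empty")
            return self.hold.pop()
        phrase = {"DET": self.take("Art"), "MOD": []}
        while self.category() == "Adj":
            phrase["MOD"].append(self.take("Adj"))
        phrase["HEAD"] = self.take("N")
        if self.category() == "Rel":
            self.take("Rel")
            self.hold.append(phrase)              # cf. arc 9: HOLD the head NP
            self.peak_hold = max(self.peak_hold, len(self.hold))
            expected = len(self.hold) - 1
            clause = self.sentence()              # cf. arc 10: analyze the relative clause
            if len(self.hold) != expected:        # CHECK HOLD: its gap must have used it
                raise ParseError("relative clause without a gap")
            phrase = {**phrase, "REL": clause}
        return phrase


def parse(sentence):
    """Return (analysis, peak HOLD size) for `sentence`."""
    parser = HoldParser(sentence)
    tree = parser.sentence()
    if parser.i != len(parser.words):
        raise ParseError("words left over after the sentence")
    return tree, parser.peak_hold


print(parse("The old man that the boy loved caught the fish")[1])  # → 1
```

For (38) the held NP the old man is retrieved as the object of loved, exactly where the article search fails on caught.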
Looking at sentence (38), for example, the ATN will start in state S0. Arc 1 transfers it into the Noun Phrase network. Processing proceeds straightforwardly until the word that. At this point the old man is placed in the HOLD list by Arc 9. Arc 10 sends the system back into the Sentence network, and the boy loved is processed. Then the system executes Arc 3, searching for an NP, and attempts to find an article in accordance with Arc 5. Since caught is not an article, the system must back up over Arc 5 and attempt instead Arc 12, RETRIEVE HOLD. In executing Arc 12, it retrieves the contents of the HOLD list, in this case the old man, and interprets them as the object of loved. The processing of the rest of the sentence is straightforward.³⁶ The same effect can be achieved by any other processor based on a TG which relates a surface structure containing a gap after loved to some other structure in which the old man replaces the gap. Every version of TG proposed to date has done something like this, although there is a great deal of dispute about the rules that relate these structures.³⁷ One piece of evidence that Wanner and Maratsos (1974) adduce in favor of the ATN model of sentence comprehension is that it provides a "natural" account of the difficulties involved in processing multiply embedded relative clauses. It has been known for a long time that sentences with center-embedded relative clauses such as (40) are not very acceptable:
(40) The cheese that the mouse that the cat chased ate pleased Eloise.
Earlier explanations were advanced that focused on the structural properties of such sentences to account for the difficulty involved in processing them. Wanner and Maratsos (1974) offer a hypothesis which is framed not in terms of "global differences between sentence types" but "in terms of transient effects which arise and subside in the course of comprehension" (p. 4). They propose that multiple embedding of relative clauses overloads short term memory in the course of processing. In the ATN system the head of the relative clause is put on the HOLD list until the gap in the clause is reached. Then the information on the HOLD list, i.e., the head noun phrase, is retrieved and the HOLD list is cleared. When relative clauses are multiply embedded, however, more than one head noun phrase must be placed on the HOLD list, and this overloads short term memory, causing processing difficulties. The logic of this explanation should become clear with the following diagram of sentence (40) above:
(41) The cheese₁ that the mouse₂ that the cat chased ___₂ ate ___₁ pleased Eloise.

³⁶ Although in the case of definite descriptions it is true that the gap can be interpreted as the head noun with its determiner (in this case, the old man), in the general case this substitution will lead to the wrong results. For example, the sentence "Every old man that the boy loved caught a fish" does not imply that the boy loved every old man. What is required here is an interpretation which makes use of variables, such as 'For every x, such that x is an old man and the boy loved x, x caught a fish'. The ATN grammar as presented will give the wrong interpretation for every such sentence.
³⁷ One such dispute is over the question of whether the gap in the relative clause is produced by the deletion or the movement of an element. For discussion see Bresnan (1975), Chomsky (1976), Grimshaw (1975), Lowenstamm (1976), Vergnaud (1974), among many others.
In the ATN system outlined above, the cheese is placed on the HOLD list, and before it is cleared (i.e., before gap 1 is reached) something else, namely the mouse, is placed on the HOLD list as well. Wanner and Maratsos claim that this double filling of the HOLD list is the source of the processing difficulties. They "postulate that it is particularly stressful for the cognitive system to perform the functional equivalent of retaining items on the HOLD list". This in turn will account for "the unique difficulty of self-embedded relative clauses" (p. 36). To test this hypothesis, which they call the HOLD hypothesis, Wanner and Maratsos devise a way of measuring the amount of stress short term memory incurs while processing relative clauses. They do this by introducing the notion of transient memory load (TML), which is a measure of the amount of work that short term memory is doing during processing. One experiment designed to measure TML is the following. During the processing of a sentence, a subject is asked to memorize a random list of names. What was found, after possible interfering factors were eliminated, was that the number of names that could be remembered when the subject was processing those parts of sentences not containing relative clauses was significantly greater than the number remembered when the subject was in the midst of processing that part of the relative clause called the critical region, the area between the head and just after the gap. In other words, the period during which the subject would be holding the head NP in short term memory was also the period during which he was able to remember the least amount of other information: "TML increases within the critical region of an embedded relative clause" (p. 81). This was the essential result of the experiments Wanner and Maratsos conducted. Three other experiments supported the interpretation that "most, and perhaps all of this increase can be attributed to the HOLD effect".
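The HOLD hypothesis implies a memory-load profile across a sentence, which can be computed from a gap-annotated word string. The sketch below rests on two simplifying assumptions of ours: each "that" is a relativizer that puts one head NP on the HOLD list, and each gap, written "_" as in (41), takes the most recent one off. The words processed while the list is non-empty approximate what Wanner and Maratsos call the critical region; the function names and the "_" notation are illustrative only.

```python
def hold_profile(annotated):
    """Return (token, items on HOLD after that token) for each token.

    Assumes every "that" is a relativizer that HOLDs a head NP and
    every "_" is a gap at which the most recently held NP is retrieved.
    """
    load, profile = 0, []
    for token in annotated.split():
        if token.lower() == "that":
            load += 1
        elif token == "_":
            if load == 0:
                raise ValueError("gap found with an empty HOLD list")
            load -= 1
        profile.append((token, load))
    return profile


def critical_region(annotated):
    """Words processed while at least one head NP is being held."""
    return [w for w, load in hold_profile(annotated) if load > 0 and w != "_"]


center = "The cheese that the mouse that the cat chased _ ate _ pleased Eloise"  # (40), as in (41)
right = "The cat that _ chased the mouse that _ ate the cheese pleased Eloise"   # (43)
print(max(load for _, load in hold_profile(center)))  # → 2
print(max(load for _, load in hold_profile(right)))   # → 1
```

On these assumptions the center-embedded (40) holds two NPs at once, while the subject relatives of (43) never hold more than one, which is the contrast the HOLD hypothesis relies on.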
Moreover, Wanner and Maratsos conclude that their experiments offer "strong initial support for the ATN model of relative clause processing, and, by extension, a strong support for the feasibility of using ATN notation to construct models of comprehension" (pp. 81-82). Their reasoning is that the ATN model "naturally" accommodates the HOLD hypothesis and that, moreover, the ATN will not accommodate "in a natural way" the two hypotheses that they juxtapose with their theory: the recursive incapacity hypothesis advanced by Miller and Isard (1964) and the nesting hypothesis
advanced by Yngve (1960). In order to assess their conclusions it is vital to separate out what part of the ATN model is responsible for the success of the system in accounting for the data that they report. Also, we must see how the ATN "naturally" excludes these earlier hypotheses. What is it, then, in the ATN system that allows for the formulation of the HOLD hypothesis? First, it has a left-right processor with a memory. This is vital, for it is by overloading the memory that embedded relative clauses cause processing difficulties. Second, the ATN grammar provides an analysis of relative clauses where the head noun phrase of the relative clause is related to the gap in the accompanying sentence. Unless the grammar provides such structures for relative clauses there is no reason for putting the head NP in the memory, and it will be difficult to delimit the critical region (the area between the head NP and just after the gap) as Wanner and Maratsos do in their experiments. What seems indispensable for this analysis is a left-right processor with a bounded memory and an analysis of relative clauses which provides one with a gap. Moreover, nothing more is needed besides these. Any processing system with these two properties will be able to duplicate these results, for any such system would have a memory that could become overloaded and a gap in the sentential part of the relative clause that could trigger the clearing of the HOLD list. This is very important to note, especially in considering what the success of the ATN model in handling these data means for the adequacy of the theory of TG. It might erroneously be assumed that such results favor the ATN system over a TG. However, such an assumption can be seen to be false. For, as we mentioned earlier, it makes no sense to compare the whole ATN system with a TG, for whereas the former is a processing system the latter is not. TG is a theory of competence and as such can only be compared with another theory of competence.
The ATN system embodies such a theory of competence as a subpart, and only this subpart can be compared with a TG. If it were true that the success of the ATN relied crucially on some property of the grammar that it employs that is in any way different from a standard TG, then results such as those discussed by Wanner and Maratsos would be crucial evidence against the adequacy of a TG. However, an ATN grammar is a TG. Furthermore, every version of TG proposed to date would provide an analysis of relative clauses in which the sentential part would have a gap which in turn would be interpreted as being related to the head NP. Thus there is nothing in the grammar of an ATN that is not in a TG, and so an ATN could not be considered superior on this count. It seems, then, that the ATN is better because it has a left-right processor. In other words, the thing that enables the ATN to handle the data cited is that in addition to a grammar with a gap it has a processor with a memory. But this is not an objection against TG. There is nothing intrinsic to the
theory of TG that would preclude it from being a component of a left-right processor. The addition of some sort of left-right processor with a memory to any TG could account for the same data set that Wanner and Maratsos' ATN system does, in a way just as natural.³⁸ The second reason Wanner and Maratsos advance for thinking that the ATN is supported by the data they report is that earlier hypotheses which they claim to be false cannot be stated "in a natural way" within their ATN system. If this claim were true it would be interesting indeed, for the formulations of Miller and Isard and of Yngve are quite easily stated within a model incorporating a standard TG. Wanner and Maratsos argue that their ATN can in a principled way exclude incorrect explanations that cannot be excluded by TG. Being, in addition, no less empirically adequate, at least for this range of cases, the ATN would be preferred for reasons of explanatory adequacy. In short, if it is correct that the ATN does what is claimed, namely that it excludes "in a natural way" the false hypotheses, then the ATN would indeed be more explanatorily adequate than a TG. Let us, therefore, examine Wanner and Maratsos's arguments that the earlier explanations are wrong and that the ATN excludes them.
Miller and Isard in 1964 elaborated an idea suggested a year earlier by Miller and Chomsky that self-embedded relative clauses are difficult because the cognitive system has "a special intolerance for recursion". The details of this proposal, according to Wanner and Maratsos, suggested that relative clauses are processed by means of the cognitive analogue of a subroutine. Moreover, they stipulated that this subroutine has the facilities for storing only one re-entry address at a time. Therefore, if a single relative clause is embedded in a main clause, this subroutine can remember where to return to the main clause when it completes the analysis of the relative.
However, if a second relative is embedded within the first, comprehension will deteriorate because the system will be incapable of retaining the two reentry addresses necessary to return from the second relative to the first, and from the first relative to the main clause. (p. 30)
Consider first the logic of the argument Wanner and Maratsos advance in discussing this proposal. They say that "it is rather easy to accommodate the recursive incapacity hypothesis within the ATN" (p. 30). This could be done very simply by stipulating "that the memory system which stores the sequence of SEEK arcs leading to the currently active network cannot successfully retain two SEEK arcs of the same type" (p. 31). However, if this is done, they argue, it would lead to trouble, as sentences that are in no sense difficult would be branded "difficult to understand". Thus, phrases like (42) should be difficult to process, though they are not:
(42) The author of the play from Chicago

³⁸ As we noted in the section on Winograd, it is practically a condition on the problem that a processor be left-right. Moreover, it has been a standard assumption within TG that TG would be embedded within a left-right processor. See, for example, Miller and Chomsky (1963).
Similar problems would ensue for sentences such as (43):
(43) The cat that chased the mouse that ate the cheese pleased Eloise.
They conclude that "the ATN model cannot adopt the recursive incapacity hypothesis to account for the difficulties of sentences like (40) without also making the inaccurate prediction that sentences like (43) are just as difficult as sentences like (40)" (p. 32). Wanner and Maratsos admit that there would be no great theoretical difficulty in accommodating the Miller-Isard hypothesis within the ATN system. In other words, nothing intrinsic to the ATN would prevent the elaboration of such a constraint, if it were correct. But Wanner and Maratsos argue that the hypothesis is incorrect. Because it is incorrect, it makes wrong predictions concerning processing difficulty. This, however, has nothing to do with the ATN. Any processing system that incorporates false principles will make incorrect predictions. That a hypothesis yields wrong results shows nothing about the naturalness of an ATN system including or excluding such a hypothesis. The reason an ATN incorporating such a hypothesis is inadequate is that the principle is wrong, not that the system plus the principle is in some sense unnatural. If the condition were correct the ATN could "naturally" accommodate it. Therefore Wanner and Maratsos' conclusion has nothing to do with their argument. The Miller-Isard principle, as they present it, is excluded for reasons of observational, not explanatory, adequacy. Consider now the claims concerning the observational adequacy of the Miller and Isard proposal. Wanner and Maratsos consider the Miller-Isard hypothesis inadequate, for it would, according to them, predict difficulties in processing phrases like (42) and sentences like (43). This objection, however, is simply incorrect. The Miller-Isard hypothesis says nothing that would lead one to believe that either of the above examples would present any difficulties for processing. Briefly, what Miller and Isard said is the following.
English has the following five sorts of constructions:
(A) Nested constructions
(B) Self-embedded constructions
(C) Left-branching constructions
(D) Right-branching constructions
(E) Multiply-branching constructions
[Diagrams giving the form of each construction, with constituents labeled A, B, C, not reproduced.]
It was noted that whereas (A) and especially (B) create substantial processing difficulties, (C), (D), and (E) do not. These facts are explained by referring to the properties of an optimal processor with bounded memory. (Finite memory will be a condition on any empirically adequate theory of processing.) It can be shown that such a device has no trouble accepting unbounded left-branching, right-branching, or multiply-branching structures. Reflecting on the above diagrams of these structures should make this clear. In processing left to right in such structures (again virtually a logical condition on processing models) one leaves the A part completely before entering the B part. Hence one need not retain in short term memory anything of A while processing B. The same does not hold true for constructions like (A). Here, as the diagram reveals, one will start processing B before one has left A; hence, A’s structure must be retained in short term memory. It should also be clear that repeated nestings will quickly tax the limits of short
term memory, resulting in phrases difficult to process. These results derive from simple considerations concerning left-right processing and a finite short term memory interacting with certain structures. What is interesting is that constructions like (B) seem worse than those of (A). Thus consider (44) and (45):
(44) Anyone who feels that if so many more students whom we haven't actually admitted are sitting in on the course than ones we have that the room had to be changed, then auditors will have to be excluded is likely to agree that the curriculum needs revision. (Chomsky, 1965: 195-6, note 6)
(45) The man who said that a cat that the dog that the boy owns chased killed the rat is a liar. (Miller and Isard, 1964: 293)
(44) contains six nested clauses, and while it is not terribly elegant, it seems substantially more acceptable than (45), which has only four self-embedded relative clauses. The Miller-Isard and Miller-Chomsky hypotheses discussed by Wanner and Maratsos are meant to explain why self-embedding has worse effects than nesting, despite the fact that both should be equal if one were considering them solely from the perspective of memory load on short term memory. To explain this fact it was proposed that "the perceptual device has a stock of analytic procedures available to it, one corresponding to each kind of phrase, and that it is organized in such a way that it is unable (or finds it difficult) to utilize a procedure φ while it is in the course of utilizing φ" (Chomsky, 1965: 14).
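The structural contrast at issue, nesting versus self-embedding, can be computed mechanically. The sketch below uses definitions in the spirit of Chomsky (1965): a constituent A is nested in B if B = X A Y with X and Y nonnull, and self-embedded in B if, in addition, A and B have the same label. For each constituent it counts the ancestors in which it is nested (and, separately, those sharing its label) and reports the maximum. Reading that count as the unfinished structure a left-right processor must retain is our simplifying assumption, as are the schematic generators for the construction types (A) to (E) and all the names used.

```python
def annotate(node, start=0):
    """Attach word spans: returns (label, start, end, children); words get label None."""
    if isinstance(node, str):
        return (None, start, start + 1, [])
    label, *children = node
    position, annotated = start, []
    for child in children:
        kid = annotate(child, position)
        annotated.append(kid)
        position = kid[2]
    return (label, start, position, annotated)


def nesting_degrees(tree):
    """Max number of ancestors a constituent is nested in, overall and with its own label."""
    best = {"nesting": 0, "self_embedding": 0}

    def walk(node, ancestors):
        label, start, end, children = node
        if label is None:
            return
        # A is nested in B when B = X A Y with X and Y both nonnull.
        hosts = [a for a in ancestors if a[1] < start and end < a[2]]
        best["nesting"] = max(best["nesting"], len(hosts))
        same = sum(1 for a in hosts if a[0] == label)
        best["self_embedding"] = max(best["self_embedding"], same)
        for child in children:
            walk(child, ancestors + [node])

    walk(annotate(tree), [])
    return best


# Schematic constructions of depth n (labels X1..Xn, words w1..wn, l1, r1, ...).
def right_branching(n):
    tree = (f"X{n}", f"w{n}")
    for k in range(n - 1, 0, -1):
        tree = (f"X{k}", f"w{k}", tree)
    return tree


def left_branching(n):
    tree = (f"X{n}", f"w{n}")
    for k in range(n - 1, 0, -1):
        tree = (f"X{k}", tree, f"w{k}")
    return tree


def multiply_branching(n):
    return ("X", *[(f"Y{k}", f"w{k}") for k in range(1, n + 1)])


def nested(n, same_label=False):
    tree = ("S" if same_label else f"X{n}", f"w{n}")
    for k in range(n - 1, 0, -1):
        tree = ("S" if same_label else f"X{k}", f"l{k}", tree, f"r{k}")
    return tree


for n in (3, 6):
    print(n, nesting_degrees(right_branching(n)), nesting_degrees(nested(n)),
          nesting_degrees(nested(n, same_label=True)))
```

Left-, right- and multiply-branching structures stay at a nesting count of 0 or 1 however deep they grow, while nested structures grow with depth. The measure separates self-embedding from mere nesting but does not by itself explain why self-embedding is worse; that is what the quoted proposal about reusing a procedure φ is meant to do.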
Given the above considerations it should be clear that the "counterexamples" considered by Wanner and Maratsos are not counterexamples at all. Neither (42) nor (43) is a nesting, let alone a self-embedding. (42) has essentially a multiply-branching structure, as in (46):
(46) [Tree diagram of the author of the play from Chicago as a multiply-branching structure; not reproduced.]
(43) is simply a case of right-branching. Neither construction is supposed to offer much difficulty according to either Miller-Isard or Miller-Chomsky and neither does. Thus, Wanner and Maratsos’s counterexamples are irrelevant. More importantly, their HOLD hypothesis turns out to be nothing
but a special case of the more general hypothesis proposed by Miller and Chomsky and by Miller and Isard. Note that sentences like (41), which Wanner and Maratsos discuss, have self-embedded relative clauses in them. But it is not only self-embedded relative clauses that are low in acceptability; other types of self-embedding are equally bad:
(47) If if if Harry leaves then Sam will come, then Joyce will sing, then Albert will play the guitar.
(48) That that that Harry left is possible is silly is true.
In (47) we have self-embedded if-then clauses, while in (48) we have self-embedded subject complement sentences. Neither one is terribly good, and this fact would follow from the Miller-Isard and Miller-Chomsky hypotheses. The HOLD hypothesis, then, turns out to be a special case of a far more general condition concerning self-embedded constructions proposed earlier. The facts about TML in relative clauses remain, though it is hard to see what to make of them before further research is done on other structures with parallel dependencies, such as if-then constructions and wh-questions. In sum, Wanner and Maratsos' arguments concerning the HOLD hypothesis and the ATN system they defend are faulty in two ways. First, even if we grant all the facts they claim, the argument does not logically support an ATN model over any other rationally conceivable processing system. Second, the earlier work that they cite is not refuted by the examples that they discuss; on the contrary, their own HOLD hypothesis turns out to be a special case of a more general principle concerning self-embedded constructions proposed by Miller and Chomsky, and by Miller and Isard. Much the same thing can be said about the logic of Wanner and Maratsos' treatment of Yngve's hypothesis (though not about the adequacy of Yngve's claim itself), and so the argument there is equally fallacious. In sum, then, the claim made by Wanner and Maratsos concerning the naturalness of excluding the two rival hypotheses from the ATN system is simply false. There is no principled reason within the system that they outline that excludes the earlier hypotheses that they cite. Thus the possible argument against TG cited earlier fails, and there is no reason for preferring their ATN system to other versions of a TG, at least if one considers the places where such considerations make any sense.
Another set of experiments that have been advanced as supporting an ATN model involves what have been called “garden path” effects in relative clauses. Wanner, Kaplan and Shiner (1975) claim that the ATN model of relative clause comprehension “makes accurate predictions about the difference in comprehension time between two classes of sentences” (p. 2),
and that these accurate predictions can be attributed to the fact that the ATN model has built into it the same processing biases that people follow. According to Wanner et al., “a garden path sentence is typically described as a sentence which allows its listener to misinterpret one of its parts in a way which subsequently leads to a dead end and must therefore be revised” (p. 3). They cite the following examples:

(49) a. The prime number few.
     b. The man who hunts ducks out on weekends.
     c. Cotton clothing is usually made of grows in Mississippi.

Garden path effects are often due to lexical, semantic, or contextual factors. Wanner et al. point out that the garden path of (49a) is absent from the sentences of (50):

(50) a. The wise number few.
     b. The mediocre are numerous, but the prime number few.

In (50a), the garden path is eliminated because the wise number is not a familiar unit like the prime number. In (50b), the first sentence prepares one to expect the correct structure for (49a), and again the garden path disappears. The garden path in (49a), then, is not due mainly to the structure of (49a), but to lexical factors. Nevertheless, it is reasonable to suppose, Wanner et al. suggest, that certain garden path effects may be syntactic in origin. Thus, one would expect that in the case of sentences which may be structurally ambiguous, one structure is consistently favored over other possible structures when other factors are held constant. These syntactic biases will show up as increased processing times for sentences exhibiting the less favored structures.³⁹ According to Wanner et al., the ATN sketched in (51) predicts the existence of a garden path in a sentence like (52):

(51)
[Figure: ATN Sentence Network and Noun Phrase Network]
(52)
They told the girl whom Bill liked the story.
The garden path arises as follows. Starting at S₀, the ATN seeks an NP. As they is a pronoun, it correctly takes arc 13 and goes to S₁; told is a verb which takes two objects, so after incorrectly trying to take arc 2, the ATN takes arc 15. Next it seeks an NP, which it finds in the girl. When it gets to whom, it stores the girl in the HOLD list and crosses arc 10 seeking an S. It finds the subject NP, Bill (arc 14), and the transitive verb liked (arc 2). Now it crosses arc 3 and arrives at state NP₀. Since arc 5 is ordered before arc 12, it analyzes the story as the object of liked. This analysis leads to a dead end, as the sentence is over before the ATN reaches a final state or clears the HOLD list. The ATN is then forced to backtrack to state NP₀ and choose arc 12. It assigns the contents of the HOLD list, the girl, as the object of liked, and ends by taking arc 19 and finding the story as the second object of told. The ordering of arc 5 before arc 12 thus predicts a clause-final garden path in sentences like (52). Wanner et al. next observe that the following sentence is ambiguous:
(53)
They told the girl that Bill liked the story.
In (53), that can be interpreted as a relative pronoun, in which case that Bill liked ___ is a relative clause on the girl. Alternatively, that may be interpreted as a complementizer, making that Bill liked the story a complement clause functioning as the object of told. The ordering of arc 8 before arc 9 predicts that the latter interpretation will be favored. Furthermore, the ordering of arc 8 before arc 9 predicts that there will be a clause-initial garden path in sentences like (52), for before the ATN processes whom, it will attempt to leave the NP network via arc 8. Wanner et al. claim that experimental results confirm the existence of both the clause-initial and the clause-final garden paths. The experiments consisted of presenting subjects with sentences like (52) and (53) and asking them to answer questions such as “Bill liked ___?” Processing time was taken to correlate with the amount of time it took the subjects to answer the questions. For sentence (53), the answer the story indicates that the subject took the complementizer interpretation of that, while an answer of the girl was taken to indicate that the subject interpreted that as a relative pronoun. In the case of sentences like (53), the relative clause interpretation was chosen in only about 16% of the trials. Further, sentences like (52) took significantly longer to process than sentences like (53) on the complementizer interpretation of that, when lexical and contextual factors were held constant. A final experiment showed that the lexical differences between whom and that and the clause-initial garden path were not by themselves able to account for the entire difference in processing time, thereby confirming the existence of the clause-final garden path.

Assuming that these results are correct,⁴⁰ we can ask, as we did in the case of the transient memory load experiments, what aspects of the ATN are responsible for these successful predictions, and secondly, whether these predictions, as opposed to some others, follow in some way from the ATN model.

³⁹ It should be noted that this ATN model incorporates a syntactic parser which operates without the intervention of semantic or pragmatic factors. Thus, this model of parsing is quite different from the models put forward by Winograd and Schank.
Again, we find that the ATN grammar makes no special contribution to accounting for the results of the experiments, as virtually every grammar ever conceived assigns two different structures to sentences like (53). Rather, it is the schedule, in particular the ordering of arc 5 before arc 12 and of arc 8 before arc 9, that leads the ATN to make the correct predictions. Since this schedule can be imposed on the processor associated with any grammar, these results tell us nothing about the merits of the ATN grammar; however, if these particular arc orderings followed from some special properties of the ATN, we might be led to believe that the ATN provides an interesting hypothesis about processing. Why, then, is arc 5 ordered before arc 12? According to Wanner et al., this is the only ordering that is compatible with the ATN model:
The ATN model of relative clause comprehension which we have just sketched has one highly characteristic syntactic bias: it always attempts to find a noun phrase by analyzing the input words of the relative clause before it has recourse to the HOLD LIST. In [39], this syntactic bias is expressed by ordering arc 5 before arc 12 at state NP₀. Note that it is this ordering which is essential to locating the gap in the relative clause. If arc 12 were ordered before arc 5, the ATN would attempt to assign the head noun phrase to the grammatical function appropriate to every noun phrase location in the relative clause up to and including the gap. Therefore, the syntactic bias expressed by this arc ordering is essential to the efficient operation of the ATN model. (1975: 18)

⁴⁰ A possible source of error in the results stems from the fact that they have not considered another garden path in sentences like (52), which is due to the existence of Wh-complements, as in They told the girl who Bill liked.
The claim that the ordering of arc 5 before arc 12 is “essential to the efficient operation of the ATN model” is curious: the ordering is not “essential”, for the model would work if arc 12 were ordered first. As it is, the model contains so many inefficiencies (e.g., in the processing of every relative clause) that the additional inefficiency incurred by ordering arc 12 before arc 5 would hardly be noticeable. Further, in the case of relativization of subjects, as in the sentence They told the girl that liked the story the answer, the opposite ordering would be more efficient. Hence, Wanner et al. fail to support their oft-repeated claim that the ordering of arc 5 before arc 12 is “essential to the efficient operation of the ATN model”; rather, the ordering appears to be ad hoc. The ordering that predicts the clause-initial garden path, arc 8 before arc 9, is likewise ad hoc with respect to the ATN model. Since Wanner et al. provide no reason why the ATN could not be set up with the opposite order, except to say that “there are empirical reasons for ordering arc 8 before arc 9” (p. 26), there is no sense in which the ATN can be said to “predict” this order. Hence, none of the experimental results follow from the ATN model. Looking more closely at the crucial orderings, we find that they arise from quite different considerations:

The clause-initial garden path arises because the ATN model attempts to return from the NP network before checking for the presence of a relative clause. The clause-final garden path occurs because the ATN model always attempts to find a noun phrase in the input before having recourse to the HOLD LIST in the analysis of a relative clause. (p. 28)

Wanner et al. suggest that the clause-initial garden path may follow from a generalization to the effect that “the listener always attempts to return from a lower level network to a higher level network as soon as possible” (p. 55). But this principle appears to conflict with their treatment of the HOLD list: since a listener wants to leave a lower level network as soon as possible, and since a relative clause cannot be escaped until the HOLD list is emptied, the listener should be interested in trying to clear the HOLD list as fast as possible. In other words, the principle that chooses the ordering of arc 8 before arc 9 should also choose the incorrect ordering of arc 12 before arc 5.⁴¹

An alternative explanation of the experimental results suggests itself. Suppose that there exists a processing principle which instructs the listener to avoid a gap whenever possible. Such a principle would predict both the ordering of 8 before 9 and that of 5 before 12, as follows. In the case of arcs 8 and 9, the choice of a complement clause will not lead to a gap, while the choice of a relative clause inevitably will; hence, the principle of “avoid a gap” predicts that the complement clause interpretation will be tried first, i.e., arc 8 will be ordered before arc 9. Once inside a relative clause, even though a gap will inevitably appear, the principle of “avoid a gap” will dictate that the listener should always try to find lexical material within the sentence rather than having recourse to memory, and so will predict the ordering of arc 5 before arc 12. Thus, what are unrelated, even conflicting, phenomena in Wanner et al.'s treatment can be viewed as all falling out of one general principle. It should be noted further that this principle does not depend on strictly ordered processing, but is also compatible with a theory which allows a certain amount of parallel processing.

We conclude from all this that the garden path effects described by Wanner et al. are far from providing “particularly strong corroboration for the ATN approach to comprehension” (p. 51). At best, Wanner et al. have shown that the ATN is able to model these effects, but the same could be said of many different models of comprehension. Further, they have not suggested any consistent general principles from which the garden path effects they describe could be said to follow.
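The point that the garden path predictions come from the schedule rather than from the grammar can be illustrated with a minimal sketch. The Python toy below is hypothetical throughout: its lexicon (with noun phrases pre-chunked into single tokens), the `ToyATN` class, and its `dead_ends` counter are illustrative assumptions, not Wanner et al.'s network. One fixed grammar covers (52), and two choice orderings are passed in as parameters: `np_order` (return from the NP before trying a relative clause, in the spirit of arc 8 before arc 9) and `object_order` (take an NP from the input before using the HOLD list, in the spirit of arc 5 before arc 12). The default orderings are the gap-avoiding ones.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical toy lexicon; noun phrases are pre-chunked into single tokens.
LEXICON = {
    "they": "NP", "Bill": "NP", "the girl": "NP", "the story": "NP",
    "told": "V2",   # verb taking two objects
    "liked": "V1",  # verb taking one object
    "whom": "REL",  # relative pronoun
}

Hold = Tuple[str, ...]             # HOLD list, most recent item last
Result = Tuple[int, object, Hold]  # (next position, partial tree, HOLD list)


class ToyATN:
    """Backtracking parser whose choice orderings are explicit parameters."""

    def __init__(self, tokens: List[str],
                 np_order: Tuple[str, str] = ("pop", "relative"),
                 object_order: Tuple[str, str] = ("input", "hold")) -> None:
        self.tokens = tokens
        self.np_order = np_order          # ("pop", "relative") ~ arc 8 before arc 9
        self.object_order = object_order  # ("input", "hold") ~ arc 5 before arc 12
        self.dead_ends = 0                # analyses abandoned before success

    def cat(self, i: int) -> Optional[str]:
        return LEXICON.get(self.tokens[i]) if i < len(self.tokens) else None

    def noun_phrase(self, i: int, hold: Hold) -> Iterator[Result]:
        if self.cat(i) != "NP":
            self.dead_ends += 1           # an NP was expected here
            return
        head = self.tokens[i]
        for choice in self.np_order:
            if choice == "pop":
                yield i + 1, head, hold   # leave the NP network at once
            elif choice == "relative" and self.cat(i + 1) == "REL":
                # Put the head NP on the HOLD list and parse the relative clause.
                for j, clause, rest in self.relative_clause(i + 2, hold + (head,)):
                    yield j, (head, clause), rest

    def relative_clause(self, i: int, hold: Hold) -> Iterator[Result]:
        for j, subj, h1 in self.noun_phrase(i, hold):
            if self.cat(j) != "V1":
                self.dead_ends += 1
                continue
            for k, obj, h2 in self.object_np(j + 1, h1):
                if h2 != hold[:-1]:       # held head never used: dead end
                    self.dead_ends += 1
                    continue
                yield k, (subj, self.tokens[j], obj), h2

    def object_np(self, i: int, hold: Hold) -> Iterator[Result]:
        for choice in self.object_order:
            if choice == "input" and self.cat(i) == "NP":
                yield from self.noun_phrase(i, hold)
            elif choice == "hold" and hold:
                yield i, ("GAP", hold[-1]), hold[:-1]  # fill the gap from HOLD

    def sentence(self) -> Optional[tuple]:
        for i, subj, h1 in self.noun_phrase(0, ()):
            if self.cat(i) != "V2":
                self.dead_ends += 1
                continue
            for j, obj1, h2 in self.noun_phrase(i + 1, h1):
                for k, obj2, h3 in self.noun_phrase(j, h2):
                    if k == len(self.tokens) and not h3:
                        return (subj, self.tokens[i], obj1, obj2)
                    self.dead_ends += 1   # input left over or HOLD not empty
        return None


if __name__ == "__main__":
    ex_52 = ["they", "told", "the girl", "whom", "Bill", "liked", "the story"]
    gap_avoiding = ToyATN(ex_52)
    reversed_order = ToyATN(ex_52, ("relative", "pop"), ("hold", "input"))
    print(gap_avoiding.sentence() == reversed_order.sentence())  # True
    print(gap_avoiding.dead_ends, reversed_order.dead_ends)      # 2 0
```

With the gap-avoiding defaults the toy hits one clause-initial dead end (at whom) and one clause-final dead end (the story taken as the object of liked) before succeeding on (52); with both orderings reversed it reaches the same structure with none. The grammar, and hence the structure assigned, is identical in both runs; only the schedule differs. The toy does not model the complementizer reading of that in (53), so it says nothing about which ordering fares better overall.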
To sum up, we have found that although the experimental results advanced by Wanner et al. concerning transient memory load and garden paths are suggestive, the observed phenomena do not follow from any features peculiar to the ATN model, and so there is no basis for the claim that such experiments provide strong support for the ATN approach to comprehension. Unlike the work in AI proper that we discussed in the earlier part of the paper, the work of Wanner et al. does bear on issues relevant to a scientific theory of language. Unfortunately, their discussion of these issues is flawed by their inaccurate reporting of earlier hypotheses, by their failure to distinguish sharply between models of competence and models of parsing, and by the fact that they have not been able to formulate a consistent set of general principles which could explain, rather than describe, their experimental results.

⁴¹ The principle of delaying clearing the HOLD list also appears to conflict with the earlier results concerning TML. It is natural to assume, though not logically necessary, that since retaining items on the HOLD list is stressful, the processor would attempt to clear it as quickly as possible.
IV. Conclusion
In the mid-1960s Herman Kahn proclaimed the arrival of the era of surprise-free economics. According to Kahn, the fundamental problems of economic theory had all been solved; if only appropriately trained technocrats would diligently apply the proper techniques, the economy would be freed forever from the unpredictable fluctuations of the business cycle and would experience only steady growth: no violent recessions, no sharp inflation; in short, no surprises. Now many in the AI community are offering us the prospect of a surprise-free science. In their vision, there is little room for new, perhaps revolutionary discoveries in the areas of language and cognition. They are telling us that the fundamental theoretical problems concerning the organization of human cognitive abilities have been solved, and all that remains is to develop improved techniques for the storage and manipulation of vast quantities of information. Most wonderful of all, this information is not hard to come by: it is very close at hand; indeed, it is already in our libraries.⁴² The principles have been discovered, the information is easily accessible, the techniques are almost perfected: all these are the ingredients for steady scientific growth, without startling discoveries, without sharp setbacks; in short, without surprises. In this paper we have argued that this view, insofar as it applies to language, cannot hope to yield fruitful results. The fundamental problems of a scientific theory of language have not been solved, and if research into language has shown anything, it is that, as with explanation in other domains, an explanation of the human language faculty will involve the elaboration of unanticipated theories of great complexity.
Underlying the false optimism of many researchers in AI is a rather prominent (though often implicit) belief that the human mind is a rather simple structure and that the complexity of human abilities is primarily a function of information stored in the memory. There is nothing inherent in computer-based research which requires that it be informed by such an impoverished theory of mind. It would be unfortunate if the predominance of these trends in AI were to lead to a general neglect of the very real contributions that computer-based research could make to the development of scientific theories of language and cognition.

⁴² “What, on the other hand, is relatively difficult to achieve is getting the system to accept declarative information such as, for example, what would come from an Encyclopedia Britannica whose contents had not yet been transposed into procedural terms. In the present state of affairs, we are able to store the contents of such an encyclopedia in memory, but we do not yet know how to make use of them. It is, so to speak, a problem of digestion!” (Hewitt, in an interview in Skyvington (1976: 284-285)).
References

Bar-Hillel, Y. (1964) Language and Information, Addison-Wesley, Reading, Mass.
Bresnan, Joan W. (1975) Comparative deletion and constraints on transformations, Linguistic Analysis, 1, 25-74.
Carey, Susan (1976) The Child as Word Learner, paper presented at the Bell Telephone Convocation, M.I.T., March 9-10, 1976.
Chomsky, N. (1957) Syntactic Structures, Mouton, The Hague.
Chomsky, N. (1965) Aspects of the Theory of Syntax, MIT Press, Cambridge, Mass.
Chomsky, N. (1972) Studies on Semantics in Generative Grammar, Mouton, The Hague.
Chomsky, N. (1976) Reflections on Language, Pantheon Books, New York.
Feldman, J. (1975) Bad-mouthing frames, in B. L. Nash-Webber and R. Schank, eds.
Goodluck, H. (1976) Perceptual closure and the processing of surface structure ambiguity, ms., University of Massachusetts, Amherst, Mass.
Grimshaw, J. (1975) Evidence for relativization by deletion in Chaucerian Middle English, in J. Grimshaw, ed., Papers in the History and Structure of English, University of Massachusetts Occasional Papers in Linguistics No. 1, 1975.
Horn, G. M. (1974) The Noun Phrase Constraint, unpublished Doctoral dissertation, University of Massachusetts, Amherst, Mass.
Jespersen, O. (1924) The Philosophy of Grammar, George Allen & Unwin Ltd., London.
Katz, J. and J. Fodor (1963) The structure of a semantic theory, Language, 39, 170-210.
Katz, J. and P. Postal (1964) An Integrated Theory of Linguistic Descriptions, MIT Press, Cambridge, Mass.
Lakoff, G. (1969) Presuppositions and relative grammaticality, in W. Todd, ed., Studies in Philosophical Linguistics, Series 1, Great Expectations, Evanston, Ill.
Lowenstamm, J. (1976) Relative clauses in Yiddish: A case for movement, in J. Stillings, ed., University of Massachusetts Occasional Papers in Linguistics, Vol. 2.
Miller, G. A. and N. Chomsky (1963) Finitary models of language users, in R. D. Luce, R. Bush and E. Galanter, eds., Handbook of Mathematical Psychology, Vol. 2, John Wiley and Sons, New York.
Miller, G. A. and S. Isard (1964) Free recall of self-embedded English sentences, Inform. and Control, 7, 292-303.
Minsky, M. (1974) A framework for representing knowledge, Artificial Intelligence Memo No. 306, A. I. Laboratory, MIT, Cambridge, Mass.
Nash-Webber, B. L. and R. Schank, eds. (1975) Theoretical Issues in Natural Language Processing: An Interdisciplinary Workshop in Computational Linguistics, Psychology, Linguistics, Artificial Intelligence, 10-13 June, Cambridge, Mass.
Newell, A. and H. A. Simon (1972) Human Problem Solving, Prentice-Hall, Englewood Cliffs, N.J.
Riesbeck, C. K. (1975) Computational Understanding, in B. L. Nash-Webber and R. Schank, eds.
Ross, J. R. (1967) Constraints on Variables in Syntax, unpublished Doctoral dissertation, MIT, Cambridge, Mass.
Schank, R. C. (1972) Conceptual dependency: A theory of natural language understanding, Cogn. Psychol., 3, 552-631.
Schank, R. C. (1973) Identification of conceptualizations underlying natural language, in R. C. Schank and K. M. Colby, eds.
Schank, R. C. (1976) Reply to Weizenbaum, ms.
Schank, R. C. and R. P. Abelson (1976) Scripts, Plans, and Knowledge, ms., Yale University.
Schank, R. C. and K. M. Colby, eds. (1973) Computer Models of Thought and Language, W. H. Freeman & Co., San Francisco, Calif.
Skyvington, W. (1976) Machina Sapiens: Essai sur l'intelligence artificielle, Seuil, Paris.
Vergnaud, J. R. (1974) French Relative Clauses, unpublished Doctoral dissertation, MIT, Cambridge, Mass.
Wanner, E. and M. Maratsos (1974) An Augmented Transition Network Model of Relative Clause Comprehension, ms.
Wanner, E., R. Kaplan and S. Shiner (1975) Garden Paths in Relative Clauses, ms.
Weizenbaum, J. (1976) Computer Power and Human Reason: From Judgment to Calculation, W. H. Freeman & Co., San Francisco, Calif.
Wilks, Y. (1974) Natural language understanding systems within the AI paradigm: A survey and some comparisons, Memo AIM-237, Stanford Artificial Intelligence Laboratory, Stanford.
Wilks, Y. (1975) Methodology in AI and natural language understanding, in B. L. Nash-Webber and R. Schank, eds.
Winograd, T. (1972) Understanding natural language, Cogn. Psychol., 3, 1.
Winograd, T. (1973) A procedural model of language understanding, in R. C. Schank and K. M. Colby, eds.
Winograd, T. (1974) Five lectures on artificial intelligence, Memo AIM-246, Artificial Intelligence Laboratory, Stanford.
Yngve, V. H. (1960) A model and an hypothesis for language structure, Proc. Am. Phil. Soc., 104, 444-466.
Cognition
Contents of Volume 4

Number 1
Editorial, 7
KATHERINE NELSON (Yale University) Some attributes of adjectives used by young children, 13
SUSAN L. WEINER (Educational Testing Service) and HOWARD EHRLICHMAN (City University of New York) Ocular motility and cognitive process, 31
EDWARD S. KLIMA (University of California at San Diego) and URSULA BELLUGI (The Salk Institute for Biological Studies) Poetry and song in a language without sound, 45
JOHAN SUNDBERG (Royal Institute of Technology, Stockholm) and BJORN LINDBLOM (Stockholm University) Generative theories in language and music description, 99

Number 2
ROGER BROWN (Harvard University) Reference: In memorial tribute to Eric Lenneberg, 125
VIRGINIA VALIAN (CUNY Graduate Center, New York) and ROGER WALES (University of St. Andrews) What's what: talkers help listeners hear and understand by clarifying sentential relations, 155
MICHAEL T. MOTLEY and BERNARD J. BAARS (University of California) Semantic bias effects on the outcomes of verbal slips, 177
SUSAN GOLDIN-MEADOW, MARTIN E. P. SELIGMAN and ROCHEL GELMAN (University of Pennsylvania) Language in the two-year-old, 189
Discussion: DANIEL N. OSHERSON (University of Pennsylvania) and THOMAS WASOW (Stanford University) Task-specificity and species-specificity in the study of language: A methodological note, 203

Number 3
ELIZABETH SPELKE, WILLIAM HIRST, and ULRIC NEISSER (Cornell University, New York) Skills of divided attention, 215
HUGH FAIRWEATHER (Universities of Oxford and Bologna) Sex differences in cognition, 231
THOMAS H. CARR (George Peabody College, Nashville) and VERNE R. BACHARACH (Acadia University, Wolfville) Perceptual tuning and conscious attention: Systems of input regulation in visual information processing, 281

Number 4
JAMES R. LACKNER (Brandeis University and Massachusetts Institute of Technology) and BETTY TULLER (Brandeis University) The influence of syntactic segmentation on perceived stress, 303
JOHN MORTON (MRC Applied Psychology Unit, Cambridge) On recursive reference, 309
MATTHEW HUGH ERDELYI and SHIRA FINKELSTEIN (Brooklyn College), NADEANNE HERRELL, BRUCE MILLER and JANE THOMAS (Rutgers, The State University) Coding modality vs. input modality in hypermnesia: Is a rose a rose a rose?, 311
Discussion: B. ELAN DRESHER (University of Massachusetts) and NORBERT HORNSTEIN (Harvard University) On some supposed contributions of artificial intelligence to the scientific study of language, 321
Cognition
Author Index of Volume 4

Baars, Bernard J., 177
Bacharach, Verne R., 281
Bellugi, Ursula, 45
Brown, Roger, 125
Carr, Thomas H., 281
Dresher, B. Elan, 321
Ehrlichman, Howard, 31
Erdelyi, Matthew Hugh, 311
Fairweather, Hugh, 231
Finkelstein, Shira, 311
Gelman, Rochel, 189
Goldin-Meadow, Susan, 189
Herrell, Nadeanne, 311
Hirst, William, 215
Hornstein, Norbert, 321
Klima, Edward S., 45
Lackner, James R., 303
Lindblom, Bjorn, 99
Miller, Bruce, 311
Morton, John, 309
Motley, Michael T., 177
Neisser, Ulric, 215
Nelson, Katherine, 13
Osherson, Daniel N., 203
Seligman, Martin E. P., 189
Spelke, Elizabeth, 215
Sundberg, Johan, 99
Thomas, Jane, 311
Tuller, Betty, 303
Valian, Virginia, 155
Wales, Roger, 155
Wasow, Thomas, 203
Weiner, Susan L., 31